A user is considering buying a second GPU to run GGUF models with larger context sizes, and asks whether a dual-GPU setup is worth it compared with adding more RAM.
Hey!

My PC: Ryzen 9 5950X, RTX 5070 Ti, 64 GB RAM, ASUS Prime X570-P motherboard (the second PCIe slot is x4). I use LLMs together with OpenCode or Claude Code. I want to run something like Qwen3 Coder Next or Qwen3.5 122B at 5-6-bit quantisation with a context size of 200k+.

Could you advise whether it’s worth buying a second GPU for this (RTX 5060 Ti 16 GB? RTX 3090?), or whether I should add more RAM instead? Or will neither option make a difference, so it would just be a waste of money?

On my current setup, I’ve tried Qwen3 Coder Next at Q5, which fits about 50k of context. That’s nowhere near enough. Q4 manages around 100–115k, which is still too low. I often have to compress the dialogue, and because of that the agent quickly loses track of what it’s actually doing.

Or is running a GGUF model across two cards a bad idea altogether?
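To put rough numbers on the context question, here is a minimal back-of-envelope sketch of how KV-cache memory grows with context length in a standard full-attention transformer. The function name `kv_cache_gib` and all layer, head, and dimension values are hypothetical placeholders, not the real configuration of any Qwen model. Hybrid architectures that mix in other layer types would only count their full-attention layers here, so treat the result as an estimate at best.

```python
def kv_cache_gib(n_ctx, n_attn_layers, n_kv_heads, head_dim, bytes_per_elem=2.0):
    """Approximate KV-cache size in GiB for the full-attention layers only."""
    # Factor 2: each attention layer stores one K and one V tensor per token.
    total_bytes = 2 * n_attn_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem
    return total_bytes / 2**30


# Hypothetical placeholder values (assumption, NOT a real model config).
example = dict(n_attn_layers=12, n_kv_heads=2, head_dim=256)

for n_ctx in (50_000, 100_000, 200_000):
    # bytes_per_elem=2.0 corresponds to an fp16 cache; a quantised cache uses less.
    print(f"{n_ctx:>7} tokens: {kv_cache_gib(n_ctx, **example):.2f} GiB")
```

The cache grows linearly with context, so doubling the context doubles the cache. The VRAM left over after the weights are loaded therefore puts a hard ceiling on context length, which may be why dropping from Q5 to Q4 roughly doubled the usable context above. Plugging in the real values from the model's metadata would show how much extra VRAM 200k+ tokens would need.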