User is experiencing tool call failures and poor code quality when using quantized models with Mistral Vibe and other CLIs. They need a more reliable combination of CLI and model for coding.
I've dipped my toe in the water with Mistral Vibe, using LM Studio and Devstral Small for inference. I've had pretty good success refactoring a small python project, and a few other small tasks. Overall, it seems to work well on my MacBook w/ 92GB RAM, although I've encountered issues when it gets near or above 100k tokens of context. Sometimes it stops working entirely with no errors indicated in LM Studio logs, just notice the model isn't loaded anymore. Aggressively compacting the context to stay under ~80k helps. I've tried plugging other models in via the config.toml, and haven't had much luck. They "work", but not well. Lots of tool call failures, syntax errors. (I was especially excited about GLM 4.7 Air, but keep running into looping issues, no matter what inference settings I try, GGUF or MLX models, even at Q8) I'm curious what my best option is at this point, or if I'm already using it. I'm open to trying anything I can run on this machine--it runs GPT-OSS-120B beautifully, but it just doesn't seem to play well with Vibe (as described above). I don't really have the time or inclination to install every different CLI to see which one works best. I've heard good things about Claude Code, but I'm guessing that's only with paid cloud inference. Prefer open source anyway. [This comment](https://old.reddit.com/r/LocalLLaMA/comments/1qt76qs/mistral_vibe_20/o314ydx/) on a Mistral Vibe thread says I might be best served using the tool that goes with each model, but I'm loathe to spend the time installing and experimenting. Is there another proven combination of CLI coding interface and model that works as well/better than Mistral Vibe with Devstral Small? Ideally, I could run >100k context, and get a bit more speed with an MoE model. I did try Qwen Coder, but experienced the issues I described above with failed tool calls and poor code quality.