I've tried `--chat-template-kwargs '{"enable_thinking": false}'` and its successor `--reasoning off` in llama-server, and although this works for other models (I've used it successfully on several Qwen and Nemotron models), it doesn't work for the Qwen3.5 27B model. It just thinks anyway: it never emits an opening `<think>` tag, but it still closes its reasoning with `</think>`. Has anybody else hit this problem / know how to solve it? llama.cpp b8295
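For reference, the two invocations I tried look roughly like this (the model path and port are placeholders, not my exact setup):

```shell
# Older approach: pass enable_thinking=false through the chat template kwargs
llama-server -m qwen3.5-27b.gguf --port 8080 \
  --chat-template-kwargs '{"enable_thinking": false}'

# Newer approach: the dedicated reasoning switch
llama-server -m qwen3.5-27b.gguf --port 8080 \
  --reasoning off
```

Both start the server fine; the model just ignores the setting and reasons anyway.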