I want to be able to configure the Qwen model to behave like an instruct model more easily, specifically to disable the 'thinking' feature and adjust parameters for better performance on limited hardware.
Hey all, I have been trying to get Qwen 3.5 (Unsloth), under linux, to behave like an instruct model. I have tried a number of permutations, different Qwen 3.5 models, different quants, parameter after parameter. I even tried someone's Jinja template, which supposedly restored the /no\_thinking argument. Nothing has worked. Certainly, Qwen 3 instruct models work, but I am working with limited hardware. Here are the models I have tried: * Qwen3.5-27B-UD-Q8\_K\_XL.gguf * Qwen3.5-27B-Q5\_K\_M.gguf * Qwen3.5-9B-UD-Q3\_K\_XL.gguf * Qwen3.5-122B-A10BQ4\_K\_M-00001-of-00003.gguf * Qwen3.5-35B-A3B-Q4\_K\_M.gguf * Qwen3.5-35B-A3B-UD-Q5\_K\_XL.gguf My usual parameters are: `llama-server --no-mmap -ngl 99 -fa 1 --fit on --seed 3407 --temp 0.7 --top-p 0.8 --min-p 0.0 --top-k 20 --jinja --presence_penalty 1.5 --repeat-penalty 1` And I have tried the following options to disable thinking: * \--chat-template-kwargs '{"enable\_thinking":false}' * \--reasoning-budget 0 * \--chat-**template**\-file Qwen\\ 3.5\\ Jinja\\ template\\ –\\ Restores\\ no\_thinking\\ behavior.txt Any idea what is going on? What am I missing? Thanks!