TL;DR: the model appears to unload between consecutive requests, causing the next call to fail. A way to keep the model loaded across back-to-back requests would fix this.
I’m running LM Studio as a local API for a pipeline. The pipeline only calls the chat/completions endpoint; it never loads or unloads models. The model drops between requests, so the next call fails.

**What happens**

1. A chat completion runs and finishes normally (prompt processed, full response returned).
2. The next request starts right after (“Running chat completion on conversation with 2 messages”). The two messages are always one system and one user message; this is the same for every call.
3. That request fails with:
   * `[ERROR] Error: Channel Error`
   * Then: `No models loaded. Please load a model in the developer page or use the 'lms load' command.`

So the model appears to unload (or the channel breaks) between two back-to-back requests, not after a long idle period. The first request completes; the second hits “Channel Error” and then “No models loaded.”

**Setup**

* Model: qwen3-vl-8b (also tried the 4b and 30b variants; same issue)
* 10k token context, RTX 3080, 32 GB of RAM
* Usage: stateless requests (one system + one user message per call, no conversation memory)
* No load/unload calls from my side, only POSTs to the chat/completions API

**Question**

Has anyone seen “Channel Error” followed by “No models loaded” when sending another request right after a successful completion? Is there a setting to keep the model loaded between requests (e.g. to avoid unloading after each completion), or is this a known issue? Any workarounds or recommended settings for back-to-back API usage? Thanks in advance.

**Update (before I even got to post): debug logs**

I turned on debug logging. The Channel Error happens right after the server tries to prepare the next request, not during the previous completion. Sequence:

1. First request completes; the slot is released; “all slots are idle.”
2. A new POST to /v1/chat/completions arrives.
3.
Server selects a slot (LCP/LRU, session_id empty), then:
   * `srv get_availabl: updating prompt cache`
   * `srv prompt_save: saving prompt with length 1709, total state size = 240.349 MiB`
   * `srv load: looking for better prompt... found better prompt with f_keep = 0.298, sim = 0.231`
4. Immediately after that: `[ERROR] Error: Channel Error`, then “No models loaded.”

So it’s failing during the prompt cache update / slot load (saving or loading prompt state for the new request). Has anyone seen Channel Error in this code path, or know if there’s a way to disable prompt caching / LCP reuse for the API so it just runs each request without that logic? Using qwen3-vl-8b with stateless 2-message requests. Thanks.
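For context, each pipeline call is shaped roughly like this. This is a minimal sketch, not my actual pipeline code: the base URL assumes LM Studio’s default local server port (1234), and `build_payload` / `chat` are illustrative names of my own.

```python
import json
import urllib.request

BASE_URL = "http://localhost:1234/v1"  # assumption: LM Studio's default local server port
MODEL = "qwen3-vl-8b"                  # whichever identifier your loaded model reports

def build_payload(system_msg: str, user_msg: str) -> dict:
    """Every call is stateless: exactly one system + one user message, no history."""
    return {
        "model": MODEL,
        "messages": [
            {"role": "system", "content": system_msg},
            {"role": "user", "content": user_msg},
        ],
        "stream": False,
    }

def chat(system_msg: str, user_msg: str) -> str:
    """POST one stateless request to the OpenAI-compatible chat/completions endpoint."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_payload(system_msg, user_msg)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

The second `chat(...)` in a row is the one that dies with Channel Error.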