User reports that logging for the Nemotron model appears to be locked: no output is visible even with vLLM logging enabled, which makes it difficult to debug and monitor the model's behavior.
- The Nemotron model locks logging: nothing is visible even with vLLM logging, Python logging, and verbose output from the C++ CUDA enforcer all enabled.
- They expect the model to use a KV cache as described in their document or official press release. However, in modeling_nemotron_h.py the .forward() method (line 1695) expects cache_params, NOT past_key_values: def forward(self, …, cache_params=None, …)
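
One way to confirm the cache argument mismatch described above without relying on log output is to inspect the forward signature directly. The following is a minimal sketch, not the model's actual code: `StandInModel` is a hypothetical stand-in whose signature only mirrors the fragment quoted in the report, and the helper names `accepted_cache_kwargs` and `swallows_extra_kwargs` are invented for illustration. Only the standard-library `inspect` module is used.

```python
import inspect


def accepted_cache_kwargs(fn, candidates=("cache_params", "past_key_values")):
    """Return the candidate cache keyword names that `fn` declares explicitly."""
    params = inspect.signature(fn).parameters
    return [name for name in candidates if name in params]


def swallows_extra_kwargs(fn):
    """Return True if `fn` has a **kwargs catch-all that would hide a wrong name."""
    params = inspect.signature(fn).parameters.values()
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params)


class StandInModel:
    # Hypothetical stand-in (assumption): mirrors only the quoted fragment
    # `def forward(self, …, cache_params=None, …)`; the **kwargs is assumed too.
    def forward(self, input_ids=None, cache_params=None, **kwargs):
        return None


print(accepted_cache_kwargs(StandInModel.forward))   # ['cache_params']
print(swallows_extra_kwargs(StandInModel.forward))   # True
```

In practice the same two helpers could be pointed at the loaded model's `forward` method instead of the stand-in. If the real signature also has a `**kwargs` catch-all, passing `past_key_values` would raise no error and would simply be ignored, which could explain why the mismatch is not visible from the caller's side.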