### Describe the feature

## Description

It would be great to have built-in support for monitoring all models running under the vLLM router through the router itself, exposing relevant metrics in Prometheus format. This would let users track the usage, performance, and health of each model centrally, without needing to access each model instance directly.

## Motivation

- Centralized monitoring simplifies observability across multiple models.
- Easier integration with existing Prometheus + Grafana monitoring stacks.
- Enables tracking of metrics such as request rates, latency, error rates, token usage, and resource usage per model through the router layer.

## Proposed Solution

- Extend the vLLM router to collect and expose Prometheus metrics for every model it routes requests to.
- Suggested metrics include:
  - Number of requests per model
  - Latency (request processing time) per model
  - Error counts per model
  - Number of tokens processed per model (input and output tokens)
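To make the proposal concrete, here is a minimal sketch of what a per-model metrics collector in the router could look like. The `RouterMetrics` class and the metric names (`vllm_router_requests_total`, etc.) are hypothetical, not existing vLLM APIs; a real implementation would likely build on the `prometheus_client` library instead of rendering the text exposition format by hand.

```python
from collections import defaultdict

class RouterMetrics:
    """Hypothetical per-model metrics collector for the vLLM router.

    Renders counters in the Prometheus text exposition format. All names
    here are illustrative; a real implementation would likely use the
    prometheus_client library and register a /metrics endpoint.
    """

    def __init__(self):
        self.requests = defaultdict(int)       # requests per model
        self.errors = defaultdict(int)         # error count per model
        self.latency_sum = defaultdict(float)  # total request time (s) per model
        self.input_tokens = defaultdict(int)   # prompt tokens per model
        self.output_tokens = defaultdict(int)  # completion tokens per model

    def record(self, model, latency_s, input_tokens, output_tokens, error=False):
        """Update counters after the router finishes (or fails) one request."""
        self.requests[model] += 1
        self.latency_sum[model] += latency_s
        self.input_tokens[model] += input_tokens
        self.output_tokens[model] += output_tokens
        if error:
            self.errors[model] += 1

    def render(self):
        """Return all metrics as a Prometheus text-format payload."""
        lines = []

        def emit(name, values):
            lines.append(f"# TYPE {name} counter")
            for model, value in sorted(values.items()):
                lines.append(f'{name}{{model="{model}"}} {value}')

        emit("vllm_router_requests_total", self.requests)
        emit("vllm_router_errors_total", self.errors)
        emit("vllm_router_request_latency_seconds_sum", self.latency_sum)
        emit("vllm_router_input_tokens_total", self.input_tokens)
        emit("vllm_router_output_tokens_total", self.output_tokens)
        return "\n".join(lines) + "\n"

# Example: record two routed requests and render the /metrics payload.
metrics = RouterMetrics()
metrics.record("llama-3-8b", latency_s=0.42, input_tokens=128, output_tokens=256)
metrics.record("llama-3-8b", latency_s=0.31, input_tokens=64, output_tokens=512, error=True)
print(metrics.render())
```

Keeping the collection at the router layer (rather than scraping each backend) is what enables the centralized view described above: one scrape target covers every model, with the model name as a Prometheus label.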