Red Hat AI notes that it plans to add support for advanced quantization methods such as AWQ and KV cache quantization to vLLM.
What’s Coming Soon? Support for non-NVIDIA hardware, expansion into Mixture of Experts (MoE) and vision-language models, advanced quantization methods like AWQ and KV cache quantization, plus Quantization-Aware Training (QAT) and LoRA support. Join our vLLM Office Hours to suggest additional features (link in our X bio)! (4/6)
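For context only: vLLM's existing offline inference API already exposes a quantization argument and a KV-cache dtype option, so a minimal sketch of what running an AWQ-quantized checkpoint with a quantized KV cache might look like is shown below. The model name and parameter values are illustrative assumptions, not part of the announcement above.

# Minimal sketch (illustrative): serving an AWQ-quantized checkpoint in vLLM
# with an FP8 KV cache. Model ID and settings are example assumptions.
from vllm import LLM, SamplingParams

llm = LLM(
    model="TheBloke/Llama-2-7B-AWQ",  # example AWQ-quantized checkpoint
    quantization="awq",               # tell vLLM the weights are AWQ-quantized
    kv_cache_dtype="fp8",             # quantize the KV cache to FP8
)

sampling_params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["What does KV cache quantization save?"], sampling_params)
print(outputs[0].outputs[0].text)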