Integrate vLLM as an alternative backend for Ollama to potentially improve inference performance, especially for multi-GPU serving.
Hi, I realize that this is a big ask, but I've been learning more and more about inference, and I've heard that vLLM tends to deliver better performance for multi-GPU serving. Ollama has a great UX, and I love the tight integration with llama.cpp, but it would be nice to start exploring how one could use Ollama models with vLLM.
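In the meantime, here is a rough sketch of one way to experiment with this, assuming vLLM's experimental GGUF loading works for the model in question: point vLLM at the GGUF blob that Ollama has already downloaded. The blob path and tokenizer name below are placeholders, not anything Ollama or vLLM guarantees, and this is not the proposed integration, just a manual workaround.

```python
# Rough sketch (not an Ollama feature): load an Ollama-downloaded GGUF blob
# with vLLM's experimental GGUF support.
from vllm import LLM, SamplingParams

# `ollama show <model> --modelfile` prints a FROM line pointing at the weights
# blob, typically somewhere under ~/.ollama/models/blobs/.
gguf_path = "/root/.ollama/models/blobs/sha256-<digest>"  # hypothetical path

llm = LLM(
    model=gguf_path,
    # GGUF files usually need a matching Hugging Face tokenizer; this repo
    # name is an example, swap in the one that matches your model.
    tokenizer="meta-llama/Llama-3.1-8B-Instruct",
)

outputs = llm.generate(
    ["Why might vLLM be faster for multi-GPU serving?"],
    SamplingParams(temperature=0.7, max_tokens=128),
)
print(outputs[0].outputs[0].text)
```

This obviously loses Ollama's model management and Modelfile features, which is why a proper backend integration would still be valuable.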