### Description

This issue proposes adding support for deploying local Large Language Models (LLMs) using Ollama within the Helm chart. This will enable users to run LLMs directly within their Kubernetes cluster, providing enhanced privacy, reduced latency, and greater control over their LLM infrastructure for RAG and other applications.

### Motivation

Running LLMs locally offers several advantages, especially for RAG use cases:

- **Privacy:** Data remains within the cluster, reducing the risk of sensitive information being exposed to external services.
- **Latency:** Eliminates network latency associated with external API calls, resulting in faster response times.
- **Control:** Users have full control over the LLM model, its configuration, and its resources.
- **Cost:** Avoids costs associated with using cloud-based LLM APIs.

Ollama simplifies the process of running and managing LLMs locally, making it an ideal choice for this integration.

### Proposed Functionality

Optional Ollama
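As a rough sketch of how an optional, disabled-by-default Ollama deployment might be exposed through the chart, the `values.yaml` fragment below is purely illustrative — the key names (`ollama.enabled`, `ollama.models`, etc.) are assumptions, not an existing API of this chart:

```yaml
# Hypothetical values.yaml fragment — all keys under `ollama` are
# illustrative placeholders for discussion, not implemented chart values.
ollama:
  enabled: false            # when true, deploy an in-cluster Ollama instance
  image:
    repository: ollama/ollama
    tag: latest
  models:
    - llama3                # models to pull on startup
  resources:
    limits:
      nvidia.com/gpu: 1     # optional GPU scheduling
  persistence:
    enabled: true
    size: 20Gi              # storage for downloaded model weights
```

Keeping the feature behind a single `enabled` flag would follow the common Helm convention for optional subcomponents, so existing users see no change unless they opt in.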