When multiple developers or internal AI tools hit the same Ollama instance, resources can get monopolized and requests go untracked. A lightweight middleware layer providing request logging and rate limiting would help manage shared server environments effectively.
I’ve been experimenting with running **local LLM infrastructure using Ollama** for small internal teams and agent-based tools. One problem I keep running into is what happens when **multiple developers or internal AI tools start hitting the same Ollama instance**. Ollama itself works great for running models locally, but when several users or services share the same hardware, a few operational issues start showing up:

• One client can accidentally **consume all GPU/CPU resources**
• There’s **no simple request logging** for debugging or auditing
• No straightforward **rate limiting or request control**
• Hard to track **which tool or user generated which requests**

I looked into existing LLM gateway layers like LiteLLM: [https://docs.litellm.ai/docs/](https://docs.litellm.ai/docs/)

They’re very powerful, but they seem designed more for **multi-provider LLM routing (OpenAI, Anthropic, etc.)**, whereas my use case is simpler: a **single Ollama server shared across a small LAN team**.

So I started experimenting with a lightweight middleware layer specifically for that situation. The idea is a small **LAN gateway sitting between clients and Ollama** that provides things like:

• basic request logging
• simple rate limiting
• multi-user access through a single endpoint
• compatibility with existing API-based tools or agents
• keeping the setup lightweight enough for homelabs or small dev teams

Right now, it’s mostly an **experiment to explore what the minimal infrastructure layer around a shared local LLM should look like**.

I’m mainly curious how others are handling this problem. For people running **Ollama or other local LLMs in shared environments**, how do you currently deal with:

1. Preventing one user/tool from monopolizing resources
2. Tracking requests or debugging usage
3. Managing access for multiple users or internal agents
4. Adding guardrails without introducing heavy infrastructure

If anyone is interested in the prototype I’m experimenting with, the repo is here: [https://github.com/855princekumar/ollama-lan-gateway](https://github.com/855princekumar/ollama-lan-gateway)

But the main thing I’m trying to understand is **what a “minimal shared infrastructure layer” for local LLMs should actually include**. Would appreciate hearing how others are approaching this.
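To make “basic request logging” and “simple rate limiting” concrete, here’s a minimal sketch of the per-client bookkeeping such a gateway could run before forwarding each request to Ollama’s HTTP API (which defaults to `http://localhost:11434`). This is not code from the repo; the `GatewayLimiter` class and its parameters are hypothetical, just one way to implement a sliding-window limiter with an audit log:

```python
import time
from collections import defaultdict, deque

class GatewayLimiter:
    """Sliding-window rate limiter plus a flat request log, keyed by client ID.

    A gateway would call allow() before proxying a request to Ollama; on False
    it could return HTTP 429 instead of forwarding.
    """

    def __init__(self, max_requests=10, window_seconds=60.0):
        self.max_requests = max_requests
        self.window = window_seconds
        self._hits = defaultdict(deque)   # client_id -> timestamps of allowed requests
        self.log = []                     # (timestamp, client_id, model, allowed)

    def allow(self, client_id, model, now=None):
        now = time.monotonic() if now is None else now
        hits = self._hits[client_id]
        # Evict timestamps that have fallen outside the sliding window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        allowed = len(hits) < self.max_requests
        if allowed:
            hits.append(now)
        # Every attempt is logged, including rejected ones, for auditing.
        self.log.append((now, client_id, model, allowed))
        return allowed
```

Usage would look like `limiter.allow("alice", "llama3")` per incoming request; because the limiter tracks clients independently, one busy tool can be throttled without blocking others, and `limiter.log` answers “which tool or user generated which requests.” A real deployment would also need a way to identify clients (API key, source IP, or a header) and would persist the log instead of keeping it in memory.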