Improve and standardize tool calling mechanisms to enhance compatibility with frameworks like LiteLLM and OpenAI. The current manual tool template configuration creates friction for generic agentic use cases, such as autonomous coding with local GPU inference.
Hi team, first off: thank you for your incredible work. The performance and memory efficiency here are a massive step up from Ollama, and have allowed me to run models on my GPU that were previously out of reach.

### The Challenge

While the current documentation is helpful, the **manual tool template configuration** creates friction for generic "agentic" use cases, e.g. autonomous coding with efficient local GPU inference.

* **Compatibility:** Frameworks like [SWE-Agent](https://github.com/SWE-agent/) and [Open Hands](https://github.com/OpenHands/OpenHands) rely on the OpenAI-standardized tool-calling format via LiteLLM.
* **Current State:** The "hacky" integration currently required for LiteLLM limits TabbyAPI's accessibility for automated agent workflows.

### The Proposal: Pydantic-based Structured Output

A LiteLLM integration/compatibility layer, where TabbyAPI is integrated in a similar manner to vLLM, would be an extraordinary addition. Currently, LiteLLM doesn't support any local-inference engine that
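To make the proposal concrete, here is a minimal sketch of what Pydantic-based tool definitions could look like on the client side. This is an illustration, not existing TabbyAPI functionality: the model name, tool name, and fields below are hypothetical, and the point is only that a Pydantic schema maps directly onto the OpenAI-standardized `tools` format that LiteLLM and agent frameworks already expect.

```python
import json

from pydantic import BaseModel, Field


# Hypothetical tool for an autonomous coding agent (illustration only).
class RunShellCommand(BaseModel):
    """Run a shell command in the agent's workspace."""

    command: str = Field(description="The command to execute")
    timeout_s: int = Field(default=30, description="Timeout in seconds")


# A Pydantic model converts directly into an OpenAI-style tool definition,
# the format LiteLLM forwards to OpenAI-compatible backends.
tool_def = {
    "type": "function",
    "function": {
        "name": "run_shell_command",
        "description": RunShellCommand.__doc__,
        "parameters": RunShellCommand.model_json_schema(),
    },
}

print(json.dumps(tool_def, indent=2))
```

If TabbyAPI accepted and enforced this schema natively (e.g. via constrained generation), no per-model tool template configuration would be needed, which is exactly what generic agent frameworks assume.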