The user requests scripts and guides for deploying, quantizing, and running open-source LLMs (LLaMA, Mistral, Phi, Gemma) locally on CPU/GPU. This includes quantization scripts, setup examples for Ollama, llama.cpp, and vLLM, and configuration files for reproducible results.
Develop scripts and guides that enable users to deploy, quantize, and run open-source LLMs locally on CPU/GPU.

Scope:
- Scripts to run LLaMA, Mistral, Phi, and Gemma models
- Quantization scripts (FP16→INT8/INT4 via GGUF, AWQ/GPTQ)
- Setup examples for Ollama, llama.cpp, and vLLM
- Configuration files and environment setup instructions

Goal:
- Users can start LLMs locally (CPU/GPU) with reproducible results
- Foundation for further experimentation and use-case development
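As a rough sketch of what the quantization scripts could look like, the snippet below outlines the FP16→GGUF INT8/INT4 pipeline with llama.cpp. The tool names (`convert_hf_to_gguf.py`, `llama-quantize`, `llama-cli`) follow the current llama.cpp repository; `MODEL_DIR`, the output filenames, and the `pick_quant` helper (which maps available RAM to a quant type) are illustrative placeholders, not part of any existing tool.

```shell
#!/usr/bin/env sh
# Sketch: FP16 -> GGUF INT8/INT4 quantization with llama.cpp.
# MODEL_DIR and output paths below are placeholders.

# 1) Convert a Hugging Face checkpoint to an FP16 GGUF file:
#    python convert_hf_to_gguf.py "$MODEL_DIR" --outfile model-f16.gguf
# 2) Quantize the GGUF file (Q4_K_M = 4-bit, Q8_0 = 8-bit):
#    ./llama-quantize model-f16.gguf model-q4_k_m.gguf Q4_K_M
# 3) Run it on CPU/GPU:
#    ./llama-cli -m model-q4_k_m.gguf -p "Hello"

# Hypothetical helper for the reproducible configs: pick a GGUF
# quant type from available RAM (GiB), trading quality for footprint.
pick_quant() {
    ram_gib=$1
    if [ "$ram_gib" -ge 32 ]; then
        echo "Q8_0"      # near-FP16 quality, largest footprint
    elif [ "$ram_gib" -ge 16 ]; then
        echo "Q5_K_M"    # middle ground between size and quality
    else
        echo "Q4_K_M"    # smallest of the three; fits 7B models on modest hardware
    fi
}

pick_quant 16   # prints Q5_K_M
```

A wrapper like this lets the same script produce a consistent quant choice across machines, which supports the reproducibility goal above.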