## Summary

Add host-aware capability detection, tiered local-LLM profiles, resource guardrails for local-only mode, and formal deployment profiles so HealthWeave runs optimally and safely on a range of hardware (e.g. M1 Max 32GB, M4, high-end Windows, low-powered machines) without hard freezes or OOM.

**Project:** [HealthWeave Production Readiness](https://github.com/orgs/fleXRPL/projects/6)

## Background

- Local-only mode (Ollama) already exists (#71); we need to make it stable and predictable across different host specs.
- On 32GB unified-memory machines (e.g. Mac Studio M1 Max), large models + long context can cause kernel panics; we should cap context and enforce doc/size limits.
- Docker on macOS cannot use the GPU; the recommended pattern is a "sidecar" Ollama on the host, with the backend in Docker calling `host.docker.internal:11434`.
- We want the app to **detect** host capabilities (RAM, cores, platform) and choose an appropriate model, context length, and limits, with a defined minimum
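A minimal sketch of what host-aware detection plus tiered profiles could look like. The tier names, model tags, and RAM thresholds below are illustrative assumptions, not the decided configuration; the stdlib-only RAM probe covers POSIX hosts and falls back to the lowest tier when it cannot determine memory (e.g. on Windows).

```python
import os
import platform
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalLLMProfile:
    """Hypothetical tiered profile; fields mirror the guardrails in scope:
    model choice, context cap, and a per-run document-size limit."""
    name: str
    model: str            # illustrative Ollama model tag
    max_context_tokens: int
    max_doc_mb: int


# Illustrative tiers keyed by minimum RAM in GB -- real values would be tuned.
PROFILES = [
    (32, LocalLLMProfile("high", "llama3.1:8b", 8192, 50)),
    (16, LocalLLMProfile("mid", "llama3.2:3b", 4096, 20)),
    (0, LocalLLMProfile("low", "llama3.2:1b", 2048, 5)),
]


def total_ram_gb() -> float:
    """Best-effort total physical RAM in GB using only the stdlib.

    os.sysconf is POSIX-only; on platforms where it is missing or the keys
    are unsupported we return 0.0, which maps to the lowest tier.
    """
    try:
        pages = os.sysconf("SC_PHYS_PAGES")
        page_size = os.sysconf("SC_PAGE_SIZE")
        return pages * page_size / 1024**3
    except (AttributeError, OSError, ValueError):
        return 0.0


def detect_profile() -> LocalLLMProfile:
    """Pick the first tier whose RAM threshold the host meets."""
    ram_gb = total_ram_gb()
    cores = os.cpu_count() or 1          # available for further tuning
    system = platform.system()           # e.g. "Darwin" -> sidecar Ollama
    for min_ram, profile in PROFILES:
        if ram_gb >= min_ram:
            return profile
    return PROFILES[-1][1]
```

In this sketch the 0-GB floor doubles as the "defined minimum": a host we cannot size still gets a working, conservative profile rather than an OOM-prone default.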
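For the sidecar pattern, the backend in Docker would call the host's Ollama over HTTP and pass the profile's context cap through `options.num_ctx` on Ollama's `/api/generate` endpoint. A hedged sketch, assuming an `OLLAMA_BASE_URL` environment variable (the variable name is an assumption for illustration):

```python
import json
import os
import urllib.request

# Sidecar pattern: Ollama runs on the macOS host (where it can use the GPU);
# the backend runs in Docker and reaches it via host.docker.internal.
OLLAMA_BASE_URL = os.environ.get(
    "OLLAMA_BASE_URL", "http://host.docker.internal:11434"
)


def build_generate_payload(model: str, prompt: str, num_ctx: int) -> dict:
    """Request body for Ollama's /api/generate; num_ctx caps the context
    window so a long context cannot exhaust unified memory."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": num_ctx},
    }


def generate(model: str, prompt: str, num_ctx: int = 4096) -> str:
    """Blocking call to the sidecar Ollama; returns the generated text."""
    req = urllib.request.Request(
        f"{OLLAMA_BASE_URL}/api/generate",
        data=json.dumps(build_generate_payload(model, prompt, num_ctx)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())["response"]
```

Keeping the payload builder separate from the network call makes the context-cap guardrail unit-testable without a running Ollama instance.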