## Summary

Add an optional local embedding pass using a quantized model — leading candidate is **Gemma 4** (Q4/Q8 via `llama.cpp` or `ollama`) — to generate `semantically_similar_to` edges across all nodes without any API calls.

## Motivation

Currently, semantic-similarity edges come from Claude's judgment during extraction: one pass per file, subjective, and paid for in API tokens. A local embedding pass would:

- Generate embeddings for every node (label + docstring) after the AST and semantic passes
- Add cosine-similarity edges above a configurable threshold, marked `INFERRED`
- Make cross-file concept linking exhaustive rather than sampled
- Work fully offline, cached per-node alongside the existing SHA256 file cache
- Cost zero API tokens after the initial model download

The two approaches complement rather than replace each other — Claude finds the *interesting* cross-cutting edges, local embeddings find the *exhaustive* ones. Both end up in the same graph.

## Design

**Model**:
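A minimal sketch of the edge-generation step, assuming embeddings have already been produced by the local model. The `similarity_edges` function and the threshold default are illustrative, not part of the existing codebase; the node names and vectors below are toy data.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def similarity_edges(embeddings, threshold=0.85):
    """Compare every node pair once; pairs scoring at or above the
    threshold become `semantically_similar_to` edges marked INFERRED."""
    nodes = sorted(embeddings)
    edges = []
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            score = cosine(embeddings[a], embeddings[b])
            if score >= threshold:
                edges.append((a, b, "semantically_similar_to",
                              "INFERRED", round(score, 3)))
    return edges

# Toy embeddings standing in for Gemma output (label + docstring per node).
embs = {
    "parse_ast": [0.90, 0.10, 0.00],
    "build_ast": [0.88, 0.12, 0.02],
    "http_client": [0.00, 0.20, 0.95],
}
print(similarity_edges(embs))
```

Since every node pair is scored, the pass is O(n²) in node count; the per-node cache keyed by content hash means only nodes whose label or docstring changed need re-embedding on subsequent runs.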