AI coding agents currently have no memory between sessions, causing them to start from scratch each time. A system that hooks into the agent lifecycle to build and query a persistent knowledge base would improve efficiency and reduce the 'cold start' problem.
I've been working on a problem that comes up when using AI coding agents in multi-agent setups: **they have no memory between sessions**. Each agent spawns, does work, and disappears. The next agent with the same role starts from zero. I built a system to solve this and wanted to share the technical approach, since the architecture might be interesting even if you're not using Claude Code specifically.

**The core idea:** hook into the agent lifecycle at 7 points (session start/end, agent spawn/stop, task completion, pre-tool-use, idle) and use those hooks to build and query a persistent knowledge base.

**Storage: SQLite + FTS5**

Each project gets its own SQLite database with tables for sessions, tasks, decisions, file ownership, and a knowledge base. The KB uses FTS5 with Porter stemming for full-text search. When a new agent spawns, the system queries by role name + task description, ranks results by relevance, deduplicates, and injects the top results within a token budget (default 3000 tokens). The relevance scoring considers: FTS5 match score, role affinity (same role gets a boost), recency decay, and task description similarity.

**Output filtering: command-aware token reduction**

The other half of the system is an output filtering pipeline for shell commands. AI agents burn through their context window fast when every `git log`, `npm test`, or `docker build` dumps thousands of lines of raw output.

**The filter is an 8-stage pipeline:**

1. ANSI escape code stripping
2. Regex replacements (chainable)
3. Short-circuit pattern matching (e.g., `git push` success → `ok main`)
4. Line-level keep/strip by regex per command type
5. Per-line length truncation
6. Head/tail with omission markers
7. Absolute line cap
8. Empty-output fallback messages

For specific commands, there are specialized parsers — for example, the pytest parser is a state machine that tracks header → progress → failures → summary states, extracts only failing test names + assertion lines, and caps at 5 failures. Similar parsers exist for Jest, Cargo test, Go test, and build tools.

There's also a log deduplication system that normalizes timestamps, UUIDs, hex values, and paths before deduplicating, then groups by severity level. A kubectl log dump of 10,000 lines might come back as 15 lines with `[x42]` multipliers.

Raw output is always preserved in the indexed KB — the filtering only affects what enters the agent's context window. (Rough sketches of the KB lookup, a few of the filter stages, and the dedup pass are at the bottom of this post.)

**Constraints I set for myself:**

- Zero external Python dependencies (stdlib only — sqlite3, subprocess, hashlib, json, pathlib)
- All hooks are fire-and-forget with timeouts — a hook failure never crashes the agent session
- Fully local, no network calls, no telemetry
- Graceful degradation everywhere — if a filter fails, raw output passes through

**Results:**

On real projects, the output filtering reduces token consumption by 80-99% per command, and the memory system eliminates the "cold start" problem from the second session onwards. The biggest win is on noisy commands like `npm install` **(90%+)** and `git push` **(98%)**, where almost all of the output is transfer stats and progress bars that agents don't need.
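For anyone curious what the KB recall step can look like with stdlib only, here is a rough sketch of the idea (not the project's actual code): an FTS5 table with Porter stemming, a combined score from BM25 rank, role affinity, recency decay, and task similarity, and a token-budgeted pack of the top hits. Table layout, weights, and helper names are all illustrative.

```python
import sqlite3, time, math, hashlib, difflib

con = sqlite3.connect("project_brain.db")
con.execute("""CREATE VIRTUAL TABLE IF NOT EXISTS kb
               USING fts5(role, task, content, created_at UNINDEXED,
                          tokenize='porter')""")

def score(row, fts_rank, role, task, now=None):
    """Combine FTS match, role affinity, recency decay and task similarity."""
    kb_role, kb_task, _, created_at = row
    now = now or time.time()
    s = -fts_rank                                   # bm25 rank is negative; flip it
    if kb_role == role:
        s *= 1.5                                    # role-affinity boost
    s *= math.exp(-(now - float(created_at)) / (7 * 86400))  # recency decay, ~1 week scale
    s *= 0.5 + difflib.SequenceMatcher(None, kb_task or "", task).ratio()
    return s

def recall(role, task, token_budget=3000, limit=50):
    """Query the KB, rank, dedupe and pack the top results into the budget."""
    match = " OR ".join(f'"{t}"' for t in (role + " " + task).split())
    rows = con.execute(
        "SELECT role, task, content, created_at, rank FROM kb "
        "WHERE kb MATCH ? ORDER BY rank LIMIT ?", (match, limit)).fetchall()
    ranked = sorted(rows, key=lambda r: score(r[:4], r[4], role, task), reverse=True)
    picked, seen, used = [], set(), 0
    for _role, _task, content, _created, _rank in ranked:
        digest = hashlib.sha1(content.encode()).hexdigest()   # dedupe identical snippets
        cost = len(content) // 4                              # rough chars-to-tokens estimate
        if digest in seen or used + cost > token_budget:
            continue
        seen.add(digest)
        used += cost
        picked.append(content)
    return "\n\n".join(picked)
```

The joined string from `recall()` is what would get injected into the new agent's context at spawn time.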
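Here is a similar sketch of a few of the filter stages (ANSI stripping, short-circuit patterns, per-command strip rules, head/tail with an omission marker, absolute cap, empty-output fallback). The rule table and regexes are invented for illustration and are not the real config.

```python
import re

ANSI = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")   # strip color/cursor escape sequences

# Per-command rules; patterns and numbers are made up for the example.
RULES = {
    "git push": {
        # success output contains "old..new  branch -> branch"; collapse it
        "short_circuit": (re.compile(r"->\s+(\S+)"), "ok {branch}"),
    },
    "npm install": {
        "strip": re.compile(r"npm (WARN|notice)|progress|^\s*$", re.I),
        "head": 5,
        "tail": 10,
    },
}

def filter_output(command, raw, max_lines=80, max_line_len=300):
    lines = [ANSI.sub("", l)[:max_line_len] for l in raw.splitlines()]
    rule = next((r for prefix, r in RULES.items() if command.startswith(prefix)), {})

    # Short-circuit: if the success pattern is present, return a tiny summary
    sc = rule.get("short_circuit")
    if sc:
        for line in lines:
            m = sc[0].search(line)
            if m:
                return sc[1].format(branch=m.group(1))

    # Per-command keep/strip
    strip = rule.get("strip")
    if strip:
        lines = [l for l in lines if not strip.search(l)]

    # Head/tail with an omission marker, then an absolute cap and a fallback
    head, tail = rule.get("head", 30), rule.get("tail", 30)
    if len(lines) > head + tail:
        omitted = len(lines) - head - tail
        lines = lines[:head] + [f"... [{omitted} lines omitted] ..."] + lines[-tail:]
    return "\n".join(lines[:max_lines]) or "(command produced no output)"
```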
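And the dedup idea in miniature: normalize volatile tokens so repeated log lines collapse into one entry with a multiplier, then group by severity. The patterns and the severity ordering are illustrative, not the project's actual normalizers.

```python
import re
from collections import Counter, defaultdict

NORMALIZERS = [
    (re.compile(r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:\.\d+)?Z?"), "<ts>"),
    (re.compile(r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b", re.I), "<uuid>"),
    (re.compile(r"\b0x[0-9a-f]+\b|\b[0-9a-f]{12,}\b", re.I), "<hex>"),
    (re.compile(r"(?:/[\w.\-]+){2,}"), "<path>"),
]
SEVERITY = re.compile(r"\b(FATAL|ERROR|WARN(?:ING)?|INFO|DEBUG)\b")
LEVELS = ("FATAL", "ERROR", "WARNING", "WARN", "INFO", "DEBUG", "OTHER")

def dedupe_logs(raw):
    counts, by_level = Counter(), defaultdict(list)
    for line in raw.splitlines():
        norm = line
        for pattern, placeholder in NORMALIZERS:
            norm = pattern.sub(placeholder, norm)
        if norm not in counts:                    # first sighting: remember its level
            m = SEVERITY.search(norm)
            by_level[m.group(1) if m else "OTHER"].append(norm)
        counts[norm] += 1
    out = []
    for level in LEVELS:
        for norm in by_level.get(level, []):
            n = counts[norm]
            out.append(f"{norm} [x{n}]" if n > 1 else norm)
    return "\n".join(out)
```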
The whole thing is ~2500 lines of Python + ~1500 lines of JavaScript. Open source (MIT): [Gr122lyBr/claude-teams-brain](https://github.com/Gr122lyBr/claude-teams-brain).

Curious if anyone has tackled similar problems with different approaches, or sees obvious improvements I'm missing.