Custom Agents dropped in Notion yesterday and I've already seen about forty screenshots of people's first email triage agent. That's not a dig — it's a good place to start. But I've been running a more complicated setup for a while now and figured it was worth documenting while everyone's actually paying attention to this stuff.

**Fair warning: this is long and specific. I'm not going to explain what an AI agent is.**

# The Setup

Eleven agents. Each handles one slice of my work.

**Inbox Manager** — work email, runs three times a day (7am, noon, 5pm). Scores emails 0–100, creates tasks, drafts replies, and tracks SLA on retainer clients (48h standard, tightened to 24h for 🔴 At Risk). Flags scope creep and billing signals. Detects attachments — flags contracts, invoices, and proposals at score 70+. Reads the Health Grade property on Client records — At Risk clients get elevated priority automatically. Does the DMARC thing automatically. Writes a daily digest in a five-section scannable format: Flagged Items → Actions Taken → Routing Summary → Unclassified Emails → Escalations.

**Personal Ops Manager** — same idea but for personal email. Completely separate. Files receipts to Home Docs with the merchant and amount extracted, manages trip dossiers with deduplication, and escalates anything that looks like fraud or a suspicious login immediately. Tied into five personal calendars, including medical and deliveries. Produces a Monday weekly summary that ranks the top five items by urgency heuristic: medical > financial > household > subscriptions.

**GitHub Insyncerator** — daily sync of every issue, PR, and repo in my GitHub org into Notion. Propagates client and project tags downward, flags stale PRs (7+ days), marks items eligible for archival (90+ days), and auto-closes linked tasks when a PR merges. Writes a machine-readable Sync Status line at the top of each report — ✅ Complete / ⚠️ Partial / ❌ Failed — so downstream agents can check data quality before proceeding.
Triggers a scoped Client Repo Auditor spot-check when it detects ≥ 5 newly stale items in a single sync, with a minimum floor of 10 total stale items org-wide to suppress noise.

**Client Repo Auditor** — weekly, deeper audit. Checks the Sync Status line on the most recent GitHub Sync digest before querying — adds a data quality notice if partial, shortens scope if failed. Makes sure repos are actually linked to the right projects, checks for stale branches (30+ days), pulls vulnerability summaries, and flags projects with no linked docs. When @mentioned with a new-client context, runs a scoped audit for that client only rather than the full weekly audit.

**Docs Librarian** — bi-weekly and monthly, with a different scope for each. Finds orphaned docs, classifies them via content inference when unambiguous, and flags ambiguous cases. Handles retention — archives agent digest pages older than 90 days. After every run, writes a shared data snapshot with a machine-readable Snapshot Status line that the Client Health Scorecard reads.

**Client Weekly Reporter** — every Friday. Pulls data from GitHub Items, Tasks, Time Log, and Docs filtered to that client. Always writes a report — even on zero-activity weeks. The Client Health Scorecard checks for this as part of its retainer perception risk scoring.

**Home & Life Task Watcher** and **Template Freshness Watcher** — the former runs Monday mornings and flags overdue home projects; the latter is suspended until May. Neither has interesting architecture.

**Time Log Auditor** — Friday afternoons. Flags missing weekday entries, retainer burn (at 80% and 100%), unbilled completed tasks, and anomalies (8+ hour sessions, billable entries with no client, zero-rate entries). Writes a shared hours snapshot after every run.

**Client Health Scorecard** — monthly. Runs an upstream data quality check at run start, before pulling any data.
Applies a freshness gate: verifies each snapshot timestamp is from the current run cycle (within 48 hours), not stale prior-month data — even if the status line says ✅ Complete. Scores each client across six dimensions: task health, SLA responsiveness, hours vs. budget, docs activity, GitHub activity, and retainer perception risk. Writes the Health Grade back to each Client record — which Inbox Manager reads for priority scoring.

**Morning Briefing** — daily at 9am, after everything else. One page. Uses a signal-based pre-scan: checks page titles and first lines for keywords (⚠️, 🔴, "flagged", "anomaly"), and checks machine-readable status lines on producer digests. Distinguishes heartbeat-only runs (agent ran, nothing to report) from true missing digests (agent didn't run at all — a failure).

That's it.

# The Stuff That Actually Makes It Work

# Email routing before reasoning

Both email agents query a Label Registry database at the start of each batch run. Labeled emails get handled deterministically — skip, file, flag — with no scoring and no LLM call. This covers about **60% of incoming volume**. The other 40% goes through full reasoning.

This was the single biggest credit optimization I made. It also makes behavior more predictable, which honestly matters as much as the cost savings when you're trusting an agent to file things without you watching.

The registry is self-updating in a soft way: when either agent encounters a label it doesn't have a rule for, it writes a "Pending Review" row with a suggested rule. I approve or reject it. This has mostly eliminated the "agent didn't know what to do with this" failure mode.

# Notion Mail auto-labeling closes the loop

The Label Registry is powerful, but it's still reactive — agents discover unhandled labels at run time. Notion Mail auto-labeling moves it upstream.
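The label-first routing both agents use reduces to a small lookup-before-reasoning function. This is a rough sketch of the logic only: the names (`Rule`, `route_email`, the registry contents, the `"reason"` fallback) are mine for illustration, not Notion's API or the actual agent instructions.

```python
# Sketch of deterministic label routing: known labels get a fixed
# action with no scoring and no LLM call; unknown labels are queued
# as "Pending Review" rows and fall through to full reasoning.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Rule:
    action: str                    # "skip", "file", or "flag"
    target: Optional[str] = None   # destination database for "file"

# The Label Registry: label name -> deterministic rule
REGISTRY = {
    "Receipt": Rule("file", "Home Docs"),
    "GitHub Notification": Rule("skip"),
    "Client: Acme": Rule("flag"),
}

pending_review: list[str] = []  # suggested rules awaiting approval

def route_email(labels: list[str]) -> str:
    """Return a deterministic action if any label has a rule;
    otherwise queue the unknown labels and fall back to reasoning."""
    for label in labels:
        rule = REGISTRY.get(label)
        if rule:
            return rule.action  # direct key-value hit, zero LLM cost
    for label in labels:
        if label not in pending_review:
            pending_review.append(label)  # "Pending Review" row
    return "reason"  # the ~40% that needs full reasoning
```

The point of the shape is that the expensive path is the fallthrough, not the default: a well-labeled email never reaches the model at all.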
If the mail client has already labeled an email as `Receipt`, `Client: Acme`, or `GitHub Notification` before the agent's batch window opens, the Label Registry lookup becomes a direct key-value hit rather than inference. No scoring, no LLM call — just execute the rule.

The practical effect: the 60% bypass rate isn't a ceiling. With a well-maintained Notion Mail label ruleset, that number pushes to **70–80%**. At three batch windows a day for Inbox Manager, that's a meaningful credit delta heading into paid credits in May.

The one thing that has to hold: your Notion Mail label taxonomy needs to stay consistent with your Label Registry rule names. If they diverge, you lose the determinism benefit entirely.

# Shared data pages between agents

Client Health Scorecard needs docs activity per client and hours per client. Rather than having it query those databases directly, Docs Librarian and Time Log Auditor each write a summary page after their runs. CHS reads those two pages. One read each.

The alternative was CHS doing its own queries into the underlying databases — redundant, expensive, and fragile. If I ever change the Time Log schema, CHS breaks. This way, TLA is responsible for publishing a clean summary; CHS just reads it.

I've started thinking of this as a **contract between agents**: you publish a known artifact, I read the artifact, and neither of us cares about the other's internals.

# The email agents can talk to each other — but there's a limit

If Inbox Manager gets an email that genuinely doesn't resolve after full reasoning — it's truly ambiguous whether it's work or personal — it can @mention Personal Ops Manager for a second opinion. That's it: once per direction. If POM still can't resolve it, the item goes into a Needs Review queue and surfaces after 3 days.

**Without this constraint it becomes a loop. I tested it. It becomes a loop.**

There's also a re-escalation cap: unresolved items get re-surfaced at day 3 and day 6 — two re-escalations maximum.
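The once-per-direction handoff plus the re-escalation cap amounts to a small circuit breaker. A minimal sketch, assuming invented names throughout (`AmbiguousItem`, the `resolver` callback, the status strings); the real behavior lives in agent instructions, not code:

```python
# Circuit breaker on inter-agent communication: one second opinion
# per direction, then a capped number of re-surfacings.
from dataclasses import dataclass
from typing import Callable, Optional

MAX_RESURFACES = 2  # day 3 and day 6, then stop

@dataclass
class AmbiguousItem:
    email_id: str
    resurfaced: int = 0
    status: str = "needs_review"

def ask_second_opinion(item: AmbiguousItem, already_asked: bool,
                       resolver: Callable[[str], Optional[str]]) -> str:
    """One @mention per direction. If the peer agent also fails to
    resolve it, the item enters the Needs Review queue; no retries,
    so no ping-pong loop between the two agents."""
    if not already_asked:
        verdict = resolver(item.email_id)  # peer agent's call
        if verdict is not None:
            return verdict
    return "needs_review"

def resurface(item: AmbiguousItem) -> str:
    """Called at each re-escalation window. After the cap, the item
    is parked permanently for manual review."""
    if item.resurfaced >= MAX_RESURFACES:
        item.status = "needs_manual_review"
    else:
        item.resurfaced += 1
    return item.status
```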
After that, the item moves to a permanent Needs Manual Review section. This prevents unbounded accumulation during travel or extended absence.

# Morning Briefing doesn't actually read everything

Every agent writes a digest and @mentions me on the page. Morning Briefing reads those pages — but it doesn't read them in full. It checks titles and first lines for signal keywords first. Clean digests get a single ✅ line. Only flagged ones get read fully. So the morning page looks something like:

✅ Inbox Manager — 14 emails, 3 tasks created
✅ Personal Ops Manager — clean
⚠️ GitHub Insyncerator — 3 stale PRs, Repo Auditor notified [⚠️ Partial]
✅ Home & Life Watcher
⚠️ Time Log Auditor — missing Wednesday; Acme at 82% burn

That's what I'm actually reading. The full digests exist if I need to dig in.

Every agent also writes a minimal **heartbeat record** on runs where there's nothing to report — just a timestamp and "nothing to report." This means Morning Briefing can tell the difference between an agent that ran clean and an agent that didn't run at all. Silent failures are the hardest kind to catch.

# Credit Cost Is Not an Afterthought

Notion moves to paid credits on May 4. $10 per 1,000 credits. I have 11 agents running somewhere north of **230 times a month**, with two of those doing heavy per-run reasoning.

Every design decision above has a credit rationale attached to it: batch runs instead of per-message triggers, label routing eliminating 60% of email reasoning, the signal pre-scan in Morning Briefing, shared data pages to avoid redundant queries, and Docs Librarian archiving digest pages after 90 days so query surfaces stay clean.

In February I did a second optimization pass: removed per-email triggers from both email agents entirely (they were causing double-processing on some messages) and consolidated to three batch windows for Inbox Manager and two for Personal Ops. Notion has a credits dashboard now, and I'm watching it daily.
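The Morning Briefing pre-scan plus the heartbeat convention described above boils down to one decision per agent: flagged, clean, heartbeat-only, or silently failed. A rough sketch with invented names (`triage`, the digest dict shape), not what the agent actually runs:

```python
# Signal-based pre-scan: check only titles and first lines for
# keywords, and treat "no digest at all" differently from
# "heartbeat: nothing to report". Only flagged digests get read
# in full, which is the credit saving.
from typing import Optional

SIGNAL_KEYWORDS = ("⚠️", "🔴", "❌", "flagged", "anomaly")

def triage(agent: str, digest: Optional[dict]) -> str:
    """digest is the latest page for this run window:
    {'title': str, 'first_line': str, 'heartbeat': bool},
    or None if the agent wrote nothing at all."""
    if digest is None:
        # No digest and no heartbeat: the silent-failure case
        return f"❌ {agent}: no digest and no heartbeat, did not run"
    if digest.get("heartbeat"):
        return f"✅ {agent}: ran clean, nothing to report"
    scan = digest["title"] + " " + digest["first_line"]
    if any(keyword in scan for keyword in SIGNAL_KEYWORDS):
        return f"⚠️ {agent}: flagged, read in full"
    return f"✅ {agent}"
```

The heartbeat branch is the one that matters: without it, "ran clean" and "never ran" are indistinguishable, and that is exactly the failure mode the briefing exists to catch.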
# Things That Aren't Fully Solved

**Error handling** is better than it was, but it's not done. What's still missing is a central Agent Registry with a last-successful-run timestamp per agent, so I can see at a glance when something last completed successfully without reading individual digest pages.

**The Label Registry is going to get bloated.** Pending Review rows accumulate and there's no automated cleanup. I'll probably hand that to Docs Librarian.

**Calendar event deduplication is an open question.** Both email agents trigger on calendar events. If something is genuinely work-personal — a lunch with a client — both agents could create prep tasks for it. No elegant solution to this yet.

# One Thing Worth Saying Explicitly

None of the architecture here is specific to Notion. The shared data contracts, the deterministic routing before reasoning, the circuit breaker on inter-agent comms — those are general. What Notion gives you is a place where agents live in the same data model they're operating on, not bolted on from outside. That changes what's practical to build. But the principles travel.

If you're building multi-agent systems anywhere — in n8n, in LangGraph, in whatever orchestration layer you prefer — the same questions apply: **what does each agent publish, what does each agent consume, and what happens when an upstream producer fails quietly?** Answer those and the rest is implementation detail.

*The instruction templates for each of these agents are worth their own post. I'll write that up separately.*