Most AI agent frameworks I see online are obsessed with tool-calling benchmarks, autonomous coding loops, or flashy one-shot demos. Two weeks ago I started daily-driving a personal assistant I've been building on **Claude Cowork**, and I'm already convinced the unsolved problems are somewhere else entirely — and almost no one is talking about them.

I'm not going to share the name or identifying details (this is a personal system I use for my actual work). But I'll share what I've learned, because most of it wasn't obvious before I started.

**The background:** I'm an executive at a mid-sized company. Every commercial AI assistant I tried was amnesiac. I wanted something closer to an actual chief of staff — persistent, opinionated, and aware of my context.

**The stack:** The whole system runs on **Claude Cowork**, with **Google Workspace** (email, calendar, Drive, Chat) and **Notion** (tasks, projects, GTD) as the surrounding data layer. I started on Claude Pro, then upgraded to **Claude Max 20× ($200/month)** once the architecture outgrew the lower tier — the system now runs **15 scheduled background tasks** around the clock, plus the interactive sessions I have through the day.

What that looks like in practice: the agent is **active 6–7 hours every day**. Roughly **2–3 of those hours are development** — debugging, iterating on skills, refactoring prompts, reviewing audit outputs, designing the next automation. I treat the system itself as an ongoing product. The other **4–5 hours are real work**: inbox triage, draft reviews, research delegations, decision support, report generation, meeting prep. The dev/real-work ratio will shift toward more real work and less tinkering — but I've decided the 30–40% overhead is worth it while the system is still maturing.

Two weeks. That's how long this took to reach the state below. Most of the heavy lifting was architectural decisions, not code — Cowork's memory, scheduled tasks, skills, and MCP ecosystem did the technical work. I just designed the system on top.

**What it is now:**

* A persistent, file-based memory system with ~200 curated markdown entries, indexed by semantic topic — not a vector DB
* **11 specialized sub-agents** (legal, finance, research, sales, operations, real estate, etc.) with a delegation matrix
* A **development constitution** — a versioned governance doc for how the system evolves: which structural changes it can make autonomously when improving itself vs. which require my approval. It governs *how the system changes*, not individual task decisions.
* A **distributed architecture**: an always-on background sentinel (inbox scans, health checks, nightly closeouts, conflict scanning) plus an interactive node
* A **self-improvement loop** that audits instruction files, researches new techniques, proposes a change plan, waits for approval, then implements

**The filesystem — the architectural choice nobody talks about:**

The agent **lives inside a Dropbox folder**. Not as a UI feature — as its actual substrate. Everything is organized **first by project, then by artifact type**: every active project has its own folder with sub-folders for briefs, research, drafts, correspondence, contracts, and archived items. Cross-project stuff (memory, skills, scheduled-task logs, session transcripts, audit outputs) lives in dedicated top-level folders. When a new project starts, the agent spins up the folder skeleton (a minimal sketch of that step is below). When a project closes, it moves to a cold-storage path and the index updates.
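In case the skeleton step sounds abstract, here's roughly what it amounts to. Every name in this sketch (the root path, the subfolder list, the `INDEX.md` file) is illustrative, not my actual layout:

```python
# Minimal sketch of "new project -> spin up the folder skeleton".
# All paths and names are illustrative, not the system's real layout.
from datetime import date
from pathlib import Path

SUBFOLDERS = ["briefs", "research", "drafts", "correspondence", "contracts", "archive"]

def create_project_skeleton(root: Path, project: str) -> Path:
    """Create the per-project folder tree and register it in a markdown index."""
    project_dir = root / "projects" / project
    for sub in SUBFOLDERS:
        (project_dir / sub).mkdir(parents=True, exist_ok=True)
    # The index is just another markdown file in the tree, so the agent
    # (and grep) can read it like everything else.
    index = root / "projects" / "INDEX.md"
    with index.open("a", encoding="utf-8") as f:
        f.write(f"- {date.today().isoformat()} active [{project}](./{project}/)\n")
    return project_dir

# e.g. create_project_skeleton(Path.home() / "Dropbox" / "agent", "acme-renewal")
```

The same move in reverse (relocate to cold storage, flip the index entry) handles project close-out.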
**Inboxes → Outboxes — the system works like a pipeline:**

On one side, multiple **inboxes**:

* My work email + a dedicated shadow email for the agent itself
* Chat messages (Google Chat)
* A **Notion GTD inbox** where I drop raw tasks and unclassified items
* A file dropzone in the shared folder
* A daily working folder
* A general triage inbox
* And the one I haven't seen anyone else talk about: **my Downloads folder on every computer I use is redirected straight into the agent's inbox folder**

Every PDF, CSV, screenshot, contract, invoice, or random file I download during a workday automatically lands in the pipeline. The agent reads it, classifies it, associates it with the right project, files it into the matching subfolder, and updates the project index. I haven't manually filed a downloaded file in two weeks.

On the other side, **outboxes**:

* Email drafts
* Notion tasks created and pages updated
* Project deliverables in their project subfolder
* Research reports, audit logs, session summaries, memory updates — each in its own structured destination

Every task flows left-to-right. New items arrive in inboxes. The agent (or a scheduled task) routes them to the right downstream process — triage, memory extraction, project assignment, or drop. Whatever gets produced lands in a structured outbox with traceable provenance: which inbox item triggered it, which skill processed it, which decisions were made along the way. Nothing disappears into a black box. Everything is greppable. (A sketch of the routing step follows below.)
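To make "routes them to the right downstream process" concrete, here's a hedged sketch of the core move-and-log step. The `classify` stub stands in for the agent's actual judgment, and every path and field name is made up for illustration:

```python
# Illustrative sketch of the inbox -> outbox routing step: classify an
# incoming file, move it into the matching project subfolder, and append
# a provenance record. All names here are placeholders.
import json
import shutil
from datetime import datetime, timezone
from pathlib import Path

def classify(item: Path) -> tuple[str, str]:
    """Stub for the agent's judgment call: returns (project, artifact_type).
    A crude keyword match stands in for the real classification."""
    artifact = "contracts" if "contract" in item.name.lower() else "research"
    return ("unsorted", artifact)

def route(item: Path, root: Path) -> Path:
    project, artifact = classify(item)
    dest_dir = root / "projects" / project / artifact
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / item.name
    shutil.move(str(item), str(dest))
    # Provenance: which inbox item triggered this, which skill processed it,
    # and what was decided -- one JSON line per routed file, greppable.
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "source": str(item),
        "destination": str(dest),
        "skill": "inbox-router",
        "decision": {"project": project, "artifact": artifact},
    }
    (root / "logs").mkdir(exist_ok=True)
    with (root / "logs" / "routing.jsonl").open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return dest
```

A scheduled task sweeping the inbox folder would call `route` on every new file; the JSONL log is what keeps the provenance promise cheap to honor.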
**The stuff I find genuinely unique:**

**1. Graduated autonomy** — every action carries an authority level from L0 to L3. L0: silent. L1: done-and-logged. L2: propose-and-wait. L3: escalate immediately. This single idea killed 90% of the "AI doing something stupid" problems.

**2. Autolearn mechanism** — at the end of every session, the agent extracts what it just learned about me, my preferences, my corrections, and the decisions I made, then proposes memory updates before I close the chat. Here's what that actually looks like. Yesterday, mid-session, I corrected it — it had summarized an inbound email as a single bullet, and I said: *"Don't condense these further. One-line context plus any deadline explicitly called out."* At the end of that session it surfaced exactly that rule as a proposed memory update. I approved. Next day, it just did it — no repeat instruction needed. Multiply that by fifty corrections over two weeks, and you end up with a system that encodes how you actually work, instead of one you re-teach every conversation. This is the feature I appreciate most day-to-day.

**3. Ambient knowledge capture** — this one surprised me the most. Whatever task I'm running — research, drafting, decision support, analysis — if the agent encounters a useful contact, a piece of context, a pattern, or a fact that could matter later, it evaluates whether it's worth preserving, and if yes, tucks it into the right memory file (silently at L0, or as a proposal at L2). A supplier name dropped in passing during a research task gets stored as a candidate entity. A deadline buried deep in an email gets surfaced. A stakeholder preference mentioned once gets linked to their profile. Nothing valuable leaks through the cracks, because memory capture is a side-effect of every task, not a separate ritual.

**4. 4-level memory architecture** — hot session context (loaded now), persistent auto-memory (long-term knowledge base), transcript archive (full session logs for retrieval), and a mirror vault with graph view (concepts linked across files). Each level has different read/write rules and retention. Most "AI memory" I see online is just one of these four, implemented poorly.

**5. Behavioral retrospective after every session** — the agent audits its own conversation against a dynamic rulebook of past mistakes. Score 0–100, specific quotes, specific corrections. Uncomfortable, but it's how it actually gets better.

**6. 3-tier confidence memory writes** — new facts are sorted into auto-write, propose-to-me, or escalate-urgently (tiny sketch at the bottom of the post). I stopped worrying about hallucinations polluting memory.

**7. Scheduled conflict scanner** — an hourly background task that checks whether two instructions I gave on different days contradict each other. It has caught me being inconsistent more times than I want to admit.

**8. Portable across 4 machines** — a cloud-synced filesystem plus hardware fingerprinting, so it knows which computer I'm on and adjusts.

**9. Scheduled inbox-to-memory extraction** — a batch sweep of email and chat inboxes on a schedule, pulling out anything worth keeping long-term and routing it to the right file. (Distinct from #3: ambient capture happens during any task; this one is a systematic nightly sweep.)

**The honest limitations:**

* Memory governance is the hardest part. It took five rewrites of the garbage-collection rules before memory stopped bloating.
* Scheduled tasks are fragile — anything depending on "the PC being awake at 2am" will eventually fail in embarrassing ways.
* Self-improvement loops are seductive but easy to over-trust. I now require human approval on every structural change.

**Why I'm posting:** I'll post weekly updates on what's breaking and what's working. **If there's interest, I'm happy to go deeper** on specific pieces — the development constitution and its versioning, the internal rule system, the dev process for new skills, the 11-agent delegation matrix, the autolearn mechanism, the ambient capture logic, the session retrospective rubric, the inbox/outbox routing rules, the Downloads-folder redirect setup. Tell me in the comments which of these you'd want to see first.

Especially interested in hearing from anyone else building on Cowork (or similar platforms) — there's not much written about the architectural side of personal AI systems yet, and I think we're all reinventing the same wheels.

Sorry for the long post.
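P.S. To make #6 less abstract, here's a minimal sketch of the confidence gate. The three tier names come from the post; the threshold and the conflict flag are placeholders I made up for illustration:

```python
# Illustrative 3-tier gate on memory writes. Threshold is a placeholder.
from enum import Enum

class WriteAction(Enum):
    AUTO_WRITE = "auto-write"        # high-confidence fact: commit silently
    PROPOSE = "propose-to-me"        # uncertain fact: queue for approval
    ESCALATE = "escalate-urgently"   # conflicting fact: surface immediately

def memory_write_gate(confidence: float, conflicts_with_existing: bool) -> WriteAction:
    """Decide what happens to a candidate memory fact before any file I/O."""
    if conflicts_with_existing:
        return WriteAction.ESCALATE
    if confidence >= 0.9:
        return WriteAction.AUTO_WRITE
    return WriteAction.PROPOSE
```

The useful property is the single choke point: every candidate fact passes through one gate like this before it can touch a memory file.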