User finds that AI agents often write "good code that misses the point" due to misalignment. To counter this, he makes the agent define concrete, measurable success criteria and write the validation for the task *before* generating the actual code. This suggests a need for the AI assistant to natively support or be guided through this "validation-first" approach to ensure intent alignment.
Coding agents hardly ever write "bad code" anymore. They write good code that misses the point. Misalignment is the whole problem. The agent's code compiles, runs, and passes every check I throw at it, then turns out to do something subtly different from what I actually meant. That's been eating my time for two years, and the "reflex" (more review, more tests, more guardrails) doesn't actually fix it. Review catches the misalignment, sure, but the failure already happened at intent time. The agent never knew what 'done' actually meant, so it confidently built the wrong thing. So now I work backwards. Before the agent writes a single line of code, I make it define the success criteria for the task: concrete, measurable, tied to actual behavior. Then I make it write the validation itself: the script, the test, the check that proves those criteria are met. Only then does it implement. It's TDD adapted for agents. Same underlying insight (defining 'done' before you build forces you to understand the task), rebuilt around the way agents actually fail. They don't fail at coding. They fail at reading your mind. The validation-first plan replaces mind-reading with a contract. This is the substance of my talk at GOTO Accelerate Chicago this June: "The Validation-First Loop: How to Ship Production Code with AI Coding Assistants." 40 minutes walking through the full process on real codebases: what the success criteria phase actually looks like in practice and how to wire validation into the agent's plan before writing a single line of code. I'm really looking forward to this conference (June 22nd-24th) and my talk! There's something powerful about being forced to compress a year of figuring something out into 40 minutes that has to hold up for a live audience.