Users need a single, out-of-the-box observability tool that merges rich LLM backend telemetry (rendered prompts, hidden scratchpads, sub-agent handoffs) with a front-end reconstruction of what the user saw, via session video or DOM-state replay. This integration matters because current trace tools largely ignore the difficulty of debugging UI state alongside backend calls, so no one view shows the full debugging story.
Who is building this? I'd love to know, because we need it.

At Kite, one of our hardest internal problems is debugging and improving a multi-model agentic system. Kite is not one model with one prompt. It's a collection of small agents, skills, tools, MCPs, memory, and orchestration logic all working together. When something goes wrong, the bug can be anywhere: the user conversation, retrieved context, a bad tool call, a sub-agent handoff, a rendered prompt, or the UI the user saw while all of this was happening.

We've tried a few things. We used Braintrust, but its trace UI was not customizable enough for how we work. We now use Langfuse to inspect traces, and we have internal bots pulling from Langfuse and our logs, but it is still too hard to see the full story in one place. We want one view where we can follow the conversation, inspect tool calls, read the actual rendered prompts each agent received, trace where context came from, and compare all of that with the user's on-screen experience. That's the missing piece.

There are lots of observability tools now; that part of the market is getting crowded. But there is still a real gap in software for debugging agentic products the way product, design, and engineering teams actually need to debug them.

If you're building this, or know a team that is, I'd love to see it. I'd much rather use an off-the-shelf tool than build and maintain this ourselves.
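To make the "one view" concrete: at its core it's a join. Backend agent spans and front-end UI events carry a shared trace ID, and the tool merges them into a single chronological timeline. A minimal sketch of that idea in Python, where every name (`Event`, `merged_timeline`, the event kinds) is illustrative and not any real tool's API:

```python
from dataclasses import dataclass

@dataclass
class Event:
    trace_id: str   # shared correlation ID across backend and UI
    ts: float       # epoch seconds
    source: str     # "backend" or "ui"
    kind: str       # e.g. "rendered_prompt", "tool_call", "dom_snapshot"
    payload: dict

def merged_timeline(backend_events, ui_events, trace_id):
    """Return one chronological timeline for a single trace,
    interleaving backend spans with front-end UI events."""
    events = [e for e in backend_events + ui_events if e.trace_id == trace_id]
    return sorted(events, key=lambda e: e.ts)

# Hypothetical events for one user session:
backend = [
    Event("t1", 1.0, "backend", "rendered_prompt", {"agent": "planner"}),
    Event("t1", 2.0, "backend", "tool_call", {"tool": "search"}),
]
ui = [Event("t1", 1.5, "ui", "dom_snapshot", {"screen": "chat"})]

timeline = merged_timeline(backend, ui, "t1")
```

The hard part is everything around this sketch: capturing the events with low overhead, propagating the trace ID from orchestrator to sub-agents to the browser, and rendering the merged timeline usefully. But the data model we're asking for really is this simple, which is why it's frustrating that no off-the-shelf tool offers it.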