Need a way to measure conversation quality and user satisfaction in AI-powered features; existing engineering metrics don't capture it and manual transcript review doesn't scale.
I'm a PM on an AI agent, and there's a gap between what engineering monitors and what I actually need to know. They've got traces and error rates covered, but that doesn't tell me anything about whether users are actually getting helped or just going in circles. I end up reading transcripts manually, which obviously doesn't scale. Has anyone found a good way to track things like where users drop off, where the agent confidently gives wrong answers, or where people just give up and leave? Or is everyone just spot-checking and hoping for the best?
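
To make it concrete, this is roughly the kind of heuristic pass I've been imagining over exported transcripts. The field names, give-up phrases, and thresholds here are made up for illustration, not from any real export or tool:

```python
# Rough sketch of a heuristic transcript pass -- the field names ("role",
# "text"), the giveaway phrases, and the similarity threshold are all
# placeholders for illustration, not from any real export.
from difflib import SequenceMatcher

GIVE_UP_PHRASES = ("never mind", "forget it", "i'll just do it myself", "this isn't working")

def flag_conversation(turns):
    """Return rough quality flags for one transcript.

    `turns` is assumed to be a list of dicts like
    {"role": "user" | "agent", "text": "..."}.
    """
    flags = []
    user_msgs = [t["text"].lower() for t in turns if t["role"] == "user"]

    # Possible give-up: the user's last message contains a giveaway phrase.
    if user_msgs and any(p in user_msgs[-1] for p in GIVE_UP_PHRASES):
        flags.append("possible_give_up")

    # Possible going-in-circles: consecutive user messages are near-duplicates,
    # i.e. the user keeps rephrasing the same ask.
    for a, b in zip(user_msgs, user_msgs[1:]):
        if SequenceMatcher(None, a, b).ratio() > 0.8:
            flags.append("possible_repetition")
            break

    # Possible drop-off: the session ends on an agent turn the user never
    # responded to (only meaningful once the session is actually closed).
    if turns and turns[-1]["role"] == "agent":
        flags.append("ended_on_agent_turn")

    return flags
```

Even something this crude would beat reading every transcript by hand, but it can't catch the agent confidently giving wrong answers, which is the part I really don't know how to measure.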