Evals are becoming part of the product manager craft, the same way reading a funnel chart or a SQL query was in 2015.

AI agents don't behave like the products we learned to measure. Users type intent into a chat box, the agent calls tools and retrieves context, and the output changes from one run to the next. Click-and-form analytics never sees inside that.

Evals are how you measure it: repeatable tests that score an agent's output against your quality bar and run on every change.

The part most teams underrate is the last mile. A high pass rate tells you the model performed on a test set. It doesn't tell you whether good agent interactions drive retention, whether failures concentrate in your highest-value segments, or whether your most expensive queries are also your lowest-converting. You answer those by joining eval scores to product engagement under the same user identity.

I put together a getting-started guide for PMs covering traces, LLM judges, offline vs online evals, and how to wire eval scores to outcomes. Link in the comments.
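For anyone who wants to see the join concretely, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the field names (user_id, passed, segment, retained), the sample rows, and the helper functions are hypothetical and do not come from the guide or from any particular analytics tool. It only shows the shape of the idea: roll eval results up per user, join them to engagement on the shared user ID, then compare pass rates across outcomes and segments.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical eval records: one row per scored agent interaction.
eval_rows = [
    {"user_id": "u1", "passed": True},
    {"user_id": "u1", "passed": True},
    {"user_id": "u2", "passed": False},
    {"user_id": "u2", "passed": True},
    {"user_id": "u3", "passed": False},
]

# Hypothetical engagement records keyed by the same user identity.
engagement_rows = [
    {"user_id": "u1", "segment": "enterprise", "retained": True},
    {"user_id": "u2", "segment": "enterprise", "retained": False},
    {"user_id": "u3", "segment": "self_serve", "retained": False},
]


def pass_rate_by_user(rows):
    """Average the pass/fail results of each user's interactions."""
    by_user = defaultdict(list)
    for row in rows:
        by_user[row["user_id"]].append(1.0 if row["passed"] else 0.0)
    return {user: mean(scores) for user, scores in by_user.items()}


def join_on_user(pass_rates, engagement):
    """Inner join: keep engagement rows that have at least one eval score."""
    joined = []
    for row in engagement:
        if row["user_id"] in pass_rates:
            joined.append({**row, "pass_rate": pass_rates[row["user_id"]]})
    return joined


def mean_pass_rate_by(joined, key):
    """Group joined rows by an engagement field and average the pass rate."""
    groups = defaultdict(list)
    for row in joined:
        groups[row[key]].append(row["pass_rate"])
    return {value: mean(rates) for value, rates in groups.items()}


pass_rates = pass_rate_by_user(eval_rows)
joined = join_on_user(pass_rates, engagement_rows)
by_retention = mean_pass_rate_by(joined, "retained")
by_segment = mean_pass_rate_by(joined, "segment")

print(by_retention)
print(by_segment)
```

With real data the same three steps would more likely run as a warehouse query or inside an analytics tool, and a gap between groups is a prompt for investigation, not proof that eval quality causes retention.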

5 Comments

Darshil Gandhi 1w

Link to blog: https://amplitude.com/blog/ai-evals-for-product-managers

Bhavesh Dhingra 1w

Eval scores alone don't close the last mile. Wiring them to traces, so you can see what the model received when it produced each score, is where the "why" surfaces.

Hiral Shah 3d

Love this, what a great explanation!

Darshil Gandhi’s Post
