Ask HN: Agent evaluations, what is everything I should know?

1 point by akira_067 4 minutes ago

I'm currently building coding agents and wondering how most people create and run evals for them. I gather that the tasks and their definitions will differ dramatically across domains and instances, so I'm not expecting a one-size-fits-all answer. Just... what actually works for you in practice?

