Ask HN: Agent evaluations, what is everything I should know?
1 point by akira_067 4 minutes ago
I'm currently building coding agents and wondering what the standard approach is for creating and running evals. I gather that tasks and their definitions differ dramatically across domains and instances, so I'm not hoping for a one-size-fits-all answer. Just... what actually works for you in practice?

