Ask HN: Do you A/B test your LLM prompts?

I'm exploring a dev tool idea that helps you A/B test your prompts, but I'm not sure if there's a need for it. You'd be able to write and version your prompts in a web UI, then A/B test them and see results with metrics you define.

So for example, with a bot that writes cold outbound emails, you can verify whether v1 or v2 of your system prompt results in a better reply rate.
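To make the email example concrete, here is a minimal sketch of what the comparison could look like under the hood. It is not taken from any existing tool: the function names, the hash-based assignment, and the choice of a two-proportion z-test are all assumptions for illustration, and the counts in the usage section are made up.

```python
import hashlib
import math


def assign_variant(recipient_id: str, variants=("v1", "v2")) -> str:
    """Deterministically map a recipient to a prompt variant (hypothetical helper).

    Hashing the ID means the same recipient always gets the same variant,
    without storing the assignment anywhere.
    """
    digest = hashlib.sha256(recipient_id.encode("utf-8")).digest()
    return variants[digest[0] % len(variants)]


def two_proportion_z_test(replies_a: int, sent_a: int, replies_b: int, sent_b: int):
    """Compare two reply rates with a pooled two-proportion z-test.

    Returns (rate_a, rate_b, z, two_sided_p_value).
    """
    rate_a = replies_a / sent_a
    rate_b = replies_b / sent_b
    pooled = (replies_a + replies_b) / (sent_a + sent_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / sent_a + 1 / sent_b))
    if se == 0:
        # No variation at all (e.g. zero replies in both arms): nothing to test.
        return rate_a, rate_b, 0.0, 1.0
    z = (rate_b - rate_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_a, rate_b, z, p_value


if __name__ == "__main__":
    # Illustrative counts only, not real data.
    rate_v1, rate_v2, z, p = two_proportion_z_test(40, 1000, 60, 1000)
    print(f"v1 reply rate: {rate_v1:.1%}, v2 reply rate: {rate_v2:.1%}")
    print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```

The z-test is just one reasonable choice for a binary metric like "replied or not"; a real tool would probably also need to handle more than two variants, custom metrics, and checking results repeatedly as data comes in.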

Does anybody already do something like this, or want a tool for it?
