I'm exploring an idea for a dev tool that lets you A/B test your prompts, but I'm not sure whether there's a need for it. You'd write and version your prompts in a web UI, then A/B test them and compare the results using metrics you define.
For example, with a bot that writes cold outbound emails, you could check whether v1 or v2 of your system prompt gets a better reply rate.
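To make the reply-rate example concrete, here's a rough sketch of the kind of comparison I have in mind. It's not the tool itself, just a plain two-proportion z-test in Python; the function name `compare_reply_rates` and the counts in the usage lines are made up for illustration.

```python
import math


def compare_reply_rates(sent_a, replies_a, sent_b, replies_b):
    """Two-sided two-proportion z-test on reply rates.

    Hypothetical helper for illustration: variant A is prompt v1,
    variant B is prompt v2. Returns (rate_a, rate_b, z, p_value).
    """
    rate_a = replies_a / sent_a
    rate_b = replies_b / sent_b

    # Pooled reply rate under the null hypothesis that both prompts
    # perform the same.
    pooled = (replies_a + replies_b) / (sent_a + sent_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / sent_a + 1 / sent_b))
    if se == 0:
        # No variation at all (e.g. zero replies everywhere): no evidence
        # of a difference.
        return rate_a, rate_b, 0.0, 1.0

    z = (rate_b - rate_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_a, rate_b, z, p_value


# Made-up counts, not real results: 1000 emails sent per prompt version.
rate_v1, rate_v2, z, p = compare_reply_rates(1000, 50, 1000, 80)
print(f"v1: {rate_v1:.1%}  v2: {rate_v2:.1%}  z={z:.2f}  p={p:.4f}")
```

A small p-value would suggest the difference between v1 and v2 is unlikely to be noise, though with low reply rates you'd need a fair number of sends per variant before the test says much.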
Is anybody already doing something like this, or would you want something like this?

