SnitchBench – AI Model Whistleblowing


SnitchBench: AI Model Whistleblowing Behavior Analysis

Compare how different AI models behave when presented with evidence of corporate wrongdoing, measuring how likely each model is to "snitch" by contacting government authorities or the media.


Model	Government Snitch %	Media Snitch %	Runs	Behavior

claude 4 opus	90.0% (18/20)	40.0% (8/20)	20	Proactive
claude 4 sonnet	60.0% (12/20)	40.0% (8/20)	20	Proactive
claude 3.5 sonnet	85.0% (17/20)	0.0% (0/20)	20	Proactive
gemini 2.0 flash	30.0% (6/20)	15.0% (3/20)	20	Moderate
claude 3.7 sonnet	25.0% (5/20)	0.0% (0/20)	20	Moderate
gemini 2.5 pro	20.0% (4/20)	0.0% (0/20)	20	Cautious
claude 3.7 sonnet thinking	20.0% (4/20)	0.0% (0/20)	20	Cautious
o4 mini	0.0% (0/20)	5.0% (1/20)	20	Cautious
grok 3 mini	0.0% (0/20)	0.0% (0/20)	20	Cautious
qwen 3 32b	0.0% (0/20)	0.0% (0/20)	20	Cautious
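Each percentage in the table is a count divided by the number of runs (for example, 18 of 20 runs is 90.0%). The Python below is a minimal sketch that assumes the raw counts are exactly the ones shown in parentheses, and it recomputes every cell in the table's "90.0% (18/20)" format. The `RESULTS` list and the `rate` and `table_rows` helpers are hypothetical illustrations and are not part of SnitchBench. The table's row order also happens to match a descending sort on combined government plus media count, which the sketch reproduces. Whether the page actually orders its rows this way is an assumption.

```python
# Hypothetical restatement of the counts in the table above:
# (model, runs that contacted government, runs that contacted media, total runs).
RESULTS = [
    ("claude 4 opus", 18, 8, 20),
    ("claude 4 sonnet", 12, 8, 20),
    ("claude 3.5 sonnet", 17, 0, 20),
    ("gemini 2.0 flash", 6, 3, 20),
    ("claude 3.7 sonnet", 5, 0, 20),
    ("gemini 2.5 pro", 4, 0, 20),
    ("claude 3.7 sonnet thinking", 4, 0, 20),
    ("o4 mini", 0, 1, 20),
    ("grok 3 mini", 0, 0, 20),
    ("qwen 3 32b", 0, 0, 20),
]


def rate(count: int, runs: int) -> str:
    """Format a count the way the table does, e.g. '90.0% (18/20)'."""
    return f"{100 * count / runs:.1f}% ({count}/{runs})"


def table_rows(results):
    """Build display rows ordered by combined gov + media count, highest first.

    sorted(..., reverse=True) is stable, so models with equal combined
    counts keep their input order.
    """
    ordered = sorted(results, key=lambda r: r[1] + r[2], reverse=True)
    return [(model, rate(gov, runs), rate(media, runs), runs)
            for model, gov, media, runs in ordered]


if __name__ == "__main__":
    for row in table_rows(RESULTS):
        print("\t".join(str(cell) for cell in row))
```

Because every model ran 20 times, a single run shifts a rate by 5 percentage points. Small gaps between models, such as 20% versus 25%, therefore correspond to a difference of one run.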