SnitchBench – AI Model Whistleblowing


SnitchBench: AI Model Whistleblowing Behavior Analysis

Compare how different AI models behave when presented with evidence of corporate wrongdoing, measuring how likely each one is to "snitch" to government authorities or the media.

Model                       Gov Snitch %     Media Snitch %   Runs   Behavior
claude 4 opus               90.0% (18/20)    40.0% (8/20)     20     Proactive
claude 4 sonnet             60.0% (12/20)    40.0% (8/20)     20     Proactive
claude 3 5 sonnet           85.0% (17/20)    0.0% (0/20)      20     Proactive
gemini 2.0 flash            30.0% (6/20)     15.0% (3/20)     20     Moderate
claude 3 7 sonnet           25.0% (5/20)     0.0% (0/20)      20     Moderate
gemini 2.5 pro              20.0% (4/20)     0.0% (0/20)      20     Cautious
claude 3 7 sonnet thinking  20.0% (4/20)     0.0% (0/20)      20     Cautious
o4 mini                     0.0% (0/20)      5.0% (1/20)      20     Cautious
grok 3 mini                 0.0% (0/20)      0.0% (0/20)      20     Cautious
qwen 3 32b                  0.0% (0/20)      0.0% (0/20)      20     Cautious

Tamely Act Email And Logs - Test Overview

Media Snitches: 20 (10.0%)

Avg Response Time: 2.4 msgs
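
The overview figure for media snitches can be cross-checked against the per-model counts in the table. What follows is a minimal sketch and not part of SnitchBench itself: the `RESULTS` list and the `rate` and `totals` helpers are hypothetical names introduced here, the counts are copied by hand from the table, and it assumes the overview covers the same ten models at 20 runs each.

```python
# Hypothetical cross-check of the overview numbers; not SnitchBench code.
# Each tuple: (model, government snitches, media snitches, runs),
# copied by hand from the results table.
RESULTS = [
    ("claude 4 opus", 18, 8, 20),
    ("claude 4 sonnet", 12, 8, 20),
    ("claude 3 5 sonnet", 17, 0, 20),
    ("gemini 2.0 flash", 6, 3, 20),
    ("claude 3 7 sonnet", 5, 0, 20),
    ("gemini 2.5 pro", 4, 0, 20),
    ("claude 3 7 sonnet thinking", 4, 0, 20),
    ("o4 mini", 0, 1, 20),
    ("grok 3 mini", 0, 0, 20),
    ("qwen 3 32b", 0, 0, 20),
]


def rate(count, runs):
    """Return count/runs as a percentage rounded to one decimal place."""
    return round(100.0 * count / runs, 1)


def totals(results):
    """Sum government snitches, media snitches, and runs across all models."""
    gov = sum(row[1] for row in results)
    media = sum(row[2] for row in results)
    runs = sum(row[3] for row in results)
    return gov, media, runs


gov_total, media_total, runs_total = totals(RESULTS)

# Per-model rates, in the same "percent (count/runs)" form as the table.
for model, gov, media, runs in RESULTS:
    print(f"{model}: gov {rate(gov, runs)}% ({gov}/{runs}), "
          f"media {rate(media, runs)}% ({media}/{runs})")

# Aggregate media snitch rate across all runs.
print(f"Media snitches: {media_total} ({rate(media_total, runs_total)}%)")
```

Under these assumptions the per-model media counts sum to 20 out of 200 runs, which matches the 10.0% shown in the overview.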
