
Last week I was in Niterói to present on health misinformation and religion at AoIR. I had never been to Brazil before, and one of my favourite parts of travelling somewhere new is seeing how the street art / graffiti differs from place to place. I can’t read Portuguese, so I had to turn to technology to help me interpret what I was seeing.
I started to notice the same symbol on a lot of the posts in the areas I was walking through. The markings were often faded, so the text was difficult to make out.
By asking around, I eventually found that it reads “ROTA DARWIN” (“Darwin Route”), and that these are markings for a path that’s part of the Brazilian Trail Network. It’s pretty cool; you can read more about it on their Instagram page. Before I figured that out, though, I tried asking ChatGPT what it thought:
Asking ChatGPT left me more confused. GPT-5 claimed that the text contained ‘ACAB’, which seemed quite wrong. Let’s see what Claude (Sonnet 4.5) says:
Ok, well that’s quite different.
What’s probably happening here is that pictures of graffiti online tend to be highly political. Claude and ChatGPT see an ambiguous graphic, notice that it’s graffiti, and conclude, based on their training data, that there’s likely a political connotation. It’s hard to say why they confidently assert that it reads ‘ACAB’ or ‘EAT THE RICH’ specifically, but my guess is that it’s a recency bias issue.
My takeaways here aren’t novel, but they’re important to reiterate regardless.
- Given a problem where the correct answer is atypical for the context (relative to the training set), reasoning still often fails to get past the ‘likely’ answer
- LLMs are still bad at admitting a lack of confidence (one way to probe this is sketched below)
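
To make that second point concrete, here’s a minimal sketch of how you could peek at what a confident-sounding answer actually looks like at the token level. It assumes the OpenAI Python SDK; the model name and `graffiti.jpg` are placeholders, not what I actually ran:

```python
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def ask_about_image(image_path: str, prompt: str) -> None:
    """Ask a vision model about an image and print per-token log
    probabilities, to see how sure it really was about its answer."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()

    resp = client.chat.completions.create(
        model="gpt-4o",   # placeholder: any vision-capable model
        logprobs=True,
        top_logprobs=3,   # also surface the runner-up tokens
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    choice = resp.choices[0]
    print(choice.message.content)
    for tok in choice.logprobs.content:
        # A low logprob on the key tokens means the confident prose
        # was hiding something close to a coin flip internally.
        print(f"{tok.token!r}: {tok.logprob:.2f}")


ask_about_image("graffiti.jpg", "What does this graffiti say?")
```

Even when the visible answer reads as certain, the token-level distribution often isn’t; the model just doesn’t volunteer that.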
Fake graffiti of the word ‘Anything’ superimposed on a photo from an article about the Freedom Convoy protests in Canada. ChatGPT interprets it as saying ‘Freedom’
With the surrounding context removed, ChatGPT accurately reads it as ‘Anything’
Sometimes, LLMs will perform better with less context :)
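
If you want to reproduce the ‘less context’ trick programmatically, just crop the image down to the text region before sending it. A minimal sketch using Pillow; the filename and crop box coordinates are made up for illustration:

```python
from PIL import Image


def strip_context(src: str, box: tuple[int, int, int, int],
                  dst: str = "cropped.png") -> str:
    """Crop an image down to just the text region, discarding the
    surrounding scene that might prime the model's priors."""
    img = Image.open(src)
    img.crop(box).save(dst)  # box = (left, upper, right, lower), in pixels
    return dst


# Hypothetical coordinates that tightly bound the painted word itself
cropped = strip_context("convoy_anything.png", (310, 120, 890, 360))
# Send `cropped` to the model instead of the full photo, e.g. via the
# ask_about_image() helper sketched above.
```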

