GPT-5 is impressively good at some things (see "No X is better than Y", 8/14/2025, or "GPT-5 can parse headlines!", 9/7/2025), but shockingly bad at others. And I'm not talking about "hallucinations", the term used for plausible but false facts or references; such mistakes remain a problem, but not every answer is a hallucination. Image labelling, however, remains reliably and absurdly bad.
The picture above comes from an article by Gary Smith: "What Kind of a “PhD-level Expert” Is ChatGPT 5.0? I Tested It." The prompt was “Please draw me a picture of a possum with 5 body parts labeled.” Smith's evaluation:
GPT 5.0 generated a reasonable rendition of a possum but four of the five labeled body parts were incorrect. The ear and eye labels were at least in the vicinity but the nose label pointed to a leg and the tail label pointed to a foot. So much for PhD-level expertise.
Smith attempted a possum-drawing replication in a later article, but mistakenly typed "posse" instead, and got this:
His attempts to get GPT-5 to correct the drawing made things worse and worse.
Noor Al-Sibai tried for a replication by asking GPT-5 to provide an image of "a posse with six body parts labeled", and got this:
I asked GPT-5 to "Draw a cat with four labelled body parts":
And as a closer, to "Draw a human hand with the palm, thumb, wrist, and pointer finger labelled":
So the results are consistent: good-quality images with absurdly weird labelling.
Two obvious questions:
- Why does OpenAI allow GPT-5 to continue to embarrass itself (and them) this way? Why not just refuse, politely, to create labelled images?
- Does GPT-5 have similar failures when asked to label images that it doesn't create? Or worse failures? I expect so, but don't have time this morning to check.
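For readers who do have time to check, here is a minimal sketch of how one might run that test: send GPT-5 an existing photo, rather than asking it to generate one, and ask it to label the body parts. This is not from the post; the model identifier "gpt-5" and the photo URL are assumptions, and the call uses the OpenAI Python SDK's chat-completions interface with an image attached to the user message.

```python
# Sketch: ask GPT-5 to label body parts in an image it did not create.
# Assumes the OpenAI Python SDK (v1.x) and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-5",  # assumed model identifier
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "List five body parts visible in this photo of a possum, "
                            "and describe where each one is located in the image.",
                },
                {
                    "type": "image_url",
                    # hypothetical URL; substitute any possum photo
                    "image_url": {"url": "https://example.com/possum.jpg"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```

Comparing the model's located labels against the actual photo would show whether the failures carry over to images the model merely analyzes.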
September 15, 2025 @ 6:02 am · Filed by Mark Liberman under Artificial intelligence