For some people, it seems, AI is an amazing machine which - while fallible - represents an incredible leap forward in productivity.
For other people, it seems, AI is wrong more often than right and - although occasionally useful - requires constant supervision.
Who is right?
I recently pointed out a few common problems with LLMs. I was discussing this with someone relatively senior who works on Google's Gemini. I explained that every time I get a Google AI Overview it is wrong - sometimes obviously, sometimes subtly. I asked whether that was really the experience of AI that Google wanted to promote. My friend replied (lightly edited for clarity):
I find AI Overview to be helpful for my searches and my work. I use it all the time to look up technical terms and hardware specs.
I, somewhat impolitely, called bullshit and sent a couple of screenshots of recent cases where Google was just laughably wrong. He replied:
Interesting. We are seeing the opposite.
Why is that?
I'll happily concede that LLMs are reasonable at outputting stuff which looks plausible and - in many cases - that's all that's necessary. If I can't remember which command line switch to use, AI is easier than crappy documentation. Similarly, if I don't know how to program a specific function, most AIs are surprisingly decent at providing me with something which mostly works.
But the more I know about something, the less competent the AI seems to be.
Let me give you a good example.
At my friend's prompting, I asked Gemini to OCR an old newspaper clipping. It is a decent resolution scan of English text printed in columns. The sort of thing a million AI projects have been trained on. Here's a sample:

So what did Gemini make of it when asked to extract the text?
Children at Witham's Chip-ping Hill Infants School are en-
gaged in trying out all sorts of
imaginations ready for October
31... "And god knows what
strange spirits will be abroad."
That reads pretty well. It is utterly wrong, but it is convincing. This isn't a one-off either. Later in the clipping was this:

I'm sure a child of 6 could read that aloud without making any mistakes. Is Gemini as smart as a 6-year-old?
All the children say halloweenis fun. So it is for 6-year-old
Joanne Kirby admits she will be
staying up to watch on October
31, just in case. She has made a
paper "witch," to "tell stories
about witches," she said.
Again, superficially right, but not accurate in the slightest.
There were half a dozen mistakes in a 300-word article. That, frankly, is shit. I could have copy-typed it and made fewer mistakes. I probably spent more time correcting the output than I saved by using AI.
Boring old Tesseract - a mainstay of OCR - did far better. Yes, it might occasionally mistake a speck of dust for a comma or confuse two similar characters - but it has never invented new sentences!
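If you want to try the same comparison yourself, a minimal sketch using the pytesseract wrapper might look like this. It assumes the scan is saved as clipping.png (the filename is just a placeholder) and that both the Tesseract binary and pytesseract are installed:

```python
from PIL import Image
import pytesseract

# Load the scanned newspaper clipping.
image = Image.open("clipping.png")

# --psm 4 asks Tesseract to treat the page as a single column of
# variable-sized text, which tends to suit narrow newspaper columns;
# the default page segmentation mode usually works too.
text = pytesseract.image_to_string(image, config="--psm 4")
print(text)
```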
Like a fool, I asked Gemini what was going on:

This isn't just a problem with Gemini - ChatGPT also invented brand-new sentences when scanning the text.
All the children say Halloween is fun, rather than frightening. Six-year-old Joanne Kirby admits she will be “a scary little witch” on the night, but she does like ghost stories.
So what's going on?
A question one has to ask of any source, including LLMs but also newspapers, influencers, podcasts, books, etc., is "how would I know if they were wrong?" This is not a prompt to doubt everything – down that path is denialism – but about reflecting on how much you rely on even "trusted" sources.
— Adrian Hon (@adrianhon.bsky.social) 2025-06-17T15:39:06.772Z

With OCR, it is simple. I can read the ground truth and see how it compares to the generated output. I don't have to trust; I can verify.
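Here is a rough sketch of what that verification can look like in practice: diffing an OCR transcript against a hand-typed ground truth. The filenames are placeholders, and the similarity score is only a crude proxy for accuracy.

```python
import difflib

# Hand-typed transcription of the clipping, and the OCR/LLM output to check.
ground_truth = open("clipping_ground_truth.txt", encoding="utf-8").read()
ocr_output = open("clipping_ocr.txt", encoding="utf-8").read()

# Show line-level differences between the two transcripts.
for line in difflib.unified_diff(
    ground_truth.splitlines(),
    ocr_output.splitlines(),
    fromfile="ground truth",
    tofile="OCR output",
    lineterm="",
):
    print(line)

# A crude similarity score: 1.0 means the transcripts match exactly.
score = difflib.SequenceMatcher(None, ground_truth, ocr_output).ratio()
print(f"similarity: {score:.3f}")
```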
I suppose I mostly use AI for things with which I have a passing familiarity. I can quickly see when it is wrong. I've never used it for, say, tax advice or instructions to dismantle a nuclear bomb. I'd have zero idea if the information it spat back was in any way accurate.
Is that the difference? If you don't understand what you're asking for then you can't judge whether you're being mugged off.
Or is there something more fundamentally different between users which results in this disparity of experience?