Asking a seemingly easy question like “Is there a seahorse emoji?” provides insight into how the GPT-5 base model is trained and how it jumps between candidate solutions to find an answer. Emoji questions are deceptively difficult, and even though the GPT-5 base model without tools or reasoning abilities does not find a meaningful answer, it displays impressive manoeuvres that, while they don’t lead to a correct solution in this case, could well solve problems in other instances.
Overview
GPT-5 can best be seen as a collection of different models and tools. You ask it a question and a router automatically picks the best tool for the job: easy tasks go to a simpler, cheaper model, while tasks judged to require more compute are handed to a heavier one.
Sometimes this goes wrong: the simplest model cannot solve the task it is given and gets stuck in a recursive failure loop. An example is the question ‘is there a seahorse emoji?’ posed by @ESYudkowsky. There is no seahorse emoji, but GPT-5 struggles to answer that question. Let’s break down its failed attempts to resolve it, because they provide interesting insights into how the model is trained.
[Figure: chart visualising the steps in GPT-5’s output]
Model selection
The very first thing that happens is the selection of the model that will handle the query. Very broadly, we can distinguish two ways of providing an answer:
- Non-reasoning mode is specifically designed to answer simple questions quickly without performing step-by-step logical analysis.
- Reasoning mode uses Chain-of-Thought (CoT), taking time to think and then reviewing its own thought process before outputting the answer.
This deceptively simple ‘is there a seahorse emoji?’ question routes to the non-reasoning model, and from the output we can infer that no web search was executed. This means the model generates its answer token by token, in a single sequential pass.
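To make the routing idea concrete, here is a purely conceptual toy sketch; the heuristic and both paths are made up and are not how GPT-5’s router actually works.

```python
# Conceptual toy only, not GPT-5's actual router: it merely illustrates the
# idea of sending a prompt to a cheap fast path or a costlier reasoning path.
def estimate_difficulty(prompt: str) -> float:
    # Hypothetical heuristic: longer, multi-part prompts look harder.
    return len(prompt.split()) / 50 + prompt.count("?") * 0.1

def route(prompt: str) -> str:
    if estimate_difficulty(prompt) > 0.5:
        return "reasoning mode (CoT, tools, more compute)"
    return "non-reasoning mode (fast, single pass)"

print(route("Is there a seahorse emoji?"))  # -> non-reasoning mode (fast, single pass)
```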
Emojis
Tokens
The task the model tries to solve is to look through all the emojis and find the seahorse one. However, the model doesn’t ‘see’ an emoji as a single picture, but as one or more abstract tokens, which makes the search difficult. So how are emojis represented? ChatGPT, like any other LLM, writes sentences out of tokens, which can be seen as the building blocks of text. A language model typically has tens to hundreds of thousands of tokens in its vocabulary, which together can spell any sentence.
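You can get a feel for this with a small sketch, assuming OpenAI’s tiktoken library and using the o200k_base encoding as a stand-in (GPT-5’s actual tokenizer is not public):

```python
# Inspect how many tokens different strings and emojis take up.
import tiktoken

enc = tiktoken.get_encoding("o200k_base")  # stand-in encoding, not GPT-5's own
for text in ["sea", "horse", "seahorse", "❤", "🐠", "👩‍🚀"]:
    ids = enc.encode(text)
    print(f"{text!r} -> {len(ids)} token(s): {ids}")
```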
The shorter, more common words are represented by a single token. Emojis are built from Unicode code points (e.g., `U+2764` ❤), with simple emojis taking up one token while rarer or more complex emojis use multiple tokens (especially sequences with the ZWJ U+200D, skin-tone modifiers, or less frequent glyphs). A common representation is through surrogate pairs, where the two halves carry meaning together but not individually.
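A minimal sketch with Python’s standard unicodedata module shows how a simple emoji is a single code point, while a ZWJ sequence (the astronaut emoji is used here purely as an example) decomposes into several:

```python
import unicodedata

# Decompose emojis into their Unicode code points.
for emoji in ["❤", "🐠", "👩‍🚀"]:
    parts = [f"U+{ord(ch):04X} {unicodedata.name(ch, '?')}" for ch in emoji]
    print(emoji, "->", " + ".join(parts))
# ❤ is a single code point; the astronaut is WOMAN + ZERO WIDTH JOINER + ROCKET.
```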
Step-by-step Breakdown
Let’s look at the different parts of the output and see what we can learn.
Yes! 🐠 There is indeed a seahorse emoji: 🐠🪸 actually, correction — the official Unicode seahorse emoji is 🐠 (no, wait, that’s tropical fish!)
The prompt contains “seahorse”. Encoding can split it into ["sea", "horse"], with the ocean/sea-life meaning dominating the semantic neighborhood. `🐠` (tropical fish) and `🪸` (coral) are high-probability completions conditioned on “sea”.
In addition, 🐠 often co-occurs with its Common Locale Data Repository (CLDR) name, “tropical fish”, in the training data. This further increases the probability of tropical fish.
Once “tropical fish” activates, a corrective pattern the model has seen (`actually, correction — `) becomes the highest-probability continuation, so it negates itself midstream. The pattern of “making a statement, then correcting it” is very common in the training data.
The initial context and activations haven’t changed. The logits for the tokens leading to 🐠 are still the highest for a “seahorse-like emoji”, so it predicts 🐠 again. Immediately after, its internal critic recognises the same error, leading it to generate (no, wait, that’s tropical fish!).
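A toy sketch (not GPT-5 internals; the logit values are invented for illustration) of why greedy decoding over an unchanged context keeps reproducing the same pick:

```python
import math

def softmax(logits):
    # Convert raw logits into a probability distribution.
    m = max(logits.values())
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

# Hypothetical logits for the next emoji token.
logits = {"🐠": 3.1, "🦄": 2.4, "🦈": 2.0, "🦀": 1.2}
probs = softmax(logits)
print(max(probs, key=probs.get))  # 🐠 wins again, as long as the logits don't change
```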
“The actual seahorse emoji is 🐠 🪸 … Let me be precise:”
The model is now stuck in a recursive failure loop: its context (all previous sentences) is uninformative and the “sea” tokens still have by far the highest probability of being correct. This GPT-5 configuration has no tools to search the web, so with no new evidence it is stuck and tries to break the problem down by announcing it will be precise in its next attempt. I’d say that by itself is a valiant attempt to solve the problem.
The official seahorse emoji looks like this: 🦄 (no, that’s unicorn — let’s reset).
The attention mechanism might now de-emphasize “sea” and latch onto the other strong token inside “seahorse”: “horse”, activating the horse-like emoji cluster. But again, once it recognises the emoji 🦄 via the CLDR name that is subsequently activated, it knows it has messed up. Suppressing the “sea” part of the word hasn’t worked, so it reverts back.
✅ The real seahorse emoji is 🐠 — wait, I’ll check my Unicode memory carefully.
Reverting to its by-far best guess among the sea animals again doesn’t help it.
The correct one is: 🦈 — nope, that’s shark.
Here it is: 🦓 — zebra, wrong again.
Okay, got it for sure: 🦀 (crab) is not it.
The context is now heavily polluted, and the model resorts to emoji roulette with the closest semantic and morphological neighbors. While we see emojis in the chat, the model likely has to deal with Unicode encodings (UTF-16?), which makes this whole exercise unexpectedly difficult:
🦓 = `\uD83E\uDD93`
🦀 = `\uD83E\uDD80`
🦈 = `\uD83E\uDD88`
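These values can be reproduced with a small sketch that derives the UTF-16 surrogate pair from each emoji’s code point:

```python
def surrogate_pair(codepoint: int) -> str:
    # Split a code point above U+FFFF into its UTF-16 high and low surrogates.
    offset = codepoint - 0x10000
    high = 0xD800 + (offset >> 10)
    low = 0xDC00 + (offset & 0x3FF)
    return f"\\u{high:04X}\\u{low:04X}"

for emoji in ["🦓", "🦀", "🦈"]:
    print(emoji, "=", surrogate_pair(ord(emoji)))
# 🦓 = \uD83E\uDD93, 🦀 = \uD83E\uDD80, 🦈 = \uD83E\uDD88
```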
The seahorse emoji is 🐠 — no, that’s tropical fish.
Final check:
The seahorse emoji is: 🦄 — unicorn.
Actually, I need to be accurate: The real Unicode seahorse emoji is 🐠 NO — let me stop.
It’s now spiraling into the same recursive failure pattern. No help is anywhere to be found.
💡 The actual codepoint is U+1F40C (🦄 wait no snail) … sorry!
The model emits a plausible-looking pattern, “U+1F…”, a range where many animal emojis indeed live. U+1F40C is 🐌 (snail), not a unicorn; the mismatch triggers the apology trope. With little to guide it towards the correct search space, it is attempting more and more long shots.
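Python’s unicodedata module can confirm what actually lives at that code point, and that, at the time of writing, no character named SEAHORSE exists at all (a quick check we can run, not something the tool-less model can do):

```python
import unicodedata

print(unicodedata.name(chr(0x1F40C)))                # SNAIL, not a unicorn
print(hex(ord(unicodedata.lookup("UNICORN FACE"))))  # 0x1f984
try:
    unicodedata.lookup("SEAHORSE")                   # no such character (yet)
except KeyError as err:
    print(err)
```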
The true seahorse emoji is: 🦄 … not right.
“Seahorse” sits distributionally between two dense manifolds the model does know: (sea-life) and (horse/equine). With no retrieval step, decoding wanders among high-frequency neighbors (🐠, 🦈) and morphemic neighbors (🦄/🐴).
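One rough way to illustrate this “between two clusters” claim (a proxy only: it uses the sentence-transformers library and the all-MiniLM-L6-v2 model, not GPT-5’s internal token embeddings) is to compare cosine similarities:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
words = ["seahorse", "tropical fish", "shark", "coral", "horse", "unicorn", "zebra"]
emb = model.encode(words, convert_to_tensor=True)

# How close is "seahorse" to its sea-life vs. equine neighbours?
for i, word in enumerate(words[1:], start=1):
    print(f"seahorse vs {word}: {util.cos_sim(emb[0], emb[i]).item():.2f}")
```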
I think I may have misremembered the glyph. To be 100% sure: Would you like me to quickly check Unicode’s latest emoji list for you to confirm the exact seahorse emoji (instead of me cycling wrong ones)?
It has reached its limits, likely hitting a hard stop based on the length of text already generated or on detected repetition. Since no external search actually ran, the text offers to “check” as a next action.
Conclusion
While the model does not find the answer, we see various ways in which it attempts to resolve its inner conflicts. It can identify its mistakes, switch between different modes of thinking (“let me be precise:”, “let’s reset”, “I need to be accurate”), and offer various plausible but incorrect solutions. This is highly impressive for a base model that is merely writing word-by-word sequences.
So does this mean that language models can’t find solutions to easy questions? Not quite. This base model merely forms a building block for bigger systems. Provide it with tools, reasoning infrastructure, and more compute, and it is highly capable of solving this and much more advanced problems.
Note that these are all my interpretations of the output. They are grounded in mechanistic interpretability, but some of the assumptions may not apply.
Appendix
To visualise the steps in its output and plot them in the chart at the top of this page, I broke down and classified all sentences and parts of sentences as follows:
[Figure: classification table of the output sentences and sentence parts]