Sonnet 4.5 in Claude Code vs. GPT-5 High in Codex CLI: Why Speed Won for Me

October 1, 2025 (updated October 3)

TL;DR: We obsess over giving LLMs the right context: relevant files, docs, prompts. Meanwhile, we completely ignore the cost of destroying our own context through task switching while waiting for slower tools.

We obsess over LLM context. Give it the right files. Better documentation. Relevant background information. Prompt engineering. Context windows of 200K, 1M, 2M tokens.

We're becoming experts at managing their context. Meanwhile, we completely ignore the cost of destroying ours.

Here's what actually happens when you use a slower coding tool: You ask a question. Maybe you need help with refactoring, or understanding why a test is failing. You know the response will take a few minutes. So you switch. Start another task in a separate terminal. Scan HN. Reply to an email.

Then the response comes back.

Now you have to reload: What was I doing? Which file was I in? What was the bug? What had I already tried?

This reload cost (the mental overhead of reconstructing where you were) is exactly the context-switching penalty we already know about. But somehow we forgot it applies to us while we wait for our tools.

I accidentally ran this experiment on myself. In August, I switched from Claude to Codex with GPT-5. I ran A/B tests, and Codex won on accuracy. It took several times longer, but I rationalized: "While it's working, I'll just switch to another task." I did this for a few weeks.

A couple of days ago, Claude Code 2.0 came out. I gave it another shot.

The difference was immediate. Not in accuracy, but in something I hadn't been paying attention to: I wasn't switching tasks while waiting. I stayed in the problem. The mental model stayed loaded. My personal context stayed intact.

I still find Codex more accurate on individual tasks. But the whole approach of "give it a task and switch to another one" works poorly for me.

This isn't about being impatient. It's about protecting flow state. When you're deep in a problem, you have this fragile graph in working memory: how the components connect, where the data flows, what you just learned from the last error. Every task switch shatters that graph.

We've been obsessing over optimizing context for our tools.

It's time to start protecting context for ourselves.

Context is king. Both kinds.
