The Claude 4 series is here. Finally, Anthropic has given us the prized Opus, the model that became everyone’s darling overnight. After a year, we have the next iteration. Besides Opus, Anthropic also released Claude 4 Sonnet, a civilised version of 3.7 Sonnet.
How good are these models? They are good models, sir!
They certainly have the SOTA model smell, especially Opus. Until now, I have been heavy on Gemini 2.5 Pro for coding tasks and OpenAI o3 for everything else. So, I have seen both models up close, and now I am curious how Claude 4 Opus performs against the titans.
It’s a battle of titans. I have tested all three models on four different coding problems, and beyond the raw outputs, we will also be judging which of them codes with the most taste.

TL;DR
If you want to jump straight to the conclusion: compared against the other two models, Gemini 2.5 Pro and OpenAI o3, Claude Opus 4 simply dominates in coding, and by a good margin, as you can see for yourself in the comparison below.
Claude 4 Opus leads in:
- Quality of code generation.
- Prompt adherence.
- ‘Taste’ in code generation.
- Not tested here, but Opus also has a much better personality.
Gemini wins when it comes to the price-to-performance ratio. o3 is mid at everything. Sorry, Sama!
If you are looking for a good AI coding assistant, maybe for your editor or in general, Claude Opus 4 is the best option if price is not an issue.
Brief on Claude 4 Opus
So, let’s get a quick overview of Claude 4 Opus.
According to Anthropic, it’s the best model for coding, and apparently, it can code continuously for seven straight hours at the efficiency of a mid-senior developer. (Yikes!)

It has about a 200K-token context window, which is not the number you might expect from the supposed best coding model. I expected it to have 1 million, but well, not bad.
Claude Opus 4 leads on the SWE-bench with a score of 72.5% and can reach up to 79.4% with parallel test-time compute.

As you can see, that is already roughly a 10-point improvement over Anthropic’s previous model, Claude 3.7 Sonnet.
The Claude 4 lineup is also 65% less likely to use hacky shortcuts and loopholes to get the job done.
The Claude team has shared a quick GitHub Actions integration with Claude Opus 4, in which you can see the model making changes to a PR and addressing feedback in real time.
The bombshell of a price

Claude Opus 4 is priced at $15 per million input tokens and $75 per million output tokens. For comparison, Gemini 2.5 Pro costs $1.25 (for prompts up to 200K tokens) or $2.50 (above 200K) per million input tokens and $10 or $15 per million output tokens, while OpenAI o3 is priced at $10 per million input tokens ($2.50 with cached input) and $40 per million output tokens.
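To make the gap concrete, here is a quick back-of-the-envelope comparison for a single request of 100K input tokens and 10K output tokens, using the per-million-token prices listed above (Gemini’s under-200K tier and o3’s uncached input rate). The request size is just a made-up example.

```javascript
// Back-of-the-envelope cost of one hypothetical request
// (100K input tokens, 10K output tokens) at the listed prices.
const prices = {
  'Claude Opus 4':  { input: 15,   output: 75 },
  'Gemini 2.5 Pro': { input: 1.25, output: 10 },
  'OpenAI o3':      { input: 10,   output: 40 },
};

const inputTokens = 100_000;
const outputTokens = 10_000;

for (const [model, p] of Object.entries(prices)) {
  const cost = (inputTokens / 1e6) * p.input + (outputTokens / 1e6) * p.output;
  console.log(`${model}: $${cost.toFixed(2)} per request`);
}
// Prints roughly: Opus $2.25, Gemini $0.23, o3 $1.40 for such a request.
```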
Opus is a fair bit pricier than its counterparts. So, let’s see if it justifies its cost.
Coding Comparison
As you might have already guessed, this section compares Claude Opus 4 (SWE-bench 72.5%), Gemini 2.5 Pro (SWE-bench 63.2%), and OpenAI o3 (SWE-bench 69.1%) in terms of coding.
All three models are coding beasts, so we won’t be testing them with easy questions. We’ll use tough ones and see how they perform head-on. One thing I will also account for is taste.
1. Particles Morph
Prompt: Link
Response from Claude Opus 4
You can find the code it generated here: Link
Here’s the output of the program:
This looks crazy good, and the fact that it did this in one shot after thinking for about 100 seconds is even crazier to me. The particles’ morphing behaviour from one shape to another is exactly what I expected: they don’t collapse to a single point before reforming into the next shape; they morph directly from the shape they are currently in.
There is room for improvement, like the shapes aren’t 100% correct, but the overall implementation is rock solid!
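For context, the behaviour I was hoping for boils down to something like the sketch below (my own illustrative snippet in vanilla JS, not the code Opus generated): when a new shape is requested, each particle snapshots where it currently is and interpolates from there to its new target, instead of being reset to a point or a default sphere first.

```javascript
// Illustrative sketch (not the generated code): particles morph by
// interpolating from wherever they currently are toward the target shape.
function beginMorph(particles) {
  // Snapshot current positions as the start of the interpolation.
  for (const p of particles) {
    p.startX = p.x;
    p.startY = p.y;
    p.startZ = p.z;
  }
}

function morphParticles(particles, targets, t) {
  // t runs from 0 (current shape) to 1 (fully morphed into the new shape).
  particles.forEach((p, i) => {
    const target = targets[i];
    p.x = p.startX + (target.x - p.startX) * t;
    p.y = p.startY + (target.y - p.startY) * t;
    p.z = p.startZ + (target.z - p.startZ) * t;
  });
}
```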
Response from Gemini 2.5 Pro
You can find the code it generated here: Link
Here’s the output of the program:
This is not bad, but it’s not at the Claude Opus 4 level of quality. The shapes look poor and don’t meet my expectations. Is that how the bird looks? Seriously? The overall UI is also not up to par.
This is definitely not what I was expecting, and it is somewhat disappointing from this model, but we’re comparing it (SWE-bench 63.2%) to Claude Opus 4 (SWE-bench 72.5%), and maybe that’s the reason.
I’ve noticed that after every new model is launched, the previous best model seems to fade compared to the new one. How fast the AI models are improving is just crazy.
Response from OpenAI o3
Code: Link
Here’s the output of the program:
The response we got from o3 is even worse than the one from Gemini 2.5 Pro. I was expecting more from this model, yet here we are.
The particles don’t morph directly from their current shape; instead, they default to a spherical shape and then morph to the requested shape.
2. 2D Mario Game
Prompt: Link
Response from Claude Opus 4
You can find the code it generated here: Link
Program Output:
It did it in seconds. Implementing a whole 2D Mario game, which is genuinely difficult, in just seconds is a pretty impressive feat.
And not just that, look at how beautiful the UI and the overall vibe are. This could be a solid start for someone trying to build a 2D Mario game in vanilla JS.
Response from Gemini 2.5 Pro
You can find the code it generated here: Link
Here’s the output of the program:
It is functional and pleasing, but it’s too minimal and also a bit buggy.
Look at the timer at the top right: it isn’t counting down correctly. (I am not that familiar with this game, so maybe that’s intended.) Either way, this doesn’t feel like a good output from a model considered this good.
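For reference, a Mario-style level timer is usually driven off real elapsed time so it ticks down once per second no matter the frame rate. A minimal sketch (my own, not Gemini’s code, with an assumed starting value and element id):

```javascript
// Minimal sketch of a level countdown timer (assumed starting value and
// element id; not Gemini's generated code).
let timeLeft = 300;
let lastTick = performance.now();

function tickTimer(now) {
  // Decrement once per real second, independent of the frame rate.
  if (now - lastTick >= 1000 && timeLeft > 0) {
    lastTick = now;
    timeLeft -= 1;
    document.getElementById('timer').textContent = timeLeft;
    if (timeLeft === 0) {
      // Time over: end the level or kill the player here.
    }
  }
  requestAnimationFrame(tickTimer);
}
requestAnimationFrame(tickTimer);
```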
Response from OpenAI o3
Code: Link
Here’s the output of the program:
o3 didn’t really do well on this question. As you can see, the result looks like a prototype, not a working game. There’s no real Mario game here: it has many bugs, and there is no way for the game to end.
Disappointing result from this model, one more time!
3. Tetris Game
Prompt: Link
Response from Claude Opus 4
Code for the game: Link
Here’s the output of the program:
As you can see, we got a perfectly implemented Tetris game in vanilla HTML/CSS/JS in no time; it was so fast I forgot to keep track of how long it took.
It implemented everything I requested, including the optional features like the ghost piece and high-score persistence in localStorage. You can’t hear it here, but it also added background theme music and a preview of the next three upcoming pieces.
Tell me, for real, how long would it take you if you were to code this all alone, with no AI models?
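High-score persistence, by the way, is a tiny bit of code once you know the browser’s localStorage API; a minimal sketch (mine, not Opus’s, with a hypothetical key name) looks like this:

```javascript
// Minimal sketch of high-score persistence via localStorage
// (hypothetical key name; not the code Opus generated).
const HIGH_SCORE_KEY = 'tetrisHighScore';

function loadHighScore() {
  // localStorage only stores strings, so parse back to a number (default 0).
  return parseInt(localStorage.getItem(HIGH_SCORE_KEY) || '0', 10);
}

function saveHighScore(score) {
  // Only overwrite the stored value when the new score beats it.
  if (score > loadHighScore()) {
    localStorage.setItem(HIGH_SCORE_KEY, String(score));
  }
}
```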
Response from Gemini 2.5 Pro
You can find the code it generated here: Link
Here’s the output of the program:
This one is equally good and works just as well as Claude Opus 4’s version; the UI and everything else look nice too. I love that it came up with a clean solution to this problem.
Response from OpenAI o3
You can find the code it generated here: Link
Here’s the output of the program:
This one’s interesting. Everything, from the falling tetrominoes onward, seems to work fine, but there’s no way for the game to end. Once the stack reaches the top, the game is supposed to end, but it doesn’t, and the board just stays stuck forever.
Now, this could be an easy fix with a follow-up prompt, but this is a pretty simple task, so I wanted it done in one shot. It’s not that big of an issue, but still.
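For reference, the game-over condition o3 missed is a standard check: if a newly spawned piece already overlaps locked blocks at the top of the board, the game should end. A minimal sketch (my own, not o3’s code), assuming the usual board-as-2D-array representation:

```javascript
// Minimal sketch of the standard Tetris game-over check (not o3's code):
// if a freshly spawned piece cannot fit, the stack has reached the top.
function canSpawn(board, piece) {
  // piece.cells holds [row, col] offsets relative to piece.row / piece.col.
  return piece.cells.every(([r, c]) => {
    const row = piece.row + r;
    const col = piece.col + c;
    return row >= 0 && row < board.length &&
           col >= 0 && col < board[0].length &&
           board[row][col] === 0; // the cell must be empty
  });
}

// In the game loop, before placing each new piece:
// if (!canSpawn(board, nextPiece)) { /* stop the loop and show game over */ }
```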
4. Chess Game
Prompt: You can find the prompt I’ve used here: Link
Response from Claude Opus 4
You can find the code it generated here: Link
Here’s the output of the program:
Now, this is out of this world. It implemented an entire chess game from scratch with no libraries. I thought it would reach for something like Chess.js or another external library, but there you have it: a fully working chess game, even though it misses a few special moves like en passant.
Apart from those missing special moves, the move log records every move perfectly. This is pure insanity!
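For the curious, en passant (the move it skipped) is genuinely fiddly: a pawn that has just advanced two squares can be captured “in passing” by an adjacent enemy pawn, but only on the very next move. A rough sketch of the detection, assuming a simple move-log entry shape of my own invention:

```javascript
// Rough sketch of en passant detection (not Opus's code; the move-log
// entry shape here is a made-up assumption for illustration).
function enPassantTarget(lastMove) {
  // lastMove: { piece: 'pawn', fromRow, toRow, toCol }
  if (lastMove && lastMove.piece === 'pawn' &&
      Math.abs(lastMove.toRow - lastMove.fromRow) === 2) {
    // The capturable square is the one the pawn jumped over.
    return {
      row: (lastMove.fromRow + lastMove.toRow) / 2,
      col: lastMove.toCol,
    };
  }
  return null; // no en passant capture available this turn
}
```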
Response from Gemini 2.5 Pro
You can find the code it generated here: Link
Here’s the output of the program:
Gemini 2.5 Pro also decided to implement everything from scratch, and it even attempted special moves such as en passant, not just the basic piece moves.
The game seemed fine overall, but the soul of chess is missing: the pieces are just there; they don’t move. It looks like a minor issue that a follow-up prompt should fix, but the model didn’t get it right in one shot.

You can find its updated code from the follow-up prompt here: Link
Response from OpenAI o3
You can find the code it generated here: Link
Here’s the output of the program:

OpenAI o3 took a more pragmatic approach and decided to use Chess.js, which is what I’d prefer if I were building a production-level chess game, but the implementation didn’t hold up.
The external Chess.js import didn’t work: the code fails because it tries to use a Chess object that is undefined, since the library never actually loads.
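For what it’s worth, the fix is usually just making sure the library is actually loaded before anything constructs a Chess instance. A hedged sketch of a working setup (the CDN path is my assumption, and this is not o3’s code):

```javascript
// Hedged sketch (not o3's code): load chess.js as an ES module so the
// Chess class exists before it is used. Needs a <script type="module"> tag;
// the CDN URL here is an assumption, any ESM source of chess.js works.
import { Chess } from 'https://cdn.jsdelivr.net/npm/chess.js/+esm';

const game = new Chess();      // standard starting position
game.move('e4');               // make a move in SAN notation
console.log(game.ascii());     // print the current board to the console
console.log(game.moves());     // list the legal moves for the side to move
```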
Conclusion
Did we get a clear winner here? Yes, absolutely, and it’s Claude Opus 4.
Anthropic is doing some real magic with these Claude models. Opus has taste and a pleasant personality that makes you want to keep talking to it. That said, I do like that Gemini 2.5 Pro is freely available with fairly generous rate limits.
Claude Opus 4 is expensive, so use Opus if your company is footing the bill; otherwise, Gemini 2.5 Pro is your friend.