The Claude 4 series is here. Finally, Anthropic has given us the prized Opus, the model that became everyone’s darling overnight. After a year, we have the next iteration. Besides Opus, Anthropic also released Claude 4 Sonnet, a civilised version of 3.7 Sonnet.
How good are these models? They are good models, sir!
They certainly have the SOTA model smell, especially Opus. Until now, I have been heavy on Gemini 2.5 Pro for coding tasks and OpenAI o3 for everything else. So, I have seen both models up close, and now I am curious how Claude 4 Opus performs against the titans.
It’s a battle of titans. I have tested all three models on four different coding problems, and beyond the raw outputs, we will also be judging which of them codes with the most taste.

TL;DR
If you want to jump straight to the conclusion: compared against the other two models, Gemini 2.5 Pro and OpenAI o3, Claude Opus 4 simply dominates in coding, and by a good margin, as you can see for yourself in the comparison below.
Claude 4 Opus leads in:
- Quality of code generation.
- Prompt adherence.
- ‘Taste’ in code generation.
- Not tested here, but Opus also has a much better personality.
Gemini wins when it comes to the price-to-performance ratio. o3 is mid at everything. Sorry, Sama!
If you are looking for a good AI coding assistant, maybe for your editor or in general, Claude Opus 4 is the best option if price is not an issue.
Brief on Claude 4 Opus
So, let’s get a quick overview of Claude 4 Opus.
According to Anthropic, it’s the best model for coding, and apparently, it can code continuously for seven straight hours at the efficiency of a mid-senior developer. (Yikes!)

It has about a 200K-token context window, which is not the number you might expect from the supposed best coding model. I expected it to have 1 million, but well, not bad.
Claude Opus 4 leads on the SWE-bench with a score of 72.5% and can reach up to 79.4% with parallel test-time compute.

As you can see, that is already roughly a 10-point improvement over Anthropic’s previous model, Claude 3.7 Sonnet.
The Claude 4 lineup is also 65% less likely to use hacky shortcuts and loopholes to get the job done.
The Claude team has shared a quick GitHub Actions integration with Claude Opus 4, in which you can see the model making changes to a PR and addressing feedback in real time.
The bombshell of a price

Claude Opus 4 is priced at $15 per million input tokens and $75 per million output tokens. For comparison, Gemini 2.5 Pro costs $1.25 (for prompts up to 200K tokens) or $2.50 (above 200K) per million input tokens and $10 or $15 per million output tokens, while OpenAI o3 is priced at $10 per million input tokens ($2.50 with cached input) and $40 per million output tokens.
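To make the gap concrete, here is a quick back-of-the-envelope comparison for a single request of 100K input tokens and 10K output tokens, using the per-million-token prices listed above (Gemini’s under-200K tier and o3’s uncached input rate). The request size is just a made-up example.

```javascript
// Back-of-the-envelope cost of one hypothetical request
// (100K input tokens, 10K output tokens) at the listed prices.
const prices = {
  'Claude Opus 4':  { input: 15,   output: 75 },
  'Gemini 2.5 Pro': { input: 1.25, output: 10 },
  'OpenAI o3':      { input: 10,   output: 40 },
};

const inputTokens = 100_000;
const outputTokens = 10_000;

for (const [model, p] of Object.entries(prices)) {
  const cost = (inputTokens / 1e6) * p.input + (outputTokens / 1e6) * p.output;
  console.log(`${model}: $${cost.toFixed(2)} per request`);
}
// Prints roughly: Opus $2.25, Gemini $0.23, o3 $1.40 for such a request.
```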
Opus is a fair bit pricier than its counterparts. So, let’s see if it justifies its cost.
Coding Comparison
As you might have already guessed, this section compares Claude Opus 4 (SWE-bench 72.5%), Gemini 2.5 Pro (SWE-bench 63.2%), and OpenAI o3 (SWE-bench 69.1%) in terms of coding.
All three models are coding beasts, so we won’t be testing them with easy questions. We’ll use tough ones and see how they perform head-on. One thing I will also account for is taste.
1. Particles Morph
Prompt: Link
Response from Claude Opus 4
You can find the code it generated here: Link
Here’s the output of the program:
This looks crazy good, and the fact that it did this in one shot after thinking for about 100 seconds is even crazier to me. The particles’ morphing behaviour from one shape to another is exactly what I expected: they don’t collapse to a single point before reforming into the next shape; they morph directly from the shape they are currently in.
There is room for improvement, like the shapes aren’t 100% correct, but the overall implementation is rock solid!
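For context, the behaviour I was hoping for boils down to something like the sketch below (my own illustrative snippet in vanilla JS, not the code Opus generated): when a new shape is requested, each particle snapshots where it currently is and interpolates from there to its new target, instead of being reset to a point or a default sphere first.

```javascript
// Illustrative sketch (not the generated code): particles morph by
// interpolating from wherever they currently are toward the target shape.
function beginMorph(particles) {
  // Snapshot current positions as the start of the interpolation.
  for (const p of particles) {
    p.startX = p.x;
    p.startY = p.y;
    p.startZ = p.z;
  }
}

function morphParticles(particles, targets, t) {
  // t runs from 0 (current shape) to 1 (fully morphed into the new shape).
  particles.forEach((p, i) => {
    const target = targets[i];
    p.x = p.startX + (target.x - p.startX) * t;
    p.y = p.startY + (target.y - p.startY) * t;
    p.z = p.startZ + (target.z - p.startZ) * t;
  });
}
```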
Response from Gemini 2.5 Pro
You can find the code it generated here: Link
Here’s the output of the program:
This is not bad, but it’s not at the Claude Opus 4 level of quality. The shapes look poor and don’t meet my expectations. Is that how the bird looks? Seriously? The overall UI is also not up to par.
This is definitely not what I was expecting, and it is somewhat disappointing from this model, but we’re comparing it (SWE-bench 63.2%) to Claude Opus 4 (SWE-bench 72.5%), and maybe that’s the reason.
I’ve noticed that after every new model is launched, the previous best model seems to fade compared to the new one. How fast the AI models are improving is just crazy.
Response from OpenAI o3
Code: Link
Here’s the output of the program:
The response we got from o3 is even worse than the one from Gemini 2.5 Pro. I was expecting more from this model, yet here we are.
The particles don’t morph directly from their current shape; instead, they default to a spherical shape and then morph to the requested shape.
2. 2D Mario Game
Prompt: Link
Response from Claude Opus 4
You can find the code it generated here: Link
Program Output:
It did it in seconds. Implementing a whole 2D Mario game, which is genuinely difficult, in just seconds is a pretty impressive feat.
And not just that, look at how beautiful the UI and the overall vibe are. This could be a solid start for someone trying to build a 2D Mario game in vanilla JS.
Response from Gemini 2.5 Pro
You can find the code it generated here: Link
Here’s the output of the program:
It is functional and pleasing, but it’s too minimal and also a bit buggy.
Look at the timer at the top right: it isn’t counting down correctly. (I am not that familiar with this game, so maybe that’s intended.) Either way, this doesn’t feel like a good output from a model considered this good.
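For reference, a Mario-style level timer is usually driven off real elapsed time so it ticks down once per second no matter the frame rate. A minimal sketch (my own, not Gemini’s code, with an assumed starting value and element id):

```javascript
// Minimal sketch of a level countdown timer (assumed starting value and
// element id; not Gemini's generated code).
let timeLeft = 300;
let lastTick = performance.now();

function tickTimer(now) {
  // Decrement once per real second, independent of the frame rate.
  if (now - lastTick >= 1000 && timeLeft > 0) {
    lastTick = now;
    timeLeft -= 1;
    document.getElementById('timer').textContent = timeLeft;
    if (timeLeft === 0) {
      // Time over: end the level or kill the player here.
    }
  }
  requestAnimationFrame(tickTimer);
}
requestAnimationFrame(tickTimer);
```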
Response from OpenAI o3
Code: Link
Here’s the output of the program:
o3 didn’t really do well on this question. As you can see, the result looks like a prototype, not a working game. There’s no real Mario game here: it has many bugs, and there is no way for the game to end.
Disappointing result from this model, one more time!
3. Tetris Game
Prompt: Link
Response from Claude Opus 4
Code for the game: Link
Here’s the output of the program:
As you can see, we got a perfectly implemented Tetris game in vanilla HTML/CSS/JS in no time; it was so fast I forgot to keep track of how long it took.
It implemented everything I requested, including the optional features like the ghost piece and high-score persistence in localStorage. You can’t hear it here, but it also added background theme music and a preview of the next three upcoming pieces.
Tell me, for real, how long would it take you if you were to code this all alone, with no AI models?
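High-score persistence, by the way, is a tiny bit of code once you know the browser’s localStorage API; a minimal sketch (mine, not Opus’s, with a hypothetical key name) looks like this:

```javascript
// Minimal sketch of high-score persistence via localStorage
// (hypothetical key name; not the code Opus generated).
const HIGH_SCORE_KEY = 'tetrisHighScore';

function loadHighScore() {
  // localStorage only stores strings, so parse back to a number (default 0).
  return parseInt(localStorage.getItem(HIGH_SCORE_KEY) || '0', 10);
}

function saveHighScore(score) {
  // Only overwrite the stored value when the new score beats it.
  if (score > loadHighScore()) {
    localStorage.setItem(HIGH_SCORE_KEY, String(score));
  }
}
```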
Response from Gemini 2.5 Pro
You can find the code it generated here: Link
Here’s the output of the program:
This one is equally good and works just as well as Claude Opus 4’s version; the UI and everything else look nice too. I love that it came up with a clean solution to this problem.
Response from OpenAI o3
You can find the code it generated here: Link
Here’s the output of the program:
This one’s interesting. Everything, from the falling tetrominoes onward, seems to work fine, but there’s no way for the game to end. Once the stack reaches the top, the game is supposed to end, but it doesn’t, and the board just stays stuck forever.
Now, this could be an easy fix with a follow-up prompt, but this is a pretty simple task, so I wanted it done in one shot. It’s not that big of an issue, but still.
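For reference, the game-over condition o3 missed is a standard check: if a newly spawned piece already overlaps locked blocks at the top of the board, the game should end. A minimal sketch (my own, not o3’s code), assuming the usual board-as-2D-array representation:

```javascript
// Minimal sketch of the standard Tetris game-over check (not o3's code):
// if a freshly spawned piece cannot fit, the stack has reached the top.
function canSpawn(board, piece) {
  // piece.cells holds [row, col] offsets relative to piece.row / piece.col.
  return piece.cells.every(([r, c]) => {
    const row = piece.row + r;
    const col = piece.col + c;
    return row >= 0 && row < board.length &&
           col >= 0 && col < board[0].length &&
           board[row][col] === 0; // the cell must be empty
  });
}

// In the game loop, before placing each new piece:
// if (!canSpawn(board, nextPiece)) { /* stop the loop and show game over */ }
```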
4. Chess Game
Prompt: You can find the prompt I’ve used here: Link
Response from Claude Opus 4
You can find the code it generated here: Link
Here’s the output of the program:
Now, this is out of this world. It implemented an entire chess game from scratch with no libraries. I thought it would reach for something like Chess.js or another external library, but there you have it: a fully working chess game, even though it misses a few special moves like en passant.
Apart from those missing special moves, the move log records every move perfectly. This is pure insanity!
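For the curious, en passant (the move it skipped) is genuinely fiddly: a pawn that has just advanced two squares can be captured “in passing” by an adjacent enemy pawn, but only on the very next move. A rough sketch of the detection, assuming a simple move-log entry shape of my own invention:

```javascript
// Rough sketch of en passant detection (not Opus's code; the move-log
// entry shape here is a made-up assumption for illustration).
function enPassantTarget(lastMove) {
  // lastMove: { piece: 'pawn', fromRow, toRow, toCol }
  if (lastMove && lastMove.piece === 'pawn' &&
      Math.abs(lastMove.toRow - lastMove.fromRow) === 2) {
    // The capturable square is the one the pawn jumped over.
    return {
      row: (lastMove.fromRow + lastMove.toRow) / 2,
      col: lastMove.toCol,
    };
  }
  return null; // no en passant capture available this turn
}
```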
Response from Gemini 2.5 Pro
You can find the code it generated here: Link
Here’s the output of the program:
Gemini 2.5 Pro also decided to implement everything from scratch, and it even attempted special moves such as en passant, not just the basic piece moves.
The game seemed fine overall, but the soul of chess is missing: the pieces are just there; they don’t move. It looks like a minor issue that a follow-up prompt should fix, but the model didn’t get it right in one shot.

You can find its updated code from the follow-up prompt here: Link
Response from OpenAI o3
You can find the code it generated here: Link
Here’s the output of the program:

OpenAI o3 took a more pragmatic approach and decided to use Chess.js, which is what I’d prefer if I were building a production-level chess game, but the implementation didn’t hold up.
The external Chess.js import didn’t work: the code fails because it tries to use a Chess object that is undefined, since the library never actually loads.
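For what it’s worth, the fix is usually just making sure the library is actually loaded before anything constructs a Chess instance. A hedged sketch of a working setup (the CDN path is my assumption, and this is not o3’s code):

```javascript
// Hedged sketch (not o3's code): load chess.js as an ES module so the
// Chess class exists before it is used. Needs a <script type="module"> tag;
// the CDN URL here is an assumption, any ESM source of chess.js works.
import { Chess } from 'https://cdn.jsdelivr.net/npm/chess.js/+esm';

const game = new Chess();      // standard starting position
game.move('e4');               // make a move in SAN notation
console.log(game.ascii());     // print the current board to the console
console.log(game.moves());     // list the legal moves for the side to move
```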
Conclusion
Did we get a clear winner here? Yes, absolutely, and it’s Claude Opus 4.
Anthropic is doing some real magic with these Claude models. Opus has taste and a pleasant personality that makes you want to keep talking to it. That said, I do like that Gemini 2.5 Pro is freely available with fairly generous rate limits.
Claude Opus 4 is expensive, so use Opus if your company is footing the bill; otherwise, Gemini 2.5 Pro is your friend.