Lovable, a vibe coding tool, says Claude 4 has reduced its errors by 25% and made it 40% faster.
On May 22, Anthropic started rolling out two new models: Claude Sonnet 4 and Claude Opus 4. While Sonnet 4 is available to free users, Opus 4 requires a paid subscription and performs better than Sonnet 4 at coding.
In a blog post, Anthropic confirmed that Claude Opus 4 scored 72.5 percent on SWE-bench (short for Software Engineering Benchmark).
In the tests, Opus 4 delivered sustained performance on long-running tasks that require focused effort and thousands of steps.
Anthropic also claimed that its newest model worked on the code for seven hours straight.
Vibe coding company Lovable, which uses Claude in its "AI-powered prompt-based web and apps builder" tool, has observed similar improvements after upgrading to Claude 4.
In a post on X, Lovable says it has seen 25% fewer errors and become 40% faster overall after deploying Claude 4 for both project creation and edits on all projects (including old projects).

In a separate post, Lovable founder Anton Osika confirmed that "Claude 4 just erased most of Lovable's errors" while specifically referring to LLM syntax errors when vibe coding.
Claude 4 is a good model for coding
While opinion on Claude 4 remains mixed, I've personally noticed that Claude 4 does produce code with fewer errors than Gemini when I'm working on Dart/Kotlin apps.
This varies from project to project and with context, but in projects where a longer context is not required, Claude 4 did better than Gemini in my tests.
Claude models have long held the reputation of "best at coding," but there has been stiff competition lately from Google, which released Gemini 2.5 Pro with a 1 million token context window.
Compared to the 200,000-token context window of Claude 4 or older models, the 1 million token context window of Gemini 2.5 does give it an advantage. But it doesn't necessarily mean Gemini 2.5 is better than Claude 4 at coding.
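To put the context window gap in practical terms, here is a rough sketch that checks whether a given amount of source text would fit in each window. The 4-characters-per-token ratio is only a common rule of thumb and not either vendor's actual tokenizer, and the function and variable names are hypothetical.

```python
# Context window sizes in tokens, as quoted above.
CONTEXT_WINDOWS = {
    "Claude 4": 200_000,
    "Gemini 2.5 Pro": 1_000_000,
}

# Assumption: roughly 4 characters per token. Real tokenizers vary.
CHARS_PER_TOKEN = 4


def estimate_tokens(char_count: int) -> int:
    """Return a rough token estimate for a given number of characters."""
    return -(-char_count // CHARS_PER_TOKEN)  # ceiling division


def models_that_fit(char_count: int) -> list[str]:
    """List the models whose context window could hold the estimated tokens."""
    tokens = estimate_tokens(char_count)
    return [name for name, window in CONTEXT_WINDOWS.items() if tokens <= window]
```

Under this estimate, a codebase of about 2 million characters (around 500,000 tokens) would only fit in the larger window, while a 400,000-character project would fit in both. That says nothing about which model writes better code once the project fits.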
Both can be surprisingly brilliant on one task and terrible on the next, and results also come down to how you do prompt engineering.
It often helps to mix the models, such as o3 or Gemini for planning and Claude 4 and Gemini for coding.