Performance results of AI models and agents on Next.js code generation and migration, measuring success rate, execution time, token usage, and quality improvements.
Model Performance Results
Model | Evals | Success Rate | Execution Time | Token Usage |
gpt-5-codex | 50 | 42 % | 42.80 s | 186,082 |
claude-opus-4.1 | 50 | 40 % | 29.47 s | 165,810 |
glm-4.6 | 50 | 40 % | 20.36 s | 106,177 |
grok-4-fast-reasoning | 50 | 38 % | 6.02 s | 137,439 |
grok-4 | 50 | 38 % | 53.10 s | 207,672 |
kimi-k2-turbo | 50 | 38 % | 4.13 s | 82,567 |
gemini-2.5-pro | 50 | 36 % | 50.98 s | 322,147 |
kimi-k2-0905 | 50 | 36 % | 1.82 s | 85,713 |
gpt-5 | 50 | 34 % | 25.62 s | 149,904 |
grok-4-fast-non-reasoning | 50 | 34 % | 3.81 s | 131,962 |
claude-sonnet-4.5 | 50 | 32 % | 11.14 s | 139,310 |
claude-sonnet-4 | 50 | 32 % | 10.27 s | 134,302 |
claude-haiku-4.5 | 50 | 32 % | 6.10 s | 132,122 |
gemini-2.5-flash | 50 | 32 % | 7.52 s | 159,274 |
qwen3-coder | 50 | 32 % | 0.78 s | 89,090 |
qwen3-coder-plus | 50 | 32 % | 5.08 s | 88,820 |
claude-3.7-sonnet | 50 | 30 % | 11.17 s | 166,654 |
gpt-5-mini | 50 | 30 % | 17.15 s | 132,010 |
qwen3-max | 50 | 30 % | 11.57 s | 87,364 |
deepseek-v3.2-exp | 50 | 30 % | 26.77 s | 109,837 |
gpt-oss-120b | 50 | 28 % | 1.39 s | 109,730 |
gemini-2.0-flash | 50 | 26 % | 2.82 s | 99,913 |
gpt-4o | 50 | 26 % | 4.77 s | 81,569 |
gpt-4.1-mini | 50 | 24 % | 6.15 s | 88,294 |
gemini-2.5-flash-lite | 50 | 24 % | 1.35 s | 102,762 |
gemini-2.0-flash-lite | 50 | 22 % | 2.46 s | 98,950 |
gpt-5-nano | 50 | 14 % | 21.29 s | 194,587 |
gpt-4o-mini | 50 | 12 % | 6.85 s | 85,563 |
Agent Performance Results
Agent | Evals | Success Rate |
claude | 50 | 42 % |
cursor | 50 | 30 % |
codex | 50 | 30 % |
gemini | 50 | 28 % |
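The percentages above can be turned back into counts of passing evals. The sketch below is illustrative only: it assumes the second column is the number of evals run and the third is the success rate over those evals, which the source does not state outright. The helper names `parse_row` and `passed_evals` are hypothetical and not part of any benchmark tooling.

```python
def parse_row(line: str) -> dict:
    """Split one pipe-delimited results row into named fields.

    Assumed column order: name, evals run, success rate, and optionally
    execution time in seconds and token usage (model rows only).
    """
    cells = [c.strip() for c in line.split("|") if c.strip()]
    row = {
        "name": cells[0],
        "evals": int(cells[1]),
        "success_pct": int(cells[2].rstrip("%").strip()),
    }
    if len(cells) >= 5:
        row["seconds"] = float(cells[3].rstrip("s").strip())
        row["tokens"] = int(cells[4].replace(",", ""))
    return row


def passed_evals(row: dict) -> int:
    """Number of passing evals implied by the success rate (assumption)."""
    return round(row["evals"] * row["success_pct"] / 100)


row = parse_row("gpt-5-codex | 50 | 42 % | 42.80 s | 186,082 |")
print(row["name"], passed_evals(row))  # gpt-5-codex 21
```

Under that reading, 42 % of 50 evals corresponds to 21 passing evals, and each 2-point step in the success rate corresponds to one eval.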