Next.js AI Model Performance Evaluations


Performance results of AI models and agents on Next.js code generation and migration tasks, measured by success rate, execution time, token usage, and quality improvements.
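The aggregate columns in the tables below follow from straightforward arithmetic over per-eval records: success rate is passing evals divided by total evals, average duration is mean wall-clock seconds per run, and total tokens is a plain sum. The sketch below illustrates that aggregation; the EvalResult shape and summarize() helper are assumptions for illustration, not the evaluation harness's actual API.

```ts
// Minimal sketch of how the table's aggregate columns could be derived.
// The EvalResult shape and summarize() helper are hypothetical, not the
// actual evaluation harness API.
interface EvalResult {
  model: string;
  passed: boolean;          // did the generated/migrated Next.js code pass its checks?
  durationSeconds: number;  // wall-clock time for the run
  tokensUsed: number;       // prompt + completion tokens
}

interface ModelSummary {
  model: string;
  totalEvals: number;
  successRate: number;      // percentage, e.g. 42 means 42%
  avgDurationSeconds: number;
  totalTokens: number;
}

function summarize(model: string, results: EvalResult[]): ModelSummary {
  const runs = results.filter((r) => r.model === model);
  if (runs.length === 0) {
    throw new Error(`No eval results found for model ${model}`);
  }
  const passedCount = runs.filter((r) => r.passed).length;
  const totalDuration = runs.reduce((sum, r) => sum + r.durationSeconds, 0);
  const totalTokens = runs.reduce((sum, r) => sum + r.tokensUsed, 0);
  return {
    model,
    totalEvals: runs.length,
    successRate: (passedCount / runs.length) * 100,
    avgDurationSeconds: totalDuration / runs.length,
    totalTokens,
  };
}
```

For example, 21 passing runs out of 50 evals yields the 42% success rate shown in the top row.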

Model Performance Results

| Model | Total Evals | Success Rate | Avg Duration (s) | Total Tokens |
|---|---|---|---|---|
| gpt-5-codex | 50 | 42% | 42.80 | 186,082 |
| claude-opus-4.1 | 50 | 40% | 29.47 | 165,810 |
| glm-4.6 | 50 | 40% | 20.36 | 106,177 |
| grok-4-fast-reasoning | 50 | 38% | 6.02 | 137,439 |
| grok-4 | 50 | 38% | 53.10 | 207,672 |
| kimi-k2-turbo | 50 | 38% | 4.13 | 82,567 |
| gemini-2.5-pro | 50 | 36% | 50.98 | 322,147 |
| kimi-k2-0905 | 50 | 36% | 1.82 | 85,713 |
| gpt-5 | 50 | 34% | 25.62 | 149,904 |
| grok-4-fast-non-reasoning | 50 | 34% | 3.81 | 131,962 |
| claude-sonnet-4.5 | 50 | 32% | 11.14 | 139,310 |
| claude-sonnet-4 | 50 | 32% | 10.27 | 134,302 |
| claude-haiku-4.5 | 50 | 32% | 6.10 | 132,122 |
| gemini-2.5-flash | 50 | 32% | 7.52 | 159,274 |
| qwen3-coder | 50 | 32% | 0.78 | 89,090 |
| qwen3-coder-plus | 50 | 32% | 5.08 | 88,820 |
| claude-3.7-sonnet | 50 | 30% | 11.17 | 166,654 |
| gpt-5-mini | 50 | 30% | 17.15 | 132,010 |
| qwen3-max | 50 | 30% | 11.57 | 87,364 |
| deepseek-v3.2-exp | 50 | 30% | 26.77 | 109,837 |
| gpt-oss-120b | 50 | 28% | 1.39 | 109,730 |
| gemini-2.0-flash | 50 | 26% | 2.82 | 99,913 |
| gpt-4o | 50 | 26% | 4.77 | 81,569 |
| gpt-4.1-mini | 50 | 24% | 6.15 | 88,294 |
| gemini-2.5-flash-lite | 50 | 24% | 1.35 | 102,762 |
| gemini-2.0-flash-lite | 50 | 22% | 2.46 | 98,950 |
| gpt-5-nano | 50 | 14% | 21.29 | 194,587 |
| gpt-4o-mini | 50 | 12% | 6.85 | 85,563 |

Agent Performance Results

| Agent | Total Evals | Success Rate |
|---|---|---|
| claude | 50 | 42% |
| cursor | 50 | 30% |
| codex | 50 | 30% |
| gemini | 50 | 28% |
