Kimina Prover – Test-time RL to reach 92.2% on miniF2F


Numina & Kimi Team


Figure 1: Performance comparison of theorem proving models on the miniF2F-test dataset.


We're excited to announce the release of Kimina-Prover-72B, our state-of-the-art theorem proving model trained with the Kimi k1.5[1] RL pipeline on top of Qwen2.5-72B[2]. Alongside it, we are also releasing two distilled variants, Kimina-Prover-Distill-8B and Kimina-Prover-Distill-1.7B (based on Qwen3-8B and Qwen3-1.7B[3], respectively).

Our key innovations include:

  • Test-Time Reinforcement Learning Search: A trainable agentic proving framework that enables the model to recursively discover, combine and apply multiple lemmas to construct complex proofs, building on a novel lemma-enabled pattern.

  • Error-Fixing Capability: Kimina-Prover can read and interpret Lean’s error messages and propose targeted fixes, demonstrating significantly higher sample efficiency compared to regenerating proofs from scratch.

These advancements enable Kimina-Prover to solve challenging mathematical problems and surpass prior methods. As shown in Figure 1, on the widely used miniF2F benchmark, Kimina-Prover achieves a state-of-the-art pass rate of 92.2%.

We focus on automated theorem proving (ATP) in the Lean 4 language, aiming to automate the construction of formal mathematical proofs. Recent advances in neural theorem proving have significantly improved the ability of AI systems to assist with or automate this process. Notable progress includes AlphaProof[4] from Google DeepMind, which demonstrated strong performance on problems at the level of the International Mathematical Olympiad. Open-source systems such as DeepSeek-Prover-V2[5], which incorporates reinforcement learning, have also achieved state-of-the-art results. In addition, neuro-symbolic agentic approaches like DSP+[6] have shown that competitive performance is possible without large-scale training by leveraging off-the-shelf models in a modular framework.

Our earlier work, Kimina-Prover Preview[7], introduced a large language model for formal theorem proving in Lean, establishing a new performance baseline on the miniF2F benchmark. Trained with a large-scale reinforcement learning pipeline, the model adopted a reasoning-driven exploration paradigm and demonstrated that larger models can serve as stronger formal reasoners. Its structured reasoning pattern enabled efficient proof search and emulated human-like problem-solving strategies.

Following this initial success, we continued to improve the model through further iterations of reinforcement learning. However, single-step reasoning remains insufficient for solving complex problems that require long, multi-stage proofs. To address this limitation, we introduce the Test-Time Reinforcement Learning (TTRL) search framework, which enables the model to discover, combine, and reuse multiple intermediate lemmas autonomously. This framework supports deeper, long-horizon reasoning by decomposing hard problems into reusable subcomponents.

A key ingredient of TTRL search is the lemma-enabled pattern, which allows the model to identify and apply intermediate lemmas as part of its proof construction process. This structured reuse of intermediate results significantly expands the model’s problem-solving capability beyond single-step generation.

To further enhance robustness, we also integrate an error-fixing mechanism that interprets Lean’s error messages and proposes targeted corrections. This allows the model to refine its outputs through iterative feedback, improving proof reliability and overall sample efficiency.

A New State-of-the-Art

| Model | pass@1 | pass@32 | pass@1024 |
|---|---|---|---|
| Kimina-Prover-1.7B | 46.7 | 73.4 | – |
| DSP+ | 52.5 | 71.3 | 80.7 |
| DeepSeek-Prover-V2-7B | 58.6 | 75.6 | 79.9 |
| Kimina-Prover-8B | 61.1 | 78.3 | – |
| DeepSeek-Prover-V2-671B | 61.9 | 82.4 | 86.6 |
| Kimina-Prover-72B | 63.9 | 84.0 | 87.7 |

Table 1: Performance comparison of theorem proving models on the miniF2F-test dataset under equivalent sampling budgets. Kimina-Prover-72B achieves state-of-the-art performance across all evaluated settings.


The combination of the proposed techniques yields significant improvements in formal theorem proving performance. On the miniF2F benchmark, Kimina-Prover achieves a pass rate of 84.0% at pass@32, and 86.4% with the addition of a single round of error correction. At pass@1024, the pass rate reaches 87.7%. Applying the full Test-Time Reinforcement Learning (TTRL) search framework yields a final pass rate of 92.2%, with an estimated upper bound on the sampling budget of approximately 42,000 attempts. This sampling budget can be substantially reduced in future versions, as a large portion of the current sampling is spent on proving unhelpful or redundant lemmas.

Notably, these results indicate a shift in the prover's scaling behavior. While earlier versions improved roughly linearly in the logarithm of the sampling budget, the current system shows diminishing returns beyond pass@1024. This suggests that further gains depend less on increased sampling and more on sophisticated search strategies, such as those introduced by TTRL.

Lemma-enabled pattern

The lemma-enabled pattern is designed to equip the model with the ability to identify and utilize useful lemmas provided in the input. To support this capability, during reinforcement learning (RL) training, a random subset of one to three formal lemmas is prepended to the problem context, exposing the prover to potentially useful intermediate results that can aid in constructing the final proof. These lemmas are prepared through a two-step pipeline: (1) a general-purpose LLM generates candidate lemmas in natural language; (2) these are then translated into formal statements using our Kimina-Autoformalizer-7b.
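
For illustration, a lemma-enabled input might look like the following minimal sketch; the lemma and theorem here are hypothetical stand-ins, not items from our training data:

```lean
import Mathlib

-- Hypothetical lemma prepended to the problem context; the prover may use
-- it in the final proof or ignore it if it is not helpful.
lemma aux_sq_sum_nonneg (a b : ℝ) : 0 ≤ a ^ 2 + b ^ 2 := by
  positivity

-- Target theorem: here the provided lemma closes the goal directly.
theorem target_problem (x y : ℝ) : 0 ≤ x ^ 2 + y ^ 2 :=
  aux_sq_sum_nonneg x y
```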

Initial observations showed that the model had a low tendency to incorporate the provided lemmas. To address this, we introduced a preference-based reward shaping strategy within the RL framework. When a theorem could be proven through multiple trajectories, solutions that successfully leveraged a provided lemma were assigned a higher reward, while those that did not were penalized. This approach proved effective, increasing the lemma utilization rate to a stable 30–40% after training. Importantly, this method not only encouraged lemma usage but also promoted selectivity: the model learned to apply lemmas strategically when they were useful, while ignoring irrelevant ones, demonstrating more efficient and human-like reasoning behavior.
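
A minimal sketch of this preference-based reward shaping follows; the bonus and penalty magnitudes are illustrative assumptions, not our pipeline's actual values:

```python
def shaped_reward(proof_valid: bool, provided_lemmas: list[str], proof: str) -> float:
    """Preference-based shaping: among valid proofs of the same theorem,
    those that invoke a provided lemma receive a higher reward, while those
    that ignore all of them are penalized. Magnitudes are illustrative."""
    if not proof_valid:                 # rejected by the Lean compiler
        return 0.0
    lemma_used = any(name in proof for name in provided_lemmas)
    return 1.5 if lemma_used else 0.5   # assumed bonus/penalty around base 1.0
```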

TTRL search


Figure 2: Diagram of the Test-Time Reinforcement Learning (TTRL) Search framework. It is composed of three main components: RL training, sublemma generation, and negation filtering. Sublemmas are generated and formalized, then integrated into the training loop with dynamic scoring and pruning. A negation filter ensures logical consistency by discarding invalid lemmas.


The lemma-enabled pattern allowed the model to incorporate pre-generated lemmas as intermediate steps in constructing proofs. However, we observed that randomly sampling and inserting lemmas was insufficient for solving complex problems requiring highly structured and deeply nested reasoning. To address this limitation, we developed the Test-Time Reinforcement Learning Search framework—a trainable, agentic approach that systematically organizes, filters, and composes a large set of candidate lemmas to build complete proofs. This framework transitions the process from random exploration to a more strategic, goal-directed search.

As illustrated in Figure 2, we define the search scope of each problem as the problem itself along with its associated candidate lemmas. TTRL Search tracks a lemma utilization score within each search scope to measure how frequently and effectively each lemma contributes to the final proof. At the beginning of each RL training iteration, for each problem (i.e., search scope), we construct K = 10 input variants by prepending different combinations of lemmas. To balance exploration and exploitation, 60% of the inputs are built using the top-ranking lemmas—those with the highest utilization scores—focusing the model on the most promising proof paths. The remaining 40% are formed by including these top lemmas alongside one to four randomly selected additional lemmas, encouraging exploration of novel and potentially useful lemma combinations.

To ensure quality, a filtering mechanism prunes lemmas that consistently fail to contribute meaningfully: any lemma that fails to achieve a utilization score of at least τ=0.10 after 50 insertion attempts is removed from the search pool.

A key feature of TTRL is its recursive search mechanism. The search scope is not limited to the original theorem: a scope is also maintained for each lemma, allowing the framework to recursively decompose the problem into smaller subproblems. A parallel sublemma generation process runs throughout, and whenever no proof is found for a theorem or lemma after N = 128 attempts, new candidate sublemmas are generated for it. This recursive strategy enables test-time scaling of reasoning depth, significantly extending the model's effective problem-solving capability.
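
The scheduling described in the preceding paragraphs can be sketched as follows. This is a simplified illustration rather than our pipeline code: the function names, the top-lemma cutoff, and the `scope` record are assumptions.

```python
import random

K = 10               # input variants per search scope and iteration
EXPLOIT_FRAC = 0.6   # share of variants built from top-ranking lemmas only
TAU, MIN_TRIES = 0.10, 50   # pruning threshold and minimum insertion attempts
MAX_FAILS = 128      # failed attempts before sublemma generation is triggered

def build_variants(problem: str, lemmas: list[str], score: dict[str, float]):
    """Construct K lemma-prepended inputs for one search scope: 60% exploit
    the highest-utilization lemmas, 40% also mix in 1-4 random extras."""
    top = sorted(lemmas, key=lambda l: score.get(l, 0.0), reverse=True)[:3]
    variants = []
    for i in range(K):
        if i < int(K * EXPLOIT_FRAC):
            chosen = list(top)                           # exploitation
        else:
            extras = random.sample(lemmas, min(len(lemmas), random.randint(1, 4)))
            chosen = list(dict.fromkeys(top + extras))   # exploration, deduped
        variants.append((chosen, problem))
    return variants

def update_scope(scope) -> None:
    """After verification: prune weak lemmas and recurse when stuck.
    `scope` is a hypothetical record of per-lemma stats for one problem."""
    for lemma, (tries, uses) in list(scope.stats.items()):
        if tries >= MIN_TRIES and uses / tries < TAU:
            scope.lemmas.remove(lemma)       # filtering mechanism
    if not scope.solved and scope.failed_attempts >= MAX_FAILS:
        scope.generate_sublemmas()           # recursive decomposition step
```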

To maintain logical soundness, we address a failure mode wherein incorrectly formalized lemmas could lead to trivial or invalid proofs. In such cases, the model may exploit inconsistencies to construct seemingly valid but unsound solutions. To prevent this, we introduce a negation proving process: for each newly generated lemma, we attempt to prove its logical negation. If the negated statement is provable, it indicates that the original lemma is logically inconsistent and is immediately discarded. This step ensures the reliability and soundness of the overall proof construction process.
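
A minimal Lean sketch of the idea, using a hypothetical mis-formalized lemma:

```lean
import Mathlib

-- Hypothetical mis-formalized candidate lemma (a missing hypothesis makes
-- the universally quantified statement false):
--   bad_lemma : ∀ n : ℕ, 0 < n
-- The negation filter attempts to prove its logical negation instead.
-- Because this succeeds (witness n = 0), the candidate is discarded.
theorem bad_lemma_negation : ¬ (∀ n : ℕ, 0 < n) := by
  intro h
  exact absurd (h 0) (lt_irrefl 0)
```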

Enabling Error-Fixing in Kimina-Prover

A critical limitation of recent advanced theorem proving models is their inability to correct proofs based on feedback from the proof assistant, a capability that human users routinely leverage. To address this gap, we developed a dedicated framework to integrate error-fixing capabilities into Kimina-Prover.

SFT Data Generation for Error Correction. General-purpose large language models exhibit a low success rate when interpreting Lean's error messages and proposing valid corrections. To overcome this, we constructed a specialized Supervised Fine-Tuning (SFT) dataset tailored for error correction. This dataset consists of triplets of the form (incorrect proof, Lean feedback, correct proof). To enrich the supervision signal, we prompted Claude 3.7 Sonnet[8] to synthesize step-by-step reasoning chains explaining how the incorrect proof can be transformed into the correct one using the provided feedback. The result is a high-quality dataset that includes not only the initial and corrected proofs but also the intermediate reasoning, facilitating more effective learning.

Batched Failure Replay Strategy. Integrating error correction directly into the reinforcement learning (RL) loop initially proved ineffective due to the low success rate (~1%) of the SFT model in fixing errors, resulting in sparse rewards and unstable training. To address this, we designed the Batched Failure Replay strategy. Rather than attempting to correct errors immediately during an RL iteration, all failed proof attempts from iteration N are collected. In the subsequent iteration N+1, the training batch is composed of a fixed-size union of these prior failures (e.g., 500 samples) and standard problems from the prompt set (e.g., another 500 samples). This ensures consistent and high-volume exposure to error correction tasks in each training step, enabling the model to gradually learn effective error-handling behaviors in a stable and data-efficient manner.
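
A minimal sketch of batch composition under this strategy; the prompt-constructor helpers are hypothetical:

```python
import random

FIX_SLOTS, STD_SLOTS = 500, 500   # fixed batch split, as in the text

def compose_batch(failure_buffer, prompt_set):
    """Batched Failure Replay: failures collected in iteration N are replayed
    in iteration N+1 alongside fresh problems, keeping exposure to
    error-correction tasks high and stable at every training step."""
    fixes = random.sample(failure_buffer, min(FIX_SLOTS, len(failure_buffer)))
    fresh = random.sample(prompt_set, STD_SLOTS)
    batch = [fix_prompt(problem, bad_proof, feedback)      # hypothetical helper
             for problem, bad_proof, feedback in fixes]
    batch += [proof_prompt(problem) for problem in fresh]  # hypothetical helper
    random.shuffle(batch)
    return batch
```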

| Model | 16+16 attempt-and-fix | 32×1 brute-force | 32+32 attempt-and-fix |
|---|---|---|---|
| Kimina-Prover | 35.6 | 28.8 | 44.1 |

Table 2: Performance comparison between error-fixing and brute-force strategies on a selected subset of 59 miniF2F-test problems with the lowest win rates. Under equal sample budgets, the 16+16 error-fixing strategy outperforms the 32×1 brute-force baseline, demonstrating improved sample efficiency.


This training approach led to measurable improvements in the model’s ability to recover from failure. The model advanced from correcting basic syntax errors to resolving complex logical mistakes and, eventually, discovering alternative proof strategies when initial attempts failed. Crucially, this capability improved sample efficiency. As shown in Table 2, we compared different strategies under fixed computational budgets. The 16+16 attempt-and-fix strategy—consisting of 16 initial proof attempts, each followed by one error-fixing attempt—achieved a success rate of 35.6%, outperforming the 32×1 brute-force baseline, which reached 28.8% with 32 independent attempts. Further increasing the sample budget to 32+32 with error correction yielded a success rate of 44.1%. These results demonstrate that enabling the model to correct its own errors is a more efficient use of computational resources than repeated trial-and-error.
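
For concreteness, the attempt-and-fix loop can be sketched as follows; the `model` and `lean` interfaces are hypothetical stand-ins for the prover and the Lean checker:

```python
def attempt_and_fix(problem, model, lean, budget_pairs=16):
    """16+16 strategy: each of 16 initial attempts is followed by at most one
    error-fixing attempt, matching the budget of 32 independent samples."""
    for _ in range(budget_pairs):
        proof = model.prove(problem)
        ok, feedback = lean.check(proof)
        if ok:
            return proof
        fixed = model.fix(problem, proof, feedback)  # one targeted repair
        if lean.check(fixed)[0]:
            return fixed
    return None  # unsolved within budget
```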

Other improvements

In addition to the core TTRL search and error-fixing frameworks, we developed several other techniques to enhance the model's learning process and problem-solving capabilities.

Prompt Set Curation and Iteration. Effective data curation is critical for reinforcement learning (RL) efficiency. Our initial prompt set of over 300K problems was refined to a competition-focused subset of approximately 90K, with an emphasis on problems from sources such as the olympiads-ref subset of NuminaMath 1.5[9]. During RL, we applied dynamic filtering: problems with consistently high success rates were removed to maintain challenge, while persistently difficult problems were decomposed into simpler subproblems or variations using a general-purpose LLM. The final prompt set combines auto-formalized statements for broad coverage, human-annotated problems for quality, and augmented variations for targeted skill development. The curated prompt set will be open-sourced.
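
A minimal sketch of the dynamic filtering step; the thresholds and the `stats`/`llm` interfaces are illustrative assumptions, not the released pipeline:

```python
def refresh_prompt_set(problems, stats, llm):
    """Dynamic filtering during RL: retire problems the model already solves
    reliably, decompose persistently hard ones into simpler variants."""
    kept = []
    for p in problems:
        rate, attempts = stats[p]          # running success rate, attempt count
        if rate > 0.9:                     # consistently solved: remove it
            continue
        if rate == 0.0 and attempts >= 256:
            kept.extend(llm.decompose(p))  # simpler subproblems / variations
        else:
            kept.append(p)
    return kept
```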

Continuous Pre-training on Lean Data. To address the limited Lean proficiency of most base models, we applied Continuous Pre-training (CPT). A 6-billion-token CPT dataset was constructed from multiple sources: 260M tokens from online platforms such as GitHub, 5.5B tokens of compiler-validated rollout data from our RL pipeline, and additional structured data in state–tactic–state and state–tactic–error formats to enhance diversity. This structured data supplements the base model with additional domain knowledge.
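
For illustration, the two structured formats might correspond to records like the following; the field names and error text are our assumptions, not the released schema:

```python
# Illustrative shapes of the structured CPT records.
state_tactic_state = {
    "state_before": "x y : ℝ ⊢ x + y = y + x",
    "tactic": "ring",
    "state_after": "no goals",
}
state_tactic_error = {
    "state_before": "x y : ℝ ⊢ x + y = y + x",
    "tactic": "rfl",
    "error": "The rfl tactic failed ...",  # paraphrased compiler message
}
```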

Random Proof Cut Data Augmentation. To better utilize high-quality, human-annotated proofs that lack intermediate reasoning steps, we developed the Random Proof Cut technique. This method augments such data using two variations:

  • Proof truncation: The final part of a proof is removed, and the model must complete it.
  • Proof infilling: A randomly selected internal block is replaced with the placeholder token sorry, and the model must reconstruct the missing steps.

This strategy allows the model to learn how to extend existing lines of reasoning and to fill in logical gaps, effectively integrating human-generated content into the training process.
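
A minimal sketch of the augmentation, assuming cuts at line granularity (the actual pipeline may cut at tactic or syntax-tree boundaries):

```python
import random

def random_proof_cut(proof_lines: list[str]) -> list[str]:
    """Augment a human-written proof either by truncation (drop the tail and
    let the model complete it) or by infilling (replace an inner block with
    `sorry` for the model to reconstruct)."""
    n = len(proof_lines)
    if n < 3 or random.random() < 0.5:
        cut = random.randint(1, max(1, n - 1))        # proof truncation
        return proof_lines[:cut]
    start = random.randint(1, n - 2)                  # proof infilling
    end = random.randint(start + 1, n - 1)
    return proof_lines[:start] + ["  sorry"] + proof_lines[end:]
```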

Non-proof Problem Solving. Many problems ask for a final answer rather than a proof, and thus cannot be naturally framed as a traditional proving task. Inspired by the CombiBench[10] evaluation methodology, we introduced a non-proof problem solving capability that unifies answer generation with proof construction. In this setting, the model is given a problem with the final answer field left blank. It performs a two-stage process within a single inference: it first deduces the correct answer and then generates a complete formal proof to justify that answer. This approach ensures the model's reasoning is explicitly conditioned on reaching the correct solution.
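
A hypothetical Lean illustration of this format, with the answer left as a hole:

```lean
-- The answer definition is left blank; in a single inference the model
-- first fills in the value, then proves the statement that justifies it.
def answer : ℕ := sorry      -- model deduces the value, e.g. `4`

theorem problem_statement : 1 + 3 = answer := by
  sorry                      -- model then proves the goal for its answer
```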

Here we provide some proof cases found by our newest Kimina-Prover model. The complete list of solved miniF2F problems is available on GitHub.

TTRL search proof example

The system recursively decomposed the problem into four layers of lemmas using TTRL search, ultimately generating a 520-line formal proof. During this recursive proving process, at one point only a single lemma, sum_cexp_div_pow_ne_zero, remained unsolved. While its conclusion seems straightforward to prove informally, it still takes many steps to prove rigorously in Lean 4. Our system recursively generated two additional layers of lemmas and obtained a 260-line formal proof of sum_cexp_div_pow_ne_zero. This proof is valid under Mathlib v4.15.

Sample proof for imo_1969_p2

Proof structure


Figure 3: Proof dependency graph for the theorem imo_1969_p2, illustrating the structure of lemma references and calls. Each node represents a lemma or theorem, with directed edges indicating dependency relationships (i.e., one lemma calling another). Lemma names are randomly generated during proof construction; for clarity, we manually renamed them post hoc to reflect their mathematical content more accurately.


Formal code

import Mathlib open Finset in theorem norm_le_sum_norm_of_sum_eq_zero {N : ℕ} (hN : N > 0) (z : ℕ → ℂ) (hsum : ∑ i ∈ range N, z i = 0) : ‖z 0‖ ≤ ∑ i ∈ Icc 1 (N - 1), ‖z i‖ := by by_cases h : N > 1 · have h6 : ∑ i in range N, z i = z 0 + ∑ i in Icc 1 (N - 1), z i := by have h10 : range N = {0} ∪ Icc 1 (N - 1) := by ext x simp omega rw [h10] rw [sum_union] · simp · apply disjoint_left.mpr intro x hx1 hx2 simp at hx1 simp [hx1] at hx2 have h3 : z 0 = - (∑ i ∈ Icc 1 (N - 1), z i) := by have h4 : ∑ i in range N, z i = 0 := hsum rw [h6] at h4 calc z 0 = (z 0 + ∑ i in Icc 1 (N - 1), z i) - ∑ i in Icc 1 (N - 1), z i := by ring _ = (0 : ℂ) - ∑ i in Icc 1 (N - 1), z i := by rw [h4] _ = - (∑ i in Icc 1 (N - 1), z i) := by ring have h11 : ‖z 0‖ = ‖(∑ i ∈ Icc 1 (N - 1), z i : ℂ)‖ := by rw [h3] simp [norm_neg] have h12 : ‖(∑ i ∈ Icc 1 (N - 1), z i : ℂ)‖ ≤ ∑ i ∈ Icc 1 (N - 1), ‖z i‖ := by apply norm_sum_le linarith [h11, h12] · have h5 : N = 1 := by omega rw [h5] have h6 : ∑ i in range 1, z i = 0 := by have h7 : N = 1 := h5 have h8 : ∑ i in range N, z i = 0 := hsum rw [show N = 1 by omega] at h8 simpa using h8 have h7 : z 0 = 0 := by simp at h6 all_goals try tauto rw [show z 0 = (0 : ℂ) by exact h7] simp open Finset in theorem sum_pow_one_half_from_two_to_N {N : ℕ} (hN : 0 < N) : ∑ j ∈ Icc 1 (N - 1), (1 / 2 : ℝ) ^ (j + 1) = 1 / 2 - 1 / 2 ^ N := by have h2 : ∀ (k : ℕ), ∑ j in Icc 1 (k), (1 / 2 : ℝ) ^ (j + 1) = 1 / 2 - 1 / 2 ^ (k + 1 : ℕ) := by intro k induction k with | zero => simp | succ k ihk => rw [sum_Icc_succ_top (by linarith)] rw [ihk] field_simp at * ring_nf have h4 : ∑ j in Icc 1 (N - 1), (1 / 2 : ℝ) ^ (j + 1) = 1 / 2 - 1 / 2 ^ (N : ℕ) := by have h5 : ∑ j in Icc 1 (N - 1), (1 / 2 : ℝ) ^ (j + 1) = 1 / 2 - 1 / 2 ^ ((N - 1 : ℕ) + 1 : ℕ) := by apply h2 (N - 1) rw [h5] have h6 : (N - 1 : ℕ) + 1 = N := by omega rw [h6] exact h4 open Complex Finset in theorem sum_eq_zero_of_sq_add_sq_eq_geom_seq_false {N : ℕ} (hN : 0 < N) (x y : Fin N → ℝ) (hxsum : ∑ i : Fin N, x i = 0) (hysum : ∑ i : Fin N, y i = 0) (hxy : ∀ j : Fin N, (x j)^2 + (y j)^2 = (1/4)^(j.1+1)) : False := by let z : ℕ → ℂ := fun i => if h : i < N then (x ⟨i, h⟩ : ℂ) + (y ⟨i, h⟩ : ℂ) * I else 0 have hsumz : ∑ i in range N, z i = 0 := by have h4 : ∑ i in range N, z i = (∑ i : Fin N, ((x i : ℂ) + (y i : ℂ) * I)) := by simp [z, Finset.sum_fin_eq_sum_range, Finset.sum_congr] rw [h4] simp [Complex.ext_iff, Finset.sum_congr, hxsum, hysum] have hz0 : ‖z 0‖ ≤ ∑ i ∈ Icc 1 (N - 1), ‖z i‖ := norm_le_sum_norm_of_sum_eq_zero hN z hsumz have h2 : ‖z 0‖ = (1 / 2 : ℝ) := by have h6 : z 0 = (x ⟨0, by omega⟩ : ℂ) + (y ⟨0, by omega⟩ : ℂ) * I := by have h7 : (0 : ℕ) < N := by omega simp [z, h7] rw [h6] have h7 : ((x ⟨0, by omega⟩ : ℝ) ^ 2 + (y ⟨0, by omega⟩ : ℝ) ^ 2) = (1 / 4 : ℝ) := by have h8 := hxy (⟨0, by omega⟩) norm_num at h8 ⊢ nlinarith have h8 : ‖((x ⟨0, by omega⟩ : ℂ) + (y ⟨0, by omega⟩ : ℂ) * I)‖ = Real.sqrt ((x ⟨0, by omega⟩ : ℝ) ^ 2 + (y ⟨0, by omega⟩ : ℝ) ^ 2) := by simp [Complex.norm_eq_abs, Complex.abs, Complex.normSq] all_goals ring_nf rw [h8, h7] have h9 : Real.sqrt ((1 / 4 : ℝ)) = (1 / 2 : ℝ) := by rw [Real.sqrt_eq_iff_mul_self_eq] <;> norm_num rw [h9] have h3 : ∑ i ∈ Icc 1 (N - 1), ‖z i‖ = ∑ i ∈ Icc 1 (N - 1), (1 / 2 : ℝ) ^ (i + 1 : ℕ) := by apply Finset.sum_congr . rfl . 
intro i hi have h10 : i ≥ 1 := by have h11 : i ∈ Icc (1 : ℕ) (N - 1) := by simpa using hi simp at h11 ⊢ omega have h11 : i ≤ N - 1 := by have h12 : i ∈ Icc (1 : ℕ) (N - 1) := by simpa using hi simp at h12 ⊢ omega have h12 : i < N := by omega have h13 : z i = (x ⟨i, by omega⟩ : ℂ) + (y ⟨i, by omega⟩ : ℂ) * I := by have h14 : i < N := by omega simp [z, h14] have h14 : ‖z i‖ = (1 / 2 : ℝ) ^ (i + 1 : ℕ) := by rw [h13] have h14 : ((x ⟨i, by omega⟩ : ℝ) ^ 2 + (y ⟨i, by omega⟩ : ℝ) ^ 2) = (1 / 4 : ℝ) ^ (i + 1 : ℕ) := by have h15 := hxy (⟨i, by omega⟩) norm_num at h15 ⊢ nlinarith have h15 : ‖((x ⟨i, by omega⟩ : ℂ) + (y ⟨i, by omega⟩ : ℂ) * I)‖ = Real.sqrt ((x ⟨i, by omega⟩ : ℝ) ^ 2 + (y ⟨i, by omega⟩ : ℝ) ^ 2) := by simp [Complex.norm_eq_abs, Complex.abs, Complex.normSq] all_goals ring_nf rw [h15] have h16 : Real.sqrt ((x ⟨i, by omega⟩ : ℝ) ^ 2 + (y ⟨i, by omega⟩ : ℝ) ^ 2) = (1 / 2 : ℝ) ^ (i + 1 : ℕ) := by have h17 : ((x ⟨i, by omega⟩ : ℝ) ^ 2 + (y ⟨i, by omega⟩ : ℝ) ^ 2) = (1 / 4 : ℝ) ^ (i + 1 : ℕ) := h14 rw [h17] have h18 : Real.sqrt ((1 / 4 : ℝ) ^ (i + 1 : ℕ)) = (1 / 2 : ℝ) ^ (i + 1 : ℕ) := by have h19 : ((1 / 4 : ℝ) : ℝ) = (1 / 2 : ℝ) ^ (2 : ℕ) := by norm_num rw [h19] have h20 : Real.sqrt (((1 / 2 : ℝ) ^ (2 : ℕ)) ^ (i + 1 : ℕ)) = (1 / 2 : ℝ) ^ (i + 1 : ℕ) := by have h21 : ((1 / 2 : ℝ) ^ (2 : ℕ)) ^ (i + 1 : ℕ) = ((1 / 2 : ℝ) ^ (i + 1 : ℕ)) ^ 2 := by have h22 : ((1 / 2 : ℝ) ^ (2 : ℕ)) ^ (i + 1 : ℕ) = (1 / 2 : ℝ) ^ (2 * (i + 1 : ℕ) : ℕ) := by rw [← pow_mul] all_goals ring_nf have h23 : ((1 / 2 : ℝ) ^ (i + 1 : ℕ)) ^ 2 = (1 / 2 : ℝ) ^ (2 * (i + 1 : ℕ) : ℕ) := by ring_nf rw [h22, h23] rw [h21] have h22 : Real.sqrt (((1 / 2 : ℝ) ^ (i + 1 : ℕ)) ^ 2) = (1 / 2 : ℝ) ^ (i + 1 : ℕ) := by rw [Real.sqrt_sq (show (1 / 2 : ℝ) ^ (i + 1 : ℕ) ≥ 0 by positivity)] rw [h22] exact h20 exact h18 exact h16 linarith rw [h3] at hz0 rw [h2] at hz0 have h6 : ∑ i ∈ Icc 1 (N - 1), (1 / 2 : ℝ) ^ (i + 1 : ℕ) = (1 / 2 - 1 / 2 ^ N : ℝ) := by have h7 : ∑ i ∈ Icc 1 (N - 1), (1 / 2 : ℝ) ^ (i + 1 : ℕ) = (1 / 2 - 1 / 2 ^ N : ℝ) := sum_pow_one_half_from_two_to_N hN linarith rw [h6] at hz0 have h7 : (1 / 2 ^ N : ℝ) > 0 := by have h10 : (1 / 2 ^ N : ℝ) > 0 := by have h11 : (1 / 2 ^ N : ℝ) = (1 / 2 : ℝ) ^ N := by ring_nf rw [h11] positivity linarith have h11 : (1 / 2 - 1 / 2 ^ N : ℝ) < (1 / 2 : ℝ) := by have h12 : (1 / 2 ^ N : ℝ) > 0 := h7 linarith linarith open BigOperators Real Nat Topology Rat in theorem sum_cexp_div_pow_ne_zero {N : ℕ} (hN : N > 0) (a : Fin N → ℝ) : ((∑ j : Fin N, cos (a j) / 2 ^ (j : ℕ))^2 + (∑ j : Fin N, sin (a j) / 2 ^ (j : ℕ))^2) > 0 := by by_contra h push_neg at h have h1 : (∑ j : Fin N, cos (a j) / 2 ^ (j : ℕ)) ^ 2 ≥ 0 := by apply sq_nonneg have h2 : (∑ j : Fin N, sin (a j) / 2 ^ (j : ℕ)) ^ 2 ≥ 0 := by apply sq_nonneg have h3 : (∑ j : Fin N, cos (a j) / 2 ^ (j : ℕ)) ^ 2 = 0 := by linarith have h4 : (∑ j : Fin N, sin (a j) / 2 ^ (j : ℕ)) ^ 2 = 0 := by linarith have h5 : (∑ j : Fin N, cos (a j) / 2 ^ (j : ℕ)) = 0 := by nlinarith have h6 : (∑ j : Fin N, sin (a j) / 2 ^ (j : ℕ)) = 0 := by nlinarith let x : Fin N → ℝ := fun j => cos (a j) / 2 ^ (j.1 + 1 : ℕ) let y : Fin N → ℝ := fun j => sin (a j) / 2 ^ (j.1 + 1 : ℕ) have hxsum : ∑ i : Fin N, x i = 0 := by have eq2 : ∑ i : Fin N, x i = (∑ i : Fin N, (cos (a i) / 2 ^ (i.1 : ℕ) : ℝ)) / 2 := by have eq3 : ∑ i : Fin N, x i = ∑ i : Fin N, (cos (a i) / 2 ^ (i.1 + 1 : ℕ) : ℝ) := by congr rw [eq3] have eq4 : ∑ i : Fin N, (cos (a i) / 2 ^ (i.1 + 1 : ℕ) : ℝ) = (∑ i : Fin N, (cos (a i) / 2 ^ (i.1 : ℕ) : ℝ)) / 2 := by calc ∑ i : Fin N, (cos (a i) / 2 ^ (i.1 + 1 : ℕ) : ℝ) = ∑ i 
: Fin N, ((cos (a i) / 2 ^ (i.1 : ℕ) : ℝ) / 2) := by apply Finset.sum_congr . rfl intro i hi have h12 : (cos (a i) / 2 ^ (i.1 + 1 : ℕ) : ℝ) = (cos (a i) / 2 ^ (i.1 : ℕ) : ℝ) / 2 := by have h11 : (2 : ℝ) ^ (i.1 + 1 : ℕ) = (2 : ℝ) ^ (i.1 : ℕ) * 2 := by ring_nf rw [h11] field_simp all_goals ring exact h12 _ = (∑ i : Fin N, (cos (a i) / 2 ^ (i.1 : ℕ) : ℝ)) / 2 := by simp [Finset.sum_div] exact eq4 rw [eq2] rw [show ∑ i : Fin N, (cos (a i) / 2 ^ (i.1 : ℕ) : ℝ) = (0 : ℝ) by simpa using h5] all_goals norm_num have hysum : ∑ i : Fin N, y i = 0 := by have eq2 : ∑ i : Fin N, y i = (∑ i : Fin N, (sin (a i) / 2 ^ (i.1 : ℕ) : ℝ)) / 2 := by have eq3 : ∑ i : Fin N, y i = ∑ i : Fin N, (sin (a i) / 2 ^ (i.1 + 1 : ℕ) : ℝ) := by congr rw [eq3] have eq4 : ∑ i : Fin N, (sin (a i) / 2 ^ (i.1 + 1 : ℕ) : ℝ) = (∑ i : Fin N, (sin (a i) / 2 ^ (i.1 : ℕ) : ℝ)) / 2 := by calc ∑ i : Fin N, (sin (a i) / 2 ^ (i.1 + 1 : ℕ) : ℝ) = ∑ i : Fin N, ((sin (a i) / 2 ^ (i.1 : ℕ) : ℝ) / 2) := by apply Finset.sum_congr . rfl intro i hi have h12 : (sin (a i) / 2 ^ (i.1 + 1 : ℕ) : ℝ) = (sin (a i) / 2 ^ (i.1 : ℕ) : ℝ) / 2 := by have h11 : (2 : ℝ) ^ (i.1 + 1 : ℕ) = (2 : ℝ) ^ (i.1 : ℕ) * 2 := by ring_nf rw [h11] field_simp all_goals ring exact h12 _ = (∑ i : Fin N, (sin (a i) / 2 ^ (i.1 : ℕ) : ℝ)) / 2 := by simp [Finset.sum_div] exact eq4 rw [eq2] rw [show ∑ i : Fin N, (sin (a i) / 2 ^ (i.1 : ℕ) : ℝ) = (0 : ℝ) by simpa using h6] all_goals norm_num have hxy : ∀ j : Fin N, (x j) ^ 2 + (y j) ^ 2 = (1 / 4 : ℝ) ^ (j.1 + 1 : ℕ) := by intro j have h11 : cos (a j) ^ 2 + sin (a j) ^ 2 = 1 := by exact Real.cos_sq_add_sin_sq (a j) simp [x, y] have h12 : (4 : ℝ) ^ (j.1 + 1 : ℕ) = (2 : ℝ) ^ (2 * (j.1 + 1 : ℕ)) := by have h13 : (4 : ℝ) = (2 : ℝ) ^ (2 : ℕ) := by norm_num rw [h13] rw [← pow_mul] have h13 : (2 : ℝ) ^ (2 * (j.1 + 1 : ℕ)) = ((2 : ℝ) ^ (j.1 + 1 : ℕ)) ^ 2 := by ring_nf field_simp [h11, h12, h13] all_goals ring_nf have h12 : False := sum_eq_zero_of_sq_add_sq_eq_geom_seq_false (show 0 < N by omega) x y hxsum hysum hxy tauto open BigOperators Real Nat Topology Rat in theorem mul_cos_sub_mul_sin_eq_mul_cos_add (C S : ℝ) (h : C ^ 2 + S ^ 2 ≠ 0) : ∃ R α : ℝ, R > 0 ∧ ∀ x : ℝ, C * cos x - S * sin x = R * cos (x + α) := by have h1 : C ^ 2 + S ^ 2 > 0 := by by_contra h2 push_neg at h2 have h3 : C ^ 2 ≥ 0 := sq_nonneg C have h4 : S ^ 2 ≥ 0 := sq_nonneg S have h5 : C ^ 2 + S ^ 2 = 0 := by nlinarith tauto have h2 : ∃ R : ℝ, R > 0 ∧ R ^ 2 = C ^ 2 + S ^ 2 := by use Real.sqrt (C ^ 2 + S ^ 2) constructor · -- Show that sqrt (C^2 + S^2) > 0 apply Real.sqrt_pos.mpr linarith · -- Show that (sqrt (C^2 + S^2)) ^ 2 = C^2 + S^2 rw [Real.sq_sqrt] positivity rcases h2 with ⟨R, hR_pos, hR_sq⟩ have h12 : (C / R) ^ 2 + (S / R) ^ 2 = 1 := by have h14 : R ≠ 0 := by linarith have h15 : R ^ 2 = C ^ 2 + S ^ 2 := hR_sq field_simp at * nlinarith have h13 : ∃ α : ℝ, Real.cos α = C / R ∧ Real.sin α = S / R := by have h9 : (C / R) ^ 2 + (S / R) ^ 2 = 1 := h12 by_cases hC : C ≥ 0 · -- C ≥ 0 use Real.arcsin (S / R) constructor · -- Show cos (arcsin (S / R)) = C / R have h14 : Real.cos (Real.arcsin (S / R)) = Real.sqrt (1 - (S / R) ^ 2) := by rw [Real.cos_arcsin] have h15 : Real.sqrt (1 - (S / R) ^ 2) = C / R := by have h151 : (C / R) ^ 2 = 1 - (S / R) ^ 2 := by linarith [h9] have h161 : Real.sqrt (1 - (S / R) ^ 2) = Real.sqrt ((C / R) ^ 2) := by rw [show 1 - (S / R) ^ 2 = (C / R) ^ 2 by linarith [h9]] rw [h161] have h171 : Real.sqrt ((C / R) ^ 2) = C / R := by apply Real.sqrt_sq (show 0 ≤ C / R by have h211 : R > 0 := hR_pos apply div_nonneg linarith linarith ) rw [h171] rw [h14, 
h15] · -- Show sin (arcsin (S / R)) = S / R have h20 : -1 ≤ S / R := by have h6 : (S / R) ^ 2 ≤ 1 := by nlinarith [h9] nlinarith [sq_nonneg (S / R - 1), sq_nonneg (S / R + 1)] have h21 : S / R ≤ 1 := by have h6 : (S / R) ^ 2 ≤ 1 := by nlinarith [h9] nlinarith [sq_nonneg (S / R - 1), sq_nonneg (S / R + 1)] have h18 : Real.sin (Real.arcsin (S / R)) = S / R := by apply Real.sin_arcsin all_goals linarith rw [h18] · -- C < 0 have hC2 : C < (0 : ℝ) := by linarith have hC3 : C / R < 0 := by have hR_pos1 : R > 0 := hR_pos apply div_neg_of_neg_of_pos hC2 (by linarith) use Real.pi - Real.arcsin (S / R) constructor · -- Show cos (π - arcsin (S / R)) = C / R have h28 : Real.cos (Real.pi - Real.arcsin (S / R)) = - Real.sqrt (1 - (S / R) ^ 2) := by have h1 : Real.cos (Real.pi - Real.arcsin (S / R)) = - Real.cos (Real.arcsin (S / R)) := by rw [Real.cos_pi_sub] have h2 : Real.cos (Real.arcsin (S / R)) = Real.sqrt (1 - (S / R) ^ 2) := by rw [Real.cos_arcsin] rw [h1, h2] have h29 : Real.sqrt (1 - (S / R) ^ 2) = - (C / R) := by have h301 : (C / R) ^ 2 = 1 - (S / R) ^ 2 := by linarith [h9] have h311 : Real.sqrt (1 - (S / R) ^ 2) = Real.sqrt ((C / R) ^ 2) := by rw [show 1 - (S / R) ^ 2 = (C / R) ^ 2 by linarith [h9]] rw [h311] have h321 : Real.sqrt ((C / R) ^ 2) = - (C / R) := by have h331 : C / R < 0 := hC3 have : Real.sqrt ((C / R) ^ 2) = - (C / R) := by rw [Real.sqrt_sq_eq_abs] rw [abs_of_neg h331] linarith linarith rw [h28, h29] all_goals nlinarith · -- Show sin (π - arcsin (S / R)) = S / R have h30 : Real.sin (Real.pi - Real.arcsin (S / R)) = Real.sin (Real.arcsin (S / R)) := by rw [Real.sin_pi_sub] have h31 : Real.sin (Real.arcsin (S / R)) = S / R := by have h20 : -1 ≤ S / R := by have h6 : (S / R) ^ 2 ≤ 1 := by nlinarith [h9] nlinarith [sq_nonneg (S / R - 1), sq_nonneg (S / R + 1)] have h21 : S / R ≤ 1 := by have h6 : (S / R) ^ 2 ≤ 1 := by nlinarith [h9] nlinarith [sq_nonneg (S / R - 1), sq_nonneg (S / R + 1)] apply Real.sin_arcsin all_goals linarith rw [h30, h31] rcases h13 with ⟨α, h14, h15⟩ use R, α constructor · -- Show R > 0 linarith · -- Show ∀ x : ℝ, C * cos x - S * sin x = R * cos (x + α) intro x have h21 : Real.cos (x + α) = Real.cos x * Real.cos α - Real.sin x * Real.sin α := by rw [Real.cos_add] have h22 : Real.cos α = C / R := by linarith [h14] have h23 : Real.sin α = S / R := by linarith [h15] calc C * Real.cos x - S * Real.sin x = R * (Real.cos x * (C / R) - Real.sin x * (S / R)) := by field_simp [show R ≠ 0 by linarith] all_goals ring _ = R * (Real.cos x * Real.cos α - Real.sin x * Real.sin α) := by rw [show Real.cos α = C / R by linarith [h14], show Real.sin α = S / R by linarith [h15]] _ = R * Real.cos (x + α) := by rw [h21] theorem sum_cos_div_two_pow_eq_mul_cos (N : ℕ) (a : ℕ → ℝ) (hN : N > 0) : ∃ R0 α : ℝ, R0 > 0 ∧ ∀ x : ℝ, ∑ j : Fin N, Real.cos (a j + x) / 2 ^ j.1 = (R0 : ℝ) * Real.cos (x + α) := by have h2 : ((∑ j : Fin N, Real.cos (a j) / 2 ^ (j : ℕ)) ^ 2 + (∑ j : Fin N, Real.sin (a j) / 2 ^ (j : ℕ)) ^ 2) > 0 := by apply sum_cexp_div_pow_ne_zero (by omega) (fun j : Fin N => a j) have h4 : (∑ j : Fin N, Real.cos (a j) / 2 ^ (j : ℕ)) ^ 2 + (∑ j : Fin N, Real.sin (a j) / 2 ^ (j : ℕ)) ^ 2 ≠ 0 := by linarith have h5 : ∃ R α : ℝ, R > 0 ∧ ∀ x : ℝ, (∑ j : Fin N, Real.cos (a j) / 2 ^ (j : ℕ)) * Real.cos x - (∑ j : Fin N, Real.sin (a j) / 2 ^ (j : ℕ)) * Real.sin x = R * Real.cos (x + α) := by apply mul_cos_sub_mul_sin_eq_mul_cos_add (∑ j : Fin N, Real.cos (a j) / 2 ^ (j : ℕ)) (∑ j : Fin N, Real.sin (a j) / 2 ^ (j : ℕ)) (by exact h4 ) rcases h5 with ⟨R0, α, hR0_pos, h_eq⟩ use R0, α 
constructor . exact hR0_pos . intro x have h_eq2 : ∑ j : Fin N, Real.cos (a j + x) / 2 ^ j.1 = (∑ j : Fin N, Real.cos (a j) / 2 ^ (j : ℕ)) * Real.cos x - (∑ j : Fin N, Real.sin (a j) / 2 ^ (j : ℕ)) * Real.sin x := by have h8 : ∀ j : Fin N, Real.cos (a j + x) / (2 : ℝ) ^ (j : ℕ) = (Real.cos (a j) * Real.cos x - Real.sin (a j) * Real.sin x) / (2 : ℝ) ^ (j : ℕ) := by intro j have h9 : Real.cos (a j + x) = Real.cos (a j) * Real.cos x - Real.sin (a j) * Real.sin x := by rw [Real.cos_add] rw [h9] have h10 : ∑ j : Fin N, Real.cos (a j + x) / 2 ^ j.1 = ∑ j : Fin N, ((Real.cos (a j) * Real.cos x - Real.sin (a j) * Real.sin x) / (2 : ℝ) ^ (j : ℕ)) := by apply Finset.sum_congr . rfl . intro j hj have h9 : Real.cos (a j + x) / (2 : ℝ) ^ (j : ℕ) = (Real.cos (a j) * Real.cos x - Real.sin (a j) * Real.sin x) / (2 : ℝ) ^ (j : ℕ) := h8 j simpa using h9 rw [h10] have h11 : ∑ j : Fin N, ((Real.cos (a j) * Real.cos x - Real.sin (a j) * Real.sin x) / (2 : ℝ) ^ (j : ℕ)) = (∑ j : Fin N, Real.cos (a j) / (2 : ℝ) ^ (j : ℕ)) * Real.cos x - (∑ j : Fin N, Real.sin (a j) / (2 : ℝ) ^ (j : ℕ)) * Real.sin x := by have h12 : ∀ j : Fin N, ((Real.cos (a j) * Real.cos x - Real.sin (a j) * Real.sin x) / (2 : ℝ) ^ (j : ℕ)) = (Real.cos (a j) / (2 : ℝ) ^ (j : ℕ)) * Real.cos x - (Real.sin (a j) / (2 : ℝ) ^ (j : ℕ)) * Real.sin x := by intro j ring_nf calc ∑ j : Fin N, ((Real.cos (a j) * Real.cos x - Real.sin (a j) * Real.sin x) / (2 : ℝ) ^ (j : ℕ)) = ∑ j : Fin N, ((Real.cos (a j) / (2 : ℝ) ^ (j : ℕ)) * Real.cos x - (Real.sin (a j) / (2 : ℝ) ^ (j : ℕ)) * Real.sin x) := by apply Finset.sum_congr . rfl . intro j hj specialize h12 j linarith _ = (∑ j : Fin N, (Real.cos (a j) / (2 : ℝ) ^ (j : ℕ)) * Real.cos x) - (∑ j : Fin N, (Real.sin (a j) / (2 : ℝ) ^ (j : ℕ)) * Real.sin x) := by rw [Finset.sum_sub_distrib] _ = (∑ j : Fin N, Real.cos (a j) / (2 : ℝ) ^ (j : ℕ)) * Real.cos x - (∑ j : Fin N, Real.sin (a j) / (2 : ℝ) ^ (j : ℕ)) * Real.sin x := by rw [Finset.sum_mul, Finset.sum_mul] exact h11 rw [h_eq2] specialize h_eq x linarith open Real Set in theorem sub_eq_int_mul_pi_of_cos_eq_zero : ∀ (θ₁ θ₂ : ℝ), cos θ₁ = 0 → cos θ₂ = 0 → ∃ m : ℤ, θ₂ - θ₁ = m * π := by intro θ₁ θ₂ h₁ h₂ have h1 : ∃ k : ℤ, θ₁ = Real.pi / 2 + k * Real.pi := by rw [Real.cos_eq_zero_iff] at h₁ rcases h₁ with ⟨k, hk⟩ use k linarith rcases h1 with ⟨k, hk1⟩ have h2 : ∃ l : ℤ, θ₂ = Real.pi / 2 + l * Real.pi := by rw [Real.cos_eq_zero_iff] at h₂ rcases h₂ with ⟨l, hl⟩ use l linarith rcases h2 with ⟨l, hl2⟩ use l - k rw [hl2, hk1] field_simp at * <;> ring_nf <;> norm_num open BigOperators Real Nat Topology Rat in theorem imo_1969_p2 (m n : ℝ) (k : ℕ) (a : ℕ → ℝ) (y : ℝ → ℝ) (h₀ : 0 < k) (h₁ : ∀ x, y x = ∑ i in Finset.range k, Real.cos (a i + x) / 2 ^ i) (h₂ : y m = 0) (h₃ : y n = 0) : ∃ t : ℤ, m - n = t * Real.pi := by have h4 : y m = ∑ i in Finset.range k, Real.cos (a i + m) / 2 ^ i := by specialize h₁ m simpa using h₁ have h5 : y n = ∑ i in Finset.range k, Real.cos (a i + n) / 2 ^ i := by specialize h₁ n simpa using h₁ rw [h4] at h₂ rw [h5] at h₃ have h9 : ∃ R0 α : ℝ, R0 > 0 ∧ ∀ x : ℝ, ∑ j : Fin k, Real.cos (a j + x) / 2 ^ j.1 = R0 * Real.cos (x + α) := by apply sum_cos_div_two_pow_eq_mul_cos k (fun (j : ℕ) => a j) (by omega) rcases h9 with ⟨R0, α, hR0, h_eq3⟩ have h10 : ∑ i in Finset.range k, Real.cos (a i + m) / 2 ^ i = R0 * Real.cos (m + α) := by have h11 : ∑ i in Finset.range k, Real.cos (a i + m) / 2 ^ i = ∑ j : Fin k, Real.cos (a j + m) / 2 ^ j.1 := by simp [Finset.sum_range] rw [h11] specialize h_eq3 m simpa using h_eq3 have h11 : ∑ i in Finset.range k, Real.cos (a 
i + n) / 2 ^ i = R0 * Real.cos (n + α) := by have h12 : ∑ i in Finset.range k, Real.cos (a i + n) / 2 ^ i = ∑ j : Fin k, Real.cos (a j + n) / 2 ^ j.1 := by simp [Finset.sum_range] rw [h12] specialize h_eq3 n simpa using h_eq3 have h10' : R0 * Real.cos (m + α) = 0 := by have h_eq10 : ∑ i in Finset.range k, Real.cos (a i + m) / 2 ^ i = 0 := by linarith [h₂, h10] linarith [h10, h_eq10] have h11' : R0 * Real.cos (n + α) = 0 := by have h_eq11 : ∑ i in Finset.range k, Real.cos (a i + n) / 2 ^ i = 0 := by linarith [h₃, h11] linarith [h11, h_eq11] have h12 : Real.cos (m + α) = 0 := by apply (mul_left_inj' (ne_of_gt hR0)).mp linarith have h13 : Real.cos (n + α) = 0 := by apply (mul_left_inj' (ne_of_gt hR0)).mp linarith have h14 : ∃ m1 : ℤ, (n + α) - (m + α) = m1 * Real.pi := by apply sub_eq_int_mul_pi_of_cos_eq_zero (m + α) (n + α) h12 h13 rcases h14 with ⟨m1, hm1⟩ use -m1 have hm4 : (m : ℝ) - (n : ℝ) = - (m1 * Real.pi) := by have hm3 : (n + α : ℝ) - (m + α : ℝ) = (m1 : ℝ) * Real.pi := by linarith [hm1] nlinarith have hm5 : (-m1 : ℤ) * Real.pi = - (m1 * Real.pi) := by simp rw [hm5] linarith [hm4]

Error-fixing example

In the following example, the initial proof timed out because directly applying simp to ((Real.sqrt 3 / 2 : ℂ) + (1 / 2) * Complex.I) ^ 12 resulted in excessive search complexity. Breaking the proof into step-by-step calculations of intermediate powers ($z^3$, then $z^6$, and finally $z^{12}$) significantly reduced the computational burden on the compiler.

Sample proof with error fixing

Proof before error fixing

import Mathlib

theorem algebra_299086 {z : ℂ} (hz : z = (Real.sqrt 3 / 2) + (1 / 2) * .I) :
    z ^ 2016 = 1 := by
  have h1 : z = (Real.sqrt 3 / 2) + (1 / 2) * Complex.I := hz
  -- Prove z^12 = 1
  have h2 : z ^ 12 = 1 := by
    rw [h1]
    have h3 : ((Real.sqrt 3 / 2 : ℂ) + (1 / 2) * Complex.I) ^ 12 = 1 := by
      simp [pow_succ, pow_zero, Complex.ext_iff, Complex.I_mul_I, mul_add, add_mul,
        mul_comm, mul_assoc, pow_two, pow_three, pow_succ, Real.sqrt_sq]
      field_simp [Complex.ext_iff, mul_add, add_mul, pow_two, pow_three] <;>
        ring_nf <;> norm_num [Real.sqrt_sq]
    exact h3
  calc z ^ 2016 = (z ^ 12) ^ 168 := by ring
    _ = 1 ^ 168 := by rw [h2]
    _ = 1 := by norm_num

Error message

Error message: Lean process timed out

Proof after error fixing

import Mathlib

theorem algebra_299086 {z : ℂ} (hz : z = (Real.sqrt 3 / 2) + (1 / 2) * .I) :
    z ^ 2016 = 1 := by
  have h1 : z = (Real.sqrt 3 / 2) + (1 / 2) * Complex.I := by exact_mod_cast hz
  have h2 : z ^ 3 = Complex.I := by
    rw [h1]
    simp [pow_three, Complex.ext_iff, Complex.add_re, Complex.add_im, Complex.mul_re,
      Complex.mul_im, mul_add, add_mul, pow_two, mul_comm, mul_assoc]
    field_simp <;> ring_nf <;> norm_num
  have h3 : z ^ 6 = -1 := by
    calc z ^ 6 = (z ^ 3) ^ 2 := by ring
      _ = Complex.I ^ 2 := by rw [h2]
      _ = -1 := by simp [Complex.I_sq]
  have h4 : z ^ 12 = 1 := by
    calc z ^ 12 = (z ^ 6) ^ 2 := by ring
      _ = (-1) ^ 2 := by rw [h3]
      _ = 1 := by norm_num
  calc z ^ 2016 = (z ^ 12) ^ 168 := by ring_nf
    _ = (1) ^ 168 := by rw [h4]
    _ = 1 := by norm_num

If you'd like to cite Kimina-Prover, please feel free to reference our preview paper for now. A full paper is currently in preparation and will be available soon.

@article{kimina_prover_2025,
  title  = {Kimina-Prover Preview: Towards Large Formal Reasoning Models with Reinforcement Learning},
  author = {Wang, Haiming and Unsal, Mert and Lin, Xiaohan and Baksys, Mantas and Liu, Junqi and Santos, Marco Dos and Sung, Flood and Vinyes, Marina and Ying, Zhenzhe and Zhu, Zekai and Lu, Jianqiao and Saxcé, Hugues de and Bailey, Bolton and Song, Chendong and Xiao, Chenjun and Zhang, Dehao and Zhang, Ebony and Pu, Frederick and Zhu, Han and Liu, Jiawei and Bayer, Jonas and Michel, Julien and Yu, Longhui and Dreyfus-Schmidt, Léo and Tunstall, Lewis and Pagani, Luigi and Machado, Moreira and Bourigault, Pauline and Wang, Ran and Polu, Stanislas and Barroyer, Thibaut and Li, Wen-Ding and Niu, Yazhe and Fleureau, Yann and Hu, Yangyang and Yu, Zhouliang and Wang, Zihan and Yang, Zhilin and Liu, Zhengying and Li, Jia},
  year   = {2025},
  url    = {http://arxiv.org/abs/2504.11354}
}

[1]Team, Kimi, et al. "Kimi k1.5: Scaling reinforcement learning with LLMs." arXiv preprint arXiv:2501.12599 (2025).

[2]Qwen, et al. "Qwen2.5 technical report." arXiv preprint arXiv:2412.15115 (2024).

[3]Yang, An, et al. "Qwen3 technical report." arXiv preprint arXiv:2505.09388 (2025).

[4]https://deepmind.google/discover/blog/ai-solves-imo-problems-at-silver-medal-level/

[5]Ren, Z. Z., et al. "Deepseek-prover-v2: Advancing formal mathematical reasoning via reinforcement learning for subgoal decomposition." arXiv preprint arXiv:2504.21801 (2025).

[6]Cao, Chenrui, et al. "Reviving DSP for Advanced Theorem Proving in the Era of Reasoning Models." arXiv preprint arXiv:2506.11487 (2025).

[7]Wang, Haiming, et al. "Kimina-prover preview: Towards large formal reasoning models with reinforcement learning." arXiv preprint arXiv:2504.11354 (2025).

[8]https://assets.anthropic.com/m/785e231869ea8b3b/original/claude-3-7-sonnet-system-card.pdf

[9]Li, Jia, et al. "NuminaMath: The largest public dataset in AI4Maths with 860k pairs of competition math problems and solutions." Hugging Face repository 13 (2024): 9.

[10]Liu, Junqi, et al. "CombiBench: Benchmarking LLM capability for combinatorial mathematics." arXiv preprint arXiv:2505.03171 (2025).
