Ring-1T: Trillion-Parameter Model Trained with RLVR and RLHF


Ant Ling

“The Flow State of Insight, Born of Epiphany”

Today, we officially announce the release of Ring-1T, the first open-source thinking model at the trillion-parameter scale. Download links and chat entry points are provided at the end of this article.

Ring-1T builds on the preview version released at the end of last month, extending it with continued large-scale Reinforcement Learning with Verifiable Rewards (RLVR) training to further unleash the natural-language reasoning capabilities of the trillion-parameter foundation model. In addition, RLHF (Reinforcement Learning from Human Feedback) training was applied to improve the model's general utility, giving the newly released Ring-1T more balanced performance across tasks.
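Unlike RLHF, RLVR scores each rollout with a programmatic check rather than a learned reward model. As a rough illustration only (not Ring-1T's actual reward code; the \boxed{...} extraction and the exact-match rule are assumptions), a verifiable reward for a math prompt can be as simple as:

```python
import re

def extract_boxed_answer(completion: str) -> str | None:
    """Pull the last \\boxed{...} expression out of a model completion."""
    matches = re.findall(r"\\boxed\{([^}]*)\}", completion)
    return matches[-1].strip() if matches else None

def math_reward(completion: str, reference: str) -> float:
    """Binary verifiable reward: 1.0 if the extracted answer matches the reference."""
    answer = extract_boxed_answer(completion)
    return 1.0 if answer is not None and answer == reference else 0.0

# Example: reward signal for one RLVR rollout on a math prompt
print(math_reward("... so the result is \\boxed{2112}.", "2112"))  # 1.0
```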

Ring-1T inherits the Ling 2.0 architecture and is trained from the Ling-1T-base foundation model, which features 1T total parameters with 50B active parameters per token and supports a context window of up to 128K tokens. Leveraging our proprietary stable reinforcement learning training method, Icepop (Popsicle), and our highly efficient reinforcement learning system, ASystem (whose AReaL framework is already open-sourced), we have achieved stable reinforcement learning scaling of the MoE (Mixture-of-Experts) architecture from the ten-billion (Ring-mini-2.0) to the hundred-billion (Ring-flash-2.0) and finally to the trillion-parameter (Ring-1T) scale, significantly boosting the model's deep thinking and natural language reasoning abilities.
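The gap between total and active parameters comes from sparse MoE routing: each token passes through only a small top-k subset of the experts. The toy calculation below shows the bookkeeping; the expert counts and parameter splits are invented for illustration and are not Ring-1T's actual layout.

```python
def active_params(routed_expert_params: float, num_experts: int, top_k: int,
                  shared_params: float) -> float:
    """Parameters touched per token in a sparse MoE stack: always-on weights
    (attention, embeddings, shared experts) plus the top-k routed experts."""
    return shared_params + routed_expert_params * (top_k / num_experts)

# Illustrative numbers only (NOT Ring-1T's real expert layout): ~960B of routed
# expert weights spread over 256 experts with 8 active per token, plus ~20B of
# always-on weights, gives roughly 50B active parameters per forward pass.
print(f"{active_params(960e9, 256, 8, 20e9) / 1e9:.0f}B")  # ~50B
```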

Continuously Evolving Deep Thinking Capability

To evaluate the deep thinking capability of Ring-1T, we selected representative open-source thinking models (Ring-1T-preview, Deepseek-V3.1-Terminus-Thinking, Qwen-235B-A22B-Thinking-2507) and closed-source APIs (Gemini-2.5-pro and GPT-5-Thinking (High)) as benchmarks. First, Ring-1T shows more balanced performance across all tasks than the previously open-sourced preview version. Furthermore, Ring-1T achieves open-source state-of-the-art results on high-difficulty reasoning benchmarks such as mathematical competitions (AIME 25, HMMT 25), code generation (LiveCodeBench, Codeforces), and logical reasoning (ARC-AGI-1). It also demonstrates strong competitiveness in comprehensive tasks (Arena-Hard-v2.0), healthcare (HealthBench), and creative writing (Creative Writing v3).


While we have implemented string and semantic-level filtering to mitigate benchmark contamination across all training stages — including pre-training, instruction fine-tuning, and RL prompts — strict de-contamination remains a major industry challenge for older benchmarks. To provide a more objective analysis of Ring-1T’s deep thinking capability, we tested it on the IMO 2025 (International Mathematical Olympiad), held in July this year, and the ICPC World Finals 2025 (International Collegiate Programming Contest World Finals), which concluded last month.
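String-level decontamination of this kind is commonly implemented as n-gram overlap filtering against benchmark items. A minimal sketch of the idea follows; the 13-gram window, whitespace tokenization, and any-overlap rule are assumptions for illustration, not the team's actual pipeline.

```python
def ngrams(text: str, n: int = 13) -> set[tuple[str, ...]]:
    """Whitespace-tokenized word n-grams, lowercased for loose string matching."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def is_contaminated(sample: str, benchmark_items: list[str], n: int = 13) -> bool:
    """Flag a training sample if it shares any n-gram with a benchmark item."""
    sample_grams = ngrams(sample, n)
    return any(sample_grams & ngrams(item, n) for item in benchmark_items)
```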

The IMO 2025 test followed a setup similar to the preview version's: we integrated Ring-1T into the multi-agent framework AWorld (https://github.com/inclusionAI/AWorld) and used pure natural language reasoning to solve the problems. Ring-1T solved problems 1, 3, 4, and 5 in a single attempt, reaching IMO silver medal standard. On its third attempt, it also produced a near-perfect proof for the geometry problem (problem 2). For the most difficult problem, problem 6 (which no AI contestant solved correctly at IMO 2025), its answer converged to "4048," identical to the output of Gemini 2.5 Pro (the correct answer is 2112). We believe that with ongoing optimization, Ring-1T has a strong chance of reaching IMO gold medal level in a single attempt in the future.

IMO 2025 Test Results

IMO 2025 questions and answers


IMO 2025 summarized test results

In the ICPC World Finals 2025, we compared GPT-5-thinking, Gemini-2.5-pro, and Ring-1T. In the direct model solution test, with up to three attempts per problem, they solved 6 (CDEFKL), 3 (DFK), and 5 (DFJKL) problems, respectively, highlighting Ring-1T's notable performance in a top-tier international coding competition. More tests are in progress, and we will open-source the model's solution trajectories for these competitions (IMO trajectory link at the end); we look forward to the community helping to unlock the reasoning potential of this trillion-parameter thinking model.

Icepop (Popsicle): Safeguarding Long-Cycle RL Training

In reinforcement learning training of MoE models, the implementation discrepancy between the training and inference engines is significantly more pronounced than in dense models, and it widens as generated sequences grow longer and training steps accumulate. As demonstrated in the experiment below, the original GRPO algorithm begins to collapse within a small number of training steps. Our proposed icepop algorithm corrects the distribution with a masked bidirectional truncation technique, effectively reducing the train-inference discrepancy and suppressing its sharp growth.


Figure 1: GRPO train-inference discrepancy increases exponentially with training, while icepop remains relatively stable.

Figure 2: The maximum train-inference discrepancy shows a very sharp increase for GRPO with training, while icepop is maintained at a low level.
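Conceptually, icepop compares the probability each engine assigned to the sampled tokens and discards tokens whose train/inference ratio has drifted outside a trusted band, rather than letting them contribute biased gradients. The PyTorch sketch below captures that masked bidirectional truncation in its simplest form; the band [0.5, 2.0], the tensor layout, and the surrogate loss are assumptions for illustration, not the published icepop objective.

```python
import torch

def icepop_style_loss(train_logprobs: torch.Tensor,
                      infer_logprobs: torch.Tensor,
                      advantages: torch.Tensor,
                      low: float = 0.5, high: float = 2.0) -> torch.Tensor:
    """GRPO-style surrogate with masked bidirectional truncation.

    train_logprobs / infer_logprobs: per-token log-probs of the sampled tokens
    under the training engine and the inference (rollout) engine.
    Tokens whose train/inference probability ratio drifts outside [low, high]
    are masked out of the gradient entirely, instead of being clipped, so a
    growing train-inference gap cannot bias the update.
    """
    ratio = torch.exp(train_logprobs - infer_logprobs.detach())
    mask = ((ratio >= low) & (ratio <= high)).float()
    per_token = -advantages * ratio  # importance-weighted policy-gradient term
    return (per_token * mask).sum() / mask.sum().clamp(min=1.0)
```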

ASystem: A Self-Developed RL Framework for Trillion-Scale Training

To guarantee stable and highly efficient reinforcement learning training for the trillion-parameter foundation model, we built a self-developed high-performance reinforcement learning system, ASystem, which adopts a SingleController + SPMD architecture.

In the training and inference engines, we implemented targeted optimizations for memory management and for exchanging weights between training and inference at trillion-parameter scale. A proprietary unified memory pool shared by training and inference enables transparent memory offloading, efficiently releasing memory fragments and reducing the risk of out-of-memory errors. Techniques such as P2P direct communication between GPUs and in-place updates give us zero-redundancy, second-level model weight exchange.

For the RL training framework, we built a hybrid reward system on top of large-scale serverless sandbox technology. The sandboxes start in milliseconds, provide execution environments for more than 10 programming languages, and sustain request throughput of up to 10K/s. We have open-sourced AReaL, and we hope this technology sharing will accelerate RL training and research in the open-source community.
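One job such a sandbox performs in RLVR is executing model-generated code against test cases and turning the results into a reward. The sketch below conveys the idea with a plain local subprocess; the real ASystem sandbox is a large-scale serverless service, and the timeout, interface, and pass-rate reward here are assumptions for illustration.

```python
import subprocess

def sandbox_code_reward(code: str, test_cases: list[tuple[str, str]],
                        timeout: float = 5.0) -> float:
    """Run model-generated Python against (stdin, expected stdout) pairs and
    return the fraction of passing cases as a verifiable reward."""
    passed = 0
    for stdin_data, expected in test_cases:
        try:
            result = subprocess.run(
                ["python3", "-c", code],
                input=stdin_data, capture_output=True,
                text=True, timeout=timeout,
            )
            if result.returncode == 0 and result.stdout.strip() == expected.strip():
                passed += 1
        except subprocess.TimeoutExpired:
            pass  # treat timeouts as failures
    return passed / len(test_cases) if test_cases else 0.0
```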

Demo

Like Ling-1T, Ring-1T also performs strongly on visualization and front-end development tasks.

1. Bouncing Ball Simulation

Demo Video https://vimeo.com/1127206617

2. Solar System Motion Simulation

Demo Video https://vimeo.com/1127206260

3. Fireworks Display

Demo Video https://vimeo.com/1127206046

4. 3D Building Demolition Simulation

Demo Video https://vimeo.com/1127205971

5. Memory Match Master Game Development

Demo Video https://vimeo.com/1127205708

Furthermore, Ring-1T can not only solve logic puzzles through reasoning but also directly generate a demo page that visually presents the reasoning process (a minimal reference solver for the riddle below is sketched after the demo list).

6. Farmer, Wolf, Goat, and Cabbage Riddle

Demo Video https://vimeo.com/1127205572
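For reference, the riddle in the demo above has a tiny state space that can be searched exhaustively. The breadth-first search below is an illustrative solver for the puzzle itself and says nothing about how Ring-1T internally reasons about it.

```python
from collections import deque

ITEMS = ("farmer", "wolf", "goat", "cabbage")
UNSAFE = [{"wolf", "goat"}, {"goat", "cabbage"}]  # pairs that cannot be left alone

def safe(state):
    """A state (tuple of 0/1 bank assignments) is safe if no unsafe pair is left without the farmer."""
    left_alone = {item for item, bank in zip(ITEMS, state) if bank != state[0]}
    return not any(pair <= left_alone for pair in UNSAFE)

def solve():
    """BFS from everyone on bank 0 to everyone on bank 1, returning the crossing sequence."""
    start, goal = (0, 0, 0, 0), (1, 1, 1, 1)
    queue, seen = deque([(start, [])]), {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        for i, item in enumerate(ITEMS):
            # The farmer crosses alone (i == 0) or ferries one item that is on his bank.
            if i != 0 and state[i] != state[0]:
                continue
            nxt = list(state)
            nxt[0] ^= 1
            if i != 0:
                nxt[i] ^= 1
            nxt = tuple(nxt)
            if safe(nxt) and nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [f"take {item if i else 'nothing'}"]))

print(solve())  # e.g. take goat, take nothing, take wolf, take goat, take cabbage, take nothing, take goat
```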

Limitations and Future Plans

Ring-1T is the Ant Ling team's first attempt at a trillion-scale deep thinking model. The current model still exhibits a certain probability of identity confusion, language mixing, and repetitive generation. In addition, since its attention architecture retains the GQA (Grouped Query Attention) scheme from Ling 2.0, there is still room to improve inference efficiency in long-context scenarios. We will continue to address these issues in subsequent versions and look forward to feedback from the community. Training of Ring-1T is also ongoing; we will keep tapping into the reasoning potential of this trillion-parameter foundation model and expect to release a more mature, upgraded version soon.

We welcome everyone to visit our open-source repositories and experience pages for download and use.

🤗 HuggingFace: https://huggingface.co/inclusionAI/Ling-1T

🤖 ModelScope: https://modelscope.cn/models/inclusionAI/Ling-1T

GitHub: https://github.com/inclusionAI/Ling-V2

Ling Chat (For China domestic users): https://ling.tbox.cn/chat

ZenMux (For global developers, providing chat testing, API access, and other capabilities): https://zenmux.ai/inclusionai/ring-1t
