OpenThoughts: Data Recipes for Reasoning Models


[Submitted on 4 Jun 2025 (v1), last revised 5 Jun 2025 (this version, v2)]


Abstract: Reasoning models have made rapid progress on many benchmarks involving math, code, and science. Yet, many open questions remain about the best training recipes for reasoning, since state-of-the-art models often rely on proprietary datasets with little to no public information available. To address this, the goal of the OpenThoughts project is to create open-source datasets for training reasoning models. After initial explorations, our OpenThoughts2-1M dataset led to OpenThinker2-32B, the first model trained on public reasoning data to match DeepSeek-R1-Distill-32B on standard reasoning benchmarks such as AIME and LiveCodeBench. We then improve our dataset further by systematically investigating each step of our data generation pipeline with 1,000+ controlled experiments, which led to OpenThoughts3. Scaling the pipeline to 1.2M examples and using QwQ-32B as the teacher yields our OpenThoughts3-7B model, which achieves state-of-the-art results: 53% on AIME 2025, 51% on LiveCodeBench 06/24-01/25, and 54% on GPQA Diamond, improvements of 15.3, 17.2, and 20.5 percentage points over DeepSeek-R1-Distill-Qwen-7B. All of our datasets and models are available at this https URL.
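
As a rough illustration of the distillation step the abstract describes (a teacher model generating long reasoning traces that become supervised fine-tuning data for a smaller student), here is a minimal sketch in Python. It assumes the Hugging Face transformers library and a teacher checkpoint hosted as Qwen/QwQ-32B; the toy question and the output field names are hypothetical and not taken from the paper, and the actual OpenThoughts pipeline includes additional question sourcing and filtering stages.

```python
# Minimal sketch: generate reasoning traces with a QwQ-32B teacher, then keep
# (question, response) pairs as supervised fine-tuning data for a student model.
# Names and prompts here are illustrative assumptions, not the paper's pipeline.
from transformers import AutoModelForCausalLM, AutoTokenizer

teacher_name = "Qwen/QwQ-32B"  # teacher model named in the abstract
tokenizer = AutoTokenizer.from_pretrained(teacher_name)
model = AutoModelForCausalLM.from_pretrained(teacher_name, device_map="auto")

questions = [
    "What is the sum of the first 100 positive integers?",  # toy example
]

sft_pairs = []
for q in questions:
    # Chat-format the question so the teacher emits its full reasoning trace.
    messages = [{"role": "user", "content": q}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output_ids = model.generate(input_ids, max_new_tokens=4096)
    trace = tokenizer.decode(
        output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True
    )
    # One training example: the question plus the teacher's reasoning trace.
    sft_pairs.append({"question": q, "response": trace})
```

The resulting pairs would then be used for standard supervised fine-tuning of a 7B student model, which is the general shape of the distillation setup the abstract reports results for.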

Submission history

From: Etash Guha [view email]
[v1] Wed, 4 Jun 2025 17:25:39 UTC (1,779 KB)
[v2] Thu, 5 Jun 2025 02:21:52 UTC (1,779 KB)
