CoT-Self-Instruct: Synthetic prompts for reasoning and non-reasoning tasks


[Submitted on 31 Jul 2025]


Abstract: We propose CoT-Self-Instruct, a synthetic data generation method that instructs LLMs to first reason and plan via Chain-of-Thought (CoT) based on the given seed tasks, and then to generate a new synthetic prompt of similar quality and complexity for use in LLM training, followed by filtering for high-quality data with automatic metrics. In verifiable reasoning, our synthetic data significantly outperforms existing training datasets, such as s1k and OpenMathReasoning, across MATH500, AMC23, AIME24 and GPQA-Diamond. For non-verifiable instruction-following tasks, our method surpasses the performance of human or standard self-instruct prompts on both AlpacaEval 2.0 and Arena-Hard.

Submission history

From: Jason Weston [view email]
[v1] Thu, 31 Jul 2025 17:38:50 UTC (239 KB)
