We generated 50k+ task instances with SWE-smith to train SWE-agent-LM-32B (open-weight SotA on Verified). More in the paper!
SWE-bench Lite is a subset curated for less costly evaluation [Post].
SWE-bench Verified is a human-filtered subset [Post].
SWE-bench Multimodal features issues with visual elements [Post].
Each entry reports the % Resolved metric: the percentage of instances solved out of the total for that split (2,294 for Full, 500 for Verified, 300 for Lite, 517 for Multimodal).
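As a rough illustration of how % Resolved follows from the split sizes above, here is a minimal Python sketch. The function name `percent_resolved`, the `SPLIT_SIZES` table, and the rounding to two decimal places are assumptions made for this example; they are not part of the official SWE-bench evaluation harness.

```python
# Split sizes as stated on this page.
SPLIT_SIZES = {"Full": 2294, "Verified": 500, "Lite": 300, "Multimodal": 517}


def percent_resolved(num_resolved: int, split: str) -> float:
    """Return the percentage of instances resolved for a given split.

    Hypothetical helper: rounding to two decimals is an assumption,
    chosen only to keep the printed value short.
    """
    total = SPLIT_SIZES[split]
    if not 0 <= num_resolved <= total:
        raise ValueError(f"num_resolved must be between 0 and {total}")
    return round(100.0 * num_resolved / total, 2)


if __name__ == "__main__":
    # A made-up count, not a real leaderboard result.
    print(percent_resolved(150, "Lite"))  # 50.0
```

Because the denominator is always the full size of the split, instances a system skips or fails to produce a patch for count as unresolved under this sketch.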
News
- [05/2025] Our new paper SWE-smith is out! Train your own models for software engineering agents. [Link]
- [03/2025] SWE-agent 1.0 is the open-source SOTA on SWE-bench Lite! [Link]
- [10/2024] Introducing SWE-bench Multimodal! [Link]
- [08/2024] SWE-bench x OpenAI = SWE-bench Verified [Report]
- [06/2024] Dockerized SWE-bench for easier evaluation [Report]
- [03/2024] Check out SWE-agent (12.47% on SWE-bench) [Link]
- [03/2024] Released SWE-bench Lite [Report]
Acknowledgements
We thank the following institutions for their generous support: Open Philanthropy, AWS, Modal, Andreessen Horowitz, OpenAI, and Anthropic.