gpt-oss-safeguard: A new milestone for open source safety infrastructure


After months of joint technical work between ROOST and OpenAI, we are releasing one of the core components of OpenAI’s safety stack in the open, giving everyone access to study a moderation model currently used in some of the most consequential products for safety. Called gpt-oss-safeguard, it is a powerful AI model that can be used to tackle a range of complicated online harms, including self-harm content.

“OpenAI is proud to work with ROOST because we believe safety tools should be available for all to study, modify and reuse. We’re looking forward to seeing this happen with gpt-oss-safeguard and to continue building an open ecosystem of safety technologies adapted to the AI era.”

Johannes Heidecke, Head of Safety Systems, OpenAI

ROOST and OpenAI unveiled the news on a panel at the Paris Peace Forum alongside Nobel Peace Prize laureate Maria Ressa; Ryan Beiermeister, Vice President of Product Policy at OpenAI; Martin Tisne, CEO of the AI Collaborative; Dr. Giada Pistilli, Principal Ethicist at Hugging Face; and Camille François, Founding President of ROOST and Professor at Columbia University.

OpenAI is the second major tech company to open source critical safety infrastructure through ROOST. Hear from Fidji Simo, OpenAI's CEO of Applications, on the launch:

About gpt-oss-safeguard

gpt-oss-safeguard, which will be publicly available with open weights under an Apache 2.0 license, was designed specifically to help tackle complex online harms, one of the more challenging parts of managing online platforms.

This technology will stand as a permanent shared resource, part of the online safety commons that can never be withdrawn or rolled back.

gpt-oss-safeguard has two technical characteristics that make its release particularly significant:

  • It is a reasoning model, so it can explain why it made each moderation decision. This creates transparency for the organizations using it, which is particularly valuable in nuanced domains.

  • It is configurable to any policy (“bring your own policy,” BYOP), so organizations don’t have to adopt OpenAI’s rules and standards; they can define and apply their own content policies. This is also critical for tackling emerging harms, since organizations may need to evolve their policies rapidly (see the sketch after this list).
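To make the BYOP design concrete, here is a minimal sketch of how an organization might pair its own policy with the model via the Hugging Face transformers library. The model ID, policy wording, and prompt layout are illustrative assumptions, not confirmed details; the user guide linked below documents the supported format.

```python
# Minimal BYOP sketch (assumptions: model ID and prompt layout).
# The policy travels in the system prompt; the content to classify
# goes in the user message. The model returns reasoning plus a verdict.
from transformers import pipeline

classifier = pipeline(
    "text-generation",
    model="openai/gpt-oss-safeguard-20b",  # assumed Hugging Face repo ID
    torch_dtype="auto",
    device_map="auto",
)

# A deliberately simple example policy; real policies are longer and
# spell out definitions, edge cases, and the expected output format.
policy = """You are a content moderator. Apply this policy:
VIOLATES: instructions for, encouragement of, or glorification of self-harm.
ALLOWED: recovery stories, help-seeking, clinical or journalistic discussion.
Return a verdict (VIOLATES or ALLOWED) followed by brief reasoning."""

messages = [
    {"role": "system", "content": policy},
    {"role": "user", "content": "Content to classify goes here."},
]

output = classifier(messages, max_new_tokens=512)
# The last message in the returned conversation is the model's answer.
print(output[0]["generated_text"][-1]["content"])
```

Because the policy is plain text, updating it means editing a prompt rather than retraining a model, which is what makes rapid policy evolution practical.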

"gpt-oss-safeguard is the first open source reasoning model with a ‘bring your own policies and definitions of harm’ design. Organizations deserve to freely study, modify and use critical safety technologies and be able to innovate. In our testing, it was skillful at understanding different policies, explaining its reasoning, and showing nuance in applying the policies, which we believe will be beneficial to builders and safety teams."

- Vinay Rao, CTO of ROOST

Here’s how to find, study, modify, and deploy gpt-oss-safeguard:

  • The weights of gpt-oss-safeguard are hosted by the team at Hugging Face; anyone can download them here (a minimal download sketch follows this list).

  • Learn more about how to use gpt-oss-safeguard with our user guide here.

  • Learn about our joint hackathon with OpenAI and Hugging Face here and hop into our Discord server here.
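For those who prefer to script the download, here is a minimal sketch using the huggingface_hub client. The repository ID is an assumption based on this announcement; the Hugging Face page linked above is the authoritative source for the exact name.

```python
# Minimal sketch: fetch the open weights locally with huggingface_hub.
# The repo ID is an assumption; verify it on the Hugging Face page.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="openai/gpt-oss-safeguard-20b")
print(f"Weights downloaded to: {local_dir}")
```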

The Big Picture: Why This Matters

Online safety is broken today. There's a crisis of transparency and trust, a crisis of access, and a crisis of innovation. Closed systems concentrate risk, whereas open tooling can help improve real-world safety.

Safety decisions shape the daily realities of millions of people, and this release is a major step toward bringing transparency to how the technology behind online safety actually works.

By opening up tools like gpt-oss-safeguard, we can enable everyone, from researchers to developers, to improve the very processes and systems that keep the internet safe. At ROOST, we believe this collaborative approach benefits everyone, from the companies building products to the society that relies on them.

Open source transparency and collaboration strengthen the ecosystem and our world.

Learn About gpt-oss-safeguard by Joining the New ROOST Model Community

A robust open source ecosystem relies on an active community. That’s why, alongside the model release, we are launching the ROOST Model Community (RMC). RMC will bring together researchers and practitioners using open source AI models to protect online spaces and is open to all. We highly encourage anyone looking to join the RMC to visit the ROOST Model Community repository on GitHub for more information.

And to jumpstart the RMC’s launch, we are hosting a hackathon together with OpenAI and Hugging Face on December 8, 2025, in San Francisco. Click here to learn more!

The movement toward open source safety tools is happening now, and it’s bigger than any single organization. Around ROOST, a community of researchers, developers, policymakers, and platform teams is coalescing. We are experimenting, sharing, and learning together, proving that in this line of work we are greater than the sum of our parts.

We see every shared tool, every model released, and every community discussion as a building block toward a safer internet. By making open source tools the default foundation for how the world approaches online safety, we’re setting a new standard — one defined by transparency, collaboration, and community-driven innovation.
