OpenAI AgentKit


Today we’re launching AgentKit, a complete set of tools for developers and enterprises to build, deploy, and optimize agents. Until now, building agents meant juggling fragmented tools—complex orchestration with no versioning, custom connectors, manual eval pipelines, prompt tuning, and weeks of frontend work before launch. With AgentKit, developers can now design workflows visually and embed agentic UIs faster using new building blocks like:

  • Agent Builder: a visual canvas for creating and versioning multi-agent workflows
  • Connector Registry: a central place for admins to manage how data and tools connect across OpenAI products
  • ChatKit: a toolkit for embedding customizable chat-based agent experiences in your product

We’re also expanding evaluation capabilities with new features like datasets, trace grading, automated prompt optimization, and third-party model support to measure and improve agent performance.

Since releasing the Responses API and Agents SDK in March, we’ve seen developers and enterprises build end-to-end agentic workflows for deep research, customer support, and more. Klarna built a support agent that handles two-thirds of all tickets, and Clay 10x’ed growth with a sales agent. AgentKit builds on the Responses API to help developers build agents more efficiently and reliably.

Design workflows with Agent Builder

As agent workflows grow more complex, developers need clearer visibility into how they work. Agent Builder provides a visual canvas for composing logic with drag-and-drop nodes, connecting tools, and configuring custom guardrails. It supports preview runs, inline eval configuration, and full versioning, which makes it well suited to fast iteration.

Interface view of a customer service automation flow in a visual builder tool. The canvas shows connected nodes labeled Start, Jailbreak guardrail, Classification agent, If/else, Return agent, Retention agent, Information agent, Hallucination guardrail, and End. A sidebar on the left lists available node types such as Agent, Note, File search, Guardrails, MCP, and User approval. Top controls include options for Evaluate, Code, Preview, and Publish.
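
To make the shape of such a flow concrete, here is a minimal sketch in plain Python of a customer service workflow as a graph of nodes: a classifier, a routing step that plays the role of the if/else node, and three downstream agents. Every name here (Node, run_workflow, the keyword classifier, the canned replies) is a hypothetical stand-in chosen for illustration. It is not Agent Builder's export format or any OpenAI API, and a real workflow would call a model where this sketch matches keywords.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

State = Dict[str, object]


@dataclass
class Node:
    """One step on the canvas (hypothetical model, not an Agent Builder type)."""
    name: str
    run: Callable[[State], State]            # returns an updated copy of the state
    route: Callable[[State], Optional[str]]  # name of the next node, or None to end


def classify(state: State) -> State:
    # Keyword stand-in for a classification agent; a real one would call a model.
    text = str(state["message"]).lower()
    if "return" in text:
        label = "return_agent"
    elif "cancel" in text:
        label = "retention_agent"
    else:
        label = "information_agent"
    return {**state, "label": label}


def canned_agent(reply: str) -> Callable[[State], State]:
    # Stand-in for an agent node: attaches a fixed reply to the state.
    def run(state: State) -> State:
        return {**state, "reply": reply}
    return run


def end(_: State) -> Optional[str]:
    # Terminal routing: no next node.
    return None


NODES: Dict[str, Node] = {
    # The classifier's route function plays the role of the if/else node.
    "classify": Node("classify", classify, lambda s: str(s["label"])),
    "return_agent": Node("return_agent", canned_agent("Let's start your return."), end),
    "retention_agent": Node("retention_agent", canned_agent("Before you go, here is an offer."), end),
    "information_agent": Node("information_agent", canned_agent("Here is what I found."), end),
}


def run_workflow(message: str, start: str = "classify", max_steps: int = 10) -> State:
    state: State = {"message": message}
    trace: List[str] = []
    current: Optional[str] = start
    for _ in range(max_steps):  # cap steps so a routing cycle cannot loop forever
        if current is None:
            break
        node = NODES[current]
        trace.append(node.name)
        state = node.run(state)
        current = node.route(state)
    return {**state, "trace": trace}
```

Calling run_workflow("I want to return my order") visits the classifier and then the return agent, and the recorded trace gives the kind of step-by-step visibility a visual canvas is meant to provide.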

Builders can get started with a blank canvas or with prebuilt templates.

At Ramp, the team went from a blank canvas to a buyer agent in just a few hours:

“Agent Builder transformed what once took months of complex orchestration, custom code, and manual optimizations into just a couple of hours. The visual canvas keeps product, legal, and engineering on the same page, slashing iteration cycles by 70% and getting an agent live in two sprints rather than two quarters.”

— Ramp

Similarly, LY Corporation—a leading Japanese technology and internet services company—built a work assistant agent with Agent Builder in less than two hours.

"Agent Builder allowed us to orchestrate agents in a whole new way, with engineers and subject matter experts collaborating all in one interface. We built our first multi-agentic workflow and ran it in less than two hours, dramatically accelerating the time to create and deploy agents."

— LY Corporation

We’re also launching a Connector Registry for enterprises to govern and maintain data across multiple workspaces and organizations. The Connector Registry consolidates data sources into a single admin panel across ChatGPT and the API. The registry includes all pre-built connectors, such as Dropbox, Google Drive, SharePoint, and Microsoft Teams, as well as third-party MCPs.

Developers can also enable Guardrails in Agent Builder. Guardrails is an open-source, modular safety layer that helps protect agents against unintended or malicious behavior. It can mask or flag PII, detect jailbreaks, and apply other safeguards, making it easier to build and deploy reliable, safe agents. Guardrails can be deployed standalone or via the guardrails library for Python and JavaScript.
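
As a rough illustration of what "mask or flag PII" means in practice, the sketch below masks email addresses and US-style phone numbers with regular expressions and reports which kinds it found. This is a plain-Python stand-in written for this article, not the Guardrails library's API. The names mask_pii and GuardrailResult and both patterns are assumptions, and production PII detection needs far broader coverage than two regexes.

```python
import re
from dataclasses import dataclass
from typing import List

# Deliberately simple patterns (assumptions for illustration only).
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
US_PHONE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


@dataclass
class GuardrailResult:
    text: str         # input text with detected PII replaced by placeholders
    flags: List[str]  # kinds of PII that were detected, in check order


def mask_pii(text: str) -> GuardrailResult:
    """Mask emails and phone numbers, and flag which kinds were present."""
    flags: List[str] = []
    masked, count = EMAIL.subn("<EMAIL>", text)
    if count:
        flags.append("email")
    masked, count = US_PHONE.subn("<PHONE>", masked)
    if count:
        flags.append("phone")
    return GuardrailResult(text=masked, flags=flags)
```

A check like this would typically run on user input before it reaches the agent and again on the agent's output, so the same function can either mask (use result.text) or flag (inspect result.flags and block the message).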

Embed agentic chat experiences with ChatKit

Deploying chat UIs for agents can be surprisingly complex: it involves handling streaming responses, managing threads, showing the model thinking, and designing engaging in-chat experiences. ChatKit makes it simple to embed chat-based agents that feel native to your product. It can be embedded into apps or websites and customized to match your theme or brand.

ChatKit already powers a range of use cases, from internal knowledge assistants and onboarding guides to customer support and research agents. HubSpot’s customer support agent is one example.

Measure agent performance with new Evals capabilities

Building reliable, production-ready agents requires rigorous performance evaluations. Last year, we launched Evals to help developers test prompts and measure model behavior. We’re now adding four new capabilities that make it even easier to build evals:

  • Datasets: rapidly build agent evals from scratch and expand them over time with automated graders and human annotations.
  • Trace grading: run end-to-end assessments of agentic workflows and automate grading to pinpoint shortcomings.
  • Automated prompt optimization: generate improved prompts based on human annotations and grader outputs.
  • Third-party model support: evaluate models from other providers within the OpenAI Evals platform.
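
The relationship between a dataset and an automated grader can be sketched in a few lines of plain Python. The example below is a minimal illustration under stated assumptions, not the OpenAI Evals API: Example, exact_match_grader, run_eval, and toy_agent are hypothetical names, and toy_agent is a lookup table standing in for a real model call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Example:
    prompt: str
    expected: str


def exact_match_grader(output: str, expected: str) -> float:
    """Automated grader: 1.0 on a case-insensitive exact match, else 0.0."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def run_eval(
    agent: Callable[[str], str],
    dataset: List[Example],
    grader: Callable[[str, str], float],
) -> Dict[str, object]:
    """Run the agent on every example and collect scores and failing prompts."""
    if not dataset:
        raise ValueError("dataset must contain at least one example")
    scores = [grader(agent(ex.prompt), ex.expected) for ex in dataset]
    failures = [ex.prompt for ex, score in zip(dataset, scores) if score < 1.0]
    return {"mean_score": sum(scores) / len(scores), "failures": failures}


def toy_agent(prompt: str) -> str:
    # Stand-in for a real model call (assumption for illustration).
    answers = {
        "2 + 2": "4",
        "capital of France": "Paris",
        "opposite of hot": "cold",
    }
    return answers.get(prompt, "unknown")


DATASET = [
    Example("2 + 2", "4"),
    Example("capital of France", "paris"),
    Example("opposite of hot", "Cold"),
    Example("largest planet", "Jupiter"),
]

report = run_eval(toy_agent, DATASET, exact_match_grader)
```

Here report["failures"] lists the prompts the agent got wrong, which is the raw material for the next step: adding human annotations to those cases or revising the prompt. Swapping in a different grader function changes the evaluation criteria without touching the dataset or the loop.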

We’ve already seen major performance gains from customers using Evals.

Push agent performance with reinforcement fine-tuning

Reinforcement fine-tuning (RFT) lets developers customize our reasoning models. It is generally available on OpenAI o4-mini and in private beta for GPT‑5. We are working closely with dozens of customers to refine RFT for GPT‑5 before wider release.

Today, we’re introducing two new features in that RFT beta designed to push agent performance even further:

  • Custom tool calls: train models to call the right tools at the right time for better reasoning.
  • Custom graders: set custom evaluation criteria for what matters most in your use case.

Starting today, ChatKit and the new Evals capabilities are generally available to all developers. Agent Builder is available in beta, and Connector Registry is beginning its beta rollout to some API, ChatGPT Enterprise, and Edu customers with a Global Admin Console (where Global Owners can manage domains, SSO, and multiple API orgs). The Global Admin Console is a prerequisite for enabling Connector Registry. All of these tools are included with standard API model pricing.

We plan to add a standalone Workflows API and agent deployment options to ChatGPT soon.

We can’t wait to see what you build.
