We open-sourced a GAIA-ready agent framework that builds super-agents in minutes



🤖 Build GAIA-Benchmark-ready Super AI Agents in minutes, not weeks

Production-ready Super AI agent with 18+ tools and swappable providers
Built on AI SDK v6 ToolLoopAgent & ToolSDK.ai with ReAct reasoning




Pre-configured agent ready for GAIA benchmarks out of the box

🧠 ReAct Reasoning Pattern

Built-in Reasoning + Acting framework for structured thinking

Planning & Verification

Multi-step planning + answer verification for complex tasks

Organized by category with official SDKs (Tavily, Exa, E2B, BrowserUse, Steel)

Easy provider switching for sandbox, browser, search, and memory

Integrated Tavily and Exa for intelligent web search

E2B cloud sandbox with code execution + filesystem operations

Steel, BrowserUse or AWS AgentCore for web interactions

Persistent memory with Mem0 or AWS AgentCore

ESM with granular exports, TypeScript-first
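
To make the ReAct pattern listed above more concrete, here is a minimal, dependency-free sketch of a think, act, observe loop. It is not the SDK's implementation: `runReAct`, `Policy`, `Step`, and the toy `calculator` tool are hypothetical names invented for this illustration. The language model is replaced by a scripted, synchronous function so the example runs on its own; a real agent would await a model call at that point.

```typescript
// Minimal ReAct-style loop: think, act with a tool, observe, repeat.
// All names here are illustrative; none are taken from @gaia-agent/sdk.

type Step =
  | { kind: "action"; thought: string; tool: string; input: string }
  | { kind: "final"; thought: string; answer: string };

type Tool = (input: string) => string;
type Policy = (question: string, observations: string[]) => Step;

function runReAct(
  question: string,
  policy: Policy,
  tools: Record<string, Tool>,
  maxSteps = 5,
): { answer: string | null; trace: string[] } {
  const observations: string[] = [];
  const trace: string[] = [];

  for (let i = 0; i < maxSteps; i++) {
    // Reasoning: the policy decides what to do given everything observed so far.
    const step = policy(question, observations);
    trace.push(`Thought: ${step.thought}`);

    if (step.kind === "final") {
      return { answer: step.answer, trace };
    }

    // Acting: run the requested tool and record what it returned.
    const tool = tools[step.tool];
    const observation = tool ? tool(step.input) : `Unknown tool: ${step.tool}`;
    trace.push(`Action: ${step.tool}(${step.input})`);
    trace.push(`Observation: ${observation}`);
    observations.push(observation);
  }

  // Step budget exhausted without a final answer.
  return { answer: null, trace };
}

// Toy tool: multiplies two numbers written as "a * b".
const demoTools: Record<string, Tool> = {
  calculator: (input) => {
    const [a, b] = input.split("*").map((s) => Number(s.trim()));
    return String(a * b);
  },
};

// Scripted stand-in for the language model.
const scriptedPolicy: Policy = (_question, observations) =>
  observations.length === 0
    ? { kind: "action", thought: "I need to multiply.", tool: "calculator", input: "15 * 23" }
    : { kind: "final", thought: "I have the product.", answer: observations[0] };

const demo = runReAct("What is 15 * 23?", scriptedPolicy, demoTools);
console.log(demo.answer); // prints "345"
console.log(demo.trace.join("\n"));
```

The `maxSteps` budget is a design choice of this sketch: it keeps a policy that never produces a final answer from looping forever.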


Empower developers to build world-class Super AI Agents in minutes, not months.

Whether you're creating a production-ready AI assistant for your product or competing in GAIA benchmarks, GAIA Agent provides the enterprise-grade foundation you need.

Without GAIA Agent:

  • Days/weeks setting up APIs
  • Writing tool wrappers manually
  • Error handling for each service
  • Figuring out which providers to use
  • Integration testing headaches

With GAIA Agent:

  • 3 lines of code to get started
  • 16 tools ready with official SDKs
  • GAIA benchmark ready immediately
  • Swap providers with one line
  • Production-tested implementations

Time savings: From weeks of infrastructure setup → 3 lines of code

Result: A world-class, production-ready Super Agent that rivals top AI systems

🌟 What is the GAIA Benchmark?

The GAIA Benchmark is a comprehensive evaluation suite designed to test the capabilities of AI agents across a wide range of tasks, including reasoning, search, code execution, and browser automation.

📖 Read more about GAIA →


Quick Start

```bash
npm install @gaia-agent/sdk ai @ai-sdk/openai zod
```

```typescript
import { createGaiaAgent } from '@gaia-agent/sdk';

// Create the agent - reads from environment variables
const agent = createGaiaAgent();

const result = await agent.generate({
  prompt: 'Calculate 15 * 23 and search for the latest AI papers',
});

console.log(result.text);
```

Create a .env file:

```bash
# Required
OPENAI_API_KEY=sk-...

# Default providers (at least one required)
TAVILY_API_KEY=tvly-...        # Search
E2B_API_KEY=...                # Sandbox
STEEL_API_KEY=steel_live_...   # Browser
```

📖 Complete environment variables guide →
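
A pre-flight check can catch a missing key before the first agent call. The sketch below is one possible way to do it and is not part of the SDK: `checkEnv` is a hypothetical helper, and it reads "at least one required" as meaning at least one of the three default provider keys must be set.

```typescript
// Hypothetical pre-flight check; not part of @gaia-agent/sdk.
type EnvVars = Record<string, string | undefined>;

const REQUIRED_KEYS = ["OPENAI_API_KEY"];
// Assumption: "at least one required" applies to these three provider keys.
const PROVIDER_KEYS = ["TAVILY_API_KEY", "E2B_API_KEY", "STEEL_API_KEY"];

function checkEnv(env: EnvVars): string[] {
  const problems: string[] = [];
  // A variable counts as set only if it is non-empty after trimming.
  const isSet = (key: string): boolean => (env[key] ?? "").trim() !== "";

  for (const key of REQUIRED_KEYS) {
    if (!isSet(key)) problems.push(`Missing required variable: ${key}`);
  }
  if (!PROVIDER_KEYS.some(isSet)) {
    problems.push(`Set at least one of: ${PROVIDER_KEYS.join(", ")}`);
  }
  return problems;
}

console.log(checkEnv({ OPENAI_API_KEY: "sk-test", TAVILY_API_KEY: "tvly-test" })); // prints []
console.log(checkEnv({}));
```

In a Node.js script you would pass `process.env` to `checkEnv` and stop early if the returned list is not empty.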


| Category | Tools | Providers |
|---|---|---|
| 🧮 Core | calculator, httpRequest | Built-in |
| Planning | planner, verifier | Built-in |
| 🔍 Search | tavilySearch, exaSearch, exaGetContents | Tavily (default), Exa |
| 🛡️ Sandbox | e2bSandbox, sandockExecute | E2B (default), Sandock |
| 🖥️ Browser | steelBrowser, browserUseTool, awsBrowser | Steel (default), BrowserUse, AWS |
| 🧠 Memory | mem0Remember, mem0Recall, memoryStore | Mem0 (default), AWS AgentCore |

📖 Full tools documentation →
📖 Provider comparison →
📖 ReAct + Planning guide → ⭐ NEW


Switch providers with one line:

```typescript
import { createGaiaAgent } from '@gaia-agent/sdk';

const agent = createGaiaAgent({
  providers: {
    search: 'exa',          // Use Exa instead of Tavily
    sandbox: 'sandock',     // Use Sandock instead of E2B
    browser: 'browseruse',  // Use BrowserUse instead of Steel
  },
});
```

Or set via environment variables:

```bash
GAIA_AGENT_SEARCH_PROVIDER=exa
GAIA_AGENT_SANDBOX_PROVIDER=sandock
GAIA_AGENT_BROWSER_PROVIDER=browseruse
```
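
The README does not say which setting wins when both the `providers` option and an environment variable are present. The sketch below shows one plausible precedence (explicit option, then environment variable, then default) purely as an illustration. `resolveProviders` is a hypothetical function, and the SDK's actual behavior may differ. The defaults and variable names are the ones listed in this README.

```typescript
// Hypothetical resolution logic; the SDK's real precedence rules may differ.
type Category = "search" | "sandbox" | "browser";
type Providers = Record<Category, string>;
type EnvVars = Record<string, string | undefined>;

// Defaults as marked "(default)" in the tools table.
const DEFAULTS: Providers = { search: "tavily", sandbox: "e2b", browser: "steel" };

const ENV_NAMES: Record<Category, string> = {
  search: "GAIA_AGENT_SEARCH_PROVIDER",
  sandbox: "GAIA_AGENT_SANDBOX_PROVIDER",
  browser: "GAIA_AGENT_BROWSER_PROVIDER",
};

function resolveProviders(explicit: Partial<Providers>, env: EnvVars): Providers {
  const resolved: Providers = { ...DEFAULTS };
  for (const category of Object.keys(DEFAULTS) as Category[]) {
    // Assumed precedence: explicit option > environment variable > default.
    resolved[category] = explicit[category] ?? env[ENV_NAMES[category]] ?? DEFAULTS[category];
  }
  return resolved;
}

const resolved = resolveProviders(
  { search: "exa" },
  { GAIA_AGENT_SEARCH_PROVIDER: "tavily", GAIA_AGENT_SANDBOX_PROVIDER: "sandock" },
);
console.log(resolved); // search: "exa", sandbox: "sandock", browser: "steel"
```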

Run official GAIA benchmarks with enhanced results tracking:

```bash
# Basic benchmark
pnpm benchmark                    # Run validation set
pnpm benchmark --limit 10         # Test with 10 tasks

# Resume interrupted runs
pnpm benchmark --resume           # Continue from checkpoint

# Filter by capability
pnpm benchmark:files              # Tasks with file attachments
pnpm benchmark:code               # Code execution tasks
pnpm benchmark:search             # Web search tasks
pnpm benchmark:browser            # Browser automation tasks

# Stream mode (real-time thinking)
pnpm benchmark:random --stream    # Watch agent think in real-time

# Wrong answers collection
pnpm benchmark:wrong              # Retry only failed tasks
```

📚 Wrong Answers Collection

Automatically track and retry failed tasks:

```bash
# 1. Run benchmark (auto-creates wrong-answers.json)
pnpm benchmark --limit 20

# 2. View wrong answers
cat benchmark-results/wrong-answers.json

# 3. Retry only failed tasks
pnpm benchmark:wrong --verbose

# 4. Keep retrying until all pass
pnpm benchmark:wrong
# → "🎉 No wrong answers! All previous tasks passed."
```

📖 Wrong answers guide →
📖 Resume feature guide →
📖 Benchmark module docs →
📖 GAIA setup guide →


📊 Enhanced Benchmark Results

Benchmark results now include full task details:

```jsonc
{
  "taskId": "abc123",
  "question": "What year was X founded?",
  "level": 2,
  "files": ["image.png"],
  "answer": "1927",
  "expectedAnswer": "1927",
  "correct": true,
  "durationMs": 5234,
  "steps": 3,
  "toolsUsed": ["search", "browser"],
  "summary": {
    "totalToolCalls": 5,
    "uniqueTools": ["search", "browser", "calculator"],
    "hadError": false
  },
  "stepDetails": [ /* ... */ ]
}
```

Easier to analyze and debug! 🎉
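
As an illustration of the kind of analysis these fields allow, the sketch below computes accuracy per GAIA level, tool usage counts, and mean duration. It assumes you have already loaded the results into an array of objects shaped like the sample above. `summarize` and `TaskResult` are hypothetical names, not SDK exports, and the sample data is made up.

```typescript
// Field names follow the sample result above; only the fields used here are typed.
interface TaskResult {
  taskId: string;
  level: number;
  correct: boolean;
  durationMs: number;
  toolsUsed: string[];
}

interface LevelStats {
  total: number;
  correct: number;
  accuracy: number;
}

function summarize(results: TaskResult[]) {
  const byLevel = new Map<number, LevelStats>();
  const toolCounts = new Map<string, number>();
  let totalMs = 0;

  for (const r of results) {
    // Update per-level totals and recompute the running accuracy.
    const stats = byLevel.get(r.level) ?? { total: 0, correct: 0, accuracy: 0 };
    stats.total += 1;
    if (r.correct) stats.correct += 1;
    stats.accuracy = stats.correct / stats.total;
    byLevel.set(r.level, stats);

    // Count in how many tasks each tool was used.
    for (const t of r.toolsUsed) toolCounts.set(t, (toolCounts.get(t) ?? 0) + 1);
    totalMs += r.durationMs;
  }

  return {
    byLevel,
    toolCounts,
    meanDurationMs: results.length > 0 ? totalMs / results.length : 0,
  };
}

// Made-up sample data for illustration only.
const sample: TaskResult[] = [
  { taskId: "a", level: 1, correct: true, durationMs: 1000, toolsUsed: ["search"] },
  { taskId: "b", level: 2, correct: false, durationMs: 3000, toolsUsed: ["search", "browser"] },
  { taskId: "c", level: 2, correct: true, durationMs: 2000, toolsUsed: ["browser"] },
];

const report = summarize(sample);
console.log(report.byLevel.get(2)?.accuracy); // prints 0.5
console.log(report.meanDurationMs); // prints 2000
```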


Run unit tests with Vitest:

```bash
pnpm test            # Run all tests
pnpm test:watch      # Watch mode
pnpm test:coverage   # Coverage report
```

📖 Testing guide →


Add custom tools alongside the defaults:

```typescript
import { createGaiaAgent, getDefaultTools } from '@gaia-agent/sdk';
import { tool } from 'ai';
import { z } from 'zod';

const agent = createGaiaAgent({
  tools: {
    ...getDefaultTools(),
    weatherTool: tool({
      description: 'Get weather',
      inputSchema: z.object({ city: z.string() }),
      // Placeholder implementation: returns fixed data regardless of city
      execute: async ({ city }) => ({ temp: 72, condition: 'sunny' }),
    }),
  },
});
```

Integrate thousands of tools from the ToolSDK.ai ecosystem:

```typescript
import { createGaiaAgent, getDefaultTools } from '@gaia-agent/sdk';
import { ToolSDKApiClient } from 'toolsdk/api'; // npm install toolsdk

// Initialize ToolSDK client
const toolSDK = new ToolSDKApiClient({ apiKey: process.env.TOOLSDK_AI_API_KEY });

// Load tools from ToolSDK packages
const emailTool = await toolSDK.package('@toolsdk.ai/mcp-send-email', {
  RESEND_API_KEY: process.env.RESEND_API_KEY,
}).getAISDKTool("send-email");

const agent = createGaiaAgent({
  tools: { ...getDefaultTools(), emailTool },
});

const result = await agent.generate({
  prompt: 'Help me search for the latest AI news and send it to [email protected]',
});
```

📖 ToolSDK Packages →

Extend the GAIAAgent class to build a specialized agent:

```typescript
import { GAIAAgent } from '@gaia-agent/sdk';

class ResearchAgent extends GAIAAgent {
  constructor() {
    super({
      instructions: 'Research assistant specialized in AI papers',
      additionalTools: { /* custom tools */ },
    });
  }
}
```

📖 Advanced usage guide →
📖 API reference →



This project uses automated NPM publishing. When changes are merged to main:

  1. ✅ Tests run automatically
  2. 📦 Version bumps to next patch (e.g., 0.1.0 → 0.1.1)
  3. 📝 Changelog created in changelog/
  4. 🚀 Published to NPM
  5. 🏷️ Git tag created

For manual version bumps (minor/major), see docs/NPM_PUBLISH_SETUP.md.
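
For readers unfamiliar with the patch/minor/major distinction, here is a small sketch of how each bump changes a plain `MAJOR.MINOR.PATCH` version string. It only illustrates the numbering and is not the project's release script. `bumpVersion` is a hypothetical helper that does not handle pre-release or build suffixes.

```typescript
type Bump = "patch" | "minor" | "major";

// Hypothetical helper illustrating semantic-version bumps; not the project's tooling.
function bumpVersion(version: string, bump: Bump): string {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version);
  if (!match) {
    throw new Error(`Not a plain MAJOR.MINOR.PATCH version: ${version}`);
  }
  const major = Number(match[1]);
  const minor = Number(match[2]);
  const patch = Number(match[3]);

  // A higher-level bump resets the lower-level components to zero.
  if (bump === "major") return `${major + 1}.0.0`;
  if (bump === "minor") return `${major}.${minor + 1}.0`;
  return `${major}.${minor}.${patch + 1}`;
}

console.log(bumpVersion("0.1.0", "patch")); // prints "0.1.1"
console.log(bumpVersion("0.1.1", "minor")); // prints "0.2.0"
console.log(bumpVersion("0.2.0", "major")); // prints "1.0.0"
```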


Apache License 2.0


Made with ❤️ for the AI community
