We open-sourced a GAIA-ready agent framework that builds super-agents in minutes


🤖 Build GAIA-Benchmark-ready Super AI Agents in minutes, not weeks

Production-ready Super AI agent with 18+ tools and swappable providers
Built on AI SDK v6 ToolLoopAgent & ToolSDK.ai with ReAct reasoning


Pre-configured agent ready for GAIA benchmarks out of the box

🧠 ReAct Reasoning Pattern

Built-in Reasoning + Acting framework for structured thinking

Planning & Verification

Multi-step planning + answer verification for complex tasks

Built-in tools organized by category, with official SDKs (Tavily, Exa, E2B, BrowserUse, Steel)

Easy provider switching for sandbox, browser, search, and memory

Integrated Tavily and Exa for intelligent web search

E2B cloud sandbox with code execution + filesystem operations

Steel, BrowserUse or AWS AgentCore for web interactions

Persistent memory with Mem0 or AWS AgentCore

ESM with granular exports, TypeScript-first
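To make the ReAct pattern from the feature list concrete, here is a minimal sketch of a reason-act-observe loop. It is not the SDK's implementation: `runReAct`, `Policy`, `Step`, and the toy calculator are hypothetical names, the policy is a scripted stand-in for a model call, and everything is synchronous to keep the example short (a real model and real tools would be async).

```typescript
// Minimal ReAct-style loop. All names are hypothetical, not the SDK's API.
type Action =
  | { kind: "tool"; thought: string; tool: string; input: string }
  | { kind: "finish"; thought: string; answer: string };

interface Step {
  thought: string;
  tool: string;
  input: string;
  observation: string;
}

type Policy = (question: string, history: Step[]) => Action;
type Tool = (input: string) => string;

function runReAct(
  question: string,
  policy: Policy,
  tools: Record<string, Tool>,
  maxSteps = 5,
): { answer: string | null; history: Step[] } {
  const history: Step[] = [];
  for (let i = 0; i < maxSteps; i++) {
    // Reason: the policy sees the question plus every earlier observation.
    const action = policy(question, history);
    if (action.kind === "finish") {
      return { answer: action.answer, history };
    }
    // Act: run the requested tool. Unknown tools become an observation
    // so the policy has a chance to recover on the next step.
    const tool = tools[action.tool];
    const observation = tool
      ? tool(action.input)
      : `error: unknown tool "${action.tool}"`;
    // Observe: record the result for the next reasoning step.
    history.push({
      thought: action.thought,
      tool: action.tool,
      input: action.input,
      observation,
    });
  }
  return { answer: null, history }; // step budget exhausted
}

// Scripted stand-in for a language model: multiply first, then answer.
const scriptedPolicy: Policy = (_question, history) =>
  history.length === 0
    ? { kind: "tool", thought: "I need to multiply.", tool: "calculator", input: "15*23" }
    : { kind: "finish", thought: "I have the product.", answer: history[0].observation };

const demoTools: Record<string, Tool> = {
  // Toy calculator that only understands "a*b".
  calculator: (input) => {
    const [a, b] = input.split("*").map(Number);
    return String(a * b);
  },
};

const reactResult = runReAct("What is 15 * 23?", scriptedPolicy, demoTools);
console.log(reactResult.answer, reactResult.history.length);
```

The step budget (`maxSteps`) is the design choice worth noting: without it, a policy that never emits `finish` would loop forever.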

Empower developers to build world-class Super AI Agents in minutes, not months.

Whether you're creating a production-ready AI assistant for your product or competing in GAIA benchmarks, GAIA Agent provides the enterprise-grade foundation you need.

Without GAIA Agent:
Days or weeks setting up APIs
Writing tool wrappers manually
Error handling for each service
Figuring out which providers to use
Integration testing headaches

With GAIA Agent:
3 lines of code to get started
16 tools ready with official SDKs
GAIA benchmark ready immediately
Swap providers with one line
Production-tested implementations

Time savings: From weeks of infrastructure setup → 3 lines of code

Result: A world-class, production-ready Super Agent that rivals top AI systems

🌟 What is the GAIA Benchmark?

The GAIA Benchmark is a comprehensive evaluation suite designed to test the capabilities of AI agents across a wide range of tasks, including reasoning, search, code execution, and browser automation.

📖 Read more about GAIA →

npm install @gaia-agent/sdk ai @ai-sdk/openai zod

    import { createGaiaAgent } from '@gaia-agent/sdk';

    // Create the agent - reads from environment variables
    const agent = createGaiaAgent();

    const result = await agent.generate({
      prompt: 'Calculate 15 * 23 and search for the latest AI papers',
    });

    console.log(result.text);

Create a .env file:

    # Required
    OPENAI_API_KEY=sk-...

    # Default providers (at least one required)
    TAVILY_API_KEY=tvly-...       # Search
    E2B_API_KEY=...               # Sandbox
    STEEL_API_KEY=steel_live_...  # Browser
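If you want to fail fast on a missing key, a small check like the one below can run before the agent is created. This is a sketch, not part of the SDK: `checkEnv` and the two key lists are hypothetical, and grouping the three provider keys under "at least one required" is my reading of the comments in the `.env` example.

```typescript
type Env = Record<string, string | undefined>;

// Key names come from the .env example; the grouping is an assumption.
const REQUIRED_KEYS = ["OPENAI_API_KEY"];
const PROVIDER_KEYS = ["TAVILY_API_KEY", "E2B_API_KEY", "STEEL_API_KEY"];

// Returns a list of human-readable problems; an empty list means the env looks usable.
function checkEnv(env: Env): string[] {
  const problems: string[] = [];
  const isSet = (key: string) => (env[key] ?? "").trim() !== "";

  for (const key of REQUIRED_KEYS) {
    if (!isSet(key)) problems.push(`${key} is required`);
  }
  if (!PROVIDER_KEYS.some(isSet)) {
    problems.push(`set at least one of: ${PROVIDER_KEYS.join(", ")}`);
  }
  return problems;
}

// Example with literal objects; the values are placeholders, not real keys.
const emptyEnvProblems = checkEnv({});
const okEnvProblems = checkEnv({
  OPENAI_API_KEY: "sk-placeholder",
  TAVILY_API_KEY: "tvly-placeholder",
});
console.log(emptyEnvProblems, okEnvProblems);
```

In a Node.js script you would typically pass `process.env` as the argument.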

📖 Complete environment variables guide →

| Category | Tools | Providers |
| --- | --- | --- |
| 🧮 Core | calculator, httpRequest | Built-in |
| Planning | planner, verifier | Built-in |
| 🔍 Search | tavilySearch, exaSearch, exaGetContents | Tavily (default), Exa |
| 🛡️ Sandbox | e2bSandbox, sandockExecute | E2B (default), Sandock |
| 🖥️ Browser | steelBrowser, browserUseTool, awsBrowser | Steel (default), BrowserUse, AWS |
| 🧠 Memory | mem0Remember, mem0Recall, memoryStore | Mem0 (default), AWS AgentCore |

📖 Full tools documentation →
📖 Provider comparison →
📖 ReAct + Planning guide → ⭐ NEW

Switch providers with one line:

    import { createGaiaAgent } from '@gaia-agent/sdk';

    const agent = createGaiaAgent({
      providers: {
        search: 'exa',          // Use Exa instead of Tavily
        sandbox: 'sandock',     // Use Sandock instead of E2B
        browser: 'browseruse',  // Use BrowserUse instead of Steel
      },
    });

Or set via environment variables:

    GAIA_AGENT_SEARCH_PROVIDER=exa
    GAIA_AGENT_SANDBOX_PROVIDER=sandock
    GAIA_AGENT_BROWSER_PROVIDER=browseruse
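The two configuration routes above raise the question of which one wins when both are set. The sketch below shows one plausible resolution order (explicit option, then environment variable, then default). The precedence is an assumption, not documented SDK behavior, and `resolveProviders` is a hypothetical helper. The identifiers `tavily`, `e2b`, and `steel` are guessed from the provider names, and the AWS options are left out because the source does not show their identifiers.

```typescript
type Category = "search" | "sandbox" | "browser";
type EnvVars = Record<string, string | undefined>;

// Default identifiers are assumed from the provider names in the tools table.
const DEFAULTS: Record<Category, string> = {
  search: "tavily",
  sandbox: "e2b",
  browser: "steel",
};

const ALLOWED: Record<Category, string[]> = {
  search: ["tavily", "exa"],
  sandbox: ["e2b", "sandock"],
  browser: ["steel", "browseruse"],
};

// Assumed precedence: explicit option > GAIA_AGENT_*_PROVIDER > default.
function resolveProviders(
  explicit: Partial<Record<Category, string>>,
  env: EnvVars,
): Record<Category, string> {
  const resolved = { ...DEFAULTS };
  for (const category of Object.keys(DEFAULTS) as Category[]) {
    const envKey = `GAIA_AGENT_${category.toUpperCase()}_PROVIDER`;
    const choice = explicit[category] ?? env[envKey] ?? DEFAULTS[category];
    if (!ALLOWED[category].includes(choice)) {
      throw new Error(`Unknown ${category} provider: ${choice}`);
    }
    resolved[category] = choice;
  }
  return resolved;
}

const resolvedProviders = resolveProviders(
  { search: "exa" },
  {
    GAIA_AGENT_SEARCH_PROVIDER: "tavily", // overridden by the explicit option
    GAIA_AGENT_SANDBOX_PROVIDER: "sandock",
  },
);
console.log(resolvedProviders);
```

Validating the choice against an allow-list turns a typo in an environment variable into an immediate error instead of a confusing failure at tool-call time.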

Run official GAIA benchmarks with enhanced results tracking:

    # Basic benchmark
    pnpm benchmark                   # Run validation set
    pnpm benchmark --limit 10        # Test with 10 tasks

    # Resume interrupted runs
    pnpm benchmark --resume          # Continue from checkpoint

    # Filter by capability
    pnpm benchmark:files             # Tasks with file attachments
    pnpm benchmark:code              # Code execution tasks
    pnpm benchmark:search            # Web search tasks
    pnpm benchmark:browser           # Browser automation tasks

    # Stream mode (real-time thinking)
    pnpm benchmark:random --stream   # Watch agent think in real-time

    # Wrong answers collection
    pnpm benchmark:wrong             # Retry only failed tasks

📚 Wrong Answers Collection

Automatically track and retry failed tasks:

    # 1. Run benchmark (auto-creates wrong-answers.json)
    pnpm benchmark --limit 20

    # 2. View wrong answers
    cat benchmark-results/wrong-answers.json

    # 3. Retry only failed tasks
    pnpm benchmark:wrong --verbose

    # 4. Keep retrying until all pass
    pnpm benchmark:wrong
    # → "🎉 No wrong answers! All previous tasks passed."
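The bookkeeping behind this workflow can be sketched in a few lines. This is an illustration only: the real format of `wrong-answers.json` is not shown here, so the sketch models it as a plain list of task IDs, and `updateWrongAnswers`, `retryUntilClean`, and the stub runner are hypothetical names.

```typescript
interface TaskOutcome {
  taskId: string;
  correct: boolean;
}

// Failed tasks are added to the collection; tasks that now pass are removed.
function updateWrongAnswers(previous: string[], outcomes: TaskOutcome[]): string[] {
  const wrong = new Set(previous);
  for (const { taskId, correct } of outcomes) {
    if (correct) wrong.delete(taskId);
    else wrong.add(taskId);
  }
  return Array.from(wrong).sort();
}

// Re-run only the failed tasks until none remain or the round limit is hit.
function retryUntilClean(
  wrong: string[],
  runTask: (taskId: string) => boolean,
  maxRounds = 3,
): string[] {
  let remaining = wrong;
  for (let round = 0; round < maxRounds && remaining.length > 0; round++) {
    const outcomes = remaining.map((taskId) => ({ taskId, correct: runTask(taskId) }));
    remaining = updateWrongAnswers(remaining, outcomes);
  }
  return remaining;
}

// First run: task "a" passes, "b" and "c" fail.
const afterFirstRun = updateWrongAnswers([], [
  { taskId: "a", correct: true },
  { taskId: "b", correct: false },
  { taskId: "c", correct: false },
]);

// Stub runner: pretend "b" passes on retry and "c" keeps failing.
const afterRetries = retryUntilClean(afterFirstRun, (taskId) => taskId === "b");
console.log(afterFirstRun, afterRetries);
```

The round limit matters because some tasks may never pass; without it, "keep retrying until all pass" would not terminate.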

📖 Wrong answers guide →
📖 Resume feature guide →
📖 Benchmark module docs →
📖 GAIA setup guide →

📊 Enhanced Benchmark Results

Benchmark results now include full task details:

    {
      "taskId": "abc123",
      "question": "What year was X founded?",
      "level": 2,
      "files": ["image.png"],
      "answer": "1927",
      "expectedAnswer": "1927",
      "correct": true,
      "durationMs": 5234,
      "steps": 3,
      "toolsUsed": ["search", "browser"],
      "summary": {
        "totalToolCalls": 5,
        "uniqueTools": ["search", "browser", "calculator"],
        "hadError": false
      },
      "stepDetails": [ /* ... */ ]
    }

Easier to analyze and debug! 🎉
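As an example of the kind of analysis these fields allow, the sketch below computes accuracy per GAIA level and counts which tools appear in failed tasks. The `BenchmarkResult` interface covers only a subset of the fields in the sample result, the helper names are hypothetical, and the sample data is made up for illustration.

```typescript
// Subset of the result fields shown in the sample above.
interface BenchmarkResult {
  taskId: string;
  level: number;
  correct: boolean;
  durationMs: number;
  toolsUsed: string[];
}

interface LevelStats {
  total: number;
  correct: number;
  accuracy: number;
}

function accuracyByLevel(results: BenchmarkResult[]): Record<number, LevelStats> {
  const stats: Record<number, LevelStats> = {};
  for (const r of results) {
    const s = stats[r.level] ?? { total: 0, correct: 0, accuracy: 0 };
    s.total += 1;
    if (r.correct) s.correct += 1;
    s.accuracy = s.correct / s.total;
    stats[r.level] = s;
  }
  return stats;
}

// Counts how often each tool shows up in tasks that were answered incorrectly.
function toolUsageInFailures(results: BenchmarkResult[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const r of results) {
    if (r.correct) continue;
    for (const tool of r.toolsUsed) {
      counts[tool] = (counts[tool] ?? 0) + 1;
    }
  }
  return counts;
}

// Made-up sample data, not real benchmark output.
const sampleResults: BenchmarkResult[] = [
  { taskId: "t1", level: 1, correct: true, durationMs: 1200, toolsUsed: ["search"] },
  { taskId: "t2", level: 2, correct: true, durationMs: 5234, toolsUsed: ["search", "browser"] },
  { taskId: "t3", level: 2, correct: false, durationMs: 8100, toolsUsed: ["browser"] },
];

const levelStats = accuracyByLevel(sampleResults);
const failureTools = toolUsageInFailures(sampleResults);
console.log(levelStats, failureTools);
```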

Run unit tests with Vitest:

    pnpm test           # Run all tests
    pnpm test:watch     # Watch mode
    pnpm test:coverage  # Coverage report

📖 Testing guide →

Add your own tools alongside the defaults:

    import { createGaiaAgent, getDefaultTools } from '@gaia-agent/sdk';
    import { tool } from 'ai';
    import { z } from 'zod';

    const agent = createGaiaAgent({
      tools: {
        ...getDefaultTools(),
        weatherTool: tool({
          description: 'Get weather',
          inputSchema: z.object({ city: z.string() }),
          // Placeholder implementation: returns fixed values regardless of city
          execute: async ({ city }) => ({ temp: 72, condition: 'sunny' }),
        }),
      },
    });

Integrate thousands of tools from the ToolSDK.ai ecosystem:

    import { createGaiaAgent, getDefaultTools } from '@gaia-agent/sdk';
    import { ToolSDKApiClient } from 'toolsdk/api'; // npm install toolsdk

    // Initialize ToolSDK client
    const toolSDK = new ToolSDKApiClient({ apiKey: process.env.TOOLSDK_AI_API_KEY });

    // Load tools from ToolSDK packages
    const emailTool = await toolSDK
      .package('@toolsdk.ai/mcp-send-email', {
        RESEND_API_KEY: process.env.RESEND_API_KEY,
      })
      .getAISDKTool('send-email');

    const agent = createGaiaAgent({
      tools: { ...getDefaultTools(), emailTool },
    });

    const result = await agent.generate({
      prompt: 'Help me search for the latest AI news and send it to [email protected]',
    });

📖 ToolSDK Packages →

Extend the GAIAAgent class to build a specialized agent:

    import { GAIAAgent } from '@gaia-agent/sdk';

    class ResearchAgent extends GAIAAgent {
      constructor() {
        super({
          instructions: 'Research assistant specialized in AI papers',
          additionalTools: { /* custom tools */ },
        });
      }
    }

📖 Advanced usage guide →
📖 API reference →

This project uses automated NPM publishing. When changes are merged to main: