Show HN: State of the Art Open-source alternative to ChatGPT Agents for browsing


Meka Agent is an open-source, autonomous computer-using agent that delivers state-of-the-art browsing capabilities. It works the way a human does: it perceives the screen purely through vision and acts within a full computer context.

It is designed as a simple, extensible, and customizable framework, allowing flexibility in the choice of models, tools, and infrastructure providers.

The agent currently focuses on web browsing and achieves a state-of-the-art result on the WebArena benchmark (72.7%).

Read more about the details of the benchmark results.

If you would like to get started with browser automations without any setup, visit the Meka App to try the Meka Agent with $10 in free credits.

To help you get started with Meka, we have packaged several providers that we have tested extensively. There are two main pieces:

A vision model that has good visual grounding. In our experiments, OpenAI o3, Claude Sonnet 4, and Claude Opus 4 were the best US-based models. We have not experimented with Chinese models but would love to see contributions!
An infrastructure provider that exposes OS-level controls, not just a browser layer with Playwright screenshots. This matters for task performance because a number of common web elements are rendered at the system level and are invisible to the browser page. Examples include dropdown menus, browser alerts, and file uploads.
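
To make the split between the two pieces concrete, here is a minimal sketch of how a vision model and a computer provider might cooperate in a screenshot-then-act loop. Every name in it (`Action`, `VisionModel`, `ComputerProvider`, `runTask`, `maxSteps`) is hypothetical and chosen for illustration; this is not the `@trymeka/core` API, and the real agent may structure its loop differently.

```typescript
// Hypothetical types for illustration only; these are NOT the @trymeka/core API.
type Action =
  | { kind: "click"; x: number; y: number }
  | { kind: "type"; text: string }
  | { kind: "done"; result: string };

// Piece 1: a model that looks at pixels and decides what to do next.
interface VisionModel {
  nextAction(task: string, screenshot: Uint8Array): Promise<Action>;
}

// Piece 2: a provider with OS-level control. Its screenshot is assumed to
// cover the whole screen, so native dropdowns, alerts, and file dialogs
// are visible to the model as well.
interface ComputerProvider {
  screenshot(): Promise<Uint8Array>;
  perform(action: Exclude<Action, { kind: "done" }>): Promise<void>;
}

// Observe, decide, act, and repeat until the model reports completion
// or the step budget runs out (in which case null is returned).
async function runTask(
  task: string,
  model: VisionModel,
  computer: ComputerProvider,
  maxSteps = 20,
): Promise<string | null> {
  for (let step = 0; step < maxSteps; step++) {
    const screen = await computer.screenshot();
    const action = await model.nextAction(task, screen);
    if (action.kind === "done") {
      return action.result;
    }
    await computer.perform(action);
  }
  return null;
}
```

Because the loop only depends on the two interfaces, either side can be swapped independently, which is the flexibility in models and infrastructure providers described above.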

For this quick start, we use OpenAI o3 as the model and Anchor Browser as the VM-based infrastructure provider. We welcome submissions from other infrastructure providers that offer OS-level controls!

Install the main components of the SDK

npm install @trymeka/core @trymeka/ai-provider-vercel @ai-sdk/openai @trymeka/computer-provider-anchor-browser playwright-core

Create your .env file and enter your API keys from the starter providers

OPENAI_API_KEY=GET FROM https://platform.openai.com/settings/organization/api-keys
ANCHOR_BROWSER_API_KEY=GET FROM https://app.anchorbrowser.io/api-access
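
Before starting the agent, it can help to confirm that both keys are actually set. The helper below is a small sketch, not part of the Meka SDK: `REQUIRED_KEYS`, `EnvLike`, and `missingEnvKeys` are hypothetical names, and treating a value that still begins with `GET FROM` as missing is an assumption based on the placeholder text in the example above.

```typescript
// Key names taken from the .env example; the helper itself is hypothetical.
const REQUIRED_KEYS = ["OPENAI_API_KEY", "ANCHOR_BROWSER_API_KEY"] as const;

type EnvLike = Record<string, string | undefined>;

// Returns the required keys that are absent, blank, or still hold the
// "GET FROM ..." placeholder from the example file.
function missingEnvKeys(
  env: EnvLike,
  required: readonly string[] = REQUIRED_KEYS,
): string[] {
  return required.filter((key) => {
    const value = env[key]?.trim();
    return !value || value.startsWith("GET FROM");
  });
}
```

In a Node.js script you would typically pass `process.env` and stop early if the returned array is not empty. Note that Node.js does not read `.env` on its own; one option is the `--env-file=.env` flag available in recent Node.js versions, another is a loader such as `dotenv`.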

Start the agent

For more usage examples, check out /examples.

Meka is built on lessons learned from our own experimentation and from publicly available research. Our guiding philosophy in creating this agent is to have it think the way humans do, from vision to tools to memory.

For more details, visit our blog post on the Meka Agent.

Bring your own LLM: Meka is inherently hackable and works with any model that Vercel's ai-sdk supports. The model must be a vision model with good visual grounding. In our experiments, OpenAI o3, Claude Sonnet 4, and Claude Opus 4 are good candidates.
Extensible: Meka is designed to be extensible. You can easily add your own tools and providers to the agent.
Open Source: Meka is open and builds on what we have learned from testing AI agents on autonomous tasks.
Typesafe: Meka is written in TypeScript and provides a typesafe API for building and interacting with agents.
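
As an illustration of what a typesafe, extensible tool could look like, here is a minimal sketch. The `Tool` interface, the `defineTool` helper, and the `save_note` tool are all hypothetical and are not Meka's actual tool API; see /examples for the real one. The idea shown is only that a tool pairs a runtime check on untyped model output with a typed `execute` function.

```typescript
// Hypothetical shapes for illustration; NOT Meka's actual tool API.
interface Tool<Args, Result> {
  name: string;
  description: string;
  // Validates untyped model output before execute runs; throws on bad input.
  parse(input: unknown): Args;
  execute(args: Args): Promise<Result>;
}

// Identity helper so TypeScript infers Args and Result from the definition.
function defineTool<Args, Result>(tool: Tool<Args, Result>): Tool<Args, Result> {
  return tool;
}

type NoteArgs = { text: string };

const saveNote = defineTool({
  name: "save_note",
  description: "Stores a short note for later steps.",
  parse(input: unknown): NoteArgs {
    const candidate = input as { text?: unknown } | null;
    if (
      typeof candidate === "object" &&
      candidate !== null &&
      typeof candidate.text === "string"
    ) {
      return { text: candidate.text };
    }
    throw new Error("save_note expects { text: string }");
  },
  async execute(args: NoteArgs): Promise<string> {
    return `saved ${args.text.length} characters`;
  },
});
```

With this shape, calling `saveNote.execute({ text: 5 })` is rejected at compile time, while malformed model output is rejected at run time by `parse`.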

We welcome contributions to Meka Agent! If you'd like to contribute, please read our contributing guidelines to get started.

Meka Agent is licensed under the MIT License. See the LICENSE file for more information.
