Bring AI-native actions and verifications into your Playwright tests – open source, vision-enabled, and BYOL.

The Problem
Most “AI testing” frameworks make you throw away what already works.
They replace your entire test suite with “agentic” systems — where an LLM drives every click, assertion, and navigation step.
Sounds cool… until you hit:
- Slow, flaky, or non-deterministic runs
- Proprietary test formats
- Complete vendor lock-in
For most teams, that’s a non-starter.
What if you could keep your existing Playwright scripts, and just inject AI where it’s actually needed – the ambiguous, messy, or dynamic parts of your app?
The Idea
ai-wright brings AI steps to Playwright.
You still write regular Playwright tests – deterministic, fast, inspectable – but when you hit a fuzzy point, you can drop in a step like:
await ai.act('Click on a top rated campaign', { page, test });
Or
await ai.verify('The campaign description should not contain offensive words', { page, test });
That’s it. AI only handles that step.
Everything else stays Playwright-native.
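To make the split concrete, here is a minimal sketch of a flow that mixes the two kinds of steps. `PageLike` and `AiSteps` are hypothetical stand-ins for Playwright's `page` and ai-wright's `ai` object, declared locally so the snippet type-checks on its own. The URL, selectors, and function name are made up for illustration. Only the `ai.act` / `ai.verify` call shape is taken from the examples above.

```typescript
// Hypothetical stand-in for the subset of Playwright's Page used below.
interface PageLike {
  goto(url: string): Promise<unknown>;
  fill(selector: string, value: string): Promise<void>;
  click(selector: string): Promise<void>;
}

// Assumed shape of the ai-wright helper, inferred from the calls shown above.
interface AiSteps<T> {
  act(instruction: string, ctx: { page: PageLike; test: T }): Promise<unknown>;
  verify(assertion: string, ctx: { page: PageLike; test: T }): Promise<unknown>;
}

async function browseTopCampaign<T>(page: PageLike, ai: AiSteps<T>, test: T): Promise<void> {
  // Deterministic steps: plain Playwright calls with fixed selectors.
  await page.goto('https://example.com/login');
  await page.fill('#email', 'user@example.com');
  await page.click('button[type="submit"]');

  // Fuzzy steps: "top rated" has no stable selector, so hand them to the AI.
  await ai.act('Click on a top rated campaign', { page, test });
  await ai.verify('The campaign description should not contain offensive words', { page, test });
}
```

In a real spec file, `page` and `test` would come from `@playwright/test` and `ai` from ai-wright. The interfaces above exist only to keep the sketch self-contained.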
Why It’s Different
1. Vision-Enabled
Existing libraries (like ZeroStep and auto-playwright) use sanitized HTML – which misses what’s actually on screen.
This causes many issues:
- HTML ≠ UI reality – a static DOM snapshot can't reliably show whether an element is visible, obscured, off-screen, or disabled, so the LLM ends up trying to interact with elements a user couldn't.
- Loss of semantics – sanitized HTML strips ARIA roles, computed text, layout cues, and shadow DOM content, which are critical for accurate reasoning.
- Unbounded prompt size – large DOMs produce prompts that are too long and have to be truncated, which drops context.
- Fragile selectors – HTML-based approaches force LLMs to guess selectors; ai-wright uses precise SoM (Set-of-Marks) IDs bound to live DOM nodes, enabling accurate one-shot execution.
ai-wright is vision-enabled: it combines SoM-annotated screenshots with structured DOM context for grounded, visual reasoning.
The result: AI that operates just like a normal user would – based on what it sees on the screen.
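The marking idea can be sketched in a few lines. This is an illustration of the general Set-of-Marks technique, not ai-wright's actual implementation: the `UiElement` shape, the function names, and the sample elements are all assumptions, and a `selector` string stands in for a binding to a live DOM node.

```typescript
// Hypothetical snapshot of an on-page element; a real implementation
// would read these properties from the live DOM.
interface UiElement {
  selector: string;
  label: string;
  visible: boolean;
  disabled: boolean;
}

interface Mark {
  id: number;
  element: UiElement;
}

// Assign numeric marks only to elements a user could actually interact with.
function assignMarks(elements: UiElement[]): Mark[] {
  return elements
    .filter((el) => el.visible && !el.disabled)
    .map((element, index) => ({ id: index + 1, element }));
}

// Compact text companion to the annotated screenshot.
function describeMarks(marks: Mark[]): string {
  return marks.map((m) => `[${m.id}] ${m.element.label}`).join('\n');
}

// Map the ID the model picked back to a concrete element.
function resolveMark(marks: Mark[], id: number): UiElement {
  const mark = marks.find((m) => m.id === id);
  if (!mark) throw new Error(`No element carries mark ${id}`);
  return mark.element;
}

const marks = assignMarks([
  { selector: '#buy', label: 'Buy now', visible: true, disabled: false },
  { selector: '#promo', label: 'Apply promo', visible: false, disabled: false },
  { selector: '#pay', label: 'Pay', visible: true, disabled: true },
  { selector: '#help', label: 'Help', visible: true, disabled: false },
]);
console.log(describeMarks(marks));
```

Because the hidden and disabled elements never receive a mark, the model cannot pick them, and whatever ID it returns resolves to exactly one element instead of a guessed selector.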
2. Better Reasoning
Instead of one-shot “guess the next click,” ai-wright uses a multi-step reasoning loop.
It plans ahead, performs coarse-grained objective handling (e.g., “fill out login form,” not just “click button”), and adapts to UI state changes – minimizing retries and random flailing.
It can also identify blockers, such as modals, and run the pre-steps needed to clear them before acting on the objective.
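A loop of this kind might look roughly like the sketch below. It is not ai-wright's code: the `UiState` and `Step` types, the `runObjective` function, and the rule-based `stubPlanner` (which stands in for an LLM call) are all hypothetical, and the "browser" is simulated by a pure function.

```typescript
// Hypothetical view of the UI that the planner sees on each iteration.
interface UiState {
  modalOpen: boolean;
  loggedIn: boolean;
}

type Step =
  | { kind: 'dismissModal' }
  | { kind: 'fillLoginForm' }
  | { kind: 'done' };

// Stand-in for the LLM call: look at the current state, return the next step.
type Planner = (objective: string, state: UiState) => Promise<Step>;

// Stand-in for the browser: apply a step and return the resulting state.
function applyStep(state: UiState, step: Step): UiState {
  if (step.kind === 'dismissModal') return { ...state, modalOpen: false };
  // An open modal would intercept input, so the form only works once it is gone.
  if (step.kind === 'fillLoginForm' && !state.modalOpen) return { ...state, loggedIn: true };
  return state;
}

async function runObjective(
  objective: string,
  initial: UiState,
  plan: Planner,
  maxSteps = 5,
): Promise<{ state: UiState; trace: string[] }> {
  let state = initial;
  const trace: string[] = [];
  for (let i = 0; i < maxSteps; i++) {
    const step = await plan(objective, state); // re-plan against the latest state
    trace.push(step.kind);
    if (step.kind === 'done') return { state, trace };
    state = applyStep(state, step);
  }
  throw new Error(`Objective not reached within ${maxSteps} steps`);
}

// Rule-based planner used only to exercise the loop; a real one would query an LLM.
const stubPlanner: Planner = async (_objective, state) => {
  if (state.modalOpen) return { kind: 'dismissModal' };
  if (!state.loggedIn) return { kind: 'fillLoginForm' };
  return { kind: 'done' };
};
```

Starting from a state with an open modal, `runObjective('Log in', { modalOpen: true, loggedIn: false }, stubPlanner)` clears the blocker first, then handles the form, then stops. The step cap keeps a confused planner from looping forever.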
3. BYOL (Bring Your Own License)
ai-wright is LLM-agnostic, unlike existing solutions that either require a proprietary license or support only specific providers.
You can use your own OpenAI, Claude, or Gemini key, or a self-hosted model, which avoids vendor lock-in.
You can also use your TestChimp license, which proxies the LLM calls so you don't pay for tokens separately.
4. Fully Open Source
Unlike agentic SaaS offerings, which are closed-source and proprietary, ai-wright is fully open source, giving you complete transparency and community support.
ai-wright lets you inject AI where it matters — the tricky, ambiguous, or dynamic parts of your app — without giving up the speed, determinism, and maintainability of Playwright.
With vision-enabled reasoning, resilient multi-step planning, LLM flexibility, and a fully open source foundation, ai-wright bridges the best of both worlds: reliable, scriptable tests and AI-powered intelligence where you need it most – without any vendor lock-in.
AI where it helps, plain Playwright everywhere else.
