AI-wright: AI steps in Playwright scripts: open-source, Vision-enabled, BYOL

2 hours ago 2

Bring AI-native actions and verifications into your Playwright tests – open source, vision-enabled, and BYOL.

The Problem

Most “AI testing” frameworks make you throw away what already works.

They replace your entire test suite with “agentic” systems — where an LLM drives every click, assertion, and navigation step.

Sounds cool… until you hit:

Slow, flaky, or non-deterministic runs
Proprietary test formats
Complete vendor lock-in

For most teams, that’s a non-starter.

What if you could keep your existing Playwright scripts, and just inject AI where it’s actually needed – the ambiguous, messy, or dynamic parts of your app?

The Idea

ai-wright brings AI steps to Playwright.

You still write regular Playwright tests – deterministic, fast, inspectable – but when you hit a fuzzy point, you can drop in a step like:

await ai.act('Click on a top rated campaign', { page, test });

await ai.verify('The campaign description should not contain offensive words"', { page, test });

That’s it. AI only handles that step.

Everything else stays Playwright-native.

Why It’s Different

1. Vision-Enabled

Existing libraries (like ZeroStep and auto-playwright) use sanitized HTML – which misses what’s actually on screen.

This causes many issues:

HTML ≠ UI reality – static DOM can’t reveal if elements are disabled, visible, obscured, or off-screen – resulting in LLMs attempting interaction with non-interactive elements.
Loss of semantics – sanitized HTML strips ARIA roles, computed text, layout cues, and shadow DOM content, which are critical for accurate reasoning.
Unbounded prompt size – large DOMs can often get too verbose, requiring truncation (resulting in loss of context).
Fragile selectors – HTML-based approaches force LLMs to guess selectors; ai-wright uses precise SoM IDs bound to live DOM nodes, enabling accurate one-shot execution.

ai-wright is vision-enabled: it blends SOM (Set-Of-Marks) annotated screenshots + structured DOM context for grounded, visual reasoning.

The result: AI that operates just like a normal user would – based on what it sees on the screen.

2. Better Reasoning

Instead of one-shot “guess the next click,” ai-wright uses a multi-step reasoning loop.

It plans ahead, performs coarse-grained objective handling (e.g., “fill out login form,” not just “click button”), and adapts to UI state changes – minimizing retries and random flailing.

It can identify blockers (such as Modals etc.), and execute pre-steps before actioning on the objective.

3. BYOL (Bring Your Own License)

ai-wright is LLM-agnostic – unlike existing solutions which require either proprietary licenses or supports specific providers only.

You can use your own OpenAI, Claude, Gemini key, or your self-hosted model – avoiding vendor lock-in.

You can choose to use your TestChimp license as well – which will proxy the LLM calls, removing separate token costs for you.

4. Fully Open Source

Unlike agentic SaaS offerings which are closed source, proprietary solutions, ai-wright is fully open source, giving you complete transparency and community support.

ai-wright lets you inject AI where it matters — the tricky, ambiguous, or dynamic parts of your app — without giving up the speed, determinism, and maintainability of Playwright.

With vision-enabled reasoning, resilient multi-step planning, LLM flexibility, and a fully open source foundation, ai-wright bridges the best of both worlds: reliable, scriptable tests and AI-powered intelligence where you need it most – without any vendor lock-in.

AI where it helps, plain Playwright everywhere else.

Read Entire Article