What does it take to turn an agent that runs on your laptop into something resilient and scalable enough to run in the cloud at millions of executions? For an AI insurance-claim processor, that’s the difference between a prototype and a system that handles millions of claims per month. For coding agents, it’s the difference between a local Cursor or Claude Code instance and a platform like Lovable that runs background agents at scale.
This sounds really complex, but with the powerful tools available today it becomes surprisingly approachable. So let's build a scalable coding agent platform with:
- Modal for code sandboxes (environments where the agent develops the code) and the serverless compute for the agent.
- Restate for durable execution and state management — resilience, idempotency, retries, and scalable orchestration for agent context and workflows.
- OpenAI’s GPT-5 as the LLM.
We don't try to build the smartest coding agent; instead, we'll show how to make an agent scale to millions of users while staying resilient to crashes, outages, network hiccups, and rate limits. The superpowers come primarily from Modal and Restate: these are the components that define the scalable, fault-tolerant architecture. The LLM calls look the same whether you run one agent or millions.
For the impatient, here is a video of what this looks like when it is done.
Without too much work, our agent will tolerate transient and hard failures (continuing from the latest completed step), handle interruptions when new input becomes available, suspend on inactivity (freeing resources), and manage sandbox TTLs, snapshots, and restores. It also lets us observe and monitor the details of what it does.
The anatomy of a simple coding agent
Most coding agents offer a chat interface: the user states a goal, and the agent adds messages explaining its chain of thought and the steps it takes to achieve the task. The screenshot below shows how this looks in the Cursor IDE.

The coding agent's work begins with planning and creating a TODO list. Each step on the list is itself a multi-step workflow that might check the status of the environment, run commands, check results, experience errors, gather more information, retry, etc.
A crucial part of this experience is the ability to interrupt an agent mid-task. When a user sends a new message, the agent stops its current task to re-assess the situation with the additional input, then re-create the TODO list and resume the work.
Learning from this, we will need:
- A sandbox environment, where all the code gets created and compiled, commands are executed, and from which we can retrieve the final application.
- A workflow that creates the TODO list and runs the steps. Each step is an agent loop that works towards the step's goal and sends commands to the sandbox. The workflow ideally runs on serverless compute so it scales out quickly and scales to zero when the agent is inactive.
- An agent orchestrator that manages chat history and agent context, spawns workflows, and reacts to interruptions.
- Optionally, a UI where we enter chat messages and to which the agent's thoughts are streamed.

Modal
The Sandbox is our code environment: here code gets created, compiled, and launched, dependencies get installed, etc.
Modal offers a straightforward API for defining containers, launching them quickly at scale, and executing arbitrary code securely. Each sandbox is assigned a TTL or inactivity timeout, after which it is automatically terminated.
Once a sandbox is created we can execute commands directly and collect the response.
Modal's APIs are built for stateless clients: once a sandbox exists, any client can connect and issue commands to it, which makes Modal especially well-suited for FaaS environments.
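To make this lifecycle concrete, here is a sketch from the workflow code's point of view. The `SandboxClient` interface and its method names are hypothetical stand-ins for the actual Modal SDK calls (create a sandbox from an image with a timeout, execute a command, read the output, terminate), not Modal's real API surface:

```typescript
// Hypothetical wrapper interface around Modal's sandbox API; the method
// names here are illustrative, not Modal's actual SDK surface.
interface SandboxClient {
  create(opts: { image: string; timeoutSecs: number }): Promise<{ sandboxId: string }>;
  exec(
    sandboxId: string,
    command: string[]
  ): Promise<{ exitCode: number; stdout: string; stderr: string }>;
  terminate(sandboxId: string): Promise<void>;
}

// Stateless usage: create once, then any process can reconnect with
// just the sandbox id and issue further commands.
async function scaffoldProject(modal: SandboxClient): Promise<string> {
  const { sandboxId } = await modal.create({ image: "node:22", timeoutSecs: 600 });
  const result = await modal.exec(sandboxId, ["npm", "create", "vite@latest", "app"]);
  if (result.exitCode !== 0) {
    throw new Error(`scaffolding failed: ${result.stderr}`);
  }
  return sandboxId; // handed to later workflow steps
}
```

Because only the sandbox id needs to be passed around, the serverless functions that drive the agent stay completely stateless.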

Restate
Restate is a system for building innately resilient applications, workflows, or agents in the form of stateful durable functions. It manages failover, recovers workflows without repeating their completed steps, and gives us reliable RPC and simple state management.
Instead of stitching together queues, locks, K/V stores or heavyweight workflow engines, you simply turn a function or RPC handler into a stateful durable function by connecting it to Restate. Restate becomes the proxy/broker for the serverless function that manages durability and recovery.
The basic workflow of our agent looks like this, implemented with Restate’s TypeScript SDK:
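The sketch below shows the shape of such a workflow with Restate's TypeScript SDK. `createTodoList` and `executeStep` are hypothetical helpers standing in for the LLM planning call and the per-step agent loop that talks to the sandbox:

```typescript
import * as restate from "@restatedev/restate-sdk";

// Hypothetical helpers: LLM-based planning and the per-step agent loop.
type Step = { id: number; description: string };
declare function createTodoList(prompt: string): Promise<Step[]>;
declare function executeStep(step: Step, sandboxId: string): Promise<void>;

const codingWorkflow = restate.workflow({
  name: "codingWorkflow",
  handlers: {
    run: async (
      ctx: restate.WorkflowContext,
      req: { prompt: string; sandboxId: string }
    ) => {
      // Each ctx.run is journaled: after a crash, completed actions are
      // replayed from the journal instead of being executed again.
      const plan = await ctx.run("plan", () => createTodoList(req.prompt));
      for (const step of plan) {
        await ctx.run(`step-${step.id}`, () => executeStep(step, req.sandboxId));
      }
    },
  },
});

// Expose the workflow as an HTTP endpoint that Restate invokes.
restate.endpoint().bind(codingWorkflow).listen();
```

Each `ctx.run` block is what makes the workflow resilient: if the process crashes mid-plan, a retry resumes right after the last completed step.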
The final version has a few more details, like moving the agent loop to a sub-workflow and SAGA-style exception handling to release sandboxes (see below). For the final code, take a look at the GitHub repository.
Context Across Workflows
Our agent should remember previous interactions within a session, so that tasks can build on each other. That means keeping chat history and accumulating context updates (from the workflow as it completes steps).
One approach would be to write continuously to a database. To ensure a consistent, linear history of chat messages and responses, we would need to handle concurrency and synchronization (updates from the workflow racing with new chat messages) and fencing (stale retries interfering, lingering zombie workflows sending late updates).
The simpler approach is to use Restate's virtual objects: entities identified by a key (e.g., actor-id or session-name), each with access to its own isolated K/V state and request queue. Only a single handler invocation can execute per object at a time (ensuring a linear history of operations), and state updates are transactionally integrated into the durable execution, so we don't need to worry about most consistency issues.

We can start modeling the conversation with the following simple handler, which is the single main entry point for all agent interactions.
Scalable Fault-tolerant Agents in Action
We deploy our agent code as serverless functions, in this example on Vercel. Restate gives us the option to run our own server or to use Restate Cloud; we use the cloud version here and register our Vercel deployment URLs. Restate becomes our entry point for the durable agent execution. Our infrastructure looks like this:

To make it nicer to interact with, we put a simple UI in front of the APIs (courtesy of V0) and add some code to stream LLM output into the UI for faster feedback.
A simple coding task
Let's give the agent a first task and watch the workflow execution.
The Restate and Modal UIs show us how the workflow executes and the sandboxes spin up.
Failures and recovery
One of the nice properties of this setup is that it handles most infrastructure failures automatically. Below we see what happens if we let the agent's function process crash randomly: it recovers seamlessly, and we didn't write any code for that at all 🤯

Fast Scale-out
Finally, let's check what happens when we throw a bunch of tasks at different agent sessions:
- We see Restate building up sets of concurrent ongoing workflow invocations
- Our functions on Vercel rapidly scale out and handle all the requests
- Modal quickly starts spinning up sandboxes
And once we are done, everything goes back to zero. Kinda nice!
That’s it!
We just built a serverless, scalable, fault-tolerant cloud coding agent in a few hours. We truly live in the best of times (infra-building wise).
Check out the full code at: https://github.com/igalshilman/agent47/tree/main/packages/agent
Learn more about Modal’s sandboxes at https://modal.com/products/sandboxes
Check out the Restate docs or create a free Restate Cloud environment.
Bonus 1: Interruptions
Interruptions are messages sent to the agent while it is working. They are important for coding agents because a coding task can take a while: you might see the agent going off in the wrong direction and want to quickly add the missing context to get it back on track.
Restate lets us build interruptions easily using cancellation signals: these cause the current workflow action to terminally fail and raise an error, but allow the code to continue with further durable actions, such as cleanup and notifying the agent orchestrator. Cancellation signals automatically propagate through sub-workflows, giving us something similar to stack unwinding with exceptions, just in a distributed version.
We make a small change to our main message handler in the agent orchestrator Virtual Object, to track and cancel ongoing workflows upon messages.
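A sketch of that change, under the assumption of recent Restate TypeScript SDK versions that expose invocation handles and `ctx.cancel` (types simplified; `codingWorkflow` stands for the agent workflow definition):

```typescript
import * as restate from "@restatedev/restate-sdk";

// The agent workflow definition (hypothetical reference for this sketch).
declare const codingWorkflow: any;

const agentSession = restate.object({
  name: "agentSession",
  handlers: {
    message: async (ctx: restate.ObjectContext, content: string) => {
      // Interrupt the ongoing workflow, if any: cancellation terminally
      // fails its current action, but its durable cleanup still runs.
      const running = await ctx.get<string>("runningInvocationId");
      if (running) {
        ctx.cancel(running as any);
      }
      // ... append `content` to the chat history, re-plan, then start a
      // fresh workflow run and remember its invocation id for later:
      const handle = ctx
        .workflowSendClient(codingWorkflow, ctx.rand.uuidv4())
        .run({ prompt: content });
      ctx.set("runningInvocationId", await handle.invocationId);
    },
  },
});
```

Since the virtual object processes one message at a time, the cancel, the context update, and the new workflow start can never race with each other.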
When we run this, we can see the agent stopping workflows and the context getting updated.


Bonus 2: Managing Sandbox Lifetimes
Modal sandboxes use a Time-To-Live (TTL) or inactivity timeout, after which the sandbox gets terminated. To keep the sandbox alive across long coding sessions without setting a high TTL (and spending a lot of money), we use Modal's filesystem snapshot API and Restate's durable timers to track the TTL and extend it when needed with a stateful sandbox restart.
- Each sandbox is represented via a Virtual Object that handles interactions (e.g., writeFile, executeShellCommand).
- The virtual object schedules timers for restarts to extend the default TTL.
- The object’s single-writer semantics automatically ensure that timers, interactions, and restart sequences never interfere with each other.
The full code is in the GitHub repository - here we only look at the restart handler, which will be periodically invoked.
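A sketch of what this handler could look like. `snapshotFilesystem` and `createFromSnapshot` are hypothetical wrappers around Modal's snapshot and create APIs, and the exact delayed-send syntax varies slightly across Restate SDK versions:

```typescript
import * as restate from "@restatedev/restate-sdk";

// Hypothetical wrappers around Modal's filesystem-snapshot and
// create-from-snapshot APIs -- names are illustrative.
declare function snapshotFilesystem(sandboxId: string): Promise<string>;
declare function createFromSnapshot(snapshotId: string): Promise<string>;

const sandbox = restate.object({
  name: "sandbox",
  handlers: {
    // Periodically invoked via a delayed self-call (a durable timer).
    restart: async (ctx: restate.ObjectContext) => {
      const sandboxId = await ctx.get<string>("sandboxId");
      if (!sandboxId) return; // sandbox already released
      // Snapshot the filesystem and start a fresh sandbox from it:
      // a stateful restart that effectively extends the TTL.
      const snapshotId = await ctx.run("snapshot", () => snapshotFilesystem(sandboxId));
      const newId = await ctx.run("restore", () => createFromSnapshot(snapshotId));
      ctx.set("sandboxId", newId);
      // Re-schedule shortly before the new sandbox's TTL expires
      // (delay here is in milliseconds; API shape is an assumption).
      ctx
        .objectSendClient(sandbox, ctx.key)
        .restart(restate.rpc.sendOpts({ delay: 9 * 60 * 1000 }));
    },
  },
});
```

The object's single-writer semantics do the heavy lifting here: a restart can never run concurrently with a `writeFile` or `executeShellCommand` on the same sandbox.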
Bonus 3: SAGAs for cleanup
One of the amazing properties of durable execution is that it guarantees functions run to the end. That even extends to handling terminal exceptions (which don't cause retries but fail the function execution): we can catch these exceptions and handle them to perform cleanup and send RPCs (notifications). Adding this to our agent workflow looks like this:
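A sketch of SAGA-style cleanup in the workflow's `run` handler, using Restate's `TerminalError` (`createSandbox`, `releaseSandbox`, and `runSteps` are hypothetical helpers):

```typescript
import * as restate from "@restatedev/restate-sdk";

// Hypothetical helpers for sandbox lifecycle and the step loop.
declare function createSandbox(): Promise<string>;
declare function releaseSandbox(sandboxId: string): Promise<void>;
declare function runSteps(ctx: restate.WorkflowContext, sandboxId: string): Promise<void>;

const codingWorkflow = restate.workflow({
  name: "codingWorkflow",
  handlers: {
    run: async (ctx: restate.WorkflowContext, req: { prompt: string }) => {
      const sandboxId = await ctx.run("create sandbox", () => createSandbox());
      try {
        await runSteps(ctx, sandboxId);
      } catch (e) {
        if (e instanceof restate.TerminalError) {
          // Terminal errors (including cancellations) don't trigger
          // retries, but this compensation still runs durably before
          // the workflow is failed.
          await ctx.run("release sandbox", () => releaseSandbox(sandboxId));
        }
        throw e;
      }
      await ctx.run("release sandbox", () => releaseSandbox(sandboxId));
    },
  },
});
```

Because the compensation itself is a journaled `ctx.run`, even a crash in the middle of cleanup is recovered, so sandboxes are never leaked.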


