What does it take to turn an agent that runs on your laptop into something resilient and scalable enough to run in the cloud at millions of executions? For an AI insurance-claim processor, that’s the difference between a prototype and a system that handles millions of claims per month. For coding agents, it’s the difference between a local Cursor or Claude Code instance and a platform like Lovable that runs background agents at scale.
This sounds really complex, but with the powerful tools available today it becomes surprisingly approachable. So let's build a scalable coding agent platform with:
- Modal for code sandboxes (environments where the agent develops the code) and the serverless compute for the agent.
- Restate for durable execution and state management — resilience, idempotency, retries, and scalable orchestration for agent context and workflows.
- OpenAI’s GPT-5 as the LLM.
We don't try to build the smartest coding agent; instead, we'll show how to make an agent scale to millions of users while staying resilient to crashes, outages, network hiccups, and rate limits. The superpowers come primarily from Modal and Restate: these are the components that define the scalable, fault-tolerant architecture. The LLM calls look the same whether you run one agent or millions.
For the impatient, here is a video of what this looks like when it is done.
Without too much work, our agent will tolerate transient and hard failures (continuing from the latest completed step), handle interruptions when new input becomes available, suspend on inactivity (freeing resources), and manage sandbox TTLs, snapshots, and restores. It also lets us observe and monitor the details of what it does.
The anatomy of a simple coding agent
Most coding agents offer a chat interface: the user states a goal, and the agent adds messages explaining its chain of thought and the steps it takes to achieve the task. The screenshot below shows how this looks in the Cursor IDE.

The coding agent's work begins with planning and creating a TODO list. Each step on the list is itself a multi-step workflow that might check the status of the environment, run commands, check results, experience errors, gather more information, retry, etc.
A crucial part of this experience is the ability to interrupt an agent mid-task. When a user sends a new message, the agent stops its current task to re-assess the situation with the additional input, then re-create the TODO list and resume the work.
Learning from this, we will need:
- A sandbox environment, where all the code gets created and compiled, commands are executed, and from which we can retrieve the final application.
- A workflow that creates the TODO list and runs the steps. Each step is an agent loop that works towards the step's goal and sends commands to the sandbox. The workflow ideally runs on serverless compute so it scales out quickly and scales to zero when the agent is inactive.
- An agent orchestrator that manages chat history and agent context, spawns workflows, and reacts to interruptions.
- Optionally, a UI where we enter chat messages and to which the agent's thoughts are streamed.

Modal
The Sandbox is our code environment: here code gets created, compiled, and launched, dependencies get installed, etc.
Modal offers a straightforward API for defining containers, launching them quickly at scale, and executing arbitrary code securely. Each sandbox is assigned a TTL or inactivity timeout, after which it is automatically terminated.
Once a sandbox is created we can execute commands directly and collect the response.
Modal's APIs are built for stateless clients: once a sandbox exists, any client can connect and issue commands to it, which makes Modal especially well-suited for FaaS environments.
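To make this lifecycle concrete, here is a sketch from the workflow code's point of view. The `SandboxClient` interface and its method names are hypothetical stand-ins for the actual Modal SDK calls (create a sandbox from an image with a timeout, execute a command, read the output, terminate), not Modal's real API surface:

```typescript
// Hypothetical wrapper interface around Modal's sandbox API; the method
// names here are illustrative, not Modal's actual SDK surface.
interface SandboxClient {
  create(opts: { image: string; timeoutSecs: number }): Promise<{ sandboxId: string }>;
  exec(
    sandboxId: string,
    command: string[]
  ): Promise<{ exitCode: number; stdout: string; stderr: string }>;
  terminate(sandboxId: string): Promise<void>;
}

// Stateless usage: create once, then any process can reconnect with
// just the sandbox id and issue further commands.
async function scaffoldProject(modal: SandboxClient): Promise<string> {
  const { sandboxId } = await modal.create({ image: "node:22", timeoutSecs: 600 });
  const result = await modal.exec(sandboxId, ["npm", "create", "vite@latest", "app"]);
  if (result.exitCode !== 0) {
    throw new Error(`scaffolding failed: ${result.stderr}`);
  }
  return sandboxId; // handed to later workflow steps
}
```

Because only the sandbox id needs to be passed around, the serverless functions that drive the agent stay completely stateless.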

Restate
Restate is a system for building innately resilient applications, workflows, or agents in the form of stateful durable functions. It manages failover, recovers workflows without repeating their completed steps, and gives us reliable RPC and simple state management.
Instead of stitching together queues, locks, K/V stores or heavyweight workflow engines, you simply turn a function or RPC handler into a stateful durable function by connecting it to Restate. Restate becomes the proxy/broker for the serverless function that manages durability and recovery.
The basic workflow of our agent looks like this, implemented with Restate’s TypeScript SDK:
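The sketch below shows the shape of such a workflow with Restate's TypeScript SDK. `createTodoList` and `executeStep` are hypothetical helpers standing in for the LLM planning call and the per-step agent loop that talks to the sandbox:

```typescript
import * as restate from "@restatedev/restate-sdk";

// Hypothetical helpers: LLM-based planning and the per-step agent loop.
type Step = { id: number; description: string };
declare function createTodoList(prompt: string): Promise<Step[]>;
declare function executeStep(step: Step, sandboxId: string): Promise<void>;

const codingWorkflow = restate.workflow({
  name: "codingWorkflow",
  handlers: {
    run: async (
      ctx: restate.WorkflowContext,
      req: { prompt: string; sandboxId: string }
    ) => {
      // Each ctx.run is journaled: after a crash, completed actions are
      // replayed from the journal instead of being executed again.
      const plan = await ctx.run("plan", () => createTodoList(req.prompt));
      for (const step of plan) {
        await ctx.run(`step-${step.id}`, () => executeStep(step, req.sandboxId));
      }
    },
  },
});

// Expose the workflow as an HTTP endpoint that Restate invokes.
restate.endpoint().bind(codingWorkflow).listen();
```

Each `ctx.run` block is what makes the workflow resilient: if the process crashes mid-plan, a retry resumes right after the last completed step.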
The final version has a few more details, like moving the agent loop to a sub-workflow and SAGA-style exception handling to release sandboxes (see below). For the final code, take a look at the GitHub repository.
Context Across Workflows
Our agent should remember previous interactions within a session, so that tasks can build on each other. That means keeping chat history and accumulating context updates (from the workflow as it completes steps).
One approach would be to write continuously to a database. To ensure a consistent, linear history of chat messages and responses, we would need to handle concurrency and synchronization (updates from the workflow racing with new chat messages) and fencing (stale retries interfering, lingering zombie workflows sending late updates).
The simpler approach is to use Restate's virtual objects: entities identified by a key (e.g., actor-id or session-name), each with access to its own isolated K/V state and request queue. Only a single handler invocation can execute per object at a time (ensuring a linear history of operations), and state updates are transactionally integrated into the durable execution, so we don't need to worry about most consistency issues.

We can start modeling the conversation with the following simple handler, which is the single main entry point for all agent interactions.
Scalable Fault-tolerant Agents in Action
We deploy our agent code as serverless functions, in this example on Vercel. Restate gives us the option to run our own server or to use Restate Cloud; we use the cloud version here and register our Vercel deployment URLs. Restate becomes our entry point for the durable agent execution. Our infrastructure looks like this:

To make it nicer to interact with, we put a simple UI in front of the APIs (courtesy of V0) and add some code to stream LLM output into the UI for faster feedback.
A simple coding task
Let's give the agent a first task and watch the workflow execution.
The Restate and Modal UIs show us how the workflow executes and the sandboxes spin up.
Failures and recovery
One of the nice properties of this setup is that it handles most infrastructure failures automatically. Below we see what happens if we let the agent's function process crash randomly: it recovers seamlessly, and we didn't write any code for that at all 🤯

Fast Scale-out
Finally, let's check what happens when we throw a bunch of tasks at different agent sessions:
- We see Restate building up sets of concurrent ongoing workflow invocations
- Our functions on Vercel rapidly scale out and handle all the requests
- Modal quickly starts spinning up sandboxes
And once we are done, everything goes back to zero. Kinda nice!
That’s it!
We just built a serverless, scalable, fault-tolerant cloud coding agent in a few hours. We truly live in the best of times (infra-building wise).
Check out the full code at: https://github.com/igalshilman/agent47/tree/main/packages/agent
Learn more about Modal’s sandboxes at https://modal.com/products/sandboxes
Check out the Restate docs or create a free Restate Cloud environment.
Bonus 1: Interruptions
Interruptions are messages sent to the agent while it is working. They are important for coding agents because a coding task can take a while: you might see the agent going off in the wrong direction and want to quickly add the missing context to get it back on track.
Restate lets us build interruptions easily using cancellation signals: these cause the current workflow action to terminally fail and raise an error, but allow the code to continue with further durable actions, such as cleanup and notifying the agent orchestrator. Cancellation signals automatically propagate through sub-workflows, giving us something similar to stack unwinding with exceptions, just in a distributed version.
We make a small change to our main message handler in the agent orchestrator Virtual Object, to track and cancel ongoing workflows upon messages.
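A sketch of that change, under the assumption of recent Restate TypeScript SDK versions that expose invocation handles and `ctx.cancel` (types simplified; `codingWorkflow` stands for the agent workflow definition):

```typescript
import * as restate from "@restatedev/restate-sdk";

// The agent workflow definition (hypothetical reference for this sketch).
declare const codingWorkflow: any;

const agentSession = restate.object({
  name: "agentSession",
  handlers: {
    message: async (ctx: restate.ObjectContext, content: string) => {
      // Interrupt the ongoing workflow, if any: cancellation terminally
      // fails its current action, but its durable cleanup still runs.
      const running = await ctx.get<string>("runningInvocationId");
      if (running) {
        ctx.cancel(running as any);
      }
      // ... append `content` to the chat history, re-plan, then start a
      // fresh workflow run and remember its invocation id for later:
      const handle = ctx
        .workflowSendClient(codingWorkflow, ctx.rand.uuidv4())
        .run({ prompt: content });
      ctx.set("runningInvocationId", await handle.invocationId);
    },
  },
});
```

Since the virtual object processes one message at a time, the cancel, the context update, and the new workflow start can never race with each other.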
When we run this, we can see the agent stopping workflows and the context getting updated.


Bonus 2: Managing Sandbox Lifetimes
Modal sandboxes use a Time-To-Live (TTL) or inactivity timeout, after which the sandbox gets terminated. To keep the sandbox alive across long coding sessions without setting a high TTL (and spending a lot of money), we use Modal's filesystem snapshot API and Restate's durable timers to track the TTL and extend it when needed with a stateful sandbox restart.
- Each sandbox is represented via a Virtual Object that handles interactions (e.g., writeFile, executeShellCommand).
- The virtual object schedules timers for restarts to extend the default TTL.
- The object’s single-writer semantics automatically ensure that timers, interactions, and restart sequences never interfere with each other.
The full code is in the GitHub repository - here we only look at the restart handler, which will be periodically invoked.
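A sketch of what this handler could look like. `snapshotFilesystem` and `createFromSnapshot` are hypothetical wrappers around Modal's snapshot and create APIs, and the exact delayed-send syntax varies slightly across Restate SDK versions:

```typescript
import * as restate from "@restatedev/restate-sdk";

// Hypothetical wrappers around Modal's filesystem-snapshot and
// create-from-snapshot APIs -- names are illustrative.
declare function snapshotFilesystem(sandboxId: string): Promise<string>;
declare function createFromSnapshot(snapshotId: string): Promise<string>;

const sandbox = restate.object({
  name: "sandbox",
  handlers: {
    // Periodically invoked via a delayed self-call (a durable timer).
    restart: async (ctx: restate.ObjectContext) => {
      const sandboxId = await ctx.get<string>("sandboxId");
      if (!sandboxId) return; // sandbox already released
      // Snapshot the filesystem and start a fresh sandbox from it:
      // a stateful restart that effectively extends the TTL.
      const snapshotId = await ctx.run("snapshot", () => snapshotFilesystem(sandboxId));
      const newId = await ctx.run("restore", () => createFromSnapshot(snapshotId));
      ctx.set("sandboxId", newId);
      // Re-schedule shortly before the new sandbox's TTL expires
      // (delay here is in milliseconds; API shape is an assumption).
      ctx
        .objectSendClient(sandbox, ctx.key)
        .restart(restate.rpc.sendOpts({ delay: 9 * 60 * 1000 }));
    },
  },
});
```

The object's single-writer semantics do the heavy lifting here: a restart can never run concurrently with a `writeFile` or `executeShellCommand` on the same sandbox.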
Bonus 3: SAGAs for cleanup
One of the amazing properties of durable execution is that it guarantees functions run to the end. That even extends to handling terminal exceptions (which don't cause retries but fail the function execution): we can catch these exceptions and handle them to perform cleanup and send RPCs (notifications). Adding this to our agent workflow looks like this:
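A sketch of SAGA-style cleanup in the workflow's `run` handler, using Restate's `TerminalError` (`createSandbox`, `releaseSandbox`, and `runSteps` are hypothetical helpers):

```typescript
import * as restate from "@restatedev/restate-sdk";

// Hypothetical helpers for sandbox lifecycle and the step loop.
declare function createSandbox(): Promise<string>;
declare function releaseSandbox(sandboxId: string): Promise<void>;
declare function runSteps(ctx: restate.WorkflowContext, sandboxId: string): Promise<void>;

const codingWorkflow = restate.workflow({
  name: "codingWorkflow",
  handlers: {
    run: async (ctx: restate.WorkflowContext, req: { prompt: string }) => {
      const sandboxId = await ctx.run("create sandbox", () => createSandbox());
      try {
        await runSteps(ctx, sandboxId);
      } catch (e) {
        if (e instanceof restate.TerminalError) {
          // Terminal errors (including cancellations) don't trigger
          // retries, but this compensation still runs durably before
          // the workflow is failed.
          await ctx.run("release sandbox", () => releaseSandbox(sandboxId));
        }
        throw e;
      }
      await ctx.run("release sandbox", () => releaseSandbox(sandboxId));
    },
  },
});
```

Because the compensation itself is a journaled `ctx.run`, even a crash in the middle of cleanup is recovered, so sandboxes are never leaked.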


