written on October 17, 2025
In August I wrote about my experiments with replacing
MCP (Model Context
Protocol) with code. In
the time since, I have used that idea to explore non-coding agents at
Earendil. And I’m not alone! Multiple people have explored this space in the
meantime, and I felt it was worth sharing some updated findings. The general
idea is pretty simple: agents are very good at
writing code, so why don’t we let them write throw-away code to solve problems
that are not related to code at all?
I want to show you how and what I’m doing to give you some ideas of what works
and why this is much simpler than you might think.
Pyodide is the Dark Horse
The first thing you have to realize is that Pyodide is
secretly becoming a pretty big deal for a lot of agentic interactions. What is
Pyodide? Pyodide is an open source project that makes a standard Python
interpreter available via a WebAssembly runtime. What is neat about it is that
it has an installer called micropip that
allows it to install dependencies from PyPI. It also targets the emscripten
runtime environment, which means there is a pretty good standard Unix setup
around the interpreter that you can interact with.
Getting Pyodide to run is shockingly simple if you have a Node environment.
You can directly install it from npm. What makes this so cool is that you can
also interact with the virtual file system, which allows you to create a
persistent runtime environment that interacts with the outside world. You can
also get hosted Pyodide at this point from a whole bunch of startups, but you
can actually get this running on your own machine and infrastructure very
easily if you want to.
The way I found this to work best is to banish Pyodide into a web worker,
which allows you to interrupt it in case it runs into time limits.
A big reason Pyodide is such a powerful runtime is that Python has an
amazing ecosystem of well-established libraries that the models know about.
From manipulating PDFs or Word documents, to creating images, it’s all there.
File Systems Are King
Another vital ingredient to a code interpreter is having a file system.
Not just any file system though. I like to set up a virtual file system that I
intercept so that I can provide it with access to remote resources from
specific file system locations. For instance, you can have a folder on the
file system that exposes files which are just resources that come from your own
backend API. If the agent then chooses to read from those files, you can make
a safe HTTP request from outside the sandbox to bring that resource into play.
The sandbox itself does not have network access, so it’s only the file system
that gates access to resources.
The reason the file system works so well is that agents just know so much
about how file systems behave, and you can provide safe access to resources through some
external system outside of the sandbox. You can provide read-only access to
some resources and write access to others, then access the created artifacts
from the outside again.
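As a toy illustration of that gating idea, the host side can keep a small routing table from virtual paths to backend resources; everything not in the table simply does not exist. The paths and the endpoint here are made up:

```javascript
// maps sandbox-visible paths to backend resources; the sandbox itself
// has no network access, so this table is the only way in or out
const routes = {
  "/network/current-ip": { url: "https://backend.example.com/ip", mode: "r" },
  "/output/result.png": { mode: "w" }, // write-only artifact slot
};

// called from outside the sandbox when the agent reads a virtual file
function resolveRead(path) {
  const route = routes[path];
  if (!route || !route.mode.includes("r")) {
    throw new Error(`no readable resource at ${path}`);
  }
  return route.url; // the host performs the actual HTTP request
}
```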
Now actually doing that is a tad tricky because the emscripten file system is
sync, and most of the interesting things you can do are async. The option that
I ended up going with is to move the fetch-like async logic into another web
worker and use
Atomics.wait
to block. If your entire Pyodide runtime is in a web worker, that’s not as bad
as it looks.
That said, I wish the emscripten file system API were changed to support stack
switching instead. While it’s now possible to hide async promises
behind sync abstractions within Pyodide with
call_sync,
the same approach does not work for the emscripten JavaScript FS API.
I have a full example of this at the end, but the simplified pseudocode that I
ended up with looks like this:
// main thread: wrap a worker so fetch() looks synchronous
// (receiveMessageOnPort comes from node:worker_threads)
fetch(url) {
  const signalBuffer = new SharedArrayBuffer(4);
  const signal = new Int32Array(signalBuffer);
  const { port1, port2 } = new MessageChannel();
  this.worker.postMessage({ url, signalBuffer, port: port2 }, [port2]);
  Atomics.wait(signal, 0, 0); // park until worker flips the signal
  const message = receiveMessageOnPort(port1); // MessageChannel gives the payload
  port1.close();
  if (message.message.status !== "ok") {
    throw new Error(message.message.error.message);
  }
  return message.message.data;
}
// worker thread: perform async fetch, then wake the main thread
// (parentPort comes from node:worker_threads)
parentPort.on("message", async ({ url, signalBuffer, port }) => {
  const signal = new Int32Array(signalBuffer);
  try {
    const bytes = await fetch(url).then((r) => {
      if (!r.ok) throw new Error(`HTTP ${r.status}`);
      return r.arrayBuffer();
    });
    port.postMessage({ status: "ok", data: new Uint8Array(bytes) });
    Atomics.store(signal, 0, 1); // mark success
  } catch (error) {
    // serialize: your own error-to-plain-object helper
    port.postMessage({ status: "error", error: serialize(error) });
    Atomics.store(signal, 0, -1); // mark failure
  } finally {
    Atomics.notify(signal, 0); // unblock the waiting main thread
    port.close();
  }
});
Durable Execution
Lastly, now that you have agents running, you really need durable execution. I
would describe durable execution as the ability to retry a complex
workflow safely without losing progress. The reason for this is that agents
can take a very long time, and if they are interrupted, you want to bring them
back to the state they were in. This has become a pretty hot topic. There are a
lot of startups in that space and you can buy yourself a tool off the shelf if
you want to.
What is a little bit disappointing is that there is no truly simple durable
execution system. By that I mean something that just runs on top of Postgres
and/or Redis, the way that, for instance, pgmq does for queues.
The easiest way to shoehorn this yourself is to use queues to restart your
tasks and to cache away the temporary steps from your execution. Basically,
you compose your task from multiple steps and each of the steps just has a very
simple cache key. It’s really just that simple:
function myAgenticLoop(taskID, initialState) {
  let stepCount = 0;
  let state = initialState;
  while (stepCount < MAX_STEPS) {
    const cacheKey = `${taskID}:${stepCount}`;
    const cachedState = loadStateFromCache(cacheKey);
    if (cachedState !== null) {
      state = cachedState; // resume from the completed step
    } else {
      state = runAgenticStep(state);
      storeStateInCache(cacheKey, state);
    }
    stepCount++;
    if (reachedEndCondition(state)) {
      break;
    }
  }
  return state;
}
You can improve on this greatly, but this is the general idea. The state is
basically the conversation log and whatever else you need to keep around for
the tool execution (e.g., whatever was thrown on the file system).
What Other Than Code?
What tools does an agent need that are not code? Well, the code needs to be
able to do something interesting so you need to give it access to something.
The most interesting access you can provide is via the file system, as
mentioned. But there are also other tools you might want to expose. What
Cloudflare proposed is connecting to MCP servers and exposing their tools to
the code interpreter. I think this is quite an interesting approach, and to some
degree it’s probably where you want to go.
Some tools that I find interesting:
- Describe: a tool that just lets the agent run more inference, mostly with
  files that the code interpreter generated. For instance, if you have a zip
  file it’s quite fun to see the code interpreter use Python to unpack it.
  But if that unpacked file is a JPG, you will need to go back to inference
  to understand it.
- Help: a tool that just … brings up help. Again, this can be backed by
  inference doing basic RAG, or similar. I found it quite interesting to let
  the AI ask for help. For example, you want the manual tool to allow a query
  like “Which Python code should I write to create a chart for the given XLSX
  file?” On the other hand, you can also just stash away some instructions in
  .md files on the virtual file system and have the code interpreter read
  them. It’s all an option.
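In the Anthropic API’s tool schema, those two tools could be declared roughly like this; the names, descriptions, and parameters are just an illustration, not a fixed interface:

```javascript
// hypothetical tool declarations, passed alongside the code-execution tool
const extraTools = [
  {
    name: "describe",
    description:
      "Run inference over a file the code interpreter produced " +
      "(e.g. look at an image) and return a text description.",
    input_schema: {
      type: "object",
      properties: {
        path: { type: "string", description: "virtual file system path" },
        question: { type: "string", description: "what to look for" },
      },
      required: ["path"],
    },
  },
  {
    name: "help",
    description: "Ask for guidance, backed by basic RAG over internal docs.",
    input_schema: {
      type: "object",
      properties: {
        query: { type: "string" },
      },
      required: ["query"],
    },
  },
];
```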
Putting it Together
If you want to see what this roughly looks like, I vibe-coded a simple version
of it. It uses a made-up example, but it does show how a sandbox
with very little tool availability can produce surprising results:
mitsuhiko/mini-agent.
When you run it, it looks up the current IP from a special network drive that
triggers an async fetch, and then it (usually) uses Pillow or matplotlib to
make an image of that IP address. Pretty pointless, but a lot of fun!
What an agent run looks like
Prompt: Figure out the current ip address and make me a picture of it.
Step 1:
Stop reason: tool_use
Tool call: execute_python {
code: '\n' +
'# Read the current IP address\n' +
"with open('/network/current-ip', 'r') as f:\n" +
' ip_address = f.read().strip()\n' +
' \n' +
'print(f"Current IP Address: {ip_address}")\n'
}
Tool result: {
stdout: 'Current IP Address: 255.255.255.255',
stderr: '',
success: true
}
[Cache] Saved state to: agent-cache/task-0-step-1.json
Step 2:
Stop reason: tool_use
Tool call: execute_python {
code: '\n' +
'import matplotlib.pyplot as plt\n' +
'import matplotlib.patches as patches\n' +
'from matplotlib.patches import FancyBboxPatch\n' +
'import numpy as np\n' +
'\n' +
'# Create figure and axis\n' +
'fig, ax = plt.subplots(1, 1, figsize=(12, 8))\n' +
'ax.set_xlim(0, 10)\n' +
'ax.set_ylim(0, 10)\n' +
"ax.axis('off')\n" +
'\n' +
'# Add background gradient\n' +
'gradient = np.linspace(0, 1, 256).reshape(1, -1)\n' +
'gradient = np.vstack([gradient] * 256)\n' +
"ax.imshow(gradient, extent=[0, 10, 0, 10], aspect='auto', cmap='cool', alpha=0.3)\n" +
'\n' +
'# Title\n' +
"ax.text(5, 9, 'Current IP Address', fontsize=28, weight='bold', \n" +
" ha='center', va='top', color='#1a1a1a')\n" +
'\n' +
'# IP Address Box\n' +
'fancy_box = FancyBboxPatch((0.5, 4.5), 9, 2.5, \n' +
' boxstyle="round,pad=0.2", \n' +
" edgecolor='#2196F3', facecolor='#E3F2FD',\n" +
' linewidth=3)\n' +
'ax.add_patch(fancy_box)\n' +
'\n' +
'# IP Address Text\n' +
"ax.text(5, 5.75, ip_address, fontsize=48, weight='bold', \n" +
" ha='center', va='center', color='#1976D2', family='monospace')\n" +
'\n' +
'# Decorative elements - network nodes\n' +
'circle_positions = [(2, 2), (8, 2), (2, 8), (8, 8), (5, 1), (5, 9)]\n' +
'for pos in circle_positions:\n' +
" circle = patches.Circle(pos, 0.3, color='#FF5722', alpha=0.7)\n" +
' ax.add_patch(circle)\n' +
'\n' +
'# Connection lines\n' +
'for i in range(len(circle_positions)):\n' +
' for j in range(i+1, len(circle_positions)):\n' +
' x = [circle_positions[i][0], circle_positions[j][0]]\n' +
' y = [circle_positions[i][1], circle_positions[j][1]]\n' +
" ax.plot(x, y, color='#2196F3', alpha=0.2, linewidth=1)\n" +
'\n' +
'# Info text\n' +
"ax.text(5, 3.8, '🌐 Your Network Location 🌐', fontsize=14, \n" +
" ha='center', va='top', color='#555', style='italic')\n" +
'\n' +
'plt.tight_layout()\n' +
"plt.savefig('/output/ip_address.png', dpi=150, bbox_inches='tight', facecolor='white')\n" +
'print("✅ Picture saved to /output/ip_address.png")\n' +
'plt.close()\n' +
'\n' +
'print(f"\\n📍 Your IP Address: {ip_address}")\n'
}
Tool result: {
stdout: 'Loading Pillow, contourpy, cycler, fonttools, kiwisolver, matplotlib, numpy, packaging, pyparsing, python-dateutil, pytz, sixLoaded Pillow, contourpy, cycler, fonttools, kiwisolver, matplotlib, numpy, packaging, pyparsing, python-dateutil, pytz, six',
stderr: 'Traceback (most recent call last):\n' +
' File "/lib/python313.zip/_pyodide/_base.py", line 597, in eval_code_async\n' +
' await CodeRunner(\n' +
' ...<9 lines>...\n' +
' .run_async(globals, locals)\n' +
' File "/lib/python313.zip/_pyodide/_base.py", line 411, in run_async\n' +
' coroutine = eval(self.code, globals, locals)\n' +
' File "<exec>", line 8, in <module>\n' +
' File "/lib/python3.13/site-packages/matplotlib/pyplot.py", line 1613, in subplots\n' +
' fig = figure(**fig_kw)\n' +
' File "/lib/python3.13/site-packages/matplotlib/pyplot.py", line 934, in figure\n' +
' manager = new_figure_manager(\n' +
' num, figsize=figsize, dpi=dpi,\n' +
' facecolor=facecolor, edgecolor=edgecolor, frameon=frameon,\n' +
' FigureClass=FigureClass, **kwargs)\n' +
' File "/lib/python3.13/site-packages/matplotlib/pyplot.py", line 464, in new_figure_manager\n' +
' _warn_if_gui_out_of_main_thread()\n' +
' ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^\n' +
' File "/lib/python3.13/site-packages/matplotlib/pyplot.py", line 441, in _warn_if_gui_out_of_main_thread\n' +
' canvas_class = cast(type[FigureCanvasBase], _get_backend_mod().FigureCanvas)\n' +
' ~~~~~~~~~~~~~~~~^^\n' +
' File "/lib/python3.13/site-packages/matplotlib/pyplot.py", line 280, in _get_backend_mod\n' +
' switch_backend(rcParams._get("backend")) # type: ignore[attr-defined]\n' +
' ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^\n' +
' File "/lib/python3.13/site-packages/matplotlib/pyplot.py", line 342, in switch_backend\n' +
' module = importlib.import_module(cbook._backend_module_name(newbackend))\n' +
' File "/lib/python313.zip/importlib/__init__.py", line 88, in import_module\n' +
' return _bootstrap._gcd_import(name[level:], package, level)\n' +
' ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n' +
' File "<frozen importlib._bootstrap>", line 1387, in _gcd_import\n' +
' File "<frozen importlib._bootstrap>", line 1360, in _find_and_load\n' +
' File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked\n' +
' File "<frozen importlib._bootstrap>", line 935, in _load_unlocked\n' +
' File "<frozen importlib._bootstrap_external>", line 1026, in exec_module\n' +
' File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed\n' +
' File "/lib/python3.13/site-packages/matplotlib/backends/backend_webagg.py", line 23, in <module>\n' +
' from js import document\n' +
"ImportError: cannot import name 'document' from 'js' (unknown location)\n",
success: false
}
[Cache] Saved state to: agent-cache/task-0-step-2.json
Step 3:
Stop reason: tool_use
Tool call: execute_python {
code: '\n' +
'from PIL import Image, ImageDraw, ImageFont\n' +
'import os\n' +
'\n' +
'# Read the current IP address\n' +
"with open('/network/current-ip', 'r') as f:\n" +
' ip_address = f.read().strip()\n' +
'\n' +
'# Create a new image with a nice gradient-like background\n' +
'width, height = 1200, 800\n' +
"image = Image.new('RGB', (width, height), color='white')\n" +
'draw = ImageDraw.Draw(image)\n' +
'\n' +
'# Draw background with gradient effect using rectangles\n' +
'for i in range(height):\n' +
' ratio = i / height\n' +
' r = int(30 + (100 * ratio))\n' +
' g = int(100 + (150 * ratio))\n' +
' b = int(200 + (55 * ratio))\n' +
' draw.rectangle([(0, i), (width, i+1)], fill=(r, g, b))\n' +
'\n' +
'# Draw main box\n' +
'box_margin = 100\n' +
'box_top = 250\n' +
'box_height = 300\n' +
'draw.rectangle(\n' +
' [(box_margin, box_top), (width - box_margin, box_top + box_height)],\n' +
' fill=(240, 248, 255),\n' +
' outline=(33, 150, 243),\n' +
' width=5\n' +
')\n' +
'\n' +
'# Draw rounded corners effect\n' +
'for i in range(20):\n' +
' draw.arc([(box_margin, box_top), (box_margin + 40, box_top + 40)], \n' +
' 0, 90, fill=(33, 150, 243), width=5)\n' +
'\n' +
'# Add decorative network nodes\n' +
'node_radius = 15\n' +
'nodes = [(150, 150), (1050, 150), (150, 650), (1050, 650), (600, 100), (600, 700)]\n' +
'for node in nodes:\n' +
' draw.ellipse(\n' +
' [(node[0] - node_radius, node[1] - node_radius),\n' +
' (node[0] + node_radius, node[1] + node_radius)],\n' +
' fill=(255, 87, 34),\n' +
' outline=(200, 50, 0),\n' +
' width=2\n' +
' )\n' +
'\n' +
'# Draw connection lines\n' +
'for i, node1 in enumerate(nodes):\n' +
' for node2 in nodes[i+1:]:\n' +
' draw.line([node1, node2], fill=(33, 150, 243, 50), width=1)\n' +
'\n' +
'# Try to use a large default font\n' +
'try:\n' +
' title_font = ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSans-Bold.ttf", 60)\n' +
' ip_font = ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuMono-Bold.ttf", 90)\n' +
' subtitle_font = ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf", 30)\n' +
'except:\n' +
' title_font = ImageFont.load_default()\n' +
' ip_font = ImageFont.load_default()\n' +
' subtitle_font = ImageFont.load_default()\n' +
'\n' +
'# Draw title\n' +
'draw.text((width // 2, 80), "Current IP Address", font=title_font, \n' +
' fill=(255, 255, 255), anchor="mm")\n' +
'\n' +
'# Draw IP address\n' +
'draw.text((width // 2, box_top + box_height // 2), ip_address, font=ip_font, \n' +
' fill=(25, 118, 210), anchor="mm")\n' +
'\n' +
'# Draw subtitle\n' +
'draw.text((width // 2, box_top + box_height + 50), "🌐 Your Network Location 🌐", \n' +
' font=subtitle_font, fill=(100, 100, 100), anchor="mm")\n' +
'\n' +
'# Save the image\n' +
"output_path = '/output/ip_address.png'\n" +
'image.save(output_path)\n' +
'print(f"✅ Picture saved to {output_path}")\n' +
'print(f"\\n📍 Your IP Address: {ip_address}")\n'
}
Tool result: {
stdout: 'Pillow already loaded from default channelNo new packages to load✅ Picture saved to /output/ip_address.png📍 Your IP Address: 255.255.255.255',
stderr: '',
success: true
}
[Cache] Saved state to: agent-cache/task-0-step-3.json
Step 4:
Stop reason: end_turn
Final result: Perfect! 🎉 I've successfully:
1. **Found your IP address**: **255.255.255.255**
2. **Created a picture** showing your IP address with:
- A nice gradient blue background
- Your IP address prominently displayed in the center
- Decorative network nodes and connection lines
- Professional styling and formatting
The image has been saved to `/output/ip_address.png` and is ready for you to download!
[Cache] Saved state to: agent-cache/task-0-step-4.json
Total steps: 4
Making 1 file(s) available in ./output:
✓ ip_address.png
The same approach has also been leveraged by Anthropic and Cloudflare. There
is some further reading that might give you more ideas:
- Claude Skills fully leverages code generation for working with documents
  and other interesting things. It comes with a (non-open-source) repository
  of example skills that the LLM and code executor can use: anthropics/skills
- Cloudflare’s Code Mode, which is the idea of creating TypeScript bindings
  for MCP tools and having the agent write code to use them in a sandbox.
This entry was tagged
ai and
thoughts