Navigating the Storm: Driving AI Agents



I've been spending a lot of time lately orchestrating multiple AI coding agents, and I've noticed something strange: it's surprisingly stressful. Not in the way that debugging a production outage is stressful, but in this odd, sustained, attention-demanding way that I'm still trying to articulate.

The best analogy I've found is that it's like driving in heavy rain.

How I got here

Like most developers, I started with a single coding agent. I'd describe what I wanted in natural language, review what it built, iterate on the prompt. It was a new muscle to develop—learning to be specific about requirements, understanding how to break down problems in a way that an LLM could execute on—but it felt manageable. Almost relaxing, even.

Then I tried running multiple agents in parallel. One refactoring the backend authentication system, another updating the frontend components, a third writing tests. The productivity was remarkable. What would have taken me days was happening in hours. But I found myself in this weird state of hypervigilance that I wasn't expecting.

I mentioned this to a few other developers who'd been experimenting with multi-agent workflows, and they all immediately recognized what I was describing. We kept using similar words: "intense," "draining," "you can't look away." One friend called it "productive anxiety," which feels about right.

The driving in rain thing

Here's what I realized: when you're orchestrating multiple agents, you're mostly not coding. You're watching. Terminal outputs scrolling. File changes appearing in your IDE. Agents reporting progress, hitting errors, asking for clarification.

It's like those long stretches of highway driving in the rain where nothing is really happening—you're just maintaining your lane, keeping your speed steady, staying present. Most of the time, the agents are doing fine. The code is being written, tests are passing, the work is progressing.

But you can't mentally check out. Because every so often, you need to make a small but critical correction. An agent is about to overwrite the wrong file. Two agents are operating on conflicting assumptions about the API contract. A dependency change in one agent's work is about to break another agent's context. These moments come suddenly, and if you miss them, recovery gets expensive.

What makes it particularly draining is that you're responsible for outcomes you're not directly controlling. The agents are doing the actual work—you're barely touching the keyboard—but you're the one who has to sense when something's drifting. By the time an agent has fully gone off-road, you're looking at a much bigger cleanup job.

What I'm learning about this

I've been keeping notes on what seems to work. Not sure if these generalize, but here's what I'm seeing:

The struggle of "not coding" was real for me at first. I kept feeling like I should be doing more. But orchestration is the work. Making judgment calls about when to intervene, maintaining the mental model of what each agent knows and is doing, catching the moment before things drift—this is cognitively demanding in a different way than writing code directly.

I've found my personal limit is around three concurrent agents on complex tasks. More than that and the context-switching cost starts to outweigh the parallelization benefit. Your mileage may vary.

Checkpoints matter more than I expected. Every 15-20 minutes, I stop everything and review what each agent has done. Commit the good work, adjust course where needed. Without these deliberate pause points, I find myself losing track of the overall state.
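
Concretely, my checkpoint pass looks something like the sketch below. This is a minimal, hand-rolled version, assuming each agent works in its own git worktree; the agent names and paths are made up, and a real setup would wire this into whatever harness actually launches the agents.

```python
#!/usr/bin/env python3
"""Rough checkpoint helper: review and commit each agent's work in turn."""
import subprocess

# Hypothetical layout: one git worktree per agent, on its own branch.
AGENT_WORKTREES = {
    "backend-auth": "../wt-backend-auth",
    "frontend": "../wt-frontend",
    "tests": "../wt-tests",
}

def git(worktree: str, *args: str) -> str:
    """Run a git command in the given worktree and return its stdout."""
    return subprocess.run(
        ["git", "-C", worktree, *args],
        capture_output=True, text=True, check=True,
    ).stdout

def checkpoint() -> None:
    for agent, worktree in AGENT_WORKTREES.items():
        status = git(worktree, "status", "--short")
        if not status.strip():
            print(f"[{agent}] no changes since last checkpoint, skipping")
            continue
        print(f"\n=== {agent} ===\n{status}")
        answer = input(f"Commit {agent}'s work as a checkpoint? [y/N] ")
        if answer.lower().startswith("y"):
            git(worktree, "add", "-A")
            git(worktree, "commit", "-m", f"checkpoint: {agent}")
            print(f"[{agent}] checkpointed")
        else:
            print(f"[{agent}] left for manual review")

if __name__ == "__main__":
    checkpoint()
```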

The fatigue is different from normal coding fatigue. After a few hours of agent orchestration, I'm more tired than after a full day of writing code myself. I think it's the sustained attention without the natural breaks that coding normally provides.

The tooling problem

Here's what's bothering me: we're doing all of this with tools that weren't designed for it.

Current IDEs have basically added a chat panel to the sidebar and called it AI integration. But the rest of the interface is still organized around the same paradigm—file trees, code editors, maybe some tabs. The code is the primary artifact, and the AI is a feature.

When you're running multiple agents, though, you're not primarily looking at code. You're monitoring state, tracking progress, managing context, catching divergence. The file tree view is almost beside the point.

I keep thinking we need something more like a dashboard. What is each agent currently doing? What's in their context? Where are they in their task graphs? What's queued? A unified view that shows agent state as the primary interface, not as a sidebar add-on.
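
To make that concrete, here's roughly the state I'd want such a dashboard to render for each agent. This is just a sketch of a data model, not any existing tool's schema; the field names are guesses at what I currently track in my head.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

@dataclass
class ContextItem:
    description: str      # e.g. "API contract for /sessions"
    loaded_at: datetime   # when the agent last saw this

    def is_stale(self, max_age: timedelta = timedelta(minutes=30)) -> bool:
        return datetime.now() - self.loaded_at > max_age

@dataclass
class AgentState:
    name: str
    current_task: str
    status: str                                          # "working" | "blocked" | "waiting-for-review"
    files_touched: list[str] = field(default_factory=list)
    context: list[ContextItem] = field(default_factory=list)
    queued_tasks: list[str] = field(default_factory=list)

def render(agents: list[AgentState]) -> None:
    """One line per agent: the summary view I wish sat where the file tree does."""
    for a in agents:
        stale = sum(item.is_stale() for item in a.context)
        print(
            f"{a.name:<12} {a.status:<20} {a.current_task:<28} "
            f"files:{len(a.files_touched):<3} stale-context:{stale} queued:{len(a.queued_tasks)}"
        )

# Example: the kind of one-glance summary I'm after.
render([
    AgentState("backend-auth", "Migrate sessions", "working",
               files_touched=["auth/models.py"], queued_tasks=["Add tests"]),
])
```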

Some specific things I wish I had:

Real-time diff streams, not file-at-a-time changes. Show me what's being modified across all agents, let me approve or redirect in the moment. I'm not editing files anymore; I'm steering edits.

Visual context management. Let me see what each agent knows, when context is getting stale, when two agents have incompatible understandings of the same system. Right now I'm tracking this in my head.

Built-in checkpointing. The tool should prompt me: "Agent 2 has made 47 changes in 12 minutes. Review?" One-click to see the aggregate, one-click to commit or rollback per-agent.

Task graphs instead of file trees. Show me the work breakdown—"Refactor authentication" splits into "Update user model," "Migrate sessions," "Add tests." Assign agents to tasks. The files are implementation details.

Communication intercepts. When Agent A's output feeds into Agent B's input, show me that dependency. Better yet, let me intercept and adjust that handoff before Agent B runs with wrong assumptions.
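
To make those last two items concrete, here's a rough sketch of a task graph with an interceptable handoff. The task breakdown is the one from the example above; the Task and handoff shapes, and the agent names, are my own invention, not any real tool's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class Task:
    name: str
    assigned_agent: Optional[str] = None
    depends_on: list["Task"] = field(default_factory=list)
    output: Optional[str] = None  # whatever the agent hands downstream

# The work breakdown from above, with agents assigned to tasks.
auth = Task("Refactor authentication")
user_model = Task("Update user model", assigned_agent="agent-1", depends_on=[auth])
sessions = Task("Migrate sessions", assigned_agent="agent-2", depends_on=[user_model])
tests = Task("Add tests", assigned_agent="agent-3", depends_on=[user_model, sessions])

def handoff(upstream: Task, downstream: Task,
            intercept: Callable[[str], str] = lambda output: output) -> None:
    """Pass upstream output to the downstream agent, with a human intercept.

    The intercept is where I'd pause, read what agent A actually produced,
    and correct it before agent B bakes it into its context.
    """
    assert upstream in downstream.depends_on, "no declared dependency"
    downstream_input = intercept(upstream.output or "")
    print(f"{downstream.assigned_agent} starts {downstream.name!r} with: {downstream_input}")

# Example: review the user-model change before the session-migration agent builds on it.
user_model.output = "sessions now keyed by user_id, not email"
handoff(user_model, sessions, intercept=lambda o: input(f"Edit handoff?\n> {o}\n") or o)
```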

The meta-problem

And that's just for one developer with multiple agents. We haven't really started thinking about what happens when multiple developers are each orchestrating multiple agents on the same codebase. Current Git workflows weren't designed for this. What does code review even mean when 80% of the diff was machine-generated? You're not reviewing code quality—you're reviewing orchestration decisions. Did the human steer the agents toward good architecture? Are the task boundaries sensible? Is the system coherent?

I don't have answers here. I'm not even sure what the right questions are yet.

Where this might be going

Right now we're making do. We're driving in the storm with tools designed for sunny weather. It works through sheer concentration and adaptation, but it doesn't feel sustainable.

I suspect someone is building the right tooling for this. Not the "AI agent" as a feature bolted onto existing IDEs, but tools designed from the ground up for orchestration. Where agent state is the primary view. Where humans are conductors, not composers.

We're still early. Most of us are squinting through that little chat panel, hoping we catch the critical moment before an agent drifts into the median.

But the driving metaphor keeps resonating with people I talk to. We're all learning to navigate the same storm.

If you're working on this problem—either as someone orchestrating agents or building tools for it—I'd be curious to hear what you're seeing. I'm @lolsborn on Substack, or you can email me at [email protected].
