Agents, knowledge debt, and asynchronous work


We came across a short-but-interesting post this past week about how learning new things by definition creates technical debt. We’d encourage you to read the whole post, but a brief summary of the idea is that as soon as you learn how to build better software (e.g., improved performance, better abstraction, etc.), all the parts of your codebase that don’t follow those best practices are your tech debt. Some debt can be more insidious (have a higher interest rate) than others, but debt is debt.

Zooming out a little, while tech debt is a familiar concept to any software engineer who’s seen an ugly part of a codebase, this framing was particularly interesting to us because it can actually be applied more broadly. You can have infrastructure debt (e.g., a better deployment pattern on your cloud that you haven’t implemented), marketing message debt (e.g., a new marketing message that hasn’t worked its way into all your assets yet), or team process debt (e.g., a new way of doing things that everyone isn’t caught up on) amongst others.


What caught our attention more than anything is that this is an ideal problem statement for a whole suite of agents that we haven’t yet started to see on the market — systems that run in the background and clean up our knowledge debt without requiring us to do the tedious work by hand.

At a 30,000-foot level, the promise of agents is that we can do dramatically more work than we were doing before because there are agents running around working for us. Largely because of the way ChatGPT set our expectations, we still tend to think of these agents as interactive (and synchronous). We ask an agent to go do something, and we get the results faster than we would have if we'd done that same task ourselves. And while we see examples on Twitter of people getting better at spinning off tasks in parallel, we ourselves still mostly tend to sit and watch ChatGPT and wait for its responses.

The problem with knowledge debt is that you often don't know you have it until you run into it. You might have a nagging sense that there are outdated parts of the codebase, but it doesn't really bite you until you have to touch the code. When you finally have to change it, you either have to remember how things used to be done or spend your time refactoring the code (or having Cursor refactor it) before you can move forward.

The opportunity with agents is to have that work detected and done in the background for you. When you decide to adopt a new pattern or a new language in your codebase, you can spin up an agent that scans through the codebase at its leisure and refactors your code over the course of days (especially once we have spot token pricing!). This is not really the world we live in yet (though it's in plenty of pitch decks that we've seen). As much as Cursor or Cognition would like us to have tasks running async, the most productive coding tools are still being used synchronously by engineers who are providing concrete, fine-grained guidance.
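To make the background-refactor idea concrete, here's a minimal sketch of what such an agent loop could look like. Everything in it is illustrative: the repo path, the patterns being swapped, the cadence, and the `llm_refactor` helper (a stand-in for whatever model you'd actually call) are assumptions, not a real implementation.

```python
# Illustrative sketch only: a background worker that slowly migrates a codebase
# toward a new pattern. The repo path, the patterns, and the batching cadence
# are all assumptions; llm_refactor is a stand-in for a real model call.
import pathlib
import subprocess
import time

REPO = pathlib.Path("/srv/checkouts/my-service")  # hypothetical local checkout
OLD_PATTERN = "requests.get("                     # the pattern being retired
NEW_PATTERN = "httpx.get("                        # the pattern being adopted
BATCH_SIZE = 5                                    # small batches keep diffs reviewable
SLEEP_SECONDS = 6 * 60 * 60                       # work through a batch a few times a day


def llm_refactor(source: str) -> str:
    """Stand-in for an LLM call; a real agent would prompt a model to rewrite the file."""
    return source.replace(OLD_PATTERN, NEW_PATTERN)


def files_with_debt(repo: pathlib.Path) -> list[pathlib.Path]:
    """Find files that still use the retired pattern: the knowledge debt to pay down."""
    return [p for p in repo.rglob("*.py") if OLD_PATTERN in p.read_text()]


while files_with_debt(REPO):
    for path in files_with_debt(REPO)[:BATCH_SIZE]:
        path.write_text(llm_refactor(path.read_text()))
    # Commit each batch on a branch so a human (or the test suite) can judge it later.
    subprocess.run(["git", "-C", str(REPO), "checkout", "-B", "agent/refactor"], check=True)
    subprocess.run(["git", "-C", str(REPO), "commit", "-am", "agent: migrate one batch off the old pattern"], check=True)
    time.sleep(SLEEP_SECONDS)
```

The design choice that matters here is batching: small, periodic diffs a human can actually review, rather than one monster rewrite landing all at once.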

While it’s natural that interactive features are the starting point, we think that asynchronous work is the natural next step in agentic systems.

While asynchronous work is a nice ideal to paint, the natural question is how you actually operationalize these agents. They haven’t taken all our jobs yet, so presumably there will be humans in the loop at some point. How do you tell the agent what to do (this is easy — prompting) and how do you make sure it did the right thing (much harder)?

This is where software teams are the ideal early adopters for these types of agents. Software workflows have historically had clear change management. Every software engineer is familiar with creating a branch for a new task, opening a pull request when the task is done, and asking for reviews on the work they just finished. This is obviously an easy workflow for LLMs to plug into — Cursor and Devin have already automated this process. But what's valuable about this workflow is that once an agent executes it, a human knows exactly what to look for (a git diff) and how to validate that it meets their expectations.
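That plumbing is short enough to automate end to end. A rough sketch, purely for illustration: the branch name and `apply_agent_changes` are placeholders, and we're assuming git plus an authenticated GitHub CLI are available.

```python
# Sketch: wrap an agent's edits in the same change-management ritual a human follows.
# Assumes git plus an authenticated GitHub CLI (`gh`); apply_agent_changes is hypothetical.
import subprocess


def run(*cmd: str) -> None:
    subprocess.run(cmd, check=True)


def apply_agent_changes() -> None:
    """Placeholder for whatever the agent actually does to the working tree."""
    ...


branch = "agent/example-task"
run("git", "checkout", "-b", branch)
apply_agent_changes()
run("git", "add", "-A")
run("git", "commit", "-m", "agent: proposed changes for review")
run("git", "push", "-u", "origin", branch)
run("gh", "pr", "create", "--fill")  # the human's job is now the familiar one: read the diff
```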

What's more interesting is how you actually build trust in this work. We all know from personal experience that no matter how often Claude Code and Cursor prompt you to review the changes they make, you probably just end up hitting enter or cmd-Y repeatedly to see what the code does. It's often only when you look at the results and wonder what the heck is going on that you go back and review the code changes. Thankfully, for background tasks like refactors and rewrites, testing is a natural sanity check, and (most) software teams have invested at least a little bit in tests. Tests are rarely comprehensive, but combined with code reviews, software workflows tend to be pretty reasonable at avoiding the worst bugs.
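That sanity check is easy to wire into the same flow. A tiny sketch, assuming a pytest-based suite (any test runner with a meaningful exit code would do): the agent's branch only becomes a pull request if the tests are green.

```python
# Sketch: only surface the agent's branch to humans if the tests pass.
# Assumes a pytest-based suite; swap in whatever test runner your team uses.
import subprocess


def tests_pass() -> bool:
    return subprocess.run(["python", "-m", "pytest", "-q"]).returncode == 0


if tests_pass():
    # Open the pull request (as in the workflow above) for human review.
    subprocess.run(["gh", "pr", "create", "--fill"], check=True)
else:
    # Don't spend a reviewer's time on a red build; send it back to the agent.
    print("Tests failed; asking the agent to revise before requesting review.")
```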

With all this in mind, we fully expect that asynchronous background agents are likely to get adopted very early on in software, much in the same way that coding agents were some of the most widely adopted early examples of agentic applications.

Unfortunately, things are significantly more complicated in other areas of knowledge work. Let’s take marketing for example. At RunLLM, we’re just now finishing up an exercise in which we’ve rewritten large parts of our positioning based on what we’ve been hearing from customers recently. It would be amazing if an LLM could go and take a 1-page positioning brief and update our website, our sales collateral, and our internal guides. Except that would be a nightmare.

Our website's on Webflow, our sales collateral is on Google Drive, and our internal docs are in Notion — we're spanning many eras of technology here. None of those tools have great change management processes, and there are plenty of old documents that are no longer used that we wouldn't want to update. Validating that the agent made the right changes in all these places would be incredibly painful — probably much worse than doing the work by hand. And while this might seem like a particularly freeform problem, the same is likely true of many of the tools you use — imagine having an LLM update your PagerDuty thresholds or your sales outreach sequences on Apollo without an easy way to review what it did.

The fundamental challenge here is that most of our daily workflows rely on humans doing the right thing to be successful. When they don't, it's hard to know what happened or to easily revert to a previous state — there's no git checkout for most of your systems!

This is probably the biggest impediment to adopting asynchronous agents for most of the business problems that can be solved with AI. There likely is no silver bullet (most of your team isn't going to want to learn how to use git for marketing copy!), which means there's lots of innovation to be done both on the AI and on the UX. We wish we could tell you we'd figured it out, but we're still early — this is one of the most interesting areas to watch for continued innovation.

What we’re calling knowledge debt here has many manifestations — tribal knowledge, senior talent walking out the door, and cross-team processes that require many layers of approval. We picked an intentionally broad framing because we didn’t want to get caught up in the semantic debates, but whatever your favorite vocabulary is, the idea is the same: With the AI systems that will inevitably be built over the next few years, having teams and systems be out of sync because the right updates weren’t communicated will be a thing of the past.

Of course, that then raises a human problem. If the source of truth changes, but you haven’t updated what’s in your head… what happens? How do you make sure the whole sales team is aware of the latest marketing framing? Peeling back every layer of what’s possible exposes another requirement that we hadn’t thought about before, and the challenges often flip between technical and organizational. There’s always more to do!
