Author: Marco Barisione, Principal Software Engineer at Undo
AI is transforming how we write software, but debugging remains in the dark ages. At Undo, we’re changing that by giving AI access to complete execution history. Our time travel debugging engine records every instruction, variable, and function call, allowing the AI not only to watch what happened but also to understand why.
What is Undo?
If you are unfamiliar with Undo, here is the short version: at the core of our technology is the Undo Engine, which implements record, replay, and time travel debugging. It produces recordings of your program’s execution.
Recordings capture the whole execution history – every line of code executed in every thread, every variable, every I/O operation. All you have to do is rewind and fast-forward through the recording to observe how the state changes and why the code behaves the way it does.
This means you can:
- Get a picture of the code flow by exploring how the code executes dynamically
- See exactly what happened and how it happened
- Understand subtle interactions amongst complex components
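To make this concrete, here is roughly what rewinding through a recording looks like in UDB, our time travel debugger. UDB understands GDB’s reverse-execution commands; the program, values, and output below are purely illustrative:

```
$ udb ./my-server
(udb) break process_request
(udb) run
Breakpoint 1, process_request (req=0x55d2f4a0) at server.c:142
(udb) print req->length
$1 = 4294967295
(udb) reverse-step
(udb) reverse-finish
```

Here `print` reveals a suspicious length at the breakpoint; `reverse-step` then moves back one line, and `reverse-finish` rewinds to just before the current function was called, so you can watch the bad value take shape instead of reconstructing it from logs.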
Recordings are portable, allowing them to be replayed outside of the original environment and shared with your team, or even generated on a customer or production system and then shared with developers.
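As a sketch, sharing a recording might look like the session below. The command names follow our tooling (LiveRecorder records, UDB replays), but treat the exact invocations, file names, and output as illustrative rather than exact:

```
# On the customer or production machine: record the failing run.
$ live-record ./my-server

# Copy the resulting recording file to a development machine.
$ scp customer-box:my-server.undo .

# Replay and debug it there, without the original environment.
$ udb my-server.undo
```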
But can we use execution history, along with AI, either to fix existing bugs or as part of the AI feedback loop when generating new code?
Debugging with AI
You’ve just been assigned a bug. It only happened once on a test machine, or it happens in a customer environment you don’t have access to. The code is unfamiliar, large, and complicated. Maybe you’ve got a few logs and a vague description, and now it’s your problem to solve. This is probably a situation all developers have had to deal with – what are your options?
Nowadays, many developers will probably try using an AI with some logs and the relevant code. It might make a plausible guess, but with complex bugs it will more often fixate on something unrelated, like a line that wasn’t even executed, or give a generic suggestion that doesn’t apply. Without knowing what actually happened during the run, the AI is operating in the dark.
Now, imagine you have an Undo recording and you can use time travel debugging on the failing program run. You can see exactly what happened: which paths the program took, what values changed, and when. You can step back from the failure, inspect state, and understand the code as it actually behaved. This is powerful – and it’s what our customers already rely on – but it still involves a lot of manual work, especially when the code is unfamiliar.
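To see what that looks like in practice: if you reach the failure and find a corrupted value, you can set a watchpoint on its memory location and run backwards to the exact write that corrupted it. In UDB (GDB-compatible syntax; the program and values below are again illustrative) that looks roughly like:

```
(udb) print request->state
$1 = 57005
(udb) watch -location request->state
Hardware watchpoint 2: -location request->state
(udb) reverse-continue
Hardware watchpoint 2: -location request->state

parse_header (buf=0x7ffc3a10, len=512) at parse.c:203
203         memcpy(&hdr.flags, buf + offset, 8);
```

One `reverse-continue` and you are looking at the offending write itself – no print statements, no re-running, no guessing. But someone still has to know to do this, which is where AI can help.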
A possible next step is to integrate Undo with an AI: you give the AI access to the recording and let it help drive the investigation. It can walk through function calls, summarize what the program did, and highlight areas worth looking into. This speeds up exploration, but current models still struggle with complex debugging, probably because, while there is a lot of training data on how to write code, there isn’t much high-quality data on how to debug.
The core problem is that the number of possible program states is astronomically large, but most bugs only show up in one very specific scenario. What the AI needs is guidance. Instead of the AI guessing or manually probing the recording, we feed it structured, targeted insights from the program’s execution history. The model isn’t just driving a debugger – it’s informed by what actually happened and why. There are quite a few open questions about how to achieve this, but we’ve found a handful of new and interesting approaches that look genuinely promising for greatly improving debugging capabilities.
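What might such an insight look like? Purely as a hypothetical sketch – this is not a shipping format, just the flavour of fact that can be derived mechanically from a recording – the model could be handed something like:

```json
{
  "anomaly": "request->state changed from 1 to 57005",
  "when": "99.7% of the way through recorded history",
  "written_by": "parse_header() at parse.c:203",
  "call_path": ["main", "handle_connection", "parse_header"],
  "note": "the write overlaps an adjacent field; offset exceeds the buffer length"
}
```

Facts like these collapse an astronomically large state space down to the handful of events that actually matter for this bug.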
The goal is for the LLM not only to identify when something went wrong in the recording, but also to explain why it happened – and even suggest a fix.
Watch this space!
A less annoying Clippy?
In the meantime, while we work on deeper integrations, is there anything we can do now to gain some practical value from AI?
If you remember Clippy, the animated paperclip from Office 97, you’ll understand why we were cautious about bolting AI onto a debugger. Until recently, the idea felt a bit gimmicky. But newer models like ChatGPT, Claude Opus, and Google’s Gemini have changed that. They’re more capable, more context-aware, and genuinely helpful in the right situations. While this is not yet the “help the AI” future we envision in the long term, we believe that some integration can already be very beneficial to our users.
We’ve been experimenting with a new explain command in UDB, which allows an AI (we currently use Claude Code) to drive the debugger to answer your questions.
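You load a recording and ask a question in plain English, and the AI investigates by driving UDB for you. As a purely hypothetical invocation (the final syntax may differ):

```
(udb) explain why did this program abort with "stack smashing detected"?
```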
In the video, you can see the AI solve a stack smash bug. This is, of course, a very simple example, but the AI integration has already proven useful in practice, especially when dealing with unfamiliar code. It can trace what happened, summarize it, and point you in the right direction. That can save a lot of time, particularly when stepping through complex or legacy code.
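If you haven’t met one before, a stack smash is an out-of-bounds write to a stack buffer, which glibc’s stack protector only detects when the function returns – well after the bad write happened. The snippet below is not the code from the video, just a minimal illustration of the class of bug (assuming the stack protector is enabled, as it is by default on most Linux toolchains):

```c
#include <stdio.h>
#include <string.h>

/* Minimal stack smash: strcpy writes past the end of `name`,
 * corrupting the stack canary. glibc only notices when greet()
 * returns - exactly the gap between cause and symptom that
 * running backwards through a recording closes. */
static void greet(const char *input)
{
    char name[8];
    strcpy(name, input);   /* no bounds check: overflows for long inputs */
    printf("Hello, %s!\n", name);
}

int main(void)
{
    greet("this input is far too long for an eight-byte buffer");
    return 0;              /* aborts: *** stack smashing detected *** */
}
```

The symptom (the abort on return) and the cause (the strcpy) are separated in time, which is exactly where being able to run backwards – and letting the AI do it for you – pays off.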
We’ll be releasing a preview version of this feature as an optional add-on soon. Stay tuned!
Interested in trying Undo on your code? Get a free trial below.