Quick notes on a brief agentic coding experience


2025-06-17 #ai

  1. Perhaps inadvisably from a mental health perspective, I have my RSS reader set up to fetch the top stories from Hacker News and lobste.rs. This means that every morning I get a fresh batch of AI-related blog posts delivered to my feed.
  2. I continue to feel ambivalent about this topic and enjoy reading all but the maximalist takes on either side of the fence. I like to see how some people are trying and failing to benefit from these tools, and how others invest in their setup and get something remarkable out of them. I particularly liked a couple of recent stories: Field Notes From Shipping Real Code With Claude and Agentic Coding Recommendations.
  3. What I like about these is that the authors don’t deny the limitations of coding agents, they don’t pretend that everything magically works out of the box, and they give practical recommendations on how to best use them. I especially appreciate the “never let AI write your tests” rule from the first link. I imagine it’s not the only way to go about it, but it resonates with my own pre-AI experience. Making a big investment in nailing the tests so they become insurance against untrusted code is something I can relate to.
  4. Before this, I had been using LLMs almost exclusively in chat sessions inside Emacs, through gptel. First ChatGPT, then Claude. After reading the linked stories I decided to give Claude Code a try, using my personal feed reader as a playground. I planned to task the LLM with a bunch of little bug fixes, refactors, and features I’d been filing since I started using the app, but hadn’t planned to tackle any time soon. This is the resulting raw, stream-of-consciousness bullet-list brain dump of that process.
  5. I can summarize my experience like this: exhilarating and reckless. (Recklessly exhilarating? Exhilarating recklessness?) Kind of like gambling: addictively fun, but you can lose your house if you don’t pay attention.
  6. Just like with my earlier experiences with LLMs, it’s the interface that I’m most impressed with. In this case, it’s how the tool can interact with shell commands, how it runs little experiments to improve its context, how I can conversationally help it learn new tricks and adapt its workflow to my preferences. Claude Sonnet is just as knowledgeable and dumb and sloppy as in the chat buffer, but the fact that it can access my project, do trial and error on its own, and rely on any tool I give it access to, makes up for a good chunk of its limitations.
  7. I can’t really say that Claude Code made me more productive, in terms of time let alone quality, than if I had tackled the same tasks manually in my editor. But the thing is: this was a side project, not real work, and these were features I had been putting off; it was the fun I was having with the agent that pushed me to finally get them done. What’s more, the part of my brain I needed to engage in the process was completely different: I didn’t need to concentrate nearly as much, even while carefully reviewing all the code, and I wouldn’t get as tired; it felt more like play than work.
  8. I must clarify that I wasn’t vibecoding; I’m not interested in entirely letting go of my codebase like that. I provided instructions and let the agent do its thing, but then I would review and iterate on the changes before merging them. Because I expected to continue using the app, I wanted to be able to still understand and extend the code myself when necessary, and the signal I was getting was that I couldn’t really trust the agent to do the reasonable thing most of the time.
  9. It needs a lot of babysitting and guard-railing, which is about what I expected going in. I could perhaps improve the experience if I were willing to make a bigger investment in the process: carefully curating my CLAUDE.md, customizing the Makefile, tuning my workflow (see the sketch at the end of these notes). More deliberate tool-building, less automagic productivity. Which is fine; it would be a worthwhile, even fun, effort, assuming I kept using this tool.
  10. But the elephant in the room, the showstopper problem, was the amount of money I needed to pour into it to keep going. I worked for a couple of half-day sessions, maybe 4-6 hours in total, and spent about 30 dollars to get a couple of trivial code edits and a half-broken simple feature. I could get better at it, sure, get more efficient, but who knows how much I’d have to spend to get there?
  11. Some people would say this is cheap if you compare it to a typical programmer’s salary rather than to a typical software subscription, a comparison I’m not willing to make. And maybe the monthly subscription is more cost-effective than the pay-as-you-go model; I didn’t really do the math. But I live in the Third World and I’ve learned to be wary of where I stick my credit card numbers. This is not a work tool I’m paying for; it’s a dopamine fix. Like a slot machine, I was pouring money in one end just to see what came out on the other, just to see if I’d get lucky. It was fun, but it also made me sick and felt wrong.
  12. Ultimately, this mode of working felt irresponsible because I witnessed, real-time, how I was being estranged from my own project. This happened before with one of my open-source libraries: after a few years I lost interest in it and let the community take over, limiting myself to reviewing external contributions. Since I didn’t use the library myself, and didn’t care much about it, I would just accept most seemingly working contributions. Eventually, the codebase changed so much that I couldn’t even understand its module structure; it lost any conceptual integrity. I couldn’t update it now without a big re-learning effort. Seeing the LLM agent in action showed me how that same process could be accelerated from several years to a week or two.
  13. There’s more to it. You can always familiarize yourself with a strange project, no matter how badly shaped it is. I have cleared a few haunted forests in my career—progressively understanding and recovering ownership of a project, rewriting it line-by-line without indulging in the lazy start-over—and my experience is: for all the mud that can accumulate over years of careless maintenance, if you know how to look, you can always find traces of intent, you can infer the presence, the needs and constraints of previous maintainers. And you can use that to put parts of the puzzle together, to gain understanding and confidence to support some of your decisions. This wouldn’t be the case, I believe, with LLM-generated code. With LLMs it’s all random text, plausible nonsense, mocked intent. Past a certain point, the surrender would become irreversible—there’s no resurrecting that kind of project death.
  14. If Claude Code were cheaper, I think I could get a good kick out of it for certain low-stakes projects. No work, all play. I don’t think I could bring myself to program like that professionally, though. At least the current state of the art feels incompatible with my duties as a software designer.
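
Since I mention curating a CLAUDE.md above without showing one, here is a minimal sketch of the kind of file I have in mind for the feed reader. CLAUDE.md is free-form project guidance that Claude Code picks up from the project directory; everything below (the make targets, the directory conventions, the rules) is a hypothetical example of such guidance, not what I actually wrote:

```markdown
# CLAUDE.md (hypothetical sketch for a personal feed reader)

## Project
- A personal RSS/feed reader. Side project: optimize for code I can still
  read and extend myself, not for speed of delivery.

## Commands
- `make test` runs the test suite; run it after every change.
- `make lint` must pass before you propose a diff.

## Rules
- Never create or modify tests; they are my insurance against untrusted code.
- Keep diffs small and reviewable: one bug fix or feature at a time.
- Ask before adding a new dependency.
```

The rule about never touching the tests is my restatement of the “never let AI write your tests” advice from Field Notes From Shipping Real Code With Claude, mentioned above.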