I’ve been using Claude Code extensively for personal projects, and similar AI coding tools at work. Recently I came across this excellent blog post that resonated with a lot of my experience.
One part stuck with me though: Noah emphasizes that tools fail with LLMs when they’re “overly complex,” with the Unix philosophy being particularly well-suited for tool calling. But then I thought about git.
Git breaks the Unix philosophy completely. It’s sprawling, stateful, and complex. And yet Claude Code handles it effortlessly. It composes commands that, even after 10+ years of daily git usage, I wouldn’t think to use. It handles rebasing, cherry-picking, complex resets—stuff that trips up experienced developers regularly.
So if simplicity and the Unix philosophy aren’t the whole story, what else matters?
I’ve come up with three “hallmarks” of a good tool for tool calling with LLMs.
1. It’s been around for a long time and/or is used by lots of people
Examples: Unix tools like cat, sed, awk, grep, find—but also git, npm, docker, kubectl. Every Stack Overflow thread, blog post, and tutorial using these tools has likely ended up in the training data. Claude isn’t reasoning from first principles—it’s drawing on millions of examples.
This is why git works despite its complexity: Claude has effectively memorized decades of collective wisdom.
2. It has really good documentation (built-in help or external docs)
Even if a tool isn’t widely used, great documentation can bridge the gap. I’ve been building a finance system on top of Beancount—a double-entry accounting system that’s definitely not mainstream (maybe I’ll write a post about this in the future). Claude Code handles it surprisingly well because Beancount has exceptional documentation. When I point Claude at the docs, it can figure out the directive syntax, transaction formats, and account structures without necessarily having seen millions of examples in its training data.
Good --help text matters. Clear external documentation matters. If Claude can discover how your tool works, it can use it effectively.
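To make that concrete, here is a minimal sketch of discoverable built-in help, using Python's standard argparse module. The tool name ledgerctl, its balance subcommand, and its flags are all hypothetical, invented for illustration; they are not part of Beancount or any real CLI. The idea is that every argument carries a one-line description and the help output ends with copy-pasteable examples.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build the parser for a hypothetical 'ledgerctl' tool (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="ledgerctl",
        description="Query balances in a plain-text ledger file.",
        # Concrete invocations in the epilog give a reader (human or LLM)
        # something to pattern-match against.
        epilog=(
            "examples:\n"
            "  ledgerctl balance --account Assets:Checking\n"
            "  ledgerctl balance --account Expenses:Food --year 2024\n"
        ),
        # Keep the epilog's line breaks instead of re-wrapping them.
        formatter_class=argparse.RawDescriptionHelpFormatter,
    )
    sub = parser.add_subparsers(dest="command", required=True)

    balance = sub.add_parser("balance", help="print the balance of one account")
    balance.add_argument(
        "--account",
        required=True,
        metavar="NAME",
        help="account name, e.g. Assets:Checking",
    )
    balance.add_argument(
        "--year",
        type=int,
        metavar="YYYY",
        help="restrict to a single calendar year (default: all years)",
    )
    return parser


if __name__ == "__main__":
    # Print the top-level help text, as 'ledgerctl --help' would.
    print(build_parser().format_help())
```

Running the script prints the same text a user would get from --help, which is also what an agent sees when it probes an unfamiliar tool.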
3. It has good error messages when something is wrong
Error messages, especially those with suggestions like “you used X, did you mean Y?”, can be tremendously helpful for LLMs. The best example is the Rust compiler: it gives errors like “you typed foobar, did you mean foobaz?” and Claude Code can actually use that feedback to correct itself.
This might be one reason why people feel Claude Code is particularly good at Rust programming—the compiler is essentially coaching it through mistakes in real time.
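You can get a rough version of this in your own tools with very little code. The sketch below uses difflib.get_close_matches from Python's standard library to attach a "did you mean" hint to an unknown-command error. The command list and the message wording are assumptions made up for this example, and this is not how the Rust compiler produces its suggestions; it only illustrates the same kind of feedback.

```python
import difflib

# Hypothetical command names for an imaginary CLI (illustrative only).
KNOWN_COMMANDS = ["balance", "import", "report", "validate"]


def unknown_command_message(typed: str, known: list[str] = KNOWN_COMMANDS) -> str:
    """Return an error message that suggests the closest known command, if any."""
    # get_close_matches returns the best candidates whose similarity ratio
    # is at least `cutoff`; n=1 keeps only the single best match.
    matches = difflib.get_close_matches(typed, known, n=1, cutoff=0.6)
    message = f"error: unknown command '{typed}'"
    if matches:
        message += f"; did you mean '{matches[0]}'?"
    else:
        # No near match: list the valid options so the caller can recover.
        message += f"; valid commands are: {', '.join(known)}"
    return message


if __name__ == "__main__":
    print(unknown_command_message("balanse"))
    print(unknown_command_message("xyz"))
```

Either branch gives the caller a next step: a specific correction when there is a near match, or the full list of valid commands when there isn't.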
None of this is to say the Unix philosophy is wrong. Rather, Unix tools work well with Claude Code for reasons other than simplicity. Tools like grep and cat nail hallmarks 1 and 2: they’ve been around for decades (massive training data) and have extensive man pages (great documentation). The fact that they follow “do one thing well” is almost incidental to their success with LLMs.
So if I’m building a tool today, I can’t make it instantly popular, but I can make it understandable. That means good documentation, clear error messages, and a few solid examples of how it’s used. Those aren’t new ideas — they’ve always mattered. It’s just that, with LLMs in the mix, the payoff for doing them well is suddenly much higher.