Giving LLMs access to tools (which turns them into ✨agents✨) is an incredibly powerful way to give LLMs capabilities that go beyond generating text. But it’s important to think clearly about the costs and limitations of tool calling, and in particular, people should understand that calling a tool is many orders of magnitude more costly than calling a plain old function from code. There is and probably always will be a limit on how many tool calls an agent can effectively make, and people should design their agentic systems accordingly.
For this to make sense, you have to consider what a tool call is “under the hood.” LLMs are typically used as very fancy text generation machines. And the way they do tool calls is by generating text, although that’s typically abstracted away from us.
Let’s say you have an agent with one tool, add, for adding 2 numbers together. A user asks the agent a question that’s easy to answer with the add tool:
What’s 15 + 27?
To actually call the add tool, the model generates a message like this (simplified):
At this point the model stops generating tokens. The thing that’s driving the model (the agentic loop?) parses that message, passes those arguments to some function like add(15, 27), and then puts the output of that into chat history as a new message:
Inference resumes, and the LLM now has everything it needs to tell the user that the answer is 42. This works! It’s the foundation of some really incredible software systems! But it wasn’t free:
- The model had to generate a bunch of tokens.
- We used up precious context window for the 2 messages.
If you’re adding 2 numbers once, it probably doesn’t matter. If you’re summing up 1,000 numbers… you’re going to be waiting a very long time for those 999 tool calls to finish, and you might blow through your entire context window.
This might seem like an academic point, but calling a function many times in a loop is one of the most common ways to solve a problem with code. To give a contrived example, say we have 100 user IDs and we want to count the users whose name starts with ‘R’:
- A programmer with a get_user_info(id) function can write+run a simple for loop
- An agent with a get_user_info(id) tool can try to make 100 tool calls, but it will probably run out of context window long before it finishes
- Remember, the entire result of every tool call ends up in the context window
Designing agentic tools that are flexible enough for every use case (or even most use cases) is hard, and I don’t think enough people are talking about that.
As always, it depends. Maybe your agent is solving problems where it will never need to make large numbers of tool calls. Maybe you’re clever and you can design your tools to be very flexible+powerful. Maybe you can sidestep this problem by letting your agent write+run code (keeping in mind all of the necessary security precautions).