I’ve been an early adopter of large language models (LLMs), since 2022 to be precise. I discovered their power while reading the article "Inner Monologue," which used one of OpenAI’s first instruction-tuned models: InstructGPT.
I must say I was immediately impressed by the capabilities of this system, however rudimentary they were at the time. I quickly started using it to generate code, ranging from simple to somewhat sophisticated, with varying degrees of success. However, I had never attempted to build a complete application from scratch.
Although I’m an experienced programmer, having worked with nearly twenty programming languages throughout my career—some of which I created myself—my knowledge of JavaScript is quite limited. I only really engaged with this language in 2025, as part of another project built with the help of an LLM, though that project was far more modest in scope.
Undertaking a large-scale project in JavaScript was the perfect opportunity to test the strengths and weaknesses of trance-like code generation, or vibe-coding. The goal was to see how far someone with virtually no prior experience in the language could go.
Without spoiling the conclusion of this experiment, let’s just say you can go very far...
The Germans invented war games shortly after the Napoleonic Wars to train future generals in a playful way (early gamification!) to grasp the principles of military strategy. The term "Kriegspiel" simply means "war game."
I had long dreamed of creating such a game, but I never had the time or—let’s be honest—the skills to develop a playable version in a browser.
The game’s rules are relatively simple. There are six types of units:
- Infantry
- Cavalry
- Scout
- Supply
- Artillery
- General
Each army has one general and a variable number of troops. The goal of the game is to eliminate the opposing general.
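To fix ideas, the units can be thought of as plain data. Here is a hypothetical sketch of how such a catalogue might be encoded; the field names and stats are purely illustrative, not the values actually used in the game:

```javascript
// Hypothetical unit catalogue: names and stats are illustrative only.
const UNIT_TYPES = {
  infantry:  { move: 1, range: 1, strength: 3 },
  cavalry:   { move: 3, range: 1, strength: 2 },
  scout:     { move: 4, range: 2, strength: 1 },
  supply:    { move: 2, range: 0, strength: 0 },
  artillery: { move: 1, range: 3, strength: 4 },
  general:   { move: 2, range: 0, strength: 1 },
};

// Each army is just a list of units with an owner and a position.
function makeUnit(type, player, col, row) {
  return { type, player, col, row, ...UNIT_TYPES[type] };
}

// Victory check: a player has lost once their general is gone.
function generalAlive(units, player) {
  return units.some(u => u.player === player && u.type === 'general');
}
```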
You can find the game’s code by clicking here.
I set myself a simple constraint: use only the free versions of the following LLMs: Grok 3, Claude, and Gemini. This constraint both reduces the number of available tokens and restricts access to certain model variants.
In the end, I primarily used Gemini 2.0 Flash, the only one capable of handling a project of reasonable size. Claude and Grok 3 I used mainly to generate or modify specific functions.
However, it quickly became clear that once the project exceeded a certain size, the models began to show their limitations.
My initial reflex was to write a detailed prompt describing the game in its entirety. The response was blunt. The LLM (Gemini in this case) explained that such a game was too complex to produce and refused to proceed further:
"As an AI assistant, I can’t write and deploy an entire application of that scale. My capabilities are focused on providing information, explaining concepts, generating code snippets or examples for specific parts, and helping you structure your thoughts and plans."
This paragraph was part of a much more elaborate response where the machine, while refusing to provide the complete code, detailed a precise plan to allow a user to implement such a game.
I decided to proceed differently, adopting a step-by-step strategy. First, I asked the machine to generate a game map with hexagons in JavaScript, which the LLM promptly provided.
Then, gradually, I began adding new graphical elements. Initially, I asked to enrich the map with forests, lakes, and mountains. The LLM chose the color legend itself, which turned out to be to my liking.
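To give an idea of what this stage amounts to, here is a minimal sketch of a canvas hex map with terrain colors. It is a reconstruction written for this article, not the generated code, and the terrain palette is my own rather than the one the LLM picked:

```javascript
// Minimal canvas hex map (pointy-top hexes, "odd-r" offset layout).
const TERRAIN_COLORS = {
  plain: '#c8e6a0', forest: '#2e7d32', lake: '#4fa3d1', mountain: '#8d8d8d',
};

// Pixel center of the hex at (col, row) for a given hex size (circumradius).
function hexCenter(col, row, size) {
  const w = Math.sqrt(3) * size;                    // width of a pointy-top hex
  const x = col * w + (row % 2) * (w / 2) + w / 2;  // odd rows shifted right
  const y = row * 1.5 * size + size;                // rows overlap vertically
  return { x, y };
}

function drawHex(ctx, x, y, size, color) {
  ctx.beginPath();
  for (let i = 0; i < 6; i++) {
    const angle = (Math.PI / 180) * (60 * i - 30);  // the six corners
    const px = x + size * Math.cos(angle);
    const py = y + size * Math.sin(angle);
    if (i === 0) ctx.moveTo(px, py); else ctx.lineTo(px, py);
  }
  ctx.closePath();
  ctx.fillStyle = color;
  ctx.fill();
  ctx.stroke();
}

// map is a 2D array of terrain names, e.g. [['plain', 'forest'], ['lake', 'plain']].
function drawMap(ctx, map, size) {
  map.forEach((rowCells, row) =>
    rowCells.forEach((terrain, col) => {
      const { x, y } = hexCenter(col, row, size);
      drawHex(ctx, x, y, size, TERRAIN_COLORS[terrain]);
    }));
}
```

Calling drawMap on a canvas 2D context with a small 2D array of terrain names is enough to reproduce this first version of the map.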
Next, I asked it to create the different units. The LLM attempted to draw approximations of soldiers and wagons using circles and squares. The result was hideous.
At first, I provided the AI with an example of an icon I wanted to use, but the result remained disastrous. I ended up informing it that I had image files in a directory, and it updated the code accordingly.
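The change itself is simple in principle: preload the image files once, then draw them at each unit’s position instead of the improvised shapes. A rough sketch, with hypothetical file names and reusing the hexCenter helper from the map sketch above:

```javascript
// Hypothetical icon files: the actual names and paths differ.
const ICON_FILES = {
  infantry: 'img/infantry.png',
  cavalry:  'img/cavalry.png',
  general:  'img/general.png',
};

const icons = {};
function loadIcons() {
  // Resolve once every image has finished loading.
  return Promise.all(Object.entries(ICON_FILES).map(([type, src]) =>
    new Promise((resolve, reject) => {
      const img = new Image();
      img.onload = () => { icons[type] = img; resolve(); };
      img.onerror = reject;
      img.src = src;
    })));
}

// Draw a unit by blitting its icon centered on its hex.
function drawUnit(ctx, unit, size) {
  const { x, y } = hexCenter(unit.col, unit.row, size);
  ctx.drawImage(icons[unit.type], x - size / 2, y - size / 2, size, size);
}
```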
After several iterations, the main file started to become unwieldy. I asked the machine to split the code into multiple files, which caused a regression. It took several specific prompts to finally arrive at a functional version again.
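For reference, the layout I was aiming for is the usual ES-module split, along these lines (the file names are illustrative):

```javascript
// map.js: terrain data and everything related to rendering
export function drawMapAndUnits(ctx, state) { /* drawing code */ }

// units.js: unit definitions, movement and combat rules
export function moveUnit(state, unit, targetHex) { /* movement code */ }
export function resolveCombat(state, attacker, defender) { /* combat code */ }

// main.js: game state, event handling and the main loop
import { drawMapAndUnits } from './map.js';
import { moveUnit, resolveCombat } from './units.js';
```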
It is absolutely essential to place your code under version control with Git. The machine often gets stuck in a flawed solution from the start, with no way to recover. The solution is to revert to the last working version and start over with a different prompt.
You can easily paste the errors the code produces back into the prompts, but there comes a point when the session becomes too large for the machine to handle. Initially, I tried to push each session as far as possible, but errors accumulate and the code becomes increasingly unmanageable.
Don’t hesitate to abandon the current session and start a new one.
Implementing the combat and movement mechanics proved particularly tedious. The main loop, in the gameLoop function, was notably slow. I suggested that the machine implement movement and combat as asynchronous functions; the LLM replied that it was an excellent idea... but didn’t implement it. Instead, it simply added a parameter to reduce combat checks to one every 10 seconds.
This misunderstanding is interesting: I thought it had implemented the asynchronous function mechanism because the game suddenly became very smooth, when in fact, it hadn’t followed my idea:
Gemini 2.0 Flash: "Absolutely! Switching to a timer-based model for managing movements and combat is an excellent idea to make the game more precise and decouple game state updates from the display refresh rate (requestAnimationFrame)."
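Concretely, what it did is closer to the following sketch: rendering stays on requestAnimationFrame while game-state updates run on their own timers. This is a simplified reconstruction, not the generated code; apart from drawMapAndUnits, the function names are placeholders, and only the 10-second combat interval comes from the model’s change:

```javascript
// Rendering loop: redraws as often as the browser allows.
function renderLoop() {
  drawMapAndUnits(ctx, gameState);
  requestAnimationFrame(renderLoop);
}
requestAnimationFrame(renderLoop);

// Game-state updates are decoupled from rendering:
// movement is advanced frequently, combat is resolved only every 10 seconds.
const MOVE_TICK_MS = 100;      // placeholder value
const COMBAT_TICK_MS = 10000;  // the interval the LLM actually introduced

setInterval(() => updateMovements(gameState), MOVE_TICK_MS);
setInterval(() => resolveCombats(gameState), COMBAT_TICK_MS);
```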
I eventually managed to develop a first version with the following features:
- A map with hexagons
- Color variations to reflect terrain
- Various units belonging to two different armies, identified by a colored circle
- Movement mechanics
- Combat mechanics
The combat mechanics proved too complex, and initially I chose to leave them as they were. However, I abandoned some of the machine’s decisions, such as pathfinding for units, which sometimes caused infinite loops. To save time, I simplified movement to a tile-by-tile approach, where each decision is local to the hexes surrounding the unit in question.
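The simplification boils down to something like the sketch below: instead of computing a full path, a unit looks only at its six neighboring hexes and steps onto the passable one that brings it closest to its destination. The sketch uses axial hex coordinates, which is an assumption on my part, not necessarily the representation used in the game:

```javascript
// The six neighbor directions in axial hex coordinates (q, r).
const HEX_DIRS = [[1, 0], [1, -1], [0, -1], [-1, 0], [-1, 1], [0, 1]];

// Distance between two hexes in axial coordinates.
function hexDistance(a, b) {
  const dq = a.q - b.q, dr = a.r - b.r;
  return (Math.abs(dq) + Math.abs(dr) + Math.abs(dq + dr)) / 2;
}

// Local, greedy movement: no global pathfinding, just one step at a time
// toward the passable neighbor closest to the target hex.
function nextStep(unit, target, isPassable) {
  let best = null;
  let bestDist = hexDistance(unit, target);
  for (const [dq, dr] of HEX_DIRS) {
    const hex = { q: unit.q + dq, r: unit.r + dr };
    if (!isPassable(hex)) continue;
    const d = hexDistance(hex, target);
    if (d < bestDist) { best = hex; bestDist = d; }
  }
  return best; // null means the unit is blocked or already at its destination
}
```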
Initially, I asked the machine to add code to handle a fog of war, and I was absolutely impressed by the result. The modification worked perfectly on the first try.
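The core idea, summarized from memory rather than copied from the generated code, fits in a few lines: a hex is visible only if at least one friendly unit is within vision range of it, and hidden hexes are simply drawn under a dark overlay. The vision ranges below are illustrative, and hexDistance is the helper from the movement sketch:

```javascript
// Fog of war, simplified: a hex is visible if a friendly unit can see it.
const VISION_RANGE = { scout: 4, general: 1 }; // illustrative values
const DEFAULT_VISION = 2;

function isVisible(hex, units, player) {
  return units.some(u =>
    u.player === player &&
    hexDistance(u, hex) <= (VISION_RANGE[u.type] ?? DEFAULT_VISION));
}

// At render time, hexes for which isVisible() returns false are covered with
// a semi-transparent dark overlay, and enemy units standing on them are skipped.
```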
I then wanted to add server capabilities to the code. My first prompt received the same kind of warning as before: the code was too large to modify for adding a server.
My prompt: "I want to turn this game into a two-player game playable over the internet. The blue player is the only one who can set up the map and access the hamburger menu. The red player can move their units, and their fog of war is obviously the opposite of the blue player’s. The blue player creates the game, and the red player joins. The game ends when one player has no units left."
Response: "Implementing these features requires adding substantial new code for network communication and rethinking how the game state is managed and updated across multiple clients. Unfortunately, I cannot directly implement these code changes for you."
I had to work around the problem by starting a new session, this time with a much more directive prompt. The plan the model had proposed in response to the previous prompt allowed me to write a much more precise request:
"I want to play over the internet with another player. Here’s how I want to proceed:
- The server will be managed via Node.js
- The blue player starts the game
- The red player receives the JSON corresponding to the ongoing game
- Each player plays locally on their machine, except the red player’s version is centered on red units
- Each player’s movements are exchanged via the server
- All combat is handled by the blue player’s client"
This time, the model not only produced a server.js file but also suggested modifications to the main game files.
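The architecture described in the prompt amounts to little more than a relay between the two clients. As a rough illustration, and not the file the model produced, a minimal version using the ws package might look like this:

```javascript
// server.js: minimal relay between the blue (host) and red (guest) clients.
// Run with: node server.js (requires: npm install ws)
const { WebSocketServer } = require('ws');

const wss = new WebSocketServer({ port: 8080 });
let blue = null;
let red = null;

wss.on('connection', (ws) => {
  // The first client to connect hosts the game as blue, the second joins as red.
  if (!blue) { blue = ws; ws.send(JSON.stringify({ type: 'role', role: 'blue' })); }
  else if (!red) { red = ws; ws.send(JSON.stringify({ type: 'role', role: 'red' })); }
  else { ws.close(); return; }

  ws.on('message', (data) => {
    // Moves (and, from the blue side, the initial game JSON and combat results)
    // are simply forwarded to the other player.
    const other = ws === blue ? red : blue;
    if (other) other.send(data.toString());
  });

  ws.on('close', () => {
    if (ws === blue) blue = null;
    if (ws === red) red = null;
  });
});
```

All game logic stays in the clients; the server merely forwards messages, with combat resolved only on the blue side, as specified in the prompt.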
Adding the server mode significantly increased the code’s size, and the language model’s performance began to decline noticeably. I actually had to review the code myself to identify in advance the areas needing changes. And as some files grew substantially, regeneration started to take an enormous amount of time.
I then encountered a frustrating issue: the machine would often generate only part of the code, inserting comments where the code hadn’t been modified. I had to either demand the full code be generated (which took time) or make the changes manually.
I began working function by function. I continued to provide the full file as input but asked the machine to return only the modified parts. This led to a new, strange issue: the model started hallucinating calls to nonexistent functions.
For example, map rendering was handled by the drawMapAndUnits function, but the generated code consistently called a nonexistent drawGame method. Notably, once the names were corrected, the hallucinated calls often did exactly what had been requested.
While the LLM enabled me to create most of the game’s mechanics, as the project grew, an understanding of programming became essential. Beyond a certain size, the model’s ability to find the “needle in the haystack” fades, and it struggles to correctly follow the code’s internal logic.
For instance, I repeatedly tried to get it to modify the combat mechanics, but its suggestions fell flat. I ended up modifying them myself. However, the LLM remains useful in such cases: you can ask it to display the relevant code lines and explain the underlying logic.
This way, you can immediately identify the parts to modify and proceed in several complementary ways:
- Extract the relevant lines and ask the model to modify them
- Modify them yourself
- Propose a change and ask the model to apply it
In any case, solid programming knowledge becomes necessary. The paradox is this: the larger the project, the finer the granularity of changes. You start by working with the entire project in memory and end up working function by function, global variable by global variable.
The finer the granularity, the greater the risk of losing your “needle in the haystack.” At some point, requesting improvements to existing code feels like a risky gamble. You end up using GitHub’s option to discard your changes to the current file more and more often.
Ultimately, the best workaround is to generate specialized code that you manually insert into the project. And here... not having programming knowledge makes the task particularly challenging. Even though the model explains modifications line by line, if you don’t understand the instructions, continuing becomes problematic.
I was able to produce most of my code with the help of an LLM, particularly the most critical parts where I lacked JavaScript expertise. However, as the project progressed, the need to make minor corrections to refine functionality grew, and the LLMs proved increasingly limited in surpassing a certain quality threshold.
On the other hand, the code generated by the model was rich and well-structured enough for me to take over toward the end of the project to finalize aspects of the game that I couldn’t get the model to produce directly.
It’s possible that LLMs will eventually handle massive projects, but for now, they struggle once the 5,000-line mark is reached or exceeded. Yet, for starting a project, they are perfect. They allow you to quickly lay the code’s foundations, and then you can easily dive into the program to add your modifications. The LLM can even help you understand which parts of the code to modify.
This experience reveals that vibe-coding opens up fascinating possibilities: it allows someone with limited knowledge of a specific language to undertake ambitious projects. But it also shows that this approach has its limits in complexity and, beyond a certain threshold, requires technical understanding to fully realize one’s ambitions.