How I bootstrapped a platform with a team of LLMs


Over the past two and a half years, I’ve been building a cross-platform marketplace with a team of LLMs to disrupt a specific industry. I’ve learned a lot from working day to day with these models, each of which has a unique skillset and technical capabilities that I orchestrate to build relatively cleanly and quickly.

They are my interns and junior colleagues. They are overconfident in their abilities and need careful management. Some handle large context windows better (digesting vast amounts of information), while others are stronger at solving the actual problem. None are skilled at system architecture or project management, so my day-to-day job is the careful orchestration of sliced-up, narrowly defined tasks for each LLM, optimized to shorten the time to a successful solution and keep us on schedule.

I was first inspired to quit my comfy, albeit stressful, tech job as a principal engineer / team lead when ChatGPT (then running GPT-3.5) was released in November 2022. This tipping point in generative AI made it possible to build a platform my co-founder and I had been planning for several years.

We chose to bootstrap with our own savings rather than raise outside funding. As much as I wanted to hire developers, it was simply not affordable for us when trying to keep burn rate super low.

I love working with teams of real people and growing their talent, but instead I found myself figuring out how to work with a digital intern that costs $20 USD per month. I started building in July 2023, and after trial and error and head-bashing against rate limits, I found a cadence with 3.5, and even more so with 4.0 and beyond. Eventually, I incorporated more models into my workflow, such as Gemini and Qwen. As LLMs evolved, these interns turned into junior engineers, and some models are now even mid-level.

Acting as the single engineer and architect orchestrating a team of LLMs, I have evolved and optimized the workflow over time, and I imagine it will continue to change. I draw on all of my past experience growing and managing local and remote teams of engineers across various roles to chunk out clear requirements for each LLM, while I control the vision, architecture, and interoperability.

I’m not doing anything fancy with IDE tools like Cursor or GitHub Copilot, aside from autocomplete. I’ve tried all of them, but I don’t like the level of trust I need to hand over. Past a certain level of code complexity and depth, none of these agents can be trusted to run hands-off.

I assign different pieces of code and requirements to LLMs of varying “skill levels,” based on their likelihood of producing a correct answer in a single pass, which maximizes time savings.

I use a combination of various ChatGPT, Gemini, and Qwen models depending on context window size, level of difficulty, etc., as each LLM is better at certain tasks. This is equivalent to assigning specific areas of work to engineers based on their individual strengths.
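The routing idea above can be sketched in a few lines. This is a minimal, hypothetical illustration: the model names, thresholds, and `route_task` helper are my stand-ins, not the author's actual configuration.

```python
def route_task(difficulty: str, context_tokens: int) -> str:
    """Pick an illustrative model for a sliced-up, narrowly defined task.

    The thresholds and model labels here are assumptions for the sketch.
    """
    if context_tokens > 200_000:
        # Inputs too large for most models go to a long-context model first.
        return "long-context-model"
    if difficulty == "hard":
        # Nasty bugs and tricky logic go to the strongest reasoning model.
        return "frontier-thinking-model"
    # Well-specified, routine chunks go to a cheaper general-purpose model.
    return "general-purpose-model"
```

For example, `route_task("hard", 10_000)` would send a tricky but compact bug to the strongest reasoner, while a 500k-token log dump would be routed to the long-context model regardless of difficulty.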

For nasty bugs that require a high degree of specialized skill to resolve, but involve more context than ChatGPT 5 (Extended Thinking) – or whatever the latest model I’m using is – can handle, I use Gemini 2.5 Pro’s massive 1-million-token context window to distill vast amounts of telemetry logs or code into a condensed description of the observed issue. I combine that with code snippets and logs and let the frontier model resolve the issue, prompting it to ask for more snippets, logs, or telemetry if it needs them.
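The two-stage pipeline can be sketched as a pair of model calls. Everything here is an assumption for illustration: `call_model` is a placeholder for whatever provider API you use, and the model labels are generic stand-ins.

```python
def call_model(model: str, prompt: str) -> str:
    # Placeholder: in practice this would call the provider's API client.
    return f"[{model} response to {len(prompt)} chars of prompt]"

def resolve_bug(telemetry: str, snippets: str) -> str:
    # Stage 1: use a long-context model to compress vast logs into a
    # condensed description of the observed failure.
    condensed = call_model(
        "long-context-model",
        "Summarize the observed failure in these logs:\n" + telemetry,
    )
    # Stage 2: hand the condensed issue plus relevant code to a stronger
    # reasoning model, inviting it to request more material if needed.
    return call_model(
        "frontier-thinking-model",
        f"Issue summary:\n{condensed}\n\nRelevant code:\n{snippets}\n"
        "Propose a fix. Ask for more snippets, logs, or telemetry if needed.",
    )
```

The key design point is that only the distilled summary crosses into the second model's context, so the reasoning model spends its window on the problem rather than on raw logs.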

I then assemble the generated code into the IDE by copying from each LLM’s chat window, performing quick reviews that serve as final spot checks. At this point, I’ve developed an intuition for spotting flawed or overconfident LLM logic, memory entropy, and other LLM bugs before integration.

Sometimes, I ask an LLM to outline its solution without writing code so I can review its reasoning (even when using integrated thinking models), which helps with quality control and increases the likelihood that the next pass is accepted.

I’ve recently discovered that if I have an LLM add the telemetry logs it needs for debugging, it yields faster solutions to complex issues. Instead of regaining context across every part of a bug to debug it myself, I let the LLM auto-iterate on the solution. This only works sometimes, though, and there are still instances where handwritten code and fully manual debugging are required.
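That auto-iteration can be pictured as a simple loop: the LLM proposes telemetry plus a fix, you run it, and the captured logs feed the next round. The helpers below (`ask_llm`, `run_with_patch`) are hypothetical stubs standing in for a model call and a build-and-run step.

```python
def ask_llm(prompt: str) -> str:
    return "proposed patch"        # stand-in for a real model call

def run_with_patch(patch: str) -> tuple[bool, str]:
    return False, "captured logs"  # stand-in: apply patch, run, collect logs

def debug_loop(bug_report: str, max_rounds: int = 3) -> bool:
    logs = ""
    for _ in range(max_rounds):
        patch = ask_llm(
            f"Bug: {bug_report}\nLogs so far:\n{logs}\n"
            "Add the telemetry you need, then propose a fix."
        )
        fixed, logs = run_with_patch(patch)
        if fixed:
            return True
    return False  # cap the rounds; fall back to fully manual debugging
```

Bounding the rounds matters: when the loop doesn't converge, that's the signal to drop down into handwritten code and manual debugging, as the post notes.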

The complexity of building across iOS, Android, web, backend, and DevOps means it takes too long for me to context-switch into each deeply specialized domain. Therefore, if I can help it, I try to steer above the baseline platform complexity and only step down into the code when I hit a snag or need to investigate or verify something deeper down.

I don’t believe in the idea of vibe coding if you’re trying to build something novel or technically challenging. It’s simply not possible to develop a cohesive system without being actively at the helm, steering the production of platform architecture.

In all, this has helped me as a single engineer / architect to build a massive, feature-rich platform in a fraction of the time it would take doing it solo or even with in-house or outsourced developers. I’m able to quickly shift context between parts of the system (animation; row-level security, where I’m extra paranoid about LLM contributions and triple-check their work; etc.) without the typical two-day mental turnover I’ve needed in the past when switching between extremely different mental models.

Like any good manager or senior colleague, I must guide their hand and curate their work. The greatest challenge for humans is to resist the temptation to let AI steer and control our thinking, given our minds’ natural tendency to relax when the pressure eases. The smarter and more confident AI becomes, the more critical it is for us to remain engaged and keep our hands on the wheel.

If one person can build a platform, this does not mean that developers’ jobs will go away. Like an architect drafting out a building with the stroke of a pen, there will be whole departments of engineers with their own pens building even more complexity on top of this new baseline.
