Why over-engineering happens

If you’ve worked in software long enough, you’ve probably seen it: a CRUD app serving a handful of users, deployed on a Kubernetes cluster with half the CNCF landscape stitched together for good measure. On paper it looks impressive. In reality, it’s a Rube Goldberg machine solving problems the team doesn’t actually have.

Contrast that with Levels.fyi. The site now helps millions of engineers compare salaries and career ladders, but when it started, the “backend” was just Google Forms feeding into a Google Sheet. No microservices, no Kubernetes, no event bus. Just the simplest tools they could get their hands on. That lightweight setup gave them speed. They validated the idea, grew an audience, and only invested in more complex systems once the product had proven itself. In other words: simplicity didn’t hold them back; it made their success possible. It’s also worth remembering that some of today’s most complex infrastructures started out very simple: Airbnb, Facebook, and Reddit, to name a few. They were scrappy monoliths before they became household names.

That’s the main character syndrome we all keep catching. Somewhere along the way, architecture became less about solving today’s problems and more about defending against imaginary futures. Every new project seems to begin with the full playbook of distributed systems, regardless of whether the app will ever have the scale to need them. Obviously, teams end up drowning in complexity, ballooning cloud bills, and crawling delivery speed.

When you split the monolith into 24 services…
…and now every bug is a distributed systems lecture.

I’m not against thoughtful, well-designed architecture. But there has to be a call to sanity. The best systems I’ve seen weren’t the most complex. They were almost always the most appropriate ones. They matched the scale of the problem, grew with the business, and left room for engineers to breathe.

One of my professors once gave me advice that stuck: “You want people to hate you for the simplicity of your idea. They could have done it too. But you were the one who executed it well.” This goes beyond academia. Levels.fyi is living proof. A site that millions of engineers rely on today didn’t begin with a massive distributed system.

This is where the conversation about over-engineering really begins. Because the problem isn’t architecture itself. It’s architecture applied without context. It’s when we forget that software should first and foremost serve a purpose: solving real problems for real people.

In this post, I’ll dig into why over-engineering happens, the real costs it creates, and the principles we can use to keep architecture grounded. By the end, I’ll circle back to what simplicity actually looks like in practice and why it’s harder, braver, and more impactful than chasing complexity.

What Do We Mean by Over-Engineering?

You probably know yourself that over-engineering is rarely about writing too much code. It’s about designing a system with more moving parts, abstractions, and buzzwords than the actual problem requires. It’s like preparing for a tsunami when all you’ll really get is a big wave. Tsunami preparation happens on a completely different scale, and the same goes for computer systems.

At its core, good architecture is about fit. It gives you flexibility to change direction, maintainability so future engineers can work with it, and scalability so the system can grow with demand. Over-architecture, on the other hand, is when the system becomes a showcase of complexity for its own sake. It’s when design decisions are driven more by what looks impressive on a resume or in a tech talk than by what the business actually needs.

You’ve probably seen it in the wild:

  • A startup with 24 microservices serving a few dozen users.
  • An internal dashboard with an event bus because someone read Kafka case studies at LinkedIn.
  • A CRUD app running on Kubernetes when a simple VM or even serverless function would have been enough.
  • Teams practicing resume-driven development, picking tech not because it’s the right tool but because it “sounds good” in an interview later.

None of these patterns are inherently bad. Event buses, microservices, orchestration platforms: they all solve real problems at scale. But they become wasteful, and I’d argue often harmful, when applied prematurely. The irony is that over-engineering rarely makes systems stronger; it often makes them more fragile. Each extra service, abstraction, or layer is another place for bugs to hide, for costs to creep, and for developers to lose hours debugging. I’ve done it. I know it well. Facepalm! And I’ve seen it numerous times!

Over-engineering, then, is not the presence of modern tools. It’s the misalignment between architecture and actual need. It’s designing for scale you don’t have, complexity you don’t need, and futures you can’t predict.

Why Does Over-Engineering Happen?

If over-engineering is so obviously painful, why do we keep doing it? The truth is, it’s rarely malice or ignorance. Most of the time, it comes from good intentions gone sideways, incentives misaligned, or plain old human nature.

Here are some of the biggest drivers I’ve seen:

Premature optimization

We like to feel prepared, engineers especially. We like to plan for worst-case scenarios. Nothing is wrong with that, except that the worst case rarely happens. It’s comforting to imagine millions of users on day one and to design for that future. But most products never get there. In the meantime, you’re stuck maintaining an architecture sized for an audience that doesn’t exist. It’s better to throw money at a scaling problem when you actually have the money than to pay for it up front when you don’t.

Resume-driven development

Engineers are ambitious. They want to grow, get promoted, and stay marketable. Nothing is wrong with that. But learning Kubernetes or spinning up microservices looks better on a CV than quietly delivering a reliable monolith. The problem starts when the resume takes priority over the product. In that case, the technologies don’t really help, because the engineer never understands why they need Kubernetes in the first place. If the requirements lead them there, they gain a deep understanding of why it becomes a necessary evil. In contrast, engineers who have built and scaled monoliths often perform better in interviews because they truly understand the problem.

Management incentives

Leadership often rewards scale and complexity. It’s easier to sell “we built a distributed architecture with service meshes” than “we delivered a simple system that works.” Promotions and recognition follow the former, even when the latter brings more value. It varies from one organization to another, of course, but the sentiment rings true across much of the industry.

FOMO and trend-chasing

Nobody wants to be the team still running a PHP monolith when everyone else is buzzing about service meshes and AI copilots. But chasing trends just to look modern can trap you in tools and patterns that don’t actually fit your problems. That’s why “boring” is good. By boring, most people mean something that has worked reliably for a decade. It doesn’t change that fast. And what works for almost everyone will probably work for you too. Your software is almost certainly not a snowflake. 

Misaligned priorities

Teams sometimes optimize for what’s interesting instead of what’s useful. It’s more exciting to solve hard technical puzzles than to wire up boring business logic. The danger is that customers don’t care about your abstractions. They care about whether the product solves their problem. 

When I was building a billing engine in a past life, I poured in every best practice I knew. Patterns, abstractions, designs for scale. You name it. I had all the trimmings. Then we had to debug it. A colleague went five layers deep just to figure out what was happening. That’s when it really hit me. I hadn’t built an elegant architecture. I had built a big spaghetti universe. Stepping back, I realized everything could have lived in a single file. Instead, he had to wade through all of those layers. I had overloaded him cognitively for no reason.

Now, if you put all these forces together, it’s no wonder we end up with these modern, infinitely scalable architectures. Each decision has a logic of its own, but the cumulative effect is a system heavier than it needs to be.

The Cost of Over-Engineering

When you think of over-engineering, you quickly realize the issue isn’t just fancy architecture diagrams. It comes with very real costs. Some of them show up immediately, others slowly bleed teams over months or years. 

I still remember a moment at AWS when I discovered DynamoDB was being used for a project. Nothing wrong with DynamoDB. It’s a powerful tool. But in this case, we really needed joins. The data model wasn’t even that complicated. I struggled to see why plain old MySQL wouldn’t have been enough. Instead, the team had locked themselves into a system that made simple queries harder, slowed everyone down, and added unnecessary complexity. But you know we needed the scale!
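
To make the point concrete, here’s a minimal sketch of the kind of query we needed (the tables and columns are hypothetical, and I’m using Python’s built-in sqlite3 just to keep it self-contained; the same SQL works on MySQL or any relational database):

    import sqlite3

    # Two hypothetical tables; in DynamoDB you'd have to denormalize these
    # or stitch the answer together with multiple queries in application code.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE invoices (id INTEGER PRIMARY KEY, user_id INTEGER, amount_cents INTEGER);
        INSERT INTO users VALUES (1, 'Ada'), (2, 'Linus');
        INSERT INTO invoices VALUES (10, 1, 4200), (11, 1, 1300), (12, 2, 990);
    """)

    # One join answers "total billed per user" directly.
    rows = conn.execute("""
        SELECT u.name, SUM(i.amount_cents) AS total_cents
        FROM users u JOIN invoices i ON i.user_id = u.id
        GROUP BY u.name
        ORDER BY u.name
    """).fetchall()

    print(rows)  # [('Ada', 5500), ('Linus', 990)]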

Anyway, let’s take a look at the different aspects of the cost of over-engineering.

Slower delivery

Every layer of abstraction, every extra service, every “best practice” framework adds drag. Features that should take a week often stretch into months. Teams spend more time wiring pieces together than actually delivering value. At this point, I’m highly skeptical of bringing anything new in unless it’s proven. And by proven, I don’t mean “proven at AWS or Google.” I mean proven for small teams, small projects, in real-world conditions that look like yours.

Fragility disguised as resilience

Ironically, systems designed to look robust often become brittle. Each moving part is another point of failure. I’ve seen architectures where a single misconfigured queue brought the entire system to its knees. Complexity multiplies failure modes. The more you add, the less you actually get.

Every component also needs to be monitored and maintained. Even something as “simple” as a library upgrade can become a nightmare. And I can tell you from experience: a single library upgrade can break things in ways you never expected. What looks like resilience on the whiteboard often turns into fragility in production.

Higher costs

Over-engineering bleeds money.  Your bills can skyrocket not because you have users, but because you’re running fleets of idle services that nobody actually needs. You’re paying for machines to sit around and look busy.

Then comes the hidden payroll tax. You train engineers on tools they’ll never use again, and you burn cycles just keeping the whole thing alive. Every dependency, every service, every abstraction has to be patched, upgraded, and babysat. 

Loss of developer velocity

Debugging across microservices is not fun. Tracing through layers of patterns or waiting for CI pipelines to rebuild a dozen services kills momentum. What could have been a two-hour fix in a monolith turns into a two-day ordeal.

And here’s the kicker: you need far more tooling just to make sense of it. You can’t just attach a debugger and see what’s going on. One service calls another, half of it happens in transient states, logs are scattered everywhere. Maybe it’s networking. Maybe it’s not. Either way, shit gets real, fast.

Business risk

The ultimate cost of over-engineering is that the product never ships, or ships too late. Startups die before they find product–market fit. Big companies slow to a crawl until competitors eat their lunch. Customers don’t care how clever your architecture is if they never see the features.

I’ve felt this pain myself more than once. That’s why I think Amazon’s leadership principle of Deliver Results exists. It’s a reminder: focus on delivery, not fancy stuff. The first Kindle wasn’t perfect. Perhaps it was far from it. But it shipped. Customers bought it, Amazon learned, and over time it grew into its own market. If they had waited until it was flawless, the opportunity would have passed.

The harsh truth: most teams aren’t failing because their architecture can’t scale to millions. They’re failing because their architecture is too heavy to let them move at all.

Monoliths vs. Microservices

The monolith vs. microservices debate is a perfect lens for understanding over-engineering.

Monoliths used to be the default: everything under one roof. Front-end, back-end, business logic, and database access bundled together. For small projects and early startups, this simplicity was a gift. You could ship fast, debug with a single stack trace, and deploy everything in one go.

But monoliths came with pain points as systems and teams grew:

  • One small change requires rebuilding and redeploying the whole app.
  • A single crash can bring down the entire system.
  • Large teams can step on each other’s toes, slowing delivery.
  • Scaling is all-or-nothing: you can’t just scale the part under pressure (say, payments).

Microservices promised to fix this. By breaking the big house into a neighborhood of smaller ones, each service could:

  • Deploy independently.
  • Scale separately.
  • Be owned by a dedicated team.
  • Use different tech stacks if needed.

This sounds great, and for giants like Google or Netflix, it was. But for most teams, microservices create new headaches:

  • More moving parts means harder debugging and more failure modes.
  • Function calls get replaced by network calls, introducing latency and retries.
  • Monitoring, logging, and testing become a distributed nightmare.
  • Data consistency across services becomes tricky.
  • Dev environments balloon. You need half a dozen services running just to test one feature.

In other words, microservices didn’t eliminate complexity. If I’m being honest, they only redistributed it. What often gets lost is the nuance: microservices solve problems that most teams don’t have. They make sense once you have big teams, massive traffic, or strict scaling needs. But if you’re a small startup or an internal team with modest scale, a well-structured monolith (or modular monolith) often gives you 90% of the benefits without the operational overhead.

A good rule of thumb is to start with a monolith and get modularization right. Only carve out microservices when you have a reason.
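
To make “get modularization right” a bit more concrete, here’s a minimal sketch of a modular monolith boundary (the module names and functions are hypothetical): each module exposes a small public interface that the rest of the app treats like an external API, even though everything runs in a single process.

    # A single process, but with module boundaries treated like external APIs.
    # The "billing" and "orders" modules below are illustrative, not from the article.
    from dataclasses import dataclass


    # --- billing module: only BillingAPI is meant to be used from outside ---
    @dataclass
    class Invoice:
        user_id: int
        amount_cents: int


    class BillingAPI:
        """Public contract of the billing module: stable, validated, no leaked internals."""

        def create_invoice(self, user_id: int, amount_cents: int) -> Invoice:
            if amount_cents <= 0:
                raise ValueError("amount_cents must be positive")
            return Invoice(user_id=user_id, amount_cents=amount_cents)


    # --- orders module: depends on billing only through its public interface ---
    class OrderService:
        def __init__(self, billing: BillingAPI):
            # If billing is ever extracted into its own service, this becomes an
            # HTTP/RPC client with the same interface; OrderService doesn't change.
            self._billing = billing

        def place_order(self, user_id: int, total_cents: int) -> dict:
            invoice = self._billing.create_invoice(user_id, total_cents)
            return {"status": "placed", "invoice": invoice}


    if __name__ == "__main__":
        orders = OrderService(BillingAPI())
        print(orders.place_order(user_id=1, total_cents=4200))

None of this requires Kubernetes or a service mesh; it just requires discipline about what each module is allowed to import.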

Principles for Avoiding Over-Engineering

So how do you keep yourself honest? How do you draw the line between thoughtful architecture and unnecessary noise? Over the years, through many mistakes, facepalms, and a few battle scars, I’ve found a handful of principles that act like guardrails. They don’t guarantee simplicity, but they tilt the odds in your favor.

Here are some of the principles I keep coming back to when I want to avoid building the kind of mess I’ll regret later.

  1. Start simple. Smaller is faster. Make it boring. Default to the smallest thing that could possibly work: a single repo, a modular monolith, one database, one queue if any. “Boring” means proven, documented, easy to hire for, and stable over years—not weeks.
  2. YAGNI > future-proofing. Don’t build for scale you don’t have. Add complexity only when you have evidence (load, teams, SLAs) that demands it.
  3. Rule of 3 (for abstractions and services).
    • Don’t introduce a new abstraction until the third real use case.
    • Don’t split a service until the third concrete pain shows up (deploy lag, team contention, hotspots).
  4. Evidence before elegance. Use data to drive architecture: real latency, real error budgets, real oncall pain, real bills. If you can’t measure the problem, you’re guessing.
  5. Prefer a modular monolith first. Enforce boundaries in-process (modules, packages, clear interfaces). Treat module interfaces like external APIs. If and when you need to extract, it’s a lift, not surgery.
  6. Design for delete. Every new component should be easy to rip out. Document the “uninstall plan” up front (dependencies, migration path, rollback). If you can’t remove it, you’ll be owned by it.
  7. One-page RFCs, time-boxed spikes. Before adding tech, write a 1-pager: problem, options, cost, ops impact, exit criteria. If still unsure, run a 1–2 day spike. Ship learnings, not platforms.
  8. Keep the ops surface area small. Each extra runtime, queue, or database multiplies oncall load, patching, observability, and costs. Consolidate runtimes and infra wherever possible.
  9. Stable contracts beat shared databases. Define clear API/contract boundaries. Avoid cross-team “reach into my DB” patterns. If you can keep contracts stable, you can move fast without a distributed hairball.
  10. Optimize for developer velocity first. Local dev in <10 minutes. CI under 10 minutes. One command to run, one command to deploy. If this slips, complexity is winning.
  11. Cost guardrails by default. Set budgets/alerts early (spend per env, per service). Kill idle capacity. Complexity that doesn’t earn its keep gets turned off.
  12. Observability that fits the size. Start with logs + metrics, add tracing only when needed. 
  13. Choose defaults, then stick to them. Pick a language, a framework, a DB, a queue. Deviations require a short RFC and a real reason (latency target, library gap), not taste.
  14. Conway with intent. Shape team boundaries to match clean seams in the system (not org politics). If teams don’t map to modules/services, you’ll get a distributed monolith.
  15. Avoid distributed transactions unless you absolutely must. Prefer idempotency, retries, and reconciliation jobs (see the sketch after this list). If you can keep a workflow in one transaction, do it.
  16. Small batches, fast feedback. Ship thin slices. If your change can’t be deployed independently and verified quickly, your architecture is fighting you.
  17. Make the unhappy path explicit. Chaos follows ambiguity. Document failure modes, timeouts, retries, and back-pressure up front. Complexity tolerated is complexity multiplied.
  18. Simplicity budget. For every new component introduced, retire or consolidate something else. Net complexity should trend down, not up.
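
To illustrate principle 15, here’s a minimal sketch of idempotency in place of a distributed transaction (the table, statuses, and function are hypothetical, and sqlite3 stands in for whatever database you already run): the caller supplies an idempotency key, so a retried request can’t create a second charge, and a reconciliation job can later sweep anything still stuck in a pending state.

    import sqlite3

    # A single local transaction plus an idempotency key instead of a distributed
    # transaction. Table, statuses, and function names are illustrative.
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE payments (
            idempotency_key TEXT PRIMARY KEY,
            user_id INTEGER,
            amount_cents INTEGER,
            status TEXT
        )
    """)

    def charge(idempotency_key: str, user_id: int, amount_cents: int) -> str:
        """Safe to retry: the same key always resolves to the same single payment."""
        try:
            with conn:  # one local transaction
                conn.execute(
                    "INSERT INTO payments VALUES (?, ?, ?, 'pending')",
                    (idempotency_key, user_id, amount_cents),
                )
        except sqlite3.IntegrityError:
            pass  # key already seen: this is a retry, don't insert (or charge) again

        # Here you'd call the payment provider (not shown), then record the outcome.
        # A periodic reconciliation job re-checks any row still stuck in 'pending'.
        with conn:
            conn.execute(
                "UPDATE payments SET status = 'charged' "
                "WHERE idempotency_key = ? AND status = 'pending'",
                (idempotency_key,),
            )
        row = conn.execute(
            "SELECT status FROM payments WHERE idempotency_key = ?",
            (idempotency_key,),
        ).fetchone()
        return row[0]

    print(charge("order-42", user_id=1, amount_cents=4200))  # charged
    print(charge("order-42", user_id=1, amount_cents=4200))  # retried: still one payment row

Compared with two-phase commit or a saga framework, this is the boring option: one table, one uniqueness constraint, one cleanup job.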

Healthy Engineering Culture vs. Complexity Fetish

The hardest part of fighting over-engineering isn’t the tech. It’s almost always the culture. You can have the smartest engineers in the world, but if the culture rewards complexity, you’ll end up with bloated systems every time. Period!

In a healthy engineering culture, simplicity is a strength. Teams are encouraged to pick boring, proven tools. Leaders reward delivery and customer impact, not the length of an architecture diagram. People feel safe saying “we don’t need that yet” without being labeled as lazy or out of touch. The real heroes are the ones who make things easier, not harder.

In a culture addicted to complexity, it’s the opposite. Every project becomes a chance to flex with buzzwords: Kubernetes, service mesh, event-driven everything. Engineers build mind-blowing systems to impress their peers or pad their resumes. Leaders reward big-sounding projects because they look visionary on slides. The result is a fetish for complexity: systems nobody fully understands, costs nobody controls, and velocity nobody can recover.

The difference comes down to incentives. If you celebrate elegance and delivery, you’ll get lean, maintainable systems. If you celebrate complexity, you’ll get tangled messes that look impressive in a conference talk but collapse under real-world pressure.

A healthy culture teaches engineers that simplicity is not cutting corners. It’s a discipline. It’s saying no to the shiny tool, resisting the urge to over-abstract, and remembering that customers don’t care about your architecture diagrams. They care about whether it works or not. 

Closing Remarks

Over-engineering is not a technical problem. It’s a mindset problem. We’ve bought into the absolute clown show that ‘complex’ is better. It’s not. It’s just slower and more expensive. The systems that stand the test of time aren’t the flashiest; they’re the ones that fit the problem, grow with the business, and leave room for engineers to breathe.

The lesson from stories like Levels.fyi, Airbnb, or even the first Kindle is simple: you don’t win by building the biggest system. You win by building the right one. Execution beats elegance. Delivery beats polish.

As engineers and leaders, we have a choice. We can chase complexity for its own sake, or we can build cultures that celebrate clarity, delivery, and impact. In the end, customers won’t remember whether you used Kafka, Kubernetes, or the hottest framework of 2025. They’ll remember whether your product solved their problem.

So the next time you’re tempted to reach for the “enterprise” toolbelt, stop and ask: what’s the simplest thing that could work? That question has saved me more times than I can count.

Just like a shell pipeline: small tools, each doing one job well, connected in simple, powerful ways. That’s elegance. That’s engineering.
