Fear Premium Ruins Engineering Teams

5 days ago 2

Oleksandr Tryshchenko

On average you have a:

  • 1 in 100 chance of dying in a car accident in your lifetime
  • 1 in 3 chance of divorcing (if married)
  • 1 in 100 chance of losing everything when you have a mortgage and face financial hardship (1 in 67 people file for bankruptcy in the US)
  • 1 in 200 chance of being robbed in your lifetime

If you maximised personal safety you would stay home, never drive, skip sushi, avoid mortgages, weddings, flights, concerts, and playgrounds. Yet billions of us do those things every day because the expected upside dwarfs the downside. A risk-free life is nearly impossible, and arguably undesirable for most; only the risk-adjusted return matters.

Business is no different — but the return curve varies by sector.

  • Banks live under Basel capital rules: every basis point of extra risk attracts punitive capital, regulatory scrutiny, and reputational ruin. Their shareholders are not rewarded for daring bets; stability is the product.
  • SaaS sits at the other end of the spectrum. Markets crown the team that ships first, learns fastest, and compounds network effects. These companies are rewarded for velocity, not for stockpiling risk buffers. Fear-driven latency shows up as lost users and churn, not as a lower capital charge. It’s a competitive and agile market: businesses that lag behind the competition get obliterated fast, even the bigger ones.

And of course there are all sorts of companies in between the two.

Accepting smart, bounded risk is literally how most software companies outrun their rivals — and why the Fear Premium is poison to them. Just like in life, we make bets, and riskier bets often have higher returns if successful.

Every now and then I get to have a conversation with an engineering manager (or someone in a comparable role) who proudly describes their perfect Jira setup. And while striving for good is a good quality, I treat it as a risk signal when I hear it in an interview. Sounds weird? Let’s talk.

The highest-velocity teams I’ve led treated Jira as a chat window with lanes, a means of communication. They prized judgment over extensive acceptance criteria or award-winning ticket naming. In fact, I’m confident they’d keep performing if we used a simple Google Sheet to track who’s busy with what.

That’s the difference between technical seniority (deep craft) and team-work seniority (shared ownership, bias for action). We aim to uplevel the latter.

I’m not trying to make the case that Jira (or similar tools) is meaningless. My point is that at different stages of a team’s evolution (e.g. forming, storming, performing) your process has a different ROI. You can tell me an impressive story about the purity of your Scrum process, and I’ll leave the room without any idea of how well your teams perform.

The problem is that there are so many engineering books describing the right way to run a process that we started prioritizing process over content, and I think that’s wrong. I strongly believe that a large fraction of the value I bring to the business is risk management: the ability to make a bet while owning its consequences.

Fear is a natural human trait necessary for survival. It helps us avoid undesirable outcomes and stick to safety. However, fear can also be paralysing, keeping your hands in its invisible handcuffs. This is a universal feeling that applies to all parts of our lives, including our jobs. Taking control of one’s fear is perhaps one of the most overlooked skills one can develop for a career, especially for engineers.

Why engineers? Engineers are multipliers. Oftentimes the results of their work can impact millions of people and put a business at stake. Consequently, engineers aim for equally massive safeguards to protect the business (and themselves). This holds at every level of the engineering organisation: engineers, managers, directors, C-levels, everyone.

But what if we’re insuring ourselves too much? I suggest using the term “Fear Premium” to describe it.

Fear Premium = opportunity abandoned today to insure against a failure that may never arrive.

Each extra ritual — triage call, second checklist, “quick” sync — seems prudent. Accumulated, they throttle throughput, sap morale, and quietly drain upside. Spending two weeks to plan a project that takes a month, meeting thrice to discuss a decision that can be reverted in a day, sticking to the status quo even when it doesn’t make any sense anymore, you name it.

With that said, decisions often are, in fact, dangerous and need insurance. How do you tell which is which?

Let

  • L = Lead-time drag
  • M = Morale / attrition drag
  • O = Opportunity cost of delayed value
  • C = Capital & maintenance overhead

Then

FP ($) = L + M + O + C

A rough first pass:

  • L = Delta Lead-Time (days) × Burn-Rate ($/day)
  • M = Delta Attrition (%) × Replacement-Cost ($/hire) + delayed opportunity cost
  • O = Lost-Revenue ($/day) × Delta Lead-Time
  • C = Annual run-cost uplift of extra infra & tooling

If FP > Expected-Loss (impact × probability), you’re literally paying more to be safe than the calamity would cost in expectation. Even when the maximum impact is the obliteration of the business, you still have to do your own risk management.
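To make the formula concrete, here is a minimal Python sketch of the first-pass estimate. The class name, field names, and every number are illustrative assumptions rather than data from a real team. The attrition term is multiplied by an assumed `headcount` so that the units come out in dollars, and the "delayed opportunity cost" part of M is left out because O already counts it.

```python
from dataclasses import dataclass


@dataclass
class FearPremiumInputs:
    # All fields are hypothetical estimates; plug in your own.
    delta_lead_time_days: float       # extra days the safeguards add
    burn_rate_per_day: float          # team cost in $/day
    delta_attrition: float            # extra attrition as a fraction, e.g. 0.05
    headcount: int                    # assumed: turns the fraction into hires
    replacement_cost_per_hire: float  # $/hire
    lost_revenue_per_day: float       # $/day of delayed value
    annual_run_cost_uplift: float     # extra infra and tooling in $/year


def fear_premium(x: FearPremiumInputs) -> float:
    """FP = L + M + O + C, using the rough first-pass terms."""
    lead_time_drag = x.delta_lead_time_days * x.burn_rate_per_day
    morale_drag = x.delta_attrition * x.headcount * x.replacement_cost_per_hire
    opportunity_cost = x.lost_revenue_per_day * x.delta_lead_time_days
    capital_overhead = x.annual_run_cost_uplift
    return lead_time_drag + morale_drag + opportunity_cost + capital_overhead


def expected_loss(impact: float, probability: float) -> float:
    """Expected loss of the failure being insured against."""
    return impact * probability


def is_overinsured(x: FearPremiumInputs, impact: float, probability: float) -> bool:
    """True when the safeguards cost more than the expected loss."""
    return fear_premium(x) > expected_loss(impact, probability)


example = FearPremiumInputs(
    delta_lead_time_days=10,
    burn_rate_per_day=4_000,
    delta_attrition=0.05,
    headcount=20,
    replacement_cost_per_hire=30_000,
    lost_revenue_per_day=1_500,
    annual_run_cost_uplift=5_000,
)
# Safeguards cost roughly $90k against a 5% chance of losing $1M (about $50k).
print(is_overinsured(example, impact=1_000_000, probability=0.05))  # True
```

The value of such a sketch is less the exact number and more that it forces each term to be estimated explicitly instead of being felt.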

And then there’s a risk management call you get to make: is the risk something critical, or something you can absorb? Paying for risk reduction is pretty much the same as buying insurance.

A good analogy is car insurance. When you purchase a new, expensive car on credit, you insure not only the liability but also damage to your own vehicle. Conversely, when buying a used car that costs three monthly salaries, you’re much less likely to pay 5% of its total price a year to insure it.

My point here is that most of the time you won’t choose to spend 500k to prevent a 5% chance of the business losing 1 million (an expected loss of 50k), right? It gets more obvious at scale; however, the devil hides in the small efforts that are hard to measure.

If you were to ask yourself the following questions, do you think the answer would be “yes”?

  • Does our planning granularity justify the time it takes?
  • Do our daily meetings help more than they distract?
  • Is writing granular acceptance criteria in Jira a profitable use of our time?
  • Does the participation of every engineer in feature scoping make a difference that justifies the time spent and the distraction?

Obviously, good acceptance criteria reduce the chance of engineers misunderstanding the task. But if only 1 in 50 tasks is misunderstood, does that warrant spending time to extensively document all 50?
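One way to approach that question is a break-even calculation. In the sketch below only the 1-in-50 rate comes from the text; the minutes spent per ticket and the hours of rework per misunderstanding are hypothetical numbers. It also assumes detailed criteria would prevent every misunderstanding, which is generous to the documentation side.

```python
def documentation_pays_off(
    tickets: int,
    minutes_per_ticket: float,
    misunderstanding_rate: float,
    rework_hours_per_miss: float,
) -> bool:
    """True if the expected rework avoided exceeds the time spent documenting."""
    cost_hours = tickets * minutes_per_ticket / 60
    avoided_hours = tickets * misunderstanding_rate * rework_hours_per_miss
    return avoided_hours > cost_hours


def break_even_rework_hours(
    minutes_per_ticket: float, misunderstanding_rate: float
) -> float:
    """Rework hours per misunderstanding at which documenting breaks even."""
    return (minutes_per_ticket / 60) / misunderstanding_rate


# Assumed: 30 minutes of criteria per ticket, 8 hours of rework per miss.
# 50 tickets cost about 25 hours of writing to avoid about 8 hours of rework.
print(documentation_pays_off(50, 30, 1 / 50, 8))  # False
```

Under these assumptions a single misunderstanding would have to cost roughly 25 hours of rework before documenting every ticket breaks even. For riskier work with a higher miss rate or costlier mistakes, the same function can tip the other way.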

These are quite difficult questions to ask, but only by asking them repeatedly can we develop the skill and honesty to make better judgement calls.

Jeff Bezos (2015 shareholder letter) splits decisions into two categories:

  • One-way doors: Decisions that are irreversible, for example a massive, elaborate database migration where rollback is impossible without either expensive workarounds or downtime that breaches the SLA.
  • Two-way doors: Decisions that are reasonably reversible, for example the architecture of a small service that can be refactored in a couple of weeks if necessary.
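The two categories can be turned into a rough triage rule. The sketch below is an assumption about how a team might encode it, not something from the shareholder letter: the `Decision` fields and the 14-day rollback budget (taken from "a couple of weeks" above) are hypothetical and should be tuned to your own context.

```python
from dataclasses import dataclass
from enum import Enum


class Door(Enum):
    ONE_WAY = "one-way"
    TWO_WAY = "two-way"


@dataclass
class Decision:
    name: str
    rollback_days: float         # estimated effort to undo the decision
    rollback_breaches_sla: bool  # would undoing it cause SLA-breaching downtime?


def classify(decision: Decision, max_rollback_days: float = 14) -> Door:
    # Assumed rule of thumb: anything that cannot be undone within the
    # rollback budget, or only by breaching the SLA, is a one-way door.
    if decision.rollback_breaches_sla or decision.rollback_days > max_rollback_days:
        return Door.ONE_WAY
    return Door.TWO_WAY


print(classify(Decision("small service architecture", 10, False)).value)  # two-way
print(classify(Decision("terabyte database migration", 90, True)).value)  # one-way
```

The point of writing the rule down is to have the classification conversation once, before the debate starts, rather than relitigating it in every meeting.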

It’s helpful to look at every decision through this lens. Early in my career I spent a good chunk of my time in debates about architecture solutions for what we were building. In retrospect, many times it didn’t matter, and my efforts were counterproductive, despite resulting in better architecture.

A friend of mine, a serial entrepreneur, once put it well:

“The best code is the code that fits the purpose”, which in the context of commercial business means “The best code is the code that earns more money than it consumes through operations, maintenance, and extension”. Being an engineer back then, I used to disagree with him wholeheartedly. However, while life is more nuanced than his statement, I keep observing plenty of situations where it was more true than not.

Coming back to the friend: he still ships products using PHP & jQuery and very junior Fiverr freelancers, giving them a one-page document describing the outcome he wants. He continuously finds customer demand with at least 1–2 products a year and sells them for enough money to live a chill life with a cocktail on a beach. And while he can obviously bear much higher risks than an established company, it remains a good example of pragmatic thinking about codebases and the processes that surround them, because most of his decisions are two-way doors. He knows he’ll inevitably lose money on many projects; he just factors it in and reckons that the loss is cheaper than its prevention.

Not every service is meant to handle high load, nor does every service need 99.99% availability. Many of the MVP features businesses launch are meant to collapse, and we shouldn’t forget that the M stands for Minimum. Frequently you’d benefit from shipping a bunch of hacky features and rebuilding them “properly” if they work out, because the agility to experiment and innovate is equivalent to survival for SaaS businesses. If the feature can be chopped off in a way that returns the app to the state it was in before the feature was built, most of the time this is a two-way door decision.

Once in my career I saw a company allocate around $1M to migrate an admin panel that fewer than 100 people used daily from Angular.js to React. We had brilliant engineers discussing brilliant frontend architecture for hours every week, while in fact we’d have been better off choosing a cheaper, less ambitious way of dealing with the older technology and spending that money on building features that make a difference for our customers. And again, back then I had a spark in my eyes, and I was fully on board with what we were doing. In fact, I overengineered a part of it to an extent that makes me wonder if my impact was even positive. I built something really cool, but it had virtually no impact. And I was solving a two-way door problem that never materialised in the end. I was aiming to enable scalability, but many years later the company has roughly the same number of clients and the same throughput as it used to have.

I do not encourage you to be reckless, no. Often the safeguards are justified. Instead, I encourage you to start asking critical questions about what you do and to doubt your habits more frequently. Do not go and migrate your terabyte database to another platform just to see what happens, because it’s likely a one-way door decision 😉

These are the actionable steps you can start taking today. I wish I had this list early on in my career, as all of these are very quick changes that you can make.

  1. Normalize mistakes in your teams. An engineer who knows that a mistake can cost them their job will be much more defensive about taking risks. And mistakes will happen when risks are taken.
  2. Build a culture of trust and autonomy. Teach your team when it’s appropriate to take risks. Celebrate their accomplishments, but don’t blame them for failure; learn from it instead.
  3. Map the door. Classify work as Type 1 or Type 2 before debate. Explain this concept to your teams.
  4. Price risks, not fear. Quantify blast radius; if survivable, ship. Often our emotions paint bleaker outcomes than what we’d really get.
  5. Invest in failure mitigation, not only prevention. Feature flags, monitors, traces, blue-green, canaries. Often, quick rollback pays better dividends than 10,000 tests to prevent the failure.
  6. Cull zombie rituals regularly. If no one owns a step, delete it. Not certain? Delete it and see what happens; you can always bring a ritual back if you were mistaken.
  7. Coach team-work seniority. Reward the dev who kills pointless ceremony as highly as the one who optimises a database query.
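As an illustration of step 5, here is a minimal sketch of a feature flag that turns a risky change into a two-way door. The `FeatureFlags` class, the `bulk_discount` flag, and the pricing rule are all hypothetical; a real setup would keep flags in a config service or an off-the-shelf flag tool rather than in memory.

```python
class FeatureFlags:
    """In-memory flag store; a real system would persist and distribute this."""

    def __init__(self) -> None:
        self._enabled: set[str] = set()

    def enable(self, name: str) -> None:
        self._enabled.add(name)

    def disable(self, name: str) -> None:
        self._enabled.discard(name)

    def is_enabled(self, name: str) -> bool:
        return name in self._enabled


def checkout_total(prices_cents: list[int], flags: FeatureFlags) -> int:
    """Sum the basket; the experimental discount lives behind a flag."""
    total = sum(prices_cents)
    # Hypothetical experiment: 10% off baskets of three or more items.
    if flags.is_enabled("bulk_discount") and len(prices_cents) >= 3:
        return total * 90 // 100
    return total


flags = FeatureFlags()
basket = [1000, 2000, 3000]

flags.enable("bulk_discount")          # ship the hacky experiment
print(checkout_total(basket, flags))   # 5400

flags.disable("bulk_discount")         # rollback is one call, not a redeploy
print(checkout_total(basket, flags))   # 6000
```

Because the old path stays intact behind the flag, the cost of being wrong is the time it takes to flip it back, which is exactly what makes shipping early survivable.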

Don’t fear, be brave.
