“Good communication” is one of those phrases everyone nods along to — until the incident hits, and suddenly comms unravel before your eyes.
So here’s what I actually mean when I say communication matters.
Note: the meaning and practice of good incident response communication is far from well understood, and I’m sharing what I’ve learned. These aren’t necessarily the right answers. Please share your own views and examples of good communication.
I used to dread incidents (I’ve Been On Both Sides of the Incident War Room)
I’ve been the poor soul on overnight support, bleary-eyed, trying to make sense of a critical incident … all while being pinged every five minutes by anxious managers demanding updates I didn’t have yet.
I’ve been that manager too, stuck between a rock and a hard place: pressure from the C-level on one side, and on the other the instinct to give responders breathing room to fix the issue.
Let’s face it — major incidents often feel like a lose-lose game for mid-level managers, especially when the organisation lacks a clear setup for incident management and communication.
Today, my story is a little different. At Uptime Labs, our business revolves around creating incidents—on purpose—and watching teams resolve them.
This is how we earn our bread, so for once, as both an engineer and a business leader, I can happily say: I want a lot more incidents!
We’ve chosen the challenge of starting a new company that runs incident drills because we believe something fundamental:
Incident response demands a unique set of skills — ones we, as engineers, don’t use every day.
When these skills are properly understood and regularly practiced, incidents stop feeling like chaos. Instead, they become exactly what they should be — opportunities for learning, growth, and even a bit of team bonding.
In this post, I’ll touch on two of the core skills we’ve observed after running 4000+ incident drills — and how practice turns them from stress-inducing to second nature.
1 – “Good Communication” During an Incident
The key to good communication is understanding that it’s never one-size-fits-all. Different audiences need different things — and knowing how to tailor your message is what separates noise from clarity during an incident.
Business Stakeholders: Keep It Clear, Calm, and Customer-Focused
Their priority: “Is this affecting customers or revenue? Are competent people handling it?”
I’m cautious about using the term “competent people”. What does it mean? What makes someone competent, or the right person, for an incident? That’s another topic worth exploring; nevertheless, I’ve been asked this question many times by my execs.
- Tell them what’s broken — but also what’s still working.
- Give them confidence that it’s being handled competently.
- Provide just enough info so they can manage customer expectations.
- Don’t let them hover. The best communication keeps them informed and at a healthy distance.
- Set clear expectations about when the next update will come and what is needed from them.
Do:
“We’re currently experiencing an issue affecting the ability of some customers (less than 10%) to see their transaction history. The core services remain operational, and our team is actively working on resolution. We’ll provide an update in 30 minutes.”
→ Clear impact, assurance, and a timeline.
Don’t:
“There’s a P1 Sev incident affecting API latency due to a potential database lock issue. We’re investigating.”
→ Technical jargon = confusion and panic.
Senior IT Management: Facts First, No Surprises
Their priority: “Do I have a handle on this before the business blindsides me? Are we managing risk?”
- Make sure they hear about the incident from you, not from the business side.
- Provide clear, concise impact summaries and key technical facts.
- Leave no room for rumours or side-channel noise.
- Offer assurance that risks are understood (or still being assessed), that the team is working to contain and mitigate, and that lessons will be learned once the incident is resolved.
Do:
“We’ve identified an issue impacting the payment service. Current scope: ~20% of transactions are delayed. Investigating circumstances leading to the incident. Mitigation options being explored. Next update in 15 minutes.”
→ Crisp, factual, proactive — leaves no room for rumours.
Don’t:
“We think something’s wrong with production, but we’re not sure yet.”
→ Vague updates trigger more questions, escalations, and unwanted involvement.
The golden rule: vagueness, unless you clearly state it is deliberate because information is still limited, generates far more questions and distracts from the investigation. Keep them confident enough to stay out of the responders’ way.
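To make the tailoring concrete, here is a minimal sketch (in Python, purely illustrative; the field names and wording are my own assumptions, not a template we prescribe) of how the same underlying facts might be rendered for business stakeholders versus senior IT management:

```python
from dataclasses import dataclass


@dataclass
class IncidentUpdate:
    """One set of facts, rendered differently per audience."""
    affected: str            # what is broken, in plain language
    customer_impact: str     # scope of customer/revenue impact
    still_working: str       # what remains operational
    technical_summary: str   # key technical facts for IT leadership
    next_update_minutes: int

    def for_business(self) -> str:
        # Impact, assurance, timeline -- no jargon.
        return (
            f"We're currently experiencing an issue affecting {self.customer_impact}. "
            f"{self.still_working} The team is actively working on resolution. "
            f"Next update in {self.next_update_minutes} minutes."
        )

    def for_it_management(self) -> str:
        # Facts first: scope, what is known, what happens next.
        return (
            f"Issue: {self.affected}. Current scope: {self.customer_impact}. "
            f"{self.technical_summary} Mitigation options are being explored. "
            f"Next update in {self.next_update_minutes} minutes."
        )


update = IncidentUpdate(
    affected="payment service delays",
    customer_impact="~20% of transactions (delayed, not failed)",
    still_working="Core services remain operational.",
    technical_summary="DB connection pool exhaustion suspected; cause under investigation.",
    next_update_minutes=15,
)
print(update.for_business())
print(update.for_it_management())
```

The point isn’t the code; it’s that one set of facts deliberately produces two different messages, each scoped to what that audience actually needs.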
Fellow Responders: Shared Understanding in Real-Time
Their priority: “What’s happening, what do we know, and how can I help?”
- Maintain a shared understanding of what’s happening.
- Share clear context when escalating or bringing others in.
- Regularly update on new info and thought processes.
- Always separate facts from opinions.
This is where real-time collaboration shines — but only if communication stays disciplined.
Do:
“Seeing increased 5xx errors from the auth service since 14:32. DB connections are maxed out. Can you check if recent deployments could be linked?”
→ Provides facts, timeframe, and a clear ask.
Don’t:
“Can you take a look at your changes?”
→ Zero context = wasted time and duplicate effort.
Also, good responders constantly share their evolving thought process:
“I’m thinking this could be related to yesterday’s config change, because we changed DB queries — I’ll check logs to confirm.”
→ Distinguishing theory from fact keeps everyone aligned.
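In the same spirit, here is a small, hypothetical sketch of one way a responder update could keep observed facts, working theories, and asks visibly separate (the labels and structure are illustrative assumptions, not a prescribed format):

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ResponderUpdate:
    """A channel update that separates observed facts from working theories."""
    facts: List[str] = field(default_factory=list)        # observed, timestamped where possible
    hypotheses: List[str] = field(default_factory=list)   # explicitly labelled as theories
    asks: List[str] = field(default_factory=list)         # concrete requests for help

    def render(self) -> str:
        lines = [f"FACT: {f}" for f in self.facts]
        lines += [f"THEORY: {h}" for h in self.hypotheses]
        lines += [f"ASK: {a}" for a in self.asks]
        return "\n".join(lines)


update = ResponderUpdate(
    facts=["Increased 5xx errors from the auth service since 14:32",
           "DB connections are maxed out"],
    hypotheses=["Possibly related to yesterday's config change to DB queries; checking logs to confirm"],
    asks=["Can someone check whether recent deployments could be linked?"],
)
print(update.render())
```

Tagging each line as a fact, a theory, or an ask forces the separation described above, and makes it cheap for someone joining mid-incident to catch up.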
2 – Progressing the Incident: It Starts with Comfort in Ambiguity
Beyond communication, effective responders share another trait: They’re comfortable navigating ambiguity.
The best teams start by building a factual picture of the incident’s impact, no matter how little they know at first. From there, they develop a working theory — a living, evolving understanding of what’s happening.
That working theory isn’t just technical; it’s the backbone of clear communication and teamwork. As it evolves, so does the path to resolution.
Practice Builds Muscle Memory — Reading Alone Doesn’t
These skills—whether it’s crisp communication or managing ambiguity—aren’t learned by reading a handbook. Traditionally, it takes years of real-world incidents to develop them.
But there’s a faster, safer path: frequent, realistic incident drills.
In future posts, I’ll dive deeper into teamwork dynamics and domain expertise. But if there’s one takeaway here, it’s this: Good communication in incidents isn’t luck — it’s a skill, forged through repetition.
Hamed is the co-founder and CEO of Uptime Labs. He has 20 years of experience in engineering leadership, reliability engineering and IT operations. Having spent the majority of his career at the sharp end of incident response in financial services, he's looking to help all companies master the unexpected.