IR slowdown

When a security incident hits (a data breach, an active intrusion, ransomware crawling across your network), most people assume technical complexity is what slows everything down. And sure, modern infrastructures are a mess of layers, distributed components, and fragile dependencies. But if you’ve ever been in the middle of a serious cybersecurity incident, you know the truth: the systems aren’t usually the problem.

People are.

More specifically, it’s the way we make decisions under pressure. It’s not the flood of alerts or the chaos on the SIEM dashboard that wears responders down. It’s the constant friction around what to do next: waiting for someone to confirm a theory, waiting for approval, waiting for anyone to just take ownership. These invisible decision bottlenecks drag everything down, often long before anyone even starts typing commands.

Once you see this pattern, you can’t unsee it. And once you start designing around it, incident response gets dramatically better. Not because people move faster, but because they stop getting stuck.


What we actually mean by decision bottleneck

During a live incident, responders face a constant stream of small choices that feel huge in the moment:

  • Should we pull the plug on this system right now?
  • Can we safely roll back, or will that make things worse?
  • Do we need to call the CERT?
  • Is this bad enough to wake up the CISO?

When any of these decisions stalls, everything downstream stalls with it. People start second-guessing data they already checked. Slack channels fill with the same questions asked three different ways. What should take minutes stretches into an hour of collective paralysis, and nobody’s being lazy or incompetent. They’re just stuck.

A decision bottleneck happens when the team is waiting on something (information, permission, consensus) that isn’t coming fast enough. Think of it like a traffic jam in your organization’s brain. Everyone’s technically working, but nothing’s actually moving.

And here’s the thing: this is exhausting even when you’re not doing anything. You’re not tired from typing. You’re tired from the mental weight of waiting, wondering, and worrying about what’s happening while you sit there.


Why incidents feel so draining

Teams often describe long incidents as brutal or chaotic, but there’s something specific going on underneath. Unresolved decisions eat mental energy way faster than resolved ones.

Every open question your brain has to hold onto (should we do X? what if Y happens? who’s handling Z?) stays active in working memory. It keeps getting revisited, re-evaluated, turned over and over. Now multiply that by fifteen or twenty open threads, and you’ve got a recipe for cognitive overload, even if you haven’t actually done much yet.

Psychologists have a name for this: decision fatigue. It’s the gradual erosion of your ability to make good choices after too many decisions in a row. But incident response adds a twist: the decisions aren’t just yours. They’re shared across a team, and everyone feels partially responsible for all of them. That diffuse ownership is what makes incidents so socially and mentally exhausting.

If you want a technical analogy: unresolved decisions are like threads stuck waiting on a lock. The whole system loses throughput, even though every thread looks busy.


The three ways decisions get stuck

After sitting through (or reading about) hundreds of postmortems, I’ve noticed the same patterns show up over and over. They’re not technical problems. They’re coordination problems.

1. Waiting for certainty that’s not coming

It’s tempting to wait until you know what’s happening before you act. That traffic spike: is it a DDoS? Exfiltration? Just a weird bug? Teams freeze up trying to prove a hypothesis before committing to a response.

But here’s the reality: incidents don’t hand you clean data. Logs are delayed. IoCs are partial. If you treat certainty as a prerequisite for action, you’ll watch the compromise spread while everyone argues about what’s really going on.

The teams that handle this well have learned to act on “good enough” information. They make reversible moves early and adjust as the picture clarifies.

2. Waiting for permission nobody knows how to give

Even experienced responders freeze when they’re not sure about their authority. Can the SOC analyst isolate that endpoint, or does someone senior need to approve it? Can we cut off VPN access without checking with IT leadership first?

In normal operations, those approval chains make sense. During a crisis, they’re deadly. Every round-trip up and down the org chart is time the attacker uses to move laterally. Clear delegation (decided before the incident) is what keeps this from turning into an endless game of hot potato.

3. Waiting for people to agree on a path

Sometimes everyone knows something needs to happen. The problem is nobody can agree on what. The SOC wants to contain aggressively. Legal is worried about preserving evidence. Management is thinking about customer impact. PR is already drafting statements.

Diverse perspectives genuinely help (they catch things you’d miss otherwise), but unresolved disagreement during a live incident burns time and erodes trust fast. What you need isn’t consensus. What you need is a pre-agreed way to break ties when time is short.


The compounding problem

If decision stalls only affected one task at a time, they’d be annoying but manageable. The real issue is that they cascade.

Delays stack up because downstream tasks can’t start until upstream decisions land. Uncertainty spreads as people lose confidence in their own read of the situation. Stress ratchets up when timelines slip and nobody knows why. Context-switching goes through the roof as people bounce between waiting and checking and waiting again.

This is why hour three of an incident feels ten times harder than hour one, even if nothing new has broken. You’re not fighting a bigger technical problem. You’re drowning in unresolved decisions that keep multiplying.

I like to think of it like browser tabs. Every open decision is a tab consuming memory in the background. Eventually everything slows to a crawl. Closing tabs (even by making an imperfect choice) frees up capacity to actually think.


Building for decision flow

Technical incident response gets better through automation. Cognitive incident response gets better through decision design. You can’t eliminate decisions, but you can make sure they don’t get stuck.

Here’s what that looks like in practice.

1. Make authority boundaries explicit

Everyone on the response team should know, without asking, who can make which calls. This kills hesitation and prevents the finger-pointing that happens when nobody wants to be the one who “made the wrong decision.”

A useful heuristic I’ve seen work well:

If you can reverse it within an hour, do it and tell people after.

This mirrors the thinking in high-stakes fields like aviation, where waiting for approval can be more dangerous than acting. It creates a bias toward movement when the stakes are recoverable, while preserving escalation for the truly irreversible stuff.
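
To make the heuristic concrete, here is a minimal sketch of it as a function. The one-hour window is taken from the rule above; the class, field names, and return labels are hypothetical and only there for illustration.

```python
from dataclasses import dataclass
from datetime import timedelta

# Taken from the heuristic: "reverse it within an hour".
REVERSAL_WINDOW = timedelta(hours=1)


@dataclass(frozen=True)
class ProposedAction:
    # Hypothetical fields; a real team would define its own.
    name: str
    reversible: bool
    time_to_reverse: timedelta  # estimated time needed to undo the action


def decide(action: ProposedAction) -> str:
    """Return 'act_then_notify' for quickly reversible actions, else 'escalate_first'."""
    if action.reversible and action.time_to_reverse <= REVERSAL_WINDOW:
        return "act_then_notify"
    return "escalate_first"
```

Under these assumptions, isolating an endpoint that can be reconnected in minutes would come back as act_then_notify, while wiping a disk would come back as escalate_first.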

Incident command frameworks like the Incident Command System (ICS) formalize this: one Incident Commander, one Ops Lead, one Comms Lead. No ambiguity about who owns what. Ambiguity is the enemy of speed.

2. Pre-decide the routine stuff

Some decisions come up every single time. “Is this alert real?” “Should we page the on-call?” Instead of debating these fresh each incident, define default plays:

  • High-severity alert with confidence above 60%? Treat it as a true positive until you can prove otherwise.
  • Likely customer impact within 30 minutes? Loop in comms immediately, don’t wait.

These rules don’t replace judgment. They scaffold it. You’re not automating away human thinking; you’re saving it for the decisions that actually need it.
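
The two default plays above can be written down as data plus a small lookup, so nobody has to re-derive them mid-incident. This is a rough sketch, not a real alerting API: the Alert fields and the play names are assumptions, and only the 60% and 30-minute thresholds come from the rules themselves.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Alert:
    # Hypothetical alert shape for illustration only.
    severity: str                                 # e.g. "low", "medium", "high"
    confidence: float                             # detector confidence, 0.0 to 1.0
    minutes_to_customer_impact: Optional[float]   # None if no impact is expected


def default_plays(alert: Alert) -> list[str]:
    """Return the pre-decided plays that apply to this alert, in rule order."""
    plays = []
    # Rule 1: high severity and confidence strictly above 60%.
    if alert.severity == "high" and alert.confidence > 0.60:
        plays.append("treat_as_true_positive")
    # Rule 2: customer impact likely within 30 minutes.
    impact = alert.minutes_to_customer_impact
    if impact is not None and impact <= 30:
        plays.append("loop_in_comms")
    return plays
```

An empty result does not mean "do nothing". It means no default applies and the decision goes to a human.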

3. Design for partial information

Here’s the paradox: waiting for complete information guarantees a worse outcome. But acting on nothing is reckless. The middle path is incremental action.

Think of it like this: isolate the suspicious endpoint, but don’t wipe it yet. Contain first, preserve forensic options, escalate eradication as a separate decision once you know more. Each step buys time without burning bridges.

Good systems degrade gracefully. Partial knowledge should lead to partial action, not total paralysis.
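
One way to picture incremental action is to separate the steps you can always take from the ones that need their own decision. The sketch below assumes, purely for illustration, that eradication is the only irreversible step; the step names are hypothetical.

```python
from enum import Enum


class Step(Enum):
    ISOLATE = "isolate endpoint"
    PRESERVE = "preserve forensic evidence"
    ERADICATE = "wipe and rebuild"


# Assumption: only eradication is treated as irreversible in this sketch.
IRREVERSIBLE = {Step.ERADICATE}


def allowed_steps(eradication_approved: bool) -> list[Step]:
    """Reversible steps are always allowed; irreversible ones need a separate decision."""
    return [s for s in Step if s not in IRREVERSIBLE or eradication_approved]
```

With no eradication decision yet, the responder can still isolate and preserve. The wipe only becomes available once someone has explicitly approved it.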


Speed comes from structure, not adrenaline

When companies write about their incident response in public postmortems, they often brag about how “fast” the team moved. But speed during incidents isn’t about running faster. It’s about having less to trip over.

Urgency runs on fear. Flow runs on structure.

When the structure is clear (who decides, what happens when you’re unsure, what the fallback is), people stop spinning their wheels. They’re not rushing. They’re just moving, steadily, because they know where they’re going.

Here’s a simple way to think about it:

  • Urgency = burning energy to fight entropy
  • Flow = conserving energy through alignment

When a team hits flow state, even long incidents feel manageable. Communication gets tighter. Loops close faster. And people come out the other side tired but not destroyed.


The infrastructure behind good decisions

Incident response is infrastructure. And like any infrastructure, it needs to be built before you need it, not during the crisis.

Set up authority levels ahead of time

Create a simple matrix that maps incident severity to decision authority:

Incident level | Decision owner      | Escalation path   | Example decisions
---------------|---------------------|-------------------|----------------------------------------------
Minor          | On-call SOC analyst | Team lead         | Isolate endpoint, block suspicious IP
Major          | Incident commander  | Security director | Disable VPN access, activate IR team
Critical       | CISO                | CEO/Legal         | Public communication, authority notification

When stress is high, nobody should have to wonder “who can authorize this?” The answer should already exist.
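
If the matrix lives somewhere tooling can read it, "who can authorize this?" becomes a lookup rather than a conversation. Here is a minimal sketch using the rows from the matrix above; the Authority class and who_decides function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Authority:
    decision_owner: str
    escalation_path: str
    example_decisions: tuple[str, ...]


# Rows copied from the severity matrix; keys are lowercase incident levels.
AUTHORITY_MATRIX = {
    "minor": Authority(
        "On-call SOC analyst", "Team lead",
        ("Isolate endpoint", "Block suspicious IP"),
    ),
    "major": Authority(
        "Incident commander", "Security director",
        ("Disable VPN access", "Activate IR team"),
    ),
    "critical": Authority(
        "CISO", "CEO/Legal",
        ("Public communication", "Authority notification"),
    ),
}


def who_decides(level: str) -> Authority:
    """Look up the decision owner for an incident level (case-insensitive)."""
    try:
        return AUTHORITY_MATRIX[level.lower()]
    except KeyError:
        raise ValueError(f"Unknown incident level: {level!r}") from None
```

Failing loudly on an unknown level is deliberate here: a silent default would recreate exactly the ambiguity the matrix is meant to remove.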

Define triggers that force decisions

Specify the conditions that require action. Not suggestions, requirements:

  • Trigger: Lateral movement detected on three or more endpoints → Incident Commander takes over immediately.
  • Trigger: Two conflicting containment proposals on the table → IR Lead makes the call within 5 minutes.

This shifts the question from “should we do something?” to “what specifically should we do?” That’s a much easier question to answer under pressure.
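
Triggers like these are easy to express as condition/action pairs. The sketch below encodes the two examples above; the IncidentState fields are hypothetical, and reading "two conflicting proposals" as "two or more" is an assumption.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class IncidentState:
    # Hypothetical snapshot of what the team currently knows.
    endpoints_with_lateral_movement: int
    open_containment_proposals: int


@dataclass(frozen=True)
class Trigger:
    name: str
    condition: Callable[[IncidentState], bool]
    required_action: str


TRIGGERS = [
    Trigger(
        "lateral_movement",
        lambda s: s.endpoints_with_lateral_movement >= 3,
        "Incident Commander takes over immediately",
    ),
    Trigger(
        "conflicting_containment",
        # Assumption: two or more open proposals counts as a conflict.
        lambda s: s.open_containment_proposals >= 2,
        "IR Lead makes the call within 5 minutes",
    ),
]


def fired_actions(state: IncidentState) -> list[str]:
    """Return the required action for every trigger whose condition holds."""
    return [t.required_action for t in TRIGGERS if t.condition(state)]
```

Keeping the triggers in a plain list means the rules can be reviewed and changed after a retrospective without touching the evaluation logic.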

Practice with ambiguous scenarios

Run drills. Not just tabletops where the answer is obvious, but scenarios where the right move isn’t clear. Present an alert that could be a false positive or an early-stage APT. See how fast people can:

  • Figure out who should decide
  • Articulate their reasoning
  • Take a first action

Every time you practice resolving ambiguity, you’re building muscle memory that pays off when a real incident lands at 3am.


What good incident commanders actually do

The Incident Commander role isn’t about making every decision yourself. It’s about keeping decisions flowing through the team so nobody gets stuck.

Effective commanders share a few habits:

  1. They announce decisions out loud. Even something basic like “We’re going with option A, revisit in 15 minutes” cuts through ambiguity.
  2. They recap regularly. Restating current status and next steps keeps everyone synchronized and reduces drift.
  3. They push decisions down. Domain experts should be empowered to act within their area without asking permission for everything.
  4. They close loops fast. When something’s decided and done, make sure everyone hears the outcome, good or bad.

Command isn’t about control. It’s about creating enough coherence that people can act confidently even when things are uncertain.


When the problem is cultural

Some organizations have deep habits around decision-making, especially in risk-averse industries like finance or healthcare. Those habits (multi-level approvals, mandatory sign-offs, consensus requirements) exist for good reasons during normal operations.

But during an incident, they can be poison.

Leadership needs to explicitly distinguish between “how we govern normally” and “how we govern during a crisis.” Crisis mode needs its own rules, optimized for speed and adaptability rather than procedure.

This isn’t about throwing accountability out the window. It’s about recognizing that acting quickly is itself a form of safety. Every minute you spend waiting for approval is a minute the attacker uses to dig deeper. The safest organizations are the ones where informed action beats bureaucratic caution.


Learning from the bottlenecks you hit

When the incident is over, the decision bottlenecks shouldn’t just disappear into a generic postmortem. They’re some of the most valuable data you have.

Every “we waited too long for X” is a signal. Every “nobody knew who should decide Y” is a gap you can close.

A good retrospective digs into:

  • Which decisions were unclear or delayed?
  • What information or authority was missing?
  • How could the same situation flow faster next time?

When you treat decisions as first-class artifacts in your postmortem process, you stop just fixing technical symptoms and start improving the human system underneath.
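
As a rough sketch of what "first-class artifact" could mean in practice, each decision can be logged with when it was raised, when it landed, and what it was blocked on (information, permission, or consensus, the three kinds of waiting described earlier). The record shape and the 15-minute threshold are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class DecisionRecord:
    question: str
    raised_at: datetime
    decided_at: datetime
    blocked_on: str  # "information", "permission", or "consensus"

    @property
    def latency(self) -> timedelta:
        """Time between the question being raised and the decision landing."""
        return self.decided_at - self.raised_at


def slowest_decisions(records, threshold=timedelta(minutes=15)):
    """Return decisions that took longer than the threshold, slowest first."""
    slow = [r for r in records if r.latency > threshold]
    return sorted(slow, key=lambda r: r.latency, reverse=True)
```

Feeding an incident’s decision log through slowest_decisions gives the retrospective a concrete starting list, and grouping the results by blocked_on shows whether the team mostly waited on data, on authority, or on agreement.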


Changing how you think about incidents

At its core, this is a mental model shift. Instead of framing incident response as fighting fires, think of it as optimizing flow. That reframe turns stress management into system design, and converts speed from something you improvise into something you architect.

Decision bottlenecks will never go away completely. Uncertainty is baked into complex systems. But with the right structure, clear authority, and a culture that values action over hesitation, you can shrink their impact dramatically.

Teams that get this right learn to move confidently even with partial visibility. They delegate without second-guessing. They trust the playbook until they have a good reason not to.

And when that happens, incidents stop feeling like battles. They start feeling like coordinated operations, intense but manageable.

Because in the middle of a live incident, the biggest slowdown is almost never technical. It’s the friction between people trying to figure out what to do.

And speed doesn’t come from working harder. It comes from getting unstuck.