What caused the CrowdStrike outage — and what happens next

Friday morning, chaos hit. Sky News couldn’t broadcast its morning news. Airlines halted flights as airports were overwhelmed. Emergency call centres, hospitals and banks were all affected.

Photos popped up on social channels of the dreaded Blue Screen of Death, with computers unable to load Windows beyond a recovery screen. Rebooting solved nothing. Was it a Microsoft fault? Was it an epic cyberattack? And then, clarity: a security company called CrowdStrike had shipped a buggy update. Whoops.

CrowdStrike’s CEO George Kurtz posted on X.com: “CrowdStrike is actively working with customers impacted by a defect found in a single content update for Windows hosts. Mac and Linux hosts are not impacted. This is not a security incident or cyberattack.”

And this is how CrowdStrike — and its customers, and their customers — learned the hard way the ancient IT proverb: “Never push updates on a Friday.” And it’s how we all learned that our systems are deeply not resilient.

Why did the CrowdStrike outage happen?

Let’s just start by saying this: contrary to much of the mainstream reporting, the outage had nothing to do with Microsoft and was not the company’s fault, although its Azure cloud computing service was, annoyingly, hit by an unrelated widespread outage at around the same time.

Instead, the blame for this outage rests firmly on the shoulders of American firm CrowdStrike, which offers cloud-based security. CrowdStrike’s Falcon is a specific product for endpoint detection and response, which means it looks for dodgy activity proactively — and that requires privileged access, meaning any faults can hit particularly hard.

On Friday, CrowdStrike pushed out a sensor configuration update to its Windows user base. “Sensor configuration updates are an ongoing part of the protection mechanisms of the Falcon platform,” the company said in a blog post. “This configuration update triggered a logic error resulting in a system crash and blue screen (BSOD) on impacted systems.”

The update went out at 4:09 UTC on July 19 and was halted by 5:27 UTC. That means any customers whose systems were online and able to receive the update between those times were potentially impacted. They’re probably already aware of that, as their systems will have crashed.
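To make that window concrete, here is a minimal Python sketch that checks whether a given timestamp falls inside it. The two times come from the paragraph above; the year (2024), the function name `in_exposure_window` and the idea of checking a machine's last check-in time are assumptions added for illustration, not part of CrowdStrike's guidance.

```python
from datetime import datetime, timezone

# Window as reported above: released 04:09 UTC, halted 05:27 UTC on 19 July.
# The year 2024 is filled in here; the text itself gives only the day and month.
WINDOW_START = datetime(2024, 7, 19, 4, 9, tzinfo=timezone.utc)
WINDOW_END = datetime(2024, 7, 19, 5, 27, tzinfo=timezone.utc)


def in_exposure_window(seen_online: datetime) -> bool:
    """Return True if a timezone-aware timestamp falls inside the window.

    Hypothetical helper: `seen_online` stands for any moment a machine is
    known to have been online and able to download updates.
    """
    if seen_online.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    return WINDOW_START <= seen_online.astimezone(timezone.utc) <= WINDOW_END


# Length of the window in minutes.
window_minutes = (WINDOW_END - WINDOW_START).total_seconds() / 60
```

The window works out to 78 minutes. A machine that was switched off or offline for that whole period would not have received the faulty update.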

How many Windows systems were hit by the CrowdStrike outage?

Microsoft has said 8.5 million Windows devices were hit by the outage, less than 1% of the entire Windows install base, according to a blog post by David Weston, Vice President of Enterprise and OS Security at Microsoft.

Still, it was enough to wreak havoc. Or as Weston put it: “While the percentage was small, the broad economic and societal impacts reflect the use of CrowdStrike by enterprises that run many critical services.”
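As a quick sanity check on those figures, the short Python sketch below works out what "less than 1%" implies about the size of the Windows install base. Only the 8.5 million figure and the 1% bound come from Microsoft's statement; the function name `share_percent` and the one-billion figure used in the test case are hypothetical values chosen for illustration.

```python
# Figures quoted from Microsoft's blog post.
AFFECTED_DEVICES = 8_500_000
MAX_SHARE_PERCENT = 1  # "less than 1%" of all Windows devices

# If 8.5 million is under 1% of the total, the total must exceed this number.
implied_min_install_base = AFFECTED_DEVICES * 100 // MAX_SHARE_PERCENT


def share_percent(install_base: int) -> float:
    """Percentage of a given install base that 8.5 million devices represents."""
    return 100 * AFFECTED_DEVICES / install_base
```

In other words, Microsoft's statement implies more than 850 million Windows devices in use. Under a hypothetical install base of one billion, the affected share would be 0.85%.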

What next?

While we await the inside report into how this was allowed to happen — and which intern is going to get fired over the incident — there’s plenty of work to be done to clean up now.

Both CrowdStrike and Microsoft have issued workarounds and are working together on an automated solution. Until that arrives, IT admins at impacted companies have had a busy weekend figuring out their own mitigations to get companies back online.

Anecdotally, we’ve heard of technicians in large enterprises visiting tens of thousands of individual PCs to fix the errors.

CrowdStrike said Sunday night that a “significant number” of the impacted 8.5 million devices were already back working.

“Together with customers, we tested a new technique to accelerate impacted system remediation,” the company said in a social media post. “We’re in the process of operationalising an opt-in to this technique. We’re making progress by the minute.”

Not all installations will be automatically recoverable, however, so some manual work will be required.

Fixes from Microsoft and CrowdStrike

In the meantime, IT teams are working to manually fix the problem, be it by rolling back systems to their state before the update or wiping and restoring from backups, notes Toby Murray, associate professor of cybersecurity at the University of Melbourne. Some reports suggest IT admins have had to go in and delete the offending file manually, machine by machine. Microsoft has also released its own recovery tool that works via a bootable USB.
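For a sense of what "deleting the offending file" involves, here is a minimal Python sketch of that step. The directory and file pattern are assumptions taken from CrowdStrike's publicly issued workaround rather than from this article, and the function names are hypothetical. It defaults to a dry run that only lists matches, and it is an illustration, not a supported recovery tool.

```python
from pathlib import Path

# Assumptions based on CrowdStrike's public workaround guidance, not this article:
# the faulty channel file lives in this folder and matches this name pattern.
DEFAULT_DIR = Path(r"C:\Windows\System32\drivers\CrowdStrike")
PATTERN = "C-00000291*.sys"


def find_channel_files(directory: Path = DEFAULT_DIR) -> list[Path]:
    """Return files matching the pattern, or an empty list if the folder is missing."""
    if not directory.is_dir():
        return []
    return sorted(directory.glob(PATTERN))


def remove_channel_files(directory: Path = DEFAULT_DIR, dry_run: bool = True) -> list[Path]:
    """List matching files and delete them only when dry_run is False."""
    matches = find_channel_files(directory)
    if not dry_run:
        for path in matches:
            path.unlink()
    return matches
```

In practice a crashed machine cannot run a script like this until it has been booted into Safe Mode or a recovery environment, which is a large part of why the fix has to be applied machine by machine.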

Raise a glass for those tireless IT workers, but also everyone else mopping up the mess. While plenty of people got the day off with an “outage Friday”, many across IT, healthcare, aviation and more had a sudden rise in paperwork and other tasks to complete just to keep things ticking along.

This problem only hit companies and organisations using CrowdStrike Falcon, not home PCs. So if your friends and family are worried about booting up their laptops, tell them they need not worry. The concern is instead the sudden failure of societal infrastructure.

Lessons to learn from the CrowdStrike outage

As has been pointed out countless times over the past few days, the outage highlights the delicacy of digital networks and our overreliance on them. Hopefully it will prove to be a relatively benign wakeup call that we need better backup systems for if and when technology fails us. We can just about get by without such systems, if only for a short while — some airlines even began handwriting boarding passes.

If your company uses CrowdStrike, there will likely be discussion around whether that should continue. CEO Kurtz would like you to know the company plans to never do this again.

“I want to sincerely apologise directly to all of you for the outage,” he wrote in a blog post. “All of CrowdStrike understands the gravity and impact of the situation… As we resolve this incident, you have my commitment to provide full transparency on how this occurred and steps we’re taking to prevent anything like this from happening again.”

Nicole Kobie

Nicole is a journalist and author who specialises in the future of technology and transport. Her first book is called Green Energy, and she's working on her second, a history of technology. At TechFinitive she frequently writes about innovation and how technology can foster better collaboration.
