A technical breakdown of the root cause of the world’s biggest IT fuck-up is out. If it’s correct, we believe no one should consider this an accident.
Here’s a short, non-technical explanation.
Windows contains fundamental programs called drivers that the operating system needs in order to work. They are pretty low-level software that loads during the boot sequence and whenever needed. Users don’t directly interact with them and they don’t appear in Task Manager’s easy view. The user is kept away from them. Drivers have powerful system privileges and access. If an essential driver fails to load, fails to work, or is corrupt, Windows can completely crash. That can look like the blue screen of death (BSOD).
CrowdStrike make a security software product, Falcon, that includes a Windows driver that loads during boot up. Falcon updates are pushed to machines automatically. Users don’t get a direct say. CrowdStrike released an update worldwide that contained a direct, guaranteed fatal coding error.
The coding error, in the C++ language, is fatal because it makes the program try to access a nonexistent part of the machine’s memory. That memory address is not valid on any machine, anywhere. When the access attempt happens, a fatal error results that causes the program to crash. Because that program is a driver running with system privileges, when it crashes, Windows crashes.
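For anyone who does want to see it, here is a minimal C++ sketch of that kind of mistake. To be clear, this is not CrowdStrike's code. The names Entry, read_unchecked and read_checked are made up for illustration, and it assumes the breakdown is right that the bug was a read through a bad address.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical record a program might read from an update file.
struct Entry {
    std::uint32_t value;
};

// UNSAFE: trusts the pointer blindly. If e is null or points at memory
// that does not exist, this read is undefined behaviour and in practice
// crashes the program. Inside a driver, that crash takes Windows with it.
std::uint32_t read_unchecked(const Entry* e) {
    return e->value;
}

// Safer: refuses to touch a null pointer and reports "nothing" instead.
// This only catches null, not every possible bad address.
std::optional<std::uint32_t> read_checked(const Entry* e) {
    if (e == nullptr) {
        return std::nullopt;
    }
    return e->value;
}

// Small demonstration that only exercises the safe paths.
void demo() {
    Entry good{42};
    assert(read_checked(&good).value() == 42);      // valid pointer: value comes back
    assert(read_checked(nullptr) == std::nullopt);  // null pointer: no crash, no value
    // read_unchecked(nullptr) would crash here, which is the whole point.
}
```

One missing check like the one in read_checked is the difference between a handled error and a dead machine.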
CrowdStrike’s Falcon was intrinsic to Windows boot up, so once a crash happened the machine could never be booted up again until Falcon was literally deleted from the machine’s boot sequence. In many, many cases this requires manual access to the machine. Remote fixing isn’t possible there, so “the fix” demands a huge amount of labour and physical access. That’s an insanely expensive fix. Literally dudes going to each machine and manually doing the fix over and over.
The above is absolutely fucking insane.