Following the “worst IT outage in history” caused by a faulty CrowdStrike update that affected 8.5 million PCs, Microsoft is advocating for changes to enhance Windows’ resilience and is considering restricting security vendors’ access to the Windows kernel.
The Redmond tech giant says in a new incident response post that similar outages can be prevented if vendors minimize their use of kernel mode and customers make full use of the security features built into Windows.
The outage was triggered by a faulty update to CrowdStrike’s CSagent.sys driver, which led to memory access violations and system boot loops. Microsoft’s analysis confirms CrowdStrike’s findings and notes that kernel-mode drivers, while providing crucial system visibility and tamper resistance, can cause significant issues if errors occur.
The company is also considering restricting third-party access to the Windows kernel, the core of the operating system, to prevent similar issues in the future. It made a similar attempt in the Windows Vista days back in 2006, but the plan fell through after criticism from cybersecurity vendors and EU regulators.
In another blog post, Microsoft also urges resiliency in the Windows ecosystem.
Microsoft has mobilized over 5,000 support engineers and is sharing updates on its Windows release health dashboard. It advises businesses to have solid plans for continuity and incident response, back up data regularly, be ready to restore devices quickly, use safe update practices, and consider cloud management solutions.
The company also says that it plans to implement advanced security measures like Virtualization-Based Security (VBS) and zero-trust approaches. Most affected PCs are now operational, and Microsoft aims to improve system resilience going forward.
On the morning of July 19th, 2024, the tech world woke up to chaos. What started as a routine update from CrowdStrike, one of the biggest names in cybersecurity, quickly spiraled into a significant Windows outage. The ripple effects of this incident were felt across the globe, affecting millions of users and businesses. CrowdStrike, renowned for its cutting-edge endpoint detection and response (EDR) and extended detection and response (XDR) solutions, had made a grave error. It sent out an update without proper patch testing, a technical blunder that exposed the weaknesses in its processes. This untested update made its way into production, and the results were catastrophic. Immediately, systems began to fail. From IT giants in Silicon Valley to commercial airlines in Europe and banks in Asia, the impact was widespread. The outage highlighted the fragility of our interconnected digital infrastructure and the critical importance of rigorous testing protocols. The world was left grappling with the fallout, and the question on everyone’s mind was: How could this happen?
The answer lay in a series of oversights. Despite CrowdStrike’s strong market position and reputation, there were glaring gaps in its governance and risk analysis practices. The market reacted quickly: according to Nasdaq, shares of the Nasdaq-listed company fell by 15% in the immediate aftermath, reflecting the disruption caused by the faulty update.
Microsoft mulls restricting third-party access to Windows kernel after CrowdStrike outage
on 06/08/2024