Businesses Scramble for Backup After CrowdStrike Update Hobbles IT Networks

If there ever was a day that needed a Plan B, it was Friday, July 19. Banks, airlines, hospitals, fast food chains, retailers, even the Paris Olympics, and nearly every business relying on a Microsoft Windows computer system found themselves grappling with a massive disruption that brought critical services to a standstill.

As of late afternoon Friday, what was being called “the worst IT outage in history” was still rippling through various systems. It was all caused by a seemingly innocuous single software update issued by security firm CrowdStrike that inadvertently took down Microsoft’s systems. CrowdStrike software is used by over half of Fortune 500 companies.

“CrowdStrike is actively working with customers impacted by a defect found in a single content update for Windows hosts. Mac and Linux hosts are not impacted. This was not a cyberattack,” read a 1:25PM update on the CrowdStrike site Friday. “The issue has been identified, isolated and a fix has been deployed. We are referring customers to the support portal for the latest updates and will continue to provide complete and continuous public updates on our blog. We understand the gravity of the situation and are deeply sorry for the inconvenience and disruption. We are working with all impacted customers to ensure that systems are back up and they can deliver the services their customers are counting on.”

The incident underscores the pervasive reliance on IT infrastructure in today’s digital economy and drives home the point that, as in the digital payments and commerce landscape, organizations need an analog backup in place for when systems go down.

“It’s really a core lesson in the ability to not have a single point of failure,” said CompoSecure/Arculus Chief Product and Innovation Officer Adam Lowe, PhD. “If you look at the systems and look at the way the tool affected the problem it did not affect Linux servers. It did not affect Mac systems. It only affected Windows. That’s a challenge and it’s a lesson in picking alternatives to critical systems.”

Lowe, whose career in software research and development spans more than 10 years, has had his share of software updates and knows how they can go astray without the right operational resilience management. According to Lowe, when a software update goes wrong, companies usually have a backup plan to undo the changes. However, with essential security software like CrowdStrike, the problem can become more severe. He explained that if the update affects the system’s core functions, particularly at the startup level in Windows, fixing it can be very difficult. Essentially, it might require completely reinstalling the system from a previous backup, similar to wiping a hard drive and starting over. This process is complicated and time-consuming, especially for systems that are locked out at startup, leaving very few options for a quick fix.


Managing the Chaos of an Unprecedented Global IT Meltdown

In this case, the issue is believed to stem from an update to CrowdStrike’s Falcon Sensor software.

“Falcon is what is known as an Endpoint Detection and Response platform, which monitors the computers that it is installed on to detect intrusions (i.e., hacks) and respond to them,” said Toby Murray, an associate professor at the University of Melbourne’s School of Computing and Information Systems, in a statement distributed by the Australian Science Media Centre. “That means that Falcon is a pretty privileged piece of software in that it is able to influence how the computers it is installed on behave.”

The problem that will continue to impact businesses around the world as they try to come back online is that the solution appears to be entirely manual. IT teams will need to reboot each affected computer, delete a specific file, and then restart the computer. These are all manual, hands-on tasks that, while simple in nature, are challenging to automate at scale.

The economic fallout from the outage is still being calculated, but the incident has highlighted the fragility of the global IT infrastructure and the domino effect that can ensue from the failure of a single major component. After all, when it works well, business software fades into the background. But when a disruption happens, entire sectors can grind to a halt.

“Incidents of this nature do occur in a connected world that is reliant on technology. Disruption at one stage in a digital supply chain can have a ripple effect all the way throughout it … We’ve seen this today, with the incident having a tangible impact across the globe, from aviation to banking and healthcare. It highlights the critical need for organizations to take cyber resilience seriously, and ensure they have an incident management plan in place should situations like this occur,” Mike Maddison, CEO of global cyber security organization NCC Group, told PYMNTS in a statement.

Within today’s business landscape, where partnerships are helping companies stand up modern infrastructure capabilities by streamlining the technical and engineering lift, it is especially important to stay on top of and secure each link in the vendor supply chain.


The Importance of Having a Backup Plan

It is crucial for 21st-century businesses to ensure their technology systems function reliably and to a high standard. Increasingly, that means having a backup plan — or several — ready in case things go wrong.

After all, it was just this Thursday (July 18) that another hours-long outage at Swift affected the Bank of England and the European Central Bank. The outage disrupted high-value transactions across Europe, and the European Central Bank said its settlements system was affected.

As the dust settles, the focus worldwide is shifting towards learning from these incidents and strengthening the resilience of global IT infrastructure to withstand future challenges.

At the same time, businesses and organizations are reassessing their reliance on centralized cloud services and considering diversifying their IT infrastructure to mitigate the risk of similar disruptions.

The situation is especially crucial within payments as businesses increasingly make the transition from legacy systems to electronic ones. The PYMNTS Intelligence report “Getting Paid: Digital Payments for Improving Cash Flow and Customer Experience” found that 79% of B2B suppliers want to receive digital payments, including wire, ACH and virtual cards. Faster payment processing is not the only impetus, as 76% of firms believe that buyers are likelier to pay on time when they pay electronically.

