Crisis as Catalyst: What AT&T, CrowdStrike Incidents Say About Recovery Best Practices

Never let a crisis go to waste, as the adage goes.

And when it comes to cybersecurity best practices and procedures, recent digital disruptions, from the faulty CrowdStrike update that crashed Windows machines to AT&T’s network outage and beyond, can teach enterprises a lot about fortifying their defenses and recovery plans.

After all, it was just Monday (July 22) that the Federal Communications Commission (FCC) released its report criticizing AT&T’s role and response in a February outage that blocked 92 million calls, including over 25,000 attempts to reach 911.

It ultimately took over 12 hours for AT&T to fully restore service. The outage affected all 50 states as well as Washington, D.C., and other domestic territories like Puerto Rico and the U.S. Virgin Islands. It was caused by a botched update related to a network expansion.

Sound familiar? A software update was also what led to the CrowdStrike crash last Friday (July 19), which affected 8.5 million Windows machines around the world and caused chaos at banks, airports and hospitals.

That’s why, in an interconnected world where digital disruptions are increasingly common, the ability to learn from and adapt to these challenges is crucial for long-term business success.

Read more: CrowdStrike Outage Rolls On; Attention Turns to Software Update Quality Control

Embracing Best Practices Derived From Recent Incidents

Per the FCC’s report on AT&T’s February outage — and not its July cyberattack — factors behind the “extensive scope and duration” of the outage included “a configuration error, a lack of adherence to AT&T Mobility’s internal procedures, a lack of peer review, a failure to adequately test after installation, inadequate laboratory testing, insufficient safeguards and controls to ensure approval of changes affecting the core network, a lack of controls to mitigate the effects of the outage once it began, and a variety of system issues that prolonged the outage once the configuration error had been remedied.”

In other words, a cascading failure to follow proper procedures across key workflows.

While the immediate cause of the outage was an employee who misconfigured a lone network element, adequate peer review should have prevented the change from being approved, the FCC emphasized.

In an interview with PYMNTS Friday, CompoSecure/Arculus Chief Product and Innovation Officer Adam Lowe noted that when a software update fails, companies usually have contingency plans. But issues with essential security software like CrowdStrike can quickly escalate, and disruptions to core functions, especially at the Windows startup level, can be difficult to correct.

Post-incident reviews and analyses help refine incident response plans, making future responses more efficient and effective. This continuous improvement process is vital for maintaining a strong security posture.

See also: Businesses Scramble for Backup After CrowdStrike Update Hobbles IT Networks

Crises can also catalyze a shift in organizational culture, heightening awareness of cybersecurity issues and encouraging proactive behaviors among employees.

And proactive, hyper-aware behavior is crucial in today’s operating landscape, where threat actors can move in real time to exploit new vulnerabilities and manipulate unsuspecting end users.

As just one example, cybercriminals have already jumped on the CrowdStrike outage by developing fake, malware-infected recovery manuals.

According to a Monday blog post by CrowdStrike, “CrowdStrike Intelligence identified a Word document containing macros that download an unidentified stealer now tracked as Daolpu. The document impersonates a Microsoft recovery manual. Initial analysis suggests the activity is likely criminal.”

CrowdStrike Intelligence has also tracked other malicious activity using the event as a lure. The company said it had received reports of threat actors sending phishing emails posing as CrowdStrike support to customers; impersonating CrowdStrike staff in phone calls; posing as independent researchers who claim to have evidence the technical issue is linked to a cyberattack and offer remediation insights; and selling scripts purporting to automate recovery from the content update issue.

At the same time, there has been a surge in “typosquatting domains” registered to exploit the CrowdStrike outage. Typosquatting is when bad actors register domain names that closely resemble a legitimate one, often differing only by a small typo or an added word, in order to lure genuine users to sites that deliver malware.

For example, some domains already flagged as malicious include crowdstrikefix[.]com; crowdstrike-helpdesk[.]com; and crowdstrikebsod[.]com.
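
For security teams, a rough sense of how such lookalike domains might be flagged can be sketched in a few lines of Python. This is a minimal illustration, not CrowdStrike’s detection method: the function names, the allowlist, the 0.8 similarity threshold and the idea of combining a substring check with a fuzzy match are all assumptions made for this example. Only the three defanged domains come from the list above.

```python
from difflib import SequenceMatcher

BRAND = "crowdstrike"                    # brand name being impersonated
OFFICIAL_DOMAINS = {"crowdstrike.com"}   # assumed allowlist for this sketch


def refang(domain: str) -> str:
    """Turn a defanged indicator such as 'example[.]com' into 'example.com'."""
    return domain.strip().lower().replace("[.]", ".")


def is_lookalike(domain: str, threshold: float = 0.8) -> bool:
    """Return True if the domain borrows or nearly misspells the brand name.

    The threshold is an illustrative value, not a tuned one.
    """
    name = refang(domain)

    # The official domain and its subdomains are not lookalikes.
    if name in OFFICIAL_DOMAINS or any(
        name.endswith("." + official) for official in OFFICIAL_DOMAINS
    ):
        return False

    labels = name.split(".")[:-1]  # every label except the top-level domain
    for label in labels:
        # Case 1: brand name embedded next to extra words ("crowdstrikefix").
        if BRAND in label:
            return True
        # Case 2: a near-miss spelling of the brand in any hyphenated part.
        for part in label.split("-"):
            if SequenceMatcher(None, part, BRAND).ratio() >= threshold:
                return True
    return False


if __name__ == "__main__":
    flagged = [
        "crowdstrikefix[.]com",
        "crowdstrike-helpdesk[.]com",
        "crowdstrikebsod[.]com",
    ]
    for domain in flagged:
        print(domain, is_lookalike(domain))
```

A check like this only narrows the list for human review. A name that merely contains the brand is not proof of malice, and a real deployment would likely rely on threat-intelligence feeds rather than string matching alone.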

That’s why employee training should cover best practices for password management, recognizing phishing attempts and reporting suspicious activities.
