CrowdStrike, a US-based cybersecurity vendor that provides endpoint detection and response (EDR) services, experienced a significant technical issue during the early hours of July 19, 2024. The incident, reportedly caused by an update to CrowdStrike’s Falcon antivirus software, took Windows PCs offline on a wide scale, while Mac and Linux hosts remained unaffected. The outage has affected organizations around the globe, from hospitals to commercial airlines and even emergency call centers.
CrowdStrike is one of the most popular EDR solutions, currently installed at over 29,000 organizations. EDR is a popular category of cybersecurity solutions because it operates directly on laptops, desktops and servers to identify and block common cyber-attacks, such as malware and ransomware.
As organizations look to respond to this event, we recommend that they consider the following areas:
1. Start with securely restoring your operations.
2. Understand your organization’s exposure to similar threats and take steps to reduce them.
3. Build resiliency to account for and endure future cybersecurity/IT outages.
Securely restoring operations
We recognize there are significant pressures to bring back ‘business as usual’ as quickly as possible. However, it is important that organizations incorporate the following to avoid generating new operational impacts or opening the door to future cyber-attacks.
- Go to the source of truth for fixes. The ‘latest’ news from social media is often untested or specific to individual environments. In addition, cyber threat actors will set up malicious websites that advertise quick fixes but deploy malware instead. You should always go to a vendor site or other reputable source, such as your managed security provider, for fix information.
- Reinforce your data protections following the fix. Procedures to restore BitLocker-encrypted Windows devices require distributing BitLocker recovery keys, which unlock the encrypted disk storage. These keys should be centrally managed and securely stored. Distributing keys to users weakens this control, so organizations should plan to resecure these keys following system restoration.
- Monitor for phishing emails. As with any major issue, bad actors are looking to capitalize and have begun reaching out to organizations while masquerading as CrowdStrike support. If you are a CrowdStrike customer, contact their support directly for assistance. Cybersecurity vendors rarely reach out proactively; they almost always respond only to support tickets.
Understanding and reducing your exposure
This issue highlights how digital systems have evolved into the central nervous system of business operations and now require careful governance, like any other enterprise risk. During your post-incident debrief, we recommend evaluating the following areas to understand your exposure and to guide further research and investment that reduce your risk from future events:
- Inventory the vendors that receive ‘implicit’ trust in your environment and evaluate the associated risks. The impact of the CrowdStrike outage is far-reaching because of the level of privileged access the software had, not just to its own components, but to the underlying Windows environment. This operating model is increasingly common in today’s ‘as-a-service’ world. Identifying which vendors receive this access, the level of access, and its purpose is critical to understanding your exposure to similar events and to accounting for potential business impacts.
- Heavily regulated industries such as financial services should anticipate specific questions regarding these vendors as well as third-party risk management practices (see below) in the upcoming assessment cycles.
- Assess your technology stack diversification. Review how dependent you are on any single provider and how your operations would be affected if that provider went down (think vendor lock-in and single point of failure). Alternatives often exist in the marketplace that could meet your business objectives and aid in effective risk management and contingency planning. For example, consider the recent impact of the CDK Global outage, which affected nearly 30,000 car dealerships, or the Change Healthcare event that impeded revenue cycle processes across the healthcare industry.
- Review your third-party risk management practices. As IT becomes more specialized and more critical, businesses are increasingly turning to third parties for support. Organizations should consistently evaluate their third-party providers, and even those providers’ own vendors (fourth and fifth parties), regardless of their market share. Ensuring that these providers consistently align with your internal and external regulatory expectations is fundamental. "Trust, but verify" is table stakes.
- Build your understanding of system identities. When we think of system access, our first thought goes to people. However, system access is often also granted to other IT elements so that programs can interact with one another (non-human identities). Similar to our review and cataloging of human users, organizations should work to inventory and understand non-human identities and the roles they play in software updates.
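As one way to start the inventory described in the list above, a short script can flag which vendors combine privileged access with the ability to push updates on their own. The following is a minimal sketch only: the access tiers, field names, vendors and service account names are all hypothetical placeholders and would need to be replaced with data from your own asset and identity records.

```python
from dataclasses import dataclass

# Hypothetical access tiers, ordered from least to most privileged.
ACCESS_RANK = {"user": 1, "admin": 2, "kernel": 3}


@dataclass
class VendorAccess:
    vendor: str
    identity: str       # non-human identity (agent or service account) the vendor uses
    access_level: str   # one of the keys in ACCESS_RANK
    auto_updates: bool  # True if the vendor can push updates without your approval
    purpose: str


def implicit_trust_review(inventory, min_level="admin"):
    """Return entries at or above min_level that also auto-update, most privileged first."""
    threshold = ACCESS_RANK[min_level]
    flagged = [
        entry for entry in inventory
        if ACCESS_RANK[entry.access_level] >= threshold and entry.auto_updates
    ]
    return sorted(flagged, key=lambda e: ACCESS_RANK[e.access_level], reverse=True)


# Illustrative entries only; none of these describe a real environment.
inventory = [
    VendorAccess("EDR vendor", "endpoint-sensor", "kernel", True, "threat detection"),
    VendorAccess("Payroll SaaS", "svc-payroll-sync", "user", True, "HR data sync"),
    VendorAccess("Backup vendor", "svc-backup-agent", "admin", True, "nightly backups"),
    VendorAccess("Monitoring vendor", "svc-metrics", "admin", False, "performance metrics"),
]

for entry in implicit_trust_review(inventory):
    print(f"{entry.vendor}: {entry.access_level} access via {entry.identity} ({entry.purpose})")
```

The output is a short list to take into the post-incident debrief; the threshold (`min_level`) is a judgment call and can be lowered to widen the review.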
Maturing your resiliency
In today’s increasingly interconnected world, organizations can work to address risks from these types of events, but those risks will never be fully removed from our digital society. Organizations can increase their operational resilience to these events by developing or maturing a business continuity program. We recommend that organizations consider the following areas to build and enhance their operational resilience.
- Develop and test your business continuity program. Develop and regularly update a business continuity plan that documents critical business functions and identifies the downtime and manual procedures needed to sustain critical business operations during these events, even in a limited capacity. Maintaining these critical business functions requires continuity strategies and activities for alternate staffing and vendor redundancies.
- Consider ‘enterprise-as-a-system’ thinking. This approach links business functions to underlying IT to add operational context to IT risks. It focuses on using risk management principles to build an in-depth understanding of complex interconnections between systems and how each influences enterprise risk.
- Mature configuration management and vulnerability program execution. These common cybersecurity approaches require significant operational discipline to deploy and to sustain your cyber and IT hygiene. Responding quickly to risks while avoiding inadvertently exposing the organization to new ones is a core trait of an effective cyber program.
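To illustrate the ‘enterprise-as-a-system’ idea from the list above, the link between business functions and underlying IT can be written down as a simple dependency map and queried for vendors that every function relies on. This is a minimal sketch under stated assumptions: the business functions, systems and vendor names below are hypothetical, and a real mapping would come from your business impact analysis and asset inventory.

```python
from collections import defaultdict

# Hypothetical mapping: business function -> IT systems it depends on.
FUNCTION_SYSTEMS = {
    "patient intake": ["scheduling app", "front-desk PCs"],
    "billing": ["billing platform", "front-desk PCs"],
    "emergency dispatch": ["dispatch console"],
}

# Hypothetical mapping: IT system -> vendors that system relies on.
SYSTEM_VENDORS = {
    "scheduling app": ["Cloud host A"],
    "front-desk PCs": ["EDR vendor", "OS vendor"],
    "billing platform": ["Clearinghouse B"],
    "dispatch console": ["EDR vendor", "OS vendor"],
}


def vendor_blast_radius(function_systems, system_vendors):
    """Map each vendor to the set of business functions that depend on it."""
    impact = defaultdict(set)
    for function, systems in function_systems.items():
        for system in systems:
            for vendor in system_vendors.get(system, []):
                impact[vendor].add(function)
    return dict(impact)


def single_points_of_failure(impact, total_functions):
    """Return vendors whose outage would touch every mapped business function."""
    return sorted(v for v, funcs in impact.items() if len(funcs) == total_functions)


impact = vendor_blast_radius(FUNCTION_SYSTEMS, SYSTEM_VENDORS)
for vendor in single_points_of_failure(impact, len(FUNCTION_SYSTEMS)):
    print(f"{vendor} underpins all {len(FUNCTION_SYSTEMS)} mapped business functions")
```

Even a small map like this adds operational context to an IT risk: it shows which vendor relationships deserve continuity planning first because their failure reaches across the whole business.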
What to expect in the future?
While this situation and its impacts are still unfolding, it raises important questions and considerations for organizations to monitor:
- Greater focus on how much trust we place in our technology vendors.
- Reviews of, and tighter controls around, how deeply we let vendors into our environment.
- Potential legal ramifications when a trusted provider causes an outage, including implications for parties who recommend specific technology products.
How can RSM help?
We are operating a command center to respond rapidly to the changing landscape of this outage. Reach out to us at recovery@rsmus.com for support. In addition, our team members across the U.S. and Canada can be hands-on in helping you respond to and recover from this incident, as well as strategize for the future, in the following ways: