Operational Resilience – Protecting Against Extreme and Adverse Events

When extreme events happen, it’s major news. And likely, we all find ourselves thinking, “Thank God it’s not us.” This time. We all remember the 2017 nightmare scenario shipping giant Maersk endured when it was infected with the NotPetya malware, which essentially brought the company’s infrastructure and technology to a complete halt. Forty thousand end-user devices, 4,500 servers and 2,500 applications were taken down.

While Maersk managed to retain 80 percent of its business throughout the event, the price tag to address the issue topped an estimated $300 million. When we think about operational resilience, this is a perfect example of the type of event that increasingly concerns us. Will we be relying on manual processes to survive, as Maersk did? Will we stake our futures on heroics? Or will we be prepared with a systematic response plan?

What’s New About Operational Resilience?

Operational risk management involves understanding and managing the organization’s risk appetite, that is, its tolerance for the variety of risks to which the business exposes it. Operational resilience is the ability of an organization to withstand adverse changes in its operating environment and continue the delivery of business services and economic functions.

The concern isn’t just “Am I, as a company, going to survive?” The concern is also “Will my external stakeholders, my customers, the broader economy, the critical infrastructure, the kind of services that are provided, be impacted by this event?”

Most firms have had disaster recovery and business continuity programs in place for a long time, but the operational resilience lens is a little different. It looks beyond tactical activities like keeping systems and servers up and running and focuses on the business processes that those systems and services support. The operational resilience lens measures success by whether business services keep running through any type of event, and in particular through a highly destructive, extreme but plausible event like the one at Maersk. At the highest level, operational resilience considers the services the business provides and the impact an extreme but plausible event might have on critical external stakeholders. Those are the types of big questions that regulators are asking, and that they are encouraging and requiring companies to address more formally going forward than they may have in the past.

The Protiviti Operational Resilience Framework

We use this Operational Resilience Framework to start the discussion with our clients. On top is the governance component, followed by business services (essentially, formalizing the process of defining criticality) and the impact tolerance component. We also review whether foundational elements are in place and what kind of testing takes place to anticipate extreme scenarios. Most companies have certain components of this framework in place today, specifically existing programs to manage foundational elements. What’s needed is that last 20 percent of resilience planning, which formalizes business services, adopts an impact tolerance view and determines the right measurements. Those are the types of strategies that many firms are not implementing today but will be in the future.

Defining Business Services, Determining Governance Lines of Defense

We often use the U.S. Department of Homeland Security’s National Critical Functions Set as a basis for our conversations with clients about business services. Criteria we have seen organizations use to define crucial business services include:

  • Volume
  • Value
  • Market share
  • Reputational impact
  • Systemic importance
  • Substitutability

The Bank of England (BOE) has also recently published a series of consultation papers (CPs) that clients find helpful, including one titled Operational Resilience: Impact Tolerances for Important Business Services. In a press release introducing this new paper, the BOE said:

“The policy proposals make it clear that firms and financial market infrastructures (FMIs) are expected to take ownership of their operational resilience and that they will need to prioritise plans and investment choices based on their impacts on the public interest. If disruption occurs, firms are expected to communicate clearly, for example providing customers with advice about alternative means of accessing the service. Under the proposals, firms and FMIs would be expected to:

  • identify their important business services that if disrupted could cause harm to consumers or market integrity, threaten the viability of firms or cause instability in the financial system;
  • set impact tolerances for each important business service, which quantify the maximum tolerable level of disruption they would tolerate;
  • identify and document the people, processes, technology, facilities and information that support their important business services; and
  • take actions to be able to remain within their impact tolerances through a range of severe but plausible disruption scenarios.”

It is important to note that while these requirements are issued to financial institutions, they are a good frame of reference for all organizations.
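
To make the impact tolerance idea concrete, here is a minimal sketch in Python of how a firm might record a tolerance for each important business service and compare it against the results of scenario testing. Everything in it is an assumption for illustration: the class and function names, the use of outage hours as the tolerance metric, and the sample services and figures are hypothetical and do not come from the BOE paper or from our framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImpactTolerance:
    """Maximum tolerable disruption for one important business service."""
    service: str
    max_outage_hours: float  # hypothetical metric: longest acceptable outage


@dataclass(frozen=True)
class ScenarioResult:
    """Estimated outcome of one severe but plausible scenario test."""
    scenario: str
    service: str
    estimated_outage_hours: float


def breaches(tolerances, results):
    """Return the scenario results whose estimated outage exceeds the tolerance.

    Assumes every result refers to a service that has a tolerance defined;
    an unknown service raises KeyError.
    """
    limits = {t.service: t.max_outage_hours for t in tolerances}
    return [r for r in results if r.estimated_outage_hours > limits[r.service]]


# Illustrative data only; these services and numbers are made up.
tolerances = [
    ImpactTolerance("customer payments", max_outage_hours=4),
    ImpactTolerance("trade settlement", max_outage_hours=24),
]
results = [
    ScenarioResult("destructive malware", "customer payments", 72),
    ScenarioResult("data center loss", "customer payments", 2),
    ScenarioResult("destructive malware", "trade settlement", 12),
]

for r in breaches(tolerances, results):
    print(f"{r.service}: '{r.scenario}' exceeds tolerance "
          f"({r.estimated_outage_hours}h)")
```

In practice a tolerance would rarely be a single number, and firms may express it in terms of customers affected, transaction value or data integrity as well as time. The point of the sketch is only that tolerances are set per business service and then tested against scenarios, rather than per system.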

An integral component of demonstrating resilience is governance. Many financial services institutions are large, if not global in nature, which means there are a lot of regulators to be satisfied. This makes it critical for the organization to engage the board of directors in understanding and agreeing to operational resilience plans. Senior leadership also needs to establish a tone from the top, providing the appropriate vision, direction and resources to implement a proper operational resilience program.

The graphic above illustrates our point. We believe it is critically important to have these lines of defense. The first line is composed of business units, and it is important that organizations put their resilience office and accountability for operational resilience management into this first line, because the business units know the business, operations and systems best. The second line is responsible for challenging the first line, in the classic sense of a first-/second-line challenge. This challenge is important because the organization’s key risk indicators (KRIs) and key performance indicators (KPIs) have to be effective: they have to give a view of what resilience is in the organization, how it is enhanced and, most important, where the recovery challenges lie. The third line, the classic audit function, is also important, coming in to make sure everyone adheres to expected policies and procedures.

Operational Resilience = New Thinking

Operational resilience certainly brings new challenges, but the evolution of risk means organizations need to stay as far ahead of the next big threat as possible. We’ve taken the NIST Cybersecurity Framework and laid out both traditional controls and programs that help build and maintain resilience (“Old”) and new and evolving techniques that firms should consider in order to continue maturing in the face of the extreme threats posed to them:

This blog entry only touches on what operational resilience involves. To learn more, listen to a recent webinar on the topic, or contact us.

Andrew Retrum

Managing Director
Technology Consulting – Security and Privacy

Douglas Wilbert

Managing Director
Risk and Compliance