How to Build a Trusted Digital Ecosystem through Business Continuity in 5 steps

A “trusted digital ecosystem” and “business continuity” are seldom mentioned in one sentence, and quite wrongly so.

Let’s start with the definition of the digital ecosystem. Hearing this term, one might think of IT systems, networks, and maybe some peripheral devices like scanners or printers in the familiar business environment. But the world keeps changing, and we need to keep up and take a broader perspective. Today’s digital ecosystems include hybrid, industry-specific solutions and configurations, interlaced with various external services, often cloud-based. They extend to wireless technologies, industrial control systems (ICS) and robots, the internet of things (IoT), bring-your-own-device (BYOD) and corporate-owned, personally enabled (COPE) devices, and so on. Unfortunately, it is not easy to get your head around them and secure them appropriately.

The first obstacle is understanding how to prioritize components of our digital environment. Based on what criteria should that be done? In search of an answer, the business and ICT departments turn to each other, often in vain, or the requirements they receive are too broad, unnecessarily restrictive, or inefficient.

The truth is that setting these priorities right is a significant endeavor. But what could “right priority” mean in this case? By “right priority,” I mean the priority which follows the organization’s risk appetite.

Even in the most prominent organizations, there is not enough information on the current risk appetite. Everything concerning risk management, especially operational risk management, is often muddy and vague. In business continuity management (BCM), by contrast, we identify the scope of risk management and the acceptable risk level precisely. Since the BCM System (BCMS) is all about preparing an appropriate reaction to risk, it can also serve as a reference point for managing operational risk in other areas. Here is a proposal for defining credible risk acceptance criteria and building a trustworthy ICT environment.

Set the scope

Organizations, commercial or non-profit, exist to provide products or services to the external world. The mission-critical activity is usually governed by one or more formalized management standards. Many of these standards (e.g., ISO/IEC 27001 and ISO 22301) require correct identification of the context of an organization. In BCM, the context of an organization identifies all legal and contractual obligations (and even informal promises) made towards external parties, based on which they expect our availability at a certain level. This exercise can and should also be used to identify information security requirements: confidentiality, integrity, and, if you operate under the NIS Directive in the EU, authenticity. Business continuity requirements, of course, refer to availability.

Once we have identified all the BCM requirements, it’s time to map out the internal processes that fulfill those expectations. These are our key processes. As the next step, we may use the organization’s context to build a map of interdependencies between business processes and the assets supporting them.
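A map of interdependencies can be pictured as a pair of simple lookup tables. The sketch below is only an illustration: all obligation, process, and asset names are hypothetical, and in practice such a map would more likely live in a BCM tool or a configuration database than in code.

```python
# Hypothetical example data; none of these names come from a real organization.
# Each external obligation points to the key processes that fulfil it.
OBLIGATION_TO_PROCESSES = {
    "customer payment contract": ["payment processing"],
    "regulatory reporting": ["monthly reporting", "payment processing"],
}

# Each key process points to the assets it needs in order to run.
PROCESS_TO_ASSETS = {
    "payment processing": ["core banking system", "payment gateway (cloud)"],
    "monthly reporting": ["core banking system", "data warehouse"],
}


def assets_behind_obligation(obligation):
    """Return the set of assets that the given external obligation depends on."""
    assets = set()
    for process in OBLIGATION_TO_PROCESSES.get(obligation, []):
        assets.update(PROCESS_TO_ASSETS.get(process, []))
    return assets


def processes_using_asset(asset):
    """Return, in alphabetical order, the key processes affected if the asset fails."""
    return sorted(p for p, assets in PROCESS_TO_ASSETS.items() if asset in assets)


if __name__ == "__main__":
    print(assets_behind_obligation("regulatory reporting"))
    print(processes_using_asset("core banking system"))
```

Reading the map in both directions is the point: from an obligation down to the assets behind it, and from an asset up to every process that would suffer from its failure.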

This first step in BCM is often overlooked or conducted on a high level of generality by describing some obvious facts – who we are, what we do, who our customers are, and so on. However, it’s worth putting more effort into this. A detailed context of an organization provides a root reference point for risk assessment and adequate further development of an ICT infrastructure – knowing what kind of requirements we have to fulfill.

Potential impact

The most crucial step in building a digital ecosystem appropriate for your organization is assessing the potential impact (financial and non-financial) of not fulfilling the previously identified external obligations. Based on objective data derived from financial documents, legal regulations, and contractual clauses, business impact analysis (BIA) helps us understand our “level of pain” and why and when certain things need to happen to avoid negative consequences. Only on the basis of this assessment can we define and justify our internal business continuity requirements for each key process:

maximum tolerable period of disruption
minimum level of recovery
maximum period of time after which the process needs to return to normal.

In BCM, we usually focus on those processes whose disruption brings the highest negative impact within the shortest time. However, there is no reason why we shouldn’t use the requirements for all the processes covered by the BIA.

At this point, we know and understand our priorities and are ready to identify assets, including components of the digital environment, necessary to keep those critical processes running at the predefined, minimal level. These are our critical assets.
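As a minimal sketch of how the BIA requirements can drive asset priorities, the example below records the three requirements per key process and lets each asset inherit the tightest time limit among the processes that need it. The field names, processes, assets, and hour values are all hypothetical, and using the maximum tolerable period of disruption as the asset's upper bound is a simplifying assumption. It assumes Python 3.9 or later.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessRequirements:
    """Continuity requirements for one key process, as produced by the BIA."""
    name: str
    mtpd_hours: float             # maximum tolerable period of disruption
    min_recovery_level_pct: int   # minimum level of recovery, as % of normal output
    full_recovery_hours: float    # time after which the process must be back to normal
    assets: tuple                 # assets the process needs at the minimum level


def asset_time_limits(processes):
    """For each asset, keep the shortest MTPD among the processes that need it."""
    limits = {}
    for proc in processes:
        for asset in proc.assets:
            limits[asset] = min(limits.get(asset, proc.mtpd_hours), proc.mtpd_hours)
    return limits


def rank_assets(processes):
    """Most time-critical assets first; ties are broken alphabetically."""
    return sorted(asset_time_limits(processes).items(), key=lambda item: (item[1], item[0]))


# Hypothetical BIA results, for illustration only.
BIA_RESULTS = [
    ProcessRequirements("payment processing", 4, 50, 24,
                        ("core banking system", "payment gateway")),
    ProcessRequirements("monthly reporting", 72, 100, 120,
                        ("core banking system", "data warehouse")),
]

if __name__ == "__main__":
    for asset, hours in rank_assets(BIA_RESULTS):
        print(f"{asset}: must be available within {hours} h")
```

The priority of an asset here comes only from the business processes it supports, so a shared system inherits the requirement of its most demanding process.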

The MITRE® Crown Jewels Analysis (CJA) methodology can be a great help in this task.

It is a common practice to define priorities for the ICT components based on the employees’ experience or even on the cost of the solutions, where the more expensive the component, the higher its criticality. It may be a helpful approach for calculating the value of an insurance policy. Still, it’s not a good indicator of the operational priority of this particular ICT system or service because we lose the relation to the business function.

Risk assessment & treatment

I observe a similar practice when it comes to ICT risk analysis. Companies use methodologies that base the risk assessment solely on expert judgment. Expert judgment is a valuable source of information in risk management; however, when we need to decide on costly investments in new security controls, we need something more specific.

A trustworthy risk analysis is based not on educated guesses but on measurable criteria, built on data about the potential impact of a business interruption (or an information security breach, if we include this aspect in our assessment). In such an approach, probability is of little importance. Instead, what counts the most is the potential impact of a particular threat.

Risk assessment lets us identify all the gaps between business requirements for ICT availability and its actual capabilities. Business process owners and ICT operations then jointly decide on the risk treatment plan: either filling the gaps in security measures or accepting the risk of inevitable business interruptions. There is no third way. Thanks to the BIA results, those decisions will indeed be “well informed,” and both parties will better understand each other’s perspectives and develop mutual trust.
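The gap analysis and the two-way treatment decision can be sketched as follows. This is an illustration under stated assumptions: the requirement and the capability are each reduced to a single recovery time in hours, and all asset names and values are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Treatment(Enum):
    """The only two treatment decisions available for an identified gap."""
    FILL_GAP = "fill the gap with additional measures"
    ACCEPT = "accept the risk of interruption"


@dataclass(frozen=True)
class Gap:
    asset: str
    required_hours: float   # what the business needs (from the BIA)
    actual_hours: float     # what ICT can currently deliver


def find_gaps(required, actual):
    """Return a Gap for every asset whose actual recovery time exceeds the requirement.

    An asset with no known capability is treated as a gap of unbounded size.
    The result is sorted so that the tightest requirements come first.
    """
    gaps = []
    for asset, required_hours in required.items():
        actual_hours = actual.get(asset, float("inf"))
        if actual_hours > required_hours:
            gaps.append(Gap(asset, required_hours, actual_hours))
    return sorted(gaps, key=lambda g: (g.required_hours, g.asset))


def undecided(gaps, decisions):
    """Return the assets whose gap has no recorded Treatment yet."""
    return [g.asset for g in gaps if not isinstance(decisions.get(g.asset), Treatment)]


# Hypothetical figures, for illustration only.
REQUIRED = {"core banking system": 4, "payment gateway": 4, "data warehouse": 72}
ACTUAL = {"core banking system": 8, "payment gateway": 2, "data warehouse": 96}

if __name__ == "__main__":
    gaps = find_gaps(REQUIRED, ACTUAL)
    decisions = {"core banking system": Treatment.FILL_GAP}
    print([g.asset for g in gaps])
    print(undecided(gaps, decisions))
```

Keeping an explicit list of undecided gaps makes it visible when a risk is being carried by default rather than by a joint decision.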

Implementing some of the risk treatment plans can take months or even years if we need to change our ICT architecture completely. However, we shouldn’t get disheartened by that. It is advisable to keep in mind that in BCM and information security management, it is the journey that matters, not the destination. Due to the dynamic changes in ICT environments, we may never attain our goal of uninterrupted, undisturbed, secure operation, but we should keep on trying, steadily increasing our lead over the “bad actors” or simple bad luck.

Response

The BCM Strategy, which is the next milestone, defines what we need to do, why, when, and where; what to focus on first; what needs to be available; what can be omitted; and who is needed. The BCM Strategy comprises top-management directives on how to proceed in the worst-case scenario. The BCM requirements, defined during business impact analysis, tell us why we should do it.

BCM Strategy defines our 5 Ws (what, who, when, where, and why). From the ICT point of view, it forces us to take a close look at where the critical assets come from, how we ensure their availability, within what timeframes, and in particular, what our recovery options are: external service providers, the backup, and/or alternative solutions.

It’s one of those steps in building organizational BCM where we find additional vulnerabilities: SLAs (service-level agreements) that are not in line with the risk appetite, unique and hard-to-replace service providers, backup or alternative options that require changing operational procedures, and so on.
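Two of these vulnerabilities lend themselves to a simple automated check. The sketch below is an assumption-laden illustration: it reduces an SLA to a single promised restoration time, uses "only one provider" as a rough proxy for "hard to replace", and all provider names, assets, and hour values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryOption:
    asset: str
    provider: str
    sla_restore_hours: float   # restoration time promised in the SLA


def review_options(options, required_hours):
    """Return (asset, finding) pairs for SLA mismatches and sole-provider dependencies."""
    by_asset = {}
    for opt in options:
        by_asset.setdefault(opt.asset, []).append(opt)

    findings = []
    for asset, limit in required_hours.items():
        asset_options = by_asset.get(asset, [])
        # No option at all also counts as "no SLA meets the required time".
        if not any(o.sla_restore_hours <= limit for o in asset_options):
            findings.append((asset, "no SLA meets the required time"))
        if len({o.provider for o in asset_options}) == 1:
            findings.append((asset, "single provider"))
    return findings


# Hypothetical recovery options and requirements, for illustration only.
OPTIONS = [
    RecoveryOption("payment gateway", "Provider A", 8),
    RecoveryOption("data warehouse", "Provider B", 24),
    RecoveryOption("data warehouse", "Provider C", 48),
]
REQUIRED_HOURS = {"payment gateway": 4, "data warehouse": 72}

if __name__ == "__main__":
    for asset, finding in review_options(OPTIONS, REQUIRED_HOURS):
        print(f"{asset}: {finding}")
```

Findings such as options that force changes in operational procedures still need human review; a check like this only surfaces the mismatches that can be expressed in numbers.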

The Business Continuity Plan and the ICT Readiness Plan answer this question: “How do we implement the BCM Strategy and the top-management directives it contains?”

There is a significant added value for ICT operations in developing contingency procedures collectively during workshops. When representatives from different ICT departments work together, they can reveal several vulnerabilities, gaps, and missing links, which they could otherwise overlook. Furthermore, such an approach ensures that we haven’t excluded any uncomfortable fact from our analysis or haven’t made any unfounded, optimistic assumptions.

A tiny hint from my practice: business departments should write their procedures after the ICT teams. If we develop the contingency procedures in such an order, the business knows in detail what they can count on during a crisis.

Verification

Testing BCM procedures and solutions is the best tool for verifying organizational resilience. Well-designed and well-prepared tests and exercises reveal the actual state of affairs, the real problems, and the threats to our digital environment. But the organization needs to understand that tests we can rely on are time-consuming, and the more complicated the ICT environment, the more thought and effort we need to put into them. Still, tests are an integral part of building a trustworthy BCMS and trustworthy ICT services.

I can’t recall any tests, no matter their type, that haven’t improved ICT and organizational resilience. Testing and exercising the Business Continuity Plan or the ICT Readiness Plan successfully replaces traditional employee training and is also an effective tool for verifying the readiness of external providers. Without tests and exercises, our contingency plans shrink to a declaration of goodwill.

Conclusion

To truly build trust into your operations, including the digital ecosystem supporting them, your organization needs proactive audit and risk management functions. Their aim should always be to support the continual improvement of business continuity efforts: to point out gaps and nonconformities and to recommend corrective actions to remove them. Unfortunately, auditors sometimes try (often in good faith) to encourage the organization by praising its efforts and turning a blind eye to shortcomings and vulnerabilities, doing a severe disservice to everyone involved and undermining faith in the organization’s resilience.

By following the steps above, the Business Continuity Management System (BCMS) can become a strategic tool for watchful oversight of vital processes, and since all activities depend on ICT, it’s also helpful in managing the digital ecosystem and improving its trustworthiness.