The traditional concept of IT Disaster Recovery (DR), i.e. the solution where an organization sets up an alternate site where servers, applications, and data can be used in case the primary data center burns down, floods, loses power, or otherwise fails, needs to be re-thought completely due to two major developments.
The first one is Cloud Computing, resulting in the IT DR responsibility seemingly being transferred to the shoulders of an external supplier. “We have outsourced our business continuity challenges to a cloud vendor” is a popular comment. Do not be fooled though. As with perhaps everything in life, any benefits usually come with a set of new challenges.
Whilst you may have picked a cloud partner with ISO/IEC 27001 and/or related certifications, you are unlikely to have full control over their operating procedures, any changes in security practices between audits, their mergers and acquisitions, their staff background checking processes, any temporary skill gaps, any disgruntled employees they may have, exactly where on their systems your data resides, and whose data resides alongside it.
Additionally, many customers of cloud vendors have ‘all eggs in one basket’, storing their various data environments (e.g., production, test, development, and DR) all with the same cloud vendor. This is not always the best choice if we consider the risk of your account being compromised, or of that supplier’s system or infrastructure going out of operation – which happens even to the best of them, as was demonstrated in 2021 when a leading CRM vendor went down for 6-8 hours, taking their clients with them. Many of those clients had stopped worrying about having a Business Continuity Plan (BCP) including manual work-arounds, because “they had outsourced their BCP to the cloud” – remember?
Without negating the upsides of cloud solutions for BCP, one should be conscious of the aforementioned issues as well as further downsides, such as the relatively limited ability to customize the user interface (compared to in-house software). But possibly the biggest downside is the complete and utter reliance on network connectivity. Whilst in a pre-cloud world your staff may have been able to continue working on local file and mail servers, now they may not even be able to email the colleague sitting next to them if internet connectivity is affected. Cloud can absolutely be an excellent choice, as long as the decision is made with all pros and cons in mind.
The second development that has changed the concept of IT DR entirely is the rise in information (including cyber) security threats. The traditional ‘primary site vs backup site’ concept makes little sense if malware has worked its way into both environments. Further complicating this risk is not knowing how far the malware has traveled, which leads to the response “so let us initially unplug all systems and investigate”. A fire, flood, or power outage makes itself heard and seen in an obvious way, but with information security threats, part of the challenge comes with the inability to assess properly what has happened, what components are affected, how to remove the cause, and when a patch may become available. Finding an expert cybersecurity consultancy partner to quickly assist in this process may also be a challenge, particularly in the case of a large-scale cyberattack, when you will not be the only one seeking their help.
In a nutshell, DR is not as predictable as in the past. Having a solid BCP with initial/manual work-arounds and excellent communication procedures and tools is therefore imperative – more so than in the past. However, BCPs and Cyber Incident Response Plans (CIRPs) often exist on paper, rather than actually being embedded across the organization.
There is too much focus on ticking boxes to please auditors or clients, too much paperwork, too much effort required to maintain such plans, too little hands-on implementation, too little buy-in, too little enthusiasm from staff, too little actual incident readiness, and too little effort put into preparing staff to think ‘on their feet’ when a disruptive incident occurs.
It affects entire organizations. Senior management ends up with a false sense of security: that everything is covered with technical controls, that risks are managed well, and that staff are ready to act if a cyberattack or other incident were to occur – and that is if management even understands that the broader workforce must play a part in identifying and reducing information security risks.
In reality, only a few individuals, such as the BCP manager, the Chief Information Security Officer (CISO), and any IT (Security) staff, keep themselves familiar with the content of the plans and procedures or, even worse, are the only staff who even know these plans exist.
Even if organization-wide awareness campaigns are occurring, non-IT/Security/BCP staff are usually getting on with their normal business without understanding the context and how their daily work might incur risk. Until an immediate trigger occurs (e.g., a real-life cyber incident blocking their data, network, or application access), they do not even think about all the issues that could affect them. Often, information (including cyber) security and business recovery procedures only get written or refreshed for audit or other compliance-related purposes. And if staff can avoid being involved, they usually will.
The problem actually starts much earlier than that. BCP managers, CISOs, and IT Security staff tend to work in a solitary way, or mainly involve those in an organization who work directly with them. At best, they may try to have some dialogue with senior management to provide confidence that the risks are managed and ensure the top management can go to sleep at night.
It is often challenging to get buy-in, time, and attention from middle management and the general workforce who are busy ‘doing their job’. And that is where the ball stops rolling in many BCP and Cyber Incident Response Planning (CIRP) initiatives.
The result is that mountains of documentation may get produced (including detailed preventative and impact-reducing controls for a range of incidents such as ransomware, DDoS attacks, malware, phishing, and social engineering), but these are written quite generically, e.g., using a standard template ‘downloaded off the Internet’.
More ‘fit for purpose’ style documents (including practical manual work-arounds) are preferred, but these are often invested in just once and then easily get out of date. If a real incident occurs, most staff are oblivious to the incident (or confused), thereby increasing the chance of worsening the impacts. They do not know their role, what to look out for, what treatment options to activate, and/or who has the authority to give them instructions. In a nutshell, they are far from ready.
These problems stem from the following six mistakes:
- Only the BCP manager, CISO, IT, and/or IT Security staff are fully aware of the plans, and these individuals become ‘single points of success’ without the broader workforce being ready at any time for an incident. Little or no integration exists with broader incident management processes. Or worse, the plans have been written entirely by an external party that has not aligned them with the organization’s processes, structure, priorities, and culture.
- In addition to over-dependency on a few internal skilled individuals, there tends to be an over-reliance on (and over-confidence in) external recovery service providers and Cyber Incident Response (CIR) providers. Will their contractual promises and Service Level Agreements (SLAs) survive a substantial influx in demand for their services if many of their clients are affected by the same incident, such as an industry-wide ransomware attack or widespread flooding? Have you discussed with them how they might juggle their various clients’ needs for help and where you are on their priority list? Taking legal action to address their non-compliance and getting compensated weeks or months after the event will not help you to maintain proper service levels and relationships with your own clients – and your reputation in the marketplace.
- Complicated and jargon-filled procedures are sent by technical staff to business divisions, under the expectation that their staff will understand and adopt them without proper guidance. Staff within the divisions are often unclear about their role in the plans and the purpose of some of the treatment options (e.g., password change policies, phishing attack simulations, BCP exercises, and staff training programs), which results in low uptake, attempts to circumvent certain controls, and eventually resistance amongst the broader workforce to helping keep the process alive.
- Top management, whilst aware of the risks and the need to comply with relevant regulatory requirements, often does not commit sufficient time to truly understand its own role in the processes, palms it off as an ‘IT thing’, is not equipped with the skills to actively guide middle management and general staff, and does not commit sufficient resources to embed awareness programs across the organization.
- The CIRP and BCP are built as large documents – centrally managed by the BCP manager, the CISO, and other Security staff – that are not regularly maintained and are impractical in real incidents, because relevant content is difficult to find. Version control (if any) may be impeded by only one person being able to edit the latest version at a time. And when the IT systems are deactivated as a precaution, the CIRP and BCP documents cannot be retrieved as they sit on a system that is now unavailable.
- Simulation tests are timed inconveniently, are repetitive in terms of the scenario, do not include sufficient business context/relevance, and/or have a ‘pass/fail’ flavor – causing participants to try to look good in front of top management rather than trying to find areas of the plan that need improving.
I have observed organizations spending hundreds of thousands of dollars on consultants, only to find they still make these six mistakes. The resulting problems recur every few years when the documents are out of date, or sooner – and this is much worse – when a real-life flood, fire, data breach, or other incident occurs and the plans (and other controls) do not work – and nobody knows how to activate them.
Equipped with a short, sharp, dependable BCP and CIRP (integrated where possible, in terms of key decision-makers and related teams), your business will be in a far better position to respond confidently in an actual incident, protecting its brand and reputation, meeting its legal responsibilities, and ensuring the needs of its staff, clients, and stakeholders are met. To achieve this, senior management needs to commit to these processes ‘all the way’.
In a nutshell, the right approach includes the following elements:
- A ‘superhero’ team is established, consisting of BCP coordinators, IT (Security), as well as key business unit representatives, to assist in creating the response plans, engaging with staff across the organization, planning/facilitating training and awareness programs and conducting rehearsals/tests.
- Scenario-based discussions are held with external providers prior to selecting any of them. Once realistic promises regarding their response times and capabilities have been agreed upon, these are then validated. Where gaps come to the surface, further collaborative work is conducted together with them, to align mutual expectations and promises – and related (standby/retainer and/or activation) fees. Providers are included in any plan walk-throughs and/or exercises/tests, so they understand the internal mechanics of your organization, as well as key deliverables and roles relevant in an actual incident.
- Middle management and general staff are engaged in concise but highly interactive workshops, so they start engaging hands-on with the BCP and information security processes, in an effort to assist with choosing preventative controls that their teams can practically implement and maintain. This could include the encouragement of a true “if you see something: say something” habit amongst staff, and the development of practical work-arounds in case of a disruptive incident.
- Top management is trained in its governance role, as well as its decision-making role in the event of an emerging or evolving incident. By means of workshops using mini-scenarios, they share views on their organization’s risk appetite and related risk evaluation criteria. These can then be utilized by staff down the line to select feasible and reasonable treatment options.
- BCP and Information Security documentation is simple to maintain (e.g. by using color coding, bullet-style checklists and Quick Reference Cards) and based on a top-down holistic approach (e.g. by working with a small number of impact-based scenarios). It resides on an interactive, common platform such as the organization’s SharePoint/LAN/Intranet site (i.e. one that the broader workforce already uses in their daily life) and has a remotely accessible copy in case IT systems are down.
- Rehearsals or simulations are entertaining and actually allow participants to make mistakes. They aim to identify gaps instead of covering them up (only for these to surface during a real-life incident, when it is too late). Exercises include audio-visual tools and a range of practical challenges and injects (including realistic testing of decision-making processes and staff notification systems if IT services are not to be used) in order to ensure management and staff develop true incident readiness.
The goal is for everyone to be able to sleep soundly at night, knowing not only that good plans are in place, but also that they are up to date and that everyone knows what to do should an incident occur.