Bacula Systems White Paper
IT Disaster Recovery
Planning Guide
Bacula Systems User’s Guide
This document is intended to provide insight into the considerations and processes
required to establish, test, and implement disaster recovery procedures for crucial
IT services at your company.
LEGAL DISCLAIMER
Bacula Systems has taken, and will continue to take, proper care in the development, preparation and maintenance of the content
of this document which is intended for general information purposes only. Notwithstanding the preceding, Bacula Systems
makes no representation and gives no warranty, whether express or implied, as to the accuracy, reliability or completeness of
any information, content or materials contained within the document. Bacula Systems will accept no responsibility or liability
whatsoever for any material or services contained on any application, website or software not under the direct control of Bacula
Systems.
Version 1.0, April 5, 2018
Copyright ©2008-2018, Bacula Systems S.A.
All rights reserved.
Contents
1 Overview
  What is an IT Disaster Recovery Plan?
  Who Is Involved in IT Disaster Recovery Planning?
  Disaster Recovery Planning Process
  Glossary
  Disclaimer
2 IT Disaster Recovery Planning Guide
  1 Obtain Authorization and Commitment
    1.1 Gather Background Information (Optional)
    1.2 Determine How to Proceed
  2 Define Priorities
    2.1 Identify Critical Services
    2.2 Assess Impact of Service Outage
    2.3 Risk Assessment
    2.4 Prioritize
    2.5 Decide extent of action
  3 Decide on Technical Methodology
    3.1 Determining a technical methodology for each service
    3.2 Developing Facility and Infrastructure Plan
    3.3 Estimating Costs and Developing a Schedule
  4 Develop and Implement the Plan
    4.1 Roles and Responsibilities
    4.2 Determine disaster response process
    4.3 Develop detailed service recovery plans
  5 Test the Plan
    5.1 How to Test?
3 IT Disaster Recovery Plan Template
  1 Authorization
    1.1 Policies and Administrative Regulation
    1.2 Objectives
  2 Services and Their Priorities
    2.1 Services List
    2.2 Assess Impact of Service Outage
    2.3 Assess Risks
    2.4 Prioritize
    2.5 Set Scope
  3 Facility and Infrastructure Plan
    3.1 Determining technical approach for each service
    3.2 Facility Plan
    3.3 Infrastructure Plan
    3.4 Estimating Costs and Developing a Schedule
  4 Plan Implementation
    4.1 Roles and Responsibilities
    4.2 Disaster Response Processes
    4.3 IT Services Recovery Plans
IT Disaster Recovery — Planning Guide 1 / 26
Copyright © April 2018 Bacula Systems SA www.baculasystems.com/contactus
..............................................
All trademarks are the property of their respective owners
  5 Testing the DR Plan
Overview
Developing an IT disaster recovery plan involves choosing the right people to be
involved, assigning appropriate roles, selecting the technologies to use, as well as
developing, implementing, testing, and documenting the recovery process. This
document will show you how to create a simple disaster recovery plan for your
company that can be further expanded, based on your company’s needs. This
document contains two sections:
◾ IT Disaster Recovery Planning Guide – Walks you through the process of ob-
taining the required authorization, establishing planning priorities, determining
the technical approach, as well as developing, implementing, and testing the
disaster recovery plan.
◾ IT Disaster Recovery Plan Template – Provides sample content that you can
use while developing a disaster recovery plan for your company.
What is an IT Disaster Recovery Plan?
An IT disaster recovery plan documents:
◾ company leadership’s objectives for disaster recovery
◾ members of the recovery team and their roles and responsibilities
◾ detailed procedures for protecting and recovering required technical services
after a disruptive event such as a flood or fire
An IT disaster recovery plan aims to:
◾ restore critical IT services after an incident
◾ ensure that critical business functions resume within an acceptable period of
time
Who Is Involved in IT Disaster Recovery Planning?
The company’s IT manager should lead the planning. He or she usually works with
the IT department to determine specific steps within the disaster recovery process
and to develop and test the resulting recovery plan.
It is also important to involve stakeholders outside the IT department,
including senior leaders, CTO and CEO office representatives, and
board members (if applicable), to ensure the entire organization’s needs are met.
Disaster Recovery Planning Process
Disaster recovery planning is an ongoing and iterative process. Each step includes
several activities to be performed. During initial development of the disaster recov-
ery plan, stages 4 and 5 are repeated several times, each time focusing on developing
and testing recovery plans for a different service or a set of services.
After obtaining leadership commitment to the disaster recovery planning program
in stage 1, stages 2 through 5 are repeated periodically. IT services are dynamic:
new services are created and obsolete services are retired. Priorities and disaster
recovery plans must be reviewed and revised periodically to ensure that they are
current.
Glossary
Criticality Period: Any period during which the identified process is critical
and may affect the recovery time objective (RTO). A service or
process may have multiple criticality periods or none at all, depending on the
nature of the process. Criticality periods may be cyclical or one-off and may
range from months to hours in length. Examples of criticality periods include
year-end processing, regulatory deadlines, payroll processing, and scheduled
events.
Recovery time objective (RTO): The goal for how soon the service needs to be
recovered after a disruption, based on the acceptable amount of downtime and
level of performance. For example, an RTO of 24 hours with local accessibility
for payroll services means that the payroll application must be up and running
within 24 hours as well as accessible locally.
Recovery point objective (RPO): The goal for how much data or information
can be lost after a disruption, based on the acceptable amount of data or
information loss. For example, an RPO of 6 hours for payroll services means
that the payroll data must be backed up every 6 hours so that no more than
6 hours of data entered into the payroll application is lost after a disruption.
Service Priority: The logical grouping of services to be recovered, such as Prio 1,
Prio 2, and so on. Core infrastructure services need to be recovered first and
would be included in Prio 1.
Steering Committee: A group of people that makes decisions, authorizes time and
resources, and provides oversight.
Working Group: A group of people that defines the technical approach, develops
recovery plans for individual services, and tests and implements those
plans.
Disclaimer
The information contained in this guide should not be considered as legal advice.
The content may be inaccurate and incomplete and may not be fully applicable to
your situation. This guide should not be your only reference when developing a
disaster recovery plan. Please consider using multiple sources of information while
working on such a plan to make sure all aspects are covered.
IT Disaster Recovery Planning Guide
Stage 1: Obtain Authorization and Commitment
Commitment from all business areas and all levels of management is crucial for
effective IT disaster recovery planning. Senior leaders, top management, and other
stakeholders need to understand why disaster recovery planning is important, so
they can budget time, attention and resources for disaster recovery activities as
necessary. This important stage not only commits the company to having a disaster
recovery plan, but with every stakeholder bought in to the project, also helps you
develop the plan by making it easier to get time and resources from other areas of
the organization.
Stage 1.1: Gather Background Information (Optional)
If there are no pre-existing disaster recovery procedures at your company, it may be
helpful to collect the following information before you proceed:
◾ What is the administrative procedure or policy that regulates how an IT dis-
aster recovery plan should be maintained (if one exists)?
◾ Where is the current IT disaster recovery plan located (if one exists)?
◾ When was the disaster recovery plan last updated?
◾ Who is responsible for maintaining the disaster recovery plan?
Stage 1.2: Determine How to Proceed
After you have collected the necessary background information, evaluate what you
need to do to ensure management support and to acquire any needed authorization.
For example, you can:
◾ Ascertain the degree of mindshare and support among senior leaders for
putting in place a disaster recovery plan.
◾ Communicate clearly with senior leaders, and consider delivering a presentation
that outlines why a disaster recovery plan is important, what has already
been done, and what steps the recovery process includes.
◾ Work with your colleagues to build two groups:
– A steering committee that makes decisions, authorizes time and re-
sources, and provides oversight.
– A working group that defines technical approach, develops recovery
plans for individual services, as well as tests and implements those plans.
When determining who should participate in each group, consider decision-
making authority, resources available to the person, and their skills and knowl-
edge.
◾ Develop or revise administrative procedures or policy regulations on how the
disaster recovery plan should be maintained (if applicable).
Stage 2: Define Priorities
Establishing planning priorities involves the following steps:
1 Identify critical services and the applications and other IT elements that
these services use.
2 Quantify the impact of a service outage and identify the maximum allowable data
loss and service downtime.
3 Assess vulnerabilities, risks, and threats that could cause disruptions to IT
services, processes and applications.
4 Prioritize IT services according to requirements and criticality.
5 Decide extent of action: determine which services should have recovery plans
developed first.
Additional questions to discuss:
◾ Does the impact vary by time of month or year?
◾ Are there any specific deadlines that organization stakeholders must meet?
◾ What is the maximum allowable outage time for each critical business
service?
◾ How feasible are manual workarounds?
◾ How soon does each technology service need to be available? (See the recovery
time objective definition in the Glossary.)
◾ How much data or information can we afford to lose? (See the recovery point
objective definition in the Glossary.)
Stage 2.1: Identify Critical Services
Work with each department in your organization to determine which services are the
most critical to them. Identify all the underlying technologies and contributing
services that each critical service depends on.
Create a list of critical services, providing information about the departments they
are used by, their location, criticality periods, manual workarounds (if available),
maximum allowable outage, as well as underlying services or applications they de-
pend upon.
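The list described above can also be kept in a machine-readable form. The following Python sketch is an illustration only: the `CriticalService` record and its field names are assumptions that mirror the attributes named in this stage, and the sample entry reuses the payroll example from the template.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CriticalService:
    """One row of the critical services list (field names are illustrative)."""
    department: str
    name: str
    location: str
    max_allowable_outage_hours: float
    criticality_periods: List[str] = field(default_factory=list)
    manual_workaround: Optional[str] = None  # None means no workaround exists
    depends_on: List[str] = field(default_factory=list)


def services_without_workaround(services):
    """Return the names of services that have no manual workaround."""
    return [s.name for s in services if s.manual_workaround is None]


# Sample entry taken from the payroll example in the template.
payroll = CriticalService(
    department="Financial Services",
    name="Payroll",
    location="Central Office",
    max_allowable_outage_hours=24,
    criticality_periods=["Last week of the month"],
    manual_workaround=None,
    depends_on=["SAP Simple Finance"],
)

print(services_without_workaround([payroll]))
```

Keeping the list as structured records makes it easier to answer questions such as which services have no manual workaround.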
Stage 2.2: Assess Impact of Service Outage
Determine the impact of service outages and their time sensitivity. A business
disruption can impact an organization in several ways. Five main categories
are used to measure impact:
◾ Safety and human life
◾ Financial
◾ Reputation
◾ Operations
◾ Regulatory and legal
For each of the critical services and processes you identified, determine the impact
in each of the applicable categories based on the following values:
Minor: The consequences would threaten the efficiency or effectiveness of some
services and processes but would be dealt with at the business unit or department
level. The consequences may include low monetary losses.
Moderate: The consequences would not threaten the provision of services and
processes, but would mean the business operations could be subject to significant
review or changed ways of operating. Executive involvement would likely be
required. The consequences may include moderate monetary losses.
Major: The consequences would threaten continued effective provision of services
and processes and require executive involvement. The consequences may include
significant damage or destruction, some minor injuries or threat to human safety
with no loss of life, and high monetary losses.
Catastrophic: The consequences would threaten the provision of essential services
and processes, cause major problems for customers, and require immediate
executive involvement and action. The consequences may include major damage or
destruction, imminent threat to human safety, loss of life or major injuries, and
extreme monetary losses.
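One possible way to combine the per-category ratings into a single figure for a service is sketched below in Python. The guide does not prescribe a combination rule; taking the highest rating across categories is an assumption made here for illustration, and the `Impact` enum and `overall_impact` function are hypothetical names. The sample ratings are the payroll values from the template.

```python
from enum import IntEnum


class Impact(IntEnum):
    """The four impact values, ordered from least to most severe."""
    MINOR = 1
    MODERATE = 2
    MAJOR = 3
    CATASTROPHIC = 4


def overall_impact(ratings):
    """Return the highest rating across the assessed categories.

    Assumption: a service is treated as being as critical as its worst category.
    """
    if not ratings:
        raise ValueError("at least one category rating is required")
    return max(ratings.values())


# Payroll ratings as given in the template's sample row.
payroll_ratings = {
    "Safety and human life": Impact.MINOR,
    "Financial": Impact.MAJOR,
    "Operations": Impact.MAJOR,
    "Reputation": Impact.MODERATE,
    "Regulatory and legal": Impact.MAJOR,
}

print(overall_impact(payroll_ratings).name)  # MAJOR
```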
The time sensitivity of service outages can be determined by calculating the required
recovery time objective and recovery point objective.
Recovery time objective (RTO): The goal for how soon the service needs to be
recovered after a disruption, based on the acceptable amount of downtime and
level of performance. For example, an RTO of 24 hours with local accessibility
for payroll services means that the payroll application must be up and running
within 24 hours as well as accessible locally.
Recovery point objective (RPO): The goal for how much data or information
can be lost after a disruption, based on the acceptable amount of data or
information loss. For example, an RPO of 6 hours for payroll services means
that the payroll data must be backed up every 6 hours so that no more than
6 hours of data entered into the payroll application is lost after a disruption.
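The arithmetic behind the two payroll examples can be sketched in Python as follows. This is a simplified illustration: it assumes backups complete instantly and run at a fixed interval, so the worst-case data loss equals the interval. The function names and the sample outage time are hypothetical.

```python
from datetime import datetime, timedelta


def meets_rpo(backup_interval, rpo):
    """With periodic backups, up to one full interval of data can be lost,
    so the interval must not exceed the RPO."""
    return backup_interval <= rpo


def recovery_deadline(outage_start, rto):
    """Latest time by which the service must be running again."""
    return outage_start + rto


payroll_rpo = timedelta(hours=6)   # example value from the definition above
payroll_rto = timedelta(hours=24)  # example value from the definition above

print(meets_rpo(timedelta(hours=6), payroll_rpo))    # True
print(meets_rpo(timedelta(hours=24), payroll_rpo))   # False
print(recovery_deadline(datetime(2018, 4, 5, 9, 0), payroll_rto))  # 2018-04-06 09:00:00
```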
Stage 2.3: Risk Assessment
Record the risks that could disrupt IT services, processes and applications.
For example, a server room without a backup power source means that the
services hosted there would be unavailable during a power outage. For each
risk that is present, determine the implications and whether a strategy to
address the risk is required.
Stage 2.4: Prioritize
Prioritize the IT services based on need and criticality, along with their dependencies.
Each service can be classified as follows:
Service Priority    Recovery Time Objective
Critical            e.g. within 24 hours
Vital               e.g. within 72 hours
Necessary           e.g. within 2 weeks
Desired             e.g. longer than 2 weeks, but necessary to return to normal
                    operating conditions
For better budgeting, it is important to prioritize risk reduction and recovery efforts
based on how critical the service is to the company.
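As a minimal sketch, the classification above could be applied programmatically as shown below. The thresholds are the example values from the table (24 hours, 72 hours, 2 weeks) and should be replaced with your own; `classify_by_rto` and `RTO_TIERS` are hypothetical names.

```python
HOURS_PER_DAY = 24

# (upper bound in hours, classification); example thresholds taken from the table above.
RTO_TIERS = [
    (24, "Critical"),
    (72, "Vital"),
    (14 * HOURS_PER_DAY, "Necessary"),
]


def classify_by_rto(rto_hours):
    """Map a recovery time objective in hours to a service classification."""
    if rto_hours < 0:
        raise ValueError("RTO cannot be negative")
    for upper_bound, label in RTO_TIERS:
        if rto_hours <= upper_bound:
            return label
    # Anything longer than two weeks falls into the last tier.
    return "Desired"


print(classify_by_rto(12))   # Critical
print(classify_by_rto(48))   # Vital
```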
Stage 2.5: Decide extent of action
Based on the service priorities identified in Stage 2.4, decide which services will be
covered in this round of disaster recovery planning and which will be earmarked
for a later date. It is advisable to document core IT services and infrastructure used
by other applications and services first, according to their criticality.
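One way to work out which core services must come first is a topological sort of the dependency list. The Python sketch below uses the standard library `graphlib` module (Python 3.9 or later); the `recovery_order` helper is a hypothetical name, and the specific dependency edges are invented for illustration using service names that appear in the template.

```python
from graphlib import TopologicalSorter


def recovery_order(dependencies):
    """Return service names ordered so that every service follows the
    services it depends on. Raises graphlib.CycleError on circular dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


# Each key depends on the services listed as its value (illustrative edges only).
dependencies = {
    "Payroll": ["SAP Simple Finance"],
    "SAP Simple Finance": ["Network Connectivity", "Storage Services"],
    "Network Connectivity": ["Network Infrastructure"],
    "Storage Services": ["Network Infrastructure"],
}

for position, service in enumerate(recovery_order(dependencies), start=1):
    print(position, service)
```

Services with no dependencies of their own appear first, which matches the advice to handle core infrastructure before the applications that rely on it.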
Stage 3: Decide on Technical Methodology
At this stage you need to determine how much to invest in proactive disaster preven-
tion, as opposed to dealing with the consequences and recovering after a disaster.
After you establish investment priorities, determine the facilities and technologies
needed as well as their cost and what the implementation schedule should be.
Determining the technical approach involves the following:
1 Determine technical approach for each service (whether to prevent risks or
focus on recovery options).
2 Develop Facility Plan and Infrastructure Plan: find an alternative site with
sufficient power, infrastructure, and space.
3 Develop cost estimates and schedule: prepare labor and technology cost esti-
mates and proposed schedule for implementation.
Stage 3.1: Determining a technical methodology for each service
Using the list of priorities you identified in Stage 2, determine what your
technical approach should be: would you prefer to focus on preventing outages (for
example, through implementing additional components or equipment) or on recovery
options, including manual workarounds and alternate sites to restore technology
services and applications within the required timeframe. When deciding on the best
technical approach, you may need to consider the following aspects:
◾ Recovery time objective [1] and recovery point objective [2]: Services that have
shorter recovery time and/or recovery point objectives may be better suited
to a methodology based on prevention rather than recovery, because it could
be hard to recover these services in a short time frame.
◾ Risks and risk mitigation: Each service will face a variety of risks which may
cost a lot of time just to identify, as well as mitigate. Therefore, it may
be preferable to develop a strategy for dealing with grouped risk scenarios,
designed to cover most of the risks to a service. For example, these groups
could be:
IT Service              Risk Type               Risk Mitigation
(e.g. CRM, HR Systems,  Facility down due to    Implement network/servers/software at a
or ERP Systems, etc.)   Non-destructive Event   mirror data centre. Restore data from
                                                backup until the main facility has been
                                                recovered.

                        Facility down due to    Implement network/servers/software at a
                        Destructive Event       mirror data centre. Recover data from
                                                backup. Mirror facility becomes primary
                                                facility.

                        Network Loss            Take preventative approach through
                                                setting up redundant network components
                                                and agreement with alternate internet
                                                provider.

                        Application Loss        Have application set up on standby
                                                server in data centre. Restore
                                                application data from backups.

                        Employee Loss           Create more detailed documentation due
                                                to criticality of service. If needed,
                                                bring in temporary qualified staff
                                                during a disaster.

[1] See the recovery time objective definition in the Glossary.
[2] See the recovery point objective definition in the Glossary.
Stage 3.2: Developing Facility and Infrastructure Plan
Work together with your team to find an alternate site that meets the power,
infrastructure, and physical space requirements needed to recover your IT services.
This site should be far enough from your primary site that a single disaster does
not affect them both. Use the following sections of the template to document your
facility and infrastructure plans: Stage 3.2 (Facility Plan) and Stage 3.3 (Infrastructure Plan).
Stage 3.3: Estimating Costs and Developing a Schedule
Prepare labor and technology cost estimates for implementing the facility and
infrastructure plans (Stages 3.2 and 3.3 of the template). Develop a schedule for
implementation. Discuss both with the IT disaster recovery plan steering committee
and obtain its approval.
Stage 4: Develop and Implement the Plan
Stage 4.1: Roles and Responsibilities
Developing an IT disaster recovery plan requires involvement of the following two
groups:
◾ A steering committee that makes decisions, authorizes time and resources,
and provides oversight.
◾ A working group that defines technical approach, develops recovery plans
for individual services, as well as tests and implements those plans.
While determining who should participate in each group, consider decision-making
authority, resources available to the person, and their skills and knowledge.
Establish roles and responsibilities for recovery efforts, and record specific names,
contact information, and so on. Designate alternates in case someone is unavailable.
Stage 4.2: Determine disaster response process
Responding to a disaster consists of a number of phases:
Tip: Handle minor incidents causing service outage via incident response proce-
dures. Escalate severe incidents or events, such as loss of all communications, loss
of power, flood or fire, or loss of the building to appropriate personnel.
Stage 4.3: Develop detailed service recovery plans
Developing detailed service recovery plans involves the following steps:
1 Gather detailed requirements:
◾ Gather information about current configuration and network and security
requirements.
◾ Determine server and storage configuration, network and security re-
quirements, as well as application configuration details.
2 Analyze:
◾ Determine recommended technical recovery approach and level of doc-
umentation.
◾ Gather requirements for restoring the service in your recovery facility
if needed (for example, space, power, telecommunications, and so on).
Cloud-based services may not need this step.
◾ Analyze dependencies.
3 Document current setup and process for recovering the service. Provide more
documentation for complex and critical services.
4 Test the process to identify gaps and areas for improvement.
Stage 5: Test the Plan
Testing is the most important part of developing a disaster recovery plan. The
only way to know if your disaster recovery plan will work in a real disaster is if you
thoroughly test it before it is actually needed. There are different types of tests
varying in complexity and amount of time and resources to complete. Tabletop
walkthroughs where team members verbally go through steps in the plan tend to
be the least time-consuming. Disaster simulations and full failover testing require
more time and resources. Initial testing and tabletop walkthroughs may only involve
the disaster recovery team, but simulations or full failover testing require additional
people to be involved to make them more realistic. The IT leader and/or the steering
committee need to be involved in determining when to run a disaster simulation or
full failover testing due to the potential impact and the need to involve other areas
of the organization.
Stage 5.1: How to Test?
◾ Tabletop Walkthrough – Team members gather in a meeting room and ver-
bally go through the specific steps as documented in the plan to confirm
effectiveness, and identify gaps, bottlenecks or other weaknesses. This test
provides the opportunity to review a plan with the full team and familiarize
staff with procedures, equipment and offsite facilities.
◾ Disaster Simulation – A mock disaster is simulated so that normal opera-
tions are not interrupted. A simulation involves testing hardware, software,
personnel, communications, procedures, supplies and forms, documentation,
transportation, utilities and alternate site processing. If possible, test against
production data. Analyze the results to capture lessons learned and update
the plan as appropriate. Disaster simulation could include:
  1 Component testing
    (a) Test individual parts of the environment.
    (b) Execute tests at different times throughout the year.
    (c) Include participation of a limited number of business areas.
    (d) Only test connectivity.
  2 Environment segment testing
    (a) Test segments of the environment together (for example, groups of
        services like routers and firewalls).
    (b) Execute a limited number of tests per year.
    (c) Execute limited functional testing.
  3 Real time testing
    (a) Test all aspects of the environment within scope.
    (b) Test all applications on one day.
    (c) Execute connectivity and some functional testing.
    (d) Isolate production.
◾ Full Failover Testing – A full failover test exercises the total disaster recovery
plan. The test is likely to be costly and involves risk to normal operations.
If your focus is on resiliency, and failing over automatically, these tests are
required to ensure successful failovers during a disaster.
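Whichever test type is used, the results can be compared against the agreed objectives to reveal gaps. The Python sketch below is illustrative only: `find_rto_gaps` is a hypothetical helper, the RTO values come from the template's sample tables, and the measured recovery times are made-up numbers.

```python
def find_rto_gaps(rto_hours, measured_hours):
    """Return services that were not tested or whose measured recovery time
    exceeded their RTO, with a short description of the gap."""
    gaps = {}
    for service, rto in rto_hours.items():
        measured = measured_hours.get(service)
        if measured is None:
            gaps[service] = "not tested"
        elif measured > rto:
            gaps[service] = f"exceeded RTO by {measured - rto:g} hours"
    return gaps


# RTOs from the template's sample tables.
rto_hours = {
    "Network Infrastructure": 12,
    "Network Connectivity": 12,
    "Storage Services": 12,
    "Payroll": 24,
}

# Hypothetical recovery times measured during a disaster simulation.
measured_hours = {
    "Network Infrastructure": 9.5,
    "Network Connectivity": 14,
    "Payroll": 20,
}

for service, gap in find_rto_gaps(rto_hours, measured_hours).items():
    print(f"{service}: {gap}")
```

Feeding each test run through a check like this gives the working group a concrete list of plans to revise before the next test.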
IT Disaster Recovery Plan Template
This template provides sample content that can be adapted to your context.
How to use this template: Red text contains instructions and is intended to be
deleted. Grey text is used for sample content that can be adapted as you
develop your disaster recovery plan.
Stage 1: Authorization
Stage 1.1: Policies and Administrative Regulation
Instructions:
◾ Document why the IT disaster recovery plan was developed.
◾ List applicable legislation or company policies or administrative regulations
that specify requirements to create and maintain an IT disaster recovery plan.
For example, under the General Data Protection Regulation, businesses that
have customers in Europe are required to safeguard copies of personal in-
formation (including metadata) from unauthorized access, use, disclosure, or
destruction. An IT disaster recovery plan can help mitigate the risk of disclo-
sure or destruction of such private data as the result of an event or disaster.
This plan has been created as per the requirements of the following administrative
regulations:
◾ Data Security Compliance Act
Policy code: DSCA
◾ General Data Protection Regulation
Policy code: GDPR
Stage 1.2: Objectives
Instructions:
◾ Document the main objectives of the IT disaster recovery plan
The IT Department has developed this IT disaster recovery plan to be used in the
event of a significant disruption to critical IT services at [your company name]. The
goal of this plan is to outline the key recovery steps to be performed during and
after a disruption so that critical IT and telecommunication services continue within
an appropriate period of time after an incident has occurred.
Stage 2: Services and Their Priorities
Stage 2.1: Services List
Instructions:
◾ Create a list of critical services, providing information about the departments
they are used by, their location, criticality periods, manual workarounds (if
available), maximum allowable outage, as well as underlying services or ap-
plications they depend upon.
◾ Fill out the table below, adjusting sample data as needed.
Department   Service or   Service    Criticality    Manual       Maximum     Underlying IT
             Process      Location   Period         Workaround   Allowable   Services or
                                                                 Outage      Applications
Financial    Payroll      Central    Last week of   None         24 hours    SAP Simple
Services                  Office     the month                               Finance
........     ........     ........   ........       ........     ........    ........
........     ........     ........   ........       ........     ........    ........
........     ........     ........   ........       ........     ........    ........
........     ........     ........   ........       ........     ........    ........
Stage 2.2: Assess Impact of Service Outage
Instructions:
◾ Determine the impact of service outages and the required recovery time objec-
tive (RTO) (how soon the service needs to be recovered) and recovery point
objective (RPO) (how much data can be lost).
◾ Fill out the table below, adjusting sample data as needed.
                                   Impact if Business Service Unavailable                       Time Sensitivity
Department  Service  Underlying    Safety /    Financial  Operations  Reputation  Regulatory /  RTO       RPO
                     Services /    Human Life                                     Legal /
                     Applications                                                 Contractual
Finances    Payroll  SAP Simple    Minor       Major      Major       Moderate    Major         24 hours  24 hours
                     Finance
......      ......   ......        ......      ......     ......      ......      ......        ......    ......
......      ......   ......        ......      ......     ......      ......      ......        ......    ......
......      ......   ......        ......      ......     ......      ......      ......        ......    ......
......      ......   ......        ......      ......     ......      ......      ......        ......    ......
Stage 2.3: Assess Risks
Instructions:
◾ Document the risks that could cause disruptions to IT services, applications,
and processes.
◾ Specify the implications if the risk occurs and whether or not a strategy needs
to be developed to address such a risk.
◾ Fill out the table below, adjusting sample data as needed.
Department  Service  Underlying Services   Known Risks       Implications            Need Strategy to Address?
                     / Applications
Finances    Payroll  SAP Simple Finance    No backup power   Finance information     Yes, consider installing
                                           supply            unavailable during a    a backup generator
                                                             power outage
......      ......   ......                ......            ......                  ......
......      ......   ......                ......            ......                  ......
......      ......   ......                ......            ......                  ......
......      ......   ......                ......            ......                  ......
Stage 2.4: Prioritize
Instructions:
◾ Prioritize the IT services based on need and criticality, along with their de-
pendencies.
◾ Fill out the table below, adjusting sample data as needed.
Department | Service | Underlying Services / Applications | Recovery Time Objective | Recovery Point Objective | Service Classification | Service Priority
Finances | Payroll | SAP Simple Finance | 24 hours | 24 hours | Critical | Prio 2
......... | ......... | ......... | ......... | ......... | ......... | .........
......... | ......... | ......... | ......... | ......... | ......... | .........
Dependencies
......... | ......... | Network Infrastructure | 12 hours | 12 hours | Critical | Prio 1
......... | ......... | Network Connectivity | 12 hours | 12 hours | Critical | Prio 1
......... | ......... | ......... | ......... | ......... | ......... | .........
......... | ......... | ......... | ......... | ......... | ......... | .........
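Because a service cannot come back before the services it depends on, a dependency's RTO should not be longer than the RTO of anything that relies on it. The following sketch checks that rule for the sample data. The dependency mapping (Payroll relying on SAP Simple Finance and on both network services) and all names in the code are assumptions made for illustration.

```python
# Sample RTO values in hours (see the sample data in Stages 2.4 and 2.5).
rto_hours = {
    "Payroll": 24,
    "SAP Simple Finance": 24,
    "Network Infrastructure": 12,
    "Network Connectivity": 12,
}

# Hypothetical dependency map: service -> services it needs in order to run.
depends_on = {
    "Payroll": ["SAP Simple Finance", "Network Infrastructure", "Network Connectivity"],
    "SAP Simple Finance": ["Network Infrastructure", "Network Connectivity"],
}


def rto_conflicts(rto_hours, depends_on):
    """List (service, dependency) pairs where the dependency's RTO is longer."""
    conflicts = []
    for service, dependencies in depends_on.items():
        for dependency in dependencies:
            if rto_hours[dependency] > rto_hours[service]:
                conflicts.append((service, dependency))
    return conflicts


print(rto_conflicts(rto_hours, depends_on))
```

For the sample data the list is empty, since the 12-hour network objectives are tighter than the 24-hour objectives of the services that rely on them. A non-empty result would suggest tightening the dependency's RTO or raising its priority.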
Stage 2.5: Set Scope
Instructions:
◾ List the IT services covered in the scope of this IT disaster recovery plan,
along with their recovery time objectives, recovery point objectives, and the
order in which these services should be recovered.
Service Priority | Service or Application Name | Recovery Time Objective | Recovery Point Objective
Prio 1 | Network Infrastructure | 12 hours | 12 hours
Prio 1 | Network Connectivity | 12 hours | 12 hours
Prio 1 | Storage Services | 12 hours | 24 hours
Prio 1 | Firewall Services | 12 hours | 24 hours
Prio 1 | Active Directory | 12 hours | 24 hours
.................. | .................. | .................. | ..................
Prio 2 | SAP Simple Finance | 24 hours | 24 hours
Prio 2 | Payroll | 24 hours | 24 hours
Prio 1 | Email | 24 hours | 24 hours
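The order in which services should be recovered can be derived from the scope table instead of being maintained by hand. The sketch below sorts the sample rows by service priority and then by recovery time objective. The tuple layout and the `recovery_order` function are assumptions for illustration; an organization may well choose a different tie-breaking rule.

```python
# (priority, service, RTO hours, RPO hours) rows from the sample scope table.
scope = [
    (1, "Network Infrastructure", 12, 12),
    (1, "Network Connectivity", 12, 12),
    (1, "Storage Services", 12, 24),
    (1, "Firewall Services", 12, 24),
    (1, "Active Directory", 12, 24),
    (2, "SAP Simple Finance", 24, 24),
    (2, "Payroll", 24, 24),
    (1, "Email", 24, 24),
]


def recovery_order(rows):
    """Sort by priority, then by RTO; ties keep their order from the table."""
    ordered = sorted(rows, key=lambda row: (row[0], row[2]))
    return [name for _, name, _, _ in ordered]


for position, name in enumerate(recovery_order(scope), start=1):
    print(position, name)
```

Python's `sorted` is stable, so services with the same priority and RTO stay in the order in which they were listed. With the sample data, Email moves ahead of the Prio 2 services because it is listed as Prio 1.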
Stage 3: Facility and Infrastructure Plan
Instructions:
◾ Document plans for recovering IT services in an alternate facility (if required)
and plans for recovering infrastructure.
◾ Answer the following key questions:
– Where will we go when a disaster occurs?
– How will we restore our infrastructure services?
Stage 3.1: Determining technical approach for each service
Instructions:
◾ Using the list of priorities you identified in Step 2 on page 6, determine your
technical approach for each service: either focus on preventing outages (for
example, by implementing additional components or equipment) or focus on
recovery options, including manual workarounds and alternate sites, to recover
technology services and applications within the required timeframe.
◾ Fill out the table below, adjusting sample data as needed.
IT Service | Risk Type | Risk mitigation
(e.g. CRM, or HR Systems, or ERP Systems, etc.) | Facility down due to Non-destructive Event | Implement network/servers/software at a mirror data centre. Restore data from backup until the main facility has been recovered.
 | Facility down due to Destructive Event | Implement network/servers/software at a mirror data centre. Recover data from backup. Mirror facility becomes primary facility.
 | Network Loss | Take preventative approach through setting up redundant network components and agreement with alternate internet provider.
 | Application Loss | Have application set up on standby server in data centre. Restore application data from backups.
 | Employee Loss | Create more detailed documentation due to criticality of service. If needed, bring in temporary qualified staff during a disaster.
Stage 3.2: Facility Plan
Instructions: Consider documenting items such as:
◾ the power, infrastructure, and space requirements for a recovery facility
◾ the circumstances under which a recovery facility will be used
◾ who is authorized to make the decision to use it
◾ who will be involved in setting up the recovery facility
◾ where the recovery facility is located and plans to identify an alternate facility
if needed
Facility Requirements
Requirement Description
Power
Infrastructure
Space
If the primary facility can no longer support normal business operations, the team
will be instructed to recover systems at the recovery facility. Once this has been
decided, the facilities team should start bringing the alternate facility to a
functional state. It is also important to coordinate travel and logistics properly
so that the team can operate out of the alternate site.
Stage 3.3: Infrastructure Plan
Instructions:
◾ Focus on recovering the minimum core infrastructure required to recover mis-
sion critical IT services. Create a separate section in the document for each
core service that includes detailed recovery procedures.
Sample List of Critical Infrastructure Services:
System
Local Area Network (LAN)
Wide Area Network (WAN)
Servers
Core Network
Firewall
Remote Connectivity
.................................................................................
.................................................................................
Local Area Network (LAN) Recovery Plan
People Responsible:
◾ Thomas Bauer, IT Manager, [contact details]
◾ [backup, if available]
Priority
◾ Critical
Recovery Strategy and Location
...
Network Diagram
...
Assumptions
◾ Racks and power are available
◾ Other
Recovery Time Objective (RTO) and Recovery Point Objective (RPO)
◾ RTO: 6 hours.
◾ RPO: 6 hours.
Recovery Platform
...
Recovery Procedure
◾ Overview of major steps:
1 Rack gear
2 ...
3 ...
4 ...
5 Patch to switches
6 Configure router
◾ Details for each step:
1 Rack gear
– Mount gear
– Confirm power
– Patch to servers
– Connect to WAN
– Log in and update switch
– Configure rules
– ...
Stage 3.4: Estimating Costs and Developing a Schedule
Instructions:
◾ Prepare estimates of the labor and technology costs required to implement
Stages 3.2 and 3.3. Develop a schedule for implementation. Discuss both with
the IT disaster recovery plan steering committee and obtain its approval.
Stage 4: Plan Implementation
Stage 4.1: Roles and Responsibilities
Instructions:
◾ Document roles, responsibilities and contact information for the disaster re-
covery team in order to respond effectively to an incident or disaster.
Depending on the size and organization of your team, some roles may be combined.
Disaster Recovery Team Org Chart (Optional)
[Instructions:
Consider adding an org chart to show the team roles and how they are interrelated.]
The following chart shows the key roles involved in preparing for and responding to
a disaster.
Incident Manager (IT Lead)
The disaster recovery incident manager is responsible for making all decisions related
to the IT disaster recovery efforts. This person’s primary role is to guide the disaster
recovery process. The entire IT recovery team reports to this person during an
incident.
Responsibilities:
◾ Initiate the IT disaster recovery call tree.
◾ Provide status updates to senior leaders and information needed for making
decisions.
◾ Coordinate communications.
Facilities Team
The facilities team is responsible for all issues related to the physical facilities that
house IT systems, including both the primary and recovery facilities. They also
are responsible for assessing the damage and overseeing the repairs to the primary
location in the event of the primary location’s destruction or damage.
Responsibilities:
◾ Ensure that the recovery facility is maintained in working order.
◾ Ensure transportation, sufficient supplies, food and water and sleeping ar-
rangements are provided for all employees working at the recovery facility.
◾ Assess physical damage to the primary facility.
◾ Ensure that measures are taken to prevent further damage to the primary
facility and appropriate resources are provisioned to rebuild or repair the main
facilities if necessary.
Network Team
The network team is responsible for assessing damage to network infrastructure and
for providing data and voice network connectivity during a disaster.
Responsibilities:
◾ Assess damage to network infrastructure at the primary facility and prioritize
the recovery of services in the manner and order that has the least impact.
◾ Communicate and coordinate with third parties to ensure recovery of connec-
tivity.
◾ Ensure that needed network services are available at the recovery facility (if
needed).
◾ Restore network services at the primary facility.
Server and Storage Team
The server and storage team is responsible for providing the physical server and
storage infrastructure required to run IT operations and applications.
Responsibilities:
◾ Assess damage to servers and storage and prioritize the recovery of servers
and storage devices in the manner and order that has the least impact.
◾ Ensure that servers and storage services are kept up-to-date with patches and
copies of data.
◾ Ensure appropriate backups.
◾ Install and implement required tools, hardware and systems in the facilities.
Applications and Processes Team
The applications and processes team is responsible for ensuring that all applications
operate as required to meet organizational objectives as well as managing IT pro-
cesses that are fundamental to support the recovery of IT services and applications.
Responsibilities:
◾ Assess impact to applications and prioritize the recovery of applications in the
manner and order that has the least impact.
◾ Ensure that the following IT processes are followed when managing applica-
tions:
– incident management
– change management
– access provisioning
– security
– other
◾ Ensure that servers in the facilities are kept up-to-date with application patches
and copies of data.
◾ Install and implement any tools, software and patches required in the facilities
as appropriate.
Call List
[Instructions:
Document the names, roles and contact information of leaders and team members
responsible for responding to an incident and handling recovery efforts.]
Name Role / Title Phone Number
......................... ......................... .........................
......................... ......................... .........................
Stage 4.2: Disaster Response Processes
Instructions:
◾ This section records the main processes involved in responding to a disaster.
Having them documented in advance is likely to improve both the response time
and the effectiveness of the actions taken.
◾ The following information may be needed for each disaster response process:
– process name and description;
– steps;
– inputs/outputs; and
– personnel responsibilities and roles (what is expected of the different
people involved in carrying out procedures?).
Disaster Response occurs in several phases as per the diagram below. When an
event happens, the team first assesses the event and decides whether or not declaring
a disaster is necessary. In the case that a disaster has occurred, the team starts
procedures for recovery of the IT service(s), using, if necessary, an alternate location.
Once the critical and required services are back up and running, the team can work
on getting back to normal operations. The final stage is to hold a review and
analysis of the event up to the point normal services were resumed.
Processes for Assess Phase:
Process to Assess Severity of Incident or Event
Instructions:
◾ Document the procedure for deciding how severe the incident is, and give clear
criteria and procedures for escalation.
◾ Additional Advice:
– Handle minor incidents that lead to service outage through incident re-
sponse procedures. Severe incidents such as fire, flood, significant com-
munications loss, power loss or unavailability of the building, should be
escalated to the appropriate personnel.
– Service desk process links should be documented and kept up to date.
Severe Incident Escalation
Directions:
◾ The escalation procedures for alerting senior leadership and IT Management
to assess the impact of the incident need to be fully documented.
How to perform Impact Assessment
Directions:
◾ Clearly document what information needs to be collected in order to decide
whether or not to declare a disaster. Examples include assessing the degree of
damage and calculating the anticipated recovery time. Include advice on
whether to recover in situ or to begin recovery at an alternate site.
Process to Declare Disaster
Directions:
◾ Clearly define the criteria for how and when a disaster should be confirmed and
communicated. A procedure for delegation of authority should be in place, so
that IT and other infrastructure team members are empowered to act if the
usual authorities are unavailable.
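Declaration criteria are easier to apply under pressure when they are written as explicit rules. The sketch below shows one possible rule, which is an assumption and not a recommendation from this guide: declare a disaster when the anticipated recovery time at the primary site exceeds the RTO of any affected service. The function name and its inputs are hypothetical.

```python
def should_declare_disaster(anticipated_recovery_hours, affected_rto_hours):
    """Apply one example rule: declare if any affected service would miss its RTO.

    affected_rto_hours maps each affected service name to its RTO in hours.
    Returns (declare, missed), where missed lists the services at risk.
    """
    missed = sorted(
        name
        for name, rto in affected_rto_hours.items()
        if anticipated_recovery_hours > rto
    )
    return bool(missed), missed


# Made-up assessment: repairs at the primary site are expected to take 18 hours.
declare, missed = should_declare_disaster(
    18, {"Network Infrastructure": 12, "Payroll": 24}
)
print(declare, missed)
```

Returning the list of services at risk alongside the decision gives whoever holds delegated authority a short justification to pass on when the disaster is communicated.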
Recover Phase Procedures:
Team Notification Procedures
Directions:
◾ Clearly document call-out procedures to ensure fast response from the disaster
recovery team.
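A call-out procedure can be captured as a simple call tree in which each person contacts a few others. The sketch below groups contacts into rounds so the team can see how many hand-offs are needed before everyone is reached. The tree itself is hypothetical: the role names follow the teams described in Stage 4.1, and "Network Engineer (placeholder)" is an invented entry.

```python
# Hypothetical call tree: caller -> people that caller is responsible for contacting.
call_tree = {
    "Incident Manager": [
        "Facilities Team Lead",
        "Network Team Lead",
        "Server and Storage Team Lead",
        "Applications and Processes Team Lead",
    ],
    "Network Team Lead": ["Network Engineer (placeholder)"],
}


def call_out_rounds(tree, root):
    """Group contacts by round: round 1 is called by the root, round 2 by round 1, and so on."""
    rounds = []
    seen = {root}
    current = [root]
    while current:
        next_round = []
        for caller in current:
            for callee in tree.get(caller, []):
                if callee not in seen:  # never call the same person twice
                    seen.add(callee)
                    next_round.append(callee)
        if next_round:
            rounds.append(next_round)
        current = next_round
    return rounds


for number, people in enumerate(call_out_rounds(call_tree, "Incident Manager"), start=1):
    print(number, people)
```

A tree with many rounds takes longer to complete, so keeping the tree shallow and listing a backup for each caller are reasonable things to review when the call list is updated.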
Recovery and Data Restore Initiation
Directions:
◾ Outline clearly the process for launching the disaster recovery plan, enabling
the recovery site and recovering each system by order of priority.
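Recovering systems in order of priority also has to respect technical dependencies, because a service cannot start before the services it relies on. One way to sketch this is a topological sort, shown below with `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later). The dependency map is an assumption made for illustration; the service names are taken from the sample scope table.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency map: service -> services that must be restored first.
depends_on = {
    "Payroll": {"SAP Simple Finance"},
    "SAP Simple Finance": {"Storage Services", "Active Directory"},
    "Storage Services": {"Network Infrastructure"},
    "Active Directory": {"Network Infrastructure"},
    "Network Infrastructure": set(),
}


def restore_sequence(depends_on):
    """Return one valid restore order in which dependencies always come first."""
    # static_order() raises graphlib.CycleError if the map contains a cycle.
    return list(TopologicalSorter(depends_on).static_order())


sequence = restore_sequence(depends_on)
print(sequence)
```

Services with no dependency between them (here Storage Services and Active Directory) may appear in either order, so the result should be read as one acceptable sequence and combined with the service priorities from Stage 2.5.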
Progress Communication
Directions:
◾ Identify preferred channels of communication, document who the important
stakeholders are, and suggest the frequency of communication.
Recovery Team Support
Directions:
◾ Record plans that guarantee the recovery team will receive enough water,
food, rest and resources to be able to successfully complete their tasks.
◾ Give direction for handling employees’ personal requirements, for example
family emergencies, illness or injury.
Resume Services Phase:
Resuming Normal Operations Procedure
Directions:
◾ Identify and record the process for having normal operations begin again,
including checking readiness to resume and communicating intentions and
status to stakeholders.
Review Phase Procedures:
Review Procedures
Directions:
◾ Record the review procedure, including analysis, assessment, and clarification
of what was learned and how future procedures can be improved.
Communicate the findings of the post-event analysis to a broad group of the
organization's stakeholders.
Stage 4.3: IT Services Recovery Plans
Instructions:
◾ This section documents plans for recovering the IT services listed in Stage 2.5
on page 15.
◾ It is recommended that you include the following information relevant to each
service:
– Responsibility – Specify who is responsible for managing this service as
well as any backup contacts in the event they are not available.
– Service Context – Who uses the service, what are the criticality periods,
contact information for vendors and other personnel such as database
administrators and application owners.
– Service Classification – Specify the classification of this service (critical,
vital, necessary or desired) as determined in Step 2.4 on page 8.
– Recovery Strategy and Location – Specify the overall strategy for recov-
ering this service as well as where the service will be recovered.
– Assumptions – Specify any assumptions required to follow the recovery
procedure, such as the ability to restore from backups, etc.
– Recovery Time Objective (RTO) and Recovery Point Objective (RPO)
– Specify the recovery time objective and recovery point objective for
this service as determined in Step 2.2 on page 7.
– Recovery Platform – Specify the technology platform required to restore
this service. For example, virtualized Windows servers configured similarly
to the current production environment.
– Recovery Procedure – Consider providing an overview of the major steps
of recovery before providing detailed recovery procedures. Select the
minimum level of documentation possible that reduces risk to an ac-
ceptable level as more detailed documentation requires more time to
create and maintain.
– Test Procedure – Specify how the service can be tested to ensure that
it’s working correctly.
– Resume Procedure – Specify how to resume the service after the event
has been addressed.
Example list of IT Services
Payroll
Marketing CRM
Email
Financial Services
Delivery Monitoring
Legal Records
Stage 5: Testing the DR Plan
Instructions:
◾ Document and communicate the reasons that plan testing and review are
critical, the nature of the tests to be done, and the frequency of disaster
recovery plan testing.
Testing the disaster recovery plan is one of the most important factors in building
the best plan for your organization. An effective, high-standard IT disaster
recovery plan is the result of good team cohesion, so drills and checks are
mandatory to achieve the end goal of reliable disaster recovery.
It is imperative that these reviews take place periodically, especially because changes
that are unrelated to the technology can still have a significant impact on the
DR plan.
◾ Update the plan to meet new organizational changes, priorities and aims.
◾ Be sure to update the call lists.
◾ Be sure to update the team lists.
◾ Check that the plan reflects any changes made to configurations in the
environment.
Remember: a good disaster recovery plan is one that can be carried out effectively
and smoothly whenever needed. Each person and each system involved in the plan
should be a full part of the practice.
Every six months, the disaster recovery plan should be checked and given a run-
through. On a more frequent basis, Bacula Systems recommends doing a walk-
through, a disaster simulation and even a complete failover test.
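The six-month check is easy to miss unless the next date is recorded when each test is completed. The sketch below computes that date with the Python standard library only. The function names are hypothetical, and clamping to the last day of a shorter month (for example, 31 August plus six months becomes the end of February) is a design choice made for this example.

```python
import calendar
from datetime import date


def add_months(start, months):
    """Return the date `months` later, clamped to the last day of the target month."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def next_plan_test(last_test, interval_months=6):
    """Date by which the plan should next be checked and given a run-through."""
    return add_months(last_test, interval_months)


# Made-up example: the last run-through took place on 5 April 2018.
print(next_plan_test(date(2018, 4, 5)))
```

The more frequent walk-throughs and simulations can reuse the same helper with a shorter `interval_months` value chosen by the organization.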
Further Reading:
Bacula Enterprise Edition has a large number of features that make it especially
suitable for use as part of a Disaster Recovery Plan. Because of its deep
technical features, wide choice of storage destinations (including Cloud), and broad
customizability, Bacula Systems recommends you consider this software as integral
to your Disaster Recovery plans. Download the Bacula Enterprise Edition technical
whitepaper “How to implement disaster recovery strategy and high availability”
here:
https://www.baculasystems.com/how-to-implement-disaster-recovery-strategy-and-high-availability-the-bacula-systems-whitepaper
This whitepaper considers Disaster Recovery from a software data recovery
aspect. Or contact Bacula Systems to find out more on how this unique
software can help you.
For More Information
For more information on Bacula Enterprise Edition, or any part of the broad Bacula
Systems services portfolio, visit www.baculasystems.com.