Disaster Recovery in Cloud Computing Systems
Abstract—With the rapid growth of internet technologies, large-scale online services, such as data backup and data recovery, are increasingly available. Since these large-scale online services require substantial networking, processing, and storage capacities, it has become a considerable challenge to design equally large-scale computing infrastructures that support these services cost-effectively. In response to this rising demand, cloud computing has been refined during the past decade and turned into a lucrative business for organizations that own large datacenters and offer their computing resources. Undoubtedly, cloud computing provides tremendous benefits for data storage backup and data accessibility at a reasonable cost. This paper surveys and analyzes the previous works proposed for disaster recovery in cloud computing. The discussion concentrates on investigating the positive aspects and the limitations of each proposal. Also examined are the current challenges in handling data recovery in the cloud context and the impact of a data backup plan on maintaining data in the event of natural disasters. A summary of the leading research work is provided, outlining its weaknesses and limitations in the area of disaster recovery in the cloud computing environment. An in-depth discussion of current and future research trends in the area of disaster recovery in cloud computing is also offered. Several research directions that ought to be explored are pointed out as well, which may help researchers to discover and further investigate unresolved problems related to disaster recovery in the cloud environment.

Keywords—Cloud computing; data backup; disaster recovery; multi-cloud

I. INTRODUCTION
Since its introduction in the commercial sector, cloud computing has undergone significant changes in the way it stores and secures information. With cloud computing, data are hosted on a collection of nodes, including servers and remote computers, which enables users to access the data remotely at any time and from any location. Cloud service providers aim to deliver flexible services in a way that keeps users separated from the underlying infrastructure. Cloud computing is important when applied to data recovery due to its flexibility, cost-effectiveness, reliability, and scalability. However, since the internet constitutes an open network for sharing information and conducting transactions, it poses many security and privacy risks as well as availability issues, particularly for businesses [1, 2]. This problem has been addressed using many different approaches, including distributed computing, server clustering, and wide-area networking [3].

Small and Medium Business (SMB) corporations are progressively coming to terms with the fact that cloud services offer many benefits in terms of managing and facilitating their business. They can acquire immediate access to effective business applications and significantly expand their infrastructure resources, all at minimal expense [4]. Cloud computing is understood as a strategy to enhance existing capabilities and to dynamically introduce new functionalities without investing in new infrastructure, training new employees, or accrediting new software packages to expand IT abilities [5]. In today’s business environment, the data services operated by cloud providers (CPs) encounter many challenges in ensuring a high level of reliability before and after disasters [6]. Data services must ensure reliability and flexibility through an effective and practical disaster recovery (DR) plan. Data reliability and flexibility are essential requirements for any firm to maintain financial success and sustain the future growth of the organization [6]. The main issue concerning disaster recovery in the cloud computing context is how to provide, prior to a disaster, an effective plan for data backup and recovery that guarantees high data reliability at a reasonable cost. Accordingly, a number of solutions focusing on disaster recovery and data backup in a single-cloud paradigm have been offered [6-9].

This paper attempts to highlight and discuss the existing research on disaster recovery in cloud computing, including single and multi-cloud environments. The surveyed studies are thoroughly evaluated to identify the strengths and limitations of each work. In addition, the current and future trends related to disaster recovery in the cloud environment are discussed. The discussion focuses on the major issues concerning data backup and recovery in the cloud paradigm.

The remainder of this paper is structured as follows: Section II explains the four categories of disasters that are most likely to occur. In Section III, an overview of disaster recovery is outlined while its different types are explained in Section IV. The discussion covers the three current recovery techniques of cold site recovery, warm site recovery, and hot site recovery. Section V examines the concept of DR in the
This work is fully supported by the Kulliyyah of Information and
Communication Technology, International Islamic University Malaysia, Malaysia
702 | P a g e
www.ijacsa.thesai.org
(IJACSA) International Journal of Advanced Computer Science and Applications,
Vol. 11, No. 9, 2020
cloud computing context. The research challenges linked to DR in cloud computing are discussed in Section VI. The most noteworthy research on DR in cloud computing is reported and discussed in Section VII. Furthermore, a descriptive summary of the related works covered in this survey is given, outlining the merits and demerits of each work. The current challenges, such as the number of replications, storage cost, and data reliability in a multi-cloud DR process, are then discussed in Section VIII. Moreover, several future work directions that deserve further exploration are noted throughout the paper. The research conclusions are presented in the final Section IX.

II. TYPES OF DISASTER
Disasters, whether man-made or natural, can result in costly service interruption. For many organizations, adopting cloud computing constitutes the most reliable way of obtaining a dedicated and shared model that can serve DR at low cost and sustain a high speed of access [10-12]. A disaster is defined as any event that causes critical or devastating damage to a system and compromises the availability and continuity of the system’s operations and services for an unknown period. Because of the severe negative impact of such an event on the essential services of the system, many businesses and public services strive to install effective disaster recovery mechanisms that can preserve sensitive data and reduce downtime (service disruption) to a minimum. Disasters can be classified into four main classes based on their nature and type, namely climate disasters, deliberate and/or intended disruption, damage to or loss of utilities and services, and system or equipment malfunction. These four types of disasters are further elaborated in Table I.
| Category | Type | Description |
|---|---|---|
| Climate disasters | Flood | Rapid and uncontrolled rise in the water level of a stream, natural or artificial lake, dam, or coastal area. |
| Climate disasters | Fire | Fires that cause severe damage to property. They can be ignited by causes such as lightning, arsonists, smokers, or burning wood or other flammable materials. |
| Climate disasters | Subsidence and landslides | Natural disasters that may occur in certain areas on earth. This includes landslides due to heavy rain or heavy objects, such as rocks, falling from high places. |
| Climate disasters | Windstorm | Strong, high-speed winds that might strike some regions, especially low atmospheric pressure areas like deserts. |
| Climate disasters | Contamination and environmental hazards | The source of this type of natural disaster includes any substances, such as chemicals or airborne radioactive particles, that compromise the surrounding environment and threaten the population, particularly in urban areas. This type of disaster also includes air pollution due to the emission of toxic substances in the event of earthquakes and hurricanes. |
| Deliberated and/or intended disruption | Arson | A deliberate act of setting fire with the intention of vandalism that causes damage to property such as buildings, bridges, vehicles, and private homes. |
| Deliberated and/or intended disruption | Labor dispute / Industrial action | Occurs when a group of workers is dissatisfied with its work conditions and refuses to perform work through collective action such as a demonstration or strike. |
| Deliberated and/or intended disruption | Act of terrorism | Threatening others with violence to create fear in order to accomplish personal, political, or ideological goals. Acts of terrorism do not distinguish between civilians and government officials and may target anyone in society. |
| Deliberated and/or intended disruption | Act of sabotage | Deliberate destruction of or damage to equipment to hinder a particular group. |
| Loss of utilities and services | Electrical power failure and network services breakdown | Encompasses several harmful activities that lead to the interruption of normal electrical power services. A disruption in network services is not considered an immediate disaster; however, a network breakdown is problematic if the outage negatively affects the ability of the company to provide services to its clients, vendors, and business partners. |
| Equipment or system failures | Cooling plant failure | Interruption of the cooling plant that can cause the unavailability of services and facilities. |
| Equipment or system failures | A/C failure | Interruption of the air conditioning system that can cause the unavailability of services and facilities. |
| Equipment or system failures | Fire suppression failure | Interruption of fire suppression that can cause the unavailability of services and facilities. |
| Equipment or system failures | Internet failure | Internal power outage. |
| Equipment or system failures | Equipment failure | A piece of equipment that physically fails in such a way as to impair its availability and performance. |
IV. AN OVERVIEW OF DISASTER RECOVERY
DR refers to planning how to minimize data loss, and how to recover when such losses occur, with regard to the expected legal, regulatory, financial, and reputational effects. Regardless of the type of industry, unforeseen events can bring business operations to a standstill and cause extensive financial loss and/or reputational damage to an organization [3, 12-16]. Therefore, a data recovery plan is critical to maintaining continuity by providing all the solutions and steps needed to restore normal operations as soon as possible. Most business organizations throughout the world rely on data to derive competitive advantage and thrive in the marketplace, yet give little thought to potential data losses and the consequences thereof. DR is usually the responsibility of the IT department as it is principally concerned with the recovery of computing systems and data after a breach. A breach may be caused by a natural disaster such as a fire, storm, or flood, yet it can also have man-made causes, such as a power outage, malware, data theft, or other malevolent practices. DR preparedness usually requires the implementation of a Disaster Recovery Plan (DRP) so that the steps and procedures to be followed after an incident can be codified beforehand [8, 12, 17-18].

Thus, a DRP constitutes an essential aspect of any functional enterprise [3, 8, 17-21]. It consists of a set of procedures and predefined policies that attempt to ensure the continuity of the critical business services and sustain the organization’s mission by providing the usual services to the target clients during and after the disaster. One of the essential tasks of any effective DRP is to help firms rebuild and restore their system after the failure of their software and hardware components. Unlike fault tolerance, which ensures the continuity of operations despite a failure in one of the system components, DR is more concerned with serious damage and long-term disruption of the business services [8, 17, 20]. A DRP intends to manage and maintain a system that is affected by events that have an immediate impact on the availability and the continuity of the services. This includes but is not limited to recovery from cyber-attacks that threaten security, natural disasters, and server outages. A typical disaster recovery plan includes certain steps that ensure the rapid implementation of the DRP to restore the system to its normal state. Many critical parameters should be considered when designing a DRP, including Critical Business Functions (CBFs), Maximum Acceptable Outage (MAO), Recovery Time Objective (RTO), and Business Impact Analysis (BIA). The most critical parameter in the case of a disaster is the set of CBFs, the functions that are essential to sustaining the continuity of the services provided by the organization. Any long-term interruption of these services means that the organization fails to execute its critical operations. There is also a strong relationship between the service disruption and the maximum time that a function can be unavailable without affecting the main mission of the organization, which is called the MAO. Also, to ensure smooth continuity in the organization’s service, the maximum time before recovery should be computed accurately. It should be noted that for any DRP the RTO must be less than or equal to the MAO, since the RTO represents the timeframe within which the recovery process is to be completed. Similarly, the BIA represents the risk analysis that examines the CBFs and the MAO in order to determine the impact that a function failure has on a business. The BIA can also be used to specify the priority of the recovery attempts that need to be accomplished [8, 17].

V. TYPES OF DISASTER RECOVERY
The basic types of DR upon which others are built include cold site recovery, warm site recovery, and hot site recovery. As shown in Table II, the current technology standards for platform recovery can be implemented using one of the following techniques [3, 7, 17, 22]:

Hot site: Computers are configured and equipped with the software and data needed to accept the production load when the primary server is down. The fail-over, if required, is typically obtained through cluster configuration. The standby cluster configuration is separate and distinct from the master database configuration.

Warm site: Computer hardware is pre-configured and supplied with the required software. Once a disaster occurs, the Domain Name System (DNS) is switched and redirected to the backup site, and the server accepts the production load. The services have to be restarted manually.

Cold site: The computer hardware is in place, but the software and its associated data must be installed or restored before the system can be promoted to a productive state.

Generally, if a disaster occurs at one of the sites, the business is switched to other sites. DR for large-scale hazards usually requires shutting off the power to all utilities and evacuating the facility if required, with the exact tasks to be performed to protect personnel and save lives as identified in the DRP. Many natural disasters, such as flooding or major fires, can cause extensive damage to storage media, in which case specialized and professional data recovery techniques must be used. The physical recovery of data is conducted through different means depending on the extent of the damage, and it may require the use of custom hardware and software recovery systems such as spin-stand data recovery from physically damaged media and data carving [7, 22-23].

TABLE II. STANDARDS PLATFORM RECOVERY

| Option | RTO Coverage | Description | Cost Indication |
|---|---|---|---|
| Hot Site | Minutes (5 min – 4 hrs) | The hot site option needs a high attention level from the administrative staff of the organization. The age of data is dependent on the data recovery strategy. | High |
| Warm Site | Hours (4 – 24 hrs) | The warm site option denotes that the organization has sufficient resources to recover the system. Nevertheless, some extra work is needed to make it live. | Medium |
| Cold Site | Days (1 – 7 days) | The cold site needs to reconstruct the system in a way the recovered data is transferred to another location. | Low |
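
As an illustration of how the RTO ranges in Table II interact with the MAO of a business function, the following minimal Python sketch picks the cheapest recovery option whose worst-case RTO does not exceed a given MAO. The names `RecoveryOption` and `cheapest_option` are hypothetical, the upper RTO bounds are read directly from Table II, and the ordering from cheapest to most expensive follows the cost indication column. It is a sketch under these assumptions, not a sizing method proposed in the surveyed works.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RecoveryOption:
    name: str
    max_rto_hours: float  # upper bound of the RTO range in Table II
    cost: str             # cost indication from Table II


# Listed from cheapest to most expensive (assumption based on Table II).
OPTIONS = [
    RecoveryOption("Cold site", 7 * 24, "Low"),
    RecoveryOption("Warm site", 24, "Medium"),
    RecoveryOption("Hot site", 4, "High"),
]


def cheapest_option(mao_hours: float) -> Optional[RecoveryOption]:
    """Return the cheapest option whose worst-case RTO stays within the MAO."""
    for option in OPTIONS:
        if option.max_rto_hours <= mao_hours:
            return option
    # Even the hot site's worst-case RTO is longer than this MAO.
    return None


if __name__ == "__main__":
    for mao in (2, 8, 48, 240):
        choice = cheapest_option(mao)
        label = f"{choice.name} ({choice.cost} cost)" if choice else "no option fits"
        print(f"MAO = {mao} h: {label}")
```

Using the worst-case bound of each range is a deliberately conservative choice: a function with an MAO of 8 hours is mapped to a hot site because a warm site may take up to 24 hours to go live.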
Another type of DR concerns the backing up of critical business data into one or more geographically distributed data centers (DCs) so that there is a very low probability of all the sites being affected at once. An important aspect of DR is information assurance, which is implemented through multiple Network Attached Storage (NAS) devices and Storage Area Networks (SANs). Information assurance and recoverability can also be ensured using grid computing and cloud storage. The DRP developer considers the cost involved (including an evaluation of the cost of planning against the cost of failure), the optimal facility location, the optimal data allocation units, and the method to be followed for data replication. As a consequence, autonomous and semi-autonomous remote data backup and recovery processes have become more popular as storage costs have decreased and bandwidth has increased.

VI. AN OVERVIEW OF DISASTER RECOVERY IN CLOUD COMPUTING
A proactive disaster recovery plan constitutes an essential requirement to sustain long-term success for organizations. A set of well-planned measures for system recovery in the case of a disaster is necessary to ensure the continuity of the services and the availability of daily business activities. A well-prepared DRP is very beneficial and can be considered a long-term investment for many organizations. This view can be disputed, however, because the immediate impact of a DRP is unclear and its potential benefits may therefore be discounted. Nevertheless, cloud-based data backup and recovery have become predominant and have proven to be a cost-effective strategy compared to non-cloud-based approaches [7]. In the cloud environment, virtualization is independent of the specifications of the hardware on which it runs. This independence between virtualization technology and hardware often means that organizations are able to safely migrate their data, OS, and software tools to the cloud, taking into consideration the financial advantages. The performance of the recovery process is considerably influenced by the network bandwidth and the scalability of services. In other words, high network bandwidth with sustainable scalability of services ensures the rapid commencement of the recovery process. After a disaster, all operations can be re-executed within a few hours, depending on the compatibility of the IT structure and the cloud-based DR. It is worth noting that most of the data backup and recovery processes are fully automated and require either minimal or no human intervention [7, 24, 25, 37-38].

The primary importance of utilizing cloud architecture for implementing a DR strategy is the consequent increase in the overall resilience of the organization’s processes and applications. Most cloud service providers (CSPs) use the geographically distributed model of data backup and redundancy so that companies experiencing widespread outages in their networks due to a disaster can recover within a few hours and with minimal disruption [10]. For example, the Amazon cloud stores mission-critical customer applications at multiple geographically dispersed DCs and uses the “fail gracefully” design philosophy. If there is a momentary outage in an application at one location, the customer is notified immediately. The application is then automatically switched to another location while downstream circuit breakers prevent any failure of processes and interfaces that rely on that application [26]. An important concern in DR service delivery is continuity of service to enable applications to come back online very soon after a disaster [3, 15, 18, 21, 25, 27-28, 37].

There are numerous benefits of adopting disaster recovery in the cloud. Nevertheless, several weaknesses may prevent people from exploiting disaster recovery in the cloud. Table III summarizes the advantages and disadvantages of DR in the cloud [17, 29-30, 38].

TABLE III. ADVANTAGES AND DISADVANTAGES OF DISASTER RECOVERY IN THE CLOUD

| Advantages | Disadvantages |
|---|---|
| The arrival and maturity of cloud computing represent a paradigm shift where many of the same functionalities can be shifted to the cloud. | Customers may be concerned about security and data confidentiality since company data are transferred to a third party. |
| DRPs using cloud architecture are attractive for small and medium-sized enterprises. | A company has no control over where its data will be stored. |
| Companies can “outsource” their computing requirements and continuity planning to CSPs. | There have been many incidents of company insiders engaging in malpractice. |
| The cloud has a rapid turnaround time with outages lasting no more than a few hours. | Companies become dependent on the CSP. |
| In-house personnel can work with the CSP to redirect customers to the cloud during a disaster. | The long-term viability of the CSP becomes a source of concern for the company. |
| The entire process remains transparent to customers worldwide, who do not experience the effects of the disruption. | |

VII. ISSUES AND CHALLENGES OF DISASTER RECOVERY IN THE CLOUD
Since its adoption by a large number of corporations around the world, cloud computing has become an indispensable element in running the essential business operations of large, medium, and small-scale organizations. This is due to its unique ability to ensure the availability of services and provide resources that are efficient and reliable while maintaining a reasonable cost. The cloud model relies on the concept of pay-as-you-go, which means the user can request the needed resources from the cloud service provider and be billed according to the resources used. Many service models have been incorporated in cloud computing. These include but are not limited to Infrastructure as a Service (IaaS), Software as a Service (SaaS), and Platform as a Service (PaaS). Other service models are also offered by cloud providers, for instance, Database as a Service (DBaaS). Despite the tremendous benefits of cloud computing in running essential business operations, some organizations are reluctant to fully adopt the cloud paradigm due to security and privacy issues. Thus, cloud computing has not been fully exploited by many organizations.

A variety of reasons and challenges may prevent many organizations from moving towards disaster recovery planning in cloud computing. In the following, we outline the most critical factors that contribute to rejecting cloud computing [2, 13, 17, 22, 24-28, 31, 36-38]:
Lack of Full Control of Data: Sharing data with cloud providers can result in losing full control of the data. Since the data backup is executed by the cloud service provider, clients may be concerned about their dependency on the CPs and the risk of data loss. Hence, it is crucial for these organizations to select the most reliable service provider, one that can guarantee the integrity and the privacy of their data. Given these concerns, many organizations may be reluctant to move their businesses to the cloud.

Operation Cost: The cost of running the organization’s business on the cloud constitutes a critical factor that influences the decision to adopt it. However, the actual cost of running a business on the cloud is reduced after switching to a data recovery service. This reduction in the operating cost may attract many users to adopt the cloud as their preferred platform to run their businesses. The goal of any cloud service provider is to always propose an effective data recovery plan at the least cost. The operating cost of disaster recovery comprises the following components:

1) Setup and implementation cost, which denotes the cost of migrating and implementing the organization’s business on the cloud. This cost decreases over the long run for the business.
2) Operation cost, which represents the estimated cost of the daily activities that operate on the data. This includes the operating costs of data storage, transfer, and processing.
3) Disaster cost, which indicates the total cost of data recovery in the event of a disaster as well as the estimated cost of the damage for unrecoverable disasters. The potential cost of the disaster has a significant impact on the total cost of the services on the cloud.

Speed of Response in Failure Detection: The time taken to detect and report a system failure is crucial to sustaining a high level of availability and reliability. The speed of response to system failure determines the period in which the system is down and all services are inoperable. Therefore, it is an essential objective for any cloud provider to ensure a fast reaction to service disruption. However, in certain cases, multiple backup sites are engaged, which makes it difficult to immediately distinguish between service disruption and network failure and to take the necessary action for detecting and reporting the problem.

Security: A cyber-terrorist attack is a typical example of a man-made disaster whereby the system resources are attacked for a variety of reasons. Such attacks may cause data corruption and destroy the system. Hence, any form of data protection must ensure a high level of security and rapid data recovery. These constitute the key elements that influence any decision to adopt disaster recovery services.

Replication Latency: The concept of a disaster recovery plan relies on performing data backup through replication. Two strategies of data replication can be utilized, namely synchronous and asynchronous replication. The synchronous replication strategy aims at ensuring a high probability of fulfilling the requirements of the Recovery Point Objective (RPO) and Recovery Time Objective (RTO). Nevertheless, the synchronous replication strategy incurs higher cost than the asynchronous replication strategy, which in turn may negatively affect the system performance. A larger number of tiers in the web application leads to a significant increase in the Round Trip Time (RTT) between the primary site and the backup site. Although the asynchronous replication strategy is cheaper, it does not deliver the same quality of service for disaster recovery. Thus, organizations should strike a balance between cost and desired performance, taking into consideration the requirements of their particular situation. Furthermore, the latency in data replication constitutes a major concern for organizations when deciding whether to adopt the cloud as their preferred platform to run the business.

Security of Data Storage: One of the essential benefits of cloud services is that they offer an adequate solution to the issue of data storage. They allow organizations to store their data by providing unlimited space at a reasonable cost. The extensive usage of cloud services leads to a steady increase in the amount of data that must be stored. Cloud storage services offer greater flexibility and thus save budget. Using cloud storage requires less investment than purchasing conventional data storage devices. The architecture of a cloud storage system comprises four layers, namely physical storage, infrastructure management, application interface, and access. The smooth and reliable running of the applications requires a distributed computing environment that ensures availability, reliability of services, and balancing of the workload among all servers. However, data security requires centralized storage in which data are placed in one single storage point. This means that the security of the stored data is at high risk if any failure occurs at the cloud service provider.

Lack of Redundancy: When a disaster occurs at the primary site running the services, the cloud service provider immediately activates the secondary site and redirects the incoming requests and services toward the secondary site to ensure the continuity of the business. Running services on the secondary site has negative implications for the future data backup process, as no replication technique (synchronous or asynchronous) can be performed. The inability to run future data backups due to the outage of the primary site thus increases the risk of data loss, since only one single local storage (the secondary site) is available. However, this issue can be easily resolved once the primary site is restored. Overall, any disaster recovery strategy should provide the best solution possible to ensure the precise assessment of all the potential types of risk and examine their negative implications.

VIII. SOLUTIONS OF DISASTER RECOVERY IN CLOUD COMPUTING
This section examines several of the proposed solutions that are relevant to disaster recovery in the cloud computing environment. We attempt to evaluate these solutions by highlighting the merits and limitations of each approach. A summary table given at the end describes some characteristics of the works considered.

The study completed by Pokharel et al. [32] introduces the Geographical Redundancy Approach (GRA) to disaster recovery in the cloud system. GRA is analyzed using the Markov model, and the experiment result shows that it
accomplishes high availability and survivability while sustaining low downtime and low cost. However, the proposed approach is not evaluated in terms of RTO and RPO, which are important measures in evaluating any DR solution. Most importantly, the proposed solution fits only single-cloud systems and may not apply to multi-cloud systems where multiple remote independent clouds are interconnected.

A comprehensive survey is offered by Wood et al. [27], who list the current disaster recovery solutions and practices, concentrating on the most critical factors that affect the disaster recovery process. Three categories of disaster recovery mechanisms are defined: the hot backup site, the warm backup site, and the cold backup site. The study also discusses the failover and failback that may occur in the event of a disaster, emphasizing how to restore control to the primary site and ensure the continuity of business-critical services.

The study completed by Jian-hua and Nan [33] describes the typical cloud storage architecture, which consists of a storage layer, an infrastructure management layer, an application interface layer, and an access layer. It also explains the typical architecture of disaster recovery deployment in a cloud system that manages the cloud storage in the inter-private cloud model. It stores the application data in a server that is remotely connected to another set of backup servers distributed over different areas. Each backup server has two further backup servers, the local backup server (LBS) and the remote backup server (RBS). An incremental data backup approach is used to progressively update the data in order to decrease the usage of network bandwidth and accelerate the data backup process. Several enhancements to the service experience, including compressing and encrypting the data before the backup process, reduce data traffic and transmission cost. The model is designed to work in a single-cloud environment that replicates the original data. Relying on a single replica is a critical weakness that increases the risk of data loss, particularly in the event of a disaster.

The work introduced by Javaraiah [34] highlights the issue of online data backup in cloud computing systems. The approach concentrates on managing the data backup process on the consumer’s premises to reduce cost. The approach is designed to handle complicated issues associated with the online data backup process in the cloud along with DR. Among the critical issues considered is eliminating the dependency on other cloud providers when performing the data backup operation. Various experiments are conducted, and the results show that the proposed solution achieves low-cost data backup and simplifies the migration of data from one CP to another. Nevertheless, this work is limited as it focuses exclusively on the issue of data DR in a single-cloud environment and does not address the maintenance of business services during and after a disaster.

Sengupta and Annervaz [31] address the issue of disaster recovery in a multi-site architecture where the data backup resides in multiple distributed locations. A Data Distribution Plan for multi-site Disaster Recovery (DDP-DR) is proposed that offers different plans for data distribution based on Protection Level (PL) and Placement Constraint (PC). PL denotes the degree of reliability required by the client against simultaneous datacenter failures, while PC denotes the constraint on some DC locations either to be included in or excluded from the list of potential locations for the data backup. DDP-DR derives the optimal plan based on the most critical business and operational factors, such as the cost of data storage and replication, Recovery Time Objective (RTO), and Recovery Point Objective (RPO). Several experiments are conducted to evaluate the efficiency of the proposed solution in different scenarios. However, the proposed solution does not compute the network cost of data transmission during the backup process and is limited to one client. In some real-life scenarios, there may be more than one client within the same DR architecture.

Grolinger et al. [35] discuss the problem of disaster data management. They emphasize that most of the current data management solutions designed for disaster recovery lack the integration capabilities needed to minimize the negative impact on user data. The proposed framework, called Knowledge as a Service (KaaS), handles the cloud data management process during a disaster. It stores as much of the disaster-related data as possible, thus sustaining the interoperability and the integration of the data. Facilitating data integration relies on knowledge acquisition and knowledge delivery. Knowledge acquisition includes information extraction and retrieval to develop a sound structure for the disaster data, while knowledge delivery is used to integrate information from different data sources and forward it to the target clients. However, the proposed framework is not tested and evaluated empirically to determine its efficiency and effectiveness. Moreover, the issue of disaster recovery in multi-site settings, where backup data need to be distributed among several remote locations, is not discussed. Lastly, the proposed solution does not address the issue of deriving the optimal plan for data backup during the disaster.

Saquib et al. [6] proposed a new model named Disaster Recovery as a Service for database applications in cloud computing systems. The proposed model provides a solution for disaster recovery with zero data loss and fast recovery. The proposed model exploits synchronous data replication to ensure minimum RPO and RTO. However, the study lacks an empirical comparison with other cloud-based disaster recovery solutions that would determine its effectiveness. Moreover, the solution is limited to single-cloud systems and may not fit multi-cloud systems.

Satoshi Togawa and Kazuhide Kanenishi [14] introduce a new framework of disaster recovery for e-learning systems that sustains business operations during natural disasters such as earthquakes and tsunamis. A prototype that works in a private cloud model is developed based on IaaS architecture. The proposed framework incorporates a distributed storage system to ensure that the framework continues sustaining e-learning services even after the disaster. Several experiments are conducted that demonstrate its effectiveness. However, the work fails to examine the framework in terms of the critical business operational metrics of cost, RTO, and RPO. In addition, it is tailored to work in a single cloud environment
where only a single data backup is performed. Any failure in the backup site may thus result in data loss and long-term service disruption.

Lenk [8] focuses on the issue of data deployment for distributed systems in the event of a disaster. The proposed deployment method utilizes the Cloud Standby Disaster Recovery for warm standby in the cloud and runs on different clouds with many cloud providers. The method enables independent and automated data deployment. The method is tested in several experiments, and the results show that the recovery time is reduced significantly. Nevertheless, the fault-tolerance of the deployment method is not investigated.

Jena and Mohanty [9] investigate the issue of disaster recovery in intercloud systems, exploiting the genetic algorithm for resource allocation. The main aim is to provide fast-track and balanced mapping procedures for impatient tasks in the cloud system. The proposed approach utilizes the genetic algorithm and Pareto optimal mapping to manage resource allocation while sustaining a high utilization rate of the processors, high throughput, and a low carbon footprint. A variety of experiments are conducted to evaluate the performance of the proposed approach. The proposed solution is tested by producing the optimal plan for resource allocation for impatient tasks. Nevertheless, the proposed solution is limited to a single cloud with multiple data centers distributed over many remote locations. In addition, the work does not consider managing the replication plan so as to generate a minimum number of data backups without compromising the user's reliability requirements. Also, the algorithm is not evaluated in terms of Recovery Time Objective and Recovery Point Objective. These two parameters are essential in the investigation of disaster recovery solutions in cloud computing systems.

Sabbaghi et al. [3] propose a framework formed by integrating five essential types of proven redundancy techniques that have a major impact on the uptime of services in cloud DCs. This work focuses on how disasters can be controlled in a cloud computing DC and how to keep the organization’s business running in the event of a disaster. The proposed framework is evaluated through a survey of networking professionals and experts. The results are provided for evaluation but do not include the performance metrics RTO and RPO. Table IV summarizes the previous approaches of DR in the cloud computing environment.
TABLE IV. SUMMARY OF PREVIOUS APPROACHES OF DISASTER RECOVERY IN THE CLOUD

| Author and Year | Type of Cloud | Scope | DR CP No. | Parameters | Limitations |
|---|---|---|---|---|---|
| Pokharel et al. [32] | Single Cloud | DR | 1 | Infrastructure cost, Downtime | Did not discuss RTO and RPO |
| Wood et al. [27] | Single Cloud | DR | 1 | Cost, RTO, RPO, Performance | Did not provide RTO and RPO analysis to ensure continuity |
| Jian-hua & Nan [33] | Single Cloud | DR | 1 | Storage cost | Did not present RTO and RPO analysis; no experimental result |
| Javaraiah [34] | Single Cloud | DR | 1 | Infrastructure cost | Did not discuss the parameters RTO and RPO; did not ensure continuity |
| Sengupta & Annervaz [31] | Single Cloud | DR | 1 | Storage cost, Protection level, RTO, RPO | The proposed model only considered the case of one customer, single-cloud multiple DCs |
| Grolinger et al. [35] | Single Cloud | DR | 1 | Storage space | Did not discuss data recovery; did not provide the full framework; did not use the performance metrics RTO and RPO to test the framework |
| Saquib et al. [6] | Single Cloud | DR and BC | 1 | Infrastructure cost, RTO, RPO | Did not provide performance analysis; did not ensure BC |
| Togawa & Kanenishi [14] | Single Cloud | DR and BC | 1 | Migration Time | Did not discuss the parameters RTO and RPO; did not ensure BC |
| A. Lenk [8] | Single Cloud | DR | 1 | Cost, Time, RPO | Did not discuss the parameters RTO and RPO |
| Jena & Mohanty [9] | Single Cloud | DR | 1 | Cost, Time | Did not discuss the parameters RTO and RPO |
| Sabbaghi et al. [3] | Single Cloud | DR | 1 | Cost, Time | Did not discuss the parameters RTO and RPO |
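
Since RTO and RPO recur throughout the Limitations column of Table IV, a small calculation may help make the two metrics concrete. The following Python sketch is illustrative only and is not taken from any of the surveyed works: the `DrPlan` fields, the simple additive model, and the numbers for the hypothetical "cold" and "warm" plans are all assumptions chosen for the example. It treats the worst-case RPO as one backup interval plus the replication lag, and the RTO as the sum of sequential recovery phases.

```python
from dataclasses import dataclass


@dataclass
class DrPlan:
    """Hypothetical DR plan description; every field is an illustrative assumption."""
    backup_interval_min: float   # time between successive backups
    replication_lag_min: float   # delay until a backup is durable at the DR site
    detect_min: float            # time needed to detect the disaster
    provision_min: float         # time needed to bring up standby resources
    restore_min: float           # time needed to restore data and redirect clients


def worst_case_rpo(plan: DrPlan) -> float:
    """Upper bound on the data-loss window, in minutes.

    If the primary fails just before the next backup becomes durable, the
    newest usable copy is one full interval plus the replication lag old.
    """
    return plan.backup_interval_min + plan.replication_lag_min


def estimated_rto(plan: DrPlan) -> float:
    """Downtime estimate, in minutes, assuming the recovery phases run in sequence."""
    return plan.detect_min + plan.provision_min + plan.restore_min


def meets_objectives(plan: DrPlan, rpo_target_min: float, rto_target_min: float) -> bool:
    """Check a plan against the RPO and RTO targets set by the business."""
    return (worst_case_rpo(plan) <= rpo_target_min
            and estimated_rto(plan) <= rto_target_min)


# Two made-up plans, loosely in the spirit of cold and warm backup sites.
cold = DrPlan(backup_interval_min=1440, replication_lag_min=60,
              detect_min=15, provision_min=240, restore_min=180)
warm = DrPlan(backup_interval_min=60, replication_lag_min=5,
              detect_min=5, provision_min=20, restore_min=15)

for name, plan in (("cold", cold), ("warm", warm)):
    print(name, worst_case_rpo(plan), estimated_rto(plan),
          meets_objectives(plan, rpo_target_min=120, rto_target_min=60))
```

Under these assumed numbers the warm plan satisfies a two-hour RPO target and a one-hour RTO target while the cold plan does not. A real evaluation would measure these quantities rather than assume them.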
IX. DISCUSSION AND FUTURE WORK RECOMMENDATION

This section highlights and discusses the issues and challenges relevant to DR examined in this paper. It also presents the future directions for DR in cloud computing. Most of today’s company services rely on IT systems, some of them being of critical importance to society, such as financial services and health care services. Even a very short period of downtime or a very small amount of data loss may result in huge economic losses or social problems. Therefore, most important business and public services use a DR mechanism to protect their critical data and minimize the downtime caused by catastrophic system faults. Among the technologies adopted in DR systems are asynchronous backup, continuous synchronization of data, and the preparation of standby systems in geographically separated places. During the past decade, cloud computing has emerged as the new service paradigm and is gaining in popularity. A vast number of services are now being built on the cloud platform. These services utilize the resources of a cloud platform with a pay-as-you-go pricing model. The on-demand nature of cloud computing vastly reduces the cost and RTO of
DR whose peak resource demands are much higher than average demands. However, data DR represents a kind of service that possesses the highest data reliability requirements. How to perform a data DR service using the cloud computing paradigm so as to maximize data reliability while reducing cost and RTO still constitutes a challenge. Similar to other computer systems, cloud computing systems also face risks relating to dependency, failure detection, security, human-caused damage, natural disasters, and the like. All of these risks may lead to cloud service interruption or even loss of data. To ensure high data reliability, CSPs deploy several data protection strategies. For example, popular distributed storage systems currently used in cloud platforms, such as Amazon S3, Google GFS, and Apache HDFS, have adopted a 3-replica data redundancy mechanism by default. However, in the case of an entire data center failure, data may still be lost. In order to avoid this problem, some CSPs use geographical data dispersion to protect the most critical data. Nevertheless, data centers in distinct locations owned by one CSP share a similar software stack, infrastructure purchased in bulk, operating mechanisms, and management team, so there are still risks of multiple data center failures due to common causes shared across data centers. Also, the number of data centers owned by one CSP is limited. In case some of them become unreachable, the surviving data centers may not be suitable for customers due to geographical distance, especially in the event of emergency data restoration. Thus, no matter how many preventive measures are taken, the possibility of data reliability disruption in a cloud cannot be ignored. According to public reports, even the most advanced cloud services have encountered several instances of wide-area outages and the shutting down of public services. Therefore, the best solution for a DR service is to utilize multiple data centers from different CSPs. Some researchers have focused on how to back up data in a cloud computing environment. Javaraiah [34], for example, introduces online backup and DR and eliminates the dependency on CPs. Sengupta and Annervaz [31] proposed a plan for multi-site DR where backup data can reside in multiple data centers, including the public cloud.

DR in cloud computing has the potential to become a frontrunner in promoting a secure, virtual, and economically viable IT solution in the future. One of the challenges for data management in a cloud environment is how to design a model that delivers data storage at low cost and low RTO with high data reliability. The most critical issues relevant to DR in cloud computing are summarized below:

Cloud Data Storage: DR in the cloud has potential side effects on data availability and data access performance. Moreover, it inevitably reduces the replication level of cloud data, and the location of replicas becomes more important, which calls for further research focusing on data access performance.

Cost-effective: The cost-effective cloud data storage solution is still at its validation stage, where the approaches provided are based on experimental environments. Therefore, further work is needed that focuses on implementing a prototype of the solution in the cloud.

Privacy and Confidentiality: A significant and critical issue is that cloud data storage must guarantee the privacy and confidentiality of the data used for DR. Therefore, an effective approach that addresses the issue of privacy and data confidentiality in cloud data storage is required.

X. CONCLUSION

This paper has discussed and examined the issue of disaster recovery in the cloud computing environment. An in-depth analysis of the state of the art for DR in cloud computing has been given, together with an overview of the process of disaster recovery for computer systems. The elements of DR in cloud computing have been reported, including an overview, definition, and types of DR. Also discussed were the details of cloud-based DR analyzed using traditional approaches. In addition, we identified the main issues and challenges of DR mechanisms that need to be resolved. Several disaster recovery platforms have been described. A comprehensive review of the previous studies of DR in the cloud, in both public cloud and privately-owned resources, has been conducted. The paper concludes that data DR services must ensure reliability and flexibility through an effective and practical DR plan, which constitutes a vital initiative for any organization to prosper and sustain growth. Finally, the paper has examined the current trends in the area of disaster recovery in cloud computing and has pointed out future work directions in the field of cloud-based DR to identify the most recent issues and challenges that need to be explored further.

REFERENCES

[1] Alzain MA, Soh B, Pardede E (2011). MCDB: Using Multi-clouds to Ensure Security in Cloud Computing. 2011 IEEE Ninth International Conference on Dependable, Autonomic and Secure Computing, Sydney, NSW, Australia.
[2] Tebaa M, Hajji SEL (2014). From Single to Multi-clouds Computing Privacy and Fault Tolerance. IERI Procedia, 10, 112-118.
[3] Sabbaghi F, Mahboubi A, Othman SH (2017). Hybrid Service for Business Contingency Plan and Recovery Service as a Disaster Recovery Framework for Cloud Computing. Journal of Soft Computing and Decision Support Systems, 4(4), 1-10.
[4] Chen D, Zhao H (2012). Data Security and Privacy Protection Issues in Cloud Computing. 2012 International Conference on Computer Science and Electronics Engineering, Hangzhou, China.
[5] Marston S, Li Z, Bandyopadhyay S, Zhang J, Ghalsasi A (2011). Cloud computing – The business perspective. Decision Support Systems, 51(1), 176-189.
[6] Saquib Z, Tyagi V, Bokare S, Dongawe S, Dwivedi M, Dwivedi J (2013). A new approach to disaster recovery as a service over cloud for database system. 2013 15th International Conference on Advanced Computing Technologies (ICACT), Rajampet, India.
[7] Suguna S, Suhasini A (2014). Overview of data backup and disaster recovery in cloud. International Conference on Information Communication and Embedded Systems (ICICES2014), Chennai, India.
[8] Lenk A (2015). Cloud Standby Deployment: A Model-Driven Deployment Method for Disaster Recovery in the Cloud. IEEE 8th International Conference on Cloud Computing, New York, USA.
[9] Jena T, Mohanty J (2016). Disaster recovery services in intercloud using genetic algorithm load balancer. International Journal of Electrical and Computer Engineering (IJECE), 6(4), 1828-1838.
[10] Prazeres A, Lopes E (2013). Disaster Recovery – A Project Planning Case Study in Portugal. Procedia Technology, 9, 795-805.
[11] Matos R, Andrade EC, Maciel P (2014). Evaluation of a disaster recovery solution through fault injection experiments. 2014 IEEE
International Conference on Systems, Man, and Cybernetics (SMC), San Diego, CA, USA.
[12] Andrade E, Nogueira B (2018). Performability Evaluation of a Cloud-Based Disaster Recovery Solution for IT Environments. Journal of Grid Computing, 16(2), 1-19.
[13] Yang P, Kong B, Li J, Lu M (2010). Remote disaster recovery system architecture based on database replication technology. 2010 International Conference on Computer and Communication Technologies in Agriculture Engineering, Chengdu, China.
[14] Togawa S, Kanenishi K (2013). Private Cloud Cooperation Framework of E-Learning Environment for Disaster Recovery. 2013 IEEE International Conference on Systems, Man, and Cybernetics, Manchester, UK.
[15] Chang V (2015). Towards a Big Data system disaster recovery in a Private Cloud. Ad Hoc Networks, 35, 65-82.
[16] Alshammari MM, Alwan AA, Nordin A, Al-Shaikhli IF (2017). Disaster recovery in single-cloud and multi-cloud environments: Issues and challenges. 4th IEEE International Conference on Engineering Technologies and Applied Sciences (ICETAS), Bahrain.
[17] Alhazmi OH (2016). A Cloud-Based Adaptive Disaster Recovery Optimization Model. Computer and Information Science, 9(2), 58.
[18] Alshammari MM, Alwan AA, Nordin A, Abualkishik AZ (2018). Disaster Recovery with Minimum Replica Plan for Reliability Checking in Multi-Cloud. Procedia Computer Science, 130(C), 247-254.
[19] Lenk A, Tai S (2014). Cloud Standby: Disaster Recovery of Distributed Systems in the Cloud. New York, USA.
[20] Osama E-T, Munir M, Lela P (2016). Assessing IT disaster recovery plans: The case of publicly listed firms on Abu Dhabi/UAE security exchange. Information and Computer Security, 24(5), 514-533.
[21] Alshammari MM, Alwan AA (2018). Disaster Recovery and Business Continuity of Database Services in Multi-Cloud. International Conference on Computer Applications & Information Security, ICCAIS, Riyadh, Saudi Arabia.
[22] Khoshkholghi MA, Abdullah A, Latip R, Subramaniam S, Othman M (2014). Disaster recovery in cloud computing: A survey. Computer and Information Science, 7(4), 39-54.
[23] Ameigeiras P, Ramos-Muñoz JJ, Schumacher L, Prados-Garzon J, Navarro-Ortiz J, López-Soler JM (2015). Link-level access cloud architecture design based on SDN for 5G networks. IEEE Network, 29(2), 24-31.
[24] Chintureena SV (2014). Ensured Availability of resources in a highly reliable mode through Enhanced approaches for Effective Disaster Management in Cloud. International Conference on Electronics and Communication System (ICECS), Coimbatore, India.
[25] Aobing S, Tongkai J, Qiang Y, Song Y (2013). Virtual machine scheduling, motion and disaster recovery model for IaaS cloud computing platform. IEEE Conference Anthology, China.
[26] Jaiswal V, Sen A, Verma A (2014). Integrated Resiliency Planning in Storage Clouds. IEEE Transactions on Network and Service Management, 11(1), 3-14.
[27] Wood T, Cecchet E, Ramakrishnan KK, Shenoy PJ, van der Merwe JE, Venkataramani A (2010). Disaster Recovery as a Cloud Service: Economic Benefits & Deployment Challenges. HotCloud, 10, 8-15.
[28] Liu G, Shen H (2017). Minimum-Cost Cloud Storage Service Across Multiple Cloud Providers. IEEE/ACM Transactions on Networking, 25(4), 2498-2513.
[29] Shi X, Guo K, Lu Y, Chen X (2014). Survey on Data Recovery for Cloud Storage. International Conference on Trustworthy Computing and Services, Beijing, China.
[30] Attiya I, Zhang X (2017). Cloud Computing Technology: Promises and Concerns. International Journal of Computer Applications, 159(9), 32-37.
[31] Sengupta S, Annervaz KM (2012). Planning for Optimal Multi-site Data Distribution for Disaster Recovery. International Workshop on Grid Economics and Business Models, Paphos, Cyprus.
[32] Pokharel M, Lee S, Park JS (2010). Disaster Recovery for System Architecture Using Cloud Computing. 2010 10th IEEE/IPSJ International Symposium on Applications and the Internet, Seoul, South Korea.
[33] Jian-hua Z, Nan Z (2011). Cloud Computing-based Data Storage and Disaster Recovery. 2011 International Conference on Future Computer Science and Education, Xi'an, China.
[34] Javaraiah V (2011). Backup for cloud and disaster recovery for consumers and SMBs. 5th IEEE International Conference on Advanced Telecommunication Systems and Networks (ANTS), Bangalore, India.
[35] Grolinger K, Capretz MAM, Mezghani E, Exposito E (2013). Knowledge as a Service Framework for Disaster Data Management. 2013 Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprises, Hammamet, Tunisia.
[36] Sengupta S, Annervaz KM (2014). Multi-site data distribution for disaster recovery – A planning framework. Future Generation Computer Systems, 41, 53-64.
[37] Alshammari MM, Alwan AA, Nordin A, Abualkishik AZ (2020). Data backup and recovery with minimum replica plan in multi-cloud environment. International Journal of Grid and High Performance Computing, 12(2), 201-120.
[38] Al-Shammari MM, Alwan AA (2018). Disaster Recovery and Business Continuity for Database Services in Multi-Cloud. Proceedings of the 1st International Conference on Computer Applications & Information Security (ICCAIS’ 2018), 4-6 April 2018, Riyadh, Saudi Arabia.