
CH 09

This document discusses various aspects of designing redundancy and business continuity plans. It covers topics like designing redundant systems using techniques like RAID and failover clusters to prevent single points of failure. It also discusses backup strategies and disaster recovery plans. Finally, it discusses testing plans and procedures to ensure business continuity in the event of a disaster or outage.


Ch 9: Preparing for Business Continuity

CompTIA Security+: Get Certified Get Ahead: SY0-401 Study Guide
Darril Gibson
Designing Redundancy
Redundancy
• Duplication of systems to provide availability
• Fault Tolerant
– Prevents a fault from leading to a failure
• Fault
– A device stops working
• Failure
– Users stop receiving service
Examples of Redundant Systems

• Disk redundancy: RAID (Redundant Array of Independent Disks)
• Server redundancy: Failover cluster
• Power redundancy: Add generator or UPS
• Site redundancy: Add hot, cold, or warm sites
Single Point of Failure
• A single component which can cause a failure
• Examples
– Disk: a server with a single disk drive instead of a
RAID
– Server: a standalone server instead of a cluster
– Power: Relying only on the power grid without
UPS or generator
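A rough way to see why a single point of failure matters is to compare components in series (every one must work) with redundant components in parallel (any one is enough). This Python sketch assumes components fail independently, which real hardware sharing a room, power feed, or firmware often does not, and the availability figures are hypothetical.

```python
from math import prod


def series_availability(availabilities: list[float]) -> float:
    """Every component must work (no redundancy), so availabilities multiply."""
    return prod(availabilities)


def parallel_availability(availabilities: list[float]) -> float:
    """Redundant copies: service fails only if every copy fails at once."""
    return 1 - prod(1 - a for a in availabilities)


if __name__ == "__main__":
    disk = 0.99   # hypothetical availability of one disk
    psu = 0.995   # hypothetical availability of one power supply
    mirrored = parallel_availability([disk, disk])
    print(f"Single disk:         {disk:.4f}")
    print(f"Mirrored disks:      {mirrored:.4f}")
    print(f"Mirror + single PSU: {series_availability([mirrored, psu]):.4f}")
```

In this made-up example the mirrored disks reach 0.9999, but the whole system still ends up below the lone power supply's 0.995: the remaining non-redundant component caps availability.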
RAID Types
• RAID-0 (Striping)
– Increases speed, but provides no fault tolerance
• RAID-1 (Mirroring)
– Two disks with identical data
– Fault tolerant, but a single disk controller is still a single point of failure
– RAID-1 with two disk controllers is called disk
duplexing
RAID Types
• RAID-5
– Three or more disks
– Parity data is spread across the disks, one parity block per stripe
– If one drive fails, the "Parity" data can be used to
recover the data
– If two or more drives fail, data is lost
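The parity idea can be sketched with XOR, which is how RAID-5 parity is commonly computed: the parity block is the XOR of the data blocks in a stripe, and XOR-ing the surviving blocks recreates any single missing block. This is a simplified single-stripe model; real arrays rotate parity across the disks and work on much larger blocks.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data_blocks: list[bytes]) -> bytes:
    # The parity block for a stripe is the XOR of that stripe's data blocks.
    return xor_blocks(data_blocks)


def rebuild_missing(surviving_blocks: list[bytes]) -> bytes:
    # XOR of all surviving blocks (data and parity) recreates the single lost block.
    return xor_blocks(surviving_blocks)


if __name__ == "__main__":
    d1, d2, d3 = b"ABCD", b"1234", b"wxyz"   # data blocks on disks 1-3
    parity = make_parity([d1, d2, d3])       # parity block on disk 4
    # Disk 2 fails: rebuild its block from the remaining data blocks and parity.
    print(rebuild_missing([d1, d3, parity]))  # b'1234'
```

With two blocks missing there are two unknowns and only one parity equation, which is why a second drive failure loses data.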
RAID Types
• RAID-6
– Like RAID-5, but adds a second parity block
– Continues to operate even if two drives fail
– Requires at least four disks
• RAID-10
– Combines mirroring (RAID-1) and striping (RAID-0): data is striped across mirrored pairs
– Requires at least four disks
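A small helper can summarize the trade-off between the RAID levels above: usable capacity versus how many disk failures each level is guaranteed to survive. This sketch assumes equal-size disks and ignores hot spares; for RAID-1 it treats every disk as a full copy, and the disk counts in the example are arbitrary.

```python
def raid_summary(level: str, disks: int, disk_tb: float) -> tuple[float, int]:
    """Return (usable TB, number of disk failures the array is guaranteed to survive)."""
    minimum = {"0": 2, "1": 2, "5": 3, "6": 4, "10": 4}
    if level not in minimum:
        raise ValueError(f"unsupported RAID level: {level}")
    if disks < minimum[level]:
        raise ValueError(f"RAID-{level} needs at least {minimum[level]} disks")
    if level == "0":
        return disks * disk_tb, 0            # striping only, no redundancy
    if level == "1":
        return disk_tb, disks - 1            # every disk holds a full copy
    if level == "5":
        return (disks - 1) * disk_tb, 1      # one disk's worth of parity
    if level == "6":
        return (disks - 2) * disk_tb, 2      # two disks' worth of parity
    if disks % 2:
        raise ValueError("RAID-10 needs an even number of disks")
    return disks // 2 * disk_tb, 1           # half the disks are mirror copies


if __name__ == "__main__":
    for level, n in [("0", 4), ("1", 2), ("5", 4), ("6", 4), ("10", 4)]:
        usable, survives = raid_summary(level, n, 2.0)
        print(f"RAID-{level:<2} {n} x 2 TB -> {usable:.0f} TB usable, survives {survives} failure(s)")
```

RAID-10 can sometimes survive two failures if they land in different mirrored pairs, but it is only guaranteed to survive one, so the helper reports 1.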
Software vs. Hardware RAID
• Hardware RAID
– Better performance
– Removes load from the operating system
– Often "hot swappable" – replace a disk with zero
downtime
• Software RAID
– Implemented in software by the operating system
– Rarely used
Server Redundancy
• 99.999% uptime (five nines) means about 5 minutes (5.26 minutes) of downtime per year
• Failover clusters
– Two or more servers
– One or more are active
– One or more are inactive
– When an active node fails, an inactive node takes over
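The five-nines figure follows from the number of minutes in a year. A minimal Python check, assuming a 365-day year:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; ignores leap years


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Allowed downtime per year for a given availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    for nines in (99.9, 99.99, 99.999):
        print(f"{nines}% -> {downtime_minutes_per_year(nines):.2f} minutes/year")
```

99.999% works out to about 5.26 minutes per year, while 99.9% allows roughly 8.8 hours.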
Failover Cluster

• Two networks
• Heartbeat (periodic status messages between nodes) goes through the internal network
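The heartbeat logic can be sketched as a timeout check: if the active node has not sent a heartbeat within a set window, the standby node is promoted. The `Node` class, the node names, and the 5-second timeout are hypothetical; real cluster software also has to stop both nodes from going active at once (split-brain), which this sketch ignores.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    last_heartbeat: float  # time (in seconds) the last heartbeat was received
    active: bool = False


def check_failover(active: Node, standby: Node, now: float, timeout: float = 5.0) -> Node:
    """Promote the standby if the active node's last heartbeat is older than `timeout`.

    Returns whichever node is active after the check.
    """
    if now - active.last_heartbeat > timeout:
        active.active = False
        standby.active = True
        return standby
    return active


if __name__ == "__main__":
    node_a = Node("node-a", last_heartbeat=100.0, active=True)
    node_b = Node("node-b", last_heartbeat=100.0)
    print(check_failover(node_a, node_b, now=103.0).name)  # node-a: heartbeat is fresh
    print(check_failover(node_a, node_b, now=110.0).name)  # node-b: node-a went silent
```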
Load Balancers
• Distributes traffic to a cluster of servers
• Scalability and high availability
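Round robin is one common way a load balancer distributes traffic; this sketch also skips servers marked unhealthy, which is where the high-availability benefit comes from. The class and server names are hypothetical, and real load balancers offer other algorithms (such as least connections) and run their own health checks.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out servers in turn, skipping any marked unhealthy."""

    def __init__(self, servers: list[str]) -> None:
        self.servers = list(servers)
        self.healthy = set(servers)
        self._ring = cycle(self.servers)

    def mark_down(self, server: str) -> None:
        self.healthy.discard(server)

    def mark_up(self, server: str) -> None:
        self.healthy.add(server)

    def next_server(self) -> str:
        # Try each server at most once per call so we never loop forever if all are down.
        for _ in range(len(self.servers)):
            server = next(self._ring)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")


if __name__ == "__main__":
    lb = RoundRobinBalancer(["web1", "web2", "web3"])
    print([lb.next_server() for _ in range(3)])  # ['web1', 'web2', 'web3']
    lb.mark_down("web2")
    print([lb.next_server() for _ in range(4)])  # ['web1', 'web3', 'web1', 'web3']
```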
Power Redundancies
• UPS (Uninterruptible Power Supply)
– Contains batteries
– Provides power for a specified duration if main
power fails, often 10-15 min.
– Can also protect devices from power surges and
spikes
• Generators
– Provide longer-term power during extended
power outages
UPS Purpose
• UPS provides a few minutes of power, so the
server can:
– Shut down cleanly
– Wait while generators are started
– Wait for commercial power to return
Generators
• Diesel generators are common
• Takes some time to start up
• Power takes some time to stabilize
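Whether a UPS can bridge the gap until a generator is ready is simple arithmetic. This sketch assumes an idealized runtime estimate (battery energy times inverter efficiency, divided by load) and a 1.5x safety margin; the 0.9 efficiency, the margin, and the example values are assumptions. Actual runtime drops at high load and as batteries age, so a vendor's runtime chart is the better source.

```python
def ups_runtime_minutes(battery_wh: float, load_w: float, inverter_efficiency: float = 0.9) -> float:
    """Idealized runtime: usable battery energy divided by the load (assumed 90% efficient inverter)."""
    return battery_wh * inverter_efficiency / load_w * 60


def ups_bridges_generator(runtime_min: float, start_min: float, stabilize_min: float,
                          margin: float = 1.5) -> bool:
    """True if the UPS outlasts generator start-up plus stabilization, with a safety margin."""
    return runtime_min >= (start_min + stabilize_min) * margin


if __name__ == "__main__":
    runtime = ups_runtime_minutes(battery_wh=1000, load_w=3000)  # hypothetical UPS and load
    print(f"Estimated runtime: {runtime:.0f} minutes")
    print("Bridges generator start:", ups_bridges_generator(runtime, start_min=1, stabilize_min=1))
```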
Protecting Data with Backups
Backups
• Extra copies of data
– Off-site storage
• Allow recovery after a data loss
– Usually due to human error
• Fault tolerance is NOT the same as backup
– Fault tolerance provides only availability
• If you accidentally delete a file, RAID doesn't save you: the deletion is written to the whole array
Backup Types
• Usually on tape or removable hard disks
• Full backup
– Complete copy of all the data
• Differential backup
– Backs up data that has changed since the last full
backup
• Incremental backup
– Backs up data that has changed since the last full or incremental backup
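One common way backup software tracks what has changed is the archive bit: it is set when a file is modified, full and incremental backups clear it, and differential backups leave it set. This sketch models that behavior with a dictionary; it is a simplification, and many modern tools track changes with timestamps or change journals instead.

```python
def run_backup(kind: str, archive_bit: dict[str, bool]) -> list[str]:
    """Return the files copied by one backup job, updating archive bits in place.

    archive_bit maps filename -> True if the file changed since its bit was last cleared.
    """
    if kind == "full":
        copied = sorted(archive_bit)
    elif kind in ("differential", "incremental"):
        copied = sorted(name for name, changed in archive_bit.items() if changed)
    else:
        raise ValueError(f"unknown backup type: {kind}")
    if kind in ("full", "incremental"):
        for name in copied:
            archive_bit[name] = False  # full and incremental reset the bit
    # A differential leaves the bit set, so the next differential copies the file again.
    return copied


if __name__ == "__main__":
    files = {"a.txt": True, "b.txt": True}
    print(run_backup("full", files))          # ['a.txt', 'b.txt']
    files["a.txt"] = True                     # a.txt edited on Monday
    print(run_backup("differential", files))  # ['a.txt']
    files["b.txt"] = True                     # b.txt edited on Tuesday
    print(run_backup("differential", files))  # ['a.txt', 'b.txt']
```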
Backup Comparison
• Full backup
– Most expensive: uses the most time and tape
– Easiest to restore data
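• Differential backup
– Cheaper and faster than a full backup
– Requires two tapes to restore data: the last full backup and the most recent differential
• Incremental backup
– Cheapest and fastest
– May require several tapes to restore data: the last full backup and every incremental since then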
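The tape counts above can be worked out from the backup history. This sketch assumes a schedule uses either differentials or incrementals after each full backup, not a mix, and lists backups oldest first.

```python
def tapes_to_restore(history: list[str]) -> list[int]:
    """Return indexes of the backups needed to restore to the latest point in `history`.

    `history` lists backup types oldest first, e.g. ["full", "incremental", "incremental"].
    """
    fulls = [i for i, kind in enumerate(history) if kind == "full"]
    if not fulls:
        raise ValueError("no full backup to restore from")
    last_full = fulls[-1]
    since_full = list(range(last_full + 1, len(history)))
    kinds = {history[i] for i in since_full}
    if kinds == {"differential"}:
        return [last_full, since_full[-1]]  # the full plus only the newest differential
    if kinds <= {"incremental"}:
        return [last_full] + since_full     # the full plus every incremental, in order
    raise ValueError("mixed or unknown backup types after the last full backup")


if __name__ == "__main__":
    print(tapes_to_restore(["full", "differential", "differential", "differential"]))  # [0, 3]
    print(tapes_to_restore(["full", "incremental", "incremental", "incremental"]))     # [0, 1, 2, 3]
```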
Testing Backups
• Restore some data
• Regular tests are essential
• Otherwise backup procedures can fail and
remain unnoticed for a long time
• Symform
– Donate space on your servers to the system
– Store your data on other members' servers
– It's like a RAID with 96 disks; 32 of them are parity
– BUT: I suspect they have a single point of failure at
AWS
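A restore test is only useful if someone checks the restored data. One way is to compare SHA-256 hashes of the restored files with the originals. This sketch assumes the originals haven't changed since the backup was taken (otherwise legitimate edits show up as mismatches), and the temporary directories stand in for a real restore job.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large files don't need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(original_dir: Path, restored_dir: Path) -> list[str]:
    """Return relative paths that are missing from, or differ in, the restored copy."""
    problems = []
    for original in sorted(original_dir.rglob("*")):
        if not original.is_file():
            continue
        relative = original.relative_to(original_dir)
        restored = restored_dir / relative
        if not restored.is_file() or sha256_of(original) != sha256_of(restored):
            problems.append(relative.as_posix())
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        Path(src, "report.txt").write_text("Q3 numbers")
        Path(dst, "report.txt").write_text("Q3 numbers")  # stands in for a restore job
        print(verify_restore(Path(src), Path(dst)))  # [] means every file matched
```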
Protecting Backups
• Protect backups at the same level as the
original data
• Storage
– Clear labeling
– Physical security
• Transfer
– Protected from physical theft or loss
• Destruction
– Wipe or physically destroy media
Iron Mountain Truck

• Image from timesfreepress.com


Backup Policies
• Which data to back up
• Off-site storage of backups
– In case of fire or flood
• Label media
• Testing
• Retention requirements
• Execution and frequency of backups
• Protection of backups
• Disposing of media
Comparing Business Continuity Elements
Business Continuity Plan
• Ensures that critical business functions will
continue even after a disaster
– May include temporary measures, like alternate
locations
• Includes Disaster Recovery Plan
– Complete restoration of original location to
service
Disasters
• Fire
• Flood
• Power outage
• Data loss
• Hardware and software failures
• War or terrorist attack
Business Continuity Planning Steps

1. Complete Business Impact Analysis (BIA)
2. Develop recovery strategies
3. Develop recovery plans
4. Test recovery plans
5. Update plans
Business Impact Analysis
• Identify critical functions and services
• Recovery Time Objective (RTO)
– Maximum acceptable time until systems are recovered
• Recovery Point Objective (RPO)
– Maximum acceptable data loss, measured in time: how much recent work will need to be repeated
• BIA doesn't specify solutions
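Because a BIA sets objectives rather than solutions, a proposed plan can be checked against them. This sketch assumes periodic backups, so worst-case data loss is about one backup interval (a failure just before the next backup), and takes the recovery-time estimate as given; all values in the example are hypothetical.

```python
from datetime import timedelta


def meets_objectives(backup_interval: timedelta, estimated_recovery: timedelta,
                     rpo: timedelta, rto: timedelta) -> dict[str, bool]:
    """Check a proposed plan against the BIA's recovery objectives.

    With periodic backups, worst-case data loss is about one backup interval:
    a failure just before the next backup loses everything since the last one.
    """
    return {
        "rpo_met": backup_interval <= rpo,
        "rto_met": estimated_recovery <= rto,
    }


if __name__ == "__main__":
    # Hypothetical plan: nightly backups, 2-hour restore; objectives: RPO 4 h, RTO 8 h.
    print(meets_objectives(timedelta(hours=24), timedelta(hours=2),
                           rpo=timedelta(hours=4), rto=timedelta(hours=8)))
```

Here the restore is fast enough, but nightly backups could lose up to a day of work, so the RPO is missed.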
Issues Addressed by BIA
• What assets are included in recovery plans?
• What business functions must continue to
operate?
• Are alternate sites required?
• What data should be backed up?
• Are backup utilities needed (water, gas, etc.)?
Continuity of Operations Plan (COOP)

• Part of a BCP (Business Continuity Plan)
• Focuses on restoring critical business
functions at an alternate site
– Hot site
– Cold site
– Warm site
Video: AT&T's Disaster Recovery Team

• Link Ch 9c
Hot Site
• Equipment installed and running already
• Copies of backup tapes already there or
nearby
• Often another company location
• Fastest recovery time—typically one hour
• Most expensive
Cold Site
• Location with power and connectivity
• When it is used, company must bring in
equipment, software, and data
• Cheapest to maintain
• Most difficult to test
• Slow—takes days to set up
Warm Site
• Compromise between hot site and cold site
• Example: equipment installed but data is out
of date
Mobile and Mirrored Sites
• Mobile site
– Self-contained transportable site
– In a truck or other vehicle
• Mirrored site
– Identical to primary location
– Gets immediate copy of all data
– Always up and operational
– Provides uninterrupted service in case of failure at
primary location
After the Disaster
• Return all business functions to the primary
site
• Move least critical functions first, so any remaining problems at the primary site don't disrupt critical functions
Disaster Recovery Plan (DRP)
• BCP may include several DRPs
– Specify plans to recover servers
– Recovery steps for different types of disasters,
such as hurricanes or tornadoes
• Hierarchical list of critical systems
• Prioritize systems to restore after an outage
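A prioritized list like this maps naturally onto a sort. This sketch uses hypothetical system names and priority numbers (1 = most critical) to produce the restore order and, reversed, the order for moving functions back to the primary site, where the least critical functions move first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class System:
    name: str
    priority: int  # 1 = most critical


def restore_order(systems: list[System]) -> list[str]:
    """After an outage, restore the most critical systems first."""
    return [s.name for s in sorted(systems, key=lambda s: s.priority)]


def failback_order(systems: list[System]) -> list[str]:
    """When returning to the primary site, move the least critical functions first."""
    return [s.name for s in sorted(systems, key=lambda s: s.priority, reverse=True)]


if __name__ == "__main__":
    systems = [System("intranet", 3), System("payments", 1), System("email", 2)]
    print(restore_order(systems))   # ['payments', 'email', 'intranet']
    print(failback_order(systems))  # ['intranet', 'email', 'payments']
```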
Disaster Recovery Phases
• Activation
• Implement contingencies
• Recovery
• Testing recovered systems
• Documentation and review
Planning for Communications
• Normal methods like email and phones may
be down
• You may need a war room where people can go for information
Communicate to:
• Disaster response team members
• Employees
• Customers
• Suppliers
• Media
• Regulatory agencies
IT Contingency Planning
• Focused on recovery for IT (Information
Technology) systems only
• BCP looks at entire organization
Succession Planning
• Non-disaster sense: Identifying people who
can fill key leadership positions
• Business continuity and disaster preparedness
– Defines hierarchical chain of command
– Who can make decisions if some personnel, such
as the CEO, are unavailable
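The chain of command can be treated as an ordered list: the first available person in the list makes decisions. The roles below are hypothetical examples, not a recommended order.

```python
def acting_decision_maker(chain_of_command: list[str], available: set[str]) -> str:
    """Walk the succession list in order and return the first person who is available."""
    for person in chain_of_command:
        if person in available:
            return person
    raise LookupError("no one in the chain of command is available")


if __name__ == "__main__":
    chain = ["CEO", "COO", "CFO", "CIO"]  # hypothetical succession order
    print(acting_decision_maker(chain, available={"CFO", "CIO"}))  # CFO
```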
BCP and DRP Testing
• Desktop or tabletop exercise
– Participants talk through a scenario
• Simulation
– Participants go through recovery steps
– Does not affect actual systems
• Full-blown test
– Goes through all the steps
– Determines the amount of time required
Elements of Testing
• Backups
• Server restoration
• Server redundancy
• Alternate sites
Testing Controls
• Cutover test
– Turn off real equipment during a business day to
make sure the backup takes over
– Can be limited to part of the company's systems, such as the HR server
Escape Plans, Escape Routes, and Drills

• Safety of personnel
• Fire drills
Environmental Controls
Heating, Ventilation, and Air Conditioning (HVAC)

• Cooling computers makes them more available, because overheating causes failures
• It is traditional to chill server rooms with
powerful air conditioners
– Employees need to wear sweaters
• Google saves power by running servers much
hotter
– Link Ch 9d
• Image from umich.edu
• Image from nih.gov
Tonnage
• Cooling capacity is rated in tonnage
• One ton is 12,000 British Thermal Units per hour (BTU/h)
– The rate of cooling that would melt 1 ton (2,000 lb) of ice in 24 hours
• A typical home air conditioner is about 3 tons
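Nearly all the electrical power IT equipment draws ends up as heat, so the needed tonnage can be estimated from the load: 1 watt is about 3.412 BTU/h. This sketch ignores heat from people, lighting, and the building itself, and adds no sizing margin; the 10 kW load is a hypothetical example.

```python
BTU_PER_HOUR_PER_TON = 12_000
BTU_PER_HOUR_PER_WATT = 3.412  # 1 W of load becomes about 3.412 BTU/h of heat


def cooling_tons_for_load(it_load_watts: float) -> float:
    """Approximate cooling tonnage needed to remove the heat from an electrical load."""
    return it_load_watts * BTU_PER_HOUR_PER_WATT / BTU_PER_HOUR_PER_TON


if __name__ == "__main__":
    load_w = 10_000  # hypothetical 10 kW server room
    print(f"{load_w * BTU_PER_HOUR_PER_WATT:,.0f} BTU/h -> {cooling_tons_for_load(load_w):.2f} tons")
```

A 10 kW server room produces about 34,120 BTU/h, or roughly 2.84 tons of cooling.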
Humidity
• High humidity
– Condensation on equipment
– Water damage to computers
• Low humidity
– Higher incidence of electrostatic discharge
• Recommended humidity: 45% - 55%
– Link Ch 9e
HVAC and Fire
• HVAC systems often integrated with fire
alarms
• Controls airflow to help prevent rapid spread
of a fire
• May turn off HVAC when a fire is detected
Failsafe/secure vs. Failopen
• Failsafe, Fail secure, Fail closed
– All mean the same thing
– System becomes unavailable when it fails
• Failopen
– System becomes available when it fails
• Example: Card reader locks on doors often fail
open when power fails
– So employees aren't trapped inside
Availability and Failopen
• Failopen
– If availability is more important than preventing
unauthorized use
• Fail closed
– If loss of availability is acceptable to prevent
unauthorized use
– Example: a firewall on a system with sensitive data
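The choice can be sketched as a fallback: when a control can't make its normal decision, its fail mode decides. The names here are hypothetical; the door-lock and firewall comments mirror the examples above.

```python
from enum import Enum


class FailMode(Enum):
    FAIL_OPEN = "fail-open"      # availability wins, e.g. door locks so people aren't trapped
    FAIL_CLOSED = "fail-closed"  # security wins, e.g. a firewall protecting sensitive data


def allow_on_failure(mode: FailMode) -> bool:
    """What a control does when it loses power or its decision logic fails."""
    return mode is FailMode.FAIL_OPEN


def firewall_decision(rule_engine_ok: bool, rule_allows: bool, mode: FailMode) -> bool:
    """Return True to pass traffic; if the rule engine has failed, fall back to the fail mode."""
    if not rule_engine_ok:
        return allow_on_failure(mode)
    return rule_allows


if __name__ == "__main__":
    print(firewall_decision(rule_engine_ok=False, rule_allows=True, mode=FailMode.FAIL_CLOSED))  # False
    print(firewall_decision(rule_engine_ok=False, rule_allows=False, mode=FailMode.FAIL_OPEN))   # True
```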
Fire Suppression
• A fire requires four components
– Heat
– Oxygen
– Fuel
– Chain reaction creating the fire
Fire Suppression Methods
• Remove the heat
– With water or chemical agents
• Remove the oxygen
– Displacing it with CO2 or another gas
– Common for electrical fires, especially server
rooms
• Remove the fuel (not an option, usually)
• Disrupt the chain reaction
– Some chemicals work this way
Classes of Fires and Fire Extinguishers

• Class A
– Ordinary combustibles
– Wood, paper, cloth, trash, rubber, plastics
• Class B
– Flammable liquids
– Gasoline, propane, solvents, paint,…
Classes of Fires and Fire Extinguishers

• Class C
– Electrical equipment
– Computers, wiring, motors, etc.
– Don't use water on a class C fire because it
conducts electricity and may shock personnel
• Class D
– Combustible metals like magnesium and sodium
– Much more difficult to extinguish
Environmental Monitoring
• HVAC attempts to keep temperature and
humidity constant
• Logs record the actual temperature and
humidity during the day
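Logged readings can be checked automatically against the recommended humidity band from the Humidity slide (45% to 55%). The `Reading` type, the timestamps, and the values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    time: str
    humidity_percent: float


def humidity_alerts(log: list[Reading], low: float = 45.0, high: float = 55.0) -> list[str]:
    """Flag readings outside the recommended band, noting the risk each direction carries."""
    alerts = []
    for r in log:
        if r.humidity_percent > high:
            alerts.append(f"{r.time}: {r.humidity_percent}% high, condensation risk")
        elif r.humidity_percent < low:
            alerts.append(f"{r.time}: {r.humidity_percent}% low, electrostatic discharge risk")
    return alerts


if __name__ == "__main__":
    log = [Reading("08:00", 50.0), Reading("12:00", 58.5), Reading("16:00", 41.0)]
    for alert in humidity_alerts(log):
        print(alert)
```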
Shielding
• To prevent EMI (Electromagnetic Interference)
and RFI (Radio Frequency Interference)
• Also prevents unwanted emissions which can
leak data
• Shielded cables reduce this problem
• Fiber-optic cables are immune
• Link Ch 9f
Faraday Cage

• image from digitaltrends.com


RF Shield

• From sleepingelephant.com
Tinfoil Hat Contest at HOPE
TEMPEST
• US Gov't program
• Studies emanations from devices and how to limit them
• Emanations that leak data are a serious security threat
