RAID: Redundant Arrays of Inexpensive Disks
This discussion is based on the paper:
D. A. Patterson, G. Gibson, and R. H. Katz, "A Case for Redundant Arrays of Inexpensive Disks (RAID)," Proceedings of the ACM SIGMOD International Conference on Management of Data, Chicago, IL, 1988, pp. 109-116.
© 2007 A.W. Krings
Motivation
single-chip computers improved in performance by 40% per year
RAM capacity quadrupled every 2-3 years
Disks (magnetic technology):
  capacity doubled every 3 years
  price cut in half every 3 years
  raw seek time improved 7% per year

Note: the values presented in Patterson's paper are dated!
Note: the paper discusses pure RAID, not smarter implementations, e.g. caching.
Amdahl's Law: Effective Speedup

S = 1 / ((1 - f) + f/k)

f = fraction of work in fast mode
k = speedup while in fast mode

Example: assume 10% of the work is I/O, so f = 0.9:
if the CPU gets 10x faster => effective speedup is ~5
if the CPU gets 100x faster => effective speedup is ~10
i.e. 90% of the potential speedup is wasted
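A minimal sketch of this calculation (the function name effective_speedup is mine, not the paper's):

    # Amdahl's Law: overall speedup when a fraction f of the work runs
    # k times faster and the remainder (e.g. I/O) is unchanged.
    def effective_speedup(f: float, k: float) -> float:
        return 1.0 / ((1.0 - f) + f / k)

    print(effective_speedup(0.9, 10))    # ~5.3: a 10x CPU gives only ~5x overall
    print(effective_speedup(0.9, 100))   # ~9.2: a 100x CPU gives only ~10x overall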
Motivation
compare the mainframe mentality with today's possibilities, e.g. cost and configuration
[Figure: I/O architecture comparison. Mainframe: CPU, channel controller, memory. Small computer: CPU, DMA, memory, SCSI.]
Reliability
Bad news!
e.g. MTTF_disk = 30,000 h
MTTF of 100 disks = 300 h (< 2 weeks)
MTTF of 1000 disks = 30 h
Note that these numbers are very dated; today's drives are much better, with MTBF of 300,000 to 800,000 hours. But even assuming a higher MTTF for individual disks, the problem remains.
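The arithmetic behind these numbers, as a small sketch (assuming independent, exponentially distributed failures, so the array's MTTF is the single-disk MTTF divided by the number of disks):

    # MTTF of an array of N disks: with independent failures, the first
    # failure arrives N times sooner than on a single disk.
    def array_mttf(mttf_disk_hours: float, n_disks: int) -> float:
        return mttf_disk_hours / n_disks

    print(array_mttf(30_000, 100))    # 300.0 h  (< 2 weeks)
    print(array_mttf(30_000, 1000))   # 30.0 h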
RAID Reliability
partition the disks into reliability groups, each consisting of data disks and check disks

D = total number of data disks
G = number of data disks in a group
C = number of check disks in a group
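A sketch of the resulting reliability estimate, in the style of the one in Pat88: a group survives a single disk failure, and data is lost only if a second disk in the same group fails before the first is repaired. The MTTR value and example numbers below are illustrative, not from the slides.

    # Group MTTF estimate: time to first failure in the group, divided by
    # the probability that a second disk in the group fails during repair.
    def group_mttf(mttf_disk: float, G: int, C: int, mttr: float) -> float:
        first_failure = mttf_disk / (G + C)           # first failure in the group
        p_second = mttr * (G + C - 1) / mttf_disk     # 2nd failure during repair
        return first_failure / p_second

    def array_mttf(mttf_disk: float, D: int, G: int, C: int, mttr: float) -> float:
        n_groups = D // G                             # number of reliability groups
        return group_mttf(mttf_disk, G, C, mttr) / n_groups

    # e.g. 100 data disks in groups of 10 with 1 check disk each, 1 h repair:
    print(array_mttf(30_000, D=100, G=10, C=1, mttr=1.0))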
Target Systems
Different RAID solutions benefit different target system configurations.

Supercomputers: large blocks of data, i.e. a high data rate
Transaction processing: small blocks of data, a high I/O rate, read-modify-write sequences
5 RAID Levels

RAID 1: mirrored disks
RAID 2: Hamming code for ECC
RAID 3: single check disk per group
RAID 4: independent reads/writes
RAID 5: no single check disk
RAID level 1: Mirrored Disks
Most expensive option; Tandem doubles the controllers too.
Write to both disks, read from one disk.
Characteristics (Pat88 Table II, p. 112):

S = slowdown factor. In synchronous disks the spindles are synchronized so that the corresponding sectors of a group of disks can be accessed simultaneously; for synchronized disks, S = 1.
Reads = 2D/S, i.e. concurrent reads are possible
Writes = D/S, i.e. no overhead for the concurrent write of the same data
Read-Modify-Write = 4D/(3S)
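A toy sketch of the mirroring rule (the Raid1 class and its block-store representation are mine, for illustration only):

    # Toy RAID 1: every logical block lives on two disks.
    # Writes go to both copies; a read can be served by either copy.
    class Raid1:
        def __init__(self) -> None:
            self.disks = [{}, {}]            # two mirrored block stores

        def write(self, block: int, data: bytes) -> None:
            for disk in self.disks:          # both copies written concurrently
                disk[block] = data

        def read(self, block: int, preferred: int = 0) -> bytes:
            # a real controller would pick the copy with the shortest
            # expected seek + rotational delay; here we just pick one
            return self.disks[preferred][block]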
RAID level 2: Hamming Code
Analogy to DRAM, where α-particles cause soft errors; the solution is, e.g., parity for SED (single error detection) or a Hamming code for SEC (single error correction).
Recall the Hamming code. RAID 2 uses the same idea with one disk drive per bit.
The smallest accessible unit per disk is one sector, so each access touches G sectors, where G = number of data disks in a group.
If an operation on only a portion of a group is needed:
1) read all the data
2) modify the desired portion
3) write the full group, including the check information
Recall Hamming Code
m = data bits, k = parity bits; for single error correction we need 2^k ≥ m + k + 1
Compute Check
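The original slide computes the check bits in a figure; here is a small sketch of the same computation for a standard positional Hamming code (the helper names are mine, not from the slides):

    # With m data bits we need the smallest k with 2**k >= m + k + 1.
    def num_check_bits(m: int) -> int:
        k = 0
        while 2 ** k < m + k + 1:
            k += 1
        return k

    # Check bit i sits at position 2**i and is the parity of all codeword
    # positions whose (1-based) index has bit i set.
    def hamming_encode(data_bits: list[int]) -> list[int]:
        m = len(data_bits)
        k = num_check_bits(m)
        n = m + k
        code = [0] * (n + 1)                  # 1-indexed codeword
        it = iter(data_bits)
        for pos in range(1, n + 1):
            if pos & (pos - 1):               # not a power of two: data position
                code[pos] = next(it)
        for i in range(k):
            p = 2 ** i
            code[p] = sum(code[pos] for pos in range(1, n + 1)
                          if pos & p) % 2     # code[p] is still 0 here, so it is harmless
        return code[1:]

    print(num_check_bits(8))                  # 4 check bits for 8 data bits
    print(hamming_encode([1, 0, 1, 1]))       # [0, 1, 1, 0, 0, 1, 1]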
Allows soft errors to be corrected on the fly.
Useful for supercomputers, not useful for transaction processing.
Used, e.g., in the Thinking Machines (Connection Machine) Data Vault with G = 32, C = 8.
Characteristics: Pat88 Table III (p. 112)
RAID level 3: Single Check Disk per Group
Parity is SED, not SEC! However, the controller can often detect that a disk has failed (extra redundancy on the disk, i.e. extra information on sectors, etc.), so the information on the failed disk can be reconstructed:

If the check disk fails: read the data disks and recompute the parity onto the replacement.
If a data disk fails: compute the parity of the remaining disks and compare it with the check disk; if the parity bits are equal => the lost data bit = 0, otherwise => the lost data bit = 1. (A sketch of this reconstruction follows below.)
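A bytewise sketch of that reconstruction rule (XOR parity; the example disk contents are arbitrary):

    from functools import reduce

    # The check disk holds the XOR of all data disks, bytewise.
    def parity(blocks: list[bytes]) -> bytes:
        return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

    # A failed data disk is the XOR of the surviving data disks and the
    # check disk: equal parity bits => the lost bit was 0, otherwise 1.
    def reconstruct(surviving: list[bytes], check: bytes) -> bytes:
        return parity(surviving + [check])

    d0, d1, d2 = b"\x0f", b"\x33", b"\x55"
    p = parity([d0, d1, d2])
    assert reconstruct([d0, d2], p) == d1     # recover failed disk d1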
Less overhead, i.e. only one check disk => effective performance increases.
The reduction in disks over level 2 also decreases maintenance.
Raw performance is the same as level 2, but the effective performance per disk increases due to the smaller number of check disks.
Better for supercomputers, not good for transaction processing.
Maxtor and Micropolis introduced the first RAID 3 products in 1988.
Characteristics: Pat88 Table IV (p. 113)
RAID level 4: Independent Reads/Writes
Pat88 Fig. 3 (p. 113) compares the data locations of the levels. Disk interleaving has advantages and disadvantages.

Advantage of the previous levels: large transfer bandwidth, since all disks in a group are accessed on each operation (read or write), with spindle synchronization.
Disadvantage of the previous levels: without spindle synchronization, performance is probably close to the worst-case average seek and access times (seek + rotation).

RAID 4 interleaves the data on the disks at the sector level and uses one parity disk.
For small accesses:

only 2 disks need to be accessed, i.e. 1 data disk and the parity disk
the new parity can be computed from the old parity plus the old and new data:
P_new = data_old XOR data_new XOR P_old

e.g. a small write:
1) read the old data + old parity
2) write the new data + new parity in parallel

e.g. a small read: only one drive (the data disk) is read.

The parity disk is the bottleneck, since every write must update it.
Characteristics: Pat88 Table V (p. 114). A sketch of the small-write parity update follows below.
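A minimal sketch of the small-write parity update (the example byte values are arbitrary):

    # RAID 4 small write: only the data disk and the parity disk are touched.
    # P_new = data_old XOR data_new XOR P_old, applied bytewise:
    def new_parity(data_old: bytes, data_new: bytes, p_old: bytes) -> bytes:
        return bytes(o ^ n ^ p for o, n, p in zip(data_old, data_new, p_old))

    # 1) read old data + old parity (2 reads)
    # 2) write new data + new parity in parallel (2 writes)
    d_old, d_new = bytes([0b00001111]), bytes([0b10100101])
    p_old = bytes([0b01101001])
    p_new = new_parity(d_old, d_new, p_old)   # parity stays consistent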
RAID level 5: No Single Check Disk
Distributes the data and check information across all disks, i.e. there are no dedicated check disks.
Supports multiple individual writes per group.
Best of both worlds: small read-modify-write performance and large transfer performance.
One more usable disk per group => increased read performance.
Characteristics: Pat88 Table VI (p. 114). A sketch of a rotating-parity layout follows below.
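One common rotating-parity placement, as an illustrative sketch. This is the left-symmetric layout used by many implementations; the exact rotation shown in the paper's figure may differ.

    # Rotating parity: for stripe s on n disks the parity block goes on
    # disk (n - 1 - s mod n); the stripe's data blocks fill the others.
    def parity_disk(stripe: int, n_disks: int) -> int:
        return (n_disks - 1) - (stripe % n_disks)

    def data_disk(stripe: int, block: int, n_disks: int) -> int:
        # block = 0 .. n_disks - 2 within the stripe; skip the parity disk
        return (parity_disk(stripe, n_disks) + 1 + block) % n_disks

    for s in range(4):              # the parity disk changes every stripe
        print(s, parity_disk(s, 5), [data_disk(s, b, 5) for b in range(4)])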
Patterson Paper
discusses all levels as a pure hardware problem
refers to software solutions and alternatives, e.g. disk buffering with a transfer buffer the size of a track, so that spindle synchronization of groups is not necessary
improving MTTR by using spares
low power consumption allows the use of a UPS
relative performance shown in Pat88 Fig. 5 (p. 115)
Summary
Data striping for improved performance:
distributes data transparently over multiple disks to make them appear as a single fast, large disk
improves aggregate I/O performance by allowing multiple I/Os to be serviced in parallel:
  independent requests can be serviced in parallel by separate disks
  single multiple-block requests can be serviced by multiple disks acting in coordination

Redundancy for improved reliability:
a large number of disks lowers the overall reliability of the disk array
thus redundancy is necessary to tolerate disk failures and allow continuous operation without data loss
Other RAIDs

RAID 0:
employs striping with no redundancy at all
its claim to fame is speed alone
has the best write performance, but not the best read performance; why? Other RAIDs (with redundant copies) can schedule a read on the disk with the shortest expected seek and rotational delay. (A sketch of the striping address map follows below.)
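A minimal sketch of the RAID 0 block-to-disk mapping (block-granularity striping; the function name locate is mine):

    # RAID 0: logical block b on n disks, no redundancy.
    # Block b lands on disk (b mod n) at offset (b div n).
    def locate(block: int, n_disks: int) -> tuple[int, int]:
        return block % n_disks, block // n_disks

    # Blocks 0..7 on 4 disks: consecutive blocks hit different disks, so a
    # large sequential access is serviced by all disks in parallel.
    print([locate(b, 4) for b in range(8)])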
RAID 6 (P + Q redundancy):
uses a Reed-Solomon code to protect against up to 2 disk failures using the bare minimum of 2 redundant disks. (A sketch follows below.)
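An illustrative sketch of the P + Q computation in GF(2^8), in the style of common RAID 6 implementations; the field polynomial 0x11d and generator 2 are conventional choices, not taken from the slides.

    # P is the plain XOR of the data bytes; Q weights the byte from disk i
    # by 2**i in GF(2^8), so any two erasures give two independent
    # equations and can be solved.
    def gf_mul(a: int, b: int) -> int:
        r = 0
        while b:
            if b & 1:
                r ^= a
            a <<= 1
            if a & 0x100:
                a ^= 0x11d          # reduce modulo the field polynomial
            b >>= 1
        return r

    def gf_pow(g: int, e: int) -> int:
        r = 1
        for _ in range(e):
            r = gf_mul(r, g)
        return r

    def pq(data: list[int]) -> tuple[int, int]:
        p = q = 0
        for i, d in enumerate(data):
            p ^= d
            q ^= gf_mul(gf_pow(2, i), d)
        return p, q

    p, q = pq([0x0f, 0x33, 0x55])   # one parity byte for each of P and Q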
[Figure: source Che94]
String Management
[Figure]
Case Studies
Thinking Machines Corp.: TMC ScaleArray

RAID level 3 for the CM-5 massively parallel processor (MPP)
high bandwidth for large files
the OS provides a file system that can deliver data from a single file to multiple processors from multiple disks
uses 4 SCSI-2 strings with 2 disks each (= 8 disks)
these 4 strings are attached to an 8 MB disk buffer
3 of these units are attached to the backbone (=> 3 x 8 = 24 disks)
normal configuration: 22 data disks, 1 parity disk, 1 spare
Case Studies
HP: TickerTAIP/DataMesh

material shown is from: Cao et al., "The TickerTAIP Parallel RAID Architecture," ACM Transactions on Computer Systems, Vol. 12, No. 3, August 1994, pp. 236-269.
traditional RAID architecture: the host interface is both a bottleneck and a single point of failure
Case Studies cont.
TickerTAIP/DataMesh issues:

getting away from a centralized architecture
different algorithms for computing RAID parity
techniques for establishing request atomicity, sequencing, and recovery
disk-level request-scheduling algorithms inside the array
Case Studies

HP: TickerTAIP/DataMesh
[Figures: TickerTAIP array architecture; TickerTAIP system environment]
Case Studies
HP: AutoRAID

provides a RAID with excellent performance and storage efficiency in the presence of dynamically changing workloads
provides both level 1 and level 5 RAID and dynamically shifts data to the appropriate level
dynamically shifts data to level 5 when approaching maximum array capacity
parity logging
hot-pluggable disks, a spare controller, and dynamic adaptation to added capacity
Wilkes, J. et al., "The HP AutoRAID Hierarchical Storage System," ACM Transactions on Computer Systems, Vol. 14, No. 1 (Feb. 1996), pp. 108-136.
Case Studies
StorageTek: Iceberg 9200 Disk Array Subsystem

uses 5.25-inch disks to look like traditional IBM mainframe disks
implements an extended RAID level 5 and level 6 disk array
the array consists of 13 data drives, P and Q drives, and a hot spare
data, parity, and Reed-Solomon coding are striped across the 15 active drives
Other RAIDs

because of the limitations of each RAID level on its own, several flavors of RAID have appeared that attempt to combine the best performance attributes

e.g. RAID 0+1: combines RAID 0 striping with RAID 1 mirroring
e.g. RAID 3/5 with write coalescing: uses write buffering to accumulate or coalesce multiple data blocks and writes the data in one chunk