Disks and RAID: CS2100 - Computer Organization

This document provides an overview of magnetic disks and RAID (Redundant Array of Inexpensive Disks). It discusses the components and operation of magnetic disks, including sectors, tracks, platters, and read/write heads. Typical specifications like capacity, rotational speed, and latency are presented. RAID levels 0 through 5 are summarized, explaining how each provides data striping and redundancy to improve performance, reliability, and availability compared to a single disk. RAID 0 uses striping for higher throughput without redundancy. RAID 1 uses mirroring to allow recovery from single disk failures. Higher RAID levels use parity calculations to recover data with slightly lower write performance but higher reliability than a single disk.

CS2100 Computer Organization

Disks and RAID

Review: Major Components of a Computer

[Diagram: processor (control + datapath), memory hierarchy (cache, main memory, secondary memory/disk), and input/output devices]

Magnetic Disk

Purpose
- Long-term, nonvolatile storage
- Lowest level in the memory hierarchy: slow, large, inexpensive

General structure
- A rotating platter coated with a magnetic surface
- A moveable read/write head to access the information on the disk, which is organized into tracks and sectors

Typical numbers
- 1 to 4 platters (1 or 2 surfaces each) per disk, of 1" to 5.25" diameter (3.5" dominant in 2004)
- Rotational speeds of 5,400 to 15,000 RPM
- 10,000 to 50,000 tracks per surface
  - cylinder: all the tracks under the heads at a given point on all surfaces
- 100 to 500 sectors per track
  - sector: the smallest unit that can be read/written (typically 512 B)

The underlying principle of operation: magnetism.

[Figures: disk platter close-ups ("a prettier picture"); longitudinal vs. perpendicular magnetic recording]

Magnetic Disk Characteristics

[Diagram: disk read/write components, controller + cache, platter, cylinder, head, track, sector]

Disk access time has four components:
1. Seek time: position the head over the proper track (3 to 14 ms average)
   - Due to locality of disk references, the actual average seek time may be only 25% to 33% of the advertised number
2. Rotational latency: wait for the desired sector to rotate under the head (½ of 1/RPM, converted to ms)
   - 0.5/5400 RPM = 5.6 ms to 0.5/15000 RPM = 2.0 ms
3. Transfer time: transfer a block of bits (one or more sectors) under the head to the disk controller's cache (30 to 80 MB/s are typical disk transfer rates)
   - The disk controller's cache takes advantage of spatial locality in disk accesses
   - Cache transfer rates are much faster (e.g., 320 MB/s)
4. Controller time: the overhead the disk controller imposes in performing a disk I/O access (typically < 0.2 ms)

Typical Disk Access Time

The average time to read or write a 512 B sector for a disk rotating at 10,000 RPM, with an average seek time of 6 ms, a 50 MB/sec transfer rate, and a 0.2 ms controller overhead:

  Avg disk read/write = 6.0 ms + 0.5/(10,000 RPM/(60 sec/minute)) + 0.5 KB/(50 MB/sec) + 0.2 ms
                      = 6.0 + 3.0 + 0.01 + 0.2 = 9.21 ms

If the measured average seek time is 25% of the advertised average seek time, then:

  Avg disk read/write = 1.5 + 3.0 + 0.01 + 0.2 = 4.71 ms

The rotational latency is then usually the largest component of the access time.
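The same calculation is easy to script. A minimal Python sketch (the function name and the decimal KB-to-MB conversion are my own choices) reproduces the numbers above:

```python
def disk_access_ms(seek_ms, rpm, sector_kb, transfer_mb_s, controller_ms):
    """Average access time = seek + rotational latency + transfer + controller."""
    rotational_ms = 0.5 / (rpm / 60) * 1000                  # half a revolution, in ms
    transfer_ms = (sector_kb / 1000) / transfer_mb_s * 1000  # KB -> MB, s -> ms
    return seek_ms + rotational_ms + transfer_ms + controller_ms

print(disk_access_ms(6.0, 10000, 0.5, 50, 0.2))  # 9.21 ms (advertised seek)
print(disk_access_ms(1.5, 10000, 0.5, 50, 0.2))  # 4.71 ms (measured seek = 25%)
```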

Magnetic Disk Examples (www.seagate.com)

  Characteristic                Seagate ST37   Seagate ST32   Seagate ST94
  Disk diameter (inches)        3.5            3.5            2.5
  Capacity (GB)                 73.4           200            40
  # of surfaces (heads)         ?              ?              ?
  Rotation speed (RPM)          15,000         7,200          5,400
  Transfer rate (MB/sec)        57-86          32-58          34
  Minimum seek (ms)             0.2r-0.4w      1.0r-1.2w      1.5r-2.0w
  Average seek (ms)             3.6r-3.9w      8.5r-9.5w      12r-14w
  MTTF (hours @ 25°C)           1,200,000      600,000        330,000
  Dimensions (inches)           1 x 4 x 5.8    1 x 4 x 5.8    0.4 x 2.7 x 3.9
  GB/cu. inch                   ?              ?              10
  Power: op/idle/sb (watts)     20?/12/-       12/8/1         2.4/1/0.4
  GB/watt                       ?              16             17
  Weight (pounds)               1.9            1.4            0.2

Disk Latency & Bandwidth Milestones

                       CDC Wren   SG ST41   SG ST15   SG ST39   SG ST37
  Year                 1983       1990      1994      1998      2003
  RSpeed (RPM)         3600       5400      7200      10000     15000
  Capacity (GBytes)    0.03       1.4       4.3       9.1       73.4
  Diameter (inches)    5.25       5.25      3.5       3.0       2.5
  Interface            ST-412     SCSI      SCSI      SCSI      SCSI
  Bandwidth (MB/s)     0.6        ?         ?         24        86
  Latency (msec)       48.3       17.1      12.7      8.8       5.7

  Patterson, CACM Vol 47, #10, 2004

Disk latency is one average seek time plus the rotational latency.

Disk bandwidth is the peak transfer rate of formatted data from the media (not from the cache).

Latency & Bandwidth Improvements

In the time it takes disk bandwidth to double, latency improves by a factor of only 1.2 to 1.4.

Aside: Media Bandwidth/Latency Demands

Bandwidth requirements
- High quality video: digital data = (30 frames/s) × (640 × 480 pixels) × (24-bit color/pixel) = 221 Mb/s (27.625 MB/s)
- High quality audio: digital data = (44,100 audio samples/s) × (16-bit audio samples) × (2 audio channels for stereo) = 1.4 Mb/s (0.175 MB/s)
- Compression reduces the bandwidth requirements considerably

Latency issues
- How sensitive is your eye (ear) to variations in video (audio) rates?
- How can you ensure a constant rate of delivery?
- How important is synchronizing the audio and video streams? 15 to 20 ms early to 30 to 40 ms late is tolerable.
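As a quick check of the arithmetic, here are the same two rates in Python (decimal megabits, matching the slide):

```python
# Uncompressed media bandwidth, as computed above.
video_bps = 30 * 640 * 480 * 24   # frames/s x pixels/frame x bits/pixel
audio_bps = 44_100 * 16 * 2       # samples/s x bits/sample x channels
print(video_bps / 1e6, "Mb/s")    # 221.184 -> ~221 Mb/s
print(audio_bps / 1e6, "Mb/s")    # 1.4112  -> ~1.4 Mb/s
```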

Dependability, Reliability, Availability

Reliability is measured by the mean time to failure (MTTF); service interruption is measured by the mean time to repair (MTTR).

Availability is a measure of service accomplishment (a quick sketch follows this slide):

  Availability = MTTF / (MTTF + MTTR)

To increase MTTF, either improve the quality of the components or design the system to continue operating in the presence of faulty components:
1. Fault avoidance: preventing fault occurrence by construction
2. Fault tolerance: using redundancy to correct or bypass faulty components (hardware)
   - Fault detection versus fault correction
   - Permanent faults versus transient faults
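The formula translates directly to code; a minimal sketch, with hypothetical numbers (an MTTF of 50 years and an MTTR of 4 hours are illustrative choices, not from the slide):

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the service is accomplishing its task."""
    return mttf_hours / (mttf_hours + mttr_hours)

# Hypothetical: MTTF of 50 years, MTTR of 4 hours.
print(availability(50 * 365 * 24, 4))  # ~0.999991
```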

RAIDs: Disk Arrays (Redundant Array of Inexpensive Disks)

- Arrays of small and inexpensive disks increase potential throughput by having many disk drives
  - Data is spread over multiple disks
  - Multiple accesses are made to several disks at a time
- Reliability is lower than with a single disk
- But availability can be improved by adding redundant disks (RAID): lost information can be reconstructed from redundant information
  - MTTR: mean time to repair is on the order of hours
  - MTTF: mean time to failure of disks is tens of years

RAID: Level 0 (No Redundancy; Striping)

[Diagram: blk1, blk2, blk3, blk4 striped across four disks]

- Multiple smaller disks, as opposed to one big disk
- Spreading the blocks over multiple disks ("striping") means that multiple blocks can be accessed in parallel, increasing performance (a mapping sketch follows this slide)
  - A 4-disk system gives four times the throughput of a 1-disk system
- Same cost as one big disk, assuming 4 small disks cost the same as one big disk
- No redundancy, so what if one disk fails?
  - Failure of one or more disks is more likely as the number of disks in the system increases
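Striping is just a modulo mapping from logical block numbers to disks; a minimal sketch (names are mine):

```python
def raid0_map(block: int, num_disks: int):
    """Map a logical block number to (disk, block index on that disk)."""
    return block % num_disks, block // num_disks

# Blocks 0..7 on a 4-disk array: consecutive blocks land on different
# disks, so a run of 4 blocks can be read from all 4 disks in parallel.
for blk in range(8):
    print(blk, raid0_map(blk, 4))
```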

RAID: Level 1 (Redundancy via Mirroring)

[Diagram: blk1 on a data disk, duplicated as blk1 on a redundant (check) disk]

- Uses twice as many disks as RAID 0 (e.g., 8 smaller disks, with the second set of 4 duplicating the first set), so there are always two copies of the data
- # redundant disks = # of data disks, so twice the cost of one big disk
  - Writes have to be made to both sets of disks, so writes would be only 1/2 the performance of RAID 0
- What if one disk fails? The system just goes to the mirror for the data.
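In code, mirroring is nothing more than duplicating every write and falling back to the surviving copy on failure; a toy sketch (class and method names are mine):

```python
class ToyMirror:
    """Toy RAID 1: every write goes to both copies; a read is served
    from the surviving copy if one disk has failed."""
    def __init__(self, num_blocks: int):
        self.copies = [[None] * num_blocks, [None] * num_blocks]

    def write(self, blk, data):
        for copy in self.copies:   # two physical writes per logical write
            copy[blk] = data

    def read(self, blk, failed_disk=None):
        ok = 1 if failed_disk == 0 else 0
        return self.copies[ok][blk]
```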

RAID: Level 0+1 (Striping with Mirroring)

[Diagram: blk1-blk4 striped across four data disks and mirrored on four redundant (check) disks]

- Combines the best of RAID 0 and RAID 1: data is striped across four disks and mirrored to four disks
- Four times the throughput (due to striping)
- # redundant disks = # of data disks, so twice the cost of one big disk
  - Writes have to be made to both sets of disks, so writes would be only 1/2 the performance of RAID 0
- What if one disk fails? The system just goes to the mirror for the data.

RAID: Level 2 (Bit-Level Striping with Parity)

[Diagram: bits b0-b3 of blk1 spread over four data disks plus an odd-parity bit disk; if a data disk fails, its bit is reconstructed from the others]

- Cost of higher availability is reduced to 1/N, where N is the number of disks in a protection group
- # redundant disks = 1 × # of protection groups
  - Writes require writing the new data to the data disk as well as computing the parity, meaning reading the other disks, so that the parity disk can be updated
- Can tolerate limited disk failure, since the data can be reconstructed (see the sketch below)
  - Reads require reading all the operational data disks as well as the parity disk to calculate the missing data that was stored on the failed disk
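Reconstruction works because parity lets any single missing piece be recomputed from the survivors. A minimal sketch using even (XOR) parity; the diagram above uses odd parity, but the recovery idea is identical:

```python
from functools import reduce

def parity(chunks):
    """Even (XOR) parity across equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*chunks))

data = [b"\x0f", b"\xf0", b"\xaa", b"\x55"]    # contents of 4 data disks
p = parity(data)                               # stored on the parity disk
# Disk 2 fails: XOR-ing the surviving disks with the parity recovers it.
recovered = parity([data[0], data[1], data[3], p])
assert recovered == data[2]
```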

RAID: Level 3 (Byte-Level Striping with Parity)

[Diagram: bytes B0-B3 of blk1 spread over four data disks plus a byte-parity disk]

- Similar to RAID 2, but uses bytes instead of bits

RAID: Level 4 (Block-Interleaved Parity)

[Diagram: blk1-blk4 on four data disks plus a block-parity disk]

- Cost of higher availability is still only 1/N, but the parity is stored as blocks associated with sets of data blocks
- Four times the throughput (striping)
- # redundant disks = 1 × # of protection groups
- Supports small reads and small writes (reads and writes that go to just one, or a few, data disks in a protection group)
  - By watching which bits change when writing new information, we need only change the corresponding bits on the parity disk
  - The parity disk must be updated on every write, so it is a bottleneck for back-to-back writes
- Can tolerate limited disk failure, since the data can be reconstructed

Small Writes

[Diagram: writing new D1 data into a protection group of D1-D4 plus parity P]

- RAID 2 small writes: 3 reads and 2 writes, involving all the disks (read D2, D3, and D4 to recompute the parity, then write the new D1 and the new parity)
- RAID 4 small writes: 2 reads and 2 writes, involving just two disks (read the old D1 and the old parity, then write the new D1 and the new parity, as in the sketch below)
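The RAID 4/5 trick is that parity is XOR, so the new parity can be computed from the old data, the new data, and the old parity alone; a minimal sketch (function name is mine):

```python
def small_write_parity(old_data: bytes, new_data: bytes, old_parity: bytes) -> bytes:
    """RAID 4/5 small write: new parity = old parity XOR old data XOR new data.

    Only the target data disk and the parity disk are read and rewritten;
    the other data disks in the protection group are untouched.
    """
    return bytes(p ^ od ^ nd for p, od, nd in zip(old_parity, old_data, new_data))
```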

RAID: Level 5 (Distributed Block-Interleaved Parity)

[Diagram: blocks striped across the disks; within each stripe, one of the disks is assigned as the block-parity disk]

- Cost of higher availability is still only 1/N, but the parity block can be located on any of the disks, so there is no single bottleneck for writes
- Still four times the throughput (striping)
- # redundant disks = 1 × # of protection groups
- Supports small reads and small writes (reads and writes that go to just one, or a few, data disks in a protection group)
- Allows multiple simultaneous writes as long as the accompanying parity blocks are not located on the same disk
- Can tolerate limited disk failure, since the data can be reconstructed

Distributing Parity Blocks

  RAID 4                 RAID 5
   1   2   3   4  P0      1   2   3   4  P0
   5   6   7   8  P1      5   6   7  P1   8
   9  10  11  12  P2      9  10  P2  11  12
  13  14  15  16  P3     13  P3  14  15  16

By distributing parity blocks across all disks, some small writes can be performed in parallel (the rotation is sketched below).
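The rotation in the RAID 5 layout above is a simple function of the stripe number; a sketch of one common rotation (other RAID 5 layouts rotate differently):

```python
def raid5_parity_disk(stripe: int, num_disks: int) -> int:
    """Disk holding the parity block for a given stripe, rotating
    right-to-left as in the figure above."""
    return (num_disks - 1 - stripe) % num_disks

# 5 disks: parity for stripes 0..3 lives on disks 4, 3, 2, 1, so
# small writes to different stripes need not queue on one parity disk.
print([raid5_parity_disk(s, 5) for s in range(4)])  # [4, 3, 2, 1]
```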

RAID 6

- Double parity: two disks are assigned to parity
- Otherwise similar to RAID 5

Summary

Four components of disk access time:
- Seek time: advertised to be 3 to 14 ms, but lower in real systems
- Rotational latency: 5.6 ms at 5400 RPM and 2.0 ms at 15000 RPM
- Transfer time: 30 to 80 MB/s
- Controller time: typically less than 0.2 ms

RAIDs can be used to improve availability:
- RAID 0 and RAID 5 are widely used in servers; one estimate is that 80% of disks in servers are in RAIDs
- RAID 1 (mirroring): EMC, Tandem, IBM
- RAID 3: Storage Concepts
- RAID 4/5/6: Network Appliance / storage servers

RAIDs have enough redundancy to allow continuous operation, but not hot swapping.

A Word about Flash Memory

Introduction
- Invented in 1984 by Dr. Fujio Masuoka at Toshiba
- Unlike SRAM or DRAM, it is a non-volatile, solid-state memory
- Unlike hard disks, it has no moving parts
- Two main types: NOR and NAND

The Standard Metal-Oxide-Semiconductor Field-Effect Transistor

[Diagram: N-channel MOSFET, a gate above a gate-oxide insulator, with an N+ source and an N+ drain in a P substrate]

The Floating-Gate Transistor

[Diagram: like a MOSFET, but with a floating gate between the control gate and the channel, isolated by the gate-oxide insulator; the floating gate traps charges. The floating-gate transistor has its own schematic symbol.]

Reading the Floating-Gate Transistor

[Diagram: with no charge trapped in the floating gate (FG), the positive control-gate (CG) voltage forms a conducting channel between the N+ source and N+ drain; read-out current flows and charge is sensed, i.e., logic 1]

Default state = no charges trapped in the FG = logic 1

Reading the Floating-Gate Transistor (Continued)

[Diagram: with charges trapped in the FG, the read-out voltage on the CG is cancelled out; no conducting channel forms, no current flows, and no charge is sensed, i.e., logic 0]

Programmed state = charges trapped in the FG = logic 0

Programming the Floating-Gate Transistor (Setting It to 0)

[Diagram: with +12 V on the control gate, +12 V on the drain, and 0 V on the source, some electrons from the source-to-drain flow ("hot electrons") are injected into the floating gate]

Erasing the Floating-Gate Transistor (Setting It to 1)

[Diagram: with +12 V on the source, 0 V on the control gate, and the drain not connected, the trapped charges drift away from the floating gate by quantum electron tunneling]

Flash Memory

NOR
- RAM-like; can do execute-in-place
- Typically has slower write speeds

NAND
- All operations must be performed in a block-wise fashion
  - Main design goal: density
- Requires bad-block management
- Greater write endurance than NOR

NOR Flash

[Diagram: eight floating-gate transistors, one per word line (word lines 0-7), each connected in parallel between the shared bit line and ground]

To read bit x, pull word line x high. Depending on the stored charge of the corresponding FG, the bit line may be connected to ground. This is "NOR" in the sense that word line x is high, the rest are low, and the output will be low if the corresponding FG has no stored charge.

NAND Flash

[Diagram: eight floating-gate transistors in series (word lines 0-7) between the bit line and ground]

To read bit x, the gates of all the other floating-gate transistors are pulled to a very high voltage to ensure they conduct (i.e., whatever charge they store is not sufficient to cancel out the CG voltage). Word line x is pulled just high enough to test the stored charge of its FG. If no charge is stored in FG x, it will conduct and the bit line will be connected to ground.

Tests on a Nokia N800

MMC:
- 2 MB: read 10.77 MB/s, write 24 MB/s
- 1 MB: read 10.59 MB/s, write 24 MB/s
- 500 KB: read 10.65 MB/s, write 22 MB/s

Internal flash/filesystem:
- 2 MB: read 11.35 MB/s, write 2.18 MB/s
- 1 MB: read 11.25 MB/s, write 4.4 MB/s
- 500 KB: read 10.63 MB/s, write 4.3 MB/s

RAM read and write speed is roughly 70-80 MB/s.

Flash Density

[Chart: flash density trend; source: Samsung Semiconductors website]

The Latest

[Image: recent flash devices]

From Samsung's Prediction

By year 2012: 1 Tb of NAND flash = a 128 GB chip; a 1 TB or 2 TB disk for ~$400, or a 128 GB disk for $40, or a 32 GB disk for $5.

Write Endurance

- Today, typically 2 million write cycles
- Increasing capacity + wear leveling: > 10 years lifetime
  - Longer than IT replacement cycles
- Wear leveling: attempt to equalize the writes to every cell (a toy sketch follows this slide)
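A toy illustration of the idea, handing out the least-erased block first; real flash translation layers also remap logical addresses and track bad blocks, which this sketch (names are mine) omits:

```python
import heapq

class ToyWearLeveler:
    """Toy wear leveling: always allocate the physical block with the
    fewest erases, so no cell wears out far ahead of the others."""
    def __init__(self, num_blocks: int):
        self.heap = [(0, blk) for blk in range(num_blocks)]  # (erases, block)

    def allocate(self) -> int:
        erases, blk = heapq.heappop(self.heap)   # least-worn block
        heapq.heappush(self.heap, (erases + 1, blk))  # charge one erase
        return blk
```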

Using Flash

- Used as block devices
  - The three previous characteristics + blocking
- Treated as permanent storage, lower in the memory hierarchy
- Log-structured file systems:
  - Journalling Flash File System version 2: JFFS2 (part of the Linux kernel since 2.4.10)
  - Yet Another Flash File System 2: YAFFS2

Log-Structured File System (1990)

Hypothesis:
- Most reads will be satisfied by memory caching
- Writes will be the main problem

Solution:
- Instead of laying out a file on storage, log writes with versioning info
- On read, play back the log (a toy sketch follows this slide)
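A toy sketch of the log idea: appends carry version info, and reads play the log back to find the newest version (the structure and names are mine, not JFFS2's on-flash format):

```python
class ToyLogFS:
    """Toy log-structured store: writes append (path, version, data)
    records; reads play the log back and keep the latest version."""
    def __init__(self):
        self.log = []

    def write(self, path, data):
        version = sum(1 for p, _, _ in self.log if p == path)
        self.log.append((path, version, data))

    def read(self, path):
        matches = [d for p, _, d in self.log if p == path]
        return matches[-1] if matches else None

fs = ToyLogFS()
fs.write("/a", b"v1"); fs.write("/a", b"v2")
assert fs.read("/a") == b"v2"   # playback returns the newest version
```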

Log-Structured File Systems

Advantages
- Easier to recover deleted files and to recover from crashes

Disadvantages
- The assumption that reads can be satisfied from the cache may not hold in some workloads
- (JFFS2) Key info must be rebuilt at mount time

Summary

- Flash will become an important component of the memory hierarchy
- A potential threat to hard disks

CS2100 Summary

What did we learn?
- Transistor
- Logic gate
- Circuits
- Processor
- Memory hierarchy
