0% found this document useful (0 votes)
14 views

Scheduling

Copyright
© © All Rights Reserved
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
14 views

Scheduling

Copyright
© © All Rights Reserved
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
You are on page 1/ 67

Storage Systems

CSE 598d, Spring 2007


Lecture 11: Disk scheduling
Feb 27, 2007
(ACK: Several slides borrowed from
Shiva Chaitanya)
Disk Access Time:
Components
• CPU time to issue and process I/O
• contention for controller
• contention for bus
• contention for memory
• verifying block correctness with
checksums (retransmissions)
• waiting in scheduling queues
• ...
Disk Scheduling
Seek time is a dominant factor of total disk I/O time

Let operating system or disk controller choose which


request
to serve next depending on the head’s current position
and requested block’s position on disk

Disk scheduling is much more difficult than CPU


scheduling

– a mechanical device – hard to determine (accurate)


access times
– disk accesses cannot be preempted – runs until it
finishes
– disk I/O often the main performance bottleneck
Scheduling at Multiple
Locations!

S/W, H/W Components between an


application and the disk:
- File system
- Device driver
- SCSI bus Scheduling locations
- RAID controller (if employing RAID)
- Some bus
- Disk controller

Why?
- Why not do it only at FS/DD level?
- Why not do it only within the disk?
Scheduling at Multiple
Locations!
Why?
 Key ideas that disk scheduling employs:
 Request re-ordering for seek/positioning
minimization
 Exploit temporal locality
 Anticipation for sequential streams
 Introduce non-work conserving behavior!
 Exploit spatial locality
 Coalesce consecutively placed requests
 Free-block scheduling
 Different optimizations are best done at
different locations
 Furthermore, the best location to do an
optimization depends on the workload!
Goals
– Short response time
– High overall throughput
– Fairness (equal probability for all
blocks to be accessed in the same
time)

Tradeoff: Throughput vs. Fairness


Socialism vs. Capitalism?
Disk Scheduling
Several traditional algorithms
– First-Come-First-Serve (FCFS)
– Shortest Seek Time First (SSTF)
• Shortest Positioning Time First
(SPTF)
– SCAN
– C-SCAN
– LOOK
– C-LOOK
–…
First–Come–First–
Serve (FCFS)
FCFS serves the first arriving request first:
 Long seeks
 Short average response time

cylinder number
ming requests (in order of arrival): 1 5 10 15 20 25

12 14 2 7 21 8 24
time
Shortest Seek Time
First (SSTF)
SSTF serves closest request first:
 short seek times
 longer maximum seek times – may lead to starvation

cylinder number
1 5 10 15 20 25
ming requests (in order of arrival):

12 14 2 7 21 8 24
time
SCAN
SCAN moves head edge to edge and serves requests on the way
 bi-directional
 compromise between response time and seek time optimizati

cylinder number
1 5 10 15 20 25
ming requests (in order of arrival):

12 14 2 7 21 8 24
time
C–SCAN
Circular-SCAN moves head from edge to edge
 serves requests on one way – uni-directional
 improves response time (fairness)

cylinder number
1 5 10 15 20 25
ming requests (in order of arrival):

12 14 2 7 21 8 24
time
LOOK and C–LOOK
LOOK (C-LOOK) is a variation of SCAN (C-SCAN):
 same schedule as SCAN
 does not run to the edges
 stops and returns at outer- and innermost request
 increased efficiency
ming requests (in order of arrival):
cylinder number
1214 2 7 21 8 24 1 5 10 15 20 25
time
V–SCAN(R)
V-SCAN(R) combines SCAN (or LOOK) and SSTF
– define a R-sized unidirectional SCAN window,
i.e., C-SCAN, and use SSTF outside the window

– Example: V-SCAN(0.6)
• makes a C-SCAN (C-LOOK) window over 60 % of the
cylinders
 uses SSTF for requests outside the window

 V-SCAN(0.0) equivalent with SSTF


– V-SCAN(1.0) equivalent with SCAN
– V-SCAN(0.2) is supposed to be an appropriate
configuration
Shortest Positioning
Time First (SPTF)
 Given the complete knowledge of the
actual mapping of data blocks onto the
media, the scheduler can choose the
request with the minimum positioning
delay (combined seek and rotational
latency)

 SPTF, like SSTF suffers from poor


starvation resistance. To reduce
response time variance, priority can be
given to requests that have been in
pending queue for excessive periods of
time
Aged Shortest
Positioning Time
First (ASPTF)
 ASPTF(W) adjusts each positioning
delay (Tpos) by subtracting a weighted
value corresponding to the amount of
time the request has been waiting for
service (Twait)

Teff = Tpos – (W*Twait)

 For large values of W, ASPTF behaves


like FCFS
Scheduling in Modern
Disk Drives
Features of current disk drives
that affect traditional
scheduling algorithms

 Host interface
 Data layout
 On-Board Cache

Ref: B.L. Worthington, Greg Ganger, N. Patt : Scheduling Algorithms for Modern
Disk Drives ACM Sigmetrics 1994
Host interface

 Controller presents a request to the disk


drive in terms of the starting logical block
number and request size

 Subsequent media access hidden from the


host

 Scheduling entities outside of the drive


have little knowledge of overhead delays
Data Layout
 Many systems assume sequentiality of LBN-to-
PBN mappings in seek reducing algorithms

 Aggressive algorithms require highly


accurate knowledge of the data layout which is
typically hidden

 Complexity of mappings increased by zoned


recording, track/cylinder skew and defect
management
On-Board Cache
Memory within disk drives has progressed from small
speed-matching buffers to megabytes of cache memory

Disk logic typically prefetches data into cache to


satisfy sequential read requests. This affects
scheduling in two ways:
 Position of the head cannot be determined
easily
 Requests that can be satisfied by cache could
be given higher priority
Scheduling by Logical
Block Number
• As expected,
FCFS quickly
saturates as
workload
increases
• SSTF provides
lower mean
response time
Scheduling by Logical
Block Number

FCFS has the


lowest
coefficient
for lighter
workloads
As FCFS begins
to saturate
and its
response time
variance
increases, C-
LOOK emerges
as a better
algorithm for
response time
Scheduling with Full
knowledge

As W
increases, the
average
response time
slowly grows,
though
variance drops
Scheduling with Full
Knowledge
Modern Disk
Scheduling
 In modern drives, C-LOOK best exploits the
prefetching cache for workloads with significant
read sequentiality

 SSTF and LOOK perform better for random


workloads

 Powerful disk controllers use variants of


Shortest Positioning Time First (SPTF).
Freeblock
Scheduling
 An approach to utilizing more of a
disk’s potential media bandwidth

 Fill rotational latency periods with


useful media transfers for background
applications

 It has been observed that 20-50% of a


never-idle disk’s bandwidth can often be
provided to background applications without
affecting foreground response times

Ref: Christopher R. Lumb, Jiri Schindler, Greg Ganger : “Towards Higher


Disk Head Utilization: Extracting Free Bandwidth From Busy Disk
Drives”, OSDI , 2000
Disk-intensive
background tasks
 Disk Reorganization
 File system cleaning
 Backup
 Prefetching
 Write-back
 Integrity Checking
 RAID scrubbing
 Virus detection
 Index Reorganization
 …
Free Bandwidth
 Time required for a disk media access
Taccess = Tseek + Trotate + Ttransfer

 Freeblock scheduling uses the Trotate


component
of disk access to transfer additional data

 Instead of just waiting for desired sector


to arrive, this technique transfers the
intermediate sectors
Steps in Freeblock
Scheduling
 Predict how much rotational latency
will occur before the next foreground media
transfer
Requires detailed knowledge of disk
attributes, including layout algorithms and
time dependent mechanical positioning
overheads

 Squeeze additional media transfers into


that time

 Get to the destination track in time


Anticipatory Disk
Scheduling
Reorder available disk requests for

 performance by seek optimization,


 proportional resource allocation, etc.

Any policy needs multiple outstanding


requests to make good decisions!

Ref: Sitaram Iyer, Peter Druschel : “Anticipatory scheduling : A disk


scheduling framework to overcome deceptive idleness in synchronous I/O”,
SOSP 2001
With enough
requests…
issued by process A issued by process B

tim
e

s ee
k

location on
disk
Throughput = 21 MB/s (IBM Deskstar d
With synchronous
I/O…

issued by process
too
A issued by process B
lat
e!

forced
!
schedu forced
le !

E.g., Throughput = 5 MB/s


Next
Deceptive idleness

Process A is about to issue next request.

but

Scheduler hastily assumes that process A has


no further requests!
Proportional
scheduler

Allocate disk serviceDeceptive idleness


in say 1:2 ratio: causes 1:1 allocation:
A B
A B

Next
Anticipatory
scheduling
Key idea: Sometimes wait for process
whose request was last serviced.

Keeps disk idle for short intervals.

But with informed decisions, this:


 Improves throughput
 Achieves desired proportions
Cost-benefit
analysis
Balance expected benefits of waiting
against cost of keeping disk idle.

Tradeoffs sensitive to scheduling


policy
e.g., 1. seek optimizing scheduler
2. proportional scheduler
Statistics

For each process, measure:

1. Expected median and 95percentile


thinktime
Number of
requests

Thinktim
Median 95percentilee

2. Expected positioning time

last nex
t
Cost-benefit analysis
for seek optimizing
scheduler
best := best available request chosen by
scheduler
next := expected forthcoming request from
process whose request was last serviced
Benefit =
best.positioning_time — next.positioning_time

Cost = next.median_thinktime

Waiting_duration =
(Benefit > Cost) ? next.95percentile_thinktime : 0
Proportional
scheduler
Costs and benefits are different.

e.g., proportional scheduler:

Wait for process whose request was last serviced,


1. if it has received less than its allocation, and
2. if it has think time below a threshold (e.g., 3ms)

Waiting_duration = next.95percentile_thinktime
Prefetch
Overlaps computation with I/O.
Side-effect:
avoids deceptive idleness!

 Application-driven
 Kernel-driven
Conclusion

Anticipatory scheduling:

 overcomes deceptive idleness


 achieves significant performance
improvement on real applications
 achieves desired proportions
 and is easy to implement!
Fairness : Evaluating
disk scheduling
algorithms
 Storage system designers prefer to keep the
queue length at disks small regardless of the
load

 When queuing threshold is reached at the disk,


the controller or the device driver queues the
requests until disk queue is processed

 Low queuing threshold minimizes request


starvation at the disk level when unfair
scheduling algorithms are deployed
Ref: Alma Riska, Erik Riedel : “It’s not fair – evaluating efficient disk
scheduling” , MASCOTS 2003
Results
 Queuing more requests at the disk provides
the scheduling algorithms more information
used for better disk resource utilization

 Percentage of requests starved remains


small even if longer queues build up at
the disk

 Overall request starvation is independent


from the queuing threshold at the disk
Storage subsytem
architecture

Queues at various levels


Outstanding requests
queued at disk and at
device driver in a single
disk system
And, at the disks and the
controller(s) in a
multiple disk system
Impact of queuing
thresholds

Average load of 16 outstanding requests in Average load of 64 outstanding requests in


system system
Response time
distribution
 Higher the load the larger the gap between
the performances of different scheduling
algorithms

 Fair and simple FCFS yields longest average


request response time

 Best performance obtained when increasing


the queue threshold under SPTF

 How about request starvation and variability


in the request response time ?
Response time
distribution

Tail of response time distribution with Tail of response time distribution with
average load of 16 outstanding requests and average load of 16 outstanding requests and
threshold of 8 threshold of 16
Observations ..
 Majority of requests under FCFS exhibit long
response times, while seek-reducing algorithms
result in majority of short response times

 More than 90% of requests under SPTF have shorter


response times than FCFS and only 1% exhibit upto
double the response times in FCFS

 Amount of starvation in position-based scheduling


algorithms for both queuing thresholds is the same
relative to FCFS

 Hence, queuing more requests improves disk


performance without introducing more request
starvation
Scheduling at
Device driver level
 Depends on workload and filesystem layout
 Eg, with SCAN, seek times to sectors in the
middle of the disks are shorter
 OS could choose between algorithms based on
current queue
 Likely to be expensive in CPU cycles
 Queue changes as new requests arrive
 SSTF or SCAN are reasonable defaults
 Allow algorithm selection as part of OS tuning
 FreeBSD: C-SCAN
 Linux 2.2 :SCAN
 Linux 2.6 : four different versions of elevator
algorithm
Multiple Locations
 Positioning-based optimizations best done within the disk
 Seek-based optimizations best done at device driver
 Why do scheduling within FS?
 Device and DD independent
 Aware of buffer cache
 Application isolation
 Disk queue length crucial
 Short queue results in degraded throughput
 Locally good but globally bad schedules
 Long queue results in unfairness
 Non-work conservation can improve fairness and throughout!
 Anticipatory scheduling
 Achieving proportional fairness non-trivial
 Solutions based on hierarchy of queues, anticipatory scheduling can
help
 Request coalescing can result in great improvement in
throughput
 FS and device driver are good places
 Improve the sequentiality of the request stream seen by the disk
 Free-block scheduling can improve throughput
 Can view this as a “corrector” for the non work conserving nature of
disk
Additional slides
on free-block
scheduling
Illustration of two
freeblock scheduling
possibilities
Characteristics of
tasks
 Low priority
 Freeblock requests will only be served
opportunistically
 Not appropriate for a set of equally
important requests

 Large sets of desired blocks


 Larger the set of disk locations desired,
higher the probability to find a free bandwidth
opportunity
Desired
Characteristics …
 No particular order of access
 Ordering requirements restrict the set of
requests that can be considered by
scheduler
 Effectiveness of freeblock scheduling
directly related to number of outstanding
requests

 Small working memory footprints


 Need to buffer multiple blocks before
processing creates artificial ordering
requirements due to memory limitations
A Simple Interface

No call into the freeblock scheduler subsystem


waits for a disk access. Calls return immediately
Freeblock read requests do not specify memory
locations for read data. Completion callbacks
provide pointers to buffers owned by freeblock
scheduling subsystem
Applications
 Scanning applications : tasks that scan large
portion of disk contents like report
generation, RAID scrubbing, virus detection,
tamper detection and backup
 Internal storage optimization: reorganizing
stored data to improve performance, e.g placing
related data contigously, placing hot data near
the center of the disk, replicating data for
subsequent reads, ..etc
 Prefetching and Prewriting: prewriting is early
writing out of dirty blocks under the
assumption they will not be overwritten or
deleted before writeback is necessary
Availability of
free bandwidth
 Availability of potential free
bandwidth = Total bandwidth *
Fraction of time spent on rotational
latency
 Results in the next few slides
obtained using Disksim
 Default disk drive used is Quantum
Atlas 10k
 Default workload consists of 10,000
foreground requests issued one at a
time with uniform distribution of
starting locations
Impact of disk
characteristics

Overall, about one third of each


disk’s head usage is on rotational
latency
Characteristics of
simulated disk drives
Impact of workload
characteristics
Impact of scheduling
algorithm
• C-LOOK and SSTF
reduce seek times
without affecting
transfer times and
rotational latencies
• SPTF tends to
decrease both
overhead components.
Figure shows
rotational latency
decreases to 22%
Feasibility
 Freeblock scheduling relies heavily on
ability to accurately predict positioning
delays

 Firmware of most disk drives now supports


SPTF which requires similar predictions

 Freeblock scheduling resembles advanced


disk schedulers for environments with a
mixed workload of real-time and non-real-
time activities
Additional slides
on anticipatory
scheduling -
experimental
evaluation
Experiments
• FreeBSD-4.3 patch + kernel module
(1500 lines of C code)

• 7200 rpm IDE disk (IBM Deskstar)

• Also in the paper:


15000 rpm SCSI disk (Seagate Cheetah)
Microbenchmark
25
no prefetch
prefetch
Original
20
Anticipatory
Throughput (MB/s)

15
no prefetch
prefetch
10
no prefetch
prefetch

0
Sequential Alternate Random within file
Real workloads
What’s the impact on real applications
and benchmarks?

Andrew benchmark • Disk-intensive


Apache web server • Prefetching enabled
(large working set)
Database benchmark
Andrew filesystem
benchmark
2 (or more) concurrent clients
30
6
5
25

20 Original
15 Anticipatory

10

5
Execution
0 time (minutes)
mkdir cp stat scan gcc
-16% -5% -5% -54% +1.7%

Overall 8% performance improvement


Apache web server
4

3
• CS.Berkeley trace
2
• Large working set

• 48 web clients 1
Throughput (MB/s)

0
read mmap
+29% +71%

no
prefetch

You might also like