
IEEE TRANSACTIONS ON COMPUTERS, VOL. 65, NO. 4, APRIL 2016

Effective Lifetime-Aware Dynamic Throttling for NAND Flash-Based SSDs

Sungjin Lee and Jihong Kim, Member, IEEE

Abstract—NAND flash-based solid-state drives (SSDs) are increasingly popular in enterprise server systems because of their advantages over hard disk drives, such as higher performance and lower power consumption. However, their decreasing write endurance and unpredictable lifetime remain serious obstacles to their wider adoption in enterprise systems. In this paper, we propose effective lifetime-aware dynamic throttling, called LADY, which guarantees the required storage lifetime by intentionally throttling the write performance of SSDs with consideration of the effective write endurance of NAND flash memory. Unlike existing static throttling, LADY makes throttling decisions based on the characteristics of a workload so that the required SSD lifetime can be guaranteed with less performance degradation. LADY also exploits the improvement in write endurance afforded by slower NAND program speeds and the recovery effects of floating-gate transistors, thereby maximally utilizing the available write endurance of NAND flash while mitigating the decreasing write endurance problem. Our experimental results show that LADY improves write performance by 4.7x, with small write response time variations, over existing static throttling while guaranteeing the required SSD lifetime.

Index Terms—NAND flash memory, storage system, solid-state drive, lifetime management, performance throttling

1 INTRODUCTION

NAND flash memory has been widely used in mobile systems ranging from smart phones to laptops. NAND flash-based solid-state drives (SSDs) are now becoming popular storage solutions for enterprise servers. Despite the prevalence of SSDs in enterprise markets, the limited lifetime of SSDs is considered a major obstacle that precludes the use of SSDs in enterprise servers.

Enterprise customers typically require a minimum storage lifetime (which is usually three or five years) because it is essential for designing storage systems as well as for devising storage deployment and maintenance strategies, such as the calculation of the total costs of ownership (TCO) [2], [3], [4]. In spite of the importance of a storage lifetime in enterprise environments, unfortunately, there are only a few studies on managing the lifetime of flash-based SSDs.

The lifetime of SSDs depends on the amount of written data, which is decided by the number of program/erase (P/E) cycles and the SSD capacity. As the semiconductor process is scaled down and multi-level cell (MLC) technology is adopted, the capacity of SSDs continuously increases; however, at the same time, the number of P/E cycles decreases more rapidly. For example, MLC flash doubles the capacity of SSDs, but the number of P/E cycles drops to 3K [5], which is much smaller than the 100K P/E cycles of SLC flash. The lifetime of SSDs is also strongly dependent upon the write-intensiveness of workloads. SSDs can achieve the required lifetime under non-write-intensive environments where a small amount of data is written by applications. On the other hand, the same SSDs wear out much earlier if they are used in write-intensive environments. Because of the rapidly decreasing P/E cycles and the workload-dependent lifetime characteristic, it is a great challenge for SSDs to satisfy the minimum storage lifetime that enterprise customers demand.

• S. Lee is with the Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139.
• J. Kim is with the Department of Computer Science and Engineering, Seoul National University, Seoul 151-742, Korea. E-mail: [email protected].
Manuscript received 3 Feb. 2014; revised 4 Aug. 2014; accepted 5 Aug. 2014. Date of publication 18 Aug. 2014; date of current version 17 Mar. 2016. Recommended for acceptance by H. Asadi, P. Ienne, and H. Sarbazi-Azad. For information on obtaining reprints of this article, please send e-mail to [email protected], and reference the Digital Object Identifier below. Digital Object Identifier no. 10.1109/TC.2014.2349517.

In this paper, we propose effective lifetime-aware dynamic throttling, called LADY, which resolves the lifetime problems of SSDs. The main idea of LADY is to intentionally throttle the write performance of SSDs to guarantee the required lifetime. In LADY, the amount of data written per unit time is controlled by adjusting the write speed of SSDs. This makes the lifetime of SSDs predictable, allowing enterprise customers to manage SSDs according to their performance/lifetime requirements.

The important design issue of LADY is how to properly throttle write performance. To achieve better write response times without excessive performance throttling, LADY predicts future write traffic and decides an appropriate write speed so that the SSD is worn out at the end of the target lifetime. In particular, LADY carefully controls the write speed to prevent large fluctuations in write response times. LADY also supports priority-aware dynamic throttling that throttles write requests differently depending on their priorities. This helps us to manage write performance and lifetime according to the importance of enterprise services.

LADY exploits the effective wearing characteristics of NAND flash so as to minimize the performance penalty associated with write throttling. The damage caused by repetitive P/E cycles is lowered by slowing down the program
0018-9340 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

speed of NAND devices [7]. To take advantage of its benefit on endurance improvement, LADY uses a slow NAND program mode in throttling write performance, instead of merely delaying write requests. Moreover, the damage on memory cells is partially recovered during idle times [8], [9], [10]. LADY makes use of the endurance improvement by the self-recovery effect to maximally utilize the effective lifetime of NAND flash. By exploiting the effective wearing characteristics, LADY mitigates the decreasing P/E cycles problem, allowing more data to be written to SSDs.

In order to evaluate the effect of LADY on storage performance and lifetime, we carried out a set of evaluations with trace-driven simulators using enterprise traces. Our evaluation results showed that LADY improved the average write response time by 4.7x with smaller variations over existing throttling algorithms while guaranteeing the target SSD lifetime. We also implemented a prototype of LADY in the Linux kernel to show its feasibility in real-world applications like TPC-C.

This paper is organized as follows. In Section 2, we explain the effective wearing characteristics of NAND flash. Section 3 introduces the motivation of dynamic throttling, and Section 4 formally describes a dynamic throttling problem after illustrating our write traffic model. In Section 5, we explain the proposed LADY technique in detail. Our evaluation results are presented in Section 6. In Section 7, we explain related work, and finally, Section 8 concludes with a summary.

2 EFFECTIVE WEARING OF NAND FLASH

In NAND flash, program/erase (P/E) operations inevitably cause damage to floating-gate transistors, reducing the write endurance of memory cells. At the device level, cells are gradually worn out as charges get trapped in the interface and oxide layers of a floating-gate transistor during P/E cycles. This charge trapping increases the threshold voltage of a floating gate, and the cell becomes unreliable when the threshold voltage is higher than a certain voltage margin (e.g., 0.65 V for MLC flash) [8]. According to [8], [10], the increase in the threshold voltage dV_trap because of charge trapping approximately scales with P/E cycles in a power-law fashion as follows:

dV_trap = A_it · N^0.62 + B_ot · N^0.3,   (1)

where N is the number of P/E cycles. A_it and B_ot are constants, set to 2.97 × 10^-3 and 2.0 × 10^-2, respectively. Usually, NAND flash vendors do not reveal important parameters for their recent products. For this reason, A_it and B_ot for 20 nm MLC flash memory are obtained by scaling up the values for 90 nm MLC flash memory (which are available to the public) so that the number of P/E cycles approximately matches 3K at the point where dV_trap is 0.65 V.

The effective wearing of floating-gate transistors (i.e., the amount of damage caused to cells) is greatly reduced depending on the self-recovery effect of memory cells and the voltage level applied for erasing blocks. Therefore, if the effective wearing characteristics of NAND flash are taken into account, it is possible to exploit much larger P/E cycles than the fixed P/E cycles in datasheets.

Self-recovery effect. A floating-gate transistor has a self-recovery property which heals the damage of a cell by detrapping charges captured in the oxide of the cell. This recovery (or detrapping) process occurs during idle times between P/E cycles on the same cell, and its effect in general increases as the logarithm of idle times (i.e., detrapping ∝ ln(t), where t is the length of idle times). According to [8], [10], [11], the decrease in the threshold voltage dV_detrap is expressed as follows:

dV_detrap = C_e · dV_trap · ln(t / t_0),   (2)

where C_e is a recovery efficiency, set to 5.63 × 10^-2 according to [9], and t_0 is 1 hour.

The increase in the threshold voltage dV_th with the self-recovery effect is expressed as follows [8]:

dV_th = dV_trap − dV_detrap.   (3)

The length of idle times between P/E cycles on the same block is very long. Thus, the number of P/E cycles with the self-recovery effect is much larger than the number of P/E cycles in datasheets.

Performance/endurance trade-off. There is a trade-off between the NAND program speed and the number of P/E cycles, depending on the level of the voltage used for block erasure. To erase a flash block, a NAND chip controller must apply a high erase voltage (e.g., 14 V) to memory cells. A recent study reported that if the erase voltage is reduced, the damage on cells is lowered as well [7]. Thus, the number of P/E cycles that can be performed increases with the reduced erase voltage. To use the reduced erase voltage, however, more precise charge placement is required to program data to cells, which inevitably increases the program time.

The erase voltage level can be configured continuously, but NAND devices with discrete erase voltage levels are more feasible in practice. Thus, in this work, we assume NAND devices that support four program/erase modes, mode = 0, ..., 3, depending on the level of the erase voltage (as proposed in [7]). If mode is 0, a block is programmed with the nominal NAND program speed, but the block must be erased later with the nominal erase voltage, without any benefit on write endurance. If mode is 3, a block is programmed with the slowest program speed, and erasure for the block can be done with the lowest erase voltage. Thus, the write endurance can be improved.

The number N_P/E^(mode) (for 0 ≤ mode ≤ 3) of P/E cycles depending on the program/erase mode mode is expressed as follows:

N_P/E^(mode) = N_P/E^spec · (1 + r_P/E^(mode)),   (4)

where N_P/E^spec is the number of P/E cycles in datasheets (e.g., 3K) and r_P/E^(mode) is the P/E improvement ratio over the nominal erase voltage. Based on the real measurement study [7], r_P/E^(mode) for 1 ≤ mode ≤ 3 is 0.33, 0.4, and 0.49, respectively. N_P/E^(3) is 1.49x larger than N_P/E^spec, and N_P/E^(0) is N_P/E^spec.

The NAND program time T_prog^(mode) (for 0 ≤ mode ≤ 3) depending on the program/erase mode mode is expressed as follows:

T_prog^(mode) = T_prog · (1 + r_prog^(mode)),   (5)

where T_prog is the nominal NAND program time and r_prog^(mode) is the performance degradation ratio over the nominal erase voltage. Based on [7], r_prog^(mode) for 1 ≤ mode ≤ 3 is 0.4, 0.75, and 1.3, respectively. For example, T_prog^(3) is 2.3x slower than T_prog^(0), and T_prog^(0) is T_prog.

While the self-recovery effect occurs during idle times, the reduced erase voltage affects the NAND program and block erasure processes. Thus, their effects on the effective wearing are nearly orthogonal. As a result, the number N_P/E^eff of effective P/E cycles is written as follows:

N_P/E^eff = N_P/E^recov · (1 + r_P/E^(mode)),   (6)

where N_P/E^recov is the number of P/E cycles with the self-recovery effect.

Based on Eq. (6), we plot the effective P/E cycles of 20 nm MLC flash memory in Fig. 1, depending on the length of idle times with different program/erase modes. The number of effective P/E cycles increases in proportion to the length of idle times. Similarly, as a slower NAND program mode is used, the effective lifetime is improved as well.

Fig. 1. The number of effective P/E cycles depending on different idle times and program/erase modes.

3 MOTIVATION

Fig. 2 shows our motivational example for dynamic throttling. Based on the specification of the SSD, the maximum amount of data that can be written is proportional to the SSD capacity and the number of P/E cycles on datasheets. For example, if the number of P/E cycles is 3K and the SSD capacity is 128 GB, 375 TB of data can be written. Suppose that the target SSD lifetime is 5 years. In the example of Fig. 2a with no throttling, the SSD is worn out before the target lifetime.

In order to ensure a lifetime warranty, some SSD vendors have recently adopted static throttling [12], [13]. As shown in Fig. 2b, static throttling guarantees the required lifetime by statically limiting the maximum write throughput under the assumption that incoming write traffic is always heavy. In practice, actual workloads are not intensive all the time, so static throttling often slows down the write speed of SSDs uselessly, underutilizing the available SSD endurance. For example, in Fig. 2b, the SSD is still not worn out at five years. Static throttling also incurs large fluctuations in write response times: it throttles the write speed heavily when a workload is intensive; on the other hand, it never throttles write performance when a workload is not intensive.

Fig. 2. A comparison of throttling policies: no throttling, static throttling, and LADY.

The proposed effective lifetime-aware dynamic throttling technique, LADY, overcomes the limitations of static throttling. As depicted in Fig. 2c, LADY dynamically changes the write speed of the SSD according to the characteristics of a workload so that the write endurance is maximally utilized without excessive write throttling. In particular, LADY exploits the performance/endurance trade-off by adaptively changing the program/erase mode and considers the endurance improvement by the self-recovery effect. This increases the number of P/E cycles, allowing more data to be written to the SSD.

LADY is designed with the following objectives in mind to properly control the write speed of the SSD. First, the write speed must be properly decided so that the SSD is worn out at the end of the target lifetime. If the SSD is throttled too little, the required lifetime cannot be guaranteed (like the SSD without write throttling). If the write speed is excessively throttled, the write performance could significantly deteriorate, underutilizing the available write endurance (like the SSD with static throttling). Second, the write speed must be decided so that response time variations are minimized. If the write speed is heavily throttled in one time period while it is throttled little over another time period, I/O response times fluctuate greatly, resulting in the degradation of the user experience.

4 PROBLEM FORMULATION

4.1 Write Traffic Model

We first present a write traffic model used throughout this paper and formally define a dynamic throttling problem using our model (for readers' convenience, we summarize mathematical terms frequently used throughout this paper in Section 1 of the Appendix, which can be found on the Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/TC.2014.2349517). Our write traffic model includes

all of the write requests from the host system as well as from FTL modules, including garbage collection and wear-leveling. The target SSD lifetime is denoted by T_ssd. The total amount of data that can be written until the SSD is worn out is denoted by C_ssd. Note that C_ssd changes depending on the length of idle times and the program/erase mode.

The top figure of Fig. 3 shows original write traffic represented by our write traffic model. We define a write element W_i as the basic unit of write traffic and associate it with the size of the requested data and properties related to time. The size of W_i is fixed to a page, which is the smallest unit for writing data. The size of W_i is denoted by F_S(W_i). W_i has two time properties, T_prog and t_i^it. Here, T_prog is the time taken to program W_i. t_i^it is the length of idle time until the next write element W_{i+1} arrives after the data of W_i are completely served. The total length of time for W_i is denoted by F_T(W_i), and it is T_prog + t_i^it.

Fig. 3. An illustration of original write traffic (top) and throttled write traffic (bottom) in our write traffic model.

Using W_i defined above, we represent the write traffic sent to the SSD until it becomes unreliable as a sequence S_w of write elements, i.e., S_w = <W_1, ..., W_n>, where W_i occurs before W_q if i < q. The total size of data written by S_w is denoted by F_S(S_w). Similarly, F_T(S_w) is the total length of time until the data of F_S(S_w) are written to the SSD. F_S(S_w) = F_S(W_1) + ... + F_S(W_n) and F_T(S_w) = F_T(W_1) + ... + F_T(W_n). F_S(S_w) is equal to C_ssd because the SSD is worn out after the data of C_ssd are written. If write traffic is heavy and dynamic throttling is not used, F_T(S_w) could be smaller than T_ssd.

In our write traffic model, time is divided into time-epochs, simply called epochs. Here, an epoch is a unit period of time for predicting future write traffic and deciding a write speed (see Section 5). Given the sequence S_w, we construct a sequence E_k of write elements for an epoch k, i.e., E_k = <W_k1, ..., W_km>, where W_k1 is the write element in S_w that first arrives at the SSD after E_k begins and W_km is the last one before E_k ends. The amount of data written during E_k is denoted by F_S(E_k) and the length of E_k is denoted by F_T(E_k). As expected, F_S(E_k) = F_S(W_k1) + ... + F_S(W_km) and F_T(E_k) = F_T(W_k1) + ... + F_T(W_km). The write traffic S_w is also represented as a sequence of epochs, i.e., S_w = <E_1, ..., E_j>, where j is the number of epochs in S_w.

4.2 Dynamic Throttling Problem

LADY delays individual write elements W_i so that the write traffic S_w is properly regulated to offer a lifetime guarantee. In LADY, the time taken to write the data of W_i is variable. As illustrated in Fig. 3, this variable write time t_i^wt is decided by two delay factors, t_i^prog and t_i^del, where t_i^prog is the NAND program time T_prog^(mode) depending on the program/erase mode and t_i^del is the length of an artificial delay. This artificial delay is needed because t_i^prog may not be long enough to sufficiently delay W_i. If t_i^wt is longer than T_prog, the time taken to write W_i is increased by (t_i^wt − T_prog). This increased write time reduces the SSD write performance, creating the illusion that the SSD operates slowly.

The main objective of LADY is to decide t_i^wt for W_i so that F_S(S_w) is equal to C_ssd at T_ssd. To minimize response time variations, t_i^wt must be distributed across W_i as evenly as possible. Consequently, the problem of dynamic throttling can be expressed as follows:

Decide t_i^wt for W_i in S_w if F_T(S_w) < T_ssd,
subject to F_S(S_w) = C_ssd at T_ssd and t_1^wt = ... = t_{n-1}^wt;
otherwise, t_1^wt = ... = t_{n-1}^wt = T_prog.   (7)

In Eq. (7), t_i^wt is decided by LADY when W_i arrives, and the SSD lifetime T_ssd is determined by enterprise customers. However, the write traffic S_w, including F_S(S_w) and F_T(S_w), is unknown when a decision on t_i^wt is made. Moreover, C_ssd changes according to the length of idle times as well as the program/erase mode chosen. Thus, the future write traffic S_w and the effective write endurance C_ssd must be carefully estimated for write throttling. LADY is designed to properly decide t_i^wt in real-world environments where no knowledge of S_w and C_ssd is available a priori.

5 DESIGN AND IMPLEMENTATION OF LADY

Fig. 4. Three main functions of LADY.

Fig. 4 shows the overall architecture of LADY, which is composed of three modules: a write-traffic predictor, a write-speed selector, and a write-traffic regulator. The write-traffic predictor analyzes the history of previous write traffic and estimates the future write traffic (see Section 5.1). The write-speed selector decides the write speed based on the predicted write traffic and the remaining write endurance. The remaining write endurance is estimated by taking into account the program/erase mode and the self-recovery

effect (see Section 5.2). Finally, the write-traffic regulator throttles the write performance so that the target SSD lifetime is reached with small response time variations (see Section 5.3).

LADY is implemented based on a modular design concept; it runs below the FTL without any functional dependencies. All of the I/O requests sent from both the host system and the FTL (e.g., GC and WL) are sent to LADY. LADY creates the illusion that the FTL runs on slower NAND flash; it monitors write traffic from the FTL, decides a write speed, and throttles individual write elements. Thanks to this modular design, any kind of FTL scheme can run on top of LADY.

5.1 Estimation of Future Write Traffic

The write-traffic predictor of LADY uses an epoch-based approach that estimates future write traffic on an epoch-by-epoch basis. This is based on our observation that even though it is difficult to exactly predict the whole of the future write traffic S_w in advance, the write traffic in the near future can be accurately estimated by referring to the previous history. The write-traffic predictor also adopts a multiple-expert system [14], [15], [16] that runs multiple simple workload-prediction policies at the same time and chooses the best one with the highest accuracy. The multiple-expert system not only effectively deals with various workloads [15], but is also useful in a resource-constrained environment like SSDs [16].

Fig. 5. Prediction of future write traffic.

Fig. 5 shows how the write-traffic predictor estimates the future write traffic F_S^pred(E_k) for the next epoch E_k. At the beginning of each epoch, it evaluates the accuracy of individual experts. For each expert, the write-traffic predictor calculates the difference between the predicted write traffic and the actual write traffic. Then, it chooses the expert that shows the best accuracy for the past epoch. The write-traffic predictor employs three experts for workload prediction: Average, Weighted, and Cyclic.

• Average is a global average of the write traffic observed in all of the previous epochs. Average is effective when the long-term behavior of a workload is stable and does not change greatly. In Average, the future write traffic F_S^avg(E_k) for E_k is estimated as follows:

F_S^avg(E_k) = (F_S(E_{k-1}) + F_S(E_{k-2}) + ... + F_S(E_1)) / (k − 1).   (8)

• Weighted is an extension of Average. It gives a higher weight to the three most recent epochs, so as to reflect the recency of a workload. In Weighted, the future write traffic F_S^wgt(E_k) for E_k is estimated as follows:

F_S^wgt(E_k) = (F_S(E_{k-1}) + F_S(E_{k-2}) + F_S(E_{k-3})) / 3 × 0.8 + (F_S(E_{k-4}) + ... + F_S(E_1)) / (k − 4) × 0.2.   (9)

• Cyclic is motivated by the previous observations [17] that enterprise workloads often exhibit cyclic behaviors with periods between several minutes and several days. Cyclic detects the repeated pattern of a workload and adjusts the length of an epoch so that it includes the entire repeated pattern [1]. Then, it assumes that the write traffic observed in the latest epoch will be repeated again in the next epoch. In Cyclic, the future write traffic F_S^cyc(E_k) for E_k is estimated as follows:

F_S^cyc(E_k) = F_S(E_{k-1}).   (10)

In Fig. 5, if F_S^cyc(E_k) exhibits the smallest write-traffic difference, F_S^pred(E_k) is set to F_S^cyc(E_k).

5.2 Determination of Write Speed

The write-speed selector decides the write speed for the next epoch based on the future write traffic and the available write endurance. In this subsection, we first explain our mechanism for deciding the write speed and then describe how the available write endurance is estimated with consideration of the self-recovery effect as well as the program/erase mode.

Write-speed decision overview. Whenever a new epoch E_k begins, the write-speed selector decides a write time t_k^wt for E_k by increasing or decreasing the previous write time t_{k-1}^wt used for the previous epoch E_{k-1}. The newly decided write time is applied equally to all write elements in the epoch (i.e., for W_k1, ..., W_km in E_k, t_k1^wt, ..., t_km^wt are equal to t_k^wt). After the epoch ends, the write-speed selector decides a write time t_{k+1}^wt for the next epoch E_{k+1} by updating t_k^wt. In LADY, changes of the write time occur only at the beginning of each epoch. This is not only useful to avoid response time variations caused by frequent write speed changes, but also allows us to decide the write speed according to changing workloads.

Write-speed decision algorithm. We now detail how the write-speed selector decides the write speed. The write time t_k^wt is decided by two factors: the program time t_k^prog of the program/erase mode and the length of an artificial delay t_k^del. The write-speed selector starts with the nominal program/erase mode with no artificial delays. Then, at the beginning of the kth epoch E_k (k > 1), it decides the write speed based on the expected future write traffic F_S^pred(E_k) and an epoch capacity c_k. The epoch capacity c_k is the number of writable bytes assigned to E_k. The epoch capacity changes depending on the length of idle times and the NAND program mode. We explain how the epoch capacity is estimated later.

Fig. 6 shows our write speed decision algorithm. The write-speed selector first checks whether the expected future write traffic is equal to the epoch capacity (i.e., F_S^pred(E_k) = c_k) with the program mode used before. If it is, the write time

becomes the same as the previous one, under the assumption that similar traffic will be repeated in the future (lines 1-3). Otherwise, the write-speed selector attempts to decide the new write time based on the expected future write traffic and the epoch capacity. The write-speed selector uses the same program/erase mode that was used for the previous epoch (i.e., mode := mode_{k-1}). The write time is initially set to ∞ (i.e., t_k^wt := ∞) (lines 4-5).

If the expected future write traffic is larger than the epoch capacity, as shown in Fig. 7a (i.e., F_S^pred(E_k) > c_k), the write time must be longer than the previous write time. The change Δt^wt in the previous write time is obtained as follows:

Δt_k^wt = F_T^pred(E_k) · (F_S^pred(E_k) / c_k − 1) / m   if F_S^pred(E_k) > c_k,   (11)

where m is the number of write elements allowed to be written during E_k and F_T^pred(E_k) is the length of E_k. To make the data written during E_k equal to c_k, F_S^pred(E_k) − c_k of the data must be delayed to the next epoch, as shown in Fig. 7a. The total time required to delay those data is approximated as F_T^pred(E_k) · (F_S^pred(E_k)/c_k − 1). Since write elements are equally delayed, Δt_k^wt is obtained by dividing the total delay by m. t_{k-1}^wt + Δt^wt is chosen as a temporary write time t_tmp^wt (lines 8-10).

Fig. 6. A write speed decision algorithm.

Fig. 7. A change in a write time.

If the expected future write traffic is smaller than the epoch capacity, as shown in Fig. 7b (i.e., F_S^pred(E_k) < c_k), it means that write requests were not intensive enough to wear out the SSD before the target lifetime or that they were throttled too much in the previous epoch. The write time must be shorter than the previous write time so that more data are written. The change Δt^wt in the previous write time is as follows:

Δt_k^wt = F_T^pred(E_k) · (c_k / F_S^pred(E_k) − 1) / m   if F_S^pred(E_k) < c_k.   (12)

To increase the data to be written by c_k − F_S^pred(E_k), the temporary write time is set to t_{k-1}^wt − Δt^wt. If t_{k-1}^wt − Δt^wt < T_prog^(mode), it is set to T_prog^(mode), because the program time is fixed to T_prog^(mode) (lines 11-13).

If the temporary write time is shorter than the previously decided write time (i.e., t_tmp^wt < t_k^wt), the write-speed selector chooses the temporary one along with the corresponding NAND program mode (i.e., t_k^wt = t_tmp^wt and mode_k = mode). Otherwise, the previously decided write time is maintained and the write speed decision is finished (lines 17-20). Note that the artificial delay t_k^del is t_k^wt − T_prog^(mode_k) (line 31).

The write-speed selector then attempts to choose a proper program/erase mode. If mode is smaller than 3 and the write time is longer than the next slower NAND program speed (i.e., T_prog^(mode+1) ≤ t_k^wt), the write-speed selector selects the slower one (lines 21-24). Then, the write time is calculated again. The slower program mode improves the write endurance, allowing us to write more data. Thus, as depicted in Fig. 8a, the write time decreases with the slower mode.

If mode is larger than 0 and the current NAND program time is equal to the write time (i.e., T_prog^(mode) = t_k^wt), the NAND program speed may be set too slow (i.e., T_prog^(mode) > t_{k-1}^wt − Δt^wt). This would overly throttle write performance. The write-speed selector chooses the next faster mode and calculates the write time again (lines 25-28). As illustrated in Fig. 8b, the over-throttling problem can be avoided with the faster program mode. As a result, the
Authorized licensed use limited to: University of Leeds. Downloaded on July 08,2024 at 23:21:45 UTC from IEEE Xplore. Restrictions apply.
LEE AND KIM: EFFECTIVE LIFETIME-AWARE DYNAMIC THROTTLING FOR NAND FLASH-BASED SSDS 1081
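The write-time update of Eq. (11) can be transcribed directly. The sketch below is our own illustration of the formula, not the authors' implementation; predicted_bytes, epoch_capacity, epoch_len, and num_elems stand for F_S^pred(E_k), c_k, F_T^pred(E_k), and m.

```python
def delta_write_time(predicted_bytes, epoch_capacity, epoch_len, num_elems):
    """Increase in the per-element write time, following Eq. (11).

    When the predicted traffic F_S exceeds the epoch capacity c_k, the
    surplus must be pushed into the next epoch; the total required delay,
    F_T * (F_S / c_k - 1), is spread evenly over the m write elements.
    """
    if predicted_bytes <= epoch_capacity:
        return 0.0  # traffic fits into the epoch; no extra delay needed
    total_delay = epoch_len * (predicted_bytes / epoch_capacity - 1.0)
    return total_delay / num_elems

# Example: 2 MB predicted against a 1 MB capacity in a 4-second epoch with
# 512 write elements delays each element by 4 * (2/1 - 1) / 512 seconds.
```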
Fig. 8. A change in a program mode.

The expected future write traffic could be the same as the epoch capacity after changing the program mode (i.e., F_S^pred(E_k) == c_k). In that case, the temporary write time is the same as the newly chosen program time because future write traffic would be properly regulated with no artificial delays (lines 14-16).

Finally, the write-speed selector keeps changing the program/erase mode until the above two conditions are no longer met (line 29) or the write time with the new program/erase mode is longer than the previously decided one (line 20).

Estimating epoch capacity. In order to know the epoch capacity c_k, the write-speed selector first estimates the number C_r of remaining bytes that can be written until the SSD becomes unreliable. It then decides the epoch capacity by equally distributing the number of remaining bytes to future epochs. The wearing rate of NAND flash depends on the length of idle times and the program/erase mode. The number of remaining writable bytes is thus a function of the current P/E cycles, the length of idle times between consecutive P/E cycles T_idle, and the NAND program mode T_prog^(mode) chosen for writing data:

    C_r = f_cr(current P/E cycles, T_idle, T_prog^(mode)).    (13)

The SSD is composed of several thousands of flash blocks, so the current P/E cycles and T_idle of individual blocks differ from block to block. In this paper, we estimate the remaining bytes in a conservative way. The highest number of P/E cycles among all of the flash blocks is chosen as the current P/E cycles, and the shortest idle time is used as T_idle.

Fig. 9 illustrates how C_r is estimated. The write-speed selector estimates the maximum number of effective P/E cycles using Eq. (6) in Section 2 under the assumption that T_idle and T_prog^(mode) will be the same in the future. In Fig. 9, the number of current P/E cycles and the maximum number of effective P/E cycles are 1.5K and 6K, respectively. The number of remaining effective P/E cycles is 4.5K (= 6K − 1.5K), so C_r is 4.5K × SSD Capacity. Finally, the epoch capacity c_k is obtained by dividing C_r by the number of remaining epochs.

Fig. 9. Estimation of the remaining writable bytes.

5.3 Write Performance Regulation
Once the write speed is decided, the write-traffic regulator of LADY throttles write performance by equally delaying write elements in the same epoch. This write-traffic regulation is effective in minimizing response time variations, but it may not guarantee the required lifetime all the time; if unexpectedly heavy write traffic comes, more data than the epoch capacity could be written. For this reason, the write-traffic regulator adopts an epoch-capacity regulation policy that prevents more data than our expectation from being written.

One of the easiest ways to enforce the epoch capacity is to stop writing once the epoch capacity is exhausted. We call this strict capacity regulation. For example, suppose that the epoch length is 4 seconds and the epoch capacity is 1 MB. Further suppose that the epoch capacity of 1 MB runs out at the end of the third second. The write-traffic regulator prevents the overuse of the epoch capacity by stopping writing data for the remaining 1 second. This strict capacity regulation incurs great response time variations because write requests arriving after 3 seconds are delayed until the next epoch begins.

We resolve this problem by introducing the concepts of a default capacity and a spare capacity. The default capacity is the portion of the epoch capacity that is evenly assigned to every second by default. The default capacity is useful to offer the minimum write throughput. The spare capacity is an additional capacity borrowed from future epochs. If the spare capacity is 10 percent, the write-traffic regulator borrows 10 percent of the capacities of all the future epochs. If more data than the default capacity are written, the write-traffic regulator temporarily uses the spare capacity for writing data, avoiding strict capacity regulation. After the epoch ends, the write speed for future epochs is slightly reduced to reclaim the epoch capacity overly used by previous epochs.

We give a detailed explanation of how the write-traffic regulator behaves with the spare capacity. Suppose that the spare capacity is 10 percent and there are j epochs. The epoch capacities are denoted by c_1, ..., c_j, respectively, and c_1 = ... = c_j = C_r / j. If the length of the epoch is n seconds, the default capacity for the first epoch is c_1 / n. The spare capacity for the 1st epoch is (c_2 + ... + c_j) × 0.1. The total capacity assigned to the 1st epoch is c_1 + (c_2 + ... + c_j) × 0.1. For example, if j is 4 and C_r is 4 MB, c_1 is 1 MB (= 4 MB / 4) and the spare capacity is 0.3 MB (= 3 MB × 0.1). If less data than c_1 have been written, the remaining capacity C_r after the 1st epoch is equally distributed to the remaining epochs, and the spare capacity for the second epoch is decided by (c_3 + ... + c_j) × 0.1.
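The capacity assignment just described can be sketched as follows; the function and argument names are ours, and the code is an illustration of the scheme rather than the authors' implementation.

```python
def epoch_budget(c_r, num_epochs, epoch_len, spare_ratio=0.1):
    """Derive the per-epoch, per-second, and spare capacities.

    C_r is split evenly over the remaining epochs (c_1 = ... = c_j = C_r/j);
    the default capacity spreads one epoch's share over its seconds, and the
    spare capacity borrows spare_ratio of all future epochs' shares.
    """
    c_k = c_r / num_epochs              # epoch capacity
    default_per_sec = c_k / epoch_len   # evenly assigned to every second
    spare = (num_epochs - 1) * c_k * spare_ratio
    return c_k, default_per_sec, spare

# The running example: j = 4 epochs, C_r = 4 MB, 4-second epochs
# -> c_1 = 1 MB, a default capacity of 0.25 MB/s, and a 0.3 MB spare.
```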
For instance, if 1.0 MB of data are written in the first epoch, the spare capacity is not used and C_r is reduced to 3.0 MB.² c_2, c_3, and c_4 are 1.0 MB (= 3.0 MB / 3) and the spare capacity is 0.2 MB (= 2.0 MB × 0.1).

2. According to Eq. (13), C_r changes depending on the length of idle times and the program mode. For the sake of simplicity, we assume that these parameters are the same as in the previous epoch.

If the spare capacity is partially used, c_2, ..., c_j are reduced to 90 percent of the original capacity and only the unallocated capacity is used as the spare capacity. In the above example, if 1.2 MB of data are written in the first epoch, C_r is reduced to 2.8 MB at the beginning of the second epoch. c_2, c_3, and c_4 are 0.9 MB and the spare capacity becomes 0.1 MB. This capacity assignment makes the write-speed selector slightly reduce write performance with a smaller epoch capacity (i.e., the epoch capacity is 0.9 MB instead of 1.0 MB), helping us to reclaim the overused capacity in the future epochs.

In the worst case, the spare capacity could be completely used up before the epoch ends. In this case, the write-traffic regulator strictly regulates write traffic to prevent more data than we assigned from being written. In the example above, the sum of the epoch and spare capacities assigned to the first epoch is 1.3 MB. Fig. 10 illustrates what happens when 1.4 MB of data are requested for writing in the 1st epoch. Here, we assume that the length of the epoch is 4 seconds. The write-traffic regulator assigns the default capacity of 0.25 MB (= 1 MB / 4 seconds) to every second. The spare capacity is not assigned in advance but held in reserve. Initially, 0.25 MB are written in the 1st second and 0.15 MB are written in the 2nd second. 0.1 MB of the capacity assigned to the 2nd second are not used. The unused capacity is forwarded to the 3rd second. In the 3rd second, 0.65 MB are written. The total capacity assigned to the 3rd second is completely consumed (= the spare capacity of 0.3 MB + the unused capacity of 0.1 MB + the default capacity of 0.25 MB). In the 4th second, 0.35 MB are requested for writing. Since the spare capacity is exhausted, the write-traffic regulator allows only 0.25 MB of data to be written, delaying the remaining 0.1 MB to the next second. As a result, 1.3 MB of data have been written in the 1st epoch.

Fig. 10. An example of how LADY handles write traffic when the spare capacity is exhausted.

After the spare capacity runs out, the write-traffic regulator sets the spare capacity to 0 MB and then throttles incoming write traffic using strict capacity regulation. At the beginning of the second epoch, C_r is 2.7 MB and the capacity of the second epoch becomes 0.9 MB. The spare capacity is 0 MB. Therefore, more data than 0.9 MB cannot be written in the second epoch. This could incur great response time variations, but allows us to guarantee the required lifetime.

The spare capacity must be carefully chosen. If the spare capacity is unlimited, it is equivalent to LADY without epoch-capacity regulation, in which an unlimited spare capacity can be borrowed from future epochs. On the other hand, if the spare capacity is too small, strict capacity regulation would be frequently observed due to the lack of the spare capacity. In this work, the spare capacity is empirically set to 10 percent of the remaining capacity. In our observation, it is large enough to avoid strict capacity regulation in real-world applications.

5.4 Priority-Aware Dynamic Throttling
Enterprise services have different performance requirements, so their importance differs from one another [18], [19]. For example, in database management systems (DBMS), transactions having a great effect on end-users' experiences are handled with a higher priority than other ones. Similarly, interactive enterprise services are served in advance of other services like batch jobs, so as to improve the experience of end-users.

To effectively handle write requests from services with different priorities, we propose priority-aware dynamic throttling, called PA-DT, which is an extension of LADY. The basic idea of PA-DT is to throttle write requests from important services less so that they are served more quickly. The SSD lifetime is consumed more by important services. This overused lifetime is offset by further throttling write requests from less important services.

Fig. 11. A procedure of PA-DT in deciding write speeds.

Fig. 11 illustrates how PA-DT decides write speeds for services with different importance. Initially, PA-DT receives a list of importance values I_l for enterprise services Service_l from administrators, i.e., Service 1, Service 2, and Service 3 in the example of Fig. 11. The importance value of a service ranges from 1 to 4. In Fig. 11, the importance values I_1, I_2, and I_3 for Services 1, 2, and 3 are 1, 2, and 2, respectively. An enterprise service with a smaller importance value is more important. PA-DT computes relative throttling priorities P_l for the target services using their importance values. PA-DT obtains a throttling priority P_l for Service_l as follows: P_l = I_l / (I_1 + ... + I_{Ns}), where N_s is the number of services with specific importance values. The sum of the throttling priorities is always 1.0.
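The throttling priorities, together with the priority-aware artificial delays t_(k,l)^del = N_s × t_k^del × P_l that PA-DT derives from them, can be sketched as below; this is our own transcription of the formulas, not the authors' code.

```python
def padt_delays(importance, base_delay):
    """Compute P_l = I_l / (I_1 + ... + I_Ns) and the per-service delays
    t_del(k,l) = Ns * t_del(k) * P_l. Smaller importance values denote
    more important services, which receive shorter artificial delays."""
    n_s = len(importance)
    total = sum(importance)
    priorities = [i / total for i in importance]
    delays = [n_s * base_delay * p for p in priorities]
    return priorities, delays

# Fig. 11's example: importance values (1, 2, 2) with t_del = 1.0 ms
# -> priorities (0.2, 0.4, 0.4) and delays (0.6, 1.2, 1.2) ms.
```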
In Fig. 11, P_1, P_2, and P_3 are 0.2, 0.4, and 0.4, respectively. Similarly, a service with a smaller throttling priority has a higher priority.

According to the priorities of the services, PA-DT decides priority-aware artificial delays t_(k,l)^del for Service_l as follows: t_(k,l)^del = N_s × t_k^del × P_l. In Fig. 11, t_(k,1)^del, t_(k,2)^del, and t_(k,3)^del for Services 1, 2, and 3 are 0.6, 1.2, and 1.2 ms if t_k^del is 1.0 ms. t_(k,l)^del is proportional to P_l. For example, t_(k,1)^del is half as long as t_(k,2)^del and t_(k,3)^del. For services without importance values, t_k^del is used. Finally, PA-DT throttles write elements differently according to their write times t_(k,l)^wt.

In order to support PA-DT, the SSD firmware must know (i) the importance values of the enterprise services and (ii) the service numbers to which individual write elements belong. First, a list of importance values for services can be delivered by adding a new custom command to an existing bus interface, such as ATA and SATA. The S.M.A.R.T function of the SATA interface, which allows device vendors to define their own commands, would be useful for this. Second, in order to deliver the service number to which a write element belongs, a SATA WRITE command must be modified to specify the service number of a write request. The modification of a block device driver is also needed to deliver software-side information (i.e., importance values and service numbers) via those H/W interfaces. Considering that SSD manufacturers often offer custom device drivers and interfaces, it would not be a serious obstacle to add such custom commands/interfaces to SSD products.

5.5 Discussion
Multi-channel architecture support. For the sake of simplicity, we describe LADY based on an SSD with a single-channel architecture, but LADY works well with multi-channel architectures. Regardless of the underlying channel architecture, LADY controls write performance by changing the write speed of individual NAND devices (e.g., NAND flash chips). This throttling mechanism allows LADY to work independently of the organization of the SSD architecture. For a more detailed explanation, please see Section 3 of the Appendix, available in the online supplemental material.

Throttling issues with multi-threaded and multiple applications. In our write-traffic model in Section 3, we assume that if one write element is delayed by t, the arrival times of the following write elements are delayed by the same amount of time. This is not true in some environments, such as multi-threaded and multiple applications. Suppose that two write elements are issued by two threads running on different CPUs. In that case, even if one write element is delayed by t, the arrival time of the other write element may not be delayed because the two threads run independently. Therefore, the write traffic may not be properly throttled, and more data than our expectation are written. LADY effectively handles such a situation. If incoming write traffic is not sufficiently throttled, LADY continuously increases the write time, further slowing down the write speed of the SSD, until write traffic is properly regulated. The behaviors of host applications in fact do not affect LADY. A more detailed description of throttling issues with multiple applications is found in Section 4 of the Appendix, available in the online supplemental material.

Using LADY in the RAID system. The SSD with LADY can be used to compose a disk array of a RAID system. However, since LADY changes the write speed of the SSD according to the characteristics of the input write traffic, if some SSDs receive different write traffic from other ones in the same RAID, then response times across SSDs could be very different. This problem could be solved by implementing LADY in the RAID controller so that it manages multiple SSDs together. Coordinating multiple SSDs in the RAID controller to reduce response time variations is not new and has been studied intensively in [23], [24]. Moreover, since LADY has no dependencies on other SSD firmware modules, it can be easily implemented in the RAID controller.

6 EXPERIMENTAL RESULTS
In order to evaluate LADY, we first carried out a set of evaluations with trace-driven simulators. We also implemented a prototype of LADY in the Linux kernel to assess its feasibility in real-world environments.

6.1 Experiments with Trace-Driven Simulators
Experimental settings. The NAND flash was based on 2-bit MLC flash, and a block was composed of 64 4-KB pages. The page read and program times were 50 and 600 μs, respectively, and the block erasure time was 2 ms. The number of P/E cycles was set to 3K, but it changed depending on idle times and the program/erase mode. The NAND program time also changed according to the program/erase mode chosen. The SSD lifetime was five years.

We performed our evaluations using two types of trace-driven simulators: an SSD simulator and a dynamic throttling (DT) simulator. The SSD simulator was based on a DiskSim-based SSD simulator with four channels [22]. Because of its rich functionality, however, it exhibited a slow simulation speed. For this reason, we used the SSD simulator to collect firmware-level I/O traces sent to NAND flash, which included all of the I/O requests both from the SSD firmware and from the host system. The SSD simulator used a page-level FTL with a greedy garbage-collection policy. We used a firmware-level trace as the input for our DT simulator. The DT simulator supported several throttling algorithms. Since the DT simulator did not simulate complicated FTL algorithms, it was much faster than the SSD simulator. Throttling algorithms worked independently of FTL algorithms; they reduced write performance and did not change the behaviors of FTL algorithms. Thus, our simulation method was effective enough to accurately evaluate throttling algorithms in a rapid manner.

We compared LADY with two existing SSD configurations, NT and ST. While NT was the SSD without write throttling, ST was the SSD with static throttling. We categorized LADY into three sub-techniques: DT, LADY_RECOV, and LADY_ALL, which are summarized in Table 1. All three sub-techniques used the same write-traffic estimation and write-performance regulation policies, which were explained in Sections 5.1 and 5.3, respectively. DT throttled write performance by changing artificial delays without considering the effective lifetime of NAND flash. DT used 3K P/E cycles with the nominal NAND program speed. LADY_RECOV was the same as DT, except that it considered the self-recovery effect.
TABLE 1
A Summary of Evaluated Sub-Techniques

Technique                    DT   LADY_RECOV   LADY_ALL
Artificial Throttling Delay  ✓    ✓            ✓
Self-Recovery Effect         ✗    ✓            ✓
Program/Erase Mode           ✗    ✗            ✓

LADY_RECOV always used the nominal NAND program speed. LADY_ALL was the same as LADY_RECOV, but it exploited different program/erase modes. A default epoch length was set to 10 minutes, but for Cyclic it changed, adapting to workloads.

We chose two enterprise traces, proxy and proj, from the MSR-Cambridge benchmark [20] and used three production traces, exchange, map, and msnfs, from the MS-Production benchmark [21]. Table 2 summarizes the traces. Because of the limited duration of the traces, we performed our evaluations under the assumption that the same I/O pattern will be repeated for five years. Note that the maximum epoch length for Cyclic was smaller than half of the trace duration because we replayed the same traces repeatedly. The amount of written data per hour was different depending on the traces. proxy and proj exhibited low write traffic compared with exchange, map, and msnfs. The write amplification factor (WAF), which has a great effect on the size of write traffic, ranged from 1.62 to 2.26. The SSD capacity was configured differently depending on the size of the traces. For proxy and proj, the SSD capacity was 32 GB. For exchange, map, and msnfs with high write traffic, the capacity of the SSD was set to 128 GB.

TABLE 2
A Summary of Traces Used for Evaluations

Trace     Duration   Data written per hour (GB)   WAF    SSD capacity (GB)
proxy     1 week     4.94                         1.93   32
proj      1 week     2.08                         1.62   32
exchange  1 day      20.61                        2.24   128
map       1 day      23.82                        1.68   128
msnfs     6 hours    18.19                        2.26   128

Lifetime analysis. Fig. 12 shows evaluation results on SSD lifetimes with the five traces. NT cannot guarantee the required SSD lifetime for all the traces, except for proj. The write traffic of proj is not so heavy, so NT ensures the five-year lifetime without throttling. ST and DT do not consider the effective lifetime of NAND flash, throttling write performance based on 3K P/E cycles. Since the P/E cycles of NAND flash increase because of the self-recovery effect, the effective SSD lifetimes with ST and DT are much longer than the required lifetime. It means that ST and DT excessively throttle write performance, which results in poor write performance compared with LADY_RECOV and LADY_ALL. Unlike ST, DT dynamically decides the write speed in response to a changing workload, maximally utilizing 3K P/E cycles and exhibiting better performance than ST. LADY_RECOV takes advantage of the self-recovery effect; thus it throttles write performance so that the SSD lifetime is close to five years. LADY_ALL considers the self-recovery effect and, furthermore, exploits the slow NAND program mode. Thus, it can guarantee the five-year lifetime with smaller performance penalties. We discuss the performance benefit of LADY_ALL over LADY_RECOV in detail later.

Fig. 12. A comparison of SSD lifetimes for five traces.

Table 3 analyzes the lifetimes of SSDs from the perspective of written data for proj and proxy. C_ssd^3K is the number of writable bytes based on 3K P/E cycles, whereas C_ssd is the number of writable bytes when the effective wearing of NAND flash is taken into account. W_work is the number of bytes written to the SSD for five years. ST and DT throttle write performance so that W_work becomes C_ssd^3K at the end of the target lifetime. In the case of ST, however, W_work is 43 and 11 percent smaller than C_ssd^3K for proj and proxy, respectively. This is because ST excessively throttles write performance under the assumption that write traffic is always heavy. Unlike ST, DT dynamically changes the write speed according to a workload, making W_work close to C_ssd^3K. LADY_RECOV fully utilizes the endurance improvement offered by the self-recovery effect, making W_work close to C_ssd. In the case of proj, write throttling is not performed in most cases because write traffic is not so heavy. Instead of merely delaying write requests with artificial delays, LADY_ALL exploits the slow NAND program mode, allowing us to write more data. For this reason, C_ssd of LADY_ALL is 24 percent larger than that of LADY_RECOV for proxy. In the case of proj, C_ssd of LADY_ALL is almost the same as that of LADY_RECOV because the slow program mode is rarely used.

Performance analysis. To understand the effect of LADY on SSD performance, we measured the response time, which was the average elapsed time to write individual pages. Fig. 13 shows our evaluation results. NT exhibits the best I/O response time, but it cannot guarantee the target lifetime. Both LADY_RECOV and LADY_ALL throttle write requests to meet the required lifetime, so their performance is worse than that of NT; LADY_RECOV and LADY_ALL exhibit 1.65x and 1.45x longer write response times than NT, on average, respectively.

TABLE 3
The Amount of Data Written for Five Years for Two Traces, proj and proxy

Trace   SSD configuration   C_ssd^3K (TB)   C_ssd (TB)   W_work (TB)
proj    NT                  93.75           312.6        144.4
        ST                  93.75           403.4        54.2
        DT                  93.75           346.9        93.7
        LADY_RECOV          93.75           312.8        141.0
        LADY_ALL            93.75           312.7        141.3
proxy   NT                  93.75           246.6        399.2
        ST                  93.75           357.7        83.5
        DT                  93.75           347.0        93.74
        LADY_RECOV          93.75           271.9        271.9
        LADY_ALL            93.75           338.2        331.8
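The C_ssd^3K column of Table 3 can be reproduced with simple arithmetic, assuming 1 TB = 1,024 GB as the table's figures imply; the helper below is our own sketch, not from the paper.

```python
def writable_tb(capacity_gb, pe_cycles, gb_per_tb=1024):
    """Writable volume implied by the rated endurance alone (Table 3's
    C_ssd^3K), ignoring the self-recovery effect: the SSD capacity times
    the number of rated P/E cycles."""
    return capacity_gb * pe_cycles / gb_per_tb

# A 32 GB SSD rated at 3K P/E cycles can absorb 32 * 3,000 = 96,000 GB
# of writes, i.e., 93.75 TB, matching Table 3.
```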
In the case of proj, LADY_RECOV and LADY_ALL do not reduce write performance because the required lifetime can be satisfied without throttling. LADY_RECOV achieves 2.32x better performance than DT on average. LADY_ALL outperforms LADY_RECOV and improves the overall write response time by 13.5 percent. DT exhibits 1.4x faster response times than ST on average. DT decides the epoch capacity periodically based on the remaining SSD lifetime and changes the artificial delay in response to future write traffic. ST considers neither the remaining SSD lifetime nor the characteristics of a workload. ST simply regulates write traffic by limiting the maximum write throughput, causing lots of unnecessary throttling.

Fig. 13. A comparison of average write response times.

We compared response time variations between different throttling algorithms. Fig. 14 shows the cumulative distribution functions (CDFs) of write response times for four traces. ST shows significant response time variations for all the traces because it forcibly stops writing if throttling is needed. DT, LADY_RECOV, and LADY_ALL greatly reduce write response time variations over ST by evenly slowing down write performance.

Fig. 14. CDFs of write response times.

Future write-traffic prediction. We evaluated the accuracy of our future write-traffic prediction policy. Table 4 compares the write-traffic prediction accuracies of four prediction policies: Average, Weighted, Cyclic, and Multiple-Expert. Here, the prediction accuracy is obtained by comparing the difference between the predicted write traffic and the actual one; the higher the number, the better the prediction accuracy. The prediction accuracy differs depending on the prediction policies and the workloads, but Multiple-Expert exhibits the highest accuracy.

TABLE 4
Accuracy of Write Traffic Prediction

Prediction accuracy (%)   proxy   proj   exchange   map    msnfs
Average                   65.9    13.7   79.8       45.2   78.8
Weighted                  73.0    17.8   74.6       52.5   78.7
Cyclic                    76.0    16.1   71.9       59.3   77.5
Multiple-Expert           76.9    27.7   83.1       68.8   75.5

Fig. 15 shows the effect of the write-traffic prediction accuracy on response time variations for map. If future write traffic is inaccurately estimated, LADY uselessly increases or decreases the write speed according to the mispredicted future write traffic, incurring variations in response times. As expected, Multiple-Expert, showing the best accuracy, exhibits the smallest response time variations.

Fig. 15. CDFs of write response times with four different prediction policies for the map trace.

Effect of spare capacity. We evaluated the performance of LADY with various spare capacities: 0, 5, and 10 percent, and an unlimited (∞) spare capacity. Fig. 16 shows the CDFs of write response times for exchange and proxy. For exchange, write response times deteriorate significantly with the spare capacities of 0 and 5 percent; LADY often stops writing data because of the depletion of the spare capacity. As the spare capacity increases to 10 percent and ∞, response time variations become smaller, avoiding strict capacity regulation. In the case of proxy, the exhaustion of the spare capacity rarely occurs; thus the size of the spare capacity does not affect overall write response times. The ∞ setting shows the most stable write response times by borrowing unlimited spare capacity, but it cannot ensure the required lifetime. For exchange, LADY then offers about a 4.9-year lifetime.

Fig. 16. The CDFs of write response times with various settings of spare capacity.

Worst-case analysis. LADY properly manages the SSD lifetime even in the worst-case scenario where the spare capacity is completely used up.
To evaluate this, we synthesized the worst-case write traffic; the spare capacity often ran out, and the overly used spare capacity was not fully reclaimed in future epochs. Fig. 17 illustrates how LADY manages incoming write traffic when the spare capacity is nearly exhausted. Fig. 17 shows the sum of the default capacity and the remaining spare capacity, which is denoted as the total capacity. While the default capacity is maintained at 2.2 MB, the spare capacity changes over time. Fig. 17 also shows the amount of data actually written. The spare capacity is nearly exhausted at around the 70th second, but LADY properly handles write traffic so that more data than the total capacity are not written. This allows LADY to guarantee the required SSD lifetime. Because of this strict write-traffic regulation, the write performance is inevitably degraded; the write throughput is about 4 MB/s, but it is reduced to 2-3 MB/s after the spare capacity is exhausted. We conducted a more comprehensive analysis that included lifetime, performance, and response time variations. For more details, see Section 2 of the Appendix, available in the online supplemental material.

Fig. 17. An illustration of how LADY manages write traffic when the spare capacity is nearly exhausted.

Adaptability to write traffic with variability. We evaluated how well LADY behaved under write traffic with variability. To this end, we combined I/O traces exhibiting different I/O patterns. We mixed proxy and proj into one trace (denoted by proxy+proj), and combined exchange, map, and msnfs into one trace (denoted by exchange+map+msnfs).

Fig. 18a shows the write throughput (MB/s) and throttling delays (msec) of proxy+proj for 0-1,400K seconds. The throttling delay is the length of time added by a slower program mode and an artificial delay. The write traffic of proxy is much heavier than that of proj, while proj exhibits large fluctuations in write traffic. Initially, LADY changes throttling delays frequently; it makes wrong decisions because of the lack of previous history. The write traffic rapidly drops after proj starts, so the throttling delay is reduced to 0 msec. LADY gradually adapts to the changing write traffic over time. Fig. 18b shows the write throughput and throttling delays for 7,800K-9,000K seconds. The throttling delays become more stable than those for 0-1,400K seconds. Fig. 20a compares the CDFs of write response times for 0-1,400K and 7,800K-9,000K seconds. For 7,800K-9,000K seconds, fluctuations in response times become smaller. In particular, the average write response time is greatly improved. LADY applies long throttling delays to proxy for 0-1,400K seconds, expecting that heavy write traffic will continue in the future. However, the write traffic of proj is much lower than that of proxy. This allows LADY to apply shorter delays to proxy, which improves write performance.

Figs. 19 and 20b show the write throughput, throttling delays, and CDFs for 0-250K and 1,610K-1,970K seconds of exchange+map+msnfs. LADY works similarly to proxy+proj; LADY often changes the length of throttling delays, but it becomes stable after obtaining sufficient information about previous write traffic.

Fig. 18. Results with proxy+proj.
Fig. 19. Results with exchange+map+msnfs.
Fig. 20. A comparison of CDFs for proxy+proj and exchange+map+msnfs.
LEE AND KIM: EFFECTIVE LIFETIME-AWARE DYNAMIC THROTTLING FOR NAND FLASH-BASED SSDS 1087

Fig. 21. The average response times for proxy+proj and exchange+map.

Fig. 22. A schematic description of our prototype.

Priority-aware dynamic throttling. Finally, we evaluated priority-aware dynamic throttling, PA-DT. We executed two traces simultaneously while assigning different importance values. Two I/O trace combinations, proxy+proj and exchange+map, were used for the evaluation. The SSD capacity for proxy+proj was set to 32 GB, and an SSD of 256 GB was used for exchange+map. The target SSD lifetime was five years. Figs. 21a and 21b show the average write response times for proxy+proj and exchange+map, respectively, with different combinations of importance values ranging from 1 to 4. If two different traces have the same importance values (i.e., (1,1) and (4,4)), they exhibit the same write response times. However, as the difference between two importance values increases, the difference between their write response times increases as well. In the case of proxy+proj, if proxy has higher importance than proj, the average response time of proj is greatly increased. As listed in Table 2, the amount of data written by proxy is about 2.5 times larger than that of proj. If a short write time is assigned to proxy in spite of its heavy write traffic, proj must be throttled more intensively to offset the SSD lifetime excessively consumed by proxy. For exchange+map, where the two traces write similar amounts of data, the difference in write response times is not significant in comparison to proxy+proj. Finally, regarding the SSD lifetime, PA-DT guarantees the five-year target lifetime.

6.2 Experiments with Linux Prototype
In order to evaluate the feasibility of LADY in real-world environments, we implemented a proof-of-concept prototype of LADY in a PC server with a 3.4 GHz i7 CPU, 12 GB RAM, and Samsung's 840 SSD. The operating system was Ubuntu 10.04 with the Linux kernel 2.6.32.29.

Experimental settings. Since it was difficult to implement throttling algorithms directly in the firmware of a commercial SSD, we added an intermediate layer, called a throttling layer, between an I/O scheduler and an SSD. Fig. 22 shows a schematic description of our prototype. Throttling algorithms are implemented in the throttling layer. When a host system issues write requests to the SSD, the throttling layer intercepts and delivers them to the SSD. After receiving completion interrupts from the SSD, it puts them into a throttling queue instead of delivering them to the I/O scheduler. After a throttling delay, it dequeues them from the throttling queue and delivers them to the I/O scheduler.

In our implementation study, the effective wearing properties of NAND flash were not taken into account due to the limited internal information of the SSD. Instead, we focused on evaluating how effectively LADY throttled write performance in real environments. Considering that the effective wearing only had a great effect on the number of effective P/E cycles, the limited internal information was not a serious obstacle in assessing the feasibility of our throttling algorithm.

The benchmark programs used for our evaluations included TPC-C, bonnie++, and postmark. Since it was infeasible to run real-world benchmarks for several years, we manually scaled down the target SSD lifetime Tssd and the amount of writable data Cssd. In particular, we executed multiple benchmark instances or threads, so as to understand whether incoming write traffic was effectively throttled even when independent write requests were issued. A detailed description of the benchmarks and evaluation settings is given in Table 5.

TABLE 5
A Summary of Benchmark Programs

Benchmark   Description                                               Tssd     Cssd
TPC-C       An on-line transaction processing (OLTP) benchmark       1 hour   10.8 GB
            for transaction processing systems. 40 users run
            transactions simultaneously.
Bonnie++    It creates and deletes files in sequential and random    1 hour   72 GB
            orders, while performing different types of file
            system operations. Five bonnie++ programs are
            executed at the same time.
Postmark    It emulates a workload of electronic mail and netnews    1 hour   25 GB
            services. Eight postmark programs are executed
            concurrently.

Performance/lifetime analysis. We implemented two throttling algorithms, static throttling and LADY, in the throttling layer, which are denoted by ST and LADY, respectively. We also evaluated the SSD without write throttling, which is denoted by NT.

Fig. 23. Experimental results with the prototype of LADY in the Linux operating system.

Fig. 24. An illustration of the write traffic of three different throttling policies with TPC-C.

Fig. 23a shows the SSD lifetimes of NT, ST, and LADY. Tssd is set to 1 hour. NT cannot ensure the 1-hour lifetime, while LADY guarantees the target lifetime. For all the benchmarks, the amount of written data is smaller than Cssd. ST

also ensures the target lifetime, but it cannot fully utilize the available SSD endurance because of its excessive throttling, which results in great performance degradation. As depicted in Fig. 23b, LADY reduces write response times by 72, 16, and 45 percent over ST for TPC-C, Bonnie++, and Postmark, respectively. It must be noted that LADY guarantees the target lifetime in an environment where multiple instances of benchmark programs simultaneously access the SSD. As noted in Table 5, five Bonnie++ instances and eight Postmark instances write data to the SSD at the same time, but LADY properly regulates write traffic. Similar results are also observed in TPC-C, where 40 users run transactions concurrently.

Fig. 24 illustrates the write traffic with the three throttling policies, NT, ST, and LADY, when TPC-C is running. NT exhibits the best performance, but the SSD is completely worn out at 3,024 seconds. By limiting the maximum write throughput of the SSD to 3 MB/s (i.e., 10.8 GB/3,600 seconds), ST extends the SSD lifetime to more than 1 hour. However, the SSD performance is excessively regulated when write requests are intensively issued; on the other hand, it is never throttled if there are only a few write requests. For this reason, ST incurs significant write response variations, underutilizing the available SSD endurance. Unlike ST, LADY predicts the future write traffic and then changes write speeds as evenly as possible. Thus, the SSD is worn out at the target lifetime with small variations in write response times.

7 RELATED WORK
As the endurance of flash memory is continuously reduced, several endurance enhancement techniques that aggressively reduce the amount of data written to SSDs have been proposed. Data de-duplication [25] and data compression [26] are representative ones. These techniques are useful in improving the lifetime of SSDs, but they have a limitation in that none of them guarantees the SSD lifetime. J. Guerra et al. presented a storage configuration strategy that effectively combines flash-based SSDs with HDDs for multi-tier storage systems [6]. Their technique decided an adequate mix of storage devices that requires low capital and operating costs, satisfying performance requirements with minimum power consumption. Although this work considered various aspects required for designing multi-tier storage systems, it did not take into account the SSD lifetime, which has a great effect on the overall storage cost, nor the trade-off between SSD performance and lifetime.

Exploiting the recovery effect of NAND flash has received considerable attention. Mohan et al. first investigated the effect of the damage recovery on the SSD lifetime [8]. They showed that the endurance of NAND flash was durable enough even for I/O-intensive enterprise servers due to its recovery ability. Their investigations were limited to 90 nm NAND flash, which exhibits good write endurance. They also did not exploit the recovery effect in ensuring the SSD lifetime.

Differences from our previous study. We presented the basic idea of dynamic throttling in [1]. Our previously proposed throttling technique was greatly improved in several aspects. First, while our earlier work only considered the self-recovery effect of NAND flash, this study took into account the trade-off between NAND program speed and write endurance in addition to the self-recovery effect. This improved the write performance by 13.5 percent on average over our previous technique. Second, we improved the write-traffic prediction technique, which estimated future write traffic more accurately, lowering response time variations. Third, we proposed priority-aware dynamic throttling, which managed the performance and lifetime of SSDs according to the importance of enterprise services. Finally, we implemented a proof-of-concept prototype of LADY in the Linux kernel and evaluated its feasibility using various benchmark programs, including TPC-C, bonnie++, and postmark.

8 CONCLUSIONS
In this paper, we proposed the effective lifetime-aware dynamic throttling technique, called LADY, to overcome the lifetime problems of flash-based SSDs in enterprise environments. LADY throttled write performance so that the required lifetime was satisfied. In order to guarantee the SSD lifetime with small throttling penalties, LADY exploited the effective wearing characteristics of NAND flash. Our experimental results showed that LADY guaranteed a lifetime warranty, while achieving better write response times and smaller variations in response times over the static throttling technique.

ACKNOWLEDGMENTS
An earlier version of this paper was presented at the USENIX Conference on File and Storage Technologies, February 14-17, 2012 [1]. This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2013R1A6A3A03063762). This work was supported by the NRF grant funded by the Ministry of Science, ICT and Future Planning (NRF-2013R1A2A2A01068260). This research was

supported by Next-Generation Information Computing Development Program through NRF funded by the Ministry of Science, ICT and Future Planning (No. 2010-0020724). The ICT at Seoul National University and IDEC provided research facilities for this study. Jihong Kim is the corresponding author of this paper.

REFERENCES
[1] S. Lee, T. Kim, K. Kim, and J. Kim, "Lifetime management of flash-based SSDs using recovery-aware dynamic throttling," in Proc. USENIX Conf. File Storage Technol., 2012, pp. 327-340.
[2] E. Pinheiro, W.-D. Weber, and L.-A. Barroso, "Failure trends in a large disk drive population," in Proc. USENIX Conf. File Storage Technol., 2007, pp. 17-23.
[3] D. Narayanan, E. Thereska, A. Donnelly, S. Elnikety, and A. Rowstron, "Migrating server storage to SSDs: Analysis of tradeoffs," in Proc. ACM Eur. Conf. Comput. Syst., 2009, pp. 145-158.
[4] A. Opitz, H. König, and S. Szamlewska, "What does grid computing cost?" J. Grid Comput., vol. 6, no. 4, pp. 385-397, 2008.
[5] B. You, J. Park, S. Lee, G. Baek, J. Lee, M. Kim, J. Kim, H. Chung, E. Jang, and T. Kim, "A high performance co-design of 26 nm 64 Gb MLC NAND flash memory using the dedicated NAND flash controller," J. Semiconductor Technol. Sci., vol. 11, no. 2, pp. 121-129, 2011.
[6] J. Guerra, H. Pucha, J. Glider, W. Belluomini, and R. Rangaswami, "Cost effective storage using extent based dynamic tiering," in Proc. USENIX Conf. File Storage Technol., 2011, pp. 273-286.
[7] J. Jeong, S. Hahn, S. Lee, and J. Kim, "Improving NAND endurance by dynamic program and erase scaling," in Proc. USENIX Workshop Hot Topics Storage File Syst., 2013.
[8] V. Mohan, T. Siddiqua, S. Gurumurthi, and M. Stan, "How I learned to stop worrying and love flash endurance," in Proc. Workshop Hot Topics Storage File Syst., 2010, pp. 1-3.
[9] N. Mielke, H. Belgal, A. Fazio, Q. Meng, and N. Righos, "Recovery effects in the distributed cycling of flash memories," in Proc. IEEE Int. Rel. Phys. Symp., 2006, pp. 29-35.
[10] Q. Wu, G. Dong, and T. Zhang, "Exploiting heat-accelerated flash memory wear-out recovery to enable self-healing SSDs," in Proc. Workshop Hot Topics Storage File Syst., 2011, pp. 1-4.
[11] Y. Pan, G. Dong, and T. Zhang, "Exploiting memory device wear-out dynamics to improve NAND flash memory system performance," in Proc. USENIX Conf. File Storage Technol., 2011, pp. 245-258.
[12] Dell Inc., "Solid State Drive (SSD) FAQ," 2011.
[13] SMART Modular Technologies, "XceedIOPS SATA SSD," 2012.
[14] N. Cesa-Bianchi, Y. Freund, D. P. Helmbold, D. Haussler, R. E. Schapire, and M. K. Warmuth, "How to use expert advice," in Proc. ACM Symp. Theory Comput., 1993, pp. 427-485.
[15] G. Dhiman and T. Rosing, "System-level power management using online learning," IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 28, no. 5, pp. 676-689, May 2009.
[16] S. Yoo and C. Park, "Low power mobile storage: SSD case study," in Energy-Aware System Design: Algorithms and Architectures. New York, NY, USA: Springer, 2011, pp. 223-246.
[17] D. Gmach, J. Rolia, L. Cherkasova, and A. Kemper, "Workload analysis and demand prediction of enterprise data center applications," in Proc. IEEE Int. Symp. Workload Characterization, 2007, pp. 171-180.
[18] M. Carey, R. Jauhari, and M. Livny, "Priority in DBMS resource scheduling," in Proc. Int. Conf. Very Large Data Bases, 1989, pp. 397-410.
[19] D. McWherter, B. Schroeder, A. Ailamaki, and M. Harchol-Balter, "Improving preemptive prioritization via statistical characterization of OLTP locking," in Proc. Int. Conf. Data Eng., 2005, pp. 446-457.
[20] D. Narayanan, A. Donnelly, and A. Rowstron, "Write off-loading: Practical power management for enterprise storage," in Proc. USENIX Conf. File Storage Technol., 2008, pp. 253-267.
[21] S. Kavalanekar, B. Worthington, Q. Zhang, and V. Sharda, "Characterization of storage workload traces from production Windows servers," in Proc. IEEE Int. Symp. Workload Characterization, 2008, pp. 119-128.
[22] N. Agrawal, V. Prabhakaran, and T. Wobber, "Design tradeoffs for SSD performance," in Proc. USENIX Annu. Tech. Conf., 2008, pp. 57-70.
[23] Y. Kim, J. Lee, S. Oral, D. A. Dillow, F. Wang, and G. M. Shipman, "Coordinating garbage collection for arrays of solid-state drives," IEEE Trans. Comput., vol. 63, no. 4, pp. 888-901, Apr. 2014.
[24] Y. Kim, S. Oral, G. M. Shipman, and J. Lee, "Harmonia: A globally coordinated garbage collector for arrays of solid-state drives," in Proc. Symp. Massive Storage Syst. Technol., 2011, pp. 1-12.
[25] F. Chen, T. Luo, and X. Zhang, "CAFTL: A content-aware flash translation layer enhancing the lifespan of flash memory based solid state drives," in Proc. USENIX Conf. File Storage Technol., 2011, pp. 77-90.
[26] T. Park and J.-S. Kim, "Compression support for flash translation layer," in Proc. Int. Workshop Softw. Support Portable Storage, 2010, pp. 19-24.

Sungjin Lee received the BE degree in electrical engineering from Korea University, Seoul, Korea, in 2005 and the MS and PhD degrees in computer science and engineering from Seoul National University, Seoul, in 2007 and 2013, respectively. He is currently working as a postdoctoral associate in the Computer Science and Artificial Intelligence Laboratory at the Massachusetts Institute of Technology, Cambridge, MA. His current research interests include storage systems, operating systems, and embedded software.

Jihong Kim received the BS degree in computer science and statistics from Seoul National University (SNU), Seoul, Korea, in 1986, and the MS and PhD degrees in computer science and engineering from the University of Washington, Seattle, WA, in 1988 and 1995, respectively. Before joining SNU in 1997, he was a technical staff member at the DSPS R&D Center of Texas Instruments in Dallas, TX. He is currently a professor at the School of Computer Science and Engineering, SNU. His current research interests include embedded software, low-power systems, computer architecture, and storage systems. He is a member of the IEEE.
