
USENIX Association

Proceedings of the
FAST 2002 Conference on
File and Storage Technologies

Monterey, California, USA


January 28-30, 2002

THE ADVANCED COMPUTING SYSTEMS ASSOCIATION

© 2002 by The USENIX Association. All Rights Reserved.
For more information about the USENIX Association:
Phone: 1 510 528 8649  FAX: 1 510 548 5738  Email: office@[Link]  WWW: [Link]
Rights to individual papers remain with the author or the author's employer.
Permission is granted for noncommercial reproduction of the work for educational or research purposes.
This copyright notice must be included in the reproduced paper. USENIX acknowledges all trademarks herein.
SnapMirror®: File System Based Asynchronous Mirroring
for Disaster Recovery
Hugo Patterson, Stephen Manley, Mike Federwisch, Dave Hitz, Steve Kleiman, Shane Owara
Network Appliance Inc.
Sunnyvale, CA
{hugo, stephen, mikef, hitz, srk, owara}@[Link]

Abstract

Computerized data has become critical to the survival of an enterprise. Companies must have a strategy for recovering their data should a disaster such as a fire destroy the primary data center. Current mechanisms offer data managers a stark choice: rely on affordable tape but risk the loss of a full day of data and face many hours or even days to recover, or have the benefits of a fully synchronized on-line remote mirror, but pay steep costs in both write latency and network bandwidth to maintain the mirror. In this paper, we argue that asynchronous mirroring, in which batches of updates are periodically sent to the remote mirror, can let data managers find a balance between these extremes. First, by eliminating the write latency issue, asynchrony greatly reduces the performance cost of a remote mirror. Second, by storing up batches of writes, asynchronous mirroring can avoid sending deleted or overwritten data and thereby reduce network bandwidth requirements. Data managers can tune the update frequency to trade network bandwidth against the potential loss of more data. We present SnapMirror, an asynchronous mirroring technology that leverages file system snapshots to ensure the consistency of the remote mirror and optimize data transfer. We use traces of production filers to show that even updating an asynchronous mirror every 15 minutes can reduce data transferred by 30% to 80%. We find that exploiting file system knowledge of deletions is critical to achieving any reduction for no-overwrite file systems such as WAFL and LFS. Experiments on a running system show that using file system metadata can reduce the time to identify changed blocks from minutes to seconds compared to purely logical approaches. Finally, we show that using SnapMirror to update every 30 minutes increases the response time of a heavily loaded system only 22%.

1 Introduction

As reliance on computerized data storage has grown, so too has the cost of data unavailability. A few hours downtime can cost from thousands to millions of dollars depending on the size of the enterprise and the role of the data. With increasing frequency, companies are instituting disaster recovery plans to ensure appropriate data availability in the event of a catastrophic failure or disaster that destroys a site (e.g. flood, fire, or earthquake). It is relatively easy to provide redundant server and storage hardware to protect against the loss of physical resources. Without the data, however, the redundant hardware is of little use.

The problem is that current strategies for data protection and recovery offer either inadequate protection, or are too expensive in performance and/or network bandwidth. Tape backup and restore is the traditional approach. Although favored for its low cost, restoring from a nightly backup is too slow and the restored data is up to a day old. Remote synchronous and semi-synchronous mirroring are more recent alternatives. Mirrors keep backup data on-line and fully synchronized with the primary store, but they do so at a high cost in performance (write latency) and network bandwidth. Semi-synchronous mirrors can reduce the write-latency penalty, but can result in inconsistent, unusable data unless write ordering across the entire data set, not just within one storage device, is guaranteed. Data managers are forced to choose between two extremes: synchronized with great expense or affordable with a day of data loss.

In this paper, we show that by letting a mirror volume lag behind the primary volume it is possible to reduce substantially the performance and network costs of maintaining a mirror while bounding the amount of data loss. The greater the lag, the greater the data loss, but the cheaper the cost of maintaining the mirror. Such asynchronous mirrors let data managers tune their systems to strike the right balance between potential data loss and cost.

We present SnapMirror, a technology which implements asynchronous mirrors on Network Appliance filers. SnapMirror periodically transfers self-consistent snapshots of the data from a source volume to the destination volume. The mirror is on-line, so disaster recovery can be instantaneous. Users set the update frequency.

SnapMirror, NetApp, and WAFL are registered trademarks of Network Appliance, Inc.
If the update frequency is high, the mirror will be nearly current with the source and very little data will be lost when disaster strikes. But, by lowering the update frequency, data managers can reduce the performance and network cost of maintaining the mirror at the risk of increased data loss.

There are three main problems in maintaining an asynchronous mirror. First, for each periodic transfer, the system must determine which blocks need to be transferred to the mirror. To obtain the bandwidth reduction benefits of asynchrony, the system must avoid transferring data which is overwritten or deleted. Second, if the source volume fails at any time, the destination must be ready to come on line. In particular, a half-completed transfer can't leave the destination in an unusable state. Effectively, this means that the destination must be in, or at least recoverable to, a self-consistent state at all times. Finally, for performance, disk reads on the source and writes on the destination must be efficient.

In this paper, we show how SnapMirror leverages the internal data structures of NetApp's WAFL® file system [Hitz94] to solve these problems. SnapMirror leverages the active block maps in WAFL's snapshots to quickly identify changed blocks and avoid transferring deleted blocks. Because SnapMirror transfers self-consistent snapshots of the file system, the remote mirror is always guaranteed to be in a consistent state. New updates appear atomically. Finally, because it operates at the block level, SnapMirror is able to optimize its data reads and writes.

We show that SnapMirror's periodic updates transfer much less data than synchronous block-level mirrors. Update intervals as short as 1 minute are sufficient to reduce data transfers by 30% to 80%. The longer the period between updates, the less data needs to be transferred. SnapMirror allows data managers to optimize the tradeoff of data currency against cost for each volume.

In this paper, we explore the interaction between asynchronous mirroring and no-overwrite file systems such as LFS [Rosenblum92] and WAFL. We find that asynchronous block-level mirroring of these file systems does not transfer less data than synchronous mirroring. Because these file systems do not update in place, logical overwrites become writes to new storage blocks. To gain the data reduction benefits of asynchrony for these file systems, it is necessary to have knowledge of which blocks are active and which have been deallocated and are no longer needed. This is an important observation since many commercial mirroring products are implemented at the block level.

1.1 Outline for remainder of paper

We start, in Section 1.2, with a discussion of the requirements for disaster recovery. We go on in Sections 1.3 and 1.4 to discuss the shortcomings of tape-based recovery and synchronous remote mirroring. In Section 2, we review related work. We present the design and implementation of SnapMirror in Section 3. In Section 4, we use system traces to study the data reduction benefits of asynchronous mirroring with file system knowledge. Then, in Section 5, we compare SnapMirror to asynchronous mirroring at the logical file level. Section 6 presents experiments measuring the performance of our SnapMirror implementation running on a loaded system. Conclusion, acknowledgments, and references are in Sections 7, 8, and 9.

1.2 Requirements for Disaster Recovery

Disaster recovery is the process of restoring access to a data set after the original was destroyed or became unavailable. Disasters should be rare, but data unavailability must be minimized. Large enterprises are asking for disaster recovery techniques that meet the following requirements:

Recover quickly. The data should be accessible within a few minutes after a failure.

Recover consistently. The data must be in a consistent state so that the application does not fail during the recovery attempt because of a corrupt data set.

Minimal impact on normal operations. The performance impact of a disaster recovery technique should be minimal during normal operations.

Up to date. If a disaster occurs, the recovered data should reflect the state of the original system as closely as possible. Loss of a day or more worth of updates is not acceptable in many applications.

Unlimited distance. The physical separation between the original and recovered data should not be limited. Companies may have widely separated sites and the scope of disasters such as earthquakes or hurricanes may require hundreds of miles of separation.

Reasonable cost. The solution should not require excessive cost, such as many high-speed, long-distance links (e.g. direct fiber optic cable). Preferably, the link should be compatible with WAN technology.

1.3 Recovering from Off-line Data

Traditional disaster recovery strategies involve loading a saved copy of the data from tape onto a new server in a different location. After a disaster, the most recent full backup tapes are loaded onto the new server. A series of nightly incremental backups may follow the
full backup to bring the recovered volume as up-to-date as possible. This worked well when file systems were of moderate size and when the cost of a few hours of downtime was acceptable, provided such events were rare.

Today, companies are taking advantage of the 60% compound annual growth rate in disk drive capacity [Grochowski96] and file system size is growing rapidly. Terabyte storage systems are becoming commonplace. Even with the latest image dump technologies [Hutchinson99], data can only be restored at a rate of 100-200 GB/hour. If disaster strikes a terabyte file system, it will be off line for at least 5-10 hours if tape-based recovery technologies are used. This is unacceptable in many environments.

Will technology trends solve this problem over time? Unfortunately, the trends are against us. Although disk capacities are growing 60% per year, disk transfer rates are growing at only 40% per year [Grochowski96]. It is taking more, not less, time to fill a disk drive even in the best case of a purely sequential data stream. In practice, even image restores are not purely sequential and achieved disk bandwidth is less than the sequential ideal. To achieve timely disaster recovery, data must be kept on-line and ready to go.

1.4 Remote Mirroring

Synchronous remote mirroring immediately copies all writes to the primary volume to a remote mirror volume. The original transfer is not acknowledged until the data is written to both volumes. The mirror gives the user a second identical copy of the data to fall back on if the primary file system fails. In many cases, both copies of the data are also locally protected by RAID.

The down side of synchronous remote mirroring is that it can add a lot of latency to I/O write operations. Slower I/O writes slow down the server writing the data. The extra latency results first from serialization and transmission delays in the network link to the remote mirror. Longer distances can bloat response time to unacceptable levels. Second, unless there is a dedicated high-speed line to the remote mirror, network congestion and bandwidth limitations will further reduce performance. For these reasons, most synchronous mirroring implementations limit the distance to the remote mirror to 40 kilometers or less.

Because of its performance limitations, synchronous mirroring implementations sometimes slightly relax strict synchrony, to allow a limited number of source I/O operations to proceed before waiting for acknowledgment of receipt from the remote site1. Although this approach can reduce I/O latency, it does not reduce the link bandwidth needed to keep up with the writes. Further, the improved performance comes at the cost of some potential data loss in the event of a disaster.

A major challenge for non-synchronous mirroring is ensuring the consistency of the remote data. If writes arrive out-of-order at the remote site, the remote copy of the data may appear corrupted to an application trying to use the data after a disaster. If this occurs, the remote mirroring will have been useless since a full restore from tape will probably be required to bring the application back on line. The problem is especially difficult when a single data set is spread over multiple devices and the mirroring is done at the device level. Although each device guarantees in-order delivery of its data, there may be no ordering guarantees among the devices. In a rolling disaster, one in which devices fail over a period of time (imagine fire spreading from one side of the data center to the other), the remote site may receive data from some devices but not others. Therefore, whenever synchrony is relaxed, it is important that it be coordinated at a high enough level to ensure data consistency at the remote site.

Another important issue is keeping track of the updates required on the remote mirror should it or the link between the two systems become unavailable. Once the modification log on the primary system is filled, the primary system usually abandons keeping track of individual modifications and instead keeps track of updated regions. When the destination again becomes available, the regions are transferred. Of course, the destination file system may be inconsistent while this transfer is taking place, since file system ordering rules may be violated, but it's better than starting from scratch.

1. EMC's SRDF™ in semi-synchronous mode or Storage Computer's Omniforce® in log synchronous mode.

2 Related Work

There are other ways to provide disaster recovery besides restore from tape and synchronous mirroring. One is server replication.

Server replication is another approach to providing high availability. Coda is one example of a replicated file system [Kistler93]. In Coda, the clients of a file server are responsible for writing to multiple servers. This approach is essentially synchronous logical-level mirroring. By putting the responsibility for replication on the clients, Coda effectively off-loads the servers. And, because clients are aware of the multiple servers, recovery from the loss of a server is essentially instantaneous. However, Coda is not designed for replication over a WAN. If the WAN connecting a client to a remote server
is slow or congested, the client will feel a significant performance impact. Another difference is that where Coda leverages client-side software, SnapMirror's goal is to provide disaster recovery for the file servers without client-side modifications.

Earlier, we mentioned that SnapMirror leverages file system metadata to detect new data since the last update of the mirror. But, there are many other approaches.

At the logical file system level, the most common approach is to walk the directory structure checking the time that files were last updated. For example, the UNIX dump utility compares the file modify times to the time of the last dump to determine which files it should write to an incremental dump tape. Other examples of detecting new data at the logical level include programs like rdist and rsync [Tridgell96]. These programs traverse both the source and destination file systems, looking for files that have been more recently modified on the source than the destination. The rdist program will only transfer whole files. If one byte is changed in a large database file, the entire file will be transferred. The rsync program works to compute a minimal range of bytes that need be transferred by comparing checksums of byte ranges. It uses CPU resources on the source server to reduce network traffic. Compared to these programs, SnapMirror does not need to traverse the entire file system or do checksums to determine the block differences between the source and destination. On the other hand, SnapMirror needs to be tightly integrated with the file system whereas approaches which operate at the logical level are more general.

Another approach to mirroring, adopted by databases such as Oracle, is to write a time-stamp in a header in each on-disk data block. The time-stamp enables Oracle to determine if a block needs to be backed up by looking only at the relatively small header. This can save a lot of time compared to approaches which must perform checksums on the contents of each block. But, it still requires each block to be scanned. In contrast, SnapMirror uses file system data structures as an index to detect updates. The total amount of data examined is similar in the two cases, but the file system structures are stored more densely and consequently the number of blocks that must be read from disk is much smaller.

3 SnapMirror Design and Implementation

SnapMirror is an asynchronous mirroring package currently available on Network Appliance file servers. Its design goal was to meet the data protection needs of large-scale systems. It provides a read-only, on-line replica of a source file system. In the event of disaster, the replica can be made writable, replacing the original source file system.

Periodically, SnapMirror reflects changes in the source volume to the destination volume. It replicates the source at the block level, but uses file system knowledge to limit transfers to blocks that are new or modified and that are still allocated in the file system. SnapMirror does not transfer blocks which were written but have since been overwritten or deallocated.

Each time SnapMirror updates the destination, it takes a new snapshot of the source volume. To determine which blocks need to be sent to the destination, it compares the new snapshot to the snapshot from the previous update. The destination jumps forward from one snapshot to the next when each transfer is completed. Effectively, the entire update is atomically applied to the destination volume. Because the source snapshots always contain a self-consistent, point-in-time image of the entire volume or file system, and these snapshots are applied atomically to the destination, the destination always contains a self-consistent, point-in-time image of the volume. SnapMirror solves the problem of ensuring destination data consistency even when updates are asynchronous and not all writes are transferred, so ordering among individual writes cannot be maintained.

The system administrator sets SnapMirror's update frequency to balance the impact on system performance against the lag time of the mirror.

3.1 Snapshots and the Active Map File

SnapMirror's advantages lie in its knowledge of the Write Anywhere File Layout (WAFL) file system and its snapshot feature [Hitz94], which runs on top of Network Appliance's file servers. WAFL is designed to have many of the same advantages as the Log Structured File System (LFS) [Rosenblum92]. It collects file system block modification requests and then writes them to an unused group of blocks. WAFL's block allocation policy is able to fit new writes in among previously allocated blocks, and thus it avoids the need for segment-cleaning. WAFL also stores all metadata in files, like the Episode file system [Chutani92]. This allows updates to write metadata anywhere on disk, in the same manner as regular file blocks.

WAFL's on-disk data structure is a tree that points to all data and metadata. The root of the tree is called the fsinfo block. A complete and consistent version of the file system can be reached from the information in this block. The fsinfo block is the only exception to the no-overwrite policy. Its update protocol is essentially a database-like transaction; the rest of the file system image must be consistent whenever a new fsinfo block overwrites the old. This ensures that partial writes will never corrupt the file system.

It is easy to preserve a consistent image of a file system, called a snapshot, at any point in time, by simply saving a copy of the information in the fsinfo block and then making sure the blocks that comprise the file system image are not reallocated. Snapshots will share the block data that remains unmodified with the active file system; modified data are written out to unallocated blocks. A snapshot image can be accessed through a pointer to the saved fsinfo block.

WAFL maintains the block allocations for each snapshot in its own active map file. The active map file is an array with one allocation bit for every block in the volume. When a snapshot is taken, the current state of the active file system's active map file is frozen in the snapshot just like any other file. WAFL will not reallocate a block unless the allocation bit for the block is cleared in every snapshot's active map file. To speed block allocations, a summary active map file maintains, for each block, the logical OR of the allocation bits in all the snapshot active map files.

3.2 SnapMirror Implementation

Snapshots and the active map file provide a natural way to find out block-level differences between two instances of a file system image. SnapMirror also uses such block-level information to perform efficient block-level transfers. Because the mirror is a block-by-block replica of the source, it is easy to turn it into a primary file server for users, should disaster befall the source.

3.2.1 Initializing the Mirror

The destination triggers SnapMirror updates. The destination initiates the mirror relationship by requesting an initial transfer from the source. The source responds by taking a base reference snapshot and then transferring all the blocks that are allocated in that or any earlier snapshot, as specified in the snapshots' active map files. Thus, after initialization, the destination will have the same set of snapshots as the source. The base snapshot serves two purposes: first, it provides a reference point for the first update; second, it provides a static, self-consistent image which is unaffected by writes to the active file system during the transfer.

The destination system writes the blocks to the same logical location in its storage array. All the blocks in the array are logically numbered from 1 to N on both the source and the destination, so the source and destination array geometries need not be identical. However, because WAFL optimizes block layout for the underlying array geometry, SnapMirror performance is best when the source and destination geometries match and the optimizations apply equally well to both systems. When the block transfers complete, the destination writes its new fsinfo block.

3.2.2 Block-Level Differences and Update Transfers

Part of the work involved in any asynchronous mirroring technique is to find the changes that have occurred in the primary file system and make the same changes in another file system. Not surprisingly, SnapMirror uses WAFL's active map file and reference snapshots to do this, as shown in Figure 1.

When a mirror has an update scheduled, it sends a message to the source. The source takes an incremental reference snapshot and compares the allocation bits in the active map files of the base and incremental reference snapshots. This active map file comparison follows these rules:

If the block is not allocated in either active map, it is unused. The block is not transferred. It did not exist in the old file system image, and is not in use in the new one. Note that it could have been allocated and deallocated between the last update and the current one.

If the block is allocated in both active maps, it is unchanged. The block is not transferred. By the file system's no-overwrite policy, this block's data has not changed. It could not have been overwritten, since the old reference snapshot keeps the block from being re-allocated.

If the block is only allocated in the base active map, it has been deleted. The block is not transferred. The data it contained has either been deleted or changed.

If the block is only allocated in the incremental active map, it has been added. The block is transferred. This means that the data in this block is either new or an updated version of an old block.

Note that SnapMirror does not need to understand whether a transferred block is user data or file system metadata. All it has to know is that the block is new to the file system since the last transfer and therefore it should be transferred. In particular, block de-allocations automatically get propagated to the mirror, because the updated blocks of the active map file are transferred along with all the other blocks.

In practice, SnapMirror transfers the blocks for all existing snapshots that were created between the base and incremental reference snapshots. If a block is newly allocated in the active maps of any of these snapshots, then it is transferred. Otherwise, it is not. Thus, the destination has a copy of all of the source's snapshots.
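The four comparison rules can be expressed compactly as a function over the two allocation bitmaps. The sketch below is illustrative only, not the WAFL implementation: it models each active map file as a Python set of allocated block numbers, whereas the real active maps are on-disk bitmaps.

```python
# Illustrative sketch of SnapMirror's active-map comparison (not the
# actual WAFL implementation). Each active map is modeled as a set of
# allocated block numbers; real active maps are on-disk bitmap files.

def classify_block(in_base: bool, in_incremental: bool) -> str:
    """Apply the four comparison rules to one block's allocation bits."""
    if not in_base and not in_incremental:
        return "unused"      # never existed, or allocated and freed between updates
    if in_base and in_incremental:
        return "unchanged"   # no-overwrite policy: same data, skip transfer
    if in_base:
        return "deleted"     # deallocated since the base snapshot, skip transfer
    return "added"           # new or rewritten data: the only case transferred

def blocks_to_transfer(base_map: set[int], incr_map: set[int], nblocks: int) -> list[int]:
    """Return the block numbers an update transfer must send."""
    return [b for b in range(nblocks)
            if classify_block(b in base_map, b in incr_map) == "added"]

# Mirrors the example in Figure 1: blocks 100-106; C (block 101) is
# overwritten by C' (block 100), A (block 102) is deleted, E (block 105)
# is added.
base = {101, 102, 103, 104}                  # C, A, D, B
incr = {100, 103, 104, 105}                  # C', D, B, E
print(blocks_to_transfer(base, incr, 107))   # -> [100, 105]
```

Note how the "deleted" and "unused" cases fall out of the comparison for free; a block-level mirror without file system knowledge would have no way to distinguish them from live data.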
Figure 1. SnapMirror's use of snapshots to identify blocks for transfer. SnapMirror uses a base reference snapshot as a point of comparison on the source and destination filers. The first such snapshot is used for the Initial Transfer. File System Changes cause the base snapshot and the active file system to diverge (C is overwritten with C', A is deleted, E is added). Snapshots and the active file system share unchanged blocks. When it is time for an Update Transfer, SnapMirror takes a new incremental reference snapshot and then compares the snapshot active maps according to the rules in the text to determine which blocks need to be transferred to the destination. After a successful update, SnapMirror deletes the old base snapshot and the incremental becomes the new base.
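The no-overwrite discipline and the atomic fsinfo switch described above can be sketched in a few lines. This is a toy model under stated assumptions (a dictionary standing in for the disk, a single root pointer standing in for the fsinfo block), not WAFL's on-disk format; it only illustrates why an aborted transfer can never leave the mirror inconsistent.

```python
# Toy model of the no-overwrite update discipline with a single root
# pointer (the fsinfo block), per Sections 3.1 and 3.2.3. The Volume
# class and its fields are invented for illustration.

class Volume:
    def __init__(self):
        self.blocks = {}        # block number -> data (stand-in for the disk)
        self.fsinfo = None      # root pointer: the only overwritten location
        self.next_free = 0

    def write_new(self, data):
        """No-overwrite policy: new data always lands in an unused block."""
        b = self.next_free
        self.next_free += 1
        self.blocks[b] = data
        return b

    def commit(self, root_block):
        """Atomically switch the visible file system to the new tree.
        Until this runs, readers (and an aborted transfer) still see the
        old, consistent image."""
        self.fsinfo = root_block

vol = Volume()
vol.commit(vol.write_new("tree v1"))
staged = vol.write_new("tree v2")            # transferred but not yet visible
assert vol.blocks[vol.fsinfo] == "tree v1"   # aborting here loses nothing
vol.commit(staged)                           # the single overwrite: update done
assert vol.blocks[vol.fsinfo] == "tree v2"
```

The key design point is that all update work happens in blocks the old image does not reference, so the commit is a single pointer write rather than an in-place rewrite.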

At the end of each transfer the fsinfo block is updated, which brings the user's view of the file system up to date with the latest transfer. The base reference snapshot is deleted from the source, and the incremental reference snapshot becomes the new base. Essentially, the file system updates are written into unused blocks on the destination and then the fsinfo block is updated to refer to this new version of the file system, which is already in place.

3.2.3 Disaster Recovery and Aborted Transfers

Because a new fsinfo block (the root of the file system tree structure) is not written until all blocks are transferred, SnapMirror guarantees a consistent file system on the mirror at any time. The destination file system is accessible in a read-only state throughout the whole SnapMirror process. At any point, its active file system replicates the active map and fsinfo block of the last reference snapshot generated by the source. Should a disaster occur, the destination can be brought immediately into a writable state.

The destination can abandon any transfer in progress in response to a failure at the source end or a network partition. The mirror is left in the same state as it was before the transfer started, since the new fsinfo block is never written. Because all data is consistent with the last completed round of transfers, the mirror can be reestablished when both systems are available again by finding the most recent common SnapMirror snapshot on both systems, and using that as the base reference snapshot.

3.2.4 Update Scheduling and Transfer Rate Throttling

The destination file server controls the frequency of update through how often it requests a transfer from the source. System administrators set the frequency through a cron-like schedule. If a transfer is in progress when another scheduled time has been reached, the next transfer will start when the current transfer is complete. SnapMirror also allows the system administrator to throttle the rate at which a transfer is done. This prevents a flood of data transfers from overwhelming the disks, CPU, or network during an update.
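The scheduling and throttling behavior of Section 3.2.4 can be sketched as follows. This is an illustrative reconstruction, not SnapMirror's code: the pacing loop and the `UpdateScheduler` class are assumptions chosen to show the two stated policies (overlapping schedule slots collapse into one transfer; transfers are paced to a byte-rate cap).

```python
# Sketch of destination-driven scheduling with transfer-rate throttling
# (Section 3.2.4). The throttle paces sends by sleeping whenever the
# transfer runs ahead of the configured rate.
import time

def throttled_send(blocks, send_fn, max_bytes_per_sec, block_size=4096):
    """Send blocks via send_fn, sleeping as needed to stay under the cap."""
    start = time.monotonic()
    sent = 0
    for b in blocks:
        send_fn(b)
        sent += block_size
        min_elapsed = sent / max_bytes_per_sec   # time this much data *should* take
        elapsed = time.monotonic() - start
        if elapsed < min_elapsed:
            time.sleep(min_elapsed - elapsed)    # we are ahead of the rate: pause
    return sent

class UpdateScheduler:
    """If an update comes due while a transfer is still running, the next
    transfer starts only after the current one completes; missed slots
    collapse into a single update rather than queueing."""
    def __init__(self, interval_sec, run_update):
        self.interval = interval_sec
        self.run_update = run_update
        self.next_due = time.monotonic()

    def tick(self):
        now = time.monotonic()
        if now >= self.next_due:
            self.run_update()                    # blocks until transfer completes
            self.next_due = time.monotonic() + self.interval
```

Pacing by sleep keeps the throttle simple and self-correcting: a slow disk or network naturally consumes the budget, so sleeps only occur when the transfer would otherwise exceed the cap.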
3.3 SnapMirror Advantages and Limitations

SnapMirror meets the emerging requirements for data recovery by using asynchrony and combining file system knowledge with block-level transfers.

Because the mirror is on-line and in a consistent state at all phases of the relationship, the data is available during the mirrored relationship in a read-only capacity. Clients of the destination file system will see new updates atomically appear. If they prefer to access a stable image of the data, they can access one of the snapshots on the destination. The mirror can be brought into a writable state immediately, making disaster recovery extremely quick.

The schedule-based updates mean that SnapMirror has as much or as little impact on operations as the system administrator allows. The tunable lag also means that the administrator controls how up to date the mirror is. Under most loads, SnapMirror can reasonably transmit to the mirror many times in one hour.

SnapMirror works over a TCP/IP connection that uses standard network links. Thus, it allows for maximum flexibility in locating the source and destination filers and in the network connecting them.

The nature of SnapMirror gives it advantages over traditional mirroring approaches. With respect to synchronous mirroring, SnapMirror reduces the amount of data transferred, since blocks that have been allocated and de-allocated between updates are not transferred. And because SnapMirror uses snapshots to preserve image data, the source can service requests during a transfer. Further, updates at the source never block waiting for a transfer to the remote mirror.

The time required for a SnapMirror update is largely dependent on the amount of new data since the last update and, to some extent, on file system size. The worst-case scenario is one where all data is read from and re-written to the file system between updates. In that case, SnapMirror will have to transfer all file blocks. File system size plays a part in SnapMirror performance due to the time it takes to read through the active map files (which increases as the total number of blocks increases).

Another drawback of SnapMirror is that its snapshots reduce the amount of free space in the file system. On systems with a low rate of change, this is fine, since unchanged blocks are shared between the active file system and the snapshot. Higher rates of change mean that SnapMirror reference snapshots tie up more blocks.

By design, SnapMirror only works for whole volumes, as it is dependent on active map files for updates. Smaller mirror granularity could only be achieved through modifications to the file system, or through a slower, logical-level approach.

4 Data Reduction through Asynchrony

An important premise of asynchronous mirroring is that periodic updates will transfer less data than synchronous updates. Over time, many file operations become moot, either because the data is overwritten or deleted. Periodic updates don't need to transfer any deleted data and only need to transfer the most recent version of an overwritten block. Essentially, periodic updates use the primary volume as a giant write cache, and it has long been known that write caches can reduce I/O traffic [Ousterhout85, Baker91, Kistler93]. Still at question, though, is how much asynchrony can reduce mirror data traffic for modern file server workloads over the extended intervals of interest to asynchronous mirroring.

To answer these questions, we traced a number of file servers at Network Appliance and analyzed the traces to determine how much asynchronous mirroring would reduce data transfers as a function of update period. We also analyzed the traces to determine the importance of using the file system's active map to avoid transferring deleted blocks, using WAFL as an example of a no-overwrite file system.

4.1 Tracing environment

We gathered 24 hours of traces from twelve separate file systems or volumes on four different NetApp file servers. As shown in Table 1, these file systems varied in size from 16 GB to 580 GB, and the data written over the day ranged from 1 GB to 140 GB. The blocks counted in the table are each 4 KB in size. The systems stored data from: internal web pages, engineers' home directories, kernel builds, a bug database, the source repository, core dumps, and technical publications.

In synchronous or semi-synchronous mirroring, all disk writes must go to both the local and remote mirror. To determine how many blocks asynchronous mirroring would need to transfer at the end of any particular update interval, we examined the trace records and recorded in a large bit map which blocks were written (allocated) during the interval. We cleared the dirty bit whenever the block was deallocated. In an asynchronous mirroring system, this is equivalent to computing the logical-AND of the dirty map with the file system's active map and only transferring those blocks which are both dirty and still part of the active file system.

4.2 Results

Figure 2 plots the blocks that would be transferred by SnapMirror as a percentage of the blocks that would
File System   Filer     Description                                      Size (GB)  Used (GB)  Blocks Written (1000's)  Written Deleted (%)
Build1                  Source tree build space                          100        68         7757                     69
Cores1                  Core dump storage                                100        72         319                      85
Bench         Ecco      Benchmark scratch space and results repository   87         56         512                      91
Pubs                    Technical Publications                           32         16         262                      59
Users1                  Engineering home directories                     350        292        10803                    78
Bug                     Bug tracking database                            16         11         1465                     98
Cores2        Maglite   Core dump storage                                550        400        11956                    76
Source                  Source control repository                        50         36         3288                     70
Cores3                  Core dump storage                                255        151        1582                     77
Users2        Makita    Engineering home directories and corporate       580        470        13752                    53
                        intranet site
Build2                  Source tree build space                          320        271        34779                    80
Users3        Ronco     Engineering home directories                     380        323        15103                    85

Table 1. Summary data for the traced file systems. We collected 24 hours of traces of block allocations (which in WAFL are the equivalent of disk writes) and de-allocations in the 12 file systems listed in the table. 'Blocks Written' is the total number of blocks written and indicates the number of blocks that a synchronous block-level mirror would have to transfer. The 'Written Deleted' column shows the percentage of the written blocks which were overwritten or deleted. This represents the potential reduction in blocks transferred to an asynchronous mirror which is updated only once at the end of the 24-hour period. The reduction ranges from 53% to 98% and averages about 78%.

be transferred by a synchronous mirror as a function of the update period: 1 minute, 5 minutes, 15 minutes, 30 minutes, 1 hour, 6 hours, 12 hours, and 24 hours. We found that even an update interval of only 1 minute reduces the data transferred by at least 10%, and by over 20% on all but one of the file systems. These results are consistent with those reported for a 30 second write-caching interval in earlier tracing studies [Ousterhout85, Baker91]. Moving to 15 minute intervals enabled asynchronous mirroring to reduce data transfers by 30% to 80%, or over 50% on average. The marginal benefit of increasing the update period diminishes beyond 60 minutes. Nevertheless, extending the update period all the way to 24 hours reduces the data transferred by between 53% and 98% – over 75% on average. This represents a 50% reduction compared to an update interval of 15 minutes. Clearly, the benefits of asynchronous mirroring can be substantial.

As mentioned above, we performed the equivalent of a logical-AND of the dirty map with the file system's active map to avoid replicating deleted data. How important is this step? In conventional write-in-place file systems such as the Berkeley FFS [McKusick84], we do not expect this last step to be critical. File overwrites would repeatedly dirty the same block, which would eventually need to be transferred only once. Further, because the file allocation policies of these file systems often result in the reallocation of recently freed blocks, even file deletions and creations end up reusing the same set of blocks.

The situation is very different for no-overwrite file systems such as LFS and WAFL. These systems tend to avoid reusing blocks for either overwrites or new creates. Figure 3 plots the blocks transferred by SnapMirror, which takes advantage of the file system's active map to avoid transferring deallocated blocks, and by an asynchronous block-level mirror, which does not, as a percentage of the blocks transferred by the synchronous mirror for a selection of the file systems. Because most of the file systems in the study had enough free space to absorb all of the data writes during the day, there were essentially no block reallocations during the course of the day. For these file systems, the data reduction benefits of asynchrony would be completely lost if SnapMirror were not able to take advantage of the active maps. In the figure, the 'all other, include deallocated' line represents these results. There were two exceptions, however. Build2 wrote about 135 GB of data while the volume had only about 50 GB of free space, and Source wrote about
[Figure 2 graphs. Y axis: percentage of written blocks transferred; X axis: update interval (minutes). Panels (a) and (b) plot Build1, Cores1, Bench, Cores2, Cores3, and Build2; panels (c) and (d) plot Pubs, Users1, Bug, Source, Users2, and Users3.]
Figure 2. Percentage of written blocks transferred by SnapMirror vs. update interval. These graphs show, for each of the 12 traced systems, the percentage of written blocks that SnapMirror would transfer to the destination mirror as a function of mirror update period. Because the number of traces is large, the results are split into upper and lower pairs of graphs. The left graphs in each pair (a and c) show the full range of intervals from 1 minute to 1440 minutes (24 hours). The right graphs in each pair (b and d) expand the region from 1 to 60 minutes. The graphs show that most of the reduction in data transferred occurs with an update period of as little as 15 minutes, although substantial additional reductions are possible as the interval is increased to an hour or more.

13 GB of data with only 14 GB of free space. Inevitably, in these file systems, there was some block reuse, as shown in the figure. Even in these two cases, however, the use of the active map was highly beneficial. Successful asynchronous mirroring of no-overwrite file systems requires the use of the file system's active map or equivalent information.

An alternative to the block-level mirroring (with or without the active map) discussed in this section is logical or file-system-level mirroring. This is the topic of the next section.

5 SnapMirror vs. Asynchronous Logical Mirroring

The UNIX dump and restore utilities can be used to implement an asynchronous logical mirror. Dump works above the operating system to identify files which need to be backed up. When performing an incremental, the utility writes to tape only the files which have been created or modified since the last incremental dump.
[Figure 3 graph. Y axis: percentage of written blocks transferred; X axis: update interval (minutes). Curves: Build2 include deallocated, Build2 omit deallocated, Source include deallocated, Source omit deallocated, all other include deallocated.]
Figure 3. Percentage of written blocks transferred with and without use of the active map to filter out deallocated blocks. Successful asynchronous mirroring of a no-overwrite file system such as LFS or WAFL depends on the file system's active map to filter out deallocated blocks and achieve reductions in block transfers. Without the use of the active map, only 2 of the 12 measured systems would see any transfer reductions.
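The bookkeeping behind these results — Section 4.1's dirty bit map, cleared on deallocation and ANDed with the active map — can be sketched with integer bitmaps. This is an illustration of the trace methodology only; WAFL's real active maps are on-disk metadata files, and the helper names here are ours.

```python
def blocks_to_transfer(dirty_map, active_map):
    """Blocks an asynchronous mirror must send at the end of an update
    interval: written (dirty) during the interval AND still allocated.
    Maps are modeled as arbitrary-precision ints, one bit per 4 KB block."""
    return dirty_map & active_map

def replay(events):
    """Replay trace records for one interval: a write sets the block's
    dirty and active bits; a deallocation clears both."""
    dirty = active = 0
    for op, blk in events:
        if op == "write":
            dirty |= 1 << blk
            active |= 1 << blk
        elif op == "free":
            dirty &= ~(1 << blk)
            active &= ~(1 << blk)
    return dirty, active

events = [("write", 1), ("write", 2), ("free", 2), ("write", 5)]
dirty, active = replay(events)
to_send = blocks_to_transfer(dirty, active)
print(bin(to_send))  # block 2 was freed, so only blocks 1 and 5 remain
```

Block 2 is written and then deallocated within the interval, so it never reaches the mirror — exactly the write-cache effect the trace analysis measures.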

File System  Size (GB)  Used (GB)      Files                 System      Data transferred (GB)  Time (sec.)  Rate (MB/s)
                        Base    End    Base       End
Users4       96         63      65     1001131    1054917    SnapMirror  2.1                    140          15.4
                                                             logical     4.0                    493          8.3
Users5       192        135     150    5297016    6423984    SnapMirror  15.3                   797          19.7
                                                             logical     25.2                   7200         3.6
Table 2. Logical replication vs. SnapMirror incremental update performance. We measured incremental perfor-
mance of SnapMirror and logical replication on two separate data sets. Since SnapMirror sends only changed blocks,
it transfers at least 39% less data than logical mirroring.
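The speedups and savings discussed in the text follow directly from Table 2; a quick check (figures copied from the table):

```python
# Data transferred (GB) and elapsed time (seconds) from Table 2.
runs = {
    "Users4": {"snapmirror": (2.1, 140), "logical": (4.0, 493)},
    "Users5": {"snapmirror": (15.3, 797), "logical": (25.2, 7200)},
}

for name, r in runs.items():
    sm_gb, sm_s = r["snapmirror"]
    lg_gb, lg_s = r["logical"]
    data_saving = 1 - sm_gb / lg_gb   # fraction less data than logical mirroring
    slowdown = lg_s / sm_s            # how much longer the logical update takes
    sm_rate = sm_gb * 1024 / sm_s     # MB/s, matching the table's Rate column
    print(f"{name}: {data_saving:.1%} less data, "
          f"logical {slowdown:.1f}x slower, SnapMirror {sm_rate:.1f} MB/s")
```

This reproduces the figures quoted later: logical mirroring takes roughly 3.5x and 9.0x longer, and SnapMirror moves about 47.5% and 39% less data on the two volumes.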

Restore reads such incremental dumps and recreates the dumped file system. If dump's data stream is piped directly to a restore instead of a tape, the utilities effectively copy the contents of one file system to another. An asynchronous mirroring facility could periodically run an incremental dump and pipe the output to a restore running on the destination. The following set of experiments compares this approach to SnapMirror.

5.1 Experimental Setup

To implement the logical mirroring mechanism, we took advantage of the fact that Network Appliance filers include dump and restore utilities to support backup and the Network Data Management Protocol (NDMP) copy command. The command enables direct data copies from one filer to another without going through the issuing workstation. For these experiments, we configured dump to send its data over the network to a restore process on another filer. Because this code and data path are included in a shipping product, they are reasonably well tuned and the comparison to SnapMirror is fair.

To compare logical mirroring to SnapMirror, we first established and populated a mirror between two filers in the lab. We then added data to the source side of the mirror and measured the performance of the two mechanisms as they transferred the new data to the destination file system. We did this twice, with two sets of data on two different sized volumes. For data, we used production full and incremental dumps of some home directory volumes. Table 2 shows the volumes and their sizes. The full dump provided the base file system. The incremental provided the new data.
We used a modified version of restore to load the incremental data into the source volume. The standard restore utility always completely overwrites files which have been updated; it never updates only the changed blocks. Had we used the standard restore, SnapMirror and the logical mirroring would both have transferred whole files. Instead, when a file on the incremental tape matched an existing file in both name and inode number, the modified restore did a block-by-block comparison of the new and existing files and only wrote changed blocks into the source volume. The logical mirroring mechanism, which was essentially the standard dump utility, still transferred whole files, but SnapMirror was able to take advantage of the fact that it could detect which blocks had been rewritten and thus transfer less data.
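The modified restore's block-by-block update can be sketched as follows. This is a hypothetical helper, not the actual tool: the real modified restore also matched files by name and inode number before comparing, as described above.

```python
BLOCK = 4096  # WAFL block size used throughout the paper

def restore_changed_blocks(new_path, existing_path):
    """Compare the incoming file with the existing one block by block and
    rewrite only blocks that differ, so a later block-level mirror update
    sees only genuinely changed blocks as dirty. Returns the number of
    blocks rewritten."""
    changed = 0
    with open(new_path, "rb") as new, open(existing_path, "r+b") as out:
        offset = 0
        while True:
            incoming = new.read(BLOCK)
            if not incoming:
                out.truncate(offset)  # drop any tail beyond the new length
                break
            out.seek(offset)
            current = out.read(BLOCK)
            if incoming != current:
                out.seek(offset)
                out.write(incoming)
                changed += 1
            offset += len(incoming)
    return changed
```

Writing only the differing blocks keeps the source volume's dirty set small, which is precisely what lets SnapMirror transfer less data in the experiments that follow.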
For hardware, we used two Network Appliance F760 filers directly connected via Intel GbE. Each utilized an Alpha 21164 processor running at 600 MHz, with 1024 MB of RAM plus 32 MB of non-volatile write cache. For the tests run on Users4, each filer was configured with 7 FibreChannel-attached disks (18 GB, 10k rpm) on one arbitrated loop. For the tests run on Users5, each filer was configured with 14 FibreChannel-attached disks on one arbitrated loop. Each group of 7 disks was set up with 6 data disks and 1 RAID4 parity disk. All tests were run in a lab with no external load.

[Figure 4 charts. Y axes: time (seconds), one scale per data set; bars for SnapMirror and logical on Users4 and Users5, split into data scan and data transfer components.]

Figure 4. Logical replication vs. SnapMirror incremental update times. By avoiding directory and inode scans, SnapMirror's data scan scales much better than that of logical replication. (Note: the two sets of tests are not rendered to the same scale.)

5.2 Results

The results for the two runs are summarized in Table 2 and Figure 4. Note that in the figure, the two sets of runs are not rendered to the same scale. The 'data scan' value for logical mirroring represents the time spent walking the directory structure to find new data. For SnapMirror, 'data scan' represents the time spent scanning the active map files. This time is essentially independent of the number of files or the amount of new data but is instead a function of volume size. The number was determined by performing a null transfer on a volume of this size.

The most obvious result is that logical mirroring takes respectively 3.5 and 9.0 times longer than SnapMirror to update the remote mirror. This difference is due both to the time to scan for new data and the efficiency of the data transfers themselves. When scanning for changes, it is much more efficient to scan the active map files than to walk the directory structure. When transferring data, it is much more efficient to read and write blocks sequentially than to go through the file system code reading and writing logical blocks.

Beyond data transfer efficiency, SnapMirror is able to transfer respectively 48% and 39% fewer blocks than the logical mirror. These results show that the savings from transferring only changed blocks can be substantial compared to whole file transfer.

6 SnapMirror on a loaded system

To assess the performance impact of running SnapMirror on a loaded system, we ran some tests very much like the SPEC SFS97 [SPEC97] benchmark for NFS file servers.

In the tests, data was loaded onto the server and a number of clients submitted NFS requests at a specified aggregate rate, or offered load. For these experiments, there were 48 client processes running on 6 client machines. The client machines were 167 MHz Ultra-1 Sun workstations running Solaris 2.5.1, connected to the server via switched 100bT ethernet to an ethernet NIC on the server. The server was a Network Appliance F760 filer with the same characteristics as the filers in Section 5.1. The filer had 21 disks configured in a 320 GB volume. The data was being replicated to a remote filer.

6.1 Results

After loading data onto the filer and synchronizing the mirrors, we set the SnapMirror update period to the desired value and measured the request response time over an interval of 60 minutes. Table 3 and Figure 5 report the results for an offered load of 4500 and 6000 NFS operations per second. In the table, SnapMirror data is the total data transferred to the mirror over the 60 minute
run. Even with the SnapMirror update period set to only one minute, the filer is able to sustain a high throughput of NFS operations. However, the extra CPU and disk load increases response time by a factor of two to over three, depending on load.

Increasing the SnapMirror update period to 30 minutes decreases the impact on response time to only about 22%, even when the system is heavily loaded with 6000 ops/sec. This reduction comes from two major effects. First, each SnapMirror update requires a new snapshot and a scan of the active map files. With less frequent updates, the impact of these fixed costs is spread over a much greater period. Second, as the update period increases, the amount of data that needs to be transferred to the destination per unit time decreases. Consequently, SnapMirror reads as a percentage of the total load decreases.

Load (ops/s)  Update Interval  CPU busy  Disk busy  SnapMirror data (MB)
4500          base             66%       34%        0
              1 min.           93%       50%        12817
              15 min.          74%       43%        6338
              30 min.          69%       40%        2505
6000          base             87%       54%        0
              1 min.           99%       67%        13965
              15 min.          94%       62%        8071
              30 min.          91%       60%        3266

Table 3. SnapMirror Update Interval Impact on System Resources. During SFS-like loads, resource consumption diminishes dramatically when SnapMirror update intervals increase. Note: base represents performance when SnapMirror is turned off.

[Figure 5 graph. Y axis: NFS response time (msec); X axis: update interval (minutes). Curves: 4500 ops/s and 6000 ops/s offered loads, each with SnapMirror and base.]

Figure 5. SnapMirror Update Interval vs. NFS response time. We measured the effect of SnapMirror on the NFS response time of SFS-like loads. By increasing SnapMirror update intervals, the penalty approaches a mere 22%.

7 Conclusion

Current techniques for disaster recovery offer data managers a stark choice. Waiting for a recovery from tape can cost time, millions of dollars, and, due to the age of the backup, can result in the loss of hours of data. Failover to a remote synchronous mirror solves these problems, but does so at a high cost in both server performance and networking infrastructure.

In this paper, we presented SnapMirror, an asynchronous mirroring package available on Network Appliance filers. SnapMirror periodically updates an on-line mirror. It provides the rapid recovery of synchronous remote mirroring but with greater flexibility and control in maintaining the mirror. With SnapMirror, data managers can choose to update the mirror at an interval of their choice. SnapMirror allows the user to strike the proper balance between data currency on one hand and performance and cost on the other.

By updating the mirror periodically, SnapMirror can transfer much less data than would a synchronous mirror. In this paper, we used traces of 12 production file systems to show that by updating the mirror every 15 minutes, instead of synchronously, SnapMirror can reduce data transfers by 30% to 80%, or 50% on average. Updating every hour reduces transfers by an average of 58%. Daily updates reduce transfers by over 75%.

SnapMirror benefits from the WAFL file system's ability to take consistent snapshots, both to ensure the consistency of the remote mirror and to identify changed blocks. It also uses the file system's active map to avoid transferring deallocated blocks. Trace analysis showed that this last optimization is critically important for no-overwrite file systems such as WAFL and LFS. Of the 12 traces analyzed, 10 would have seen no transfer reductions even with only one update after 24 hours.

SnapMirror also leverages block-level behavior to solve performance problems that challenge logical-level mirrors. In experiments comparing SnapMirror to dump-based logical-level asynchronous mirroring, we found that using block-level file system knowledge reduced the time to identify new or changed blocks by as much as two orders of magnitude. By avoiding a walk of directory and inode structures, SnapMirror was able to detect changed data significantly more quickly than the logical
schemes. Furthermore, transferring only changed blocks, rather than full files, reduced the data transfers by over 40%. Asynchronous mirror updates can run much more frequently when it takes a short time to identify blocks for transfer and only the necessary blocks are updated. Thus, SnapMirror's use of file system knowledge at a block level greatly expands its utility.

SnapMirror fills the void between tape-based disaster recovery and synchronous remote mirroring. It demonstrates the benefit of combining block-level and logical-level mirroring techniques. It gives system administrators the flexibility they need to meet their varied data protection requirements at a reasonable cost.

8 Acknowledgments

The authors wish to thank Steve Gold, Norm Hutchinson, Guy Harris, Sean O'Malley, and Lara Izlan for their generous contributions. We also wish to thank the reviewers and our shepherd, Roger Haskin, for their helpful suggestions.

9 References

[Chutani92] S. Chutani, et al. The Episode File System. Proceedings of the Winter 1992 USENIX Conference, San Francisco, CA, January 1992. pp. 43-60.

[Baker91] M. Baker, J. Hartman, M. Kupfer, K. Shirriff, J. Ousterhout. Measurements of a Distributed File System. Proceedings of the 13th Symposium on Operating System Principles (SOSP). October 1991. pp. 198-212.

[EMC] EMC Symmetrix® Remote Data Facility. http://[Link]/.

[Growchowski96] E.G. Grochowski, R.F. Hoyt. Future Trends in Hard Disk Drives. IEEE Transactions on Magnetics, V32. May 1996. pp. 1850-1854.

[Hitz94] D. Hitz, J. Lau, M.A. Malcolm. File System Design for an NFS File Server Appliance. Proceedings USENIX Winter 1994 Conference. pp. 235-246. http://[Link]/tech_library/[Link]

[Hutchinson99] N.C. Hutchinson, S. Manley, M. Federwisch, G. Harris, D. Hitz, S. Kleiman, S. O'Malley. Logical vs. Physical File System Backup. Proceedings of the Third Symposium on Operating System Design and Implementation (OSDI). February 1999.

[Kistler92] J.J. Kistler, M. Satyanarayanan. Disconnected Operation in the Coda File System. ACM Transactions on Computer Systems, 10(1). February 1992.

[Kistler93] J.J. Kistler. Disconnected Operation in a Distributed File System. Technical Report CMU-CS-93-156. School of Computer Science, Carnegie Mellon University, 1993. [Link] afs/[Link]/project/coda/Web/[Link]

[McKusick84] M.K. McKusick, W.J. Joy, S.J. Leffler, R.S. Fabry. A Fast File System for UNIX. ACM Transactions on Computer Systems 2(3). August 1984. pp. 181-197.

[Ousterhout85] J.K. Ousterhout, H. Da Costa, D. Harrison, J.A. Kunze, M. Kupfer, J.G. Thompson. A Trace-Driven Analysis of the UNIX 4.2 BSD File System. Proceedings of the 10th Symposium on Operating Systems Principles (SOSP), Orcas Island, WA, December 1985. pp. 15-24.

[Rosenblum92] M. Rosenblum, J.K. Ousterhout. The Design and Implementation of a Log-structured File System. ACM Transactions on Computer Systems, 10(1). February 1992. pp. 26-52.

[SPEC97] The Standard Performance Evaluation Corporation. SPEC SFS97 Benchmark. http://[Link]/osg/sfs97/.

[Storage] Storage Computer Corporation Omniforce© software. [Link]

[Tridgell96] A. Tridgell, P. Mackerras. The rsync algorithm. Technical Report TR-CS-96-05. Department of Computer Science, Australian National University, 1996.
