
Revisiting Software Zero-Copy for Web-caching Applications

with Twin Memory Allocation∗


Xiang Song† ‡, Jicheng Shi† ‡, Haibo Chen†, Binyu Zang† ‡
†Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University
‡Software School, Fudan University

ABSTRACT

A key concern with zero copy is that the data to be sent out might be mutated by applications. In this paper, focusing specifically on web-caching applications, we observe that in most cases the data to be sent out is not supposed to be mutated by applications, while the metadata around it does get mutated. Based on this observation, we propose a lightweight software zero-copy mechanism that uses a twin memory allocator to allocate space for zero-copying data, and that ensures such data is unchanged before being sent out with a lightweight data protection mechanism. The only change required to an application is to allocate zero-copying data through a specific ZCopy memory allocator. To demonstrate the effectiveness of ZCopy, we have designed and implemented a prototype based on Linux and ported two applications with very little effort. Experiments with Memcached and Varnish show that ZCopy can achieve up to 41% performance improvement over vanilla Linux with less CPU consumption.

∗ We thank our shepherd Alexandra Fedorova and the anonymous reviewers for their insightful comments. This work was funded by the China National Natural Science Foundation under grant 61003002 and by a grant from the Science and Technology Commission of Shanghai Municipality (10511500100). Xiang Song was also funded by Fudan University's outstanding doctoral research funding scheme, 2011.

1 INTRODUCTION

Many network-intensive applications can easily be limited by the speed of network I/O processing. Beyond the physical limitations of networking devices, the performance of networking applications is also constrained by the efficiency of the network I/O subsystem, in which data copying is one of the key limiting factors. Usually, during network protocol processing, the operating system kernel has to copy data from user space into a kernel buffer and then send the kernel buffer to the network device.

Though there has been extensive research on avoiding data copying, prior systems are still not easily adoptable for many applications on commodity operating systems with commodity networking devices. One approach is bypassing the operating system with Remote DMA. However, this requires special and expensive hardware (e.g., Infiniband [2] and Myrinet [10]), and most commodity networking devices have not been built with such support. Several previous software zero-copy mechanisms, such as fbufs [7] and IO-Lite [11], are designed for a micro-kernel and require special data management and accessing methods across protection domains. Container shipping [12] supports zero copy on UNIX platforms, but requires data to be aggregated in a scatter-gather manner and additional system-call interfaces. Approaches [5, 6] using on-demand memory mapping and copy-on-write are limited by the protection granularity (e.g., page size) and the corresponding alignment requirement, and thus may face a false-sharing problem that protects unwanted data. This can cause notable performance overhead for irregular (e.g., unaligned) data chunks. Modern operating systems also have several mechanisms to support zero copy, such as sendfile [14] and splice [9]. However, such mechanisms require zero-copying data to be treated as files, which is not feasible in many applications that need to mutate the data to be sent out.

The key issue in supporting software zero copy is that the zero-copying data might be mutated while being sent out. This is because when a user application invokes a data-sending system call (e.g., sendmsg and write), it assumes that the data has been sent out when the system call returns. However, when such system calls return, the data might not yet have been moved into the networking device. If the kernel does not copy the data from the user buffer to a kernel buffer, any changes to the data made by the application may be sent out, which violates the semantics of such system calls.

Intuitively, the data should normally not be mutated. Focusing specifically on web-caching applications, we observe that, in most cases, the data to be sent out is indeed not supposed to be mutated by applications. However, the data around it, especially the metadata corresponding to it, does get mutated. Some metadata (e.g., the data expire time in Memcached [8]) is usually co-located with the data to be sent. Lacking application semantics, the operating system cannot simply zero-copy a page holding specific network data packets, as that page may be modified by applications.
Based on the above observation, we revisit the software zero-copy mechanism for web-caching applications. The basic idea is to use a second (twin) memory allocator to allocate and aggregate data that is likely to be zero-copied, and to provide a lightweight memory protection mechanism in case such data does get modified. Hence, the zero-copying data can be isolated from other application data and aggregated together, allowing the kernel to use traditional page-level protection. This minimizes unnecessary write-protection faults due to false sharing. To support software zero copy, an in-kernel proxy is added to the UDP and TCP processing paths to distinguish zero-copy data from other data. A write-protection module is also added to handle the rare cases where data that is supposed to be zero-copied really has been mutated. In such a case, the data is copied to ensure program correctness.

We have implemented a prototype based on Linux 2.6.38. The prototype of ZCopy is very lightweight: it adds around 735 lines of code (LOCs) to the Linux kernel and 20 LOCs to streamflow [13]. It consists of a specific user-level memory allocator, ZC_alloc, based on streamflow. A 200-LOC user-level library is implemented to support cooperation between the ZCopy kernel and ZC_alloc to provide memory protection for zero-copying data.

The porting effort required to run web-caching applications on ZCopy with the zero-copy mechanism is also quite small. Providing zero-copy support to Memcached [8], a widely used key/value-based memory caching server, requires only 10 LOCs of changes. Running the Varnish [4] server also requires only 3 LOCs of modification. The only change required is simply replacing the memory allocator for zero-copying data with the one provided by ZC_alloc.

To measure the effectiveness of ZCopy, we conducted several application performance measurements using the Memcached and Varnish web-caching systems. Performance results show that ZCopy brings a modest improvement over vanilla Linux. ZCopy improves the throughput of Memcached over vanilla Linux by up to 41.1% and 40.8% for UDP and TCP processing, respectively, when the value size is larger than 256 bytes. The performance speedup of Varnish ranges from 0.7% to 7.9% for data sizes ranging from 2 KBytes to 8 KBytes.

2 OBSERVATION

To gain insight into how network data might be mutated, we make a case study of Memcached. Figure 1 shows the basic storing item structure of Memcached used to store key/value pairs. The key/value data is stored at the end of the stritem, while the metadata is stored from the beginning of it. Each time Memcached receives a request and finds a corresponding key/value pair, the refcount of the corresponding item is increased in function do_item_get (as shown in Figure 2). If we write-protect the key/value pair, we must also write-protect the metadata around it. Hence, there will be many unnecessary protection faults due to false sharing.

    typedef struct _stritem {
        struct _stritem *next;
        ...
        uint8_t nkey;        /* key length */
        ...
    } item;

Figure 1: Memcached storing item structure (partially reconstructed from the garbled extraction).

    item *do_item_get(const char *key, const size_t nkey) {
        item *it = assoc_find(key, nkey);
        ...
        if (it != NULL) {
            it->refcount++;
        }
        ...
    }

Figure 2: Code piece of function do_item_get (partially reconstructed from the garbled extraction).

The example indicates that for some networking applications, the network I/O data to be sent out is not supposed to be mutated. However, the data around it, especially the metadata corresponding to it, does get mutated. Hence, naively write-protecting the networking data may also protect metadata allocated within the same page, resulting in false protection.

3 DESIGN AND APPROACHES

This section first presents an overview of ZCopy and then illustrates the approaches to supporting an efficient zero-copy mechanism.

3.1 ZCopy Overview

It is quite intuitive to let applications designate which data should be zero-copied. When such data is being sent out, ZCopy will zero-copy it while processing other data through the normal path. However, it has to deal with the following issues: 1) it should retain the existing memory accessing manner for user applications; 2) it should conform to existing system calls to avoid adding any new interfaces; and 3) it should provide proper protection over the data to be sent out, to conform to the semantics of existing network sending system calls.

In ZCopy, we introduce a twin memory allocator to separately allocate data according to application semantics and to aggregate several zero-copying memory blocks into the same memory chunks. Hence, the data to be protected can be separated from other application data.
[Figure 3: An overview of the architecture of ZCopy. (diagram lost in extraction)]

Figure 3 shows the general architecture of ZCopy. An application running on ZCopy can use the original memory allocator (e.g., glibc) to allocate memory for normal data, or use the twin memory allocator, named ZC_alloc, to allocate memory for zero-copying data. A ZCopy proxy is added to the UDP and TCP package processing paths to distinguish the network data that will be zero-copied from the rest. If the data is allocated from ZC_alloc, ZCopy bypasses the data copy path. Otherwise, ZCopy handles the data as usual. The proxy also cooperates with the ZCopy data protection module to provide basic write protection on the zero-copying data.

3.2 Supporting Zero-copy

3.2.1 Isolating Zero-copying Data with a Twin Memory Allocator

To isolate zero-copying data from other data, ZCopy provides a twin memory allocator alongside the original one to allocate memory for network data; such data is guaranteed to be insulated from other data allocated from a generic memory allocator (e.g., glibc). Restricted by the minimal memory protection granularity of a page and the resulting address alignment requirement, ZC_alloc has to pay special attention to small memory blocks (e.g., blocks smaller than 1024 bytes). A naive way to handle this is to allocate one page for each request. However, this may waste a lot of memory. ZC_alloc instead takes an aggressive approach, aggregating memory blocks of similar sizes into the same basic memory unit, the pageblock. A pageblock is treated as a basic protection chunk and usually consists of several pages (16 pages by default). It is write-protected only when it is full of zero-copying data. As ZC_alloc aggregates zero-copying data together to provide memory protection, it minimizes the amount of wasted memory (e.g., by aggregating objects smaller than 1 page into a default pageblock, the maximum amount of wasted memory is less than 1 page, which is less than 6.25% of the pageblock).

If the allocation request is for a large data block, ZC_alloc directly allocates a memory chunk rounded up from the requested size. A threshold (4096 bytes by default) is set in ZC_alloc to decide whether a request is for a large data block. This threshold can be tuned by the programmer if needed.

The twin memory allocator is especially friendly to reusable data: once allocated, such a data block is sent out to the network multiple times before it is modified or freed. One representative usage scenario is allocating value data for Memcached. A Memcached server caches many key/value pairs in memory to serve quick key/value queries. Every time the server receives a request containing a key, it responds with the value corresponding to that key. Over a long execution, the key/value pairs are not expected to be modified or freed. Hence, we can zero-copy the value during data transfer without worrying, in most cases, about modification of such data.

3.2.2 Zero-copying Network I/O Data

ZCopy supports two common network protocols: UDP and TCP. We add a proxy in the UDP and TCP package processing paths to distinguish the network data that will be zero-copied from the rest. At the very beginning, ZCopy first checks whether the current process wants to use the zero-copy mechanism. If so, it checks whether there are any memory blocks that need to be write-protected. The ZCopy data protection module is invoked if write protection is needed.

[Figure 4: The structure of a normal package and a ZCopy package. (diagram lost in extraction)]

ZCopy handles zero-copying data at the time the network data is organized into a network package. Figure 4 shows the structure of a normal network package and a ZCopy package. In normal cases (shown in the top half of Figure 4), a network package consists of several protocol headers followed by network data. The network data can be organized as a single data buffer or a list of data buffers. The data is copied from user address space into the package in order. If the package buffer is not large enough to hold all the network data, the kernel allocates new empty pages to hold the rest of the data and attaches them to the package's page fragment list. Each entry in the list contains the starting address of the data and its length.
When the package is passed to the NIC driver, the driver first transfers the package content and the fragments to the NIC hardware through the DMA engine.

[Figure 5: Zero-copy in UDP package processing. (diagram lost in extraction)]

ZCopy treats zero-copying data differently from normal data. Each pageblock is identified by a magic string, so zero-copy data buffers can be distinguished from others. We use UDP package processing as an example to illustrate the handling of zero-copying data. Figure 5 shows the UDP package processing path in ZCopy, and the bottom half of Figure 4 shows the structure of a ZCopy package. ZCopy first scans the user data buffer lists to copy all prior normal data into the network package buffer, including the protocol headers (step 1). Then, it iteratively processes the following user buffers, handling zero-copying data and normal data separately (steps 2-5). It checks the pageblock magic string to discover zero-copying user buffers. For zero-copying data (step 3), it first gets the starting address and the length of the data buffer. It then finds all pages covered by the data buffer and finally organizes the pages in the form of fragments and adds them to the package's page fragment list. For normal data (step 4), it allocates new empty pages and copies the buffer content into them; ZCopy then organizes these pages in the form of fragments and adds them to the package's page fragment list. The package is then passed into the lower levels of the network stack.

One optimization to the ZCopy proxy is to treat read-only data buffers as zero-copying buffers even though they are not allocated using ZC_alloc. This can simply be done by feeding the offset and length into the fragment list.

3.2.3 Protection of Zero-copying Data

ZCopy must provide a protection mechanism for the zero-copying data in case it is mutated while being sent out. To do this, ZCopy adds a simple data protection module into the native memory management system.

Based on the page-level protection granularity in the kernel, small data blocks allocated from ZC_alloc are batched in a group and treated as a whole for write protection. The minimal protection unit is one pageblock. When a pageblock is full, ZC_alloc requests the kernel to protect it. To avoid the cost of context switches between user space and kernel space, and the possible false-protection problem caused by early write protection, ZCopy batches the requests from ZC_alloc and delays the protection of the pageblock until the system enters the network package processing path. The protection is done by walking the page table of the target range and changing the protection bit of the corresponding page table entries. When the pageblock is not full, data allocated from ZC_alloc is still sent through the normal path without being zero-copied.

ZCopy tries to protect zero-copying data blocks in an aggressive way: it does not remove the write protection of a data block even when the data block has been completely sent out by the hardware. The removal of write protection is triggered only when a write operation is trapped by the kernel. At that time, the reference count of the page corresponding to the faulting address is first checked. If the count is larger than one, a copy-on-write mechanism is used to protect the network data from being modified. Otherwise, the write request must come from the application itself, and we simply remove the write protection. Note that, since the basic protection unit is a pageblock, any write to a write-protected pageblock causes all the data blocks belonging to that pageblock to lose write protection. However, we do not expect this to happen frequently, as mutation of zero-copying data is rare.

4 EXPERIMENTAL RESULTS

All experiments were conducted on an Intel machine with two 1.87 GHz six-core Intel Xeon E7 chips running Debian GNU/Linux 6.0 with kernel version 2.6.38. The NIC used is an Intel 82576 Gigabit Network Controller. We use another Intel machine with the same hardware and software configuration as the client machine. To minimize the interaction between different cores of a multi-core system (e.g., cache thrashing), experiments were conducted using only one CPU core.

We use two widely used web-caching applications, Memcached 1.4.5 [8] and Varnish 3.0.0 [4], to demonstrate the performance improvements. All applications in the experiments use ZC_alloc to allocate memory for network data, to eliminate the effect of using different memory allocators.

4.1 Memcached

Memcached [8] caches multiple key/value pairs in memory. Each time it receives a request containing a key, it responds with the corresponding value. From a long run's perspective, the key/value pairs are not expected to be modified or freed.
However, the metadata (e.g., the data expire time, the item links) along with the cached pairs may change. We modify Memcached to allocate memory for the values from ZC_alloc. This takes only 10 lines of modification to the original Memcached.

We use the memaslap test suite from the libmemcached library [3] as the client of Memcached. The client first warms up Memcached with a user-defined number of key/value pairs and then randomly issues get and set operations through several concurrent connections.

UDP: Figure 6 shows the average throughput of Memcached in ZCopy and vanilla Linux. Memcached is warmed up with ten thousand key/value pairs. The memaslap client is configured to issue pure get operations through 36 concurrent connections from 12 threads using the UDP protocol. We adjust the number of worker threads of Memcached to achieve the best performance. The CPU usage in all cases is above 99%. Vanilla Linux performs slightly better when the value size is smaller than 256 bytes. However, when the value size reaches 512 bytes, ZCopy starts to outperform vanilla Linux. In the 512-byte case, ZCopy has a 28.7% performance improvement. When the value size is 768 bytes, the performance improvement increases to 41.1%. For the case where the value size is 1024 bytes, ZCopy and vanilla Linux have nearly the same throughput, as the network reaches its hardware limitation.

[Figure 6: The throughput of Memcached in ZCopy and vanilla Linux with UDP and the speedup of ZCopy. (plot lost in extraction)]

The performance improvement comes from two parts: 1) minimized data copying and 2) reduced cache thrashing. Figure 7 compares the time spent on UDP package processing in ZCopy and vanilla Linux. In ZCopy, the package processing time is around 3000 cycles in all cases. In vanilla Linux, however, the time increases with the package size and reaches 4400 cycles in the 1024-byte case. Table 1 shows the L2 cache miss rate of Memcached in Linux and ZCopy: ZCopy reduces L2 cache misses by more than 10% in the UDP cases. The hottest function in Linux, copy_user_generic_string, disappears in ZCopy. Another reason for the notable performance improvement in the 512- and 768-byte cases is that the shorter package sending time in ZCopy causes the NIC interrupt handler to switch frequently to polling mode, which is more effective than interrupt mode under heavy network stress. In vanilla Linux, by contrast, the network status triggers less frequent switches to the NIC polling mode.

[Figure 7: The time spent on UDP package processing for Memcached in ZCopy and vanilla Linux. (plot lost in extraction)]

                 L2 cache miss rate (misses per 1K cycles)
                 512 bytes    768 bytes    1024 bytes
    UDP Linux       4.89         5.17          6.11
    UDP ZCopy       4.17         4.57          4.73
    TCP Linux       8.08         9.06         10.86
    TCP ZCopy       7.73         8.22          9.46

Table 1: The L2 cache miss rate in vanilla Linux and ZCopy in the 512-byte, 768-byte and 1024-byte cases.

TCP: Figure 8 shows the average throughput of Memcached in ZCopy and vanilla Linux and the performance speedup of ZCopy over vanilla Linux. We use the same evaluation method as in the UDP experiments. For each TCP connection, we issue only a single request and then close it. Vanilla Linux performs better when the value size is smaller than 256 bytes. However, when the value size reaches 512 bytes, ZCopy starts to outperform vanilla Linux, by 40.8%. When the value size is 1024 bytes, ZCopy outperforms vanilla Linux by 30.8%. The performance of Memcached reaches the hardware limits when the value size is 2048 bytes.

As in UDP, the performance improvement comes from copy avoidance and reduced cache thrashing. As the code for TCP package processing and data sending is mixed together, we measure the time spent in tcp_sendmsg instead of the TCP package processing time. Figure 9 shows the profiling results: ZCopy does reduce the time spent in tcp_sendmsg in all cases. Table 1 shows the L2 cache miss rate of Memcached in Linux and ZCopy: ZCopy reduces L2 cache misses by 10.2% in the 768-byte case and by 14.8% in the 1024-byte case.

4.2 Varnish

Varnish [4] is an open-source web application accelerator. It caches web content into memory objects and returns web objects according to the network request. We modify Varnish to allocate object memory from ZC_alloc with 3 LOCs of changes.
We test Varnish using ab (ApacheBench) from Apache, with web page sizes ranging from 1 KByte to 8 KBytes (the average individual response size on the web ranges from 3 KBytes to 15 KBytes [1]). Figure 10 compares the performance of ZCopy and vanilla Linux. The Varnish server saturates the CPU on both ZCopy and vanilla Linux. Vanilla Linux performs slightly better with small web page sizes (1 KByte). However, as the web page size increases, ZCopy starts to outperform Linux. The performance improvement reaches 7.8% when the web page size increases to 6 KBytes. Both configurations reach the networking limitation when the web page size increases to 8 KBytes. The reason the improvement is much smaller than for Memcached is that the single-request processing time in Varnish is much longer than in Memcached, which amortizes the improvements of ZCopy.

[Figure 8: The throughput of Memcached in ZCopy and vanilla Linux with TCP and the speedup of ZCopy. (plot lost in extraction)]

[Figure 9: The time spent in function tcp_sendmsg for Memcached in ZCopy and vanilla Linux. (plot lost in extraction)]

[Figure 10: The throughput of the Varnish server in ZCopy and vanilla Linux. (plot lost in extraction)]

4.3 ZCopy Primitive

Overhead of Write Protection: We also evaluate the cost of triggering write-protection faults for zero-copied data. Table 2 shows the execution time of invoking the getpid system call, triggering a ZCopy write-protection fault, and triggering a traditional page fault, respectively. The cost of triggering a ZCopy write-protection fault is much smaller than that of a native page fault. This is because ZCopy usually only removes the write protection of the faulting address from the page table, which is much less expensive.

                                        CPU cycles
    getpid                                  1149.9
    ZCopy write protection fault            2802.5
    native page fault                       6247.4

Table 2: The execution time of invoking the getpid system call, triggering a ZCopy write-protection fault, and triggering a native page fault.

5 CONCLUSION AND FUTURE WORK

This paper revisited existing software zero-copy mechanisms and presented a new zero-copy system named ZCopy, based on the observation that the metadata around network data usually gets mutated while the data itself does not. Experiments with two applications on an Intel machine show that ZCopy outperforms vanilla Linux when sending relatively large network data packages.

In future work, we plan to extend this work in two directions. First, though we focus specifically on web-caching applications in this paper, ZCopy places few constraints on applications and is applicable to other networking applications. We plan to study and evaluate the performance benefit of ZCopy on other network-intensive applications. Second, ZCopy was evaluated using a single core. We plan to extend ZCopy to run efficiently on multicore machines.

REFERENCES

[1] Average web response size. [Link]
[2] Infiniband. [Link]
[3] LibMemcached. [Link]
[4] Varnish web cache system. [Link]
[5] J. C. Brustoloni and P. Steenkiste. Effects of buffering semantics on I/O performance. In Proc. OSDI, 1996.
[6] J. Chu. Zero-copy TCP in Solaris. In Proc. USENIX ATC, 1996.
[7] P. Druschel and L. L. Peterson. Fbufs: a high-bandwidth cross-domain transfer facility. In Proc. SOSP, pages 189-202. ACM, 1993.
[8] R. Lerner. Memcached integration in Rails. Linux Journal, 2009.
[9] L. McVoy. The splice I/O model, 1998.
[10] Myricom. Myrinet. [Link]
[11] V. S. Pai, P. Druschel, and W. Zwaenepoel. IO-Lite: a unified I/O buffering and caching system. ACM TOCS, 18(1):37-66, 2000.
[12] J. Pasquale, E. Anderson, and P. K. Muller. Container shipping: operating system support for I/O-intensive applications. Computer, 27(3):84-93, 1994.
[13] S. Schneider, C. D. Antonopoulos, and D. S. Nikolopoulos. Scalable locality-conscious multithreaded memory allocation. In Proc. ISMM, pages 84-94, 2006.
[14] D. Stancevic. Zero copy I: user-mode perspective. Linux Journal, 2003(105):3, 2003.
