Speicher: Securing LSM-based Key-Value Stores using Shielded Execution

Maurice Bailleu, Jörg Thalheim, and Pramod Bhatotia, The University of Edinburgh; Christof Fetzer, TU Dresden; Michio Honda, NEC Labs; Kapil Vaswani, Microsoft Research

https://www.usenix.org/conference/fast19/presentation/bailleu

This paper is included in the Proceedings of the 17th USENIX Conference on File and Storage Technologies (FAST '19).
February 25–28, 2019 • Boston, MA, USA
ISBN 978-1-939133-09-0

Open access to the Proceedings of the 17th USENIX Conference on File and Storage Technologies (FAST '19) is sponsored by
SPEICHER: Securing LSM-based Key-Value Stores using Shielded Execution

Maurice Bailleu, Jörg Thalheim, Pramod Bhatotia
The University of Edinburgh

Christof Fetzer†, Michio Honda‡, Kapil Vaswani∗
†TU Dresden, ‡NEC Labs, ∗Microsoft Research
Abstract

We introduce SPEICHER, a secure storage system that not only provides strong confidentiality and integrity properties, but also ensures data freshness to protect against rollback/forking attacks. SPEICHER exports a Key-Value (KV) interface backed by a Log-Structured Merge Tree (LSM) for supporting secure data storage and query operations. SPEICHER enforces these security properties on an untrusted host by leveraging shielded execution based on a hardware-assisted trusted execution environment (TEE)—specifically, Intel SGX. However, the design of SPEICHER extends the trust in shielded execution beyond the secure SGX enclave memory region to ensure that the security properties are also preserved in the stateful (or non-volatile) setting of an untrusted storage medium, including system crash, reboot, or migration.

More specifically, we have designed an authenticated and confidentiality-preserving LSM data structure. We have further hardened the LSM data structure to ensure data freshness by designing asynchronous trusted counters. Lastly, we designed a direct I/O library for shielded execution based on Intel SPDK to overcome the I/O bottlenecks in the SGX enclave. We have implemented SPEICHER as a fully-functional storage system by extending RocksDB, and evaluated its performance using the RocksDB benchmark. Our experimental evaluation shows that SPEICHER incurs reasonable overheads for providing strong security guarantees, while keeping the trusted computing base (TCB) small.

1 Introduction

With the growth in cloud computing adoption, online data stored in data centers is growing at an ever increasing rate [11]. Modern online services ubiquitously use persistent key-value (KV) storage systems to store data with a high degree of reliability and performance [39, 65]. Therefore, persistent KV stores have become a fundamental part of the cloud infrastructure.

At the same time, the risks of security violations in storage systems have increased significantly for the third-party cloud computing infrastructure [66]. In an untrusted environment, an attacker can compromise the security properties of the stored data and query operations. In fact, many studies show that software bugs, configuration errors, and security vulnerabilities pose a serious threat to storage systems [9, 12, 16, 20, 24, 35, 37].

However, securing a storage system is quite challenging because modern storage systems are quite complex [9, 49, 64, 72]. For instance, a persistent KV store based on the Log-Structured Merge Tree (LSM) data structure [54] is composed of multiple software layers to enable a data path to the storage persistence layer. Thereby, the enforcement of security policies needs to be carried out by various layers in the system stack, which could expose the data to security vulnerabilities. Furthermore, since the data is stored outside the control of the data owner, the third-party storage platform provides an additional attack vector. The clients currently have limited support to verify whether the third-party operator, even with good intentions, can handle the data with the stated security guarantees.

In this landscape, the advancements in trusted execution environments (TEEs), such as Intel SGX [4] or ARM TrustZone [7], provide an appealing approach to build secure systems. In fact, given the importance of security threats in the cloud, there is a recent surge in leveraging TEEs for shielded execution of applications in the untrusted infrastructure [8, 10, 55, 69, 75]. Shielded execution aims to provide strong security properties using a hardware-protected secure memory region or enclave.

While the shielded execution frameworks provide strong security guarantees against a powerful adversary, they are primarily designed for securing "stateless" (or volatile) in-memory computations and data. Unfortunately, these stateless techniques are not sufficient for building a secure storage system, where the data is persistently stored on an untrusted storage medium, such as an SSD or HDD. The challenge is how to extend the trust beyond the "secure, but stateless/volatile" enclave memory region to the "untrusted and persistent" storage medium, while ensuring that the security properties are preserved in the "stateful setting", i.e., even across a system reboot, migration, or crash.

To answer this question, we aim to build a secure storage system using shielded execution targeting all three important security properties for data storage and query processing: (a) confidentiality — unauthorized entities cannot read the data, (b) integrity — unauthorized changes to the data can be detected, and (c) freshness — stale state of data can be detected as such.

To achieve these security properties, more specifically, we need to address the following three architectural limitations of shielded execution in the context of building a secure storage system: Firstly, the secure enclave memory region is quite limited in size, and incurs high performance overheads for memory accesses. This implies that the storage engine cannot store the data inside the enclave memory; thus, the in-memory data needs to be stored in the untrusted host memory. Furthermore, the storage engine persists the data on an untrusted storage medium, such as SSDs. Since the TEE cannot give any security guarantees beyond the enclave memory, we need to design mechanisms for extending the trust to secure the data in the untrusted host memory and also on the persistent storage medium.

Secondly, syscall-based I/O operations are quite expensive in the context of shielded execution, since the thread executing the system call has to exit the enclave and perform a secure context switch, including TLB flushing, security checks, etc. While existing shielded execution frameworks [8, 55] proposed an asynchronous system call interface [70], it is clearly not well-suited for building a storage system that requires frequent I/O calls. To mitigate the expensive enclave exits caused by I/O syscalls, we need to design a direct I/O library for shielded execution to completely eliminate the expensive context switch from the data path.

Lastly, we also aim to ensure data freshness to protect against rollback (replay of old state) or forking attacks (creation of a second instance). Therefore, we need a protection mechanism based on a trusted monotonic counter [57], for example, SGX trusted counters [3]. Unfortunately, the SGX trusted counters are extremely slow and they wear out within a couple of days of operation. To overcome the limitations of the SGX counters, we need to redesign the trusted monotonic counters to suit the requirements of modern storage systems.

To overcome these design challenges, we propose SPEICHER, a secure LSM-based KV storage system. More specifically, we make the following contributions.

• I/O library for shielded execution: We have designed a direct I/O library for shielded execution based on Intel SPDK. The I/O library performs the I/O operations without exiting the secure enclave; thus it avoids expensive system calls on the data path.
• Asynchronous trusted monotonic counter: We have designed trusted counters to ensure data freshness. Our counters leverage the lag in the sync operations in modern KV stores to asynchronously update the counters. Thus, they overcome the limitations of the native SGX counters.
• Secure LSM data structure: We have designed a secure LSM data structure that resides outside of the enclave memory while ensuring the integrity, confidentiality and freshness of the data. Thus, our LSM data structure overcomes the memory and I/O limitations of Intel SGX.
• Algorithms: We present the design and implementation of all storage and query operations in persistent KV stores: get, put, range queries, iterators, compaction, and restore.

We have built a fully-functional prototype of SPEICHER based on RocksDB [65], and extensively evaluated it using the RocksDB benchmark suite. Our evaluation shows that SPEICHER incurs reasonable overheads, while providing strong security properties against powerful adversaries.

2 Background and Threat Model

2.1 Intel SGX and Shielded Execution

Intel Software Guard Extensions (SGX) is a set of x86 ISA extensions for a Trusted Execution Environment (TEE) [15]. SGX provides an abstraction of a secure enclave—a hardware-protected memory region for which the CPU guarantees the confidentiality and integrity of the data and code residing in the enclave memory. The enclave memory is located in the Enclave Page Cache (EPC)—a dedicated memory region protected by an on-chip Memory Encryption Engine (MEE). The MEE encrypts and decrypts cache lines on writes and reads in the EPC, respectively. Intel SGX supports a call-gate mechanism to control entry into and exit out of the TEE.

Shielded execution based on Intel SGX aims to provide strong confidentiality and integrity guarantees for applications deployed on an untrusted computing infrastructure [8, 10, 55, 69, 75]. Our work builds on the SCONE [8] shielded execution framework. In SCONE, the applications are statically compiled and linked against a modified standard C library (SCONE libc). In this model, the application's address space is confined to the enclave memory, and interaction with the untrusted memory is performed via the system call interface. In particular, the SCONE runtime provides an asynchronous system call mechanism [70] in which threads outside the enclave asynchronously execute the system calls. SCONE protects the executing application against Iago attacks [13] through shields. Furthermore, it ensures memory safety for the applications running inside the SGX enclaves [36]. Lastly, SCONE provides an integration with Docker for seamlessly deploying containers.

2.2 Persistent Key-Value (KV) Stores

Our work focuses on persistent KV stores based on the LSM data structure [54], such as LevelDB [39] and RocksDB [65]. In particular, we base our design on RocksDB. RocksDB organizes the data using three constructs: MemTable, static sorted table (SSTable), and log files.

RocksDB inserts put requests into a memory-resident MemTable that is organized as a skip list [62]. For crash recovery, these puts are also sequentially logged, with checksums, to the write-ahead log (WAL) file backed by a persistent storage medium. When the MemTable fills up, it is moved in a batch to an SSTable file backed by an SSD or HDD to ensure sequential device access (this can thus require scanning the skip list).
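The write path just described can be summarized in a few lines of code. The sketch below is only an illustration of this background (a std::map stands in for the skip list, and only the lowest level is modelled); it is not RocksDB code.

```cpp
// High-level illustration of the LSM write path: a put is logged to the WAL,
// inserted into the in-memory MemTable, and the MemTable is flushed to a new
// SSTable on the lowest level when it fills up. Not RocksDB code.
#include <map>
#include <string>
#include <vector>

struct SSTable { std::map<std::string, std::string> sorted_kv; };  // immutable, sorted by key

struct LsmStore {
  std::vector<std::string>           wal;       // write-ahead log records (persisted with checksums)
  std::map<std::string, std::string> memtable;  // stand-in for the skip-list MemTable
  std::vector<SSTable>               level0;    // lowest level; compaction would move data deeper
  std::size_t memtable_limit = 4;               // tiny limit, just for the example

  void put(const std::string& k, const std::string& v) {
    wal.push_back(k + "=" + v);                 // 1. log for crash recovery
    memtable[k] = v;                            // 2. make the put visible to reads
    if (memtable.size() >= memtable_limit) {    // 3. flush a full MemTable as a sorted SSTable
      level0.push_back(SSTable{memtable});
      memtable.clear();
      wal.clear();                              //    entries are now persisted in the SSTable
    }
  }
};

int main() {
  LsmStore db;
  for (int i = 0; i < 10; ++i) db.put("key" + std::to_string(i), "value");
  return 0;
}
```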

The SSTable files are grouped into levels of increasing size (typically 10×). The process of moving data to the next level is called compaction, which ensures that the SSTables are sorted by keys, including the ones being merged from the previous level. Since SSTables are immutable, compaction always creates new SSTables on the persistent storage medium. Any state changes in the entire storage system, such as the creation and deletion of SSTable and WAL files, are recorded in the Manifest, which is a transactional and persistent log.

On a get request, RocksDB first searches the MemTable for the key, then searches the SSTables from the lowest level in turn; at each level, it binary-searches the corresponding SSTable. Using this primitive, it is trivial to process range and iterator queries, where the latter only differs in the client interface. RocksDB maintains an index table with a Bloom filter attached to each SSTable in order to avoid searching unnecessary SSTables.

While restarting, RocksDB establishes the latest state in a restore operation. To this end, the Manifest and the WAL are read and replayed.

2.3 Threat Model

In addition to the standard SGX threat model [10], we also consider the security attacks that can be launched using an untrusted storage medium, e.g., persistent state stored on an SSD or HDD. More specifically, we aim to protect against a powerful adversary in the virtualized cloud computing infrastructure [10]. In this setting, the adversary can control the entire system software stack, including the OS or hypervisor, and is able to launch physical attacks, such as performing memory probes.

For the untrusted storage component, we also aim to protect against rollback attacks [57], where the adversary can arbitrarily shut down the system and replay from a stale state. We also aim to protect against forking attacks [40], where the adversary can attempt to fork the storage system, e.g., by running multiple replicas of the storage system.

Even under this extreme threat model, our goal is to guarantee data integrity, confidentiality, and freshness. Lastly, we also aim to provide crash consistency for the storage system [58].

However, we do not protect against side-channel attacks, such as exploiting cache timing and speculative execution [78], or memory access patterns [25, 81]. Mitigating side-channel attacks in TEEs is an active area of research [53]. Further, we do not consider denial-of-service attacks, since these attacks are trivial for a third-party operator controlling the underlying infrastructure [10]. Lastly, we assume that the adversary cannot physically open the processor packaging to extract secrets or corrupt the CPU system state.

3 Design

SPEICHER is a secure persistent KV storage system designed to operate on an untrusted host. SPEICHER provides strong confidentiality, integrity, and freshness guarantees for the data storage and query operations: get, put, range queries, iterators, compaction, and restore. In this paper, we implemented SPEICHER by extending RocksDB [65], but our architecture can be generalized to other LSM-based KV stores.

3.1 Design Challenges

As a strawman design, we could try to secure a storage system by running the storage engine inside the enclave memory. However, the design of a practical and secure system requires addressing the following four important architectural limitations of Intel SGX.

I: Limited EPC size. The strawman design would be able to protect the in-memory state of the MemTable using the EPC memory. However, the EPC is a limited and shared resource. Currently, the size of the EPC is 128 MiB. Approximately 94 MiB are available to the user; the rest is reserved for metadata. To allow the creation of enclaves with sizes beyond that of the EPC, SGX features a secure paging mechanism. The OS can evict EPC pages to unprotected memory using SGX instructions. During eviction, the page is re-encrypted. Similarly, when an evicted page is brought back, it is decrypted and its integrity is checked. However, EPC paging incurs high performance overheads (2×–2000×) [8].

Therefore, we need to redesign the shielded storage engine, where we allocate the MemTable(s) outside the enclave in the untrusted host memory. Since the secure enclave region cannot give any guarantees for the data stored in the host memory, and the native MemTable is not designed for security, we designed a new MemTable data structure to guarantee the confidentiality, integrity and freshness properties.

II: Untrusted storage medium. The storage engine does not exclusively store the data in the in-memory MemTable, but also on a persistent storage medium, such as an SSD or HDD. In particular, the storage engine stores three types of files on a persistent storage medium: SSTable, WAL and the Manifest. However, Intel SGX is designed to protect only the volatile state residing in the enclave memory. Unfortunately, SGX does not provide any security guarantees for stateful computations, i.e., across system reboot or crash. Further, the trust from the TEE does not naturally extend to the untrusted persistent storage medium.

To achieve the end-to-end security properties, we further redesigned the LSM data structure, including the persistent storage state in the SSTable and log files, to extend the trust to the untrusted storage medium.

III: Expensive I/O syscalls. To access data stored on an SSD or HDD (in the SSTable, WAL or Manifest files), conventional systems leverage the system call interface. However, system call execution in the SGX environment incurs high performance overheads. This is because the thread executing the system call has to exit the enclave, and the syscall arguments need to be copied in and out of the enclave memory. These enclave transitions are expensive because of security checks and TLB flushes.

To mitigate the context switch overhead, shielded execution frameworks, such as SCONE [8] or Eleos [55], provide an asynchronous system call interface [70], where a thread outside the enclave asynchronously executes the system calls without forcing the enclave threads to exit the enclave. While such an asynchronous interface is useful for many applications, it is not well-suited for building a storage system that needs to support frequent I/O system calls. To support frequent I/O calls within the enclave, we designed a new I/O mechanism based on a direct I/O library for shielded execution leveraging the Storage Performance Development Kit (SPDK) [28].

IV: Trusted counter. In addition to guaranteeing integrity and confidentiality, we also aim to ensure the freshness of the stored data to protect against rollback attacks [57]. To achieve the freshness property, we need to protect the data stored in the untrusted host memory (MemTable), and the data on the untrusted persistent storage medium (SSTable, WAL and Manifest files).

For the first part, i.e., to ensure the freshness of the MemTable allocated in the untrusted host memory, we can leverage the EPC of SGX. In particular, the Memory Encryption Engine (MEE) in SGX already protects the EPC against rollback attacks. Therefore, we use the EPC to store a freshness signature of the MemTable, which we use at runtime to verify the freshness of data stored as part of the MemTable in the untrusted host memory.

However, the second part is quite tedious, i.e., to ensure the freshness of the data stored on untrusted persistent storage (SSTables and log files), because the rollback-protected EPC memory is stateless, i.e., it cannot be used to verify the freshness properties after the system reboots or crashes. Therefore, we need a rollback protection mechanism based on a trusted monotonic counter [57]. For example, we could use SGX trusted counters [3]. Unfortunately, the SGX trusted counters are extremely slow (60−250 ms) [45]. Furthermore, the counter memory allows only a limited number of write operations to NVRAM, and it easily becomes unusable due to wear-out within a couple of days of operation. Therefore, the SGX counters are impractical for designing a storage system. To overcome the limitations of SGX counters, we designed an asynchronous trusted monotonic counter that drastically improves the throughput and mitigates wear-out by taking advantage of the crash consistency properties of modern storage systems.

3.2 System Components

We next detail the system components of SPEICHER. Figure 1 illustrates the high-level architecture and building blocks of SPEICHER. The system is composed of the controller, a direct I/O library for shielded execution, a trusted monotonic counter, the storage engine (RocksDB engine), and a secure LSM data structure (MemTable, SSTable, and log files).

Figure 1: SPEICHER overview (shaded boxes depict the system components). The trusted SGX enclave memory hosts the storage engine (RocksDB), the SPEICHER controller, the trusted counter, the shielded I/O library, and the MemTable key path; the untrusted host memory holds the MemTable values and the SPDK DMA buffers; the untrusted storage (SSD) holds the SSTable(s) and the log files (WAL and Manifest).

SPEICHER controller. The controller provides the trusted execution environment based on Intel SGX [8]. Clients communicate over a mutually authenticated encrypted channel (TLS) to the controller. The TLS channel is terminated inside the controller. In particular, we built the controller based on the SCONE shielded execution framework [8], where we leverage SCONE's container support for secure deployment of the SPEICHER executable on an untrusted host.

The controller provides the remote attestation service to the clients [6, 32]. In particular, the SGX enclave generates a signed measurement of its identity, whose authenticity can be verified by a third party. After successful attestation, the client provides its encryption keys to the controller. The controller uses the client certificate to perform the access control operation. The controller also provides runtime support for user-level multithreading and memory management inside the enclave. The controller leverages the asynchronous system call interface (SCONE libc) on the control path for system configuration. For the data path I/O, we built a direct I/O library, which we describe next.

Shielded direct I/O library. The I/O library allows the storage engine to access the SSD or HDD from inside the SGX enclave, without issuing the expensive enclave exit operations. We achieve this by building a direct I/O library for shielded execution based on SPDK [28].

SPDK is a high-performance user-mode storage library, based on the Data Plane Development Kit (DPDK) [2]. It eliminates the need to issue system calls to the kernel for read and write operations by having the NVMe driver in user space. SPDK enables zero-copy I/O by mapping DMA buffers to the user address space. It relies on actively polling the device instead of interrupts.

These SPDK features align with SPEICHER's goal of exitless I/O operations in the enclave, i.e., to allow the shielded storage engine to interact with the SSD directly. However, we need to adapt the design of SPDK to overcome the limitations of the enclave memory region. In particular, our shielded I/O library allocates huge pages and SPDK ring buffers outside the enclave for DMA. The host system maps the device in an allocated DMA region. Afterwards, SPDK can initialize the device. To reduce the number of enclave exits, SPDK's device driver runs inside the enclave. This enables efficient delivery of requests from the storage engine to the driver, which explicitly copies the data between the host and the enclave memory.
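To make the data-path interaction concrete, the following sketch illustrates the exitless read pattern described above: the driver state lives inside the enclave, the DMA buffer lives in untrusted host memory, and completion is detected by polling. All type and function names (DmaBuffer, EnclaveNvmeQueue, shielded_read) are hypothetical stand-ins, not the SPDK API.

```cpp
// Sketch of the exitless data-path read: NVMe driver state inside the enclave,
// DMA buffer in untrusted host memory, polling instead of interrupts/syscalls.
// All types and functions are hypothetical stand-ins, not the SPDK API.
#include <cstdint>
#include <cstring>
#include <vector>

struct DmaBuffer { uint8_t* host_ptr; std::size_t len; };  // allocated outside the enclave

struct EnclaveNvmeQueue {                                   // driver state inside the enclave
  bool submit_read(uint64_t, uint32_t, DmaBuffer&) { return true; }  // no-op stand-in
  bool poll_completion() { return true; }                            // no-op stand-in
};

// Reads `len` bytes at `lba` into enclave memory without issuing a system call.
bool shielded_read(EnclaveNvmeQueue& qp, DmaBuffer& dma,
                   uint64_t lba, void* enclave_dst, std::size_t len) {
  if (!qp.submit_read(lba, static_cast<uint32_t>(len / 512), dma)) return false;
  while (!qp.poll_completion()) { /* spin: polling instead of interrupts */ }
  std::memcpy(enclave_dst, dma.host_ptr, len);  // explicit copy host -> enclave memory
  return true;                                  // decryption/integrity checks happen above this layer
}

int main() {
  std::vector<uint8_t> host(4096), enclave(4096);  // the host vector stands in for the DMA region
  DmaBuffer dma{host.data(), host.size()};
  EnclaveNvmeQueue qp;
  return shielded_read(qp, dma, /*lba=*/0, enclave.data(), enclave.size()) ? 0 : 1;
}
```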

Trusted counter. In order to protect the system from rollback attacks, we need a trusted counter whose value is stored alongside the LSM data structure. Intel SGX provides monotonic counters, but their update frequency is in the range of 10 updates per second, and we indeed measured approximately 250 ms to increment a counter once. This is far too slow for modern KV stores [26].

To overcome the limitations of SGX counters, we designed an Asynchronous Monotonic Counter (AMC) based on the observation that many contemporary KV stores do not persist their inserted data immediately. This allows AMC to defer the counter increment until the data is persisted, without losing any availability guarantees. As a result, AMC achieves 70K updates per second in the current implementation.

AMC provides an asynchronous increment interface, because it takes a while from when the counter value is incremented until it becomes stable, which means the counter value cannot be rolled back without being detected. At an increment, AMC returns three pieces of information: the current stable value, the incremented counter value, and the expected time for the value to become stable. Due to the expected time and the controller having to be re-authenticated after a shutdown, the client only has to keep the values until the stable time has elapsed, to prevent any data loss in case of a sudden shutdown.

AMC's flexible interface allows us to optimize update throughput and latency by increasing the time until a trusted counter is stable. This also allows users to adjust the trade-off between the wear-out of the SGX monotonic counter and the maximum number of unstable counter increments, which a client might have to account for. SPEICHER generates multiple counters by storing their state to a file, whose freshness is guaranteed through the use of a synchronous trusted monotonic counter. For instance, we can employ SGX monotonic counters [3], ROTE [45] or Ariadne [71] to support our asynchronous interface. Therefore, we can have a counter with deterministic increments for the WAL and the Manifest, making it possible to argue about the freshness of each record in the files.

MemTable. As detailed in §3.1, the EPC is limited in size and EPC paging incurs very high overheads. Therefore, it is not judicious to store large MemTables or multiple MemTables within the EPC. Further, since SPEICHER uses the EPC memory region to secure the storage engine (RocksDB) and the shielded I/O library driver, it further shrinks the available space.

Due to this memory restriction, we need to store the MemTable in the host memory. Since the host memory is untrusted, we need to devise a mechanism to ensure the confidentiality, integrity, and freshness of the MemTable.

In our project, we tried three different designs for the MemTable. Firstly, we explored a native Merkle tree that generates hashes of the leaves and stores them in each node. Thus, we can verify the data integrity by checking the root node hash and each hash down to the leaf storing the KV pair, while allowing the MemTable to be stored outside the EPC memory. However, the native Merkle tree suffers from slow lookups, as the key has to be decrypted on each traversal. Further, it requires multiple hash recalculations on each lookup and insertion.

Secondly, we tried a modified Merkle tree design based on a prefix array, where a fixed-size prefix is used as an index into an array of Merkle trees. An array entry holds the root node of a Merkle tree, which holds the actual data. This should reduce the depth of the search tree compared to the native Merkle tree, thus reducing the number of necessary hash calculations and key decryptions. However, while we were able to increase the lookup speed compared to the native Merkle tree, it still suffered from the same problem of having to decrypt a large number of keys in a lookup, and caused a large number of hash calculations.

Lastly, our third attempt at the MemTable design reuses the existing skip list data structure for the MemTable in RocksDB. Figure 2 shows SPEICHER's MemTable format. In particular, we partition the existing MemTable into two parts: the key path and the value path. In the key path, we store the keys as part of the skip list inside the enclave, whereas the encrypted values of the MemTable are stored in the untrusted host memory as part of the value path. This partitioning allows SPEICHER to provide confidentiality by encrypting the value, while still enabling fast key lookups inside the enclave. To prevent attacks on the integrity or the freshness of the values, SPEICHER stores a cryptographic hash of the value in each skip list node together with the host memory location of the value.

Figure 2: SPEICHER MemTable format. The skip list of (key, hash, pointer) nodes resides in the trusted SGX enclave memory, while the encrypted values it points to reside in the untrusted host memory.

While the first two designs removed almost the entire MemTable from the EPC, the last design still maintains the keys and hash values inside the enclave memory. To determine the space requirements of our MemTable in comparison to the regular RocksDB MemTable, we use the following formula:

S = n \cdot (k + v) + \sum_{i=0}^{m} p^{i} \cdot n \cdot ptr

where S represents the entire size of the skip list, n is the number of KV pairs, k is the key size, v is the value size (or the size of the pointer plus hash value for our skip list), p is the probability of a node being added to a specific layer of the skip list, m is the maximum number of layers, and ptr is the size of a pointer in the system.
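To make the formula concrete, the short program below evaluates it for the default RocksDB configuration discussed next, comparing the enclave-resident key path against a MemTable that keeps full values inside the EPC. It is an illustrative calculation only; in particular, deriving the number of KV pairs n from the 64 MiB MemTable limit is our own assumption.

```cpp
// Illustrative use of the size formula with RocksDB's default parameters
// (64 MiB MemTable, 16 B keys, 1024 B values, p = 1/4, m = 12, 8 B pointers,
// 16 B hashes). This is an explanatory calculation, not SPEICHER code.
#include <cstdio>

int main() {
  const double k = 16, v = 1024, ptr = 8, hash = 16, p = 0.25;
  const int m = 12;
  const double n = 64.0 * 1024 * 1024 / (k + v);   // estimate: #KV pairs in a 64 MiB MemTable

  double levels = 0;                               // expected per-node skip-list pointer cost
  for (int i = 0; i <= m; ++i) {
    double pi = 1;
    for (int j = 0; j < i; ++j) pi *= p;           // p^i
    levels += pi * ptr;
  }

  const double native   = n * (k + v)          + n * levels;  // full KV pairs inside the EPC
  const double speicher = n * (k + hash + ptr) + n * levels;  // key path only: key, hash, pointer

  std::printf("EPC footprint: native %.1f MiB, Speicher %.1f MiB (%.1f%% reduction)\n",
              native / (1 << 20), speicher / (1 << 20),
              100.0 * (1.0 - speicher / native));
  return 0;
}
```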

For instance, in the case of the default setting for RocksDB, with a maximum MemTable size of 64 MiB, a key size of 16 B, a value size of 1024 B, a pointer size of 8 B, p of 1/4, m of 12, and for SPEICHER's skip list a hash size of 16 B, SPEICHER's MemTable achieves a space reduction of approximately 95.2%. Further, the reduction ratio increases with increasing value size.

SSTables. The SSTable files maintain the KV pairs persistently. These files store KV pairs in ascending order of keys. This organization allows for a binary search within the SSTable, requiring only a few reads to find a KV pair within the file. Since SSTable files are optimized for block devices, such as SSDs, they group KV pairs together into blocks (the default block size is 32 KiB).

SPEICHER adapts the SSTable file format to ensure the security properties (see Figure 3 for SPEICHER's SSTable file format). Confidentiality is secured by encrypting each block of the SSTable file before it is written to the persistent storage medium. Additionally, SPEICHER calculates a cryptographic hash over each block. These hashes are then grouped together in a block of hashes and appended at the end of the SSTable file. When reading, SPEICHER can check the integrity of each block by calculating the block's hash and comparing it to the corresponding hash stored in the footer. To protect the integrity of the footer, an additional hash over the footer is calculated and stored in the Manifest. Since the Manifest is protected against rollback attacks using a trusted counter, the footer hash value stored in the Manifest is also protected from rollback attacks. Thus, SPEICHER can use this hash to guarantee the freshness of the SSTable file's footer and, transitively, the freshness of each block in the SSTable file.

Figure 3: SPEICHER SSTable file format. KV pairs are grouped into 32 KiB blocks; a hash Hi is computed over each block, and the hashes H1...Hn are collected in a block of hashes at the end of the file.

Log files. RocksDB uses two different log files to keep track of the state of the KV store: (a) the WAL for persisting inserted KV pairs until a top-level compaction; and (b) the Manifest to keep track of live files, i.e., the set of files of which the current state of the KV store consists. SPEICHER adapted these log files to ensure the desired security properties, as shown in Figure 4.

Regarding the WAL, every put operation appends a record to the current WAL. This record consists of the encrypted KV pair, an encrypted trusted counter value for the WAL at the moment of insertion, and a cryptographic hash over both. Since records are only appended to the WAL, SPEICHER can use the trusted counter value and the hash value to verify the KV pair, and to replay the operations in a restore event. The Manifest is similar to the WAL; it is a write-append log consisting of records storing changes of live files. We use the same scheme for the Manifest file as we do for the WAL.

Figure 4: SPEICHER append-only log file format. Each appended record stores the encrypted data, the trusted counter value (TCV) at the time of the append, and a hash over both.

3.3 Algorithms

We next present the algorithms for all storage operations in SPEICHER. The associated pseudocode is detailed in the appendix.

I: Put. Put is used to insert a new KV pair into the KV store, or to update an existing one. We need to perform two operations to insert the KV pair into the store (see Algorithm 1). First, we need to append the KV pair to the WAL for persistence. Second, we need to write the KV pair to the MemTable for fast lookups.

Inserting the KV pair into the WAL guarantees that the state of the KV store can be restored after an unexpected reboot. Therefore, the KV pair should be inserted into the WAL before it is inserted into the MemTable. To add a KV pair to the WAL, SPEICHER encrypts the pair together with the next WAL trusted counter value and a cryptographic hash over both the data and the counter. The encrypted block is then appended to the WAL (see the log file format in Figure 4). Thereafter, the trusted counter is incremented to the value stored in the appended block. In addition, the client is notified when the KV pair will be stable; thereafter, the state cannot be rolled back. In case of a system crash between generating the data block and increasing the trusted counter value, the data block would be invalid at reboot, because the trusted counter would place the block in the future. This operation is safe, as the client can detect a reboot when SPEICHER tries to authenticate itself. After the reboot, the client can ask the KV store what the last added key was, or can simply put the KV pair into the store again, as another request with the same key supersedes any old value with the same key.

In the second step, SPEICHER writes the KV pair into the MemTable, thereby making the put visible to later gets. SPEICHER first encrypts the value of the KV pair and generates a hash over the encrypted data. The encrypted value is then copied to the untrusted host memory, while the hash, together with a pointer to the value, is inserted into the skip list in the enclave, in accordance with SPEICHER's MemTable format (Figure 2). Since the KV pair is first inserted into the WAL, and only if this is successful, i.e., the WAL and trusted counter are updated, we can guarantee that only KV pairs whose freshness is secured by the trusted counter are returned.
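The two steps of the put path can be sketched as follows. The snippet uses placeholder primitives (identity "encryption", std::hash as a stand-in digest, a std::map for the enclave skip list) and is meant only to mirror the ordering described above, not SPEICHER's implementation.

```cpp
// Sketch of the put path: (1) append an encrypted record with the next trusted
// counter value and a hash to the WAL, then bump the counter; (2) place the
// encrypted value in host memory and keep key, hash and a reference in the
// enclave-resident skip list. Placeholder primitives, not SPEICHER code.
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <vector>

using Bytes = std::vector<uint8_t>;
Bytes encrypt(const std::string& s) { return Bytes(s.begin(), s.end()); }  // placeholder "encryption"
uint64_t digest(const Bytes& b) { return std::hash<std::string>{}(std::string(b.begin(), b.end())); }

struct WalRecord { Bytes ciphertext; uint64_t counter; uint64_t hash; };
struct MemNode   { uint64_t value_hash; std::size_t value_index; };   // stays in the enclave

struct Store {
  std::vector<WalRecord>         wal;          // append-only, on untrusted storage
  uint64_t                       wal_counter = 0;
  std::map<std::string, MemNode> keypath;      // stand-in for the enclave skip list
  std::vector<Bytes>             host_values;  // stand-in for untrusted host memory

  void put(const std::string& key, const std::string& value) {
    // Step 1: WAL append with the next counter value, then increment the counter.
    const uint64_t next = wal_counter + 1;
    Bytes rec = encrypt(key + value + std::to_string(next));
    wal.push_back({rec, next, digest(rec)});
    wal_counter = next;                        // becomes "stable" asynchronously via the AMC

    // Step 2: encrypted value to host memory; key, hash and reference stay in the enclave.
    host_values.push_back(encrypt(value));
    keypath[key] = {digest(host_values.back()), host_values.size() - 1};
  }
};

int main() { Store s; s.put("foo", "bar"); return 0; }
```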

II: Get. Get may involve searching multiple levels in the LSM data structure to find the latest value. Within each level, SPEICHER has to generate either the proof of existence or the proof of non-existence of the key. This is necessary to detect insertion or deletion of KV pairs by an attacker.

Algorithm 2 details the get operation in SPEICHER. In particular, SPEICHER begins by searching the MemTable. SPEICHER searches the skip list for the node with the key. Either the key is in the MemTable, in which case the hash is calculated over the value and compared to the hash stored in the skip list, or the key could not be found in the skip list. Since the skip list resides inside the protected memory region, SPEICHER does not need to make the non-existence proof for the MemTable, because an attacker cannot access the skip list. If the KV store finds the key in the MemTable and the existence proof is correct, i.e., the calculated hash value is equal to the stored hash value, the value is returned to the client. If the proof is incorrect, the client is informed that the MemTable is corrupted. Since the MemTable can be reconstructed from the WAL, the client can then instruct SPEICHER to recreate the KV store state in the case of an incorrect proof.

When the key is not found in the MemTable, the next level is searched. All levels below the MemTable are stored in SSTables. The SSTable files are organized in a way that no two SSTables in the same level have an overlapping key range. Additionally, all keys are sorted within an SSTable file. Due to this, any given key can only exist in one position in one SSTable file per level. This allows SPEICHER to construct a Merkle tree on top of the SSTable files of a level. With the ordering inside the SSTable, SPEICHER can correlate a block in the file with the key. This allows SPEICHER to calculate a hash over this block, which can then be checked against the stored hash in the footer. The hash of the footer can then be checked against the Merkle tree over the SSTable files in that level. This gives SPEICHER the proof of non-/existence for the lookup, and possibly the value belonging to the key. If the proof fails, the client is informed. In contrast to an incorrect proof in the MemTable, SPEICHER is not able to recover from this problem, since the data is stored on the untrusted storage medium. If SPEICHER finds the KV pair and the proof is correct, it returns the value to the client. If the key does not exist, that is, SPEICHER could not find it in any level and all level proofs are correct, an empty value is returned.

The freshness of data is guaranteed either by checking the value against the securely stored hash in the EPC, for the case where the key has been found in the MemTable, or by checking the hash values of the SSTables against a Merkle tree. Additionally, as any key can only be stored in one position within a level, SPEICHER can also check against deletion of the key in a higher level, which is also necessary to guarantee freshness.

III: Range queries. Range queries are used to access all KV pairs with a key greater than or equal to a start key and less than an end key (see Algorithm 3). To find the start KV pair, we need to do the same operation as in get requests. Furthermore, it requires initializing an iterator in each level, pointing to the KV pair with a key greater than or equal to the starting key. These iterators are necessary as higher levels have the more recent updates, due to keys being inserted into the highest level and being compacted over time to the lower levels, with lower levels being larger in size and therefore holding more KV pairs. If the next KV pair is requested, the next key of all iterators is checked and the iterators with the smallest next key are forwarded. In case the next key is in multiple levels, the highest-level KV pair is chosen. Therefore, SPEICHER has to do a non-/existence proof at all the levels before it returns the chosen KV pair. If any of these proofs fails, the client is informed about the failed proof. Identical to the get operation, the client can then decide to either restore the KV store or restore a backup.

Similar to the get operation, the hash value stored in the EPC and the Merkle tree over the SSTables are used to guarantee the freshness of the returned values.

IV: Iterators. Iterators work identically to range queries; they just have a different interface (see Algorithm 4).

V: Restore. After a reboot, the KV store has to restore its last state (see Algorithm 5). This process is performed in two steps: first collecting all files belonging to the KV store, and then replaying all changes to the MemTable. In the first step, the Manifest file is read. It contains all necessary information about the other files, such as live SSTable files, live WAL files, and the smallest key of each SSTable file. Each change event concerning the live files is logged into the Manifest by appending a record describing the event. Therefore, at restore, all changes committed in the Manifest have to be replayed. This means that the SSTable files have to be put in the correct level. Each record in the Manifest is integrity-checked by a hash, and the freshness is guaranteed by the trusted counter for the Manifest. Since the counter value is incremented in a deterministic way, SPEICHER can use this value to check if all blocks are present in the Manifest. After the SSTable files in the levels are restored, and the freshness of all the SSTable files is checked against the Manifest by comparing their hashes with the hashes stored in the Manifest, the WAL is replayed.

Since each put operation is persisted in the WAL before it is written into the MemTable, replaying the put operations from the WAL allows SPEICHER to reconstruct the MemTable at the moment of the shutdown. Each put in the WAL has to be checked against the hash stored in the record, and against the stored counter value. Additionally, since the counter value of the WAL is checked against that of the Manifest counter, SPEICHER can check for missing records. Records that have a counter value in the future, i.e., a counter value higher than the stored stable trusted counter value, are ignored at restore. Further, due to the deterministic increase of the counter, SPEICHER can check for missing records in the log files. If in any of these steps one of the checks fails, SPEICHER returns the information to the client, because SPEICHER is not able to recover from such a state.
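The per-level proof of (non-)existence that get, range queries, and restore rely on can be pictured as a two-step check, sketched below with placeholder types: recompute the hash of the block that must contain the key, compare it with the footer entry, and validate the footer hash against the rollback-protected value kept in the Manifest (or the per-level Merkle tree). This is an illustration, not SPEICHER's code.

```cpp
// Sketch of the per-level (non-)existence proof: 1) the footer must match the
// rollback-protected hash, 2) the block that can hold the key must match its
// footer entry. Types and the hash function are placeholders.
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

using Block = std::string;                          // encrypted block read from the SSTable file
uint64_t digest(const std::string& s) { return std::hash<std::string>{}(s); }

struct SSTableFooter { std::vector<uint64_t> block_hashes; };

enum class Proof { Valid, Corrupted };

// Verifies that `block` (the only block that can hold the key in this level)
// is genuine and fresh before its contents are trusted for the lookup result.
Proof verify_block(const Block& block, std::size_t block_index,
                   const SSTableFooter& footer, uint64_t trusted_footer_hash) {
  // 1. Footer freshness: its hash must match the value protected by the trusted counter.
  std::string footer_bytes;
  for (uint64_t h : footer.block_hashes) footer_bytes += std::to_string(h);
  if (digest(footer_bytes) != trusted_footer_hash) return Proof::Corrupted;

  // 2. Block integrity: the recomputed block hash must match the footer entry.
  if (block_index >= footer.block_hashes.size() ||
      digest(block) != footer.block_hashes[block_index]) return Proof::Corrupted;

  return Proof::Valid;   // the block may now be decrypted and searched for the key;
                         // an absent key in a valid block is a proof of non-existence
}
```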

VI: Compaction. Compaction is triggered when a level holds data beyond a pre-defined size threshold. In compaction (see Algorithm 6), a file from Level_n is merged with all SSTable files in Level_n+1 covering the same key range. The new SSTables are added to Level_n+1, while all SSTables in the previous level are discarded. Before keys are added to the new SSTable file, the non-/existence proof is done on the files being merged. This is necessary to prevent the compaction process from skipping keys or writing old KV pairs to the new SSTable files. Since hash values are calculated over blocks of the SSTable files, a new block has to be constructed in the enclave memory before it is written to the SSD. Also, all hash values of the blocks have to be stored in the protected memory until the footer is written and a hash over the footer is created. The file names of newly created SSTables and the footer hashes are then written to the Manifest file, with the new trusted counter value. This is similar to the put operation. After the write operation to the Manifest completes and the trusted counter is incremented, the old SSTable files are removed from the KV store and the new files are added to Level_n+1. Since the hash values of the new SSTables are secured with a trusted counter value in the Manifest file, the SSTables cannot be rolled back after the compaction process.

3.4 Optimizations

Timer performance. As described in §3.2, in order to prevent every request from blocking on the trusted counter increment, we leverage asynchronous counters written to files whose freshness is guaranteed by synchronous counters (i.e., SGX counters). We use one counter for the WAL and another for the Manifest so that SPEICHER can operate on them independently. Although this method drastically improves throughput by allowing SPEICHER to process many requests without waiting for the counter to be stable, it also requires the client to hold its write requests until the counter value is stable. This is why we designed and implemented the AMC interface that reports the expected time for the counter to become stable. Because of this interface, the client does not need to frequently issue requests to check the current stable counter value.

SPDK performance. SPDK is designed to eliminate system calls from the data path, but in reality its data path issues two system calls on every I/O request: one for obtaining the process identifier and the other for obtaining the time. They are executed once per I/O request covering multiple blocks, and their costs are normally amortized. However, since the context switch to and from the enclave is an order of magnitude more expensive, these costs are not amortized enough. We modified them to obtain the values from a cache within the enclave that is updated only at the vantage points. As a result, we achieved a 25× improvement over the naive port of SPDK to the enclave.

4 Implementation

Direct I/O library. Our direct I/O library for shielded execution extends Intel SPDK. Further, the memory management routines and the uio kernel module that maps the device memory to the user space are based on Intel DPDK [2]. Although the device DMA target is configured outside the enclave, the SSD device driver and library code, including BlobFS in which SPEICHER stores RocksDB files, run entirely within the enclave. We use SPDK 18.01.1-pre and DPDK 18.02. In SPDK, 56 LoC are added and 22 LoC are removed. In DPDK, 138 LoC are added and 72 LoC are removed. These changes were made to replace the routines that cannot be executed in the enclave.

Trusted counters. AMCs are implemented using the Intel SGX SDK. A dedicated thread continually checks if any monotonic counter value has changed. If a counter value has been incremented, the thread writes the current value to the file. The storage engine can query the stable value of any of its counters, i.e., the last value that has been written to disk. Note that this value cannot be rolled back since it is protected by the synchronous SGX monotonic counter. Overall, our trusted counter consists of 922 LoC.

SPEICHER controller. The SPEICHER controller is based on SCONE. We leverage the Docker integration in SCONE to seamlessly deploy the SPEICHER binary on an untrusted host. Further, we implemented a custom memory allocator for the storage engine. The memory allocator manages the unprotected host memory, and exploits RocksDB's memory allocation pattern, which allows us to build a lightweight allocator with just 119 LoC. Further, the controller employs our direct I/O library on the data path, and the asynchronous syscall interface of SCONE on the control path for system configuration. The controller also implements TLS-based remote attestation for the clients [32]. Lastly, we integrated the trusted counter as part of the controller, and exported the APIs to the storage engine.

Storage engine. We implemented the storage engine by extending a version of RocksDB that leverages SPDK. In particular, we extended the RocksDB engine to run within the enclave, and also integrated our direct I/O library. Since the RocksDB engine with SPDK does not support data encryption and decryption, we also ported encryption support from the regular RocksDB engine using the Botan library [1] (1000 LoC). In addition to encrypting data files, we extended the encryption support to ensure the confidentiality of the WAL and Manifest files. We further modified the storage engine to replace the LSM data structure and log files with our secure MemTable, SSTables, and log files. Altogether, the changes in RocksDB account for 5029 new LoC and 319 changed LoC.

MemTables. RocksDB by default uses a skip list for the MemTable. However, it does not offer any authentication or freshness guarantees. Therefore, we replaced the MemTable with an authenticated data structure coupled with mechanisms to ensure the freshness property. Our MemTable uses the InlineSkipList of RocksDB and replaces the value part of the KV pair with a node storing a pointer to and the size of the value, as well as an HMAC. For the en-/decryption as well as for the HMAC we used OpenSSL's AES-128 in GCM mode. This results in a 16 B wide HMAC. This implementation consists of 459 LoC. As discussed previously, we also implemented the MemTable with a native Merkle tree (1186 LoC) and a Merkle tree with a prefix array (528 LoC). However, we did not use them eventually since their performance was quite low.
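A minimal sketch of an asynchronous monotonic counter in this spirit is shown below: increments return immediately together with the currently stable value and an expected time to stability, while a dedicated thread persists the state and bumps the slow synchronous counter that protects the state file. The function names and the 250 ms estimate are placeholders, not SPEICHER's interface.

```cpp
// Sketch of an asynchronous monotonic counter (AMC-style): increments are
// cheap, a background thread makes them stable. Stand-in functions only.
#include <atomic>
#include <chrono>
#include <cstdint>
#include <thread>

void persist_state_file(uint64_t) { /* write the counter state to disk (stand-in) */ }
void bump_sgx_counter()           { /* synchronous SGX monotonic counter increment (stand-in) */ }

class AsyncMonotonicCounter {
 public:
  struct Increment { uint64_t value; uint64_t stable; std::chrono::milliseconds eta; };

  Increment increment() {                 // returns immediately; no waiting on the slow counter
    return {++next_, stable_.load(), std::chrono::milliseconds(250)};
  }
  uint64_t stable() const { return stable_.load(); }

  void run_persister() {                  // runs in a dedicated background thread
    while (running_.load()) {
      const uint64_t target = next_.load();
      if (target != stable_.load()) {
        persist_state_file(target);       // 1. persist the latest counter value
        bump_sgx_counter();               // 2. protect the state file against rollback
        stable_.store(target);            // 3. everything up to `target` is now stable
      }
      std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
  }
  void stop() { running_.store(false); }

 private:
  std::atomic<uint64_t> next_{0}, stable_{0};
  std::atomic<bool> running_{true};
};

int main() {
  AsyncMonotonicCounter amc;
  std::thread persister(&AsyncMonotonicCounter::run_persister, &amc);
  auto inc = amc.increment();             // the caller holds its data until inc.eta has elapsed
  (void)inc;
  amc.stop();
  persister.join();
  return 0;
}
```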

SSTables. To preserve the integrity of the SSTable blocks, we changed the block layer in RocksDB to calculate the hash before it issues a write request to the underlying layer. The hash is then cached until the file is flushed (258 LoC). Thereafter, the hashes of all blocks are appended to the file, coupled with the information about the total number of blocks and the hash of this footer. When a file is opened, our hash layer loads the footer into the protected memory and calculates the hash of the footer. It then compares the value against the hash stored in the Manifest file. Only if these checks pass does it open the corresponding SSTable file and let normal operations proceed. At reading, the hash of the block is calculated and checked against the hashes stored in the protected memory area, before the block data is handed to the block layer of RocksDB. We further enabled AES-128 encryption to ensure the confidentiality of the blocks (188 LoC). The hashes used in the SSTables are SHA-3 with 384 bits.

Log files. Log files, including the WAL and the Manifest, use the same encryption layer as the SSTable files. However, the validation layer is different, and comes before the block layer, since the operation requires knowledge of the record size. While writing, the validation layer adds the hash and the trusted counter value to the log files.

The validation layer uses the knowledge that log files are only read sequentially at startup for restore purposes. Therefore, at startup, the layer allows any action written in the log file as long as the hash is correct and the stored counter increases as expected. At the end of the file, SPEICHER checks if the stored counter is equal to the trusted counter. The last record's freshness is guaranteed through the trusted counter. The integrity of all records is guaranteed through the hash value, which also protects the stored counter value. This value can then be checked against the expected counter value for that block. Since the counter lives longer than the log files, the start record value has to be secured too. In the case of the WAL, this is achieved by storing the start counter value of the WAL in the Manifest. The start record of the Manifest is implicitly secured, since the record must describe the state of the entire KV store.
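The sequential replay check performed by the validation layer can be sketched as follows, again with placeholder types: every record must carry a correct hash and the deterministically incremented counter value, records beyond the stable counter are ignored, and at the end of the file the last accepted counter value must match the trusted counter. This is an illustration of the description above, not SPEICHER's code.

```cpp
// Sketch of the log validation layer's sequential replay check.
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

struct LogRecord { std::string payload; uint64_t counter; uint64_t hash; };
uint64_t digest(const std::string& s) { return std::hash<std::string>{}(s); }

// Returns true if the log can be replayed safely.
bool validate_log(const std::vector<LogRecord>& records,
                  uint64_t start_counter, uint64_t trusted_stable_counter) {
  uint64_t expected = start_counter;
  for (const LogRecord& r : records) {
    if (r.counter > trusted_stable_counter) break;                // records "from the future" are ignored
    if (r.counter != expected + 1) return false;                  // gap: a record is missing
    if (digest(r.payload + std::to_string(r.counter)) != r.hash)  // integrity of payload and counter
      return false;
    expected = r.counter;
  }
  return expected == trusted_stable_counter;                      // end of file must match the trusted counter
}
```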

5.2 Performance of the Direct I/O Library
We first evaluate the performance of SPEICHER's I/O library for shielded execution. The I/O library is designed to provide fast access to the persistent storage for accessing the KV pairs stored on the SSD (§3.2). We run the performance measurement 20 times for every configuration of block size for the native execution and for SPEICHER. Figure 5 shows the mean throughput and IOPS with our I/O library and those with the native RocksDB-SPDK, with a confidence interval of 95%. We use Workload B (80% reads—20% writes). Since the communication between SPDK and the device is handled completely over DMA, our direct I/O library does not suffer from context switches. Additionally, since the buffers are stored outside of the enclave, we also do not require expensive EPC paging, which would drastically reduce the performance of the I/O library. Our performance evaluation of the direct I/O library shows that it does not suffer from any performance degradation compared to the native SPDK setup.
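For illustration, a polled read with the SPDK NVMe driver might look roughly as follows. This is only a sketch, not SPEICHER's I/O library: controller, namespace, and queue-pair initialization are omitted, and read_block() is an assumed helper name.

#include <cstdint>
#include <sys/types.h>
#include <spdk/env.h>
#include <spdk/nvme.h>

static void read_done(void *ctx, const struct spdk_nvme_cpl *cpl) {
    *(bool *)ctx = true;                          /* mark the I/O as finished */
}

/* Reads lba_count blocks starting at lba into a DMA buffer allocated in
 * untrusted host (hugepage) memory, i.e., outside the EPC. */
ssize_t read_block(struct spdk_nvme_ns *ns, struct spdk_nvme_qpair *qpair,
                   uint64_t lba, uint32_t lba_count, void **out_buf) {
    uint32_t sector = spdk_nvme_ns_get_sector_size(ns);
    void *buf = spdk_dma_zmalloc((size_t)lba_count * sector, 0x1000, NULL);
    if (buf == NULL)
        return -1;

    bool done = false;
    if (spdk_nvme_ns_cmd_read(ns, qpair, buf, lba, lba_count,
                              read_done, &done, 0) != 0) {
        spdk_dma_free(buf);
        return -1;
    }
    /* Poll for completion from user space: no system call, no enclave exit. */
    while (!done)
        spdk_nvme_qpair_process_completions(qpair, 0);

    *out_buf = buf;   /* still ciphertext; verified and decrypted inside the enclave */
    return (ssize_t)lba_count * sector;
}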
Figure 6: Impact of the EPC paging on the MemTable. [Plot omitted: MemTable buffer size (1 MB–1 GB) vs. access time in seconds; access time increases sharply beyond the EPC limit, annotated at roughly 90 MB.]

5.3 Impact of the EPC paging on MemTable
We next study the impact of EPC paging on the MemTable(s). Note that a naive solution of storing a large MemTable, or many MemTables, in the EPC memory would incur high performance overheads due to the EPC paging. Therefore, we adopted a split MemTable approach, where we store only the keys along with metadata (hashes and pointers to the values) inside the EPC, but the values are stored in the untrusted host memory (§3.3). To confirm the overheads of the EPC paging incurred when accessing a large MemTable, as in our rejected design, we measure the overheads of accessing random nodes in a MemTable completely resident in the enclave memory.
Figure 6 shows the performance overhead of accessing memory within the SGX enclave. The result shows that as soon as SGX has to page out MemTable memory from the EPC, which happens at 96 MiB, the performance drops dramatically. This is due to the en-/decryption and integrity checks employed by the MEE in Intel SGX. Therefore, it is important for our system design to keep the data values in the untrusted host memory to avoid the expensive EPC paging. Our approach of keeping only the key path of the MemTable inside the EPC requires a small EPC memory footprint. Therefore, our MemTable does not incur the EPC paging overhead.
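To illustrate the split MemTable layout discussed above, the following is a minimal sketch of the two memory regions; the field names and sizes are assumptions for exposition, not SPEICHER's actual data structures.

#include <cstddef>
#include <cstdint>

struct HostValue {                  // resides in untrusted host memory
    size_t  length;                 // length of the ciphertext that follows
    uint8_t cipher[1];              // encrypted value plus authentication tag
};

struct EnclaveNode {                // skip-list node, resides inside the EPC
    uint8_t      key[16];           // plaintext key, visible only inside the enclave
    uint8_t      value_hash[32];    // hash binding this node to the encrypted value
    HostValue   *value;             // pointer into untrusted host memory
    EnclaveNode *next;              // forward pointer(s) of the skip list
};

// A Get walks the skip list inside the EPC, fetches *value from host memory,
// decrypts it, and recomputes value_hash before returning the plaintext.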
5.4 Throughput and Latency Measurements
We next present the end-to-end performance of SPEICHER with different workloads, value sizes, and thread counts. We measured the average throughput and latency for each of our benchmarks. Figure 7 shows the measurement results as a ratio of slowdown relative to the native SPDK-based RocksDB.
Effect of varying workloads. In the first experiment, we used the different workloads listed in Table 1. The workloads were evaluated with 5 million KV pairs each. Each key was 16 B and each value was 1024 B. The benchmarks were run single threaded. We get a throughput of 34.2k requests/second (rps) for Workload A down to 20.8k rps for Workload C, while RocksDB achieved 512.8k rps and 676.8k rps, respectively. The results show that SPEICHER incurs overheads of 15×–32.5× for the different workloads. The overheads in Workloads A and B are mainly due to the operations performed in the MemTable, since SPEICHER has to encrypt the value and generate a cryptographic hash for every write to the MemTable. Furthermore, for each read operation the data has to be decrypted and the hash has to be recalculated and compared to the one in the skip list. However, even with AES-NI instructions, these en-/decryption operations take at least 1.3 cycles/byte, limiting the maximal reachable performance. The overhead in Workload C is due to reading a very high percentile of the KV pairs from the SSTable files, which currently uses an un-optimized code path for en-/decryption and hash calculations. We expect performance improvements from further optimizing this code path.
Effect of varying value sizes. In the second experiment, we investigate the overheads with varying value sizes, since the value size changes the amount of data SPEICHER has to en-/decrypt and hash for each request. We used the default Workload A and changed the value size from 64 B up to 4 KiB. SPEICHER incurs an overhead of 6.7× for the small value size, i.e., 64 B, up to an overhead of 16.9× for values of size 4 KiB. As in the previous experiment, the overhead is mainly dominated by the en-/decryption and hash calculation for the values in the MemTable. The benchmark shows a higher overhead for larger value sizes, since the amount of data SPEICHER has to en-/decrypt increases with the size of the values.
Effect of varying threads. We also investigated the scaling capabilities of SPEICHER. For that, we increased the number of threads up to 8 and compared the overhead to native RocksDB with the default Workload A. Note that the current SGX server machine has 4 physical cores / 8 hyper-threads. In this test, the overhead increased from around 13.6× for two threads to 17.5× for 8 threads. This implies that SPEICHER scales slightly worse than RocksDB. This is due to less optimal caching for random memory accesses in SPEICHER's memory allocator: SPEICHER has to manage two different memory regions (host and EPC) for the MemTable, which leads to sub-optimal caching. We plan to optimize our memory allocator and data structures to exploit cache locality.
Latency measurements. In the benchmarks, SPEICHER has an average latency ranging from 16 µs for a single thread and 64 B value size up to 256 µs for 8 threads and 1024 B value size; native RocksDB had, for the same benchmarks, a latency of 1.6 µs and 14 µs, respectively. However, RocksDB's best latencies were in Workload C, with an average of 1.5 µs.
Figure 7: SPEICHER performance normalized to the native RocksDB (with no security): (a) different workloads with a constant value size of 1024 B and a constant number of threads, (b) varying value sizes, and (c) increasing number of threads. [Plots omitted: relative overhead vs. workload (90/10, 80/20, 100/0), value size (64–4096 bytes), and thread count (1–8).]
KV store     Default time for persistence (ms)   Configurable
RocksDB      0 (flushing)                        yes
LevelDB      0 (non-flushing)                    yes
Cassandra    1000                                yes
HBase        10000                               yes
Table 2: Default time for data persistence in KV stores.
in designing secure systems based on shielded execution, such
5.5 Performance of the Trusted Counter
as VC3 [68], Opaque [82], Ryoan [27], Ohrimenko et al. [51],
The synchronous trusted counter rate of SGX is limited to one SGXBounds [36], etc. However, these systems are primarily
increment at every 60 ms. This would limit our approach to designed to secure stateless computation and data. (Pesos [34]
only 20 Put operations per second since each Put has to be is an exception, see the policy-based storage systems section
appended to the WAL, which requires a counter increment. for the details.) In contrast, we present the first secure persistent
However, our latency suggest that we have a lot more put LSM-based KV storage system based on shielded execution.
operations to deal with. Even in our worse latency case with I/O for shielded execution. To mitigate the I/O overheads in
256 µs per request we would expect 234.4 request per 60 ms, SGX, shielded execution frameworks, such as Eleos [55] and
with a write rate of 10% this would amount to 23.4 required S CONE [8], proposed the usage of an asynchronous system call
counter increases every possible sequential counter increase. interface [70]. While the asynchronous interface is sufficient
In practice S PEICHER should reach far higher update rates as for the low I/O rate applications—it can not sustain the perfor-
this calculation used worst case values from our benchmarks. mance requirements of modern storage/networked systems. To
Table 2 shows the time before different KV stores guarantee mitigate the I/O bottleneck, ShieldBox [73] proposed a direct
that the values are persisted. We argue that these times can be I/O library based on Intel DPDK [2] for building a secure
used to hide the stability time of our asynchronous counters, middlebox framework. Our direct I/O library is motivated
which is a maximum of 60 ms. This is far less than the by this advancement in the networking domain. However, we
maximum time to persist the data in the default configuration propose the first direct I/O library for shielded execution based
of Cassandra and HBase. If the client expects the value is on Intel SPDK [28] for the I/O acceleration in storage systems.
persisted only after a specific period of time, we can relax our Trusted counters. A trusted monotonic counter is one of
freshness guarantees to match to the same time window. the important ingredients to protect against rollback and
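To make this interface concrete, the following is a minimal sketch of an asynchronous counter of the kind described above; the class and method names are assumptions for exposition and do not correspond to SPEICHER's actual API.

#include <atomic>
#include <cstdint>

class AsyncCounter {
    std::atomic<uint64_t> value_{0};    // logical (asynchronous) counter
    std::atomic<uint64_t> stable_{0};   // last value synced to the SGX hardware counter
public:
    // Called for every WAL append: returns immediately, no SGX counter access.
    uint64_t increment() { return ++value_; }

    // Driven at most once per ~60 ms (the SGX counter rate); after it completes,
    // all previously issued increments are rollback-protected ("stable").
    void sync_to_hardware_counter() {
        uint64_t v = value_.load();
        // The synchronous SGX monotonic counter would be incremented here (omitted).
        stable_.store(v);
    }

    // A WAL record is durable against rollback once its counter value is stable.
    bool is_stable(uint64_t v) const { return v <= stable_.load(); }
};

Because the KV stores in Table 2 only promise persistence after a configurable delay, the window of at most 60 ms between increment() and the next hardware sync can be hidden behind that delay.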
equivocation attacks. In this respect, Memoir [57] and
5.6 I/O Amplification
TrInc [40] proposed the usage of TPM-based [74] trusted
We measured the relative I/O amplification increase in data counters. However, TPM-based solutions are quite impractical
for S PEICHER compared to the native RocksDB. We report because of the architectural limitations of TPMs. For instance,
the I/O amplification results using the default workload (A) they are rate-limited (only one increment every 5 seconds) to
with the key size of 16 B and value size of 4 KiB. We observed prevent wear out. Therefore, they are mainly used for secure
an overhead of 30% for read and write in the I/O amplification. data access in the offline settings, e.g., Pasture [33].
This overhead mainly comes from the footer we have to add to Intel SGX has recently added support for monotonic
each SSTable as well as from the hashes and counter values we counters [3]. However, SGX counters are also quite slow,
have to add to the log files. This overhead is not only present and they wear out quickly (§3). To overcome the limitations,
in the write case but also in the read, as the additional data has ROTE [45] proposed a distributed trusted counter service
also to be read to be able to verify the files. based on a consensus protocol. Likewise, Ariadne [71]
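To illustrate where this additional data comes from, a possible footer layout is sketched below; SPEICHER's concrete on-disk format is not reproduced here, so the field names and sizes are assumptions.

#include <cstdint>
#include <vector>

struct BlockHash {
    uint64_t block_offset;                // position of the encrypted block in the file
    uint8_t  hash[32];                    // hash of the encrypted block
};

struct SSTableFooter {
    std::vector<BlockHash> block_hashes;  // one entry per data block (cf. Algorithm 6)
    uint64_t counter;                     // trusted-counter value protecting the file
    uint8_t  footer_hash[32];             // hash over the footer itself
};

// Reads must fetch the footer in addition to the data block so that the block hash
// can be verified; writes must append it, which accounts for the extra I/O.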

Our asynchronous trusted counter interface is complementary to these synchronous counter implementations. In particular, we take advantage of the properties of modern storage systems, where we can use these synchronous counters to support our asynchronous interface.
Policy-based storage systems. Policy-based storage systems allow clients to express fine-grained security policies for data management. In this context, a wide range of storage systems have been proposed to express client capabilities [22], enforce confidentiality and integrity [21], or enable new features that include data sharing [44], database interfaces [46], policy-based storage [19, 77], or policy-based data seal/unseal operations [67]. Amongst all, Pesos [34] is the most relevant system since it targets a similar threat model. In particular, Pesos proposes a policy-based secure storage system based on Intel SGX and Kinetic disks [31]. However, Pesos relies on trusted Kinetic disks to achieve its security properties, whereas SPEICHER targets untrusted storage, such as an untrusted SSD. Secondly, Pesos is designed for slow trusted HDDs, where the additional overheads of the SGX-related operations are eclipsed by slow disk operations. In contrast, SPEICHER is designed for high-performance SSDs.
Secure databases/datastores. Encrypted databases, such as CryptDB [60], Seabed [56], Monomi [76], and DJoin [50], are designed to ensure the confidentiality of computation in untrusted environments. However, they primarily preserve confidentiality. In contrast, SPEICHER preserves all three security properties: confidentiality, integrity, and freshness.
EnclaveDB [61] and CloudProof [59] target a threat model and security properties similar to SPEICHER. In particular, EnclaveDB [61] is a shielded in-memory SQL database. However, it uses the secondary storage only for checkpointing and logging, unlike SPEICHER. Hence, it does not solve the problem of freshness guarantees for the data stored in secondary storage. Furthermore, the system implementation does not consider the architectural limitations of SGX. Secondly, CloudProof [59] is a key-value store designed for untrusted cloud environments. Unlike SPEICHER, it requires the clients to encrypt or decrypt data to ensure confidentiality, as well as to perform attestation procedures with the server, introducing a significant deployment barrier.
TDB [43] proposed a secure database on untrusted storage. It provides confidentiality, integrity, and freshness using a log-structured data store. However, TDB is based on a hypothetical TCB, and it does not address many practical problems addressed in our system design.
Obladi [17] is a KV store supporting transactions while hiding the access patterns. While it can effectively hide the values and their access patterns from the cloud provider, it needs a trusted proxy. In contrast, SPEICHER does not rely on a trusted proxy. Furthermore, Obladi does not consider rollback attacks.
Lastly, in parallel with our work, ShieldStore [30] uses a Merkle tree to build a secure in-memory KV store using Intel SGX. Since ShieldStore is an in-memory KV store, it does not persist the data using the LSM data structure, unlike SPEICHER.
Authenticated data structures. Authenticated data structures (ADS) [47] enable efficient verification of the integrity of operations carried out by an untrusted entity. The most relevant ADS for our work is mLSM [63], a recent proposal to provide integrity guarantees for LSM. In contrast to mLSM, our system provides stronger security properties, i.e., we ensure not only integrity, but also confidentiality and freshness. Furthermore, our system targets a stronger threat model, where we have to design a secure storage system leveraging Intel SGX.
Robust storage systems. Robust storage systems provide strong safety and liveness guarantees in the untrusted cloud environment [14, 42, 79]. In particular, Depot [42] protects data from faulty infrastructure in terms of durability, consistency, availability, and integrity. Likewise, Salus [79] proposed a robust block store that ensures data integrity in the presence of commission failures. A2M [14] is also a robust system against Byzantine faults, and provides a consistent, attested memory abstraction to thwart equivocation. In contrast to SPEICHER, this line of work provides neither confidentiality nor freshness guarantees.
Secure file systems. There is a large body of work on software-based secure storage systems. SUNDR [41], Plutus [29], jVPFS [80], SiRiUS [23], SNAD [48], Maat [38], and PCFS [21] employ cryptography to provide secure storage in untrusted environments. None of them protect the system from rollback attacks, and our challenges in overcoming the overheads of shielded execution are irrelevant for them. Among all, StrongBox [18] provides file system encryption with rollback protection; however, it does not consider untrusted hosts.

7 Conclusion
In this paper, we presented SPEICHER, a secure persistent LSM-based KV storage system for untrusted hosts. SPEICHER targets all three important security properties: strong confidentiality and integrity guarantees, and also protection against rollback attacks to ensure data freshness. We base the design of SPEICHER on hardware-assisted shielded execution leveraging Intel SGX. However, the design of SPEICHER extends the trust in shielded execution beyond the secure enclave memory region to ensure that the security properties are also preserved in the stateful setting of an untrusted storage medium.
To achieve these security properties while overcoming the architectural limitations of Intel SGX, we have designed a direct I/O library for shielded execution, a trusted monotonic counter, a secure LSM data structure, and associated algorithms for storage operations. We implemented a fully-functional prototype of SPEICHER based on RocksDB, and evaluated the system using the RocksDB benchmark. Our experimental evaluation shows that SPEICHER achieves reasonable performance overheads while providing strong security guarantees.
Acknowledgement. We thank our shepherd Umesh Maheshwari for the helpful comments.
8 Appendix
In this appendix, we present the pseudocode for all data storage and query operations in SPEICHER.

Algorithm 1: Put algorithm of SPEICHER
Input: KV-pair which should be inserted into the store.
Result: Freshness of the MemTable
/* Generate a block with the trusted counter */
hash_block ← hash(KV, counter_WAL + 1);
block ← encrypt(KV, counter_WAL + 1, hash_block);
/* Write the block to the persistent storage before the trusted counter gets incremented */
write_WAL(block);
counter_WAL ← counter_WAL + 1;
/* Generate a hash over the KV-pair for the MemTable */
hash_KV ← hash(KV);
/* Try to insert into the MemTable; if the MemTable is corrupted, return a failure */
freshness ← putIntoMemtable(KV, hash_KV);
return freshness

Algorithm 2: Get algorithm of SPEICHER
Input: Key in the format of the KV-store
Result: Freshness of the KV-pair and Value
for level = 0 to number_of_levels do     /* Check in each level whether the key-value exists, from highest to lowest */
    if level = Level_0 then              /* First level, therefore lookup in the MemTable */
        path, value ← lookupMemtable(key);    /* The value may be empty; however, we still have to do a proof of non-existence */
        foreach node ∈ path do            /* Validate the hash values along the trace to the leaf node */
            if hash(node.left, node.right) ≠ node.hash then
                /* The hash of the child nodes does not equal the stored hash value:
                   the integrity and freshness proof failed */
                return stale_MemTable, value
            end
        end
        return fresh, value
    else                                  /* Lookup in a level backed by SSTable files */
        SST ← findSSTFile(level, key);    /* Lookup over authentication structures similar to the MemTable */
        block, value ← lookup(SSTs_level, key);
        if hash(block) ≠ SST.hashBlock(block) or !freshness(SST) then
            return stale_SST, value
        end
        return fresh, value
    end
end

Algorithm 3: Range query algorithm of SPEICHER
Input: KV-pair with the lowest key and a callback method to the client
/* Build an iterator pointing to the first KV-pair */
iterator ← constructIterator(key_min);
next ← True;
/* Call the provided function until the iterator is no longer valid, a freshness proof failed, or the client requests to end */
while isValid(iterator) and state = fresh and next do
    state, value ← iterator.key_value;
    next ← callback(state, value);
    iterator ← iterator.next;
end

Algorithm 4: Iterator functions of SPEICHER
Input: Start key
Result: Result of freshness proof or iterator
Function constructIterator(key_min)
    /* Build an iterator for each level of the LSM pointing to the KV-pair or the next pair in the level */
    foreach level ∈ Levels do
        iterator_level ← lowerBound(level, key);
        if iterator_level.state ≠ fresh then
            return state
        end
        iterator.add(iterator_level);
    end
end

Input: iterator
Result: Iterator points to the next KV-pair and freshness of the iterator
Function next(iterator)
    /* Forward all iterators pointing to the current key */
    foreach iterator_level ∈ iterator where iterator_level.key = iterator.key do
        next(iterator_level);
        if iterator_level.state ≠ fresh then
            return iterator_level.state
        end
    end
    /* Find the level iterator pointing to the lowest key */
    for i = 0 to number_levels do
        iter ← iterator[i];
        if iter.state ≠ fresh then
            return iter.state
        end
        if key_lowest > iter.key then
            key_lowest ← iter.key;
            level ← i
        end
    end
    iterator.currentLevel(i);
    return fresh
end
Algorithm 5: Restore algorithm of SPEICHER
Input: Manifest file
Result: Restored KV-store
/* Get the counter value of the first record in the Manifest and check that the first record is an initial record */
counter ← Manifest.firstCounterValue;
/* Iterate over all records in the Manifest */
foreach record_encrypted ∈ Manifest do
    record ← decrypt;
    hash ← hash(record);
    /* Check the record's hash and counter value; if they do not match, report an error to the client */
    if hash ≠ record.hash then
        return Hash does not match
    end
    if counter ≠ record.counter then
        return Counter does not match
    end
    /* If hash and counter match, apply the change to the KV-store */
    apply(record);
    inc(counter);
end
/* Check if the last counter in the Manifest matches the trusted counter; if not, report an error to the client */
if counter ≠ trusted_counter_Manifest then
    return Counter does not match
end
/* Get the current WAL and its initial counter value from the Manifest */
counter ← Manifest.firstWALCounter;
/* Apply each record of the WAL to the KV-store if the counter and hash are correct, similar to the Manifest */
foreach record_encrypted ∈ WAL do
    record ← decrypt;
    hash ← hash(record);
    if hash ≠ record.hash then
        return Hash does not match
    end
    if counter ≠ record.counter then
        return Counter does not match
    end
    apply(record);
    inc(counter);
end
/* Check if the last counter value is the same as the trusted counter */
if counter ≠ trusted_counter_WAL then
    return Counter does not match
end
/* The KV-store was successfully restored and no integrity or rollback problems were found */
return Success

Algorithm 6: Compaction algorithm of SPEICHER
Input: SSTable file to be compacted, one from level n
Result: Multiple SSTable files for level n+1
// Create an iterator over the higher-level SSTable file, a new file, and a new data block
iterator_n ← createIterator(SSTable_n);
NewSSTable ← createNewSST();
block ← createNewBlock();
last_key ← iterator_n.key − 1;
// As long as there are KV-pairs remaining in the SSTable, open the SSTable file in the next level whose range contains the smallest possible next key, based on the last key compacted.
while has_next(iterator_n) do
    SSTable_n+1 ← findSSTFile(n+1, last_key + 1);
    iterator_n+1 ← createIterator(SSTable_n+1);
    // As long as the currently open SST_n+1 file has KV-pairs, find the smaller next key of the SST_n and SST_n+1 files. If both have the same next key, choose from the SST_n file.
    while has_next(iterator_n+1) do
        iterator_min ← min(iterator_n, iterator_n+1);
        // Test if the key value is still fresh, i.e., check the hash of the block against the hash footer of the SSTable file and check it against the Manifest
        if iterator_min ≠ fresh then
            // If the key value is not fresh, return an error to the client
            return iterator_min.state
        end
        // Add the key to the block; if the block then exceeds the block size limit, calculate a hash, add it to the footer of the new file, write the block to persistent storage, and create a new block
        block.add(iterator_min.kv);
        if size(block) > block_size_limit then
            hash ← hash(block);
            encrypted_block ← encrypt(block);
            NewSSTable.write(encrypted_block);
            NewSSTable.addHash(hash);
            // If the file reaches the size limit after an append, write the footer to the storage and create a new SSTable
            if size(NewSSTable) > SSTable_size_limit then
                NewSSTable.writeFooter();
                NewSSTable ← createNewSST();
            end
            block ← createNewBlock();
        end
        last_key ← iterator_min.key;
        next(iterator_min);
    end
end
// After compaction, flush the last block and write the footer.
hash ← hash(block);
encrypted_block ← encrypt(block);
NewSSTable.write(encrypted_block);
NewSSTable.addHash(hash);
NewSSTable.writeFooter();
// Write the changes to the Manifest file.
Manifest.remove(SST_n, SST_n+1 in range of SST_n);
Manifest.add(∀ NewSST file);
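For readers who prefer code over pseudocode, the Put path of Algorithm 1 can be rendered roughly as follows. This is a self-contained sketch with placeholder "crypto" (std::hash, no real encryption) that only illustrates the ordering of the steps; it is not SPEICHER's implementation.

#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct Record { std::string payload; uint64_t counter; size_t hash; };

static std::vector<Record> wal;                    // stand-in for the write-ahead log
static std::map<std::string, std::pair<std::string, size_t>> memtable;
static uint64_t counter_wal = 0;                   // asynchronous trusted counter (WAL)

bool put(const std::string &key, const std::string &value) {
    uint64_t next = counter_wal + 1;
    // 1. Bind the KV-pair to the next counter value.
    size_t record_hash = std::hash<std::string>{}(key + value + std::to_string(next));
    // 2. "Encrypt" and append to the WAL before the counter is incremented.
    wal.push_back({key + value /* placeholder for ciphertext */, next, record_hash});
    counter_wal = next;
    // 3. Hash the KV-pair and insert it into the MemTable (in the real system the
    //    key and hash live in the EPC, while the encrypted value lives in host memory).
    size_t kv_hash = std::hash<std::string>{}(key + value);
    memtable[key] = {value, kv_hash};
    return true;                                   // freshness of the MemTable
}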
References
[1] Botan Library. https://botan.randombit.net/. Last accessed: Jan, 2019.
[2] Intel DPDK. http://dpdk.org/. Last accessed: Jan, 2019.
[3] Intel, "SGX documentation: sgx create monotonic counter". https://software.intel.com/en-us/sgx-sdk-dev-reference-sgx-create-monotonic-counter/. Last accessed: Jan, 2019.
[4] Intel Software Guard Extensions (Intel SGX). https://software.intel.com/en-us/sgx. Last accessed: Jan, 2019.
[5] RocksDB Benchmarking Tool. https://github.com/facebook/rocksdb/wiki/Benchmarking-tools. Last accessed: Jan, 2019.
[6] I. Anati, S. Gueron, P. S. Johnson, and R. V. Scarlata. Innovative technology for CPU based attestation and sealing. In Proceedings of the 2nd International Workshop on Hardware and Architectural Support for Security and Privacy (HASP), 2013.
[7] ARM. Building a secure system using TrustZone technology. http://infocenter.arm.com/help/topic/com.arm.doc.prd29-genc-009492c/PRD29-GENC-009492C_trustzone_security_whitepaper.pdf, 2009. Last accessed: Jan, 2019.
[8] S. Arnautov, B. Trach, F. Gregor, T. Knauth, A. Martin, C. Priebe, J. Lind, D. Muthukumaran, D. O'Keeffe, M. L. Stillwell, D. Goltzsche, D. Eyers, R. Kapitza, P. Pietzuch, and C. Fetzer. SCONE: Secure Linux Containers with Intel SGX. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2016.
[9] L. N. Bairavasundaram, G. R. Goodson, B. Schroeder, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau. An Analysis of Data Corruption in the Storage Stack. In Proceedings of the 6th USENIX Conference on File and Storage Technologies (FAST), 2008.
[10] A. Baumann, M. Peinado, and G. Hunt. Shielding Applications from an Untrusted Cloud with Haven. In Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2014.
[11] P. Bhatotia, R. Rodrigues, and A. Verma. Shredder: GPU-Accelerated Incremental Storage and Computation. In Proceedings of the USENIX Conference on File and Storage Technologies (FAST), 2012.
[12] P. Bhatotia, A. Wieder, R. Rodrigues, F. Junqueira, and B. Reed. Reliable data-center scale computations. In Proceedings of the 4th International Workshop on Large Scale Distributed Systems and Middleware (LADIS), 2010.
[13] S. Checkoway and H. Shacham. Iago Attacks: Why the System Call API is a Bad Untrusted RPC Interface. In Proceedings of the 18th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2013.
[14] B.-G. Chun, P. Maniatis, S. Shenker, and J. Kubiatowicz. Attested append-only memory: Making adversaries stick to their word. In Proceedings of the Twenty-first ACM SIGOPS Symposium on Operating Systems Principles (SOSP), 2007.
[15] V. Costan and S. Devadas. Intel SGX Explained, 2016.
[16] CRN. The ten biggest cloud outages of 2013. https://www.crn.com/slide-shows/cloud/240165024/the-10-biggest-cloud-outages-of-2013.htm, 2013. Last accessed: Jan, 2019.
[17] N. Crooks, M. Burke, E. Cecchetti, S. Harel, R. Agarwal, and L. Alvisi. Obladi: Oblivious serializable transactions in the cloud. In Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2018.
[18] B. Dickens III, H. S. Gunawi, A. J. Feldman, and H. Hoffmann. StrongBox: Confidentiality, integrity, and performance using stream ciphers for full drive encryption. In Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2018.
[19] E. Elnikety, A. Mehta, A. Vahldiek-Oberwagner, D. Garg, and P. Druschel. Thoth: Comprehensive Policy Compliance in Data Retrieval Systems. In Proceedings of the 25th USENIX Security Symposium (USENIX Security), 2016.
[20] D. Ford, F. Labelle, F. I. Popovici, M. Stokely, V.-A. Truong, L. Barroso, C. Grimes, and S. Quinlan. Availability in Globally Distributed Storage Systems. In Proceedings of the 9th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2010.
[21] D. Garg and F. Pfenning. A proof-carrying file system. In Proceedings of the 31st IEEE Symposium on Security and Privacy (Oakland), 2010.
[22] G. A. Gibson, D. F. Nagle, K. Amiri, J. Butler, F. W. Chang, H. Gobioff, C. Hardin, E. Riedel, D. Rochberg, and J. Zelenka. A cost-effective, high-bandwidth storage architecture. In Proceedings of the ACM Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 1998.
[23] E.-J. Goh, H. Shacham, N. Modadugu, and D. Boneh. SiRiUS: Securing remote untrusted storage. In Proceedings of the Network and Distributed System Security Symposium (NDSS), 2003.
[24] H. S. Gunawi, M. Hao, T. Leesatapornwongsa, T. Patana-anake, T. Do, J. Adityatama, K. J. Eliazar, A. Laksono, J. F. Lukman, V. Martin, and A. D. Satria. What Bugs Live in the Cloud? A Study of 3000+ Issues in Cloud Systems. In Proceedings of the ACM Symposium on Cloud Computing (SoCC), 2014.
[25] M. Hähnel, W. Cui, and M. Peinado. High-resolution side channels for untrusted operating systems. In Proceedings of the USENIX Annual Technical Conference (ATC), 2017.
[26] M. Honda, G. Lettieri, L. Eggert, and D. Santry. PASTE: A network programming interface for non-volatile main memory. In Proceedings of the 15th USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2018.
[27] T. Hunt, Z. Zhu, Y. Xu, S. Peter, and E. Witchel. Ryoan: A Distributed Sandbox for Untrusted Computation on Secret Data. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2016.
[28] Intel Storage Performance Development Kit. http://www.spdk.io. Last accessed: Jan, 2019.
[29] M. Kallahalla, E. Riedel, R. Swaminathan, Q. Wang, and K. Fu. Plutus: Scalable secure file sharing on untrusted storage. In Proceedings of the 2nd USENIX Conference on File and Storage Technologies (FAST), 2003.
[30] T. Kim, J. Park, J. Woo, S. Jeon, and J. Huh. ShieldStore: Shielded In-memory Key-value Storage with SGX. In Proceedings of the 9th ACM European Conference on Computer Systems (EuroSys), 2019.
[31] Kinetic Data Center Comparison. https://www.openkinetic.org/technology/data-center-comparison. Last accessed: Jan, 2019.
[32] T. Knauth, M. Steiner, S. Chakrabarti, L. Lei, C. Xing, and M. Vij. Integrating Remote Attestation with Transport Layer Security. 2018.
[33] R. Kotla, T. Rodeheffer, I. Roy, P. Stuedi, and B. Wester. Pasture: Secure offline data access using commodity trusted hardware. In Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2012.
[34] R. Krahn, B. Trach, A. Vahldiek-Oberwagner, T. Knauth, P. Bhatotia, and C. Fetzer. Pesos: Policy enhanced secure object store. In Proceedings of the Thirteenth EuroSys Conference (EuroSys), 2018.
[35] D. Kuvaiskii, R. Faqeh, P. Bhatotia, P. Felber, and C. Fetzer. HAFT: Hardware-assisted fault tolerance. In Proceedings of the Eleventh European Conference on Computer Systems (EuroSys), 2016.
[36] D. Kuvaiskii, O. Oleksenko, S. Arnautov, B. Trach, P. Bhatotia, P. Felber, and C. Fetzer. SGXBounds: Memory Safety for Shielded Execution. In Proceedings of the 12th ACM European Conference on Computer Systems (EuroSys), 2017.
[37] D. Kuvaiskii, O. Oleksenko, P. Bhatotia, P. Felber, and C. Fetzer. Elzar: Triple modular redundancy using Intel AVX. In Proceedings of the IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 2016.
[38] A. W. Leung, E. L. Miller, and S. Jones. Scalable security for petascale parallel file systems. In Proceedings of the ACM/IEEE Conference on Supercomputing (SC), 2007.
[39] LevelDB. http://leveldb.org/. Last accessed: Jan, 2019.
[40] D. Levin, J. R. Douceur, J. R. Lorch, and T. Moscibroda. TrInc: Small trusted hardware for large distributed systems. In Proceedings of the 6th USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2009.
[41] J. Li, M. Krohn, D. Mazières, and D. Shasha. Secure untrusted data repository (SUNDR). In Proceedings of the 6th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2004.
[42] P. Mahajan, S. Setty, S. Lee, A. Clement, L. Alvisi, M. Dahlin, and M. Walfish. Depot: Cloud Storage with Minimal Trust. 2011.
[43] U. Maheshwari, R. Vingralek, and W. Shapiro. How to build a trusted database system on untrusted storage. In Proceedings of the 4th Conference on Symposium on Operating System Design & Implementation (OSDI), 2000.
[44] K. Mast, L. Chen, and E. Gün Sirer. Enabling Strong Database Integrity using Trusted Execution Environments. 2018.
[45] S. Matetic, M. Ahmed, K. Kostiainen, A. Dhar, D. Sommer, A. Gervais, A. Juels, and S. Capkun. ROTE: Rollback protection for trusted execution. In Proceedings of the 26th USENIX Security Symposium (USENIX Security), 2017.
[46] A. Mehta, E. Elnikety, K. Harvey, D. Garg, and P. Druschel. Qapla: Policy compliance for database-backed systems. In Proceedings of the 26th USENIX Security Symposium (USENIX Security), 2017.
[47] A. Miller, M. Hicks, J. Katz, and E. Shi. Authenticated data structures, generically. In Proceedings of the 41st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), 2014.
[48] E. L. Miller, D. D. Long, W. E. Freeman, and B. Reed. Strong Security for Network-Attached Storage. In Proceedings of the 1st USENIX Conference on File and Storage Technologies (FAST), 2002.
[49] C. Min, S. Kashyap, B. Lee, C. Song, and T. Kim. Cross-checking Semantic Correctness: The Case of Finding File System Bugs. In Proceedings of the 25th ACM Symposium on Operating Systems Principles (SOSP), 2015.
[50] A. Narayan and A. Haeberlen. DJoin: Differentially private join queries over distributed databases. In Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2012.
[51] O. Ohrimenko, F. Schuster, C. Fournet, A. Mehta, S. Nowozin, K. Vaswani, and M. Costa. Oblivious multi-party machine learning on trusted processors. In Proceedings of the 25th USENIX Security Symposium (USENIX Security), 2016.
[52] O. Oleksenko, D. Kuvaiskii, P. Bhatotia, and C. Fetzer. Fex: A software systems evaluator. In Proceedings of the 47th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 2017.
[53] O. Oleksenko, B. Trach, R. Krahn, M. Silberstein, and C. Fetzer. Varys: Protecting SGX enclaves from practical side-channel attacks. In Proceedings of the 2018 USENIX Annual Technical Conference (USENIX ATC), 2018.
[54] P. O'Neil, E. Cheng, D. Gawlick, and E. O'Neil. The log-structured merge-tree (LSM-tree). Acta Informatica, 1996.
[55] M. Orenbach, M. Minkin, P. Lifshits, and M. Silberstein. Eleos: ExitLess OS services for SGX enclaves. In Proceedings of the 12th ACM European Conference on Computer Systems (EuroSys), 2017.
[56] A. Papadimitriou, R. Bhagwan, N. Chandran, R. Ramjee, A. Haeberlen, H. Singh, A. Modi, and S. Badrinarayanan. Big data analytics over encrypted datasets with Seabed. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2016.
[57] B. Parno, J. R. Lorch, J. R. Douceur, J. Mickens, and J. M. McCune. Memoir: Practical state continuity for protected modules. In Proceedings of the 32nd IEEE Symposium on Security and Privacy (Oakland), 2011.
[58] T. S. Pillai, V. Chidambaram, R. Alagappan, S. Al-Kiswany, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau. Crash consistency. ACM Queue, 2015.
[59] R. A. Popa, J. R. Lorch, D. Molnar, H. J. Wang, and L. Zhuang. Enabling security in cloud storage SLAs with CloudProof. In Proceedings of the 2011 USENIX Annual Technical Conference (USENIX ATC), 2011.
[60] R. A. Popa, C. Redfield, N. Zeldovich, and H. Balakrishnan. CryptDB: Protecting confidentiality with encrypted query processing. In Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles (SOSP), 2011.
[61] C. Priebe, K. Vaswani, and M. Costa. EnclaveDB: A Secure Database using SGX. In Proceedings of the IEEE Symposium on Security and Privacy (Oakland), 2018.
[62] W. Pugh. Skip lists: A probabilistic alternative to balanced trees. Communications of the ACM (CACM), 1990.
[63] P. Raju, S. Ponnapalli, E. Kaminsky, G. Oved, Z. Keener, V. Chidambaram, and I. Abraham. mLSM: Making authenticated storage faster in Ethereum. In Proceedings of the 10th USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage), 2018.
[64] T. Ridge, D. Sheets, T. Tuerk, A. Giugliano, A. Madhavapeddy, and P. Sewell. SibylFS: Formal Specification and Oracle-based Testing for POSIX and Real-world File Systems. In Proceedings of the 25th ACM Symposium on Operating Systems Principles (SOSP), 2015.
[65] RocksDB | A persistent key-value store. https://rocksdb.org/. Last accessed: Jan, 2019.
[66] N. Santos, K. P. Gummadi, and R. Rodrigues. Towards Trusted Cloud Computing. In Proceedings of the 1st USENIX Workshop on Hot Topics in Cloud Computing (HotCloud), 2009.
[67] N. Santos, R. Rodrigues, K. P. Gummadi, and S. Saroiu. Policy-sealed data: A new abstraction for building trusted cloud services. In Proceedings of the 21st USENIX Security Symposium (USENIX Security), 2012.
[68] F. Schuster, M. Costa, C. Gkantsidis, M. Peinado, G. Mainar-Ruiz, and M. Russinovich. VC3: Trustworthy Data Analytics in the Cloud using SGX. In Proceedings of the 36th IEEE Symposium on Security and Privacy (Oakland), 2015.
[69] S. Shinde, D. Le Tien, S. Tople, and P. Saxena. PANOPLY: Low-TCB Linux Applications with SGX Enclaves. In Proceedings of the Network and Distributed System Security Symposium (NDSS), 2017.
[70] L. Soares and M. Stumm. FlexSC: Flexible System Call Scheduling with Exception-less System Calls. In Proceedings of the 9th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2010.
[71] R. Strackx and F. Piessens. Ariadne: A minimal approach to state continuity. In Proceedings of the 25th USENIX Security Symposium (USENIX Security), 2016.
[72] E. Thereska, H. Ballani, G. O'Shea, T. Karagiannis, A. Rowstron, T. Talpey, R. Black, and T. Zhu. IOFlow: A Software-defined Storage Architecture. In Proceedings of the 24th ACM Symposium on Operating Systems Principles (SOSP), 2013.
[73] B. Trach, A. Krohmer, F. Gregor, S. Arnautov, P. Bhatotia, and C. Fetzer. ShieldBox: Secure Middleboxes using Shielded Execution. In Proceedings of the ACM SIGCOMM Symposium on SDN Research (SOSR), 2018.
[74] Trusted Computing Group. TPM Main Specification. https://trustedcomputinggroup.org/tpm-main-specification, 2011. Last accessed: Jan, 2019.
[75] C.-C. Tsai, D. E. Porter, and M. Vij. Graphene-SGX: A practical library OS for unmodified applications on SGX. In Proceedings of the USENIX Annual Technical Conference (USENIX ATC), 2017.
[76] S. Tu, M. F. Kaashoek, S. Madden, and N. Zeldovich. Processing analytical queries over encrypted data. In Proceedings of the 39th International Conference on Very Large Data Bases (VLDB), 2013.
[77] A. Vahldiek-Oberwagner, E. Elnikety, A. Mehta, D. Garg, P. Druschel, R. Rodrigues, J. Gehrke, and A. Post. Guardat: Enforcing data policies at the storage layer. In Proceedings of the 10th ACM European Conference on Computer Systems (EuroSys), 2015.
[78] J. Van Bulck, M. Minkin, O. Weisse, D. Genkin, B. Kasikci, F. Piessens, M. Silberstein, T. F. Wenisch, Y. Yarom, and R. Strackx. Foreshadow: Extracting the keys to the Intel SGX kingdom with transient out-of-order execution. In Proceedings of the 27th USENIX Security Symposium (USENIX Security), 2018.
[79] Y. Wang, M. Kapritsos, Z. Ren, P. Mahajan, J. Kirubanandam, L. Alvisi, and M. Dahlin. Robustness in the Salus scalable block store. In Proceedings of the 10th USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2013.
[80] C. Weinhold and H. Härtig. jVPFS: Adding Robustness to a Secure Stacked File System with Untrusted Local Storage Components. In Proceedings of the USENIX Annual Technical Conference (ATC), 2011.
[81] Y. Xu, W. Cui, and M. Peinado. Controlled-channel attacks: Deterministic side channels for untrusted operating systems. In Proceedings of the 36th IEEE Symposium on Security and Privacy (Oakland), 2015.
[82] W. Zheng, A. Dave, J. G. Beekman, R. A. Popa, J. E. Gonzalez, and I. Stoica. Opaque: An Oblivious and Encrypted Distributed Analytics Platform. In Proceedings of the 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2017.