Co-Designing Distributed Systems with Programmable Network Hardware
Authors
Li, Jialin
Abstract
The unprecedented scale and demand of today’s datacenter applications present tremendous challenges to the design of distributed systems. These systems need to handle immense and unpredictable user traffic, remain highly available despite failures, keep data strongly consistent, and meet stringent service-level agreements (SLAs). Existing approaches, however, fall short of these requirements. They require extensive server coordination to guarantee data consistency, which imposes severe performance penalties, and they suffer from load imbalance under highly skewed workloads. This thesis proposes a new approach to designing distributed systems: co-designing them with the datacenter network. Specifically, we take advantage of new-generation programmable switches in datacenters to build several novel network-level primitives that offer strong guarantees. We then leverage these primitives to enable more efficient protocol and system designs. Our key contribution is the design, implementation, and evaluation of three systems that demonstrate the benefit of this approach. The first two, Network-Ordered Paxos and Eris, virtually eliminate the coordination overhead in state machine replication and fault-tolerant distributed transactions, respectively, by relying on network sequencing primitives to consistently order user requests. The third, Pegasus, substantially improves the load balance of a distributed storage system. To achieve this, Pegasus selectively replicates the most popular objects, then tracks and manages the locations of the replicated objects using an in-network coherence directory implemented in the switch data plane.
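To make the idea of a network sequencing primitive more concrete, here is a small simulation. It is a minimal sketch under stated assumptions and not the implementation described in the thesis. It assumes a single in-switch sequencer that stamps every multicast message with a (session, sequence number) pair. It also assumes receivers that deliver messages in sequence order and report any gap as a dropped message instead of waiting for a retransmission. The class and method names (`Sequencer`, `Receiver`, `stamp`, `fail_over`, `receive`) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class Sequencer:
    """Hypothetical model of an in-switch sequencer for one multicast group."""
    session: int = 0
    next_seq: int = 1

    def stamp(self, payload: Any) -> tuple[int, int, Any]:
        # Attach a consecutive sequence number so every receiver sees one order.
        msg = (self.session, self.next_seq, payload)
        self.next_seq += 1
        return msg

    def fail_over(self) -> None:
        # Assumption: a replacement sequencer starts a new session and
        # restarts numbering, so receivers can tell the two apart.
        self.session += 1
        self.next_seq = 1


@dataclass
class Receiver:
    """Delivers stamped messages in order and reports gaps as drops."""
    session: int = 0
    expected: int = 1

    def receive(self, msg: tuple[int, int, Any]) -> list[tuple[str, Any]]:
        session, seq, payload = msg
        # Messages from an old session, or ones older than what was already
        # delivered or reported dropped, are ignored.
        if session < self.session or (session == self.session and seq < self.expected):
            return []
        events: list[tuple[str, Any]] = []
        if session > self.session:
            # The sequencer changed: tell the application and reset numbering.
            events.append(("session_change", session))
            self.session, self.expected = session, 1
        # Any skipped sequence numbers are reported as lost in the network.
        for missing in range(self.expected, seq):
            events.append(("drop", missing))
        events.append(("deliver", payload))
        self.expected = seq + 1
        return events
```

In this sketch, if a receiver gets messages 1 and 3 but not 2, it delivers 1, reports 2 as dropped, and then delivers 3. A late copy of message 2 is ignored. Because the sequencer fixes the order, replicas do not have to coordinate to agree on it. Any replication protocol built on top would still need to resolve each reported drop, and this sketch omits that step.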
Description
Thesis (Ph.D.)--University of Washington, 2019
