Co-Designing Distributed Systems with Programmable Network Hardware

Authors

Li, Jialin

Abstract

The unprecedented scale and demand of today’s datacenter applications present tremendous challenges to the design of distributed systems. These systems must handle immense and unpredictable user traffic, remain highly available despite failures, keep data strongly consistent, and meet stringent service-level agreements (SLAs). Existing approaches, however, fall short of these requirements: they require extensive server coordination to guarantee data consistency, which leads to severe performance penalties, and they suffer from load imbalance under highly skewed workloads. This thesis proposes a new approach: co-designing distributed systems with the datacenter network. Specifically, we take advantage of new-generation programmable switches in datacenters to build several novel network-level primitives that offer strong guarantees, and we then leverage these primitives to enable more efficient protocol and system designs. Our key contribution is the design, implementation, and evaluation of three systems that demonstrate the benefit of this approach. The first two, Network-Ordered Paxos and Eris, virtually eliminate the coordination overhead in state machine replication and fault-tolerant distributed transactions, respectively, by relying on network sequencing primitives to consistently order user requests. The third, Pegasus, substantially improves the load balance of a distributed storage system. To achieve this, Pegasus selectively replicates the most popular objects and tracks and manages the locations of the replicas using an in-network coherence directory implemented in the switch dataplane.

Description

Thesis (Ph.D.)--University of Washington, 2019