LTCI, Télécom Paris, Institut Polytechnique de Paris, France (joaopba01@gmail.com)
Nomadic Labs, Paris, France (lucianofdz@gmail.com)
LTCI, Télécom Paris, Institut Polytechnique de Paris, France (petr.kuznetsov@telecom-paris.fr)
LTCI, Télécom Paris, Institut Polytechnique de Paris, France (matthieu.rambaud@telecom-paris.fr)

Copyright: J. P. Bezerra, L. Freitas, P. Kuznetsov, and M. Rambaud

CCS concepts: Theory of computation → Design and analysis of algorithms → Distributed algorithms

Asynchronous Latency and Fast Atomic Snapshot

João Paulo Bezerra Luciano Freitas Petr Kuznetsov Matthieu Rambaud

Abstract

This paper introduces a novel, fast atomic-snapshot protocol for asynchronous message-passing systems. In the process of defining what “fast” means exactly, we spot a few interesting issues that arise when conventional time metrics are applied to long-lived asynchronous algorithms. We reveal some gaps in latency claims made in earlier work on snapshot algorithms, which hamper their comparative time-complexity analysis. We then come up with a new unifying time-complexity metric that captures the latency of an operation in an asynchronous, long-lived implementation. This allows us to formally grasp latency improvements of our atomic-snapshot algorithm with respect to the state-of-the-art protocols: optimal latency in fault-free runs without contention, short constant latency in fault-free runs with contention, the worst-case latency proportional to the number of active concurrent failures, and constant, amortized latency.

Keywords: Asynchronous systems, time complexity, atomic snapshot, crash faults

1 Introduction

The distributed snapshot abstraction [13, 25] allows us to determine a consistent view of the global system state. Originally proposed in the asynchronous fault-free message-passing context, it was later cast to shared-memory models [3] as a vector of shared variables, exporting an update operation that writes to one of them and a snapshot operation that returns the current vector state. Atomic snapshot can be implemented from conventional read-write registers in a wait-free manner, i.e., tolerating unpredictable delays or failures of any number of processes. By applying the reduction from shared memory to message-passing [6], one can get an asynchronous distributed atomic-snapshot implementation that tolerates up to a minority of faulty processes. The atomic-snapshot object (ASO) is, in a strong sense, equivalent to lattice agreement (LA) [8, 16]: one can implement the other with no time overhead. (Lattice agreement can be seen as a weak version of consensus, where decided values form totally ordered joins of proposed values in a join semi-lattice.) A long line of results improves the time and space complexities of ASO and LA algorithms in shared-memory [5, 4, 7] and message-passing [16, 20, 18, 15, 17] models.

In this paper, we focus on the latency of operations in message-passing ASO implementations. We propose an LA (and, thus, ASO) algorithm that is faster than (or matches) state-of-the-art solutions in all execution scenarios: with or without failures and with or without contention. The comparative analysis of our algorithm with respect to the existing work appeared to be challenging: as we show below, earlier work considered diverging metrics and execution scenarios, and sometimes used over-simplified reasoning. We observed that conventional metrics [12, 6, 2] are not always suitable for long-lived asynchronous algorithms. Besides, prior latency analyses of ASO and LA algorithms [16, 20, 18, 15, 17] used different ways to measure time, which complicated the comparison. We therefore propose a unifying time-complexity analysis of prior asynchronous ASO and LA algorithms with respect to a new metric, which we take as a contribution on its own.

| Algorithm | Fault-free, w/o contention | Fault-free, w/ contention | Worst-case | Amortized constant |
|---|---|---|---|---|
| Faleiro et al. [16] | 2 | 16 | $O(k)$ | yes |
| Imbs et al. [20] | 2 | $O(n)$ | $O(n)$ | |
| Garg et al. [17] | $\geq 6$ | $\geq 8$ | $O(k)$ | yes |
| Garg et al. [17] + Zheng et al. [26] | $O(\log n)$ | $O(\log n)$ | $O(\log n)$ | |
| Delporte et al. [15] | 2 | $O(n)$ | $O(n)$ | |
| This paper | 2 | 8 | $O(k)$ | yes |

Table 1: Comparative time complexity of atomic-snapshot algorithms in asynchronous message-passing models. The table shows results for Single-Writer Multi-Reader (SWMR) implementations.

Lamport [24] proposed to measure time in asynchronous systems as the length of the longest chain of causally related messages, the metric used to determine the best-case latency of consensus [24] and Crusader Agreement [1]. However, as we show in this paper, the metric may produce counter-intuitive results for protocols involving all-to-all communication. For instance, in the failure-free case, the $n$-process reliable broadcast [11] exhibits a causal chain of $n$ hops, even though, intuitively, one expects it to terminate in one.

Building upon the classical approach by Canetti and Rabin [12], Abraham et al. [2] recently proposed an elegant metric to grasp the good-case latency of broadcast protocols. We observe, however, that the metric does not really apply to executions of long-lived abstractions, which may contain holes – periods of inactivity when no protocol messages are in transit. Moreover, we get diverging results when applying [2] and [12] to operation latency, i.e., the time between invocation and response events of a given operation.

We therefore extend the round-based approach to long-lived abstractions (such as ASO and LA) and establish a framework to measure the time between arbitrary events, subsequently showing that the results align with those from earlier classical metrics [6, 12].

To summarize, our main contribution is a novel LA (and, thus, ASO) protocol that is generally faster than prior solutions, i.e., it exhibits shorter latency of its operations in various scenarios. In our complexity analysis, we compared our protocol to the original long-lived LA algorithm by Faleiro et al. [16] (we consider the ASO protocol built atop the lattice agreement protocol proposed in [16]), the first direct message-passing ASO implementation by Delporte et al. [15], the ASO algorithm based on the set-constraint broadcast by Imbs et al. [20], and the ASO algorithms by Garg et al. based on generic construction of ASO from one-shot LA with constant latency in fault-free runs [17] or $\log{n}$ worst-case latency by Zheng et al. [26] (where $n$ is the number of processes).

As shown in Table 1, in a fault-free run, the latency of an operation of our protocol is the optimal two rounds if there is no contention and eight rounds in the presence of contention (four rounds if we ignore the “buffering” period when a value is submitted but not yet proposed), regardless of the number of contending operations. Moreover, the worst-case latency of our algorithm is proportional to the number of active failures $k$ , i.e., the number of faulty processes whose messages are received within the operation’s interval, therefore the amortized latency (averaged over a large number of operations in a long-lived execution) converges to the fault-free constant.

Our protocol can be seen as a novel combination of techniques employed separately in prior work. These include the use of generalized (long-lived) lattice agreement as a basis for ASO [22], the helping mechanism where all the learned lattice values are shared [22], relaying of messages to all replicas instead of quorum-based rounds [20, 17, 21, 14], and buffering proposed values until previous proposals get committed [16]. Similar to earlier proposals [16], our algorithm involves $O(n^{2})$ (all-to-all) communication, which is compensated by its constant (amortized) latency. An interesting open question is whether one can reduce the communication cost in good runs, while maintaining constant amortized latency.

The paper is organized as follows. In Section 2, we present our model assumptions, and in Section 3, we state the problem of atomic snapshot and relate it to generalized lattice agreement. In Section 4, we present our protocol and analyze its correctness. In Section 5, we discuss several gaps in the complexity analyses of earlier work. In Section 6, we present a comparative analysis of time metrics. Certain proofs and a detailed discussion of time complexity of earlier protocols are delegated to the appendix.

2 System Model

Processes and Channels. We consider a system of $n$ processes (or nodes). Processes communicate by exchanging messages $m=(s,r,\textit{data})$ with a sender $s$ , a receiver $r$ , and a message content data.

A process is an automaton modeled as a tuple $(\mathcal{I},\mathcal{O},\mathcal{Q},q_{0},\pi)$, where $\mathcal{I}$ is a set of inputs (messages and application calls) it can receive, $\mathcal{O}$ is a set of outputs (messages and application responses), $\mathcal{Q}$ is a (potentially infinite) set of possible internal states, $q_{0}\in\mathcal{Q}$ is an initial state, and $\pi:2^{\mathcal{I}}\times\mathcal{Q}\rightarrow 2^{\mathcal{O}}\times\mathcal{Q}$ is a transition function mapping a set of inputs and a state to a set of outputs and a new state. Each process $i$ is assigned an algorithm $A_{i}$ which defines $(\mathcal{I},\mathcal{O},\mathcal{Q},q_{0},\pi)$. A distributed algorithm is an array $[A_{1},\ldots,A_{n}]$.

Events and Configurations. Application calls and responses are tuples $(i,\textit{aReq})$ and $(i,\textit{aRep})$ with a process identifier, a request, and a reply respectively.

An event $e$ is a tuple $(R,P,S)$ where $R$ is a set of received messages and/or application calls, $P$ is the set of nodes producing the event, and $S$ is a set of messages sent and/or application responses. We denote by $\textsf{receive}(e)$ the set of messages received in the event and by $\textsf{send}(e)$ the set of messages sent in it. A message hop is a pair $(e,e^{\prime})$ in which $e^{\prime}$ receives at least one message that was sent in $e$.

Messages in transit are stored in the message buffer. (We assume that every message in the message buffer is unique.) A configuration $C$ is an $(n+1)$-array $[M,s_{1},\ldots,s_{n}]$ with the buffer’s state $M=C[0]$ and the local state $s_{i}=C[i]$ of each node $i$ ($i=1,\ldots,n$). Let $C_{0}$ denote the initial configuration in which every $s_{i}$ is an initial state and the buffer $M$ is empty.

Executions. An execution (or run) is an alternating sequence $C_{0}e_{1}C_{1}e_{2}...$ of configurations and events, where for each $j>0$ and $i=1,\ldots,n$ :

1. $\textsf{receive}(e_{j})\subseteq C_{j-1}[0]$;

2. $e_{j}.S$ consists of messages and application outputs that the nodes in $e_{j}.P$ produce, given their algorithms, their states in $C_{j-1}$ and their inputs in $e_{j}.R$; the nodes in $e_{j}.P$ carry their states from $C_{j-1}$ to $C_{j}$, accordingly;

3. for the nodes $i\notin e_{j}.P$, $C_{j-1}[i]=C_{j}[i]$.

Each triple $C_{j-1}e_{j}C_{j}$ is called a step. In this paper, we consider algorithms defined by deterministic automata, and we assume a default initial state. Thus, we sometimes skip configurations and simply write $e_{1}e_{2}\ldots$ .

In an infinite execution, a process is correct if it takes part in infinitely many steps, and faulty otherwise. We only consider infinite executions in which $f<n/2$ , where $f$ is the number of faulty processes and $n$ is the total number of processes. Moreover, in an infinite execution, messages exchanged among correct processes are eventually received, i.e., if there is an event $e$ from a correct process sending a message $m$ to another correct process, then there is $e^{\prime}$ succeeding $e$ such that $m\in\textsf{receive}(e^{\prime})$ .

We also assume that the communication channels neither alter nor create messages. Finally, we assume that the channels are FIFO: messages from a given source to a given destination arrive in the order they were sent. A FIFO channel can be implemented by attaching sequence numbers to messages, without extra communication or time overhead.
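As an illustration of the last remark, the following is a minimal sketch of a receiver that restores FIFO order on one channel from per-sender sequence numbers. The class name `FifoReceiver` and its fields are hypothetical and not part of the model; the sender is assumed to number its messages 0, 1, 2, and so on.

```python
from dataclasses import dataclass, field


@dataclass
class FifoReceiver:
    """Delivers the messages of one sender in send order (hypothetical helper)."""
    next_seq: int = 0                         # next sequence number to deliver
    held: dict = field(default_factory=dict)  # out-of-order messages, keyed by seq

    def receive(self, seq, payload):
        """Accept (seq, payload) from the network; return payloads now deliverable."""
        self.held[seq] = payload
        delivered = []
        # Flush every message that is next in line.
        while self.next_seq in self.held:
            delivered.append(self.held.pop(self.next_seq))
            self.next_seq += 1
        return delivered


rx = FifoReceiver()
out = []
# The network reorders the first two messages; delivery order is still a, b, c.
for seq, payload in [(1, "b"), (0, "a"), (2, "c")]:
    out.extend(rx.receive(seq, payload))
print(out)
```

The sequence number travels inside the message itself, so the sketch adds no messages and no communication rounds, in line with the claim above.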

3 Lattice Agreement and Atomic Snapshot

3.1 Lattice Agreement

A join semi-lattice is defined as a tuple $(\mathcal{L},\sqsubseteq)$ , where $\sqsubseteq$ is a partial order on a set $\mathcal{L}$ , such that for any pair of values $u$ and $v$ in $\mathcal{L}$ , there exists a unique least upper bound $u\sqcup v\in\mathcal{L}$ ( $\sqcup$ is called the join operator). Also, $u$ and $v$ in $\mathcal{L}$ are said to be comparable if $u\sqsubseteq v\vee v\sqsubseteq u$ .
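For intuition, a standard example of a join semi-lattice is the set of finite sets ordered by inclusion, with union as the join. The sketch below uses Python `frozenset` values for this purpose; the helper names `join`, `leq`, `comparable`, and `join_all` are ours and purely illustrative.

```python
from functools import reduce


def join(u, v):
    """Least upper bound of two sets under inclusion: their union."""
    return u | v


def leq(u, v):
    """The partial order: u is below v iff u is a subset of v."""
    return u <= v


def comparable(u, v):
    return leq(u, v) or leq(v, u)


def join_all(values):
    """Join of a collection of values; the empty set plays the role of bottom."""
    return reduce(join, values, frozenset())


a, b = frozenset({1}), frozenset({2})
ab = join(a, b)
print(comparable(a, b), comparable(a, ab), sorted(join_all([a, b])))
```

Here `a` and `b` are incomparable, while each of them is comparable with their join `ab`.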

The (generalized) Lattice Agreement abstraction LA [16] defined over $(\mathcal{L},\sqsubseteq)$ can be accessed by every node with operation $\textsf{Propose}(v)$, $v\in\mathcal{L}$ (we say that the node proposes $v$), which triggers the reply event $\textsf{Learn}(w)$ (we say that the node learns $w$). Each node may invoke Propose any number of times but does so sequentially, that is, it initiates a new operation only after the previous one has returned. (Following [22], without loss of generality, we slightly modified the conventional LA interface [8, 16] by introducing the explicit Propose operation that combines proposing and learning the values; the properties of the abstraction are adjusted accordingly.) The abstraction must satisfy:

Definition 3.1 (Lattice Agreement (LA)).

• Validity. Any value learned by a node is the join of some set of proposed values that includes its last proposal.

• Stability. The values learned by any node increase monotonically, with respect to $\sqsubseteq$.

• Consistency. All values learned are comparable, with respect to $\sqsubseteq$.

• Liveness. If a correct node proposes $v$, it eventually learns a value $w$.
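The three safety properties can be checked mechanically on a finite trace. The sketch below does so for the set-inclusion lattice (values are Python sets, join is union); the function names and the trace format are assumptions made for illustration, and Liveness, being a property of infinite executions, is not covered.

```python
def is_valid(learned, last_proposal, all_proposed):
    """Validity: learned is a join of proposed values that includes last_proposal."""
    below = [p for p in all_proposed if p <= learned]
    # If learned is a join of some proposals, it equals the join of all proposals below it.
    return last_proposal in below and frozenset().union(*below) == learned


def is_monotone(learned_by_one_node):
    """Stability: the values learned by one node never decrease."""
    seq = list(learned_by_one_node)
    return all(u <= v for u, v in zip(seq, seq[1:]))


def is_chain(all_learned):
    """Consistency: every pair of learned values is comparable."""
    vals = list(all_learned)
    return all(u <= v or v <= u for i, u in enumerate(vals) for v in vals[i + 1:])


proposals = [frozenset({1}), frozenset({2, 3})]
print(is_valid(frozenset({1, 2, 3}), frozenset({1}), proposals))  # join of both proposals
print(is_valid(frozenset({1, 2}), frozenset({1}), proposals))     # not a join of proposals
print(is_chain([{1}, {1, 2}, {1, 2, 3}]), is_chain([{1}, {2}]))
```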

3.2 Atomic Snapshot Object (ASO)

An atomic snapshot object (ASO) stores a vector of values $R=[r_{1},\ldots,r_{m}]$ and exports two operations: $\textsf{update}(i,v)$ and $\textsf{snapshot}()$. The $\textsf{update}(i,v)$ operation writes the value $v$ in $R[i]$ and returns OK, and $\textsf{snapshot}()$ returns the entire vector $R$. An ASO implementation guarantees that every operation invoked by a correct process eventually completes. It also ensures that each of its operations appears to take effect at a single instant of time within its interval, i.e., it is linearizable [19].

Linearizable executions. The history of an execution $E$ is the subsequence of $E$ consisting of invocations and responses of ASO operations (update and snapshot). A history is sequential if each of its invocations is followed by a matching response. An execution is linearizable if, to each of its operations (update or snapshot, except, possibly, for incomplete ones), we can assign an indivisible point within its interval (called a linearization point), so that the operations put in the order of their linearization points constitute a legal sequential history of ASO (called a linearization), i.e., every snapshot operation returns a vector where every position contains the last value written to it (using an update operation), or the initial value if there are no such prior updates. Equivalently, a linearizable execution $E$ with history $H$ should have a linearization $S$, a legal sequential history such that (1) no node can locally distinguish a completion of $H$ from $S$ and (2) $S$ respects the real-time order of $H$, i.e., if operation $op$ completes before operation $op^{\prime}$ is invoked in $H$, then $op^{\prime}$ cannot precede $op$ in $S$.

We say that an ASO is single-writer (SW) (resp. multi-writer (MW)) if for each of its registers $R[i]$, only a single process can call $\textsf{update}(i,v)$ (resp. every process can call $\textsf{update}(i,v)$). In this paper, we focus mostly on SWMR atomic snapshot objects, and Table 1 gives results only for SWMR. An MWMR ASO can be devised from SWMR by adding a “read” phase when updating values (see Section 3.3 for more details).

Next, we show that ASO can be implemented on top of LA with no additional overhead.

3.3 From LA to ASO

To implement an SWMR ASO on top of LA, we consider a partially ordered set $\mathcal{L}^{*}$ of $(m+n)$-vectors (recall that $m$ is the size of the ASO vector and $n$ is the number of nodes), defined as follows.

A vector position $\ell\in\{1,\ldots,m\}$ holds a tuple $(w,v)\in R_{\ell}$, where $v$ is an element of a value set $V$ equipped with a total order $\leq^{V}$, and $w\in\mathbb{N}$ is the number of write operations on position $\ell$. A total order on $R_{\ell}$ is defined in the natural (lexicographic) way: for any two tuples, $(w_{1},v_{1})\leq^{R_{\ell}}(w_{2},v_{2})\equiv(w_{1}<w_{2})\vee(w_{1}=w_{2}\wedge v_{1}\leq^{V}v_{2})$. For each process $i=1,\ldots,n$, the vector position $m+i$ stores the number of snapshot operations executed by $i$.

The lattice $\mathcal{L}^{*}$ of $(m+n)$-position vectors is then the composition $R_{1}\times\ldots\times R_{m}\times\mathbb{N}^{n}$. The partial order $\sqsubseteq^{*}$ on $\mathcal{L}^{*}$ is naturally defined as the composition $\leq^{R_{1}}\times\ldots\times\leq^{R_{m}}\times\leq^{n}$, i.e., vectors are compared position by position. The composed join operator $\sqcup^{*}$ is the composition of $\max$ operators, one for each position in the $(m+n)$-position vectors. The construction implies a join semi-lattice [22].
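The construction can be sketched in a few lines of Python. We assume, for illustration only, that values are integers and that the initial content of a register position is $(0,0)$; Python compares tuples lexicographically, which coincides with the order on $R_{\ell}$ defined above. The names `join_vec` and `leq_vec` are hypothetical.

```python
def join_vec(a, b):
    """Composed join: position-wise max over the (m+n)-position vectors."""
    return tuple(max(x, y) for x, y in zip(a, b))


def leq_vec(a, b):
    """Composed partial order: position-wise comparison."""
    return all(x <= y for x, y in zip(a, b))


m, n = 2, 2
bottom = ((0, 0),) * m + (0,) * n   # assumed initial vector

# u: one write of value 7 on position 1; v: two writes on position 2 and
# one snapshot by the second process.
u = ((1, 7), (0, 0), 0, 0)
v = ((0, 0), (2, 5), 0, 1)
w = join_vec(u, v)
print(w, leq_vec(u, w), leq_vec(u, v))
```

The vectors `u` and `v` are incomparable, and their join `w` dominates both, as expected of a least upper bound.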

In Algorithm 1, we show how to implement an SWMR atomic snapshot on top of LA defined over the semi-lattice $(\mathcal{L}^{*},\sqsubseteq^{*},\sqcup^{*})$ . For simplicity, we assume that $m=n$ , i.e., the size of the array is the total number of nodes, and that each node $i$ has a dedicated register $i$ where it can write. Elements of $\mathcal{L}^{*}$ are then $2n$ -vectors.

When a node $i$ calls $\textsf{update}(i,v)$, it increments its local writing sequence number $w$ and proposes a $2n$-vector with $(w,v)$ in position $i$ and initial values in all other positions to the LA object. The vector learned from this proposal is ignored. When the node $i$ calls $\textsf{snapshot}()$, it increments its local reading sequence number $r$ and proposes a $2n$-vector with $r$ in position $n+i$ and initial values in all other positions to the LA object. The values in the first $n$ positions of the learned vector are then returned as the snapshot outcome.

1: Distributed objects:
2:     LA instance on $(\mathcal{L}^{*},\sqsubseteq^{*},\sqcup^{*})$
3: upon startup
4:     $w\leftarrow 0$
5:     $r\leftarrow 0$
6: operation update($i,v$)
7:     $w\leftarrow w+1$
8:     $V\leftarrow$ $2n$-vector with $(w,v)$ in position $i$ and initial values in all other positions
9:     LA.Propose($V$)
10: operation snapshot()
11:     $r\leftarrow r+1$
12:     $V\leftarrow$ $2n$-vector with $r$ in position $n+i$ and initial values in all other positions
13:     return $(\textsc{LA}.\textsf{Propose}(V))[1..n]$

Algorithm 1: $\textsc{LA}\to$ SWMR ASO conversion.
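The conversion of Algorithm 1 can be exercised with a small Python sketch. The class `LocalLA` below is not an LA protocol: it is a sequential, single-machine stand-in that simply learns the join of everything proposed so far, which is enough to illustrate the client-side logic. Positions are 0-based, values are assumed to be integers with initial register content $(0,0)$, and all class and function names are hypothetical.

```python
def bottom(n):
    """Assumed initial 2n-vector: (0, 0) per register, 0 per snapshot counter."""
    return ((0, 0),) * n + (0,) * n


class LocalLA:
    """Sequential stand-in for an LA object (illustration only)."""

    def __init__(self, n):
        self.state = bottom(n)

    def propose(self, vec):
        # Learn the position-wise max (the composed join) of all proposals so far.
        self.state = tuple(max(x, y) for x, y in zip(self.state, vec))
        return self.state


class AsoClient:
    """Node i of the SWMR conversion: it writes only to register i."""

    def __init__(self, i, n, la):
        self.i, self.n, self.la = i, n, la
        self.w = 0   # local writing sequence number
        self.r = 0   # local reading sequence number

    def update(self, v):
        self.w += 1
        vec = list(bottom(self.n))
        vec[self.i] = (self.w, v)
        self.la.propose(tuple(vec))   # the learned vector is ignored
        return "OK"

    def snapshot(self):
        self.r += 1
        vec = list(bottom(self.n))
        vec[self.n + self.i] = self.r
        learned = self.la.propose(tuple(vec))
        # Keep the register positions and drop the write sequence numbers.
        return [value for (_, value) in learned[: self.n]]


n = 3
la = LocalLA(n)
clients = [AsoClient(i, n, la) for i in range(n)]
clients[0].update(5)
clients[2].update(9)
first = clients[1].snapshot()
clients[0].update(6)
second = clients[1].snapshot()
print(first, second)
```

The second write of node 0 carries a larger sequence number, so it supersedes the first one in the position-wise max regardless of the written values.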

Algorithm 1 can be extended to implement an MWMR ASO: to update a position $j$ in the array, a node first takes a snapshot to get the current state, obtains the up-to-date sequence number in position $j$, and proposes its value with a higher sequence number. With this modification, the update operation takes two LA operations instead of one. We refer the reader to [22] for further details.

Theorem 3.2.

Algorithm 1 implements ASO.

Proof 3.3.

We show that every execution of Algorithm 1 is linearizable.

Consider an execution of Algorithm 1, and let $H$ be its history. Every operation (snapshot or update) is associated with a unique sequence number and performs a Propose operation on the LA object. If there is an $\textsc{LA}.\textsf{Propose}$ operation that returns $(w,v)$ in position $i$, by Validity of LA, there is an operation $\textsf{update}(i,v)$ executed by node $i$ with sequence number $w$ that started before the $\textsc{LA}.\textsf{Propose}$ completed and invoked $\textsc{LA}.\textsf{Propose}$. In this case, we say that the update operation is successful. Notice that by Validity of LA, the update must have invoked $\textsc{LA}.\textsf{Propose}$ with a vector containing $(w,v)$ in position $i$.

Now we order complete snapshot operations and complete successful update operations in the order of the values returned by their $\textsc{LA}.\textsf{Propose}$ operations (by Consistency of LA, these values are totally ordered). As each of these $\textsc{LA}.\textsf{Propose}$ operations returns a value containing its unique sequence number (Stability of LA), this order respects the real-time order of $H$. A successful update operation performed by node $i$ with $(w,v)$ in position $i$ that has no complete $\textsc{LA}.\textsf{Propose}$ is placed right before the first snapshot whose $\textsc{LA}.\textsf{Propose}$ returns this value. By construction, the resulting sequential history is legal and locally indistinguishable from a completion of $H$.

Finally, Liveness implies that every operation invoked by a correct process eventually completes.

4 LA Protocol

In Algorithm 2, we describe our protocol for solving LA. To guarantee amortized constant complexity, the protocol relies on two basic mechanisms, employed separately in earlier work [16, 22]. First, when a node receives a request (e.g., a value from the application), it first adds the request to a buffer ( $\mathit{MPool}$ ) and then relays it before starting a proposal. This ensures that “idle” nodes also help in committing the request. Second, the node relays every learned value so that nodes that are “stuck” can adopt values from other nodes.

4.1 Overview

The protocol is based on helping: every node tries to commit every proposed value it is aware of. As long as the node has active proposals that are not yet committed, it buffers newly arriving proposals in the local variable $\mathit{MPool}$. Intuitively, in the worst case, an $\textsc{LA}.\textsf{Propose}$ operation has to wait until one of the concurrently invoked $\textsc{LA}.\textsf{Propose}$ operations completes. Once this happens, the currently buffered value is put in the local dictionary $\mathit{Pending}$ and shared with the other nodes (lines 31 and 32) via a PROPOSE message. In turn, the other nodes relay the message to each other (line 38). The dictionary maps a value to the number of times it is “supported” by the nodes (using PROPOSE messages). Once a value $v$ in the dictionary assembles a quorum of $n-f$ $\langle\textbf{PROPOSE},v\rangle$ messages, i.e., $\mathit{Pending}[v]\geq n-f$ (line 39), the value is added to the $\mathit{Validated}$ variable. Once every value currently stored in $\mathit{Pending}$ is in $\mathit{Validated}$ (line 41), the operation completes with $\mathit{Validated}$ as the learned value. As the final element of the helping mechanism, each process broadcasts every value it learns (lines 45 and 51), ensuring that processes that might otherwise remain “stuck” can complete their current proposal.

In summary, the algorithm relies on four main ideas: 1) buffering incoming requests when already proposing, 2) sharing every received proposal so all processes are quickly aware of active ones, 3) initiating a new proposal only after all currently seen proposals have been validated, and 4) broadcasting learned values to help other processes make progress.

Message complexity. The protocol comprises three all-to-all communication phases: processes send and relay requests at lines 23 and 27, proposals at lines 32 and 38, and accepted values at lines 45 and 51. The total number of messages is therefore $O(n^{2})$. However, a value in a PROPOSE message can include up to $n$ distinct requests, and a value in an ACCEPT message may have arbitrary size. Therefore, in Appendix A, we present a refined protocol description in which processes exchange $O(n^{2})$ messages per individual request. This efficiency is achieved by relaying only the differences between current and previously received proposals and learned values in phases $2$ and $3$, thus eliminating redundant messages carrying the same requests.

14: upon Startup
15:     $\mathit{MPool},\mathit{Proposing},\mathit{Validated},\mathit{Learned}\leftarrow\perp$
16:     $\mathit{Pending}\leftarrow\emptyset$
17: operation Propose($v$)
18:     SendRequest($v$)
19:     wait until $v\sqsubseteq\mathit{Learned}$
20:     return $\mathit{Learned}$
21: operation SendRequest($v$)
22:     $\mathit{MPool}\leftarrow\mathit{MPool}\sqcup v$
23:     send $\langle\textbf{REQUEST},v\rangle$ to every other node
24: upon Receive $\langle\textbf{REQUEST},v\rangle$ from a node
25:     if $v\not\sqsubseteq\mathit{MPool}\sqcup\mathit{Proposing}\sqcup\mathit{Learned}$ then
26:         $\mathit{MPool}\leftarrow\mathit{MPool}\sqcup v$
27:         send $\langle\textbf{REQUEST},v\rangle$ to every other node
28: upon event $(\mathit{MPool}\neq\perp)\wedge(\mathit{Proposing}=\perp)$
29:     $\mathit{Proposing}\leftarrow\mathit{MPool}$
30:     $\mathit{MPool}\leftarrow\perp$
31:     $\mathit{Pending}[\mathit{Proposing}]\leftarrow 1$
32:     send $\langle\textbf{PROPOSE},\mathit{Proposing}\rangle$ to every other node
33: upon Receive $\langle\textbf{PROPOSE},v\rangle$ from a node
34:     if $v\in\mathit{Pending}.\textsf{keys}()$ then
35:         $\mathit{Pending}[v]$++
36:     else
37:         $\mathit{Pending}[v]\leftarrow 1$
38:         send $\langle\textbf{PROPOSE},v\rangle$ to every node
39: upon exists $v$ s.t. $\mathit{Pending}[v]=n-f$
40:     $\mathit{Validated}\leftarrow\mathit{Validated}\sqcup v$
41: upon event $\bigsqcup\mathit{Pending}.\textsf{keys}()\sqsubseteq\mathit{Validated}$
42:     if $\mathit{Learned}\sqsubset\mathit{Validated}$ then
43:         $\mathit{Learned}\leftarrow\mathit{Validated}$
44:         $\mathit{Proposing}\leftarrow\perp$
45:         send $\langle\textbf{ACCEPT},\mathit{Learned}\rangle$ to every node
46: upon Receive $\langle\textbf{ACCEPT},w\rangle$ from a node
47:     if $(\mathit{Proposing}\sqcup\mathit{Learned}\sqsubseteq w)$ then
48:         $\mathit{Validated}\leftarrow\mathit{Validated}\sqcup w$
49:         $\mathit{Learned}\leftarrow w$
50:         $\mathit{Proposing}\leftarrow\perp$
51:         send $\langle\textbf{ACCEPT},\mathit{Learned}\rangle$ to every node

Algorithm 2: Long-Lived LA: code for node $x$.
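To make the message flow of Algorithm 2 concrete, the following is a minimal single-machine simulation sketch, not the protocol implementation itself. It rests on several assumptions that are ours: the lattice is finite sets under union (so $\perp$ is the empty set); all messages go through one global FIFO queue, which preserves per-channel FIFO order and models one particular fault-free schedule; the guard of line 39 is evaluated as $\mathit{Pending}[v]\geq n-f$, following the overview; the bodies of the conditionals are read as spanning lines 43 to 45 and 48 to 51; and, to keep the run finite, a node re-broadcasts an ACCEPT only when its learned value actually changes. The names `Node`, `run`, `send_all`, and `run_guards` are hypothetical.

```python
from collections import deque

BOT = frozenset()  # bottom of the set lattice; join is set union


class Node:
    def __init__(self, nid, n, f, net):
        self.nid, self.n, self.f, self.net = nid, n, f, net
        self.mpool = self.proposing = self.validated = self.learned = BOT
        self.pending = {}   # value -> number of PROPOSE messages counted
        self.history = []   # every value this node learned, in order

    def send_all(self, kind, v, include_self=False):
        for dst in range(self.n):
            if include_self or dst != self.nid:
                self.net.append((dst, kind, v))

    def learn(self, w):
        self.learned, self.proposing = w, BOT
        self.history.append(w)
        self.send_all("ACCEPT", w, include_self=True)

    def send_request(self, v):                     # lines 21-23
        self.mpool |= v
        self.send_all("REQUEST", v)
        self.run_guards()

    def on_message(self, kind, v):
        if kind == "REQUEST":                      # lines 24-27
            if not v <= (self.mpool | self.proposing | self.learned):
                self.mpool |= v
                self.send_all("REQUEST", v)
        elif kind == "PROPOSE":                    # lines 33-38
            if v in self.pending:
                self.pending[v] += 1
            else:
                self.pending[v] = 1
                self.send_all("PROPOSE", v, include_self=True)
        elif kind == "ACCEPT":                     # lines 46-51
            if (self.proposing | self.learned) <= v:
                self.validated |= v
                if v != self.learned:              # sketch-only: relay new values only
                    self.learn(v)
                else:
                    self.proposing = BOT
        self.run_guards()

    def run_guards(self):
        """Re-evaluate the 'upon event' guards until none of them fires."""
        progress = True
        while progress:
            progress = False
            if self.mpool and not self.proposing:  # lines 28-32
                self.proposing, self.mpool = self.mpool, BOT
                self.pending[self.proposing] = 1
                self.send_all("PROPOSE", self.proposing)
                progress = True
            quorum = [v for v, c in self.pending.items() if c >= self.n - self.f]
            validated = self.validated.union(*quorum)         # lines 39-40
            if validated != self.validated:
                self.validated, progress = validated, True
            seen = BOT.union(*self.pending)        # join of Pending.keys()
            if seen <= self.validated and self.learned < self.validated:
                self.learn(self.validated)         # lines 41-45
                progress = True


def run(n, f, requests):
    """Issue all requests up front, then deliver messages until quiescence."""
    net = deque()
    nodes = [Node(i, n, f, net) for i in range(n)]
    for nid, v in requests:
        nodes[nid].send_request(frozenset(v))
    while net:
        dst, kind, v = net.popleft()
        nodes[dst].on_message(kind, v)
    return nodes


nodes = run(n=3, f=1, requests=[(0, "a"), (1, "b")])
print([sorted(node.learned) for node in nodes])
print([[sorted(w) for w in node.history] for node in nodes])
```

In this run, nodes 0 and 1 propose concurrently; the printed histories let one check by inspection that the learned values form a chain, as Consistency requires. The sketch says nothing about executions with crashes or with other message schedules.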

4.2 Correctness

Validity and Stability are immediate. We now proceed with Consistency and Liveness.

Lemma 4.1.

If nodes $i$ and $j$ learn, resp., values $w_{i}$ and $w_{j}$ , then $w_{i}$ and $w_{j}$ are comparable.

Proof 4.2.

Suppose that $(w_{i}\not\sqsubseteq w_{j})\wedge(w_{j}\not\sqsubseteq w_{i})$ . Then there must exist $v_{i}\sqsubseteq w_{i}$ and $v_{j}\sqsubseteq w_{j}$ such that $v_{i}\not\sqsubseteq w_{j}$ and $v_{j}\not\sqsubseteq w_{i}$ .

Let $Q_{i}$ (resp. $Q_{j}$) be the quorum that $i$ (resp. $j$) used to include $v_{i}$ (resp. $v_{j}$) in $\mathit{Validated}$ at line 39. Since $Q_{i}\cap Q_{j}\neq\emptyset$, there is a common node $x$ that sent $\langle\textbf{PROPOSE},v_{i}\rangle$ to $i$ and $\langle\textbf{PROPOSE},v_{j}\rangle$ to $j$. Because channels are FIFO, either $i$ received $v_{j}$ or $j$ received $v_{i}$ from $x$ before learning a value, and therefore added the value to $\mathit{Pending}$. Suppose, without loss of generality, that $i$ received $v_{j}$ before $v_{i}$. By the condition of line 41, $i$ could not have learned $w_{i}$ while $v_{j}\not\sqsubseteq\mathit{Validated}$, which contradicts $v_{j}\not\sqsubseteq w_{i}$.
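For completeness, the quorum-intersection fact used in this proof can be checked with a short counting argument, assuming (as the model does) that $f<n/2$ and that each quorum consists of at least $n-f$ distinct nodes:

```latex
\[
|Q_{i}\cap Q_{j}| \;\geq\; |Q_{i}|+|Q_{j}|-n \;\geq\; 2(n-f)-n \;=\; n-2f \;>\; 0 .
\]
```

For instance, with $n=5$ and $f=2$, any two quorums of size $3$ share at least one node.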

Lemma 4.3.

If a correct node $x$ sets $\mathit{Proposing}=v$ , $x$ eventually learns a value with $v$ .

Proof 4.4.

A node $x$ sends a PROPOSE message to every other node whenever it adds a new value to $\mathit{Pending}$ (line 38). If $x$ is correct, it will receive at least $n-f$ PROPOSE messages for every value in $\mathit{Pending}$ , adding the value to $\mathit{Validated}$ . Therefore, the condition in line 41 is never satisfied from some point on only if $x$ keeps adding a new value to $\mathit{Pending}$ before all the current ones are validated.

Since each node proposes only one value at a time (until it learns a value, lines 28, 44, 50), for $x$ to indefinitely add new values to $\mathit{Pending}$, there must be at least one other node that keeps learning values and proposing new ones. Without loss of generality, let $y$ be one such node. Since faulty nodes eventually crash and stop taking steps, $y$ must be correct. Every time $y$ learns a new value $w$, it sends $\langle\textbf{ACCEPT},w\rangle$ to $x$, and because channels are FIFO, $x$ receives the ACCEPT message before the new value proposed by $y$. Eventually (because $x$ sent its proposal to $y$), one of the received values $w$ contains the value $\mathit{Proposing}$ of $x$ and the condition on line 46 is satisfied; $x$ then learns $w$.

Lemma 4.5.

If a correct node calls $\textsf{Propose}(v)$ , it eventually sets $\mathit{Proposing}=v^{\prime}$ , $v\sqsubseteq v^{\prime}$ .

Proof 4.6.

Let a correct node $x$ call $\textsf{Propose}(v)$. Then $x$ includes $v$ in $\mathit{MPool}$ (line 22). If $x$ is not currently proposing, that is, the current value of $\mathit{Proposing}$ is $\perp$, then it meets the condition in line 28 and immediately sets $\mathit{Proposing}=\mathit{MPool}$. Otherwise, by Lemma 4.3, it eventually learns a value and sets $\mathit{Proposing}=\perp$ in line 44 or 50, thus meeting the condition in line 28 and setting $\mathit{Proposing}=\mathit{MPool}$.

Lemmas 4.1, 4.3 and 4.5 imply:

Theorem 4.7.

Algorithm 2 implements Generalized Lattice Agreement.

Corollary 4.8.

Algorithms 1 and 2 implement Atomic Snapshot.

4.3 Time metric

We now define the latency metric we are going to use in evaluating time complexity. Our metric is inspired by the metric proposed by Abraham et al. [2] (which in turn rephrases the original metric by Canetti and Rabin [12]). The distinguishing feature of our approach is that it also applies to long-lived executions and executions with holes (illustrated in Figure 1). (In Section 6, we show that the three metrics are equivalent in “hole-free” executions.)

Algorithm 3 describes the iterative method that assigns rounds to events in an execution. We give an informal description of the metric below.

Definition 4.9 (Iterative Round Assignment - Informal).

Algorithm 3 assigns round $0$ to the initial event, and defines the end of round $i$ as the last event that receives a message sent in round $i-1$ . In addition, if there are no more messages to be received (or in transit), the event inherits the round number of its immediate predecessor.

52: $e_{0}^{*}:=e_{0}$
53: $e_{0}$ is assigned round $0$
54: $r:=0$
55: for $i=1\ldots$ do
56:     if $e_{i}$ does not receive a message then
57:         $e_{i}$ is assigned round $r$
58:     else
59:         Let $e_{j}$ be the oldest event from which $e_{i}$ receives a message
60:         Let $r^{\prime}$ be the round assigned to $e_{j}$ $(r^{\prime}\leq r)$
61:         Let $e^{\prime}$ be the most recent event among $e_{r^{\prime}}^{*}$ and $e_{j}$
62:         All events after $e^{\prime}$ and up to $e_{i}$ receive round $r^{\prime}+1$
63:         $e_{r^{\prime}+1}^{*}:=e_{i}$
64:         $r:=r^{\prime}+1$

Algorithm 3: Iterative Round Assignment (IRA).
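A direct transcription of Algorithm 3 into Python may help in experimenting with the metric. The input encoding is our assumption: an execution is given as a list `receives_from`, where `receives_from[i]` holds the indices of the events whose messages are received in event $e_{i}$ (an empty list if $e_{i}$ receives nothing), and "oldest" is read as "smallest index". The function name `assign_rounds` is hypothetical.

```python
def assign_rounds(receives_from):
    """Return the list of rounds assigned to events e_0, e_1, ... (sketch of IRA)."""
    rounds = [0]      # e_0 is assigned round 0
    star = {0: 0}     # star[q]: index of the event e*_q that currently ends round q
    r = 0
    for i in range(1, len(receives_from)):
        senders = receives_from[i]
        if not senders:
            rounds.append(r)              # no message received: inherit round r
            continue
        j = min(senders)                  # oldest event e_i receives a message from
        r_prime = rounds[j]
        e_prime = max(star[r_prime], j)   # most recent among e*_{r'} and e_j
        rounds.append(r_prime + 1)
        # All events after e' and up to e_i receive round r' + 1.
        for k in range(e_prime + 1, i + 1):
            rounds[k] = r_prime + 1
        star[r_prime + 1] = i
        r = r_prime + 1
    return rounds


# A causal chain e_0 -> e_1 -> e_2: one round per hop.
chain = assign_rounds([[], [0], [1]])
# Same prefix, but e_4 receives a slow message sent in e_0, so round 1
# is stretched up to e_4 and the intermediate events are re-assigned to it.
late = assign_rounds([[], [0], [1], [], [0]])
print(chain, late)
```

The second example shows the re-assignment step of line 62: a round ends only with the last event that receives a message sent in the previous round.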

Figure 1: Example of round assignment using IRA. Arrows represent message transmissions and the number below an event corresponds to its round. A “hole” in communication appears between events $e_{3}$ and $e_{5}$.

Definition 4.10 (IRA - Arbitrary Events).

To measure the latency between two events $e_{i}$ and $e_{j}$ , we assign rounds according to Algorithm 3, starting from $e_{i}$ , with all events up to and including $e_{i}$ receiving round $0$ . The latency between $e_{i}$ and $e_{j}$ is then given by the round assigned to $e_{j}$ .

We say that an application request (or simply request, when there is no ambiguity) completes once the receiving node learns a value which includes the request. For a specific node $i$ , we are interested in measuring the latency between the event $e_{C}$ in which $i$ receives a value $v$ from the application software, and an event $e_{R}$ , in which $i$ learns a value $w$ with $v$ .

4.4 Time complexity of Algorithm 2

We define latency as the number of rounds spanning between the moment a correct process receives an application call and the moment it returns from the operation. In evaluating the latency of our protocol, we consider two types of executions: (1) the fault-free case, when all processes are correct, and (2) the worst-case, when only a majority of processes are correct.

A snapshot operation op precedes another operation $\textit{op}^{\prime}$ if the response event of op happens before the call event for $\textit{op}^{\prime}$. Two operations are said to be concurrent if neither precedes the other. For ASO protocols, we analyze latency in fault-free runs of an operation op in two distinct scenarios: (a) without contention, i.e., when no other operation overlaps in time with op, and (b) with contention, i.e., when there might be an arbitrary number of concurrent operations.

Garg et al. [18] use the notion of amortized time complexity, i.e., the average operation latency taken over a large number of operations in an execution. In some protocols, including ours, the latency of an operation is only affected by the number of faulty processes whose messages are received during the operation’s interval (we call these processes active-faulty). Intuitively, faulty processes take a finite number of steps, so in these protocols a failure can only affect a finite number of operations. In this paper, we also distinguish ASO protocols with constant time complexity.

Next, we establish the optimality of our protocol under no-contention. A protocol implementing LA tolerates $k$ faults if it satisfies all the properties of Definition 3.1 in every execution with at most $k$ faulty processes.

Theorem 4.11.

Let $\mathcal{P}$ be a distributed protocol that implements LA and tolerates at least one faulty process. Then, there exists a fault-free run of $\mathcal{P}$ in which an LA operation requires at least two rounds of communication to complete without contention.

Proof 4.12.

Consider an operation op initiated by node $x$ , with call event $e_{C}$ and response event $e_{R}$ . Suppose op completes in at most one round in fault-free, contention-free executions.

We first show that there exists an execution $E=e_{1},\ldots,e_{C},\ldots,e_{R}$ such that:

• $x$ is the only process to take a step in $e_{R}$,
• no message sent by $x$ in $e_{C},\ldots,e_{R}$ is received by any other process before $e_{R}$.

If multiple processes perform steps in the same event $e$ , we can conceptually "split" $e$ into a sequence of events $e^{1},e^{2},\ldots$ , where each process takes the step in its own dedicated event. Since their steps are independent, these split events are indistinguishable from the original $e$ from each process’s perspective. This reasoning also applies to $e_{R}$ .

Now, assume for the sake of contradiction that in every fault-free, contention-free execution containing both $e_{C}$ and $e_{R}$, there exists some process $y\neq x$ that receives a message $m$, sent by $x$ in the interval $e_{C},\ldots,e_{R}$, before $e_{R}$ occurs.

Let $e_{M}$ denote the event where $y$ receives $m$ . We define rounds from $e_{C}$ ’s perspective:

• all events up to $e_{C}$ are in round $0$,
• round $1$ ends at the last event $e_{L}$ that receives a message originating in round $0$.

If $m$ is sent after $e_{C}$, then we can construct $E$ so that all messages from round $0$ are received before $m$. This ensures that $e_{M}$ occurs after $e_{L}$, meaning $e_{M}$ is in round $2$. Since $e_{R}$ occurs after $e_{M}$, it too is assigned round $2$, contradicting our assumption that op completes in one round.

If instead $m$ is sent in $e_{C}$, we can again construct the execution so that all round $0$ messages are received before or at the same time as $m$, making $e_{M}=e_{L}$. Since $e_{R}$ occurs after $e_{M}$, it is again assigned to round $2$, a contradiction.

These contradictions hold regardless of whether op is concurrent with any other operation. Hence, such an execution $E$ must exist. Now consider an extension $E^{\prime}$ of $E$ where all messages sent by $x$ after (and including) $e_{C}$ are indefinitely delayed, while messages from other nodes are not.

Suppose a node $z$ invokes a new operation $\textit{op}^{\prime}$ after $e_{R}$ , making $\textit{op}^{\prime}$ non-concurrent with op. Since protocol $\mathcal{P}$ tolerates at least one faulty process, and $x$ appears to have crashed in $E^{\prime}$ , node $z$ must eventually complete $\textit{op}^{\prime}$ without any process receiving any messages from $x$ .

Let $v$ and $w$ be the value proposed and the value learned by $x$ in op, and let $v^{\prime}$ and $w^{\prime}$ be the corresponding values for $z$ in $\textit{op}^{\prime}$ . By Validity, we know $v\sqsubseteq w$ , and by Consistency, we know $w\sqsubseteq w^{\prime}$ , hence $v\sqsubseteq w^{\prime}$ .

However, since no process has received a message sent by $x$ at or after $e_{C}$, no process could have known about $v$, contradicting the requirement that $w^{\prime}$ must contain $v$.

Finally, after $\textit{op}^{\prime}$ completes, we can allow all delayed messages from $x$ to be received, making all processes correct in the final execution $E^{\prime}$ . This completes the proof.

Theorem 4.13.

In a fault-free run without contention, a request takes at most $2$ rounds to complete.

Proof 4.14.

Consider a contention-free request with call event $e_{C}$ and return event $e_{R}$ invoked by a node $i$ . There are no call events for other nodes between $e_{C}$ and $e_{R}$ , but some messages from previous proposals may still be in transit.

Suppose $v$ is the value to be proposed for the application call. If $i$ is not proposing (has $\mathit{Proposing}=\perp$ ) when it receives $v$ , then it directly sends $\langle\textbf{PROPOSE},v\rangle$ to everyone. Let $e_{P}$ be the last event in which a process receives $\langle\textbf{PROPOSE},v\rangle$ from $i$ , then every process also sends $\langle\textbf{PROPOSE},v\rangle$ by at most $e_{P}$ . Now take $e_{F}$ as the final event in which a process receives $\langle\textbf{PROPOSE},v\rangle$ in the execution, and $e_{S}$ as the corresponding sending event. It must be that $e_{S}$ happens between $e_{C}$ and (potentially including) $e_{P}$ . Also, because the channels are FIFO, every previous proposal must have been validated before $e_{F}$ , and $i$ will learn a value containing $v$ by at most $e_{F}$ . Let $e_{C}$ be assigned round $0$ , then $e_{P}$ happens at most in round $1$ . As a consequence, $e_{S}$ is assigned either $0$ or $1$ , thus $e_{F}$ can be assigned at most round $2$ . Then, by the end of round $2$ , $i$ already has $v$ validated.

Now suppose that $i$ is proposing when it receives $v$ , so it still has a value $v^{\prime}$ in $\mathit{Pending}$ that is not validated, w.l.o.g. assume that $v^{\prime}$ is the only one. This value must be from a call that already finished, and the corresponding node sent $\langle\textbf{ACCEPT},w\rangle$ containing $v^{\prime}$ before $e_{C}$ . Consider two pairs of events: $(e_{A},e_{A}^{\prime})$ and $(e_{C},e_{C}^{\prime})$ . In the first pair, $e_{A}$ is the event where $\langle\textbf{ACCEPT},w\rangle$ was first sent, and $e_{A}^{\prime}$ is the last event in which $\langle\textbf{ACCEPT},w\rangle$ is received from $e_{A}$ . In the second, $e_{C}$ is the usual application call event and $e_{C}^{\prime}$ is the last event in which $\langle\textbf{REQUEST},v\rangle$ is received from $i$ . There are two cases to consider: 1) $e_{A}^{\prime}$ happens before $e_{C}^{\prime}$ and 2) $e_{C}^{\prime}$ happens before $e_{A}^{\prime}$ .

In the first case, at the moment $e_{C}^{\prime}$ happens, every node was already able to propose $v$ (since there was no other value to be learned). Take the last event $e_{L}$ in which a $\langle\textbf{PROPOSE},v\rangle$ (or a value containing $v$) is received, and let $e_{S}$ be the corresponding sending event. It follows that $i$ validates $v$ by at most $e_{L}$ and can learn a value containing it. Let $e_{C}$ be assigned round $0$; then $e_{C}^{\prime}$ and $e_{S}$ can be assigned at most round $1$, and since $e_{L}$ receives a message from $e_{S}$, it can be assigned at most round $2$.

In the second case, all nodes received $\langle\textbf{REQUEST},v\rangle$ and put $v$ in $\mathit{MPool}$ before $e_{A}^{\prime}$. Every node proposes $v$ by at most $e_{A}^{\prime}$ (since they can adopt $w$ and stop any current proposal). Let $e_{L}$ be the last event in which a process receives a proposal for $v$ and $e_{S}$ its corresponding sending event. Similarly to the above cases, $e_{S}$ happens between $e_{C}$ and $e_{A}^{\prime}$. Now, let $e_{A}$ and $e_{C}$ be assigned round $0$. Then $e_{S}$ can be assigned at most round $1$ ($e_{S}$ happens before or at $e_{A}^{\prime}$) and $e_{L}$ at most round $2$, which concludes the proof.

Consider an execution of our algorithm, and let $F$ ( $|F|\leq f$ ) be its set of faulty processes.

Lemma 4.15.

Consider an event in which a correct node sends $\langle\textbf{PROPOSE},v\rangle$ and the first event in which a correct node learns a value including $v$ . If no correct node receives a message from a faulty one between these two events, then there are at most $3$ rounds between them.

Proof 4.16.

A message sent by a correct node is received by every correct node in the execution, and since correct nodes do not receive messages from faulty ones in the interval we are analyzing, we can consider only events originated from correct nodes. Therefore, we only refer to correct nodes in the following.

Let $x$ be the node sending $\langle\textbf{PROPOSE},v\rangle$, $e_{P}$ be the corresponding event and $e_{P}^{\prime}$ the last event in which a node receives $\langle\textbf{PROPOSE},v\rangle$ from $x$. Because $x$ also sends $\langle\textbf{REQUEST},v\rangle$, by $e_{P}^{\prime}$ every node has received the request and must be proposing. Any value learned after $e_{P}^{\prime}$ contains $v$ since all nodes have $v$ in $\mathit{Pending}$.

Now, at the configuration just after applying $e_{P}^{\prime}$ , let $V$ be the set in which $w\in V$ satisfies: there exists a (correct) node where $w$ is in $\mathit{Pending}$ but is not yet validated. Consider a value $w\in V$ that is the last whose $\langle\textbf{PROPOSE},w\rangle$ is received by any node, where $e_{L}^{\prime}$ is the event in which $\langle\textbf{PROPOSE},w\rangle$ is last received and $e_{L}$ the corresponding sending event. It follows that some node learns a value containing $v$ by at most $e_{L}^{\prime}$ .

Next, take the first event $e_{F}$ in which a node sent $\langle\textbf{PROPOSE},w\rangle$ , and $e_{F}^{\prime}$ the event in which the last $\langle\textbf{PROPOSE},w\rangle$ from $e_{F}$ is received. Note that $e_{L}$ happens at most at $e_{F}^{\prime}$ and $e_{F}$ at most at $e_{P}^{\prime}$ . Let $e_{P}$ be assigned round $0$ , then $e_{P}^{\prime}$ (and thus $e_{F}$ ) can be assigned at most round $1$ , $e_{F}^{\prime}$ (and thus $e_{L}$ ) at most $2$ and lastly, $e_{L}^{\prime}$ can be assigned at most round $3$ . Therefore, there are at most $3$ rounds between a propose and the first learn event for $v$ .

Theorem 4.17.

An operation op takes at most $8$ rounds to complete if, during its interval, no correct node receives a message from a faulty one.

Proof 4.18.

Let $v$ be the value received from the application call for op, $e$ be the event in which node $i$ proposes $v$ (or a value containing $v$ ) and $e^{\prime}$ the event in which a value including $v$ is learned for the first time. From Lemma 4.15, there are at most $3$ rounds between $e$ and $e^{\prime}$ . Since the node that learns $v$ sends $\langle\textbf{ACCEPT},v\rangle$ to everyone, $i$ receives and adopts it in one extra round. We conclude that in at most $4$ rounds every correct node can learn $v$ .

If $i$ is already proposing a value when it receives a call for $v$, it sends $\langle\textbf{REQUEST},v\rangle$ to everyone and puts $v$ in $\mathit{MPool}$, so that $v$ is proposed next. Let $e_{P}$ be the event in which $i$ initiated the proposal that precedes $v$, and consider the worst case where the application call $e_{C}$ with $v$ happens just after $e_{P}$. From $e_{P}$ to the event $e_{P}^{\prime}$ in which $i$ learns its previous proposal (and thus starts proposing $v$), there are at most $4$ rounds, and from $e_{P}^{\prime}$ to the learning event of $v$ there are also at most $4$ rounds. Therefore, the operation completes in at most $8$ rounds.
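The round count of this proof can be tallied as follows. The display only restates the bounds derived above; the labels under the braces are added here for readability.

```latex
\underbrace{3}_{\text{propose to first learn (Lemma 4.15)}}
+ \underbrace{1}_{\text{ACCEPT reaches } i} = 4,
\qquad
\underbrace{4}_{\text{previous proposal of } i}
+ \underbrace{4}_{\text{proposal containing } v} = 8 .
```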

We say that there are $k$ active faulty nodes during an operation op if, in between the call and return events for op, a message is received from a total of $k$ distinct faulty nodes.

Theorem 4.19.

An operation op takes $O(k)$ rounds to complete, where $k$ is the number of active faulty nodes during op.

Proof 4.20.

See Appendix B.

Corollary 4.21.

Algorithms 1 and 2 together have an amortized time complexity of $8$ rounds.
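One possible way to see why the bound of Theorem 4.17 carries over to the amortized setting is sketched below. The symbols are notation introduced here, not taken from the paper: $N$ is the number of operations considered, $L_{j}$ the latency of the $j$-th operation, $B$ the number of operations whose interval contains the receipt of a message from a faulty process, and $c$ a constant such that the $O(k)$ bound of Theorem 4.19 is at most $c\,f$ rounds. The sketch assumes, following the intuition given at the beginning of this section, that $B$ is finite.

```latex
\frac{1}{N}\sum_{j=1}^{N} L_{j}
\;\le\; \frac{8\,(N-B) + c\,f\,B}{N}
\;=\; 8 + \frac{(c\,f-8)\,B}{N}
\;\xrightarrow[N\to\infty]{}\; 8 .
```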

5 Measuring latency of ASO protocols

We conclude the paper with an overview of time complexity of earlier LA and ASO protocols [16, 15, 20, 17, 18]. We highlight certain gaps in their latency analyses and discuss the ways to fix them. Formalities and proofs are delegated to the appendix.

The first message-passing LA protocol. Faleiro et al. [16] came up with the first LA implementation for asynchronous message-passing systems. They use the metric of [6] to measure latency and conclude that it takes $O(n)$ rounds to output from a lattice agreement operation in their protocol.

We show in Appendix E the somewhat surprising result that this protocol has constant latency of $16$ rounds in fault-free runs. The upper bound holds as long as no message from faulty processes is received during the interval of the operation, implying that their LA protocol has constant amortized time complexity. We conjecture that the protocol has $O(k)$ worst-case latency, where $k$ is the number of actual failures in the execution.

The first direct ASO implementation. Delporte et al. [15] is the first paper to directly implement ASO in message passing systems, instead of using an atomic register implementation [6] and the shared-memory snapshot construction [3].

In fault-free runs without contention, the latency of their protocol is only $2$ rounds. In fault-free runs with contention, we support the claim of a bound of $O(n)$ rounds from [18].

ASO with SCD-Broadcast. Imbs et al. [20] introduce the abstraction of Set Constrained Delivery Broadcast ( $\textsc{SCD}-\textsc{Broadcast}$ ), and show that it allows for implementing LA and ASO with no complexity overhead. In their complexity analysis, they assume bounded message delays and show that the latency of their ASO algorithm in fault-free and contention-free runs is $2$ rounds. In Appendix E, we show that an operation of their resulting ASO algorithm can take $\Omega(n)$ rounds in fault-free runs with contention. We conjecture that this bound is tight, and so the time complexity of their ASO protocol is $\Theta(n)$.

A generic ASO algorithm. Garg et al. [17, 18] give a generic construction for atomic snapshot which uses any one-shot LA protocol (see definition in Appendix D) as a building block (with constant latency overhead). The protocol thus inherits the asymptotic complexity of the underlying LA algorithm. They also provide a protocol for one-shot LA with a latency of $2$ rounds in fault-free runs (using the metric of [12]). Their protocol requires $2$ rounds of communication plus two lattice agreement invocations in the good case without contention, and three lattice agreement invocations with contention, making it at least $6$ and $8$ message delays, respectively.

For the worst-case latency analysis, they assume an additional requirement over communication channels: if a process executes $\textsf{send}(m)$ , sending $m$ to a correct process, then $m$ is eventually received (even if the sender is faulty). Using this assumption, they show a worst-case latency of $O(\sqrt{f})$ for their LA protocol.

In this paper, we assume a weaker channel that only guarantees delivery of messages among correct processes. We show that under this model, the LA protocol of [17] has an execution that takes $\Omega(f)$ rounds. We conjecture the upper bound of their protocol to be $O(f)$ , and also that when using the stronger assumption, both our (Section 4) and [20]’s protocol have $O(\sqrt{f})$ worst-case latency.

The generic ASO construction may also be combined with the one-shot LA protocol presented in [26], which has worst-case latency of $O(\log f)$, providing an object whose update and snapshot operations take $O(\log f)$ rounds in both fault-free and fault-prone executions. For the sake of completeness, we also provide the time complexity analysis for the one-shot LA protocols from [16] and [20] in Appendix D.

6 Comparative Analysis of Time Measurement Metrics

In this section, we recall metrics used in the literature [6, 12, 2, 23] for measuring time in asynchronous systems. We exhibit executions where the metrics by Attiya et al. [6] and Canetti and Rabin [12] yield arbitrary results due to the presence of holes – “periods of silence” during which no messages are in transit – which are common in long-lived protocols. We show that in a subset of executions without holes, which we refer to as covered executions, these metrics align with the one proposed by Abraham et al. [2]. This is not surprising, as these metrics were designed for distributed tasks, which assume finite hole-free executions. We also recall Lamport’s longest causal chain metric [23] and show that it is not suitable for comparing the ASO protocols we consider here.

Next, we show that the metric from [2] diverges from [6] and [12] when naïvely applied to measure time between arbitrary events. We then show that, after employing our refined method from Section 4.3, they match when measuring rounds between arbitrary events in covered executions.

Finally, we show that both our metric and that of [2] yield equivalent results in cases where [2] is applicable. Altogether, we establish that our metric generalizes [2] and aligns with classical metrics [6, 12] when applied to distributed tasks. A summary of the comparative analysis is presented in Table 2.

Timed | Equivalent to CR (Covered Executions) | Equivalent to CR (Arbitrary Events) | Admits Holes

CR [12]: Yes
Round [6, 9]: Yes
NTR [2]: Yes
LCC [24]: Yes
IRA: Yes

Table 2: Comparison between asynchronous time metrics. Metrics that are timed make use of time assignments to determine the number of rounds between events. We compare each metric against CR, evaluating the number of rounds resulting from applying them over entire (covered) executions and between arbitrary events. Blue marks "good" features and red marks "bad" ones. The equivalence of NTR to CR holds as long as one uses Definition 4.10.

6.1 Definitions

Timed Executions. We assume a global clock, not accessible to the nodes. A timed event $\overline{e}$ is a pair $(t,e)$ in which $t$ is a non-negative real number; we also say that $\overline{e}$ is a time assignment of $e$. A timed execution is an alternating sequence $C_{0}\overline{e}_{1}C_{1}\dots$ with $\overline{e}_{1}=(t_{1},e_{1}),\overline{e}_{2}=(t_{2},e_{2}),\dots$, where the events $e_{1},e_{2},\ldots$ are equipped with monotonically increasing times $t_{1},t_{2},\ldots$:

1. $t_{m}>t_{l}$ whenever $m>l$;
2. $t_{l}\rightarrow\infty$ as $l\rightarrow\infty$.⁶

⁶ We require this property to avoid the case where a never-terminating execution has a finite time duration.

A time assignment of $E$ is a timed execution $\overline{E}$ in which every event $e_{i}$ in $E$ is matched with a timed event $(t_{i},e_{i})$ in $\overline{E}$ and the sequences of configurations in $E$ and $\overline{E}$ are the same. Notice that an execution allows for infinitely many time assignments.

Let $m$ be a message sent in $\overline{e}_{l}$ and received in $\overline{e}_{k}$ ( $k>l$ ); the delay of $m$ is then defined as $t_{k}-t_{l}$. For a finite timed execution $\overline{E}=C_{0}\overline{e}_{1}...\overline{e}_{l}C_{l}$, we define $t_{\textit{start}}(\overline{E})=t_{1}$, $t_{\textit{end}}(\overline{E})=t_{l}$ (we use $t_{\textit{start}}$ and $t_{\textit{end}}$ when there is no ambiguity) and $\textit{duration}(\overline{E})=t_{\textit{end}}-t_{\textit{start}}$.

In the subsequent discussion, given an execution $E$ , let $\mathcal{T}(E)$ denote the set of all timed executions $\overline{E}$ based on $E$ .

Time Metrics. It is conventional to measure the execution time by the number of communication rounds, typically calculated using the “longest message delay.” These metrics can be applied to both executions and timed executions. The first metric we consider is defined in Definition 6.1 [6]. When applied to timed executions, this metric assumes a known upper bound on message delays, which can be normalized to one time unit without loss of generality. To apply this metric to an execution, we consider the maximum duration of all possible timed executions that adhere to the upper-bound communication constraint.

Definition 6.1 (Round metric).

Given a timed execution $\overline{E}$ , in which the maximum message delay is bounded by one unit of time, $\overline{E}$ takes $\textit{duration}(\overline{E})$ rounds.

By extension, an execution $E$ takes $\sup_{\overline{E}\in\mathcal{T}(E)}{\textit{duration}(\overline{E})}$ rounds.

In the metric proposed by Attiya and Welch [9, 10], the time assignments are scaled so that the maximum message delay is always $1$ , thus, the metric produces the same results for executions as Definition 6.1. A more general metric introduced by Canetti and Rabin [12] captures the time complexity of any finite execution. Let $\overline{E}$ be a timed execution, and let $\delta_{\overline{E}}$ be the maximum message delay in it. Then $\overline{E}$ takes $\textit{duration}(\overline{E})/\delta_{\overline{E}}$ CR rounds.

Definition 6.2 (CR metric).

A finite execution $E$ takes $\sup_{\overline{E}\in\mathcal{T}(E)}{\textit{duration}(\overline{E})/\delta_{\overline{E}}}$ rounds, where $\delta_{\overline{E}}$ is the maximum message delay of each corresponding timed execution.

Example 6.3.

Figure 2 shows an execution with four events, where we assign a delay of $\delta$ to the message exchanges $(e_{1},e_{3})$ and $(e_{2},e_{4})$ , and a delay of $\delta-\epsilon$ ( $\epsilon>0$ ) to $(e_{1},e_{2})$ . By making $\epsilon$ arbitrarily small, the number of rounds in this execution converges to $2$ in the CR metric. The same result is obtained in the Round metric by setting $\delta=1$ .
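The computation in Example 6.3 can be reproduced numerically with the short sketch below. It evaluates the CR ratio for one concrete time assignment only, whereas Definition 6.2 takes the supremum over all time assignments. The function names, the choice $t_{1}=0$, and the encoding of messages as pairs of 0-based event indices are assumptions made for illustration.

```python
def cr_rounds(times, messages):
    """CR rounds of ONE timed execution: duration / maximum message delay.

    times[k] is the time assigned to the k-th event (strictly increasing);
    messages is a list of (send_index, receive_index) pairs.
    """
    delta = max(times[rcv] - times[snd] for snd, rcv in messages)
    return (times[-1] - times[0]) / delta


def example_6_3(delta, eps):
    # Time assignment for the four events e1..e4 (0-based indices 0..3),
    # with the first event placed at time 0 (an arbitrary choice).
    times = [0.0, delta - eps, delta, 2 * delta - eps]
    # Message hops (e1,e2) with delay delta - eps, (e1,e3) and (e2,e4) with delay delta.
    messages = [(0, 1), (0, 2), (1, 3)]
    return cr_rounds(times, messages)


# As eps shrinks, the ratio (2*delta - eps) / delta approaches 2.
for eps in (0.5, 0.1, 0.001):
    print(eps, example_6_3(delta=1.0, eps=eps))
```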

Recently, Abraham et al. [2] proposed an elegant approach that can be directly applied to executions without relying on time assignments. We call this metric non-timed rounds (NTR):

Definition 6.4 (NTR metric).

Given an execution $E$ , each event in $E$ is assigned a round number as follows:

• The first event $e_{0}$ is assigned round $0$. We also write $e_{0}^{*}=e_{0}$;
• For any $r\geq 1$, let $e_{r}^{*}$ be the last event where a message of round $r-1$ is delivered. All events after $e_{r-1}^{*}$ until (and including) event $e_{r}^{*}$ are in round $r$.

The number of rounds in $E$ is the round assigned to its last event.

Example 6.5.

Coming back to Figure 2, if we assign a round to each event based on Definition 6.4 then $e_{1}$ gets round $0$ , $e_{2}$ and $e_{3}$ get round $1$ and $e_{4}$ is assigned round $2$ . The execution has therefore $2$ rounds according to NTR.

Lamport [24] proposed a metric for latency based on the causal chain of messages. The Longest Causal Chain (LCC) was used to show best-case latency of protocols such as consensus [24] and Crusader Agreement [1].

Definition 6.6 (Longest Causal Chain).

Let $e$ be an event in $E$ and $M$ the set of messages received by $e$ , then $e$ is assigned round $k+1$ , where $k$ is the maximum round of an event originating a message in $M$ . If $M=\emptyset$ , then $k=0$ . The number of rounds in an execution becomes the highest round assigned to one of its events.

This metric, however, diverges from CR and NTR.

Example 6.7 (Reliable Broadcast).

In the reliable broadcast primitive [11], a dedicated source broadcasts a message and, if the source is correct, then all correct nodes should deliver the message. Furthermore, if a correct process delivers a message, then every correct process eventually delivers it. The following protocol satisfies this property:

• When the source invokes broadcast( $m$ ), it delivers $m$ and sends it to everyone;
• When a process receives $m$ for the first time, it delivers $m$ and sends it to everyone.

In Figure 3, we depict an execution of this protocol with four processes: $p_{1}$ , $p_{2}$ , $p_{3}$ and $p_{4}$ . Here, $p_{1}$ is the source and broadcasts $m$ , the message is received by $p_{2}$ which then sends $m$ to everyone. Process $p_{3}$ receives $m$ from $p_{2}$ before receiving it from $p_{1}$ , and finally, $p_{4}$ receives $m$ from $p_{1}$ in the last event. This execution has $2$ LCC rounds, while having $1$ round according to CR and NTR.

Example 6.7 shows that the LCC metric diverges from the others in cases where a fast exchange of messages happens in the interval of one (or more) slow messages. This is the case for several ASO protocols in the literature (including ours) which heavily rely on relaying values to speed up the validation phase, making the metric unsuitable for our use case. On the other hand, CR and NTR provide equivalent results in covered executions, described next.⁷

⁷ The Round and CR metrics also provide equivalent results in covered executions (Appendix C).

6.2 Covered executions and holes

Consider an execution $E=C_{0}e_{1}C_{1}...e_{l}C_{l}$ illustrated in Figure 4(a) where no process receives a message from another process, i.e., events may add messages to the buffer but no event removes a message from it. $\delta_{\overline{E}}$ is not defined in any time assignment $\overline{E}$ .

Now consider an execution $E^{\prime}=C_{0}e_{1}C_{1}...e_{l}C_{l}...e_{m}C_{m}$ in which:

• A message $m$ is sent in $e_{1}$ and received in $e_{l}$;
• A message $m^{\prime}$ is sent in $e_{l+1}$ and received in $e_{m}$;
• No message from $e_{1}...e_{l}$ is received in $e_{l+1}...e_{m}$.

In this example, illustrated in Figure 4(b) with $5$ events, $\delta_{\overline{E^{\prime}}}$ exists for any time assignment of $E^{\prime}$, but we can still assign an arbitrary time difference to $e_{l}$ and $e_{l+1}$ without affecting $\delta_{\overline{E^{\prime}}}$, which makes the number of CR rounds unbounded.
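The unboundedness can be made explicit by shifting the second half of a time assignment. In the sketch below, $G\geq 0$ and $\overline{E^{\prime}}_{G}$ are notation introduced here: starting from any time assignment $\overline{E^{\prime}}$ with times $t_{k}$, every event after $e_{l}$ is delayed by $G$. Since no message crosses from $e_{1}...e_{l}$ to $e_{l+1}...e_{m}$, no message delay changes.

```latex
t_{k}^{G} =
\begin{cases}
  t_{k}     & \text{if } k \leq l,\\
  t_{k} + G & \text{if } k > l,
\end{cases}
\qquad
\delta_{\overline{E^{\prime}}_{G}} = \delta_{\overline{E^{\prime}}},
\qquad
\frac{\textit{duration}(\overline{E^{\prime}}_{G})}{\delta_{\overline{E^{\prime}}_{G}}}
= \frac{\textit{duration}(\overline{E^{\prime}}) + G}{\delta_{\overline{E^{\prime}}}}
\;\xrightarrow[G\to\infty]{}\; \infty .
```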

The two executions in the examples above have events whose time difference is unrelated to message delays. As a consequence, the duration of these executions can grow irrespective of any bound imposed by message exchanges. Similarly, in Figure 4(b), since no message from $e_{1}e_{2}$ is received in $e_{3}e_{4}e_{5}$, NTR defines no round assignment for $e_{3}$, $e_{4}$ and $e_{5}$.

We then restrict the analysis of these metrics to executions that are covered. Formally:

Definition 6.8 (Covered Execution).

A hole in an execution is a pair $(e_{l},e_{l+1})$ in which no event in $e_{l+1}...$ receives a message from $...e_{l}$; in other words, there are no message hops between the two sequences of events. An execution is covered iff it has no holes.
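Definition 6.8 translates directly into a check over the message hops of a finite execution. The following is a minimal sketch under assumed conventions: events are numbered from 0, messages are given as `(send_index, receive_index)` pairs with `send_index < receive_index`, and the names `holes` and `is_covered` are hypothetical.

```python
def holes(num_events, messages):
    """Return the holes of a finite execution as pairs (l, l + 1) of 0-based indices.

    A gap between consecutive events l and l + 1 is a hole when no message
    sent at or before event l is received at or after event l + 1.
    """
    crossed = [False] * max(num_events - 1, 0)
    for snd, rcv in messages:
        # The hop snd -> rcv covers every gap (l, l + 1) with snd <= l < rcv.
        for l in range(snd, rcv):
            crossed[l] = True
    return [(l, l + 1) for l, is_crossed in enumerate(crossed) if not is_crossed]


def is_covered(num_events, messages):
    return not holes(num_events, messages)


# Five events: one hop from the 1st to the 2nd event, one from the 3rd to the 5th.
# Nothing crosses the gap between the 2nd and 3rd events.
print(holes(5, [(0, 1), (2, 4)]))
# Four events with hops (e1,e2), (e1,e3), (e2,e4): every gap is crossed.
print(is_covered(4, [(0, 1), (0, 2), (1, 3)]))
```

The first call mirrors the five-event execution described for Figure 4(b), under the reading that the first message is received in the second event.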

Abraham et al. [2] introduce NTR as an equivalent to CR; however, no formal proof is provided. The next result corroborates this claim in covered executions. Later, in Example 6.12, we show that naively using NTR to measure time between events may not match CR.

Theorem 6.9.

A finite covered execution $E$ has $k$ CR rounds iff it has $\lceil k\rceil$ NTR rounds.

Proof 6.10.

See Appendix C.1.

6.3 Time between arbitrary events

In long-lived executions (such as those of atomic snapshot algorithms) we are interested in measuring time between two events, for instance, between an application call and response. Definition 6.2 can easily be adapted to measure the number of rounds between two events as follows:

Definition 6.11 (Generalized CR metric).

Let $E$ be an execution, let $\mathcal{T}(E)$ denote the set of all timed executions $\overline{E}$ based on $E$, and let $\delta_{\overline{E}}$ be the maximum message delay in $\overline{E}$. Let $e_{i}$ and $e_{j}$ ( $j>i$ ) be events in $E$, and $t_{i}$ and $t_{j}$ their respective time assignments in $\overline{E}$. Then we say that in between $e_{i}$ and $e_{j}$ there are $\sup_{\overline{E}\in\mathcal{T}(E)}(t_{j}-t_{i})/\delta_{\overline{E}}$ CR rounds.

An appealing way of defining time between two events $e_{i}$ and $e_{j}$ using a non-timed metric is to assign rounds according to NTR, and then take the difference of rounds assigned to $e_{i}$ and $e_{j}$ . As illustrated in Example 6.12, this definition can diverge from generalized CR.

Example 6.12.

Consider the execution shown in Figure 5. We can assign times to $e_{1}$ , $e_{3}$ and $e_{4}$ such that the two message hops have delay of $\delta$ . Now consider the number of rounds between $e_{2}$ and $e_{4}$ , since we can assign a time for $e_{2}$ that is arbitrarily close to $e_{1}$ ’s assignment, there are $2$ CR rounds between $e_{2}$ and $e_{4}$ . However, the round assignments using NTR to $e_{2}$ and $e_{4}$ are $1$ and $2$ respectively, so simply taking the difference between them leads to a value that diverges from CR.

We then give the following definition, using the approach described in Section 4.3:

Definition 6.13 (Generalized NTR).

Given an execution $E$ , let $e_{i}$ and $e_{j}$ ( $j>i$ ) be events in $E$ . The number of rounds between $e_{i}$ and $e_{j}$ is given by the round assigned to $e_{j}$ according to the following:

• All events up to (and including) $e_{i}$ are assigned round $0$. We also write $e_{0}^{*}=e_{i}$;
• For any $r\geq 1$, let $e_{r}^{*}$ be the last event where a message of round $r-1$ is delivered. All events after $e_{r-1}^{*}$ until (and including) event $e_{r}^{*}$ are in round $r$.
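The two rules above can be sketched in Python as shown next. The function name `ntr_rounds`, the 0-based event indices, and the `(send_index, receive_index)` message encoding are assumptions of this sketch; with `start=0` it reduces to Definition 6.4. The message pattern used for Example 6.12 (one hop from the first to the third event and one from the third to the fourth) is inferred from the text of that example, not read off Figure 5.

```python
def ntr_rounds(num_events, messages, start=0):
    """Assign generalized NTR rounds; events left without a round are None.

    All events up to and including `start` get round 0. Round r ends at the
    last event that receives a message sent in round r - 1.
    """
    rounds = [None] * num_events
    for k in range(start + 1):
        rounds[k] = 0
    prev_end = start  # index of e*_{r-1}
    r = 1
    while prev_end < num_events - 1:
        receivers = [rcv for snd, rcv in messages
                     if rounds[snd] == r - 1 and rcv > prev_end]
        if not receivers:
            break  # a hole: later events get no round under NTR
        end = max(receivers)  # index of e*_r
        for k in range(prev_end + 1, end + 1):
            rounds[k] = r
        prev_end = end
        r += 1
    return rounds


example_6_12 = [(0, 2), (2, 3)]
naive = ntr_rounds(4, example_6_12)             # rounds counted from the first event
shifted = ntr_rounds(4, example_6_12, start=1)  # rounds counted from the second event
print(naive[3] - naive[1], shifted[3])
```

Taking the difference of plain NTR rounds undercounts the distance between the second and fourth events, while restarting the assignment at the second event recovers the value given by generalized CR in Example 6.12.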

Theorem 6.14.

Let $E$ be a covered execution and $e_{i}$ and $e_{j}$ ( $j>i$ ) be events of $E$ . There are $k$ rounds in between $e_{i}$ and $e_{j}$ according to CR (Definition 6.11) iff there are $\lceil k\rceil$ rounds in between them according to NTR (Definition 6.13).

Proof 6.15.

See Appendix C.2.

6.4 Relating IRA to NTR

Theorem 6.16.

Let $E$ be a finite covered execution and suppose that all events of $E$ are assigned rounds according to IRA after all iterations of the algorithm. It holds that:

1. Round $0$ is composed only of $e_{0}$ (the initial event).
2. The final event of round $i+1$ is the last event to receive a message from round $i$.

Proof 6.17.

See Appendix C.3.

Corollary 6.18.

IRA and NTR assign the same rounds to events in covered executions.

References

[1] I. Abraham, N. Ben-David, G. Stern, and S. Yandamuri. On the round complexity of asynchronous crusader agreement. Cryptology ePrint Archive, 2023.
[2] I. Abraham, K. Nayak, L. Ren, and Z. Xiang. Good-case latency of byzantine broadcast: a complete categorization. CoRR, abs/2102.07240, 2021.
[3] Y. Afek, H. Attiya, D. Dolev, E. Gafni, M. Merritt, and N. Shavit. Atomic snapshots of shared memory. J. ACM, 40(4):873–890, 1993.
[4] J. Aspnes, H. Attiya, K. Censor-Hillel, and F. Ellen. Limited-use atomic snapshots with polylogarithmic step complexity. J. ACM, 62(1):3:1–3:22, 2015.
[5] J. Aspnes and K. Censor-Hillel. Atomic snapshots in $O(\log^{3}n)$ steps using randomized helping. In Y. Afek, editor, Distributed Computing - 27th International Symposium, DISC 2013, Jerusalem, Israel, October 14-18, 2013. Proceedings, volume 8205 of Lecture Notes in Computer Science, pages 254–268. Springer, 2013.
[6] H. Attiya, A. Bar-Noy, and D. Dolev. Sharing memory robustly in message-passing systems. J. ACM, 42(1):124–142, Jan. 1995.
[7] H. Attiya, F. Ellen, and P. Fatourou. The complexity of updating snapshot objects. J. Parallel Distributed Comput., 71(12):1570–1577, 2011.
[8] H. Attiya, M. Herlihy, and O. Rachman. Atomic snapshots using lattice agreement. Distributed Comput., 8(3):121–132, 1995.
[9] H. Attiya and J. Welch. Distributed computing: fundamentals, simulations, and advanced topics, volume 19. John Wiley & Sons, 2004.
[10] H. Attiya and J. L. Welch. Multi-valued connected consensus: A new perspective on crusader agreement and adopt-commit. In 27th International Conference on Principles of Distributed Systems, 2024.
[11] C. Cachin, R. Guerraoui, and L. Rodrigues. Introduction to reliable and secure distributed programming. Springer Science & Business Media, 2011.
[12] R. Canetti and T. Rabin. Fast asynchronous byzantine agreement with optimal resilience. In Proceedings of the twenty-fifth annual ACM symposium on Theory of computing, pages 42–51, 1993.
[13] K. M. Chandy and L. Lamport. Distributed snapshots: Determining global states of distributed systems. ACM Trans. Comput. Syst., 3(1):63–75, 1985.
[14] G. Danezis, L. Kokoris-Kogias, A. Sonnino, and A. Spiegelman. Narwhal and tusk: a dag-based mempool and efficient BFT consensus. In EuroSys, pages 34–50. ACM, 2022.
[15] C. Delporte-Gallet, H. Fauconnier, S. Rajsbaum, and M. Raynal. Implementing snapshot objects on top of crash-prone asynchronous message-passing systems. IEEE Transactions on Parallel and Distributed Systems, 29(9):2033–2045, 2018.
[16] J. M. Faleiro, S. Rajamani, K. Rajan, G. Ramalingam, and K. Vaswani. Generalized lattice agreement. In Proceedings of the 2012 ACM Symposium on Principles of Distributed Computing, PODC ’12, page 125–134, New York, NY, USA, 2012. Association for Computing Machinery.
[17] V. Garg, S. Kumar, L. Tseng, and X. Zheng. Amortized constant round atomic snapshot in message-passing systems. arXiv preprint arXiv:2008.11837, 2020.
[18] V. K. Garg, S. Kumar, L. Tseng, and X. Zheng. Fault-tolerant snapshot objects in message passing systems. In 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 1129–1139, 2022.
[19] M. Herlihy and J. M. Wing. Linearizability: A correctness condition for concurrent objects. ACM Trans. Program. Lang. Syst., 12(3):463–492, 1990.
[20] D. Imbs, A. Mostéfaoui, M. Perrin, and M. Raynal. Set-constrained delivery broadcast: Definition, abstraction power, and computability limits. In Proceedings of the 19th International Conference on Distributed Computing and Networking, pages 1–10, 2018.
[21] I. Keidar, E. Kokoris-Kogias, O. Naor, and A. Spiegelman. All you need is DAG. In PODC, pages 165–175. ACM, 2021.
[22] P. Kuznetsov, T. Rieutord, and S. Tucci-Piergiovanni. Reconfigurable Lattice Agreement and Applications. In P. Felber, R. Friedman, S. Gilbert, and A. Miller, editors, 23rd International Conference on Principles of Distributed Systems (OPODIS 2019), volume 153 of Leibniz International Proceedings in Informatics (LIPIcs), pages 31:1–31:17, Dagstuhl, Germany, 2020. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik.
[23] L. Lamport. Time, clocks, and the ordering of events in a distributed system. Commun. ACM, 21(7):558–565, 1978.
[24] L. Lamport. Lower bounds for asynchronous consensus. Distributed Computing, 19:104–125, 2006.
[25] F. Mattern. Efficient algorithms for distributed snapshots and global virtual time approximation. J. Parallel Distributed Comput., 18(4):423–434, 1993.
[26] X. Zheng, V. K. Garg, and J. Kaippallimalil. Linearizable Replicated State Machines With Lattice Agreement. In P. Felber, R. Friedman, S. Gilbert, and A. Miller, editors, 23rd International Conference on Principles of Distributed Systems (OPODIS 2019), volume 153 of Leibniz International Proceedings in Informatics (LIPIcs), pages 29:1–29:16, Dagstuhl, Germany, 2020. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik.

Appendix A Protocol with $O(n^{2})$ message complexity per request

Algorithm 2 has a complexity of $O(n^{2})$ messages per proposed value, where each proposal might contain an arbitrary number of requests. As a consequence, processes are required to exchange messages that can grow indefinitely in size, resulting in high network bandwidth usage. We address this problem in Algorithm 4, with a few small modifications to Algorithm 2.

Instead of waiting for validation from a quorum and relaying entire proposals, processes keep track of each individual request, and relay only the difference between a received proposal and the current values waiting for validation. The same occurs in the ACCEPT phase of the protocol, where processes only send the difference between newly learned values and previous ones. With these modifications, an individual request is relayed once by every process before proposing, a second time in the proposal and validation phase, and one last time in the ACCEPT message, for a total of $3n^{2}$ messages per request.

65: upon Startup
66:     $\mathit{Proposing},\mathit{MPool},\mathit{Pending},\mathit{Relaying},\mathit{Validated},\mathit{Learned},\mathit{ToAdopt}\leftarrow\emptyset$
67: operation Propose( $v$ )
68:     SendRequest( $v$ )
69:     wait until $v\sqsubseteq\bigsqcup\mathit{Learned}$
70:     return $\bigsqcup\mathit{Learned}$
71: operation SendRequest( $v$ )
72:     $\mathit{MPool}\leftarrow\mathit{MPool}\cup\{v\}$
73:     send $\langle\textbf{REQUEST},v\rangle$ to every other node
74: upon Receive $\langle\textbf{REQUEST},v\rangle$ from a node
75:     if $v\not\in\mathit{MPool}\cup\mathit{Proposing}\cup\mathit{Learned}$ then
76:         $\mathit{MPool}\leftarrow\mathit{MPool}\cup\{v\}$
77:         send $\langle\textbf{REQUEST},v\rangle$ to every other node
78: upon event $(\mathit{MPool}\neq\emptyset)\wedge(\mathit{Proposing}=\emptyset)$
79:     $\mathit{Proposing}\leftarrow\mathit{MPool}$
80:     for $v\in\mathit{MPool}$
81:         $\mathit{Pending}[v]\leftarrow 1$
82:     $\mathit{MPool}\leftarrow\emptyset$
83:     send $\langle\textbf{PROPOSE},\mathit{Proposing}\rangle$ to every other node
84: upon Receive $\langle\textbf{PROPOSE},V\rangle$ from a node
85:     $\mathit{Relaying}\leftarrow\emptyset$
86:     for $v\in V$
87:         if $v\in\mathit{Pending}.\textsf{keys}()$ then
88:             $\mathit{Pending}[v]++$
89:         else
90:             $\mathit{Pending}[v]\leftarrow 1$
91:             $\mathit{Relaying}\leftarrow\mathit{Relaying}\cup\{v\}$
92:     if $\mathit{Relaying}\neq\emptyset$ then
93:         send $\langle\textbf{PROPOSE},\mathit{Relaying}\rangle$ to every node
94: upon exists $v$ s.t. $\mathit{Pending}[v]=n-f$
95:     $\mathit{Validated}\leftarrow\mathit{Validated}\cup\{v\}$
96: upon event $\mathit{Pending}.\textsf{keys}()\subseteq\mathit{Validated}$
97:     if $\mathit{Learned}\subset\mathit{Validated}$ then
98:         $\Delta\mathit{Learned}\leftarrow\mathit{Validated}-\mathit{Learned}$
99:         $\mathit{ToAdopt}\leftarrow\mathit{ToAdopt}-\Delta\mathit{Learned}$
100:         $\mathit{Learned}\leftarrow\mathit{Validated}$
101:         $\mathit{Proposing}\leftarrow\emptyset$
102:         send $\langle\textbf{ACCEPT},\Delta\mathit{Learned}\rangle$ to every node
103: upon Receive $\langle\textbf{ACCEPT},W\rangle$ from a node
104:     $\mathit{ToAdopt}\leftarrow\mathit{ToAdopt}\cup W$
105:     if $(\mathit{Proposing}\subseteq\mathit{ToAdopt})$ then
106:         $\mathit{Validated}\leftarrow\mathit{Validated}\cup\mathit{ToAdopt}$
107:         $\mathit{Learned}\leftarrow\mathit{Learned}\cup\mathit{ToAdopt}$
108:         $\mathit{Proposing}\leftarrow\emptyset$
109:         send $\langle\textbf{ACCEPT},\mathit{ToAdopt}\rangle$ to every node

Algorithm 4 Refined Long-Lived LA: code for node $x$
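To illustrate how relaying only the difference keeps PROPOSE messages small, the following is a minimal Python sketch of the bookkeeping in lines 84-95 of Algorithm 4. It is an illustration under assumptions rather than the authors' implementation: the class name `ProposeTracker` and its method are hypothetical, networking is left out (the handler returns the set to relay instead of sending it), and values are assumed to be hashable.

```python
from typing import Dict, Hashable, Iterable, Set


class ProposeTracker:
    """Per-request counters for PROPOSE handling (hypothetical rendering)."""

    def __init__(self, n: int, f: int) -> None:
        self.quorum = n - f                      # validation threshold
        self.pending: Dict[Hashable, int] = {}   # value -> number of PROPOSEs seen
        self.validated: Set[Hashable] = set()

    def on_propose(self, values: Iterable[Hashable]) -> Set[Hashable]:
        """Handle <PROPOSE, V> and return the values to relay (may be empty)."""
        relaying: Set[Hashable] = set()
        for v in values:
            if v in self.pending:
                self.pending[v] += 1             # already known: count only
            else:
                self.pending[v] = 1              # first sighting: count and relay
                relaying.add(v)
            if self.pending[v] == self.quorum:   # line 94: quorum reached
                self.validated.add(v)
        return relaying


tracker = ProposeTracker(n=3, f=1)
first_relay = tracker.on_propose({"a", "b"})    # both values are new
second_relay = tracker.on_propose({"a", "c"})   # only "c" is new
```

Each value appears in a node's outgoing PROPOSE at most once, which is what bounds the per-request message count in this phase.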

Appendix B Time Complexity of Algorithm 2

We show that an operation op in Algorithm 2 takes $O(k)$ rounds to complete, where $k$ is the number of active faulty nodes during op.

Messages from and to faulty nodes may not arrive. However, if a message sent by (or to) a faulty node in round $r$ is received, it is received by round $r+1$ at the latest. Moreover, since channels are FIFO, when a node $i$ receives a message from another node $j$ , $i$ must also have received all previous messages $j$ sent to $i$ , irrespective of whether $i$ and $j$ are correct or faulty.

If a correct node receives $\langle\textbf{PROPOSE},v^{\prime}\rangle$ (even from a faulty node) in round $r$ , every correct node will have $v^{\prime}$ added to $\mathit{Pending}$ by the end of round $r+1$ , and will have $v^{\prime}$ validated by the end of round $r+2$ . Also, a faulty node waits for its current proposal to finish before starting a new one, in which case it sends an ACCEPT message for the last learned value before sending the new proposal.

We say that a node introduces a new value $w$ during the operation if it is the first node to send a $\langle\textbf{PROPOSE},w\rangle$ for $w$ in the interval of the operation. A node can introduce a new value coming from an internal source, i.e., the value was buffered and proposed when the node had already finished its previous proposal, or from an external source, i.e., after receiving a proposal originated from another node before the operation started.

Let $v$ be the value received from the application call for op and $e_{C}$ (as well as all previous events) be assigned round $0$ . If there are no active faulty nodes, a correct node learns a value containing $v$ by round $7$ at the latest (by Lemma 4.15; here, we include the time $v$ can remain buffered). Also, by the end of round $5$ , every correct node has sent a PROPOSE message for $v$ and has $v$ validated by the end of round $6$ (including buffering time, a correct node proposes $v$ in round $4$ at the latest). By that point, all correct nodes are waiting for their proposals to complete and, therefore, cannot introduce a value from an internal source. In order to delay a correct node from learning a value containing $v$ by round $7$ , every correct node should beforehand receive a new value in a PROPOSE message, which is added to $\mathit{Pending}$ but is not validated. Using a simple inductive argument, $2k+1$ new proposals originating from faulty nodes are necessary to delay a correct node from learning a value from round $7$ to round $7+2k$ .

Suppose that there is an execution where it takes $8+2k+1$ rounds for node $i$ to complete an operation. But there are only $k$ active faulty nodes, which means that at least $k+1$ extra proposals were introduced by active faulty nodes.

Let $f_{0}$ be an active faulty node that introduced more than one of the $2k+1$ values that delayed the operation (assuming w.l.o.g. that there are exactly $2k+1$ new proposals). Let $w$ and $w^{\prime}$ be the first and the second values introduced by $f_{0}$ respectively. If $w^{\prime}$ was received by $f_{0}$ from an internal source, $f_{0}$ should have finished its previous proposal (and learned a value containing $w$ ) before proposing $w^{\prime}$ . But because $w$ was one of the values that delayed the operation, and since channels are FIFO, $f_{0}$ needs to add $v$ to $\mathit{Pending}$ before validating $w$ (at least a majority of correct nodes sent a PROPOSE for $v$ before sending a PROPOSE for $w$ ). $f_{0}$ then learns a value containing $v$ and sends ACCEPT with that value to everyone. The ACCEPT message is received by correct processes before $\langle\textbf{PROPOSE},w^{\prime}\rangle$ , and they would be able to adopt it.

So $f_{0}$ must have received $\langle\textbf{PROPOSE},w^{\prime}\rangle$ from an external source by round $1$ at the latest, which means it issued proposals for $w^{\prime}$ that can be received by round $2$ at the latest. We can also conclude that at least $k+1$ values were introduced by active faulty nodes from external sources. Now let $w_{k+1}$ be the $(k+1)$ th such value used to delay correct nodes from learning $v$ . The earliest round $w_{k+1}$ can delay is $7+k$ , which means that by round $7+k$ all correct nodes have already sent a PROPOSE for $w_{k+1}$ , but by the end of round $5+k$ no correct node has done so (otherwise $w_{k+1}$ would have been validated in round $7+k$ by every correct process). Take the first active faulty node $f_{1}$ from which a correct node received $\langle\textbf{PROPOSE},w_{k+1}\rangle$ . Since the earliest this message is received is in round $6+k$ , the earliest it could be sent is in round $5+k$ , so $f_{1}$ first received $\langle\textbf{PROPOSE},w_{k+1}\rangle$ from another distinct active faulty node, $f_{2}$ , which sent it in round $4+k$ at the earliest. But $w_{k+1}$ was introduced from an external source and needs to be received by a faulty node in round $1$ . Following the chain above, for the node $f_{k+6}$ to receive it in round $1$ , a chain of $k+6$ active faulty nodes would be necessary, although there are only $k$ .

Therefore, an operation takes less than $8+2k+1$ rounds to complete.

Appendix C Equivalence Proofs for Time Measurement Metrics

In this section, we present detailed proofs for the equivalence between CR and NTR in covered executions. The proofs are written with respect to a new (non-timed) method for interpreting latency: the minimum number of message hops that can cover an execution. Before proceeding, we establish the equivalence between the Round and CR metrics.

Theorem C.1.

Round and CR assign the same number of rounds to finite covered executions.

Proof C.2.

Let $E$ be a finite covered execution and $\overline{E}$ a time assignment for $E$ , with maximum message delay $\delta_{\overline{E}}$ . Since we consider algorithms that do not make use of clocks, we can “shrink” or “stretch” time assignments without altering the steps in the underlying execution. Consider the time assignment $\overline{E}^{\prime}$ built as follows:

1. $t_{\textit{start}}(\overline{E}^{\prime})=t_{\textit{start}}(\overline{E})$ ;
2. For every event $\overline{e}_{l}$ in $\overline{E}$ with time $t_{l}$ , the event $\overline{e}_{l}^{\prime}$ in $\overline{E}^{\prime}$ has time $t_{l}^{\prime}=t_{\textit{start}}+(t_{l}-t_{\textit{start}})\frac{1}{\delta_{\overline{E}}}$ .

We call $\overline{E}^{\prime}$ a normalization of $\overline{E}$ . By construction, the maximum message delay in $\overline{E}^{\prime}$ is $1$ and $\overline{E}^{\prime}$ has the same number of CR rounds as $\overline{E}$ .

Now let $\mathcal{T}(E)$ be the set of valid executions for the Round metric and $\overline{E}\in\mathcal{T}(E)$ have $k$ rounds (using Round). If $\delta_{\overline{E}}<1$ , then the normalization $\overline{E}^{\prime}$ of $\overline{E}$ has $k^{\prime}>k$ rounds: $\textit{duration}(\overline{E}^{\prime})=t_{end}(\overline{E}^{\prime})-t_{start}(\overline{E}^{\prime})=(t_{end}(\overline{E})-t_{start}(\overline{E}))/\delta_{\overline{E}}$ .

Consider the set $\mathcal{T}^{\prime}(E)$ of valid executions for the Round metric where for all $\overline{E}^{\prime}\in\mathcal{T}^{\prime}(E)$ , $\delta_{\overline{E}^{\prime}}=1$ . Since for every timed execution $\overline{E}\in\mathcal{T}(E)$ with $k$ rounds, there is a timed execution $\overline{E}^{\prime}\in\mathcal{T}^{\prime}(E)$ with $k^{\prime}$ rounds where $k^{\prime}\geq k$ , then: $\sup_{\overline{E}^{\prime}\in\mathcal{T}^{\prime}(E)}{\textit{duration}(\overline{E}^{\prime})}=\sup_{\overline{E}\in\mathcal{T}(E)}{\textit{duration}(\overline{E})}$ .

It is easy to see that every execution $\overline{E}^{\prime}\in\mathcal{T}^{\prime}(E)$ has the same number of rounds according to both CR and Round metrics. So if the Round metric assigns $k$ rounds to $E$ and CR assigns $k^{\prime}$ , then $k^{\prime}\geq k$ . But we also know that for any time assignment $\overline{E}$ of $E$ , the normalization of $\overline{E}$ is a valid timed execution for the Round metric and has the same number of CR rounds as $\overline{E}$ . This means that $k\geq k^{\prime}$ , since for any time assignment $\overline{E}$ , there is a time assignment with the same number of rounds in both Round and CR metrics. Therefore, $k=k^{\prime}$ .
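The normalization used in this proof is a plain rescaling of timestamps. The sketch below illustrates it under assumptions that are not fixed by the text: a time assignment is represented as a list `times` indexed by event, message hops are pairs `(send_index, receive_index)`, and $t_{\textit{start}}$ is taken to be the time of the first event. The function names are hypothetical.

```python
from typing import List, Tuple

Hop = Tuple[int, int]  # (index of the send event, index of the receive event)


def max_delay(times: List[float], hops: List[Hop]) -> float:
    """Largest delay between the send and the receive event of any hop."""
    return max(times[r] - times[s] for (s, r) in hops)


def normalize(times: List[float], hops: List[Hop]) -> List[float]:
    """Rescale the time assignment so that the maximum message delay is 1."""
    delta = max_delay(times, hops)
    if delta <= 0:
        raise ValueError("maximum message delay must be positive")
    t_start = times[0]  # assumption: the execution starts at its first event
    return [t_start + (t - t_start) / delta for t in times]


times = [0.0, 1.0, 2.0, 3.0]
hops = [(0, 2), (1, 3), (2, 3)]       # delays 2, 2 and 1
normalized = normalize(times, hops)   # every timestamp is divided by 2
```

The order of events is unchanged by the rescaling, which is why the underlying execution stays the same.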

C.1 A new look at execution latency

In covered executions, it seems natural to relate the number of message hops to the number of communication rounds. Next, we define the concept of covering executions and events with message hops.

Consider the finite execution $E=e_{0},\ldots,e_{l}$ . We can visualize these events as points on a real line, where their positions correspond to their indices, that is, $e_{0}$ at $0$ , $e_{1}$ at $1$ and so on. Each pair of events $(e_{i},e_{j})$ defines an interval $[i,j]$ , and we denote this by $\textsf{interval}((e_{i},e_{j}))=[i,j]$ . Likewise, $E$ defines the interval $[0,l]$ , which we represent as $\textsf{interval}(E)=[0,l]$ .

Since a message hop consists of a pair of events $(e_{i},e_{j})$ , it also specifies an interval $[i,j]$ . For a set $M$ of message hops, we define $\textsf{interval}(M)=\bigcup_{m\in M}\textsf{interval}(m)$ .

Definition C.3 (Execution cover).

Let $E$ be a finite execution and $M$ a set of message hops from $E$ . We say that $M$ covers $E$ if $\textsf{interval}(M)=\textsf{interval}(E)$ . Analogously, we say that $E$ can be covered by $k$ message hops if there is a set $M$ of message hops that covers $E$ with $|M|=k$ .
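As a concrete reading of Definition C.3, the following sketch checks whether a set of message hops covers an execution $e_{0},\ldots,e_{l}$ by merging the closed intervals $[i,j]$ and comparing the result with $[0,l]$ . The representation of hops as index pairs and the function name `covers` are assumptions made for illustration; hop indices are assumed to lie within the execution.

```python
from typing import Iterable, Optional, Tuple

Hop = Tuple[int, int]  # (index of the send event, index of the receive event)


def covers(num_events: int, hops: Iterable[Hop]) -> bool:
    """Return True iff the union of the hop intervals equals [0, num_events-1]."""
    last = num_events - 1
    reach: Optional[int] = None      # right end of the merged interval so far
    for i, j in sorted(hops):
        if reach is None:
            if i != 0:
                return False         # nothing covers the point 0
            reach = j
        elif i > reach:
            return False             # the open gap (reach, i) is uncovered
        else:
            reach = max(reach, j)
    return reach == last
```

For instance, with six events the hops $(e_{0},e_{2}),(e_{2},e_{4}),(e_{3},e_{5})$ cover the execution, while $(e_{0},e_{2}),(e_{3},e_{5})$ leave the gap between $e_{2}$ and $e_{3}$ uncovered.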

Theorem C.4.

If a covered execution $E$ has $k$ rounds according to the Round metric, then $\lceil k\rceil$ message hops are necessary and sufficient to cover $E$ .

Proof C.5.

Let $E$ have $k$ rounds according to Round. There is a time assignment $\overline{E}$ with $\textit{duration}(\overline{E})=k-\epsilon$ , where $\epsilon>0$ can be arbitrarily small. Starting from $t_{\textit{start}}$ , assume that there exists a set of message hops where each hop can cover the maximum amount of time, that is, an interval of one unit, then at least $\lceil\textit{duration}(\overline{E})\rceil=\lceil k\rceil$ message hops are necessary to cover the whole duration.

Now, we proceed to build a set $M$ that covers $E$ with $k^{\prime}$ message hops, and then show that there exists a timed execution with $k^{\prime\prime}$ rounds, where $\lceil k^{\prime\prime}\rceil=k^{\prime}$ .

For the first element of $M$ , we take pair $p_{1}=(e_{1}^{\prime},e_{1}^{*})$ where $e_{1}^{\prime}=e_{0}$ (the initial event) and $e_{1}^{*}$ is the last event in $E$ where a message from $e_{1}^{\prime}$ is received. We also define $e_{0}^{*}=e_{0}$ . Now, we inductively take pair $p_{i}=(e_{i}^{\prime},e_{i}^{*})$ where $e_{i}^{*}$ is the last event to receive a message originating in $e_{i-2}^{*}\ldots e_{i-1}^{*}$ and $e_{i}^{\prime}$ is the first event to have sent such a message ( $e_{i}^{*}$ may receive more than one). We continue to select pairs until pair $p_{k^{\prime}}=(e_{k^{\prime}}^{\prime},e_{k^{\prime}}^{*})$ where $e_{k^{\prime}}^{*}$ is the last event of $E$ (note that this construction is possible since $E$ is covered).

The set $M=\{(e_{1}^{\prime},e_{1}^{*}),...,(e_{k^{\prime}}^{\prime},e_{k^{\prime}}^{*})\}$ clearly covers $E$ , implying that $E$ can be covered by $k^{\prime}$ message hops. We now show that there exists a time assignment $\overline{E}$ with $k^{\prime\prime}$ rounds in which $\lceil k^{\prime\prime}\rceil=k^{\prime}$ .
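The inductive selection of the pairs $p_{i}$ can be sketched as a greedy procedure. The code below is an illustrative rendering under assumptions: hops are index pairs `(send_index, receive_index)`, the execution is covered, and the name `greedy_cover` is hypothetical. In each step it looks at the messages sent in the window $e_{i-2}^{*}\ldots e_{i-1}^{*}$ , picks the last receive event as $e_{i}^{*}$ , and the first sender of a message received there as $e_{i}^{\prime}$ .

```python
from typing import List, Tuple

Hop = Tuple[int, int]  # (index of the send event, index of the receive event)


def greedy_cover(num_events: int, hops: List[Hop]) -> List[Hop]:
    """Select the pairs p_1, ..., p_k' described in the construction above."""
    last = num_events - 1
    cover: List[Hop] = []
    lo, hi = 0, 0  # window e*_{i-2} .. e*_{i-1}; for i = 1 it is just e_0
    while hi < last:
        window = [(s, r) for (s, r) in hops if lo <= s <= hi]
        e_star = max((r for (_, r) in window), default=hi)
        if e_star <= hi:
            # No message sent in the window is received later: not covered.
            raise ValueError("execution is not covered")
        # First event that sent a message received at e_star.
        e_prime = min(s for (s, r) in window if r == e_star)
        cover.append((e_prime, e_star))
        lo, hi = hi, e_star
    return cover


selected = greedy_cover(6, [(0, 2), (1, 3), (3, 5), (2, 4)])
```

On this example the procedure selects three hops, and the hop $(e_{1},e_{3})$ is skipped because a message sent in the same window reaches the later event $e_{4}$ .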

Consider the following time assignment:

• $t_{0}=t_{1}^{\prime}=t_{0}^{*}=0$
• $t_{1}^{*}=1$

Take the sub-sequence $E_{1}$ containing all events in $e_{0}^{*}...e_{1}^{*}$ except for $e_{0}^{*}$ . Note that $e_{2}^{\prime}$ appears in $E_{1}$ and by construction, every message originated in $E_{1}$ that is received in the execution is received before $e_{2}^{*}$ or at $e_{2}^{*}$ .

We now enumerate the events in $E_{1}$ in reverse order: $e_{1}^{*}$ is assigned $0$ , the event preceding $e_{1}^{*}$ receives $1$ , and so on until the first event of $E_{1}$ receives $n_{1}$ . Assign time to these events according to their enumeration $j_{1}$ as follows:

• $t_{j_{1}}=t_{1}^{*}-\epsilon_{1}j_{1}$ , where $0<\epsilon_{1}n_{1}<t_{1}^{*}$ if $n_{1}>0$ and $\epsilon_{1}=0$ otherwise.

We set $t_{2}^{*}=t_{1}^{*}-\epsilon_{1}n_{1}+1$ , so that every message hop originating from $E_{1}$ satisfies the upper bound on message delay.

In general, let $E_{i}$ be the sub-sequence containing all events in $e_{i-1}^{*}\ldots e_{i}^{*}$ except for $e_{i-1}^{*}$ . Enumerate the events in $E_{i}$ in reverse order: $e_{i}^{*}$ receives $0$ , the event preceding it receives $1$ , and so on until the first event in $E_{i}$ receives $n_{i}$ . Assign time to these events according to their enumeration $j_{i}$ as follows:

• $t_{j_{i}}=t_{i}^{*}-\epsilon_{i}j_{i}$ , where $0<\epsilon_{i}n_{i}<(t_{i}^{*}-t_{i-1}^{*})$ if $n_{i}>0$ and $\epsilon_{i}=0$ otherwise.

We set $t_{i}^{*}=t_{i-1}^{*}-\epsilon_{i-1}n_{i-1}+1$ . For simplicity, assume that every $n_{i}>0$ (in the case where some $n_{i}=0$ , we just set $t_{i}^{*}=t_{i-1}^{*}+1$ and the following analysis works analogously). From the time assignments above:

	$\displaystyle\textit{duration}(\overline{E})=k^{\prime\prime}=t_{k^{\prime}}^{*}-t_{0}^{*}$		(1)
	$\displaystyle t_{k^{\prime}}^{*}=k^{\prime}-(\epsilon_{1}n_{1}+...+\epsilon_{k^{\prime}-1}n_{k^{\prime}-1})$		(2)

With the following constraint, for all $i=1,...,k^{\prime}$ (we make $\epsilon_{0}n_{0}=0$ ):

	$\displaystyle 0<\epsilon_{i}n_{i}<1-\epsilon_{i-1}n_{i-1}$		(3)

Let us set $\epsilon=\epsilon_{1}=\epsilon_{2}=\ldots=\epsilon_{k^{\prime}-1}$ , and let $n_{max}=\max(n_{1},\ldots,n_{k^{\prime}-1})$ . The conditions in (3) can be satisfied by making:

	$\displaystyle\epsilon n_{max}<1-\epsilon n_{max}$		(4)
	$\displaystyle\epsilon<\frac{1}{2n_{max}}$		(5)

In order to make $\lceil k^{\prime\prime}\rceil=k^{\prime}$ , the difference between $k^{\prime}$ and $k^{\prime\prime}$ needs to be in the interval:

	$\displaystyle 0\leq(k^{\prime}-k^{\prime\prime})<1$		(6)

Thus,

	$\displaystyle k^{\prime}-k^{\prime\prime}=k^{\prime}-k^{\prime}+(\epsilon_{1}n_{1}+\ldots+\epsilon_{k^{\prime}-1}n_{k^{\prime}-1})<1$		(7)
	$\displaystyle(\epsilon n_{1}+\ldots+\epsilon n_{k^{\prime}-1})<1$		(8)
	$\displaystyle(\epsilon n_{1}+\ldots+\epsilon n_{k^{\prime}-1})\leq(k^{\prime}-1)\epsilon n_{max}$		(9)

By (9), to satisfy (8) it suffices that:

	$\displaystyle(k^{\prime}-1)\epsilon n_{max}<1$		(10)
	$\displaystyle\epsilon<\frac{1}{(k^{\prime}-1)n_{max}}$		(11)

From (5) and (11):

	$\displaystyle\epsilon<\min\left(\frac{1}{2n_{max}},\frac{1}{(k^{\prime}-1)n_{max}}\right)$		(12)

As long as inequality (12) is satisfied, the time assignments we have chosen guarantee that $\lceil k^{\prime\prime}\rceil=k^{\prime}$ . Because $k^{\prime\prime}\leq k$ and $k^{\prime}$ message hops cover $E$ for any time assignment, it also follows that $\lceil k\rceil=k^{\prime}$ .
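The bound (12) and the resulting duration can be checked numerically. The following sketch uses exact rational arithmetic and assumes, as in the simplified analysis above, a single $\epsilon$ and strictly positive counts $n_{1},\ldots,n_{k^{\prime}-1}$ ; the function names and the sample counts are hypothetical.

```python
import math
from fractions import Fraction
from typing import List


def epsilon_bound(n_counts: List[int]) -> Fraction:
    """Right-hand side of (12) for the counts n_1, ..., n_{k'-1}."""
    if not n_counts or min(n_counts) <= 0:
        raise ValueError("expects at least one count, all strictly positive")
    k_prime = len(n_counts) + 1
    n_max = max(n_counts)
    return min(Fraction(1, 2 * n_max), Fraction(1, (k_prime - 1) * n_max))


def duration(n_counts: List[int], eps: Fraction) -> Fraction:
    """k'' = t*_{k'} - t*_0 from equations (1) and (2), with all eps_i = eps."""
    k_prime = len(n_counts) + 1
    return k_prime - sum(eps * n for n in n_counts)


n_counts = [3, 1, 2]                 # hypothetical counts, so k' = 4
bound = epsilon_bound(n_counts)      # min(1/6, 1/9)
eps = bound / 2                      # any value strictly below the bound works
k_second = duration(n_counts, eps)   # 4 - (1/18) * (3 + 1 + 2)
```

With these counts, $k^{\prime\prime}=11/3$ and $\lceil k^{\prime\prime}\rceil=4=k^{\prime}$ , as the proof requires.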

Corollary C.6.

If a covered execution $E$ has $k$ CR rounds, then $\lceil k\rceil$ message hops are necessary and sufficient to cover $E$ .

The next results corroborate the equivalence between NTR and CR.

Theorem C.7.

Let $E$ be a finite covered execution. $E$ has $k$ rounds in the NTR metric iff $k$ message hops are necessary and sufficient to cover it.

Proof C.8.

Let events in $E$ be assigned rounds according to NTR, resulting in $k$ rounds. We can select a set $M$ of $k$ message hops as follows (sufficiency):

• Take the first pair $p_{1}=(e_{0}^{*},e_{1}^{*})$ ;
• Take pair $p_{i}=(e_{i}^{\prime},e_{i}^{*})$ , where $e_{i}^{*}$ is the last event of round $i$ , and $e_{i}^{\prime}$ is the first event from which a message is received in $e_{i}^{*}$ ( $e_{i}^{\prime}$ has to be an event of round $i-1$ ).

Since there are $k$ rounds, $M$ has $k$ message hops. It is also easy to see that $M$ covers $E$ .

Suppose that a set $M$ of $k^{\prime}$ message hops can cover $E$ with $k^{\prime}<k$ . If, for each pair $(e_{l},e_{m})$ in $M$ , either both events are assigned the same round or $e_{m}$ is assigned a round one higher than $e_{l}$ , then since $k^{\prime}<k$ , there would be an entire round that is not covered by any message hop. On the other hand, no pair $(e_{l}^{\prime},e_{m}^{\prime})$ can have $e_{m}^{\prime}$ assigned a round two (or more) higher than $e_{l}^{\prime}$ , by definition, since $e_{m}^{\prime}$ receives a message from $e_{l}^{\prime}$ (necessity).

Now let $k$ message hops be necessary and sufficient to cover $E$ , and assume that $E$ has $k^{\prime}$ NTR rounds. Then $k=k^{\prime}$ , since, as shown above, $k^{\prime}$ message hops are necessary and sufficient to cover $E$ .

Corollary C.9.

A finite covered execution has $k$ CR rounds iff it has $\lceil k\rceil$ NTR rounds.

C.2 Latency between arbitrary events

We generalize Definition C.3 to account for the time between any two events in an execution.

Definition C.10 (Event cover).

Let $E$ be a finite execution, $e_{i}$ and $e_{j}$ ( $j>i$ ) events in $E$ and $M$ a set of message hops from $E$ . We say that $M$ covers $(e_{i},e_{j})$ if $\textsf{interval}(e_{i}\ldots e_{j})\subseteq\textsf{interval}(M)$ . Analogously, we say that $(e_{i},e_{j})$ can be covered by $k$ message hops if there is a set $M$ of message hops that covers $(e_{i},e_{j})$ with $|M|=k$ .

Theorem C.11.

Let $E$ be a covered execution and $e_{i}$ and $e_{j}$ be events in $E$ . There are $k$ CR rounds in between $e_{i}$ and $e_{j}$ iff $\lceil k\rceil$ message hops are necessary and sufficient to cover them.

The proof of Theorem C.11 is similar to that of Theorem C.4 and is omitted (we can consider a covered sub-sequence of $E$ with $k$ rounds as a covered execution).

Theorem C.12.

Let $E$ be a covered execution and $e$ and $e^{\prime}$ be events in $E$ . If there are $k$ rounds in between $e$ and $e^{\prime}$ according to NTR (Definition 6.13) then $k$ message hops are necessary and sufficient to cover $e$ and $e^{\prime}$ .

Proof C.13.

Let $e$ be assigned round $0$ (as well as all previous events) and $e^{\prime}$ round $k$ . Take $e_{1}^{*}$ , the last event of round $1$ , and the earliest event $e_{1}^{\prime}$ from which $e_{1}^{*}$ received a message. Since $e_{1}^{*}$ receives a message from round $0$ , $e_{1}^{\prime}$ must be assigned round $0$ .

Inductively, take $e_{i}^{*}$ , the last event of round $i$ , and the earliest event $e_{i}^{\prime}$ from which $e_{i}^{*}$ receives a message. Since $e_{i}^{*}$ receives a message from round $i-1$ (by definition), $e_{i}^{\prime}$ must be assigned round $i-1$ .

Consider the set $M=\{(e_{1}^{\prime},e_{1}^{*}),\ldots,(e_{k}^{\prime},e_{k}^{*})\}$ . Clearly, $M$ covers $e$ and $e^{\prime}$ (sufficiency).

Now consider a set $M^{\prime}$ with $k^{\prime}$ message hops such that $M^{\prime}$ covers $e$ and $e^{\prime}$ . Since $M^{\prime}$ covers the two events, there must be a message hop whose first event (the sender event) is in round $0$ . This is true for any round up to $k-1$ : suppose that there is a round $i$ where no message hop in $M^{\prime}$ has the first event in round $i$ , then since $e,\ldots,e^{\prime}$ is covered, there exists a message originated from a previous round $j<i$ that is received in a round $l>i$ . But then $l\leq i+1$ by definition of the metric, a contradiction. Thus, $M^{\prime}$ includes at least one message hop for each round from $0$ to $k-1$ , so $k^{\prime}\geq k$ (necessity).

Corollary C.14.

Let $E$ be a covered execution and $e_{i}$ and $e_{j}$ be events in $E$ . There are $k$ CR rounds in between $e_{i}$ and $e_{j}$ iff there are $\lceil k\rceil$ rounds in between them according to NTR.

Finally, we prove Theorem 6.16, relating IRA to NTR.

C.3 Proof of Theorem 6.16

Let $E$ be a finite covered execution and suppose that all events of $E$ are assigned rounds according to IRA after all iterations of the algorithm. It holds that:

1. Round $0$ is composed only of $e_{0}$ (the initial event).
2. The final event of round $i+1$ is the last event to receive a message from round $i$ .

Proof C.15.

1. The case where $E$ has a single event is immediate. Next, we consider executions with more than one event. From the algorithm, $e_{0}^{*}=e_{0}$ ( $e_{0}^{*}$ does not change). Since $E$ is covered, there is at least one event which receives a message from $e_{0}$ . Let $e^{\prime}$ be the last such event. When the algorithm arrives at the iteration for $e^{\prime}$ , since the oldest message is from round $0$ (from $e_{0}$ ), all events after $e_{0}^{*}$ until $e^{\prime}$ are assigned round $1$ (line 62). Since no event can receive round $0$ in later iterations, $e_{0}$ is the only event remaining with round $0$ assigned.

2. As shown above, there is a single event in round $0$ . Let $e^{\prime}$ be the last event to receive a message from $e_{0}$ ; in its iteration, $e^{\prime}$ receives round $1$ . The events following $e^{\prime}$ (assuming $e^{\prime}$ is not the last event) might momentarily be assigned to round $1$ (if they do not receive any message, line 57), but since the execution is covered, there must be an event after $e^{\prime}$ that receives a message from $e_{1}\ldots e^{\prime}$ . Let $e^{\prime\prime}$ be the last such event. In its iteration, $e^{\prime\prime}$ is assigned round $2$ , and all events after $e^{\prime}$ (which is the last event $e_{1}^{*}$ is assigned to, line 63) also receive round $2$ . No later iteration can assign round $1$ to those events since no other event receives a message from round $0$ ; thus, $e^{\prime}$ is the last event to remain with round $1$ assigned.

Now assume that the final event $e_{i}^{*}$ of round $i$ is the last to receive a message from round $i-1$ , and that there is an event assigned to round $i+1$ . Suppose that $e^{*}$ , the last event to receive a message from round $i$ , is not the final event of round $i+1$ . Since $e^{*}$ receives a message from round $i$ but not from an event before round $i$ , it has to be assigned round $i+1$ , and $e_{i+1}^{*}$ is set to $e^{*}$ in line 63 of the algorithm. It follows that the final event of round $i+1$ comes after $e^{*}$ and receives no message from $e_{0},\ldots,e_{i}^{*}$ . Because the execution is covered, there must be at least one event after the final event of round $i+1$ that receives a message from round $i+1$ . Once more, consider the last such event: all events after $e_{i+1}^{*}$ until this event are assigned round $i+2$ , leaving $e^{*}$ as the final event of round $i+1$ .

Appendix D One-Shot Lattice Agreement

In the One-Shot Lattice Agreement problem, every process starts by proposing an initial value and terminates when it learns a value, such that Validity, Consistency and Liveness are satisfied (Section 3). In this section, we analyze the time complexity of one-shot LA protocols, as the abstraction can be used as a building block for implementing ASO [8, 17].

In every protocol execution, all processes start proposing a value simultaneously, i.e., in the initial event. We measure the time for all correct processes to learn a value, both in fault-free executions and in the worst case. In fault-free executions, all processes are correct and every message sent in the execution must arrive. On the other hand, in the worst case, there is a set of correct processes $P$ and a set of potentially faulty processes $F$ , where $P$ has $f+1$ processes and $F$ has $f$ processes. All messages exchanged within $P$ arrive, but this is not necessarily the case for exchanges within $F$ or between $P$ and $F$ .

We show that: 1) the protocol presented in [16] has a constant latency in fault-free runs, as opposed to the $O(n)$ complexity claimed in the paper. 2) With the conventional model of reliable channels assumed in this paper, the protocol of [17] has $\Omega(f)$ time complexity in the worst-case, as opposed to $O(\sqrt{f})$ when assuming their model. 3) [20]’s protocol has $\Omega(f)$ time complexity in the worst-case, which is not analyzed in their paper.

D.1 One-Shot Lattice Agreement by Faleiro et al. [16]

Figure 6 shows the one-shot LA description which was extracted from [16]. The protocol describes the roles of proposers and acceptors, but we assume that all processes perform both roles.

A proposer proceeds in rounds, where each round consists of sending a proposed value to every acceptor and waiting for replies from a majority of them. If all replies are acknowledgments, the process can learn the current proposed value. On the other hand, if there is a NACK with an unseen value, the proposer joins it with the previously proposed value and re-sends the result.

An acceptor stores the join of every proposed value it receives. When a proposal is received such that it contains all the stored values, the acceptor replies with an acknowledgment, otherwise it sends the stored values back to the proposer in a NACK message.
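The proposer and acceptor roles described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the code of [16]: the lattice is taken to be finite sets ordered by inclusion with union as join, a NACK carries the acceptor's stored join after it has been updated, message passing and majorities are left out, and the names `Acceptor` and `refine` are hypothetical.

```python
from typing import FrozenSet, List, Tuple

Value = FrozenSet[str]       # assumption: lattice of sets, join = union
Reply = Tuple[str, Value]    # ("ACK" or "NACK", value)


class Acceptor:
    """Stores the join of every proposed value it has received."""

    def __init__(self) -> None:
        self.accepted: Value = frozenset()

    def on_proposal(self, proposed: Value) -> Reply:
        if self.accepted <= proposed:
            # The proposal contains all stored values: acknowledge it.
            self.accepted = proposed
            return ("ACK", proposed)
        # Otherwise keep the join and send the stored values back.
        self.accepted = self.accepted | proposed
        return ("NACK", self.accepted)


def refine(proposed: Value, replies: List[Reply]) -> Tuple[bool, Value]:
    """Return (learned, value): learn on all ACKs, else join in the NACKs."""
    nack_values = [v for (kind, v) in replies if kind == "NACK"]
    if not nack_values:
        return True, proposed
    refined = proposed
    for v in nack_values:
        refined = refined | v
    return False, refined


acceptor = Acceptor()
r1 = acceptor.on_proposal(frozenset({"x"}))        # ACK: nothing stored yet
r2 = acceptor.on_proposal(frozenset({"y"}))        # NACK: "x" is missing
learned, next_proposal = refine(frozenset({"y"}), [r2])
r3 = acceptor.on_proposal(next_proposal)           # ACK for the refined value
```

A refined proposal is always at least as large as every NACK the proposer has seen, which is why replies that already contain all values lead to a learned value in the next round.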

Theorem D.1.

The one-shot LA protocol of [16] takes at most $6$ rounds in fault-free runs.

Proof D.2.

All processes start their proposal in the initial event, which is in round $0$. Regardless of the order of messages, by the last event of round $1$, every process will have received everyone's first proposal, and its local $\mathit{acceptedValue}$ is the join of all initial values. So every reply made from round $2$ onward will contain all values.

Consider a process $p$. Every process receives $p$'s proposal in round $1$ and replies, so $p$ refines its proposal (and proposes again) in round $2$ at the latest. If $p$ re-proposes in round $1$, let $Q$ be the set of processes from which $p$ receives the replies to this refined proposal. Then either no reply from $Q$ is made in round $2$ (all are made in round $1$), or some reply is made in round $2$.

In the first case, since all replies come from round $1$, the refinement and new proposal must happen in either round $1$ (in which case we come back to the situation above) or round $2$. In the second case, $p$ receives all values and re-proposes by round $3$ at the latest, and since the proposal contains all values, $p$ learns a value by round $5$ at the latest.

Now the only remaining case is when $p$ initiates a new proposal in round $2$ with some value missing. In this case all replies will be a join of all values, so $p$ will refine the proposal by round $4$ at the latest, learning a value by round $6$ at the latest.

D.2 One-Shot Lattice Agreement by Garg et al. [17, 18]

Garg et al. [17, 18] assume a stronger underlying reliable channel for communication than in this paper. In their papers, the channel is responsible for delivering a message sent from one process to another, thus messages sent by faulty processes (to correct ones) are guaranteed to arrive in an infinite execution. In the following, we analyze their protocol under the more conventional assumption that messages from faulty processes may never be received.

Figure 7 (extracted from [17]) shows a description of their one-shot LA. Every process $i$ maintains a local view array, where each position $j$ in the array contains the values $i$ received from $j$. At the start of the protocol, every process sends its initial value to everyone. Processes relay (i.e., execute the block in lines $5$ to $7$) every new value they receive from other processes, and can learn a value once their local view satisfies a predicate called equivalence quorum. Intuitively, the local view $V$ of process $i$ satisfies the predicate if there is a quorum in which $V[i]=V[j]$ for every process $j$ in the quorum.
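The intuition behind the predicate can be written as a short check. This is only a sketch: the function name `has_equivalence_quorum` is hypothetical, the quorum size is passed in explicitly, and we assume that process $i$ counts itself among the matching positions.

```python
def has_equivalence_quorum(view, i, quorum_size):
    """view[j] holds the values process i received from j (view[i] is i's own set).

    The predicate holds when at least quorum_size positions equal view[i].
    """
    matching = sum(1 for entry in view if entry == view[i])
    return matching >= quorum_size


# Three processes, quorum of two, evaluated at process 0.
agreeing = [{1, 2}, {1, 2}, {3}]
lagging = [{1, 2}, {1}, {3}]
ok_agreeing = has_equivalence_quorum(agreeing, 0, 2)
ok_lagging = has_equivalence_quorum(lagging, 0, 2)
```

In the first view, positions $0$ and $1$ coincide and form a quorum; in the second, only position $0$ matches itself.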

Theorem D.3.

The one-shot LA protocol of [17] takes at most $2$ rounds in fault-free runs.

Proof D.4.

Every process sends its initial value in round $0$. By the end of round $1$, every process has already received and relayed all other values, so that by the end of round $2$, all the local views contain every value, resulting in all processes learning a value.
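The argument can be checked on one particular schedule with a small simulation. The sketch below assumes a lock-step execution in which every message takes exactly one round, uses process identifiers as initial values, and does not model self-addressed messages; it illustrates the proof on that schedule only and does not replace it. The names `run_lock_step` and `learners` are hypothetical.

```python
def run_lock_step(n, rounds):
    """Simulate the relay protocol when every message takes exactly one round."""
    # view[i][j]: values process i received from j; view[i][i]: i's own values.
    view = [[set() for _ in range(n)] for _ in range(n)]
    in_flight = []  # (sender, receiver, value) triples sent in the last round
    # Round 0: every process records its initial value and sends it out.
    for i in range(n):
        view[i][i].add(i)
        in_flight += [(i, j, i) for j in range(n) if j != i]
    for _ in range(rounds):
        next_flight = []
        for sender, receiver, value in in_flight:
            view[receiver][sender].add(value)
            if value not in view[receiver][receiver]:
                # New value: adopt it and relay it to everyone else.
                view[receiver][receiver].add(value)
                next_flight += [(receiver, j, value)
                                for j in range(n) if j != receiver]
        in_flight = next_flight
    return view


def learners(view, quorum_size):
    """Processes whose local view contains an equivalence quorum."""
    return [i for i, row in enumerate(view)
            if sum(1 for entry in row if entry == row[i]) >= quorum_size]


after_round_1 = learners(run_lock_step(5, 1), 3)
after_round_2 = learners(run_lock_step(5, 2), 3)
```

With five processes and a quorum of three, no process can learn after round $1$ (each position $j \neq i$ still holds only the initial value of $j$), while every process can learn after round $2$.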

Theorem D.5.

The one-shot LA protocol of [17] has $\Omega(f)$ worst-case latency.

Proof D.6.

We proceed to build an execution that takes at least $f/2$ rounds to complete. Assume w.l.o.g. that the number of faulty processes is even. Split $F$ into two groups $A=\{l_{1},\ldots,l_{f/2}\}$ and $B=\{l_{f/2+1},\ldots,l_{f}\}$ with $f/2$ processes each. In round $0$ , every process sends its initial value to everyone.

[Round $1$ ] At the beginning of the round, the value $(x_{1},l_{1})$ from $l_{1}$ is received and relayed by every process in $B$ , as well as by a single correct process $l_{c}$ . All remaining values $(x_{i},l_{i})$ from processes in $A$ are received by a single process $l_{f/2+1}\in B$ , which relays them. Processes in $A$ crash just after $l_{f/2+1}$ receives their values, and no other process receives any message from them. At the end of the round, initial values from every non-crashed process (including those in $B$ ) are received and relayed by every non-crashed process.

[Round $2$ ] At the beginning, the first of the remaining values $(x_{2},l_{2})$ that $l_{f/2+1}$ relayed is received (and relayed) by every non-crashed process in $B$, as well as by $l_{c}$. A single process $l_{f/2+2}$ receives all other $f/2-2$ values from $l_{f/2+1}$ and relays them; then $l_{f/2+1}$ crashes and no other process receives messages from it. Finally, every non-crashed process receives the values relayed by other non-crashed processes in the previous round. By the end of round $2$, every non-crashed process has in its view $V[j]$ all initial values sent by non-crashed processes, but for the single correct process $l_{c}$ and all non-crashed processes in $B$, their position in the view also contains $(x_{1},l_{1})$. Therefore, no equivalence quorum exists in any local view.

[Round $i+1$ ] At the beginning, the first of the remaining values $(x_{i+1},l_{i+1})$ that $l_{f/2+i}$ relayed is received (and relayed) by every non-crashed process in $B$, as well as by $l_{c}$. A single process $l_{f/2+i+1}$ receives all other $f/2-(i+1)$ values from $l_{f/2+i}$ and relays them; then $l_{f/2+i}$ crashes and no other process receives messages from it. At the end of the round, every non-crashed process receives the values relayed by other non-crashed processes in the previous round, but messages from $l_{c}$ are received before any other. Every non-crashed process has in its view $V[j]$ all initial values sent by non-crashed processes, with the addition of $(x_{1},l_{1}),\ldots,(x_{i-1},l_{i-1})$, but for the single correct process $l_{c}$ and all remaining non-crashed processes in $B$, their position in the view also contains $(x_{i},l_{i})$. Note that, since messages from $l_{c}$ are received first, non-crashed processes receive and relay $(x_{i},l_{i})$ before forming an equivalence quorum for previous values, and no process is able to learn a value in this round.

We can use the above method to delay the execution by $f/2$ rounds.

D.3 One-Shot Lattice Agreement by Imbs et al. [20]

The protocol displayed in Figure 8 (extracted from [20]) solves the problem of Set-Constrained Delivery Broadcast (SCD-Broadcast). It can easily be adapted to solve one-shot LA by adding to the condition of line $17$ that the initial value must be in the output.

The authors use a FIFO broadcast primitive for forwarding messages, so in the following proofs we assume message channels to be FIFO. We say that a process relays a value when it executes line $11$ of the algorithm (it sends a newly received value to everyone). The fundamental blocks of the protocol include:

• Each process has a logical clock which ticks every time a new value is received and relayed (including its own initial value). The current clock value is attached to the relaying message (called a forward message).

• Each process stores a set of value views: an array of logical clock values, one position for each process.

• The following predicate needs to hold in order to output a set of values $O$: Let $A$ be the set of all received values and $V$ be the set of values received by a quorum. An output is a non-empty set $O\subseteq V$ satisfying: $\forall w\in O,\forall v\in A-O:$ there is a quorum in which each individual clock value for $w$ is smaller than the corresponding value for $v$.

Note that each process starts by sending its initial value to itself before relaying it to everyone. For simplicity, we consider these two actions to be in a single event (the initial event), where the first message sent to itself is ignored.
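The output predicate can be made concrete with a small check. The sketch below is an illustration under stated assumptions and not the code of [20]: a quorum is taken to be a strict majority of $n$ processes, a process that has not yet forwarded $v$ is treated as having clock value $+\infty$ for $v$, and the names `can_output` and `clocks` are hypothetical.

```python
import math


def can_output(candidate, clocks, n):
    """Check the output predicate for a candidate set O at one process.

    clocks[v][p] is the logical-clock value that process p attached to its
    forward message for value v, as received locally.
    """
    quorum = n // 2 + 1                                            # assumption
    received = set(clocks)                                         # the set A
    by_quorum = {v for v in received if len(clocks[v]) >= quorum}  # the set V
    if not candidate or not candidate <= by_quorum:
        return False
    for w in candidate:
        for v in received - candidate:
            # Processes that ordered w before v (missing forward = +infinity).
            before = sum(1 for p, c in clocks[w].items()
                         if c < clocks[v].get(p, math.inf))
            if before < quorum:
                return False
    return True


# Four processes: a0, a1 saw vA first; b0, b1 saw vB first, then vA.
clocks = {
    "vA": {"a0": 1, "a1": 1, "b0": 2, "b1": 2},
    "vB": {"b0": 1, "b1": 1},
}
only_a = can_output({"vA"}, clocks, 4)
both_early = can_output({"vA", "vB"}, clocks, 4)

# Later, a0 and a1 also forward vB (with clock value 2).
clocks["vB"].update({"a0": 2, "a1": 2})
both_late = can_output({"vA", "vB"}, clocks, 4)
```

In this view, `vA` alone cannot be output because only two processes order it before `vB`, and the pair cannot be output until forward messages for `vB` arrive from a quorum.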

Theorem D.7.

The SCD-Broadcast protocol in [20] takes at most $2$ rounds in fault-free runs.

Proof D.8.

In the first event (round $0$ ), every process forwards its own initial value to everyone. At the end of round $1$ , all processes have already received all initial values and relayed them. At the end of round $2$ , regardless of the order, all processes received all values from everyone. As a consequence $A-V=\emptyset$ in their local view, so all processes can output $V$ .

Theorem D.9.

The SCD-Broadcast protocol in [20] has $\Omega(f)$ worst-case latency.

Proof D.10.

We proceed to build an execution that takes at least $f/4$ rounds to complete. When we say that a process crashes at some point in the execution, the process no longer takes any more steps and no further messages are received from it unless explicitly stated.

Assume w.l.o.g. that the number of faulty processes is a multiple of $4$. Split $F$ into four groups with $f/4$ processes each: $A$, $B$, $C$ and $C^{\prime}$. In addition, split $P$ into two groups $D$ and $D^{\prime}$ with $f/2$ and $f/2+1$ processes respectively. In round $0$, every process FIFO broadcasts its initial value to everyone.

[Round 1] At the beginning of the round, a single process $f_{1}\in C$ receives and relays every value $v_{1},\ldots,v_{f/4}$ (in this order) from processes in $A$. All processes in $A$ then crash. Similarly, a single process $f_{1}^{\prime}\in C^{\prime}$ receives and relays every value $v_{1}^{\prime},\ldots,v_{f/4}^{\prime}$ (in this order) from processes in $B$, which then crash.
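As a quick size check for this split (a sketch assuming that $f$ is a multiple of $4$, that $|F|=f$ and $|P|=f+1$ as in the worst-case setting, and that a quorum has $f+1$ processes), the two sides used in the construction satisfy

$$|C\cup D|=\frac{f}{4}+\frac{f}{2}=\frac{3f}{4},\qquad |C^{\prime}\cup D^{\prime}|=\frac{f}{4}+\frac{f}{2}+1=\frac{3f}{4}+1<f+1\quad(f\geq 4).$$

For instance, with $f=8$ the two sides have $6$ and $7$ processes, while a quorum requires $9$. Crashes only shrink these sets, so neither side alone ever forms a quorum.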

Subsequently, processes in $D$ and all remaining non-crashed processes in $C$ receive $v_{1}$ from $f_{1}$ . Moreover, processes in $D^{\prime}$ and all remaining non-crashed processes in $C^{\prime}$ receive $v_{1}^{\prime}$ from $f_{1}^{\prime}$ . Note that no process in $C\cup D$ received $v_{1}^{\prime}$ and no process in $C^{\prime}\cup D^{\prime}$ received $v_{1}$ . Both $f_{1}$ and $f_{1}^{\prime}$ then crash.

At the end of the round, every initial value from non-crashed processes (sent in round $0$ ) is received and relayed by non-crashed processes.

[Round $i$ ( $i\geq 2$ )] At the beginning of the round, a single process $f_{i}\in C$ receives and relays $v_{i},\ldots,v_{f/4}$ from $f_{i-1}$ (resp. $f_{i}^{\prime}\in C^{\prime}$ receives and relays $v_{i}^{\prime},\ldots,v_{f/4}^{\prime}$ from $f_{i-1}^{\prime}$).

Subsequently, processes in $D$ and all remaining non-crashed processes in $C$ receive $v_{i}$ from $f_{i}$ (but not $v_{i}^{\prime}$). Processes in $D^{\prime}$ and all remaining non-crashed processes in $C^{\prime}$ receive $v_{i}^{\prime}$ from $f_{i}^{\prime}$ (but not $v_{i}$). Both $f_{i}$ and $f_{i}^{\prime}$ then crash. Finally, every remaining value sent in the previous round by non-crashed processes is received (and relayed if applicable).

Output conditions. We use $C\cup D$ (resp. $C^{\prime}\cup D^{\prime}$) to refer to the processes in $C\cup D$ (resp. $C^{\prime}\cup D^{\prime}$). By construction $|C\cup D|\leq|C^{\prime}\cup D^{\prime}|<f+1$. When $C\cup D$ receives $v_{1}$ in round $1$, it gives a clock value of $2$ to $v_{1}$ (similarly with $C^{\prime}\cup D^{\prime}$ and $v_{1}^{\prime}$). At the end of the round, $C\cup D$ receives other initial values from non-crashed processes, but not $v_{1}^{\prime}$, so $v_{1}^{\prime}$ is attributed a higher clock value later. This ensures that $v_{1}^{\prime}$ cannot be in the output without $v_{1}$, since $C\cup D$ never receives a quorum of forward messages for $v_{1}$ with clock value smaller than that of $v_{1}^{\prime}$. But all the forward messages received later for $v_{1}^{\prime}$ from $C^{\prime}\cup D^{\prime}$ have their clock values smaller for $v_{1}^{\prime}$ than for $v_{1}$, so $v_{1}$ also cannot be in the output without $v_{1}^{\prime}$.

In addition, $C\cup D$ receives $v_{i}$ before receiving $v_{i-1}^{\prime}$, attributing a smaller clock value to $v_{i}$. By the end of round $i$, $C\cup D$ has no quorum for which clock values are smaller for $v_{i-1}^{\prime}$ than for $v_{i}$, and thus cannot output $v_{i-1}^{\prime}$. This creates a chain of dependencies where $v_{i-1}$ cannot be in the output without $v_{i-1}^{\prime}$ (and vice versa), $v_{i-1}^{\prime}$ cannot be in the output without $v_{i}$, and $v_{i}$ cannot be in the output because not enough forward messages for $v_{i}$ are received in round $i$. Therefore, at the end of round $i$, $C\cup D$ (and $C^{\prime}\cup D^{\prime}$) is unable to output a value.

The execution described above can be extended for $f/4$ rounds.

Appendix E Atomic Snapshot Operations

The papers [16] and [20] present long-lived forms of the algorithms in Appendix D, which one can use to implement an ASO. In the following, we show that an ASO operation using [16] has constant amortized time complexity, and thus conjecture that it has $O(k)$ time complexity in the worst case. On the other hand, the latency of an ASO operation in [20] is $O(n)$ even in fault-free runs.

E.1 Atomic Snapshot by Faleiro et al. [16]

The Generalized Lattice Agreement (GLA) described in Figure 9 splits the roles of the processes into proposers, acceptors and learners. For our purpose, we assume that every process performs the three roles. In addition, we add Algorithm 5 on top of the GLA protocol to match the interface used in Algorithm 1.

110: Distributed objects:
111:   GLA instance (Figure 9)
112: operation Propose(v)
113:   ReceiveValue(v)
114:   wait until $v\sqsubseteq$ LearntValue()
115:   return LearntValue()

Algorithm 5 Bridge protocol for Generalized Lattice Agreement [16].
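The bridge can be sketched in a few lines. The sketch assumes set-valued lattice elements (so $\sqsubseteq$ is set inclusion) and a polling wait loop; `StubGLA` is a purely hypothetical stand-in that "learns" the received values on its third poll, so that the example runs without a real GLA instance.

```python
import time


class StubGLA:
    """Hypothetical stand-in for a GLA instance; learns on the third poll."""

    def __init__(self):
        self._received = frozenset()
        self._learnt = frozenset()
        self.polls = 0

    def receive_value(self, v):
        self._received |= v

    def learnt_value(self):
        self.polls += 1
        if self.polls >= 3:
            self._learnt = self._received
        return self._learnt


def propose(gla, v, poll_interval=0.0):
    """Bridge: hand v to the GLA instance and wait until a learnt value contains v."""
    gla.receive_value(v)
    while not v <= gla.learnt_value():   # wait until v is below LearntValue()
        time.sleep(poll_interval)
    return gla.learnt_value()


gla = StubGLA()
result = propose(gla, frozenset({"x"}))
```

A real implementation would more likely block on a notification from the learner role instead of polling; the loop above only mirrors the "wait until" line of the pseudocode.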

Theorem E.1.

Consider the ASO protocol built from the composition of Algorithms 1 and 5. An operation takes at most $16$ rounds to complete if, during its interval, no correct process receives a message from a faulty one.

Proof E.2.

A message sent by a correct process is received by every correct process, and if a message sent in round $r$ is received, it must be received in at most round $r+1$ (from the definition of the metric). Since no message from faulty processes is received in the interval of the operation, we consider only events performed by correct ones.

First, we show that once a process sends a proposal (line $24$ in Figure 9) for a value $v$ , all learners learn a value containing $v$ in at most $8$ rounds.

Let $e_{P}$ be the event where process $i$ first sends a proposal for $v$ , and let $0$ be the round assigned to it. By the end of round $1$ , every (correct) process will have received $v$ and joined it in acceptedValue, so that every NACK reply will now include $v$ . As a consequence, every value learned from a proposal (or refinement) made after round $1$ must contain $v$ .

Suppose that some process already learned a value containing $v$ by the end of round $2$ , then it received ACKs for this value from a majority of processes (which are correct). Every learner (thus, every process) receives the same ACKs within one round at most and is able to learn the same value.

Now, if no process has already learned a value containing $v$ , consider the InternalReceive( $v$ ) message which is sent before the proposal. By the end of round $1$ , every process has received the message and added $v$ to its buffer, and since no process had learned $v$ by the end of round $2$ , every process must be proposing (i.e. status = active).

Let $V$ be the set of all active proposals at the end of round $2$. Then by the end of round $3$ every acceptor will have received every value in $V$ and added it to acceptedValue, so every reply made from round $4$ onward will contain all current values. If a process refines its proposal in round $5$, then it must have received at least one reply containing all values for the previous proposal, so by round $6$ at the latest all acceptors reply with $ACK$ and all learners learn a value by round $7$ at the latest.

Now consider the case where a process $j$ refines its proposal in round $4$. It may happen that the refined proposal still misses a value, in which case $j$ refines again in round $6$ (at the latest), and this next proposal is guaranteed to include all values. Thus, all processes acknowledge the proposal by round $7$ at the latest and all learners are able to learn a value by round $8$.

Let $e_{C}$ be the application call event received at a process $i$ , $e_{R}$ its return event, and $v$ the value received for the operation. If $i$ is already active, it first buffers $v$ and waits until the current active proposal finishes before sending a proposal for $v$ . Consider the worst case where $e_{C}$ happens just after $i$ started a new active proposal. As previously shown, it takes at most $8$ rounds until $i$ can propose a new value from bufferedValues again, and once it proposes $v$ it can take another $8$ rounds at most to learn a value with it. In total, from the call event to the return event, there can be at most $16$ rounds.

Corollary E.3.

Algorithms 1 and 5 together have an amortized time complexity of $16$ rounds.

E.2 Atomic Snapshot by Imbs et al. [20]

Imbs et al. [20] use operations of the SCD-Broadcast protocol to implement atomic snapshot. As such, in the proof of Theorem E.4 we build an execution that takes $\Omega(n)$ rounds for a process to output a value in the SCD-Broadcast protocol, implying the same time complexity for a snapshot operation.

Figure 10 (extracted from [20]) shows the algorithm for MWMR ASO using SCD-Broadcast. The main difference from the SWMR implementation is the addition of line $3$, which includes a “read” phase before updating the array and thus requires two SCD-Broadcast operations instead of one. As we only consider SWMR ASO implementations, we assume that the only operations in the executions are snapshots (which are unchanged and require a single SCD-Broadcast operation).

Theorem E.4.

A snapshot operation in [20]’s protocol can take $\Omega(n)$ rounds in fault-free runs.

Proof E.5.

First consider an execution of SCD-Broadcast with an even number of processes. We build an execution where an operation takes $n$ rounds to complete. We split the system into two groups $A$ and $B$ with $n/2$ processes each. Note that neither $A$ nor $B$ alone forms a quorum. We also write $A$ or $B$ to refer to all processes in $A$ or $B$. In the execution below, every time a process in $A$ (resp. $B$) relays a value (sends a forward message for it), all the processes in $A$ (resp. $B$) receive it immediately after.

[Round $0$ ] A single process $a_{0}\in A$ sends a forward message with $v_{0}^{A}$ to everyone.

[Round $1$ ] At the beginning of the round, $A$ receives $v_{0}^{A}$ and forwards the value. Subsequently, a process $b_{0}\in B$ sends a new forward message for value $v_{0}^{B}$, which is received and relayed right away by $B$ (before $v_{0}^{A}$). At the end of the round, $B$ then receives $v_{0}^{A}$ from $A$ and relays it, but although there is a quorum for $v_{0}^{A}$, no quorum has each clock assignment for $v_{0}^{A}$ smaller than that of $v_{0}^{B}$, thus $B$ cannot output $v_{0}^{A}$ without $v_{0}^{B}$. Since there is no quorum of replies for $v_{0}^{B}$, $B$ cannot output.

[Round $2\cdot i$ ] At the beginning of the round, a new process $a_{i}\in A$ sends forward with $v_{i}^{A}$ , received right away (before $v_{i-1}^{B}$ from $B$ ) by $A$ , which relays it. Subsequently, $A$ receives $v_{i-1}^{B}$ and relays it. $A$ is unable to output $v_{i-1}^{A}$ without $v_{i-1}^{B}$ since $B$ assigned a smaller clock value to $v_{i-1}^{B}$ , and is unable to output $v_{i-1}^{B}$ without $v_{i}^{A}$ since it assigned a smaller clock value to $v_{i}^{A}$ , and there is no quorum of replies received for $v_{i}^{A}$ .

[Round $2\cdot i+1$ ] At the beginning, a process $b_{i}\in B$ sends forward with $v_{i}^{B}$ , received right away by $B$ (before $v_{i}^{A}$ ) which relays it. Subsequently, $B$ receives $v_{i}^{A}$ and the reply for $v_{i-1}^{B}$ from $A$ in this order. But $B$ cannot output $v_{i-1}^{B}$ without $v_{i}^{A}$ , since there is no quorum assigning a smaller clock value to $v_{i-1}^{B}$ than to $v_{i}^{A}$ . But $v_{i}^{A}$ cannot be in the output without $v_{i}^{B}$ either, for which $B$ does not have a quorum of replies. $B$ is therefore unable to output.

Using the steps above we can delay the execution up to $n$ rounds. When the number of processes is odd, we split the system into $3$ groups, $A$, $B$ and $C$, and proceed in a similar fashion as above for $A$ and $B$, but a new process from $C$ now has to initiate a new value at the beginning of every round in order to delay the execution. This construction can delay the execution up to $n/3$ rounds. The execution proceeds as follows:

[Round $0$ ] A single process $a_{0}\in A$ sends a forward message with $v_{0}^{A}$ to everyone. A single process $c_{0}\in C$ sends a forward message with $v_{0}^{C}$ to everyone.

[Round $1$ ] At the beginning of the round, $A$ receives $v_{0}^{A}$ and forwards the value, and $C$ receives $v_{0}^{C}$ and relays it. Subsequently, a process $b_{0}\in B$ sends a new forward message for value $v_{0}^{B}$, which is received and relayed right away by $B$ (before $v_{0}^{A}$ or $v_{0}^{C}$). Also, another process $c_{1}\in C$ sends a forward message with $v_{1}^{C}$ to everyone, which $C$ receives and forwards immediately after. At the end of the round, $B$ receives $v_{0}^{A}$ from $A$ and $v_{0}^{C}$ from $C$ and relays them, but although there is a quorum for $v_{0}^{A}$ and $v_{0}^{C}$, no quorum has each clock assignment for $v_{0}^{A}$ or $v_{0}^{C}$ smaller than that of $v_{0}^{B}$, thus $B$ cannot output $v_{0}^{A}$ and $v_{0}^{C}$ without $v_{0}^{B}$. Since there is no quorum of replies for $v_{0}^{B}$, $B$ cannot output. Similarly, $A$ receives $v_{0}^{C}$ and $C$ receives $v_{0}^{A}$, but they cannot output.

[Round $2\cdot i$ ] At the beginning of the round, a new process $a_{i}\in A$ sends forward with $v_{i}^{A}$ , received right away (before $v_{i-1}^{B}$ from $B$ and $v_{2\cdot i-1}^{C}$ from $C$ ) by $A$ , which relays it. Similarly, a process $c_{2\cdot i}\in C$ sends forward with $v_{2\cdot i}^{C}$ before $C$ receives $v_{i-1}^{B}$ from $B$ .

Subsequently, $A$ receives $v_{i-1}^{B}$ and $v_{2\cdot i-1}^{C}$ and relays them. $A$ is unable to output $v_{i-1}^{A}$ without $v_{i-1}^{B}$ or $v_{2\cdot i-1}^{C}$ since $B$ assigned a smaller clock value to $v_{i-1}^{B}$ and $C$ assigned a smaller clock value to $v_{2\cdot i-1}^{C}$ , and is unable to output $v_{i-1}^{B}$ and $v_{2\cdot i-1}^{C}$ without $v_{i}^{A}$ since it assigned a smaller clock value to $v_{i}^{A}$ , and there is no quorum of replies received for $v_{i}^{A}$ . Moreover, $C$ receives $v_{i-1}^{B}$ from $B$ . $C$ cannot output $v_{2\cdot i-2}^{C}$ without either $v_{i-1}^{A}$ or $v_{i-1}^{B}$ because $A$ assigned a smaller value to $v_{i-1}^{A}$ and $B$ assigned a smaller value to $v_{i-1}^{B}$ . But both cannot be output without $v_{2\cdot i-1}^{C}$ , which was assigned a smaller value, and there is no quorum for $v_{2\cdot i-1}^{C}$ at $C$ .

[Round $2\cdot i+1$ ] At the beginning, a process $b_{i}\in B$ sends a forward message with $v_{i}^{B}$, received right away by $B$ (before $v_{i}^{A}$ or $v_{2\cdot i}^{C}$), which relays it. A process $c_{2\cdot i+1}\in C$ sends a forward message with $v_{2\cdot i+1}^{C}$ before receiving $v_{i}^{A}$.

Subsequently, $B$ receives $v_{i}^{A}$ and $v_{2\cdot i}^{C}$ as well as the replies for $v_{i-1}^{B}$ from $A$ and $C$ in this order. But $B$ cannot output $v_{i-1}^{B}$ without $v_{i}^{A}$ or $v_{2\cdot i}^{C}$, since $A$ and $C$ assigned smaller clock values to $v_{i}^{A}$ and $v_{2\cdot i}^{C}$ respectively. But neither $v_{i}^{A}$ nor $v_{2\cdot i}^{C}$ can be in the output without $v_{i}^{B}$, for which $B$ does not have a quorum of replies. Now, $C$ receives $v_{i}^{A}$ and both replies for $v_{2\cdot i-1}^{C}$ from $B$ and $A$ in this order. But $A$ assigned a smaller value to $v_{i}^{A}$ and $B$ also to $v_{i-1}^{B}$, and neither can be output without $v_{2\cdot i}^{C}$, for which $C$ has no quorum.

Corollary E.6.

A snapshot operation in [20]’s protocol can take $\Omega(n)$ rounds in the worst-case.

Appendix A Protocol with $O(n^{2})$ message complexity per request