Exploiting a Natural Network Effect for Scalable, Fine-grained Clock Synchronization
Introduction
Uses of synchronized clocks
- consistency
- event ordering
- causality
- scheduling of tasks and resources
Related papers:
- LAMPORT, L. Time, clocks, and the ordering of events in a distributed system. Communications of the ACM 21, 7 (1978), 558–565.
- LISKOV, B. Practical uses of synchronized clocks in distributed systems. Distributed Computing 6, 4 (1993), 211–219.
Synchronized clocks are useful in many fields: finance and e-commerce, distributed databases, software-defined networking (SDN), and congestion control.
For some of these systems, clock accuracy directly affects performance.
Challenges for high-precision synchronized clocks
- Clock uncertainty is comparable to the network's propagation delay. Commonly used clocks (driven by quartz crystal oscillators) can drift from true time at a rate of 6-10 μs per second, while the one-way delay (OWD), defined as the raw propagation (zero-queuing) time between sender and receiver, is under 10 μs in high-performance data centers. A clock left unsynchronized for just one second can therefore accumulate an error comparable to the entire OWD.
- Path noise. Path noise (from small fluctuations in switching times, path asymmetries such as cables of different lengths, and clock timestamp noise) is on the order of 10s-100s of ns and is hard to measure, which makes nanosecond-level clock synchronization difficult.
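To put the first challenge in numbers, here is a small Python sketch that computes how long a free-running clock can go before its accumulated drift reaches one OWD. The 6-10 μs/s drift range and the 10 μs OWD come from the text above; the function names are hypothetical, and a constant drift rate is a simplifying assumption.

```python
def accumulated_drift_us(drift_us_per_s: float, elapsed_s: float) -> float:
    """Clock error (us) after elapsed_s seconds, assuming a constant drift rate."""
    return drift_us_per_s * elapsed_s


def seconds_until_error_exceeds(drift_us_per_s: float, bound_us: float) -> float:
    """Time (s) before a free-running clock's error reaches bound_us."""
    return bound_us / drift_us_per_s


OWD_US = 10.0  # upper bound on one-way delay in a high-performance data center

for drift in (6.0, 10.0):  # drift range quoted for quartz oscillators
    t = seconds_until_error_exceeds(drift, OWD_US)
    print(f"drift {drift:>4} us/s: error reaches one OWD after {t:.2f} s")
```

With these numbers, an unsynchronized quartz clock reaches an error equal to the whole OWD after only about 1-1.7 s.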
Current limitation
Current limitation: existing approaches force a trade-off between ease of deployment and precision.
Huygens (this paper)
Huygens (this paper) achieves precision in the 10s of nanoseconds and is easy to deploy.
Literature survey
Two methods for determining the OWD:
- Measuring the time the probe spends at each element en route from A to B.
- Estimating the round-trip time (RTT), where B sends a probe back to A. Assuming the OWD is equal in both directions, halving the estimated RTT gives the OWD.
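The RTT method can be sketched in Python as follows. It uses four timestamps (send and receive on each side, in the style of NTP), subtracts the time the probe is held at B, and halves the remainder. The function name and the simulated delays are illustrative assumptions; the example also shows that when the two directions differ, the estimate is off by half the asymmetry.

```python
def estimate_owd(t1: float, t2: float, t3: float, t4: float) -> float:
    """Estimate the one-way delay by halving the round-trip time.

    t1: A sends the probe (A's clock)   t2: B receives it (B's clock)
    t3: B sends the echo (B's clock)    t4: A receives the echo (A's clock)
    Subtracting B's hold time (t3 - t2) cancels the unknown clock offset.
    """
    rtt = (t4 - t1) - (t3 - t2)
    return rtt / 2


# Simulated exchange in microseconds; B's clock is 3.0 ahead of A's.
offset, owd_ab, owd_ba, hold = 3.0, 4.0, 6.0, 1.0
t1 = 100.0
t2 = t1 + owd_ab + offset   # B timestamps the arrival on its own clock
t3 = t2 + hold              # B holds the probe before echoing it
t4 = t3 - offset + owd_ba   # echo arrives, timestamped on A's clock
est = estimate_owd(t1, t2, t3, t4)
print(est)  # 5.0: halfway between the true OWDs 4.0 and 6.0
```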
NTP
Sends multiple probe-echo pairs and uses the three with the smallest RTTs.
Precision: 10s of ms in a WAN, 10s of μs in a data center network (DCN).
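The min-RTT filtering idea can be illustrated with a simplified Python sketch: compute the RTT and clock offset of each probe-echo pair, keep the three pairs with the smallest RTTs (the ones least likely to have met queueing), and average their offsets. This is only an illustration of the idea, not NTP's actual clock-filter algorithm; the function names and data are hypothetical.

```python
from statistics import mean
from typing import List, Tuple

Exchange = Tuple[float, float, float, float]  # (t1, t2, t3, t4), NTP-style timestamps


def rtt(ex: Exchange) -> float:
    t1, t2, t3, t4 = ex
    return (t4 - t1) - (t3 - t2)


def offset(ex: Exchange) -> float:
    """Offset of B's clock relative to A's, assuming symmetric delays."""
    t1, t2, t3, t4 = ex
    return ((t2 - t1) + (t3 - t4)) / 2


def filtered_offset(exchanges: List[Exchange], keep: int = 3) -> float:
    """Average the offsets of the `keep` exchanges with the smallest RTTs."""
    best = sorted(exchanges, key=rtt)[:keep]
    return mean(offset(ex) for ex in best)


# True offset is 3.0 us; two exchanges hit queueing in one direction.
exchanges = [
    (0.0, 8.0, 9.0, 11.0),
    (100.0, 108.0, 109.0, 111.0),
    (200.0, 208.0, 209.0, 211.0),
    (300.0, 328.0, 329.0, 331.0),  # queued on the way to B
    (400.0, 408.0, 409.0, 451.0),  # queued on the way back to A
]
print(filtered_offset(exchanges))              # 3.0
print(mean(offset(ex) for ex in exchanges))    # 1.0 (naive average is skewed)
```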
PTP
PTP-enabled switches record the ingress and egress time of each packet to accurately measure its dwell time in the switch.
Precision depends on the deployment:
- Advanced hardware and a dedicated network → < 1 ns
- Conventional, fully “PTP-enabled” network → 10s-100s of ns
- Network that is not fully “PTP-enabled” → about 1000x worse, 10s-100s of μs
- High network load → performance degradation
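The idea of subtracting switch dwell times can be sketched in Python: if every switch on the path reports when a packet entered and left it, the time spent inside switches can be removed from the measured end-to-end delay. The function name and the (ingress, egress) representation are hypothetical; real PTP transparent clocks instead accumulate the residence time in a correction field of the PTP message.

```python
from typing import List, Tuple


def link_delay(tx: float, rx: float, dwell: List[Tuple[float, float]]) -> float:
    """Remove switch residence times from a measured delay.

    tx, rx: send/receive timestamps, assumed here to share a timebase.
    dwell:  (ingress, egress) timestamps recorded by each switch on the path.
            Each pair comes from one switch's own clock, so only the
            difference egress - ingress is meaningful.
    """
    residence = sum(egress - ingress for ingress, egress in dwell)
    return (rx - tx) - residence


# Two switches with unsynchronized clocks; times in microseconds.
print(link_delay(0.0, 12.0, [(1.0, 3.5), (500.0, 504.5)]))  # 5.0
```

Because only per-switch differences are used, the switches need not agree on absolute time for this step. A switch that is not PTP-enabled contributes no dwell pair, so its queueing stays hidden in the measurement, which helps explain the precision loss listed above.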
DTP
Uses the synchronization mechanism of the Ethernet physical layer (PHY) defined in IEEE 802.3. It is fine-grained and not load-dependent.
On 10 Gbps Ethernet, it achieves a precision of 25.6 ns (4 × 6.4 ns) for a single hop.
Requires special extra hardware.
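The DTP figure is plain arithmetic: 25.6 ns is four ticks of a 6.4 ns clock, and 6.4 ns is the period of a 156.25 MHz clock. The Python sketch below reproduces it in integer picoseconds to avoid floating-point rounding. The constant and function names are hypothetical, and extending the bound to several hops assumes it adds up linearly per hop, an assumption made here for illustration rather than a figure from the text.

```python
TICK_PS = 6_400      # one clock tick: 6.4 ns = 1 / 156.25 MHz
TICKS_PER_HOP = 4    # single-hop bound quoted in the text


def dtp_bound_ps(hops: int) -> int:
    """Precision bound in picoseconds, ASSUMING linear growth with hop count."""
    return TICKS_PER_HOP * TICK_PS * hops


print(dtp_bound_ps(1) / 1000)  # 25.6 (ns), the single-hop figure
```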
PPS
Uses a pulse-per-second (PPS) signal, e.g. from GPS; all communication delays are precisely measured.
Very expensive to deploy at scale.
Our approach
Data center features
- Symmetric, multi-level, fat-tree topology.
- Propagation times are small, well bounded by 25-30 μs. Abundant bisection bandwidth and multiple paths give probes a reasonably good chance of traversing the network without encountering queueing delays (really?).
- Many servers → possible to synchronize them in concert.
Algorithms and techniques
- Coded probes. A coded probe is a pair of probe packets sent from server i to server j with a small inter-transmission spacing s. Only coded probes whose packets still arrive spaced by s are kept (“pure” coded probes); the rest are discarded.
- Support Vector Machines (SVM).
- Network effect. From Wikipedia: “A network effect (also called network externality or demand-side economies of scale) is the effect described in economics and business that an additional user of a good or service has on the value of that product to others.” In Huygens, the analogue is that synchronizing many servers in concert improves the synchronization of each one.
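A filter for pure coded probes might look like the Python sketch below: a coded probe is kept only if its two packets arrive spaced by (almost) the same s they were sent with, since a changed spacing suggests that one packet met queueing or other noise. The tolerance `eps` and all names here are hypothetical parameters chosen for illustration, not values from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CodedProbe:
    tx1: float  # send time of the first packet (sender's clock)
    tx2: float  # send time of the second packet, tx2 = tx1 + s
    rx1: float  # receive time of the first packet (receiver's clock)
    rx2: float  # receive time of the second packet


def is_pure(p: CodedProbe, eps: float) -> bool:
    """True if the receive spacing matches the send spacing within eps."""
    sent_gap = p.tx2 - p.tx1
    recv_gap = p.rx2 - p.rx1
    return abs(recv_gap - sent_gap) <= eps


def pure_probes(probes: List[CodedProbe], eps: float) -> List[CodedProbe]:
    return [p for p in probes if is_pure(p, eps)]


# s = 1.0 us for every probe; times in microseconds.
probes = [
    CodedProbe(0.0, 1.0, 7.0, 8.0),      # spacing preserved -> pure
    CodedProbe(10.0, 11.0, 17.0, 19.5),  # second packet delayed -> impure
    CodedProbe(20.0, 21.0, 28.0, 28.5),  # first packet delayed, gap shrank -> impure
]
print(len(pure_probes(probes, eps=0.01)))  # 1
```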
Clocks in the real world
The following is an illustration of a coded probe (a pair of probe packets).
Let Δ be the offset between the clocks of server A and server B: if A's clock reads t, then B's clock reads t + Δ.
With TX_A and RX_B the send and receive timestamps of a probe from A to B, and tx_B and rx_A those of a probe from B to A (each taken on the local clock), we have

$$
\begin{cases}
RX_B = TX_A + \text{OWD}_{A \rightarrow B} + \Delta \\
rx_A = tx_B + \text{OWD}_{B \rightarrow A} - \Delta
\end{cases}
$$
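Adding and subtracting the two equations separates the unknowns. In the derivation below, Δ̂ denotes the offset estimate obtained by assuming, as the RTT method does, that the one-way delays are equal in both directions; the last line shows that any asymmetry biases this estimate by half the difference between the two OWDs.

$$
\begin{aligned}
(RX_B - TX_A) + (rx_A - tx_B) &= \text{OWD}_{A \rightarrow B} + \text{OWD}_{B \rightarrow A} \\
(RX_B - TX_A) - (rx_A - tx_B) &= \text{OWD}_{A \rightarrow B} - \text{OWD}_{B \rightarrow A} + 2\Delta \\
\Rightarrow\quad \hat{\Delta} &= \tfrac{1}{2}\big[(RX_B - TX_A) - (rx_A - tx_B)\big] \\
&= \Delta + \tfrac{1}{2}\big(\text{OWD}_{A \rightarrow B} - \text{OWD}_{B \rightarrow A}\big)
\end{aligned}
$$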