Sensitivities for Guiding Refinement in Arbitrary-Precision Arithmetic
by
Jesse Michel
B.S., Massachusetts Institute of Technology (2019)
Submitted to the Department of Electrical Engineering and Computer
Science
in partial fulfillment of the requirements for the degree of
Master of Engineering in Electrical Engineering and Computer Science
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
May 2020
© Massachusetts Institute of Technology 2020. All rights reserved.
Author:
Department of Electrical Engineering and Computer Science
May 18, 2020
Certified by:
Michael Carbin
Jamieson Career Development Assistant Professor
of Electrical Engineering and Computer Science
Thesis Supervisor
Accepted by:
Katrina LaCurts
Chair, Master of Engineering Thesis Committee
Sensitivities for Guiding Refinement in Arbitrary-Precision
Arithmetic
by
Jesse Michel
Abstract
Programmers often develop and analyze numerical algorithms assuming that they operate on
real numbers, but implementations generally use floating-point approximations. Arbitrary-
precision arithmetic enables developers to write programs that operate over reals: given an
output error bound, the program will produce a result within that bound. A key drawback
of arbitrary-precision arithmetic is its speed. Fast implementations of arbitrary-precision
arithmetic use interval arithmetic (which provides a lower and upper bound for all vari-
ables and expressions in a computation) computed at successively higher precisions until
the result is within the error bound. Current approaches refine computations at precisions
that increase uniformly across the computation rather than changing precisions per-variable
or per-operator. This thesis proposes a novel definition and implementation of derivatives
through interval code that I use to create a sensitivity analysis. I present and analyze the
critical path algorithm, which uses sensitivities to guide precision refinements in the compu-
tation. Finally, I evaluate this approach empirically on sample programs and demonstrate
its effectiveness.
Acknowledgments
I thank my advisor Michael Carbin. He helped guide the intuition and motivation that
shaped this thesis and provided useful feedback and guidance on the experimental results. I
would also like to thank Ben Sherman for helping to develop technical aspects of this thesis,
for making the time to review my writing, and for his guidance throughout the research
process. Alex Renda, Rogers Epstein, Stefan Grosser, and Nina Thacker provided useful
feedback. I am grateful for the financial support that I have received from NSF grant CCF-
1751011.
I thank my parents, sisters, and extended family for their love and support and my
nephew Joseph for being a shining light in my life.
Contents

1 Introduction
1.1 Motivating example
1.2 Thesis
1.3 Outline

4 Automatic Differentiation of Interval Arithmetic
4.1 Introduction to automatic differentiation
4.2 Automatic differentiation on intervals
4.2.1 Derivative of interval addition
4.2.2 Derivative of interval multiplication
4.2.3 Derivative of interval sine
4.3 Analysis

5 Results
5.1 Schedules
5.1.1 Baseline schedule
5.1.2 Critical path schedule
5.2 Empirical comparison
5.2.1 Improving a configuration
5.2.2 Improving a schedule
5.3 Implementation

6 Related Work
6.1 Mixed-precision tuning and sensitivity analysis
6.2 Arbitrary-precision arithmetic
6.2.1 Pull-based approaches
6.2.2 Push-based approaches

8 Conclusions

List of Figures

2-1 The four key monotonic regions for the definition of interval sine.
2-2 A simple Python implementation of interval sin.

List of Tables
Chapter 1
Introduction
Floating-point computations can produce arbitrarily large errors. For example, Python
implements the IEEE-754 standard, which produces the following behavior for 64-bit floating-
point numbers:
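>>> 1 + 1e17 - 1e17
0.0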
The result of this computation is 0 instead of 1! This leads to an arbitrarily large error
in results; for example, (1 + 1e17 − 1e17) · 𝑥 will always be 0 instead of 𝑥. Resilience to
numerical-computing error is especially desirable for safety-critical software such as control
systems for vehicles, medical equipment, and industrial plants, which are known to produce
incorrect results because of numerical errors [12].
In contrast to floating-point arithmetic, arbitrary-precision arithmetic computes a result
within a given error bound. Concretely, given the function 𝑦 = 𝑓 (𝑥) and an error bound 𝜖,
arbitrary-precision arithmetic produces a result $\tilde{y}$ such that

$$|\tilde{y} - y| < \epsilon.$$
Consider the example of representing $\pi$ to 5 mantissa bits:

$$11.001_2 = \underbrace{11001_2}_{\text{mantissa}} \times 2^{\overbrace{-3}^{\text{exponent}}}.$$
Since the exponent adjusts automatically, I focus on setting the number of mantissa bits for
the variables and operators in the computation. For the rest of the thesis, bits of precision
will denote mantissa bits.
More realistically, suppose that the function $+_p : \mathbb{R} \times \mathbb{R} \to \mathbb{R}^2$ for bounded addition at precision $p$ is given. Then, $\underline{+}_p$ and $\overline{+}_p$ compute the lower and upper bound for adding inputs truncated to precision $p$. They satisfy the property that for all $a, b \in \mathbb{R}$, $(a \mathbin{\underline{+}_p} b) \le (a + b) \le (a \mathbin{\overline{+}_p} b)$, where $+$ is exact and where, as $p \to \infty$, the inequality becomes an equality. Assuming error in addition,

$$[1, 2] \oplus_p [3, 4] = [1 \mathbin{\underline{+}_p} 3,\ 2 \mathbin{\overline{+}_p} 4],$$
which will always have a lower bound ≤ 4 and an upper bound ≥ 6. Computing constants
such as 𝜋 or 𝑒 makes the need for this type of approximation clearer since it requires infinite
space to represent them exactly (since 𝜋 and 𝑒 are transcendental). However, they are
soundly computed using arbitrary-precision arithmetic.
1.1 Motivating example

Consider evaluating

$$e + k\pi, \qquad k = 1000,$$

to a generous error bound of 500. Existing approaches refine precision uniformly across variables and operators [30, 25]. In the best case scenario, these approaches require 5 mantissa bits for ⊕, $e$, ⊗, and $\pi$ (since $k$ is a constant, it remains at a fixed precision). Note that ⊕ and ⊗ are the addition and multiplication operators over intervals respectively, described in detail in Chapter 2. An example of this computation is shown in Figure 1-1.

[Figure 1-1: A computation graph whose root ⊕ evaluates to the interval [3070, 3460]. The figure presents a computation graph evaluated at a uniform precision of 5 mantissa bits (except for the constant $k$) with an error of 3460 − 3070 = 390.]

Suppose the
sensitivities are implemented with automatic differentiation through the interval code, which is novel as well. More explicitly, if the output is the interval $y = [\underline{y}, \overline{y}]$, then for each interval $x = [\underline{x}, \overline{x}]$, the derivatives will be $\left(\frac{\partial(\underline{y}-\overline{y})}{\partial \underline{x}},\ \frac{\partial(\underline{y}-\overline{y})}{\partial \overline{x}}\right)$, as shown in Figure 1-2. Note that the
parentheses in the figure denote pairs of numbers (tuples), not open intervals.

[Figure 1-2: The computation graph annotated with per-interval derivative pairs; the root ⊕ is annotated with (1, −1).]

The sensitivity is the difference between the derivative of the output with respect to the lower bound and the derivative of the output with respect to the upper bound, namely $\frac{\partial(\underline{y}-\overline{y})}{\partial \underline{x}} - \frac{\partial(\underline{y}-\overline{y})}{\partial \overline{x}}$. The resulting sensitivities are presented in Figure 1-3.
[Figure 1-3: The computation graph annotated with sensitivities: ⊕: 2, $e$: 2, ⊗: 2, $k$: N/A, $\pi$: 2000.]
Figure 1-3: Sensitivities are the derivative with respect to the lower bound minus the deriva-
tive with respect to the upper bound as shown in Figure 1-2.
The most sensitive vertex in the computation graph in Figure 1-3 is 𝜋 because 2000 is
the largest sensitivity. The proposed technique identifies the critical path as the path from
the root to the most sensitive vertex. In this case, the critical path is ⊕ → ⊗ → $\pi$. The resulting computation graph is shown in Figure 1-4.

[Figure 1-4: Computation graph using 5 mantissa bits for ⊕, ⊗, and $\pi$, 3 mantissa bits for $e$, and not changing the constant $k$; the root ⊕ is labeled [3070, 3460]. The critical path is bolded.]

Along the critical path, variables and operators are incremented by 2 mantissa bits, while the remainder of the computation graph is incremented by 1. This is an instantiation of the critical path algorithm. In this case, the first configuration satisfying the error bound assigns 5 mantissa bits along the critical path and 3 bits to $e$. As $k$ becomes larger, approaches using uniform refinement techniques compute more and more decimal places of $e$ unnecessarily. The critical path algorithm can avoid this problem.
1.2 Thesis
In this thesis, I investigate ways to improve precision refinement in arbitrary-precision arith-
metic. I define a novel sensitivity analysis in terms of derivatives computed through interval
code. Using these sensitivities, I propose an algorithm – the critical path algorithm – that
guides the refinement process of arbitrary-precision arithmetic. The sensitivities use deriva-
tives computed with reverse-mode automatic differentiation through interval code, which is
novel. I implement a system for performing arbitrary-precision arithmetic and demonstrate
that the critical path algorithm can guide refinements to produce more accurate results with
less computation on certain programs.
1.3 Outline
The thesis is structured as follows. In Chapter 2, I explain how interval arithmetic works
and elaborate on the mathematical and implementation challenges. Then in Chapter 3, I
present the current approach to implementing arbitrary-precision arithmetic using interval
arithmetic and show how it may be improved assuming derivatives of interval code can be
efficiently computed. I describe the approach to efficient derivative computation in Chap-
ter 4. Next, I present empirical results using the proposed approach to precision refinement
in arbitrary precision arithmetic in Chapter 5. I discuss some related work in Chapter 6
and finally present a discussion in Chapter 7 and conclusions in Chapter 8. The open-source
implementation is available at https://github.com/psg-mit/fast_reals.
Chapter 2

Interval Arithmetic
This chapter provides a brief introduction to interval arithmetic. I describe the interval
operations for addition, multiplication, and sine. I also provide code to elucidate the under-
lying implementation and provide an analysis of some of the properties that arise from using
interval arithmetic.
Interval arithmetic is a method of computing that provides a bound on output error, use-
ful in the implementation of push-based arbitrary-precision arithmetic. For a more thorough
treatment of interval arithmetic, including an analysis of correctness, totality, closedness,
optimality, and efficiency, see [18].
2.1 Interval addition
In this section, I show how to implement interval addition given access to primitives provided
in a number of libraries such as MPFR [14]. Assume that the function $+_p : \mathbb{R} \times \mathbb{R} \to \mathbb{R}^2$ that computes error-bounded interval addition at precision $p$ is given and satisfies the property that $(a \mathbin{\underline{+}_p} b) \le (a + b) \le (a \mathbin{\overline{+}_p} b)$, where $+$ is exact and such that, in the limit as $p \to \infty$, the inequality becomes equality. The addition operator over intervals at precision $p$ is $\oplus_p : \mathbb{R}^2 \times \mathbb{R}^2 \to \mathbb{R}^2$ and has the following behavior: given the two intervals $i_1 = [\underline{i_1}, \overline{i_1}]$ and $i_2 = [\underline{i_2}, \overline{i_2}]$,

$$i_1 \oplus_p i_2 = [\underline{i_1} \mathbin{\underline{+}_p} \underline{i_2},\ \overline{i_1} \mathbin{\overline{+}_p} \overline{i_2}].$$

This is correct because there is a precondition that $i_1, i_2$ are valid intervals (i.e. $\underline{i_1} \le \overline{i_1}$ and $\underline{i_2} \le \overline{i_2}$) and addition is monotonic increasing. Thus, the minimum and maximum possible values of the sum are the lower and upper bounds given.
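As a concrete illustration, a sketch of how $\oplus_p$ might look using the bigfloat wrapper's directed-rounding contexts follows; the function name interval_add and the representation of intervals as pairs are illustrative choices, not necessarily those of the actual implementation.

from bigfloat import add, precision, RoundTowardNegative, RoundTowardPositive

def interval_add(i1, i2, p):
    # Round the lower bound down and the upper bound up at precision p,
    # so that the true sum is always enclosed by the returned interval.
    (l1, u1), (l2, u2) = i1, i2
    lower = add(l1, l2, context=precision(p) + RoundTowardNegative)
    upper = add(u1, u2, context=precision(p) + RoundTowardPositive)
    return (lower, upper)

For example, interval_add((1, 2), (3, 4), p) returns an interval with a lower bound of at most 4 and an upper bound of at least 6, matching the example in Chapter 1.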
2.2 Interval multiplication

Implementing interval multiplication is a little more nuanced because multiplication over the reals is neither monotonic increasing nor monotonic decreasing. For example, $-5 \times -5 = 25$ and $-4 \times -5 = 20$, so increasing an argument may decrease the output. On the other hand, $5 \times 5 = 25$ and $5 \times 6 = 30$, so increasing an argument may increase the output. It is possible to regain monotonicity by partitioning the reals into the negative $\mathbb{R}^-$ and non-negative $\mathbb{R}^+$. Kaucher multiplication is an algorithm that takes advantage of this structure [22].
I present a simpler, but potentially less efficient, algorithm. Assume that the function $\times_p : \mathbb{R} \times \mathbb{R} \to \mathbb{R}^2$ that computes error-bounded multiplication at precision $p$ is given and satisfies the property that $(a \mathbin{\underline{\times}_p} b) \le (a \times b) \le (a \mathbin{\overline{\times}_p} b)$, where $\times$ is exact and such that, in the limit as $p \to \infty$, the inequality becomes equality. Given the two intervals $i_1$ and $i_2$,

$$i_1 \otimes_p i_2 = [\min \underline{S},\ \max \overline{S}],$$

where $\underline{S} = \{\underline{i_1} \mathbin{\underline{\times}_p} \underline{i_2},\ \underline{i_1} \mathbin{\underline{\times}_p} \overline{i_2},\ \overline{i_1} \mathbin{\underline{\times}_p} \underline{i_2},\ \overline{i_1} \mathbin{\underline{\times}_p} \overline{i_2}\}$ is the set of lower bounds of the pairwise products and $\overline{S} = \{\underline{i_1} \mathbin{\overline{\times}_p} \underline{i_2},\ \underline{i_1} \mathbin{\overline{\times}_p} \overline{i_2},\ \overline{i_1} \mathbin{\overline{\times}_p} \underline{i_2},\ \overline{i_1} \mathbin{\overline{\times}_p} \overline{i_2}\}$ is the set of upper bounds of the pairwise products. The correctness proof is provided in Section 4.6 of [18].
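Under the same assumptions as the addition sketch in Section 2.1, a direct (if potentially inefficient) implementation computes the minimum of the downward-rounded pairwise products and the maximum of the upward-rounded ones; interval_mul is again an illustrative name.

from itertools import product
from bigfloat import mul, precision, RoundTowardNegative, RoundTowardPositive

def interval_mul(i1, i2, p):
    down = precision(p) + RoundTowardNegative
    up = precision(p) + RoundTowardPositive
    pairs = list(product(i1, i2))  # the four endpoint pairs
    lower = min(mul(a, b, context=down) for a, b in pairs)
    upper = max(mul(a, b, context=up) for a, b in pairs)
    return (lower, upper)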
2.3 Interval sine

Even more difficult is the computation of interval $\sin_p : \mathbb{R}^2 \to \mathbb{R}^2$. The contributions in this
section are, to my knowledge, novel, but are not core to the thesis as a whole. This section
serves the purpose of introducing some of the relevant challenges of implementing interval
arithmetic.
Figure 2-1: The four key monotonic regions for the definition of interval sine.
The Python implementation in Figure 2-2 surfaces a few details that I did not specify in
the mathematical presentation. For example, it shows how to identify the monotonic regions
labeled in Figure 2-1 using cosine. I use the bigfloat Python wrapper for the MPFR library to compute $\underline{\sin}_p, \overline{\sin}_p : \mathbb{R} \to \mathbb{R}$ [14].
I check that the width of the interval 𝑥 is less than 3 rather than 𝜋 because the contract
of interval arithmetic allows for over-estimation (loose bounds) and in practice, the bounds
are generally tight (with width much less than 3). Also note that cosine is used to identify the various regions by their slope, avoiding modular arithmetic and significantly simplifying the implementation compared with other approaches [2]. For example,
def interval_sin(interval, lower_at_p, upper_at_p):
    lower, upper = interval
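A condensed sketch of the approach, using the sign of cosine at the endpoints to identify the monotonic regions of Figure 2-1, is shown below; the simplified signature and the handling of rounding contexts are illustrative rather than a faithful copy of Figure 2-2.

from bigfloat import sin, cos, precision, RoundTowardNegative, RoundTowardPositive

def interval_sin_sketch(interval, p):
    lower, upper = interval
    down = precision(p) + RoundTowardNegative
    up = precision(p) + RoundTowardPositive
    if upper - lower >= 3:
        return (-1, 1)  # sound but loose for wide inputs
    inc_lo, inc_hi = cos(lower) > 0, cos(upper) > 0
    if inc_lo and inc_hi:  # regions I or IV: sine is increasing
        return (sin(lower, context=down), sin(upper, context=up))
    if not inc_lo and not inc_hi:  # regions II or III: sine is decreasing
        return (sin(upper, context=down), sin(lower, context=up))
    if inc_lo:  # the interval spans a maximum of sine
        return (min(sin(lower, context=down), sin(upper, context=down)), 1)
    # otherwise, the interval spans a minimum of sine
    return (-1, max(sin(lower, context=up), sin(upper, context=up)))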
2.4 Analysis
I will briefly reflect on some of the properties of interval arithmetic with these operators.
Applying an interval operator is sound if for every input interval, the output interval contains
the result. Since any operator can be implemented soundly by returning [−∞, ∞], we need a
condition on tightness. A precision-parameterized function on intervals (e.g., $\oplus_p$) is tight if, for every input interval, in the limit as the precision approaches infinity, the output width approaches the width when the computation is exact (error-free) over the reals.
I analyze these properties with respect to $\oplus_p$, $\otimes_p$, and $\sin_p$. $\oplus_p$ is sound and tight because it acts element-wise on each of the input intervals with $\underline{+}_p, \overline{+}_p : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$. Similarly, $\otimes_p$ is sound and tight. The implementation of $\sin_p$ can be both sound and tight [2]. However, the implementation provided is sound, but not tight, because it returns $[-1, 1]$ for input intervals with width greater than 3. It is tight for intervals narrower than 3. In the high-precision limit for arbitrary-precision arithmetic, the input interval widths tend to 0 and thus $\sin_p$ is tight.
Chapter 3

Definition 3.0.1. A configuration is a map $C : V \to \mathbb{N}$ from the vertices of the computation graph to precisions.

Definition 3.0.2. A schedule is a sequence of configurations $S : \mathbb{N} \to (V \to \mathbb{N})$ such that $n \mapsto S^{(n)}$, where $S^{(n)}$ is the $n$th configuration.
Each forward pass computes with respect to a configuration and a push-based com-
putation will follow a schedule – computing with respect to the successive configurations
(𝑆 (𝑖) )𝑖∈N until the result lies within the error bounds. In general, schedules in push-based
computations will produce configurations that assign variables to increasingly high preci-
sions ($S_v^{(k)} < S_v^{(k+1)}$ for all $k \in \mathbb{N}$), leading to a monotonically decreasing error on commonly
occurring computations.
I begin by considering a baseline that generalizes the schedule proposed by iRRAM [30]. This schedule computes the function $f(x)$ by setting the $(k+1)$th configuration as

$$S_v^{(k+1)} = S_v^{(k)} + a b^k \quad \text{for all } v \in V, \tag{3.1}$$

where $S^{(0)}, a, b$ are parameters that define the behavior of the schedule. Notice that the
precisions grow exponentially. I present this schedule simply because it is used in iRRAM, one
of the fastest arbitrary-precision arithmetic libraries [5]. There are fundamental trade-offs
between different choices of configurations, with the central concerns being: (1) overshooting
– when the final configuration is at an unnecessarily high precision – and (2) undershooting
– requiring too many forward passes to converge (i.e. the error bound is satisfied when 𝑘 in
𝑆 (𝑘) is large).
The problem For the schedule in Equation 3.1, the final configuration 𝑆 (𝑘) that satisfies
the given error bound assigns each variable and operation to the same precision; this as-
signment is rarely optimal and may be far from it. Chapter 1 provides a worked example of
a case where uniform refinement is suboptimal and benefits from setting different variables
and operations to different precisions. Although it may be possible to compute the necessary
precisions optimally by hand (at least for simple cases), an optimal, automated approach to
precision refinement would require perfectly modeling floating-point error, which has evaded
researchers.
I choose to take a heuristic approach. Heuristics may be used to guide schedules to
configurations satisfying error bounds, while minimizing the amount of total computation
(the sum of all of the compute, generally measured in time, required for all of the configu-
rations run). A good heuristic is fast to compute and guides the computation quickly to a
configuration respecting the given error bound without overshooting or undershooting.
In this section, I describe the novel algorithm to compute sensitivities assuming that deriva-
tives of the interval code are already provided. The sensitivities provide a measure of the
amount of change in the output interval width from a change to the input interval width. In
Chapter 4, I demonstrate that derivatives of interval code can be computed efficiently using
automatic differentiation.
I now present a sensitivity analysis that is a key contribution of this thesis. My construction
of sensitivities of interval computations assumes correctly computed derivatives are already
provided. I will detail the implementation of these derivatives in the following chapter. Run-
ning automatic differentiation on the computation graph of an interval arithmetic expression
produces 4 partial derivatives for each 𝑣 ∈ 𝑉 . In particular, for vertex 𝑣𝑥 corresponding to
the input interval 𝑥, if the function 𝑓 has the output 𝑦 = 𝑓 (𝑥), then the change in the output
$y = [\underline{y}, \overline{y}]$ with respect to a change in the interval $x = [\underline{x}, \overline{x}]$ is

$$\left(\frac{\partial \underline{y}}{\partial \underline{x}},\ \frac{\partial \underline{y}}{\partial \overline{x}},\ \frac{\partial \overline{y}}{\partial \underline{x}},\ \frac{\partial \overline{y}}{\partial \overline{x}}\right). \tag{3.2}$$

For example, $\frac{\partial \underline{y}}{\partial \underline{x}}$ is an intuitive answer to the question "what will be the change in the lower bound of the output given a small increase in the lower bound of $v_x$?" Increasing the
precision at which $v_x$ is computed decreases the width of the output interval, and thus,

$$\frac{\partial \underline{y}}{\partial \underline{x}},\ \frac{\partial \overline{y}}{\partial \overline{x}} \ge 0, \qquad \frac{\partial \underline{y}}{\partial \overline{x}},\ \frac{\partial \overline{y}}{\partial \underline{x}} \le 0.$$

This leads to a natural definition of sensitivity that is one of the core contributions of this thesis. I define the sensitivity of $v_x$ with respect to a decrease in the width of $x$ as:

$$\text{sens}(v_x) = \frac{\partial \underline{y}}{\partial \underline{x}} + \frac{\partial \overline{y}}{\partial \overline{x}} - \frac{\partial \underline{y}}{\partial \overline{x}} - \frac{\partial \overline{y}}{\partial \underline{x}}. \tag{3.3}$$

For example, for interval addition, the diagonal partials are 1 and the cross partials are 0, so the sensitivity of each input is 2, matching the values for ⊕ and $e$ in Figure 1-3.
I will now build a function that explicitly relates a change in the width of the interval
corresponding to a vertex in the computation graph to the width of the output interval, giving
a scalar-valued function whose derivative determines sensitivity. Formally, the sensitivities
are the derivative of the composition of functions that take the derivative (𝐷𝑓 ) : R2 → R2×2
of an interval-valued function 𝑓 : R2 → R2 and transform it in terms of its directional
derivatives. The goal is to understand the decrease in the output interval width as a result
of decreasing the input interval width. This means directing perturbations in a positive
direction for lower bounds and a negative direction for upper bounds. The function with
respect to a specific input 𝑥 satisfying these properties is:
$$z_x(t) = (g \circ f \circ h_x)(t), \qquad h_x(t) = (\underline{x} + t,\ \overline{x} - t), \qquad g([\underline{y}, \overline{y}]) = \underline{y} - \overline{y}, \tag{3.4}$$

where increasing $t$ shrinks both endpoints of the interval $x$ by $t$. If $y = f(x)$, the derivative is:

$$(Df)_x = \begin{bmatrix} \dfrac{\partial \underline{y}}{\partial \underline{x}} & \dfrac{\partial \underline{y}}{\partial \overline{x}} \\[1ex] \dfrac{\partial \overline{y}}{\partial \underline{x}} & \dfrac{\partial \overline{y}}{\partial \overline{x}} \end{bmatrix} \tag{3.5}$$

and the derivative of the composition is

$$\frac{dz_x}{dt} = \begin{bmatrix} 1 & -1 \end{bmatrix} \begin{bmatrix} \dfrac{\partial \underline{y}}{\partial \underline{x}} & \dfrac{\partial \underline{y}}{\partial \overline{x}} \\[1ex] \dfrac{\partial \overline{y}}{\partial \underline{x}} & \dfrac{\partial \overline{y}}{\partial \overline{x}} \end{bmatrix} \begin{bmatrix} 1 \\ -1 \end{bmatrix}. \tag{3.6}$$
This evaluates exactly to the proposed sensitivity, meaning that if 𝑥 corresponds to the
vertex 𝑣𝑥 in the computation graph,
$$\frac{dz_x}{dt} = \text{sens}(v_x).$$
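A quick numeric check of Equation 3.6 for interval addition $y = x \oplus c$: the Jacobian $(Df)_x$ is the identity, so the product evaluates to 2, matching the sensitivity of ⊕ and $e$ in Figure 1-3 (numpy is used here purely for illustration).

import numpy as np

Df = np.eye(2)  # Jacobian of interval addition with respect to x
sens = np.array([1.0, -1.0]) @ Df @ np.array([1.0, -1.0])
print(sens)  # 2.0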
The proposed sensitivity analysis (in Equation 3.3) implicitly assumes that decreasing the
width of an interval by an infinitesimal amount 𝛿 is just as costly when the interval width
is 100 as when the current interval width is 0.01. This assumption is often inaccurate. For
example, computing the first $n$ digits of $\pi$ using the Bailey–Borwein–Plouffe algorithm has a computational complexity of $O(n \log^3 n)$ [3]. Incorporating the property that it
requires more computational cost to refine narrower intervals may help to encourage cost-
efficient configurations for computations using sensitivities to guide refinement.
The cost-dependent sensitivity analysis for the vertex 𝑣𝑥 in the computation graph cor-
responding to 𝑥 is
$$\text{sens}'(v_x) = \begin{bmatrix} c_1(x) & c_2(x) & c_3(x) & c_4(x) \end{bmatrix} \begin{bmatrix} \dfrac{\partial \underline{y}}{\partial \underline{x}} \\[1ex] \dfrac{\partial \overline{y}}{\partial \overline{x}} \\[1ex] -\dfrac{\partial \underline{y}}{\partial \overline{x}} \\[1ex] -\dfrac{\partial \overline{y}}{\partial \underline{x}} \end{bmatrix}. \tag{3.7}$$
I provide a theoretical analysis of the cost function $c$ where $c_i(x) = \overline{x} - \underline{x}$ for $i = 1, 2, 3, 4$ in Section 3.4. This allows for per-operator cost functions that can model the difficulty of refining different parts of the compute graph, which I expand upon in Chapter 7.
Definition 3.3.1. The most sensitive vertex is the vertex $v \in V$ such that

$$v = \operatorname*{argmax}_{u \in V}\ \text{sens}_k(u),$$

where $\text{sens}_k$ is the sensitivity (as defined in Equation 3.3) of the given program evaluated at the $k$th configuration.
Definition 3.3.2. The critical path 𝑃 (𝑘) is the path from the most sensitive vertex 𝑣 (with
ties broken arbitrarily) to the root for computation evaluated at the configuration 𝐶 (𝑘) .
Note that the sensitivities may change throughout the course of the computation due to
changes in values. Thus, the most sensitive vertex and critical path are parameterized by
the configuration.
Armed with this terminology, defining the schedule is quite straightforward. At each iteration, the configuration is refined by a larger increment along the critical path than it is in the rest of the computation. Explicitly, I define this schedule as

$$S_v'^{(k+1)} = \begin{cases} S_v'^{(k)} + a_1 b_1^k & \text{if } v \in P^{(k)} \\ S_v'^{(k)} + a_2 b_2^k & \text{otherwise} \end{cases} \tag{3.8}$$
where 𝑆 ′(0) , 𝑎1 , 𝑏1 , 𝑎2 , 𝑏2 dictate the behavior of the schedule. I call this the critical path
algorithm for precision refinement. In Chapter 5, I experimentally compare the baseline
(iRRAM) schedule and the proposed schedule.
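One refinement step of this schedule can be sketched with a configuration represented as a dictionary from vertices to precisions; the parameter values below are placeholders, not the tuned constants used in the experiments.

def critical_path_step(config, critical_path, k, a1=2, b1=1.5, a2=1, b2=1.25):
    # Equation 3.8: refine faster along the critical path P^(k).
    return {
        v: p + round(a1 * b1**k if v in critical_path else a2 * b2**k)
        for v, p in config.items()
    }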
3.4 Analysis
In this section, I compare the asymptotic behavior and theoretical properties of the uniform,
critical path, and cost-modeled schedules. Although the empirical results use multiple-
precision floating point, I use fixed point for the theoretical results because it is easier to
analyze. In particular, I consider numbers in the range [0, 1] in the form of fixed-point binary
numbers

$$x = \sum_{i=1}^{\infty} 2^{-i} b_i, \qquad b_i \in \{0, 1\}.$$

The schedules are compared on the computation

$$y = \sum_{i=1}^{n} a_i x_i,$$
where $a_i$ is a constant and $x_i \in [0, 1]$ for all $i$. In this case, the sensitivities will be 2 for all of the ⊕ and ⊗ operators, $a_i$ for each $x_i$, and not applicable for constants (because they are assumed to be binary rationals at infinite precision, e.g. $a_1, a_2, \ldots, a_n$). I explore the case where $a_i = 2^{-i}$. Figure 3-1 presents the computation graph of $y$.
where 𝑎𝑖 = 2−𝑖 . Figure 3-1 presents the computation graph of 𝑦.
[Figure 3-1: A chain of ⊕ operators summing the products $a_i \otimes x_i$ for $i = 1, \ldots, n$.]
Figure 3-1: Example computation graph of a family of computations that can benefit from
the critical path algorithm. The critical path, which remains the same for all iterations, is
bolded for 𝑎𝑖 = 2−𝑖 .
I now introduce notation and properties that are useful in the analysis of different sched-
ules. Let $[n]$ denote the set $\{1, 2, \ldots, n\}$.

Definition 3.4.1. Let $p_{x_i}^{(k)}$ denote the precision assigned to $x_i$ on the $k$th iteration of a given schedule.
Definition 3.4.2. Let 𝑤(𝑘) denote the width of the output (the error) on the 𝑘th iteration
of a given schedule.
The properties below are simple, but useful to refer to in the analysis that follows.

Fact 1. A variable computed at precision $p$ has an interval width of $\frac{1}{2^p}$.

Fact 2. Given a finite geometric series where the ratio between consecutive terms is $\frac{1}{2}$, the sum is $\sum_{i=j}^{n} t_i = 2t_j(1 - 2^{j-n-1})$.
Assume that each refinement increments the precisions for each of the vertices in the com-
putation graph by 1. Formally, this means that for every 𝑖 ∈ [𝑛],
$$p_{x_i}^{(k)} = k.$$
Since the error for each $x_i$ is the same, namely $\frac{1}{2^k}$ (by Fact 1), it may be factored out of the summation. The other term contributing to the error is $\sum_{i=1}^{n} a_i = 1 - \frac{1}{2^n}$ (by Fact 2). Therefore, the width of the output when using uniform refinement is

$$w_u^{(k)} = \frac{1}{2^k}\left(1 - \frac{1}{2^n}\right). \tag{3.9}$$
Figure 3-1 shows the critical path of the computation. Since the derivative for each of the
bounds of 𝑥𝑖 is 𝑎𝑖 , the most sensitive vertex (the one with the largest derivative) is 𝑥1 for
every refinement. I use a schedule where the configuration is incremented by 2 along the
critical path and by 1 everywhere else in the computation graph. As a result, if the first
refinement sets every variable and operator to one bit of precision, the precisions will follow
the equations:
$$p_{x_1}^{(k)} = 2k - 1, \qquad p_{x_i}^{(k)} = k \text{ for } i \ne 1.$$

Again, by Fact 1, the widths of the intervals are $w_{x_1}^{(k)} = \frac{1}{2^{2k-1}}$ and $w_{x_i}^{(k)} = \frac{1}{2^k}$. Computing the output interval width is then a matter of combining these terms with the corresponding coefficients
$a_i = 2^{-i}$, giving rise to the formula:

$$w_p^{(k)} = \frac{1}{2^{2k}} + \frac{1}{2^k} \sum_{i=2}^{n} \frac{1}{2^i}.$$

Applying Fact 2 simplifies this to

$$w_p^{(k)} = \frac{1}{2^{2k}} + \frac{1}{2^{k+1}}\left(1 - \frac{1}{2^{n-1}}\right). \tag{3.10}$$
I analyze a schedule that uses the critical path algorithm with cost-modeled sensitivities, and I call this the cost-modeled schedule. I define the cost-aware sensitivity as

$$\text{sens}'(v_{x_i}) = (\overline{x_i} - \underline{x_i}) \cdot \text{sens}(v_{x_i}),$$

where sens is defined in Equation 3.3. The sensitivities $\text{sens}'$ are a special case of the cost model presented in Section 3.2.3.
The most sensitive vertex is the one with the largest product of the sensitivity and the interval width. I use the same schedule as in the previous algorithm, where for each refinement, the configuration is incremented by 2 along the critical path and by 1 everywhere else in the computation graph. Let $T_l = \frac{l(l+1)}{2}$ be the $l$th triangular number. Intuitively, the refinement will proceed as follows:

1. Initially, every variable has one bit of precision and $x_1$ is the most sensitive.

2. After one refinement step, the precisions are $p_{x_1}^{(2)} = 3$ and $p_{x_{i \ne 1}}^{(2)} = 2$. $x_1$ and $x_2$ are equally sensitive.
3. After two additional refinement steps, the precisions are $p_{x_1}^{(4)} = 6$, $p_{x_2}^{(4)} = 5$, and $p_{x_{i \notin \{1,2\}}}^{(4)} = 4$. $x_1$, $x_2$, and $x_3$ are all equally sensitive.

4. After three additional refinement steps, the precisions are $p_{x_1}^{(7)} = 10$, $p_{x_2}^{(7)} = 9$, $p_{x_3}^{(7)} = 8$, and $p_{x_{i \notin \{1,2,3\}}}^{(7)} = 7$. $x_1$, $x_2$, $x_3$, and $x_4$ are all equally sensitive.
In general, on the iteration $k = T_l + 1$, the precisions are $p_{x_1}^{(1+T_l)} = T_{l+1}$, $p_{x_2}^{(1+T_l)} = T_{l+1} - 1$, \ldots, $p_{x_{i \notin [l]}}^{(1+T_l)} = k$, where $l$ is defined so that $k = T_l + 1$. $x_1, x_2, \ldots, x_{l+1}$ are all equally sensitive.
Using these observations and applying Fact 1 and Fact 2, the formula is:

$$w_c^{(T_l+1)} = \frac{l}{2^{T_{l+1}+1}} + \frac{1}{2^{T_l+l+1}}\left(1 - \frac{1}{2^{n-l}}\right). \tag{3.11}$$
In this section, I compare the three different schedules for a few specific values of $n$, which varies the number of terms in the summation $\sum_{i=1}^{n} a_i x_i$, and I analyze the comparative asymptotic performances of the different schedules.
I derive formulas for the widths of the output intervals as a function of the number of terms in the summation, $n$, and the number of refinements, $k$. I fix $k = T_n + 1$ because it is the number of refinements at which all of the leaves of the cost-modeled schedule contribute the same amount of error. This simplifies the expression for the width to

$$w_c^{(T_n+1)} = \frac{n}{2^{T_{n+1}+1}}.$$
Table 3.1 shows the number of additional refinements needed for results to lie within the error bound achieved by the cost-modeled schedule at the $(T_n + 1)$th iteration. Even for relatively small
𝑛, it is clear that there is a significant practical advantage to using the cost-modeled schedule
over the uniform and critical path schedules. The critical path schedule also consistently
outperforms the uniform schedule.
Schedule     n = 5   n = 10   n = 15
Uniform        4        8        13
Crit. path     3        7        12

Table 3.1: The table shows the number of additional refinements needed for the uniform and critical path schedules to lie within the error bound achieved by the cost-modeled schedule on the $(T_n + 1)$th iteration ($T_5 + 1 = 16$, $T_{10} + 1 = 56$, and $T_{15} + 1 = 121$).
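The entries in Table 3.1 follow directly from Equations 3.9–3.11; a small script along the following lines reproduces them.

def w_u(k, n):  # Equation 3.9: uniform schedule
    return (1 - 2**-n) / 2**k

def w_p(k, n):  # Equation 3.10: critical path schedule
    return 4**-k + (1 - 2**-(n - 1)) / 2**(k + 1)

def extra_refinements(w, n):
    # Refinements beyond k = T_n + 1 needed to reach the cost-modeled
    # width n / 2^(T_{n+1} + 1).
    T = lambda l: l * (l + 1) // 2
    k0, target = T(n) + 1, n / 2**(T(n + 1) + 1)
    k = k0
    while w(k, n) > target:
        k += 1
    return k - k0

for n in (5, 10, 15):
    print(n, extra_refinements(w_u, n), extra_refinements(w_p, n))
    # prints 5 4 3, then 10 8 7, then 15 13 12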
I now study the limiting behavior of the ratio between the width of the cost-modeled schedule and each of the other two schedules. I find that it is an exponentially better schedule (note that smaller is better for widths) because

$$\lim_{n\to\infty} \frac{w_c^{(T_n+1)}}{w_u^{(T_n+1)}} = \lim_{n\to\infty} \frac{w_c^{(T_n+1)}}{w_p^{(T_n+1)}} = 0,$$

with both ratios decaying on the order of $n/2^n$.
The single additional bit added along the critical path significantly improves the refinement
process for the cost-modeled schedule. This emphasizes the importance of using carefully
considered scheduling algorithms.
To a lesser extent, the critical path schedule outperforms the uniform schedule. The limit

$$\lim_{n\to\infty} \frac{w_p^{(T_n+1)}}{w_u^{(T_n+1)}} = \frac{1}{2}$$
shows that the critical path schedule is a factor of two tighter than the uniform schedule at
the same refinement iteration. A single additional bit of precision per refinement leads to
halving the interval width globally.
Chapter 4

Automatic Differentiation of Interval Arithmetic
In this chapter, I introduce automatic differentiation and detail both the implementation
and relevant analysis behind computing derivatives of interval code. The recent popularity
of deep learning led to a focus on efficient computation of derivatives. Indeed, the backprop-
agation algorithm, key to deep learning, is a special case of automatic differentiation [1, 32].
I begin with a brief overview of different approaches and some design considerations in the
efficient computation of derivatives.
In forward-mode AD, the $N$ leaf derivatives are assigned at initialization. As a result, forward-mode AD computes all of the $M$ output derivatives with respect to a single assignment of input derivatives.
I decide to use reverse-mode AD because it computes from the outputs to the inputs. There-
fore, given an initialization for the two output derivatives, reverse-mode AD computes the
derivatives for all of the inputs and intermediate computations. In contrast, forward-mode
would require a forward-pass for each input (in the case of interval arithmetic, both the lower
and upper bound).
For simplicity, I do not use interval arithmetic to bound the error of the gradient com-
putation, but I do leave the gradients parameterized by precision for convenience and note
that my implementation can easily be extended to compute error-bounded gradients.
def _compute_grad(self, parents, rte):
    # Reverse-mode accumulation: each entry in parents pairs the cached
    # derivative weights (w1, w2) with the consuming node var; the weights
    # multiply var's (lower, upper) gradients. rte is assumed to be the
    # rounding context passed to add and mul.
    grad = 0
    for (w1, w2), var in parents:
        lower, upper = var.grad()
        grad_term = add(mul(w1, lower, rte), mul(w2, upper, rte), rte)
        grad = add(grad_term, grad, rte)
    return grad
Extracting sensitivities Generating the sensitivity for each variable specified in Equa-
tion 3.3 requires two simple but key steps. First, initialize the derivatives at the root:

root.lower_grad, root.upper_grad = 1, -1

Then, for a vertex $v$ in the computation graph, I compute the sensitivity $\text{sens}(v)$, described in Equation 3.3, as:

v.lower_grad - v.upper_grad
These correspond to the pre-composition with $g$ and post-composition with $h_x$ that map the four partial derivatives in Equation 3.5 to the sensitivity. Explicitly,

$$\begin{bmatrix} 1 & -1 \end{bmatrix} \qquad \text{and} \qquad \begin{bmatrix} 1 \\ -1 \end{bmatrix}$$

correspond to initializing the derivatives at the root and to the sensitivity assignment, respectively, as they appear in Equation 3.6. The given code snippets constitute the implementation naturally arising from Equation 3.6.
4.2.1 Derivative of interval addition

Building upon the explanation of interval addition in Section 2.1, I now show how to take derivatives through $\oplus_p : \mathbb{R}^2 \times \mathbb{R}^2 \to \mathbb{R}^2$, the interval addition operator at precision $p$. Since $\oplus_p$ is monotonic increasing, the derivative of the lower bound of the output with respect
to the lower bound of either of the inputs is 1 and similarly, the derivative of the upper
bound of the output with respect to the upper bound of either of the inputs is 1. All other
derivatives are 0, with eight derivatives in total.
Explicitly, Figure 4-2 presents an implementation of interval addition with derivatives. My implementation stores these derivatives and the corresponding objects that are used in the recursive calls to grad.
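To make the bookkeeping concrete, the following sketch (in the spirit of Figure 4-2) shows interval addition recording, for each endpoint of each operand, the weight pair (the derivatives of the output's lower and upper bounds with respect to that endpoint) together with a reference to the output node, for later use by _compute_grad. The class and helper names are illustrative, not the thesis's exact code.

def interval_add_with_grad(x, y, p):
    # IntervalVar, add_down, and add_up are assumed helpers (illustrative).
    z = IntervalVar(lower=add_down(x.lower, y.lower, p),
                    upper=add_up(x.upper, y.upper, p))
    for operand in (x, y):
        # d z.lower / d operand.lower = 1 and d z.upper / d operand.upper = 1;
        # the remaining four of the eight derivatives are 0.
        operand.lower_parents.append(((1, 0), z))
        operand.upper_parents.append(((0, 1), z))
    return z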
4.2.2 Derivative of interval multiplication

The derivative of interval multiplication is more difficult because the output interval is the
minimum and maximum of the set of pairwise products of the input intervals (as explained
in Section 2.2). The derivative computation involves identifying the terms that contribute
to the output and assigning the appropriate derivatives. I will only provide an example and
forgo the implementation, as it is detailed and adds little additional insight (it is in the provided code).
Consider the example (Example 1, revisited in Section 4.3):

$$x \otimes y = [-1, 2] \otimes [-4, 1] = [-8, 4].$$

The set of products is $S = \{-8, -1, 2, 4\}$. Since $z = [\min S, \max S]$, the result is $z = [-8, 4]$. There are four input endpoints $\{-1, 2, -4, 1\}$ and two outputs, $-8$ and $4$, that lead to the eight derivatives shown in Equation 4.1. Each derivative provides an answer to the intuitive question: "how much would a change in this input affect that output?"

Since $-1 = \underline{x}$ only contributes to $\overline{z}$, where it is multiplied by $-4$, the derivatives $\left(\frac{\partial \underline{z}}{\partial \underline{x}}, \frac{\partial \overline{z}}{\partial \underline{x}}\right)$ are $(0, -4)$. Similarly, $2 = \overline{x}$ only contributes to $\underline{z}$, and it is multiplied by $-4$, so the derivatives $\left(\frac{\partial \underline{z}}{\partial \overline{x}}, \frac{\partial \overline{z}}{\partial \overline{x}}\right)$ are $(-4, 0)$. Continuing in this way yields the eight derivatives

$$\left(\frac{\partial \underline{z}}{\partial \underline{x}}, \frac{\partial \overline{z}}{\partial \underline{x}}\right) = (0, -4), \quad \left(\frac{\partial \underline{z}}{\partial \overline{x}}, \frac{\partial \overline{z}}{\partial \overline{x}}\right) = (-4, 0), \quad \left(\frac{\partial \underline{z}}{\partial \underline{y}}, \frac{\partial \overline{z}}{\partial \underline{y}}\right) = (2, -1), \quad \left(\frac{\partial \underline{z}}{\partial \overline{y}}, \frac{\partial \overline{z}}{\partial \overline{y}}\right) = (0, 0). \tag{4.1}$$
4.2.3 Derivative of interval sine

Each of the cases for the derivative (assuming that $\overline{x} - \underline{x} < \pi$) is shown below, with the derivatives grouped per input endpoint as $\left(\left(\frac{\partial \underline{z}}{\partial \underline{x}}, \frac{\partial \overline{z}}{\partial \underline{x}}\right), \left(\frac{\partial \underline{z}}{\partial \overline{x}}, \frac{\partial \overline{z}}{\partial \overline{x}}\right)\right)$ for $z = \sin_p x$:

$$\frac{d \sin_p}{dx} = \begin{cases} ((\cos_p \underline{x},\ 0),\ (0,\ \cos_p \overline{x})) & \text{for } x \subset (\mathrm{I} \cup \mathrm{IV}) \\ ((0,\ \cos_p \underline{x}),\ (\cos_p \overline{x},\ 0)) & \text{for } x \subset (\mathrm{II} \cup \mathrm{III}) \\ ((\cos_p \underline{x},\ 0),\ (0,\ 0)) & \text{for } (x \subset (\mathrm{I} \cup \mathrm{II})) \wedge (\sin_p \underline{x} < \sin_p \overline{x}) \\ ((0,\ 0),\ (\cos_p \overline{x},\ 0)) & \text{for } (x \subset (\mathrm{I} \cup \mathrm{II})) \wedge \neg(\sin_p \underline{x} < \sin_p \overline{x}) \\ ((0,\ \cos_p \underline{x}),\ (0,\ 0)) & \text{for } (x \subset (\mathrm{III} \cup \mathrm{IV})) \wedge (\sin_p \underline{x} > \sin_p \overline{x}) \\ ((0,\ 0),\ (0,\ \cos_p \overline{x})) & \text{for } (x \subset (\mathrm{III} \cup \mathrm{IV})) \wedge \neg(\sin_p \underline{x} > \sin_p \overline{x}) \end{cases} \tag{4.2}$$
where the regions I, II, III, IV are those specified in Figure 2-1. If $\overline{x} - \underline{x} \ge \pi$, my implementation returns derivatives of $(0, 0)$, which is potentially too "loose" (because sine evaluated at an interval with width $\pi$ may not span the whole range of sine), but is still sound because over-approximation is acceptable for interval arithmetic.
4.3 Analysis
In this section, I introduce the mathematical challenges that arise from taking derivatives
through interval code and highlight some additional concerns in my implementation. Non-
differentiability is of particular concern. Addition on intervals ⊕ is differentiable, but multiplication on intervals ⊗ is only differentiable almost everywhere. For example, consider $[-2, 4] \otimes [-4, 2] = [-16, 8]$: the upper bound could come either from $-2 \times -4$ or from $4 \times 2$. Thus,
this computation is not differentiable, but most computations, like the one shown in Example 1, are. My implementation takes the derivative with respect to the selected computation as a result of the nondeterministic choices (arising from the computation of min and max with multiplicity at the extrema) made during the computation. Similarly, sin is differentiable almost everywhere; it is not differentiable, for example, at the interval $[-1, 2]$, which has a width of 3 and does not span $[-1, 1]$.
Example 1 also exhibits dead-zones, where a set of inputs has a zero derivative. In Example 1, where $[-1, 2] \otimes [-4, 1] = [-8, 4]$, the "1" could be replaced with any $a$ in the open interval $(-4, 2)$ and produce the same result. Since none of these values of $a$ contribute to the output, they will have a derivative of $(0, 0)$. Similarly, there is a dead-zone for all inputs with a width greater than 3 for sin (the same is true for any definition once the width is at least $2\pi$).
These dead-zones present a challenge for using derivatives as sensitivities. For example, all of the derivatives are 0 for sin on a wide interval, indicating that it is not important to decrease the interval widths. However, this is clearly not the case, as a non-infinitesimal change (like those used in refinement) may indeed yield a narrower interval (and a more accurate result).
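The dead-zone in Example 1 is easy to observe concretely with an exact (rounding-free) four-product implementation, used here purely for illustration:

def imul(i1, i2):
    products = [a * b for a in i1 for b in i2]
    return (min(products), max(products))

# The upper endpoint of the second interval can move anywhere in (-4, 2)
# without changing the result, so its derivatives are (0, 0).
for a in (-3.9, 0.0, 1.0, 1.99):
    print(imul((-1, 2), (-4, a)))  # always (-8, 4)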
Together, dead-zones and non-differentiability present cases where this approach for us-
ing derivatives as sensitivities may fail. They also highlight some subtleties of computing
derivatives through interval code that may be worth further mathematical exploration and
analysis. Since computations “break ties” by making arbitrary non-deterministic choices, I
compute derivatives for every input (even at points that are technically not differentiable).
I now move on to establishing the benefit of computing these derivatives.
Chapter 5
Results
In this chapter, I present an experiment and provide empirical results demonstrating the
effectiveness of the critical path algorithm described in Chapter 3.
Consider the computation

$$y = \pi + 2^{100000} e, \tag{5.1}$$

which is the motivating example from Section 1.1 depicted in Figure 1-1, except with $k = 2^{100000}$. In this case, changing the precision at which $\pi$ is computed from 1 to 2 bits reduces the output error by approximately 1, whereas for $e$, the same change in precision reduces the output error by approximately $2^{99999}$.
5.1 Schedules
In this section, I define the baseline schedule and the critical path schedule that I will compare
empirically in Section 5.2 on the computation in Equation 5.1.
5.1.1 Baseline schedule

iRRAM uses a precision schedule $S$ that uniformly refines variables and operators with

$$S_v^{(k+1)} = S_v^{(k)} + 50 \cdot 1.25^k,$$

where $S^{(0)} = 0$. The computation is computed at configurations starting with $S^{(1)}$. Intuitively, the refinement process increases the precision of every variable and operator in the program by 25% until the output error is within the error bound.
5.1.2 Critical path schedule

Now consider the alternate precision schedule $S'$ that sets variables and operators to different precisions depending on whether or not they are on the critical path $P^{(k)} = \{+, \times, e\}$ for every $k$. The instantiation of the critical path algorithm I use is

$$S_v'^{(k+1)} = \begin{cases} S_v'^{(k)} + 50 \cdot 1.33^k & \text{if } v \in P^{(k)} \\ S_v'^{(k)} + 50 \cdot 1.25^k & \text{otherwise} \end{cases}$$

where $S'^{(0)} = 0$. Notice that the precision refinements increase at a faster rate along the critical path than in the rest of the program.
5.2 Empirical comparison

5.2.1 Improving a configuration

I compare the time it takes to run two precision configurations, each a mapping from variables and operations (in this case $\{+, \pi, \times, 2^{100000}, e\}$) to precisions, for the example presented in Equation 5.1 and Figure 1-1. The error in the computation is the width of the output interval.

Notice that in Table 5.1, the precisions along the critical path (which is $\{+, \times, e\}$ because $\frac{\partial y}{\partial e}$ has the largest derivative) for $S'^{(24)}$ are higher than the precisions in $S^{(29)}$, while the
Configuration      +        π        ×      2^100000      e
S^(29)          129046   129046   129046    129046    129046
S'^(24)         142047    42151   142047     42151    142047

Table 5.1: The table presents a comparison of the precisions generated on the 29th iteration of the baseline schedule, $S^{(29)}$, and the 24th configuration of the critical path schedule, $S'^{(24)}$.

Table 5.2: The table presents a comparison of the first configurations satisfying the error bound of $10^{-12000}$ for the baseline and critical path schedules.
variables not on the critical path have a lower precision. Furthermore, note from Table 5.2 that using the configuration $S'^{(24)}$ produces less output error than $S^{(29)}$. This means that using the critical path schedule produces a superior final configuration, with a speed increase of roughly 37% and higher output accuracy.
5.2.2 Improving a schedule

The amount of total computation of a schedule can be understood in terms of the number of configurations computed and the time to compute each configuration. In the previous section, I demonstrated that computing using the critical path algorithm requires less time on the final configuration and fewer refinement steps. Since this is the case, it follows that the schedule as a whole will take less total time as well.
Let $t(S^{(k)})$ denote the time it takes to run the schedule $S$ for $k$ iterations (i.e. to run all of the configurations $S^{(1)}, S^{(2)}, \ldots, S^{(k)}$). Continuing this example, it is clear that the effect of using the critical path algorithm is even more pronounced at the schedule level. I find $t(S^{(29)}) = 0.163$ s and $t(S'^{(24)}) = 0.112$ s, while the error on the last iteration is as shown in Table 5.2. This means that this approach produces a 45% speed increase with higher output accuracy.
5.3 Implementation
I implement a push-based system of arbitrary-precision arithmetic that uses interval arith-
metic and computes derivatives using reverse-mode automatic differentiation. The imple-
mentation is in Python and uses the bigfloat wrapper for MPFR for multiple-precision
floating-point computations [14]. The implementation is available at https://github.com/
psg-mit/fast_reals.
Timing these experiments raises three challenges: (1) caching (e.g. of computed constants) can leak between experiments; (2) very short durations are difficult to time reliably; and (3) timing noise can dominate small measurements. I solve challenge 1 by running each experiment in a separate thread, allowing caching, but only within each run and not among runs. I resolve challenges 2 and 3 by setting small enough tolerances that durations are large enough to be easily measurable.
Chapter 6
Related Work
Related work aims to improve the performance of arbitrary-precision arithmetic through careful software implementations, by restructuring computations for efficiency, and by caching [23, 25, 30]. Due to performance concerns, interval arithmetic and arbitrary-precision arithmetic have not yet been widely adopted. However, interval and arbitrary-precision arithmetic have uses in robotics and, more generally, in global optimization, and have been used in the proof of the Kepler conjecture and of the existence of the Lorenz attractor in 3D [17, 21, 34].
6.1 Mixed-precision tuning and sensitivity analysis
The numerical tuning approaches providing error bounds often use SMT-solvers and
restrict the inputs to interval ranges [6, 7, 9, 10, 11]. One common way to use sensitivity
analysis techniques (which measure the effect that changing an input parameter has on the
output) is to produce annotations that identify operations requiring high precision, while
satisfying an error bound [31, 33]. For example, Hwang et al. use automatic differentiation
to produce a sensitivity analysis of air quality models [20].
6.2 Arbitrary-precision arithmetic

6.2.1 Pull-based approaches
iRRAM In iRRAM [30], intervals are iteratively, globally refined with a uniform precision
for each node in the computation tree to yield a result with the desired precision. In terms of
relevant optimizations, iRRAM supports by-hand labeling of specific parts of a computation
as more sensitive and thus computing them with higher precision than the rest of the pro-
gram. Müller evaluates iRRAM by showing its performance in computing simple arithmetic
(e.g. $\sqrt{1/3}$, $\log(1/3)$), iterative functions (the logistic map $x_i = 3.75 x_{i-1}(1 - x_{i-1})$), and inverting the Hilbert matrix. Since these computations have a computation graph with few
or no branches, I would not expect significant speed increases using my proposed approach
on this set of benchmarks.
floating point in some cases. They also accelerate their computation by caching results, so
that if they appear in multiple places, the expression may be reused.
Chapter 7

Discussion
In this chapter, I introduce some additional, preliminary benchmark results, reflect upon some ways future researchers may improve precision refinement, and lay out a new application of the techniques developed in this thesis to experimental research.
7.1 Benchmarks
Methodology The benchmarks have variables that come from an interval range, constants specified as floats, and operators. I implement a stream-based uniform random sampler that selects a point from the interval range and provides an arbitrary-precision sample (sketched below). Due to the heavy use of random sampling, it is relatively computationally expensive to increase precisions. The constants are left as-is, and their precision remains the same throughout the computation. The operations are replaced with their arbitrary-precision equivalents. For simplicity, I use a subset of the FPBench benchmarks that does not have loops or other language primitives, which it would also be possible to support [8].
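A sketch of what such a stream-based sampler could look like follows: the sample's binary expansion is drawn lazily, one bit per unit of precision, so that requesting more precision narrows the interval around a single consistent random point. The class and method names are illustrative.

import random
from fractions import Fraction

class StreamSample:
    def __init__(self, lo, hi):
        self.lo, self.hi = Fraction(lo), Fraction(hi)
        self.bits = []

    def at_precision(self, p):
        # Draw additional random bits of the binary expansion on demand.
        while len(self.bits) < p:
            self.bits.append(random.getrandbits(1))
        frac = sum(Fraction(b, 2**(i + 1)) for i, b in enumerate(self.bits[:p]))
        lower = self.lo + (self.hi - self.lo) * frac
        width = (self.hi - self.lo) / 2**p  # uncertainty from undrawn bits
        return (lower, lower + width)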
Preliminary results Table 7.1 shows the speedup from using the critical path schedule
(Section 5.1.2) instead of the baseline schedule (Section 5.1.1). The parameters for the
critical path schedule are the same as the parameters in the experiments in Section 5. The
results are comparable, if not slightly worse, using the critical path algorithm for all of the
benchmarks except verhulst. Looking at the underlying computations, verhulst is the only
benchmark that has a single clear choice of critical path and benefits with a 2.12x speedup.
For the other computations, the critical path remains the same throughout the computation
and thus, that path is over-refined. In other words, the refinement results in a little extra
computation with little benefit in output precision. This problem is compounded by the
methodology where more digits of variables are sampled on-the-fly, which is computationally
difficult. The benefits of the critical path schedule on general purpose computation are
limited by the lack of per-variable and per-operation cost modeling and by the simplicity of
the algorithm. I discuss ways to broaden the applicability of this technique and to extend it
in Section 7.2.
The sensitivity analysis that I focus on in this thesis does not take into account the compu-
tational difficulty of refining different variables and operations. For example, generating the
10th digit of precision for 𝜋 will generally require more compute than for a sum. Operators
will require different amounts of compute that will scale differently with 𝑝. These differ-
ences can be accounted for by incorporating them into a cost model like the one presented
Benchmarks      # Ops   Speedup
carbon gas        15      0.97
doppler1          11      1.00
doppler2          11      0.99
doppler3          11      0.97
jetEngine         28      0.92
predPrey           7      0.93
rigidbody1        11      0.96
rigidbody2        13      0.93
sine              11      0.91
sineOrder3         6      0.96
sqroot            12      0.97
turbine1          16      0.92
turbine2          13      0.96
turbine3          16      0.96
verhulst           5      2.12

Table 7.1: The table presents the FPBench benchmark results. "# Ops" is the number of variables and operations in the computation. The "Speedup" is the ratio of the time it takes to respect an error bound of $10^{-12000}$ using the baseline schedule (Section 5.1.1) to the time using the critical path schedule (Section 5.1.2).
in Section 3.2.3. Defining the cost model could be done theoretically (by hand-coding the
asymptotic behavior of each variable and operator) or empirically (by collecting data for
each of the variables and operators and modeling the observed behavior).
The critical path algorithm refines uniformly across the computation except along the critical path. This algorithm is relatively easy to implement and analyze, compared with alternate algorithms that may empirically perform better. For example, consider an algorithm that
uses the sensitivities to refine at different rates along each path in the computation graph
(from the root to the leaf) based on the sensitivity of each of the leaves. Understanding the
degree to which to refine each of these paths is an open problem that could lead to significant
speed improvements, since the effect of the critical path algorithm will be compounded along
all paths simultaneously.
Chapter 8
Conclusions
Bibliography
[1] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig
Citro, Gregory S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghe-
mawat, Ian J. Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia,
Rafal Józefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mané, Ra-
jat Monga, Sherry Moore, Derek Gordon Murray, Chris Olah, Mike Schuster, Jonathon
Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul A. Tucker, Vincent Van-
houcke, Vijay Vasudevan, Fernanda B. Viégas, Oriol Vinyals, Pete Warden, Martin
Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: Large-scale
machine learning on heterogeneous distributed systems. CoRR, 2016.
[3] David Bailey, Peter Borwein, and Simon Plouffe. On the rapid computation of various polylogarithmic constants. Mathematics of Computation, 1997.
[4] Atilim Günes Baydin, Barak A. Pearlmutter, Alexey Andreyevich Radul, and Jef-
frey Mark Siskind. Automatic differentiation in machine learning: A survey. Journal of
Machine Learning Research, 2017.
[5] Jens Blanck. Exact real arithmetic systems: Results of competition. In Computability
and Complexity in Analysis, 2001.
[6] Wei-Fan Chiang, Mark Baranowski, Ian Briggs, Alexey Solovyev, Ganesh Gopalakrish-
nan, and Zvonimir Rakamarić. Rigorous floating-point mixed-precision tuning. SIG-
PLAN Not., 2017.
[7] Wei-Fan Chiang, Mark Baranowski, Ian Briggs, Alexey Solovyev, Ganesh Gopalakrish-
nan, and Zvonimir Rakamarić. Rigorous floating-point mixed-precision tuning. ACM
SIGPLAN Notices, 2017.
[8] Nasrine Damouche, Matthieu Martel, Pavel Panchekha, Jason Qiu, Alex Sanchez-Stern,
and Zachary Tatlock. Toward a standard benchmark format and suite for floating-point
analysis. 2016.
[9] Eva Darulova, Anastasiia Izycheva, Fariha Nasir, Fabian Ritter, Heiko Becker, and
Robert Bastian. Daisy - framework for analysis and optimization of numerical programs
(tool paper). In TACAS, 2018.
[10] Eva Darulova and Viktor Kuncak. Sound compilation of reals. Principles of Program-
ming Languages, 2014.
[11] Eva Darulova and Viktor Kuncak. Towards a compiler for reals. ACM Trans. Program.
Lang. Syst., 2017.
[13] W. Edmonson and G. Melquiond. IEEE interval standard working group - P1788:
Current status. In Symposium on Computer Arithmetic, 2009.
[14] Laurent Fousse, Guillaume Hanrot, Vincent Lefèvre, Patrick Pélissier, and Paul Zimmer-
mann. MPFR: A multiple-precision binary floating-point library with correct rounding.
ACM Trans. Math. Softw., 2007.
[15] Paul Gowland and David Lester. A survey of exact arithmetic implementations. In
Computability and Complexity in Analysis, 2001.
[16] Suyog Gupta, Ankur Agrawal, Kailash Gopalakrishnan, and Pritish Narayanan. Deep
learning with limited numerical precision. In International Conference on Machine
Learning, 2015.
[17] Thomas Hales. A proof of the Kepler conjecture. Annals of Mathematics, 2005.
[18] T. Hickey, Q. Ju, and M. H. Van Emden. Interval arithmetic: From principles to
implementation. J. ACM, 2001.
[20] Dongming Hwang, Daewon W. Byun, and M. Talat Odman. An automatic differen-
tiation technique for sensitivity analysis of numerical advection schemes in air quality
models. Atmospheric Environment, 1997.
[21] Luc Jaulin and Benoît Desrochers. Introduction to the algebra of separators with ap-
plication to path planning. Engineering Applications of Artificial Intelligence, 2014.
[22] E. Kaucher. Interval Analysis in the Extended Interval Space IR. 1980.
[23] Hideyuki Kawabata. Speeding up exact real arithmetic on fast binary Cauchy sequences by using memoization based on quantized precision. Journal of Information Processing, 2017.
[24] Reinhard Kirchner and Ulrich W. Kulisch. Hardware support for interval arithmetic.
Reliable Computing, 2006.
[26] Yong Li and Yong Jun-Hai. Efficient exact arithmetic over constructive reals. In The
4th Annual Conference on Theory and Applications of Models of Computation, 2007.
[27] Valérie Ménissier-Morain. Arbitrary precision real arithmetic: design and algorithms.
The Journal of Logic and Algebraic Programming, 2005.
[28] Ramon E. Moore, R. Baker Kearfott, and Michael J. Cloud. First applications of interval arithmetic. In Introduction to Interval Analysis, chapter 3, pages 19–29. SIAM, 2009.
[29] Duncan J.M Moss, Srivatsan Krishnan, Eriko Nurvitadhi, Piotr Ratuszniak, Chris
Johnson, Jaewoong Sim, Asit Mishra, Debbie Marr, Suchit Subhaschandra, and
Philip H.W. Leong. A customizable matrix multiplication framework for the Intel
HARPv2 Xeon+FPGA platform: A deep learning case study. In International Sympo-
sium on Field-Programmable Gate Arrays, 2018.
[30] Norbert Th. Müller. The iRRAM: Exact arithmetic in C++. In Computability and
Complexity in Analysis, 2000.
[31] B. Nongpoh, R. Ray, S. Dutta, and A. Banerjee. AutoSense: A framework for automated
sensitivity analysis of program data. IEEE Transactions on Software Engineering, 2017.
[32] Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary
DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic
differentiation in PyTorch. In NIPS-W, 2017.
[33] Pooja Roy, Rajarshi Ray, Chundong Wang, and Weng Fai Wong. ASAC: Automatic
sensitivity analysis for approximate computing. Conference on Languages, Compilers
and Tools for Embedded Systems, 2014.
[34] Warwick Tucker. A rigorous ODE solver and Smale’s 14th problem. Foundations of
Computational Mathematics, 2002.