
Sensitivities for Guiding Refinement in

Arbitrary-Precision Arithmetic
by
Jesse Michel
B.S., Massachusetts Institute of Technology (2019)
Submitted to the Department of Electrical Engineering and Computer
Science
in partial fulfillment of the requirements for the degree of
Master of Engineering in Electrical Engineering and Computer Science
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
May 2020
© Massachusetts Institute of Technology 2020. All rights reserved.

Author . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Department of Electrical Engineering and Computer Science
May 18, 2020

Certified by . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Michael Carbin
Jamieson Career Development Assistant Professor
of Electrical Engineering and Computer Science
Thesis Supervisor

Accepted by . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Katrina LaCurts
Chair, Master of Engineering Thesis Committee
Sensitivities for Guiding Refinement in Arbitrary-Precision
Arithmetic
by
Jesse Michel

Submitted to the Department of Electrical Engineering and Computer Science


on May 18, 2020, in partial fulfillment of the
requirements for the degree of
Master of Engineering in Electrical Engineering and Computer Science

Abstract
Programmers often develop and analyze numerical algorithms assuming that they operate on
real numbers, but implementations generally use floating-point approximations. Arbitrary-
precision arithmetic enables developers to write programs that operate over reals: given an
output error bound, the program will produce a result within that bound. A key drawback
of arbitrary-precision arithmetic is its speed. Fast implementations of arbitrary-precision
arithmetic use interval arithmetic (which provides a lower and upper bound for all vari-
ables and expressions in a computation) computed at successively higher precisions until
the result is within the error bound. Current approaches refine computations at precisions
that increase uniformly across the computation rather than changing precisions per-variable
or per-operator. This thesis proposes a novel definition and implementation of derivatives
through interval code that I use to create a sensitivity analysis. I present and analyze the
critical path algorithm, which uses sensitivities to guide precision refinements in the compu-
tation. Finally, I evaluate this approach empirically on sample programs and demonstrate
its effectiveness.

Thesis Supervisor: Michael Carbin


Title: Jamieson Career Development Assistant Professor
of Electrical Engineering and Computer Science

Acknowledgments
I thank my advisor Michael Carbin. He helped guide the intuition and motivation that
shaped this thesis and provided useful feedback and guidance on the experimental results. I
would also like to thank Ben Sherman for helping to develop technical aspects of this thesis,
for making the time to review my writing, and for his guidance throughout the research
process. Alex Renda, Rogers Epstein, Stefan Grosser, and Nina Thacker provided useful
feedback. I am grateful for the financial support that I have received from NSF grant CCF-
1751011.
I thank my parents, sisters, and extended family for their love and support and my
nephew Joseph for being a shining light in my life.

Contents

1 Introduction 13
1.1 Motivating example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.2 Thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.3 Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

2 Background on Interval Arithmetic 19


2.1 Interval addition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.2 Interval multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.3 Interval sine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.4 Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

3 Sensitivities for Precision Refinement 25


3.1 A baseline schedule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.2 Sensitivities from derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.2.1 Constructing sensitivities . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.2.2 Sensitivity as a derivative . . . . . . . . . . . . . . . . . . . . . . . . 28
3.2.3 Introducing a cost model . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.3 A schedule using sensitivities . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.4 Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.4.1 Uniform schedule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.4.2 Critical path schedule . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.4.3 Cost-modeled schedule . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.4.4 A comparison of schedules . . . . . . . . . . . . . . . . . . . . . . . . 34

4 Automatic Differentiation of Interval Arithmetic 37
4.1 Introduction to automatic differentiation . . . . . . . . . . . . . . . . . . . . 37
4.2 Automatic differentiation on intervals . . . . . . . . . . . . . . . . . . . . . . 38
4.2.1 Derivative of interval addition . . . . . . . . . . . . . . . . . . . . . . 39
4.2.2 Derivative of interval multiplication . . . . . . . . . . . . . . . . . . . 40
4.2.3 Derivative of interval sine . . . . . . . . . . . . . . . . . . . . . . . . 41
4.3 Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

5 Results 45
5.1 Schedules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
5.1.1 Baseline schedule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
5.1.2 Critical path schedule . . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.2 Empirical comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.2.1 Improving a configuration . . . . . . . . . . . . . . . . . . . . . . . . 46
5.2.2 Improving a schedule . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
5.3 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

6 Related Work 49
6.1 Mixed-precision tuning and sensitivity analysis . . . . . . . . . . . . . . . . . 50
6.2 Arbitrary-precision arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . 50
6.2.1 Pull-based approaches . . . . . . . . . . . . . . . . . . . . . . . . . . 51
6.2.2 Push-based approaches . . . . . . . . . . . . . . . . . . . . . . . . . . 51

7 Discussion and Future Work 53


7.1 Benchmarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
7.2 Further improving precision refinement . . . . . . . . . . . . . . . . . . . . . 54
7.2.1 Per-primitive cost modeling . . . . . . . . . . . . . . . . . . . . . . . 54
7.2.2 Unexplored trade-offs in precision refinement . . . . . . . . . . . . . . 55
7.2.3 Generalizing the critical path algorithm . . . . . . . . . . . . . . . . . 56
7.3 New applications to experimental research . . . . . . . . . . . . . . . . . . . 56

8 Conclusions 57

List of Figures

1-1 Example of a uniform configuration . . . . . . . . . . . . . . . . . . . . . . . 15


1-2 Derivatives of the computation in Figure 1-1. . . . . . . . . . . . . . . . . . . 16
1-3 Sensitivities of the computation in Figure 1-1 . . . . . . . . . . . . . . . . . 16
1-4 Example of a non-uniform configuration . . . . . . . . . . . . . . . . . . . . 16

2-1 The four key monotonic regions for the definition of interval sine. . . . . . . 22
2-2 A simple Python implementation of interval sin. . . . . . . . . . . . . . . . . 23

3-1 Computation graph for theoretical analysis . . . . . . . . . . . . . . . . . . . 31

4-1 Reverse-mode automatic differentiation on intervals. . . . . . . . . . . . . . 39


4-2 Interval addition with derivatives. . . . . . . . . . . . . . . . . . . . . . . . 40

List of Tables

3.1 Theoretical comparison of schedules . . . . . . . . . . . . . . . . . . . . . . . 35

5.1 Comparison of precisions for configurations . . . . . . . . . . . . . . . . . . . 47


5.2 Comparison of error and time for configurations . . . . . . . . . . . . . . . . 47

7.1 FPBench benchmark results. . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

Chapter 1

Introduction

Floating-point computations can produce arbitrarily large errors. For example, Python
implements the IEEE-754 standard, which produces the following behavior for 64-bit floating-
point numbers:

>>> 1 + 1e17 - 1e17


0.0

The result of this computation is 0 instead of 1! This leads to an arbitrarily large error
in results; for example, (1 + 1e17 − 1e17)𝑥 will be always be 0 instead of 𝑥. Resilience to
numerical-computing error is especially desirable for safety-critical software such as control
systems for vehicles, medical equipment, and industrial plants, which are known to produce
incorrect results because of numerical errors [12].
In contrast to floating-point arithmetic, arbitrary-precision arithmetic computes a result
within a given error bound. Concretely, given the function 𝑦 = 𝑓 (𝑥) and an error bound 𝜖,
arbitrary-precision arithmetic produces a result 𝑦˜ such that


|ỹ − y| < ε.

Arbitrary-precision primitives It is necessary to use a data-type that supports arbi-


trary rational numbers in order to refine to arbitrarily small error. I chose to use a multiple-
precision floating-point representation implemented in MPFR [14]. To understand the representation,
consider the example of representing π to 5 mantissa bits:

$$11.001_2 = \underbrace{11001_2}_{\text{mantissa}} \times 2^{\overbrace{-3}^{\text{exponent}}}.$$

The exponent automatically adjusts as appropriate, so requesting 10 mantissa bits of precision
results in

$$11.00100100_2 = 1100100100_2 \times 2^{-8}.$$

Since the exponent adjusts automatically, I focus on setting the number of mantissa bits for
the variables and operators in the computation. For the rest of the thesis, bits of precision
will denote mantissa bits.
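
As a rough illustration (my own, not from the thesis), the bigfloat wrapper around MPFR used
later in the implementation exposes exactly this control: only the number of mantissa bits is
requested, and the exponent adjusts automatically.

from bigfloat import const_pi, precision

with precision(5):         # 5 mantissa bits
    print(const_pi())      # 3.125      (11.001_2)
with precision(10):        # 10 mantissa bits
    print(const_pi())      # 3.140625   (11.00100100_2)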

Implementing arbitrary-precision arithmetic The push-based approach to implementing
arbitrary-precision arithmetic sets the precisions at which to compute each variable and operator
and computes the error in the output. It then refines results at increasingly high precisions
until the result is within the given error bound. Each pass through the computation uses
interval arithmetic, which computes error bounds by “pushing” bounds from the leaves of
the computation graph up to the root. For example, assuming no error in addition, ⊕ works
such that

$$[1, 2] \oplus [3, 4] = [4, 6].$$

More realistically, suppose that a function $+_p : \mathbb{R} \times \mathbb{R} \to \mathbb{R}^2$ performs bounded addition at
precision p. Its components $\underline{+}_p$ and $\overline{+}_p$ compute the lower and upper bound for adding inputs
truncated to precision p. They satisfy the property that for all $a, b \in \mathbb{R}$,
$(a \,\underline{+}_p\, b) \le (a + b) \le (a \,\overline{+}_p\, b)$, where + is exact, and as $p \to \infty$ the inequalities become
equalities. Assuming error in addition,

$$[1, 2] \oplus_p [3, 4] = [1 \,\underline{+}_p\, 3,\ 2 \,\overline{+}_p\, 4],$$

which will always have a lower bound ≤ 4 and an upper bound ≥ 6. Computing constants
such as π or e makes the need for this type of approximation clearer: since π and e are
transcendental, representing them exactly would require infinite space. However, they are
soundly computed using arbitrary-precision arithmetic.
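
The overall control loop of a push-based computation is small. The following sketch is my own
schematic (not code from the thesis): it assumes a hypothetical helper forward_pass(p) that runs
the interval computation with p mantissa bits everywhere and returns the output interval.

def refine_until(forward_pass, error_bound):
    # Push-based refinement sketch: rerun the interval computation at
    # increasing uniform precisions until the output width meets the bound.
    p = 1
    while True:
        lo, hi = forward_pass(p)   # interval forward pass at p mantissa bits
        if hi - lo < error_bound:
            return lo, hi
        p += 1                     # uniform refinement; Chapter 3 refines this choice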

1.1 Motivating example

Current push-based implementations refine precisions uniformly across the computation


graph [30, 25]. Concretely, this means setting all variables and operators to the same preci-
sion (e.g. 1 mantissa bit) and if the error bound is not satisfied, repeating the computation
at a higher precision (e.g. 2 mantissa bits). This means that certain variables and operators
are computed to a high precision even when they contribute little to the error – an inefficient
allocation of compute resources. For example, consider computing

𝑒 + 1000𝜋

to a generous error bound of 500. Existing approaches refine precision uniformly across vari-
ables and operators [30, 25]. In the best case scenario, these approaches require 5 mantissa
bits for ⊕, e, ⊗, and π (since k is a constant, it remains at a fixed precision). Note that ⊕
and ⊗ are the addition and multiplication operators over intervals respectively, described in
detail in Chapter 2. An example of this computation is shown in Figure 1-1.

⊕ [3070, 3460]

e [2.62, 2.75]    ⊗ [3070, 3330]

k [1000, 1000]    π [3.12, 3.25]

Figure 1-1: The figure presents a computation graph evaluated at a uniform precision of 5
mantissa bits (except for the constant k), with an output error of 3460 − 3070 = 390.

Suppose the approach to precision refinement is to start at a uniform precision of 1 mantissa
bit and then increment precisions (mantissa bits) until the error bound is satisfied. In this
case, there are four refinement steps and the error bound of 500 will only be reached on the
4th refinement at 5 mantissa bits.
I propose an approach that generates non-uniform precision assignments. To determine
which vertices to refine to a higher precision, I introduce a novel sensitivity analysis that
measures the infinitesimal change in the interval width of the output with respect to an
infinitesimal change in the interval width of each of the variables and operators in the
computation. The sensitivities are implemented with automatic differentiation through the
interval code, which is novel as well. More explicitly, if the output is the interval
$y = [\underline{y}, \overline{y}]$, then for each interval $x = [\underline{x}, \overline{x}]$ the derivatives will be
$\left( \frac{\partial(\underline{y}-\overline{y})}{\partial \underline{x}},\ \frac{\partial(\underline{y}-\overline{y})}{\partial \overline{x}} \right)$,
as shown in Figure 1-2. Note that the parentheses in the figure denote pairs of numbers
(tuples), not open intervals.

⊕ (1, −1)

e (1, −1)    ⊗ (1, −1)

k N/A    π (1000, −1000)

Figure 1-2: Derivatives of the computation in Figure 1-1.

The sensitivity is the difference between the derivative of the output with respect to the lower
bound and the derivative of the output with respect to the upper bound, namely
$\frac{\partial(\underline{y}-\overline{y})}{\partial \underline{x}} - \frac{\partial(\underline{y}-\overline{y})}{\partial \overline{x}}$.
The resulting sensitivities are presented in Figure 1-3.

⊕ 2

e 2    ⊗ 2

k N/A    π 2000

Figure 1-3: Sensitivities are the derivative with respect to the lower bound minus the deriva-
tive with respect to the upper bound, as shown in Figure 1-2.

The most sensitive vertex in the computation graph in Figure 1-3 is π because 2000 is
the largest sensitivity. The proposed technique identifies the critical path as the path from
the root to the most sensitive vertex. In this case, the critical path is ⊕ → ⊗ → π.

⊕ [3070, 3460]

e [2.5, 3]    ⊗ [3070, 3330]

k [1000, 1000]    π [3.12, 3.25]

Figure 1-4: Computation graph using 5 mantissa bits for ⊕, ⊗, and π, 3 mantissa bits for e,
and not changing the constant k. The critical path is bolded.

The resulting computation graph is shown in Figure 1-4. Along the critical path, variables and
operators are incremented by 2 mantissa bits, while the remainder of the computation graph
is incremented by 1. This is an instantiation of the critical path algorithm. In this case,
the first configuration satisfying the error bound assigns 5 mantissa bits along the critical
path and 3 bits to e. As k becomes larger, approaches using uniform refinement techniques
compute more and more decimal places of e unnecessarily. The critical path algorithm can
avoid this problem.

1.2 Thesis
In this thesis, I investigate ways to improve precision refinement in arbitrary-precision arith-
metic. I define a novel sensitivity analysis in terms of derivatives computed through interval
code. Using these sensitivities, I propose an algorithm – the critical path algorithm – that
guides the refinement process of arbitrary-precision arithmetic. The sensitivities use deriva-
tives computed with reverse-mode automatic differentiation through interval code, which is
novel. I implement a system for performing arbitrary-precision arithmetic and demonstrate
that the critical path algorithm can guide refinements to produce more accurate results with
less computation on certain programs.

1.3 Outline
The thesis is structured as follows. In Chapter 2, I explain how interval arithmetic works
and elaborate on the mathematical and implementation challenges. Then in Chapter 3, I
present the current approach to implementing arbitrary-precision arithmetic using interval
arithmetic and show how it may be improved assuming derivatives of interval code can be
efficiently computed. I describe the approach to efficient derivative computation in Chap-
ter 4. Next, I present empirical results using the proposed approach to precision refinement
in arbitrary precision arithmetic in Chapter 5. I discuss some related work in Chapter 6
and finally present a discussion in Chapter 7 and conclusions in Chapter 8. The open-source
implementation is available at https://github.com/psg-mit/fast_reals.

Chapter 2

Background on Interval Arithmetic

This chapter provides a brief introduction to interval arithmetic. I describe the interval
operations for addition, multiplication, and sine. I also provide code to elucidate the under-
lying implementation and provide an analysis of some of the properties that arise from using
interval arithmetic.

Interval arithmetic is a method of computing that provides a bound on output error, use-
ful in the implementation of push-based arbitrary-precision arithmetic. For a more thorough
treatment of interval arithmetic, including an analysis of correctness, totality, closedness,
optimality, and efficiency, see [18].

An interval version of $f : \mathbb{R}^n \to \mathbb{R}^m$ takes n intervals as input, $\vec{x} \in (\mathbb{R}^2)^n$, and
produces lower and upper bounds on each of the outputs. Thus, the interval arithmetic
computation of f is a function $f' : \mathbb{R}^{2n} \to \mathbb{R}^{2m}$, where $f'(\vec{x})$ produces m intervals
$[\underline{f'}(\vec{x})_i, \overline{f'}(\vec{x})_i]$ such that

$$\underline{f'}(\vec{x})_i \le f(\vec{x})_i \le \overline{f'}(\vec{x})_i$$

for each i = 1, 2, . . . , m. To achieve this, I take a compositional approach by converting
each operation in f to a version over intervals (that takes intervals as input and produces
intervals as output).

2.1 Interval addition

In this section, I show how to implement interval addition given access to primitives provided
in a number of libraries such as MPFR [14]. Assume that the function $+_p : \mathbb{R} \times \mathbb{R} \to \mathbb{R}^2$,
which computes error-bounded addition at precision p, is given and satisfies the property
that $(a \,\underline{+}_p\, b) \le (a + b) \le (a \,\overline{+}_p\, b)$, where + is exact and, in the limit as $p \to \infty$,
the inequalities become equalities. The addition operator over intervals at precision p is
$\oplus_p : \mathbb{R}^2 \times \mathbb{R}^2 \to \mathbb{R}^2$ and has the following behavior: given the two intervals

$$i_1 = [\underline{i_1}, \overline{i_1}] \quad \text{and} \quad i_2 = [\underline{i_2}, \overline{i_2}],$$

it computes the sum

$$i_1 \oplus_p i_2 = [\underline{i_1} \,\underline{+}_p\, \underline{i_2},\ \overline{i_1} \,\overline{+}_p\, \overline{i_2}].$$

This is correct because there is a precondition that $i_1$ and $i_2$ are valid intervals (i.e.
$\underline{i_1} \le \overline{i_1}$ and $\underline{i_2} \le \overline{i_2}$) and addition is monotonic increasing. Thus, the minimum and maximum
possible values of the sum are the lower and upper bounds given.
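
A minimal sketch of ⊕_p on top of the bigfloat primitives (my own illustration; the structure of
the thesis implementation differs) obtains the under- and over-approximating additions by pairing
a precision with a directed rounding mode:

from bigfloat import add, precision, RoundTowardNegative, RoundTowardPositive

def interval_add(i1, i2, p):
    # Interval addition at p mantissa bits: round the lower bound toward
    # -infinity and the upper bound toward +infinity so the exact sum is enclosed.
    lo = add(i1[0], i2[0], precision(p) + RoundTowardNegative)
    hi = add(i1[1], i2[1], precision(p) + RoundTowardPositive)
    return [lo, hi]

print(interval_add([1, 2], [3, 4], 5))   # an interval enclosing [4, 6]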

2.2 Interval multiplication

Implementing interval multiplication is a little more nuanced because multiplication over the
reals is neither monotonic increasing nor monotonic decreasing. For example,
−5 × −5 = 25 and −4 × −5 = 20, so increasing an argument may decrease the output. On
the other hand, 5 × 5 = 25 and 5 × 6 = 30, so increasing an argument may increase the
output. It is possible to regain monotonicity by partitioning the reals into the negatives R⁻
and the non-negatives R⁺; Kaucher multiplication is an algorithm that takes advantage of this
structure [22].

I present a simpler, but potentially less efficient, algorithm. Assume that the function
$\times_p : \mathbb{R} \times \mathbb{R} \to \mathbb{R}^2$, which computes error-bounded multiplication at precision p, is given and
satisfies the property that $(a \,\underline{\times}_p\, b) \le (a \times b) \le (a \,\overline{\times}_p\, b)$, where × is exact and, in
the limit as $p \to \infty$, the inequalities become equalities. Given the two intervals

$$i_1 = [\underline{i_1}, \overline{i_1}] \quad \text{and} \quad i_2 = [\underline{i_2}, \overline{i_2}],$$

the product at precision p on intervals, $\otimes_p : \mathbb{R}^2 \times \mathbb{R}^2 \to \mathbb{R}^2$, is

$$i_1 \otimes_p i_2 = [\min \underline{S},\ \max \overline{S}]$$

where $\underline{S} = \{\underline{i_1} \,\underline{\times}_p\, \underline{i_2},\ \underline{i_1} \,\underline{\times}_p\, \overline{i_2},\ \overline{i_1} \,\underline{\times}_p\, \underline{i_2},\ \overline{i_1} \,\underline{\times}_p\, \overline{i_2}\}$ is the set of lower bounds of the
pairwise products and $\overline{S} = \{\underline{i_1} \,\overline{\times}_p\, \underline{i_2},\ \underline{i_1} \,\overline{\times}_p\, \overline{i_2},\ \overline{i_1} \,\overline{\times}_p\, \underline{i_2},\ \overline{i_1} \,\overline{\times}_p\, \overline{i_2}\}$ is the set of upper bounds of the
pairwise products. The correctness proof is provided in Section 4.6 of [18].
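
A direct (non-Kaucher) sketch of ⊗_p in the same style, again my own illustration on top of
bigfloat:

from bigfloat import mul, precision, RoundTowardNegative, RoundTowardPositive

def interval_mul(i1, i2, p):
    # Interval multiplication at p mantissa bits via all four pairwise
    # products: round the minimum down and the maximum up.
    down = precision(p) + RoundTowardNegative
    up = precision(p) + RoundTowardPositive
    pairs = [(a, b) for a in i1 for b in i2]
    lo = min(mul(a, b, down) for a, b in pairs)
    hi = max(mul(a, b, up) for a, b in pairs)
    return [lo, hi]

print(interval_mul([-1, 2], [-4, 1], 24))   # an interval enclosing [-8, 4]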

2.3 Interval sine

Even more difficult is the computation of interval sin 𝑝 : R2 → R2 . The contributions in this
section are, to my knowledge, novel, but are not core to the thesis as a whole. This section
serves the purpose of introducing some of the relevant challenges of implementing interval
arithmetic.

Sine is periodic and composed of monotonic segments. I approach computing interval


sine by cases with respect to these segments and identify which region or regions the bounds
lie within. Figure 2-1 depicts the four monotonic segments key to the implementation of
interval sine.

Figure 2-1: The four key monotonic regions for the definition of interval sine.

More formally, let $x \in \mathbb{R}^2$ be given. In the implementation, if $\overline{x} - \underline{x}$ is large (greater
than 3), then the output is the full range of sine (i.e. [−1, 1]). Otherwise, consider the
following cases: x ⊂ (I ∪ IV), x ⊂ (II ∪ III), x ⊂ (I ∪ II), and x ⊂ (III ∪ IV). I assume
access to a function $\sin_p : \mathbb{R} \to \mathbb{R}^2$, where p is the precision of the result, defined such that
$\underline{\sin}_p(x) \le \sin(x) \le \overline{\sin}_p(x)$. This function is provided in MPFR [14]. In the first case, where
x ⊂ (I ∪ IV), sine is monotonic increasing on the interval, so the result is the under-approximation
of the sine of the lower bound of the input and the over-approximation of the sine of the upper
bound of the input. Similar reasoning applies to the other cases (assuming that $\overline{x} - \underline{x} < \pi$)
and is represented in the equation below:



$$\sin_p(x) = \begin{cases}
[\underline{\sin}_p \underline{x},\ \overline{\sin}_p \overline{x}] & \text{for } x \subset (\mathrm{I} \cup \mathrm{IV}) \\
[\underline{\sin}_p \overline{x},\ \overline{\sin}_p \underline{x}] & \text{for } x \subset (\mathrm{II} \cup \mathrm{III}) \\
[\min(\underline{\sin}_p \underline{x},\ \underline{\sin}_p \overline{x}),\ 1] & \text{for } x \subset (\mathrm{I} \cup \mathrm{II}) \\
[-1,\ \max(\overline{\sin}_p \underline{x},\ \overline{\sin}_p \overline{x})] & \text{for } x \subset (\mathrm{III} \cup \mathrm{IV})
\end{cases} \tag{2.1}$$

The Python implementation in Figure 2-2 surfaces a few details that I did not specify in
the mathematical presentation. For example, it shows how to identify the monotonic regions
labeled in Figure 2-1 using cosine. I use the bigfloat Python wrapper for the MPFR library
to compute $\underline{\sin}_p, \overline{\sin}_p : \mathbb{R} \to \mathbb{R}$ [14].

I check that the width of the interval 𝑥 is less than 3 rather than 𝜋 because the contract
of interval arithmetic allows for over-estimation (loose bounds) and in practice, the bounds
are generally tight intervals (with width much less than 3). Also note that the cosine is used
to identify the various regions by their slope, avoiding modular arithmetic and significantly
simplifying the implementation when compared with other approaches ([2]). For example,

def interval_sin(interval, lower_at_p, upper_at_p):
    lower, upper = interval

    # Computes the lower or upper bound at a given precision
    lp, up = lower_at_p, upper_at_p

    # Start at the range of sine
    out_lower, out_upper = -1, 1

    if sub(upper, lower, up) < 3:
        # Signs of derivatives identify monotonic regions
        if (cos(lower, lp) >= 0) and (cos(upper, lp) >= 0):
            out_lower, out_upper = sin(lower, lp), sin(upper, up)
        elif (cos(lower, up) <= 0) and (cos(upper, up) <= 0):
            out_lower, out_upper = sin(upper, lp), sin(lower, up)
        elif (cos(lower, lp) >= 0) and (cos(upper, up) <= 0):
            out_lower = min(sin(lower, lp), sin(upper, lp))
        elif (cos(lower, up) <= 0) and (cos(upper, lp) >= 0):
            out_upper = max(sin(lower, up), sin(upper, up))

    return [out_lower, out_upper]

Figure 2-2: A simple Python implementation of interval sin.

their implementation requires that 𝜋 is computed to higher precisions to compute sine at


higher precisions for certain inputs. Furthermore, the cosine computation can be reused when
implementing the derivative of sin 𝑝 , which is both convenient and efficient (see Section 4.2.3
for more details).

2.4 Analysis

I will briefly reflect on some of the properties of interval arithmetic with these operators.
Applying an interval operator is sound if, for every input interval, the output interval contains
the result. Since any operator can be implemented soundly by returning [−∞, ∞], we also need a
condition on tightness. A precision-parameterized function on intervals (e.g., ⊕_p) is tight if,
for every input interval, in the limit as the precision approaches infinity, the output width
approaches the width obtained when the computation is exact (error-free) over the reals.

I analyze these properties with respect to ⊕_p, ⊗_p, and sin_p. ⊕_p is sound and tight because
it acts element-wise on the input interval endpoints with $\underline{+}_p$ and $\overline{+}_p$. Similarly, ⊗_p is
sound and tight. The implementation of sin_p can be made both sound and tight [2]. However, the
implementation provided is sound but not tight, because it returns [−1, 1] for input intervals
with width greater than 3; it is tight for intervals narrower than 3. In the high-precision
limit for arbitrary-precision arithmetic, the input interval widths tend to 0, and thus sin_p is
tight.

Chapter 3

Sensitivities for Precision Refinement

The push-based approach to implementing arbitrary-precision arithmetic relies upon guess-


ing appropriate precisions for all of the variables and operations in a computation and then
checking whether the error of the result is small enough. If it is not, precisions must be
increased, or refined, until the error bound is satisfied. In this chapter, I present an estab-
lished approach to precision refinement and present a novel algorithm – the critical path
algorithm – to improve precision refinement. I also provide an analysis of the convergence
rate of different schedules operating on a particular class of computations.
At a high level, the critical path algorithm guides precisions of the variables and operators
in a computation using a heuristic. I represent a computation as its underlying computation
graph (e.g. Figure 1-1). A computation graph is a directed acyclic graph consisting of
vertices 𝑉 (the variables and operators in the computation) and edges such that a vertex
𝑣1 has a directed edge to 𝑣2 if and only if 𝑣1 is an operator and 𝑣2 is one of its arguments.
Thus, the leaves will always be variables or constants, and the non-leaf nodes will always be
operators.
A forward pass on a computation graph performs operations from the leaves to the root
following the precisions at each vertex. To more easily refer to parts of this push-based
computation, I introduce the following terminology:

Definition 3.0.1. A configuration 𝐶 : 𝑉 → N maps vertices in the computation graph to


precisions such that a variable 𝑣 ∈ 𝑉 is represented using 𝐶𝑣 bits of precision.

Definition 3.0.2. A schedule is a sequence of configurations 𝑆 : N → (𝑉 → N) such that
𝑛 ↦→ 𝑆 (𝑛) where 𝑆 (𝑛) is the 𝑛th configuration.

Each forward pass computes with respect to a configuration and a push-based com-
putation will follow a schedule – computing with respect to the successive configurations
(𝑆 (𝑖) )𝑖∈N until the result lies within the error bounds. In general, schedules in push-based
computations will produce configurations that assign variables to increasingly high preci-
sions ($S_v^{(k)} < S_v^{(k+1)}$ for all $k \in \mathbb{N}$), leading to a monotonically decreasing error on commonly
occurring computations.

3.1 A baseline schedule

I begin by considering a baseline that generalizes the schedule proposed by iRRAM [30].
This schedule computes the function 𝑓 (𝑥) by setting the (𝑘 + 1)th configuration as

$$S_v^{(k+1)} = S_v^{(k)} + a b^k \tag{3.1}$$

where 𝑆 (0) , 𝑎, 𝑏 are parameters that define the behavior of the schedule. Notice that the
precisions grow exponentially. I present this schedule simply because it is used in iRRAM, one
of the fastest arbitrary-precision arithmetic libraries [5]. There are fundamental trade-offs
between different choices of configurations, with the central concerns being: (1) overshooting
– when the final configuration is at an unnecessarily high precision – and (2) undershooting
– requiring too many forward passes to converge (i.e. the error bound is satisfied when 𝑘 in
𝑆 (𝑘) is large).
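
In code, the baseline is a one-line update per refinement. The sketch below is schematic (the
parameter names are mine) and simply iterates Equation 3.1:

def uniform_schedule(vertices, s0=1, a=1, b=2):
    # Equation 3.1: every vertex gets the same precision, and the increment
    # a * b**k grows exponentially with the refinement index k.
    config = {v: s0 for v in vertices}
    k = 0
    while True:
        yield dict(config)
        for v in config:
            config[v] += a * b**k
        k += 1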

The problem For the schedule in Equation 3.1, the final configuration 𝑆 (𝑘) that satisfies
the given error bound assigns each variable and operation to the same precision; this as-
signment is rarely optimal and may be far from it. Chapter 1 provides a worked example of
a case where uniform refinement is suboptimal and benefits from setting different variables
and operations to different precisions. Although it may be possible to compute the necessary
precisions optimally by hand (at least for simple cases), an optimal, automated approach to

precision refinement would require perfectly modeling floating-point error, which has evaded
researchers.
I choose to take a heuristic approach. Heuristics may be used to guide schedules to
configurations satisfying error bounds, while minimizing the amount of total computation
(the sum of all of the compute, generally measured in time, required for all of the configu-
rations run). A good heuristic is fast to compute and guides the computation quickly to a
configuration respecting the given error bound without overshooting or undershooting.

3.2 Sensitivities from derivatives

In this section, I describe the novel algorithm to compute sensitivities assuming that deriva-
tives of the interval code are already provided. The sensitivities provide a measure of the
amount of change in the output interval width from a change to the input interval width. In
Chapter 4, I demonstrate that derivatives of interval code can be computed efficiently using
automatic differentiation.

3.2.1 Constructing sensitivities

I now present a sensitivity analysis that is a key contribution of this thesis. My construction
of sensitivities of interval computations assumes correctly computed derivatives are already
provided. I will detail the implementation of these derivatives in the following chapter. Run-
ning automatic differentiation on the computation graph of an interval arithmetic expression
produces 4 partial derivatives for each 𝑣 ∈ 𝑉 . In particular, for vertex 𝑣𝑥 corresponding to
the input interval 𝑥, if the function 𝑓 has the output 𝑦 = 𝑓 (𝑥), then the change in the output
$y = [\underline{y}, \overline{y}]$ with respect to a change in the interval $x = [\underline{x}, \overline{x}]$ is

$$\frac{\partial \underline{y}}{\partial \underline{x}},\quad \frac{\partial \underline{y}}{\partial \overline{x}},\quad \frac{\partial \overline{y}}{\partial \underline{x}},\quad \frac{\partial \overline{y}}{\partial \overline{x}}. \tag{3.2}$$

For example, $\frac{\partial \underline{y}}{\partial \underline{x}}$ is an intuitive answer to the question “what will be the change in the
lower bound of the output given a small increase in the lower bound of $v_x$?” Increasing the
precision at which $v_x$ is computed decreases the width of the output interval, and thus,

$$\frac{\partial \underline{y}}{\partial \underline{x}},\ \frac{\partial \overline{y}}{\partial \overline{x}} \ge 0, \qquad \frac{\partial \underline{y}}{\partial \overline{x}},\ \frac{\partial \overline{y}}{\partial \underline{x}} \le 0.$$

This leads to a natural definition of sensitivity that is one of the core contributions of this
thesis. I define the sensitivity of 𝑣𝑥 with respect to a decrease in the width of 𝑥 as:

$$\operatorname{sens}(v_x) = \frac{\partial \underline{y}}{\partial \underline{x}} + \frac{\partial \overline{y}}{\partial \overline{x}} - \frac{\partial \underline{y}}{\partial \overline{x}} - \frac{\partial \overline{y}}{\partial \underline{x}}. \tag{3.3}$$

Implicitly, this formulation of sensitivity asserts that it is just as important to increase


the lower bound as it is to decrease the upper bound because all of the coefficients on the
derivatives have the same (unit) magnitude. Also note that sens(𝑣𝑥 ) ≥ 0 because of the
previous inequalities.
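
Given the four partial derivatives, the sensitivity is a single signed sum. A small sketch
(argument names are mine) makes Equation 3.3 concrete:

def sensitivity(dylo_dxlo, dyhi_dxhi, dylo_dxhi, dyhi_dxlo):
    # Equation 3.3: how fast the output width shrinks when the interval
    # [xlo, xhi] is narrowed; nonnegative by the sign conditions above.
    return dylo_dxlo + dyhi_dxhi - dylo_dxhi - dyhi_dxlo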

3.2.2 Sensitivity as a derivative

I will now build a function that explicitly relates a change in the width of the interval
corresponding to a vertex in the computation graph to the width of the output interval, giving
a scalar-valued function whose derivative determines sensitivity. Formally, the sensitivities
are the derivative of the composition of functions that take the derivative (𝐷𝑓 ) : R2 → R2×2
of an interval-valued function 𝑓 : R2 → R2 and transform it in terms of its directional
derivatives. The goal is to understand the decrease in the output interval width as a result
of decreasing the input interval width. This means directing perturbations in a positive
direction for lower bounds and a negative direction for upper bounds. The function with
respect to a specific input 𝑥 satisfying these properties is:

𝑧𝑥 (𝑡) := (𝑔 ∘ 𝑓 ∘ ℎ𝑥 )(𝑡) (3.4)

where $g : \mathbb{R}^2 \to \mathbb{R}$ and $h_x : \mathbb{R} \to \mathbb{R}^2$ with $g(x) = \underline{x} - \overline{x}$ and $h_x(t) = (\underline{x} + t,\ \overline{x} - t)$. In words, g
computes the (negative) width of an interval, and $h_x$ symmetrically decreases the width of
the interval x by t. If $y = f(x)$, the derivative of f is:

$$(Df)_x = \begin{bmatrix} \frac{\partial \underline{y}}{\partial \underline{x}} & \frac{\partial \underline{y}}{\partial \overline{x}} \\[1ex] \frac{\partial \overline{y}}{\partial \underline{x}} & \frac{\partial \overline{y}}{\partial \overline{x}} \end{bmatrix} \tag{3.5}$$

and the derivative of $z_x(t)$ is:

$$\frac{dz_x}{dt} = \begin{bmatrix} 1 & -1 \end{bmatrix}
\begin{bmatrix} \frac{\partial \underline{y}}{\partial \underline{x}} & \frac{\partial \underline{y}}{\partial \overline{x}} \\[1ex] \frac{\partial \overline{y}}{\partial \underline{x}} & \frac{\partial \overline{y}}{\partial \overline{x}} \end{bmatrix}
\begin{bmatrix} 1 \\ -1 \end{bmatrix}. \tag{3.6}$$

This evaluates exactly to the proposed sensitivity, meaning that if 𝑥 corresponds to the
vertex 𝑣𝑥 in the computation graph,

$$\frac{dz_x}{dt} = \operatorname{sens}(v_x).$$

3.2.3 Introducing a cost model

The proposed sensitivity analysis (in Equation 3.3) implicitly assumes that decreasing the
width of an interval by an infinitesimal amount 𝛿 is just as costly when the interval width
is 100 as when the current interval width is 0.01. This assumption is often inaccurate. For
example, computing the first 𝑛 digits of 𝜋 using the Bailey–Borwein–Plouffe algorithm has
a computational complexity of $O(n \log^3 n)$ [3]. Incorporating the property that it
requires more computational cost to refine narrower intervals may help to encourage cost-
efficient configurations for computations using sensitivities to guide refinement.

The cost-dependent sensitivity analysis for the vertex $v_x$ in the computation graph corresponding
to x is

$$\operatorname{sens}'(v_x) = \begin{bmatrix} c_1(x) & c_2(x) & c_3(x) & c_4(x) \end{bmatrix}
\begin{bmatrix} \frac{\partial \underline{y}}{\partial \underline{x}} \\[1ex] \frac{\partial \overline{y}}{\partial \overline{x}} \\[1ex] -\frac{\partial \underline{y}}{\partial \overline{x}} \\[1ex] -\frac{\partial \overline{y}}{\partial \underline{x}} \end{bmatrix}. \tag{3.7}$$

29
I provide a theoretical analysis of the cost function c with $c_i(x) = \overline{x} - \underline{x}$ for i = 1, 2, 3, 4
in Section 3.4. This formulation allows for per-operator cost functions that can model the
difficulty of refining different parts of the computation graph, which I expand upon in Chapter 7.

3.3 A schedule using sensitivities


In the previous section, I gave a formal definition of sensitivities, and I now explore how
these sensitivities may be incorporated into schedules to produce faster computations. To
describe these schedules, I introduce the following terminology:

Definition 3.3.1. The most sensitive vertex is the vertex 𝑣 ∈ 𝑉 such that

$$v = \operatorname*{arg\,max}_{w \in V} \operatorname{sens}_k(w),$$

where sens𝑘 is the sensitivity (as defined in Equation 3.3) of the given program evaluated at
the 𝑘th configuration.

Definition 3.3.2. The critical path 𝑃 (𝑘) is the path from the most sensitive vertex 𝑣 (with
ties broken arbitrarily) to the root for computation evaluated at the configuration 𝐶 (𝑘) .

Note that the sensitivities may change throughout the course of the computation due to
changes in values. Thus, the most sensitive vertex and critical path are parameterized by
the configuration.
Armed with this terminology, defining the schedule is quite straightforward. At each
iteration, the configuration is refined by a larger increment along the critical path than it is
in the rest of the computation. Explicitly, I define this schedule as

$$S_v'^{(k+1)} = \begin{cases} S_v'^{(k)} + a_1 b_1^k & \text{if } v \in P^{(k)} \\ S_v'^{(k)} + a_2 b_2^k & \text{otherwise} \end{cases} \tag{3.8}$$

where 𝑆 ′(0) , 𝑎1 , 𝑏1 , 𝑎2 , 𝑏2 dictate the behavior of the schedule. I call this the critical path
algorithm for precision refinement. In Chapter 5, I experimentally compare the baseline
(iRRAM) schedule and the proposed schedule.
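
One refinement step of this schedule can be sketched as follows (my own schematic; the vertex
and critical-path bookkeeping is elided):

def critical_path_step(config, critical_path, k, a1, b1, a2, b2):
    # Equation 3.8: vertices on the critical path P^(k) receive the larger
    # increment a1 * b1**k; all other vertices receive a2 * b2**k.
    # The Chapter 5 instantiation uses a1 = a2 = 50, b1 = 1.33, b2 = 1.25.
    return {v: p + round(a1 * b1**k if v in critical_path else a2 * b2**k)
            for v, p in config.items()}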

3.4 Analysis

In this section, I compare the asymptotic behavior and theoretical properties of the uniform,
critical path, and cost-modeled schedules. Although the empirical results use multiple-
precision floating point, I use fixed point for the theoretical results because it is easier to
analyze. In particular, I consider numbers in the range [0, 1] in the form of fixed-point binary
numbers

$$x = \sum_{i=1}^{\infty} 2^{-i} b_i,$$

where 𝑏1 , 𝑏2 , ... ∈ {0, 1}.


Consider a computation of the form:

$$y = \sum_{i=1}^{n} a_i x_i,$$

where $a_i$ is a constant and $x_i \in [0, 1]$ for all i. In this case, the sensitivities will be 2 for all
of the ⊕ and ⊗ operators, $a_i$ for each $x_i$, and not applicable for constants (because they are
assumed to be binary rationals at infinite precision e.g. 𝑎1 , 𝑎2 , . . . , 𝑎𝑛 ). I explore the case
where 𝑎𝑖 = 2−𝑖 . Figure 3-1 presents the computation graph of 𝑦.

[Figure 3-1 graph: products $a_i \otimes x_i$ for i = 1, . . . , n, summed with ⊕ nodes up to the root y.]

Figure 3-1: Example computation graph of a family of computations that can benefit from
the critical path algorithm. The critical path, which remains the same for all iterations, is
bolded for 𝑎𝑖 = 2−𝑖 .

I now introduce notation and properties that are useful in the analysis of different sched-
ules. Let [𝑛] denote the set {1, 2, . . . , 𝑛}.

Definition 3.4.1. Let $p_{x_i}^{(k)}$ denote the precision of the variable $x_i$ on the kth iteration of a
given schedule.

Definition 3.4.2. Let 𝑤(𝑘) denote the width of the output (the error) on the 𝑘th iteration
of a given schedule.

The properties below are simple, but useful to refer to in the analysis that follows.

Fact 1. The largest width possible with k bits of precision is $2^{-k}$.

Fact 2. Given a finite geometric series whose ratio between consecutive terms is $\frac{1}{2}$, the sum is
$\sum_{i=j}^{n} t_i = 2t_j(1 - 2^{j-n-1})$.

3.4.1 Uniform schedule

Assume that each refinement increments the precisions for each of the vertices in the com-
putation graph by 1. Formally, this means that for every 𝑖 ∈ [𝑛],

$$p_{x_i}^{(k)} = k.$$

Since the error for each $x_i$ is the same, namely $\frac{1}{2^k}$ (by Fact 1), it may be factored out of
the summation. The other term contributing to the error is $\sum_{i=1}^{n} a_i = 1 - \frac{1}{2^n}$ (by Fact 2).
Therefore, the width of the output when using uniform refinement is

$$w_u^{(k)} = \frac{1}{2^k}\left(1 - \frac{1}{2^n}\right). \tag{3.9}$$
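
A quick numerical sanity check of Equation 3.9 (my own, not from the thesis), using the
worst-case widths from Fact 1:

def uniform_width(k, n):
    # Worst-case width of y = sum_i 2^-i * x_i when every x_i carries k bits.
    return sum(2**-i * 2**-k for i in range(1, n + 1))

k, n = 6, 10
assert abs(uniform_width(k, n) - (1 / 2**k) * (1 - 1 / 2**n)) < 1e-15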

3.4.2 Critical path schedule

Figure 3-1 shows the critical path of the computation. Since the derivative for each of the
bounds of 𝑥𝑖 is 𝑎𝑖 , the most sensitive vertex (the one with the largest derivative) is 𝑥1 for
every refinement. I use a schedule where the configuration is incremented by 2 along the
critical path and by 1 everywhere else in the computation graph. As a result, if the first
refinement sets every variable and operator to one bit of precision, the precisions will follow
the equations:
$$p_{x_1}^{(k)} = 2k - 1, \qquad p_{x_i}^{(k)} = k \ \text{ for } i \neq 1.$$

Again, by Fact 1, the widths of the intervals are $w_{x_1}^{(k)} = \frac{1}{2^{2k-1}}$ and $w_{x_i}^{(k)} = \frac{1}{2^k}$ for $i \neq 1$. Computing the output
interval width is then a matter of combining these terms with the corresponding coefficients
$a_i = 2^{-i}$, giving rise to the formula:

$$w_p^{(k)} = \frac{1}{2^{2k}} + \frac{1}{2^k}\sum_{i=2}^{n} \frac{1}{2^i}.$$

By Fact 2, it is straightforward to see that the width is

$$w_p^{(k)} = \frac{1}{2^{2k}} + \frac{1}{2^{k+1}}\left(1 - \frac{1}{2^{n-1}}\right). \tag{3.10}$$

The result can be proven formally by induction.

3.4.3 Cost-modeled schedule

I analyze a schedule that uses the critical path algorithm with cost-modeled sensitivities,
and I call this the cost-modeled schedule. I define the cost-aware sensitivity as

$$\operatorname{sens}'(v_x) = \operatorname{sens}(v_x)\,(\overline{x} - \underline{x}),$$

where “sens” is defined in Equation 3.3. The sensitivities sens′ are a special case of the cost
model presented in Section 3.2.3.
The most sensitive vertex is the one with the largest product of the sensitivity and
the interval width. I use the same schedule as in the previous algorithm, where for each
refinement, the configuration is incremented by 2 along the critical path and by 1 everywhere
else in the computation graph. Let $T_l = \frac{l(l+1)}{2}$ be the lth triangular number. Intuitively, the
refinement will proceed as follows:

1. The computation begins at precision $p_{x_i}^{(1)} = 1$ for all i. The most sensitive vertex is $x_1$.

2. After one refinement, the precisions are $p_{x_1}^{(2)} = 3$ and $p_{x_i}^{(2)} = 2$ for $i \neq 1$. $x_1$ and $x_2$ are
equally sensitive.

3. After two additional refinement steps, the precisions are $p_{x_1}^{(4)} = 6$, $p_{x_2}^{(4)} = 5$, and
$p_{x_i}^{(4)} = 4$ for $i \notin \{1, 2\}$. $x_1$, $x_2$, and $x_3$ are all equally sensitive.

4. After three additional refinement steps, the precisions are $p_{x_1}^{(7)} = 10$, $p_{x_2}^{(7)} = 9$, $p_{x_3}^{(7)} = 8$,
and $p_{x_i}^{(7)} = 7$ for $i \notin \{1, 2, 3\}$. $x_1$, $x_2$, $x_3$, and $x_4$ are all equally sensitive.

...

k. Assuming $k \le n$, after l additional refinement steps, the precisions are $p_{x_1}^{(1+T_l)} = T_{l+1}$,
$p_{x_2}^{(1+T_l)} = T_{l+1} - 1$, . . ., and $p_{x_i}^{(1+T_l)} = k$ for $i \notin [l]$, where l is defined so that $k = T_l + 1$.
$x_1, x_2, \ldots, x_{l+1}$ are all equally sensitive.

Using these observations and applying Fact 1 and Fact 2, the formula is:

$$w_c^{(T_l+1)} = \frac{l}{2^{T_{l+1}+1}} + \frac{1}{2^{T_l+l+1}}\left(1 - \frac{1}{2^{n-l}}\right), \tag{3.11}$$

which can be confirmed using induction.
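
As a complement to the inductive argument, a short numerical check of Equation 3.11 (my own
sketch) plugs in the precisions from the walkthrough above:

def T(l):
    # l-th triangular number
    return l * (l + 1) // 2

def cost_modeled_width(l, n):
    # Worst-case width at iteration T_l + 1: x_1, ..., x_l sit at precisions
    # T_{l+1}, T_{l+1} - 1, ..., and the remaining leaves sit at T_l + 1.
    prec = {i: (T(l + 1) - (i - 1) if i <= l else T(l) + 1)
            for i in range(1, n + 1)}
    return sum(2**-i * 2**-prec[i] for i in range(1, n + 1))

l, n = 3, 10
rhs = l / 2**(T(l + 1) + 1) + (1 / 2**(T(l) + l + 1)) * (1 - 1 / 2**(n - l))
assert abs(cost_modeled_width(l, n) - rhs) < 1e-15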

3.4.4 A comparison of schedules

In this section, I compare the three different schedules for a few specific values of n, which
varies the number of terms in the summation $\sum_{i=1}^{n} a_i x_i$, and I analyze the comparative
asymptotic performance of the different schedules.

I derive formulas for the widths of the output intervals as a function of the number of terms
in the summation n and the number of refinements k. I fix $k = T_n + 1$ because it is the
number of refinements at which all of the leaves of the cost-modeled schedule contribute the
same amount of error. This simplifies the expression for the width to

$$w_c^{(T_n+1)} = \frac{n}{2^{T_{n+1}+1}}.$$

Table 3.1 shows the number of additional refinements needed for results to lie within the
error bound from the cost-modeled schedule at the 𝑇𝑛 +1th iteration. Even for relatively small
𝑛, it is clear that there is a significant practical advantage to using the cost-modeled schedule
over the uniform and critical path schedules. The critical path schedule also consistently
outperforms the uniform schedule.

Schedule 𝑛 = 5 𝑛 = 10 𝑛 = 15
Uniform 4 8 13
Crit. path 3 7 12

Table 3.1: The table shows the number of additional refinements needed for the uniform and
critical path schedules to lie within the error bound that the cost-modeled schedule achieves on
the (T_n + 1)th iteration (T_5 + 1 = 16, T_10 + 1 = 56, and T_15 + 1 = 121).

I now study the limiting behavior of the ratio between the width of the cost-modeled
schedule and each of the other two schedules. I find that it is an exponentially better
schedule (note that smaller is better for widths) because

$$\lim_{n \to \infty} \frac{w_c^{(T_n+1)}}{w_p^{(T_n+1)}} = \lim_{n \to \infty} \frac{w_c^{(T_n+1)}}{w_u^{(T_n+1)}} = \lim_{n \to \infty} \frac{n}{2^n} = 0.$$

The single additional bit added along the critical path significantly improves the refinement
process for the cost-modeled schedule. This emphasizes the importance of using carefully
considered scheduling algorithms.
To a lesser extent, the critical path schedule outperforms the uniform schedule. The limit

$$\lim_{n \to \infty} \frac{w_p^{(T_n+1)}}{w_u^{(T_n+1)}} = \frac{1}{2}$$

shows that the critical path schedule is a factor of two tighter than the uniform schedule at
the same refinement iteration. A single additional bit of precision per refinement leads to
halving the interval width globally.

Chapter 4

Automatic Differentiation of Interval Arithmetic

In this chapter, I introduce automatic differentiation and detail both the implementation
and relevant analysis behind computing derivatives of interval code. The recent popularity
of deep learning led to a focus on efficient computation of derivatives. Indeed, the backprop-
agation algorithm, key to deep learning, is a special case of automatic differentiation [1, 32].
I begin with a brief overview of different approaches and some design considerations in the
efficient computation of derivatives.

4.1 Introduction to automatic differentiation

Automatic Differentiation (AD) enables efficient derivative computations and is commonly


used in machine learning to compute first- and second-order derivatives [4]. I will provide a
brief overview of AD and direct readers to [4, 19] for further description.
There are two chain-rule factorizations of derivatives that lead to two different realizations
of AD with different properties: forward-mode and reverse-mode.

Forward-mode AD Consider a differentiable function 𝑓 : R𝑁 → R𝑀 . Forward-mode AD


computes derivatives from the 𝑁 leaves of the computation graph up to the 𝑀 roots. This
simple choice of computing derivatives from the leaves to the roots means that the values of

the 𝑁 leaf derivatives are assigned at initialization. As a result, forward-mode AD computes
all of the 𝑀 output derivatives with respect to an assignment of input derivatives.

Reverse-mode AD Consider a differentiable function 𝑓 : R𝑁 → R𝑀 . In contrast to


forward-mode, reverse-mode AD computes derivatives from the 𝑀 roots of the computation
graph to the 𝑁 leaves. Reverse-mode AD reverses the dependencies, requiring an initializa-
tion for the 𝑀 output derivatives and computing the 𝑁 input derivatives. Derivatives with
respect to all inputs are computed with respect to an assignment of output derivatives.

4.2 Automatic differentiation on intervals

I decide to use reverse-mode AD because it computes from the outputs to the inputs. There-
fore, given an initialization for the two output derivatives, reverse-mode AD computes the
derivatives for all of the inputs and intermediate computations. In contrast, forward-mode
would require a forward-pass for each input (in the case of interval arithmetic, both the lower
and upper bound).

For simplicity, I do not use interval arithmetic to bound the error of the gradient com-
putation, but I do leave the gradients parameterized by precision for convenience and note
that my implementation can easily be extended to compute error-bounded gradients.

Figure 4-1 presents an implementation of the gradient computation code on intervals.


Each variable has an appropriate weight (the derivative of the term with respect to self )
assigned during the forward-pass based on the operation performed. The gradient is then the
sum of the product of the appropriate weight and the corresponding gradient, as given by the
chain rule. I provide an example implementation for interval addition. The careful reader
may notice that there is an unnecessary recomputation of the co-recursive call to grad in
_compute_grad . Indeed, my implementation caches computed gradients appropriately, but
these details are omitted in Figure 4-1 for simplicity.

def _compute_grad(self, parents, rte):
    # Chain rule: accumulate each parent's weight times its (recursively
    # computed) gradient.
    grad = 0
    for (w1, w2), var in parents:
        lower, upper = var.grad()
        grad_term = add(mul(w1, lower, rte), mul(w2, upper, rte), rte)
        grad = add(grad_term, grad, rte)
    return grad

def grad(self):
    # Gradients are computed at a fixed precision with round-to-nearest.
    rte = precision(self.grad_precision) + RoundTiesToEven
    self.lower_grad = self._compute_grad(self.ad_lower_parents, rte)
    self.upper_grad = self._compute_grad(self.ad_upper_parents, rte)
    return self.lower_grad, self.upper_grad

Figure 4-1: Reverse-mode automatic differentiation on intervals.

Extracting sensitivities Generating the sensitivity for each variable specified in Equa-
tion 3.3 requires two simple but key steps. First, initialize the derivatives at the root

root.lower_grad, root.upper_grad = 1, -1 .

Then, for a vertex 𝑣 in the computation graph, I compute the sensitivity 𝑠𝑒𝑛𝑠(𝑣), described
in Equation 3.3:
v.lower_grad - v.upper_grad .

These correspond to the post-composition with g and pre-composition with $h_x$ that map
the four partial derivatives in Equation 3.5 to the sensitivity. Explicitly,

$$\begin{bmatrix} 1 & -1 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} 1 \\ -1 \end{bmatrix}$$

correspond to initializing the derivatives at the root and sensitivity assignment respectively
as they appear in Equation 3.6. The given code snippets constitute the implementation
naturally arising from Equation 3.6.

4.2.1 Derivative of interval addition

Building upon the explanation of interval addition in Section 2.1, I now show how to take
derivatives through $\oplus_p : \mathbb{R}^2 \times \mathbb{R}^2 \to \mathbb{R}^2$, the interval addition operator at precision p. Since
$\oplus_p$ is monotonic increasing, the derivative of the lower bound of the output with respect
to the lower bound of either of the inputs is 1 and similarly, the derivative of the upper
bound of the output with respect to the upper bound of either of the inputs is 1. All other
derivatives are 0, with eight derivatives in total.
Explicitly, Figure 4-2 presents an implementation of interval addition with derivatives.
My implementation stores these derivatives and the corresponding object that are used in
the recursive calls to grad .

def __add__(self, lower_at_p, upper_at_p):
    # Perform addition
    left, right = self.parents
    self.lower = add(left.lower, right.lower, lower_at_p)
    self.upper = add(left.upper, right.upper, upper_at_p)

    # Add derivative information
    left.ad_lower_parents.append(((1, 0), self))
    left.ad_upper_parents.append(((0, 1), self))
    right.ad_lower_parents.append(((1, 0), self))
    right.ad_upper_parents.append(((0, 1), self))

Figure 4-2: Interval addition with derivatives.

4.2.2 Derivative of interval multiplication

The derivative of interval multiplication is more difficult because the output interval is the
minimum and maximum of the set of pairwise products of the input intervals (as explained
in Section 2.2). The derivative computation involves identifying the terms that contribute
to the output and assigning the appropriate derivatives. I will only provide an example and
forego the implementation as it is detailed and adds little additional insight (it is in the
provided code).

Example 1. In this example, I assume error-free arithmetic to focus on the complexities


of differentiation. Recall that the computation $x \otimes y = z$ expands to

$$[\underline{x}, \overline{x}] \otimes [\underline{y}, \overline{y}] = [\underline{z}, \overline{z}].$$

Consider the example:

$$[-1, 2] \otimes [-4, 1] = [-8, 4].$$

The set of products is S = {−8, −1, 2, 4}. Since $z = [\min S, \max S]$, the result is z = [−8, 4].
There are four input endpoints {−1, 2, −4, 1} and two output endpoints, −8 and 4, which lead to
the eight derivatives shown in Equation 4.1. Each derivative provides an answer to the intuitive
question: “how much would a change in this input affect that output?”

Since −1 only contributes to $\overline{z}$, where it is multiplied by −4, the derivatives
$\left(\frac{\partial \underline{z}}{\partial \underline{x}}, \frac{\partial \overline{z}}{\partial \underline{x}}\right)$ are (0, −4). Similarly, 2 only contributes to $\underline{z}$, and it is multiplied by −4, so the
derivatives $\left(\frac{\partial \underline{z}}{\partial \overline{x}}, \frac{\partial \overline{z}}{\partial \overline{x}}\right)$ are (−4, 0). Continuing in this way yields the eight derivatives

$$((0, -4),\ (-4, 0),\ (2, -1),\ (0, 0)),$$

corresponding to the derivatives

$$\left(\left(\frac{\partial \underline{z}}{\partial \underline{x}}, \frac{\partial \overline{z}}{\partial \underline{x}}\right), \left(\frac{\partial \underline{z}}{\partial \overline{x}}, \frac{\partial \overline{z}}{\partial \overline{x}}\right), \left(\frac{\partial \underline{z}}{\partial \underline{y}}, \frac{\partial \overline{z}}{\partial \underline{y}}\right), \left(\frac{\partial \underline{z}}{\partial \overline{y}}, \frac{\partial \overline{z}}{\partial \overline{y}}\right)\right). \tag{4.1}$$
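
A compact way to recover these eight derivatives is to record which pairwise product produced
each output bound. The sketch below (my own illustration of the idea, not the thesis's cached
reverse-mode implementation) assumes exact arithmetic as in Example 1:

def interval_mul_with_grads(x, y):
    # x and y are [lower, upper] pairs; returns the product interval and the
    # nonzero partial derivatives, keyed by (output bound, input name, endpoint index).
    products = [(x[i] * y[j], i, j) for i in range(2) for j in range(2)]
    zlo, ilo, jlo = min(products)
    zhi, ihi, jhi = max(products)
    # d(bound)/d(x endpoint) is the y endpoint it was multiplied by, and vice versa.
    grads = {("lo", "x", ilo): y[jlo], ("lo", "y", jlo): x[ilo],
             ("hi", "x", ihi): y[jhi], ("hi", "y", jhi): x[ihi]}
    return [zlo, zhi], grads

print(interval_mul_with_grads([-1, 2], [-4, 1]))
# -> ([-8, 4], {('lo','x',1): -4, ('lo','y',0): 2, ('hi','x',0): -4, ('hi','y',0): -1})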

4.2.3 Derivative of interval sine

Building on the understanding of how to compute derivatives of $\oplus_p$ and $\otimes_p$, I now briefly


cover how to compute the derivative of sin 𝑝 . It may help to take another look at the detailed
description of interval sine in Section 2.3 and to look at the definition of sin𝑝 in Equation 2.1.

Each of the cases for the derivative (assuming that $\overline{x} - \underline{x} < \pi$) is shown below:

$$\frac{d\sin_p}{dx} = \begin{cases}
((\cos_p \underline{x},\ 0),\ (0,\ \cos_p \overline{x})) & \text{for } x \subset (\mathrm{I} \cup \mathrm{IV}) \\
((0,\ \cos_p \underline{x}),\ (\cos_p \overline{x},\ 0)) & \text{for } x \subset (\mathrm{II} \cup \mathrm{III}) \\
((\cos_p \underline{x},\ 0),\ (0,\ 0)) & \text{for } (x \subset (\mathrm{I} \cup \mathrm{II})) \wedge (\sin_p \underline{x} < \sin_p \overline{x}) \\
((0,\ 0),\ (\cos_p \overline{x},\ 0)) & \text{for } (x \subset (\mathrm{I} \cup \mathrm{II})) \wedge \neg(\sin_p \underline{x} < \sin_p \overline{x}) \\
((0,\ \cos_p \underline{x}),\ (0,\ 0)) & \text{for } (x \subset (\mathrm{III} \cup \mathrm{IV})) \wedge (\sin_p \underline{x} > \sin_p \overline{x}) \\
((0,\ 0),\ (0,\ \cos_p \overline{x})) & \text{for } (x \subset (\mathrm{III} \cup \mathrm{IV})) \wedge \neg(\sin_p \underline{x} > \sin_p \overline{x})
\end{cases} \tag{4.2}$$

where the regions I, II, III, IV are those specified in Figure 2-1. If $\overline{x} - \underline{x} \ge \pi$, my im-
plementation returns (0, 0), which is potentially too “loose” (because sine evaluated at an
interval with width 𝜋 may not span the whole range of sine), but is still sound because
over-approximation is acceptable for interval arithmetic.

4.3 Analysis
In this section, I introduce the mathematical challenges that arise from taking derivatives
through interval code and highlight some additional concerns in my implementation. Non-
differentiability is of particular concern. Addition on intervals 5 is differentiable, but mul-
tiplication on intervals 5 is only differentiable almost everywhere. For example, consider
[−2, 4] 5 [−4, 2] = [−16, 8]: the upper bound could either be from −2 × −4 or 4 × 2. Thus,
this computation is not differentiable, but most computations like the one shown in Exam-
ple 1 are differentiable. My implementation takes the derivative with respect to the selected
computation as a result of the nondeterministic choices (arising from the computation of
min and max with multiplicity at the extrema) made during the computation. Similarly,
sin is differentiable almost everywhere and is not differentiable, for example, at the interval
[−1, 2], which has a width of 3 and does not span [−1, 1].
Example 1 also exhibits dead-zones, where a set of inputs has a zero derivative. In
Example 1, where [−1, 2] ⊗ [−4, 1] = [−8, 4], “1” could be replaced with any a in the open
interval (−4, 2) and produce the same result. Since none of these values of a will contribute to
the output, they will have a derivative of (0, 0). Similarly, there is a dead-zone for all inputs
with a width greater than 3 for sin (the same is true for any definition for a width ≥ 2𝜋).
These dead-zones present a challenge for using derivatives as sensitivities. For example, all
of the derivatives are 0 for sin for a wide interval, indicating that it is not important to
decrease the interval widths. However, this is clearly not the case, as a non-infinitesimal
change (like those used in refinement) may indeed yield a narrower interval (and a more
accurate result).
Together, dead-zones and non-differentiability present cases where this approach for us-
ing derivatives as sensitivities may fail. They also highlight some subtleties of computing
derivatives through interval code that may be worth further mathematical exploration and
analysis. Since computations “break ties” by making arbitrary non-deterministic choices, I
compute derivatives for every input (even at points that are technically not differentiable).
I now move on to establishing the benefit of computing these derivatives.

Chapter 5

Results

In this chapter, I present an experiment and provide empirical results demonstrating the
effectiveness of the critical path algorithm described in Chapter 3.
Consider the computation
$$y = \pi + 2^{100000} e, \tag{5.1}$$

which is the motivating example from Section 1.1 depicted in Figure 1-1, except with
$k = 2^{100000}$. In this case, changing the precision at which π is computed from 1 to 2 bits reduces
the output error by approximately 1, whereas for e, the same change in precision reduces
the output error by approximately $2^{99999}$.

5.1 Schedules

In this section, I define the baseline schedule and the critical path schedule that I will compare
empirically in Section 5.2 on the computation in Equation 5.1.

5.1.1 Baseline schedule

iRRAM uses a precision schedule 𝑆 that uniformly refines variables and operators with

$$S_v^{(k)} = S_v^{(k-1)} + 50 \cdot 1.25^k,$$

where $S^{(0)} = 0$. The computation is run at configurations starting with $S^{(1)}$. Intuitively,
the refinement process increases the precision of every variable and operator in the program
by roughly 25% per refinement until the output error is within the error bound.

5.1.2 Critical path schedule

Now consider the alternate precision schedule 𝑆 ′ that sets variables and operators to different
precisions depending on whether or not they are on the critical path, $P^{(k)} = \{+, \times, e\}$, for
every k. The instantiation of the critical path algorithm I use is

$$S_v'^{(k+1)} = \begin{cases} S_v'^{(k)} + 50 \cdot 1.33^k & \text{if } v \in P^{(k)} \\ S_v'^{(k)} + 50 \cdot 1.25^k & \text{otherwise} \end{cases}$$

where 𝑆 ′(0) = 0. Notice that the precision refinements increase at a faster rate along the
critical path than in the rest of the program.
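
A sketch of a single refinement step under this schedule follows; representing critical-path
membership as a per-node boolean and rounding the increment to an integer are assumptions
about details the recurrence leaves open.

    def refine_node(prev_precision, on_critical_path, k):
        """Compute S'_v^(k+1) from S'_v^(k): nodes on the critical path P^(k) are
        refined at the faster 1.33 rate, all others at the baseline 1.25 rate."""
        rate = 1.33 if on_critical_path else 1.25
        return prev_precision + int(50 * rate ** k)
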

5.2 Empirical comparison


In this section, I present the experimental results from the implementation of Equation 5.1,
with the goal of comparing the baseline schedule and the critical path schedule. I begin
by comparing two configurations arising from the two schedules at the final iteration of
a computation satisfying the error bound of 10^−12000 . Then I show how this affects the
schedules as a whole for the same error bound.

5.2.1 Improving a configuration

I compare the time it takes to run two precision configurations that constitute a mapping
from variables and operations (in this case {+, 𝜋, ×, 2^100000 , 𝑒}) to precisions for the example
presented in Equation 5.1 and Figure 1-1. The error in the computation is the width of the
output interval.
Notice that in Table 5.1, the precisions along the critical path (which is {+, ×, 𝑒} because
𝜕𝑦/𝜕𝑒 is the largest derivative) for 𝑆 ′(24) are higher than the precisions in 𝑆 (29) , while the
variables not on the critical path have a lower precision. Furthermore, note from Table 5.2
that using the configuration 𝑆 ′(24) has less output error than 𝑆 (29) . This means that using
the critical path schedule produces a superior final configuration with a speed increase of
roughly 37% and higher output accuracy.

Configuration    +         𝜋         ×         2^100000    𝑒
𝑆 (29)           129046    129046    129046    129046      129046
𝑆 ′(24)          142047    42151     142047    42151       142047

Table 5.1: The table presents a comparison of the precisions generated on the 29th iteration
of the baseline schedule, 𝑆 (29) , and the 24th configuration of the critical path schedule, 𝑆 ′(24) .

Configuration    Error              Time (sec)
𝑆 (29)           1.5 · 10^−8743     0.037
𝑆 ′(24)          3.1 · 10^−12657    0.027

Table 5.2: The table presents a comparison of the first configurations satisfying the error
bound of 10^−12000 for the baseline and critical path schedules.

5.2.2 Improving a schedule

The amount of total computation of a schedule can be understood in terms of the number
of configurations computed and the time to compute each configuration. In the previous
section, I demonstrate that the critical path algorithm requires less time on the final
configuration and requires fewer refinement steps. This suggests that the schedule as a
whole will take less total time as well.

Let 𝑡(𝑆 (𝑘) ) denote the time it takes to run the schedule 𝑆 for 𝑘 iterations (i.e. to run all
of the configurations 𝑆 (1) , 𝑆 (2) , . . . , 𝑆 (𝑘) ). Continuing this example, it is clear that the effect
of using the critical path algorithm is even more pronounced at the schedule level. I find
𝑡(𝑆 (29) ) = 0.163s and 𝑡(𝑆 ′(24) ) = 0.112s, while the error on the last iteration is as shown in
Table 5.2. This means that this approach produces a 45% speed increase with higher output
accuracy.

5.3 Implementation
I implement a push-based system of arbitrary-precision arithmetic that uses interval arith-
metic and computes derivatives using reverse-mode automatic differentiation. The imple-
mentation is in Python and uses the bigfloat wrapper for MPFR for multiple-precision
floating-point computations [14]. The implementation is available at
https://github.com/psg-mit/fast_reals.
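
As an illustration of the interval layer, the sketch below evaluates single operations at a given
precision with outward rounding through bigfloat; it is a simplified stand-in rather than an
excerpt from the repository, and it assumes bigfloat's context interface (precision(p)
combined with a rounding mode).

    from bigfloat import add, mul, precision, RoundTowardNegative, RoundTowardPositive

    def interval_add(x, y, p):
        """x and y are (lower, upper) pairs; each bound is computed at precision p
        with outward-directed rounding so the result encloses the true sum."""
        lo = add(x[0], y[0], context=precision(p) + RoundTowardNegative)
        hi = add(x[1], y[1], context=precision(p) + RoundTowardPositive)
        return (lo, hi)

    def interval_mul(x, y, p):
        down = precision(p) + RoundTowardNegative
        up = precision(p) + RoundTowardPositive
        lo = min(mul(a, b, context=down) for a in x for b in y)
        hi = max(mul(a, b, context=up) for a in x for b in y)
        return (lo, hi)
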

Implementation challenges I encountered three core challenges in the implementation:

1. Caching – computations are cached automatically by MPFR.

2. Inconsistent performance – at low precisions, performance characteristics are erratic.

3. Timing error – simple programs are in the range of timing error.

I solve challenge 1 by running each experiment in a separate thread – allowing caching, but
only within each run and not among runs. I resolve challenges 2 and 3 by setting small
enough tolerances that durations are large enough to be easily measurable.

Chapter 6

Related Work

Related work aims to improve the performance of arbitrary-precision arithmetic with careful
software implementations, by restructuring computations for efficiency, and by caching [23,
25, 30]. Due to performance concerns, interval arithmetic and arbitrary-precision arithmetic
have not yet been widely adopted. However, interval and arbitrary-precision arithmetic have
uses in robotics and more generally in global optimization, and have been used in the proof of
the Kepler conjecture and of the existence of the Lorenz attractor in 3D [17, 21, 34].

Implementations of arbitrary-precision arithmetic often rely on interval arithmetic, which
researchers are working to accelerate by introducing further standardization – namely IEEE-
1788 – and by creating specialized computer architectures [13, 24]. I take a mixed-precision
approach, which would see significant improvements with hardware support. For example,
field-programmable gate arrays (FPGAs) have been used for mixed-precision computation
in other applications [16, 29].

I tackle the problem of allocating precisions to variables and operators in expressions in
arbitrary-real computations. Although I do not know of other work addressing this particular
problem, mixed-precision tuning shares similar concerns and trade-offs. I draw inspiration
from prior work on this topic. In Chapter 4, I present my approach to automatic dif-
ferentiation for interval arithmetic, where I include appropriate references as they arise. I
now provide a background of various approaches to sensitivity analysis and to implementing
arbitrary-precision arithmetic.

6.1 Mixed-precision tuning and sensitivity analysis

Mixed-precision tuning of floating-point computations involves assigning variables in a pro-
gram different floating-point precisions (e.g. float32 versus float64). The goal is to
minimize run-time, space, etc. while respecting a bound on the output error.

The numerical tuning approaches providing error bounds often use SMT-solvers and
restrict the inputs to interval ranges [6, 7, 9, 10, 11]. One common way to use sensitivity
analysis techniques (which measure the effect that changing an input parameter has on the
output) is to produce annotations that identify operations requiring high precision, while
satisfying an error bound [31, 33]. For example, Hwang et al. use automatic differentiation
to produce a sensitivity analysis of air quality models [20].

6.2 Arbitrary-precision arithmetic

I present a new categorization of two high-level approaches to arbitrary-precision arithmetic.


The pull-based approach propagates error bounds recursively from the output to each of the
sub-expressions, requiring a single pass through the computation but often producing overly-
precise results. I call this approach pull-based because error flows down from the root of
the computation graph to the leaves. The push-based approach computes the corresponding
error at precisions set arbitrarily and refines results at increasingly high precisions until the
given error bound is satisfied. Each pass through the computation uses interval arithmetic,
which computes error bounds by “pushing” bounds from the leaves of the computation graph
up to the root. The push-based approach sometimes requires multiple passes through the
computation, but it can potentially avoid producing unnecessarily-precise results. A survey
of techniques for implementing arbitrary-precision arithmetic is presented in [15]. Because
the analysis used to specify error thresholds is often loose, there seems to be some consensus
that the push-based approach is faster [15, 23, 25, 26, 30].

6.2.1 Pull-based approaches

A representative pull-based approach is to use binary Cauchy sequences and to evaluate
sub-expressions at higher precisions, ensuring a single pass from the root to the leaves is
required [27]. Concretely, a real number is represented by an infinite sequence of integers in
some base and a denominator that increases exponentially by that base at every iteration.
In the binary case, for an integer sequence {𝑧_𝑘}, the corresponding real number is the limit
of the sequence 𝑧_𝑘 · 2^−𝑘 . There have been efforts to accelerate this approach by caching results
even at different precisions [23]. Unfortunately, the pull-based approach often computes
overly-precise results and, especially as expression-size scales, has worse overall performance
than push-based approaches (based upon the results of the CCA 2000 competition) [5].
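
For intuition (this small example is mine, not drawn from [27]): taking z_k = round(2^k · x)
gives |z_k · 2^−k − x| ≤ 2^−(k+1), so the dyadic approximations z_k · 2^−k converge to x.

    def dyadic_approximations(x, n):
        """First n terms z_k * 2**-k of a binary Cauchy representation of x,
        using z_k = round(2**k * x); illustrative only, not code from [27]."""
        return [round(x * 2 ** k) / 2 ** k for k in range(n)]

    # dyadic_approximations(1 / 3, 5) -> [0.0, 0.5, 0.25, 0.375, 0.3125]
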

6.2.2 Push-based approaches

iRRAM In iRRAM [30], intervals are iteratively, globally refined with a uniform precision
for each node in the computation tree to yield a result with the desired precision. In terms of
relevant optimizations, iRRAM supports by-hand labeling of specific parts of a computation
as more sensitive and thus computing them with higher precision than the rest of the pro-
gram. Müller evaluates iRRAM by showing its performance in computing simple arithmetic
(e.g. √(1/3), log(1/3)), iterative functions (the logistic map 𝑥_𝑖 = 3.75 · 𝑥_{𝑖−1} (1 − 𝑥_{𝑖−1} )), and
inverting the Hilbert matrix. Since these computations have a computation graph with few
or no branches, I would not expect significant speed increases using my proposed approach
on this set of benchmarks.

RealLib Lambov [25] aims to make low-precision arbitrary-precision arithmetic compara-
ble to floating point in terms of speed. Their core insight is that sometimes using a pull-based
approach is faster on small subtrees of the computation graph. They provide a programming
model that allows users to have pull-based sub-expressions within the computation in the
overall push-based computation. At high precisions, they find that iRRAM generally outper-
forms RealLib, but in the low-precision regime on particular computations, their approach
yields orders-of-magnitude faster results than iRRAM, even giving speeds comparable to

floating point in some cases. They also accelerate their computation by caching results, so
that if an expression appears in multiple places, its result may be reused.

Chapter 7

Discussion and Future Work

In this chapter, I introduce some additional, preliminary benchmark results, reflect upon
some ways future researchers may improve precision refinement, and I lay out a new appli-
cation of the techniques developed in this thesis to experimental research.

7.1 Benchmarks

Currently, there is not a comprehensive benchmark suite for arbitrary-precision numerical
tasks with significant branching in the computation graph. The CCA benchmarks are mostly
computations that have little branching, such as arctan(10^50). The FPBench benchmark
suite for scientific computations has more programs with branched computation graphs,
but the inputs are defined over intervals without a clear arbitrary-precision translation [8].
Furthermore, there are few arbitrary-precision constants such as 𝑒, 𝜋, ℎ, etc. A comprehensive
collection of scientific computations that rely on high-precision inputs and constants would
help compare future work aimed at speeding up arbitrary-precision arithmetic.

Methodology The benchmarks have variables that come from an interval range, constants
specified as floats, and operators. I implement a stream-based uniform random sampler that
selects a point from the interval range and provides an arbitrary-precision sample. Due to
the heavy use of random sampling, it is relatively computationally expensive to increase
precisions. The constants are left as-is and the precision remains the same throughout the
computation. The operations are replaced with their arbitrary-precision equivalents. For
simplicity, I use a subset of the FPBench benchmarks that does not contain loops or other
language primitives, although these would be possible to support [8].
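
The sketch below illustrates one way such a stream-based sampler can behave; the class name
is hypothetical and the final value is collapsed to a Python float for brevity (the actual
sampler produces arbitrary-precision values), but it shows the key property that refining the
precision extends the same sample rather than redrawing it.

    import random

    class StreamSample:
        """Hypothetical sketch of a stream-based uniform sample from [lo, hi]: random
        bits are generated lazily, so requesting more precision extends the same
        underlying sample instead of drawing a new point."""

        def __init__(self, lo, hi, seed=None):
            self.lo, self.hi = lo, hi
            self.rng = random.Random(seed)
            self.bits = []  # lazily extended binary expansion of a point in [0, 1)

        def at_precision(self, p):
            while len(self.bits) < p:
                self.bits.append(self.rng.getrandbits(1))
            frac = sum(b * 2.0 ** -(i + 1) for i, b in enumerate(self.bits[:p]))
            return self.lo + (self.hi - self.lo) * frac
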

Preliminary results Table 7.1 shows the speedup from using the critical path schedule
(Section 5.1.2) instead of the baseline schedule (Section 5.1.1). The parameters for the
critical path schedule are the same as the parameters in the experiments in Section 5. The
results are comparable, if not slightly worse, using the critical path algorithm for all of the
benchmarks except verhulst. Looking at the underlying computations, verhulst is the only
benchmark that has a single clear choice of critical path and benefits with a 2.12x speedup.
For the other computations, the critical path remains the same throughout the computation
and thus that path is over-refined. In other words, the refinement results in a little extra
computation with little benefit in output precision. This problem is compounded by the
methodology, in which more digits of variables are sampled on-the-fly, which is computationally
expensive. The benefits of the critical path schedule on general-purpose computation are
limited by the lack of per-variable and per-operation cost modeling and by the simplicity of
the algorithm. I discuss ways to broaden the applicability of this technique and to extend it
in Section 7.2.

7.2 Further improving precision refinement


The critical path algorithm is a new approach to precision refinement that inspires a number
of questions and opens up many directions for future research.

7.2.1 Per-primitive cost modeling

The sensitivity analysis that I focus on in this thesis does not take into account the compu-
tational difficulty of refining different variables and operations. For example, generating the
10th digit of precision for 𝜋 will generally require more compute than for a sum. Operators
will require different amounts of compute that will scale differently with 𝑝. These differ-
ences can be accounted for by incorporating them into a cost model like the one presented

Benchmarks       # Ops   Speedup
carbon gas       15      0.97
doppler1         11      1.0
doppler2         11      0.99
doppler3         11      0.97
jetEngine        28      0.92
predPrey         7       0.93
rigidbody1       11      0.96
rigidbody2       13      0.93
sine             11      0.91
sineOrder3       6       0.96
sqroot           12      0.97
turbine1         16      0.92
turbine2         13      0.96
turbine3         16      0.96
verhulst         5       2.12

Table 7.1: The table presents the FPBench benchmark results. "# Ops" is the number of
variables and operations in the computation. The "Speedup" is the ratio of the time it takes
to respect an error bound of 10^−12000 using the baseline schedule (Section 5.1.1) to the time
using the critical path schedule (Section 5.1.2).

in Section 3.2.3. Defining the cost model could be done theoretically (by hand-coding the
asymptotic behavior of each variable and operator) or empirically (by collecting data for
each of the variables and operators and modeling the observed behavior).
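
As a hypothetical illustration (the primitives and asymptotics below are assumptions made
for the example, not measurements of any particular library), a hand-coded cost model and a
cost-weighted sensitivity might look like the following:

    import math

    # Hypothetical hand-coded cost model: each entry estimates the cost of refining
    # a primitive to precision p. The asymptotics are illustrative assumptions.
    COST_MODEL = {
        "add": lambda p: p,                               # roughly linear in precision
        "mul": lambda p: p * max(1.0, math.log2(p)),      # e.g. an FFT-style estimate
        "pi":  lambda p: p * max(1.0, math.log2(p)) ** 2, # constants often cost more than sums
    }

    def cost_weighted_sensitivity(sensitivity, primitive, p):
        """Scale a sensitivity by the estimated cost of refining this primitive, so
        that cheap-to-refine primitives are favored when selecting the critical path."""
        return sensitivity / COST_MODEL[primitive](p)
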

7.2.2 Unexplored trade-offs in precision refinement

I think there is an opportunity to explore schedules that grow at asymptotically different
rates. For example, in applications where computing extra digits of sub-computations requires
significant resources, a schedule that grows linearly rather than exponentially is likely to
perform better (more iterations until reaching the error bound, but less overshoot). On the
other hand, in applications where computing extra digits of sub-computations requires little
additional compute, a schedule that grows super-exponentially rather than exponentially is
likely to perform better (fewer iterations until satisfying the error bound). An important
characteristic is that these various refinement rates are application dependent, which makes
the development of more comprehensive benchmarks all the more important (as I argue in
Section 7.1).

7.2.3 Generalizing the critical path algorithm

The critical path algorithm refines uniformly across the computation except along the critical
path. This algorithm is relatively easy to implement and analyze with respect to alternate
algorithms that may empirically perform better. For example, consider an algorithm that
uses the sensitivities to refine at different rates along each path in the computation graph
(from the root to the leaf) based on the sensitivity of each of the leaves. Understanding the
degree to which to refine each of these paths is an open problem that could lead to significant
speed improvements, since the effect of the critical path algorithm will be compounded along
all paths simultaneously.

7.3 New applications to experimental research


Experimental research may provide an excellent future direction for using sensitivity analysis
in interval code. An experiment may yield a set of variables with measurement error that is
naturally represented with intervals [28]. Conclusive experimental results require certainty,
so minimizing the error in the output is important. This can be efficiently computed with
interval arithmetic and automatic differentiation over these intervals. The scientist may
wonder, “what parameters should be measured with higher precision in order to produce the
greatest increase in the accuracy of the results?” The sensitivity defined in Equation 3.3 is
one answer to this question. Scientists may also have a metric of how much effort it takes to
measure different parameters, which can be incorporated into the sensitivity analysis using
a cost model as presented in Section 3.2.3.

Chapter 8

Conclusions

This thesis explores an opportunity to improve precision refinements in implementations of
arbitrary-precision arithmetic. I introduce the critical path algorithm as a way to guide
precision refinements using a sensitivity analysis. This new sensitivity analysis uses novel,
efficiently computed derivatives of interval code. I describe some of the challenges of im-
plementing reverse-mode automatic differentiation through intervals and provide an analysis
of the properties of these derivatives. I provide a system that implements the critical path
algorithm for arbitrary-precision arithmetic programs and demonstrate that the algorithm
can speed up computation. There are many opportunities for applying automatic differentiation
through interval code and for improving precision-refinement algorithms that I hope future
research will explore.

Bibliography

[1] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig
Citro, Gregory S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghe-
mawat, Ian J. Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia,
Rafal Józefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mané, Ra-
jat Monga, Sherry Moore, Derek Gordon Murray, Chris Olah, Mike Schuster, Jonathon
Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul A. Tucker, Vincent Van-
houcke, Vijay Vasudevan, Fernanda B. Viégas, Oriol Vinyals, Pete Warden, Martin
Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: Large-scale
machine learning on heterogeneous distributed systems. CoRR, 2016.

[2] Matthias Althoff and Dmitry Grebenyuk. Implementation of interval arithmetic in
CORA 2016. In ARCH@CPSWeek, 2016.

[3] David Bailey, Peter Borwein, and Simon Plouffe. On the Rapid Computation of Various
Polylogarithmic Constants. 1997.

[4] Atilim Günes Baydin, Barak A. Pearlmutter, Alexey Andreyevich Radul, and Jef-
frey Mark Siskind. Automatic differentiation in machine learning: A survey. Journal of
Machine Learning Research, 2017.

[5] Jens Blanck. Exact real arithmetic systems: Results of competition. In Computability
and Complexity in Analysis, 2001.

[6] Wei-Fan Chiang, Mark Baranowski, Ian Briggs, Alexey Solovyev, Ganesh Gopalakrish-
nan, and Zvonimir Rakamarić. Rigorous floating-point mixed-precision tuning. SIG-
PLAN Not., 2017.

[7] Wei-Fan Chiang, Mark Baranowski, Ian Briggs, Alexey Solovyev, Ganesh Gopalakrish-
nan, and Zvonimir Rakamarić. Rigorous floating-point mixed-precision tuning. ACM
SIGPLAN Notices, 2017.

[8] Nasrine Damouche, Matthieu Martel, Pavel Panchekha, Jason Qiu, Alex Sanchez-Stern,
and Zachary Tatlock. Toward a standard benchmark format and suite for floating-point
analysis. 2016.

[9] Eva Darulova, Anastasiia Izycheva, Fariha Nasir, Fabian Ritter, Heiko Becker, and
Robert Bastian. Daisy - framework for analysis and optimization of numerical programs
(tool paper). In TACAS, 2018.

[10] Eva Darulova and Viktor Kuncak. Sound compilation of reals. Principles of Program-
ming Languages, 2014.

[11] Eva Darulova and Viktor Kuncak. Towards a compiler for reals. ACM Trans. Program.
Lang. Syst., 2017.

[12] A. Di Franco, H. Guo, and C. Rubio-González. A comprehensive study of real-world
numerical bug characteristics. In International Conference on Automated Software En-
gineering, 2017.

[13] W. Edmonson and G. Melquiond. IEEE interval standard working group - P1788:
Current status. In Symposium on Computer Arithmetic, 2009.

[14] Laurent Fousse, Guillaume Hanrot, Vincent Lefèvre, Patrick Pélissier, and Paul Zimmer-
mann. MPFR: A multiple-precision binary floating-point library with correct rounding.
ACM Trans. Math. Softw., 2007.

[15] Paul Gowland and David Lester. A survey of exact arithmetic implementations. In
Computability and Complexity in Analysis, 2001.

[16] Suyog Gupta, Ankur Agrawal, Kailash Gopalakrishnan, and Pritish Narayanan. Deep
learning with limited numerical precision. In International Conference on Machine
Learning, 2015.

[17] Thomas Hales. A proof of the Kepler conjecture. Annals of Mathematics, 2005.

[18] T. Hickey, Q. Ju, and M. H. Van Emden. Interval arithmetic: From principles to
implementation. J. ACM, 2001.

[19] Philipp H. Hoffmann. A hitchhiker’s guide to automatic differentiation. Numerical
Algorithms, 2016.

[20] Dongming Hwang, Daewon W. Byun, and M. Talat Odman. An automatic differen-
tiation technique for sensitivity analysis of numerical advection schemes in air quality
models. Atmospheric Environment, 1997.

[21] Luc Jaulin and Benoît Desrochers. Introduction to the algebra of separators with ap-
plication to path planning. Engineering Applications of Artificial Intelligence, 2014.

[22] E. Kaucher. Interval Analysis in the Extended Interval Space IR. 1980.

[23] Hideyuki Kawabata. Speeding up exact real arithmetic on fast binary cauchy sequences
by using memoization based on quantized precision. In Journal of Information Process-
ing, 2017.

[24] Reinhard Kirchner and Ulrich W. Kulisch. Hardware support for interval arithmetic.
Reliable Computing, 2006.

[25] Branimir Lambov. RealLib: An efficient implementation of exact real arithmetic. In
Mathematical Structures in Computer Science, 2007.

[26] Yong Li and Yong Jun-Hai. Efficient exact arithmetic over constructive reals. In The
4th Annual Conference on Theory and Applications of Models of Computation, 2007.

[27] Valérie Ménissier-Morain. Arbitrary precision real arithmetic: design and algorithms.
The Journal of Logic and Algebraic Programming, 2005.

[28] Ramon E. Moore, R. Baker Kearfott, and Michael J. Cloud. First Applications of
Interval Arithmetic, chapter 3, pages 19–29.

[29] Duncan J.M Moss, Srivatsan Krishnan, Eriko Nurvitadhi, Piotr Ratuszniak, Chris
Johnson, Jaewoong Sim, Asit Mishra, Debbie Marr, Suchit Subhaschandra, and
Philip H.W. Leong. A customizable matrix multiplication framework for the Intel
HARPv2 Xeon+FPGA platform: A deep learning case study. In International Sympo-
sium on Field-Programmable Gate Arrays, 2018.

[30] Norbert Th. Müller. The iRRAM: Exact arithmetic in C++. In Computability and
Complexity in Analysis, 2000.

[31] B. Nongpoh, R. Ray, S. Dutta, and A. Banerjee. AutoSense: A framework for automated
sensitivity analysis of program data. IEEE Transactions on Software Engineering, 2017.

[32] Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary
DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic
differentiation in PyTorch. In NIPS-W, 2017.

[33] Pooja Roy, Rajarshi Ray, Chundong Wang, and Weng Fai Wong. ASAC: Automatic
sensitivity analysis for approximate computing. Conference on Languages, Compilers
and Tools for Embedded Systems, 2014.

[34] Warwick Tucker. A rigorous ODE solver and Smale’s 14th problem. Foundations of
Computational Mathematics, 2002.
