Prime Numbers and the Riemann Hypothesis

Barry Mazur
William Stein
Contents

Preface
4 Sieves
15 Cesàro smoothing
II Distributions
25 Slopes of graphs that have no slopes
26 Distributions
29 Trigonometric series
Endnotes
Preface
There are several full-length books recently published, written for a general
audience, that have the Riemann Hypothesis as their main topic. A reader
of these books will get a fairly rich picture of the personalities engaged in the
pursuit, and of related mathematical and historical issues.1
This is not the mission of the book that you now hold in your hands. We
aim—instead—to explain, in as direct a manner as possible and with the least
mathematical background required, what this problem is all about and why it
is so important. For even before anyone proves this hypothesis to be true (or
false!), just getting familiar with it, and with some of the ideas behind it, is
exciting. Moreover, this hypothesis is of crucial importance in a wide range of
mathematical fields; for example, it is a confidence-booster for computational
mathematics: even if the Riemann Hypothesis is never proved, assuming its
truth (and that of closely related hypotheses) gives us an excellent sense of how
long certain computer programs will take to run, which, in some cases, gives us
the assurance we need to initiate a computation that might take weeks or even
months to complete.
1 See, e.g., The Music of the Primes by Marcus du Sautoy (2003) and Prime Obsession:
Bernhard Riemann and the Greatest Unsolved Problem in Mathematics by John Derbyshire
(2003).
Here is how the Princeton mathematician Peter Sarnak describes the broad
impact the Riemann Hypothesis has had:2
2 See page 222 of The Riemann Hypothesis: The Greatest Unsolved Problem in Mathematics.
Mathematics is flourishing. Each year sees new exciting initiatives that extend
and sharpen the applications of our subject, new directions for deep exploration—
and finer understanding—of classical as well as very contemporary mathemat-
ical domains. We are aided in such explorations by the development of more
and more powerful tools. We see resolutions of centrally important questions.
And through all of this, we are treated to surprises and dramatic changes of
viewpoint; in short: marvels.
And what an array of wonderful techniques allow mathematicians to do their
work: framing definitions; producing constructions; formulating analogies re-
lating disparate concepts, and disparate mathematical fields; posing conjectures,
that cleanly shape a possible way forward; and, the keystone: providing unas-
sailable proofs of what is asserted, the idea of doing such a thing being itself one
of the great glories of mathematics.
Number theory has its share of this bounty. Along with all these modes of
theoretical work, number theory also offers the pure joy of numerical experi-
mentation, which—when it is going well—allows you to witness the intricacy of
numbers and profound inter-relations that cry out for explanation. It is strik-
ing how little you actually have to know in order to appreciate the revelations
offered by numerical exploration.
Our book is meant to be an introduction to these pleasures. We take an exper-
imental view of the fundamental ideas of the subject buttressed by numerical
computations, often displayed as graphs. As a result, our book is profusely
illustrated, containing 131 figures, diagrams, and pictures that accompany the
text.4
There are few mathematical equations in Part I. This first portion of our book
is intended for readers who are generally interested in, or curious about, math-
ematical ideas, but who may not have studied any advanced topics. Part I is
devoted to conveying the essence of the Riemann Hypothesis and explaining
why it is so intensely pursued. It requires a minimum of mathematical knowl-
edge, and does not, for example, use calculus, although it would be helpful to
know—or to learn on the run—the meaning of the concept of function. Given
its mission, Part I is meant to be complete, in that it has a beginning, middle,
and end. We hope that our readers who only read Part I will have enjoyed the
excitement of this important piece of mathematics.
Part II is for readers who have taken at least one class in calculus, possibly
a long time ago. It is meant as a general preparation for the type of Fourier
analysis that will occur in the later parts. The notion of spectrum is key.
Part III is for readers who wish to see, more vividly, the link between the
placement of prime numbers and (what we call there) the Riemann spectrum.
Part IV requires some familiarity with complex analytic functions, and returns
to Riemann’s original viewpoint. In particular it relates the “Riemann spec-
trum” that we discuss in Part III to the nontrivial zeroes of the Riemann zeta
function. We also provide a brief sketch of the more standard route taken by
published expositions of the Riemann Hypothesis.
The endnotes are meant to link the text to references, but also to provide
more technical commentary, with an increasing dependence on mathematical
background in the later chapters. References to the endnotes will be in brackets.
We wrote our book over the past decade, but devoted only one week to it each
year (a week in August). At the end of each year's work-week on the book, we
put our draft (mistakes and all) online to get responses from readers.5 We are
grateful to the many people who sent us comments and corrections.6
4 We created the figures using the free SageMath software (see http://www.sagemath.org).
Complete source code is available, which can be used to recreate every diagram in this book
(see http://wstein.org/rh). More adventurous readers can try to experiment with the pa-
rameters for the ranges of data illustrated, so as to get an even more vivid sense of how the
numbers “behave.” We hope that readers become inspired to carry out numerical experimen-
tation, which is becoming easier as mathematical software advances.
5 See http://library.fora.tv/2014/04/25/Riemann_Hypothesis_The_Million_Dollar_
Challenge which is a lecture—and Q & A—about the composition of this book.
6 Including Dan Asimov, Bret Benesh, Keren Binyaminov, Harald Bögeholz, Louis-Philippe
Chiasson, Keith Conrad, Karl-Dieter Crisman, Nicola Dunn, Thomas Egense, Bill Gosper,
Andrew Granville, Shaun Griffith, Michael J. Gruber, Robert Harron, William R. Hearst
III, David Jao, Fredrik Johansson, Jim Markovitch, David Mumford, James Propp, Andrew
Solomon, Dennis Stein, and Chris Swenson.
Part I
Chapter 1
If we are to believe the ancient Greek philosopher Aristotle, the early Pythagore-
ans thought that the principles governing Number are “the principles of all
things,” the concept of Number being more basic than earth, air, fire, or water,
which were according to ancient tradition the four building blocks of matter.
To think about Number is to get close to the architecture of “what is.”
So, how far along are we in our thoughts about numbers?
The French philosopher and mathematician René Descartes, almost four cen-
turies ago, expressed the hope that there soon would be “almost nothing more
Figure 1.2: Jean de Bosschere, “Don Quixote and his Dulcinea del Toboso,”
from The History of Don Quixote De La Mancha, by Miguel De Cervantes.
Trans. Thomas Shelton. George H. Doran Company, New York (1923).
Numbers are obstreperous things. Don Quixote encountered this when he re-
quested that the “bachelor” compose a poem to his lady Dulcinea del Toboso,
the first letters of each line spelling out her name. The “bachelor” found2
“It must fit in, however you do it,” pleaded Quixote, not willing to grant the
imperviousness of the number 17 to division.
1 See Weinberg's book Dreams of a Final Theory: The Search for the Fundamental Laws of Nature.
Chapter 2
What are prime numbers?

Primes as atoms. To begin from the beginning, think of the operation of
multiplication as a bond that ties numbers together: the equation 2 × 3 = 6 invites
us to imagine the number 6 as (a molecule, if you wish) built out of its smaller
constituents 2 and 3. Reversing the procedure, if we start with a whole number,
say 6 again, we may try to factor it (that is, express it as a product of smaller
whole numbers) and, of course, we would eventually, if not immediately, come
up with 6 = 2 × 3 and discover that 2 and 3 factor no further; the numbers 2
and 3, then, are the indecomposable entities (atoms, if you wish) that comprise
our number.
[Factor tree: 6 splits into its prime factors 2 and 3.]
not prime, so can be themselves factored, and in each case after changing the
ordering of the factors we arrive at:
12 = 2 × 2 × 3.
If you try to factor the number 300, there are many ways to begin:
300 = 30 × 10 or 300 = 6 × 50
and there are various other starting possibilities. But if you continue the fac-
torization (“climbing down” any one of the possible “factoring trees”) to the
bottom, where every factor is a prime number as in Figure 2.2, you always end
up with the same collection of prime numbers:1
300 = 2^2 × 3 × 5^2.
Figure 2.2: Factor trees that illustrate the factorization of 300 as a product of
primes.
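Climbing down a factor tree can be sketched in a few lines of Python (an illustration of ours, not code from the book, whose figures were made with SageMath):

```python
def prime_factors(n):
    """Climb down a factor tree: repeatedly split off the smallest prime factor."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:   # d divides n, so d is the smallest (hence prime) factor
            factors.append(d)
            n //= d
        d += 1
    if n > 1:               # whatever is left over is itself prime
        factors.append(n)
    return factors

print(prime_factors(300))         # → [2, 2, 3, 5, 5], i.e. 300 = 2^2 × 3 × 5^2
print(prime_factors(6469693230))  # → the primes 2, 3, 5, ..., 29 of Figure 2.3
```

Whichever way you begin the factorization, the multiset of primes at the bottom of the tree is the same; that is the fundamental theorem of arithmetic cited in the footnote.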
Figure 2.3: Factorization tree for the product of the primes up to 29.
1 See Section 1.1 of Stein's Elementary Number Theory: Primes, Congruences, and Secrets
(2008) at http://wstein.org/ent/ for a proof of the "fundamental theorem of arithmetic,"
which asserts that every positive whole number factors uniquely as a product of primes.
The Riemann Hypothesis probes the question: how intimately can we know
prime numbers, those atoms of multiplication? Prime numbers are an important
part of our daily lives. For example, often when we visit a website and purchase
something online, prime numbers having hundreds of decimal digits are used to
keep our bank transactions private. This ubiquitous use to which giant primes
are put depends upon a very simple principle: it is much easier to multiply
numbers together than to factor them. If you had to factor, say, the number
391 you might scratch your head for a few minutes before discovering that 391 is
17 × 23. But if you had to multiply 17 by 23 you would do it straightaway. Offer
two primes, say, P and Q each with a few hundred digits, to your computing
machine and ask it to multiply them together: you will get their product N =
P × Q with its hundreds of digits in about a microsecond. But present that
number N to any current desktop computer, and ask it to factor N , and the
computer will (almost certainly) fail to do the task. See [1] and [2].
The safety of much encryption depends upon this “guaranteed” failure!2
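The asymmetry is easy to feel even at toy scale. Here is a Python sketch of ours (an illustration only, not a statement about how real cryptographic-size factoring is attempted):

```python
# Multiplying two primes is essentially one machine operation; recovering
# them by trial division has to search through candidate divisors.
p, q = 17, 23
N = p * q                       # 391, computed instantly

def factor_by_search(n):
    """Find the smallest prime factor of n by brute-force search."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d, n // d
        d += 1
    return n, 1                 # n itself is prime

print(N, factor_by_search(N))   # → 391 (17, 23)
```

For primes with hundreds of digits the multiplication is still instant, while any search of this kind becomes utterly hopeless.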
If we were latter-day number-phenomenologists we might revel in the discovery
and proof that
2^43,112,609 − 1
is a prime number, this number having 12,978,189 digits! This prime, which
was discovered on August 23, 2008 by the GIMPS project,3 is the first prime
ever found with more than ten million digits, though it is not the largest prime
currently known.
Now 2^43,112,609 − 1 is quite a hefty number! Suppose someone came up to you
saying "surely p = 2^43,112,609 − 1 is the largest prime number!" (which it is not).
How might you convince that person that he or she is wrong without explicitly
exhibiting a larger prime? [3]
Here is a neat—and, we hope, convincing—strategy to show there are prime
numbers larger than p = 2^43,112,609 − 1. Imagine forming the following humongous
number: let M be the product of all prime numbers up to and including
p = 2^43,112,609 − 1. Now go one further than M by taking the next number
N = M + 1.
OK, even though this number N is wildly large, it is either a prime number
itself—which would mean that there would indeed be a prime number larger
than p = 2^43,112,609 − 1, namely N; or in any event it is surely divisible by some
prime number, call it P .
Here, now, is a way of seeing that this P is bigger than p: since every prime
number smaller than or equal to p divides M, these prime numbers cannot
divide N = M + 1 (each would leave a remainder of 1); so P must be larger than p.
2 Nobody has ever published a proof that there is no fast way to factor integers. This is an
4 The sequence of prime numbers we find by this procedure is discussed in more detail with
was declared by Time Magazine to be one of the top 50 best “inventions” of 2008: http://www.
time.com/time/specials/packages/article/0,28804,1852747_1854195_1854157,00.html.
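The strategy above can be watched in action for small primes with a short Python sketch of ours, with p = 13 standing in for the huge Mersenne prime:

```python
def smallest_prime_factor(n):
    """Return the smallest prime factor of n (n itself if n is prime)."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n

primes_up_to_p = [2, 3, 5, 7, 11, 13]
M = 1
for q in primes_up_to_p:
    M *= q                      # M = 2 · 3 · 5 · 7 · 11 · 13 = 30030
N = M + 1                       # 30031
P = smallest_prime_factor(N)
print(N, P)                     # → 30031 59: a prime bigger than 13, as promised
```

Each prime up to 13 leaves remainder 1 when divided into N, so the prime factor P that N produces must be new, and in particular bigger than 13.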
Prime numbers come in all sorts of shapes, some more convenient to deal with
than others. For example, the number we have been talking about,
p = 2^43,112,609 − 1,
is given to us, by its very notation, in a striking form; i.e., one less than a power
of 2. It is no accident that the largest “currently known” prime number has
such a form. This is because there are special techniques we can draw on to
show primality of a number, if it is one less than a power of 2 and—of course—if
it also happens to be prime. The primes of that form have a name, Mersenne
Primes, as do the primes that are one more than a power of 2, those being called
Fermat Primes. [4]
Here are two exercises that you might try to do, if this is your first encounter
with primes that differ from a power of 2 by 1:
Not all numbers of the form 2^(prime number) − 1 or of the form 2^(power of two) + 1
are prime. We currently know only finitely many primes of either of these
forms. How we have come to know what we know is an interesting tale. See,
for example, http://www.mersenne.org/.
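A few minutes of computation confirms the first sentence above. This Python sketch of ours checks which numbers 2^p − 1, for prime p, are themselves prime (real Mersenne hunting uses the far faster Lucas–Lehmer test, one of the special techniques alluded to above, rather than trial division):

```python
def is_prime(n):
    """Trial-division primality test; fine for small numbers."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

# 2^p - 1 need not be prime even when p is: p = 11 already fails,
# since 2^11 - 1 = 2047 = 23 × 89.
for p in [2, 3, 5, 7, 11, 13]:
    m = 2 ** p - 1
    print(p, m, is_prime(m))
```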
Chapter 4
Sieves
2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26,
for example, start by circling the 2 and crossing out all the other multiples of 2.
Next, go back to the beginning of our sequence of numbers and circle the first
number that is neither circled nor crossed out (that would be, of course, the
3), then cross out all the other multiples of 3. This gives the pattern: go back
again to the beginning of our sequence of numbers and circle the first number
that is neither circled nor crossed out; then cross out all of its other multiples.
Repeat this pattern until all the numbers in our sequence are either circled, or
crossed out, the circled ones being the primes.
2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
In Figures 4.1–4.4 we use the primes 2, 3, 5, and finally 7 to sieve out the primes
up to 100, where instead of crossing out multiples we grey them out, and instead
of circling primes we color their box red.
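The circling-and-crossing-out procedure translates almost word for word into Python (a sketch of ours, not the book's SageMath code):

```python
def sieve_of_eratosthenes(n):
    """Return the primes up to n by the circling/crossing-out procedure."""
    status = [True] * (n + 1)              # True = not yet crossed out
    status[0] = status[1] = False
    for p in range(2, n + 1):
        if status[p]:                      # p survives: "circle" it (p is prime)
            for multiple in range(2 * p, n + 1, p):
                status[multiple] = False   # cross out the other multiples of p
    return [p for p in range(2, n + 1) if status[p]]

print(sieve_of_eratosthenes(26))        # → [2, 3, 5, 7, 11, 13, 17, 19, 23]
print(len(sieve_of_eratosthenes(100)))  # → 25 primes up to 100
```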
Figure 4.1: Using the prime 2 to sieve for primes up to 100
Since all the even numbers greater than two are eliminated as composite, they
appear gray in Figure 4.1; none of the odd numbers has yet been eliminated, so
they still appear in white boxes.
Figure 4.2: Using the primes 2 and 3 to sieve for primes up to 100
Figure 4.3: Using the primes 2, 3, and 5 to sieve for primes up to 100
Looking at Figure 4.3, we see that for all but three numbers (49, 77, and 91) up
to 100 we have (after sieving by 2, 3, and 5) determined which are primes and
which composite.
Figure 4.4: Using the primes 2, 3, 5, and 7 to sieve for primes up to 100
Finally, we see in Figure 4.4 that sieving by 2, 3, 5, and 7 determines all primes
up to 100. See [5] for more about explicitly enumerating primes using a com-
puter.
Chapter 5
Questions about primes
We become quickly stymied when we ask quite elementary questions about the
spacing of the infinite series of prime numbers.
For example, are there infinitely many pairs of primes whose difference is 2?
The sequence of primes seems to be rich in such pairs
5 − 3 = 2, 7 − 5 = 2, 13 − 11 = 2, 19 − 17 = 2,
and we know that there are loads more such pairs1 but the answer to our
question, are there infinitely many?, is not known. The conjecture that there
are infinitely many such pairs of primes (“twin primes” as they are called)
is known as the Twin Primes Conjecture. Are there infinitely many pairs of
primes whose difference is 4, 6? Answer: equally unknown. Nevertheless there
is very exciting recent work in this direction, specifically, Yitang Zhang proved
that there are infinitely many pairs of primes that differ by no more than 7 ×
10^7. For a brief account of Zhang's work, see the Wikipedia entry http://
en.wikipedia.org/wiki/Yitang_Zhang. Many exciting results have followed
Zhang’s breakthrough; we know now, thanks to results2 of James Maynard and
others, that there are infinitely many pairs of primes that differ by no more than
246.
Is every even number greater than 2 a sum of two primes? Answer: unknown.
Are there infinitely many primes which are 1 more than a perfect square? An-
swer: unknown.
1 For example, according to http://oeis.org/A007508 there are 10,304,185,697,298 such
pairs of primes.
bigger than p = 2^43,112,609 − 1? Answer: For many years we did not know;
however, in 2013 Curtis Cooper discovered the even bigger Mersenne prime
2^57,885,161 − 1, with a whopping 17,425,170 digits! Again we can ask if there is a
Mersenne prime larger than Cooper’s. Answer: we do not know. It is possible
that there are infinitely many Mersenne primes but we’re far from being able
to answer such questions.
Is there some neat formula giving the next prime? More specifically, if I give you
a number N , say N = one million, and ask you for the first number after N that
is prime, is there a method that answers that question without, in some form or
other, running through each of the successive odd numbers after N rejecting the
nonprimes until the first prime is encountered? Answer: unknown.
One can think of many ways of “getting at” some understanding of the place-
ment of prime numbers among all numbers. Up to this point we have been
mainly just counting them, trying to answer the question “how many primes
are there up to X?” and we have begun to get some feel for the numbers behind
this question, and especially for the current “best guesses” about estimates.
What is wonderful about this subject is that people attracted to it cannot resist
asking questions that lead to interesting, and sometimes surprising numerical
experiments. Moreover, given our current state of knowledge, many of the
questions that come to mind are still unapproachable: we don’t yet know enough
about numbers to answer them. But asking interesting questions about the
mathematics that you are studying is a high art, and is probably a necessary
skill to acquire, in order to get the most enjoyment—and understanding—from
mathematics. So, we offer this challenge to you:
Come up with your own question about primes that
• is interesting to you,
• is not a question whose answer is known to you,
• is not a question that you've seen before; or at least not exactly.
If you are having trouble coming up with a question, read on for more examples
that provide further motivation.
Chapter 6
In celebration of Yitang Zhang’s recent result, let us consider more of the nu-
merics regarding gaps between one prime and the next, rather than the tally of
all primes. Of course, it is no fun at all to try to guess how many pairs of primes
p, q there are with gap q − p equal to a fixed odd number, since the difference of
two odd numbers is even, as in Chapter 5. The fun, though, begins in earnest
if you ask for pairs of primes with difference equal to 2 (these being called twin
primes) for it has long been guessed that there are infinitely many such pairs of
primes, but no one has been able to prove this yet.
As of 2014, the largest known twin primes are
3756801695685 · 2^666669 ± 1.
These enormous primes, which were found in 2011, have 200,700 digits each.1
Similarly, it is interesting to consider primes p and q with difference 4, or 8,
or—in fact—any even number 2k. That is, people have guessed that there are
infinitely many pairs of primes with difference 4, with difference 6, etc. but none
of these guesses have yet been proved.
So, define
Gap_k(X)
to be the number of pairs of consecutive primes (p, q) with q < X that have "gap
k" (i.e., such that their difference q − p is k). Here p is a prime, q > p is a prime,
and there are no primes between p and q. For example, Gap_2(10) = 2, since the
pairs (3, 5) and (5, 7) are the pairs less than 10 with gap 2, and Gap_4(10) = 0
because despite 3 and 7 being separated by 4, they are not consecutive primes.
See Table 6.1 for various values of Gap_k(X) and Figure 6.1 for the distribution
of prime gaps for X = 10^7.
X      Gap_2(X)  Gap_4(X)  Gap_6(X)  Gap_8(X)  Gap_100(X)  Gap_246(X)
10     2         0         0         0         0           0
10^2   8         7         7         1         0           0
10^3   35        40        44        15        0           0
10^4   205       202       299       101      0           0
10^5   1224      1215      1940      773      0           0
10^6   8169      8143      13549     5569     2           0
10^7   58980     58621     99987     42352    36          0
10^8   440312    440257    768752    334180   878         0

Table 6.1: Values of Gap_k(X).
Figure 6.1: Frequency histogram showing the distribution of prime gaps of size
≤ 50 for all primes up to 10^7. Six is the most popular gap in this data.
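The first rows of Table 6.1 can be reproduced with a few lines of Python (our sketch, not the book's code):

```python
def gap_counts(X, gaps=(2, 4, 6, 8)):
    """Count pairs of consecutive primes (p, q) with q < X for each gap q - p."""
    # Sieve for the primes below X.
    is_prime = [True] * X
    is_prime[0] = is_prime[1] = False
    for p in range(2, int(X ** 0.5) + 1):
        if is_prime[p]:
            for m in range(p * p, X, p):
                is_prime[m] = False
    primes = [n for n in range(X) if is_prime[n]]
    counts = {k: 0 for k in gaps}
    for p, q in zip(primes, primes[1:]):   # consecutive primes only
        if q - p in counts:
            counts[q - p] += 1
    return counts

print(gap_counts(10))   # → {2: 2, 4: 0, 6: 0, 8: 0}
print(gap_counts(100))  # → {2: 8, 4: 7, 6: 7, 8: 1}, matching the 10^2 row
```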
[Figure: the counts Gap 2, Gap 4, Gap 6, and Gap 8 plotted against one another.]
Here is yet another question that deals with the spacing of prime numbers that
we do not know the answer to:
Racing Gap 2, Gap 4, Gap 6, and Gap 8 against each other:
Here is a curious question that you can easily begin to check out for small
numbers. We know, of course, that the even numbers and the odd numbers are
nicely and simply distributed: after every odd number comes an even number,
after every even, an odd. There are an equal number of positive odd numbers
and positive even numbers less than any given odd number, and there may be
nothing else of interest to say about the matter. Things change considerably,
though, if we focus our concentration on multiplicatively even numbers and
multiplicatively odd numbers.
A multiplicatively even number is one that can be expressed as a product
of an even number of primes; and a multiplicatively odd number is one that
can be expressed as a product of an odd number of primes. So, any prime is
multiplicatively odd, the number 4 = 2 · 2 is multiplicatively even, and so is
6 = 2 · 3, 9 = 3 · 3, and 10 = 2 · 5; but 12 = 2 · 2 · 3 is multiplicatively odd. Below
we list the numbers up to 25, marking each multiplicatively odd number with
an asterisk.
1 2* 3* 4 5* 6 7* 8* 9 10 11* 12* 13* 14 15 16 17* 18* 19* 20* 21 22 23* 24 25
Now looking at this data, a natural, and simple, question to ask about the
concept of multiplicative oddness and evenness is:
Is there some X ≥ 2 for which there are more multiplicatively even numbers less
than or equal to X than multiplicatively odd ones?
Each plot in Figure 6.3 gives the number of multiplicatively even numbers be-
tween 2 and X minus the number of multiplicatively odd numbers between 2
and X, for X equal to 10, 100, 1000, 10000, 100000, and 1000000. The above
question asks whether these graphs would, for sufficiently large X, ever cross
the X-axis.
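The quantity plotted in Figure 6.3 is easy to recompute for small X. Here is an illustrative Python sketch of ours:

```python
def parity_excess(X):
    """(# multiplicatively even n) - (# multiplicatively odd n) for 2 <= n <= X."""
    def omega(n):
        """Number of prime factors of n, counted with multiplicity."""
        count, d = 0, 2
        while d * d <= n:
            while n % d == 0:
                count += 1
                n //= d
            d += 1
        return count + (1 if n > 1 else 0)

    return sum(1 if omega(n) % 2 == 0 else -1 for n in range(2, X + 1))

print([parity_excess(X) for X in [10, 25, 100]])  # stays negative in this range
```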
[Figure 6.3: each panel plots the number of multiplicatively even numbers between 2 and X minus the number of multiplicatively odd numbers between 2 and X, for X up to 10, 100, 1000, 10000, 100000, and 1000000; in every range shown the difference stays at or below zero.]
2 For more details, see P. Borwein, “Sign changes in sums of the Liouville Function” and
the nice short paper of Norbert Wiener “Notes on Polya’s and Turan’s hypothesis concerning
Liouville’s factor” (page 765 of volume II of Wiener’s Collected Works); see also: G. Pólya
“Verschiedene Bemerkungen zur Zahlentheorie,” Jahresbericht der Deutschen Mathematiker-
Vereinigung, 28 (1919) 31–40.
3 See, e.g., Richard Guy’s book Unsolved Problems in Number Theory (2004).
Chapter 7
How many primes are there?
[Figure: the primes up to about 200, displayed in rows.]
Slow as we are to understand primes, at the very least we can try to count them.
You can see that there are 10 primes less than 30, so you might encapsulate this
by saying that the chances that a number less than 30 is prime are 1 in 3. This
frequency does not persist, though; here is some more data: there are 25 primes
less than 100 (so 1 in 4 numbers up to 100 are prime), and there are 168 primes
less than a thousand (so we might say that among the numbers less than a
thousand the chances that one of them is prime are roughly 1 in 6).
Figure 7.2: Graph of the proportion of primes up to X for each integer X ≤ 100
There are 78,498 primes less than a million (so we might say that the chances
that a random choice among the first million numbers is prime have dropped to
roughly 1 in 13).
There are 455,052,512 primes less than ten billion; i.e., 10,000,000,000 (so we
might say that the chances are down to roughly 1 in 22).
Following Eratosthenes, we have sifted those numbers, to pan for primes. Our
first move was to throw out roughly half the numbers (the even ones!) after the
number 2. The graph labeled "Sieve by 2" in Figure 7.5 (with one hiccup, a
regular staircase climbing at a smaller angle, each step twice the length of each
riser) illustrates the numbers that are left after one pass through Eratosthenes'
sieve, which includes, of course, all the primes. So, the chances that a number
bigger than 2 is prime are at most 1 in 2. Our second move was
to throw out a good bunch of numbers bigger than 3. So, the chances that a
number bigger than 3 is prime are going to be even less. And so it goes: with
each move in our sieving process, we are winnowing the field more extensively,
reducing the chances that the later numbers are prime.
[Figure 7.5: the "Sieve by 2" staircase and the staircase of primes, up to 100.]
[Figure 7.6: the "Sieve by 2" staircase and the staircase of primes, up to 1000.]
The red curve in these figures actually counts the primes: it is the beguilingly
irregular staircase of primes. Its height above any number X on the horizontal
line records the number of primes less than or equal to X, the accumulation of
primes up to X. Refer to this number as π(X). So π(2) = 1, π(3) = 2, π(30) =
10; of course, we could plot a few more values of π(X), like π(ten billion) =
455,052,512.
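The small values of π(X) quoted here are easy to check with Python (an illustrative sketch of ours; a count like π(ten billion) needs much better algorithms than this):

```python
def prime_pi(X):
    """pi(X): the number of primes up to and including X, by simple sieving."""
    if X < 2:
        return 0
    is_prime = [True] * (X + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, int(X ** 0.5) + 1):
        if is_prime[p]:
            for m in range(p * p, X + 1, p):
                is_prime[m] = False
    return sum(is_prime)

print(prime_pi(2), prime_pi(3), prime_pi(30))  # → 1 2 10
print(prime_pi(100), prime_pi(1000))           # → 25 168
```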
Let us accompany Eratosthenes for a few further steps in his sieving process.
Figure 7.7 contains a graph of all whole numbers up to 100 after we have removed
the even numbers greater than 2, and the multiples of 3 greater than 3 itself.
[Figure 7.7: the "Sieve by 2, 3" staircase, up to 100.]
From this graph you can see that if you go "out a way" the likelihood that
a number is a prime is less than 1 in 3. Figure 7.8 contains a graph of what
Eratosthenes' sieve looks like up to 100 after sifting by 2, 3, 5, and 7.
[Figure 7.8: the "Sieve by 2, 3, 5, 7" staircase, up to 100.]
This data may begin to suggest to you that as you go further and further out
on the number line the percentage of prime numbers among all whole numbers
tends towards 0% (it does).
To get a sense of how the primes accumulate, we will take a look at the staircase
of primes for X = 25 and X = 100 in Figures 7.9 and 7.10.
[Figures 7.9 and 7.10: the staircase of primes for X = 25 and X = 100.]
The striking thing about these figures is that as the numbers get large enough,
the jagged accumulation of primes, those quintessentially discrete entities, be-
comes smoother and smoother to the eye. How strange and wonderful to watch,
as our viewpoint zooms out to larger ranges of numbers, the accumulation of
primes taking on such a smooth and elegant shape.
Chapter 8
Prime numbers viewed from a distance
[Figure 8.3: the staircase of primes up to 100,000.]
But don’t be fooled by the seemingly smooth shape of the curve in the last
figure above: it is just as faithful a reproduction of the staircase of primes as
the typographer’s art can render, for there are thousands of tiny steps and risers
in this curve, all hidden by the thickness of the print of the drawn curve in the
figure. It is already something of a miracle that we can approximately describe
the build-up of primes, somehow, using a smooth curve. But what smooth curve?
That last question is not rhetorical. If you draw a curve with chalk on the
blackboard, this can signify a myriad of smooth (mathematical) curves all en-
compassed within the thickness of the chalk-line, all—if you wish—reasonable
approximations of one another. So, there are many smooth curves that fit the
chalk-curve. With this warning, but very much fortified by the data of Fig-
ure 8.3, let us ask: what is a smooth curve that is a reasonable approximation
to the staircase of primes?
Chapter 9
Pure and applied mathematics
Mathematicians seem to agree that, loosely speaking, there are two types of
mathematics: pure and applied. Usually—when we judge whether a piece of
mathematics is pure or applied—this distinction turns on whether or not the
math has application to the “outside world,” i.e., that world where bridges are
built, where economic models are fashioned, where computers churn away on
the Internet (for only then do we unabashedly call it applied math), or whether
the piece of mathematics will find an important place within the context of
mathematical theory (and then we label it pure). Of course, there is a great
overlap (as we will see later, Fourier analysis plays a major role both in data
compression and in pure mathematics).
Moreover, many questions in mathematics are “hustlers” in the sense that, at
first view, what is being requested is that some simple task be done (e.g., the
question raised in this book, to find a smooth curve that is a reasonable approx-
imation to the staircase of primes). And only as things develop is it discovered
that there are payoffs in many unexpected directions, some of these payoffs be-
ing genuinely applied (i.e., to the practical world), some of these payoffs being
pure (allowing us to strike behind the mask of the mere appearance of the math-
ematical situation, and get at the hidden fundamentals that actually govern the
phenomena), and some of these payoffs defying such simple classification, inso-
far as they provide powerful techniques in other branches of mathematics. The
Riemann Hypothesis—even in its current unsolved state—has already shown
itself to have all three types of payoff.
The particular issue before us is, in our opinion, twofold, both applied, and pure:
can we curve-fit the “staircase of primes” by a well approximating smooth curve
given by a simple analytic formula? The story behind this alone is marvelous,
has a cornucopia of applications, and we will be telling it below. But our
curiosity here is driven by a question that is pure, and less amenable to precise
formulation: are there mathematical concepts at the root of, and more basic
than (and “prior to,” to borrow Aristotle’s use of the phrase) prime numbers—
concepts that account for the apparent complexity of the nature of primes?
Chapter 10
A probabilistic first guess
The search for such approximating curves began, in fact, two centuries ago when
Carl Friedrich Gauss defined a certain beautiful curve that, experimentally,
seemed to be an exceptionally good fit for the staircase of primes.
Let us denote Gauss's curve G(X); it has an elegant simple formula comprehensible to anyone who has had a tiny bit of calculus. If you make believe that the chance that a number X is prime is inversely proportional to the number of digits of X, you might well hit upon Gauss's curve. That is,

G(X) is roughly proportional to X / (the number of digits of X).
But to describe Gauss's guess precisely we need to discuss the natural logarithm1 "log(X)", which is an elegant smooth function of a real number X that is roughly proportional to the number of digits of the whole number part of X.

[Figure: the graph of log(X) for X ≤ 100.]

1 In this book, log(X) almost always denotes natural log and the notation ln(X) is not used.
Figure 10.4: A slide rule computes 2X by using that log(2X) = log(2) + log(X)
In Figure 10.4 the numbers printed (on each of the slidable pieces of the rule) are
spaced according to their logarithms, so that when one slides the rule arranging
it so that the printed number X on one piece lines up with the printed number
1 on the other, we get that for every number Y printed on the first piece, the
printed number on the other piece that is aligned with it is the product XY ; in
effect the “slide” adds log(X) to log(Y ) giving log(XY ).
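The slide rule's trick, multiplication carried out by adding lengths proportional to logarithms, can be sketched in a few lines of code. This is our illustration, not the book's, and the function name is invented:

```python
import math

def slide_rule_multiply(x, y):
    # A slide rule lays the length log(x) end to end with the length
    # log(y); the number printed at the combined length is
    # exp(log(x) + log(y)) = x * y.
    return math.exp(math.log(x) + math.log(y))

print(slide_rule_multiply(2, 8))  # multiplication done purely by addition of logs
```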
In 1791, when Gauss was 14 years old, he received a book that contained log-
arithms of numbers up to 7 digits and a table of primes up to 10,009. Years
later, in a letter written in 1849 (see Figure 10.5), Gauss claimed that as early
as 1792 or 1793 he had already observed that the density of prime numbers over
intervals of numbers of a given rough magnitude X seemed to average 1/ log(X).
Very very roughly speaking, this means that the number of primes up to X is approximately X divided by twice the number of digits of X. For example, the number of primes less than 99 should be roughly 99/(2 · 2) ≈ 25, which is pretty close: the correct count of primes less than 99 is exactly 25.

[Figure: the density of primes near X compared with 1/log(X), for X up to 30, 100, and 1,000.]
Gauss was an inveterate computer: he wrote in his 1849 letter that there are
216,745 prime numbers less than three million. This is wrong: the actual number
of these primes is 216,816. Gauss's curve G(X) predicted that there would be
216,970 primes—a miss, Gauss thought, by 226.
But actually he was closer than he thought: the prediction of the curve G(X)
missed by a mere 154 = 216970 − 216816. Gauss’s computation brings up two
queries: will this spectacular “good fit” continue for arbitrarily large numbers?
and, the (evidently prior) question: what counts as a good fit?
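Gauss had to count by hand; today the count takes a computer a fraction of a second. Here is a short sketch (ours, not the authors') using the sieve of Eratosthenes of Chapter 4:

```python
def prime_count(n):
    # Sieve of Eratosthenes: after sieving, sieve[k] is 1 exactly when k is prime.
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return sum(sieve)

print(prime_count(3_000_000))  # prints 216816, the count that Gauss missed by 71
```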
Chapter 11

What is a "good approximation"?
If you are trying to estimate a number, say, around ten thousand, and you get it right to within a hundred, let us celebrate this kind of accuracy by saying that you have made an approximation with square-root error (√10,000 = 100). Of course, we should really use the more clumsy phrase "an approximation with at worst square-root error." Sometimes we'll simply refer to such approximations as good approximations. If you are trying to estimate a number in the millions, and you get it right to within a thousand, let's agree that—again—you have made an approximation with square-root error (√1,000,000 = 1,000). Again, for short, call this a good approximation. So, when Gauss thought his curve missed by 226 in estimating the number of primes less than three million, it was well within the margin we have given for a "good approximation."
More generally, if you are trying to estimate a number that has D digits and
you get it almost right, but with an error that has no more than, roughly, half
that many digits, let us say, again, that you have made an approximation with
square-root error or synonymously, a good approximation.
This rough account almost suffices for what we will be discussing below, but to
be more precise, the specific gauge of accuracy that will be important to us is
not for a mere single estimate of a single error term,
but rather for infinite sequences of estimates of error terms. Generally, if you are interested in a numerical quantity q(X) that depends on the real number parameter X (e.g., q(X) could be π(X), "the number of primes ≤ X") and if you have an explicit candidate "approximation," q_approx(X), to this quantity, let us say that q_approx(X) is an essentially square-root accurate approximation to q(X) if for any given exponent greater than 0.5 (you choose it: 0.501, 0.5001, 0.50001, . . . for example) and for large enough X—where the phrase "large enough" depends on your choice of exponent—the error term, i.e., the difference between q_approx(X) and the true quantity q(X), is, in absolute value, less than X raised to that exponent (e.g. < X^0.501, < X^0.5001, etc.). Readers who know calculus and wish to have a technical formulation of this definition of good approximation might turn to the endnote [7] for a precise statement.

If you found the above confusing, don't worry: again, a square-root accurate approximation is one in which at least roughly half the digits are correct.
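In code, the definition reads as follows. This is a sketch with invented names and a toy example, not anything from the book:

```python
import math

def within_sqrt_error(q, q_approx, X, exponent=0.501):
    # One chosen exponent > 0.5: the error at X must be below X**exponent.
    return abs(q_approx(X) - q(X)) < X ** exponent

# Toy example: approximate q(X) = X + sqrt(X) by q_approx(X) = X.
# The error is sqrt(X) = X**0.5 < X**0.501 once X > 1, so the check passes.
q = lambda X: X + math.sqrt(X)
q_approx = lambda X: X
print(all(within_sqrt_error(q, q_approx, 10 ** k) for k in range(1, 13)))  # True
```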
Remark 11.1. To get a feel for how basic the notion of square-root accuracy is, and for how it represents the "gold standard" of accuracy for approximations to data, consider this fable.
Imagine that the devil had the idea of saddling a large committee of people
with the task of finding values of π(X) for various large numbers X. This
he did in the following manner, having already worked out which numbers are
prime himself. Since the devil is, as everyone knows, in the details, he has made
no mistakes: his work is entirely correct. He gives each committee member a
copy of the list of all prime numbers between 1 and one of the large numbers
X in which he was interested. Now each committee member would count the
number of primes by doing nothing more than considering each number, in turn,
on their list and tallying them up, much like a canvasser counting votes; the
committee members needn’t even know that these numbers are prime, they just
think of these numbers as items on their list. But since they are human, they
will indeed be making mistakes, say 1% of the time. Assume further that it is
just as likely for them to make the mistake of undercounting or overcounting.
If many people are engaged in such a pursuit, some of them might overcount π(X); some of them might undercount it. The average error (overcounted or undercounted) would be proportional to √X.
In the next chapter we’ll view these undercounts and overcounts as analogous
to a random walk.
Chapter 12

Square root error and random walks
To take a random walk along a (straight) east–west path you would start at
your home base, but every minute, say, take a step along the path, each step
being of the same length, but randomly either east or west. After X minutes,
how far are you from your home base?
The answer to this cannot be a specific number, precisely because you’re mak-
ing a random decision that affects that number for each of the X minutes of
your journey. It is more reasonable to ask a statistical version of that ques-
tion. Namely, if you took many random walks X minutes long, then—on the
average—how far would you be from your home base? The answer, as is illustrated by the figures below, is that the average distance you will find yourself from home base after (sufficiently many of) these excursions is proportional to √X. (In fact, the average is equal to √(2/π) · √X.)
The blue curve in the right-hand graphs of the four figures below is the average distance from home-base of the corresponding (three, ten, a hundred, and a thousand) random walks. The red curve in each figure is the graph of the quantity √(2/π) · √X over the X-axis. As the number of random walks increases, the red curve better and better approximates the average distance.
[Four figures: on the left, plots of three, ten, a hundred, and a thousand random walks of 1,000 steps; on the right, in blue, the average distance from home base, approximated better and better by the red curve √(2/π) · √X.]
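These figures are easy to recreate. The following simulation (our own sketch; the number of walks and of steps are arbitrary choices) estimates the average distance and compares it with √(2/π) · √X:

```python
import math
import random

def average_distance(steps, walks, seed=1):
    # Average of |final position| over many random walks of unit steps.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(walks):
        position = sum(rng.choice((-1, 1)) for _ in range(steps))
        total += abs(position)
    return total / walks

X = 1000
observed = average_distance(X, walks=2000)
predicted = math.sqrt(2 / math.pi) * math.sqrt(X)  # about 25.23
print(round(observed, 2), round(predicted, 2))
```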
Chapter 13

What is Riemann's Hypothesis? (first formulation)
Recall from Chapter 10 that a rough guess for an approximation to π(X), the
number of primes ≤ X, is given by the function X/ log(X). Recall, as well, that
a refinement of that guess, offered by Gauss, stems from this curious thought:
the “probability” that a number N is a prime is proportional to the reciprocal
of its number of digits; more precisely, the probability is 1/ log(N ). This would
lead us to guess that the approximate value of π(X) would be the area of the
region from 2 to X under the graph of 1/ log(X), a quantity sometimes referred
to as Li(X). "Li" (pronounced "Li", so the same as "lie" in "lie down") is short for logarithmic integral, because the area of the region from 2 to X under 1/log(X) is (by definition) the integral ∫_2^X 1/log(t) dt.
Figure 13.1 contains a graph of the three functions Li(X), π(X), and X/ log X
for X ≤ 200. But data, no matter how impressive, may be deceiving (as we
learned in Chapter 6). If you think that the three graphs never cross for all large
values of X, and that we have the simple relationship X/ log(X) < π(X) <
Li(X) for large X, read http://en.wikipedia.org/wiki/Skewes’_number.
Figure 13.1: Plots of Li(X) (top), π(X) (in the middle), and X/ log(X) (bot-
tom).
It is a major challenge to evaluate π(X) for large values of X. For example, let X = 10^24. Then (see [8]) we have:

π(X) = 18,435,599,767,349,200,867,866
Li(X) = 18,435,599,767,366,347,775,143.10580 . . .
X/(log(X) − 1) = 18,429,088,896,563,917,716,962.93869 . . .
Li(X) − π(X) = 17,146,907,277.105803 . . .
√X · log(X) = 55,262,042,231,857.096416 . . .
Note that several of the left-most digits of π(X) and Li(X) are the same (both begin 18,435,599,767,3 . . . ), a point we will return to in Chapter 17.
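For a more modest X, anyone can redo this comparison: approximate Li(X) by a simple numerical integration of 1/log(t), count the primes with a sieve, and check that the error is far below X^0.501. This is our sketch, not the authors' code:

```python
import math

def Li(X, steps=100_000):
    # Midpoint-rule approximation to the integral of 1/log(t) from 2 to X.
    h = (X - 2) / steps
    return sum(h / math.log(2 + (i + 0.5) * h) for i in range(steps))

def prime_count(n):
    sieve = bytearray([1]) * (n + 1)  # sieve of Eratosthenes
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return sum(sieve)

X = 10 ** 6
error = abs(Li(X) - prime_count(X))  # roughly 130
print(error < X ** 0.501)            # True: square-root accurate at this X
```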
More fancifully, we can think of the error in this approximation to π(X), i.e., |Li(X) − π(X)| (the absolute value of the difference between Li(X) and π(X)), as (approximately) the result of a walk having roughly X steps where you move by the following rule: go east by a distance of 1/log(N) feet if N is not a prime and west by a distance of 1 − 1/log(N) feet if N is a prime. Your distance, then, from home base after X steps is approximately |Li(X) − π(X)| feet.
We have no idea if this picture of things resembles a truly random walk, but at least it makes it reasonable to ask the question: is Li(X) essentially a square-root accurate approximation to π(X)? Our first formulation of Riemann's Hypothesis says yes:

The Riemann Hypothesis (first formulation): For any real number X, the number of primes less than X is approximately Li(X), and this approximation is essentially square-root accurate.

Chapter 14
Let’s think of what happens when you have a mysterious quantity (say, a func-
tion of a real number X) you wish to understand. Suppose you manage to ap-
proximate that quantity by an easy to understand expression—which we’ll call
the “dominant term”—that is also simple to compute, but only approximates
your mysterious quantity. The approximation is not exact; it has a possible
error, which happily is significantly smaller than the size of the dominant term.
“Dominant” here just means exactly that: it is of size significantly larger than
the error of approximation.
If all you are after is a general estimate of size, your job is done. You might declare victory and ignore Error(X) as being—in size—negligible. But if you are interested in the deep structure of your mysterious quantity, perhaps all you have done is to transport most questions about it to Error(X). In conclusion, Error(X) is now your new mysterious quantity.
Returning to the issue of π(X) (our mysterious quantity) and Li(X) (our dom-
inant term) the first formulation of the Riemann Hypothesis (as in Chapter 13
above) puts the spotlight on the Error term | Li(X) − π(X)|, which therefore
deserves our scrutiny, since—after all—we’re not interested in merely counting
the primes: we want to understand as much as we can of their structure.
To get a feel for this error term, we shall smooth it out a bit, and look at a few
of its graphs.
Chapter 15
Cesàro smoothing
Often when you hop in a car and reset the trip counter, the car will display
information about your trip. For instance, it might show you the average speed
up until now, a number that is “sticky”, changing much less erratically than your
actual speed, and you might use it to make a rough estimate of how long until
you will reach your destination. Your car is computing the Cesàro smoothing
of your speed. We can use this same idea to better understand the behavior of
other things, such as the sums appearing in the previous chapter.
Suppose you are trying to say something intelligent about the behavior of a
certain quantity that varies with time in what seems to be a somewhat erratic,
volatile pattern. Call the quantity f (t) and think of it as a function defined
for positive real values of “time” t. The natural impulse might be to take
some sort of “average value”1 of f (t) over a time interval, say from 0 to T .
This would indeed be an intelligent thing to do if this average stabilized and
depended relatively little on the interval of time over which one is computing
this average, e.g., if that interval were large enough. Sometimes, of course, these
averages themselves are relatively sensitive to the times T that are chosen, and
1 For readers who know calculus: that average would be (1/T) ∫_0^T f(t) dt.
don't stabilize. In such a case, it usually pays to consider all these averages, as your chosen T varies, as a function of T in its own right2. This new function F(T), called the Cesàro smoothing of the function f(t), is a good indicator of certain eventual trends of the original function f(t) that are less visible directly from f(t) itself. The effect of passing from f(t) to F(T) is to "smooth out" some of the volatile local behavior of the original function, as can be seen in Figure 15.2.
Figure 15.2: The red plot is the Cesàro smoothing of the blue plot
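In discrete form the Cesàro smoothing is just a running average. A minimal sketch (ours, on discrete samples rather than a continuous f(t)):

```python
def cesaro_smoothing(values):
    # F(T) = average of the first T samples of f: a running mean.
    averages, running_sum = [], 0.0
    for count, v in enumerate(values, start=1):
        running_sum += v
        averages.append(running_sum / count)
    return averages

# A maximally volatile signal, +1, -1, +1, -1, ...: it never settles down,
# but its Cesaro smoothing calms toward 0.
f = [(-1) ** n for n in range(1000)]
F = cesaro_smoothing(f)
print(F[:4], F[-1])  # the smoothed values head toward 0
```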
Chapter 16

Returning to our mysterious error term, consider Figure 16.1, where the volatile blue curve in the middle is Li(X) − π(X), its Cesàro smoothing is the relatively smooth red curve on the bottom, and the curve on the top is the graph of √(2/π) · √X/log(X), over the range X ≤ 250,000.
Figure 16.1: Li(X) − π(X) (blue middle), its Cesàro smoothing (red bottom), and √(2/π) · √X/log(X) (top), all for X ≤ 250,000
Data such as this graph can be revealing and misleading at the same time. For
example, the volatile blue graph (of Li(X) − π(X)) seems to be roughly sand-
wiched between two rising functions, but this will not continue for all values of
X. The Cambridge University mathematician John Edensor Littlewood (see Figure 16.2) proved in 1914 that there exists a real number X for which the quantity Li(X) − π(X) vanishes, and then for slightly larger X crosses over into negative values. This theorem attracted lots of attention at the time.
Chapter 17

The Prime Number Theorem
Take a look at Figure 13.1 again. All three functions, Li(X), π(X) and X/ log(X)
are “going to infinity with X” (this means that for any real number R, for all
sufficiently large X, the values of these functions at X exceed R).
Are these functions “going to infinity” at the same rate?
To answer such a question, we have to know what we mean by going to infinity
at the same rate. So, here’s a definition. Two functions, A(X) and B(X), that
each go to infinity will be said to go to infinity at the same rate if their
ratio
A(X)/B(X)
tends to 1 as X goes to infinity.
If, for example, two functions A(X) and B(X) that take positive whole number values have the same number of digits for large X, and if, for any number you give us (say a million, or a billion, or a trillion), the "leftmost" million (or billion, or trillion) digits of A(X) and B(X) are the same for X large enough, then A(X) and B(X) go to infinity at the same rate. For example,

A(X)/B(X) = 281067597183743525105611755423/281067597161361511527766294585 = 1.00000000007963213762060 . . .
While we're defining things, let us say that two functions A(X) and B(X) that each go to infinity go to infinity at similar rates if there are two positive constants c and C such that for X sufficiently large the ratio A(X)/B(X) is between c and C.
[Figures: plots of pairs of functions A(X) and B(X) together with their ratio A(X)/B(X), for X ≤ 100, illustrating functions that go to infinity at similar rates.]
Now a theorem from elementary calculus tells us that the ratio of Li(X) to
X/ log(X) tends to 1 as X gets larger and larger. That is—using the definition
we’ve just introduced—Li(X) and X/ log(X) go to infinity at the same rate (see
[10]).
Recall (in Chapter 13 above) that if X = 10^24, the left-most twelve digits of π(X) and Li(X) are the same: both numbers start 18,435,599,767,3 . . .
Well, that’s a good start. Can we guarantee that for X large enough, the “left-
most” million (or billion, or trillion) digits of π(X) and Li(X) are the same, i.e.,
that these two functions go to infinity at the same rate?
The Riemann Hypothesis, as we have just formulated it, would tell us that the
difference between Li(X) and π(X) is pretty small in comparison with the size of
X. This information would imply (but would be much more precise information
than) the statement that the ratio Li(X)/π(X) tends to 1, i.e., that Li(X) and
π(X) go to infinity at the same rate.
This last statement gives, of course, a far less precise relationship between Li(X)
and π(X) than the Riemann Hypothesis (once it is proved!) would give us. The
advantage, though, of the less precise statement is that it is currently known
to be true, and—in fact—has been known for over a century. It goes under the
name of
The Prime Number Theorem: Li(X) and π(X) go to infinity at the same
rate.
Since Li(X) and X/log(X) go to infinity at the same rate, we could equally well have expressed the "same" theorem by saying:

The Prime Number Theorem: X/log(X) and π(X) go to infinity at the same rate.
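One can watch the theorem at work (very slowly!) on a computer. This sketch (ours) prints the ratio π(X) · log(X)/X, which the theorem says tends to 1:

```python
import math

def prime_count(n):
    sieve = bytearray([1]) * (n + 1)  # sieve of Eratosthenes
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return sum(sieve)

ratios = [prime_count(10 ** k) * math.log(10 ** k) / 10 ** k for k in (3, 4, 5, 6)]
print([round(r, 4) for r in ratios])  # [1.1605, 1.132, 1.1043, 1.0845]
```

The ratios creep downward toward 1, but the convergence is famously leisurely.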
A milestone in the history leading up to the proof of the Prime Number Theorem
is the earlier work of Pafnuty Lvovich Chebyshev (see http://en.wikipedia.
org/wiki/Chebyshev_function) showing that (to use the terminology we in-
troduced) X/ log(X) and π(X) go to infinity at similar rates.
The elusive Riemann Hypothesis, however, is much deeper than the Prime Num-
ber Theorem, and takes its origin from some awe-inspiring, difficult to inter-
pret, lines in Bernhard Riemann’s magnificent 8-page paper, “On the number
of primes less than a given magnitude,” published in 1859 (see [11]).
The Riemann Hypothesis remains unproved to this day, and therefore is “only
a hypothesis,” as Osiander said of Copernicus’s theory, but one for which we
have overwhelming theoretical and numerical evidence in its support. It is the
kind of conjecture that contemporary Dutch mathematician Frans Oort might
label a suffusing conjecture in that it has unusually broad implications: many,
many results are now known to follow, if the conjecture, familiarly known as
RH, is true. A proof of RH would, therefore, fall into the applied category, given
our discussion above in Chapter 9. But however you classify RH, it is a central
concern in mathematics to find its proof (or, a counter-example!). RH is one of
the weightiest statements in all of mathematics.
Chapter 18
We have borrowed the phrase “staircase of primes” from the popular book The
Music of the Primes by Marcus du Sautoy, for we feel that it captures the
sense that there is a deeply hidden architecture to the graphs that compile the
number of primes (up to N ) and also because—in a bit—we will be tinkering
with this carpentry. Before we do so, though, let us review in Figure 18.1 what
this staircase looks like for different ranges.
60
61
[Figure 18.1: the staircase of primes over the ranges X ≤ 25, X ≤ 100, X ≤ 1,000, and X ≤ 10,000.]
The mystery of this staircase is that the information contained within it is—in
effect—the full story of where the primes are placed. This story seems to elude
any simple description. Can we “tinker with” this staircase without destroying
this valuable information?
Chapter 19
For starters, notice that all the (vertical) risers of this staircase (Figure 18.1
above) have unit height. That is, they contain no numerical information except
for their placement on the x-axis. So, we could distort our staircase by changing
(in any way we please) the height of each riser; and as long as we haven’t brought
new risers into—or old risers out of—existence, and have not modified their
position over the x-axis, we have retained all the information of our original
staircase.
A more drastic-sounding thing we could do is to judiciously add new steps to our staircase. At present, we have a step at each prime number p, and no step anywhere else. Suppose we built a staircase with a new step not only at x = p for each prime number p but also at x = 1 and at x = p^n as p^n runs through all powers of prime numbers as well. Such a staircase would have, indeed, many more steps than our original staircase had but, nevertheless, would retain much of the quality of the old staircase: namely, it contains within it the full story of the placement of primes and their powers.
A final thing we can do is to perform a distortion of the x-axis (elongating or
shortening it, as we wish) in any specific way, as long as we can perform the
inverse process, and “undistort” it if we wish. Clearly such an operation may
have mangled the staircase, but hasn’t destroyed information irretrievably.
We shall perform all three of these kinds of operations eventually, and will see
some great surprises as a result. But for now, we will perform distortions only
of the first two types. We are about to build a new staircase that retains the
precious information we need, but is constructed according to the following
architectural plan.
• Our staircase starts on the ground at x = 0, and the height of the riser of the step at x = 1 will be log(2π). The height of the riser of the step at x = p^n will not be 1 (as was the height of all risers in the old staircase of primes): rather, the step at x = p^n will have the height of its riser equal to log p. So for the first few steps (at x = 1, 2, 3, 4, 5, 7, 8, 9, 11, . . . ), the risers will be of height log(2π), log 2, log 3, log 2, log 5, log 7, log 2, log 3, log 11, . . . Since log p > 1 once p ≥ 3, these vertical dimensions lead to a steeper ascent but no great loss of information.
Although we are not quite done with our architectural work, Figure 19.1
shows what our new staircase looks like, so far.
Figure 19.1: The newly constructed staircase that counts prime powers
Notice that this new staircase looks, from afar, as if it were nicely approximated
by the 45 degree straight line, i.e., by the simple function X. In fact, we have—
by this new architecture—a second equivalent way of formulating Riemann’s
hypothesis. For this, let ψ(X) denote the function of X whose graph is depicted
in Figure 19.1 (see [12]).
The Riemann Hypothesis (second formulation): This new staircase is essentially square-root close to the 45 degree straight line; i.e., the function ψ(X) is essentially square-root close to the function f(X) = X.
Figure 19.2: The newly constructed staircase is close to the 45 degree line.
Do not worry if you do not understand why our first and second formula-
tions of Riemann’s Hypothesis are equivalent. Our aim, in offering the second
formulation—a way of phrasing Riemann’s guess that mathematicians know to
be equivalent to the first one—is to celebrate the variety of equivalent ways we
have to express Riemann’s proposed answers to the question “How many primes
are there?”, and to point out that some formulations would reveal a startling
simplicity—not immediately apparent—to the behavior of prime numbers, no
matter how erratic primes initially appear to us to be. After all, what could be
simpler than a 45 degree straight line?
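The new staircase is also easy to compute. The sketch below (ours, not the book's) builds ψ(X) as just described, a riser of log(2π) at x = 1 and a riser of log p at each prime power p^n, and then checks how closely ψ(X) hugs the 45 degree line:

```python
import math

def psi(X):
    # Total height climbed by the new staircase up to X: log(2*pi) for
    # the step at x = 1, plus log(p) for every prime power p**n <= X.
    sieve = bytearray([1]) * (X + 1)  # sieve of Eratosthenes
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(X ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, X + 1, p)))
    total = math.log(2 * math.pi)
    for p in range(2, X + 1):
        if sieve[p]:
            q = p
            while q <= X:          # p, p**2, p**3, ... each add log(p)
                total += math.log(p)
                q *= p
    return total

print(psi(100_000) / 100_000)  # close to 1: the staircase hugs the 45 degree line
```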
Chapter 20

Computer music files and prime numbers

[Figure 20.1: the graph of a pure sine wave.]
You’ll notice that there are two features to the graph in Figure 20.1.
1. The height of the peaks of this sine wave: This height is referred to as the
amplitude and corresponds to the loudness of the sound.
2. The number of peaks per second: This number is referred to as the fre-
quency and corresponds to the pitch of the sound.
Figure 20.3: Graph of sum of the two sine waves with different frequencies
Figure 20.5: Graph of the sum of the two “sine” waves with different frequencies
and phases.
So, all you need to reconstruct the chord graphed above is to know five numbers: the two frequencies, their two amplitudes, and the phase.
Now suppose you came across such a sound as pictured in Figure 20.5 and
wanted to “record it.” Well, one way would be to sample the amplitude of the
sound at many different times, as for example in Figure 20.6.
[Figure 20.6: the sound of Figure 20.5 sampled at many points in time.]
Figure 20.7: Graph obtained from Figure 20.6 by filling in the rest of the points
But this sampling would take an enormous amount of storage space, at least
compared to storing five numbers, as explained above! Current audio compact
discs do their sampling 44,100 times a second to get a reasonable quality of
sound.
Another way is to simply record the five numbers: the spectrum, amplitudes,
and phase. Surprisingly, this seems to be roughly the way our ear processes
such a sound when we hear it.1
Even in this simplest of examples (our pure chord: the pure note C played simultaneously with the pure note E), the efficiency of the data compression that comes as an immediate bonus of describing the chord by just the five numbers giving spectrum, amplitudes, and phase is staggering.
This type of analysis, in general, is called Fourier Analysis and is one of the
glorious chapters of mathematics. One way of picturing spectrum and amplitudes
of a sound is by a bar graph which might be called the spectral picture of the
sound, the horizontal axis depicting frequency and the vertical one depicting amplitude.

[Figure: the spectral picture of the C–E chord: a bar at each of its component frequencies.]
This spectral picture ignores the phase but is nevertheless a very good portrait
of the sound. The spectral picture of a graph gets us to think of that graph as
“built up by the superposition of a bunch of pure waves,” and if the graph is
complicated enough we may very well need infinitely many pure waves to build
it up! Fourier analysis is a mathematical theory that allows us to start with any
graph—we are thinking here of graphs that picture sounds, but any graph will
do—and actually compute its spectral picture (and even keep track of phases).
The operation that starts with a graph and goes to its spectral picture that
records the frequencies, amplitudes, and phases of the pure sine waves that, to-
gether, compose the graph is called the Fourier transform and nowadays there
are very fast procedures for getting accurate Fourier transforms (meaning ac-
curate spectral pictures including information about phases) by computer [13].
The theory behind this operation (Fourier transform giving us a spectral analy-
sis of a graph) is quite beautiful, but equally impressive is how—given the power
of modern computation—you can immediately perform this operation for your-
self to get a sense of how different wave-sounds can be constructed from the
superposition of pure tones.
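You can even do it bare-handed. The sketch below (ours; real software uses the much faster "fast Fourier transform" rather than this direct sum) applies the discrete Fourier transform to a two-tone chord and recovers both frequencies and both amplitudes:

```python
import cmath
import math

def spectrum(samples):
    # Discrete Fourier transform, the slow O(N^2) way: the amplitude of
    # the pure wave at each whole-number frequency k below N/2.
    N = len(samples)
    return [2 * abs(sum(samples[n] * cmath.exp(-2j * math.pi * k * n / N)
                        for n in range(N))) / N
            for k in range(N // 2)]

# A "chord" of two pure tones: 5 and 12 cycles per window, amplitudes 1.5 and 0.8.
N = 128
wave = [1.5 * math.cos(2 * math.pi * 5 * n / N)
        + 0.8 * math.cos(2 * math.pi * 12 * n / N) for n in range(N)]
amps = spectrum(wave)
peaks = sorted(range(len(amps)), key=amps.__getitem__, reverse=True)[:2]
print(sorted(peaks))  # [5, 12]: the spectrum of the chord
```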
The sawtooth wave in Figure 20.10 has a spectral picture, its Fourier transform,
given in Figure 20.11:
Figure 20.11: The Spectrum of the sawtooth wave has a spike of height 1/k at
each integer k
Suppose you have a complicated sound wave, say as in Figure 20.12, and you
want to record it. Standard audio CDs record their data by intensive sampling
as we mentioned. In contrast, current MP3 audio compression technology uses
Fourier transforms plus sophisticated algorithms based on knowledge of which
frequencies the human ear can hear. With this, MP3 technology manages to
get a compression factor of 8–12 with little perceived loss in quality, so that you
can fit your entire music collection on your phone, instead of just a few of your
favorite CDs.
Chapter 21

The word "spectrum"
This works well for the color spectrum, as initiated by Newton (as in the figure
above, sunlight is separated by a prism into a rainbow continuum of colors):
an analysis of white light into its components. Or in mass spectrometry, where
beams of ions are separated (analyzed) according to their mass/charge ratio
and the mass spectrum is recorded on a photographic plate or film. Or in
the recording of the various component frequencies, with their corresponding
intensities of some audible phenomenon.
In mathematics the word has found its use in many different fields, the most basic use occurring in Fourier analysis, which has as its goal either the analysis of a function f(t) into simpler (specifically: sine and cosine) component functions, or the synthesis of such a function by combining simpler functions to produce it.
Chapter 22

Spectra and trigonometric sums

Figure 22.1: Plot of the periodic sine wave f(t) = 2 · cos(1 + t/2)
Here a pure tone is a function of the form a · cos(θt + φ). The θ determines the frequency of the periodic wave; the larger θ is, the higher the "pitch." The coefficient a determines the envelope of size of the periodic wave, and we call it the amplitude of the periodic wave.
Sometimes we encounter functions F(t) that are not pure tones, but that can be expressed as (or, we might say, "decomposed into") a finite sum of pure tones, for example three of them:

F(t) = 5 cos(−t − 2) + 2 cos(t/2 + 1) + 3 cos(2t + 4)
Figure 22.2: Plot of the sum 5 cos (−t − 2) + 2 cos (t/2 + 1) + 3 cos (2t + 4)
More generally we might consider a sum of any finite number of pure cosine
waves—or in a moment we’ll also see some infinite ones as well. Again, for
these more general trigonometric sums, their spectrum will denote the set of
frequencies that compose them.
Chapter 23

The spectrum and the staircase of primes

[Figure: the staircase of primes for X ≤ 100.]
• Does this staircase of primes (or, perhaps, some tinkered version of the
staircase that contains the same basic information) have a spectrum?
• And here is a most important question: will that spectrum show us order
and organization lurking within the staircase that we would otherwise be
blind to?
Part II

Distributions
Chapter 25

Slopes of graphs that have no slopes
Differential calculus, initially the creation of Newton and Leibniz in the 1680s,
acquaints us with slopes of graphs of functions of a real variable. So, to discuss
this we should say a word about what a function is, and what its graph is.
We often denote the function f by the symbol f(X); this symbolization allows us to "substitute for X any specific number a" to get its value f(a).
The graph of a function provides a vivid visual representation of the function in
the Euclidean plane where over every point a on the x-axis you plot a point above
it of “height” equal to the value of the function at a, i.e., f (a). In Cartesian
coordinates, then, you are plotting points (a, f (a)) in the plane where a runs
through all real numbers.
In this book we will very often be talking about “graphs” when we are also
specifically interested in the functions—of which they are the graphs. We will
use these words almost synonymously since we like to adopt a very visual atti-
tude towards the behavior of the functions that interest us.
[Figure: the graph of a function, with the slope at one point indicated ("Here it is!"); how to compute that slope is Calculus.]
Figure 25.3 illustrates a function (blue), the slope at a point (green straight
line), and the derivative (red) of the function; the red derivative is the function
whose value at a point is the slope of the blue function at that point. Differential
calculus explains to us how to calculate slopes of graphs, and finally, shows us
the power that we then have to answer problems we could not answer if we
couldn’t compute those slopes.
But what is the slope of a graph that jumps? Consider, for example, the function

f(x) = 1 if x ≤ 3,  and  f(x) = 2 if x > 3?
(Note that for purely aesthetic reasons, we draw a vertical line at the point
where the jump occurs, though technically that vertical line is not part of the
graph of the function.)
Figure 25.4: The graph of the function f (x) above that jumps—it is 1 up to 3
and then 2 after that point.
The most comfortable way to deal with the graph of such a function is to just
approximate it by a differentiable function as in Figure 25.5.
[Figure 25.5: a differentiable approximation of the jump function above.]
Then take the derivative of that smooth function. Of course, this is just an
approximation, so we might try to make a better approximation, which we do
in each successive graph starting with Figure 25.6 below.
[Figure 25.6: a smooth approximation (blue) and its derivative (red).]
Note that—as you would expect—in the range where the initial function is
constant, its derivative is zero. In the subsequent figures, our initial function
will be nonconstant for smaller and smaller intervals about the point of
discontinuity. Note also that, in our series of pictures below, we will be
successively rescaling the y-axis; all our initial functions have the value 1 for
“large” negative numbers and the value 2 for large positive numbers.
[Figures 25.7–25.9: successively better approximations (blue) and their derivatives (red); note the rescaled y-axes.]
Notice what is happening: as the approximation gets better and better, the
derivative will be zero mostly, with a blip at the point of discontinuity, and the
blip will get higher and higher. In each of these pictures, for any interval of real
numbers [a, b] the total area under the red graph over that interval is equal to
the total rise of the blue graph over that interval.
What happens if we take the series of figures 25.6–25.9, etc., to the limit? This
is quite curious:
• the series of red graphs: these are getting thinner and thinner and
higher and higher: can we make any sense of what the red graph might
mean in the limit (even though the only picture of it that we have at
present makes it infinitely thin and infinitely high)?
• the series of blue graphs: these are happily looking more and more
like the tame Figure 25.4.
Each of our red graphs is the derivative of the corresponding blue graph. It is
tempting to think of the limit of the red graphs—whatever we might construe
this to be—as standing for the derivative of the limit of the blue graphs, i.e., of
the graph in Figure 25.4.
Karl Weierstrass, who worked during the latter part of the nineteenth century,
was known as the “father of modern analysis.” He oversaw one of the glorious
moments of rigorization of concepts that were long in use, but never before
systematically organized. He, and other analysts of the time, were interested
in providing a rigorous language to talk about functions, and more specifically
continuous functions and smooth (i.e., differentiable) functions. They wished
to have a firm understanding of limits (i.e., of sequences of numbers, or of
functions).
For Weierstrass and his companions, even though the functions they worked with
needn’t be smooth, or continuous, at the very least the functions they studied
had well-defined values at every real number.
Chapter 26

Distributions: sharpening our approximating functions even if we have to let them shoot out to infinity
The curious limit of the red graphs of the previous chapter, which you might be
tempted to think of as a “blip-function” f(t) that vanishes for t nonzero and is
somehow “infinite” (whatever that means) at 0, is an example of a generalized
function (in the sense of the earlier mathematicians) or a distribution in the
sense of Laurent Schwartz.
This particular limit of the red graphs also goes by another name (it is officially
called a Dirac δ-function (see [15]), the adjective “Dirac” being in honor of
the physicist who first worked with this concept, the “δ” being the symbol
he assigned to these objects). The noun “function” should be in quotation
marks for, properly speaking, the Dirac δ-function is not—as we have explained
above—a bona fide function but rather a distribution.
Now may be a good time to summarize what the major difference is between
honest functions and generalized functions or distributions.
An honest (by which we mean integrable) function of a real variable f (t) pos-
sesses two “features.”
1. Its values, f(a), at each real number a; and

2. for every interval [a, b], the area between the portion of its graph over that interval and the x-axis, i.e., the integral ∫_a^b f(t)dt.

Figure 26.2: This figure illustrates ∫_{−∞}^{+∞} f(x)dx, which is the signed area between
the graph of f(x) and the x-axis, where area below the x-axis (yellow) counts
negative, and area above (grey) is positive.
SO, any honest function integrable over finite intervals clearly is a distribution
(forget about its values!) but . . . there are many more generalized functions,
and including them in our sights gives us a very important tool.
It is natural to talk, as well, of Cauchy sequences, and limits, of distributions.
We’ll say that such a sequence D1(t), D2(t), D3(t), . . . is a Cauchy sequence
if for every interval [a, b] the quantities

∫_a^b D1(t)dt,  ∫_a^b D2(t)dt,  ∫_a^b D3(t)dt, . . .

form a Cauchy sequence of real numbers (so for any ε > 0 eventually all terms in
the sequence of real numbers are within ε of each other). Now, any Cauchy se-
quence of distributions converges to a limiting distribution D(t) which is defined
by the rule that for every interval [a, b],

∫_a^b D(t)dt = lim_{i→∞} ∫_a^b Di(t)dt.
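To see this definition in action, here is a small numerical sketch of our own (not from the text): we take ever thinner and taller rectangular “blips” of total area 1 centered at t = 3, in the spirit of the red graphs, and watch their integrals over intervals [a, b] settle down.

```python
def D(i):
    # The i-th "red graph": a rectangular blip of width 1/i and height i
    # centered at t = 3, so its total area is always 1.
    def f(t):
        return float(i) if abs(t - 3.0) < 0.5 / i else 0.0
    return f

def integral(f, a, b, n=100000):
    # Simple midpoint rule on [a, b].
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) for k in range(n)) * h

for i in (1, 10, 100):
    print(integral(D(i), 2, 4), integral(D(i), 4, 6))
# The first column tends to 1 (the blip at 3 lies inside [2, 4]);
# the second column is always 0 (no mass in [4, 6]).
```

Viewed as honest functions these blips converge nowhere near t = 3, but their integrals over every interval form a Cauchy sequence, so as distributions they converge.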
If, by the way, you have an infinite sequence—say—of honest, continuous, func-
tions that converges uniformly to a limit (which will again be a continuous
function) then that sequence certainly converges—in the above sense—to the
same limit when these functions are viewed as generalized functions. BUT,
there are many important occasions where your sequence of honest continuous
functions doesn’t have that convergence property and yet when these contin-
uous functions are viewed as generalized functions they do converge to some
generalized function as a limit. We will see this soon when we get back to the
“sequence of the red graphs.” This sequence does converge (in the above sense)
to the Dirac δ-function when these red graphs are thought of as a sequence of
generalized functions.
The integral notation for distributions is very useful, and allows us the flexibility
to define, for nice enough—and honest—functions c(t), useful expressions such
as

∫_a^b c(t)D(t).
For example, the Dirac δ-function we have been discussing (i.e., the limit of
the red graphs of Chapter 25) is an honest function away from t = 3 and—in
fact—is the “trivial function” zero away from 3. And at 3, we may say that it
has the “value” infinity, in honor of it being the limit of blip functions getting
taller and taller at 3. The feature that pins it down as a distribution is given
by its behavior relative to the second feature above, the area of its graph over
the open interval (a, b) between a and b:
• If 3 is not in the open interval spanned by a and b, then the “area under
the graph of our Dirac δ-function” over the interval (a, b) is 0.
• If 3 is in the open interval (a, b), then the “area under the graph of our
Dirac δ-function” is 1—in notation
∫_a^b δ = 1.
We sometimes summarize the fact that these areas vanish so long as 3 is not
included in the interval we are considering by saying that the support of this
δ-function is “at 3.”
Once you’re happy with this Dirac δ-function, you’ll also be happy with a Dirac
δ-function—call it δx —with support concentrated at any specific real number x.
This δx vanishes for t ≠ x and, intuitively speaking, has an infinite blip at t = x.
So, the original delta-function we were discussing, i.e., δ(t), would be denoted
δ3(t).
A question: If you’ve never seen distributions before, but know the Riemann
integral, can you guess at what the definition of ∫_a^b c(t)D(t) is, and can you
formulate hypotheses on c(t) that would allow you to endow this expression
with a definite meaning?
A second question: If you have not seen distributions before, and have an-
swered the first question above, let c(t) be an honest function for which your
definition of

∫_a^b c(t)D(t)

applies. Now let x be a real number. Can you use your definition to compute

∫_{−∞}^{+∞} c(t)δx(t)?

The answer to this second question, by the way, is: ∫_{−∞}^{+∞} c(t)δx(t) = c(x). This
will be useful in the later sections!
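You can test this answer numerically by replacing δx with taller and thinner honest bump functions of total area 1, in the spirit of the red graphs. A Python sketch of our own (all names are our choices):

```python
import math

def bump(t, x, eps):
    # A narrow Gaussian approximating the Dirac delta with support near x;
    # its total integral is 1 for every eps > 0.
    return math.exp(-((t - x) / eps) ** 2 / 2) / (eps * math.sqrt(2 * math.pi))

def integrate(f, a, b, n=200000):
    # Simple midpoint rule on [a, b].
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

c = math.cos          # a smooth test function c(t)
x = 1.0
for eps in (0.1, 0.01, 0.001):
    approx = integrate(lambda t: c(t) * bump(t, x, eps), x - 1, x + 1)
    print(eps, approx)
# The integrals tend to c(x) = cos(1) ≈ 0.5403 as eps shrinks.
```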
The theory of distributions gives a partial answer to the following funny ques-
tion:
How in the world can you “take the derivative” of a function F (t)
that doesn’t have a derivative?
The short answer to this question is that this derivative F′(t) which doesn’t
exist as a function may exist as a distribution. What then is the integral of that
distribution? Well, it is given by the original function!

∫_a^b F′(t)dt = F(b) − F(a).
Let us practice this with simple staircase functions. For example, what is the
derivative—in the sense of the theory of distributions—of the function in Fig-
ure 26.4? Answer: δ0 + 2δ1 .
Figure 26.4: The staircase function that is 0 for t ≤ 0, 1 for 0 < t ≤ 1 and 3 for
1 < t ≤ 2 has derivative δ0 + 2δ1 .
We’ll be dealing with much more complicated staircase functions in the next
chapter, but the general principles discussed here will nicely apply there [16].
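One can check this rule numerically by smoothing the staircase and integrating its honest derivative. A small Python sketch of our own (the logistic smoothing and all names are our choices, not the book’s):

```python
import math

def step(u, eps=1e-3):
    # Logistic approximation to the unit step, with the exponent
    # clipped to avoid floating-point overflow.
    z = u / eps
    if z < -30.0:
        return 0.0
    if z > 30.0:
        return 1.0
    return 1.0 / (1.0 + math.exp(-z))

def F(t):
    # Smooth surrogate for the staircase of Figure 26.4:
    # a jump of height 1 at t = 0 and a jump of height 2 at t = 1.
    return step(t) + 2.0 * step(t - 1.0)

def dF(t, h=1e-6):
    # Numerical derivative: an honest stand-in for delta_0 + 2*delta_1.
    return (F(t + h) - F(t - h)) / (2.0 * h)

def integral(g, a, b, n=200000):
    # Midpoint rule on [a, b].
    h = (b - a) / n
    return sum(g(a + (k + 0.5) * h) for k in range(n)) * h

# The total mass of the derivative over [-1, 2] equals F(2) - F(-1) = 3,
# i.e., the combined weights 1 + 2 of delta_0 + 2*delta_1:
total = integral(dF, -1.0, 2.0)
print(total)
```

The derivative’s mass concentrates in two thin spikes, of weight 1 near t = 0 and weight 2 near t = 1, exactly as the distributional answer δ0 + 2δ1 predicts.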
Chapter 27

Fourier transforms: second visit
The operation that starts with a graph and goes to its spectral pic-
ture that records the frequencies, amplitudes, and phases of the pure
sine waves that, together, compose the graph is called the Fourier
transform.
Figure 27.1: The graph of an even function is symmetrical about the y-axis.
When we get to apply this discussion to the staircase of primes π(t) or the
tinkered staircase of primes ψ(t), both of which are defined only for positive
values of t, we would “lose little information” in our quest to understand
them if we simply “symmetrized their graphs” by defining their values on neg-
ative numbers −t via the formulas π(−t) = π(t) and ψ(−t) = ψ(t), thereby
turning each of them into even functions.
The idea behind the Fourier transform is to express f (t) as made up out of sine
and cosine wave functions. Since we have agreed to consider only even func-
tions, we can dispense with the sine waves—they won’t appear in our Fourier
analysis—and ask how to reconstruct f (t) as a sum (with coefficients) of cosine
functions (if only finitely many frequencies occur in the spectrum of our func-
tion) or more generally, as an integral if the spectrum is more elaborate. For
this work, we need a little machine that tells us, for each real number θ, whether
or not θ is in the spectrum of f (t), and if so, what the amplitude is of the cosine
function cos(θt) that occurs in the Fourier expansion of f (t)—this amplitude
answers the awkwardly phrased question:
How much cos(θt) “occurs in” f (t)?
We will denote this amplitude by fˆ(θ), and refer to it as the Fourier transform
of f (t). The spectrum, then, of f (t) is the set of all frequencies θ where the
amplitude is nonzero.
Figure 27.3: The Fourier transform machine, which transforms f(t) into f̂(θ)
Now in certain easy circumstances—specifically, if ∫_{−∞}^{+∞} |f(t)|dt (exists, and) is
finite—integral calculus provides us with an easy construction of that machine
(see Figure 27.3); namely:

f̂(θ) = ∫_{−∞}^{+∞} f(t) cos(−θt)dt.

This concise machine manages to “pick out” just the part of f(t) that has
frequency θ! It provides for us the analysis part of the Fourier analysis of our
function f(t).
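As a sanity check on this machine, take f(t) = e^{−t²/2}: the formula above returns √(2π)·e^{−θ²/2}, a classical identity, and a crude numerical integration confirms it. A Python sketch of our own:

```python
import math

def fhat(f, theta, T=20.0, n=200000):
    # Numerically evaluate  f̂(θ) = ∫ f(t) cos(−θt) dt  by truncating
    # the real line to [−T, T] (f must decay fast enough for this).
    h = 2 * T / n
    return sum(f(-T + (k + 0.5) * h) * math.cos(-theta * (-T + (k + 0.5) * h))
               for k in range(n)) * h

f = lambda t: math.exp(-t * t / 2)   # an even, rapidly decaying test function
for theta in (0.0, 1.0, 2.0):
    exact = math.sqrt(2 * math.pi) * math.exp(-theta * theta / 2)
    print(theta, fhat(f, theta), exact)
# The two columns agree to many decimal places.
```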
But there is a synthesis part to our work as well, for we can reconstruct f(t)
from its Fourier transform, by a process intriguingly similar to the analysis part;
namely: if ∫_{−∞}^{+∞} |f̂(θ)|dθ (exists, and) is finite, we retrieve f(t) by the integral

f(t) = (1/2π) ∫_{−∞}^{+∞} f̂(θ) cos(θt)dθ.

We are not so lucky to have ∫_{−∞}^{+∞} |f(t)|dt finite when we try our hand at a
Fourier analysis of the staircase of primes, but we’ll work around this!
Chapter 28

Fourier transform of delta
Consider the δ-function that we denoted δ(t) (or δ0 (t)). This is also the “general-
ized function” that we thought of as the “limit of the red graphs” in Chapter 26
above. Even though δ(t) is a distribution and not a bona fide function, it is
symmetric about the origin, and also
∫_{−∞}^{+∞} |δ(t)|dt
exists, and is finite (its value is, in fact, 1). All this means that, appropriately
understood, the discussion of the previous section applies, and we can feed
this delta-function into our Fourier Transform Machine (Figure 27.3) to see
what frequencies and amplitudes arise in our attempt to express—whatever
this means!—the delta-function as a sum, or an integral, of cosine functions.
So what is the Fourier transform, δ̂0 (θ), of the delta-function?
Well, the general formula would give us:
δ̂0(θ) = ∫_{−∞}^{+∞} cos(−θt)δ0(t)dt.
For any nice function c(t) we have that the integral of the product of c(t) by
the distribution δx (t) is given by the value of the function c(t) at t = x. So:
δ̂0(θ) = ∫_{−∞}^{+∞} cos(−θt)δ0(t)dt = cos(0) = 1.
δ̂0 (θ) = 1.
One can think of this colloquially as saying that the delta-function is a perfect
example of white noise in that every frequency occurs in its Fourier analysis and
they all occur in equal amounts.
To generalize this computation, let us consider for any real number x the sym-
metrized delta-function with support at x and −x, given by

dx(t) := (δx(t) + δ−x(t))/2,

as illustrated in Figure 28.1.
Figure 28.1: The sum (δx (t) + δ−x (t))/2, where we draw vertical arrows to
illustrate the Dirac delta functions.
What is the Fourier transform of this dx(t)? The answer is given by making the
same computation as we’ve just made:

d̂x(θ) = (1/2) ( ∫_{−∞}^{+∞} cos(−θt)δx(t)dt + ∫_{−∞}^{+∞} cos(−θt)δ−x(t)dt )
       = (1/2) ( cos(−θx) + cos(+θx) )
       = cos(xθ).
To summarize this in ridiculous (!) colloquial terms: for any frequency θ the
amount of cos(θt) you need to build up the generalized function (δx (t)+δ−x (t))/2
is cos(xθ).
So far, so good, but remember that the theory of the Fourier transform has—
like much of mathematics—two parts: an analysis part and a synthesis part.
We’ve just performed the analysis part of the theory for these symmetrized
delta functions (δx (t) + δ−x (t))/2.
Can we synthesize them—i.e., build them up again—from their Fourier trans-
forms?
We’ll leave this, at least for now, as a question for you.
Chapter 29
Trigonometric series
Given our interest in the ideas of Fourier, it is not surprising that we’ll want to
deal with things like
F(θ) = Σ_{k=1}^{∞} ak cos(sk · θ)
where the sk are real numbers tending (strictly monotonically) to infinity. These
we’ll just call trigonometric series without asking whether they converge in
any sense for all values of θ, or even for any value of θ. The sk ’s that occur
in such a trigonometric series we will call the spectral values or for short,
the spectrum of the series, and the ak ’s the (corresponding) amplitudes. We
repeat that we impose no convergence requirements at all. But we also think of
these things as providing “cutoff” finite trigonometric sums, which we think of
as functions of two variables, θ and C (the “cutoff”) where
F(θ, C) := Σ_{sk ≤ C} ak cos(sk · θ).
These functions F (θ, C) are finite trigonometric series and therefore “honest
functions” having finite values everywhere.
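Such a cutoff is immediate to compute. Here is a small Python sketch of our own, with a toy spectrum (all names are our choices):

```python
import math

def F(theta, C, spectrum, amplitudes):
    # The cutoff sum  F(theta, C) = sum over s_k <= C of a_k cos(s_k * theta):
    # a finite trigonometric series, hence an honest function, regardless of
    # whether the full series converges anywhere.
    return sum(a * math.cos(s * theta)
               for s, a in zip(spectrum, amplitudes) if s <= C)

# A toy series with spectral values log 2, log 3, log 4, ... and amplitudes 1:
spectrum = [math.log(k) for k in range(2, 100)]
amplitudes = [1.0] * len(spectrum)

print(F(0.0, math.log(10), spectrum, amplitudes))  # 9 terms survive the cutoff, so 9.0
```

Raising the cutoff C just admits more terms into the sum; no convergence question ever arises for a fixed C.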
Recall, as in Chapter 28, that for any real number x, we considered the sym-
metrized delta-function with support at x and −x, given by

dx(t) := (δx(t) + δ−x(t))/2.

Figure 29.1: The sum (δx(t) + δ−x(t))/2, where we draw vertical arrows to
illustrate the Dirac delta functions.
This is viewed as a distribution playing the role of the “inverse Fourier transform” of our
trigonometric series F(t).
Definition 29.1. Say that a trigonometric series F (θ) has a spike at the real
number θ = τ ∈ R if the set of absolute values |F (τ, C)| as C ranges through
positive number cutoffs is unbounded. A real number τ ∈ R is, in contrast, a
non-spike if those values admit a finite upper bound.
Chapter 30

A sneak preview of Part III

1. The first infinite trigonometric sum F(t) is a sum of pure cosine waves
with frequencies given by logarithms of powers of primes and with ampli-
tudes given by the formula

F(t) := − Σ_{p^n} (log(p) / p^{n/2}) · cos(t · log(p^n))
H(s) := 1 + Σ_θ cos(θ · log(s)).
These graphs will have “higher and higher peaks” concentrated more and
more accurately at the logarithms of powers of primes indicated in
our pictures below (see Figure 30.6) by the series of vertical blue spikes.
That the series of blue lines (i.e., the logarithms of powers of primes) in our pic-
tures below determines—via the trigonometric sums we describe—the series of
red lines (i.e., what we are calling the spectrum) and conversely is a consequence
of the Riemann Hypothesis.
f(t) = − (log(2)/2^{1/2}) cos(t log(2)) − (log(3)/3^{1/2}) cos(t log(3))
       − (log(2)/4^{1/2}) cos(t log(4)) − (log(5)/5^{1/2}) cos(t log(5))

[Figure 30.1: the graph of f(t) for 0 ≤ t ≤ 100.]
Look at the peaks of Figure 30.1. There is nothing very impressive about
them, you might think; but wait, for f (t) is just a very “early” piece of
the infinite trigonometric sum F (t) described above.
Let us truncate the infinite sum F (t) taking only finitely many terms, by
choosing various “cutoff values” C and forming the finite sums
F≤C(t) := − Σ_{p^n ≤ C} (log(p) / p^{n/2}) · cos(t · log(p^n))
and plotting their positive values. Figures 30.2–30.5 show what we get for
a few values of C.
In each of the graphs, we have indicated by red vertical arrows the real
numbers that give the values of the Riemann spectrum that we will be
discussing. These numbers at the red vertical arrows in Figures 30.2–30.5,

θ1, θ2, θ3, . . . ,

constitute what we are calling the Riemann spectrum and are key
to the staircase of primes [17].
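The finite sums F≤C(t) are easy to compute directly, and already at C = 500 the highest peak between t = 10 and t = 16 sits essentially on θ1, as Figure 30.5 shows. A Python sketch of our own (all names are our choices):

```python
import math

def primes_up_to(C):
    # Sieve of Eratosthenes.
    sieve = [True] * (C + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(C ** 0.5) + 1):
        if sieve[p]:
            for m in range(p * p, C + 1, p):
                sieve[m] = False
    return [p for p in range(2, C + 1) if sieve[p]]

def F_cutoff(t, C):
    # F_{<=C}(t) = - sum over prime powers p^n <= C of
    #              (log p / p^(n/2)) * cos(t * log p^n).
    total = 0.0
    for p in primes_up_to(C):
        pn, n = p, 1
        while pn <= C:
            total -= math.log(p) / math.sqrt(pn) * math.cos(t * n * math.log(p))
            pn *= p
            n += 1
    return total

# Locate the highest peak of F_{<=500} for t between 10 and 16:
grid = [10.0 + 0.01 * k for k in range(601)]
t_peak = max(grid, key=lambda t: F_cutoff(t, 500))
print(round(t_peak, 2))   # lands near theta_1 = 14.1347...
```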
Figure 30.2: Plot of −Σ_{p^n ≤ 5} (log(p)/p^{n/2}) cos(t log(p^n)) with arrows pointing to the
spectrum of the primes
Figure 30.3: Plot of −Σ_{p^n ≤ 20} (log(p)/p^{n/2}) cos(t log(p^n)) with arrows pointing to the
spectrum of the primes
Note that the high peaks in Figure 30.4 seem to be lining up more
accurately with the vertical red lines. Note also that the y-axis has
been rescaled.
Figure 30.4: Plot of −Σ_{p^n ≤ 50} (log(p)/p^{n/2}) cos(t log(p^n)) with arrows pointing to the
spectrum of the primes
Here, the peaks are even sharper, and note that again they are higher;
that is, we have rescaled the y-axis.
Figure 30.5: Plot of −Σ_{p^n ≤ 500} (log(p)/p^{n/2}) cos(t log(p^n)) with arrows pointing to the
spectrum of the primes
We will pay attention to:
• how the spikes “play out” as we take the sums of longer and longer
pieces of the infinite sum of cosine waves above, given by larger and
larger cutoffs C,
• how this spectrum of red lines more closely matches the high peaks
of the graphs of the positive values of these finite sums,
• how these peaks are climbing higher and higher,
• what relationship these peaks have to the Fourier analysis of the
staircase of primes,
• and, equally importantly, what these mysterious red lines signify.
Figure 30.6: Illustration of −Σ_{i=1}^{1000} cos(log(s)θi), where θ1 ≈ 14.13, . . . are the
first 1000 contributions to the spectrum. The red dots are at the prime powers
p^n, whose size is proportional to log(p).
Chapter 31
On losing no information
Now, the problem we have been concentrating on, in this book, has been—in
effect—to understand the pattern, if we can call it that, given by the placement
of prime numbers among the natural line-up of all whole numbers.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38
There are, of course, many ways for us to present this basic pattern. Our initial
strategy was to focus attention on the staircase of primes which gives us a vivid
portrait, if you wish, of the order of appearance of primes among all numbers.
As we have already hinted in the previous sections, however, there are various
ways open to us to tinker with—and significantly modify—our staircase without
losing the essential information it contains. Of course, there is always the danger
of modifying things in such a way that “retrieval” of the original data becomes
difficult. Moreover, we had better remember every change we have made if we
are to have any hope of retrieving the original data!
With this in mind, let us go back to Chapter 18 (discussing the staircase
of primes) and Chapter 19, where we tinkered with the original staircase of
primes—alias: the graph of π(X)—to get ψ(X) whose risers look—from afar—
as if they approximated the 45 degree staircase.
At this point we’ll do some further carpentry on ψ(X) without destroying the
valuable information it contains. We will be replacing ψ(X) by a generalized
function, i.e., a distribution, which we denote Φ(t) that has support at all posi-
tive integral multiples of logs of prime numbers, and is zero on the complement
of that discrete set. Recall that by definition, a discrete subset S of real num-
bers is the support of a function, or of a distribution, if the function vanishes
on the complement of S and doesn’t vanish on the complement of any proper
subset of S.
Given the mission of our book, it may be less important for us to elaborate on
the construction of Φ(t) than it is (a) to note that Φ(t) contains all the valuable
information that ψ(X) has and (b) to pay close attention to the spike values of
the trigonometric series that is the Fourier transform of Φ(t).
For the definition of the distribution Φ(t) see the end-note [18].
A distribution that has a discrete set of real numbers as its support—as Φ(t)
does—we sometimes like to call spike distributions since the pictures of func-
tions approximating it tend to look like a series of spikes.
We have then before us a spike distribution with support at integral multiples of
logarithms of prime numbers, and this generalized function retains the essential
information about the placement of prime numbers among all whole numbers,
and will be playing a major role in our story: knowledge of the placement of the
“blips” constituting this distribution (its support), being at integral multiples of
logs of prime numbers, would allow us to reconstruct the position of the prime
numbers among all numbers. Of course there are many other ways to package
this vital information, so we must explain our motivation for subjecting our
poor initial staircase to the particular series of brutal acts of distortion that we
described, which ends up with the distribution Φ(t).
Chapter 32

From primes to the Riemann spectrum

Recall the symmetrized delta-function

dx(t) := (δx(t) + δ−x(t))/2

that has support at the points x and −x. We mentioned that its Fourier trans-
form, d̂x(θ), is equal to cos(xθ) (and gave some hints about why this may be
true).
Our next goal is to work with the much more interesting “spike function” Φ(t),
which was one of the generalized functions that we engineered in Chapter 31,
and that has support at all nonnegative integral multiples of logarithms of prime
numbers.
As with any function—or generalized function—defined for non-negative values
of t, we can “symmetrize it” (about the t-axis) which means that we can define
it on negative real numbers by the equation
Φ(−t) = Φ(t).
Let us make that convention, thereby turning Φ(t) into an even generalized
function, as illustrated in Figure 32.1. (An even function on the real line is a
function that takes the same value on any real number and its negative as in
the formula above.)
Figure 32.1: Φ(t) is a sum of Dirac delta functions at the logarithms of prime
powers pn weighted by p−n/2 log(p) (and log(2π) at 0).
Since the Fourier transform of dx(t) is cos(xθ), the Fourier transform of each
d_{n·log(p)}(t) is cos(n·log(p)θ), so the Fourier transform of Φ≤C(t) is

Φ̂≤C(θ) := 2 Σ_{prime powers p^n ≤ C} p^{−n/2} log(p) cos(n·log(p)θ).
So, following the discussion in Chapter 29 above, we are dealing with the cutoffs
at finite points C of the trigonometric series¹

Φ̂(θ) := 2 Σ_{prime powers p^n} p^{−n/2} log(p) cos(n·log(p)θ).
For example, when C = 3, we have the rather severe cutoff of these trigonometric
series: Φ̂≤3(θ) takes account only of the primes p = 2 and p = 3:

Φ̂≤3(θ) = (2/√2) log(2) cos(log(2)θ) + (2/√3) log(3) cos(log(3)θ).
¹The trigonometric series in the text has as its spectral values the logarithms of prime powers.

[Figure 32.2: plot of Φ̂≤3(θ) and its derivative, with a zero of the derivative at θ = 14.135375354 . . ..]
So in Figure 32.2 we begin this exploration by plotting Φ̂≤3 (θ), together with
its derivative, highlighting the zeroes of the derivative.
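That zero of the derivative can be found numerically by bisection. A Python sketch of our own (names are our choices):

```python
import math

L2, L3 = math.log(2), math.log(3)

def phi3(theta):
    # The severe cutoff Φ̂_{≤3}(θ): only the primes 2 and 3 contribute.
    return (2 / math.sqrt(2)) * L2 * math.cos(L2 * theta) \
         + (2 / math.sqrt(3)) * L3 * math.cos(L3 * theta)

def dphi3(theta):
    # Its derivative with respect to theta.
    return -(2 / math.sqrt(2)) * L2 ** 2 * math.sin(L2 * theta) \
           - (2 / math.sqrt(3)) * L3 ** 2 * math.sin(L3 * theta)

# Bisect for the zero of the derivative between theta = 14 and 14.3
# (the derivative changes sign exactly once on this interval):
a, b = 14.0, 14.3
for _ in range(60):
    m = 0.5 * (a + b)
    if dphi3(a) * dphi3(m) <= 0.0:
        b = m
    else:
        a = m
root = 0.5 * (a + b)
print(root)   # 14.1353753..., remarkably close to theta_1 = 14.134725...
```

Even this crudest of cutoffs has a critical point sitting strikingly near the first number of the Riemann spectrum.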
Figure 32.3: Plot of Φ̂≤3 (θ) in blue and its derivative in grey
We give a further sample of graphs for a few higher cutoff values C (introducing
a few more primes into the game!).
To continue:
These spikes become sharper and sharper, and they pinpoint a sequence of real
numbers

θ1, θ2, θ3, . . .

whose first few values are:
θ1 = 14.134725 . . .
θ2 = 21.022039 . . .
θ3 = 25.010857 . . .
θ4 = 30.424876 . . .
θ5 = 32.935061 . . .
θ6 = 37.586178 . . .
Riemann defined this sequence of numbers in his 1859 article in a manner some-
what different from the treatment we have given. In that article these θi appear
as “imaginary parts of the nontrivial zeroes of his zeta function;” we will discuss
this briefly in Part IV, Chapter 37 below.
Chapter 33

How many θi’s are there?

[Figure 33.1: the staircase of the Riemann spectrum θ1, θ2, θ3, . . ..]
Again, just as with the staircase of primes, we might hope that as we plot this
staircase from a distance as in Figures 33.2 and 33.3 that it will look like a
beautiful smooth curve.
In fact, we know that, conditional on RH, the staircase of real numbers θ1, θ2, θ3, . . .
is very closely approximated by the curve

(T/2π) · log(T/(2πe))

(the error term being bounded by a constant times log T).
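Even with only the six spectral numbers listed in the next chapter’s table, the curve already tracks the count well. A Python sketch of our own:

```python
import math

# The first six numbers of the Riemann spectrum (from the list in this part):
theta = [14.134725, 21.022039, 25.010857, 30.424876, 32.935061, 37.586178]

def N_approx(T):
    # The smooth curve (T / 2*pi) * log(T / (2*pi*e)).
    return T / (2 * math.pi) * math.log(T / (2 * math.pi * math.e))

for T in (20, 30, 40):
    count = sum(1 for t in theta if t <= T)
    print(T, count, round(N_approx(T), 2))
# The curve tracks the staircase to within about 1 already at this scale.
```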
Figure 33.2: The staircase of the Riemann spectrum and the curve (T/2π) log(T/(2πe))
Figure 33.3: The staircase of the Riemann spectrum looks like a smooth curve
Nowadays, these mysterious numbers θi , these spectral lines for the staircase
of primes, are known in great abundance and to great accuracy. Here is the
smallest one, θ1 , given with over 1,000 digits of its decimal expansion:
14.134725141734693790457251983562470270784257115699243175685567460149
9634298092567649490103931715610127792029715487974367661426914698822545
8250536323944713778041338123720597054962195586586020055556672583601077
3700205410982661507542780517442591306254481978651072304938725629738321
5774203952157256748093321400349904680343462673144209203773854871413783
1735639699536542811307968053149168852906782082298049264338666734623320
0787587617920056048680543568014444246510655975686659032286865105448594
4432062407272703209427452221304874872092412385141835146054279015244783
3835425453344004487936806761697300819000731393854983736215013045167269
6838920039176285123212854220523969133425832275335164060169763527563758
9695376749203361272092599917304270756830879511844534891800863008264831
2516911271068291052375961797743181517071354531677549515382893784903647
4709727019948485532209253574357909226125247736595518016975233461213977
3160053541259267474557258778014726098308089786007125320875093959979666
60675378381214891908864977277554420656532052405
and if, by any chance, you wish to peruse the first 2,001,052 of these θi ’s calcu-
lated to an accuracy within 3 · 10−9 , consult Andrew Odlyzko’s tables:
http://www.dtc.umn.edu/~odlyzko/zeta_tables
Chapter 34

Further questions about the Riemann spectrum
Since people have already computed the first 10 trillion θ’s and have never
found one with multiplicity > 1, it is generally expected that the multiplicity of
all the θ’s in the Riemann spectrum is 1.
But, independent of that expectation, our convention in what follows will be
that we count each of the elements in the Riemann spectrum repeated as many
times as their multiplicity. So, if it so happens that θn occurs with multiplicity
two, we view the Riemann spectrum as being the series of numbers
θ1 , θ2 , . . . , θn−1 , θn , θn , θn+1 , . . .
It has been conjectured that there are no infinite arithmetic progressions among
these numbers. More broadly, one might expect that there is no visible correla-
tion between the θi ’s and translation, i.e., that the distribution of θi ’s modulo
any positive number T is random, as in Figure 34.1.
[Figure 34.1: histograms of the θi’s modulo a fixed positive number T.]
Figure 34.2: Frequency histogram of the first 99,999 gaps in the Riemann spec-
trum
exciting for our understanding of the Riemann Hypothesis, in view of what is known as
the Hilbert–Pólya Conjecture (see http://en.wikipedia.org/wiki/Hilbert%E2%80%93P%C3%B3lya_conjecture).
Chapter 35

From the Riemann spectrum to primes

We have seen how our finite trigonometric sums, for large cutoffs C, pinpoint the spectrum

θ1, θ2, θ3, . . .

(as discussed in the previous two chapters).
To start the return game, consider this sequence of trigonometric functions that
have (zero and) the θi as spectrum
GC(x) := 1 + Σ_{i<C} cos(θi · x).
The theoretical story behind the phenomena that we will see graphically in
this chapter is a manifestation of Riemann’s explicit formula. For modern text
references that discuss this general subject, see endnote [20].
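Even the first six spectral numbers begin to separate out the primes. Here is a small Python sketch of our own, following the sign convention of the figure captions (−Σ cos(θi log s)); the particular sample points are our choices:

```python
import math

# The first six numbers of the Riemann spectrum (from Chapter 32):
theta = [14.134725, 21.022039, 25.010857, 30.424876, 32.935061, 37.586178]

def G(s):
    # -sum_i cos(theta_i * log s): with only six spectral numbers the
    # peaks are crude, but they already favor the prime powers.
    return -sum(t and math.cos(t * math.log(s)) for t in theta)

for s in (2, 3, 4, 5, 6):
    print(s, round(G(s), 2))
# Already with six terms, G is larger at the primes 2, 3, 5 (and the
# prime power 4) than at the composite s = 6.
```

Using the first 1000 spectral numbers, as in Figure 35.1, sharpens these bumps into the tall spikes at the prime powers.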
Figure 35.1: Illustration of −Σ_{i=1}^{1000} cos(log(s)θi), where θ1 ≈ 14.13, . . . are the
first 1000 contributions to the Riemann spectrum. The red dots are at the prime
powers p^n, whose size is proportional to log(p).
Figure 35.2: Illustration of −Σ_{i=1}^{1000} cos(log(s)θi) in the neighborhood of a twin
prime. Notice how the two primes 29 and 31 are separated out by the Fourier
series, and how the prime powers 27 (= 3^3) and 25 also appear.
Figure 35.3: Fourier series from 1,000 to 1,030 using 15,000 of the numbers θi.
Note the twin primes 1,019 and 1,021 and that 1,024 = 2^10.
Part IV
Back to Riemann
Chapter 36
We have been dealing in Part III of our book with Φ(t), a distribution that—
we said—contains all the essential information about the placement of primes
among numbers. We have given a clean restatement of Riemann’s hypothesis,
the third restatement so far, in terms of this Φ(t). But Φ(t) was the effect of a
series of recalibrations and reconfigurings of the original untampered-with stair-
case of primes. A test of whether we have strayed from our original problem—to
understand this staircase—would be whether we can return to the original stair-
case, and “reconstruct it” so to speak, solely from the information of Φ(t)—or
equivalently, assuming the Riemann Hypothesis as formulated in Chapter 19—
can we construct the staircase of primes π(X) solely from knowledge of the
sequence of real numbers θ1 , θ2 , θ3 , . . . ?
The answer to this is yes (given the Riemann Hypothesis), and is discussed very
beautifully by Bernhard Riemann himself in his famous 1859 article.
Bernhard Riemann used the spectrum of the prime numbers to provide an exact
analytic formula that analyzes and/or synthesizes the staircase of primes. This
formula is motivated by Fourier’s analysis of functions as constituted out of
cosines. Recall from Chapter 13 that Gauss’s guess is Li(X) = ∫_2^X dt/log(t). To
continue this discussion, we do need some familiarity with complex numbers,
for the definition of Riemann’s exact formula requires extending the definition
of the function Li(X) to make sense for complex numbers X = a + bi. In fact,
more naturally, one might work with the path integral li(X) := ∫_0^X dt/log(t).
Riemann begins his discussion (see Figure 36.1) by defining

R(X) = Σ_{n=1}^{∞} (μ(n)/n) · li(X^{1/n}) = lim_{N→∞} R^{(N)}(X) := lim_{N→∞} Σ_{n=1}^{N} (μ(n)/n) · li(X^{1/n}),

where R^{(N)}(X) denotes the truncated sum, which one can compute as an ap-
proximation.
In all the discussion of this section the order of summation is important. For
such considerations and issues regarding actual computation we refer to Riesel-
Gohl (see http://wstein.org/rh/rg.pdf).
Here µ(n) is the Möbius function, which is defined by
\[ \mu(n) = \begin{cases} 1 & \text{if } n \text{ is a square-free positive integer with an even number of distinct prime factors,} \\ -1 & \text{if } n \text{ is a square-free positive integer with an odd number of distinct prime factors,} \\ 0 & \text{if } n \text{ is not square-free.} \end{cases} \]
Figure 36.2: The blue dots plot the values of the Möbius function µ(n), which
is only defined at integers.
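The definition of µ(n) translates directly into code. A minimal Python sketch (the function name is ours; SageMath ships an equivalent built-in, moebius):

```python
def mobius(n):
    """Möbius function: 1 / -1 for square-free n with an even / odd
    number of distinct prime factors, 0 if n is not square-free."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0          # p^2 divides n, so n is not square-free
            result = -result      # one more distinct prime factor
        else:
            p += 1
    if n > 1:
        result = -result          # the remaining cofactor is itself prime
    return result

print([mobius(n) for n in range(1, 11)])  # [1, -1, -1, 0, -1, 1, -1, 0, 0, 1]
```

These first ten values match the blue dots of Figure 36.2.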
Figure 36.3: Comparisons of Li(X) (top), π(X) (middle), and R(X) (bottom,
computed using 100 terms)
Figure 36.4: Closeup comparison of Li(X) (top), π(X) (middle), and R(X)
(bottom, computed using 100 terms)
and so on:
\[ R_3(X) = R_2(X) + C_3(X), \]
and in the limit this provides us with an exact fit.
The Riemann Hypothesis, if true, would tell us that these correction terms
C1 (X), C2 (X), C3 (X), . . . are all square-root small.
The elegance of Riemann’s treatment of this problem is that the corrective terms
Ck (X) are all modeled on the fundamental R(X) and are completely described
if you know the sequence of real numbers θ1 , θ2 , θ3 , . . . of the last section.
Assuming the Riemann Hypothesis, the Riemann correction terms Ck (X) are
defined to be
\[ C_k(X) = -R\!\left(X^{\frac{1}{2} + i\theta_k}\right) - R\!\left(X^{\frac{1}{2} - i\theta_k}\right), \]
where θ1 = 14.134725 . . . , θ2 = 21.022039 . . . , etc., is the spectrum of the prime
numbers [21].
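Although the correction terms Ck (X) require evaluating li at complex arguments, the fundamental R(X) itself is easy to approximate numerically. A minimal Python sketch, ours rather than the authors': li is approximated by trapezoidal integration from 2 plus the known constant li(2) ≈ 1.04516, and for simplicity terms with X^(1/n) < 2 are dropped:

```python
import math

def mobius(n):
    # Möbius function (see the definition earlier in this chapter)
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        else:
            p += 1
    return -result if n > 1 else result

LI_2 = 1.045163780117  # li(2), the principal-value integral from 0 to 2

def li(x, steps=4000):
    """li(x) for x >= 2, via trapezoidal integration of dt/log(t) on [2, x]."""
    h = (x - 2) / steps
    if h == 0:
        return LI_2
    total = 0.5 * (1 / math.log(2) + 1 / math.log(x))
    for i in range(1, steps):
        total += 1 / math.log(2 + i * h)
    return LI_2 + h * total

def R_truncated(X, N=10):
    """Truncated Riemann sum R^(N)(X); terms with X^(1/n) < 2 are dropped."""
    return sum(mobius(n) / n * li(X ** (1.0 / n))
               for n in range(1, N + 1) if X ** (1.0 / n) >= 2)

print(R_truncated(100))  # close to pi(100) = 25
```

Even with these crude truncations, R^(N)(100) already lands within about one unit of π(100) = 25, much closer than Li(100) ≈ 30.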
In sum, Riemann provided an extraordinary recipe that allows us to work out
the harmonics,
C1 (X), C2 (X), C3 (X), . . .
without our having to consult, or compute with, the actual staircase of primes.
As with Fourier’s modus operandi, where both the fundamental and all the
harmonics are modeled on the sine wave, but appropriately calibrated, Riemann
fashioned his higher harmonics by modeling them all on a single function,
namely R(X).
The convergence of Rk (X) to π(X) is strikingly illustrated in the plots in Fig-
ures 36.6–36.11 of Rk for various values of k.
Figure 36.7: The function R10 approximating the staircase of primes up to 100
Figure 36.8: The function R25 approximating the staircase of primes up to 100
Figure 36.9: The function R50 approximating the staircase of primes up to 100
Figure 36.10: The function R50 approximating the staircase of primes up to 500
Figure 36.11: The function Li(X) (top, green), the function R50 (X) (in blue),
and the staircase of primes on the interval from 350 to 400.
Chapter 37

As Riemann Envisioned It
• We will discuss a key idea that Leonhard Euler had (circa 1740).
We will say only a few words here about this, in hopes of giving at least a shred
of a hint of how marvelous Riemann’s idea is. We will be drawing, at this point,
on some further mathematical background. For readers who wish to pursue the
themes we discuss, here is a list of sources that are our favorites among those
meant to be read by a somewhat broader audience than people very advanced
in the subject. We list them in order of “required background.”
\[ S_k(N) = 1^k + 2^k + 3^k + \cdots + (N-1)^k, \]

\[ S_1(N) = 1 + 2 + 3 + \cdots + (N-1) = \frac{N(N-1)}{2} = \frac{N^2}{2} - \frac{1}{2}\cdot N, \]
\[ S_2(N) = 1^2 + 2^2 + 3^2 + \cdots + (N-1)^2 = \frac{N^3}{3} + \cdots + \frac{1}{6}\cdot N, \]
\[ S_3(N) = 1^3 + 2^3 + 3^3 + \cdots + (N-1)^3 = \frac{N^4}{4} + \cdots - 0\cdot N, \]
\[ S_4(N) = 1^4 + 2^4 + 3^4 + \cdots + (N-1)^4 = \frac{N^5}{5} + \cdots - \frac{1}{30}\cdot N, \]
etc. For odd integers k > 1 this linear term vanishes. For even integers 2k the
Bernoulli number B_{2k} is the rational number given by the coefficient of
\( \frac{x^{2k}}{(2k)!} \) in the power series expansion
\[ \frac{x}{e^x - 1} = 1 - \frac{x}{2} + \sum_{k=1}^{\infty} (-1)^{k+1} B_{2k}\, \frac{x^{2k}}{(2k)!}. \]
So
\[ B_2 = \frac{1}{6}, \quad B_4 = \frac{1}{30}, \quad B_6 = \frac{1}{42}, \quad B_8 = \frac{1}{30}, \]
and to convince you that the numerator of these numbers is not always 1, here
are a few more:
\[ B_{10} = \frac{5}{66}, \quad B_{12} = \frac{691}{2730}, \quad B_{14} = \frac{7}{6}. \]
Euler’s idea (circa 1740): Euler studied the sum of reciprocal s-th powers of all
whole numbers and discovered that it factors as a product over the primes:
\[ \sum_{n=1}^{\infty} \frac{1}{n^s} = \prod_{p\ \mathrm{prime}} \frac{1}{1 - p^{-s}}, \]
where the infinite sum on the left and the infinite product on the right both
converge (and are equal) if s > 1. He also evaluated these sums at even positive
integers, where—surprise—the Bernoulli numbers come in again; and they and
π combine to yield the values of the zeta function at even positive integers:
\[ \zeta(2) = \frac{1}{1^2} + \frac{1}{2^2} + \cdots = \pi^2/6 \simeq 1.65\ldots \]
\[ \zeta(4) = \frac{1}{1^4} + \frac{1}{2^4} + \cdots = \pi^4/90 \simeq 1.0823\ldots \]
and, in general,
\[ \zeta(2n) = \frac{1}{1^{2n}} + \frac{1}{2^{2n}} + \cdots = (-1)^{n+1} B_{2n}\, \pi^{2n} \cdot \frac{2^{2n-1}}{(2n)!}. \]
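These evaluations can be checked numerically. The sketch below (ours) computes Bernoulli numbers exactly from the standard recurrence — in the usual signed convention, where B4 = −1/30; the values displayed above are their absolute values — and compares ζ(2) and ζ(4) against π²/6 and π⁴/90:

```python
import math
from fractions import Fraction

def bernoulli(m):
    """Signed Bernoulli numbers B_0..B_m via the recurrence
    sum_{j=0}^{k} C(k+1, j) B_j = 0  (so B_0 = 1, B_1 = -1/2, ...)."""
    B = [Fraction(1)]
    for k in range(1, m + 1):
        s = sum(Fraction(math.comb(k + 1, j)) * B[j] for j in range(k))
        B.append(-s / (k + 1))
    return B

B = bernoulli(14)
print([abs(B[2 * k]) for k in range(1, 8)])  # the unsigned values listed above

def zeta_even(n, B):
    # zeta(2n) = (-1)^(n+1) B_2n pi^(2n) 2^(2n-1) / (2n)!, signed B_2n
    return (-1) ** (n + 1) * float(B[2 * n]) * math.pi ** (2 * n) \
           * 2 ** (2 * n - 1) / math.factorial(2 * n)

print(zeta_even(1, B), math.pi ** 2 / 6)   # both about 1.6449
print(zeta_even(2, B), math.pi ** 4 / 90)  # both about 1.0823
```

The exact-arithmetic Fractions reproduce 691/2730 for B₁₂, confirming that the numerators are not always 1.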
A side note to Euler’s formulas comes from the fact (only known much later)
that no power of π is rational: do you see how to use this to give a proof that
there are infinitely many primes, combining Euler’s infinite product expansion
displayed above with the formula for ζ(2), or with the formula for ζ(4), or, in
fact, with the formula for ζ(2n) for any n you choose?
Pafnuty Lvovich Chebyshev’s idea (circa 1845): The second moment in the
history of the evolution of this function ζ(s) is when Chebyshev used the same
formula as above in the extended range where s is allowed now to be a real
variable—not just an integer—greater than 1. Making use of this extension of
the range of definition of Euler’s sum of reciprocals of powers of consecutive
whole numbers, Chebyshev could prove that for large x the ratio of π(x) and
x/ log(x) is bounded above and below by two explicitly given constants. He
also proved that there exists a prime number in the interval bounded by n
and 2n for any positive integer n (this was called Bertrand’s postulate; see
http://en.wikipedia.org/wiki/Proof_of_Bertrand%27s_postulate).
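Chebyshev’s Bertrand-postulate theorem is easy to spot-check by brute force for small n (a sketch of ours, not a proof):

```python
def is_prime(m):
    """Trial-division primality test, adequate for small m."""
    if m < 2:
        return False
    f = 2
    while f * f <= m:
        if m % f == 0:
            return False
        f += 1
    return True

def bertrand_holds(n):
    """Is there a prime p with n < p <= 2n?"""
    return any(is_prime(p) for p in range(n + 1, 2 * n + 1))

print(all(bertrand_holds(n) for n in range(1, 1001)))  # True
```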
Riemann’s idea (1859): It is in the third step of the evolution of ζ(s) that
something quite surprising happens. Riemann extended the range of Chebyshev’s
sum of reciprocals of positive real powers of consecutive whole numbers,
allowing the argument s to range over the entire complex plane (avoiding
s = 1). Now this is a more mysterious extension of Euler’s function, and it is
deeper in two ways:

• The formula
\[ \zeta(s) := \frac{1}{1^s} + \frac{1}{2^s} + \frac{1}{3^s} + \cdots \]
does converge when the real part of the exponent s is greater than 1
(i.e., this allows us to use the same formula, as Chebyshev had done, for
the right half plane in the complex plane determined by the condition
s = x + iy with x > 1, but not beyond this). You can’t simply use the
same formula for the extension.
• So you must face the fact that if you wish to “extend” a function beyond
the natural range in which its defining formula makes sense, there may be
many ways of doing it.
Riemann found a natural way to extend Euler’s function to the entire complex
plane except for the point s = 1, thereby defining what we now call Riemann’s
zeta function.
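To make “extension beyond the original range” concrete in at least a half-plane: a standard trick — not Riemann’s full continuation, and not from the text — expresses ζ(s) through the alternating series η(s) = Σ (−1)^(n−1) n^(−s), which converges for Re(s) > 0, via ζ(s) = η(s)/(1 − 2^(1−s)). A hedged Python sketch:

```python
def zeta_via_eta(s, terms=200000):
    """Approximate zeta(s) for Re(s) > 0, s != 1, from the alternating
    (eta) series: zeta(s) = [sum_{n>=1} (-1)^(n-1) n^(-s)] / (1 - 2^(1-s)).
    Plain partial summation: fine for real s, slow and rough off the real line."""
    eta = sum((-1) ** (n - 1) * n ** (-s) for n in range(1, terms + 1))
    return eta / (1 - 2 ** (1 - s))

print(zeta_via_eta(2))    # about pi^2/6 = 1.6449...
print(zeta_via_eta(0.5))  # about -1.4604: a value outside Euler's original range
```

That ζ(1/2) comes out as a perfectly ordinary negative real number, even though the defining series diverges there, is exactly the phenomenon of extension discussed above.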
Those ubiquitous Bernoulli numbers reappear yet again as values of this extended
zeta function at negative integers:
\[ \zeta(-n) = -\frac{B_{n+1}}{n+1} \qquad (n = 1, 2, 3, \ldots) \]
(with the Bernoulli numbers taken here with their signs);
so since the Bernoulli numbers indexed by odd integers > 1 all vanish, the
extended zeta function ζ(s) actually vanishes at all even negative integers.
The negative even integers −2, −4, −6, . . . are often called the trivial zeroes of the
Riemann zeta function. There are indeed other zeroes of the zeta function, and
those other zeroes could—in no way—be dubbed “trivial,” as we shall shortly
see.
and we can do this term-by-term, since the real part of s is > 1. Then
taking the derivative gives us:
\[ \frac{d\zeta}{ds}(s)\Big/\zeta(s) = -\sum_{n=1}^{\infty} \Lambda(n)\, n^{-s}, \]
where
\[ \Lambda(n) := \begin{cases} \log(p) & \text{when } n = p^k \text{ for } p \text{ a prime number and } k > 0\text{, and} \\ 0 & \text{if } n \text{ is not a power of a prime number.} \end{cases} \]
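This identity can be sanity-checked numerically at a sample point such as s = 2. The sieve below and the truncation at N = 100,000 are our own choices; the truncation error is roughly 1/N:

```python
import math

def mangoldt_table(N):
    """Lambda(n) for n = 0..N: log p if n = p^k, else 0 (von Mangoldt)."""
    spf = list(range(N + 1))            # smallest-prime-factor sieve
    for p in range(2, int(N ** 0.5) + 1):
        if spf[p] == p:
            for m in range(p * p, N + 1, p):
                if spf[m] == m:
                    spf[m] = p
    lam = [0.0] * (N + 1)
    for n in range(2, N + 1):
        p, m = spf[n], n
        while m % p == 0:
            m //= p
        if m == 1:                      # n is a prime power p^k
            lam[n] = math.log(p)
    return lam

N = 100000
lam = mangoldt_table(N)
s = 2.0
# -zeta'(s)/zeta(s) computed directly from truncated series for zeta and zeta'
lhs = sum(math.log(n) * n ** (-s) for n in range(2, N + 1)) \
      / sum(n ** (-s) for n in range(1, N + 1))
# ... and from the von Mangoldt identity
rhs = sum(lam[n] * n ** (-s) for n in range(2, N + 1))
print(lhs, rhs)  # both about 0.570
```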
2. You know lots about an analytic function if you know its zeroes
and poles. For example for polynomials, or even rational functions: if
someone told you that a certain rational function f (s) vanishes to order
1 at 0 and at ∞, and that it has a double pole at s = 2 and at all other
points has finite nonzero values, then you can immediately say that this
mystery function is a nonzero constant times s/(s − 2)2 .
Knowing the zeroes and poles (in the complex plane) alone of the Riemann
zeta function doesn’t entirely pin it down—you have to know more about
its behavior at infinity since—for example, multiplying a function by ez
doesn’t change the structure of its zeroes and poles in the finite plane.
But a complete understanding of the zeroes and poles of ζ(s) will give all
the information you need to pin down the placement of primes among all
numbers.
So here is the score:
In the above quotation, Riemann’s roots are the θi ’s and the statement that
they are “real” is equivalent to RH.
The zeta function, then, is the vise that so elegantly clamps together
information about the placement of primes and their spectrum!
That a simple geometric property of these zeroes (lying on a line!) is directly
equivalent to such profound (and more difficult to express) regularities among
prime numbers suggests that these zeroes and the parade of Riemann’s correc-
tions governed by them—when we truly comprehend their message—may have
lots more to teach us, may eventually allow us a more powerful understanding of
arithmetic. This infinite collection of complex numbers, i.e., the nontrivial ze-
roes of the Riemann zeta function, plays a role with respect to π(X) rather like
the role the spectrum of the Hydrogen atom plays in Fourier’s theory. Are the
primes themselves no more than an epiphenomenon, behind which there lies,
still veiled from us, a yet-to-be-discovered, yet-to-be-hypothesized, profound
conceptual key to their perplexing orneriness? Are the many innocently posed,
yet unanswered, phenomenological questions about numbers—such as the
ones listed earlier—waiting for our discovery of this deeper level of arithmetic?
Or for layers deeper still? Are we, in fact, just at the beginning?
These are not completely idle thoughts, for a tantalizing analogy relates the
number theory we have been discussing to an already established branch of
mathematics—due, largely, to the work of Alexander Grothendieck, and Pierre
Deligne—where the corresponding analogue of Riemann’s hypothesis has indeed
been proved. . . .
Chapter 38

Companions to the Zeta Function
Our book, so far, has been exclusively about Riemann’s ζ(s) and its zeroes. We
have been discussing how (the placement of) the zeroes of ζ(s) in the complex
plane contains the information needed to understand (the placement of) the
primes in the set of all whole numbers; and conversely.
It would be wrong—we think—if we didn’t even mention that ζ(s) fits into a
broad family of similar functions that connect to other problems in number
theory.
For example—instead of the ordinary integers—consider the Gaussian integers.
This is the collection of numbers
\[ \{a + bi\} \]
where i = \(\sqrt{-1}\) and a, b are ordinary integers. We can add and multiply two
such numbers and get another of the same form. The only “units” among the
Gaussian integers (i.e., numbers whose inverse is again a Gaussian integer) are
the four numbers ±1, ±i and if we multiply any Gaussian integer a + bi by any
of these four units, we get the collection {a + bi, −a − bi, −b + ai, b − ai}. We
measure the size of a Gaussian integer by the square of its distance to the origin,
i.e.,
|a + bi|2 = a2 + b2 .
This size function is called the norm of the Gaussian integer a + bi and can also
be thought of as the product of a + bi and its “conjugate” a − bi. Note that the
norm is a nice multiplicative function on the set of Gaussian integers, in that
the norm of a product of two Gaussian integers is the product of the norms of
each of them.
We have a natural notion of prime Gaussian integer, i.e., one with a > 0
and b ≥ 0 that cannot be factored as the product of two Gaussian integers of
smaller size. Given that every nonzero Gaussian integer is uniquely expressible
as a unit times a product of prime Gaussian integers, can you prove that if a
Gaussian integer is a prime Gaussian integer, then its size must either be an
ordinary prime number, or the square of an ordinary prime number?
Figure 38.1 contains a plot of the first few Gaussian primes as they display
themselves amongst complex numbers:
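For experimentation, the classical characterization (standard, though not stated in the text) says that a + bi, not both zero, is a Gaussian prime exactly when either its norm a² + b² is an ordinary prime, or one of a, b is zero and the other is, up to sign, an ordinary prime ≡ 3 (mod 4). A sketch of ours, which also checks the norm claim from the question above:

```python
import math

def is_prime(m):
    if m < 2:
        return False
    f = 2
    while f * f <= m:
        if m % f == 0:
            return False
        f += 1
    return True

def is_gaussian_prime(a, b):
    """Classical criterion for a + bi being a Gaussian prime."""
    if a == 0:
        return is_prime(abs(b)) and abs(b) % 4 == 3
    if b == 0:
        return is_prime(abs(a)) and abs(a) % 4 == 3
    return is_prime(a * a + b * b)

# Every Gaussian prime has norm p or p^2 for some ordinary prime p:
for a in range(20):
    for b in range(20):
        if (a or b) and is_gaussian_prime(a, b):
            norm = a * a + b * b
            r = math.isqrt(norm)
            assert is_prime(norm) or (r * r == norm and is_prime(r))
print("norm check passed")
```

Note that 2 is not a Gaussian prime (it factors as −i(1 + i)²), nor is 5 = (2 + i)(2 − i), while 3 remains prime — consistent with the criterion.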
The natural question to ask, then, is: how are the Gaussian prime numbers
distributed? Can one provide as close an estimate to their distribution and
structure as one has for ordinary primes? The answer here is yes: there is a
companion theory, with an analogue to the Riemann zeta function playing a role
similar to the prototype ζ(s). And it seems as if its “nontrivial zeroes” behave
similarly: as far as things have been computed, they all have the property that
their real part is equal to 1/2. That is, we have a companion to the Riemann
Hypothesis.
This is just the beginning of a much larger story related to what has
come to be called the “Grand Riemann Hypotheses” and connects to analogous
problems, some of them actually solved, that give some measure of evidence for
the truth of these hypotheses. For example, for any system of polynomials in
a fixed number of variables (with integer coefficients, say) and for each prime
number p there are “zeta-type” functions that contain all the information needed
to count the number of simultaneous solutions in finite fields of characteristic
p. That such counts can be well-approximated with a neatly small error term
is related to the placement of the zeroes of these “zeta-type” functions. There
is then an analogous “Riemann Hypothesis” that prescribes precise conditions
on the real parts of their zeroes—this prescription being called the “Riemann
Hypothesis for function fields.” Now the beauty of this analogous hypothesis is
that it has, in fact, been proved!
Is this yet another reason to believe the Grand Riemann Hypothesis?
Endnotes
N := 3452690329392158031464109281736967404068448156842396721012
9920642145194459192569415445652760676623601087497272415557
0842527652727868776362959519620872735612200601036506871681
124610986596878180738901486527
A good exercise is to try to prove this, and a one-word hint that might lead
you to one of the many proofs of it is induction.
Now we are going to use this as a criterion, by—in effect—restating it in
what logicians would call its contrapositive:
Theorem 39.2 (The Fermat a-test). If M is a positive integer, and a is
any number relatively prime to M such that aM −1 − 1 is not divisible by
M , then M is not a prime.
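In Python the a-test is a one-liner thanks to built-in modular exponentiation (three-argument pow); the function name and example numbers are ours. The case 341 = 11 · 31 shows the converse fails: the base-2 test does not prove 341 composite even though it is (341 is a base-2 Fermat “pseudoprime”).

```python
def fermat_test_composite(M, a=2):
    """Return True if the Fermat a-test proves M composite, i.e. if
    a^(M-1) - 1 is not divisible by M (assumes gcd(a, M) = 1)."""
    return pow(a, M - 1, M) != 1

print(fermat_test_composite(15))   # True: 15 is provably composite
print(fermat_test_composite(341))  # False: the base-2 test is fooled by 341
print(fermat_test_composite(97))   # False: consistent with 97 being prime
```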
3334581100595953025153969739282790317394606677381970645616725285996925
6610000568292727335792620957159782739813115005451450864072425835484898
565112763692970799269335402819507605691622173717318335512037457.
• The first linear algebra run to complete was the one with CADO-
NFS, thus we decided to stop the other run.
Bill Hart
[2] Given an integer n, there are many algorithms available for trying to write
n as a product of prime numbers. First we can apply trial division, where
we simply divide n by each prime 2, 3, 5, 7, 11, 13, . . . in turn, and see what
small prime factors we find (up to a few digits). After using this method
to eliminate as many primes as we have patience to eliminate, we typi-
cally next turn to a technique called Lenstra’s elliptic curve method, which
allows us to check n for divisibility by bigger primes (e.g., around 10–15
digits). Once we’ve exhausted our patience using the elliptic curve method,
we would next hit our number with something called the quadratic sieve,
which works well for factoring numbers of the form n = pq, with p and
q primes of roughly equal size, and n having less than 100 digits (say,
though the 100 depends greatly on the implementation). All of the above
algorithms—and then some—are implemented in SageMath, and used by
default when you type factor(n) into SageMath. Try typing factor(some
number, verbose=8) to see for yourself.
If the quadratic sieve fails, a final recourse is to run the number field
sieve algorithm, possibly on a supercomputer. To give a sense of how
powerful (or powerless, depending on perspective!) the number field
sieve is, a record-setting factorization of a general number using this al-
gorithm is the factorization of a 232 digit number called RSA-768 (see
https://eprint.iacr.org/2010/006.pdf):
n = 12301866845301177551304949583849627207728535695953347921973224521
517264005072636575187452021997864693899564749427740638459251925573263
034537315482685079170261221429134616704292143116022212404792747377940
80665351419597459856902143413
which factors as pq, where
p = 334780716989568987860441698482126908177047949837137685689124313889
82883793878002287614711652531743087737814467999489
and
q = 367460436667995904282446337996279526322791581643430876426760322838
15739666511279233373417143396810270092798736308917.
We encourage you to try to factor n in SageMath, and see that it fails.
SageMath does not yet include an implementation of the number field sieve
algorithm, though there are some free implementations currently available
(see http://www.boo.net/~jasonp/qs.html).
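The trial-division stage described above can be sketched as follows (a minimal version of ours; SageMath’s factor is far more sophisticated):

```python
def trial_division(n, bound=10000):
    """Strip prime factors up to `bound` from n.
    Returns (list of small prime factors, remaining cofactor)."""
    factors = []
    d = 2
    while d <= bound and d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1 if d == 2 else 2   # after 2, test only odd candidates
    if 1 < n and d * d > n:
        factors.append(n)         # the surviving cofactor is itself prime
        n = 1
    return factors, n

print(trial_division(48432))  # ([2, 2, 2, 2, 3, 1009], 1)
```

A cofactor greater than 1 in the returned pair is what you would hand on to the heavier methods (elliptic curve method, quadratic sieve, number field sieve) described above.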
[3] We can use SageMath (at http://sagemath.org) to quickly compute the
“hefty number” p = 2^43,112,609 − 1. Simply type p = 2^43112609 - 1 to
instantly compute p. In what sense have we computed p? Internally, p is
now stored in base 2 in the computer’s memory; given the special form
of p it is not surprising that it took little time to compute. Much more
challenging is to compute all the base 10 digits of p, which takes a few
seconds: d = str(p). Now type d[-50:] to see the last 50 digits of p. To
compute the sum 58416637 of the digits of p type sum(p.digits()).
[4] In contrast to the situation with factorization, testing integers of this size
(e.g., the primes p and q) for primality is relatively easy. There are fast
algorithms that can tell whether or not any random thousand digit number is
prime in a fraction of a second. Try for yourself using the SageMath command
is_prime. For example, if p and q are as in endnote 2, then is_prime(p)
and is_prime(q) quickly output True and is_prime(p*q) outputs False.
However, if you type factor(p*q, verbose=8) you can watch as Sage-
Math tries forever and fails to factor pq.
[5] In Sage, the function prime_range enumerates primes in a range.
For example, prime_range(50) outputs the primes up to 50 and
prime_range(50,100) outputs the primes between 50 and 100. Typing
prime_range(10^8) in SageMath enumerates the primes up to a hundred
million in around a second. You can also enumerate primes up to a bil-
lion by typing v=prime_range(10^9), but this will use a large amount of
memory, so be careful not to crash your computer if you try this. You can
see that there are π(10^9) = 50,847,534 primes up to a billion by then typ-
ing len(v). You can also compute π(10^9) directly, without enumerating
all primes, using the command prime_pi(10^9). This is much faster since
it uses some clever counting tricks to find the number of primes without
actually listing them all.
In Chapter 19 we tinkered with the staircase of primes by first counting
both primes and prime powers. There are comparatively few prime powers
that are not prime. Up to 10^8, only 1,405 of the 5,762,860 prime powers
are not themselves primes. To see this, first enter a = prime_pi(10^8);
pp = len(prime_powers(10^8)). Typing (a, pp, pp-a) then outputs
the triple (5761455, 5762860, 1405).
[6] Hardy and Littlewood give a nice conjectural answer to such questions
about gaps between primes. See Problem A8 of Guy’s book Unsolved Prob-
146 ENDNOTES
lems in Number Theory (2004). Note that Guy’s book discusses counting
the number Pk (X) of pairs of primes up to X that differ by a fixed even
number k; we have Pk (X) ≥ Gapk (X), since for Pk (X) there is no require-
ment that the pairs of primes be consecutive.
[7] If f (x) and g(x) are real-valued functions of a real variable x such that
for any ε > 0 both of them take their values between x^{1−ε} and x^{1+ε} for x
sufficiently large, then say that f (x) and g(x) are good approximations
of one another if, for any positive ε the absolute value of their difference
is less than x^{1/2+ε} for x sufficiently large. The functions Li(X) and R(X)
are good approximations of one another.
[8] This computation of π(X) was done by David J. Platt in 2012, and is the
largest value of π(X) ever computed. See http://arxiv.org/abs/1203.
5712 for more details.
[10] For a proof of this here’s a hint. Compute the difference between the deriva-
tives of Li(x) and of x/ log x. The answer is 1/ log2 (x). So you must show
that the ratio of \( \int_2^X dx/\log^2(x) \) to \( \mathrm{Li}(X) = \int_2^X dx/\log(x) \) tends to zero as X
goes to infinity, and this is a good calculus exercise.
[12] We have
\[ \psi(X) = \sum_{p^n \le X} \log p \]
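Computed directly (an inefficient sketch of ours, fine for small X): ψ(10) = 3 log 2 + 2 log 3 + log 5 + log 7 ≈ 7.83.

```python
import math

def psi(X):
    """Chebyshev's function: sum of log p over all prime powers p^n <= X."""
    def is_prime(m):
        if m < 2:
            return False
        f = 2
        while f * f <= m:
            if m % f == 0:
                return False
            f += 1
        return True
    total = 0.0
    for p in range(2, int(X) + 1):
        if is_prime(p):
            pk = p
            while pk <= X:        # count each power p, p^2, p^3, ... up to X
                total += math.log(p)
                pk *= p
    return total

print(psi(10))   # about 7.83
print(psi(100))  # about 94.0; note psi(X) is approximately X
```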
[14] http://en.wikipedia.org/wiki/Distribution_%28mathematics%29
contains a more formal definition and treatment of distributions. Here is
Schwartz’s explanation for his choice of the word distribution:
[15] David Mumford suggested that we offer the following paragraph from
http://en.wikipedia.org/wiki/Dirac_delta_function
on the Dirac delta function:
[16] As discussed in
http://en.wikipedia.org/wiki/Distribution_%28mathematics%29,
“generalized functions” were introduced by Sergei Sobolev in the 1930s,
then later independently introduced in the late 1940s by Laurent Schwartz,
who developed a comprehensive theory of distributions.
[17] If the Riemann Hypothesis holds they are precisely the imaginary parts of
the “nontrivial” zeroes of the Riemann zeta function.
where Ψ(t) = ψ(et ) (see Figure 39.2), and Ψ0 is the derivative of Ψ(t),
viewed as a distribution. We extend this function to all real arguments t
by requiring Φ(t) to be an even function of t, i.e., Φ(−t) = Φ(t). But, to
review this at a more leisurely pace,
1. Define Ψ(t) := ψ(e^t ), so that ψ(X) = Ψ(log(X)). Our distorted staircase
has risers at (0 and) all positive integral multiples of logs of prime numbers.
2. Now we’ll do something that might seem a bit more brutal: take the
derivative of this distorted staircase Ψ(t). This derivative Ψ0 (t) is a
generalized function with support at all nonnegative integral multiples
of logs of prime numbers.
Figure 39.1: Ψ0 (t) is a (weighted) sum of Dirac delta functions at the logarithms
of prime powers pn weighted by log(p) (and by log(2π) at 0). The taller the
arrow, the larger the weight.
In summary: The generalized function that resulted from the above car-
pentry:
Φ(t) = e−t/2 Ψ0 (t),
[19] A version of the Riemann–von Mangoldt explicit formula gives some theo-
retical affirmation of the phenomena we are seeing here. We thank Andrew
Granville for a discussion about this.
Even though the present endnote is not the place to give anything like a
full account, we can’t resist setting down a few of Granville’s comments
that might be helpful to people who wish to go further. (This discussion
can be modified to analyze what happens unconditionally, but we will be
assuming the Riemann Hypothesis below.) The function Φ̂≤C (θ) that we
are graphing in this chapter can be written as:
\[ \hat{\Phi}_{\le C}(\theta) = \sum_{n \le C} \Lambda(n)\, n^{-w}, \]
where w = 1/2 + iθ. This function, in turn, may be written (by Perron’s
formula) as
\[ \frac{1}{2\pi i} \lim_{T\to\infty} \int_{s=\sigma_0 - iT}^{s=\sigma_0 + iT} \sum_n \Lambda(n)\, n^{-w} \left(\frac{C}{n}\right)^{\!s} \frac{ds}{s}
= \frac{1}{2\pi i} \lim_{T\to\infty} \int_{s=\sigma_0 - iT}^{s=\sigma_0 + iT} \sum_n \Lambda(n)\, n^{-w-s}\, C^s\, \frac{ds}{s}
= -\frac{1}{2\pi i} \lim_{T\to\infty} \int_{s=\sigma_0 - iT}^{s=\sigma_0 + iT} \frac{\zeta'}{\zeta}(w+s)\, \frac{C^s}{s}\, ds. \]
The poles of the integrand occur at
\[ s = 0, \quad 1 - w, \quad \text{and } \rho - w, \]
for every zero ρ of ζ(s). We distinguish five cases, giving descriptive names
to each:
1. Singular pole: s = 1 − w.
2. Trivial poles: s = ρ − w with ρ a trivial zero of ζ(s).
3. Oscillatory poles: s = ρ − w = i(γ − θ) ≠ 0 with ρ = 1/2 + iγ (≠ w) a
nontrivial zero of ζ(s). (Recall that we are assuming the Riemann Hy-
pothesis, and our variable w = 1/2 + iθ runs through complex numbers
of real part equal to 1/2. So, in this case, s is purely imaginary.)
4. Elementary pole: s = 0 when w is not a nontrivial zero of ζ(s)—i.e.,
when 0 = s ≠ ρ − w for any nontrivial zero ρ.
5. Double pole: s = 0 when w is a nontrivial zero of ζ(s)—i.e., when
0 = s = ρ − w for some nontrivial zero ρ. This, when it occurs, is
indeed a double pole, and the residue is given by m · log C plus a
constant depending on ρ, but not on C. Here m is the multiplicity
of the zero ρ (which we expect always—or at least usually—to be
equal to 1).
The standard technique for the “Explicit formula” will provide us with a
formula for our function of interest Φ̂≤C (θ). The formula has terms result-
ing from the residues of each of the first three types of poles, and of the
\[ (1)\qquad \hat{\Phi}_{\le C}(\theta) = \mathrm{Sing}_{\le C}(\theta) + \mathrm{Triv}_{\le C}(\theta) + \mathrm{Osc}_{\le C}(\theta) + \mathrm{Elem}_{\le C}(\theta) \]
or:
\[ (2)\qquad \hat{\Phi}_{\le C}(\theta) = \mathrm{Sing}_{\le C}(\theta) + \mathrm{Triv}_{\le C}(\theta) + \mathrm{Osc}_{\le C}(\theta) + \mathrm{Double}_{\le C}(\theta), \]
the first if w is not a nontrivial zero of ζ(s) and the second if it is.
The good news is that the functions Sing≤C (θ), Triv≤C (θ) (and also
Elem≤C (θ) when it exists) are smooth (easily describable) functions of the
two variables C and θ; for us, this means that they are not that related to
the essential information-laden discontinuous structure of Φ̂≤C (θ). Let us
bunch these three contributions together and call the sum Smooth(C, θ),
and rewrite the above two formulae as:
Or:
Here if a zero has multiplicity m, then we count it m times in the sum. Also,
in this formula we have relegated the “θ” to the status of a subscript (i.e.,
w = 12 + iθ) since we are keeping it constant, and we view the two variables
X and C as linked in that we want the cutoff “X” to be sufficiently large,
say X ≫ C², so that the error term can be controlled.
At this point, we can perform a “multiplicative version” of a Cesàro
summation—i.e., the operator \( F(c) \mapsto (\mathrm{Ces}\,F)(C) := \int_1^C F(c)\,dc/c \). This
has the effect of forcing the oscillatory term to be bounded as C tends to
infinity.
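To see the smoothing effect numerically: for an oscillatory term like F(c) = cos(θ log c), the operator gives (Ces F)(C) = ∫₁^C cos(θ log c) dc/c = sin(θ log C)/θ, which stays bounded by 1/θ no matter how large C gets. A sketch of ours:

```python
import math

def ces(F, C, steps=20000):
    """Multiplicative Cesaro operator (Ces F)(C) = integral_1^C F(c) dc/c,
    computed via the substitution u = log c (trapezoidal rule on [0, log C])."""
    U = math.log(C)
    h = U / steps
    vals = [F(math.exp(i * h)) for i in range(steps + 1)]
    return h * (0.5 * vals[0] + sum(vals[1:-1]) + 0.5 * vals[-1])

theta = 5.0
F = lambda c: math.cos(theta * math.log(c))
for C in (10.0, 1000.0, 100000.0):
    print(C, ces(F, C))  # stays within 1/theta = 0.2 of zero for every C
```

The raw integrand oscillates forever, but its multiplicative average never escapes the band of width 2/θ — the boundedness claimed above.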
This implies that for any fixed θ,
• CesΦ̂≤C (θ) is bounded independent of C if θ is not the imaginary part
of a nontrivial zero of ζ(s) and
\[ \sum_{\rho} \hat{\phi}(\rho) = -\sum_{n \ge 1} \Lambda(n)\,\phi(n) + I(\phi), \]
where
\[ \sum_{|\theta| \le C} \frac{x^{\frac{1}{2} + i\theta} - 1}{\frac{1}{2} + i\theta}\, x^{-1}. \]
[21] You may well ask how we propose to order these correction terms if RH is
false. Order them in terms of (the absolute value of) their imaginary part,
and in the unlikely situation that there is more than one zero with the
same imaginary part, order zeroes of the same imaginary part by their real
parts, going from right to left.