It's About Time - Elementary Mathematical Aspects of
ROGER COOKE
IT’S ABOUT TIME
ELEMENTARY MATHEMATICAL ASPECTS OF RELATIVITY
Copying and reprinting. Individual readers of this publication, and nonprofit libraries acting
for them, are permitted to make fair use of the material, such as to copy select pages for use
in teaching or research. Permission is granted to quote brief passages from this publication in
reviews, provided the customary acknowledgment of the source is given.
Republication, systematic copying, or multiple reproduction of any material in this publication
is permitted only under license from the American Mathematical Society. Permissions to reuse
portions of AMS publication content are handled by Copyright Clearance Center’s RightsLink
service. For more information, please visit: http://www.ams.org/rightslink.
Send requests for translation rights and licensed reprints to [email protected].
Excluded from these provisions is material for which the author holds copyright. In such cases,
requests for permission to reuse or reprint material should be addressed directly to the author(s).
Copyright ownership is indicated on the copyright page, or on the lower right-hand corner of the
first page of each article within proceedings volumes.
© 2017 by the author. All rights reserved.
Printed in the United States of America.
∞ The paper used in this book is acid-free and falls within the guidelines
established to ensure permanence and durability.
Visit the AMS home page at http://www.ams.org/
10 9 8 7 6 5 4 3 2 1 22 21 20 19 18 17
Contents
Preface vii
Pedagogical Aims ix
Humanistic Aims xii
Special Features of This Book xiii
Other Works on the Subject xvi
Background Necessary to Read This Book xvii
Plan of the Work xviii
Acknowledgments xix
2. Chronology 336
3. Space and Time 350
4. The Reality of Physical Concepts 361
5. The Harmony Between Mathematics and the Physical World 366
6. Knowledge of Hypothetical Objects: An Example 376
7. Knowledge of the Physical World 380
8. A Few Words from the Discoverers 384
9. Epilogue: The Reception of Relativity 386
Bibliography 389
The present book has grown from what was originally the very small project of
writing an article highlighting three mutually unrelated areas of special and general
relativity, namely the twin paradox (§ 6 of Chapter 1), the relativistic Maxwell
equations (Chapter 3), and the precession of the orbit of Mercury (§ 7 of Chapter 4).
As the writing proceeded, I realized that, while I was keeping the mathematical
details to a minimum and omitting all the historical context of the physics, this
material would still not be accessible to the audience I had in mind, consisting
of people with mid-level undergraduate preparation in mathematics. One would
still need to know at least the rudiments of calculus of variations and differential
geometry. One thing led to another, and the filling in of those details required
two years of work and expanded this work to its present Brobdingnagian size of
some 400 pages, plus two additional volumes (posted online) of ancillary material.
My vision of the core of the work (Volume 1) remains: it is intended to be a
random set of commentaries on certain aspects of relativity. This book is neither a
technical introduction to relativity, nor a systematic history of its development, nor
yet a professional-quality examination of philosophical issues. The physicists and
historians who vetted it for publication pointed out to me a number of areas where
my ignorance and brashness led me to make throwaway comments that describe
research already performed or in progress. As I shall keep reminding the reader, I
am not a specialist in any of these areas; if I make some suggestions that expose
my innocence, that is not the worst fate that can befall an author.
I have had two purposes in mind in writing the present work, one pedagogical,
the other humanistic. The pedagogical purpose is to present some highlights of
the special and general theories of relativity with full mathematical details in a
form accessible to advanced undergraduate mathematics, science, and engineering
majors (and, of course, any interested person who knows a little university-level
1 Hubble’s paper does not mention the earlier work of Lemaître; as Hubble was not careless
about citation, one must assume he had not heard of it at the time he published his own work.
Pedagogical Aims
One of the hardest questions asked by students studying high-school and college-
level algebra is, “What is algebra good for?” They know perfectly well that people
never need to solve even quadratic equations in everyday life, and no one is ever
faced with the problem of computing where two trains setting out in opposite
directions from Chicago and New York at different times will meet. One can tell
them that algebra is the language in which science is written, but the unfortunate
truth is that science seldom needs to solve only algebraic equations; much more
often, the problems involve differential equations. It is true that one needs to know
algebra very well in order to understand differential equations, but that explanation
is generally lost on people just beginning the study of the subject. It needs to be
pointed out that algebra plays a vital role in the discovery of scientific laws. Here
are two examples:
• The rule that “distance = speed × time” at constant speed has the same
form as the rule that, for a rectangle of constant width and variable length,
“area = width × length”. It is easy to see geometrically—as the scholars at
Merton College, Oxford, seem to have reasoned in the fourteenth century—
that if the width (w) happens to be directly proportional to the length (l),
that is, w = kl for a constant k, the graph of this relation provides a family
of right triangles with legs varying in proportion to each other, and areas A
given by the relation we nowadays write as the formula $A = \frac{1}{2}kl^2$. By the
analogy that exists between the two relations just cited, the distance (s)
fallen when speed is directly proportional to time (t) ought to be $s = \frac{1}{2}gt^2$,
where g is the constant of proportionality between speed v and time
t (v = gt).2 Assuming that g is constant near the surface of the Earth, it
is the gravitational acceleration, which by observation is 9.81 meters per
second-squared.
To extend this example, consider the case of a particle moving around
a circle of radius r at uniform speed v. Since it is not moving in a straight
line, it has some acceleration, always directed toward the center of the
circle. It is intuitively obvious that the magnitude of this acceleration is
constant. What is its value? In the absence of any force, the particle
would fly off the circle along a tangent line at speed v, and after time t, it
would be on a circle of radius $\sqrt{r^2 + (vt)^2}$. In order to stay on the circle,
it must “fall” a distance3
$$s = \sqrt{r^2 + (vt)^2} - r = r\left(\sqrt{1 + (vt/r)^2} - 1\right).$$
Over a small interval of time t, the standard “differential” arguments
of calculus show that the right-hand side of this equation is closely ap-
proximated by the expression
$$r \cdot \frac{1}{2}\,\frac{(vt)^2}{r^2} = \frac{1}{2}\,\frac{v^2}{r}\,t^2.$$
The relative error in this approximation tends to zero as t tends to 0.
Since it is the instantaneous acceleration we are interested in (essentially,
the relation that results when t = 0), by comparing this expression with
the Merton rule, we see that the factor g in the relation $s = \frac{1}{2}gt^2$ corre-
sponds to $v^2/r$, which is therefore the magnitude of the acceleration. This
is what Newton in the quotation above called “the force [with] which a
globe revolving within a sphere presses the surface of the sphere”.
To extend the example still more, given the formula v 2 /r for the
acceleration of a body in motion at speed v around a circle of radius
r, Kepler’s third law implies, as Newton noted, an inverse square law
of gravitational attraction.4 The implication also goes in the opposite direction.
2 The analogy between the geometric and mechanical formulas here was explicitly noted and
illustrated by the fourteenth-century Bishop of Lisieux Nicole d’Oresme (1323–1382), who laid the
groundwork for the analytic geometry of Fermat and Descartes.
3 Only the radial distance fallen is involved here, since the acceleration has no tangential
component.
4 From Newton’s own words, quoted above, this appears to be the reason he believed in an
inverse-square law. One can easily imagine other considerations that point to the same conclusion.
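$$T = k\,r^{3/2} \qquad \text{(Kepler's third law; } T = \text{period, } r = \text{radius of the orbit)},$$
$$v = \frac{2\pi r}{T} = \frac{2\pi}{k}\,r^{-1/2} \qquad \text{(speed = distance/time)},$$
$$\frac{v^2}{r} = \left(\frac{2\pi}{k}\right)^2 r^{-2} \qquad \text{(the inverse-square law)}.$$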
Finally, to reap the harvest of this simple mathematics, consider, as
Newton claimed to have done, the orbit of the Moon. Its sidereal period
T is approximately 27⅓ days,5 which amounts to $2.361 \times 10^6$ seconds.
The radius r of the Moon’s orbit (approximating it by a circle) is
$3.85 \times 10^8$ meters. By the formula given above, its acceleration $v^2/r$ is
$4\pi^2 r/T^2$, which amounts to 0.002725 meters per second-squared. Since
the radius r is almost exactly 60 times the radius of the Earth, this ought
to be about equal to the acceleration of gravity at the Earth’s surface
(9.81 meters per second-squared) divided by $60^2 = 3600$. And indeed,
9.81/3600 = 0.002725. As Newton said, the two figures “answer pretty
nearly”! (Newton wrote these words half a century after the alleged events
they describe. Historians do not believe Newton had the law of universal
gravitation in 1665. If he had possessed it at that time, the computa-
tion he claimed to have made would not have worked, since he believed
one degree of a great circle on the Earth’s surface was about 60 miles.
In fact, it is about 70 miles (111 km). He really did make the compu-
tation later, with accurate figures on the size of the Earth provided in a
posthumously published work of Jean-Félix Picard (1620–1682).) This as-
tonishingly close agreement with observation is an awe-inspiring example
of the power of simple mathematics to reveal the mysteries of the universe.
To summarize, in three brief, elementary arguments using simple ge-
ometry, simple algebra, and a tiny bit of calculus, we have set up a plau-
sible physical law that can be applied and tested with astronomical ob-
servations and have performed such a test. Our proposed law has passed
the test of observation amazingly well.
That is the kind of connection I enjoy making and will be making in
these pages. As a mathematician, I am most interested in the contribution
that symbolic algebra makes in the process just described. It was the
analogy between the relations “length × width = area” and “speed ×
time = distance”, expressed by two symbolic equations of exactly the
For example, if the total “gravitational attraction” remains constant as it spreads out from the
center of attraction, its “density”, which is the attraction on a particle at a given point at distance
r, would be “diluted” in proportion to the area of the sphere of that radius, which is to say, in
proportion to the square of the distance.
5 Because the Earth moves about 26 degrees around the Sun during that sidereal period, the
Moon must revolve about 390 degrees from one full moon to the next. Hence the synodic period
of the Moon is about 29½ days.
same algebraic form, that led us (not the Merton scholars) to the Merton
rule, and ultimately to the law of falling bodies. It was the algebraic
similarity of the formula connecting the distance a heavy body falls in
time t with the distance a body in circular motion falls toward the center
of the circle in time t that produced the law of acceleration for uniform
circular motion.
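Readers who want to check the arithmetic in this chain of arguments by machine can do so in a few lines. The following is a minimal Python sketch, not taken from the book (the author's own computations were done in Mathematica). The function names merton_distance, radial_fall, and centripetal_from_period are hypothetical and chosen only for illustration, as is the Kepler constant k = 3. The sketch approximates the area under the speed-time graph v = gt by a midpoint sum, compares the exact radial fall with the approximation ½(v²/r)t², checks that Kepler's third law makes (v²/r)·r² independent of r, and repeats the Moon computation with the figures quoted above.

```python
import math

def merton_distance(g: float, t: float, steps: int = 100_000) -> float:
    """Distance covered when speed grows as v = g*t: a midpoint Riemann sum
    of the area under the speed-time graph (the Merton 'triangle')."""
    dt = t / steps
    return sum(g * (k + 0.5) * dt for k in range(steps)) * dt

def radial_fall(r: float, v: float, t: float) -> float:
    """Exact fall sqrt(r^2 + (v t)^2) - r, rewritten in an algebraically
    equivalent form that avoids cancellation when v*t is small."""
    vt = v * t
    return vt * vt / (math.sqrt(r * r + vt * vt) + r)

def centripetal_from_period(r: float, period: float) -> float:
    """v^2 / r with v = 2*pi*r / T, that is, 4*pi^2*r / T^2."""
    return 4 * math.pi ** 2 * r / period ** 2

# Merton rule: the sum reproduces (1/2) g t^2.
print(merton_distance(9.81, 2.0), 0.5 * 9.81 * 2.0 ** 2)

# With r = v = 1, the ratio of the exact fall to (1/2)(v^2/r) t^2 tends to 1 as t -> 0.
for t in (0.1, 0.01, 0.001):
    print(t, radial_fall(1.0, 1.0, t) / (0.5 * t * t))

# Kepler's third law T = k r^(3/2), k arbitrary: (v^2/r) * r^2 is the same for every r.
k = 3.0
print([centripetal_from_period(r, k * r ** 1.5) * r ** 2 for r in (1.0, 2.0, 5.0)])

# Moon test with the figures quoted in the text.
T_moon = (27 + 1 / 3) * 86_400   # sidereal period, seconds
r_moon = 3.85e8                  # orbital radius, meters
print(centripetal_from_period(r_moon, T_moon), 9.81 / 3600)
```

The rewriting of $\sqrt{r^2+(vt)^2} - r$ as $(vt)^2/(\sqrt{r^2+(vt)^2}+r)$ is our own choice, made to avoid floating-point cancellation; the two expressions are algebraically identical. The final line prints two numbers that agree to about four significant figures.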
My second example shows how algebra enables us to reason about the geometry
of multi-dimensional spaces too complex to visualize.
• It will be shown in the discussion of curvature in Chapters 5 and 6 that the
intuitive geometric idea of projecting the derivative of a tangent vector to
a surface in three-dimensional real space R3 into the tangent plane of the
surface leads to the Christoffel symbols, for which an explicit algebraic
formula can be given. This algebraic formula can be trivially extrapo-
lated to any finite number of variables by the simple algebraic process of
extending the range of summation on the indices. In that way, the whole
panoply of covariant derivatives, parallel transport, and curvature can be
defined in a completely abstract way, without the need for any embedding
in an absolute Euclidean space. That is a crucial point, since space-time
is the only universe we know, and we have no direct evidence that it is
embedded in a space of higher dimension. The geometric, intuitive origin
of the Christoffel symbols lies in the familiar territory of surfaces in three-
dimensional space. But, by a simple algebraic generalization, replacing 3
by n, we get Christoffel symbols that describe the curvature of manifolds
of any dimension.
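The remark that the formula extends to n variables simply by changing the range of summation can be seen concretely in code. Below is a minimal Python sketch under our own assumptions. It uses the common convention $\Gamma^k_{ij} = \frac{1}{2}\sum_l g^{kl}(\partial_i g_{jl} + \partial_j g_{il} - \partial_l g_{ij})$, whose notation may differ from that of Chapters 5 and 6. It also approximates the partial derivatives of the metric by central differences. The helper names invert, christoffel, and sphere are hypothetical. The dimension never appears explicitly: it is just the length of the coordinate list.

```python
import math
from typing import Callable, List

Matrix = List[List[float]]

def invert(m: Matrix) -> Matrix:
    """Gauss-Jordan inversion with partial pivoting (adequate for small metrics)."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [val / p for val in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]

def christoffel(g: Callable[[List[float]], Matrix], x: List[float], h: float = 1e-5):
    """Gamma[k][i][j] = 1/2 * sum_l g^{kl} (d_i g_{jl} + d_j g_{il} - d_l g_{ij}).

    Nothing here depends on the dimension: n is simply len(x).
    Metric derivatives are approximated by central differences.
    """
    n = len(x)
    dg = []  # dg[m][i][j] approximates the partial derivative of g_{ij} in x^m
    for m in range(n):
        xp, xm = list(x), list(x)
        xp[m] += h
        xm[m] -= h
        gp, gm = g(xp), g(xm)
        dg.append([[(gp[i][j] - gm[i][j]) / (2 * h) for j in range(n)] for i in range(n)])
    ginv = invert(g(x))
    return [[[0.5 * sum(ginv[k][l] * (dg[i][j][l] + dg[j][i][l] - dg[l][i][j])
                        for l in range(n))
              for j in range(n)] for i in range(n)] for k in range(n)]

# Unit sphere in coordinates (theta, phi): g = diag(1, sin^2 theta).
def sphere(x):
    return [[1.0, 0.0], [0.0, math.sin(x[0]) ** 2]]

G = christoffel(sphere, [0.7, 0.3])
print(G[0][1][1], -math.sin(0.7) * math.cos(0.7))  # Gamma^theta_{phi phi}
print(G[1][0][1], math.cos(0.7) / math.sin(0.7))   # Gamma^phi_{theta phi}

# The same function, unchanged, applied to a flat metric in four variables.
flat = christoffel(lambda x: [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)],
                   [0.1, 0.2, 0.3, 0.4])
print(max(abs(val) for plane in flat for row in plane for val in row))
```

For the unit sphere the printed pairs agree, since $\Gamma^\theta_{\phi\phi} = -\sin\theta\cos\theta$ and $\Gamma^\phi_{\theta\phi} = \cot\theta$. In the flat case every symbol vanishes. A symbolic system would give exact formulas; finite differences were chosen here only to keep the sketch free of third-party libraries.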
Humanistic Aims
The special and general theories of relativity are milestones in the progress of
theoretical and experimental physics in the late nineteenth and early twentieth cen-
turies. At the same time, these theories have produced a profound paradigm shift
in the modes of thought by which scientists order the universe. They were necessary
on purely theoretical grounds due to asymmetries in the equations of electromag-
netism and on experimental grounds due to the failure of attempts to detect the
hypothetical “luminiferous ether” in which light was believed to be an elastic wave.
In that respect, they resemble the paradigm shift in chemistry in the eighteenth
century, which coincided with the failure of the hypothetical “phlogiston” to show
itself to experiment and its replacement by a quantitative theory of oxidation.
After presenting the mathematical skeleton of the theories in Chapters 1–7,
I take some time in the final chapter to explore these issues and reflect on the
evolution of physics all the way from Aristotle through Einstein. In this chapter,
not being a specialist in philosophy, I am more concerned to raise questions for
people who may not have thought of them than to propose answers that professional
philosophers of science may have thought of already and perhaps even rejected as
inadequate.
In order to fulfill my humanistic aims, I have sought breadth and generality
rather than profundity and detail. But to avoid sacrificing my pedagogical goals, I
have not sought ultimate generality. The whole book reflects the tension between
these two goals. For example, a minimal exposition of special relativity is given in
the first two chapters. That is pedagogically necessary for what follows, but to keep
the student’s imagination active, I replace some of the standard material on special
relativity in the first three chapters with topics that I happen to find interesting.
These sections are marked with asterisks to indicate that they can be omitted
without loss of continuity. They are the kinds of questions that mathematicians
ask—Is the composition of two Lorentz transformations a Lorentz transformation?
If so, how can I get its matrix in standard form? Given that this standard form
is obtained by “sandwiching” the actual coordinate transformation between two
rotations on the spatial portions of the observers’ space-times, can we be sure this
composition is associative? Can relativistic velocities be made into a group?—
and those who are not fascinated by such questions would be well advised to skip
these starred sections. I do think, however, that the deduction of the Maxwell
curl equations from the divergence equations in Chapter 3 by use of the relativistic
transformation of electric and magnetic fields between observers will be interesting
to nonmathematicians.
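The first of the mathematician's questions listed above can be explored numerically. The following Python sketch is our own illustration, not the book's treatment. It builds pure boost matrices in units with c = 1 using the standard textbook formula; the sign convention and the names boost, is_lorentz, and is_symmetric are assumptions. It then composes a boost along x with a boost along y. The product still preserves t² − x² − y² − z², but it is not symmetric. Since every pure boost matrix is symmetric, the product cannot be a pure boost. The asymmetry reflects the spatial rotation, commonly called the Wigner or Thomas rotation, that any standard form of the product must absorb.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]
ETA = [[1.0, 0, 0, 0], [0, -1.0, 0, 0], [0, 0, -1.0, 0], [0, 0, 0, -1.0]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]

def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]

def boost(beta: Sequence[float]) -> Matrix:
    """Pure boost (c = 1) for a nonzero velocity beta = (bx, by, bz), |beta| < 1,
    acting on column vectors (t, x, y, z)."""
    b2 = sum(b * b for b in beta)
    gamma = 1.0 / math.sqrt(1.0 - b2)
    m = [[gamma] + [-gamma * b for b in beta]]
    for i in range(3):
        row = [-gamma * beta[i]]
        for j in range(3):
            delta = 1.0 if i == j else 0.0
            row.append(delta + (gamma - 1.0) * beta[i] * beta[j] / b2)
        m.append(row)
    return m

def is_lorentz(m: Matrix, tol: float = 1e-12) -> bool:
    """True if M^T eta M = eta, i.e. M preserves t^2 - x^2 - y^2 - z^2."""
    lhs = matmul(matmul(transpose(m), ETA), m)
    return all(abs(lhs[i][j] - ETA[i][j]) < tol for i in range(4) for j in range(4))

def is_symmetric(m: Matrix, tol: float = 1e-12) -> bool:
    return all(abs(m[i][j] - m[j][i]) < tol for i in range(4) for j in range(4))

bx = boost((0.6, 0.0, 0.0))
by = boost((0.0, 0.6, 0.0))
composite = matmul(by, bx)
print(is_lorentz(composite))    # the composition is again a Lorentz transformation
print(is_symmetric(composite))  # but it is not a pure boost
```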
word for speed, celeritas. In his 1905 paper on special relativity, Einstein used the symbol V.
http://www.maths.ed.ac.uk/~aar/papers/wigner.pdf
material particle can move faster than light relative to that “personal ab-
solute” space. (Just to be clear, two particles in the space can move faster
than light relative to each other, as judged by an observer at rest in the
framework; but neither of them will judge the relative speed to be larger
than c.) It is the entangling of the time axis with the spatial axis along
the common line of motion when two observers reconcile their coordinates
that leads to the bizarre, yet real, phenomena of length contraction and time
dilation. A discussion of this transformation provides a healthy caution
on the use of vector operations to avoid coordinates. In special relativity
theory, vector notation is applicable to the space used by each observer,
but is not transferrable between two observers unless a particular set of
coordinates is used, since the Lorentz transformation does not preserve dot
and cross products on the spatial portion of space-time. If Y has velocity
u relative to X, and X has velocity −u relative to Y —that is, Y assigns
to the velocity of X coordinates that are the negatives of those that X
assigns to the velocity of Y —while Z has velocity v relative to Y , vector
notation can be used to compute the velocity w that Z has relative to
X. But unless u and v are parallel—that is, when expressed in terms
of the coordinates used by Y , each is a scalar multiple of the other—this
computation cannot be reversed to give the velocity that X has relative
to Z through the simple replacement of u by −v and v by −u and com-
putation of the dot products contained in the formula. When X and Y
use arbitrary orthonormal coordinate systems, Y will not generally assign
to the velocity of X the coordinates that are the negatives of those that
X assigns to Y . Consequently, these velocities cannot be passed back and
forth between different observers using the familiar dot and cross products
that each individual observer can use for his own purposes. The observers
do not have a common space that they can talk about. Vector notation
remains useful for recording physical relations, but can be passed from
one observer to another only when they both agree to use their common
line of motion as the first spatial axis. In other words, one is forced to
derive physical relations coordinate-wise, just as Einstein did in 1905, or
else adopt a completely new approach to the subject. As this book aims
to be elementary, we choose the first of these options.
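The asymmetry described in this paragraph can be seen with the standard formula for composing relativistic velocities. The sketch below is an assumption-laden illustration rather than the book's derivation. It works in units with c = 1 and uses the common textbook composition $u \oplus v = \bigl(u + v/\gamma_u + \tfrac{\gamma_u}{1+\gamma_u}(u\cdot v)\,u\bigr)/(1 + u\cdot v)$ with $\gamma_u = 1/\sqrt{1 - u\cdot u}$. The names add_velocities and neg are hypothetical. The sketch computes w = u ⊕ v for perpendicular u and v, performs the naive reversal (−v) ⊕ (−u) by substitution, and compares the result with −w.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def add_velocities(u, v):
    """Relativistic composition u (+) v in units with c = 1 (textbook form):
    (u + v/gamma_u + gamma_u/(1 + gamma_u) * (u.v) u) / (1 + u.v)."""
    gu = 1.0 / math.sqrt(1.0 - dot(u, u))
    uv = dot(u, v)
    coeff = gu / (1.0 + gu) * uv
    return tuple((ui + vi / gu + coeff * ui) / (1.0 + uv) for ui, vi in zip(u, v))

def neg(a):
    return tuple(-x for x in a)

# Perpendicular velocities: the naive reversal disagrees with -w.
u = (0.6, 0.0, 0.0)
v = (0.0, 0.6, 0.0)
w = add_velocities(u, v)               # forward computation
back = add_velocities(neg(v), neg(u))  # naive "reversal" by substitution
print(w, back, neg(w))
print(math.hypot(*w), math.hypot(*back))  # the lengths do agree

# Parallel velocities: the naive reversal is exactly -w.
p = add_velocities((0.6, 0.0, 0.0), (0.5, 0.0, 0.0))
q = add_velocities((-0.5, 0.0, 0.0), (-0.6, 0.0, 0.0))
print(p, q)
```

For perpendicular velocities, the naive reversal and −w have the same length but point in different directions. For parallel velocities they coincide, matching the exception noted in the text. The sketch does not settle which observer's coordinates each vector is expressed in, and that is precisely the subtlety the paragraph warns about.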
considered is the Gödel metric. I do explore certain special questions that are more
of interest to mathematicians than to physicists.
Outside of Zee’s book, perhaps the closest in its pedagogical aim of explaining
relativity theory in plain language accessible to undergraduates is the book of the
late Richard L. Faber [27]. Besides the works of Narlikar, Dray, and Sternberg
mentioned above—and, no doubt, dozens of others unknown to me—the subject
matter of this book is discussed in the book of Torretti [81]. Like Zee’s book, Tor-
retti’s erudite book is many times more complete and detailed than the present one
and contains, as well, detailed historical information on the evolution of Einstein’s
thought in the years 1913–1915. My pedagogical aim differs from what I perceive
to be Torretti’s, which appears to be aimed at specialists in the philosophy of sci-
ence. My goal is to simplify selected parts of the theory of relativity, to make them
accessible to a person who has studied only the basic three semesters of calculus
and two semesters of linear algebra.
Among older works, Bertrand Russell devoted considerable space to an expo-
sition of relativity theory, for example, in [72]. As often happens with such books,
however, he explained the mechanics of manipulating the formulas without giving
much information on the grounds for accepting them as laws of nature. His in-
tended audience also appears to be people well versed in the philosophy of science,
and he explores only philosophical issues. He is definitely not writing a textbook
for undergraduates.
Given the sheer quantity of writings on this subject—many dozens of works
that one can find in almost any library and which are not mentioned here—and
my consequent unfamiliarity with most of them, I would not venture to say that
anything I have written here is appearing for the first time. Like many other
mathematicians, I have too often found that my ideas have also occurred to others,
sometimes much earlier. For any passage in this book containing no citation of a
source, there are two possible explanations: (1) the ideas contained in the passage
are well known and can be found in many standard sources; (2) I have never found
the ideas in any source, but have thought them up on my own. In neither case
should the absence of a citation be interpreted as a claim of originality on my part.
7 This is the name that used to be given to what is nowadays more frequently called elementary
real analysis. We actually do not use much of it outside the appendices in Volume 2. Still, it
will be helpful if the reader knows the implicit function theorem, Stokes’s theorem, the divergence
theorem, and the standard criteria for term-wise differentiation and (Riemann) integration of a
uniformly convergent sequence.
a way to avoid digressions in the main narrative, I took the opportunity to include
among them some topics that I find irresistible. Examples are (1) Euler’s wonderful
result that a particle moving along a surface but free of tangential acceleration will
describe a geodesic on the surface (Appendix 2); (2) Jacobi’s elegant last-multiplier
principle (Appendix 5), which shows that a system of n ordinary first-order linear
differential equations with algebraic coefficients whose divergence vanishes has so-
lutions that can be expressed as quotients of theta functions, provided one can find
n − 1 independent algebraic integrals (functions not identically constant that are
constant on the trajectories of a solution); and (3) the whole subject of point-set
topology (Appendix 3), which I include as a way for the reader to acquire some
practice in visualizing completely abstract spaces, along with the basic facts of the
theory, which are used in both real analysis and differential geometry.
Volume 3 contains all the Mathematica notebooks that I used to lighten the
labor of some of the more complex computations in the book and suggested answers
to the exercises in the first two volumes. Eleven Mathematica notebooks are refer-
enced in the first volume and one more in the second volume. They are collected as
a unit at the beginning of Volume 3. For the convenience of the reader, Volumes 2
and 3 and all twelve Mathematica notebooks can be downloaded from my website
at the University of Vermont:
http://www.cems.uvm.edu/~rlcooke/RELATIVITY/
Acknowledgments
I am grateful to Stephen Wolfram for inventing Mathematica and thereby
putting enormous computing and graphing power and accuracy in computation
into the hands of people of limited patience, who would otherwise face the dreary
prospect of spending days or weeks floundering in an attempt to carry out a com-
putation that Mathematica enables us to do flawlessly in a fraction of a second.
That it will also render a beautiful perspective drawing of the graph of a function
of two variables is a delightful bonus.
I wish to thank the anonymous reviewers who vetted the manuscript for pub-
lication and made some extremely valuable suggestions, pointing out places where
my writing was misleading or revealed too much of my innocence of the full history
of this subject. Their advice to tone down the occasional references to controversial
political issues was also sound, and I have heeded it.
Special thanks go to my lifelong friend Charles Gillard, with whom I shared
philosophical reflections as a teenager when we were both delivering newspapers,
at whose wedding I met my wife, and who, now that we are both retired, has been
kind enough to read the entire manuscript of the first volume and send me detailed
comments, which I have taken into account in the proofreading.
Very special thanks go to my wife Catherine, who endured my getting up at
5:00 AM every morning for two years to work on this project.
Roger Cooke
August 2016
Part 1
I have seen many books which have objected to [Euclid’s fifth postu-
late], among the earlier ones Heron and Autocus (Autolycus), and
the later ones Al-Khazen, Al-Sheni, Al-Neyrizi, etc. None has
given a proof. Then I have seen the book of Ibn Haytham, God
bless his soul, called the solution of doubt in Chapter One. This
postulate among other things was accepted without proof. There
are many other things which are foreign to this field, such as:
If a straight line segment moves so that it remains perpendicular
to a given line, and one end of it remains on the given line, then
the other end of it draws a parallel.
There are many things wrong here. How could a proof be based on
this idea? How could geometry and motion be connected?
Omar Khayyám (1048–1131). See [1], p. 277.
at a given time (the presence or absence of a given particle, for example) depends on
who is observing. Perpendicularity of two lines requires that a certain configuration
among three variable points remain constant at all times. One of the points, say
Q, is the moving point of intersection of the two lines; the second point, say P , is
fixed on the moving line; and the third, say R, lies on the fixed line. As we are
about to discuss, when special relativity is admitted, it is impossible to state in
an observer-independent way what the relative positions of three particles are at a
given instant of time. It follows from the relativistic equations of transformation
between observers that the temporal order of two events occurring in different
locations may depend on the observer, and it is easy to see that lines regarded as
mutually perpendicular by one observer are normally not mutually perpendicular as
measured by a second observer. In terms of spatial coordinates, this disagreement
comes about because of the FitzGerald–Lorentz contraction,2 whereby the length
one observer ascribes to a line in the direction of the other observer is found to
be shortened by the factor $\sqrt{1 - u^2/c^2}$ when measured by the other. Here u is
the mutual speed with which the two are moving relative to each other, and c is
the speed of light, approximately 300,000 km/sec. This inconsistency of spatial
measurement, in turn, arises from the fact that the observers disagree as to which
of two events occurred earlier: They are unable to agree as to where one end of the
line is in relation to the other at a given instant of time, since simultaneity of two
events occurring in different places is not observer-independent. The details of this
phenomenon will be explored below. We are now going to flesh out these abstract
ideas with a scenario taken from the modern world.
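The disagreement about perpendicularity can be made concrete with a small computation. This Python sketch assumes two rods at rest relative to the first observer. The second observer moves at speed u along the first spatial axis, so only components along that axis are multiplied by $\sqrt{1 - u^2/c^2}$. The value c = 299,792.458 km/sec is the exact defined speed of light (the text rounds it to 300,000). The function names contraction_factor and angle_seen_by_mover are hypothetical.

```python
import math

C_KM_PER_S = 299_792.458  # exact defined value of c, in km/sec

def contraction_factor(u: float, c: float = C_KM_PER_S) -> float:
    """FitzGerald-Lorentz factor sqrt(1 - u^2/c^2)."""
    return math.sqrt(1.0 - (u / c) ** 2)

def angle_seen_by_mover(a, b, u: float, c: float = C_KM_PER_S) -> float:
    """Angle in degrees between rods with planar directions a and b, at rest for
    the first observer, as measured by an observer moving at speed u along x.
    Only x-components shrink; y-components are unchanged."""
    f = contraction_factor(u, c)
    a2 = (a[0] * f, a[1])
    b2 = (b[0] * f, b[1])
    cos_angle = (a2[0] * b2[0] + a2[1] * b2[1]) / (math.hypot(*a2) * math.hypot(*b2))
    return math.degrees(math.acos(cos_angle))

u = 0.6 * C_KM_PER_S
print(contraction_factor(u))                    # factor at 0.6c
print(angle_seen_by_mover((1, 0), (0, 1), u))   # rods along and across the motion
print(angle_seen_by_mover((1, 1), (1, -1), u))  # rods at 45 degrees to the motion
```

At 0.6c the factor is 0.8. The rods lying along and across the line of motion still measure 90 degrees apart. The pair at 45 degrees to the motion measures roughly 102.7 degrees. So lines that one observer regards as perpendicular are in general not perpendicular for the other.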
1.1. The car wash puzzle. Imagine a very long limousine parked in a car
wash, the limousine being exactly as long as the car wash. It would just fit inside,
and the attendant would be able to close the doors at both ends of the car wash.
Now imagine that same limousine driving through the car wash (at any speed you
like, but imagine it to be a very high speed). Is there an instant of time when
the limousine is entirely inside the car wash? The limousine driver will say no:
Due to the FitzGerald–Lorentz contraction, the car wash shrank in length, and the
limousine wouldn’t fit inside. The car wash attendant will say yes: Due to the
FitzGerald–Lorentz contraction, the limousine shrank in length and fitted inside
with room to spare. Who is right here?
The explanation involves the observer-dependence of the concept of simultane-
ity. We are looking at two events here that take place in different locations. One
event is the rear end of the limousine entering the car wash. The other is the front
end of the limousine leaving the car wash. Those events occur in the opposite order
for the two observers. While they share the same four-dimensional space-time and
agree about the four-dimensional “proper-time” interval between the two events—
taking one of the events to have occurred at time zero and at the origin of the
spatial coordinates in both systems, while the other occurred at time t and at a
point (x, y, z), that interval is $\sqrt{t^2 - (x^2 + y^2 + z^2)/c^2}$ and is the same for both of
them—they do not agree about the scales on either the line in space along the
common direction of motion or on the time axis. In other words, t and at least one
of the spatial coordinates, say x, are different for the two observers even though the
four-dimensional interval is the same for both of them. The answer to the question
2 Named after George Francis FitzGerald (1851–1901) and Hendrik Antoon Lorentz (1853–
1928).
is that the question does not make sense. The individual time order of two events A
and B will be the same for all observers only if a ray of light could set out from the
location of one of them, say A, at the time A occurred and arrive at the location
of event B before B occurs there. That is to say, the four-dimensional interval
between A and B is positive—“timelike,” as physicists refer to it. Because this
four-dimensional interval is the same for all observers, any two will agree whether
this is the case or not.
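The reversal of order in the car wash can be checked with the Lorentz transformation itself. The following Python sketch works in units with c = 1 and uses illustrative values of our own choosing: a rest length L = 1 for both limousine and car wash, and a speed u = 0.8. The function names lorentz and interval2 are hypothetical. The sketch places event A (the rear enters) at the common origin. It then finds event B (the front leaves) in the attendant's frame, using the contracted limousine length L/γ. Finally it transforms both events into the driver's frame.

```python
import math

def lorentz(t: float, x: float, u: float):
    """Coordinates (t', x') assigned by an observer moving at speed u along x,
    in units with c = 1, with the origins coinciding at t = x = 0."""
    gamma = 1.0 / math.sqrt(1.0 - u * u)
    return gamma * (t - u * x), gamma * (x - u * t)

def interval2(t: float, x: float) -> float:
    """Squared interval t^2 - x^2 between the origin event and (t, x), c = 1."""
    return t * t - x * x

L = 1.0   # rest length of both car wash and limousine (illustrative)
u = 0.8   # limousine speed as a fraction of c (illustrative)
gamma = 1.0 / math.sqrt(1.0 - u * u)

# Attendant's frame: event A (rear enters) at t = 0, x = 0. The limousine is
# contracted to L / gamma, so its front reaches the exit x = L at time tB.
tA, xA = 0.0, 0.0
tB, xB = (L - L / gamma) / u, L

tA2, xA2 = lorentz(tA, xA, u)
tB2, xB2 = lorentz(tB, xB, u)
print("attendant:", tA, tB)    # A before B
print("driver:   ", tA2, tB2)  # B before A
print(interval2(tB, xB), interval2(tB2, xB2))  # equal and negative: spacelike
```

With these numbers, the attendant finds B at t = 0.5, after A. The driver finds B at t′ = −0.5, before A. Both compute the same negative squared interval, −0.75, so the interval is spacelike, as the argument above requires.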
Another way of saying that the interval between events A and B is timelike
is to say that an observer could physically be present at both events A and B.
The proper time interval3 between the two events (for any observer) is obtained
by parameterizing a path from A to B, say $r \mapsto \bigl(t(r), x(r), y(r), z(r)\bigr)$, where A
corresponds to $r_0$ and B to $r_1$, and then integrating the infinitesimal proper time
interval ds, which is given by
$$ds^2 = dt^2 - \frac{1}{c^2}\,dx^2 - \frac{1}{c^2}\,dy^2 - \frac{1}{c^2}\,dz^2.$$
Thus we find that the proper time interval is
$$\Delta s = \int_{r_0}^{r_1} ds = \int_{r_0}^{r_1} \sqrt{t'(r)^2 - \left(\frac{x'(r)}{c}\right)^2 - \left(\frac{y'(r)}{c}\right)^2 - \left(\frac{z'(r)}{c}\right)^2}\;dr.$$
If an observer is present at all the events $\bigl(t(r), x(r), y(r), z(r)\bigr)$, then in that
observer’s coordinate system, $x(r) = y(r) = z(r) = 0$ at time $t(r)$ for all r. Hence
$x'(r) = y'(r) = z'(r) = 0$ for that observer, and (assuming B is later than A for
that observer) $t'(r) \ge 0$. Therefore the time interval recorded by that observer is
$$\int_{r_0}^{r_1} \sqrt{t'(r)^2}\;dr = \int_{r_0}^{r_1} t'(r)\,dr = t(r_1) - t(r_0).$$
In short, the proper time interval between two events is the time interval recorded
by an observer who was present at both. More generally, if an observer assigns the
same spatial coordinates to two events, then the proper time interval between them
is just the time difference recorded by that observer.
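The proper-time integral lends itself to a direct numerical check. This minimal Python sketch rests on our own assumptions: units with c = 1, a midpoint rule for the integral, central differences for t′, x′, y′, z′, and the hypothetical names proper_time, steady, round_trip, and at_rest. It evaluates Δs for a path given as a function r ↦ (t, x, y, z). It assumes the path is timelike, so the quantity under the square root stays nonnegative.

```python
import math

def proper_time(path, r0: float, r1: float, c: float = 1.0,
                steps: int = 10_000, h: float = 1e-6) -> float:
    """Approximate the integral of ds along r -> (t, x, y, z) from r0 to r1.

    Midpoint rule in r; derivatives t', x', y', z' by central differences.
    Assumes the path is timelike, so the radicand is nonnegative.
    """
    dr = (r1 - r0) / steps
    total = 0.0
    for k in range(steps):
        r = r0 + (k + 0.5) * dr
        plus, minus = path(r + h), path(r - h)
        dt, dx, dy, dz = ((p - m) / (2 * h) for p, m in zip(plus, minus))
        total += math.sqrt(dt ** 2 - (dx / c) ** 2 - (dy / c) ** 2 - (dz / c) ** 2) * dr
    return total

def steady(r):
    # constant speed 0.6 along x
    return (r, 0.6 * r, 0.0, 0.0)

def round_trip(r):
    # out at speed 0.6 until t = 5, then back at the same speed
    return (r, 0.6 * (5.0 - abs(r - 5.0)), 0.0, 0.0)

def at_rest(r):
    # no spatial motion; coordinate time runs at twice the parameter rate
    return (2.0 * r, 0.0, 0.0, 0.0)

print(proper_time(steady, 0.0, 10.0))
print(proper_time(round_trip, 0.0, 10.0))
print(proper_time(at_rest, 0.0, 5.0), at_rest(5.0)[0] - at_rest(0.0)[0])
```

A path moving at constant speed 0.6 for ten units of coordinate time gives Δs ≈ 8. The out-and-back path of the twin paradox gives the same value. A path with no spatial motion returns exactly its elapsed coordinate time, in line with the statement that proper time is what an observer present at both events records.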
If there were any observer who perceived B as occurring before A, then para-
doxes might result if the interval is timelike—that is, a ray of light could leave
the location of A at the time A occurred and arrive at the location of B before B
occurred (which is equivalent to saying that $\Delta s^2 > 0$). A second observer at the
site of event B could transmit information about event A before event B occurs
and thus the observer who perceives B as occurring first could get historical infor-
mation about event A before it occurred. That observer would be “remembering
the future.”
If the space-time interval between A and B is negative (it is “spacelike,” as physicists say), there is no observer-independent temporal ordering between two events occurring in different locations, and there is no absolute sense in which one event occurred before the other. A geometric explanation of this puzzle can be found by working Problem 1.19 below.
3 The terminology is due to Hermann Minkowski (1864–1909), who gave it this name in a paper [60] published in the last year of his life. Minkowski had read this paper at a meeting in Köln on 21 September 1908. He actually gave the name “proper time” (Eigenzeit) to a quantity that has the physical dimension of length, namely $\sqrt{-dx^2 - dy^2 - dz^2 - ds^2}$, where $s$ is time made into an imaginary length by means of what he called (p. 86) the “mystical formula” $3 \cdot 10^5\ \text{km} = \sqrt{-1}\ \text{sec}$. The concept of proper time had been introduced earlier, however, by Lorentz, who called it local time.
6 1. TIME, SPACE, AND SPACE-TIME
The car wash puzzle throws some doubt on ibn al-Haytham’s intuition. The points $P$, $Q$, and $R$ are in different locations, and different observers will not agree as to their relative configuration at a given time. For two observers $O$ and $O'$ in relative motion, both watching the two lines slide along each other, there will not in general be any agreement as to the angle between the two lines. Unless the fixed line is the line joining $O$ and $O'$, the two will not both agree that the moving line is perpendicular to it. In that respect, Omar Khayyám’s objection gains force
when special relativity enters the picture. Nevertheless, it remains true that both
observers are carrying out their measurements using Euclidean principles, and they
agree that the end of the moving line not on the fixed line is describing a line
parallel to the fixed line. Thus, one can retain ibn al-Haytham’s conclusion while
rejecting the considerations that led him to it.
In the following sections, we shall investigate the relations among the speeds
and directions of three or more observers, assuming that the space-time coordinates
of any pair are reconciled using the equations of special relativity. As we shall
learn, there is an inherent difficulty in regarding the velocity spaces used by two
different observers as having any vectors in common, even though the equations
of transformation from the coordinates of one to those of the other assume that
the vector spaces representing locations (not velocities) do share a line, namely the
line joining the origins. A unifying theory, which unfortunately lies beyond the
scope of the present book, is the Lorentz group, the six-dimensional Lie group of
transformations of $\mathbb{R}^4$ that preserve the four-dimensional interval between events.
(a concept).” The word comes from the Greek word hypostatos (ὑπόστατος), which has the same two roots as the Latin-derived word substance.
5 Originally, gravity (gravitas) was just the Latin word for heaviness. It was a property pos-
sessed by such things as earth, air, and water, but not by fire, which had the opposite property
of levity (levitas) or lightness. The mental picture of gravity as a force, a thing “existing” inde-
pendently of bodies having the property of gravity, even though no one has any clear ideas as to
what that thing is, remains useful.
It was born in upon him that the creatures were really moving,
though not moving in relation to him. This planet which inevitably
seemed to him while he was in it an unmoving world —the world,
in fact—was to them a thing moving through the heavens. In re-
lation to their own celestial frame of reference they were rushing
forward to keep abreast of the mountain valley. Had they stood
still, they would have flashed past him too quickly for him to see,
doubly dropped behind by the planet’s spin on its own axis and by
its onward march around the Sun. [Perelandra, Chapter 16.]
The idea of absolute time and space still lingers, and even general relativ-
ity appears at first sight to require some replacement for it.6 Beyond doubt, a
great deal has been achieved with this concept through Newtonian mechanics. By
the twilight years of this concept, at the end of the nineteenth century, it had
6 One such replacement is known as Mach’s principle, after Ernst Mach (1838–1916). We
avoid clumsy he/she constructions, we shall simply assume they are both male. Each is pictured
as located at the origin of a set of three spatial coordinates and carrying a clock. Reconciling the
spatial coordinates and the clocks between the two observers is the central problem of the present
chapter.
siblings to conclude that these two events occurred simultaneously. (We shall ignore
the five-hour difference in solar time between London and Boston and assume that
events that occurred on a given date in Massachusetts also occurred on that same
date by London reckoning, even though this is often not the case.) That is, Mary
would conclude that the fair her brother attended on the frozen Thames was going
on at the same time that her husband was on his deathbed. And John would agree
that, unaware of this sad event, he had been enjoying beer and leg of mutton while
his brother-in-law was breathing his last. The two events were simultaneous; and
even though neither sibling observed both of them, they would have been able to
reconcile their timekeeping by using a common calendar.
Even without that calendar, however, each of them could have computed the time of the event at which the other was present, provided they knew (1) the time $t_0$ elapsed between those events and the dispatch of the report aboard the ship, (2) the average speed $v_0$ at which the ship sailed, and (3) the distance $d_0$ by sea from London to Boston.8 Dividing the distance $d_0$ by the speed $v_0$ would yield the time that the newspaper was en route. Subtracting the sum $t_0 + d_0/v_0$ from the time that the news arrived would give the time of the event on the common calendar both were using. To draw the analogy with physics: the point of Newtonian mechanics is that there is a universal method of measuring time, usable by everyone, and it is
not affected by any state of motion of any observer. Likewise, there is a universal
method of specifying locations through a system of three rectangular coordinates
(x, y, z), and everyone can use this system, thereby always agreeing on the absolute
location of any particle at any absolute time.
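The siblings’ reconstruction is a single subtraction. In the sketch below, only the 5300 km crossing comes from the text; the 3-day delay before dispatch, the average speed of 200 km per day, and the arrival on day 60 of some common reckoning are hypothetical figures chosen for illustration, and `event_day` is a made-up helper name.

```python
def event_day(arrival_day, t0_days, d0_km, v0_km_per_day):
    """Day on the common calendar when the reported event happened:
    the arrival time minus (t0 + d0 / v0)."""
    return arrival_day - (t0_days + d0_km / v0_km_per_day)

# Hypothetical figures: report dispatched 3 days after the event,
# ship averaging 200 km per day over the 5300 km crossing,
# news arriving on day 60.
print(event_day(60.0, 3.0, 5300.0, 200.0))  # 30.5
```

Because Newtonian time is absolute, John in London and Mary in Boston would obtain the same day from this calculation, each using the report that reached them.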
This small anecdote—James Foster was a real person who died in Dorchester
on 8 January 1763, but his having emigrated from London with a wife named
Mary is fiction—illustrates two points of importance to physics. First, information
travels at a finite speed.9 Second, if an event in one place is the cause of a second
event in another place, it must be possible for information about the first event
to reach the second place before the second event occurs: John cannot write to
his sister to express condolences and perhaps suggest that she consider returning
to London until after he receives news of his brother-in-law’s death. Thus, travel
at eighteenth-century speeds, with the eighteenth-century “speed of information”10
being what it was, has rather weak effects on the lives of ordinary people. If,
anachronistically, they both had accurate calendar-watches that they synchronized
when Mary departed from London, they would find when she returned that their
watches were still synchronized, as far as they could tell. In particular they would
find that they had both aged by the same amount, and they would agree as to
the total distance Mary had traveled. We shall bring these siblings back to the
stage later in the present chapter, moving their stories forward in time by some
three centuries and making Mary’s travel a high-speed journey to a planet orbiting
8 As John and Mary were citizens of the British Empire in the eighteenth century, that
distance would have been given as roughly 3300 miles (say 5300 kilometers).
9 The claims that paranormalists make on behalf of precognition and clairvoyance do not hold
reserving the symbol v0 for the arbitrary hypothetical value we assign to it in the Newtonian
world. In Newtonian theory, it could have any positive value, but of course had a rather small one
in the eighteenth century. Once communication by radio was invented, v0 became coincident with
the speed of light, for which we use the symbol c. Further increases in its value are not expected.
2. SYNCHRONIZATION IN NEWTONIAN MECHANICS 11
a nearby star. At that time, as we shall see, travel has rather more noticeable—
relativistic—effects on a person, and their maps and timetables for the journey will
not agree. For their final bow, in Chapter 7, we’ll move them ahead by yet another
three centuries and ask Mary to travel to a black hole. The discrepancies in their
measurements of time and space will then be still more noticeable.
Physics is much more abstract than everyday life. The kinds of events reported in news media are described by the “five W’s”: what, who, where, when, and why. That is, the reports tell what happened (picnic on the frozen Thames, death of James Foster), who was involved (large numbers of people, James Foster), where (London, Dorchester), when (8 January 1763), and (in reporting complicated issues) why, that is, a context that makes the event comprehensible to the reader or viewer. We have confidence in the accuracy of reports if all journalists agree on these five points. Physics, in contrast, pays
no attention to individual people and their motives. Accordingly, in physics we
have observers, not journalists, and they are concerned only with what happened,
where, and when. Now if two observers are to reconcile their observations at all,
they surely have to agree as to what happened. (Say, one particle collided with
another.) In Newtonian physics, two observers can reconcile their observations
so as to agree separately about the time and location of the event. In special
relativity, in contrast, they cannot. Instead of having a where and a when to
reconcile, each observer has a “where-when,” and one where-when is reconciled
with another through the Lorentz transformation that will be introduced below.
From the concept of events, we get the notion of a space-time in which an
event is simply a set of four coordinates (t; x, y, z), t being the absolute Newtonian
time of the event, and (x, y, z) its spatial location in terms of a conventional three-
dimensional coordinate system. In one sense, it can be argued that the speed of
information in Newtonian physics is infinite. In such equations as Laplace’s equa-
tion, the heat (diffusion) equation, and Newton’s law of gravity, any perturbation of
the controlling initial/boundary conditions is propagated instantly to the solution
at all points of space for all later times. The one exception is the wave equation, in which disturbances of the initial condition propagate at a finite speed. This unrealistic feature of the other Newtonian equations is one reason for preferring the relativistic ones. The wave equation, which governs electromagnetic radiation, lies at the very heart of the special theory of relativity.
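The contrast between instantaneous and finite-speed propagation can be seen in one space dimension using the standard closed-form solutions. This sketch assumes unit constants and an initial profile equal to 1 on $[-1, 1]$ and 0 elsewhere (with zero initial velocity for the wave equation); the function names are hypothetical. The heat solution is the convolution with the heat kernel, written with the complementary error function `math.erfc`, and the wave solution is d’Alembert’s formula. At a point far from the initial disturbance, the heat solution is already positive, however small, while the wave solution is still exactly zero.

```python
import math

def heat_solution(x, t, k=1.0):
    """u_t = k u_xx on the whole line with u(x, 0) = 1 on [-1, 1], 0 elsewhere.
    Convolving with the heat kernel gives
    u(x, t) = (erfc((x - 1) / sqrt(4kt)) - erfc((x + 1) / sqrt(4kt))) / 2."""
    s = math.sqrt(4.0 * k * t)
    return 0.5 * (math.erfc((x - 1.0) / s) - math.erfc((x + 1.0) / s))

def wave_solution(x, t, c=1.0):
    """u_tt = c^2 u_xx with the same initial shape and zero initial velocity.
    d'Alembert's formula: u(x, t) = (f(x - ct) + f(x + ct)) / 2."""
    def f(y):
        return 1.0 if -1.0 <= y <= 1.0 else 0.0
    return 0.5 * (f(x - c * t) + f(x + c * t))

# At t = 1 the wave disturbance, moving at speed 1, has reached only |x| <= 2.
print(heat_solution(10.0, 1.0))  # tiny but strictly positive
print(wave_solution(10.0, 1.0))  # 0.0
```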
It is hoped that the reader finds none of this subsection difficult to understand.
Most readers will, more likely, be impatient at being patronized by such a detailed
discussion of what is, after all, only common sense. The trouble is that Common
Sense is Newtonian, but the physical universe isn’t. We have included this subsec-
tion in order to lay down a detailed background of concepts that can be modified
in intuitively reasonable ways, one step at a time, all the way to the general theory
of relativity.
We are now in the realm of abstract mathematics, and to tie it to the physical
world, we need an interpretation of the mathematical object (t; x, y, z). In the
Newtonian scheme, we can think of (x, y, z) as the location of an identifiable particle
at time t, perhaps a proton or electron. Although these physical bodies do occupy
some volume, we can idealize them as having all three of their geometric dimensions
equal to zero. If the particle moves, we can identify its position at time $t$ with a vector $\mathbf{r}(t) = \bigl(x(t), y(t), z(t)\bigr)$. Along with that position, the particle has other numbers associated with it, such as its mass $m$, possibly its electrical charge $q$, its velocity $\mathbf{r}'(t)$, its acceleration $\mathbf{r}''(t)$, its momentum $m\mathbf{r}'(t)$, the force acting on it $m\mathbf{r}''(t)$, its kinetic energy $\frac{1}{2}m|\mathbf{r}'(t)|^2$, and, if the forces that are acting on it are conservative, its potential energy $V\bigl(\mathbf{r}(t)\bigr)$, which depends on its location. But everything is defined in terms of the mass $m$ and the four coordinates $(t; x, y, z)$, with $t$ having the physical dimension of time, and the other three having the physical
dimension of length. We are now going to explore the concept of ordered time in
Newtonian mechanics from this more abstract point of view.
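To illustrate that every one of these quantities is derived from $m$ and the coordinates alone, the following sketch recovers them from a position function $\mathbf{r}(t)$ by finite differences. The helper names, the step size, and the sample motion (uniform circular motion of radius 2 and angular speed 3, mass 0.5) are assumptions made for the example.

```python
import math

def derivative(f, t, h=1e-5):
    """Central-difference approximation to the derivative of a vector-valued f at t."""
    ahead, behind = f(t + h), f(t - h)
    return tuple((p - q) / (2 * h) for p, q in zip(ahead, behind))

def mechanical_quantities(r, m, t):
    """Velocity, acceleration, momentum, force and kinetic energy of a particle
    of mass m whose position at time t is r(t) = (x(t), y(t), z(t))."""
    v = derivative(r, t)                                # r'(t)
    a = derivative(lambda s: derivative(r, s), t)       # r''(t)
    momentum = tuple(m * vi for vi in v)                # m r'(t)
    force = tuple(m * ai for ai in a)                   # m r''(t), Newton's second law
    kinetic = 0.5 * m * sum(vi * vi for vi in v)        # (1/2) m |r'(t)|^2
    return v, a, momentum, force, kinetic

def circle(t):
    """Uniform circular motion of radius 2 and angular speed 3 in the xy-plane."""
    return (2 * math.cos(3 * t), 2 * math.sin(3 * t), 0.0)

v, a, p, F, K = mechanical_quantities(circle, 0.5, 0.7)
print(K)  # close to (1/2)(0.5)(2*3)^2 = 9
```

The speed is $2 \cdot 3 = 6$ and the centripetal acceleration has magnitude $2 \cdot 3^2 = 18$, so the force has magnitude 9, which the finite differences reproduce to within rounding.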
2.2. Four kinds of time. In the story told above, we see two related events
occurring. One is a primary event, but secondary to it is another event, namely
the observation of the primary event. The second event occurs later because in-
formation requires time to travel from the location of the primary event to the
location of the secondary event. To “abstractify” these considerations and fit them
into mechanics, yet at the same time present a simple model for understanding, we
find it useful to consider a moving particle that is being viewed by an observer.
For the time being, let us assume that the observer is using an orthogonal system
in which the Pythagorean theorem holds, and we think of the observer as sitting
at the origin of this coordinate system. Since Newtonian time is absolute, we can
assume that all clocks used by all observers are properly synchronized with one another. Finally, we shall assume that information travels at some fixed speed $v_0$ in any direction. If, at time $t$, an observer at rest at the origin of Newton’s universal space receives information that something interesting is happening to a particle located at the point with universal coordinates $\mathbf{r} = (x, y, z)$, then that observer, taking account of the finite speed of information, will conclude that the event really happened at an earlier time $s$ given by
$$s = t - \frac{\sqrt{x^2 + y^2 + z^2}}{v_0} = t - \frac{|\mathbf{r}|}{v_0}.$$
The time s is the Newtonian universal time showing on a clock at the point
(x, y, z) when the event occurred. It will be the same for any observer viewing
the event. The observation time t, which differs from one observer to another, is
the value that universal time has when information about the event reaches the
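observer. For two events that occur at locations $\mathbf{r}_1$ and $\mathbf{r}_2$ and are observed at the origin at times $t_1$ and $t_2$ respectively, the time interval between the events themselves is
$$\Delta s = s_2 - s_1 = \left(t_2 - \frac{|\mathbf{r}_2|}{v_0}\right) - \left(t_1 - \frac{|\mathbf{r}_1|}{v_0}\right) = \Delta t - \frac{\Delta|\mathbf{r}|}{v_0}.$$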
This expression will be the same for any two observers viewing the events, even
though the information about the events will reach them at different times and the
coordinates they assign to the locations of the events will generally be different. In
order to think clearly about the four-dimensional relativistic world whose points
are “events” (t; x, y, z), we need to keep this “speed of information” v0 in the
background. In relativity, it will be the speed of light in free space, but in Newtonian
mechanics, it can be any convenient positive speed. In Newtonian mechanics, the
difference between observation time and universal time is merely the time required
for information to travel from the site of the event to the observer. In relativity,
that Newtonian adjustment is taken for granted as having been made by any given
observer O, who therefore has a clear concept of simultaneity throughout his own
personal Euclidean space. But communication with another observer $O'$ in motion relative to $O$ is complicated, since neither the similar synchronization $O'$ has carried out nor his spatial measurements agree with those of $O$. The discrepancy hinges on one issue, which is precisely timekeeping. Since we no longer have a universal time, we shall distinguish between Newtonian and relativistic measurements by calling the observer’s time laboratory time. The best substitute for universal time that we can get, to help our two observers reconcile the order of events, is yet another kind of time, which we shall call proper time.
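The Newtonian correction $s = t - |\mathbf{r}|/v_0$ can be checked with two stationary observers in different places. In this sketch the value $v_0 = 5$, the event positions and times, and the observer positions are hypothetical numbers, and the helper names are invented for the example. The observation times differ between the observers, but both reconstruct the same universal times and hence the same interval $\Delta s$.

```python
import math

V0 = 5.0  # hypothetical "speed of information" (length units per time unit)

def observation_time(event_pos, event_time, observer_pos, v0=V0):
    """Universal time at which news of an event reaches an observer at rest."""
    return event_time + math.dist(event_pos, observer_pos) / v0

def reconstructed_time(t_obs, r, v0=V0):
    """The observer's estimate s = t - |r| / v0, where r is the event's
    position relative to the observer."""
    return t_obs - math.hypot(*r) / v0

def reconstruct(event_pos, event_time, observer_pos, v0=V0):
    """Return (observation time, reconstructed universal time) for one observer."""
    t_obs = observation_time(event_pos, event_time, observer_pos, v0)
    r = tuple(p - q for p, q in zip(event_pos, observer_pos))
    return t_obs, reconstructed_time(t_obs, r, v0)

events = [((3.0, 4.0, 0.0), 2.0), ((0.0, 0.0, 12.0), 7.0)]
for observer in [(0.0, 0.0, 0.0), (10.0, -2.0, 5.0)]:
    results = [reconstruct(pos, s, observer) for pos, s in events]
    s_est = [s for _, s in results]
    print(observer, [t for t, _ in results], s_est, s_est[1] - s_est[0])
```

In relativity this agreement breaks down: even after the correction, two observers in relative motion assign different times to the same events.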
2.3. Proper time. The distinction we have just made between observation
time and universal time in Newtonian mechanics becomes much more important in
relativity, where the interval between the time s shown by a clock attached to a
moving particle and the time t recorded by someone observing the particle is given
by “enlarging” the Pythagorean theorem so as to include a time dimension. On the
infinitesimal level, as mentioned above,
$$ds^2 = dt^2 - \frac{1}{c^2}\left(dx^2 + dy^2 + dz^2\right).$$
Here, c is the speed of light, the fastest speed at which information can be
transmitted. The analogy is not perfect, as will be seen, since in relativity it is
impossible for two observers in relative motion, each having an accurate clock, to
synchronize those clocks. In relativistic mechanics, we need to imagine two clocks
associated with any moving object that is being observed. The time t that the
observer records, which we just agreed to call laboratory time, comes from a clock
synchronized with the observer’s clock and attached at the point of the observer’s
space that the moving object is passing through at a given instant; it is, from the
point of view of someone riding on the moving object, not keeping accurate time.
The proper time s is what is shown on a clock attached to the object. Although
the first clock is synchronized with his own local clock, the observer will still have
to make the correction for the time it takes for a signal from that clock to reach
him in order to assign a time to this event. But even after that correction is made,
the time the observer records for an event will not agree with the time broadcast
at the same instant by the second clock, attached to the moving object.
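For a clock carried by an object moving at constant speed $v$ along the observer’s $x$-axis, we have $dx = v\,dt$ and $dy = dz = 0$, so the formula for $ds^2$ reduces to $ds = dt\sqrt{1 - v^2/c^2}$: the proper time shown on the moving clock advances more slowly than the laboratory time. A minimal sketch, with a hypothetical helper name and illustrative numbers, in units where $c = 1$:

```python
import math

def proper_elapsed(dt, v, c=1.0):
    """Proper time shown by a clock moving at constant speed v while the
    observer's laboratory time advances by dt: dt * sqrt(1 - (v/c)^2)."""
    if abs(v) >= c:
        raise ValueError("speed must be less than c")
    return dt * math.sqrt(1.0 - (v / c) ** 2)

# Ten years of laboratory time at 0.8c correspond to about six years of proper time.
print(proper_elapsed(10.0, 0.8))
```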
All that will be explained in detail below. In the relativistic equations relating
the space-time coordinates of two observers, the correction for time lag is already
taken into account in the variable t. The laboratory time t of an event is not
the time at which the observation was recorded (what we just called “observation
time”), but rather the time at which the event actually happened in the observer’s
personal space-time, taking the Newtonian adjustment into account. The Newtonian
correction amounts to taking account of the time delay involved in transmitting
information about the event, and any two observers who have made that correction
will agree on the time of the event. In the relativistic model, however, even after
correcting for the time lag due to a finite speed of information, different observers
will not in general ascribe the same space-time coordinates to an event.
There is in fact only one invariant across all these coordinate systems, namely the infinitesimal squared increment of proper time $ds^2$, which is expressed as a quadratic form in the differentials of the four coordinates. It is the same for any two observers in relative motion along a straight line at a constant speed. All of this falls into the domain of the special theory of relativity, which will be discussed in the next chapter. In this theory, two observers using different coordinate systems can “talk to” each other if their coordinates are related by a Lorentz transformation, which is a linear transformation of $\mathbb{R}^4$ that preserves the quadratic form $ds^2$ just written.
$$t_2 = t_1, \qquad x_2 = x_1 - u t_1, \qquad y_2 = y_1, \qquad z_2 = z_1.$$
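The difference between this Galilean change of coordinates and a Lorentz transformation shows up in what happens to the quadratic form. The sketch below assumes units with $c = 1$ and uses the standard Lorentz boost along the common $x$-axis, $t' = \gamma(t - ux/c^2)$, $x' = \gamma(x - ut)$ with $\gamma = 1/\sqrt{1 - u^2/c^2}$, which is not written out in this section; the sample event and the relative speed $u = 0.6$ are arbitrary. It evaluates $t^2 - (x^2 + y^2 + z^2)/c^2$ for the interval from the common origin event to the sample event.

```python
import math

C = 1.0  # assumption: units in which the speed of light equals 1

def interval_sq(t, x, y, z, c=C):
    """t^2 - (x^2 + y^2 + z^2) / c^2: squared interval from the origin event to (t; x, y, z)."""
    return t**2 - (x**2 + y**2 + z**2) / c**2

def galilean(t, x, y, z, u):
    """t2 = t1, x2 = x1 - u t1, y2 = y1, z2 = z1."""
    return t, x - u * t, y, z

def lorentz_boost(t, x, y, z, u, c=C):
    """Standard Lorentz boost along the common x-axis with relative speed u, |u| < c."""
    gamma = 1.0 / math.sqrt(1.0 - (u / c) ** 2)
    return gamma * (t - u * x / c**2), gamma * (x - u * t), y, z

event = (5.0, 3.0, 1.0, 0.0)
print(interval_sq(*event))                          # 15.0
print(interval_sq(*galilean(*event, u=0.6)))        # 24.0, so not preserved
print(interval_sq(*lorentz_boost(*event, u=0.6)))   # 15 up to rounding
```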
Remark 1.1. Newtonian space is Euclidean and each point in it, regarded as a
vector ξ = (x, y, z), has a squared-distance from the origin given as its dot product
with itself:
$$|\xi|^2 = \xi \cdot \xi = x^2 + y^2 + z^2.$$
The vector notation for the dot product (invented in the late nineteenth cen-
tury) is very useful because of the compact expression it gives to many physically
important quantities. The Euclidean structure of space singles out certain coordinate systems as “preferred,” namely those that are orthonormal, meaning that the basis vectors $\xi_1 = (1, 0, 0)$, $\xi_2 = (0, 1, 0)$, and $\xi_3 = (0, 0, 1)$ satisfy $\xi_i \cdot \xi_j = 0$ if $i \ne j$ and $\xi_i \cdot \xi_i = 1$. If we confine ourselves to orthonormal coordinate systems, it does not matter which particular one we choose, since the squared distance is given by the same expression, $|\xi|^2 = x^2 + y^2 + z^2$, whenever $\xi = x\xi_1 + y\xi_2 + z\xi_3$. That is one huge advantage of orthonormal systems: the dot product is invariant under orthogonal transformations, which are defined by that property: $T\xi \cdot T\eta = \xi \cdot \eta$.11
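The defining property $T\xi \cdot T\eta = \xi \cdot \eta$ is easy to test for a particular orthogonal transformation. This sketch uses a rotation about the $z$-axis by an arbitrary angle and, for contrast, a shear, which is not orthogonal; the vectors and helper names are chosen only for illustration.

```python
import math

def rotation_about_z(theta):
    """Orthogonal matrix rotating vectors by angle theta about the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def apply(T, v):
    """Matrix-vector product T v."""
    return [sum(T[i][j] * v[j] for j in range(3)) for i in range(3)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

xi, eta = [1.0, 2.0, 3.0], [-4.0, 0.5, 2.0]
R = rotation_about_z(0.4)
shear = [[1.0, 1.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # not orthogonal

print(dot(xi, eta))                              # 3.0
print(dot(apply(R, xi), apply(R, eta)))          # 3.0 up to rounding
print(dot(apply(shear, xi), apply(shear, eta)))  # -3.5
```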
This seems an appropriate point to foreshadow certain other aspects of physics
that are affected by the use of alternative systems of coordinates. We have in mind
particularly the concept of kinetic energy. Assume that $O_1$ and $O_2$ are both fixed at the same origin ($u = 0$), that $O_2$ continues to use an orthonormal coordinate system, but that $O_1$ is using a general coordinate system. For what we want to do, we need to be slightly more formal and systematic about our labeling. Henceforth, we let $x_i = x^1_i$, $y_i = x^2_i$, and $z_i = x^3_i$, $i = 1, 2$. Then for some constants $g_{ij}$, $i, j = 1, 2, 3$, we have
11 The cross product $u \times v$ is invariant under a rotation, that is, an orthogonal transformation whose determinant is 1. For that reason, physicists sometimes refer to the cross product as a pseudo-vector, reserving the term vector for vectors that are invariant under all orthogonal transformations.
where
$$t_{ij} = m\sum_{k=1}^{3} g_{ki}\,g_{kj}$$
if $i \ne j$, and
$$t_{ii} = \frac{1}{2}\,m\sum_{k=1}^{3} g_{ki}^2.$$
In Newtonian mechanics, it is perfectly legitimate—though one might think
it foolish—to use general coordinates. Still, one might be studying the crystalline
structure of a body and prefer axes that follow the lines of symmetry of the crystals.
In that case, we might actually use oblique coordinates. If we do so, we see that
we need more than the simple scalar equation $T = \frac{1}{2}m|\mathbf{r}'(t)|^2$ to keep track of kinetic energy. In its place we need what is called a tensor, which in this case is a bilinear mapping of pairs of velocity vectors $u = (u^1, u^2, u^3)$ and $v = (v^1, v^2, v^3)$:
$$T(u, v) = \sum_{i,j=1}^{3} t_{ij}\,u^i v^j.$$
where $g_{ij} = 2t_{ij}/m$. This last tensor is of basic importance throughout differential
geometry. It was the starting point for modern abstract differential geometry,
introduced by Bernhard Riemann (1826–1866) in his 1854 inaugural lecture. It
soon came to be called the fundamental tensor by Einstein and others. We shall
call it the metric tensor, since it gives the metric by which intervals are measured
on an abstract manifold. Here we have our first hint of the intimate connection
between differential geometry and mechanics: the tensor that defines the geometry
of a manifold also serves to convert velocities into kinetic energy, through exactly
the same bilinear operation on a pair of vectors in both cases. As we progress from
Newtonian mechanics to general relativity, that metric-energy connection will serve
as a guide. We will think of the metric coefficients (the $g_{ij}$) as potential energy
functions. Notice that all of this insight comes about because we attempted to free
ourselves from dependence on particular coordinate systems. As long as we confined
ourselves to orthonormal coordinates, where the metric is $ds^2 = dx^2 + dy^2 + dz^2$ and the kinetic energy is $m|\mathbf{r}'|^2/2$, we wouldn’t necessarily think in terms of a matrix.
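The metric-energy connection can be checked on oblique axes. In this sketch the matrix of constants is called `A` (a hypothetical name, to avoid clashing with the metric $g_{ij}$): its columns are the oblique basis vectors written in orthonormal coordinates, here with the second axis at 60 degrees to the first. The sketch assumes the symmetric convention $t_{ij} = \frac{m}{2} g_{ij}$ with $g_{ij} = \sum_k A_{ki}A_{kj}$, which is consistent with $g_{ij} = 2t_{ij}/m$. The kinetic energy computed from the tensor agrees with $\frac{1}{2}m|v|^2$ computed in orthonormal components.

```python
import math

def metric(A):
    """g_ij = sum_k A[k][i] A[k][j] (the matrix A^T A): dot products of the basis vectors."""
    n = len(A)
    return [[sum(A[k][i] * A[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def kinetic_tensor(m, A):
    """Symmetric coefficients t_ij = (m/2) g_ij, so that g_ij = 2 t_ij / m."""
    return [[0.5 * m * gij for gij in row] for row in metric(A)]

def T(t, u, v):
    """Bilinear form T(u, v) = sum_{i,j} t_ij u^i v^j."""
    n = len(t)
    return sum(t[i][j] * u[i] * v[j] for i in range(n) for j in range(n))

# Oblique axes: columns are the basis vectors in orthonormal coordinates.
A = [[1.0, 0.5, 0.0],
     [0.0, math.sqrt(3) / 2, 0.0],
     [0.0, 0.0, 1.0]]
m = 2.0
q_dot = [2.0, 1.0, 0.5]  # velocity components in the oblique system

oblique = T(kinetic_tensor(m, A), q_dot, q_dot)
v = [sum(A[i][j] * q_dot[j] for j in range(3)) for i in range(3)]  # orthonormal components
orthonormal = 0.5 * m * sum(vi * vi for vi in v)
print(oblique, orthonormal)  # both 7.25 up to rounding
```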
While we would not normally bother with this tensor in Newtonian mechanics,
the analogous four-dimensional concept in relativity will turn out to be very useful
to us. We shall see this at the end of Chapter 2 and again in Chapters 6 and 7.
Certain bilinear mappings of pairs of velocity vectors occur quite naturally in the
equations of geodesics, which lie at the heart of relativity theory. By putting this
example here, we are foreshadowing some important results that will come later
and preparing the reader to adjust to a new way of thinking about mechanics.
Remark 1.2. The reader may also be wondering why we chose to measure
distance as time, converting it via a conventional standard velocity v0 (taken to be
c, the speed of light, in relativity). Why not instead convert time to distance by
defining t̃ = v0 t? After all, we generally find it easy to measure distance. We can,
in the simplest case, carry a ruler around with us calibrated in standard units of
length, such as millimeters, and determine the distance between two nearby points.
Time, on the other hand, is a rather mysterious, mystical thing (see the discussion
in Chapter 8). In order to measure it, we have to select some process that we accept
as proceeding at a uniform rate—the dripping of water through a hole, or sand in an
hour-glass, or the swing of a pendulum, or the unwinding of a watch spring, or the
vibration of the crystal in a digital watch or the right ascension of a star—and use
that process as a measure of time. It is not intuitively obvious that all these ways
of measuring time are even mutually consistent. The now old-fashioned clock with
hour and minute hands goes in exactly the opposite direction from the conversion
we made, measuring time by the lengths of the arcs traversed by the tips of the two
rotating hands. And we have all learned geometry by thinking of lines as lengths.
Why this nonintuitive, seemingly needless complication? It is certainly possible to
express time as a length in this way, in which case one specifies the conversion by
giving the standard speed and a standard unit of length.
In fact, we do exactly that any time we draw a trend line on a piece of paper.
The horizontal axis represents time, and each horizontal distance a certain amount
of elapsed time. We shall even do so below on occasion. For the purposes of theoretical physics, however, we wish to invoke the least-action principle that simplifies so much of classical physics: A physical process evolves in such a way that the integral of the difference between kinetic and potential energy with respect to time is “stationary” (usually a minimum). That is why we shall generally homogenize
dimensions and express them all as time. From Chapter 4 on, this aspect of the
theory moves to center stage, and we shall then exclusively write intervals as time
intervals.
Actually, we can measure the “distance” from one place to another in many
different ways. For an astronaut, the distance between two points of space might
be most practically measured as the amount of fuel required to get from one to the
other. For an economy-minded ordinary citizen, the distance from, say, Boston to
Chicago, might be measured by the cost of the airline fare for a round-trip journey
(in which case, many mutually inconsistent measures of the distance would exist).
The important aspect of Newtonian space-time to be kept in mind is that it involves two conventional standard units. There is no “natural” unit of time, and there is no “natural” unit of speed. The choice of each is arbitrary. We can think of the standard $v_0$ as the maximum rate at which information can be transmitted, calling it the “speed of information.” This arbitrariness is in complete accord with the well-known fact that there is no natural unit of length in a Euclidean space. It is this
“flatness” of Euclidean space that makes it possible to build scale models of vintage
automobiles, ships, airplanes, and shopping malls, in which all lengths are shrunk
in the same proportion and all angles are the same in the model as in the original.
This is not possible, for example, when one tries to draw a map of a large portion
of the Earth’s surface. Changing the lengths also causes the angles to change. In
fact, a sphere does have a natural unit of length, namely its radius. Similarly, the
curved plane of hyperbolic geometry has a natural unit of length, which might be
(for example) the distance at which the angle of parallelism is half of a right angle.
(See Appendix 1 for details. Angles have an absolute meaning in all geometries, a
right angle, for example, being exactly one-fourth of a complete rotation.)
In the special theory of relativity, by way of contrast, the constancy of the
speed of light provides a natural link between space and time that is absent from
the Newtonian model. It has profound—and observable—consequences for physics.
these objects as $a'$, $b'$, $c'$, and $d'$. The question that naturally arises is this: suppose $O'$ combines $a'$, $b'$, and $c'$ following what is verbally the same algorithm that $O$ followed. Will the result be the corresponding $d'$? The general answer is affirmative,
provided all the objects that are combined are tensors. Detailed discussion of the
problem is given in Appendix 6. Just to make this matter seem a bit less abstract,
we note that the objects we are interested in are all obtained from the coordinate
parameters through algebraic operations and differentiation. The criterion for an
object to be a tensor is, in informal language, that when space-time coordinates are
changed, its coordinates transform according to the chain rule for differentiation.
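In the simplest case of a linear change of coordinates $x' = Ax$, the chain rule says that the components of a displacement or velocity transform by $A$, while the components of a gradient transform by the inverse transpose of $A$, so that their contraction is the same in both systems. The sketch below illustrates this informal tensor criterion in two dimensions; the matrix, the vectors, and the helper names are arbitrary choices for the example.

```python
def mat_vec(M, v):
    """Matrix-vector product M v."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def inverse_transpose_2x2(A):
    """(A^{-1})^T for an invertible 2x2 matrix A."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -c / det], [-b / det, a / det]]

A = [[2.0, 1.0], [1.0, 1.0]]   # new coordinates x' = A x (illustrative choice)
v = [3.0, -1.0]                # components of a velocity-like vector
w = [4.0, 7.0]                 # components of a gradient (df/dx, df/dy)

v_new = mat_vec(A, v)                         # dx'^i = sum_j A_ij dx^j
w_new = mat_vec(inverse_transpose_2x2(A), w)  # chain rule: df/dx = A^T df/dx'
print(v_new, w_new)                           # [5.0, 2.0] [-3.0, 10.0]
print(sum(p * q for p, q in zip(w, v)),
      sum(p * q for p, q in zip(w_new, v_new)))  # 5.0 5.0
```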
When vector analysis was invented in the late nineteenth century, it seemed that
a very powerful mathematical language had been created, one ideally suited to the
purpose of getting compact expressions of physical laws. Now vector analysis is
indeed a powerful tool, but in the first decade of the twentieth century it was a
recent creation and by no means universally used. Einstein did not use the vector
operations of curl and divergence in his 1905 paper on special relativity, although he
did use the word vector. But he wrote out the Maxwell equations connecting electric
and magnetic fields componentwise. His use of this seemingly more cumbersome
notation may have been caused by the fact that two observers in relative motion
do not agree that they are both using Euclidean geometry, in which the curl and
divergence have a coordinate-free meaning.
Because Newton’s second law of motion asserts that forces are directly propor-
tional to acceleration, it follows that two observers in relative motion at constant
velocity (zero relative acceleration) must agree about the magnitude and direction
of all forces. This principle does not appear to have raised any doubts until the
discovery of Maxwell’s four laws of electromagnetism, when an asymmetry arose
for two such observers looking at a charged particle moving in a pair of electric and magnetic fields. The two observers, it turned out, would agree about the magnetic field, but not about the electric field. They agreed about the magnitude and
direction of the forces on the particle, but not about the physical nature of those
forces. It was this asymmetry that Einstein remarked upon in his fundamental
1905 paper on special relativity. He made only a casual allusion to the famous
Michelson–Morley experiment12 of 1887 that had failed to detect any dependence
of the speed of light on its direction of motion in a hypothetical absolute space.
Einstein did, later on, discuss an 1851 experiment by Armand Hippolyte Louis
Fizeau (1819–1896) of which the Michelson–Morley experiment was an improved
reconstruction.
In the Newtonian system, there is a time axis common to all observers and
a three-dimensional Euclidean space also common to all observers. When two
observers wish to communicate their observations to each other, it is only necessary
for one of them to say what coordinates are assigned to three points in space
relative to that observer’s origin and what event marks the epoch (time 0) of the
time axis at that origin. The physical anomalies mentioned above, however, forced
a reformulation of mechanics, in which time and space could not be separated.
What Observer $O$ takes to be an orthonormal coordinate system in space is not orthonormal as seen by Observer $O'$. As a result of that stark difference, each individual observer might at first sight seem to be isolated in a set of time and space coordinates that are, for that observer, just like the old Newtonian ones but do not agree with the equally valid system of time and space coordinates used by another observer. Two observers in relative motion almost appear to be inhabiting parallel universes. How can they determine whether an event occurring at point $x$ at time $t$ in the coordinates used by $O$ is to be regarded as the event occurring at point $x'$ at time $t'$ in the coordinates used by $O'$? What common observations will enable them to make such an identification?
The key to solving this problem is their common line of motion, assuming
that each believes the other is moving along a straight line at constant speed. We
assume that each can at the very least observe the origin used by the other—it is
convenient to think of the observers as “sitting” at their respective origins at all
times—and assign a location to the origin of that other observer at any given time.
We can identify the line in O-coordinates joining the two origins with the line in
O′-coordinates joining the two origins.
After these lengthy preliminaries, we are at last ready to tackle the problem
posed by the constancy of the speed of light and thereby explain the impossibility
of getting 100% agreement on the order of events between two observers in relative
motion.
When we discuss events, we must keep in mind that their temporal order may
depend on the observer. If the observers are not in relative motion, however (that
is, the spatial coordinates of all points are constant in both frames of reference and
the origins always coincide), then we can assume that the time coordinate of any
event is also the same for both, and coordinates can be converted as in classical
mechanics, that is, by a linear transformation (t; x, y, z) → (t′; x′, y′, z′) given by
$$
\begin{aligned}
t' &= t,\\
x' &= a_{11}x + a_{12}y + a_{13}z,\\
y' &= a_{21}x + a_{22}y + a_{23}z,\\
z' &= a_{31}x + a_{32}y + a_{33}z,
\end{aligned}
$$
where the matrix
$$
\begin{pmatrix}
a_{11} & a_{12} & a_{13}\\
a_{21} & a_{22} & a_{23}\\
a_{31} & a_{32} & a_{33}
\end{pmatrix}
$$
is the invertible matrix that transforms coordinates with respect to one fixed basis of R³
into coordinates with respect to another. In our case, it will always be a rotation matrix, since we are going to assume
that our observers use only right-handed orthonormal bases in their coordinates.¹³
To see what these coordinate changes look like in relativity, imagine two observers,
O and O′, whose spatial frames of reference are moving relative to each
other in a fixed direction at a constant speed u. Each of our observers is imagined
to have a clock measuring time in the Newtonian way through some physical
process that is, by definition, said to be proceeding at a uniform rate, and each
has measuring instruments that measure distances and angles in such a way that
triangles obey the trigonometric laws of Euclidean geometry. These are the proper
time and proper space for that observer, and they have the properties of Newton’s
absolute time and space, including the property that time and space measurements
are independent variables. The difference from Newton’s system is that these times
and spaces are not in agreement with the proper times and spaces of other ob-
servers. When two observers try to reconcile their measurements, each finds the
space and time measurements of the other are entangled and so no longer appear to
be independent variables when compared with his own. In particular, distances (x)
along the common line of motion are mixed up with time (t), and only the difference
(ct)² − x² (where c is the speed of light) is agreed upon by both observers.
For simplicity, we assume that at some instant of time, given the value t = 0 = t′
by both observers, the three mutually perpendicular coordinate axes used by O
coincide with those used by O′, and that the relative motion is a translation along
the direction of the common x-axis (x′-axis) at constant speed. We assume that
O′’s origin is moving in the positive direction of the common axis at speed u (from
O’s point of view). Of course, from O′’s point of view O’s origin is moving along the
x′-axis in the negative direction, that is, with velocity −u. Because of the assumption
that the speed of light must be the same for both observers, we cannot now assume
that they are using the same time coordinate, or that simultaneity means the same
thing for both of them. The best we can assert is a kind of homogeneity in events,
expressed by assertions like the following: if event P took place twice as far away from O’s origin
as event Q, and after an elapsed time (measured from the instant when the two
origins coincided) twice as large as the time elapsed when Q occurred, as seen by
O, then the same should be true from O′’s point of view. That is, if all four of the
space-time coordinates of event P are twice those of event Q in O’s system, the
same should be true in O′’s system.

¹³ … to define what a right-handed system is intrinsically. One can divide systems of orthonormal
bases into two equivalence classes and always say whether two bases belong to the same class
(the determinant of the transition matrix between coordinates in the two bases is positive), but
otherwise there is nothing intrinsic to either system that marks it as being “right-handed.” Thus,
what we are really saying is that we assume the coordinate transformation between any two
observers has a positive determinant.
Einstein must have had something like this in mind when he asserted that,
because of our beliefs about the nature of time and space, it seems clear that
the coordinates of an event in one frame of reference must be linear functions of
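those in the other frame. In other words, we are assuming that there is a linear
transformation such that
$$
\begin{aligned}
t' &= a_{11}t + a_{12}x + a_{13}y + a_{14}z,\\
x' &= a_{21}t + a_{22}x + a_{23}y + a_{24}z,\\
y' &= a_{31}t + a_{32}x + a_{33}y + a_{34}z,\\
z' &= a_{41}t + a_{42}x + a_{43}y + a_{44}z.
\end{aligned}
$$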
Our first assumption is that the yz-plane coincides point by point with the
y′z′-plane at a time we shall take as the epoch (time 0) for both observers. That
is, if t = 0 = t′ and x = 0 = x′, then y′ = y and z′ = z. This assumption yields the
equalities
$$
\begin{aligned}
0 &= a_{13}y + a_{14}z,\\
0 &= a_{23}y + a_{24}z,\\
y &= a_{33}y + a_{34}z,\\
z &= a_{43}y + a_{44}z.
\end{aligned}
$$
If these equalities are to hold for all y and z, then we must have a₁₃ = 0 = a₁₄,
a₂₃ = 0 = a₂₄, a₃₃ = 1, a₃₄ = 0, a₄₃ = 0, a₄₄ = 1. The equations now read
$$
\begin{aligned}
t' &= a_{11}t + a_{12}x,\\
x' &= a_{21}t + a_{22}x,\\
y' &= a_{31}t + a_{32}x + y,\\
z' &= a_{41}t + a_{42}x + z.
\end{aligned}
$$
The assumption that the motion is along the x-axis in both systems implies
that this axis is the same for both at all times. In other words, if y = 0 = z, then
y′ = 0 = z′ also. Putting these values in the last two equations, we find
$$
\begin{aligned}
0 &= a_{31}t + a_{32}x,\\
0 &= a_{41}t + a_{42}x.
\end{aligned}
$$
4. THE LORENTZ TRANSFORMATION 23
[Figure 1.1: top panel, Observer O’s view: the ray reaches the mirror at ct at time t and is back at 0 at time 2t; bottom panel, Observer O′’s view: the ray reaches the mirror at ct′ at time t′ and is back at O’s origin, then located at −ut′₁, at time t′₁.]
Figure 1.1. Top: A light ray traveling from the origin of Observer
O’s coordinate system to a mirror on the axis of common motion
and then back to the origin, as described by O. Bottom: The same
process as described by O′.
Since these equations hold for all t and x, we must have a₃₁ = 0 = a₃₂ and
a₄₁ = 0 = a₄₂. Our transformation equations now read
$$
\begin{aligned}
t' &= a_{11}t + a_{12}x,\\
x' &= a_{21}t + a_{22}x,\\
y' &= y,\\
z' &= z.
\end{aligned}
$$
These determinations are not particular to the theory of relativity; they are
independent of any assumptions about the speed of light c. In order to determine the
four remaining coefficients aᵢⱼ, i, j = 1, 2, we need to introduce the assumption that
c is the same for all observers. With that assumption, we first consider coordinates
assigned to events on the axis of relative motion. Consider a light ray that leaves
the common origin at time t = 0 = t′, travels to a mirror on the positive x-axis
(which is also the positive x′-axis), arriving at time t according to Observer O, and then
is reflected straight back to the origin of O’s system, necessarily arriving there at
time t₁ = 2t according to O. To Observer O′, the light ray arrives at the mirror
at some time t′, then returns to O’s origin at some later time t′₁, when that origin
has coordinates (−ut′₁, 0, 0). Thus we have two events: the arrival of the light
ray at the mirror, to which the two observers assign coordinates (t, ct, 0, 0) and
(t′, ct′, 0, 0) respectively, and its return to O’s origin, to which the two observers
assign coordinates (2t, 0, 0, 0) and (t′₁, −ut′₁, 0, 0), so that
$$
\begin{aligned}
t' &= a_{11}t + a_{12}ct,\\
ct' &= a_{21}t + a_{22}ct,\\
t'_1 &= 2a_{11}t,\\
-ut'_1 &= 2a_{21}t.
\end{aligned}
$$
The last two equations here imply that a₂₁ = −ua₁₁.
Now let us look more closely at the return portion of this trip in the “light” of
the fact that the speed of light c is the same for both observers. According to O′,
after leaving the mirror, the light traveled a distance c(t′₁ − t′) = ct′ + ut′₁. (Here
the left-hand side represents the distance traveled as speed times time elapsed. The
right-hand side represents it directly as the difference of the two distances from O′’s
origin to the starting point ct′ and ending point −ut′₁.) Solving this relation for t′,
we find
$$
t' = \frac{c-u}{2c}\,t'_1 .
$$
Substituting this value of t′ into the first two equations of the coordinate transformation, we find
$$
\begin{aligned}
\frac{c-u}{2c}\,t'_1 &= a_{11}t + ca_{12}t,\\
\frac{c-u}{2}\,t'_1 &= -ua_{11}t + ca_{22}t.
\end{aligned}
$$
Since ½t′₁ = a₁₁t, we can cancel this factor; and we get, upon dividing the
second equation by c,
$$
\begin{aligned}
\frac{c-u}{c} &= 1 + \frac{ca_{12}}{a_{11}},\\
\frac{c-u}{c} &= -\frac{u}{c} + \frac{a_{22}}{a_{11}}.
\end{aligned}
$$
We now rewrite these equations as
$$
\frac{a_{12}}{a_{11}} = \frac{c-u}{c^2} - \frac{1}{c} = -\frac{u}{c^2},
\qquad
\frac{a_{22}}{a_{11}} = 1.
$$
Now, letting α = a₁₁, we have a₂₁ = −uα, a₁₂ = −(u/c²)α, and a₂₂ = α. Thus,
we have the transformation
$$
\begin{aligned}
t' &= \alpha\Bigl(t - \frac{u}{c^2}\,x\Bigr),\\
x' &= \alpha(-ut + x).
\end{aligned}
$$
It remains to determine the factor α. This time we imagine a point fixed on O’s
y-axis. A light ray again leaves the common origin at time 0 (in both coordinate
systems) and travels to this point, arriving at time t. Then O assigns to this arrival
the coordinates (t, 0, ct, 0), and, by what we know of the transformation so far, O′
assigns the coordinates (αt, −αut, ct, 0) to this same event. For O′, the event is the
arrival at the point (−αut, ct, 0) of a light ray that left O′’s origin at time 0, and
this arrival occurs at time αt. Since the proper space of each observer is Euclidean,
we deduce that
$$
c\alpha t = \sqrt{(\alpha u t)^2 + (ct)^2} = t\sqrt{(\alpha u)^2 + c^2}.
$$
Canceling t and squaring gives c²α² = α²u² + c², that is, α²(c² − u²) = c²; solving for
the positive root α, we find
$$
\alpha = \frac{c}{\sqrt{c^2 - u^2}} = \frac{1}{\sqrt{1 - u^2/c^2}}.
$$
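As a numerical sanity check (a sketch of our own, not part of the original derivation), the snippet below takes units with c = 1 by default and verifies that, with α = c/√(c² − u²), the light ray sent along O’s y-axis covers, in O′’s time αt, exactly the Euclidean distance from O′’s origin to the arrival point (−αut, ct, 0). The function names `alpha` and `y_axis_residual` are hypothetical.

```python
import math


def alpha(u: float, c: float = 1.0) -> float:
    """Lorentz factor alpha = c / sqrt(c^2 - u^2); requires |u| < c."""
    if abs(u) >= c:
        raise ValueError("relative speed must satisfy |u| < c")
    return c / math.sqrt(c * c - u * u)


def y_axis_residual(u: float, t: float, c: float = 1.0) -> float:
    """c*alpha*t minus the Euclidean distance from O''s origin to (-alpha*u*t, c*t, 0).

    O assigns the arrival event (t, 0, ct, 0); O' assigns (alpha*t, -alpha*u*t, ct, 0).
    The residual vanishes when alpha is the Lorentz factor.
    """
    a = alpha(u, c)
    light_distance = c * a * t                # distance light covers in O' time alpha*t
    euclidean = math.hypot(a * u * t, c * t)  # straight-line distance in O''s proper space
    return light_distance - euclidean


print(alpha(0.6))                 # approximately 1.25
print(y_axis_residual(0.6, 2.0))  # approximately 0.0
```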
Theorem 1.1. For two systems of measuring time t and t′ and rectangular
space coordinates (x, y, z) and (x′, y′, z′) for which (1) the x-axis and the x′-axis
coincide at all times and (2) the origin of the primed system is moving
along the x-axis at speed u, the four coordinates in the two systems are related by
the following system of equations:
$$t' = \alpha\Bigl(t - \frac{u}{c^2}\,x\Bigr), \tag{1.1}$$
$$x' = \alpha(-ut + x), \tag{1.2}$$
$$y' = y, \tag{1.3}$$
$$z' = z. \tag{1.4}$$
We shall refer to this transformation as the Lorentz transformation and say
that it corresponds to a velocity vector u = ui of O′ relative to O. The reader
can easily compute that the space-time interval between two events is the same for
both observers:
$$
\Delta s^2 = \Delta t^2 - \frac{1}{c^2}\Delta x^2 - \frac{1}{c^2}\Delta y^2 - \frac{1}{c^2}\Delta z^2
= \Delta t'^2 - \frac{1}{c^2}\Delta x'^2 - \frac{1}{c^2}\Delta y'^2 - \frac{1}{c^2}\Delta z'^2 .
$$
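The invariance of Δs² is easy to spot-check numerically. The sketch below (hypothetical helper names `lorentz` and `interval_sq`; units with c = 1 unless another value is passed; sample events chosen arbitrarily) implements Eqs. 1.1–1.4 directly and compares Δs² for a pair of events before and after the transformation.

```python
import math


def lorentz(event, u, c=1.0):
    """Eqs. 1.1-1.4: map O's coordinates (t, x, y, z) to O''s (t', x', y', z')."""
    t, x, y, z = event
    a = c / math.sqrt(c * c - u * u)  # the factor alpha; requires |u| < c
    return (a * (t - u * x / c**2), a * (x - u * t), y, z)


def interval_sq(e1, e2, c=1.0):
    """Delta s^2 = Dt^2 - (Dx^2 + Dy^2 + Dz^2) / c^2 between two events."""
    dt, dx, dy, dz = (b - a for a, b in zip(e1, e2))
    return dt * dt - (dx * dx + dy * dy + dz * dz) / c**2


P = (1.0, 0.3, -0.2, 0.5)
Q = (2.5, -0.4, 0.1, 0.0)
u = 0.6
before = interval_sq(P, Q)
after = interval_sq(lorentz(P, u), lorentz(Q, u))
print(before, after)  # equal up to rounding error
```

Transforming a second time with −u undoes the transformation, which gives another quick consistency check.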
The interpretation of Eqs. 1.1–1.4 is that the event recorded by observer O as
(t; x, y, z) is the same event that is recorded by O′ as (t′; x′, y′, z′), provided the
two sets of coordinates are related by these equations (and the x-axis/x′-axis is the
line of mutual motion). These equations make it possible for two observers to agree
on what happens, say, to a particle that both observe moving along trajectories
x(t), y(t), z(t) and x′(t′), y′(t′), z′(t′) (where, of course, the primes do not mean
differentiation).
Remark 1.3. It is sometimes useful to have a four-dimensional space-time in
which all the coordinates have the physical dimension of length (see Problem 1.19
below). For that reason, we shall occasionally replace the time coordinates t and t′
by the “spatialized” times τ = ct and τ′ = ct′. When that is done, the matrix of
the Lorentz transformation takes on a more symmetric appearance:
$$
\begin{aligned}
\tau' &= \alpha\Bigl(\tau - \frac{u}{c}\,x\Bigr),\\
x' &= \alpha\Bigl(-\frac{u}{c}\,\tau + x\Bigr),\\
y' &= y,\\
z' &= z.
\end{aligned}
$$
Thus, the Lorentz transformation imposes two significant modifications on
Newtonian space-time:
(1) It provides a natural absolute velocity—the speed of light, denoted c—
by which we can “temporize” spatial coordinates through Minkowski’s
“mystical formula.” (See the note on p. 5.) The arbitrary velocity v₀ we
used earlier for this purpose is no longer arbitrary, being replaced by c.
The unit of time remains arbitrary, however.
(2) It does not preserve the Euclidean metric on R⁴, in which a point (t; x, y, z)
has the square-norm t² + x² + y² + z²; instead, it preserves the pseudo-metric
given by the quadratic form t² − (x² + y² + z²)/c², which assumes
both positive and negative values. This form, used as a metric,
creates a pseudo-Euclidean space. This space is still flat, since its metric
coefficients (the coefficients in the form) are still constant, but
the square of the space-time interval between two events can be negative.
Such intervals are called spacelike because the spatial portion of the space-time
interval is larger than the time portion; intervals in which the time
portion is larger are called timelike.
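Under the sign convention used here (Δs² = Δt² minus the spatial part divided by c²), a positive Δs² is timelike and a negative one spacelike. A minimal classifier might look like the following; the function name and the tolerance `tol` are our own choices, and the zero case, which the text does not name, is labeled “lightlike” (the usual term for separations that a light ray can connect).

```python
def classify(e1, e2, c=1.0, tol=1e-12):
    """Classify the separation of two events (t, x, y, z) by the sign of Delta s^2."""
    dt, dx, dy, dz = (b - a for a, b in zip(e1, e2))
    s2 = dt * dt - (dx * dx + dy * dy + dz * dz) / c**2
    if s2 > tol:
        return "timelike"   # time portion dominates
    if s2 < -tol:
        return "spacelike"  # spatial portion dominates
    return "lightlike"      # Delta s^2 = 0: the events can be joined by a light ray


origin = (0.0, 0.0, 0.0, 0.0)
print(classify(origin, (1.0, 0.5, 0.0, 0.0)))  # timelike
print(classify(origin, (1.0, 2.0, 0.0, 0.0)))  # spacelike
print(classify(origin, (1.0, 1.0, 0.0, 0.0)))  # lightlike
```

The tolerance absorbs floating-point rounding, so that, for example, a separation such as (1, 0.6, 0.8, 0) with c = 1 is still reported as lightlike.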
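Those who appreciate the power and beauty of vectors will want to state the transformation in
vector form, and this can be done by simply decomposing the spatial portion of
the vectors in R⁴ into vectors parallel to u and perpendicular to u, that is, writing
$$
\mathbf{x} = \frac{\mathbf{x}\cdot\mathbf{u}}{\mathbf{u}\cdot\mathbf{u}}\,\mathbf{u}
+ \Bigl(\mathbf{x} - \frac{\mathbf{x}\cdot\mathbf{u}}{\mathbf{u}\cdot\mathbf{u}}\,\mathbf{u}\Bigr).
$$
Since the time coordinate is easily written in terms of the vector dot product,
the resulting vector equation is
$$
(t';\mathbf{x}') = \Bigl(\alpha\Bigl(t - \frac{\mathbf{u}\cdot\mathbf{x}}{c^2}\Bigr);\;
\mathbf{x} + \Bigl((\alpha-1)\frac{\mathbf{x}\cdot\mathbf{u}}{\mathbf{u}\cdot\mathbf{u}} - \alpha t\Bigr)\mathbf{u}\Bigr),
$$
where α = c/√(c² − u·u). The inverse relation is
$$
(t;\mathbf{x}) = \Bigl(\alpha\Bigl(t' + \frac{\mathbf{u}\cdot\mathbf{x}'}{c^2}\Bigr);\;
\mathbf{x}' + \Bigl((\alpha-1)\frac{\mathbf{x}'\cdot\mathbf{u}}{\mathbf{u}\cdot\mathbf{u}} + \alpha t'\Bigr)\mathbf{u}\Bigr).
$$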
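The vector form can be transcribed almost symbol for symbol. In the sketch below (our own helper names `dot`, `boost`, and `unboost`; vectors as plain Python lists; c = 1 by default), the inverse is obtained by replacing u with −u, which reproduces the inverse relation given in the text, and a velocity along the x-axis recovers Eqs. 1.1–1.2.

```python
import math


def dot(a, b):
    """Euclidean dot product of two 3-vectors given as sequences."""
    return sum(p * q for p, q in zip(a, b))


def boost(t, x, u, c=1.0):
    """Vector-form Lorentz transformation: O's (t; x) -> O''s (t'; x').

    u is the velocity 3-vector of O' relative to O, with |u| < c.
    t' = alpha (t - u.x / c^2),  x' = x + ((alpha - 1)(x.u)/(u.u) - alpha t) u.
    """
    uu = dot(u, u)
    if uu == 0.0:
        return t, list(x)  # no relative motion: identity map
    a = c / math.sqrt(c * c - uu)
    t_new = a * (t - dot(u, x) / c**2)
    k = (a - 1) * dot(x, u) / uu - a * t
    return t_new, [xi + k * ui for xi, ui in zip(x, u)]


def unboost(t_new, x_new, u, c=1.0):
    """Inverse relation: the same formula with u replaced by -u."""
    return boost(t_new, x_new, [-ui for ui in u], c)


# Velocity along the x-axis reproduces Eqs. 1.1-1.2 (alpha = 1.25 for u = 0.6).
print(boost(1.0, [0.3, 0.2, -0.1], [0.6, 0.0, 0.0]))  # approximately (1.025, [-0.375, 0.2, -0.1])
```

Note that this sketch presupposes exactly the convention stressed in the text: both observers’ components refer to bases sharing the direction of u.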
Here, for the first but not the last time, we emphasize that both observers
must be using i = u/|u| as the first vector of an orthonormal basis in order for
the vector equation to be valid. These bases must be such that the coordinates of the
velocity vector u that O ascribes to O′ are the negatives of the coordinates of
the velocity that O′ ascribes to O. Only in that sense is it helpful to say that
the velocity of O relative to O′ is −u. The Lorentz transformation was derived
assuming the two observers are using such coordinates. If either of them rotates
the spatial axes, this vector equation becomes false when written out in terms of
components in the two systems. When we introduce a third observer O″ having a
velocity v relative to O′ (we assume its components are assigned by O′) that is
not parallel to u, we shall find that the relative velocity vectors ±w between O and
O″ do not have this property unless all three of the observers rotate their spatial
coordinates. Composing two Lorentz transformations is thus not just a simple
matter of multiplying the matrices of the two transformations in fixed coordinate
systems used by the three observers. If the three observers are using arbitrary
orthonormal coordinate systems, the complete composition requires multiplying
five matrices rather than two.
Thus, although the vector notation is compact, it must be used carefully. We
remark that the quantities u·u = u² and α = c/√(c² − u²) determine each other
(u·u = u² = c²(α² − 1)/α²), so that we could eliminate one of the two from the
Lorentz transformation, replacing (α − 1)/(u·u) by α²/(c²(α + 1)).
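The identity (α − 1)/(u·u) = α²/(c²(α + 1)) follows from u·u = c²(α² − 1)/α² by cancelling the factor α − 1, and it can be spot-checked numerically. The sketch below is our own, with arbitrarily chosen sample speeds and two values of c; the names `lhs` and `rhs` are hypothetical.

```python
import math


def lhs(u, c):
    """(alpha - 1) / (u . u), computed from the speed u."""
    a = c / math.sqrt(c * c - u * u)
    return (a - 1) / (u * u)


def rhs(a, c):
    """alpha^2 / (c^2 (alpha + 1)), computed from alpha alone."""
    return a * a / (c * c * (a + 1))


for c in (1.0, 3.0):
    for frac in (0.1, 0.5, 0.9, 0.999):
        u = frac * c
        a = c / math.sqrt(c * c - u * u)
        print(c, u, lhs(u, c), rhs(a, c))  # last two columns agree up to rounding
```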
[Figure 1.2: clocks at x = 0 and at x = 1.496 × 10¹¹ m, labeled Earth and Sun, with light traveling between them at c = 2.99 × 10⁸ m/s.]
sign on the post giving its distance from the origin and a clock showing the time.
Since each observer is using the familiar Newtonian ideas of space and time, it is
possible for each to synchronize all the clocks in the system. The way to do so is
to imagine, say, O seated at his own origin holding a clock and looking through a
telescope at one of these signposts, say the one corresponding to coordinate x. If O’s clock
reads time t, then, looking through the telescope at the signpost located at x, O
should see the clock showing time t − |x|/c, to account for the fact that the light
revealing the clock to O left the signpost earlier and required time |x|/c to reach
the origin. If the two clocks show times related in this way, then the clock at x
can be regarded as synchronized with the clock at the origin. In that way, O gets
a concept of “now” that applies all over the universe, although what is happening
“now” at a distant location cannot be known “now.” That information must await
the arrival of a signal from the point with coordinate x, which cannot occur until
time |x|/c after “now.” Observer O′ can perform a similar synchronization, and we
shall always assume that both observers have already equipped every point of space
with such a signpost bearing an accurate clock. The synchronization procedure is
illustrated in Fig. 1.2.
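The synchronization rule is simple arithmetic. As an illustration, assume (as the labels in Fig. 1.2 suggest) a clock at distance 1.496 × 10¹¹ m, roughly the Earth–Sun distance, and c = 2.99 × 10⁸ m/s; the function names below are hypothetical.

```python
C = 2.99e8    # speed of light in m/s, as labeled in Fig. 1.2
D = 1.496e11  # distance of the distant clock in m, as labeled in Fig. 1.2


def expected_reading(t_origin, x, c=C):
    """Reading O should see on a synchronized clock at x when O's own clock reads t_origin."""
    return t_origin - abs(x) / c


def is_synchronized(t_origin, seen_reading, x, c=C, tol=1e-6):
    """True if the clock seen at x lags O's clock by exactly the light travel time |x|/c."""
    return abs(seen_reading - expected_reading(t_origin, x, c)) <= tol


delay = D / C
print(round(delay, 2), "s")                         # 500.33 s, a little over 8 minutes
print(is_synchronized(1000.0, 1000.0 - delay, D))   # True
print(is_synchronized(1000.0, 1000.0, D))           # False: that clock is running ahead
```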
The problem is that, while the x-axis is the same for both observers, those signposts
are moving past each other. Although, at the instant the origins coincide, posts with negative x-values are in
locations coinciding with posts bearing negative x′-values, a post with a negative
x′-value will eventually reach O’s origin and move to a position coinciding with a
post bearing a positive x-value. As it will turn out, the two observers will assign
different x-coordinates and different clock times to all points on the x-axis except
the origin itself, even at the time both call t = 0 = t′. We need to see what
the discrepancy amounts to. We shall see that, at the very least, it amounts to the
impossibility of agreement on what is happening “now” at time t = 0 = t′ anywhere
except at the common origin of the two systems, which coincide at that time (or,
should we say, those times, since time is something they don’t generally agree on).
5.1. Time contraction. Imagine O′ reading the time t on the clock located
at O’s origin and comparing its reading with his own clock. Since O’s origin at
what O′ calls time t′ is located at (−ut′, 0, 0) (that is, the event (t; 0, 0, 0) to O is
the event (t′; x′, 0, 0) = (αt, −αut, 0, 0) to O′), we have the equality t′ = αt. Since
α > 1, O′ perceives O’s clock as “running slow.” That is, O′, reading the clock
at O’s origin and comparing it with his own, says that it assigns too small a value
to the interval between events occurring at O’s origin. There is perfect reciprocity
here: O observes a clock located at O′’s origin as running slow by the same factor
1/α = √(1 − u²/c²). If these two facts appear to contradict each other, note that
they refer to different sequences of “events.” In the first case, we are looking at the
event that O describes as (t; 0, 0, 0) and computing the time coordinate assigned
to it by O′, which is t′ = αt. In the second case, we are looking at the event that
O′ describes as (t′; 0, 0, 0). Thus, in each case, we are describing events at a location regarded as
fixed by one of the observers but not the other.
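The reciprocity can be made concrete with the one-dimensional form of Eqs. 1.1–1.2. In this sketch (c = 1 and u = 0.6 chosen arbitrarily, so α = 1.25; the helper name `lorentz_tx` is hypothetical), one event sits at O’s origin and the other at O′’s origin, and in each case the other observer assigns α times the proper time.

```python
import math


def lorentz_tx(t, x, u, c=1.0):
    """Eqs. 1.1-1.2: O's (t, x) -> O''s (t', x'); calling it with -u gives the inverse map."""
    a = c / math.sqrt(c * c - u * u)
    return a * (t - u * x / c**2), a * (x - u * t)


u = 0.6
# Event at O's origin at O's time 1: O' assigns t' = alpha and x' = -alpha * u.
t_prime, x_prime = lorentz_tx(1.0, 0.0, u)
# Event at O''s origin at O''s time 1: inverting (u -> -u) gives O's time alpha.
t_o, x_o = lorentz_tx(1.0, 0.0, -u)
print(t_prime, x_prime)  # approximately 1.25 -0.75
print(t_o, x_o)          # approximately 1.25 0.75
```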
It should not be thought that the apparent slowness each observes in the other’s
clock is caused by the fact that each observer is moving away from the other. The
same time contraction is observed even when they move toward each other. That
is, it holds for negative times t as well as positive ones. True, given that the two
clocks agree when the origins coincide, O′ will observe a later time on O’s clock
than on his own as the two observers approach each other and an earlier time as
they recede from each other. The time elapsed between two events occurring at
O’s origin (say, for example, the time between two flashes of a beacon located at
that origin) will be shorter as read by O′ from O’s clock than as read from O′’s
own clock, whether the two are getting closer to each other or farther away.
Nor is the difference due to the fact that the light from O’s origin takes some
time to arrive at O′’s origin. To correct for that time lag, we apply the Newtonian
synchronization discussed above. The relativistic computation assumes that O′ has
already corrected for that time lapse by keeping a clock synchronized with his own
at every point of space and recording the time of an event at each point using that
clock. For an event occurring at any given point P that is fixed in O′’s frame, the
time O′ will assign to that event is the time t′ at which a light ray originating at P
at the time of the event reaches O′’s origin, less d/c, where d is the distance from
the point P to O′’s origin. Let us compare the time shown on O’s clock when O
passes through the point P with the time O′ assigns to that event. In other words,
we want to compare O’s clock at the instant when O passes through the point
P with the clock at P synchronized with the one at O′’s origin. The two clocks
will not show the same time when O passes through P. To see why, suppose O
passes through P at the time t₀ in O’s frame. Suppressing the unnecessary y and
z coordinates, O assigns to this event the coordinates (t₀, 0). To O′, that event has
coordinates (t′₀, x′₀) = (αt₀, −αut₀), and so O′ assigns time αt₀ to that event. Since
α > 1, we conclude that O records a smaller time interval between events occurring
at his own origin than O′ records.
5. CONTRACTION OF LENGTH AND TIME 29
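Now let us consider the arrival time of information from that event at O′’s
origin. The event lies at distance α|ut₀| from O′’s origin, so its light needs the further
time α|ut₀|/c to get there. That arrival time will be
$$
\alpha t_0 + \alpha\Bigl|\frac{ut_0}{c}\Bigr| =
\begin{cases}
t_0\sqrt{\dfrac{c+u}{c-u}} & \text{if } t_0 > 0,\\[2ex]
t_0\sqrt{\dfrac{c-u}{c+u}} & \text{if } t_0 < 0.
\end{cases}
$$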
5.2. The relativistic Doppler shift. To pursue this line of thought one step
farther, we can imagine O broadcasting a signal in the form of a sine wave whose
peaks are at times nt₀, n = 0, ±1, ±2, . . . and hence have frequency ν = 1/|t₀|.
Assuming t₀ > 0, we see that those peaks (for n ≥ 0) will reach O′ at times nt₀√((c + u)/(c − u))
and hence the received signals will have frequency ν√((c − u)/(c + u)). This is a “red
shift” since the frequency of the received signals is smaller than the frequency of
the transmitted signal. That, of course, is because the two observers are moving
apart for positive values of time. The reader can verify that for negative values of
t₀, when the two observers were approaching each other, the shift is a “blue shift”
by the factor √((c + u)/(c − u)).¹⁴
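The arrival-time formula and the resulting frequency ratio can be checked numerically. The sketch below (c = 1, u = 0.6 as an arbitrary sample; function names ours) computes the arrival time αt₀ + α|ut₀/c| and compares the relativistic factor √((c − u)/(c + u)) with the classical factor 1 − u/c; the two differ exactly by the factor α, since √((c − u)/(c + u)) = (c − u)/√(c² − u²).

```python
import math


def alpha(u, c=1.0):
    return c / math.sqrt(c * c - u * u)


def arrival_time(t0, u, c=1.0):
    """O''s time at which light from the event (t0; 0) at O's origin reaches O''s origin."""
    a = alpha(u, c)
    return a * t0 + a * abs(u * t0 / c)  # O''s time of the event plus the light travel time


def doppler_ratio(u, c=1.0, receding=True):
    """Received/transmitted frequency ratio for the relativistic shift."""
    r = math.sqrt((c - u) / (c + u))
    return r if receding else 1.0 / r


u = 0.6
print(arrival_time(2.0, u), arrival_time(-2.0, u))          # approximately 4.0 and -1.0
print(doppler_ratio(u), doppler_ratio(u, receding=False))   # approximately 0.5 and 2.0
print((1 - u) * alpha(u))  # classical factor 1 - u/c times alpha: approximately 0.5 again
```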
In summary, we have two time intervals being measured by two different observers.
There is the time between the transmission of the two peaks at O’s origin,
which O measures as |t₀| and O′ as α|t₀|. And there is the time interval between
the reception of the two peaks at O′’s origin, which, if t₀ > 0, O′ measures as
t₀√((c + u)/(c − u)) and O as αt₀√((c + u)/(c − u)) = ct₀/(c − u). (These two events
both occur at the same point of space in O′’s frame, and hence the time interval
that O measures between them is α times the interval measured by O′; that follows
from the symmetry of the principle enunciated above.) From O’s point of view, the
reception times differ by more than the transmission times because O′ is moving
away, and each successive peak has farther to travel in order to reach O′. In fact,
O perceives the Doppler shift in the frequency of the signal O′ is receiving to be by
a factor 1 − u/c when the two observers are moving apart and by a factor 1 + u/c
when they are moving toward each other. This is precisely the classical Doppler
shift. To compare these quantities, we have the following inequalities for a signal
14 This factor characterizes the relativistic Doppler shift, which differs from the acoustic
Doppler shift that applies for signals transmitted as waves in a stationary medium. In the latter
case, the shift in frequency is by the factor 1±u/c for observer speeds u smaller than the speed c of
the signal. The Doppler shift is named after the Austrian physicist Christian Doppler (1803–1853),
who identified it in 1842.
at time t′₀ to point b′ on the same axis at time t′₁ > t′₀. Then O′ will say the
6. COMPOSITION OF PARALLEL VELOCITIES 31
journey required time t′₁ − t′₀ and covered the distance b′ − a′. In contrast, O will
say that the particle passed from the point α(a′ + ut′₀) at time t₀ = α(t′₀ + ua′/c²)
to the point α(b′ + ut′₁) at time t₁ = α(t′₁ + ub′/c²), so that the passage required
time t₁ − t₀ = α(t′₁ − t′₀ + (b′ − a′)u/c²) and covered the distance α(b′ − a′ + u(t′₁ − t′₀)).
In terms of the speed the two observers ascribe to the particle, O′ thinks it is
v = (b′ − a′)/(t′₁ − t′₀), while O thinks it is
$$
w = \frac{u(t'_1 - t'_0) + (b' - a')}{(t'_1 - t'_0) + u(b' - a')/c^2}.
$$
Replacing b′ − a′ by v(t′₁ − t′₀) and cancelling t′₁ − t′₀, the reader can easily verify the fundamental
equation
$$
w = \frac{u + v}{1 + uv/c^2}.
$$
This formula gives the velocity of the particle relative to O, given that its
velocity relative to O′ is v and O′’s velocity relative to O is u. It works only in the
particular case when the two velocities are parallel.
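The derivation can be replayed numerically: transform the two endpoint events of the particle’s trip from O′’s coordinates to O’s with the inverse Lorentz map and divide distance by elapsed time. The helper names and sample numbers below are our own, with c = 1.

```python
import math


def compose(u, v, c=1.0):
    """w = (u + v) / (1 + u v / c^2) for parallel velocities."""
    return (u + v) / (1 + u * v / c**2)


def to_O(t_p, x_p, u, c=1.0):
    """Inverse Lorentz map: O''s (t', x') -> O's (alpha(t' + u x'/c^2), alpha(x' + u t'))."""
    a = c / math.sqrt(c * c - u * u)
    return a * (t_p + u * x_p / c**2), a * (x_p + u * t_p)


def speed_seen_by_O(a_p, t0_p, v, dt_p, u, c=1.0):
    """Particle goes from a' at t0' to b' = a' + v dt' at t1' = t0' + dt' in O''s frame."""
    t0, xa = to_O(t0_p, a_p, u, c)
    t1, xb = to_O(t0_p + dt_p, a_p + v * dt_p, u, c)
    return (xb - xa) / (t1 - t0)


print(compose(0.6, 0.5), speed_seen_by_O(0.2, 0.1, 0.5, 3.0, 0.6))  # both approximately 0.846
print(compose(0.9, 0.9))  # approximately 0.9945, still below c = 1
print(compose(0.6, 1.0))  # approximately 1.0: light moves at c for both observers
```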
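Remark 1.4. The alert reader will have observed the similarity of the formula
$$
w = \frac{u+v}{1 + \frac{uv}{c^2}}
$$
with the addition formula for the hyperbolic tangent, tanh(a + b) = (tanh a + tanh b)/(1 + tanh a tanh b).
This similarity can be exploited by associating with each speed u a length U, referred to a standard length k, making the following
correspondence:
$$
u = c\tanh\frac{U}{k},
\qquad
U = k\,\operatorname{arctanh}\frac{u}{c} = k\ln\sqrt{\frac{c+u}{c-u}}.
$$
(See Problem 1.8 below.)
Under this correspondence we find that
$$
\frac{u+v}{1 + \frac{uv}{c^2}} \;\leftrightarrow\; c\tanh\Bigl(\frac{U}{k} + \frac{V}{k}\Bigr).
$$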
A further advantage of this convenient substitution is that it gives an elegant
expression for the Lorentz magnification factor α = c/√(c² − u²), namely
α = cosh(U/k). That relation could be used to simplify the study of relativistic
velocities in two dimensions. For other reasons, however, we shall work laboriously
through our transformations using only the language of velocities and only later
(mostly in Appendix 1) reveal its astonishing connection with hyperbolic geometry.
Thus, under the pairing U ↔ u = c tanh(U/k), relativistic velocities (positive, negative,
and zero) are in one-to-one correspondence with signed lengths, and the usual geometric
addition of collinear lengths corresponds to the relativistic addition of the
corresponding velocities. Moreover, the speeds ±c correspond to the points ±∞ on
the extended real line.
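The correspondence is easy to test with Python’s `math.atanh`, `math.tanh`, and `math.cosh`. In the sketch below (c = k = 1 by default; the names `length_of` and `velocity_of` are hypothetical), adding the lengths and mapping back reproduces the relativistic sum, and the hyperbolic cosine of the length reproduces α.

```python
import math


def length_of(u, c=1.0, k=1.0):
    """U = k artanh(u/c) = k ln sqrt((c + u)/(c - u))."""
    return k * math.atanh(u / c)


def velocity_of(U, c=1.0, k=1.0):
    """u = c tanh(U/k)."""
    return c * math.tanh(U / k)


u, v = 0.6, 0.5
print(velocity_of(length_of(u) + length_of(v)))  # approximately 0.846, i.e. (u+v)/(1+uv)
print(math.cosh(length_of(u)))                   # approximately 1.25 = alpha for u = 0.6
```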
If that were all there is to the paradox, we might be surprised, but not aston-
ished. But, if we believe in the relativity of motion, we should be flabbergasted,
or so it seems at first sight. After all, in the frame of reference used by Mary, it
was her brother John who went traveling, along with the rest of the universe, and
reversed directions just as the star arrived at her rocket ship. It was his clock that
was running slow. Why isn’t John the younger of the two?¹⁵ This paradox has
been discussed in many popular accounts of relativity. Quite often, the discussion
invokes the general theory of relativity, according to which clocks do run slower in a
gravitational field. However, this phenomenon was discussed by Einstein before he
developed the general theory of relativity; and, as we shall show below, the fact that
Mary winds up younger in this scenario follows from the Lorentz transformation
alone and does not require the general theory or any invocation of inertial forces to
explain it.
At the heart of the matter lies the FitzGerald–Lorentz contraction. The two
siblings agree as to their relative speed, which we have called u and will assume
constant here. But they do not agree on the distance between the point where they
passed each other and the star that later came by Mary’s ship. If John measures
that distance as d, Mary measures it as d/α, where α = c/√(c² − u²). This is the
case on both the outward and return journeys, so that to Mary the journey covered
a total distance 2d/α and since she was traveling at speed u relative to the Earth-
bound frame, the total elapsed time for the round-trip journey was t = 2d/(αu).
That is the amount of proper time elapsed on the rocket ship during the round-trip
journey, whether measured by a mechanical clock on board the ship or by the aging
process in her body. But from John’s point of view, Mary traveled a distance d and
back at speed u, so that his clock records an elapsed time of 2d/u. Since α > 1,
the twins will agree that Mary’s clock shows less time elapsed than John’s.
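For concreteness, here is the bookkeeping with sample numbers that are our own assumption, not the text’s: the star at d = 4 light-years from John and u = 0.8c, working in years and light-years so that c = 1. The function name is hypothetical.

```python
import math


def twin_elapsed_times(d, u, c=1.0):
    """Return (John's elapsed time 2d/u, Mary's elapsed time 2d/(alpha u))."""
    a = c / math.sqrt(c * c - u * u)
    john = 2 * d / u        # John: distance d out and back at speed u
    mary = 2 * d / (a * u)  # Mary: contracted distance d/alpha out and back
    return john, mary


john, mary = twin_elapsed_times(4.0, 0.8)
print(john, mary)  # approximately 10.0 and 6.0 years
```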
This scenario does not have the symmetry that is sometimes erroneously said
to follow from the relativity of motion. To be sure, John also observes that the
distance between two fixed points in Mary’s frame of reference is shorter than
Mary herself measures them to be. But the star and Mary are not relatively fixed.
The distance between them is constantly changing. Taking the primary axis for
both observers to be their line of mutual motion, so that the star lies on that axis,
we see that when the siblings pass each other the first time, Mary is not setting out
to reach the point on her own axis that John regards as being at distance d from
Mary’s origin at that instant. In fact, to make the story an accurate reflection of
relativity, Mary can’t reach any point other than the origin in her own space, since
we imagine that is where she stays located. Her spatial coordinates move along
with her. If the twins agree to synchronize their clocks at time 0 when they pass
each other, the two clocks they are keeping at Mary’s destination will not agree as
to the time when Mary’s journey began. Since the star is fixed as far as John is
concerned, he will say it is also time 0 on the star, which is at distance d. In other
words, if he has placed a clock at that star synchronized with his own and looks at
it through a telescope, he will always see that clock showing time t − d/c, when his
own Earth-bound clock shows time t.
15 If you look on the Internet, you will not have any difficulty finding people, some even
claiming to have advanced degrees in physics, who deny the paradox for exactly that reason. Such
people ought to know better. The reality of the twin paradox has been confirmed by experiment.
Now, what John regards as the event $(t; x, y, z)$ is seen by Mary as the event
$(\alpha(t - ux/c^2); \alpha(x - ut), y, z)$. John, as just remarked, has a clock at the distant
star synchronized with his own. At the instant ($t = 0$) when Mary passes him,
he records the event at the distant star as $(0; d, 0, 0)$, the event being that John’s
local clock at the star reads zero. Mary, however, will record this same event as
$(-\alpha ud/c^2; \alpha d, 0, 0)$, so that in her coordinates John’s clock at her destination read
zero when the star was at the distance $\alpha d$ from her origin, and that was at the
earlier time $-\alpha ud/c^2$. Between that earlier time and the time when she passed
her brother, which both agree was time 0, the star (approaching her at speed $u$) covered
the distance $u \cdot \alpha ud/c^2$, so the distance between Mary and the star
shrank from $\alpha d$ to $\alpha d - \alpha u^2 d/c^2 = \alpha d(1 - u^2/c^2) = d/\alpha$; that is the distance she
plans to travel.
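The bookkeeping in the last paragraph can be checked numerically. Below is a minimal Python sketch, assuming units with c = 1 and the illustrative values u = 0.6 and d = 1 (our choices, not the book's); the function names are hypothetical. It applies the standard-form transformation (t, x) ↦ (α(t − ux/c²), α(x − ut)) to the event "John's star clock reads zero" and then lets the star approach Mary at speed u until her time 0.

```python
import math

def magnification(u, c=1.0):
    """The factor alpha = c / sqrt(c^2 - u^2)."""
    return c / math.sqrt(c**2 - u**2)

def to_mary(t, x, u, c=1.0):
    """Standard-form Lorentz transformation: John's (t, x) -> Mary's (t', x')."""
    a = magnification(u, c)
    return a * (t - u * x / c**2), a * (x - u * t)

u, d, c = 0.6, 1.0, 1.0          # illustrative values, c = 1
alpha = magnification(u, c)      # 1.25 for u = 0.6c

# The event "John's star clock reads 0", at (t, x) = (0, d) for John.
t_star, x_star = to_mary(0.0, d, u, c)

# In Mary's frame the star approaches at speed u.  From t' = t_star to
# t' = 0 it covers u * (0 - t_star), leaving the distance she must travel.
remaining = x_star - u * (0.0 - t_star)

print(t_star, -alpha * u * d / c**2)   # the two values agree
print(x_star, alpha * d)
print(remaining, d / alpha)
```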
True, Mary can regard herself as remaining in one place, at rest, while her
brother moves at speed −u, but she does not measure the distance that John
moved as d. When the twins meet for the second time, Mary will say that John
traveled only the distance 2d/α, and her clock will show elapsed time 2d/(αu), just
as we found when analyzing the situation from John’s point of view. The difference
is that the star did not move in John’s frame of reference, whereas it did move in
Mary’s frame of reference.
To summarize: Mary, whose distance to the turnaround point is changing over
time, is the one who stays younger, because her biological clock is slow. As some
writers express the matter, the traveling twin moves out of the frame of the Earth-
bound twin, then moves back in again.16 It is the bookkeeping involved in making
those transitions that causes the loss of time for the traveling twin.
And yet. . . one still has the feeling of something-wrong-here. After all, accord-
ing to Mary, the clock her Earth-bound brother uses has been running slow for the
entire time of the journey. Why then does it show a later time when the twins meet
for the second time? How can the tortoise, who always runs slower than the hare,
nevertheless win the race? How can someone go into a revolving door behind you
and emerge ahead of you? It seems to be a sleight-of-hand trick, like the conjuror
who asks you to draw a card from a deck and hold onto it, then suddenly pulls it
out of the deck that you drew it from. Somehow, when you weren’t looking, the
card got switched, but how?
To make the matter as plain as possible, “the card was switched” before the
journey even started. In fact, Mary cannot synchronize her clock with any accurate
clock on the star because the two are in relative motion. As long as we focus our
attention on the Earth-centered frame of reference, we do not notice the switch,
and then we are surprised when we look at Mary’s clock and discover that it is
running slow as she arrives at the star.
The algebraically simplified form in which we derived the Lorentz transformation
requires a common event as origin, an event that can be regarded as having
both space and time coordinates equal to zero in both systems. For this hypothetical
journey, let O, using time-space coordinates $(t, x)$, be John, and let O′, using
coordinates $(t', x')$, be Mary. There are three events that we use as anchors here:
Event 0, when Mary passes Earth heading to the star; Event 1, when she arrives at
16 This statement assumes that Mary began and ended the journey standing on the Earth; it
ignores the (perhaps tiny) portion of the journey during which acceleration was needed. We prefer
to picture Mary as constantly whizzing around and just happening to pass her brother going in
opposite directions at two different times, while hurtling past him. That way, we avoid any need
to discuss acceleration.
8. RELATIVISTIC TRIANGLES 35
the star and immediately heads back toward Earth at the same speed; and Event 2,
when she passes Earth for the second time. We have to change origins at Event 1
in order to use Lorentz coordinate transformations on the return journey. For any
event E occurring at John’s origin at time $t$, we have O-coordinates $(t, 0)$ and
O′-coordinates $(t', x')$, where $t' = \alpha t$ and $x' = -\alpha ut$. Thus, during the outbound
portion of the journey Mary does record later times for all events at John’s origin than
John records for them. In that sense, Mary can say that John’s clock is slow. But
Event 1 does not occur at John’s origin. We have focused our attention on the wrong
place and by so doing missed the “card switch.” To John, Event 1 has coordinates,
say, $(t_1, d)$, where $d = ut_1$. According to the Lorentz transformation, Mary assigns
coordinates $(t_1', d')$ to that event, where $d' = \alpha(d - ut_1) = 0$ (that is, it occurs at
Mary’s origin, as we already knew) and $t_1' = \alpha(t_1 - ud/c^2) = \alpha t_1(1 - u^2/c^2) = t_1/\alpha$,
which is smaller than $t_1$. It is the ambiguity in the meaning of the term simultaneous
that causes the surprise. By John’s measurements, Event 1 occurred at time
$t_1 = d/u$; by Mary’s, it occurred at the earlier time $t_1' = t_1/\alpha$. It is the same event,
of course, namely Mary’s arrival at her destination.
Starting with Event 1, John and Mary both need to “zero out” their coordinate
systems for Mary’s return journey, and the velocity $u$ needs to become $-u$. The
Lorentz transformation that is in effect for this part of the journey is as follows:
$$t' - t_1' = \alpha\left((t - t_1) + \frac{u(x - d)}{c^2}\right),$$
$$x' = \alpha\bigl(x - d + u(t - t_1)\bigr).$$
Then, when Event 2 occurs (Mary arrives back at Earth), John’s coordinates
for this event will be $t = 2d/u = 2t_1$, $x = 0$, so that Mary’s will be
$$t' = t_1' + \alpha\left(t_1 - \frac{ud}{c^2}\right) = \frac{t_1}{\alpha} + \alpha t_1 - \frac{\alpha u^2 t_1}{c^2} = \alpha t_1\left(\frac{1}{\alpha^2} + 1 - \frac{u^2}{c^2}\right) = \frac{2t_1}{\alpha}$$
and $x' = \alpha(-d + ut_1) = 0$. Thus, we can rigorously compute that the time on Mary’s
clock will be $1/\alpha$ times the time on John’s clock: Mary will be younger by this
factor.
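The two-leg computation can be reproduced step by step. This is a sketch under the same assumptions as in the text (Mary's coordinates re-anchored at Event 1, velocity −u on the return leg), with c = 1 and illustrative values u = 0.6, d = 3 of our own choosing; `outbound` and `inbound` are hypothetical helper names.

```python
import math

def magnification(u, c=1.0):
    return c / math.sqrt(c**2 - u**2)

def outbound(t, x, u, c=1.0):
    """Outbound leg: Event 0 is the common origin, Mary moves at +u."""
    a = magnification(u, c)
    return a * (t - u * x / c**2), a * (x - u * t)

def inbound(t, x, t1, d, t1_mary, u, c=1.0):
    """Return leg, re-anchored at Event 1 = (t1, d) for John and
    (t1_mary, 0) for Mary; Mary now moves at -u relative to John."""
    a = magnification(u, c)
    t_mary = t1_mary + a * ((t - t1) + u * (x - d) / c**2)
    x_mary = a * (x - d + u * (t - t1))
    return t_mary, x_mary

u, d, c = 0.6, 3.0, 1.0
alpha = magnification(u, c)
t1 = d / u                                        # John's time of Event 1

t1_mary, x1_mary = outbound(t1, d, u, c)          # Event 1 for Mary
t2_mary, x2_mary = inbound(2 * t1, 0.0, t1, d, t1_mary, u, c)  # Event 2

print("John's elapsed time:", 2 * t1)
print("Mary's elapsed time:", t2_mary, "=", 2 * t1 / alpha)
```

Re-anchoring at Event 1 is what lets the simple standard-form transformation be used on both legs; the printed times differ by the factor α.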
Switching to our hare-and-tortoise analogy, we recall that the tortoise won the
race because the hare took a nap. We are the ones who were caught napping in this
race, focusing our attention on the Earth-bound clock, since the episode began at
Event 0 and ended at Event 2, both of which took place on the Earth. We should
have looked at the clocks at Mary’s destination.
To phrase the matter in terms of the conjuring-trick analogy, the assertion that
O’s clock is running slow is the kind of distraction the conjuror uses to get you
to focus on the card you thought was in your hand (John’s clock on the Earth)
but which was in fact always in the deck (John’s clock on the star), not in your
hand. The switching is revealed when your attention finally focuses instead on the
deck of cards. If you would like to see the trick performed in slow motion, work
Problem 1.2 below.
8. Relativistic Triangles
With the advent of special relativity, the clean, simple Newtonian system, with
its absolute, universal space and time, became untenable. It became impossible to
determine the angle made by two lines at a given time, since observers in relative
motion to each other may agree on the location of one point at a given time,
while disagreeing as to the location of a second point at that same instant of time.
As a result, two observers will probably disagree as to what points constitute the
vertices of a given triangle at a given time. Although each individual observer
is using his own proper space, which is Euclidean, when that observer’s findings
are communicated to a second observer, the lines that the first observer regards
as fixed are moving when viewed by the second observer, and that motion can
cause perpendicular directions to go askew. The only kind of perpendicularity
agreed upon by two observers moving at constant speed and in a constant direction
relative to each other is one particular system of coordinate axes consisting of the
line of their mutual motion (that is, the line from each to the other) and the planes
perpendicular to it. Even in that case, they do not agree about the unit of length
along the first axis.
The absence of absolute simultaneity makes it difficult to discuss the “triangle
whose vertices are P, Q, and R.” While those points may be fixed in the spatial
coordinates of Observer O, if O and O′ are moving relative to one another, Observer
O′ will see a triangle of very different shape. Thus, triangles of position are slippery
objects, impossible to define in an observer-independent way. What we are going
to call relativistic velocity triangles are much better behaved in this regard.
We shall almost always find it easier to assume that suitable rotations of coordinates
have already been performed by O and O′, that is, that O and O′ are already
using the “privileged” coordinate systems in which they share the first spatial axis.
A vector equation used by O to express a relationship between vectors measured
only by O (for example, Maxwell’s laws) is independent of the orthonormal basis
used by O, since rotations preserve the vector operations of addition and scalar
multiplication, as well as the dot and cross product on $\mathbb{R}^3$. But O cannot rotate
axes and correctly reinterpret data received from O′. Transmitting the rotated data
back to O′ will not produce a rotation of the original data. We shall have occasion
to write the laws of mechanics and electromagnetism in vector form and exhibit the
transformation of those laws from one observer to another. Such equalities are not
truly vector equalities, since they hold only when the vectors are expressed in the
privileged bases of the two observers’ systems, that is, those that share a common
spatial axis along the direction of mutual motion.
As an example of what we have been talking about, notice that O regards
the lines whose equations are $y = 2x$, $z = 0$ and $x = -2y$, $z = 0$ as mutually
perpendicular. But at any given instant $t'$, O′ considers these (since $x = \alpha(x' + ut')$
and $y = y'$) to be the lines with equations $y' = 2\alpha x' + 2\alpha ut'$, $z' = 0$ and
$\alpha x' + \alpha ut' = -2y'$, $z' = 0$, which are not perpendicular unless $u = 0$. They have
slopes $m_1 = 2\alpha$ and $m_2 = -\alpha/2$, and the usual condition for perpendicularity
($m_1 m_2 = -1$) implies $\alpha = 1$, which is true only when $u = 0$.
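To see the skewing of perpendicular lines numerically, the following Python sketch (function names ours) places two points on each of the lines y = 2x and x = −2y, which are at rest for O, and records where O′ sees them at a single instant t′, using x = α(x′ + ut′) and y = y′. The slopes it measures multiply to −α² rather than −1. The instant t′ = 0.7 and the speeds are arbitrary choices (c = 1).

```python
import math

def seen_by_O_prime(x, y, t_prime, u, c=1.0):
    """Position, at O′'s instant t′, of a point fixed at (x, y) in O's frame.
    Uses x = alpha (x′ + u t′) and y = y′ from the standard-form transformation."""
    alpha = c / math.sqrt(c**2 - u**2)
    return x / alpha - u * t_prime, y

def slope(p, q):
    return (q[1] - p[1]) / (q[0] - p[0])

def slope_product(u, t_prime=0.7, c=1.0):
    """Product of the slopes that O′ measures for y = 2x and x = -2y."""
    a = [seen_by_O_prime(x, y, t_prime, u, c) for x, y in [(0, 0), (1, 2)]]
    b = [seen_by_O_prime(x, y, t_prime, u, c) for x, y in [(0, 0), (-2, 1)]]
    return slope(*a) * slope(*b)

for u in (0.0, 0.6, 0.9):
    alpha = 1 / math.sqrt(1 - u**2)
    print(u, slope_product(u), -alpha**2)   # the last two columns agree
```

The instant t′ drops out of the slopes, as it should: every point is shifted by the same amount −ut′.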
$$w = \sqrt{w_1^2 + w_2^2}$$
17 We remark here that this angle is measured by O using Euclidean measuring instruments,
even though the resulting triangle will not have sides and angles that satisfy Euclidean rela-
tionships. There is no paradox here. An angle is a physically and geometrically dimensionless
quantity representing an amount of rotation, and rotation is absolute in all three geometries: el-
liptic/spherical, parabolic (Euclidean), and hyperbolic. A right angle is one-fourth of a complete
rotation. It forms a natural unit of angular measure, and is universally assigned the numerical
value π/2. The reason for using that seemingly arbitrary value comes from analysis; Taylor series
become very cumbersome if any other measure is assigned to a right angle. We shall call the
resulting values of all angles simply the numerical values of those angles. We do not like to use
the term radian measure, since it suggests what is true only in Euclidean geometry—that the nu-
merical value of the central angle subtended by a circular arc is the ratio of the length of the arc
to the length of the radius and is independent of the radius. In hyperbolic and elliptic geometry,
the numerical value of the angle is not normally equal to that ratio, and the ratio itself varies with
the radius of the circle.
18 Since O is moving along O′’s negative first spatial axis, the angle η must be measured
clockwise from that axis to the line of motion that O′ ascribes to O″. That accounts for the
negative sign in the first coordinate of the location of O″’s origin, as seen by O′.
Remark 1.5. This last equality relates the three magnification factors in the
Lorentz transformations corresponding to the three velocities. That is, if
$\alpha = c/\sqrt{c^2 - u^2}$, $\beta = c/\sqrt{c^2 - v^2}$, and $\gamma = c/\sqrt{c^2 - w^2}$, then
$$\gamma = \alpha\beta\left(1 - \frac{uv\cos\eta}{c^2}\right).$$
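The relation can be spot-checked numerically. The sketch below assumes the relativistic law of cosines in the form $w^2 = (u^2 + v^2 - 2uv\cos\eta - u^2v^2\sin^2\eta/c^2)/(1 - uv\cos\eta/c^2)^2$, computes $w$ from $u$, $v$, $\eta$, and compares $c/\sqrt{c^2 - w^2}$ with $\alpha\beta(1 - uv\cos\eta/c^2)$. The values $u = 0.8c$, $v = 0.6c$, $\eta = 110^\circ$ are arbitrary, and the function names are ours.

```python
import math

def mag(s, c=1.0):
    """Magnification factor c / sqrt(c^2 - s^2) for a speed s < c."""
    return c / math.sqrt(c**2 - s**2)

def third_side(u, v, eta, c=1.0):
    """Relativistic law of cosines: the speed w opposite the angle eta."""
    y = math.cos(eta)
    num = u**2 + v**2 - 2*u*v*y - (u*v*math.sin(eta))**2 / c**2
    return math.sqrt(num) / (1 - u*v*y / c**2)

u, v, eta, c = 0.8, 0.6, math.radians(110), 1.0
w = third_side(u, v, eta, c)
gamma_direct = mag(w, c)
gamma_remark = mag(u, c) * mag(v, c) * (1 - u*v*math.cos(eta) / c**2)
print(w, gamma_direct, gamma_remark)   # w < c; the two gammas agree
```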
We have now established the simple but fundamental fact that if O″’s velocity
relative to O′ is constant in direction and constant (less than c) in magnitude, as
judged by O′, while O′’s velocity relative to O is similarly constant in direction and
constant (less than c) in magnitude relative to O, then O″’s velocity relative to O
is also constant in direction and constant (less than c) in magnitude. Therefore
the composition of two relativistic velocities is a relativistic velocity, and hence a
suitable (privileged) pair of coordinate systems that can be chosen by O and O″
should be related to each other by equations of the simple form derived above for
the Lorentz transformation. Making that fact computable and verifying that the
implied composition is associative if a fourth observer O‴ comes along will occupy
the last few sections of the present chapter. In the meantime, we wish to develop
the trigonometry of these relativistic velocity triangles.
Recalling our remark above that if $u = c\tanh(U/k)$, then $\alpha = \cosh(U/k)$, and
hence $\alpha u/c = \sinh(U/k)$, we have, upon replacing $\beta$ and $\gamma$ by $\cosh(V/k)$ and
$\cosh(W/k)$ (and $\beta v/c$ by $\sinh(V/k)$), the fundamental relation
$$\cosh\frac{W}{k} = \cosh\frac{U}{k}\cosh\frac{V}{k} - \sinh\frac{U}{k}\sinh\frac{V}{k}\cos\eta.$$
This last equality is precisely the law of cosines in a hyperbolic plane whose radius
of curvature is $k\sqrt{-1}$ (see Appendix 1). Putting the matter another way, under
the correspondence $u \leftrightarrow U$, where $u = c\tanh(U/k)$ and $U = k\ln\sqrt{(c + u)/(c - u)}$,
the relativistic velocity triangle with sides u, v, w corresponds to a triangle with
sides U, V, W, and if the latter is regarded as being in the hyperbolic plane,
then the two triangles have the same angles. Thus, at least as far as the law of
cosines goes, the trigonometry of a relativistic velocity triangle is identical to the
trigonometry of a triangle in a hyperbolic plane. Who could have guessed, two
centuries ago, when this geometry was invented, that it would turn out to describe
a world of physical laws that had not yet been imagined?
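The correspondence $u \leftrightarrow U$ can be exercised directly. The sketch below (Python, with c = k = 1 assumed; any positive k gives the same result) converts two speeds to $U = k\operatorname{arctanh}(u/c)$, applies the hyperbolic law of cosines for curvature $-1/k^2$, and compares $\cosh(W/k)$ with $\gamma$ from Remark 1.5. The speeds and angle are arbitrary illustrative values.

```python
import math

K = 1.0   # the constant k of the text; any positive value works
C = 1.0   # speed of light in the units used here

def rapidity(s):
    """U = k artanh(s/c), the inverse of s = c tanh(U/k)."""
    return K * math.atanh(s / C)

u, v, eta = 0.8, 0.6, math.radians(110)
U, V = rapidity(u), rapidity(v)

# Hyperbolic law of cosines (curvature -1/k^2) for the side W opposite eta.
cosh_W = (math.cosh(U / K) * math.cosh(V / K)
          - math.sinh(U / K) * math.sinh(V / K) * math.cos(eta))

# Remark 1.5: gamma = alpha * beta * (1 - u v cos(eta) / c^2).
alpha = 1 / math.sqrt(1 - (u / C)**2)
beta = 1 / math.sqrt(1 - (v / C)**2)
gamma = alpha * beta * (1 - u * v * math.cos(eta) / C**2)

W = K * math.acosh(cosh_W)
w = C * math.tanh(W / K)      # map the hyperbolic side back to a speed
print(cosh_W, gamma, w)       # cosh_W equals gamma, and w < c
```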
The line from O to O″ lies in the plane of O, O′, and O″, and, as will be shown
in the optional material that now follows, it makes an angle ξ with the positive
x-axis, where
$$\cos\xi = \frac{w_1}{w} = \frac{u - v\cos\eta}{w(1 - uv\cos\eta/c^2)} = \frac{u - v\cos\eta}{\sqrt{u^2 - 2uv\cos\eta + v^2 - u^2v^2\sin^2\eta/c^2}},$$
$$\sin\xi = \frac{w_2}{w} = \frac{v\sin\eta}{w\alpha(1 - uv\cos\eta/c^2)} = \frac{v\sin\eta}{\alpha\sqrt{u^2 - 2uv\cos\eta + v^2 - u^2v^2\sin^2\eta/c^2}}.$$
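A quick consistency check on these formulas: since $(u - v\cos\eta)^2 + v^2\sin^2\eta\,(1 - u^2/c^2)$ equals the expression under the radical, $\cos^2\xi + \sin^2\xi$ should be exactly 1. The sketch below (c = 1; sample triangles chosen arbitrarily; function name ours) confirms this numerically.

```python
import math

def xi_from_u_v_eta(u, v, eta, c=1.0):
    """cos(xi) and sin(xi) from the formulas just displayed."""
    alpha = c / math.sqrt(c**2 - u**2)
    D = u**2 - 2*u*v*math.cos(eta) + v**2 - (u*v*math.sin(eta))**2 / c**2
    cos_xi = (u - v*math.cos(eta)) / math.sqrt(D)
    sin_xi = v*math.sin(eta) / (alpha * math.sqrt(D))
    return cos_xi, sin_xi

for u, v, deg in [(0.8, 0.6, 110), (0.3, 0.95, 40), (0.99, 0.5, 170)]:
    cx, sx = xi_from_u_v_eta(u, v, math.radians(deg))
    print(u, v, deg, cx**2 + sx**2)   # 1 up to rounding in every case
```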
9. COMPOSITION OF RELATIVISTIC VELOCITIES AS A BINARY OPERATION* 39
agree on as to the magnitude and direction of a velocity. Only after we get those
conventions, which we shall do using hyperbolic plane trigonometry, will we be in
a position to write the concatenation of two relativistic velocities in a form that
computers can accept and work with.
There is a second difficulty as well, again connected with the special choice of
axes used in deriving the Lorentz transformation. Because of the special choice of
coordinates that we made, we got a simple “standard-form” matrix to represent the
transformation. It is symmetric, and ten of its twelve nondiagonal entries are equal
to zero. But this matrix takes account of only the relative speed of the motion; it
assumes that the axes have already been adjusted to take into account the direction
of that motion. If two velocities that we are composing are not parallel, multiplying
the corresponding matrices does not yield the matrix of the composite velocity,
since the simplified matrices relating O and O to O have to be computed in axes
that are rotated relative to those used for the matrix relating O to O . After we
solve the first problem, thereby making it possible to say in a computational sense
that composition of relativistic velocities is an associative operation, we will be
in a position to compose the matrices of two standard-form Lorentz transformations
and get the standard-form matrix of the composition of the two velocities that
they correspond to. (As the reader may already have guessed, the procedure is to
sandwich each matrix between two rotations of R4 representing the alignment of
the first spatial axis of each observer with the lines of relative motion of the other
two.)
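To make the second difficulty concrete, here is a small pure-Python sketch. It assumes the general boost matrix for an arbitrary velocity direction, acting on (ct, x, y, z), which reduces to the standard-form matrix when the velocity lies along the first axis; that general formula is not derived in this section. Composing a boost along the first axis with a perpendicular one yields a matrix that still preserves the form c²t² − x² − y² − z² but is not symmetric, so it is not the matrix of any pure boost: it contains a spatial rotation (often called the Wigner rotation), which is what the rotations in the “sandwich” must absorb. Parallel boosts, by contrast, compose to a symmetric matrix. The speeds 0.6c and 0.8c and all helper names are our own choices.

```python
import math

def identity4():
    return [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transpose(A):
    return [list(row) for row in zip(*A)]

def close(A, B, tol=1e-12):
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(4) for j in range(4))

def boost(b):
    """Pure boost for velocity b = (bx, by, bz), in units of c, acting on
    (ct, x, y, z).  For b = (u/c, 0, 0) it is the standard-form matrix."""
    b2 = sum(x * x for x in b)
    g = 1.0 / math.sqrt(1.0 - b2)
    L = identity4()
    L[0][0] = g
    for i in range(3):
        L[0][i + 1] = L[i + 1][0] = -g * b[i]
        for j in range(3):
            if b2 > 0:
                L[i + 1][j + 1] += (g - 1.0) * b[i] * b[j] / b2
    return L

MINKOWSKI = [[1.0, 0, 0, 0], [0, -1.0, 0, 0], [0, 0, -1.0, 0], [0, 0, 0, -1.0]]

B1 = boost((0.6, 0.0, 0.0))     # O′ relative to O, along the first axis
B2 = boost((0.0, 0.8, 0.0))     # O″ relative to O′, perpendicular to that
M = matmul(B2, B1)              # O's coordinates -> O″'s coordinates

print(close(M, transpose(M)))                                         # False
print(close(matmul(matmul(transpose(M), MINKOWSKI), M), MINKOWSKI))   # True
P = matmul(boost((0.3, 0.0, 0.0)), B1)                                # parallel case
print(close(P, transpose(P)))                                         # True
```

The (0, 0) entry of M is the product of the two magnification factors, matching Remark 1.5 with η a right angle.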
19 This is because circles have an absolute meaning in all three geometries, and arc length
on a circle is rotation-invariant, just like the central angles subtended by arcs on the circle. On a
given circle in any of these geometries, central angles and the arcs they subtend are proportional.
The value π/2 for a right angle is based on the Euclidean case, where the measure of an angle
really is the ratio of the arc it subtends to the radius. While this value is only a convention, it is
extremely practical and will be called the measure of a right angle (not the radian measure, since
this ratio is not the same for circles of different radii in elliptic and hyperbolic geometry).
[Figure: a relativistic velocity triangle. The angle η is at the top vertex, between the sides v and u; the base w joins the two lower vertices, where the angles are ζ and ξ.]
it and the angle between them. The angle measured by the observer located at
each vertex is the angle between his lines of sight to the other two observers. Since
the velocities are assumed constant, that angle does not change over time, and it is
therefore taken as the definition of the angle of the velocity triangle at that vertex.
Each observer knows the two sides and included angle where he is located, and
each pair of observers agrees about the magnitude of the side that each measures
as the speed of the other. The set of three sides and three angles that results
will be called a relativistic velocity triangle. If one of the observers computes the
magnitude of the side opposite to his vertex using the Euclidean law of cosines, the
result will not be the value that the other two observers agree on; indeed, the
velocity computed in this way may well exceed c. That means only
that two objects in an observer’s space can move faster than light relative to each
other, as judged by the observer, but not relative to the fixed frame of axes used by
the observer. Since the sides of these triangles do not represent lengths, we need
not think of them as situated in the Euclidean space familiar from geometry. The
assignments of sides and angles that we have made are what three actual observers
would measure in reality, however, and we can transfer them into a geometric
space where the sides are lengths via the correspondence $u \leftrightarrow U$, where
$u = c\tanh(U/k)$ and $U = k\operatorname{arctanh}(u/c)$, mentioned earlier, retaining the
assignment of angles already made.
They now form ordinary triangles in a three-dimensional space, but the geometry
of that space is hyperbolic rather than Euclidean.
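The claim that the Euclidean computation can exceed c is easy to illustrate. The sketch below (c = 1; values and names ours) compares the Euclidean law of cosines with the relativistic law in the form $w^2 = (u^2 + v^2 - 2uv\cos\eta - u^2v^2\sin^2\eta/c^2)/(1 - uv\cos\eta/c^2)^2$ for two observers each seen at 0.9c, with an angle of 120° between them.

```python
import math

def euclidean_side(u, v, eta):
    """Newtonian/Euclidean law of cosines."""
    return math.sqrt(u**2 + v**2 - 2*u*v*math.cos(eta))

def relativistic_side(u, v, eta, c=1.0):
    y, s = math.cos(eta), math.sin(eta)
    return math.sqrt(u**2 + v**2 - 2*u*v*y - (u*v*s)**2 / c**2) / (1 - u*v*y / c**2)

u = v = 0.9
eta = math.radians(120)
print(euclidean_side(u, v, eta))      # about 1.56, i.e. faster than light
print(relativistic_side(u, v, eta))   # about 0.99, below c
```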
In Newtonian mechanics, an observer looking at the origins of the three systems
at any given time would see them at the vertices of a physical triangle having the
angles shown in the velocity triangle and sides proportional to those shown. The
relations between the parts of the observed triangle will satisfy the Euclidean law
of cosines:
$$w^2 = u^2 + v^2 - 2uv\cos\eta.$$
This law of cosines was derived in Einstein’s 1905 paper on special relativity.
Assuming that it holds for each angle in the triangle, it allows us to express the
side opposite that angle in terms of the angle and the two sides adjacent to it.
The law of cosines is the basic rule for all trigonometry. Since a triangle is
to be determined by any two sides and the included angle (Euclid’s fundamental
hypothesis about congruence of triangles), trigonometry faces the task of expressing
all six parts of a triangle in terms of a given angle and its two adjacent sides.
Obviously, it suffices to find an expression for the other two angles, since the third
side is already determined by the law of cosines itself, and the two adjacent sides
have to be given anyway. That task is achieved by the following formulas.
Theorem 1.3. The cosines and sines of angles ξ and ζ in Fig. 1.3 are given
in terms of angle η and sides u and v by the following formulas:
$$\cos\xi = \frac{u - v\cos\eta}{\sqrt{u^2 + v^2 - 2uv\cos\eta - u^2v^2\sin^2\eta/c^2}} = \frac{u - v\cos\eta}{w(1 - uv\cos\eta/c^2)}, \tag{1.6}$$
$$\sin\xi = \frac{v\sin\eta}{\alpha\sqrt{u^2 + v^2 - 2uv\cos\eta - u^2v^2\sin^2\eta/c^2}} = \frac{v\sin\eta}{\alpha w(1 - uv\cos\eta/c^2)}, \tag{1.7}$$
$$\cos\zeta = \frac{v - u\cos\eta}{\sqrt{u^2 + v^2 - 2uv\cos\eta - u^2v^2\sin^2\eta/c^2}} = \frac{v - u\cos\eta}{w(1 - uv\cos\eta/c^2)}, \tag{1.8}$$
$$\sin\zeta = \frac{u\sin\eta}{\beta\sqrt{u^2 + v^2 - 2uv\cos\eta - u^2v^2\sin^2\eta/c^2}} = \frac{u\sin\eta}{\beta w(1 - uv\cos\eta/c^2)}. \tag{1.9}$$
Here $\beta = c/\sqrt{c^2 - v^2}$. Notice that, when $\eta = \pi/2$ (the case of a right triangle),
the expressions for $\cos\xi$ and $\cos\zeta$ agree with the Euclidean definitions, being $u/w$
and $v/w$ respectively. The sines are not the same as for a Euclidean triangle, being
off by a factor of $\alpha$ for $\sin\xi$ and $\beta$ for $\sin\zeta$.
Proof. We take as our starting point the assumption that the relativistic law
of cosines holds for each of the angles of a triangle, and we begin by showing how
to derive Eq. (1.6) for the cosine of the angle ξ from this assumption. Applying the
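$$w^2 = \frac{u^2 + v^2 - 2uvy - u^2v^2(1 - y^2)/c^2}{\left(1 - \dfrac{uvy}{c^2}\right)^2},$$
$$v^2 = \frac{u^2 + w^2 - 2uwx - u^2w^2(1 - x^2)/c^2}{\left(1 - \dfrac{uwx}{c^2}\right)^2},$$
$$v^2\left(1 - \frac{uwx}{c^2}\right)^2 = u^2 + w^2 - 2uwx - u^2w^2(1 - x^2)/c^2. \tag{1.12}$$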
To sum up, we shall say that a relativistic velocity triangle with sides u, v, and
w and opposite angles ζ, ξ, and η, respectively, is one for which all three relativistic
These laws imply the following relation between the cosines of the angles ξ and
η opposite sides v and w, respectively:
$$\cos\xi = \frac{u - v\cos\eta}{w\left(1 - \dfrac{uv\cos\eta}{c^2}\right)}. \tag{1.13}$$
All the relations among the velocities and angles measured by our three ob-
servers can be deduced from any three of them—which may even be the three angles
ξ, η, and ζ—using these trigonometric formulas. What makes these triangles rel-
ativistic rather than Euclidean is the fact that each vertex is associated with an
observer, whose Euclidean/Newtonian measurements of the two sides and the angle at
that vertex are used to assign measures to these parts of the triangle. Each pair of
observers agrees about the velocity (length) of the side joining their vertices, but
there is no such agreement about any of the angles. They are arbitrarily defined
as the angles measured by their corresponding observers. In general none of the
six parts of the triangle is agreed to by all three of the observers. Each observer,
we emphasize, is using Euclidean geometry to measure the two sides and the angle
at his vertex. That observer will consequently use the Euclidean law of cosines
and Euclidean trigonometry to determine the opposite side and the other two an-
gles, and each of the other two observers will generally disagree with him about all
three of them. These values inferred from Euclidean geometry are therefore to be
discounted when the observers reconcile their observations. By making use of the
relativistic trigonometry just discussed, each observer can deduce the measurements
the other two are making, and they can then agree on all their relative velocities.
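As an illustration of how each observer can reconstruct the others' measurements, the sketch below (Python, c = 1, values arbitrary, names ours) starts from the two sides u, v and the included angle η known to the observer at that vertex, computes w, ξ, and ζ from Theorem 1.3, and then checks that the relativistic law of cosines applied at the other two vertices returns the sides opposite them. It assumes, as the proof of Theorem 1.3 does, that the relativistic law of cosines holds at every vertex.

```python
import math

def law_of_cosines(a, b, angle, c=1.0):
    """Relativistic law of cosines: the side opposite `angle`,
    given the two adjacent sides a and b."""
    y, s = math.cos(angle), math.sin(angle)
    return math.sqrt(a**2 + b**2 - 2*a*b*y - (a*b*s)**2 / c**2) / (1 - a*b*y / c**2)

def other_angles(u, v, eta, c=1.0):
    """xi and zeta from (1.6)-(1.9); atan2 combines each cosine with its sine."""
    alpha = c / math.sqrt(c**2 - u**2)
    beta = c / math.sqrt(c**2 - v**2)
    w = law_of_cosines(u, v, eta, c)
    denom = w * (1 - u*v*math.cos(eta) / c**2)
    xi = math.atan2(v*math.sin(eta) / (alpha*denom), (u - v*math.cos(eta)) / denom)
    zeta = math.atan2(u*math.sin(eta) / (beta*denom), (v - u*math.cos(eta)) / denom)
    return w, xi, zeta

u, v, eta = 0.8, 0.6, math.radians(110)
w, xi, zeta = other_angles(u, v, eta)

# The observer at xi knows u, w and xi; the observer at zeta knows v, w and zeta.
print(law_of_cosines(u, w, xi), v)      # recovers v
print(law_of_cosines(v, w, zeta), u)    # recovers u
print(math.degrees(xi + eta + zeta))    # less than 180
```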
(1) To solve a triangle given two angles and a side, we use the law of sines
given in Problem 1.10 below. This law for relativistic velocity triangles is
equivalent to the hyperbolic law of sines stated above.
(2) To find the angles given all three sides, see Problem 1.9 below.
(3) To solve a triangle given only its three angles ξ, η, and ζ opposite sides v,
w, and u respectively, we combine the algebraic equations for the cosines
of the angles (Mathematica is highly recommended for this exercise), and
get the following formulas, which hold provided the expressions under the
radicals are positive, as they must be if the sum of the three angles is less
than two right angles.
$$u = \frac{c\sqrt{\cos^2\xi + \cos^2\eta + \cos^2\zeta + 2\cos\xi\cos\eta\cos\zeta - 1}}{\cos\zeta + \cos\xi\cos\eta},$$
$$v = \frac{c\sqrt{\cos^2\xi + \cos^2\eta + \cos^2\zeta + 2\cos\xi\cos\eta\cos\zeta - 1}}{\cos\xi + \cos\eta\cos\zeta},$$
$$w = \frac{c\sqrt{\cos^2\xi + \cos^2\eta + \cos^2\zeta + 2\cos\xi\cos\eta\cos\zeta - 1}}{\cos\eta + \cos\zeta\cos\xi}.$$
It is easy to verify that u, v, and w, as determined from these equa-
tions, are all smaller than c, provided all the denominators are positive, as
must be the case if ξ, η, and ζ are to be the angles of a relativistic velocity
triangle. In fact, any three positive angles ζ, ξ, and η, the sum of whose
measures is less than π, form the angles of a unique relativistic velocity
triangle whose respective opposite sides are given by the three formulas
above. The condition ζ + ξ + η < π guarantees that the expression under
the radical and all three denominators are positive. (See Problem 1.17.)
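The angle-only solution in item (3) can be tested by a round trip. The sketch below (c = 1; the angles 40°, 70°, 50° are arbitrary, with sum below 180°; the function name is ours) computes u, v, w from the angles and then recovers cos ξ from Eq. (1.13).

```python
import math

def sides_from_angles(xi, eta, zeta, c=1.0):
    """u, v, w opposite zeta, xi, eta (requires xi + eta + zeta < pi)."""
    cx, ce, cz = math.cos(xi), math.cos(eta), math.cos(zeta)
    root = math.sqrt(cx**2 + ce**2 + cz**2 + 2*cx*ce*cz - 1)
    return (c * root / (cz + cx*ce),
            c * root / (cx + ce*cz),
            c * root / (ce + cz*cx))

xi, eta, zeta = map(math.radians, (40, 70, 50))   # angle sum 160 degrees
u, v, w = sides_from_angles(xi, eta, zeta)
print(u, v, w)                                    # all below c = 1

# Round trip: (1.13) with c = 1 should give back the cosine of xi.
cos_xi = (u - v*math.cos(eta)) / (w * (1 - u*v*math.cos(eta)))
print(cos_xi, math.cos(xi))
```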
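10.1. Right triangles. When O and O″ are moving along mutually perpendicular
lines as judged by O′, these formulas naturally simplify. Taking η = π/2,
we get the following equalities:
$$w = \sqrt{u^2 + v^2 - u^2v^2/c^2} = \frac{c}{\alpha\beta}\sqrt{\alpha^2\beta^2 - 1}, \tag{1.16}$$
$$\cos\xi = \frac{u}{\sqrt{u^2 + v^2 - u^2v^2/c^2}} = \frac{u\alpha\beta}{c\sqrt{\alpha^2\beta^2 - 1}}, \tag{1.17}$$
$$\sin\xi = \frac{v}{\alpha\sqrt{u^2 + v^2 - u^2v^2/c^2}} = \frac{v\beta}{c\sqrt{\alpha^2\beta^2 - 1}}, \tag{1.18}$$
$$\cos\zeta = \frac{v}{\sqrt{u^2 + v^2 - u^2v^2/c^2}} = \frac{v\alpha\beta}{c\sqrt{\alpha^2\beta^2 - 1}}, \tag{1.19}$$
$$\sin\zeta = \frac{u}{\beta\sqrt{u^2 + v^2 - u^2v^2/c^2}} = \frac{u\alpha}{c\sqrt{\alpha^2\beta^2 - 1}}. \tag{1.20}$$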
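The two forms of each of (1.16) through (1.20) can be cross-checked numerically. Here we use the speeds 1.2×10⁵ km/s and 5×10⁴ km/s that label the relativistic triangle in the figure for Example 1.1, and c ≈ 3×10⁵ km/s as quoted there; the function name is ours.

```python
import math

C = 3.0e5   # km/s, the approximate speed of light quoted in Example 1.1

def right_triangle(u, v, c=C):
    """w, cos/sin of xi and zeta for eta = pi/2, via (1.16)-(1.20)."""
    alpha = c / math.sqrt(c**2 - u**2)
    beta = c / math.sqrt(c**2 - v**2)
    root = math.sqrt(alpha**2 * beta**2 - 1)
    w = c * root / (alpha * beta)                                  # (1.16)
    return (w,
            u * alpha * beta / (c * root), v * beta / (c * root),  # (1.17), (1.18)
            v * alpha * beta / (c * root), u * alpha / (c * root), # (1.19), (1.20)
            alpha)

u, v = 1.2e5, 5.0e4                       # km/s
w, cos_xi, sin_xi, cos_zeta, sin_zeta, alpha = right_triangle(u, v)
print(w)                                   # about 1.28e5 km/s
print(math.sqrt(u**2 + v**2 - u**2 * v**2 / C**2))   # first form of (1.16)
print(cos_xi**2 + sin_xi**2, alpha * sin_xi - cos_zeta)
```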
Example 1.1. The vector addition of velocities and displacements was discussed
by Galileo in the seventeenth century. To use a modification of his example,
suppose a ship is moving at a rate of 12 km per hour relative to the Earth. If
you walk from the port side to the starboard side at 5 km per hour, then your
actual speed is $\sqrt{12^2 + 5^2} = 13$ km per hour, in a direction that makes an angle
$\arctan(5/12) \approx 22.6^\circ$ with the direction the ship is moving. Relativity changes this
relation. Since the speed of light is about 300,000 km per second, if Observer O sees
10. PLANE TRIGONOMETRY* 47
[Figure: Example 1.1. Left: Newtonian velocity triangle with legs 12 km/h and 5 km/h, resultant 13 km/h, angle 22.6°. Right: relativistic velocity triangle with legs 1.2×10⁵ km/s and 5×10⁴ km/s, resultant 1.28×10⁵ km/s, angle 24.4°. Compass directions N and E are marked.]
fact, since it is a fact of hyperbolic geometry. Nevertheless, we shall give two direct
proofs of it, one here for right triangles, and one later for general triangles.
Theorem 1.4. Let ξ and ζ be the two acute angles of a relativistic velocity
triangle containing a right angle. Then ξ + ζ < π/2.
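Proof. By formulas (1.18) and (1.19), we have $\alpha\sin\xi = \cos\zeta = \sin(\pi/2 - \zeta)$.
Since $\alpha = c/\sqrt{c^2 - u^2} > 1$, it follows that $\sin\xi < \sin(\pi/2 - \zeta)$, and since both $\xi$
and $\pi/2 - \zeta$ lie between 0 and $\pi/2$, where the sine is increasing, we get
$\xi < \pi/2 - \zeta$, that is, $\xi + \zeta < \pi/2$, as asserted.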
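Theorem 1.4 can also be probed by brute force over many random right triangles. The sketch below (c = 1; the random sampling is only an illustration, not a proof; names ours) computes ξ and ζ from (1.17) and (1.19) and tracks the largest sum it encounters.

```python
import math
import random

def acute_angles(u, v, c=1.0):
    """xi and zeta of a right relativistic velocity triangle, via (1.17) and (1.19)."""
    w = math.sqrt(u**2 + v**2 - u**2 * v**2 / c**2)
    return math.acos(u / w), math.acos(v / w)

random.seed(0)
largest_sum = 0.0
for _ in range(10_000):
    u, v = random.uniform(0.01, 0.99), random.uniform(0.01, 0.99)
    xi, zeta = acute_angles(u, v)
    largest_sum = max(largest_sum, xi + zeta)

print(math.degrees(largest_sum))   # stays below 90 degrees
```

The sum comes closest to 90° when both speeds are small, where the geometry is nearly Euclidean.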
It is not difficult to show (see Problem 1.13) that the sum of the three angles
ξ, η, and ζ in any relativistic velocity triangle is less than two right angles.
In the special case when u = v, we get what O regards as an isosceles right
triangle, with acute angles ξ given by
$$\xi = \arccos\frac{\alpha}{\sqrt{\alpha^2 + 1}} = \arccos\frac{1}{\sqrt{1 + 1/\alpha^2}}.$$
Obviously, ξ is a decreasing function of α. The limiting cases are u ↑ c, in which
ξ ↓ 0, and u ↓ 0, in which ξ ↑ π/4, the latter being the case when all observers
are at rest relative to one another, so that the geometry becomes Euclidean, and
simultaneity becomes observer-independent.
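The closed form for the isosceles case can be compared with (1.17) evaluated at v = u, which reduces to cos ξ = 1/√(2 − u²/c²). The sketch (c = 1, names ours) also shows ξ decreasing as u grows.

```python
import math

def isosceles_xi(u, c=1.0):
    """Acute angle of the right triangle with u = v: arccos(alpha / sqrt(alpha^2 + 1))."""
    alpha = c / math.sqrt(c**2 - u**2)
    return math.acos(alpha / math.sqrt(alpha**2 + 1))

def xi_from_1_17(u, c=1.0):
    """Same angle from (1.17) with v = u."""
    return math.acos(u / math.sqrt(2 * u**2 - u**4 / c**2))

for u in (1e-3, 0.5, 0.9, 0.999):
    print(u, math.degrees(isosceles_xi(u)), math.degrees(xi_from_1_17(u)))
# The angle falls from just under 45 degrees toward 0 as u approaches c.
```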
Remark 1.7. Although in general three observers do not agree about the ve-
locities in the triangle whose vertices they form, there is a range of vertex angles
η in an isosceles triangle for which there is a speed u (depending on η) on each of
the two sides of the angle η such that the observer O at the vertex of the angle
will agree with O and O as to their mutual speed. The range of angles η for which
such a speed u exists is
$$60^\circ = \arccos(1/2) < \eta < \arccos(1/3) < 70.528779365509308630755^\circ.$$
Over this range, the corresponding speed u decreases from c to 0. (See Problem
1.11 below.) Observers O and O do not agree with O about any of the angles,
and neither of them thinks this is an isosceles triangle. (For example, O does not
agree with O and O that their relative speed is v = u.)
As an example, take $u = (3\sqrt{2}/5)c$ and $\cos\eta = 5/12$. With those values,
$$\eta = \arccos(5/12) \approx 65.3757^\circ.$$
For this case, all three observers agree as to the relative speed of O and O,
namely $\sqrt{0.84}\,c \approx 0.9165c$ (that is, $w^2 = 0.84c^2$).
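The numbers in this example can be verified directly. The sketch below (speeds in units of c) computes the vertex observer's Euclidean estimate of w² and the relativistic value from the law of cosines; both come out to 0.84, so the agreed speed is √0.84 c ≈ 0.9165c.

```python
import math

u = v = 3 * math.sqrt(2) / 5          # speeds in units of c
eta = math.acos(5 / 12)
y, s = math.cos(eta), math.sin(eta)

w2_euclid = u**2 + v**2 - 2*u*v*y                                  # vertex observer's estimate
w2_rel = (u**2 + v**2 - 2*u*v*y - (u*v*s)**2) / (1 - u*v*y)**2     # relativistic law

print(math.degrees(eta))             # about 65.3757
print(w2_euclid, w2_rel)             # both 0.84, so w = sqrt(0.84) c
```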
really work out? We take up that question in Section 12 below and exhibit its
computational implementation with Mathematica Notebook 5 in Volume 3.
The second question is the associativity of the law of relativistic composition
of velocities. Suppose we have four observers, say A, B, C, and D, and that
the velocity of B relative to A is u, that of C relative to B is v, and that of D
relative to C is w. Then the velocity of C relative to A can be represented as
$u +_L v$, where this notation indicates the composition of velocities given by the
relativistic velocity triangle determined by A, B, and C, as above. And then, by
our computations, the velocity of D relative to A must be $(u +_L v) +_L w$. On
the other hand, the velocity of D relative to B is $v +_L w$, and consequently, if we
believe the computations that have been performed above, the velocity of D relative
to A must also be $u +_L (v +_L w)$. Since these two velocities are obviously the same,
and we don’t think we have made any mistakes in logic during our derivations, we
have to conclude that the binary operation $+_L$ is an associative operation:
$$(u +_L v) +_L w = u +_L (v +_L w).$$
This relation would appear to settle the question, but there are subtleties in-
volved when we try to implement this rule computationally. The velocities here are
given in three distinct coordinate systems, used by A, B, and C. In what sense can
they be added at all? How can you add coordinates in one set of axes to coordi-
nates in a completely different set? We can tame the problem a bit by passing to
a three-dimensional hyperbolic space, in which case the associative law becomes a
geometric theorem:
Theorem 1.5. If the sides and angles of three faces of a tetrahedron satisfy the
relations of relativistic velocity triangles, then (1) the sides and angles of the fourth
face are determined and (2) the fourth face is also a relativistic velocity triangle.
The proof of this theorem forms Section 5 of Appendix 1 in Volume 2. The fact
that Lorentz transformations are closed under composition and that their composi-
tion is an associative operation makes them into a group, called the Lorentz group.
It is a six-dimensional Lie group.
11.1. Associativity*. The fact that the fourth face of the velocity tetrahe-
dron is a relativistic velocity triangle when the other three are amounts to the
associative law for the relativistic composition of velocities. It is quite obvious
that velocity 0 is an identity for this composition, and it seems that the inverse
of velocity u should be −u. And indeed it is, when −u is defined as the velocity
of O relative to O′, given that u is the velocity of O′ relative to O. As we have
already pointed out above, the associativity of this verbally described operation is
completely obvious, even without any computations. Thus we have turned the set
of physically possible relative velocities in three-dimensional space into a group. We
emphasize, however, that both the elements of this group and the group operation
have so far only verbal descriptions. Even though we think of these velocities as
vectors, they do not transform between observers the way vectors do.
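The subtlety can be made concrete with a short Python sketch. It uses the standard vector formula for relativistic velocity addition (Einstein's formula, a textbook result rather than anything derived in this book), which tacitly assumes that each pair of observers uses parallel axes. Speeds are in units of c, and the three sample velocities are arbitrary choices lying in one plane. The two groupings give different answers. The mismatch reflects the rotation of axes (the Thomas rotation) that any consistent composition of non-collinear velocities has to account for.

```python
import math


def dot(a, b):
    """Euclidean dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def einstein_add(u, v):
    """Vector relativistic velocity addition, speeds in units of c.

    u: velocity of frame B relative to frame A (axes assumed parallel),
    v: velocity of an object relative to B.
    Returns the object's velocity relative to A.
    """
    gamma_u = 1.0 / math.sqrt(1.0 - dot(u, u))
    uv = dot(u, v)
    k = gamma_u / (1.0 + gamma_u) * uv
    return [(ui + vi / gamma_u + k * ui) / (1.0 + uv) for ui, vi in zip(u, v)]


u = [0.6, 0.0, 0.0]
v = [0.0, 0.6, 0.0]
w = [0.5, 0.0, 0.0]

left = einstein_add(einstein_add(u, v), w)    # (u + v) + w
right = einstein_add(u, einstein_add(v, w))   # u + (v + w)
gap = math.sqrt(sum((a - b) ** 2 for a, b in zip(left, right)))

print("(u+v)+w =", left)
print("u+(v+w) =", right)
print("difference:", gap)   # clearly nonzero for these in-plane velocities
```

For collinear velocities the formula reduces to (u + v)/(1 + uv/c²), which is associative. That is why the difficulty never arises in one spatial dimension.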
Most importantly, although it is trivial that composition of mappings is an
associative operation, there is a subtlety involved in the present case: Before two
observers apply the Lorentz transformation as we defined it (using a 4×4 matrix) to
convert their space-time coordinates, they must both, in general, perform a rotation
50 1. TIME, SPACE, AND SPACE-TIME
of their spatial axes. The fact that the actual transformation is a “sandwich”
consisting of two rotation matrices with a standard-form Lorentz matrix between
them makes it far from obvious that the composition is associative. Our reasons for
believing that it is, at this point, are physical and geometric. It would be desirable
to have an algebraic proof of the fact. That is difficult to do on the basis of our
definition of a Lorentz transformation; we give an “empirical” verification of it in
Theorem 1.6 below.
To get a computable associative operation out of the relativistic addition of
velocities, the triples we need are not the components of velocity in some particular
frame of reference, but rather the relative speed of two observers and the polar
angles along which the two observe each other relative to fixed frames they are
using. Using what we have proved by means of the velocity tetrahedron, we are
now in a position to state the associative law formally and verify that it “computes”
as it should. Even then, we will not quite have made the velocities into a group,
because of the singularity of the polar coordinate system at the origin.
But at least we can allow our three observers to use whatever coordinates they like,
and as long as none of the velocities is zero, we can say exactly how each needs to
rotate its axes in order to communicate with the other two using a vector formula.
We are not going to give a formal proof of the procedure, however, but rather rely
on an “empirical” proof, using Mathematica to generate random data and verify
the associativity of the composition. (The formal verification takes too long, even
for Mathematica.)
Since we wish to discuss the composition without invoking a privileged coordi-
nate system, we shall make a “sandwich” out of the relative velocity of two observers
O and O′, writing it as (θ, u, ϕ), where θ is the angle measured counterclockwise
from O’s first coordinate axis to the line of motion O observes O′ to be traversing,
u is the speed of that motion, and ϕ is the angle measured counterclockwise from
O′’s first coordinate axis to the line of motion O′ observes O to be traversing.
Similarly, let (χ, v, ψ) represent the relative velocity of O′ and O″. The speeds u and
v are positive numbers between 0 and c and the four angles are any real numbers,
equality being taken modulo 2π. We wish to define a binary operation “+L ” that
we shall call Lorentz addition for these two triples of real numbers.
[Figure: the relativistic velocity triangle, with the y1-axis and the angles χ, ϕ, and η marked.]
leaves the Lorentz sum of two velocities invariant. One transformation having this
property is the mapping (θ, ϕ) → (d + θ, d + ϕ) for any fixed angle d (see Prob-
lem 1.18). Once we have the appropriate transformation, we can take a “quotient
space” modulo it and get the two-dimensional object that we need, except for that
troublesome problem with the identity.
Defining the addition that we need is not complicated, given that we know how
to solve the relativistic velocity triangle in Fig. 1.5.
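Here is a minimal Python sketch of what solving the triangle involves. It assumes, as is standard, that a relativistic velocity triangle can be treated as a triangle in the hyperbolic plane whose side lengths are the rapidities artanh(u/c), artanh(v/c), artanh(w/c). It then applies the hyperbolic law of cosines. Speeds are in units of c, and the function name is our own. The two angles it returns are labeled only by the sides opposite them. We have not checked which of them the book calls ξ and which ζ, nor its sign conventions.

```python
import math


def solve_velocity_triangle(u, v, eta):
    """Third side and remaining angles of a relativistic velocity triangle.

    Assumption: the triangle is a hyperbolic triangle whose side lengths are
    the rapidities atanh(speed), with speeds in units of c.  u and v are the
    speeds along the two sides meeting at the vertex with angle eta (radians).
    Returns (w, angle opposite v, angle opposite u).
    """
    U, V = math.atanh(u), math.atanh(v)
    # Hyperbolic law of cosines gives the rapidity W of the third side.
    cosh_w = math.cosh(U) * math.cosh(V) - math.sinh(U) * math.sinh(V) * math.cos(eta)
    W = math.acosh(cosh_w)

    def angle(opposite, side1, side2):
        # Same law, solved for the angle between side1 and side2.
        num = math.cosh(side1) * math.cosh(side2) - math.cosh(opposite)
        return math.acos(num / (math.sinh(side1) * math.sinh(side2)))

    return math.tanh(W), angle(V, U, W), angle(U, V, W)


eta = math.radians(90)
w, a_v, a_u = solve_velocity_triangle(0.6, 0.8, eta)
print("w =", w)
print("angles (deg):", math.degrees(a_v), math.degrees(a_u), math.degrees(eta))
print("angle sum (deg):", math.degrees(a_v + a_u + eta))
```

When u = 0.6c and v = 0.8c meet at a right angle, the γ factor of w is the product of the other two γ factors, which agrees with the relation γ = αβ quoted in Problem 1.22. The three angles add up to less than π.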
Definition 1.1. The Lorentz composition +L of the speeds u and v in the
directions indicated in Fig. 1.5 is the speed w in the direction shown, where we
“sandwich” each speed between the two angles from the observer’s first spatial axis
in the directions of the other two observers. As a formula, replacing the vector
velocity u by the triple (θ, u, ϕ), and similarly for v and w, we get
$$(\theta, u, \phi) +_L (\chi, v, \psi) = (\theta + \xi,\ w,\ \psi - \zeta).$$
Since we have formulas for ξ, w, and ζ, the associativity of this operation ought
to be straightforward to verify.
We state it as a formal proposition:
Theorem 1.6. The composition (θ, u, ϕ) +L (χ, v, ψ) is an associative oper-
ation.
Proof. There is a complication resulting from the orientation of the triangle.
If the angle η = ϕ − χ is larger than a straight angle, that is, |ϕ − χ| > π, then the
roles of O and O″ will interchange, and instead of (θ + ξ, w, ψ − ζ), we would need
(θ − ξ, w, ψ + ζ). That will also happen if |ϕ − χ| is less than π but the y1-axis is inside
the angle η. (Since we represent angles as numbers between 0 and 2π, the angle χ
will be larger than ϕ in this case.) To handle this complication, we need to multiply
ξ and ζ by sgn (π − |ϕ − χ|). Once that is done, although the computations are very
tedious—so tedious that Mathematica will probably run out of memory before it
can actually compute the Lorentz sum if the data are given as infinitely precise real
numbers—it is possible to demonstrate convincingly through numerical examples
that this operation is indeed associative. Mathematica Notebook 3 of Volume 3 will
provide that convincing proof by generating as many random inputs as one likes.
In that notebook, if you input two angles and a speed (the latter in the form ac,
where a is a real number in the range [0, 1)) for each of two triples, the addition
“+L” (called ladd in Mathematica Notebook 3) can compute the Lorentz sum of
the two relative velocities these triples represent, with the caveat that infinitely
precise real numbers as data may well lead to a long computation requiring an
inordinate amount of computer memory. If you input the data as finite-precision
floating-point numbers (again, the speeds must bear the letter c as a suffix, even
if they are zero), Mathematica will perform the computation in short order. The
last command in the notebook checks the associative property by generating five
triples of triples representing relative velocities and showing that the composition
of three such velocities is the same, no matter how they are grouped.
In dozens of trials, this program always wrote
“Out[4] = {{0,0,0},{0,0,0},{0,0,0},{0,0,0},{0,0,0}}”.
Thus, by empirical verification, the operation ladd is associative.
Although the theoretical basis of this program does not include the case of
speed 0, the program will accept an input of 0.0c for either speed, and the output
will be the other term in the sum, except that the two angles in it will be reduced
modulo 2π.
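The notebook itself is written in Mathematica. As a rough Python analogue of its random-data check, one can encode a relative velocity as a planar “sandwich” R_p M(u) R_q, compose two of them by multiplying the matrices, and read a new triple off the product from its time column and time row. This is an illustration under our own conventions, not the book's ladd. The encoding and the names to_matrix, to_triple, and ladd_sketch are hypothetical, and their angles need not coincide with the book's (θ, u, ϕ). Because the triples are composed through matrix products, associativity is inherited from matrix multiplication. What the random trials actually test is that reading a triple off a product is consistent. Speeds are in units of c.

```python
import math
import random


def matmul(a, b):
    """Product of two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def rot(psi):
    """Rotation through psi in the x1-x2 plane; time and x3 axes fixed."""
    c, s = math.cos(psi), math.sin(psi)
    return [[1, 0, 0, 0], [0, c, -s, 0], [0, s, c, 0], [0, 0, 0, 1]]


def boost(u):
    """Standard-form Lorentz matrix for speed u (units of c) along the first axis."""
    a = 1.0 / math.sqrt(1.0 - u * u)
    return [[a, -a * u, 0, 0], [-a * u, a, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]


def to_matrix(t):
    """Hypothetical encoding of a triple (p, u, q) as the sandwich R_p M(u) R_q."""
    p, u, q = t
    return matmul(rot(p), matmul(boost(u), rot(q)))


def to_triple(m):
    """Recover (p, u, q) from a planar sandwich matrix, assuming u > 0."""
    g = m[0][0]                              # gamma factor of the speed
    gu = math.hypot(m[1][0], m[2][0])        # gamma * u, from the time column
    p = math.atan2(-m[2][0], -m[1][0])       # angle read off the time column
    q = math.atan2(m[0][2], -m[0][1])        # angle read off the time row
    return (p, gu / g, q)


def ladd_sketch(t1, t2):
    """Compose t1 (applied first) with t2 by multiplying sandwiches."""
    return to_triple(matmul(to_matrix(t2), to_matrix(t1)))


def same(t1, t2, tol=1e-8):
    """Compare triples, with angles taken modulo 2*pi."""
    ang = lambda a, b: abs(math.remainder(a - b, 2 * math.pi))
    return ang(t1[0], t2[0]) < tol and abs(t1[1] - t2[1]) < tol and ang(t1[2], t2[2]) < tol


def random_triple(rng):
    return (rng.uniform(0, 2 * math.pi), rng.uniform(0.05, 0.95), rng.uniform(0, 2 * math.pi))


rng = random.Random(2017)
failures = 0
for _ in range(1000):
    a, b, c = (random_triple(rng) for _ in range(3))
    if not same(ladd_sketch(ladd_sketch(a, b), c), ladd_sketch(a, ladd_sketch(b, c))):
        failures += 1
print("associativity failures in 1000 random trials:", failures)
```

As in the text, this decoding breaks down when a composite speed is 0, because p and q are then not determined separately.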
where u is the speed of the second observer relative to the first and
$\alpha = (1 - u^2/c^2)^{-1/2}$. More generally, since each observer can perform an orthogonal
transformation on the spatial part of R4, a Lorentz transformation is one whose
matrix L is such that there exist two rotation matrices R1 and R2 on the spatial
portion of R4 (leaving the time axis fixed) such that R1 L R2 has this form.
Or, more practically, we can define the Lorentz transformations to be the set of all
matrices of the form R1 M R2, where M is a matrix in standard form and R1 and R2
vary over all rotations of the spatial portion of space-time. When we use this
description as a definition, it is by no means obvious that the Lorentz transformations
are even closed under the operation of composition.
Mathematicians generally define a Lorentz transformation on R4 to be a linear
transformation that preserves the quadratic form $\rho^2 - (x^1)^2 - (x^2)^2 - (x^3)^2$. With that
definition, it is obvious that the composition of two Lorentz transformations is a Lorentz
transformation. Those concerned with logical rigor are then faced with two choices:
(1) demonstrating that the transformations fitting the description just given are
precisely the ones that preserve the given quadratic form; (2) showing directly that
if $L_1 = R_{11} M_1 R_{12}$ and $L_2 = R_{21} M_2 R_{22}$ are both Lorentz transformations, then
there are rotations $S_1$ and $S_2$ of the given form such that $S_1 L_2 L_1 S_2$ has the given
form. The second is the route we shall follow, using Mathematica to avoid having to do
messy computations by hand.
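The easy half of this comparison can be checked numerically. The Python sketch below builds two matrices of the sandwich form R1 M R2 with randomly chosen rotations. Speeds are in units of c, so ρ is c times the time. This is our own illustration. It confirms that the two matrices and their product satisfy LᵀGL = G with G = diag(1, −1, −1, −1), that is, that they preserve the quadratic form. It says nothing about the harder converse, that the product is again a sandwich. That converse is what route (2) requires.

```python
import math
import random

G = [[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, -1]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def transpose(a):
    return [list(row) for row in zip(*a)]


def spatial_rotation(axis, angle):
    """4x4 matrix: Rodrigues rotation about the given axis, time axis fixed."""
    n = math.sqrt(sum(x * x for x in axis))
    x, y, z = (comp / n for comp in axis)
    c, s = math.cos(angle), math.sin(angle)
    C = 1.0 - c
    r = [[c + x * x * C, x * y * C - z * s, x * z * C + y * s],
         [y * x * C + z * s, c + y * y * C, y * z * C - x * s],
         [z * x * C - y * s, z * y * C + x * s, c + z * z * C]]
    return [[1.0, 0.0, 0.0, 0.0]] + [[0.0] + row for row in r]


def boost(u):
    """Standard-form Lorentz matrix for speed u (units of c) along the first axis."""
    a = 1.0 / math.sqrt(1.0 - u * u)
    return [[a, -a * u, 0, 0], [-a * u, a, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]


def random_sandwich(rng):
    """R1 M(u) R2 with random spatial rotations and a random speed u < c."""
    def rand_rot():
        axis = [rng.gauss(0, 1) for _ in range(3)]
        return spatial_rotation(axis, rng.uniform(0, 2 * math.pi))
    return matmul(rand_rot(), matmul(boost(rng.uniform(0.0, 0.99)), rand_rot()))


def preserves_form(L, tol=1e-9):
    """True if L^T G L = G, i.e. L preserves rho^2 - (x1)^2 - (x2)^2 - (x3)^2."""
    lhs = matmul(transpose(L), matmul(G, L))
    return all(abs(lhs[i][j] - G[i][j]) < tol for i in range(4) for j in range(4))


rng = random.Random(1)
L1, L2 = random_sandwich(rng), random_sandwich(rng)
print(preserves_form(L1), preserves_form(L2), preserves_form(matmul(L2, L1)))
```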
We now take up the computational problem just posed: getting the standard-
form matrix of the composition of two velocities knowing relative speeds and lines
of sight between the pairs of observers. The “sandwich” made by putting the
two transformations between two rotations of a special form, which we created to
compose relativistic velocities, makes this process computable, though a bit messy.
In our derivation of the Lorentz transformation between X and Y (O and O′, as
we called them at the time), we assumed that the $x^1$- and $y^1$-axes coincide at
all times and that Y is moving in the positive direction with speed u along this axis, as
observed by X, without any rotation. This last means that $x^2 = y^2$ and $x^3 = y^3$ for
any event E having coordinates $(\rho; x^1, x^2, x^3) = (\rho; \xi)$ and $(\sigma; y^1, y^2, y^3) = (\sigma; \eta)$,
as measured by X and Y respectively. The FitzGerald–Lorentz contraction factor
for this transformation is 1/α, where $\alpha = c/\sqrt{c^2 - u^2}$. In the three-dimensional
velocity space used in common by X and Y, the velocity is u = (u, 0, 0), when both
observers use the axes just described.
In these two coordinate systems, the Lorentz transformation corresponding to
the velocity u can be written as the matrix equation
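$$\begin{pmatrix} \sigma \\ y^1 \\ y^2 \\ y^3 \end{pmatrix} = \begin{pmatrix} \alpha & -\alpha u/c & 0 & 0 \\ -\alpha u/c & \alpha & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \rho \\ x^1 \\ x^2 \\ x^3 \end{pmatrix},$$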
The inverse of the matrix of the Lorentz transformation in these coordinates is
obtained by replacing u with −u, as we would expect.
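Here is a quick numerical check of the standard-form matrix, written as a Python sketch in units where c = 1. In these units ρ is c times X's time and u is a fraction of c, and the sample values are arbitrary. The matrix for u times the matrix for −u is the identity. An event on Y's world-line x1 = (u/c)ρ, x2 = x3 = 0 is assigned y1 = 0 and σ = ρ/α.

```python
import math


def boost(u):
    """Standard-form matrix acting on (rho, x1, x2, x3); rho = c*time, u in units of c."""
    a = 1.0 / math.sqrt(1.0 - u * u)
    return [[a, -a * u, 0, 0], [-a * u, a, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def apply(m, x):
    return [sum(m[i][j] * x[j] for j in range(4)) for i in range(4)]


u = 0.6
# The inverse is obtained by replacing u with -u, so this product is the identity.
product = matmul(boost(u), boost(-u))

# An event on Y's world-line as seen by X: x1 = u * rho, x2 = x3 = 0.
rho = 2.0
sigma, y1, y2, y3 = apply(boost(u), [rho, u * rho, 0.0, 0.0])
print("Y's coordinates of the event:", sigma, y1, y2, y3)
# y1 vanishes (the event is at Y's origin) and sigma = rho / alpha (time dilation).
```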
Lorentz transformations are not the only linear transformations that preserve
the space-time interval. An obvious group of linear transformations with this prop-
erty corresponds to the set of matrices of the form
$$\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & r_{11} & r_{12} & r_{13} \\ 0 & r_{21} & r_{22} & r_{23} \\ 0 & r_{31} & r_{32} & r_{33} \end{pmatrix},$$
form as
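$$\begin{pmatrix} \tau \\ (z')^1 \\ (z')^2 \\ (z')^3 \end{pmatrix} = \begin{pmatrix} \gamma & -\gamma w/c & 0 & 0 \\ -\gamma w/c & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \rho \\ (x')^1 \\ (x')^2 \\ (x')^3 \end{pmatrix},$$
where $\gamma = c/\sqrt{c^2 - w^2}$, and the transformation is equivalently described by the set
of four equations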
These relations are the result of composing the two Lorentz transformations
with the rotations in the correct sequence. In particular, the 4 × 4 matrix of the
composite transformation in the proper coordinate systems for O and O″ is the
product of five 4 × 4 matrices, whose entries are rational functions of the entries in
the two Lorentz matrices and the sines and cosines of the angles of rotation. The
full computation is intimidatingly complicated, but a judicious use of Mathematica
will verify its correctness.
$$L_\gamma = R_{-\zeta}\, L_\beta\, R_\theta\, L_\alpha\, R_{-\xi},$$
where Rψ represents a rotation of R4 that leaves the time axis and the third spatial
axis fixed and acts on the plane of the other two spatial axes as a rotation through
angle ψ about the third spatial axis. Moreover,
with his principal axis “aimed” at Y . In the present discussion, X begins with his principal axis
“aimed” at Z.
13. ROTATIONAL MOTION AND A NON-EUCLIDEAN GEOMETRY* 57
product we are discussing is the correct one. The algebra, however, is complicated,
and Mathematica doesn’t seem to be able to simplify it. As a result, we need to go
interactive with Mathematica, using a new approach. Mathematica Notebook 5 in
Volume 3 achieves this end.
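One piece of this product can be checked without heavy algebra. The outer rotations R−ζ and R−ξ fix the time axis, so they cannot change the time-time entry, which is γ. That entry therefore comes from Lβ Rθ Lα alone, and multiplying out gives αβ(1 + uv cos θ/c²). The Python sketch below confirms this numerically, using units with c = 1. The counterclockwise rotation convention and the sample values are our own assumptions, so the θ here may be the book's θ or its supplement.

```python
import math


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def rot(psi):
    """Rotation through psi in the x1-x2 plane (counterclockwise, our convention)."""
    c, s = math.cos(psi), math.sin(psi)
    return [[1, 0, 0, 0], [0, c, -s, 0], [0, s, c, 0], [0, 0, 0, 1]]


def boost(u):
    """Standard-form Lorentz matrix for speed u (units of c)."""
    a = 1.0 / math.sqrt(1.0 - u * u)
    return [[a, -a * u, 0, 0], [-a * u, a, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]


u, v, theta = 0.6, 0.8, math.radians(40)
alpha = 1.0 / math.sqrt(1.0 - u * u)
beta = 1.0 / math.sqrt(1.0 - v * v)

K = matmul(boost(v), matmul(rot(theta), boost(u)))   # L_beta R_theta L_alpha
gamma = K[0][0]
print("time-time entry:", gamma)
print("alpha*beta*(1 + u v cos theta):", alpha * beta * (1 + u * v * math.cos(theta)))
print("composite speed w / c:", math.sqrt(1.0 - 1.0 / gamma ** 2))

# Outer rotations fix the time axis, so they leave this entry unchanged.
K_outer = matmul(rot(-0.3), matmul(K, rot(1.1)))
print("after outer rotations:", K_outer[0][0])
```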
Comment: The reasoning seems to be that, since the Earth is not a point, anything
having such a ratio to it would have to be infinitely large.
21 Ptolemy himself, of course, was not bothered by this fact. In his cosmology, circular motion
was the natural motion of all bodies in the heavens, and no further explanation of it was needed.
is using separate time and spatial coordinates and the proper space of each is
Euclidean, from the point of view of the other observer, those time and space coor-
dinates are intertwined, and the space is, as a result, not Euclidean. The Lorentz
transformation was derived for the case of rectilinear motion at constant speed,
which classically requires no forces.
The length of the radius from the center of rotation to each point should be
the same for both observers, since the motion of that radius is perpendicular to
itself at every point, and therefore it does not undergo any FitzGerald–Lorentz
contraction.22 We can assume that our two observers, whom we shall call Ptolemy
and Copernicus, share the same clock at the origin. We do not need to worry about
the synchronization of clocks that each has situated at what he regards as fixed
points on a circle about the origin. Each of them thinks the clocks belonging to
the other are whizzing by his own clocks, and, as Einstein remarked, that means
each of them thinks the other’s clocks are running slow. The two must disagree
as to the length of the orbit of a star. Copernicus says the stars are fixed, and
therefore the circumference of a circle centered at the Earth and passing through
a star at distance R is 2πR. His geometry is, by his measurements, Euclidean. If
we are going to tailor our theory to fit the facts in this case, we shall have to allow
Ptolemy to shorten that circumference, making it less than the distance that light
can travel in a single day. We must picture Ptolemy observing a star careering
around its orbit and seeing the path it is going to travel as being shorter than
Copernicus measures it.
That is the theoretical basis of what we are about to do. The rest is merely a
matter of computational details, which we give below. As it turns out, Ptolemy can
say that what Copernicus is regarding as a three-dimensional space partitioned into
parallel flat planes is in fact a stack of surfaces of revolution resulting from curling
up each of those planes in the same way, the entire stack forming the interior of
an infinite cylinder whose axis is the axis of mutual rotation and whose radius is
C/2π, where C is the distance that light can travel during the time of one complete
revolution.
22 Due to the ambiguity in the concept of “now,” however, a radial path through the stars
observed at a given instant by one of the two observers would be a spiral as observed by the other.
We don’t worry about this fact, since we are confining our attention to a single circle with center
at the origin.
Where they disagree is in the length C of the orbit of the star. For Copernicus,
for whom the star is fixed in a Euclidean space, that length is simply 2πR. Let us
temporarily assign length C to the orbit in Ptolemy’s scheme of things. The speed u
of the star, according to Ptolemy, is $C/T = C\theta'/(2\pi)$, where $\theta' = 2\pi/T$ is the angular speed of the rotation. In his 1916 paper on general
relativity, Einstein brought up the subject of a rotating frame of reference, and
noted that the circumference of a circular orbit would undergo FitzGerald–Lorentz
contraction. Developing that idea, we find
$$C = 2\pi R\sqrt{1 - \frac{C^2(\theta')^2}{4\pi^2 c^2}}.$$
Solving this equation for C, we find
$$C = \frac{2\pi R}{\sqrt{1 + R^2(\theta')^2/c^2}} = \frac{2\pi R\, cT}{\sqrt{(2\pi R)^2 + (cT)^2}},$$
$$u = \frac{C}{T} = \frac{2\pi R c}{\sqrt{(2\pi R)^2 + (cT)^2}}.$$
These last expressions show that the relativistic length of the orbit C is smaller
than cT (the distance light travels over a period of revolution), and that u is smaller
than c, no matter how large R becomes.
As c → ∞, or R → 0, both of these expressions approximate the classical
expressions for circumference (2πR) and speed (2πR/T). That is, for small R or
large c, we have $cT/\sqrt{(2\pi R)^2 + (cT)^2} \approx 1$.
As R → ∞, we find $C/cT = 2\pi R/\sqrt{(2\pi R)^2 + (cT)^2} \to 1$ and u/c → 1.
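These formulas are easy to check numerically. The Python sketch below works in SI units, takes T to be one sidereal day (as in the example later in this section), and uses arbitrary sample radii. It confirms that the computed C satisfies the contraction equation and stays below cT. It also shows that C approaches 2πR for small R and approaches cT for large R.

```python
import math

c = 299_792_458.0   # speed of light, m/s
T = 86_164.0        # one sidereal day, s (approximate)


def orbit(R):
    """Ptolemy's orbit length C and star speed u for orbit radius R (meters)."""
    denom = math.hypot(2 * math.pi * R, c * T)
    C = 2 * math.pi * R * c * T / denom
    return C, C / T


for R in (1e9, 1e12, 1e15, 1e16):
    C, u = orbit(R)
    # C should satisfy the contraction equation C = 2 pi R sqrt(1 - u^2 / c^2).
    residual = C - 2 * math.pi * R * math.sqrt(1 - (u / c) ** 2)
    print(f"R = {R:.0e} m:  C/(2 pi R) = {C / (2 * math.pi * R):.12f},"
          f"  C/(cT) = {C / (c * T):.12f},  u/c = {u / c:.12f},"
          f"  relative residual = {residual / C:.1e}")
```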
We can represent the geometry of this relativistically rotating plane as the
ordinary Euclidean geometry of a curved surface in three-dimensional space and
at the same time make C the circumference of an actual circle in R3 . Introducing
the variable r = C/2π, we claim that such a surface has the following equation in
cylindrical coordinates (r, θ, z):
$$z = z(r) = \frac{cT}{4\pi}\int_0^{(2\pi r)^2/(cT)^2} \sqrt{\frac{s^2 - 3s + 3}{(1 - s)^3}}\, ds,$$
where the domain is r < cT/2π. This surface is shown in Fig. 1.5. The radius of the
orbit described by a point is R, and it is the length of a certain curve from the
origin to the orbit. You can compute this length as
$$\int_0^r \sqrt{1 + \left(\frac{dz}{dt}\right)^2}\, dt.$$
This integral is elementary, since its integrand is $(1 - (2\pi t/cT)^2)^{-3/2}$, and the
integral works out to be
$$\frac{r}{\sqrt{1 - (2\pi r/cT)^2}} = R.$$
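Both claims in this paragraph can be checked numerically with the Python sketch below. As a convenience assumption, we pick units with cT = 2π, so that k = 2π/(cT) = 1 and the domain is r < 1. The sketch differentiates the integral for z(r) with respect to its upper limit and confirms that √(1 + (dz/dt)²) equals (1 − (2πt/cT)²)^(−3/2). It then integrates that expression with Simpson's rule and compares the result with the closed form.

```python
import math


def slope(r, k):
    """dz/dr obtained by differentiating the integral for z(r); k = 2*pi/(c*T)."""
    s = (k * r) ** 2
    return k * r * math.sqrt((s * s - 3 * s + 3) / (1 - s) ** 3)


def simpson(f, a, b, n=2000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


k = 1.0   # units chosen so that cT = 2*pi; the domain is then r < 1
r = 0.8
arc_length = simpson(lambda t: math.sqrt(1 + slope(t, k) ** 2), 0.0, r)
closed_form = r / math.sqrt(1 - (k * r) ** 2)
print("numerical arc length:", arc_length)
print("r / sqrt(1 - (2 pi r / cT)^2):", closed_form)
```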
The orbit of the star is a circular horizontal section of the surface having cir-
cumference 2πr.
This example shows how a curved representation of physical space can be useful
in physics. After we define curvature in Chapter 5, we will be able to verify that
the curvature of this surface at radius r is
$$\kappa = 3\left(\frac{2\pi}{cT}\right)^2\left(1 - \left(\frac{2\pi r}{cT}\right)^2\right)^2.$$
This expression shows that the curvature is small (the surface is nearly flat) if
c is very large. After we discuss curvature in Chapter 5, we will recognize that this
surface has positive curvature, just from its convexity. If the speed of light were
infinite, this surface would be a plane. Likewise, if T is very large (the rotation is
very slow), then the curvature is also small. When T = ∞, the space is not rotating
at all, and the equation of the surface is simply z = 0.
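Looking ahead to Chapter 5, the curvature formula can already be checked numerically. The check uses the standard expression K = z′z″/(r(1 + z′²)²) for the Gaussian curvature of a surface of revolution z = z(r), which is a textbook formula not derived in this chapter. The Python sketch below treats k = 2π/(cT) as a free parameter and approximates z′z″ by a central difference. It then compares the result with the closed form at a few arbitrary sample points.

```python
import math


def slope(r, k):
    """dz/dr of the surface; k = 2*pi/(c*T)."""
    s = (k * r) ** 2
    return k * r * math.sqrt((s * s - 3 * s + 3) / (1 - s) ** 3)


def gaussian_curvature(r, k, h=1e-5):
    """K = z' z'' / (r (1 + z'^2)^2) for the surface of revolution z = z(r).

    z' z'' is computed as d/dr (z'^2 / 2) by a central difference.
    """
    zz = (slope(r + h, k) ** 2 - slope(r - h, k) ** 2) / (4 * h)
    return zz / (r * (1 + slope(r, k) ** 2) ** 2)


def kappa(r, k):
    """Closed form from the text: 3 (2 pi / cT)^2 (1 - (2 pi r / cT)^2)^2."""
    return 3 * k ** 2 * (1 - (k * r) ** 2) ** 2


for k, r in ((1.0, 0.3), (1.0, 0.7), (0.1, 2.0)):
    print(f"k = {k}, r = {r}: numerical {gaussian_curvature(r, k):.10f}, "
          f"formula {kappa(r, k):.10f}")
```

Smaller k (a longer period T or a larger c) gives a flatter surface, as the paragraph above describes.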
From Ptolemy’s point of view, each star traverses a circle with period T equal
to one sidereal day (approximately 86,164 sec). Considering a near star, traversing
a circle of radius 4 light years, which gives R = 4 × 366.2422 cT, or R ≈ 3.7842 ×
10¹⁶ m, and expanding $C/cT = \left(1 + (cT/2\pi R)^2\right)^{-1/2}$ to first order, one finds the
circumference of its orbit contracted to length
$$C = 2\pi r \approx cT\left(1 - \frac{1}{128\pi^2(366.2422)^2}\right) \approx 0.99999999409861\, cT.$$
The speed u of the star is less than the speed of light by just 5.90138 × 10⁻⁹ c,
that is, 1.769 m/sec.
As we shall see in the next chapter, the relativistic mass of this star is about
9200 times its rest mass.
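The numbers in this example can be reproduced with a few lines of Python. This is a sketch in SI units, and T = 86,164 s and the 366.2422 sidereal days per year are taken from the text. Writing ε = (cT/2πR)², both C/(cT) and u/c equal (1 + ε)^(−1/2). The γ factor, 1/√(1 − u²/c²), is √((1 + ε)/ε) ≈ 2πR/(cT), which is the factor of about 9200 quoted above. For 4 light years the script gives R ≈ 3.784 × 10¹⁶ m.

```python
import math

c = 299_792_458.0          # speed of light, m/s
T = 86_164.0               # one sidereal day, s (approximate, as in the text)
R = 4 * 366.2422 * c * T   # 4 light years, written as in the text

eps = (c * T / (2 * math.pi * R)) ** 2
# C/(cT) = u/c = (1 + eps)^(-1/2); expm1/log1p avoid cancellation in 1 - u/c.
deficit = -math.expm1(-0.5 * math.log1p(eps))
gamma = math.sqrt((1 + eps) / eps)   # 1 / sqrt(1 - u^2/c^2)

print(f"R            = {R:.6e} m")
print(f"1 - u/c      = {deficit:.6e}")
print(f"c - u        = {deficit * c:.4f} m/s")
print(f"gamma factor = {gamma:.1f}")
```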
into
$$\frac{3}{\sqrt{2}}\int_{\theta_0}^{\pi/3} \frac{\sec^3\theta}{(\sqrt{3}\sin\theta - \cos\theta)^3}\, d\theta,$$
where $\theta_0 = \arctan\left((3 - 2u)/\sqrt{3}\right)$. For 0 ≤ u ≤ 1 we have π/6 ≤ θ0 ≤ π/3.
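The substitution ψ = 2θ − π/6, together with the fact that sin(π/6) = 1/2, then
converts the integral to
$$\frac{3}{2\sqrt{2}}\int_{\psi_0}^{\pi/2} \frac{d\psi}{(\sin\psi - 1/2)^3},$$
where ψ0 = 2θ0 − π/6. (Here √3 sin θ − cos θ = 2 sin(θ − π/6) and
2 cos θ sin(θ − π/6) = sin ψ − 1/2, so the integrand becomes 1/(sin ψ − 1/2)³, while dθ = dψ/2.)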
It is more convenient to integrate in the opposite direction, which we can do
by replacing ψ by ϕ = π/2 − ψ. Letting ϕ0 = π/2 − ψ0 , we then have
$$\frac{3}{2\sqrt{2}}\int_0^{\phi_0} \frac{d\phi}{(\cos\phi - 1/2)^3}.$$
Next, we use the identity cos ϕ = 1 − 2 sin²(ϕ/2) = 1 − 2 sin² η, where η = ϕ/2, to write this as
$$\frac{3}{\sqrt{2}}\int_0^{\eta_0} \frac{d\eta}{(1/2 - 2\sin^2\eta)^3},$$
where η0 = ϕ0/2.
Here, EllipticE[x, m] is the notation Mathematica uses for the elliptic integral
of the second kind
$$\int_0^x \sqrt{1 - m\sin^2 t}\, dt.$$
The condition imposed on the angle η0 simply says that 0 ≤ η0 ≤ π/6, and this
is indeed the case.
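Each substitution above changes the constant factor or the limits, so a numerical cross-check is reassuring. The Python sketch below evaluates each of the four forms of the integral with Simpson's rule. It uses the limits θ0, ψ0 = 2θ0 − π/6, ϕ0 = π/2 − ψ0, and η0 = ϕ0/2, and the four values should agree to within rounding. This is our own check, and u = 0.5 is an arbitrary sample value. The common factor 3/√2 is our reading of the constant. It affects all four forms equally, apart from the factors of 1/2 introduced by the substitutions.

```python
import math


def simpson(f, a, b, n=4000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


u = 0.5                                    # any 0 <= u < 1 keeps the integrands finite
theta0 = math.atan((3 - 2 * u) / math.sqrt(3))
psi0 = 2 * theta0 - math.pi / 6
phi0 = math.pi / 2 - psi0
eta0 = phi0 / 2
K = 3 / math.sqrt(2)                       # common constant factor (assumed)

forms = {
    "theta": K * simpson(lambda t: (1 / math.cos(t)) ** 3
                         / (math.sqrt(3) * math.sin(t) - math.cos(t)) ** 3,
                         theta0, math.pi / 3),
    "psi": K / 2 * simpson(lambda p: 1 / (math.sin(p) - 0.5) ** 3, psi0, math.pi / 2),
    "phi": K / 2 * simpson(lambda f: 1 / (math.cos(f) - 0.5) ** 3, 0.0, phi0),
    "eta": K * simpson(lambda e: 1 / (0.5 - 2 * math.sin(e) ** 2) ** 3, 0.0, eta0),
}
for name, value in forms.items():
    print(name, value)
print("eta0 =", eta0, "<= pi/6:", eta0 <= math.pi / 6)
```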
This is the first of several times when we shall encounter elliptic functions.
They deserve to be better appreciated by physicists and mathematicians than they
generally are. They give exact expressions for solutions to some common
differential equations of mathematical physics, solutions that otherwise have to
be described qualitatively by inserting corrective terms into expressions involving
elementary functions.
14. Problems
Problem 1.1. Solve the equations of the Lorentz transformation for t, x, y, and
z in terms of t′, x′, y′, and z′, and show that the solution is the same transformation
with u replaced by −u (which makes no change in α).
Problem 1.2. Revisit the problem of the twin paradox by imagining that
Mary has a telescope trained on the Earth, so that she can constantly observe
John’s clock. What would she see? Why is it that this clock shows a later time
than Mary’s own clock when the twins meet at the end of the journey?
Problem 1.3. Consider the vector formulation of the Lorentz transformation
given by the mutually inverse relations
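$$(t'; \mathbf{x}') = \left(\alpha\left(t - \frac{\mathbf{u}\cdot\mathbf{x}}{c^2}\right);\ \mathbf{x} + \left((\alpha - 1)\frac{\mathbf{x}\cdot\mathbf{u}}{\mathbf{u}\cdot\mathbf{u}} - \alpha t\right)\mathbf{u}\right)$$
and
$$(t; \mathbf{x}) = \left(\alpha\left(t' + \frac{\mathbf{u}\cdot\mathbf{x}'}{c^2}\right);\ \mathbf{x}' + \left((\alpha - 1)\frac{\mathbf{x}'\cdot\mathbf{u}}{\mathbf{u}\cdot\mathbf{u}} + \alpha t'\right)\mathbf{u}\right).$$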
Verify that these relations really are inverses of each other by inserting the
values of x and t from the second relation into the right-hand side of the first
relation.
Problem 1.4. Show that the observers O and O′ agree about the “space-time
metric,” that is, show that $(ct)^2 - x^2 - y^2 - z^2 = (ct')^2 - x'^2 - y'^2 - z'^2$.
Problem 1.5. Let a and b be dimensionless positive constants. Show that the
mutually perpendicular lines bx = ay, z = 0 and by = −ax, z = 0 observed by
O make the nonobtuse angle $\arccos\left(ab(\alpha^2 - 1)\big/\sqrt{(a^2\alpha^2 + b^2)(b^2\alpha^2 + a^2)}\right)$ when
observed by O′ at any given instant s. Show that this is a right angle only if u = 0,
and that it tends to 0° as u ↑ c.
Problem 1.6. Translate the equation of the circle $x^2 + y^2 = R^2$, as seen by
O, into O′’s coordinate system. What kind of curve does this equation represent?
How does the shape depend on time?
Problem 1.7. Translate the equation of a general conic section $Ax^2 + 2Bxy +
Cy^2 + Dx + Ey + F = 0$, observed by O, into the coordinate system used by O′,
(You will actually get a cubic equation from which the trivial factor x − 1 can be
divided out.) Show that the positive sign in the first expression for x (corresponding
to the negative sign in the second one) is consistent with the relation x ∈ [0, 1] only
when a = 1, which is the case when u = v = c. In this case, x = 1, that is, this
is the case η = 0, which, as we have already remarked, is trivial. Then, for the
negative sign on the square root in the numerator, show that x lies in the range
[1/3, 1/2] for all values of a ∈ [0, 1].
Verify the example mentioned in the text, in which a = 3√2/5 and cos(η) =
5/12, showing that
η ≈ 67.056553501352011261°
Problem 1.13. Show that the sum of the angles of a relativistic velocity tri-
angle is smaller than two right angles.
Problem 1.14. This problem has four parts. We define the angle defect of a
relativistic velocity triangle whose angles are ξ, η, and ζ to be the positive number
π − (ξ + η + ζ). Consider a triangle with these angles and divide it into two smaller
triangles by drawing a line from the vertex at angle η to a point on the opposite
side, so that one of them has angles η1 (part of angle η), ξ, and ϕ1 (at the vertex
on the side opposite the angle η), and the other has angles η2, ζ, and ϕ2 = π − ϕ1.
Part 1: Show that the defect of the original triangle is the sum of the defects of the
two triangles into which it is divided. (It is not difficult to prove—although you
are not being asked to do so—that when a triangle is partitioned into any number
of other triangles, its defect is the sum of the defects of the triangles that partition
it.) Thus the defect of a triangle is proportional to what we think of as the area of
a triangle, and so we shall define the area of a triangle to be c2 times its defect. We
then define the area of a polygon to be the sum of the areas of any set of triangles
into which it can be partitioned. It is not difficult to show that this definition is
independent of the way in which the polygon is triangulated.
Part 2: Consider an isosceles relativistic velocity triangle having two equal sides of
length u with angle η between them, and let the other two angles both be equal to
14. PROBLEMS 67
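$$w = \frac{2u\sqrt{1 - \frac{u^2}{c^2}\cos^2\frac{\eta}{2}}}{1 - \frac{u^2}{c^2}\cos\eta}\cdot\sin\frac{\eta}{2}.$$
$$\pi(P_n) = \frac{2u\sqrt{1 - \frac{u^2}{c^2}\cos^2\frac{\pi}{n}}}{1 - \frac{u^2}{c^2}\cos\frac{2\pi}{n}}\cdot n\sin\frac{\pi}{n},$$
$$A(P_n) = (n - 2)\pi - 2n\arccos\frac{\sin\frac{\pi}{n}}{\sqrt{1 - \frac{u^2}{c^2}\cos^2\frac{\pi}{n}}}.$$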
Problem 1.18. Suppose (θ, u, ϕ) +L (χ, v, ψ) = (μ, w, ν). Prove that for any
angles d, e, and f :
(d + θ, u, ϕ) +L (χ, v, ψ) = (d + μ, w, ν) ,
(θ, u, e + ϕ) +L (e + χ, v, ψ) = (μ, w, ν) ,
(θ, u, ϕ) +L (χ, v, ψ + f ) = (μ, w, ν + f ) .
Different operations are being applied to the two addends in each of these cases.
Show that if we take d = e = f and combine these results, we obtain a mapping
$T_d(\theta, u, \phi) = (d + \theta, u, \phi + d)$ that satisfies the equality
$$T_d(\theta, u, \phi) +_L T_d(\chi, v, \psi) = T_d\big((\theta, u, \phi) +_L (\chi, v, \psi)\big).$$
Thus Lorentz addition is invariant under each operation Td. (Caution: This
result does not enable us to reduce the dimension of the three-dimensional space we
have invented to describe the addition. We cannot, for example, replace (θ, u, ϕ)
by (0, u, ϕ − θ) and (χ, v, ψ) by (0, v, ψ − χ), even though (θ, u, ϕ) = Tθ(0, u, ϕ − θ)
and Tχ(0, v, ψ − χ) = (χ, v, ψ). The difficulty is that Tθ is not the same operator
as Tχ.)
Problem 1.19. Suppressing the second and third spatial dimensions, we focus
attention on just the time and first spatial axes. The Lorentz transformation is
$$\tau' = \alpha\left(\tau - \frac{u}{c}x\right), \qquad x' = \alpha\left(-\frac{u}{c}\tau + x\right).$$
For a reason that will become clear in a moment, let the angle θ be
$$\theta = \arccos\sqrt{1 - \frac{u^2}{c^2}} = \arccos\frac{1}{\alpha}.$$
Thus, α = sec θ.
Solve the second equation of the Lorentz equations for x in terms of x′ and
τ (and θ), then substitute the result in the first equation, so that τ′ and x are
expressed in terms of τ and x′, yielding
$$\tau' = (\cos\theta)\tau - (\sin\theta)x', \qquad x = (\sin\theta)\tau + (\cos\theta)x'.$$
Use these equations to solve the car wash puzzle.
Problem 1.20. Here is a variation on the car wash puzzle. Suppose that as
the limousine moved through the car wash with speed u, two car wash attendants
simultaneously, as measured by a clock in the car wash, put scratches in it, one in
the front fender, the other in the rear fender. Suppose that they were standing 3
meters apart, as measured by the car wash attendants themselves, when they made
the scratches. If the limousine is then stopped and measured by the car wash
attendants, how far apart will the scratches be?
Problem 1.21. Verify that the space-time interval ds² between the two
events—the rear of the limousine entering the car wash and the front of it leaving
the car wash—is negative (spacelike).
Problem 1.22. Consider the special case of a relativistic velocity triangle when
u and v lie along perpendicular directions. For this case, we have γ = αβ. Recall,
as noted above, that α = sec θ corresponds to the angle θ of rotation that describes
the Lorentz transformation between O and O′ when they interchange time
coordinates (Problem 1.19), and likewise β is the secant of the corresponding angle
for the transformation between O′ and O″, and γ the secant of the angle corresponding
to the transformation between O and O″. Show that, when they are regarded as arcs on a sphere, the
three angles that provide these geometric representations are the sides of a spherical
right triangle in this case.
CHAPTER 2
Relativistic Mechanics
Once we abandon the intuitive absolute space and time of Newtonian mechanics,
everything in classical physics becomes “negotiable.” A thorough revamping of the
entire subject is required, each of its fundamental concepts receiving a new defi-
nition. If we assume Lorentz transformations between two observers in motion at
constant speed, for example, what becomes of acceleration and force, two funda-
mental concepts in mechanics? In order to answer that question, we first ask how
two observers in uniform relative motion—for notational convenience we shall call
them X and Y —reconcile their measurements of the velocity of a particle. Each
of them will assign it a velocity vector at each particular “time,” but, as we now
realize, we need both the time and the particle’s location at that time before we
can reconcile the two observers’ bookkeeping. Our discussion of these topics at first
adheres as closely as possible to the Newtonian quantities that would be measured
by each individual observer. The proper definition of each of these concepts in more
formal special relativity differs in that it treats them as points in four-dimensional
space-time and uses proper time rather than observer time to define the basic quan-
tities involved in mechanics. We shall use the familiar spatial three-vectors as much
as possible and discuss these interesting four-vectors only in the final section of this
chapter.
By studying the motion of a particle, we can get a specific mapping between
the two observers’ time axes r ↔ s. The two times assigned by the observers to any
event involving the particle, such as its collision with another particle, for example,
are placed in correspondence with each other. The definitions of the relativistic
kinematics of a particle (its position, velocity, and acceleration) are straightforward,
except for the complication in reconciling the measurements of these quantities
between two observers in relative motion; the details of that operation are
consequences of the Lorentz transformation introduced in the previous chapter.
acceleration; (2) when the mass of the particle is taken into consideration. We
begin with the first of these.
Suppose that X is using (r; x) as space-time coordinates, and from that per-
spective the motion of the particle is given by a vector-valued function x(r) =
x1 (r)i + x2 (r)j + x3 (r)k. We shall say that X records a world-line for the particle,
namely the set of points (r; x) in R4 satisfying
x = x(r) .
Similarly, let us suppose Y is using (s; y) as space-time coordinates and that
Y records the world-line of the particle as
y = y(s),
where y(s) = y 1 (s)i + y 2 (s)j + y 3 (s)k. We first see how the two observers’ mea-
surements of velocity and acceleration are to be reconciled.
[Figure, panels (a), (b), (c): a projectile’s path with its velocity and acceleration vectors as recorded by X (vX, aX; axes x1, x2) and by Y (vY, aY; axes y1, y2), with an angle θ marked.]
1 This figure is not realistic, as the projectile is imagined to have been fired at an elevation of
75° with muzzle velocity three-fifths that of light. In order for this projectile to fall as shown in
the figure, the Earth’s gravitational field would have to be nearly two million times stronger than
it actually is. We also scaled the velocities by a factor of 5 and the accelerations by a factor of
50. It is legitimate to do so, since these vectors “inhabit” a space that is different from the space
traversed by the projectile. In the spaces these vectors inhabit, the coordinates have physical
dimensions of length/time and length/time-squared respectively.
2. FROM KINEMATICS TO DYNAMICS: MASS AND MOMENTUM 75
The quantity of matter is the measure of the same, arising from its
density and bulk conjointly. . . The quantity of motion is the mea-
sure of the same, arising from the velocity and quantity of matter
conjointly.
Isaac Newton ([63], p. 5).
complex entities such as momentum, force, and energy are defined in terms of them.
The important conservation laws (conservation of momentum, conservation of an-
gular momentum, and conservation of energy) are pillars of mechanics, and our
redefinition of mass will be aimed at keeping at least the first two of these laws.
We start with momentum. Imagine that Y sees two particles of identical mass
m approaching from opposite directions, each being the same distance away and
moving with the same speed v, so that Y ascribes to them velocities v and −v.
These particles are coated with “superglue,” so that when they collide, directly in
front of Y , they both come to a dead stop. As far as Y is concerned, both classically
and relativistically, this is all fine. The momentum of the pair added up to zero
algebraically both before and after the collision. The kinetic energy lost by the
particles has to be explained, but let that pass for the moment.
The same reconciliation would be possible for X, whom we imagine as moving along with one of the particles, say the one whose velocity is $-v$, provided we used the classical addition of velocities, according to which the other particle moves with speed $2v$ relative to X. In classical terms the total momentum before the collision was $m(2v) + m(0) = 2mv$. Afterward it is $mv + mv = 2mv$, since the two particles remain at Y’s origin, while X’s origin continues to move with velocity $-v\,i$.
Now let us look at the situation relativistically. Relative to X the speed of the other particle before the collision is
$$w = \frac{2v}{1 + v^2/c^2}.$$
Hence the momentum before the collision would seem to be $\dfrac{2mv}{1 + v^2/c^2}$, and afterward it would appear to be $2mv$, which is obviously a larger number than before.
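As a quick numerical illustration, here is a sketch in units where $c = 1$, with the speed $v = 0.6$ and unit mass chosen purely for illustration. The momentum that X computes with a constant mass before the collision falls short of the momentum afterward:

```python
# Units with c = 1; v = 0.6 and m = 1 are illustrative choices, not values from the text.
c = 1.0
v = 0.6
m = 1.0


def add_velocities(a, b):
    """Relativistic composition of two collinear velocities."""
    return (a + b) / (1.0 + a * b / c**2)


w = add_velocities(v, v)    # speed of the other particle relative to X
p_before = m * w            # momentum before the collision, mass held constant
p_after = 2 * m * v         # afterward both particles move at speed v relative to X

print(f"w = {w:.6f}, before = {p_before:.6f}, after = {p_after:.6f}")
```

This prints `w = 0.882353, before = 0.882353, after = 1.200000`, so constant mass cannot keep momentum conserved for X.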
One way out of this difficulty is to assume that a particle is more massive when it is moving than when it is at rest. Let us assign the value $m_v$ to the mass of the particle when it is moving at speed $v$ (so that $m_0$ is its rest mass). We shall assume that both mass and momentum are conserved in this thought experiment. Then from X’s point of view the total mass of the system before the collision is $m_0 + m_w$, and afterward it is $2m_v$, since X continues to move in the negative direction at speed $v$. These two expressions must be numerically equal. (Although mass is not constant in relativity, the mass of a system changes only when some external work is done on it to increase its total energy. No such work occurred in the collision just described.) The momentum before the collision was $m_w w$, and afterward it was $(m_0 + m_w)v$. Setting these expressions equal to each other, we get the equation
$$m_0 v = m_w (w - v).$$
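Now, if we solve the equation between $w$ and $v$ for $v$, assuming $v < c$, we find
$$v = \frac{c^2}{w}\left(1 - \sqrt{1 - \frac{w^2}{c^2}}\right),$$
which, as we note for reference below, is equivalent to the relation
$$\sqrt{1 - \frac{w^2}{c^2}} = 1 - \frac{vw}{c^2} = 1 - \frac{2v^2}{c^2 + v^2} = \frac{c^2 - v^2}{c^2 + v^2}.$$
Since $w = \dfrac{c^2}{w}\cdot\dfrac{w^2}{c^2}$, it follows that
$$w - v = \frac{c^2}{w}\left(\sqrt{1 - \frac{w^2}{c^2}} - \left(1 - \frac{w^2}{c^2}\right)\right) = \frac{c^2}{w}\sqrt{1 - \frac{w^2}{c^2}}\left(1 - \sqrt{1 - \frac{w^2}{c^2}}\right) = v\sqrt{1 - \frac{w^2}{c^2}}.$$
Substituting this into $m_0 v = m_w(w - v)$ and dividing by $v$ yields immediately the following principle:
$$m_w = \frac{m_0}{\sqrt{1 - w^2/c^2}}. \tag{2.4}$$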
This equation was derived as the only possible way of retaining conservation of momentum in relativity. The derivation does not prove, however, that momentum will be conserved in all situations: our redefinition of mass is necessary for that purpose, but it has not been proved sufficient. We are going to omit the proof of sufficiency and content ourselves with a necessary condition that determines the equations we need to use.
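The algebra behind Eq. (2.4) can be spot-checked numerically. This sketch (units with $c = 1$, $m_0 = 1$, sample speeds chosen arbitrarily) recovers $v$ from $w$, confirms the identities $\sqrt{1 - w^2/c^2} = (c^2 - v^2)/(c^2 + v^2)$ and $w - v = v\sqrt{1 - w^2/c^2}$, and checks that $m_w = m_0/\sqrt{1 - w^2/c^2}$ satisfies $m_0 v = m_w(w - v)$. It is a consistency check, not a proof.

```python
import math

c = 1.0
m0 = 1.0


def w_from_v(v):
    """Speed of one particle relative to an observer riding the other."""
    return 2 * v / (1 + v**2 / c**2)


def v_from_w(w):
    """Root v < c of (w/c^2) v^2 - 2 v + w = 0."""
    return (c**2 / w) * (1 - math.sqrt(1 - w**2 / c**2))


def check(v):
    w = w_from_v(v)
    root = math.sqrt(1 - w**2 / c**2)
    m_w = m0 / root                       # candidate relativistic mass, Eq. (2.4)
    return (
        math.isclose(v_from_w(w), v, rel_tol=1e-9)
        and math.isclose(root, (c**2 - v**2) / (c**2 + v**2), rel_tol=1e-9)
        and math.isclose(w - v, v * root, rel_tol=1e-9)
        and math.isclose(m0 * v, m_w * (w - v), rel_tol=1e-9)
    )


print(all(check(v) for v in (0.1, 0.5, 0.9, 0.99)))
```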
There is one obvious danger of inconsistency here. The total mass before the collision, according to X, was $m_0 + m_w$. According to Y, the total mass before the collision was $2m_v$. Since Y is at rest relative to the system after the collision and mass is conserved when no external work is done on the system, its new rest mass will be $2m_v$, and X will ascribe to it the mass $2m_v/\sqrt{1 - v^2/c^2}$. But that number is precisely $m_0 + m_w$. For we have $m_0 + m_w = m_0\left(1 + 1/\sqrt{1 - w^2/c^2}\right)$. By the relation derived above, we thus get
$$m_0 + m_w = m_0\left(1 + \frac{c^2 + v^2}{c^2 - v^2}\right) = \frac{2m_0}{1 - v^2/c^2} = \frac{2m_v}{\sqrt{1 - v^2/c^2}}.$$
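A numerical sketch of the same consistency check (units with $c = 1$, $m_0 = 1$; the sample speeds are arbitrary): X’s total mass before the collision, $m_0 + m_w$, matches the mass X ascribes to the combined body of rest mass $2m_v$ moving at speed $v$.

```python
import math

c = 1.0
m0 = 1.0


def mass(speed):
    """Relativistic mass of a particle of rest mass m0, Eq. (2.4)."""
    return m0 / math.sqrt(1 - speed**2 / c**2)


def totals(v):
    w = 2 * v / (1 + v**2 / c**2)                         # relativistic sum of v and v
    before = m0 + mass(w)                                 # X: one particle at rest, one at speed w
    after = 2 * mass(v) / math.sqrt(1 - v**2 / c**2)      # rest mass 2 m_v moving at speed v
    return before, after


for v in (0.2, 0.6, 0.95):
    before, after = totals(v)
    print(f"v = {v}: {before:.9f} {after:.9f}")
```

For $v = 0.6$ both totals equal $3.125$ (up to rounding).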
As far as we have explored, there are no obvious inconsistencies between the two observers if we adopt the formula in Eq. (2.4) as the definition of the mass of a particle of rest mass $m_0$ when it is moving with speed $w$. We can then define the relativistic (three-)momentum of a particle as $mv$, where $v$ is the observed velocity of the particle and $m$ its observed relativistic mass. One consequence of this definition is that the momentum and velocity of a particle are no longer directly proportional, as they are in Newtonian mechanics, since $m$ is no longer a constant independent of $v$. That difference will have further consequences for the concepts of force, work, and energy in relativity.
If we consider instead of a particle of mass m a mass density ρ, Eq. (2.4) has
the following consequence:
Theorem 2.4. A solid body having density $\rho_0$ in a coordinate system with respect to which it is at rest has density
$$\rho = \alpha^2 \rho_0 = \frac{\rho_0 c^2}{c^2 - u^2} \tag{2.5}$$
when measured in a system moving with speed $u$ relative to the given system.
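Proof. Consider a portion of the body having mass $m_0$ in a volume $V_0$ and thus average density $m_0/V_0$. When measured in the moving coordinate system, $m_0$ will become $m = \alpha m_0$, and by Theorem 1.2 of Chapter 1, the volume will become $V = V_0/\alpha$. Hence $m/V = \alpha^2 m_0/V_0$; letting the portion shrink to a point gives $\rho = \alpha^2\rho_0$, and $\alpha^2 = c^2/(c^2 - u^2)$.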
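A small sketch of Theorem 2.4 (units with $c = 1$; $\alpha = 1/\sqrt{1 - u^2/c^2}$ as in Chapter 1), combining the mass factor $\alpha$ with the volume factor $1/\alpha$. The function name is illustrative only.

```python
import math

c = 1.0


def moving_density(rho0, u):
    """Density measured in a frame moving at speed u relative to the body's rest frame."""
    alpha = 1 / math.sqrt(1 - u**2 / c**2)
    mass_factor = alpha          # m = alpha * m0
    volume_factor = 1 / alpha    # V = V0 / alpha (contraction along the motion)
    return rho0 * mass_factor / volume_factor


rho = moving_density(1.0, 0.6)
print(rho, 1.0 * c**2 / (c**2 - 0.6**2))   # both approximately 1.5625
```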
We demonstrated above that the mass of a particle measured by an observer
depends on its speed relative to that observer. In that connection, we introduce
the quantities
$$\beta = \frac{1}{\sqrt{1 - \dfrac{((y^1)')^2 + ((y^2)')^2 + ((y^3)')^2}{c^2}}}, \qquad \gamma = \frac{1}{\sqrt{1 - \dfrac{((x^1)')^2 + ((x^2)')^2 + ((x^3)')^2}{c^2}}}.$$
78 2. RELATIVISTIC MECHANICS
The equations γ = αβη and β = αγδ can be computed directly, as was done in
the preceding chapter when we computed the composition of two Lorentz transfor-
mations. (Imagine a third observer Z “riding” the particle. The roles of α, β, and
γ are then exactly as in Chapter 1.)
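The identities $\gamma = \alpha\beta\eta$ and $\beta = \alpha\gamma\delta$ can also be checked numerically. The sketch below assumes the standard velocity-addition formulas, which we take to be the ones from Chapter 1 (they are consistent with Theorem 2.5): $\eta = 1 + u(y^1)'/c^2$, $\delta = 1 - u(x^1)'/c^2$, $(x^1)' = ((y^1)' + u)/\eta$, and $(x^k)' = (y^k)'/(\alpha\eta)$ for $k = 2, 3$. Units have $c = 1$ and the sample velocities are random.

```python
import math
import random

c = 1.0
u = 0.6
alpha = 1 / math.sqrt(1 - u**2 / c**2)


def lorentz_factor(vel):
    """1 / sqrt(1 - |vel|^2 / c^2)."""
    return 1 / math.sqrt(1 - sum(comp**2 for comp in vel) / c**2)


def y_to_x_velocity(vy):
    """Velocity seen by X when Y (moving at +u along axis 1) sees velocity vy."""
    eta = 1 + u * vy[0] / c**2
    vx = ((vy[0] + u) / eta, vy[1] / (alpha * eta), vy[2] / (alpha * eta))
    return vx, eta


def random_subluminal_velocity(rng):
    """Rejection-sample a velocity with |v|^2 < 0.99 c^2."""
    while True:
        vy = tuple(rng.uniform(-c, c) for _ in range(3))
        if sum(comp**2 for comp in vy) < 0.99 * c**2:
            return vy


rng = random.Random(1)
worst = 0.0
for _ in range(1000):
    vy = random_subluminal_velocity(rng)
    vx, eta = y_to_x_velocity(vy)
    beta, gamma = lorentz_factor(vy), lorentz_factor(vx)
    delta = 1 - u * vx[0] / c**2
    worst = max(worst,
                abs(gamma / (alpha * beta * eta) - 1),
                abs(beta / (alpha * gamma * delta) - 1))
print(f"largest relative deviation: {worst:.2e}")
```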
Notice that, since we are not dealing with a constant particle velocity here, the quantities $\beta$ and $\gamma$ are not constant. In fact,
$$\beta' = \frac{d\beta}{ds} = \frac{\beta^3}{c^2}\,(a_y \cdot v_y),$$
where $v_y = \big((y^1)'(s), (y^2)'(s), (y^3)'(s)\big)$ is the velocity of the particle measured by Y and $a_y = dv_y/ds$ the corresponding acceleration, the dot product being taken in Y’s coordinates. We shall similarly write $v_x$ and $a_x$ for the velocity and acceleration measured by X, in which case we have
$$\gamma' = \frac{d\gamma}{dr} = \frac{\gamma^3}{c^2}\,(a_x \cdot v_x).$$
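A finite-difference spot check of the formula for $\beta'$, using a hypothetical velocity history $v_y(s)$ chosen only so that its speed stays below $c$ (units with $c = 1$):

```python
import math

c = 1.0


def velocity(s):
    """Hypothetical velocity of the particle measured by Y at Y-time s."""
    return (0.5 * math.cos(s), -0.6 * math.sin(2 * s), 0.2)


def acceleration(s):
    """Exact derivative of velocity(s)."""
    return (-0.5 * math.sin(s), -1.2 * math.cos(2 * s), 0.0)


def dot(p, q):
    return sum(a * b for a, b in zip(p, q))


def beta(s):
    v = velocity(s)
    return 1 / math.sqrt(1 - dot(v, v) / c**2)


s, h = 0.7, 1e-5
numeric = (beta(s + h) - beta(s - h)) / (2 * h)             # central difference
formula = beta(s)**3 * dot(acceleration(s), velocity(s)) / c**2
print(numeric, formula)
```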
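Theorem 2.5. Two observers in motion at relative speed $u$ along their common first spatial axis assign components to the momentum $p$ of the particle that have the following relations to each other:
$$\begin{aligned}
p_{x1} &= m_0\gamma (x^1)' = m_0\gamma\,\frac{(y^1)' + u}{\eta} = \alpha m_0\beta\big((y^1)' + u\big) = \alpha p_{y1} + \alpha m_0\beta u,\\
p_{x2} &= m_0\gamma (x^2)' = m_0\beta (y^2)' = p_{y2},\\
p_{y1} &= m_0\beta (y^1)' = m_0\beta\,\frac{(x^1)' - u}{\delta} = \alpha m_0\gamma\big((x^1)' - u\big) = \alpha p_{x1} - \alpha m_0\gamma u,\\
p_{y2} &= m_0\beta (y^2)' = m_0\gamma (x^2)' = p_{x2}.
\end{aligned}$$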
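The momentum relations can be spot-checked the same way. This sketch assumes the velocity-transformation formulas $(x^1)' = ((y^1)' + u)/\eta$ and $(x^k)' = (y^k)'/(\alpha\eta)$ with $\eta = 1 + u(y^1)'/c^2$; the velocity seen by Y is an arbitrary sample, and $c = m_0 = 1$.

```python
import math

c, u, m0 = 1.0, 0.6, 1.0
alpha = 1 / math.sqrt(1 - u**2 / c**2)


def lorentz_factor(vel):
    return 1 / math.sqrt(1 - sum(comp**2 for comp in vel) / c**2)


vy = (0.3, -0.5, 0.2)                      # sample velocity measured by Y
eta = 1 + u * vy[0] / c**2
vx = ((vy[0] + u) / eta, vy[1] / (alpha * eta), vy[2] / (alpha * eta))
beta, gamma = lorentz_factor(vy), lorentz_factor(vx)

p_y = [m0 * beta * comp for comp in vy]    # momentum components used by Y
p_x = [m0 * gamma * comp for comp in vx]   # momentum components used by X

checks = [
    math.isclose(p_x[0], alpha * p_y[0] + alpha * m0 * beta * u),
    math.isclose(p_x[1], p_y[1]),
    math.isclose(p_x[2], p_y[2]),
    math.isclose(p_y[0], alpha * p_x[0] - alpha * m0 * gamma * u),
]
print(checks)
```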
These relations are shown in Fig. 2.2. In that figure Observer X sees a particle of unit mass moving around a circle of radius one light-second at a linear speed of $\pi/4$ light-seconds per second (one revolution every eight seconds). Observer Y, moving as usual along the $x^1/y^1$-axis at constant linear speed equal to three-fifths of light speed, observes the particle doing a loop-the-loop. After 3.7 seconds have elapsed at the particle on a clock synchronized with X’s clock (5.35428 seconds on a clock at the particle that is synchronized with Y’s clock), the particle is at the point marked by a bullet in the figure. Observer Y perceives this event as occurring later (thinks X’s clock is slow). The momenta observed by the two observers are shown as solid arrows (the longer one in the case of the double arrow on the left). For comparison, the trajectory Y would observe using Newtonian physics is shown as a dashed line and the corresponding momentum as a dashed arrow on the right.
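The figure of 5.35428 seconds can be reproduced from the Lorentz transformation $s = \alpha(r - u x^1/c^2)$. The sketch below assumes (our reading, not stated in the text) that the particle starts at $(1, 0)$ when $r = 0$ and moves counterclockwise; units are light-seconds and seconds, so $c = 1$.

```python
import math

c = 1.0                      # light-seconds per second
u = 0.6                      # speed of Y relative to X
alpha = 1 / math.sqrt(1 - u**2 / c**2)
omega = math.pi / 4          # radians per second: speed pi/4 on a circle of radius 1


def position_seen_by_x(r):
    """Assumed motion: start at (1, 0) when r = 0, counterclockwise."""
    return math.cos(omega * r), math.sin(omega * r)


def y_time(r, x1):
    """Y's time coordinate of the event (r, x1)."""
    return alpha * (r - u * x1 / c**2)


r = 3.7
x1, x2 = position_seen_by_x(r)
s = y_time(r, x1)
print(f"s = {s:.5f}")        # prints s = 5.35428
```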
3. RELATIVISTIC FORCE 79
[Figure 2.2: axes $x^1$, $x^2$ and $y^1$, $y^2$; momentum vectors $p_X$ and $p_Y$.]
3. Relativistic Force
If A is the mover and B the thing moved, which has been moved a
certain distance Γ in a certain time Δ, then in the same amount
of time Δ a force (δύναμις, dynamis) equal to A will move half of
B twice the distance Γ or move it the same distance Γ in half of
the time Δ. Thus we have a direct proportion.
Aristotle, Physics, Book VII. My translation.
Although it is generally risky to “modernize” the words of scientists² who lived long
ago, we might think of the “thing moved” as either mass or weight. Aristotle would
not have had the concept of mass; but what we think of as weight, that is, the force
required to lift an object against gravity, would have made sense to him. With that
interpretation, Aristotle is saying that the ratio that a force applied to an object
bears to the weight of that object is directly proportional to the distance the object
moves and inversely proportional to the time required to move that distance. In
our terms the “constant of proportionality” (another anachronism in the world of
ancient Greek science) must be the reciprocal of a velocity. If, on the other hand,
we forge ahead with our modernized interpretation of “the thing moved” as a mass,
it appears that Aristotle thought of force as the cause of what we call momentum.
The quotation shows that he thought that forces were measured by numbers and
could be added and subtracted. He does not seem to take account of any direction
that a force may have. Having no precise definition of instantaneous velocity, he
really could handle only mechanical problems where direct proportion could be
applied. A quantitative understanding of even the simplest case of accelerated
motion—where the acceleration is constant—had to await the scholars of Merton
College, Oxford, in the fourteenth century. Even so, Aristotle’s use of the term
force is closer to the everyday nonscientific meaning we still give it today. Like
2 Indeed, the very concept of a “scientist” is anachronistic when applied to ancient times.
$$p = \frac{m_0}{\sqrt{1 - \dfrac{v\cdot v}{c^2}}}\,v.$$
This is a genuine, coordinate-free vector equation, since both sides are vectors used
by the same observer. Such will be the case with all the vectors that arise in the
present section, except when we compare the expressions for these vectors used by
two observers, at which point we shall once again revert to coordinate-wise notation.
In Newtonian mechanics, since momentum is directly proportional to velocity, it makes no difference whether we write Newton’s second law as $F = ma = mr''$ or $F = p'$. But these two definitions are not equivalent in special relativity, and we must choose between them. We choose the equation $F = p'$ as the definition of force. Although the equation is formally the same as the classical definition of force, there are two important differences: (1) the mass depends on the speed, and (2) consequently there is an extra term in the expression for the force:
$$F = m\frac{dv}{dt} + \frac{dm}{dt}\,v = ma + m'v.$$
In particular, the acceleration produced by a force is not generally parallel to the force.
For our purposes, we need to rearrange this last equation slightly, and to that end we compute $m'$ explicitly, using the chain rule:
$$m' = m_0\left(-\frac{1}{2}\right)\left(1 - \frac{v\cdot v}{c^2}\right)^{-3/2}\left(\frac{-2\,v\cdot a}{c^2}\right) = \frac{m\,(v\cdot a)}{c^2 - v\cdot v}.$$
Definition 2.1. The relativistic force on a moving particle is given by
$$F = m\left(a + \frac{v\cdot a}{c^2 - v\cdot v}\,v\right). \tag{2.6}$$
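As a sanity check on Definition 2.1, the sketch below differentiates the momentum $p = mv$ numerically along a hypothetical velocity history (speed kept below $c$; units with $c = m_0 = 1$) and compares the result with Eq. (2.6). It likewise compares a numerical $m'$ with $m(v\cdot a)/(c^2 - v\cdot v)$.

```python
import math

c, m0 = 1.0, 1.0


def vel(t):
    """Hypothetical velocity history (speed stays below c)."""
    return (0.4 * math.cos(t), 0.5 * math.sin(t))


def acc(t):
    """Exact derivative of vel(t)."""
    return (-0.4 * math.sin(t), 0.5 * math.cos(t))


def dot(p, q):
    return sum(a * b for a, b in zip(p, q))


def mass(t):
    v = vel(t)
    return m0 / math.sqrt(1 - dot(v, v) / c**2)


def momentum(t):
    return tuple(mass(t) * comp for comp in vel(t))


def force_eq_2_6(t):
    v, a, m = vel(t), acc(t), mass(t)
    k = dot(v, a) / (c**2 - dot(v, v))
    return tuple(m * (ai + k * vi) for ai, vi in zip(a, v))


t, h = 1.1, 1e-5
force_numeric = tuple((hi - lo) / (2 * h) for lo, hi in zip(momentum(t - h), momentum(t + h)))
dm_numeric = (mass(t + h) - mass(t - h)) / (2 * h)
dm_formula = mass(t) * dot(vel(t), acc(t)) / (c**2 - dot(vel(t), vel(t)))
print(force_numeric, force_eq_2_6(t))
print(dm_numeric, dm_formula)
```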
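Since the component of $a$ perpendicular to $v$ is
$$a - \frac{v\cdot a}{v\cdot v}\,v,$$
we can resolve $F$ into components parallel and perpendicular to $v$:
$$\begin{aligned}
F &= m\left(a - \frac{v\cdot a}{v\cdot v}\,v\right) + m\,(v\cdot a)\left(\frac{1}{c^2 - v\cdot v} + \frac{1}{v\cdot v}\right)v\\
&= m\left(a - \frac{v\cdot a}{v\cdot v}\,v\right) + m\,\frac{c^2\,(v\cdot a)}{(v\cdot v)(c^2 - v\cdot v)}\,v.
\end{aligned}$$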
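The decomposition can be verified for sample vectors. In this sketch $v$ and $a$ are arbitrary illustrative values (with $|v| < c = 1$). The two pieces add back up to Eq. (2.6), and the first piece is orthogonal to $v$.

```python
import math

c, m0 = 1.0, 1.0
v = (0.3, 0.4, 0.5)          # illustrative velocity, |v|^2 = 0.5 < c^2
a = (0.2, -0.7, 0.1)         # illustrative acceleration


def dot(p, q):
    return sum(x * y for x, y in zip(p, q))


vv, va = dot(v, v), dot(v, a)
m = m0 / math.sqrt(1 - vv / c**2)

F = [m * (ai + va / (c**2 - vv) * vi) for ai, vi in zip(a, v)]       # Eq. (2.6)
F_perp = [m * (ai - va / vv * vi) for ai, vi in zip(a, v)]            # perpendicular to v
F_par = [m * c**2 * va / (vv * (c**2 - vv)) * vi for vi in v]         # parallel to v

print(max(abs(f - (p + q)) for f, p, q in zip(F, F_perp, F_par)), dot(F_perp, v))
```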
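We can now express the acceleration in terms of the force and the velocity. The key to doing so is the relation
$$F\cdot v = m\,\frac{c^2\,(v\cdot a)}{c^2 - v\cdot v} = m_0\alpha^3\,(v\cdot a) = c^2\,\frac{dm}{dt},$$
where here $\alpha = (1 - v\cdot v/c^2)^{-1/2}$ refers to the particle’s speed. This relation allows us to express $v\cdot a$ as
$$v\cdot a = \frac{F\cdot v}{mc^2}\,(c^2 - v\cdot v) = \frac{F\cdot v}{m_0\alpha^3}.$$
Substituting this into Eq. (2.6) and solving for $a$, we obtain
$$a = \frac{F}{m} - \frac{v\cdot a}{c^2 - v\cdot v}\,v = \frac{F}{m} - \frac{F\cdot v}{mc^2}\,v. \tag{2.7}$$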
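Equation (2.7) can be checked by a round trip: compute $F$ from a sample $(v, a)$ with Eq. (2.6), then recover $a$ from $(v, F)$ with Eq. (2.7). The sketch also confirms $F\cdot v = m_0\alpha^3(v\cdot a)$, with $\alpha$ the Lorentz factor of the particle’s speed. The vectors are illustrative and $c = m_0 = 1$.

```python
import math

c, m0 = 1.0, 1.0


def dot(p, q):
    return sum(x * y for x, y in zip(p, q))


def force_from_acceleration(v, a):
    """Eq. (2.6)."""
    m = m0 / math.sqrt(1 - dot(v, v) / c**2)
    k = dot(v, a) / (c**2 - dot(v, v))
    return tuple(m * (ai + k * vi) for ai, vi in zip(a, v))


def acceleration_from_force(v, F):
    """Eq. (2.7)."""
    m = m0 / math.sqrt(1 - dot(v, v) / c**2)
    k = dot(F, v) / (m * c**2)
    return tuple(Fi / m - k * vi for Fi, vi in zip(F, v))


v = (0.6, 0.2, -0.3)
a = (0.1, 0.5, 0.4)
F = force_from_acceleration(v, a)
a_back = acceleration_from_force(v, F)
alpha = 1 / math.sqrt(1 - dot(v, v) / c**2)
print(a_back, dot(F, v), m0 * alpha**3 * dot(v, a))
```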
Equation (2.7) shows that the relativistic expression for the acceleration due to a force is the same as the classical expression (with the relativistic mass $m$ in place of the rest mass) when the force is perpendicular to the velocity. Such a situation arises, for example, in the case of a charge moving in
a magnetic field. But in general, to get a desired acceleration a, one must apply
a force that has a component perpendicular to a. Since one component of the
relativistic force is parallel to the velocity rather than the acceleration, we might
revert to our driving analogy to say that steering is more complicated than we
instinctively think if relativistic forces are involved. A turn of the steering wheel
must generally be accompanied by pressure on the brake or the accelerator in
order to get the car accelerating in the desired direction. It is fortunate that our
automobiles do not move at speeds comparable to c. If they did, steering them
would become a very counter-intuitive process and would require a lot of practice
before one could be safely licensed to drive.
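Theorem 2.6. Two observers X and Y using coordinates such that Y has velocity $u = u\,i$ relative to X and X has velocity $-u = -u\,i$ relative to Y will reconcile the components of force by the following formulas:
$$\begin{aligned}
f_{x1} &= \frac{1}{\alpha\eta}\,p_{x1}' = \frac{1}{\alpha\eta}\left(\alpha p_{y1}' + \frac{\alpha m_0\beta^3}{c^2}\,(v_y\cdot a_y)\,u\right) = \frac{1}{\eta}\,f_{y1} + \frac{m_0\beta^3}{\eta c^2}\,(v_y\cdot a_y)\,u,\\
f_{x2} &= \frac{1}{\alpha\eta}\,p_{x2}' = \frac{1}{\alpha\eta}\,p_{y2}' = \frac{1}{\alpha\eta}\,f_{y2},\\
f_{x3} &= \frac{1}{\alpha\eta}\,p_{x3}' = \frac{1}{\alpha\eta}\,p_{y3}' = \frac{1}{\alpha\eta}\,f_{y3}.
\end{aligned}$$
Proof. This is a computation entirely contained in the previous discussion. Along the particle’s world line $dr/ds = \alpha\eta$, so that $f_x = dp_x/dr = p_x'(s)/(\alpha\eta)$; the relations of Theorem 2.5 are then differentiated with respect to $s$, using $\beta' = (\beta^3/c^2)(a_y\cdot v_y)$.

Corollary 2.1. In the special case when the velocity $v_y$ is perpendicular to $u$ (that is, $(y^1)' = 0$), these equations simplify to
$$\begin{aligned}
f_{x1} &= \frac{1}{\alpha}\,p_{x1}' = \frac{1}{\alpha}\left(\alpha p_{y1}' + \frac{\alpha m_0\beta^3}{c^2}\,(v_y\cdot a_y)\,u\right) = f_{y1} + \frac{m_0\beta^3}{c^2}\,(v_y\cdot a_y)\,u, &&\qquad (2.8)\\
f_{x2} &= \frac{1}{\alpha}\,p_{x2}' = \frac{1}{\alpha}\,p_{y2}' = \frac{1}{\alpha}\,f_{y2}, &&\qquad (2.9)\\
f_{x3} &= \frac{1}{\alpha}\,p_{x3}' = \frac{1}{\alpha}\,p_{y3}' = \frac{1}{\alpha}\,f_{y3}. &&\qquad (2.10)
\end{aligned}$$
Proof. In this case, we have $\eta = 1$.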
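A numerical sketch of the force relations, along a hypothetical world line given in Y’s coordinates (units with $c = m_0 = 1$, $u = 0.6$). It maps the world line to X’s coordinates with $x^1 = \alpha(y^1 + us)$ and $r = \alpha(s + uy^1/c^2)$, which we assume is the Lorentz transformation of Chapter 1 for this orientation. It then computes $f_x = dp_x/dr$ by central differences and compares the result with the relations obtained by differentiating Theorem 2.5 with respect to $s$, namely $f_{x1} = f_{y1}/\eta + m_0\beta^3(v_y\cdot a_y)u/(\eta c^2)$ and $f_{xk} = f_{yk}/(\alpha\eta)$ for $k = 2, 3$.

```python
import math

c, u, m0 = 1.0, 0.6, 1.0
alpha = 1 / math.sqrt(1 - u**2 / c**2)


def y_vel(s):
    """Hypothetical velocity measured by Y at Y-time s (speed stays below c)."""
    return (0.5 * math.cos(s), -0.6 * math.sin(2 * s), 0.2)


def y_acc(s):
    """Exact derivative of y_vel(s)."""
    return (-0.5 * math.sin(s), -1.2 * math.cos(2 * s), 0.0)


def lorentz_factor(vel):
    return 1 / math.sqrt(1 - sum(comp**2 for comp in vel) / c**2)


def p_y(s):
    v = y_vel(s)
    return [m0 * lorentz_factor(v) * comp for comp in v]


def x_vel(s):
    """X's velocity at the event with Y-time s, plus dr/ds along the world line."""
    vy = y_vel(s)
    dr_ds = alpha * (1 + u * vy[0] / c**2)
    dx_ds = (alpha * (vy[0] + u), vy[1], vy[2])
    return [comp / dr_ds for comp in dx_ds], dr_ds


def p_x(s):
    vx, _ = x_vel(s)
    return [m0 * lorentz_factor(vx) * comp for comp in vx]


s, h = 0.8, 1e-5
_, dr_ds = x_vel(s)
f_y = [(hi - lo) / (2 * h) for lo, hi in zip(p_y(s - h), p_y(s + h))]            # dp_y/ds
f_x = [(hi - lo) / (2 * h) / dr_ds for lo, hi in zip(p_x(s - h), p_x(s + h))]    # dp_x/dr

vy, ay = y_vel(s), y_acc(s)
beta = lorentz_factor(vy)
eta = 1 + u * vy[0] / c**2
va = sum(p * q for p, q in zip(vy, ay))
predicted = [
    f_y[0] / eta + m0 * beta**3 * va * u / (eta * c**2),
    f_y[1] / (alpha * eta),
    f_y[2] / (alpha * eta),
]
print(f_x)
print(predicted)
```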
The relativistic forces on the projectile depicted in Fig. 2.1 are shown in Fig. 2.3,
both being scaled by a factor of 25. Qualitatively speaking, the departure from
the classical downward force is not really noticeable, even with this much scaling.
(The force actually points downward and very slightly to the right.) The main
departure from Newtonian mechanics is that the two observers, even though in
uniform relative motion along a straight line, do not agree about the force.
In the case of the uniform circular motion shown in Fig. 2.2, where $a_X$ is
perpendicular to $v_X$, the relativistic acceleration and force align with the Newtonian
acceleration and force, only the relativistic force is larger because the relativistic
mass of the particle is larger. From Y’s point of view, $a_Y \cdot v_Y = -0.0465046$ light-seconds-squared per second-cubed, so that the relativistic force does not point in the same direction as the relativistic acceleration (shown magnified by a factor of 5 on the right-hand side of Fig. 2.4). A computation made using Mathematica reveals that the two sides of Eq. (2.6) are both equal to $0.872973\,i - 0.167666\,j$.
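The numbers quoted for Fig. 2.4 can also be reproduced in Python instead of Mathematica. The sketch below assumes, as before, that X’s particle starts at $(1, 0)$ and moves counterclockwise, with $c = m_0 = 1$. It maps X’s circular motion into Y’s frame, differentiates numerically in X-time $r$, and evaluates $a_Y\cdot v_Y$ and both sides of Eq. (2.6) at $r = 3.7$.

```python
import math

c, u, m0 = 1.0, 0.6, 1.0
alpha = 1 / math.sqrt(1 - u**2 / c**2)
omega = math.pi / 4


def x_vel(r):
    """X's velocity for uniform circular motion of radius 1 (assumed start (1, 0))."""
    return (-omega * math.sin(omega * r), omega * math.cos(omega * r))


def y_vel(r):
    """Y's velocity at the event with X-time r, and ds/dr along the world line."""
    vx = x_vel(r)
    ds_dr = alpha * (1 - u * vx[0] / c**2)
    return (alpha * (vx[0] - u) / ds_dr, vx[1] / ds_dr), ds_dr


def dot(p, q):
    return sum(a * b for a, b in zip(p, q))


def mass(v):
    return m0 / math.sqrt(1 - dot(v, v) / c**2)


def p_y(r):
    v, _ = y_vel(r)
    return tuple(mass(v) * comp for comp in v)


r, h = 3.7, 1e-5
v, ds_dr = y_vel(r)
a = tuple((hi - lo) / (2 * h) / ds_dr for lo, hi in zip(y_vel(r - h)[0], y_vel(r + h)[0]))
force_lhs = tuple((hi - lo) / (2 * h) / ds_dr for lo, hi in zip(p_y(r - h), p_y(r + h)))  # dp/ds
k = dot(v, a) / (c**2 - dot(v, v))
force_rhs = tuple(mass(v) * (ai + k * vi) for ai, vi in zip(a, v))                        # Eq. (2.6)
print(f"a_Y . v_Y = {dot(a, v):.7f}")
print("dp/ds    :", [round(x, 6) for x in force_lhs])
print("Eq. (2.6):", [round(x, 6) for x in force_rhs])
```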
3.1. How are relativistic forces determined? At this point, we have built
up the basic machinery of mechanics in the context of special relativity. We have
done this by assuming (1) that the speed of light c in empty space is the same for
all observers and (2) that any two observers in motion at constant relative velocity
must use the same kinematic laws when they both observe a moving particle. These
assumptions led to the Lorentz transformation for space-time coordinates and to
the relativistic expressions for the transformation of velocity and acceleration of a
moving particle. When we adjoined the assumption (3) that the two observers must
both agree on the dynamic law known as conservation of momentum, we were led
to the relativistic phenomenon—absent from classical mechanics—that the mass of
a particle measured by an observer depends on its speed relative to that observer.
[Figure 2.3, panels (a), (b): axes $x^1$, $x^2$ and $y^1$, $y^2$; force vectors $f_X$, $f_Y$; angle $\theta$.]
That fact in turn led us to the relativistic version of the fundamental second law of motion $F = p'$.
Newtonian mechanics is useful only when we have a complete list of the relevant
forces acting on a body. The net force acting on a specific moving particle cannot
be defined to be simply ma. That definition leads to the highly uninteresting,
though indisputable, equation ma = ma. To get any practical value out of the
theory for explaining the motion of a pendulum or a vibrating string or membrane
or an elastic spring or a planet, we need to bring in a list of specific forces such
as tension, friction, air resistance, and gravity so that we get a second expression
for the force. By setting that second expression equal to ma, we get a system of second-order differential equations whose solution will describe the motion of the body. All that is well known, and brings us to a significant question: How do we get that second expression for the force so that we can obtain a system of equations of motion for a particle?

[Figure 2.4: axes $x^1$, $x^2$ and $y^1$, $y^2$; momentum vectors $p_X$, $p_Y$; force vectors $f_X$, $f_Y$; acceleration $a_Y$.]
In some cases (for example, the Lorentz equation $F = q(E + (1/c)\,v \times B)$ for a charged particle moving in an electromagnetic field) we simply take the classical definition, setting this force equal to the relativistic force $p'$. Since the force is
now the relativistic force and in general vectors are not invariant under the Lorentz
transformation, the assumption of this law in relativity necessitates a revision in
the equations of transformation for the electric and magnetic fields if two observers
in uniform relative motion are to agree on the basic physical law. Einstein carried
out that revision in his first paper on special relativity [17], and one consequence of
it was a stunning simplification in Maxwell’s laws, essentially a reduction from four
equations, two of which were vector equations, to a single pair of scalar equations.
This reduction will be discussed in Chapter 3.
Assuming we have solved the problem of postulating the relativistic force on a particle (and we have emphatically not done this yet for the important case of gravitational forces), the description of the motion of a particle under the influence of a relativistic force is more complicated than in the classical case, since the relativistic expression for force introduces terms into the differential equations involving not only $r''$ but also $r'$, which is frequently absent from the equations describing mechanical processes.
The force on a charged particle given by the Lorentz equation in electromagnetic theory depends on the charge of the particle, not on its mass. In this respect the Lorentz law and Coulomb’s law of electrostatics differ from the inverse-square law of gravity given by Newton, where the force is
4. WORK, ENERGY, AND THE FAMOUS E = mc2 85
The work $W(\gamma)$ is the sum of infinitesimal terms, each of which is of the form “force × distance.” The result of doing such work is a change $\Delta K$ in the kinetic energy $(1/2)mv^2 = m|r'(t)|^2/2$. To see this, suppose the particle is at location $r(t)$ at time $t$, and compute the work using $t$ as a parameter. Since $F = mr''$,
$$W(\gamma) = \int_a^b m\,r''(t)\cdot r'(t)\,dt = \int_a^b \frac{d}{dt}\left(\frac{1}{2}m\,r'(t)\cdot r'(t)\right)dt = \frac{1}{2}m|r'(b)|^2 - \frac{1}{2}m|r'(a)|^2.$$
see that it is more convenient to think of them as length times velocity-squared divided by mass.
Thus, the work is equal to the change in kinetic energy of the particle. We observe
that it is only the portion of the force tangential to the path that does any work
on the particle. The normal component is there only to keep the particle on the
path. If there is no normal force, then the particle moves along a straight line.
Caution: The final expression here is determined by the values of the velocity at just the endpoints of the path. But the value of $r'(b)$ may well (and in general does) depend on the particular path joining $r(a)$ to $r(b)$. We cannot in general fix $a$ and then define the work done as a function of a point $r$. There is, however, one
case in which we can do that, and it is an important one that we shall discuss after
we describe work and kinetic energy in relativity.
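A numerical sketch of the work–energy identity: for a hypothetical trajectory $r(t) = (\cos t, \sin 2t)$ and mass $m = 2$ (both chosen only for illustration), Simpson’s rule applied to $\int F\cdot r'\,dt$ with $F = mr''$ reproduces the change in $\tfrac12 m|r'|^2$.

```python
import math

m = 2.0


def r_prime(t):
    """Velocity of the hypothetical path r(t) = (cos t, sin 2t)."""
    return (-math.sin(t), 2 * math.cos(2 * t))


def r_double_prime(t):
    return (-math.cos(t), -4 * math.sin(2 * t))


def simpson(f, a, b, n=2000):
    """Composite Simpson's rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def power(t):
    """F . r' with F = m r''."""
    return sum(m * p * q for p, q in zip(r_double_prime(t), r_prime(t)))


t0, t1 = 0.0, 2.5
work = simpson(power, t0, t1)
delta_k = 0.5 * m * (sum(x**2 for x in r_prime(t1)) - sum(x**2 for x in r_prime(t0)))
print(work, delta_k)
```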
4.2. Special relativity. The relativistic definition of the work done by a force $F(t)$ acting over a path parameterized by a function $x(t)$, $t_0 \le t \le t_1$, follows the Newtonian pattern, using the line integral of the force over the path:
$$W = \int_{t_0}^{t_1} F(t)\cdot x'(t)\,dt.$$
Theorem 2.7. The work done by a relativistic force moving a particle over a path is determined by the speed of the particle at the initial and terminal points. If those speeds are $v_0$ and $v_1$ respectively, the work is
$$W = (m_{v_1} - m_{v_0})\,c^2. \tag{2.11}$$
In other words, the work done in accelerating the particle from speed $v_0$ to speed $v_1$ is the relativistic increase in the mass of the particle multiplied by the square of the speed of light. We define this expression to be the change in kinetic energy of the particle. Thus, if we assign to the particle a “rest energy” of $m_0c^2$, then its kinetic energy at speed $v$ is $(m_v - m_0)c^2$, and its total energy is given by the most famous equation of twentieth-century physics:
$$E = mc^2. \tag{2.12}$$
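Theorem 2.7 can be illustrated numerically. The sketch below (units with $c = m_0 = 1$; the velocity history is hypothetical) integrates $F\cdot v$, with $F$ from Eq. (2.6), by Simpson’s rule and compares the result with $(m_{v_1} - m_{v_0})c^2$.

```python
import math

c, m0 = 1.0, 1.0


def vel(t):
    """Hypothetical velocity history; speed stays below c on [0, 2]."""
    return (0.3 + 0.2 * t, 0.1 * t)


def acc(t):
    return (0.2, 0.1)


def dot(p, q):
    return sum(x * y for x, y in zip(p, q))


def mass(v):
    return m0 / math.sqrt(1 - dot(v, v) / c**2)


def force(t):
    """Relativistic force, Eq. (2.6)."""
    v, a = vel(t), acc(t)
    k = dot(v, a) / (c**2 - dot(v, v))
    return tuple(mass(v) * (ai + k * vi) for ai, vi in zip(a, v))


def simpson(f, a, b, n=2000):
    """Composite Simpson's rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


work = simpson(lambda t: dot(force(t), vel(t)), 0.0, 2.0)
expected = (mass(vel(2.0)) - mass(vel(0.0))) * c**2
print(work, expected)
```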
Equation (2.12) expresses the equivalence of mass and energy, undoubtedly the best-known of all the consequences of the special theory of relativity. As a consequence of Eq. (2.12), the relativistic kinetic energy $T$ is slightly larger than the Newtonian, which is $m_0v^2/2$. It is
$$\begin{aligned}
T = mc^2 - m_0c^2 &= m_0c^2(\alpha - 1)\\
&= m_0c^2\left(-1 + (1 - v^2/c^2)^{-1/2}\right)\\
&= m_0c^2\left(\frac{1}{2}\frac{v^2}{c^2} + \frac{3}{8}\frac{v^4}{c^4} + \frac{5}{16}\frac{v^6}{c^6} + \cdots\right)\\
&= \frac{1}{2}m_0v^2 + \frac{3m_0v^4}{8c^2} + \frac{5m_0v^6}{16c^4} + \cdots.
\end{aligned}$$
(See Problem 2.4 below.)
In the limit as $c$ tends to infinity, the relativistic kinetic energy approaches the Newtonian. The difference is hardly measurable at ordinary speeds, where $v < 10^{-7}c$. That fact gives us confidence that we have adopted the correct relativistic definition of kinetic energy.
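A short numerical sketch of the expansion, writing $x = v^2/c^2$ so that $T/(m_0c^2) = (1 - x)^{-1/2} - 1$. The exact value is computed with `math.expm1` and `math.log1p` to avoid cancellation at small $x$. At $v = 10^{-7}c$ the relative excess of $T$ over the Newtonian value comes out near $3x/4 = 7.5\times10^{-15}$.

```python
import math


def kinetic_exact(x):
    """T / (m0 c^2) = (1 - x)^(-1/2) - 1 with x = v^2 / c^2, without cancellation."""
    return math.expm1(-0.5 * math.log1p(-x))


def kinetic_series(x, terms=3):
    """Partial sum x/2 + 3x^2/8 + 5x^3/16 + ... of the binomial series."""
    total, coeff = 0.0, 1.0
    for n in range(1, terms + 1):
        coeff *= (2 * n - 1) / (2 * n)      # 1/2, 3/8, 5/16, ...
        total += coeff * x**n
    return total


x = 0.25                                    # v = c/2
print(kinetic_exact(x), kinetic_series(x, 3), kinetic_series(x, 40))

x_slow = 1e-14                              # v = 1e-7 c
excess = kinetic_exact(x_slow) / (x_slow / 2) - 1
print(f"relative excess over Newtonian T: {excess:.2e}")
```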
The negative sign appears in this definition because the potential energy $V(r)$ is the work done against the field $F$ in moving a particle from the base point $r_0$ to the variable point $r$.
where the nabla-symbol (∇) denotes the gradient operator. This operator will be
used often in what follows, especially in connection with the Euler equations in
the calculus of variations and harmonic functions in mechanics. William Rowan
Hamilton introduced the symbol in 1837 in the form . After vector analysis was
distilled from Hamilton’s quaternions, the symbol was found useful by Peter Guthrie
Tait (1831–1901), an enthusiastic champion of vector analysis. In an address to the
Edinburgh Physical Society in 1889, published the following year ([79], p. 92),
he said that Maxwell had an aversion to the name, but preferred it to the more
descriptive, but vulgar-sounding names sloper and grader. Tait goes on to say that
nabla is the Hebrew word for an ancient Assyrian harp—that word is nowadays
transliterated as nevel —and that the name was suggested by the orientalist William
Robertson Smith (1846–1894).
The resemblance of the symbol ∇ to an upside-down version of the Greek letter Δ led Maxwell to have some fun with Tait. On 7 November 1870 he wrote to Tait, “What do you call this? Atled?” And then on 23 January 1871 he asked Tait, “Still harping on that Nabla?” Those details and a fuller history can be found at the following website.
www.mat.univie.ac.at/~neum/contrib/nabla.txt
The multiplicity of potential functions corresponding to a given force (a differ-
ent one for every base point) is not a problem, since they all have the same gradient,
namely the negative of the force itself.
If the kinetic energy of a particle in motion whose position at time $t$ is $r(t)$ is defined as $T(t) = m|r'(t)|^2/2$, we have $T'(t) = m\,r''(t)\cdot r'(t) = F\big(r(t)\big)\cdot r'(t)$. As a result, we have proved the following fundamental fact:
5. NEWTONIAN POTENTIAL ENERGY 89
Theorem 2.8. In Newtonian mechanics, if the force $F(r)$ depends only on the position and is the negative of the gradient of a potential function $V(r)$, then the total energy $V(t) + T(t)$ of a moving particle at any time $t$ is constant:
$$V'(t) = \nabla V\big(r(t)\big)\cdot r'(t) = -F\big(r(t)\big)\cdot r'(t) = -T'(t).$$
This theorem is the law of conservation of energy and is the reason for using
the term conservative to describe a force field that is the negative of the gradient
of a potential. Since, as we shall prove below, the Newtonian force of gravitation is
conservative, it follows that the gravitational field does no net work when a particle
makes one revolution in a periodic orbit.
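Theorem 2.8 can be watched at work in a simulation. The sketch below integrates a planar orbit in the hypothetical potential $V(r) = -k/|r|$ (so $F = -\nabla V = -k\,r/|r|^3$) using the velocity Verlet method, which is our choice of integrator rather than anything in the text. It then reports how far $V + T$ wanders over roughly one revolution.

```python
import math

k, m = 1.0, 1.0            # hypothetical constants: V(r) = -k / |r|


def force(pos):
    """F = -grad V = -k r / |r|^3."""
    x, y = pos
    d3 = (x * x + y * y) ** 1.5
    return (-k * x / d3, -k * y / d3)


def energy(pos, vel):
    """Total energy V + T."""
    return -k / math.hypot(*pos) + 0.5 * m * (vel[0] ** 2 + vel[1] ** 2)


pos, vel = (1.0, 0.0), (0.0, 1.2)          # bound, elliptical orbit (energy -0.28)
dt, steps = 1e-3, 15000                    # about one orbital period
e0 = energy(pos, vel)
f = force(pos)
max_drift = 0.0
for _ in range(steps):
    vel = (vel[0] + 0.5 * dt * f[0] / m, vel[1] + 0.5 * dt * f[1] / m)   # half kick
    pos = (pos[0] + dt * vel[0], pos[1] + dt * vel[1])                   # drift
    f = force(pos)
    vel = (vel[0] + 0.5 * dt * f[0] / m, vel[1] + 0.5 * dt * f[1] / m)   # half kick
    max_drift = max(max_drift, abs(energy(pos, vel) - e0))
print(f"initial energy {e0:.6f}, largest deviation {max_drift:.1e}")
```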
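5.1. The criterion for a field to be conservative. Locally, a force field is conservative if it is irrotational. That means its curl is 0: If $F = P\,i + Q\,j + R\,k$, then
$$0 = \operatorname{curl} F = \nabla\times F = \left(\frac{\partial R}{\partial y} - \frac{\partial Q}{\partial z}\right)i + \left(\frac{\partial P}{\partial z} - \frac{\partial R}{\partial x}\right)j + \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right)k.$$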
Conversely, any field that is the negative of the gradient of a potential function (that is, any conservative field) is irrotational: $\operatorname{curl}\operatorname{grad} V = \nabla\times(\nabla V) = 0$. That fact is a consequence of the equality of mixed partial derivatives, as one can see by replacing $P$, $Q$, and $R$ in the last equation by their values as partial derivatives of $V$.
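The criterion does not work globally, however, as we can see by considering the field $F(x, y, z) = -\dfrac{y}{x^2 + y^2}\,i + \dfrac{x}{x^2 + y^2}\,j$, which is defined everywhere except on the $z$-axis. The curl is given by
$$\begin{aligned}
\nabla\times F &= -\frac{\partial}{\partial z}\left(\frac{x}{x^2 + y^2}\right)i - \frac{\partial}{\partial z}\left(\frac{y}{x^2 + y^2}\right)j + \left(\frac{\partial}{\partial x}\left(\frac{x}{x^2 + y^2}\right) - \frac{\partial}{\partial y}\left(\frac{-y}{x^2 + y^2}\right)\right)k\\
&= \left(\frac{1}{x^2 + y^2} - \frac{2x^2}{(x^2 + y^2)^2} + \frac{1}{x^2 + y^2} - \frac{2y^2}{(x^2 + y^2)^2}\right)k = 0.
\end{aligned}$$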
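It is therefore an irrotational field. But if we parameterize a path $\gamma_1$ that begins at $(1, 0, 0)$ and ends at $(-1, 0, 0)$ as $r(t) = \cos t\,i + \sin t\,j$, $0 \le t \le \pi$, we have
$$\int_{\gamma_1} F\cdot dr = \int_0^\pi \left(\sin^2 t + \cos^2 t\right)dt = \pi,$$
while over the path $\gamma_2$ given by $r(t) = \cos t\,i - \sin t\,j$, $0 \le t \le \pi$, which joins the same two points, we get
$$\int_{\gamma_2} F\cdot dr = \int_0^\pi \left(-\sin^2 t - \cos^2 t\right)dt = -\pi.$$
Since the work depends on the path, $F$ cannot be the negative of the gradient of a potential defined on its whole domain, even though its curl vanishes there.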
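A numerical sketch of this example: central differences confirm that the curl vanishes at a sample point off the $z$-axis, and Simpson’s rule reproduces the two line integrals $\pi$ and $-\pi$. The sample point and step sizes are arbitrary choices.

```python
import math


def field(x, y, z):
    """F = (-y i + x j) / (x^2 + y^2), undefined on the z-axis."""
    rho2 = x * x + y * y
    return (-y / rho2, x / rho2, 0.0)


def curl(x, y, z, h=1e-5):
    """Curl of field by central differences."""
    def d(component, axis):
        step = [0.0, 0.0, 0.0]
        step[axis] = h
        plus = field(x + step[0], y + step[1], z + step[2])[component]
        minus = field(x - step[0], y - step[1], z - step[2])[component]
        return (plus - minus) / (2 * h)
    return (d(2, 1) - d(1, 2), d(0, 2) - d(2, 0), d(1, 0) - d(0, 1))


def line_integral(path, velocity, a, b, n=2000):
    """Simpson's rule for the integral of F(path(t)) . path'(t) dt."""
    def integrand(t):
        return sum(f * dr for f, dr in zip(field(*path(t)), velocity(t)))
    h = (b - a) / n
    total = integrand(a) + integrand(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(a + i * h)
    return total * h / 3


upper = line_integral(lambda t: (math.cos(t), math.sin(t), 0.0),
                      lambda t: (-math.sin(t), math.cos(t), 0.0), 0.0, math.pi)
lower = line_integral(lambda t: (math.cos(t), -math.sin(t), 0.0),
                      lambda t: (-math.sin(t), -math.cos(t), 0.0), 0.0, math.pi)
print(curl(0.7, -0.4, 1.3), upper, lower)
```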
is the negative of the force given by this equality. The potential and its gradient are undefined at the origin but well-defined at every other point of $\mathbb{R}^3$. As mentioned above, that makes the condition $\nabla\times F = 0$ necessary and sufficient for the potential $V(r)$ to be defined. (We do not need the general theorem, however, since we already have an explicit expression for $V(r)$.)
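This particular force has the interesting property that not only its curl $\nabla\times F$ but also its divergence $\nabla\cdot F$ vanishes:
$$\operatorname{div} F = \nabla\cdot F = \frac{\partial F^1}{\partial x^1} + \frac{\partial F^2}{\partial x^2} + \frac{\partial F^3}{\partial x^3} = 0.$$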
This means that the potential function $V(r)$ is a harmonic function. It satisfies Laplace’s equation, named after Pierre-Simon Laplace (1749–1827):
$$\nabla^2 V = \nabla\cdot\nabla V = \operatorname{div}\operatorname{grad} V = -\operatorname{div} F \equiv 0.$$
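For the inverse-square case, which we assume is the force under discussion ($F = -k\,r/|r|^3$ with potential $V = -k/|r|$ and $k = 1$), the following sketch estimates $\nabla\cdot F$ and $\nabla^2 V$ by central differences at an arbitrary point away from the origin. Both come out at the level of discretization error.

```python
import math

k = 1.0   # hypothetical strength constant


def potential(p):
    """V = -k / |p|."""
    return -k / math.sqrt(sum(x * x for x in p))


def field(p):
    """F = -grad V = -k p / |p|^3."""
    r3 = sum(x * x for x in p) ** 1.5
    return [-k * x / r3 for x in p]


def shifted(p, axis, delta):
    q = list(p)
    q[axis] += delta
    return q


def divergence(p, h=1e-5):
    """Central-difference estimate of div F at p."""
    return sum(
        (field(shifted(p, i, h))[i] - field(shifted(p, i, -h))[i]) / (2 * h)
        for i in range(3)
    )


def laplacian(p, h=1e-4):
    """Second-difference estimate of the Laplacian of V at p."""
    return sum(
        (potential(shifted(p, i, h)) - 2 * potential(p) + potential(shifted(p, i, -h))) / h**2
        for i in range(3)
    )


point = [0.8, -0.5, 1.1]
print(divergence(point), laplacian(point))
```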
It is well known that there is only one function harmonic in the interior of a closed surface (such as a sphere) that takes a given set of values on that surface. If we didn’t already know the Newtonian potential, we could get Newton’s equations of motion by looking for a potential function that is a spherically symmetric harmonic function of $r = \sqrt{x^2 + y^2 + z^2}$, say $V(r)$. For such a function Laplace’s equation says that
$$V''(r) + \frac{2V'(r)}{r} = 0,$$
and the general solution of this equation is $V(r) = a + b/r$. Since additive constants make no difference to a potential function, we see that $V(r)$ could be taken to be of the form $b/r$, as in fact it is.
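The general solution quoted above follows by writing the radial part of the Laplacian as a total derivative. As a cross-check, trying a power law $V = r^n$ shows that only $n = 0$ and $n = -1$ work:

```latex
V'' + \frac{2}{r}V' = \frac{1}{r^2}\,\frac{d}{dr}\left(r^2 V'\right) = 0
\;\Longrightarrow\; r^2 V'(r) = -b
\;\Longrightarrow\; V'(r) = -\frac{b}{r^2}
\;\Longrightarrow\; V(r) = a + \frac{b}{r};
\qquad
V = r^n:\quad V'' + \frac{2}{r}V' = n(n+1)\,r^{n-2} = 0 \iff n \in \{0, -1\}.
```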
The value of potential functions is not really computational. In the end, they
send us back to the Newtonian forces when we take their gradients. But they
have great value when combined with the kinetic energy function in what is called
Hamilton’s principle, which we shall discuss below.
To gain further insight, we replace the particle at the origin by a continuous density⁴ $\rho(r)$. In that case, there is still a potential function, since we can compute the gravitational field due to this density by passing to the limit of sums of point masses $\rho(r)\,dV$; for the resultant (vector sum) of conservative forces is conservative. When the gravitating mass is given by a density, the potential energy per unit mass $\varphi(r) = V(r)/m$ satisfies Poisson’s equation, named after Siméon-Denis Poisson (1781–1840):
$$\nabla^2\varphi(r) = 4\pi G\rho(r). \tag{2.14}$$
This last relation is directly computable if the density ρ(r) is spherically sym-
metric, that is, can be written as ρ(r) where r = |r|. For that case, Newton
showed that the gravitational attraction exerted on a particle of mass m located
4 It is most unfortunate that, as remarked in the preceding chapter, the symbol ρ is used to
denote both the radial coordinate when spherical coordinates are used and the density of mass or
charge. The usage is so well established, however, that our formulas would only look peculiar if
we chose a different symbol. The reader is hereby warned once again to be alert for the ambiguity.
Also, having ousted the symbol from its role as the radial variable in spherical coordinates on ℝ³,
we are forced to replace it with r, which is normally the radial coordinate in ℝ².
92 2. RELATIVISTIC MECHANICS
at the point x by a spherical shell of radius s with center at the origin and having
constant two-dimensional density ρ̃ per unit area satisfies
$$\mathbf F(\mathbf x) = \begin{cases} -\dfrac{4\pi\tilde\rho G m s^2}{r^3}\,\mathbf x, & \text{if } r = |\mathbf x| > s,\\[6pt] \mathbf 0, & \text{if } r = |\mathbf x| < s. \end{cases}$$
That is, the force on a particle of mass m outside the sphere is the same as
would be exerted by a particle having the same mass $M = 4\pi\tilde\rho s^2$ as the whole
spherical shell but situated at the origin, while there is no net force at any point
inside the shell. We can define a volume density ρ on an infinitely thin shell of
thickness ds by taking ρ(s) ds = ρ̃(s) and consider the infinitesimal increment of
mass dM = ρ(s) dV = 4πs²ρ(s) ds and the infinitesimal increment of force given
by
$$d\mathbf F(\mathbf x) = -\frac{Gm}{r^3}\,dM\,\mathbf x = -\frac{Gm}{r^3}\bigl(4\pi s^2\rho(s)\,ds\bigr)\mathbf x = -\frac{4\pi G m}{r^2}\bigl(s^2\rho(s)\,ds\bigr)\boldsymbol\xi,$$
where ξ = x/|x| = x/r is a unit vector having the direction of x.
We then integrate over s from 0 to r along any radial line consisting of the points
sξ, 0 ≤ s ≤ r, in order to get the resultant attraction on a particle at x due to all
the shells. When we do, we find that the potential function V(r) is also spherically
symmetric, and the potential per unit mass is
$$\phi(r) = 4\pi G\int_0^r \rho(s)\Bigl(s - \frac{s^2}{r}\Bigr)\,ds.$$
(See Problem 2.7.) The potential per unit mass ϕ satisfies the Poisson equation
(2.14), which is directly computable in this case.
In the special case we are going to consider, where ρ(s) = ρ is constant, we
have
$$\phi(r) = \frac{2\pi G\rho r^2}{3}.$$
The force on a particle of mass m at a point x on the sphere of radius r is
therefore
$$\mathbf F(\mathbf x) = -\frac{4}{3}\pi G\rho m\,\mathbf x.$$
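The shell formula for ϕ(r) can be checked by quadrature. The sketch below assumes SI units and an illustrative constant density of 5500 kg/m³; it compares Simpson's-rule values of 4πG∫₀ʳ ρ(s)(s − s²/r) ds with the closed form 2πGρr²/3, and a central difference of ϕ with the force per unit mass −(4/3)πGρr. Both printed ratios should be 1 up to rounding (Simpson's rule is exact for the quadratic integrand here).

```python
import math

G = 6.674e-11  # gravitational constant in SI units (only sets the scale)


def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def phi(r, rho):
    """Potential per unit mass: 4*pi*G * integral_0^r rho(s) (s - s^2/r) ds."""
    return 4 * math.pi * G * simpson(lambda s: rho(s) * (s - s * s / r), 0.0, r)


rho0 = 5500.0                    # assumed constant density, kg/m^3
uniform = lambda s: rho0

for r in (1.0e3, 1.0e5, 6.4e6):
    closed_form = 2 * math.pi * G * rho0 * r**2 / 3
    h = 1e-3 * r                 # step for a central-difference force check
    force_per_mass = -(phi(r + h, uniform) - phi(r - h, uniform)) / (2 * h)
    print(r, phi(r, uniform) / closed_form,
          force_per_mass / (-4 * math.pi * G * rho0 * r / 3))
```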
Throughout this section, we have considered potential energy functions that
depend only on distance, not velocity. They correspond to forces that depend only
on the position of a particle and not on its velocity. As a result, our discussion does
not apply, for example, to a charged particle in a magnetic field, on which the force
is given by the Lorentz law F = (q/c) v × B (in Gaussian units), where q is the
charge of the particle, v its velocity, and B the magnetic field.
6. Hamilton’s Principle
Up to now, we have adopted the simplest progression to develop the mathematics of
special relativity, following the strict Newtonian sequence of concepts: time (t),
position (r = (x, y, z)), velocity (v), acceleration (a), mass (m), momentum (p),
and force (F). Then the velocity is v(t) = r′(t), the acceleration is a(t) = r″(t), the
momentum is p(t) = mv(t) = mr′(t), and whatever force F is hypothesized must
satisfy Newton's second law F(t) = p′(t) = mr″(t) = ma(t). In order to analyze
any mechanical phenomenon and determine the position of the particle r(t), it is
necessary (1) to know what the net force F(t) is, and (2) to solve the system of
differential equations mr″(t) = F(t). In the case that we are most interested in,
7. THE NEWTONIAN LAGRANGIAN 93
F depends on time only indirectly, being a function of the position r. The force is
given by Newton's law of gravitation, introduced above.
The formulation F = ma has a nice intuitive feel to it, but does not lend
itself readily to the kind of geometrization required for the general theory of
relativity. As a first step toward that geometrization, we have adjoined three
concepts introduced over the eighteenth and nineteenth centuries, mostly by
Continental mathematicians, namely the concepts of work (W), kinetic energy (T), and
potential energy (V). We then applied these concepts relativistically, noting that
the work done on a particle by a force, which in Newtonian mechanics equals the
change in its kinetic energy, could be interpreted as a change in its mass through
the famous equation E = mc², where $m = m_0\alpha = m_0c/\sqrt{c^2 - v^2}$ and m0 is the rest
mass.
Newton's formulation of physics does not involve the notion of energy, which
was foreshadowed by the vis viva (living force, mv², twice what we now call
kinetic energy) introduced by Newton's contemporary Leibniz. Their successors
in Britain and on the Continent introduced the concept of energy and showed that
Newton's second law could be reformulated by saying that a particle subject to a
conservative force moves so that the time integral of the difference between the
kinetic and potential energies per unit mass is stationary. As we are
primarily interested in the case of gravitation, we shall describe the computations
for that case.
The Lagrangian depends only on (1) the masses M and m of the two bodies, (2) the
universal gravitational constant G, and (3) the position and speed of the particle. In
the case of gravitational force, the potential energy V depends only on the position
and the kinetic energy T only on the speed. It turns out that when x, y, and z vary
with time (the particle moves), Newton’s equations of motion are equivalent to the
statement that the integral of the Lagrangian with respect to time is “stationary”
(an extremal). That fact is known as Hamilton’s principle, named after William
Rowan Hamilton (1805–1865). We state it as a formal theorem, although we keep
the context limited to potential functions that are independent of velocity:
Theorem 2.9. If the force acting on a particle is conservative, then over a time
interval t0 ≤ t ≤ t1 , the particle will move over a path r(t) for which the integral
$$\int_{t_0}^{t_1}\Bigl(T\bigl(\mathbf r'(t)\bigr) - V\bigl(\mathbf r(t)\bigr)\Bigr)\,dt$$
motion:
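$$\frac{d}{dt}\frac{\partial L}{\partial x'} = \frac{\partial L}{\partial x},\qquad \frac{d}{dt}\frac{\partial L}{\partial y'} = \frac{\partial L}{\partial y},\qquad \frac{d}{dt}\frac{\partial L}{\partial z'} = \frac{\partial L}{\partial z},$$
and these equations say that
$$m x'' = -\frac{\partial V}{\partial x},\qquad m y'' = -\frac{\partial V}{\partial y},\qquad m z'' = -\frac{\partial V}{\partial z},$$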
nearly abandoned when it was resurrected by Lagrange for the calculus of variations. It is still
much commoner than our primed notation.
6 The subscripted nabla-notation here is too neat to resist. It does, however, conflict with
the use of subscripted nablas to denote covariant derivatives, a notation found in all textbooks
on relativity and one that we shall introduce in Chapter 5. Our use of it in the present context
is restricted to this chapter and should not be allowed to interfere with its standard use when it
reappears later on.
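Hamilton's principle can be checked numerically in a simple case. The sketch below assumes a uniform gravitational field (V = mgz, a local stand-in for Newton's law chosen only for illustration), takes the exact path z(t) = (g/2)t(1 − t) with z(0) = z(1) = 0, and perturbs it by εη(t) with η(t) = sin πt, which vanishes at both ends. If the action is stationary at the true path, its change has no term linear in ε; in this example the change is exactly mπ²ε²/4, so the printed ratios should all stay near mπ²/4.

```python
import math

m, g = 1.0, 9.8                # illustrative mass and uniform field strength
t0, t1 = 0.0, 1.0


def z_true(t):
    """Solution of m z'' = -m g with z(0) = z(1) = 0."""
    return 0.5 * g * t * (1.0 - t)


def dz_true(t):
    return 0.5 * g * (1.0 - 2.0 * t)


def eta(t):
    """Variation that vanishes at both endpoints."""
    return math.sin(math.pi * t)


def deta(t):
    return math.pi * math.cos(math.pi * t)


def action(eps, n=2000):
    """S(eps) = integral of (T - V) along z_true + eps * eta (Simpson's rule)."""
    def lagrangian(t):
        v = dz_true(t) + eps * deta(t)
        z = z_true(t) + eps * eta(t)
        return 0.5 * m * v * v - m * g * z
    h = (t1 - t0) / n
    total = lagrangian(t0) + lagrangian(t1)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * lagrangian(t0 + k * h)
    return total * h / 3


S0 = action(0.0)
for eps in (0.1, 0.01, 0.001):
    # no linear term: the ratio stays at m * pi^2 / 4 instead of growing like 1/eps
    print(eps, (action(eps) - S0) / eps**2, m * math.pi**2 / 4)
```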
To get what Gauss called the “shortest path” and what we now call a geodesic,
we use the Euler equations to write down the differential equations of a stationary
value of this integral. Here we have the same variational problem that we just
stated as Hamilton’s principle in the context of mechanics, only now it is in the
context of pure geometry. Thus, we can set Hamilton’s principle alongside the
problem of finding a geodesic and treat them as being exactly the same kind of
mathematical problem. The “geometrization” of mechanics is now in sight. In
general, geodesics are not simple to compute, even on a surface as simple as the
hyperbolic paraboloid whose equation is z = xy. The reader will find a number of
examples in Chapters 5 and 6. It is easy to verify that in Euclidean space and in
four-dimensional space-time, they are given by linear relations among the variables.
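For the Euclidean case, the verification can be sketched as follows, assuming the curve is written as (x(t), y(t), z(t)) and parameterized so that the arc-length integrand equals 1 (a normalization chosen for this sketch). The space-time case runs the same way for a timelike curve parameterized by proper time.

```latex
\[
L = \sqrt{x'^2 + y'^2 + z'^2}, \qquad
\frac{d}{dt}\frac{\partial L}{\partial x'} = \frac{\partial L}{\partial x} = 0
\;\Longrightarrow\; \frac{x'}{L} = \text{const},
\]
and likewise $y'/L$ and $z'/L$ are constant. With the normalization $L \equiv 1$,
\[
x'' = y'' = z'' = 0 \;\Longrightarrow\;
x = a_1 + b_1 t,\quad y = a_2 + b_2 t,\quad z = a_3 + b_3 t .
\]
For $L = \sqrt{(x^{0\prime})^2 - c^{-2}\bigl((x^{1\prime})^2 + (x^{2\prime})^2 + (x^{3\prime})^2\bigr)}$
with $L \equiv 1$ (proper-time parameter), the same argument gives
$x^{i\prime\prime} = 0$ for $i = 0, 1, 2, 3$.
```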
Definition 2.2. If W(r) is the Newtonian potential for a given particle acted
on by a force F, the relativistic potential for the same particle is
$$V(\mathbf r, \mathbf r') = W(\mathbf r) + \Bigl(\alpha - 1 + \frac{1}{\alpha}\Bigr)m_0c^2,$$
where, as usual, $\alpha = (1 - \mathbf r'\cdot\mathbf r'/c^2)^{-1/2}$.
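To see why this definition is natural, here is a short check, assuming the relativistic kinetic energy is taken to be T = (α − 1)m0c² (the form consistent with E = mc² above; this identification is ours for the sketch). With it, T − V becomes the familiar relativistic Lagrangian, and its velocity derivative is the relativistic momentum m0αr′.

```latex
\[
T - V = (\alpha - 1)m_0c^2 - W(\mathbf r) - \Bigl(\alpha - 1 + \frac{1}{\alpha}\Bigr)m_0c^2
      = -\frac{m_0c^2}{\alpha} - W(\mathbf r)
      = -m_0c^2\sqrt{1 - \frac{\mathbf r'\cdot\mathbf r'}{c^2}} - W(\mathbf r),
\]
\[
\frac{\partial (T - V)}{\partial x'} = \frac{m_0\,x'}{\sqrt{1 - \mathbf r'\cdot\mathbf r'/c^2}} = m_0\alpha\,x',
\qquad
\frac{d}{dt}\bigl(m_0\alpha\,x'\bigr) = -\frac{\partial W}{\partial x}
\quad(\text{and similarly for } y, z).
\]
```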
9.1. Newtonian angular momentum. Since the force F is the time derivative
of momentum p, the preceding equation can be written as
$$\mathbf r_1 \times \mathbf p_1' = \mathbf r_2 \times \mathbf p_2'.$$
In Newtonian mechanics, a particle of mass m moving in a plane in such a
way that its location at time t has polar coordinates (r(t), θ(t)), that is,
$\mathbf r(t) = r(t)\cos\theta(t)\,\mathbf i + r(t)\sin\theta(t)\,\mathbf j$, will have an instantaneous angular momentum
about the origin, denoted l, given by
$$\mathbf l = \mathbf r(t) \times m\mathbf r'(t) = m r^2(t)\theta'(t)\,\mathbf k.$$
The angular momentum is perpendicular to the plane of motion, and its magnitude
$m r^2(t)\theta'(t)$ is equal to the mass m times twice the rate at which area is swept
out. We establish this claim informally as follows.
7 From the Latin verb torqueo, meaning "I twist." It is also the source of our word torture.
9. ANGULAR MOMENTUM AND TORQUE 99
out by the radius vector from the origin to a point moving along a curve whose polar
equation is r = r(θ), starting at a fixed point where θ = θ0, we simply integrate:
$$A(\theta) = \int_{\theta_0}^{\theta}\int_0^{r(\phi)} s\,ds\,d\phi = \int_{\theta_0}^{\theta}\frac{1}{2}r^2(\phi)\,d\phi.$$
As a result, we find
$$\frac{dA}{d\theta} = \frac{1}{2}r^2(\theta).$$
If θ in turn is a function of time t, then
$$\frac{dA}{dt} = \frac{1}{2}r^2\bigl(\theta(t)\bigr)\frac{d\theta}{dt}.$$
The angular momentum per unit mass therefore has a geometric interpretation
in Newtonian mechanics as twice the rate at which area is swept out by the radius
vector from the origin to the moving particle. Kepler’s second law of planetary
motion asserts that this rate is constant for each of the planets, and so we can
interpret that law as conservation of angular momentum.
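Kepler's second law can be illustrated numerically. The sketch below (arbitrary units with GM = 1 and an assumed initial state that gives an elliptical orbit) integrates Newton's equations for an inverse-square central force with the velocity-Verlet scheme and records the areal velocity ½(xy′ − yx′) = ½r²θ′ after each step. In this scheme the velocity kicks are parallel to r and the position drift is parallel to the velocity, so it preserves r × v up to rounding error; a general-purpose integrator would show only approximate constancy.

```python
GM = 1.0                         # gravitational parameter in arbitrary units
dt, steps = 1e-3, 20000


def accel(x, y):
    """Inverse-square attraction toward the origin (a central force)."""
    r3 = (x * x + y * y) ** 1.5
    return -GM * x / r3, -GM * y / r3


x, y = 1.0, 0.0                  # assumed initial position
vx, vy = 0.0, 1.2                # assumed initial velocity (bound ellipse)
ax, ay = accel(x, y)
areal_rates = []
for _ in range(steps):
    # velocity-Verlet step
    x += vx * dt + 0.5 * ax * dt * dt
    y += vy * dt + 0.5 * ay * dt * dt
    nax, nay = accel(x, y)
    vx += 0.5 * (ax + nax) * dt
    vy += 0.5 * (ay + nay) * dt
    ax, ay = nax, nay
    areal_rates.append(0.5 * (x * vy - y * vx))  # dA/dt = (1/2) r^2 theta'

print(min(areal_rates), max(areal_rates))
```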
and this interval is the same for all observers whose space-time coordinates are
related by a Lorentz transformation. Again, as already pointed out, this equation
implies that
$$\frac{ds}{dt} = \sqrt{1 - \frac{\mathbf v\cdot\mathbf v}{c^2}} = \frac{1}{\alpha},$$
where v = r′ is the Newtonian velocity, obtained by differentiating r with respect to
laboratory time t. Thus our differentiation operator d/dt is $\sqrt{1 - \mathbf v\cdot\mathbf v/c^2}\;d/ds$, and
conversely, $d/ds = \bigl(1/\sqrt{1 - \mathbf v\cdot\mathbf v/c^2}\bigr)\,d/dt = \alpha\,d/dt$. We also recall from Chapter 1
that proper time on a moving particle is the time that would be recorded by a clock
moving with the particle. The proper time that elapses between two events on the
particle's world line is less than the corresponding laboratory time.
We now have a very elegant expression for the mass m_v of a particle moving
at speed v whose rest mass is m0:
$$m_v = m_0\,\frac{dt}{ds}.$$
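As a small worked example (the speed v = 0.6c, the 10 s of laboratory time, and the 1 kg rest mass are arbitrary illustrative values), ds/dt = 1/α and m_v = m0 dt/ds can be evaluated directly: α = 1.25, 8 s of proper time elapse, and m_v = 1.25 kg.

```python
import math

c = 299_792_458.0               # speed of light, m/s


def alpha(v):
    """Reciprocal of the FitzGerald-Lorentz contraction factor."""
    return 1.0 / math.sqrt(1.0 - (v / c) ** 2)


def proper_time_elapsed(t, v):
    """Proper time s for laboratory time t at constant speed v (ds/dt = 1/alpha)."""
    return t / alpha(v)


m0 = 1.0                        # rest mass in kg (illustrative)
v = 0.6 * c
t = 10.0                        # laboratory seconds
s = proper_time_elapsed(t, v)
print(alpha(v), s, m0 * t / s)  # alpha, proper time, m_v = m0 * dt/ds
```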
Angular momentum is useful in the solution of the differential equations of me-
chanics, since its conservation provides an integral of those equations. An integral
of a set of differential equations is a function of the variables in the equation that
assumes a constant value when the time and position satisfy the set of differential
equations. See Appendix 5 for further explanation of integrals.
The fact that relativistic angular momentum is conserved, that is, that $m_v r^2\,d\theta/dt$
is constant, can also be expressed by saying that $m_0 r^2\,d\theta/ds$ is constant.
9.3. Conservation of angular momentum under a central force. Angular
momentum is conserved under a wide class of forces, including any central
(centripetal or centrifugal) force of the form F = ϕ(r)r, where r is the location.
This fact will follow from pure geometry (well, actually, algebra) when we replace
force by curvature in Chapter 4. Newton himself (see [63], Proposition 1, Theorem 1,
pp. 32–33) gave an infinitesimal proof of it that is very geometric, based on
Fig. 2.5.
Now in relativity, we know that the acceleration of a particle is not normally
parallel to the force, and therefore we can’t rely on the geometric infinitesimal
argument given by Newton. Conservation of angular momentum nevertheless holds
under a central force. Let us state this fact as a formal theorem.
Theorem 2.10. Angular momentum is conserved when a particle moves under
the influence of a central Newtonian or relativistic force F .
Proof. We shall show that the vector l = r × p is constant. If the force
(whether Newtonian or relativistic does not matter) is F = ϕ(r) r, where ϕ(r) is
a scalar-valued function of position, then
$$\mathbf l' = \mathbf r' \times \mathbf p + \mathbf r \times \mathbf p' = \mathbf 0 + \mathbf r \times \mathbf F = \mathbf r \times \phi(\mathbf r)\mathbf r = \mathbf 0.$$
Here again, the term r′ × p vanishes because p = mr′, and m is a scalar, whether
it represents Newtonian or relativistic mass.
Corollary 2.2. Under a central Newtonian or relativistic force field, the orbit
of a particle lies in a plane.
Proof. The radius vector r from O to the particle is always perpendicular
to the constant angular momentum l. Therefore the particle stays in the plane
through O perpendicular to l.
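Theorem 2.10 can also be checked numerically for a relativistic particle. The sketch below works in units with c = m0 = 1, assumes an attractive inverse-square force F = −k r/|r|³ (one example of F = ϕ(r)r, with an assumed k), evolves the momentum by dp/dt = F, and recovers the velocity from p = m0αv, that is, v = p/(m0√(1 + |p|²/(m0c)²)). Each half-kick is parallel to r and each drift is parallel to p, mirroring the two vanishing terms in the proof, so l = r × p should stay constant up to rounding error even though the acceleration is not parallel to the force.

```python
import math

c, m0, k = 1.0, 1.0, 0.5       # units with c = m0 = 1; k is an assumed force strength


def force(x, y):
    """Attractive central force F = phi(r) r with phi(r) = -k / |r|^3."""
    r3 = (x * x + y * y) ** 1.5
    return -k * x / r3, -k * y / r3


def velocity(px, py):
    """Invert p = m0 alpha v:  v = p / (m0 sqrt(1 + |p|^2 / (m0 c)^2))."""
    factor = m0 * math.sqrt(1.0 + (px * px + py * py) / (m0 * c) ** 2)
    return px / factor, py / factor


x, y = 1.0, 0.0                # assumed initial position
px, py = 0.0, 0.9              # assumed initial momentum (speed about 0.67 c)
dt = 1e-3
ang_mom = []
for _ in range(20000):
    fx, fy = force(x, y)
    px += 0.5 * dt * fx        # half kick: dp parallel to r, so r x p unchanged
    py += 0.5 * dt * fy
    vx, vy = velocity(px, py)
    x += dt * vx               # drift: dr parallel to p, so r x p unchanged
    y += dt * vy
    fx, fy = force(x, y)
    px += 0.5 * dt * fx        # second half kick
    py += 0.5 * dt * fy
    ang_mom.append(x * py - y * px)   # z-component of l = r x p

print(min(ang_mom), max(ang_mom))
```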
10. FOUR-VECTORS AND TENSORS* 101
[Figure: two panels with points O, A, B, C, D, lines labeled h, k, l, and the segment AB = v dt marked in each panel.]
here because it is logically most closely related to the material of this chapter and
would be a distraction to explain on the fly when it is needed.
10.1. The four-velocity. We can best explain the revised language by bringing
back the two hypothetical observers X and Y from Section 1. To get the kind of
unity we want, we will assign them systems of space-time coordinates $(x^0, x^1, x^2, x^3)$
and $(y^0, y^1, y^2, y^3)$ respectively, where the coordinates $x^0$ and $y^0$ represent time and
the other three in each system are spatial coordinates in an orthonormal basis of ℝ³.
As usual, we assume that Y is moving along the common $x^1$-axis ($y^1$-axis) at speed
u relative to X, so that the coordinates are related by the Lorentz transformation
$$\begin{aligned} y^0 &= \alpha\Bigl(x^0 - \frac{u}{c^2}x^1\Bigr),\\ y^1 &= \alpha(-u x^0 + x^1),\\ y^2 &= x^2,\\ y^3 &= x^3, \end{aligned}$$
where $\alpha = 1/\sqrt{1 - u^2/c^2}$. This system of equations can be written in matrix form
as Y = LX, where
$$Y = \begin{pmatrix} y^0\\ y^1\\ y^2\\ y^3\end{pmatrix},\qquad L = \begin{pmatrix} \alpha & -\dfrac{u\alpha}{c^2} & 0 & 0\\ -\alpha u & \alpha & 0 & 0\\ 0&0&1&0\\ 0&0&0&1\end{pmatrix},\qquad X = \begin{pmatrix} x^0\\ x^1\\ x^2\\ x^3\end{pmatrix}.$$
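We make the seemingly trivial remark that the matrix L is actually the Jacobian⁸
matrix of the change of coordinates $(x^0, x^1, x^2, x^3) \to (y^0, y^1, y^2, y^3)$:
$$L = \begin{pmatrix}
\dfrac{\partial y^0}{\partial x^0} & \dfrac{\partial y^0}{\partial x^1} & \dfrac{\partial y^0}{\partial x^2} & \dfrac{\partial y^0}{\partial x^3}\\[8pt]
\dfrac{\partial y^1}{\partial x^0} & \dfrac{\partial y^1}{\partial x^1} & \dfrac{\partial y^1}{\partial x^2} & \dfrac{\partial y^1}{\partial x^3}\\[8pt]
\dfrac{\partial y^2}{\partial x^0} & \dfrac{\partial y^2}{\partial x^1} & \dfrac{\partial y^2}{\partial x^2} & \dfrac{\partial y^2}{\partial x^3}\\[8pt]
\dfrac{\partial y^3}{\partial x^0} & \dfrac{\partial y^3}{\partial x^1} & \dfrac{\partial y^3}{\partial x^2} & \dfrac{\partial y^3}{\partial x^3}
\end{pmatrix}.$$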
This remark appears trivial, since L is a constant matrix. That is, the change
of coordinates (x0 , x1 , x2 , x3 ) → (y 0 , y 1 , y 2 , y 3 ) is a linear transformation. That
linearity is precisely what makes the special theory of relativity “special.” The
general theory differs from it in allowing more general changes of coordinates. The
Jacobian matrix associated with a change of coordinates is one of two matrices that
will occur constantly beginning in Chapter 4. The other one is the matrix of the
space-time metric, described below.
For two events E1 and E2, to which X assigns coordinates $(x^0_1, x^1_1, x^2_1, x^3_1)$ and
$(x^0_2, x^1_2, x^2_2, x^3_2)$ respectively while Y assigns them coordinates $(y^0_1, y^1_1, y^2_1, y^3_1)$ and
$(y^0_2, y^1_2, y^2_2, y^3_2)$, the proper time interval Δs between the two events is the same for
both observers, being the square root of
$$\begin{aligned}\Delta s^2 &= (x^0_2 - x^0_1)^2 - \frac{1}{c^2}\Bigl[(x^1_2 - x^1_1)^2 + (x^2_2 - x^2_1)^2 + (x^3_2 - x^3_1)^2\Bigr]\\ &= (y^0_2 - y^0_1)^2 - \frac{1}{c^2}\Bigl[(y^1_2 - y^1_1)^2 + (y^2_2 - y^2_1)^2 + (y^3_2 - y^3_1)^2\Bigr].\end{aligned}$$
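The invariance of Δs² under the matrix L can be checked numerically. In the sketch below, c = 2 is an illustrative value chosen only so that the factors of c are exercised, the boost speed u = 1.2 gives α = 1.25, and the two events are arbitrary; both printed values should equal 21.5.

```python
import numpy as np

c = 2.0     # illustrative value of c, so that the factors of c are visible


def lorentz_matrix(u):
    """Matrix L of Y = L X for a boost with speed u along the x^1-axis."""
    a = 1.0 / np.sqrt(1.0 - u**2 / c**2)
    return np.array([[a, -a * u / c**2, 0.0, 0.0],
                     [-a * u, a, 0.0, 0.0],
                     [0.0, 0.0, 1.0, 0.0],
                     [0.0, 0.0, 0.0, 1.0]])


def interval_sq(e1, e2):
    """Delta s^2 = (dx^0)^2 - [(dx^1)^2 + (dx^2)^2 + (dx^3)^2] / c^2."""
    d = np.asarray(e2, dtype=float) - np.asarray(e1, dtype=float)
    return d[0] ** 2 - (d[1:] @ d[1:]) / c**2


E1 = np.array([0.0, 0.0, 0.0, 0.0])     # two illustrative events (x^0, ..., x^3)
E2 = np.array([5.0, 3.0, 1.0, -2.0])
L = lorentz_matrix(1.2)                 # u = 0.6 c, so alpha = 1.25
print(interval_sq(E1, E2), interval_sq(L @ E1, L @ E2))
```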
8 Named after Carl Gustav Jacobi (1804–1851)
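$$\begin{aligned}(s_2 - s_1)^2 &= (x^0_2 - x^0_1)^2 - \frac{1}{c^2}\Bigl[(x^1_2 - x^1_1)^2 + (x^2_2 - x^2_1)^2 + (x^3_2 - x^3_1)^2\Bigr]\\ &= (y^0_2 - y^0_1)^2 - \frac{1}{c^2}\Bigl[(y^1_2 - y^1_1)^2 + (y^2_2 - y^2_1)^2 + (y^3_2 - y^3_1)^2\Bigr].\end{aligned}$$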
At this point, we no longer need the restriction that Z is moving at a constant speed
and direction. Taking derivatives automatically "linearizes" equations, in the sense
that the best linear approximation to any function has exactly the same derivative
as the function itself. In this way, we have found one observer-independent
quantity that can stand in for the velocity vector of a moving particle, even one
whose observed velocity is not constant. Although X and Y measure the velocity
differently, both can compute the velocity four-vector of Z, which we define to be
the quadruple that X computes as
$$u_4 = (u^0, u^1, u^2, u^3) = \Bigl(\frac{dx^0}{ds}, \frac{dx^1}{ds}, \frac{dx^2}{ds}, \frac{dx^3}{ds}\Bigr)$$
and Y computes as
$$v_4 = (v^0, v^1, v^2, v^3) = \Bigl(\frac{dy^0}{ds}, \frac{dy^1}{ds}, \frac{dy^2}{ds}, \frac{dy^3}{ds}\Bigr).$$
We use the subscript 4 here to distinguish a four-velocity from an ordinary
velocity. In particular u4 represents a velocity that X records for Z, whereas
previously u was the velocity X assigned to Y .
The components of these two vectors are not numerically the same. What is
the same in both is the meaning of the symbol s, and one other important quantity,
which we state as a theorem (already proved):
coordinate system. The coefficients linking the two are ∂y i /∂xj , that is, they are
derivatives of Y ’s coordinates with respect to X’s and therefore normally expressed
as functions of X’s coordinates. Thus, the right-hand side mixes up the two sets
of coordinates, and that mixing accounts for the prefix contra- that occurs in the
name.
In contrast, the differential $dx^j$ satisfies
$$dx^j = \sum_i \frac{\partial x^j}{\partial y^i}\,dy^i,$$
and here the right-hand side contains both coefficients and differentials expressed
in Y ’s coordinates. The differential is therefore a covariant vector, often called a
covector.
The danger of confusion arises because, looking at Eq. (2.15), one can see that
the left-hand side is expressed in Y ’s coordinates and both the components uj and
the coefficients ∂y i /∂xj are expressed in X’s coordinates. It appears then that
this equation defines a covariant vector. The explanation is that it truly does,
but the covariant vector is not the directional derivative operator
$\sum_i v^i\,\partial/\partial y^i = \sum_j u^j\,\partial/\partial x^j$. It is rather the coefficient $v^i$, which can be thought of as a linear
functional operating on directional derivative operators. That is where the duality
arises: a linear operator on contravariant vectors is a covariant vector, and vice
versa.
In practice, it is easy to tell contravariant tensors from covariant ones, since
by a now-universal convention⁹ the coefficients in the expansion of a contravariant
tensor in any basis are denoted with superscripts and those of a covariant tensor
with subscripts. There are also mixed tensors, whose coefficients have both subscripts
and superscripts. Finally, a scalar is regarded as a tensor of type (0, 0) (rank 0)
and can be treated as either covariant or contravariant. When visual clues are
lacking (say, a tensor is described as an operator rather than being given as a set of
coefficients), just keep in mind that a contravariant vector on an abstract manifold
is a directional derivative operator that transforms between coordinate systems via
the rule
$$u \leftrightarrow \sum_j u^j\frac{\partial}{\partial x^j} = \sum_j u^j\sum_i\frac{\partial y^i}{\partial x^j}\frac{\partial}{\partial y^i} = \sum_i\Bigl(\sum_j\frac{\partial y^i}{\partial x^j}u^j\Bigr)\frac{\partial}{\partial y^i} = \sum_i v^i\frac{\partial}{\partial y^i}.$$
As this equation shows, the two four-velocity vectors regarded as equivalent
correspond to the same directional derivative operator. That is a good reason for
insisting on this transformation law when coordinates are changed. If two
quantities transform in this way, they can be regarded as different representations of the
same object.
9 One must be cautious here, however. It is not difficult to find expositions on-line in which
the differentials dxi are called contravariant and the partial derivatives ∂/∂xi are called covariant.
More important than the name attached to each is the fact that they are interacting with each
other in dual ways. The components (coordinates) of a contravariant vector transform covariantly,
but a basis of the space of contravariant vectors transforms contravariantly. Another possible
source of confusion, especially pronounced among earlier writers such as Eddington ([16], p. 43)
is the failure to distinguish between an operation and the result of applying that operation—that
is, between the operator ∂/∂xi and the corresponding partial derivative that results when this
operator is applied to a function: ∂ϕ/∂xi . In the place just cited, Eddington refers ambiguously
to "a set of four quantities," and one can't be sure whether he means the quantities that we now
write as (dx¹, dx², dx³, dx⁴) or the coefficients in a linear combination of these four quantities.
The real reason for the term contravariant comes from the theory of manifolds,
explained in Appendix 4. Given a C ∞ -mapping ϕ : M → N from one manifold to
another, associated with it is a mapping dϕ of their tangent spaces. If ϕ(P ) = Q,
then tangent vectors at P and Q are linear mappings, say X and Y, from the space
of functions that are C∞ near P (respectively Q) into ℝ, having the product-rule property
X(fg) = f(P) X(g) + g(P) X(f) and Y(hk) = h(Q) Y(k) + k(Q) Y(h) for C∞ functions f
and g near P and h and k near Q. (It is shown in Appendix 4 that any such mapping
is a linear combination of partial derivative operators.) The mapping dϕ allows X
to operate on C∞ functions near Q by the rule dϕ(X)(h) = X(h ∘ ϕ). Thus we
associate with the mapping
$$M \xrightarrow{\;\phi\;} N$$
the covariant linear mapping dϕ:
$$T_P \xrightarrow{\;d\phi\;} T_{\phi(P)},$$
which maps in the same direction as ϕ, that is, it is "covariant" with ϕ. Since dϕ is
a covariant linear operator (in local coordinate systems at P and Q, its matrix is
the Jacobian matrix of the mapping ϕ), the vectors (linear combinations of partial
derivative operators) on which it operates are contravariant.
Remark 2.1. This appears to be a suitable point to introduce the Einstein
summation convention, which we shall re-introduce in Chapter 4 for the benefit of
readers who postpone the reading of this section until the end. Whenever the same
index appears both “up” and “down” in a term, summation is understood to be
performed over that index (and hence it is a “dummy index” which can be replaced
by any convenient symbol without changing the meaning of the term). In this case,
the repeated indices are i and j which appear as superscripts (“up”) and in the
denominators of the partial derivative operators (“down”). That convention saves
huge amounts of writing. (Actually, in Einstein’s day, the usage of subscripts and
superscripts was less rigid, and he simply said that one should sum over a repeated
index, unless there was an explicit instruction not to do so.)
Tensors turn out to be the best possible language for general relativity. A
quantity that transforms contravariantly (or covariantly, as we shall shortly note)
can form the basis for a common discussion by two different observers. Tensors are
a sort of lingua franca of physics. Each observer has a “native” language consisting
of observed time and space, but is separated from other observers by the specificity
of that language. One can picture two observers attempting to talk about the
motion of a particle. If one is speaking Italian and the other Romanian, there
will be some difficulty in communicating. But if they agree to converse in French,
they can talk, confident that they are both discussing the same thing.10 Einstein
believed that physical laws should be stated as tensor relations, since that language
made it possible for two observers to reconcile their measurements. At present,
contravariance is restricted to linear changes of coordinates, but in Chapters 5
and 6, we shall remove that restriction and allow any smooth coordinate changes
and require (nearly) all physically meaningful quantities to be tensors.
10 In the eighteenth century, Prussians, Russians, Britons, Scandinavians, Italians, and others
often spoke and wrote to one another in French; that is the origin of the term lingua franca. The
“lingua franca” of Europe had previously been Latin (which had displaced Greek in the West),
and nowadays it is English.
The four-velocity vector has an additional virtue besides its simple contravariant
behavior under changes of variable. It is easily translated into the "native"
language of the observer. Dividing by its zeroth component, which is the (dimensionless)
reciprocal of the FitzGerald–Lorentz contraction factor, we get a four-vector
whose "time" component is the absolute, dimensionless constant 1 and whose three
spatial components constitute the velocity three-vector of classical mechanics:
$$\left(1, \frac{dx^1/ds}{dx^0/ds}, \frac{dx^2/ds}{dx^0/ds}, \frac{dx^3/ds}{dx^0/ds}\right) = \left(1, \frac{dx^1}{dx^0}, \frac{dx^2}{dx^0}, \frac{dx^3}{dx^0}\right) = \bigl(1, \mathbf r'\bigr),$$
where $\mathbf r(t) = \bigl(x^1(t), x^2(t), x^3(t)\bigr)$ and $t = x^0$. As a result, we have the equation
$$u_4 = \frac{dx^0}{ds}\bigl(1, \mathbf r'(t)\bigr) = \frac{dt}{ds}\bigl(1, \mathbf r'(t)\bigr).$$
To summarize, the four-velocity carries within it the Newtonian velocity in
the form of the ratios of the last three components to the zeroth, and that zeroth
component is just the reciprocal of the FitzGerald–Lorentz contraction factor. That
is, it is the quantity that appears in the Lorentz transformation equations as α, β,
or γ in Chapter 1. We shall denote it by α in the present discussion.
To prove that the zeroth component has this value, observe that the world-line
of the particle can be parameterized by observed time t = x⁰ as well as proper time
s. That means that t = x⁰(s) is an increasing function of s. It therefore has a local
inverse, and ds/dt = 1/(dt/ds). Moreover, as one can easily see from the definition
of proper time,
$$\Bigl(\frac{ds}{dt}\Bigr)^2 = \Bigl(\frac{ds}{dx^0}\Bigr)^2 = 1 - \frac{|\mathbf r'(t)|^2}{c^2},$$
so that we could write
$$u_4 = \alpha\bigl(1, \mathbf r'(t)\bigr),$$
where α is the reciprocal of the FitzGerald–Lorentz contraction factor for a body
moving at velocity r′(t) relative to X, that is, $\alpha = c/\sqrt{c^2 - \mathbf r'\cdot\mathbf r'}$.
10.2. The metric tensor. At this point, we need to introduce the second
matrix mentioned above, the one that defines the metric of space-time. In standard
space-time coordinates (t; x, y, z) it is the simple diagonal matrix with constant
entries
$$M = \begin{pmatrix} 1&0&0&0\\ 0&-\frac{1}{c^2}&0&0\\ 0&0&-\frac{1}{c^2}&0\\ 0&0&0&-\frac{1}{c^2}\end{pmatrix}.$$
In contrast to the Jacobian matrix, the matrix M operates within a given
coordinate system rather than connecting two different ones. It is regarded as a
bilinear function on pairs of contravariant four-vectors u and v. These need not be
velocity four-vectors, but we remind the reader that all the ratios $u^i/u^0$ and $v^i/v^0$,
i = 1, 2, 3, must have the physical dimension of velocity, so that all the terms to be
added in the following expression have the same physical dimension:
$$M(u, v) = g_{ij}u^iv^j,$$
where the entry in row i + 1 and column j + 1 of M is $g_{ij}$. That is, $g_{00} = 1$,
$g_{ii} = -1/c^2$ if i = 1, 2, 3, and $g_{ij} = 0$ if i ≠ j. The off-diagonal elements in the first
row and first column are thought of as having dimensions of (length/time)⁻¹, that is,
time per unit length.
This fact is theoretically important, since the sum that defines M (u, v) has to make
sense, but it does not affect any computations in the present case since the numbers
are all zero anyway. If u = u4 , as defined above, then
M (u4 , u4 ) = 1 .
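The two facts just established, u4 = α(1, r′) and M(u4, u4) = 1, can be checked with a few lines of code. The sketch assumes an illustrative value c = 3 and an arbitrary velocity whose speed is below c; dividing the spatial components by the zeroth one recovers the Newtonian velocity.

```python
import math

c = 3.0                                  # illustrative value of c


def four_velocity(v):
    """u4 = alpha * (1, v) for a laboratory 3-velocity v with |v| < c."""
    alpha = 1.0 / math.sqrt(1.0 - sum(vi * vi for vi in v) / c**2)
    return [alpha] + [alpha * vi for vi in v]


def metric(u, w):
    """M(u, w) = g_ij u^i w^j with g = diag(1, -1/c^2, -1/c^2, -1/c^2)."""
    g = [1.0, -1.0 / c**2, -1.0 / c**2, -1.0 / c**2]
    return sum(gi * ui * wi for gi, ui, wi in zip(g, u, w))


velocity = [1.0, -0.5, 2.0]              # speed sqrt(5.25) < c
u4 = four_velocity(velocity)
recovered = [ui / u4[0] for ui in u4[1:]]
print(u4[0], recovered, metric(u4, u4))
```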
The bilinear form defined by this matrix is covariant, and is a tensor of type
(0, 2), since it operates on a pair of contravariant vectors. If we change the
coordinate system from X to Y, the vectors u4 and v4 will transform into ũ4 and ṽ4,
where
$$\tilde u_4^i = \frac{\partial y^i}{\partial x^k}u_4^k \quad\text{and}\quad \tilde v_4^j = \frac{\partial y^j}{\partial x^l}v_4^l.$$
If we want the metric to continue measuring things as before, we must replace
M by $\tilde M$ such that
$$\tilde g_{ij}\tilde u_4^i\tilde v_4^j = \tilde g_{ij}\frac{\partial y^i}{\partial x^k}\frac{\partial y^j}{\partial x^l}u_4^kv_4^l = g_{kl}u_4^kv_4^l.$$
Thus, instead of multiplying the coordinates of u4 and v4 by the Jacobian
matrix whose entries are $\partial y^i/\partial x^k$ when we change from X to Y coordinates, we
need to multiply the coordinates $g_{kl}$ by the inverse of the Jacobian, whose entries,
by the chain rule, are $\partial x^k/\partial y^i$. And, since i and k range from 0 to 3, the same holds
for $\partial x^l/\partial y^j$ with the same range of indices. We therefore have
$$\tilde g_{ij} = \frac{\partial x^k}{\partial y^i}\frac{\partial x^l}{\partial y^j}g_{kl}.$$
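The transformation rule for g̃_ij can be checked numerically for the boost matrix L of Section 10.1. Since the Jacobian J is constant, the rule reads g̃ = (J⁻¹)ᵀ g J⁻¹. For a Lorentz boost this reproduces g, which is the matrix form of the invariance of Δs². For contrast, the sketch also applies a Galilean change of coordinates y¹ = x¹ − ux⁰ (an assumption added only for comparison), which does not preserve g. The values c = 2 and u = 1.2 are illustrative.

```python
import numpy as np

c, u = 2.0, 1.2                          # illustrative c and boost speed (u < c)
a = 1.0 / np.sqrt(1.0 - u**2 / c**2)

# Jacobian J[i, k] = dy^i/dx^k of the boost Y = L X
J = np.array([[a, -a * u / c**2, 0.0, 0.0],
              [-a * u, a, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])
g = np.diag([1.0, -1.0 / c**2, -1.0 / c**2, -1.0 / c**2])


def transformed_metric(jacobian, metric):
    """g~_ij = (dx^k/dy^i)(dx^l/dy^j) g_kl, i.e. (J^-1)^T g J^-1."""
    jinv = np.linalg.inv(jacobian)       # entries dx^k/dy^i
    return jinv.T @ metric @ jinv


g_boost = transformed_metric(J, g)

# For contrast: a Galilean change y^0 = x^0, y^1 = x^1 - u x^0 (not a Lorentz map)
galilean = np.array([[1.0, 0.0, 0.0, 0.0],
                     [-u, 1.0, 0.0, 0.0],
                     [0.0, 0.0, 1.0, 0.0],
                     [0.0, 0.0, 0.0, 1.0]])
g_galilean = transformed_metric(galilean, g)

print(np.allclose(g_boost, g), np.allclose(g_galilean, g))
```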
The contrast between covariant and contravariant tensors can now be seen.
The functions gkl and g̃ij are the coordinates of the tensor M and therefore have
variance opposite to that of M itself. In the equation connecting the two sets of
coordinates, those on the right are expressed in X’s coordinates and the connecting
coefficients ∂xl /∂y j are in Y ’s coordinates. Therefore, the coordinates of this tensor
are contravariant, and so the tensor M itself is covariant.
The inverse of the metric tensor M is an example of a contravariant tensor of
type (2, 0), and so we shall write its components as $m^{ij}$ to show this contravariant
nature. There are also mixed tensors, such as those of type (1, 1); one example
is the matrix product MM⁻¹, which is the identity matrix I. We write its
entries as $\delta_i^j$ to show the mixed nature of this tensor. Here $\delta_i^j$ is known as the
Kronecker delta:¹¹
$$\delta_i^j = g_{ik}m^{kj} = \begin{cases}1, & \text{if } i = j,\\ 0, & \text{if } i \ne j.\end{cases}$$
11 Named after Leopold Kronecker (1823–1891), who introduced it in a paper bearing the
title “Über bilineare Formen” that he read before the Royal Prussian Academy of Sciences on
15 October 1866. This paper was published in the Monatsberichte (Monthly Bulletin) of the
Academy for that year, pp. 597–612, and again in Crelle’s Journal für die reine und angewandte
Mathematik, Bd. 68, pp. 273–285. The relevant passage can be found on p. 150 of the first volume
of the 1895 Teubner edition of his collected works, edited by Kurt Hensel.
10.4. The equation E = mc² revisited. As is the case with the four-velocity
and four-momentum, the ratios $F^i/F^0$ here for i = 1, 2, 3 have the physical
dimension of velocity. We can therefore consider the effect of applying the metric tensor
M as a bilinear operator on the pair (u4, F4). Since the metric coefficients $g_{ij}$ are
constant and symmetric in the indices i and j, we can differentiate the relation
M(u4, p4) = M(u4, m0u4) = m0 and get
$$0 = m_0 g_{ij}\frac{d}{ds}\bigl(u_4^iu_4^j\bigr) = 2m_0g_{ij}u_4^i\frac{du_4^j}{ds} = 2M(u_4, F_4).$$
Canceling the factor 2, using the definition of F4, and canceling a common factor
of α from both sides, we get the relation
$$F^0 = \frac{\alpha}{c^2}\,\mathbf F\cdot\mathbf r',$$
where, again, F is the relativistic three-force defined earlier and the prime denotes
differentiation with respect to t.
Using the explicit form of F⁰, we find that
$$\mathbf F\cdot\mathbf r' = m_0\alpha^3\,\mathbf r'(t)\cdot\mathbf r''(t) = c^2\frac{d}{dt}(m_0\alpha) = c^2\frac{dm}{dt}.$$
Now the left-hand side of this equation is just dW/dt, where
$$W = \int_{t_0}^{t}\mathbf F(\tau)\cdot\mathbf r'(\tau)\,d\tau$$
represents the work done on the particle between a fixed (laboratory) time t0 and
time t. Comparing the two relations, we find, as we did before, that
$$mc^2 = C + W,$$
where C is a constant of integration.
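The relation mc² = C + W can be checked in the simplest case of a constant force along a line, starting from rest. In the sketch below (units with c = 1, and an assumed rest mass and force), the momentum grows as p = Ft, the speed is recovered from p = m0αv, and the work W = ∫ F v dτ is computed with Simpson's rule. With the particle at rest at τ = 0, the constant is C = m0c², so mc² − m0c² should agree with W.

```python
import math

c, m0, F = 1.0, 2.0, 0.7        # assumed units, rest mass, and constant force along x


def momentum(t):
    """dp/dt = F with the particle at rest at t = 0."""
    return F * t


def speed(t):
    """Invert p = m0 alpha v for the speed v."""
    p = momentum(t)
    return p / (m0 * math.sqrt(1.0 + (p / (m0 * c)) ** 2))


def mass(t):
    """Relativistic mass m = m0 alpha."""
    v = speed(t)
    return m0 / math.sqrt(1.0 - (v / c) ** 2)


def work(t, n=2000):
    """W = integral_0^t F v(tau) dtau by Simpson's rule."""
    h = t / n
    total = F * speed(0.0) + F * speed(t)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * F * speed(k * h)
    return total * h / 3


for t in (0.5, 2.0, 10.0):
    print(t, mass(t) * c**2 - m0 * c**2, work(t))   # m c^2 - m0 c^2 versus W
```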
In the previous chapter, we found that the simple Euclidean formula for distance
ds2 = dx2 + dy 2 + dz 2 was concealing the more general metric tensor (gij ), which
appeared only when general coordinates were used. Something similar happens
here and leads to a very interesting and useful class of tensors in four variables.
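We saw above that the time component of the four-force is
$$F^0 = \frac{m_0\alpha^4\,\mathbf v\cdot\mathbf a}{c^2} = \frac{\alpha}{c^2}\,\mathbf F\cdot\mathbf v = \frac{1}{c^2}\,\mathbf F\cdot\frac{d\mathbf r}{ds}.$$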
Except for the factor $1/c^2$, this is the rate at which the force is doing work, measured in proper time, since $d\mathbf{r}/ds = \alpha\,d\mathbf{r}/dt = \alpha\mathbf{v}$. Thus, the time component of the four-force represents the rate at which the mass $m$ of a moving particle is changing relative to a clock attached to the particle, both particle and clock being observed by someone in a “laboratory” frame. (From the point of view of an observer moving with the particle, the clock is keeping perfect time and the mass never changes.) This suggests a further enlargement of mechanics into space-time. Kinetic energy can be expressed as the quadratic form associated with a bilinear form $T$ that operates on pairs of velocities $(\mathbf{u}, \mathbf{v})$, where $\mathbf{u} = (u^1, u^2, u^3)$ and $\mathbf{v} = (v^1, v^2, v^3)$, by the equation
$$T(\mathbf{u}, \mathbf{v}) = t_{ij}\,u^i v^j.$$
Here, $t_{ij} = (m/2)\,g_{ij}$, where the metric in these coordinates is given as $ds^2 = \sum g_{ij}\,dx^i\,dx^j$, so that $T(\mathbf{v}, \mathbf{v}) = \frac{1}{2}m\,g_{ij}v^iv^j$ is the kinetic energy.
If we take $u^0 = dt/ds = \alpha$, we can then consider an analogous representation of energy as a bilinear form $T$ operating on four-velocity vectors, which we shall call a stress-energy-momentum tensor. In matrix form, with the usual colloquial expression of the coordinates $(x^0; x^1, x^2, x^3)$ as $(t; x, y, z)$, we can write analogous
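expressions to represent various forms of energy. The simplest of these is the “rest-energy” tensor $T = (t_{ij}) = m_0 M$, whose entries are $t_{ij} = m_0 g_{ij}$:
$$\begin{pmatrix}\dfrac{dt}{ds} & \dfrac{dx}{ds} & \dfrac{dy}{ds} & \dfrac{dz}{ds}\end{pmatrix}
\begin{pmatrix} t_{00} & t_{01} & t_{02} & t_{03}\\ t_{10} & t_{11} & t_{12} & t_{13}\\ t_{20} & t_{21} & t_{22} & t_{23}\\ t_{30} & t_{31} & t_{32} & t_{33}\end{pmatrix}
\begin{pmatrix}\dfrac{dt}{ds}\\ \dfrac{dx}{ds}\\ \dfrac{dy}{ds}\\ \dfrac{dz}{ds}\end{pmatrix}.$$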
To see the effect of this definition, apply this tensor to a pair of velocity four-vectors $u_4$ and $v_4$. The result is
$$T(u_4, v_4) = m_0 g_{ij}\,u^i v^j. \tag{2.17}$$
When $u_4 = v_4$, this becomes
$$T(u_4, u_4) = m_0|u_4|^2 = m_0.$$
Thus, the associated quadratic form, when applied to a four-velocity, returns the rest mass as output.
As another example, let $\rho_0$ be the rest density of a distribution of matter in space, and consider the simple tensor whose matrix in standard coordinates is
$$T = \begin{pmatrix} \rho_0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0\end{pmatrix}.$$
See Eddington ([16], p. 102). Theorem 2.4 now implies that when the matter in the distribution is moving with four-velocity $u_4$, we have
$$T(u_4, u_4) = \rho_0\,(u^0)^2 = \rho_0\alpha^2 = \rho.$$
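The two examples can be checked numerically. The excerpt does not print the matrix $M$ explicitly, so the sketch below assumes $M = \operatorname{diag}(1, -1/c^2, -1/c^2, -1/c^2)$, the form that makes $|u_4|^2 = 1$ for $u_4 = (\alpha, \alpha\mathbf{v})$ as the relation $T(u_4, u_4) = m_0|u_4|^2 = m_0$ requires. The helper names, the units $c = 1$, and the numerical values are illustrative assumptions.

```python
import numpy as np

c = 1.0  # illustrative units (assumption)

# Assumed form of the metric matrix M in standard coordinates (t; x, y, z),
# chosen so that a four-velocity has |u4|^2 = M(u4, u4) = 1.
M = np.diag([1.0, -1.0 / c**2, -1.0 / c**2, -1.0 / c**2])

def four_velocity(v):
    """u4 = (dt/ds, dx/ds, dy/ds, dz/ds) = (alpha, alpha v) for three-velocity v."""
    v = np.asarray(v, dtype=float)
    alpha = 1.0 / np.sqrt(1.0 - v @ v / c**2)
    return np.concatenate(([alpha], alpha * v))

def apply_tensor(T, u, w):
    """Bilinear form T(u, w) = t_ij u^i w^j (row vector times matrix times column)."""
    return u @ T @ w

m0, rho0 = 3.0, 5.0                # hypothetical rest mass and rest density
u4 = four_velocity([0.3, -0.2, 0.5])
alpha = u4[0]

T_rest = m0 * M                    # rest-energy tensor, t_ij = m0 g_ij
T_dust = np.zeros((4, 4))
T_dust[0, 0] = rho0                # the rest-density tensor

print(apply_tensor(T_rest, u4, u4))                    # m0, up to rounding
print(apply_tensor(T_dust, u4, u4), rho0 * alpha**2)   # the two numbers agree
```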
Stress-energy-momentum tensors play a role analogous to the various forces
that appear in applications of Newton’s second law. The analog of this law in
general relativity is provided by the Einstein field equations
$$G_{\mu\nu} + \Lambda g_{\mu\nu} = \frac{8\pi G}{c^4}\,T_{\mu\nu}.$$
Here $G_{\mu\nu}$ is the Einstein tensor, which we are going to call “Ein” in Chapter 7. It is formed in turn from the Ricci tensor, which we shall call “Ric.” (It is generally denoted simply $R_{\mu\nu}$.) The exact relation is $\mathrm{Ein} = \mathrm{Ric} - \frac{1}{2}Rg$, where $g$ is the metric tensor $g = (g_{ij})$, and $R$ is the contraction of the Ricci tensor using the inverse of the metric tensor, that is, $R = g^{ij}R_{ij}$. The Einstein and Ricci tensors are explained in Chapters 6 and 7. The constant $\Lambda$ is the cosmological constant, and $G$ is Newton’s gravitational constant (and is the reason why we are not using the letter $G$ for the Einstein tensor). The left-hand side of this equation is a one-size-fits-all expression analogous to $m\mathbf{r}''$ in Newton’s second law. It represents the “response” of the system to the tensor on the right-hand side, which corresponds to the force $\mathbf{F}$, which has to be postulated individually for each motion that is to be explained. In
the particularly simple case of empty space where no matter is present and “nothing
is happening,” the stress-energy-momentum tensor can be chosen with ultimate
simplicity, as the zero tensor. It is stunning that such a simple assumption explains
with incredible precision the precession of a planetary orbit and the deflection of
light around a star. All that will be discussed in Chapter 4 below.
where $d\mathbf{A} = (dA)\,\mathbf{n}(\xi)$, $dA$ is the infinitesimal element of surface area (that is, the area of the patch $d\Sigma$), and $\mathbf{n}(\xi)$ is the outward unit normal vector. The negative sign is taken because an outward flow decreases the mass in the region $B$. Summing up these flows over all the infinitesimal patches $d\Sigma$, we get the infinitesimal change in the mass in the region $B$:
$$dM(t) = -\iint_\Sigma \rho(t;\xi)\,\mathbf{v}(t;\xi)\cdot d\mathbf{A}\;dt,$$
that is,
$$M'(t) = -\iint_\Sigma \rho(t;\xi)\,\mathbf{v}(t;\xi)\cdot d\mathbf{A}.$$
11. PROBLEMS 113
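On the other hand, differentiating inside the triple integral in the expression for $M(t)$, we obviously have
$$M'(t) = \iiint_B \frac{\partial\rho(t;x,y,z)}{\partial t}\,dx\,dy\,dz.$$
By the divergence theorem, the surface integral obtained earlier can be rewritten as a triple integral, $M'(t) = -\iiint_B \nabla\cdot(\rho\mathbf{v})\,dx\,dy\,dz$. Since the region $B$ is arbitrary, the integrands in these last two expressions must be equal, and we get the equation of continuity
$$\frac{\partial\rho}{\partial t} + \nabla\cdot(\rho\mathbf{v}) = 0.$$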
Because of this equation, if the condition ∇ · (ρv) = 0 holds, then ρ(t; x, y, z) is
independent of t. That is, mass does not have any tendency to accumulate or rarefy
at any point. The density at a given point does not change over time, although it
may vary from one point to another.
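The equation of continuity suggests a definition of the divergence of the momentum density four-vector $\rho_4 = (\rho, \rho\,dx/dt, \rho\,dy/dt, \rho\,dz/dt)$ as
$$\operatorname{div}\rho_4 = \frac{\partial\rho}{\partial t} + \frac{\partial}{\partial x}\left(\rho\frac{dx}{dt}\right) + \frac{\partial}{\partial y}\left(\rho\frac{dy}{dt}\right) + \frac{\partial}{\partial z}\left(\rho\frac{dz}{dt}\right) = \frac{\partial\rho}{\partial t} + \nabla\cdot(\rho\mathbf{v}).$$
If that is done, the equation of continuity merely says $\operatorname{div}\rho_4 = 0$. For more details, see, for example, the book by Eddington ([16], p. 117).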
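The continuity equation can be checked on a one-dimensional special case. The sketch assumes a hypothetical expanding flow $v(x) = x$ and the density $\rho(t, x) = e^{-t}\exp(-x^2e^{-2t})$, which is carried by that flow. Because $\nabla\cdot(\rho v) \neq 0$ here, the density at a fixed point does change in time, yet the residual $\partial\rho/\partial t + \partial(\rho v)/\partial x$ vanishes up to finite-difference error.

```python
import math

def rho(t, x):
    # hypothetical density for the one-dimensional flow v = x:
    # rho(t, x) = exp(-t) * f(x exp(-t)) with f(y) = exp(-y^2)
    y = x * math.exp(-t)
    return math.exp(-t) * math.exp(-y * y)

def velocity(x):
    return x

def continuity_residual(t, x, h=1e-5):
    # d(rho)/dt + d(rho v)/dx, both approximated by central differences
    drho_dt = (rho(t + h, x) - rho(t - h, x)) / (2 * h)
    dflux_dx = (rho(t, x + h) * velocity(x + h) - rho(t, x - h) * velocity(x - h)) / (2 * h)
    return drho_dt + dflux_dx

t, x = 0.7, 1.3
print(continuity_residual(t, x))                       # close to 0
print((rho(t + 1e-5, x) - rho(t - 1e-5, x)) / 2e-5)    # but d(rho)/dt itself is not 0
```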
A similar increase in notational efficiency can be introduced into electromag-
netic theory using the stress-energy-momentum tensor and reformulating the elec-
tric and magnetic fields as tensors. When that is done, Maxwell’s four equations
(two curl equations and two divergence equations) become just two tensor equa-
tions. Since we are aiming at a minimal presentation of the principles of general
relativity, with a heavy emphasis on the geometry, we shall not attempt to “ten-
sorize” electromagnetism. Instead, in the following (optional) chapter, we shall
return to our three-vectors and show through direct application of the Lorentz
transformation how Maxwell’s two curl equations can be derived from the two di-
vergence equations.
11. Problems
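Problem 2.1. Show that Eqs. (2.1)–(2.3) can be written in vector form as
$$\mathbf{v}_x(r) = \frac{1}{\alpha\eta}\,\mathbf{v}_y(s) + \frac{\alpha - 1}{\alpha\eta}\,\frac{\mathbf{v}_y(s)\cdot\mathbf{u}}{\mathbf{u}\cdot\mathbf{u}}\,\mathbf{u} + \frac{1}{\eta}\,\mathbf{u},$$
where $\mathbf{v}_x = (x^1)'\,\mathbf{i} + (x^2)'\,\mathbf{j} + (x^3)'\,\mathbf{k}$ and $\mathbf{v}_y = (y^1)'\,\mathbf{i} + (y^2)'\,\mathbf{j} + (y^3)'\,\mathbf{k}$ are the velocity vectors the two observers assign to the particle. Here the dot product is taken by $Y$. The vector $\mathbf{u}$ is common to the two, in accordance with our convention that a vector equation of this type is to be used only in the privileged coordinate systems where both observers take $\mathbf{i} = \mathbf{u}/|\mathbf{u}|$.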
Problem 2.2. Show that $\mathbf{v}_y$ is constant (in $s$-time) if and only if $\mathbf{v}_x$ is constant (in $r$-time).
Problem 2.3. Verify that as $c$ approaches infinity, all the equations of relativistic mechanics become the classical equations of Newtonian mechanics. In particular, show that $m_w \to m_0$ as $c \to \infty$.
Problem 2.4. Use the binomial expansion
$$\left(1 - \frac{v^2}{c^2}\right)^{-1/2} = 1 + \frac{1}{2}\frac{v^2}{c^2} + \frac{3}{8}\frac{v^4}{c^4} + \cdots$$
to verify that, as mentioned in connection with Eq. (2.12), the relativistic kinetic energy is
$$(m_v - m_0)c^2 = \frac{1}{2}m_0v^2 + \frac{3m_0v^4}{8c^2} + \frac{5m_0v^6}{16c^4} + \cdots.$$
Deduce that, as $c \to \infty$, the relativistic kinetic energy approaches the Newtonian kinetic energy $\frac{1}{2}m_0v^2$.
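A quick numerical companion to Problem 2.4 compares the exact relativistic kinetic energy with partial sums of the series. The function names and sample values are illustrative; the three coefficients are the ones in the expansion above.

```python
import math

def kinetic_exact(m0, v, c):
    """Relativistic kinetic energy (m_v - m0) c^2, with m_v = m0 / sqrt(1 - v^2/c^2)."""
    return (m0 / math.sqrt(1.0 - (v / c) ** 2) - m0) * c**2

def kinetic_series(m0, v, c, terms=3):
    """Partial sum (terms <= 3) of 1/2 m0 v^2 + 3 m0 v^4/(8 c^2) + 5 m0 v^6/(16 c^4) + ..."""
    coeffs = (1 / 2, 3 / 8, 5 / 16)
    return sum(coeffs[k] * m0 * v ** (2 * k + 2) / c ** (2 * k) for k in range(terms))

# With m0 = v = 1, the exact value approaches the Newtonian 1/2 as c grows.
for c in (2.0, 10.0, 100.0, 1000.0):
    print(f"c = {c:7.1f}  exact = {kinetic_exact(1.0, 1.0, c):.9f}  "
          f"three-term series = {kinetic_series(1.0, 1.0, c):.9f}")
```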
Problem 2.5. Show that the increase in rest mass observed by Y in the two-
particle collision discussed in the text can be accounted for by saying that each
particle converted the kinetic energy of the other into mass. Thus, both mass and
energy are conserved in this case.
Problem 2.6. Consider a particle moving along a straight line, so that both $\mathbf{v}$ and $\mathbf{a}$ also have the direction of this line. With that direction fixed, we can regard velocity, acceleration, and force as scalars. Show that in this case
$$F = \alpha^3 m_0 a,$$
where, as usual, $\alpha = c/\sqrt{c^2 - v^2}$.
Problem 2.7. Prove “Newton’s lemma” that the gravitational attraction exerted by a spherical shell of constant density (per unit area) is zero inside the sphere, while at points outside the sphere it is equal to that of a particle at the center of the sphere having mass equal to the total mass of the shell. Then show that the force exerted on a body of unit mass by a continuous, spherically symmetric mass density $\rho(r)$ (per unit volume) is equal to
$$\mathbf{F}(\mathbf{x}) = -\frac{4\pi G}{r^3}\left(\int_0^r \rho(s)s^2\,ds\right)\mathbf{x},$$
where $r = |\mathbf{x}|$. Finally, show that the potential function $\varphi(\mathbf{x})$ for the force exerted on a body of unit mass, normalized so that $\mathbf{F} = -\nabla\varphi$ and $\varphi(\mathbf{0}) = 0$, is
$$\varphi(\mathbf{x}) = 4\pi G\int_0^r \rho(s)\left(s - \frac{s^2}{r}\right)ds,$$
and that
$$\nabla^2\varphi(\mathbf{x}) = 4\pi G\rho(r).$$
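The force formula in Problem 2.7 can be tested numerically without committing to a sign convention for the potential: for any radial field, $\nabla\cdot\mathbf{F} = r^{-2}\,d(r^2F_r)/dr$, and the formula should give $\nabla\cdot\mathbf{F} = -4\pi G\rho(r)$ (which is Poisson's equation when $\mathbf{F} = -\nabla\varphi$). The density $\rho(s) = e^{-s^2}$ and the units $G = 1$ are hypothetical choices for illustration.

```python
import math

G = 1.0  # illustrative units (assumption)

def rho(s):
    # hypothetical smooth, spherically symmetric density
    return math.exp(-s * s)

def simpson(f, a, b, n=2000):
    # composite Simpson's rule on [a, b]; n must be even
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def force_radial(r):
    # radial component of F(x) = -(4 pi G / r^3) (int_0^r rho(s) s^2 ds) x
    return -4 * math.pi * G / r**2 * simpson(lambda s: rho(s) * s * s, 0.0, r)

def divergence_of_radial_field(F_r, r, h=1e-4):
    # for F = F_r(r) x/r:  div F = (1/r^2) d(r^2 F_r)/dr, by a central difference
    return ((r + h) ** 2 * F_r(r + h) - (r - h) ** 2 * F_r(r - h)) / (2 * h * r**2)

r = 0.8
print(divergence_of_radial_field(force_radial, r), -4 * math.pi * G * rho(r))
```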
CHAPTER 3
Electromagnetic Theory*
One motive for Einstein’s 1905 paper on special relativity was a desire to improve
the handling of the Maxwell equations.1 The Lorentz transformation, besides ex-
plaining the peculiarity that electromagnetic radiation propagates with constant
velocity relative to all observers, also turns out to make the equations of transfor-
mation for electric and magnetic fields more symmetric. In order to show how this
simplification comes about, we have to explain how two observers reconcile their
measurements of charge, charge density, current density, and electric and magnetic
fields when one is moving with constant velocity u relative to the other. Through-
out, we assume that they are using spatial coordinates with one axis along the
common line of motion. This assumption enables us to use vector language for the
transformations, even though the dot and cross product are not generally invariant
under Lorentz transformations. As the ultimate goal of the present chapter, we plan
to use the classical language of three-component vectors and the special-relativity
equations of transformation of electric and magnetic fields to reduce the Maxwell
curl equations to the Maxwell divergence equations, thus essentially replacing eight
simultaneous equations in the coordinates of the fields with two equations.
The reader is warned that the present chapter is highly algebraic, with just a
small admixture of elementary physical theory and almost no geometry (no figures
at all!). Those who are impatient with arguments that proceed by concatenating
formulas might prefer to accept the main result as given, omit the proofs, and
enjoy contemplating the beautiful and fruitful interaction of special relativity with
Maxwell’s laws.
116 3. ELECTROMAGNETIC THEORY*
Note that this result is independent of the radius of the sphere. Even more is true: the sphere can be replaced by any closed surface enclosing the charged particle. The reason is that one can imagine a sphere centered at the charged particle and small enough to lie inside the enclosing surface. Then, by the divergence theorem, the difference between the flux integrals of $\mathbf{E}$ over the two surfaces equals the integral of the divergence $\nabla\cdot\mathbf{E}$ over the region between the two surfaces. But by direct and trivial computation, $\nabla\cdot\mathbf{E} = 0$ throughout that region.
These results imply the following theorem, known as Gauss’s Theorem. It
applies either to discrete charges inside a closed surface or to a continuous charge
density2 ρ.
Theorem 3.1. If a closed surface encloses a number of point charges, or a
continuous distribution of charge producing an electric field E, then the integral of
E over that surface—called the flux of E through the surface—equals 4π times the
total amount of charge enclosed.
Corollary 3.1. In a region of space containing a continuous charge distribu-
tion with density ρ, the divergence of the electric field is
∇ · E = 4πρ .
This corollary is one of Maxwell’s equations.
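Theorem 3.1 can be illustrated numerically with a surface that is not a sphere. The sketch below assumes the Coulomb field of a point charge in the units implicit in the theorem, $\mathbf{E}(\mathbf{x}) = q(\mathbf{x} - \mathbf{x}_0)/|\mathbf{x} - \mathbf{x}_0|^3$, and estimates its flux through the faces of a cube by the midpoint rule. The cube, the charge positions, and the grid size are illustrative choices: an off-center charge inside gives a flux close to $4\pi q$, and a charge outside gives a flux close to zero.

```python
import numpy as np

def flux_through_cube(q, charge_pos, half_side=1.0, n=200):
    """Midpoint-rule estimate of the flux of E(x) = q (x - x0) / |x - x0|^3
    through the surface of the cube [-half_side, half_side]^3."""
    x0 = np.asarray(charge_pos, dtype=float)
    h = 2.0 * half_side / n
    t = -half_side + h * (np.arange(n) + 0.5)      # midpoints along a face edge
    s1, s2 = np.meshgrid(t, t, indexing="ij")
    total = 0.0
    for axis in range(3):
        other = [k for k in range(3) if k != axis]
        for sign in (-1.0, 1.0):
            pts = np.empty((n, n, 3))
            pts[..., axis] = sign * half_side
            pts[..., other[0]] = s1
            pts[..., other[1]] = s2
            d = pts - x0
            dist3 = np.linalg.norm(d, axis=-1) ** 3
            # the outward normal on this face is sign * e_axis
            total += np.sum(sign * q * d[..., axis] / dist3) * h * h
    return total

inside = flux_through_cube(1.0, (0.2, -0.1, 0.3))
outside = flux_through_cube(1.0, (3.0, 0.0, 0.0))
print(inside / (4 * np.pi), outside)   # close to 1 and close to 0
```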
It is a fact well established by experiment that the charge on a particle, unlike the mass of the particle, is independent of the observer. Thus, if Observer $X$ detects a point charge $q$ as an event $(r; x^1, x^2, x^3)$, then Observer $Y$ will also detect that same point charge $q$ at the corresponding event $(s; y^1, y^2, y^3)$ given by the Lorentz transformation.
Suppose that observer $O$ detects a lattice of equal charges of magnitude $q$ that are permanently located at the points $(jd, kd, ld)$, where $j$, $k$, and $l$ range over all the integers, and $d$ is a fixed length. For that observer there is a charge density
$$\rho = \frac{q}{d^3}.$$
The physical dimension of $\rho$ is charge per unit volume. Now imagine that observer $X$ sees the charges moving with velocity $\mathbf{v} = v^1\mathbf{i} + v^2\mathbf{j} + v^3\mathbf{k}$, so that $X$ has velocity $-\mathbf{v}$ relative to observer $O$. The charge density observed by $X$ is easy to compute, since the charge is invariant, but each volume $d^3$ is decreased by the factor $1/\gamma$, where $\gamma = c/\sqrt{c^2 - |\mathbf{v}|^2}$. Thus, $X$ observes a charge density $\rho_x = \gamma\rho$. (This $\gamma$ is the same quantity introduced in § 2 of the previous chapter, but with $x$ replaced by $v$.)
Now suppose that $Y$ is moving with velocity $\mathbf{u} = u\,\mathbf{i}$ relative to $X$, and as usual, $Y$ and $X$ are sharing their first axis, which is the line of mutual motion, and assigning the same second and third coordinates to each point. Then, according to
2 Just as a reminder: The symbol ρ is used here for charge density, while in previous chap-
ters it denoted the “spatialized time” rc, or mass density, or the radial coordinate in spherical
coordinates.
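$Y$, the charges are moving with velocity $\mathbf{w}$, which is the relativistic composition of $\mathbf{v}$ and $-\mathbf{u}$. By Eqs. (2.1)–(2.3) of the previous chapter, we have
$$\begin{aligned} w^1 &= \frac{-u + v^1}{\delta},\\ w^2 &= \frac{1}{\alpha\delta}\,v^2,\\ w^3 &= \frac{1}{\alpha\delta}\,v^3,\end{aligned}$$
where $\alpha = c/\sqrt{c^2 - |\mathbf{u}|^2}$ and $\delta = 1 - uv^1/c^2 = 1 - \mathbf{u}\cdot\mathbf{v}/c^2$.
Thus, $Y$ will observe a charge density $\rho_y = \beta\rho$, where $\beta = c/\sqrt{c^2 - |\mathbf{w}|^2}$. Again, this $\beta$ is the same quantity introduced in § 2 of the previous chapter, only with $y$ replaced by $w$.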
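The formulas for $\mathbf{w}$, $\rho_x$, and $\rho_y$ can be exercised numerically. The sketch below also checks that $\rho_y = \alpha\delta\rho_x$, which follows from the standard identity $\beta = \alpha\gamma\delta$ for composed Lorentz factors; that identity is not derived in the text above and is used here only as a cross-check. The velocities, the density, and the units $c = 1$ are hypothetical.

```python
import math

c = 1.0   # illustrative units (assumption)

def lorentz_factor(speed_squared):
    return c / math.sqrt(c**2 - speed_squared)

def velocity_seen_by_Y(v, u):
    """Relativistic composition of v (the charges' velocity seen by X) and -u,
    using the component formulas above; Y moves with speed u along X's first axis."""
    alpha = lorentz_factor(u * u)
    delta = 1 - u * v[0] / c**2
    return ((v[0] - u) / delta, v[1] / (alpha * delta), v[2] / (alpha * delta))

rho = 1.0                  # q/d^3 as measured by O (hypothetical value)
v = (0.4, -0.3, 0.5)       # hypothetical velocity of the charges relative to X
u = 0.7                    # hypothetical speed of Y relative to X

gamma = lorentz_factor(sum(x * x for x in v))
w = velocity_seen_by_Y(v, u)
beta = lorentz_factor(sum(x * x for x in w))
alpha = lorentz_factor(u * u)
delta = 1 - u * v[0] / c**2

rho_x, rho_y = gamma * rho, beta * rho
print(rho_x, rho_y, alpha * delta * rho_x)   # rho_y and alpha*delta*rho_x agree
```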
2.1. The divergence of the current density. We are now going to look at current density from the point of view of a single observer. The physical units of current density $\mathbf{J}$ are charge per unit area per unit time. The statement that $\mathbf{J}$ has magnitude $J$ in a particular direction means that in an infinitesimal amount of time $dt$, the amount of charge passing through an infinitesimal element of area $dS$ perpendicular to the direction of $\mathbf{J}$ is $J\,dS\,dt$. As a consequence, in a volume $R$ of space enclosed by a surface $S$, the surface integral
$$\iint_S \mathbf{J}\cdot d\mathbf{n} = \iint_S \mathbf{J}\cdot\mathbf{n}\,dS,$$
3. TRANSFORMATION OF ELECTRIC AND MAGNETIC FIELDS 119
where $\mathbf{n}$ is the unit outward-pointing normal to the surface, gives the amount of charge passing out of the region $R$ per unit of time. If $Q(t)$ is the total amount of charge enclosed at time $t$, we thus have
$$\iint_S \mathbf{J}\cdot d\mathbf{n} = -Q'(t).$$
3 Obviously, this terminology comes from the magnetic compass, whose needle aligns itself
in the north-south direction when free to do so. Since like poles repel and unlike poles attract,
the linguistically odd result is that the north geographical pole of the Earth is approximately its
south magnetic pole and vice versa.
4 Named for the same Hendrik Antoon Lorentz (1853–1928) for whom the Lorentz transfor-
mation is named. Lorentz’s derivation of the law, however, had been anticipated by both James
Clerk Maxwell and Oliver Heaviside (1850–1925).
We remark that if c were infinite, the electric fields observed by X and Y would
also be equal.
correction of the law—changing electric fields) produce magnetic fields. The fourth
is Faraday’s law:6 A changing magnetic field produces an electric field.
Now it is very easy to derive the divergence equations from the curl equations
and suitably given initial values for B and E (see Problems 3.2 and 3.4 below).
The basic fact involved is that the divergence of the curl is zero. This can be done
in either a Newtonian or a relativistic setting.
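The fact that the divergence of a curl is zero has an exact discrete counterpart: finite-difference derivatives taken along different axes commute, so a discrete divergence of a discrete curl cancels up to rounding error. The sketch below illustrates this with NumPy's `numpy.gradient` on a hypothetical smooth field sampled on a grid. It is a numerical illustration of the identity, not a proof of it.

```python
import numpy as np

n = 41
x = np.linspace(-1.0, 1.0, n)
h = x[1] - x[0]
X, Y, Z = np.meshgrid(x, x, x, indexing="ij")

# A hypothetical smooth field E = (E1, E2, E3) sampled on the grid
E1 = np.sin(Y) * Z**2
E2 = np.exp(X) * np.cos(Z)
E3 = X * Y * Z

def d(f, axis):
    # finite-difference partial derivative along one axis
    # (central differences inside, one-sided at the boundary)
    return np.gradient(f, h, axis=axis)

curl = (d(E3, 1) - d(E2, 2),
        d(E1, 2) - d(E3, 0),
        d(E2, 0) - d(E1, 1))
div_curl = d(curl[0], 0) + d(curl[1], 1) + d(curl[2], 2)

print(np.abs(div_curl).max())   # zero up to rounding error
print(np.abs(curl[0]).max())    # while the curl itself is not small
```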
The more difficult task, which we now undertake, is to show that if we have the
relativistic equations of transformation for the electric and magnetic fields, we can
make the converse deduction of the curl equations from the divergence equations.
That is one of the most fascinating features of special relativity: its implication
that there are really only two laws here. If everybody observes the divergence
equations, then everybody will also observe the curl equations. Formally, we have
the following theorem.
Theorem 3.4. Let $X$ be a fixed but arbitrary observer. If every observer $Y$ in uniform motion with respect to $X$ observes that Eq. (3.6) holds, then $X$ will observe that Eq. (3.7) holds. Likewise, if every observer $Y$ in uniform motion with respect to $X$ observes that Eq. (3.5) holds, then $X$ will observe that Eq. (3.8) holds.
Proof. Since we are dealing with vector expressions used by two observers here, it is best to resort once again to coordinate-wise reasoning in a privileged coordinate system. We note that the partial derivative operator along the direction of motion for the two observers transforms as follows:
$$\frac{\partial}{\partial y^1} = \frac{\partial x^1}{\partial y^1}\frac{\partial}{\partial x^1} + \frac{\partial r}{\partial y^1}\frac{\partial}{\partial r} = \alpha\left(\frac{\partial}{\partial x^1} + \frac{u}{c^2}\frac{\partial}{\partial r}\right).$$
The other two partial derivative operators are the same for both observers. Here $r$ is the time variable used by $X$. For aesthetic reasons, we shall replace it with the more traditional $t$.
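The transformation of $\partial/\partial y^1$ can be checked by finite differences. The sketch assumes the Lorentz transformation in the form $x^1 = \alpha(y^1 + us)$, $r = \alpha(s + uy^1/c^2)$, which gives $\partial x^1/\partial y^1 = \alpha$ and $\partial r/\partial y^1 = \alpha u/c^2$ as in the formula above. The field `f`, the sample event, and the units $c = 1$ are hypothetical.

```python
import math

c = 1.0   # illustrative units (assumption)
u = 0.6   # speed of Y relative to X along the common first axis (hypothetical)
alpha = c / math.sqrt(c**2 - u**2)

def f(t, x1):
    # hypothetical smooth quantity measured by X, in X's coordinates (t, x1)
    return math.sin(2.0 * x1 - t) + 0.3 * t * x1**2

def to_X(s, y1):
    # assumed Lorentz transformation from Y's coordinates (s, y1) to X's (t, x1)
    return alpha * (s + u * y1 / c**2), alpha * (y1 + u * s)

def g(s, y1):
    # the same quantity expressed in Y's coordinates
    return f(*to_X(s, y1))

def partial(fun, a, b, which, h=1e-5):
    # central difference in the first (which=0) or second (which=1) argument
    if which == 0:
        return (fun(a + h, b) - fun(a - h, b)) / (2 * h)
    return (fun(a, b + h) - fun(a, b - h)) / (2 * h)

s, y1 = 0.4, -0.7
t, x1 = to_X(s, y1)
lhs = partial(g, s, y1, which=1)                                    # d/dy1 as used by Y
rhs = alpha * (partial(f, t, x1, which=1) + u / c**2 * partial(f, t, x1, which=0))
print(lhs, rhs)   # the two agree up to finite-difference error
```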
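We now assume that all observers agree on the equation $\nabla\cdot\mathbf{E} = 4\pi\rho$, that is,
$$\frac{\partial E_y^1}{\partial y^1} + \frac{\partial E_y^2}{\partial y^2} + \frac{\partial E_y^3}{\partial y^3} = 4\pi\rho_y,$$
$$\frac{\partial E_x^1}{\partial x^1} + \frac{\partial E_x^2}{\partial x^2} + \frac{\partial E_x^3}{\partial x^3} = 4\pi\rho_x.$$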
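Writing $\mathbf{E}_y$ in terms of $\mathbf{E}_x$ and $\mathbf{B}_x$, and the partial derivative operators $\partial/\partial y^1$, $\partial/\partial y^2$, and $\partial/\partial y^3$ in terms of $\partial/\partial t$, $\partial/\partial x^1$, $\partial/\partial x^2$, and $\partial/\partial x^3$, we find, since $E_y^1 = E_x^1$,
$$4\pi\rho_y = \nabla_y\cdot\mathbf{E}_y = \alpha\left(\frac{\partial E_x^1}{\partial x^1} + \frac{u}{c^2}\frac{\partial E_x^1}{\partial t}\right) + \alpha\left(\frac{\partial E_x^2}{\partial x^2} - \frac{u}{c}\frac{\partial B_x^3}{\partial x^2}\right) + \alpha\left(\frac{\partial E_x^3}{\partial x^3} + \frac{u}{c}\frac{\partial B_x^2}{\partial x^3}\right)$$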
5. Problems
Problem 3.1. Prove that the convection current density $\mathbf{J}_y = J_y^1\mathbf{i} + J_y^2\mathbf{j} + J_y^3\mathbf{k}$ detected by $Y$ is given by the equations
$$J_y^1 = \alpha\left(J_x^1 - u\rho_x\right),\qquad J_y^2 = J_x^2,\qquad J_y^3 = J_x^3.$$
Problem 3.2. Show that if ∇ × E = −(1/c)∂B/∂t, then ∂(∇ · B)/∂t ≡ 0.
(This means that ∇ · B is constant over time at each point, and hence identically
zero at a given point if it vanishes at that point for even one value of t.)
Problem 3.3. Show that if every “Observer $Y$” observes that $\nabla_y\cdot\mathbf{B}_y = 0$, then $X$ will observe that
$$\nabla_x\times\mathbf{E}_x = -\frac{1}{c}\frac{\partial\mathbf{B}_x}{\partial t}.$$
Problem 3.4. Derive the divergence equation for the electric field E from the
curl equation for the magnetic field B, assuming that at each point there is a time
when ∇ · E = 4πρ at that point.
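Problem 3.5. Assume that the charge density and current density are zero. Show that in this case, all the components of $\mathbf{B}$ and $\mathbf{E}$ satisfy the homogeneous three-dimensional wave equation
$$\frac{\partial^2 u}{\partial t^2} = c^2\,\nabla\cdot\nabla u = c^2\left(\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2}\right) = c^2\nabla^2 u,$$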
128 INTRODUCTION TO PART 2
The present book aims to take the reader from the familiar Newtonian basic law
of mechanics F = ma to the Einstein field equation that replaces it, traditionally
130 4. PRECESSION AND DEFLECTION
written as
$$G_{\mu\nu} = \frac{8\pi G}{c^4}\,T_{\mu\nu}.$$
(Our own notation for this relation, in Chapter 7 below, will appear bizarre; it
will certainly not be found in any other book. We have been led to this peculiar
notation in a desperate—and perhaps futile—effort not to use the symbol G in
more than one sense within the covers of a single book, and especially not within a
single equation, as is done here.)
The road from Newton’s second law of motion to the Einstein field equation is
long and arduous. In a sense, the change in point of view arises out of Newton’s first
two laws. The notion of force is an elusive one, since many forces (gravitational,
electric, magnetic, and the like) are invisible. By Newton’s two laws, the way we
know a force is acting on a particle is to observe it and see if it moves in a straight
line at constant speed. If it doesn’t, some force is required to explain its motion. If
we introduce four-dimensional space-time, we can see that the unforced motion of
a particle is graphed as a straight line in that space-time. Since a straight line is a
geodesic (for present purposes, the shortest path joining two of its points), we can
rephrase Newton’s first law by saying that an unforced motion is along a geodesic
in the Euclidean metric. We thereby get a rough equivalence: a curved world-line
is the indicator of a net force. But, since force itself is invisible, what we observe
is the curvature. Here we have the germ of an idea: Get rid of force entirely, and
let the motion simply be along a geodesic in a curved space-time. That is, impose
a non-Euclidean metric on space-time in which the observed path is a geodesic.
That road, as we have said, is very long. For that reason, in the present chapter we are going to get ahead of the story and see how the end result (the Einstein field equation) explains the precession of Mercury and the deflection of light around the Sun. We shall reserve the motivation for the field equations for later chapters.
Since this departure from the Newtonian view is a radical one, for psychological
reasons, we consider whether some less drastic approach might work. There are two
such approaches that might come to mind. The first would be to retain Newton’s
equations of motion, but interpret force as in special relativity.1
The second would be to accept the geometric point of view, but try to impose the metric on three-dimensional Newtonian space rather than four-dimensional space-time with its unusual metric.2
I confess in advance that both of these approaches fail to explain the preces-
sion of the perihelion of Mercury. Nevertheless, they do bring out some important
concepts that are of use in general relativity. We emphasize that these two explorations are included for psychological and pedagogical purposes only, and the reader
can choose to omit them without loss of continuity. We make no claim that they
provide a logical transition or connection between special and general relativity.
General relativity is not merely a perturbation of special relativity.
1 An anonymous reviewer, to whom I am grateful, pointed out that this idea was anticipated
and fully developed by the Dutch physicist Willem de Sitter (1872–1934) in 1911 [10].
2 As with the previous approach, I am grateful to an anonymous reviewer who pointed out to
me that what I have done here is subsumed as a small part of two papers ([6], [7]) by Elie Cartan
(1869–1951).
1. GRAVITATION AS CURVATURE OF SPACE 131
3 The germ of this idea had occurred much earlier, to Einstein’s Swiss compatriot Leonhard
Euler (1707–1783). In his 1744 work on the calculus of variations, written while he was at the
Prussian Academy of Sciences, he showed that a particle moving on a surface and not subject to
any tangential forces would move along a geodesic. For information on Euler’s equation in the
calculus of variations and on geodesics, see Appendix 2. We really ought to be using Gauss’ term
shortest path rather than geodesic at this point, since a shortest path remains a shortest path
when reparameterized, but a geodesic has only one parameterization. Nevertheless, we shall save
time and words by just using the term geodesic, since the second-order differential equation from
which they are constructed is the same in both cases.
4 Le Verrier had noted the unexplained precession of the perihelion of Mercury by an amount
that he calculated to be 38 seconds of arc per century more than the approximately 5557 seconds
that could be accounted for through perturbations caused by the other planets. He thought
the discrepancy might indicate the presence of an undiscovered planet near the Sun. He tended
to think along these lines, having been the co-discoverer of Neptune, to which he was led by
calculating its perturbation of the orbit of Uranus.
passive equations. The former are the broad general laws of physics, like the field equations of
general relativity, applicable to idealized models and arrived at by inverting what Sternberg calls
a Legendre transformation. The latter are particular applications of them intended to apply in
practical cases, with the above-mentioned exclusion of small effects.
2. FIRST ANALYSIS: NEWTONIAN ORBITS 133
attraction $-(GMm/r^3)\,\mathbf{r}$. This principle, in relation to falling bodies, was stated
by Galileo, and there is a famous legend of his having dropped two weights from
the Leaning Tower of Pisa to test it. The principle distinguishes gravitational force
from other forces such as, for example, electrostatic forces, which are determined
by charge rather than mass. (Correspondingly, charge, unlike mass, is independent
of velocity.) Everyone has noticed that when a traffic light turns from red to green,
a large truck will accelerate more slowly than a small sports car next to it in the
adjacent lane. That is because the motive forces per unit mass in the two are dif-
ferent. But if the two vehicles were driven off a cliff simultaneously, each would
experience the same acceleration, and they would remain side-by-side all the way
to the bottom.
According to Clifford M. Will ([86], Chapter 2), Newton tested the principle,
since it implied that the period of a pendulum was determined only by its length
and was independent of the weight and material of the pendulum bob. Later,
the Hungarian nobleman Baron Roland von Eötvös6 (1848–1919) made a more
sophisticated test, using the fact that the gravitational attraction of the Earth and
the centrifugal force due to its rotation, at any point between the poles and the
equator, are not lined up. The two components of the acceleration of a body due to
these forces are respectively inversely proportional to the gravitational and inertial
mass. Eötvös found no difference in the two.
A corollary of this principle is that one can use gravitational acceleration as
a unit of force, as we do when we talk about g-force. The apparent increase in
weight that we feel when an elevator we are in begins to go upward or an airplane
we are aboard takes off is an example of this equivalence. We feel heavier until the
elevator or airplane reaches its “cruising” speed, as if the gravitational field of the
Earth had suddenly increased and (in the case of the airplane) changed direction.
2.1. Kepler’s second law: conservation of angular momentum. Our first observation is that, because the force is central, the angular momentum per unit mass $\mathbf{l}$, which is $\mathbf{r}\times\mathbf{r}'$, is conserved, as we proved in Chapter 2 (Theorem 2.10). As a result (Corollary 2.2), the motion is confined to the plane perpendicular to the constant vector $\mathbf{l}$, in which we shall henceforth use polar coordinates $(r, \theta)$. The vector equation now becomes a system of two differential equations. Taking $\mathbf{r}(t) = r(t)\cos\theta(t)\,\mathbf{i} + r(t)\sin\theta(t)\,\mathbf{j}$, we find that these two equations are
$$r'' - r(\theta')^2 = -\frac{GM}{r^2},$$
$$2r'\theta' + r\theta'' = 0.$$
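The second of these equations merely expresses the conservation of angular momentum per unit mass, as established in Theorem 2.10: since $2r'\theta' + r\theta'' = \frac{1}{r}\frac{d}{dt}\left(r^2\theta'\right)$, it says that $r^2\theta'$, which is twice the rate at which the radius vector sweeps out area, is constant. This is Kepler’s second law.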
Because of this law, after we prove that the orbit is an ellipse, we shall be able to express the magnitude of the angular momentum per unit mass $l$ as follows, using only the average distance ($a$) from the Sun, the eccentricity of the orbit ($e$), and the period ($T$):
$$l = \frac{2\pi ab}{T} = \frac{2\pi a^2\sqrt{1 - e^2}}{T},$$
where $a$ and $b$ are respectively the semi-major and semi-minor axes of the elliptical orbit (so that $\pi ab$ is its area), and $T$ is the time elapsed between successive perihelia. When we work the same problem using the
theory of relativity, we shall be able to eliminate l and T entirely from the differential
equation of the orbit. But we need these constants now in order to determine a
certain distance that is required to make the relativistic computation possible.
The average distance from the Sun to the planet is interpreted as the average of the distance $r_{\mathrm{aph}}$ at aphelion and its distance $r_{\mathrm{peri}}$ at perihelion. That distance is the semi-major axis of the elliptical orbit, denoted $a$ here. It will follow once we have proved that the orbit is in fact an ellipse (Kepler’s first law) that the semi-minor axis is $b = a\sqrt{1 - e^2} = 5.66715\times10^{10}$ meters. The eccentricity $e$ is $(r_{\mathrm{aph}} - r_{\mathrm{peri}})/(r_{\mathrm{aph}} + r_{\mathrm{peri}})$, so that
$$1 - e^2 = \frac{4r_{\mathrm{aph}}r_{\mathrm{peri}}}{(r_{\mathrm{aph}} + r_{\mathrm{peri}})^2}.$$
We can therefore write the angular momentum per unit mass as
$$l = \frac{\pi(r_{\mathrm{aph}} + r_{\mathrm{peri}})\sqrt{r_{\mathrm{aph}}r_{\mathrm{peri}}}}{T}.$$
The fact that the motion is planar (direction of the angular momentum vector is
constant) and that angular momentum is conserved (its magnitude is also constant)
both follow from the fact that the force is central. In the relativistic model, the
analogous feature of the problem is that the space-time metric is radial, that is, the
coefficients of the differentials depend only on the radial coordinate ρ.
When we replace $\theta'$ by $l/r^2$ in the first of Newton’s equations, we get the fundamental differential equation for $r$:
$$r'' = \frac{1}{r^3}\left(l^2 - GMr\right).$$
To get an estimate of the magnitudes we are dealing with here, let us calculate some numbers specific to the orbit of Mercury. The observational data are as follows:
Distance from the Sun at perihelion: $r_{\mathrm{peri}} = 4.60012\times10^{10}$ meters;
Distance from the Sun at aphelion: $r_{\mathrm{aph}} = 6.98169\times10^{10}$ meters;
Average distance from the Sun: $a = \frac{1}{2}(r_{\mathrm{peri}} + r_{\mathrm{aph}}) = 5.7909\times10^{10}$ meters;
Eccentricity (since, as we are about to prove, the orbit is an ellipse): $e = \dfrac{r_{\mathrm{aph}} - r_{\mathrm{peri}}}{r_{\mathrm{aph}} + r_{\mathrm{peri}}} = 0.20563$;
Orbital period: $T = 87.969\times86{,}400 = 7.60052\times10^{6}$ seconds.
From these data, we calculate that:
$GMr_{\mathrm{peri}} = 6.10939\times10^{30}$ m$^4$/s$^2$,
$GMr_{\mathrm{aph}} = 9.27234\times10^{30}$ m$^4$/s$^2$,
$l^2 = 7.3603\times10^{30}$ m$^4$/s$^2$.
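The quantities derived purely from the observational data can be recomputed in a few lines. The sketch uses only $r_{\mathrm{peri}}$, $r_{\mathrm{aph}}$, and $T$ as quoted above. It does not compute $GMr_{\mathrm{peri}}$ or $GMr_{\mathrm{aph}}$, because the value of $GM$ used for those figures is not stated in this passage. To the digits quoted, it reproduces $a$, $e$, $b$, $T$, $l^2$, and the ratio $a^3/T^2$ used for Kepler's third law below.

```python
import math

# Observational data quoted above (meters, seconds)
r_peri = 4.60012e10
r_aph = 6.98169e10
T = 87.969 * 86_400

a = (r_peri + r_aph) / 2                      # semi-major axis
e = (r_aph - r_peri) / (r_aph + r_peri)       # eccentricity
b = a * math.sqrt(1 - e**2)                   # semi-minor axis, equal to sqrt(r_aph * r_peri)
l = 2 * math.pi * a * b / T                   # angular momentum per unit mass: twice the areal velocity
kepler = a**3 / T**2                          # Kepler's third-law ratio

print(f"a = {a:.5e} m, e = {e:.5f}, b = {b:.5e} m")
print(f"T = {T:.5e} s, l^2 = {l**2:.4e} m^4/s^2, a^3/T^2 = {kepler:.5e} m^3/s^2")
```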
Our computation of $l^2$ is obtained by dividing twice the area of the ellipse ($2\pi ab$) by the orbital period ($T$) to get the angular momentum $l$ per unit mass, then squaring the result. Since the angular momentum $l$ per unit mass is twice the rate at which area is swept out and is a constant $r^2\theta'$, it follows that the tangential
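2.2. Kepler’s third law: period vs. mean radius. Because $t$ does not appear explicitly in the differential equation, we can execute the usual trick, setting $p = dr/dt$, so that
$$r'' = \frac{dp}{dt} = \frac{dp}{dr}\,\frac{dr}{dt} = p\,\frac{dp}{dr}.$$
Substituting this into the equation for $r''$, integrating with respect to $r$, and using the fact that $p = 0$ at perihelion, we obtain
$$\frac{1}{2}p^2 = \frac{GM}{r} - \frac{l^2}{2r^2} - \frac{GM}{r_{\mathrm{peri}}} + \frac{l^2}{2r_{\mathrm{peri}}^2}.$$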
That is,
$$\left(\frac{dr}{dt}\right)^2 = \frac{2GMr - l^2}{r^2} - \frac{2GMr_{\mathrm{peri}} - l^2}{r_{\mathrm{peri}}^2}.$$
This equation has no real solutions unless some value r(t) is such that the right-
hand side is positive. If this condition is met, then the solution r(t) will always stay
between the perihelion and aphelion values. As it approaches one of these values,
its derivative tends to zero, and its second derivative, as shown above, is negative
at aphelion and positive at perihelion, meaning that r will return to the region
after reaching its extreme value, that is, to the region where the right-hand side is
positive, rather than crossing into the region where it is negative.
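Taking $r = r_{\mathrm{aph}}$, we find, since $dr/dt = 0$ at this point also, that
$$2GM\left(\frac{1}{r_{\mathrm{peri}}} - \frac{1}{r_{\mathrm{aph}}}\right) = l^2\left(\frac{1}{r_{\mathrm{peri}}^2} - \frac{1}{r_{\mathrm{aph}}^2}\right),$$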
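that is,
$$\begin{aligned} 2GM &= l^2\,\frac{r_{\mathrm{aph}} + r_{\mathrm{peri}}}{r_{\mathrm{aph}}r_{\mathrm{peri}}} = \frac{2l^2}{a(1 - e^2)}\\ &= \frac{\pi^2 r_{\mathrm{aph}}r_{\mathrm{peri}}(r_{\mathrm{aph}} + r_{\mathrm{peri}})^2}{T^2}\cdot\frac{r_{\mathrm{aph}} + r_{\mathrm{peri}}}{r_{\mathrm{aph}}r_{\mathrm{peri}}}\\ &= \frac{\pi^2(r_{\mathrm{aph}} + r_{\mathrm{peri}})^3}{T^2},\end{aligned}$$
and this means (since $a = (r_{\mathrm{peri}} + r_{\mathrm{aph}})/2$) that
$$\frac{a^3}{T^2} = \frac{GM}{4\pi^2},$$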
that is, the cube of the average distance to the Sun divided by the square of the
period is a constant determined by the mass of the Sun and the gravitational
constant. It is independent of the particular planet whose orbit is being calculated.
This is Kepler’s third law, historically a very important one, since it suggested the
inverse-square law of gravity in the first place. For Mercury, $a^3/T^2 = 3.36165\times10^{18}$ cubic meters per second squared, and $GM/(4\pi^2) = 3.36228\times10^{18}$ cubic meters per second squared. Considering that all four of the quantities in these two equations
are obtained from independent measurements, this is remarkably good agreement
between theory and observation.
using only special relativity. Retaining Newton’s law of gravity, but taking the force
to be the relativistic force of Chapter 2, we face the system of differential equations
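$$\frac{d\mathbf{p}}{dt} = -\frac{GMm}{(\mathbf{r}\cdot\mathbf{r})^{3/2}}\,\mathbf{r},$$
where now $m = \alpha m_0$, $\mathbf{p} = m\mathbf{r}'$, and $\alpha = \left(1 - (\mathbf{r}'\cdot\mathbf{r}')/c^2\right)^{-1/2}$. Because the force is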
central, the motion will be planar; that is, relativistic angular momentum will be
conserved. Although the computations are tedious—Mathematica can save a great
deal of work here (see Problem 4.21 below)—one does eventually get the interesting
second-order differential equation
$$r\left(\frac{d\theta}{dt}\right)^2\left(r^2 + 2\left(\frac{dr}{d\theta}\right)^2 - r\,\frac{d^2r}{d\theta^2}\right) = GM. \tag{4.2}$$
Letting $u = 1/r$, and recalling that $(\theta')^2 = l^2/(\alpha^2r^4)$, where $l$ is the constant relativistic angular momentum, we are led to the equation
$$u'' + u = \frac{GM\alpha^2}{l^2},$$
where
the independent variable is the angle θ. We shall assign the Newtonian value
GM a(1 − e2 ) to the relativistic angular momentum l (see Problem 4.22), since
we are going to be doing approximations. With a little reflection, we find that α2
can be eliminated from this equation, leading to the final version of it:
$$u'' + u = \frac{GM}{l^2} + \frac{GM}{c^2}\left(u^2 + (u')^2\right). \tag{4.3}$$
This relation shows that the relativistic equation is essentially a perturbation
of the Newtonian equation, the perturbation term being of the order of $u^2$, since
typically $(u')^2$ is very small, unless the eccentricity of the orbit is very large. Perturbations
of that magnitude will also appear in the next two modifications of Newton's
equation of motion. The use of the word perturbation here is not intended to imply
that general relativity is in any sense a mere perturbation of special relativity, much
less of Newtonian mechanics.
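No closed-form solution is available, so a numerical experiment is a natural way to watch the perturbation at work. The Python sketch below integrates Eq. (4.3), written as $u'' + u = A + B(u^2 + (u')^2)$ with $A = GM/l^2$ and $B = GM/c^2$, using a hand-written fourth-order Runge-Kutta step. To make the effect visible it uses exaggerated dimensionless values $A = 1$, $B = 0.001$ and an eccentricity of 0.2. These are our assumptions; for Mercury the product $AB$ is only of order $10^{-8}$. The code starts at a perihelion, locates the following maxima of $u$ and reports how far past $2\pi$ the first one falls. For comparison it prints a rough first-order estimate of $2\pi AB$ per revolution. That estimate is our own perturbation calculation, not a result stated in the text.

```python
import math


def rhs(state, A, B):
    """Eq. (4.3) as a first-order system in theta: u' = v, v' = A + B(u^2 + v^2) - u."""
    u, v = state
    return (v, A + B * (u * u + v * v) - u)


def rk4_step(state, h, A, B):
    """One classical fourth-order Runge-Kutta step of size h in the angle theta."""
    k1 = rhs(state, A, B)
    k2 = rhs((state[0] + 0.5 * h * k1[0], state[1] + 0.5 * h * k1[1]), A, B)
    k3 = rhs((state[0] + 0.5 * h * k2[0], state[1] + 0.5 * h * k2[1]), A, B)
    k4 = rhs((state[0] + h * k3[0], state[1] + h * k3[1]), A, B)
    return (state[0] + h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6,
            state[1] + h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6)


def perihelion_angles(A, B, ecc, count=3, h=1e-3):
    """Start at a perihelion (u maximal, u' = 0) and return the angles of the next
    `count` maxima of u, located where u' changes sign from + to -."""
    theta, state = 0.0, (A * (1.0 + ecc), 0.0)
    angles = []
    while len(angles) < count:
        new = rk4_step(state, h, A, B)
        if state[1] > 0.0 >= new[1]:
            # linear interpolation for the zero of u' inside this step
            frac = state[1] / (state[1] - new[1])
            angles.append(theta + frac * h)
        theta, state = theta + h, new
    return angles


A, B, ecc = 1.0, 1e-3, 0.2            # exaggerated, dimensionless test values
angles = perihelion_angles(A, B, ecc)
advance = angles[0] - 2.0 * math.pi   # angle gained by the first return to perihelion
print(f"first return to perihelion at theta = {angles[0]:.6f}")
print(f"advance per revolution: {advance:.4e} rad")
print(f"first-order estimate 2*pi*A*B: {2.0 * math.pi * A * B:.4e} rad")
```

With $B = 0$ the same code returns perihelia at multiples of $2\pi$, which corresponds to the closed Newtonian ellipse.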
Even Mathematica finds a closed-form solution of this differential equation to
be beyond its power and simply gives it back when asked to solve it. We are
not interested in solving it exactly at the moment and have presented it only to
show what happens when we try to investigate planetary orbits using relativistic
mechanics—what Poincaré in the quotation above called the “new Dynamics.”7
Nevertheless, there is a way of solving it that one might call the “Catch-22” method.
By trying plausible forms for the inverse function θ(u), one finds that this inverse
function can be expressed in quadratures (see Problem 4.23), leading to
an intriguing reduction of this equation to an equivalent form that Mathematica
can solve.8 But one has to give the equivalent form to Mathematica, which does
not discover it on its own. Even then, Mathematica only gives back the inverse
function that was used to produce the reduction in the first place. In other words,
Mathematica is of no help in solving this equation. The reason for the bizarre
name we have given this method is that it is, to say the least, unusual to require
7 Poincaré himself, inspired by the work of Lorentz, created a considerable portion of special
solved by the standard technique of letting $u' = p$, $u'' = p\,dp/du$; any sophomore can do it.
[Figure 4.1: an orbit about the Sun, with aphelion and perihelion marked.]
larger than about 0.01. Figure 4.2 shows the case a = 2, b = 0.02 with an assumed
initial value u(0) = 3. In the case shown in Fig. 4.1, with actual data for the
orbit of Mercury, we have $a = 1.80309 \times 10^{-11}\,\mathrm{m}^{-1}$ and $b = 1476.9$ m, so that
$ab = 2.66298 \times 10^{-8}$, which is obviously much less than 0.01.

[Figure 4.2: the case a = 2, b = 0.02, u(0) = 3, with the Sun marked.]

9 Another example of the method comes from the Galois group of an equation, from which
one can determine whether or not the equation is solvable through the application of a finite
number of algebraic operations. The trouble is, in most cases (not all), one has to know the roots
already in order to compute the group. Once you know the roots, it is probably not an interesting
question whether they could have been obtained by algebraic operations.
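The Mercury figures can be reproduced approximately. The quoted values are consistent with reading $a$ as $GM/l^2$ and $b$ as $GM/c^2$, the two coefficients of Eq. (4.3). With the Newtonian value $l^2 = GM\,A(1 - e^2)$, where $A$ is the semi-major axis, this gives $a = 1/(A(1 - e^2))$. That reading is our inference from the numbers, because the passage that defines $a$ and $b$ is not reproduced here. The constants below are standard values that we assume. They give $b$ a few tenths of a meter below the 1476.9 m quoted.

```python
# Assumed reference values (standard published figures, not taken from the text)
GM_SUN = 1.32712440018e20    # solar gravitational parameter, m^3/s^2
C_LIGHT = 299792458.0        # speed of light, m/s
SEMI_MAJOR = 5.7909e10       # Mercury's semi-major axis, m
ECC = 0.20563                # Mercury's orbital eccentricity

# Coefficients of Eq. (4.3), u'' + u = coef_a + coef_b * (u^2 + u'^2),
# using the Newtonian value l^2 = GM * SEMI_MAJOR * (1 - ECC^2)
coef_a = 1.0 / (SEMI_MAJOR * (1.0 - ECC ** 2))   # = GM / l^2, in 1/m
coef_b = GM_SUN / C_LIGHT ** 2                   # = GM / c^2, in m
product = coef_a * coef_b                        # dimensionless

print(f"a  = {coef_a:.5e} 1/m")
print(f"b  = {coef_b:.1f} m")
print(f"ab = {product:.5e}")
```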
10 The Greeks, especially Eudoxus and Archimedes, were able to carry out some infinitesimal
reasoning, in a rather ponderous way. By considering all possible integer multiples of two quantities
of the same type, they were able to say what a ratio was with infinite precision. Actually, that
last statement is not quite accurate. They were able to give meaning to the statement that the
vaguely defined, intuitive quantity they called a ratio was equal to another such ratio, with infinite
precision. That is, they could talk meaningfully about exact proportion. Assuming a trichotomy
law for any two ratios, they were then able to reason by reductio ad absurdum using what we
would call rational approximations to the ratios to prove that two ratios were equal to each
other in that infinitely precise sense. Newton, one of the pioneers of calculus, used infinitesimal
reasoning, saying explicitly that he was doing so to avoid the tedium of having to write everything
out according to the method of the ancients.
11 The choice of c to denote curvature looks natural in English, although it actually is a bit
odd, since Hertz wrote in German, where the word is Krümmung. We shall repay the linguistic
compliment by using the letter κ for curvature, since we are already using the letter c for the
speed of light.
By the late nineteenth century, Henri Poincaré was writing that geometry and
physics were to some degree interchangeable at their interface. If one wanted space
to be Euclidean, that could be arranged by choosing a suitable list of forces to
explain motion on the basis of Newton’s second law of motion. But Poincaré also
realized that the opposite was possible. The notion of force can be eliminated en-
tirely if a suitable non-Euclidean geometry is assumed. The trajectory of a particle
can then be described as a geodesic. Such a principle (Fermat’s principle) works
well in explaining the refraction of light, for example, where the concept of force
does not enter. Mathematically, such an equivalence is to be expected, since the
Euler equations for extremals in the calculus of variations are second-order
differential equations, just like the dynamic equations used to formulate the laws
of mechanics.
The Euler equations are explained in Appendix 2 of Volume 2. Their appli-
cation in geometry will be explained piecewise as we proceed. The best way to
begin is to note that on the infinitesimal level, the Pythagorean theorem allows us
to rectify a curve in $\mathbb{R}^3$ by integrating its element of arc length $ds$, given by the
equation
$$ds^2 = dx^2 + dy^2 + dz^2,$$
so that the length of a curve is
$$\int \sqrt{\left(\frac{dx}{ds}\right)^2 + \left(\frac{dy}{ds}\right)^2 + \left(\frac{dz}{ds}\right)^2}\,ds,$$
and volumes are given similarly on the infinitesimal level as
$$dV = dx\,dy\,dz.$$
When we change coordinates, for example, to use spherical co-latitude (ϕ) and
longitude (θ) along with the radius (ρ), these formulas become
$$ds^2 = d\rho^2 + \rho^2\,d\phi^2 + \rho^2\sin^2\phi\,d\theta^2,$$
$$dV = \rho^2\sin\phi\,d\rho\,d\phi\,d\theta.$$
These formulas are typical of a general rule that allows us to define distance and
n-dimensional volume on a manifold12 of dimension n parameterized by coordinates
(x1 , . . . , xn ). To do so, we use the matrix M that we called the matrix of metric
coefficients in the preceding chapter. (Einstein called it the fundamental tensor
and denoted it g. This is the last time we can use the letter M for it, since we are
henceforth reserving that letter for the mass of the Sun.)
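$$M = \begin{pmatrix} g_{11} & \cdots & g_{1n} \\ \vdots & \ddots & \vdots \\ g_{n1} & \cdots & g_{nn} \end{pmatrix}.$$
We then define the elements of arc length and volume as
$$ds^2 = \sum_{i=1}^{n}\sum_{j=1}^{n} g_{ij}\,dx_i\,dx_j,$$
$$dV = \sqrt{\det(M)}\,dx_1\cdots dx_n.$$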
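On ordinary Euclidean space, the metric coefficients can be computed from the coordinate map itself. The rule is $g_{ij} = \sum_k (\partial x_k/\partial q_i)(\partial x_k/\partial q_j)$, where the $x_k$ are Cartesian coordinates and the $q_i$ are the new coordinates. This formula is standard, but the text does not state it. The Python sketch below applies it at one sample point to recover the spherical formulas given earlier. The volume factor $\rho^2\sin\phi$ appears as the square root of $\det(M)$. The sample point and the function names are illustrative.

```python
import math


def spherical_jacobian(rho, phi, theta):
    """Rows: partial derivatives of (x, y, z) with respect to rho, phi, theta, where
    x = rho sin(phi) cos(theta), y = rho sin(phi) sin(theta), z = rho cos(phi)."""
    sp, cp = math.sin(phi), math.cos(phi)
    st, ct = math.sin(theta), math.cos(theta)
    return [
        [sp * ct, sp * st, cp],                     # d/d rho
        [rho * cp * ct, rho * cp * st, -rho * sp],  # d/d phi (co-latitude)
        [-rho * sp * st, rho * sp * ct, 0.0],       # d/d theta (longitude)
    ]


def metric_coefficients(jac):
    """g_ij = sum_k (dx_k/dq_i)(dx_k/dq_j): the matrix M of metric coefficients."""
    n = len(jac)
    return [[sum(jac[i][k] * jac[j][k] for k in range(3)) for j in range(n)]
            for i in range(n)]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


rho, phi, theta = 2.0, 0.7, 1.3
g = metric_coefficients(spherical_jacobian(rho, phi, theta))
for row in g:
    print([f"{v:.6f}" for v in row])   # diag(1, rho^2, rho^2 sin^2 phi) up to rounding
print(math.sqrt(det3(g)), rho ** 2 * math.sin(phi))  # volume factor computed two ways
```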
Remark 4.2. In setting Newton’s law of gravity as our goal and making it the
equation of a geodesic, we are assuming the principle of equivalence mentioned in
Remark 4.1 above. That is, the geodesic traversed by a particle will be independent
of its mass, determined only by the metric in space. That principle will thus be
part of our geometrized celestial mechanics. The same will be true when we reach
our goal of giving the relativistic law of gravity. Thus the principle of equivalence
will be a testable part of general relativity.
13 Less traveled because Einstein built a superhighway to bypass it. But not untraveled; see,
The reason for modifying only the coefficient of the radial coordinate is that
gravitational force is a central force and acts along radial lines. Although our quest
will fail, it will nevertheless achieve a considerable amount, stopping just short of
the result that we want.
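The geodesics in the unperturbed metric are easily computed, being the mappings
$t \mapsto (x(t), y(t), z(t))$ that minimize the integral
$$\int dt = \int \frac{1}{v_0}\left[\left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2 + \left(\frac{dz}{dt}\right)^2\right]^{1/2} dt.$$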
Since t, x, y, and z do not appear in the integrand, Euler’s equations say that
there are constants $c_x$, $c_y$, $c_z$, $x_0$, $y_0$, and $z_0$ such that
$$x(t) = x_0 + c_x t, \qquad y(t) = y_0 + c_y t, \qquad z(t) = z_0 + c_z t.$$
In other words, the geodesics are straight lines in the Cartesian sense, traversed
at a constant speed, as we already knew.
Let us now switch to spherical coordinates, in which the unperturbed metric
has an infinitesimal time increment given by
$$dt^2 = \frac{1}{v_0^2}\left(d\rho^2 + \rho^2\,d\phi^2 + \rho^2\sin^2\phi\,d\theta^2\right).$$
We are going to modify this metric by perturbing the coefficient of ρ only,
since gravitational force has no tangential component. How shall this be done?
We expect the influence of gravity to wane with distance, so that the perturbation
should disappear at infinity. Thus, the function g(ρ) that we assumed above ought
to tend to 1 as ρ → ∞. It is simpler to deal with the logarithm of g(ρ), which will
tend to 0 as ρ → ∞. Accordingly, we assume that the perturbed metric has the
form
$$dt^2 = \frac{1}{v_0^2}\left(e^{\lambda(\rho/\rho_s)}\,d\rho^2 + \rho^2\,d\phi^2 + \rho^2\sin^2\phi\,d\theta^2\right), \tag{4.4}$$
where λ(ρ/ρs ) is written as a function of the ratio of ρ to a fixed distance ρs ,
since mathematical functions generally accept only pure numbers as input. The
fixed distance ρs can be any unit. We are going to choose it so as to get the
closest reasonable approximation to Newton’s equations. The value we choose will
be called, for reasons that will become clearer as we proceed, the “Newtonian
Schwarzschild radius.” (The relativistic version of it, corresponding to v0 = c, is
an extremely important quantity.) If v0 , the speed of information, were infinite,14
we would be forced to set dt = 0 and replace the indeterminate form v0 dt = ∞ · 0
by a finite spatial metric $ds$. In that case, we would have $\rho_s = 0$, $\rho/\rho_s = \infty$,
and $e^{\lambda(\rho/\rho_s)} \equiv 1$ for all ρ; the square-distance metric would then be the usual
Euclidean metric. (As we have already remarked, there is a sense in which v0 = ∞
in Newtonian mechanics.) But we are going to depart from that principle and
assume temporarily that v0 is finite, since we don’t want dt = 0. Our object is
to choose the function λ so as to get Newton’s equations of motion out of the
Euler equations for a geodesic. Before beginning this project, we note that two
preliminary results from the Newtonian theory already follow from the equations for a
geodesic, independently of the choice of λ.

14 Imagine the “subspace transmissions” that science-fiction writers sometimes invoke to solve
First, the Euler equation on the co-latitude angle ϕ is
$$\frac{d}{dt}\left(2\rho^2\phi'\right) = 2\rho^2\sin\phi\cos\phi\,(\theta')^2.$$
This equation has a unique solution given initial values of the co-latitude $\phi(t_0)$
and its derivative $\phi'(t_0)$. Without stretching our imaginations too much, we can
suppose that there is some time $t_0$ at which $\phi(t)$ achieves a local minimum or
maximum value $\phi_0$.15 If the axes are rotated suitably, we can arrange that
$\phi_0 = \pi/2$. Since the point is a local extremum, we have $\phi'(t_0) = 0$. Now the equation just
written is certainly satisfied by the constant function ϕ(t) = π/2, and this solution
also satisfies the initial conditions. Therefore it is the only solution. Thus, ϕ drops
out of the discussion, and so we have achieved the reduction to a planar problem.
This is the same reduction we got by Newtonian reasoning from the vector relation
$$\frac{d}{dt}\left(\mathbf{r}\times\mathbf{r}'\right) = \mathbf{0}.$$
Being now in the plane, our spherical coordinates ρ and θ are the same as polar
coordinates r and θ. Accordingly, we change ρ to r and ρs to rs . Our metric then
becomes
$$dt^2 = \frac{1}{v_0^2}\left(e^{\lambda(r/r_s)}\,dr^2 + r^2\,d\theta^2\right).$$
Our first preliminary result is now established.
The second preliminary result is Kepler’s second law (conservation of angular
momentum). It follows from Euler’s equation on the variable θ, which says
$$\frac{d}{dt}\left(2r^2\theta'\right) = 0.$$
As a consequence, the angular momentum per unit mass, given by $l = r^2\theta'$, is
constant. Written out in full, Euler's equation says
$$4r\,\frac{dr}{dt}\,\frac{d\theta}{dt} + 2r^2\,\frac{d^2\theta}{dt^2} = 0.$$
This equation is trivially the same as the second of Newton’s two equations of
motion, which, we recall from our proof of Kepler’s first law, have the following
form in polar coordinates:
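$$\frac{d^2r}{dt^2} = -\frac{GM}{r^2} + r\left(\frac{d\theta}{dt}\right)^2, \tag{4.5}$$
$$\frac{d^2\theta}{dt^2} = -\frac{2}{r}\,\frac{dr}{dt}\,\frac{d\theta}{dt}. \tag{4.6}$$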
Again, this result is independent of any choice of λ.
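Equation (4.6) is just the statement that $l = r^2\theta'$ is constant, and this can be watched numerically. The Python sketch below integrates (4.5) and (4.6) with a fourth-order Runge-Kutta step, in units with $GM = 1$. The initial state is our own illustrative choice: $r = 1$, $dr/dt = 0$, $d\theta/dt = 1.2$. The code tracks how far $r^2\theta'$ drifts from its starting value. It also records the largest $r$ reached, which the Newtonian energy and angular-momentum integrals predict to be $0.72/0.28 \approx 2.571$ for these initial values.

```python
def newton_polar(state, gm=1.0):
    """Right-hand side of Eqs. (4.5)-(4.6) for state = [r, dr/dt, theta, dtheta/dt]."""
    r, dr, _theta, dth = state
    return [dr, -gm / r ** 2 + r * dth ** 2, dth, -2.0 * dr * dth / r]


def rk4(f, state, h):
    """One classical fourth-order Runge-Kutta step for a list-valued state."""
    k1 = f(state)
    k2 = f([s + 0.5 * h * k for s, k in zip(state, k1)])
    k3 = f([s + 0.5 * h * k for s, k in zip(state, k2)])
    k4 = f([s + h * k for s, k in zip(state, k3)])
    return [s + h * (a + 2 * b + 2 * c + d) / 6
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]


state = [1.0, 0.0, 0.0, 1.2]          # units with GM = 1; starts at perihelion
l0 = state[0] ** 2 * state[3]         # l = r^2 dtheta/dt at t = 0
max_drift = 0.0
r_max = state[0]
for _ in range(20000):                # t from 0 to 20 (more than one orbit), step 1e-3
    state = rk4(newton_polar, state, 1e-3)
    max_drift = max(max_drift, abs(state[0] ** 2 * state[3] - l0))
    r_max = max(r_max, state[0])
print(f"l = {l0}, largest deviation of r^2 theta' = {max_drift:.2e}")
print(f"aphelion distance reached: {r_max:.4f}")
```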
It now looks as if we have already come very near to the goal of geometrizing
the motion of a planet, and we have not yet introduced any perturbation into the
metric. We just need to choose λ so as to get the first of Newton’s two equations,
15 If not, by the intermediate-value theorem for derivatives, the co-latitude must be forever
increasing or forever decreasing. But such a phenomenon has never been observed in any planet.
and for this purpose we have two tools. One of them is Euler’s equation on the
variable r, which says
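$$\frac{d}{dt}\left(2e^{\lambda(r/r_s)}\frac{dr}{dt}\right) = \frac{1}{r_s}\,\lambda'\!\left(\frac{r}{r_s}\right)e^{\lambda(r/r_s)}\left(\frac{dr}{dt}\right)^2 + 2r\left(\frac{d\theta}{dt}\right)^2,$$
$$2e^{\lambda(r/r_s)}\,\frac{d^2r}{dt^2} = -\frac{1}{r_s}\,\lambda'\!\left(\frac{r}{r_s}\right)e^{\lambda(r/r_s)}\left(\frac{dr}{dt}\right)^2 + 2r\left(\frac{d\theta}{dt}\right)^2,$$
$$\frac{d^2r}{dt^2} = -\frac{1}{2r_s}\,\lambda'\!\left(\frac{r}{r_s}\right)\left(\frac{dr}{dt}\right)^2 + re^{-\lambda(r/r_s)}\left(\frac{d\theta}{dt}\right)^2.$$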
The other tool is the result of dividing the equation for the metric by $dt^2$. This
equation can be solved to show that
$$\left(\frac{dr}{dt}\right)^2 = e^{-\lambda(r/r_s)}\left(v_0^2 - r^2\left(\frac{d\theta}{dt}\right)^2\right).$$
(This equation is really just an integral of the preceding one.)
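When we insert this value of $(dr/dt)^2$ into the previous equation, we have the
second-order differential equation
$$\frac{d^2r}{dt^2} = -\frac{v_0^2}{2r_s}\,\lambda'\!\left(\frac{r}{r_s}\right)e^{-\lambda(r/r_s)} + re^{-\lambda(r/r_s)}\left(1 + \frac{r}{2r_s}\,\lambda'\!\left(\frac{r}{r_s}\right)\right)\left(\frac{d\theta}{dt}\right)^2.$$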
We need this last equation to be Newton’s other equation of motion, that is,
Eq. (4.5). Hence the right-hand side must be
$$-\frac{GM}{r^2} + r\left(\frac{d\theta}{dt}\right)^2.$$
The first term in the equation for $d^2r/dt^2$ is
$$\frac{v_0^2}{2}\,\frac{d}{dr}\,e^{-\lambda(r/r_s)}.$$
It appears then that we need to choose λ so that
$$\frac{d}{dr}\,e^{-\lambda(r/r_s)} = -\frac{2GM}{v_0^2 r^2}.$$
By integrating, we see that we need
$$e^{-\lambda(r/r_s)} = C + \frac{2GM}{rv_0^2} = C + \frac{2GM}{r_s v_0^2}\cdot\frac{r_s}{r}.$$
Since the left-hand side of this last equation tends to 1 as r → ∞, we see that
C = 1. We have not yet chosen the Newtonian Schwarzschild radius rs . We now
do so, setting
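$$r_s = \frac{2GM}{v_0^2}.$$
Then the perturbation factor $e^{\lambda(r/r_s)}$ is $r/(r + r_s)$. Thus, we have
$$\lambda\!\left(\frac{r}{r_s}\right) = \ln\frac{r}{r + r_s},$$
and
$$\lambda'\!\left(\frac{r}{r_s}\right) = \frac{r_s}{r} - \frac{r_s}{r + r_s} = \frac{r_s^2}{r(r + r_s)}.$$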
We now have one of the two terms we were seeking, and the function λ is
completely determined.
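This choice of λ can be checked numerically, and the check also shows what happens to the second term. The Python sketch below works in units chosen only for convenience: $GM = v_0 = 1$, so $r_s = 2$, which is an assumption of this sketch. It evaluates both terms of the equation for $d^2r/dt^2$ obtained above and approximates $\lambda'$ by a central difference rather than relying on the closed form. The first term matches $-GM/r^2$. The coefficient of $(d\theta/dt)^2$, however, comes out as $r + \tfrac{3}{2}r_s$ rather than the Newtonian $r$. That closed form is our own simplification of $re^{-\lambda}\bigl(1 + (r/(2r_s))\lambda'\bigr)$, not a formula quoted from the text.

```python
import math

GM, V0 = 1.0, 1.0              # illustrative units (assumption)
RS = 2.0 * GM / V0 ** 2        # Newtonian Schwarzschild radius r_s


def lam(x):
    """lambda(x) = ln(x / (1 + x)), where x = r / r_s."""
    return math.log(x / (1.0 + x))


def lam_prime(x, h=1e-6):
    """Central-difference derivative of lambda with respect to its argument x."""
    return (lam(x + h) - lam(x - h)) / (2.0 * h)


def radial_terms(r):
    """Return (first term, coefficient of (dtheta/dt)^2) in the equation for d^2r/dt^2."""
    x = r / RS
    e_minus_lam = math.exp(-lam(x))
    first = -V0 ** 2 / (2.0 * RS) * lam_prime(x) * e_minus_lam
    coeff = r * e_minus_lam * (1.0 + r / (2.0 * RS) * lam_prime(x))
    return first, coeff


for r in (3.0, 10.0, 100.0):
    first, coeff = radial_terms(r)
    print(f"r={r:6.1f}  first={first:.8f}  -GM/r^2={-GM / r ** 2:.8f}  "
          f"coeff={coeff:.6f}  r+1.5*r_s={r + 1.5 * RS:.6f}")
```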
4.3. A possible new approach to gravity? Despite this failure, our effort
to geometrize the physics of planetary orbits as geodesics in a non-Euclidean metric
has yielded one positive result toward such a geometrization, namely conservation
of angular momentum. And it comes very close to predicting the actual orbit of a
planet. If rs is very small compared with the average distance from the planet to
the Sun (and it assuredly is if v0 is the speed of light, as we shall see), that term
can be neglected, and the geodesic equations will coincide with Newton’s laws.
Perhaps we were not sufficiently imaginative when we made it our goal to get
the equations Newton left us. This new, geometric point of view has brought to our
attention a perturbation of them that may well yield the same predictions within
the limits of physical measurement. Dare we hope that it might even do better and
explain phenomena for which Newton’s law of gravity has not proved adequate?
Let us explore this possibility. We begin by stating the result we have found as a
formal theorem:
16 This fact makes the geometry of the perturbed metric difficult to visualize. If the ratio
$t_c/t_r$ were less than 2π, we could imagine the circles being circles of constant latitude on a sphere,
with the radius of a circle being the latitude in question. But how does extra length get “crammed
into” a circle without increasing its radius?
between geometry and energy, in the form of the following relation, which amounts
to a direct proportion between two energies and two distances:
$$\rho_s T + \rho V = 0. \tag{4.10}$$
Equation (4.10), which says $\rho = -\rho_s T/V$, is the best clue we will have when we
geometrize gravity in Chapter 7. The most difficult challenge we will face is that
of adjusting our mathematical formulation of physical intuition so as to replace the
Newtonian relation $\mathbf{F} = m\mathbf{r}''$ with the differential equations of a geodesic. The
connecting link turns out to be the fact that the Newtonian potential energy is in-
versely proportional to ρ, which is the variable that occurs in the metric coefficients
of the manifold that represents space-time in the presence of an attracting particle.
The metric coefficients that we shall encounter will be recognized as analogs of
potential energy, and the equations of motion will be expressed in terms of them
and their partial derivatives.
4.4. Geometrized astronomy. Let us now explore this new approach and
see how well it explains the things we already know. Our fundamental relation is
the metric relation, which, for a particle moving in a plane, becomes, after being
divided by $d\theta^2$,
$$\frac{r}{r + r_s}\left(\frac{dr}{d\theta}\right)^2 = v_0^2\left(\frac{dt}{d\theta}\right)^2 - r^2 = \frac{v_0^2 r^4}{l^2} - r^2.$$
Here $l$ is the constant angular momentum per unit mass, depending on the
particular orbit the particle is traversing. We rewrite this relation as
$$\left(\frac{dr}{d\theta}\right)^2 = r(r + r_s)\left(\left(\frac{v_0 r}{l}\right)^2 - 1\right).$$
Since $r > 0$ and $r_s > 0$, we see that the equation for $(dr/d\theta)^2$ imposes a lower
limit on $r$, namely $r \ge l/v_0$. Circular orbits where $r$ has this lower limit as a
constant value do satisfy the geodesic equation. But this same expression gives
us the very bad news that elliptic orbits are impossible! In an elliptic orbit, we
must have dr/dθ = 0 at two different positive values of r, namely perihelion and
aphelion. That is impossible, given that v0 and l are constant for a given orbit.
That is a disastrous failure of our project,17 since it means we cannot get Kepler’s
first law.
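The sign pattern behind this conclusion is easy to tabulate. For $r > 0$ the factor $r(r + r_s)$ is positive, so the sign of $(dr/d\theta)^2$ is the sign of $(v_0 r/l)^2 - 1$. That factor is negative below $r = l/v_0$, vanishes exactly there and is positive beyond. There is therefore only one positive turning point. The Python sketch below shows this with illustrative values of $l$, $v_0$ and $r_s$, which are our assumptions.

```python
def dr_dtheta_squared(r, l, v0, rs):
    """(dr/dtheta)^2 = r (r + r_s) ((v0 r / l)^2 - 1) in the perturbed metric."""
    return r * (r + rs) * ((v0 * r / l) ** 2 - 1.0)


l, v0, rs = 2.0, 1.0, 0.5        # illustrative values (assumption)
r_min = l / v0                   # the only positive zero of the expression
for factor in (0.5, 0.9, 1.0, 1.5, 3.0):
    r = factor * r_min
    print(f"r = {r:4.1f}   (dr/dtheta)^2 = {dr_dtheta_squared(r, l, v0, rs):+9.4f}")
```

A negative value is impossible for a square, so radii below $l/v_0$ are forbidden. With only one zero available, an orbit cannot have both a perihelion and an aphelion.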
Despite this setback, there is insight of importance to our project that can be
gained by pursuing the analysis a little longer. As in the analysis we gave based on
Newton’s laws, we find that this equation looks better if we replace r by u = 1/r,
then differentiate with respect to θ and cancel 2 du/dθ. The result of doing so is
the equation
$$u'' + u = \frac{v_0^2 r_s}{2l^2} - \frac{3}{2}r_s u^2 = \frac{GM}{l^2} - \frac{3}{2}r_s u^2.$$
This equation is Newton's equation derived above, only with the perturbation
term $-(3/2)r_s u^2$. Of course, when we do the usual trick of replacing $u''$ by $p\,dp/du$,
where $p = du/d\theta$, and integrating from a value of $u$ where $p = 0$ (in this case the
value $u = u_1 = v_0/l$), we get the previous equation back, and so we apparently cannot
take much comfort from the fact that this equation is a small perturbation of
the Newtonian equation.

17 In case the reader has forgotten, our new project was begun after the failure of our attempt to get Newton's equations of motion out of geometry, and represented an attempt to study astronomy using the perturbed Newtonian equations that we were able to derive from geometry.
Nevertheless, it will be useful to consider this equation in the abstract, since
it will appear again in the relativistic discussion. We state the main results as a
formal theorem.
Theorem 4.3. Consider the second-order ordinary differential equation
$$u''(\theta) + u(\theta) = \alpha + \beta\,u(\theta)^2,$$
where α > 0, β is an arbitrary real number (positive, negative, or zero), and u(θ)
is required to assume only positive values. Let $u_1 = u(\theta_1) > 0$ be a local maximum
value of a solution u(θ). Then either u(θ) decreases for all $\theta > \theta_1$, or it assumes
a positive minimum value u2 = u(θ2 ) and thereafter remains in the range u2 ≤
u(θ) ≤ u1 for all θ > θ1 .
If u1 is a local minimum value of u and β < 0, then u(θ) increases toward a
finite value u2 > u1 that it cannot exceed. If it reaches that value, it will thereafter
remain in the interval u1 ≤ u(θ) ≤ u2 . If β > 0, there may or may not be such a
value u2 . If there is not, u(θ) will increase without bound.
Proof. Since we are going to be taking u to be the reciprocal of the radius
vector from the Sun to an orbiting object, only positive values of u can be con-
sidered. From basic geometry, the orbit necessarily has a perihelion, where the
reciprocal of the radius vector is maximized at value u1 . We then look for places
where du/dθ = 0 and u < u1 . (It would be of no use to find such values with
u > u1 , since those values are inaccessible from the local maximum u1 .)
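We now set $u'' = p\,(dp/du)$, where $p = du/d\theta$, as we have done twice before.
Since $p = 0$ when $u = u_1$, we have
$$\begin{aligned}
\frac{1}{2}p^2 &= \int_{u_1}^{u}(\beta u^2 - u + \alpha)\,du = \frac{\beta}{3}(u^3 - u_1^3) - \frac{1}{2}(u^2 - u_1^2) + \alpha(u - u_1)\\
&= (u - u_1)\left[\frac{\beta}{3}(u^2 + u_1 u + u_1^2) - \frac{1}{2}u - \frac{1}{2}u_1 + \alpha\right]\\
&= (u - u_1)\left[\frac{\beta}{3}u^2 + \left(\frac{\beta}{3}u_1 - \frac{1}{2}\right)u + \frac{\beta}{3}u_1^2 - \frac{1}{2}u_1 + \alpha\right].
\end{aligned}$$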
We need to know if the quadratic factor here has a positive root u2 that is less
than $u_1$. To that end, let $u = w + u_1$. We rewrite the quadratic factor as
$$\frac{\beta}{3}(w + u_1)^2 + \left(\frac{\beta}{3}u_1 - \frac{1}{2}\right)(w + u_1) + \frac{\beta}{3}u_1^2 - \frac{1}{2}u_1 + \alpha = \frac{\beta}{3}w^2 + \left(\beta u_1 - \frac{1}{2}\right)w + D,$$
where $D = \beta u_1^2 - u_1 + \alpha = u''(\theta_1)$. Since $u(\theta_1)$ is a local maximum of $u$, we see
that $D < 0$. The product of the roots of this quadratic expression in $w$ is $3D/\beta$.
If β > 0, that product is negative, and so there is exactly one negative root
w2 = u2 − u1 , where u2 < u1 , and one positive root u3 − u1 , where u3 > u1 . In this
case, since the leading coefficient (β/3) of this quadratic polynomial is positive, the
polynomial is negative between the roots, which means it is negative at least for
w2 ≤ w ≤ 0, that is, u2 ≤ u ≤ u1 . Multiplying it by u − u1 makes it positive again
in this range, and hence makes it a possible value of $p^2/2$.
Since u must begin to decrease (as a function of θ) after passing through the
local maximum, it will remain in a range where p can have real values (that is,
$p^2 > 0$) unless it reaches the local minimum value $u_2 < u_1$. The only possible
difficulty in that case would occur if u2 were negative. That might well happen,
and in that case, we would be describing a nonclosed trajectory, since u cannot
cross 0. It might not get arbitrarily close to zero; at least conceivably, it might
approach a positive lower limit but never reach it.18
If β < 0, the product of the roots of the quadratic function in w is positive,
and so we have either no roots with w < 0 or two such roots. In the former case,
u1 is the smallest (and perhaps only) real root of the original cubic polynomial.
Since its leading coefficient is negative, this polynomial is negative for u1 < u < u2 ,
where u2 is the next root of the cubic, if any, and equal to infinity if there are no
other roots. Either way, the polynomial cannot yield possible values of $p^2/2$ in this
range. Thus u(θ) ≤ u1 for all θ. Since the solution has attained the value u1 at θ1 ,
it must be always less than or equal to u1 . Unless it is constant, it cannot reach
any minimum value, since p would vanish at such a value. It must therefore be
monotonically decreasing for θ > θ1 .
If there are two such roots with w < 0, then u1 is the largest of the three roots
of the cubic, say u1 > u2 > u3 . Since the leading coefficient of the cubic polynomial
is negative, the polynomial is positive for u2 < u < u1 . It follows that if u is in
this range, the polynomial provides possible real values for p, and u1 can be a local
maximum. As in the case β > 0, the function will decrease for θ > θ1 until it
either reaches the value u2 or approaches 0. Again, the latter case corresponds to
a nonclosed trajectory.
The situation when u1 is a local minimum of u(θ) is handled similarly and is
left to the reader. (See Problem 4.19.)
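The algebra in this proof can be spot-checked at random values. The three steps are the antiderivative giving $p^2/2$, its factorization as $(u - u_1)$ times a quadratic, and the rewriting of that quadratic in $w = u - u_1$ with constant term $D$. The Python sketch below evaluates all three forms at random points and reports the largest discrepancy. The sampling ranges and the fixed random seed are arbitrary illustrative choices.

```python
import random


def half_p_squared(u, u1, alpha, beta):
    """Integral from u1 to u of (beta*s^2 - s + alpha) ds, i.e. p^2/2 in the proof."""
    return beta / 3 * (u ** 3 - u1 ** 3) - 0.5 * (u ** 2 - u1 ** 2) + alpha * (u - u1)


def quadratic_factor(u, u1, alpha, beta):
    """The quadratic that multiplies (u - u1) in the factored form."""
    return beta / 3 * u ** 2 + (beta * u1 / 3 - 0.5) * u + beta * u1 ** 2 / 3 - 0.5 * u1 + alpha


def shifted_factor(w, u1, alpha, beta):
    """The same quadratic in w = u - u1, with D = beta*u1^2 - u1 + alpha."""
    D = beta * u1 ** 2 - u1 + alpha
    return beta / 3 * w ** 2 + (beta * u1 - 0.5) * w + D


rng = random.Random(1)
worst = 0.0
for _ in range(1000):
    u, u1 = rng.uniform(0.0, 5.0), rng.uniform(0.0, 5.0)
    alpha, beta = rng.uniform(0.1, 3.0), rng.uniform(-2.0, 2.0)
    q = quadratic_factor(u, u1, alpha, beta)
    worst = max(worst,
                abs(half_p_squared(u, u1, alpha, beta) - (u - u1) * q),
                abs(q - shifted_factor(u - u1, u1, alpha, beta)))
print(f"largest discrepancy over 1000 random cases: {worst:.1e}")
```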
In the particular model we have been developing, we have been unlucky. We
have $u_1 = v_0/l$, $\alpha = v_0^2 r_s/(2l^2)$, and $\beta = -3r_s/2$, so that
$\beta u_1^2/3 - u_1/2 + \alpha = -v_0/(2l)$, which is negative. It follows that there is no zero $u_2$ of $du/d\theta$ between
0 and u1 , as we already knew from the explicit expression we had for du/dθ. The
project was sound, but nature did not cooperate. Facts are stubborn things, and
we must reconcile ourselves to this outcome. This is easier to do when we realize
that relativity enables us to carry out the idea to a most satisfying conclusion that
is an improvement on Newton.
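For the model's values, all three coefficients of the quadratic factor are negative when $\beta = -3r_s/2$. Its constant term is $-v_0/(2l)$, so the factor stays negative for $0 \le u \le u_1$. Because $u - u_1 < 0$ there, $p^2/2$ is positive throughout that range, and $du/d\theta$ never vanishes between 0 and $u_1$. The Python sketch below checks the constant term and samples the factor on that interval. The values of $v_0$, $l$ and $r_s$ are illustrative assumptions.

```python
def model_quadratic(u, v0, l, rs):
    """Quadratic factor from the proof of Theorem 4.3, with the model's values
    u1 = v0/l, alpha = v0^2 r_s / (2 l^2), beta = -3 r_s / 2."""
    u1 = v0 / l
    alpha = v0 ** 2 * rs / (2.0 * l ** 2)
    beta = -1.5 * rs
    return (beta / 3.0 * u ** 2 + (beta * u1 / 3.0 - 0.5) * u
            + beta * u1 ** 2 / 3.0 - 0.5 * u1 + alpha)


v0, l, rs = 1.0, 2.0, 0.3        # illustrative values (assumption)
u1 = v0 / l
print(model_quadratic(0.0, v0, l, rs), -v0 / (2.0 * l))   # constant term, two ways
print(max(model_quadratic(k * u1 / 100, v0, l, rs) for k in range(101)))  # stays < 0
```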
4.5. Noncircular orbits. Before leaving this topic, we shall wring the last
tiny details out of the project we just aborted, details that will also find an echo in
the relativistic solution. Let us introduce a new angle ϕ—not the co-latitude angle
we used previously—via the relation
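$$r = \frac{l}{v_0}\sec\phi.$$
We then have
$$\frac{l^2}{v_0^2}\sec^2\phi\,\tan^2\phi\left(\frac{d\phi}{d\theta}\right)^2 = \left(\frac{dr}{d\theta}\right)^2 = \frac{l^2}{v_0^2}\sec^2\phi\,\tan^2\phi\,\bigl(1 + (r_s v_0/l)\cos\phi\bigr),$$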
18 A little more can be said in this case. The solution cannot be a uniformly almost-periodic
function of time unless it is constant (that is, the corresponding orbit is a circle), since it can have
at most one local extremum. As Sternberg ([77], pp. 1–14) has pointed out, Ptolemy’s program
of explaining planetary motion by means of epicycles can be neatly interpreted—in a language
that would have been incomprehensible to Ptolemy—by saying that planetary orbits are uniformly
almost-periodic functions of time. (See Problem 4.18.)
so that
$$d\theta = \frac{d\phi}{\sqrt{1 + \mu\cos\phi}},$$
and
$$\theta = \int_0^{\theta} dt = \int_0^{\phi}\frac{dt}{\sqrt{1 + \mu\cos t}},$$
where $\mu = r_s v_0/l$ and we choose the positive sign on the square root, since we can
orient the angles θ and ϕ to suit ourselves. Typically, as we shall see, μ is very small.
In fact, for a circular orbit of radius r (which must have $r = l/v_0$), we see easily that $\mu = r_s/r$. The relation
between θ and ϕ just written is well known in the theory of elliptic functions,
expressed in the language of Mathematica by the mutually inverse relations19
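$$\theta = \frac{2}{\sqrt{1 + \mu}}\,\mathrm{EllipticF}\!\left(\frac{\phi}{2}, \frac{2\mu}{1 + \mu}\right),$$
$$\phi = 2\,\mathrm{JacobiAmplitude}\!\left(\frac{\theta\sqrt{1 + \mu}}{2}, \frac{2\mu}{1 + \mu}\right).$$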
The function EllipticF(ϕ, m) is defined by the relation
$$\mathrm{EllipticF}(\phi, m) = \int_0^{\phi}\frac{1}{\sqrt{1 - m\sin^2 t}}\,dt.$$
In our case, m = 2μ/(1+μ), so that μ = m/(2−m). The Maclaurin expansions
of these functions are
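$$\mathrm{EllipticF}(\phi, m) = \phi + \frac{m}{3!}\phi^3 + \frac{9m^2 - 4m}{5!}\phi^5 + \frac{225m^3 - 180m^2 + 16m}{7!}\phi^7 + \cdots,$$
$$\mathrm{JacobiAmplitude}(\theta, m) = \theta - \frac{m}{3!}\theta^3 + \frac{m^2 + 4m}{5!}\theta^5 - \frac{m^3 + 44m^2 + 16m}{7!}\theta^7 + \cdots.$$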
The integrand that defines the function EllipticF[x, m] is of period π, which
means that the difference EllipticF [x+π, m]−EllipticF [x, m] is a positive constant.
Thus, this function increases by a constant amount 2K[m] over each interval of
length π, where K[m] = EllipticF [π/2, m] is a constant known as the complete
elliptic integral of first kind. The graph of the function EllipticF [x, m] − x is shown
in Fig. 4.3 for m = 0.6. This is a rather large parameter value compared to what
we will actually have in our astronomical application, but for very small values of
m, the function increases too slowly to be noticeable over a short portion of the
graph. In the astronomical examples we shall be considering, m will be of the order
$10^{-7}$ for a typical planetary orbit when $v_0 = c$. Hence the equation θ = ϕ is very
nearly exact. The function EllipticF [x, m] − 2K[m]x/π has period π.
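Several of the facts in this subsection can be checked without a computer-algebra system. The Python sketch below uses a hand-written Simpson rule, our own stand-in for Mathematica's built-in functions, to evaluate EllipticF by quadrature. It then performs four checks:

- that $\int_0^\phi dt/\sqrt{1 + \mu\cos t}$ equals $(2/\sqrt{1 + \mu})\,\mathrm{EllipticF}(\phi/2, 2\mu/(1 + \mu))$;
- that the Maclaurin series through $\phi^7$ is accurate for small $\phi$;
- that EllipticF increases by $2K[m]$ over each interval of length π, with $K[m]$ computed independently by the standard arithmetic-geometric-mean formula $K(m) = \pi/(2\,\mathrm{AGM}(1, \sqrt{1 - m}))$;
- what value of m Mercury gives.

For Mercury the code follows the text's circular-orbit relation $\mu = r_s/r$, treating Mercury's mean distance as the radius. It uses assumed standard values for $GM$ and $c$. The values $\mu = 0.3$ and $m = 0.6$ are deliberately large illustrative choices.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    total += 4.0 * sum(f(a + (2 * k - 1) * h) for k in range(1, n // 2 + 1))
    total += 2.0 * sum(f(a + 2 * k * h) for k in range(1, n // 2))
    return total * h / 3.0


def elliptic_f(phi, m):
    """EllipticF(phi, m) = integral_0^phi dt / sqrt(1 - m sin^2 t), by quadrature."""
    return simpson(lambda t: 1.0 / math.sqrt(1.0 - m * math.sin(t) ** 2), 0.0, phi)


def complete_k(m):
    """K(m) = pi / (2 AGM(1, sqrt(1 - m))), via the arithmetic-geometric mean."""
    a, b = 1.0, math.sqrt(1.0 - m)
    while abs(a - b) > 1e-15 * a:
        a, b = 0.5 * (a + b), math.sqrt(a * b)
    return math.pi / (2.0 * a)


def theta_direct(phi, mu):
    """theta as the integral_0^phi dt / sqrt(1 + mu cos t)."""
    return simpson(lambda t: 1.0 / math.sqrt(1.0 + mu * math.cos(t)), 0.0, phi)


def theta_elliptic(phi, mu):
    """theta = (2 / sqrt(1 + mu)) EllipticF(phi/2, 2 mu / (1 + mu))."""
    return 2.0 / math.sqrt(1.0 + mu) * elliptic_f(phi / 2.0, 2.0 * mu / (1.0 + mu))


def elliptic_f_series(phi, m):
    """Maclaurin expansion of EllipticF(phi, m) through the phi^7 term."""
    return (phi + m * phi ** 3 / math.factorial(3)
            + (9 * m ** 2 - 4 * m) * phi ** 5 / math.factorial(5)
            + (225 * m ** 3 - 180 * m ** 2 + 16 * m) * phi ** 7 / math.factorial(7))


mu, m = 0.3, 0.6                       # deliberately large illustrative values
for phi in (0.5, 2.0, 5.0):
    print(f"phi={phi}: direct {theta_direct(phi, mu):.10f}, "
          f"via EllipticF {theta_elliptic(phi, mu):.10f}")
print(f"EllipticF(0.2, 0.6): quadrature {elliptic_f(0.2, m):.10f}, "
      f"series {elliptic_f_series(0.2, m):.10f}")
for x in (0.3, 1.7):                   # increase over a period pi versus 2K(m)
    print(elliptic_f(x + math.pi, m) - elliptic_f(x, m), 2.0 * complete_k(m))

# Mercury with v0 = c, using mu = r_s / r (assumed GM of the Sun and mean distance)
r_s = 2.0 * 1.32712440018e20 / 299792458.0 ** 2
mu_mercury = r_s / 5.7909e10
m_mercury = 2.0 * mu_mercury / (1.0 + mu_mercury)
print(f"Mercury: m = {m_mercury:.2e}")
```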
If we imagine that (r, ϕ) are polar coordinates that closely approximate the
standard coordinates (r, θ), we see that the new family of geodesics we have in-
troduced are simply the vertical lines x = r cos ϕ = l/v0 . This result is not very
impressive, although, when we replace ϕ by θ, these lines do bend, thereby forming
very shallow, roughly hyperbolic shapes, but with “wiggles.” Unfortunately, they also
bend in the wrong direction, as if the attracting particle were actually a
repelling particle.

[Figure 4.3: graph of EllipticF[x, m] − x for m = 0.6, with x running from −5 to 5.]

19 In the older language introduced by Jacobi, EllipticF(ϕ, m) was called the elliptic integral
of first kind with modulus $k = \sqrt{m}$. The function JacobiAmplitude(θ, m) was its inverse and
written simply am(θ).
Let us summarize what we have achieved.
• A perturbed version of Newton’s equations of motion of a particle in the
gravitational field due to a particle of mass M located at the origin can
be obtained from the Euler equations for a geodesic, provided the element
of time $dt$ in polar coordinates $(r, \theta)$ is required to satisfy
$$dt^2 = \frac{1}{v_0^2}\left(\frac{r}{r + r_s}\,dr^2 + r^2\,d\theta^2\right).$$
• The distance $r_s$ equals $2GM/v_0^2$ and is called the Newtonian Schwarzschild
radius. The ratio rs /r at a typical point in the orbit is the ratio of the
negative of the potential energy of the moving particle to its kinetic energy
at speed v0 . (We cannot simply incorporate the potential energy into our
equation, since a nonhomogeneous mathematical function can accept as
input only dimensionless variables. Thus we need a fixed energy as a
unit, one that can be specified in advance of any particular orbit. The
kinetic energy per unit mass at speed v0 seems like the only reasonable
candidate.)
• Besides the circular geodesic orbits predicted by both the standard Newto-
nian theory and the approach via geodesics, there are other geodesics that
are asymptotic to straight lines and can be expressed exactly in terms of a
new polar angle ϕ such that the standard polar angle θ can be expressed
in terms of ϕ as an elliptic integral of first kind:
$$\theta = \frac{2}{\sqrt{1 + \mu}}\,\mathrm{EllipticF}\!\left(\frac{\phi}{2}, \frac{2\mu}{1 + \mu}\right).$$
• The geodesic approach unfortunately rules out elliptical orbits and hence
must be abandoned as a serious attempt at mathematical astronomy. It
can be retained, however, as a useful “toy” geometry exhibiting some
features that will prove to be important in the relativistic model we are
now going to study.
Remark 4.3. The Newtonian Schwarzschild radius ρs can also be described as
the radius such that the escape velocity at distance ρs equals the standard unit of
speed v0 . In relativity, where v0 = c, it is the radius of a (nonrotating) black hole
of mass M , and it will reappear in exactly this form when we discuss relativistic
154 4. PRECESSION AND DEFLECTION
gravitation. Right now, we note that a material particle moving at speed v0 has
kinetic energy equal in magnitude to its potential energy at the distance ρs from the
fixed particle of mass M. Without any force being applied to it, it could escape to
infinity if it started outside that radius, but not if it started inside.
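As a quick numerical sanity check of the remark above, here is a minimal Python sketch. The constants are illustrative assumptions (G, an attracting mass of roughly one solar mass, and v0 = 3 × 10^8 m/s); the script checks that the escape velocity at ρs = 2GM/v0² equals v0, and that the ratio of |potential energy| to kinetic energy at speed v0, both per unit mass, equals ρs/r.

```python
import math

G = 6.674e-11  # gravitational constant, m^3/(kg s^2)
M = 1.989e30   # illustrative attracting mass (about one solar mass), kg
V0 = 3.0e8     # the unit of speed v0, m/s (taken equal to c here)

rho_s = 2 * G * M / V0**2  # Newtonian Schwarzschild radius


def escape_velocity(r):
    """Speed at which kinetic energy per unit mass equals G M / r."""
    return math.sqrt(2 * G * M / r)


def energy_ratio(r):
    """|potential energy| / kinetic energy at speed V0, both per unit mass."""
    potential = -G * M / r
    kinetic = 0.5 * V0**2
    return -potential / kinetic


print(escape_velocity(rho_s) / V0)       # 1, up to rounding
print(energy_ratio(10 * rho_s), 1 / 10)  # equal, up to rounding
```

Nothing here depends on the particular mass: both identities hold for any M, since they are algebraic consequences of ρs = 2GM/v0².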
The Schwarzschild radius of the Sun (with v0 = c) is about 3 km, as will be
seen below. In general, this radius tends to be very small compared with other
quantities involved in astronomy. If the Sun were replaced by a body of this radius,
when seen from Earth it would subtend an angle of less than 1/120th of an arc-
second, far too small to be observed, even by the Hubble telescope, whose resolving
power is at best 1/20th of an arc-second.
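The figures quoted in the last paragraph can be reproduced with a few lines of Python. This sketch assumes standard rounded values for G, the Sun's mass, c = 3 × 10^8 m/s, and the mean Earth-Sun distance (1.496 × 10^11 m), and uses the small-angle approximation (angular diameter ≈ 2r_s divided by the distance).

```python
import math

G = 6.674e-11       # m^3/(kg s^2)
M_SUN = 1.989e30    # kg
C = 3.0e8           # m/s, the rounded value used in the text
AU = 1.496e11       # mean Earth-Sun distance, m
ARCSEC_PER_RAD = 180 * 3600 / math.pi

r_s = 2 * G * M_SUN / C**2             # Schwarzschild radius of the Sun, m
angle = 2 * r_s / AU * ARCSEC_PER_RAD  # angular diameter seen from Earth, arcsec

print(f"r_s = {r_s / 1000:.2f} km")
print(f"angle = {angle:.5f} arcsec, versus 1/120 = {1 / 120:.5f} and 1/20 = {1 / 20:.2f}")
```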
On several occasions, we have indulged in mathematical whimsy, investigating
a relativistic version of Ptolemaic astronomy in Chapter 1 and two unfruitful ways
of analyzing planetary orbits in the present section and the one preceding. We shall
now indulge in an even more fanciful misapplication of physical laws, extrapolating
them ludicrously beyond the range of measurements for which they are valid. We do
so to make the simple point that the Schwarzschild radius is a very tiny dimension!
The difference in the order of magnitude of the Schwarzschild radius of an object
and the dimensions of that object is seen in stark relief on the subatomic level.
Obviously, we do get information about the structure of atomic nuclei; an
atomic nucleus is not a black hole. It follows that the radius of an atomic nucleus,
tiny though it be, has to be much larger than its Schwarzschild radius. In fact,
the formula for a nuclear radius is $r_n = 1.25 \times 10^{-15}\sqrt[3]{n}$ meters, where n is the
number of protons and neutrons in the nucleus. Since the mass m of a proton or
neutron is approximately $1.674 \times 10^{-27}$ kg, and $G = 6.674 \times 10^{-11}\ \mathrm{m^3/(kg\cdot s^2)}$, while
$c = 3 \times 10^8$ m/s, you can see (by setting $2Gnm/c^2 = r_n$ and solving for n) that the
Schwarzschild radius of a nucleus could exceed the actual radius only if the nucleus
contained roughly $1.1 \times 10^{58}$ protons and neutrons. (Of course, as we warned, this
number is ludicrously outside the range of atomic weights for which the nuclear radius
formula is known to be approximately true, and there is not the slightest reason to take
this bit of fantasy as a serious scientific argument.) For comparison, the number of
protons in the observable universe (the Eddington number²⁰) is $1.58 \times 10^{79}$. Since
there are about one-fifth as many neutrons as protons, the total number of protons and
neutrons is about $1.9 \times 10^{79}$.
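The arithmetic behind the figure 1.1 × 10^58 can be sketched as follows, using the constants quoted in the text. Setting the Schwarzschild radius 2Gnm/c² of n nucleons equal to r_n = 1.25 × 10^−15 n^{1/3} m gives n^{2/3} = 1.25 × 10^−15 c²/(2Gm); the last lines repeat the estimate of about 1.9 × 10^79 nucleons in the observable universe.

```python
import math

G = 6.674e-11          # m^3/(kg s^2)
C = 3.0e8              # m/s
M_NUCLEON = 1.674e-27  # kg, mass of a proton or neutron (approximate)
R0 = 1.25e-15          # m, coefficient in r_n = R0 * n**(1/3)


def nuclear_radius(n):
    return R0 * n ** (1 / 3)


def schwarzschild_radius(n):
    return 2 * G * n * M_NUCLEON / C**2


# Solve 2 G n m / c^2 = R0 n^(1/3), i.e. n^(2/3) = R0 c^2 / (2 G m).
n_cross = (R0 * C**2 / (2 * G * M_NUCLEON)) ** 1.5
print(f"{n_cross:.3e}")

protons = 1.58e79          # Eddington number quoted in the text
nucleons = protons * 1.2   # one-fifth as many neutrons as protons
print(f"{nucleons:.2e}")
```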
Remark 4.4. Looking ahead to the concepts of Chapter 5, we note that the
Gaussian curvature of the punctured plane with this metric is, if expressed in units
of length rather than time,
$$\kappa(r) = \frac{r_s}{2r^3}.$$
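To check κ(r) = r_s/(2r³) numerically, the sketch below uses the standard formula for the Gaussian curvature of an orthogonal metric E dr² + G dθ² whose coefficients depend only on r, namely K = −(1/(2√(EG))) d/dr(G_r/√(EG)), with E = r/(r + r_s) and G = r² (the metric above in units of length, so G here is a metric coefficient, not the gravitational constant). The value r_s = 1 is an arbitrary illustrative choice, and the remaining derivative is taken by central differences.

```python
import math

RS = 1.0  # illustrative value of r_s (length units)


def E(r):
    """Coefficient of dr^2 in the toy metric, in units of length."""
    return r / (r + RS)


def gaussian_curvature(r, h=1e-4):
    """K = -(1/(2 sqrt(E G))) d/dr (G_r / sqrt(E G)) with G = r^2, so G_r = 2r.
    The outer derivative is a central difference."""
    def inner(x):
        return 2 * x / math.sqrt(E(x) * x * x)
    d_inner = (inner(r + h) - inner(r - h)) / (2 * h)
    return -d_inner / (2 * math.sqrt(E(r) * r * r))


for r in (0.5, 2.0, 10.0):
    print(r, gaussian_curvature(r), RS / (2 * r**3))
```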
Looking still farther ahead, to Chapter 6, we note that this plane is Riemannian
since its matrix of metric coefficients is positive-definite. Its Laplace–Beltrami
operator is
$$ds^2 = f(\rho)\,dt^2 - \frac{g(\rho)}{c^2}\,d\rho^2 - \frac{\rho^2}{c^2}\,d\phi^2 - \frac{\rho^2\sin^2\phi}{c^2}\,d\theta^2.$$
As with our Newtonian analysis, the reason for assuming a metric of this form
is provided by symmetry considerations. It seems reasonable that the perturbation
would be the same in all directions from the origin and at all times, hence should
depend only on the radial coordinate ρ and not on time or on the longitude or
co-latitude. It also seems reasonable that it would not change the infinitesimal
length on a path at right angles to a line from the origin near the point where
the path and the line intersect. Thus we apply the perturbation factors f (ρ) and
g(ρ) only in the two coordinates expressing the time t and the radial distance ρ. A
space-time metric of this form will henceforth be called radial. It corresponds to a
central attractive force in classical mechanics.
Again, since we would expect this metric to tend to coincide with the flat
special relativity metric at large distances, we assume that f (ρ) and g(ρ) tend to
1 as ρ → ∞. For that reason, as in the Newtonian case, we shall write them in
a form that will turn out to be simpler for computation, namely $f(\rho) = e^{\lambda(\rho)}$ and
plained phenomenon, in this case part of the precession of the perihelion of Mercury.
[Figure: orbit diagram with the Sun marked.]
$g(\rho) = e^{\nu(\rho)}$, where λ(ρ) and ν(ρ) tend to zero as ρ → ∞. Thus we assume that²²
$$ds^2 = e^{\lambda(\rho)}\,dt^2 - \frac{1}{c^2}\,e^{\nu(\rho)}\,d\rho^2 - \frac{\rho^2}{c^2}\,d\phi^2 - \frac{\rho^2\sin^2\phi}{c^2}\,d\theta^2.$$
22 As the formulas are already sufficiently complicated, we are going to ignore the pedantic
point that the exponential function really needs dimensionless variables as its input. We could
de-dimensionalize by replacing ρ with ρ/ρs , as we did above, but the final result would be the
same. The reader is hereby alerted to this caveat, and can now ignore it.
5. FOURTH ANALYSIS: GENERAL RELATIVITY 157
In order to proceed we need to find the explicit forms of λ(ρ) and ν(ρ). For
that, we need a system of differential equations that they must satisfy. Where can
we find them?
In the Newtonian case, we were guided by the principle that we wanted the
Euler equations for a geodesic to produce Newton’s equations of motion for the
gravitational field. Since it is precisely that system of equations of motion that we
are now planning to replace, we cannot use them to determine λ and ν. Without the
intuitive “push and pull” of Aristotelian forces, it is not obvious how gravitation
can be explained in purely geometric terms. Our effort to do so on the basis of
absolute time and space shows that even a tiny deviation from the exact Newtonian
system of differential equations causes great disruption in the predictions of the
theory, excluding elliptic orbits, for example. We shall take that sobering lesson
as evidence of the value of relativistic refinements when we find (as we shall) that
the same approach works extremely well when the relativistic refinements are taken
into account. One thing is clear, however: We’d certainly be lost if we looked for
any explanations that did not involve second-order differential equations, preferably
linear ones in first approximation. That makes the geodesic approach seem more
plausible, since Euler’s equations for geodesics are exactly of that type.
Very well, we can agree that we are looking for second-order differential equa-
tions. But on what intuitive basis can we derive them? Einstein argued that (1)
the laws of physics must have the same form for all observers, and hence must be
stated as tensor equations, and (2) apart from the Riemann curvature tensor, whose
vanishing implies that the space is flat (see below), there is only one tensor of the
same type (0, 2) as the tensor of metric coefficients and involving at most second-
order partial derivatives of the metric coefficients, namely the Ricci tensor23 we are
about to define.
Again, we can say, “Very well, we need the Ricci tensor to derive the differ-
ential equations of motion.” But that admission still leaves us essentially clueless:
What intuitive physical quantity are we supposed to set equal to the Ricci tensor in
order to get the equations? Since for a gravitational field in empty space-time that
physical quantity turns out to be identically zero, we can postpone answering the
harder question until we have done the computations. At the moment, we merely
accept that we are going to require the Ricci tensor to vanish for some reasonable
perturbation of the flat-space metric and then determine the perturbation from that
principle. In the back of our minds we hold the thought that force corresponds to
curvature, as in Newton’s first law (where the absence of force means an absence of
curvature), and hence curvature may be used to replace force in physical theory. As
Einstein mentioned, there is another important tensor involving at most second-order
derivatives of the metric coefficients. That is the tensor of type (1, 3), known as the
Riemann curvature tensor. It will be defined in the next chapter; but requiring it
to vanish would be tantamount to requiring the space to be flat. The Ricci tensor,
which is obtained from the Riemann curvature tensor by the operation of contrac-
tion, can vanish on a curved space, and so it represents a reasonable compromise,
which very fortunately achieves the desired explanation of the precession of the
perihelion of Mercury. We now turn to the actual computations.
This geometric approach to the problem involves some quantities from differ-
ential geometry that we cannot explain at the moment. We must take it on faith
23 Named after Gregorio Ricci-Curbastro (1853–1925).
that these quantities, the Christoffel symbols²⁴ $\Gamma^i_{jk}$, determine the curvature of
any manifold for which they are computed. That is the arcane part that we are
temporarily skipping. The actual computation of these symbols is not difficult.
5.2. The Christoffel symbols. The square $ds^2$ of the infinitesimal proper time
interval ds is a quadratic form on $\mathbb{R}^4$:
$$ds^2 = g_{il}\,dx^i\,dx^l,$$
where $x^1 = t$,²⁵ $x^2 = \rho$, $x^3 = \phi$, and $x^4 = \theta$, and by the Einstein summation
convention the terms on the right-hand side are summed for each index over the
range from 1 to 4.
We then have
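$$g_{11} = e^{\lambda(\rho)},\quad g_{22} = -\frac{1}{c^2}\,e^{\nu(\rho)},\quad g_{33} = -\frac{\rho^2}{c^2},\quad g_{44} = -\frac{\rho^2\sin^2\phi}{c^2},$$
and $g_{il} = 0$ if $i \neq l$. It is therefore very easy to find the inverse of the matrix $(g_{ij})$,
namely
$$g^{11} = e^{-\lambda(\rho)},\quad g^{22} = -c^2 e^{-\nu(\rho)},\quad g^{33} = -\frac{c^2}{\rho^2},\quad g^{44} = -\frac{c^2}{\rho^2\sin^2\phi},$$
and $g^{il} = 0$ if $i \neq l$.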
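From this information we can compute the sixty-four Christoffel symbols $\Gamma^i_{jk}$,
which are defined as follows:
$$\Gamma^i_{jk} = \frac{1}{2}\,g^{il}\left(\frac{\partial g_{jl}}{\partial x^k} + \frac{\partial g_{lk}}{\partial x^j} - \frac{\partial g_{jk}}{\partial x^l}\right). \tag{4.11}$$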
By the Einstein summation convention, the terms on the right are summed on
the index l from 1 to 4.
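Because the metric is diagonal, Eq. (4.11) collapses to a handful of terms: only l = i survives in the sum, and g^{ii} = 1/g_ii. The Python sketch below evaluates all sixty-four symbols by central differences and prints four that are easy to check by hand. Assumptions, for illustration only: as a concrete test case it takes λ(ρ) = ln(1 − ρs/ρ) and ν = −λ (the choice the text arrives at below), units with c = 1, ρs = 1, the point (t, ρ, ϕ, θ) = (0, 3, 1, 0), and zero-based indices (so gam[0][0][1] is Γ¹₁₂).

```python
import math

C = 1.0       # speed of light; units with c = 1 (assumption)
RHO_S = 1.0   # Schwarzschild radius in the same units (illustrative)


def metric_diag(x):
    """Diagonal entries g_11..g_44 at x = (t, rho, phi, theta), with e^lambda = 1 - rho_s/rho, nu = -lambda."""
    _, rho, phi, _ = x
    f = 1.0 - RHO_S / rho
    return [f, -1.0 / (C**2 * f), -(rho / C) ** 2, -(rho * math.sin(phi) / C) ** 2]


def christoffel(x, h=1e-5):
    """gamma[i][j][k] = Gamma^i_{jk} from Eq. (4.11) for a diagonal metric (indices 0..3)."""
    g = metric_diag(x)
    dg = []  # dg[k][a] = partial of g_aa with respect to x^k, by central differences
    for k in range(4):
        xp, xm = list(x), list(x)
        xp[k] += h
        xm[k] -= h
        dg.append([(p - m) / (2 * h) for p, m in zip(metric_diag(xp), metric_diag(xm))])
    gamma = [[[0.0] * 4 for _ in range(4)] for _ in range(4)]
    for i in range(4):
        for j in range(4):
            for k in range(4):
                s = 0.0
                if j == i:
                    s += dg[k][i]   # d g_{ji} / d x^k
                if k == i:
                    s += dg[j][i]   # d g_{ik} / d x^j
                if j == k:
                    s -= dg[i][j]   # d g_{jk} / d x^i
                gamma[i][j][k] = 0.5 * s / g[i]
    return gamma


x = (0.0, 3.0, 1.0, 0.0)  # t, rho, phi, theta
gam = christoffel(x)
print(gam[0][0][1])  # Gamma^1_{12} = lambda'(rho)/2
print(gam[1][0][0])  # Gamma^2_{11} = -(1/2) g^{22} d g_11/d rho
print(gam[2][1][2])  # Gamma^3_{23} = 1/rho
print(gam[3][2][3])  # Gamma^4_{34} = cot(phi)
```

At this point the hand values are λ'/2 = (1/2 − 1/3)/2 = 1/12, Γ²₁₁ = (1 − 1/3)/(2·9) = 1/27, Γ³₂₃ = 1/3, and Γ⁴₃₄ = cot 1.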
Remark 4.5. The coordinates used in physics usually come with some physical
dimension attached to them, meaning that their numerical value corresponds to the
measurement of a physical quantity: length, mass, time, force, energy, and the like.
For rectangular space-time coordinates (t; x, y, z), the dimension of t is time, and
the dimension of x, y, and z is length. In spherical coordinates (t; ρ, ϕ, θ), the
dimension of ρ is length, but ϕ and θ are dimensionless, being angles measured as
the ratio of an arc to a radius, both of which are lengths. It is a commonplace
that the arithmetic operation of addition can be performed only on terms that
have the same physical dimension. Thus one can see the need for the compensating
coefficients $1/c^2$, $\rho^2$, and $\rho^2\sin^2\phi$ in the expression for $ds^2$.
Suppose now that we use coordinates $(x^1, x^2, x^3, x^4)$ on space-time. Let the
dimension of $x^i$ be denoted $[i]$. That is also the dimension of its infinitesimal
increment $dx^i$. If we assign a dimension $[d]$ to the space-time interval ds (which
may be time, as we have chosen, or length, or energy, or any other suitable measure
of the interval between two events), then the dimension of $g_{ij}$ needs to be $[d]^2/([i][j])$
in order for the equation defining $ds^2$ to make sense. It is then easy to compute
that the dimension of $g^{ij}$ must be $[i][j]/[d]^2$. This dimensionality has the rather odd
consequence that, while the entries in the matrix products $g_{ij}g^{jk}$ and $g^{ij}g_{jk}$ are
numerically equal to the Kronecker deltas $\delta_i^k$ and $\delta^i_k$, the dimensions of these
Kronecker deltas should theoretically be taken into account: $[\delta_i^k] = [k]/[i]$ and $[\delta^i_k] = [i]/[k]$.
24 Named after Elwin Bruno Christoffel (1829–1900).
25 We shall henceforth let time be the first coordinate, rather than the zeroth, as it was in
Chapter 2.
in Appendix 4, we are going to use the word tensor rather freely, with minimal
explanation. As to what actually constitutes a tensor, the details are given in
Section 1 of Appendix 6. Right now, we are merely carrying out the computations.
Leaving aside the questions of what the Riemann curvature tensor is and why this
is an appropriate law to assume, we merely state the fact that the Ricci tensor on
a four-dimensional manifold is given as a bilinear functional acting on two vectors
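$u = (u^1, u^2, u^3, u^4)$ and $v = (v^1, v^2, v^3, v^4)$. Without the Einstein summation
convention, this tensor would be written as²⁶
$$\mathrm{Ric}(u, v) = \sum_{j=1}^{4}\sum_{l=1}^{4}\left[\sum_{i=1}^{4}\left(\frac{\partial \Gamma^i_{lj}}{\partial x^i} - \frac{\partial \Gamma^i_{ij}}{\partial x^l} + \sum_{m=1}^{4}\left(\Gamma^i_{im}\Gamma^m_{lj} - \Gamma^i_{lm}\Gamma^m_{ij}\right)\right)\right]u^j v^l.$$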
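This tensor is determined by the 4 × 4 tableau $\{\mathrm{Ric}_{jl}\}_{j,l=1}^{4}$ (where again, and
for the last time, the Einstein summation convention is not invoked):
$$\mathrm{Ric}_{jl} = \sum_{i=1}^{4}\left[\frac{\partial \Gamma^i_{lj}}{\partial x^i} - \frac{\partial \Gamma^i_{ij}}{\partial x^l} + \sum_{m=1}^{4}\left(\Gamma^i_{im}\Gamma^m_{lj} - \Gamma^i_{lm}\Gamma^m_{ij}\right)\right], \tag{4.13}$$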
27 The reader can verify this visually in the case of the space-time metric we are using. The
is not changed.²⁸ The Ricci tensor is actually a measure of the amount by which
this volume element departs from its flat-space value, as we shall see in Chapter 6.
When it vanishes, volumes—space-time volumes, in this case—are unchanged from
their flat-space values.
Replacing ν(ρ) by −λ(ρ) in Eq. (4.14), we find
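$$\lambda''(\rho) + \lambda'(\rho)^2 + \frac{2\lambda'(\rho)}{\rho} = 0.$$
Letting $\mu(\rho) = \lambda'(\rho)$, we find the first-order equation
$$\mu' + \mu^2 + \frac{2\mu}{\rho} = 0,$$
which can be written as
$$\mu' - \frac{1}{\rho^2} + \left(\mu + \frac{1}{\rho}\right)^2 = 0.$$
In other words, if $\kappa = \mu + 1/\rho$, so that $\kappa' = \mu' - 1/\rho^2$, then
$$\kappa' + \kappa^2 = 0.$$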
Assuming κ is not identically zero, we can divide by $\kappa^2$ to get $(1/\kappa)' = 1$, so this
last equation has the general solution
$$\kappa(\rho) = \frac{1}{\rho - \rho_s}$$
for a constant distance ρs , which will turn out to be the Schwarzschild
radius mentioned above with v0 = c. We have chosen instead to subtract the
radius, writing ρ − ρs rather than the expression ρ + ρs that occurred in our “toy”
metric. We did so anticipating what we shall learn about the sign of ρs below.
The difference is precisely the difference between the positive-definite metric of
the Euclidean space R4 —in general, such a metric is called Riemannian—and the
four-dimensional space-time of special relativity, which has what is called a pseudo-
metric, where ds2 can be negative.
We now know that
$$\lambda'(\rho) = \frac{1}{\rho - \rho_s} - \frac{1}{\rho},$$
so that
$$\lambda(\rho) = C + \ln\left(1 - \frac{\rho_s}{\rho}\right).$$
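A finite-difference check, as a sketch, that λ(ρ) = ln(1 − ρs/ρ) satisfies the second-order equation λ'' + λ'² + 2λ'/ρ = 0 and that it tends to 0 at large ρ (which forces C = 0). The value ρs = 1 is an illustrative assumption; the residuals are limited only by discretization and rounding error.

```python
import math

RHO_S = 1.0  # illustrative Schwarzschild radius


def lam(rho, C=0.0):
    """lambda(rho) = C + ln(1 - rho_s/rho)."""
    return C + math.log(1.0 - RHO_S / rho)


def residual(rho, h=1e-4):
    """lambda'' + (lambda')^2 + 2 lambda'/rho, with derivatives by central differences."""
    d1 = (lam(rho + h) - lam(rho - h)) / (2 * h)
    d2 = (lam(rho + h) - 2 * lam(rho) + lam(rho - h)) / h**2
    return d2 + d1**2 + 2 * d1 / rho


for rho in (1.5, 3.0, 10.0, 100.0):
    print(rho, residual(rho))  # small, at the level of finite-difference error
print(lam(1e8))                # close to 0, as required at large rho
```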
Since λ(ρ) → 0 as ρ → ∞, it follows that C = 0. Summarizing, we find that
the infinitesimal proper time increment ds is given by
$$ds^2 = \frac{\rho - \rho_s}{\rho}\,dt^2 - \frac{\rho}{c^2(\rho - \rho_s)}\,d\rho^2 - \frac{\rho^2}{c^2}\,d\phi^2 - \frac{\rho^2\sin^2\phi}{c^2}\,d\theta^2. \tag{4.16}$$
What we have just done could be checked on a computer if necessary. In fact,
the output from the second extension of Mathematica Notebook 6 (Volume 3) is
{λ → Function[{ρ}, C[2] − Log[ρ] + Log[1 − ρ C[1]]],
 ν → Function[{ρ}, C[3] + Log[ρ] − Log[ρ] − Log[1 − ρ C[1]]]}.
The constants C[1], C[2], and C[3] can then be determined by the reasoning
given above.
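The computer check mentioned above used Mathematica. As an independent sketch (our own illustration, not the notebook from Volume 3), the Python code below evaluates the Ricci components (4.13) for the metric (4.16) at one sample point, computing the Christoffel symbols (4.11) and their derivatives by nested central differences. Assumptions: units with c = 1, ρs = 1, and the point (t, ρ, ϕ, θ) = (0, 3, 1, 0). All sixteen components should vanish up to discretization error.

```python
import math

C = 1.0       # units with c = 1 (assumption)
RHO_S = 1.0   # illustrative Schwarzschild radius


def metric_diag(x):
    """Diagonal entries of the metric (4.16) at x = (t, rho, phi, theta)."""
    _, rho, phi, _ = x
    f = 1.0 - RHO_S / rho
    return [f, -1.0 / (C**2 * f), -(rho / C) ** 2, -(rho * math.sin(phi) / C) ** 2]


def shift(x, k, h):
    y = list(x)
    y[k] += h
    return y


def christoffel(x, h=1e-5):
    """Eq. (4.11) for a diagonal metric, with central-difference derivatives."""
    g = metric_diag(x)
    dg = [[(p - m) / (2 * h) for p, m in zip(metric_diag(shift(x, k, h)),
                                            metric_diag(shift(x, k, -h)))]
          for k in range(4)]
    gam = [[[0.0] * 4 for _ in range(4)] for _ in range(4)]
    for i in range(4):
        for j in range(4):
            for k in range(4):
                s = (dg[k][i] if j == i else 0.0) + (dg[j][i] if k == i else 0.0)
                s -= dg[i][j] if j == k else 0.0
                gam[i][j][k] = 0.5 * s / g[i]
    return gam


def ricci(x, h=1e-3):
    """Ric_jl from Eq. (4.13); derivatives of the Christoffel symbols by central differences."""
    gam = christoffel(x)
    plus = [christoffel(shift(x, k, h)) for k in range(4)]
    minus = [christoffel(shift(x, k, -h)) for k in range(4)]

    def d_gamma(k, i, a, b):  # partial of Gamma^i_{ab} with respect to x^k
        return (plus[k][i][a][b] - minus[k][i][a][b]) / (2 * h)

    ric = [[0.0] * 4 for _ in range(4)]
    for j in range(4):
        for l in range(4):
            total = 0.0
            for i in range(4):
                total += d_gamma(i, i, l, j) - d_gamma(l, i, i, j)
                for m in range(4):
                    total += gam[i][i][m] * gam[m][l][j] - gam[i][l][m] * gam[m][i][j]
            ric[j][l] = total
    return ric


ric = ricci((0.0, 3.0, 1.0, 0.0))
print(max(abs(v) for row in ric for v in row))  # tiny: Ric vanishes up to discretization error
```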
6.2. Geodesics in this metric. Now that we have the metric, we can write
the four Euler equations for its geodesics. They are:
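$$\frac{d}{ds}\left(\frac{2(\rho - \rho_s)}{\rho}\,\frac{dt}{ds}\right) = 0,$$
$$\frac{d}{ds}\left(-\frac{2\rho}{c^2(\rho - \rho_s)}\,\frac{d\rho}{ds}\right) = \frac{\rho_s}{\rho^2}\left(\frac{dt}{ds}\right)^2 + \frac{\rho_s}{c^2(\rho - \rho_s)^2}\left(\frac{d\rho}{ds}\right)^2 - \frac{2\rho}{c^2}\left[\left(\frac{d\phi}{ds}\right)^2 + \sin^2\phi\left(\frac{d\theta}{ds}\right)^2\right],$$
$$\frac{d}{ds}\left(2\rho^2\,\frac{d\phi}{ds}\right) = 2\rho^2\sin\phi\cos\phi\left(\frac{d\theta}{ds}\right)^2,$$
$$\frac{d}{ds}\left(2\rho^2\sin^2\phi\,\frac{d\theta}{ds}\right) = 0.$$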
A suitable rotation will eliminate ϕ from consideration, just as happened in our
earlier analysis. That is, we can choose the coordinates so that ϕ = π/2 and dϕ/ds = 0
at some instant, in which case ϕ ≡ π/2 is the unique solution of the third of the four
Euler equations.
Since we know that the orbit lies in a plane, we switch to polar coordinates, replacing
ρ with r and ρs with rs . Also, just as in our previous analysis, the Euler equation
for the variable θ once again gives Kepler’s second law, with the difference that
the independent variable is now proper time s rather than observed time t. (It is
relativistic angular momentum that is conserved.)
With these reductions taken into account, we now need to deal with the fol-
lowing pair of differential equations:
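$$\frac{d}{ds}\left(\frac{2(r - r_s)}{r}\,\frac{dt}{ds}\right) = 0,$$
$$\frac{d}{ds}\left(-\frac{2r}{c^2(r - r_s)}\,\frac{dr}{ds}\right) = \frac{r_s}{r^2}\left(\frac{dt}{ds}\right)^2 + \frac{r_s}{c^2(r - r_s)^2}\left(\frac{dr}{ds}\right)^2 - \frac{2r}{c^2}\left(\frac{d\theta}{ds}\right)^2.$$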
That will be our task in the next section.
There is an obvious resemblance between Eqs. (4.7) and (4.16). We are going to
see that ρs has essentially the same meaning in both equations. In the relativistic
scheme, we know that the time coordinate and the spatial coordinate along the
line of motion are entangled when two observers try to reconcile their observations.
This entangling made two significant changes in the Newtonian scheme: (1) the
purely conventional speed of information v0 , used to compute the “exchange rate”
dt = (1/v0 ) dx between time and space, gets replaced by the speed of light c, and
produces an absolute “exchange rate” dt = (1/c) dx; (2) the sign of the spatial
portion of the metric is reversed. Therefore, we should expect a perturbation of the
form $ds^2 = f(\rho)\,dt^2 - (g(\rho)/c^2)\,d\rho^2$ in spherical coordinates. And so it turned out.
We have mentioned that ρs is very small in comparison with the values of ρ
typical in planetary orbits. Even for the orbit of Mercury, we always have $\rho_s/\rho < 10^{-7}$.
When ρs is taken equal to zero, Eq. (4.16) becomes the flat space-time
metric of special relativity, while Eq. (4.7) becomes the Euclidean metric on the
spatial part of space-time. The importance of the Schwarzschild radius ρs was first
recognized in connection with the relativistic law of gravity, although, as we saw
above, it was “lurking” even in the Newtonian law and reveals itself when this law
is formulated in geodesic terms. All the “curvature” that causes the geometry of a
gravitational field to differ from flat-space geometry is due solely to the presence of
the radius ρs in the metrics.
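For the claim about Mercury, a short check, assuming rounded values for G, the solar mass, c, and Mercury's perihelion distance (about 4.60 × 10^10 m, its closest approach to the Sun, where ρs/ρ is largest):

```python
G = 6.674e-11            # m^3/(kg s^2)
M_SUN = 1.989e30         # kg
C = 2.998e8              # m/s
R_PERIHELION = 4.60e10   # Mercury's perihelion distance, m (approximate)

rho_s = 2 * G * M_SUN / C**2
ratio = rho_s / R_PERIHELION
print(f"rho_s = {rho_s:.0f} m, rho_s/rho = {ratio:.2e}")
```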
With those preliminaries taken care of, we now set out to study the motion of
a particle, assuming it must be a geodesic in the modified space-time metric.
Here the parameter u may be arbitrary, since the integral is invariant under
changes of variable. Things become much simpler, however, if we use proper time s
as parameter. The integrand is then equal to the constant 1, since s itself is merely
the difference between the limits of integration:
$$F\big(\rho(s), \phi(s), t'(s), \rho'(s), \phi'(s), \theta'(s)\big) = 1.$$
This fact simplifies the computation of the geodesics from the three Euler equations
for the critical path. (See Section 1 of Appendix 2 in Volume 2 for the statement and
derivation of these equations. We shall not need to decide whether the critical path
we find is a minimum or a maximum, since the Euler equations only require it to be
critical. In fact, for the path of a material particle in this metric it is a local maximum
of proper time, not a minimum: one can always vary such a path slightly, with short
stretches traveled at nearly the speed of light, and make its proper time shorter.)
Since our first reduction has already eliminated the co-latitude variable ϕ, and
we have replaced the distance ρ by the variable r we customarily use in polar
7. COMPUTATION OF THE RELATIVISTIC ORBIT 167
coordinates on the plane, and ρs by rs at the same time, we have the proper-time
length s of the geodesic in the form
$$s = \int ds = \int \sqrt{\left(1 - \frac{r_s}{r}\right)(t')^2 - \frac{r}{c^2(r - r_s)}(r')^2 - \frac{r^2}{c^2}(\theta')^2}\;ds = \int F(r, t', r', \theta')\,ds.$$
Here F ≡ 1 along the geodesic. This is the integral for which Schwarzschild
found the exact geodesics in his first 1916 paper. His r, however, was actually
a radius that he denoted R, related to the normal one by the equation
$R = r(1 + r_s^3/r^3)^{1/3}$. Schwarzschild died on 11 May 1916. Two weeks after his death,
the dissertation of Johannes Droste (1886–1963) at the University of Leiden was
presented (by Droste’s advisor Lorentz) at the 27 May meeting of the Royal Nether-
lands Academy of Arts and Sciences [15]. It contained what we now know as the
Schwarzschild solution in its present form.29
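Schwarzschild remarked that for circular orbits, dθ/dt (he wrote ϕ where we
have written θ) satisfies $(d\theta/dt)^2 = r_s/(2R^3)$ in his units, in which c = 1 (in our
units the right-hand side is $c^2 r_s/(2R^3) = GM/R^3$), so that Kepler's third law holds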
exactly for such orbits. Droste noted that the same was true for the normal polar
coordinate r of Euclidean geometry, and we shall demonstrate this below.
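Schwarzschild's observation about circular orbits can be checked by integrating the planar geodesic equations numerically. The sketch below (our own illustration) writes the equatorial geodesic equations of the metric (4.16) for t, r, and θ as a first-order system in proper time s and integrates them with a classical fourth-order Runge-Kutta step. It starts on a circular orbit of radius 10r_s, with dθ/dt chosen from (dθ/dt)² = r_s/(2r³) and dt/ds fixed by F = 1. Units with c = 1 and r_s = 1 are assumptions. If the relation is right, r stays constant and one revolution takes coordinate time 2π/ω.

```python
import math

RS = 1.0  # Schwarzschild radius; units with c = 1 (assumption)


def rhs(y):
    """Equatorial geodesic equations as a first-order system, y = (t, r, theta, t', r', theta')."""
    _, r, _, tp, rp, thp = y
    f = 1.0 - RS / r
    tpp = -RS / (r * r * f) * rp * tp        # from d/ds[(1 - r_s/r) t'] = 0
    rpp = (-RS * f / (2 * r * r) * tp**2     # radial Euler equation solved for r''
           + RS / (2 * r * r * f) * rp**2
           + r * f * thp**2)
    thpp = -2.0 / r * rp * thp               # from d/ds[r^2 theta'] = 0
    return [tp, rp, thp, tpp, rpp, thpp]


def rk4_step(y, h):
    k1 = rhs(y)
    k2 = rhs([a + 0.5 * h * b for a, b in zip(y, k1)])
    k3 = rhs([a + 0.5 * h * b for a, b in zip(y, k2)])
    k4 = rhs([a + h * b for a, b in zip(y, k3)])
    return [a + h * (p + 2 * q + 2 * v + w) / 6 for a, p, q, v, w in zip(y, k1, k2, k3, k4)]


r0 = 10.0                                   # orbit radius, in units of r_s
omega = math.sqrt(RS / (2 * r0**3))         # Schwarzschild's d(theta)/dt, with c = 1
tp0 = 1.0 / math.sqrt(1.0 - 1.5 * RS / r0)  # dt/ds from F = 1 when r' = 0, theta' = omega t'
y = [0.0, r0, 0.0, tp0, 0.0, omega * tp0]
steps = 4000
h = 2 * math.pi / (omega * tp0) / steps     # proper time of one revolution, split into steps
for _ in range(steps):
    y = rk4_step(y, h)
print(y[1] - r0, y[2] - 2 * math.pi, y[0], 2 * math.pi / omega)
```

In units with c = 1 we have r_s = 2GM, so ω² = r_s/(2r³) is exactly GM/r³, Kepler's third law.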
7.1. Conservation of angular momentum. Let us now look at the Euler
equations for the variables t and θ:
$$\frac{d}{ds}\left(\frac{(1 - r_s/r)\,t'}{F}\right) = 0,$$
$$\frac{d}{ds}\left(\frac{-r^2\,\theta'}{c^2 F}\right) = 0.$$
The second of these equations says, after we set F ≡ 1, expand the derivative, and
divide by $-r(s)^2/c^2$, that
$$\frac{d^2\theta}{ds^2} = -\frac{2}{r}\,\frac{dr}{ds}\,\frac{d\theta}{ds},$$
and we take the time to notice that in the “Newtonian limit” when c → ∞ and
rs → 0, so that s → t, this is precisely the same as the Newtonian equation.
In the relativistic case, this equation implies that
$$r^2\,\frac{d\theta}{ds} = l,$$
for some constant l, which is precisely the statement of conservation of relativistic
angular momentum per unit rest mass. This principle was a consequence of having
a central force in the classical case, but it was also deduced from Euler’s equation
on θ, just as we have now done in the relativistic case. The difference now is that
the derivative is with respect to proper time s rather than observed time t. To
express it in terms of observer time t, we need the derivative of t with respect to s.
The Euler equation gives us that, at the price of introducing another constant of
integration that we will have to eliminate somehow.
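The conservation laws can also be watched numerically. This self-contained sketch assumes units with c = 1 and r_s = 1 and integrates the equatorial geodesic equations of the metric (4.16) with a fourth-order Runge-Kutta step. It starts from an aphelion at r = 20 with 80% of the circular angular velocity, an arbitrary choice that gives a bound, noncircular orbit. Along the way it monitors l = r² dθ/ds, the constant (1 − r_s/r) dt/ds supplied by the t-equation, and the normalization F = 1.

```python
import math

RS = 1.0  # Schwarzschild radius; units with c = 1 (assumption)


def rhs(y):
    """Equatorial geodesic equations, y = (t, r, theta, t', r', theta')."""
    _, r, _, tp, rp, thp = y
    f = 1.0 - RS / r
    tpp = -RS / (r * r * f) * rp * tp
    rpp = -RS * f / (2 * r * r) * tp**2 + RS / (2 * r * r * f) * rp**2 + r * f * thp**2
    thpp = -2.0 / r * rp * thp
    return [tp, rp, thp, tpp, rpp, thpp]


def rk4_step(y, h):
    k1 = rhs(y)
    k2 = rhs([a + 0.5 * h * b for a, b in zip(y, k1)])
    k3 = rhs([a + 0.5 * h * b for a, b in zip(y, k2)])
    k4 = rhs([a + h * b for a, b in zip(y, k3)])
    return [a + h * (p + 2 * q + 2 * v + w) / 6 for a, p, q, v, w in zip(y, k1, k2, k3, k4)]


def invariants(y):
    """Return l = r^2 theta', k = (1 - r_s/r) t', and F^2, which should stay equal to 1."""
    _, r, _, tp, rp, thp = y
    f = 1.0 - RS / r
    return r * r * thp, f * tp, f * tp**2 - rp**2 / f - r * r * thp**2


r0 = 20.0
circular_rate = math.sqrt(RS / (2 * r0**3)) / math.sqrt(1.0 - 1.5 * RS / r0)
thp0 = 0.8 * circular_rate                                  # slower than circular: r0 is an aphelion
tp0 = math.sqrt((1.0 + r0**2 * thp0**2) / (1.0 - RS / r0))  # F = 1 with r' = 0
y = [0.0, r0, 0.0, tp0, 0.0, thp0]

l_start, k_start, _ = invariants(y)
r_min = r0
for _ in range(40000):  # proper time 2000: several revolutions
    y = rk4_step(y, 0.05)
    r_min = min(r_min, y[1])
l_end, k_end, F2_end = invariants(y)
print(l_end / l_start - 1, k_end / k_start - 1, F2_end - 1, r_min)
```

The relative drifts printed on the last line are pure integration error; the orbit itself swings between r = 20 and a perihelion near r ≈ 9.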
29 Droste is slighted in many accounts of relativity theory; his contributions were many and
brilliant. He had earlier filled out the mathematical details of the 1913 “draft” of general relativity
[24] by Einstein and Marcel Grossmann (1878–1936), and this paper represented his attempt to
keep current, now that Einstein had found a fully invariant formulation of the theory. It was
through a 1917 study by Hilbert that the world came to call this solution the Schwarzschild
solution.
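$$\frac{r - r_s}{r}\left(\frac{dt}{d\theta}\right)^2 - \frac{r}{c^2(r - r_s)}\left(\frac{dr}{d\theta}\right)^2 - \frac{r^2}{c^2} = \frac{1}{(d\theta/ds)^2} = \frac{r^4}{l^2}.$$
When we insert the value of $dt/d\theta$ and cancel a factor of r, we get
$$\frac{k^2 r^4}{l^2(r - r_s)} - \frac{1}{c^2(r - r_s)}\left(\frac{dr}{d\theta}\right)^2 - \frac{r}{c^2} = \frac{r^3}{l^2}.$$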
30 The constant k is not absolute. It depends on the particular geodesic.
When we solve this equation for $(dr/d\theta)^2$, we find
$$\frac{1}{r^4}\left(\frac{dr}{d\theta}\right)^2 = \frac{c^2k^2}{l^2} - \frac{c^2(r - r_s)}{l^2 r} - \frac{r - r_s}{r^3}.$$
We can rewrite this equation as
$$\left(\frac{dr}{d\theta}\right)^2 = r\left(\frac{c^2(k^2 - 1)}{l^2}\,r^3 + \frac{c^2 r_s}{l^2}\,r^2 - r + r_s\right).$$
Recall that our attempt to geometrize Newton’s astronomy failed at this point,
since the right-hand side of the corresponding equation had to be nonnegative, but
could assume the value 0 only for one positive value of r. That restriction excluded
the possibility of elliptical orbits. Now, however, depending on the values of k, rs ,
and l, the right-hand side of this equation may have up to three positive zeros, and
hence the set of r for which it is positive may contain a finite interval of positive
numbers, enabling us to consider bounded noncircular orbits. We don’t know that
it does contain such an interval, but at least such orbits are not mathematically
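excluded. If, for example, k = 1, the cubic factor reduces to the quadratic
$(c^2r_s/l^2)r^2 - r + r_s$, whose discriminant $1 - 4c^2r_s^2/l^2$ is positive exactly when
$l > 2cr_s$; in that case there are two positive zeros.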
The precise requirement is that the cubic equation
$$\frac{c^2(k^2 - 1)}{l^2}\,r^3 + \frac{c^2 r_s}{l^2}\,r^2 - r + r_s = 0$$
have at least two positive zeros between which the left-hand side is positive. This
cannot happen if $k^2 > 1$: the left-hand side is positive at r = 0 and tends to
−∞ as r → −∞, so the equation has a negative root, and by Descartes' rule
of signs it has only one. But if it also has two positive roots, the left-hand side must
be negative between them, which means the equation dr/dθ = 0 can be satisfied at
only one of the two roots, since r cannot cross the interval between the two roots.
Hence $k^2 \le 1$. If $k^2 = 1$, the equation is a quadratic equation and the left-hand
side is once again negative between the two roots. Hence it must be that $k^2 < 1$.
The left-hand side is still positive at r = 0, but goes to −∞ as r → +∞. Again,
by Descartes' rule of signs, the equation has no negative roots and either 1 or 3
positive roots. If it has three positive roots, it will be positive between the two
larger roots, which is what we must have.
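A concrete instance of this case analysis, as a Python sketch. Instead of choosing k and l directly, it picks two turning points (r = 20 and r = 30, in units with c = 1 and r_s = 1, an arbitrary illustrative choice) and recovers the coefficients from the relations between the roots and coefficients of the cubic r_s u³ − u² + (r_s c²/l²)u + c²(k² − 1)/l² in u = 1/r. It then locates every sign change of the cubic in r by bisection. The output should show k² < 1, exactly three positive roots, and positivity between the two larger ones.

```python
RS = 1.0  # r_s = 1 sets the unit of length; units with c = 1 (assumption)

# Illustrative turning points: r = 30 (u0 = 1/30) and r = 20 (u1 = 1/20).
u0, u1 = 1 / 30, 1 / 20
# Roots u0, u1, u2 of r_s u^3 - u^2 + (r_s c^2/l^2) u + c^2 (k^2 - 1)/l^2:
u2 = 1 / RS - u0 - u1                     # the roots sum to 1/r_s
c2_over_l2 = u0 * u1 + u0 * u2 + u1 * u2  # pairwise products sum to c^2/l^2
A = -RS * u0 * u1 * u2                    # c^2 (k^2 - 1)/l^2, from the product of the roots
B = RS * c2_over_l2                       # c^2 r_s / l^2
k_squared = 1 + A / c2_over_l2


def P(r):
    """The cubic c^2(k^2-1)/l^2 r^3 + c^2 r_s/l^2 r^2 - r + r_s."""
    return A * r**3 + B * r**2 - r + RS


def bisect(f, a, b, tol=1e-12):
    """Root of f in [a, b], assuming f(a) and f(b) have opposite signs."""
    fa = f(a)
    while b - a > tol:
        m = 0.5 * (a + b)
        fm = f(m)
        if (fm > 0) == (fa > 0):
            a, fa = m, fm
        else:
            b = m
    return 0.5 * (a + b)


# Scan r in (0, 100) on a grid offset so that no grid point is an exact root.
grid = [0.003 + 0.01 * i for i in range(10000)]
roots = [bisect(P, a, b) for a, b in zip(grid, grid[1:]) if (P(a) > 0) != (P(b) > 0)]
print(k_squared, roots)
print(P(10.0) < 0, P(25.0) > 0)
```

The smallest root, r = 1/u2 ≈ 1.09r_s, lies far inside the orbit; the motion takes place between the two larger roots.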
This equation resembles the equation for the orbit in Newtonian mechanics.
As we have now done three times, we rewrite the equation in terms of u = 1/r, so
that $du/d\theta = -(1/r^2)(dr/d\theta)$. The result is
$$\left(\frac{du}{d\theta}\right)^2 = \frac{c^2(k^2 - 1)}{l^2} + \frac{r_s c^2}{l^2}\,u - u^2 + r_s u^3,$$
which we rearrange as
$$\left(\frac{du}{d\theta}\right)^2 + u^2 = \frac{c^2(k^2 - 1)}{l^2} + \frac{r_s c^2}{l^2}\,u + r_s u^3. \tag{4.18}$$
The presence of three as-yet-undetermined constants is still annoying, but since
k occurs only in the constant term, we can get rid of it by differentiating and then
dividing out the factor $2\,du/d\theta$:
$$\frac{d^2u}{d\theta^2} + u = \frac{r_s c^2}{2l^2} + \frac{3}{2}\,r_s u^2. \tag{4.19}$$
dθ 2l 2
This equation is merely a perturbed version of the classical equation of New-
tonian mechanics. It differs from that equation in only two ways: (1) the constant
aber boshaft ist Er nicht.” (“The Lord God is subtle, but not malicious.”) Einstein is said to
have made this remark during a visit to Princeton University in 1921, in response to a recent
announcement by Dayton Miller (1866–1941) that he had detected some difference in the speed
of light relative to the motion of the Earth. (See p. 390 of the book of Ronald W. Clark [8].)
The difference in speed was still much smaller than classical predictions made on the basis of an
“ether drift,” and Miller had to make more than 5 million measurements with his sophisticated
interferometer in order to reveal it. The large number of measurements, analogous to turning up
the volume on a radio full of static in order to hear the signal, suggests that the effect really was
just “noise.”
hole. In all other cases, the physical behavior of matter at that radius from the
center is not different from its behavior anywhere else.
Remark 4.9. If Kepler's third law held in this situation, we would have
$GM = 4\pi^2 a^3/T^2$, where a is the semi-major axis of the orbit and T its period, and that
would give us another expression for $r_s$, namely
$$r_s = \frac{8\pi^2 a^3}{T^2 c^2}.$$
But in relativity, the orbit isn’t an ellipse when expressed in ordinary polar
coordinates, and Kepler’s third law doesn’t hold in general. On the other hand, it
does hold for circular orbits.
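As an illustration, the sketch below applies this formula where it is nearly exact: Earth's nearly circular orbit (semi-major axis 1 au, one sidereal year). The constants are standard published values, rounded, and are assumptions of the example. The result should come out close to the Sun's Schwarzschild radius of about 2.95 km.

```python
import math

C = 299_792_458.0             # m/s
A_EARTH = 1.495978707e11      # semi-major axis of Earth's orbit (1 au), m
T_EARTH = 365.256363 * 86400  # sidereal year, s

r_s = 8 * math.pi**2 * A_EARTH**3 / (T_EARTH**2 * C**2)
print(f"{r_s:.1f} m")
```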
From now on, we shall work with the equation
$$\frac{d^2u}{d\theta^2} + u = \alpha + \beta u^2,$$
where $\alpha = r_s c^2/(2l^2)$ and $\beta = 3r_s/2$.
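The effect of the extra term βu² can be seen directly by integrating this equation numerically and measuring the angle between successive aphelia. The sketch below works in units where α = 1. It takes αβ = 10^−3 and an initial aphelion value u = α(1 − e) with e = 0.5; these are arbitrary illustrative values, with αβ exaggerated by many orders of magnitude over Mercury's. The measured advance is compared with 2παβ, the standard first-order perturbation result for this equation, which equals 3πr_s/(a(1 − e²)) per revolution when α ≈ 1/(a(1 − e²)).

```python
import math

ALPHA = 1.0   # alpha = r_s c^2 / (2 l^2); units chosen so that alpha = 1 (assumption)
BETA = 1e-3   # beta = 3 r_s / 2 in the same units (exaggerated, illustrative)
ECC = 0.5     # eccentricity of the unperturbed orbit (illustrative)


def deriv(u, p):
    """u' = p and p' = alpha + beta u^2 - u, i.e. u'' + u = alpha + beta u^2."""
    return p, ALPHA + BETA * u * u - u


def rk4(u, p, h):
    k1u, k1p = deriv(u, p)
    k2u, k2p = deriv(u + 0.5 * h * k1u, p + 0.5 * h * k1p)
    k3u, k3p = deriv(u + 0.5 * h * k2u, p + 0.5 * h * k2p)
    k4u, k4p = deriv(u + h * k3u, p + h * k3p)
    return (u + h * (k1u + 2 * k2u + 2 * k3u + k4u) / 6,
            p + h * (k1p + 2 * k2p + 2 * k3p + k4p) / 6)


def next_aphelion_angle(h=1e-3):
    """Start at an aphelion (u minimal, u' = 0) at theta = 0 and return the angle of the
    next aphelion, found by linear interpolation where u' changes sign from - to +."""
    u, p, theta = ALPHA * (1 - ECC), 0.0, 0.0
    while True:
        u_new, p_new = rk4(u, p, h)
        if theta > math.pi and p < 0.0 <= p_new:
            return theta + h * (-p) / (p_new - p)
        u, p, theta = u_new, p_new, theta + h


advance = next_aphelion_angle() - 2 * math.pi
prediction = 2 * math.pi * ALPHA * BETA
print(advance, prediction)
```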
Since θ does not occur explicitly in this equation, we can once again use the
familiar technique of writing $u'' = p\,(dp/du)$, where $p = u'$. If we integrate this last
equation from $u_0$ (aphelion) to u, taking θ = 0 at aphelion, we get (after multiplying
by 2) the equation
$$\left(\frac{du}{d\theta}\right)^2 = r_s\left[\frac{c^2}{l^2}(u - u_0) - \frac{u^2 - u_0^2}{r_s} + u^3 - u_0^3\right] = r_s(u - u_0)\left[\frac{c^2}{l^2} - \frac{u_0 + u}{r_s} + u_0^2 + u_0 u + u^2\right].$$
Here, on the left-hand side, we use the fact that du/dθ = 0 at aphelion. From
the physical model, we know that du/dθ = 0 also at perihelion (u1 ), and so the
constants rs and l must be such that
$$\frac{c^2}{l^2} = \frac{u_0 + u_1}{r_s} - (u_0^2 + u_0 u_1 + u_1^2).$$
With this explicit expression for $c^2/l^2$, we can now write $(du/d\theta)^2$ in terms of
$u_0$ and $u_1$:
$$\left(\frac{du}{d\theta}\right)^2 = (u - u_0)(u_1 - u)\big(1 - r_s(u + u_0 + u_1)\big).$$
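The algebra leading to this factorization is easy to spot-check numerically. The sketch below uses arbitrarily chosen, hypothetical values r_s = 0.01, u0 = 0.5, and u1 = 1, and takes c²/l² from the perihelion condition. It compares the first integral 2α(u − u0) − (u² − u0²) + (2β/3)(u³ − u0³) of u'' + u = α + βu² with the factored form.

```python
import math

RS = 0.01          # illustrative r_s (hypothetical value)
u0, u1 = 0.5, 1.0  # illustrative aphelion and perihelion values of u = 1/r
c2_over_l2 = (u0 + u1) / RS - (u0**2 + u0 * u1 + u1**2)  # perihelion condition
alpha = RS * c2_over_l2 / 2  # alpha = r_s c^2 / (2 l^2)
beta = 3 * RS / 2


def first_integral(u):
    """(du/dtheta)^2 from integrating u'' + u = alpha + beta u^2 from u0 to u."""
    return 2 * alpha * (u - u0) - (u**2 - u0**2) + (2 * beta / 3) * (u**3 - u0**3)


def factored(u):
    return (u - u0) * (u1 - u) * (1 - RS * (u + u0 + u1))


for u in (0.6, 0.75, 0.9, 1.0):
    print(u, math.isclose(first_integral(u), factored(u), rel_tol=1e-9, abs_tol=1e-12))
```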
Remark 4.10. The expression for c2 /l2 also allows us to write the angular
momentum per unit rest mass in a form that involves only the parameters of the
orbit and the Schwarzschild radius:
$$l = c\,a(1 - e^2)\sqrt{\frac{r_s}{2a(1 - e^2) - r_s(3 + e^2)}}.$$
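Plugging Mercury's orbit into Remark 4.10 gives a feel for the size of the correction. The sketch below assumes standard rounded values (GM of the Sun, a = 5.7909 × 10^10 m, e = 0.2056). It checks the closed form against c/√(c²/l²) computed from the turning points u0 = 1/(a(1 + e)) and u1 = 1/(a(1 − e)), and compares it with the Newtonian value √(GMa(1 − e²)). The two agree to about seven significant digits.

```python
import math

C = 299_792_458.0          # m/s
GM_SUN = 1.32712440018e20  # m^3/s^2
R_S = 2 * GM_SUN / C**2    # about 2953 m
A = 5.7909e10              # Mercury's semi-major axis, m (approximate)
E = 0.2056                 # Mercury's eccentricity (approximate)

# Closed form of Remark 4.10
l_remark = C * A * (1 - E**2) * math.sqrt(R_S / (2 * A * (1 - E**2) - R_S * (3 + E**2)))

# Same quantity from c^2/l^2 = (u0 + u1)/r_s - (u0^2 + u0 u1 + u1^2)
u0, u1 = 1 / (A * (1 + E)), 1 / (A * (1 - E))
l_from_turning_points = C / math.sqrt((u0 + u1) / R_S - (u0**2 + u0 * u1 + u1**2))

# Newtonian angular momentum per unit mass
l_newton = math.sqrt(GM_SUN * A * (1 - E**2))
print(l_remark, l_from_turning_points, l_newton)
```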
The fact that du/dθ is the square root of a cubic polynomial in u implies that
u is an elliptic function of θ. To get the equation into a form in which we can
compute this function, we need to make some changes of variable. The first is a
simple linear transformation on u. Let u = γv + δ with γ and δ chosen so that
u0 = −γ + δ and u1 = γ + δ, that is, v ranges over [−1, 1]. This is easily done:
γ = (u1 − u0 )/2, δ = (u1 + u0 )/2. In relation to the geometry of a planetary orbit
which becomes
$$r(1 - e\cos\phi) = a(1 - e^2)$$
if, instead of taking ϕ = −π/2 at aphelion, we take ϕ = 0 at this point. We
note that the eccentricity and semi-axis of this orbit are exactly the same as in the
Newtonian model.
Remark 4.11. This equation for the orbit in terms of the fictitious angle ϕ
shows how to construct a kinematical model of the orbit based on the Keplerian
orbit. We simply imagine a “phantom” Mercury traveling around the old Keplerian
(elliptic) orbit predicted by the Newtonian theory. The radius vector from the Sun
to the observable Mercury at right ascension θ equals the radius vector to the
phantom Mercury at right ascension ϕ. (Again, this ϕ is not the co-latitude angle
that we eliminated earlier!) The result is illustrated in Fig. 4.5.
Thus, in terms of the angle ϕ, the relativistic equation of the orbit is exactly
the same as the classical equation. If we had ϕ = θ, there would be no difference
between them. As there is a difference, we need to explore how ϕ is related to θ.
Let us return to our original assumptions, whereby θ = 0 and ϕ = −π/2 at
aphelion. Thus
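$$\theta = \int_0^{\theta} dt = \int_{-\pi/2}^{\phi}\frac{ds}{\sqrt{1 - \dfrac{3r_s}{a(1-e^2)} - \dfrac{r_s e}{a(1-e^2)}\sin s}} = \int_0^{\phi+\pi/2}\frac{dt}{\sqrt{1 - \dfrac{3r_s}{a(1-e^2)} + \dfrac{r_s e}{a(1-e^2)}\cos t}},$$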
where we replaced s by t − π/2 to get the last equation (sin(t − π/2) = − cos(t)).
7. COMPUTATION OF THE RELATIVISTIC ORBIT 173
[Figure 4.5: Mercury's orbit about the Sun, showing the “observable” Mercury (•) at polar angle θ and the “phantom” Mercury (◦) at angle ϕ.]
where
$$C = \sqrt{1 - \frac{r_s(3-e)}{a(1-e^2)}} \qquad\text{and}\qquad m = \frac{2r_s e}{a(1-e^2) - r_s(3-e)}.$$
Thus, we find that
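$$\theta = \frac{2}{C}\,\mathrm{EllipticF}\!\left(\frac{\phi}{2} + \frac{\pi}{4},\ m\right).$$
By definition of the function EllipticF, this means that
$$\frac{\phi}{2} + \frac{\pi}{4} = \mathrm{am}(C\theta/2,\ m),$$
that is,
$$\phi = 2\,\mathrm{am}(C\theta/2,\ m) - \frac{\pi}{2}.$$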
As in our toy Newtonian model, am (x, m) is the Jacobi amplitude function
corresponding to modulus m, that is, the inverse of the function EllipticF (x, m).
Here once again we find an echo of our unsuccessful attempt to geometrize Newton’s
analysis. The same two angles θ and ϕ have the same relation to each other here as
in the earlier case. The difference is that the equation connecting r and ϕ is now
the equation of an ellipse rather than a straight line.
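The chain of substitutions leading from the θ-integral to EllipticF and the amplitude function can be checked numerically. The sketch below uses plain Simpson quadrature and bisection instead of a library elliptic-integral routine; $F(\phi, m)$ is taken in the parameter convention used above (integrand $1/\sqrt{1 - m\sin^2 w}$), and the values $\rho = r_s/(a(1-e^2)) = 0.01$ and $e = 0.2$ are exaggerated, hypothetical choices that make the effect visible.

```python
import math


def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule on [lo, hi]; n must be even."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


def elliptic_f(phi, m):
    """Incomplete elliptic integral F(phi, m) = ∫_0^phi dw / sqrt(1 - m sin^2 w)."""
    return simpson(lambda w: 1.0 / math.sqrt(1.0 - m * math.sin(w) ** 2), 0.0, phi)


def jacobi_am(x, m, tol=1e-13):
    """Amplitude am(x, m) for x >= 0, 0 <= m < 1: solve F(phi, m) = x by bisection."""
    lo, hi = x * math.sqrt(1.0 - m), x  # F(phi) lies in [phi, phi / sqrt(1 - m)]
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if elliptic_f(mid, m) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


rho, e = 0.01, 0.2                      # hypothetical, exaggerated values
C = math.sqrt(1 - rho * (3 - e))
m = 2 * rho * e / (1 - rho * (3 - e))   # = 2 rs e / (a(1 - e^2) - rs(3 - e))


def theta_direct(phi):
    """θ as the integral over s from -π/2 (aphelion) to phi."""
    g = lambda s: 1.0 / math.sqrt(1 - 3 * rho - rho * e * math.sin(s))
    return simpson(g, -math.pi / 2, phi)


def theta_elliptic(phi):
    """θ = (2/C) EllipticF(phi/2 + π/4, m)."""
    return (2 / C) * elliptic_f(phi / 2 + math.pi / 4, m)


def phi_from_theta(theta):
    """phi = 2 am(Cθ/2, m) - π/2."""
    return 2 * jacobi_am(C * theta / 2, m) - math.pi / 2


for phi in (-1.0, 0.0, 1.0, 2.5):
    th = theta_direct(phi)
    print(phi, th, theta_elliptic(phi), phi_from_theta(th))
```

The second and third columns should agree, and the last column should return the starting angle.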
We recall that ϕ = −π/2 at aphelion. Remeasuring now, so that ϕ = 0 at this
point (that is, relabeling ϕ itself), we find the simple relation
ϕ = 2 am (Cθ/2, m) .
From the form of the equation of the orbit in terms of ϕ, with ϕ being set equal
to 0 at one particular aphelion, we saw above that
$$r(1 - e\cos\phi) = a(1-e^2).$$
We now see that the actual polar equation in terms of the observable angle θ
looks almost the same, and we have the following elegant result:
Theorem 4.4. The observed polar coordinates (r, θ) of the orbiting particle
satisfy the equation
$$r\bigl(1 - e\cos[2\,\mathrm{am}(C\theta/2,\ m)]\bigr) = a(1-e^2). \qquad (4.20)$$
For the rest of the argument, we need to insert some numerical values and
do some precise calculations. The reader is referred to Mathematica Notebook 7
in Volume 3 for the necessary details, including the solution of the differential
equation. We now take those results as known.
With these data, we quickly get numerical values for the constant C and the
elliptic modulus m:
$$C = 1 - 7.441360\times 10^{-8}, \qquad m = 2.1903594\times 10^{-8}.$$
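The text takes its inputs from Mathematica Notebook 7, which is not reproduced here. As a rough cross-check, the sketch below recomputes $C$ and $m$ from commonly quoted values for the Sun's gravitational parameter and Mercury's orbit ($GM = 1.32712440018\times 10^{20}$ m³/s², $a = 5.7909\times 10^{10}$ m, $e = 0.20563$). These inputs are assumptions, so agreement with the quoted numbers is expected only to about three significant figures.

```python
import math

GM_SUN = 1.32712440018e20   # m^3/s^2 (assumed standard value)
C_LIGHT = 299792458.0       # m/s
A_MERCURY = 5.7909e10       # m, semi-major axis (assumed)
E_MERCURY = 0.20563         # eccentricity (assumed)

rs = 2 * GM_SUN / C_LIGHT**2                 # Schwarzschild radius of the Sun, about 2953 m
p = A_MERCURY * (1 - E_MERCURY**2)           # a(1 - e^2)
x = rs * (3 - E_MERCURY) / p                 # so that C = sqrt(1 - x)
one_minus_C = x / (1 + math.sqrt(1 - x))     # 1 - sqrt(1 - x) without cancellation
m = 2 * rs * E_MERCURY / (p - rs * (3 - E_MERCURY))

print(f"1 - C = {one_minus_C:.6e}")
print(f"m     = {m:.6e}")
```

Computing $1 - C$ as $x/(1 + \sqrt{1-x})$ avoids subtracting two numbers that agree to seven digits.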
We can now see how very nearly equal ϕ and θ are. As a function of ϕ, r has
period 2π. But an increment of 2π in the value of ϕ corresponds to a slightly larger
increment in the polar angle θ. The actual increment in the polar angle θ for this
increment in ϕ is
$$\Delta\theta = \frac{2}{C}\int_0^{\pi}\frac{ds}{\sqrt{1 - m\sin^2 s}}.$$
When we insert the numerical values of C and m into this equation and evaluate
the integral, we find that
$$\Delta\theta = 2\pi + 5.01961\times 10^{-7}.$$
Another way of putting this is to say that the orbit of Mercury can be described
by the usual equation with right ascension ϕ(t) and radius r(t) related by
$$r(t)\bigl(1 - e\cos\phi(t)\bigr) = a(1-e^2),$$
except that ϕ(t) differs slightly from the observed right ascension θ(t) and, in particular,
has a slightly longer period than the period T of the right ascension. Thus,
we express the orbit in terms of a product of two observable, measurable periodic
functions r(t) and θ(t), which now have different periods. In the Newtonian elliptic
model, these two functions have the same period T. In the Newtonian limit,
in which the speed of information v0 is infinite and consequently the Newtonian
Schwarzschild radius rs is 0, we get m = 0, C = 1, ϕ(t) = θ(t), and so the relativistic
solution reduces to the Newtonian.
That small discrepancy, whereby the increment in θ between successive perihelions
exceeds 2π by about one two-millionth of a radian, accounts completely for the
previously unexplained portion of the precession of the perihelion of Mercury. In
fact, if we convert it into seconds (there are 3600 seconds in one degree) and find
the accumulated discrepancy over a century, when Mercury makes some 415 complete
revolutions, we learn that it is
$$5.01961\times 10^{-7}\times\frac{180}{\pi}\times 3600\times 415 \approx 42.9678''.$$
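Both numerical steps, the integral for Δθ and the conversion to seconds of arc per century, can be reproduced in a few lines. The sketch below uses the values of $C$ and $m$ quoted above, Simpson quadrature in place of Mathematica, and the round figure of 415 revolutions per century from the text. With these inputs the excess $\Delta\theta - 2\pi$ comes out at about $5.0196\times 10^{-7}$ rad, and the accumulated precession at about 42.97 seconds of arc.

```python
import math


def simpson(f, lo, hi, n=400):
    """Composite Simpson rule on [lo, hi]; n must be even."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


C = 1 - 7.441360e-8          # value quoted in the text
m = 2.1903594e-8             # value quoted in the text

integral = simpson(lambda s: 1 / math.sqrt(1 - m * math.sin(s) ** 2), 0.0, math.pi)
excess = (2 / C) * integral - 2 * math.pi        # Δθ - 2π in radians

ARCSEC_PER_RADIAN = 180 / math.pi * 3600
REVOLUTIONS_PER_CENTURY = 415                    # "some 415" revolutions, as in the text
per_century = excess * ARCSEC_PER_RADIAN * REVOLUTIONS_PER_CENTURY

print(f"excess      = {excess:.6e} rad")
print(f"per century = {per_century:.4f} arcsec")
```

Because the integrand is smooth and periodic over $[0, \pi]$, a few hundred Simpson panels are far more than enough here.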
As we began by saying that we were trying to explain the already observed
fact that the perihelion of Mercury was precessing by about 43 seconds of arc per
century more than could be explained as the result of the gravitational action of
the other planets, this very precise “fit” between theory and observation provides
powerful evidence that the theory is more precise than the Newtonian theory.
Remark 4.12. Taking the Sun as origin, we can regard the plane of Mercury’s
orbit as the complex plane and write the location of Mercury in it at time t as a
complex-valued function of t, namely
$$r(t)e^{i\theta(t)}.$$
In Newtonian mechanics, the two functions $r(t)$ and $e^{i\theta(t)}$ have the same period,
and therefore their product also has that period. In relativity, due to the different
FitzGerald–Lorentz contractions in the radial and tangential directions, the two
periods are different. As noted above, there is a constant $k = 2K/\pi$ such that
$\theta(t) - kt$ has period 2π, and thus the location of the planet at time t is given as a
product of two periodic functions:
$$r(t)e^{i\theta(t)} = r(t)e^{i\theta(t) - ikt}\,e^{ikt},$$
whose periods are 2π and $2\pi/k = \pi^2/K$. The function is now periodic if and only
if the ratio of the two periods, $2K/\pi$, is a rational number. In any case, however, it
where the frequencies λn are not necessarily integers. Thus, even the relativistic
orbits confirm that Ptolemy’s epicycle approach to astronomy, as formulated by
Sternberg ([77], pp. 1–14), is feasible in theory.
7.3. Kepler’s third law*. Kepler’s first law has similar forms in Newtonian
and relativistic physics, namely
$$r(1 + e\cos\theta) = a(1-e^2),$$
$$r(1 + e\cos\phi) = a(1-e^2),$$
where ϕ is an elliptic function of θ (measured, in this formula, from perihelion, in
contrast to the formula given above, where it was measured from aphelion).
Similarly, Kepler’s second law merely expresses conservation of angular mo-
mentum per unit rest mass:
$$l = r^2\,\frac{d\theta}{dt},$$
$$l = r^2\,\frac{d\theta}{ds}.$$
The precession of perihelion means that r is no longer known to be a periodic
function of θ, so that it would not appear to make sense to speak of its period.
Still, relativity has not changed the aphelion and perihelion distances, so that r
has a definite period as a function of time, and the average of the perihelion and
aphelion distances is still defined. Those are the ingredients of Kepler’s third law.
The period T can be taken as the time elapsed between successive perihelia, and it
might be measured in either “laboratory” (heliocentric) time or in proper time on
the orbiting planet. Thus, we have already two possible forms for Kepler’s third
law, depending on which time we wish to use. We might get two more forms by
considering the time required for θ to increase by 2π. In only one case, namely the
case of circular orbits, would this period also be a period of the actual position of the
planet. In that one case, we do indeed find that Kepler’s third law holds. Replacing
r′ by 0 makes F independent of r′ and thereby changes the Euler equation.
For a circular orbit of radius a, we have
$$1 = \left(1 - \frac{r_s}{r}\right)\left(\frac{dt}{ds}\right)^2 - \frac{r^2}{c^2}\left(\frac{d\theta}{ds}\right)^2 = F(t, r, \theta, t', r', \theta').$$
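By Euler’s equation on the variable r, we thus get (since F is independent of
r′, which means ∂F/∂r′ = 0)
$$0 = \frac{d}{ds}\frac{\partial F}{\partial r'} = \frac{\partial F}{\partial r} = \frac{r_s}{r^2}\left(\frac{dt}{ds}\right)^2 - \frac{2r}{c^2}\left(\frac{d\theta}{ds}\right)^2.$$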
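Solving the last equation for the ratio of $(d\theta/ds)^2$ to $(dt/ds)^2$ gives $(d\theta/dt)^2 = c^2 r_s/(2r^3)$; with $r_s = 2GM/c^2$ this is $GM/r^3$, Kepler's third law for a circular orbit, with the period measured in the heliocentric coordinate time t. The sketch below evaluates the resulting period for a circular orbit of radius 1 AU; the constants are standard values assumed for illustration, not taken from the text.

```python
import math

GM_SUN = 1.32712440018e20   # m^3/s^2, assumed standard value
C_LIGHT = 299792458.0       # m/s
AU = 1.495978707e11         # m, radius of the sample circular orbit

RS = 2 * GM_SUN / C_LIGHT**2   # Schwarzschild radius, rs = 2GM/c^2


def circular_period(r):
    """Coordinate-time period of a circular orbit from (dθ/dt)^2 = c^2 rs / (2 r^3)."""
    omega = math.sqrt(C_LIGHT**2 * RS / (2 * r**3))   # dθ/dt
    return 2 * math.pi / omega


def kepler_period(r):
    """Classical Kepler period 2π sqrt(r^3 / GM), for comparison."""
    return 2 * math.pi * math.sqrt(r**3 / GM_SUN)


T_days = circular_period(AU) / 86400
print(f"T = {T_days:.3f} days; ratio to Kepler = {circular_period(AU) / kepler_period(AU):.15f}")
```

The period comes out at about 365.26 days, and the ratio to the classical value is 1 up to rounding.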
32 The theory of these functions was developed single-handedly by Harald Bohr (1887–1951)
during the second decade of the twentieth century in an attempt to settle the Riemann hypothesis
(the still-unproved conjecture that all the nontrivial zeros of the Riemann zeta function have real
part equal to 1/2). Harald Bohr was the brother of the famous physicist Niels Bohr (1885–1962).
8. THE SPEED OF LIGHT 177
to ∞. The increase is sufficiently fast that even at the visible surface of the Sun,
which corresponds to ρ = 2.3 × 105 ρs , we have
$$\frac{r}{\rho} \approx 0.999988.$$
The error in replacing our measured ρ by r is therefore about 0.001%, which is
not much to worry about. The difference ρ − r is very small indeed, since
we have the formula
$$\rho - r = \rho_s\left(\frac{1}{4} + \frac{1}{2\bigl(1 + \sqrt{1 - \rho_s/\rho}\bigr)}\right) < \frac{3\rho_s}{4}.$$
Outside the radius of the Sun, this difference has the nearly constant value
of ρs /2, which is less than 2 km. Given that we have no empirical proof that
the spatial portion of space-time actually is Euclidean, we could not prove anyone
wrong who chose to use r instead of ρ in measuring interplanetary distances. The
two sets of coordinates would give results that are experimentally indistinguishable.
For example, even the very precise values we used for the perihelion and aphelion
distances of Mercury were given with a precision only up to ±50 km, which is nearly
30 times as large as this difference.
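Reading the closed form above as $\rho - r = \rho_s\bigl(\tfrac14 + 1/(2(1 + \sqrt{1 - \rho_s/\rho}))\bigr)$, a few evaluations confirm the two claims in this paragraph: the difference equals $3\rho_s/4$ at $\rho = \rho_s$ and settles quickly to about $\rho_s/2$. The sketch uses $\rho_s = 3$ km and the solar radius 697,000 km, the values used for the Sun later in this chapter; the other sample radii are arbitrary.

```python
import math


def rho_minus_r(rho, rho_s):
    """ρ - r = ρs (1/4 + 1/(2(1 + sqrt(1 - ρs/ρ)))), valid for ρ >= ρs."""
    return rho_s * (0.25 + 0.5 / (1 + math.sqrt(1 - rho_s / rho)))


RHO_S = 3.0   # km, Schwarzschild radius of the Sun (rounded, as in the text)
for rho in (RHO_S, 10 * RHO_S, 697000.0, 5.8e7):   # km; 697,000 km is the solar radius
    print(f"rho = {rho:>12.1f} km   rho - r = {rho_minus_r(rho, RHO_S):.6f} km")
```

At the solar surface the difference is already 1.5 km to within a few millimeters.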
We introduce isotropic spherical coordinates and isotropic rectangular coordi-
nates by replacing ρ with r while retaining the co-latitude (ϕ) and longitude (θ):
x = r sin ϕ cos θ ,
y = r sin ϕ sin θ ,
z = r cos ϕ .
Naturally, these rectangular coordinates are not exactly equal to the ones we
have been using, but, as already noted, the difference is too small to measure. With
this modification, we then alter the metric so that Eq. (4.16) is replaced by
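$$\begin{aligned}ds^2 &= \left(\frac{r - r_s}{r + r_s}\right)^2 dt^2 - \frac{1}{c^2}\left(1 + \frac{r_s}{r}\right)^4\left(dr^2 + r^2\,d\phi^2 + r^2\sin^2\phi\,d\theta^2\right)\\ &= \left(\frac{r - r_s}{r + r_s}\right)^2 dt^2 - \frac{1}{c^2}\left(1 + \frac{r_s}{r}\right)^4\left(dx^2 + dy^2 + dz^2\right).\end{aligned}$$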
The speed of light cr at distance r from the attracting particle is now the same
in every direction, namely
$$c_r = \sqrt{\left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2 + \left(\frac{dz}{dt}\right)^2} = \frac{1 - r_s/r}{(1 + r_s/r)^3}\,c.$$
The independence of the speed of light from direction is the reason for the name
isotropic applied to these coordinates. When we add the gravitational fields of a
number of particles, isotropic coordinates simplify the computations; and we shall
use them in Chapter 7.
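For a light ray ds = 0. Setting the isotropic form of the metric to zero shows where the direction-independence comes from: only the combination $dx^2 + dy^2 + dz^2$ appears, so its ratio to $dt^2$ is fixed by r alone.

```latex
0 = \left(\frac{r-r_s}{r+r_s}\right)^2 dt^2
    - \frac{1}{c^2}\left(1+\frac{r_s}{r}\right)^4\left(dx^2+dy^2+dz^2\right)
\quad\Longrightarrow\quad
c_r^2 = \frac{dx^2+dy^2+dz^2}{dt^2}
      = c^2\left(\frac{r-r_s}{r+r_s}\right)^2\left(1+\frac{r_s}{r}\right)^{-4},
\qquad
\frac{r-r_s}{r+r_s} = \frac{1-r_s/r}{1+r_s/r}
\;\Longrightarrow\;
c_r = \frac{1-r_s/r}{(1+r_s/r)^3}\,c .
```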
Remark 4.13. The speed of light approaches zero as r decreases to the Schwarz-
schild radius rs , but increases rapidly to its value in “empty” space c as r → ∞.
At Mercury’s orbit, it is already 0.9999997c.
9. DEFLECTION OF LIGHT NEAR THE SUN 179
One of the two great early triumphs of general relativity was the explanation of the
precession of Mercury’s perihelion, which we have just discussed. The other was the
prediction of the amount of deflection a ray of light would undergo when passing
near to the Sun. There is a certain irony in the history of this second triumph
of general relativity, as one can see by comparing the computation reported by
Soldner in 1801, which was based on Newtonian mechanics, and the one Einstein
reported in 1911, which was based on relativity. The predicted deflections are
identical!! Einstein was not aware of what Soldner had done a century earlier.33
He was very lucky that no one took up his urgent request for an investigation in
1911. If his prediction of that year had been observed, it would have confirmed
Newtonian and relativistic mechanics equally well. As it was, the war intervened,
during which Einstein published his general theory and Schwarzschild produced his
elegant solutions of the field equations in free space, under which the predicted
deflection is approximately doubled. It was not until the eclipse of 29 May 1919
that two expeditions were dispatched to make the measurements Einstein had been
urging, one to the island of Principe, a Portuguese possession off the west coast of
Africa, the other to Sobral in northern Brazil. Despite intermittent cloudy, rainy
weather, both were able to take a number of photographs, which could then be
studied and used to measure the deflection of light from stars near the Sun. We
now take up the comparison of Soldner’s original result with Einstein’s revised
result.
9.1. Soldner’s result. Since the speed of light is not a barrier in Newtonian
mechanics, we can, like Soldner, take note of the fact that in the Newtonian scheme
all bodies, of whatever mass, undergo the same acceleration under Newton’s law
of gravity, and just extend that fact to include massless particles like photons. At
the perihelion distance $r = r_{\mathrm{peri}}$, the velocity is perpendicular to the radius vector
and hence the speed is $c = r_{\mathrm{peri}}\,\theta'$. Since the angular momentum l per unit mass
is conserved, it follows that $l = r_{\mathrm{peri}}\,c$ at all times. Since a particle of light moves
faster than many observed material particles that describe hyperbolic orbits, and
theory guarantees that the orbit must be a conic section, we conclude that the path
of a light ray is hyperbolic and has a polar equation of the form
$$r(1 + e\cos\theta) = a(e^2 - 1),$$
where e > 1 and the origin of these polar coordinates is at the center of the Sun.
We take the amount of deflection in passing near the Sun to be the angle
between the asymptotes of this hyperbola, that is, the angle that does not contain
the hyperbola. In rectangular coordinates the equation of the hyperbola is
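$$\frac{(x+c)^2}{a^2} - \frac{y^2}{b^2} = 1,$$
where $c = \sqrt{a^2 + b^2}$ (here c is the distance from the center of the hyperbola to a focus, not the speed of light). The foci of this hyperbola are at x = 0 and x = −2c. We shall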
assume that the Sun is at x = 0 and that the path of the light ray is the branch
of the hyperbola that crosses the x-axis at x = a − c (which, as shown in Fig. 4.6,
is a negative number). The asymptotes have the equations y = ±(b/a)(x + c).
The limiting value of y/x as x goes to infinity on the hyperbola is b/a, and the
minimum value of the polar angle θ is the arctangent of this value; we denote it
θ∞ , that is, θ∞ = arctan(b/a). The angle ϕ between the asymptotes is π − 2θ∞ .
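Soldner's angle can be evaluated directly once e is known. The sketch below assumes the standard Newtonian relation $l^2 = GM\,r_{\mathrm{peri}}(1+e)$ for a conic with perihelion distance $r_{\mathrm{peri}}$; combined with $l = r_{\mathrm{peri}}c$ and $\rho_s = 2GM/c^2$, it gives $e = 2r_{\mathrm{peri}}/\rho_s - 1$. For this hyperbola $e = \sqrt{a^2+b^2}/a$, so $b/a = \sqrt{e^2-1}$, and $\phi = \pi - 2\arctan(b/a) = 2\arctan(a/b)$. The inputs $\rho_s = 3$ km and $r_{\mathrm{peri}} = 697{,}000$ km are the values used for the relativistic computation later in this section.

```python
import math

ARCSEC_PER_RADIAN = 180 / math.pi * 3600


def newtonian_deflection(r_peri, rho_s):
    """Angle ϕ between the asymptotes of Soldner's hyperbolic light path (radians)."""
    e = 2 * r_peri / rho_s - 1          # from l^2 = GM r_peri (1 + e) with l = r_peri c
    b_over_a = math.sqrt(e * e - 1)     # since e = sqrt(a^2 + b^2) / a
    # ϕ = π - 2θ∞ with θ∞ = arctan(b/a); written as 2 arctan(a/b) to avoid
    # subtracting two nearly equal numbers when the path is almost straight
    return 2 * math.atan(1 / b_over_a)


phi = newtonian_deflection(697000.0, 3.0)
print(f"phi = {phi:.6e} rad = {phi * ARCSEC_PER_RADIAN:.4f} arcsec")
```

The result, about 0.888 seconds of arc, is roughly half of the relativistic value 1.77659″ reported later in this section.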
33 One wonders why he didn’t undertake a Newtonian computation for comparison. Soldner
had no idea that photons were massless, but Einstein knew this had to be the case because of the
relativistic increase of mass with velocity. If Lucretius and Soldner had known about photons, or
what is now called dark matter, they would probably not have asserted that everything is either
matter or a void. Perhaps Einstein assumed that on Newtonian principles gravity could not affect
a massless particle. In the quotation, he seems quite confident that any bending of light around
the Sun would tell in favor of relativistic mechanics vis-à-vis Newtonian.
[Figure 4.6: The path of a light ray past the Sun, showing a point (x, y) on the path, its polar angle θ, the limiting polar angle θ∞, and the angle ϕ between the asymptotes.]
that the equation makes sense for 0 ≤ u ≤ uperi , that is, that the expression giving
p2 in terms of u is nonnegative on this interval. That fact follows from elementary
algebra. By Descartes’ rule of signs, the cubic polynomial in u on the right has
precisely one negative root, say u = −ε, where ε > 0. It has the positive root
u = uperi , and the sum of its roots is the negative of the ratio of the coefficient
of u2 to the coefficient of u3 . That is, the sum of the roots is 1/ρs . This means
that the other positive root r is 1/ρs + ε − uperi , which is larger than 1/ρs − uperi .
This number is certainly larger than $u_{\mathrm{peri}}$, since $1/\rho_s$ is larger than $100{,}000\,u_{\mathrm{peri}}$.
It follows that the polynomial is positive over the entire interval $0 \le u < u_{\mathrm{peri}}$.
For the given values $\rho_s = 3$, $u_{\mathrm{peri}} = 1/697{,}000$, Mathematica reveals that
$$\phi = 8.61313\times 10^{-6}\ \text{radians} = 1.77659''.$$
Thus, general relativity predicts a deflection almost exactly twice what New-
tonian mechanics predicts. Here then is a test case to see which of the two works
better, and it was (barely) within the limits of precision of the available instruments
a century ago to measure the difference. Of course, because the paths are so nearly
straight, it was necessary to get a ray of light that passed as close as possible to
the Sun, and thus such a ray would be detectible on Earth only during a total solar
eclipse.
9.3. The results of observation. The raw data obtained by a camera pho-
tographing the heavens do not give a direct read-out of the amount of deflection.
These data have to be massaged and averaged, and reported in terms of a mean
and standard deviation. On that basis, it appears that Einstein was right. The
massaged data from Sobral gave a deflection of 1.87 ± 0.13 seconds, while the data
from Principe yielded 1.98 ± 0.18 seconds. Both were large enough to rule out the
Newtonian model and close enough to fit the relativistic model.
There is more to be said, however, and discussion of these results has continued.
Sir Arthur Eddington was the most prominent member of the Principe expedition,
and he was known to be an enthusiastic proponent of the theory of relativity, and
also a man eager to reconcile with the Germans after the recent war. The suggestion
was made that, consciously or unconsciously, he manipulated his data. It seems
very unlikely that he would do so deliberately, since there were two independent
expeditions, and he could manipulate the data of only one of them. He would have
nothing to gain and everything to lose by such a risky gambit. For an account of
this controversy, see the paper of Harvey [37].
10. Problems
Problem 4.1. Verify Eq. (4.16).
Problem 4.2. Let us explore the “punctured disk” of nonzero relativistic
velocities as a two-dimensional manifold (see Appendix 2), with a metric $ds^2 =
g_{11}\,dx^2 + g_{12}\,dx\,dy + g_{21}\,dy\,dx + g_{22}\,dy^2$. We will find it easier to do this using polar
coordinates (r, θ), where r ranges over the real numbers in the interval (0, c) and θ
is the numerical measure of an angle, two angles being identified as usual if they
differ by 2π. To get the squared element of arc length, consider an infinitesimal
triangle with vertex at the origin and two sides equal to r and r + dr enclosing an
angle dθ. The third side will be ds. On the infinitesimal level, the squared element
of arc length is
$$ds^2 = \frac{r^2 + (r+dr)^2 - 2r(r+dr)\cos d\theta - r^2(r+dr)^2\sin^2 d\theta/c^2}{\bigl(1 - r(r+dr)\cos d\theta/c^2\bigr)^2}.$$
Expand every term in the numerator in a Maclaurin series in dr and dθ, using
the well-known expansions $\cos x = 1 - \tfrac{1}{2}x^2 + \tfrac{1}{24}x^4 - \cdots$ and $\sin^2 x = \tfrac{1}{2} - \tfrac{1}{2}\cos 2x = x^2 - \tfrac{1}{3}x^4 + \cdots$, retaining only terms of degree 2 or less in these infinitesimals.
Then expand the denominator as a geometric series, that is, using the expansion
$$(1-x)^{-2} = \frac{d}{dx}(1-x)^{-1} = \frac{d}{dx}\left(1 + x + x^2 + x^3 + \cdots\right) = 1 + 2x + 3x^2 + \cdots,$$
and multiply the two expansions together to show that
$$ds^2 = \left(1 - \frac{r^2}{c^2}\right)^{-2}dr^2 + r^2\left(1 - \frac{r^2}{c^2}\right)^{-1}d\theta^2.$$
Then compute the length of the radius from the origin to the point with polar
coordinates (r, θ) and the circumference of the circle through that point with center
at the origin. Here, r is the coordinate of the endpoint of the radius, not its length.
Denote the length R(r), and express the circumference of the circle as a function
of R.
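Readers who want to confirm the target formula of Problem 4.2 before working through the series can compare it numerically with the exact expression. In the sketch below (units with c = 1 and the sample point r = 0.5, both arbitrary choices), the relative gap between the two should shrink roughly in proportion to the step size, as expected when only cubic and higher terms have been dropped.

```python
import math


def ds2_exact(r, dr, dtheta, c=1.0):
    """Exact ds^2 from the infinitesimal triangle with sides r and r + dr."""
    r2 = r + dr
    num = (r * r + r2 * r2 - 2 * r * r2 * math.cos(dtheta)
           - (r * r2 * math.sin(dtheta)) ** 2 / c**2)
    den = (1 - r * r2 * math.cos(dtheta) / c**2) ** 2
    return num / den


def ds2_quadratic(r, dr, dtheta, c=1.0):
    """Second-order form (1 - r^2/c^2)^-2 dr^2 + r^2 (1 - r^2/c^2)^-1 dθ^2."""
    g = 1 - r * r / c**2
    return dr * dr / g**2 + r * r * dtheta * dtheta / g


def relative_gap(r, h):
    """Relative difference between the two forms with dr = dθ = h."""
    exact = ds2_exact(r, h, h)
    return abs(exact - ds2_quadratic(r, h, h)) / exact


for h in (1e-3, 1e-4, 1e-5):
    print(h, relative_gap(0.5, h))
```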
Problem 4.3. Compute the eight Christoffel symbols $\Gamma^k_{ij}$ and the Ricci tensor
$\mathrm{Ric}_{ab}$ for the metric of the previous problem.
Problem 4.4. The expression for ds2 in Problem 4.2 was known classically as
the first fundamental form. When the metric on a two-dimensional surface in ℝ³
has symmetry ($g_{12} = g_{21}$), the element of arc length can be written as³⁵
$$ds^2 = E\,dr^2 + 2F\,dr\,d\theta + G\,d\theta^2.$$
In that case, the element of area on the surface is
$$dA = \sqrt{EG - F^2}\,dr\,d\theta.$$
Compute this element of area. Then use the expression for dA to compute the
area enclosed by the (punctured) circle centered at the origin passing through the
point (r, 0). Finally, use the formula for dA to express the area of a triangle having
sides u, v with included angle η.
Problem 4.5. A finite piece of the hyperbolic plane can be represented accu-
rately as part of a pseudo-sphere in R3 (see Appendix 1). The portion in question
can be conveniently represented as the graph of a function in polar coordinates in
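an annulus, $0 < k_0 < r < k$, where $k_0$ may be an arbitrarily small positive number:
$$z(r,\theta) = k\left[\ln\left(\frac{k}{r} + \sqrt{\left(\frac{k}{r}\right)^2 - 1}\right) - \sqrt{1 - \left(\frac{r}{k}\right)^2}\,\right] = k\left[\operatorname{arcsech}\frac{r}{k} - \sqrt{1 - \left(\frac{r}{k}\right)^2}\,\right].$$
With only a small amount of tedium one can compute that
$$\frac{dz}{dr} = -\sqrt{\left(\frac{k}{r}\right)^2 - 1},$$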
35 We do apologize for the abuse of the letter G, which we have made strenuous efforts to
avoid in our notation for the Ricci tensor. It appears here as a metric coefficient. The notation is
due to Gauss, and seems too venerable to change.
$$ds^2 = dr^2 + r^2\,d\theta^2 + dz^2 = \left(1 + \left(\frac{dz}{dr}\right)^2\right)dr^2 + r^2\,d\theta^2 = \frac{k^2}{r^2}\,dr^2 + r^2\,d\theta^2.$$
Considering curves on the pseudo-sphere that are parameterized by arc length,
show that a geodesic on which the point closest to the z-axis is $(r_0, \theta_0, z(r_0, \theta_0))$
(assuming $k_0 < r_0$) must satisfy the system of Euler equations
$$\theta' = \frac{r_0}{r^2}, \qquad r\,r'' - r'^2 = \left(\frac{r_0}{k}\right)^2.$$
Then show that the curve in the annulus whose polar equation is
$$r = \frac{k r_0}{\sqrt{k^2 - r_0^2(\theta - \theta_0)^2}}$$
is such a geodesic.
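A numerical consistency check is possible before doing the algebra. Combining $\theta' = r_0/r^2$ with the unit-speed condition $(k^2/r^2)r'^2 + r^2\theta'^2 = 1$ gives the first integral $(dr/d\theta)^2 = r^4(r^2 - r_0^2)/(k^2 r_0^2)$. The sketch below checks that the proposed curve satisfies it, using central differences for $dr/d\theta$ and the hypothetical sample values $k = 2$, $r_0 = 0.5$, $\theta_0 = 0$. It is a check on a first integral, not a full solution of the problem.

```python
import math

K, R0, THETA0 = 2.0, 0.5, 0.0   # hypothetical sample values with R0 < K


def r_of_theta(theta):
    """The proposed curve r = k r0 / sqrt(k^2 - r0^2 (θ - θ0)^2)."""
    return K * R0 / math.sqrt(K * K - R0 * R0 * (theta - THETA0) ** 2)


def first_integral_residual(theta, h=1e-6):
    """(dr/dθ)^2 - r^4 (r^2 - r0^2) / (k^2 r0^2), with dr/dθ by central differences."""
    r = r_of_theta(theta)
    drdtheta = (r_of_theta(theta + h) - r_of_theta(theta - h)) / (2 * h)
    return drdtheta ** 2 - r ** 4 * (r * r - R0 * R0) / (K * K * R0 * R0)


for theta in (0.0, 0.5, 1.0, 2.0, 3.0):
    print(theta, r_of_theta(theta), first_integral_residual(theta))
```

The residuals should be at the level of finite-difference rounding error, while r stays below k for these angles.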
Problem 4.6. Show that there is one other class of geodesics on the pseudo-sphere
not included in the family of curves given in the previous problem, namely
the curves whose parameterizations are $\bigl(r(s), \theta(s), z(r(s), \theta(s))\bigr)$, where
$$r(s) = r_0 e^{-s/k}, \qquad \theta = \theta_0, \qquad k\ln\frac{r_0}{k} < s < k\ln\frac{r_0}{k_0}, \qquad k_0 < r_0 < k.$$
These are the hyperbolic analogs of lines of longitude on a sphere, and the
parameter s is arc length.
Problem 4.7. Show that $\mathrm{Ric}_{ab} = \mathrm{Ric}_{ba}$ for the space-time metric of general
relativity and that the equation $\mathrm{Ric}_{ab} = 0$ is an identity when $a \ne b$.
Problem 4.8. Show that the equations $\mathrm{Ric}_{33} = 0$ and $\mathrm{Ric}_{44} = 0$ are consequences
of the equations $\mathrm{Ric}_{11} = 0 = \mathrm{Ric}_{22}$.
Problem 4.9. In the Newtonian orbital computation, use the fact that dr/dt =
0 at both aphelion and perihelion to express l2 in terms of the universal constants
G and M and the planet-specific constants a and e. That is, show that l2 =
GM a(1 − e2 ).
Problem 4.10. Prove that $\rho^2(d\theta/ds)$ is constant for any radial space-time
metric, that is, any metric of the form
$$ds^2 = f(\rho)\,dt^2 - \frac{1}{c^2}\left(g(\rho)\,d\rho^2 + \rho^2\,d\phi^2 + \rho^2\sin^2\phi\,d\theta^2\right).$$
Here, ρ is again the radial space coordinate, not charge density. Hint: Show
first that a suitable choice of coordinates allows us to take ϕ ≡ π/2, dϕ/ds = 0.
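$$\begin{aligned}\frac{2}{C} &= 2\left(1 - \frac{r_s(3-e)}{a(1-e^2)}\right)^{-1/2} = 2\left(1 - \frac{3r_s}{a}\cdot\frac{1 - e/3}{1 - e^2}\right)^{-1/2}\\ &= 2\left(1 - \frac{3r_s}{a}\left(1 - \frac{e}{3} + e^2 - \frac{e^3}{3} + \cdots\right)\right)^{-1/2} = 2\left(1 + \frac{3r_s}{2a} - \frac{r_s e}{2a} + \cdots\right),\end{aligned}$$
$$\int_0^{\pi}\left(1 - m\sin^2 s\right)^{-1/2}ds = \int_0^{\pi}\left(1 - \frac{m}{2} + \frac{m}{2}\cos(2s)\right)^{-1/2}ds = \int_0^{\pi}\left(1 + \frac{m}{4} - \frac{m}{4}\cos(2s) + \cdots\right)ds,$$
$$m = \frac{2r_s e}{a(1-e^2) - r_s(3-e)} = \frac{2r_s e}{a(1-e^2)} + \cdots.$$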
By neglecting terms that are not larger than $r_s e/a$ (since $r_s/a$ is already very
small, as is the eccentricity e = 0.205, so that e/3 is less than 7%), show that the
increment in the polar angle θ when the angle ϕ increases by 2π is approximately
$$\Delta\theta = 2\pi + \frac{3\pi r_s}{a(1-e^2)} = 2\pi + \frac{24\pi^3 a^2}{T^2c^2(1-e^2)},$$
as Einstein asserted.
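The quality of the approximation in Problem 4.11 can be checked numerically by comparing the full expression $(2/C)\int_0^\pi ds/\sqrt{1 - m\sin^2 s}$ with $2\pi + 3\pi r_s/(a(1-e^2))$. In the sketch below, rho stands for $r_s/(a(1-e^2))$. The value $5.326\times 10^{-8}$ is inferred from the quoted C through $1 - C^2 = (3-e)\rho$ with $e = 0.20563$ (an assumption, since the text does not list its inputs), and $\rho = 0.01$ is an exaggerated value for contrast.

```python
import math


def simpson(f, lo, hi, n=400):
    """Composite Simpson rule on [lo, hi]; n must be even."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


def excess_exact(rho, e):
    """Δθ - 2π from (2/C) ∫_0^π ds / sqrt(1 - m sin^2 s), with rho = rs / (a(1 - e^2))."""
    x = rho * (3 - e)
    C = math.sqrt(1 - x)
    m = 2 * rho * e / (1 - x)
    integral = simpson(lambda s: 1 / math.sqrt(1 - m * math.sin(s) ** 2), 0.0, math.pi)
    return (2 / C) * integral - 2 * math.pi


def excess_first_order(rho):
    """Δθ - 2π ≈ 3π rs / (a(1 - e^2))."""
    return 3 * math.pi * rho


for rho in (5.326e-8, 0.01):
    exact, approx = excess_exact(rho, 0.20563), excess_first_order(rho)
    print(f"rho = {rho:.3e}  exact = {exact:.6e}  first order = {approx:.6e}  "
          f"relative gap = {abs(exact - approx) / exact:.1e}")
```

For Mercury the two agree to well within the precision of the data; only for the exaggerated value does the neglected second-order term become visible, at the level of a few percent.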
Problem 4.12. For a metric that has symmetry, that is, gij = gji , it is trivial
to show that the Christoffel symbols are also symmetric in the two subscripts. Show
that in this case, the metric coefficients gij satisfy the system of first-order linear
partial differential equations
$$\frac{\partial g_{ij}}{\partial x^k} = \Gamma^l_{ik}\,g_{lj} + \Gamma^l_{jk}\,g_{il}.$$
It follows that the metric coefficients can be determined from the values they
have at any one point, provided the Christoffel symbols are given.
Problem 4.13. Use the result of the last problem to establish the dual differ-
ential equation
$$\frac{\partial g^{ij}}{\partial x^k} = -\left(\Gamma^j_{mk}\,g^{im} + \Gamma^i_{mk}\,g^{jm}\right).$$
Problem 4.14. Again assuming symmetry of the metric coefficients, show that
if $s \mapsto \bigl(x^1(s), \ldots, x^n(s)\bigr) = \gamma(s)$ is the parameterization of a path of minimal total
length, using arc length s as a parameter, then γ satisfies the system of differential
equations
$$(x^m)''(s) + \Gamma^m_{jk}\bigl(x^1(s), \ldots, x^n(s)\bigr)\,(x^j)'(s)\,(x^k)'(s) = 0,$$
for m = 1, 2, . . . , n.
Problem 4.15. Show that if the arc length s in the previous problem is replaced
by an arbitrary parameter t for which ds/dt > 0 at all points, the equation satisfied
is
$$(x^m)''(t) - \frac{d^2s/dt^2}{ds/dt}\,(x^m)'(t) + \Gamma^m_{jk}\bigl(x^1(t), \ldots, x^n(t)\bigr)\,(x^j)'(t)\,(x^k)'(t) = 0.$$
Notice that this equation is the same as the one derived in the previous problem
if s is a linear function of t.
Problem 4.16. Show that the Christoffel symbols are not altered (and hence
neither is the Ricci tensor) if each of the metric coefficients gij is multiplied by the
same constant k. Thus, curvature is independent of the scale by which distances
are measured, as we should hope if it is to be the same number for all observers.
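Problem 4.17. Show that there is no surface $z = f(r, \theta)$ in ℝ³ for which the
metric coefficients induced from the metric on ℝ³ are given by the diagonal matrix
$$\begin{pmatrix} \dfrac{r}{r + r_s} & 0 \\ 0 & r^2 \end{pmatrix}.$$
By the phrase “induced from the metric on ℝ³,” we mean that
$$\begin{aligned}ds^2 &= dr^2 + r^2\,d\theta^2 + dz^2 = dr^2 + r^2\,d\theta^2 + \left(\frac{\partial f}{\partial r}\,dr + \frac{\partial f}{\partial\theta}\,d\theta\right)^2\\ &= \left(1 + \left(\frac{\partial f}{\partial r}\right)^2\right)dr^2 + 2\,\frac{\partial f}{\partial r}\,\frac{\partial f}{\partial\theta}\,dr\,d\theta + \left(r^2 + \left(\frac{\partial f}{\partial\theta}\right)^2\right)d\theta^2.\end{aligned}$$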
Problem 4.18. Show that a nonconstant uniformly almost-periodic function
of time has more than one local extreme value. The definition of uniform almost-periodicity
is as follows: A complex-valued function f(t) of a real variable is uniformly
almost-periodic if for every ε > 0 there is a length $L_\varepsilon$ such that every interval
of length at least $L_\varepsilon$ contains an ε-translate, which is a number T such that
$$|f(t + T) - f(t)| < \varepsilon$$
for all real numbers t. The fundamental theorem of almost-periodic functions says
that if f(t) is such a function, then for every ε > 0 there is a finite generalized
trigonometric polynomial
$$p(t) = \sum_{j=1}^{n} c_{\lambda_j}\,e^{i\lambda_j t}$$
such that |f (t) − p(t)| < ε for all t. The terms in this finite sum represent Ptolemy’s
epicycles. The frequencies λj are arbitrary, so that these polynomials are generally
not periodic.
Problem 4.19. Finish the proof of Theorem 4.3.
Problem 4.20. Is it possible to endow space-time with a metric of the form
$$ds^2 = f(\rho)\,dt^2 - \frac{1}{c^2}\left(g(\rho)\,d\rho^2 + \rho^2\,d\phi^2 + \rho^2\sin^2\phi\,d\theta^2\right)$$
such that the following conditions are met?
(1) f (ρ) → 1 and g(ρ) → 1 as ρ → ∞. (That is, the metric approaches the
flat-space metric at infinity.)
(2) The equations of a geodesic given by the Euler equations are Newton’s
equations of motion (4.5) and (4.6).
Hint: Notice the quotation from Eddington at the beginning of Section 4.
Problem 4.21. Derive Eqs. (4.2) and (4.3) by using a moving frame of reference
$\omega_1(\theta) = \cos\theta\,i + \sin\theta\,j$, $\omega_2(\theta) = -\sin\theta\,i + \cos\theta\,j$ and setting the dot products of
(αr ) +(GM/r 3 )r with ω 1 and ω 2 equal to zero. Solve the second of these equations
for θ (t) and substitute that value in the first of them. If you try to do this without
the assistance of Mathematica, keep in mind that a fraction vanishes if and only if
its numerator vanishes.
188 4. PRECESSION AND DEFLECTION
Problem 4.22. Show that in Newtonian mechanics, the constant angular momentum per unit mass of a planet is $l = \sqrt{GMa(1 - e^2)}$, where a is the semi-major axis of the orbit (the mean of the planet’s greatest and least distances from the Sun) and e is the eccentricity of the elliptic orbit.
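Assuming the standard Newtonian vis-viva relation $v^2 = GM(2/r - 1/a)$ (not derived in the text), the formula of Problem 4.22 can be checked numerically. At perihelion and at aphelion the velocity is perpendicular to the radius, so $l = rv$ at those points. The orbital values below are approximate figures for the Earth, used only as an illustration.

```python
import math

# Approximate orbital data for the Earth (illustrative values)
GM_SUN = 1.32712440018e20   # m^3/s^2, G times the mass of the Sun
a = 1.495978707e11          # m, semi-major axis (1 astronomical unit)
e = 0.0167                  # orbital eccentricity


def vis_viva_speed(r):
    """Newtonian orbital speed at distance r on an ellipse with semi-major axis a."""
    return math.sqrt(GM_SUN * (2 / r - 1 / a))


# At perihelion and aphelion the velocity is perpendicular to the radius, so l = r v.
r_peri, r_aph = a * (1 - e), a * (1 + e)
l_peri = r_peri * vis_viva_speed(r_peri)
l_aph = r_aph * vis_viva_speed(r_aph)
l_formula = math.sqrt(GM_SUN * a * (1 - e ** 2))
print(l_peri, l_aph, l_formula)  # the three values agree (units m^2/s)
```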
Problem 4.23. Show that the solution of Eq. (4.3) with $u(0) = u_0$, $u'(0) = 0$ is the inverse of the function θ = θ(u) given by
$$\theta = \int_{u_0}^{u} \frac{dx}{\sqrt{q(u_0)\,e^{2r_s(x - u_0)} - q(x)}},$$
where $q(x) = x^2 - 2p/r_s$, $p = GM/l^2$, and $r_s = 2GM/c^2$.
Show also that in this case we have
$$u'' + u = r_s\,q(u_0)\,e^{2r_s(u - u_0)}.$$
Thus, the study of Eq. (4.3) is subsumed in the general study of equations of the form
$$u'' + u = a e^{bu},$$
with constants a and b.
Problem 4.24. Assuming Newtonian mechanics, imagine that a photon “falls” from infinity with initial speed c, and use conservation of energy to show that its speed v at distance r from the Sun satisfies
$$v^2 = c^2 + 2GM/r,$$
that is, $v = c\sqrt{1 + r_s/r}$. (In contrast to what we found about the speed of light in a relativistic gravitational field, Newtonian mechanics predicts that light slows down as r increases. At the Schwarzschild radius, the speed of light would be $\sqrt{2}\,c$. In the case of the Sun, however, where $r_s/r < 10^{-5}$ for any photon that does not actually fall into the Sun, the speed would be nearly constant, and the resulting path nearly straight.)
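A quick numerical illustration of the last remark in Problem 4.24, assuming standard values for G, the Sun’s mass, c, and the nominal solar radius (the constant names are ours). It computes $r_s = 2GM/c^2$, the largest ratio $r_s/r$ met by a photon that grazes the Sun, and the Newtonian speed $c\sqrt{1 + r_s/r}$ at that distance.

```python
import math

# Standard physical constants (SI units); the names are ours
G = 6.67430e-11          # gravitational constant, m^3 kg^-1 s^-2
M_SUN = 1.989e30         # mass of the Sun, kg
C = 2.99792458e8         # speed of light, m/s
R_SUN = 6.957e8          # nominal radius of the Sun, m

r_s = 2 * G * M_SUN / C ** 2           # Schwarzschild radius of the Sun (about 2.95 km)
ratio = r_s / R_SUN                    # largest r_s / r for a photon that misses the Sun
v_grazing = C * math.sqrt(1 + ratio)   # Newtonian v = c sqrt(1 + r_s / r) at r = R_SUN

print(f"r_s = {r_s:.0f} m, r_s/R = {ratio:.2e}, v/c - 1 = {v_grazing / C - 1:.2e}")
```

The ratio comes out a little above $4 \times 10^{-6}$, comfortably below the bound $10^{-5}$ quoted in the problem.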
CHAPTER 5
Concepts of Curvature, 1700–1850
In the preceding chapter, we left some important parts of the general theory of
relativity unexplained. To mention only two, we did not explain why the word
curvature is applied to describe what gravity is, and we did not explain why the
formula that we gave for computing the curvature really does represent something
that is intuitively a curvature. These two problems are the basic ones that lie
at the foundation of an understanding of general relativity. The path to that
understanding is a long and arduous one, even when it is stripped down to just the
problem of gravitation in empty space.
The connection between curvature and physics will be discussed in more detail
in Chapter 7. For now, it suffices to point out that the informal principle “forces produce curvature” describes a great deal of classical physics. The main example lies
at the very heart of Newtonian mechanics in the form of the law of inertia: A body
subject to no forces will move in a straight line at constant speed. In other words,
to produce any curving of its path, forces are required. And conversely, when elastic
forces are considered, curvature produces force. For example, the earliest attempt
to analyze the vibrating string used the principle that the restoring force at each
point of a stretched string is proportional to the curvature of the string at that
point. Later, Euler showed that a particle subject to forces that confine it to a
given surface but having no acceleration tangential to that surface will move along
a geodesic of the surface, that is, a path of minimal curvature (see Appendix 2 of
Volume 2).
Since the key concept is the curvature of space-time, we’ll start much farther
back and give some details of the classical subject of differential geometry, as devel-
oped in the eighteenth and nineteenth centuries. The twentieth-century concept of
a differentiable manifold provides sufficient generality for our purposes. The theory
of these manifolds is discussed in Appendix 4 of Volume 2. The evolution of the
concept of curvature can be divided chronologically into four phases, associated
with the names of Euler, Gauss, Riemann, and Ricci. The first two of these phases
are the subject of the present chapter.
Our first goal is to present a streamlined version of the eighteenth-century work
of Euler and the nineteenth-century work of Gauss on curvature. The context is
two-dimensional surfaces in $\mathbb{R}^3$. Our aim is to see how the curvature of a parameterized surface can be computed from knowledge of its metric coefficients alone,
independently of any embedding in $\mathbb{R}^3$. The notion of curvature that is needed
for relativity involves some rather sophisticated multilinear algebra, especially the
concept of a tensor and the differential geometry in which tensors are the natu-
ral language. But we approach the subject gently. Neither Euler nor Gauss had
these concepts; and, although we have foreshadowed them in Chapter 2 with our
discussion of covariant and contravariant objects and even computed with them in
190 5. CONCEPTS OF CURVATURE, 1700–1850
Chapter 4, we need to build a bridge to them starting from the basic calculus that
Euler and Gauss possessed (as, we presume, the reader does also). It is known (see Section 3 below) that Gauss was able to get the curvature formula we are seeking,
but he did so in a way that now seems unnecessarily cumbersome. What would
have streamlined his work, had he known about it, is the notion of a Christof-
fel symbol. We have given an algebraic definition of these in Chapter 4. In the
present chapter, we shall give a geometric definition and verify that it agrees with
the algebraic definition we gave earlier. The Christoffel symbols are the essential
bridge between the classical multi-variable calculus used by Euler and Gauss and
the tensor analysis that now pervades both geometry and physics. After we obtain
our formula for curvature in terms of the metric coefficients, we can devote the
following two chapters to a discussion of the use of tensors in differential geometry
and physics, bringing the technical part of this book to a close.
1. Differential Geometry
Differential geometry is the application of the differential and (to a lesser extent)
integral calculus to study the geometry of curves and surfaces. In ancient Greek
geometry, things that we now do easily and in great generality using algebra and
infinitesimal methods had to be done on the macro-level. It was possible in this
way to find the tangents to circles and conic sections, but more complicated curves
could not be handled.1 We begin by looking at material the reader has no doubt
seen before, but emphasizing certain points that may have escaped notice earlier.
1.1. Derivatives and differentials. The elementary material we wish to
review is the basic principle of differential calculus. Once you have the idea of
graphing a distance-against-time relation, expressing the distance y as a function
y = f (x) of time x, you get a very easy geometric interpretation of average velocity
over a time interval. In terms of Fig. 5.1, the average velocity is BC/AC = Δy/Δx,
which is geometrically the slope of the secant AB. If the time interval Δx is very
small, we expect that the average velocity will be close to what we intuitively think
of as the instantaneous velocity at the time $x_0$. Intuitively, as the time interval gets very short (B slides down the curve toward A), that secant should approximate a small section of the tangent at A, and hence its slope should approximate the slope of the tangent at A. Both geometrically and algebraically, there is a difficulty in
making sense of this, since we cannot make any sense out of Δy/Δx when Δx = 0.
(You cannot divide by zero.) We get lucky in the case of polynomial functions, however. For example, with a falling body, $y = y_0 + v_0 x + (g/2)x^2$, where $y_0$, $v_0$, and g are constants, and
$$\Delta y = y(x) - y(x_0) = v_0(x - x_0) + \frac{g}{2}(x^2 - x_0^2) = v_0(x - x_0) + \frac{g}{2}(x + x_0)(x - x_0) = \Bigl(v_0 + \frac{g}{2}(x + x_0)\Bigr)\Delta x.$$
Thus, for any value of x except $x = x_0$, we find that $\Delta y/\Delta x = v_0 + (g/2)(x + x_0)$. Although the left-hand side of this equation still does not make any sense when $x = x_0$, the right-hand side does: it is $v_0 + g x_0$. One would naturally be inclined to adopt that value as the definition of the instantaneous velocity at time $x = x_0$. This algebraic kindness
1 Of all the ancients, only Archimedes came close to discovering the secret of the derivative,
when he found that the tangent to an Archimedean spiral at the end of its first turn, the line from
the origin of the spiral to the point in question, and the perpendicular to that line at the origin
formed a right triangle whose area was equal to that of the circle through the point in question
with center at the origin. In other words, he established what we now recognize as a connection
between tangents and areas, dimly prefiguring the fundamental theorem of calculus.
[Fig. 5.1: the graph of y = f(x), with points A, B, C, D marked; Δx = dx = AC, Δy = CB, dy = CD.]
2 Logical purists like the philosopher George Berkeley (1685–1753) raised objections. Writing in The Analyst in 1734, less than a decade after the death of Newton, he pointed out (using the example of $y(x) = x^3$) what the fallacy was: You can’t set Δx = 0 in an equation derived under the assumption that Δx ≠ 0. But he didn’t wish to overthrow Newton’s calculus. He merely
pointed out that Newton’s use of this kind of argument was only heuristic, and that Newton was
happy to dispense with the infinitesimals once he had finite lines bearing the same ratio (CD and
AC in Fig. 5.1).
when Δx is small. What makes calculus work is the quantitative fact that, when we take dx = Δx (the differential of an independent variable equals its finite increment), the approximation $dy = f'(x_0)\,dx = f'(x_0)\,\Delta x \approx \Delta y$ is good in the sense that (Δy − dy)/Δx is very small when Δx is very small. In other words, the quantity Δy − dy ≈ 0 is not merely small; it is small even in comparison with the small quantity Δx, and the smaller Δx is taken, the smaller the ratio (Δy − dy)/Δx becomes. Although the denominator is becoming small, the numerator is becoming small so much faster that this ratio gets arbitrarily small.
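The falling-body computation and the claim that (Δy − dy)/Δx becomes arbitrarily small can both be checked with exact rational arithmetic. In this sketch the constants $y_0 = 10$, $v_0 = 3$, $g = 9.8$, and $x_0 = 2$ are arbitrary illustrative choices. For this quadratic one finds $\Delta y - dy = (g/2)(\Delta x)^2$, so the ratio equals $(g/2)\,\Delta x$ exactly.

```python
from fractions import Fraction

# Illustrative constants (assumed): y0 = 10, v0 = 3, g = 9.8, base time x0 = 2
y0, v0, g = Fraction(10), Fraction(3), Fraction(49, 5)
x0 = Fraction(2)


def y(x):
    """Position of the falling body, y = y0 + v0 x + (g/2) x^2."""
    return y0 + v0 * x + g / 2 * x * x


velocity = v0 + g * x0          # instantaneous velocity v0 + g x0
rows = []
for k in range(1, 6):
    dx = Fraction(1, 10 ** k)             # Delta x = dx
    delta_y = y(x0 + dx) - y(x0)          # Delta y
    quotient = delta_y / dx               # average velocity Delta y / Delta x
    dy = velocity * dx                    # the differential dy = y'(x0) dx
    rows.append((dx, quotient, (delta_y - dy) / dx))

for dx, quotient, ratio in rows:
    print(f"dx = {float(dx):.0e}  Dy/Dx = {float(quotient):.6f}  (Dy - dy)/Dx = {float(ratio):.1e}")
```

Because `Fraction` arithmetic is exact, the identity $\Delta y/\Delta x = v_0 + (g/2)(x + x_0)$ holds with no rounding at all, and the last column shrinks by a factor of 10 on each row.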
To summarize, there are three essential points to keep in mind: (1) For an
independent variable, the infinitesimal increment dx can be identified with the
finite increment Δx; (2) the differential dy is a linear function (constant multiple)
of the differential dx; (3) the linear mapping Δx = dx → dy is the best linear
approximation to the mapping Δx → Δy = f (x0 + Δx) − f (x0 ).
These three points are the essence of differential calculus. In practice, we often switch from the language of the derivative $f'(x)$ to the language of the differentials dy and dx and vice versa. For example, we may write a differential equation either in the form $f'(x) = a f(x)$ or dy = ay dx. It is the same equation either way. Solving it means finding the function f(x) such that the relation y = f(x) is equivalent to the relation $dy = f'(x)\,dx$ together with, say, an initial condition $y = y_0$ when $x = x_0$, that is, $f(x_0) = y_0$.
Despite the near-equivalence of derivatives and differentials in practice, differ-
ential geometry makes a distinction between them. That difference shows up best
when we have two sets of parameters that can be used to describe a geometrical
object. Suppose these are $x = (x^1, x^2, \dots, x^n)$ and $y = (y^1, y^2, \dots, y^n)$. (It is essential that n be the same in both cases.) Assuming each ordered set of parameters is in one-to-one correspondence with the points P of the same geometrical object, we get a natural association
$$(x^1, \dots, x^n) \leftrightarrow P \leftrightarrow (y^1, \dots, y^n).$$
Thus, P can be regarded as a function of either set of parameters, and, theoretically, linking these two associations defines each set of parameters as functions of the other, so that we get two sets of functions
$$\bigl(x^1(y^1, \dots, y^n), \dots, x^n(y^1, \dots, y^n)\bigr) \quad\text{and}\quad \bigl(y^1(x^1, \dots, x^n), \dots, y^n(x^1, \dots, x^n)\bigr).$$
If z = f(P) is a function defined on the geometric object, then z can be regarded as a function of either set of parameters. As such, it has partial derivatives:
$$\frac{\partial z}{\partial x^i} \quad\text{and}\quad \frac{\partial z}{\partial y^j}, \qquad i = 1, \dots, n, \quad j = 1, \dots, n.$$
It also has a total differential in each coordinate system:
$$dz = \frac{\partial z}{\partial x^1}\,dx^1 + \cdots + \frac{\partial z}{\partial x^n}\,dx^n \quad\text{and}\quad dz = \frac{\partial z}{\partial y^1}\,dy^1 + \cdots + \frac{\partial z}{\partial y^n}\,dy^n.$$
As in the case of a function of one variable, we can identify the differentials $dx^1, \dots, dx^n$ of the independent variables $x^1, \dots, x^n$ with their finite increments $\Delta x^1, \dots, \Delta x^n$. When we do, dz (with the partial derivatives evaluated at a given point $x_0 = (x_0^1, \dots, x_0^n)$ that is held constant) is a linear function of $dx^1, \dots, dx^n$, and the mapping
$$(\Delta x^1, \dots, \Delta x^n) = (dx^1, \dots, dx^n) \mapsto dz$$
is the best linear approximation to the mapping of the increments in the x-variables to the increment in z: $(\Delta x^1, \dots, \Delta x^n) \mapsto \Delta z = z(x_0^1 + \Delta x^1, \dots, x_0^n + \Delta x^n) - z(x_0^1, \dots, x_0^n)$.
We need to know how to translate the partial derivatives and the differentials
from one set of parameters to the other, since the two sets may not be equally
convenient in every situation. The important infinitesimal relation that allows us to
do this, and also contains the secret to understanding a huge amount of differential
geometry, is the chain rule, which has both a derivative form and a differential
form:
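$$\frac{\partial z}{\partial y^j} = \frac{\partial z}{\partial x^1}\,\frac{\partial x^1}{\partial y^j} + \cdots + \frac{\partial z}{\partial x^n}\,\frac{\partial x^n}{\partial y^j}, \qquad j = 1, \dots, n,$$
$$dy^j = \frac{\partial y^j}{\partial x^1}\,dx^1 + \cdots + \frac{\partial y^j}{\partial x^n}\,dx^n, \qquad j = 1, \dots, n.$$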
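To see the chain rule at work, the following sketch takes the hypothetical function $z = x^2 y$ and regards it as a function of the polar parameters $(r, \theta)$ through $x = r\cos\theta$, $y = r\sin\theta$. The polar coordinates play the role of the second set of parameters. The sketch compares the row vector of x-partials multiplied by the matrix of partials of $(x, y)$ with respect to $(r, \theta)$ against direct central-difference derivatives. The function, the evaluation point, and the step size are assumptions for illustration.

```python
import math


def z_of_xy(x, y):
    """Sample function z = x^2 y of the first set of parameters (x, y)."""
    return x * x * y


def grad_xy(x, y):
    """Row vector (dz/dx, dz/dy)."""
    return (2 * x * y, x * x)


def xy_of_polar(r, t):
    return (r * math.cos(t), r * math.sin(t))


def jacobian_xy_wrt_polar(r, t):
    """Matrix of partials: rows x, y; columns r, theta."""
    return ((math.cos(t), -r * math.sin(t)),
            (math.sin(t), r * math.cos(t)))


def grad_polar_by_chain_rule(r, t):
    """(dz/dr, dz/dtheta) as the row vector grad_xy times the Jacobian matrix."""
    gx, gy = grad_xy(*xy_of_polar(r, t))
    J = jacobian_xy_wrt_polar(r, t)
    return (gx * J[0][0] + gy * J[1][0], gx * J[0][1] + gy * J[1][1])


def grad_polar_numerically(r, t, h=1e-6):
    """Central differences of z regarded directly as a function of (r, theta)."""
    def z(rr, tt):
        return z_of_xy(*xy_of_polar(rr, tt))
    return ((z(r + h, t) - z(r - h, t)) / (2 * h),
            (z(r, t + h) - z(r, t - h)) / (2 * h))


chain = grad_polar_by_chain_rule(1.5, 0.7)
numeric = grad_polar_numerically(1.5, 0.7)
print(chain, numeric)
```

For this z one can also compute by hand that $\partial z/\partial r = 3r^2\cos^2\theta\sin\theta$, which the chain-rule value reproduces.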
These two sets of relations can be written as matrix equations, with the partial
derivatives of z written as single-rowed matrices and the differentials of the variables
as single-column matrices:3
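$$\nabla_y z = \begin{pmatrix} \frac{\partial z}{\partial y^1} & \frac{\partial z}{\partial y^2} & \cdots & \frac{\partial z}{\partial y^n} \end{pmatrix} = \begin{pmatrix} \frac{\partial z}{\partial x^1} & \frac{\partial z}{\partial x^2} & \cdots & \frac{\partial z}{\partial x^n} \end{pmatrix} \begin{pmatrix} \frac{\partial x^1}{\partial y^1} & \frac{\partial x^1}{\partial y^2} & \cdots & \frac{\partial x^1}{\partial y^n} \\ \frac{\partial x^2}{\partial y^1} & \frac{\partial x^2}{\partial y^2} & \cdots & \frac{\partial x^2}{\partial y^n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial x^n}{\partial y^1} & \frac{\partial x^n}{\partial y^2} & \cdots & \frac{\partial x^n}{\partial y^n} \end{pmatrix} = \nabla_x z\,\frac{\partial(x^1, \dots, x^n)}{\partial(y^1, \dots, y^n)}$$
and
$$dy = \begin{pmatrix} dy^1 \\ \vdots \\ dy^n \end{pmatrix} = \begin{pmatrix} \frac{\partial y^1}{\partial x^1} & \cdots & \frac{\partial y^1}{\partial x^n} \\ \vdots & \ddots & \vdots \\ \frac{\partial y^n}{\partial x^1} & \cdots & \frac{\partial y^n}{\partial x^n} \end{pmatrix} \begin{pmatrix} dx^1 \\ \vdots \\ dx^n \end{pmatrix} = \frac{\partial(y^1, \dots, y^n)}{\partial(x^1, \dots, x^n)}\,dx.$$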
These formulas contain the two mutually inverse Jacobian matrices that we
introduced in Section 10 of Chapter 2.
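$$\frac{\partial(x^1, \dots, x^n)}{\partial(y^1, \dots, y^n)} = \begin{pmatrix} \frac{\partial x^1}{\partial y^1} & \cdots & \frac{\partial x^1}{\partial y^n} \\ \vdots & \ddots & \vdots \\ \frac{\partial x^n}{\partial y^1} & \cdots & \frac{\partial x^n}{\partial y^n} \end{pmatrix}$$
and
$$\frac{\partial(y^1, \dots, y^n)}{\partial(x^1, \dots, x^n)} = \begin{pmatrix} \frac{\partial y^1}{\partial x^1} & \cdots & \frac{\partial y^1}{\partial x^n} \\ \vdots & \ddots & \vdots \\ \frac{\partial y^n}{\partial x^1} & \cdots & \frac{\partial y^n}{\partial x^n} \end{pmatrix}.$$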
Example 5.1. Consider the change from rectangular coordinates (x, y) to polar
coordinates (r, θ) in the right half-plane where x > 0. We have
$$\begin{cases} r = \sqrt{x^2 + y^2}, \\ \theta = \arctan(y/x), \end{cases} \qquad \begin{cases} x = r\cos\theta, \\ y = r\sin\theta. \end{cases}$$
3 The nabla-symbols $\nabla_x$ and $\nabla_y$ are a useful notation for the gradient operator that we shall use frequently in what follows. The symbols are inserted here just to show how they are expressed algebraically.
Then
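$$\frac{\partial(r, \theta)}{\partial(x, y)} = \begin{pmatrix} \dfrac{x}{\sqrt{x^2 + y^2}} & \dfrac{y}{\sqrt{x^2 + y^2}} \\[2mm] \dfrac{-y}{x^2 + y^2} & \dfrac{x}{x^2 + y^2} \end{pmatrix}, \qquad \frac{\partial(x, y)}{\partial(r, \theta)} = \begin{pmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{pmatrix}.$$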
It is easy to verify that the product of these two matrices is the identity matrix
by replacing either set of variables in the product with the expressions they have
in the equations of transformation.
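Example 5.1 can also be checked numerically. The sketch below (the helper names are ours) multiplies the two Jacobian matrices at a sample point with x > 0. It also multiplies their determinants, which work out to 1/r and r.

```python
import math


def jacobian_polar_wrt_xy(x, y):
    """The matrix d(r, theta)/d(x, y) of Example 5.1 (right half-plane, x > 0)."""
    rho2 = x * x + y * y
    rho = math.sqrt(rho2)
    return [[x / rho, y / rho],
            [-y / rho2, x / rho2]]


def jacobian_xy_wrt_polar(r, t):
    """The matrix d(x, y)/d(r, theta)."""
    return [[math.cos(t), -r * math.sin(t)],
            [math.sin(t), r * math.cos(t)]]


def matmul2(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def det2(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]


x, y = 3.0, 4.0
r, theta = math.hypot(x, y), math.atan2(y, x)  # atan2 equals arctan(y/x) when x > 0
A = jacobian_polar_wrt_xy(x, y)
B = jacobian_xy_wrt_polar(r, theta)
product = matmul2(A, B)          # should be the 2 x 2 identity matrix
det_product = det2(A) * det2(B)  # (1/r) * r
print(product, det_product)
```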
called tensors of type (0, 2), (1, 1), and (2, 0) respectively. For the first type, a
tensor T(u, v) of covariant rank 2, we write the matrix as
$$\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix},$$
meaning that the value of the tensor on a pair of (contravariant) vectors $u = u^i\,\partial/\partial x^i$, $v = v^j\,\partial/\partial x^j$ is
$$T(u, v) = \sum_{i,j=1}^{n} a_{ij}\,u^i v^j.$$
The tensor itself is written as $\sum a_{ij}\,dx^i\,dx^j$ to indicate its type. The metric tensor
ds2 introduced in Section 10 of Chapter 2 is an example of a tensor of this type,
and it acts on a pair of contravariant vectors (which in physical applications will
be velocities or four-velocities).
For a tensor of type (1, 1), say T(u, ω), acting on a vector $u = u^i\,\partial/\partial x^i$ and a covector $\omega = v_j\,dx^j$, we write the matrix as
$$\begin{pmatrix} a^1_1 & \cdots & a^1_n \\ \vdots & & \vdots \\ a^n_1 & \cdots & a^n_n \end{pmatrix},$$
meaning that
$$T(u, \omega) = \sum_{i,j=1}^{n} a^j_i\,u^i v_j.$$
We leave it to the reader to work out the matrix representation of a tensor of contravariant rank 2, say T(υ, ω), operating on a pair of covectors $\upsilon = u_i\,dx^i$ and $\omega = v_j\,dx^j$; its value would be
$$T(\upsilon, \omega) = \sum_{i,j=1}^{n} a^{ij}\,u_i v_j.$$
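A minimal sketch of how the three kinds of rank-2 tensors are evaluated once their components are stored as a matrix. The helper `evaluate_rank2` is hypothetical; its only purpose is to show which index of the matrix is paired with which argument. The metric example uses the plane in polar coordinates at r = 2, where $ds^2 = dr^2 + r^2\,d\theta^2$.

```python
def evaluate_rank2(a, p, q):
    """Return the sum over i, j of a[i][j] * p[i] * q[j].

    Type (0, 2): a[i][j] = a_ij,  p = (u^i), q = (v^j)  -> T(u, v)
    Type (2, 0): a[i][j] = a^ij,  p = (u_i), q = (v_j)  -> T(upsilon, omega)
    Type (1, 1): row j holds (a^j_1, ..., a^j_n), so T(u, omega) is
                 evaluate_rank2(a, omega, u): the row index pairs with v_j.
    """
    n = len(p)
    return sum(a[i][j] * p[i] * q[j] for i in range(n) for j in range(n))


# Metric tensor of the plane in polar coordinates at r = 2 (type (0, 2))
g = [[1.0, 0.0], [0.0, 4.0]]
u = [3.0, 0.5]                          # velocity components (dr/dt, dtheta/dt)
speed_squared = evaluate_rank2(g, u, u)  # g_ij u^i u^j = 9 + 4 * 0.25

# The mixed Kronecker delta, type (1, 1): delta(u, omega) = omega(u) = v_i u^i
delta = [[1.0, 0.0], [0.0, 1.0]]
omega = [2.0, -1.0]                     # covector components (v_1, v_2)
pairing = evaluate_rank2(delta, omega, u)
print(speed_squared, pairing)
```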
Remark 5.1. One advantage of using subscripts and superscripts is that they
naturally encode the Einstein summation convention: If an index appears in a
product as both subscript and superscript, the terms containing that index are to be
summed over the appropriate range. In this context, the superscript on a variable $x^i$ counts as a subscript when it appears in the denominator of a partial derivative, that is, in the expression $\partial/\partial x^i$, the i is considered a subscript. In that notation, we have
$$u = u^i\frac{\partial}{\partial x^i} = u^1\frac{\partial}{\partial x^1} + \cdots + u^n\frac{\partial}{\partial x^n} \quad\text{and}\quad \upsilon = u_i\,dx^i = u_1\,dx^1 + \cdots + u_n\,dx^n.$$
By convention, the variables themselves are written with superscripts. That
notation does not mean that they are either covariant or contravariant vectors.
Given the three kinds of tensors of rank 2, we see that our definition of the
Kronecker delta in Section 10 of Chapter 2 may have been unnecessarily restric-
tive. The identity matrix there was assumed to result from multiplying a square
contravariant matrix by a square covariant matrix, resulting in a square mixed
matrix. But it very well might represent the identity (covariant) matrix of metric
coefficients $g_{ij}$ or its inverse $g^{ij}$, in which case we could take all three symbols to
have the same literal meaning (denotation): $\delta_{ij} = \delta^i_j = \delta^{ij}$, the three cases being
distinguished by context (connotation). We shall mostly need the mixed symbol
that we have already introduced, however.
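Just to maximize our confusion, the coordinates $u^i$ of a contravariant vector u, being the result of applying a linear functional to u, transform in a covariant way. If we switch to coordinates $(y^1, \dots, y^n)$ in which the components of u are $(v^1, \dots, v^n)$, then
$$u^i\frac{\partial}{\partial x^i} = u^i\frac{\partial y^j}{\partial x^i}\frac{\partial}{\partial y^j} = \Bigl(u^i\frac{\partial y^j}{\partial x^i}\Bigr)\frac{\partial}{\partial y^j} = v^j\frac{\partial}{\partial y^j},$$
that is,
$$\begin{pmatrix} v^1 \\ \vdots \\ v^n \end{pmatrix} = \begin{pmatrix} \frac{\partial y^1}{\partial x^1} & \cdots & \frac{\partial y^1}{\partial x^n} \\ \vdots & & \vdots \\ \frac{\partial y^n}{\partial x^1} & \cdots & \frac{\partial y^n}{\partial x^n} \end{pmatrix} \begin{pmatrix} u^1 \\ \vdots \\ u^n \end{pmatrix}.$$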
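The transformation rule for components can be illustrated with the change from Cartesian to polar coordinates of Example 5.1. In this sketch the function names and the sample function $f = x^2 y$ are assumptions. The components $(v^r, v^\theta)$ are obtained by multiplying $(u^x, u^y)$ by the matrix $\partial(r, \theta)/\partial(x, y)$, and the directional derivative u(f) is then computed in both coordinate systems to confirm that it does not change.

```python
import math


def polar_components(x, y, ux, uy):
    """Components (v^r, v^theta) of u = ux d/dx + uy d/dy at (x, y), via
    v^j = (dy^j/dx^i) u^i with the matrix d(r, theta)/d(x, y) of Example 5.1."""
    rho2 = x * x + y * y
    rho = math.sqrt(rho2)
    J = [[x / rho, y / rho], [-y / rho2, x / rho2]]
    return (J[0][0] * ux + J[0][1] * uy, J[1][0] * ux + J[1][1] * uy)


def derivative_cartesian(x, y, ux, uy):
    """u(f) for the sample function f = x^2 y, using Cartesian components."""
    return ux * 2 * x * y + uy * x * x


def derivative_polar(r, t, vr, vt):
    """u(f) for the same f = r^3 cos^2(t) sin(t), using polar components."""
    df_dr = 3 * r * r * math.cos(t) ** 2 * math.sin(t)
    df_dt = r ** 3 * (math.cos(t) ** 3 - 2 * math.cos(t) * math.sin(t) ** 2)
    return vr * df_dr + vt * df_dt


x, y, ux, uy = 0.8, 0.6, -0.3, 1.7
vr, vt = polar_components(x, y, ux, uy)
print(derivative_cartesian(x, y, ux, uy),
      derivative_polar(math.hypot(x, y), math.atan2(y, x), vr, vt))
```

For example, at the point (1, 1) the vector ∂/∂x + ∂/∂y has polar components $(\sqrt{2}, 0)$: it points straight away from the origin.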
Remark 5.2. The reader is warned that the notation
$$\frac{\partial(x^1, \dots, x^n)}{\partial(y^1, \dots, y^n)}$$
usually denotes the determinant of this matrix. Since we have need of the matrix itself, we shall explicitly write
$$\det\frac{\partial(x^1, \dots, x^n)}{\partial(y^1, \dots, y^n)} \quad\text{or}\quad \left|\frac{\partial(x^1, \dots, x^n)}{\partial(y^1, \dots, y^n)}\right|$$
when we need to mention this determinant. The important fact about the Jaco-
bian, easily proved, is that it must be an invertible matrix if the correspondence
$(x^1, \dots, x^n) \leftrightarrow (y^1, \dots, y^n)$ is to be differentiable in both directions, and that is a
requirement we always impose. As matrices, we have
$$\frac{\partial(x^1, \dots, x^n)}{\partial(y^1, \dots, y^n)}\,\frac{\partial(y^1, \dots, y^n)}{\partial(x^1, \dots, x^n)} = \begin{pmatrix} 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & 0 \\ 0 & 0 & \cdots & 0 & 1 \end{pmatrix},$$
while as determinants the relation is simply
$$\det\frac{\partial(x^1, \dots, x^n)}{\partial(y^1, \dots, y^n)}\,\det\frac{\partial(y^1, \dots, y^n)}{\partial(x^1, \dots, x^n)} = 1,$$
as illustrated above in Example 5.1.
the differentials $dx^1, \dots, dx^n$ are really the same as the increments $\Delta x^1, \dots, \Delta x^n$, and amount to merely a new set of variables. In view of that fact, we feel free to alter our point of view on these differentials and regard them as operators that interact with partial derivatives according to the formula
$$dx^i\Bigl(\frac{\partial}{\partial x^j}\Bigr) = \delta^i_j.$$
The entry in row i, column j of the identity matrix is $\delta^i_j$. In the language of Appendix 4 of Volume 2, $\{dx^1, \dots, dx^n\}$ is the basis of the cotangent space dual to the basis $\{\partial/\partial x^1, \dots, \partial/\partial x^n\}$ of the tangent space. The tangent and cotangent spaces act on each other and can be treated symmetrically.
Parametrizations, vectors, and covectors are the basic tools of differential ge-
ometry. Our object in this chapter is to explore curved spaces using these tools, and
specifically to investigate the concept of curvature. Several different definitions of
curvature have been found useful in physics and geometry, but we shall start with
the classical notion of curvature of curves and surfaces in $\mathbb{R}^3$ or, more generally, $\mathbb{R}^n$, and generalize from there. Surfaces in $\mathbb{R}^3$ provide a very good visualizable model for
understanding the general situation. Although we live in a three-dimensional world,
we can imagine ourselves observing beings who are confined to a two-dimensional
surface embedded in that space. From our three-dimensional perspective, we can
visualize the surface, but they cannot; it is their whole universe and they cannot
imagine anything outside it. From our point of view, the metric, the geodesics, and
the curvature of that surface are determined by its embedding in $\mathbb{R}^3$, which serves
as a kind of scaffolding to be used when building a condominium. It is attached to
the “condominium,” which is the surface itself, but does not form an intrinsic part
of that structure. We visualize the “scaffolding” (tangent plane) as dangling in a
space that does not exist as far as the residents of the condominium know. Eventu-
ally, by removing the scaffolding, we—and the residents of the condominium—hope
to be able to study the condominium/surface intrinsically, replacing concepts de-
fined by use of the tangent plane and the third dimension with concepts that can
be defined in terms of the parameterization alone, without reference to anything
outside the surface. After we see how to do that for a surface in $\mathbb{R}^3$, we will be in
a position to do it for our own four-dimensional space-time, which is not, as far as
we know or need to know, embedded in any space of higher dimension. We shall
look at this imaginary world again at the end of the next section, after we have
introduced the general notion of curvature.
2.1. Curvature of a plane curve. The switch from the macroscopic point
of view of the Greek geometers to the modern application of differential calculus
to solve problems in geometry has had profound consequences. It would have been
impossible without two key ingredients: the algebraic apparatus that Descartes
brought to bear on geometry, and the infinitesimal reasoning of Descartes, Fermat,
Pascal, Newton, Leibniz, and other seventeenth-century mathematicians. Newton’s
Principia, although it contains the elements of differential calculus, still reflects a
preference for finitistic reasoning. Newton liked to express himself in the classical
language of Euclidean geometry. It took more than two centuries for the “action-at-
a-distance” view of gravity in Newtonian mechanics to be supplanted by reasoning
about a metric given in infinitesimal form.
In the previous section, we pointed out that differential calculus is useful be-
cause a large class of curves can be fitted closely with straight lines via the differ-
ential approximation described above. In accordance with that discussion, we note
that, for a fixed base point $x_0$, the straight line whose equation is
$$y = f(x_0) + f'(x_0)(x - x_0)$$
approximates the curve y = f(x) to first order. In the language of infinitesimals, on the first-order infinitesimal level, a smooth curve is a straight line. On that infinitesimal level, we replace $\Delta y = y - f(x_0)$ by dy and write $dy = f'(x_0)\,dx$. (We remind the reader that for an independent variable x, we have $\Delta x = dx = x - x_0$ on both the finite and infinitesimal levels.)
Linear functions are easily handled without infinitesimals, and that is because
in this case Δy is not just approximated by dy, but actually equal to it. Linear
motion at constant speed amounts to the simple “distance = rate × time” formula
taught in high-school algebra. Given the Descartes/Newton law of inertia, we see
that this motion is in a sense the natural state of things. In Newtonian physics, it
is departures from this simple formula that need to be explained, and this is done
by positing forces that act on the irregularly moving object. For that reason, we
need to go beyond the level of first-order approximation.
Approximating with straight lines, we can’t normally do any better than first-
order approximation unless the curve we are approximating is itself “flat” at a
point. That is, the point is a point of inflection, where the curve is locally very
close to being a straight line. Such points are exceptional. If the curve does not
5 Which of several people named Hippias did this is disputed among historians of ancient
Greek mathematics.
2. CURVATURE, PHASE 1: EULER 199
Definition 5.1. The curvature of the graph of the function y = f(x) at the point $\bigl(x, f(x)\bigr)$ is the number
$$\frac{d\alpha}{ds} = \frac{d\alpha}{dx}\,\frac{dx}{ds} = \frac{f''(x)}{1 + \bigl(f'(x)\bigr)^2}\cdot\frac{1}{\sqrt{1 + \bigl(f'(x)\bigr)^2}} = \frac{f''(x)}{\bigl(1 + (f'(x))^2\bigr)^{3/2}}.$$
If $f''(x) > 0$, the curvature is positive, which means the local portion of the plane just above the graph is convex; if $f''(x) < 0$, the portion just below the graph is convex.
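Definition 5.1 translates directly into a short numerical routine. This sketch approximates f′ and f″ by central differences (the step `h` is an assumption). It tests the routine on the lower and upper halves of a circle of radius R = 2, where the curvature should be 1/R = 0.5 and −0.5 respectively, in agreement with the sign convention just stated.

```python
import math


def graph_curvature(f, x, h=1e-4):
    """Signed curvature f''(x) / (1 + f'(x)^2)^(3/2) of the graph y = f(x),
    with f' and f'' approximated by central differences of step h."""
    d1 = (f(x + h) - f(x - h)) / (2 * h)
    d2 = (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)
    return d2 / (1 + d1 * d1) ** 1.5


R = 2.0


def lower_circle(x):
    """Lower half of the circle of radius R centered at (0, R)."""
    return R - math.sqrt(R * R - x * x)


def upper_circle(x):
    """Upper half of the circle of radius R centered at the origin."""
    return math.sqrt(R * R - x * x)


print(graph_curvature(lower_circle, 0.0), graph_curvature(lower_circle, 1.2),
      graph_curvature(upper_circle, 0.7))
```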
2.2. Curves in $\mathbb{R}^n$. The notion of curvature generalizes easily to curves in $\mathbb{R}^n$, but in this case we will have to settle for the absolute value of the curvature, since what appears as a clockwise rotation from one side of a plane looks like a counterclockwise rotation from the other side. The case n = 3 is sufficiently general to convey the idea. Suppose the curve is given parametrically as the graph of a vector-valued function r(t). Then arc length measured from the point $r(t_0)$ in the direction of increasing t is
$$s = \int_{t_0}^{t} \sqrt{r'(x)\cdot r'(x)}\,dx,$$
which is to say
$$\frac{ds}{dt} = \sqrt{r'(t)\cdot r'(t)} \quad\text{and}\quad \frac{dt}{ds} = \frac{1}{\sqrt{r'(t)\cdot r'(t)}},$$
and we assume for the sake of simplicity that the point “keeps moving,” that is, r′(t) is never the zero vector.
(2) The vector curvature κ(t) is the projection of r″(t) perpendicular to the tangent vector r′(t), divided by the square of the magnitude of r′(t). If r is assumed to have the physical dimension of length (and we normally take that for granted), then the curvature has the physical dimension $\text{length}^{-1}$, whatever physical dimension is ascribed to the parameter⁶ t.
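The recipe in item (2) can be written out directly: subtract from r″ its component along r′ and divide by |r′|². The sketch below (our own helper names) applies it to the helix $r(t) = (a\cos t, a\sin t, bt)$, whose curvature is known to be $a/(a^2 + b^2)$. It also checks that changing the speed of the parametrization ($t = 3\tau$, which multiplies r′ by 3 and r″ by 9) does not change the result.

```python
import math


def vector_curvature(r1, r2):
    """Vector curvature from r1 = r'(t) and r2 = r''(t): the part of r'' perpendicular
    to r', divided by |r'|^2."""
    speed2 = sum(c * c for c in r1)
    if speed2 == 0:
        raise ValueError("r'(t) must not be the zero vector")
    along = sum(a * b for a, b in zip(r1, r2)) / speed2
    return [(b - along * a) / speed2 for a, b in zip(r1, r2)]


def helix_derivatives(a, b, t):
    """r'(t) and r''(t) for the helix r(t) = (a cos t, a sin t, b t)."""
    r1 = (-a * math.sin(t), a * math.cos(t), b)
    r2 = (-a * math.cos(t), -a * math.sin(t), 0.0)
    return r1, r2


r1, r2 = helix_derivatives(2.0, 1.0, 0.8)
kappa = vector_curvature(r1, r2)
magnitude = math.sqrt(sum(c * c for c in kappa))   # a / (a^2 + b^2) = 0.4 here
# Reparametrize by t = 3 tau: r' is multiplied by 3 and r'' by 9
kappa_fast = vector_curvature([3 * c for c in r1], [9 * c for c in r2])
print(magnitude, kappa, kappa_fast)
```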
2.3. Surfaces in $\mathbb{R}^3$. Near any given “base” point $(x_0, y_0, z_0) \in \mathbb{R}^3$, the equation of the most general smooth (differentiable) surface can always be written in the form F(x, y, z) = 0. Except in degenerate cases, this equation can be solved locally for one of the variables, which without any loss of generality we may take to be z, and rewritten as z = f(x, y). Such a solution exists by the implicit function theorem, near any base point $(x_0, y_0, z_0)$ where ∂F/∂z is not zero. The function f(x, y) is determined by the condition $f(x_0, y_0) = z_0$ and the formulas for its partial derivatives, obtained by differentiating the identity $F\bigl(x, y, f(x, y)\bigr) \equiv 0$:
$$\frac{\partial f}{\partial x} = -\frac{\partial F/\partial x}{\partial F/\partial z}, \qquad \frac{\partial f}{\partial y} = -\frac{\partial F/\partial y}{\partial F/\partial z}.$$
When y is held fixed, the first of these is a differential equation for a function
$\varphi_y(x) = f(x, y)$ that has a unique solution up to a “constant” that actually depends on y. That is, $f(x, y) = \varphi_y(x) + c(y)$. When $\varphi_y(x) + c(y)$ is substituted into the second equation, the result is a differential equation that determines c(y) (that is to say, f(x, y)) up to a constant term k. The constant term is then determined from the condition $f(x_0, y_0) = z_0$. (See Appendix 5 for justification of this argument.)
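As a check on these formulas, the sketch below uses a sphere of radius $r_0$ tangent to the xy-plane at the origin, the same surface as in Example 5.2 below. The value $r_0 = 1.5$ and the sample points are assumptions. The sketch compares $-F_x/F_z$ and $-F_y/F_z$ with central-difference derivatives of the explicit solution for z.

```python
import math

R0 = 1.5  # radius of the test sphere x^2 + y^2 + (z - R0)^2 = R0^2 (an assumption)


def grad_F(x, y, z):
    """(F_x, F_y, F_z) for F(x, y, z) = x^2 + y^2 + (z - R0)^2 - R0^2."""
    return 2 * x, 2 * y, 2 * (z - R0)


def f(x, y):
    """Explicit solution z = R0 - sqrt(R0^2 - x^2 - y^2) (lower hemisphere)."""
    return R0 - math.sqrt(R0 * R0 - x * x - y * y)


def implicit_partials(x, y):
    """(df/dx, df/dy) = (-F_x / F_z, -F_y / F_z), evaluated on the surface."""
    Fx, Fy, Fz = grad_F(x, y, f(x, y))
    if Fz == 0:
        raise ValueError("dF/dz vanishes; the implicit function theorem does not apply")
    return -Fx / Fz, -Fy / Fz


def numeric_partials(x, y, h=1e-6):
    """Central-difference partial derivatives of the explicit solution."""
    return ((f(x + h, y) - f(x - h, y)) / (2 * h),
            (f(x, y + h) - f(x, y - h)) / (2 * h))


for point in [(0.3, -0.4), (0.0, 0.0), (-1.0, 0.5)]:
    print(point, implicit_partials(*point), numeric_partials(*point))
```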
Example 5.2. The simplest example of this process is provided by the sphere whose equation is $F(x, y, z) = x^2 + y^2 + (z - r_0)^2 - r_0^2 = 0$. Near the point (0, 0, 0), we can solve explicitly for z and write
$$z = r_0 - \sqrt{r_0^2 - x^2 - y^2}.$$
This equation is valid for all points (x, y, z) in the “Southern Hemisphere,” where $0 \le z < r_0$. If we had found it necessary to resort to the differential equation to determine this function, we would have used the equations
$$\frac{\partial z}{\partial x} = \frac{\partial f}{\partial x} = -\frac{2x}{2(z - r_0)} = -\frac{x}{z - r_0}, \qquad \frac{\partial z}{\partial y} = \frac{\partial f}{\partial y} = -\frac{2y}{2(z - r_0)} = -\frac{y}{z - r_0}.$$
The first of these would yield, via the differential equation $x\,dx + (z - r_0)\,dz = 0$ that holds when y is constant (that is, dy = 0), the relation
$$z = r_0 - \sqrt{c(y) - x^2}$$
(the negative sign is needed since $z < r_0$), and when this equation is differentiated with respect to y, holding x fixed, we find that the second equation says $c'(y)/\bigl(2(z - r_0)\bigr) = \partial z/\partial y = -y/(z - r_0)$, which yields $c(y) = -y^2 + k$ for a constant
6 In the example of the circle just given, t appears as the argument of trigonometric functions;
We regard the Hessian as a tensor of type (0, 2), so that its entries will be denoted with a pair of subscripts as $h_{ij}$. As such, its effect on a pair of contravariant vectors $u = u^1 i + u^2 j$ and $v = v^1 i + v^2 j$ is written
$$H_f(u, v) = h_{ij}u^i v^j = \frac{\partial^2 f}{\partial x^2}\,u^1 v^1 + \frac{\partial^2 f}{\partial x\,\partial y}\,u^1 v^2 + \frac{\partial^2 f}{\partial y\,\partial x}\,u^2 v^1 + \frac{\partial^2 f}{\partial y^2}\,u^2 v^2.$$
More generally, we can define the Hessian $H_f(u, v)$ of a function $f(x^1, \dots, x^n)$ on $\mathbb{R}^n$ as an n × n matrix, the matrix of the bilinear function (tensor of covariant rank 2, that is, type (0, 2)) whose action on a pair of vectors $u = (u^1, \dots, u^n)$ and $v = (v^1, \dots, v^n)$ is given by
$$H_f(u, v) = \sum_{i,j=1}^{n} \frac{\partial^2 f}{\partial x^i\,\partial x^j}\,u^i v^j.$$
Notice that $H_f$ varies from one point (x, y) to another, but we are suppressing that argument to avoid getting formulas that look too cumbersome. But keep in mind that there is always a “base point” (x, y) where the partial derivatives are to be evaluated.
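A minimal sketch of the Hessian as a bilinear function, with all second partials approximated by central differences. The step `h` and the test function $f = x^3 + xy^2$ are assumptions. At (1, 2) the exact Hessian is $\begin{pmatrix} 6 & 4 \\ 4 & 2 \end{pmatrix}$, so with u = i and v = j the value $H_f(u, v)$ is 4.

```python
def hessian(f, p, h=1e-4):
    """Central-difference approximation to the matrix of second partials of f at p."""
    n = len(p)

    def shifted(i, si, j, sj):
        # f evaluated at p + si*h*e_i + sj*h*e_j
        q = list(p)
        q[i] += si * h
        q[j] += sj * h
        return f(q)

    return [[(shifted(i, 1, j, 1) - shifted(i, 1, j, -1)
              - shifted(i, -1, j, 1) + shifted(i, -1, j, -1)) / (4 * h * h)
             for j in range(n)]
            for i in range(n)]


def bilinear(H, u, v):
    """H_f(u, v) = sum over i, j of h_ij u^i v^j."""
    n = len(u)
    return sum(H[i][j] * u[i] * v[j] for i in range(n) for j in range(n))


def sample(q):
    """Test function f(x, y) = x^3 + x y^2 (an assumption for illustration)."""
    x, y = q
    return x ** 3 + x * y ** 2


H = hessian(sample, [1.0, 2.0])      # exact Hessian here: [[6, 4], [4, 2]]
print(H, bilinear(H, [1.0, 0.0], [0.0, 1.0]))
```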
Because the mixed second-order partial derivatives of a twice continuously differentiable function are equal, the Hessian matrix is symmetric. As the reader likely knows already, the gradient and the Hessian
on Rn play the roles of first and second derivatives in the discussion of series expan-
sions of functions of several variables. Thus, potential extreme values of a function
f (x1 , . . . , xn ) are found by looking for points where the gradient is zero (all the
first-order partial derivatives vanish). When those points are found, a sufficient
7 Named after Ludwig Otto Hesse (1811–1874).
where this last equation is simply the definition of the symbols $A_\theta(t) = (\cos\theta\,i + \sin\theta\,j)\cdot\nabla f$ and $H_\theta(t) = H_f\bigl(\cos\theta\,i + \sin\theta\,j,\ \cos\theta\,i + \sin\theta\,j\bigr)$. In both of these, it is understood that the partial derivatives of f that occur are to be evaluated at (x + t cos θ, y + t sin θ).
To take a simple case for example, say f(x, y) = xy, we have $A_\theta(t) = x\sin\theta + y\cos\theta + 2t\cos\theta\sin\theta$ and $H_\theta(t) = 2\cos\theta\sin\theta$.
Although we really would like to use arc length s as a parameter, we cannot assume that t = s unless
$$\frac{\partial f}{\partial x}\cos\theta + \frac{\partial f}{\partial y}\sin\theta = 0,$$
since the assumption that t = s implies that $|r_\theta'(t)| = 1$. In general this is true only for one pair of opposite directions, but if the xy-plane is tangent to the surface at the point $\bigl(x, y, f(x, y)\bigr)$, then the two first-order partial derivatives of f will vanish at that point, and this equation will hold in all directions θ. It will not hold in general at nearby points where t ≠ 0, and we need those points in order to compute the second-order partial derivatives. Hence, we do not assume that t = s.
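The relation $A_\theta'(t) = H_\theta(t)$ is immediate. From the expression above, we have
$$\frac{dt}{ds} = \frac{1}{|\mathbf{r}_\theta'(t)|} = \frac{1}{\sqrt{1 + A_\theta(t)^2}},$$
and the unit tangent vector is, when we suppress the argument $t$,
$$\boldsymbol{\tau} = \frac{1}{\sqrt{1 + A_\theta^2}}\bigl(\cos\theta\,\mathbf{i} + \sin\theta\,\mathbf{j} + A_\theta\,\mathbf{k}\bigr).$$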
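The vector curvature of the parameterized curve $\mathbf{r}_\theta(t)$ (see Definition 5.2 above) works out to be
$$\boldsymbol{\kappa}_\theta(t) = \frac{-A_\theta H_\theta}{(1 + A_\theta^2)^2}\bigl(\cos\theta\,\mathbf{i} + \sin\theta\,\mathbf{j}\bigr) + \frac{H_\theta}{(1 + A_\theta^2)^2}\,\mathbf{k},$$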
204 5. CONCEPTS OF CURVATURE, 1700–1850
from which it follows that the numerical curvature at $(x, y, f(x, y))$ in the direction given by $\theta$ is
$$\kappa_\theta(0) = \frac{|H_\theta|}{(1 + A_\theta^2)^{3/2}},$$
where $H_\theta$ and $A_\theta$ are evaluated at $t = 0$. In our example $z = xy$, we find
$$\kappa_\theta(0) = \frac{|2\sin\theta\cos\theta|}{(1 + x^2\sin^2\theta + 2xy\sin\theta\cos\theta + y^2\cos^2\theta)^{3/2}}.$$
At the point $x = y = 0$, we thus get $\kappa_\theta(0) = |\sin(2\theta)|$. As we would expect, the curvature is 0 along the axes and reaches its largest value, 1, in the directions making a $45^\circ$ angle with the axes.
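To see that $|H_\theta|/(1 + A_\theta^2)^{3/2}$ agrees with the ordinary curvature of the space curve $\mathbf{r}_\theta(t)$, one can compute that curvature independently. The standard space-curve formula $|\mathbf{r}' \times \mathbf{r}''|/|\mathbf{r}'|^3$ serves for this. The sketch below applies it to $f(x, y) = xy$, using finite-difference derivatives. The function names and the step size are illustrative assumptions.

```python
import math


def section_curvature(f, x, y, theta, h=1e-4):
    """Curvature at t = 0 of r(t) = (x + t cos θ, y + t sin θ, f(x + t cos θ, y + t sin θ)),
    from |r' x r''| / |r'|^3 with central-difference derivatives."""
    c, s = math.cos(theta), math.sin(theta)

    def r(t):
        return (x + t * c, y + t * s, f(x + t * c, y + t * s))

    plus, mid, minus = r(h), r(0.0), r(-h)
    r1 = [(p - m) / (2 * h) for p, m in zip(plus, minus)]
    r2 = [(p - 2 * q + m) / (h * h) for p, q, m in zip(plus, mid, minus)]
    cross = (r1[1] * r2[2] - r1[2] * r2[1],
             r1[2] * r2[0] - r1[0] * r2[2],
             r1[0] * r2[1] - r1[1] * r2[0])
    return math.sqrt(sum(w * w for w in cross)) / math.sqrt(sum(w * w for w in r1)) ** 3


def formula_xy(x, y, theta):
    """|H_theta| / (1 + A_theta^2)^(3/2) at t = 0 for f(x, y) = x y."""
    A = x * math.sin(theta) + y * math.cos(theta)
    H = 2 * math.sin(theta) * math.cos(theta)
    return abs(H) / (1 + A * A) ** 1.5


def f(x, y):
    return x * y


for theta in (0.0, math.pi / 8, math.pi / 4, math.pi / 2):
    print(theta, section_curvature(f, 0.0, 0.0, theta), formula_xy(0.0, 0.0, theta))
```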
Notice the analogy with the case of a plane curve $y = f(x)$, where
$$\kappa(x) = \frac{f''(x)}{\bigl(1 + f'(x)^2\bigr)^{3/2}}.$$
Because of this convenient analogy, we can now relax our definition of the numerical curvature. That is, we can allow $\kappa_\theta(0)$ to be negative, defining it by the equation
$$\kappa_\theta(0) = \frac{H_\theta}{(1 + A_\theta^2)^{3/2}}.$$
The complication here is that this curvature is directional: it depends on $\theta$. We shall now explore this dependence in more detail.
Since we are now finished with the variable $t$, we set it equal to zero and write
$$\kappa_\theta(x, y) = \frac{H_\theta}{(1 + A_\theta^2)^{3/2}},$$
where now the partial derivatives that occur in $H_\theta$ and $A_\theta$ are to be evaluated at $(x, y)$, both coordinates of which are still being held fixed. Our next task is
to get an expression that will measure the curvature of the surface at the point
(x, y) by selecting information from the numerical curvatures in various directions.
The algebra involved will become prohibitively complicated at this point unless we
make an intuitive “leap of faith” and believe that the curvature is intrinsic to the
surface and would be the same in any coordinates. We explore the algebraic basis
of that faith below (Problem 5.2), and a full proof of its correctness can be found
in Appendix 6 of Volume 2. Right now we simply make the leap.
We first translate our coordinate axes so that the origin is at $(x, y, f(x, y))$, that is, $(x, y) = (0, 0)$ and $f(0, 0) = 0$. Then, by a rotation of axes, we force the tangent plane at this point to be the $xy$-plane. Both of these changes of variable are rigid motions of $\mathbb{R}^3$, and hence should not affect the curvature if it is what we picture it as being. The fact that the $z$-axis is perpendicular to the tangent plane provides a notion of "above" and "below" relative to that plane. That in turn gives a notion of positive and negative curvature for sections of the surface by planes containing the $z$-axis (perpendicular to the tangent plane). This choice of coordinates gains us a great deal of simplicity, since now the first partial derivatives of $f(x, y)$ at $(0, 0)$ are both equal to zero. Thus $A_\theta = 0$ at that point, and our curvature $\kappa_\theta$ is simply
$$\kappa_\theta = H_\theta.$$
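By using trigonometric identities, we can write
$$H_\theta = a + b\cos(2\theta) + c\sin(2\theta),$$
where
$$a = \frac{1}{2}\left(\frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}\right),\qquad b = \frac{1}{2}\left(\frac{\partial^2 f}{\partial x^2} - \frac{\partial^2 f}{\partial y^2}\right),\qquad c = \frac{\partial^2 f}{\partial x\,\partial y},$$
and then $H_\theta$ is given as
$$H_\theta = a + d\cos(2\theta - \varphi),$$
where
$$d = \sqrt{b^2 + c^2} = \frac{1}{2}\sqrt{\left(\frac{\partial^2 f}{\partial x^2} - \frac{\partial^2 f}{\partial y^2}\right)^2 + 4\left(\frac{\partial^2 f}{\partial x\,\partial y}\right)^2},\qquad \cos\varphi = \frac{b}{d},\qquad \sin\varphi = \frac{c}{d}.$$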
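The decomposition of $H_\theta$ is easy to check numerically. The sketch below assumes that the three second-order partial derivatives at the base point are already known numbers; the sample values are arbitrary. It uses `math.atan2(c, b)` to obtain an angle $\varphi$ with $\cos\varphi = b/d$ and $\sin\varphi = c/d$. The function names are illustrative.

```python
import math


def euler_coefficients(fxx, fxy, fyy):
    """Return (a, b, c, d, phi) as defined in the text from the second partials at the base point."""
    a = 0.5 * (fxx + fyy)
    b = 0.5 * (fxx - fyy)
    c = fxy
    d = math.hypot(b, c)      # sqrt(b^2 + c^2)
    phi = math.atan2(c, b)    # cos(phi) = b/d, sin(phi) = c/d when d > 0
    return a, b, c, d, phi


def H_theta(fxx, fxy, fyy, theta):
    """H_f(e, e) for the unit vector e = (cos θ, sin θ)."""
    ct, st = math.cos(theta), math.sin(theta)
    return fxx * ct * ct + 2 * fxy * ct * st + fyy * st * st


fxx, fxy, fyy = 3.0, -1.5, 0.5   # sample values, chosen arbitrarily
a, b, c, d, phi = euler_coefficients(fxx, fxy, fyy)
for theta in (0.0, 0.4, 1.0, 2.2):
    print(H_theta(fxx, fxy, fyy, theta),
          a + b * math.cos(2 * theta) + c * math.sin(2 * theta),
          a + d * math.cos(2 * theta - phi))
```

All three columns should agree up to rounding for every $\theta$.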
We now need to study the way $\kappa_\theta$ varies as a function of $\theta$. Being continuous and of period $\pi$, it necessarily attains a maximum and a minimum value in each period. From the equations just written, we can see that these extrema occur at the values of $\theta$ for which $\cos(2\theta - \varphi) = +1$ and $\cos(2\theta - \varphi) = -1$, respectively. The numerical curvatures in these directions are called the principal curvatures of the surface at that point. The result is the following theorem, due to Euler, and illustrated in Fig. 5.2:
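Theorem 5.1. The directions of maximum and minimum curvature of planar sections perpendicular to the tangent plane are given by the angles
$$\theta = \frac{\varphi}{2} \quad\text{and}\quad \theta = \frac{\pi + \varphi}{2}.$$
In particular, they are perpendicular to each other. These maximum and minimum values are $a \pm d$.
(If $d = 0$, the angle $\varphi$ is undefined and every direction gives the same curvature $a$.)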
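Theorem 5.1 can be checked by brute force. The idea is to sample $\kappa_\theta$ on a fine grid of directions and compare its extremes with $a \pm d$, attained at $\theta = \varphi/2$ and $\theta = (\pi + \varphi)/2$. As in the text, the sketch assumes coordinates in which the tangent plane is the $xy$-plane, so that $\kappa_\theta = H_\theta$. The sample second derivatives are arbitrary, and the function names are illustrative. The last line also compares the product $(a + d)(a - d)$ with the Hessian determinant.

```python
import math


def principal_curvatures(fxx, fxy, fyy):
    """Euler's principal curvatures and directions, assuming the tangent plane is the xy-plane."""
    a = 0.5 * (fxx + fyy)
    d = math.hypot(0.5 * (fxx - fyy), fxy)
    phi = math.atan2(fxy, 0.5 * (fxx - fyy))
    return (a + d, phi / 2), (a - d, (math.pi + phi) / 2)


def kappa_theta(fxx, fxy, fyy, theta):
    """Signed normal curvature in direction θ; equals H_θ when the first partials vanish."""
    ct, st = math.cos(theta), math.sin(theta)
    return fxx * ct * ct + 2 * fxy * ct * st + fyy * st * st


fxx, fxy, fyy = 2.0, 1.0, -1.0          # arbitrary sample second partials
(k_max, th_max), (k_min, th_min) = principal_curvatures(fxx, fxy, fyy)

# Brute-force search over one period [0, pi).
grid = [math.pi * k / 100000 for k in range(100000)]
samples = [kappa_theta(fxx, fxy, fyy, t) for t in grid]
print(k_max, max(samples))
print(k_min, min(samples))
print(th_min - th_max)                   # pi/2: the principal directions are perpendicular
print(k_max * k_min, fxx * fyy - fxy ** 2)
```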
The two curvatures just described are called the principal curvatures of the surface at the base point. If the surface has points $(x, y, f(x, y))$ arbitrarily near the base point $(x_0, y_0, f(x_0, y_0))$ where $f(x, y) \ge f(x_0, y_0)$, then the maximum curvature of such sections is nonnegative. Likewise, if there are points arbitrarily near $(x_0, y_0, f(x_0, y_0))$ where $f(x, y) \le f(x_0, y_0)$, then the minimum curvature is nonpositive.
The preceding considerations motivate Euler’s definition of curvature for a sur-
face.
Definition 5.5. The curvature $\kappa$ at $(x, y)$ of the surface $z = f(x, y)$ is the product of the minimum and maximum numerical values of the curvatures of planar sections of the surface through the point $(x, y, f(x, y))$ perpendicular to the tangent plane at that point.
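In terms of the angle $\varphi$ and the constants $a$ and $d$ defined above, this means
$$\kappa = \kappa_{\varphi/2}\,\kappa_{(\varphi+\pi)/2} = a^2 - d^2 = \frac{\partial^2 f}{\partial x^2}\frac{\partial^2 f}{\partial y^2} - \left(\frac{\partial^2 f}{\partial x\,\partial y}\right)^2.$$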
Since each principal curvature has physical dimension length$^{-1}$, the curvature of a surface must have physical dimension length$^{-2}$, that is, the reciprocal of area, if $\mathbb{R}^3$ is regarded as the product of three physical lines.
Remark 5.3. One may well ask: Why the product? Surely there are other
combinations of the two directional curvatures that would tell us something about
the shape of the surface at a given point. For example, the sum of the two principal
curvatures was used by Sophie Germain (1776–1831) in a paper on the theory of
elasticity. Half of this sum, which is their arithmetic mean, is called the mean curvature. In Chapter 6, we shall also encounter the sectional curvature of a manifold, which is, for example, $1/R^2$ for an $n$-dimensional sphere of radius $R$ in $\mathbb{R}^{n+1}$. Also in Chapter 6, we shall introduce the scalar curvature, which for a two-dimensional manifold embedded in $\mathbb{R}^3$ is twice the curvature defined by Euler. Thus, we have
a variety of notions of curvature that we might deal with. But the product of the
two principal curvatures has one outstanding advantage, which will be developed
below: It allows the curvature of a surface in R3 to be defined using the curvature
of a sphere as a standard, just as the circle is the standard of curvature for curves.
That approach was developed by Gauss, who revamped this whole subject nearly
a century later. The curvature that Euler defined has come to be called Gaussian
curvature.
Remark 5.4. It follows from the definition of curvature that if all the points on the surface near the base point lie on the same side of the tangent plane,
the curvature at the base point is nonnegative, whereas if there are pairs of points
arbitrarily near the base point that are on opposite sides of the tangent plane,
then the curvature is nonpositive. It is easy to see intuitively that near a point
of negative curvature, the surface must be shaped rather like a “saddle.” Indeed,
the Hessian determinant of f (x, y) must be negative at such a point, so that the
point is neither a maximum nor a minimum, even though both of its first partial
derivatives vanish there.
2.5. Curvature of a surface in $\mathbb{R}^3$: II. With these preliminaries out of the way, we can now state Euler's main result, which motivates the definition of curvature of a surface just given.
Theorem 5.2. In terms of the quantities $a$, $b$, $c$, and $d$ introduced above, the curvature of the surface $z = f(x, y)$ at a point where the $xy$-plane is tangent to that surface is given by
$$\kappa(x, y) = a^2 - d^2 = a^2 - b^2 - c^2 = \frac{\partial^2 f}{\partial x^2}\frac{\partial^2 f}{\partial y^2} - \left(\frac{\partial^2 f}{\partial x\,\partial y}\right)^2.$$
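Proof. This is a direct computation: we merely insert the values of $a$, $b$, $c$, and $d$ into the definition just given for $\kappa$. Since $a + b = \partial^2 f/\partial x^2$ and $a - b = \partial^2 f/\partial y^2$, we have $a^2 - b^2 = (a + b)(a - b) = \dfrac{\partial^2 f}{\partial x^2}\dfrac{\partial^2 f}{\partial y^2}$. Subtracting $c^2 = (\partial^2 f/\partial x\,\partial y)^2$ gives the result.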
The curvature turns out to be the determinant of the Hessian matrix, but only
in coordinates where the xy-plane is tangent to the surface at the point where
curvature is being computed. It is the analogy between the Hessian and the second
derivative that motivates our definition of curvature.
It is important to keep in mind the special configuration of coordinates that
makes this simple expression possible. When general coordinates are used, even if
the equation of the surface has the form z = f (x, y), the formula for the curvature
is a little more complicated. We state this result as a separate theorem.
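Theorem 5.3. For a surface that is the graph of a function $z = f(u, v)$, the curvature at a point $(u, v, f(u, v))$ is
$$\kappa(u, v) = \frac{\dfrac{\partial^2 f}{\partial u^2}\dfrac{\partial^2 f}{\partial v^2} - \left(\dfrac{\partial^2 f}{\partial u\,\partial v}\right)^2}{\left(1 + \left(\dfrac{\partial f}{\partial u}\right)^2 + \left(\dfrac{\partial f}{\partial v}\right)^2\right)^2}.$$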
formula given in the theorem therefore holds for $f$ if it holds when $(u, v, f(u, v))$ is replaced by $(x, y, g(x, y))$. Thus without any loss of generality, we can assume that $u_0 = 0$ and $v_0 = 0$. We have only to perform a rotation of the coordinate system in order to make the tangent plane the $xy$-plane.
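We are assuming that the surface is the graph of a function $f(u, v)$, and this guarantees that the tangent vectors
$$\frac{\partial \mathbf{r}}{\partial u} = \mathbf{i} + \frac{\partial f}{\partial u}\,\mathbf{k} \quad\text{and}\quad \frac{\partial \mathbf{r}}{\partial v} = \mathbf{j} + \frac{\partial f}{\partial v}\,\mathbf{k}$$
are linearly independent. In particular, the normal vector $\mathbf{n}$, which is the cross product of these tangent vectors, is perpendicular to the tangent plane and not the zero vector.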
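To keep our computations straight, we now introduce some abbreviations:
$$f_1 = \frac{\partial f}{\partial u},\qquad f_2 = \frac{\partial f}{\partial v},\qquad a = \sqrt{1 + f_1^2},\qquad b = \sqrt{1 + f_1^2 + f_2^2},$$
where the two partial derivatives are to be evaluated at $(u_0, v_0) = (0, 0)$. All four of the quantities $f_1$, $f_2$, $a$, and $b$ are constants, not functions of $u$ and $v$. (These $a$ and $b$ are not the constants of Theorem 5.2.)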
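We now introduce a new orthonormal basis of $\mathbb{R}^3$, as follows:
$$\mathbf{i}' = \frac{1}{a}\,\mathbf{i} + \frac{f_1}{a}\,\mathbf{k},$$
$$\mathbf{j}' = \frac{-f_1 f_2}{ab}\,\mathbf{i} + \frac{1 + f_1^2}{ab}\,\mathbf{j} + \frac{f_2}{ab}\,\mathbf{k},$$
$$\mathbf{k}' = \frac{-f_1}{b}\,\mathbf{i} + \frac{-f_2}{b}\,\mathbf{j} + \frac{1}{b}\,\mathbf{k}.$$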
(This change of coordinates is not unique; all that really matters is that $\mathbf{k}'$ be a unit vector normal to the surface.)
All the partial derivatives in these expressions are evaluated at (0, 0), and hence
are constants. They are not to be differentiated when we differentiate expressions
in which they occur.
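This being a transformation of one right-handed orthonormal system into another, its inverse has coefficients that are merely the transpose of these:
$$\mathbf{i} = \frac{1}{a}\,\mathbf{i}' + \frac{-f_1 f_2}{ab}\,\mathbf{j}' + \frac{-f_1}{b}\,\mathbf{k}',$$
$$\mathbf{j} = \frac{1 + f_1^2}{ab}\,\mathbf{j}' + \frac{-f_2}{b}\,\mathbf{k}',$$
$$\mathbf{k} = \frac{f_1}{a}\,\mathbf{i}' + \frac{f_2}{ab}\,\mathbf{j}' + \frac{1}{b}\,\mathbf{k}'.$$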
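Several claims about the new basis can be spot-checked numerically:

- $\mathbf{i}'$, $\mathbf{j}'$, $\mathbf{k}'$ form a right-handed orthonormal basis.
- $\mathbf{k}'$ is normal to the surface.
- The inverse transformation is given by the transpose.

In this sketch the values of $f_1$ and $f_2$ are arbitrary samples, and the helper names are illustrative.

```python
import math


def adapted_basis(f1, f2):
    """The basis i', j', k' of the text, written in (i, j, k) components."""
    a = math.sqrt(1 + f1 * f1)
    b = math.sqrt(1 + f1 * f1 + f2 * f2)
    i_p = (1 / a, 0.0, f1 / a)
    j_p = (-f1 * f2 / (a * b), (1 + f1 * f1) / (a * b), f2 / (a * b))
    k_p = (-f1 / b, -f2 / b, 1 / b)
    return i_p, j_p, k_p


def dot(p, q):
    return sum(x * y for x, y in zip(p, q))


def cross(p, q):
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])


f1, f2 = 0.8, -1.7                     # sample values of the first partials at the origin
basis = adapted_basis(f1, f2)

# Gram matrix: should be the 3 x 3 identity (orthonormality).
print([[round(dot(p, q), 12) for q in basis] for p in basis])
# Right-handedness: i' x j' should equal k'.
print(cross(basis[0], basis[1]), basis[2])
# k' is perpendicular to the tangent vectors (1, 0, f1) and (0, 1, f2).
print(dot(basis[2], (1.0, 0.0, f1)), dot(basis[2], (0.0, 1.0, f2)))
# Inverse = transpose: i, j, k are recovered from the columns of the same coefficients.
for col, name in enumerate("ijk"):
    recovered = [sum(basis[m][col] * basis[m][r] for m in range(3)) for r in range(3)]
    print(name, [round(x, 12) for x in recovered])
```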
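$$u = \frac{x}{a} - \frac{f_1 f_2\,y}{ab} - \frac{f_1\,g(x, y)}{b},$$
$$v = \frac{a\,y}{b} - \frac{f_2\,g(x, y)}{b},$$
$$g(x, y) = \frac{-(u f_1 + v f_2) + f(u, v)}{b},$$
$$\frac{\partial u}{\partial x} = \frac{1}{a} - \frac{f_1}{b}\frac{\partial g}{\partial x},$$
$$\frac{\partial u}{\partial y} = \frac{-f_1 f_2}{ab} - \frac{f_1}{b}\frac{\partial g}{\partial y},$$
$$\frac{\partial v}{\partial x} = \frac{-f_2}{b}\frac{\partial g}{\partial x},$$
$$\frac{\partial v}{\partial y} = \frac{a}{b} - \frac{f_2}{b}\frac{\partial g}{\partial y}.$$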
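From the third of these equations, we easily deduce that at a point $(u, v)$ we have
$$\frac{\partial g}{\partial x} = -\frac{1}{b}\left(f_1\frac{\partial u}{\partial x} + f_2\frac{\partial v}{\partial x}\right) + \frac{1}{b}\left(\frac{\partial f}{\partial u}\frac{\partial u}{\partial x} + \frac{\partial f}{\partial v}\frac{\partial v}{\partial x}\right).$$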
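It follows from this expression that $\partial g/\partial x = 0$ at the origin, since there $\partial f/\partial u = f_1$ and $\partial f/\partial v = f_2$, so the two terms cancel. Similarly, $\partial g/\partial y = 0$. Thus the tangent plane to the surface is indeed the $xy$-plane. In other words, we have now made the reduction that Euler made in defining curvature. As a consequence, we find that at this point
$$\frac{\partial u}{\partial x} = \frac{1}{a},\qquad \frac{\partial u}{\partial y} = \frac{-f_1 f_2}{ab},\qquad \frac{\partial v}{\partial x} = 0,\qquad \frac{\partial v}{\partial y} = \frac{a}{b}.$$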
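Moreover, by extending the use of the chain rule in the formula for $g(x, y)$, we find, again at the point $(0, 0)$, that the second-order partial derivatives of $u$ and $v$ with respect to $x$ and $y$ cancel out, and we are left with
$$\frac{\partial^2 g}{\partial x^2} = \frac{1}{b}\left(\frac{\partial^2 f}{\partial u^2}\left(\frac{\partial u}{\partial x}\right)^2 + 2\frac{\partial^2 f}{\partial u\,\partial v}\frac{\partial u}{\partial x}\frac{\partial v}{\partial x} + \frac{\partial^2 f}{\partial v^2}\left(\frac{\partial v}{\partial x}\right)^2\right),$$
$$\frac{\partial^2 g}{\partial x\,\partial y} = \frac{1}{b}\left(\frac{\partial^2 f}{\partial u^2}\frac{\partial u}{\partial x}\frac{\partial u}{\partial y} + \frac{\partial^2 f}{\partial u\,\partial v}\left(\frac{\partial u}{\partial x}\frac{\partial v}{\partial y} + \frac{\partial u}{\partial y}\frac{\partial v}{\partial x}\right) + \frac{\partial^2 f}{\partial v^2}\frac{\partial v}{\partial x}\frac{\partial v}{\partial y}\right),$$
$$\frac{\partial^2 g}{\partial y^2} = \frac{1}{b}\left(\frac{\partial^2 f}{\partial u^2}\left(\frac{\partial u}{\partial y}\right)^2 + 2\frac{\partial^2 f}{\partial u\,\partial v}\frac{\partial u}{\partial y}\frac{\partial v}{\partial y} + \frac{\partial^2 f}{\partial v^2}\left(\frac{\partial v}{\partial y}\right)^2\right).$$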
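From this point, a slightly messy computation (see Problems 5.2 and 5.4 below) reveals that the curvature is
$$\kappa(u, v) = \frac{\partial^2 g}{\partial x^2}\frac{\partial^2 g}{\partial y^2} - \left(\frac{\partial^2 g}{\partial x\,\partial y}\right)^2 = \frac{\dfrac{\partial^2 f}{\partial u^2}\dfrac{\partial^2 f}{\partial v^2} - \left(\dfrac{\partial^2 f}{\partial u\,\partial v}\right)^2}{\left(1 + \left(\dfrac{\partial f}{\partial u}\right)^2 + \left(\dfrac{\partial f}{\partial v}\right)^2\right)^2}.$$
(Four terms cancel in pairs in the expanded version of the right-hand side.)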
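The "slightly messy computation" can at least be spot-checked numerically, in three steps:

1. Insert the values of $\partial u/\partial x$, $\partial u/\partial y$, $\partial v/\partial x$, and $\partial v/\partial y$ at the origin into the three chain-rule formulas for the second derivatives of $g$.
2. Form $g_{xx}g_{yy} - g_{xy}^2$.
3. Compare the result with the right-hand side of Theorem 5.3.

This is only a sanity check with randomly chosen sample values, not a proof. The function names are hypothetical.

```python
import math
import random


def curvature_via_g(f1, f2, fuu, fuv, fvv):
    """Hessian determinant of g at the origin, built from the chain-rule formulas in the text."""
    a = math.sqrt(1 + f1 * f1)
    b = math.sqrt(1 + f1 * f1 + f2 * f2)
    ux, uy, vx, vy = 1 / a, -f1 * f2 / (a * b), 0.0, a / b
    gxx = (fuu * ux * ux + 2 * fuv * ux * vx + fvv * vx * vx) / b
    gxy = (fuu * ux * uy + fuv * (ux * vy + uy * vx) + fvv * vx * vy) / b
    gyy = (fuu * uy * uy + 2 * fuv * uy * vy + fvv * vy * vy) / b
    return gxx * gyy - gxy * gxy


def curvature_theorem_5_3(f1, f2, fuu, fuv, fvv):
    """Right-hand side of Theorem 5.3."""
    return (fuu * fvv - fuv * fuv) / (1 + f1 * f1 + f2 * f2) ** 2


random.seed(1)
for _ in range(5):
    args = [random.uniform(-2, 2) for _ in range(5)]
    print(curvature_via_g(*args), curvature_theorem_5_3(*args))
```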
Theorem 5.3 is an analog of the formula for the curvature of a plane curve that is the graph of a function $y = f(x)$. Recall that this curvature is
$$\kappa(x) = \frac{f''(x)}{\bigl(1 + (f'(x))^2\bigr)^{3/2}}.$$
By comparing with Theorem 5.3, we can see that the second derivative is re-
placed by the determinant of the Hessian, the square of the first-order derivative by
the sum of the squares of the two first-order partial derivatives, and the exponent
3/2 by 2 = 4/2.
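The following sketch shows how Theorem 5.3 might be applied when only $f$ itself is available. It estimates the partial derivatives by central differences, an assumption of this illustration; exact derivatives are preferable when they are known. For comparison it uses two surfaces whose curvature follows directly from the theorem:

- $z = uv$, where the formula gives $-1/(1 + u^2 + v^2)^2$;
- $z = u^2 + v^2$, where it gives $4/(1 + 4u^2 + 4v^2)^2$.

The function names are illustrative.

```python
def gaussian_curvature(f, u, v, h=1e-4):
    """Curvature of the graph z = f(u, v) at (u, v, f(u, v)) by Theorem 5.3,
    with all partial derivatives estimated by central differences."""
    fu = (f(u + h, v) - f(u - h, v)) / (2 * h)
    fv = (f(u, v + h) - f(u, v - h)) / (2 * h)
    fuu = (f(u + h, v) - 2 * f(u, v) + f(u - h, v)) / (h * h)
    fvv = (f(u, v + h) - 2 * f(u, v) + f(u, v - h)) / (h * h)
    fuv = (f(u + h, v + h) - f(u + h, v - h) - f(u - h, v + h) + f(u - h, v - h)) / (4 * h * h)
    return (fuu * fvv - fuv ** 2) / (1 + fu ** 2 + fv ** 2) ** 2


def saddle(u, v):
    return u * v


def paraboloid(u, v):
    return u * u + v * v


for (u, v) in [(0.0, 0.0), (0.5, -0.3), (1.2, 0.7)]:
    print(gaussian_curvature(saddle, u, v), -1 / (1 + u * u + v * v) ** 2)
    print(gaussian_curvature(paraboloid, u, v), 4 / (1 + 4 * u * u + 4 * v * v) ** 2)
```

The saddle has negative curvature everywhere and the paraboloid positive curvature everywhere, in line with Remark 5.4.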
Definition 5.6. The radius of curvature at a point of the surface is the mean proportional (geometric mean) between the maximum and minimum radii of curvature among all planar sections of the surface passing through that point perpendicular to the tangent plane. In other words, it is the square root of the reciprocal of the curvature.
For a sphere of radius R, the radius of curvature is R, but at a point where a
surface has negative curvature, the radius of curvature is an imaginary number. An
important example of a surface of constant negative curvature is the pseudo-sphere,
which plays a large role as a model of the hyperbolic plane and will frequently serve
us as an example, in this chapter and the next.
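Example 5.3. Consider the lower hemisphere of radius $r_0$, for which $f(x, y) = r_0 - \sqrt{r_0^2 - x^2 - y^2}$. We first compute the Hessian determinant at a general point:
$$\begin{aligned}
H(x, y) &= \frac{\partial^2 f}{\partial x^2}\frac{\partial^2 f}{\partial y^2} - \left(\frac{\partial^2 f}{\partial x\,\partial y}\right)^2
= \frac{(r_0^2 - x^2)(r_0^2 - y^2) - x^2y^2}{(r_0^2 - x^2 - y^2)^3} \\
&= \frac{r_0^4 - r_0^2(x^2 + y^2)}{(r_0^2 - x^2 - y^2)^3}
= \frac{r_0^2}{(r_0^2 - x^2 - y^2)^2}.
\end{aligned}$$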
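Since
$$1 + \left(\frac{\partial f}{\partial x}\right)^2 + \left(\frac{\partial f}{\partial y}\right)^2 = \frac{r_0^2}{r_0^2 - x^2 - y^2},$$
we find that the curvature is
$$\kappa(x, y) = \frac{r_0^2}{(r_0^2 - x^2 - y^2)^2}\cdot\frac{(r_0^2 - x^2 - y^2)^2}{r_0^4} = \frac{1}{r_0^2}.$$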
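The computations in Example 5.3 can be spot-checked with finite differences. The sketch below estimates the partial derivatives of $f(x, y) = r_0 - \sqrt{r_0^2 - x^2 - y^2}$ numerically, an approximation chosen for this illustration. It then compares three quantities with the closed forms above: the Hessian determinant, the factor $1 + (\partial f/\partial x)^2 + (\partial f/\partial y)^2$, and the curvature. The radius $r_0 = 2$ and the sample points are arbitrary.

```python
import math

R0 = 2.0  # sample radius r_0


def f(x, y):
    """Lower hemisphere of radius R0, tangent to the xy-plane at the origin."""
    return R0 - math.sqrt(R0 * R0 - x * x - y * y)


def partials(f, x, y, h=1e-4):
    """Central-difference estimates of f_x, f_y, f_xx, f_xy, f_yy."""
    fx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    fy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / (h * h)
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / (h * h)
    fxy = (f(x + h, y + h) - f(x + h, y - h) - f(x - h, y + h) + f(x - h, y - h)) / (4 * h * h)
    return fx, fy, fxx, fxy, fyy


for (x, y) in [(0.0, 0.0), (0.6, -0.4), (1.0, 1.1)]:
    fx, fy, fxx, fxy, fyy = partials(f, x, y)
    rho = R0 * R0 - x * x - y * y
    hess = fxx * fyy - fxy ** 2
    factor = 1 + fx ** 2 + fy ** 2
    print(hess, R0 ** 2 / rho ** 2)          # Hessian determinant
    print(factor, R0 ** 2 / rho)             # 1 + |grad f|^2
    print(hess / factor ** 2, 1 / R0 ** 2)   # curvature, expected 1/r_0^2
```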
Since the equation of this surface will be exactly the same at every point when
the tangent plane is taken as the xy-plane, we infer that the curvature of a sphere
is constant and equal to $1/r_0^2$. For a sphere, the radius of curvature is precisely the
radius of the sphere.
At this point, in theory, we know exactly what to do to compute the curvature of a surface in $\mathbb{R}^3$. If it is a proper surface, it is the graph of an equation $F(x, y, z) \equiv 0$ such that the gradient $\nabla F(x, y, z)$ never vanishes on the surface itself. For example, the sphere of radius $r_0$ with center at the origin has the equation $F(x, y, z) = x^2 + y^2 + z^2 - r_0^2 \equiv 0$, and the gradient $\nabla F(x, y, z) = 2x\,\mathbf{i} + 2y\,\mathbf{j} + 2z\,\mathbf{k}$ does not vanish at any point of the surface. (It vanishes only at the origin, which is not on the surface.) Given that fact, the implicit function theorem tells us that near each of its points, every "proper" surface in $\mathbb{R}^3$ is the graph of a function, say $z = f(x, y)$ (after relabeling the coordinates if necessary). Granted, it may be difficult to find the function $f(x, y)$ analytically, but that is a separate issue. If $f(x, y)$ can be found, the curvature can be computed.
What more could we want? For one thing, given that a surface may have more
than one analytic equation, we want assurance that the curvature is determined by
the surface itself, and not by any particular analytic representation of it. Second,
we’d rather not have to find the function f (x, y). For example, we might like to
parameterize the sphere of radius $r_0$ with longitude-latitude coordinates $(\theta, \varphi) \mapsto r_0\cos\theta\cos\varphi\,\mathbf{i} + r_0\sin\theta\cos\varphi\,\mathbf{j} + r_0\sin\varphi\,\mathbf{k}$. It would be less messy to work directly
with these trigonometric functions than to have to manipulate the square root of a
sum of squares. What we hope to get, ultimately, is a set of key functions derivable
directly from the parameterization of the surface in terms of which the curvature
can be expressed. More than anything else, we want to know how to generalize this
notion of shape to higher-dimensional objects. The crucial steps in that direction
were taken by Gauss and will be studied in the next section.
2.6. A two-dimensional universe. The space that formed the background
of this discussion was the three-dimensional space $\mathbb{R}^3$ of Euclidean geometry. We
used that three-dimensional model to generate an expression for the element of arc
length along a curve in terms of the parametric function that defines the curve,
and we then extended that discussion to define the curvature of the surface using
Euclidean circles and finally the sphere as our standard of curvature. As a simple
model of this surface, we might imagine (x, y) to be coordinates on a small portion
of a flat plane and f (x, y) the elevation of the point of the Earth’s surface directly
above or below that flat plane. We used the three-dimensional Pythagorean theorem
to define the element of arc length on this surface.
Imagine now a different species of earthlings whose visual perception is hand-
icapped, so that they cannot see hills and valleys. As far as they are concerned,
the pair (x, y) (longitude and latitude, perhaps) is the complete set of coordinates
of a point, and altitude does not exist for them as a thing they can perceive. They
may nevertheless be able to form some concept of it. True, our three-dimensional
Pythagorean theorem is of no use to them, since they cannot directly perceive any
third dimension; their geometry is that of a two-dimensional world. Because of grav-
ity, however, they can detect and deal with the third dimension that we perceive
and explain it to themselves in terms of the gravitational field. Even though they
can’t see themselves going uphill or downhill, they know when they are doing either
by the amount of effort involved in moving with or against the gravitational field.
They might define the distance from point A to point B as a suitably dimensioned
multiple of the amount of work required to move a standard unit mass from A to
B, that is, the difference in gravitational potential between the two points. That
“distance” is negative if B is downhill of A in the three-dimensional model and zero
if the two points are at the same altitude. What we, from our three-dimensional
perspective, picture as a hill is, for the inhabitants of this two-dimensional world,
a region containing a point of maximum gravitational potential (a point that it
requires the maximum amount of work to reach), surrounded by a family of closed
level curves of the potential.
This trichotomy of distances recalls the space-like and time-like intervals be-
tween two events and the light cone of an event in the four-dimensional universe
that we actually do inhabit. Relative to that four-dimensional universe, we are
in the position the inhabitants of the two-dimensional world find themselves in
according to the scenario we just imagined. The proper-time interval ds given by
$ds^2 = dt^2 - (dx^2 + dy^2 + dz^2)/c^2$ has some of the properties of the metric we just defined using the difference in gravitational potential. Just as our imaginary denizens
lacking three-dimensional vision will still perceive a difference between uphill and
downhill, so are we able to perceive a difference between time and space even though
we cannot perceive or imagine a world of four-dimensional spatial coordinates.
In the next section, we shall recast what we have just done in R3 in a way that
will provide a generalization to n-dimensional spaces.
curvature that depends only on these coefficients and avoiding the geo-
metric transformations needed to compute curvature in Euler’s approach.
This line of reasoning leads to the Riemann curvature tensor mentioned in
the previous chapter. That tensor will be the final outcome of the present
chapter, but it will be written in the language of calculus. (In the next
chapter, we shall give it a more geometric form, better adapted to purely
abstract spaces.)
(2) The notion of geodesic polar coordinates in a neighborhood of a given
point. Gauss showed that the points lying at geodesic distance r from
a given point on the surface lie on a curve perpendicular to each of the
geodesics through the given point. This line of reasoning leads to the
exponential mapping and normal coordinates, and meshes well with the
Ricci tensor that provides Einstein’s law of gravity. All of that must await
the reformulation of the Riemann curvature tensor in the next chapter.
The ratio of these areas provides a measure of the curvature of the surface. Letting the region U in the parameter space shrink down on a point $(u_0, v_0)$, we can take the limit of the ratio of the area of its image under n to the area of its image under r and call that limit the curvature at the point $r(u_0, v_0)$. That is the theory behind the approach used by Gauss. What remains is the actual algebraic computation of the curvature using these principles and verifying that it is the same as the curvature defined by Euler.
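We shall retain much of the notation used by Gauss, who defined six functions on the surface. The first three, which Gauss denoted E, F, and G, are
$$E = \frac{\partial r}{\partial u}\cdot\frac{\partial r}{\partial u} = \Bigl(\frac{\partial x}{\partial u}\Bigr)^2 + \Bigl(\frac{\partial y}{\partial u}\Bigr)^2 + \Bigl(\frac{\partial z}{\partial u}\Bigr)^2,$$
$$F = \frac{\partial r}{\partial u}\cdot\frac{\partial r}{\partial v} = \frac{\partial x}{\partial u}\frac{\partial x}{\partial v} + \frac{\partial y}{\partial u}\frac{\partial y}{\partial v} + \frac{\partial z}{\partial u}\frac{\partial z}{\partial v},$$
$$G = \frac{\partial r}{\partial v}\cdot\frac{\partial r}{\partial v} = \Bigl(\frac{\partial x}{\partial v}\Bigr)^2 + \Bigl(\frac{\partial y}{\partial v}\Bigr)^2 + \Bigl(\frac{\partial z}{\partial v}\Bigr)^2.$$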
The three functions E, F, and G are called the metric coefficients of the surface, and it is useful to put them into a “metric matrix” M:
$$M = \begin{pmatrix} E & F \\ F & G \end{pmatrix}.$$
Later, when we wish to be more systematic and generalize this work, we shall change the notation to $E = g_{11}$, $F = g_{12} = g_{21}$, $G = g_{22}$.
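To make the definitions of E, F, and G concrete, here is a minimal Python sketch (our own illustration, not one of the book’s Mathematica notebooks) that approximates them by central differences for a parameterized surface. The unit sphere in co-latitude u and longitude v is an assumed example, for which $E = 1$, $F = 0$, and $G = \sin^2 u$; the helper names and the step size h are hypothetical choices.

```python
import math

def partials(r, u, v, h=1e-6):
    """Central-difference approximations to the vectors dr/du and dr/dv at (u, v)."""
    ru = [(p - m) / (2 * h) for p, m in zip(r(u + h, v), r(u - h, v))]
    rv = [(p - m) / (2 * h) for p, m in zip(r(u, v + h), r(u, v - h))]
    return ru, rv

def dot(p, q):
    return sum(a * b for a, b in zip(p, q))

def metric_coefficients(r, u, v):
    """Return (E, F, G); the metric matrix is M = [[E, F], [F, G]]."""
    ru, rv = partials(r, u, v)
    return dot(ru, ru), dot(ru, rv), dot(rv, rv)

def sphere(u, v):
    """Unit sphere in co-latitude u and longitude v (our illustrative surface)."""
    return (math.sin(u) * math.cos(v), math.sin(u) * math.sin(v), math.cos(u))

E, F, G = metric_coefficients(sphere, 0.7, 1.3)
print(E, F, G)  # close to 1, 0, and sin(0.7)**2
```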
The metric coefficients can be used to compute the length s of a parameterized curve $t \mapsto r\bigl(u(t), v(t)\bigr)$ on the surface and the area S of a portion of the surface via
3. CURVATURE, PHASE 2: GAUSS 215
the formulas
$$ds^2 = \bigl(E\,u'(t)^2 + 2F\,u'(t)\,v'(t) + G\,v'(t)^2\bigr)\,dt^2,$$
$$dS = \sqrt{EG - F^2}\;du\,dv.$$
The first of these relations is usually written in invariant differential form, independent of the parameterization of the specific curve, but depending on the parameterization of the surface:
$$ds^2 = E\,du^2 + 2F\,du\,dv + G\,dv^2.$$
The quadratic form $ds^2$ is called the first fundamental form of the surface.
Notice that the element of area on the surface is
$$dS = \sqrt{\det(M)}\;du\,dv.$$
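The two formulas can be tried numerically. The sketch below is a rough illustration under our own assumptions (the unit-sphere metric $E = 1$, $F = 0$, $G = \sin^2 u$ in co-latitude u and longitude v, and plain midpoint-rule sums); it should recover the length $2\pi\sin u_0$ of a circle of co-latitude $u_0$ and the total area $4\pi$ of the sphere.

```python
import math

def sphere_metric(u, v):
    """Metric coefficients (E, F, G) of the unit sphere in co-latitude u, longitude v."""
    return 1.0, 0.0, math.sin(u) ** 2

def curve_length(metric, u, v, du, dv, t0, t1, n=2000):
    """Length of t -> (u(t), v(t)) from ds^2 = (E u'^2 + 2F u'v' + G v'^2) dt^2 (midpoint rule)."""
    h = (t1 - t0) / n
    total = 0.0
    for k in range(n):
        t = t0 + (k + 0.5) * h
        E, F, G = metric(u(t), v(t))
        a, b = du(t), dv(t)
        total += math.sqrt(E * a * a + 2 * F * a * b + G * b * b) * h
    return total

def surface_area(metric, u0, u1, v0, v1, n=400):
    """Area from dS = sqrt(EG - F^2) du dv (midpoint rule on an n-by-n grid)."""
    hu, hv = (u1 - u0) / n, (v1 - v0) / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            E, F, G = metric(u0 + (i + 0.5) * hu, v0 + (j + 0.5) * hv)
            total += math.sqrt(E * G - F * F) * hu * hv
    return total

# Circle of co-latitude 0.5: u(t) = 0.5, v(t) = t for 0 <= t <= 2*pi.
L = curve_length(sphere_metric, lambda t: 0.5, lambda t: t,
                 lambda t: 0.0, lambda t: 1.0, 0.0, 2 * math.pi)
A = surface_area(sphere_metric, 0.0, math.pi, 0.0, 2 * math.pi)
print(L, 2 * math.pi * math.sin(0.5))  # should agree closely
print(A, 4 * math.pi)                  # should agree closely
```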
The coefficients E and G are obviously nonnegative, as (by the Schwarz inequality) is the determinant $\det(M)$. We shall assume that $\det(M) > 0$, which implies that $E > 0$ and $G > 0$ as well, so that the matrix is positive-definite.⁹ The metric on a parameterized surface is such that $ds^2$ is always nonnegative, and the matrix M is symmetric. Such a metric is now called a Riemannian metric.
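The analogous expressions for the mapping n, namely $D = \frac{\partial n}{\partial u}\cdot\frac{\partial n}{\partial u}$, $D' = \frac{\partial n}{\partial u}\cdot\frac{\partial n}{\partial v}$, and $D'' = \frac{\partial n}{\partial v}\cdot\frac{\partial n}{\partial v}$, lead to the second fundamental form, which (up to a constant, since definitions of this concept vary) we write, again following Gauss, as
$$D\,du^2 + 2D'\,du\,dv + D''\,dv^2.$$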
Remark 5.5. Because the cross product of the two partial derivatives of r that defines the normal vector is anti-symmetric, the mapping n has a direction determined by the order of the parameters. If u and v are interchanged, then n becomes −n. This is not important in itself, but it does call attention to the two-sided ambiguity of the unit normal vector. When u and v are interchanged, however, the cross product of the two partial derivatives of n also reverses its sign. As a result, the vector triple product $n \cdot \bigl(\frac{\partial n}{\partial u} \times \frac{\partial n}{\partial v}\bigr)$ does not change when the parameters u and v are interchanged. If the sign of that triple product is positive, we call the curvature positive, and if negative, we call it negative.
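With the second fundamental form, we can compute what Gauss called the curvature, namely the ratio of the infinitesimal areas.
Definition 5.7. The Gaussian curvature of the surface is
$$\kappa = \pm\sqrt{\frac{DD'' - (D')^2}{EG - F^2}},$$
where the ambiguous sign is the sign of the vector triple product $n \cdot \bigl(\frac{\partial n}{\partial u} \times \frac{\partial n}{\partial v}\bigr)$. (By Lagrange’s identity, $DD'' - (D')^2 = \bigl|\frac{\partial n}{\partial u} \times \frac{\partial n}{\partial v}\bigr|^2$ and $EG - F^2 = \bigl|\frac{\partial r}{\partial u} \times \frac{\partial r}{\partial v}\bigr|^2$, so the quotient under the square root is the square of the ratio of the two infinitesimal areas.)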
9 At points where $\det(M) = 0$, the parameterization has a singularity, and we need to use a different set of parameters. For example, co-latitude and longitude coordinates on a sphere have such a singularity at the two poles (where longitude is not defined), and so we generally use them only away from the poles, for instance on the upper hemisphere with the north pole removed.
216 5. CONCEPTS OF CURVATURE, 1700–1850
(We include the constant a, which is intended to have the geometric dimension of length, so that both sides of the equation will represent a length.) We shall use x and y as parameters so that $r(x, y) = \bigl(x, y, (x^2 - y^2)/a\bigr)$. We then have
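$$\frac{\partial r}{\partial x} = (1, 0, 2x/a), \qquad \frac{\partial r}{\partial y} = (0, 1, -2y/a).$$
Thus the normal vector to this surface is
$$N(x, y) = \frac{\partial r}{\partial x} \times \frac{\partial r}{\partial y} = (1, 0, 2x/a) \times (0, 1, -2y/a) = (-2x/a,\; 2y/a,\; 1),$$
and so the unit normal is
$$n(x, y) = \Bigl(\frac{-2x}{\sqrt{a^2 + 4x^2 + 4y^2}},\; \frac{2y}{\sqrt{a^2 + 4x^2 + 4y^2}},\; \frac{a}{\sqrt{a^2 + 4x^2 + 4y^2}}\Bigr).$$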
From these expressions, we find
$$E = 1 + 4x^2/a^2, \qquad F = -4xy/a^2, \qquad G = 1 + 4y^2/a^2,$$
$$D = \frac{4(a^2 + 4y^2)}{(a^2 + 4x^2 + 4y^2)^2}, \qquad D' = \frac{-16xy}{(a^2 + 4x^2 + 4y^2)^2}, \qquad D'' = \frac{4(a^2 + 4x^2)}{(a^2 + 4x^2 + 4y^2)^2}.$$
Notice that E, F, and G are dimensionless, while $D$, $D'$, and $D''$ have the dimension of length raised to the power −2. Thus, curvature will also have this dimension, as it should.
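Computation reveals that
$$\Bigl(\frac{\partial r}{\partial x} \times \frac{\partial r}{\partial y}\Bigr) \cdot \Bigl(\frac{\partial n}{\partial x} \times \frac{\partial n}{\partial y}\Bigr) = \frac{-4}{a^2 + 4x^2 + 4y^2}.$$
Since $\frac{\partial r}{\partial x} \times \frac{\partial r}{\partial y} = |N|\,n$ is a positive multiple of n, this product has the same sign as the triple product in Definition 5.7, so that the curvature of this surface at every point is negative. That is no surprise, since everyone knows that a hyperbolic paraboloid is saddle-shaped. The computation shows that the curvature is
$$\kappa(x, y) = \frac{-4a^2}{(a^2 + 4x^2 + 4y^2)^2}.$$
The curvature at the origin, where the surface is tangent to the xy-plane, is $\kappa(0, 0) = -4/a^2$, which is exactly the value computed by Euler’s method.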
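As a numerical cross-check of this example, the following Python sketch evaluates Definition 5.7 directly: it differentiates r and the unit normal n by central differences, forms E, F, G and $D$, $D'$, $D''$, and takes the sign from the triple product $n \cdot (\partial n/\partial x \times \partial n/\partial y)$. The value a = 2, the sample points, and the step size are arbitrary choices of ours; the output should match the closed form $\kappa(x, y) = -4a^2/(a^2 + 4x^2 + 4y^2)^2$ up to discretization error.

```python
import math

a = 2.0  # the length constant a in z = (x^2 - y^2)/a (any positive value; assumed here)

def r(x, y):
    return (x, y, (x * x - y * y) / a)

def unit_normal(x, y):
    s = math.sqrt(a * a + 4 * x * x + 4 * y * y)
    return (-2 * x / s, 2 * y / s, a / s)

def dot(p, q):
    return sum(pi * qi for pi, qi in zip(p, q))

def cross(p, q):
    return (p[1] * q[2] - p[2] * q[1], p[2] * q[0] - p[0] * q[2], p[0] * q[1] - p[1] * q[0])

def d(f, x, y, wrt, h=1e-5):
    """Central-difference partial derivative of a vector-valued f in x (wrt=0) or y (wrt=1)."""
    dx, dy = (h, 0.0) if wrt == 0 else (0.0, h)
    return tuple((p - m) / (2 * h) for p, m in zip(f(x + dx, y + dy), f(x - dx, y - dy)))

def curvature_def_5_7(x, y):
    rx, ry = d(r, x, y, 0), d(r, x, y, 1)
    nx, ny = d(unit_normal, x, y, 0), d(unit_normal, x, y, 1)
    E, F, G = dot(rx, rx), dot(rx, ry), dot(ry, ry)
    D, D1, D2 = dot(nx, nx), dot(nx, ny), dot(ny, ny)  # D, D', D''
    sign = math.copysign(1.0, dot(unit_normal(x, y), cross(nx, ny)))
    return sign * math.sqrt((D * D2 - D1 ** 2) / (E * G - F ** 2))

def closed_form(x, y):
    return -4 * a * a / (a * a + 4 * x * x + 4 * y * y) ** 2

for x, y in [(0.0, 0.0), (0.5, -0.3), (1.2, 0.7)]:
    print(x, y, curvature_def_5_7(x, y), closed_form(x, y))
```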
The definition of curvature given by Gauss for a parameterized surface is consistent with Euler’s definition for the curvature of the graph of a function of two variables, as we can see by applying Gauss’s definition to the parameterization $r(x, y) = \bigl(x, y, h(x, y)\bigr)$. Mathematica Notebook 8 in Volume 3 verifies this assertion. Apart from the annoying fact that Mathematica prefers working with the negative of the Hessian rather than the Hessian itself and stubbornly insists on expressing an explicitly real number a as Abs[−a] Sign[a], one can see from the output
of Notebook 8 that this is exactly Euler’s result, derived in detail in Problem 5.4
below.
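3.2. Curvature in terms of the metric coefficients. In § 11 of his first treatise on differential geometry, Gauss gave the following formula¹⁰ for curvature,¹¹ except that he used p and q as parameters where we use u and v, and wrote k instead of κ for the curvature:
$$\begin{aligned} 4(EG - F^2)^2\,\kappa = {} & E\Bigl(\frac{\partial E}{\partial v}\frac{\partial G}{\partial v} - 2\frac{\partial F}{\partial u}\frac{\partial G}{\partial v} + \Bigl(\frac{\partial G}{\partial u}\Bigr)^2\Bigr) \\ & + F\Bigl(\frac{\partial E}{\partial u}\frac{\partial G}{\partial v} - \frac{\partial E}{\partial v}\frac{\partial G}{\partial u} - 2\frac{\partial E}{\partial v}\frac{\partial F}{\partial v} + 4\frac{\partial F}{\partial u}\frac{\partial F}{\partial v} - 2\frac{\partial F}{\partial u}\frac{\partial G}{\partial u}\Bigr) \\ & + G\Bigl(\frac{\partial E}{\partial u}\frac{\partial G}{\partial u} - 2\frac{\partial E}{\partial u}\frac{\partial F}{\partial v} + \Bigl(\frac{\partial E}{\partial v}\Bigr)^2\Bigr) \\ & - 2(EG - F^2)\Bigl(\frac{\partial^2 E}{\partial v^2} - 2\frac{\partial^2 F}{\partial u\,\partial v} + \frac{\partial^2 G}{\partial u^2}\Bigr). \end{aligned} \tag{5.1}$$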
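Formula (5.1) is easy to misprint, so it is worth testing. The sketch below transcribes it term by term into Python, replacing every partial derivative of E, F, G by a central difference (a numerical shortcut of ours, not Gauss’s procedure). On the unit sphere ($E = 1$, $F = 0$, $G = \sin^2 u$) it should give κ = 1, and on the metric of the hyperbolic paraboloid with an assumed a = 2 it should reproduce $-4a^2/(a^2 + 4x^2 + 4y^2)^2$.

```python
import math

def gauss_formula_kappa(E, F, G, u, v, h=1e-4):
    """Curvature from Gauss's formula (5.1), given E, F, G as Python functions of (u, v).

    All partial derivatives are replaced by central differences with step h,
    a numerical shortcut of ours rather than anything in Gauss's treatment.
    """
    def d_u(fn): return (fn(u + h, v) - fn(u - h, v)) / (2 * h)
    def d_v(fn): return (fn(u, v + h) - fn(u, v - h)) / (2 * h)
    def d_uu(fn): return (fn(u + h, v) - 2 * fn(u, v) + fn(u - h, v)) / h ** 2
    def d_vv(fn): return (fn(u, v + h) - 2 * fn(u, v) + fn(u, v - h)) / h ** 2
    def d_uv(fn):
        return (fn(u + h, v + h) - fn(u + h, v - h)
                - fn(u - h, v + h) + fn(u - h, v - h)) / (4 * h * h)

    e, f, g = E(u, v), F(u, v), G(u, v)
    Eu, Ev, Fu, Fv, Gu, Gv = d_u(E), d_v(E), d_u(F), d_v(F), d_u(G), d_v(G)
    rhs = (e * (Ev * Gv - 2 * Fu * Gv + Gu ** 2)
           + f * (Eu * Gv - Ev * Gu - 2 * Ev * Fv + 4 * Fu * Fv - 2 * Fu * Gu)
           + g * (Eu * Gu - 2 * Eu * Fv + Ev ** 2)
           - 2 * (e * g - f * f) * (d_vv(E) - 2 * d_uv(F) + d_uu(G)))
    return rhs / (4 * (e * g - f * f) ** 2)

# Unit sphere (co-latitude u, longitude v): E = 1, F = 0, G = sin(u)^2; expect kappa = 1.
k_sphere = gauss_formula_kappa(lambda u, v: 1.0, lambda u, v: 0.0,
                               lambda u, v: math.sin(u) ** 2, 0.9, 0.4)

# Hyperbolic paraboloid z = (x^2 - y^2)/a with a = 2, using its metric coefficients.
a = 2.0
k_saddle = gauss_formula_kappa(lambda x, y: 1 + 4 * x * x / a ** 2,
                               lambda x, y: -4 * x * y / a ** 2,
                               lambda x, y: 1 + 4 * y * y / a ** 2, 0.5, -0.3)
print(k_sphere, k_saddle, -4 * a * a / (a * a + 4 * 0.5 ** 2 + 4 * 0.3 ** 2) ** 2)
```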
Given this formula, it becomes possible to begin the discussion of a surface with
the metric coefficients E, F , and G, ignoring their origin as dot products of tangent
vectors to the surface. The ambient space in which the surface is embedded then
becomes superfluous. The shape of the surface is determined by the metric coefficients as functions of its parameters, and
the conversion from one set of parameters to another is governed by the Jacobian
matrix. In other words, algebra turns the study of the geometric object into the
study of a set of equations, and the foundation is laid for generalization to geometry
in any number of dimensions.
Although this formula applies only to two-dimensional surfaces in R³, its analytic form naturally inspires the question of how it might be generalized to give a definition of curvature in higher-dimensional manifolds. Our aim is to study the “shape” of such manifolds as functions of parameters without the need to refer to any space in which the manifold is embedded. The key to doing so, defining a “direction” in a completely abstract space, is described in Appendix 4 of Volume 2, where it is shown how the tangent plane can be generalized as the space of derivations of locally $C^\infty$ mappings. A derivation is a linear operator X on locally $C^\infty$ functions f having the multiplicative property $X(fg) = f\,X(g) + g\,X(f)$. In Euclidean space, such an operator is necessarily a directional derivative, so that these operators provide an algebraic encoding of the concept of a direction. This fact makes up for the ambiguity in the notion of direction when a surface is given parametrically. Each parameter domain can be regarded as a Euclidean space made up of points whose analytic expressions as ordered sets of real numbers provide a natural set of directions. But the fact that a given surface can be parameterized in infinitely many ways means that any such choice of “direction” is arbitrary. The derivation operators X, however, operate on real-valued functions defined on the surface itself, independently of any parameterization. Each such operator therefore “stands in” for a direction on the surface away from a given point, thereby providing the abovementioned algebraic encoding of the concept.
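The easy half of the claim above, that a directional derivative is a derivation, can be checked numerically. In this sketch the operator $X = a\,\partial/\partial x + b\,\partial/\partial y$ is approximated by central differences; the coefficients, test functions, and point are arbitrary assumptions of ours. The converse, that every derivation has this form, is the part proved in Appendix 4 and is not something a numerical test can establish.

```python
import math

def directional_derivative(a, b, h=1e-6):
    """Return the operator X = a d/dx + b d/dy acting on functions f(x, y) at a point.

    The components (a, b) and the central-difference step h are illustrative choices.
    """
    def X(f, x, y):
        fx = (f(x + h, y) - f(x - h, y)) / (2 * h)
        fy = (f(x, y + h) - f(x, y - h)) / (2 * h)
        return a * fx + b * fy
    return X

f = lambda x, y: math.sin(x) * y
g = lambda x, y: math.exp(x - y)
X = directional_derivative(0.6, -1.7)
x0, y0 = 0.4, 1.2

# Multiplicative (Leibniz) property: X(fg) = f X(g) + g X(f).
leibniz_lhs = X(lambda x, y: f(x, y) * g(x, y), x0, y0)
leibniz_rhs = f(x0, y0) * X(g, x0, y0) + g(x0, y0) * X(f, x0, y0)
# Linearity: X(3f + g) = 3 X(f) + X(g).
linear_lhs = X(lambda x, y: 3 * f(x, y) + g(x, y), x0, y0)
linear_rhs = 3 * X(f, x0, y0) + X(g, x0, y0)
print(leibniz_lhs, leibniz_rhs)  # agree up to finite-difference error
```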
The generalization of this work of Gauss involves considerable difficulty. The
form of Eq. (5.1) is sufficiently asymmetric that it is not obvious what the analogous
10 Felix Klein ([46], p. 155) commented that Gauss obtained this formula “rather laboriously,
by a lengthy computation” (nicht ohne Mühe, auf Grund längerer Rechnung).
11 The standard English translation of this work by James Caddall Morehead and Adam Miller
Hiltbeitel, published by the Princeton University Library in 1902, has a misprint on the left-hand side of the formula, where $(EG - F^2)$ appears instead of the correct expression $(EG - F^2)^2$.
To add some heuristic support for this (unproved) assertion, we note that the curvature is defined as the square root of the quotient of the determinants of the second and first fundamental forms. Now the first fundamental form is essentially the three metric coefficients. As for the second, it is obtained from the unit normal vector n just as the first fundamental form is computed from r, that is, by taking dot products of partial derivatives. Since n is a unit vector, its partial derivatives are perpendicular to it (differentiating $n \cdot n = 1$ gives $n \cdot \partial n/\partial u = 0$, and likewise for v), that is, they are vectors in the tangent plane, and therefore expressible as linear combinations of the two first derivatives of r. Their dot products can therefore be expressed as linear combinations of E, F, and G.
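The heuristic argument can be seen at work on the hyperbolic paraboloid $z = (x^2 - y^2)/a$ treated earlier, with a = 2 assumed. The sketch below checks that $\partial n/\partial x$ is perpendicular to n, finds coefficients α and β with $\partial n/\partial x = \alpha\,\partial r/\partial x + \beta\,\partial r/\partial y$ by solving a 2 × 2 system whose matrix is the metric matrix, and then rewrites $D = \partial n/\partial x \cdot \partial n/\partial x$ as $\alpha^2 E + 2\alpha\beta F + \beta^2 G$. Note that α and β are obtained here from n itself, that is, from the embedding, which is precisely the dependence that the next paragraph identifies as the weak point.

```python
import math

a = 2.0  # the constant in z = (x^2 - y^2)/a (assumed value)

def unit_normal(x, y):
    s = math.sqrt(a * a + 4 * x * x + 4 * y * y)
    return (-2 * x / s, 2 * y / s, a / s)

def dot(p, q):
    return sum(pi * qi for pi, qi in zip(p, q))

x, y, h = 0.5, -0.3, 1e-6
rx, ry = (1.0, 0.0, 2 * x / a), (0.0, 1.0, -2 * y / a)   # dr/dx and dr/dy
nx = tuple((p - m) / (2 * h) for p, m in zip(unit_normal(x + h, y), unit_normal(x - h, y)))
E, F, G = dot(rx, rx), dot(rx, ry), dot(ry, ry)

# Differentiating n . n = 1 gives n . dn/dx = 0, so dn/dx lies in the tangent plane.
normal_part = dot(unit_normal(x, y), nx)

# Write dn/dx = alpha r_x + beta r_y: dot with r_x and r_y and solve
# [[E, F], [F, G]] (alpha, beta) = (dn/dx . r_x, dn/dx . r_y) by Cramer's rule.
b1, b2 = dot(nx, rx), dot(nx, ry)
det = E * G - F * F
alpha, beta = (G * b1 - F * b2) / det, (E * b2 - F * b1) / det
residual = max(abs(c - (alpha * p + beta * q)) for c, p, q in zip(nx, rx, ry))

# D = dn/dx . dn/dx, rewritten as a combination of the metric coefficients:
D_metric = alpha ** 2 * E + 2 * alpha * beta * F + beta ** 2 * G
print(normal_part, residual, D_metric, dot(nx, nx))
```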
The weak point in this reasoning is that, when the second fundamental form
is expressed in terms of E, F , and G, the coefficients involve derivatives of the
cross product of the two tangent vectors. That is the piece of the computation that
specifically requires three dimensions and so needs to be cut out and replaced with
something intrinsic to the parameterization.
One can easily see that Theorem 5.4 holds when the surface is the graph of a function $z = f(x, y)$ (Problems 5.2 and 5.4).
[Figure 5.4: diagram with labeled vectors $\partial^2 r/\partial x^i\,\partial x^j$, $\Gamma^1_{ij}\,\partial r/\partial x^1$, $\Gamma^2_{ij}\,\partial r/\partial x^2$, and $\Gamma^3_{ij}\,n$.]
Figure 5.4. The standard Christoffel symbols $\Gamma^1_{ij}$ and $\Gamma^2_{ij}$ are the coefficients of the tangential component of a second-order partial derivative, that is, the components of the projection of the partial derivative into the tangent plane. The nonstandard symbol $\Gamma^3_{ij}$ is the coefficient of its normal component.
that it is the same as the algebraic definition given in Chapter 4. That task is the
purpose of the present subsection.
In the previous subsection, we remarked that when a surface is described in absolute terms as the graph of a vector-valued function r(u, v) (say it consists of the points $\bigl(x(u, v), y(u, v), z(u, v)\bigr)$), the curvature can be expressed in terms of the first and second derivatives of the vector-valued function $r(u, v) = x(u, v)\,i + y(u, v)\,j + z(u, v)\,k$. The difficulty we are trying to overcome is the fact that the second-order derivatives, regarded as vectors in R³, are not tangential to the surface. We have no way, as yet, of expressing them that is intrinsic to the surface and independent of its embedding in R³ and hence applicable to any parameterization of the surface. To deal with this difficulty, we separate the second-order partial derivatives of the mapping r into components tangential and normal to the surface. We write
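$$\frac{\partial^2 r}{\partial u^2} = \Gamma^1_{11}\frac{\partial r}{\partial u} + \Gamma^2_{11}\frac{\partial r}{\partial v} + \Gamma^3_{11}\,n,$$
$$\frac{\partial^2 r}{\partial u\,\partial v} = \Gamma^1_{12}\frac{\partial r}{\partial u} + \Gamma^2_{12}\frac{\partial r}{\partial v} + \Gamma^3_{12}\,n,$$
$$\frac{\partial^2 r}{\partial v^2} = \Gamma^1_{22}\frac{\partial r}{\partial u} + \Gamma^2_{22}\frac{\partial r}{\partial v} + \Gamma^3_{22}\,n,$$
where the vector n is the unit normal to the surface, so that its dot products with both $\partial r/\partial u$ and $\partial r/\partial v$ are zero.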
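This decomposition can be carried out numerically. In the sketch below (our own illustration) the unit sphere in co-latitude u and longitude v is the assumed example surface. Second derivatives of r are approximated by central differences, the coefficient of n comes from a dot product with n, and the two tangential coefficients come from dotting with $\partial r/\partial u$ and $\partial r/\partial v$ and solving the resulting 2 × 2 system whose matrix is M. For the sphere one should find $\Gamma^3_{11} = -1$, $\Gamma^2_{12} = \cot u$, $\Gamma^1_{22} = -\sin u\cos u$, $\Gamma^3_{22} = -\sin^2 u$, and zero for the others.

```python
import math

def r(u, v):
    """Unit sphere in co-latitude u and longitude v (illustrative surface)."""
    return (math.sin(u) * math.cos(v), math.sin(u) * math.sin(v), math.cos(u))

def dot(p, q):
    return sum(a * b for a, b in zip(p, q))

def cross(p, q):
    return (p[1] * q[2] - p[2] * q[1], p[2] * q[0] - p[0] * q[2], p[0] * q[1] - p[1] * q[0])

def combo(*terms):
    """Linear combination sum(c * p) of 3-vectors, given as (c, p) pairs."""
    return tuple(sum(c * p[k] for c, p in terms) for k in range(3))

def split_second_derivatives(u, v, h=1e-4):
    """Return {(j, k): (Gamma^1_jk, Gamma^2_jk, Gamma^3_jk)} at (u, v); index 1 = u, 2 = v."""
    R = lambda i, j: r(u + i * h, v + j * h)          # r at grid offsets
    ru = combo((0.5 / h, R(1, 0)), (-0.5 / h, R(-1, 0)))
    rv = combo((0.5 / h, R(0, 1)), (-0.5 / h, R(0, -1)))
    c = 1 / h ** 2
    second = {
        (1, 1): combo((c, R(1, 0)), (-2 * c, R(0, 0)), (c, R(-1, 0))),
        (1, 2): combo((c / 4, R(1, 1)), (-c / 4, R(1, -1)), (-c / 4, R(-1, 1)), (c / 4, R(-1, -1))),
        (2, 2): combo((c, R(0, 1)), (-2 * c, R(0, 0)), (c, R(0, -1))),
    }
    N = cross(ru, rv)
    n = tuple(x / math.sqrt(dot(N, N)) for x in N)   # unit normal
    E, F, G = dot(ru, ru), dot(ru, rv), dot(rv, rv)
    det = E * G - F * F
    result = {}
    for jk, rjk in second.items():
        b1, b2 = dot(rjk, ru), dot(rjk, rv)          # the normal term drops out here
        result[jk] = ((G * b1 - F * b2) / det,        # coefficient of dr/du
                      (E * b2 - F * b1) / det,        # coefficient of dr/dv
                      dot(rjk, n))                    # coefficient of n
    return result

for jk, gammas in split_second_derivatives(0.8, 0.3).items():
    print(jk, gammas)
```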
The symbols $\Gamma^i_{kj}$ will turn out to denote the same objects we denoted by these symbols in Chapter 4. As mentioned there, they are called the Christoffel symbols, after Elwin Bruno Christoffel (1829–1900). They are intimately involved with the differential geometry of vector fields on manifolds. The important Christoffel symbols are the coefficients of the two components in the tangent plane, that is, $\Gamma^1_{ij}$ and $\Gamma^2_{ij}$. They are the two components of the projection of the second-order partial derivative into the tangent plane, as illustrated in Fig. 5.4.
The subscript and superscript notation we have introduced here will gradually come to dominate all of our computations, allowing us to ignore any embedding of a manifold in a higher-dimensional ambient space and work directly with the parameters and the metric coefficients. For the moment, however, we need the vector notation of R³ to instill an intuition about the meaning of the algebraic formulas
we are going to be deriving. Without that intuition, the basic formulas of differential geometry would simply be learned by rote, the way we all once learned the multiplication table. The route by which these formulas were originally discovered would remain a mystery, and the fact that they encode something useful in the description of nature would be an even deeper mystery. Also, without the intuitive geometric picture of their meaning they would not be of much help in suggesting physical hypotheses. Anyone can manipulate the formulas; but in order to be of any use, that manipulation must be guided by intuition.¹²
Notice that the two subscripts j and k correspond to the two parameters u and v; thus the subscript pair 11 corresponds to taking the derivative with respect to u both times, 12 and 21 correspond to the mixed derivative, and 22 corresponds to two differentiations with respect to v. Since the mixed partial derivative is the same in either order, it follows that $\Gamma^i_{21} = \Gamma^i_{12}$. The superscript i = 1 or i = 2 corresponds to the first-order partial derivative vector whose coefficient is the Christoffel symbol in question.
Remark 5.6. Normally, on a manifold of dimension n, there are n³ Christoffel symbols, all defined by the formula that was given in Chapter 4. On a two-dimensional surface in R³ the four Christoffel symbols with superscript 3 would not normally be used, and we shall occasionally rewrite them as what they are, namely:
$$\Gamma^3_{11} = \frac{\partial^2 r}{\partial u^2}\cdot n, \qquad \Gamma^3_{12} = \Gamma^3_{21} = \frac{\partial^2 r}{\partial u\,\partial v}\cdot n, \qquad \Gamma^3_{22} = \frac{\partial^2 r}{\partial v^2}\cdot n.$$
For the time being, we shall distinguish these four Christoffel symbols, which
are defined only for surfaces in R³ (and perhaps only in the present book!), by
calling them the nonstandard Christoffel symbols, although of course we wouldn’t
have introduced them unless we had some use for them. As a matter of fact, we
shall find it easy to standardize them by introducing a third parameter. That will
allow the subscripts to assume the value 3 as well, and we shall then have a 3 × 3
matrix of metric coefficients and a triply-indexed tableau of 27 Christoffel symbols.
This seeming complication will save us some labor even in the two-variable case we
are now studying, since it will allow us to write formulas in a uniform notation that
12 What we are trying to do here is raise the level of abstract visualization. The goal is to
harmonize intuitive but not rigorous geometric pictures with analytic expressions that contain
unimpeachable rigor but lack intuitive content. All higher-level mathematics courses depend on
the student’s bridging this gap. Algebraists learn to visualize abstract groups, and analysts to
visualize abstract topological spaces. In both cases, there is a basic substrate of concrete examples
(small finite groups, or two- and three-dimensional Lie groups in the case of algebraists, ordinary
Euclidean space in the case of analysts) that allows abstract definitions to seat themselves in
the mind. Conjectures about abstract objects are suggested intuitively, but have to be checked
against logic, since an abstract object does not have all of the properties found in the intuitive
stock of mental pictures. In 1962, I had the privilege of learning real analysis from the late Ralph
P. Boas, Jr. (1912–1992). I remember vividly his posing the question whether the closure of a
ball in a metric space is compact. (The answer is no. In finite-dimensional Euclidean spaces, it
is compact; but the example of any infinite set equipped with the discrete metric shows that this
is not true in general.) Thus, another staple in the diet of research mathematicians is a stock of
counterexamples to restrain our potentially misleading intuition.
otherwise would contain anomalous terms. For the time being, the nonstandard
Christoffel symbols are defined purely by convention, that is, from the equations
displayed above.
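As for the standard Christoffel symbols $\Gamma^i_{kj}$, for which the superscript has the same range (i = 1, 2) as the two subscripts, it is proved in Appendix 6 of Volume 2 that if we set $g_{11} = E$, $g_{12} = g_{21} = F$, and $g_{22} = G$, these symbols are just the two-dimensional version of the four-dimensional symbols in Eq. (4.11), that is, when $x^1 = u$ and $x^2 = v$,
$$\Gamma^i_{jk} = \frac{1}{2}\sum_{l} g^{il}\Bigl(\frac{\partial g_{jl}}{\partial x^k} + \frac{\partial g_{lk}}{\partial x^j} - \frac{\partial g_{jk}}{\partial x^l}\Bigr),$$
where $(g^{il})$ denotes the inverse of the matrix $(g_{il})$.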
$$\kappa = \frac{\cdots}{g_{11}g_{22} - g_{12}g_{21}}, \tag{5.3}$$
where the range of summation on the indices m and p is from 1 to 2. (Thus, only
the standard metric and Christoffel symbols, intrinsic to the surface, appear in this
formula.)
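The formula for $\Gamma^i_{jk}$ uses only the metric, so it can be evaluated without any reference to r. The following Python sketch (an illustration with our own helper names and an assumed finite-difference step h) does this for the sphere metric $g_{11} = 1$, $g_{12} = g_{21} = 0$, $g_{22} = \sin^2 u$, where it should give $\Gamma^1_{22} = -\sin u\cos u$ and $\Gamma^2_{12} = \Gamma^2_{21} = \cot u$, the same values that projecting the second derivatives of the embedded sphere onto its tangent plane produces.

```python
import math

def christoffel_from_metric(g, x, h=1e-5):
    """Gamma[i][j][k] = 1/2 * sum_l g^{il} (dg_jl/dx^k + dg_lk/dx^j - dg_jk/dx^l).

    g(x) must return the symmetric 2x2 matrix [[g11, g12], [g21, g22]] at x = (x1, x2).
    Indices are 0-based (0 stands for 1, 1 for 2); derivatives are central differences.
    """
    def dg(l):                       # matrix of derivatives d g_ab / d x^l
        xp, xm = list(x), list(x)
        xp[l] += h
        xm[l] -= h
        gp, gm = g(xp), g(xm)
        return [[(gp[a][b] - gm[a][b]) / (2 * h) for b in range(2)] for a in range(2)]

    d = [dg(0), dg(1)]               # d[l][a][b] = d g_ab / d x^l
    m = g(x)
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    inv = [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]  # g^{il}
    return [[[0.5 * sum(inv[i][l] * (d[k][j][l] + d[j][l][k] - d[l][j][k]) for l in range(2))
              for k in range(2)] for j in range(2)] for i in range(2)]

# Unit sphere: g11 = E = 1, g12 = g21 = F = 0, g22 = G = sin(u)^2, with x1 = u, x2 = v.
sphere = lambda x: [[1.0, 0.0], [0.0, math.sin(x[0]) ** 2]]
Gamma = christoffel_from_metric(sphere, (0.8, 0.3))
print(Gamma[0][1][1], -math.sin(0.8) * math.cos(0.8))   # Gamma^1_22
print(Gamma[1][0][1], math.cos(0.8) / math.sin(0.8))    # Gamma^2_12
```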
It appears that we have here an embarrassment of riches, since the two sides of Eq. (5.2) can be made to assume the value $(g_{11}g_{22} - g_{12}g_{21})\kappa$ by two obvious choices of i, j, k, l: either i = k = 1, j = l = 2 or i = k = 2, j = l = 1. One of our choices for expressing the curvature κ can be
$$\kappa = \frac{\mathrm{Rie}\bigl(\frac{\partial r}{\partial u}, \frac{\partial r}{\partial v}, \frac{\partial r}{\partial u}, \frac{\partial r}{\partial v}\bigr)}{g_{11}g_{22} - g_{12}g_{21}}.$$
We have already encountered this kind of redundancy when we used the vanishing of the Ricci tensor to determine the metric coefficients for space-time (see Eq. (4.16) in Chapter 4). In that situation, on a manifold of dimension 4, the Ricci tensor has 16 components, 12 of which vanish automatically, just as happens here in Eq. (5.2).
Remark 5.7. Although we have now succeeded in expressing the curvature
in terms of the metric coefficients, those coefficients are not absolute. They are
defined within a particular system of parameters. The proof that the same number
is obtained for the curvature in any other set of parameters is the main purpose of
Appendix 6 in Volume 2.
To see that Eq. (5.2) really does work, examine Mathematica Notebook 9 in
Volume 3, which uses the equation to compute some curvatures directly from the
metric coefficients. We have at last uncovered the secret of generalizing curvature
in the covariant Riemann curvature tensor Rie (u, v, w, z).
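A rough Python analogue of the kind of computation attributed above to Notebook 9 is sketched below. It is not the book’s code and does not reproduce Eq. (5.2) verbatim: it assumes one common sign convention for the Riemann tensor, stated in the docstring, computes the Christoffel symbols from the metric by the formula given earlier, and returns $\kappa = R_{1212}/(g_{11}g_{22} - g_{12}g_{21})$. With this convention the unit sphere gives κ = 1, and the metric of the hyperbolic paraboloid (a = 2 assumed) reproduces $-4a^2/(a^2 + 4x^2 + 4y^2)^2$.

```python
import math

def christoffel(g, x, h=1e-4):
    """Gamma[i][j][k] = 1/2 sum_l g^{il}(dg_jl/dx^k + dg_lk/dx^j - dg_jk/dx^l), 0-based indices."""
    def dg(l):
        xp, xm = list(x), list(x)
        xp[l] += h
        xm[l] -= h
        gp, gm = g(xp), g(xm)
        return [[(gp[a][b] - gm[a][b]) / (2 * h) for b in range(2)] for a in range(2)]
    d = [dg(0), dg(1)]
    m = g(x)
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    inv = [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]
    return [[[0.5 * sum(inv[i][l] * (d[k][j][l] + d[j][l][k] - d[l][j][k]) for l in range(2))
              for k in range(2)] for j in range(2)] for i in range(2)]

def curvature_from_metric(g, x, h=1e-3):
    """kappa = R_1212 / (g11 g22 - g12 g21) computed from the metric alone.

    Assumed sign convention (one of several in use):
      R^m_212 = d_1 Gamma^m_22 - d_2 Gamma^m_12
                + sum_p (Gamma^m_1p Gamma^p_22 - Gamma^m_2p Gamma^p_12),
      R_1212  = sum_m g_1m R^m_212.
    """
    def gamma_at(l, step):
        y = list(x)
        y[l] += step
        return christoffel(g, y)
    G0 = christoffel(g, x)
    G1p, G1m = gamma_at(0, h), gamma_at(0, -h)     # shifted in x^1
    G2p, G2m = gamma_at(1, h), gamma_at(1, -h)     # shifted in x^2
    m = g(x)
    R1212 = 0.0
    for s in range(2):                              # s plays the role of the index m
        d1_G22 = (G1p[s][1][1] - G1m[s][1][1]) / (2 * h)
        d2_G12 = (G2p[s][0][1] - G2m[s][0][1]) / (2 * h)
        quad = sum(G0[s][0][p] * G0[p][1][1] - G0[s][1][p] * G0[p][0][1] for p in range(2))
        R1212 += m[0][s] * (d1_G22 - d2_G12 + quad)
    return R1212 / (m[0][0] * m[1][1] - m[0][1] * m[1][0])

sphere = lambda x: [[1.0, 0.0], [0.0, math.sin(x[0]) ** 2]]
a = 2.0
saddle = lambda x: [[1 + 4 * x[0] ** 2 / a ** 2, -4 * x[0] * x[1] / a ** 2],
                    [-4 * x[0] * x[1] / a ** 2, 1 + 4 * x[1] ** 2 / a ** 2]]
print(curvature_from_metric(sphere, (0.9, 0.4)))            # close to 1
print(curvature_from_metric(saddle, (0.5, -0.3)),
      -4 * a * a / (a * a + 4 * 0.25 + 4 * 0.09) ** 2)       # should agree closely
```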
Remark 5.8. The mathematical beneficence of the universe is wonderfully
revealed in the mere fact that the Riemann curvature tensor really is a tensor. We
should not expect this, since it is assembled from the Christoffel symbols and their
partial derivatives, which are not tensors. (See Appendix 6 of Volume 2.) The
Christoffel symbols fail to be a tensor in just exactly the right way to make the
Riemann tensor transform correctly.
3.4. A look ahead. Although the expression given here for the covariant
Riemann curvature tensor is still just a formula containing a lot of symbols, the
subscript notation that allows us to regard it as a tensor of type (0, 4) provides the
foundation that will enable us to modify it in the next chapter. We shall first raise
one of the indices to get the standard Riemann curvature tensor, usually denoted $R^\rho_{\sigma\mu\nu}$, a tensor of type (1, 3) that operates on one covector $\upsilon = u_i\,dx^i$ and three
4. Problems
Problem 5.1. Verify that the curvature of a circle of radius $r_0$ is $1/r_0$.
Problem 5.2. Assume that the tangent plane to the surface $z = f(x, y)$ (where $f(0, 0) = 0$) is horizontal at the point (0, 0, 0), so that $\partial f/\partial x = 0 = \partial f/\partial y$ at this point. It was shown in the text that the curvature at the point (0, 0, 0) is the determinant of the Hessian, that is,
$$\kappa(0, 0) = \frac{\partial^2 f}{\partial x^2}\frac{\partial^2 f}{\partial y^2} - \Bigl(\frac{\partial^2 f}{\partial x\,\partial y}\Bigr)^2.$$
Assume now that the coordinates (x, y, z) are changed by an orthogonal matrix
$$O = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}$$
into the coordinates (u, v, w), and that in the new coordinates the equation of the surface is $w = h(u, v)$. That is,
$$\begin{aligned} u &= a_{11}x + a_{12}y + a_{13}f(x, y), \\ v &= a_{21}x + a_{22}y + a_{23}f(x, y), \\ w = h(u, v) &= a_{31}x + a_{32}y + a_{33}f(x, y). \end{aligned}$$
We now take up the story of curvature where we left it in the previous chapter. The
torch was passed from one mathematical giant to another in 1854, when Riemann
gave his inaugural lecture at the University of Göttingen with the aged Gauss in the
audience. (The latter had only one year of life remaining.) Riemann took Gauss’s
concept of the first fundamental form and generalized it to what he called (and what
is still called) the metric on an n-dimensional manifold. The idea of a manifold is an
abstraction and generalization of the concept of a surface. The parameterizations
introduced by Euler and Gauss are applicable only locally, in a neighborhood of a
point. If their domain is enlarged too far, singularities are typically encountered.
For example, polar coordinates are not applicable at the origin in the plane, since
the polar angle θ is not defined at that point. Furthermore, if every point in the
plane except the origin is to have unique coordinates, it is necessary to restrict θ to
some half-open interval, usually [−π, π), leaving a “barricade” at one end. There
is no way of using these parameters in such a way that (1) every point except the
origin has unique coordinates and (2) each coordinate can vary in some interval
about its value at any given point. Using parameterizations, one can normally
study only a local piece of a surface, while the surface itself is thought of as a single
object, all of whose points are equally dignified and deserving of attention.
As originally introduced, manifolds had this same limitation, and the same was
true in the theory of Lie groups (a way of extending the Galois theory of algebraic
equations to differential equations). Originally, a Lie group was not actually a
group, but only a piece of a group near its identity element. Nevertheless, the
power of analytic symbolism (parameterization) to express deep geometric intuition
had profound consequences, even at the earliest stage. In the present chapter,
we work mostly in such an environment, using whatever local parameterization is
convenient for a specific purpose. These methods sufficed for Einstein’s work on
general relativity.
It was only in the mid-twentieth century that the modern concepts of a global
differentiable manifold and manifold-with-boundary were developed to remove the
limitations on parameterizations mentioned above. As a consequence, a manifold
once again became a unified object, and Lie groups became actual groups. The
systematization and improved perspective provided by the concept of a manifold
are well worth knowing, even though we can get along without it as far as the
application to relativity is concerned. To avoid cluttering up the exposition with
all this geometric machinery, we have relegated the modern theory of manifolds
to Appendix 4, which shows the advantages and the cost of getting a theory that
allows a manifold to be regarded as a single object, like a surface in R³. Even
in Appendix 4, our discussion of this material is minimal, and a more systematic
treatment can be found in, for example, the book by Sternberg [78].
226 6. CONCEPTS OF CURVATURE, 1850–1950
1. Second-Order Derivations
Comparison of the formula (5.1) given by Gauss for curvature in terms of the metric coefficients with formula (5.3), which expresses it in terms of the Riemann curvature tensor, shows the importance of the ingredient that Gauss was lacking: the Christoffel symbols. We used these symbols in an ad hoc way in the previous chapter, but their application in Chapter 4 shows how much greater still their potential is. The present chapter is devoted to developing that potential, extending what we have done on surfaces in R³ to a general abstract manifold. We begin by describing a way of replacing tangent vectors in R³ with a corresponding object that can be expressed in terms of parameters without the need for the ambient space R³.
A vector field on a manifold (see Appendix 4) is intuitively interpreted as a choice of a “directional derivative” at each point. More formally, a vector field u defined in a neighborhood of a point P of a manifold is a derivation, which is a linear mapping of the space of local $C^\infty$ functions into itself satisfying the multiplicative condition $u(fg) = u(f)\,g + f\,u(g)$. It is proved in Appendix 4 that such an operator can be expressed in terms of a local coordinate system near P as a linear combination of partial derivatives in which the coefficients are $C^\infty$ functions. A covector field υ is a linear mapping of these vector fields into the space of $C^\infty$ functions. The reader is assumed to be familiar with the duality of vector and covector spaces in linear algebra, and the analogous fact holds here. Although a vector field is defined as operating on the space of locally $C^\infty$ functions, it can equally well be regarded as acting on the space of covector fields, since the mapping υ(u) can be “turned around” and regarded as a mapping u(υ). What we really have is a bilinear function $\langle \upsilon, u \rangle = \upsilon(u) = u(\upsilon)$.
Any smooth surface in R³ has, theoretically, a tangent plane, but to describe it without algebra is extremely difficult. Only for a few very simple surfaces were the ancient Greeks able to characterize the tangent plane to a surface in a comprehensible way. And without such a description, what does it mean to say that the surface even has a tangent plane? We may feel intuitively that it ought to, but to express what it is in the language of Greek geometry is a fool’s errand. Fortunately, analytic geometry solves this problem. We can retain the absolute character of the surface and still say precisely what its tangent plane is by defining the surface as the points (x, y, z) in a universal Euclidean space R³ that satisfy an equation $F(x, y, z) = 0$. In that case, provided $\nabla F(x_0, y_0, z_0) \neq 0$, we can characterize the tangent plane at a point $(x_0, y_0, z_0)$ as the set of points (x, y, z) such that the vector $(x - x_0)i + (y - y_0)j + (z - z_0)k$ is perpendicular to the gradient $\nabla F(x_0, y_0, z_0)$. This characterization amounts to the vector equation
$$\nabla F(x_0, y_0, z_0) \cdot \bigl((x - x_0)i + (y - y_0)j + (z - z_0)k\bigr) = 0,$$
that is,
$$\frac{\partial F}{\partial x}(x - x_0) + \frac{\partial F}{\partial y}(y - y_0) + \frac{\partial F}{\partial z}(z - z_0) = 0,$$
where the partial derivatives are evaluated at $(x_0, y_0, z_0)$.
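The gradient characterization translates directly into a test for membership in the tangent plane. In this minimal sketch (helper names are ours) the gradient is approximated by central differences, and the unit sphere $x^2 + y^2 + z^2 - 1 = 0$ at the point (0.6, 0, 0.8) serves as an assumed example; the left side of the last equation vanishes for points of the tangent plane and not otherwise.

```python
def gradient(F, p, h=1e-6):
    """Central-difference gradient of a scalar function F(x, y, z) at the point p."""
    grad = []
    for k in range(3):
        pp, pm = list(p), list(p)
        pp[k] += h
        pm[k] -= h
        grad.append((F(*pp) - F(*pm)) / (2 * h))
    return grad

def tangent_plane_lhs(F, p0, p):
    """dF/dx (x - x0) + dF/dy (y - y0) + dF/dz (z - z0), derivatives taken at p0.

    It vanishes exactly when p lies on the tangent plane at p0 (assuming grad F(p0) != 0).
    """
    return sum(gk * (pk - qk) for gk, pk, qk in zip(gradient(F, p0), p, p0))

unit_sphere = lambda x, y, z: x * x + y * y + z * z - 1.0   # F(x, y, z) = 0 defines the surface
p0 = (0.6, 0.0, 0.8)                                        # a point on the sphere
# p0 + 0.5*(0.8, 0, -0.6) + 0.3*(0, 1, 0): both directions are perpendicular to p0.
on_plane = (0.6 + 0.4, 0.3, 0.8 - 0.3)
off_plane = (0.6, 0.0, 1.3)
print(tangent_plane_lhs(unit_sphere, p0, on_plane))    # about 0
print(tangent_plane_lhs(unit_sphere, p0, off_plane))   # about 0.8
```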
When the surface is parameterized by two real variables s and t as a vector-valued function $r(s, t) = \bigl(x(s, t), y(s, t), z(s, t)\bigr)$ defined on an open set U in R² containing a point $(s_0, t_0)$ with $r(s_0, t_0) = (x_0, y_0, z_0)$, the tangent plane coincides with the set of all points u in R³
1. SECOND-ORDER DERIVATIONS 227
of the form
$$u = x_0 i + y_0 j + z_0 k + u^1\frac{\partial r}{\partial s} + u^2\frac{\partial r}{\partial t},$$
where $u^1$ and $u^2$ both range over the set of real numbers and again the partial derivatives are evaluated at $(s_0, t_0)$. In this way, an observer using the coordinates s and t and the parameterization r has an interpretation of the tangent plane and the notion of a tangent vector expressed in absolute terms.
A field of tangent vectors is a vector-valued function
$$u(s, t) = u^1(s, t)\,\frac{\partial r}{\partial s}(s, t) + u^2(s, t)\,\frac{\partial r}{\partial t}(s, t).$$
The tangent vector u(s, t) here is identified with the point u(s, t) + r(s, t) in R³, which is a point in the tangent plane to the surface at the point r(s, t). We have included the arguments (s, t) to emphasize that a vector field is a collection of tangent vectors. If we wish to free ourselves from the ambient space and talk about the surface as it intrinsically is, independent of any such embedding, we must get rid of the vector-valued function r in this expression. One way to do this is to associate with the vector field u just exhibited the derivation $u_d : C^\infty(U) \to C^\infty(U)$ whose expression in (s, t)-coordinates is
$$u_d(f) = u^1\frac{\partial f}{\partial s} + u^2\frac{\partial f}{\partial t}$$
for all smooth functions f. It is shown in Appendix 4 that derivations are completely characterized by the properties of being linear functions of the argument f having the multiplicative property $u_d(fg) = u_d(f)\,g + f\,u_d(g)$.
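Although the identification $u \leftrightarrow u_d$ seems formally an obvious one to make, it requires some explanation. If we assume that f(x, y, z) is a $C^\infty$-function defined on an open set containing the surface we are dealing with, it coincides on the surface itself with the function $\tilde f(s, t) = f\bigl(r(s, t)\bigr) = f\bigl(x(s, t), y(s, t), z(s, t)\bigr)$. Since the chain rule gives $\partial\tilde f/\partial s = \nabla f \cdot \partial r/\partial s$ and $\partial\tilde f/\partial t = \nabla f \cdot \partial r/\partial t$, we have the equation
$$u_d(\tilde f) = u \cdot \nabla f.$$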
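The equation $u_d(\tilde f) = u \cdot \nabla f$ can be checked numerically. The sketch below uses the unit sphere as the parameterized surface and $f(x, y, z) = xy + z^2$ as the function on R³; both, together with the components $u^1$, $u^2$ and the sample point, are arbitrary assumptions of ours. The left side applies the derivation to $\tilde f = f \circ r$ using only the parameters, while the right side uses the “arrow” $u = u^1\,\partial r/\partial s + u^2\,\partial r/\partial t$ and the gradient of f.

```python
import math

def r(s, t):
    """Illustrative parameterized surface: the unit sphere (co-latitude s, longitude t)."""
    return (math.sin(s) * math.cos(t), math.sin(s) * math.sin(t), math.cos(s))

def f(x, y, z):          # an arbitrary smooth function on R^3 (assumed example)
    return x * y + z * z

def grad_f(x, y, z):     # its gradient, computed by hand
    return (y, x, 2 * z)

def partial(func, s, t, wrt, h=1e-6):
    """Central difference in s (wrt=0) or t (wrt=1) of a scalar or vector function."""
    ds, dt = (h, 0.0) if wrt == 0 else (0.0, h)
    p, m = func(s + ds, t + dt), func(s - ds, t - dt)
    if isinstance(p, tuple):
        return tuple((a - b) / (2 * h) for a, b in zip(p, m))
    return (p - m) / (2 * h)

def f_tilde(s, t):       # f expressed in the parameters: f~(s, t) = f(r(s, t))
    return f(*r(s, t))

s0, t0, u1, u2 = 0.9, 0.4, 0.7, -1.1
# The derivation u_d = u1 d/ds + u2 d/dt applied to f~ (no vector in R^3 involved):
lhs = u1 * partial(f_tilde, s0, t0, 0) + u2 * partial(f_tilde, s0, t0, 1)
# The "arrow" u = u1 r_s + u2 r_t dotted with the gradient of f in R^3:
rs, rt = partial(r, s0, t0, 0), partial(r, s0, t0, 1)
u = tuple(u1 * a + u2 * b for a, b in zip(rs, rt))
rhs = sum(a * b for a, b in zip(u, grad_f(*r(s0, t0))))
print(lhs, rhs)  # agree up to finite-difference error
```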
Notice that the abstract vector (derivation) operator $u_d = u^1\,\partial/\partial s + u^2\,\partial/\partial t$ is obtained by simply erasing the symbol r. From now on, we shall work mostly with the derivation $u_d$, omitting the subscript d. We shall keep the identification of the derivation ∂/∂s with the derivative ∂r/∂s for use whenever we wish to make an argument geometrically intuitive, but most of our discussion will be carried out in the language of algebra and will be simpler in the former notation, without the specific parameterization r in view. When we discuss the Lie bracket [u, v] and the covariant derivative $\nabla_v u$ below, it will be essential to think of a vector v as a directional derivative operator $v^i\,\partial/\partial x^i$ rather than an “arrow” $v^i\,\partial r/\partial x^i$. The symbol r here only gets in the way of the computation.
at this point by a pedantic insistence on using, say, the symbols f˜ and g̃ for the representation
of these functions in terms of parameters, as we did above in order to clarify the meaning of the
symbol ud .
2 Named after the Norwegian mathematician Sophus Lie (1842–1899).
1.2. The covariant derivative of a vector field. The Lie bracket is one
of the two tools we need to discuss the abstract Riemann curvature tensor. The
other, which we introduced in the previous chapter without giving it a name, is the
covariant derivative of one vector field with respect to another. It provides the clue
to a geometric interpretation of the Christoffel symbols. Recalling our geometric
definition of these symbols in the previous chapter, we can see a second way to get
a (tangential) vector field out of the second derivatives: Project them orthogonally
into the tangent plane. That is, simply erase the last term in the equations
∂2r ∂r ∂r
= Γ1ij 1 + Γ2ij 2 + Γ3ij n ,
∂xi ∂xj ∂x ∂x
replacing the second-order partial derivative on the left with what we shall call the
covariant derivative of the vector field ∂/∂xi with respect to the vector field ∂/∂xj
and write as the abstract relation
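$$\nabla_{\partial/\partial x^j}\,\frac{\partial}{\partial x^i} = \Gamma^1_{ij}\,\frac{\partial}{\partial x^1} + \Gamma^2_{ij}\,\frac{\partial}{\partial x^2}\,.$$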
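The orthogonal projection can be carried out numerically. The following is a minimal Python sketch of our own, not the author's method. It assumes the sphere of radius $R$ parameterized by latitude $x^1 = \phi$ and longitude $x^2 = \theta$, and it approximates the second derivatives of $\mathbf r$ by central differences. From the tangential part it recovers $\Gamma^1_{ij}$ and $\Gamma^2_{ij}$ by solving $\langle \partial^2\mathbf r/\partial x^i\partial x^j, \partial\mathbf r/\partial x^l\rangle = \Gamma^k_{ij}\,g_{kl}$, and from the normal part it reads off $\Gamma^3_{ij}$. The helper names (`d_dx`, `christoffel_by_projection`) and the step size are hypothetical choices.

```python
import math

R = 2.0  # radius of the illustrative sphere (our choice)

def r(phi, theta):
    """Position vector r(x^1, x^2) with x^1 = phi (latitude), x^2 = theta (longitude)."""
    return (R * math.cos(phi) * math.cos(theta),
            R * math.cos(phi) * math.sin(theta),
            R * math.sin(phi))

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def d_dx(f, x, i, h=1e-4):
    """Central-difference approximation to the vector df/dx^i at the point x."""
    xp, xm = list(x), list(x)
    xp[i] += h
    xm[i] -= h
    return tuple((p - m) / (2 * h) for p, m in zip(f(*xp), f(*xm)))

def christoffel_by_projection(x):
    """Split d^2 r/dx^i dx^j = G^1_ij r_1 + G^2_ij r_2 + G^3_ij n into its parts.

    Returns (G, G3) with G[k][i][j] = Gamma^{k+1}_{ij} (0-based indices) and
    G3[i][j] = Gamma^3_{ij}."""
    r1, r2 = d_dx(r, x, 0), d_dx(r, x, 1)
    g11, g12, g22 = dot(r1, r1), dot(r1, r2), dot(r2, r2)
    det = g11 * g22 - g12 * g12
    ginv = ((g22 / det, -g12 / det), (-g12 / det, g11 / det))
    n = tuple(-c / R for c in r(*x))  # inward unit normal; valid for this sphere only
    G = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
    G3 = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(2):
        for j in range(2):
            # second derivative = derivative (in x^i) of the first derivative (in x^j)
            rij = d_dx(lambda *y: d_dx(r, y, j), x, i)
            # tangential projection: <r_ij, r_l> = Gamma^k_ij g_kl, solved with g^{-1}
            proj = (dot(rij, r1), dot(rij, r2))
            for k in range(2):
                G[k][i][j] = ginv[k][0] * proj[0] + ginv[k][1] * proj[1]
            G3[i][j] = dot(rij, n)  # normal component
    return G, G3

phi = 0.4
G, G3 = christoffel_by_projection((phi, 1.0))
print(G[0][1][1], math.sin(phi) * math.cos(phi))  # Gamma^1_22 vs sin(phi)cos(phi)
print(G[1][0][1], -math.tan(phi))                 # Gamma^2_12 vs -tan(phi)
print(G3[0][0], G3[1][1])                         # approximately R and R cos^2(phi)
```

For this sphere the exact values are $\Gamma^1_{22} = \sin\phi\cos\phi$, $\Gamma^2_{12} = \Gamma^2_{21} = -\tan\phi$, $\Gamma^3_{11} = R$ and $\Gamma^3_{22} = R\cos^2\phi$. All the others are zero.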
3 Named after Carl Jacobi (1804–1851).
230 6. CONCEPTS OF CURVATURE, 1850–1950
4 But, after all, the tangent space is all that we can define intrinsically. What else could we
do?
Figure 6.1. Small section of the sphere $x^2 + y^2 + z^2 = 1$ near the point $(1/2, 1/2, 1/\sqrt2)$, showing two tangent vectors $u$ and $v$ and their Lie bracket $[v, u]$, all of which lie in the plane perpendicular to the normal vector $\mathbf n$ at that point. The vector $u(v)$ does not lie in this plane, but its perpendicular projection into the plane is the covariant derivative $\nabla_u(v)$. (All the vectors are drawn to scale except $u$ and $[v, u]$, which are shown one-third of their actual size.)
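of $f$ in the direction of $v$, which is to say, the usual result of applying the derivation $v = v^1\,\partial/\partial s + v^2\,\partial/\partial t$ to $f$:
$$\nabla_v f = v^1\,\frac{\partial f}{\partial s} + v^2\,\frac{\partial f}{\partial t} = v(f)\,.$$
In particular,
$$\nabla_{\partial/\partial s} f = \frac{\partial f}{\partial s} \quad\text{and}\quad \nabla_{\partial/\partial t} f = \frac{\partial f}{\partial t}\,.$$
The covariant derivative has the usual property of a derivation when applied to the product of a scalar-valued function and a vector-valued function (Problem 6.18):
$$\nabla_v(f u) = (\nabla_v f)\,u + f\,\nabla_v u\,.$$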
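As a bonus from the covariant derivative, we obtain another way of writing the Lie bracket, namely
$$[u, v] = \Bigl(u^l\,\frac{\partial v^i}{\partial x^l} - v^l\,\frac{\partial u^i}{\partial x^l}\Bigr)\frac{\partial}{\partial x^i} = \nabla_u v - \nabla_v u\,,$$
and the interesting formula
$$\nabla_{[u,v]} = \nabla_{\nabla_u v - \nabla_v u} = \nabla_{\nabla_u v} - \nabla_{\nabla_v u}\,.$$
The proofs of these formulas are easy exercises and are left to the reader (Problem 6.6 below).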
The concepts of tangent vectors, Lie bracket, and covariant derivative are illustrated for the unit sphere in Fig. 6.1. The figure exhibits the vector fields $u = (1/(4y))\,\partial/\partial x + (1/(4x^2))\,\partial/\partial y$ and $v = x\,\partial/\partial x - y\,\partial/\partial y$ at the point $(1/2, 1/2, 1/\sqrt2)$, along with the second-order field $u(v)$, the Lie bracket $[v, u]$, and the covariant derivative $\nabla_u(v)$. These are all coplanar (tangent) vectors, except $u(v)$, whose projection into the tangent plane is the covariant derivative. The field $v(u)$, not shown, also lies outside the tangent plane.
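The bracket shown in the figure can be checked numerically from the component formula for $[u, v]$ given above. The short Python sketch below is an illustration only, and the helper names are ours. It treats $u$ and $v$ as derivations in the parameters $(x, y)$ and approximates their partial derivatives by central differences. Working by hand, one finds $[v, u] = -(1/(4x^2))\,\partial/\partial y$, which equals $-\partial/\partial y$ at $x = 1/2$.

```python
def u(x, y):
    """Components (u^1, u^2) of u = (1/(4y)) d/dx + (1/(4x^2)) d/dy."""
    return (1.0 / (4.0 * y), 1.0 / (4.0 * x * x))

def v(x, y):
    """Components (v^1, v^2) of v = x d/dx - y d/dy."""
    return (x, -y)

def apply_derivation(a, f, p, h=1e-6):
    """Apply the derivation a = a^l d/dx^l to each component of f at the point p."""
    ap = a(*p)
    out = []
    for comp in range(2):
        total = 0.0
        for l in range(2):
            qp, qm = list(p), list(p)
            qp[l] += h
            qm[l] -= h
            total += ap[l] * (f(*qp)[comp] - f(*qm)[comp]) / (2 * h)
        out.append(total)
    return tuple(out)

def lie_bracket(a, b, p):
    """[a, b]^i = a^l db^i/dx^l - b^l da^i/dx^l, evaluated at p."""
    ab, ba = apply_derivation(a, b, p), apply_derivation(b, a, p)
    return (ab[0] - ba[0], ab[1] - ba[1])

print(lie_bracket(v, u, (0.5, 0.5)))  # close to (0, -1), i.e. -d/dy
```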
5 This lecture was published as a 14-page paper containing almost no mathematical symbols, under the title “Über die Hypothesen, welche der Geometrie zu Grunde liegen” (On the assumptions that underlie geometry). Later, in 1861, he developed this subject more mathematically in a paper on heat conduction submitted in a competition established by the Paris Academy of Sciences. The paper was entitled “Commentatio mathematica, qua respondere tentatur quaestioni ab Illma Academia Parisiensi propositae” (A mathematical note attempting to answer a question posed by the distinguished Paris Academy), but was not published until after his death. In the course of that paper, he noted that an expression of the type shown above could be regarded as the squared length of an infinitesimally short curve. Incidentally, in the same paper, he was led to study differential equations of third order, which had been rare in physics up to that time.
2. CURVATURE, PHASE 3: RIEMANN 233
and so, instead of thinking of the Riemann curvature tensor as a quadrilinear map-
ping of three vectors and one covector into the real numbers, it makes much better
sense to think of it as a trilinear mapping of three vectors into a fourth vector.
Remark 6.2. Here once again we are defining an absolute object (a tangent
vector) in terms of its expression in a particular set of parameters. The absoluteness
of this vector depends on its yielding the same numbers as values when it is applied
as a derivation to a given function in any parameterization. The proof that this
one has that property is given below.
Equation (6.2) looks cleaner than Eq. (6.3), consisting entirely of Christoffel
symbols and their partial derivatives. Just as our ability to compute curvature
directly from the metric coefficients allows us to take those metric coefficients as
starting points and forget about the way they were constructed from an embedding in Euclidean space, the Riemann curvature tensor whose coefficients are $R^i_{jkl}$
suggests that we could start even farther in, taking the Christoffel symbols as our
point of departure and not worrying about how they are computed from some hypo-
thetical metric coefficients. It is possible to do this since the equations that define
the Christoffel symbols in terms of the partial derivatives of the metric coefficients
can be solved for the partial derivatives of the metric coefficients in terms of the
Christoffel symbols.6 The metric coefficients satisfy a homogeneous linear system
of partial differential equations whose coefficients are Christoffel symbols (see Prob-
lems 4.12 and 4.13). We are not going to explore that possibility, however, since in
the application we have in mind—relativistic gravitation—our approach in Chap-
ter 4 was based on assuming a certain form for the metric coefficients, which in turn
imposed a certain form on the Christoffel symbols. We proceeded by first positing
that the vanishing of the Ricci tensor was the law of gravitation, then computing
what the metric coefficients had to be, knowing the coefficients of that tensor, and
finally computing the geodesics in the resulting metric to get a description of the
motion of a particle.
Remark 6.3. In his systematic exposition of general relativity in the fourth series of the Annalen der Physik, Einstein gave an exposition of these results, along with some more general information about tensors. Using another standard notation for the Christoffel symbols, namely
$$\Gamma^i_{jk} = \left\{ {jk \atop i} \right\},$$
he referred to the tensor whose components are $R^i_{jkl}$ as the Riemann–Christoffel tensor and wrote Eq. (6.2) as
$$B^\rho_{\mu\sigma\tau} = -\frac{\partial}{\partial x^\tau}\left\{ {\mu\sigma \atop \rho} \right\} + \frac{\partial}{\partial x^\sigma}\left\{ {\mu\tau \atop \rho} \right\} - \left\{ {\mu\sigma \atop \alpha} \right\}\left\{ {\alpha\tau \atop \rho} \right\} + \left\{ {\mu\tau \atop \alpha} \right\}\left\{ {\alpha\sigma \atop \rho} \right\}.$$
(See Eq. (43) on page 800 of [21].) Eddington's notation was similar, except that he wrote $\Gamma^k_{ij} = \{ij, k\}$. He also wrote ([16], p. 72) the covariant Riemann curvature tensor that we have denoted $R_{ijkl}$ as $B_{jkli}$.
of the $n^3$ partial derivatives $\partial g_{ij}/\partial x^k$, with coefficients that are rational functions of the $g_{ij}$.
Chapter 4 it has the physical dimension $[d]^2$, that is, “length”-squared, as length is measured on the manifold. In particular, $\langle u, u\rangle$ is the squared “length” of the vector $u$. That is exactly as it should be, in analogy with $\mathbb R^3$.
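The coefficient of $\partial/\partial x^i$ in the expression for $\nabla_w\nabla_z v$, which is
$$w^k\Bigl(\frac{\partial u^i}{\partial x^k} + u^j\,\Gamma^i_{kj}\Bigr),$$
can then be written as a sum of seven terms, each of which involves summation on two, three, or four indices, namely
$$\begin{aligned}
&w^k\,\frac{\partial z^l}{\partial x^k}\,\frac{\partial v^i}{\partial x^l} + w^k z^l\,\frac{\partial^2 v^i}{\partial x^k\,\partial x^l} + w^k\,\frac{\partial z^l}{\partial x^k}\,v^p\,\Gamma^i_{lp} + w^k z^l\,\frac{\partial v^p}{\partial x^k}\,\Gamma^i_{lp}\\
&\quad + w^k z^l v^j\,\frac{\partial\Gamma^i_{lj}}{\partial x^k} + w^k z^l\,\frac{\partial v^p}{\partial x^l}\,\Gamma^i_{kp} + w^k z^l v^j\,\Gamma^p_{lj}\,\Gamma^i_{kp}\,.
\end{aligned}$$
(Here, we replaced the dummy index of summation $p$ in the fifth term by the index $j$. Doing so does not change the value of that term.)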
If we interchange $w$ with $z$ and $k$ with $l$, the second term remains unchanged (by the equality of mixed partial derivatives), and the fourth and sixth terms are interchanged. The interchange of $k$ with $l$ by itself makes no difference to any of the terms, since both of these indices are mere dummy variables over which summation is performed. Therefore these three terms drop out of the expression $\nabla_w\nabla_z v - \nabla_z\nabla_w v$, and what remains can be conveniently written as
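$$\begin{aligned}
\nabla_w\nabla_z v - \nabla_z\nabla_w v ={}& \Bigl(w^k\,\frac{\partial z^l}{\partial x^k} - z^k\,\frac{\partial w^l}{\partial x^k}\Bigr)\Bigl(\frac{\partial v^i}{\partial x^l} + v^p\,\Gamma^i_{lp}\Bigr)\frac{\partial}{\partial x^i}\\
&+ (w^k z^l - w^l z^k)\,v^j\Bigl(\frac{\partial\Gamma^i_{lj}}{\partial x^k} + \Gamma^i_{kp}\,\Gamma^p_{lj}\Bigr)\frac{\partial}{\partial x^i}\,.
\end{aligned}$$
Since
$$[w, z] = \Bigl(w^k\,\frac{\partial z^l}{\partial x^k} - z^k\,\frac{\partial w^l}{\partial x^k}\Bigr)\frac{\partial}{\partial x^l}\,,$$
computation reveals that
$$\nabla_{[w,z]}v = \Bigl(w^k\,\frac{\partial z^l}{\partial x^k} - z^k\,\frac{\partial w^l}{\partial x^k}\Bigr)\Bigl(\frac{\partial v^i}{\partial x^l} + v^p\,\Gamma^i_{lp}\Bigr)\frac{\partial}{\partial x^i}\,,$$
and so the vector
$$R(w, z)v = \nabla_w\nabla_z v - \nabla_z\nabla_w v - \nabla_{[w,z]}v$$
consists of just the second half of the formula above, that is
$$\nabla_w\nabla_z v - \nabla_z\nabla_w v - \nabla_{[w,z]}v = (w^k z^l - w^l z^k)\,v^j\Bigl(\frac{\partial\Gamma^i_{lj}}{\partial x^k} + \Gamma^i_{kp}\,\Gamma^p_{lj}\Bigr)\frac{\partial}{\partial x^i}\,.$$
By interchanging the summation indices $k$ and $l$ in the subtracted terms and then rearranging, we find that
$$\nabla_w\nabla_z v - \nabla_z\nabla_w v - \nabla_{[w,z]}v = \Bigl(\frac{\partial\Gamma^i_{lj}}{\partial x^k} - \frac{\partial\Gamma^i_{kj}}{\partial x^l} + \Gamma^i_{kp}\,\Gamma^p_{lj} - \Gamma^i_{lp}\,\Gamma^p_{kj}\Bigr)v^j w^k z^l\,\frac{\partial}{\partial x^i}\,.$$
Comparison of this result with Eq. (6.2) shows that the coefficient of $\partial/\partial x^i$ in this expression is precisely
$$R^i_{jkl}\,v^j w^k z^l\,.$$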
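Once the Christoffel symbols are known, the component formula just obtained can be evaluated mechanically. The Python sketch below is our own illustration, and its function names and finite-difference step are assumptions. It computes $R^i_{jkl} = \partial\Gamma^i_{lj}/\partial x^k - \partial\Gamma^i_{kj}/\partial x^l + \Gamma^i_{kp}\Gamma^p_{lj} - \Gamma^i_{lp}\Gamma^p_{kj}$ from a supplied function returning $\Gamma^i_{jk}$, and it differentiates numerically. The sketch is tried on two cases:

- the sphere in latitude–longitude coordinates, where $\Gamma^1_{22} = \sin\phi\cos\phi$ and $\Gamma^2_{12} = \Gamma^2_{21} = -\tan\phi$;
- the flat plane in polar coordinates $(r, \theta)$, where $\Gamma^1_{22} = -r$ and $\Gamma^2_{12} = \Gamma^2_{21} = 1/r$.

```python
import math

def riemann(christoffel, x, h=1e-5):
    """R[i][j][k][l] = R^i_{jkl} (0-based), where christoffel(x)[i][j][k] = Gamma^i_{jk}.
    Derivatives of the Christoffel symbols are taken by central differences."""
    n = len(x)
    G = christoffel(x)
    dG = []  # dG[m][i][j][k] = d Gamma^i_{jk} / dx^m
    for m in range(n):
        xp, xm = list(x), list(x)
        xp[m] += h
        xm[m] -= h
        Gp, Gm = christoffel(xp), christoffel(xm)
        dG.append([[[(Gp[i][j][k] - Gm[i][j][k]) / (2 * h) for k in range(n)]
                    for j in range(n)] for i in range(n)])
    R = [[[[0.0] * n for _ in range(n)] for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                for l in range(n):
                    val = dG[k][i][l][j] - dG[l][i][k][j]
                    for p in range(n):
                        val += G[i][k][p] * G[p][l][j] - G[i][l][p] * G[p][k][j]
                    R[i][j][k][l] = val
    return R

def sphere_christoffel(x):
    """Latitude/longitude (phi, theta) on a sphere of any radius."""
    phi = x[0]
    G = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
    G[0][1][1] = math.sin(phi) * math.cos(phi)   # Gamma^1_22
    G[1][0][1] = G[1][1][0] = -math.tan(phi)     # Gamma^2_12 = Gamma^2_21
    return G

def polar_christoffel(x):
    """Polar coordinates (r, theta) in the flat plane."""
    r = x[0]
    G = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
    G[0][1][1] = -r                              # Gamma^1_22
    G[1][0][1] = G[1][1][0] = 1.0 / r            # Gamma^2_12 = Gamma^2_21
    return G

phi = 0.7
Rs = riemann(sphere_christoffel, [phi, 0.3])
print(Rs[0][1][0][1], math.cos(phi) ** 2)  # R^1_212 vs cos^2(phi)
print(Rs[1][0][0][1])                      # R^2_112, close to -1
```

For the sphere one should find $R^1_{212} = \cos^2\phi$ and $R^2_{112} = -1$. For the polar coordinates every component vanishes, even though their Christoffel symbols do not.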
Remark 6.4. Using this alternative approach to the Riemann curvature tensor,
one can immediately see that it really is a tensor. The work done in Appendix 6
shows that v, w, z, [w, z], $\nabla_z v$, and so forth, are all tensors, and hence R(w, z)v
must also be a tensor. That is, under a change of coordinates, it transforms correctly
in terms of the Jacobian matrix.
Remark 6.5. The Riemann curvature tensor can also be elegantly rewritten in terms of the so-called second covariant derivative, which is defined as
$$\nabla^2_{u,v} = \nabla_u(\nabla_v) - \nabla_{\nabla_u v}\,.$$
Taking account of Problem 6.5, we have the formula
$$R(u, v)w = \nabla^2_{u,v}w - \nabla^2_{v,u}w\,.$$
In particular, it is not generally true that a mixed second covariant derivative
is independent of the order in which the covariant derivatives are taken, although
it is independent of that order when the two vector fields u and v have constant
coefficients and it is applied to a function f rather than to a vector field w. As we
shall see below, this second covariant derivative is essentially the Hessian operator
used by Euler to define the curvature of a surface at a point where the surface is
tangent to the xy-plane.
Regarded as a tangent vector that operates on functions, the second covariant derivative is
$$\nabla^2_{\partial/\partial x^i,\,\partial/\partial x^j} = \frac{\partial^2}{\partial x^i\,\partial x^j} - \Gamma^k_{ij}\,\frac{\partial}{\partial x^k}\,.$$
(Note that the right-hand side of this relation is symmetric in $i$ and $j$.) In particular, on a flat space like $\mathbb R^3$ in Cartesian coordinates, where all the Christoffel symbols are zero, we have
$$\nabla^2_{\partial/\partial x^i,\,\partial/\partial x^j} = \frac{\partial^2}{\partial x^i\,\partial x^j}\,,$$
that is, the second covariant derivative with respect to a pair of basis vectors in a flat space is just the ordinary mixed second-order partial derivative.
2.2. A tensor formulation of curvature. If we lower the superscripts i
in the components of the vector R(w, z)v as described in Appendix 4, we get a
covector υ that operates on a vector u to produce the number
$$\mathrm{Rie}(u, v, w, z) = \langle u, R(w, z)v\rangle = g_{im}\,R^m_{jkl}\,u^i v^j w^k z^l = R_{ijkl}\,u^i v^j w^k z^l\,.$$
With this connection established, we can now give a tensor expression for the
curvature. To do so, we need a lemma:
Lemma 6.1. The covariant Riemann curvature tensor satisfies the relation
$R_{ijkl} = -R_{jikl}$. In terms of vectors,
$$\langle u, R(w, z)v\rangle = -\langle v, R(w, z)u\rangle\,.$$
Consequently,
$$\langle u, R(w, z)v\rangle = \langle v, R(z, w)u\rangle\,.$$
Proof. By Eq. (5.3),
$$R_{ijkl} = \Gamma^3_{ki}\,\Gamma^3_{lj} - \Gamma^3_{li}\,\Gamma^3_{kj}\,.$$
It is obvious that interchanging $i$ and $j$ causes the expression on the right-hand side to reverse its sign.
Since it is obvious from the definition of the operator R(·, ·) that it is antisym-
metric in its arguments, the last equation now follows as well.
Remark 6.6. Since Eq. (5.3) was proved above only for surfaces in $\mathbb R^3$, this proof is not general. The lemma is true in general, however, as Problem 5.7 shows.
Theorem 6.4. For any two linearly independent tangent vectors $u$ and $v$, the Gaussian curvature $\kappa$ is given by
$$\kappa = \frac{\langle u, R(u, v)v\rangle}{\langle u, u\rangle\langle v, v\rangle - \langle u, v\rangle^2}\,. \tag{6.4}$$
Proof. If we take $u = w = \partial/\partial x^1$ and $v = z = \partial/\partial x^2$, we get the situation described in the discussion following the proof of Theorem 6.5 of Appendix 6 in Volume 2. There, the left-hand side of Eq. (5.3) is the curvature multiplied by $(g_{11}g_{22} - g_{12}g_{21}) = \langle u, u\rangle\langle v, v\rangle - \langle u, v\rangle\langle v, u\rangle$. In this case the right-hand side of Eq. (5.3) is the numerator in Eq. (6.4): since $u^1 = 1 = w^1 = v^2 = z^2$ and $u^2 = 0 = w^2 = v^1 = z^1$, the only nonzero terms occur when $i = k = 1$ and $j = l = 2$. We therefore see that Eq. (6.4) is true in this special case.
Let us temporarily denote the quotient that we are claiming is equal to the
curvature by κ(u, v). Since the denominator vanishes when the vectors u and v
are linearly dependent, and therefore κ(u, v) is not defined in this case, we fix a
linearly independent pair u, v to start with for which κ(u, v) equals the curvature.
We have just shown that there is one such pair $u$, $v$. All that remains is to show that this expression is unaltered if $u$ and $v$ are replaced by any linear combinations $au + bv$, $cu + dv$ with $ad - bc \neq 0$. (This restriction is necessary to assure that the new pair is linearly independent.) The homogeneity of the expressions in the numerator and denominator implies that replacing $u$ by $au$ and $v$ by $bv$, where $a$ and $b$ are both nonzero, leaves $\kappa(u, v)$ unaltered. (Both numerator and denominator are multiplied by $a^2b^2$.) But the numerator and denominator are also unaffected by a shear transformation $u \to u + v$, as one can easily see. Indeed, looking at the numerator, we find
$$\langle u+v, R(u+v, v)v\rangle = \langle u, R(u, v)v\rangle + \langle u, R(v, v)v\rangle + \langle v, R(u, v)v\rangle + \langle v, R(v, v)v\rangle\,.$$
The first term on the right-hand side is just the numerator of the expression in Eq. (6.4). The second and fourth terms vanish due to the antisymmetry of the operator $R(\cdot,\cdot)$, and the third term vanishes by Lemma 6.1. It follows that the numerator remains invariant. The invariance of the denominator under the same shear mapping is a routine computation. Hence, if $a$ and $b$ are both nonzero,
$$\kappa(u, v) = \kappa(au, bv) = \kappa(au + bv, bv) = \kappa(au + bv, v)\,.$$
Since $\kappa(au, v) = \kappa(u, v)$ if $a \neq 0$, this equation also holds when $b = 0$. The symmetry relation $\kappa(u, v) = \kappa(v, u)$ then implies that $\kappa(u, v) = \kappa(u, cu + dv)$ provided $d \neq 0$.
If $d = 0$, we have $\kappa(au + bv, cu + dv) = \kappa(au + bv, cu) = \kappa(bv + au, u) = \kappa(v, u) = \kappa(u, v)$, provided $bc \neq 0$. Since we are assuming $ad - bc \neq 0$, we will not have both $bc = 0$ and $d = 0$. This settles the case $d = 0$, so we may assume $d \neq 0$.
We then have $\kappa(u, v) = \kappa(u, cu + dv) = \kappa\bigl(su + t(cu + dv), cu + dv\bigr)$ for any $s \neq 0$ and any $t$. Taking $s = a - bc/d = (ad - bc)/d$, which is nonzero, and $t = b/d$, we get
$$\kappa(u, v) = \kappa\Bigl(\frac{ad - bc}{d}\,u + \frac{b}{d}\,(cu + dv),\; cu + dv\Bigr) = \kappa(au + bv, cu + dv)\,.$$
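A quick numerical check of Theorem 6.4 is possible. The sketch below is our own and rests on the following assumptions:

- the sphere of radius $R_0$ at latitude $\phi$, in latitude–longitude coordinates;
- $g_{11} = R_0^2$, $g_{22} = R_0^2\cos^2\phi$, $\Gamma^3_{11} = R_0$, $\Gamma^3_{22} = R_0\cos^2\phi$;
- the covariant tensor $R_{ijkl}$ taken from Eq. (5.3).

It evaluates the quotient for several independent pairs $u$, $v$. If the theorem holds, every pair gives $1/R_0^2$. The names `R_cov`, `inner` and `kappa` are hypothetical.

```python
import math

R0 = 1.5    # sphere radius (illustrative)
PHI = 0.6   # latitude of the base point (illustrative)

# Metric g_ij and normal coefficients Gamma^3_ij at latitude PHI (0-based indices).
g = [[R0 ** 2, 0.0], [0.0, (R0 * math.cos(PHI)) ** 2]]
b = [[R0, 0.0], [0.0, R0 * math.cos(PHI) ** 2]]

def R_cov(i, j, k, l):
    """R_ijkl = Gamma^3_ki Gamma^3_lj - Gamma^3_li Gamma^3_kj, as in Eq. (5.3)."""
    return b[k][i] * b[l][j] - b[l][i] * b[k][j]

def inner(u, v):
    """<u, v> = g_ij u^i v^j."""
    return sum(g[i][j] * u[i] * v[j] for i in range(2) for j in range(2))

def kappa(u, v):
    """The quotient of Eq. (6.4): <u, R(u, v)v> / (<u,u><v,v> - <u,v>^2)."""
    idx = range(2)
    num = sum(R_cov(i, j, k, l) * u[i] * v[j] * u[k] * v[l]
              for i in idx for j in idx for k in idx for l in idx)
    return num / (inner(u, u) * inner(v, v) - inner(u, v) ** 2)

pairs = [((1.0, 0.0), (0.0, 1.0)), ((0.3, -1.2), (2.0, 0.5)), ((1.0, 2.0), (-3.0, 0.25))]
for u, v in pairs:
    print(kappa(u, v), 1 / R0 ** 2)
```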
3. Parallel Transport
Now that we have a geometric interpretation of the Riemann curvature tensor, we
can see why it is called a curvature tensor. We know already that it has a connection
with curvature through its covariant form, in which the upper index is lowered. The
key ingredient in producing this tensor, as we now see, is the covariant derivative,
applied three times, with the term involving the covariant derivative with respect
to the Lie bracket playing the role of a “corrective” term when the components of
the vector fields u and v are not constant.
The covariant derivative makes it possible to speak of two vectors that are
tangent to a manifold at different points as being, in some sense, “the same vector.”
The idea is quite intuitive. A vivid image of it can be conveyed by imagining an
ancient warrior marching eastward along the equator carrying a spear that always
points directly to the front. From the cosmic perspective, that spear, representing
a tangent vector at each point where the warrior happens to be, does rotate. As
a vector in R3 , it does not remain the same. But, relative to the spherical Earth
that it is on, it comes “as close as possible” to being the same. That is, it rotates
by the minimal amount, and the warrior would have to exert some force on it to
keep it cosmically parallel to itself. From a terrestrial point of view, the rest of the
cosmos does not count, and parallelism is relative to the surface of the Earth. That
is the picture we want to keep in mind as we describe parallel transport around a
curve in any manifold, independently of any embedding that manifold may have
in an ambient Euclidean space. The covariant derivative turns out to be the key
to a precise encoding of this intuition: essentially, we require that the covariant
derivative of a “parallel transported” vector with respect to the tangent vector to
the curve vanish identically.
From now on, it will save a lot of writing and confusion if we refer to a point
P on a manifold M having coordinates (x1 , . . . , xn ) in some chart as “the point
P = (x1 , . . . , xn ).”
Definition 6.3. Let $\gamma(t)$ be a parameterized curve on a manifold $M$ in a coordinate system $(x^1, x^2, \ldots, x^n)$. (We think of $\gamma$ as a mapping into the parameter space, that is, $\gamma(t) = \bigl(x^1(t), \ldots, x^n(t)\bigr)$.) Let $w_0$ be a tangent vector at a point $\gamma(t_0) = (x^1_0, \ldots, x^n_0)$. The vector field
$$w(t) = w^i(t)\,\frac{\partial}{\partial x^i}\bigg|_{\gamma(t)}$$
that is the (unique) solution of the initial-value problem
$$(w^i)'(t) + (\gamma^j)'(t)\,w^k(t)\,\Gamma^i_{jk}\bigl(\gamma(t)\bigr) = 0\,, \qquad w^i(t_0) = w^i_0\,, \quad i = 1, 2, \ldots, n\,,$$
is the parallel transport of $w_0$ along the curve $\gamma(t)$.
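Definition 6.3 is an ordinary initial-value problem, so it can be solved numerically. The Python sketch below is our own. We chose the classical fourth-order Runge–Kutta method, which the text does not prescribe, and the names `transport`, `polar_christoffel` and `to_cartesian` are hypothetical. Given the Christoffel symbols, it transports a vector along a curve. As a test case we take the flat plane in polar coordinates $(x^1, x^2) = (r, \theta)$, where $\Gamma^1_{22} = -r$ and $\Gamma^2_{12} = \Gamma^2_{21} = 1/r$, and we transport $w_0 = \partial/\partial r$ around the circle $r = r_0$. The polar components change along the way, but the Cartesian components stay equal to $(1, 0)$, as they should for a vector that is "the same" throughout the flat plane.

```python
import math

def polar_christoffel(x):
    """Gamma^i_{jk} as nested lists G[i][j][k] (0-based) for (x^1, x^2) = (r, theta)."""
    r = x[0]
    G = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
    G[0][1][1] = -r                        # Gamma^1_22
    G[1][0][1] = G[1][1][0] = 1.0 / r      # Gamma^2_12 = Gamma^2_21
    return G

def transport(christoffel, gamma, dgamma, w0, t0, t1, steps=2000):
    """Solve (w^i)' + (gamma^j)' w^k Gamma^i_{jk}(gamma(t)) = 0, w(t0) = w0, by RK4."""
    n = len(w0)

    def rhs(t, w):
        G, dg = christoffel(gamma(t)), dgamma(t)
        return [-sum(dg[j] * w[k] * G[i][j][k] for j in range(n) for k in range(n))
                for i in range(n)]

    h = (t1 - t0) / steps
    w = list(w0)
    for m in range(steps):
        t = t0 + m * h
        k1 = rhs(t, w)
        k2 = rhs(t + h / 2, [a + h / 2 * b for a, b in zip(w, k1)])
        k3 = rhs(t + h / 2, [a + h / 2 * b for a, b in zip(w, k2)])
        k4 = rhs(t + h, [a + h * b for a, b in zip(w, k3)])
        w = [a + h / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
             for a, b1, b2, b3, b4 in zip(w, k1, k2, k3, k4)]
    return w

r0 = 2.0
circle = lambda t: (r0, t)        # gamma(t): the circle r = r0, theta = t
dcircle = lambda t: (0.0, 1.0)    # gamma'(t)

def to_cartesian(x, w):
    """Cartesian components of w^1 d/dr + w^2 d/dtheta at the point x = (r, theta)."""
    r, th = x
    return (w[0] * math.cos(th) - w[1] * r * math.sin(th),
            w[0] * math.sin(th) + w[1] * r * math.cos(th))

for T in (math.pi / 2, math.pi, 2 * math.pi):
    w = transport(polar_christoffel, circle, dcircle, [1.0, 0.0], 0.0, T)
    print(T, w, to_cartesian(circle(T), w))
```

The exact solution here is $w^1 = \cos t$, $w^2 = -(\sin t)/r_0$.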
Remark 6.7. Although this definition appears unmotivated, we can provide
the motivation through the concept of the covariant derivative. As we have given
it, the vector field w(t) is defined only at points of the curve γ(t). But suppose
that $w(x^1, \ldots, x^n)$ is a vector field defined on an open set $U$ in the parameter space containing the curve $\gamma$ and is such that $w\bigl(\gamma(t)\bigr) = w(t)$. Then we have
$$(w^i)'(t) = (\gamma^j)'(t)\,\frac{\partial w^i}{\partial x^j}\,,$$
and the system of equations that defines the parallel transport of $w_0$ along $\gamma$ is equivalent to the vector-field equation
$$\nabla_{\gamma'(t)}\,w\bigl(\gamma(t)\bigr) = 0\,, \qquad w\bigl(\gamma(t_0)\bigr) = w_0\,.$$
This covariant derivative makes sense only if w(x1 , . . . , xn ) is defined on an open
set, and that restriction is not necessary for our definition of parallel transport. If
it happens to be satisfied, then the condition for parallel transport has the intuitive
meaning that the directional derivative of the vector field w in the direction of the
tangent vector to the curve, when projected back into the tangent plane, is zero.
That means that, relative to the tangent line to the curve, the field is constant. In
that sense, the parallel transport of w0 is “the same vector” as w0 , only based at
a different point.
Another way of describing the situation is to say that, while the definition of
the covariant derivative $\nabla_v u$ at a point requires that $u$ be defined as a vector field in a neighborhood of the point, we can expand this definition to cover one more important case. We allow $\nabla_{\gamma'(t)} w$ to be defined by this last equation if $w\bigl(\gamma(t)\bigr)$ is defined for $t$ in an open interval.
It needs to be emphasized that this “sameness” of w0 and w(t) is relative to
the curve γ, and if one arrives at the same point over a different path, the parallel
transport of w 0 will very likely be a different vector. We thus have the paradoxical
situation of two different tangent vectors at a given point both being “the same
vector” as a third vector at a different point. This possibility is the very essence
of what is meant by a curved space, and we shall investigate it thoroughly below
when we give a geometric interpretation of the Riemann curvature tensor.
To summarize, our motivation for this definition comes from the notion of a
covariant derivative, but parallel transport is actually defined using minimal princi-
ples, requiring only an initial vector and a smooth curve. The question of whether
or not there exists a vector field that is defined on an open set and whose covariant
derivative with respect to the tangent vector at each point vanishes is psychologi-
cally helpful as motivation, but not logically relevant.
[Figure: a spherical triangle with vertex N and base vertices A and B; the angle θ is marked at N, and the vectors v_0 and v_1 are shown.]
algebraically, using the metric coefficients $g_{11} = R^2$, $g_{12} = g_{21} = 0$, $g_{22} = R^2\cos^2\phi$. Or you can take a geometric approach: create the unit normal vector $\mathbf n = -\cos\phi\cos\theta\,\mathbf i - \cos\phi\sin\theta\,\mathbf j - \sin\phi\,\mathbf k$, then, for each of the three pairs $(j, k) = (1, 1), (1, 2), (2, 2)$, set up a system of three linear equations in the three coefficients $\Gamma^i_{jk}$, $i = 1, 2, 3$. If you take the second approach, you will get the same Christoffel symbols, along with the unneeded facts that $\Gamma^3_{11} = R$, $\Gamma^3_{12} = 0$, $\Gamma^3_{22} = R\cos^2\phi$.) Knowing the Christoffel symbols, we set up the initial-value problem that determines $w^1$ and $w^2$, and it turns out to be
$$\begin{aligned}
0 &= \frac{dw^1(\theta)}{d\theta} + (\sin\phi\cos\phi)\,w^2\,, & 0 &= w^1(0)\,,\\
0 &= \frac{dw^2(\theta)}{d\theta} - (\tan\phi)\,w^1\,, & 1/R &= w^2(0)\,.
\end{aligned}$$
Here we use the fact that $\phi \equiv 0$ on the path, so that the obvious solution is $w^1 \equiv 0$, $w^2 \equiv 1/R$. The parallel transport of the vector $(1/R)\,\partial/\partial\theta$ therefore remains $(1/R)\,\partial/\partial\theta$ over the whole path. In terms of our identification of directional derivatives with “arrows,” this derivation is to be pictured as the vector $(1/R)\,\partial\mathbf r/\partial\theta = -\sin\theta\,\mathbf i + \cos\theta\,\mathbf j$ in $\mathbb R^3$. That is indeed what we usually think of as the unit eastward-pointing tangent vector at a point $(R\cos\theta, R\sin\theta, 0)$ along the equator.
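On the equator nothing happens. As a numerical sketch, which is our own extension of this example and not taken from the text, the same pair of equations can be integrated along a circle of latitude $\phi_0 \neq 0$, with $\phi \equiv \phi_0$ on the path. Let $a = R\,w^1$ and $b = R\cos\phi_0\,w^2$ be the components in the orthonormal frame $(1/R)\,\partial/\partial\phi$, $(1/(R\cos\phi_0))\,\partial/\partial\theta$. In these components the equations become $a' = -(\sin\phi_0)\,b$ and $b' = (\sin\phi_0)\,a$. The transported vector therefore turns at the constant rate $\sin\phi_0$ and, after one circuit, comes back rotated by $2\pi\sin\phi_0$ (modulo $2\pi$). The Runge–Kutta integrator and the function name `transport_along_latitude` are our choices.

```python
import math

def transport_along_latitude(phi0, w0, total_angle, steps=4000):
    """Integrate dw1/dtheta = -(sin phi0 cos phi0) w2, dw2/dtheta = (tan phi0) w1
    (the system above with phi fixed at phi0) from theta = 0 by RK4."""
    s, c = math.sin(phi0), math.cos(phi0)

    def rhs(w):
        return (-s * c * w[1], (s / c) * w[0])

    h = total_angle / steps
    w = tuple(w0)
    for _ in range(steps):
        k1 = rhs(w)
        k2 = rhs((w[0] + h / 2 * k1[0], w[1] + h / 2 * k1[1]))
        k3 = rhs((w[0] + h / 2 * k2[0], w[1] + h / 2 * k2[1]))
        k4 = rhs((w[0] + h * k3[0], w[1] + h * k3[1]))
        w = (w[0] + h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
             w[1] + h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))
    return w

# Start with w = d/dphi (pointing north) at latitude 30 degrees and go once around.
w = transport_along_latitude(math.pi / 6, (1.0, 0.0), 2 * math.pi)
print(w)  # close to (-1, 0): rotated by 2*pi*sin(pi/6) = pi, now pointing south
```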
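$$\frac{\partial w^i}{\partial x^k} = -w^j\,\Gamma^i_{kj}\,, \qquad i = 1, 2\,,\quad k = 1, 2\,, \tag{6.5}$$
where the summation on $j$ is from $j = 1$ to $j = 2$. (Note that we regard the coefficients of $w$ as functions of $x^1$ and $x^2$ rather than what they properly are: functions of $\mathbf r(x^1, x^2) = \bigl(x^1, x^2, z(x^1, x^2)\bigr)$.) Consider a closed curve on $S$ parameterized as $\gamma(t)$, say
$$\gamma(t) = \mathbf r\bigl(\gamma^1(t), \gamma^2(t)\bigr) = \bigl(\gamma^1(t), \gamma^2(t), z(t)\bigr)\,, \qquad 0 \le t \le 1\,,$$
where $\gamma(0) = \gamma(1)$, and $z(t)$ is an abbreviation for $z\bigl(\gamma^1(t), \gamma^2(t)\bigr)$. Then
$$(\gamma^k)'(t)\,\frac{\partial w^i\bigl(\gamma^1(t), \gamma^2(t)\bigr)}{\partial x^k} = -(\gamma^k)'(t)\,w^j\bigl(\gamma^1(t), \gamma^2(t)\bigr)\,\Gamma^i_{kj}\bigl(\gamma(t)\bigr)\,, \qquad i = 1, 2\,, \tag{6.6}$$
where the summation indices $j$ and $k$ assume the values 1 and 2.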
The left-hand side of Eq. (6.6) is
$$\frac{d}{dt}\,w^i\bigl(\gamma^1(t), \gamma^2(t)\bigr)\,.$$
Therefore, if we integrate it from $t = 0$ to $t = 1$, the result is $w^i\bigl(\gamma^1(1), \gamma^2(1)\bigr) - w^i\bigl(\gamma^1(0), \gamma^2(0)\bigr) = 0$. We conclude that
$$\int_0^1 (\gamma^k)'(t)\,w^j\bigl(\gamma^1(t), \gamma^2(t)\bigr)\,\Gamma^i_{kj}\bigl(\gamma(t)\bigr)\,dt = 0\,, \qquad i = 1, 2\,.$$
Now this last integral can be interpreted as a line integral in the $x^1x^2$-plane:
$$\oint_C A(x^1, x^2)\,dx^1 + B(x^1, x^2)\,dx^2\,.$$
Here $C$ is the closed curve given as the set of values $\bigl(\gamma^1(t), \gamma^2(t)\bigr)$, $0 \le t \le 1$, and
$$A(x^1, x^2) = w^j(x^1, x^2)\,\Gamma^i_{1j}\bigl(\mathbf r(x^1, x^2)\bigr)\,, \qquad B = w^j(x^1, x^2)\,\Gamma^i_{2j}\bigl(\mathbf r(x^1, x^2)\bigr)\,.$$
Since we can vary the curve $C$ to suit ourselves, let us assume that it has no self-intersections
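except that the point $P$ corresponds to both parameter values 0 and 1. By Green's Theorem,
$$0 = \oint_C A\,dx^1 + B\,dx^2 = \iint_S \Bigl(\frac{\partial B}{\partial x^1} - \frac{\partial A}{\partial x^2}\Bigr)\,dx^1\,dx^2\,,$$
where $S$ is the portion of the plane enclosed by the curve $C$.⁸ Because the region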
inside C can be varied at will, the integrand here must vanish identically.
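It is a routine computation, using Eq. (6.5), to show that
$$\frac{\partial B}{\partial x^1} - \frac{\partial A}{\partial x^2} = w^j\Bigl(\frac{\partial\Gamma^i_{2j}}{\partial x^1} - \frac{\partial\Gamma^i_{1j}}{\partial x^2} + \Gamma^i_{1p}\,\Gamma^p_{2j} - \Gamma^i_{2p}\,\Gamma^p_{1j}\Bigr)\,.$$
Again, since $w$ is arbitrary, we can vary $w^j$ at will. Thus we conclude for $i = 1, 2$ and $j = 1, 2$
$$\frac{\partial\Gamma^i_{2j}}{\partial x^1} - \frac{\partial\Gamma^i_{1j}}{\partial x^2} + \Gamma^i_{1p}\,\Gamma^p_{2j} - \Gamma^i_{2p}\,\Gamma^p_{1j} \equiv 0\,.$$
Comparison with the Riemann curvature tensor shows that this equation says $R^i_{j12} \equiv 0$. Since $R^i_{j21} = -R^i_{j12}$ while $R^i_{j11} = 0 = R^i_{j22}$, it follows that all sixteen components of the Riemann tensor vanish in this case.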
In the case of an $n$-dimensional manifold $M$, we can give a similar argument, but we need to consider the covariant (invariant) form of Stokes's theorem in order to do so. We shall not give the details, but merely note that, just as above, we can consider a closed curve $\gamma$, parameterized by a real variable $t \in [0, 1]$, on the submanifold $\mathcal S$. This curve encloses an open region $S$ of $\mathcal S$, of which it is the boundary $\partial S$. Since this curve is also contained in the manifold $M$, the differential equations that characterize the components of the transported vector field $w(Q)$ in the $n$-dimensional parameter space are
$$\frac{\partial w^i}{\partial x^j} = -w^k\,\Gamma^i_{kj}\,, \qquad i = 1, 2, \ldots, n\,,\quad j = 1, 2, \ldots, n\,.$$
The parallel transport of $w$ around $\gamma$ leads to the equations
$$0 = \oint_{\partial S}\omega^i\,, \qquad i = 1, 2, \ldots, n\,,$$
where $\omega^i$ is the one-form
$$\omega^i = w^k\,\Gamma^i_{kj}\,dx^j\,.$$
By Stokes's theorem in its covariant form, we then have the equations
$$0 = \iint_S d\omega^i = \iint_U \frac{\partial(w^k\,\Gamma^i_{kj})}{\partial x^l}\,dx^l\wedge dx^j\,, \qquad i = 1, 2, \ldots, n\,,$$
8 Because we assume C is nonintersecting, the Jordan Curve Theorem implies that it has a
well-defined inside.
and U is the region of the parameter space whose image under the parameterization
is S.
Again, since this holds for any region $S \subseteq \mathcal S$ bounded by a curve $\gamma$ contained in
the given two-dimensional submanifold of M, the integrand must vanish identically,
and then the equations of parallel transport imply that the n-dimensional version
of the Riemann curvature tensor must vanish identically.
Conversely, the vanishing of the Riemann curvature tensor implies that the one-
form ω i is closed. On a simply connected region, a closed form is exact, and hence
the vanishing of the Riemann curvature tensor on such a region implies that parallel
transport of a vector from one point to another within the region is independent of
the path followed.
9 In spherical and hyperbolic geometry, the area of a triangle is proportional to its angle
excess or defect—that is, the amount by which its angle sum differs from π radians—and this is
also the angle between the initial and final positions of a vector parallel-transported around the
triangle.
[Figure: the rectangle in the parameter plane with vertices A(0, 0), C(r, 0), B(r, s), and D(0, s); its sides are the paths γ_1 (A to C), γ_2 (A to D), γ_3 (C to B), and γ_4 (D to B).]
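$m = 1, 2, 3, 4$, defined as follows:
$$\begin{aligned}
\gamma_1(x) &= (x, 0)\,, & 0 &\le x \le r\,,\\
\gamma_2(y) &= (0, y)\,, & 0 &\le y \le s\,,\\
\gamma_3(y) &= (r, y)\,, & 0 &\le y \le s\,,\\
\gamma_4(x) &= (x, s)\,, & 0 &\le x \le r\,.
\end{aligned}$$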
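Theorem 6.6. Under the conditions just described, the difference $u_s(r) - v_r(s)$ satisfies the limiting relation
$$\lim_{(r,s)\to(0,0)}\frac{1}{rs}\bigl(u_s(r) - v_r(s)\bigr) = R\Bigl(w, \frac{\partial}{\partial x^1}, \frac{\partial}{\partial x^2}\Bigr) = R\Bigl(\frac{\partial}{\partial x^1}, \frac{\partial}{\partial x^2}\Bigr)w\,.$$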
This theorem gives us the geometric interpretation we have been wanting for
the Riemann curvature tensor R(w, u, v): Its value when u and v are infinitesimal
increments in two mutually perpendicular directions and w is any vector in the
tangent space is, up to higher-order vanishing, equal to the discrepancy when w
is parallel-transported around the infinitesimal parallelogram spanned by u and v,
divided by the area of the pre-image of that parallelogram in the parameter space.
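Theorem 6.6 can be checked numerically. The sketch below is our own, and it assumes the sphere in latitude–longitude coordinates $(x^1, x^2) = (\phi, \theta)$, with $\Gamma^1_{22} = \sin\phi\cos\phi$ and $\Gamma^2_{12} = \Gamma^2_{21} = -\tan\phi$. Each side of the rectangle is integrated with a Runge–Kutta step of our choosing. The code transports $w$ from $A$ to $B$ both ways, forms $(u_s(r) - v_r(s))/(rs)$, and compares the result with $R(\partial/\partial x^1, \partial/\partial x^2)w$. Because $R^1_{212} = \cos^2\phi$ and $R^2_{112} = -1$, that vector has components $(\cos^2\phi\,w^2, -w^1)$ here. The names `transport_segment` and `holonomy_quotient` are hypothetical.

```python
import math

def sphere_christoffel(x):
    """Gamma^i_{jk} (0-based lists G[i][j][k]) for (x^1, x^2) = (phi, theta)."""
    phi = x[0]
    G = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
    G[0][1][1] = math.sin(phi) * math.cos(phi)   # Gamma^1_22
    G[1][0][1] = G[1][1][0] = -math.tan(phi)     # Gamma^2_12 = Gamma^2_21
    return G

def transport_segment(start, axis, length, w, steps=50):
    """Parallel-transport w along start + t*e_axis, 0 <= t <= length, by RK4."""
    def rhs(t, w):
        x = list(start)
        x[axis] += t
        G = sphere_christoffel(x)
        # (w^i)' = -(gamma^j)' w^k Gamma^i_{jk}, with gamma' the unit vector e_axis
        return [-sum(G[i][axis][k] * w[k] for k in range(2)) for i in range(2)]

    h = length / steps
    w = list(w)
    for m in range(steps):
        t = m * h
        k1 = rhs(t, w)
        k2 = rhs(t + h / 2, [a + h / 2 * b for a, b in zip(w, k1)])
        k3 = rhs(t + h / 2, [a + h / 2 * b for a, b in zip(w, k2)])
        k4 = rhs(t + h, [a + h * b for a, b in zip(w, k3)])
        w = [a + h / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
             for a, b1, b2, b3, b4 in zip(w, k1, k2, k3, k4)]
    return w

def holonomy_quotient(A, w, r, s):
    """(u_s(r) - v_r(s)) / (rs) for the rectangle A, C = A+(r,0), B = A+(r,s), D = A+(0,s)."""
    a1, a2 = A
    u = transport_segment((a1, a2 + s), 0, r, transport_segment(A, 1, s, w))  # A -> D -> B
    v = transport_segment((a1 + r, a2), 1, s, transport_segment(A, 0, r, w))  # A -> C -> B
    return [(p - q) / (r * s) for p, q in zip(u, v)]

phi0, w0 = 0.4, [0.7, -0.3]
print(holonomy_quotient((phi0, 0.0), w0, 1e-3, 1e-3))
print([math.cos(phi0) ** 2 * w0[1], -w0[0]])  # R(d/dx^1, d/dx^2) w at A
```

The agreement improves roughly in proportion to $r$ and $s$ as the rectangle shrinks.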
Proof. The only thing required in the proof other than the definition of parallel transport is a rather abusive use of the mean-value theorem. Since $u_0(r)$ is the parallel transport of $w$ over $\gamma_1$ from $A$ to $C$, we have
$$u_0^i(r) = w^i - \int_0^r u_0^j(x)\,\Gamma^i_{j1}(x, 0)\,dx$$
for $i = 1, 2$. Here the summation index $j$ runs from 1 to 2, as will the other summation index $p$ that will occur below.
We thus get two more terms to be multiplied by $-r$, namely the terms
$$\bigl(v_0^j(s) - w^j\bigr)\,\Gamma^i_{j1}(x^*, 0) \quad\text{and}\quad \int_0^{x^*}\bigl(u_s^p(x)\,\Gamma^j_{p1}(x, s) - u_0^p(x)\,\Gamma^j_{p1}(x, 0)\bigr)\,dx\;\Gamma^i_{j1}(x^*, 0)\,.$$
and $\eta$ such that the solution is defined for $|t| < \eta$ whenever $|u| < \varepsilon$. If $\gamma_u(t)$ is this path, then $\delta_r(t) = \gamma_u(rt)$ is a solution of the same system of differential equations, passes through the same point at parameter value $t = 0$, and satisfies $\delta_r'(0) = ru$. By the uniqueness of the solution to this initial-value problem (proved in Appendix 5), this means that $\gamma_u(rt) = \gamma_{ru}(t)$. If $r < \eta$ and $|u| < \varepsilon$, then $\gamma_u(t)$ is defined at $t = r$, that is, $\gamma_{ru}(t)$ is defined at $t = 1$. We have now achieved our goal:
Definition 6.4. The exponential mapping $\exp(u)$ is the mapping $u \mapsto \gamma_u(1)$. In other words, if $0 < |u| < \varepsilon\eta$, then $\exp(u) = \gamma_u(1) = \mathbf r\bigl(x^1(1), \ldots, x^n(1)\bigr)$.
where
$$x^i(u^1, \ldots, u^n) = x^i_0 + u^i - v^i(u^1, \ldots, u^n)\,,$$
and, regarding $x^j$ and $(x^k)'$ once again as functions of a real variable $r$,
$$v^i(u^1, \ldots, u^n) = \int_0^1 (1 - r)\,\Gamma^i_{jk}\bigl(x^1(r), \ldots, x^n(r)\bigr)\,(x^j)'(r)\,(x^k)'(r)\,dr\,.$$
The exponential mapping $u \mapsto \gamma_u(1) = \exp(u)$ takes a tangent vector $u$ satisfying $|u| < \eta\varepsilon$ to a point of the manifold. Moreover $\gamma_u(t) = \exp(tu)$. It is
obvious from the smooth dependence of solutions of differential equations on the
initial conditions that this is a C ∞ mapping. In this way, the exponential mapping
allows us to use the tangent space as the natural source of a parameterization of
a local piece of the manifold. To verify that, we need to show that the mapping
exp (u) is nonsingular at the point 0.
Theorem 6.7. The exponential mapping is of full rank n in some neighborhood
of the origin.
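Proof. The exponential mapping is the composite mapping
$$\exp(u^1, \ldots, u^n) = \mathbf r\bigl(x^1(u^1, \ldots, u^n), \ldots, x^n(u^1, \ldots, u^n)\bigr)\,.$$
It follows that its Jacobian is the product of the Jacobian of $\mathbf r$ at $(x^1_0, \ldots, x^n_0)$ and the Jacobian $\partial(x^1, \ldots, x^n)/\partial(u^1, \ldots, u^n)$. Thus, it suffices to show that the latter is nonsingular at $(0, \ldots, 0)$. In fact, it is the identity matrix, as we now show. At the point $(0, \ldots, 0)$, we have, for any sufficiently small fixed value of $r$,
$$\begin{aligned}
\frac{\partial x^i}{\partial u^j} &= \lim_{s\to0}\frac1s\Bigl(\gamma^i_{s\,\partial\mathbf r/\partial x^j}(1) - x^i_0\Bigr) = \lim_{s\to0}\frac1s\Bigl(\gamma^i_{(s/r)\,r\,\partial\mathbf r/\partial x^j}(1) - x^i_0\Bigr)\\
&= \lim_{s\to0}\frac1s\Bigl(\gamma^i_{r\,\partial\mathbf r/\partial x^j}(s/r) - x^i_0\Bigr) = \frac1r\,\bigl(\gamma^i_{r\,\partial\mathbf r/\partial x^j}\bigr)'(0) = \frac1r\,r\,\frac{\partial r^i}{\partial x^j} = \delta^i_j\,,
\end{aligned}$$
since $\mathbf r(x^1, \ldots, x^n) = \bigl(x^1, \ldots, x^n, x^{n+1}(x^1, \ldots, x^n), \ldots, x^m(x^1, \ldots, x^n)\bigr)$.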
Thus the exponential mapping is of full rank at the origin and by continuity
also in some neighborhood of the origin. It therefore provides a coordinate chart
on the manifold M.
This result implies that, given any basis $u_1, \dots, u_n$ of the tangent space, the mapping $\mathbf{r}(x^1, \dots, x^n) = \exp(x^1 u_1 + \cdots + x^n u_n)$ is a parameterization whose inverse $\psi : U \to \mathbb{R}^n$ is a local chart at $P$. This chart is the preferred one for some applications. Its advantage is that it is an isometry along radial lines in the tangent space from the point of tangency, so that the image of each such line can be regarded as a “straight line” in the manifold. All the non-Euclidean features of the manifold M are then concentrated in the hypersurfaces of dimension $n - 1$ orthogonal to these radial “lines.” Moreover, since it is the mapping from the tangent space into the manifold that is an isometry along radial lines near the origin and “preserves directions” at that point (in the sense that for each unit vector $u$, the derivative of the mapping $t \mapsto \exp(tu)$ at $t = 0$ is $u$), the exponential mapping is the unique solution to the corresponding initial-value problem. This uniqueness has important implications, which we shall explore after looking at some examples.
4. THE EXPONENTIAL MAPPING AND NORMAL COORDINATES 253
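$$\exp(x, y) = \frac{r_0 x}{\sqrt{x^2 + y^2}}\sin\frac{\sqrt{x^2 + y^2}}{r_0}\,\mathbf{i} + \frac{r_0 y}{\sqrt{x^2 + y^2}}\sin\frac{\sqrt{x^2 + y^2}}{r_0}\,\mathbf{j} + r_0\cos\frac{\sqrt{x^2 + y^2}}{r_0}\,\mathbf{k}.$$
Equivalently,
$$\exp(x, y) = \Bigl(x\,\varphi\Bigl(\frac{x^2 + y^2}{r_0^2}\Bigr),\ y\,\varphi\Bigl(\frac{x^2 + y^2}{r_0^2}\Bigr),\ r_0\,\psi\Bigl(\frac{x^2 + y^2}{r_0^2}\Bigr)\Bigr),$$
where
$$\varphi(t) = 1 - \frac{t}{3!} + \frac{t^2}{5!} - \frac{t^3}{7!} + \cdots = \sum_{k=0}^{\infty} (-1)^k \frac{t^k}{(2k+1)!},$$
$$\psi(t) = 1 - \frac{t}{2!} + \frac{t^2}{4!} - \frac{t^3}{6!} + \cdots = \sum_{k=0}^{\infty} (-1)^k \frac{t^k}{(2k)!}.$$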
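The power series for φ and ψ make this exponential mapping visibly smooth at the origin. The sketch below checks that claim numerically. The function names and the truncation at 30 terms are illustrative choices only. It compares the series with $\sin\sqrt t/\sqrt t$ and $\cos\sqrt t$, confirms that the image lies on the sphere of radius $r_0$, and estimates the Jacobian of $(x\varphi, y\varphi)$ at the origin. If the sphere is parameterized as a graph over its tangent plane at the north pole, these first two components are the planar coordinates $x^1, x^2$ of Theorem 6.7, so that Jacobian should be close to the identity.

```python
import math

def phi(t, terms=30):
    """phi(t) = sum_k (-1)^k t^k / (2k+1)!, which equals sin(sqrt t)/sqrt t for t > 0."""
    return sum((-1) ** k * t ** k / math.factorial(2 * k + 1) for k in range(terms))

def psi(t, terms=30):
    """psi(t) = sum_k (-1)^k t^k / (2k)!, which equals cos(sqrt t) for t >= 0."""
    return sum((-1) ** k * t ** k / math.factorial(2 * k) for k in range(terms))

def exp_sphere(x, y, r0):
    """Exponential mapping at the north pole of the sphere of radius r0, via the series."""
    t = (x * x + y * y) / r0 ** 2
    return (x * phi(t), y * phi(t), r0 * psi(t))

def planar_jacobian_at_origin(r0, h=1e-6):
    """Central-difference Jacobian of (x phi, y phi) at (0, 0)."""
    cols = []
    for dx, dy in ((h, 0.0), (0.0, h)):
        p, m = exp_sphere(dx, dy, r0), exp_sphere(-dx, -dy, r0)
        cols.append([(p[i] - m[i]) / (2 * h) for i in range(2)])
    return [[cols[j][i] for j in range(2)] for i in range(2)]

print(planar_jacobian_at_origin(2.0))   # approximately [[1, 0], [0, 1]]
```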
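In terms of the polar coordinates $(r, \theta)$ of the point $(x, y)$, the metric coefficients are
$$g_{11} = \cos^2\theta + \frac{r_0^2}{r^2}\sin^2\theta\,\sin^2\frac{r}{r_0},$$
$$g_{12} = \cos\theta\sin\theta - \frac{r_0^2}{r^2}\cos\theta\sin\theta\,\sin^2\frac{r}{r_0},$$
$$g_{22} = \sin^2\theta + \frac{r_0^2}{r^2}\cos^2\theta\,\sin^2\frac{r}{r_0}.$$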
Since $(r_0^2/r^2)\sin^2(r/r_0) \to 1$ as $r \to 0$, it follows that $g_{11} = 1 = g_{22}$ and $g_{12} = 0$ at $x = 0 = y$ ($r = 0$, $\theta$ arbitrary). In particular, for each fixed angle $\theta_0$, along the radial path $\gamma(t) = \exp(t\cos\theta_0, t\sin\theta_0)$, $t \ge 0$, we have $x = t\cos\theta_0$, $y = t\sin\theta_0$, that is, $r = t$ and $\theta = \theta_0$, from which it follows that
$$ds^2 = \Bigl(g_{11}(x, y)\bigl(x'(t)\bigr)^2 + 2g_{12}(x, y)\,x'(t)\,y'(t) + g_{22}(x, y)\bigl(y'(t)\bigr)^2\Bigr)dt^2 = dt^2,$$
so that the parameter $t = r$ is indeed arc length on $\gamma$.
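These metric coefficients can be checked numerically against the closed-form exponential mapping. The sketch below forms $g_{ij} = \frac{\partial\exp}{\partial x^i}\cdot\frac{\partial\exp}{\partial x^j}$ with central differences and compares the result with the formulas. It also measures the speed along a radial curve. All concrete choices are arbitrary: the radius $r_0 = 1.5$, the sample points, the step size, and the function names. Points too close to the origin are avoided because the closed form divides by $r$.

```python
import math

R0 = 1.5  # radius of the sphere (arbitrary sample value)

def exp_sphere(x, y, r0=R0):
    """Closed-form exponential mapping at the north pole; requires (x, y) != (0, 0)."""
    r = math.hypot(x, y)
    s = math.sin(r / r0)
    return (r0 * x / r * s, r0 * y / r * s, r0 * math.cos(r / r0))

def metric_numeric(x, y, h=1e-6):
    """g_ij = (d exp/dx^i) . (d exp/dx^j), with central differences."""
    partials = []
    for dx, dy in ((h, 0.0), (0.0, h)):
        p, m = exp_sphere(x + dx, y + dy), exp_sphere(x - dx, y - dy)
        partials.append([(a - b) / (2 * h) for a, b in zip(p, m)])
    return [[sum(a * b for a, b in zip(partials[i], partials[j])) for j in range(2)]
            for i in range(2)]

def metric_formula(x, y, r0=R0):
    """g11, g12, g22 written in the polar coordinates (r, theta) of (x, y)."""
    r, th = math.hypot(x, y), math.atan2(y, x)
    k = (r0 / r) ** 2 * math.sin(r / r0) ** 2
    g11 = math.cos(th) ** 2 + k * math.sin(th) ** 2
    g12 = math.cos(th) * math.sin(th) * (1 - k)
    g22 = math.sin(th) ** 2 + k * math.cos(th) ** 2
    return [[g11, g12], [g12, g22]]

def radial_speed(theta0, t, h=1e-6):
    """Speed in R^3 of the radial curve t -> exp(t cos theta0, t sin theta0)."""
    c, s = math.cos(theta0), math.sin(theta0)
    p = exp_sphere((t + h) * c, (t + h) * s)
    m = exp_sphere((t - h) * c, (t - h) * s)
    return math.sqrt(sum(((a - b) / (2 * h)) ** 2 for a, b in zip(p, m)))

print(metric_numeric(0.3, 0.4))
print(metric_formula(0.3, 0.4))   # should agree closely with the line above
```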
The exponential mapping in this case is very transparent, being a wrapping of
each tangent line along the line of longitude in whose plane the tangent line lies.
The top view is shown in Fig. 6.4, and an oblique view in Fig. 6.5.
Example 6.3. Our second example is the pseudo-hemisphere of radius11 c (see
Subsection 1.2 of Appendix 1), obtained by revolving the tractrix about the z-axis.
11 We like to use the letter c for distances that have some application to relativity. Think of
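$$g(x, y) = \begin{pmatrix} \dfrac{c^2x^2 + r^2y^2}{r^4} & \dfrac{(c^2 - r^2)xy}{r^4} \\[2mm] \dfrac{(c^2 - r^2)xy}{r^4} & \dfrac{c^2y^2 + r^2x^2}{r^4} \end{pmatrix},$$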
This family of curves is indexed by a parameter $a$ satisfying $\sqrt{c^2 - r_1^2} \le |a| \le c$, and the projections of these curves into the punctured disk satisfy Eqs. (6.8)–(6.9). In general (not yet putting $\theta_1 = 0$), the equations of the geodesics that pass through $\bigl(r_1, \theta_1, z(r_1, \theta_1)\bigr)$ are
$$r(t) = r_1\cosh\frac{t}{c} - \sqrt{a^2 - (c^2 - r_1^2)}\,\sinh\frac{t}{c}, \tag{6.8}$$
$$c\ln\frac{c - a}{r_1 - \sqrt{a^2 - (c^2 - r_1^2)}} < t < c\ln\frac{c + a}{r_1 - \sqrt{a^2 - (c^2 - r_1^2)}},$$
$$\theta(t) = \theta_1 - \frac{(c^2 - r_1^2)\sqrt{c^2 - a^2}}{r_1\bigl(ar_1 + c\sqrt{a^2 - (c^2 - r_1^2)}\bigr)} + \frac{\bigl(c^2 - r(t)^2\bigr)\sqrt{c^2 - a^2}}{r(t)\bigl(ar(t) + c\sqrt{a^2 - (c^2 - r(t)^2)}\bigr)} = \theta_0 + c\sqrt{\frac{1}{r_0^2} - \frac{1}{r^2}}. \tag{6.9}$$
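Equations (6.8)–(6.9) can be sanity-checked numerically. In polar coordinates the metric $g(x, y)$ above becomes $ds^2 = (c^2/r^2)\,dr^2 + r^2\,d\theta^2$; this routine change of variables is taken as an assumption of the sketch. With that metric, a curve parameterized by arc length should have squared speed 1. The sample values $c = 1$, $r_1 = 0.8$, $\theta_1 = 0$, $a = 0.8$ are arbitrary, chosen to satisfy $\sqrt{c^2 - r_1^2} \le |a| \le c$. The sketch evaluates the first form of Eq. (6.9) and also checks that $r(t) = c$ at both ends of the time interval in (6.8).

```python
import math

c, r1, theta1, a = 1.0, 0.8, 0.0, 0.8             # sample values (assumed)
b = math.sqrt(a * a - (c * c - r1 * r1))          # sqrt(a^2 - (c^2 - r1^2))

def r_of_t(t):
    """Eq. (6.8)."""
    return r1 * math.cosh(t / c) - b * math.sinh(t / c)

def G(r):
    """The r-dependent term appearing twice in Eq. (6.9)."""
    return ((c * c - r * r) * math.sqrt(c * c - a * a)
            / (r * (a * r + c * math.sqrt(a * a - (c * c - r * r)))))

def theta_of_t(t):
    """First form of Eq. (6.9)."""
    return theta1 - G(r1) + G(r_of_t(t))

def speed_squared(t, h=1e-6):
    """(c^2/r^2)(dr/dt)^2 + r^2 (dtheta/dt)^2, using central differences."""
    r = r_of_t(t)
    dr = (r_of_t(t + h) - r_of_t(t - h)) / (2 * h)
    dth = (theta_of_t(t + h) - theta_of_t(t - h)) / (2 * h)
    return (c / r) ** 2 * dr ** 2 + r ** 2 * dth ** 2

t_lo = c * math.log((c - a) / (r1 - b))   # endpoints of the interval in (6.8)
t_hi = c * math.log((c + a) / (r1 - b))
print(speed_squared(0.3))                 # close to 1
```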
the Jacobian, the exponential mapping is independent of the choice of initial parameterization. The exponential mapping is uniquely determined as the mapping
that is an isometry on radial lines through the origin and preserves directions at
that point. Thus, it will be the same mapping no matter what initial parameteriza-
tion is chosen, provided only that the mapping is of full rank. But the exponential
mapping itself is of full rank at (0, . . . , 0), and so, if we knew what it was—and
we sometimes do, as the two preceding examples have shown—we could use the
258 6. CONCEPTS OF CURVATURE, 1850–1950
exponential mapping itself as the initial mapping to produce itself. This fact has
consequences that are more profound than one might expect from such a tautological beginning. When the initial mapping $\mathbf{r}$ is the exponential mapping, we have $x^i(t) = tu^i$, and as a result:
$$x^i_0 = 0, \qquad (x^i)'(t) \equiv u^i, \qquad (x^i)''(t) \equiv 0.$$
It follows from the last two of these and the Euler equation that in normal coordinates $\Gamma^i_{jk}(tu^1, \dots, tu^n)\,u^j u^k \equiv 0$ for all $t \in [0, 1]$, all $n$-tuples $(u^1, \dots, u^n)$ that are sufficiently small, and all $i = 1, \dots, n$. One might think that this last identity makes all the Christoffel symbols vanish identically near the origin. That, of course, would be absurd, since these coordinates exist at every point of every manifold, while the vanishing of all the Christoffel symbols on an open set implies that the manifold is flat on that open set. What is true, however, is that all the Christoffel symbols vanish at the origin itself, that is, $\Gamma^i_{jk}(0, \dots, 0) = 0$ for all $i, j, k$. This equality is not difficult to prove. Just take $t = 0$, and you find $\Gamma^i_{jk}(0, \dots, 0)\,u^j u^k = 0$ for all $(u^1, \dots, u^n)$. Taking $u^j = u \ne 0$ and $u^k = 0$ for $k \ne j$, we find that $\Gamma^i_{jj}(0, \dots, 0) = 0$. Then, if $j \ne k$, take $u^j = u^k = u \ne 0$ and $u^l = 0$ for $l \ne j$ and $l \ne k$. Since $\Gamma^i_{jj}$ and $\Gamma^i_{kk}$ already vanish at the origin, the result (after division by $u^2$) is $0 = \Gamma^i_{jk}(0, \dots, 0) + \Gamma^i_{kj}(0, \dots, 0) = 2\Gamma^i_{jk}(0, \dots, 0)$.
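A quick numerical check of this fact uses the normal coordinates of Example 6.2, the sphere of radius $r_0$, taking $r_0 = 1$ as an arbitrary choice. The sketch below computes $\Gamma^i_{jk} = \tfrac12 g^{il}(\partial_k g_{jl} + \partial_j g_{kl} - \partial_l g_{jk})$ from the metric coefficients of that example, using central differences. At the origin it uses the limiting value $g = I$. The function names and step size are assumptions for illustration. The symbols come out numerically zero at the origin, but not at the point $(1, 0)$. This is consistent with the remark that they cannot vanish on an open set of a curved manifold.

```python
import math

R0 = 1.0  # sphere radius (sample value)

def metric(x, y, r0=R0):
    """Metric coefficients of Example 6.2 in normal coordinates (x, y);
    at the origin the limiting value, the identity matrix, is used."""
    r = math.hypot(x, y)
    if r == 0.0:
        return [[1.0, 0.0], [0.0, 1.0]]
    th = math.atan2(y, x)
    k = (r0 / r) ** 2 * math.sin(r / r0) ** 2
    c, s = math.cos(th), math.sin(th)
    return [[c * c + k * s * s, c * s * (1 - k)],
            [c * s * (1 - k), s * s + k * c * c]]

def inverse2(g):
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    return [[g[1][1] / det, -g[0][1] / det], [-g[1][0] / det, g[0][0] / det]]

def christoffel(x, y, h=1e-5):
    """Gamma[i][j][k] = 1/2 g^{il} (d_k g_{jl} + d_j g_{kl} - d_l g_{jk})."""
    dg = []  # dg[m][i][j] = d g_ij / d x^m, by central differences
    for dx, dy in ((h, 0.0), (0.0, h)):
        gp, gm = metric(x + dx, y + dy), metric(x - dx, y - dy)
        dg.append([[(gp[i][j] - gm[i][j]) / (2 * h) for j in range(2)] for i in range(2)])
    ginv = inverse2(metric(x, y))
    return [[[0.5 * sum(ginv[i][l] * (dg[k][j][l] + dg[j][k][l] - dg[l][j][k])
                        for l in range(2))
              for k in range(2)] for j in range(2)] for i in range(2)]

def max_abs(gam):
    return max(abs(v) for plane in gam for row in plane for v in row)

print(max_abs(christoffel(0.0, 0.0)), max_abs(christoffel(1.0, 0.0)))  # ~0, then clearly nonzero
```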
We can employ similar reasoning to get another fundamental fact about the
Christoffel symbols.
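Theorem 6.8. In normal coordinates,
$$\frac{\partial \Gamma^i_{jk}}{\partial u^l} + \frac{\partial \Gamma^i_{lj}}{\partial u^k} + \frac{\partial \Gamma^i_{kl}}{\partial u^j} = 0 \tag{6.10}$$
at the point $(0, \dots, 0)$.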
Proof. This is a little more complicated than the previous fact, but still not difficult. We differentiate the identity $\Gamma^i_{jk}(tu^1, \dots, tu^n)\,u^j u^k \equiv 0$ with respect to $t$ and then set $t = 0$, getting the equality
$$\frac{\partial \Gamma^i_{jk}}{\partial u^l}\,u^j u^k u^l \equiv 0$$
for all sufficiently small values of $u^j$, $u^k$, and $u^l$, the partial derivatives being evaluated at the origin. (Be careful to distinguish the two uses of the symbol $u^l$ in this equation. As the variable of differentiation, it could be called anything. It is not a number. But as a factor in the product, it is a number.)
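First, fix an index $j$ and take $u^j = u \ne 0$ and $u^k = 0 = u^l$ for $k \ne j$ and $l \ne j$. The result is
$$\frac{\partial \Gamma^i_{jj}}{\partial u^j} = 0,$$
which implies Eq. (6.10) when $j = k = l$. Next, fix $j$ and $k$ with $j \ne k$, let $u^j = u \ne 0$ and $u^k = v \ne 0$, where $u$ and $v$ are distinct small (fixed) positive numbers, and let $u^l = 0$ for $l \ne j$ and $l \ne k$. Since the terms in $u^3$ and $v^3$ vanish by the previous step, the result is the equality
$$\Bigl(\frac{\partial \Gamma^i_{jk}}{\partial u^j} + \frac{\partial \Gamma^i_{jj}}{\partial u^k} + \frac{\partial \Gamma^i_{kj}}{\partial u^j}\Bigr)u^2 v + \Bigl(\frac{\partial \Gamma^i_{jk}}{\partial u^k} + \frac{\partial \Gamma^i_{kj}}{\partial u^k} + \frac{\partial \Gamma^i_{kk}}{\partial u^j}\Bigr)uv^2 = 0.$$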
5. Sectional Curvature
At this point, since we are dispensing entirely with the embedding of a manifold in $\mathbb{R}^n$, it is useful to review the evolution of our notation. We began by considering parameterizations $\mathbf{r} : U \to \mathbb{R}^3$, where $U$ is an open connected set of
points (s, t) in R2 . We found that arc length on the surface that is the image of
this parameterization could be conveniently given by the first fundamental form
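$E\,ds^2 + 2F\,ds\,dt + G\,dt^2$, where
$$E = \frac{\partial \mathbf{r}}{\partial s}\cdot\frac{\partial \mathbf{r}}{\partial s}, \qquad F = \frac{\partial \mathbf{r}}{\partial s}\cdot\frac{\partial \mathbf{r}}{\partial t}, \qquad G = \frac{\partial \mathbf{r}}{\partial t}\cdot\frac{\partial \mathbf{r}}{\partial t}.$$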
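When it came to finding the Gaussian curvature of such a surface (defined earlier by Euler and reformulated by Gauss using the spherical mapping), we needed second-order partial derivatives of $\mathbf{r}$, and that involved introducing the unit normal vector
$$\mathbf{n} = \frac{\dfrac{\partial \mathbf{r}}{\partial s} \times \dfrac{\partial \mathbf{r}}{\partial t}}{\Bigl\|\dfrac{\partial \mathbf{r}}{\partial s} \times \dfrac{\partial \mathbf{r}}{\partial t}\Bigr\|}.$$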
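That normal vector allowed us to define the standard Christoffel symbols $\Gamma^i_{jk}$, $i = 1, 2$, and the nonstandard $\Gamma^3_{jk}$, which we soon standardized by introducing a third parameter. Letting $s = x^1$ and $t = x^2$, we defined the Christoffel symbols by the equations
$$\frac{\partial^2 \mathbf{r}}{\partial x^j\,\partial x^k} = \Gamma^1_{jk}\frac{\partial \mathbf{r}}{\partial x^1} + \Gamma^2_{jk}\frac{\partial \mathbf{r}}{\partial x^2} + \Gamma^3_{jk}\,\mathbf{n}.$$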
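By introducing the systematic notation $E = g_{11}$, $F = g_{12} = g_{21}$, $G = g_{22}$, and the matrix
$$g = \begin{pmatrix} g_{11} & g_{12} \\ g_{21} & g_{22} \end{pmatrix}$$
with inverse
$$g^{-1} = \begin{pmatrix} g^{11} & g^{12} \\ g^{21} & g^{22} \end{pmatrix},$$
we were able to get a unified algebraic formula for the standard Christoffel symbols
$$\Gamma^i_{jk} = \frac{1}{2}\,g^{il}\Bigl(\frac{\partial g_{jl}}{\partial x^k} + \frac{\partial g_{kl}}{\partial x^j} - \frac{\partial g_{jk}}{\partial x^l}\Bigr),$$
where by the Einstein convention, the terms were summed on $l$ over the range $l = 1$ to $l = 2$.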
The curvature was at first expressed as a simple formula in the nonstandard
Christoffel symbols Γ3jk , and accordingly, we standardized them by introducing a
third variable x3 and the modified mapping r̃(x1 , x2 , x3 ) = r(x1 , x2 ) + x3 n(x1 , x2 ).
When we did so, we found that the 19 new Christoffel symbols could all be obtained
by the simple device of allowing the indices i, j, k to range from 1 to 3 and the
summation on l to extend from l = 1 to l = 3 in the algebraic formula already
given for the Christoffel symbols.
Then, through intricate combinatorial work, we were able to express the curva-
ture in terms of the standard Christoffel symbols and their derivatives, dispensing
with the third variable x3 . All this was done to lay down groundwork for the more
abstract treatment of the subject that we are now engaged in. We have repeatedly
stressed that the true starting point of all this geometry is the set of metric coeffi-
cients gij . We don’t actually need to know the mapping r that produces them as
dot products of its derivatives. We have already seen that the ambient space R3
can be replaced by any Euclidean space Rn , and this makes no change whatever in
the definition of the curvature of a two-dimensional manifold, provided only that
the Christoffel symbols continue to be given by the same formula, with the index
of summation now running from 1 to n. (On the other hand, as the Whitney Em-
bedding Theorem proved in Appendix 4 for compact manifolds makes clear, any
“canonical,” we would undoubtedly choose the great circle, that is, the geodesic,
which has minimal curvature.
Our task would be difficult if not for the canonical normal coordinates provided
by the exponential mapping. Since the Riemann curvature tensor is a tensor, the
sectional curvature we are going to define is the same number in any coordinate
system, and we are now fortunate in having this canonical parameterization near
any point P , in which the domain is an open ball about the origin in Rn and the
origin maps to P . We can also assume that the matrix g is the identity at the
origin, that the variables (u1 , . . . , un ) are coordinates of an orthonormal system
in Rn , that all the Christoffel symbols vanish at the origin (so that all partial
derivatives of the metric coefficients gij also vanish there), and that the image of
each line through the origin is a geodesic. It is this canonical set of coordinates
that makes it easy, given a pair of linearly independent tangent vectors u and v at
P , to construct a two-dimensional submanifold through the point P whose tangent
plane contains u and v (is the subspace of the tangent space spanned by u and v)
and whose curvature at P is given by the curvature formula we derived in terms
of the Riemann curvature tensor. This submanifold is in fact the set of geodesics
through P that are tangent to the plane spanned by u and v.
Theorem 6.9. Let $u$ and $v$ be two linearly independent tangent vectors at a point $P$ of a manifold $M$ of dimension larger than 2. The set of geodesics through the point $P$ whose tangents at $P$ lie in the subspace of the tangent space to $M$ spanned by $u$ and $v$ is a two-dimensional submanifold $\widetilde{M} \subset M$ whose curvature at $P$ is
$$\kappa(u, v) = \frac{\langle u, R(u, v)v\rangle}{\langle u, u\rangle\langle v, v\rangle - \langle u, v\rangle^2}.$$
follows that $du^i\,du^j = 0$ if either $i$ or $j$ is larger than 2, and thus the sum actually extends only over $i, j = 1, 2$. We thus have $g_{ij}\,du^i\,du^j = ds^2 = \tilde g_{ij}\,d\tilde u^i\,d\tilde u^j = \tilde g_{ij}\,du^i\,du^j$, and therefore the stated equality must hold. Thus, for $i$ and $j$ equal to 1 or 2, the metric coefficients $\tilde g_{ij}(u^1, u^2)$ and all their partial derivatives with respect to $u^1$ and $u^2$ coincide with the corresponding coefficients $g_{ij}(u^1, u^2, 0, \dots, 0)$ and their partial derivatives with respect to the same variables.
It is obvious that the coordinates $(u^1, u^2)$ are normal coordinates on $\widetilde{M}$, and in particular $\widetilde\Gamma^i_{jk}(0, 0) = 0$ for all $i, j, k = 1, 2$, just as $\Gamma^i_{jk}(0, 0, \dots, 0) = 0$ for all $i, j, k = 1, 2, \dots, n$. That leaves very little of the Riemann curvature tensor to be accounted for, and it now suffices to show that
$$\frac{\partial \widetilde\Gamma^i_{jk}}{\partial u^l} = \frac{\partial \Gamma^i_{jk}}{\partial u^l},$$
where the left-hand side is evaluated at $(0, 0)$ and the right-hand side at $(0, 0, \dots, 0)$.
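In fact, we have
$$\frac{\partial \widetilde\Gamma^i_{kj}}{\partial u^l} = \frac{1}{2}\,\frac{\partial \tilde g^{im}}{\partial u^l}\Bigl(\frac{\partial \tilde g_{jm}}{\partial u^k} + \frac{\partial \tilde g_{mk}}{\partial u^j} - \frac{\partial \tilde g_{jk}}{\partial u^m}\Bigr) + \frac{1}{2}\,\tilde g^{im}\Bigl(\frac{\partial^2 \tilde g_{jm}}{\partial u^l\,\partial u^k} + \frac{\partial^2 \tilde g_{mk}}{\partial u^l\,\partial u^j} - \frac{\partial^2 \tilde g_{jk}}{\partial u^l\,\partial u^m}\Bigr),$$
where the summation extends from $m = 1$ to $m = 2$, and
$$\frac{\partial \Gamma^i_{kj}}{\partial u^l} = \frac{1}{2}\,\frac{\partial g^{im}}{\partial u^l}\Bigl(\frac{\partial g_{jm}}{\partial u^k} + \frac{\partial g_{mk}}{\partial u^j} - \frac{\partial g_{jk}}{\partial u^m}\Bigr) + \frac{1}{2}\,g^{im}\Bigl(\frac{\partial^2 g_{jm}}{\partial u^l\,\partial u^k} + \frac{\partial^2 g_{mk}}{\partial u^l\,\partial u^j} - \frac{\partial^2 g_{jk}}{\partial u^l\,\partial u^m}\Bigr),$$
where the summation extends from $m = 1$ to $m = n$. Now all we have to do is note that (1) the first term on the right-hand side of both expressions vanishes at the origin of $\mathbb{R}^2$ or $\mathbb{R}^n$ respectively, since the Christoffel symbols vanish there in both cases, and (2) the summation on $m$ in the second term amounts to just the single term where $m = i$, since $g^{im} = \delta^{im} = \tilde g^{im}$ at the origin. Thus, as long as $i, j, k, l$ assume only the values 1 and 2, the two expressions coincide at the origin. Since the products of Christoffel symbols in the Riemann curvature tensor also vanish at the origin, it then follows that $\widetilde R^i_{jkl} = R^i_{jkl}$ for that range of indices, and we are done.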
The second missing piece is the Laplacian operator, which plays a central role
in classical physics. Since the metric coefficients are going to be playing a role
analogous to potential energy in our geometrized theory of gravity in the next
chapter, we need to explore the analog of the Laplacian on a general manifold and
see what it means for the Laplacian of the metric coefficients to vanish, as the
ordinary Laplacian does for Newtonian potential functions.
These two lacunae will be filled in the present section. The third and final
missing piece is the actual nexus between geometry and physics. We shall discuss
the role of the Laplacian in classical physics at the end of the present section and
the application of the generalized Laplacian in relativity in the next chapter.
6.1. The Hessian and the Laplacian in $\mathbb{R}^3$. As Euler showed, the determinant of the Hessian matrix is the curvature of the surface $z = f(x, y)$ at a point where it is tangent to the $xy$-plane if the matrix of metric coefficients at that point is the identity, which we now temporarily assume. For $2 \times 2$ matrices $M$, the determinant is the constant term in the characteristic polynomial $\chi_M(\lambda)$, given by the equality
$$\chi_M(\lambda) = \det(\lambda I - M) = \lambda^2 - \operatorname{Tr}(M)\,\lambda + \det(M),$$
where $I$ is the $2 \times 2$ identity matrix.
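The linear term in this relation is of independent interest. We have denoted the negative of its coefficient by Tr, since it is generally called the trace of $M$ (the sum of its diagonal entries). When $M$ is the Hessian matrix $Hf(x, y)$, the trace is the well-known Laplacian:
$$\operatorname{Tr} Hf(x, y) = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} = \nabla^2 f.$$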
The eigenvalues of the Hessian matrix (the roots of $\chi_{Hf}$) can then be expressed in terms of the Laplacian on the parameter space and the curvature $\kappa$ as
$$\frac{\nabla^2 f \pm \sqrt{(\nabla^2 f)^2 - 4\kappa}}{2}.$$
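The eigenvalue formula can be illustrated with a sketch. The sample surface $z = x^2 - 3xy + 2y^2 + x^3$, the step size, and the function names are assumptions. The surface is tangent to the $xy$-plane at the origin, and its curvature there is negative (a saddle). That case is why the square root is written as $\sqrt{(\nabla^2 f)^2 - 4\kappa}$ rather than as a product involving $\sqrt\kappa$. The Hessian is estimated by central differences, and the two eigenvalues are checked against $\chi(\lambda) = \lambda^2 - \operatorname{Tr}(H)\lambda + \det(H)$.

```python
import math

def f(x, y):
    """Sample surface z = f(x, y), tangent to the xy-plane at the origin (a saddle)."""
    return x * x - 3 * x * y + 2 * y * y + x ** 3

def hessian(func, x, y, h=1e-4):
    """Hessian matrix of func at (x, y) by central second differences."""
    fxx = (func(x + h, y) - 2 * func(x, y) + func(x - h, y)) / h ** 2
    fyy = (func(x, y + h) - 2 * func(x, y) + func(x, y - h)) / h ** 2
    fxy = (func(x + h, y + h) - func(x + h, y - h)
           - func(x - h, y + h) + func(x - h, y - h)) / (4 * h * h)
    return [[fxx, fxy], [fxy, fyy]]

H = hessian(f, 0.0, 0.0)
lap = H[0][0] + H[1][1]                        # Tr H, the Laplacian of f
kappa = H[0][0] * H[1][1] - H[0][1] * H[1][0]  # det H, the curvature at the origin
root = math.sqrt(lap * lap - 4 * kappa)        # real because H is symmetric
eigs = ((lap + root) / 2, (lap - root) / 2)
print(lap, kappa, eigs)   # about 6, -1, and (3 + sqrt 10, 3 - sqrt 10)
```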
The Laplacian can be decomposed into a sequence of two applications of a
single formal vector operator ∇: ∇2 f (x, y) = ∇ · (∇f ) = div grad f (x, y) . It
is named after Pierre-Simon Laplace (1749–1827), who introduced it in cylindrical
coordinates in a work devoted to the study of gravity. It is a linear second-order
differential operator, and so we might suspect that it has some connection with the
covariant derivative of a vector field. It is sometimes denoted by the symbol Δ, and
sometimes prefixed by a negative sign, since geometers prefer a positive-definite
operator to a negative-definite one. One can easily verify that the Laplacian as we
have defined it is negative-definite as an operator on a function space where the
inner product of two continuous functions of compact support on $\mathbb{R}^n$ is
$$\langle f, g\rangle = \int_{\mathbb{R}^n} f(x)\,g(x)\,dx,$$
that is, $\langle \nabla^2 f, f\rangle < 0$ for all nonzero $C^2$-functions $f$ of compact support. (See Problem 6.13.)
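The negative-definiteness can be seen numerically. In the sketch below, the integral is replaced by a Riemann sum over a grid, and $\nabla^2$ by the standard five-point difference quotient. Both substitutions are assumptions of the sketch, not the book's method. The test function is a smooth bump supported in the unit disk. The sum comes out negative. In the continuous setting, integration by parts gives $\langle\nabla^2 f, f\rangle = -\int |\nabla f|^2\,dx$ for such $f$.

```python
import math

def bump(x, y):
    """A smooth function with compact support in the unit disk."""
    s = x * x + y * y
    return math.exp(-1.0 / (1.0 - s)) if s < 1.0 else 0.0

def laplacian_inner_product(f, n=81, L=1.5):
    """Riemann-sum approximation of <Lap f, f> = integral over R^2 of (Lap f) f,
    on an n-by-n grid over [-L, L]^2, with the five-point difference Laplacian."""
    h = 2 * L / (n - 1)
    grid = [[f(-L + i * h, -L + j * h) for j in range(n)] for i in range(n)]
    total = 0.0
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            lap = (grid[i + 1][j] + grid[i - 1][j] + grid[i][j + 1]
                   + grid[i][j - 1] - 4 * grid[i][j]) / h ** 2
            total += lap * grid[i][j] * h * h
    return total

print(laplacian_inner_product(bump))   # a negative number
```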
The Hessian and the Laplacian are also connected with the second covariant
derivative defined earlier. In fact, the second covariant derivative provides the
natural definition of the Hessian in generalized coordinates. When that definition
is made (see Subsection 6.6 below), it turns out that the double contraction of the
Hessian is the correct generalization of the Laplacian to these coordinates. In order
to bring out these points fully, we need to explore the operator ∇ whose iteration
is the Laplacian. This is a part of vector analysis that is fairly well known, but we
include a summary of it now so that we can efficiently discuss the generalization of
the Laplacian to an arbitrary manifold.
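6.2. The operators ∇ and d and the wedge product. The gradient-curl-divergence operator on $\mathbb{R}^3$ is
$$\nabla = \frac{\partial}{\partial x}\,\mathbf{i} + \frac{\partial}{\partial y}\,\mathbf{j} + \frac{\partial}{\partial z}\,\mathbf{k}.$$
For a function $f$ and a vector field $u = P\,\mathbf{i} + Q\,\mathbf{j} + R\,\mathbf{k}$,
$$\nabla f = \operatorname{grad} f = \frac{\partial f}{\partial x}\,\mathbf{i} + \frac{\partial f}{\partial y}\,\mathbf{j} + \frac{\partial f}{\partial z}\,\mathbf{k},$$
$$\nabla \times u = \operatorname{curl} u = \Bigl(\frac{\partial R}{\partial y} - \frac{\partial Q}{\partial z}\Bigr)\mathbf{i} + \Bigl(\frac{\partial P}{\partial z} - \frac{\partial R}{\partial x}\Bigr)\mathbf{j} + \Bigl(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\Bigr)\mathbf{k},$$
$$\nabla \cdot u = \operatorname{div} u = \frac{\partial P}{\partial x} + \frac{\partial Q}{\partial y} + \frac{\partial R}{\partial z}.$$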
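We could naturally identify the unit vectors i, j, and k with the partial derivative operators ∂/∂x, ∂/∂y, and ∂/∂z. With that identification, only the curl has any possibility of representing a covariant derivative, mapping vectors to vectors. Since the space $\mathbb{R}^3$ is flat, all of its Christoffel symbols in rectangular coordinates are zero, and it is easy to compute that
$$\nabla_u\Bigl(P\frac{\partial}{\partial x} + Q\frac{\partial}{\partial y} + R\frac{\partial}{\partial z}\Bigr) = (u \cdot \nabla P)\frac{\partial}{\partial x} + (u \cdot \nabla Q)\frac{\partial}{\partial y} + (u \cdot \nabla R)\frac{\partial}{\partial z}.$$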
As we shall see below (Theorem 6.10), the Laplace–Beltrami operator that we are
going to introduce is related to the second covariant derivative, justifying to some
extent the slight risk of confusing them.
On real-valued functions f defined on an open set in R3 the Laplacian is given
in rectangular, cylindrical, and spherical (longitude and co-latitude) coordinates
6. THE LAPLACE–BELTRAMI OPERATOR 267
a completely unnecessary singularity at the equator, and thereby confines us to the northern
hemisphere.
If $u_i = u^m_i\,\partial/\partial x^m$ and $u_j = u^n_j\,\partial/\partial x^n$, we have
$$(dx^k \wedge dx^l)(u_i, u_j) = u^k_i u^l_j - u^l_i u^k_j = \det\begin{pmatrix} u^k_i & u^k_j \\ u^l_i & u^l_j \end{pmatrix}.$$
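If $(u^1, \dots, u^n)$ are new coordinates in which $u_i = \partial/\partial u^i$, then, as we know from the chain rule,
$$u_i = \frac{\partial}{\partial u^i} = \frac{\partial x^k}{\partial u^i}\,\frac{\partial}{\partial x^k},$$
so that in the coordinates $\{x^1, \dots, x^n\}$ we have
$$u^k_i = \frac{\partial x^k}{\partial u^i},$$
and so
$$dx^k \wedge dx^l = \sum_{i<j} \det\frac{\partial(x^k, x^l)}{\partial(u^i, u^j)}\,du^i \wedge du^j.$$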
Because of this transformation law, the wedge product is a tensor of type
(0, 2). We can create alternating k-linear functionals by “antisymmetrizing” tensor
products. The process is sufficiently straightforward that we give just the example
of a 3-form: If $u_l = u^m_l\,\partial/\partial x^m$, then
$$(dx^i \wedge dx^j \wedge dx^k)(u_1, u_2, u_3) = \det\begin{pmatrix} u^i_1 & u^j_1 & u^k_1 \\ u^i_2 & u^j_2 & u^k_2 \\ u^i_3 & u^j_3 & u^k_3 \end{pmatrix}.$$
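The determinant formulas and the transformation law can be illustrated numerically. In the sketch below the helper names are hypothetical. `wedge` evaluates $dx^{i_1} \wedge \cdots \wedge dx^{i_k}$ on vectors given by their components along $\partial/\partial x^m$. The components $u^k_i = \partial x^k/\partial u^i$ of $\partial/\partial u^i$ are approximated by central differences. For polar coordinates the sketch should give $(dx \wedge dy)(\partial/\partial r, \partial/\partial\theta) = r$. For spherical coordinates $(\rho, \varphi, \theta)$, with $\varphi$ the co-latitude, it should give $(dx \wedge dy \wedge dz)(\partial/\partial\rho, \partial/\partial\varphi, \partial/\partial\theta) = \rho^2\sin\varphi$.

```python
import math

def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def wedge(indices, vectors):
    """(dx^{i_1} ^ ... ^ dx^{i_k})(u_1, ..., u_k) = det(u_a^{i_b}); vectors[a][m]
    is the component u_a^m of u_a along d/dx^m (indices are 0-based here)."""
    return det([[v[i] for i in indices] for v in vectors])

def coordinate_vectors(param, u, h=1e-6):
    """Components u_i^k = dx^k/du^i of the vectors d/du^i, by central differences."""
    vecs = []
    for i in range(len(u)):
        up, um = list(u), list(u)
        up[i] += h
        um[i] -= h
        vecs.append([(a - b) / (2 * h) for a, b in zip(param(up), param(um))])
    return vecs

def polar(u):
    r, theta = u
    return (r * math.cos(theta), r * math.sin(theta))

def spherical(u):
    rho, phi, theta = u   # phi is the co-latitude
    return (rho * math.sin(phi) * math.cos(theta),
            rho * math.sin(phi) * math.sin(theta),
            rho * math.cos(phi))

print(wedge((0, 1), coordinate_vectors(polar, [2.0, 0.3])))   # close to r = 2
```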
6.3. Duality. When vectors on $\mathbb{R}^3$ are identified with 1-forms and 2-forms on $\mathbb{R}^3$, as illustrated above with the operator ∇, a function $f(x, y, z)$ corresponds to
∇2 (ω) = d(δω) + δ(dω), and the reader can verify that this definition is the same
as the one already given for 0-forms.14
All of this generalizes to any manifold, but there are some subtleties involved,
since what has been done up to here applies only in the case when the matrix
of metric coefficients gij is an orthogonal matrix at each point. This hypothesis is
trivially satisfied in R3 . We are now about to generalize it to an arbitrary coordinate
system.
A generalization of the operator ∇2 was introduced for a manifold having metric
coefficients gij (x1 , . . . , xn ) by Eugenio Beltrami (1835–1900) in an 1868 paper on
spaces of constant curvature. The generalization is obvious for a flat space such as
Rn , where, in rectangular coordinates, the operator ∇2 is the standard Laplacian:
$$\nabla^2 f = \sum_{i=1}^n \frac{\partial^2 f}{\partial (x^i)^2},$$
14 In this form, the operator with a negative sign attached is called the Laplace–de Rham
6.4. Invariant definition. It is not obvious that the Laplacian $\nabla^2$ just defined is a tensor, since in general second-order differential operators are not tensors. But if we take the definition $\nabla^2 = d\delta + \delta d$, we can see that such is the case, since both $d$ and $\delta$ are tensor operations. This is already clear in the case of $d$, which maps tensors of type (0, k) to tensors of type (0, k + 1). (Since an alternating tensor retains its alternating character when coordinates are changed, restricting $d$ to
alternating tensors of type (0, k)—that is, to k-forms—does not change its tensor
character.) It remains to be shown that δ is also a tensor. In view of the definition
δ = ∗d∗, it suffices to show that the adjoint operation ∗ is “tensorial,” and we
have not yet given the general definition of this operation. We shall now do so.
After we give that definition and verify that it is tensorial, we still need to show
(Theorem 6.11 below) that ∇2 as defined by Beltrami coincides with the restriction
of δd to 0-forms.
Here is how the operation ∗ works in general coordinates. First, we choose, in a smooth way, an orthonormal basis $\{u_1, \dots, u_n\}$ of the tangent space at each point $P$ (orthonormal, that is, relative to the inner product $\langle u, v\rangle = g_{pq}u^p v^q$), with dual basis $\{\upsilon^1, \dots, \upsilon^n\}$ of the cotangent space. This is easily done in a number of ways. We shall demonstrate below how it can be done by changing coordinates in a local chart. For now, just assume that $u_i = u^j_i\,(\partial/\partial x^j)$, where $(x^1, \dots, x^n)$ are the local coordinates. Then let $\upsilon^1, \dots, \upsilon^n$ be the dual basis of the cotangent space, that is, $\upsilon^i(u_j) = \delta^i_j$. If the matrix¹⁵ $(u^j_i)$ has inverse $(v^j_i)$, then $\upsilon^i = v^i_j\,dx^j$. The definition of ∗ is as follows.
Let $\omega = \upsilon^{i_1} \wedge \cdots \wedge \upsilon^{i_k}$, where $1 \le i_1 < \cdots < i_k \le n$. Let $j_1 < \cdots < j_{n-k}$ be the remaining indices, that is, $\{j_1, \dots, j_{n-k}\} = \{1, \dots, n\} \setminus \{i_1, \dots, i_k\}$. (Important: Note that the sets of indices $\{i_l\}$ and $\{j_l\}$ are both arranged in ascending order.) We define
$$\ast\omega = \varepsilon(i_1, \dots, i_k, j_1, \dots, j_{n-k})\,\upsilon^{j_1} \wedge \cdots \wedge \upsilon^{j_{n-k}},$$
where $\varepsilon(i_1, \dots, i_k, j_1, \dots, j_{n-k})$ is $+1$ if $(i_1, \dots, i_k, j_1, \dots, j_{n-k})$ is an even permutation of $(1, 2, \dots, n)$ and $-1$ if this permutation is odd. It is easy to prove that $\varepsilon(m_1, \dots, m_n) = (-1)^N$ for any permutation $(m_1, \dots, m_n)$ of $(1, \dots, n)$, where $N$ is the number of inversions in the permutation (the number of ordered pairs $(m_i, m_j)$ with $i < j$ and $m_i > m_j$).
Since the reader can easily verify that the parity of $(i_1, \dots, i_k, j_1, \dots, j_{n-k})$ differs from that of $(j_1, \dots, j_{n-k}, i_1, \dots, i_k)$ by the factor $(-1)^{k(n-k)}$ (Problem 6.9), it follows that $\ast(\ast\omega) = (-1)^{k(n-k)}\omega$. It is important to keep the signs straight here.
The sign of each term is the parity of the permutation of the n-form υ 1 ∧ · · · ∧ υ n
obtained by putting the indices of the image wedge product υ j1 ∧ · · · ∧ υ jn−k in
order, after those of the pre-image υ i1 ∧ · · · ∧ υ ik , that is, by looking at the n-form
υ i1 ∧ · · · ∧ υ ik ∧ υ j1 ∧ · · · ∧ υ jn−k .
We note that if the number $n$ of variables is odd, then either $k$ or $n - k$ is even, so that $\ast(\ast\omega) = \omega$. If $n$ is even, $\ast(\ast\omega) = (-1)^k\omega$ for each $k$-form $\omega$.
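The sign bookkeeping for ∗ on basis forms is easy to mechanize. The sketch below uses hypothetical function names. It computes ε by counting inversions and represents $\upsilon^{i_1} \wedge \cdots \wedge \upsilon^{i_k}$ by its ascending index tuple (1-based, as in the text). It then confirms $\ast(\ast\omega) = (-1)^{k(n-k)}\omega$ for every basis form with $n \le 6$.

```python
from itertools import combinations

def epsilon(perm):
    """Sign of a permutation: (-1)^N, where N is the number of inversions."""
    n = len(perm)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def star(indices, n):
    """*(v^{i_1} ^ ... ^ v^{i_k}) = epsilon(i_1..i_k, j_1..j_{n-k}) v^{j_1} ^ ... ^ v^{j_{n-k}}.
    Returns (sign, complementary indices in ascending order)."""
    rest = tuple(j for j in range(1, n + 1) if j not in indices)
    return epsilon(tuple(indices) + rest), rest

def star_star_sign(indices, n):
    """The sign s with *(*omega) = s * omega for the basis form omega."""
    s1, rest = star(indices, n)
    s2, _ = star(rest, n)
    return s1 * s2

for n in range(1, 7):
    for k in range(n + 1):
        for idx in combinations(range(1, n + 1), k):
            assert star_star_sign(idx, n) == (-1) ** (k * (n - k))
print(star((1,), 2), star((2,), 2))   # (1, (2,)) (-1, (1,)): *v1 = v2, *v2 = -v1
```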
15 There seem to be advantages and disadvantages in all ways of writing matrices. In these
change-of-coordinate formulas, it seems advisable to use a subscript to indicate the row of an entry
and a superscript to indicate the column. Thus $u^j_i$ denotes the entry in row $i$, column $j$. On the other hand, the matrix of metric coefficients $g = (g_{ij})$, since it represents a tensor of type (0, 2), really needs to have both of its indices subscripted, and this is particularly convenient as a way of denoting its inverse $g^{-1} = (g^{ij})$. For much of what we are doing in the present section, the
matrices involved are symmetric, so that the distinction between rows and columns is not crucial.
uj1 · · · ujk
ik ik
where the right-hand side is to be summed over all k-tuples (j1 , . . . , jk ). (Only those
for which no two of the indices are equal actually count, and all k! permutations of
a given set of indices {j1 , . . . , jk } can be consolidated and written canonically as a
single term with the indices in ascending order.)
Although this definition appears to be a bit messy, we really need it only for
the k-forms involved in the Laplace–Beltrami operator on functions, that is, for
0-forms, 1-forms, (n − 1)-forms, and n-forms.
For a 0-form $f(x^1, \dots, x^n)$ we have $\ast f = f\,\upsilon^1 \wedge \cdots \wedge \upsilon^n$. Conversely, for an $n$-form we have $\ast(f\,\upsilon^1 \wedge \cdots \wedge \upsilon^n) = f$.
For a 1-form $\omega = P_1\upsilon^1 + P_2\upsilon^2 + \cdots + P_n\upsilon^n$,
$$\ast\omega = P_1\,\upsilon^2 \wedge \cdots \wedge \upsilon^n - P_2\,\upsilon^1 \wedge \upsilon^3 \wedge \cdots \wedge \upsilon^n + \cdots + (-1)^{n-1}P_n\,\upsilon^1 \wedge \cdots \wedge \upsilon^{n-1}.$$
Conversely, for an $(n-1)$-form $\omega = Q_1\,\upsilon^2 \wedge \cdots \wedge \upsilon^n + Q_2\,\upsilon^1 \wedge \upsilon^3 \wedge \cdots \wedge \upsilon^n + \cdots + Q_n\,\upsilon^1 \wedge \cdots \wedge \upsilon^{n-1}$, we have
$$\ast\omega = (-1)^{n-1}Q_1\,\upsilon^1 + (-1)^{n-2}Q_2\,\upsilon^2 + \cdots + Q_n\,\upsilon^n.$$
It is then easy to see that $\ast(\ast\omega) = (-1)^{n-1}\omega$ for 1-forms and $(n-1)$-forms.
Since the basis {υ 1 , . . . , υ n } of the cotangent space was created artificially just
in order to define the operation ∗, we need to see how the adjoint operator δ is
expressed in terms of the standard basis {dx1 , . . . , dxn } of the cotangent space.
Consider first a 1-form $\omega = P_i\,dx^i$. We define the operator $\delta$ by (1) expressing each $dx^i$ in terms of $\upsilon^1, \dots, \upsilon^n$, (2) applying the adjoint operation ∗ in the orthonormal basis $\{\upsilon^1, \dots, \upsilon^n\}$ of the cotangent space, so as to get an $(n-1)$-form $\ast\omega$, (3) applying the differential operator $d$ to obtain the $n$-form $d(\ast\omega)$, and (4) applying the adjoint operation ∗ again to get a zero-form $\delta\omega = \ast d(\ast\omega)$, whose expression depends only on the coordinate system and is independent of any bases in the tangent and cotangent spaces. We need the orthonormal bases $\{u_1, \dots, u_n\}$ and $\{\upsilon^1, \dots, \upsilon^n\}$ in order to apply the adjoint operation ∗, since its direct definition in terms of the basis $\{dx^1, \dots, dx^n\}$ is too messy to write as a generic formula.
Example 6.4. Consider polar coordinates on $\mathbb{R}^2$, for which $g_{11} = 1$, $g_{12} = g_{21} = 0$, and $g_{22} = r^2$. Thus $\langle u, v\rangle = u^1v^1 + r^2u^2v^2$. (The superscript on $r$ here is an exponent; those on $u$ and $v$ are indices.) We get an orthonormal
basis of the tangent space easily: {u1 , u2 } = {∂/∂r, (1/r)∂/∂θ}. The basis dual
to this basis is υ 1 = dr, υ 2 = r dθ. Since we are in two dimensions here, we see
that ∗υ 1 = υ 2 and ∗υ 2 = −υ 1 . Consequently, we have the following table of
conversions:
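$$f(r, \theta) = f(r, \theta) \quad\text{for 0-forms } f(r, \theta),$$
$$P(r, \theta)\,dr + Q(r, \theta)\,d\theta = P(r, \theta)\,\upsilon^1 + \frac{1}{r}Q(r, \theta)\,\upsilon^2 \quad\text{for 1-forms},$$
$$F(r, \theta)\,dr \wedge d\theta = \frac{1}{r}F(r, \theta)\,\upsilon^1 \wedge \upsilon^2 \quad\text{for 2-forms}.$$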
From this table of conversions, we easily deduce that
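$$\ast f(r, \theta) = f(r, \theta)\,\upsilon^1 \wedge \upsilon^2 = r f(r, \theta)\,dr \wedge d\theta,$$
$$\ast(P\,dr + Q\,d\theta) = \ast\Bigl(P\upsilon^1 + \frac{Q}{r}\upsilon^2\Bigr) = P\upsilon^2 - \frac{Q}{r}\upsilon^1 = -\frac{1}{r}Q\,dr + rP\,d\theta,$$
$$\ast\bigl(F(r, \theta)\,dr \wedge d\theta\bigr) = \ast\Bigl(\frac{1}{r}F\,\upsilon^1 \wedge \upsilon^2\Bigr) = \frac{1}{r}F(r, \theta).$$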
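It is then easy to compute that
$$\nabla^2 f = \delta(df) = \frac{1}{r}\frac{\partial}{\partial r}\Bigl(r\frac{\partial f}{\partial r}\Bigr) + \frac{1}{r^2}\frac{\partial^2 f}{\partial\theta^2},$$
since $\ast df = -\frac{1}{r}\frac{\partial f}{\partial\theta}\,dr + r\frac{\partial f}{\partial r}\,d\theta$, so that $d(\ast df) = \Bigl(\frac{\partial}{\partial r}\bigl(r\frac{\partial f}{\partial r}\bigr) + \frac{1}{r}\frac{\partial^2 f}{\partial\theta^2}\Bigr)dr \wedge d\theta$, and the final application of ∗ divides this coefficient by $r$. This is exactly the standard expression for the two-variable Laplacian in polar coordinates.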
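Here is a numerical spot check of this formula. The test function $F(x, y) = x^3y + y^2$, whose Laplacian is $6xy + 2$, is an arbitrary choice, and the function names and step size are illustrative. The sketch evaluates $\frac{1}{r}\partial_r(r\,\partial_r f) + \frac{1}{r^2}\partial_\theta^2 f$ by central differences. It then compares the result with the value computed in rectangular coordinates.

```python
import math

def F(x, y):
    """Sample function in rectangular coordinates; its Laplacian is 6xy + 2."""
    return x ** 3 * y + y ** 2

def f_polar(r, th):
    """The same function written in polar coordinates."""
    return F(r * math.cos(th), r * math.sin(th))

def laplacian_polar(f, r, th, h=1e-4):
    """(1/r) d/dr (r df/dr) + (1/r^2) d^2 f/dtheta^2, by central differences."""
    def r_times_df_dr(rr):
        return rr * (f(rr + h, th) - f(rr - h, th)) / (2 * h)
    radial = (r_times_df_dr(r + h) - r_times_df_dr(r - h)) / (2 * h) / r
    angular = (f(r, th + h) - 2 * f(r, th) + f(r, th - h)) / (h * h * r * r)
    return radial + angular

r, th = 1.3, 0.7
x, y = r * math.cos(th), r * math.sin(th)
print(laplacian_polar(f_polar, r, th), 6 * x * y + 2)   # the two values should nearly agree
```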
To give an example that is less trivial than the case of polar coordinates in the
plane, but still fairly simple, we shall look at the Laplace–Beltrami operator on a
2-sphere.
Example 6.5. Consider longitude ($\theta$) and latitude¹⁶ ($\varphi$) coordinates on the sphere of radius $r_0$ in $\mathbb{R}^3$ with center at $(0, 0, 0)$, with the half-plane $\{y = 0,\ x \le 0\}$ removed, so that $-\pi < \theta < \pi$ and $-\pi/2 < \varphi < \pi/2$. The coordinate mapping is
$$\mathbf{r}(\theta, \varphi) = (r_0\cos\theta\cos\varphi,\ r_0\sin\theta\cos\varphi,\ r_0\sin\varphi).$$
The matrix of metric coefficients $(g_{ij})$ is given by
$$g_{11} = r_0^2\cos^2\varphi, \qquad g_{12} = g_{21} = 0, \qquad g_{22} = r_0^2.$$
16 Note that in this example, we are defying the preference of physics textbooks for co-latitude
over latitude. Physicists also seem to prefer using ϕ for longitude and θ for co-latitude, again in
direct opposition to the practice of mathematicians.
18 The reader will notice that we are apparently abusing the star symbol by using it here for
the classical adjoint of a matrix, when up to now we have used it for the adjoint of a k-form. The
two uses are closely related, however, as we shall see very shortly.
about the distinction between rows and columns.) The fundamental fact is the
well-known relation
$$g\,(\ast g) = \det(g)\,I,$$
that is,
$$(\ast g)_{ij} = \det(g)\,g^{ij},$$
where $g^{-1} = (g^{ij})$.
With these preliminaries out of the way, the proof is self-operating, just a
matter of following the definitions.
The positive-definite matrix $g = (g_{ij})$ has a unique positive-definite square root $B = (b_{kl})$, whose inverse $A = (a_{kl})$ is the unique positive-definite square root of $g^{-1} = (g^{ij})$. We have the following relations among these matrices:
$$B^2 = g, \qquad A^2 = g^{-1}, \qquad AB = BA = I, \qquad Ag = gA = B, \qquad Bg^{-1} = g^{-1}B = A,$$
$$\det(A) = \frac{1}{\sqrt{\det(g)}}, \qquad \det(B) = \sqrt{\det(g)},$$
$$\ast A = \det(A)A^{-1} = \frac{1}{\sqrt{\det(g)}}\,B, \qquad \ast B = \det(B)B^{-1} = \sqrt{\det(g)}\,A.$$
This last relation is the most important one, since it says $\ast b_{kl} = \sqrt{\det(g)}\,a_{kl}$.
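These relations are easy to confirm numerically for a $2 \times 2$ example. The sketch uses a closed form for the positive-definite square root of a $2 \times 2$ positive-definite matrix, $\sqrt{M} = (M + \sqrt{\det M}\,I)/\sqrt{\operatorname{Tr} M + 2\sqrt{\det M}}$. This formula follows from the Cayley–Hamilton theorem, and it is a choice made for the sketch. The sample matrix and helper names are also assumptions. Here `adjugate` plays the role of the classical adjoint ∗.

```python
import math

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def det2(X):
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]

def adjugate(X):
    """Classical adjoint *X of a 2x2 matrix, so that X (*X) = det(X) I."""
    return [[X[1][1], -X[0][1]], [-X[1][0], X[0][0]]]

def inverse(X):
    d = det2(X)
    return [[v / d for v in row] for row in adjugate(X)]

def spd_sqrt(M):
    """Positive-definite square root of a 2x2 positive-definite M, via Cayley-Hamilton:
    (M + sqrt(det M) I)^2 = (tr M + 2 sqrt(det M)) M."""
    s = math.sqrt(det2(M))
    t = math.sqrt(M[0][0] + M[1][1] + 2 * s)
    return [[(M[i][j] + (s if i == j else 0.0)) / t for j in range(2)] for i in range(2)]

g = [[2.0, 0.5], [0.5, 1.0]]   # a sample positive-definite metric matrix
B = spd_sqrt(g)                 # B^2 = g
A = inverse(B)                  # A^2 = g^{-1}
root_det_g = math.sqrt(det2(g))
print(det2(B), root_det_g)      # det(B) = sqrt(det g)
```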
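The vectors $u_j = a_{lj}\,\partial/\partial x^l$ form an orthonormal basis of the tangent space. The dual basis of the cotangent space is $\upsilon^i = b_{ik}\,dx^k$, since
$$\upsilon^i(u_j) = b_{ik}a_{lj}\,dx^k\Bigl(\frac{\partial}{\partial x^l}\Bigr) = b_{ik}a_{kj} = \delta_{ij}.$$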
∇ · ∇f = div grad f .
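The formula for the operator in generalized coordinates suggests that we define the gradient of a function $f(x^1, \dots, x^n)$ and the divergence of a vector field $u = u^i\,\partial/\partial x^i$ on a general manifold as
$$\operatorname{grad} f = g^{il}\frac{\partial f}{\partial x^l}\frac{\partial}{\partial x^i},$$
$$\operatorname{div} u = \frac{1}{\sqrt{\det(g)}}\frac{\partial\bigl(\sqrt{\det(g)}\,u^i\bigr)}{\partial x^i} = \frac{\partial u^i}{\partial x^i} + \frac{u^i}{2\det(g)}\frac{\partial(\det(g))}{\partial x^i} = \frac{\partial u^i}{\partial x^i} + \frac{1}{2}g^{jk}\frac{\partial g_{jk}}{\partial x^i}u^i = \frac{\partial u^i}{\partial x^i} + u^i\,\Gamma^k_{ik},$$
where the next-to-last equality follows from Eq. (6.16), proved below, and the last one by combining that result with the result of Problem 4.12. The tensor nature of the gradient is then easily verified.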
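The divergence is now seen to be connected with the covariant derivative via the relation
$$\operatorname{div} u = dx^i\Bigl(\nabla_{\partial/\partial x^i}u\Bigr).$$
In other words, it is the contraction of the mixed tensor $T$ of type (1, 1) whose components $T^j_i$ are defined by
$$\nabla_{\partial/\partial x^i}u = T^j_i\,\frac{\partial}{\partial x^j}.$$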
Warning! Habits of thought learned by constant use of flat spaces, such as in Newtonian mechanics, can be misleading. It is obvious that the gradient of a constant function is 0. Conversely, if grad f vanishes over a connected region of parameter space, then its ordinary gradient, whose components are $(\partial f/\partial x^1, \dots, \partial f/\partial x^n)$, is orthogonal to every row of the matrix $g^{-1}$. Since $g^{-1}$ is a nonsingular matrix, all of these partial derivatives must vanish, and hence the function is constant. On the other hand, it is not necessarily true that the divergence of a vector field whose local coordinates are constant is 0. As the formula for the divergence just given makes clear, this will be the case at points where the contracted Christoffel symbols $\Gamma^k_{ik}$ vanish (in particular, where all the Christoffel symbols vanish). But in general a vector field that has constant coefficients on a curved manifold (or even in curvilinear coordinates on a flat one) may represent a nonzero flow of matter through a closed surface; for example, in polar coordinates on the plane the field $u = \partial/\partial r$ has divergence $1/r$. (See Example 6.9 below for the Newtonian interpretation of a vanishing divergence.)
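Remark 6.11. Abusing the tilde-symbol once again, we note that with each vector field $u$ we can associate a tensor of type (1, 1), which we denote $\tilde u$ and define as follows, for all vectors $v = v^j\,\partial/\partial x^j$ and covectors $\upsilon = \upsilon_k\,dx^k$:
$$\tilde u(v, \upsilon) = \upsilon(\nabla_v u) = \upsilon_k v^j\Bigl(\frac{\partial u^k}{\partial x^j} + u^i\,\Gamma^k_{ij}\Bigr).$$
In the standard bases of the tangent and cotangent spaces, the coordinates of $\tilde u$ are
$$\tilde u^k_j = \frac{\partial u^k}{\partial x^j} + u^i\,\Gamma^k_{ij},$$
and
$$\operatorname{div} u = \tilde u^k_k.$$
Thus, the divergence of $u$ is the contraction of the corresponding tensor $\tilde u$, and consequently also a tensor. It follows that the Laplace–Beltrami operator, $\nabla^2 f = \operatorname{div}\operatorname{grad} f$, is a tensor.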
Remark 6.12. We have made constant use of the assumption that the matrix $g = (g_{ij})$ is positive-definite, since we needed the square root of its determinant to produce the Laplace–Beltrami operator. The metric of relativistic space-time, however, is not positive-definite, and its determinant is negative. In his 1916 paper on general relativity, Einstein simply replaced the determinant by its negative before taking the square root. From our point of view, that was justifiable, since the imaginary quantity $\sqrt{-1}$ cancels out of the final expression anyway. Now that we have the representation in Eq. (6.13) for this operator, we can dispense with that assumption. An important example is the metric of flat space-time, which is not positive-definite, and for which (the coefficients $g^{il}$ being constant)
$$\nabla^2 f = \frac{\partial}{\partial x^l}\Bigl(g^{il}\frac{\partial f}{\partial x^i}\Bigr) = g^{il}\frac{\partial^2 f}{\partial x^i\,\partial x^l}. \tag{6.14}$$
Thus, on four-dimensional space-time, where
⎛ ⎞
1 0 0 0
⎜0 −1/c2 0 0 ⎟
g=⎜ ⎝0
⎟,
0 −1/c 2
0 ⎠
0 0 0 −1/c2
we have
∂2f 2 ∂2f ∂2f
2 ∂ f
∇2 f = − c + + .
∂t2 ∂x2 ∂y 2 ∂z 2
The Laplace–Beltrami operator for this case is called the d’Alembertian, after Jean Le Rond d’Alembert (1717–1783), and is usually denoted $\Box$. The differential equation $\Box u = 0$ is the classical wave equation.¹⁹ As a consequence of the Maxwell equations (Chapter 3), each component of an electric or magnetic field in free space satisfies this equation. No wonder, then, that these equations have an intimate relation with the metric of space-time.
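A quick numerical check of Eq. (6.14) for this metric is sketched below. It uses $g^{-1} = \mathrm{diag}(1, -c^2, -c^2, -c^2)$ in coordinates (t, x, y, z) and approximates the second partials by central finite differences. It applies $g^{il}\,\partial^2 f/\partial x^i\partial x^l$ to the plane wave $f = \sin(x - ct)$, which should give zero up to discretization error. A wave moving at the wrong speed should not. The value c = 2, the evaluation point, and the step size are arbitrary illustrative choices, not taken from the text.

```python
import math

C = 2.0  # wave speed c (arbitrary units, chosen for illustration)

# Diagonal of the inverse metric g^{ii} for g = diag(1, -1/c^2, -1/c^2, -1/c^2), coordinates (t, x, y, z).
G_INV_DIAG = [1.0, -C**2, -C**2, -C**2]


def plane_wave(t, x, y, z):
    """A wave of fixed profile moving in the +x direction with speed C."""
    return math.sin(x - C * t)


def second_partial(func, point, i, h=1e-3):
    """Central-difference approximation of d^2 func / (dx^i)^2 at point."""
    up = list(point)
    down = list(point)
    up[i] += h
    down[i] -= h
    return (func(*up) - 2.0 * func(*point) + func(*down)) / h**2


def dalembertian(func, point):
    """Eq. (6.14) for a constant diagonal metric: sum over i of g^{ii} d^2 f/(dx^i)^2."""
    return sum(G_INV_DIAG[i] * second_partial(func, point, i) for i in range(4))


p = (0.3, 1.1, -0.4, 2.0)
print(dalembertian(plane_wave, p))  # approximately 0 (finite-difference error only)
print(dalembertian(lambda t, x, y, z: math.sin(x - 2 * C * t), p))  # wrong speed: clearly nonzero
```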
19 One of the early definitive studies of the one-dimensional wave equation was made by d’Alembert, who derived this equation independently of earlier work by Brook Taylor (1685–1731).
280 6. CONCEPTS OF CURVATURE, 1850–1950
(The summation sign is included here because the repeated index of summation j
occurs only as a subscript.) Problem 4.12 shows that div gij is the zero one-form.
6.6. The Hessian. We can now explain what is meant by the Hessian of a
function on a manifold.
Definition 6.7. Let $f(x^1, \dots, x^n)$ be a $C^\infty$ function on a manifold with metric coefficients $g_{ij}$ and inverse metric coefficients $g^{ij}$. The Hessian $H_f$ is the tensor of type (0, 2) given by
$$H_f(u, v) = \nabla^2_{u,v}f.$$
Notice that it is the second covariant derivative $\nabla^2_{u,v}$ that appears here, not the Laplace–Beltrami operator $\nabla^2$. In expanded form, we have
$$H_f(u, v) = \left(\frac{\partial^2 f}{\partial x^i\,\partial x^j} - \Gamma^k_{ij}\frac{\partial f}{\partial x^k}\right)u^iv^j.$$
Notice also that this expression differs from the ordinary Hessian previously defined for a function on the parameter space (a subset of $\mathbb R^n$) by the presence of the subtracted terms
$$\Gamma^k_{ij}\frac{\partial f}{\partial x^k}u^iv^j.$$
On the Euclidean space $\mathbb R^n$ with Cartesian coordinates, the Christoffel symbols are identically zero, and therefore the Hessian as just defined coincides with the one we defined earlier. For
more connections, see Problem 6.14 below.
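The subtracted Christoffel terms can be seen at work where they do not vanish. The Python sketch below uses polar coordinates (r, θ) on the plane, whose only nonzero Christoffel symbols are $\Gamma^r_{\theta\theta} = -r$ and $\Gamma^\theta_{r\theta} = \Gamma^\theta_{\theta r} = 1/r$. Partial derivatives are approximated by central differences, which is an illustrative choice and not part of the text.

- The covariant Hessian of the linear function $f = x = r\cos\theta$ comes out as zero, even though its ordinary Hessian in (r, θ) does not.
- Contracting the Hessian of $f = r^2$ with $g^{ij}$, as in Theorem 6.11 below, gives 4, the ordinary Laplacian of $x^2 + y^2$.

The evaluation point is arbitrary.

```python
import math


def christoffel(r, theta):
    """Return G with G[k][i][j] = Gamma^k_{ij} in polar coordinates (index 0 = r, 1 = theta)."""
    G = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    G[0][1][1] = -r                      # Gamma^r_{theta theta}
    G[1][0][1] = G[1][1][0] = 1.0 / r    # Gamma^theta_{r theta} = Gamma^theta_{theta r}
    return G


def shifted(p, i, d):
    q = list(p)
    q[i] += d
    return q


def derivatives(f, p, h=1e-4):
    """Central differences: first partials and the matrix of second partials of f at p."""
    grad = [(f(*shifted(p, i, h)) - f(*shifted(p, i, -h))) / (2 * h) for i in range(2)]
    hess = [[(f(*shifted(shifted(p, i, h), j, h)) - f(*shifted(shifted(p, i, h), j, -h))
              - f(*shifted(shifted(p, i, -h), j, h)) + f(*shifted(shifted(p, i, -h), j, -h)))
             / (4 * h * h) for j in range(2)] for i in range(2)]
    return grad, hess


def covariant_hessian(f, p):
    """(H_f)_{ij} = d^2 f / dx^i dx^j - Gamma^k_{ij} df/dx^k, evaluated at p = (r, theta)."""
    grad, hess = derivatives(f, p)
    G = christoffel(*p)
    return [[hess[i][j] - sum(G[k][i][j] * grad[k] for k in range(2)) for j in range(2)]
            for i in range(2)]


p = (1.5, 0.7)
H_lin = covariant_hessian(lambda r, th: r * math.cos(th), p)  # f = x
H_sq = covariant_hessian(lambda r, th: r * r, p)              # f = x^2 + y^2
trace = H_sq[0][0] + H_sq[1][1] / p[0] ** 2                    # g^{ij}(H_f)_{ij}, g^{-1} = diag(1, 1/r^2)
print(H_lin, trace)
```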
Comparison of the expression just given for $H_f(u, v)$ with Eq. (6.13) strongly suggests that there is a connection between the Hessian and the Laplace–Beltrami operator. And indeed there is. If we raise one index on the Hessian to get a mixed tensor $(H_f)^j_i = g^{jk}(H_f)_{ki}$, then contract, we get precisely the Laplace–Beltrami operator:
Theorem 6.11.
$$\nabla^2 f = g^{ij}(H_f)_{ij} = g^{ij}\frac{\partial^2 f}{\partial x^i\,\partial x^j} - g^{ij}\Gamma^k_{ij}\frac{\partial f}{\partial x^k}. \tag{6.15}$$
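Proof. Given Eq. (6.13), we see that we need to establish the relation
$$g^{ij}\Gamma^k_{ij} = -\frac{\partial g^{km}}{\partial x^m} - \frac{g^{km}}{2\det(g_{pq})}\frac{\partial\det(g_{pq})}{\partial x^m} = -\operatorname{div} g^k,$$
where $g^k$ is the kth row (or column) of the matrix inverse to the matrix of metric coefficients.
Starting from the definition of the Christoffel symbol $\Gamma^k_{ij}$, we find
$$g^{ij}\Gamma^k_{ij} = \frac12 g^{ij}g^{km}\left(\frac{\partial g_{im}}{\partial x^j} + \frac{\partial g_{jm}}{\partial x^i} - \frac{\partial g_{ij}}{\partial x^m}\right).$$
Since $g^{km}g_{im} = \delta^k_i$, it follows that
$$\frac{\partial(g^{km}g_{im})}{\partial x^j} = 0,$$
and therefore
$$g^{km}\frac{\partial g_{im}}{\partial x^j} = -\frac{\partial g^{km}}{\partial x^j}g_{im}.$$
Likewise,
$$g^{km}\frac{\partial g_{jm}}{\partial x^i} = -\frac{\partial g^{km}}{\partial x^i}g_{jm}.$$
Therefore,
$$\frac12 g^{ij}g^{km}\left(\frac{\partial g_{im}}{\partial x^j} + \frac{\partial g_{jm}}{\partial x^i}\right) = -\frac12\left(g^{ij}\frac{\partial g^{km}}{\partial x^j}g_{im} + g^{ij}\frac{\partial g^{km}}{\partial x^i}g_{jm}\right) = -\frac12\left(\frac{\partial g^{km}}{\partial x^m} + \frac{\partial g^{km}}{\partial x^m}\right) = -\frac{\partial g^{km}}{\partial x^m}.$$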
where the ordered (n − 1)-tuple $(j_1, \dots, j_{i-1}, j_{i+1}, \dots, j_n)$ ranges over all (n − 1)! permutations of $(1, 2, \dots, j-1, j+1, \dots, n)$.
Now $\det(g_{pq})$ has the expansion
$$\det(g_{pq}) = \sum\varepsilon(j_1, \dots, j_n)\,g_{1j_1}\cdots g_{nj_n},$$
where $(j_1, \dots, j_n)$ ranges over all n! permutations of $(1, \dots, n)$. Thus, the coefficient of $g_{ij}$ in this determinant is
$$\sum_{j_1, \dots, j_{i-1}, j_{i+1}, \dots, j_n}\varepsilon(j_1, \dots, j_{i-1}, j, j_{i+1}, \dots, j_n)\,g_{1j_1}\cdots g_{i-1\,j_{i-1}}\,g_{i+1\,j_{i+1}}\cdots g_{nj_n},$$
symbols are zero at that point. Despite the restricted nature of its validity, it
does provide us with a connection between the Laplace–Beltrami operator and the
Riemann curvature tensor.
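Corollary 6.4. In normal coordinates $(u^1, \dots, u^n)$, the following relation holds at the origin $(0, \dots, 0)$:
$$\nabla^2 g_{pq} = g^{mr}\frac{\partial^2 g_{pq}}{\partial u^m\,\partial u^r} = -\frac23\left(\frac{\partial\Gamma^r_{pq}}{\partial u^r} - \frac{\partial\Gamma^r_{rp}}{\partial u^q}\right) = -\frac23 R^r_{prq}, \tag{6.17}$$
where $R^i_{jkl}$ is the coordinate of the Riemann curvature tensor at that point.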
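Proof. Since the Christoffel symbols vanish at $(0, \dots, 0)$, we have
$$R^r_{prq} = \frac{\partial\Gamma^r_{pq}}{\partial u^r} - \frac{\partial\Gamma^r_{rp}}{\partial u^q}.$$
Applying relation (6.10) to the second term in this expression, taking account of the symmetry of the Christoffel symbols, we find
$$R^r_{prq} = 2\frac{\partial\Gamma^r_{pq}}{\partial u^r} + \frac{\partial\Gamma^r_{rq}}{\partial u^p}.$$
Adding these two expressions and dividing the sum by 2, we find
$$R^r_{prq} = \frac32\frac{\partial\Gamma^r_{pq}}{\partial u^r} + \frac12\left(\frac{\partial\Gamma^r_{rq}}{\partial u^p} - \frac{\partial\Gamma^r_{rp}}{\partial u^q}\right).$$
By writing out the definition of the Christoffel symbols and differentiating, bearing in mind that the partial derivatives of the metric coefficients vanish at $(0, \dots, 0)$, we find that the second term here vanishes. (This is also a consequence of the fact that $R^r_{prq}$ is symmetric in p and q. That fact, however, is not obvious.) Indeed we have
$$\frac{\partial\Gamma^r_{rq}}{\partial u^p} = \frac12 g^{rm}\left(\frac{\partial^2 g_{rm}}{\partial u^p\,\partial u^q} + \frac{\partial^2 g_{qm}}{\partial u^r\,\partial u^p} - \frac{\partial^2 g_{rq}}{\partial u^p\,\partial u^m}\right).$$
Since r and m are both merely dummy indices of summation here, they can be reversed in the last of these terms, and then the last two terms cancel each other. Thus we have
$$\frac{\partial\Gamma^r_{rq}}{\partial u^p} = \frac12 g^{rm}\frac{\partial^2 g_{rm}}{\partial u^p\,\partial u^q}.$$
Reversing p and q and subtracting demonstrates that
$$\frac{\partial\Gamma^r_{rq}}{\partial u^p} - \frac{\partial\Gamma^r_{rp}}{\partial u^q} = \frac12 g^{rm}\left(\frac{\partial^2 g_{rm}}{\partial u^p\,\partial u^q} - \frac{\partial^2 g_{rm}}{\partial u^q\,\partial u^p}\right) = 0.$$
Thus, we find
$$R^r_{prq} = \frac32\frac{\partial\Gamma^r_{pq}}{\partial u^r}.$$
At the same time, we find that
$$\frac{\partial\Gamma^r_{pq}}{\partial u^r} = \frac12 g^{rm}\left(\frac{\partial^2 g_{pm}}{\partial u^r\,\partial u^q} + \frac{\partial^2 g_{qm}}{\partial u^r\,\partial u^p} - \frac{\partial^2 g_{pq}}{\partial u^r\,\partial u^m}\right).$$
By Corollary 6.1, this equality yields
$$\frac{\partial\Gamma^r_{pq}}{\partial u^r} = -g^{rm}\frac{\partial^2 g_{pq}}{\partial u^r\,\partial u^m} = -\nabla^2 g_{pq}.$$
The corollary is now proved.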
$$\nabla^2 = g^{ij}\nabla^2_{\partial/\partial x^i,\,\partial/\partial x^j}.$$
infinitesimal area dA in time dt will be −a∇u · dA dt. Thus, for the closed surface
S = ∂B that is the boundary of the region B, the rate at which heat is flowing
across S and into B is
$$Q_B(t) = \oint_S a\nabla u\cdot d\mathbf A.$$
By the divergence theorem, we thus have
$$\int_B a\nabla^2 u\,dV = Q_B(t) = \int_B \frac{\partial u}{\partial t}\,dV.$$
Since the region B is arbitrary, the two integrands must be equal, and we thus get the classical linearized heat equation
$$\frac{\partial u}{\partial t} = a\nabla^2 u.$$
In particular, when the temperature reaches a steady state, so that its time
derivative is zero, there is no flow of heat, and the temperature satisfies Laplace’s
equation ∇2 u = 0. That means that a steady-state temperature is a harmonic
function. A harmonic function represents a physical quantity that has no tendency
to flow across any closed surface. That is, the flux of its gradient through any closed surface is zero, being equal to the integral of its Laplacian over the region enclosed
by the surface. Whatever flows out of the surface at one place is counterbalanced
by an inflow at some other point.
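We can watch a steady-state temperature become harmonic in a minimal sketch. It assumes a one-dimensional rod on [0, 1] with a = 1, ends held at temperatures 0 and 1, and a simple explicit finite-difference scheme; none of these choices come from the text. The code integrates the heat equation until the time derivative dies out. In one dimension Laplace's equation is $u'' = 0$, so the limiting profile should be the straight line u = x.

```python
def heat_relax(n=21, a=1.0, steps=5000):
    """Explicit finite differences for du/dt = a d^2u/dx^2 on [0, 1] with u(0) = 0, u(1) = 1."""
    dx = 1.0 / (n - 1)
    dt = 0.4 * dx * dx / a          # the explicit scheme is stable when a*dt/dx^2 <= 1/2
    u = [0.0] * n                   # start cold everywhere ...
    u[-1] = 1.0                     # ... except the right end, held at temperature 1
    for _ in range(steps):
        lap = [(u[i - 1] - 2 * u[i] + u[i + 1]) / dx**2 for i in range(1, n - 1)]
        for i in range(1, n - 1):
            u[i] += a * dt * lap[i - 1]
    return u


u = heat_relax()
dx = 1.0 / (len(u) - 1)
print(max(abs(u[i] - i * dx) for i in range(len(u))))  # distance from the straight line u = x
```

The explicit scheme was chosen only for transparency. Its time step must be small, which is why many steps are needed.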
Example 6.8. Most closely related to the historical roots of relativity is the
wave equation, mentioned above, which is succinctly stated as “the d’Alembertian
vanishes.” In classical terms, it says
$$\frac{\partial^2 u}{\partial t^2} = c^2\nabla^2 u,$$
where c is the speed with which the wave propagates. For a wave in a pair of interacting electric and magnetic fields, that speed is $1/\sqrt{\varepsilon\mu}$, where ε is the dielectric permittivity of the medium and μ its magnetic permeability. It was Maxwell’s startling discovery in 1861 that in free space this speed equals the speed of light, and that discovery led to the conclusion that light is an electromagnetic wave.
In this case, the Laplace equation ∇2 z = 0, which is the equation satisfied by
a harmonic function, gives the shape of a standing wave.
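The numerical coincidence behind Maxwell's discovery is easy to reproduce. The following Python lines evaluate $1/\sqrt{\varepsilon_0\mu_0}$ using the CODATA 2018 values of the vacuum permittivity and permeability, which are constants we supply rather than values taken from the text. The result agrees with the defined speed of light to within the precision of those constants.

```python
import math

EPSILON_0 = 8.8541878128e-12  # vacuum permittivity in F/m (CODATA 2018)
MU_0 = 1.25663706212e-6       # vacuum permeability in N/A^2 (CODATA 2018)
SPEED_OF_LIGHT = 299_792_458  # m/s, exact by the definition of the metre

wave_speed = 1.0 / math.sqrt(EPSILON_0 * MU_0)
print(f"1/sqrt(eps0*mu0) = {wave_speed:,.3f} m/s vs c = {SPEED_OF_LIGHT:,} m/s")
```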
Example 6.9. Consider a fluid flowing through a region of space such that at
time t the particle (molecule) at the point x = (x, y, z) has velocity u(t; x, y, z). If
the fluid has density ρ(t; x, y, z) at time t at point (x, y, z), the net change dm in the amount of fluid contained in a region B bounded by a surface Σ = ∂B during a brief time interval dt can be represented as
$$dm = -\oint_\Sigma \rho u\cdot d\mathbf A\,dt.$$
(In portions of the surface where fluid is flowing into the region B, the dot product $u\cdot d\mathbf A$ is negative.) By the divergence theorem, the mass of fluid m in the region satisfies
$$\frac{dm}{dt} = -\int_B \operatorname{div}(\rho u)\,dV.$$
7. CURVATURE, PHASE 4: RICCI 287
When det(g) is expanded, one of the terms is the product of the diagonal elements. Up to second order, that product is
$$1 - \frac12\sum_{r=1}^{n}\Delta^r_{rpq}u^pu^q.$$
All the other terms in the expansion of the determinant have at least two factors that are off-diagonal elements and are therefore less than some absolute constant times $\big((u^1)^2 + \cdots + (u^n)^2\big)^2$. Thus the entire expansion of the determinant up to second order is just the preceding expression. Since
$$\Delta^r_{rpq} = \frac{\partial\Gamma^r_{pq}}{\partial u^r} = \frac23 R^r_{prq},$$
it follows that
$$\det(g) = 1 - \frac13 R^r_{prq}u^pu^q + E,$$
where the error term E is of the order $\big((u^1)^2 + \cdots + (u^n)^2\big)^{3/2}$. Since $\sqrt{1 - x} = 1 - \frac12 x + F$, where F is of the order $x^2$, we see finally that the volume element at points near P is
$$dV = \sqrt{\det(g)}\,du^1\cdots du^n \approx \left(1 - \frac16 R^r_{prq}u^pu^q\right)du^1\cdots du^n.$$
We can now see that the expression $R^r_{prq}u^pu^q$ measures the departure of n-dimensional volume on the manifold from its Euclidean value in the parameter
space. If it is constantly zero, then volume remains Euclidean (up to third order).
When we reflect that gravitational forces distort objects and change their volume,
we can dimly see a possible connection between this expression and gravitation. We
introduced this object in Chapter 4 under the name Ricci tensor and set it equal
to zero in order to define a metric on space-time in which Einstein’s law of gravity
is the simple statement that the world-line of a particle is a geodesic. We have now
begun to put some foundation under what we did, and so it is time to develop the
properties of this tensor.
7.1. The Bianchi identity. As background for what we are about to do, we
need to prove the following lemma, known as the first Bianchi identity:²⁰
Lemma 6.3. The Riemann curvature tensor R(u, v)w satisfies the relation
R(u, v)w + R(w, u)v + R(v, w)u = 0 .
Proof. The very form of this relation suggests that it must follow from the
Jacobi identity, and indeed it does. With each tangent vector u, we associate an
operator L(u) on the tangent space called the Lie derivative with respect to u
and defined by the relation $L(u)v = [u, v] = \nabla_u v - \nabla_v u$. It is obvious from the definition that $L(x)y = -L(y)x$. The Jacobi identity implies that
$$L(u)\big(L(v)w\big) + L(w)\big(L(u)v\big) + L(v)\big(L(w)u\big) = 0.$$
If we use the relation $L(x)y = -L(y)x$, we can rewrite this relation as
$$0 = L(u)\big(L(v)w\big) - L\big(L(u)v\big)w - L(v)\big(L(u)w\big).$$
20 Named after Luigi Bianchi (1856–1928), although it had been noticed two decades earlier
by Ricci.
If we write out each of the terms on the right using the formulas $L(x)y = \nabla_x y - \nabla_y x$ and $R(u, v)w = \nabla_{u,v}w - \nabla_{v,u}w$, this relation becomes (when terms are rearranged) precisely the statement of the lemma.
7.2. The Ricci tensor. We begin by giving a formal definition of the Ricci
tensor on a manifold of dimension n.
Definition 6.8. The Ricci tensor Ric(u, v) is the tensor of type (0, 2) obtained from the Riemann curvature tensor whose coordinates are $R^i_{jkl}$ by contracting on the indices i and k, that is, by setting k = i and summing on this repeated index.
As a bilinear functional, in a basis of the tangent space $\{u_1, \dots, u_n\}$ in which $v = v^ju_j$ and $z = z^lu_l$,
$$\mathrm{Ric}(v, z) = R^i_{jil}v^jz^l = \left(\frac{\partial\Gamma^i_{lj}}{\partial x^i} - \frac{\partial\Gamma^i_{ij}}{\partial x^l} + \Gamma^i_{im}\Gamma^m_{lj} - \Gamma^i_{lm}\Gamma^m_{ij}\right)v^jz^l.$$
Older texts, such as that of Eddington [16], define the Ricci tensor by con-
tracting on the indices i and l rather than i and k, thereby obtaining the negative
of what we are calling the Ricci tensor. Given that we mostly set it equal to zero,
the difference in sign is not important. Eddington, Einstein, and others used the
letter G to denote this tensor, a usage that conflicts with the notation for the uni-
versal gravitational constant. Modern presentations tend to use the letter R for
this tensor, and that in turn conflicts with the notation for the Riemann curvature
tensor R(u, v)z. The notation we are using here is not standard, but at least has
the virtue of suggesting exactly what it denotes.
Phrasing this definition another way, we can say that if $\{u_1, \dots, u_n\}$ is a basis of the tangent space and $\{\upsilon^1, \dots, \upsilon^n\}$ the dual basis of the cotangent space, then
$$\mathrm{Ric}(v, z) = \sum_{i=1}^{n}\upsilon^i\big(R(u_i, z)v\big).$$
If v and z are fixed vectors, the mapping $w \mapsto R(w, z)v$ is a linear operator $T_{v,z}$ on the tangent space. In those terms Ric(v, z) is the trace $\operatorname{Tr} T_{v,z}$ of the linear operator $T_{v,z}$. The reader can easily verify this fact, since in the standard basis of the tangent space $\{\partial/\partial x^1, \dots, \partial/\partial x^n\}$ we have
$$T_{v,z}(u) = R^i_{jkl}v^ju^kz^l\frac{\partial}{\partial x^i},$$
so that the matrix of $T_{v,z}$ in this standard basis is
$$T = \begin{pmatrix} t^1_1 & \cdots & t^1_n\\ \vdots & \ddots & \vdots\\ t^n_1 & \cdots & t^n_n\end{pmatrix},$$
where
$$t^i_k = R^i_{jkl}v^jz^l,$$
and so
$$\mathrm{Ric}(v, z) = \sum_{i=1}^{n}t^i_i = \operatorname{Tr} T_{v,z}.$$
Since the linear operator $T_{v,z}$ is a tensor (transforms correctly under a change of coordinates), and the trace of a linear operator T is the same in any and all bases (it is the negative of the coefficient of $\lambda^{n-1}$ in the characteristic polynomial of T), we see that Ric(u, v) is a tensor. It also has the important property of symmetry, as we shall now prove.
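The basis-independence of the trace used in this argument can be confirmed on an example. A change of basis with matrix P replaces the matrix T of a linear operator by $P^{-1}TP$. The Python sketch below uses exact rational arithmetic (`fractions.Fraction`) to show that the trace is unchanged. The particular matrices are arbitrary stand-ins, not curvature data.

```python
from fractions import Fraction


def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


def inverse(M):
    """Invert a nonsingular square matrix by Gauss-Jordan elimination over the rationals."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [x / lead for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


T = [[2, 1, 0], [0, 3, 4], [5, 0, 1]]   # matrix of an operator in the old basis
P = [[1, 2, 0], [0, 1, 1], [1, 0, 1]]   # columns are the new basis vectors (det P = 3)
T_new = matmul(inverse(P), matmul(T, P))
print(trace(T), trace(T_new))  # both 6
```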
Remark 6.14. It may be useful to say a word about the operation of contraction in general. If we are given a tensor of type (k, l), say a multilinear functional
$$T(\upsilon^1, \dots, \upsilon^k, u_1, \dots, u_l)$$
mapping k covectors $\upsilon^1, \dots, \upsilon^k$ and l vectors $u_1, \dots, u_l$ into the real numbers, where both k and l are positive, then holding $\upsilon^i$ fixed for $i \neq a$ and $u_j$ fixed for $j \neq b$ produces a bilinear functional $(\upsilon^a, u_b) \mapsto T(\upsilon^1, \dots, \upsilon^k, u_1, \dots, u_l)$. We can think of this bilinear functional as a transformation of $u_b$ into a vector $L(u_b)$ whose action on each covector $\upsilon^a$ is given by $L(u_b)(\upsilon^a) = T(\upsilon^1, \dots, \upsilon^k, u_1, \dots, u_l)$.
Contraction on the indices a and b produces a tensor of type (k − 1, l − 1), which
is the trace of the linear operator L. In contrast to contraction, the operation of
lowering an index produces a tensor of type (k − 1, l + 1). In the present case,
starting from the Riemann tensor, which is of type (1, 3), contraction produces
the Ricci tensor of type (0, 2). When we lowered the superscript in the Riemann
curvature tensor, we got the covariant Riemann curvature tensor, which is of type
(0, 4).
7.3. Ricci curvature. At the beginning of this section, we showed that the element of volume dV near a point P of a manifold M is given at a point Q having normal coordinates $(u^1, \dots, u^n)$ based at P (that is, the coordinates of P are $(0, \dots, 0)$) by the formula
$$dV = du^1\cdots du^n - \frac16\,\mathrm{Ric}(u, u)\,du^1\cdots du^n + E, \tag{6.18}$$
where E is smaller than some absolute constant times $\big((u^1)^2 + \cdots + (u^n)^2\big)^{3/2}$. That motivates the following definition.
Definition 6.9. The Ricci curvature of M at P in the direction of a tangent
vector u, denoted Ricc (u), is Ric (u, u).
We have at last managed to link this long, winding journey through the concept of curvature to the discussion of planetary orbits in Chapter 4. Equation (6.18) shows that, at the point P itself, an infinitesimal n-dimensional piece of the tangent space having Euclidean volume dV maps to a piece of the manifold M having the same infinitesimal volume. If Ricc(u) is positive at P, the ratio of the volume of the image of such an infinitesimal piece to the volume of its preimage in the parameter space will decrease in the direction of u. Conversely, if the Ricci curvature in that direction is negative, that ratio will increase in the direction of u. When the Ricci curvature is zero, as is the case for the metric chosen for empty space-time in Chapter 4, volume will be stable (unchanged to second order) in every direction.
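Consider a sphere of radius $r_0$ as a concrete instance of Eq. (6.18). There the Ricci tensor is $g/r_0^2$, so at a point at geodesic distance r from P, $\mathrm{Ric}(u, u) = r^2/r_0^2$. Problem 6.10 below gives the exact area density in normal coordinates, $(r_0/r)\sin(r/r_0)$. The Python sketch compares it with the second-order prediction $1 - r^2/(6r_0^2)$. The radius $r_0 = 2$ and the sample distances are arbitrary. The density falls below 1 as r grows, as positive Ricci curvature predicts. The discrepancy shrinks roughly like $r^4$, well within the error allowed in (6.18).

```python
import math

R0 = 2.0  # sphere radius (illustrative choice)


def exact_density(r):
    """Area density sqrt(g11 g22 - g12^2) at geodesic distance r (Problem 6.10)."""
    return (R0 / r) * math.sin(r / R0)


def ricci_prediction(r):
    """1 - Ric(u, u)/6 from Eq. (6.18), with Ric(u, u) = r^2 / R0^2 on this sphere."""
    return 1.0 - r * r / (6.0 * R0 * R0)


errors = {}
for r in (0.4, 0.2, 0.1):
    errors[r] = exact_density(r) - ricci_prediction(r)
    print(f"r = {r:.1f}: exact {exact_density(r):.10f}, predicted {ricci_prediction(r):.10f}")

# Halving r should divide the discrepancy by roughly 2^4 = 16.
print(errors[0.4] / errors[0.2], errors[0.2] / errors[0.1])
```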
For convenience, we use the symbol $\mathbb R^{n-1}_{v^\perp}$ to denote the (n − 1)-dimensional subspace of the tangent space at a point consisting of the vectors u for which $\langle u, v\rangle = 0$.
In Theorems 6.14 and 6.15 below, as well as Definition 6.10 and Lemma 6.4, we
assume a positive-definite matrix of metric coefficients. (The extension of these
results to the general case is straightforward, but the general case of Theorem 6.15
requires the concept of a signed measure, and the extra space required to explain
that is not justified by the additional understanding to be gained.)
Theorem 6.14. Let $\{u_1, \dots, u_{n-1}\}$ be an orthonormal basis of the subspace $\mathbb R^{n-1}_{v^\perp}$ of the tangent space. Then the Ricci curvature Ricc(v) in the direction of a unit vector v is the sum of the sectional curvatures of the geodesic submanifolds tangent to the planes spanned by v and $u_i$, i = 1, 2, . . . , n − 1. That is,
$$\mathrm{Ricc}(v) = \sum_{i=1}^{n-1}\kappa(u_i, v).$$
Proof. For any orthonormal basis $\{u_1, \dots, u_n\}$ of the tangent space, we know that
$$\mathrm{Ric}(v, v) = \sum_{i=1}^{n}\langle u_i, R(u_i, v)v\rangle = \sum_{i=1}^{n}\kappa(u_i, v)\big(\langle v, v\rangle\langle u_i, u_i\rangle - \langle v, u_i\rangle^2\big).$$
By choosing the basis so that $u_n = v$, we find that the last term of this sum is zero and the others equal $\kappa(u_i, v)$. Thus, as asserted,
$$\mathrm{Ricc}(v) = \sum_{i=1}^{n-1}\kappa(u_i, v).$$
It follows immediately from this result that the Ricci curvature of a two-
dimensional surface, in any direction whatever, is its Gaussian curvature.
Since the right-hand side of this last equation is the same for all orthonormal bases of $\mathbb R^{n-1}_{v^\perp}$, we make the following definition:
Definition 6.10. The average sectional curvature at P in the direction of v is the number χ(v) given by
$$\chi(v) = \frac{1}{n-1}\sum_{i=1}^{n-1}\kappa(u_i, v).$$
The definition of this measure is straightforward: $\sigma_{n-1}(E) = n\,\mu_n(\widetilde E)$, where $\mu_n$ is n-dimensional volume (Lebesgue measure) on $\mathbb R^n$, and for any subset $E \subseteq S^{n-1}$, $\widetilde E$ is the “cone” whose base is E and whose apex is at the origin, that is, the set
$$\widetilde E = \{tx : x \in E,\ 0 < t \le 1\}.$$
The polar coordinate formula is proved by direct computation if the function
f (x) is the characteristic function of a set of the form
(a, b] × E = {tξ : ξ ∈ E ⊆ Sn−1 , a < t ≤ b},
a computation that requires only the fact that the mapping x → tx multiplies
n-dimensional volumes by tn . The formula is then extended to general integrable
functions on Rn by the fact that linear combinations of such characteristic functions
are dense in the space of integrable functions. Since we need the result only for
continuous functions, we can rely on the fact that any continuous function of com-
pact support on Rn can be uniformly approximated by a finite linear combination
of such characteristic functions.
The formula
$$\int_0^\infty e^{-x^2}\,dx = \frac12\sqrt\pi$$
Taking μ = ν = 1/2 and noting that Γ(1) = 1 (a trivial computation), we see that $\Gamma(1/2) = \sqrt\pi$. We remark that the number $\omega_{n-1}$ never involves any square root of π. When n is odd, the denominator Γ(n/2) is a rational multiple of Γ(1/2), which cancels that square root from the numerator. Thus, for example, $\Gamma(5/2) = (3/2)\Gamma(3/2) = (3/4)\Gamma(1/2) = 3\sqrt\pi/4$, and so $\omega_4 = 2\pi^{5/2}/(3\sqrt\pi/4) = 8\pi^2/3$.
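The computation of $\omega_4$ just given uses the closed form $\omega_{n-1} = 2\pi^{n/2}/\Gamma(n/2)$ for the area of the unit sphere $S^{n-1}$. That form can be checked with Python's standard `math.gamma`. The sketch below confirms $\Gamma(1/2)^2 = \pi$, $\omega_1 = 2\pi$, $\omega_2 = 4\pi$, and $\omega_4 = 8\pi^2/3$ in floating point.

```python
import math


def omega(k):
    """Area of the unit sphere S^k in R^(k+1): omega_k = 2 pi^((k+1)/2) / Gamma((k+1)/2)."""
    n = k + 1
    return 2.0 * math.pi ** (n / 2) / math.gamma(n / 2)


print(math.gamma(0.5) ** 2, math.pi)          # Gamma(1/2) = sqrt(pi)
print(omega(1), 2 * math.pi)                  # circumference of the unit circle
print(omega(2), 4 * math.pi)                  # area of the unit 2-sphere
print(omega(4), 8 * math.pi ** 2 / 3)         # the example in the text
```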
Polar coordinates are theoretically applicable to any n-fold integral. In practice
they are used most often for “radial” functions that depend only on the distance
from the origin. About the only other common use arises when the function being integrated depends on only one of the n variables. In that
case, the following formula is useful.
Lemma 6.5. For a “zonal” function f(x) of the form f(x) = ψ(x · y), where y is a fixed vector and ψ a function of a real variable, the integral over $S^{n-1}$ is
$$\int_{S^{n-1}} f(\xi)\,d\sigma_{n-1}(\xi) = \omega_{n-2}\int_{-\pi/2}^{\pi/2}\psi(|y|\sin\theta)\cos^{n-2}\theta\,d\theta.$$
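Proof. There is no loss in generality in taking y = (0, 0, . . . , 1), since the measure is rotation-invariant and we can replace f(x) with f(|y|x) if we wish. Then f(x) is a function of $x^n$ only. By definition,
$$\int_{S^{n-1}} f(\xi)\,d\sigma_{n-1}(\xi) = n\int_{B^n}\tilde f(x)\,dx,$$
where $B^n$ is the unit ball with the origin removed, that is, $\{x : 0 < |x| \le 1\}$, and
$$\tilde f(x) = f\left(\frac{x}{|x|}\right).$$
In the present case, where f is a function of $x^n$ only, we have
$$\tilde f(x) = \psi\left(\frac{x^n}{\sqrt{|z|^2 + (x^n)^2}}\right),$$
where $z = (x^1, \dots, x^{n-1}) \in \mathbb R^{n-1}$. Thus, we have
$$n\int_{B^n}\tilde f(x)\,dx = n\int_{-1}^{1}\int_{|z|^2 \le 1 - (x^n)^2}\psi\left(\frac{x^n}{\sqrt{|z|^2 + (x^n)^2}}\right)dz\,dx^n.$$
Since the integrand depends on z only through $t = |z|$, polar coordinates in $\mathbb R^{n-1}$ give $dz = \omega_{n-2}t^{n-2}\,dt$, and this last integral becomes simply an integral over half of a unit disk in the $(t, x^n)$-plane, as we see by setting $t = r\cos\theta$, $x^n = r\sin\theta$:
$$\int_{S^{n-1}} f(\xi)\,d\sigma_{n-1}(\xi) = n\,\omega_{n-2}\int_{-\pi/2}^{\pi/2}\int_0^1\psi(\sin\theta)\,r^{n-1}\cos^{n-2}\theta\,dr\,d\theta = \omega_{n-2}\int_{-\pi/2}^{\pi/2}\psi(\sin\theta)\cos^{n-2}\theta\,d\theta.$$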
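Lemma 6.5 can be tested numerically against a value known independently. Take $\psi(t) = t^2$ and $|y| = 1$. The n integrals $\int(\xi^i)^2\,d\sigma_{n-1}$ are equal by symmetry and sum to $\omega_{n-1}$, so $\int_{S^{n-1}}(\xi^n)^2\,d\sigma_{n-1} = \omega_{n-1}/n$. The Python sketch below evaluates the right-hand side of the lemma with a composite Simpson rule (our choice of quadrature, not the text's) and compares the two values. For n = 3 it reproduces the value 4π/3 of Problem 6.16.

```python
import math


def omega(k):
    """Area of the unit sphere S^k in R^(k+1)."""
    return 2.0 * math.pi ** ((k + 1) / 2) / math.gamma((k + 1) / 2)


def simpson(g, a, b, m=2000):
    """Composite Simpson rule with an even number m of subintervals."""
    h = (b - a) / m
    s = g(a) + g(b) + sum((4 if k % 2 else 2) * g(a + k * h) for k in range(1, m))
    return s * h / 3


def zonal_integral(psi, n, y_norm=1.0):
    """Right-hand side of Lemma 6.5: omega_{n-2} times the integral of psi(|y| sin t) cos^(n-2) t."""
    integrand = lambda t: psi(y_norm * math.sin(t)) * math.cos(t) ** (n - 2)
    return omega(n - 2) * simpson(integrand, -math.pi / 2, math.pi / 2)


for n in (3, 4, 5):
    print(n, zonal_integral(lambda s: s * s, n), omega(n - 1) / n)
```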
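We now apply Lemma 6.5 with ψ(t) = t² and $y = (\delta_{i1}, \dots, \delta_{i\,n-1})$, so that |y| = 1. We thus have
$$\int_{S^{n-2}_{v^\perp}}(\xi^i)^2\,d\sigma_{n-2}(\xi) = 2\omega_{n-3}\int_0^{\pi/2}\sin^2\theta\,\cos^{n-3}\theta\,d\theta = \omega_{n-3}\,\frac{\Gamma\left(\frac{n-2}{2}\right)\Gamma\left(\frac32\right)}{\Gamma\left(\frac{n+1}{2}\right)}.$$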
as asserted.
7.5. Scalar curvature. Finally, we introduce yet one more notion of curva-
ture on a manifold.
Definition 6.11. The scalar curvature R on a manifold is the contraction of the mixed Ricci tensor $\mathrm{Ric}^j_i = g^{jk}\mathrm{Ric}_{ik}$, that is,
$$R = g^{ij}\mathrm{Ric}_{ij}.$$
(Although we have made strenuous efforts to avoid the abusive use of the symbol R for a large number of different entities, the use of this letter to denote the scalar curvature is so well established that we just have to warn the reader to watch out for the context when this symbol is encountered.)
The scalar curvature is obtained by first raising an index in the Ricci tensor to get the mixed Ricci tensor $\mathrm{Ric}^j_i = g^{jk}\mathrm{Ric}_{ik}$, then contracting on i and j (taking the trace of the matrix that represents the mixed tensor $\mathrm{Ric}^j_i$).
The scalar curvature has the correct physical dimension to be a genuine curvature, namely $[d]^{-2}$, where $[d]^2$ is the dimension of the squared metric $ds^2 = g_{ij}\,dx^i\,dx^j$. The coordinates of the tensor $Rg_{ij}$ have the same physical dimensions as the corresponding coordinates of the Ricci tensor, and hence we can consider linear combinations
$$\mathrm{Ric}_{ij} + aRg_{ij},$$
where a is any real number.
8. PROBLEMS 297
Expressions of this kind are the key to the relativistic reformulation of mechanics. Notice that R vanishes if the Ricci tensor is identically zero, as we assumed for the gravitational field of a particle. (Note, however, that the vanishing of the scalar curvature does not imply that a manifold is flat.) That fact meant that this term
would not appear in the Einstein law of gravity in free space, which we discussed in
Chapter 4. The task remaining to us is to insert this term back into the field equa-
tions and explore the relation between curvature and gravity in a wider context.
That is the subject of the next chapter.
Meanwhile, we note that the succinct statement of Einstein’s formulation of the
gravitational field of a particle is that the Ricci tensor vanishes identically. As we
have just recalled, the Ricci tensor may vanish on a curved space. We should point
out, however, that the Ricci tensor cannot vanish identically on a curved surface
in R3 , since, as one can easily compute, the scalar curvature on such a surface is
exactly twice the Gaussian curvature.
With that remark, we bring to a close this long and winding path through
differential geometry. What remains—the subject matter of the next chapter—is
to see how the use of this language affects the formulation of physical laws.
8. Problems
Problem 6.1. A vector $a = (a_1, a_2, a_3)$ in $\mathbb R^3$ can be naturally associated with a skew-symmetric 3 × 3 matrix
$$A = \begin{pmatrix} 0 & a_3 & a_2\\ -a_3 & 0 & a_1\\ -a_2 & -a_1 & 0\end{pmatrix}.$$
Show that, if b = (b1 , b2 , b3 ) is associated in this way with the matrix B, then
the cross product a × b is associated with [A, B] = A B − B A. (Replacing an
associative product with its commutator, that is, replacing AB with [A, B] is a
standard way of turning an associative algebra into a Lie algebra. If the associative
algebra happens to be commutative, of course, the Lie algebra is trivial, since the
Lie products are all equal to zero.)
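The claim in Problem 6.1 can be spot-checked by machine in a few lines of Python. This checks sample vectors only and is not a proof. Here `skew` builds the matrix A displayed above from a = (a1, a2, a3), with the entries exactly as printed there.

```python
def skew(a):
    """The skew-symmetric matrix associated with a = (a1, a2, a3) in Problem 6.1."""
    a1, a2, a3 = a
    return [[0, a3, a2], [-a3, 0, a1], [-a2, -a1, 0]]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def commutator(A, B):
    AB, BA = matmul(A, B), matmul(B, A)
    return [[AB[i][j] - BA[i][j] for j in range(3)] for i in range(3)]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


a, b = (1, 2, 3), (-4, 5, 7)
print(commutator(skew(a), skew(b)) == skew(cross(a, b)))  # True
```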
Problem 6.2. Consider a general surface in $\mathbb R^3$ parameterized by u and v, that is, $(u, v) \mapsto \mathbf r(u, v)$, and a curve γ(s) on that surface:
$$\gamma(s) = \mathbf r\big(u(s), v(s)\big).$$
For a fixed parameter value $s = s_0$, let $u_0 = u(s_0)$, $v_0 = v(s_0)$. Let $\mathbf v(u_0, v_0)$ be a vector at the point $P_0 = \mathbf r(u_0, v_0)$ on the surface:
$$\mathbf v(u_0, v_0) = a(u_0, v_0)\frac{\partial\mathbf r}{\partial u} + b(u_0, v_0)\frac{\partial\mathbf r}{\partial v}.$$
Show that, if $\mathbf v\big(u(s), v(s)\big)$ is the parallel transport of $\mathbf v(u_0, v_0)$ from the point $P_0$ along the curve, that is,
$$\mathbf v\big(u(s), v(s)\big) = a\big(u(s), v(s)\big)\frac{\partial\mathbf r}{\partial u} + b\big(u(s), v(s)\big)\frac{\partial\mathbf r}{\partial v},$$
then the squared length of the vector $\mathbf v\big(u(s), v(s)\big)$, which is
$$a\big(u(s), v(s)\big)^2E\big(u(s), v(s)\big) + 2a\big(u(s), v(s)\big)\,b\big(u(s), v(s)\big)\,F\big(u(s), v(s)\big) + b\big(u(s), v(s)\big)^2G\big(u(s), v(s)\big),$$
is constant. (That is, the derivative of this expression with respect to s is zero.)
Problem 6.3. Finding geodesics is a task that can take bizarre twists. In Ex-
ample 6.3, we found the complete family of geodesics on the pseudo-sphere through
a given point, despite the fact that this surface has a complicated equation involv-
ing transcendental functions. To be sure, when cylindrical coordinates are used,
z is independent of θ, and that makes the task much easier. Even so, compared
to that example, one would expect it to be utterly trivial to find the geodesics on
a surface as simple (algebraically) as the hyperbolic paraboloid whose equation is
z = xy/c. In the natural parameterization r(x, y) = x i + y j + (xy/c) k, we have
g(0, 0) = I, and so all we need to do is find the geodesic γ(t) with γ(0) = 0 whose
tangent at 0 when arc length is the parameter is given by
$$\gamma'(0) = \frac{x}{\sqrt{x^2 + y^2}}\,\mathbf i + \frac{y}{\sqrt{x^2 + y^2}}\,\mathbf j.$$
and that
L(u)∇ (w, v) = L(u)∇ (v, w) .
Problem 6.8. Consider a coordinate system (for example, normal coordinates)
in which the matrix of metric coefficients is the identity matrix at the point P . Prove
that in these coordinates $R^i_{jkl} = -R^j_{ikl}$.
Problem 6.9. Prove that the parity of the permutation $(i_1, \dots, i_k, j_1, \dots, j_{n-k})$ differs from that of $(j_1, \dots, j_{n-k}, i_1, \dots, i_k)$ by the factor $(-1)^{k(n-k)}$.
Problem 6.10. Show that in normal coordinates on the 2-sphere $S^2$ in $\mathbb R^3$, the metric coefficients are
$$g_{11}(x, y) = \frac{y^2\varphi\!\left(\frac{x^2+y^2}{r_0^2}\right)^2 + x^2\psi\!\left(\frac{x^2+y^2}{r_0^2}\right)^2}{x^2 + y^2} + \frac{x^2}{r_0^2}\,\varphi\!\left(\frac{x^2+y^2}{r_0^2}\right)^2,$$
$$g_{22}(x, y) = \frac{x^2\varphi\!\left(\frac{x^2+y^2}{r_0^2}\right)^2 + y^2\psi\!\left(\frac{x^2+y^2}{r_0^2}\right)^2}{x^2 + y^2} + \frac{y^2}{r_0^2}\,\varphi\!\left(\frac{x^2+y^2}{r_0^2}\right)^2,$$
$$g_{12}(x, y) = \frac{xy\left(\psi\!\left(\frac{x^2+y^2}{r_0^2}\right)^2 - \varphi\!\left(\frac{x^2+y^2}{r_0^2}\right)^2\right)}{x^2 + y^2} + \frac{xy}{r_0^2}\,\varphi\!\left(\frac{x^2+y^2}{r_0^2}\right)^2.$$
Here the functions $\varphi(t) = \sin(\sqrt t)/\sqrt t$ and $\psi(t) = \cos(\sqrt t)$ are needed only for nonnegative values of t, namely $t = (x^2 + y^2)/r_0^2$. They satisfy the easily established relations $t\varphi(t)^2 + \psi(t)^2 = 1$, $\varphi'(t) = \frac{1}{2t}\big(\psi(t) - \varphi(t)\big)$, and $\psi'(t) = -\frac12\varphi(t)$. It is easy to see that $g_{11}(x, y) \to 1$, $g_{22}(x, y) \to 1$, and $g_{12}(x, y) \to 0$ as $(x, y) \to (0, 0)$. Thus, suppressing the argument $(x^2 + y^2)/r_0^2$ of φ and ψ, since φ → 1 and ψ → 1 as (x, y) → (0, 0), we find
$$0 \le |g_{11}(x, y) - 1| = \left|\frac{y^2(\varphi^2 - 1) + x^2(\psi^2 - 1)}{x^2 + y^2} + \frac{x^2}{r_0^2}\varphi^2\right| \le \max\big(|\varphi^2 - 1|, |\psi^2 - 1|\big) + \frac{x^2}{r_0^2}\varphi^2 \to 0,$$
$$0 \le |g_{22}(x, y) - 1| = \left|\frac{x^2(\varphi^2 - 1) + y^2(\psi^2 - 1)}{x^2 + y^2} + \frac{y^2}{r_0^2}\varphi^2\right| \le \max\big(|\varphi^2 - 1|, |\psi^2 - 1|\big) + \frac{y^2}{r_0^2}\varphi^2 \to 0,$$
$$0 \le |g_{12}(x, y)| = \left|\frac{(\psi^2 - \varphi^2)\,xy}{x^2 + y^2} + \frac{xy}{r_0^2}\varphi^2\right| \le \frac12\big|\psi^2 - \varphi^2\big| + \frac{|xy|}{r_0^2}\varphi^2 \to 0.$$
Convert these expressions to those given in the text (Example 6.2).
Deduce as a consequence that the element of surface area in these coordinates is
$$\sqrt{g_{11}g_{22} - g_{12}^2} = \frac{r_0}{r}\sin\frac{r}{r_0} \approx 1 - \frac16\frac{r^2}{r_0^2},$$
where $r = \sqrt{x^2 + y^2}$. Use this density function (not the approximation) to show that the spherical cap centered at $(0, 0, r_0)$ whose boundary lies at geodesic distance s from this point has area $4\pi r_0^2\sin^2(s/2r_0)$. In particular, the upper hemisphere, for which $s = \pi r_0/2$, has area $2\pi r_0^2$, as it ought to.
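The cap-area formula in Problem 6.10 can be checked numerically before it is proved. In the normal-coordinate plane the cap is the disk of radius s, so in polar coordinates its area is $2\pi\int_0^s \frac{r_0}{r}\sin\frac{r}{r_0}\,r\,dr$. The Python sketch below evaluates this with the midpoint rule and compares it with $4\pi r_0^2\sin^2(s/2r_0)$. The value $r_0 = 3$ and the sample values of s are arbitrary choices.

```python
import math


def cap_area_numeric(s, r0, m=4000):
    """2*pi * integral_0^s (r0/r) sin(r/r0) * r dr, by the midpoint rule."""
    h = s / m
    return 2 * math.pi * h * sum(r0 * math.sin((k + 0.5) * h / r0) for k in range(m))


def cap_area_formula(s, r0):
    return 4 * math.pi * r0**2 * math.sin(s / (2 * r0)) ** 2


r0 = 3.0
for s in (0.5, 2.0, math.pi * r0 / 2):  # the last value gives the upper hemisphere
    print(s, cap_area_numeric(s, r0), cap_area_formula(s, r0))
print(2 * math.pi * r0**2)  # hemisphere area, for comparison
```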
Apply that equation instead to the first term and show that
$$R^r_{prq} = -\frac32 g^{rm}\frac{\partial^2 g_{rm}}{\partial u^p\,\partial u^q}.$$
Problem 6.16. When the 2-sphere S2 in R3 is parameterized by longitude and
latitude coordinates, that is, by the mapping (θ, ϕ) → (cos θ cos ϕ, sin θ cos ϕ, sin ϕ),
the orthogonally invariant measure on it is $d\sigma_2(\theta, \varphi) = \cos\varphi\,d\theta\,d\varphi$. Prove that
$$\int_{S^2}(\xi^1)^2\,d\sigma_2 = \int_{S^2}(\xi^2)^2\,d\sigma_2 = \int_{S^2}(\xi^3)^2\,d\sigma_2 = \frac{4\pi}{3}.$$
(This is the special case of Theorem 6.15 that occurs when $f(\xi) = (\xi\cdot u)^2$ for the unit vectors u = (1, 0, 0), u = (0, 1, 0), and u = (0, 0, 1), since $\omega_2 = 4\pi$.)
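Problem 6.17. Let ε > 0. Define two functions $f_\varepsilon(x)$ and $g_\varepsilon(x)$ on [1, ∞) as follows:
$$f_\varepsilon(x) = \begin{cases}\dfrac{n\varepsilon}{2}\sin^2\big(\pi n^3(x - n)\big), & \text{if } n \le x \le n + \dfrac{1}{n^3},\ n = 1, 2, 3, \dots,\\[4pt] 0, & \text{otherwise},\end{cases}$$
$$g_\varepsilon(x) = \frac{\pi^2\varepsilon}{24} + \int_1^x f_\varepsilon(s)\,ds.$$
Show that $f_\varepsilon(x) \ge 0$ and $f_\varepsilon\big(n + 1/(2n^3)\big) = n\varepsilon/2$, so that $f_\varepsilon\big(n + 1/(2n^3)\big) \to \infty$ as $n \to \infty$. Thus, in particular, $f_\varepsilon(x)$ is not bounded. Then show that $g_\varepsilon(x)$ is an increasing function of x, and that for all x,
$$0 < \frac{\pi^2\varepsilon}{24} \le g_\varepsilon(x) \le \frac{\pi^2\varepsilon}{12} < \varepsilon.$$
By letting ε tend to zero, show that $g_\varepsilon(x)$ can be made arbitrarily small while its derivative $g_\varepsilon'(x) = f_\varepsilon(x)$ remains unbounded.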
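The bookkeeping behind the bounds in Problem 6.17 can be checked numerically. Each bump of $f_\varepsilon$ contributes $\varepsilon/(4n^2)$ to the integral, so all of them together contribute $\varepsilon\pi^2/24$. The Python sketch below codes $f_\varepsilon$ as defined above, integrates single bumps by the midpoint rule, and sums the first 199 of them. The truncation and the value ε = 0.1 are arbitrary choices. The neglected tail is about ε/800.

```python
import math


def f_eps(x, eps):
    """f_eps from Problem 6.17: a bump of height n*eps/2 on [n, n + 1/n^3], zero elsewhere."""
    n = math.floor(x)
    if n >= 1 and x <= n + 1.0 / n**3:
        return 0.5 * n * eps * math.sin(math.pi * n**3 * (x - n)) ** 2
    return 0.0


def bump_integral(n, eps, m=1000):
    """Midpoint-rule integral of f_eps over [n, n + 1/n^3]; the exact value is eps / (4 n^2)."""
    h = 1.0 / (m * n**3)
    return h * sum(f_eps(n + (k + 0.5) * h, eps) for k in range(m))


eps = 0.1
print(f_eps(5 + 1 / (2 * 5**3), eps), 5 * eps / 2)   # peak of the 5th bump
print(bump_integral(3, eps), eps / (4 * 3**2))        # a single bump
total = sum(bump_integral(n, eps) for n in range(1, 200))
print(total, math.pi**2 * eps / 24)                   # all bumps, up to a small tail
```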
Problem 6.18. Verify that the covariant derivative $\nabla_v$ has the derivation property $\nabla_v(fu) = (\nabla_v f)u + f\nabla_v u$.
Problem 6.19. Show that the computed sectional curvature κ(u, v) does not
change if the vector u is replaced by au + bv, for any nonzero a and any b. Thus,
the sectional curvature depends only on the plane spanned by u and v.
Problem 6.20. Show that the length of the geodesic $\gamma_u(t)$ used in defining the exponential mapping is |u|.
Problem 6.21. The three-sphere S3 of radius 1, whose sectional curvature we
have computed, is an excellent example of a Lie group. It consists of the points in
$\mathbb R^4$ that we may identify with points in space-time, calling them $T = (t^0; t^1, t^2, t^3)$. Then
$$S^3 = \{T : (t^0)^2 + (t^1)^2 + (t^2)^2 + (t^3)^2 = 1\}.$$
What makes this manifold a Lie group is the group operation of quaternion multiplication. If we identify the quaternion T with the formal sum of a real number and a vector in $\mathbb R^3$, say $T = t^0 + \tau$, where $t^0$ is identified with the quaternion $(t^0, 0, 0, 0)$ and $\tau = t^1\mathbf i + t^2\mathbf j + t^3\mathbf k$ is identified with the quaternion $(0, t^1, t^2, t^3)$, the group operation is quaternion multiplication: If $S = s^0 + \sigma$, then
$$ST = (s^0t^0 - \sigma\cdot\tau) + (s^0\tau + t^0\sigma + \sigma\times\tau).$$
Verify that $S^3$ is a group under the operation of quaternion multiplication with identity $I = 1 + \mathbf 0$.
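The group axioms of Problem 6.21 can be spot-checked numerically before they are verified by hand. The Python sketch below implements the product formula on 4-tuples $(t^0, t^1, t^2, t^3)$ and checks the following on sample points:

- the product of two unit quaternions is a unit quaternion;
- I acts as an identity;
- the conjugate $(s^0, -\sigma)$ serves as the inverse.

The sample points are arbitrary, and none of this replaces the requested proof.

```python
import math


def qmul(S, T):
    """(s0 + sigma)(t0 + tau) = (s0 t0 - sigma.tau) + (s0 tau + t0 sigma + sigma x tau)."""
    s0, s1, s2, s3 = S
    t0, t1, t2, t3 = T
    dot = s1 * t1 + s2 * t2 + s3 * t3
    cross = (s2 * t3 - s3 * t2, s3 * t1 - s1 * t3, s1 * t2 - s2 * t1)
    return (s0 * t0 - dot,
            s0 * t1 + t0 * s1 + cross[0],
            s0 * t2 + t0 * s2 + cross[1],
            s0 * t3 + t0 * s3 + cross[2])


def norm(Q):
    return math.sqrt(sum(q * q for q in Q))


def unit(Q):
    n = norm(Q)
    return tuple(q / n for q in Q)


def conj(Q):
    return (Q[0], -Q[1], -Q[2], -Q[3])


I = (1.0, 0.0, 0.0, 0.0)
S = unit((1.0, 2.0, -1.0, 0.5))
T = unit((0.3, -0.7, 0.2, 1.1))
print(norm(qmul(S, T)))        # 1 up to rounding: the product stays on S^3
print(qmul(I, S), S)           # I acts as an identity
print(qmul(S, conj(S)))        # approximately I: the conjugate is the inverse
```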
Problem 6.22. The exponential mapping at the point (1; 0, 0, 0) in the group
of unit quaternions is given by the analog of the mapping used in Example 6.2 on
the sphere S2 , namely
$$\exp(x) = \psi\big(|x|^2\big) + \varphi\big(|x|^2\big)\,x$$
But one thing is certain: I have never before in my life worked this
hard, and I have acquired a great deal of respect for mathematics,
whose finer points I had naively regarded as a pure luxury up to
now. Compared with this problem, the original theory of relativity
is child’s play.
Einstein, letter to Arnold Sommerfeld, 29 October 1912. Klein,
Kox, and Schulmann ([47], p. 505). My translation.
Now that the mathematical background of general relativity has been explored in
Chapters 5 and 6, we can set the computational work of Chapter 4 in its proper
context. Using the principle that mechanical laws should be expressed as tensor equations involving partial derivatives of the metric coefficients of at most second order, Einstein was led to the Ricci tensor. Setting that tensor equal to zero in free
space—when the attracting body is regarded as a massive particle—provided the
simplest of all possible nontrivial tensor equations of this type and led to the very
precise explanations of the precession of Mercury’s perihelion and the deflection of
light passing near the Sun. The closed-form Schwarzschild solution presented in
Chapter 4 was a triumph of theoretical science and made a very satisfying connec-
tion between theory and observation.
If we were to leave off at that point, we would be presenting the reader with an
oversimplified version of the story and a bit of a mystery to ponder. Two points in
particular need to be addressed: (1) Why the Ricci tensor? Why not, for example,
apply the Laplace–Beltrami operator to the metric coefficients instead? After all,
we have perturbed the flat-space metric by adjoining terms connected with the po-
tential energy. In Newtonian mechanics, it is the Laplacian of the potential energy
that we work with. Why wouldn’t the Hessian, which also describes the curvature,
do equally well? (2) How are the equations of motion to be determined in other
applications of relativistic mechanics? What replaces “force” and what replaces
the famous “F = ma” in a situation where the gravitational field is produced by a
continuous distribution of matter, and how do we formulate electromagnetic forces
in this language?
We shall explore some of these questions in the present chapter. (Not all of
them; we do not have space to discuss electromagnetism.) We shall introduce the
stress-energy-momentum tensor promised in Chapter 2, along with the Einstein
field equations that replace F = ma. We are entering very deep waters here, a
full exploration of which would require another book the size of the present one.
Accordingly, we shall keep safely close to shore, in water that is comparatively
shallow. Our aim is to try to make the reformulation of mechanics using the Einstein
field equations—which, it must be admitted, takes considerable getting used to for
304 7. THE GEOMETRIZATION OF GRAVITY
Our aim is to start with the case where we have seen general relativity work
well, the gravitational field around a massive particle in empty space, and then
fill up that empty space with matter of a certain density. In empty space we got
our metric coefficients by setting the Ricci tensor equal to zero. We discovered
that we were essentially perturbing the flat-space metric: in the coefficient of dt² we
subtracted from 1 the ratio ρs/ρ that the Schwarzschild radius ρs bears to the range ρ,
and in the coefficient of dρ² we replaced the factor 1 by the reciprocal 1/(1 − ρs/ρ).
We noted that this ratio is the negative of twice the Newtonian potential per unit mass
divided by the square of the speed of light (ρs/ρ = −2V/c², where V = −GM/ρ is the
Newtonian potential energy per unit mass of the orbiting particle). We now face
the essential question: Why did Einstein think the Ricci tensor ought to vanish
in this case? After we answer that question, we can make a guess as to how the
metric should be perturbed in a more general case. Then, by looking at the Ricci
tensor with the more general perturbation, we can arrive at a hypothetical set of
field equations—the Einstein field equations—to replace Newton’s second law. At
that point, our work will be done.
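As a quick numerical check of the identity ρs/ρ = −2V/c², here is a minimal Python sketch. The rounded values of G and c, and the use of the Sun's mass and radius as the attracting mass and sample range, are assumptions made only for illustration; any mass and range would do.

```python
# Rounded SI values, assumed here only for illustration.
G = 6.674e-11       # gravitational constant, m^3 kg^-1 s^-2
c = 2.998e8         # speed of light, m/s
M_SUN = 1.989e30    # solar mass, kg
R_SUN = 6.957e8     # solar radius, m (used as a sample range rho)


def schwarzschild_radius(mass):
    """Standard-coordinate Schwarzschild radius rho_s = 2GM/c^2."""
    return 2.0 * G * mass / c**2


def potential_per_unit_mass(mass, rho):
    """Newtonian potential energy per unit mass, V = -GM/rho."""
    return -G * mass / rho


rho_s = schwarzschild_radius(M_SUN)
V = potential_per_unit_mass(M_SUN, R_SUN)
ratio = rho_s / R_SUN

print(f"rho_s = {rho_s:.0f} m")
print(f"rho_s/rho = {ratio:.4e}   -2V/c^2 = {-2.0 * V / c**2:.4e}")
print(f"coefficient of dt^2 at the solar surface: 1 - rho_s/rho = {1.0 - ratio:.8f}")
```

The two printed ratios agree, and the perturbation of the dt² coefficient at the Sun's surface is only a few parts in a million.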
As we remarked in Chapter 4, this approach does not give us any new equations;
nor does it save us any trouble in solving the ones we already had in terms of
Newton’s formulation. But it is of great value in unifying our thinking about
mechanics. For example, it was stated by Fermat that a ray of light always appears
to follow a “path of least resistance,” one that requires the minimal time to traverse
compared with all nearby paths. In 1696, a generation after Fermat’s death (which
occurred in 1665), the same principle was introduced into mechanics by Johann
Bernoulli (1667–1748) to solve a problem with which he then challenged other
mathematicians: Finding a “sliding board” down which a frictionless particle would
slide in least time from a given height—the brachistochrone (shortest-time) problem,
which was later also solved by Newton.1 In contrast to mechanics, optics had
always been a purely geometric subject. But now an analogy with optics allowed
the geometric point of view to supplant the concept of force, even in its stronghold
of mechanics. Since the Euler equations are also the key to finding the shortest path
between two points in a space with a general metric, the way was prepared early on
for a further connection between geometry and mechanics. And in 1744, Bernoulli’s
protégé Leonhard Euler showed that a particle moving over a surface and subject to
no tangential forces will move along a geodesic on that surface. (See Appendix 2 for
a proof.) From that perspective, the classical Lagrangian approach can be thought
of as saying that the particle moves along the path of “steepest descent,” where
descent means exchanging potential energy for kinetic energy. That is, given the
states of these two energies at the initial and final positions, the particle gets from
the initial state to the final state in minimal time.
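The least-time property of the brachistochrone can be checked numerically. The Python sketch below assumes a frictionless bead starting from rest, with depth y measured downward, so that its speed at depth y is √(2gy); it sums ds/v over many short chords of a curve parametrized on [0, 1]. The cycloid x = r(θ − sin θ), y = r(1 − cos θ) has exact descent time π√(r/g) to its lowest point, while a straight chute to the same end point (πr, 2r) takes √(π² + 4)·√(r/g). The helper names descent_time, cycloid, and straight_line, and the values of g and r, are hypothetical choices for this illustration.

```python
import math

g = 9.81    # m/s^2
r = 1.0     # m, radius of the rolling circle that generates the cycloid


def descent_time(curve, n=20000):
    """Time for a bead starting at rest at curve(0) to reach curve(1).

    curve(p) returns (x, y) with y measured downward from the start, so the
    speed at depth y is sqrt(2 g y) by conservation of energy. Each short
    chord is traversed at the speed evaluated at its mean depth.
    """
    total = 0.0
    x0, y0 = curve(0.0)
    for i in range(1, n + 1):
        x1, y1 = curve(i / n)
        ds = math.hypot(x1 - x0, y1 - y0)           # chord length
        v = math.sqrt(2.0 * g * 0.5 * (y0 + y1))    # speed at the chord's mean depth
        total += ds / v
        x0, y0 = x1, y1
    return total


def cycloid(p):
    theta = math.pi * p
    return r * (theta - math.sin(theta)), r * (1.0 - math.cos(theta))


def straight_line(p):
    return math.pi * r * p, 2.0 * r * p


t_cycloid = descent_time(cycloid)
t_line = descent_time(straight_line)
print(f"cycloid:  {t_cycloid:.4f} s (exact {math.pi * math.sqrt(r / g):.4f} s)")
print(f"straight: {t_line:.4f} s (exact {math.sqrt(math.pi**2 + 4) * math.sqrt(r / g):.4f} s)")
```

The straight chute is shorter, but the cycloid drops steeply at first and gains speed early, which more than compensates.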
1.2. Comparison with general relativity. Now let us compare the La-
grangian formulation of Newton’s second law with the language of geodesics. The
equations of a geodesic, with proper time as parameter, are
$$(x^i)'' + \Gamma^i_{jk}\,(x^j)'\,(x^k)' = 0,$$
1 Bernoulli imagined the particle was a ray of light moving in a medium whose index of
refraction was inversely proportional to the square root of the distance fallen; that is because
the speed of a particle falling with constant acceleration is proportional to the square root of the
distance it has fallen.
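To see geodesic equations of this form at work in the simplest curved setting, here is a Python sketch that integrates them on the unit sphere S². The sphere is a Riemannian surface, so arc length stands in for proper time; that substitution is an assumption of this illustration. With ds² = dθ² + sin²θ dφ², the only nonzero Christoffel symbols are Γ^θ_φφ = −sin θ cos θ and Γ^φ_θφ = Γ^φ_φθ = cot θ. A classical fourth-order Runge–Kutta step is used, and the helper names are hypothetical. If the equations are right, the computed path should stay in the plane of a great circle.

```python
import math


def geodesic_rhs(state):
    """First-order form of the geodesic equations on the unit sphere.

    state = (theta, phi, dtheta/ds, dphi/ds).
    theta'' = sin(theta) cos(theta) (phi')^2     (from Gamma^theta_{phi phi})
    phi''   = -2 cot(theta) theta' phi'           (from Gamma^phi_{theta phi})
    """
    theta, phi, dtheta, dphi = state
    s, c = math.sin(theta), math.cos(theta)
    ddtheta = s * c * dphi**2
    ddphi = -2.0 * (c / s) * dtheta * dphi
    return (dtheta, dphi, ddtheta, ddphi)


def rk4_step(state, h):
    """One classical Runge-Kutta step of size h."""
    def shifted(base, slope, factor):
        return tuple(x + factor * k for x, k in zip(base, slope))

    k1 = geodesic_rhs(state)
    k2 = geodesic_rhs(shifted(state, k1, h / 2))
    k3 = geodesic_rhs(shifted(state, k2, h / 2))
    k4 = geodesic_rhs(shifted(state, k3, h))
    return tuple(x + h / 6 * (a + 2 * b + 2 * cc + d)
                 for x, a, b, cc, d in zip(state, k1, k2, k3, k4))


def to_cartesian(theta, phi):
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))


alpha = 0.3   # tilt of the initial direction from the equator (illustrative)
# Start on the equator at (1, 0, 0) with unit speed: theta'^2 + sin^2(theta) phi'^2 = 1.
state = (math.pi / 2, 0.0, math.sin(alpha), math.cos(alpha))
normal = (0.0, math.sin(alpha), math.cos(alpha))   # X(0) x X'(0) for this start

n_steps = 2000
h = 2 * math.pi / n_steps          # integrate over one full turn of arc length
max_off_plane = 0.0
for _ in range(n_steps):
    state = rk4_step(state, h)
    X = to_cartesian(state[0], state[1])
    max_off_plane = max(max_off_plane, abs(sum(a * b for a, b in zip(X, normal))))

print("largest distance from the great-circle plane:", max_off_plane)
print("position after arc length 2*pi:", to_cartesian(state[0], state[1]))
```

The path stays in one plane through the center and returns to its starting point after arc length 2π, as a great circle must. The quantity sin²θ · dφ/ds (the analog of angular momentum) is also conserved along the way.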
1. THE EINSTEIN FIELD EQUATIONS 307
between the center of attraction and the orbiting particle, for a mass density. Even worse, we
are going to assume that the matter in question is moving with constant speed v relative
to an observer fixed in the frame of the attracting particle and has rest density ρ0 . To minimize
confusion, we shall henceforth use r instead of ρ for the distance and replace the Schwarzschild
radius ρs by 4rs , since we plan to use isotropic coordinates to simplify the algebra.
1.3. The Einstein tensor. If, following Eddington, we think of the metric
coefficients gpq as generalized potentials, we have a choice of various operators to
play the role of the “Laplacian of the potential,” namely the Laplace–Beltrami
operator, the Hessian, and the Ricci tensor, all closely related to one another, as we
have seen. Choosing the appropriate operator is the task of a physicist. Practical
experience is helpful, and Einstein struggled with this problem for a considerable
time.
We might be tempted to replace the Laplacian in Poisson’s equation once again
by the Ricci tensor, but then what do we do with the term −4πGρ? We observe
that, except for a factor of the form (constant)/r², that term is just the potential per
unit mass. To be specific, in absolute value it is 6ϕ/r². Since we are using ϕ to perturb the metric
coefficients gij , we are thus led to consider, as candidates for the left-hand side of
the analog of Poisson’s equation, combinations of the form
Ric − Cϕ ,
for a constant C of suitable dimension. The dimension of C, as already remarked,
is the dimension of the scalar curvature R. The problem we face can thus be
reduced to the choice of a dimensionless scalar a that will produce a suitable tensor
Ricij − aRgij . Notice that R = 0 when the Ricci tensor vanishes, and that explains
why the extra term was absent in the free-space version of gravitation. Its existence
would not have been suspected had we not tried to complicate the problem by
replacing a particle with a continuous density.
What do we mean by suitable and how are we to choose a? Again, Newtonian
mechanics is our guide. In addition to being conservative (meaning that the curl
∇ × F is 0), the gravitational force F = −(GMm r⁻³) r is also divergence-free
away from the attracting mass: ∇ · F = 0. We thus attempt to choose a so that
the tensor we are going to call the Einstein tensor and denote by the symbol Ein
will have the form Einij = Ricij − aRgij and will have zero divergence. Since we
have discussed only how to take the divergence of contravariant tensors, it will be
necessary to raise the indices in this covariant tensor of type (0, 2), getting a
contravariant version of the Einstein tensor:3
$$\mathrm{ContraEin}^{ij} = g^{ik} g^{jm}\bigl(\mathrm{Ric}_{km} - aR\,g_{km}\bigr) = \mathrm{Ric}^{ij} - aR\,g^{ij}. \tag{7.1}$$
Not to prolong the suspense, we shall reveal immediately that the magic coefficient
is just a = 1/2.
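Both properties of the Newtonian field can be checked numerically. The Python sketch below uses central differences to estimate the divergence and curl of F = −(GMm r⁻³) r at a sample point away from the origin (at the origin itself the divergence is not zero; it is concentrated there, as for a point source). The units GM = m = 1, the step size, and the sample point are assumptions made for illustration.

```python
GM = 1.0   # illustrative units with G M = 1 and test mass m = 1


def force(x, y, z):
    """Newtonian attraction F = -(GM m / r^3) r toward the origin (m = 1)."""
    r3 = (x * x + y * y + z * z) ** 1.5
    return (-GM * x / r3, -GM * y / r3, -GM * z / r3)


def partial(field, i, j, point, h=1e-4):
    """Central-difference estimate of dF_i/dx_j at the given point."""
    plus, minus = list(point), list(point)
    plus[j] += h
    minus[j] -= h
    return (field(*plus)[i] - field(*minus)[i]) / (2.0 * h)


p = (0.7, -0.4, 1.1)   # any point away from the attracting mass at the origin

div = sum(partial(force, k, k, p) for k in range(3))
curl = (partial(force, 2, 1, p) - partial(force, 1, 2, p),
        partial(force, 0, 2, p) - partial(force, 2, 0, p),
        partial(force, 1, 0, p) - partial(force, 0, 1, p))

print("div F  =", div)    # zero up to finite-difference error
print("curl F =", curl)   # likewise
```

Each individual partial derivative is of order 0.1 to 1 here, so the near-zero results come from genuine cancellation, not from the field being small.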
Theorem 7.1. If the contravariant Einstein tensor ContraEin is defined by
Eq. (7.1) with a = 1/2, then
$$\operatorname{div}(\mathrm{ContraEin}) = \begin{pmatrix}0\\0\\0\\0\end{pmatrix}.$$
Proof. Although this theorem is true in general, a full proof requires yet more
combinatorial work in the area of the Bianchi identity, which we wish to spare the
reader. Since we have need of this theorem only in the case of space-time coordinates
in which the metric tensor is diagonal, we confine the proof to that case. We can
then spare the reader all the computational torture by trusting Mathematica to
compute the divergence of the contravariant Einstein tensor and verify that it is
3 Because indices can be raised and lowered at will, any tensor equation stated in covariant
form has an equivalent contravariant form. We choose to deal with the contravariant form in
order to minimize the number of symbols we need. It is generally simpler, though, to state tensor
equations in covariant form.
indeed the zero vector. Mathematica Notebook 11 in Volume 3 gives a proof for
a perfectly general diagonal space-time metric and requires only a minute or two
to run. Any attempt to get Mathematica to carry out this same labor with a full
4 × 4 matrix of metric coefficients will require considerable patience, since the two
minutes will expand to a day or more of computing, and the computer will need a
great deal of memory.
The output of this program is {0, 0, 0, 0}, and so we have assurance that the
contravariant form of the Einstein tensor is divergence-free, as desired.
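For readers who want to see why a = 1/2 is forced in general, here is the standard argument in outline. It rests on the contracted Bianchi identity, which is stated here without proof; ∇ denotes covariant differentiation, so "div" in Theorem 7.1 is the contraction ∇_i, and the metric is covariantly constant.

```latex
\begin{aligned}
\nabla_i \mathrm{Ric}^{ij} &= \tfrac12\,\nabla^j R
  && \text{(contracted Bianchi identity)},\\
\nabla_i \bigl(R\,g^{ij}\bigr) &= \nabla^j R
  && \text{(since } \nabla_i g^{ij} = 0\text{)},\\
\nabla_i \bigl(\mathrm{Ric}^{ij} - aR\,g^{ij}\bigr)
  &= \bigl(\tfrac12 - a\bigr)\,\nabla^j R .
\end{aligned}
```

Since R is not constant for a general metric, the right-hand side vanishes identically only when a = 1/2.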
Remark 7.2. According to Wald ([83], p. 72), Einstein considered making
the equation Ein = 0 the fundamental equation of a gravitational field, but was
deterred by the fact that it would imply that ρ is constant throughout the universe.
This would make for a great deal of difficulty, since the Newtonian potential for
such a distribution would be identically 0. This curious step along the way to the
general field equations is an example of the trial-and-error process by which they
were discovered.
Remark 7.3. Lovelock ([57]) has shown that, in four-dimensional space-time, the
Einstein tensor is essentially the only divergence-free tensor that can be formed from
the metric coefficients and their first and second derivatives: any such tensor has the
form αEin + λg for constants α and λ. Lovelock's theorem proves formally what
Einstein had asserted in 1916.
1.4. The field equation. We now have reason to believe that the Einstein
tensor represents one side of an “equation of motion” in space-time. Its contravari-
ant form will have zero divergence, just like the Newtonian gravitational field, no
matter what the metric tensor is. In that sense, it is the universal part of the
fundamental equation of mechanics, just as m d²r/dt² is in Newtonian mechanics. To
solve mechanical problems, we need the other side of the equation, analogous to
what is generally known as F in Newtonian mechanics, but should really be kF,
where k is a constant of proportionality that reconciles the dimensions and sizes of
the two sides. Physical units, such as those of the MKS system, are generally chosen so as
to make this constant equal to unity. Thus, the unit of force, the newton, is chosen
so that a force of one newton gives a mass of one kilogram an acceleration of one
meter per second squared. Since we don’t yet know what units will be most
convenient in relativity, we are going to state the fundamental field equations with
an unspecified constant of proportionality; after looking at one example, we shall
then give the constant the conventional value used by physicists.
We are now seeking the other side of the field equation, which is a constant
multiple of a tensor of type (0, 2) called the stress-energy-momentum tensor, and
usually denoted T. We begin with a simple special case of the Einstein field equation
in the form
$$\mathrm{Ein}_{pq} = \frac{8\pi G}{c^4}\,T_{pq},$$
where the reason for the particular constant of proportionality will be seen in the
examples below. In standard notation, the metric ds² of special relativity is
“spatialized,” that is, ds² = c² dt² − dx² − dy² − dz². In that case, g11 = c² has dimension
length²/time², g1j and gj1, j > 1, all of which are zero, have dimension length/time,
and all the other gij are dimensionless. In that system, the dimensions of the
components of the Ricci tensor (and hence also those of the Einstein tensor) are as follows:
Ric11 has dimension time⁻², Ric1j has dimension (time × length)⁻¹, for j > 1, and
all other components have dimension length⁻². Now G/c⁴ = (2GM/c²) · 1/(2Mc²) =
ρs/(2Mc²), where ρs = 2GM/c² is the Schwarzschild radius in standard coordinates,
so that the dimension of G/c⁴ is time²/(mass × length). It follows that the
dimension of T11 must be mass × length/time⁴, which is mass × velocity⁴/volume.
It will thus be a dimensionless constant times ρc⁴, where ρ is a density. In other
words, the entry in the first row and column of Ein will be 8πGρ, where ρ has the
physical dimension of density. When we look at examples below, we will see why
8πG is the most convenient value for the constant.
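The dimensional bookkeeping above is easy to mechanize. In the Python sketch below each dimension is an exponent triple (mass, length, time); the helper names dim, mul, and inv are hypothetical, and the only physical inputs are the dimensions of G, c, a density, and the component Ric11 stated in the text.

```python
def dim(m=0, l=0, t=0):
    """A physical dimension as exponents of (mass, length, time)."""
    return (m, l, t)


def mul(*dims):
    """Dimension of a product: exponents add."""
    return tuple(sum(d[k] for d in dims) for k in range(3))


def inv(d):
    """Dimension of a reciprocal: exponents change sign."""
    return tuple(-x for x in d)


G = dim(m=-1, l=3, t=-2)    # Newton's constant, m^3 kg^-1 s^-2
c = dim(l=1, t=-1)          # speed of light
rho = dim(m=1, l=-3)        # mass density
ein_11 = dim(t=-2)          # Ric11, hence Ein11, in the spatialized metric

G_over_c4 = mul(G, inv(mul(c, c, c, c)))
T_11 = mul(ein_11, inv(G_over_c4))   # from Ein11 = (8 pi G / c^4) T11

print("G/c^4 :", G_over_c4)               # (-1, -1, 2): time^2/(mass*length)
print("T11   :", T_11)                    # (1, 1, -4): mass*length/time^4
print("rho c^4 matches:", mul(rho, c, c, c, c) == T_11)
```

The dimensionless factor 8π does not affect the bookkeeping, which is why only G/c⁴ appears.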
We will actually be working with the contravariant version of this tensor, and
since g¹¹ in special relativity is c⁻², that means the entry T¹¹ will be simply ρ,
which is ρc²/c², that is, an energy density ρc² divided by c². The entries T¹ʲ
and Tʲ¹, j > 1, will represent components of momentum. The diagonal elements
Tʲʲ, j > 1, will represent longitudinal stresses, and the remaining six entries Tⁱʲ,
2 ≤ i, j ≤ 4, i ≠ j, will represent shear stresses. Our interest, however, will be confined to
just T¹¹.
We shall look at the Einstein tensor only in the simple case when the gravita-
tional field is due to a spherical mass distribution of constant density ρ, for which
the Newtonian potential per unit mass is V/m = 2πGρr²/3, as long as r is less
than the radius of the sphere containing the mass distribution. We recall that in
the case of attraction by a particle of mass M, we replaced the term dt² in the
metric by (1 − ρs/r) dt², where ρs was the Schwarzschild radius. This, as we saw,
was tantamount to introducing the coefficient 1 + 2V/(mc²), where V = −GMm/r
is the Newtonian potential energy at distance r.
To keep the algebra simple, we are going to use isotropic spherical coordinates,
in which the Schwarzschild radius rs is one-fourth of its value ρs in standard coordinates.
In standard spherical coordinates we replace the ratio ρs/r by −2V/(mc²), so that
in isotropic spherical coordinates we need to replace rs/r by −V/(2mc²). But since
we wish to confine our attention to distances r larger than rs, where the coefficient
of dt² should be smaller than 1, we are actually going to replace it with
V/(2mc²) = πGρr²/(3c²). The metric becomes
$$ds^2 = \left(\frac{3c^2 - \pi G\rho r^2}{3c^2 + \pi G\rho r^2}\right)^2 dt^2 - \left(1 + \frac{\pi G\rho r^2}{3c^2}\right)^4\left(dr^2 + r^2\,d\phi^2 + r^2\sin^2\phi\,d\theta^2\right).$$
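With these coordinates, we find that
$$\mathrm{Ein} = 8\pi G\rho \begin{pmatrix}
\dfrac{1 + \pi G\rho r^2/(3c^2)}{\bigl(1 - \pi G\rho r^2/(3c^2)\bigr)^7} & 0 & 0 & 0\\
0 & \dfrac{\pi G\rho r^2}{9c^2\bigl(1 - \pi G\rho r^2/(3c^2)\bigr)^2} & 0 & 0\\
0 & 0 & 0 & 0\\
0 & 0 & 0 & 0
\end{pmatrix}.$$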
Simple as it appears, this expression is still more complicated than we need
for our purposes. We are assuming that the quantity V/(2mc²) = πGρr²/(3c²) is “small.”
When we omit terms containing this factor, the Einstein tensor becomes
$$\mathrm{Ein} = \begin{pmatrix}8\pi G\rho&0&0&0\\0&0&0&0\\0&0&0&0\\0&0&0&0\end{pmatrix}, \quad\text{that is,}\quad (T_{pq}) = \begin{pmatrix}\rho c^4&0&0&0\\0&0&0&0\\0&0&0&0\\0&0&0&0\end{pmatrix},$$
which is not only of the same form that we indicated above, but has exactly the
same value!
Since, in first approximation, g¹¹ = 1/c² for this case, it follows that the
contravariant version of this tensor is the same thing multiplied by 1/c⁴. In other
words, the contravariant stress-energy-momentum tensor in this case is
$$(T^{pq}) = \begin{pmatrix}\rho&0&0&0\\0&0&0&0\\0&0&0&0\\0&0&0&0\end{pmatrix}.$$
This is exactly the stress-energy-momentum tensor we obtained in special rela-
tivity for a constant mass density ρ moving along a straight line at constant speed
(see Chapter 2).
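Raising both indices with a diagonal metric needs no summation: each component is simply multiplied by g^pp g^qq. The Python sketch below assumes the first-approximation inverse metric diag(1/c², −1, −1, −1) of the spatialized flat metric and an illustrative density; the function name raise_indices_diagonal is hypothetical.

```python
c = 2.998e8       # speed of light, m/s
rho = 1000.0      # illustrative density, kg/m^3 (roughly that of water)


def raise_indices_diagonal(t_lower, g_inv_diag):
    """Return T^{pq} = g^{pp} g^{qq} T_{pq} for a diagonal inverse metric."""
    n = len(g_inv_diag)
    return [[g_inv_diag[p] * g_inv_diag[q] * t_lower[p][q] for q in range(n)]
            for p in range(n)]


g_inv = [1.0 / c**2, -1.0, -1.0, -1.0]       # inverse of diag(c^2, -1, -1, -1)
t_lower = [[rho * c**4 if p == q == 0 else 0.0 for q in range(4)] for p in range(4)]
t_upper = raise_indices_diagonal(t_lower, g_inv)

print(t_upper[0][0])   # rho c^4 / c^4, i.e. rho up to rounding
```

For a non-diagonal metric the same operation would require the full double sum over both indices.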
Although purists might prefer to think of this tensor equation as an approxi-
mation to the “true” equation, in reality the expression being approximated is also
not “true” in the mathematical sense of infinite precision. The approximation is in
every practical way better, since we can compute with it, and we have no elegant
way of dealing with the “exact” Einstein tensor in this case. In the “Newtonian
limit” as c → ∞ and rs → 0, we find that πGρr²/(3c²) → 0 also, and the entry in
the first row and column of the “exact” Einstein tensor (the one before the
approximation was made) becomes 2∇²(V/m). The equation Ein11 = 8πGρ thus becomes
the classical Poisson equation, ∇²(V/m) = 4πGρ.
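The Newtonian limit can be checked directly: the potential per unit mass inside the uniform sphere, V/m = 2πGρr²/3, should satisfy ∇²(V/m) = 4πGρ, so that 2∇²(V/m) equals 8πGρ. The Python sketch below applies a seven-point finite-difference Laplacian; the density value, step size, and sample point are illustrative assumptions.

```python
import math

G = 6.674e-11     # m^3 kg^-1 s^-2
rho = 5.5e3       # illustrative uniform density, kg/m^3


def phi(x, y, z):
    """V/m = 2 pi G rho r^2 / 3 inside a uniform sphere (zero at the center)."""
    return 2.0 * math.pi * G * rho * (x * x + y * y + z * z) / 3.0


def laplacian(f, point, h=1.0e3):
    """Seven-point finite-difference Laplacian; exact for quadratics up to rounding."""
    x, y, z = point
    center = f(x, y, z)
    total = 0.0
    for dx, dy, dz in ((h, 0.0, 0.0), (0.0, h, 0.0), (0.0, 0.0, h)):
        total += f(x + dx, y + dy, z + dz) + f(x - dx, y - dy, z - dz) - 2.0 * center
    return total / h**2


p = (1.2e5, -3.0e5, 4.0e5)     # sample interior point, in metres
lap = laplacian(phi, p)
print(lap, 4.0 * math.pi * G * rho)          # Poisson's equation
print(2.0 * lap, 8.0 * math.pi * G * rho)    # the Newtonian limit of Ein11
```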
Remark 7.4. A digression on mathematical elegance and physics may be in or-
der at this point. Both mathematicians and physicists care about elegance, despite
the advice given to physicists by the great nineteenth-century physicist Ludwig
Boltzmann (1844–1906) to “let elegance be the concern of shoemakers and tailors.”
(“Eleganz sei die Sache der Schuster und Schneider.” Source: Wikiquote.) This
statement is often misattributed to Einstein, who did indeed quote it in 1916 in
the preface to an exposition of relativity theory. Nevertheless, his attitude toward
mathematical physics vehemently contradicts it. In fact, the context in which he
made the quotation refers not to the theory itself but only to his exposition of it.
What he said (my translation) was, “For the sake of clarity, I have found it neces-
sary to repeat myself frequently, not paying the slightest attention to elegance of
presentation; from the scientific point of view, I have followed the dictum of the
brilliant theoretician L. Boltzmann, that one should let elegance be a concern of the
tailors and shoemakers.” Boltzmann’s statement was rebutted by the mathematical
physicist Franz von Krbek (1898–1984), a professor at the University of Greifswald,
in his 1952 book The Captive Infinite (Eingefangenes Unendlich), p. 28.
Mathematicians and physicists seem to be pursuing the same goal. They di-
verge, however, in the matter of consistency. Physicists are generally willing to
replace an exact expression by an approximate expression that uses simpler, more
elementary functions. Whichever expression is used, to get a useful application to the physical
world we must resort to numerical computation of the resulting functions, and the
numerical computations in this case do agree with each other and with observation.
In that respect, Boltzmann’s aphorism can be turned on its head. The process of
taking a mathematical model and applying it to the world, as physicists do, is in
many ways similar to what a tailor or shoemaker does with raw material taken from
nature, trimming it here and there and sewing separate pieces together to fit the
“client” (the physical world). And, just like the work of tailors and shoemakers,
the end result is never a perfect fit. The universe, like the human body, is too
complicated, and our tools can’t make the raw material fit with infinite precision.
But the aim is elegance, even if it amounts to elegance followed by ugly minor
adjustments.
In any case, we should not be bothered by the need to make approximations
that are closer than any observable difference. Mathematical physics has always
accepted much coarser approximations. For example, the derivation of the classical
vibrating string equation assumes that the restoring force on a stretched string at
each point is proportional to the curvature at that point. If the equation of the
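instantaneous form of the string is y = f(x, t) at time t, this means
$$\frac{\partial^2 y}{\partial t^2} = c^2\,\frac{\partial^2 y/\partial x^2}{\bigl(1 + (\partial y/\partial x)^2\bigr)^{3/2}}.$$
But in fact, it is usually assumed that ∂y/∂x is negligible, so that the denominator
can be replaced by 1 and the equation becomes
$$\frac{\partial^2 y}{\partial t^2} = c^2\,\frac{\partial^2 y}{\partial x^2},$$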
which is the classical one-dimensional wave equation. If we didn’t make such ap-
proximations, the Laplacian would arise far less often in Newtonian mechanics than
it in fact does.
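How small must the slope be for this linearization to be harmless? The exact curvature term differs from ∂²y/∂x² by the factor (1 + (∂y/∂x)²)^(−3/2), which is roughly 1 − (3/2)(∂y/∂x)² for small slopes. The Python sketch below tabulates the worst-case relative error for a sinusoidal snapshot y = A sin(kx); the amplitudes and the wavelength are illustrative assumptions.

```python
import math


def curvature_factor(slope):
    """Ratio of the exact curvature term y_xx / (1 + y_x^2)^(3/2) to y_xx."""
    return (1.0 + slope * slope) ** -1.5


# A string snapshot y = A sin(k x); the largest slope is A k.
k = 2 * math.pi          # one wavelength per unit length (illustrative)
for amplitude in (0.001, 0.01, 0.05):
    max_slope = amplitude * k
    error = 1.0 - curvature_factor(max_slope)
    print(f"A = {amplitude:<6} max slope = {max_slope:.4f}  relative error = {error:.2e}")
```

For amplitudes of a thousandth of a wavelength the error is a few parts in a hundred thousand, far below anything a vibrating-string experiment could detect.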
Let us now return to our main theme. The Einstein field equation we have so
far is
$$\mathrm{Ein} = \frac{8\pi G}{c^4}\,T,$$
where T is the stress-energy-momentum tensor.
In standard notation the Einstein tensor is written in terms of its components,
which are denoted G_μν, so that
$$G_{\mu\nu} = \mathrm{Ric}_{\mu\nu} - \frac{1}{2}R\,g_{\mu\nu} = \frac{8\pi G}{c^4}\,T_{\mu\nu}.$$
That is a form more likely to be recognized by physicists, although they are ac-
customed to using just the letter R to denote the Ricci tensor. The constant still
looks a bit messy, but physicists are accustomed to assuming units of measurement
in which G and c are both numerically equal to 1, and thus the constant becomes
simply 8π.
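For scale, the coupling constant 8πG/c⁴ is extremely small in SI units, which is one practical reason for preferring units with G = c = 1. A short Python computation, using rounded values of G and c assumed here:

```python
import math

G = 6.674e-11   # m^3 kg^-1 s^-2 (rounded)
c = 2.998e8     # m/s (rounded)

kappa_si = 8.0 * math.pi * G / c**4
print(f"8*pi*G/c^4 = {kappa_si:.3e} s^2 kg^-1 m^-1")

# With G = c = 1 the same expression reduces to 8*pi.
kappa_geometrized = 8.0 * math.pi * 1.0 / 1.0**4
print(f"8*pi*G/c^4 with G = c = 1: {kappa_geometrized:.4f}")
```

The units printed for the SI value are exactly the time²/(mass × length) found in the dimensional analysis of G/c⁴.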
As we have already noted above, the contravariant version of this equation is
$$\mathrm{ContraEin} = \frac{8\pi G}{c^4}\begin{pmatrix}\rho&0&0&0\\0&0&0&0\\0&0&0&0\\0&0&0&0\end{pmatrix}.$$
There now remains only one point that needs to be mentioned in connection
with the Einstein field equations.
1.5. The cosmological constant. The Einstein field equations were not ar-
rived at over a cup of tea. Einstein proposed them only after many years of hard
and deep thought. In the early years of general relativity, a century ago, definitions
were still in a fluid state. Although gravitational effects are quite weak compared
with electromagnetic effects, they still tend to cause a shrinkage in the size of the
universe. Moreover, although the Newtonian gravitational interaction between two
particles conserves all the usual things, such as angular momentum and total en-
ergy, such is not the case for solid bodies. Gravitational forces are responsible
for the tides, and tidal friction produces heat. Consequently, the idealization of
a planet as a particle, which we have used in Chapter 4, is not perfectly realis-
tic, even though it does suffice to explain the precession of perihelion. In order to
keep the universe from shrinking due to gravity, or expanding without limit due to
an excess of energy, Einstein replaced the field equation written above by a slight
modification, to obtain the now-standard Einstein field equations
$$G_{\mu\nu} + \Lambda g_{\mu\nu} = \frac{8\pi G}{c^4}\,T_{\mu\nu}. \tag{7.2}$$
The constant Λ was known to be small, and Eddington remarked at the time that its
value was mostly theoretical. It allowed the universe to keep its size. As it happens,
however, the solutions to Eq. (7.2) are not stable with respect to perturbations in
the cosmological constant Λ, as Einstein called it. Einstein was therefore led to
abandon this idea. As his friend George Gamow ([32], p. 44) recalled—inaccurately,
some think—
Thus, Einstein’s original gravity equation was correct, and chang-
ing it was a mistake. Much later, when I was discussing cosmo-
logical problems with Einstein, he remarked that the introduction
of the cosmological term was the biggest blunder he ever made in
his life. But this “blunder,” rejected by Einstein is still used by
cosmologists even today, and the cosmological constant denoted by
the Greek letter Λ rears its ugly head again and again and again.
It was accepted for many decades that the universe was, in fact, expanding, so
that the constant Λ was not needed. As Gamow noted, however, it did not disappear
from the literature, and recent evidence that the expansion is itself speeding up has
brought it to prominence once again. Even Einstein’s “blunder” turns out to have
considerable merit.
2. Further Developments
Our discussion of general relativity has now gone up to the border of the Promised
Land of general relativistic dynamics. We do not intend to cross that border,
however. Our intention was only to get the reader mathematically closer to that
step, while making various common-sense observations along the way about what
we were doing. We have carried the story up to the year 1916, when Einstein
published the exposition of his efforts to explain gravitation over a period of several
years. His earlier work, now superseded, had attracted attention, and as early as
1915, Karl Schwarzschild had produced the exact solution of the field equations for
the case of a tiny particle orbiting a heavy one in empty space. Also during that
time, one of the giants of twentieth-century mathematics, David Hilbert (1862–
1943) had begun to study the problem. Among other things, Hilbert studied the
motion of a particle near a (nonrotating, stationary) black hole, as we shall briefly
do in Section 4 below. Hilbert noticed certain peculiarities of this metric, notably
that energy did not appear to be conserved, in contrast to special relativity and
Newtonian mechanics. He asked his collaborators Felix Klein and Emmy Noether
(1882–1935) to look into this problem. Noether did, and produced one of the most
3. “TEMPORONAUTICS” AND THE GÖDEL ROTATING UNIVERSE 315
4 An anonymous reviewer pointed out that I have slighted Hilbert, who was the first to write
down what is now called the Einstein–Hilbert action, and also probably discovered the Einstein
equation, for which he gave priority to Einstein.
5 This word is new as far as I know, although, considering the great linguistic fecundity of
writers like Wells and Asimov, I wouldn’t be surprised to learn that someone has already coined
it. A story I read some 60 years ago contained the beautifully descriptive word chronoclasm to
describe a catastrophe brought about by a time traveler meddling with the past. Temporonaut
seems to be the appropriate scientific-sounding term for a time traveler.
6 In the only example of fusion outside a star up to now, the hydrogen bomb, a fission bomb
provides the energy that causes the fusion of hydrogen into helium.
or in a confined space when some catastrophe occurs. When they step outside at
the end of their journey, or after the catastrophe, they find they are in the same
place where they started and not noticeably older than they were, but they are
either at a time before they left or many centuries later. We shall ignore the latter,
on the grounds that it appears to be possible on the basis of special relativity.
The science-fiction scenario suggests a way to state the problem of travel into the
past formally. To explain the problem, we consider the world-line of a particle in
space-time. Relativity provides two concepts of time for such a particle. There is
the laboratory time t of an observer—let that time correspond to the time passing
outside the particle where the temporonauts reside—and there is the proper time s
on the particle, which is the time shown on the clocks the temporonauts are carrying
with them. Our problem thus becomes to construct a path γ (not necessarily or
even usually a geodesic) that starts and ends at a single point of space-time, having
fixed laboratory coordinates.
With the usual interpretation of the coordinates in space-time, traveling into
the past cannot possibly mean simply standing in one place and commanding time
to flow backwards. When an object is standing in one place, its proper time differs
from laboratory time by a constant; and proper time, by definition, always increases.
Nor can it mean traversing a path parameterized by proper time s in such a way
that dt/ds > 0 at every point, yet t is smaller at the terminal value s1 than at
the initial value s0 . If dt/ds > 0 at every point, as any calculus student knows,
t is bound to get larger as s does. What we actually do is exhibit a space-time
metric in which there exists a closed time-like curve, that is, one on which proper
time serves as the parameter, yet which returns periodically (in proper time) to the
same point in laboratory space-time. The passengers on a particle traveling along
this curve see their own watches apparently keeping normal time, but in relation
to the outside world that uses time t, they keep returning to the same point at the
same time t. To picture what this means, imagine passengers on the Circle Line
in London or the Ring Line in Moscow riding a train that never stops. It keeps
going past the same finite number of stations at a uniform speed according to their
measurements. Yet mysteriously, the time on every station clock that they pass
always shows exactly the same time it showed the last time they passed through it.
This would truly be an extreme manifestation of déjà vu! It would not be surprising
if some science-fiction writer used this scenario as the basis of a story.7
Such a loop, which is truly strange, fits exactly the definition introduced by
Douglas Hofstadter ([42], p. 10):
The “Strange Loop” phenomenon occurs whenever, by moving up-
wards (or downward) through the levels of some hierarchical sys-
tem, we unexpectedly find ourselves right back where we started.
In the present case, where means a point in space-time, so that we find ourselves
not only where, but also when we started. Hofstadter illustrated the idea with the
1961 lithograph Waterfall, by Maurits Cornelis Escher (1898–1972). In this sketch,
the current is always directed downhill, as water should flow, and yet the stream
forms a loop, endlessly returning to its original height. If height replaces laboratory
time and current replaces proper time, this picture can be thought of as a closed
7 Indeed, it may have happened already, as those who have seen the movie Groundhog Day
might think.
time-like loop. Hofstadter later developed this concept in more detail in an entire
book [43] devoted to the topic. He did not, however, give this example, which
involves a creation of Kurt Gödel in an area remote from his work on logic and
language, which Hofstadter had expounded so masterfully earlier.
We have to make a rather complicated change of variable (due to Gödel) in
order to interpret this metric in normal space-time with a certain distribution of
rotating matter. Whether models of this type are physically realizable appears to
be an open question. The one we are defining is not very stable; you need the
cosmological constant to be very precisely related to a constant matter density ρ,
in fact by the equation Λ = −2πGρ/c2 . We are not going to discuss the practicality
of this model, however. What we are about to embark on is purely an adventure
of the human mind.
(Note added March 4, 2016: Momin’s article was posted on-line at the University
of Toronto in 2015, but appears to have been removed since.)
Theoretically, the variable v should be restricted to the interval (−π, π), since
tan(v/2) is not defined at the endpoints of that interval. For any positive
constant b, however, the function 2 arctan(b tan(v/2)) approaches −π as v ↓ −π and
approaches π as v ↑ +π. We can thus define this function, say for π < v < 3π,
by letting it be 2π + 2 arctan(b tan((v − 2π)/2)) on that interval. Through such a
procedure, this function can be extended to a continuous function of v for all real
values of v, whose value equals v at every multiple of π. In fact, it is
differentiable when so extended, since its derivative tends to 1/b as v → ±π. With
this extension, τ becomes a continuous periodic function of v, having period 2π.
Obviously, the same is true of ξ, η, and ζ when t, u, and w are held fixed.
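The extension of 2 arctan(b tan(v/2)) described above can be sketched in a few lines of Python. The function name stretched_angle and the value b = 3 are hypothetical choices for illustration; the sketch only checks the claims about this auxiliary function (continuity across v = π, slope 1/b there, agreement with v at multiples of π), not the definition of τ itself, which is not reproduced here.

```python
import math


def stretched_angle(v, b):
    """Continuous extension of f(v) = 2 arctan(b tan(v/2)) to all real v.

    On each interval ((2k - 1)pi, (2k + 1)pi) the value is
    2k pi + 2 arctan(b tan((v - 2k pi)/2)), as in the text; at the odd
    multiples of pi, where tan is undefined, the limiting value v is used.
    """
    k = math.floor((v + math.pi) / (2.0 * math.pi))   # interval index
    w = v - 2.0 * math.pi * k                          # shifted into [-pi, pi)
    if abs(abs(w) - math.pi) < 1e-15:
        return v
    return 2.0 * math.pi * k + 2.0 * math.atan(b * math.tan(w / 2.0))


b = 3.0       # any positive constant (illustrative choice)
h = 1e-5
left, right = stretched_angle(math.pi - h, b), stretched_angle(math.pi + h, b)
print(left, right)                        # both close to pi: no jump at v = pi
print((right - left) / (2 * h), 1 / b)    # slope across v = pi is close to 1/b
print(stretched_angle(7.0, b) - stretched_angle(7.0 - 2 * math.pi, b))  # approximately 2*pi
```

By construction f(v + 2π) = f(v) + 2π, so it is the difference f(v) − v that repeats with period 2π.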
We can now see that the straight line t = 0, u = a, z = 0, −∞ < v < +∞,
maps to a closed curve that is traversed infinitely many times. What we shall show
is that, if a is suitably chosen, the variable v is the proper time σ on this path, so
that it is timelike (dσ² > 0).
[Figure 7.2: the closed curve in the ξη-plane, together with the station-clock time τ as a function of the proper time σ.]
In fact, as Mathematica will easily compute, in terms of the variables t, u, v, w,
the metric is
7
√ 1
dσ = 4 dt + 2 2 sinh (u) dt dv − du +
2 2 2 2
− cosh(2u) + cosh(4u) dv − dz .
2 2
8 8
Given that dt = du = dz = 0 along this path, we see that we will get ds2 = dv 2
√ 1
√ √
if cosh(2u) = 2 + 2, that is, u = a = 2 ln 2 + 2 + 5 + 4 2 . The path
returns to the point in space-time whose coordinates are (0, 2a, 0, 0) whenever v is
a multiple of 2π.
A geometric representation of this curve in the ξη-plane is shown in Fig. 7.2,
along with the value of the time τ on the “station clocks” in terms of the time
σ shown on the wrist watches of the passengers on this “rapid-transit” line. If
the train sets out from the leftmost point on the loop at noon, on both station
clocks and clocks inside the cars, and travels counterclockwise around the loop, our
passengers find that τ > σ early on, that is, the station clocks show a later time
than their own clocks. But fairly soon, the station clocks show the latest time they
will show and then begin to run backwards, as seen from inside the cars. By the
time the rightmost point of the loop is reached, the station clocks the passengers
see will have moved back to noon, while the clocks inside the cars continue to
move ahead at a uniform rate. Over the second half of the loop, the station clocks
continue to retreat to a time before noon, but then begin to run forward again,
getting back to noon exactly as the car arrives at its starting point.
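The choice of a can be confirmed numerically. The sketch below assumes that the dv² coefficient of the metric is 4(7/8 − cosh 2u + (1/8) cosh 4u), that is, 4(sinh⁴u − sinh²u), as in Gödel's rotating coordinates; this reading of the metric is an assumption, and the function name g_vv is hypothetical. It evaluates that coefficient at the value of a given above.

```python
import math

def g_vv(u: float) -> float:
    """Coefficient of dv^2, ASSUMED to be 4*(7/8 - cosh 2u + cosh(4u)/8)."""
    return 4.0 * (7.0 / 8.0 - math.cosh(2 * u) + math.cosh(4 * u) / 8.0)

# The value of a quoted in the text: a = (1/2) ln(2 + sqrt 2 + sqrt(5 + 4 sqrt 2)).
a = 0.5 * math.log(2 + math.sqrt(2) + math.sqrt(5 + 4 * math.sqrt(2)))

print(f"a        = {a:.6f}")
print(f"cosh(2a) = {math.cosh(2 * a):.6f}   2 + sqrt(2) = {2 + math.sqrt(2):.6f}")
print(f"g_vv(a)  = {g_vv(a):.12f}")  # 1 means d(sigma)^2 = dv^2 along the loop
# Identity used in the text: 7/8 - cosh 2u + cosh(4u)/8 = sinh^4 u - sinh^2 u
print(f"identity gap = {g_vv(a) / 4 - (math.sinh(a)**4 - math.sinh(a)**2):.2e}")
```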
Remark 7.5. Any path along which t, u, and w are constant is a closed curve.
The Euler equations show that for the fixed value u = (1/4) ln(4 + √15), the corresponding
closed curve is a geodesic. It is not, however, timelike. The passengers on
such a “rapid transit loop” would have to have imaginary mass, but at least they
wouldn’t require any fuel!
320 7. THE GEOMETRIZATION OF GRAVITY
4. Black Holes
We introduced the Schwarzschild radius rs = 2GM/c² in Chapter 4 and noted that
it is the parameter that determines the curvature of space-time in the gravitational
field of a single particle. Although it is a tiny distance (only about 3 km in the case
of the Sun), its effects are noticeable at distances tens of millions of times greater
than itself. In the case of Mercury, the effect is a small amount of precession of the
perihelion of the orbit, as seen by an observer at rest relative to the Sun. It needs
to be emphasized that nothing in particular “happens” at this radius. The coordinates
have singularities there, but there is no physical singularity,
and it is possible to change to coordinates that do not even have a mathematical
singularity at the Schwarzschild radius; such coordinates are valid for all positive values of the
distance to the origin.
But not at the origin itself; that is a true physical singularity. In the simplified
model we have been using, in which the attracting body is regarded as a massive
particle, that particle is a black hole. Having positive mass M, it possesses a
positive Schwarzschild radius rs = 2GM/c², while its own radius (the radius of a
particle) is zero. Hence the entire mass is inside the Schwarzschild radius, and by
definition, that makes it a black hole.
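The sizes quoted for rs follow directly from rs = 2GM/c². The short Python sketch below uses rounded textbook values for G, c, and the two masses, and a mean Sun–Mercury distance for the ratio; these constants are assumptions not given in this section.

```python
G = 6.674e-11            # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8              # speed of light, m/s
M_SUN = 1.989e30         # kg
M_EARTH = 5.97e24        # kg
MERCURY_ORBIT = 5.79e10  # m, mean Sun-Mercury distance

def schwarzschild_radius(mass_kg: float) -> float:
    """r_s = 2 G M / c^2, in metres."""
    return 2 * G * mass_kg / C**2

rs_sun = schwarzschild_radius(M_SUN)
rs_earth = schwarzschild_radius(M_EARTH)
print(f"Sun:   r_s = {rs_sun:.0f} m")
print(f"Earth: r_s = {rs_earth * 1000:.2f} mm")
print(f"Mercury's orbit / r_s(Sun) = {MERCURY_ORBIT / rs_sun:.2e}")
```

This gives about 2.95 km for the Sun, about 8.9 mm for the Earth, and a ratio of roughly 2 × 10⁷ for Mercury's orbit.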
The processes by which black holes form involve nuclear physics beyond the
scope of the present text, and so our inspection of some “macroscopic” physical
phenomena associated with them will be cursory. Our main interest lies in the
region just outside the Schwarzschild radius rs , and we assume that the source
of the gravitational field lies entirely inside that radius. The gravitational law of
Chapter 4, which requires the vanishing of the Ricci tensor, holds at distances larger
than rs . The simplest way to study this region is to imagine a second particle or a
light ray falling directly toward the gravitating particle. This model will reveal some
bizarre aspects of relativistic gravitation, not at all what the Newtonian picture
would lead us to expect. The stripped-down model we are going to study, which
amounts to an object falling directly into a black hole, allows us to neglect the
altitude and right-ascension coordinates ϕ and θ, so that we have only one spatial
dimension to consider, for which we use the radial coordinate r.
4.1. Falling toward a center: the Newtonian case. We assume that the
particle starts from rest at distance r0 and falls directly to the origin. The initial-value
problem to be solved is

$$r'' = -\frac{GM}{r^2}, \qquad r(0) = r_0, \qquad r'(0) = 0.$$

The usual trick of letting p = r′ in a second-order ordinary differential equation
that does not contain t explicitly works: since r″ = p dp/dr, the equation becomes
p dp/dr = −GM/r², so that p²/2 = GM(1/r − 1/r0), which shows that

$$r' = -\sqrt{2GM}\,\sqrt{\frac{1}{r} - \frac{1}{r_0}}.$$

(The negative sign is present because r is a decreasing function of time.)
This equation implies that r′ tends to −∞ as r decreases to zero, and hence
the kinetic energy of the particle becomes arbitrarily large. Correspondingly, the
potential energy that would be required to move it away from the origin is infinite.
(The force holding it there would be infinitely large.)
This equation cannot be solved in closed form for r as a function of t, but the
substitution r = r0 cos²θ allows us to solve it implicitly, getting the equation

$$t = \sqrt{\frac{r_0}{2GM}}\Bigl(r_0\arccos\sqrt{\frac{r}{r_0}} + \sqrt{r_0 r - r^2}\Bigr).$$

From this relation, we see that the falling particle will reach the origin in a finite
time t0:⁸

$$t_0 = \frac{\pi}{2}\sqrt{\frac{r_0^3}{2GM}}.$$
By Kepler’s third law, this is equal to the period it would have if orbiting in a
circle at distance ∛2 · r0/4. If r0 is the radius of the Earth’s orbit, this corresponds
to an orbit somewhere inside the orbit of Mercury (about 48 million kilometers from
the Sun). More colorfully, if some catastrophe caused the Earth’s forward motion
in its orbit to cease, it would fall directly into the Sun, taking a little more than
two months to do so.
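The fall time and the equivalent circular orbit can be checked with a few lines of Python. This is a sketch under stated assumptions: r0 is taken to be the mean Earth–Sun distance, and the standard value of GM for the Sun is used; neither constant appears in this section.

```python
import math

GM_SUN = 1.32712440018e20   # m^3/s^2, heliocentric gravitational constant
AU = 1.495978707e11         # m, mean Earth-Sun distance, used here as r0

def fall_time(r0: float, gm: float) -> float:
    """Newtonian time to fall from rest at r0 to the centre: (pi/2) sqrt(r0^3 / (2 GM))."""
    return 0.5 * math.pi * math.sqrt(r0**3 / (2 * gm))

def circular_period(r: float, gm: float) -> float:
    """Kepler's third law for a circular orbit of radius r: 2 pi sqrt(r^3 / GM)."""
    return 2 * math.pi * math.sqrt(r**3 / gm)

t0 = fall_time(AU, GM_SUN)
r_equiv = 2 ** (1 / 3) * AU / 4   # radius of the circle whose period equals t0
print(f"fall time: {t0 / 86400:.1f} days")
print(f"equivalent circular radius: {r_equiv / 1e9:.1f} million km")
print(f"period at that radius: {circular_period(r_equiv, GM_SUN) / 86400:.1f} days")
```

With the mean Earth–Sun distance, the fall takes about 64.6 days and the equivalent circular orbit has a radius of about 47 million km, close to the figures in the text.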
8 Notice the occurrence of the “geometric” constant π in this equation, which describes a
physical phenomenon. The occurrence of π in this relation and in Kepler’s third law may be
considered a foreshadowing of the geometrization of physical laws.
9 This same singularity at the origin disturbed Gabriel Lamé (1795–1870) in his attempt [49]
to explain light as a disturbance in an elastic medium. Knowing that he could not ignore the
case of light emanating from a point source, he invoked a rather loosely-argued principle involving
the ether, which essentially discarded all the Newtonian mechanics on which his own analysis had
been based up to that point.
bow in, say, the 24th century. Mary, who journeyed from London to Massachusetts
in the eighteenth century and then to a nearby star and back in the 21st century,
is now setting out on the ultimate travel adventure: into a black hole. All she has
to do, provided we remove the rest of the universe, is to launch herself straight
at it. Gravity will take care of everything else. We’ll let her shrink to particle-size
for the journey. We do this for her safety, since a body of any measurable size
would be torn apart by tidal forces at the event horizon, the Schwarzschild radius
rs. The speed of information c with which the twins communicated back in the
21st century cannot be improved on. At that time, we noted that, due to the large
distances, messages would take a long time to get from one twin to the other. In
this 24th-century exploit, communication undergoes a further degradation, since c ↓ 0 as Mary
approaches the event horizon.
To start the discussion with a result that appears to come out of nowhere (its
source will be explained below), we express the coordinates r and t that John uses
to keep track of his sister in terms of Mary’s proper time s as follows:

$$r = r_0\Bigl(1 - \frac{3c\sqrt{r_s}\,s}{2r_0^{3/2}}\Bigr)^{2/3}, \tag{7.4}$$

$$t = t_0 + s - \frac{2\sqrt{r_s r}}{c} - \frac{r_s}{c}\ln\frac{\sqrt{r} - \sqrt{r_s}}{\sqrt{r} + \sqrt{r_s}}, \tag{7.5}$$

where t0 is chosen so that r = r0 when s = t = 0, that is,

$$t_0 = \frac{2\sqrt{r_s r_0}}{c} + \frac{r_s}{c}\ln\frac{\sqrt{r_0} - \sqrt{r_s}}{\sqrt{r_0} + \sqrt{r_s}}.$$
When the value of r from Eq. (7.4) is inserted into Eq. (7.5), the result is a very
complicated expression, but one that is at least theoretically computable. These
equations actually come from the Euler equations for geodesics in the metric given
by Eq. (7.3). It is not difficult to verify that they satisfy these equations, that is,
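$$\frac{d}{ds}\Bigl[2\Bigl(1 - \frac{r_s}{r}\Bigr)\frac{dt}{ds}\Bigr] = 0,$$

$$-\frac{d}{ds}\Bigl[\frac{2r}{c^2(r - r_s)}\frac{dr}{ds}\Bigr] = \frac{r_s}{r^2}\Bigl(\frac{dt}{ds}\Bigr)^2 + \frac{r_s}{c^2(r - r_s)^2}\Bigl(\frac{dr}{ds}\Bigr)^2.$$

The second of these equations is messy, as it would have been back in Chapter 4,
had we not replaced it with the “integrated” equation that, in the present situation,
becomes

$$1 = \Bigl(1 - \frac{r_s}{r}\Bigr)\Bigl(\frac{dt}{ds}\Bigr)^2 - \frac{r}{c^2(r - r_s)}\Bigl(\frac{dr}{ds}\Bigr)^2.$$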
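The claim that the formulas for r and t in terms of s satisfy these equations can be spot-checked numerically. The sketch below works under stated assumptions: units with c = 1 and rs = 1, a starting radius r0 = 10, and r(s) taken from the relation r^{3/2} = r0^{3/2} − (3/2)c√rs s obtained later in this section from Mary's equation dr/ds = −c√rs/√r. The helper names are hypothetical. The sketch differentiates t(s) and r(s) by central differences and checks that (1 − rs/r) dt/ds stays equal to 1 and that the integrated equation holds.

```python
import math

# Illustrative units (an assumption, not from the text): c = 1, r_s = 1, start at r0 = 10.
C, RS, R0 = 1.0, 1.0, 10.0

def r_of_s(s: float) -> float:
    """Radius after proper time s: r^(3/2) = r0^(3/2) - (3/2) c sqrt(r_s) s."""
    return (R0**1.5 - 1.5 * C * math.sqrt(RS) * s) ** (2.0 / 3.0)

def log_terms(r: float) -> float:
    """2 sqrt(r_s r)/c + (r_s/c) ln((sqrt r - sqrt r_s)/(sqrt r + sqrt r_s)), for r > r_s."""
    a, b = math.sqrt(r), math.sqrt(RS)
    return 2 * math.sqrt(RS * r) / C + (RS / C) * math.log((a - b) / (a + b))

T0 = log_terms(R0)  # the constant t0, chosen so that t = 0 when s = 0

def t_of_s(s: float) -> float:
    """Eq. (7.5): John's coordinate time t in terms of Mary's proper time s."""
    return T0 + s - log_terms(r_of_s(s))

def residuals(s: float, h: float = 1e-5):
    """Defects in (1 - r_s/r) dt/ds = 1 and in the integrated equation at proper time s."""
    r = r_of_s(s)
    dt = (t_of_s(s + h) - t_of_s(s - h)) / (2 * h)
    dr = (r_of_s(s + h) - r_of_s(s - h)) / (2 * h)
    first = (1 - RS / r) * dt - 1
    integrated = (1 - RS / r) * dt**2 - r * dr**2 / (C**2 * (r - RS)) - 1
    return first, integrated

for s in (0.0, 5.0, 10.0, 15.0):  # all before the horizon is reached (s0 is about 20.4 here)
    print(f"s = {s:4.1f}  r = {r_of_s(s):7.4f}  residuals = {residuals(s)}")
```

The first residual checks the first Euler equation with the constant of integration equal to 1, which is the case treated below.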
The solution of the first of the Euler equations is

$$\frac{dt}{ds} = \frac{\varepsilon r}{r - r_s},$$

where ε is a positive constant. The integrated form of the second equation then
becomes

$$\Bigl(\frac{dr}{dt}\Bigr)^2 = c^2\Bigl[\Bigl(1 - \frac{r_s}{r}\Bigr)^2 - \frac{1}{\varepsilon^2}\Bigl(1 - \frac{r_s}{r}\Bigr)^3\Bigr].$$

This form of the equation was pointed out by Hilbert ([40], p. 289), who assumed
units in which c = 1 and wrote α for rs and A for the constant −1/ε².
The solution divides into three cases according as ε < 1, ε = 1, and ε > 1.
We are confining ourselves to the case ε = 1, which is the simplest. It is a routine
though time-consuming computation to show that the variables t and r are related
by the equation

$$\ln\frac{\sqrt{r} + \sqrt{r_s}}{\sqrt{r} - \sqrt{r_s}} - \frac{2}{3}\Bigl(\frac{r}{r_s}\Bigr)^{3/2} - 2\Bigl(\frac{r}{r_s}\Bigr)^{1/2} = K + \frac{ct}{r_s}, \tag{7.6}$$

where again K is a constant such that r = r0 at time t = 0:

$$K = \ln\frac{\sqrt{r_0} + \sqrt{r_s}}{\sqrt{r_0} - \sqrt{r_s}} - \frac{2}{3}\Bigl(\frac{r_0}{r_s}\Bigr)^{3/2} - 2\Bigl(\frac{r_0}{r_s}\Bigr)^{1/2}.$$
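From the preceding equations we find the velocity v and acceleration a of the
falling particle:

$$v = \frac{dr}{dt} = -c\Bigl(1 - \frac{r_s}{r}\Bigr)\sqrt{\frac{r_s}{r}},$$

$$a = \frac{d^2r}{dt^2} = -\frac{c^2 r_s}{2r^2}\Bigl(1 - \frac{r_s}{r}\Bigr)\Bigl(1 - \frac{3r_s}{r}\Bigr).$$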
The acceleration is negative (toward the center) when r > 3rs , but positive
(away from the center) when rs < r < 3rs . Both velocity and acceleration tend to
zero as r ↓ rs , showing that if Mary reached the event horizon (the Schwarzschild
radius rs ) at a finite time t on John’s clock, she would be “stuck there.” There
would be problems in communicating as Mary approached the event horizon, since
the speed of light decreases to zero at that distance from the black hole. Messages
would take longer and longer to send, eventually requiring an arbitrarily large
amount of time to go from one sibling to the other, even though the distance
between them remains bounded. From John’s point of view, Mary would be falling
forever, but never reaching the event horizon.
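Equation (7.6) and the formulas for v and a can be cross-checked against each other. In the sketch below (illustrative units c = rs = 1 and r0 = 10, all assumptions, with hypothetical function names), t(r) is read off from (7.6), its derivative dt/dr is compared with 1/v, and the sign of a is printed on both sides of r = 3rs.

```python
import math

C, RS, R0 = 1.0, 1.0, 10.0  # illustrative units (assumption): c = 1, r_s = 1, r0 = 10 r_s

def lhs_76(r):
    """Left-hand side of Eq. (7.6), written with x = sqrt(r / r_s); valid for r > r_s."""
    x = math.sqrt(r / RS)
    return math.log((x + 1) / (x - 1)) - (2 / 3) * x**3 - 2 * x

K = lhs_76(R0)  # makes r = r0 at t = 0

def t_of_r(r):
    """John's coordinate time at which the falling body is at radius r, solved from (7.6)."""
    return (RS / C) * (lhs_76(r) - K)

def velocity(r):
    return -C * (1 - RS / r) * math.sqrt(RS / r)

def acceleration(r):
    return -(C**2 * RS / (2 * r**2)) * (1 - RS / r) * (1 - 3 * RS / r)

def dt_dr(r, h=1e-6):
    """Central-difference derivative of t_of_r."""
    return (t_of_r(r + h) - t_of_r(r - h)) / (2 * h)

for r in (9.0, 5.0, 2.0, 1.1):
    print(f"r = {r:4.1f}  dt/dr = {dt_dr(r):10.5f}  1/v = {1 / velocity(r):10.5f}  a = {acceleration(r):+.5f}")
```

As r approaches rs, t_of_r grows without bound, which is the numerical counterpart of Mary never reaching the horizon on John's clock.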
From Mary’s point of view, on the other hand, her location r at time s satisfies
the very simple differential equation

$$\frac{dr}{ds} = -\frac{c\sqrt{r_s}}{\sqrt{r}},$$

so that

$$r = \Bigl(r_0^{3/2} - \frac{3c\sqrt{r_s}\,s}{2}\Bigr)^{2/3},$$

and the event horizon is reached at the finite proper time

$$s_0 = \frac{2\bigl(r_0^{3/2} - r_s^{3/2}\bigr)}{3c\sqrt{r_s}}.$$
Proper time s continues to flow, from s0 to s0 + 2rs /(3c). At that point, Mary
(shrunk to particle-size, remember) reaches the “pole” r = 0, which is a genuine
space-time singularity, thereby becoming the Roald Amundsen of the twenty-fourth
century.
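To put numbers to the contrast between John's unbounded coordinate time and Mary's finite proper time, the sketch below evaluates s = 2(r0^{3/2} − r^{3/2})/(3c√rs), which follows from the formula for r above. It assumes a hypothetical black hole of 10 solar masses, with Mary starting at r0 = 10 rs; both choices are made only for illustration.

```python
import math

C = 2.998e8                                   # m/s
RS = 2 * 6.674e-11 * (10 * 1.989e30) / C**2   # assumed example: a 10-solar-mass black hole

def proper_time_to(r: float, r0: float) -> float:
    """Mary's proper time to fall from r0 to r on the epsilon = 1 path: 2(r0^1.5 - r^1.5)/(3 c sqrt(r_s))."""
    return 2 * (r0**1.5 - r**1.5) / (3 * C * math.sqrt(RS))

r0 = 10 * RS                       # assumed starting radius
s0 = proper_time_to(RS, r0)        # proper time to reach the event horizon
s_end = proper_time_to(0.0, r0)    # proper time to reach r = 0
print(f"r_s = {RS / 1000:.1f} km")
print(f"s0 = {s0 * 1e3:.3f} ms, s_end - s0 = {(s_end - s0) * 1e6:.1f} microseconds")
```

For these choices Mary crosses the horizon after about 2 ms of her own time and reaches r = 0 about 66 microseconds later, in agreement with the interval 2rs/(3c).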
As Eq. (7.6) shows, t → ∞ as r ↓ rs . If we were to extrapolate Mary’s journey
into an imaginary remote past, we would see that r → +∞ as t → −∞. Thus, if we
imagine that she has been falling forever, then she was located at points arbitrarily
remote from the black hole at times sufficiently long ago. Her trajectory, as seen in
coordinates fixed with origin at the center of the black hole, is shown in Fig. 7.3.
[Figure 7.3: Mary’s trajectory in John’s coordinates, with r plotted against t; the curve passes through (0, r0) and approaches the horizon r = rs, marked by a dashed line through (0, rs), as t → ∞.]
the opposite direction. That is, Mary has been moving away from the radius rs
from all eternity, and will continue in that direction, getting arbitrarily distant.
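Equation (7.6) becomes

$$\ln\frac{\sqrt{r} - \sqrt{r_s}}{\sqrt{r} + \sqrt{r_s}} + \frac{2}{3}\Bigl(\frac{r}{r_s}\Bigr)^{3/2} + 2\Bigl(\frac{r}{r_s}\Bigr)^{1/2} = K + \frac{ct}{r_s}, \tag{7.7}$$

where again K is chosen so that r = r0 at time t = 0 = s.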
Remark 7.7. If r0 < rs, that is, if Mary began her journey inside the black
hole (presumably, the twins were born there), she would, from John’s perspective,
never get out. Equation (7.6) would be replaced by

$$\ln\frac{\sqrt{r} + \sqrt{r_s}}{\sqrt{r_s} - \sqrt{r}} - \frac{2}{3}\Bigl(\frac{r}{r_s}\Bigr)^{3/2} - 2\Bigl(\frac{r}{r_s}\Bigr)^{1/2} = K + \frac{ct}{r_s}, \tag{7.8}$$

with K again chosen to make r = r0 at time t = 0 = s. The equation is valid
only as long as Mary stays away from any matter inside the black hole. Since we
are regarding the black hole as being due to a particle, that requirement imposes
no restriction. Depending on which direction time is flowing, her acceleration, as
observed by John, is

$$a = \pm\frac{c^2 r_s}{2r^2}\Bigl(1 - \frac{r_s}{r}\Bigr)\Bigl(1 - \frac{3r_s}{r}\Bigr).$$

This quantity does not change sign when r < rs. Hence, from John’s point of view,
Mary approaches either 0 or rs as time becomes infinite, never reaching either limit.
This brief and superficial excursion into the mysteries of a black hole is the
last of the observations we intend to make on this subject. Our final chapter will
be devoted to a chronology of important papers in the theory of relativity and to
some commonsense metaphysical speculation about what it all means for our view
of the natural world.
5. Problems
Problem 7.1. Imagine a tunnel dug all the way through the Earth along one
of its diameters. (Idealize the Earth as a perfect sphere and imagine its atmosphere
has disappeared, so that the inside of the tunnel is a perfect vacuum.) What would
happen to a particle dropped down the very center of the tunnel? (Use Newtonian
reasoning.)
Problem 7.2. Show that a plane whose first fundamental form is given by
Eq. (7.3) has curvature κ(r) depending only on r and given by

$$\kappa(r) = -\frac{c^2 r_s}{r^3}.$$

(Thus, somewhat surprisingly, given that the parameterization has a singularity at
r = rs, there is no singularity in the curvature at that point.)
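Problem 7.3. Show that the Laplace–Beltrami operator on the plane with
first fundamental form (7.3) is

$$\nabla^2 f = \frac{r}{r - r_s}\frac{\partial^2 f}{\partial t^2} - \frac{c^2(r - r_s)}{r}\frac{\partial^2 f}{\partial r^2} - \frac{c^2 r_s}{r^2}\frac{\partial f}{\partial r}.$$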
Problem 7.4. Assuming ε = 1, confirm Hilbert’s statement that a body falling
toward a black hole has acceleration toward the black hole if its speed v = r′(t)
satisfies |v| < c(r − rs)/(√3 r) and away from it if |v| > c(r − rs)/(√3 r).
Problem 7.5. Show that the space-time given by the Gödel metric has no
singularities. Also show that the Gaussian curvature of the section tangent to any
plane containing the y-axis is 0, that of the sections tangent to the tx and tz planes
is −ω 2 , and that of the section tangent to the xz-plane is −3ω 2 . Show finally that
the scalar curvature is −2ω 2 .
Problem 7.6. Prove that the manifold M given by Gödel’s original metric is
a homogeneous space, in the sense that, for any two points P and Q in M, there is
an isometry T_PQ : M → M (a one-to-one mapping of M onto itself that preserves
the metric) such that T_PQ(P) = Q. To do this, show that the mapping

$$T_{PQ}(t; x, y, z) = \bigl(t + q^1 - p^1;\ x + q^2 - p^2,\ e^{p^2 - q^2}(y - p^3) + q^3,\ z + q^4 - p^4\bigr) = (\tau; \xi, \eta, \zeta)$$

is a one-to-one diffeomorphic (infinitely differentiable, in fact, analytic) isometry of
M onto itself mapping P = (p¹; p², p³, p⁴) to Q = (q¹; q², q³, q⁴).
Problem 7.7. Those who appreciate the power and beauty of the theory of
analytic functions of a complex variable may yearn to see this theory extended to
three-dimensional space R3. Is it possible to create such an extension? A number
of considerations come to mind, algebraic, geometric, and analytic.
The algebraic consideration is that R3 is not a field, but the plane R2 is.
(The latter is the field of complex numbers.) Indeed, it was his attempt to find a
suitable definition of multiplication for elements of R3 that led Hamilton to discover
quaternions, which define a multiplication operation on R4. Thus, there is an algebraic
barrier to such an extension.
The geometric barrier is even more formidable. Analytic function theory provides
a plethora of conformal mappings. By the Riemann mapping theorem, any
simply connected open subset of the plane that has at least two boundary points can be
conformally mapped onto the unit disk. In contrast, only a very restricted class of
mappings of open sets of R3 can be conformal.
Nevertheless, analysis can still forge ahead and define a mapping F : R3 → R3 ,
given by F (x, y, z) = u(x, y, z)i + v(x, y, z)j + w(x, y, z)k, to be conjugate-analytic
if ∇ × F = 0 and ∇ · F = 0. In particular, the Newtonian gravitational force is a
conjugate-analytic function on R3 \ {0}. Show that, when w ≡ 0 and u and v are
independent of z (so that F maps the plane into itself), these equations reduce to
the Cauchy–Riemann equations for the function f (z) = f (x+iy) = u(x, y)−iv(x, y),
namely ∂u/∂x = −∂v/∂y and ∂u/∂y = ∂v/∂x. Thus if F = ui + vj is identified
with the complex function f (x + iy) = u(x, y) + iv(x, y), it is the conjugate of an
analytic function. Use these equations to show that u and v are harmonic functions,
that is, ∇2 u = 0 = ∇2 v.
Also show that the components u(x, y, z), v(x, y, z), and w(x, y, z) of a con-
jugate-analytic function F = u(x, y, z) i + v(x, y, z) j + w(x, y, z) k are harmonic
functions.
Part 3
332 8. EXPERIMENTS, CHRONOLOGY, METAPHYSICS
are themselves the products of high technology; their functioning and, consequently,
also the measurements they provide are interpreted using other physical theories.
Nobody makes a direct measurement any more, in the sense this phrase used
to have. We take as our starting point the metaphysics of space and time and then
extend it to look at other physical concepts, two of which are no longer regarded
as viable entities by the physics community, and others that still are. The main issue
we intend to discuss is the appropriate language in which discussion of physical
laws should take place, and what meaning is to be attached to statements that
refer to these abstract physical entities. All of that presupposes people engaged in
conversation on this subject. Unless one is attempting to communicate a picture of
reality to someone else, there is no need to worry at all about the appropriate modes
of thought for dealing with the physical world. At the end of this lengthy digression
into many centuries of philosophy, we offer two rather commonplace suggestions for
nonspecialists to consider: (1) Keep pondering the subject and revising your mental
pictures of the occult objects that are used in physics, since new ways of thinking
are desirable, but when discussing the subject, treat them as undefined terms; (2)
in any discussion, concentrate on the mathematical functions (field intensity, for
example) that can be measured, since the measurements encode what we actually
know. This advice is reinforced with two analogies from mathematical practice,
namely the use of axiom systems with undefined terms in a variety of areas and the
particular use of various coordinate systems in differential geometry as a common
language for discussing the geometry of a surface.
In the eighth section, we let two of the pioneers in general relativity, Albert
Einstein and Sir Arthur Eddington, sum up their arguments in favor of (what was,
when they wrote) the radical new approach to physics incorporated in the general
theory of relativity.
In the ninth and final section, we sign off with a brief glance at the acceptance
of relativity theory around the world and some reflections on the acceptance (or
rejection) of scientific theories in general, leaving open the question whether the
geometry that has played so large a role in relativity will retain its prominence in
future physical theories.
the rest of the matter in the universe. That view makes two seemingly unverifiable
predictions: (1) if the bucket could be made to remain still while all the rest of the
matter rotated around it, the water would climb up the sides of the bucket; (2) if
all the rest of the matter in the universe were removed and the water were rotated,
it wouldn’t rise. It has been suggested that the effect could be tested by putting
water into a container encased in a heavy metal sphere and then spinning the sphere
rapidly; but the engineering involved in doing so is formidable, since the sphere has
to stand in for an immense quantity of matter—all the rest of the universe. It might
appear that Mach’s principle is the “only way out” if we wish to preserve relativity
of motion, but it turns out that certain modifications of general relativity can do
without it. In any case, the “total amount of matter in the universe” seems just as
occult and unknowable a concept as absolute space. Indeed, what is meant by “all
the matter in the universe”? Do we know that this quantity is constant? Are we
sure matter isn’t being created somewhere and destroyed elsewhere, as quantum
theory allows? If it depends on time, are the fluctuations large enough to make any
difference to observation? Whose clock is authorized to define the time on which it
depends?
General relativity has always had to compete with rival theories. The most
formidable and long-lasting of these, still “in the running” after half a century
(though just barely, and now a minority point of view), is the Brans–Dicke theory, named
for its creators Carl Brans (1935– ) and Robert Henry Dicke (1916–1997)—see
their paper [4]—in which a more elaborate version of the Einstein field equations is
supplemented by a scalar equation involving the Laplace–Beltrami operator. The
scalar equation lays hands on the most sacred principle of general relativity, the
constancy of the gravitational “constant” G, allowing it to vary. According to Will
([86], pp. 152–153), Dicke was led to this theory by a (very) rough computation
showing that G could be estimated as T c3 /M , where T is the time since the origin
of the universe and M the total amount of matter in the universe; in that respect,
it dovetails nicely with Mach’s principle. The Brans–Dicke theory and general
relativity make closely similar quantitative predictions, and both are compatible
with the observed precession of the perihelion of Mercury and the deflection of
light around the Sun within the limits of observational error. The Brans–Dicke
theory rules out fewer things because it contains a parameter that can be chosen
to fit experimental data. As more and more precise experiments during the 1970s
seemed to tell in favor of general relativity, that parameter had to be continually
adjusted upward to remain compatible with observation. Because of it, the Brans–
Dicke theory is harder to falsify than general relativity would be, and also more
complicated. For those who take the principle popularly and inaccurately known
as Occam’s Razor seriously, those facts speak in favor of general relativity, which
remains the dominant paradigm at the moment.
A large number of experimental tests of general relativity have been systemati-
cally catalogued and discussed by Clifford M. Will ([86], [87]). We have mentioned
already (Chapter 4) the experiments of Baron Eötvös to detect a difference between
gravitational and inertial mass and their negative results. This question arose again
after the promulgation of general relativity, since it also posits the equality of these
two masses.1
Here is a short list of the many experimental tests of general relativity, mostly
taken from the book of Will [86]. Later results can be found in an on-line article
[87] at the following url:
http://www.livingreviews.org/lrr-2006-3
Kepler’s third law can be stated by saying that the average angular
velocity dθ/dt of a body orbiting in a circle of radius r is given by

$$\Bigl(\frac{d\theta}{dt}\Bigr)^2 = \frac{4\pi^2}{T^2} = \frac{GM}{r^3} = \frac{r_s c^2}{2r^3}.$$

We therefore get (r²/c²)(dθ/dt)² = rs/(2r), and it follows that for a
satellite

$$\Bigl(\frac{ds}{dt}\Bigr)^2 = 1 - \frac{3r_s}{2r}.$$

The higher the satellite is, the closer its clock runs to the time in free
space where there is no gravity. For a clock at the surface of the Earth,
the value of dθ/dt is only half of what it is for a typical GPS satellite.
There are about 30 of these satellites functioning at the moment, out of
some 65 that have been launched; they revolve around the Earth twice a
day with an orbital radius r ≈ 27,000 km. If we somewhat unrealistically
think of the clock on the ground as being in an orbit of radius re equal to
the radius of the Earth (about 6400 km), then the equation relating the
proper time on that clock to free-space, gravity-free time is

$$\Bigl(\frac{ds}{dt}\Bigr)^2 = 1 - \frac{r_s}{r_e} - \frac{r_s r_e^2}{8r^3}.$$

Since the mass M of the Earth is 5.97 × 10²⁴ kg and its Schwarzschild
radius rs is therefore about 9 millimeters (8.8 × 10⁻³ m), the discrepancy
in ds/dt between the satellite and the ground clocks amounts to

$$\sqrt{1 - \frac{3r_s}{2r}} - \sqrt{1 - \frac{r_s}{r_e} - \frac{r_s r_e^2}{8r^3}} \approx 4.442 \times 10^{-10},$$
that is, about 0.444 nanoseconds per second. As a result, the discrepancy
between them would exceed the 30-nanosecond limit of tolerance in a little
over one minute. If computations didn’t correct for this error, the whole
system would soon become unstable. We have here a good example of the
practical usefulness of theory. No one would simply send up a bunch of
satellites without knowing that this effect was to be expected. Relativistic
effects are very small by traditional standards, but in the modern world
of super-precise engineering, they have to be taken into account.
(The preceding computation is meant to have only heuristic value.
We might equally well have considered a satellite in orbit at the radius
of the Earth. Its angular velocity would be given by Kepler’s third law,
and the resulting discrepancy between the two values of ds/dt, that is,

$$\sqrt{1 - \frac{3r_s}{2r}} - \sqrt{1 - \frac{3r_s}{2r_e}},$$

would be 0.786 nanoseconds per second.
The physical principle is what we are after rather than a precise analysis.)
• As early as 1916, Einstein developed the mathematics of gravitational
waves, but they remained a curiosity for some 60 years, until the discov-
ery of a binary pulsar some 1600 light years distant. If the two stars
in this binary system change each other’s shape, then they change each
other’s gravitational fields, and their oscillation about each other thereby
produces a gravitational wave. It is too small to be detected from Earth
of course, but it carries energy away from the system, causing the two
stars to move closer together and thereby shortening their period of revolution.
The effect is extremely small, but with diligent observation, it
was detected (see the book by Taylor and Weisberg [80]): gravitational
waves do exist, just as theory predicts.
• The hypothesis that the speed of propagation of electromagnetic radiation
in free space is independent of the wavelength has recently been
tested by measuring a gamma-ray burst from a distant
supernova. Even though the explosion occurred billions of years ago, and the
various frequencies of gamma rays have been traveling all that time to
reach Earth, they all reached Earth within the span of a fraction of a sec-
ond, as reported in Nature Physics, 16 March 2015. (Evidently, we may
infer, this uniformity of speed applies only in Deep Space. Where matter
is present, different frequencies must propagate at different speeds, else a
prism would not be able to decompose light into a spectrum.)
http://phys.org/news/2015-03-einstein-scientists-spacetime-foam.html
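The clock-rate arithmetic in the GPS item of the list above can be reproduced with the text's own rounded inputs (rs = 8.8 × 10⁻³ m, r = 27,000 km, re = 6400 km). This is a minimal Python sketch of that heuristic model, not a model of the real GPS system, and the function names are illustrative.

```python
import math

# Rounded inputs taken from the text (heuristic model, not real GPS engineering).
RS = 8.8e-3     # Schwarzschild radius of the Earth, m
R_SAT = 2.7e7   # GPS orbital radius, m
R_E = 6.4e6     # radius of the Earth, m

def rate_orbit(r):
    """ds/dt for a clock in circular orbit of radius r: sqrt(1 - 3 r_s / (2 r))."""
    return math.sqrt(1 - 3 * RS / (2 * r))

def rate_ground(re, r_sat):
    """ds/dt for a clock at radius re turning at half the satellite's angular velocity."""
    return math.sqrt(1 - RS / re - RS * re**2 / (8 * r_sat**3))

d1 = rate_orbit(R_SAT) - rate_ground(R_E, R_SAT)   # satellite vs. ground clock
d2 = rate_orbit(R_SAT) - rate_orbit(R_E)           # satellite vs. clock orbiting at r_e
seconds_to_30ns = 30e-9 / d1                       # time to accumulate a 30 ns offset

print(f"d1 = {d1:.4e}  ({d1 * 1e9:.3f} ns per second)")
print(f"d2 = {d2:.4e}  ({d2 * 1e9:.3f} ns per second)")
print(f"30 ns exceeded after about {seconds_to_30ns:.0f} s")
```

With these inputs the first discrepancy is about 4.442 × 10⁻¹⁰, and the second comes out near 0.787 ns/s, which agrees with the text's 0.786 to within the rounding of the inputs. The 30-nanosecond tolerance is exceeded after roughly 68 seconds.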
2. Chronology
We list here some important events in the development of physics (mechanics and
optics only), giving the approximate date when they occurred and their significance
for the theory of relativity.2 The development of physical theory is conveniently
divided into three periods. The first consists of the earliest work on mechanics and
optics, up to the end of the seventeenth century. The second period, from roughly
1700 through 1900, contains great advances in the application of calculus to both
geometry and physics and the mathematization of both optics and electromagnetic
theory; it is the “classical” period for mechanics, optics, and electromagnetism,
all three of which were profoundly affected by relativity. The third period, from
1900 on, is the period of relativity theory and especially the geometrization of all
three areas of physics, with forces being supplanted by fields producing a curved
space-time.
2.1. Mechanics and optics. Some of the concepts of modern science have
roots that are very ancient, although, as will be apparent from these descriptions,
their original form differs considerably from the form they now have. This first
period, two millennia long, saw the creation of all the concepts we associate with
Newtonian mechanics.
ca. 330 BCE: Aristotle (384–322 BCE) or one of his students writes the
treatise Physics, in which the concept of force (dynamis) is defined. The
force applied to a “thing moved” is said to be directly proportional to
its size and the distance moved and inversely proportional to the time
required to move it that distance.
ca. 200 BCE: Diocles (ca. 240–180 BCE) writes a work On Burning Mir-
rors, in which he establishes the reflective property of parabolic mirrors
(all rays of light parallel to the axis of a paraboloid of revolution are re-
flected to the focus of the generating parabola). Fragments of this work
survived by accident, having been quoted by the early sixth-century com-
mentator Eutocius, who was writing about Archimedes and Apollonius.
2 I wish to warn the reader that developments in science tend to be more gradual than would
appear from this list. Major breakthroughs are nearly always preceded by extensive preliminary
work extending over considerable time, and ideas ascribed to one person or another are usually
complex assemblages of ideas contributed by many people.
ca. 50: Heron of Alexandria (ca. 10?–ca. 75?) writes Catoptrics (reflection),
in which he notes that the reflection property (angle of incidence equals
angle of reflection) means that reflected light takes the shortest path from
one point to another, given that it must first travel to the reflecting plane.
He didn’t think of this as a path of minimal time because he thought light
traveled instantly, as did much later writers such as Fermat (see below).
ca. 150: Ptolemy (ca. 85–ca. 165) publishes five books of Optics containing
results he (allegedly) obtained by experiment. The book contains a table
of angles of incidence θ and refraction ϕ for light at the interface of water
and air at increments of 10◦ from incidence angles of 10◦ through 80◦ .
This table is fairly close to the true values, but reveals itself to the
mathematically trained eye to be a strict quadratic function, one we would write
as (with θ and ϕ in degrees)

$$\phi = \frac{33}{40}\theta - \frac{1}{400}\theta^2.$$

It seems likely that it was this formula (or a finite-difference calculation
from which it can be derived), rather than observation, that led to
Ptolemy’s table. If we take the relative velocities of light in air and water
to have the ratio 4:3, the actual law is better represented by

$$\phi = \frac{3}{4}\theta - \frac{1}{60{,}000}\theta^3.$$
Still, Ptolemy was only off by 10% with the first coefficient, and his pro-
cedure (whatever it was) did show that the second term needs to be sub-
tracted.
Ptolemy also wrote a treatise on astronomy under the title Math-
ematike Syntaxis (Mathematical Treatise), better known by its hybrid
Greek–Arabic name Almagest, based on the hypothesis that the motions
of the Sun, Moon, and planets can be described by superpositions of uni-
form circular motions known as epicycles, that is, circles turning on other
circles. In that form, as Sternberg has pointed out [77], this hypothesis
can be described in modern terms by saying that the coordinates of the
planets are almost-periodic functions of time. That hypothesis remains
true, even when the simple periodic elliptic orbits of Newtonian mechanics
are replaced by their relativistic counterparts.
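To make "circles turning on other circles" concrete, the sketch below writes a planet's position as a complex number, a sum of uniform circular motions r·e^{i(ωt+φ)}. When the angular speeds are incommensurable, the motion never exactly repeats, yet it is a finite sum of periodic terms, the simplest kind of almost-periodic function. The radii, speeds, and phases are hypothetical, not Ptolemy's parameters.

```python
import cmath
import math


def epicycle_position(t: float, circles: list[tuple[float, float, float]]) -> complex:
    """Position at time t as a superposition of uniform circular motions.

    Each circle is (radius, angular_speed, phase), contributing
    radius * exp(i * (angular_speed * t + phase)).
    """
    return sum(r * cmath.exp(1j * (omega * t + phase)) for r, omega, phase in circles)


# Hypothetical deferent of radius 10 carrying an epicycle of radius 3.
# The ratio of angular speeds is irrational, so the path never closes.
circles = [(10.0, 1.0, 0.0), (3.0, math.sqrt(2), 0.0)]

for t in (0.0, 1.0, 2.0, 3.0):
    z = epicycle_position(t, circles)
    print(f"t={t}: x={z.real:7.3f}, y={z.imag:7.3f}")
```

Whatever the time, the distance from the center stays between 10 − 3 and 10 + 3, as the triangle inequality requires.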
ca. 984: Ibn Sahl (ca. 940–1000) writes On Burning Mirrors and Lenses,
which contains the law of refraction that we now call Snell’s law: The
ratio sin θ : sin ϕ is the same for all angles of incidence θ. Judging from
the figure he drew, it seems that he connected refraction with a difference
in the speed of light in the two media, measured by the ratio of distances
traveled in equal times. This law was rediscovered seven centuries later
in Europe and apparently stated by three people independently. It was
first remarked on by Thomas Harriot (1560–1621) in 1602, then by Wille-
brord Snel van Royen (1580–1626) in 1621. Neither of them published it,
however. It was finally published in 1637 by René Descartes (1596–1650),
in his treatise Dioptrics (refraction). Descartes’ argument was less than
convincing; it was disputed in particular by Fermat.
338 8. EXPERIMENTS, CHRONOLOGY, METAPHYSICS
[Fig. 8.1: Path C → B → A across a planar interface; h and k are the distances of C and A from the interface, θ and ϕ the angles at B, and the two media have “resistances” r1 and r2.]
to point A in another will cross a planar interface between the two media
at a point B such that the sum CB + rBA is minimized, r being the ratio
of the two resistances. His line of reasoning was rather mystical, based
on a faith that, as Fermat said, “Nature does nothing in vain.” That is,
Nature doesn’t waste effort. Even though he allowed that light could be
propagated with infinite speed, he thought Nature required some “effort”
to do so, an effort that was directly proportional to the distance traveled
multiplied by the resistance and hence within a medium of constant “re-
sistance” directly proportional to the length of the path traversed. One
can easily see that if effort is replaced by time, the mathematics of Fer-
mat’s reasoning remains the same, and now the ratio of distance to effort
is replaced by the speed of light. Fermat noted, as Heron had done, that
in a single medium where reflection takes place, minimizing time or ef-
fort results in the equality of the angles of incidence and reflection. He
suggested applying the same reasoning to the case of refraction at the in-
terface between two media. That leads to the law that ibn Sahl discovered.
In his letter, which can be found in the work by Ross ([69], pp. 51–55),
he considers only the case r = 1/2, which is a difficult enough problem.
(See Fig. 8.1.) Since in that figure CB = h sec θ and BA = k sec ϕ, where
h and k are the perpendicular distances from C and A respectively to the
planar interface, Fermat is requiring that h sec θ+(k/2) sec ϕ be minimized
subject to the constraint that h tan θ + k tan ϕ is constant. As calculus
was not yet fully developed—Fermat was himself one of its pioneers, and
was an expert at finding minima—that problem was, as he noted, a very
difficult one. But he seems to have been the first to state the principle
now known as Fermat’s principle, as a minimization problem. Of course,
from the practical point of view, he had no way to compute the ratio of
the two “resistances” in advance. It could be computed from a table of
known angles of incidence and refraction, but all that could have been
verified empirically is that the ratio, which is the ratio of the sines of the
two angles, is the same for all angles of incidence.
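Fermat's minimization problem can now be checked numerically. The sketch below puts C at height h above the interface and A at depth k below it, with horizontal separation L (the constant in h tan θ + k tan ϕ), and minimizes the "effort" CB + r·BA over the position x of B. It uses ternary search, which is valid because the effort is a convex function of x. It then checks that the minimizing angles satisfy sin θ / sin ϕ = r. The values h = k = 1 and L = 3 are arbitrary; r = 1/2 is the case Fermat treated.

```python
import math


def effort(x: float, h: float, k: float, L: float, r: float) -> float:
    """CB + r*BA when B lies a horizontal distance x from the foot of C."""
    return math.hypot(h, x) + r * math.hypot(k, L - x)


def fermat_point(h: float, k: float, L: float, r: float, steps: int = 200) -> float:
    """Minimize the convex effort over 0 <= x <= L by ternary search."""
    lo, hi = 0.0, L
    for _ in range(steps):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if effort(m1, h, k, L, r) < effort(m2, h, k, L, r):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2


h, k, L, r = 1.0, 1.0, 3.0, 0.5
x = fermat_point(h, k, L, r)
theta = math.atan2(x, h)      # angle of incidence, so that h tan θ = x
phi = math.atan2(L - x, k)    # angle of refraction, so that k tan ϕ = L - x
print(f"B at x = {x:.6f}; sin θ / sin ϕ = {math.sin(theta) / math.sin(phi):.6f}")
```

Setting the derivative of the effort to zero gives sin θ − r sin ϕ = 0 directly, which is the constant-ratio law; the search merely confirms it without calculus.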
3 In a note published in the March 1686 Acta eruditorum, Leibniz gave what he called a short
demonstration of an error made by “Descartes and others” (meaning, of course, Newton). These
people, he said, misuse the law of conservation of quantity of motion. It was, he said, mv², not
mv, that was conserved. Of course, both are conserved in elastic collisions, but Leibniz wanted the
amount of work put into an object to be the same as what could be got out of it. In that respect,
he was off by a factor of 2, since if you lift a mass m to height h, the work you have done is mgh.
If you then drop it, it will reach ground level at speed v = √(2gh), so that mv² = 2mgh. That is
why we now write kinetic energy as mv²/2. What Leibniz, Fermat, Descartes, and Newton all
agreed on was that conservation laws were important.
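The kinematic step behind the speed v = √(2gh) in footnote 3 is short. For a body falling from rest with constant acceleration g for a time t:

```latex
\begin{aligned}
v &= g t, \qquad h = \tfrac{1}{2} g t^{2}
  \;\Longrightarrow\; t^{2} = \frac{2h}{g}, \\
v^{2} &= g^{2} t^{2} = 2 g h, \\
m v^{2} &= 2 m g h, \qquad \tfrac{1}{2} m v^{2} = m g h .
\end{aligned}
```

So mv² is twice the work mgh, while mv²/2 matches it exactly.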
the unit sphere at the origin, taking the ratio of the area of the projec-
tion to the area on the surface, and letting the area on the surface then
shrink to a point. By a rather arduous route, this approach led him to
an explicit formula for curvature in terms of the metric coefficients and
their first and second partial derivatives with respect to the parameters. (He
did not verify that this number was independent of the parameterization,
a crucial point in modern mathematics.) Knowing that the sum of the
angles of a triangle is two right angles only if Euclid’s parallel postulate
holds, he sent crews up three mountains with lanterns, instructing each of
them to measure the angle subtended by the other two. As happened with
Galileo’s attempt to measure the speed of light, however, the instruments
and method used were not sensitive enough to make the measurement:
He found that the sum was two right angles within the limits of probable
error. Any curvature of physical space that might exist was too small to
be detected at that scale.
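Why a triangle of mountains could not settle the question follows from a result Gauss proved in his work on surfaces: on a surface of constant curvature K, the angles of a small geodesic triangle exceed two right angles by approximately K times its area. The sketch below computes that excess on a sphere of radius R (so K = 1/R²), both exactly from the spherical law of cosines and from the estimate K × area. The 100 km side and the radii of curvature are hypothetical values chosen only to show the scale.

```python
import math


def excess_exact(side: float, radius: float) -> float:
    """Angle excess (radians) of an equilateral geodesic triangle on a sphere.

    With central angle a = side / radius, the spherical law of cosines gives
    each vertex angle A through cos A = cos a / (1 + cos a).
    """
    a = side / radius
    vertex_angle = math.acos(math.cos(a) / (1 + math.cos(a)))
    return 3 * vertex_angle - math.pi


def excess_estimate(side: float, radius: float) -> float:
    """Small-triangle estimate: curvature K = 1/radius**2 times the flat area."""
    flat_area = math.sqrt(3) / 4 * side**2
    return flat_area / radius**2


side_km = 100.0                      # hypothetical triangle side
for radius_km in (1e4, 1e6, 1e8):    # hypothetical radii of curvature
    arcsec = math.degrees(excess_exact(side_km, radius_km)) * 3600
    print(f"R = {radius_km:.0e} km: excess ≈ {arcsec:.3g} arcseconds")
```

The excess falls off as 1/R², so once the radius of curvature is large compared with the triangle, the deviation from two right angles sinks below any instrumental error.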
1826: Nikolai Ivanovich Lobachevskii (1792–1856) develops “imaginary ge-
ometry,” which we now call hyperbolic geometry. The basic parts of this
subject had been developed in the eighteenth and early nineteenth centuries
by a number of scholars, some of whom thought they were proving
the parallel postulate. The first of them in Europe was Girolamo Saccheri
(1667–1733), who developed the subject very far but let his strict rigor lapse
on one very small point, just enough to convince himself that he had proved the
postulate and to publish Euclides ab omni nævo vindicatus (Euclid acquitted of
every blemish).
Another who worked on the problem was Johann Heinrich Lambert (1728–
1777), who made the prescient remark that what we call hyperbolic plane
geometry had the trigonometry of a sphere of imaginary radius. (Actually,
what we now inaccurately call Saccheri quadrilaterals and Lambert
quadrilaterals had been studied centuries earlier by yet another mathematician
trying to establish the parallel postulate: Thabit ibn-Qurra (836–901).)
Gauss himself realized early on that it was impossible to prove the paral-
lel postulate, but he published nothing on the subject. In 1816, however,
he wrote to his student Christian Ludwig Gerling (1788–1864) suggesting
that if the parallel postulate were false, it would be sensible to take as a
universal unit of distance the side of an equilateral triangle whose angles were
extremely close to 60°. Two years later, he was surprised to receive from Gerling a paper
the latter had received from a Marburg lawyer named Karl Schweikart
(1780–1859), developing what Schweikart called astral geometry and was
in essence hyperbolic geometry.
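Gauss's suggestion rests on a feature of hyperbolic geometry: there the angle of an equilateral triangle determines the length of its side, so an angle can fix an absolute unit of length, whereas in Euclidean geometry every equilateral triangle, of any size, has angles of exactly 60°. The sketch below uses the hyperbolic law of cosines for angles, cos A = −cos B cos C + sin B sin C cosh(a/R), in a plane of curvature −1/R². The function name and the choice R = 1 are illustrative.

```python
import math


def equilateral_side(angle_deg: float, R: float = 1.0) -> float:
    """Side of an equilateral triangle with the given angles in a hyperbolic
    plane of curvature -1/R**2.

    Setting A = B = C in cos A = -cos B cos C + sin B sin C cosh(a/R)
    gives cosh(a/R) = cos A / (1 - cos A).
    """
    if not 0 < angle_deg <= 60:
        raise ValueError("hyperbolic equilateral triangles have angles in (0°, 60°]")
    c = math.cos(math.radians(angle_deg))
    # max() guards against rounding pushing the argument just below 1 near 60°.
    return R * math.acosh(max(1.0, c / (1 - c)))


for angle in (30, 45, 55, 59, 59.9, 59.999):
    print(f"angles {angle}°  ->  side {equilateral_side(angle):.4f} R")
```

As the angles approach 60°, the side shrinks toward zero: triangles whose angles are "extremely close to 60°" are small compared with the curvature radius, yet each angle still picks out one definite length.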
The ideas of hyperbolic geometry were slowly being recognized, but
Lobachevskii deserves the credit for being the first to publish on the sub-
ject, albeit only in the very obscure Proceedings of the Kazan’ Physico-
mathematical Society. Five years later, in 1831, János Bólyai (1802–1860)
published an excessively condensed version of the theory as an appen-
dix to a book written by his father Farkas Bólyai (1775–1856), a former
classmate of Gauss. Both Lobachevskii and János Bólyai developed the
trigonometry of the hyperbolic plane, confirming what Lambert had said.
It took some time for these ideas to gain acceptance, and mathematical
cranks continued to dispute them, far into the twentieth century. (See the
The mathematical background of relativity now being laid out, we turn to the
physics behind it. By 1850, Newton’s proposal that light consists of a stream of
particles had been out of favor for some time, as the interference experiments of
Thomas Young (1773–1829) showed that it behaved like a wave. The wave theory
had been pioneered by Huygens, who provided the important principle that each
point of the “wave surface” acted as the source of a new wave. Complicated phenom-
ena such as double refraction, seen in crystals like Iceland spar, were considered by
Huygens, who finally solved the problem by picturing the wave surface as a sphere
internally tangent to an ellipsoid of revolution. Later, in 1816, Augustin-Jean
Fresnel (1788–1827) considered a more complicated type of double refraction, finding
the wave surface to be given by a fourth-degree polynomial equation in the spatial
variables x, y, and z (actually, the equation is quadratic in x², y², and z²). In parallel with this work
on optics, the theory of electricity and magnetism was developing rapidly, and in
1846, Gauss and his collaborator Wilhelm Weber (1804–1891) produced a formula
now known as Weber’s law giving the mutual force of repulsion between two like
charges in relative motion. That formula involved a velocity that, in 1855, Weber
determined to be 4.3945 × 10¹⁰ cm/sec. The following year, Gustav Kirchhoff
(1824–1887) remarked that Weber’s velocity was apparently the velocity of light
multiplied by √2.⁴
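Kirchhoff's observation is easy to reproduce from the figure quoted above: dividing Weber's constant by √2 gives a speed close to that of light. The modern defined value of c used for comparison is, of course, not part of the historical record.

```python
import math

WEBER_CONSTANT_CM_S = 4.3945e10   # Weber's 1855 value, as quoted in the text
C_MODERN_CM_S = 2.99792458e10     # modern defined speed of light, in cm/s

implied_c = WEBER_CONSTANT_CM_S / math.sqrt(2)
relative_gap = implied_c / C_MODERN_CM_S - 1

print(f"Weber's constant / √2 = {implied_c:.4e} cm/sec")
print(f"difference from the modern value: {relative_gap:+.1%}")
```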
1851: Armand Hippolyte Louis Fizeau (1819–1896) publishes “Sur les hy-
pothèses relatives à l’éther lumineux,” (On the hypotheses concerning the
ether [29]), describing an experiment in which he caused light to pass
through flowing water to determine if its speed is affected by the motion
of the medium in which it propagates. He found that its velocity did ap-
pear to increase when the medium moved in the direction of propagation,
but that the increase was not as much as would be predicted if it were an
elastic disturbance in the medium itself. Einstein later cited this result as
an important “retrodiction” of his special theory of relativity.
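The partial dragging Fizeau observed is what Einstein's velocity-addition law predicts. If light moves at c/n in water of refractive index n, and the water moves at speed v, the relativistic sum (c/n + v)/(1 + v/(nc)) exceeds c/n by approximately v(1 − 1/n²) (Fresnel's drag coefficient), not by the full v that simple addition would give. The sketch checks this numerically; n = 1.333 and v = 7 m/s are illustrative assumptions, not Fizeau's data.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s


def add_velocities(u: float, v: float, c: float = C) -> float:
    """Relativistic sum of two collinear velocities."""
    return (u + v) / (1 + u * v / c**2)


n = 1.333   # approximate refractive index of water (assumed)
v = 7.0     # illustrative flow speed of the water, m/s (assumed)

light_in_still_water = C / n
light_in_moving_water = add_velocities(light_in_still_water, v)

# Fraction of the water's speed that is added to the speed of light.
drag = (light_in_moving_water - light_in_still_water) / v
print(f"fraction of v added: {drag:.4f}  (1 - 1/n² = {1 - 1 / n**2:.4f}; simple addition gives 1)")
```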
1861: James Clerk Maxwell (1831–1879) writes to William Thomson (Lord
Kelvin, 1824–1907) to announce his computation of the speed with which
a self-sustaining electromagnetic wave must propagate, given the known
values of dielectric permittivity ε and magnetic permeability μ. (The
velocity is 1/√(εμ).) The known values of ε and μ gave Maxwell a value of
193,088 miles per second, close enough to the speed of light to be “more
than a coincidence.” He wrote that the magnetic and luminiferous media
must be the same and that “Weber’s number is really, as it appears to
be, one-half of the velocity of light in millimeters per second.” (According
to Siegel ([75], p. 139), Weber’s system of measurements differed from
Maxwell’s by a factor of √2. Thus when Maxwell did the computation, he
came out with a different value for the constant. I am grateful to Adrian
Rice for pointing out this reference.) He thus unknowingly duplicated
what Riemann had concluded three years earlier.
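Maxwell's formula can be evaluated with today's constants. The sketch below computes 1/√(εμ) from modern SI values of the vacuum permittivity and permeability (CODATA 2018 figures, an anachronism relative to Maxwell's data) and converts his 193,088 miles per second to kilometers per second, assuming the modern international mile of 1.609344 km.

```python
import math

EPSILON_0 = 8.8541878128e-12   # vacuum permittivity, F/m (CODATA 2018)
MU_0 = 1.25663706212e-6        # vacuum permeability, N/A² (CODATA 2018)
KM_PER_MILE = 1.609344         # international mile (assumed for the conversion)

c_from_constants = 1 / math.sqrt(EPSILON_0 * MU_0)   # m/s
maxwell_km_s = 193_088 * KM_PER_MILE                 # Maxwell's figure, km/s

print(f"1/sqrt(eps0*mu0) = {c_from_constants:,.0f} m/s")
print(f"Maxwell's 193,088 mi/s = {maxwell_km_s:,.0f} km/s")
```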
1887: Edward Williams Morley (1838–1923) and Albert Abraham Michel-
son (1852–1931) publish the paper “On the relative motion of the Earth
and the luminiferous ether” [59], describing an experiment they conducted
at what is now Case Western Reserve University in Cleveland, Ohio. This was the
famous Michelson–Morley experiment designed to detect the motion of
the Earth through the presumed medium (ether) that was the conductor
of light waves. The experiment showed no evidence of any such motion.
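The size of the effect the experiment was looking for can be estimated from the classical ether theory. Rotating the interferometer through 90° should shift the interference pattern by about 2Lv²/(λc²) fringes, where L is the optical path length of each arm, v the Earth's speed through the ether, and λ the wavelength. The values below are assumptions for illustration: L = 11 m (the figure usually quoted for the 1887 apparatus), v = 30 km/s (roughly the Earth's orbital speed), and λ = 550 nm.

```python
C = 299_792_458.0  # speed of light, m/s


def expected_fringe_shift(arm_length: float, speed: float, wavelength: float) -> float:
    """Lowest-order ether-theory prediction for a 90° rotation: 2 L v² / (λ c²)."""
    return 2 * arm_length * speed**2 / (wavelength * C**2)


shift = expected_fringe_shift(arm_length=11.0, speed=30e3, wavelength=550e-9)
print(f"predicted shift ≈ {shift:.2f} fringe")
```

A shift of a few tenths of a fringe was well within the instrument's sensitivity, which is why a null result carried so much weight.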
apparently repeating Weber’s computation in his own units, arrived at a value equal to one-half
of the velocity of light, as will be seen below.
1915: Einstein publishes four papers on general relativity, the third of which
was “Erklärung der Perihelbewegung des Merkur aus der allgemeinen Rel-
ativitätstheorie,” (Explanation of the movement of the perihelion of Mer-
cury from the general theory of relativity [19]) and the last of which was
“Feldgleichungen der Gravitation,” (Gravitational field equations [20]).
This last paper contained the final form of these field equations.
1916: Karl Schwarzschild (1873–1916) publishes two papers on general rela-
tivity, the first of which was “Über das Gravitationsfeld eines Massenpunk-
tes nach der Einsteinschen Theorie,” (On the gravitational field of a point
mass according to the Einstein theory [74]) and contained a more compli-
cated version of what we called the Schwarzschild solution in Chapter 4.
The second was “Über das Gravitationsfeld einer Kugel aus incompress-
ibler Flüssigkeit,” (On the gravitational field of a ball of incompressible
fluid [73]), and it considered the more general case of a continuous distri-
bution of matter.
1916: Einstein publishes an extended, systematic exposition of relativity as
“Die Grundlage der allgemeinen Relativitätstheorie,” (Foundations of the
general theory of relativity, Annalen der Physik, Vierte Folge, 49, 769–
822).
1916: David Hilbert (1862–1943) gives a course in foundations of physics,
published in 1924 as Grundlagen der Physik ([40], [41]), a series of papers
in Mathematische Annalen. In these papers, he attempted to axiomatize
physics (an activity he was fond of) and gave a thorough discussion of the
dynamics of a particle or light ray falling into a black hole.
1916: De Sitter publishes “Einstein’s theory of gravitation and its astronom-
ical consequences” [11], a thorough analysis of the free-space gravitational
equations of general relativity.
1922: Alexander Friedmann (1888–1925) publishes “Über die Krümmung
des Raumes” (On the curvature of space [30]), which was exactly what
its title indicated.
1923: Élie Cartan (1869–1951) publishes “Sur les variétés à connexion affine
et la théorie de la relativité généralisée (première partie)” (On manifolds
with an affine connection and the general theory of relativity (part 1) [6]).
1924: Cartan publishes an extension of the previous work [7].
1924: Friedmann publishes “Über die Möglichkeit einer Welt mit konstanter
negativer Krümmung des Raumes” (On the possibility of a universe in
which space has constant negative curvature [31]), which explores the
possibility that the geometry of space could be hyperbolic rather than
elliptic.
1927: Georges Lemaître (1894–1966) publishes “Un univers homogène de
masse constante et de rayon croissant rendant compte de la vitesse radiale
des nébuleuses extra-galactiques” (A homogeneous universe of constant
mass and increasing radius taking account of the radial velocity of extra-
galactic nebulae [52]), anticipating the work of Hubble, whose name is
attached to the idea of an expanding universe.
1929: Edwin Hubble (1889–1953), apparently unaware of the earlier work
of Lemaître, publishes “A relation between distance and radial velocity
among extra-galactic nebulae” [45].
These quotations from Kant mark the summit of influence of Euclidean geometry
on science and philosophy. They came at the end of a long process of development
of human thought, beginning with the earliest purely practical needs of commerce
and surveying, becoming “philosophized” under the influence of Plato and Aristotle,
and culminating in the abstract concept of absolute space and time used by Newton
and analyzed by Kant. As the contrasting quote from Russell shows, Kant’s view
of absolute space and time fell into disfavor during the nineteenth and twentieth
centuries and was generally rejected. We are now going to make a survey of this rise
and fall with the aim of providing a wider view of the theory that was expounded
in detail in the previous chapters.
The attempt to understand the physical world is one of the oldest intellectual
pursuits of the human race, and mathematics has played an important role in it
from the beginning. The crux of the matter is precisely what is meant by “under-
standing” the physical world. A modest amount of reading through the records of
several millennia of metaphysical speculation has convinced this author that the
important issue in understanding the universe is the question, “How shall we talk
about the physical world?” Philosophers have debated endlessly such questions
as the difference between appearance and reality, essence and existence, and the
like. These questions have produced some interesting literature, but very little that
arises in the conversations of generally educated people in daily life. Because these
mysteries have been with the human race for so many centuries, the literature about
them is vast, much more than anyone could possibly read in a decade. In view of
that fact, the reader will perhaps indulge one whose experience is confined to the
short span of a single lifetime, and moreover a life not devoted to metaphysical
questions, for speculating on the meaning of all this human effort from the limited
perspective provided by that experience. The point I hope to make is that mod-
ern mathematics, including set theory, provides a model of human knowledge in
general, one that allows certain unknowable entities to remain unknowable in their
essence, but at the same time allows us to say that we know certain propositions
in which they are mentioned. Even though we abandon as fruitless the effort to
know (experience) what these entities are, however, we each must have some mental
picture of them to think about, and that need justifies the attempts of philosophers
to express their own picture of them in words. The transition from thought to
language is by no means easy or unimportant.
The basic concepts that have both geometric and mechanical significance are
mass, space (distance, area, volume), and time. These are the materials from
which mechanical theories are constructed. By combining these materials, physics
provides us with a variety of ways of imagining the world. By examining how this
process occurs and the ways people have thought about these concepts over the
centuries, we may arrive at a useful guide to thinking about physical laws. Of the
three abstractions just mentioned, mass has altered very little from earliest times.
The Greeks spoke of bodies rather than mass, but, as Archimedes’ work On Floating
Bodies demonstrates, they had the concept of density, from which the concept of
mass arises via the equation mass = density × volume. Our perception of mass
or weight is tactile rather than visual, and hence geometry enters into its analysis
indirectly, in such laws as Archimedes’ law of the lever. Moreover, the concepts of
mass and density did not change at all from the earliest times until the advent of
special relativity (see Chapter 2). For that reason, we shall confine our discussion
mostly to the concepts of time and space, which are both inherently geometrical.
After we look at some of what philosophers have said on these subjects, we can
examine the larger stock of objects that constitute modern physical theory, such
as gravitational, electric, and magnetic fields and compare them with now-obsolete
entities such as phlogiston and ether to see the extent to which we are justified in
regarding them as real.
Before we begin, we make one disclaimer: The sections that follow are aimed at
non-specialists, people who just want a way of thinking about informally described
physical theories in a way that does not ignore certain obvious conceptual difficulties
that would arise if they had to write an essay explaining what they know about
the subject. We do not presume to tell professional physicists or philosophers how
they ought to think about their subjects.
3.1. Measurement. Mass, length, area, and volume are the most intuitive of
the mathematical abstractions that play a role in physics. Standard weights and
lengths have been around as long as human civilization, and were crucial, along
with elementary geometry, for allocating land and trading in the produce of the
land. The basic geometric problem in surveying, for example, was to assign a
number (area) to a plot of land. Since plots of land come in different shapes, it was
necessary to have some means of assigning this number that would be intuitively
fair. That made the problem of comparing the sizes of plots of different shapes
an important one. This problem shows how uniform human intuition is across
many cultures. The same shapes (triangles, rectangles, circles) occur repeatedly
in mathematical texts from China, India, Mesopotamia, and Egypt. The presence
of rectangles among them, with the constant use of the formula area = width ×
length, shows that human intuition is essentially Euclidean. (Rectangles do not
exist in non-Euclidean geometry.)
This “numerical” geometry, which focused on the size of geometric figures,
became “philosophized” in one civilization, that of ancient Greece. The result was
an emphasis on the figure itself, its shape, and the proportions between the parts
of different figures, rather than mere quantity (length, area, volume, or weight).
Philosophers such as Aristotle and mathematical physicists such as Archimedes
spoke of particular shapes rather than space and of bodies rather than matter or
mass. Geometry, in addition to acquiring a logical structure, became the study of
proportion.
But the older, practical approach, in which quantity was the main issue (the
amount of land, or the weight or volume of grain, and the like) persisted and
reappeared in mathematical treatises that have survived from 2000 years ago. The
mathematicians Zenodorus (second century BCE) and Heron (first century CE)
wrote quantitatively about area (ἐμβαδόν), where Archimedes had written about
a surface (ἐπιφάνεια). Heron gave the area of a figure as an absolute number,
whereas Archimedes spoke only of the ratio of two surfaces. He said, for example,
that a sphere is four times as large as its equatorial disk. In Heron’s language
and ours this theorem says that the area of the surface of a sphere—thought of
as a number of square units—is four times the area of its equatorial disk. Greek
geometry, influenced by Platonism, represents a very beautiful, but ultimately self-
limiting, departure from the quantitative numerical geometry that came both before
and after it. Modern geometry takes something from both of these, but adds
3. SPACE AND TIME 353
the crucial element of symbolic algebra. We are not confined, as Heron was, to
computing particular areas in order to convey our ideas through examples; we can
say simply and generally that the area of a sphere of radius r is 4πr 2 , which is much
more efficient. Modern geometry also mixes in the methods of calculus to produce
the differential geometry that has made general relativity possible.
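Archimedes' ratio and Heron's absolute number can be reconciled numerically. The sketch below approximates a sphere by n inscribed conical bands (frusta of cones), a construction in the spirit of Archimedes' inscribed figures, sums their lateral areas, and compares the total with the equatorial disk πr². The radius and the numbers of bands are arbitrary.

```python
import math


def banded_sphere_area(r: float, n: int) -> float:
    """Total lateral area of n frusta inscribed in a sphere of radius r."""
    total = 0.0
    for i in range(n):
        phi0, phi1 = math.pi * i / n, math.pi * (i + 1) / n
        # Radii of the circles bounding this band, and the slant length between them.
        r0, r1 = r * math.sin(phi0), r * math.sin(phi1)
        slant = math.dist((r0, r * math.cos(phi0)), (r1, r * math.cos(phi1)))
        total += math.pi * (r0 + r1) * slant   # lateral area of a frustum
    return total


r = 2.0
for n in (4, 16, 1000):
    area = banded_sphere_area(r, n)
    print(f"n={n:4d}: area / disk = {area / (math.pi * r**2):.6f}")
print(f"4πr² = {4 * math.pi * r**2:.6f}")
```

The banded total works out to 4πr² cos(π/(2n)), slightly below 4πr² because the bands are inscribed, so the ratio to the disk tends to 4 as n grows.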
This may be a good place to mention the property of continuity that we as-
cribe to all three of the basic mechanical concepts.5 Being continuous, they present
certain challenges to those who would measure them. There are some interesting
philosophical subtleties involved in the progression from counting discrete collec-
tions of things, which is not problematic at all, to measuring continuous things.
The latter process involves choosing a unit of measurement for each type of thing
measured (kilogram, meter, second), and it always involves approximations and
usually the use of fractions. Units of measurement of continuous quantities are
pure conventions, which differ from one time and place to another. Most people
would consider themselves hard-used if they had to deal with lengths given in old
Russian arshins or ancient Greek parasangs, or even furlongs. Counting, in con-
trast, is universal. “Five sheep” will translate perfectly into its equivalent in any
other language. In the higher-level abstraction of Greek mathematics, it becomes
provable that it is not possible to choose, for example, a unit of length that will
exactly measure both the side and diagonal of a square. But we pass over all that
and take the measurement of continuous quantities as a given. Measurement is
an important element of science and is involved in both of the two philosophical
questions we are going to discuss: (1) What exists? (What is real?) (2) How do
two observers interpret each other’s language so as to know when they are talking
about the same object and whether they are in agreement as to what they are
saying about it?
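The Greek result on the side and diagonal of a square has a short classical proof, sketched here in modern notation. Suppose a unit u measured both the side s and the diagonal d exactly. If some common unit exists, enlarging it by the greatest common divisor of the two counts gives one for which the counts p and q have no common factor:

```latex
\begin{aligned}
&s = q\,u,\quad d = p\,u \quad (p, q \text{ positive integers with no common factor}),\\
&d^{2} = 2 s^{2} \ \text{(Pythagorean theorem)} \;\Longrightarrow\; p^{2} = 2 q^{2}
  \;\Longrightarrow\; p \text{ is even, say } p = 2m,\\
&4 m^{2} = 2 q^{2} \;\Longrightarrow\; q^{2} = 2 m^{2}
  \;\Longrightarrow\; q \text{ is even.}
\end{aligned}
```

Then p and q share the factor 2, contradicting their choice, so no such unit exists.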
3.2. The views of Immanuel Kant. One consequence of the Greek abstrac-
tion in geometry was the emergence of a still higher-order abstraction embracing
all the already-abstract geometric figures. That is the concept of space, in which
these shapes “live.” The concept was fully formed by modern times, as Newton’s
discussion of it (quoted in Chapter 1) attests. All modern languages have a word
for it. For a thorough discussion of the evolution of spatial concepts, see the mono-
graph by Jeremy Gray [36]. As mentioned above, Euclidean geometry reached its
high point in the seventeenth and eighteenth centuries, as shown by the works of
people such as Newton and Kant. We shall accordingly use their words, and the
words of their critics, as a springboard for launching our own discussion.
As the epigrams quoted at the beginning of this section show, the philosopher
Immanuel Kant laid down some cogent ideas on the subject of time, space, and
our knowledge of the external world, in his Critique of Pure Reason. While Kant’s
views have clearly visible defects from the modern point of view, they were stated
with great eloquence, and deserve to be taken seriously. Kant distinguished be-
tween two kinds of propositions. Analytic propositions are those that are true or
false “by definition,” such as the statement that every uncle has either a niece or
a nephew (true) and that herbivores live by eating field mice (false). Synthetic
5 We are going to ignore the contradictory fact that some modern physical theories actually
regard all three as being “made up” of discrete “atoms.” We have enough problems to deal with
already, and can’t afford to make things yet more complicated.
propositions are those in which the truth of the relation asserted between the sub-
ject and the predicate is not determined by the definitions of those two things.
Examples are the statement that Julius Caesar was murdered (true) and that
Genghis Khan invaded Russia in the nineteenth century (false). Crossing elegantly
with the analytic-synthetic distinction was Kant’s distinction between two kinds of
knowledge. A priori (literally “from before”) knowledge is knowledge for which the
logical grounds are innate to the human mind and independent of any experience.
We not only know it, we also know how to prove it logically. All analytic state-
ments that people know to be true belong to this class. A posteriori (literally “from
afterward”) knowledge is derived from perception or experience; examples are the
facts established by experimental science and history. All a posteriori knowledge
is synthetic. From those principles, the question naturally arises: Are analytic and
a priori propositions co-extensive, so that, by inference, synthetic and a posteri-
ori propositions are also? Those who can imagine the appropriate Venn diagram
will see that there remains one doubtful area: Is there such a thing as synthetic a
priori knowledge? Kant believed that there was, and he found examples of it in
mathematics, at least the arithmetic and geometry of his day. (He had nothing to
say about algebra, even though it had undergone an amazing development in the
seventeenth century.)
In the case of arithmetic, he thought that the equality 5 + 7 = 12 (his example)
was an assertion that is a priori, that is, we are born knowing that it must be true,
yet synthetic in that the concepts of 5, 7, and addition exist independently of the
number 12. Nowadays, this claim appears doubtful. Given just the empty set ∅
and the successor operation that expands a set E by adjoining E itself as a new
element, resulting in the set E⁺ = E ∪ {E}, one can construct the ordinal numbers,
define the natural numbers and addition, and prove that 5 + 7 = 12. We define
0 = ∅, 1 = ∅⁺ = {∅}, 2 = 1⁺ = {∅, {∅}}, 3 = 2⁺ = {∅, {∅}, {∅, {∅}}}, and so on. Thus, mathematics
appears to clear up this matter from a purely logical point of view. We need the
undefined terms that come from set theory, and these must be left undefined. On
that basis, we can actually prove the propositions of elementary arithmetic. We
then have logical grounds for our belief, with an irreducible residue of undefined
terms that we have to live with. The logical problem is solved, to the extent that
it can be. This logical solution, however, does not account for psychology. The
cause of our belief that 5 + 7 = 12 is not the very non-intuitive definition we
have indicated here for the positive integers. This fact is one that we learned in
school. But how did it get into the arithmetic curriculum? That question shows
that there is still some virtue in Kant’s approach: How did it happen that people
the world over discovered the same fact, namely that 5 + 7 = 12? The discovery
was surely not made in accordance with the proof of its correctness; it involved,
as Kant correctly said, intuition. This logical/psychological seesaw will be seen in
all the examples we discuss below. Our knowledge can be asserted with confidence
except for certain undefined terms; but physics requires a physical interpretation
of the undefined terms. We cannot help attempting to express our intuitive picture
of those terms in language, a task that remains as difficult as ever. We get around
the difficulty, in the end, by positing physical laws that express numerical relations
among measurable variables. The variables and their numerical values suffice for
theoretical physics, and we relegate the question of the intrinsic nature of the basic
concepts to the subject of metaphysics.
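The set-theoretic construction described above can be carried out literally. The sketch below represents sets as Python frozensets, builds the numerals from ∅ by E⁺ = E ∪ {E}, defines addition by the recursion m + 0 = m and m + n⁺ = (m + n)⁺, and checks that 5 + 7 = 12. The helper names are illustrative, and recovering a predecessor as the element with the most elements is a convenience that works only for these particular sets.

```python
EMPTY: frozenset = frozenset()  # 0 = ∅


def succ(e: frozenset) -> frozenset:
    """Successor E+ = E ∪ {E}."""
    return e | frozenset([e])


def numeral(n: int) -> frozenset:
    """The set-theoretic natural number n, built from ∅ by n successor steps."""
    e = EMPTY
    for _ in range(n):
        e = succ(e)
    return e


def pred(e: frozenset) -> frozenset:
    """Predecessor of a nonzero numeral: its element with the most elements."""
    return max(e, key=len)


def add(m: frozenset, n: frozenset) -> frozenset:
    """Addition by recursion on the second argument: m + 0 = m, m + n+ = (m + n)+."""
    if n == EMPTY:
        return m
    return succ(add(m, pred(n)))


five, seven, twelve = numeral(5), numeral(7), numeral(12)
print(add(five, seven) == twelve)                              # prints True
print(numeral(3) == frozenset({EMPTY, numeral(1), numeral(2)}))  # prints True
```

Nothing in the code appeals to intuition about quantity: equality is checked purely by comparing nested sets, which is exactly the logical side of the seesaw described above.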
In geometry, likewise, Kant thought that the proposition that there exists a
triangle with sides of lengths x, y, and z whenever the sum of the smaller two
exceeds the largest of these three quantities was a priori: Everybody “just knows”
that it is true, and this knowledge has not come from examining physical triangles.
It is synthetic a priori knowledge. In stating his belief that mathematical concepts
(along with certain other metaphysical concepts such as causation) are synthetic
a priori knowledge, Kant took it for granted that we “just know” the parallel
postulate, which is to say, we know that geometry must be Euclidean. By the
time of his death in 1804, however, mathematicians had already shown that we do
not “know” this at all. It has become commonplace to claim that non-Euclidean
geometry discredits this neat Kantian scheme. Gauss, who discovered hyperbolic
geometry, thought so, and most mathematicians accepted the judgment of Gauss.
This opinion was eloquently stated by the physicist Ludwig Boltzmann, who was
disturbed by Kant’s attempts to limit the use of entities that go beyond experience
and took rather an evolutionary view of human thought.6 Still, as the modern
philosopher Philip Kitcher has pointed out to me, it is in the nature of synthetic
propositions to admit logically conceivable alternatives: If they didn’t, they would
be analytic rather than synthetic. The validity of Kant’s view is not refuted by the
construction of a logically consistent non-Euclidean geometry.
Kant regarded space and time as mental pictures—innate knowledge—under-
lying all perception rather than as physical objects, and this point of view seems
useful: We attach coordinates to physical space by referring to physical bodies; the
coordinates themselves are not “things” in the same physical sense as the bodies
that occupy portions of space. Some points of space are thought of as occupied
by matter or energy; others are not, but are pure abstractions. The view seems
sound to me. It is nevertheless vulnerable to attack, as the quotation above from
Bertrand Russell and the ones below from Ernst Mach attest.
Kant’s claim that we can imagine pure empty space without any bodies in it
raises a number of questions. Is this empty space “made up” of points? If so,
how is one point of it to be distinguished from any other point? If not, how can a
physical body occupy only part of a thing that has no parts? Most people find it
difficult or impossible to imagine empty space without some physical boundary. As
architects know very well, space isn’t fully appreciated as space unless it is bounded
by something physical. To those questions, one supposes, Kant might reply that he
only said space was a form of intuition and wasn’t talking about space as understood
by physicists and astronomers. But if Kant’s space is divorced from the physical
applications of geometry, of what value is it? The distances between geographical
points and celestial bodies have been measured using geometry and are known. If
these bodies are not located in the space Kant has in mind, then what is added to
our knowledge by the statement that space is a form of intuition? It seems clear that
he did mean his space to be physical space, since he spoke of it containing bodies.
But, after all, we perceive the bodies directly. There is a relation between two
physical bodies, possibly changing over time, called the distance between them.
It does not depend on an abstraction called space. The distance between two
6 Boltzmann replaces Kant’s a priori with the notion of innate knowledge, which is not quite
the same thing. Perhaps the difference is that innate knowledge is not necessarily accompanied by
knowledge of the grounds for it. (In the case of synthetic propositions, those grounds are intuitive
rather than logical.)
356 8. EXPERIMENTS, CHRONOLOGY, METAPHYSICS
bodies is a fact that we can know, even though we cannot know what “space” in
the abstract is. As we shall see in more detail below, what we know arises from
measurement. The bodies and their mutual distances at different times are the
“stuff” of mechanics. What more do we need? How does it help our understanding
to talk about abstract space? I think the help is psychological, and psychological
needs are not to be despised.
Kant’s claim that we cannot imagine space being absent, although we can
imagine that there are no bodies in it, appears to be both logically absurd and at the
same time psychologically necessary. To model the situation, let us accept that we
use a three-dimensional system of orthogonal coordinates to talk about the points
of space. Mathematically, that is fine; these are pure ideas, and there is no need
to connect them with physical reality. Yet we do want to connect them with reality;
we need (or imagine we need) the concept of physical space in order to discuss
mechanics. In order to interpret our coordinates in physical terms, we need at least
four noncoplanar bodies whose mutual distances are constant to set up an origin
and three coordinate axes. That much of space has to be occupied by something
physical, as Mach was later to point out. The rest of the points, apparently, do not
need to be occupied by anything physical. We can go to a physical location whose
coordinates are (x, y, z) and verify that these are our coordinates by measuring the
distances to our four reference points. But given a point that is not occupied by
anything physical and where no event ever occurs, what is that point? How can
it be described except by giving its three real-number coordinates? We can’t help
picturing it in our minds as a thing, but that thing is not physical.
Kant believed that this intuitive idea of space was logically necessary for the
development of geometry, that propositions could not be stated without invoking
it at least implicitly. In that, modern geometry contradicts him. In Hilbert’s
axiomatization of geometry, it is not necessary to mention “space,” only to take as
starting points certain terms left undefined, such as points, lines, and planes, and
to make certain assumptions, classified as axioms of incidence, axioms of order,
axioms of congruence, and the axiom of continuity. The subject is then a purely
verbal creation that can be applied just as well as ever in physics, but does not
logically require any intuitive interpretation such as Kant believed necessary. That
being said, Hilbert’s axiomatic version of geometry is rather a cold, bloodless thing,
lacking the poetic beauty of Euclid’s original work. Those who love geometry will
probably always prefer Euclid’s and Kant’s intuitive visual version of it. As a last
defense of Kant, we could still argue that his “form of intuition” is a psychological
necessity for thinking about geometry. Hilbert, after all, did not formulate his
axiom system in a vacuum. Any useful system of axioms has to be created with some
interpretation in mind. And if human intuition leads people the world over to think
the same way, it facilitates the conversation when people come to communicate with
one another, even if it is superfluous from a purely logical point of view.
Even granting that Kant was wrong in his belief that space is Euclidean, we can
still find some good in his philosophy if we accept that geometry is based on intuition
and that everybody’s intuition is Euclidean. The universality of the Euclidean
version of the Pythagorean theorem shows that such is indeed the case. That we
can go beyond our intuition and make effective use of non-Euclidean geometry is
an inspiring statement about human minds, but our point of departure is still our
Euclidean intuition. What is needed, it seems to me, is an amendment to Kant’s
philosophy: We need to refrain from using the word knowledge when speaking
about synthetic a priori propositions, and we need to recognize that experience
and perception are the main sources of intuition, which has been described as our
stock of inherited prejudices.
The causes of our belief in what Kant regarded as a priori geometric
knowledge can be investigated through psychology. Psychologists have studied the
development of geometric intuition in children and find that their understanding
of (for example) the relative sizes of things is acquired gradually, though at a very
early age. Setting aside the question of causes of our geometric beliefs, we suggest
a modification in Kant’s view of the grounds for them. The propositions that Kant
laid down as a priori knowledge should rather be regarded as working hypotheses,
useful starting points for theory, in which one has some degree of confidence, which
some would call faith.7 By accepting that modification, we come to recognize that
our knowledge is tentative and incomplete, only as secure as the assumptions with
which we began. We must abandon the absolute and follow where experiment and
theory lead us.
There is no assurance that we will ever attain perfect knowledge, and indeed
it seems most likely that we never shall. But even imperfect knowledge, subject to
revision and/or rejection, nevertheless gives us the satisfaction of an ever-improving
understanding of the world. It has been said that to travel hopefully is better than
to arrive. If that aphorism appears to negate any reasonable motive for going
anywhere, we can observe that at most points, there is a direction that appears
to lead to something better—a proximate goal. One can travel (hopefully) in that
direction, without necessarily expecting to arrive and without needing any ultimate
goal, while always—it must be admitted—running the risk of winding up hopelessly
lost. I consider this view to be an optimistic one. It shows that, while we must
begin with our inherited, prejudiced ways of looking at the physical world, we
can, through logic and mathematics, revise our working hypotheses and get a new
picture that works better.
3.3. Measuring time. Since this book, as its title shows, is to a large degree
about time, we need to discuss the metaphysics of time, which is the most abstract
of the three mechanical concepts, not perceived by vision or touch, and yet part of
the consciousness of all people. Every language has a word for it, and the picture of
it in the mind of nearly everyone has changed very little over the centuries. Almost
universally, it is regarded as something that flows, and the metaphor of a river is
very commonly used to describe it.8 The reconciliation of time between observers
in relative motion lies at the heart of the entire theory of relativity, where the
distinction between proper time and laboratory time is crucial. Poincaré thought
that the invention of the concept of proper time by Lorentz (who called it local
time) was a brilliant insight.
7 Whether science requires faith to practice is a question that can be explored from both an
Given that we now have two varieties of the thing, we pose a natural question:
What is time? We can use the word time in sentences that any person of normal
intellect will find to be both meaningful and true. But does time “exist”? And how
do we perceive it? Certainly, reality has temporal aspects, but is time a thing, in
the same sense in which physical bodies are things? Even the measurement of time
is a non-trivial task, much harder than measuring distance or mass. As a great
philosopher of the late fourth and early fifth centuries, Augustine of Hippo, wrote,
in a quotation that can be found at the website
www.gutenberg.org/files/3296/3296-h/3296-h.htm,
For what is time? Who can readily and briefly explain this? Who
can even in thought comprehend it, so as to utter a word about it?
But what in discourse do we mention more familiarly and know-
ingly, than time? And we understand when we speak of it; we
understand also when we hear it spoken of by another. What then
is time? If no one asks me, I know: if I wish to explain it to one
that asketh, I know not: yet I say boldly that I know, that if noth-
ing passed away, time past were not; and if nothing were coming,
a time to come were not; and if nothing were, time present were
not. Those two times then, past and to come, how are they, seeing
the past now is not, and that to come is not yet? But the present,
should it always be present, and never pass into time past, verily
it should not be time, but eternity. If time present (if it is to be
time) only cometh into existence, because it passeth into time past,
how can we say that either this is, whose cause of being is, that it
shall not be; so, namely, that we cannot truly say that time is, but
because it is tending not to be? [Translation by Edward Bouverie
Pusey].
Everyone would say that time can be spoken of meaningfully. That is, the word
“time” can be used as a noun in a sentence that will convey accurate information
about the world. We can know many propositions that involve time—for example,
that two events must either be simultaneous, or in a definite sequential order (since
relativity we need to add the qualification “for each individual observer”), and if
event B comes after event A and event C comes after event B, then event C comes
after event A (again, for each individual observer)—but we do not know what
Aristotle would have called the essence of time. That is precisely what Augustine
is saying in this passage. It does not appear to add anything to our understanding
when we say that time is “real” or that it “exists.”
The second quotation from Immanuel Kant at the beginning of this section represents
an eighteenth-century attempt to come to grips with the mystery of time. The
problem was stated by Augustine in the passage just quoted. Kant’s solution to
it, declaring time to be an a priori form of intuition, reflects the tension described
by Augustine: on the one hand, our inability to know the essence of time and on
the other hand, our ability to know facts about it. His claim that we can imagine
time without any events is doubtful. To think of time, we need to think of things
In defense of Kant, we must point out the hidden assumption in the process
Mach has described here. He says we determine whether motion is uniform by
singling out a particular motion as a standard (like the motion of the hands on
a clock) and comparing other motions with it. What he doesn’t say is that the
comparison requires us to note the positions in the two motions at the beginning
and end of the comparison, and hence implicitly requires a concept of simultaneity
between the two processes and sequentiality between the beginning and end of each
of them. Kant would not have thought that Mach had refuted his view of time,
and perhaps Mach didn’t intend to. His aim was to point out that it is impossible
to prove in any absolute sense that, for example, the time interval between 10:15
AM and 11:05 AM on a given day is equal to the time interval between 12:10 PM
and 1:00 PM on any other given day. The assumption that the hands of a clock
move uniformly (within limits) remains always an assumption.
Perhaps we might call that assumption synthetic a priori knowledge, since we
are all thoroughly committed to it (bearing in mind that we now agree to interpret
this knowledge as a working hypothesis). We are wise to make that commitment,
and Mach has overstated his case if he means to cast doubt on it. In fact, any
competent musician can detect the difference between a uniform tempo and a non-
uniform one. True, if one were to analyze the brain activity of, say, Itzhak Perlman
as he performs a solo, then the person doing the analysis would indeed be comparing
two physical processes (the stressed beats in the music and the firing of certain
neurons in Perlman’s brain keeping time through an innate sense of rhythm) in
order to determine that they were or were not uniform relative to each other, just
as Mach stated. But for Perlman himself, there would be only one process going on,
namely the music, and he would perceive its uniformity or non-uniformity directly.
Galileo is said to have used his own sense of a uniform tempo to measure the
distances traversed by a ball rolling downhill over equal time intervals. We don’t
have an absolutely precise sense of time passing, but we can subjectively estimate
it with a small error, even in a dark silent room.
Moreover, the two time intervals just mentioned can be measured by a variety
of clocks, all working on different physical principles: the unwinding of a coiled
spring against a balance wheel, the oscillations of a pendulum, the dripping of sand
through an hourglass, the vibrations of certain crystals, and the position of the
stars. If the substantial agreement of the measurements by all these instruments
does not correspond to some physically real time in which the measured intervals are
congruent, we are faced with a major mystery. Surely the reasonable conclusion
is that they actually do correspond to congruent time intervals. All the laws of
mechanics and electromagnetism that involve time become incomprehensible on
any other assumption. They need to be, as mathematicians say, invariant under a
translation of the time axis.
3.4. Absolute space. Mach was equally scornful of Newton’s view on abso-
lute space, which he also quoted at length (again, the same passage we quoted in
Chapter 1):
When we say that a body K alters its direction and velocity solely
through the influence of another body J, we have asserted a con-
ception that is impossible to arrive at unless other bodies A, B,
C. . . are present with reference to which the motion of the body K
has been measured. In reality, therefore, we are merely aware of a
relation of the body K to A, B, C. . . . If now we suddenly neglect
A, B, C. . . and attempt to speak of the behavior of the body K in
absolute space, we implicate ourselves in a twofold error. In the
first place, we cannot know how K would act in the absence of A,
B, C,. . . ; and in the second place, every means would be wanting
of forming a judgment of the behavior of K and of putting to the
test what we had predicated, which latter therefore would be bereft
of all scientific significance ([58], p. 214, my translation).
4. THE REALITY OF PHYSICAL CONCEPTS 361
of them. That is all there need be until that physicist attempts to communicate
with other physicists.
The conversion of a mental picture into a physical principle stated in human
language gives rise to the two problems that we are attempting to analyze in the
present chapter. To take the first of the two problems, the meaning and/or reality
of intuitive concepts (such as gravitational fields) posited in order to explain obser-
vations, the accepted procedure in mathematics is to begin with undefined terms.
Concepts such as gravity appear to be exactly that. Yet they are not quite that,
since physicists need to interpret their primitive concepts and mathematicians do
not. When mathematicians create an axiomatic system, the undefined terms may
remain forever undefined. The minute one tries to apply the system to the real
world, however, the undefined terms have to be given an interpretation. In physics,
by way of contrast, interpretation is always needed. That is where intuition enters
the story. Just as mathematicians can deduce theorems from their axioms and un-
defined terms, physicists can produce testable predictions by positing physical laws
(usually mathematical in nature) involving such concepts as gravity and magnetic
fields. The interpretation, as we have been saying, comes from measurement and in-
volves quantitative relations satisfied by the hypothetical objects. These relations,
if confirmed by observation and experiment, become knowledge. The underlying
issues of what objects the primitive terms refer to, what they “essentially are” and
the extent to which we can know them, belong to metaphysics. The measurements
are facts, and they are the subject matter of physics proper.
Let us take an example from everyday life. Radio waves are important in
physics and engineering, and who would venture to deny that they “exist”? By
reasoning about them as if they were palpable objects, mathematical physicists
constructed theories in which they played a role, implying that certain events that
can be observed would occur.9 Radio waves cannot be observed directly, but the
assumption that they exist and have certain properties leads to a correct explana-
tion of what can be observed. Because we are so familiar with radio broadcasts,
we have no difficulty picturing a radio wave as being like a wave in water. But
in fact, the radio waves of theory are electromagnetic waves, periodic variations in
the intensity of mathematical objects called electric and magnetic fields that are
coupled in a precise way in accordance with Maxwell’s laws. Their existence is of a
very rarefied type, even though we picture them mentally as being just like things
we do observe.
There are other mathematical objects of importance in physics that seem even
less like the physical objects we can observe directly, and it is difficult to picture
9 In the case of radio waves, the theory explained a phenomenon that had been noticed earlier:
When an electrical circuit is opened or closed, a spark appears in the gap of a nearby broken metal
ring that is not part of the circuit. Heinrich Hertz (1857–1894) tried the experiment in 1887 and
found that it worked. In the mid-1890s, Guglielmo Marconi (1874–1937) applied this principle to
create a practical system of wireless telegraphy, starting with home experiments in 1894. Since the Italian
government refused to support him and in some quarters regarded him as a lunatic, he finally
moved to England, where he found support. A radio receiver had been invented earlier, at Tufts
University in 1882, by Amos Dolbear (1837–1910), who obtained a patent for it in 1886, a patent
that Marconi was eventually forced to buy. In fairness to the Russians, it should be noted that
Aleksandr Stepanovich Popov (1859–1905) demonstrated a radio receiver in a paper delivered in
May 1895. He doesn’t get credit for it in the West because he was a patriotic Russian who refused
to license it abroad, even though the Russian government refused to support him. In that respect,
his experience resembled that of Marconi.
4.1. Knowledge of the physical world*. The present section and the one
that follows wander into areas that are even more abstract and remote from ev-
eryday experience than the physics we have been discussing. They are included in
order to provide the broadest possible framework for the more concrete topics that
are the core of the present book. In particular, we are going to discuss what the
words exist and real mean, and what it means to know what exists and what is
real.
Philosophers have long attempted to arrive at an absolutely true and accu-
rate description of the physical world, and they have done so by inventing abstract
concepts, just as physicists do. Plato of Athens (ca. 425–347 BCE) is the earliest
philosopher who wrote systematic treatises on the subject that are still extant. He
may have been led to formulate his metaphysics in an attempt to reconcile two prin-
ciples proclaimed by the philosophers Heraclitus (fl. ca. 500 BCE) of Ephesus and
Parmenides (ca. 515–460 BCE) of Elea. From the fragments of quotations of these
philosophers that have survived, it appears that Heraclitus argued through many
examples that there is nothing permanent in the world: all is in flux. Parmenides,
on the other hand, argued that there can be no true knowledge of anything that
changes. (One may suppose that he reasoned as follows: Since knowledge of an
object must be phrased as a sentence with a subject, if the meaning of the subject
changes, the sentence can no longer be held to be true, and hence does not rep-
resent knowledge.) Plato’s search for the absolute that would be the exception to
Heraclitus’s world of flux and the foundation of a theory of knowledge satisfying
the requirements of Parmenides while still being applicable to the observable world
led him to imagine a world of pure “forms,” whose properties were unchanging, and
which, when mixed together in the physical world that we inhabit, accounted for
the phenomena that we actually do observe. He emphasized the importance of
mathematics, which to him was “Euclidean” geometry—this was a century before
Euclid lived—and number theory, apparently seeing in its subject matter (lines,
points, numbers, circles, and the like) the kind of timeless pure entities that resem-
bled the forms he believed were the fundamental elements of the physical world. In
his later years, he seems to have realized the defects of this approach, but did not
have enough time left on Earth to revise it and present an alternative approach to
epistemology.
Plato was not the last natural philosopher to look to mathematics for absolute
concepts that would give a full explanation of the physical world. His student
Aristotle was much less mystical, more practical, and more scientific than Plato, but
he too was looking for an absolute reality that would rise above the changing world
we observe. Instead of Plato’s timeless forms, Aristotle made what he regarded as an
important distinction, that between substance and accidents, that is, between what
we would regard as things and properties of things. This distinction is mirrored
by the grammatical distinction between subjects and predicates: A substance can
be the subject of many predicates (accidents). Some things can be both subjects
and predicates; for example, the United States has 50 states that are members of
it; and it has membership in the United Nations. In the statement “Iowa is one
of the 50 United States” membership in the United States is a predicate of Iowa,
whereas in the statement “The United States is a member of the United Nations,”
membership in the United Nations is a predicate of the United States. Thus the
United States is both a subject and a predicate. At the rock-bottom level, however,
some things, Aristotle thought, are pure predicates, for example beauty and virtue,
while others (substances) are pure subjects, such as individual people (souls). The
work we have done in Chapter 5 seems to fit this model, with a surface in ℝ³ being
an example of a subject, and the various parameters used to describe it playing the
role of predicates. The distinction is hard to apply in most cases, because, while
predicates appear to modify subjects, we could not articulate a predicate without
using some information about the subjects it applies to.
Set theory mirrors this way of viewing the world, at least to the extent of
denying the possibility of an infinite regression from predicates to subjects. The
elements (“subjects”) of a set (“predicate”) may be other sets (“predicates”), but
if one looks at their elements (“subjects”), which may also be sets (“predicates”),
and continues “digging,” at some finite stage, one arrives at the empty set. It has
no elements, and so is not a predicate of anything at all. That is, it is always false
to say that an element belongs to the empty set.10 The empty set itself, however,
does belong to other sets, and thus is, in Aristotelian terms, a pure subject, the
only one in the entire universe of set theory. It has properties, but its elements do
not, because they don’t exist.
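The "digging" just described can be imitated for finite pure sets. In the Python sketch below, `frozenset` stands in for a set, and the names `EMPTY`, `rank`, and `dig` are our own, chosen for illustration. A frozenset must be built from elements that already exist, so it cannot contain itself. For these finite sets the regression is therefore guaranteed to stop at the empty set, which is what the axiom of regularity guarantees in general. The sets `zero` through `three` are the usual von Neumann encodings of the first few natural numbers.

```python
# Finite pure sets modeled as Python frozensets (an illustrative assumption).
EMPTY = frozenset()


def rank(s: frozenset) -> int:
    """Longest number of digging steps from s down to the empty set."""
    if not s:
        return 0                       # the empty set: nothing left to dig into
    return 1 + max(rank(e) for e in s)


def dig(s: frozenset) -> list:
    """Follow one chain s, e1, e2, ... where each set contains the next one."""
    chain = [s]
    while chain[-1]:                   # stop as soon as we reach the empty set
        chain.append(next(iter(chain[-1])))
    return chain


zero = EMPTY                           # the empty set
one = frozenset({zero})                # {∅}
two = frozenset({zero, one})           # {∅, {∅}}
three = frozenset({zero, one, two})    # {∅, {∅}, {∅, {∅}}}
```

Here `rank(three)` is 3, and `dig(three)` always ends with `EMPTY`, although which chain it follows depends on the iteration order of the frozensets. The empty set belongs to `one`, `two`, and `three`, while nothing belongs to it.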
10 This regression terminates as described only because the standard axiomatization of set theory
explicitly postulates it, by means of an axiom known as the axiom of regularity. Among other things,
this axiom rules out any set that is an element of itself, the kind of set that figures in Russell’s
paradox. Formally, it says that every non-empty set A has at least one element a ∈ A such that
a ∩ A = ∅, that is, as a set, a is disjoint from the set A of which it is an element. This axiom is
so non-intuitive that, to common sense, it seems to thwart the original purpose of set theory, which
was to provide a clear and secure (non-contradictory) foundation for mathematics. Mathematicians not
concerned with foundations don’t think about it much, if at all. They use set theory because it is
a convenient common language of discourse. As for the security and non-contradictory nature of
mathematics, even set theorists are sometimes reduced to saying we have faith that our axioms
are consistent.
4.2. Reality*. Since one of the main topics of the present chapter is the
question of what exists (is real), we note that Kant made another useful contribution
to the language of philosophy in his Critique, in a context that is also relevant
to the empty set. He pointed out that the verb exist is not a predicate. It is
part of the syntactical structure of a sentence. This point becomes clear when
we examine the two kinds of statements, universal and particular. If I say that
all swans swim (a universal statement), that is a meaningful statement, and its
negation is also meaningful: there exists a swan that does not swim. Exactly one of
the two statements is true, but both are at least meaningful. In standard logic, the
universal statement is by convention true if there are no swans, since a statement
asserted about nothing makes no claim that can be falsified. Of course, such a
statement is useless in any discussion of the physical world, since it conveys no
information about anything real. In the other direction, if I say that some swans
swim, that is, there exists a swan that swims (a particular assertion), I am making
a claim that cannot be true unless there actually are swans. Once that verification
is made, the negation of this statement is also meaningful, though easy to refute
empirically: No swans swim. The verb swim can be used as a predicate in both
kinds of statements.
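The logical conventions just described match the way Python's built-in `all` and `any` treat an empty collection: `all` of nothing is `True`, and `any` of nothing is `False`. The sketch below is only an illustration. The helper names `for_all`, `there_exists`, and `swims`, and the representation of a swan as a dictionary with a `"swims"` flag, are assumptions made here. The sketch also checks that "not every swan swims" is equivalent to "there exists a swan that does not swim."

```python
from typing import Any, Callable, Iterable


def for_all(domain: Iterable[Any], prop: Callable[[Any], bool]) -> bool:
    """Universal statement: every element of the domain has the property."""
    return all(prop(x) for x in domain)


def there_exists(domain: Iterable[Any], prop: Callable[[Any], bool]) -> bool:
    """Particular statement: at least one element of the domain has the property."""
    return any(prop(x) for x in domain)


def swims(swan: dict) -> bool:
    # Hypothetical record format: each swan is a dict with a "swims" flag.
    return swan["swims"]


no_swans: list = []
pond = [{"name": "Cygnus olor", "swims": True},
        {"name": "Cygnus atratus", "swims": True}]

# With no swans at all, the universal claim is vacuously true
# and the particular claim is false.
print(for_all(no_swans, swims))       # True
print(there_exists(no_swans, swims))  # False

# The negation of "all swans swim" is "there exists a swan that does not swim".
for d in (no_swans, pond):
    assert (not for_all(d, swims)) == there_exists(d, lambda s: not swims(s))
```

The vacuous `True` for `no_swans` conveys nothing about any actual swan, which is the point made in the text: only a non-empty domain gives the universal statement any content.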
If, in contrast, I say that all swans exist, which is empirically a true and trivial
statement, I am talking logical nonsense, because the negation of that statement
would be, “There exists a swan that does not exist.” In this case, the positive
statement remains true by convention if there are no swans, but says nothing about
the physical world. The existence of swans has to be established empirically before
one can know that this claim is meaningful; otherwise, it is a statement about
the members of the empty set, and therefore, though true, devoid of significance.
Likewise, the particular statement “Some swans exist,” equivalent to “There exists
a swan that exists,” which seems to be a mere tautology, is a true statement about
the world, but only because the existence of at least one swan has been empirically
verified. The statement could in that case be shortened by eliminating the last two
words. Attempts to get around these considerations by replacing the verb exist
with the adjective real merely make things complicated without changing any of
the essential principles.
The concept of existence is a predicate only in an indirect sense. If I say,
“There exists an entity x having property P,” I am making an assertion, not
about hypothetical entities x having property P, but rather about the set
E = {x : x has property P}. I am asserting that E is non-empty. That fact reveals
the fallacy in some traditional attempts to conjure things into existence by defining
them as existing. As one example, take the classical “being whose non-existence is
inconceivable.” Consider the set E = {x : the non-existence of x is inconceivable}.
As the argument runs, the definition has a clear meaning; that is, we understand
what it means to say that the non-existence of a thing is inconceivable. Thus the
concept is meaningful. And since it is, such a being necessarily does exist.
Now, in the statement that such a being necessarily exists, the descriptive
phrase such a being is equivalent to a being that belongs to the set E. In other
words, the argument says that if a being belongs to E, then it exists. We may
grant that that statement is true, but as a proof that some being does belong to E,
this statement assumes what is to be proved. One would need to exhibit the being
and prove it belonged to E before the argument would apply. And if the being
were exhibited, there would no longer be any need for the verbiage involved in the
definition of the set E in order to demonstrate that it exists. We have here the
old familiar logical vicious circle. If the set E is in fact empty, then any assertions
about its elements are true, but have no application in either the physical world or
the world of pure thought.
We are in some very empyrean heights here, going beyond the boundaries of
even theoretical physics. Fortunately, the entities whose reality or unreality is
relevant to physics are not mere words. Proving that they exist is not a mere
matter of word-spinning, as it is in the case just considered. Instead, we associate
them with mathematical functions and interpret the values of those functions as
predictions of the readings on various measuring instruments. In that way, we
obtain an interpretation of the undefined terms that can be communicated from
one person to another.
As a final remark, we add that whether an object “exists” in the mathematical
sense is a purely formal and verbal matter. Certain axioms of set theory begin with
the symbol ∃, and mathematical objects exist (in the mathematical sense) only
in cases where a statement beginning with this symbol can be derived from those
axioms. What those objects are therefore depends on the strictness with which
axioms beginning with this symbol are chosen. One somewhat controversial axiom
is the axiom of choice, which asserts that for any set $\mathcal{A}$, all of whose elements
are non-empty sets, there exists a function $f$ whose domain is $\mathcal{A}$ and such that
$f(A) \in A$ for each $A \in \mathcal{A}$. If this axiom is adopted, there exist (again, in the
mathematical sense) sets of real numbers that are not measurable in the sense of
Henri Lebesgue (1875–1941). Without this axiom, such sets cannot be proved to
exist. Not all mathematicians accept this axiom, and hence what does or does
not exist in the world of mathematics is a party question. In any case, statements
asserting existence in the mathematical sense do not say anything about the physical
world. They dwell in the same empyrean heights as the philosophical arguments just
discussed, providing models for a real-world interpretation of what are undefined
terms in a purely formal and axiomatic version of physics.
We shall return to these esoteric questions after a digression to reflect on the
role that mathematics plays both in physics and in answering the questions we have
posed here.
11 Technically, the geometry of the sphere is called doubly elliptic since two lines (great circles)
intersect in two points rather than one. If each pair of antipodal points is regarded as a single
point, the geometry becomes singly elliptic and is the geometry of the projective plane, which
cannot be embedded in three-dimensional Euclidean space.
simple, and using it requires far less energy—meaning energy invested in creating
the nerve connections and reflexes of an animal—than would be the case if the
organism were to deal with a hyperbolic world. The wonder is that the human
mind is able to refine this intuition and replace it with a more complicated set of
concepts that can deal with aspects of the physical world that our ancient ancestors
did not have any need to think about. Moreover, the new mathematical creations
of the human mind are sometimes stunningly in harmony with the physical world.
That fact requires some separate commentary.
Since this book has been written from the point of view of a mathematician
rather than that of a physicist, it aims at discovering why it is that there appears
to be what Felix Klein called a “pre-established harmony” between certain areas of
mathematics and certain physical problems. This harmony was also expressed by
Eugene Wigner in his famous essay “The unreasonable effectiveness of mathemat-
ics” [85]. Why is it that vector analysis encodes so perfectly the laws of mechanics
and electromagnetism? Why is the real part of the square of a quaternion equal
to the space-time interval between two events in special relativity? Why do self-
adjoint operators in Hilbert space represent in such wonderful detail the properties
of “observables” in quantum mechanics? Why does the mathematical principle
that the more concentrated a function is, the more dispersed its Fourier transform
will be, express precisely the Heisenberg uncertainty principle? In keeping with the
general level of this chapter, we are going to provide a short list of observations
that amount to nothing more than common sense. From them, at least a partial
explanation may be derived.
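The quaternion identity mentioned above is easy to check by direct computation. The sketch below is a minimal illustration, assuming units in which $c = 1$ and writing the interval as $t^2 - x^2 - y^2 - z^2$; the helper names `qmul`, `interval`, and `boost_x` are hypothetical. It squares the quaternion $t + xi + yj + zk$, compares the real part with the interval, and confirms that the value is unchanged by a Lorentz boost along the x-axis.

```python
import math

def qmul(p, q):
    """Hamilton product of quaternions given as (real, i, j, k) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
            a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
            a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
            a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2)

def interval(t, x, y, z):
    """Space-time interval t^2 - x^2 - y^2 - z^2 (units with c = 1)."""
    return t * t - x * x - y * y - z * z

def boost_x(event, v):
    """Lorentz boost with speed v (|v| < 1) along the x-axis."""
    t, x, y, z = event
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return gamma * (t - v * x), gamma * (x - v * t), y, z

event = (5.0, 1.0, 2.0, 3.0)               # separation of two events: (t, x, y, z)
print(qmul(event, event)[0], interval(*event))   # both 11.0
moved = boost_x(event, 0.6)
print(qmul(moved, moved)[0])                     # 11.0 up to rounding
```

The imaginary part of the square is $2t(x, y, z)$, so only the real part carries the invariant.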
can get exact expressions for the integral of any algebraic function. There
are, it turns out, just enough independent algebraic integrals for the prob-
lem of a rotating rigid body to make the cases studied by Euler, Lagrange,
and Kovalevskaya exactly solvable. But the general case of this problem
suffers from a dearth of algebraic integrals (see Appendix 5) and is not
solvable in this way. An even more daunting challenge is the very natural
three-body problem of Newtonian mechanics. It has only ten algebraic
integrals, whereas eighteen would be needed to express the solutions in
terms of theta functions. For that reason, the problem is more often stud-
ied geometrically, one of the most outstanding such studies having been
made by Poincaré in the 1880s. In this problem, the simple conic sections
are no longer adequate, and some very wild orbits, such as the Lissajous
orbits,12 become possible. Despite their fecundity, mathematicians have
not yet produced a structure that will solve this problem.
(3) Logical necessity. We describe below the role played by symbolic algebra
and analysis in physics. On that basis, we can give a partial explana-
tion of the “unreasonable effectiveness” of mathematics. Pretending we
are physicists, we idealize the physical universe—for example, perhaps re-
placing bodies by point masses—and then visualize quantitative relations
such as the inverse-square law of gravitation. Using symbols to represent
force, distance, and mass, we get a familiar algebraic relation, known as
Newton’s law of gravity. The important feature of mathematical relations
is that they take the form of implications. From an assumed law, we can
deduce consequences in the form of differential equations of motion. By
solving those equations, we can further deduce the actual trajectories of
bodies. These mathematical relations are not themselves “about” any-
thing at all. If the premises hold for any interpretation of the symbols
in them, then the conclusions must also hold in that interpretation. Es-
sentially, we take a look at the universe, use it to set up a mathematical
system, then see where the mathematical system leads. If it leads to some-
thing we can observe, we have been successful. How it works is easy to
explain.
Yet, even when we see all the logical connections, the aesthetic beauty
of the result is still awe-inspiring. To take the uncertainty principle as an
example, a simple wave is expressed in complex form as $e^{2\pi i\omega t}$, where $t$ is
the time and $\omega$ the frequency in cycles per unit time. (The product $\omega t$ is
thus physically dimensionless.) We can analyze a function of time $f(t)$ by
creating its Fourier transform
$$\hat f(\omega) = \int_{-\infty}^{\infty} f(t)\,e^{-2\pi i\omega t}\,dt .$$
We can then synthesize the original function from the set of frequencies
$\hat f(\omega)$ by the inverse Fourier transform
$$f(t) = \int_{-\infty}^{\infty} \hat f(\omega)\,e^{2\pi i\omega t}\,d\omega .$$
what more there may be is not yet known and obviously would be unknow-
able through physics and chemistry.) At the very least, then, physical and
chemical processes play a role in shaping our thought. On that broad ba-
sis, a very vague connection can be made suggesting that those processes
might naturally cause us to create imaginary structures that resemble the
operation of the processes that produce our thinking. In other words,
the mathematical models we think up are the natural consequence of the
processes that produce the thinking.
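The Fourier pair written out in item (3) lets the uncertainty principle be seen numerically. A minimal sketch, assuming a Gaussian test signal $f(t) = e^{-\pi a t^2}$ and measuring the spread of $|f|^2$ and of $|\hat f|^2$ by their root-mean-square widths about 0: with the transform convention used above, the product of the two widths is never smaller than $1/(4\pi)$, and Gaussians attain that bound. The integrals are approximated by plain Riemann sums on a truncated grid, and all function names are hypothetical.

```python
import math

def gaussian(a):
    """Return f(t) = exp(-pi * a * t**2); a larger a gives a more concentrated pulse."""
    return lambda t: math.exp(-math.pi * a * t * t)

def grid(lo, hi, n):
    """n equally spaced points from lo to hi, together with the spacing."""
    h = (hi - lo) / (n - 1)
    return [lo + k * h for k in range(n)], h

def fourier_transform(f, ts, h, omegas):
    """Riemann-sum approximation of f_hat(w) = integral of f(t) exp(-2 pi i w t) dt.
    f is assumed real and even, so the sine part cancels and only the cosine part remains."""
    return [h * sum(f(t) * math.cos(2 * math.pi * w * t) for t in ts) for w in omegas]

def spread(xs, values, h):
    """Root-mean-square width about 0 of the density proportional to values**2."""
    weights = [v * v for v in values]
    total = h * sum(weights)
    second_moment = h * sum(x * x * w for x, w in zip(xs, weights))
    return math.sqrt(second_moment / total)

def uncertainty_product(a, half_width=6.0, n=401):
    """Return (width in t, width in omega, product) for the Gaussian with parameter a."""
    ts, ht = grid(-half_width, half_width, n)
    ws, hw = grid(-half_width, half_width, n)
    f = gaussian(a)
    f_hat = fourier_transform(f, ts, ht, ws)
    dt = spread(ts, [f(t) for t in ts], ht)
    dw = spread(ws, f_hat, hw)
    return dt, dw, dt * dw

for a in (0.5, 1.0, 2.0):
    dt, dw, prod = uncertainty_product(a)
    print(f"a={a}: width in t = {dt:.4f}, width in omega = {dw:.4f}, product = {prod:.6f}")
print(f"lower bound 1/(4*pi) = {1 / (4 * math.pi):.6f}")
```

Increasing `a` concentrates $f$ in time and spreads $\hat f$ in frequency, while the product stays at $1/(4\pi) \approx 0.0796$.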
That is all we intend to say on this subject. We now return to the epistemolog-
ical problems we have posed, abandoning the incompletely solved problem to the
philosophers.
5.1. The role of algebra. Much of the answer to our two questions about
the origin and understanding of our knowledge of physics comes from the subject
of algebra, and a glance at the gradual penetration of this subject into physics is
enlightening. Mathematical physics in a form that we can recognize began with
Archimedes’ law of the lever. This law was stated geometrically and lacked the all-
important component provided by algebra. The translation of the concept of direct
proportion into the concept of a linear transformation had not yet been made. As
another example of the need for algebra, Ptolemy’s table of angles of incidence and
refraction fits a simple quadratic model, which is not quite right, though it is
a good approximation. Likewise, Heron’s attempt to explain the principle of the
inclined plane geometrically doesn’t work; the error in it was corrected by Jordanus
Nemorarius in the thirteenth century.
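The remark about Ptolemy’s refraction table can be checked with a few lines of arithmetic. The sketch below uses the air-to-water values as they are commonly reported from Ptolemy’s Optics and compares them with Snell’s law $\sin i = n \sin r$, assuming $n = 1.333$ for water; the helper names are hypothetical. Constant second differences show that the table is exactly quadratic in the angle of incidence, while the comparison with Snell’s law shows that the quadratic is a good approximation rather than the true relation.

```python
import math

# Air-to-water refraction table as commonly reported from Ptolemy's Optics (degrees).
INCIDENCE = [10, 20, 30, 40, 50, 60, 70, 80]
PTOLEMY = [8, 15.5, 22.5, 29, 35, 40.5, 45.5, 50]

def differences(values):
    """Successive differences of a list."""
    return [b - a for a, b in zip(values, values[1:])]

def snell_refraction(incidence_deg, n=1.333):
    """Refraction angle from Snell's law sin(i) = n sin(r); n = 1.333 is an assumed index for water."""
    return math.degrees(math.asin(math.sin(math.radians(incidence_deg)) / n))

second = differences(differences(PTOLEMY))
print("second differences:", second)   # all -0.5: the table is exactly quadratic
for i, r in zip(INCIDENCE, PTOLEMY):
    print(f"i = {i:2d}  Ptolemy = {r:5.1f}  Snell = {snell_refraction(i):5.1f}")
```

With this assumed index, the two columns stay within about two-thirds of a degree up to 70° and drift apart by roughly 2.4° at 80°.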
As is well known, a great revolution in science occurred in the seventeenth
century. In the early years of that century we find Galileo and Kepler still using
only Euclidean geometry and numerical observations. But a few decades later,
Descartes and Fermat took the radical step of incorporating the algebra that had
been developed largely in Italy in the sixteenth century and applying it to geometry.
That was a momentous step. Even more important, given that the sixteenth-
century Italian algebra was still being expounded in the geometric language of
Euclid, Descartes and Fermat adopted the simple symbolic notation of François
Viète (1540–1603)—Fermat completely, Descartes with modifications that made his
notation standard right down to the present day. The introduction of symbols to
represent unspecified or unknown numbers was, I believe, more important than just
algebraic reasoning about such numbers. Such symbols are called variables,14 and
they are one of the significant qualitative differences between modern and ancient
mathematics. The symbols can be combined into formulas that are manipulated
according to a definite set of rules.
It is precisely here that symbolic algebra and its offspring the calculus become
a crucial part of the transition from unknowable primitive terms to knowable facts
about those primitive terms. Each primitive term—a magnetic field, for instance—
is labeled with a symbol representing a variable that depends on other variables,
usually time and space coordinates, which themselves are symbols for the elusive
14 Modern computer science has kept the human imagination well grounded and forced math-
ematicians to be precise. When defining a variable, one does not think of it as representing any
specific quantity. Yet it is still necessary to specify what type of variable it is, which is to say, what
objects can be substituted for it in formulas: real numbers, integers, rational numbers, complex
numbers, matrices, and the like.
5. THE HARMONY BETWEEN MATHEMATICS AND THE PHYSICAL WORLD 373
points of space-time that we have discussed above. Physical laws can then be stated
as algebraic or differential equations relating these variables, such as the Lorentz
law for the force on a charge moving in a magnetic field. If we mix in the other
crucial element—that the values of both dependent and independent variables can
be measured by processes that can be communicated from one person to another—
we can convert these algebraic or differential equations into equalities between the
numbers that result from the measurements. If those equalities hold, then the physical law is verified
to some extent, and we may, if we wish, take that as an indication that the original
primitive term denotes something real, even if it transcends our senses.
In Chapters 5 and 6 of the present book, we have seen the power of analytic
geometry and calculus in the analysis of geometric figures that would have seemed
impossibly complicated to the ancient Greeks. Euler used these techniques to solve
a variety of problems in geometry and physics. In his hands, the use of parameters
to represent curves and surfaces in $\mathbb{R}^3$ led to the results on curvature discussed at the
beginning of Chapter 5. Gauss took the further step of showing how to compute
the curvature of a surface directly from its metric coefficients, thus potentially
eliminating any need for the absolute Euclidean space $\mathbb{R}^3$ in which the surface was
embedded. That step was subsequently built upon by Riemann and a galaxy of
brilliant Italian geometers to produce the tensor calculus and the general notion of
a manifold. This was the mathematical theory presented to the world in more or
less finished form at the end of the nineteenth century, just in time for Einstein to
use it in the general theory of relativity.
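Gauss’s step of computing curvature from the metric coefficients alone can be illustrated numerically. A minimal sketch, assuming orthogonal parameters (metric $ds^2 = E\,du^2 + G\,dv^2$ with no cross term), for which the curvature is given by the classical formula $K = -\frac{1}{2\sqrt{EG}}\bigl[\partial_u\bigl(G_u/\sqrt{EG}\bigr) + \partial_v\bigl(E_v/\sqrt{EG}\bigr)\bigr]$; derivatives are replaced by central differences, and the function name is hypothetical. Nothing in the computation refers to an embedding in $\mathbb{R}^3$.

```python
import math

def gaussian_curvature(E, G, u, v, h=1e-4):
    """Gaussian curvature at (u, v) for an orthogonal metric ds^2 = E du^2 + G dv^2,
    K = -1/(2 sqrt(EG)) * [d/du (G_u / sqrt(EG)) + d/dv (E_v / sqrt(EG))],
    with every derivative replaced by a central difference of step h."""
    def d_du(f, u, v):
        return (f(u + h, v) - f(u - h, v)) / (2 * h)

    def d_dv(f, u, v):
        return (f(u, v + h) - f(u, v - h)) / (2 * h)

    def root(u, v):
        return math.sqrt(E(u, v) * G(u, v))

    def a(u, v):
        return d_du(G, u, v) / root(u, v)

    def b(u, v):
        return d_dv(E, u, v) / root(u, v)

    return -(d_du(a, u, v) + d_dv(b, u, v)) / (2 * root(u, v))

# Sphere of radius R in colatitude/longitude: E = R^2, G = R^2 sin^2 u (expected K = 1/R^2).
R = 2.0
sphere = gaussian_curvature(lambda u, v: R * R, lambda u, v: (R * math.sin(u)) ** 2, 1.0, 0.3)
# Poincare half-plane: E = G = 1/v^2 (expected K = -1).
half_plane = gaussian_curvature(lambda u, v: 1 / v ** 2, lambda u, v: 1 / v ** 2, 0.5, 1.5)
print(sphere, half_plane)   # approximately 0.25 and -1.0
```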
The algebra involved in general relativity is not difficult in itself, although it
is often tedious. The difficulties that come with it are twofold: first, convincing
oneself that the algebra does indeed express the geometric or physical concept
it is being used to represent, especially when that concept involves more than
three geometric dimensions; second, verifying that, where parameters are used,
the concept being defined is independent of any particular choice of parameters.
The first of these accounts for the lengthiness of Chapters 5 and 6. We needed
some assurance that the relativistic formulation of physics using tensors really does
connect with differential geometry through the concept of curvature; as a result,
we spent many pages looking at surfaces in $\mathbb{R}^3$, which are intuitive. The second
difficulty is addressed in Appendix 6, where it is shown that the value obtained for
the curvature of a surface is independent of the parameters used.
5.2. The role of calculus. It was stated in the preface that this book is aimed
at generality and breadth, making connections among the diverse parts of its subject
rather than exploring all the important parts of it in depth. The main connection
of this sort that I wish to make is one that, in retrospect, seems to be an inevitable
consequence of the work of the seventeenth-century scientists and mathematicians.
By inventing the calculus, they showed how, in first approximation, all smooth
functions could be linearized using first-order derivatives, thereby making geometry
applicable to all forms of motion. At the same time, they revolutionized mechanics
by making acceleration rather than velocity the key concept that makes sense of the
notion of force. Acceleration, in turn, is expressed by the second-order derivative.
In geometry, that second-order derivative is associated with curvature. When the
facts are arranged in this way, it appears that curvature was bound to have an
important role to play in the application of geometry to physics. And so it has
turned out. Curvature and force are both variants of the second derivative and
amount to different ways of looking at the same physical phenomena.
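The claim that curvature and force are two faces of the second derivative can be made concrete with the kinematic identity $a_\perp = v^2\kappa$: the component of acceleration perpendicular to the velocity equals the square of the speed times the curvature of the path. A minimal sketch, assuming a hypothetical particle that moves along the parabola $y = x^2$ with non-uniform speed; the acceleration is obtained by finite differences from the motion, while the curvature is obtained independently from the shape of the parabola.

```python
import math

def position(t):
    """Hypothetical motion along the parabola y = x**2 with a non-uniform speed."""
    x = t + 0.3 * t ** 3
    return x, x * x

def curvature_of_parabola(x):
    """Curvature of the graph y = x**2 from its derivatives: |y''| / (1 + y'^2)^(3/2)."""
    return 2.0 / (1.0 + 4.0 * x * x) ** 1.5

def normal_acceleration(pos, t, h=1e-4):
    """Return (acceleration component perpendicular to the velocity, speed), by central differences."""
    (x0, y0), (x1, y1), (x2, y2) = pos(t - h), pos(t), pos(t + h)
    vx, vy = (x2 - x0) / (2 * h), (y2 - y0) / (2 * h)
    ax, ay = (x2 - 2 * x1 + x0) / h ** 2, (y2 - 2 * y1 + y0) / h ** 2
    speed = math.hypot(vx, vy)
    return abs(vx * ay - vy * ax) / speed, speed

t = 0.7
a_n, v = normal_acceleration(position, t)
print(a_n, v * v * curvature_of_parabola(position(t)[0]))   # the two numbers agree
```

Multiplying by the mass turns the left-hand number into the component of force that bends the path.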
5.3. The role of modern analysis. The successive terms of a Maclaurin
series mirror the way physical models develop: We begin with an oversimplified
model in which everything is linear, a good example being the eighteenth-century
analysis of the vibrating string, which is summarized in Appendix 2. Once that
oversimplified model is understood, if it appears to have explanatory value, we
refine it by introducing complications that make it more realistic. This process
mimics the Maclaurin series, in which the first two coefficients determine the best
approximation of a curve by a straight line and the third coefficient determines
the extent to which the curve deviates locally from that straight line. The power
of the calculus and its outgrowths is shown by the fact that mathematical physics
was able to give precise, quantitative descriptions of dozens of physical phenomena,
including the motion of particles and planets, the propagation of light and sound,
and the action of electricity and magnetism, all using only second-order differential
equations, which is to say, just these first three terms of the Maclaurin series. It was
not until the nineteenth century that a physical phenomenon led to a third-order
differential equation. The best-known of these phenomena—the propagation of
waves in a shallow channel—led to what is called the Korteweg–de Vries equation.15
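For a graph $y = f(x)$, the statement about the first three Maclaurin coefficients can be written out explicitly. Assuming $f$ is smooth near 0 (the coefficients $a_0, a_1, a_2$ are notation introduced here), the tangent line is fixed by $a_0$ and $a_1$, and the curvature at the origin depends only on $a_1$ and $a_2$:

```latex
\begin{aligned}
f(x) &= a_0 + a_1 x + a_2 x^2 + O(x^3), \qquad f'(0) = a_1, \quad f''(0) = 2a_2,\\
\text{tangent line: } y &= a_0 + a_1 x,\\
\kappa(0) &= \frac{|f''(0)|}{\bigl(1 + f'(0)^2\bigr)^{3/2}}
           = \frac{2\,|a_2|}{\bigl(1 + a_1^2\bigr)^{3/2}}.
\end{aligned}
```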
Of course, the order of a derivative depends on the function you start with. In
terms of the metric coefficients, the curvature is defined using only second-order
derivatives; but often the metric coefficients come from a parameterization and are
themselves first-order derivatives.
In several places in this book, we have called upon elliptic functions to pro-
vide us with exact solutions of important differential equations. There is a seeming
paradox here, in that these functions can be fully understood only from the point
of view of analytic functions of a complex variable, even though the physical quan-
tities that those variables represent are interpreted as assuming only real numbers
as values. The two giant areas known as real analysis and complex analysis could
hardly be more different. Real analysis is a vast, chaotic jungle of counterintuitive
pathological functions that exhibit little stability, functions that are often wildly
discontinuous. Complex analysis, on the other hand, deals only with the smoothest,
most regular functions imaginable beyond those that are utterly trivial (the con-
stants). In fact, the requirements made of an analytic function of a complex variable
are so restrictive that it seems a miracle that any non-trivial ones at all exist. That
they nevertheless do exist in such abundance that one can be found mapping any
simply connected open set in the plane of complex numbers (except the plane itself)
onto a disk is an amazing fact (the Riemann mapping theorem).
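As an illustration of elliptic functions giving an exact solution, consider the pendulum equation $\theta'' = -\omega_0^2 \sin\theta$; this particular example is chosen here as an assumption and is not necessarily one of the book’s own. Released from rest at amplitude $\theta_0$, the pendulum has period $T = 4K(k)/\omega_0$ with $k = \sin(\theta_0/2)$, where $K$ is the complete elliptic integral of the first kind. The sketch computes $K$ through Gauss’s arithmetic-geometric mean, $K(k) = \pi / \bigl(2\,\mathrm{AGM}(1, \sqrt{1-k^2})\bigr)$, and compares the result with a direct Runge–Kutta integration; the function names are hypothetical.

```python
import math

def agm(a, b, tol=1e-15):
    """Arithmetic-geometric mean of a and b."""
    while abs(a - b) > tol * a:
        a, b = (a + b) / 2, math.sqrt(a * b)
    return a

def complete_elliptic_K(k):
    """K(k) = integral_0^{pi/2} dphi / sqrt(1 - k^2 sin^2 phi), via Gauss's AGM formula."""
    return math.pi / (2 * agm(1.0, math.sqrt(1.0 - k * k)))

def period_exact(theta0, omega0=1.0):
    """Period of theta'' = -omega0^2 sin(theta) released from rest at amplitude theta0."""
    return 4 * complete_elliptic_K(math.sin(theta0 / 2)) / omega0

def period_numerical(theta0, omega0=1.0, dt=1e-4):
    """Integrate the same equation with classical RK4 until theta first reaches 0 (a quarter period)."""
    def accel(theta):
        return -omega0 ** 2 * math.sin(theta)

    t, theta, omega = 0.0, theta0, 0.0
    while True:
        k1t, k1w = omega, accel(theta)
        k2t, k2w = omega + dt / 2 * k1w, accel(theta + dt / 2 * k1t)
        k3t, k3w = omega + dt / 2 * k2w, accel(theta + dt / 2 * k2t)
        k4t, k4w = omega + dt * k3w, accel(theta + dt * k3t)
        new_theta = theta + dt / 6 * (k1t + 2 * k2t + 2 * k3t + k4t)
        new_omega = omega + dt / 6 * (k1w + 2 * k2w + 2 * k3w + k4w)
        if new_theta <= 0.0:
            # Linear interpolation for the crossing time inside the last step.
            return 4 * (t + dt * theta / (theta - new_theta))
        t, theta, omega = t + dt, new_theta, new_omega

print(period_exact(1.0), period_numerical(1.0))   # agree to many digits
```

For small amplitudes $K(k) \to \pi/2$ and the period tends to the familiar $2\pi/\omega_0$.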
One of the surprises buried within mathematical physics is that complex func-
tion theory is often more successful at solving physical problems than real-variable
theory. This fact is especially surprising, given that in mechanics the independent
variable often represents time. What can it mean to use a complex variable to
represent time? The eighteenth- and nineteenth-century mathematicians do not
appear to have written much on this subject. The only extended remark I have
found is the following, made by Weierstrass in 1885 (see his Werke, Bd. 3, S. 24):
15 Named after Diederik Johannes Korteweg (1848–1941) and Gustav de Vries (1866–1934),
who introduced it in 1895. It had actually been written about nearly two decades earlier in an
1877 work by Joseph Valentin Boussinesq (1842–1929).
very far apart. But for analytic functions, due to what is called Morera’s theorem
and Cauchy’s formulas for the derivatives of an analytic function, there is such a
theorem.
The phenomenon described in this quotation was reported some 40 years earlier
by Lancelot Hogben ([44], p. 81), including a drawing of a shrimp swimming in a
magnetic field. Hogben’s explanation was that the shrimp has an “inertial guidance
system” in the form of a liquid-filled sac lined with sensitive cilia and containing a
ball of solid matter that orients the shrimp by pressing against the side of the sac
opposite to the direction of motion. According to Hogben, when this solid ball is
replaced by ferromagnetic material and the shrimp is placed in a magnetic field,
it will move along the magnetic lines of force. (I have been unable to determine
who performed this delicate operation and experiment on the shrimp.) The topic
has been well studied over the past few decades (see, for example, [5]). Given the
work of Boles and Lohmann [2], the ball in the inertial guidance system mentioned
by Hogben already contains ferromagnetic material, so that the poor shrimp might
have been spared the ordeal of the operation.
To illustrate the passage from primitive undefined concepts to facts about those
concepts, we turn to algebra, introducing a variable to represent the magnetic field
and formulating an algebraic equation relating it to other primitive terms such as
charge, velocity, and force. Electromagnetic theory provides a vector-valued func-
tion B(t; x, y, z) to represent the magnetic field. Its numerical value at a given
point and time can be measured by observing the motion of a charged particle or
by measuring the current in a moving electrical conductor. It is this combination of
algebra and measurement that provides the vocabulary for physicists to communicate
and reconcile theory with observation. Physicists can talk meaningfully about the
function B(t; x, y, z) in a particular situation, and they can verify their assertions
about it by measuring its value. Their knowledge consists of certain propositions
expressed as vector equations. For example, given a charge q moving with velocity
v, the force on the charge is given by the Lorentz law F = q((v/c) × B). For that
reason, the physicists need not be concerned with the “essence” of a magnetic field
or any of the other metaphysical questions we are considering in this chapter, any
more than mathematics books bother to discuss the desirability of the axiom of
choice. From the point of view of most professional physicists and mathematicians,
6. KNOWLEDGE OF HYPOTHETICAL OBJECTS: AN EXAMPLE 377
there is no need to delve into this philosophy. The great conversation goes on at
professional meetings without any need to worry about it. We are discussing it here
only to offer a few suggestions to non-specialists who may be inclined to ponder
such matters. For the non-specialist, the fact that the Lorentz law is consistent
with observation provides sufficient grounds to say that the magnetic field repre-
sented by the symbol B(t; x, y, z) is a real thing. As long as the present theory
continues to be used by physicists, non-specialists can talk confidently about its
terms, taking for granted that they are real. If it becomes necessary to revise or
replace the present theory, that is a job for the specialist, and the non-specialist
can await the results.
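The way a measurement pins down the field through the Lorentz law can be sketched in a few lines. A minimal sketch, assuming Gaussian units as in $\mathbf F = q((\mathbf v/c)\times\mathbf B)$ and two hypothetical test runs in which a charge moves first along the x-axis and then along the y-axis with the same speed: the two measured forces determine all three components of $\mathbf B$. The names `lorentz_force` and `infer_B` are hypothetical, and the forces in the usage example are generated from an assumed field rather than taken from real data.

```python
C_CGS = 2.99792458e10  # speed of light in cm/s (Gaussian units)

def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def lorentz_force(q, v, B, c=C_CGS):
    """Magnetic Lorentz force F = q (v/c) x B in Gaussian units."""
    return tuple(q * comp for comp in cross(tuple(vi / c for vi in v), B))

def infer_B(q, speed, F_along_x, F_along_y, c=C_CGS):
    """Recover B from forces on a charge q moving with the given speed along the x-axis,
    F = (q v/c)(0, -Bz, By), and along the y-axis, F = (q v/c)(Bz, 0, -Bx)."""
    scale = c / (q * speed)
    return (-F_along_y[2] * scale, F_along_x[2] * scale, -F_along_x[1] * scale)

# Hypothetical check: forces generated from an assumed field, then inverted.
B_assumed = (1.0, -2.0, 3.0)   # gauss
q, speed = 2.0, 1.0e9          # statcoulomb, cm/s
F1 = lorentz_force(q, (speed, 0.0, 0.0), B_assumed)
F2 = lorentz_force(q, (0.0, speed, 0.0), B_assumed)
print(infer_B(q, speed, F1, F2))   # recovers B_assumed up to rounding
```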
There remains another philosophical problem connected with this process that
we need to address: Since a physical quantity can be measured in different ways,
how can two people using different instruments, whose functioning may even depend
on different physical theories, be sure they are talking about the same quantity and
agree on its value? For practicing physicists this is no problem at all. The needed
reconciliation has already been performed, and its invocation is immediate. We
have already mentioned the widespread agreement exhibited by terrestrial clocks
operating on a wide variety of physical principles. Similarly, no one questions
whether a CAT scan and an MRI of the same patient will give consistent results or
worries that an optical telescope and a radio telescope may contradict each other.
If the readings from different measuring devices agree, then they are measuring
the same quantity and no one needs to ask what the “intrinsic nature” of that
quantity is. To elaborate on this point, we are going to suggest another analogy
from mathematics, this time from the theory of manifolds.
observers know they are measuring the same physical quantity if the two readings
are identical. The process resembles the use of different coordinate systems on a
manifold.
Each manifold is perceived in a unique way by people using a particular set of
parameters to describe it. The price of that individual freedom is the restriction
of discourse to objects (tensors and related concepts) for which agreed-upon trans-
lations from one parameter language to another exist. To take a simple example,
consider the ordinary plane with a closed half-line removed. One person may coor-
dinatize that object by regarding it as the xy-plane of Euclidean plane geometry,
with the non-positive portion of the x-axis removed. It then becomes, for that
person, the set of all ordered pairs (x, y) satisfying the inequality x + |x| + |y| > 0.
Another person may prefer polar coordinates and regard it as the set of ordered
pairs (r, θ) with r > 0, −π < θ < π. The two can agree that they are talking about
“the same point P” if the coordinates that they assign to P are convertible via the
relations
$$
\begin{aligned}
x &= r\cos\theta, & y &= r\sin\theta,\\
r &= \sqrt{x^2 + y^2}, & \theta &= 2\arctan\frac{y}{x + \sqrt{x^2 + y^2}}.
\end{aligned}
$$
Given that two people use these sets of coordinates in the indicated domains and
agree to identify the pairs (x, y) and (r, θ) when these equations are satisfied, we may
ask whether the original object that motivated our choice of these transformations
is needed at all. Any discussion they have is going to involve symbolic expressions
using these variables. What need is there for the intuitive object? Why do we
need the point P at all, if we have these coordinate transformations? The object
containing P may or may not be real in some unknowable metaphysical sense, but its
reality or unreality is irrelevant to the conversation between the two observers. They
can both define additional concepts just from their parameters, such as tangent
vectors and covectors, and they can use tensor principles to determine whether
a law stated by one of them translates into the same law as understood by the
other. What one regards as a function $f(x, y)$, the other regards as
$\hat f(r, \theta) = f(r\cos\theta, r\sin\theta)$, and conversely, what the latter regards as a
function $g(r, \theta)$, the former regards as
$$\tilde g(x, y) = g\Bigl(\sqrt{x^2 + y^2},\; 2\arctan\frac{y}{x + \sqrt{x^2 + y^2}}\Bigr).$$
We obviously have $\hat{\tilde g} = g$ and $\tilde{\hat f} = f$ for all functions $f(x, y)$ and $g(r, \theta)$. The differentials are also
easily translated: $dx = \cos\theta\,dr - r\sin\theta\,d\theta$, and so forth. In that way, the geometry
imposed on this space by any metric $ds^2$ can be expressed in either language, and
the relevant Christoffel symbols, Riemann curvature tensor, and the like computed
in both languages. What they know, they can talk about, and they need not worry
about what the underlying manifold “essentially is.”
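The translation between the two observers’ languages can be written out as code. A minimal sketch, assuming the domain and formulas given above (the plane with the non-positive x-axis removed, and $\theta = 2\arctan\bigl(y/(x+\sqrt{x^2+y^2})\bigr)$); the function names `to_polar`, `to_cartesian`, `hat`, and `tilde` are hypothetical. Translating a function into the other chart and back returns the original function, which is the statement $\tilde{\hat f} = f$.

```python
import math

def to_polar(x, y):
    """(x, y) -> (r, theta) on the plane with the non-positive x-axis removed,
    using the half-angle formula theta = 2 arctan(y / (x + sqrt(x^2 + y^2)))."""
    if x + abs(x) + abs(y) <= 0:
        raise ValueError("point lies on the removed half-line")
    r = math.hypot(x, y)
    return r, 2 * math.atan(y / (x + r))

def to_cartesian(r, theta):
    """(r, theta) -> (x, y) for r > 0 and -pi < theta < pi."""
    if not (r > 0 and -math.pi < theta < math.pi):
        raise ValueError("outside the polar chart")
    return r * math.cos(theta), r * math.sin(theta)

def hat(f):
    """What the polar observer calls f: f_hat(r, theta) = f(r cos theta, r sin theta)."""
    return lambda r, theta: f(*to_cartesian(r, theta))

def tilde(g):
    """What the Cartesian observer calls g: g_tilde(x, y) = g(r(x, y), theta(x, y))."""
    return lambda x, y: g(*to_polar(x, y))

f = lambda x, y: x * y + x          # a function written in the Cartesian language
for point in [(1.0, 1.0), (-1.0, 0.5), (0.0, 2.0)]:
    print(point, f(*point), tilde(hat(f))(*point))   # the last two values agree
```

On this domain the half-angle formula gives the same angle as the two-argument arctangent, but it is written here exactly as in the text.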
As long as the mathematical language in which physicists communicate with
one another continues to produce consistent results, as it has in the past, the working
hypothesis of an objective physical world and the physical concepts used to describe
it are all we need. If that working hypothesis should ever fail, future scientists will
have to deal with that problem. But that event has not yet occurred.16
Thus, as we have now amply illustrated, measurement plays the essential role
in making knowledge possible. Without it, we would be floundering in futility like
many of the ancient and medieval philosophers, trying to make sense of unknowable
entities. A hypothetical law phrased mathematically suggests measurements that
can be made. If they confirm the law, it can then be used to make further
predictions. To illustrate this process, consider Newton’s law of gravity $F = GMm/r^2$.
We can check it against observation only after we have determined the universal
gravitational constant G. To do that, as Henry Cavendish (1731–1810) did in 1797,
we have to assume that the quantity $Fr^2/(Mm)$ is constant, where M and m are
the masses of two test bodies, r the distance between them, and F the force of mu-
tual attraction measured by experiment. (Newton’s law of gravity actually asserts
only that this expression is constant over all masses and radii.) If the attractive
force measured in different experiments with different masses and distances reveals
that this quantity is always the same, within the limits of accuracy, then we can
conclude simultaneously that the law is valid and that we have determined the
constant G. As a consequence, it really does not make sense to talk about, say,
the fiftieth decimal digit of G, since G is not a mathematical constant like π. If
two experimenters happened to get different values for that digit—and no one has
come even close to this level of precision—we need not conclude that one is right
and the other wrong. There is always the possibility that the quantity $Fr^2/(Mm)$
varies slightly with distance or with mass.
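The double role of such an experiment, testing the law and fixing G at the same time, can be sketched as follows. A minimal sketch, assuming each experiment is recorded as a tuple (F, r, M, m) in SI units and that agreement “within the limits of accuracy” is represented by a hypothetical relative tolerance; the measurements in the usage example are constructed from an assumed value of G with small artificial perturbations, not real data.

```python
import statistics

def gravitational_ratio(F, r, M, m):
    """The quantity F r^2 / (M m), which Newton's law asserts is the same in every experiment."""
    return F * r * r / (M * m)

def estimate_G(measurements, rel_tol):
    """Return (mean ratio, relative spread, spread within rel_tol?) for (F, r, M, m) rows."""
    ratios = [gravitational_ratio(*row) for row in measurements]
    mean = statistics.fmean(ratios)
    spread = (max(ratios) - min(ratios)) / mean
    return mean, spread, spread <= rel_tol

# Hypothetical "measurements": forces built from an assumed G and perturbed by +/-0.05%.
G_ASSUMED = 6.674e-11
rows = [(G_ASSUMED * M * m / r ** 2 * k, r, M, m)
        for r, M, m, k in [(0.2, 100.0, 1.0, 1.0005), (0.25, 50.0, 2.0, 0.9995), (0.3, 150.0, 0.5, 1.0)]]
mean, spread, consistent = estimate_G(rows, rel_tol=0.002)
print(f"G estimate = {mean:.4e}, relative spread = {spread:.1e}, consistent: {consistent}")
```

If the spread exceeded the tolerance, the sketch could not say whether the law or the measurements were at fault; as the text notes, the ratio itself might vary slightly.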
We can now respond to the seemingly natural question that is often asked:
What is a field of force? Our response is that we do not know and we do not have to
care. Mathematically, the field corresponds to a vector-valued function F (t; x, y, z)
on a region of space-time. We can determine the numerical value of this function
experimentally (assuming the validity of the laws of gravity and electromagnetism)
by observing the way test particles move in these fields. The fact that they do move
as predicted is taken as evidence that the field really is there, although it gives no
information as to what the field is “made of.” We accept the reality of the field on
the basis of these observations and give up trying to imagine a “substance” spread
over space that “is” the field.
Giving up on the search for the substance that constitutes a force field leaves us
with another important question, however. Maxwell and Hertz established the ex-
istence of electromagnetic waves, of which light is an example. Now our experience
of waves, based on observations of bodies of water and the transmission of sound
16 This possibility—that the future might not resemble the past—raises yet another long-
standing metaphysical problem, that of induction. Philosophers have tried to justify some principle
of induction so as to put a foundation under the way that we all think. Kant’s contemporary,
the philosopher David Hume (1711–1776), delivered a devastating critique of the principle. But
why is a justification needed? Induction is a very useful working hypothesis—we expect water
to boil when placed on a hot burner, not freeze. What need do we have to prove that inductive
thinking is valid? What causes us to think inductively is obvious on the basis of evolution: if we
didn’t think that way, we wouldn’t learn from experience. Why philosophers have felt a need to
provide logical grounds for induction is a mystery. If it ever fails on a large scale, we’ll abandon
the hypothesis. We don’t expect the Sun to rise tomorrow merely because it has done so every
day in human history; we have other (also inductively established!) physical principles that imply
it probably will. We don’t have space at this late point in our narrative to say any more about
this subject.
mathematical theory begins. We know the axioms and the theorems we deduce
about these undefined terms in the “savoir/wissen” sense. We do not know the
undefined terms themselves in the “connaı̂tre/kennen” sense. Even so, as remarked
above about the axiomatization of Euclidean geometry, we generally have in mind
some visualizable interpretation of the undefined terms.
Each person has a description of the world, analogous to a set of parameters for
describing a manifold. As long as that person is content to think and contemplate
the world in isolation, no difficulties arise with that description. But science is
a social enterprise, and scientists must talk to one another. We use language to
communicate our descriptions, and the language of physics—like the language of
tensors in differential geometry and indeed, including that language—enables us
to communicate with one another and determine whether our descriptions agree
or not. We repeat that the crucial link in this chain from unknowable undefined
concepts to known facts is measurement.17 The measurements are what we know,
and they enable physics to function as an intellectual and applied, social enterprise.
Such entities as quarks and force fields are concepts that can be communicated
among physicists and lead to quantitative predictions that have been consistently
confirmed by experiment. Anyone who desires a stronger assertion of their reality
than that must assume the burden of stating what that stronger form of reality
means, and how we could know about it. Thus, we take the modest position that
one should talk and reason as if these objects were real, until such time as better
explanations of what we observe are found. We assert, for example, that chemists
before the late eighteenth century were right in speaking as if phlogiston were a real
thing; but we do not assert that it was a real thing. Similarly, nineteenth-century
mathematical physicists were correct in regarding “luminiferous ether” as a real
thing; but we do not assert that it was a real thing. It is conceivable that some day
better theories will be found, and radio waves will no longer be a necessary part of
physics.18 But in the meantime, we are correct in speaking as if they exist.
Finally, we note another important aspect of scientific theories that affects their
plausibility. When a number of physical principles interact fruitfully—for example,
the synergy between the Lorentz transformation and Maxwell’s laws, as discussed
in Chapter 3—each of them gains plausibility. Theories have to be consistent with
observation and experiment, it is true. But beyond that testability, our confidence
in the correctness of a theory depends very much on the way it “fits” with other
physical theories. That interlocking reinforcement prevents the automatic rejection
of a principle on the basis of a single observation or experiment. This interlocking
of theories shows up especially well in the precise measurements made in modern
experimental science. We accept such things as radio waves and regard them as
having a real existence because that is the most efficient way of talking about
phenomena that we perceive directly, such as the computer screen I am looking at as
I type these words. The common-sense distinction between theory and observation
becomes murky in the light of modern physics, and describing reality is a matter of
getting the language just right. Even people who scornfully dismiss what is “only
a theory” will usually admit that an observation made with a mass spectrometer
is a valid observation, thereby ignoring the fact that the functioning of a mass
spectrometer is interpreted and understood only by means of very complicated
physical theories. What appears to be agreement between theory and observation
is more often nowadays agreement among a number of theories. Again, we need
to emphasize that the interlocking is often a matter of measuring a quantity using
instruments based on different physical theories. Agreement among several different
methods of measurement increases our confidence that each individual theory is
more or less right.
17 To express this transition in terms used in mathematical logic, the contrast between the
two corresponds to the difference between the syntax of a formal language and the semantics
expressed in its metalanguage. The theory (syntax) is purely formal and intellectual. Measurement
(semantics) gives it meaning and applies it to the physical world.
18 In my opinion, radio waves are far less likely to become obsolete as an explanation than
phlogiston and the ether, first because they arose when much more was known about the physical
universe, and second because they are more closely linked to measurement. I expect them at
the very least to have a longer life than these earlier, now obsolete concepts, but perhaps not
everlasting life.
382 8. EXPERIMENTS, CHRONOLOGY, METAPHYSICS
Those who are seeking absolute certainty and are disquieted by the thought
that future generations may look back at us and pity us for speaking about such
things as bosons and fermions as if they were real will not like this conclusion. But
our aim here is a practical one: to provide a way of looking at physical propositions
that can be used with confidence but remains subject to refutation at any time. The
human race is not likely ever to reach absolute truth. Each new physical paradigm
is like a vehicle that (we hope) is carrying us closer to some ultimate truth, but
breaks down after a time and has to be repaired or replaced. We have no choice
but to live our lives midway on this journey, riding in the vehicle that is currently
running, traveling hopefully rather than arriving, as already discussed. To make
ourselves more content with our lot, we shall close this section with a look at some
of the abandoned wreckage of past vehicles.
of a tuning fork are another example illustrating that a wave requires a periodic
motion of some substance. This is certainly true of sound waves, which cannot
travel in a vacuum. But light does travel through interstellar space. If it is a wave,
and something is vibrating as either its cause or its effect, just what is the medium
that is transmitting the wave?
In accordance with the precedents set by gravity and phlogiston, a suitable
name, ether (another ancient Greek word, αἰθήρ, meaning bright, clear air), was
invented as a patch to cover our ignorance. It seemed necessary to do so, since
certain phenomena associated with light appeared impossible to explain
in mechanical terms. In the nineteenth century, the accepted theory was that light
was a disturbance propagating through an elastic medium, and that was one of
the principal applications of elasticity theory in the book of Gabriel Lamé (1795–
1870). Lamé was sure that elasticity theory could explain even double refraction,
and he did manage to obtain Fresnel’s equation for the wave surface. In his treatise
[49], Lamé was able to explain the propagation of a planar light wave in purely
mechanical terms involving the vibrations of molecules in an elastic solid. Lamé
had great ambitions for his elegant theory of elasticity. But one important case
stumped him: propagation of light from a point source. As he showed, the wave
surface propagates normally at every point except the point of origin. There, he
said,
In order to obey the laws found above [the origin] must undergo
vibrations of infinite amplitude in all directions at once. Should
one therefore conclude that the assumption of a continuous stream
of waves produced by a point source is impossible? I do not think
so. . .
In this section (§ 129 of Lecture 24, pp. 325–326), under the heading “Nécessité
d’admettre l’éther” (“Necessity of admitting the ether”), he continued, demonstrating the amazing ability of a first-rate
mind to resist drawing uncomfortable conclusions. To Lamé, this anomaly showed
the need for the medium of ether, and, in very obscure language, he sought to save
his beloved elasticity theory from this disaster by appealing to its protection:19
It thus follows necessarily that the central system, and then all the
doubly-refracting space, must contain another type of matter which
is the actual medium vibrating under the influence of the light.
Thus physical mass plays only a passive role, modifying by a sort
of resistance the directions of the vibrations and the velocities of
propagation. . .
This passage, at the very end of Lamé’s treatise, is vague and comforting and
uses the ether as a trump card when nothing else will take the trick. It also renders
irrelevant all the intricate mathematical theory he had developed in the first 23 of
his lectures. As Bertrand Russell later wrote ([72], pp. 19–20):
19 Karl Weierstrass (1815–1897) thought he could find another way out. He believed Lamé
had not found all the solutions to his differential equations and tried to find other solutions by
a new method involving what we now call the divergence theorem (Gauss’s theorem). He gave
his paper on the subject to his pupil Sonya Kovalevskaya (1850–1891), who needed a publication
at the time to get a position at the University of Stockholm. Using the method, she produced
a formula for a solution. Unfortunately, as Vito Volterra (1860–1940) discovered a few weeks
after her death, the solutions didn’t actually satisfy the equations, since she had overlooked the
multivaluedness of the functions involved.
“Eureka!” moment in his thought, when he finally, after many years of striving that we have not
described here, found a set of generally covariant equations for the gravitational field.
8. A FEW WORDS FROM THE DISCOVERERS 385
system of parameters (which may, for example, be normal coordinates) in which the Christoffel
symbols vanish at the origin, then proves the relation only at the origin. But Corollaries 6.3 and 6.4
are examples of equations that hold at the origin in normal coordinates, but are nevertheless not
tensor equations. Although it is not clear what restrictions he imposes on the metric coefficients, it
is not at all difficult to exhibit metric coefficients in which this divergence is not zero, for example,
by taking $g_{11} = 1 + t^2 x + y^2$, $g_{22} = 1 - t z^2 + x^2 y^2$, $g_{33} = 1 + x^2 y - t^3 z$, $g_{44} = 1 - t^2 - x^2 - y^2 - z^4$,
and $g_{ij} = 0$ if $i \neq j$.
9. EPILOGUE: THE RECEPTION OF RELATIVITY 387
That view, however, does not take into account the rules of scientific evidence
that have been established over the past five centuries. The majority of scientists,
especially in the fields of physics, chemistry, biology, and geology, may propose
hypotheses that challenge an accepted view—indeed, it is their duty to do so—but
they almost never make these proposals dogmatically or refuse to be convinced by
contrary evidence. (The Brans–Dicke theory is an excellent example of this rational
process.)
For a non-scientist (such as the present author), whose opinion carries no scientific
weight but who wants to understand the world in terms of the most reliable
theories available, the best guide is something like a principle of “majority rules.”
In a true open market of ideas, individual biases tend to cancel one another out.
Any particular scientist may have a crazy pet theory that, meeting with no support
among peers, never wins the day and gets into textbooks. The overwhelming
majority of scientists in a particular field do not share the irrational thinking that leads
an individual to advocate silly theories. There are nearly always contrarians, even
in well-established areas of science: people with doctorates in physics who do not
believe in the reality of electric currents, or people with doctorates in some
biological or geological area who do not accept the reality of evolution. Einstein himself,
as is well known, tried to find ways of getting around the uncertainty principle;
but he never asserted this personal preference for a deterministic physics as a fact.
All too frequently, the contrarians—unlike Einstein—make dogmatic claims that
they are right, in the teeth of the evidence. Such people represent the irreducible
minimum of irrationality that is inherent in any area of human endeavor.
Unfortunately, their academic credentials are often exploited by special interests pursuing
a political or commercial end and cited as proof that there is still room for doubt
or (even worse) that the scientific consensus is simply a fraud and a hoax. In the
abstract, it is true, a scientific consensus is not the same thing as certain proof.
But in the world of public policy absolute proof is never available. Decisions have
to be made on the basis of incomplete information. Under those circumstances, it
is reckless to trust the contrarians, even though the non-expert cannot refute them.
Any account of the history of a scientific theory must consider many
human factors like these, which are not themselves scientific. The history of the
theory of relativity fits that pattern to some extent. The gradual acceptance of
the special and general theories of relativity, and the various routes to that end in
several countries, was the subject of a collection of essays edited by Thomas Glick
[33]. These essays are very nuanced, and we do not have space to summarize any
of them. Suffice it to say that old habits of thought die hard, and that new ideas
often have to run an obstacle course through the Academy.
9.1. The future of geometry in physics. One of the main purposes of the
present book has been to describe how, through relativity, the geometric concept of
curvature gradually displaced the older physical notion of force. To be sure, the
geometry involved was highly algebraized, so much so that one hardly needed figures
to describe it. One could get by perfectly well just manipulating formulas. That
paradigm shift was fully developed during the century that followed Einstein’s 1915
paper on general relativity and became one of the dominant features of mathematical
physics. Whether it will remain so in the future is uncertain. Already there are
signs that the search for a Grand Unified Theory (GUT) or a Theory of Everything
(TOE) will require an entirely different form of mathematics. If that happens, then
general relativity will represent the high-water mark of the penetration of geometry
into physics. But that is far too broad and deep a topic to enter into at this point,
and so we end our narrative here.
Bibliography
[1] Ali R. Amir-Móez, transl. (1919–2007). Discussion of difficulties in Euclid by Omar ibn-
Abrahim al-Khayyami (Omar Khayyam). Scripta Mathematica, 24(4):275–303, 1959.
[2] Larry C. Boles and Kenneth J. Lohmann. True navigation and magnetic maps in spiny
lobsters. Nature, 421 (2 January):60–63, 2003.
[3] Emile Borel (1871–1956). Introduction géométrique à quelques théories physiques. Gauthier–
Villars, Paris, 1914.
[4] Carl H. Brans and Robert Henry Dicke (1916–1997). Mach’s principle and a relativistic theory
of gravitation. Physical Review, 124:925–935, 1961.
[5] Ruth E. Buskirk and William P. O’Brien. Magnetic remanence and response to magnetic
fields in crustacea, volume 5 of Topics in Geobiology, chapter 17, pages 365–383. Springer-
Verlag, Berlin, 1985.
[6] Elie Cartan (1869–1952). Sur les variétés à connexion affine et la théorie de la relativité
généralisée (première partie). Annales de l’Ecole Normale Supérieure, 40:325–412, 1923.
[7] Elie Cartan (1869–1952). Sur les variétés à connexion affine et la théorie de la relativité
généralisée (première partie, suite). Annales de l’Ecole Normale Supérieure, 41:1–25, 1924.
[8] Ronald W. Clark (1918–1987). Einstein: The Life and Times. World Publishing Company,
New York and Cleveland, 1971.
[9] P. C. W. Davies. About Time: Einstein’s Unfinished Revolution. Simon & Schuster, New
York, 1995.
[10] Willem de Sitter (1872–1934). On the bearing of the principle of relativity on gravitational
astronomy. Monthly Notices of the Royal Astronomical Society, 71:388–415, 1911.
[11] Willem de Sitter (1872–1934). On Einstein’s theory of gravitation and its astronomical
consequences. Monthly Notices of the Royal Astronomical Society, 77:155–184, 1916.
[12] René Descartes (1596–1650). Principia philosophiæ (1647), volume 3 of Œuvres. Levrault,
Paris, 1824.
[13] Tevian Dray. The Geometry of Special Relativity. CRC Press, Boca Raton, 2012.
[14] Tevian Dray. Differential Forms and the Geometry of General Relativity. CRC Press, Boca
Raton, 2015.
[15] Johannes Droste (1886–1963). The field of a single centre in Einstein’s theory of gravitation,
and the motion of a particle in that field. Proceedings of the Royal Netherlands Academy of
Arts and Sciences, 19, Part 1:197–215, 1916.
[16] Arthur Stanley Eddington (1882–1944). The Mathematical Theory of Relativity. Cambridge
University Press, 1923.
[17] Albert Einstein (1879–1955). Zur Elektrodynamik bewegter Körper. Annalen der Physik,
vierte Folge, 17:891–921, 1905.
[18] Albert Einstein (1879–1955). Über den Einfluß der Schwerkraft auf die Ausbreitung des
Lichtes. Annalen der Physik, vierte Folge, 35:898–908, 1911.
[19] Albert Einstein (1879–1955). Erklärung der Perihelbewegung des Merkur aus der allgemeinen
Relativitätstheorie. Sitzungsberichte der preussischen Akademie der Wissenschaften, pages
831–839, 1915.
[20] Albert Einstein (1879–1955). Feldgleichungen der Gravitation. Sitzungsberichte der
preussischen Akademie der Wissenschaften, pages 844–847, 1915.
[21] Albert Einstein (1879–1955). Die Grundlage der allgemeinen Relativitätstheorie. Annalen der
Physik, pages 769–822, 1916.
[22] Albert Einstein (1879–1955). On a stationary system with spherical symmetry consisting of
many gravitating masses. Annals of Mathematics, Second series, 40:922–936, 1939.
[23] Albert Einstein (1879–1955). Relativity: The Special and the General Theory. Bonanza
Books, New York, 1961.
[24] Albert Einstein (1879–1955) and Marcel Grossmann (1878–1936). Entwurf einer
verallgemeinerten Relativitätstheorie und einer Theorie der Gravitation. Zeitschrift für Mathematik
und Physik, 62:225–261, 1913.
[25] Leonhard Euler (1707–1783). Methodus inveniendi lineas curvas maximi minimive
proprietate gaudentes. Bousquet & Cie., Lausanne, 1744.
[26] Leonhard Euler (1707–1783). Recherches sur la courbure des surfaces. Mémoires de
l’Académie des Sciences de Berlin, 16, 1767.
[27] Richard Faber (1940–2011). Differential Geometry and Relativity Theory. Marcel Dekker,
New York, 1983.
[28] George Francis FitzGerald (1851–1901). The ether and the earth’s atmosphere. Science,
13:349, 1889.
[29] Armand Hippolyte Louis Fizeau (1819–1896). Sur les hypothèses relatives à l’éther lumineux.
Comptes rendus, 33:349–355, 1851.
[30] Alexander Friedmann (1888–1925). Über die Krümmung des Raumes. Zeitschrift für Physik,
10:377–386, 1922.
[31] Alexander Friedmann (1888–1925). Über die Möglichkeit einer Welt mit konstanter negativer
Krümmung des Raumes. Zeitschrift für Physik, 21:326–332, 1924.
[32] George Gamow (1904–1968). My World Line. Viking Press, New York, 1970.
[33] Thomas Glick, editor. The Comparative Reception of Relativity. D. Reidel, Boston, 1987.
[34] Kurt Gödel (1906–1978). An example of a new type of cosmological solution of Einstein’s
field equations of gravitation. Reviews of Modern Physics, 21:447–450, 1949.
[35] Ivor Grattan-Guinness (1941–2014). Solving Wigner’s mystery: the reasonable (though
perhaps limited) effectiveness of mathematics in the natural sciences. The Mathematical
Intelligencer, 30:7–17, 2008.
[36] Jeremy Gray. Ideas of Space: Euclidean, Non-Euclidean, and Relativistic. Clarendon Press,
Oxford, 1989.
[37] G. M. Harvey. Gravitational deflection of light. The Observatory, 99:195–198, 1979.
[38] Stephen Hawking. A Brief History of Time. Bantam Books, New York, 1988.
[39] Heinrich Hertz (1857–1894). Die Prinzipien der Mechanik in neuem Zusammenhange
dargestellt, volume III of Gesammelte Werke. Johann Ambrosius Barth, Leipzig, 1894.
[40] David Hilbert (1862–1943). Die Grundlagen der Physik. Mathematische Annalen, 92:1–32,
1924.
[41] David Hilbert (1862–1943). Die Grundlagen der Physik, pages 258–289. Chelsea, New York,
1965.
[42] Douglas R. Hofstadter. Gödel, Escher, Bach: An Eternal Golden Braid. Basic Books, New
York, 1979.
[43] Douglas R. Hofstadter. I Am a Strange Loop. Basic Books, New York, 2007.
[44] Lancelot Hogben (1895–1975). Mathematics in the Making. Rathbone Books, London, 1960.
[45] Edwin Hubble (1889–1953). A relation between distance and radial velocity among extra-
galactic nebulae. Proceedings of the National Academy of Sciences, 15:168–173, 1929.
[46] Felix Klein (1849–1925). Vorlesungen über die Entwicklung der Mathematik im 19.
Jahrhundert, Part II. Springer-Verlag, Berlin, 1927.
[47] Martin J. Klein (1924–2009), A. J. Kox, and Robert Schulmann, editors. The Collected
Papers of Albert Einstein, Vol. 5. The Swiss Years: Correspondence 1902–1914. Princeton
University Press, 1993.
[48] Joseph-Louis Lagrange (1736–1811). Méchanique analitique. Desaint, Paris, 1788.
[49] Gabriel Lamé (1795–1870). Leçons sur la théorie de l’élasticité des corps solides, deuxième
édition. Gauthier–Villars, Paris, 1866.
[50] Urbain Le Verrier (1811–1877). Lettre à M. Faye sur la théorie de Mercure et sur le mouvement
du périhélie de cette planète. Comptes rendus, 49:379–383, 1859.
[51] Dennis Lehmkuhl. Mass-energy-momentum in general relativity. Only there because of
spacetime? British Journal for the Philosophy of Science, 62:453–488, 2011.
[52] Georges Lemaître (1894–1966). Un univers homogène de masse constante et de rayon croissant
rendant compte de la vitesse radiale des nébuleuses extra-galactiques. Annales de la Société
Scientifique de Bruxelles, 47:49–59, 1927.
[53] Angelo Loinger and Tiziana Marsico. On Hilbert’s gravitational repulsion (a historical note).
ArXiv:0904.1578v1, pages 1–7, 2009.
[54] Hendrik Antoon Lorentz (1853–1928). Théorie électromagnétique de Maxwell et son
application aux corps mouvants. Archives néerlandaises des sciences exactes et naturelles, 25:363–
552, 1892.
[55] Hendrik Antoon Lorentz (1853–1928). Versuch einer Theorie der electrischen und optischen
Erscheinungen in bewegten Körpern. E. J. Brill, Leiden, 1895.
[56] Hendrik Antoon Lorentz (1853–1928). The Michelson–Morley experiment and the dimensions
of moving bodies. Nature, 106:793–795, 1921.
[57] David Lovelock. The uniqueness of the Einstein field equations in a four-dimensional space.
Archive for Rational Mechanics and Analysis, 33:54–70, 1969.
[58] Ernst Mach (1838–1916). Die Mechanik in ihrer Entwicklung historisch-kritisch dargestellt.
F. A. Brockhaus, Leipzig, 1883.
[59] Albert Abraham Michelson (1852–1931) and Edward Williams Morley (1838–1923). On the
relative motion of the earth and the luminiferous ether. American Journal of Science, 34:333–
345, 1887.
[60] Hermann Minkowski (1864–1909). Raum und Zeit. Jahresbericht der deutschen
Mathematiker-Vereinigung, pages 75–88, 1909.
[61] Al Momin. The Gödel solution to the Einstein field equations. (Posted online), 2008.
[62] Jayant V. Narlikar. An Introduction to Relativity. Cambridge University Press, 2010.
[63] Isaac Newton (1643–1727). Philosophiæ naturalis principia mathematica (1687), translated
as The Mathematical Principles of Natural Philosophy, volume 34 of Great Books of the
Western World. Encyclopaedia Britannica, 1952.
[64] Henri Poincaré (1854–1912). Sur la dynamique de l’électron. Comptes rendus, 140:1504–1508,
1905.
[65] Henri Poincaré (1854–1912). The Value of Science. Modern Library Science Series. Modern
Library, New York, 2001.
[66] Henri Poincaré (1854–1912). Science and Method (1908), translated by Francis Maitland
(reprint). Barnes and Noble Books, 2004.
[67] Gregorio Ricci-Curbastro (1853–1925) and Tullio Levi-Civita (1873–1941). Méthodes de calcul
différentiel absolu et leurs applications. Mathematische Annalen, 54:125–201, 1900.
[68] Georg Friedrich Bernhard Riemann (1826–1866). Ueber die Hypothesen, welche der
Geometrie zu Grunde liegen. Abhandlungen der königlichen Gesellschaft der Wissenschaften zu
Göttingen, 13:133–152, 1867.
[69] Jason Ross, editor. Light: A History. Fermat’s Complete Correspondence on Light.
ΔYNAMIΣ, The Journal of the LaRouche–Riemann Method of Physical Economics, 2008.
[70] Bertrand Arthur William Russell (1872–1970). History of Western Philosophy. Simon and
Schuster, New York, 1945.
[71] Bertrand Arthur William Russell (1872–1970). The Pursuit of Truth. Fact and Fiction. Simon
and Schuster, New York, 1962.
[72] Bertrand Arthur William Russell (1872–1970). The Analysis of Matter (reprint of the 1927
edition). Routledge, London, 1992.
[73] Karl Schwarzschild (1873–1916). Über das Gravitationsfeld einer Kugel aus incompressibler
Flüssigkeit. Sitzungsberichte der preussischen Akademie der Wissenschaften, pages 424–434,
1916.
[74] Karl Schwarzschild (1873–1916). Über das Gravitationsfeld eines Massenpunktes nach der
Einsteinschen Theorie. Sitzungsberichte der preussischen Akademie der Wissenschaften,
pages 189–196, 1916.
[75] Daniel M. Siegel. Innovation in Maxwell’s Electromagnetic Theory: Molecular Vortices,
Displacement Current, and Light. Cambridge University Press, 2002.
[76] Johann Georg Soldner (1776–1833). Ueber die Ablenkung eines Lichtstrals von seiner
geradlinigen Bewegung, durch die Attraktion eines Weltkörpers, an welchem er nahe vorbei
geht. Berliner astronomisches Jahrbuch, pages 161–172, 1804.
[77] Shlomo Sternberg. Celestial Mechanics. Benjamin, New York, 1969.
[78] Shlomo Sternberg. Curvature in Mathematics and Physics. Dover, New York, 2012.
[79] Peter Guthrie Tait (1831–1901). On the importance of quaternions in physics. Philosophical
Magazine and Journal of Science, Fifth series, pages 84–97, January 1890.
[80] J. H. Taylor and J. M. Weisberg. A new test of general relativity – Gravitational radiation
and the binary pulsar PSR 1913+16. Astrophysical Journal, 253:908–920, 1982.
[81] Roberto Torretti. Relativity and Geometry. Foundations and Philosophy of Science and
Technology. Pergamon Press, Oxford, 1983.
[82] Vladimir Varićak (1865–1942). Anwendung der Lobatschefskijschen Geometrie in der
Relativtheorie. Physikalische Zeitschrift, 11:93–96, 1910.
[83] Robert M. Wald. General Relativity. University of Chicago Press, 1984.
[84] Richard Westfall (1924–1996). Newton’s marvelous years of discovery and their aftermath:
myth versus manuscript. Isis, 71:109–121, 1980.
[85] Eugene Wigner (1902–1995). The unreasonable effectiveness of mathematics in the natural
sciences. Communications on Pure and Applied Mathematics, 13:1–14, 1960.
[86] Clifford M. Will. Was Einstein Right? Putting General Relativity to the Test. Second edition.
Basic Books, 1986.
[87] Clifford M. Will. The confrontation between general relativity and experiment. (Posted online
at http://www.livingreviews.org/lrr-2006-3), 2006.
[88] A. Zee. Einstein Gravity in a Nutshell. Princeton University Press, Princeton, 2013.
Subject Index
  Newtonian, 6–9, 16, 35, 41, 71, 80, 101, 113, 140, 189, 279, 306, 310, 337, 347, 367
  relativistic, 13, 42, 308–310
Méchanique analitique, 343
Mercury, xiii, xvi, 129, 134, 139, 166, 305, 322, 332, 348, 385
Merton College, Oxford, viii, 79, 337
Merton rule, viii, x, 59, 337, 339
metaphysics, xvi
Methodus inveniendi lineas curvas, 342
metric, 212
  positive-definite, 164
  Riemannian, 164
metric coefficients, 25, 142
metric tensor, 16, 95, 107–108, 309
Michelson–Morley experiment, 20, 346
Möbius function, 148
momentum, 12, 75, 361
  relativistic, 97
Morera’s theorem, 375
Moscow, 318
MRI, 377
multilinear algebra, 345
mystical formula, 5, 25

nabla (∇), 88, 89, 94, 202, 211
Nature Physics, 336
Neptune, 131
Newton’s Principia, 85, 198
Newton’s equations, 143, 157
Newton’s first law, 75, 339
Newton’s law of cooling, 286
Newton’s law of gravity, 90, 116, 127, 132, 138, 143, 147, 324, 370, 379
Newton’s lemma, 114
Newton’s second law, ix, 19, 80, 93, 94, 97, 111, 132, 160, 306, 308, 343, 347
Newtonian limit, 167, 175
Newtonian mechanics, 6–9, 16, 35, 41, 71, 80, 101, 113, 140, 189, 279, 306, 310, 337, 347, 367
Newtonian potential, 91
Newtonian Schwarzschild radius, 144, 146, 153, 175
Newtonian synchronization, 28
non-Euclidean geometry, xii, 142
non-measurable set, 366
normal coordinates, 257–261, 300
number theory, 363
numerical curvature, 200, 204

observable, 368
observation time, 12
Occam’s Razor, 333
On Burning Mirrors and Lenses (ibn Sahl), 337
On Burning Mirrors (Diocles), 336
On Floating Bodies, 351
“On the assumptions that underlie geometry”, 232, 344
optical telescope, 377
Optics (Ptolemy), 337
optics, 115
orthogonal matrix, 54
orthogonal transformation, 15
orthonormal basis, 261, 270
orthonormal coordinates, xiv, 15, 20, 26

parabola, 65, 137
paraboloid of revolution, 336
paradox
  Russell’s, 364
  twin, 32–35, 64
parallel postulate, 3, 343, 344, 354
parallel transport, 239–248, 298, 303
parameter space, 18
parasang, 353
Paris Academy of Sciences, 232
particular statement, 364
passive equation, 132
pendulum, xiii, 17, 359
Perelandra, 8
perihelion, 134, 175, 180, 185
phlogiston, x, 351, 381–382
phyllotaxis, 370
physical dimension, 31
Physics (Aristotle), 336
Poincaré disk model, xii
point-set topology, xvii
Poisson’s equation, 91, 306, 311
polar coordinates, xiii
position-gradient, 94
potential, 127
potential energy, 12, 16, 87–92, 97, 126, 140, 149, 305, 308, 323, 342, 362
precession, 99, 111, 129–176, 332, 385
precision, 52
  finite, 52
  infinite, 52
predicate, 363
Princeton, 170
principal curvatures, 205
Principe, 180
Principia (Newton’s), 8, 59, 341
principle of equivalence, 132, 143
privileged coordinates, 113
proper space, 21, 24
proper time, xiii, 5, 13, 21, 99, 102, 103, 110, 168, 348
proportion, 57
proton, 361
Prussian Academy of Sciences, 131
pseudo-Euclidean norm, 108
pseudo-Euclidean space, 25
pseudo-metric, 164
pseudo-sphere, 184, 345
Name Index