
A Student’s Guide to General Relativity

This compact guide presents the key features of General Relativity, to support and
supplement the presentation in mainstream, more comprehensive undergraduate
textbooks, or as a recap of essentials for graduate students pursuing more advanced
studies. It helps students plot a careful path to understanding the core ideas and basic
techniques of differential geometry, as applied to General Relativity, without
overwhelming them. While the guide doesn’t shy away from necessary technicalities,
it emphasizes the essential simplicity of the main physical arguments. Presuming a
familiarity with Special Relativity (with a brief account in an appendix), it describes
how general covariance and the equivalence principle motivate Einstein’s theory of
gravitation. It then introduces differential geometry and the covariant derivative as the
mathematical technology which allows us to understand Einstein’s equations of
General Relativity. The book is supported by numerous worked examples and
exercises, and important applications of General Relativity are described in an
appendix.

Norman Gray is a research fellow at the School of Physics & Astronomy,
University of Glasgow, where he has regularly taught the General Relativity honours
course since 2002. He was educated at Edinburgh and Cambridge Universities, and
completed his Ph.D. in particle theory at The Open University. His current research
relates to astronomical data management, and he is an editor of the journal Astronomy
and Computing.
Other books in the Student’s Guide series
A Student’s Guide to Analytical Mechanics, John L. Bohn
A Student’s Guide to Infinite Series and Sequences, Bernhard W. Bach, Jr.
A Student’s Guide to Atomic Physics, Mark Fox
A Student’s Guide to Waves, Daniel Fleisch, Laura Kinnaman
A Student’s Guide to Entropy, Don S. Lemons
A Student’s Guide to Dimensional Analysis, Don S. Lemons
A Student’s Guide to Numerical Methods, Ian H. Hutchinson
A Student’s Guide to Lagrangians and Hamiltonians, Patrick Hamill
A Student’s Guide to the Mathematics of Astronomy, Daniel Fleisch, Julia Kregenow
A Student’s Guide to Vectors and Tensors, Daniel Fleisch
A Student’s Guide to Maxwell’s Equations, Daniel Fleisch
A Student’s Guide to Fourier Transforms, J. F. James
A Student’s Guide to Data and Error Analysis, Herman J. C. Berendsen
A Student’s Guide to General Relativity

NORMAN GRAY
University of Glasgow
University Printing House, Cambridge CB2 8BS, United Kingdom
One Liberty Plaza, 20th Floor, New York, NY 10006, USA
477 Williamstown Road, Port Melbourne, VIC 3207, Australia
314–321, 3rd Floor, Plot 3, Splendor Forum, Jasola District Centre, New Delhi – 110025, India
79 Anson Road, #06–04/06, Singapore 079906

Cambridge University Press is part of the University of Cambridge.


It furthers the University’s mission by disseminating knowledge in the pursuit of
education, learning, and research at the highest international levels of excellence.

www.cambridge.org
Information on this title: www.cambridge.org/9781107183469
DOI: 10.1017/9781316869659
© Norman Gray 2019
This publication is in copyright. Subject to statutory exception
and to the provisions of relevant collective licensing agreements,
no reproduction of any part may take place without the written
permission of Cambridge University Press.
First published 2019
Printed in the United Kingdom by TJ International Ltd, Padstow, Cornwall
A catalogue record for this publication is available from the British Library.
Library of Congress Cataloging-in-Publication Data
Names: Gray, Norman, 1964– author.
Title: A student’s guide to general relativity / Norman Gray (University of Glasgow).
Description: Cambridge, United Kingdom ; New York, NY : Cambridge University
Press, 2018. | Includes bibliographical references and index.
Identifiers: LCCN 2018016126 | ISBN 9781107183469 (hardback ; alk. paper) |
ISBN 1107183464 (hardback ; alk. paper) | ISBN 9781316634790 (pbk. ; alk. paper) |
ISBN 1316634795 (pbk.; alk. paper)
Subjects: LCSH: General relativity (Physics)
Classification: LCC QC173.6 .G732 2018 | DDC 530.11–dc23
LC record available at https://lccn.loc.gov/2018016126
ISBN 978-1-107-18346-9 Hardback
ISBN 978-1-316-63479-0 Paperback
Additional resources for this publication at www.cambridge.org/9781107183469
Cambridge University Press has no responsibility for the persistence or accuracy
of URLs for external or third-party internet websites referred to in this publication
and does not guarantee that any content on such websites is, or will remain,
accurate or appropriate.
Before thir eyes in sudden view appear
The secrets of the hoarie deep, a dark
Illimitable Ocean without bound,
Without dimension, where length, breadth, & highth,
And time and place are lost;
[. . . ]
Into this wilde Abyss,
The Womb of nature and perhaps her Grave,
Of neither Sea, nor Shore, nor Air, nor Fire,
But all these in thir pregnant causes mixt
Confus’dly, and which thus must ever fight,
Unless th’ Almighty Maker them ordain
His dark materials to create more Worlds,
Into this wild Abyss the warie fiend
Stood on the brink of Hell and look’d a while,
Pondering his Voyage: for no narrow frith
He had to cross.
John Milton, Paradise Lost, II, 890–920

But in the dynamic space of the living Rocket,
the double integral has a different meaning. To integrate
here is to operate on a rate of change so that time falls away:
change is stilled . . . ‘Meters per second’ will integrate
to ‘meters.’ The moving vehicle is frozen, in space,
to become architecture, and timeless.
It was never launched.
It will never fall.
Thomas Pynchon, Gravity’s Rainbow
Contents

Preface
Acknowledgements
1 Introduction
1.1 Three Principles
1.2 Some Thought Experiments on Gravitation
1.3 Covariant Differentiation
1.4 A Few Further Remarks
Exercises
2 Vectors, Tensors, and Functions
2.1 Linear Algebra
2.2 Tensors, Vectors, and One-Forms
2.3 Examples of Bases and Transformations
2.4 Coordinates and Spaces
Exercises
3 Manifolds, Vectors, and Differentiation
3.1 The Tangent Vector
3.2 Covariant Differentiation in Flat Spaces
3.3 Covariant Differentiation in Curved Spaces
3.4 Geodesics
3.5 Curvature
Exercises
4 Energy, Momentum, and Einstein’s Equations
4.1 The Energy-Momentum Tensor
4.2 The Laws of Physics in Curved Space-time
4.3 The Newtonian Limit
Exercises
Appendix A Special Relativity – A Brief Introduction
A.1 The Basic Ideas
A.2 The Postulates
A.3 Spacetime and the Lorentz Transformation
A.4 Vectors, Kinematics, and Dynamics
Exercises
Appendix B Solutions to Einstein’s Equations
B.1 The Schwarzschild Solution
B.2 The Perihelion of Mercury
B.3 Gravitational Waves
Exercises
Appendix C Notation
C.1 Tensors
C.2 Coordinates and Components
C.3 Contractions
C.4 Differentiation
C.5 Changing Bases
C.6 Einstein’s Summation Convention
C.7 Miscellaneous
References
Index
Preface

This introduction to General Relativity (GR) is deliberately short, and is
tightly focused on the goal of introducing differential geometry, then getting
to Einstein’s equations as briskly as possible.
There are four chapters:

Chapter 1 – Introduction and Motivation.
Chapter 2 – Vectors, Tensors, and Functions.
Chapter 3 – Manifolds, Vectors, and Differentiation.
Chapter 4 – Physics: Energy, Momentum, and Einstein’s Equations.

The principal mathematical challenges are in Chapters 2 and 3, the first
of which introduces new notations for possibly familiar ideas. In contrast,
Chapters 1 and 4 represent the connection to physics, first as motivation, then
as payoff. The main text of the book does not cover Special Relativity (SR), nor
does it cover applications of GR to any significant extent. It is useful to mention
SR, however, if only to fix notation, and it would be perverse to produce a book
on GR without a mention of at least some interesting metrics, so both of these
are discussed briefly in appendices.
When it comes down to it, there is not a huge volume of material that a
physicist must learn before they gain a technically adequate grasp of Einstein’s
equations, and a long book can obscure this fact. We must learn how to describe
coordinate systems for a rather general class of spaces, and then learn how
to differentiate functions defined on those spaces. With that done, we are
over the threshold of GR: we can define interesting functions such as the
Energy-Momentum tensor, and use Einstein’s equations to examine as many
applications as we need, or have time for.
This book derives from a ten-lecture honours/masters course I have delivered
for a number of years in the University of Glasgow. It was the first of a pair
of courses: this one was ‘the maths half’, which provided most of the maths
required for its partner, which focused on various applications of Einstein’s
equations to the study of gravity. The course was a compulsory one for most of
its audience: with a smaller, self-selecting class, it might be possible to cover
the material in less time, by compressing the middle chapters, or assigning
readings; with a larger class and a more leisurely pace, we could happily
spend a lot more time at the beginning and end, discussing the motivation and
applications.
In adapting this course into a book, I have resisted the temptation to expand
the text at each end. There are already many excellent but heavy tomes
on GR – I discuss a few of them in Section 1.4.2 – and I think I would
add little to the sum of world happiness by adding another. There are also
shorter treatments, but they are typically highly mathematical ones, which
don’t amuse everyone. Relativity, more than most topics, benefits from your
reading multiple introductions, and I hope that this book, in combination with
one or other of the mentioned texts, will form one of the building blocks in
your eventual understanding of the subject.
As readers of any book like this will know, a lecture course has a point,
which is either the exam at the end, or another course that depends on it. This
book doesn’t have an exam, but in adapting it I have chosen to act as if it
did: the book (minus appendices) has the same material as the course, in both
selection and exclusion, and has the same practical goal, which is to lead the
reader as straightforwardly as is feasible to a working understanding of the
core mathematical machinery of GR. Graduate work in relativity will of course
require mining of those heavier tomes, but I hope it will be easier to explore
the territory after a first brisk march through it. The book is not designed to
be dipped into, or selected from; it should be read straight through. Enjoy the
journey.
Another feature of lecture courses and of Cambridge University Press’s
Student’s Guides, which I have carried over to this book, is that they are
bounded: they do not have to be complete, but can freely refer students to
other texts, for details of supporting or corroborating interest. I have taken
full advantage of this freedom here, and draw in particular on Schutz’s A
First Course in General Relativity (2009), and to a somewhat lesser extent
on Carroll’s Spacetime and Geometry (2004), aligning myself with Schutz’s
approach except where I have a positive reason to explain things differently.
This book is not a ‘companion’ to Schutz, and does not assume you have a
copy, but it is deliberately highly compatible with it. I am greatly indebted
both to these and to the other texts of Section 1.4.2.

In writing the text, I have consistently aimed for succinctness; I have
generally aimed for one precise explanation rather than two discursive ones,
while remembering that I am writing a physics text, and not a maths one. And
in line with the intention to keep the destination firmly in mind, there are rather
few major excursions from our route. The book is intended to be usable as a
primary resource for students who need or wish to know some GR but who
will not (yet) specialise in it, and as a secondary resource for students starting
on more advanced material.
The text includes a number of exercises, and the density of these reflects the
topics where my students had most difficulty. Indeed, many of the exercises,
and much of the balance of the text, are directly derived from students’
questions or puzzles. Solutions to these exercises can be downloaded at
www.cambridge.org/gray.

Throughout the book, there are various passages, and a couple of complete
sections, marked with ‘dangerous bend’ signs, like this one.
They indicate supplementary details, material beyond the scope of the book
which I think may be nonetheless interesting, or extra discussion of concepts
or techniques that students have found confusing or misunderstandable in the
past. If, again, this book had an exam, these passages would be firmly out of
bounds.
Acknowledgements

These notes have benefitted from very thoughtful comments, criticism, and
error checking, received from both colleagues and students, over the years this
book’s precursor course has been presented. The balance of time on different
topics is in part a function of these students’ comments and questions. Without
downplaying many other contributions, Craig Stark, Liam Moore, and Holly
Waller were helpfully relentless in finding ambiguities and errors.
The book would not exist without the patience and precision of Róisín
Munnelly and Jared Wright of CUP. Some of the exercises and some of the
motivation are taken, with thanks, from an earlier GR course also delivered at
the University of Glasgow by Martin Hendry. I am also indebted to various
colleagues for comments and encouragement of many types, in particular
Richard Barrett, Graham Woan, Steve Draper, and Susan Stuart. For their
precision and public-spiritedness in reporting errors, the author would like to
thank Charles Michael Cruickshank, David Spaughton and Graham Woan.

1
Introduction

What is the problem that General Relativity (GR) is trying to solve? Section 1.1
introduces the principle of general covariance, the relativity principle, and the
equivalence principle, which between them provide the physical underpinnings
of Einstein’s theory of gravitation.
We can examine some of these points a second time, at the risk of a little
repetition, in Section 1.2, through a sequence of three thought experiments,
which additionally bring out some immediate consequences of the ideas.
It’s rather a matter of taste whether you regard the thought experiments as
motivation for the principles, or as illustrations of them.
The remaining sections in this chapter are other prefatory remarks, about
‘natural units’ (in which the speed of light c and the gravitational constant G
are both set to 1), and pointers to a selection of the many textbooks you may
wish to consult for further details.
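The natural-units convention mentioned here can be made concrete with a short numerical sketch. This example is an illustration added here, not part of the original text; the constant values are standard SI figures. With c = G = 1, masses and times are both measured in metres.

```python
# Converting SI quantities into geometrised units (c = G = 1), in which
# mass and time both become lengths.
G = 6.674e-11        # m^3 kg^-1 s^-2
c = 2.998e8          # m s^-1

def mass_to_metres(m_kg):
    """A mass m corresponds to the length G m / c^2 when c = G = 1."""
    return G * m_kg / c**2

def seconds_to_metres(t_s):
    """A time t corresponds to the length c t when c = 1."""
    return c * t_s

M_sun = 1.989e30     # kg
print(mass_to_metres(M_sun))   # ~1.48e3 m: the Sun's mass is 'about 1.5 km'
print(seconds_to_metres(1.0))  # ~3.0e8 m: one second is 'about 3e8 metres'
```

Statements such as ‘the mass of the Sun is about a kilometre and a half’ look odd at first, but they are exactly what setting c = G = 1 buys us: one unit for everything, and no stray constants cluttering the equations.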

1.1 Three Principles


Newton’s second law is
F = dp/dt,    (1.1)
which has the special case, when the force F is zero, of dp/dt = 0: The
momentum is a conserved quantity in any force-free motion. We can take
this as a statement of Newton’s first law. In the standard example of first-year
physics, of a puck moving across an ice rink or an idealised car moving along
an idealised road, we can start to calculate with this by attaching a rectilinear
coordinate system S to the rink or to the road, and discovering that
F = ma = m d²r/dt²,    (1.2)


from which we can deduce the constant-acceleration equations and, from that,
all the fun and games of Applied Maths 1.
Alternatively, we could describe a coordinate system S′ rotating about the
origin of our rectilinear one with angular speed ω, in which

F′ = ma′ = F − mω × (ω × r′) − 2mω × dr′/dt,    (1.3)
and then derive the equations of constant acceleration from that. Doing so
would not be wrong, but it would be perverse, because the underlying physical
statement is the same in both cases, but the expression of it is more complicated
in one frame than in the other. Put another way, Eq. (1.1) is physics, but the
distinction between Eqs. (1.2) and (1.3) is merely mathematics.
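The claim that Eqs. (1.2) and (1.3) differ only in mathematics can be checked numerically. The sketch below is an illustration added here, not part of the original text, and every name in it (w, r0, rk4_step, and so on) is invented for the purpose: it integrates the force-free case of the rotating-frame equation, keeping only the centrifugal and Coriolis terms, and confirms that the result is exactly the inertial frame’s straight line re-expressed in rotating coordinates.

```python
# A force-free particle moves in a straight line in the inertial frame S
# (Eq. (1.2) with F = 0). Here we integrate the rotating-frame equation of
# motion (Eq. (1.3) with F = 0), whose only terms are the centrifugal and
# Coriolis 'forces', and check that it reproduces the same straight line
# viewed from S'. Planar motion, angular velocity w along z.
import math

w = 0.7                              # angular speed of S' relative to S
r0 = (1.0, 0.0)                      # initial position in S
v0 = (0.2, 0.5)                      # initial velocity in S

def straight_line_in_rotating_frame(t):
    """r(t) = r0 + v0 t in S, rotated by -w t into S' coordinates."""
    x, y = r0[0] + v0[0] * t, r0[1] + v0[1] * t
    c, s = math.cos(w * t), math.sin(w * t)
    return (c * x + s * y, -s * x + c * y)

def deriv(state):
    """State is (x', y', vx', vy').  The acceleration is centrifugal plus
    Coriolis: a' = -w x (w x r') - 2 w x v' = w^2 r' - 2 w z x v' (planar)."""
    x, y, vx, vy = state
    ax = w * w * x + 2 * w * vy
    ay = w * w * y - 2 * w * vx
    return (vx, vy, ax, ay)

def rk4_step(state, dt):
    """One fourth-order Runge-Kutta step of size dt."""
    def shift(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))
    k1 = deriv(state)
    k2 = deriv(shift(state, k1, dt / 2))
    k3 = deriv(shift(state, k2, dt / 2))
    k4 = deriv(shift(state, k3, dt))
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

# Initial conditions in S': r'(0) = r(0), and v'(0) = v(0) - w x r(0).
state = (r0[0], r0[1], v0[0] + w * r0[1], v0[1] - w * r0[0])
dt, t = 1e-3, 0.0
while t < 2.0 - 1e-12:
    state = rk4_step(state, dt)
    t += dt

expected = straight_line_in_rotating_frame(t)
error = math.hypot(state[0] - expected[0], state[1] - expected[1])
print(error)   # tiny: both descriptions are the same physics
```

The same underlying motion, integrated with fictitious forces in S′, lands on top of the rotated straight line: the physics is in Eq. (1.1), and the frame-dependent terms are bookkeeping.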
This is a more profound statement than it may at first appear, and it can be
dignified as

The principle of general covariance: All physical laws must be invariant under
all coordinate transformations.

A putative physical law that depends on the details of a particular frame –
which is to say, a particular coordinate system – is one that depends on a
mathematical detail that has no physical significance; we must rule it out of
consideration as a physical law. Instead, Eq. (1.1) is a relation between two
geometrical objects, namely a momentum vector and a force vector, and this
illustrates the geometrical approach that we follow in this text: a physical law
must depend only on geometrical objects, independent of the frame in which
we realise them. In order to do calculations with it, we need to pick a particular
frame, but that is incidental to the physical insight that the equation represents.
The geometrical objects that we use to model physical quantities are vectors,
one-forms, and tensors, which we learn about in Chapter 2.
It is necessary that the differentiation operation in Eq. (1.1) is also frame-
independent. Right now, this may seem too obvious to be worth drawing
attention to, but in fact a large part of the rest of this text is about defining
differentiation in a way that satisfies this constraint. You may already have
come across this puzzle, if you have studied the convective derivative in fluid
mechanics or the tensor derivative in continuum mechanics, and you will have
had hints of it in learning about the various forms of the Laplacian in different
coordinate systems. See Section 1.3 for a preview.
It is also fairly obvious that Eq. (1.2) is a simpler expression than Eq. (1.3).
This observation is not of merely aesthetic significance, but it prompts us to
discover that there is a large class of frames where the expression of Newton’s
second law takes the same simple form as Eq. (1.2); these frames are the frames

that are moving with respect to S with a constant velocity v, and we call each
of the members of this class an inertial frame. In each inertial frame, motion
is simple and, moreover, each inertial frame is related to another in a simple
way: namely the galilean transformation in the case of pre-relativistic physics,
and the Lorentz transformation in the case of Special Relativity (SR).
The fact that the observational effects of Newton’s laws are the same in each
inertial frame means that we cannot tell, from observation only of dynamical
phenomena within the frame, which frame we are in. Put less abstractly, you
can’t tell whether you’re moving or stationary, without looking outside the
window and detecting movement relative to some other frame. Inertial frames
thus have, or at least can be taken to have, a special status. This special status
turns out, as a matter of observational fact, to be true not only of dynamical
phenomena dependent on Newton’s laws, but of all physical laws, and this
also can be elevated to a principle.

The principle of relativity (RP): (a) All true equations in physics (i.e., all ‘laws
of nature’, and not only Newton’s first law) assume the same mathematical form
relative to all local inertial frames. Equivalently, (b) no experiment performed
wholly within one local inertial frame can detect its motion relative to any other
local inertial frame.

If we add to this principle the axiom that the speed of light is infinite, we
deduce the galilean transformation; if we instead add the axiom that the speed
of light is a frame-independent constant (an axiom that turns out to be amply
confirmed by observation), we deduce the Lorentz transformation and Special
Relativity. In SR, remember, we are obliged to talk of a four-dimensional
coordinate frame, with one time and three space dimensions.
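The Lorentz transformation just mentioned can be exhibited concretely. The following sketch is an illustration added here, not part of the original text: it applies a boost of speed v, in units with c = 1, and checks the frame-independence of the interval t² − x², which is the quantitative content of the frame-independent light speed.

```python
# The Lorentz boost (c = 1) relating two inertial frames in standard
# configuration, and a check that it leaves the interval t^2 - x^2
# invariant -- the frame-independence forced on us by the relativity
# principle plus an invariant, finite light speed.
import math

def boost(t, x, v):
    """Coordinates (t', x') of the event (t, x) in a frame moving at speed v."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return (gamma * (t - v * x), gamma * (x - v * t))

t, x, v = 3.0, 1.5, 0.6
tp, xp = boost(t, x, v)
print(t * t - x * x, tp * tp - xp * xp)   # the same number in both frames
```

With v = 0.6 the boosted coordinates are quite different from the originals, but the interval 6.75 is untouched; that invariance is what survives the transition from the galilean to the Lorentz transformation.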
General Relativity – Einstein’s theory of gravitation – adds further signif-
icance to the idea of the inertial frame. Here, an inertial frame is a frame
in which SR applies, and thus the frame in which the laws of nature take
their corresponding simple form. This definition, crucially, applies even in the
presence of large masses where (in newtonian terms) we would expect to find
a gravitational force. The frames thus picked out are those which are in free
fall, either because they are in deep space far from any masses, or because they
are (attached to something that is) moving under the influence of ‘gravitation’
alone. I put ‘gravitation’ in scare quotes because it is part of the point of GR
to demote gravitation from its newtonian status as a distinct physical force to a
status as a mathematical fiction – a conceptual convenience – which is no more
real than centrifugal force.
The first step of that demotion is to observe that the force of gravitation
(I’ll omit the scare quotes from now on) is strangely independent of the

nature of the things that it acts upon. Imagine a frame sitting on the surface
of the Earth, and in it a person, a bowl of petunias, and a radio, at some
height above the ground: we discover that, when they are released, each of
them will accelerate at the same rate towards the floor (Galileo is supposed
to have demonstrated this same thing using the Tower of Pisa, careless of
the health and safety of passers-by). Newton explains this by saying that the
force of gravitation on each object is proportional to its gravitational mass
(the gravitational ‘charge’, if you like); and the acceleration of each object, in
response to that force, is proportional to its inertia, which is proportional to its
inertial mass. Newton doesn’t put it in those terms, of course, but he also fails to
explain why the gravitational and inertial masses, which a priori have nothing
to do with each other, turn out experimentally to be exactly proportional
to each other, even though the person, the plant, the plantpot, and the
radio broadcasting electromagnetic waves all exhibit very different physical
properties.
Now imagine this same frame – or, for the sake of concreteness and the
containment of a breathable atmosphere, a spacecraft – floating in space. Since
spacecraft, observer, petunias, and radio are all equally floating in space, none
will move with respect to another (or, if they are initially moving, they will
continue to move with constant relative velocity). That is, Newton’s laws work
in their simple form in this frame, which we can therefore identify as an inertial
frame.
If, now, we turn on the spacecraft’s engines, then the spacecraft will
accelerate, but the objects within it will not, until the spacecraft collides with
them, and starts to accelerate them by pushing them with what we will at
that point decide to call the cabin floor. Crucially – and, from this point of
view, obviously – the sequence of events here is independent of the details
of the structure of the ceramic plantpot, the biology of the observer and the
petunias, and the electronic intricacies of the radio. If the spacecraft continues
to accelerate at, say, 9.81 m s⁻², then the objects now firmly on the cabin floor
will experience a continuous force of one standard Earth gravity, and observers
within the cabin will find it difficult to tell whether they are in an accelerating
spacecraft or in a uniform gravitational field.
In fact we can make the stronger statement – and this is another physical
statement which has been verified to considerable precision in, for example,
the Eötvös experiments – that the observers will find it impossible to tell the
difference between acceleration and uniform gravitation; and this is a third
remark that we can elevate to a physical principle.

The Equivalence Principle (EP): Uniform gravitational fields are equivalent to
frames that accelerate uniformly relative to inertial frames.

The EP is closely related to the observation that gravitational and inertial mass
are strictly proportional; Rindler, for example, refers to this as the ‘weak’
equivalence principle (see Section 4.2.2).
We can summarise where we have got to as follows: (i) the principle of
general covariance constrains the possible forms of statements of physical
law, (ii) the EP and RP point to a privileged status of inertial frames in our
search for further such laws, (iii) the RP gives us a link to the physics that we
already know at this stage, and (iv) the EP gives us a link to the ‘gravitational
fields’ that we want to learn more about.
These three principles make a variety of physical and mathematical points.

• The principle of general covariance restricts the category of mathematical
statements that we are prepared to countenance as possible descriptions of
nature. It says something about the relationship between physics and
mathematics.
• The RP is either, in version (b) above, a straightforwardly physical
statement or, in version (a), a physical statement in mathematical form. It
picks out inertial frames as having a special status, and by saying that all
inertial frames have equal status, it restricts the transformation between any
pair of frames.
• The EP is also a physical statement. As we will examine further in
Chapter 4, it further constrains the set of ‘special’ inertial frames, while
retaining the idea that these inertial frames are physically indistinguishable,
and exploring the constraints that that equivalence imposes.

By a ‘physical statement’ I mean a statement that picks out one of multiple
mathematically consistent possibilities, and says that this one is the one that
matches our universe. Mathematically, we could have a universe in which the
galilean transformation works for all speeds, and the speed of light is infinite;
but we don’t.

Most of the statements in this section can be quibbled with, sometimes
with great sophistication. The statement of the RP is quoted
with minor adaptation from Barton (1999), who discusses the principle at
book length in the context of SR. The wording of the EP is from Schutz
(2009, §5.1), but Rindler (2006) discusses this with characteristic precision
in his early chapters (distinguishing weak, strong, and semistrong variants
of the EP), and Misner, Thorne and Wheeler (1973, §§7.2–7.3) discuss it
with characteristic vividness. There is a minor industry devoted to the precise
physical content of the EP and the principle of general covariance, and to their
logical relationship to Einstein’s theory of gravity. This industry is discussed
at substantial length by Norton (1993), and subsequent texts quoting it, but it
does not seem to contribute usefully to an elementary discussion such as this
one, and I have thought it best to keep the account in this section as compact
and as straightforward as possible, while noting that there is much more one
can go on to think about.

1.2 Some Thought Experiments on Gravitation


At the risk of some repetition, we can make the same points again, and
make some further interesting deductions, through a sequence of thought
experiments.

1.2.1 The Falling Lift


Recall from SR that we may define an inertial frame to be one in which
Newton’s laws hold, so that particles that are not acted on by an external force
move in straight lines at a constant velocity. In Misner, Thorne, and Wheeler’s
words, inertial frames and their time coordinates are defined so that motion
looks simple. This is also the case if we are in a box far away from any
gravitational forces: we may identify that box as a local inertial frame (we will
see the significance of the word ‘local’ later in the chapter). Another way of
removing gravitational forces – less extreme than going into deep space – is to
put ourselves in free fall. Einstein asserted that these two situations are indeed
fully equivalent, and defined an inertial frame as one in free fall.
Objects at rest in an inertial frame – in either of the equivalent situations
of being far away from gravitating matter or freely falling in a gravitational
field – will stay at rest. If we accelerate the box-cum-inertial-frame, perhaps
by attaching rockets to its ‘floor’, then the box will accelerate but its contents
won’t; they will therefore move towards the floor at an increasing speed, from
the point of view of someone in the box. 1 This will happen irrespective of the
mass or composition of the objects in the box; they will all appear to increase
their speed at the same rate.
Note that I am carefully not using the word ‘accelerate’ for the change in
speed of the objects in the box with respect to that frame. We reserve that word
for the physical phenomenon measured by an accelerometer, and the result of
a real force, and try to avoid using it (not, I fear, always successfully) to refer

1 By ‘point of view’ I mean ‘as measured with respect to a reference frame fixed to the box’, but
such circumlocution can distract from the point that this is an observation we’re talking about –
we can see this happening.

Figure 1.1 A floating box.

Figure 1.2 A free-fall box.

to the second derivative of a position. Depending on the coordinate system, the
one does not always imply the other, as we shall see later.
This is very similar to Galileo’s observation that all objects fall under
gravity at the same rate, irrespective of their mass or composition. Einstein
supposed that this was not a coincidence, and that there was a deep equivalence
between acceleration and gravity (we shall see later, in Chapter 4, that the force
of gravity that we feel while standing in one place is the result of us being
accelerated away from the path we would have if we were in free fall). He
raised this to the status of a postulate: the Equivalence Principle.
Imagine being in a box floating freely in space, and imagine shining a torch
horizontally across it from one wall to the other (Figure 1.1). Where will the
beam end up? Obviously, it will end up at a point on the wall directly opposite
the torch. There’s nothing exotic about this. The EP tells us that the same must
happen for a box in free fall. That is, a person inside a falling lift would observe
the torch beam to end up level with the point at which it was emitted, in the
(inertial) frame of the lift. This is a straightforward and unsurprising use of the
EP. How would this appear to someone watching the lift fall?
Since the light takes a finite time to cross the lift cabin, the spot on the wall
where it strikes will have dropped some finite (though small) distance, and so
will be lower than the point of emission, in the frame of someone watching
this from a position of safety (Figure 1.2). That is, this non-free-fall observer
will measure the light’s path as being curved in the gravitational field. Even
massless light is affected by gravity. [Exercise 1.1]
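To get a feel for the size of the effect the watching observer sees, we can put numbers in. This short sketch is an illustration added here, with invented figures, not part of the original text: the beam crosses a cabin of width L in time L/c, during which the lift gains speed, so the spot drops by ½g(L/c)².

```python
# Size of the bending effect of Section 1.2.1: light crossing a lift of
# width L takes t = L/c, and the non-free-fall observer sees the spot a
# distance (1/2) g t^2 below the emission point.
g = 9.81        # m s^-2
c = 2.998e8     # m s^-1
L = 3.0         # cabin width in metres

t = L / c                 # crossing time, about 10 nanoseconds
drop = 0.5 * g * t * t    # distance the spot falls in that time
print(drop)               # ~5e-16 m: far smaller than an atomic nucleus
```

The effect is real but absurdly small over a room; this is why the bending of light needed astronomical path lengths (starlight grazing the Sun) to be detected.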

Figure 1.3 The Pound-Rebka experiment.

1.2.2 Gravitational Redshift


Imagine dropping a particle of mass m through a distance h. The particle starts
off with energy m (E = mc², with c = 1; see Section 1.4.1), and ends up
with energy E = m + mgh (see Figure 1.3). Now imagine converting all of
this energy into a single photon of energy E, and sending it up towards the
original position. It reaches there with energy E′, which we convert back into
a particle.² Now, either we have invented a perpetual motion machine, or else
E′ = m:

    E′ = m = E/(1 + gh),    (1.4)

and we discover that a photon loses energy as a necessary consequence of
climbing through a gravitational field, and as a consequence of our demand
that energy be conserved.
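We can put numbers to Eq. (1.4) with a short sketch (mine, not the book’s). It uses the 22.5 m tower height of the Pound-Rebka experiment; since gh is tiny, the fractional energy loss 1 − 1/(1 + gh) is numerically indistinguishable from gh itself.

```python
# Fractional redshift from Eq. (1.4), E' = E/(1 + gh), in natural units:
# heights in metres are converted by dividing out factors of c.
# h = 22.5 m is the tower height used by Pound and Rebka; g is assumed
# to be the standard surface value.

c = 299_792_458.0   # m/s, exact by definition
g = 9.81            # m/s^2
h = 22.5            # m

gh = (g / c**2) * h                         # dimensionless: g/c^2 has units m^-1
fractional_shift = 1.0 - 1.0 / (1.0 + gh)   # (E - E')/E

print(f"gh                     = {gh:.3e}")
print(f"fractional energy loss = {fractional_shift:.3e}")   # ~2.5e-15
```

A shift of a few parts in 10¹⁵ is what made the experimental confirmation so delicate.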
This energy loss is termed gravitational redshift , and it (or rather, some-
thing very like it) has been confirmed experimentally, in the ‘Pound-Rebka
experiment’. It’s also sometimes referred to as ‘gravitational doppler shift’,
but inaccurately, since it is not a consequence of relative motion, and so has
nothing to do with the doppler shift that you are familiar with.
Light, it seems, can tell us about the gravitational field it moves through.

1.2.3 Schild’s Photons


Imagine firing a photon, of frequency f, from an event A to an event B spatially
located directly above it in a gravitational field (see Figure 1.4). As we discov-
ered in the previous section, the photon will be redshifted to a new frequency f′.
After some number of periods n, we repeat this, and send up another photon
(between the points marked A′ and B′ on the space-time diagram).

² As described, this is kinematically impossible, since we cannot do this and conserve
momentum, but we can imagine sending distinct particles back and forth, conserving just
energy; this would have an equivalent effect, but be more intricate to describe precisely.

Figure 1.4 Schild’s photons.

Photons are a kind of clock, in that the interval between ‘wavecrests’, 1/f,
forms a kind of ‘tick’. The length of this tick will be measured to have different
numerical values in different frames, but the start and end of the interval
nonetheless constitute two frame-independent events.
Presuming that the source and receiver are not in relative motion, the
intervals AB and A′B′ will be the same (I’ve drawn these as straight lines on the
diagram, but the argument doesn’t depend on that). However, the intervals AA′
and BB′ comprise the same number n of periods, which means that the intervals
in time, n/f and n/f′, as measured by local clocks, are different. That is, we
have not constructed the parallelogram we might have expected, and have
therefore discovered that the geometry of this space-time is not the flat geometry
we might have expected, and that this is purely as a result of the presence of
the gravitational field through which we are sending the photons.
Finding out more about this geometry is what we aim to do in this text.

The ‘Schild’s photons’ argument, and a version of the gravitational
redshift argument, first appeared in Schild (1962), where both are
presented in careful and precise detail. The subtleties are important, but the
arguments in the sections earlier in this chapter, though slightly schematic,
contain the essential intuition. Schild’s paper also includes a thoughtful
discussion of what parts of GR are and are not addressed by experiment.

1.2.4 Tides and Geodesic Deviation (and Local Frames)


Consider two particles, A and B, both falling towards the earth, with their
height from the centre of the earth given by z(t) (Figure 1.5). They start off
level with each other and separated by a horizontal distance ξ(t).
From the diagram, the separation ξ(t) is proportional to z(t), so that
ξ(t) = kz(t), for some constant k. The gravitational force on a particle of
mass m at altitude z is F = GMm/z², thus

Figure 1.5 Two falling particles.

    d²ξ/dt² = k d²z/dt² = −k F/m = −k GM/z² = −ξ GM/z³.
This tells us that the inertial frames attached to these freely falling particles
approach each other at an increasing speed (that is, they ‘accelerate’ towards
each other in the sense that the second derivative of their separation is non-
zero, but since they are in free fall, there is no physical acceleration that an
observer in the frame would feel as a push).
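The tidal relation d²ξ/dt² = −ξGM/z³ is easy to integrate numerically. The following sketch (my own illustration; the Earth parameters and the altitude are assumed round values, not from the text) holds z fixed, which is a fair approximation over a ten-second fall:

```python
# Integrating d^2(xi)/dt^2 = -xi * GM/z^3 with a simple Euler scheme.
# G, M (Earth) and z (radius of a ~400 km orbit) are assumed round values;
# z is held fixed, a fair approximation over a ten-second fall.

G = 6.674e-11    # m^3 kg^-1 s^-2
M = 5.972e24     # kg
z = 6.771e6      # m

xi = 1.0         # m, initial horizontal separation
v_xi = 0.0       # m/s, rate of change of the separation
dt = 0.1         # s, time step

for _ in range(int(10 / dt)):         # ten seconds of free fall
    a_xi = -xi * G * M / z**3         # tidal 'acceleration' of the separation
    v_xi += a_xi * dt
    xi += v_xi * dt

print(f"separation after 10 s: {xi:.6f} m")   # a little under 1 m
```

The particles drift together by only a few hundredths of a millimetre, which is why a sufficiently small laboratory can be treated as an inertial frame.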
If A and B are two observers in inertial frames (or inertial spacecraft), then
we have said that they cannot distinguish between being in space far from any
gravitating masses, and being in free fall near a large mass. If instead they
found themselves at opposite ends of a giant free-falling spacecraft, then they
would find themselves drifting closer to each other as the spacecraft fell, in
apparent violation of Newton’s laws. Is there a contradiction here?
No. The EP as quoted in Section 1.1 talked of uniform gravitational fields,
which this is not. Also, both the RP of that section, and the discussion in
Section 1.2.1, talked of local inertial frames. A lot of SR depends on inertial
frames having infinite extent: if I am an inertial observer, then any other
inertial observer must be moving at a constant velocity with respect to me.
In GR, in contrast, an inertial frame is a local approximation (indeed it is fully
accurate only at a point, an important issue we will return to later), and if your
measurement or experiment is sufficiently extended in space or time, or if your
instruments are sufficiently accurate, then you will be able to detect tidal forces
in the way that A and B have done in this thought experiment.
If A and B are plummeting down lift shafts, in free fall, on opposite sides
of the earth, then they are inertial observers, but they are ‘accelerating’ with
respect to one another. This means that, if I am one of these inertial observers,
then (presuming I do not have more pressing things to worry about) I cannot
use SR to calculate what the other inertial observer would measure in their
frame, nor calculate what I would measure if I observed a bit of physics that
I understand, which is happening in the other inertial observer’s frame.

But this is precisely what I do want to do, supposing that the bit of physics in
question is happening in free fall in the accretion disk surrounding a black hole,
and I want to interpret what I am seeing through my telescope. Gravitational
redshift of spectral lines is just the beginning of it.
It is GR that tells us how we must patch together such disparate inertial
frames. [Exercise 1.2]

1.3 Covariant Differentiation


Like many other parts of physics, the study of gravitation depends on differ-
ential equations, and working with differential equations depends (obviously)
on being able to differentiate. A large fraction of this book – essentially all of
Chapter 3 – is taken up with learning how to define differentiation in a curved
space-time.
In many ways the key section of the book is Section 3.3.2. That section
builds directly on the definition of differentiation that you learned about in
school. For some function f : R → R,

    df/dx = lim_{h→0} [f(x + h) − f(x)] / h.

That definition is straightforward because it’s obvious what f(x + h) − f(x)
means, and it’s obvious how we divide that by a number.
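To see the limit doing its work, we can evaluate the difference quotient for a simple f at shrinking h (an illustrative sketch of mine, not from the text):

```python
# The school definition of the derivative in action: for f(x) = x^2 the
# difference quotient at x = 3 tends to f'(3) = 6 as h shrinks.

def f(x):
    return x * x

x = 3.0
for h in (1.0, 0.1, 0.001, 1e-6):
    quotient = (f(x + h) - f(x)) / h
    print(f"h = {h:g}:  quotient = {quotient:.7f}")
# The printed quotients are 7, 6.1, 6.001, ... closing in on 6.
```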
In a curved space-time, however, a naive approach won’t work, because
the objects we want to differentiate are vectors (or other geometrical objects
such as tensors, which we will learn about next, in Chapter 2), and the way
we subtract them must be independent of any particular choice of coordinate
system. We can see part of the problem even in two dimensions: while it is
easy to see how to subtract two cartesian vectors (we simply work component
by component), it is less clear how to subtract two vectors expressed in polar
coordinates. If we go on to think about how to define and perform arithmetical
operations on vectors defined on the surface of a sphere – a two-dimensional
surface with intrinsic curvature – things become yet more subtle (think of the
difference between plane and spherical trigonometry). All that said, the intu-
ition that lies behind the definition earlier in this section is the same intuition
that underlies the more elaborate maths of Chapter 3. Hold on to that thought.
In the next two chapters we will approach these problems step by step, and
return to physics in Chapter 4, when we get a chance to apply these ideas in
developing Einstein’s equations for the structure of space-time. Appendix B is
all about further application of the tools we develop in this one. The sequence
of ideas is shown in Figure 1.6.

Figure 1.6 The sequence of ideas (1 principles, 2 tensors, 3 diff’n, 4 gravity,
moving from physics into mathematics and back). In Chapters 2 and 3 we
examine the mathematical technology that we will need to turn the principles
of Chapter 1 into the physics of Chapter 4.

1.4 A Few Further Remarks


1.4.1 Natural Units
In SR, we normally use natural units (also geometrical units , and not quite
the same thing as Planck units ), in which we use the same units, metres, to
measure both distance and time, with the result that we measure distance in
these two directions in space-time using the same units (because of the high
speed of light, metres and seconds are otherwise absurdly mismatched). We
extend this in GR, but now measuring mass in metres also. First, a recap of
natural units in SR.
It is straightforward to measure distances in time-units, and we do this
naturally when we talk of Edinburgh being 50 minutes from Glasgow (maintenance
works permitting), or the earth being 8 light-minutes from the sun, or the
nearest star being a little more than 4 light years away. In fact, since 1981
or so, the International Standard definition of the metre is that it is the distance
light travels in 1/299,792,458 seconds; that is, the speed of light is precisely
c = 299,792,458 m s⁻¹ by definition, and so c is therefore demoted to being
merely a conversion factor between two different units of distance, namely the
metre and the (light-)second.
Alternatively, we can decide that this relation gives us permission to think
of the metre as a (very small) unit of time: specifically the time it takes for light
to travel a distance of one metre (about 3.3 nanoseconds-of-time).

There are several advantages to this: (i) In relativity, space and time are not
really distinct, and having different units for the two ‘directions’ can obscure
this; (ii) In these units, light travels a distance of one metre in a time of one
metre, giving the speed of light as an easy-to-remember, and dimensionless,
c = 1; (iii) If we measure time in metres, then we no longer need the
conversion factor c in our equations, which are consequently simpler. We also
quote other speeds in these units of metres per metre, so that all speeds are
dimensionless and less than one.
Of these three points, the first is by far the most important.
Writing c = 1 = 3 × 10⁸ m s⁻¹ (dimensionless) looks rather odd, until
we read ‘seconds’ as units of length. In the same sense, the inch is defined
to be precisely 25.4 mm long, and this figure of 25.4 is merely a conversion
factor between two different, and only historically distinct, units of length.
We write this as 1 in = 25.4 mm or, equivalently but unconventionally, as
1 = 25.4 mm in⁻¹.
Consider converting 10 J = 10 kg m² s⁻² to natural units. Since c = 1, we
have 1 s = 3 × 10⁸ m, and so 1 s⁻² = (9 × 10¹⁶)⁻¹ m⁻². So 10 kg m² s⁻² =
10 kg m² × (9 × 10¹⁶)⁻¹ m⁻² = 1.1 × 10⁻¹⁶ kg. Recalling SR’s E = γmc² =
γm, it should be unsurprising that, in the ‘right’ units, mass has the same units
as other forms of energy.
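The same arithmetic, mechanised (a sketch of mine; the only input is the defined value of c):

```python
# 10 J -> kilogrammes: each factor of s^-1 becomes (1/c) m^-1, so dividing
# the SI value by c^2 removes the s^-2.

c = 299_792_458.0        # m/s, exact by definition

E_joules = 10.0          # 10 J = 10 kg m^2 s^-2
E_natural = E_joules / c**2

print(f"10 J = {E_natural:.3e} kg")   # ~1.1e-16 kg, matching the text
```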
In GR it is also usual to use units in which the gravitational constant is
G = 1. That means that the expression 1 = G = 6.673 × 10⁻¹¹ m³ kg⁻¹ s⁻² =
7.414 × 10⁻²⁸ m kg⁻¹ becomes a conversion factor between kilogrammes and
the other units. This, for example, gives the mass of the sun, in these units, as
M⊙ ≈ 1.5 km.
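The G = 1 conversion can be checked the same way (again a sketch of mine; the numerical solar mass is an assumed standard value, not from the text):

```python
# Kilogrammes -> metres using G = c = 1: the conversion factor is G/c^2.
# The solar mass below is an assumed standard value.

G = 6.674e-11            # m^3 kg^-1 s^-2
c = 299_792_458.0        # m/s
M_sun_kg = 1.989e30      # kg (assumed)

factor = G / c**2        # ~7.4e-28 m per kg
M_sun_m = M_sun_kg * factor

print(f"G/c^2 = {factor:.3e} m/kg")
print(f"M_sun = {M_sun_m / 1000:.2f} km")   # ~1.5 km, as quoted
```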
It is easy, once you have a little practice, to convert values and equations
between the different systems of units. Throughout the rest of this book, I will
quote equations in units where c = 1, and, when we come to that, G = 1, so
that the factors c and G disappear from the equations. [Exercise 1.3]

1.4.2 Further Reading


When learning relativity, even more than with other subjects, you benefit from
reading things multiple times, from different authors, and from different points
of view. I mention a couple of good introductions here, but there is really no
substitute for going to the appropriate section in a library, looking through the
books there, and finding one that makes sense to you .
I presume you are familiar with SR. There is a brief summary of SR in
Appendix A, which is intended to be compatible with this book.

This book is significantly aligned with Schutz (2009) (hereafter simply
‘Schutz’), in the sense that this is the book closest in style to this text;
also, I will occasionally direct you to particular sections of it, for details or
proofs.
Other textbooks you might want to look at follow. You may want to use
these other books to take your study of the subject further. But you might
also use them (perhaps a little cautiously) to test your understanding as
you go along, by comparing what you have read here with another author’s
approach.

• Carroll (2004) is very good. Although it’s mathematically similar, the order
of the material, and the things it stresses, are sufficiently different from this
book and Schutz that it might be confusing. However, that difference is also
a major virtue: the book introduces topics clearly, and in a way that usefully
contrasts with my way. Also, Carroll’s relativity lecture notes from a few
years ago, which are a precursor of the book, are easily findable on the
Internet.
• Rindler (2006) always explains the physics clearly, distinguishing
successively strong variants of the EP, and the motivation for GR (his first
two chapters are, incidentally, notably excellent in their careful explanation
of the conceptual basis of SR). However Rindler is now rather
old-fashioned in many respects, in particular in its treatment of differential
geometry, which it introduces from the point of view of coordinate
transformations, rather than the geometrical approach we use later in the
book. Earlier editions of this book are equally valuable for their insight.
• Similarly, Narlikar (2010) is worth looking at, to see if it suits
you. The mathematical approach is one which introduces vectors and
tensors via components (like Rindler), rather than the more functional
approach we’ll use here. Narlikar is good at transmitting mathematical and
physical insights.
• Misner, Thorne, and Wheeler (1973) is a glorious, comprehensive, doorstop
of a book. Its distinctive prose style and typographical oddities have fans
and detractors in roughly equal numbers. Chapter 1 in particular is worth
reading for an overview of the subject. MTW is, incidentally, highly
compatible in style with the introduction to SR found in Taylor and
Wheeler’s excellent Spacetime Physics (1992).
• Wald (1984) is comprehensive and has long been a standby of
undergraduate- and graduate-level GR courses.
• Hartle (2003) is more recent and similarly popular, with a practical focus.

This is a pretty mathematical topic, but it is supposed to be a physics book,
so we’re looking for the physical insights, which can easily become buried
beneath the maths.

• Another Schutz book, Gravity from the Ground Up (Schutz, 2003), aims to
cover all of gravitational physics from falling apples to black holes using
the minimum of maths. It won’t help with the differential geometry, but it’ll
supply lots of insight.
• Longair (2003) is excellent. The section on GR (only a smallish part of the
book) is concerned with motivating the subject rather than doing a lot of
maths, and is in a seat-of-the-pants style that might be to your taste.

There are also many more advanced texts. The following are graduate-level
texts, and so reach well beyond the level of this book. They are mathematically
very sophisticated. If, however, your tastes and experience run that way, then
the introductory chapters of these books might be instructive, and give you a
taste of the vast wonderland of beautiful maths that can be found in this subject.
They can also be useful as a way of compactly summarising material you have
come to understand by a more indirect route.

• Chapter 1 of Stewart (1991) covers more than the content of this course in
its first 60 laconic pages.
• Geometrical Methods of Mathematical Physics (Schutz, 1980) is a
delightful book, which explains the differential geometry clearly and
sparsely, including applications beyond relativity and cosmology. However,
it appeals only to those with a strong mathematical background; it may
cause alarm and despondency in others.
• Hawking and Ellis (1973), chapter 2, covers more than all the differential
geometry of this book.

1.4.3 Notation Conventions, Here and Elsewhere


I use overlines to denote vectors: Ā. This is consistent with Schutz (1980), but
relatively rare elsewhere; it seems neater to me than the over-arrow version A⃗
(as well as easier to write by hand). One-forms are denoted with a tilde: p̃.
Tensors are in sans serif: g. For a summary of other notation conventions, see
Appendix C.
There are a number of different sign conventions in use in relativity books.
The conventions used in this book match those in Schutz, MTW (1973), Schutz
(1980), and Hawking and Ellis (1973). We can summarise the conventions for

Table 1.1 Sign conventions in various texts. This text also matches Hawking
and Ellis (1973) and MTW. References are to equation numbers in the
corresponding texts, except where indicated. For explanations, see Eq. (1.5).

Riemann Ricci Einstein metric

This text + (3.49) + (4.16) + (4.37) + (2.33)


Schutz + (6.63) + (6.91) + (8.7) + (3.1)
Carroll + (3.113) + (3.144) + (4.44) + (1.15)
Rindler + (8.20) − (8.31) − (9.71) − (7.12)
Stewart − (1.9.4) + (1.9.12) − (1.13.5) − (§1.10)

a few texts (imitating MTW’s corresponding table) in Table 1.1. In this table,
the signs are

    ±R^i_{jkl} = Γ^i_{jl,k} − Γ^i_{jk,l} + Γ^i_{σk} Γ^σ_{jl} − Γ^i_{σl} Γ^σ_{jk}
    ±R_{βν} = R^μ_{βμν}                                                  (1.5)
    G_{μν} = R_{μν} − ½ R g_{μν} = ±8π T_{μν}
    ±η_{μν} = diag(−1, +1, +1, +1)

Exercises
Here and in the following chapters, the notations d + , d − , u + , and so on,
indicate questions that are slightly more or less difficult, or more useful, than
others.

Exercise 1.1 (§1.2.1) A photon is sent across a box of width h sitting
in space, while it is being accelerated at 1g, in the same direction, by a
rocket. What is the frequency (or energy) of the photon when it is absorbed
by a detector on the other side of the box? Use the Doppler redshift formula
ν_em/ν_obs = 1 + v (in units where c = 1), and note that the box will not move
far in this time. How does this link to other remarks in this section? [u+]

Exercise 1.2 (§1.2.4) If two 1 kg balls, 1 m apart, fall down a lift shaft
near the surface of the earth, how much is their tidal acceleration towards each
other? How much is their acceleration towards each other as a result of their
mutual gravitational attraction?

Exercise 1.3 (§1.4.1) Convert the following to units in which c = 1: (a)
10 J; (b) lightbulb power, 100 W; (c) Planck’s constant, ℏ = 1.05 × 10⁻³⁴ J s;
(d) velocity of a car, v = 30 m s⁻¹; (e) momentum of a car, 3 × 10⁴ kg m s⁻¹;
(f) pressure of 1 atmosphere, 10⁵ N m⁻²; (g) density of water, 10³ kg m⁻³; (h)
luminosity flux, 10⁶ J s⁻¹ cm⁻².
Convert the following to physical units (SI): (i) velocity, v = 10⁻²; (j)
pressure 10¹⁹ kg m⁻³; (k) time 10¹⁸ m; (l) energy density u = 1 kg m⁻³; (m)
acceleration 10 m⁻¹; (n) the Lorentz transformation, t′ = γ(t − vx); (o) the
‘mass-shell’ equation E² = p² + m². Problem slightly adapted from Schutz
(2009, ch.1).
2 Vectors, Tensors, and Functions

At this point we take a holiday from the physics, in favour of mathematical
preliminaries. This chapter is concerned with defining vectors, tensors, and
functions reasonably carefully, and showing how they are linked with the
notion of coordinate systems. This will take us to the point where, in Chapter 3,
we can talk about doing calculus with these objects. You may well be familiar
with many of the mathematical concepts in this chapter – functions, vector
spaces, vector bases, and basis transformations – but I will (re)introduce them
in this chapter with a slightly more sophisticated mathematical notation, which
will allow us to make use of them later. The exception to that is tensors,
which may have seemed slightly gratuitous, if you have encountered them at
all before; they are vital in relativity.

2.1 Linear Algebra


The material in this section will probably be, if not familiar, at least
recognisable to you, though possibly with new notation. After this section, I’m going to
assume you are comfortable with both the concepts and the notation; you may
wish to recap some of your first- or second-year maths notes. See also Schutz’s
appendix A.
Here, and elsewhere in this book, the idea of linearity is of crucial
importance; it is not, however, a complicated notion. Consider a function
(or operator or other object) f , objects x and y in the domain of f , and
numbers { a , b } ∈ R: if f (a x + b y) = af (x) + bf (y), then the function f is
said to be linear. Thus the function f = ax is linear in x, but f = ax + b,
f = ax² and f = sin x are not; matrix multiplication is linear in the (matrix)
arguments, but the rotation of a solid sphere (for example) is not linear in the
Euler angles (note that although you might refer to f (x ) = ax + b as a ‘straight


line graph’, or might refer to it as linear in other contexts, in this formal sense
it is not a linear function, because f(2x) ≠ 2f(x)).
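The defining property f(ax + by) = af(x) + bf(y) can be probed numerically; this small sketch (mine, not the book’s) tests it at a few sample points, which is enough to expose the non-linear examples above:

```python
# A function f: R -> R is linear iff f(a*x + b*y) == a*f(x) + b*f(y).
# Checking a handful of sample tuples (a, b, x, y) is enough to catch
# the affine and quadratic counterexamples from the text.

def is_linear(f, samples):
    return all(
        abs(f(a * x + b * y) - (a * f(x) + b * f(y))) < 1e-9
        for a, b, x, y in samples
    )

samples = [(2.0, 1.0, 1.5, 4.0), (0.5, 3.0, -2.0, 7.0)]

print(is_linear(lambda x: 3 * x, samples))        # True:  f = ax
print(is_linear(lambda x: 3 * x + 1, samples))    # False: f = ax + b
print(is_linear(lambda x: x * x, samples))        # False: f = ax^2
```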

2.1.1 Vector Spaces


Mathematicians use the term ‘vector space’ to refer to a larger set of objects
than the pointy things that may spring to your mind at first.
A set of objects V is called a vector space if it satisfies the following axioms
(for A , B ∈ V and a ∈ R):

1. Closure: there is a symmetric binary operator ‘+’, such that
A + B = B + A ∈ V.
2. Identity: there exists an element 0 ∈ V , such that A + 0 = A .
3. Inverse: for every A ∈ V , there exists an element B ∈ V such that
A + B = 0 (incidentally, these first three properties together mean that V is
classified as an abelian group ).
4. Multiplication by reals: for all a and all A , aA ∈ V and 1 A = A .
5. Distributive: a (A + B ) = aA + aB .

The obvious example of a vector space is the set of vectors that you learned
about in school, but crucially, anything that satisfies these axioms is also a
vector space.
Vectors A_1, . . . , A_n are linearly independent (LI) if a_1 A_1 + a_2 A_2 + · · · +
a_n A_n = 0 implies a_i = 0, ∀i. The dimension of a vector space, n, is the largest
number of LI vectors that can be found. A set of n LI vectors A_i in an n-
dimensional space is said to span the space, and is termed a basis for the space.
It is then a theorem that, for every vector B ∈ V, there exists a set of numbers
{b_i} such that B = Σ_{i=1}^n b_i A_i; these numbers {b_i} are the components of the
vector B with respect to the basis {A_i}.
One can (but need not) define an inner product on a vector space: the inner
product between two vectors A and B is written A · B (yes, the dot-product
that you know about is indeed an example of an inner product; also note
that the inner product is sometimes written ⟨A, B⟩, but we will reserve that
notation, here, to the contraction between a vector and a one-form, defined in
Section 2.2.1). This is a symmetric, linear, operator that maps pairs of vectors
to the real line. That is (i) A · B = B · A, and (ii) (aA + bB) · C = aA · C + bB · C.
Two vectors, A and B, are orthogonal if A · B = 0. An inner product is positive-
definite if A · A > 0 for all A ≠ 0, or indefinite otherwise. The norm of a
vector A is |A| = |A · A|^(1/2). The symbol δ_{ij} is the Kronecker delta symbol,
defined as

    δ_{ij} ≡ 1 if i = j, and 0 otherwise    (2.1)
(throughout the book, we will use variants of this symbol with indexes raised
or lowered – they mean the same: δ_{ij} = δ^i_j = δ^{ij}; see the remarks about this
object at the end of Section 2.2.6). A set of vectors {e_i} such that e_i · e_j = δ_{ij}
(that is, all orthogonal and with unit norm) is an orthonormal basis. It is a
theorem that, if {b_i} are the components of an arbitrary vector B in this basis,
then b_i = B · e_i. [Exercises 2.1 and 2.2]
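The theorem b_i = B · e_i is easy to check numerically. In this sketch (my own; the rotated basis is an arbitrary choice) we extract components of a vector in R² with respect to a rotated orthonormal basis and reassemble it:

```python
# Components with respect to an orthonormal basis: b_i = B . e_i,
# checked in R^2 with a basis rotated by an arbitrary angle.
import math

theta = 0.3
e1 = (math.cos(theta), math.sin(theta))
e2 = (-math.sin(theta), math.cos(theta))    # e1.e2 = 0, both unit norm

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

B = (2.0, 5.0)
b1, b2 = dot(B, e1), dot(B, e2)             # components in the {e1, e2} basis

# Reassemble: B = b1 e1 + b2 e2 recovers the original components.
B_again = (b1 * e1[0] + b2 * e2[0], b1 * e1[1] + b2 * e2[1])
print(B_again)
```

Note that b1² + b2² equals |B|², as it must in an orthonormal basis.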

2.1.2 Matrix Algebra


An m × n matrix A is a mathematical object that can be represented by a
set of elements denoted A_{ij}, via

    A = ( A_{11}  A_{12}  · · ·  A_{1n} )
        ( A_{21}  A_{22}  · · ·  A_{2n} )
        (   ⋮       ⋮                ⋮  )
        ( A_{m1}  A_{m2}  · · ·  A_{mn} ).
You know how to define addition of two m × n matrices, and multiplication
of a matrix by a scalar, and that the result in both cases is another matrix, so
the set of m × n matrices is another example of a vector space. You also know
how to define matrix multiplication: a vector space with multiplication defined
is an algebra , so what we are now discussing is matrix algebra.
A square matrix (that is, n × n) may have an inverse, written A⁻¹, such
that AA⁻¹ = A⁻¹A = 1 (one can define left- and right-inverses of non-square
matrices, but they will not concern us). The unit matrix 1 has elements δ_{ij}.
You can define the trace of a square matrix, as the sum of the diagonal
elements, and define the determinant by the usual intricate formula. Since
both of these are invariant under a similarity transformation (A ↦ P⁻¹AP),
the determinant and trace are also the product and sum, respectively, of the
matrix’s eigenvalues.
Make sure that you are in fact familiar with the matrix concepts in this
section.
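As a quick check of the similarity-invariance claim, here is a small self-contained sketch (mine; the particular matrices are arbitrary):

```python
# Trace and determinant are invariant under a similarity transformation
# A -> P^-1 A P; a 2x2 check in pure Python.

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def trace(X):
    return X[0][0] + X[1][1]

def det(X):
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]

def inverse(X):
    d = det(X)
    return [[X[1][1] / d, -X[0][1] / d], [-X[1][0] / d, X[0][0] / d]]

A = [[1.0, 2.0], [3.0, 4.0]]
P = [[2.0, 1.0], [1.0, 1.0]]                  # any invertible matrix

B = matmul(matmul(inverse(P), A), P)          # B = P^-1 A P
print(trace(A), trace(B))                     # both 5.0
print(det(A), det(B))                         # both -2.0
```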

2.2 Tensors, Vectors, and One-Forms


Most of the rest of this book is going to be talking about tensors one way or
another, so we had better grow to love them now. See Schutz, chapter 3.

I am going to introduce tensors in a rather abstract way here, in order to
emphasise that they are in fact rather simple objects. Tensors will become a
little more concrete when we introduce tensor components shortly, and in the
rest of the book we will use these extensively, but introducing components
from the outset can hide the geometrical primitiveness of the underlying
objects. I provide some specific examples of tensors in Section 2.2.2.

2.2.1 Definition of Tensors


For each M, N = 0, 1, 2, . . . , the set of (M, N) tensors is a set that obeys the
axioms of a vector space from Section 2.1.1. Three of these sets of tensors
have special names: a (0, 0) tensor is just a scalar function that maps R → R;
we refer to a (1, 0) tensor as a vector and write it as A, and refer to a (0, 1) tensor
as a one-form, written Ã. The clash with the terminology of Section 2.1.1 is
unfortunate (because all of these objects are ‘vectors’ in the terminology of that
section), but from now on when we refer to a ‘vector space’, we are referring
to Section 2.1.1, and when we refer to ‘vectors’, we are referring specifically
to (1, 0) tensors.
For the moment, you can perfectly reasonably think of vectors as exactly
the type of vectors you are used to – a thing with a magnitude and a direction
in space. In Chapter 3, we will introduce a new definition of vectors which is
of crucial importance in our development of GR.
()
M
Definition: A N tensor is a function, linear in each argument, which takes M
one-forms and N vectors as arguments, and maps them to a real number.
()
() ()
Because we said that an MN tensor was an element of a vector space, we
(M) MN tensors, or if we multiply an MN tensor by
already know that if we add two
a scalar, then we get another N tensor. This definition does seem very abstract,
()
but most of the properties we are about to deduce follow directly from it.
For example, we can write the (2, 1) tensor T as

    T( ·̃ , ·̃ , · ),

to emphasise that the function has two ‘slots’ for one-forms and one ‘slot’ for
a vector. When we insert one-forms p̃ and q̃, and vector A, we get T(p̃, q̃, A),
which, by our definition of a tensor, we see must be a pure number, in R.
Note that this ‘dots’ notation is an informal one, and though I have chosen to
write this in the following discussion with one-form arguments all to the left of
vector ones, this is just for the sake of clarity: in general, the (1, 1) tensor T( · , ·̃ )
is a perfectly good tensor, and distinct from the (1, 1) tensor T( ·̃ , · ).

()
Note firstly that there is nothing in the definition of a tensor that states that
the arguments are interchangeable, thus, in the case of a 02 tensor U( · , · ),
U( A , B ) ±= U( B , A ) in general: if in fact U( A , B ) = U( B , A ) , ∀ A , B , then U is

said to be symmetric ; and if U(A , B ) = − U(B , A ), ∀ A , B , it is antisymmetric .


Note also, that if we insert only some of the arguments into this tensor T,

    T( ω̃ , ·̃ , · ),

then we obtain an object that can take a single one-form and a single vector,
and map them into a number; in other words, we have a (1, 1) tensor. If we fill in
a further argument,

    V = T( ω̃ , ·̃ , A ),

then we obtain an object with a single one-form argument, which is to say, a
vector.
As I said earlier in the section, a vector maps a one-form into a number, and
a one-form maps a vector into a number. Thus, for arbitrary A and p̃, both A(p̃)
and p̃(A) are numbers. There is nothing in the definition that requires them to
be the same number, but in GR we will mutually restrict these two functions
by requiring that the two numbers are the same in fact. Thus

    p̃(A) = A(p̃) ≡ ⟨p̃, A⟩,  ∀ p̃, A  [in GR]    (2.2)

where the notation ⟨ · , · ⟩ emphasises the symmetry of this operation. This
combination of the two objects is known as the contraction of p̃ with A. There
are some further remarks about components at the end of Section 2.2.5.

2.2.2 Examples of Tensors


This description of tensors is very abstract, so we need some examples
promptly. In this section, we introduce a representation, in terms of row and
column vectors, of the structures previously defined. In Section 2.3 we will
describe some more representations.
The most immediate example of a vector is the column vector you are
familiar with, and the one-forms that correspond to it are simply row-vectors:

    p̃ = (p_1, p_2),
    A = (A^1, A^2)ᵀ,
    ⟨p̃, A⟩ = (p_1, p_2)(A^1, A^2)ᵀ = p_1 A^1 + p_2 A^2.    (2.3)

(Here ᵀ marks the transpose, so that A is a column vector.)
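Eq. (2.3) is ordinary matrix arithmetic, as this sketch (mine, with arbitrary components) shows:

```python
# The contraction of Eq. (2.3): a row (one-form) against a column (vector)
# gives the number p_1 A^1 + p_2 A^2.

p = (3.0, -1.0)    # one-form components (p_1, p_2)
A = (2.0, 5.0)     # vector components (A^1, A^2)

contraction = sum(p_i * A_i for p_i, A_i in zip(p, A))
print(contraction)    # 3*2 + (-1)*5 = 1.0
```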

Figure 2.1 A vector: A = A^1 e_1 + A^2 e_2.

Here, we see the one-form p̃ and vector A contracting to form a number, by
the usual rules of matrix multiplication. Or we can see p̃ as a real-valued
function over vectors, mapping them to numbers, and similarly A, a real-valued
function over one-forms. In this equation, we have chosen to define ⟨p̃, A⟩ using
the familiar mechanism of matrix multiplication; the definitions of A(p̃) and
p̃(A) then come for free, using the equivalences of Eq. (2.2) (I have written
the vector components with raised indexes in order to be consistent with the
notation introduced in Section 2.2.5; note, by the way, that the vector illustrated
in Figure 2.1 is not anchored to the origin – it is not a ‘position vector’, since
that is a thing that would change on any change of origin).
How about tensors of higher rank? Easy: matching the row and column
vectors from this section, a square matrix

    T = ( a_{11}  a_{12} )
        ( a_{21}  a_{22} )

is a function that takes one one-form and one vector, and maps them to a
number, which is to say it is a (1, 1) tensor. If we supply only one of the
arguments, to get TA, we get an object that has a single one-form argument,
which is to say, another vector.
()
In this specific context, every 2 × 2 matrix is a 11 tensor, in the
sense that it can be contracted with one one-form and one vector.
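Continuing the numerical aside (made-up numbers again), a 2 × 2 matrix acting as a (1,1) tensor:

```python
import numpy as np

T = np.array([[1.0, 2.0],
              [3.0, 4.0]])   # components of a (1,1) tensor
p = np.array([1.0, 1.0])     # a one-form (row vector)
A = np.array([1.0, 0.0])     # a vector (column vector)

# Supplying both arguments yields a number: T(p, A) = p_i T^i_j A^j.
number = p @ T @ A           # = 4.0

# Supplying only the vector leaves one one-form slot unfilled,
# so TA is itself a vector, with components (TA)^i = T^i_j A^j.
vector = T @ A               # = [1.0, 3.0]
```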
In Section 2.2.5, we will discover that any tensor has a set of components that
may be written as a matrix. Do not fall into the trap, however, of thinking
that a tensor is ‘just’ a matrix, or that an arbitrary set of numbers necessarily
corresponds, in general, to some tensor. The numbers that are the components
of the tensor in some coordinate basis are the results of contracting the tensor
with that basis, and as the tensor changes from point to point in the space, or
if you change the basis, the components will change systematically. Indeed,
the coordinate-based approach to differential geometry, as exemplified by
24 2 Vectors, Tensors, and Functions

Rindler (2006), defines tensors by requiring that their components change in
a systematic way on a change of basis (this approach seemed arbitrary to the
point of perversity, when I first learned GR by this route).
As you are aware, there are many quantities in physics that are modelled
by an object that has direction and magnitude – for example velocity, force,
or angular momentum; if they additionally have the property that they are
additive, in the way that two velocities added together make another velocity,
then they may be modelled specifically by a vector or one-form (and as we
will learn in Section 2.3.1, these are almost indistinguishable in euclidean
space, though the distinction is hinted at in mentions of ‘pseudovectors’ or
the occasionally odd behaviour of cross products). There are fewer things that
are naturally modelled by higher-rank tensors.
The inertia tensor is a rank-2 tensor, In, which, when given an angular
velocity vector ω, produces the angular momentum L; or in tensor terms
L = In(ω, ·). If we supply ω as the other argument, then we get a quantity T,
the kinetic energy, such that 2T = ω · L = In(ω, ω), and writing ω = ωn and
I = In(n, n), we have¹ the familiar T = Iω²/2.
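The inertia-tensor relations above can be sketched numerically (an illustration with made-up components, not drawn from the text):

```python
import numpy as np

# A made-up (diagonal, symmetric) inertia tensor and an angular velocity.
In = np.diag([2.0, 3.0, 4.0])
omega = np.array([0.0, 0.0, 1.5])

L = In @ omega            # L = In(omega, . ), the angular momentum
T = 0.5 * (omega @ L)     # 2T = omega . L = In(omega, omega)

# Writing omega = w n, with n a unit vector, and I = In(n, n),
# we recover the familiar T = I w^2 / 2.
w = np.linalg.norm(omega)
n = omega / w
I = n @ In @ n
assert np.isclose(T, 0.5 * I * w**2)
```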
In continuum mechanics, the Cauchy stress tensor describes the stresses
within a body. Given a real or imaginary surface within the body, indicated by
a normal ñ, the stress tensor σ determines the magnitude and direction of the
force per unit area experienced by that surface, via F = σ(ñ, ·̃). If we supply
this with a (one-form) displacement s̃, then we find the scalar magnitude of
the work done per unit area: F(s̃) = σ(ñ, s̃). Thus the stress tensor takes two
geometrical objects as arguments, and turns them into a number.²
Can we form tensors of other ranks? We can – recall that they are simply
a function of some number of vectors and one-forms – as long as we have
some way of defining a value for the function, and presumably some physical
motivation for wanting to do so.
We can also construct tensors of arbitrary rank by using the outer product
of multiple vectors or one-forms (this is sometimes also known as the direct
product or tensor product). We won’t actually use this mechanism until we get
to Chapter 4, but it’s convenient to introduce it here.

¹ See for example Goldstein (2001). Note that here we have elided the distinction between
vectors and one-forms, since the distinction does not matter in the euclidean space where we
normally care about the inertia tensor.
² There are multiple ways of describing forces and displacements in terms of vectors and
one-forms (supposing that we are careful enough to care about the distinction between them),
and the consequent rank of σ. Every account of continuum mechanics seems to make its own
choices here: this variety of 'accents' serves to remind us that mathematics is a way that we
have of describing nature, and not the same thing as nature itself.

If we have vectors V and W, then we can form a (2,0) tensor written V ⊗ W,
the value of which on the one-forms p̃ and q̃ is defined to be

    (V ⊗ W)(p̃, q̃) ≡ V(p̃) × W(q̃).

This object V ⊗ W is known as the outer product of the two vectors; see Schutz,
section 3.4. For example, given two column vectors A and B, the object

    A ⊗ B = ( A^1 ) ⊗ ( B^1 )
            ( A^2 )   ( B^2 )

is a (2,0) tensor whose value when applied to the two one-forms p̃ and q̃ is

    (A ⊗ B)(p̃, q̃) = A(p̃) × B(q̃)
                  = (p_1, p_2) ( A^1 )  ×  (q_1, q_2) ( B^1 )
                               ( A^2 )                ( B^2 )
                  = (p_1 A^1 + p_2 A^2) × (q_1 B^1 + q_2 B^2).
In a similar way, we can use the outer product to form objects of other ranks
from suitable combinations of vectors and one-forms. Not all tensors are
necessarily outer products, though all tensors can be represented as a sum of
outer products.
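A quick numerical check of the outer-product rule (illustrative numbers only):

```python
import numpy as np

A = np.array([2.0, 1.0])
B = np.array([1.0, 3.0])
p = np.array([1.0, -1.0])
q = np.array([0.5, 0.5])

# The (2,0) tensor A (x) B has components (A (x) B)^{ij} = A^i B^j.
AB = np.outer(A, B)

# Its value on the one-forms p and q factorises as A(p) x B(q).
assert np.isclose(p @ AB @ q, (p @ A) * (q @ B))
```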

2.2.3 Fields
We will often want to refer to a scalar-, vector- or tensor-field. A field is just
a function, in the sense that it maps one space to another, but in this book
we restrict the term ‘field’ to the case of a tensor-valued function, where the
domain is a physical space or space-time. That is, a field is a rule that associates
a number, or some higher-rank tensor, with each point in space or in space-
time. Air pressure is an example of a scalar field (each point in 3-d space has a
number associated with it), and the electric and magnetic fields, E and B, are
vector fields (associating a vector with each point in 3-d space).

2.2.4 Visualisation of Vectors and One-Forms


We can visualise vectors straightforwardly as arrows, having both a magnitude
and a direction. In order to combine one vector with another, however, we need
to add further rules, defining something like the dot product and thus – as we
will soon learn – introducing concepts such as the metric (Section 2.2.6).
How do we visualise one-forms in such a way that we distinguish them from
vectors, and in such a way that we can visualise (metric-free) operations such
as the contraction of a vector and a one-form?

Figure 2.2 Contraction of 2-d vectors and one-form.

Figure 2.3 Contraction: contours on a map.

The most common way is to visualise a one-form as a set of planes in the
appropriate space. Such a structure picks out a direction – the direction
perpendicular to the planes – and a magnitude that increases as the separation
between the planes decreases. The contraction between a vector and a one-form
thus visualised is the number of the one-form planes that the vector crosses.
In Figure 2.2, we see two different vectors and one one-form, p̃. Although
the two vectors are of different lengths (though we don't 'know' this yet, since
we haven't yet talked about a metric and thus have no notion of 'length'), their
contraction with the one-form is the same, namely 2.
You may already be familiar with this picture, if you are familiar with the
notion of contours on a map. These show the gradient of the surface they
are mapping, with the property that the closer the contours are together, the
larger is the gradient. The three vectors shown in Figure 2.3, which might be
different paths up the hillside, have the same contraction – the path climbs
three units – even though the three vectors have rather different lengths. When
we look at the contours, we are seeing a one-form field , with the one-form
having different values, both magnitude and direction, at different points in the
space. The direction of the gradient always points to higher values.
We will see in Section 3.1.3 that the natural definition of the gradient of a
function does indeed turn out to be a one-form.
In Figure 2.4, we this time see a 3-d vector crossing three 2-d planes. Note
that, just as you should think of a vector as having a direction and magnitude at
a point, rather than joining two separated points in space, you should think of
a one-form as having a direction and magnitude at a point, and not consisting
of actually separate planes.

Figure 2.4 A vector contracted with one-form planes.

Figure 2.5 An oblique basis: A = A^1 e_1 + A^2 e_2.
With this visualisation, it is natural to talk of A and p̃ as geometrical objects.
When we do so, we are stressing the distinction between, firstly, A and p̃ as
abstract objects and, secondly, their numerical components with respect to a
basis. This is what we meant when we talked, in Section 1.1, about physical
laws depending only on geometrical objects, and not on their components
with respect to a set of basis vectors that we introduce only for our mensural
convenience.

2.2.5 Components

I said, above, that the set of (M,N) tensors formed a vector space. Specifically,
that includes the sets of vectors and one-forms. From Section 2.1.1, this means
that we can find a set of n basis vectors {e_i} and basis one-forms {ω̃^i} (this is
supposing that the domains of the arguments to our tensors all have the same
dimensionality, n; this is not a fundamental property of tensors, but it is true in
all the use we make of them, and so this avoids unnecessary complication).
Armed with a set of basis vectors and one-forms, we can write a vector A
and one-form p̃ in components as

    A = Σ_i A^i e_i;    p̃ = Σ_i p_i ω̃^i.

See Figure 2.1 and Figure 2.5. Crucially, these components are not intrinsic to
the geometrical objects that A and p̃ represent, but instead depend on the vector
or one-form basis that we select. It is absolutely vital that you fully appreciate
that if you change the basis, you change the components of a vector or one-
form (or any tensor) with respect to that basis, but the underlying geometrical
object, A or p̃ or T, does not change. Though this remark seems obvious now,
dealing with it in general is what much of the complication of differential
geometry is about.
Note the (purely conventional) positions of the indexes for these basis
vectors and one-forms, and for the components: the components of vectors
have raised indexes, and the components of one-forms have lowered indexes.
This convention allows us to define an extremely useful notational shortcut,
which allows us in turn to avoid writing hundreds of summation signs:
Einstein summation convention: whenever we see an index repeated in an
expression, once raised and once lowered, we are to understand a summation
over that index.
Thus:

    A^i e_i ≡ Σ_i A^i e_i;    p_i ω̃^i ≡ Σ_i p_i ω̃^i.
We have illustrated this for components and vectors here, but it will apply quite
generally. Here are the rules for working with components:
1. In any expression, there must be at most two of each index, one raised and
one lowered. If you have more than two, or have both raised or lowered,
you’ve made a mistake. Any indexes ‘left over’ after contraction tell you
the rank of the object of which this is the component.
2. The components are just numbers, and so, as you learned in primary
school, it doesn’t matter what order you multiply them (they don’t
commute with differential signs, though). If they are the components of a
field, then the components, as well as the basis vectors, may vary across
the space.
3. The indexes are arbitrary – you can always replace an index letter with
another one, as long as you do it consistently. That is, p_i A^i = A^j p_j, and
p_i q_j T^{ij} = p_j q_i T^{ji} = p_k q_i T^{ki} (though p_k q_i T^{ki} ≠ p_k q_i T^{ik}
in general, unless the tensor T is symmetric).
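Rule 3 can be checked directly with numpy's einsum, which implements exactly the summation convention (the components below are made up for illustration):

```python
import numpy as np

p = np.array([1.0, 2.0])
q = np.array([3.0, 1.0])
T = np.array([[0.0, 1.0],
              [2.0, 0.0]])   # deliberately non-symmetric

# Renaming dummy indexes consistently changes nothing:
# p_i q_j T^{ij} = p_j q_i T^{ji}.
a = np.einsum('i,j,ij->', p, q, T)
b = np.einsum('j,i,ji->', p, q, T)
assert a == b               # both are 13.0

# But swapping the index positions on T alone gives a different number
# (p_k q_i T^{ki} != p_k q_i T^{ik}) since this T is not symmetric.
c = np.einsum('k,i,ik->', p, q, T)
assert c == 8.0 and a != c
```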
What happens if we apply p̃, say, to one of the basis vectors? We have

    p̃(e_j) = p_i ω̃^i(e_j).   (2.4)



In principle, we know nothing about the number ω̃^i(e_j), since we are at liberty
to make completely independent choices of the vector and one-form bases.
However, we can save ourselves ridiculous amounts of trouble by making a
wise choice, and we will always choose one-form bases to have the property

    ⟨ω̃^i, e_j⟩ = ω̃^i(e_j) = e_j(ω̃^i) = δ^i_j ≡ { 1 if i = j
                                                { 0 otherwise.   (2.5)

A one-form basis with this property is said to be dual to the vector basis.
Returning to Eq. (2.4), therefore, we find
    p̃(e_j) = p_i ω̃^i(e_j) = p_i δ^i_j = p_j.   (2.6)

Thus in the one-form basis that is dual to the vector basis {e_j}, the arbitrary
one-form p̃ has components p_j = p̃(e_j).
Similarly, we can apply the vector A to the one-form basis {ω̃^i}, and obtain

    A(ω̃^i) = A^j e_j(ω̃^i) = A^j δ_j^i = A^i.

In exactly the same way, we can apply the tensor T to the basis vectors and
one-forms, and obtain

    T(ω̃^i, ω̃^j, e_k) = T^{ij}_k.   (2.7)

The set of n × n × n numbers {T^{ij}_k} are the components of the tensor T in the
basis {e_i} and its dual {ω̃^j}. We will generally denote the vector A by simply
writing 'A^i', denote p̃ by 'p_i', and the (2,1) tensor T by 'T^{ij}_k'. Because of the
index convention, we will always know what sort of object we are referring to
by whether the indexes are raised or lowered: the components of vectors always
have their indexes raised, and the components of one-forms always have their
indexes lowered.
Notice the pattern: the components of vectors have raised indexes, but the
basis vectors themselves are in contrast written with lowered indexes, and
vice versa for one-forms. It is this notational convention that allows us to
take advantage of the Einstein summation convention when writing a vector
as A = A^i e_i or p̃ = p_i ω̃^i.
We can, obviously, find the components of the basis vectors and one-forms
by exactly this method, and find

    e_1 → (1, 0, ..., 0)
    e_2 → (0, 1, ..., 0)
      ⋮                    (2.8)
    e_n → (0, 0, ..., 1)

where the numbers on the right-hand-side are the components in the vector
basis, and

    ω̃^1 → (1, 0, ..., 0)
    ω̃^2 → (0, 1, ..., 0)
      ⋮                    (2.9)
    ω̃^n → (0, 0, ..., 1)

where the components are in the one-form basis. Make sure you understand
why Eqs. (2.8) and (2.9) are 'obvious'.
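One concrete way to build a dual basis, sketched with numpy (the oblique basis here is invented for illustration): place the basis vectors as the columns of a matrix; the dual one-forms of Eq. (2.5) are then the rows of its inverse.

```python
import numpy as np

# An oblique (non-orthonormal) vector basis, as the columns of E.
e1 = np.array([1.0, 0.0])
e2 = np.array([1.0, 1.0])
E = np.column_stack([e1, e2])

# Dual one-form basis: the rows of the inverse matrix.
W = np.linalg.inv(E)      # W[i] plays the role of the one-form w^i

# Duality, Eq. (2.5): w^i(e_j) = delta^i_j.
assert np.allclose(W @ E, np.eye(2))

# Components of an arbitrary vector follow as A^i = w^i(A).
A = 2 * e1 + 3 * e2
assert np.allclose(W @ A, [2.0, 3.0])
```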
So what is the value of the expression p̃(A) in components? By linearity,

    p̃(A) = p_i ω̃^i(A^j e_j) = p_i A^j ω̃^i(e_j) = p_i A^j δ^i_j = p_i A^i.

This is the contraction of p̃ with A. Note particularly that, since p̃ and A are
basis-independent, geometrical objects – or quite separately, since p̃(A) is a
pure number – the number p_i A^i is basis-independent also, even though the
numbers p_i and A^i are separately basis-dependent.
Similarly, contracting the (2,1) tensor T with one-forms p̃, q̃ and vector A, we
obtain the number

    T(p̃, q̃, A) = p_i q_j A^k T^{ij}_k.

If we contract it instead with just the one-forms, we obtain the object
T(p̃, q̃, ·), which is a one-form (since it maps a single vector to a number)
with components

    T(p̃, q̃, ·)_k = p_i q_j T^{ij}_k

and the solitary unmatched lower index k on the right-hand-side indicates (or
rather confirms) that this object is a one-form. The indexes are staggered so
that we keep track of which argument they correspond to. I noted before that
the two tensors T(·, ·̃) and T(·̃, ·) are different tensors: if the tensor is
symmetric, then T(e_i, ω̃^j) = T(ω̃^j, e_i), and thus T_i^j = T^j_i, but we cannot
simply assume this.
We can also form the contraction of a tensor, by pairing up a vector- and
one-form-shaped argument, to make a tensor of rank two smaller. Considering
the (2,1) tensor T as before, we can define a new tensor S as

    S(·̃) = T(·̃, ω̃^j, e_j),   (2.10)

where the indexes j are summed over as usual. Pairing up different slots in T
would produce different contracted tensors S. In component form, this is

    S^i = T^{ij}_j.   (2.11)

In fact, the operation of contraction is defined, as here, on tensors – it takes
a tensor into a tensor of rank two less. Earlier in this section, we defined
contraction as an operation on a vector and one-form; we can now see that
this is just a consequence of this definition: defining S = p̃ ⊗ A, we can
immediately form the contraction of this tensor into a scalar, S_i^i = p_i A^i, and
this is what we defined as the 'contraction of p̃ and A'. MTW have a useful
discussion of contraction in their section 3.5, as part of a longer discussion of
ways of producing new tensors from old. [Exercises 2.3–2.5]
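The contraction operation of Eqs. (2.10) and (2.11) is again an einsum one-liner (a sketch with random made-up components):

```python
import numpy as np

rng = np.random.default_rng(1)
T = rng.normal(size=(3, 3, 3))   # components T^{ij}_k of a (2,1) tensor

# Pair the one-form-shaped index j with the vector-shaped index k:
# S^i = T^{ij}_j, a tensor of rank two less (Eq. (2.11)).
S = np.einsum('ijj->i', T)
assert S.shape == (3,)

# For S = p (x) A, the scalar contraction S_i^i recovers p_i A^i.
p = rng.normal(size=3)
A = rng.normal(size=3)
assert np.isclose(np.einsum('ii->', np.outer(p, A)), p @ A)
```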

2.2.6 The Metric Tensor

One thing we do not have yet is any notion of distance, but we can supply that
very easily, by picking a symmetric (0,2) tensor g, and calling that the metric
tensor, or just 'the metric'.
The metric allows us to define an inner product between vectors (which in
other contexts we might call a scalar product or dot product). The inner product
between two vectors A and B is the scalar

    A · B = g(A, B).

We can define the length of a vector as the square root of its inner product with
itself: |A|² = g(A, A). We can also use this to define an angle θ between two
vectors via

    A · B = |A| |B| cos θ.

Note that since g is a tensor, it is frame independent, so that the length | A | and
angle θ must be frame-independent quantities also.
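In the euclidean plane, with g_{ij} = δ_{ij} (anticipating Eq. (2.26)), these definitions reduce to the familiar dot product; a numerical sketch with made-up components:

```python
import numpy as np

g = np.eye(2)                  # euclidean metric components, g_ij = delta_ij
A = np.array([2.0, 1.0])
B = np.array([1.0, 3.0])

dot = A @ g @ B                # A . B = g(A, B)
lenA = np.sqrt(A @ g @ A)      # |A|^2 = g(A, A)
lenB = np.sqrt(B @ g @ B)
theta = np.arccos(dot / (lenA * lenB))

assert np.isclose(dot, 5.0)    # 2*1 + 1*3
assert np.isclose(lenA**2, 5.0)
```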
We can find the components of the metric tensor in the same way we can
find the components of any earlier tensor:

    g(e_i, e_j) = g_{ij}.   (2.12)

As well as giving us a notion of length, the metric tensor allows us to define
a mapping between vectors and one-forms. Since it is a (0,2) tensor, it is a thing
which takes two vectors and turns them into a number. If instead we only
supply a single vector A = A^i e_i to the metric, we have a thing which takes
one further vector and turns it into a number; but this is just a one-form, which
we will write as Ã:

    Ã = g(A, ·) = g(·, A).   (2.13)

That is, for any vector A, we have found a way of picking out a single
associated one-form, written Ã. What are the components of this one-form? Easy:

    A_i = Ã(e_i) = g(e_i, A)
                 = g(e_i, A^j e_j)
                 = A^j g(e_i, e_j)
                 = g_{ij} A^j,   (2.14)

from Eq. (2.12) above. That is, the metric tensor can also be regarded as an
'index lowering' operator.
Can we do this trick in the other direction, defining a (2,0) tensor that takes
two one-forms as arguments and turns them into a number? Yes we can, and
the natural way to do it is via the tensor's components.
The set of numbers g_{ij} is, at one level, just a matrix. Thus if it is non-singular
(and we will always assume that the metric is non-singular), this matrix has an
inverse, and we can take the components of the tensor we're looking for, g^{ij}, to
be the components of this inverse. That means nothing other than

    g^{ij} g_{jk} = g^i_k = δ^i_k.   (2.15)

We will refer to the tensors corresponding to g_{ij}, g^i_j and g^{ij} indiscriminately
as 'the metric'.
What happens if we apply g^{ij} to the one-form components A_j?

    g^{ij} A_j = g^{ij} g_{jk} A^k = δ^i_k A^k = A^i,   (2.16)

so that the metric can raise components as well as lower them.
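Index lowering and raising, Eqs. (2.14) and (2.16), can be sketched numerically with the plane-polar metric diag(1, r²) that we will meet in Section 2.3.2 (the component values here are made up):

```python
import numpy as np

r = 2.0
g = np.diag([1.0, r**2])       # metric components g_ij
g_inv = np.linalg.inv(g)       # components g^{ij}, Eq. (2.15)

A_up = np.array([3.0, 0.5])    # vector components A^i
A_down = g @ A_up              # lowering: A_i = g_ij A^j, Eq. (2.14)
assert np.allclose(A_down, [3.0, 2.0])

# Raising with g^{ij} undoes the lowering, Eq. (2.16).
assert np.allclose(g_inv @ A_down, A_up)
```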
There is nothing in the discussion above that says that the tensor g has the
same value at each point in space-time. In general, g is a tensor field, and the
different values of the metric at different points in spacetime are associated
with the curvature of that space-time. This is where the physics comes in.
[Exercises 2.6 and 2.7]

2.2.7 Changing Basis


The last very general set of properties we must discover about tensors is what
happens when you change your mind about the sets of basis vectors and one-
forms (you can’t change your mind about just one of them, if they are to remain
dual to each other).
We (now) know that if we have a set of basis vectors {e_i}, then we can find
the components of an arbitrary vector A to be A^i = A(ω̃^i), where the {ω̃^i} are
the set of basis one-forms that are dual to the vectors {e_i}.

But there is nothing special about this basis, and we could equally well
have chosen a completely different set {e_j̄} with dual {ω̃^j̄}. With respect to this
basis, the same vector A can be written

    A = A^ı̄ e_ı̄,

where, of course, the components are the set of numbers

    A^ı̄ = A(ω̃^ı̄).

It's important to remember that A^i and A^ı̄ are different (sets of) numbers
because they refer to different bases {e_i} and {e_ı̄}, but that they correspond
to the same underlying vector A (this is why we distinguish the symbols by
putting a bar on the index i rather than on the base symbol e or A – this
does look odd, I know, but it ends up being notationally tidier than the various
alternatives).³
Since both these sets of components represent the same underlying object A,
we naturally expect that they are related to each other, and it is easy to write
down that relation. From before

    A^ı̄ = A(ω̃^ı̄)
        = A^i e_i(ω̃^ı̄)
        = Λ^ı̄_i A^i,   (2.17)

where we have written the transformation matrix Λ as

    Λ^ı̄_i ≡ e_i(ω̃^ı̄) ≡ ω̃^ı̄(e_i).   (2.18)
Note that Λ is a matrix, not a tensor – there's no underlying geometrical object,
and we have consequently not staggered its indexes (see also the remarks on
this at the end of Section 2.3.2). Also, note that indexes i and ı̄ are completely
distinct from each other, and arbitrary, and we are using the similarity of
symbols just to emphasise the symmetry of the operation. Exactly analogously,
the components of a one-form p̃ transform as

    p_ı̄ = Λ^i_ı̄ p_i,   (2.19)

where the transformation matrix is

    Λ^i_ı̄ ≡ ω̃^i(e_ı̄).   (2.20)

³ Notation: it is slightly more common to distinguish the bases by a prime on the index, as in
e_i′, or even sometimes a hat, e_ı̂. I prefer the overbar on the practical grounds that it seems
easier to distinguish in handwriting – try writing ';i′' three times quickly.

Since the vector A is the same in both coordinate systems, we must have

    A = A^i e_i = A^ı̄ e_ı̄
      = A^j Λ^ı̄_j E^k_ı̄ e_k,   (2.21)

where we write E^k_ı̄ as the (initially unknown) components of vector e_ı̄ in the
basis {e_k} (ie e_ı̄ = E^k_ı̄ e_k). This immediately requires that Λ^ı̄_j E^k_ı̄ = δ^k_j,
and thus that the matrix E must be the inverse of the matrix Λ. Now we're going
to find the same expression by a different route.
The components of ω̃^i in the barred basis are (as usual) ω̃^i(e_ı̄), which
is equal to Λ^i_ı̄ (from Eq. (2.20)), and similarly the components of e_j are
e_j(ω̃^j̄) = Λ^j̄_j (from Eq. (2.18)). So now look at the contraction ω̃^i(e_j) in two
coordinate systems:

    δ^i_j = ω̃^i(e_j) = Λ^i_ı̄ ω̃^ı̄(Λ^j̄_j e_j̄) = Λ^i_ı̄ Λ^j̄_j ω̃^ı̄(e_j̄) = Λ^i_ı̄ Λ^j̄_j δ^ı̄_j̄ = Λ^i_ı̄ Λ^ı̄_j.   (2.22)

Thus Λ^i_ı̄ and Λ^ı̄_j are matrix inverses of each other; so we must have
E^i_ı̄ = Λ^i_ı̄, and

    e_ı̄ = Λ^i_ı̄ e_i.   (2.23)

The results of Exercise 2.10 amplify this point.
Here, it has been convenient to introduce basis transformations by
focusing on the transformation of the components of vectors and one-forms,
in Eq. (2.17). We could alternatively introduce them by focusing on the
transformation of the basis vectors and one-forms themselves, and this is
the approach used in the discussion of basis transformations in Section 3.1.4.
Finally, it is easy to show that the components of other tensors transform in
what might be the expected pattern, with each unbarred index in one coordinate
system matched by a suitable Λ term. For example the components of our (2,1)
tensor T will transform as

    T^{ı̄j̄}_k̄ = Λ^ı̄_i Λ^j̄_j Λ^k_k̄ T^{ij}_k.   (2.24)
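Eqs. (2.17), (2.19) and (2.24) can all be exercised at once with einsum; the sketch below uses a random (made-up) invertible Λ and checks that fully contracted quantities are basis-independent:

```python
import numpy as np

rng = np.random.default_rng(2)
L = rng.normal(size=(2, 2))     # Lambda^ibar_i (random, invertible)
Linv = np.linalg.inv(L)         # Lambda^i_ibar

A = rng.normal(size=2)          # A^i
p = rng.normal(size=2)          # p_i
T = rng.normal(size=(2, 2, 2))  # T^{ij}_k

A_bar = L @ A                   # Eq. (2.17)
p_bar = Linv.T @ p              # Eq. (2.19): p_ibar = Lambda^i_ibar p_i
T_bar = np.einsum('ai,bj,kc,ijk->abc', L, L, Linv, T)   # Eq. (2.24)

# Scalars built by full contraction do not depend on the basis.
assert np.isclose(p_bar @ A_bar, p @ A)
assert np.isclose(np.einsum('a,b,abc,c->', p_bar, p_bar, T_bar, A_bar),
                  np.einsum('i,j,ijk,k->', p, p, T, A))
```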

The δ^i_j are the components of the Kronecker tensor. As a (1,1) tensor, this
maps vectors to vectors, or one-forms to one-forms, via the identity map:
δ(A, ·̃)^i = δ^i_j A^j = A^i. It also has the property that its components are the
same in every basis: Λ^j_j̄ Λ^ı̄_i δ^i_j = Λ^j_j̄ Λ^ı̄_j = δ^ı̄_j̄. The property Eq. (2.5)
means that tensors δ(·, ·̃) and δ(·̃, ·) are equal, which is why we could afford to
be casual, at the end of Section 2.1.1, about whether we write δ^i_j with its
indexes staggered one way, the other way, or not at all. Further, Eq. (2.15) shows
that this tensor is the same as the metric tensor with one index raised and one
lowered or, equivalently, that the components of the (0,2) Kronecker tensor are
δ_{ij} = g_{ik} δ^k_j = g_{ij}. [Exercises 2.8–2.11]

2.2.8 Some Misconceptions about Coordinates and Vectors


This section has a dangerous-bend marker, not because it introduces extra
material, but because it should be approached with caution. It addresses some
misconceptions about coordinates that have cropped up at various times.
The danger of mentioning these, of course, is that if you learn about a
misconception that hadn’t occurred to you before, it could make things worse!
Also, I make forward references to material introduced in Chapter 3, so this
section is probably more useful at revision time than on your first read through.
That said, if you’re getting confused about the transformation material that
we’ve discussed so far (and you’re not alone), then the following remarks
might help.
There might be a temptation to write down something that appears to be the
'coordinate form':

    x^ı̄ = Λ^ı̄_i x^i   (meaningless).   (2.25)

This looks a bit like Eq. (2.23), and a bit like Eq. (2.17); it feels like it should
be describing the transformation of the {x^i} coordinates into the {x^ı̄} ones, so
that it may appear to be the analogue of Eq. (2.27). It's none of these things,
however.
Notice that we haven’t, so far, talked of coordinates at all. When we talked
of components in Section 2.2.5, and of changing bases in Section 2.2.7 (which
of course we understand to be a change of coordinates), we did so by talking
about basis vectors , ei . This is like talking about cartesian coordinates by
talking exclusively about the basis vectors i, j, and k, and avoiding talking
about { x , y, z } .
In Section 3.1.1, we introduce coordinates as functions on the manifold,
{ x i : M → R} , and in Eq. (3.4) we define the basis vectors associated with
them as e i = ∂/∂x i (see also the discussion at the end of Section 3.1.4). Thus
Eq. (2.25) is suggesting that these coordinate functions are linear combinations
of each other; that will generally not be true, and it is possible to get very
confused in Exercise 3.3, for example, by thinking in this way. It is tempting
to look at Eq. (2.25) and interpret the x i as components of the basis vectors,
or something like that, but the real relationship is the other way around: the
basis vectors are derived from the coordinate functions, and show the way in
which the coordinate functions change as you move around the manifold. The
components of the basis vectors are very simple – see Eq. (2.8).
It’s also worth stressing, once we’re talking about misconceptions, that
neither position vectors, nor connecting vectors, are ‘vectors’ in the sense
introduced in this part of the notes . In a flat space, such as the euclidean space

of our intuitions, or the flat space of Special Relativity, the difference between
them disappears or, to put it another way, there is a one-to-one correspondence
between ‘a vector in the space’ and ‘the difference between two positions’
(which is what a difference vector is). In a curved space, it’s useful to talk
about the former (and we do, at length), but the latter won’t often have much
physical meaning. It is because of this correspondence that we can easily
‘parallel transport’ vectors everywhere in a flat space (see Section 3.3.2), which
means we have been able, at earlier stages in our education, to define vector
differentiation without having to think about it very hard.
If you think of a vector field – that is, a field of vectors, such as you tend to
imagine in the case of the electric field – then the things you imagine existing at
each point in space-time are straightforwardly vectors. That is, they’re a thing
with magnitude and direction (but not spatial extent), defined at each point.

2.3 Examples of Bases and Transformations


So far, so abstract. By now, we are long overdue for some illustrations.

2.3.1 Flat Cartesian Space


Consider the natural vectors on the euclidean plane – that is, the vectors you
learned about in school. The obvious thing to do is to pick our basis to be the
unit vectors along the x and y axes: e_1 = e_x and e_2 = e_y. That means that the
vector A, for example, which points from one point to a point two units along
and one up, can be written as A = 2e_1 + 1e_2, or in other words that it has
components A^1 = 2, A^2 = 1. We have chosen these basis vectors to be the
usual orthonormal ones: however, we are not required to do this by anything in
Section 2.2, and indeed we cannot even say this at this stage, because we have
not (yet) defined a metric, and so we have no inner product, so that the ideas
of 'orthogonal' and 'unit' do not yet exist.
What are the one-forms in this space? Possibly surprisingly, there is nothing
in Section 2.2 that tells us what they are, so that we can pick anything we like
as a one-form in the euclidean plane, as long as that one-form-thing obeys
the axioms of a vector space (Section 2.1.1), and as long as whatever rule we
devise for contracting vectors and one-form-things conforms to the constraint
of Eq. (2.2).
For one-forms, then, and as suggested in Section 2.2.4, we'll choose sets of
planes all parallel to each other, with the property that if we 'double' a one-
form, then the spacing between the planes halves (recall Figure 2.4). For our
contraction rule, ⟨·, ·⟩, we'll choose: 'the number ⟨p̃, A⟩ is the number of planes
of the one-form p̃ that the vector A passes through'. If the duality property
Eq. (2.5) is to hold, then this fixes the 'planes' of ω̃^1 to be lines perpendicular
to the x-axis, one unit apart, and the planes of ω̃^2 to be similarly perpendicular
to the y-axis.
For our metric, we can choose simply

    g_{ij} = g^{ij} = g^i_j = δ^i_j   [cartesian coordinates].   (2.26)

This means that the length-squared of the vector A = 2e_1 + 1e_2 is

    g(A, A) = g_{ij} A^i A^j = A^1 A^1 + A^2 A^2 = (2)² + (1)² = 5,

which corresponds to our familiar value for this, from Pythagoras' theorem.
The other interesting thing about this metric is that, when we use it to lower
the indexes of an arbitrary vector A, we find that A_i = g_{ij} A^j = A^i. In other
words, for this metric (the natural one for this flat space, with orthonormal
basis vectors) one-forms and vectors have the same components, so that we
can no longer really tell the difference between them, and have to work hard to
think of them as being separate. This is why you have never had to deal with
one-forms before, because the distinction between the two things in the space
of our normal (mathematical) experience is invisible.

2.3.2 Polar Coordinates


An alternative way of devising vectors for the euclidean plane is to use
polar coordinates. It is convenient to introduce these using the transformation
equation Eq. (2.17). The radial and tangential basis vectors are

    e_1̄ = e_r = cos θ e_1 + sin θ e_2   (2.27a)
    e_2̄ = e_θ = −r sin θ e_1 + r cos θ e_2,   (2.27b)

where e_1 = e_x and e_2 = e_y, as before. Note that these basis vectors vary over
the plane, and that although they are orthogonal (though we 'don't know that
yet' since we haven't defined a metric), they are not orthonormal. Thus we
can write

    e_1̄ = e_r = Λ^1_1̄ e_1 + Λ^2_1̄ e_2
    e_2̄ = e_θ = Λ^1_2̄ e_1 + Λ^2_2̄ e_2,

and so discover that

    Λ^i_ı̄ = ( Λ^1_1̄   Λ^1_2̄ )   =   ( cos θ   −r sin θ )
            ( Λ^2_1̄   Λ^2_2̄ )       ( sin θ    r cos θ ),   (2.28)

where we have written the matrix ± so that


⎛ 1 ⎞
±12̄
( e 1 , e2 ) ⎝ ⎠
± 1̄
(e 1̄ , e 2̄) = (2.29)
±2 1̄ ±22̄

recovers Eq. (2.23), and in this section alone staggered the indexes of ± to help
keep track of the elements of ± and its inverse, written as matrix expressions.
Therefore, if we require Λ^ı̄_i Λ^i_j̄ = δ^ı̄_j̄, then we must have

    Λ^ı̄_i = ( Λ^1̄_1   Λ^1̄_2 )   =   ( cos θ       sin θ   )
            ( Λ^2̄_1   Λ^2̄_2 )       ( −sin θ/r   cos θ/r )   (2.30)

(you can confirm that if you matrix-multiply Eq. (2.30) by Eq. (2.28), then you
retrieve the unit matrix, as Eq. (2.22) says you should). Symmetrically with
Eq. (2.29), we can now write
$$\begin{pmatrix} A^{\bar 1} \\ A^{\bar 2} \end{pmatrix} = \begin{pmatrix} \Lambda^{\bar 1}{}_1 & \Lambda^{\bar 1}{}_2 \\ \Lambda^{\bar 2}{}_1 & \Lambda^{\bar 2}{}_2 \end{pmatrix} \begin{pmatrix} A^1 \\ A^2 \end{pmatrix}, \tag{2.31}$$

and discover that


$$(e_{\bar 1}, e_{\bar 2}) \begin{pmatrix} A^{\bar 1} \\ A^{\bar 2} \end{pmatrix} = (e_1, e_2) \begin{pmatrix} A^1 \\ A^2 \end{pmatrix},$$

where the contraction of vector and one-form is coordinate-independent.
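These matrix manipulations are easy to check numerically. The following sketch (in Python, with helper names of my own choosing) builds the matrices of Eq. (2.28) and Eq. (2.30) at an arbitrary point, confirms that they are inverses, and checks that transforming the components with one matrix while the basis transforms with the other leaves the vector A = A^i e_i unchanged.

```python
import math

def lam(r, theta):
    # Lambda^i_ibar, Eq. (2.28): the columns are e_r and e_theta on the cartesian basis
    return [[math.cos(theta), -r * math.sin(theta)],
            [math.sin(theta),  r * math.cos(theta)]]

def lam_inv(r, theta):
    # Lambda^ibar_i, Eq. (2.30), the claimed inverse
    return [[ math.cos(theta),     math.sin(theta)],
            [-math.sin(theta) / r, math.cos(theta) / r]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def matvec(m, v):
    return [sum(m[i][j] * v[j] for j in range(2)) for i in range(2)]

r, theta = 2.0, 0.7
# the two matrices multiply to the unit matrix, as Eq. (2.22) requires
prod = matmul(lam_inv(r, theta), lam(r, theta))
assert all(abs(prod[i][j] - (1 if i == j else 0)) < 1e-12
           for i in range(2) for j in range(2))

# components transform with the inverse of the basis transformation,
# so the vector A^i e_i itself is unchanged:
A = [1.0, 1.0]                         # cartesian components A^i
A_bar = matvec(lam_inv(r, theta), A)   # polar components A^ibar, Eq. (2.31)
# reassemble A^ibar e_ibar on the cartesian basis (e_ibar are the columns of lam)
reassembled = matvec(lam(r, theta), A_bar)
assert all(abs(a - b) < 1e-12 for a, b in zip(A, reassembled))
```

The point of the final assertion is exactly the invariance displayed above: the components and the basis transform oppositely, so their contraction does not change.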


We therefore know how to transform the components of vectors and one-
forms between cartesian and plane polar coordinates. What does the metric
tensor (gij = δ ij in cartesian coordinates, remember) look like in these new
coordinates? The components are just
$$g_{\bar\imath\bar\jmath} = \Lambda^i{}_{\bar\imath}\, \Lambda^j{}_{\bar\jmath}\, g_{ij},$$

and writing these components out in full, we find


$$g_{\bar\imath\bar\jmath} = \begin{pmatrix} 1 & 0 \\ 0 & r^2 \end{pmatrix}. \tag{2.32}$$

We see that, even though coordinates (x , y ) and (r , θ ) are describing the same
flat space, the metric looks a little more complicated in the polar coordinate
system than it does in the plain cartesian one, and looking at this
out of context, we would have difficulty identifying the space described by
Eq. (2.32) as flat euclidean space.
We will see a lot more of this in the parts to come.
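As a quick check of Eq. (2.32), we can carry out the component transformation g_ī j̄ = Λ^i_ī Λ^j_j̄ g_ij numerically, starting from g_ij = δ_ij. A minimal sketch (the function name is mine, not the book's):

```python
import math

def polar_metric(r, theta):
    # g_ibar_jbar = Lambda^i_ibar Lambda^j_jbar delta_ij  (cf. Eq. (2.32))
    L = [[math.cos(theta), -r * math.sin(theta)],   # Lambda^i_ibar, Eq. (2.28)
         [math.sin(theta),  r * math.cos(theta)]]
    # with g_ij = delta_ij the double sum collapses to a single sum over i
    return [[sum(L[i][a] * L[i][b] for i in range(2))
             for b in range(2)] for a in range(2)]

g = polar_metric(3.0, 1.2)
# expect diag(1, r^2) = diag(1, 9), whatever the value of theta
assert abs(g[0][0] - 1.0) < 1e-12 and abs(g[1][1] - 9.0) < 1e-12
assert abs(g[0][1]) < 1e-12 and abs(g[1][0]) < 1e-12
```

Changing θ leaves the result unchanged, which is one way of seeing that the complication in Eq. (2.32) is purely radial.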

The definition of polar coordinates in Eq. (2.27) is not the one
you are probably familiar with, due to the presence of the r in the
transformations. The expressions can be obtained very directly, by choosing as
the transformation

$$e_r \equiv \Lambda^x{}_r\, e_x + \Lambda^y{}_r\, e_y = \frac{\partial x}{\partial r}\, e_x + \frac{\partial y}{\partial r}\, e_y$$
$$e_\theta \equiv \Lambda^x{}_\theta\, e_x + \Lambda^y{}_\theta\, e_y = \frac{\partial x}{\partial\theta}\, e_x + \frac{\partial y}{\partial\theta}\, e_y.$$

This is known as a coordinate basis , since it is generated directly from the


relationship between the coordinate functions, r 2 = x 2 + y 2 and tan θ = y/x .
Although this is a very natural definition of the new basis vectors, and is the
type of transformation we will normally prefer, these basis vectors er and e θ
are not of unit length, and indeed are of different lengths at different points in
the coordinate plane. That is why the usual definition of polar basis vectors is
chosen to be e θ = (1/r )∂ x/∂ θ e x + (1/r )∂ y/∂ θ ey , which is a non-coordinate
basis . See Schutz §5.5 for further discussion.

From Eq. (2.29) we can recover the transformation expression e_ī = Λ^i_ī e_i by
the usual rules of matrix multiplication. Presenting it this way helps emphasise
that the object Λ is indeed ‘just’ a matrix and not anything more exotic;
this matrix representation also appears in one or two of the problems. While
this may be usefully concrete, I do not believe it to be a very useful way
of manipulating these objects in general, because it is easy to get the matrix
the wrong way around (this is why the index-summation notation is a useful
one). Some authors consistently stagger the indexes of the Λ matrices here, in
order to help keep things straight; I think that adds notational intricacy for little
practical benefit. I’ll write out Λ matrices when the explicitness seems helpful,
but if this representation seems obscure, then don’t worry about it. You won’t
be missing anything deep. [Exercise 2.12]

2.3.3 Matrices, Row, and Column Vectors


We can regard the set of all n -component column vectors as a vector space
(the terminology in this subsection is going to be confusing!). This means
that the n -component row vectors are the one-forms in the vector space dual
to the column vectors. Why? Because a row vector can be contracted with
a column vector in the usual way to produce a number, which is exactly
the defining relation between vectors and one-forms. Similarly the n × n
matrices (which, since you can add them and multiply them by scalars, are
also a vector space) are examples of $\binom{1}{1}$ tensors. Why? Because if you contract

a matrix with a column vector (by the usual means of right-multiplying the
matrix by the column vector) you get a column vector, and if you contract the
row-vector/one-form with a matrix (by the usual means of left-multiplying the
matrix by the row-vector), you get another row-vector/one-form. If you both
left-multiply a matrix by a row vector, and right-multiply it by a column vector,
then you end up, of course, with just a number, in R.
What we have done here is to regard column vectors, row vectors, and
square matrices as representations of the abstract structures of, respectively,
$\binom{1}{0}$, $\binom{0}{1}$, and $\binom{1}{1}$ tensors (in this approach we can’t conveniently find representations
of higher-rank tensors). We have done two non-trivial things here: (i) we have
selected three vector spaces to play with, and (ii) we have defined ‘function
application’, as required by the definition of a tensor in Section 2.2.1, to be the
familiar matrix multiplication. Thus in this representation, any square matrix
is a $\binom{1}{1}$ tensor.
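This correspondence can be spelled out in a few lines of code. In the sketch below (the names are mine), ‘function application’ is matrix multiplication: filling the matrix’s vector slot gives a column vector, filling its one-form slot gives a row vector, and filling both gives a number.

```python
def mat_vec(m, v):
    # T( . , v): right-multiplying by a column vector gives a column vector
    return [sum(m[i][j] * v[j] for j in range(2)) for i in range(2)]

def row_mat(p, m):
    # T(p, . ): left-multiplying by a row vector gives a row vector
    return [sum(p[i] * m[i][j] for i in range(2)) for j in range(2)]

def row_mat_vec(p, m, v):
    # T(p, v): filling both slots gives a number in R
    return sum(p[i] * m[i][j] * v[j] for i in range(2) for j in range(2))

T = [[1.0, 2.0], [3.0, 4.0]]   # a (1,1) tensor, represented as a square matrix
p = [1.0, -1.0]                # a one-form, represented as a row vector
v = [2.0, 1.0]                 # a vector, represented as a column vector

# the partial applications compose consistently, whichever slot is filled first:
assert row_mat_vec(p, T, v) == sum(x * y for x, y in zip(p, mat_vec(T, v)))
assert row_mat_vec(p, T, v) == sum(x * y for x, y in zip(row_mat(p, T), v))
```

The two assertions are the statement that the order in which the tensor’s slots are filled does not matter.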

2.3.4 Minkowski Space: Vectors in Special Relativity


The final very important example of the ideas of this part is the space of SR:
Minkowski space . See Schutz, chapter 2, though he comes at the transformation
matrix Λ from a slightly different angle.
Here there are four dimensions rather than two, and a basis for the space
is formed from e 0 = e t and e 1,2,3 = e x,y,z . As is conventional in SR, we will
now use Greek indices for the vectors and one-forms, with the understanding
that Greek indices run over { 0, 1, 2, 3} . The metric on this space, in these
coordinates, is
g µν = ηµν ≡ diag(− 1, 1, 1, 1) (2.33)
(note that this convention for the metric, with a signature of +2, is the
same as in Schutz; some texts choose the opposite convention of η_µν =
diag(+1, −1, −1, −1)). See also Section 1.4.3.
Vectors in this space are A = A µ e µ , and we can use the metric to lower
the indexes and form the components in the dual space (which we define in a
similar way to the way we defined the dual space in Section 2.3.1). Thus the
contraction between a one-form Ã and vector B is just

$$\langle\tilde A, B\rangle = A_\mu B^\mu = A_0 B^0 + A_1 B^1 + A_2 B^2 + A_3 B^3
= \eta_{\mu\nu} A^\mu B^\nu = -A^0 B^0 + A^1 B^1 + A^2 B^2 + A^3 B^3.$$

This last expression should be very familiar to you, since it is exactly the
definition of the scalar product of two vectors which was so fundamental, and which
seemed so peculiar, in SR.
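A short numerical sketch of this contraction, with η hard-coded as diag(−1, 1, 1, 1) as in Eq. (2.33) (helper names are mine):

```python
ETA = [-1.0, 1.0, 1.0, 1.0]   # the diagonal of eta_mu_nu, Eq. (2.33)

def lower(a):
    # A_mu = eta_mu_nu A^nu: with this metric, just flip the time component
    return [ETA[mu] * a[mu] for mu in range(4)]

def contract(a, b):
    # <A~, B> = A_mu B^mu = -A^0 B^0 + A^1 B^1 + A^2 B^2 + A^3 B^3
    return sum(x * y for x, y in zip(lower(a), b))

A = [2.0, 1.0, 0.0, 0.0]
B = [3.0, 1.0, 0.0, 0.0]
assert contract(A, B) == -2.0 * 3.0 + 1.0 * 1.0   # = -5.0

# a null vector contracts with itself to zero, as in SR:
N = [1.0, 1.0, 0.0, 0.0]
assert contract(N, N) == 0.0
```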

We can define transformations from these Minkowski-space coordinates to


new ones in the same space, by specifying the elements of a transformation
matrix Λ (cf. Schutz §2.2). We can do this however we like, but there is a
subset of these transformations which are particularly useful, since they result
in the metric in the new coordinates having the same form as the metric in the
old one, namely g µ̄ν̄ = ηµ̄ν̄ . One of the simplest sets of such transformation
matrices (parameterised by 0 ≤ v < 1) is

$$\Lambda^{\bar\mu}{}_\mu(v) = \begin{pmatrix} \gamma & -v\gamma & 0 & 0 \\ -v\gamma & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \tag{2.34}$$

where $\gamma = 1/\sqrt{1 - v^2}$. Again, this should be rather familiar.
[Exercise 2.13]
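The defining property of these matrices — that they leave the components of η unchanged — can be verified directly for the t–x block of a boost of the form in Eq. (2.34) (the y and z directions are untouched). This is also, in effect, a worked check for part (ii) of Exercise 2.13; the function names are my own.

```python
import math

def boost(v):
    # the t-x block of the matrix in Eq. (2.34)
    g = 1.0 / math.sqrt(1.0 - v * v)
    return [[g, -v * g], [-v * g, g]]

def transformed_metric(L):
    # g_ab = L[m][a] * L[n][b] * eta[m][n], with eta = diag(-1, 1)
    eta = [[-1.0, 0.0], [0.0, 1.0]]
    return [[sum(L[m][a] * L[n][b] * eta[m][n]
                 for m in range(2) for n in range(2))
             for b in range(2)] for a in range(2)]

g = transformed_metric(boost(0.6))
# the metric components are unchanged: g_mubar_nubar = eta_mubar_nubar
assert abs(g[0][0] + 1.0) < 1e-12
assert abs(g[1][1] - 1.0) < 1e-12
assert abs(g[0][1]) < 1e-12 and abs(g[1][0]) < 1e-12
```

The cancellation is exactly γ²(1 − v²) = 1, which is why the γ factor takes the form it does.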

2.4 Coordinates and Spaces


There are a few points of terminology that are useful to gather here, even
though their full significance will not be apparent until later sections.
Note that in the previous section we have distinguished cartesian coordi-
nates from a euclidean space . The first is the system of rectilinear coordinates
you learned about in school, and the second is the flat space of our normal
experience (where the ratio of the circumference of a circle to its diameter
is π , and so on). In Section 2.3.1, we talked about euclidean space described by
cartesian coordinates. In Section 2.3.2, we discussed euclidean space, but not
in cartesian coordinates, and in Section 2.3.4 we used rectilinear coordinates
to describe a non-euclidean space, Minkowski space, which does not have the
same metric as the two previous examples.
Euclidean space and Minkowski space are both flat spaces : in each of them,
Euclid’s parallel postulate holds, and there is a global definition of parallelism
in the sense that if a vector is moved parallel to itself from point A to point B ,
then its direction at B will be independent of its path – this is examined more
closely in Chapter 3. This flatness is a property of the metric, and is not a
property of a particular choice of coordinates; Euclidean space is just as flat in
polar coordinates as it is in cartesian coordinates.
The difference between euclidean and Minkowski space is that, for any
choice of coordinates in a euclidean space, we can, with suitable amounts of
linear algebra, find a coordinate transformation that changes the metric in those
coordinates (such as Eq. (2.32)) into the cartesian metric of Eq. (2.26), and, for

any coordinates in Minkowski space, we can change to coordinates in which


the metric is Eq. (2.33).
The basis vectors { e x , e y , . . .} , in which the metric is Eq. (2.26), are
a cartesian basis , and the basis vectors defined just above Eq. (2.33) are
a Lorentz basis ; in each case, a constant vector field will have the same
components at all points in the space (we will come back to this in Chapter 3;
the fact that we can do this at any point of a curved space, to get ‘locally
inertial’ coordinates, will be crucial in Section 3.3.1).

Exercises
Exercises 1–12 of Schutz’s chapter 3 are also useful.

Exercise 2.1 (§ 2.1.1) Demonstrate that the set of ‘ordinary vectors’ does
indeed satisfy these axioms. Demonstrate that the set of all functions also
satisfies them and is thus a vector space. Demonstrate that the subset of
functions { eax : a ∈ R} is a vector space (hint: think about what the ‘vector
addition’ operator should be in this case). Can you think of other examples?
[u + ]

Exercise 2.2 (§ 2.1.1) Prove that if { b i } are the components of an arbitrary


vector B with respect to an orthonormal basis { e i } , then b i = B · e i . [d − ]

Exercise 2.3 (§ 2.2.5) Tensor components A^{ij} and B^{ij} are equal in one
coordinate frame. By considering the transformation law for a $\binom{2}{0}$ tensor
(introduced later in this part, in Section 2.2.7), show that they must be equal in
any coordinate frame. Show that if A^{ij} is symmetric in one coordinate frame, it
is symmetric in any frame. [d− u+]

Exercise 2.4 (§ 2.2.5) So, why are Eq. (2.8) and Eq. (2.9) obvious?
[d − u + ]

Exercise 2.5 (§ 2.2.5) Show that the contraction in Eq. (2.11) is indeed a
tensor. You will need to show that S is linear in its arguments, and independent
of the choice of basis vectors { e i } (you will need Section 2.2.7).

Exercise 2.6 (§ 2.2.6) Justify each of the steps in Eq. (2.14). [d −u + ]


Exercise 2.7 (§ 2.2.6) (a) Given that T is a $\binom{1}{2}$ tensor, A and B are vectors,
p̃ is a one-form, and g is the metric, give the $\binom{M}{N}$ rank of each of the following

Figure 2.6 Two coordinate systems.

objects, where as usual · represents an unfilled argument to the function T (not
all the following are valid; if not, say why):

1. A(·̃)  2. p̃(·)  3. T(·̃, ·, ·)  4. T(p̃, ·, ·)
5. T(p̃, A, ·)  6. T(·̃, A, ·)  7. T(·̃, ·, B)  8. T(·̃, A, B)
9. T(p̃, A, B)  10. A(·)  11. p̃(·̃)

(b) State which of the following are valid expressions representing the
components of a tensor. For each of the expressions that is a tensor, state the
$\binom{M}{N}$ type of the tensor; and for each expression that is not, explain why not.

1. $g_{ii}$  2. $g_{ij}T^j{}_{kl}$  3. $g^{ik}T^j{}_{kl}$  4. $T^i{}_{ij}$
5. $g_{ij}A^iA^j$  6. $g_{ij}A^kA_k$

If you’re looking at this after studying Chapter 3, then how about (7) $A^i{}_{,j}$ and
(8) $A^i{}_{;j}$?
What about $A^i = A(\tilde\omega^i)$? [d−]

Exercise 2.8 (§ 2.2.7) Figure 2.6 shows a vector A in two different coordi-
nate systems. We have A = cos θ e 1 + sin θ e 2 = cos(θ − φ)e 1̄ + sin(θ − φ)e 2̄.
Obtain e_{1,2} in terms of e_{1̄,2̄} and vice versa, and obtain A^ī in terms of cos θ,
cos φ, sin θ and sin φ. Thus identify the components of the matrix Λ^i_ī (see also
Eq. (2.23)).

Exercise 2.9 (§ 2.2.7) Show that the tensor S in Eq. (2.10) is independent
of the basis vectors used to define it.

Exercise 2.10 (§ 2.2.7) Make a table showing all the transformations
e_i ↔ e_ī, A^i ↔ A^ī, ω̃^i ↔ ω̃^ī and p_i ↔ p_ī, patterned after Eq. (2.17) and
Section 3.1.4. You will need to use the fact that the Λ matrices are inverses of
each other, and that A = A^i e_i = A^ī e_ī. This is a repetitive, slightly tedious, but
extremely valuable exercise. One point of this is to emphasise that once you
have chosen the transformation for one of these (say, e_ī = Λ^i_ī e_i), all of the
others are determined. [d− u++]

Exercise 2.11 (§ 2.2.7) By repeating the logic that led to Eq. (2.17), prove
Eq. (2.19) and Eq. (2.24) [use $T = T^{ij}{}_k\,(e_i \otimes e_j \otimes \tilde\omega^k)$]. Alternatively, and more
directly, use the results of Exercise 2.10. [d+]

Exercise 2.12 (§ 2.3.2) Consider the vector A = 1e 1 + 1e 2 , where


e 1 and e 2 are the cartesian basis vectors; observe that this vector has the
same components at all points in R². Using Eq. (2.17) and the appropriate
transformation matrix for polar coordinates, determine the components of this
vector in the polar basis, as evaluated at the points (r, θ) = (1, π/4), (1, 0),
(2, π/4), and (2, 0). Use the metric for these coordinates, Eq. (2.32), to find the
length of A at each of these points. What happens at the origin? [u+]

Exercise 2.13 (§ 2.3.4) (i) Given a vector with components A µ in a


Minkowski space, write down its coordinates in a different set of Minkowski
coordinates, obtained from the first by Eq. (2.34), and verify that these match
the expression for Lorentz-transformed 4-vectors, which you learned about in
SR.
(ii) What is the inverse, Λ^µ_µ̄, of Eq. (2.34)? Use it to write down the
components of the metric tensor Eq. (2.33) after transformation by this Λ,
and verify that $g_{\bar\mu\bar\nu} = \eta_{\bar\mu\bar\nu}$.
(iii) Consider the transformation matrix

$$\Lambda^\mu{}_{\bar\mu} = \begin{pmatrix} a & b & 0 & 0 \\ c & d & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix},$$
which is the simplest transformation that ‘mixes’ the x - and t -coordinates. By
requiring that $g_{\bar\mu\bar\nu} = \eta_{\bar\mu\bar\nu}$ after transformation by this Λ, find constraints on
the parameters a, b, c, and d (you can freely add the constraint b = c; why?),
and so deduce the matrix Eq. (2.34). [u + ]
3
Manifolds, Vectors, and Differentiation

In the previous chapter, we carefully worked out the various things we can do
with a set of vectors, one-forms, and tensors, once we have identified those
objects. Identifying those objects on a curved surface is precisely what we are
about to do now. We discover that we have to take a rather roundabout route
to them.
After defining them suitably, we next want to differentiate them. But as soon
as we can do that, we have all the mathematical technology we need to return
to physics in the next chapter, and start describing gravity.

3.1 The Tangent Vector


You learned about vectors first in school, perhaps by visualising them as ‘a
thing which points’, and possibly defining them using the separations between
points, or using differences in coordinate values, or otherwise defining them
using their components in some coordinate system. That is straightforward
when using euclidean coordinates, but becomes rapidly challenging when
doing so using a general coordinate system. One can develop GR using this
broad approach – Rindler (2006) does so, for example – but it is messy,
is arguably more confusing, and is now regarded as rather old-fashioned.
MTW (1973) were amongst the first to popularise a more ‘geometric’ or
‘coordinate-free’ approach that stresses the basis-independence of physical
quantities, but which requires a more subtle approach to defining vectors. This
approach defines vectors at a point, with no remnant of the separations between
coordinate points that we learned at first.


Figure 3.1 A manifold M, with a curve λ : R → M (dashed), a point on the
manifold P = λ(t), and a coordinate function x^i : M → R. Curves that go through
the point P have tangent vectors, at P (shaded arrow), which lie in the space T_P(M)
(shaded). The function x^i(t) = (x^i ∘ λ)(t) is therefore R → R.

3.1.1 Manifolds and Functions


The arena in which everything happens is the manifold . A manifold is a set
of points, with the only extra structure being enough to allow continuous
functions to be defined on it (Figure 3.1). In particular, a manifold does not
have a metric defined.
A chart is a set of functions { x 1, . . ., x n} that between them map points
on the manifold to Rn. We may need more than one chart to cover all of the
manifold. In other words, it is a coordinate system , xi : M → R. The fact that
the range of this map is (flat) Rn allows us to say that the manifold is locally
euclidean .
Now consider a path on the manifold – this is just a continuous sequence of
points. We distinguish this from a curve , λ(t ) : R → M , which is a mapping
from a parameter t to points on a path – two mappings that map to the same
path but with different parameterisation are different curves.
If we put these ideas together, and think of the functions x^1(λ(t)), . . .,
x^n(λ(t)), then we have a set of mappings from the curve parameter to the
coordinates. The properties of the manifold tell us that these are smooth
functions x^i(t) = x^i(λ(t)) : R → R, so we can differentiate with respect to
the parameter, t.
We can regard a chart as a reference frame , and I will generally use the latter
term below.

There’s quite a lot we could say about manifolds: for a mathemati-


cian there’s a long backstory before we get to this starting point.
Mathematicians have to be precise about what minimal structures must be
present before a definition of differentiation is possible; in other words, what

structures are there, the removal of which would make it impossible to define
a derivative? For our present purposes we can do the traditional physicists’
thing and ignore such niceties, and assume that our spaces of interest are well
enough behaved that they can support coordinates and curves as discussed in
this section. Later in your study of GR, you may have to become aware of these
conditions again, in a detailed study of black holes (the point of singularity is
not in the manifold, or more specifically in any open subset of it) or the large-
scale structure of space-time (where the overall topology of the space becomes
important).
That said, for completeness, I’ll mention a few details about what
structure we already have at this point. A manifold, M , is a set (which
we naturally think of here as a set of points) in which we can identify open
subsets; we can identify such a subset (that is, a ‘neighbourhood’) for every
point in the manifold ( ∀ p ∈ M , ∃ S ⊂ M such that p ∈ S). We can smoothly
patch together multiple such subsets to cover the manifold, even though there
might not be any single subset that covers the entire manifold. At this point
we have a ‘topological space’. We then may or may not be able to define maps
from each of these subsets to Rn (these are the aforementioned ‘charts’); if we
can, and with the same n in all cases, we have an n -dimensional manifold (that
is, each such subset is homeomorphic to Rn; two spaces are homeomorphic
if there is a continuous bijection with a continuous inverse, which maps one
to the other; each such subset ‘looks like’ Rn). That is, the dimension of the
manifold, n, is a property of the manifold, and not a consequence of any
arbitrariness in the number of coordinate functions we use. These maps must be
continuous, but we will also assume that they are as differentiable as we need
them to be. Carroll (2004, § 2.2) has an excellent description of the sequence
of ideas, along with examples of spaces that are and are not manifolds. It’s also
possible to say quite a lot more about the precise relationship between charts,
coordinates, and frames, but we already have as much detail as we need.

3.1.2 Defining the Tangent Vector


Now think of a function f : M → R which is defined on the manifold M , and
therefore at every point along the curve λ. The function f is therefore also a
function (Rn → R) of the coordinates of the points along that curve, or

$$f = f(\lambda(t)) = f\bigl(x^1(\lambda(t)), \ldots, x^n(\lambda(t))\bigr),$$

which we can write as just (R → R)

$$f = f\bigl(x^1(t), \ldots, x^n(t)\bigr).$$

Figure 3.2 Multiple curves through P: the curves λ(t), µ(s), and τ(r), with
tangent vectors d/dt, d/ds, and d/dr.

Be aware that, strictly, we are talking about three different functions here,
namely f (P ) : M → R, and f (x 1 , . . ., x n ) : Rn → R, and f (x 1 (t ), . . ., x n (t )) :
R → R. Giving all three functions the same name, f , is a sort of pun.
Similarly, we will think of x i as interchangeably a function x i (P ) : M → R,
or x i (λ(t)) : R → R, or as the number that is one of the arguments of the
function f (x1 , . . ., x n). When we manipulate these objects in the rest of the
book, it should be clear which interpretation we mean: when we write ∂ f /∂ x i ,
for example, we are thinking of f as f (x 1 , . . ., xn ), and thinking of the x i as
numbers; when we write ∂ x i/∂ x j = δ i j we simply mean that the coordinate
function x i is independent of the value of the coordinate x j .
So how does f vary as we move along the curve? Easy:
$$\frac{\mathrm{d}f}{\mathrm{d}t} = \sum_{i=1}^{n} \frac{\partial x^i}{\partial t}\frac{\partial f}{\partial x^i}.$$

However, since this is true of any function f, we can write instead

$$\frac{\mathrm{d}}{\mathrm{d}t} = \sum_i \frac{\partial x^i}{\partial t}\frac{\partial}{\partial x^i}. \tag{3.1}$$
We can now derive two important properties of this derivative. Consider the
same path parameterised by t_a = t/a. We have

$$\frac{\mathrm{d}f}{\mathrm{d}t_a} = \sum_i \frac{\partial x^i}{\partial t_a}\frac{\partial f}{\partial x^i}
= \sum_i a\,\frac{\partial x^i}{\partial t}\frac{\partial f}{\partial x^i}
= a\,\frac{\mathrm{d}f}{\mathrm{d}t}. \tag{3.2}$$

Next, consider another curve µ(s), which crosses curve λ(t) at point P. We can
therefore write, at P,

$$a\,\frac{\mathrm{d}f}{\mathrm{d}s} + b\,\frac{\mathrm{d}f}{\mathrm{d}t}
= \sum_i \left(a\,\frac{\partial x^i}{\partial s} + b\,\frac{\partial x^i}{\partial t}\right)\frac{\partial f}{\partial x^i}
= \sum_i \frac{\partial x^i}{\partial r}\frac{\partial f}{\partial x^i}
= \frac{\mathrm{d}f}{\mathrm{d}r}, \tag{3.3}$$

for some further curve τ (r ) that also passes through point P (see Figure 3.2).
But now look what we have discovered. Whatever sort of thing d/dt is, a d/dt
is the same type of thing (from Eq. (3.2)), and so is a d/ds + b d/dt. But now
we can look at Section 2.1.1, and realise that these derivative-things defined
at P, which we’ll write (d/dt)_P, satisfy the axioms of a vector space. Thus the
things (d/dt)_P are another example of things that can be regarded as vectors,
or $\binom{1}{0}$ tensors. The thing (d/dt)_P is referred to as a tangent vector.
When we talk of ‘vectors’ from here on, it is these tangent vectors that we
mean.
A vector V = (d/dt)_P has rather a double life. Viewed as a derivative, V is
just an operator, which acts on a function f to give

$$Vf = \left(\frac{\mathrm{d}}{\mathrm{d}t}\right)_{\!P} f = \left.\frac{\mathrm{d}f}{\mathrm{d}t}\right|_{t(P)},$$

the rate of change of f along the curve λ(t), evaluated at P. There’s nothing par-
ticularly exotic there. What we have just discovered, however, is that this object
V = (d/dt)_P can also, separately, be regarded as an element of a vector space
associated with the point P, and as such is a $\binom{1}{0}$ tensor, which is to say a thing
that takes a one-form as an argument, to produce a number that we will write
as ⟨ω̃, V⟩, for some one-form ω̃ (we will see in a moment what this one-form is;
it is not the function f). This dual aspect does seem confusing, and makes the
object V seem more exotic than it really is, but it will (or should!) always be
clear from context which facet of the vector is being referred to at any point.
We’ll denote the set of these directional derivatives as T P (M ), the tangent
plane of the manifold M at the point P. It is very important to note that T P(M )
and, say, T Q (M ) – the tangent planes at two different points, P and Q , of
the manifold – are different spaces , and have nothing to do with one another
a priori (though we want them to be related, and this is ultimately why we
introduce the connection in Section 3.2).
With this in mind, we can reread Eq. (3.1) as a vector equation, identifying
the vectors
$$e_i = \left(\frac{\partial}{\partial x^i}\right)_{\!P} \tag{3.4}$$

as a basis for the tangent plane, and the numbers ∂x^i/∂t as the components of
the vector V = (d/dt)_P in this basis, or

$$\left(\frac{\mathrm{d}}{\mathrm{d}t}\right)_{\!P} = \sum_i \frac{\partial x^i}{\partial t}\left(\frac{\partial}{\partial x^i}\right)_{\!P},
\qquad V = V^i e_i.$$

So, I’ve shown you that we can regard the (d/dt )P as vectors; the rest of this
part of the book should convince you that this is additionally a useful thing
to do.
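The identification of V = (d/dt)_P with the operator Σᵢ (∂xⁱ/∂t) ∂/∂xⁱ of Eq. (3.1) can be checked numerically for a concrete curve and function. In the sketch below (the curve and function are my own choices, not the book's), V^i ∂f/∂x^i, evaluated entirely by finite differences, is compared against the direct derivative of f along the curve.

```python
def curve(t):
    # a sample curve lambda(t) = (t, t^2) in the plane
    return (t, t * t)

def f(x, y):
    # a sample function defined on the 'manifold'
    return x * x + 3.0 * y

t0, h = 1.0, 1e-6
# components V^i = dx^i/dt of the tangent vector at P = curve(t0)
V = [(a - b) / (2 * h) for a, b in zip(curve(t0 + h), curve(t0 - h))]
# partial derivatives of f at P
x0, y0 = curve(t0)
df = [(f(x0 + h, y0) - f(x0 - h, y0)) / (2 * h),
      (f(x0, y0 + h) - f(x0, y0 - h)) / (2 * h)]
# Vf = V^i df/dx^i ...
Vf = sum(v * d for v, d in zip(V, df))
# ... agrees with d/dt of f(lambda(t)), as Eq. (3.1) says it must
direct = (f(*curve(t0 + h)) - f(*curve(t0 - h))) / (2 * h)
assert abs(Vf - direct) < 1e-4
```

Here V has components (1, 2) at t = 1, and both sides come out to 8, as the chain rule requires.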

3.1.3 The Gradient One-Form


Consider a function f , defined on the manifold. This is a field , which is to say
it is a rule that associates an object – in this case the number that is the value
of the function – with each point on the manifold (see Section 2.2.3). Given
this function, there is a particular one-form field that we can define (that is, a
rule for associating a one-form with each point in the manifold), namely the
gradient one-form d̃f. Given a vector V = (d/dt)_P, which is the tangent to a
curve λ(t), the gradient one-form is defined by its contraction with this vector:

$$\langle \tilde{\mathrm d}f, V\rangle = \left\langle \tilde{\mathrm d}f, \frac{\mathrm{d}}{\mathrm{d}t}\right\rangle
= \tilde{\mathrm d}f\!\left(\frac{\mathrm{d}}{\mathrm{d}t}\right)
\equiv \left.\frac{\mathrm{d}f}{\mathrm{d}t}\right|_P. \tag{3.5}$$
dt dt
The first two equalities here simply express notational equivalences; it is the
third equivalence that constitutes the definition of the gradient one-form’s
action. This contraction between a vector and a gradient field is illustrated in
Figure 2.3.
What does Eq. (3.5) look like in component form? Writing V = d/dt =
V^i e_i, we can write

$$\left.\frac{\mathrm{d}f}{\mathrm{d}t}\right|_P = \langle \tilde{\mathrm d}f, V\rangle
= V^i \langle \tilde{\mathrm d}f, e_i\rangle
= V^i \frac{\partial f}{\partial x^i}. \tag{3.6}$$

Now consider the gradient one-form associated with, not f , but one of the
coordinate functions xi (from Section 3.1.1, recall that the coordinates are just
a set of functions on the manifold, and in this sense not importantly different
from an arbitrary function f). We write these one-forms as simply d̃x^i: what
is their action on the basis vectors e_i = ∂/∂x^i (from Eq. (3.4))? Directly from
Eq. (3.5),

$$\tilde{\mathrm d}x^i\!\left(\frac{\partial}{\partial x^j}\right) = \frac{\partial x^i}{\partial x^j} = \delta^i{}_j, \tag{3.7}$$

so that, comparing this with Eq. (2.5), we see that the set ω̃^i = d̃x^i forms a
basis for the one-forms, which is dual to the vector basis e_i = ∂/∂x^i.
[Exercises 3.1 and 3.2]

3.1.4 Basis Transformations


What does a change of basis look like in this new notation? If we decide
that we do not like the coordinate functions xi and decide to use instead

functions x ı̄ , how does this appear in our formalism, and how does it compare
to Section 2.2.7?
The new coordinates will generate a set of basis vectors

$$e_{\bar\imath} = \frac{\partial}{\partial x^{\bar\imath}}. \tag{3.8}$$

This new basis will be related to the old one by a linear transformation

$$e_{\bar\imath} = \Lambda^j{}_{\bar\imath}\, e_j$$

and the corresponding one-form basis will be related via the inverse transfor-
mation

$$\tilde\omega^{\bar\imath} = \Lambda^{\bar\imath}{}_j\, \tilde\omega^j$$

(recall Exercise 2.10). Thus, from Eq. (2.18),


$$\Lambda^{\bar\imath}{}_j = \tilde\omega^{\bar\imath}(e_j) = \tilde{\mathrm d}x^{\bar\imath}\!\left(\frac{\partial}{\partial x^j}\right) = \frac{\partial x^{\bar\imath}}{\partial x^j} \tag{3.9a}$$
$$\Lambda^j{}_{\bar\imath} = \tilde\omega^j(e_{\bar\imath}) = \tilde{\mathrm d}x^j\!\left(\frac{\partial}{\partial x^{\bar\imath}}\right) = \frac{\partial x^j}{\partial x^{\bar\imath}}. \tag{3.9b}$$

Note that Eq. (3.8) does the right thing if we, for example, double the value
of the coordinate function x i . If x i doubles, then ∂ f /∂ x i halves, but Eq. (3.8)
then implies that ei halves, which means that the corresponding component
of V , namely V i , doubles, so that V i ∂ f /∂ x i is unchanged, as expected.
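For the polar-coordinate example of Section 2.3.2, the two matrices in Eq. (3.9) are just the Jacobians of the coordinate functions, and we can verify numerically that they are inverses of each other. A minimal sketch (the function names are mine):

```python
import math

def lam_bar_j(x, y):
    # Lambda^ibar_j = dx^ibar/dx^j, for r = sqrt(x^2 + y^2), theta = atan2(y, x)
    r2 = x * x + y * y
    r = math.sqrt(r2)
    return [[x / r, y / r],
            [-y / r2, x / r2]]

def lam_j_bar(r, theta):
    # Lambda^j_ibar = dx^j/dx^ibar, for x = r cos(theta), y = r sin(theta)
    return [[math.cos(theta), -r * math.sin(theta)],
            [math.sin(theta),  r * math.cos(theta)]]

r, theta = 1.5, 0.4
x, y = r * math.cos(theta), r * math.sin(theta)
A, B = lam_bar_j(x, y), lam_j_bar(r, theta)
prod = [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)]
assert all(abs(prod[i][j] - (1 if i == j else 0)) < 1e-12
           for i in range(2) for j in range(2))
```

These are, of course, the same matrices as Eq. (2.28) and Eq. (2.30); the coordinate basis makes the connection between Λ and the Jacobian explicit.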

Note that the transformation matrix is defined as transforming the ba-


sis vectors and one-forms; as an immediate consequence it can trans-
form the vector and one-form components also (as discussed in Section 2.2.7).
Because of the choice in Eq. (3.8) of basis vectors as the differentials of the
coordinate functions, the transformation matrix also describes a transformation
between coordinate systems. This choice of basis vectors is a coordinate basis
– see the ‘dangerous bend’ paragraph Section 2.3.2 for discussion of non-
coordinate bases.

If we consider a curve λ(t), which is such that ∂x^1(λ(t))/∂t = 1
and x^i(λ(t)) = c^i (constant) for i > 1 (i.e., this is a ‘grid line’),
then simply comparing with Eq. (3.1) we see that d/dt = ∂/∂x^1. Thus in a
coordinate basis, where e i = ∂/∂ x i , the ith basis vector at any point is tangent
to the ith ‘grid line’, which matches the intuition we have for the basis vectors
e x , e y , and so on, in ordinary plane geometry.

[Exercises 3.3–3.5]

3.2 Covariant Differentiation in Flat Spaces


We are now finally in a position to move on to the central tool of this
chapter, the ideas of coordinate-independent differentiation of tensors, parallel
transport, and curvature. We will make this move in two steps: first, we will
learn how to handle the situation where the basis vectors of the space of interest
are different at different points in the space, but confining ourselves to flat
(euclidean) space, where we already know how to do most of the calculations;
second, we will discover the rather simple step involved in transferring this
knowledge to the case of fully curved spaces.

There are other ways of introducing the covariant derivative, which


are very insightful, but more than a little abstract. Stewart (1991, §1.7)
introduces it in an axiomatic way which makes clear the tensorial nature of the
covariant derivative from the very outset, as well as its linearities and some
of its other properties. Schutz (1980, chapter 6) introduces it in a typically
elegant way, via parallel transport, and emphasising the ultimate arbitrariness
of the precise differentiation rule. Both of these routes define a connection that
is more general than the one that, following Schutz (i.e., Schutz (2009)), we
are about to derive, which is known as the ‘metric connection’. Both of these
books promptly specialise to the metric connection, but it seems useful here to
build up this specific connection step by step, emphasising the link to changes
of bases, and going via the covariant derivative in flat spaces. Equation (3.35),
which we derive rather than assert, is the defining property of this connection.
Chapter 10 of MTW gives a very good, and visual, introduction to covariant
differentiation, though approaching it from a somewhat different direction.
The point of this tool – the goal we are aiming for – is this: given some
geometrical object V of physical interest (such as an electric field in a space, or
a strain tensor in some medium), we want to be able to talk about how it varies
as we move around a space, in a way that doesn’t depend on the coordinates
we have chosen.

3.2.1 Differentiation of Basis Vectors


This section is to some extent another notation section, in that it is describing
something you already know how to do, but in more elaborate and powerful
language.
You will in the past have dealt with calculus in curvilinear coordinate sys-
tems and produced such results as the Laplacian in spherical polar coordinates
$$\nabla^2 = \frac{1}{r^2}\frac{\partial}{\partial r}\!\left(r^2\frac{\partial}{\partial r}\right)
+ \frac{1}{r^2\sin\theta}\frac{\partial}{\partial\theta}\!\left(\sin\theta\,\frac{\partial}{\partial\theta}\right)
+ \frac{1}{r^2\sin^2\theta}\frac{\partial^2}{\partial\phi^2}.$$

Figure 3.3 The basis of polar coordinates, and their derivatives.

We are now aiming for much the same destination, but by a slightly different
route. This follows Schutz §§5.3–5.5 quite closely.
In order to illustrate the process, we will examine the basis vectors of
(plane) polar coordinates, as expressed in terms of the cartesian basis vectors e x
and e y . We will promptly see that our formalism is not restricted to this route.
The basis vectors of polar coordinates are
er = cos θ e x + sin θ e y (3.10a)
e θ = − r sin θ e x + r cos θ ey (3.10b)
(compare the ‘dangerous bend’ discussion of coordinate bases in Section
2.3.2). A little algebra shows that

$$\frac{\partial}{\partial r} e_r = 0 \tag{3.11a}$$
$$\frac{\partial}{\partial\theta} e_r = \frac{1}{r}\, e_\theta \tag{3.11b}$$
$$\frac{\partial}{\partial r} e_\theta = \frac{1}{r}\, e_\theta \tag{3.11c}$$
$$\frac{\partial}{\partial\theta} e_\theta = -r\, e_r, \tag{3.11d}$$

so that we can see how the basis vectors change as we move to different points
in the plane (Figure 3.3), unlike the cartesian basis vectors.
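Equations (3.11) are easy to confirm by differentiating the basis vectors of Eq. (3.10) numerically. A short sketch checking Eq. (3.11b) and Eq. (3.11d) by central differences (helper names are mine):

```python
import math

def e_r(r, theta):
    return [math.cos(theta), math.sin(theta)]           # Eq. (3.10a)

def e_theta(r, theta):
    return [-r * math.sin(theta), r * math.cos(theta)]  # Eq. (3.10b)

r, theta, h = 2.0, 0.8, 1e-6

# d e_r / d theta should be (1/r) e_theta, Eq. (3.11b)
de_r = [(a - b) / (2 * h)
        for a, b in zip(e_r(r, theta + h), e_r(r, theta - h))]
assert all(abs(a - b / r) < 1e-6 for a, b in zip(de_r, e_theta(r, theta)))

# d e_theta / d theta should be -r e_r, Eq. (3.11d)
de_t = [(a - b) / (2 * h)
        for a, b in zip(e_theta(r, theta + h), e_theta(r, theta - h))]
assert all(abs(a + r * b) < 1e-6 for a, b in zip(de_t, e_r(r, theta)))
```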
At any point in the plane, a vector V has components (V r , V θ ) in the polar
basis at that point . We can differentiate this vector with respect to, say, r , in
the obvious way
$$\frac{\partial V}{\partial r} = \frac{\partial}{\partial r}\left(V^r e_r + V^\theta e_\theta\right)
= \frac{\partial V^r}{\partial r}\, e_r + V^r \frac{\partial e_r}{\partial r}
+ \frac{\partial V^\theta}{\partial r}\, e_\theta + V^\theta \frac{\partial e_\theta}{\partial r},$$

or, in index notation, with the summation index i running over the ‘indexes’ r
and θ ,
$$\frac{\partial V}{\partial r} = \frac{\partial}{\partial r}\left(V^i e_i\right)
= \frac{\partial V^i}{\partial r}\, e_i + V^i \frac{\partial e_i}{\partial r}. \tag{3.12}$$

So much for polar coordinates.


Having illustrated the process with polar coordinates, we can obtain the general expression either by replacing r, in Eq. (3.12), by a general coordinate x^j, or (more directly) by using the Leibniz rule on V = V^i e_i, to obtain

∂V/∂x^j = (∂V^i/∂x^j) e_i + V^i ∂e_i/∂x^j. (3.13)

In cartesian coordinates, the second term in this expression is identically zero, since the basis vectors are the same everywhere on the plane, and so in those
coordinates we can obtain the derivative of a vector by simply differentiating
its components (the first term in Eq. (3.13)). This is not true when we are using
curvilinear coordinates, and the second term comes in when we are obliged to
worry about how the basis vectors are different at different points on the plane.
Now, the second term in Eq. (3.13), ∂e_i/∂x^j, is itself a vector, so that it is a linear combination of the basis vectors, with coefficients Γ^k_{ij}:

∂e_i/∂x^j = Γ^k_{ij} e_k. (3.14)

These coefficients Γ^k_{ij} are known as the Christoffel symbols, and this set of n × n × n numbers encodes all the information we need about how the coordinates, and their associated basis vectors, change within the space.¹ The object Γ is not a tensor – it is merely a collection of numbers – so its indexes are not staggered (just like the transformation matrix Λ).
Returning to polar coordinates, we discover that we have already done all the work required to calculate the relevant Christoffel symbols. If we compare Eq. (3.14) with Eq. (3.11) (replacing e_r ↦ e_1 and e_θ ↦ e_2), we see, for example, that

∂e_1/∂x^2 = ∂e_r/∂θ = Γ^1_{12} e_1 + Γ^2_{12} e_2
          = 0 e_1 + (1/r) e_2,

¹ There seems to be no universal consensus on whether Γ^i_{jk} is referred to as the Christoffel ‘symbol’ or ‘symbols’, plural. The former seems more rational, but the latter seems less awkward in prose, and is the version that I will generally use in the text to follow.

so that Γ^1_{12} = 0 and Γ^2_{12} = 1/r. We will sometimes write this, slightly slangily, as Γ^r_{rθ} = 0 and Γ^θ_{rθ} = 1/r. By continuing to match Eq. (3.11) with Eq. (3.14), we find

Γ^2_{12} = Γ^2_{21} = 1/r, Γ^1_{22} = −r, others zero, (3.15)

or

Γ^θ_{rθ} = Γ^θ_{θr} = 1/r, Γ^r_{θθ} = −r, others zero. (3.16)
[Exercise 3.6]
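The matching we have just done by eye can be delegated to a computer. The following sketch (my own illustration, assuming the SymPy library is available; none of the names in it come from the text) differentiates the basis vectors of Eq. (3.10) and solves for the components of each derivative in the polar basis, recovering the Christoffel symbols of Eq. (3.15):

```python
import sympy as sp

r, th = sp.symbols('r theta', positive=True)
coords = ['r', 'theta']
x = {'r': r, 'theta': th}

# The polar basis vectors, in cartesian components, Eq. (3.10)
e = {'r': sp.Matrix([sp.cos(th), sp.sin(th)]),
     'theta': sp.Matrix([-r * sp.sin(th), r * sp.cos(th)])}

# Columns of B are e_r and e_theta; solving B g = d(e_i)/dx^j gives
# the components g^k = Gamma^k_{ij} of Eq. (3.14)
B = sp.Matrix.hstack(e['r'], e['theta'])

Gamma = {}
for i in coords:
    for j in coords:
        g = sp.simplify(B.solve(e[i].diff(x[j])))
        for k, name in enumerate(coords):
            Gamma[(name, i, j)] = g[k]

print(Gamma[('theta', 'r', 'theta')])   # 1/r
print(Gamma[('r', 'theta', 'theta')])   # -r
```

Every other component comes out as zero, in agreement with Eq. (3.15).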

3.2.2 The Covariant Derivative in Flat Spaces


More notation: If we rewrite Eq. (3.13) including Eq. (3.14), relabel and
reorder, we find
∂V/∂x^j = (∂V^i/∂x^j + V^k Γ^i_{kj}) e_i. (3.17)

For each j this is a vector at each point in the space – that is to say, it is a vector field – with components given by the term in brackets. We denote these components of the vector field by the notation V^i_{;j}, where the semicolon denotes covariant differentiation. We can further denote the derivative of the component with a comma: ∂V^i/∂x^j ≡ V^i_{,j}. Then we can write

∂V/∂x^j = V^i_{;j} e_i (3.18a)
V^i_{;j} = V^i_{,j} + V^k Γ^i_{kj}. (3.18b)

It is important to be clear about what you are looking at, here. The objects V^i_{;j} are numbers, which are the components, indexed by i, of a set of vectors, indexed by j. They look rather like tensor components, however, because of how we have chosen to write them; and we are about to deduce that that is exactly what they are in fact.² But components of which tensor?
Final step: looking back at Eq. (3.8), we see that the differential operator ∂/∂x^j in Eq. (3.17) is associated with the basis vector e_j, and this is consistent with what we saw in Section 3.1.4: that e_j is proportional to ∂/∂x^j, and thus that ∂V/∂x^j, in Eq. (3.18a), is proportional to e_j also. That linearity permits us to define a (1,1) tensor, which we shall call ∇V, which we shall define by saying

² It’s also unfortunate that the notation includes common punctuation characters: at the risk of stating the obvious, note that any comma or semicolon following such notation is part of the surrounding text.

that the action of it on the vector e_j is the vector ∂V/∂x^j in Eq. (3.17). That is, using the notation of Chapter 2, we could write

(∇V)( · , e_j) ≡ ∂V/∂x^j (3.19)

as the definition of the tensor ∇V. For notational convenience, we prefer to write this instead as

∇_{e_j} V = ∂V/∂x^j, (3.20)

where both sides of this equation are, of course, vectors.


This tensor ∇V is called the covariant derivative of V, and its components are

(∇V)^i_j ≡ (∇V)(ω̃^i, e_j) ≡ (∇_{e_j} V)^i ≡ (∇_j V)^i = V^i_{;j}, (3.21)

where the first equivalence is what we mean by the components of a tensor, the second is the definition of the tensor, restated from the text immediately above Eq. (3.19), the third is a notational convenience, which applies in the case where the argument vector is a basis vector, and the equality indicates the numerical value of this object – the i-th component of the vector ∇_j V – via Eq. (3.20) and Eq. (3.18a).
You will also sometimes see an expression such as ∇_X V. This is the covariant derivative of V, contracted with X. In component form, this is

∇_X V = ∇V( · , X) = X^i ∇V( · , e_i) = X^i ∇_i V = X^i V^j_{;i} e_j. (3.22)
We have introduced a blizzard of notations here. Remember that they are all
notational variants of the same underlying object, namely the tensor ∇ V . Make
sure you understand how to go from one variant to the other, and why they
relate in the way they do.
Note: it is easy to misread the notation ∇_X Y as being some sort of tensorial operation of ∇ on arguments X and Y, and thus to leap to the conclusion that this is therefore linear in X and Y. That would be wrong. We have here a (1,0) tensor Y and a corresponding (1,1) tensor ∇Y. If we supply the latter with two arguments X and p̃ then we could write the result as ∇Y(p̃, X) or, looking at the first notational eccentricity of Eq. (3.22), as ∇_X Y(p̃). That reminds us that the thing written ∇_X Y is a vector. Exercise 3.10 is instructive here.
Here’s where we’ve got to: we’ve managed to define a tensor field related to V, called the covariant derivative, and written ∇V, which (since it is a tensor) is independent of any coordinate system, and so, in particular, doesn’t pick out any coordinate system as special. If we need its components in a specific system {x^k}, however, because we need to do some calculations, we

can find them easily, via Eq. (3.18), or by transforming the components from a system where we already know them (such as cartesian coordinates) into the system {x^k} – we know we can do this because we know that ∇V is a tensor, so we know how its components transform.
Finally, here, note that a scalar is independent of any coordinate system,
therefore all the complications of this section, which essentially involve
dealing with the fact that basis vectors are different at different points on the
manifold, disappear, and we discover that we have already obtained a covariant
derivative of a scalar, in Eq. (3.5). Thus
∇f ≡ d̃f (3.23)

and (where V is tangent to a curve with coordinate t)

∇_V f = d̃f(V) = ∂f/∂t. (3.24)

If instead we take V = e_j = ∂/∂x^j, then

∇_j f = ∂f/∂x^j (3.25)

(cf. Schutz Eq. (5.53)). From this we can deduce the expression for the
covariant derivative of a one-form, which we shall simply quote as:
(∇_j p̃)_i ≡ (∇p̃)_{ij} ≡ p_{i;j} = p_{i,j} − p_k Γ^k_{ij}. (3.26)

Note the sign difference from Eq. (3.18).
The derivative of a (1,1) tensor is

∇_j T^k_l ≡ T^k_{l;j} = T^k_{l,j} + Γ^k_{ij} T^i_l − Γ^i_{lj} T^k_i. (3.27)
Note firstly how systematic this expression is, and that it is systematically
extensible to tensors of higher rank – there is one + ² term for each upper
tensor index, and one − ² term for each lower index. The expression looks
hard to remember, but is easier than it looks, since, given the overall pattern,
there is only one consistent way the indexes can fit in to each term.
Secondly, note that in Eq. (3.27) and each of the expressions that it
generalises, back to Eq. (3.23), the connection ∇ acting on the tensor X forms
a tensor field ∇ X that has one more lower index than X has – that is, it has one
more vector argument than X. Supplying a vector V to this derivative produces
∇ V X, which is the rate of change of the tensor field X along the direction V .
Also, the Leibniz (or product) rule applies
∇_j (p_k V^k) = p_{k;j} V^k + p_k V^k_{;j}. (3.28)
See Schutz §5.3 for details.

In this discussion of vector differentiation, built up since the beginning of Section 3.2.1, we have not had to recall anything other than that the vectors e_i are the basis vectors of a vector space. That is, there is no complication arising from our definition of the vectors as tangent vectors, associated with the derivative of a function along a curve; there is no meaningful sense in which this is a ‘second derivative’.
[Exercises 3.7–3.11]
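Here is a quick consistency check on Eq. (3.18b), again as a SymPy sketch of my own rather than anything in the text: a vector field with constant cartesian components, such as V = e_x, has polar components that change from point to point, but its covariant derivative must vanish, with the Γ terms exactly cancelling the derivatives of the components.

```python
import sympy as sp

r, th = sp.symbols('r theta', positive=True)
x = [r, th]

# Christoffel symbols of plane polar coordinates, Eq. (3.16);
# Gamma[k][i][j] = Gamma^k_{ij}, with indices 0 = r, 1 = theta
Gamma = [[[0, 0], [0, -r]],         # Gamma^r_{ij}
         [[0, 1 / r], [1 / r, 0]]]  # Gamma^theta_{ij}

# The constant field V = e_x, expressed in the polar basis:
# e_x = cos(theta) e_r - (sin(theta)/r) e_theta
V = [sp.cos(th), -sp.sin(th) / r]

# Eq. (3.18b): V^i_{;j} = V^i_{,j} + V^k Gamma^i_{kj}
cov = [[sp.simplify(V[i].diff(x[j])
                    + sum(V[k] * Gamma[i][k][j] for k in range(2)))
        for j in range(2)] for i in range(2)]

print(cov)  # [[0, 0], [0, 0]]
```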

3.2.3 The Metric and the Christoffel Symbols


The covariant derivative, and the Christoffel symbols, give us information
about how, and how quickly, the basis vectors change as we move about a
space. It is therefore no surprise to find that there is a deep connection between
these and the metric, which gives information about distances within a space (this argument is taken from Schutz §5.4).
Remember that the metric (a (0,2) tensor) allows us to identify a particular one-form associated with a given vector:

Ṽ = g(V, · ). (3.29)
What is the derivative of this one-form?
Note that this is a purely geometrical (i.e., coordinate-independent) equa-
tion; that means that we can choose which coordinates to use, to calculate
with it. We will choose cartesian coordinates, in which the basis vectors
are constant, so that covariant differentiation involves the derivative of the
components only: ∇_j V = (∂V^i/∂x^j) e_i (this is again an application of the Leibniz rule). Thus

g(∇_j V, · ) = g((∂V^i/∂x^j) e_i, · ) = (∂V^i/∂x^j) g(e_i, · ). (3.30)

If we write p̃ = g(e_i, · ), then we discover that p̃(e_j) = g(e_i, e_j) = δ_{ij} in these coordinates. Thus p̃ is one of the set of dual one-forms: ω̃^i = g(e_i, · ).
Differentiating now the left-hand side of Eq. (3.29) (and observing that if
the basis vectors are constant, the basis one-forms must be as well), we find
∇_j Ṽ = ∇_j (V_i ω̃^i) = (∂V_i/∂x^j) ω̃^i. (3.31)

However, in these coordinates, V_i = V^i, so that the right-hand sides of
Eqs. (3.30) and (3.31) are equal, as component equations. But both of these
are tensor equations, and so if the components are equal in one coordinate
system, then they are equal in any coordinate system (cf. Exercise 2.3), which
means that the left-hand sides are equal as tensors, and

∇_j Ṽ = g(∇_j V, · ). (3.32)
In components (and in all coordinate systems),
V_i = g_{ik} V^k (3.33)
V_{i;j} = g_{ik} V^k_{;j}. (3.34)

The first equation here is just the component form of Eq. (3.29) (compare Eq. (2.14)). Note that the latter equation (which we obtained by comparing Eqs. (3.32) and (3.26)) is not trivial. From the properties of the metric we know that there exists some tensor that has components A_{ij} = g_{ik} V^k_{;j}: what this expression tells us is the nontrivial statement that this A_{ij} is exactly V_{i;j}. That is to say that we did not get Eq. (3.34) by differentiating Eq. (3.33), though it looks rather as if we did. What do we get by differentiating Eq. (3.33)? By the Leibniz rule, Eq. (3.28),

V_{i;j} = g_{ik;j} V^k + g_{ik} V^k_{;j}.

But comparing this with Eq. (3.34), we see that the first term on the right-hand side must be zero, for arbitrary V. Thus, in all coordinate systems (and relabelling),

g_{ij;k} = 0. (3.35)
We have not exhausted the link between covariant differentiation and the
metric. The two are related via
Γ^i_{jk} = (1/2) g^{il} (g_{jl,k} + g_{kl,j} − g_{jk,l}). (3.36)
The proof is in Schutz §5.4, leading up to his equation (5.75); it is not long
but involves, in Schutz’s words, some ‘advanced index gymnastics’. It depends
on first proving that
Γ^k_{ij} = Γ^k_{ji}, in all coordinate systems. (3.37)
Equation (3.36) completely cuts the link between the Christoffel symbols
and cartesian coordinates, which might have lingered in your mind after
Section 3.2.2 – once we have a metric, we can work out the Christoffel
symbols’ components immediately. [Exercises 3.12–3.14]
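Eq. (3.36) is straightforward to put to work. As a sketch (my own illustration, assuming SymPy is available), here it is applied to the euclidean plane in polar coordinates, with metric ds² = dr² + r² dθ², recovering the symbols quoted in Eq. (3.16):

```python
import sympy as sp

r, th = sp.symbols('r theta', positive=True)
x = [r, th]

# Euclidean plane in polar coordinates: ds^2 = dr^2 + r^2 dtheta^2
g = sp.diag(1, r**2)
ginv = g.inv()

# Eq. (3.36): Gamma^i_{jk} = (1/2) g^{il} (g_{jl,k} + g_{kl,j} - g_{jk,l})
def christoffel(i, j, k):
    return sp.simplify(sp.Rational(1, 2) * sum(
        ginv[i, l] * (g[j, l].diff(x[k]) + g[k, l].diff(x[j])
                      - g[j, k].diff(x[l]))
        for l in range(2)))

# Indices 0 = r, 1 = theta; compare Eq. (3.16)
print(christoffel(1, 0, 1))   # Gamma^theta_{r theta} = 1/r
print(christoffel(0, 1, 1))   # Gamma^r_{theta theta} = -r
```

The same half-dozen lines work unchanged for any metric in any number of dimensions, which is the practical force of Eq. (3.36).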

3.3 Covariant Differentiation in Curved Spaces


Having done all this work to develop covariant differentiation in flat space,
but in purely geometrical terms, it might be a surprise to discover that there is

actually rather little to do to bring this over to the most general case of curved
spaces. See Schutz §§6.2–6.4.
The first step is to define carefully the notion of a ‘local inertial frame’.

3.3.1 Local Inertial Frames: The Local Flatness Theorem


Recall from Section 3.1.1 that a manifold is little more than a collection of points. What gives this manifold shape is the metric tensor g, which is a symmetric (0,2) tensor which, in a particular coordinate system, has the components g_{ij}, which we can choose more or less how we like. In a different coordinate system, this same tensor will have different components g_{ī j̄}. The question is, can we find a coordinate system in which the metric has the particular form η_{ī j̄} = diag(−1, 1, 1, 1)? That is, can we find a coordinate transformation Λ^ī_i which transforms the coordinates x^i into the coordinates x^ī in which the metric is diagonal?
If the matrix g ij does not have three positive and one negative eigenvalues
(i.e., a signature of + 3 − 1 = + 2), then no, we cannot, and the metric in
question is uninteresting to us because it cannot describe our universe. If the
metric does have a signature of + 2, however, then it is a theorem of linear
algebra that we can indeed find a transformation to coordinates in which the
metric is diagonal at a point.
But we can do better than this. Recall that both g_{ij} and Λ^ī_i are continuous functions of position; within the constraints that g be symmetric and Λ be invertible, they are arbitrary. By choosing the numbers Λ^ī_i and their first derivatives, we can find coordinates that have their origin at P and in which

g_{ī j̄}(x^k̄) = η_{ī j̄} + O((x^k̄)²)

(compare Taylor’s theorem), or

g_{ī j̄}(P) = η_{ī j̄} (3.38a)
g_{ī j̄,k̄}(P) = 0 (3.38b)
g_{ī j̄,k̄ l̄}(P) ≠ 0. (3.38c)

This is the local flatness theorem , and the coordinates xk̄ represent a local
inertial frame , or LIF.
These coordinates are also known as ‘normal’ or ‘geodesic’ coordinates,
and geodesics expressed in these coordinates have a particularly simple form.
Also, in these coordinates, we can see from Eq. (3.36) that Γ^i_{jk} = 0 at P, which
is just another way of saying that this space is locally flat.

Schutz proves the theorem at the end of his §6.2, and Carroll (2004) in §2.5;
both are very illuminating.

3.3.2 The Covariant Derivative in Curved Spaces


You know how to differentiate things. For some function f : R → R,

df/dx = lim_{h→0} [f(x + h) − f(x)]/h. (3.39)
That’s straightforward because it’s obvious what f (x + h )− f (x ) means, and how
we divide that by a number. Surely we can do a similar thing with vectors on a
manifold. Not trivially, because remember that the vectors at P are not defined
on the manifold but on the tangent plane T P (M ) associated with that point, so
the vectors at a different point Q are in a completely different space T Q (M ), and
so in turn it’s not obvious how to ‘subtract’ one vector from the other. Differentiation on the manifold consists of finding ways to define just that ‘subtraction’.
There are several ways to do this. One produces the ‘Lie derivative’, which
is important in many respects, but which we will not examine.

The Lie derivative is a coordinate-independent derivative defined in terms of a vector field. A vector field X has integral curves such that,
at each point p on the integral curve, the curve’s tangent vector is X (p). As
an example, stream lines in a fluid are integral curves of the fluid’s velocity
vector field. The Lie derivative of a function at a point p , written (£X f )p , is
defined as the rate of change of the function along the (unique) integral curve
of X going through p , and Lie derivatives of higher-order tensors are defined in
an analogous way. The disadvantage of this type of derivative is that it clearly
depends on an auxiliary vector field X ; but the compensating advantage is that
it does not depend on a metric tensor, or any other definition of distance. These
make it less useful than the covariant derivative for most GR applications, but it
remains useful in other contexts, such as those where there is already an important vector field present, including applications in fluid dynamics. For details,
see Stewart (1991) or Schutz (1980), or look at exercise 39 in Schutz’s §6.9.
The other way to define this ‘subtraction’ uses the notion of ‘parallel
transport’, which we define and examine now.
You parallel transport a vector along a curve by moving it so that the
vectors at any two infinitesimally separated points are deemed parallel, in the
broad sense of having the same length and pointing in the same direction. The
precise rule for deciding whether two such vectors are parallel isn’t specified
here, and is broadly up to you, but we’ll come back to that.

Figure 3.4 Parallel transporting a vector along a curve.

Figure 3.5 Pulling a vector from one tangent plane, attached to Q, to another, attached to P.

This gives us a way of talking about subtraction. Take a vector field V on


the manifold, and two points P and Q that are both on some curve λ(t ), with
tangent vector U . We can take the vector V (Q ) at Q and parallel transport it
back to P ; at that point it is in the same space T P (M ) as the vector V (P ) so we
can unambiguously subtract them to give another vector in T P (M ). These two
points are a parameter distance t (Q )− t (P ) apart (which is a number), so we can
divide the difference vector by that distance, find the limit as that distance goes
to zero, and thus reconstruct all the components we need to define a differential
just like Eq. (3.39). The differential we get by this process is the covariant
derivative of V along U , written ∇ U V .
The covariant derivative depends on using parallel transport as a way of
connecting vectors in two different tangent planes. The covariant derivative
is sometimes also called the connection , and the Christoffel symbols the
connection coefficients .
If V (Q ) starts off as just the parallel-transported version of V (P ), then when
we parallel transport it back to P we’ll get just V (P ) again, so that this covariant
derivative will be zero; thus

∇_U V = 0 ⇔ (V is parallel transported along U). (3.40)

The crucial thing here is that nowhere in this account of the covariant
derivative have we mentioned coordinates at all.
We’ve actually said rather little, here, because although this passage has, I
hope, made clear how closely linked are the ideas of the covariant derivative
and parallel transport, we haven’t said how we go about choosing a definition
of parallelism, and we haven’t seen how this links to the covariant derivative
we introduced in Section 3.2. The link is the locally flat LIF. Although the
general idea of parallel transport, and in particular the definition I am about
to introduce, may seem obvious or intuitive, do remember that there is an
important element of arbitrariness in its actual definition.
Consider the coordinates representing the LIF at the point P . These are
cartesian coordinates describing a flat space (but not euclidean, remember,
since it does not have a euclidean metric). That means that the basis vectors
are constant – their derivatives are zero. A definition of parallelism now jumps
out at us: two nearby vectors are parallel if their components in the LIF are
the same . But this is the definition of parallelism that was implicit in the
differentiations we used in Sections 3.2.1 and 3.2.2, leading up to Eq. (3.18),
and so the covariant derivative we end up with is the same one: the tensor ∇ V
as defined in this section is the same as the covariant derivative of V in the LIF,
by our choice of parallelism; and the covariant derivative in the (flat) LIF is the
tensor ∇ V of Eq. (3.21).
Possibly rather surprisingly, we’re now finished: we’ve already done all of
the work required to define the covariant derivative in a curved space.
There are two further remarks remaining. Firstly, we can see that, in this cartesian frame, covariant differentiation is the same as ordinary differentiation, and so

V^i_{;j} = V^i_{,j} in the LIF.

But this is true for any tensor, and so, specifically,

g_{ij;k} = g_{ij,k} = 0 at P,

by Eq. (3.38). But this is a tensor equation, so it is true in any coordinate system, and since there is nothing special about the point P, it is true at all
points of the manifold:

g_{ij;k} = 0 in any coordinate system. (3.41)

Secondly, as mentioned at the end of Section 3.2.3, from Eq. (3.41) we can
deduce Eq. (3.36), since the conditions for that are still true in this more general
case.

Figure 3.6 Not a geodesic: tangent vectors (dashed) and parallel-transported vectors.

The discussion in this section used less algebra than you may have
expected for such a crucial part of the argument. Writing down the
details of the construction of this derivative would be notationally intricate
and take us a little too far afield. If you want details, there are more formal
discussions of this mechanism via the notion of a ‘pull-back map’ in Schutz
(1980, §6.3) or Carroll (2004, appendix A), and the covariant derivative is
introduced in an axiomatic way in both Schutz (1980) and Stewart (1991).
Also, the definition of parallelism via the LIF is not the only one possible, but
picks out a particular derivative and set of connection coefficients, called the
‘metric connection’. Only with this connection are Eq. (3.36) and Eq. (3.41)
true. See also Schutz’s discussion of geodesics on his pages 156–157, which
elaborates the idea of parallelism introduced here.
[Exercise 3.15]
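We can watch Eq. (3.40) in action numerically. The sketch below (my own illustration, in plain Python) integrates the parallel-transport equation dV^i/dt + Γ^i_{kj} V^k U^j = 0 around the unit circle in plane polar coordinates; since the plane is flat, the vector must come back unchanged, even though its polar components rotate en route.

```python
import math

# Parallel transport around the unit circle r = 1 in plane polar
# coordinates, using the Christoffel symbols of Eq. (3.16).
# With theta = t the tangent is U = (0, 1), and Eq. (3.40) reads
#   dV^r/dt     = -Gamma^r_{theta theta} V^theta =  V^theta   (at r = 1)
#   dV^theta/dt = -Gamma^theta_{r theta} V^r     = -V^r
def rhs(V):
    return [V[1], -V[0]]

def rk4_step(V, h):
    # One fourth-order Runge-Kutta step
    k1 = rhs(V)
    k2 = rhs([V[i] + h / 2 * k1[i] for i in range(2)])
    k3 = rhs([V[i] + h / 2 * k2[i] for i in range(2)])
    k4 = rhs([V[i] + h * k3[i] for i in range(2)])
    return [V[i] + h / 6 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i])
            for i in range(2)]

V = [1.0, 0.0]              # start with V = e_r at theta = 0
n = 1000
h = 2 * math.pi / n
for _ in range(n):
    V = rk4_step(V, h)

# The components rotate during the circuit (V stays equal to the fixed
# cartesian vector e_x), but after a full loop V returns to (1, 0)
print(V)
```

Compare this zero result with the sphere in Section 3.5, where transport around a closed loop does change the vector.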

3.4 Geodesics
Consider a curve λ(t) and its tangent vectors U (that is, the set of vectors U
is a field that is defined at least at all the points along the curve λ). If we
have another vector field V , then the vector ∇ U V tells us how much V changes
as we move along the curve λ to which U is the tangent. What happens if,
instead of the arbitrary vector field V , we take the covariant derivative of U
itself? In general, ∇ U U will not be zero – if the curve ‘turns a corner’, then the
tangent vector after the corner will no longer be parallel to the tangent before
the corner. The meaning of ‘parallel’ here is exactly the same as the meaning
of ‘parallel’ that was built in to the definition of the covariant derivative in the
passage after Eq. (3.40). Curves that do not do this – that is, curves such that
all the tangent vectors are parallel to each other – are the nearest thing to a
straight line in the space, and indeed are a straight line in a flat space. A curve
such as this is called a geodesic, more formally defined as follows:
∇_U U = 0 ⇔ (U is the tangent to a geodesic). (3.42)

Equation (3.42) has a certain spartan elegance, but if we are to do any calculations to discover what the path of the geodesic actually is, we need to unpack it.
The object ∇_{( · )} U is a (1,1) tensor, as you will recall, with its vector argument denoted by the ( · ). Since it is a tensor, it is linear in this argument. That is, for any vector A and scalar a, ∇_{aA} U = a ∇_A U, and specifically ∇_{A^k e_k} U = A^k ∇_{e_k} U ≡ A^k ∇_k U. The vector U has components

U = U^j e_j,

and so Eq. (3.42) can be written

U^j ∇_j U = U^j U^i_{;j} e_i = 0

(recalling Eq. (3.21)). The i-component of this equation is, using Eq. (3.18b),

U^j U^i_{;j} = U^j U^i_{,j} + U^j U^k Γ^i_{jk} = 0.
Let t be the parameter along the geodesic (that is, there is a parameterisation
of the geodesic, λ(t ), with parameter t , which U is the tangent to). Then, using
Eq. (3.5),
U^j = U(d̃x^j) = dx^j/dt

and

U^i_{,j} = ∂/∂x^j (dx^i/dt),

and we promptly find

d/dt (dx^i/dt) + Γ^i_{jk} (dx^j/dt)(dx^k/dt) = 0. (3.43)
dt dt dt dt
This is the geodesic equation. For each i it is a second-order differential equation with initial conditions comprising the initial position x^i_0 = x^i(t_P) (if the parameter t has value t_P at point P) and initial direction/speed U^i_0 = dx^i/dt|_{t_P}. The theory of differential equations tells us that this equation does have a unique solution.
A parameter t for which we can write down the geodesic equation Eq. (3.43) is termed an affine parameter, and if t is an affine parameter, it is easy to confirm that φ = at + b, where a and b are constants, is an affine parameter also.
An affine parameter is one that, in MTW’s words (1973, §1.5), is
‘defined so that motion looks simple’. You can reasonably measure time
in seconds since midnight, or minutes (seconds/ 60), or minutes since noon
(seconds/ 60 − 720). These are all affine transformations, and they share
the property that unaccelerated motion is a linear function of time. If you
were reckless enough to measure time in units of seconds-squared, then
unaccelerated (that is, simple) motion would look very complicated indeed.

Another way of saying this is that an affine parameter is the time coordinate of
some inertial system, and all that means is that an affine parameter is the time
shown on some free-falling ‘good’ clock; it also means that a ‘good’ time is an
affine transformation of another ‘good’ time. There are further remarks about
affine parameters in Section 3.4.1.

The connection (or rather the class of connections) we have defined here (see Section 3.3.2) is constructed in such a way as to preserve
parallelism. Such a connection is an affine connection – the word ‘affine’
comes from a Latin root meaning ‘neighbouring’. Other types of connection
are possible; see Schutz (1980, §6.14) for some challenging extra examples.
A geodesic is a curve of extremal length. In a space with a metric with the
signature of GR, it is a curve of maximal length; in a euclidean space it is
a curve of minimal length: for Euclid, a straight line is the shortest distance
between two points.

Note on metric connections (extremely optional): in other of these asides I have emphasised that this metric connection is not the only
asides I have emphasised that this metric connection is not the only
one definable. Since geodesics are defined in terms of the connection, it
does indeed follow that the geodesics implied by these other connections are
different from the geodesics of the metric connection, and specifically are not
the curves of extremal length. This is bound up with the property Eq. (3.41),
and the observation that only with the metric connection is the dot product
g( A , B ) invariant under parallel transport. This is one reason why the metric

connection is so important, to the point of being essentially ubiquitous in GR.


[Exercise 3.16]
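To see the geodesic equation doing its job, the sketch below (my own illustration, in plain Python) integrates Eq. (3.43) in plane polar coordinates, where the Christoffel symbols of Eq. (3.16) give r̈ − r θ̇² = 0 and θ̈ + (2/r) ṙ θ̇ = 0. The resulting curve, despite the curvilinear coordinates, should be a straight line of the euclidean plane.

```python
import math

# Geodesic equation (3.43) in plane polar coordinates:
#   r'' = r (theta')^2,   theta'' = -(2/r) r' theta'
# State vector s = [r, theta, rdot, thetadot]
def rhs(s):
    r, th, rdot, thdot = s
    return [rdot, thdot, r * thdot**2, -2.0 * rdot * thdot / r]

def rk4_step(s, h):
    # One fourth-order Runge-Kutta step
    def shift(a, b, c):
        return [ai + c * bi for ai, bi in zip(a, b)]
    k1 = rhs(s)
    k2 = rhs(shift(s, k1, h / 2))
    k3 = rhs(shift(s, k2, h / 2))
    k4 = rhs(shift(s, k3, h))
    return [s[i] + h / 6 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i])
            for i in range(4)]

# Start at (r, theta) = (1, 0), moving with unit speed in the
# y direction, so the geodesic should be the straight line x = 1
s = [1.0, 0.0, 0.0, 1.0]
n = 1000
h = 1.0 / n
for _ in range(n):
    s = rk4_step(s, h)

x = s[0] * math.cos(s[1])
y = s[0] * math.sin(s[1])
print(x, y)  # close to (1, 1): the point a parameter distance 1 along x = 1
```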

3.4.1 The Variational Principle and the Geodesic Equation


We can prove directly that the geodesic is a curve of extremal length, by
deriving the geodesic equation explicitly from a variational principle.
For a given curve through space-time, parameterised by λ, the length of the
curve is given by
l = ∫_curve ds = ∫_curve |g_{ij} dx^i dx^j|^{1/2} = ∫_{λ₀}^{λ₁} |g_{ij} ẋ^i ẋ^j|^{1/2} dλ ≡ ∫_{λ₀}^{λ₁} ṡ dλ,

where

ṡ = |g_{ij} ẋ^i ẋ^j|^{1/2}

expresses the relationship between parameter distance and proper distance, and
where dots indicate d/dλ. We wish to find a curve that is extremal, in the

sense that its length l is unchanged under first-order variations in the curve,
for fixed λ 0 and λ1. The calculus of variations (which as physicists you are
most likely to have met in the context of classical mechanics) tells us that such
an extremal curve xk (λ) is the solution of the Euler–Lagrange equations
d/dλ (∂ṡ/∂ẋ^k) − ∂ṡ/∂x^k = 0.

Have a go, yourself, at deriving the geodesic equation from this before reading
the rest of this section (at an appropriate point, you will need to restrict the
argument to parameterisations of s (λ) that are such that s̈ = 0.)
For ṡ as given in the above equation, we find fairly directly that

−(1/2)(s̈/ṡ²) 2g_{kj} ẋ^j + (1/2ṡ) d/dλ (2g_{kj} ẋ^j) − (1/2ṡ) g_{ij,k} ẋ^i ẋ^j = 0. (3.44)
To simplify this, we can choose at this point to restrict ourselves to parameteri-
sations of the curve that are such that ds/dλ is constant along the curve, so that
s̈ = 0; this λ is an affine parameter as described in the previous section. With
this choice, and multiplying overall by ṡ , we find
g_{kj,l} ẋ^j ẋ^l + g_{kj} ẍ^j − (1/2) g_{ij,k} ẋ^i ẋ^j = 0
which, after relabelling and contracting with g^{kl}, and comparing with Eq. (3.36), reduces to

ẍ^k + Γ^k_{ij} ẋ^i ẋ^j = 0, (3.45)
the geodesic equation of Eq. (3.43).
As well as showing the direct connection between the geodesic equation and
this deep variational principle, and thus making clear the idea that a geodesic
is an extremal distance, this also confirms the significance of affine parameters
that we touched on in Section 3.4. There is a ‘geodesic equation’ for non-affine
parameters (namely Eq. (3.44)), but only when we choose an affine parameter λ
does this equation take the relatively simple form of Eq. (3.43) or Eq. (3.45).
The general solution of Eq. (3.44) is the same path as the geodesic, but because
of the non-affine parameterisation it is not the same curve , and is not, formally,
a geodesic. As we have discussed before, the affine parameter of the geodesic
is chosen so that motion looks simple.
Schutz discusses this at the very end of his §6.4, and the exercises
corresponding to it.
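The variation can also be delegated to a computer algebra system. The sketch below is my own illustration, assuming SymPy is available; rather than varying ṡ itself, it uses the standard trick of extremising the quadratic Lagrangian g_{ij} ẋ^i ẋ^j, which for an affine parameter yields the same equations. It recovers Eq. (3.45) for the polar-coordinate metric:

```python
import sympy as sp
from sympy.calculus.euler import euler_equations

lam = sp.Symbol('lambda')
r = sp.Function('r')(lam)
th = sp.Function('theta')(lam)

# Quadratic Lagrangian g_ij xdot^i xdot^j for the polar metric
# diag(1, r^2); with an affine parameter, extremising this is
# equivalent to extremising sdot
L = r.diff(lam)**2 + r**2 * th.diff(lam)**2

eqs = euler_equations(L, [r, th], lam)

# Solve for the second derivatives and compare with Eq. (3.45),
# which here reads r'' = r theta'^2 and theta'' = -(2/r) r' theta'
sol = sp.solve(eqs, [r.diff(lam, 2), th.diff(lam, 2)])
print(sp.simplify(sol[r.diff(lam, 2)]))
print(sp.simplify(sol[th.diff(lam, 2)]))
```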

3.5 Curvature
We now come, finally, to the coordinate-independent description of curvature. We approach it through the idea of parallel transport, as described in

Figure 3.7 Dragging a vector across a sphere.

Section 3.3.2, and specifically through the idea of transporting a vector round
a closed path. This section follows Schutz §6.5; MTW (1973) chapter 11 is
illuminating on this.
First, it’s important to have a clear picture of the way in which a vector will
change as it is moved around a curved surface. In Figure 3.7 we see a view of
a sphere and a vector that starts off on the equator, pointing ‘north’ (at 1). If
we parallel transport this along the line of longitude until we get to the North
Pole (2), then transport it south along a different line of longitude until we
get back to the equator (3), then back along the equator to the point we started
at (4), then we discover that the vector at the end does not end up parallel to the
vector as it started. The vector on its round trip picks up information about the
curvature of the surface; crucially this information is intrinsic to the surface,
and does not depend on looking at the sphere from ‘outside’. We now attempt
to quantify this change, as a measure of the curvature of the surface.
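This change can be computed numerically before we quantify it analytically. The sketch below is my own illustration in plain Python; the sphere's Christoffel symbols quoted in the comments follow from Eq. (3.36) applied to the unit-sphere metric ds² = dθ² + sin²θ dφ². It transports a vector around the closed circle of colatitude θ₀ = π/3 and finds it rotated, on its return, by 2π cos θ₀ = π.

```python
import math

# Unit sphere, ds^2 = dtheta^2 + sin^2(theta) dphi^2; the nonzero
# Christoffel symbols, from Eq. (3.36), are
#   Gamma^theta_{phi phi} = -sin(theta) cos(theta)
#   Gamma^phi_{theta phi} = Gamma^phi_{phi theta} = cot(theta)

TH0 = math.pi / 3     # transport around the circle of colatitude 60 degrees

# Along that circle the tangent is U = (0, 1), so Eq. (3.40) reads
#   dV^theta/dphi =  sin(th0) cos(th0) V^phi
#   dV^phi/dphi   = -cot(th0) V^theta
def rhs(V):
    return [math.sin(TH0) * math.cos(TH0) * V[1],
            -V[0] / math.tan(TH0)]

def rk4_step(V, h):
    # One fourth-order Runge-Kutta step
    k1 = rhs(V)
    k2 = rhs([V[i] + h / 2 * k1[i] for i in range(2)])
    k3 = rhs([V[i] + h / 2 * k2[i] for i in range(2)])
    k4 = rhs([V[i] + h * k3[i] for i in range(2)])
    return [V[i] + h / 6 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i])
            for i in range(2)]

V = [1.0, 0.0]        # unit vector pointing 'south' at phi = 0
n = 2000
h = 2 * math.pi / n
for _ in range(n):
    V = rk4_step(V, h)

# The vector returns rotated by 2 pi cos(theta0) = pi: it has flipped
print(V)  # approximately [-1, 0]
```

The vector has picked up a rotation determined purely by the intrinsic geometry of the loop, which is exactly the effect the coming calculation quantifies.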

3.5.1 The Riemann Tensor


Consider the path following lines of constant coordinate, in an arbitrary coordinate system. Figure 3.8 shows a loop in the plane of two coordinates x^σ and x^λ. The line joining A and B, and the line from D to C, have coordinate x^σ varying along a line of constant x^λ, and lines B–C and A–D have x^λ varying along a line of constant x^σ. Now take a vector V at A, and parallel transport it to B, C, D, and back to A: how much has the vector changed during its circuit?
Parallel transporting the vector from A to B involves transporting V along the vector field e_σ. From Eq. (3.40), this means that ∇_σ V = 0, or V^i_{;σ} = 0. That is (from Eq. (3.18b)),

∂V^i/∂x^σ = V^i_{,σ} = −Γ^i_{kσ} V^k. (3.46)

Now, the components of the vector at B are

Figure 3.8 Taking a vector for a walk. (The loop ABCD is bounded by the lines
x^σ = a, x^σ = a + δa, x^λ = b, and x^λ = b + δb; the vector V sits at A.)

    V^i(B) = V^i(A) + ∫_A^B (∂V^i/∂x^σ) dx^σ
           = V^i(A) − ∫_A^B Γ^i_{kσ} V^k dx^σ
           = V^i(A) − ∫_{x^σ=a}^{a+δa} Γ^i_{kσ} V^k |_{x^λ=b} dx^σ,

where the integrand is evaluated along the line {x^λ = b} from x^σ = a to
x^σ = a + δa. Doing the same thing for the other sides of the curve, we find:
    δV^i = V^i(A_final) − V^i(A_init)
         = − ∫_{x^σ=a}^{a+δa} Γ^i_{jσ} V^j |_{x^λ=b} dx^σ
           − ∫_{x^λ=b}^{b+δb} Γ^i_{jλ} V^j |_{x^σ=a+δa} dx^λ
           + ∫_{x^σ=a}^{a+δa} Γ^i_{jσ} V^j |_{x^λ=b+δb} dx^σ
           + ∫_{x^λ=b}^{b+δb} Γ^i_{jλ} V^j |_{x^σ=a} dx^λ.                  (3.47)

At this point we can take advantage of the fact that δa and δb are small by
construction, ignore terms in δa² and δb², and thus take the integrands to
be constant along the interval of integration (by expanding the integrand in
a Taylor series, convince yourself that ∫_a^{a+δa} f(x) dx = δa f(a) + O(δa²)).

We don't know what Γ^i_{jλ} V^j |_{x^σ=a+δa} and Γ^i_{jσ} V^j |_{x^λ=b+δb} are (of course, since
we are doing this calculation for perfectly general Γ), but since δa is small, we
can estimate them using Taylor's theorem, finding

    Γ^i_{jλ} V^j |_{x^σ=a+δa} = Γ^i_{jλ} V^j |_{x^σ=a}
                              + δa ∂/∂x^σ (Γ^i_{jλ} V^j) |_{x^σ=a} + O(δa²)

(the ∂/∂x^σ is a derivative with respect to a single coordinate, which is why
the σ index is correctly unmatched). Inserting this, and the similar expression
involving δb, into Eq. (3.47), and ignoring terms of O(δa², δb²), we have

    δV^i ≈ + ∫_{x^σ=a}^{a+δa} δb ∂/∂x^λ (Γ^i_{jσ} V^j) |_{x^σ=a, x^λ=b} dx^σ
           − ∫_{x^λ=b}^{b+δb} δa ∂/∂x^σ (Γ^i_{jλ} V^j) |_{x^σ=a, x^λ=b} dx^λ.

However, the integrands here are now constant with respect to the variable of
integration, so the integrals are easy:

    δV^i ≈ δa δb [ ∂/∂x^λ (Γ^i_{jσ} V^j) − ∂/∂x^σ (Γ^i_{jλ} V^j) ],

with all quantities evaluated at the point A. If we now use Eq. (3.46) to get
rid of the differentials of V^j, we find, to first order,

    δV^i = δx^σ δx^λ [ Γ^i_{jσ,λ} − Γ^i_{jλ,σ} − Γ^i_{kσ} Γ^k_{jλ} + Γ^i_{kλ} Γ^k_{jσ} ] V^j,   (3.48)

where we have written δa and δb as δx^σ and δx^λ respectively.


Let us examine this result. The left-hand side is the i component of a
vector δV (we know this is a vector since it is the difference of two vectors
located at the same point A; recall the vector-space axioms); we obtain that
component δV^i by acting on the vector δV with the basis one-form ω̧^i. The
right-hand side clearly depends on the vector V (also at the point A), whose
components are V^j. The construction in Figure 3.8, which crucially has the
area enclosed by constant-coordinate lines, depends on multiples of the basis
vectors, δa e_σ and δb e_λ. We can see that the number δV^i depends linearly
on each of these four objects – one one-form and three vectors. This leads
us to identify the numbers within the square brackets of Eq. (3.48) as the
components of a (1,3) tensor

    R^i_{jkl} = Γ^i_{jl,k} − Γ^i_{jk,l} + Γ^i_{σk} Γ^σ_{jl} − Γ^i_{σl} Γ^σ_{jk}     (3.49)

(after some relabelling) called the Riemann curvature tensor (this notation,
and in particular the overall sign, is consistent with Schutz; numerous other

conventions exist – see the discussion in Section 1.4.3). Thus Eq. (3.48)
becomes

    δV^i = R^i_{jσλ} V^j δx^σ δx^λ = R(ω̧^i, V, δx^σ e_σ, δx^λ e_λ).        (3.50)

This tensor tells us how the vector V varies after it is parallel transported on
an arbitrary excursion in the area local to point A (‘local’, here, indicating
small δa and δb); that is, it encodes all the information about the local shape
of the manifold.
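The small-loop statement Eq. (3.50) can be checked directly by brute force. The sketch below (my own construction; the function names are invented) transports a vector numerically around a tiny coordinate rectangle on the unit sphere and compares the resulting δV^i with the Riemann-tensor prediction. Since the overall sign of δV depends on the loop-orientation convention, only magnitudes are compared:

```python
import math

# Unit-sphere Christoffel symbols (Exercise 3.16): Γ^θ_{φφ} = -sinθ cosθ,
# Γ^φ_{θφ} = Γ^φ_{φθ} = cosθ/sinθ; all others zero.
def rate(V, theta, coord):
    """Transport equation V^i_{,c} = -Γ^i_{kc} V^k along coordinate c (0 = θ, 1 = φ)."""
    Vth, Vph = V
    if coord == 0:
        return (0.0, -math.cos(theta) / math.sin(theta) * Vph)
    return (math.sin(theta) * math.cos(theta) * Vph,
            -math.cos(theta) / math.sin(theta) * Vth)

def transport_leg(V, theta_start, coord, delta, steps=200):
    """RK4-transport along one constant-coordinate leg of the loop."""
    h = delta / steps
    th = theta_start
    dth = h if coord == 0 else 0.0
    for _ in range(steps):
        k1 = rate(V, th, coord)
        k2 = rate((V[0] + h/2*k1[0], V[1] + h/2*k1[1]), th + dth/2, coord)
        k3 = rate((V[0] + h/2*k2[0], V[1] + h/2*k2[1]), th + dth/2, coord)
        k4 = rate((V[0] + h*k3[0], V[1] + h*k3[1]), th + dth, coord)
        V = (V[0] + h/6*(k1[0] + 2*k2[0] + 2*k3[0] + k4[0]),
             V[1] + h/6*(k1[1] + 2*k2[1] + 2*k3[1] + k4[1]))
        th += dth
    return V

th0, d = 1.0, 1e-3
V0 = (1.0, 0.7)
V1 = transport_leg(V0, th0,     0,  d)   # A→B: θ from θ0 to θ0 + δ
V2 = transport_leg(V1, th0 + d, 1,  d)   # B→C: φ-leg at θ0 + δ
V3 = transport_leg(V2, th0 + d, 0, -d)   # C→D: θ back down
V4 = transport_leg(V3, th0,     1, -d)   # D→A: φ back
dV = (V4[0] - V0[0], V4[1] - V0[1])

# Eq. (3.50) prediction, using the unit-sphere components R^θ_{φθφ} = sin²θ
# and R^φ_{θθφ} = -1 (see Exercise 3.19):
pred_th = math.sin(th0)**2 * abs(V0[1]) * d * d
pred_ph = abs(V0[0]) * d * d
```

The change after the circuit is second order in the loop size, exactly as the δx^σ δx^λ factor in Eq. (3.50) requires.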
Another way to see the significance of the Riemann tensor is to consider
the effect of taking the covariant derivative of a vector with respect to first one
then another of the coordinates, ∇_i ∇_j V. Defining the commutator

    [∇_i, ∇_j] V^k ≡ ∇_i ∇_j V^k − ∇_j ∇_i V^k,                           (3.51)

it turns out that

    [∇_i, ∇_j] V^k = R^k_{lij} V^l                                         (3.52)

(see the ‘dangerous-bend’ paragraph in this section for details). This, or
something like it, might not be a surprise. We discovered the Riemann
tensor by taking a vector for a walk round the circuit ABCDA in Figure 3.8
and working out how it changed as a result. The commutator Eq. (3.51) is
effectively the result of taking a vector from A to C via B and via D, and asking
how the two resulting vectors are different.
The Riemann tensor has a number of symmetries. In a locally inertial frame,

    R^i_{jkl} = ½ g^{im} (g_{ml,jk} − g_{mk,jl} + g_{jk,ml} − g_{jl,mk}),   (3.53)

and so

    R_{ijkl} ≡ g_{im} R^m_{jkl} = ½ (g_{il,jk} − g_{ik,jl} + g_{jk,il} − g_{jl,ik}).   (3.54)

Note that this is not a tensor equation, even in these coordinates: in such inertial
coordinates V^i_{,j} = V^i_{;j}, and so an expression involving single partial derivatives
in inertial coordinates can be trivially rewritten as a (covariant) tensor equation
by simply rewriting the commas as semicolons; however the same is not true
of second derivatives, so that Eq. (3.54) does not trivially correspond to a
covariant expression.³

³ To see this, consider differentiating V twice, and note that V^k_{;ln} includes a term in Γ^k_{ml,n}. This
term is non-zero (in general) even in locally flat coordinates, which means that V^k_{;ln} does not
reduce to V^k_{,ln} in the LIF. That means, in turn, that Eq. (3.53) cannot be taken to be the LIF
version of a covariant equation, and thus that we cannot obtain a covariant expression for the
Riemann tensor by swapping these commas with semicolons. Compare also Eq. (3.38c), and the
argument leading up to Eq. (3.36).

By simply permuting indexes in Eq. (3.54), you can see that

    R_{ijkl} = −R_{jikl} = −R_{ijlk} = R_{klij}                            (3.55a)
    R_{ijkl} + R_{iljk} + R_{iklj} = 0.                                    (3.55b)

These, finally, are tensor equations, so that (as usual) although we worked them
out in a particular coordinate system, they are true in all coordinate systems,
and tell us about the symmetry properties of the underlying geometrical object.
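The definition Eq. (3.49) and the symmetries Eq. (3.55) can both be checked mechanically. The sketch below (mine; all names invented) builds R^i_{jkl} directly from Eq. (3.49) for the unit sphere, taking the Γ-derivatives by central differences, and then lowers the first index to test the symmetries numerically:

```python
import math

# Christoffel symbols of the unit sphere (Exercise 3.16), indices 0 = θ, 1 = φ
def Gamma(theta):
    cot = math.cos(theta) / math.sin(theta)
    return [[[0.0, 0.0], [0.0, -math.sin(theta) * math.cos(theta)]],
            [[0.0, cot], [cot, 0.0]]]

def riemann(theta, h=1e-6):
    """R^i_{jkl} from Eq. (3.49); Γ depends only on θ here, so all
    φ-derivatives vanish and the θ-derivative is a central difference."""
    G = Gamma(theta)
    Gp, Gm = Gamma(theta + h), Gamma(theta - h)
    dG = [[[[(Gp[i][j][k] - Gm[i][j][k]) / (2 * h), 0.0]
            for k in range(2)] for j in range(2)] for i in range(2)]
    def R(i, j, k, l):
        val = dG[i][j][l][k] - dG[i][j][k][l]
        for s in range(2):
            val += G[i][s][k] * G[s][j][l] - G[i][s][l] * G[s][j][k]
        return val
    return R

th0 = 1.0
R = riemann(th0)
g = [1.0, math.sin(th0)**2]                    # diagonal metric components
Rd = lambda i, j, k, l: g[i] * R(i, j, k, l)   # lowered first index, R_{ijkl}
```

For the unit sphere the single independent component is R^θ_{φθφ} = sin²θ, and the antisymmetries of Eq. (3.55a) and the cyclic identity Eq. (3.55b) hold to finite-difference accuracy.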

Notice that the definition of the Riemann tensor in Eq. (3.52) does not
involve the metric; this is not a coincidence. The development of the
Riemann tensor up to Eq. (3.50) is (I think) usefully explicit, but also rather
laborious, and manifestly involves the Christoffel symbols, which you are
possibly used to thinking of as being very closely related to the metric. That's
not false, but another way of getting to the Riemann tensor is to define the
covariant derivative ∇_U V as an almost immediate consequence of the definition
of parallel transport, and then notice that the operator

    [∇_U, ∇_V] − ∇_{[U,V]} = R( · , · , U, V)

maps vectors to vectors, defining the (1,3) Riemann tensor R (this is not a
derivation; see Schutz (1980, §6.8)). The point here is that R is not just a
function of the metric, but is a completely separate tensor, even though, in
the case of a metric connection, it is calculable in terms of the metric via
expressions such as Eq. (3.53). This is why it is not a contradiction (though
I agree it is surprising at first thought) that the Riemann tensor starts off with
considerably more degrees of freedom than the metric tensor. There are some
further remarks about the number of degrees of freedom in the Riemann tensor
in Section 4.2.5.
[Exercises 3.17–3.19]

3.5.2 Geodesic Deviation


In Section 1.2.4, we briefly imagined two objects in free fall near the earth
(Figure 1.5), and noted that the distance between them would decrease as they
both moved towards the centre of the earth. We are now able to state that these
free-falling objects are following geodesics in the spacetime surrounding the
earth, which is curved as a result of the earth’s mass (though we cannot say
much more than this without doing the calculation, a piece of physics we
learn about in the next chapter). We see, then, that the effect of the
curvature of space-time is to cause the distance between these two geodesics
to decrease; this is known as geodesic deviation , and we are now in a position
to see how it relates to curvature.

Figure 3.9 Geodesics on earth.

Figure 3.10 Geodesic deviation. (Curves λ(t) and λ(t + δt), with tangent
vectors X, are crossed by curves µ(s) and µ(s + δs); the connecting vectors ξ
join points of equal parameter t.)

Schutz covers this at the end of his section 6.5, using a different argument
from the following. I plan to describe it in a different style here, partly because
a more geometrically minded explanation makes a possibly welcome change
from relentless components.
First, some useful formulae. (i) Marginally rewriting Eq. (3.52), we find

    [∇_X, ∇_Y] V = X^j Y^l [∇_j, ∇_l] V^k e_k = R^k_{ijl} V^i X^j Y^l e_k.   (3.56)

(ii) Using the commutator [A, B] ≡ AB − BA, we find

    ∇_A B − ∇_B A = [A, B],                                                 (3.57)

which is proved in Exercise 3.20.


Consider two sets of curves, λ(t) corresponding to a field of tangent
vectors X , and µ(s ) with tangent vectors ξ , and suppose that, in some region of
the manifold, they cross each other (see Figure 3.10). Choose the curves and
their parameterisation such that each of the λ curves is a curve of constant s
and each of the µ curves is a curve of constant t . Thus, specifically, the ξ
vector – known as the connecting vector – joins points on two λ curves that
have the same parameter t . What we have actually described, here, is (part
of) a set of coordinate functions; you will see that the curves λ and µ have
exactly the properties that the conventionally written coordinate functions x^i

have. Because of this construction, it does not matter in which order we take
the derivatives d/dt and d/ds, so that

    (d/dt)(d/ds) = (d/ds)(d/dt)   ⇔   [d/dt, d/ds] = 0,

or, since X = d/dt and ξ = d/ds,

    [X, ξ] = 0.

Thus, referring to Eq. (3.57),

    ∇_X ξ = ∇_ξ X.                                                          (3.58)

Now suppose particularly that the curves λ(t) are geodesics, which means
that ∇ X X = 0. Then the vector ξ joins points on the two geodesics that have
the same affine parameter.
That means that the second derivative of ξ carries information about how
quickly the two geodesics are accelerating apart (note that this is ‘acceleration’
in the sense of ‘second derivative of position coordinate’, and not anything
that would be measured by an accelerometer – observers on the two geodesics
would of course experience zero physical acceleration). Using Eq. (3.58) the
calculation is easy. The second derivative is

    ∇_X ∇_X ξ = ∇_X ∇_ξ X = ∇_ξ ∇_X X + R^k_{ijl} X^i X^j ξ^l e_k,         (3.59)

where the first equality comes from Eq. (3.58) and the second from Eq. (3.56).
The first term on the right-hand side disappears since ∇ X X = 0 along a
geodesic. Now, the covariant derivative with respect to the vector X is just
the derivative with respect to the geodesic’s parameter t (since λ is part of a
coordinate system; see Section 3.2.2), so that this equation turns into

    (d²ξ/dt²)^k = R^k_{ijl} X^i X^j ξ^l.                                    (3.60)

Thus the amount by which two geodesics diverge depends on the curvature
of the space they are passing through. Note that the left-hand side here is the
k -component of the second derivative of the vector ξ , and is a conventional
shortcut for ∇_X ∇_X ξ; it is not the second derivative of the ξ^k component, d²ξ^k/dt²,
though some books (e.g., Stewart (1991, §1.9)) rather confusingly write it this
way. [Exercises 3.20–3.22]
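Eq. (3.60) can be seen in action numerically. The sketch below (my own; the names are invented) integrates the sphere geodesic equations of Exercise 3.16 for two great circles leaving the same equatorial point with initial directions differing by a small angle δα, and checks that their separation grows as δα sin t, the solution of the deviation equation s̈ = −s on a unit sphere. The embedding into R³ is used only as a convenient way of measuring the separation; the deviation itself is intrinsic:

```python
import math

# Sphere geodesic equations from Exercise 3.16:
#   θ¨ = sinθ cosθ φ˙²,   φ¨ = -2 (cosθ/sinθ) θ˙ φ˙
def rhs(s):
    th, ph, thd, phd = s
    return (thd, phd,
            math.sin(th) * math.cos(th) * phd**2,
            -2.0 * math.cos(th) / math.sin(th) * thd * phd)

def integrate(state, t_end, steps=4000):
    """Plain RK4 integration of the state (θ, φ, θ˙, φ˙)."""
    h = t_end / steps
    s = list(state)
    for _ in range(steps):
        k1 = rhs(s)
        k2 = rhs([s[i] + h/2 * k1[i] for i in range(4)])
        k3 = rhs([s[i] + h/2 * k2[i] for i in range(4)])
        k4 = rhs([s[i] + h * k3[i] for i in range(4)])
        s = [s[i] + h/6 * (k1[i] + 2*k2[i] + 2*k3[i] + k4[i]) for i in range(4)]
    return s

def embed(th, ph):
    """Embed a point in R³, purely to measure the separation conveniently."""
    return (math.sin(th)*math.cos(ph), math.sin(th)*math.sin(ph), math.cos(th))

# Two unit-speed geodesics from the same equatorial point; the second has its
# initial direction tilted by δα (θ˙ = cos δα, φ˙ = sin δα at θ = π/2).
dalpha, t = 1e-3, 1.0
a = integrate([math.pi/2, 0.0, 1.0, 0.0], t)
b = integrate([math.pi/2, 0.0, math.cos(dalpha), math.sin(dalpha)], t)
pa, pb = embed(a[0], a[1]), embed(b[0], b[1])
sep = math.acos(sum(pa[i] * pb[i] for i in range(3)))
```

The separation at parameter t agrees with the geodesic-deviation prediction δα sin t to first order in δα.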

Exercises
Most of the exercises in Schutz’s §6.8 should be accessible.

Exercise 3.1 (§3.1.3) By considering the contraction of the gradient with
a vector a d/dt + b d/ds, show that the gradient one-form defined by Eq. (3.5) is
a linear function of its argument, and therefore a valid one-form.

Exercise 3.2 (§3.1.3) Given a potential field φ in a 2-d space, show
immediately that the gradient of the field, in polar coordinates (r, θ), is

    ḑφ = (∂φ/∂r) ḑr + (∂φ/∂θ) ḑθ

(once you've done that, expand ḑφ(V), where V = V^r e_r + V^θ e_θ, and persuade
yourself that you expected the result).
The metric in polar coordinates is g_{ij} = diag(1, r²), and correspondingly
g^{ij} = diag(1, 1/r²). Thus calculate the vector gradient of the field φ, namely
the vector dφ.
This may not look quite as you expected, but recall that e_r and e_θ here form
a coordinate basis, and so are slightly different from the ‘usual’ basis of polar
coordinates r̂ = e_r and θ̂ = (1/r) e_θ. Immediately calculate dφ in this basis,
and confirm that this looks more as you might expect.

Exercise 3.3 (§3.1.4) In the {x, y} cartesian coordinate system (with basis
vectors ∂/∂x and ∂/∂y), the metric is simply diag(1, 1). Consider a new
coordinate system {u, v} (with basis ∂/∂u and ∂/∂v), defined by

    u = ½(x² − y²),   v = xy.                                              (i)

You might also want to look back at the ‘dangerous bend’ paragraphs below
Eq. (2.23).
(a) Write x¹ = x, x² = y, x^1̄ = u, x^2̄ = v, and thus, referring to
Eq. (3.9), calculate the matrices Λ^ī_j and Λ^i_j̄. (The easiest way of doing the latter
calculation is to calculate ∂u/∂x, ∂u/∂y, . . . , and solve for ∂x/∂u, ∂x/∂v, . . . ,
ending up with expressions in terms of x, y, and r² = x² + y².)
(b) From Eq. (2.24),

    g_{ī j̄} = Λ^i_ī Λ^j_j̄ g_{ij}.

Thus calculate the components g_{ī j̄} of the metric in terms of the coordi-
nates {u, v} (you can end up with expressions in terms of u and v, via
4(u² + v²) = r⁴).

(c) A one-form has cartesian components (A_x, A_y) and components (A_u, A_v)
in the new coordinate system. Show that

    A_u = (x A_x − y A_y)/(x² + y²),

and derive the corresponding expression for A_v. [d+ u+]
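As a numerical sanity check on part (c) — my own sketch, with invented names and sample values — the inverse Jacobian of the map (x, y) → (u, v) gives the required ∂x/∂u, ∂y/∂u, and the invariance of the contraction A_i dx^i confirms the transformation is consistent:

```python
# Coordinates u = (x² - y²)/2, v = xy at a sample point, with a sample one-form
x, y = 1.2, 0.7
Ax, Ay = 0.3, -0.5

# Jacobian ∂(u,v)/∂(x,y) = [[x, -y], [y, x]]; inverting it supplies
# ∂x/∂u, ∂y/∂u, ∂x/∂v, ∂y/∂v without solving for x(u,v) explicitly.
det = x*x + y*y
dxdu, dxdv = x/det, y/det
dydu, dydv = -y/det, x/det

# One-form components transform with Λ^j_ī = ∂x^j/∂x^ī
A_u = dxdu*Ax + dydu*Ay          # = (x A_x - y A_y)/(x² + y²), as in the exercise
A_v = dxdv*Ax + dydv*Ay

# Invariance check: the scalar A_i dx^i must be the same in both systems
dx, dy = 1e-4, -2e-4
du = x*dx - y*dy
dv = y*dx + x*dy
```

The contraction A_u du + A_v dv reproduces A_x dx + A_y dy exactly, which is the coordinate-independence that the one-form transformation law is designed to guarantee.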

Exercise 3.4 (§3.1.4) (a) Write down the expressions for cartesian coordi-
nates {x, y} as functions of polar coordinates {r, θ}, thus calculate ∂x/∂r, ∂x/∂θ,
∂y/∂r and ∂y/∂θ, and thus find the components of the transformation matrix
from cartesian to polar coordinates, Eq. (3.9b).
(b) The inverse transformation is

    r² = x² + y²,   θ = arctan(y/x).
Differentiate these, and thus obtain the inverse transformation matrix
Eq. (3.9a). Verify that the product of these two matrices is indeed the identity
matrix. Compare Section 2.3.2.
(c) Let V be a vector with cartesian coordinates { x , y } , so that

V = xe x + ye y .

Show that V̇ and V̈ have components {ẋ, ẏ} and {ẍ, ÿ} in this basis.
(d) Using the relations x = r cos θ and y = r sin θ , write down expressions
for ẋ , ẏ , ẍ and ÿ in terms of polar coordinates r and θ and their time derivatives.
(e) Now use the general transformation law Eq. (3.9a),

    V^ī = Λ^ī_j V^j = (∂x^ī/∂x^j) V^j,

to transform the components of the vectors V̇ and V̈, which you obtained in (c),
into the polar basis {e_r, e_θ}, and show that

    V̇ = ṙ e_r + θ̇ e_θ
    V̈ = (r̈ − r θ̇²) e_r + (θ̈ + (2/r) ṙ θ̇) e_θ.

[u ++ ]
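The claimed polar components of V̈ in part (e) can be spot-checked numerically. The sketch below is mine (sample trajectory and names are invented): it picks an explicit r(t), θ(t), computes ẍ and ÿ by central differences, transforms them to the polar basis, and compares with r̈ − rθ̇² and θ̈ + (2/r)ṙθ̇:

```python
import math

# A sample trajectory: r(t) = 1 + 0.3t², θ(t) = 0.5t
def r(t):  return 1 + 0.3 * t * t
def th(t): return 0.5 * t
def x(t):  return r(t) * math.cos(th(t))
def y(t):  return r(t) * math.sin(th(t))

t0, h = 0.4, 1e-4
xdd = (x(t0 + h) - 2*x(t0) + x(t0 - h)) / h**2   # ẍ by central differences
ydd = (y(t0 + h) - 2*y(t0) + y(t0 - h)) / h**2   # ÿ likewise

# Transform the cartesian components of V̈ to the polar coordinate basis:
# ∂r/∂x = cosθ, ∂r/∂y = sinθ, ∂θ/∂x = -sinθ/r, ∂θ/∂y = cosθ/r
c, s, r0 = math.cos(th(t0)), math.sin(th(t0)), r(t0)
Vdd_r = c * xdd + s * ydd
Vdd_th = (-s * xdd + c * ydd) / r0

# The claimed closed forms, from the derivatives of the chosen trajectory
rdot, rddot = 0.6 * t0, 0.6
thdot, thddot = 0.5, 0.0
```

The familiar centripetal (−rθ̇²) and Coriolis-like ((2/r)ṙθ̇) terms drop out of the transformation automatically.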

Exercise 3.5 (§ 3.1.4) Define a scalar field, φ, by

φ (x , y ) = x 2 + y 2 + 2xy ,

for cartesian coordinates { x , y } .



(a) From Eq. (3.5), the ith component of the gradient one-form ḑφ is
obtained by taking the contraction of the gradient with the basis vector
e_i = ∂/∂x^i. Thus write down the components of the gradient one-form with
respect to the cartesian basis.
(b) The result of Exercise 2.10 says that the transformation law for the
components of a one-form is

    A_ī = Λ^j_ī A_j = (∂x^j/∂x^ī) A_j.

Thus determine the components of ḑφ in polar coordinates {r, θ}.


(c) By expressing φ in terms of r and θ , obtain directly the polar components
of ḑφ and verify that they agree with those obtained in (b).
(d) Write down the components of the metric tensor in cartesian coordi-
nates, g_{xx}, g_{xy}, g_{yx}, g_{yy}, and by examining Eq. (2.15), write down the
components of the metric tensor with raised indexes, g^{xx}, g^{xy}, g^{yx}, g^{yy}. Hence
determine the cartesian components of the vector gradient dφ (i.e., with raised
index).
(e) Recall the metric for polar coordinates, and thus the components g_{rr},
g_{rθ}, g_{θr}, and g_{θθ}. Hence determine the polar components of dφ. Comment on
the answers to parts (d) and (e). [d+ u+]
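The polar components asked for in parts (b)–(e) can be verified numerically for this particular φ. The sketch below (mine; names and sample point invented) evaluates ∂φ/∂r and ∂φ/∂θ by central differences and compares them with the closed forms that follow from φ = r²(1 + sin 2θ), then raises the index with g^{ij} = diag(1, 1/r²):

```python
import math

def phi(x, y):
    """The scalar field of Exercise 3.5."""
    return x*x + y*y + 2*x*y

def phi_polar(r, th):
    return phi(r * math.cos(th), r * math.sin(th))

r0, th0, h = 1.3, 0.6, 1e-6
dphi_dr = (phi_polar(r0 + h, th0) - phi_polar(r0 - h, th0)) / (2*h)
dphi_dth = (phi_polar(r0, th0 + h) - phi_polar(r0, th0 - h)) / (2*h)

# In polar form φ = r²(1 + sin 2θ), so the one-form components should be
# ∂φ/∂r = 2r(1 + sin 2θ) and ∂φ/∂θ = 2r² cos 2θ.
# Raising with g^{ij} = diag(1, 1/r²) gives the vector gradient dφ:
grad_r = dphi_dr
grad_th = dphi_dth / r0**2
```

Note how the r² in g^{θθ} = 1/r² makes the raised θ-component differ from the lowered one, which is the point of part (e).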

Exercise 3.6 (§3.2.1) In Eq. (3.16) we see, for example, two lowered θs
on the left-hand side with no θ on the right-hand side. Why isn't this an
Einstein summation convention error?

Exercise 3.7 (§3.2.2) Consider a vector field V with cartesian components

    {V^x, V^y} = {x² + 3y, y² + 3x}.

(a) Using the transformation law for a (1,0) tensor, and the result of Exer-
cise 3.4, determine {V^r, V^θ}, the components of the same vector field V with
respect to the polar basis {e_r, e_θ}.
(b) Write down the components of the covariant derivative V^i_{;j} in cartesian
coordinates.
(c) Using the fact that V^i_{;j} transforms as a (1,1) tensor, compute the compo-
nents of the covariant derivative with respect to the polar coordinate basis by
transforming the V^i_{;j} obtained in part (b).
(d) Now, taking a different tack, compute the polar components of the co-
variant derivative of V , by differentiating the polar coordinates obtained in (a).
That is, use Eq. (3.18b) and the Christoffel symbols for polar coordinates,
Eq. (3.15).
(e) Verify that the polar components obtained in (c) and (d) are the same.
[u+ ]
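The agreement demanded in part (e) can be confirmed numerically. In the sketch below (mine; the route labels follow parts (c) and (d), and all names are invented), route 1 transforms the cartesian V^i_{,j} as a (1,1) tensor, while route 2 differentiates the polar components directly and adds the Christoffel terms of Eq. (3.15):

```python
import math

# The vector field of Exercise 3.7, cartesian components
def Vcart(x, y):
    return (x*x + 3*y, y*y + 3*x)

def Vpol(r, th):
    """Polar components: V^r = cosθ V^x + sinθ V^y, V^θ = (-sinθ V^x + cosθ V^y)/r."""
    x, y = r * math.cos(th), r * math.sin(th)
    Vx, Vy = Vcart(x, y)
    return (math.cos(th)*Vx + math.sin(th)*Vy,
            (-math.sin(th)*Vx + math.cos(th)*Vy) / r)

x0, y0 = 1.1, 0.8
r0, th0 = math.hypot(x0, y0), math.atan2(y0, x0)
c, s = math.cos(th0), math.sin(th0)

# Route (c): cartesian V^i_{;j} = V^i_{,j}, then transform as a (1,1) tensor
D = [[2*x0, 3.0], [3.0, 2*y0]]                 # ∂V^i/∂x^j
Lup = [[c, s], [-s/r0, c/r0]]                  # Λ^ī_j = ∂(r,θ)/∂(x,y)
Ldn = [[c, -r0*s], [s, r0*c]]                  # Λ^i_j̄ = ∂(x,y)/∂(r,θ)
route1 = [[sum(Lup[a][i] * D[i][j] * Ldn[j][b] for i in range(2) for j in range(2))
           for b in range(2)] for a in range(2)]

# Route (d): differentiate the polar components directly and add the Γ terms,
# with Γ^r_{θθ} = -r and Γ^θ_{rθ} = Γ^θ_{θr} = 1/r
h = 1e-6
dV_dr = [(p - m) / (2*h) for p, m in zip(Vpol(r0 + h, th0), Vpol(r0 - h, th0))]
dV_dth = [(p - m) / (2*h) for p, m in zip(Vpol(r0, th0 + h), Vpol(r0, th0 - h))]
Vr, Vth = Vpol(r0, th0)
route2 = [[dV_dr[0],            dV_dth[0] - r0 * Vth],
          [dV_dr[1] + Vth / r0, dV_dth[1] + Vr / r0]]
```

The two routes agree component by component, which is exactly the tensor character of V^i_{;j} that the exercise is designed to exhibit.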

Exercise 3.8 (§3.2.2) Do Exercise 3.7 again, but this time working with
the one-form field A̧, with cartesian components {x² + 3y, y² + 3x}.

Exercise 3.9 (§3.2.2) Comparing Exercises 3.7 and 3.8, verify that in both
cartesian and polar coordinates g_{ik} V^k_{;j} = A_{i;j}. [d−]

Exercise 3.10 (§3.2.2) Suppose that U is tangent to a geodesic, thus
∇_U U = 0. Consider another vector field V, related to U by V = aU, where a
is a scalar field. By recalling that (∇V)^α_β = V^α_{;β} = V^α_{,β} + Γ^α_{βγ} V^γ, show
that V is also tangent to a geodesic, if and only if a is a constant. [u+]

Exercise 3.11 (§3.2.2) Deduce Eq. (3.26). Use the Leibniz rule
∇_V ⟨p̧, A⟩ = ⟨∇_V p̧, A⟩ + ⟨p̧, ∇_V A⟩, and Eq. (3.24) acting on f = ⟨p̧, A⟩. [u+]

Exercise 3.12 (§ 3.2.3) Derive Eq. (3.33) from Eq. (3.29) (one-liner).
Derive Eq. (3.34) from Eq. (3.32) (few lines). [d − ]

Exercise 3.13 (§3.2.3) Let A_j be the components of an arbitrary one-form.
Write down the transformation law for A_j and for its covariant derivative A_{j;k}.
We can obtain an expression for A_{j̄;k̄} either by differentiating A_j̄, or by
transforming A_{j;k}. By comparing the two expressions for A_{j̄;k̄}, show that the
transformation law for the Christoffel symbols has the form

    Γ^ī_{j̄k̄} = (∂x^ī/∂x^i)(∂x^j/∂x^j̄)(∂x^k/∂x^k̄) Γ^i_{jk} + (∂x^ī/∂x^l)(∂²x^l/∂x^j̄ ∂x^k̄).

The first term in this looks like Eq. (2.24); but the second term will not be zero
in general, demonstrating that Christoffel symbols are not the components of
any tensor. [d+]

Exercise 3.14 (§3.2.3) Suppose that in one coordinate system the Christof-
fel symbols are symmetric in their lower indexes, Γ^i_{jk} = Γ^i_{kj}. By considering
the transformation law for the Christoffel symbols, obtained in Exercise 3.13,
show that they will be symmetric in any coordinate system.

Exercise 3.15 (§ 3.3.2) Things to think about: Why have you never had to
learn about covariant differentiation before now? The glib answer is, of course,
that you weren’t learning GR; but what was it about the vector calculus that
you did learn that meant you never had to know about connection coefficients?
Or, given that you did effectively learn about them, but didn’t know that was
what they were called, why do we have to go into so much more detail about
them now? There are a variety of answers to these questions, at different levels.

Exercise 3.16 (§3.4) (a) On the surface of a sphere, we can pick coordi-
nates θ and φ, where θ is the colatitude, and φ is the azimuthal coordinate. The
components of the metric in these coordinates are

    g_{θθ} = 1,   g_{φφ} = sin²θ,   others zero.

Show that the components of the metric with raised indexes are

    g^{θθ} = 1,   g^{φφ} = 1/sin²θ,   others zero.

(b) The Christoffel symbols are defined as

    Γ^i_{kl} = ½ g^{ij} (g_{jk,l} + g_{jl,k} − g_{kl,j}),

and the geodesic equation is

    d/dt (dx^i/dt) + Γ^i_{jk} (dx^j/dt)(dx^k/dt) = 0,

for a geodesic with parameter t. Using these find the Christoffel symbols for
these coordinates (i.e., Γ^θ_{θθ}, Γ^θ_{θφ} and so on), and thus show that the geodesic
equations for these coordinates are

    θ̈ − sin θ cos θ φ̇² = 0                                                (i)
    φ̈ + 2 (cos θ/sin θ) θ̇ φ̇ = 0,                                          (ii)

where dots indicate differentiation with respect to the parameter t.
(c) Using the result of part (b), or any other properties of geodesics that you
know, explain, giving reasons, which of the following curves are geodesics, for
affine parameter t.

    1. φ = t, θ = π/2     2. φ = t, θ = π/4     3. φ = t, θ = 0
    4. φ = t, θ = t       5. φ = φ₀, θ = t      6. φ = φ₀, θ = 2t − 1
    7. φ = φ₀, θ = t²

(d) The surface of the sphere can also be described using the coordinates
( x , y ) of the Mercator projection , as used on some maps of the world to
represent the surface in the form of a rectangular grid. The coordinates (x , y)
can be given as a function of the coordinates (θ , φ) listed in this problem.
If you were to perform the same calculation as above using these coordi-
nates, would you obtain the same Christoffel symbols? Explain your answer.
Comment on the relationship between the curves in part (c) and the geodesics
obtained using the Mercator coordinates. [d + u++ ]
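Part (c) can be explored mechanically by plugging candidate curves into the geodesic equations and looking at the residuals. The sketch below is mine (names invented, derivatives taken by finite differences); it tests a representative sample of the listed curves at one parameter value:

```python
import math

def derivs(f, t, h=1e-4):
    """First and second derivatives of f at t by central differences."""
    d1 = (f(t + h) - f(t - h)) / (2 * h)
    d2 = (f(t + h) - 2 * f(t) + f(t - h)) / h**2
    return d1, d2

def geodesic_residuals(theta, phi, t):
    """Residuals of Eqs (i) and (ii) of Exercise 3.16 at parameter value t."""
    thd, thdd = derivs(theta, t)
    phd, phdd = derivs(phi, t)
    th = theta(t)
    r1 = thdd - math.sin(th) * math.cos(th) * phd**2
    r2 = phdd + 2 * (math.cos(th) / math.sin(th)) * thd * phd
    return r1, r2

t0 = 0.3
eq1 = geodesic_residuals(lambda t: math.pi/2, lambda t: t, t0)    # curve 1: equator
eq2 = geodesic_residuals(lambda t: math.pi/4, lambda t: t, t0)    # curve 2: small circle
eq5 = geodesic_residuals(lambda t: t, lambda t: 0.7, t0)          # curve 5: meridian
eq7 = geodesic_residuals(lambda t: t**2, lambda t: 0.7, t0)       # curve 7: non-affine
```

The equator and the meridian give vanishing residuals; the circle at θ = π/4 fails equation (i) with residual −½; and curve 7 traces the right path (a meridian) but with a non-affine parameter, so its θ-residual is the constant 2.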

Exercise 3.17 (§3.5.1) Prove Eq. (3.52). Write ∇_i ∇_j V^k = ∇_i (V^k_{;j}) =
(V^k_{;j})_{;i}, and use the expression Eq. (3.27) to expand the derivative with respect
to x^i. At this point, decide to work in LIF coordinates, in which all the Γ^k_{ij} = 0,
making the algebra easier. Thus deduce that ∇_i ∇_j V^k = V^k_{,ji} + Γ^k_{lj,i} V^l. You can
then immediately write down an expression for ∇_j ∇_i V^k. Subtract these two
expressions (to form [∇_i, ∇_j] V^k), noting that the usual partial differentiation of
components commutes: V^k_{,ij} = V^k_{,ji}. Compare the result with the definition of
the Riemann tensor in Eq. (3.49), and arrive at Eq. (3.52). [d+ u++]

Exercise 3.18 (§3.5.1) Prove Eq. (3.53). Expand the definition of the
Riemann tensor in Eq. (3.49) in the local inertial frame, in which g_{kl,m} = 0
(Eq. (3.38)). Recall that partial derivatives always commute.

Exercise 3.19 (§3.5.1) In Exercise 3.16, you calculated the Christoffel
symbols for the surface of the unit sphere. Calculate the components of the
curvature tensor for these coordinates, plus the Ricci tensor R_{jγ} = R^i_{jiγ} and
the Ricci scalar R = g^{jγ} R_{jγ} (see Chapter 4).
You can most conveniently do this by calculating selected components of
the curvature tensor R_{ijkl}, obtained by lowering the first index on Eq. (3.49); you
can cut down the number of calculations you need to do by using the symmetry
relations Eq. (3.55) heavily. Why should you not use Eq. (3.54), which appears
to be more straightforward?
This question is long-winded rather than terribly hard. It’s worthwhile
slogging through it, however, since it gives valuable practice handling indices,
and makes the idea of the curvature tensor rather more tangible. [d −u + ]
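The contractions asked for here are easy to verify once the single independent lowered component is in hand. The sketch below (mine; it assumes the Ex. 3.19 result R_{θφθφ} = sin²θ and fills in the rest from the symmetries of Eq. (3.55)) computes the Ricci tensor and scalar numerically at a sample colatitude:

```python
import math

th0 = 0.8
ginv = [[1.0, 0.0], [0.0, 1.0 / math.sin(th0)**2]]   # g^{ij}, indices 0 = θ, 1 = φ

# Independent lowered-index component for the unit sphere: R_{θφθφ} = sin²θ;
# the other nonzero components follow from Eq. (3.55a).
s2 = math.sin(th0)**2
Rdown = {(0,1,0,1): s2, (1,0,1,0): s2, (0,1,1,0): -s2, (1,0,0,1): -s2}

def Riem(i, j, k, l):
    """R^i_{jkl} = g^{im} R_{mjkl}."""
    return sum(ginv[i][m] * Rdown.get((m, j, k, l), 0.0) for m in range(2))

# Ricci tensor R_{jl} = R^i_{jil} and Ricci scalar R = g^{jl} R_{jl}
Ricci = [[sum(Riem(i, j, i, l) for i in range(2)) for l in range(2)] for j in range(2)]
Rscalar = sum(ginv[j][l] * Ricci[j][l] for j in range(2) for l in range(2))
```

The Ricci scalar of the unit sphere comes out as the constant 2 — twice the Gaussian curvature — independent of the sample point, as it must be.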

Exercise 3.20 (§3.5.2) Prove Eq. (3.57), by writing it in component form.
Recall Eq. (3.37). The last step is the tricky bit, but recall that for a (tangent)
vector A, A f = A^k e_k f = A^k f_{,k}, where f is any function, including a vector
component.

Exercise 3.21 (§ 3.5.2) Consider coordinates on a sphere, as you did in


Exercise 3.16, and consider the geodesics λ(t ) in Figure 3.11 with affine
parameter t and tangent vectors X – these are great circles through the poles.
The curves µ(s ) with tangent vectors ξ are connecting curves as discussed in
Section 3.5.2.
We can parameterise the curve λ(t) using the coordinates (θ, φ), as

    λ(t):   θ(λ(t)) = t,   φ(λ(t)) = φ₀

(compare Section 3.1.2), and you verified in Exercise 3.16 that this does indeed
satisfy the geodesic equation.

Figure 3.11 Geodesics on a sphere. (Meridian geodesics λ(t), with tangent
vectors X, crossed by connecting curves µ(s) at colatitude θ.)

(a) Using Eq. (3.1), show that the components of X are

    X^θ = 1,   X^φ = 0.

(b) Write Eq. (3.60) as

    g_{λk} (∇_X ∇_X ξ)^k − g_{λk} R^k_{ijl} X^i X^j ξ^l = 0                  (i)

and, by using the components of the curvature tensor that you worked out in
Exercise 3.19, show that

    (∇_X ∇_X ξ)^θ = 0                                                        (iia)
    (∇_X ∇_X ξ)^φ + ξ^φ = 0.                                                 (iib)

This tells us that the connecting vector – the tangent vector to the family of
curves µ(s), connecting points of equal affine parameter along the geodes-
ics λ(t) – does not change its θ component, but does change its φ component.
Which isn't much of a surprise.
(c) Can we get more out of this? Yes, but to do that we have to calculate
∇_X ∇_X ξ, which isn't quite as challenging as it might look. From Eq. (3.21) we
write

    ∇_X ξ = X^i ∇_i ξ = X^i e_j ξ^j_{;i} = X^i e_j (∂ξ^j/∂x^i + Γ^j_{iγ} ξ^γ).   (iii)

You have worked out the Christoffel symbols for these coordinates in
Exercise 3.16, so we could trundle on through this calculation, and find
expressions for the components of the connecting vector ξ from Eq. (ii). In
order to illustrate something useful in a reasonable amount of time, however,
we will short-circuit that by using our previous knowledge of this coordinate
system.

The curve

    µ(s):   θ(s) = θ₀,   φ(s) = s

is not a geodesic (it is a small circle at colatitude θ₀), but it does connect points
on the geodesics λ(t) with equal affine parameter t; it is a connecting curve for
this family of geodesics. Convince yourself of this and, as in part (a), satisfy
yourself that the tangent vector to this curve, ξ = d/ds, has components ξ^θ = 0
and ξ^φ = 1; and use this together with the components of the tangent vector X
and the expression Eq. (iii) to deduce that

    ξ̇ ≡ ∇_X ξ = 0 e_θ + cot θ e_φ

(where ξ̇ is simply a convenient – and conventional – notation for ∇_X ξ), or

    ξ̇^θ = 0,   ξ̇^φ = cot θ.

(d) So far so good. In exactly the same way, take the covariant derivative
of ξ̇, and discover that

    ∇_X ξ̇ = ∇_X ∇_X ξ = 0 e_θ − 1 e_φ = −ξ,

and note that this does in fact accord with the geodesic deviation equation of
Eq. (ii).
Note that this example is somewhat fake, in that, in (c), we set up the
curve µ( s ) as a connecting curve, and all we have done here is verify that
this was consistent. If we were doing this for real, we would not know (all of)
the components of ξ beforehand, but would carry on differentiating ξ as we
started to do in (c), put the result into the differential equation Eq. (ii) and thus
deduce expressions for the components ξ k .
As a final point, note that the length of the connecting vector ξ is just

    g(ξ, ξ) = g_{ij} ξ^i ξ^j = sin²θ,

which you could possibly have worked out from school trigonometry (but it
wouldn't have been half so much fun). [u+]
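Parts (c) and (d) can be replayed numerically. The sketch below (mine; names invented, with the θ-derivative taken by central differences) applies the covariant derivative along a meridian twice to the connecting vector ξ = (0, 1) and recovers ξ̇ = cot θ e_φ and ∇_X ξ̇ = −ξ:

```python
import math

# Along a meridian geodesic θ = t, the only Christoffel symbol that enters is
# Γ^φ_{θφ} = cosθ/sinθ; Γ^θ_{θθ} = Γ^θ_{θφ} = 0, so the θ-component gets no
# correction.
def D_theta(V, theta, h=1e-6):
    """(∇_X V)^j for X = e_θ, for a field θ ↦ (V^θ, V^φ):
    (∇_X V)^j = ∂_θ V^j + Γ^j_{θγ} V^γ."""
    dth = (V(theta + h)[0] - V(theta - h)[0]) / (2 * h)
    dph = (V(theta + h)[1] - V(theta - h)[1]) / (2 * h)
    cot = math.cos(theta) / math.sin(theta)
    return (dth, dph + cot * V(theta)[1])

th0 = 1.1
xi = lambda th: (0.0, 1.0)                 # connecting vector, ξ^θ = 0, ξ^φ = 1
xidot = D_theta(xi, th0)                   # should be (0, cot θ)
xidot_field = lambda th: (0.0, math.cos(th) / math.sin(th))
xiddot = D_theta(xidot_field, th0)         # should be (0, -1) = -ξ
```

The −csc²θ from differentiating cot θ and the +cot²θ from the Christoffel correction combine into exactly −1, independent of θ, just as Eq. (iib) requires.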

Exercise 3.22 (§3.5.2) In the newtonian limit, the metric can be written as

    g_{ij} = η_{ij} + h_{ij},                                               (i)

where

    η_{ij} = diag(−1, 1, 1, 1)
    h_{ij} = −2φ if i = j,  0 if i ≠ j,

and φ is the newtonian gravitational potential φ(r) = GM/r. In this limit, and
with this metric, the curvature tensor can be written as

    2R_{ijkl} = h_{il,jk} + h_{jk,il} − h_{ik,jl} − h_{jl,ik}.

The equation for geodesic deviation is

    d²ξ^i/dt² = R^i_{jkl} U^j U^k ξ^l,                                       (ii)

where the vectors U are tangent to geodesics, and we can take them to be
velocity vectors.
Consider two particles in free fall just above the Earth's North Pole, so
that their (cartesian) coordinates are both approximately x = y = 0, z = R,
where R is the radius of the Earth. Take them to be separated by a separation
vector ξ = (0, ξ^x, 0, 0), where ξ^x ≪ R. Since they are falling along geodesics,
their velocity vectors are both approximately U = (U^t, 0, 0, U^z).
With this information, show that the two particles accelerate towards each
other such that

    d²ξ^x/dt² = −(GM/r³) ξ^x                                                 (iii)

to first order in φ (given values for G, M, and R, why can we take φ² ≪ 1?).
Since these are non-relativistic particles, you may assume, at the appropriate
point, that |U^t| ≫ |U^z|, and thus that U^t U_t ≈ −1.
If we had used a different metric to describe the same newtonian space-
time, rather than that in Eq. (i), would we have obtained a different result for
the geodesic deviation, Eq. (iii)? Explain your answer. [d+ u+]
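The curvature component that drives Eq. (iii) can be evaluated directly from the linearized formula. The sketch below (mine; the values of G, M, and R are invented, and it checks only the magnitude of the tidal coefficient — the sign bookkeeping in the contraction with U is the point of the exercise):

```python
import math

G, M, R = 1.0, 1.0, 2.0        # illustrative values

def phi(x, y, z):
    """The potential as printed in the exercise, φ = GM/r."""
    return G * M / math.sqrt(x*x + y*y + z*z)

# The field is static, so every t-derivative of h_ij vanishes.  For i = x,
# j = k = t, l = x the linearized formula
#   2R_{ijkl} = h_{il,jk} + h_{jk,il} - h_{ik,jl} - h_{jl,ik}
# then leaves only h_{tt,xx} = (-2φ)_{,xx}, i.e. R_{xttx} = -∂²φ/∂x²,
# evaluated here by a central difference on the z-axis.
step = 1e-4
phi_xx = (phi(step, 0, R) - 2 * phi(0, 0, R) + phi(-step, 0, R)) / step**2
R_xttx = -phi_xx

expected = G * M / R**3        # the tidal coefficient GM/r³ at x = y = 0, z = R
```

The coefficient GM/r³ is the familiar newtonian tidal gradient: curvature in the weak-field limit is, component by component, the tide.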
4 Energy, Momentum, and Einstein's Equations

Proving that nothing ever changes:


. . . For Aristotle divides theoretical philosophy too, very fittingly, into
three primary categories, physics, mathematics and theology. . . . Now
the first cause of the first motion of the universe, if one considers it
simply, can be thought of as an invisible and motionless deity; the
division [of theoretical philosophy] concerned with investigating this
[can be called] ‘theology’, since this kind of activity, somewhere up
in the highest reaches of the universe, can only be imagined, and
is completely separated from perceptible reality. The division which
investigates material and ever-moving nature, and which concerns itself
with ‘white’, ‘hot’, ‘sweet’, ‘soft’ and suchlike qualities one may call
‘physics’; such an order of being is situated (for the most part) amongst
corruptible bodies and below the lunar sphere. That division which
determines the nature involved in forms and motion from place to place,
and which serves to investigate shape, number, size, and place, time and
suchlike, one may define as ‘mathematics’ . . .
From all this we conclude: that the first two divisions of theoretical
philosophy should rather be called guesswork than knowledge, theology
because of its completely invisible and ungraspable nature, physics
because of the unstable and unclear nature of matter; hence there is
no hope that philosophers will ever be agreed about them; and that
only mathematics can provide sure and unshakeable knowledge to its
devotees, provided one approaches it rigorously. For its kind of proof
proceeds by indisputable methods, namely arithmetic and geometry.
. . . As for physics, mathematics can make a significant contribution. For
almost every peculiar attribute of material nature becomes apparent from
the peculiarities of its motion from place to place.
Preface to Book 1 of Ptolemy’s Almagest, between 150–161 ce


Ptolemy is right, here (though some of the details of his cosmology have been
adjusted since he wrote this, and what he refers to as ‘theology’ is now more
often referred to as ‘Quantum Gravity’): mathematics we can know all about,
with certainty; for physics we have to make guesses. He is also correct about
the contribution of mathematics, and we’ll discover that our first insights in
this section do indeed come from considering the peculiarities of the material
world’s motion from place to place.
To match our return to physics, here, we’re now going to specialise to
working in a four-dimensional manifold, and to metrics with a signature of
+ 2, so that the LIF has the same metric as Minkowski space. To further
match Section 2.3.4, we will also introduce a slight, and traditional, notational
change. We will now index components with greek letters, µ , ν , α, β, . . . ,
which we take to run from 0 to 3; we will sometimes write indexes with latin
letters i, j,. . . , taking these to run over the spacelike directions, 1, 2, and 3.

4.1 The Energy-Momentum Tensor


The point of this whole book is to describe how gravity, in the form of
the curvature of space-time, is determined by the presence of mass. In
newtonian physics, the relationship is straightforward, since the notion of mass
is unproblematic. In relativity, however, we know that what matters is not mass
alone, but energy-momentum, and so it is not unreasonable that what matters in
GR is not mass, but the distribution of energy-momentum, and so we must find
a way of describing this in an acceptably geometrical fashion. In this section
we confine ourselves to special relativity (SR); in the next section we
discover that this is not, in fact, a physical restriction. This section largely
follows Schutz chapter 4. MTW (1973, chapter 5) takes a significantly different
approach, but is very illuminating.
We start (as we all end) with dust.

4.1.1 Dust, Fluid, and Flux


A fluid in GR and cosmology is, not surprisingly, something that flows;
that is, a substance where the forces perpendicular to an imaginary surface
(i.e., pressure) are much greater than the forces parallel to it (i.e., stress, arising
from viscosity). The limit of this, a perfect fluid , is a substance that has pressure
but zero stresses. An even simpler substance is termed dust , which denotes an
idealised form of matter, consisting of a collection of non-interacting particles
Figure 4.1 The volume swept out by an area.

that are not moving relative to each other, so that the collection has zero
pressure – the dust’s only physical property is mass-density. That is to say that
there is a frame, called the momentarily comoving reference frame (MCRF),
with respect to which all the particles in a given volume have zero velocity.1
We can suppose for the moment that all the dust particles have the same
(rest) mass m , but that different parts of the dust cloud may have different
number densities n . Just as the particle mass m is the mass in the particle’s rest
frame, the number density n is always that measured in the MCRF.
If we Lorentz-transform to a frame that is moving with velocity v with
respect to the MCRF, a (stationary) volume element of size Δx Δy Δz
will be Lorentz-contracted into a (moving) element of size Δx′ Δy′ Δz′ =
(Δx/γ) Δy Δz, where γ is the familiar Lorentz factor γ = (1 − v²)^{−1/2},
supposing that the frames are chosen such that the relative motion is along
the x-axis. That means that the number density of particles, as measured in
the frame relative to which the dust is moving, goes up to γn. What, then, is the
flux of particles through an area Δy′ Δz′ in the y′–z′ plane? The particles in the
volume all pass through the area Δy′ Δz′ in a time Δt′, where Δx′ = v Δt′, and
so this total number of particles is (γn)(v Δt′) Δy′ Δz′. Thus the total number
of particles per unit time and per unit area, which is the flux in the x′-direction,
is γnv. Writing N^x for this x-directed flux, and v^x for the velocity along the
x-axis, v, this is

N^x = γ n v^x.  (4.1)
We can generalise this, and guess that we can reasonably define a flux vector
N = nU , (4.2)
where again n is the dust number density in its MCRF, and U is the
4-velocity vector (γ, γv^x, γv^y, γv^z). Since the velocity vector has the property
g(U, U) = −1 (remember your SR, and that the 4-velocity vector U = (1, 0)

1 This is also, interchangeably, sometimes called the Instantaneously Comoving Reference Frame
(ICRF).
in the MCRF), we have g(N, N) = N^α N_α = −n². The components of the flux
vector N in this frame are

(γn, γnv^x, γnv^y, γnv^z).  (4.3)
This flux vector is a geometrical object, because U is, and so although its
components are frame-dependent, the vector as a whole is not.
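As a quick numerical sketch (not part of the text's own apparatus; the values of n and v here are arbitrary), we can check in components that N = nU reproduces both the boosted density γn of Eq. (4.3) and the frame-invariant magnitude g(N, N) = −n²:

```python
import numpy as np

# Minkowski metric with signature +2: diag(-1, 1, 1, 1)
eta = np.diag([-1.0, 1.0, 1.0, 1.0])

n = 2.5   # number density in the MCRF (arbitrary illustrative value)
v = 0.6   # dust 3-velocity along x in the new frame
gamma = 1.0 / np.sqrt(1.0 - v**2)

# 4-velocity of the dust as seen in the new frame, and the flux vector N = nU
U = np.array([gamma, gamma * v, 0.0, 0.0])
N = n * U

# Components (gamma*n, gamma*n*v, 0, 0), as in Eq. (4.3)
print(N[:2])

# The frame-invariant magnitude g(N, N) should equal -n**2 (up to rounding)
g_NN = N @ eta @ N
print(g_NN, -n**2)
```

Whatever boost speed v is chosen, the last two printed numbers agree, illustrating that the 'length' of the flux vector is frame-independent.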
It is obvious how to recover, from Eq. (4.3), the fluxes N^x across surfaces
of constant coordinate – we simply take the components – but we will need to
be more general than this. Any function defined over space-time, φ(t, x, y, z),
defines a surface φ = constant, and its gradient one-form d̃φ acts as a normal
to this surface (think of the planes in our visualisation of one-forms). The
unit gradient one-form ñ ≡ d̃φ/|d̃φ| points in the same direction but has unit
magnitude (the notational clash with the number density n is unfortunate but
conventional). Consider specifically the coordinate function x: the gradient
one-form corresponding to this, d̃x, has components (0, 1, 0, 0) (and so is
already a unit one-form). If we contract this one-form with the flux vector,
we find

⟨d̃x, N⟩ ≡ N(d̃x) = N^x

(where the last expression denotes the x-component of N, rather than the whole
set of components). That is, contracting the flux vector with a gradient one-
form produces the flux across the corresponding surface; this is true in general,
so that N(ñ) produces the flux across the surface φ = constant, where ñ =
d̃φ/|d̃φ|. The vector N = nU is manifestly geometrical; it is our ability to
recover the flux in this way that justifies our naming this the 'flux vector'.

4.1.2 The Energy-Momentum Tensor


We’ll switch from (t, x, y, z) to general (x^0, x^1, x^2, x^3), now, but we’re still
confining ourselves to SR.
We know from our study of SR that energy and mass are interconvertible.
For our dust particles of mass m , therefore, the energy density of the dust,
in the MCRF, is mn . In our moving frame, however, as well as the number
density rising to γn, the total energy of each particle, as measured in the
‘stationary frame’, goes up to γm. Thus the energy density of the dust as
measured in a moving frame is γ²mn. This double factor of γ cannot result
from a Lorentz boost of a vector, and is the first indication that to describe the
energy-momentum of the dust we will need to use a higher-order tensor.
What geometrical objects do we have to play with? We have the momenta of
the dust particles, p = mU , and we have the flux vector N = nU . As mentioned
in the previous section, we also have the gradient one-forms corresponding
to the coordinate functions, d̃x^α. By contracting the vectors with these one-
forms we can extract the particles’ energy p^0 = p(d̃x^0) or spatial momenta
p^i = p(d̃x^i), or the number density N^0 = N(d̃x^0) (which we can interpret as
the number crossing a surface of constant time, into the future) or number flux
N^i = N(d̃x^i).
Let us form the (2, 0) tensor

T = p ⊗ N = ρ U ⊗ U,  (dust)  (4.4)
(writing ρ = mn for the mass density , and recalling the definition of outer
product in Section 2.2.2) – this is known as both the energy-momentum
tensor and the stress-energy tensor. In order to convince ourselves that this
mathematical object has a useful physical interpretation, we now examine the
components of this tensor, obtained by contracting it with the basis one-forms
ω̃^α = d̃x^α, where the coordinate functions x^i are those corresponding to a
frame with respect to which the dust is moving. These components are, of
course,

T^{αβ} = T(d̃x^α, d̃x^β) = p(d̃x^α) × N(d̃x^β).
The 0-0 component T^{00} is just γ²mn, which we can recognise as the energy
density of the dust, or the flow of the zeroth component of momentum across
a surface of constant time.
The 0-i component is T^{0i} = γm × γnv^i (after comparing with Eq. (4.3)).
Given that nv has the dimensions of (per-unit-area per-unit-time), and that mass
and energy are interconvertible in relativity, this is identifiable as the flux of
energy across a (spatial) surface of constant x^i.
The i-0 component T^{i0} = p^i × N^0 = mγv^i × γn is the flux of the ith
component of momentum across a surface of constant time, into the future.
By analogy with the energy density, this is known as (the i th component of)
the momentum density of the dust. Now, energy flux across a surface is an
amount of energy-per-unit-time, per unit area or, since energy and mass are
the same thing, mass-per-unit-time, per unit area. However, momentum density
is the amount of momentum per unit volume, which is mass-times-speed per
unit volume, which is dimensionally the same as energy flux. Another way of
getting to the same place (in Schutz’s words this time) is that energy flux is the
density of mass-energy times the speed it flows at, whereas momentum density
is the mass density times the speed the mass flows at, which is the same thing.
Thus the identity of T^{i0} and T^{0i} in this case is not coincidental or special to
dust, but quite general:

T^{i0} = T^{0i}.
Finally the i-j component of the energy-momentum tensor, T^{ij} = p^i N^j =
γmv^i × γnv^j, is the flux of i-momentum across a surface of constant x^j. It has
the dimensions of momentum per unit time, per unit area, leading us to identify it
as force per unit area, or pressure.
In general, therefore, we can interpret the component T(d̃x^α, d̃x^β) as
the flow of the αth component of momentum across a surface of constant
coordinate x^β.
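These component identifications can be verified numerically. The following sketch (with arbitrary illustrative values of m, n, and v) builds T = p ⊗ N for dust and checks T^{00} = γ²mn and the symmetry T^{i0} = T^{0i}:

```python
import numpy as np

m, n = 1.5, 2.0                 # particle mass and MCRF number density
v = np.array([0.3, 0.2, 0.1])   # dust 3-velocity in our frame
gamma = 1.0 / np.sqrt(1.0 - v @ v)

U = np.concatenate(([gamma], gamma * v))   # 4-velocity
p = m * U                                  # particle 4-momentum p = mU
N = n * U                                  # flux vector N = nU

# Energy-momentum tensor of dust, Eq. (4.4): T = p (outer) N = rho U (outer) U
T = np.outer(p, N)
rho = m * n

# T^{00} = gamma^2 * m * n is the energy density in the moving frame,
# and T is symmetric, so T^{i0} = T^{0i}
print(T[0, 0], gamma**2 * rho)
print(np.allclose(T, T.T))
```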
By considering the torques acting on a fluid element we can show
(Schutz §4.5) that the tensor T is symmetric in general,

T^{αβ} = T^{βα},  or  T(p̃, q̃) = T(q̃, p̃),  ∀ p̃, q̃.  (4.5)
In a perfect fluid, there is no preferred direction, so the spatial part of the
energy-momentum tensor must be proportional to the spatial part of the metric,
which is δ^{ij} in SR. Since there is no viscosity, the only momentum transport
possible is perpendicular to the surface of a fluid element, in the form of
pressure p (which is force per unit area, remember), giving the constant of
proportionality (see Schutz §4.6 for an expanded version of this argument), and so

T^{ij} = p δ^{ij},  (perfect fluid).  (4.6)
From there it is a short step to show that the energy-momentum tensor for a
perfect fluid, as a geometrical object, is

T = (ρ + p) U ⊗ U + p g,  (perfect fluid).  (4.7)

Dust has no pressure, so its energy-momentum tensor in the MCRF is

T = diag(ρ, 0, 0, 0),  (dust, MCRF).  (4.8)
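As a small sketch of Eqs. (4.6)–(4.8) (illustrative numbers; contravariant components with signature +2), we can confirm that the perfect-fluid tensor has MCRF components diag(ρ, p, p, p), reducing to the dust form when p = 0:

```python
import numpy as np

# Contravariant Minkowski metric, signature +2 (numerically diag(-1,1,1,1))
eta_inv = np.diag([-1.0, 1.0, 1.0, 1.0])

rho, p = 4.0, 0.25                  # illustrative density and pressure
U = np.array([1.0, 0.0, 0.0, 0.0])  # 4-velocity of the fluid in its MCRF

# Perfect-fluid energy-momentum tensor, Eq. (4.7), in contravariant components
T = (rho + p) * np.outer(U, U) + p * eta_inv
print(np.diag(T))      # the MCRF components diag(rho, p, p, p)

# Setting p = 0 recovers the dust tensor of Eq. (4.8)
T_dust = rho * np.outer(U, U)
print(np.diag(T_dust))
```

Note how the −1 in g^{00} cancels the extra p in (ρ + p), leaving T^{00} = ρ.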
The final important property of this tensor is its conservation law. If energy
is to be conserved, then the amount of energy-momentum entering an arbitrary
four-dimensional box must be the same as the amount leaving it. From this we
promptly deduce that

∂T^{α0}/∂x^0 + ∂T^{α1}/∂x^1 + ∂T^{α2}/∂x^2 + ∂T^{α3}/∂x^3 = 0,

or

T^{αβ}_{,β} = 0.  (4.9)
By a similar sort of argument, requiring that under any flow of a fluid or of
dust the total number of particles is unchanged, we can show that
N^α_{,α} = (n U^α)_{,α} = 0,  (4.10)

with no source term on the right-hand side. [Exercise 4.1]
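To see Eq. (4.10) at work, here is a small symbolic check (not from the text): for dust in uniform motion along x, any density profile n = f(x − vt) carried along with the flow has vanishing divergence N^α_{,α}:

```python
import sympy as sp

t, x = sp.symbols('t x')
v = sp.Symbol('v', positive=True)   # uniform flow speed, 0 < v < 1
f = sp.Function('f')                # arbitrary MCRF density profile
gamma = 1 / sp.sqrt(1 - v**2)

# Density carried along with the flow: n = f(x - v t)
n = f(x - v * t)

# Components of N = nU for motion along x: N^0 = gamma*n, N^x = gamma*n*v
N0 = gamma * n
Nx = gamma * n * v

# The divergence N^alpha_{,alpha} of Eq. (4.10)
div_N = sp.diff(N0, t) + sp.diff(Nx, x)
print(sp.simplify(div_N))   # -> 0
```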
4.1.3 The Energy-Momentum Tensor: A Different Approach


Imagine a wireframe box marking out a cuboidal volume of space.2 It might
have just air in it, or flowing through it if you’re sitting in a draught or throwing
it from hand to hand. You might take it to a tap and let water run through
the volume it marks out (you could also imagine putting it round a candle
flame and watching exothermic chemistry happen in it, or waving a magnet
near it and measuring electromagnetism happening inside it; but we’ll stick to
particle and continuum mechanics for the moment).
How much energy is there inside the wire box? Or, if it’s moving, how
much dynamics – how much oomph – is moving through the box? If there’s a
particle inside the box then we can talk about its energy-momentum p = mU =
m γ (1, v); the components of this quantity are of course frame-dependent,
taking different values in frames attached to you, to the box, or to the particles
within it, but we know that the 4-momentum p is a geometrical frame-
independent quantity. If there are two particles inside the box, we can simply
add up their momenta, but if there is a continuum of dynamics – such as the
air flowing through the box – then we must talk of the density of energy-
momentum; but for that we must first find a way of talking about a volume
in a frame-independent way.
Define the 1-form σ̃ with components

σ_µ = ε_{µαβγ} A^α B^β C^γ,  (4.11)

where A = (A^0, a), B = (B^0, b), and C = (C^0, c) are three linearly independent
vectors, and ε_{µαβγ} is the Levi-Civita symbol, which is such that

ε_{µαβγ} = +1 if µαβγ is an even permutation of 0123,
           −1 if it is an odd permutation,  (4.12)
            0 otherwise.

If A, B, and C are purely spacelike vectors, so that A^0 = B^0 = C^0 = 0, then
σ_i = 0 and

σ_0 = ε_{0αβγ} A^α B^β C^γ,

and you may recall from linear algebra that this is the expression for a · (b × c),
the vector triple product, which gives the volume, V, of the parallelepiped
bounded by the three 3-vectors a, b, and c. Since each of these vectors is

2 The account here is closely compatible with MTW (1973, chapter 5). It covers the same material
as the previous section, but in a less heuristic way, at the expense of introducing some new
maths. It doesn’t have a ‘dangerous bend’ marker but you should feel free to skip it if the
previous section was adequately satisfying to your mathematical sensibilities.
Figure 4.2 A volume form (dimension C suppressed).

Figure 4.3 Flux out of a box.

orthogonal to σ̃ (that is, ⟨σ̃, A⟩ = 0, etc.), the 3-d volume that they span
is a hyperplane of the one-form σ̃. This volume (shown in Figure 4.2, with
the direction C suppressed) contains matter or other energy with associated
momenta p.
This (timelike) volume is moving through space-time with a velocity U,
where U = (1, 0) in the volume’s rest frame. The one-form dual to this velocity
is Ũ = g(U, ·), which has components Ũ = (−1, 0). Note that, promptly from
this definition,

⟨Ũ, U⟩ = g(U, U) = −1.

Notice also that, as constructed in this section, with σ_0 = V and σ_i = 0,
the volume one-form σ̃ has the same components as −V Ũ in this frame, and
therefore in any frame σ̃ = −V Ũ. With this identification, it is natural to
interpret σ̃ as representing the volume moving into the future, along its rest
frame’s t-axis.
Now picture the volume bounded by A, B, and C as a box whose top, in
the plane A B, is opened, allowing its contents to puff out into the surrounding
space (for simplicity, take a, b, and c to be orthogonal and of length L, so
a = (L, 0, 0), b = (0, L, 0), and c = (0, 0, L)). In a time Δτ, this box lid
moves through time by a displacement T = (Δτ, 0). If we now calculate
σ_µ = ε_{µαβγ} A^α B^β T^γ, we find σ_3 = L² Δτ, with the other components zero.
This (spacelike) volume, which is parallel to C̃ (and thus in the direction of the
basis one-form ω̃^3), clearly represents the amount of space-time swept out by
the top of the box, A B, in time Δτ. The one-form σ̃, therefore, represents a
volume of space-time, at a particular point in space-time, which is spacelike or
timelike depending on its orientation.
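The two cases just discussed can be checked directly from Eqs. (4.11) and (4.12). In this sketch (illustrative values; note that the overall sign of σ_µ tracks the orientation, that is the ordering, of the vector triple), the spacelike triple gives σ_0 = L³ and the box-lid triple gives a single nonzero component with |σ_3| = L² Δτ:

```python
import itertools
import numpy as np

def eps(mu, a, b, c):
    """Levi-Civita symbol of Eq. (4.12): the sign of (mu, a, b, c) as a
    permutation of (0, 1, 2, 3), and zero if any index repeats."""
    idx = [mu, a, b, c]
    if sorted(idx) != [0, 1, 2, 3]:
        return 0
    inversions = sum(1 for i in range(4) for j in range(i + 1, 4)
                     if idx[i] > idx[j])
    return 1 if inversions % 2 == 0 else -1

def sigma(A, B, C):
    """Volume one-form sigma_mu = eps_{mu a b c} A^a B^b C^c, Eq. (4.11)."""
    return np.array([sum(eps(mu, a, b, c) * A[a] * B[b] * C[c]
                         for a, b, c in itertools.product(range(4), repeat=3))
                     for mu in range(4)])

L, dtau = 2.0, 0.5
A = np.array([0.0, L, 0.0, 0.0])     # a along x
B = np.array([0.0, 0.0, L, 0.0])     # b along y
C = np.array([0.0, 0.0, 0.0, L])     # c along z
T = np.array([dtau, 0.0, 0.0, 0.0])  # time displacement of the box lid

# Spacelike triple: sigma_0 = a.(b x c) = L**3, other components zero
print(sigma(A, B, C))

# Box lid: only sigma_3 is nonzero, with magnitude L**2 * dtau
print(sigma(A, B, T))
```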
This volume one-form gives us a way of answering the question ‘how much
energy-momentum is inside the wireframe box?’ We can define a (2, 0) tensor T
which (you will not at this point be surprised to learn) we can call the ‘energy-
momentum tensor’, the action of which, when given a volume one-form σ̃, is
to produce the quantity of energy-momentum contained within that volume:

p_box = T( · , σ̃).  (4.13)
We can see that this makes sense, in a number of ways, using the examples
of σ̃ mentioned previously.
Contracting T with the timelike σ̃, in Eq. (4.13), gives us the 4-momentum
contained within that (spacelike) volume, and directed into the future. The
energy instantaneously contained within the box, which is the zeroth compo-
nent of p_box in its rest frame, can be extracted by contracting it with the basis
one-form ω̃^0 which, in this frame, is just −Ũ, to give

E = p_box(ω̃^0) = −p_box(Ũ) = −T(Ũ, σ̃) = +V T(Ũ, Ũ),

which allows us to identify T(Ũ, Ũ) as the energy density within the volume
(moving into the future), in the box’s rest frame; similarly p^i = −V T(ω̃^i, Ũ) is
the momentum density within the box.
Looking now at the spacelike σ̃, we find that p_box = L² Δτ T( · , ω̃^3), so
that we can identify T( · , ω̃^3) as a flux of momentum through the top of the
A B C box.
Returning to the dust, we can recall the number density N^0 and flux N^i of
dust particles, and thus obtain the density and flux of energy-momentum p^µ N^0
and p^µ N^i, but these are the components of a tensor: p^µ N^ν = S(ω̃^µ, ω̃^ν), where
S = p ⊗ N. The tensor S is therefore the energy-momentum tensor for this dust,
as defined by Eq. (4.13), and comparison with Eq. (4.4) shows that this tensor
is exactly the energy-momentum tensor we obtained earlier.

The expression for the volume form, Eq. (4.11), may appear to be
pulled from a hat, but in fact it emerges fairly naturally from a larger
theory of differential forms ; the one-forms we have been using are the simplest
objects in a sequence of n-forms. This ‘exterior calculus’ can be used, amongst
other things, for providing yet another way of discussing differentiation (and
integration) in a curved manifold, alongside the covariant derivative we have
extensively used, and the Lie derivative we have mentioned in passing. MTW
discuss this approach in their chapter 4; Carroll describes them, very lucidly,
in his section 2.9; Schutz (1980) gives an extensive treatment in chapter 4; they
are well covered in other advanced GR textbooks.
[Exercise 4.2]
4.1.4 Maxwell’s Equations


For completeness, here are Maxwell’s equations in the form appropriate for
GR. For fuller details, see exercise 25 in Schutz’s §4.10.
Given electromagnetic fields (E^x, E^y, E^z) and (B^x, B^y, B^z), we can define the
antisymmetric Faraday tensor

        ⎛   0     E^x    E^y    E^z ⎞
        ⎜ −E^x     0     B^z   −B^y ⎟
    F = ⎜ −E^y   −B^z     0     B^x ⎟ .  (4.14)
        ⎝ −E^z    B^y   −B^x     0  ⎠

We can also define the current vector J = (ρ, j^x, j^y, j^z) corresponding to a
charge density ρ and current 3-vector j. With these definitions, Maxwell’s
equations in SR become

F^{µν}_{,ν} = 4π J^µ  (4.15a)
F_{µν,λ} + F_{νλ,µ} + F_{λµ,ν} = 0.  (4.15b)
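Equation (4.15b) becomes an identity once F is derived from a 4-potential, F_{µν} = A_{ν,µ} − A_{µ,ν} — a standard fact, though the potential formulation is not introduced in this section. A symbolic sketch:

```python
import sympy as sp

t, x, y, z = sp.symbols('t x y z')
coords = [t, x, y, z]

# An arbitrary 4-potential: four arbitrary functions of the coordinates
A = [sp.Function(f'A{mu}')(*coords) for mu in range(4)]

# F_{mu nu} = A_{nu,mu} - A_{mu,nu}: antisymmetric by construction
F = [[sp.diff(A[nu], coords[mu]) - sp.diff(A[mu], coords[nu])
      for nu in range(4)] for mu in range(4)]

# The homogeneous Maxwell equation (4.15b) then holds identically,
# because partial derivatives commute
for mu, nu, lam in [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]:
    cyc = (sp.diff(F[mu][nu], coords[lam])
           + sp.diff(F[nu][lam], coords[mu])
           + sp.diff(F[lam][mu], coords[nu]))
    print(sp.simplify(cyc))   # -> 0 for every index choice
```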

The Faraday tensor F and the energy-momentum tensor T together form the
source for the gravitational field. Notwithstanding that, we shall not explicitly
include the Faraday tensor in the discussion that follows.
The Faraday tensor, and Maxwell’s equations, take a particularly compact
and elegant form when expressed in terms of the exterior derivatives mentioned
in passing at the end of Section 4.1.3.
It is possibly worth highlighting that the components of Eq. (4.14) are
manifestly frame-dependent – you can pick a frame where either E or B
disappears:
It is known that Maxwell’s electrodynamics – as usually understood at the present
time – when applied to moving bodies, leads to asymmetries which do not appear
to be inherent in the phenomena. . . . Examples of this sort, together with the
unsuccessful attempts to discover any motion of the earth relatively to the “light
medium,” suggest that the phenomena of electrodynamics as well as of mechanics
possess no properties corresponding to the idea of absolute rest.
(Einstein, 1905, paragraphs 1 and 2).

4.2 The Laws of Physics in Curved Space-time


So we now have a way to describe the energy-momentum contained within an
arbitrary distribution of matter and electromagnetic fields. What we now want
to know is how these relate to the curvature of the space-time they lie within.
4.2.1 Ricci, Bianchi, and Einstein


First we need to establish useful contractions of the curvature tensor. See
Schutz §6.6 for further details of this brief relapse into mathematics.
These contractions are the Ricci tensor, obtained by contracting the full
curvature tensor over its first and third indexes,

R_{βν} ≡ g^{αµ} R_{αβµν} = R^µ_{βµν},  (4.16)

and the Ricci scalar obtained by further contracting the Ricci tensor,

R ≡ g^{βν} R_{βν} = g^{βν} g^{αµ} R_{αβµν}.  (4.17)

Note, from Eq. (3.55a), that the Ricci tensor is symmetric: R_{αβ} = R_{βα}.
By differentiating Eq. (3.54), we can find

2R_{αβµν,λ} = g_{αν,βµλ} − g_{αµ,βνλ} + g_{βµ,ανλ} − g_{βν,αµλ},  (4.18)

and noting that partial derivatives commute, deduce

R_{αβµν,λ} + R_{αβλµ,ν} + R_{αβνλ,µ} = 0.  (4.19)

Recall that Eq. (3.54) was evaluated in LIF coordinates; however, since in these
coordinates Γ^µ_{αβ} = 0 (though Γ^µ_{αβ,σ} need not be zero), partial differentiation
and covariant differentiation are equivalent, and Eq. (4.19) can be rewritten

R_{αβµν;λ} + R_{αβλµ;ν} + R_{αβνλ;µ} = 0,  (4.20)

which is a tensor equation, known as the Bianchi identities.


If we perform the Ricci contraction of Eq. (4.16) on the Bianchi identities,
we obtain

R_{βν;λ} − R_{βλ;ν} + R^µ_{βνλ;µ} = 0,  (4.21)

and if we contract this in turn, we find the contracted Bianchi identity

G^{αβ}_{;β} = 0,  (4.22)

where the (symmetric) Einstein tensor G is defined as

G^{αβ} ≡ R^{αβ} − ½ g^{αβ} R.  (4.23)

From its name, and the alluring property Eq. (4.22), you can guess that this
tensor turns out to be particularly important for us. There are some further
remarks about the Ricci tensor in Section 4.2.5. Some texts define the Einstein
and Ricci tensors with a different overall sign; see Section 1.4.3.
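The definitions (4.16), (4.17), and (4.23) can be exercised by brute force on a simple example. This sketch (using the unit 2-sphere rather than a four-dimensional space-time) finds R_{ab} = g_{ab} and R = 2; in two dimensions the Einstein tensor then vanishes identically:

```python
import sympy as sp

th, ph = sp.symbols('theta phi')
coords = [th, ph]
g = sp.Matrix([[1, 0], [0, sp.sin(th)**2]])   # metric of the unit 2-sphere
ginv = g.inv()
dim = 2

def Gamma(a, b, c):
    """Christoffel symbol Gamma^a_{bc} of the metric g."""
    return sp.simplify(sum(ginv[a, d] * (sp.diff(g[d, b], coords[c])
                                         + sp.diff(g[d, c], coords[b])
                                         - sp.diff(g[b, c], coords[d]))
                           for d in range(dim)) / 2)

def Riemann(a, b, c, d):
    """Curvature component R^a_{bcd}."""
    expr = (sp.diff(Gamma(a, b, d), coords[c])
            - sp.diff(Gamma(a, b, c), coords[d])
            + sum(Gamma(a, c, e) * Gamma(e, b, d)
                  - Gamma(a, d, e) * Gamma(e, b, c) for e in range(dim)))
    return sp.simplify(expr)

# Ricci tensor R_{bd} = R^a_{bad}, Eq. (4.16), and Ricci scalar, Eq. (4.17)
Ric = sp.Matrix(dim, dim,
                lambda b, d: sp.simplify(sum(Riemann(a, b, a, d)
                                             for a in range(dim))))
R = sp.simplify(sum(ginv[b, d] * Ric[b, d]
                    for b in range(dim) for d in range(dim)))

# Einstein tensor, Eq. (4.23); it vanishes identically in two dimensions
G = sp.simplify(Ric - g * R / 2)
print(Ric)   # equals g for the unit sphere
print(R)     # -> 2
print(G)     # zero matrix
```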
Anyway, back to the physics.
4.2.2 The Equivalence Principle


Back in Chapter 1, we first described the equivalence principle (EP). It is now
finally time to use this, and to restate it in terms that take advantage of the
mathematical work we have done. The material in this section is well-discussed
in Schutz §7.1, as well as in Rindler (2006), at the end of his chapter 1 and
in §§8.9–8.10. It is discussed, one way or another, in essentially every GR
textbook, with more or less insight, so you can really take your pick.
One statement of the principle (Einstein’s, in fact) is

The Equivalence Principle (EP): All local, freely falling, nonrotating
laboratories are fully equivalent for the performance of all physical experiments.

Rindler refers to this as the ‘strong’ EP, and discusses it under that title with
characteristic care, distinguishing it from the ‘semistrong’ EP, and the ‘weak’
EP, which is the statement that inertial and gravitational mass are the same.
The EP gives us a route from the physics we understand to the physics
we don’t (yet). That is, given that we understand how to do physics in the
inertial frames of SR, we can import this understanding into the apparently
very different world of curved – possibly completely round the twist – space-
times, since the EP tells us that physics works locally in exactly the same way
in any LIF, free-falling in a curved space-time.
So that tells us that an electric motor, say, will work as happily as we
free-fall into a black hole, as it would work in any less doomed SR frame. It
does immediately constrain the general form of physical laws, since it requires
that, whatever their form in general, they must reduce to the SR version when
expressed in the coordinates of a LIF. For example, whatever form Maxwell’s
equations take in a curved space-time, they must reduce to the SR form,
Eq. (4.15), when expressed in the coordinates of any LIF. The same goes for
conservation laws such as Eq. (4.9) or Eq. (4.10). This form of the EP doesn’t,
however, rule out the possibility that the curved space-time law is (much) more
complicated in general, and simply (and even magically) reduces to a simple
SR form when in a LIF. Specifically, it doesn’t rule out the possibility of cur-
vature coupling , where the general form of a conservation law such as Eq. (4.9)
has some dependence on the local curvature, which disappears in a LIF.
For that, we need a slightly stronger wording of the EP as quoted earlier in
the section (see Schutz §7.1; Rindler §8.9 quotes this as a ‘reformulation’ of
the EP):

The Strong Equivalence Principle: Any physical law that can be expressed in
tensor notation in SR has exactly the same form in a locally inertial frame of a
curved space-time. (4.24)
The difference here is that this says, in effect, that only geometrical statements
count (this is why we’ve been making such a fuss about the primacy of
geometrical objects, and the relative unimportance of their components,
throughout the book). That is, it says that a SR conservation law such as
Eq. (4.9), T^{µν}_{,ν} = 0, has the same form in a LIF, and as a result, because
covariant differentiation reduces to partial differentiation in the LIF, the partial
derivative here is really just the LIF form of a covariant derivative, and so the
general form of this law is
T^{µν}_{;ν} = 0,  (4.25)
with the comma turning straight into a semicolon, and no extra curvature terms
appearing on the right-hand side . That is why this form of the EP is sometimes
referred to as the ‘ comma-goes-to-semicolon ’ rule.
Note that this comma-goes-to-semicolon is emphatically not what happened
in the step between, for example, Eq. (4.19) and Eq. (4.20), and in various
similar manoeuvres throughout Chapter 3 (such as before Eq. (3.41) and after
Eq. (3.54)). What was happening there was a mathematical step: covariant
differentiation of a geometrical object is equivalent to partial differentiation
when in a LIF. We have a true statement about partial differentiation in
Eq. (4.19), so the same statement must be true of covariant differentiation; such
a statement in one frame is true in any frame, hence the generality. The Strong
EP comma-goes-to-semicolon rule, on the other hand, is making a physical
statement, namely that the statement of a physical law in a LIF directly implies
a fully covariant law, which is no more complicated .
It is possibly not obvious, but the Strong EP also tells us how matter is
affected by space-time. In SR, a particle at rest in an inertial frame moves
along the time axis of the Minkowski diagram – that is, along the timelike
coordinate direction of the LIF, which is a geodesic. The Strong EP tells us
that the same must be true in GR, so that this picks out the curves generated by
the timelike coordinate of a LIF, which is to say:

Space tells matter how to move: Free-falling particles move on timelike
geodesics of the local space-time.  (4.26)

This, like the Strong EP, is a physical statement about our universe, rather than
a mathematical one. We will return to this very important point in the sections
to follow. [Exercises 4.3 and 4.4]

4.2.3 Geodesics and the Link to ‘Gravity’


We should say a little more about the rather bald statement (4.26).
This statement describes the motion of a particle in a particular space-time.
If you want to describe or predict the motion of a particle, you do it in two
steps. First, you work out which geodesic it will travel along: this involves
solving Einstein’s equations, and working out from the initial conditions of
the motion which geodesic your particle is actually on, amongst the large
number of possible geodesics going through the initial point in space-time.
Secondly, you work out how to translate from the simple motion in the inertial
coordinates attached to the particle, to the coordinates of interest (presumably
attached to you).
The key thing on the way to the important insight here, is to note that
if you’re moving along a geodesic – if you’re in free fall – you are not
being accelerated , in the very practical sense that if you were carrying an
accelerometer, it would register no acceleration. If instead you stand still on
earth, and drop a ball from your hand, the ball is showing the path you would
have taken, were it not for the floor. That is, it is the force exerted by the floor
on your feet that is accelerating you away from your ‘natural’ free-fall path. If
you hold an accelerometer in your hand – for example, a mass on a spring –
you can see your acceleration register as the spring extends beyond the length
it would have in free fall.
In other words, we’ve been thinking of this situation backwards. We’re
used to standing on the ground being the normal state, and falling being the
exceptional one (we’re primates, after all, and not falling out of trees has long
been regarded as a key skill). But GR says that we’ve got that inside out:
inertial motion, which in the presence of masses we recognise as free fall,
is the simplest, or normal, or ‘natural’ motion, requiring no explanation, and it
is not-falling that has to be explained. 3 The EP says that the force of gravity
doesn’t just feel like being forced upwards by the floor, it is being accelerated
upwards by the floor.

4.2.4 Einstein’s Equations


We have, in the run-up to statement (4.26), worked out how space-time affects
the motion of matter. We now have to work out how matter affects space-
time – where does ‘gravity’ come from? We can’t deduce this from anywhere;
we can simply make intelligent guesses about it, based on our experience of
other parts of physics – see Ptolemy’s remarks about this at the beginning
of this chapter – and hope that our (mathematical) deductions from these are
corroborated, or not, by experiment. Thus our goal in this section is to make

3 This term ‘natural motion’ is clearly not being used in a technical sense. The history of physics
might be said to consist of a sequence of attempts – by Aristotle, Ptolemy, Kepler, Galileo,
Newton, and Einstein – to identify a successively refined idea of ‘natural motion’ which
adequately and fundamentally explains the observed behaviour of the cosmos. Currently ‘move
along your locally-minkowskian t-axis’ is it.
Einstein’s equations plausible. Schutz does this in his §§8.1–8.2; Rindler does
it very well in his §§8.2 and 8.10; essentially every textbook on GR does it in
one way or another, either heuristically or axiomatically.
Newton’s theory of gravity can be expressed in terms of a gravitational
field φ. The gravitational force f on a test particle of mass m is a three-vector
with components f_i = −m φ_{,i}, and the source of the field is mass density ρ,
with the field equation connecting the two being

φ^{,i}_{,i} = 4π G ρ  (4.27)

(with the sum being taken over the three space indexes, and where φ^{,i}_{,i} =
g^{ij} φ_{,ij} = g^{ij} ∂²φ/∂x^i ∂x^j). This is Poisson’s equation. In a region that does not
contain any matter – for example an area of space that is not inside a star or a
planet or a person – the mass density ρ = 0, and the vacuum field equations are

φ^{,i}_{,i} = 0.  (4.28)
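As a quick symbolic check (not in the text), the newtonian point-mass potential φ = −GM/r satisfies the vacuum equation (4.28) everywhere away from the origin:

```python
import sympy as sp

x, y, z, G, M = sp.symbols('x y z G M', positive=True)
r = sp.sqrt(x**2 + y**2 + z**2)
phi = -G * M / r                 # newtonian point-mass potential

# phi^{,i}_{,i} is the flat-space Laplacian, which vanishes for r > 0
laplacian = sum(sp.diff(phi, c, 2) for c in (x, y, z))
print(sp.simplify(laplacian))    # -> 0
```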
Now cast your mind back to Chapter 1, and the expression in the notes
there for the acceleration towards each other of two free-falling particles. This
expression can be slightly generalised and rewritten here as

d²ξ^i/dt² = −φ^{,i}_{,j} ξ^j.  (4.29)

But compare this with Eq. (3.60): they are both equations of geodesic
deviation, suggesting that the tensor represented by R^µ_{ανβ} U^α U^β is analogous
to φ^{,i}_{,j} (we’ve used the symmetries of the curvature tensor to swap two indexes,
note, and used U rather than X to refer to the free-falling particle velocity).
Since the particle velocities are arbitrary, that means, in turn, that the φ^{,i}_{,i}
appearing in Poisson’s equation is analogous to R_{αβ} = R^µ_{αµβ}, and so a good
guess at the relativistic analogue of Eq. (4.28) is

R_{µν} = 0.  (4.30)
This guess turns out to have ample physical support, and Eq. (4.30) is known
as Einstein’s vacuum field equation for GR.
If R_{µν} = 0, then R = g^{µν} R_{µν} = 0 and therefore

G_{µν} = R_{µν} − ½ R g_{µν} = 0.

So much for the vacuum equations, but we want to know how space-time is
affected by matter. We can’t relate it simply to ρ , since Section 4.1.2 made it
clear that this was a frame-dependent quantity; the field is much more likely to
be somehow bound to the E-M tensor T instead. Looking back at Eq. (4.27),
we might guess
R^{µν} = κ T^{µν}  (4.31)
as the field equations in the presence of matter, where κ is some coupling
constant, analogous to the newtonian gravitational constant G. This looks plau-
sible, but the conservation law Eq. (4.25) immediately implies that R^{µν}_{;ν} = 0,
which, using the Bianchi identity Eq. (4.22), in turn implies that R_{;ν} = 0.
But if we use Eq. (4.31) again, this means that (g_{αβ} T^{αβ})_{;ν} = 0 also. If we
look back to, for example, Eq. (4.8), we see that this field equation, Eq. (4.31),
would imply that the universe has a constant density. Which is not the case. So
Eq. (4.31) cannot be true.4
So how about
G^{µν} = κ T^{µν}  (4.32)
as an alternative? The Bianchi identity Eq. (4.22) tells us that the conservation
equation T µν ;ν = 0 is satisfied identically. Additionally – and this is the key
part of the argument – numerous experiments tell us that Eq. (4.32) has so-far
undisputed physical validity: it has not been shown to be incompatible with our
universe. It is known as the Einstein field equation , and allows us to complete
the other half of the famous slogan
Space tells matter how to move – the statement (4.26) plus
equations (3.42) or (3.43). And matter tells space how to curve –
equation (4.32).
Einstein first published these equations in a series of papers delivered to the
Prussian Academy of Sciences in November 1915; there is a detailed account
of Einstein’s actual sequence of ideas, which is slightly (but, remarkably, only
slightly) more tentative than the description in this section may suggest, in
Janssen and Renn (2015).
There are two further points to make, both relating to the arbitrariness that
is evident in our justification of Eq. (4.32).
The first is to acknowledge that, although we were forced to go from
Eq. (4.31) to Eq. (4.32) by the observation that the universe is in fact lumpy,
there is nothing other than Occam’s razor that forces us to stop adding
complication when we arrive at Einstein’s equations. There have been various
attempts to play with more elaborate theories of gravity, but almost none so
far that have acquired experimental support. Chandrasekhar’s words on this,
quoted in Schutz §8.1, are good:
4 This argument comes from §8.10 of Rindler (2006); Schutz has a more mathematical argument
in his §8.1. Which you prefer is a matter of taste, but in keeping with our attempt to talk about
physics in this chapter, we’ll prefer the Rindler version for now.
100 4 Energy, Momentum, and Einstein's Equations

The element of controversy and doubt, that have continued to shroud
the general theory of relativity to this day, derives precisely from this
fact, namely that in the formulation of his theory Einstein incorporates
aesthetic criteria; and every critic feels that he is entitled to his own
differing aesthetic and philosophic criteria. Let me simply say that I do
not share these doubts; and I shall leave it at that.
The one variation of Einstein's equation that is now being taken seriously
is one that Einstein himself reluctantly suggested. Since g_{αβ;μ} = 0 identically,
we can add any constant multiple of the metric to the Einstein tensor without
disturbing the right-hand side of Eq. (4.32). Specifically, we can write

    G^{μν} + Λ g^{μν} = κ T^{μν}.                       (4.33)
The extra term is referred to as the cosmological constant. Einstein introduced
it in order to permit a static solution to the field equations, but the experimental
evidence for the big bang showed that this was not in fact a requirement, and
the parameter Λ was determined to be vanishingly small. Much more recently,
however, studies of dark matter and the cosmic energy budget have shown that
the large-scale structure of the universe is not completely determined by its
matter content, baryonic or otherwise, and so Λ, in the form of 'dark energy',
is now again the subject of detailed study.
The results of NASA’s WMAP mission were the first, in 2003, to show
that such a cosmological term, related to a dark energy field, is a necessary
addition to Einstein’s equations of Eq. (4.32) in order to match the universe we
find ourselves in. Subsequent results have not changed this conclusion.

4.2.5 Degrees of Freedom
Einstein’s equation 5 constitutes ten second-order nonlinear differential equa-
tions (ten since there are only ten independent components in the Einstein
tensor), which reduce to six independent equations when we take account of
the four differential identities of Eq. (4.22). Between them, these determine six
of the ten independent components of the metric gµν , with the remaining four
degrees of freedom corresponding to changes in the four coordinate functions
x µ(P ) (trivially, for example, I can decide to mark my coordinates in feet rather
than metres, or rotate the coordinate frame, without changing the physics).
The nonlinearity (meaning that adding together two solutions to the equation
does not produce another solution) is what allows space-time to couple to
5 As with the uncertain pluralisation of the Christoffel symbol(s), authors refer to Eq. (4.32) as
both the Einstein equation and equations.
itself without the presence of any curvature terms in the energy-momentum
tensor (which acts as the source of the field); it is also what makes Eq. (4.32)
devilishly difficult to solve, and Appendix B is devoted to examining some of
the solutions that have been derived over the years.
This, ultimately, is why we are so interested in the Ricci tensor, which
otherwise seems like a bit of a mathematical curiosity in Section 4.2.1. It is
the Ricci tensor, via the Einstein tensor, which is constrained by the (physically
motivated) expression Eq. (4.32). The Riemann tensor shares these constraints,
but has additional degrees of freedom that are of limited physical significance.

The identities Eq. (3.55) together reduce the number of independent
components of the curvature tensor (with a metric connection) from
256 (4⁴) to 20. That corresponds to there being 20 independent second
derivatives of the metric g_{αβ,μν} rather than 100 (100 since both the metric
and partial differentiation are symmetric). See Exercise 4.5, and the end of
Schutz §6.2.
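This counting can be checked by brute force, treating the symmetries as linear constraints on the 256 components and computing the dimension of the space of solutions. A sketch (the encoding is mine; the symmetries imposed are the two antisymmetries, the pair-exchange symmetry, and the cyclic identity):

```python
# Count the independent components of a rank-4 tensor with the symmetries of
# the curvature tensor, by imposing those symmetries as linear constraints on
# all 4^4 = 256 components and measuring the dimension of the solution space.
import itertools
import numpy as np

n = 4

def idx(a, b, c, d):
    # flatten the four indices into one position in a 256-component vector
    return ((a * n + b) * n + c) * n + d

rows = []
for a, b, c, d in itertools.product(range(n), repeat=4):
    r1 = np.zeros(n**4); r1[idx(a, b, c, d)] += 1; r1[idx(b, a, c, d)] += 1
    r2 = np.zeros(n**4); r2[idx(a, b, c, d)] += 1; r2[idx(a, b, d, c)] += 1
    r3 = np.zeros(n**4); r3[idx(a, b, c, d)] += 1; r3[idx(c, d, a, b)] -= 1
    r4 = np.zeros(n**4)           # cyclic (first Bianchi) identity
    for p in ((a, b, c, d), (a, c, d, b), (a, d, b, c)):
        r4[idx(*p)] += 1
    rows.extend([r1, r2, r3, r4])  # R_abcd = -R_bacd = -R_abdc = R_cdab

independent = n**4 - np.linalg.matrix_rank(np.array(rows))
print(independent)  # 20
```

The same machinery, with the constraint rows changed, will reproduce the count of 10 for a symmetric 4 × 4 tensor such as g_{αβ}.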
[Exercise 4.5]

4.2.6 The Field Equations from a Variational Principle
The account earlier in the section, of how we obtain Einstein’s equations, is
pragmatic, and broadly follows Einstein’s own approach to obtaining them. As
well, it conveniently introduces the idea of the energy-momentum tensor, and
lets us develop some intuitions about it. It is not the only way to obtain the
equations, however.
In Section 3.4.1, we saw, in passing, how we could obtain the geodesic
equation by extremising the integrated length of proper distance between two
points, ds = |g_{μν} dx^μ dx^ν|^{1/2}. We can do something very similar with
the Einstein–Hilbert action,

    S = (1/16π) ∫_Ω R (−g)^{1/2} d⁴x.                   (4.34)

Here, R is the Ricci curvature scalar of Eq. (4.17), g is the determinant of
the metric, and the volume of integration, Ω, is the region interior to some
boundary where we can take the variation to be zero. The Ricci scalar is
a simple object – a scalar field on space-time – that characterises the local
curvature at each point. Under a change of basis, the volume element d⁴x is
scaled by a factor of the jacobian |∂x^μ̄/∂x^μ|, and the determinant g by a
factor of (jacobian)^{−2} (these are 'tensor densities'), so that the quantity
√(−g) d⁴x is a scalar. Such volume elements, and the associated tensor
densities, also appear in the analysis of the volume elements of Section 4.1.3.
We assume that this action, S, is extremised by the variation δg^{μν} in the
metric. This is a physical statement, and it is startling that such a simple
statement – almost the simplest dimensionally consistent non-trivial statement
we can make with these raw materials – combined with the very profound ideas
of the calculus of variations, can lead us to the Einstein equations.
Calculating the variation, δS, resulting from a variation δg^{μν}, we find

    δS = ∫ d⁴x √(−g) [R_{μν} − ½ R g_{μν}] δg^{μν}      (4.35)

(the calculation is not long, but is somewhat tricky, and is described in Carroll
[2004, §4.3], and in MTW [1973, box 17.2 and chapter 21]). You will recognise
the term in square brackets from Eq. (4.23); requiring that δS = 0 for all
variations δg^{μν} therefore implies that

    G_{μν} = 0,
recovering Einstein’s vacuum field equations.


We can add a second term SM to the action, which depends on the energy-
momentum content of the space-time volume, then perform the same calcula-
tion, and discover the field equations in the presence of matter. Choosing what
that term S M should be is of course an intricate matter, but if we obtain from it
the tensor
1 δS M
T μν = − 2 √ ,
− g δg μν
then we can recover the Einstein equations of Eq. (4.32).

4.3 The Newtonian Limit


We cannot finish this book without using at least one physical metric, and the
one we shall briefly examine is the metric in the weak field limit, where space-
time is curved only slightly, such as round a small object like the earth.
Before we do that we need to get units straight, and recap Section 1.4.1. In
SR we chose our unit of time to be the metre, and we followed that convention
in this book. That meant that the speed of light c was dimensionless and exact:

    1 = c = 299 792 458 m s⁻¹.

In gravitational physics, we use natural units, for much the same reason. In SI
units, Newton's gravitational constant has the dimensions [G] = kg⁻¹ m³ s⁻²,
but it is convenient in GR to have G dimensionless, and to this end we choose
our unit of mass to be the metre, with the conversion factor between this and
the other mass unit, kg, obtained by:

    1 = G/c² = 7.425 × 10⁻²⁸ m kg⁻¹.
See Schutz’s §8.1 and Exercise 1.3 for a table of physical values in these
units. Measuring masses in metres turns out unexpectedly intuitive: when you
learn about (Schwarzschild) black holes you discover that the radius of the
event horizon of an object is twice the value of the object’s mass expressed
in metres. Also, within the solar system, the mass of the sun is less precisely
measurable than the value of the ‘heliocentric gravitational constant’, GM ² ,
which has units of m3 s− 2 in SI units, and thus units of metres in natural units
(the ‘gravitational radius’ of the sun GM ² is known to one part in 1010, but
since G is known only to one part in 104 or so, the value of M ² in kg has the
same uncertainty).
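The arithmetic of these unit conversions is easy to check; a minimal sketch using standard SI values (the numbers below are the usual reference values, not taken from the text):

```python
# Geometric units: the factor G/c^2 converts a mass in kg to a length in metres.
G = 6.674e-11               # Newton's constant, m^3 kg^-1 s^-2 (SI)
c = 299_792_458.0           # speed of light, m/s (exact)
G_over_c2 = G / c**2
print(G_over_c2)            # ~7.43e-28 m/kg, matching the conversion above

# The heliocentric gravitational constant GM_sun is known far better than G,
# so the sun's mass is best quoted in metres:
GM_sun = 1.32712440e20      # m^3 s^-2
M_sun_in_m = GM_sun / c**2  # ~1476.6 m
print(M_sun_in_m, 2 * M_sun_in_m)
```

The last line illustrates the black-hole remark above: the sun's mass in metres is about 1.48 km, so the event-horizon radius 2M of a solar-mass Schwarzschild black hole is about 2.95 km.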
In the weak-field approximation, we take the space-time round a small
object to be nearly minkowskian, with

    g_{αβ} = η_{αβ} + h_{αβ},                           (4.36)

where |h_{αβ}| ≪ 1, and the matrix η_{αβ} is the matrix of components of the
metric in Minkowski space. Note that Eq. (4.36), defining h_{αβ}, is a matrix
equation, rather than a tensor one: we are choosing coordinates in which
the matrix of components g_{αβ} of the metric tensor g is approximately equal
to η_{αβ}. If we Lorentz-transform Eq. (4.36) – using the Λ^α_{ᾱ} of SR, for
which η_{ᾱβ̄} = Λ^α_{ᾱ} Λ^β_{β̄} η_{αβ} – we get an equation of the same form
as Eq. (4.36), but in the new coordinates; that is, the components h_{αβ}
transform as if they were the components of a tensor in SR. This allows us to
express R^α_{βμν}, R_{αβ} and G_{αβ}, and thus Einstein's equation itself, in
terms of h_{αβ} plus corrections of order |h_{αβ}|². The picture here is that
g_{αβ} is the result of a perturbation on flat (Minkowski) space-time, and that
h (which encodes that perturbation) is a tensor in Minkowski space: expressing
Einstein's equations in terms of h (accurate to first order in h_{αβ}) gives us
a mathematically tractable problem to solve.
The next step is to observe that in the newtonian limit, which is the limit
where Newton's gravity works, the gravitational potential |φ| ≪ 1 and speeds
|v| ≪ 1. This implies that |T^{00}| ≫ |T^{0i}| ≫ |T^{ij}| (because T^{00} ∝ m,
T^{0i} ∝ v^i and T^{ij} ∝ v^i v^j, with v_earth ≈ 10⁻⁴). We then identify
T^{00} = ρ + O(ρv²). By matching the resulting form of Einstein's equation
with Newton's equation for gravity, we fix the constant κ in Eq. (4.32), so that,
in geometrical units,

    G^{μν} = 8π T^{μν}.                                 (4.37)
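The hierarchy of energy-momentum components quoted here is easy to make concrete; a minimal sketch, using the earth's orbital speed (a standard value, not from the text):

```python
# Order-of-magnitude hierarchy of the E-M tensor components for slow matter:
# T^00 ~ rho, T^0i ~ rho*v, T^ij ~ rho*v^2, in natural units with c = 1.
c = 299_792_458.0
v = 29.8e3 / c               # earth's orbital speed in natural units, ~1e-4
rho = 1.0                    # density scale; only the ratios matter here
T00, T0i, Tij = rho, rho * v, rho * v**2
print(v)                     # ~9.9e-5
print(T0i / T00, Tij / T00)  # each factor of v suppresses a component by ~1e-4
```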
The solution to this equation, in this approximation, is

    h_{00} = h_{11} = h_{22} = h_{33} = −2φ,            (4.38)

which translates into a metric for newtonian space-time

    g → diag(−(1 + 2φ), 1 − 2φ, 1 − 2φ, 1 − 2φ),        (4.39a)

which can be alternatively written as the interval

    ds² = −(1 + 2φ)dt² + (1 − 2φ)(dx² + dy² + dz²).     (4.39b)

See Schutz §§8.3–8.4 for the slightly intricate details of the derivation
of Eq. (4.39), and see his §7.2 for the derivation of the newtonian
geodesics of Section 4.3.2. Carroll (2004) gives an overlapping account of the
same material in §4.1 and (very usefully, with more technical background)
§7.1. We return to this approximation in Appendix B.

4.3.1 Why Is h Not a Tensor? A Digression on Gauge Symmetries
We said, in the discussion after Eq. (4.36), that h_{αβ} is not a tensor, even though
it looks like one, and in leading up to Eq. (4.39) we have treated it as one.
Although Eq. (4.36) is a tensor equation (there does exist a tensor g − η), this
is only a useful thing to do in a coordinate system in which |h_{αβ}| ≪ 1. In that
coordinate system, the approximations Eq. (4.38) and Eq. (4.39) are true to
first order in h_{αβ}. That is, the components of h_{αβ}, as approximated, cannot be
transformed into another coordinate system with an arbitrary transformation
matrix Λ, and result in correct expressions.
There is a set of transformations that preserves the approximation, however,
and it's useful to think a little more about this. Intuitively, a transformation to
any other coordinate system in which |h_{ᾱβ̄}| ≪ 1 would produce an equivalent
result. If we restrict Λ to the Lorentz transformations of SR, then the metric
η_{αβ} will be invariant, and the components h_{ᾱβ̄} = Λ^α_{ᾱ} Λ^β_{β̄} h_{αβ}
will be small for small velocities: the approximation is still good in these new
coordinates. We can think of h_{αβ} as being a tensor within a background
(Minkowski) space.
In slightly more formal terms – and see also Carroll (2004, section 7.1) –
we can consider a vector field ξ^μ(x) on the background Minkowski space, and
use this to generate a change from one coordinate system to another:

    x^ᾱ = x^α + ξ^α(x),                                 (4.40)
giving

    ∂x^α/∂x^ᾱ = δ^α_{ᾱ} − ξ^α_{,ᾱ}.
Thus the metric in these new coordinates is

    g_{ᾱβ̄} = (∂x^α/∂x^ᾱ)(∂x^β/∂x^β̄) g_{αβ}
            = (δ^α_{ᾱ} − ξ^α_{,ᾱ})(δ^β_{β̄} − ξ^β_{,β̄})(η_{αβ} + h_{αβ}).

If we restrict the ξ to those for which ξ^α_{,ᾱ} ≪ 1, and retain only terms of
leading order, then

    g_{ᾱβ̄} = η_{ᾱβ̄} + (h_{ᾱβ̄} − ξ_{ᾱ,β̄} − ξ_{β̄,ᾱ}).  (4.41)

Since ξ^α_{,ᾱ} ≪ 1, this has the same form as Eq. (4.36), meaning that vectors ξ,
which are 'small' in the sense discussed in this section, generate a family of
coordinate systems, in all of which the metric is a perturbation (|h_{αβ}| ≪ 1) on
a background minkowskian space.6
You may also see this written using a 'symmetrisation' notation,

    A_{(ij)} ≡ (A_{ij} + A_{ji})/2,                     (4.42)

which lets us write

    g_{ᾱβ̄} = η_{ᾱβ̄} + h_{ᾱβ̄} − 2ξ_{(ᾱ,β̄)}.

There is a corresponding antisymmetrisation notation A_{[ij]} ≡ (A_{ij} − A_{ji})/2.
From Eq. (3.54), we promptly find

    2R_{αβγδ} = g_{αδ,βγ} − g_{αγ,βδ} + g_{βγ,αδ} − g_{βδ,αγ}       (4.43)

and (as you can fairly straightforwardly confirm) this does not change under
the transformation h_{αβ} ↦ h_{αβ} − 2ξ_{(α,β)}.
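That confirmation can also be delegated to a computer-algebra system; a sketch with sympy (the component loop and names are mine), which checks that the variation of every component of Eq. (4.43) vanishes because partial derivatives commute:

```python
# Check that the linearised curvature (4.43) is unchanged when
# h_ab -> h_ab - xi_{a,b} - xi_{b,a}: the variation of every component of
# 2R_abcd must vanish identically, since mixed partial derivatives commute.
import itertools
import sympy as sp

X = sp.symbols('t x y z')
xi = [sp.Function(f'xi{a}')(*X) for a in range(4)]

def delta_2R(a, b, c, d):
    # change in h_pq under the gauge shift, then differentiated twice more
    def dh(p, q, r, s):
        return (-sp.diff(xi[p], X[q], X[r], X[s])
                - sp.diff(xi[q], X[p], X[r], X[s]))
    # variation of 2R_abcd = g_ad,bc - g_ac,bd + g_bc,ad - g_bd,ac
    return dh(a, d, b, c) - dh(a, c, b, d) + dh(b, c, a, d) - dh(b, d, a, c)

assert all(sp.expand(delta_2R(*ix)) == 0
           for ix in itertools.product(range(4), repeat=4))
print('linearised Riemann tensor is gauge-invariant')
```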
This situation – in which we have identified a subspace of the general
problem, in which the calculations are simpler, and physical quantities such
as the Riemann tensor are invariant – is characteristic of a problem with a
gauge invariance. If I describe and then solve a problem in classical newtonian
mechanics, the dynamics of my solution will not change if I move the origin
of my coordinates, or change units from metres to feet – that is, if I ‘re-gauge’
the solution. A more mathematical way of putting this is that the Lagrangian
is symmetric under the corresponding coordinate transformation, or that the

6 The diffeomorphism in Eq. (4.41) is related to the Lie derivative mentioned in passing in
Section 3.3.2, which is in turn related to the idea of moving along the integral curves of
the vector field ξ^μ(x).
degree of freedom that that transformation represents is not dynamically
significant; and this gives me the freedom to select coordinates, from the
continuous space of equivalent alternatives, in which the solution is easiest.
You are trained to ‘pick the right coordinates’ from the earliest stages of your
education in physics. In talking about the not-quite-tensor h_{αβ} we are not
picking a particular coordinate system, but instead implicitly identifying, in
the set of metrics perturbatively different from the flat Minkowski metric, a set
of equivalent coordinate systems. In particular, this set includes any coordinate
system in which the nearly minkowskian space of interest is described, as in
Eq. (4.36), by a minkowskian space plus small corrections.
You do the same thing when you discover that the predictions of Maxwell's
equations are unchanged if you transform the magnetic vector potential A
and electric potential φ by picking a function ψ(x, t) and changing A ↦
A + ∇ψ and φ ↦ φ − ∂ψ/∂t. If, rather than using arbitrary ψ, we
restrict ourselves to functions ψ that are such that ∇·A + ∂φ/∂t = 0,
then the remaining calculations become simpler; this is the Lorenz gauge
condition.7 This deliberate restriction of ourselves to only a subset of the
possible functions A and φ, or of picking coordinates so that |h_{αβ}| ≪ 1, is
a more sophisticated version of 'picking the right coordinates', with the same
motivation. We pick up this discussion of gauge conditions in Section B.3.
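The electromagnetic version of this statement can be verified in a few lines of sympy; a sketch in the c = 1 conventions used above (the helper names are mine):

```python
# Check that E = -grad(phi) - dA/dt and B = curl(A) are unchanged under the
# gauge transformation A -> A + grad(psi), phi -> phi - dpsi/dt (c = 1).
import sympy as sp

t, x, y, z = sp.symbols('t x y z')
r = (x, y, z)
A = sp.Matrix([sp.Function(f'A{i}')(t, x, y, z) for i in range(3)])
phi = sp.Function('phi')(t, x, y, z)
psi = sp.Function('psi')(t, x, y, z)

grad = lambda f: sp.Matrix([sp.diff(f, xi) for xi in r])
curl = lambda V: sp.Matrix([sp.diff(V[2], y) - sp.diff(V[1], z),
                            sp.diff(V[0], z) - sp.diff(V[2], x),
                            sp.diff(V[1], x) - sp.diff(V[0], y)])
E = lambda ph, Av: -grad(ph) - sp.diff(Av, t)

A_new, phi_new = A + grad(psi), phi - sp.diff(psi, t)
assert sp.expand(E(phi, A) - E(phi_new, A_new)) == sp.zeros(3, 1)
assert sp.expand(curl(A) - curl(A_new)) == sp.zeros(3, 1)
print('E and B are gauge-invariant')
```

The cancellations rely on exactly the same fact as the h_{αβ} check: mixed partial derivatives commute, so ∇ × ∇ψ and the cross terms in E vanish identically.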
In summary, therefore: the decomposition of Eq. (4.36) can be viewed
either in terms of tensors in a background (flat) space-time (as discussed
earlier in this section), or as exploitation of a gauge freedom in GR. Because
of the coordinate invariance of GR, we are free to choose coordinates (i.e.,
choose a gauge) in which the matrix h_{αβ} has desirable (i.e., simplifying)
properties. The details omitted here are to do with identifying what the
desirable simplifications are, and proving that a suitable choice of coordinates
is indeed always possible. The solution to Eq. (4.37) in terms of h can then be
fairly directly shown to be Eq. (4.38).

4.3.2 Geodesics in Newtonian Space-time
What are the geodesics in this space-time? The geodesic equation is ∇_U U = 0.
This geodesic curve has affine parameter τ, but by rescaling this parameter
through an affine transformation (τ ↦ τ/m), we can express this in terms of
7 Note the spelling: the Lorenz gauge is named after the Danish physicist Ludvig Lorenz, who is
different from the Dutch physicist Hendrik Antoon Lorentz, after whom the (Poincaré–
Larmor–FitzGerald–)Lorentz transformation is named. Your confusion about this is widely
shared, possibly even by Lorentz (Nevels & Shin 2001). It doesn’t help that there is also a
Lorenz–Lorentz relation in optics, associated with both of them, in one order or another.
the momentum p = mU. This has the advantage that the resulting geodesic
equation

    ∇_p p = 0                                           (4.44)

is also valid for photons, which have a well-defined momentum even though
they have no mass m. We shall now solve this equation, to find the path of a
free-falling particle through this space-time.
The component form of Eq. (4.44) is

    p^α p^μ_{;α} = 0,                                   (4.45)

or

    p^α p^μ_{,α} + Γ^μ_{αβ} p^α p^β = 0.                (4.46)

If we restrict ourselves to the motion of a non-relativistic particle through this
space-time, we have |p⁰| ≫ |p^i|, and we reduce this equation to

    m (d/dτ) p^μ + Γ^μ_{00} (p⁰)² = 0.                  (4.47)

The 0–0 Christoffel symbols for this metric, in this approximation, from
Eq. (4.39), are
0
³00 = φ,0 + O (φ 2) (4.48)
i 1 ij
³00 = − 2 (− 2φ),j δ . (4.49)

The 0-th component of Eq. (4.47) then tells us that

    dp⁰/dτ = −m ∂φ/∂τ,                                  (4.50)

so that the energy of the particle in this frame is conserved in a
non-time-dependent field (the particle picks up kinetic energy as it falls, and
loses gravitational potential energy). The space component is

    dp^i/dτ = −m φ_{,i},                                (4.51)

which is simply Newton's law of gravitation, f = −m∇φ.
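The Christoffel symbols (4.48) and (4.49), and hence this Newtonian limit, can be checked directly with a computer-algebra system; a sketch with sympy (the ε bookkeeping, used to truncate at first order in φ, is mine):

```python
# Verify the first-order Christoffel symbols of the weak-field metric
# g = diag(-(1+2*phi), 1-2*phi, 1-2*phi, 1-2*phi): to first order in phi,
# Gamma^0_00 = phi_{,t} (4.48) and Gamma^i_00 = phi_{,i} (4.49).
import sympy as sp

t, x, y, z, eps = sp.symbols('t x y z epsilon')
X = (t, x, y, z)
f = sp.Function('phi')(*X)
phi = eps * f                       # eps tracks powers of the small potential
g = sp.diag(-(1 + 2*phi), 1 - 2*phi, 1 - 2*phi, 1 - 2*phi)
ginv = g.inv()

def Gamma(mu, a, b):
    # Gamma^mu_ab = (1/2) g^{mu l} (g_{la,b} + g_{lb,a} - g_{ab,l})
    s = sum(ginv[mu, l] * (sp.diff(g[l, a], X[b]) + sp.diff(g[l, b], X[a])
                           - sp.diff(g[a, b], X[l])) for l in range(4))
    return sp.Rational(1, 2) * s

def first_order(expr):              # keep terms up to O(eps), then set eps = 1
    return expr.series(eps, 0, 2).removeO().subs(eps, 1)

assert sp.simplify(first_order(Gamma(0, 0, 0)) - sp.diff(f, t)) == 0  # (4.48)
assert sp.simplify(first_order(Gamma(1, 0, 0)) - sp.diff(f, x)) == 0  # (4.49)
print('weak-field Christoffel symbols check out')
```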
Thus we have come a long way in this book, from Special Relativity
back, through Ptolemy and Newton, to well before the place we started. We
have discovered that the universe is simple (the Equivalence Principle and
Eq. (4.32)), and that we are now well placed to look upward and outward,
towards the physical applications of General Relativity.
[Exercises 4.6 and 4.7]
I do not know what I may appear to the world, but to myself I seem
to have been only like a boy playing on the seashore, and diverting
myself in now and then finding a smoother pebble or a prettier shell than
ordinary, whilst the great ocean of truth lay all undiscovered before me.
Isaac Newton, as quoted in Brewster, Memoirs of the Life, Writings,
and Discoveries of Sir Isaac Newton.

Exercises
Exercise 4.1 (§4.1.2) Deduce Eq. (4.7), given that you have only the
tensors U ⊗ U and g = η to work with, that the result must be proportional
to both ρ and p, and that it must be consistent with both Eq. (4.6) and
Eq. (4.4) in the limit p = 0. Thus write down the general expression
T = (aρ + bp) U ⊗ U + (cρ + dp) g and apply the various constraints. Recall
that U = (1, 0) in the MCRF. [u+]

Exercise 4.2 (§4.1.3) Calculate the components of σ̃ and A after a Lorentz
boost of speed v along a. Recall that A^μ̄ = Λ^μ̄_{μ} A^μ, where Λ^μ̄_{μ} is given
by Eq. (2.34). Verify that ⟨σ̃, A⟩ = 0 in this frame, too. [d−]

Exercise 4.3 (§4.2.2) What would happen to an electric motor in free fall
across the event horizon of a black hole (ignore any tidal effects)? [d−]

Exercise 4.4 (§4.2.2) At various points in the development of the
mathematical theory of GR, we pick a coordinate system in which differentiation
is simple, and do a calculation using non-covariant differentiation, indicated
by a comma. We then immediately deduce the covariant result, replacing this
comma with a semicolon. Recall that the strong EP is sometimes referred to
as the comma-goes-to-semicolon rule.
Without calculation, explain the logic of each of these replacements of a
comma with a semicolon, putting particular stress on the distinction between
them. [u+]

Exercise 4.5 (§4.2.5) Prove that the curvature tensor has only 20 independent
components for a 4-dimensional manifold, when you take Eqs. (3.55a)
and (3.55b) into account.

Exercise 4.6 (§4.3.2) The geodesic equation, in terms of the momentum
one-form p̃, can be obtained by index-lowering Eq. (4.45) to obtain

    p^α p_{β;α} = 0.
By expanding this, taking advantage of the symmetry of the resulting
expression under index swaps, and using the relation p^α d/dx^α = m d/dτ,
show that

    m dp_α/dτ = ½ g_{βγ,α} p^β p^γ.                     (i)

You may need the relations

    p_{α;β} = p_{α,β} − Γ^γ_{αβ} p_γ
    Γ^γ_{αβ} = ½ g^{γλ} (g_{λα,β} + g_{λβ,α} − g_{αβ,λ})

What does Eq. (i) tell you about geodesic motion in a non-time-varying
metric? [u+]

Exercise 4.7 (§4.3.2) The Schwarzschild metric is

    g_tt = −(1 − a)        g_rr = (1 − a)⁻¹
    g_θθ = r²              g_φφ = r² sin²θ,

where a = 2M/r, M is a constant, and all other metric components are zero.
Calculate the five non-zero derivatives of the metric.
Using Eq. (i), and by considering the relevant components of dp_α/dτ,
demonstrate that:
1. if a particle is initially moving in the equatorial plane (that is, with
θ = π/2 and p^θ = 0), then it remains in that plane;
2. if a particle is released from rest in these coordinates (that is, with
p^r = p^θ = p^φ = 0, and p^t ≠ 0), it initially moves radially inwards.
Appendix A  Special Relativity – A Brief Introduction
I presume you have studied Special Relativity at some point. This appendix is intended
to remind you of what you learned there, in a way that is notationally and conceptually
homogeneous with the rest of the text here.
This appendix is intended to be standalone, but because it is necessarily rather
compact, you might want to back it up with other reading. Taylor and Wheeler (1992)
is an excellent account of Special Relativity (hereafter ‘SR’), written in a style that
is simultaneously conversational and rigorous (Wheeler, here, is the Wheeler of
MTW (1973)). Rindler (2006) is, as mentioned elsewhere, now slightly old-fashioned
in its treatment of GR, but is extremely thoughtful about the conceptual underpinnings
of SR. In contrast, the first chapter of Landau and Lifschitz (1975) gives an admirably
compact account of SR, which would be hard to learn from, but which could consolidate
an understanding otherwise obtained.
There are several popular science books that are about, or that mention, relativity –
these aren’t to be despised just because you’re now doing the subject ‘properly’. These
books tend to ignore any maths, and skip more pedantic detail (so they won’t get you
through an exam), but in exchange they spend their efforts on the underlying ideas.
Those underlying ideas, and developing your intuition about relativity, are things that
can sometimes be forgotten in more formal courses. I’ve always liked Schwartz and
McGuinness (2003), which is a cartoon book but very clear (and I declare a sentimental
attachment, since this is the book where I first learned about relativity); these books, like
this appendix, and like many other introductions to relativity, partly follow Einstein's
own popular account, Einstein (1920).

A.1 The Basic Ideas
Relativity is simple. Essentially the only new physics that will be introduced here boils
down to just:
1. All inertial reference frames are equivalent for the performance of all physical
experiments (the Principle of Relativity);
2. The speed of light has the same constant value when measured in any inertial
frame.
We must now (a) understand what these two postulates really mean and (b) examine
both their direct consequences, and the way that we have to adjust the physics we
already know.

A.1.1 Events
An ‘event’ in SR is something that happens at a particular place, at a particular instant
of time. The standard examples of events are a flashbulb going off, or an explosion, or
two things colliding.
Note that it is events , and not the reference frames that we are about to mention,
that are primary. Events are real things that happen in the real world; the separations
between events are also real; reference frames are a construct we add to events to allow
us to give them numbers, and to allow us to manipulate and understand them. That is,
events are not ‘relative to an observer’ or ‘frame dependent’ – everyone agrees that an
event happens. SR is about how we reconcile the different measurements of an
event that different, relatively moving, observers make.

A.1.2 Inertial Reference Frames
A reference frame is simply a method of assigning a position, as a set of numbers,
to events. Whenever you have a coordinate system, you have a reference frame. The
coordinate systems that spring first to mind are possibly the (x, y, z ) or (r, θ , φ) of
physics problems. You can generate an indefinite number of reference frames, fixed
to various things moving in various ways. However, we can pick out some frames as
special, namely those frames that are not accelerating.
Imagine placing a ball at rest on a flat table: you’d expect it to stay in place; if you
roll it across the table, it would move in a straight line. This is merely the expression
of Newton’s first law: ‘bodies move in straight lines at constant velocity, unless acted
on by an external force’. In what circumstances will this not be true?1 If that table is on
board a train that is accelerating out of a station, then the ball will start to roll towards
the back of the train. This observation makes perfect sense from the point of view of
someone on the station platform, who sees the ball as stationary, and the train being
pulled from under it. The station is an inertial frame, and the accelerating train carriage
is not.
This example illustrates what we will make more precise shortly, that position and
speed are frame-dependent quantities, but acceleration is not. If you are sitting in a train
carriage, then the force applied when it accelerates, which you might feel through the
seat or measure using an ‘accelerometer’ such as a plumb line, is frame-independent.
In SR, inertial frames are infinite in extent; also, any pair of inertial frames are
moving with a constant velocity with respect to each other. In GR, in contrast, inertial
frames are necessarily local, in the sense of being meaningful only in the region

1 By restricting ourselves to only horizontal motion, we evade any consideration of gravity. With
that constraint, the definition of ‘inertial frame’ here is consistent with the broader definition
appropriate to GR, which refers to a frame attached to a body in free fall, moving only
under the influence of gravity.
surrounding a point of interest; and they may be ‘accelerating’ with respect to each
other in the sense that the second derivative of position is non-zero, even though there
is no acceleration measurable in the frame (think of two people in free fall on opposite
sides of the earth).
[Exercise A.1]

A.1.3 Measuring Lengths and Times: Simultaneity
How do we measure times? Einstein put this as well as anyone else in 1905:
We must take into account that all our judgments in which time plays a part
are always judgments of simultaneous events. If, for instance, I say, ‘That train
arrives here at 7 o’clock,’ I mean something like this: ‘The pointing of the
small hand of my watch to 7 and the arrival of the train are simultaneous
events.’ Einstein (1905)
In SR, all observations are of events that are adjacent to us. If two events happen at
the same place and time – for example I set off a firecracker whilst looking at my
wristwatch,2 or two cars try to occupy the same location at the same time,3 – then they
are simultaneous for any observer who can see it: that metal was bent in a collision
cannot possibly depend on who’s looking at it or how they are moving when they see it.
The space coordinate of the event is given by my (fixed and known) position within a
frame. The time coordinate of an event is the time on my watch when I observe it, and
my watch is synchronised with all the other clocks in the frame (one can go into a great
deal of detail about this synchronisation process; see for example Rindler (2006, §2.5)
and Taylor and Wheeler ( 1992, §2.6)).
If the event happens some distance away, however (answering a question such as
‘what time does the train pass the next signal box?’), or if we want to know what time
was measured by someone in a moving frame (answering, for example, ‘what is the
time on the train driver’s watch as the train passes through the station?’), things are
not so simple, as most of the rest of this text makes clear. We will typically imagine
multiple observers at multiple events; indeed we imagine one local observer per frame
of interest, stationary in that frame, and responsible for reporting the space and time
coordinates of the event ‘as measured in that frame’.
By making only local observations we avoid worrying about light-travel time. To
make observations extended in space or time, we employ multiple observers. For
example, we might measure the ‘length of a rod’ by subtracting the coordinates of two
observers who were adjacent to opposite ends of the rod at the same prearranged time.

A.1.4 Standard Configuration
Finally, a bit of terminology to do with reference frames.
Two frames S and S′, with spatial coordinates (x, y, z) and (x′, y′, z′) and time
coordinates t and t′ are said to be in standard configuration (Figure A.1) if:

2 An educational experience, in many ways, but one probably best kept as a thought experiment.
3 Ditto.
Figure A.1 Standard configuration.

1. they are aligned so that the (x, y, z) and (x′, y′, z′) axes are parallel;
2. the frame S′ is moving along the x axis with velocity V;
3. we set the zero of the time coordinates so that the origins coincide at t = t′ = 0
(which means that the origin of the S′ frame is always at position x = Vt).
When we refer to ‘frame S’ and ‘frame S′’, we will interchangeably be referring either
to the frames themselves, or to the sets of coordinates (t, x, y, z) or (t′, x′, y′, z′).
Frame S′ will often be termed the rest frame; however, it should always be the rest
frame of something. Yes, it does seem a little counterintuitive that it’s the ‘moving
frame’ that’s the rest frame, but it’s called the rest frame because it’s the frame in which
the thing we’re interested in – be it a train carriage or an electron – is at rest. It’s in the
rest frame of the carriage that the carriage is measured to have its rest length or proper
length.

A.2 The Postulates


Galileo described the Principle of Relativity in 1632, in Dialogue Concerning the Two
Chief World Systems, using an elaborate image which invites the reader to imagine
observing animals and falling water, and jumping fore and aft, first in a ship in
harbour and then in the same ship moving at a constant velocity, and finding oneself
unable to tell the difference. This is a statement that ‘you can’t tell
you’re moving’ – there is nothing you can do in the ship, or in a train or a plane,
without looking outside, which will let you know whether you’re stationary or moving
at a constant velocity. More formally, and using the language in the previous section:

The Principle of Relativity: All inertial frames are equivalent for the
performance of all physical experiments.
That is, there is no place for the idea of a standard of absolute rest.
From the Relativity Principle (RP), one can show that, with certain obvious (but, as
we shall discover, wrong) assumptions about the nature of space and time, one could
derive the (apparently also rather obvious) Galilean transformation (GT)
x′ = x − Vt    y′ = y    z′ = z    t′ = t    (A.1)
between two frames in the standard configuration of Section A.1.4. This transformation
relates the coordinates of an event (t, x, y, z), measured in frame S, to the coordinates of
the same event (t′, x′, y′, z′) in frame S′. Differentiating these, we find that

v′x = vx − V    v′y = vy    v′z = vz    a′ = a,


where vx is the x-component of velocity, and so on.
If you take the RP as true, then it follows that any putative law of mechanics that
does appear to allow you to distinguish between reference frames cannot in fact be
a law of physics. That is, the RP, in classical mechanics, demands that all laws of
mechanics be covariant under the Galilean Transformation. What that means is that
physical laws take the same form whether they are expressed in the coordinates of S or
of S′, related by a GT.
Consider for example the constant-acceleration equation x = v₀t + at²/2. If we
transform this into the moving frame using Eq. (A.1), we immediately find

x′ = v₀′t′ + a′t′²/2

– that is, we find exactly the same relation, as if we had simply put primes on each of the
quantities. This is known as ‘form invariance’, or sometimes ‘covariance’, and indicates
that (in this example) the expressions for x and x′ have exactly the same form, with the
only difference being that we have different numerical values for the coefficients and
coordinates.
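This form invariance is easy to check numerically. The following Python fragment (an illustration, not part of the original text; all numerical values are arbitrary) verifies that an event on the constant-acceleration trajectory in S lies on the same-form trajectory in S′, with v₀′ = v₀ − V and a′ = a:

```python
# Check that x = v0*t + a*t**2/2 keeps its form under the Galilean
# transformation of Eq. (A.1): x' = x - V*t, t' = t.
def galilean(t, x, V):
    """Transform event coordinates (t, x) into a frame moving at speed V."""
    return t, x - V * t

v0, a, V = 3.0, 2.0, 5.0            # arbitrary values in consistent units
for t in [0.0, 1.0, 2.5, 4.0]:
    x = v0 * t + a * t**2 / 2       # trajectory in S
    tp, xp = galilean(t, x, V)      # same event, coordinates in S'
    # same form, with primed coefficients v0' = v0 - V and a' = a:
    assert abs(xp - ((v0 - V) * tp + a * tp**2 / 2)) < 1e-12
```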
Maxwell’s equations, however, are not covariant under a GT: neither they nor the
wave equation transform into themselves under a GT, and they take their simplest
form (that is, their well-known form) only in a ‘stationary’ frame. Einstein noted
that the phenomena of electrodynamics appeared to depend only on relative motion,
and so should not take a different form when viewed in a moving frame. His famous 1905 paper
is very clear on this point (‘On the Electrodynamics of Moving Bodies’), and the very
first words of it are:
It is known that Maxwell’s electrodynamics – as usually understood at the
present time – when applied to moving bodies, leads to asymmetries which do
not appear to be inherent in the phenomena. Take, for example, the reciprocal
electrodynamic action of a magnet and a conductor . . . Einstein (1905)
This paper then briskly elevates the principle of relativity to the status of a postulate, and
adds to it a second one, stating that the speed of light has the same value ‘independent
of the state of motion of the emitting body’: no matter what sort of experiment you
are doing, whether you are directly observing the travel time of a flash of light, or
doing some interferometric experiment, the speed of light relative to your apparatus
will always have the same numerical value. This is perfectly independent of how fast
you are moving: it is independent of whichever inertial frame you are in, so that another
observer, measuring the same flash of light from their moving laboratory, will measure
the speed of light relative to their detectors to have exactly the same value.
There is no real way of justifying either of these postulates: they are simply truths of
our universe, and we can do nothing more than demonstrate their truth through
experiment.
[Exercise A.2]

A.2.1 Further Details


It is fairly easy to discuss the transformation properties of the wave equation, slightly
more involved for Maxwell’s Equations. Bell ( 1987, chapter 9) discusses this, or rather
the Lorentz transformation of Maxwell’s Equations, in some depth. More advanced

textbooks on electromagnetic theory also tend to have sections on SR, which make this
point more or less emphatically.
The aether drift experiments are discussed in most relativity textbooks. The
sci.physics.relativity FAQ (Roberts and Schleif 2007) provides a large list
of references to experimental corroboration of SR. For an interesting sociological and
historical take on the Michelson-Morley experiments, and the context in which they
were interpreted, see also Collins and Pinch (1993, chapter 2). Barton (1999, §§3.1
& 3.4) presents the underlying ideas clearly and at length, discusses experimental
corroboration, and provides ample further references.
The constancy of the speed of light is not the only second postulate you could have.
You could take alternatives such as ‘Maxwell’s Equations are true’, or ‘Moving clocks
run slow according to . . . ’, or any other statement that picked out the phenomena of SR,
and you could still derive the results of SR, including, for an encore, the constancy of
the speed of light. However, this particular second postulate is a particularly simple
and fundamental one, which is why it is much the best choice. Alternatively, you
could choose as a second postulate something like ‘c is infinite’ or ‘The Galilean
Transformation is true’, and derive from the pair of postulates the rest of the laws of
classical mechanics. The point here is that each pair of postulates would give you a
perfectly consistent theory – a perfectly possible world – but the Galilean Transformation
is one that does not happen to match our world other than as a low-speed approximation.
Taking a more mathematical tack, Rindler (2006, §2.17), and Barton less abstractly
(1999, §4.3), show that the only linear transformations consistent with the Euclidicity
and isotropy of inertial frames are the Galilean and Lorentz Transformations. A
second postulate consisting of ‘there is no upper limit to the speed of propagation of
interactions’ picks out the GT; the statement ‘there is an upper speed limit’ (which
the first postulate implies is the same in all frames) instead picks out the Lorentz
Transformation with a dependence on that constant speed, and saying ‘. . . and light
moves at that speed’ sets the value of the constant. See also Rindler’s other remarks
(2006, §2.7) on the properties of the Lorentz Transformation; Landau and Lifshitz
(1975, §1) take this tack, and are as lucidly compact as ever.
Taking a more historical tack, I have quoted Einstein’s own (translated) words,
not because the argument depends on his authority (it doesn’t), but firstly because he
introduces the key arguments with admirable compactness, and secondly because it
is a very rare example of the first introduction of a core physical theory still being
intelligible after it has been absorbed into the bedrock of physics. [Exercise A.3]

A.3 Spacetime and the Lorentz Transformation


A.3.1 Length Contraction and Time Dilation
Imagine two observers with synchronised watches, standing at each end of a train
carriage: Fred (at the front) and Barbara (at the back). 4 At a prearranged time ‘0’, a

4 This argument ultimately originates from Einstein’s popular book about relativity (Einstein,
1920), first published in English in 1920. It reappears in multiple variants, in planes, trains,
automobiles, and rockets, in many popular and professional accounts of relativity. The variant
described here is most directly descended from Rindler’s version (2006).

Figure A.2 Passing trains: (a, left) flash reaches rear of carriage; (b, right) rear
observers coincide.

flashbulb fires at the centre of the carriage and the observers record the time the flash
reaches them. Since Fred and Barbara are equidistant from the bulb, their times must
be the same, for example time ‘3’ units. In other words, Fred’s and Barbara’s watches
both reading ‘3’, are simultaneous events in the frame of the carriage.
Observing from the platform, we would see the light from the flash move both
forward towards Fred and backwards towards Barbara, but at the same speed c , as
measured on the platform. Consequently, the flash would naturally get to Barbara first.
If, standing on the platform, you were to take a photograph at this point, you would
get something like the upper part of Figure A.2a. Barbara’s watch must read ‘3’, since
the flash meeting her and her watch reading ‘3’ are simultaneous at the same point in
space, and so must be simultaneous for observers in any frame. But at this point, the
light moving towards Fred cannot yet have caught up with him: since the light reaches
Fred when his watch reads ‘3’, his watch must still be reading something less than
that, ‘1’, say. In other words, Barbara’s watch reading ‘3’ and Fred’s watch reading ‘1’
are simultaneous events in the inertial frame of the platform.
Now imagine observing two such trains go past, timetabled such that we can obtain
the observations in Figure A.2a, where the light has reached both rear observers and
neither front one. Now pause a moment, and take another photograph when the two
rear observers are beside each other, this time getting Figure A.2b.
Barbara can report observing the front of the other carriage passing at time ‘3’,
whereas Fred reports the back of that carriage passing earlier, at time ‘1’. They can
therefore conclude that they have measured the length of the other carriage and found
it to be shorter than their own one. This is length contraction.
Similarly, Fred observes the rear clock in Figure A.2a as being two units fast,
compared to his own. But Barbara can later, in Figure A.2b, observe that same clock to
be reading ‘11’ at the same time as hers, no longer fast. They know their own clocks
were synchronised, so they can conclude that the rear clock in the other carriage was
going more slowly than their clocks. This is time dilation.
Notably, Barbara and Fred’s counterparts in the other carriage would come to
precisely the same conclusions. Because this setup is perfectly symmetrical, they would
measure Barbara and Fred’s clocks to be moving slowly, and their carriage to be shorter.
There is no sense in which one of the carriages is absolutely shorter than the other.

A.3.2 The Light Clock


We can put numbers to the effects.
The light clock (see Figure A.3) is an idealised timekeeper, in which a flash of
light leaves a bulb, bounces off a mirror, and returns – this is one ‘tick’ of the

Figure A.3 The light clock, shown in its rest frame (left) and as observed in a
frame in which the clock is moving at speed v (right).

clock. If the mirror and the flashbulb are a distance L apart, then 2L = ct′, where
t′ is the time on the watch of an observer standing by, and moving with, the clock,
and c is the frame-independent speed of light. Also note that the clock’s mirrors are
arranged perpendicular to the clock’s motion, and both the stationary and the moving
observer measure the same separation between them – there is no length contraction
perpendicular to the motion.5 Examining the same tick in a frame in which the clock is
moving, we find (ct/2)² = L² + (vt/2)² and thus, using the expression for L above,

t′ = t/γ, (A.2)

where the factor γ = γ(v) is defined as

γ = (1 − v²/c²)^(−1/2). (A.3)

Now, the important thing about this equation is that it involves t′, the time for the clock
to ‘tick’ as measured by the person standing next to it on the train, and it involves t, the
time as measured by the person on the platform, and they are not the same.
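We can check the light-clock algebra numerically. The short Python sketch below (illustrative only; the values of L and v are arbitrary) solves (ct/2)² = L² + (vt/2)² for the platform-frame tick time t and confirms Eq. (A.2):

```python
from math import sqrt

c = 3.0e8        # speed of light, m/s
L = 1.0          # bulb-mirror separation, m
v = 0.6 * c      # clock's speed in the platform frame

t_prime = 2 * L / c                   # tick time in the clock's rest frame
gamma = 1 / sqrt(1 - v**2 / c**2)     # Eq. (A.3)

# Solving (c*t/2)**2 = L**2 + (v*t/2)**2 for the platform-frame tick time:
t = 2 * L / sqrt(c**2 - v**2)
# Eq. (A.2): the moving clock's tick is dilated, t' = t/gamma.
assert abs(t_prime - t / gamma) < 1e-15 * t_prime
```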

A.3.3 The Invariant Interval


If we take the light flash (call it event 1) to have coordinates t₁′ = x₁′ = 0 in the
clock’s frame, and coordinates t₁ = x₁ = 0 in the ‘lab’ frame (Section A.1.4), then the
detection of the reflection (event 2) has coordinates x₂′ = 0, t₂′ = 2L/c, x₂ = vt₂ and
t₂ = γt₂′ = 2γL/c. If we now calculate Δx′² − c²Δt′² or Δx² − c²Δt² we obtain −(2L)²
in both cases. This is not a coincidence.
In general, if we calculate the corresponding separation between any two events,
then we will obtain the same value irrespective of which frame’s coordinates we use.
This quantity is frame invariant. We have illustrated this here rather than proved it, but
there is a proof, which depends only on the two postulates of SR, in Schutz (2009, §1.6)

5 If there were a perpendicular length contraction, then observers would see a passing relativistic
train’s axles contract so that they derailed inside the tracks; however observers on the train
would see the passing sleepers contract, so that the train would derail outside the tracks; the
contradiction implies there can be no such contraction.

or in Landau and Lifshitz (1975). Alternatively, assuming this in place of the second
postulate would allow us to deduce the constancy of the speed of light.
This quantity Δs² = Δx² − c²Δt² is referred to as the interval, or sometimes,
interchangeably, as the squared interval or the invariant interval.
From here on, we will handle coordinates only in natural units, in which c = 1,
with the result that we define

Δs² = Δx² − Δt². (A.4)

Some authors define the interval with the opposite sign. The definition here is
compatible with Schutz but opposite to Rindler.
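The invariance illustrated above can be rechecked with the light-clock coordinates of Section A.3.3. The Python fragment below (an illustration, not part of the original text; L and v are arbitrary) computes the interval between emission and detection in both frames, in natural units:

```python
from math import sqrt

# Interval between the flash and its detection in the light-clock example,
# computed in both frames with c = 1.
L, v = 1.0, 0.75
gamma = 1 / sqrt(1 - v**2)

# Clock frame: event 1 at (0, 0), event 2 at (t2', x2') = (2L, 0).
s2_clock = 0**2 - (2 * L)**2            # Δx'² − Δt'²
# Lab frame: t2 = γ t2', x2 = v t2.
t2 = gamma * 2 * L
s2_lab = (v * t2)**2 - t2**2            # Δx² − Δt²
assert abs(s2_clock - s2_lab) < 1e-12   # same interval in both frames
```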

A.3.4 The Lorentz Transformation


We have now obtained results describing length contraction and time dilation. We now
wish to find a way of relating the coordinates of any event, as measured in any pair of
frames in relative motion. That relation – a transformation from one coordinate system
to another – is the Lorentz Transformation (LT).
Specifically, consider two frames in standard configuration, and imagine an event
such as a flashbulb going off; observers in the frames S and S′ will give this event
different coordinates (t, x, y, z) and (t′, x′, y′, z′). How are these coordinates related?
First, we can note that y′ = y and z′ = z (this is just a restatement of the lack of
a perpendicular length contraction, discussed in Section A.3.2). Therefore we can take
the event to happen on the x-axis, at y = z = 0.
Now imagine a second event, located at the origin with coordinates (t, x) = (0, 0) in
frame S and (t′, x′) = (0, 0) in frame S′. Since we have two events, we have an interval
between them, with the value s² = (x − 0)² − (t − 0)² = x² − t² in frame S. Since
the interval is frame-independent, the calculation of this interval done by the observer
in the primed frame will produce the same value:

x² − t² = s² = x′² − t′². (A.5)

Thus the relationship between (t, x) and (t′, x′) must be one for which Eq. (A.5) is true.
If we consider two frames on the xy plane related by a rotation, then their coordinates
will be related by
x′ = x cos θ + y sin θ
y′ = −x sin θ + y cos θ. (A.6)

and this will preserve the euclidean distance r² = x² + y². This is strongly reminiscent
of Eq. (A.5), and we can make it more so by writing l = it and l′ = it′, so that Eq. (A.5)
becomes

x² + l² = s² = x′² + l′². (A.7)

This strongly suggests that the pairs (l, x) and (l′, x′) can be related by writing down the
analogue of Eq. (A.6), replacing y ↦ l and y′ ↦ l′, for some angle θ that depends on v,
the relative speed of frame S′ in frame S. That is, this specifies a linear relation for which
Eq. (A.5) is true. If we finally write θ = iφ (since l is pure imaginary, so is θ, so that φ

is real), and recall the trigonometric identities sin iφ = i sinh φ and cos iφ = cosh φ,
then this expression for x′ and l′ becomes

x′ = x cosh φ − t sinh φ (A.8a)
t′ = −x sinh φ + t cosh φ. (A.8b)

Now consider an event at x′ = 0 for some unknown t′. This happens at time t in
the unprimed frame and thus happens at position x = vt in that frame, in which case
Eq. (A.8a) can be rewritten as

tanh φ (v) = v. (A.9)

Since we now have φ as a function of v, we have, in Eq. (A.8), the full transformation
between the two frames; combining these with a little hyperbolic trigonometry (remem-
ber cosh²φ − sinh²φ = 1), we can rewrite Eq. (A.8) in the more usual form (for c = 1)

t′ = γ(t − vx) (A.10a)
x′ = γ(x − vt), (A.10b)

where the trivial transformations for y and z complete the LT, and (as in Eq. (A.3) but
now with c = 1)

γ = (1 − v²)^(−1/2). (A.11)

If frame S′ is moving with speed v relative to S, then S must have a speed −v relative
to S′. Swapping the roles of the primed and unprimed frames, the transformation from
frame S′ to frame S is exactly the same as Eq. (A.10), but with the opposite sign for v:

t = γ(t′ + vx′) (A.12a)
x = γ(x′ + vt′), (A.12b)

which can be verified by direct solution of Eq. (A.10) for the unprimed coordinates.
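That verification is also easy to do numerically. The Python sketch below (illustrative only; the event coordinates and speed are arbitrary) implements Eq. (A.10), checks that the boost with −v inverts it, and checks that the interval of Eq. (A.5) is preserved:

```python
from math import sqrt

def lorentz(t, x, v):
    """Boost (t, x) into a frame moving at speed v (c = 1); Eq. (A.10)."""
    gamma = 1 / sqrt(1 - v**2)
    return gamma * (t - v * x), gamma * (x - v * t)

t, x, v = 2.0, 1.5, 0.6
tp, xp = lorentz(t, x, v)
# The inverse transformation, Eq. (A.12), is the same boost with -v:
t2, x2 = lorentz(tp, xp, -v)
assert abs(t2 - t) < 1e-12 and abs(x2 - x) < 1e-12
# The interval of Eq. (A.5) is frame-independent:
assert abs((x**2 - t**2) - (xp**2 - tp**2)) < 1e-12
```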

A more direct, but less physically illuminating, route to the LT is to note
that the transformation from the unprimed to the primed coordinates must be
linear, if the equations of physics are to be invariant under a shift of origin. That is, we
must have a transformation like t′ = Ax + By + Cz + Dt, and similarly (with different
coefficients) for x′, y′ and z′. By using RP and the constancy of the speed of light, one
can deduce the transformation given in Eq. (A.10). See Rindler (2006, §2.6), Taylor and
Wheeler (1992, §§L.4–L.5), or Barton (1999, §4.3) for details. Rindler (2006, §2.17)
shows an even more powerful consequence of the same ideas. If we use this argument
to deduce that the relationship must be x′ = γ(x − vt), for some unknown constant γ
(Rindler §2.6), then we can instead deduce the LT, and the expression for γ, from the
construction in Section A.3.3 by considering a fourth event, located at the front of the
carriage at time t₄ = 0.

A.3.5 Addition of Velocities


Adding and subtracting the expressions in Eq. (A.8), and recalling that e^(±φ) = cosh φ ±
sinh φ, we find

t′ − x′ = e^φ (t − x) (A.13a)
t′ + x′ = e^(−φ) (t + x), (A.13b)

as yet another form (once we add y′ = y and z′ = z) of the LT.
If we now have three frames, S, S′ and S′′, where the mutual velocities of S′ vs. S, and
S′′ vs. S′, are v₁ and v₂ respectively, how does the velocity v of S′′ in S depend on v₁
and v₂ (being in standard configuration implies that v₁ is parallel to v₂)? Applying
Eq. (A.13a) twice produces

t′′ − x′′ = e^φ (t − x) = e^(φ₁+φ₂) (t − x), (A.14)

where φ, φ₁, and φ₂ are the hyperbolic velocity parameters corresponding to v, v₁
and v₂. This shows us how to add velocities: Eq. (A.9) plus a little more hyperbolic
trigonometry (tanh(φ₁ + φ₂) = (tanh φ₁ + tanh φ₂)/(1 + tanh φ₁ tanh φ₂)) produces

v = (v₁ + v₂)/(1 + v₁v₂). (A.15)
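The rapidity route to this addition law can be checked directly, since Python’s math module supplies tanh and its inverse. The fragment below (illustrative only; the speeds are arbitrary) confirms that Eq. (A.15) agrees with adding the hyperbolic parameters of Eq. (A.9):

```python
from math import atanh, tanh

def add_velocities(v1, v2):
    """Relativistic velocity addition, Eq. (A.15), with c = 1."""
    return (v1 + v2) / (1 + v1 * v2)

v1, v2 = 0.6, 0.7
# Rapidity route: phi = atanh(v) by Eq. (A.9); rapidities simply add (Eq. A.14).
v_rapidity = tanh(atanh(v1) + atanh(v2))
assert abs(add_velocities(v1, v2) - v_rapidity) < 1e-12
# However close the inputs get to c, the sum never exceeds the speed of light:
assert add_velocities(0.99, 0.99) < 1.0
```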

The form of the LT shown in Eq. (A.13), and the addition law in Eq. (A.14),
conveniently indicate three interesting things about the LT: (i) for any two
transformations performed one after the other, there exists a third with the same net
effect (i.e., the LT is ‘transitive’); (ii) there exists a transformation (with φ = 0) that
maps ( t, x) to themselves (i.e., there exists an identity transformation); (iii) for every
transformation (with φ = φ₁, say) there exists another transformation (with φ = −φ₁)
that results in the identity transformation (i.e., there exists an inverse). These three
properties are enough to indicate that the LT is an example of a mathematical ‘group’,
known as the Lorentz group.
Any two IFs, not just those in standard configuration, may be related via a
sequence of transformations, namely a translation to move the origin, a rota-
tion to align the axes along the direction of motion, a LT, another rotation, and another
translation. The transformation that augments the LT with translations and rotations is
known as a Poincaré transformation, and it is a member of the Poincaré group.

A.3.6 Proper Time and the Invariant Interval


In some frame S, consider two events, one at the origin, and one on the x-axis at
coordinates (t, x); take the two events to be timelike-separated, so that it is possible
for some clock, moving at a constant velocity v < c, to be present at both events. In a
frame attached to the clock, the spatial separation of the two events is zero, and the time
coordinate in this frame is the time shown on the clock, which we shall label τ and call
the proper time. Given the coordinates (t, x) in S, we can use Eq. (A.10a) to transform
to the clock’s rest frame and write
τ = γ (t − vx). (A.16)

This calculated proper time would be agreed on by all the observers who could not agree
on the spatial or temporal separations of the two events. In other words, this number τ
is invariant under a LT – it is a Lorentz scalar.
In the clock’s frame, the interval separating these two events is just s² = −τ², so
that the invariance of the proper time is just another manifestation of the invariance of
the interval, Eq. (A.4).
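The two routes to the proper time can be compared numerically. The Python sketch below (an illustration, not part of the original text; the event coordinates are arbitrary, chosen timelike-separated) evaluates Eq. (A.16) and the interval form, and shows they agree:

```python
from math import sqrt

# Proper time between the origin and a timelike-separated event (t, x),
# computed two ways with c = 1: from Eq. (A.16) with v = x/t, and from
# the interval, tau = sqrt(t**2 - x**2).
t, x = 5.0, 3.0
v = x / t                          # speed of the clock present at both events
gamma = 1 / sqrt(1 - v**2)
tau_lt = gamma * (t - v * x)       # Eq. (A.16)
tau_interval = sqrt(t**2 - x**2)   # from s² = −τ²
assert abs(tau_lt - tau_interval) < 1e-12
```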

A.4 Vectors, Kinematics, and Dynamics


Section A.3 was concerned with static events as observed from moving frames. In this
part, we are concerned with particle motion.

A.4.1 Kinematics
Consider a prototype displacement vector (Δx, Δy, Δz). These are the components
of a vector with respect to the usual axes ex, ey, and ez. We can rotate these into
new axes e′x, e′y, and e′z using one or more rotation matrices, and obtain coordinates
(Δx′, Δy′, Δz′) for the same displacement vector, with respect to the new axes. These
primed coordinates are different from the unprimed ones, but systematically related
to them.
In four-dimensional spacetime, the prototype displacement 4-vector is ΔR =
(Δt, Δx, Δy, Δz), relative to the space axes and wristwatch of a specific observer,
and the transformation that takes one 4-vector into another is the familiar LT of
Eq. (A.10), or
⎛Δt′⎞   ⎛  γ   −γv   0   0⎞ ⎛Δt⎞
⎜Δx′⎟ = ⎜−γv    γ    0   0⎟ ⎜Δx⎟     (A.17)
⎜Δy′⎟   ⎜  0    0    1   0⎟ ⎜Δy⎟
⎝Δz′⎠   ⎝  0    0    0   1⎠ ⎝Δz⎠

with an inverse transformation obtained by swapping v ↔ − v. These give the co-


ordinates of the same displacement as viewed by a second observer whose frame is in
standard configuration with respect to the first.
We recognise as a 4-vector any geometrical object, the components of which,
namely (A⁰, A¹, A², A³), transform in the same way as the coordinate transformation
of Eq. (A.17).
It is no more than fiddly to extend Eq. (A.17) to Lorentz boosts that are not along
the x-axis.
A 3-rotation, when applied to a 3-vector A, conserves the euclidean length-squared
of A, (Δx)² + (Δy)² + (Δz)², or, more generally, conserves the value of the inner
product of two 3-vectors, A · B. In exactly the same way, the transformation Eq. (A.17)
conserves the corresponding inner product for 4-vectors,

A · B = −A⁰B⁰ + A¹B¹ + A²B² + A³B³ (A.18a)
      = Σμ,ν ημν A^μ B^ν, (A.18b)

where the matrix ημν is defined as

ημν = diag(−1, 1, 1, 1). (A.19)
The inner product of a vector with itself, A · A, is its norm (or length-squared), and
from this definition we can see that the norm of the displacement vector is ΔR · ΔR =
−Δt² + Δx² + Δy² + Δz² = Δs², compatible with Eq. (A.5).
If the norm is positive, negative, or zero, then the vector (or the displacement) is
termed respectively spacelike, timelike, or null. Two vectors are orthogonal if their
inner product vanishes; it follows that in this geometry, a null vector (with A · A = 0) is
orthogonal to itself.
This matrix η is the metric of SR. Just as with the invariant interval (which is really
just a consequence of this definition), some authors define the metric with the opposite
signs.
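The statement that the LT of Eq. (A.17) conserves the inner product of Eq. (A.18) is equivalent to the matrix identity Λᵀ η Λ = η. The following sketch (illustrative only; it assumes the third-party numpy library is available, and the boost speed and vector components are arbitrary) checks this:

```python
import numpy as np

v = 0.6
gamma = 1 / np.sqrt(1 - v**2)
# Boost matrix of Eq. (A.17) and metric of Eq. (A.19):
Lam = np.array([[gamma, -gamma * v, 0, 0],
                [-gamma * v, gamma, 0, 0],
                [0, 0, 1, 0],
                [0, 0, 0, 1]])
eta = np.diag([-1.0, 1.0, 1.0, 1.0])

# The LT preserves the metric, Λᵀ η Λ = η, and hence preserves all
# inner products of the form in Eq. (A.18):
assert np.allclose(Lam.T @ eta @ Lam, eta)

A = np.array([1.0, 0.2, -0.3, 0.5])
B = np.array([2.0, -1.0, 0.4, 0.1])
assert np.isclose(A @ eta @ B, (Lam @ A) @ eta @ (Lam @ B))
```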

A.4.2 Velocity and Acceleration


Since the displacement 4-vector ΔR is a vector (in the sense that it transforms properly
according to Eq. (A.17)), so is the infinitesimal displacement dR; since the proper
time τ (see Section A.3.6) is a Lorentz scalar, we can divide each component of this
infinitesimal displacement by the proper time and still have a vector. This latter vector
is the 4-velocity:

U = dR/dτ = (dx⁰/dτ, dx¹/dτ, dx²/dτ, dx³/dτ), (A.20)

where we have written the components of dR as dx^μ. We can write this as U^μ =
dx^μ/dτ. By the same argument, the 4-acceleration A^μ = dU^μ/dτ = d²x^μ/dτ² is a
4-vector, also.
Let us examine these components in more detail. The infinitesimal proper time is
given by dτ² = dt² − |dr|², so that

(dτ/dt)² = (dτ)²/(dt)² = 1 − |dr|²/(dt)² = 1 − v² = 1/γ².

From this, promptly,


U⁰ = dx⁰/dτ = dt/dτ = γ (A.21a)
Uⁱ = dxⁱ/dτ = (dt/dτ)(dxⁱ/dt) = γvⁱ. (A.21b)
You can view Eq. (A.21a) as yet another manifestation of time dilation. Thus we can
write
U = (γ, γvx, γvy, γvz) = γ(1, vx, vy, vz) = γ(1, v), (A.22)
using v to represent the three (space) components of the (spatial) velocity vector, and
where the last expression introduces a convenient notation.
In a frame that is co-moving with a particle, the particle’s velocity is U = (1, 0, 0, 0),
so that, from Eq. (A.18), U · U = − 1; since the inner product is frame invariant, it must
have this same value in all frames, so that, quite generally, we have the relation

U · U = − 1. (A.23)
You can confirm that this is indeed true by applying Eq. (A.18) to Eq. (A.22).
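That confirmation is a one-liner to automate. The Python sketch below (illustrative only; the velocity components are arbitrary) builds U = γ(1, v) from Eq. (A.22), evaluates the inner product of Eq. (A.18), and recovers Eq. (A.23):

```python
from math import sqrt

def four_velocity(vx, vy, vz):
    """U = γ(1, v), Eq. (A.22), in natural units (c = 1)."""
    v2 = vx**2 + vy**2 + vz**2
    gamma = 1 / sqrt(1 - v2)
    return (gamma, gamma * vx, gamma * vy, gamma * vz)

def minkowski(A, B):
    """Inner product of Eq. (A.18), with η = diag(−1, 1, 1, 1)."""
    return -A[0] * B[0] + A[1] * B[1] + A[2] * B[2] + A[3] * B[3]

U = four_velocity(0.3, -0.4, 0.5)
assert abs(minkowski(U, U) + 1) < 1e-12   # Eq. (A.23): U·U = −1
```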
Here, we defined the 4-velocity by differentiating the displacement 4-vector, and
deduced its value in a frame co-moving with a particle. We can now turn this on its
head, and define the 4-velocity as a vector that has norm − 1 and that points along
the t-axis of a co-moving frame (this is known as a ‘tangent vector’, and is effectively a
vector ‘pointing along’ the world line). We have thus defined the 4-velocity of a particle
as the vector that has components (1, 0) in the particle’s rest frame. Note that the norm
of the vector is always the same; the particle’s speed relative to a frame S is indicated
not by the ‘length’ of the velocity vector – its norm – but by its direction in S. We can
then deduce the form in Eq. (A.22) as the Lorentz-transformed version of ( 1, 0).

Equations A.21 can lead us to some intuition about what the velocity vector
is telling us. When we say that the velocity vector in the particle’s rest frame
is (1, 0), we are saying that, for each unit proper time τ , the particle moves the same
amount through coordinate time t, and not at all through space x; the particle ‘moves
into the future’ directly along the t-axis. When we are talking instead about a particle
that is moving with respect to some frame, the equation U0 = dt/dτ = γ tells us that
the particle moves through a greater amount of this frame’s coordinate time t, per unit
proper time (where, again, the ‘proper time’ is the time showing on a clock attached to
the particle). This is another appearance of ‘time dilation’.
By further differentiating the components of the velocity U, we can obtain the
components of the acceleration 4-vector
A = γ(γ̇, γ̇v + γa). (A.24)
This is needed less often than the velocity vector, but it is fairly straightforward to deduce
that U · A = 0, and that A · A = a², defining the proper acceleration a as the magnitude
of the acceleration in the instantaneously co-moving inertial frame.
Finally, given two particles with velocities U and V, and given that the second has
velocity v with respect to the first, then in the first particle’s rest frame the velocity
vectors have components U = (1, 0) and V = γ(v)(1, v). Thus

U · V = −γ(v),

and this inner product is, again, frame-independent.

A.4.3 Dynamics: Energy and Momentum


In the previous section, we have learned how to describe motion; we now want
to explain it. In newtonian mechanics, we do this by defining quantities such as
momentum, energy, force, and so on. We wish to find the Minkowski-space analogues
of these.
We can start with momentum. We know that in newtonian mechanics, momentum is
defined as mass times velocity. We have a velocity, so we can try defining a momentum
4-vector as
P = mU = mγ (1, v). (A.25)

Since m is a scalar, and U is a 4-vector, P must be a 4-vector also. Remember also that γ
is a function of v: γ (v).
In the rest frame of the particle, this becomes P = m(1, 0): it is a 4-vector whose
norm (P · P) is −m², and which points along the particle’s world line. That is, it points
in the direction of the particle’s movement in spacetime. Since this is a vector, its
norm and its direction are frame-independent quantities, so a particle’s 4-momentum
vector always points in the direction of the particle’s world line, and the 4-momentum
vector’s norm is always −m². We’ll call this vector the momentum (4-)vector, but it’s
also called the energy-momentum vector, and Taylor and Wheeler (1992) call it the
momenergy vector (coining the word in an excellent chapter on it) in order to stress that
it is not the same thing as the energy or momentum (or mass) that you are used to.

Note that here, and throughout, the symbol m denotes the mass as measured
in a particle’s rest frame. The reason I mention this is that some treatments
of relativity, particularly older ones, introduce the concept of the ‘relativistic mass’,
distinct from the ‘rest mass’. The only (dubious) benefit of this is that it makes a
factor of γ disappear from a few equations, making them look a little more like their
newtonian counterparts; the cost is that of introducing one more new concept to worry
about, which doesn’t help much in the long term, and which can obscure aspects of the
energy-momentum vector. Rindler introduces the relativistic mass; Taylor and Wheeler
and Schutz don’t.
Now consider a pair of incoming particles P 1 and P2 , which collide and produce a
set of outgoing particles P 3 and P4 . Suppose that the total momentum is conserved:
P1 + P2 = P3 + P4. (A.26a)
This is an equation between 4-vectors. Equating the time and space coordinates
separately, recalling Eq. (A.25), and writing p ≡ γ mv, we have
m1γ (v1 ) + m2γ (v2) = m3 γ (v3 ) + m4γ (v4 ) (A.26b)
p1 + p2 = p3 + p4 . (A.26c)
Now recall that, as v → 0, we have γ (v) → 1, so that, from Eq. (A.25), the low-
speed limit of the spatial part of the vector P is just mv, so that the spatial part of the
conservation equation, Eq. (A.26c), reduces to the statement that mv is conserved. Both
of these prompt us to identify the spatial part of the 4-vector P as the familiar linear
3-momentum, and to justify giving P the name 4-momentum.
What, then, of the time component of Eq. (A.25)? Let us (with, admittedly, a little
foreknowledge) write this as P 0 = E, so that
E= γ m. (A.27)
If we now expand γ into a Taylor series, then

E = m + ½mv² + O(v⁴). (A.28)

Now (1/2)mv^2 is the expression for the kinetic energy in newtonian mechanics, and
Eq. (A.26b), compared with Eq. (A.27), tells us that this quantity E is conserved in
collisions, so we have persuasive support for identifying the quantity E in Eq. (A.27)
as the relativistic energy of a particle with mass m and velocity v. If, finally, we rewrite
Eq. (A.27) in physical units, we find
E = γmc^2,    (A.29)
the low-speed limit of which (remember γ (0) = 1) recovers what has been called the
most famous equation of the twentieth century.
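The low-speed expansion in Eq. (A.28) is easy to check numerically. The following sketch (plain Python, not from the book, working in units where c = 1) compares γ(v)m with m + ½mv² at a small speed, and confirms that the residual is dominated by the next term in the Taylor series, (3/8)mv⁴.

```python
import math

def gamma(v):
    """Lorentz factor, in units where c = 1."""
    return 1.0 / math.sqrt(1.0 - v * v)

m = 1.0    # rest mass (arbitrary)
v = 0.01   # a small speed, v << 1

E = gamma(v) * m                  # relativistic energy, Eq. (A.27)
E_newton = m + 0.5 * m * v**2     # rest energy plus newtonian kinetic energy

# The residual should be dominated by the next Taylor term, (3/8) m v^4.
residual = E - E_newton
print(residual, 3.0 / 8.0 * m * v**4)
```

Halving v reduces the residual by a factor of sixteen, as befits a v⁴ term.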
The argument presented here after Eq. (A.26) has been concerned with giving names
to quantities, and, reassuringly for us, linking those newly named things with quantities
we already know about from newtonian mechanics. This may seem arbitrary, and it is
certainly not any sort of proof that the 4-momentum is conserved as Eq. (A.26) says it
might be. No proof is necessary, however: it turns out from experiment that Eq. (A.26) is
a law of nature, so that we could simply include it as a postulate of relativistic dynamics
and proceed to use it without bothering to identify its components with anything we are
familiar with.
In case you are worried that we are pulling some sort of fast one, of a kind we
never had to pull in newtonian mechanics, note that we do have to do a
similar thing in newtonian mechanics. There, we postulate Newton's third law (action
equals reaction), and from this we can deduce the conservation of momentum; here,
we postulate the conservation of 4-momentum, and this would allow us to deduce a
relativistic analogue of Newton's third law (I don't discuss relativistic force here, but it
is easy to define). The postulational burden is the same in both cases.
We can see from Eq. (A.27) that, even when a particle is stationary and v = 0,
the energy E is non-zero. In other words, a particle of mass m has an energy m
associated with it simply by virtue of its mass. The low-speed limit of Eq. (A.26b)
simply expresses the conservation of mass, but we see from Eq. (A.27) that it is
actually expressing the conservation of energy. In SR there is no real distinction
between mass and energy – mass is, like kinetic, thermal, and strain energy, merely
another form into which energy can be transmuted – albeit a particularly dense store
of energy, as can be seen by calculating the energy equivalent, in Joules, of a mass
of one kilogramme. It turns out from GR that it is not mass that gravitates, but
energy-momentum (most typically, however, in the particularly dense form of mass),
so that thermal and electromagnetic energy, for example, and even the energy in the
gravitational field itself, all gravitate. (It is the non-linearity implicit in the last remark
that is part of the explanation for the mathematical difficulty of GR.)
Let us now consider the norm of the 4-momentum vector. Like any such norm, it
will be frame invariant, and so will express something fundamental about the vector,
analogous to its length. Since this is the momentum vector we are talking about, this
norm will be some important invariant of the motion, indicating something like the
‘quantity of motion’. From the definition of the momentum, Eq. (A.25), and its norm,
Eq. (A.23), we have
P · P = m^2 U · U = −m^2,    (A.30)
and we find that this important invariant is the mass of the moving particle.
Now using the definition of energy, Eq. (A.27), we can write P = (E, p), and find
P · P = −E^2 + p · p.    (A.31)
Writing now p^2 = p · p, we can combine these to find
m^2 = E^2 − p^2.    (A.32)
This is not simply a handy way of relating E , p, and m. The 4-momentum P encapsulates
the important features of the motion in the energy and spatial momentum. Though
the latter are frame-dependent separately, they combine into a frame-independent
quantity.
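This frame-independence can be exhibited directly. The sketch below (plain Python, not from the book; c = 1 and one spatial dimension) Lorentz-transforms the (E, p) components of a particle into another frame and checks that E² − p² is unchanged; the mass, speed, and boost velocity are arbitrary choices.

```python
import math

def gamma(v):
    return 1.0 / math.sqrt(1.0 - v * v)

m, v = 2.0, 0.6                          # arbitrary rest mass and speed
E, p = gamma(v) * m, gamma(v) * m * v    # components of P, Eq. (A.25)

u = -0.8                                 # arbitrary boost to another frame
E_b = gamma(u) * (E - u * p)             # Lorentz transformation of the
p_b = gamma(u) * (p - u * E)             # (E, p) components

# E and p change between frames, but E^2 - p^2 is the invariant m^2:
print(E * E - p * p, E_b * E_b - p_b * p_b)
```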
It seems odd to think of a particle's momentum as being always non-zero,
irrespective of how rapidly it's moving; this is the same oddness that has the
particle’s velocity always being of length 1. One way of thinking about this is that
it shows that the 4-momentum vector (or energy-momentum or momenergy vector)
is a ‘better’ thing than the ordinary 3-momentum: it’s frame-independent, and so has
a better claim to being something intrinsic to the particle. Another way of thinking
about it is to conceive of the 4-velocity as showing the particle’s movement through
spacetime. In a frame in which the particle is ‘at rest’, the particle is in fact moving at
speed c into the future. If you are looking at this particle from another frame, you’ll
see the particle move in space (it has non-zero space components to its 4-velocity in
your frame), and it will consequently (in order that U · U = − 1) have a larger time
component in your frame than it has in its rest frame. In other words, the particle moves
through more time in your frame than it does in its rest frame – another manifestation
of time dilation. Multiply this velocity by the particle’s mass, and you can imagine
the particle moving into the future with a certain momentum in its rest frame; observe
this particle from a moving frame and its spatial momentum becomes non-zero, and
the time component of its momentum (its energy) has to become bigger – the particle
packs more punch – as a consequence of the length of the momentum vector being
invariant.
A.4.4 Photons
For a photon, the interval represented by dR · dR is always zero (dR · dR = −dt^2 +
dx^2 + dy^2 + dz^2 = 0 for photons). But this means that the proper time dτ^2 is also zero
for photons. This means, in turn, that we cannot define a 4-velocity vector for a photon
by the same route that led us to Eq. (A.20), and therefore cannot define a 4-momentum
as in Eq. (A.25).
We can do so, however, by a different route. Recall that we defined (in the paragraph
following Eq. (A.23)) the 4-velocity as a vector pointing along the world line, which
resulted in the 4-momentum being in the same direction. From the discussion of the
momentum of massive particles in the previous section, we see that the P 0 component
is related to the energy, so we can use this to define a 4-momentum for a massless
particle, and again write
Pγ = (E , pγ ) .
Since the photon’s velocity 4-vector is null, the photon’s 4-momentum must be also
(since it is defined in Eq. (A.25) to be pointing in the same direction). Thus we must
have Pγ · Pγ = 0, and thus pγ · pγ = E^2, recovering the m = 0 version of Eq. (A.32),
E^2 = p^2    (massless particle),    (A.33)
so that even massless particles have a non-zero momentum.
In quantum mechanics, we learn that the energy associated with a quantum of light
– a photon – is E = hf , where h is Planck’s constant, h = 6.626 × 10− 34 Js (or
2.199 × 10− 42 kg m in natural units), so that
P = (hf, hf, 0, 0)    (photon moving in the x direction).    (A.34)
[Exercise A.4]
Exercises
Exercise A.1 (§ 1.1.2) Which of these are inertial frames? (i) A motorway
bridge, (ii) a stationary car, (iii) a car moving in a straight line at a constant speed,
(iv) a car cornering at a constant speed, (v) a stationary lift, (vi) a free-falling lift. (The
last one is rather subtle.)
Exercise A.2 (§ 1.2) I have a friend moving past me in a rocket at a relativistic
speed, and I observe her watch to be moving slowly with respect to mine (as we will
discover later). She examines my watch as I do this: is it moving faster or slower than
hers?
Exercise A.3 (§ 1.2.1) Consider the wave equation
∂^2φ/∂x^2 + ∂^2φ/∂y^2 + ∂^2φ/∂z^2 − (1/c^2) ∂^2φ/∂t^2 = 0.    (i)
Take
∂/∂x = (∂x′/∂x) ∂/∂x′ + (∂t′/∂x) ∂/∂t′
∂/∂t = (∂x′/∂t) ∂/∂x′ + (∂t′/∂t) ∂/∂t′
and so on, and using the GT, show that Eq. (i) does not transform into the same form
under a GT. [d+]
Exercise A.4 (§ 1.4.4) Consider Compton scattering. We can examine the
collision between a photon and an electron. Unlike the classical Thomson scattering
of light by electrons, Compton scattering is an inherently relativistic and quantum-
mechanical effect, treating both the electron and the incoming light as relativistic
particles.
The collision is as shown in Figure A.4. An incoming photon strikes a stationary
electron and both recoil. The incoming photon has energy Q1 = hf1 = h/λ1 and
the outgoing one Q2 = hf 2 = h/λ2 ; the outgoing electron has energy E , spatial
momentum p, and mass m.
Figure A.4 Compton scattering.

1. Identify the four momentum 4-vectors P_{1e}, P_{1γ}, P_{2e}, and P_{2γ} corresponding to
the momenta of the electron and photon before and after the collision.
2. Require momentum conservation, compare components, and show that
λ2 − λ 1 = h(1 − cos φ)/m.
Appendix B Solutions to Einstein’s Equations
By the end of Chapter 4, we had acquired enough differential geometry to understand
where Einstein’s equations come from and what they mean, and by extracting Newton’s
law of gravity from them, in Eq. (4.51), we have now both corroborated Eq. (4.32) and
fixed the constant in it. But after bringing you to the threshold of GR, I cannot simply
leave you there, so this appendix will briefly describe some solutions to Einstein’s
equations.
This account is a very brief one; there are fuller versions in most GR textbooks.1
B.1 The Schwarzschild Solution
Karl Schwarzschild was the first to obtain an exact solution to Einstein’s equations,
in December 1915, only a month after Einstein had first published the equations in
Berlin2 (Einstein had earlier produced an approximate solution, which reproduced the
advance of the perihelion of Mercury, and which we will derive below as an encore after the
Schwarzschild solution).
Our goal, remember, is to find a set of coordinates, and a metric tensor, which
together define an interval
ds 2 = gµν dxµ dxν (B.1)
where the metric satisfies Einstein’s equations.
Our first step is to simplify the problem as much as possible, by a sensible choice of
coordinates.
1 This chapter is notationally harmonious with Carroll (2004, chapter 5), which provides very
useful expansion of the details. The selection and sequence of ideas, however, follows that in
my colleague Martin Hendry’s lecture notes, and can thus be claimed as pedagogically verified
in the same way as the rest of the book.
2 Schwarzschild did this work during his service as an army officer in World War I, and described
the solution in a letter to Einstein in December 1915, written under the sound of shell-fire,
probably in Alsace; Schwarzschild died early the next year, possibly from complications arising
from exposure to his own side’s experimental chemical weapons. The historical details here are
an aside in chapter 12 of Snygg (2012). The book is entertaining in a variety of ways,
mathematical and historical, but it is not an introductory text.
The Schwarzschild solution describes a space-time that is highly symmetric. It is
empty apart from a single mass, which we will take to be at the coordinate system's
origin, and it does not change in time. We can describe the metric using the
coordinates (t, r , θ , φ): here θ and φ are the usual polar coordinates, and although
we will take r to be a radial coordinate, we will not assume at this point that it is
straightforwardly related to the distance from the origin, nor that t is straightforwardly
a proper time.
The time-independence of the metric means two things. Firstly, that none of the
components of the metric, g_{μν}, has a dependence on t; and secondly, that the interval
includes no time-space cross terms, dt dxi (if the metric is time-independent, then it
must be invariant under time reversal, but if g0i were non-zero, then a change of time
coordinate from t to − t would change the sign of dt dxi terms, but no others).
The rotational symmetry of the space-time must be reflected in a rotational
symmetry of the metric: it must be invariant under rotations about the origin. What
that means is that the (2-d) space at constant r (and constant t) has the geometry of the
surface of a sphere, or ds^2 = ρ(r) dΩ^2, where dΩ^2 ≡ dθ^2 + sin^2 θ dφ^2, for some function ρ that
depends only on the coordinate r . The function ρ doesn’t depend on t, because none of
the coefficients does, and it doesn’t depend on θ or φ, because if it did then the spaces of
constant (t , r ) would not be rotationally symmetric. This is what it means to identify r
as ‘a radial coordinate’. This also means that there are no dr dθ or dr dφ cross terms.
We have by now excluded a large number of possible terms in the fully general
metric, and can now write this (with some foreknowledge) as
ds^2 = −e^{2α(r)} dt^2 + e^{2β(r)} dr^2 + r^2 dΩ^2.    (B.2)
Here the choice of signs is to retain compatibility with the Minkowski metric
ds^2 = −dt^2 + dr^2 + r^2 dΩ^2, and we have assumed that the coefficients of dt^2 and dr^2
cannot change sign from this. Also, we have chosen the coordinate r to be such that the
metric on surfaces of constant (t, r) is r^2 dΩ^2 (and if we hadn't, for some reason, then
we could change radial variables r ↦ r̄ so that the coefficient ρ(r̄) was indeed just r̄^2).
Notice that this illustrates the principle of general covariance, of Section 1.1. Given
a space-time – such as the space-time around a single point mass – we are allowed
to choose how we wish to label points within it, using a set of four independent
coordinates (four, because the space-time we are interested in is homeomorphic to R4
– see Section 3.1.1). We can explore the length of intervals in that space-time using a
clock, for timelike intervals, or a piece of string, for spacelike intervals, and summarise
the structure we thus discover, by writing down the coefficients of the metric. Those
coefficients depend on our instantaneous position within the space-time, and how we
have chosen to label them using our coordinates. The covariance principle declares that
the physical conclusions we come to must not change with a change in coordinates –
for example that a dropped object accelerates radially downwards or that the curl of
a magnetic field is proportional to a current; the numerical value of the acceleration
might depend on the coordinate choice, or the constant of proportionality, but not the
underlying geometrical statements. Similarly, it is the covariant derivative we defined
in Section 3.3.2 that allows us to define differentiation in such a way that the derivative
depends on the metric but, again, not the choice of coordinates.
The above account may seem somewhat hand-waving, but it is the intuitive
content of a more formal account, which depends on identifying the geometri-
cal structures that generate the symmetries contained within a particular manifold. This
discussion leads to Birkhoff’s theorem, which asserts that the Schwarzschild metric,
which we are leading up to, is the only spherically symmetric vacuum solution (static
or not). See Carroll (2004, §5.2).
With this metric, the Christoffel symbols are:
Γ^t_{tr} = α′          Γ^r_{tt} = e^{2(α−β)} α′        Γ^r_{rr} = β′
Γ^θ_{rθ} = 1/r         Γ^r_{θθ} = −r e^{−2β}           Γ^φ_{rφ} = 1/r    (B.3)
Γ^r_{φφ} = −r e^{−2β} sin^2 θ    Γ^θ_{φφ} = −sin θ cos θ    Γ^φ_{θφ} = cos θ / sin θ
(others zero), where primes denote differentiation with respect to r. The corresponding
Ricci tensor components are
R_tt = e^{2(α−β)} ( α″ + α′^2 − α′β′ + 2α′/r )
R_rr = −α″ − α′^2 + α′β′ + 2β′/r    (B.4)
R_θθ = e^{−2β} ( r(β′ − α′) − 1 ) + 1
R_φφ = sin^2 θ R_θθ
(off-diagonal elements zero).3
Now we have the Ricci tensor, we can use the constraint Rµν = 0, Eq. (4.30), to
obtain the metric.
Since both R_tt and R_rr are zero, so is their sum, and so
0 = e^{2(β−α)} R_tt + R_rr = (2/r)(α′ + β′),    (B.5)
which implies α + β is constant. Since we want the metric, Eq. (B.2), to reduce to the
Minkowski metric at large r, we must have both α(r) → 0 and β(r) → 0 as r → ∞.
Thus the constant is zero, and
α = −β.
With this, we can rewrite R_θθ = 0 as
1 = e^{2α} (2rα′ + 1) = ∂/∂r ( r e^{2α} ),
which we can promptly solve to obtain
e^{2α} = 1 − R/r,    (B.6)
where R is a constant of integration.
To find a value for this constant, R, we can return to the geodesic equation,
Eq. (3.43). Consider a test particle released from rest, so with dxi/dτ = 0, as usual
taking τ to be the proper time, dτ 2 = − ds 2. Looking back at the metric of Eq. (B.2),
we see that
3 Relativists really like computer algebra packages.
dt/dτ = e^{−α},
so that the r-component of the geodesic equation is
d^2 r/dτ^2 + Γ^r_{tt} (dt/dτ)^2 = d^2 r/dτ^2 + e^{2α} α′ = 0,
or, differentiating Eq. (B.6),
d^2 r/dτ^2 = −R/(2r^2).    (B.7)
In the non-relativistic, or weak field, limit, the acceleration of a test particle due to
gravity is (compare Section 4.3.2)
d^2 r/dt^2 = −GM/r^2,
dt r
where M is the mass of the body at r = 0. From this we can see that R, the
Schwarzschild radius, is
R = 2GM , (B.8)
and putting this back into the prototype metric Eq. (B.2), we have
ds^2 = −(1 − 2GM/r) dt^2 + (1 − 2GM/r)^{−1} dr^2 + r^2 dΩ^2    (B.9a)
     ≡ g_tt dt^2 + g_rr dr^2 + g_θθ dθ^2 + g_φφ dφ^2,    (B.9b)
the metric for the Schwarzschild solution to Einstein’s equations. We will also refer to
(t, r, θ, φ) as (x^0, x^1, x^2, x^3) below.
Differentiating e^{2α} = e^{−2β} = 1 − R/r to obtain
α′ = −β′ = R / ( 2r(r − R) ),
we can specialise Eq. (B.3) to
Γ^t_{tr} = R/(2r(r − R))       Γ^r_{tt} = R(r − R)/(2r^3)      Γ^r_{rr} = −R/(2r(r − R))
Γ^θ_{rθ} = 1/r                 Γ^r_{θθ} = −(r − R)             Γ^φ_{rφ} = 1/r    (B.10)
Γ^r_{φφ} = −(r − R) sin^2 θ    Γ^θ_{φφ} = −sin θ cos θ         Γ^φ_{θφ} = cos θ / sin θ.
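As the footnote about computer algebra packages hints, results like Eq. (B.10) are well suited to machine checking. The sketch below (plain Python, not from the book) evaluates Γ^a_{bc} = ½ g^{ad} (g_{db,c} + g_{dc,b} − g_{bc,d}) for the diagonal Schwarzschild metric by central finite differences, and compares a few components against Eq. (B.10) at an arbitrarily chosen point.

```python
import math

R = 2.0    # Schwarzschild radius, in arbitrary units

def metric(x):
    """Diagonal components of Eq. (B.9) at x = (t, r, theta, phi)."""
    _, r, th, _ = x
    f = 1.0 - R / r
    return [-f, 1.0 / f, r * r, r * r * math.sin(th) ** 2]

def christoffel(a, b, c, x, h=1e-6):
    """Gamma^a_{bc} for a diagonal metric, by central differences."""
    def dg(i, j):    # partial derivative of g_{ii} with respect to x^j
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        return (metric(xp)[i] - metric(xm)[i]) / (2 * h)
    term = 0.0
    if a == b:
        term += dg(a, c)
    if a == c:
        term += dg(a, b)
    if b == c:
        term -= dg(b, a)
    return 0.5 * term / metric(x)[a]   # g^{aa} = 1/g_{aa} for a diagonal metric

t, r, th, ph = range(4)     # coordinate indices
x = (0.0, 10.0, 1.0, 0.5)   # an arbitrary sample point, with r = 10
rr = x[1]
print(christoffel(t, t, r, x), R / (2 * rr * (rr - R)))   # Gamma^t_{tr}
print(christoffel(r, th, th, x), -(rr - R))               # Gamma^r_{theta theta}
```

The same scheme checks the remaining entries of Eq. (B.10), or indeed any metric you can type in.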
The sun has a mass of approximately 2 × 10^30 kg in physical units. Taking
G = 2/3 × 10^{−10} m^3 kg^{−1} s^{−2}, this gives R = 8/3 × 10^{20} m^3 s^{−2}; if we then divide
by the conversion factor 1 = c^2 = 9 × 10^{16} m^2 s^{−2}, we end up with R ≈ 3 × 10^3 m.
Equivalently, we may recall the natural units of Section 4.3, in which the choice of
G = 1 creates a conversion factor between units of kilogrammes and units of metres,
exactly as we did when writing c = 1 in Section 1.4.1. In these units, the mass of the
sun is approximately 1.5 × 103 m giving, again, R = 2M = 3 × 103 m. From here on,
we will work in units where G = 1.
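The unit conversions in this paragraph can be restated as a few lines of arithmetic. The sketch below (plain Python, not from the book) uses standard SI values rather than the rounded ones above, and recovers both the sun's mass in 'geometric' metres and its Schwarzschild radius.

```python
G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2
c = 2.998e8        # speed of light, m s^-1
M_sun = 1.989e30   # solar mass, kg

M_geom = G * M_sun / c**2     # the sun's mass in 'geometric' metres
R_s = 2 * M_geom              # Schwarzschild radius, Eq. (B.8), in metres
print(M_geom, R_s)            # roughly 1.5e3 m and 3e3 m
```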
As you can see from Eq. (B.9), the coefficients have a singularity at r = 2M .
These coordinates are ill-behaved at r = 2M but, it turns out, there is no physical
singularity there. It is not trivial to demonstrate this, but it is possible to find alternative
coordinates – Kruskal–Szekeres coordinates – that are perfectly well-behaved at that
point. Although the space-time is not singular at r = 2M , that radius is nonetheless
special. The radius r = 2M is known as the ‘event horizon’, and demarcates two
causally separate regions: there are no world lines that go from inside the event horizon
to the outside, so no events within it can affect the space-time outside.
[Exercise B.1]
B.2 The Perihelion of Mercury
Now that we have a metric for the Schwarzschild space-time – that is, a solution to
Einstein’s equations – our next step is to ask about the dynamics in that space-time – that
is, how free-fall test particles move within it. This matters because the Schwarzschild
solution is an adequate model for the space-time round any single gravitating mass.
So the question crystallises into: what are the orbits of planets within the solar system,
viewed as a Schwarzschild space-time?
First, notice that there are no cross-terms in Eq. (B.9), in these coordinates – that
is, g_ij = 0 for i ≠ j; these are known as orthogonal coordinates. In the case of
orthogonal coordinates, the matrix of metric components g_ij is diagonal (which implies
that g^{ii} = 1/g_{ii}), and the geodesic equation simplifies to
d/dτ ( g_{μμ} ∂x^μ/∂τ ) − (1/2) Σ_α g_{αα,μ} (∂x^α/∂τ)^2 = 0    (no sum on μ).    (B.11)
This will simplify some of the calculations below. See Exercise B.2.
We can see that gµν is independent of both t and φ , so that the second term in
Eq. (B.11) disappears for both µ = 0 and µ = 3, and thus that the geodesic equations
for t and φ are solvable as
g_tt dt/dτ = k    constant    (B.12)
g_φφ dφ/dτ = h    constant.    (B.13)
Similarly, the θ -equation (µ = 2) becomes
r^2 d^2θ/dτ^2 + 2r (dr/dτ)(dθ/dτ) − r^2 sin θ cos θ (dφ/dτ)^2 = 0
(using intermediate results for the metric, from Exercise B.1), which has θ = π/ 2 as a
particular integral (a test particle which starts in that plane stays in that plane).
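The constancy of k and h along a geodesic can be seen directly by integrating the equations of motion. The sketch below (plain Python, not from the book) integrates the equatorial (θ = π/2) geodesic equations, built from the Christoffel symbols of Eq. (B.10), with a fourth-order Runge–Kutta step; the initial conditions are arbitrary, normalised so that U · U = −1.

```python
import math

R = 2.0    # Schwarzschild radius

def deriv(y):
    """y = (t, r, phi, tdot, rdot, phidot): equatorial geodesic equations,
    with Christoffel symbols from Eq. (B.10) and theta = pi/2."""
    _, r, _, td, rd, pd = y
    Gttr = R / (2 * r * (r - R))
    Grtt = R * (r - R) / (2 * r**3)
    Grrr = -R / (2 * r * (r - R))
    Grpp = -(r - R)          # sin^2(theta) = 1 in the equatorial plane
    Gprp = 1.0 / r
    return [td, rd, pd,
            -2 * Gttr * td * rd,
            -Grtt * td * td - Grrr * rd * rd - Grpp * pd * pd,
            -2 * Gprp * rd * pd]

def rk4_step(y, dtau):
    def shift(u, k, s):
        return [a + s * b for a, b in zip(u, k)]
    k1 = deriv(y)
    k2 = deriv(shift(y, k1, dtau / 2))
    k3 = deriv(shift(y, k2, dtau / 2))
    k4 = deriv(shift(y, k3, dtau))
    return [a + dtau / 6 * (b + 2 * c + 2 * d + e)
            for a, b, c, d, e in zip(y, k1, k2, k3, k4)]

def conserved(y):    # k and h, Eqs. (B.12) and (B.13)
    _, r, _, td, _, pd = y
    return -(1 - R / r) * td, r * r * pd

# Arbitrary initial conditions, with tdot fixed by the normalisation U.U = -1:
r0, rd0, pd0 = 20.0, 0.05, 0.01
f = 1.0 - R / r0
td0 = math.sqrt((1 + rd0**2 / f + r0**2 * pd0**2) / f)
y = [0.0, r0, 0.0, td0, rd0, pd0]

k_start, h_start = conserved(y)
for _ in range(2000):
    y = rk4_step(y, 0.01)
k_end, h_end = conserved(y)
print(k_start, k_end, h_start, h_end)   # k and h are unchanged
```

The same integration, continued for longer, traces out the precessing orbit derived below.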
Trying to do the same thing for r, in Eq. (B.11), gives us a second order differential
equation to solve, which we’d prefer to avoid. Instead, we can use the fact that
d/dτ ( g_{αβ} (∂x^α/∂τ)(∂x^β/∂τ) ) = 0    (B.14)
along a geodesic (see Exercise B.3). Integrating this, we can fix the constant of
integration by recalling that Uµ Uµ = − 1 in Minkowski space, to give
g_{αβ} (∂x^α/∂τ)(∂x^β/∂τ) = −1,
and thus, on inserting the coefficients of the metric,
−1 = g_tt (dt/dτ)^2 + g_rr (dr/dτ)^2 + g_φφ (dφ/dτ)^2,
or
(dr/dτ)^2 = k^2 − (1 − 2M/r)(1 + h^2/r^2).    (B.15)
The next step is to change variables from r to u ≡ 1/ r, and to regard φ rather
than τ as the independent variable, so that dr/dφ = dr/dτ dτ /dφ. This mirrors the
conventional approach to the solution of the Kepler problem; both Carroll (2004) and
Schutz (2009) touch on this, and it is discussed in detail in Goldstein (2001, chapter 3).
Using this change of variables, we find
h^2 (du/dφ)^2 = −(1 − Ru)(1 + h^2 u^2) + k^2.
If we now differentiate this, we can rearrange the result to obtain
d^2 u/dφ^2 = 3Mu^2 − u + M/h^2    (B.16)
(switching to M = R /2, here, avoids a few unsightly powers of 2 below). If you trace
this calculation back, you see that the first term, in u2 , is associated with the factor of
1 − R /r in dt/dτ , Eq. (B.12); it is, loosely speaking, the relativistic term, and it is the
term that is not present in the corresponding part of the non-relativistic, or newtonian,
calculation. And sure enough, this term is much smaller than the other two terms, for
cases such as the earth’s orbit around the sun.
What that means in turn is that the solution to this equation will be the solution to
the Kepler problem, u0, plus a small correction, which we can write as
u = u0 + u1    (u1 ≪ u0).
You can confirm that, in these units, the solution to the newtonian problem (that is,
Eq. (B.16) without the relativistic term) is
u0(φ) = (M/h^2)(1 + e cos φ),
and thus
d^2 u1/dφ^2 + u1 = 3 (M^3/h^4) ( 1 + e^2/2 + 2e cos φ + (e^2/2) cos 2φ ),    (B.17)
where we have suppressed terms in u0 u1 and u1^2, and recalled that cos^2 φ =
(1 + cos 2φ)/2. If you stared at this for long enough, you would doubtless spot that
d^2/dφ^2 (φ sin φ) + φ sin φ = 2 cos φ
d^2/dφ^2 (cos 2φ) + cos 2φ = −3 cos 2φ,
which prompts us to write
u1(φ) = A + (B/2) φ sin φ − (C/3) cos 2φ,    (B.18)
which, on substitution into Eq. (B.17), gives
A + B cos φ + C cos 2φ = 3 (M^3/h^4) ( 1 + e^2/2 + 2e cos φ + (e^2/2) cos 2φ ),
giving values for A, B, and C , which are all numerically small.
Examining the terms in Eq. (B.18), we see that the first term is a (small) constant,
and the third oscillates between ±C/3; both, therefore, have negligible effects. The
middle term, however, has an oscillatory factor, but also a secular factor, which grows
linearly with φ .
If we finally add u0 + u1 again, but discard the negligible terms in A and C , then we
obtain
u = (M/h^2) ( 1 + e cos((1 − α)φ) ),    α = 3M^2/h^2 ≪ 1,    (B.19)
(using a Taylor expansion which shows that cos (1 − α)φ = cos φ + αφ sin φ + O(α2 )).
This is very nearly an ellipse, but with a perihelion that advances by an angle 2 π α per
orbit.
We can describe a newtonian orbital ellipse with
u(φ) = (1 + e cos φ) / ( a(1 − e^2) ),
where e is the orbital eccentricity, and a the semi-major axis. By comparison with the
non-relativistic part of Eq. (B.16), we find that
M/h^2 = 1 / ( a(1 − e^2) ),
and thus that
Δφ ≡ 2πα = 6π M^2/h^2 = 6πM / ( a(1 − e^2) ).
The nearest planet to the sun is Mercury. The actual orbit of Mercury is not quite a
keplerian ellipse, but instead precesses at 574 arcsec/century (relative to the ICRF, the
relevant inertial frame). This is almost all explicable by newtonian perturbations arising
from the presence of the other planets in the solar system, and over the course of the
nineteenth century much of this anomalous precession had been accounted for in detail.
The process of identifying the corresponding perturbations had also been carried out
for Uranus, with the anomalies in that case resolved by predicting the existence of, and
then finding, Neptune.4 At one point it was suspected that there was an additional planet
near Mercury which could explain the anomaly, but this was ruled out on other grounds,
and a residual precession of 43 arcsec/century remained stubbornly unexplained.
Mercury has semi-major axis a = 5.79 × 10^10 m = 193 s, and eccentricity
e = 0.2056. Taking the sun's mass to be M⊙ = 1.48 km = 4.93 × 10^{−6} s, we find
Δφ = 5.03 × 10^{−7} rad orbit^{−1}. The orbital period of Mercury is 88 days, so that
converting Δφ for Mercury to units of arcseconds per century, we find
Δφ = 43.0 arcsec/century.
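That arithmetic is compact enough to script. The sketch below (plain Python, not from the book) evaluates Δφ = 6πM/(a(1 − e²)) in geometric units, with the orbital parameters quoted above, and converts it to arcseconds per century.

```python
import math

M = 1.48e3          # solar mass in metres
a = 5.79e10         # Mercury's semi-major axis, metres
e = 0.2056          # orbital eccentricity
period_days = 88.0  # Mercury's orbital period

dphi = 6 * math.pi * M / (a * (1 - e * e))     # radians per orbit
orbits_per_century = 100 * 365.25 / period_days
arcsec_per_century = dphi * orbits_per_century * math.degrees(1) * 3600
print(dphi, arcsec_per_century)    # ~5.0e-7 rad/orbit and ~43 arcsec/century
```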
Einstein published this calculation in 1916. It is the first of the classic tests of GR,
which also include the deflection of light (or other EM radiation) in its passage near the
limb of the sun, and the measurement of gravitational redshift. Numerous further tests
have been made, at increasing precision, and no deviations from GR’s predictions have
been found. The history of such tests is discussed at some length in Will (2014).
[Exercise B.2–B.4]
B.3 Gravitational Waves
The third application we will look at, in this appendix, is that of gravitational waves –
how they propagate and how they are detected. This will be a rather compact account,
which will mostly follow the sequence of ideas in Carroll (2004, chapter 7), because
it makes it easy to pick up the ideas of gauge invariance that we met in passing in
Section 4.3.1. Schutz (2009, chapter 9) gets to the same endpoint more directly. See
those references (or indeed many others) for the details I have elided below.

B.3.1 Einstein’s Equations in the Transverse-Traceless Gauge
Our starting point is the ‘nearly-Minkowskian’ metric of Eq. (4.36), where the
matrix hαβ is a symmetric matrix of small components. We write the components of
this matrix as follows:
h_00 = −2Φ
h_0i = w_i    (B.20)
h_ij = 2s_ij − 2Ψ δ_ij,
where s_ij is traceless, so that the trace of h_αβ appears only in Ψ = −δ^{ij} h_ij / 6.
The tensor s_ij is referred to as the strain. We have done nothing here other than change
the notation; in particular, note that there are ten degrees of freedom here, just as in the
original h_αβ. The significance of the different partitions of the matrix is that each of the
00, 0i, and ij components transforms into itself under a spatial rotation.

4 This took place in a few years leading up to 1846, and led to a Franco-British priority dispute
over which country’s astronomer had made and published the crucial prediction (Kollerstrom
2006).
From this metric it is straightforward but tedious5 to obtain first the Christoffel
symbols, and then the Riemann tensor
R_{0j0l} = Φ_{,lj} + w_{(l,j)0} − (1/2) h_{jl,00}
R_{0jkl} = w_{[l,k]j} − h_{j[l,k]0}    (B.21)
R_{ijkl} = h_{i[l,k]j} − h_{j[l,k]i}
and Einstein tensor
G_00 = 2∇^2 Ψ + s^{ij}_{,ij}
G_0j = −(1/2) ∇^2 w_j + (1/2) w^k_{,kj} + 2Ψ_{,j0} + s_j{}^k_{,k0}    (B.22)
G_ij = ( δ_ij ∇^2 − ∂^2/∂x^i ∂x^j )(Φ − Ψ) + δ_ij w^k_{,k0} − w_{(j,i)0}
       + 2δ_ij Ψ_{,00} − □ s_ij + 2s^k_{(j,i)k} − δ_ij s^{kl}_{,kl}.
Recall that the Roman indexes run over only the spatial indexes; that the □ symbol is
the d'Alembertian operator,
□ ≡ η^{μν} ∂^2/∂x^μ ∂x^ν = −∂^2/∂t^2 + ∇^2,
the four-dimensional version of the Laplacian; the definition of the symmetrisation
notation in Eq. (4.42); and that these expressions are calculated using h_{μν} ≪ η_{μν}.
Having quoted the Einstein tensor, our next task is to find a way of throwing most
of it away.
The Einstein tensor is governed by Einstein's equation, G_μν = 8π T_μν. Examining
the G_00 term, we can see that the corresponding term in Einstein's equation specifies Ψ
in terms of s_ij and T_00. There is no time derivative of Ψ, and so no propagation of it.
Similarly, the G_0i term specifies the w_j in terms of Ψ, s_ij and the T_0i, and the G_ij term
specifies Φ, in each case, again, without a time derivative. So although, after Eq. (B.20),
we counted ten degrees of freedom, they are not all independent. The propagating terms
– the terms that will shortly lead to a wave equation – are all in the sij component of the
metric, which, you may recall, is both symmetric and trace-free.
In the discussion under Eq. (4.41) we discussed the family of gauge transformations
generated by hαβ → hαβ + 2ξ(α,β ) (this is mildly adjusted from Eq. (4.40), to make the
signs below nicer). What does this look like, when applied to the metric parameterised
by Eq. (B.20)? We obtain
Φ → Φ + ξ^0_{,0}
w_i → w_i + ξ^i_{,0} − ξ^0_{,i}    (B.23)
Ψ → Ψ − (1/3) ξ^k_{,k}
s_ij → s_ij + ξ_{(j,i)} − (1/3) ξ^k_{,k} δ_ij.
5 Did I mention that relativists really like computer algebra packages?
Under this transformation,
s^{ik}_{,k} → s^{ik}_{,k} + ξ^{(k,i)}_{,k} − (1/3)( ξ^k_{,k} δ^{ij} )_{,j}
            = s^{ik}_{,k} + (1/6) ξ^{k,i}_{,k} + (1/2) ∇^2 ξ^i.
This gives a differential equation for ξ^i which makes the divergence of the strain vanish:
s^{ik}_{,k} = 0.    (B.24)
Doing the same thing with the w_i expression, and choosing ξ^0 so that ∇^2 ξ^0 − ξ^i_{,0i} = w^i_{,i}, gives
w^i_{,i} = 0.
This choice of coordinates – that is, this choice of ξ μ when applied to xμ – is known as
the transverse gauge. This gauge choice significantly simplifies Einstein’s equations in
Eq. (B.22). In this gauge, we can finally solve Einstein’s equations to find propagating
gravitational wave solutions.
We are looking for solutions propagating in free space, in which T_μν = 0. Looking
at the simplified Eq. (B.22), the G_00 = 0 equation implies ∇^2 Ψ = 0, which, in an
asymptotically flat space-time (remember Ψ is a small perturbation which must vanish
at large r), implies Ψ = 0. The G_0i term then similarly implies w_i = 0. Finally,
G_ij = 0 implies that Tr G_ij = δ^{ij} G_ij = 0 and thus (since s_ij is traceless) that
∇^2 Φ = 0 and Φ = 0. The only term that survives this carnage is the traceless part of
the G_ij = 0 equation, or
□ s_ij = 0.    (B.25)
This is usually referred to as the transverse-traceless gauge. Looking back to our
re-notation of the metric perturbation hμν in Eq. (B.20), and recalling that in this
context Φ, w_i, and Ψ are all zero, we write the metric perturbation in this gauge as h^{TT},
and rewrite the above reduction of Einstein's equations as
□ h^{TT}_{μν} = 0.    (B.26)
Comparing this to the definitions of Φ, w_i, and Ψ in Eq. (B.20), we can see that
h^{TT}_{00} = h^{TT}_{0i} = 0. Eq. (B.26) has as solution a matrix of metric perturbations h^{TT}(x^α) = h^{TT}(t, x),
TT α TT

giving the small deviations from a Minkowski metric as a function of position and time.
Recall that this matrix gives expressions for the components of this perturbation in this
gauge, and that (recalling Section 4.3.1) we can regard this as a tensor in a background
Minkowski space.

B.3.2 Gravitational Wave Solutions
Equation (B.26) is a wave equation, which has solutions
h^{TT}_{μν} = C_{μν} exp(i k_α x^α),    (B.27)
for a constant one-form k_α, and a constant tensor C with possibly complex components.
In this gauge, C_00 = C_0i = 0 and η^{μν} C_{μν} = 0 (i.e., C is traceless).
Applying Eq. (B.26) to this, we find that
−k_α k^α h^{TT}_{μν} = 0,
which is a solution only if k_α k^α = 0, so that the wave-vector k^α is null.
Looking back at the solution in Eq. (B.27), we see that h^{TT}_{μν} is constant if k_α x^α is
constant. This is true for the curve x^α(λ) = k^α f(λ) + l^α for any scalar function f and
constant vector l^α. Here, the function f(λ) = λ, or some affine transformation of that,
constant vector lα . Here, the function f (λ) = λ, or some affine transformation of that,
gives the worldline of a photon, indicating that a given perturbation – that is, a given
value of hTT
μν
– propagates in the same way as a photon, rectilinearly at the speed of
light.
Imposing the gauge condition h^TTμν_,ν = 0, Eq. (B.24), we deduce that k^μ C_μν = 0: the wave vector is orthogonal to the constant tensor C_μν, and the oscillations of the solution are transverse to the direction of propagation. This is why this is called the transverse gauge.
Our last observation is that we can, without loss of generality, orient our coordinates so that the wave vector is pointing along the z-axis, and k^μ = (ω, 0, 0, ω), writing ω for the time component of the wave vector. Thus 0 = k^μ C_μν = k^0 C_0ν + k^3 C_3ν, so that C_3ν = 0 as well. Writing (with, again, a little foreknowledge) C_11 = h+ and C_12 = h×, this means that the tensor components C_μν simplify, in this gauge, to

            ⎛ 0   0    0   0 ⎞
    C_μν =  ⎜ 0   h+   h×   0 ⎟    (B.28)
            ⎜ 0   h×  −h+   0 ⎟
            ⎝ 0   0    0   0 ⎠

After identifying ten degrees of freedom in Eq. (B.20), in this gauge there are only two degrees of freedom left.
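These algebraic conditions are easy to verify numerically. The following sketch is my own illustration, not part of the text's presentation (numpy, and the sample values of ω and the amplitudes, are assumptions): it checks that k^μ = (ω, 0, 0, ω) is null, and that the C_μν of Eq. (B.28) is traceless and transverse to the wave vector.

```python
import numpy as np

# Minkowski metric, signature (-,+,+,+); numerically eta^{mu nu} = eta_{mu nu}
eta = np.diag([-1.0, 1.0, 1.0, 1.0])

# Wave vector along the z-axis: k^mu = (omega, 0, 0, omega); omega is a sample value
omega = 2.0
k_up = np.array([omega, 0.0, 0.0, omega])
k_dn = eta @ k_up                      # lower the index: k_mu = eta_{mu nu} k^nu

# Polarisation tensor C_{mu nu} of Eq. (B.28), with sample amplitudes
h_plus, h_cross = 0.3, 0.1
C = np.zeros((4, 4))
C[1, 1], C[2, 2] = h_plus, -h_plus
C[1, 2] = C[2, 1] = h_cross

null_check = k_dn @ k_up               # k_alpha k^alpha: vanishes for a null vector
trace = np.einsum('mn,mn->', eta, C)   # eta^{mu nu} C_{mu nu}: tracelessness
transverse = k_up @ C                  # k^mu C_{mu nu}: transversality

print(null_check, trace, transverse)   # 0.0, 0.0, and a zero 4-vector
```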

B.3.3 Detecting Gravitational Waves


Above, in Eqs. (B.27) and (B.28), we have identified gravitational waves. What is
the effect of these on matter, or, in more pragmatic terms, how can we detect them
experimentally? First, we need to ask what is the effect, on a free-falling particle, of a
passing gravitational wave.
The paragraph before Eq. (B.28) describes a set of coordinate systems where hTT is
a perturbation of a Lorentz frame oriented so that the gravitational wave is propagating
along the z-axis. Given a test particle in free fall, there is one of these frames with
respect to which the particle is initially at rest, with tangent vector U^μ = (1, 0, 0, 0).
This particle obeys the geodesic equation, Eq. (3.43),

    dU^μ/dτ + Γ^μ_αβ U^α U^β = 0.

Given the initial condition, this gives an initial deflection of

    dU^μ/dτ = −Γ^μ_00 = −½ η^μν (h^TT_0ν,0 + h^TT_ν0,0 − h^TT_00,ν).


Since h^TT is proportional to C_μν, and C_μν has components as in Eq. (B.28) in these coordinates, we see that the right-hand side is zero, so that the particle's velocity vector is unchanged, and it remains at rest as the gravitational wave passes.
Consider now a second test particle, a distance ε from the first, along the x-axis. The proper distance between these two particles is

    Δl = ∫ |ds²|^1/2 = ∫ |g_αβ dx^α dx^β|^1/2
       = ∫₀^ε |g_xx|^1/2 dx
       ≈ |g_xx|^1/2 |_x=0 ε
       = [1 + ½ h^TT_xx |_x=0 + O((h^TT_xx)²)] ε.    (B.29)

Since hTT has an oscillatory factor, the proper distance between the test particles
oscillates as the wave passes, even though both particles are following geodesics, and
are thus ‘at rest’ in their corresponding Lorentz frames. They are ‘at rest’ in the separate
senses that they remain at the same coordinate positions (in these coordinates) and that
they remain on geodesics. This is an example of the distinction between ‘acceleration’
as a second derivative of coordinate position – which is a frame-dependent thing –
and acceleration as the thing you directly perceive when pushed – which is a frame-
independent thing.
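To put numbers to Eq. (B.29): the sketch below is my own illustration (numpy, the wildly exaggerated amplitude, and the 1 Hz frequency are assumptions). It takes h^TT_xx = h+ cos ωt at the particles' location, and shows the proper separation oscillating while the coordinate separation ε never changes.

```python
import numpy as np

eps = 1.0                          # fixed coordinate separation along x
h_plus = 0.2                       # exaggerated amplitude, for visibility
omega = 2.0 * np.pi                # a 1 Hz wave, in these units

t = np.linspace(0.0, 1.0, 5)       # one period, sampled at quarter-periods
h_xx = h_plus * np.cos(omega * t)  # h^TT_xx at the particles' location

# Eq. (B.29), to first order in h: proper separation between the particles
dl = (1.0 + 0.5 * h_xx) * eps
print(dl)                          # oscillates between 1.1 and 0.9
```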
The two test particles are both following geodesics, so we should be able to analyse
this same situation using the geodesic deviation equation, Eq. (3.60), which directly
describes the way in which the proper distance between two test particles changes, as
they move along their mutual geodesics. To first order, we can take both tangent vectors
to be U μ = (1, 0, 0, 0) , as above, so that
    (d²ξ/dt²)^μ = R^μ_αβν U^α U^β ξ^ν.

Evaluating R^μ_αβν = η^μσ R_σαβν in our transverse-traceless gauge,

    R_σαβν U^α U^β = h^TT_σ[ν,0]0 − h^TT_0[ν,0]σ                          from Eq. (B.21)
                   = ½ (h^TT_σν,00 − h^TT_σ0,ν0 − h^TT_0ν,0σ + h^TT_00,νσ)
                   = ½ h^TT_σν,00                                          from Eq. (B.28).

For these test particles, x^0 = τ = t, so that the geodesic deviation equation becomes

    (d²ξ/dt²)^μ = ½ η^μσ ξ^ν d²h^TT_σν/dt².    (B.30)

Notice that only ξ 1 and ξ 2 are affected – the directions transverse to the direction the
wave is travelling in.

Figure B.1 Polarised gravitational waves. The successive figures show the positions of a ring of test particles at successive times. The upper figure shows the effect of +-polarised gravitational waves, and the lower the effect of ×-polarised ones.

If h× = 0, then Eq. (B.30) becomes

    ∂²ξ¹/∂t² = ½ ξ¹ ∂²/∂t² (h+ exp ik_μx^μ)
    ∂²ξ²/∂t² = −½ ξ² ∂²/∂t² (h+ exp ik_μx^μ),

with lowest-order solutions

    ξ¹ = [1 + ½ h+ exp(ik_μx^μ)] ξ¹(0)
    ξ² = [1 − ½ h+ exp(ik_μx^μ)] ξ²(0).    (B.31)

The corresponding solutions for h+ = 0 are

    ξ¹ = ξ¹(0) + ½ h× exp(ik_μx^μ) ξ²(0)
    ξ² = ξ²(0) + ½ h× exp(ik_μx^μ) ξ¹(0).    (B.32)
Compare Eq. (B.29): what we have done here is essentially a more elaborate calculation
of the same thing.
The effects of these solutions, on a ring of test particles, are shown in Figure B.1,
with the + -polarised solutions corresponding to Eq. (B.31) and the × -polarised ones
corresponding to Eq. (B.32). Since Eq. (B.30) is linear, its solutions are superpositions
of these two solutions.
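The deformations in Figure B.1 can be reproduced directly from Eqs. (B.31) and (B.32). This sketch is my own illustration (numpy, the eight-particle ring, and the snapshot value of the oscillatory factor are assumptions, not from the text):

```python
import numpy as np

# A ring of eight test particles in the x-y plane: initial separations from the centre
phi = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
xi1_0, xi2_0 = np.cos(phi), np.sin(phi)

amp = 0.2     # exaggerated wave amplitude
osc = 1.0     # snapshot value of the oscillatory factor exp(ik_mu x^mu)

# +-polarised wave, Eq. (B.31): the ring is stretched along x and squeezed along y
xi1_plus = (1.0 + 0.5 * amp * osc) * xi1_0
xi2_plus = (1.0 - 0.5 * amp * osc) * xi2_0

# x-polarised wave, Eq. (B.32): the same ellipse, rotated by 45 degrees
xi1_cross = xi1_0 + 0.5 * amp * osc * xi2_0
xi2_cross = xi2_0 + 0.5 * amp * osc * xi1_0

# The particle initially at (1, 0) moves radially under the + polarisation...
print(xi1_plus[0], xi2_plus[0])    # 1.1 0.0
# ...but sideways under the x polarisation
print(xi1_cross[0], xi2_cross[0])  # 1.0 0.1
```

Plotting the pairs (xi1, xi2) at successive values of `osc` reproduces the successive frames of Figure B.1.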
The gravitational waves emitted by moving masses – in practical terms, by fast-
orbiting or collapsing stellar masses – propagate at the speed of light in these two
polarisations.
The orbit of the two neutron stars in the so-called 'Hulse–Taylor' binary pulsar has slowed in exactly the way that GR would predict, if the binary system is radiating gravitational waves. This provides a very strong indirect detection of gravitational waves (Weisberg et al., 2010). However, the calculations above suggest that these waves
in space-time can also be directly detected by measuring the proper separations of test
particles in that space-time.

If a pair of test masses – either two of the particles in Figure B.1 or the two in Eq. (B.29) – are not in free fall but instead carefully suspended so that they are stationary in a lab, then their motion is governed both by the suspending force, which accelerates them away from a free-fall geodesic but which is constant, and by the changes to space-time caused by any passing gravitational waves. By carefully monitoring the changes in proper separation between the masses, using the light-travel time between them, we should be able to detect the passage of any gravitational waves in the vicinity.
The size of these fluctuations is given by the size of the terms in h^TT, compared to the terms in the metric they are perturbing, which are equal to 1. For the Hulse–Taylor binary mentioned above, h^TT ∼ 10⁻²³. Detecting the changes in proper separation requires a rather intricate measurement.
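To quantify 'intricate': by Eq. (B.29) the fractional change in separation is of order h, so over a baseline L the length change is ΔL ≈ ½hL. A back-of-envelope sketch (my own; the 4 km baseline is the scale of the LIGO arms described below, and the strain is the Hulse–Taylor figure just quoted):

```python
# Order-of-magnitude length change produced by a strain h over a baseline L,
# using Eq. (B.29): delta_L is roughly h * L / 2.
h = 1e-23       # strain of the waves from the Hulse-Taylor binary, as above
L = 4e3         # 4 km baseline, in metres

delta_L = 0.5 * h * L
print(delta_L)  # 2e-20 metres: around a hundred-thousandth of a proton's radius
```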
One way of doing this is to develop resonant bar detectors, which are large bars,
typically aluminium, with a mechanical resonance that could be excited by a sufficiently
strong and sufficiently long gravitational wave of the matching frequency. These proved
insufficiently sensitive, however, and the gravitational wave community instead moved
on to develop interferometric detectors. These use an optical arrangement very similar
in principle to the Michelson–Morley detector which failed to detect the luminiferous
aether, and which was so important in the history of SR. The test masses here are
mirrors at the ends of two arms at right angles to each other: changes in the lengths
of the arms, as measured by lasers within the arms are, with suitably heroic efforts,
detectable interferometrically.
There are detailed discussions of the calculation in Schutz (2009, chapter 9) and
Carroll (2004, chapter 7), a summary calculation and a discussion of astrophysical
implications in Sathyaprakash and Schutz (2009), and an account of the experimental
design in Pitkin et al. (2011). The LISA design for a future space-based interferometer,
at a much larger scale and with higher sensitivities, and the astronomy it may unlock,
is described in Amaro-Seoane et al. (2013 & 2017). Pitkin et al. (2011) also provides
a very brief history of the development of gravitational wave detectors, and there is a
much more extensive history, from the point of view of the sociology of science, in
Collins (2004).
The first direct detection of gravitational waves was made by the LIGO experiment
on 2015 September 14, and jointly announced by the LIGO and Virgo collaborations on
2016 February 11 (Abbott et al., 2016). Subsequent detections confirm the expectation
that such measurements will become routine, and will allow gravitational wave physics
to change from being an end in itself, to becoming a non-electromagnetic messenger
for the exploration of the universe on the largest scales.

Exercises
Exercise B.1 (§ 2.1) Derive Eqs. (B.3) and (B.4). You might as well obtain
Eq. (B.10) as an encore. [d− ]

Exercise B.2 (§ 2.2) Prove Eq. (B.11). Suspend the summation convention for this exercise. First, obtain the Christoffel symbols for an orthogonal metric:

    Γ^α_αα = g_αα,α / (2g_αα)
    Γ^α_ββ = −g_ββ,α / (2g_αα)        α ≠ β
    Γ^α_αβ = g_αα,β / (2g_αα)         α ≠ β
    Γ^α_βγ = 0                        α, β, γ all different.

Then start from one or other version of the geodesic equation, such as Eq. (4.46), expand d/dτ (g_μμ ∂x^μ/∂τ), use the geodesic equation, and expand the sum carefully. [u+]
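Not a substitute for the pencil-and-paper derivation the exercise asks for, but the orthogonal-metric formulas can be spot-checked symbolically. This sketch is my own addition (sympy, and plane polar coordinates as the sample orthogonal metric, are assumptions): it computes Γ^a_bc from the general formula and recovers the expressions above.

```python
import sympy as sp

# Plane polar coordinates (r, theta) with the orthogonal metric diag(1, r^2)
r, th = sp.symbols('r theta', positive=True)
x = [r, th]
g = sp.diag(1, r**2)
ginv = g.inv()

# General formula: Gamma^a_{bc} = (1/2) g^{ad} (g_{db,c} + g_{dc,b} - g_{bc,d})
def gamma(a, b, c):
    return sp.simplify(sum(
        sp.Rational(1, 2) * ginv[a, d]
        * (sp.diff(g[d, b], x[c]) + sp.diff(g[d, c], x[b]) - sp.diff(g[b, c], x[d]))
        for d in range(2)))

# Compare with the orthogonal-metric expressions:
# Gamma^r_{theta theta} = -g_{theta theta, r} / (2 g_rr)          = -r
# Gamma^theta_{r theta} =  g_{theta theta, r} / (2 g_{theta theta}) = 1/r
print(gamma(0, 1, 1), gamma(1, 0, 1))   # -r 1/r
```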

Exercise B.3 (§ 2.2) Prove Eq. (B.14). Expand the left-hand side, use the
geodesic equation, and use symmetry.

Exercise B.4 (§ 2.2) Estimate the relative numerical values of the terms on the
right-hand side of Eq. (B.16), for the orbit of the earth. Confirm that the first term is
indeed small compared to the third. [ u+ ]
Appendix C Notation

Chapters 2 and 3 introduce a great deal of sometimes confusing notation. The best way
to get used to this is: to get used to it, by working through problems, but while you’re
slogging there, this crib might be useful.

C.1 Tensors
(M)
A N tensor is ( )a linear function of M one-forms and( ) N vectors, which turns them into
a number. A 10 tensor is called a vector, and a 01 tensor is a one-form. Vectors are
written with a bar over them, V , and( ) one-forms with a tilde ±p (§2.2.1). In my (informal)
notation in the text, T(±· , · ) is a 11 tensor – a machine with a single one-form-shaped
( )vector-shaped slot. Note that this is a different beast from T( · , ±· ),
slot and a single
which is also a 11 tensor, but with the slots differently arranged.

C.2 Coordinates and Components


In a space of dimension n, a set of n linearly independent vectors e_i (i = 1, …, n) forms a basis for all vectors in the space; a set of n linearly independent one-forms ω̃^i forms a basis for all one-forms in the space.

    e_i(ω̃^j) = δ_i^j ⇔ ω̃^j(e_i) = δ^j_i
        Choose basis vectors and one-forms to be dual (remember that e_i and ω̃^i are functions)    (2.5)
    V = V^0 e_0 + V^1 e_1 + ···
        Vectors have components, written with raised indexes    §2.2.5
    p̃ = p_0 ω̃^0 + p_1 ω̃^1 + ···
        … so do one-forms, but written with lowered indexes
    V^i = V(ω̃^i),  p_i = p̃(e_i)
        Components of vectors and one-forms (a consequence of the preceding relations)    (2.6)


    T^i_j = T(ω̃^i, e_j)
        Tensors have components, too    (2.7)
    T_i^j = T(e_i, ω̃^j)
        A different beast (note the arrangement of indexes)

The object T^i_j is a number – a component of a tensor in a particular basis. However we also (loosely, wickedly) use this same notation to refer to the corresponding matrix of numbers, and even to the corresponding (1, 1) tensor T.

The vector space in which these objects live is the tangent plane to the manifold M at the point P, T_P(M) (§3.1.1). In this space, the basis vectors are e_i = ∂/∂x^i (§3.1.2), and the basis one-forms are d̃x^i, where x^i is the i-th coordinate (more precisely, coordinate function; note that x^i is not a component of any vector, though the notation makes it look like one). These bases are dual: e_i(ω̃^j) = ∂/∂x^i(d̃x^j) = δ_i^j (cf., (3.7)).

C.3 Contractions
A contraction is a tensor with some or all of its arguments filled in.

    V(p̃) = p̃(V)
        by choice    §2.2.5
    p̃(V) ≡ ⟨p̃, V⟩
        special notation for vectors contracted with one-forms    (2.2)
    p̃(V) = p_i V^i
        basis independent    §2.2.5
    T(p̃, ·̃, V)^j = p_i V^k T^ij_k
        partially contracted (2,1) tensor (a vector)    §2.2.5
    p = g(p̃, ·̃)
        a vector, with components …    §3.2.3
    g(p̃, ω̃^j) = p_i g(ω̃^i, ω̃^j) ≡ p^j
        definition of vector p, with raised indexes, dual to one-form p̃, written with lowered indexes
    g_ij = g(e_i, e_j),  g^ij = g(ω̃^i, ω̃^j)
        components of the metric … up and down    (2.12)
    g_ij g^kl T^j_l = T_i^k
        the metric raises and lowers indexes; T(·̃, ·) and T(·, ·̃) are distinct but related
    g_ij, g^i_j = δ^i_j, g^ij
        different tensors in principle, but all referred to as 'the metric'    (2.15)
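The raising and lowering machinery in this table can be exercised numerically. A minimal sketch (my own; numpy, the polar-coordinate metric, and the component values are assumptions):

```python
import numpy as np

# Metric of the Euclidean plane in polar coordinates (r, theta), evaluated at r = 2
r = 2.0
g_dn = np.diag([1.0, r**2])     # g_ij
g_up = np.linalg.inv(g_dn)      # g^ij

p_dn = np.array([3.0, 5.0])     # one-form components p_i
p_up = g_up @ p_dn              # raise the index: p^i = g^ij p_j

# Contracting the one-form with its own raised version gives the squared norm;
# like any full contraction p_i V^i, the result is a single number.
norm_sq = p_dn @ p_up
print(p_up, norm_sq)            # p^i = [3, 1.25], squared norm 15.25
```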

C.4 Differentiation
    V^i_,j ≡ ∂V^i/∂x^j
        (non-covariant) derivative of a vector component
    p_i,j ≡ ∂p_i/∂x^j
        … and one-form
    ∇V
        covariant derivative of V (a (1,1) tensor)    §3.2.2
    ∇_U V
        covariant derivative of V in the direction U (a vector)    cf. (3.40)
    ∇_{e_i} V ≡ ∇_i V
        shorthand
    (∇V)^i_j = V^i_;j
        components of the (1,1) tensor ∇V    (3.21)
    (∇p̃)_ij = p_i;j
        components of the (0,2) tensor ∇p̃
    V^i;j = g^jk V^i_;k
        it's a tensor, so you can raise its indexes, too
    ∇_U V = U^α V^β_;α e_β
        … putting all that together

C.5 Changing Bases


We might change from a basis e_i (for example e_0, …, e_3) to a basis e_ī (e_0̄, …, e_3̄), noting that the bar goes over the index and not, as might be more intuitive, the vector (that is, we don't write ē_i or e′_i). The transformation is described using a matrix (it's not a tensor) Λ^ī_j (§2.2.7).

    e_ī = Λ^i_ī e_i,  ω̃^ī = Λ^ī_i ω̃^i
        transformation of basis vectors and one-forms    §3.1.4
    V^ī = Λ^ī_i V^i,  p_ī = Λ^i_ī p_i
        transformation of components    (2.17)
    e_i = Λ^j̄_i e_j̄ = Λ^j̄_i Λ^j_j̄ e_j
        the transformation matrix goes in both directions
    ⇒ Λ^j̄_i Λ^j_j̄ = δ^j_i
        matrix inverses    (2.22)

Notes: (1) these look complicated to remember, but as long as each Λ has one barred and one unbarred index, you'll find there's only one place to put each index, consistent with the summation convention. (2) This shows why it's useful to have the bars on the indexes. (3) Some books use hats for the alternate bases: e_î.
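The index bookkeeping here is exactly matrix algebra, and can be checked with a concrete example (my illustration; numpy and the particular rotation are assumptions): the two Λ matrices are inverses, and because components transform oppositely to basis vectors, the contraction p_i V^i comes out the same in either basis.

```python
import numpy as np

# A change of basis in the plane: a rotation by 30 degrees
a = np.pi / 6
Lam = np.array([[np.cos(a), np.sin(a)],     # Lambda^{i-bar}_j
                [-np.sin(a), np.cos(a)]])
Lam_inv = np.linalg.inv(Lam)                # Lambda^j_{i-bar}

V_up = np.array([1.0, 2.0])                 # vector components V^i
p_dn = np.array([4.0, -3.0])                # one-form components p_i

V_bar = Lam @ V_up       # V^{i-bar} = Lambda^{i-bar}_i V^i
p_bar = p_dn @ Lam_inv   # p_{i-bar} = Lambda^i_{i-bar} p_i

# The two matrices compose to the identity, Eq. (2.22), and the contraction
# p_i V^i (a number, not a component) is the same in either basis.
assert np.allclose(Lam_inv @ Lam, np.eye(2))
print(p_dn @ V_up, p_bar @ V_bar)           # both -2, up to rounding
```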

C.6 Einstein’s Summation Convention


See §2.2.5.

    p_i V^i ≡ Σ_i p_i V^i.

The convention works only for two repeated indexes: one up, one down. This is one of the reasons why one-form components are written with lowered indexes and vectors with raised ones; the other is to distinguish the components p_i of the one-form p̃ from the components p^i = g^ij p_j of the related vector p = g(p̃, ·̃). Points to watch:

• A term should have at most two duplicate indexes: one up, one down; if you find something like U_i V_i or U_i T^ii, you've made a mistake.
• All the terms in an expression should have the same unmatched index(es): A^i = B_j T^ij + C^i is all right, A^i = B_j T^kj is a mistake (typo or thinko).
• You can change or swap repeated indexes: A^ij T_ij, A^ik T_ik, and A^ji T_ji all mean exactly the same thing, but all are different from A^ij T_ji (unless T happens to be symmetric).

However, sometimes we will refer to particular components of a tensor or matrix, such as referring to the diagonal elements of the metric as g_ii – there's no summation convention here, so the proscriptions above aren't relevant. The context matters.
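numpy's einsum function uses essentially this convention (though plain arrays cannot express the up/down distinction), which makes the points above easy to experiment with. A sketch, with arbitrary sample arrays of my own choosing:

```python
import numpy as np

p = np.array([1.0, 2.0, 3.0])        # one-form components p_i
V = np.array([4.0, 5.0, 6.0])        # vector components V^i
A = np.arange(9.0).reshape(3, 3)     # components A^{ij}
T = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0],
              [4.0, 0.0, 1.0]])      # components T_{ij}: deliberately not symmetric

# p_i V^i: one index repeated, so it is summed
s = np.einsum('i,i->', p, V)
print(s)                             # 32.0

# A^{ij} T_{ij} and A^{ij} T_{ji} differ, since T is not symmetric...
s1 = np.einsum('ij,ij->', A, T)
s2 = np.einsum('ij,ji->', A, T)
print(s1, s2)                        # 53.0 47.0

# ...but renaming a summed index pair changes nothing: A^{ik} T_{ik} = A^{ij} T_{ij}
assert np.einsum('ik,ik->', A, T) == s1
```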

C.7 Miscellaneous
    δ^i_j = δ^ij = δ_ij = 1 if i = j, 0 otherwise
        Kronecker delta symbol    §2.1.1
    η_μν = diag(−1, 1, 1, 1)
        the metric (tensor) of Minkowski space    (2.33)
    [A, B] = AB − BA
        the 'commutator'    before (3.57)
    A_[ij] = (A_ij − A_ji)/2
        anti-symmetrisation
    A_(ij) = (A_ij + A_ji)/2
        symmetrisation    (4.42)

For the definitions – and specifically the sign conventions – of the Riemann, Ricci, and Einstein tensors, see Eqs. (3.49), (4.16), and (4.23). For a table of contrasting sign conventions in different texts, see Table 1.1 in §1.4.3.

Note: Λ^ī_j and Γ^i_jk are matrices, not the components of tensors, so the indexes don't correspond to arguments, and so don't have to be staggered.

In general, component indexes are Roman letters: i, j, and so on. When discussing specifically space-time, it is traditional but not universal to use Greek letters such as μ, ν, α, β, and so on, for component indexes ranging over {0, 1, 2, 3}, and Roman letters for indexes ranging over the spacelike components {1, 2, 3}. Here, we use the latter convention only in Chapter 4.

Components usually stand in for numbers; however, we'll sometimes replace them with letters when a particular coordinate system suggests them: for example, we write e_x rather than e_1 in the context of cartesian coordinates, or Γ^θ_φφ rather than, say, Γ^1_22 when writing the Christoffel symbols for coordinates (θ, φ). There shouldn't be confusion (context, again), because x, y, θ, and φ are never used as (variable) component indexes; see e.g., Eq. (3.15).
References

Abbott, B. P. et al. (2016), 'Observation of gravitational waves from a binary black hole merger', Phys. Rev. Lett. 116, 061102. https://doi.org/10.1103/PhysRevLett.116.061102.
Amaro-Seoane, P. et al. (2013), ‘Doing science with eLISA: Astrophysics and
cosmology in the millihertz regime’, Preprint. https://arxiv.org/abs/1201.3621
Amaro-Seoane, P. et al. (2017), ‘LISA mission L3 proposal’, web page. www
.lisamission.org/articles/lisa-mission/lisa-mission-proposal-l3. Last accessed July
2018.
Barton, G. (1999), Introduction to the Relativity Principle , John Wiley and Sons.
Bell, J. S. (1987), Speakable and Unspeakable in Quantum Mechanics , Cambridge
University Press.
Carroll, S. M. (2004), Spacetime and Geometry, Pearson Education. See also http://
spacetimeandgeometry.net .
Collins, H. (2004), Gravity’s Shadow: The Search for Gravitational Waves , University
of Chicago Press.
Collins, H. and Pinch, T. (1993), The Golem: What Everyone Should Know about
Science , Cambridge University Press.
Einstein, A. (1905), ‘Zur Elektrodynamik bewegter Körper (On the electrodynam-
ics of moving bodies)’, Annalen der Physik 17, 891. https://doi.org/10.1002/
andp.19053221004. Reprinted, in translation, in Lorentz et al. (1952).
Einstein, A. (1920), Relativity: The Special and the General Theory , Methuen. Orig-
inally published in book form in German, in 1917; first published in English in
1920, in an authorised translation by Robert W Lawson; available in multiple
editions and formats.
Goldstein, H. (2001), Classical Mechanics, 3rd edn, Pearson Education.
Hartle, J. B. (2003), Gravity: An Introduction to Einstein’s General Relativity , Pearson
Education.
Hawking, S. W. and Ellis, G. F. R. (1973), The Large Scale Structure of Space-Time ,
Cambridge University Press.
Janssen, M. and Renn, J. (2015), ‘Arch and scaffold: How Einstein found his field
equations’, Physics Today 68(11), 30–36. https://doi.org/10.1063/PT.3.2979 .
Landau, L. D. and Lifshitz, E. M. (1975), Course of Theoretical Physics, Vol. 2: The
Classical Theory of Fields , 4th edn, Butterworth-Heinemann.

148
References 149

Longair, M. S. (2003), Theoretical Concepts in Physics: An Alternative View of


Theoretical Reasoning in Physics , 2nd edn, Cambridge University Press.
Lorentz, H. A., Einstein, A., Minkowski, H. and Weyl, H. (1952), The Principle of
Relativity, Dover.
Misner, C. W., Thorne, K. S. and Wheeler, J. A. (1973), Gravitation , W. H. Freeman.
Narlikar, J. V. (2010), An Introduction to Relativity , Cambridge University Press.
Nevels, R. and Shin, C.-S. (2001), 'Lorenz, Lorentz, and the gauge', IEEE Antennas and Propagation Magazine 43(3), 70–71. https://doi.org/10.1109/74.934904.
Norton, J. D. (1993), 'General covariance and the foundations of general relativity: Eight decades of dispute', Reports on Progress in Physics 56(7), 791. https://doi.org/10.1088/0034-4885/56/7/001.
Particle Data Group (2016), ‘2017 Review of Particle Physics’, Chinese Physics
C 40(100001). https://doi.org/10.1088/1674-1137/40/10/100001. See also http://
pdg.lbl.gov.
Pitkin, M., Reid, S., Rowan, S. and Hough, J. (2011), 'Gravitational wave detection by interferometry (ground and space)', Living Reviews in Relativity 14(1), 5. https://doi.org/10.12942/lrr-2011-5.
Ptolemaeus, C. (1984), Ptolemy’s Almagest, Duckworth. Trans. G. J. Toomer.
Rindler, W. (2006), Relativity: Special, General, and Cosmological , 2nd edn, Oxford
University Press.
Roberts, T. and Schleif, S. (2007), ‘What is the experimental basis of special
relativity?’, web page. The sci.physics.relativity FAQ, including an actively
curated list of references. http://math.ucr.edu/home/baez/physics/Relativity/SR/
experiments.html. Last accessed July 2018.
Sathyaprakash, B. S. and Schutz, B. F. (2009), 'Physics, astrophysics and cosmology with gravitational waves', Living Reviews in Relativity 12(1), 2. https://doi.org/10.12942/lrr-2009-2.
Schild, A. (1962), ‘The principle of equivalence’, The Monist 47(1), 20–39. www
.jstor.org/stable/27901491. Last accessed July 2018.
Schutz, B. F. (1980), Geometrical Methods of Mathematical Physics , Cambridge
University Press.
Schutz, B. F. (2003), Gravity from the Ground Up , Cambridge University Press.
Schutz, B. F. (2009), A First Course in General Relativity , 2nd edn, Cambridge
University Press.
Schwartz, J. and McGuinness, M. (2003), Einstein for Beginners, Pantheon Books.
Snygg, J. (2012), A New Approach to Differential Geometry using Clifford’s Geometric
Algebra, Birkhäuser Basel. https://doi.org/10.1007/978-0-8176-8283-5.
Stewart, J. (1991), Advanced General Relativity, Cambridge University Press.
Taylor, E. F. and Wheeler, J. A. (1992), Spacetime Physics, 2nd edn, W. H. Freeman.
Wald, R. M. (1984), General Relativity, University of Chicago Press.
Weisberg, J. M., Nice, D. J. and Taylor, J. H. (2010), 'Timing measurements of the relativistic binary pulsar PSR B1913+16', The Astrophysical Journal 722(2), 1030–1034. https://doi.org/10.1088/0004-637X/722/2/1030.
Will, C. M. (2014), 'The confrontation between general relativity and experiment', Living Reviews in Relativity 17(1). https://doi.org/10.12942/lrr-2014-4.
Index

acceleration, 4, 6, 10, 74, 97, 111, 140
affine parameter, 65, 67, 81, 106
basis, 27, 49
  coordinate, 39, 51
  dual, 29, 32, 37, 39, 50, 144
  transformation, 32, 50
Bianchi identities, 94
cartesian basis, 42
cartesian coordinates, 41, 63
Cauchy stress tensor, 24
chart, 46
Christoffel symbols, 54, 58, 59, 77, 78, 107
commutator, 71, 73, 147
components, 27–31
connecting vector, 73, 80
connection, 62
  metric, 64, 66
contraction, 22, 26, 30, 145
  in SR, 40
coordinate system, 46
cosmological constant, 100
covariant derivative, 56, 62
  in LIF, 63, 71, 94, 96
curvature, 67
  and the metric, 32
curve, 46
d'Alembertian, 137
determinant, 20
direct product, 24
dust, 85
Einstein
  field equations, 97–102
  tensor, 94, 101
Einstein–Hilbert action, 101
energy density, 87, 92
energy-momentum tensor, 87–92
equivalence principle, 7, 95, 96, 107
  in SR, 110
  strong, 95
  weak, 4
euclidean space, 41, 46, 52
falling lift, 6
Faraday tensor, 93
field, 25, 50, 64
  one-form, 26, 50
  tensor, 32, 56
  vector, 42, 61, 62
flat space, 41
fluid, 85
flux vector, 86
galilean transformation, 3, 113
gauge invariance, 105
general covariance, principle of, 2
geodesic, 64
  coordinates, 60
  deviation, 9, 72, 140
  equation, 65, 67, 80, 106
gradient one-form, 26, 50, 87
gravitational redshift, 8
gravitational waves, 136–142


index lowering, 32
inertia tensor, 24
inertial frame, 3, 6, 66
  local, 3, 6, 10, 60, 80
inner product, 19, 31
  positive-definite, 19
Kronecker delta symbol, 19, 34
Levi–Civita symbol, 90
Lie derivative, 61, 105
linear independence, 19
linearity, 18, 65
local flatness theorem, 60
Lorentz basis, 42
Lorentz transformation, 3, 118
manifold, 46
mass density, 88
Maxwell's equations, 93
MCRF, 86
metric tensor, 31–32
  and the Christoffel symbols, 58
Minkowski space, 40, 41
momentum density, 88
natural units, 12, 102, 118, 132
Newton's laws, 107
norm, 19, 40
normal coordinates, 60
one-form, 21
orthogonal, 19
orthogonal coordinates, 133
orthonormal, 20
outer product, 24, 88
parallel transport, 61, 67
path, 46
polar coordinates, 37
reference frame, 46
  inertial, see 'inertial frame'
relativity, principle of, 3
Ricci
  scalar, 94
  tensor, 94, 101
Riemann tensor, 70, 101
Schild's photons, 8
Schwarzschild radius, 132
sign conventions, 15, 40, 70, 94
signature (of the metric), 40, 60
special relativity, 3, 6, 40, 85, 87, 110
strain, 136
stress-energy tensor, see 'energy-momentum tensor'
summation convention, 28
symmetrisation, 105, 147
tangent plane, 49, 61
tangent vector, 47–50, 62, 64
Taylor's theorem, 60, 70
tensor, 21, 144
  (anti)symmetric, 22
  components, 29
  trace, 20
transformation matrix, 33
  in SR, 40
transverse gauge, 138
transverse-traceless gauge, 138
units, geometrical, see 'natural units'
vector, 21
vector space, 19, 21, 49
  basis, 19
  components, 19
  dimension, 19
  span, 19
volume one-form, 90
