General Relativity - Notes

General Relativity describes spacetime as dynamic and curved rather than flat and static. Before General Relativity, space and time were seen as a fixed background for physics. General Relativity changed this by showing that spacetime is curved by mass and energy, and this curvature is experienced as gravity. The document provides a brief history of different views of space and time, from Aristotle's view of a static spacetime with Earth at the center, to later "Atomist" views removing Earth from the center but still viewing space and time as absolute.


General Relativity

Joe Keir

Please send comments/corrections to [email protected]


Chapter 1

A brief history of spacetime

The observed facts about earth are not only that it remains at the centre, but also that it
moves to the centre. The place to which any fragment of earth moves must necessarily be
the place to which the whole moves; and in the place to which a thing naturally moves, it
will naturally rest.

Aristotle, De Caelo, translated by J. L. Stocks.

Before the development of General Relativity, ‘space’, ‘time’, and later (after special relativity)
‘spacetime’, were simply the background on which physics took place. They were fairly simple and
uninteresting, although some of their properties did play an important role in the foundations of physics.
Nevertheless, aside from foundational considerations, these structures were not central objects of study
in physics: they formed a part of the necessary substrata on which a theory of physics could be built,
but were not particularly interesting in themselves.
All this changed dramatically with General Relativity. As we will see in this course, in GR spacetime
is no longer flat and featureless, but curved in interesting and sometimes extreme ways. This curvature
turns out to be what we experience as gravity. Moreover, spacetime is no longer static but dynamical;
no longer playing just a background role but instead taking a full part in the dynamical evolution of
physical systems.
Before we get to GR, though, let us take a brief look at some of the different historical notions of
space and time, and how this underlying structure informed the physical theories which were built on
top of it. Thinking precisely about these structures and what they mean – rather than letting them fade
into the background – will be helpful when it comes time to formulate GR1 .

1.1 “Aristotelian” spacetime

This is the most ‘common sense’ view of space and time. Something of this sort was held by some ancient
philosophers2 (see the quote at the start of this chapter), and is still common among children, though in
both cases without the level of mathematical sophistication we will give!
1 The presentation of the ‘history’ below is extremely anachronistic: I will present historical ideas in modern notation and
language. Some of the terms used are also rather idiosyncratic – the term “relativity” is usually reserved for the Galilean
and subsequent views, even though there is some kind of ‘relativity’ present in even earlier theories.
2 Actually, Aristotle’s view differed from the point of view put forward here in two important respects: first, he did not
believe in a finite past, and second, he believed that the universe is finite in spatial extent (erroneously believing that one
could not make sense of an “origin” or centre in an infinite space), so that spacetime is really more like E × B, with B a
ball of some large radius, and E the one-dimensional Euclidean space (see section 1.2). Perhaps the spacetimes described
in this section should instead be called “Thomist”, after Thomas Aquinas, who modified Aristotle’s arguments to fit better
with Christianity, in particular inserting a finite past.

Figure 1.1: A sketch of “Aristotelian” spacetime, suppressing one spatial direction. Time goes up the
page, and “space at a given time” is represented by a horizontal slice through spacetime. Space extends
infinitely in all directions, while time extends to the future of (or “above”, in the sketch) the plane t = 0.
O is the origin of space and time, and the origin of V3 at various times, i.e. the curve (t, 0), is shown
as a solid black straight line going up the page, emanating from O.

In this point of view, spacetime is simply R+ × V3 , with times t taking values in R+ and points p
in space taking values in a 3-dimensional vector space V3 . V3 is to be equipped with a positive definite
inner product (the ‘dot product’, X · Y ), which can be used to measure distances and angles. The origin
of V3 specifies some special place – presumably the centre of the earth – and the origin of R+ specifies
some special time – perhaps the ‘moment of creation’. If we choose a basis for V3 that diagonalizes the
dot product, then V3 can be identified with R3 , and the dot product is the usual vector dot product.
The choice of such a basis is unique up to rotations.

“Aristotelian” relativity

Physical laws should “respect” this background structure, meaning that they should be invariant
under changes that leave this structure intact. In this case, these changes are simply rotations of V3 ,
which leave the dot product invariant. In other words, rotations about the centre of the Earth (which is
obviously spherical – we’re not that primitive!) should not change the laws of physics.

Problems with Aristotelian spacetime

With this view of spacetime, physical laws should be the same at different places on the surface of
the earth, which is good. However, there is no reason why physics should be the same if we perform
other transformations, such as rotating about some other point. Physical laws could (and in general
should) depend on the distance away from the centre of the earth, and the time that has passed since
t = 0 – these are invariant under the allowed transformations. At first, this might seem like a good
thing: Aristotle used this idea to argue that the physical laws cause the element ‘earth’ to fall towards
the centre of the universe. However, from this point of view there’s no real reason why the physical
laws at one time or altitude should be anything like the physical laws at some other time or altitude, so
it would be very hard to build a physical theory with any predictive power on these foundations. Of
course, such a theory becomes untenable once it is appreciated that the Earth (or even the sun) is not
the centre of the universe!

Figure 1.2: A sketch of “Atomist” spacetime. Unlike “Aristotelian” spacetime, there is no special origin
of time or space, both of which extend infinitely in all directions. The paths of several particles at rest
are shown as solid black arrows.

1.2 “Atomist” spacetime

This is a slightly more sophisticated version of the ‘common sense’ view of space and time. The
ancient atomist philosophers, for example, believed in an infinite universe with no central point3. This
formulation harmonizes with this viewpoint, while also making use of the more advanced Euclidean
geometry.
Atomist spacetime is modelled on E × E3 , where En is the n-dimensional Euclidean space. By this
we mean an affine space4 such that the associated (real) vector space comes equipped with a positive
definite inner product (the ‘dot product’, X · Y ). In other words, we have the same structures as in the
Aristotelian spacetime, but we forget about the origin.

“Atomist relativity”

Again, physical laws should respect the background structure. This time, the transformations leaving
the structure invariant are rotations (about any point), together with translations in space or in time. In
other words, physical laws should be the same no matter where you are in time or space, and no matter
in which direction you are facing.

Problems with Atomist spacetime

This is a big improvement over the Aristotelian spacetime considered above: we no longer have a
preferred centre of space or start of time, which is a much better foundation on which to build a predictive
theory. Despite these nice properties, there is still a notion of absolute space, and a restrictive notion of
objects being at rest. As Galileo would later show, these notions are incompatible with observations.
How do the relatively flexible structures of affine space give rise to these rigid structures? First, we
can define absolute space E3 using the projection

\[
E \times E^3 \to E^3, \qquad (t, x) \mapsto x
\]

so, although there is no distinguished origin to space, there is a unique point in “absolute space” (thought
of as Euclidean space E3) associated with every point in spacetime.

3 This viewpoint is also similar to the one put forward by the pantheist occultist Giordano Bruno in support of Copernicus.
4 An affine space can be thought of as just a vector space where we forget the special role played by the origin. More
concretely, for each pair of points in the affine space there is a unique vector - the “difference” between the pair of points -
in an associated vector space.

Next, consider a path through spacetime:

\[
\gamma : \mathbb{R} \to E \times E^3, \qquad s \mapsto (s, p(s))
\]

where we have chosen some (arbitrary up to translations) identification of E with R. Such a path through
spacetime is called a worldline: it could represent, for example, the path taken by a point particle, which
is at position p(s) at time s.
Now, this worldline is said to be the worldline of a particle at rest if, for all s, s̃ ∈ R, we have
p(s) − p(s̃) = 0. Recall that, for two points in the affine space E3 , their difference defines a vector in
R3 . Moreover, the transformations allowed by atomist relativity – translations and rotations – induce
corresponding transformations on R3 , which in this case are just the rotations. So, properties of these
vectors in R3 are allowed to have a physical meaning if they are invariant under rotations – in other
words, the length of such a vector can have a physical meaning. In particular, the zero vector is invariant
under rotations. Hence, the equation p(s) − p(s̃) = 0 defines a physically distinguished class of worldlines
on Atomist spacetime (see figure 1.2). Given one worldline of a particle at rest, it is fairly easy to see
that all the other such worldlines are given by translations of this one reference worldline.
Theories of physics built on these foundations were fairly successful: for example, the earth can be
considered “at rest”, and then it is natural to think that other objects will tend to follow other worldlines
which are at rest, at least in the absence of external factors (such as forces). This matches observations
fairly well; however, there are several difficulties with such a proposal: for example, it is very difficult to
explain projectile motion in this framework. Projectiles appear not to be affected by any external
factors5 and yet they are clearly not moving along worldlines which are at rest!

1.3 Galilean spacetime

Figure 1.3: A sketch of Galilean spacetime. To each point in time we associate an E3 , which corresponds
to space at that time. Unlike in the previous examples, there is no preferred identification of the
different E3 ’s, leading to the absence of absolute space. However, there is a special family of curves
through spacetime, which are the worldlines of inertial observers, some of which are shown as arrows
through spacetime in the figure.

After rolling some balls around on inclined planes, Galileo recognised the problems with models built
on absolute space. In a modern formulation the alternative model of spacetime proposed by Galileo takes
the form of a fibre bundle, with base space E and fibres E3 . If you’re not familiar with fibre bundles,
think of the line E, and, above every point on this line, place an E3 (see figure 1.3). This differs from
E1 × E3 in that there is no preferred identification of the different E3’s that are situated ‘above’ different
times – in other words, there is no absolute space.
We need a bit more structure than this: clearly space at one time and space at another time are not
completely unrelated to one another! Mathematically, the extra structure we need is a connection on this
fibre bundle, which allows us to identify a special class of “straight” worldlines on the fibre bundle. In
Galilean spacetime, these worldlines are all related to one another by Galilean transformations, discussed
below.

5 Here we are ignoring air resistance, but this can be justified by taking projectiles with very large mass so that the air
resistance is small.

Galilean relativity

In a Galilean spacetime, the distinguished class of worldlines are said to be the worldlines of inertial
observers. Suppose that we have one such worldline: we can use this to identify the different E3 ’s by using
coordinates on each E3 so that this worldline passes through the origin6 . In terms of these coordinates,
other worldlines of inertial observers can be found by means of Galilean transformations:

\[
t \mapsto t + t_0, \qquad x^i \mapsto R^i{}_j x^j + v^i t + (x_0)^i .
\]

Here Rij is a special orthogonal matrix, t0 is a constant scalar and v i and (x0 )i are some constant vectors.
Note that repeated indices are summed over.
As usual, physical theories should respect this background structure, which means that the physical
laws should be invariant under these Galilean transformations. Famously, this principle of Galilean
relativity is obeyed by Newton’s equations F = ma: the acceleration is not quite invariant (under a
Galilean transformation, a^i ↦ R^i_j a^j), but it does transform like a vector, and so it is able to play a
physical role, as long as we understand that the force F should also transform like a vector.

Problems with Galilean spacetime

This conception of spacetime was tremendously successful and underpins Newtonian mechanics. It is
still the view held by most people who have never learned relativity. Nevertheless, it has a few problems,
the main one being the constancy of the speed of light.
Under a Galilean transformation, a velocity V will transform as

\[
V^i \mapsto R^i{}_j V^j + v^i .
\]

In particular, the magnitude of a velocity is not invariant under Galilean transformations, and in fact,
given a velocity V there is always some transformation setting V to zero. The speed of light, therefore,
should be different when measured by different inertial observers. On the other hand, Maxwell’s
equations predict just one speed of light, independent of the observer, and this was eventually confirmed by
experiment.
Note that the magnitude of the difference between two velocities (or relative velocity) is invariant
under Galilean transformations. A popular solution to the problem of the constancy of the speed of light
was therefore to suppose the existence of “the aether” through which light propagates. The velocity
c, derived from Maxwell’s equations, was then supposed to be just the relative velocity of light waves
through the aether. However, all attempts to detect the aether failed, most famously in the Michelson-
Morley experiment, which showed that the speed of light is the same in all directions.
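
The velocity transformation rule and the invariance of relative velocities can be checked numerically. The following is only a minimal sketch and is not part of the original notes: the helper names (`rotation_z`, `galilean_velocity`, `norm`) and all sample numbers are illustrative assumptions, and the rotation is taken about the z-axis purely for simplicity.

```python
import math

def rotation_z(angle):
    """Rotation matrix about the z-axis (an element of SO(3))."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]

def galilean_velocity(R, v, V):
    """Apply V^i -> R^i_j V^j + v^i to a velocity V."""
    return [sum(R[i][j] * V[j] for j in range(3)) + v[i] for i in range(3)]

def norm(X):
    """Euclidean length of a 3-vector."""
    return math.sqrt(sum(x * x for x in X))

R = rotation_z(0.7)       # hypothetical rotation angle
v = [3.0, -1.0, 2.0]      # hypothetical relative velocity of the two frames

V1 = [1.0, 2.0, 0.5]      # two sample velocities
V2 = [-4.0, 0.0, 1.5]
W1 = galilean_velocity(R, v, V1)
W2 = galilean_velocity(R, v, V2)

# The speed of a single object changes under the transformation...
speed_before, speed_after = norm(V1), norm(W1)

# ...but the magnitude of a relative velocity does not.
rel_before = norm([a - b for a, b in zip(V1, V2)])
rel_after = norm([a - b for a, b in zip(W1, W2)])

# Taking R to be the identity and v = -V1 sets V1 to zero.
at_rest = galilean_velocity(rotation_z(0.0), [-x for x in V1], V1)
```

Here `speed_before` and `speed_after` differ, while `rel_before` and `rel_after` agree up to rounding, which is the property that the aether proposal relied on.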
There is also the issue of absolute time. With the Galilean viewpoint, we have successfully done away
with the notion of absolute space, but there is still a notion of absolute time. This can be defined by
using the canonical projection from the fibre bundle to E, which can be viewed as “collapsing” each of the
E3 ’s onto the point along the line E that it sits above (see figure 1.3). Using this, each “event” (that is,
each point in spacetime) can be assigned a unique “time” in E. In particular, this gives a unique ordering
on events: certain events happen before others, and all inertial observers will agree on what precedes
what. However, as Einstein showed (in thought experiments involving lightning bolts hitting trains),
there can be instances where different inertial observers will disagree about the ordering of events: this
is the famous relativity of simultaneity.

6 Technically, we also need to ensure that the E3’s don’t rotate relative to one another. This can be done by choosing a
basis for one of the E3’s, and then using the connection to parallel transport this basis onto the other E3’s – see chapter 4.

1.4 Minkowski spacetime

Figure 1.4: A sketch of Minkowski spacetime M4 . Space and time are no longer separate entities, but
are merged into the single object spacetime. There is no longer a (unique) notion of space at a given
time: horizontal slices are space at a given time according to an inertial observer moving vertically up
the page, but other observers will split space and time differently. The light cones (or null cones) are
invariant under Lorentz transformations, so all observers will agree on them - some of these are shown
in the sketch. Worldlines of inertial observers are straight lines and (if the observers have nonzero mass)
pass through the interior of the light cones.

Up until this point, ‘space’ and ‘time’ have actually been quite distinct, playing distinct and separate
roles in the formulations above. The first true ‘spacetime’, in which space and time are combined and
unified in a natural way, was put forward by Einstein in his special theory of relativity, and subsequently
geometrised by Minkowski.
According to this view, spacetime is to be identified with M4 , which is a four dimensional affine space.
Unlike the Euclidean spaces En of the previous examples, this doesn’t come equipped with a positive
definite inner product, but instead it comes equipped with an indefinite (but non-degenerate) quadratic
form m with Lorentzian signature (−, +, +, +).

Special relativity

The spacetime structure constructed above is invariant under Poincaré transformations. If we choose
coordinates xa , a = 0, 1, 2, 3 for the affine space M4 (i.e. we make an arbitrary choice of origin) then
these transformations are given by
\[
x^a \mapsto \Lambda^a{}_b x^b + y^a
\]
where Λ is a matrix in SO(1, 3), whose action is called a Lorentz transformation, and y a is a fixed
spacetime vector, corresponding to space and time translations.

The set of timelike7 straight lines in Minkowski space defines inertial observers in this view of
spacetime. Note that a Poincaré transformation takes straight lines to straight lines; therefore the
property of being an inertial observer can have physical content.
Physical theories should be invariant under Poincaré transformations: this is the principle of relativity.
In practice, this means that physical theories can be of the form: “For an inertial observer, the physical
laws are…”. We will review special relativity in more detail in chapter 3.

Problems with Minkowski spacetime

Unlike the previous views of spacetime, which were maintained for hundreds or even thousands of
years, Minkowski spacetime and special relativity held the crown as the state-of-the-art view of spacetime
for a mere ten years. This is not because of their lack of success, but rather because of the rapid progress
made by Einstein, who was able to quickly supersede his own theory!
The main problem (in fact, the only real problem) with this view of spacetime is that it fails to
incorporate gravity. As we shall see in the next chapter, Newtonian gravity is incompatible with special
relativity, so Einstein set about trying to remedy this. In the process, not only did he produce a new
theory of gravity - which remains the current best theory of gravity, after more than 100 years - but he
revolutionised our conception of spacetime for a second time.

7 See chapter 3

Chapter 2

Newtonian gravity

The history of the apple is too absurd. Whether the apple fell or not, how can any one
believe that such a discovery could in that way be accelerated or retarded? Undoubtedly,
the occurrence was something of this sort. There comes to Newton a stupid, importunate
man, who asks him how he hit upon his great discovery. When Newton had convinced
himself what a noodle he had to do with, and wanted to get rid of the man, he told him
that an apple fell on his nose; and this made the matter quite clear to the man, and he
went away satisfied.

Carl Friedrich Gauss, as quoted by Robert Chambers, The Book of Days (1832).

In Newtonian gravity, the gravitational potential Φ is a function on spacetime satisfying Poisson’s
equation:
\[
\Delta \Phi = 4 \pi G \rho .
\]
Here ∆ is the Laplacian in 3 dimensions: in an inertial frame (that is, in the natural rectangular
coordinates associated with an inertial observer), it is given by
\[
\Delta = \sum_{i=1}^{3} \partial_i^2 ,
\]
G is Newton’s constant, and ρ, mapping points in spacetime to R, is the matter density.

Particles in a gravitational field experience a gravitational force directed opposite to the gradient of
the gravitational potential. If a particle of mass m has position x = x(t), then
\[
m \ddot{x} = -m \nabla \Phi .
\]
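
As a quick sanity check on these two equations, consider the standard point-mass potential Φ = −GM/r, which is not derived in these notes and is assumed here. Away from the mass the density vanishes, so Poisson's equation reduces to ∆Φ = 0, and the acceleration −∇Φ should have magnitude GM/r² and point towards the mass. The sketch below tests both statements with finite differences; the values of `G` and `M`, the sample point `p`, and the helper names are illustrative assumptions.

```python
import math

G = 1.0   # units with G = 1 (an assumption made for this sketch)
M = 2.0   # hypothetical point mass sitting at the origin

def phi(x, y, z):
    """Potential of a point mass: Phi = -G M / r."""
    return -G * M / math.sqrt(x * x + y * y + z * z)

def shifted(p, i, h):
    """Copy of the point p with its i-th coordinate shifted by h."""
    q = list(p)
    q[i] += h
    return q

def gradient(f, p, h=1e-5):
    """Central-difference approximation to the gradient of f at p."""
    return [(f(*shifted(p, i, h)) - f(*shifted(p, i, -h))) / (2 * h)
            for i in range(3)]

def laplacian(f, p, h=1e-3):
    """Central-difference approximation to the sum of second derivatives."""
    return sum((f(*shifted(p, i, h)) - 2 * f(*p) + f(*shifted(p, i, -h))) / (h * h)
               for i in range(3))

p = [1.0, 2.0, 2.0]                       # sample point with r = 3
accel = [-g for g in gradient(phi, p)]    # x'' = -grad Phi; the mass m has cancelled
accel_mag = math.sqrt(sum(a * a for a in accel))
radial_component = sum(a * x for a, x in zip(accel, p))
lap = laplacian(phi, p)
```

In this sketch `accel_mag` should be close to GM/r² = 2/9, `radial_component` should be negative (the acceleration points towards the origin), and `lap` should vanish up to discretisation error. Note that the particle's mass never enters the computation of `accel`, which anticipates the discussion of the equivalence principle below.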

2.1 Problems with Newtonian gravity

From the Galilean viewpoint, Newtonian gravity is completely acceptable. However, from the point of
view of special relativity, there are several problems.
From a theoretical point of view, the major problem is that the equations of Newtonian gravity are
not invariant under Lorentz transformations. Two different inertial observers will generally construct
two different gravitational potentials, and predict two different and incompatible motions for particles.
In the Newtonian theory, gravity propagates instantaneously: Poisson’s equation is solved separately
at each instant of time. So if, for example, the Sun were to suddenly disappear, then it would take about
8 minutes for the light to go out, but we would notice the gravitational effect instantly. But what does
“instantly” mean? Remember that, in special relativity, different inertial observers will define different
time coordinates, and will label different times as “now”.
There are also various observational problems with Newtonian gravity, which were beginning to cause
problems for physicists around the time of Einstein. The orbit of the planet Mercury did not quite match
with astronomers’ predictions, leading to the prediction of an extra planet (dubbed ‘Vulcan’) to account
for the observed deviations. Then, there is the bending of light: if light is considered as a wave (i.e.
treating Maxwell’s equations classically) then there should be no bending of light, since the gravitational
field does not appear in Maxwell’s equations. On the other hand, if light is considered as a particle, then
there should be some bending1.
Finally, there is the famous equivalence principle, which leads to a philosophical argument against
Newtonian gravity.

2.2 The equivalence principle

Consider the following two situations: situation (A), in which a closed box (or elevator) is in free-fall
towards the Earth2 , and situation (B), in which a similar closed box is freely floating in space (figure
2.1). Then there is no local way (i.e. on a short length and timescale) that an experimenter inside the box
could tell whether they are in situation (A) or situation (B). This is despite the fact that, in Newtonian
gravity, these two situations are described in completely different manners: in situation (A) we would
say that the box sits in a gravitational field and the experimenter experiences a gravitational force, while
in situation (B) no such force is present.
In Newtonian gravity, the reason why it is difficult to tell these situations apart is that the
gravitational force is proportional to the mass. This should remind you of fictitious forces such as the centrifugal
force3 .
There is an amusing variant of this thought experiment, which goes as follows. Imagine sitting in
a sealed room on the Earth with no windows. Then imagine that the entire Earth suddenly vanishes,
except for the room you are sitting in, but at the same moment the room starts to accelerate upwards
at 9.8 m s⁻², perhaps pushed upwards by some rockets beneath the room (figure 2.2). Then there is no
way that, from inside the room, you will be able to tell that this has happened!
These ideas are formalised in the equivalence principle, which comes in three versions:

The weak equivalence principle


The trajectories of all test particles moving in gravitational fields depend only on their initial positions
and velocities.
Here a test particle is a point mass which does not self-interact gravitationally. The weak equivalence
principle implies that the mass of such a test particle has no effect on its motion through a gravitational
field.

The Einstein equivalence principle


All local, non-gravitational experiments performed in freely falling laboratories will obtain the same
outcomes, regardless of the position and velocity of the lab.
Here a freely falling lab is one in which no local acceleration of the lab can be measured. Such a lab
could either be floating in space or falling to Earth, as discussed earlier.
1 This might seem odd, given that the gravitational force on a massless particle is zero. However, the approach to
massless particles in Newtonian theory was to write out the equations with the mass m as a parameter, and then to take
the limit m → 0. When working out the trajectory of a massless particle in a gravitational field, m appears on both the
right and left hand sides of the equation of motion, and can therefore be cancelled off both sides.
2 As usual, we picture the ideal situation in which there is no air resistance!
3 It is a universal feature of “fictitious forces” that they are proportional to the mass of the object on which they are
acting, just like the force of gravity.

This upgrades the weak equivalence principle to apply to experiments which involve things other
than the motion of test particles, e.g. experiments involving other forces such as electromagnetism.
The experiment must still be local, meaning that it can’t involve large time or length scales, otherwise
tidal forces can be measured. The Einstein equivalence principle also rules out experiments involving
significant gravitational interaction between the measured objects.

The strong equivalence principle


The motion of all sufficiently small bodies moving in gravitational fields depends only on their initial
positions and velocities. Also all local experiments performed in freely falling laboratories will obtain the
same outcomes, regardless of the position and velocity of the lab.
The first part upgrades the weak equivalence principle to apply to sufficiently small extended bodies,
while the second part upgrades the Einstein equivalence principle to allow for gravitational interaction
between the objects of study.

Figure 2.1: Einstein’s thought experiment: (A) a scientist in an elevator falling to Earth, while in (B)
they are floating in space. No local experiment can distinguish between (A) and (B).

Figure 2.2: Did this just happen?

The equivalence principle strongly suggests that spacetime is curved. The intuition runs as follows:
in previous versions of spacetime, special curves have been distinguished, which we have called the
worldlines of inertial observers. Physically, these are supposed to represent paths taken by observers
who experience no external forces, and we have always assumed that such paths are straight lines. On
the other hand, the equivalence principle suggests that freely falling observers experience no external
forces. However, freely falling objects don’t move along paths which we would normally consider “straight
lines” – they can orbit the Earth, for example. We can still consider these to be a kind of straight line,
but only if spacetime itself is curved!

Chapter 3

Review of special relativity

Since the mathematicians have invaded the theory of relativity, I do not understand it
myself anymore.

Einstein, quoted in Zum Siebzigsten Geburtstag Albert Einsteins (To Albert Einstein’s
Seventieth Birthday), translated by Paul A. Schilpp.

In this chapter we’ll do a lightning survey of special relativity, putting special emphasis on the aspects
of the theory which will be central in the transition to GR, and viewing things from a “geometric”
perspective.

3.1 Conventions

First, let’s fix some of the conventions we’ll use throughout the course.
The signature of the metric is (−, +, +, +). In other words, a − sign is associated with times,
not lengths, and the Minkowski metric is diag(−1, 1, 1, 1). The alternative signature, (+, −, −, −), is
frequently used, particularly in high energy physics, but we’ll take the viewpoint that time is weird and
lengths are normal.
We’ll always use the Einstein summation convention, so whenever repeated indices appear we will
sum over them. For example, if v^µ and w_µ are a spacetime vector and covector respectively, then
\[
v^\mu w_\mu := \sum_{\mu=0}^{3} v^\mu w_\mu .
\]

On the very rare occasions where we do not wish to sum over a pair of repeated indices, we will
write this explicitly.
Because of this notation, repeated indices are sometimes called dummy indices. It doesn’t matter
which letters are used for these indices: for example, v^µ w_µ = v^a w_a = v^ν w_ν.
We will use Greek indices µ, ν, ρ . . . to refer to abstract spacetime indices: if we write out some equation
involving this kind of index, then we have not picked any particular set of coordinates. Consequently,
any such equation should be independent of the coordinates we choose!
Latin indices from the start of the alphabet, a, b, c . . . will be reserved for concrete indices, that is, they
will always refer to a particular set of coordinates (or, occasionally, some special class of coordinates).
Thus, equations written with these indices might not be true in an alternative coordinate system. In
general, when such an index takes the value 0 then it refers to ‘time’, and when it takes a value 1, 2 or
3 then it refers to ‘space’.
Latin indices from the middle of the alphabet, i, j, k . . . will be used to refer to spatial indices,
excluding time.
We will use “geometrized units”: the speed of light c = 1, and Newton’s constant G = 1, except on
the few occasions where we display them explicitly for clarity.

3.2 The metric and causal structure

In inertial coordinates x^a = (x^0, x^1, x^2, x^3), the Minkowski metric m_ab is given by
\[
m := \begin{pmatrix}
-1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix} .
\]
So, for example, m_00 = −1. Similarly, in these coordinates its inverse is
\[
m^{-1} := \begin{pmatrix}
-1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix} .
\]

Given a (nonzero) spacetime vector v, we say that the vector is spacelike if m(v, v) > 0, timelike if
m(v, v) < 0 and null if m(v, v) = 0. Similarly, given two points p and q in Minkowski spacetime, we
say that these points are spacelike separated, timelike separated, or null separated (or lightlike separated)
depending on whether the vector v = p − q is spacelike, timelike or null respectively. A point in spacetime
is sometimes called an event.
The set of points that are null separated from a point p are said to lie on the light cone of the point
p (see figure 3.1).
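
The classification above translates directly into a few lines of code. This is a minimal sketch under the conventions of this chapter (signature (−, +, +, +), inertial coordinates, c = 1); the function names `minkowski`, `causal_character` and `separation` are hypothetical names chosen for illustration.

```python
def minkowski(v, w):
    """m(v, w) with signature (-, +, +, +) in inertial coordinates."""
    return -v[0] * w[0] + v[1] * w[1] + v[2] * w[2] + v[3] * w[3]

def causal_character(v):
    """Classify a nonzero vector as 'timelike', 'spacelike' or 'null'."""
    if not any(v):
        raise ValueError("the zero vector is not classified")
    norm2 = minkowski(v, v)
    if norm2 < 0:
        return "timelike"
    if norm2 > 0:
        return "spacelike"
    return "null"

def separation(p, q):
    """Causal character of the separation vector v = p - q of two events."""
    return causal_character([a - b for a, b in zip(p, q)])

time_axis = causal_character([1, 0, 0, 0])        # purely temporal direction
space_axis = causal_character([0, 1, 0, 0])       # purely spatial direction
light_ray = causal_character([1, 1, 0, 0])        # lies on the light cone
events = separation([2, 0, 0, 0], [0, 1, 0, 0])   # v = (2, -1, 0, 0)
```

The exact comparison with zero is only reliable for exactly representable inputs such as the integers used here; with general floating-point data a tolerance would be needed to detect null vectors.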

Figure 3.1: Timelike (red), spacelike (blue) and null (black) vectors, in a sketch where two spatial
dimensions have been suppressed. The set of timelike vectors has two connected components, which we
call past directed and future directed vectors. Despite its appearance in this sketch, the set of spacelike
vectors has only a single connected component: one spatial dimension can be restored by revolving this
diagram around the vertical axis.

3.3 Lorentz transformations

A Lorentz transformation is a transformation from one set of inertial coordinates to another, fixing the
origin. These are given by linear transformations

\[
y^{a'} := \Lambda^{a'}{}_{a} x^{a},
\]

where the matrix $\Lambda^{a'}{}_{a}$ has unit determinant and preserves the form of the metric, that is,

\[
m_{ab} = \Lambda^{a'}{}_{a} \Lambda^{b'}{}_{b} m_{a'b'}.
\]

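These can be split up into boosts, which mix the time and space coordinates - e.g. a boost with
relative velocity $v$ in the $x$ direction is

\[
\Lambda^{a'}{}_{a} = \begin{pmatrix} \gamma & -\gamma v & 0 & 0 \\ -\gamma v & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix},
\qquad
\gamma = \frac{1}{\sqrt{1 - v^2}},
\]

and rotations, which have the form

\[
\Lambda^{a'}{}_{a} = \begin{pmatrix} 1 & 0 \\ 0 & R^{i}{}_{j} \end{pmatrix} \tag{3.1}
\]

with $R^{i}{}_{j} \in SO(3)$.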
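As a quick sanity check, the two defining properties of a Lorentz transformation (unit determinant and preservation of the metric) can be verified numerically for the boost above. This is a sketch in plain Python; the helper names `boost_x`, `transformed_metric` and `det` are our own, and the velocity 0.6 is an arbitrary illustrative value.

```python
import math

# Minkowski metric m_ab in inertial coordinates.
M = [[-1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0],
     [0.0, 0.0, 0.0, 1.0]]

def boost_x(v):
    """Matrix of a boost with relative velocity v in the x direction (c = 1)."""
    if not -1.0 < v < 1.0:
        raise ValueError("need |v| < 1")
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return [[gamma, -gamma * v, 0.0, 0.0],
            [-gamma * v, gamma, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def transformed_metric(lam):
    """Entries lam[a'][a] * lam[b'][b] * M[a'][b'], summed over a' and b'."""
    return [[sum(lam[ap][a] * lam[bp][b] * M[ap][bp]
                 for ap in range(4) for bp in range(4))
             for b in range(4)] for a in range(4)]

def det(A):
    """Determinant by Laplace expansion along the first row (fine for 4x4)."""
    if len(A) == 1:
        return A[0][0]
    total = 0.0
    for j in range(len(A)):
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        total += (-1) ** j * A[0][j] * det(minor)
    return total

lam = boost_x(0.6)
print(transformed_metric(lam))  # should reproduce M up to rounding
print(det(lam))                 # should be 1 up to rounding
```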

3.4 Curves, tangent vectors, proper time and proper length

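A curve is a map $\gamma : [0, 1]$ (or $\mathbb{R}$) $\to M^4$. In inertial coordinates, we can write $\gamma(\lambda) = x^a(\lambda)$.
The tangent vector to the curve $\gamma$ at the point $p$ with coordinate $x^a(\lambda_0)$ is

\[
v^a\big|_p := \frac{d}{d\lambda} x^a(\lambda)\Big|_{\lambda = \lambda_0}.
\]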

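We can use tangent vectors to measure the rate of change of a function along a curve. Given a
function $f : M^4 \to \mathbb{R}$, using the chain rule we have

\[
\frac{d}{d\lambda}\big(f \circ \gamma(\lambda)\big) = \frac{d\gamma^a}{d\lambda} \frac{\partial f}{\partial x^a},
\]

where $\gamma^a(\lambda) = x^a(\lambda)$ are the coordinates of the curve in the coordinates $(x^a)$. Hence

\[
\frac{d}{d\lambda}\big(f \circ \gamma(\lambda)\big) = v^a \frac{\partial f}{\partial x^a}.
\]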

The tangent vector to a curve can be either timelike, spacelike or null. Note that the character of the
tangent vector (i.e. whether it is timelike, spacelike or null) at a point p along the curve does not depend
on the parametrisation of the curve¹ (exercise).
A curve is said to be timelike, spacelike or null if its tangent vector is everywhere timelike, spacelike
or null. Massive particles move along timelike curves, and massless particles move along null curves².

¹ If σ : R → R is monotonic increasing, then an alternative parametrisation of the curve is given by γ ◦ σ.
² These curves are also called worldlines, in both the timelike and null cases.

If $\gamma$ is a timelike curve, then the proper time along $\gamma$, $\tau$, is defined to be the parameter along the
curve so that the tangent vector to the curve satisfies

\[
m(v, v) = m_{ab} \left( \frac{d}{d\tau} x^a(\tau) \right) \left( \frac{d}{d\tau} x^b(\tau) \right) = -1.
\]

Such a parameter can always be found along a timelike curve (exercise - see the previous exercise for
the transformation of tangent vectors under reparametrisation). Such a parameter is unique up to the
choice of origin, that is, the point along the curve at which τ = 0. It has a physical meaning in special
relativity, given by:
Postulate (The clock postulate). An accurate clock moving along a timelike worldline measures the
proper time along the worldline.
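To make the definition concrete, the proper time elapsed along a timelike curve can be approximated by chopping the curve into short chords and summing $\sqrt{-m(\Delta x, \Delta x)}$. The sketch below does this in plain Python. The two example worldlines (`inertial`, with speed 0.6, and `accelerated`, a hyperbola whose parameter is already proper time) are our own illustrative choices, as are all function names.

```python
import math

def interval(dx):
    """m(dx, dx) with m = diag(-1, 1, 1, 1)."""
    return -dx[0] ** 2 + dx[1] ** 2 + dx[2] ** 2 + dx[3] ** 2

def proper_time(curve, lam0, lam1, steps=1000):
    """Approximate the proper time along a timelike curve between lam0 and lam1.

    The curve is split into `steps` chords and sqrt(-m(dx, dx)) is summed.
    """
    h = (lam1 - lam0) / steps
    total = 0.0
    prev = curve(lam0)
    for k in range(1, steps + 1):
        nxt = curve(lam0 + k * h)
        dx = [b - a for a, b in zip(prev, nxt)]
        s2 = interval(dx)
        if s2 >= 0:
            raise ValueError("curve is not timelike on this segment")
        total += math.sqrt(-s2)
        prev = nxt
    return total

def inertial(t):
    """Straight worldline with three-velocity 0.6 in the x direction."""
    return [t, 0.6 * t, 0.0, 0.0]

def accelerated(lam):
    """Hyperbolic worldline; its tangent satisfies m(v, v) = -1 exactly."""
    return [math.sinh(lam), math.cosh(lam), 0.0, 0.0]

print(proper_time(inertial, 0.0, 10.0))    # close to 10 * sqrt(1 - 0.36)
print(proper_time(accelerated, 0.0, 1.0))  # close to the parameter range
```

For the inertial worldline the clock postulate then says that a clock carried along it records less time than the coordinate time of the frame in which it moves.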

Similarly, if $\gamma$ is a spacelike curve, then the proper length $s$ of $\gamma$ is the parameter defined so that

\[
m_{ab} \frac{dx^a}{ds} \frac{dx^b}{ds} = 1.
\]

There is no analogue to proper time or distance along a generic null curve. However, there are special
null curves, which are generated by a null vector as follows: let $p \in M^4$ be some fixed position, and let
$v$ be a null vector. Then consider the curve “generated” by the vector $v$ through the point $p$, defined by
the equation

\[
\gamma(\lambda) - p = \lambda v.
\]

This defines a null curve with tangent vector $v$. The parameter $\lambda$ along such a curve is called an
affine parameter. We can find another affine parametrisation of such a curve by choosing a different
point $p$ along the curve, and choosing a different null vector which is proportional to $v$. Under such a
reparametrisation, the affine parameter $\lambda$ transforms as $\lambda \mapsto a\lambda + b$, for constants $a \neq 0$ and $b$.

3.5 Vectors, covectors, tensors and their transformations

Given a point $p \in M^4$, the tangent space at the point $p$ is the vector space consisting of all the vectors
from the point $p$, that is, we define

\[
T_p(M^4) := \{ (q - p) \in V^4 \mid q \in M^4 \}.
\]

Equivalently, we can define $T_p(M^4)$ as the set of all tangent vectors to curves through the point $p$.
Note that there is a natural way to identify ‘different’ tangent spaces, i.e. the tangent spaces $T_p(M^4)$
and $T_q(M^4)$ with $q \neq p$. Suppose $X$ is a vector in $T_p(M^4)$; then there is some point $r \in M^4$ such that
$X + (q - p) = (r - p)$. Then we can define the vector $X' \in T_q(M^4)$, which corresponds to the vector
$X \in T_p(M^4)$, as

\[
X' := (r - q).
\]

See the parallelogram in figure 3.2. Although this identification seems trivial, no such natural
identification exists once we come to consider curved spacetimes, and this has many important
consequences for GR.
Consider inertial coordinates $x^a$, chosen so that the point $p$ is at the origin $x^a = 0$. Then the point
$q$ has coordinates $x^a = q^a$. Under a Lorentz transformation $y^{a'} = \Lambda^{a'}{}_{a} x^a$, the coordinates of the point $q$
transform as

\[
q^a = x^a = (\Lambda^{-1})_{a'}{}^{a} y^{a'}
\quad \Rightarrow \quad
y^{a'} = \Lambda^{a'}{}_{a} q^a.
\]

Figure 3.2: The identification of a vector X at p and the corresponding vector X ′ at q.


In other words, in terms of the new coordinates $y^{a'}$, the coordinates of the point $q$ are

\[
q'^{a'} = \Lambda^{a'}{}_{a} q^a.
\]

Hence, the components of the vector $X = q - p$ transform under a Lorentz transformation as

\[
X'^{a'} = \Lambda^{a'}{}_{a} X^a. \tag{3.2}
\]

This is the transformation law for the components of a vector. Equation (3.2) is sometimes used to define
vectors, i.e. a vector is any quantity which transforms under this rule.
The cotangent space at the point $p$, $T_p^*(M^4)$, is the dual space to the tangent space $T_p(M^4)$, i.e. it
is the space of linear maps from $T_p(M^4)$ to $\mathbb{R}$. Elements of the cotangent space are called covectors (or
sometimes 1-forms).
The transformation law for vector components can be used to deduce the corresponding transformation
law for covectors: let $\eta \in T_p^*(M^4)$ be a covector. The components of $\eta$ with respect to some inertial
coordinates (chosen so that $p$ is at the origin) are the four numbers $\eta_a$ where, for any vector $X \in T_p(M^4)$,
we have

\[
\eta(X) = \eta_a X^a,
\]

where $X^a$ are the components of the vector $X$ in the inertial coordinates. Note that $\eta(X)$ is just a real
number (a scalar): it clearly doesn’t transform at all under Lorentz transformations! So, if $\eta'_{a'}$ are the
components of $\eta$ with respect to the inertial coordinates $y^{a'} = \Lambda^{a'}{}_{a} x^a$, we must have

\[
\eta_a X^a = \eta'_{a'} X'^{a'} = \eta'_{a'} \Lambda^{a'}{}_{a} X^a,
\]

and so the transformation law for covectors is

\[
\eta'_{a'} = (\Lambda^{-1})_{a'}{}^{a} \eta_a. \tag{3.3}
\]
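The statement that $\eta(X)$ does not change under Lorentz transformations can be tested directly: transform the vector components with $\Lambda$, the covector components with $\Lambda^{-1}$, and compare the pairings. The following plain-Python sketch does this for a boost. The sample components and the function names are illustrative assumptions; the only physical input is that the inverse of a boost with velocity $v$ is the boost with velocity $-v$.

```python
import math

def boost_x(v):
    """Boost with relative velocity v in the x direction (units with c = 1)."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return [[gamma, -gamma * v, 0.0, 0.0],
            [-gamma * v, gamma, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def transform_vector(lam, X):
    """X'^{a'} = Lambda^{a'}_a X^a, with lam[a'][a] the boost matrix."""
    return [sum(lam[ap][a] * X[a] for a in range(4)) for ap in range(4)]

def transform_covector(lam_inv, eta):
    """eta'_{a'} = (Lambda^{-1})_{a'}^a eta_a, with lam_inv[a][a'] the inverse matrix."""
    return [sum(lam_inv[a][ap] * eta[a] for a in range(4)) for ap in range(4)]

def pairing(eta, X):
    """The scalar eta(X) = eta_a X^a."""
    return sum(e * x for e, x in zip(eta, X))

lam = boost_x(0.6)
lam_inv = boost_x(-0.6)  # inverse boost: same speed, opposite direction

X = [1.0, 2.0, 3.0, 4.0]
eta = [0.5, -1.0, 2.0, 0.25]
print(pairing(eta, X))
print(pairing(transform_covector(lam_inv, eta), transform_vector(lam, X)))
```

The two printed numbers should agree up to rounding, which is exactly the content of equations (3.2) and (3.3) taken together.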

Vectors are sometimes said to transform contravariantly, while covectors transform covariantly.
A tensor at the point $p$ is an element of $\left( T_p(M^4) \right)^n \times \left( T_p^*(M^4) \right)^m$ for some $n, m \geq 0$. Such a tensor
is said to be of rank $(n, m)$, or to have valency $(n, m)$. For example, given a vector $X$ and a covector $\eta$
we can form the tensor $X\eta$, which has components (in any coordinate system)

\[
(X\eta)^{\mu}{}_{\nu} := X^{\mu} \eta_{\nu}.
\]

It is often useful to view an $(n, m)$ tensor as a map $\left( T_p(M^4) \right)^m \times \left( T_p^*(M^4) \right)^n \to \mathbb{R}$, which is always
possible due to the finite dimensionality of the tangent and cotangent spaces.

Tensors transform under Lorentz transformations in the obvious way: for a rank $(n, m)$ tensor $T$, its
components transform as

\[
(T')^{a'_1 a'_2 \ldots a'_n}{}_{b'_1 b'_2 \ldots b'_m}
= \Lambda^{a'_1}{}_{a_1} \Lambda^{a'_2}{}_{a_2} \cdots \Lambda^{a'_n}{}_{a_n}
(\Lambda^{-1})_{b'_1}{}^{b_1} (\Lambda^{-1})_{b'_2}{}^{b_2} \cdots (\Lambda^{-1})_{b'_m}{}^{b_m}
\, T^{a_1 a_2 \ldots a_n}{}_{b_1 b_2 \ldots b_m}. \tag{3.4}
\]

A contraction of a tensor is formed by summing over a pair of indices, with one “up” index and one
“down” index. Using the Einstein summation convention, this is written as a tensor with the same letter
used in one of the “up” indices and one of the “down” indices, e.g. $T^{b}{}_{ba}$. Such objects are also tensors
(exercise: show that $T^{b}{}_{ba}$ transforms as a covector).
Indices can be lowered and raised using the metric $m$ and its inverse $m^{-1}$. To be precise: the metric
can be used to define an isomorphism $T_p(M^4) \to T_p^*(M^4)$ as follows: for a vector $X$, and an arbitrary
vector $Y$

\[
X \mapsto X^{\flat}, \qquad X^{\flat}(Y) = m(X, Y),
\]

or, in terms of indices (in which case it is conventional to avoid the ‘flat’ sign)

\[
X_{\mu} := m_{\mu\nu} X^{\nu},
\]

and similarly for a covector $\eta$ and an arbitrary vector $Y$, we have

\[
\eta \mapsto \eta^{\sharp}, \qquad m(\eta^{\sharp}, Y) = \eta(Y),
\]

or in terms of indices

\[
\eta^{\mu} := (m^{-1})^{\mu\nu} \eta_{\nu}.
\]
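In inertial coordinates, lowering or raising an index with the Minkowski metric simply flips the sign of the time component. A minimal Python sketch, with the hypothetical helper names `lower`, `raise_index` and `m`:

```python
# Minkowski metric and its inverse in inertial coordinates (they coincide as matrices).
M = [[-1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
M_INV = [[-1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

def lower(X):
    """X_a = m_ab X^b."""
    return [sum(M[a][b] * X[b] for b in range(4)) for a in range(4)]

def raise_index(eta):
    """eta^a = (m^{-1})^{ab} eta_b."""
    return [sum(M_INV[a][b] * eta[b] for b in range(4)) for a in range(4)]

def m(X, Y):
    """m(X, Y) = m_ab X^a Y^b."""
    return sum(M[a][b] * X[a] * Y[b] for a in range(4) for b in range(4))

X = [2.0, 1.0, 0.0, -3.0]
Y = [1.0, 4.0, 5.0, 0.5]

X_flat = lower(X)
print(X_flat)                                       # time component changes sign
print(raise_index(X_flat))                          # raising undoes lowering
print(sum(xa * ya for xa, ya in zip(X_flat, Y)), m(X, Y))  # X_flat(Y) equals m(X, Y)
```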

3.6 Tensor fields

A tensor field is an assignment of a tensor to all points in spacetime.


One example of a tensor field is the metric $m$: this defines a rank (0, 2) tensor field whose action on
the vector fields $X$, $Y$ is given by

\[
m(X, Y) := m_{ab} X^a Y^b
\]

where, on the right hand side, we are working in an inertial coordinate system, and $m_{ab} = \mathrm{diag}(-1, 1, 1, 1)$.
Note that, because of the special properties of the metric tensor, the metric takes this form in all inertial
coordinate systems.
Another example is the identity or Kronecker delta: this is a (1, 1) tensor field whose action on a
vector field X and a covector field η is given by

δ(X, η) := η(X).

Note that, in any coordinate system (not just inertial ones!) the components of $\delta$ are given by³

\[
\delta^{a}_{b} = \mathrm{diag}(1, 1, 1, 1).
\]

³ It is conventional not to stagger the indices on the Kronecker delta, unlike other tensor fields.

Next, consider a function $f : M^4 \to \mathbb{R}$, and consider the covector field $df$ whose components in the
coordinates $x^a$ are

\[
(df)_a = \frac{\partial f}{\partial x^a} = \partial_a f.
\]

By construction this defines a covector field: its action on the vector $X$ is

\[
df(X) = (\partial_a f) X^a.
\]

Note that, if we work in another set of inertial coordinates $y^{a'} = \Lambda^{a'}{}_{a} x^a$, then (using the usual
transformation law for covector fields) the components of $df$ are

\begin{align*}
(df)'_{a'} &= (\Lambda^{-1})_{a'}{}^{a} (df)_a \\
&= (\Lambda^{-1})_{a'}{}^{a} \frac{\partial f}{\partial x^a} \\
&= (\Lambda^{-1})_{a'}{}^{a} \frac{\partial y^{b'}}{\partial x^a} \frac{\partial f}{\partial y^{b'}} \\
&= (\Lambda^{-1})_{a'}{}^{a} \Lambda^{b'}{}_{a} \frac{\partial f}{\partial y^{b'}} \\
&= \frac{\partial f}{\partial y^{a'}},
\end{align*}

so in fact, in any inertial coordinates, the components of $df$ are given by an expression of the form $\partial f / \partial x^a$.

Finally, consider a more general rank $(n, m)$ tensor field with components in the $x^a$ coordinate system

\[
T^{a_1 \ldots a_n}{}_{b_1 \ldots b_m}.
\]

We can construct the $(n, m + 1)$ tensor field which has components given by the derivatives
of the components of $T$, i.e.

\[
\partial_c T^{a_1 \ldots a_n}{}_{b_1 \ldots b_m}.
\]

It is easy to check that this definition is actually independent of the inertial coordinates in which we
work, i.e. this expression transforms as an $(n, m + 1)$ tensor field.

3.6.1 Integral curves

If $X$ is a vector field, then we can define the integral curves of the vector field $X$. In some inertial
coordinates $x^a$, the integral curve of the vector field $X$ through the point with coordinates $x_0^a$ is the
curve defined by the ODE

\[
\frac{d}{d\lambda} x^a(\lambda) = X^a\big( x(\lambda) \big), \qquad x^a(0) = (x_0)^a.
\]

Standard ODE theory ensures that this equation has a unique solution, if the vector field X is smooth
and has bounded components.
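In practice, integral curves are often computed numerically. The sketch below integrates the ODE above with a classical fourth-order Runge-Kutta step, written in plain Python. The vector field `rotation_field`, which generates rotations in the $(x^1, x^2)$ plane, is an assumed example and not one used in the notes; the function names are likewise our own.

```python
import math

def rk4_step(field, x, h):
    """Advance the point x by parameter step h along the vector field."""
    k1 = field(x)
    k2 = field([xi + 0.5 * h * ki for xi, ki in zip(x, k1)])
    k3 = field([xi + 0.5 * h * ki for xi, ki in zip(x, k2)])
    k4 = field([xi + h * ki for xi, ki in zip(x, k3)])
    return [xi + h * (a + 2 * b + 2 * c + d) / 6.0
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]

def integral_curve(field, x0, lam_end, steps=1000):
    """Approximate the integral curve through x0, returning the point at lam_end."""
    h = lam_end / steps
    x = list(x0)
    for _ in range(steps):
        x = rk4_step(field, x, h)
    return x

def rotation_field(x):
    """Components (0, -x^2, x^1, 0): rotation in the (x^1, x^2) plane."""
    return [0.0, -x[2], x[1], 0.0]

# Starting on the x^1 axis, the curve should follow a circle of radius 1.
print(integral_curve(rotation_field, [0.0, 1.0, 0.0, 0.0], math.pi / 2))
```

For this field the exact integral curve through $(0, 1, 0, 0)$ is $x^1 = \cos\lambda$, $x^2 = \sin\lambda$, so the numerical result can be compared against it.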

3.7 Worldlines of particles

Suppose a particle moves along a curve with coordinates $x^a(\lambda)$ in some inertial frame. For a massive
particle, we can parametrize this curve by the proper time $\tau$ instead of the parameter $\lambda$.
We define the velocity of the particle $v$ as the tangent vector to its worldline, parametrized by proper
time. In terms of inertial coordinates, we have

\[
v^a = \frac{dx^a(\tau)}{d\tau}.
\]

Note that this defines a vector along the curve $\gamma$, but it does not define a genuine vector field since there
is no prescription for the vector away from the worldline.

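Note that, by the definition of proper time, we have $m_{ab} v^a v^b = -1$. Thus we can write

\[
v = \gamma \begin{pmatrix} 1 \\ \mathbf{v} \end{pmatrix},
\qquad
\gamma = \frac{1}{\sqrt{1 - |\mathbf{v}|^2}},
\]

where $\mathbf{v}$ denotes the ordinary three-velocity, with components $dx^i/dx^0$.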

The four-momentum of the particle is defined as the covector $p_a := \mu m_{ab} v^b$, where $\mu$ is the rest mass
of the particle. Thus we have

\[
p = \left( -\mu\gamma, \ \mu\gamma\mathbf{v} \right)
= \left( -\mu - \tfrac{1}{2} \mu |\mathbf{v}|^2 + O(|\mathbf{v}|^4), \ \mu\mathbf{v} + O(|\mathbf{v}|^3) \right).
\]

So for $|\mathbf{v}| \ll 1$, i.e. for velocities much slower than the speed of light, $-p_0$ is (up to an additive constant)
the usual expression for the kinetic energy, while $p_i$ are the usual expressions for the components of the
momentum. Thus we write

\[
p = (-E, \mathbf{p}).
\]

On the other hand, we can choose inertial coordinates so that, at some time $\tau_0$, the velocity vector
has components $\begin{pmatrix} 1 \\ \mathbf{0} \end{pmatrix}$. In these coordinates, at this instant of time, we have

\[
p = (-\mu, \mathbf{0}).
\]

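Now, the quantity $-(m^{-1})^{ab} p_a p_b$ is a scalar quantity, so its value does not depend on which inertial
coordinates we use to evaluate it. In general inertial coordinates it equals $E^2 - |\mathbf{p}|^2$, while in the
instantaneous rest coordinates above it equals $\mu^2$. So we have Einstein’s famous formula

\[
E^2 - |\mathbf{p}|^2 = \mu^2,
\]

or, restoring the speed of light using dimensional arguments (recall that we set $c = 1$),

\[
E^2 = \mu^2 c^4 + |\mathbf{p}|^2 c^2.
\]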
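The relation between energy, momentum and rest mass is easy to check numerically. The following Python sketch builds the four-velocity and four-momentum from a three-velocity in units with $c = 1$ and evaluates the invariant. The function names and the sample values (rest mass 2, speed 0.6) are illustrative assumptions.

```python
import math

def four_velocity(v3):
    """v^a = gamma * (1, v) for a three-velocity v3 with |v3| < 1 (c = 1)."""
    speed2 = sum(c * c for c in v3)
    if speed2 >= 1.0:
        raise ValueError("massive particles need |v| < 1")
    gamma = 1.0 / math.sqrt(1.0 - speed2)
    return [gamma] + [gamma * c for c in v3]

def four_momentum(mu, v3):
    """p_a = mu * m_ab v^b: lowering the index flips the sign of the time component."""
    v = four_velocity(v3)
    return [-mu * v[0]] + [mu * c for c in v[1:]]

def invariant_mass_squared(p):
    """-(m^{-1})^{ab} p_a p_b = E^2 - |p|^2, where E = -p_0."""
    return p[0] ** 2 - sum(c * c for c in p[1:])

p = four_momentum(2.0, [0.6, 0.0, 0.0])
energy = -p[0]
print(energy, p[1:])
print(invariant_mass_squared(p))  # should equal mu**2 up to rounding
```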

The acceleration of a massive particle is the four-vector

\[
a^a := \frac{d^2}{d\tau^2} x^a(\tau).
\]

Exercise: show that the acceleration of a particle is orthogonal to its four-velocity in the Lorentzian
sense, i.e. $m(v, a) = m_{ab} v^a a^b = 0$.

3.8 The energy-momentum tensor

For a continuous distribution of matter, the energy density, energy flux, momentum density and pressure
are encoded in a symmetric rank (2, 0) tensor field $T^{\mu\nu}$.
If $v^a$ is the tangent vector of the worldline of an observer moving through spacetime, then

• The vector $j^a := -T^{ab} v_b$ is the four-momentum density.

• The scalar $\rho := -v_a j^a = T^{ab} v_a v_b$ is the energy density. For normal matter $\rho \geq 0$ (the weak energy
condition).

Moreover, if $n$ and $N$ are spacelike vectors defined along the worldline of an observer which are normalised
so that $m(n, n) = m(N, N) = 1$, and which are orthogonal to the velocity of the observer (i.e. $m(n, v) =
m(N, v) = 0$), then

• The scalar $p = T^{ab} n_a n_b$ is the pressure measured by the observer in the $n$ direction.
• The stress in the $n$ direction across a surface orthogonal to $N$ is $S = T^{ab} n_a N_b$.

For example, for a perfect fluid moving along the integral curves of a vector field $u$ (normalised so
that $m(u, u) = -1$) the energy-momentum tensor is

\[
T^{ab} := (\rho + p) u^a u^b + p (m^{-1})^{ab},
\]

where ρ and p are the fluid density and pressure in its rest frame. The equation of state specifies the
scalar field p (the pressure of the fluid) in terms of the scalar field ρ (the density of the fluid), i.e. p = p(ρ).
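The perfect fluid formula can be combined with the observer-dependent definitions above. The Python sketch below builds $T^{ab}$ for a fluid at rest in the chosen inertial coordinates and evaluates the energy density and pressure seen by a comoving observer, and the energy density seen by an observer moving with speed $w$ through the fluid. All function names, and the sample values $\rho = 1$, $p = 1/3$, $w = 0.6$, are assumptions made for illustration.

```python
import math

# Minkowski metric; as a matrix it coincides with its inverse in inertial coordinates.
M = [[-1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0],
     [0.0, 0.0, 0.0, 1.0]]

def lower(X):
    """X_a = m_ab X^b."""
    return [sum(M[a][b] * X[b] for b in range(4)) for a in range(4)]

def perfect_fluid(rho, p, u):
    """T^{ab} = (rho + p) u^a u^b + p (m^{-1})^{ab} for a unit timelike u."""
    return [[(rho + p) * u[a] * u[b] + p * M[a][b] for b in range(4)]
            for a in range(4)]

def contract(T, A, B):
    """T^{ab} A_a B_b for two covectors A and B."""
    return sum(T[a][b] * A[a] * B[b] for a in range(4) for b in range(4))

def observer(w):
    """Four-velocity of an observer moving with speed w in the x direction."""
    gamma = 1.0 / math.sqrt(1.0 - w * w)
    return [gamma, gamma * w, 0.0, 0.0]

rho, p = 1.0, 1.0 / 3.0
u = [1.0, 0.0, 0.0, 0.0]   # fluid at rest in these coordinates
T = perfect_fluid(rho, p, u)

n = [0.0, 1.0, 0.0, 0.0]   # unit spacelike vector orthogonal to u
print(contract(T, lower(u), lower(u)))   # energy density for the comoving observer
print(contract(T, lower(n), lower(n)))   # pressure in the n direction
v = observer(0.6)
print(contract(T, lower(v), lower(v)))   # energy density for the moving observer
```

The comoving observer recovers $\rho$ and $p$, as the definition of the rest-frame density and pressure requires, while the moving observer measures a larger energy density.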
A second example is given by a massless scalar field, whose energy-momentum tensor is

\[
T^{ab} := (\partial^a \phi)(\partial^b \phi) - \frac{1}{2} (m^{-1})^{ab} (m^{-1})^{cd} (\partial_c \phi)(\partial_d \phi)
\]

where here $\phi$ is the scalar field, and $\partial^a \phi = (m^{-1})^{ab} \partial_b \phi$.
For continuous distributions of matter, the conservation of energy and momentum is ensured by
the conservation of the energy-momentum tensor, that is, by its being divergence free. Using inertial
coordinates:

\[
\partial_a T^{ab} = 0.
\]

To see how this is connected with conservation, consider some region $S_0$, with smooth boundary $\partial S_0$, in
the “time slice” $t = 0$. Write $S_t$ for the time translation of this surface. Let $n$ be the outwards pointing
unit normal to $\partial S_t$ (see figure 3.3). Then, by Stokes’ theorem, we have

\[
\int_{S_{t'}} T^{a0} \, dx^1 dx^2 dx^3 = \int_{S_0} T^{a0} \, dx^1 dx^2 dx^3 + \int_{t=0}^{t'} \int_{\partial S_t} T^{ab} n_b \, d\Sigma \, dt
\]

where $d\Sigma$ is the surface element of $\partial S_t$. Choosing $a = 0$ we find that the energy in the region $S_{t'}$ (or “in
the region $S$ at the time $t'$”) is equal to the initial energy in the region $S_0$, plus the integral of the flux
of energy through the boundary $\partial S_t$. Similarly, choosing $a = 1, 2, 3$ we obtain the same conclusion for the
momentum in the region $S$.

Figure 3.3

Chapter 4

Differential geometry

Riemann has shewn that as there are different kinds of lines and surfaces, so there are
different kinds of space of three dimensions; and that we can only find out by experience
to which of these kinds the space in which we live belongs. In particular, the axioms of
plane geometry are true within the limits of experiment on the surface of a sheet of
paper, and yet we know that the sheet is really covered with a number of small ridges
and furrows, upon which (the total curvature not being zero) these axioms are not true.
Similarly, he says although the axioms of solid geometry are true within the limits of
experiment for finite portions of our space, yet we have no reason to conclude that they
are true for very small portions; and if any help can be got thereby for the explanation of
physical phenomena, we may have reason to conclude that they are not true for very
small portions of space.

William Kingdon Clifford, On the Space-Theory of Matter, Proceedings of the Cambridge
Philosophical Society (1876)

General relativity predicts that spacetime is curved, but contrary to the quote from Clifford above,
in most places in the universe it is not significantly curved on small scales, but instead on large scales.
In fact, it is curved on the scale at which gravitational effects become relevant. In order to understand
the curvature of spacetime, we will need a fair amount of the mathematics of differential geometry. We
will not be able to cover this subject in its full glory; instead, we will concentrate on the parts of the subject
which will come in useful later, covering them in as much rigour as we have time for.

4.1 Manifolds and coordinate charts

The basic object of study in differential geometry is a manifold. A manifold $M$ is a topological space¹
(i.e. we can talk about open sets in $M$), where sufficiently small open sets “look like” $\mathbb{R}^n$.
This is made precise as follows: for every point $p \in M$, there is an open neighbourhood $U$ of $p$ and
a map $\phi_U : U \to \mathbb{R}^n$ (called a chart or coordinate chart - the set $U$ is called a coordinate patch). These
charts are required to be bijections between $U$ and the image $\phi_U(U)$; they are also continuous and they
have continuous inverses. Consequently, $\phi_U(U)$ is an open subset of $\mathbb{R}^n$. Here $n$ is some fixed natural number,
called the dimension of the manifold.

¹ It is also required to be second countable and Hausdorff, but these technical details will not concern us.

We can use the chart $\phi_U$ to define local coordinates $x^a$ in the set $U$. These are defined as the ‘pull-back’
of the standard coordinates on $\mathbb{R}^n$: in a slight abuse of notation, for each $a \in \{0, 1, 2, \ldots, n - 1\}$ we
set

\[
x^a(p) = x^a(\phi_U(p)),
\]

where, on the right hand side, $x^a(\phi_U(p))$ is just the value of the standard coordinate $x^a$ in $\mathbb{R}^n$ at the
point $\phi_U(p)$. See figure 4.1.

Figure 4.1: A manifold $M$, with a coordinate patch $U$ and a chart $\phi_U$. Local coordinates are defined in
the patch $U$ by using the usual coordinates on $\mathbb{R}^n$ and the chart $\phi_U$.

Generally, we will need more than one chart to cover the manifold M. An atlas is a collection of
charts which covers the entire manifold.
It can happen that two charts overlap - that is, we can have charts $\phi_U$ and $\phi_V$ with $U \cap V \neq \emptyset$. On
the overlap, we can define the transition functions:

\[
\phi_{U,V} : \mathbb{R}^n \to \mathbb{R}^n, \qquad x \mapsto \phi_V \circ (\phi_U^{-1})(x)
\]

(see figure 4.2). Here $\phi_{U,V}$ is only defined on the open set $\phi_U(U \cap V) \subset \mathbb{R}^n$.

Since these transition functions are simply maps from some open set of $\mathbb{R}^n$ to another open set of
$\mathbb{R}^n$, we can make sense of, for example, the differentiability of these maps. We will always work with
smooth manifolds, meaning that the transition functions are $C^\infty$.
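A standard illustration of charts and transition functions, which is our own choice of example rather than one taken from these notes, is the unit circle covered by two stereographic charts: one projecting from the ‘north pole’ $(0, 1)$ and one from the ‘south pole’ $(0, -1)$. The Python sketch below implements both charts and the transition function between them, which works out to $u \mapsto 1/u$ on the overlap and is therefore smooth there.

```python
def phi_N(point):
    """Stereographic chart from the north pole (0, 1); undefined at that point."""
    x, y = point
    return x / (1.0 - y)

def phi_N_inverse(u):
    """Inverse of phi_N: maps a real number back to a point on the unit circle."""
    d = u * u + 1.0
    return (2.0 * u / d, (u * u - 1.0) / d)

def phi_S(point):
    """Stereographic chart from the south pole (0, -1); undefined at that point."""
    x, y = point
    return x / (1.0 + y)

def transition_N_to_S(u):
    """phi_S composed with phi_N^{-1}; defined for u != 0, the image of the overlap."""
    return phi_S(phi_N_inverse(u))

for u in (0.5, 2.0, -4.0):
    print(u, transition_N_to_S(u))
```

Neither chart alone covers the circle, but together they form an atlas, in line with the remark that more than one chart is generally needed.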

4.2 Curves and tangent vectors

As before, a curve on the manifold $M$ is a map

\[
\gamma : [0, 1] \ (\text{or } \mathbb{R}) \to M.
\]

How can we make sense of tangent vectors? Unlike before, we don’t have a map from differences of
points to a vector space. However, we can still differentiate functions along a curve: given $f : M \to \mathbb{R}$,
we set

\[
V(f) := \frac{d}{d\lambda} (f \circ \gamma).
\]

Figure 4.2: Here the coordinate patches U and V overlap, allowing us to define the transition functions
ϕU,V and ϕV,U . For a smooth manifold, these transition functions are smooth.

Here we define V to be the tangent vector to the curve γ. It satisfies the following two important
properties: for constants a, b ∈ R and functions f , g : M → R

1. Linearity:
V (af + bg) = aV (f ) + bV (g).

2. The Leibniz rule:


V (f g) = gV (f ) + f V (g).

In terms of local coordinates $x^a$, we can set

\[
V(f)\big|_p = V(f \circ \phi_U^{-1} \circ \phi_U)\big|_p = V\big( \tilde{f}(x^a) \big)\big|_p,
\]

where $\tilde{f} = f \circ \phi_U^{-1}$. Then, using the chain rule we have

\[
V(f)\big|_p = V(x^a)\big|_p \, \frac{\partial \tilde{f}}{\partial x^a}\Big|_{x^a(p)} = V^a \partial_a \tilde{f}.
\]

Since this formula holds in all local coordinates, we write $V = V^{\mu} \partial_{\mu}$. By a common abuse of notation,
people often write $f$ for $\tilde{f} = f \circ \phi_U^{-1}$, although these are two different objects: $f$ is a function on
the manifold, while $\tilde{f}$ is a function of the local coordinates $x^a$ (of course, they take the same value at
corresponding points!).

4.3 Vectors, covectors, tensors and their transformation laws

Given a point $p \in M$, a vector at $p$ is just the tangent vector to some curve² through $p$, at the point $p$.
The tangent space at $p$, $T_p(M)$, is simply the set of all vectors at $p$. It is not hard to show that $T_p(M)$
is a vector space with the same dimension as the dimension of the manifold (exercise - hint: read the
next sentence!). In fact, given some local coordinates $x^a$, we can define the vectors $\partial_a = \frac{\partial}{\partial x^a}$ as the
vectors tangent to the curves along which $x^a$ changes while $x^b$, $b \neq a$ remain constant, parametrised by
$x^a$ (see figure 4.3). Such vector fields are sometimes called coordinate induced vector fields.

² Strictly speaking we need to talk about equivalence classes, because there are multiple curves with the same tangent
vector. Two curves $\gamma$ and $\gamma'$, with tangent vectors $V$ and $V'$ at $p$ are said to define the same vector if $V(f) = V'(f)$ for all
$f$.


Figure 4.3: The coordinate induced vector field ∂x points in the direction where x changes while all the

other coordinates (here, the coordinate y) remain the same. Similarly, ∂y points in the direction where
y changes while x remains constant.

As before, the cotangent space $T_p^*(M)$ is the dual space of the vector space $T_p(M)$, i.e. it consists of
all linear maps (called covectors) from the tangent space to the reals.
A tensor of rank $(n, m)$ is an element of $\left( T_p(M) \right)^n \times \left( T_p^*(M) \right)^m$. We can also view this as a map
from $\left( T_p(M) \right)^m \times \left( T_p^*(M) \right)^n$ to the reals which is linear in each argument.

Now, suppose we have some local coordinates $x^a$. Then the components of the vector $X$ are

\[
X^a := X(x^a).
\]

Note that, by the chain rule, we have

\[
X = X^a \partial_a.
\]

In particular, the components of the vector $\partial_b$ are

\[
(\partial_b)^a = \partial_b(x^a) = \delta^a_b.
\]

Suppose we change coordinates in a neighbourhood of the point $p$, from the coordinates $x^a$ to
coordinates $y^{a'}$. Then the new components of the vector $X$ are, using the chain rule,

\[
(X')^{a'} = X(y^{a'}) = \frac{\partial y^{a'}}{\partial x^a} X(x^a) = \frac{\partial y^{a'}}{\partial x^a} X^a.
\]

This is the transformation law for vectors.
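As an illustration of the vector transformation law, consider the plane with Cartesian coordinates $x^a = (x, y)$ and polar coordinates $y^{a'} = (r, \theta)$. This two-dimensional example and the function names below are our own assumptions, chosen because the Jacobian matrix is simple to write down. The sketch converts Cartesian components to polar components using $\partial y^{a'} / \partial x^a$.

```python
import math

def jacobian_cartesian_to_polar(x, y):
    """Matrix of ∂y^{a'}/∂x^a, indexed [a'][a], with y^{a'} = (r, theta)."""
    r2 = x * x + y * y
    r = math.sqrt(r2)
    return [[x / r, y / r],
            [-y / r2, x / r2]]

def vector_to_polar(X, x, y):
    """(X')^{a'} = (∂y^{a'}/∂x^a) X^a at the point with Cartesian coordinates (x, y)."""
    J = jacobian_cartesian_to_polar(x, y)
    return [sum(J[ap][a] * X[a] for a in range(2)) for ap in range(2)]

x, y = 3.0, 4.0
rotation = [-y, x]   # the field -y ∂_x + x ∂_y, evaluated at (x, y)
radial = [x, y]      # the field  x ∂_x + y ∂_y, evaluated at (x, y)

print(vector_to_polar(rotation, x, y))  # only a theta component is expected
print(vector_to_polar(radial, x, y))    # only an r component is expected
```

The rotation field becomes $\partial_\theta$ and the radial field becomes $r \partial_r$, which matches the geometric picture of the coordinate induced vector fields.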
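Let $\eta$ be a covector. Then the components of $\eta$ are defined to be

\[
\eta_a := \eta(\partial_a) \quad \Leftrightarrow \quad \eta = \eta_a \, dx^a.
\]

Note that

\[
\eta(X) = \eta(X^a \partial_a) = X^a \eta_a.
\]

Since this holds in any coordinate system, we can write $\eta(X) = \eta_{\mu} X^{\mu}$. Now, under a change of
coordinates as above, we have

\[
\eta(X) = \eta_a X^a = (\eta')_{a'} (X')^{a'} = (\eta')_{a'} \frac{\partial y^{a'}}{\partial x^a} X^a,
\]

so we must have

\[
\frac{\partial y^{a'}}{\partial x^a} (\eta')_{a'} = \eta_a
\quad \Rightarrow \quad
(\eta')_{a'} = \frac{\partial x^a}{\partial y^{a'}} \eta_a,
\]

using the inverse function theorem. This is the covector transformation law.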
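The covector law can be checked in the same way as the vector law, again using Cartesian coordinates $(x, y)$ and polar coordinates $(r, \theta)$ on the plane as an assumed example. The sketch below transforms the covector $df$ for $f = x^2 + y^2$ and confirms that the pairing $\eta(X)$ takes the same value in both coordinate systems. The function names and sample values are illustrative.

```python
import math

def dy_dx(r, theta):
    """∂y^{a'}/∂x^a, indexed [a'][a], for y^{a'} = (r, theta), x^a = (x, y)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [-s / r, c / r]]

def dx_dy(r, theta):
    """∂x^a/∂y^{a'}, indexed [a][a'], for x = r cos(theta), y = r sin(theta)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -r * s], [s, r * c]]

def vector_to_polar(X, r, theta):
    """(X')^{a'} = (∂y^{a'}/∂x^a) X^a."""
    J = dy_dx(r, theta)
    return [sum(J[ap][a] * X[a] for a in range(2)) for ap in range(2)]

def covector_to_polar(eta, r, theta):
    """(eta')_{a'} = (∂x^a/∂y^{a'}) eta_a."""
    K = dx_dy(r, theta)
    return [sum(K[a][ap] * eta[a] for a in range(2)) for ap in range(2)]

r, theta = 2.0, 0.7
x, y = r * math.cos(theta), r * math.sin(theta)

eta = [2.0 * x, 2.0 * y]   # Cartesian components of df for f = x^2 + y^2
X = [1.0, -2.0]            # an arbitrary vector at the same point

eta_polar = covector_to_polar(eta, r, theta)
X_polar = vector_to_polar(X, r, theta)
print(eta_polar)  # components of d(r^2) in polar coordinates
print(sum(e * v for e, v in zip(eta, X)),
      sum(e * v for e, v in zip(eta_polar, X_polar)))
```

Since $f = r^2$ in polar coordinates, the transformed components should be close to $(2r, 0)$.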
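More general tensors transform in the obvious way:

\[
(T')^{a'_1 a'_2 \ldots a'_n}{}_{b'_1 b'_2 \ldots b'_m}
= \frac{\partial y^{a'_1}}{\partial x^{a_1}} \frac{\partial y^{a'_2}}{\partial x^{a_2}} \cdots \frac{\partial y^{a'_n}}{\partial x^{a_n}}
\frac{\partial x^{b_1}}{\partial y^{b'_1}} \frac{\partial x^{b_2}}{\partial y^{b'_2}} \cdots \frac{\partial x^{b_m}}{\partial y^{b'_m}}
\, T^{a_1 a_2 \ldots a_n}{}_{b_1 b_2 \ldots b_m}.
\]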

4.4 Tensor fields and examples

As before, a tensor field is an assignment of a tensor to all points in spacetime³. Sometimes we may
define a vector field only on some open subset of the manifold.

As usual, we can also consider a rank $(n, m)$ tensor field as a linear operator at each point $p$ from
$\left( T_p(M) \right)^m \times \left( T_p^*(M) \right)^n$ to the reals. This means that it is $C^\infty$-linear in its arguments. For example, a
covector field $\eta$ is a function from vector fields to the reals, satisfying

\[
\eta(aX + bY) = a\eta(X) + b\eta(Y) \quad \text{for all scalar fields } a, b \text{ and all vector fields } X, Y.
\]

Note that $a$ and $b$ are allowed to vary (smoothly) from point to point: they do not have to be constant!
The tangent bundle $T(M)$ is the union of all of the tangent spaces of the manifold:

\[
T(M) = \bigcup_{p \in M} T_p(M).
\]

An element of the tangent bundle is a pair (p, X), where p is a point in the manifold and X is a vector
at p. In an exactly analogous way, we can define the cotangent bundle as the union of all the cotangent
spaces.

4.4.1 Some examples of tensor fields

Suppose that $U \subset M$ is covered by a coordinate chart, with local coordinates $x^a$. Then, in the set $U$,
we can define the vector fields $\partial_a = \frac{\partial}{\partial x^a}$ as above.
Given a smooth function f : M → R (a scalar field), we can define the covector field df , the
differential of f , by its action on an arbitrary vector field X:

df (X) := X(f ).

Since, for all p, this defines a linear map from Tp (M) to the reals, this defines a covector field.
³ We will always work with smooth, i.e. $C^\infty$ tensor fields. To check the differentiability of a tensor field we can simply
examine its components in a chart. Since the transition functions are restricted to be smooth, this is a coordinate-independent
notion.

As before, we can define the Kronecker delta, which is a (1, 1) tensor field, defined by its action on
an arbitrary vector field X and covector field η:

δ(X, η) = η(X).

This defines a linear map from Tp (M) × Tp∗ (M) to the reals, and so δ is a tensor field.
Given two vector fields $X$ and $Y$, we can form their product $XY$, which is a rank (2, 0) tensor field
with components $(XY)^{ab} = X^a Y^b$. Similarly we can form a scalar field by contracting the indices of a
(1, 1) tensor field: $T^{\mu}{}_{\mu}$ is a scalar field, with values (in any local coordinates) given by $T^{a}{}_{a}$ (exercise:
show that this defines a tensor field.). The same applies to tensors of higher rank: we can form new
tensors by taking products of tensors (in which case we increase the overall rank), or by contracting
indices (in which case we lower the rank).
If $T_{\mu\nu}$ is a rank (0, 2) tensor field, then we can define its symmetric and antisymmetric parts:

\[
T_{(\mu\nu)} := \frac{1}{2} \left( T_{\mu\nu} + T_{\nu\mu} \right),
\qquad
T_{[\mu\nu]} := \frac{1}{2} \left( T_{\mu\nu} - T_{\nu\mu} \right).
\]
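In components at a single point, the symmetric and antisymmetric parts are just the familiar decomposition of a square matrix. A short Python sketch with illustrative names and an arbitrary sample matrix:

```python
def symmetric_part(T):
    """T_(ab) = (T_ab + T_ba) / 2."""
    n = len(T)
    return [[0.5 * (T[a][b] + T[b][a]) for b in range(n)] for a in range(n)]

def antisymmetric_part(T):
    """T_[ab] = (T_ab - T_ba) / 2."""
    n = len(T)
    return [[0.5 * (T[a][b] - T[b][a]) for b in range(n)] for a in range(n)]

T = [[1.0, 2.0],
     [4.0, 3.0]]
S = symmetric_part(T)
A = antisymmetric_part(T)
print(S)
print(A)
# The two parts add back up to the original components.
print([[S[a][b] + A[a][b] for b in range(2)] for a in range(2)])
```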

4.4.2 The metric tensor

Finally, we introduce the metric tensor, $g$. Manifolds which come equipped with a metric tensor are
called⁴ Lorentzian manifolds.
The metric is a symmetric, rank (0, 2) tensor field. For any vector field $X$, we define the covector
field $X^{\flat}$ by

\[
X^{\flat}(Y) := g(X, Y) \quad \text{for all vector fields } Y.
\]

Then the metric is non-degenerate: $X^{\flat} = 0$ if and only if $X = 0$. In components we can write

\[
(X^{\flat})_a = g_{ab} X^b = X_a,
\]

where we adopt the notational convention that the metric $g$ lowers indices.
The metric also has signature $(-, +, +, +)$. This means that, in any coordinate system, at any point
in the manifold, the matrix $g_{ab} = g(\partial_a, \partial_b)$ has signature $(-, +, +, +)$ (exercise: show that the notion
of signature is invariant under a change of coordinates).
The metric g will play the same role as the Minkowski metric m did in special relativity. In short:

• A nonzero vector X is timelike if g(X, X) < 0, spacelike if g(X, X) > 0 and null if g(X, X) = 0.
• Curves are timelike/spacelike/null if their tangent vector is everywhere timelike/spacelike/null.
• On a timelike curve we define the proper time as the parameter such that the tangent vector has
norm −1. Similarly, on a spacelike curve the proper distance is defined so that the tangent vector
has norm 1.

There are several common notations in use for the metric tensor. In terms of local coordinates $x^a$,
we can write

\[
g = g_{ab} \, dx^a dx^b = ds^2.
\]

Often it is taken for granted that the metric is symmetric, and so for brevity a non-symmetric
expression is written down, with the understanding that the true metric is found by symmetrising. For
example, we might write

\[
g = dx \, dy,
\]

which should be understood as

\[
g = \frac{1}{2} dx \, dy + \frac{1}{2} dy \, dx.
\]

⁴ If the metric has Lorentzian signature. If the metric had signature $(+, +, +, +)$, then we would call it a Riemannian
manifold.

The quantity $ds^2$, which is really just the metric tensor, is sometimes called the line element.
We can also define the inverse metric $g^{-1}$. This is a rank (2, 0) tensor field defined by the relation

\[
g^{-1}(X^{\flat}, \eta) = \eta(X)
\]

for all vector fields $X$ and covector fields $\eta$.

In components, this reads

\[
(g^{-1})^{ab} X_a \eta_b = (g^{-1})^{ab} g_{ac} X^c \eta_b = X^a \eta_a.
\]

Since this holds for all $X$ and $\eta$, it follows that $(g^{-1})^{ab} g_{ac} = \delta^b_c$, i.e. the matrix $(g^{-1})^{ab}$ is the inverse of
the matrix $g_{ab}$.
We can use the inverse metric to raise indices: for a covector field $\eta$, define the vector field $\eta^{\sharp}$ by

\[
g(\eta^{\sharp}, Y) = \eta(Y),
\]

for all vector fields $Y$. In components

\[
(\eta^{\sharp})^a = (g^{-1})^{ab} \eta_b = \eta^a.
\]
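To see the symmetrisation convention and the inverse metric at work, consider the two-dimensional metric $g = -du \, dv$, read as $-\tfrac{1}{2}(du \, dv + dv \, du)$. This is an assumed example: with $u = t - x$ and $v = t + x$ it is the two-dimensional Minkowski metric $-dt^2 + dx^2$ written in null coordinates. The Python sketch below builds the component matrix, inverts it, and raises and lowers indices; all names are illustrative.

```python
# Components g_ab of g = -du dv, i.e. the symmetrised -(1/2)(du dv + dv du),
# in the coordinate order (u, v).
G = [[0.0, -0.5],
     [-0.5, 0.0]]

def inverse_2x2(A):
    """Inverse of a 2x2 matrix; fails if the matrix is degenerate."""
    d = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    if d == 0:
        raise ValueError("matrix is degenerate")
    return [[A[1][1] / d, -A[0][1] / d],
            [-A[1][0] / d, A[0][0] / d]]

G_INV = inverse_2x2(G)

def g(X, Y):
    """g(X, Y) = g_ab X^a Y^b."""
    return sum(G[a][b] * X[a] * Y[b] for a in range(2) for b in range(2))

def lower(X):
    """X_a = g_ab X^b."""
    return [sum(G[a][b] * X[b] for b in range(2)) for a in range(2)]

def raise_index(eta):
    """eta^a = (g^{-1})^{ab} eta_b."""
    return [sum(G_INV[a][b] * eta[b] for b in range(2)) for a in range(2)]

d_u = [1.0, 0.0]            # the coordinate induced vector ∂_u
X = [3.0, -1.0]

print(G_INV)
print(g(d_u, d_u))          # ∂_u is a null vector for this metric
print(lower(X), raise_index(lower(X)))
```

Note that a coordinate induced vector field can be null, as $\partial_u$ is here, even though the metric itself is non-degenerate.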

4.5 Calculus on manifolds

Before we can understand dynamics on manifolds, we need to know how to do calculus. Specifically, we
need to know how to take derivatives⁵ of functions, vector fields, tensor fields etc.
Actually, we already know one way to differentiate scalar fields. Recall that, given a scalar field $f$,
we defined the covector field $df$ such that, for vector fields $X$, $df(X) = X(f)$. This is the generalization
of the ‘gradient of a function’ to manifolds, and $X(f)$ is the ‘directional derivative’ of $f$ in the direction
of the vector $X$.
The components of the covector field $df$ in the coordinate system $x^a$ are given by

\[
(df)_a = (df)(\partial_a) = \partial_a f.
\]

Since this holds in any coordinate system, we can write

\[
(df)_{\mu} = \partial_{\mu} f.
\]

How about vector fields? There is one obvious way to differentiate a vector field: choose some local
coordinates and differentiate the components of the vector field. Unfortunately this won’t work: consider
a change of coordinates $x^a \to y^{a'}$. Then

\begin{align*}
\partial_a X^b &= \frac{\partial y^{a'}}{\partial x^a} \partial'_{a'} \left( \frac{\partial x^b}{\partial y^{b'}} (X')^{b'} \right) \\
&= \frac{\partial y^{a'}}{\partial x^a} \frac{\partial x^b}{\partial y^{b'}} \partial'_{a'} (X')^{b'}
+ \frac{\partial y^{a'}}{\partial x^a} \left( \frac{\partial^2 x^b}{\partial y^{a'} \partial y^{b'}} \right) (X')^{b'}.
\end{align*}

The first term is the expected term for the transformation of a (1, 1) tensor field, but the second term is
anomalous. Therefore, the quantity $\partial_a X^b$ can’t be a (1, 1) tensor field. The geometric reason why this
doesn’t work is that there is no canonical way to identify the different tangent spaces at different points,
i.e. $T_p(M)$ and $T_q(M)$. Trying to differentiate a vector field involves the difference of a vector field
at two different points, but we cannot add or subtract vectors at different spacetime points!

⁵ Integration on manifolds is also important for a variety of applications, including analysing the Einstein equations,
however it is beyond the scope of this course.
A similar discussion can be had using index-free notation. Suppose that we want to differentiate the
vector field X in the direction of the vector field Y . Then we might be tempted to try to define a vector
field Y (X) which acts on scalar fields f as

Y (X)(f ) := Y (X(f )).

Although this obeys linearity, it does not obey the Leibniz rule, and so does not define a vector field.
The reason is fairly obvious: the expression above depends on the second derivatives of f , whereas a
vector field is supposed to only take first derivatives of f .
How do we resolve this issue? There are actually three different approaches, but we will only pursue
one of these in this course (one alternative approach is introduced on example sheet 2, and the third
approach will be covered in the GR2 course). The most important approach for our purposes is through
the introduction of an affine connection.

4.5.1 Affine connections

An affine connection is something that is more or less defined to do the job for us, so that we can
differentiate vector fields. You could worry that there might not actually be such an object, but, as
we will see later, on a Lorentzian manifold we can use the metric to construct a natural, unique affine
connection. But we will return to that later.
An affine connection is a map from a pair of vector fields to a vector field

\[
\Gamma : (X, Y) \mapsto \nabla_X Y
\]

with the following properties:

1. $\nabla$ is $C^\infty$-linear in the first variable: for all scalar fields $f$ and vector fields $X$, $Y$,

\[
\nabla_{fX} Y = f \nabla_X Y.
\]

2. $\nabla$ satisfies the Leibniz rule in the second variable:

\[
\nabla_X (fY) = f \nabla_X Y + (X(f)) Y.
\]

We can make sense of the components of a connection, which are also called Christoffel symbols.
These are defined, with respect to the local coordinates $x^a$, as follows:

\[
\nabla_{\partial_b} \partial_c := \Gamma^a_{bc} \partial_a
\quad \Leftrightarrow \quad
\Gamma^a_{bc} = (g^{-1})^{ad} g(\nabla_{\partial_b} \partial_c, \partial_d).
\]

We will usually write $\nabla_a$ instead of $\nabla_{\partial_a}$.

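Note these are not the components of a tensor field: under a change of coordinates, we have

\begin{align*}
(\Gamma')^{a'}_{b'c'} &= dy^{a'} \left( \nabla_{b'} \partial'_{c'} \right) \\
&= \frac{\partial y^{a'}}{\partial x^a} dx^a \left( \nabla_{b'} \partial'_{c'} \right) \\
&= \frac{\partial y^{a'}}{\partial x^a} dx^a \left( \nabla_{\frac{\partial x^b}{\partial y^{b'}} \partial_b} \left( \frac{\partial x^c}{\partial y^{c'}} \partial_c \right) \right) \\
&= \frac{\partial y^{a'}}{\partial x^a} dx^a \left( \frac{\partial x^b}{\partial y^{b'}} \nabla_{\partial_b} \left( \frac{\partial x^c}{\partial y^{c'}} \partial_c \right) \right) \\
&= \frac{\partial y^{a'}}{\partial x^a} dx^a \left( \left( \frac{\partial x^b}{\partial y^{b'}} \partial_b \left( \frac{\partial x^c}{\partial y^{c'}} \right) \right) \partial_c + \frac{\partial x^b}{\partial y^{b'}} \frac{\partial x^c}{\partial y^{c'}} \left( \nabla_{\partial_b} \partial_c \right) \right) \\
&= \frac{\partial y^{a'}}{\partial x^a} \frac{\partial^2 x^a}{\partial y^{b'} \partial y^{c'}} + \frac{\partial y^{a'}}{\partial x^a} \frac{\partial x^b}{\partial y^{b'}} \frac{\partial x^c}{\partial y^{c'}} \Gamma^a_{bc},
\end{align*}

or equivalently

\[
\Gamma^a_{bc} = \frac{\partial x^a}{\partial y^{a'}} \frac{\partial y^{b'}}{\partial x^b} \frac{\partial y^{c'}}{\partial x^c} (\Gamma')^{a'}_{b'c'}
- \frac{\partial y^{b'}}{\partial x^b} \frac{\partial y^{c'}}{\partial x^c} \left( \frac{\partial^2 x^a}{\partial y^{b'} \partial y^{c'}} \right).
\]

The first term transforms like a (1, 2) tensor, but the second term does not, so the Christoffel symbols
do not define a tensor field. However, note that the anomalous transformation of the Christoffel symbols
can exactly cancel the anomalous transformation of the object $\partial_a X^b$!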

4.5.2 Covariant derivatives of vectors and tensors

The affine connection allows us to take derivatives - called covariant derivatives - of vector fields as well
as more general tensor fields. The covariant derivative of the vector field X is a (1, 1) tensor field, defined
as a linear map from a vector field Y and a covector field η to the reals, as follows:

\[ \nabla X : (Y, \eta) \mapsto \eta(\nabla_Y X). \]

In abstract index notation, we can write this (1, 1) tensor field as $\nabla_\mu X^\nu$. The components of the tensor
field $\nabla X$ with respect to some local coordinates are written $\nabla_a X^b$. We can write these components in
terms of the Christoffel symbols:

\[
\begin{aligned}
\nabla_Y X &= Y^a \nabla_a (X^b \partial_b) \\
&= Y^a (\partial_a X^b) \partial_b + Y^a X^b \Gamma^c_{ab} \partial_c \\
&= Y^a \left(\partial_a X^b + \Gamma^b_{ac} X^c\right) \partial_b,
\end{aligned}
\]

where in the last line we have relabelled some of the dummy indices. Hence, the components of $\nabla X$ are

\[ \nabla_a X^b = \partial_a X^b + \Gamma^b_{ac} X^c. \]

Note that this expression holds in any coordinate system6 .


You should not take the position of these indices too seriously! For example, suppose we have chosen
some specific coordinate system, and we want to calculate the component ∇0 X 1 . Then, according to
the formulae above, this is

\[
\begin{aligned}
\nabla_0 X^1 &= \partial_0 X^1 + \Gamma^1_{0c} X^c \\
&= \partial_0 X^1 + \Gamma^1_{00} X^0 + \Gamma^1_{01} X^1 + \Gamma^1_{02} X^2 + \Gamma^1_{03} X^3.
\end{aligned}
\]

6 With this in mind, you might be tempted to write this expression in abstract indices, i.e.
$\nabla_\mu X^\nu = \partial_\mu X^\nu + \Gamma^\nu_{\mu\rho} X^\rho$. But this would be a mistake: without any coordinate system,
we cannot make sense of the notation $\partial_\mu X^\nu$, which does not represent any (1, 1) tensor field. Nor does
$\Gamma^\nu_{\mu\rho}$ represent a (1, 2) tensor field - it is only the sum $\partial_a X^b + \Gamma^b_{ac} X^c$ which does represent the
components of a (1, 1) tensor field.

In general (i.e. if most of the Christoffel symbols don’t vanish) this is not an operator acting on the
component X 1 - instead, it depends on all of the components of X. So we cannot think of an expression
like ∇0 X 1 as the operator ∇0 acting on the vector field components X 1 – instead, it should be thought
of as the (0, 1) component of the (1, 1) tensor field ∇X.

Derivatives of general tensor fields


We can also take the covariant derivative of scalar fields, covector fields and higher rank tensor fields.
We define the covariant derivative of a scalar field to be the same thing as its differential, that is, for a
scalar field f ,
∇f = df.

We can now extend the definition of the covariant derivative to covector fields by requiring that the
Leibniz rule holds. So, for a covector field η and an arbitrary vector field X, in some arbitrary coordinate
system we have
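\[
\begin{aligned}
& \partial_a \left(\eta_b X^b\right) = (\nabla_a \eta_b) X^b + \eta_b \left(\nabla_a X^b\right) \\
\Rightarrow \quad & (\partial_a \eta_b) X^b + \eta_b (\partial_a X^b) = (\nabla_a \eta_b) X^b + \eta_b \partial_a X^b + \eta_b \Gamma^b_{ac} X^c \\
\Rightarrow \quad & (\nabla_a \eta_b) X^b = (\partial_a \eta_b - \Gamma^c_{ab} \eta_c) X^b.
\end{aligned}
\]

Since this holds for all vector fields $X$, we must have

\[ \nabla_a \eta_b = \partial_a \eta_b - \Gamma^c_{ab} \eta_c. \]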

Following the same kind of reasoning, we can write out the formula for the covariant derivative of a
tensor of general rank:

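\[
\begin{aligned}
\nabla_a T^{b_1 b_2 \ldots b_n}{}_{c_1 c_2 \ldots c_m} = {} & \partial_a T^{b_1 b_2 \ldots b_n}{}_{c_1 c_2 \ldots c_m} \\
& + \Gamma^{b_1}_{ad} T^{d b_2 \ldots b_n}{}_{c_1 c_2 \ldots c_m} + \Gamma^{b_2}_{ad} T^{b_1 d \ldots b_n}{}_{c_1 c_2 \ldots c_m} + \ldots + \Gamma^{b_n}_{ad} T^{b_1 b_2 \ldots d}{}_{c_1 c_2 \ldots c_m} \\
& - \Gamma^d_{a c_1} T^{b_1 b_2 \ldots b_n}{}_{d c_2 \ldots c_m} - \Gamma^d_{a c_2} T^{b_1 b_2 \ldots b_n}{}_{c_1 d \ldots c_m} - \ldots - \Gamma^d_{a c_m} T^{b_1 b_2 \ldots b_n}{}_{c_1 c_2 \ldots d}.
\end{aligned}
\]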

4.5.3 The Levi-Civita connection

We have derived all of these properties of affine connections, but we have not shown that such an object
really exists. It turns out that, in general, many such objects do exist, and on a Lorentzian manifold
there is in fact a unique, natural affine connection. This is called the Levi-Civita connection.
The Levi-Civita connection is torsion free. To define the torsion, we first need to define the commu-
tator of two vector fields: given vector fields X, Y , their commutator is the vector field [X, Y ] which acts
on scalar fields as
[X, Y ](f ) := X(Y (f )) − Y (X(f )).
This does define a vector field, since it satisfies linearity and the Leibniz rule (exercise). In terms of
components, we have
\[ [X, Y]^a = X^b \partial_b Y^a - Y^b \partial_b X^a. \]
You can check that this does transform as a tensor field. Note that the commutator can be defined
without using an affine connection!
The torsion of the connection ∇ is defined as the (1, 2) tensor field T , whose action on the covector
field η and the vector fields X, Y is

\[ T(\eta, X, Y) = \eta\left(\nabla_X Y - \nabla_Y X - [X, Y]\right). \]

In terms of components, this is (exercise)

\[ T^a_{bc} = \Gamma^a_{bc} - \Gamma^a_{cb}. \]

The Levi-Civita connection is torsion free, so $T^\mu_{\nu\rho} = 0$. In other words, in any local coordinate system,
the Christoffel symbols are symmetric in their lower indices.
The other defining feature of the Levi-Civita connection is that it is compatible with the metric, which
means that
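\[ \nabla_\mu g_{\nu\rho} = 0. \]

How does this lead to a unique "metric compatible" connection? We can calculate

\[
\begin{aligned}
\nabla_a g_{bc} + \nabla_b g_{ac} - \nabla_c g_{ab} = {} & \partial_a g_{bc} + \partial_b g_{ac} - \partial_c g_{ab} \\
& - \Gamma^d_{ab} g_{dc} - \Gamma^d_{ac} g_{bd} - \Gamma^d_{ba} g_{dc} - \Gamma^d_{bc} g_{ad} + \Gamma^d_{ca} g_{db} + \Gamma^d_{cb} g_{ad} \\
= {} & \partial_a g_{bc} + \partial_b g_{ac} - \partial_c g_{ab} - 2 \Gamma^d_{ab} g_{dc}
\end{aligned}
\]

\[
\Rightarrow \quad \Gamma^c_{ab} = \frac{1}{2} (g^{-1})^{cd} \left(\partial_a g_{bd} + \partial_b g_{ad} - \partial_d g_{ab}\right).
\]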
The components of the Levi-Civita connection are given by this expression in all coordinate systems. If
you like, you can check that this expression does not transform as a (1, 2) tensor field, but instead as the
components of an affine connection.
From this point onwards, we will always work with the Levi-Civita connection.
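
The formula for the Levi-Civita Christoffel symbols can be checked numerically. The following is a minimal sketch, not part of the course material: it assumes the round unit 2-sphere with coordinates (θ, φ) and metric diag(1, sin²θ) as a test case, and approximates the derivatives of the metric by central finite differences. All function names (`sphere_metric`, `inverse_2x2`, `metric_derivatives`, `christoffel`) are hypothetical, and the matrix inverse is written for the 2 × 2 case only.

```python
import math

def sphere_metric(x):
    """Metric components g_ab of the unit 2-sphere at x = (theta, phi)."""
    theta = x[0]
    return [[1.0, 0.0], [0.0, math.sin(theta) ** 2]]

def inverse_2x2(g):
    """Inverse of a 2 x 2 matrix (enough for this two-dimensional example)."""
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    return [[g[1][1] / det, -g[0][1] / det],
            [-g[1][0] / det, g[0][0] / det]]

def metric_derivatives(metric, x, h=1e-5):
    """dg[d][b][c] approximates the partial derivative of g_bc along x^d."""
    n = len(x)
    dg = []
    for d in range(n):
        x_plus, x_minus = list(x), list(x)
        x_plus[d] += h
        x_minus[d] -= h
        g_plus, g_minus = metric(x_plus), metric(x_minus)
        dg.append([[(g_plus[b][c] - g_minus[b][c]) / (2 * h)
                    for c in range(n)] for b in range(n)])
    return dg

def christoffel(metric, x, h=1e-5):
    """gamma[c][a][b] = 1/2 g^{cd} (d_a g_bd + d_b g_ad - d_d g_ab)."""
    n = len(x)
    g_inv = inverse_2x2(metric(x))
    dg = metric_derivatives(metric, x, h)
    return [[[0.5 * sum(g_inv[c][d] * (dg[a][b][d] + dg[b][a][d] - dg[d][a][b])
                        for d in range(n))
              for b in range(n)] for a in range(n)] for c in range(n)]

gamma = christoffel(sphere_metric, [1.0, 0.3])
print(gamma[0][1][1])  # approximates -sin(theta) cos(theta) at theta = 1
print(gamma[1][0][1])  # approximates cot(theta) at theta = 1
```

For this metric the only non-vanishing symbols should be Γ^θ_φφ = −sin θ cos θ and Γ^φ_θφ = Γ^φ_φθ = cot θ, and the numerical output is symmetric in the lower indices, as expected for a torsion-free connection.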

4.6 Normal coordinates

One important way in which a connection differs from a vector field is that we can choose coordinates
so that, at some point, the connection vanishes. That is, given a point p ∈ M, we can choose some local
coordinates xa in a neighbourhood of p such that

\[ \Gamma^a_{bc} \big|_p = 0. \]

These coordinates can be further chosen so that the components of the metric at $p$ are

\[ g_{ab} \big|_p = \mathrm{diag}(-1, 1, 1, 1). \]

These coordinates are called normal coordinates at p. Using normal coordinates can simplify a lot
of computations - you should remember that, if some equation holds in a particular coordinate system,
and if that equation can be written entirely in terms of tensors or tensor fields, then the equation must
hold in all coordinate systems. But you should be careful when using normal coordinates to remember
that the expressions above only hold at a single point in spacetime.
The proof that such coordinates exist can be found in appendix A.

4.7 Parallel transport

We can use a connection to define parallel transport. A tensor field T is said to be parallel transported
(or “parallely transported”) along the integral curves of the vector field X if it obeys the equation

∇X T = 0,

or, if you prefer abstract indices,

\[ X^\rho \nabla_\rho T^{\mu_1 \mu_2 \ldots \mu_n}{}_{\nu_1 \nu_2 \ldots \nu_m} = 0. \]

This is the closest we can get to saying that $T$ remains "parallel to itself" when moved in the direction
$X$ (see figure 4.4). Note, however, that this doesn't mean that the values of the components of $T$ in any
particular coordinate system remain constant! In normal coordinates at the point $p$ the derivative of the
components of $T$ in the direction $X$ vanishes:

\[ X^c \partial_c T^{a_1 a_2 \ldots a_n}{}_{b_1 b_2 \ldots b_m} = 0, \]

but this only holds at the point p. Note also that this is not a ‘tensorial’ equation, since it involves the
operator ∂c which does not transform as a tensor, so it is not true in a general coordinate system.

Figure 4.4: X is the tangent vector to the curve γ. Here, the vector field V is parallel transported along
γ, i.e. ∇X V = 0.
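
To illustrate that parallel transport does not keep components constant, here is a small numerical sketch under stated assumptions: the manifold is the unit 2-sphere (not a spacetime), the curve is the circle of constant latitude θ = θ₀ parametrised by φ, and the equation ∇_X V = 0 with X = ∂_φ is integrated with a classical fourth-order Runge-Kutta scheme. The function names are hypothetical.

```python
import math

def transport_rhs(theta0, v):
    """dV^a/dphi = -Gamma^a_{phi c} V^c along the curve theta = theta0."""
    v_theta, v_phi = v
    gamma_theta_phiphi = -math.sin(theta0) * math.cos(theta0)
    gamma_phi_phitheta = math.cos(theta0) / math.sin(theta0)
    return (-gamma_theta_phiphi * v_phi, -gamma_phi_phitheta * v_theta)

def parallel_transport_around_latitude(theta0, v0, steps=2000):
    """Transport V once around the latitude circle (phi from 0 to 2 pi) with RK4."""
    h = 2.0 * math.pi / steps
    v = tuple(v0)
    for _ in range(steps):
        k1 = transport_rhs(theta0, v)
        k2 = transport_rhs(theta0, (v[0] + 0.5 * h * k1[0], v[1] + 0.5 * h * k1[1]))
        k3 = transport_rhs(theta0, (v[0] + 0.5 * h * k2[0], v[1] + 0.5 * h * k2[1]))
        k4 = transport_rhs(theta0, (v[0] + h * k3[0], v[1] + h * k3[1]))
        v = (v[0] + h / 6.0 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
             v[1] + h / 6.0 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))
    return v

def norm_squared(theta0, v):
    """g(V, V) for the unit-sphere metric diag(1, sin^2 theta)."""
    return v[0] ** 2 + math.sin(theta0) ** 2 * v[1] ** 2

theta0 = math.pi / 3
v_final = parallel_transport_around_latitude(theta0, (1.0, 0.0))
print(v_final, norm_squared(theta0, v_final))
```

In this example the components rotate by the angle 2π cos θ₀ after one circuit, so at θ₀ = π/3 the vector returns pointing in the opposite direction, while g(V, V) stays equal to 1 because the connection is metric compatible.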

4.8 Geodesics

In earlier versions of spacetime, straight lines played a very important role: they represented the paths
of inertial observers, as well as the paths of particles which are experiencing no external forces. How can
we generalise these notions to curved spacetimes?
One way to define “straight lines” which can easily be generalised to curved manifolds is that they
extremise the “distance” between two points. On a Lorentzian manifold this has to be interpreted ap-
propriately: for timelike curves, the proper time is extremised, while for spacelike curves it is the proper
distance that is extremised.
Let γ be a timelike curve through the points p and q on the manifold M, parametrised so that
γ(0) = p and γ(1) = q. We work in some local coordinates xa , where the curve γ has coordinates xa (λ).
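Then the proper time interval along the curve is (see footnote 7)

\[
\tau(p, q)[\gamma] := \int_0^1 \sqrt{-g_{ab}(x) \frac{dx^a(\lambda)}{d\lambda} \frac{dx^b(\lambda)}{d\lambda}}\, d\lambda.
\]

The Euler-Lagrange equations can be used to find the curves which extremise this integral. Defining
the integrand

\[
L := \sqrt{-g_{ab}(x) \frac{dx^a(\lambda)}{d\lambda} \frac{dx^b(\lambda)}{d\lambda}}
\]

and varying the path $x^a(\lambda)$, the Euler-Lagrange equations give (exercise)

\[
\begin{aligned}
& \frac{d^2 x^a}{d\lambda^2} + \frac{1}{2} (g^{-1})^{ad} \left(\partial_b g_{cd} + \partial_c g_{bd} - \partial_d g_{bc}\right) \frac{dx^b}{d\lambda} \frac{dx^c}{d\lambda} = L^{-1} \frac{dL}{d\lambda} \frac{dx^a}{d\lambda} \\
\Leftrightarrow \quad & \frac{d^2 x^a}{d\lambda^2} + \Gamma^a_{bc} \frac{dx^b}{d\lambda} \frac{dx^c}{d\lambda} = L^{-1} \frac{dL}{d\lambda} \frac{dx^a}{d\lambda}.
\end{aligned}
\tag{4.1}
\]
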
The Levi-Civita connection arises naturally! In fact, deriving the Euler-Lagrange equations is often the
easiest way to derive the components of the Levi-Civita connection in a given spacetime, particularly
if the spacetime in question has some symmetries, since these symmetries lead to conserved quantities
along the geodesics which can often simplify a lot of algebra.

7 To see that this is the proper time, we can change parametrisation from λ to the proper time τ along the curve. Then,
using the chain rule, the one-form in the integrand becomes simply dτ.

If we wish, we can choose the variable λ to be the proper time τ , in which case L = 1, and so we
obtain
\[
\frac{d^2 x^a}{d\tau^2} + \Gamma^a_{bc} \frac{dx^b}{d\tau} \frac{dx^c}{d\tau} = 0. \tag{4.2}
\]

Equation (4.2) is the geodesic equation, and curves which satisfy it are called geodesics. Exactly the
same equation can be derived for spacelike geodesics, which extremise the proper length.
This equation is a second order ODE for the coordinates $x^a(\tau)$ of the geodesic. To solve it, we need
both the initial position of the geodesic (i.e. $x^a(0)$, the coordinates of the point from which the geodesic
originates) and its initial tangent vector $\frac{dx^a}{d\tau}$, i.e. the initial direction of the geodesic.
Equation (4.2) can also be written in terms of the tangent vector to the curve $\gamma$. Writing $X^a = \frac{dx^a}{d\tau}$,
we have

\[
\frac{dX^a}{d\tau} + \Gamma^a_{bc} X^b X^c = 0. \tag{4.3}
\]

One final way to write this equation which will be particularly useful for us is to recall that, along
the curve $\gamma$ (parametrised by proper time) with tangent vector $X$, for any function $f$ we have

\[ \frac{d}{d\tau} f = X^a \partial_a f = X(f), \]

so the geodesic equation can be written as

\[
\begin{aligned}
& X^b \partial_b X^a + \Gamma^a_{bc} X^b X^c = 0 \\
\Leftrightarrow \quad & X^b \nabla_b X^a = 0.
\end{aligned}
\]

The geodesic equation can therefore be written in the 'tensorial' manner:

\[ \nabla_X X = 0. \tag{4.4} \]

In other words, $X$ is parallel transported along its own integral curve.


Equation (4.4) gives us a new way to think about geodesics: a geodesic is a curve along which the
tangent vector to the curve remains parallel to itself. We can think of ordinary straight lines like this:
pick some vector, and extend a curve in the direction of this vector, making sure that, at every point
along the curve, the tangent to the curve is parallel to the tangent at the preceding point (c.f. figure 4.4).
This equation also tells us how to define null geodesics: a null geodesic is simply a curve whose tangent
vector X is both null and satisfies equation (4.4). In all cases (timelike, spacelike or null), such a curve
is said to be affinely parametrised - recall that the tangent to a curve depends on the parametrisation.
Note that the character of a geodesic cannot change: if a geodesic is initially timelike/spacelike/null,
then it will always be so, since along the curve $\gamma$ we have

\[
\begin{aligned}
\frac{d}{d\tau} (X^\mu X_\mu) &= \nabla_X (X^\mu X_\mu) \\
&= 2 g_{\mu\nu} X^\nu \nabla_X X^\mu \\
&= 0,
\end{aligned}
\]

where we have used the fact that $\nabla g = 0$ together with the geodesic equation (4.4). So $g(X, X)$ is
constant along a geodesic.
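
The conservation of g(X, X) can be observed numerically. The sketch below is an illustration under assumptions that are not in the notes: it uses the unit 2-sphere (a Riemannian example, so every geodesic is "spacelike"), writes the geodesic equation (4.2) as a first order system, and integrates it with a fourth-order Runge-Kutta scheme. The function names are hypothetical.

```python
import math

def geodesic_rhs(state):
    """First order form of the geodesic equation on the unit 2-sphere."""
    theta, phi, dtheta, dphi = state
    # d^2 theta / d tau^2 = -Gamma^theta_{phi phi} (dphi)^2
    ddtheta = math.sin(theta) * math.cos(theta) * dphi ** 2
    # d^2 phi / d tau^2 = -2 Gamma^phi_{theta phi} dtheta dphi
    ddphi = -2.0 * math.cos(theta) / math.sin(theta) * dtheta * dphi
    return (dtheta, dphi, ddtheta, ddphi)

def rk4_step(state, h):
    """One classical Runge-Kutta step of size h."""
    def shifted(base, k, scale):
        return tuple(b + scale * ki for b, ki in zip(base, k))
    k1 = geodesic_rhs(state)
    k2 = geodesic_rhs(shifted(state, k1, 0.5 * h))
    k3 = geodesic_rhs(shifted(state, k2, 0.5 * h))
    k4 = geodesic_rhs(shifted(state, k3, h))
    return tuple(s + h / 6.0 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def integrate_geodesic(state, tau_max, steps):
    """Integrate from tau = 0 to tau = tau_max and return the final state."""
    h = tau_max / steps
    for _ in range(steps):
        state = rk4_step(state, h)
    return state

def g_xx(state):
    """g(X, X) = (dtheta)^2 + sin^2(theta) (dphi)^2 for the tangent vector X."""
    theta, _, dtheta, dphi = state
    return dtheta ** 2 + math.sin(theta) ** 2 * dphi ** 2

start = (1.0, 0.0, 0.3, 0.5)
end = integrate_geodesic(start, 5.0, 5000)
print(g_xx(start), g_xx(end))  # the two values should agree closely
```

Starting on the equator with a purely azimuthal tangent vector, i.e. `(math.pi / 2, 0.0, 0.0, 1.0)`, the integrated curve stays on the equator, which is the expected great circle.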

4.9 Curvature

We are now (finally!) at a point where we can investigate the most important difference between curved
and flat spacetimes: namely, the curvature.

The curvature measures, in a sense, the deviation away from flat space. There are several ways to
approach this, but we will do it by means of geodesic deviation. The idea is that curvature can cause
nearby geodesics to converge or diverge (see figure 4.5).

Figure 4.5: Curvature can cause nearby “parallel” geodesics to either converge or diverge.

Suppose that we have a one-parameter family of timelike geodesics, given in local coordinates by
xa (τ ) = γ a (τ, s). Here τ is the proper time along each geodesic, while the geodesics are labelled by a
continuous parameter s (figure 4.6).
The tangent vector to the curve $\gamma$ is $X$, with components $X^a$. We can also define the vector $J$. This
is not a vector field, but a vector defined along each of the curves $\gamma$, with components

\[ J^a = \frac{\partial \gamma^a}{\partial s} \bigg|_\tau. \]

J is sometimes called a deviation vector or a Jacobi field.


Along each geodesic we are free to choose the origin of the proper time, i.e. the point at which τ = 0.
Under the reparametrisation $\tau(s) \mapsto \tau(s) + b(s)$, we have

\[
\begin{aligned}
X &\mapsto X \\
J &\mapsto J + b' X.
\end{aligned}
\]

Hence we can use our ability to “slide” the point τ = 0 up or down each geodesic γ to ensure that, at
τ = 0, we have g(J, X) = 0. Another way to think of this process is to consider taking a “slice” through
the two dimensional surface made up of the geodesics γ, and to choose the slice so that, at every point,
we are slicing orthogonally to the vector field X; we then choose τ = 0 along every geodesic to be the
point at which the geodesic cuts through the slice.
Note that the vectors X and J commute: we have

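\[
\begin{aligned}
[X, J]^a &= X^b \partial_b J^a - J^b \partial_b X^a \\
&= \frac{\partial}{\partial \tau} J^a - \frac{\partial}{\partial s} X^a \\
&= \frac{\partial^2 x^a}{\partial \tau \partial s} - \frac{\partial^2 x^a}{\partial s \partial \tau} \\
&= 0.
\end{aligned}
\]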

Figure 4.6: A congruence of timelike geodesics γ(τ, s), where τ is the proper time along a geodesic and
s labels the different geodesics. Note that this congruence (locally) defines a 2 dimensional surface in
spacetime. At each point on this surface, X is the tangent vector to the geodesic and J is a Jacobi
field, which commutes with X. The acceleration of J along a timelike geodesic measures the geodesic
deviation.

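Note also that the value of $g(X, J)$ is invariant along each of the geodesics:

\[
\begin{aligned}
\frac{d}{d\tau} g(X, J) &= X \left(g_{\mu\nu} X^\mu J^\nu\right) \\
&= \nabla_X \left(g_{\mu\nu} X^\mu J^\nu\right) \\
&= X_\mu \nabla_X J^\mu \\
&= X_\mu \nabla_J X^\mu + X_\mu [X, J]^\mu \\
&= \frac{1}{2} \nabla_J (X_\mu X^\mu) \\
&= \frac{1}{2} \nabla_J (-1) = 0,
\end{aligned}
\]
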
and so X and J will remain orthogonal along each of the geodesics. With this in mind, we can think of
J as a “connecting vector”, measuring the infinitesimal displacement of the geodesic γ(·, s + ϵ) from the
geodesic γ(·, s) (figure 4.6).
Now, we can compute the "acceleration" of the Jacobi field along each of the geodesics. We can think
of this as measuring the acceleration of an infinitesimally displaced geodesic. We find (see footnote 8)

\[
\begin{aligned}
\frac{\partial^2}{\partial \tau^2} J &= \nabla_X \nabla_X J \\
&= \nabla_X \nabla_J X \quad \text{(using the torsion-free property and the fact that $J$ and $X$ commute)} \\
&= \left(\nabla_X \nabla_J - \nabla_J \nabla_X - \nabla_{[X, J]}\right) X \quad \text{(using the geodesic equation $\nabla_X X = 0$ and the fact that $J$ and $X$ commute).}
\end{aligned}
\tag{4.5}
\]

Why have we added these additional terms which vanish? The point is that the object in the final
line can be used to define a tensor field. We first define the vector field

\[ R(X, Y)Z := \left(\nabla_X \nabla_Y - \nabla_Y \nabla_X - \nabla_{[X, Y]}\right) Z, \]

where X, Y and Z are vector fields. For fixed X, Y , R(X, Y ) can be thought of as a map from vector
fields to vector fields, taking Z to R(X, Y )Z.
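This map is $C^\infty$-linear in each of its arguments. First, note that it is antisymmetric in its first two
arguments: we have

\[ R(X, Y)Z = -R(Y, X)Z, \]

so we only need to check linearity in one of these arguments. For smooth vector fields $X, X', Y, Z$ and
smooth scalar fields $a, b$, we have

\[
\begin{aligned}
R(aX + bX', Y)Z &= \left(a \nabla_X \nabla_Y + b \nabla_{X'} \nabla_Y - \nabla_Y (a \nabla_X + b \nabla_{X'}) - \nabla_{[aX + bX', Y]}\right) Z \\
&= \Big(a \nabla_X \nabla_Y + b \nabla_{X'} \nabla_Y - a \nabla_Y \nabla_X - b \nabla_Y \nabla_{X'} - (Y(a)) \nabla_X - (Y(b)) \nabla_{X'} \\
&\qquad - \nabla_{a[X, Y] - Y(a) X + b[X', Y] - Y(b) X'}\Big) Z \\
&= \left(a \nabla_X \nabla_Y - a \nabla_Y \nabla_X - a \nabla_{[X, Y]} + b \nabla_{X'} \nabla_Y - b \nabla_Y \nabla_{X'} - b \nabla_{[X', Y]}\right) Z \\
&= a R(X, Y)Z + b R(X', Y)Z,
\end{aligned}
\]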

so R(X, Y )Z is C ∞ linear in X and Y .


8 It is not clear how to interpret the first term in this equation, $\partial_\tau^2 J$. If $J$ were a scalar field then the first line would
hold, since then $X(J) = \partial_\tau J$. However, $J$ is a vector field: we can choose a certain kind of "frame", with respect to
which the components of $J$ satisfy $X(X(J^a)) = (\nabla_X \nabla_X J)^a$, but this will not be true in all coordinate systems. To avoid
unnecessary details, perhaps it is better to think of this formula as being suggestive of the acceleration of the vector $J$ in
the $X$ direction.

We can also check that $R(X, Y)$ is a $C^\infty$-linear map from vector fields to vector fields. It is easy to
see that $R(X, Y)(Z + Z') = R(X, Y)Z + R(X, Y)Z'$, so we just need to check

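\[
\begin{aligned}
R(X, Y)(aZ) &= \left(\nabla_X \nabla_Y - \nabla_Y \nabla_X - \nabla_{[X, Y]}\right)(aZ) \\
&= a \nabla_X \nabla_Y Z + X(a) \nabla_Y Z + Y(a) \nabla_X Z + X(Y(a)) Z \\
&\quad - a \nabla_Y \nabla_X Z - Y(a) \nabla_X Z - X(a) \nabla_Y Z - Y(X(a)) Z - a \nabla_{[X, Y]} Z - ([X, Y](a)) Z \\
&= a \left(\nabla_X \nabla_Y Z - \nabla_Y \nabla_X Z - \nabla_{[X, Y]} Z\right) + \left(X(Y(a)) - Y(X(a)) - [X, Y](a)\right) Z \\
&= a R(X, Y)Z,
\end{aligned}
\]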

so R(X, Y ) is a linear map from vector fields to vector fields, which is also linear in both X and Y .
We can use this to define a (1, 3) tensor field, called the Riemann curvature tensor (or just the
Riemann tensor). Acting on a covector field η and three vector fields X, Y , Z, the Riemann curvature
tensor R is defined9 as
R(η, Z, X, Y ) = η (R(X, Y )Z) .

Returning now to the equation governing Jacobi fields (4.5), we find that a Jacobi field satisfies the
ODE

\[ \frac{\partial^2}{\partial \tau^2} J = R(X, J)X, \]

so it is the Riemann curvature which governs the deviation of nearby geodesics. In flat space, the
Riemann curvature vanishes!
You might prefer to work in abstract indices. Working through everything in components, we find
that, for any vector field $X$,

\[ (\nabla_\mu \nabla_\nu - \nabla_\nu \nabla_\mu) X^\alpha = R^\alpha{}_{\beta\mu\nu} X^\beta. \]

What happens if we commute covariant derivatives and apply them to tensors of various types? We
know what happens when we apply [∇µ , ∇ν ] to a vector field – we get out the Riemann curvature tensor.
When we apply this operator to a scalar field f , working in local coordinates we obtain

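\[
\begin{aligned}
[\nabla_a, \nabla_b] f &= \nabla_a \partial_b f - \nabla_b \partial_a f \\
&= \partial_a \partial_b f - \partial_b \partial_a f - \Gamma^c_{ab} \partial_c f + \Gamma^c_{ba} \partial_c f \\
&= 0,
\end{aligned}
\]

using the torsion-free property of the connection. We can use this to work out the corresponding expression
for a covector $\eta$ (exercise)

\[ [\nabla_\mu, \nabla_\nu] \eta_\rho = -R^\sigma{}_{\rho\mu\nu} \eta_\sigma, \]

and for a general tensor

\[
\begin{aligned}
[\nabla_\mu, \nabla_\nu] T^{\rho_1 \rho_2 \ldots \rho_n}{}_{\sigma_1 \sigma_2 \ldots \sigma_m} = {} & R^{\rho_1}{}_{\kappa\mu\nu} T^{\kappa \rho_2 \ldots \rho_n}{}_{\sigma_1 \sigma_2 \ldots \sigma_m} + R^{\rho_2}{}_{\kappa\mu\nu} T^{\rho_1 \kappa \ldots \rho_n}{}_{\sigma_1 \sigma_2 \ldots \sigma_m} \\
& + \ldots + R^{\rho_n}{}_{\kappa\mu\nu} T^{\rho_1 \rho_2 \ldots \kappa}{}_{\sigma_1 \sigma_2 \ldots \sigma_m} \\
& - R^\kappa{}_{\sigma_1 \mu\nu} T^{\rho_1 \rho_2 \ldots \rho_n}{}_{\kappa \sigma_2 \ldots \sigma_m} - R^\kappa{}_{\sigma_2 \mu\nu} T^{\rho_1 \rho_2 \ldots \rho_n}{}_{\sigma_1 \kappa \ldots \sigma_m} \\
& - \ldots - R^\kappa{}_{\sigma_m \mu\nu} T^{\rho_1 \rho_2 \ldots \rho_n}{}_{\sigma_1 \sigma_2 \ldots \kappa}.
\end{aligned}
\]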

4.10 Symmetries of the Riemann tensor

4.10.1 Algebraic symmetries

We already know, from the definition of the Riemann tensor, that it is antisymmetric in its final two
indices:

\[ R^\mu{}_{\nu\rho\sigma} = -R^\mu{}_{\nu\sigma\rho}. \]
9 You should be careful when consulting references, as there are different conventions regarding the sign of the Riemann
tensor.

From the metric compatibility condition we also find

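\[
\begin{aligned}
0 &= -[\nabla_\mu, \nabla_\nu] g_{\rho\sigma} \\
&= R^\alpha{}_{\rho\mu\nu} g_{\alpha\sigma} + R^\alpha{}_{\sigma\mu\nu} g_{\rho\alpha} \\
&= R_{\sigma\rho\mu\nu} + R_{\rho\sigma\mu\nu},
\end{aligned}
\]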

so the Riemann tensor is also antisymmetric in its first two indices.


Next consider the following expression, for some scalar field f

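\[
\nabla_\mu \nabla_\sigma \nabla_\nu f + \nabla_\sigma \nabla_\nu \nabla_\mu f + \nabla_\nu \nabla_\mu \nabla_\sigma f - \nabla_\sigma \nabla_\mu \nabla_\nu f - \nabla_\mu \nabla_\nu \nabla_\sigma f - \nabla_\nu \nabla_\sigma \nabla_\mu f.
\]

We can group these terms in two different ways: first,

\[
\nabla_\mu ([\nabla_\sigma, \nabla_\nu] f) + \nabla_\nu ([\nabla_\mu, \nabla_\sigma] f) + \nabla_\sigma ([\nabla_\nu, \nabla_\mu] f) = 0
\]

since $[\nabla_\alpha, \nabla_\beta] f = 0$. On the other hand, we can group these terms as follows:

\[
[\nabla_\sigma, \nabla_\nu] \nabla_\mu f + [\nabla_\mu, \nabla_\sigma] \nabla_\nu f + [\nabla_\nu, \nabla_\mu] \nabla_\sigma f = \left(R^\alpha{}_{\mu\nu\sigma} + R^\alpha{}_{\nu\sigma\mu} + R^\alpha{}_{\sigma\mu\nu}\right) \nabla_\alpha f
\]

where we have used antisymmetry in the last pair of indices. But since this holds for all scalar functions
$f$, we have

\[ R^\alpha{}_{\mu\nu\sigma} + R^\alpha{}_{\nu\sigma\mu} + R^\alpha{}_{\sigma\mu\nu} = 0. \tag{4.6} \]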
This equation is sometimes known as the first Bianchi identity or the algebraic Bianchi identity.
There is another useful symmetry of the Riemann tensor which follows from the symmetries derived
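above. Using the first Bianchi identity and cyclically permuting indices, we have

\[
\begin{aligned}
R_{\mu\nu\rho\sigma} + R_{\mu\rho\sigma\nu} + R_{\mu\sigma\nu\rho} &= 0 \\
-R_{\nu\rho\sigma\mu} - R_{\nu\sigma\mu\rho} - R_{\nu\mu\rho\sigma} &= 0 \\
-R_{\rho\sigma\mu\nu} - R_{\rho\mu\nu\sigma} - R_{\rho\nu\sigma\mu} &= 0 \\
R_{\sigma\mu\nu\rho} + R_{\sigma\nu\rho\mu} + R_{\sigma\rho\mu\nu} &= 0.
\end{aligned}
\]

Adding these four equations together, and using antisymmetry in the first and last pair of indices, we
find that

\[ R_{\mu\nu\rho\sigma} = R_{\rho\sigma\mu\nu}. \]

In summary, the Riemann tensor has the following algebraic symmetries:

\[
\begin{aligned}
R_{\mu\nu\rho\sigma} &= -R_{\mu\nu\sigma\rho} \\
R_{\mu\nu\rho\sigma} &= -R_{\nu\mu\rho\sigma} \\
R_{\mu\nu\rho\sigma} &= R_{\rho\sigma\mu\nu} \\
R_{\mu\nu\rho\sigma} + R_{\mu\rho\sigma\nu} + R_{\mu\sigma\nu\rho} &= 0.
\end{aligned}
\]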

4.10.2 The (second) Bianchi identity

There is also an important symmetry of the derivatives of the Riemann tensor, called the second Bianchi
identity or simply the Bianchi identity.
We prove this identity in a similar way to the first Bianchi identity. Consider the following expression,
for some arbitrary covector η

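\[
\nabla_\mu \nabla_\rho \nabla_\nu \eta_\sigma + \nabla_\rho \nabla_\nu \nabla_\mu \eta_\sigma + \nabla_\nu \nabla_\mu \nabla_\rho \eta_\sigma - \nabla_\mu \nabla_\nu \nabla_\rho \eta_\sigma - \nabla_\nu \nabla_\rho \nabla_\mu \eta_\sigma - \nabla_\rho \nabla_\mu \nabla_\nu \eta_\sigma.
\]

Grouping the terms in one way we obtain

\[
\begin{aligned}
[\nabla_\mu, \nabla_\rho] \nabla_\nu \eta_\sigma + [\nabla_\rho, \nabla_\nu] \nabla_\mu \eta_\sigma + [\nabla_\nu, \nabla_\mu] \nabla_\rho \eta_\sigma = {} & R^\alpha{}_{\nu\rho\mu} \nabla_\alpha \eta_\sigma + R^\alpha{}_{\sigma\rho\mu} \nabla_\nu \eta_\alpha + R^\alpha{}_{\mu\nu\rho} \nabla_\alpha \eta_\sigma \\
& + R^\alpha{}_{\sigma\nu\rho} \nabla_\mu \eta_\alpha + R^\alpha{}_{\rho\mu\nu} \nabla_\alpha \eta_\sigma + R^\alpha{}_{\sigma\mu\nu} \nabla_\rho \eta_\alpha \\
= {} & \left(R^\alpha{}_{\mu\nu\rho} + R^\alpha{}_{\nu\rho\mu} + R^\alpha{}_{\rho\mu\nu}\right) \nabla_\alpha \eta_\sigma \\
& + R^\alpha{}_{\sigma\nu\rho} \nabla_\mu \eta_\alpha + R^\alpha{}_{\sigma\rho\mu} \nabla_\nu \eta_\alpha + R^\alpha{}_{\sigma\mu\nu} \nabla_\rho \eta_\alpha \\
= {} & R^\alpha{}_{\sigma\nu\rho} \nabla_\mu \eta_\alpha + R^\alpha{}_{\sigma\rho\mu} \nabla_\nu \eta_\alpha + R^\alpha{}_{\sigma\mu\nu} \nabla_\rho \eta_\alpha
\end{aligned}
\]

where we have used the first Bianchi identity. But, grouping the same terms in an alternative way, we
have

\[
\begin{aligned}
\nabla_\mu \left([\nabla_\rho, \nabla_\nu] \eta_\sigma\right) + \nabla_\nu \left([\nabla_\mu, \nabla_\rho] \eta_\sigma\right) + \nabla_\rho \left([\nabla_\nu, \nabla_\mu] \eta_\sigma\right) = {} & \nabla_\mu \left(R^\alpha{}_{\sigma\nu\rho} \eta_\alpha\right) + \nabla_\nu \left(R^\alpha{}_{\sigma\rho\mu} \eta_\alpha\right) + \nabla_\rho \left(R^\alpha{}_{\sigma\mu\nu} \eta_\alpha\right) \\
= {} & R^\alpha{}_{\sigma\nu\rho} \nabla_\mu \eta_\alpha + R^\alpha{}_{\sigma\rho\mu} \nabla_\nu \eta_\alpha + R^\alpha{}_{\sigma\mu\nu} \nabla_\rho \eta_\alpha \\
& + \left(\nabla_\mu R^\alpha{}_{\sigma\nu\rho}\right) \eta_\alpha + \left(\nabla_\nu R^\alpha{}_{\sigma\rho\mu}\right) \eta_\alpha + \left(\nabla_\rho R^\alpha{}_{\sigma\mu\nu}\right) \eta_\alpha.
\end{aligned}
\]

Combining these two equations and reordering the indices a bit using the symmetries of the Riemann
tensor, we find that

\[
\left(\nabla_\mu R_{\nu\rho}{}^\alpha{}_\sigma + \nabla_\nu R_{\rho\mu}{}^\alpha{}_\sigma + \nabla_\rho R_{\mu\nu}{}^\alpha{}_\sigma\right) \eta_\alpha = 0.
\]

Since this holds for all covectors $\eta$ (and since the connection is metric-compatible), we have proved
the second Bianchi identity

\[ \nabla_\mu R_{\nu\rho\alpha\beta} + \nabla_\nu R_{\rho\mu\alpha\beta} + \nabla_\rho R_{\mu\nu\alpha\beta} = 0. \tag{4.7} \]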

4.10.3 The Ricci and Einstein tensors and the contracted Bianchi identity

We can contract a pair of indices in the Riemann tensor to form the Ricci curvature tensor (or simply
Ricci tensor), which is also conventionally notated with the letter R:

\[ R_{\mu\nu} := R^\alpha{}_{\mu\alpha\nu}. \]

The symmetries of the Riemann tensor imply that the Ricci tensor is symmetric (exercise).
We can contract the indices of the Ricci tensor to form the scalar curvature, which is also conventionally
notated (see footnote 10) by the letter R:

\[ R := (g^{-1})^{\mu\nu} R_{\mu\nu}. \]

If we contract the indices µ and α in the second Bianchi identity (4.7) and then relabel indices, we
obtain the identity

\[ \nabla^\alpha R_{\alpha\mu\nu\rho} - \nabla_\nu R_{\mu\rho} + \nabla_\rho R_{\mu\nu} = 0. \]

Contracting again, this time with the indices µ and ρ (and relabelling indices and dividing by two), we
obtain the contracted Bianchi identity

\[ \nabla^\mu \left(R_{\mu\nu} - \frac{1}{2} R g_{\mu\nu}\right) = 0. \]
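
The two contractions can be spelled out as follows. This is a sketch of the intermediate steps, using only metric compatibility and the algebraic symmetries of section 4.10.1; the shorthand $\nabla^\alpha := (g^{-1})^{\alpha\beta} \nabla_\beta$ is assumed here.

```latex
\begin{align*}
% Contract (4.7) with (g^{-1})^{\mu\alpha}; metric compatibility lets g^{-1} pass through \nabla.
0 &= \nabla^\alpha R_{\nu\rho\alpha\beta}
   + \nabla_\nu \big( (g^{-1})^{\mu\alpha} R_{\rho\mu\alpha\beta} \big)
   + \nabla_\rho \big( (g^{-1})^{\mu\alpha} R_{\mu\nu\alpha\beta} \big) \\
% Pair symmetry in the first term, antisymmetry in the first index pair in the second.
  &= \nabla^\alpha R_{\alpha\beta\nu\rho} - \nabla_\nu R_{\rho\beta} + \nabla_\rho R_{\nu\beta}. \\
% Relabel \beta \to \mu, use R_{\mu\nu} = R_{\nu\mu}, then contract with (g^{-1})^{\mu\rho}.
0 &= (g^{-1})^{\mu\rho} \big( \nabla^\alpha R_{\alpha\mu\nu\rho} - \nabla_\nu R_{\mu\rho} + \nabla_\rho R_{\mu\nu} \big) \\
  &= \nabla^\alpha R_{\alpha\nu} - \nabla_\nu R + \nabla^\mu R_{\mu\nu} \\
  &= 2 \nabla^\mu R_{\mu\nu} - \nabla^\mu ( R \, g_{\mu\nu} ).
\end{align*}
```

Dividing the last line by two gives the contracted Bianchi identity in the form stated above.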

This leads us to define the Einstein tensor:

\[ G_{\mu\nu} := R_{\mu\nu} - \frac{1}{2} R g_{\mu\nu}. \]

From the calculations above, we have shown that the Einstein tensor is divergence free:

\[ \nabla^\mu G_{\mu\nu} = 0. \]
10 Because of these conventional notations, when dealing with the curvature it is particularly useful to use abstract index
notation rather than index-free notation.

4.11 Curvature in terms of the metric

There is one final aspect of the curvature which will be important for us, and that is its relationship to
the metric tensor. If we work in some local coordinates xa , then we can write

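\[
\begin{aligned}
[\nabla_a, \nabla_b] X^c &= R^c{}_{dab} X^d \\
&= \partial_a \nabla_b X^c - \Gamma^d_{ab} \nabla_d X^c + \Gamma^c_{ad} \nabla_b X^d - (a \leftrightarrow b) \\
&= \left(\partial_a \partial_b X^c + \partial_a (\Gamma^c_{bd} X^d) - \Gamma^d_{ab} \partial_d X^c + \Gamma^c_{ad} \partial_b X^d - \Gamma^d_{ab} \Gamma^c_{de} X^e + \Gamma^c_{ad} \Gamma^d_{be} X^e\right) - (a \leftrightarrow b) \\
&= \left(\partial_a \Gamma^c_{bd} - \partial_b \Gamma^c_{ad} + \Gamma^c_{ae} \Gamma^e_{bd} - \Gamma^c_{be} \Gamma^e_{ad}\right) X^d
\end{aligned}
\]

and so the components of the Riemann tensor can be written in terms of the Christoffel symbols and
their derivatives as follows:

\[ R^a{}_{bcd} = \partial_c \Gamma^a_{bd} - \partial_d \Gamma^a_{bc} + \Gamma^a_{ce} \Gamma^e_{bd} - \Gamma^a_{de} \Gamma^e_{bc}. \tag{4.8} \]

Recalling the expression for the Christoffel symbols of the Levi-Civita connection in terms of the metric
components,

\[ \Gamma^a_{bc} = \frac{1}{2} (g^{-1})^{ad} \left(\partial_b g_{cd} + \partial_c g_{bd} - \partial_d g_{bc}\right), \]

we can substitute this expression into equation (4.8) to obtain a long and not very enlightening equation
for the components of the Riemann tensor.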
The important thing to notice about this expression is the following: the Riemann tensor depends
on the metric g and its first two derivatives.
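
Equation (4.8) is straightforward to evaluate numerically. The sketch below is an illustration only, under the assumption of the unit 2-sphere as a test geometry: it takes the known Christoffel symbols of that metric, differentiates them by central finite differences, and assembles the Riemann and Ricci components with the index conventions used above. The function names are hypothetical.

```python
import math

def christoffel_sphere(x):
    """gamma[a][b][c] = Gamma^a_{bc} for the unit 2-sphere at x = (theta, phi)."""
    theta = x[0]
    gamma = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    gamma[0][1][1] = -math.sin(theta) * math.cos(theta)
    gamma[1][0][1] = gamma[1][1][0] = math.cos(theta) / math.sin(theta)
    return gamma

def riemann(christoffel, x, h=1e-5):
    """R[a][b][c][d] = R^a_{bcd} from equation (4.8), using finite differences."""
    n = len(x)
    gamma = christoffel(x)
    dgamma = []  # dgamma[c][a][b][d] approximates d_c Gamma^a_{bd}
    for c in range(n):
        x_plus, x_minus = list(x), list(x)
        x_plus[c] += h
        x_minus[c] -= h
        g_plus, g_minus = christoffel(x_plus), christoffel(x_minus)
        dgamma.append([[[(g_plus[a][b][d] - g_minus[a][b][d]) / (2 * h)
                         for d in range(n)] for b in range(n)] for a in range(n)])
    return [[[[dgamma[c][a][b][d] - dgamma[d][a][b][c]
               + sum(gamma[a][c][e] * gamma[e][b][d] - gamma[a][d][e] * gamma[e][b][c]
                     for e in range(n))
               for d in range(n)] for c in range(n)] for b in range(n)] for a in range(n)]

def ricci(R):
    """Ricci components R_bd = R^a_{bad}."""
    n = len(R)
    return [[sum(R[a][b][a][d] for a in range(n)) for d in range(n)] for b in range(n)]

theta = 1.0
R = riemann(christoffel_sphere, [theta, 0.3])
ric = ricci(R)
scalar = ric[0][0] + ric[1][1] / math.sin(theta) ** 2  # contract with the inverse metric
print(R[0][1][0][1], scalar)
```

With these conventions the unit sphere should give R^θ_φθφ = sin²θ and scalar curvature 2 at every point, which is a convenient check on the signs in (4.8).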

Chapter 5

The Einstein equations and physics in curved spacetimes

We now have all the mathematical machinery needed to understand the Einstein equations, as well as
to adapt other physical laws to curved spacetimes.
First, the Einstein equations. Remember that there are many hints that matter causes spacetime to
curve, but, to be consistent with special relativity, it should not be just the matter density which affects
curvature, but the energy density. This appears in the energy-momentum tensor Tµν .
With this in mind, in October 1915 Einstein tried the equation

Rµν = CTµν

for some constant C. But there is a problem with this equation: the conservation of energy-momentum
means that the energy momentum tensor Tµν is divergence-free1 , while the divergence of the Ricci tensor
generally does not vanish.
By November, Einstein had remedied this problem in the obvious way, replacing the Ricci tensor
with the Einstein tensor. This yields the equation

Gµν = CTµν .

We still need to fix the constant C. This can be done by taking the weak field limit, where we take
the metric to have the form
gab = mab + ϵhab
in some coordinates. We then expand the Einstein equations up to first order in ϵ and then compare
the Einstein equations with Newtonian gravity (see the GR2 course for the details). The upshot of this
calculation is that C = 8π. This leads to the Einstein equations (or the Einstein field equations)

Gµν = 8πTµν . (5.1)

Restoring the speed of light and Newton’s constant, this is


\[ G_{\mu\nu} = \frac{8\pi G}{c^4} T_{\mu\nu}. \]

1 In special relativity, the energy momentum tensor satisfied $\partial^a T_{ab} = 0$, working in inertial coordinates. This was
motivated by an argument involving integrating over a region of spacetime. This argument can be repeated in a curved
spacetime, but this would require us to develop the theory of integration on manifolds. The upshot is that, as might be
expected, the partial derivative should be replaced by a covariant derivative.

5.1 Uniqueness of the Einstein equations and the cosmological constant

There is a sense in which the Einstein equations are “unique”. This is given by Lovelock’s theorem (the
proof of which is well beyond the scope of this course)

Theorem (Lovelock’s theorem). In four spacetime dimensions, the only tensor fields constructed entirely
from the metric tensor together with its first and second derivatives which are symmetric and divergence-
free are of the form
aGµν + bgµν ,
where a and b are constants.

This suggests the following alternative for the Einstein equations, which Einstein published in 1917
Gµν + Λgµν = 8πTµν . (5.2)
The constant Λ in this equation is called the cosmological constant. Consequently, equations (5.2) are
sometimes called the Einstein equations with cosmological constant.
Einstein originally included the cosmological constant because, without it, he couldn’t find cosmolog-
ical solutions2 of the Einstein equations which didn’t either expand or contract. When later observations
showed that the universe is expanding, Einstein called it his “greatest mistake”. More recent observations
indicate that the cosmological constant is nonzero, but with an incredibly small positive value: in Planck
units, $\Lambda \approx 7.26 \times 10^{-121}$.

5.2 The Einstein equations as a system of PDEs

In a local system of coordinates, the Riemann curvature tensor can be written in terms of the metric
and its first two derivatives. Hence the Einstein equations can be viewed as a second order system of
PDEs for the metric components gab .
Usually we have to supplement these equations with the equations of motion for the matter in order
to obtain a closed system of equations. But there is a special case, called the Einstein vacuum equations,
where there is no matter present:
Gµν + Λgµν = 0. (5.3)

In Newtonian gravity, when there is no matter present, the gravitational potential ϕ vanishes and
there are no dynamics. The situation is very different in GR, where the Einstein vacuum equations (5.3)
are a second order system of equations for the metric, which have many nontrivial solutions!
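
As a small worked example of what the vacuum equations say, one can take the trace of (5.3). The following sketch assumes four spacetime dimensions, so that $(g^{-1})^{\mu\nu} g_{\mu\nu} = 4$, and uses only the definition of the Einstein tensor.

```latex
\begin{align*}
% Trace of (5.3) with the inverse metric, in four dimensions.
0 &= (g^{-1})^{\mu\nu} \left( R_{\mu\nu} - \tfrac{1}{2} R g_{\mu\nu} + \Lambda g_{\mu\nu} \right)
   = R - 2R + 4\Lambda
  \quad \Rightarrow \quad R = 4\Lambda, \\
% Substitute R = 4\Lambda back into (5.3).
0 &= R_{\mu\nu} - \tfrac{1}{2} (4\Lambda) g_{\mu\nu} + \Lambda g_{\mu\nu}
  \quad \Rightarrow \quad R_{\mu\nu} = \Lambda g_{\mu\nu}.
\end{align*}
```

So, under these assumptions, the vacuum equations are equivalent to $R_{\mu\nu} = \Lambda g_{\mu\nu}$, and to the vanishing of the Ricci tensor when Λ = 0. The full Riemann tensor is not constrained to vanish, which is consistent with the existence of nontrivial vacuum solutions.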
How do we solve these equations, in general? As with other second order PDEs, it is useful to try
to characterise these equations as elliptic (like the Laplace equation), parabolic (like the heat equation)
or hyperbolic (like the wave equation). In the first case we would expect to have to specify boundary
conditions, while in the second or third case we would expect to specify initial conditions and then treat
the PDEs as evolution equations.
Unfortunately, the Einstein equations are not of any specific type. What’s more, if we try to view these
equations as evolution equations, e.g. specifying the metric components gab and their time derivatives
everywhere on some initial time surface, then we find that there is no unique solution. Disaster!
What could have gone wrong? Remember that we are viewing the equations as a system of PDEs for
the components of the metric $g_{ab}$ in some coordinate system $x^a$. But consider another coordinate system
$y^{a'}$, which agrees with the coordinate system $x^a$ in a neighbourhood of the initial hypersurface. The metric
in these coordinates has components $g'_{a'b'}$, which agree with the components $g_{ab}$ in a neighbourhood of
the initial surface, but which will generally differ away from this neighbourhood (see figure 5.1).

2 See chapter 7.

Figure 5.1: Initial data for the metric is given on the bottom surface, and we try to solve 'upwards'.
One solution to the Einstein equations, which agrees with this initial data, is given by the metric $g_{ab}$.
Another metric is given by $g'_{a'b'}$, whose components agree with the components of $g_{ab}$ on the initial
hypersurface and for some amount of 'time', but then disagree at later times. However, they only differ
by a coordinate transformation, so $g'_{a'b'} = \frac{\partial x^a}{\partial y^{a'}} \frac{\partial x^b}{\partial y^{b'}} g_{ab}$, where $x^a = y^a$ near the initial hypersurface.
Since the Einstein equations are invariant under coordinate transformations, both $g_{ab}$ and $g'_{a'b'}$ are
solutions to the Einstein equations, so the solutions cannot be unique!

The Einstein equations are tensorial in nature. This means that they have the same form in every
coordinate system. Consequently, if gab is a solution to the Einstein equations, then so is ga′ b′ , where this
solution differs from the first one by a coordinate transformation. From this point of view it is obvious
that the Einstein equations cannot have a unique solution!
The solution to our problem is now obvious: we have to make some specific choice of coordinates!
One convenient choice is to choose the coordinate functions to satisfy wave equations themselves (“wave
coordinates”), in which case the Einstein equations become a system of nonlinear wave equations. Now
we see that the Einstein equations are hyperbolic, and we can think about solving the Cauchy problem:
pose initial data on some initial hypersurface, and then use the Einstein equations to evolve this data
into the future.
Amazingly, it wasn't until almost 40 years after the Einstein equations were published that these
issues were fully understood and solved (by Choquet-Bruhat in 1952). Although the details of these
calculations are not necessary for our purposes, this point of view – viewing the Einstein equations as a
system of evolution equations – is crucial both for a rigorous approach to GR, and for numerical GR.

5.3 Other physical laws in curved spacetimes

How do we generalise other physical laws to a curved spacetime? First, the kinematics of point particles
is governed by the geodesic postulate: “test particles” move along timelike geodesics in the absence of
external forces. In general, however, we should avoid working with point masses in general relativity:
if you take some matter distribution with a fixed mass and then squeeze it into a smaller and smaller
volume, GR predicts that at some point it will form a black hole! Also, if the test particle had mass or
energy, then this should be included in the energy momentum tensor on the right hand side of the Einstein
equations (this is sometimes called back reaction) and so it would affect the metric tensor and hence the
geodesics. For these reasons, “test particles” in GR really just mean timelike geodesics (sometimes we
talk about “massless test particles”, which follow null geodesics).

To generalise other physical laws, we recall that, at any point p on the manifold, we can choose to
work in normal coordinates. In these coordinates, the components of the metric g take the same values
(at the point p) as the Minkowski metric m, and the Christoffel symbols vanish so partial derivatives and
covariant derivatives are the same. So, at the point p, working in normal coordinates, physics should
look like the physics of special relativity.
As an example, consider Maxwell’s equations, which can be written (in inertial coordinates) in special
relativity as

\[ \partial_a F_{bc} + \partial_b F_{ca} + \partial_c F_{ab} = 0 \]

\[ (m^{-1})^{ab} \partial_a F_{bc} = 0. \]

These equations should take an identical form in normal coordinates at the point p. But, in normal
coordinates, these equations are equivalent to

\[ \nabla_a F_{bc} + \nabla_b F_{ca} + \nabla_c F_{ab} = 0 \]

\[ (g^{-1})^{ab} \nabla_a F_{bc} = 0, \]

since partial derivatives agree with covariant derivatives and g_{ab} = m_{ab} at p. But now, these equations only
involve tensors, so we can write them using abstract indices

\[ \nabla_\mu F_{\nu\rho} + \nabla_\nu F_{\rho\mu} + \nabla_\rho F_{\mu\nu} = 0 \]

\[ \nabla^\mu F_{\mu\nu} = 0. \]

This is how to generalise Maxwell’s equations to a curved spacetime.


This illustrates the following general point: to generalise a physical law from special relativity to
general relativity, we should

1. express the physical law in terms of tensors in Minkowski space,

2. replace all partial derivatives with covariant derivatives, and


3. replace the Minkowski metric m with the metric g.

The principle that all physical laws should be expressed as tensorial equations, encapsulated by these
rules, is sometimes called the principle of covariance.
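
As a sketch of how these three rules act on another law, consider a massless scalar wave equation. This example is chosen purely for illustration and is not taken from the discussion above; the scalar field ψ is a hypothetical symbol introduced only here.

```latex
% Step 1: tensorial form in Minkowski space (inertial coordinates)
(m^{-1})^{ab} \partial_a \partial_b \psi = 0
% Step 2: replace partial derivatives with covariant derivatives
(m^{-1})^{ab} \nabla_a \nabla_b \psi = 0
% Step 3: replace the Minkowski metric m with g, using abstract indices
(g^{-1})^{\mu\nu} \nabla_\mu \nabla_\nu \psi = 0
```

At a point p in normal coordinates the last line reduces to the first, which is exactly the consistency requirement used for Maxwell's equations.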

Chapter 6

The Schwarzschild metric

Hence, according to article 10, if the ſemi-diameter of a ſphær of the ſame denſity with
the ſun were to exceed that of the ſun in the proportion of 500 to 1, a body falling from
an infinite height towards it, would have acquired at its ſurface a greater velocity than
that of light, and conſequently, ſuppoſing light to be attracted by the ſame force in
proportion to its vis inertiæ, with other bodies, all light emitted from ſuch a body would
be made to return towards it, by its own proper gravity.

John Michell, letter to the Royal Society, 1784.

One solution to the Einstein vacuum equations (with Λ = 0) is Minkowski space, where we set
gab = mab . But we already know everything about this spacetime - this is just the spacetime of special
relativity!
What other solutions do the Einstein equations have? We have already described a way of gener-
ating many solutions to the equations, by choosing some initial data and treating the Einstein equations
as evolution equations. Unfortunately, in most cases it is impossible to solve these equations explicitly,
due to the highly nonlinear nature of the equations.
Alternatively, we can look for solutions in particular symmetry classes, where the Einstein equations
simplify. There is a long and distinguished history of this approach in physics – its first major success
came just one year after the Einstein equations were published, with the discovery of the Schwarzschild
solution.
This is a solution to the Einstein vacuum equations without a cosmological constant, i.e. Gµν = 0.
It is static, spherically symmetric and asymptotically flat. This means that we can write the metric in
coordinates (t, r, θ, ϕ), where

1. The components of the metric are independent of t (stationarity).

2. The metric is also invariant under t ↦ −t (staticity).
3. The components of the metric are invariant under a family of transformations that can be parametrised
by SO(3) matrices, whose orbits are topological spheres (spherical symmetry).
4. As r → ∞, the metric components approach1 the components of the Minkowski metric written in
spherical polar coordinates (asymptotic flatness).

1 Asymptotic flatness actually requires that the metric approaches the Minkowski metric at a certain rate, but there are
various possible rates and we will not go into the messy details here.

It might not be obvious that point (2) adds anything new to point (1), i.e. that “static” means
anything more than “stationary”. To see how these two points are different, consider the two dimensional
metric
\[ g = -dt\,d\theta. \]
This metric is stationary: the nonzero metric components are just g_{tθ} = g_{θt} = −1/2, which are
obviously independent of t. However, under the map t ↦ −t, the metric transforms as g ↦ −g, so the
metric is not invariant under time reversal!
For a more physical example, consider a sphere rotating at a constant speed in an otherwise empty
universe. This situation is stationary: it looks the same at all points in time. But if we reverse the
direction of time, then the situation is not invariant, since then the sphere rotates in the opposite
direction!

There is a famous theorem, which shows that the Schwarzschild solution is, in a sense, unique:
Theorem (Birkhoff’s theorem). Every spherically symmetric solution to the Einstein vacuum equations
is locally isometric to either Schwarzschild spacetime or to Minkowski space.

Locally isometric means that all sufficiently small open sets can be mapped onto an open set
in the Schwarzschild spacetime (or Minkowski space) by an isometry – that is, the map preserves the
metric, so that all inner products between vector fields are preserved. Basically, Birkhoff’s theorem says
that the only spherically symmetric solutions to the Einstein equations are Minkowski space and the
Schwarzschild spacetime, up to topology.
This theorem is important because it means that the Schwarzschild metric characterises spacetime
outside of any spherically symmetric matter distribution, regardless of the interior structure of the matter
(e.g. the density profile of some fluid). In this case, the metric will not agree with the Schwarzschild
metric inside the matter distribution (where Tµν ̸= 0), where the metric will generally depend on the
specific details of the matter.

6.1 The metric

The metric, written in the canonical coordinates described above, is


\[ g = -\left(1 - \frac{2M}{r}\right) dt^2 + \left(1 - \frac{2M}{r}\right)^{-1} dr^2 + r^2 \left( d\theta^2 + \sin^2\theta \, d\phi^2 \right) \tag{6.1} \]

Here M > 0 is a constant (called the mass), and the ranges of the coordinates are as follows:

• t∈R
• 0<θ<π

• 0 < ϕ < 2π
• The range of the coordinate r is a bit more complicated, but for now we’ll avoid difficulties and say
2M < r < ∞. For example, we might be considering the region outside a spherically symmetric
star, where the radius of the star is much larger2 than 2M .

It is easy to check that this metric satisfies the properties claimed above: the metric components
only depend on the coordinates r and θ, and as r → ∞ the metric approaches that of Minkowski space.
Checking that this metric is actually a solution to the Einstein equations is extremely tedious, but it is
a calculation that everyone should do at one point in their lives (and then never again!).
2 Modelling the sun as spherically symmetric, the surface r = 2M is around 3km from its centre – well inside the region
where there is matter. For the Earth, this surface is around 1cm from the centre!
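
The orders of magnitude quoted in this footnote can be checked with a short script. This is a minimal sketch that restores the factors of G and c (set to 1 elsewhere in these notes), so that r = 2M becomes r = 2GM/c². The numerical constants and the function name `schwarzschild_radius` are assumptions introduced here, using rounded standard values.

```python
# Physical constants in SI units (rounded standard values, assumed here).
G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8          # speed of light, m/s
M_SUN = 1.989e30     # mass of the sun, kg
M_EARTH = 5.972e24   # mass of the Earth, kg


def schwarzschild_radius(mass_kg: float) -> float:
    """Return 2GM/c^2 in metres (the surface r = 2M in geometric units)."""
    return 2.0 * G * mass_kg / C**2


r_sun = schwarzschild_radius(M_SUN)      # a few kilometres
r_earth = schwarzschild_radius(M_EARTH)  # just under a centimetre
print(f"Sun:   {r_sun / 1e3:.2f} km")
print(f"Earth: {r_earth * 1e3:.2f} mm")
```

Both values are tiny compared with the actual radii of the sun and the Earth, which is why the exterior region r > 2M is all that matters for such bodies.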

There are well-understood degeneracies in the metric at θ = 0, π and also at ϕ = 0, 2π. These, of
course, are the usual coordinate issues with the sphere, and reflect the fact that, technically, we need to
use two coordinate charts to cover the sphere. On the other hand, something odd is clearly going on
with the metric at r = 0 and at r = 2M . We’ll return to this point later.

6.2 Gravitational redshift

The first phenomenon we’ll investigate is called gravitational redshift. Suppose there are two observers,
Alice and Bob, who move along integral curves of the vector field ∂t , at two different radii but with the
same angular coordinates. Their worldlines, parametrised by the coordinate t, are

Alice: (t, rA , θ0 , ϕ0 )
Bob: (t, rB , θ0 , ϕ0 )
where θ0 and ϕ0 are constants, and 2M < rA < rB .
Suppose that Alice sends Bob regular signals, using light rays. Light rays travel along null geodesics,
which we can also parametrise by the coordinate time t. Then, for radial light rays, since they are null
we have
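\[
\begin{aligned}
0 &= g_{ab} \frac{dx^a}{dt} \frac{dx^b}{dt} \\
  &= -\left(1 - \frac{2M}{r}\right) + \left(1 - \frac{2M}{r}\right)^{-1} \left(\frac{dr}{dt}\right)^2 \\
\Rightarrow \frac{dr}{dt} &= 1 - \frac{2M}{r},
\end{aligned}
\]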
so, integrating from r = rA when t = tA to r = rB when t = tB , we have
\[ \int_{r_A}^{r_B} \left(1 - \frac{2M}{r}\right)^{-1} dr = t_B - t_A. \]
Importantly, the coordinate time difference tB − tA is itself a constant, independent of the initial time
when the signal was transmitted (you could just read this off directly from the fact that the metric
is stationary). So, if Alice sends repeated signals at (coordinate) time intervals ∆t, then they will be
received by Bob at (coordinate) time intervals ∆t.
But we should always remember that t is just a coordinate, and has no intrinsic meaning. The amount
of time that Alice and Bob experience as passing – the amount of time measured by their clocks – is the
proper time along their worldlines, not the coordinate time.
What is the proper time along Alice and Bob’s worldlines? Along a worldline where r, θ, ϕ are
constants, we calculate
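\[
\begin{aligned}
-1 &= g_{ab} \frac{dx^a}{d\tau} \frac{dx^b}{d\tau} \\
   &= -\left(1 - \frac{2M}{r}\right) \left(\frac{dt}{d\tau}\right)^2 \\
\Rightarrow \frac{dt}{d\tau} &= \left(1 - \frac{2M}{r}\right)^{-1/2}.
\end{aligned}
\]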
So, along Alice and Bob’s worldlines, differences in proper times satisfy
\[ \Delta\tau_{A/B} = \left(1 - \frac{2M}{r_{A/B}}\right)^{1/2} \Delta t, \]

and the ratio of the emission frequency to the received frequency is
\[ \frac{\Delta\tau_B}{\Delta\tau_A} = \frac{\left(1 - \frac{2M}{r_B}\right)^{1/2}}{\left(1 - \frac{2M}{r_A}\right)^{1/2}}. \]

Since rB > rA , ∆τB > ∆τA . So less time passes for Alice than for Bob: Bob receives the signals
at a lower frequency than the emitted frequency. Clocks run slower in a gravitational field. This is
gravitational redshift. Note that, as rA → 2M the ratio of frequencies tends to infinity – a kind of infinite
redshift, which we will interpret later.

Figure 6.1: ∆τB > ∆τA : Bob receives signals from Alice at a slower rate than they are emitted.
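
To get a feel for the size of the effect, the ratio ∆τB/∆τA derived above can be evaluated numerically. The following is a minimal sketch in geometric units; the function name `redshift_factor` and the sample radii are illustrative choices, not part of the notes.

```python
import math


def redshift_factor(r_a: float, r_b: float, m: float = 1.0) -> float:
    """Return dtau_B / dtau_A for static observers at radii r_a, r_b > 2m."""
    if min(r_a, r_b) <= 2.0 * m:
        raise ValueError("static observers only exist for r > 2M")
    return math.sqrt((1.0 - 2.0 * m / r_b) / (1.0 - 2.0 * m / r_a))


# Alice at r = 4M signalling to a distant Bob at r = 1000M.
print(redshift_factor(4.0, 1000.0))

# The ratio grows without bound as Alice's radius approaches 2M.
for r_a in (3.0, 2.1, 2.001):
    print(r_a, redshift_factor(r_a, 1000.0))
```

The loop illustrates the 'infinite redshift' limit: holding Bob fixed, the factor increases monotonically as rA decreases towards 2M.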

6.3 Geodesics in the Schwarzschild metric

Before we can investigate other important phenomena in the Schwarzschild spacetime, we need to derive
the equations for geodesics in this metric and examine some of their properties.
Rather than deriving the Christoffel symbols by differentiating the metric components directly, it is
easier to start from the Lagrangian for point particles. If the Lagrangian itself is constant – which it is
if we parametrise curves using an affine parameter – then we can either use the original Lagrangian L
or alternatively L², since they will result in the same Euler-Lagrange equations3 (exercise). So we can
take the Lagrangian to be

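\[
\begin{aligned}
L &= g_{ab} \frac{dx^a}{d\lambda} \frac{dx^b}{d\lambda} \\
  &= -\left(1 - \frac{2M}{r}\right) \dot{t}^2 + \left(1 - \frac{2M}{r}\right)^{-1} \dot{r}^2 + r^2 \dot{\theta}^2 + r^2 \sin^2\theta \, \dot{\phi}^2,
\end{aligned}
\]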

where the ‘dots’ represent derivatives with respect to some affine parameter λ.
3 Using the ‘squared’ form of the Lagrangian is particularly useful for dealing with null geodesics, since varying this
Lagrangian leads to the geodesic equation, while varying the original Lagrangian leads to problems due to the vanishing of
the Lagrangian.

As usual with Lagrangians, it is useful to start with conserved quantities. Since the Lagrangian is
independent of t, we have the conserved ‘energy’
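\[
\begin{aligned}
E &:= -\frac{1}{2} \frac{\partial L}{\partial \dot{t}} \\
  &= \left(1 - \frac{2M}{r}\right) \dot{t}.
\end{aligned}
\]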

Similarly, the Lagrangian is independent of ϕ, so we have the conserved angular momentum about
the z axis
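\[
\begin{aligned}
\Omega &:= \frac{1}{2} \frac{\partial L}{\partial \dot{\phi}} \\
       &= r^2 \sin^2\theta \, \dot{\phi}.
\end{aligned}
\]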

The Lagrangian itself is constant: in the timelike case (i.e. for a massive particle) we can choose the
affine parameter λ to be the proper time τ , so that L = −1, while in the null case L = 0 for any choice of
affine parameter. So we have
\[
-\left(1 - \frac{2M}{r}\right) \dot{t}^2 + \left(1 - \frac{2M}{r}\right)^{-1} \dot{r}^2 + r^2 \dot{\theta}^2 + r^2 \sin^2\theta \, \dot{\phi}^2 = -K =
\begin{cases}
-1 & \text{(timelike)} \\
0 & \text{(null).}
\end{cases}
\]

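Finally, we can use spherical symmetry to rotate the manifold so that the particle moves only in
the equatorial plane θ = π/2. To be more precise: we can use the SO(3) isometries to rotate so that
the particle is initially in the equatorial plane, and so that its initial velocity (i.e. the tangent vector of the
geodesic) is initially tangent to the equatorial plane. Then the equation of motion for θ is
\[
\begin{aligned}
\frac{d}{d\lambda}\left(r^2 \dot{\theta}\right) - r^2 \sin\theta \cos\theta \, \dot{\phi}^2 &= 0 \\
\Rightarrow r^2 \ddot{\theta} + 2 r \dot{r} \dot{\theta} - r^2 \sin\theta \cos\theta \, \dot{\phi}^2 &= 0.
\end{aligned}
\]
So, if θ = π/2 and θ̇ = 0 at λ = 0, we have θ̈ = 0 at λ = 0. In fact the constant function θ = π/2 solves this
equation with these initial data, so by uniqueness of solutions to ODEs it follows that θ = π/2 always.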
Putting this all together, we find the evolution equation for the r coordinate
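\[
\begin{aligned}
\left(1 - \frac{2M}{r}\right)^{-1} \dot{r}^2 + \frac{\Omega^2}{r^2} + K &= \left(1 - \frac{2M}{r}\right)^{-1} E^2 \\
\Rightarrow \frac{1}{2} \dot{r}^2 + \frac{\Omega^2}{2r^2}\left(1 - \frac{2M}{r}\right) - \frac{MK}{r} &= \frac{E^2 - K}{2}.
\end{aligned}
\tag{6.2}
\]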

This is the equation of motion of a particle with energy ½(E² − K), moving in an effective potential

\[ V(r) = -\frac{MK}{r} + \frac{\Omega^2}{2r^2} - \frac{M\Omega^2}{r^3}. \]

6.3.1 Timelike geodesics

In this case K = 1 and the effective potential is


\[ V(r) = -\frac{M}{r} + \frac{\Omega^2}{2r^2} - \frac{M\Omega^2}{r^3}. \]
The first term is the Newtonian gravitational potential and the second term is the angular momentum
barrier. The third term does not appear in Newtonian theory – it is a correction due to GR. See figure
6.2 for sketches of this potential.

At large r, V ∼ −M r⁻¹, and at r = 2M we have V = −1/2 (remember that we are only working in
the region r > 2M for now).
The extrema of the potential are at
\[ V' = 0 \Rightarrow r = \frac{\Omega^2}{2M}\left(1 \pm \sqrt{1 - 12\frac{M^2}{\Omega^2}}\right), \]
so if Ω > √12 M there are two local extrema, at Ω = √12 M these two extrema collide (leaving a single
inflection point), and at Ω < √12 M there are no real extrema (see figure 6.2).

Figure 6.2: The effective potential in a Schwarzschild spacetime, for various values of the conserved
angular momentum Ω.

Timelike circular orbits


First, let’s look for circular orbits. These have ṙ = 0 and r̈ = 0 – the latter means that we must be
at a local extremum of the effective potential.
Labelling these extrema by r− and r+ , with r+ > r− , we find that the extremum at r− is always
unstable (i.e. it is a local maximum) while that at r = r+ is stable. The innermost (marginally) stable
circular orbit (ISCO4 ) is obtained when Ω = √12 M , when r = 6M . The energy of these orbits can be
calculated: e.g. for a stable circular orbit,

\[ \frac{E^2 - 1}{2} = V(r_+). \]
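
The extrema of the effective potential and the energy of the stable circular orbit can be evaluated directly from the formulae above. The following is a minimal sketch in geometric units; the function names are illustrative, and the functions take Ω² rather than Ω as input (a choice made here so that the ISCO value Ω² = 12M² can be passed exactly).

```python
import math


def effective_potential(r: float, omega_sq: float, m: float = 1.0) -> float:
    """Timelike (K = 1) effective potential V(r), with omega_sq = Omega^2."""
    return -m / r + omega_sq / (2.0 * r**2) - m * omega_sq / r**3


def circular_orbit_radii(omega_sq: float, m: float = 1.0):
    """Return (r_minus, r_plus), the extrema of V, or None if Omega^2 < 12 M^2."""
    disc = 1.0 - 12.0 * m**2 / omega_sq
    if disc < 0.0:
        return None
    scale = omega_sq / (2.0 * m)
    root = math.sqrt(disc)
    return scale * (1.0 - root), scale * (1.0 + root)


def circular_orbit_energy(omega_sq: float, m: float = 1.0) -> float:
    """Energy E of the stable circular orbit, from (E^2 - 1)/2 = V(r_plus)."""
    radii = circular_orbit_radii(omega_sq, m)
    if radii is None:
        raise ValueError("no circular orbits for Omega^2 < 12 M^2")
    return math.sqrt(1.0 + 2.0 * effective_potential(radii[1], omega_sq, m))


print(circular_orbit_radii(16.0))   # unstable and stable radii for Omega = 4M
print(circular_orbit_radii(12.0))   # the two extrema merge at the ISCO
print(circular_orbit_energy(12.0))  # energy of the orbit at the ISCO
```

For Ω² = 12M² both radii coincide at r = 6M, and the energy there satisfies E² = 8/9, so E is slightly less than 1.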

Bound orbits

If Ω > √12 M then there are bound orbits which are not circular. These have energies satisfying

\[ V(r_+) < \frac{E^2 - 1}{2} < 0, \]
(see figure 6.3).

Unbound orbits

If Ω > √12 M then there are also unbound orbits, which have energies satisfying E² ≥ 1 (see figure
6.4).
4 This is very important in astrophysics. As well as the exterior regions in generic spherically symmetric spacetimes,

the Schwarzschild metric also describes (spoiler alert) a black hole. Astrophysical black holes often have accretion disks,
consisting of matter orbiting close to the black hole, in almost circular orbits. Friction causes this matter to slowly lose
energy, falling towards the black hole (and emitting light). Once it reaches the ISCO, it quickly falls into the black hole.
So the ‘inner edge’ of the accretion disk that you might see is not at the edge of the black hole, but at r = 6M .

Figure 6.3: Bound orbits in Schwarzschild.


For smaller angular momenta (Ω ≤ √12 M ) the situation is interesting: in this regime there is no
local maximum of the effective potential. As before, there are unbound orbits (with E² ≥ 1) – but these
orbits will only reach infinity if they are outgoing initially. All other orbits – that is, orbits with E² < 1
or with ṙ < 0 initially – will eventually reach the surface r = 2M (figure 6.4).

Figure 6.4: Unbound orbits in Schwarzschild.

6.4 Perihelion precession

One of the big scientific puzzles before the advent of general relativity was the anomalous precession of
the perihelion of Mercury. The perihelion is the closest point of approach to the sun, and Newtonian
theory predicts that planets move on ellipses, with the perihelion always occurring at the same point
in space. But observations had shown that the perihelion of Mercury is precessing – on each orbit, the
perihelion occurs at a slightly different angle (see figure 6.5).
The orbits of the planets are close to circular, so our approach to this problem will be to treat Mercury
as a point mass travelling on a circular orbit. We will then give this orbit small perturbations, and see
what happens.
It is convenient to use the coordinate u = M/r instead of r, and to parametrise the orbits by ϕ instead
of the proper time τ . Then we have
\[ \frac{du}{d\phi} = \frac{du}{d\tau}\left(\frac{d\phi}{d\tau}\right)^{-1} = -\frac{M}{\Omega} \dot{r}, \]

Figure 6.5

and so equation (6.2) becomes


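\[ \frac{1}{2}\left(\frac{du}{d\phi}\right)^2 - \frac{M^2}{\Omega^2} u + \frac{1}{2} u^2 - u^3 = \frac{M^2 (E^2 - 1)}{2\Omega^2}. \]
Differentiating with respect to ϕ gives
\[ \frac{d^2 u}{d\phi^2} - \frac{M^2}{\Omega^2} + u - 3u^2 = 0. \]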
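Setting u = M/R + ϵv(ϕ), where R is the radius of a circular orbit (so R² − (Ω²/M)R + 3Ω² = 0), we find
\[
\begin{aligned}
0 &= \epsilon \frac{d^2 v}{d\phi^2} - \frac{M^2}{\Omega^2} + \frac{M}{R} + \epsilon v - 3\left(\frac{M}{R} + \epsilon v\right)^2 \\
\Rightarrow 0 &= \frac{d^2 v}{d\phi^2} + \left(1 - 6\frac{M}{R}\right) v - 3\epsilon v^2.
\end{aligned}
\]
Ignoring higher order terms in ϵ, we have the equation
\[ \frac{d^2 v}{d\phi^2} + \left(1 - 6\frac{M}{R}\right) v = 0. \]
For stable circular orbits R > 6M . Then this equation has periodic solutions, with period
\[ T_{\text{period}} = \frac{2\pi}{\left(1 - \frac{6M}{R}\right)^{1/2}} \sim 2\pi + \frac{6\pi M}{R}, \]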

so the perihelion precesses by an additional 6πM/R per orbit. This matches the observed anomalous
precession of Mercury!
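
The size of this effect for Mercury can be estimated by putting numbers into the formula. This is a rough sketch under stated assumptions: the values of GM/c² for the sun, the mean orbital radius and the orbital period are rounded standard values introduced here, and the orbit is treated as circular as in the derivation above. The observed anomalous precession is about 43 arcseconds per century.

```python
import math

GM_SUN_OVER_C2 = 1476.6     # assumed: solar mass in geometric units, metres
R_MERCURY = 5.79e10         # assumed: mean orbital radius of Mercury, metres
PERIOD_DAYS = 87.97         # assumed: orbital period of Mercury, days
DAYS_PER_CENTURY = 36525.0


def precession_per_orbit(m: float, r: float):
    """Return (exact, leading-order) excess angle per orbit, in radians.

    exact   = 2*pi*((1 - 6m/r)**(-1/2) - 1)
    leading = 6*pi*m/r
    """
    if r <= 6.0 * m:
        raise ValueError("stable circular orbits require R > 6M")
    exact = 2.0 * math.pi * ((1.0 - 6.0 * m / r) ** -0.5 - 1.0)
    leading = 6.0 * math.pi * m / r
    return exact, leading


exact, leading = precession_per_orbit(GM_SUN_OVER_C2, R_MERCURY)
orbits_per_century = DAYS_PER_CENTURY / PERIOD_DAYS
arcsec_per_century = math.degrees(leading * orbits_per_century) * 3600.0
print(f"{arcsec_per_century:.1f} arcseconds per century")
```

The circular-orbit estimate lands within a few arcseconds of the observed figure; the remaining gap comes from ignoring the eccentricity of Mercury's orbit.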

6.5 Gravitational bending of light

One of the other classic tests of general relativity is the bending of light when it passes near a massive
object.
For this purpose, we need to use the massless geodesic equation rather than the massive one, i.e. we
need to take K = 0. The equation for r is then
\[ \frac{1}{2} \dot{r}^2 + \frac{\Omega^2}{2r^2}\left(1 - \frac{2M}{r}\right) = \frac{E^2}{2}. \]

Figure 6.6: The effective potential for null geodesics moving in the Schwarzschild spacetime.

The effective potential is sketched in figure 6.6. Note that here, ‘dots’ mean derivatives with respect to
an affine parameter along the null geodesic (not the proper time!).
Note that, for this effective potential, there is only one local extremum, at r = 3M . This is called
the light ring or photon sphere – on this surface, light can orbit the central object (assuming that this
central object is smaller than 3M )! However, unlike the massive case, such orbits are always unstable.
For the bending of light, we are interested in unbounded orbits, so the energy satisfies

\[ 0 < \frac{E^2}{2} < V(3M) = \frac{\Omega^2}{54M^2}. \]

As before, it is useful to set u = M/r and use ϕ as a parameter along the curve instead of the affine
parameter. Then we obtain the equation

\[ \frac{d^2 u}{d\phi^2} + u - 3u^2 = 0. \]
In the Newtonian theory the third term is absent. So, in the Newtonian theory, the solutions are
\[ u = \frac{M}{b} \sin(\phi - \phi_0), \]
where b and ϕ0 are constants. This equation can be rewritten

r sin(ϕ − ϕ0 ) = b,

see figure 6.7. There is no gravitational deflection in the Newtonian theory! The constant b is called the
impact parameter: in the Newtonian theory, it measures the closest distance of the curve to the origin.
Now, let’s reintroduce the quadratic term, which is the correction from GR. We’ll consider large
impact parameters (so the light ray passes far from the central region), so we’ll set b = Bϵ⁻¹. We’ll also
set ϕ0 = 0 for simplicity. We write the solution as
\[ u = \epsilon \frac{M}{B} \sin\phi + \epsilon^2 v(\phi), \]
and we expand in powers of ϵ. The equation for u gives
\[ \epsilon^2 \left( \frac{d^2 v}{d\phi^2} + v - 3\frac{M^2}{B^2} \sin^2\phi \right) + O(\epsilon^3) = 0. \]

Ignoring higher order terms, the general solution to this equation is

\[ v = \alpha \sin\phi + \beta \cos\phi + \frac{M^2}{B^2} (1 + \cos\phi)^2, \]
for some constants α and β. For a particle coming in ‘from the left’ (see figure 6.7), both the perturbation
and its derivative should vanish for ϕ = π. These conditions give α = β = 0.
Putting these calculations together, the solution (up to O(ϵ²)) is

\[ u = \epsilon \frac{M}{B} \sin\phi + \epsilon^2 \frac{M^2}{B^2} (1 + \cos\phi)^2 + O(\epsilon^3). \]
Recall that r = M/u. To find the deflection angle, we need to find the value of ϕ such that u = 0, with
ϕ ≤ 0.
Setting ϕ = −ϵ(∆ϕ) and expanding in powers of ϵ, we find that

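\[
\begin{aligned}
0 &= -\epsilon^2 \frac{M}{B} (\Delta\phi) + 4\epsilon^2 \frac{M^2}{B^2} + O(\epsilon^3) \\
\Rightarrow (\Delta\phi) &= 4\frac{M}{B} + O(\epsilon).
\end{aligned}
\]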

So light rays are deflected when they pass near massive objects in general relativity!
This value matches the observations very well. In 1919, after Einstein published GR, two expeditions
were sent out (from the Royal Astronomical Society and the Royal Society) to measure the deflection of
light coming from stars behind the sun during a solar eclipse. They found that the apparent position of
the stars changed when they were behind the sun – in perfect agreement with the prediction we have
just made. This success led to a ‘ticker-tape parade’ for Einstein through New York City – the only such
parade that has ever taken place for a scientist!
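
The leading-order result can be checked in two ways: by evaluating 4M/b for a ray grazing the sun, and by integrating the equation for u numerically. The following is a sketch under stated assumptions: the solar values are rounded standard constants introduced here, the function names are illustrative, and the integrator is a plain fourth-order Runge-Kutta scheme rather than anything used in the notes. The initial data u = 0, du/dϕ = −M/b at ϕ = π correspond to a ray coming in ‘from the left’.

```python
import math

GM_SUN_OVER_C2 = 1476.6   # assumed: solar mass in geometric units, metres
R_SUN = 6.957e8           # assumed: solar radius, metres


def deflection_leading_order(m: float, b: float) -> float:
    """Leading-order deflection angle 4M/b, in radians."""
    return 4.0 * m / b


def deflection_numeric(m_over_b: float, step: float = 1e-4) -> float:
    """Integrate u'' + u - 3u^2 = 0 downwards in phi from phi = pi until
    u returns to zero, and return how far past phi = 0 the ray has turned."""
    if m_over_b >= 1.0 / math.sqrt(27.0):
        raise ValueError("ray does not escape: need b > sqrt(27) M")

    def rhs(u, w):
        # First-order system: u' = w, w' = -u + 3u^2.
        return w, -u + 3.0 * u * u

    phi, u, w, h = math.pi, 0.0, -m_over_b, -step
    while True:
        k1u, k1w = rhs(u, w)
        k2u, k2w = rhs(u + 0.5 * h * k1u, w + 0.5 * h * k1w)
        k3u, k3w = rhs(u + 0.5 * h * k2u, w + 0.5 * h * k2w)
        k4u, k4w = rhs(u + h * k3u, w + h * k3w)
        u_new = u + h * (k1u + 2.0 * k2u + 2.0 * k3u + k4u) / 6.0
        w_new = w + h * (k1w + 2.0 * k2w + 2.0 * k3w + k4w) / 6.0
        if u_new < 0.0:
            # Linear interpolation for the point where u crosses zero.
            fraction = u / (u - u_new)
            return -(phi + fraction * h)
        phi, u, w = phi + h, u_new, w_new


sun_angle = deflection_leading_order(GM_SUN_OVER_C2, R_SUN)
print(f"{math.degrees(sun_angle) * 3600.0:.2f} arcseconds at the solar limb")
print(deflection_numeric(0.01), deflection_leading_order(1.0, 100.0))
```

For M/b = 0.01 the numerical angle is slightly larger than 4M/b, as expected from the neglected higher order terms.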

Figure 6.7: Gravitational deflection of light.

6.6 Black holes and singularities

6.6.1 Coordinates

So far we have resolutely stuck to the region r > 2M . This is fine so long as we are looking at spherically
symmetric stars or planets, which will have some matter that modifies the geometry in some way (which
we don’t have to care about) for small values of r. But what if there is no matter there? The Schwarzschild
solution is still a solution to the vacuum Einstein equations, so sooner or later we have to understand
what’s going on at r = 2M .
Recall that the Schwarzschild metric is
\[ g = -\left(1 - \frac{2M}{r}\right) dt^2 + \left(1 - \frac{2M}{r}\right)^{-1} dr^2 + r^2 \left( d\theta^2 + \sin^2\theta \, d\phi^2 \right). \]

There are several places where something goes funny with this expression for the metric. At r = 2M
the first component vanishes and the second term becomes infinite. At r = 0 all of the components
vanish except for the first one, which becomes infinite. Finally5 , at θ = 0, π the dϕ2 component vanishes.
This last point should give us pause for thought. The metric degenerates on the axis θ = 0, π,
but this doesn’t mean that there is some kind of physical singularity there – in fact, exactly the same
thing happens in flat space when it’s written in spherical polar coordinates! What’s happening at the
poles is not that there is something wrong with the metric, but that there is something wrong with the
coordinates. If we change to different coordinates – for example, changing to polar coordinates with a
different pole, or to rectangular coordinates – then these points appear totally normal.
Could something similar be happening at the other places where the metric is problematic, at r = 2M
and at r = 0? Well, maybe. We can obtain hints of what might be going on by looking at scalar
quantities, since these are invariant under coordinate transformations.
If we want to look at scalar quantities constructed out of the metric, then we cannot take contractions
of the metric itself: (g^{-1})^{µν} g_{µν} = 4, which doesn’t tell us anything. As far as first derivatives of the
metric go, these are encoded in the Christoffel symbols – but these are not tensors. Our only real choice
is to look at the curvature. The scalar curvature R will not work, since the Schwarzschild metric is a
solution of the Einstein vacuum equations: contracting these equations using the metric gives
\[ R_{\mu\nu} - \frac{1}{2} R g_{\mu\nu} = 0 \Rightarrow R - 2R = 0 \Rightarrow R = 0. \]
For this reason, the Einstein vacuum equations are equivalent to Rµν = 0.
There is a scalar that can be built out of the curvature tensor that can be useful for our purposes,
called the Kretschmann scalar R_{µνρσ} R^{µνρσ}. In the Schwarzschild metric, this has the value
\[ R_{\mu\nu\rho\sigma} R^{\mu\nu\rho\sigma} = \frac{48M^2}{r^6}. \]
So in a coordinate-independent sense, the curvature is finite at r = 2M but infinite at r = 0. This
suggests that r = 2M might be an ordinary surface in spacetime – not a singularity – but r = 0 might
represent a real singularity.
We should not be too hasty and immediately conclude that r = 2M is a ‘coordinate singularity’
while r = 0 is a physical singularity, simply on the basis of the Kretschmann scalar (even though this
will actually turn out to be the case!). There are some interesting subtleties here, that we won’t be able
to fully explore, but I will give some indication of the issues.
First, suppose that the curvature blows up in some coordinate-independent way. Does this necessarily
mean that spacetime comes to an end? Remember that the Einstein equations are a second-order system
of PDEs, so we might think that, to make sense of the Einstein equations, the curvature must be finite.
However, if you’ve ever studied PDEs you might have come across the notion of a weak solution, which is
a ‘solution’ of a PDE with less regularity than might be expected - a famous example is a shock wave in
a fluid. It turns out that the Einstein equations can be made sense of in situations where the curvature
is infinite (so-called impulsive gravitational waves), so this doesn’t necessarily signal a singularity.
On the other hand, there are situations where the curvature is finite, and indeed nothing at all unusual
happens locally, but for global reasons the solution to the Einstein equations cannot be continued past
some surface in spacetime. The most famous such example is called a Cauchy horizon – these occur
inside rotating black holes, and are connected with an important unproved conjecture in GR called
strong cosmic censorship. Unfortunately, these issues also lie beyond the scope of this course.
Returning to the Schwarzschild metric, we have seen that the Kretschmann scalar suggests that the
surface r = 2M might be simply a coordinate singularity. To show that this is in fact the case, we need
to transform to some different coordinates which ‘pass through’ the surface r = 2M .
5 Technically, there is also something unusual going on at ϕ = 0, 2π. Since we are supposed to cover a manifold by open
sets which are then mapped onto Rn , these points are not covered by our chart.

First, let’s examine the structure of the light cones near r = 2M . Since we’re in spherical symmetry,
we’ll look only at the (t, r) plane, and we’ll look at radially ingoing and outgoing null curves.
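A null vector X in the (t, r) plane satisfies
\[
\begin{aligned}
-\left(1 - \frac{2M}{r}\right) (X^t)^2 + \left(1 - \frac{2M}{r}\right)^{-1} (X^r)^2 &= 0 \\
\Rightarrow X^r &= \pm \left(1 - \frac{2M}{r}\right) X^t,
\end{aligned}
\]
and a null curve passing through the point (t0 , r0 ) is given by
\[
\begin{aligned}
\frac{dr}{dt} &= \pm \left(1 - \frac{2M}{r}\right) \\
\Rightarrow t - t_0 &= \pm \left( r - r_0 + 2M \log\left(\frac{r - 2M}{r_0 - 2M}\right) \right),
\end{aligned}
\]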
so as r → 2M , the null cones are “squeezed together” (see figure 6.8). It looks as though an ingoing null
curve will never reach the surface r = 2M , but is this really true? Or is it an artefact of the coordinates?

Figure 6.8: Light cones and radial light rays in the Schwarzschild spacetime, drawn in “Schwarzschild
coordinates” (t, r, θ, ϕ).
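
The apparent failure of ingoing light rays to reach r = 2M can be made quantitative using the formula for null curves above, taking the minus sign for ingoing rays. This is a minimal sketch in geometric units; the function name `coordinate_time_ingoing` is an illustrative choice.

```python
import math


def coordinate_time_ingoing(r0: float, r: float, m: float = 1.0) -> float:
    """Coordinate time t - t0 taken by an ingoing radial null ray
    to travel from radius r0 down to radius r, with 2m < r <= r0."""
    if not (2.0 * m < r <= r0):
        raise ValueError("need 2M < r <= r0")
    return (r0 - r) + 2.0 * m * math.log((r0 - 2.0 * m) / (r - 2.0 * m))


# Starting from r0 = 10M, the elapsed coordinate time grows logarithmically
# without bound as the target radius approaches 2M.
for eps in (1e-1, 1e-3, 1e-6, 1e-9):
    print(eps, coordinate_time_ingoing(10.0, 2.0 + eps))
```

Each factor of 1000 in closeness to r = 2M costs roughly the same extra amount of coordinate time, which is the logarithmic divergence responsible for the squeezing of the light cones in these coordinates.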

This suggests that we should try ‘straightening out’ the null cones. This can be achieved by
changing from the coordinate t to a coordinate v which is constant on the ingoing null curves. With this
in mind, we define
\[
\begin{aligned}
r^* &:= r + 2M \log\left(\frac{r - 2M}{2M}\right) \\
v &:= t + r^*.
\end{aligned}
\]

It is easy to check that v is constant along ingoing null geodesics (exercise). Then we have
\[
\begin{aligned}
dv &= dt + dr^* \\
   &= dt + \left(1 - \frac{2M}{r}\right)^{-1} dr.
\end{aligned}
\]
r

Now, if we write the metric in coordinates (v, r, θ, ϕ) it takes the form


\[ g = -\left(1 - \frac{2M}{r}\right) dv^2 + 2\,dv\,dr + r^2 \left( d\theta^2 + \sin^2\theta \, d\phi^2 \right). \]
r

The metric is no longer diagonal due to the term 2dvdr. The first term still vanishes at r = 2M so
we might be tempted to say that the metric degenerates here. However, the matrix gab is not degenerate:
it is still an invertible matrix, so it has maximal rank. To check this, we calculate
( )    
1 − 2Mr 1 0 0 0 ( 1 ) 0 0 1 0 0 0
 0    0 1 0 0
 1 1 − r
2M
 1 0 0 0 0 = 
 0 0 r2 0  0 0 r −2
0  0 0 1 0 ,
0 0 0 r2 sin2 θ 0 0 0 r−2 (sin θ)−2 0 0 0 1
so the inverse metric is
( )
−1 −1 ab 2M
g = (g ) ∂a ∂b = 2∂v ∂r + 1 − (∂r )2 + r−2 (∂θ )2 + r−2 (sin θ)−2 (∂ϕ )2 .
r
So the components of g and g −1 are both finite at r = 2M (except for the usual degeneracy at θ = 0, π
from polar coordinates)!
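
The regularity of the metric at r = 2M in these coordinates can also be confirmed numerically, by multiplying the matrix of metric components by its claimed inverse. This is a minimal sketch using plain nested lists, with coordinates ordered as (v, r, θ, ϕ); the function names are illustrative.

```python
import math


def ef_metric(r: float, theta: float, m: float = 1.0):
    """Components of g in ingoing Eddington-Finkelstein coordinates."""
    f = 1.0 - 2.0 * m / r
    return [[-f, 1.0, 0.0, 0.0],
            [1.0, 0.0, 0.0, 0.0],
            [0.0, 0.0, r**2, 0.0],
            [0.0, 0.0, 0.0, (r * math.sin(theta))**2]]


def ef_inverse_metric(r: float, theta: float, m: float = 1.0):
    """Components of the inverse metric in the same coordinates."""
    f = 1.0 - 2.0 * m / r
    return [[0.0, 1.0, 0.0, 0.0],
            [1.0, f, 0.0, 0.0],
            [0.0, 0.0, r**-2, 0.0],
            [0.0, 0.0, 0.0, (r * math.sin(theta))**-2]]


def matmul(a, b):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


# On the surface r = 2M (with M = 1), away from the poles,
# the product is the identity matrix.
for row in matmul(ef_metric(2.0, 1.0), ef_inverse_metric(2.0, 1.0)):
    print(row)
```

The same check works for r < 2M, which is consistent with these coordinates extending into the interior region.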
The coordinates (v, r, θ, ϕ) are called ingoing Eddington-Finkelstein coordinates (there are also outgoing
Eddington-Finkelstein coordinates (u, r, θ, ϕ), where u = t − r∗ ). In these coordinates, the ingoing
null curves are simply given by v = constant, while the outgoing radial null curves (in the region r > 2M )
are given by
\[
\begin{aligned}
v - v_0 &= 2 \left(r^* - r_0^*\right) \\
        &= 2 \left( r - r_0 + 2M \log\left(\frac{r - 2M}{r_0 - 2M}\right) \right).
\end{aligned}
\]
Note that, as r → ∞, v ∼ t + r. Also, in these coordinates, there is nothing stopping us from considering
the ‘interior’ region 0 < r < 2M - the metric is perfectly regular at r = 2M . You can also check that the
curve given by r = 2M is itself a null curve (exercise). In fact, the surface r = 2M acts as a one-way
membrane: you can pass through it from the exterior r > 2M to the interior r < 2M , but there are no
causal curves going in the other direction! See figure 6.9.
We can also consider outgoing Eddington-Finkelstein coordinates (u, r, θ, ϕ). In these coordinates,
the surface r = 2M can also be seen to be a null surface, but this time causal curves cross it in
the opposite direction: you can pass from the interior r < 2M to the exterior r > 2M , but you can’t
enter the interior! The reason for this apparent discrepancy is that the ingoing and outgoing Eddington-
Finkelstein coordinates cover different parts of the manifold (see figure 6.10). Outgoing Eddington-
Finkelstein coordinates cover a different ‘interior’ region 0 < r < 2M .
We can also find some coordinates which cover the original region (r > 2M ) as well as the two extra
regions covered by ingoing and outgoing Eddington-Finkelstein coordinates. These are called Kruskal
coordinates (or Kruskal-Szekeres coordinates), and are defined by
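\[
\begin{aligned}
U &= -e^{-\frac{u}{4M}} \\
V &= e^{\frac{v}{4M}}.
\end{aligned}
\]
Then the metric takes the form
\[ g = -\frac{32M^3}{r} e^{-\frac{r}{2M}} \, dU \, dV + r^2 \left( d\theta^2 + \sin^2\theta \, d\phi^2 \right) \]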
r

Figure 6.9: Light cones and radial light rays in the Schwarzschild spacetime, drawn in ingoing Eddington-
Finkelstein coordinates (v, r, θ, ϕ).

where here r is defined implicitly in terms of U and V by the relationship

\[ \log(-UV) = \frac{r^*}{2M}. \]

This metric is regular at r = 2M , which is given by U = 0 or V = 0. The original region that
we started in is the region U < 0, V > 0. We can also clearly extend to positive values of U and/or
negative values of V . The region V > 0, U > 0 is the same as the interior region covered by the ingoing
Eddington-Finkelstein coordinates, while the region U < 0, V < 0 is the same as the interior region
covered by outgoing Eddington-Finkelstein coordinates. The region U > 0, V < 0 is new: it turns out
that this region is isometric to the original exterior region r > 2M !
The full spacetime, including all four regions, is sometimes called the maximally extended Schwarzschild
spacetime.
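
The map from Schwarzschild coordinates in the original exterior region to Kruskal coordinates can be written out explicitly from the definitions of r∗, u, v, U and V above. The following is a minimal sketch in geometric units, valid only for r > 2M; the function names `tortoise` and `kruskal_from_schwarzschild` are illustrative choices.

```python
import math


def tortoise(r: float, m: float = 1.0) -> float:
    """The coordinate r* = r + 2M log((r - 2M) / 2M), defined for r > 2M."""
    return r + 2.0 * m * math.log((r - 2.0 * m) / (2.0 * m))


def kruskal_from_schwarzschild(t: float, r: float, m: float = 1.0):
    """Map exterior Schwarzschild coordinates (t, r) to Kruskal (U, V)."""
    r_star = tortoise(r, m)
    u = t - r_star            # outgoing null coordinate
    v = t + r_star            # ingoing null coordinate
    return -math.exp(-u / (4.0 * m)), math.exp(v / (4.0 * m))


U, V = kruskal_from_schwarzschild(1.5, 3.0)
print(U, V)                                      # U < 0 and V > 0
print(math.log(-U * V), tortoise(3.0) / 2.0)     # the implicit relation for r

# Points close to r = 2M are mapped close to the surface UV = 0.
U_h, V_h = kruskal_from_schwarzschild(0.0, 2.0 + 1e-9)
print(-U_h * V_h)
```

The product −UV depends only on r, not on t, and tends to zero as r → 2M, in agreement with the statement that the surface r = 2M is given by U = 0 or V = 0.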
Often, rather than U and V , Kruskal coordinates are defined to be the coordinates (T, X, θ, ϕ), where
\[
\begin{aligned}
T &= \frac{1}{2} (U + V) \\
X &= \frac{1}{2} (V - U).
\end{aligned}
\]
Exercise: write out the Schwarzschild solution in these coordinates. You may make use of the function
r, which can be implicitly defined in terms of T and X.
Note the following useful fact: in each of the four regions (I), (II), (III) and (IV), the metric can
be put back into coordinates that look like the original Schwarzschild coordinates. However, note that
these are all different coordinate systems: the original coordinates only cover region (I), and none of
these coordinates cover the surface r = 2M (given in Kruskal coordinates by U V = 0). In region (II),
for example, we can define a coordinate system (t(II) , r(II) , θ(II) , ϕ(II) ), which can be obtained from the

Figure 6.10: The maximally extended Schwarzschild spacetime. The original Schwarzschild coordinates
only cover region (I). Ingoing Eddington-Finkelstein coordinates cover the regions (I) and (II), while
outgoing Eddington-Finkelstein coordinates cover regions (I) and (III). Finally, Kruskal–Szekeres coor-
dinates cover regions (I), (II), (III) and (IV).
Regions (I) and (IV) are asymptotically flat regions: they include regions of arbitrarily large r, where
the metric looks like the Minkowski metric. Region (II) is the black hole region: causal worldlines can
enter this region but never leave it. Region (III) is the white hole region: causal worldlines can leave
this region but never enter it.

ingoing Eddington-Finkelstein coordinates (v, r, θ, ϕ) by setting


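\[
\begin{aligned}
r^*_{(\mathrm{II})} &:= r + 2M \log\left(\frac{2M - r}{2M}\right) \\
t_{(\mathrm{II})} &:= v - r^*_{(\mathrm{II})} \\
r_{(\mathrm{II})} &:= r \\
\theta_{(\mathrm{II})} &:= \theta \\
\phi_{(\mathrm{II})} &:= \phi
\end{aligned}
\]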

then the metric takes the form


\[ g = \left(1 - \frac{2M}{r_{(\mathrm{II})}}\right)^{-1} dr_{(\mathrm{II})}^2 - \left(1 - \frac{2M}{r_{(\mathrm{II})}}\right) dt_{(\mathrm{II})}^2 + r_{(\mathrm{II})}^2 \left( d\theta_{(\mathrm{II})}^2 + \sin^2\theta_{(\mathrm{II})} \, d\phi_{(\mathrm{II})}^2 \right). \]
Note that, since r < 2M in this region, the coefficient of dr²(II) is negative, while the coefficient of dt²(II)
is positive! So, in this region, r(II) is a ‘time coordinate’ and t(II) is a ‘space coordinate’. Actually, r(II)
decreases towards the future in this region.

6.6.2 Interpreting the maximally extended Schwarzschild spacetime

Throughout the entire extended Schwarzschild spacetime, the vacuum Einstein equations Rµν = 0 hold.
Hence this is a vacuum solution to the Einstein equations. You will sometimes hear people say things
like ‘there is an infinitely dense point of matter at r = 0 in a black hole’ but this is not true. The ‘point’
r = 0 is not actually a part of the Schwarzschild manifold at all (it only extends to the open region r > 0)
and there is no matter anywhere in sight. What’s more, it has recently been shown that black holes
can (theoretically – not in realistic astrophysical situations) be formed from the collision of gravitational
waves, without any matter taking part!

It is useful to keep figure 6.10 in mind, and to remember that, on this diagram (radial) light rays
travel at 45 degrees. Region (I) is the most familiar region: here r > 2M and, at large distances, the
metric approaches the flat Minkowski metric. Worldlines of observers in this region, which always have
tangent vectors inside the light cones, can always escape to regions of arbitrarily large r.
Region (II) is called the black hole region. Once an observer has crossed the surface r = 2M into
region (II) they are stuck inside this region: no worldlines with tangent vectors inside the future light
cones can leave this region. The surface r = 2M , called the event horizon, acts like a one-way membrane:
once you cross r = 2M there is no turning back! On the other hand, there is no local quantity which
distinguishes this surface: the curvature is finite, and small observers can cross this surface without
noticing anything dramatic.
Once inside the black hole region r decreases along all worldlines, and in fact all observers will reach
r = 0 in a finite affine time (see the example sheet). Indeed, r = 0 is more like a time than a point in
space. However, unlike the surface r = 2M , r = 0 is a genuine singularity: the curvature is infinite here,
and there is no way to extend the manifold⁶ beyond r = 0. Point particles moving on geodesics will
not experience any forces (after all, that is the defining feature of a geodesic) but for a realistic
observer of any finite size, tidal forces become infinite as r → 0, ripping the observer apart.
Because of the singularity at r = 0, the Schwarzschild spacetime is said to be geodesically incomplete.
A geodesically complete manifold is one for which all geodesics can be extended arbitrarily far, so that
their associated affine parameters can take values in (−∞, ∞). Minkowski space is geodesically complete,
while the Schwarzschild spacetime is not.
Finally, we come to regions (III) and (IV). Both of these regions are considered unphysical: in a
realistic black hole formed by the collapse of a star they are ‘covered up’ by the matter (see figure 6.11).
However, for the purposes of calculations it is often helpful to work with the full extended Schwarzschild
geometry. Since this geometry is invariant under time reversal, these two regions must be present.
Region (III) is called the white hole region: it is the time-reversal of the black hole region, and has the
time-reversed properties. Observers can leave this region, but they can never enter it!
Region (IV) is a ‘copy’ of the original region (I): it is also asymptotically flat, and looks like Minkowski
space far away from the event horizon. It is sometimes said to be ‘another universe’. There is no way for
observers to get from region (I) into region (IV), so there is no ‘wormhole’ here!

6 In fact, the manifold cannot be extended in a way such that the metric is even continuous, let alone differentiable. This
was only proved in the last few years!

Figure 6.11: The spacetime of a “realistic” spherically symmetric collapsing star. By Birkhoff’s theorem,
the geometry is exactly described by the Schwarzschild metric outside of the star, but inside the star the
geometry will be modified. In this case it is modified so that there is a regular ‘centre’ at r = 0 running
up the left hand side of the page: unlike the singularity at r = 0 at the top of the page, this line does
not represent anything unusual: this line is like r = 0 in Minkowski space. Note that the star effectively
‘covers up’ regions (III) and (IV), but there are still parts of regions (I) and (II) left intact. In particular,
there is still a black hole region, from which no causal worldlines can escape – outside of the star, this
region is bounded by the surface r = 2M .

Chapter 7

Cosmology

The Cosmos is all that is or ever was or ever will be. Our feeblest contemplations of the
Cosmos stir us — there is a tingling in the spine, a catch in the voice, a faint sensation of
a distant memory, as if we were falling from a great height. We know we are approaching
the greatest of mysteries.

Carl Sagan, Cosmos.

So far we have only seen one solution to the Einstein equations: the Schwarzschild solution, which
is a solution to the vacuum Einstein equations with zero cosmological constant. Remember that this
solution was found by searching for solutions with lots of symmetries: in this case, we looked for spherically
symmetric, static spacetimes¹. This symmetry class is important for astrophysical purposes, in which
many important systems are approximately spherically symmetric, and it also led to the very interesting
phenomenon of black holes.
Another important symmetry class is homogeneous and isotropic spacetimes. Instead of astrophysical
applications, this symmetry class is suitable for studying the entire universe on the largest scales.
Homogeneity
A homogeneous spacetime is one where there is a global function τ , called a time function, with level
sets Στ satisfying:

• The surfaces Στ are spacelike hypersurfaces, i.e. dτ is a timelike covector, g −1 (dτ, dτ ) < 0 (equiv-
alently every curve which lies entirely within Στ is spacelike). By rescaling τ if necessary, we can
assume that g −1 (dτ, dτ ) = −1.
• The surfaces Στ are homogeneous spaces. This means that there is a group G which acts on the
surface Στ transitively (any point can be mapped to any other point by some group element) and
by isometries (the action of the group preserves the metric). Informally, this says that ‘every point
looks like every other point’ (sometimes called the Copernican principle).

(see figure 7.1).

Isotropy
The level sets Στ are also required to be isotropic. This means that

• for each point p ∈ Στ and for each pair of unit tangent vectors X, Y ∈ Tp (Στ ) (that is, tangent
vectors to the submanifold Στ , not spacetime vectors – although such vectors can be considered
spacetime vectors that are tangent to the surface Στ ), there is an isometry mapping X to Y (see
figure 7.1).

1 From Birkhoff’s theorem, we could have looked for solutions to the Einstein equations in spherical symmetry: the only
such solutions turn out to also be static.

By a unit vector, we simply mean a vector X such that g(X, X) = 1. By a vector tangent to the surfaces
Στ , we mean that X(τ ) = 0.
In fact, isotropy implies homogeneity but not vice versa. For example, a torus is homogeneous but
not isotropic.

These definitions also mean that there are special observers, called isotropic observers or comoving
observers, whose worldlines are such that

• The worldlines are timelike.

• For every pair of unit vectors X, Y which are orthogonal to the tangent vector of the worldline,
there is an isometry mapping X to Y .

Note that the tangent vector to these worldlines is simply −(dτ )♯ , or, in abstract index notation, −∂ µ τ .
The − sign is chosen so that these tangents are future-directed.

Figure 7.1: A homogeneous and isotropic spacetime. There are isometries mapping any point on a
surface Στ to any other point on that surface, and also isometries mapping any tangent vector to the
surface Στ to any other tangent vector (these isometries are shown in red). Comoving observers move
along worldlines which are orthogonal to the surfaces Στ .

The worldline of the Earth is, roughly, the worldline of an isotropic observer: on large scales, the
universe looks more or less the same in every direction. This would not be the case for an astronaut
moving past the Earth with some large relative velocity (even if they are also moving on a timelike
geodesic): to them, the universe would look “blueshifted” in front of the spacecraft and “redshifted”
behind it, due to the Doppler effect².
2 In some ways we are almost back to the ‘Atomist’ view of spacetime. But there is a crucial difference: now, the
existence of the special class of observers is not built into the basic structure of spacetime - instead it is due to symmetries
which exist in the particular solution to the equations which we happen to be living in. This is reminiscent of the way in which
altitude is dealt with in the Aristotelian vs. Atomist spacetimes.

Figure 7.2: Comoving and non-comoving observers in a homogeneous and isotropic spacetime.

7.1 Geometry and matter in a homogeneous and isotropic universe

Homogeneity and isotropy imply that the spacetime metric can be written as

\[
g = -d\tau^2 + \big(a(\tau)\big)^2 \, \bar{g}.
\]

Here ḡ is the spatial part of the metric, and a(τ ) is called the scale factor. Furthermore, the metric ḡ is
required to be the metric of a maximally symmetric space. There are actually only three options:

• Flat: the metric ḡ is simply the Euclidean metric in three dimensions.

• Closed: the metric ḡ is the standard metric on the 3-sphere S^3 .

• Open: the metric ḡ is a metric of constant negative curvature.

In each case there are possible topological modifications – e.g. in the flat case, Στ need not be homeomorphic
to R^3 but it could instead be a cylinder R^2 × S^1 or a torus S^1 × S^1 × S^1 .
In all cases, the spatial metric can be written in a universal form, called the Robertson-Walker metric:

\[
\bar{g} = \frac{dr^2}{1 - kr^2} + r^2 \left( d\theta^2 + \sin^2\theta \, d\phi^2 \right),
\]

where k > 0 is a closed universe, k = 0 is a flat universe and k < 0 is an open universe. By rescaling r
and the scale factor a, we can always choose k = 1, 0 or −1 in the closed, flat or open cases respectively
(exercise).

Christoffel symbols and curvature components


This symmetry class makes it relatively easy to calculate the Christoffel symbols and the components
of the Riemann curvature tensor, for the following reasons:

• Scalar fields on each surface Στ must be constant for consistency with homogeneity. So scalar
quantities can only depend on time τ .
• All (co)vector fields on the surfaces Στ must vanish for consistency with isotropy.
• The only (1, 1) tensor fields on the surface Στ consistent with homogeneity and isotropy are those
proportional to the Kronecker delta δ^i_j , and the constant of proportionality can only depend on
time τ .
• The only (0, 2) tensor fields on the surface Στ consistent with homogeneity and isotropy are those
proportional to the spatial metric ḡ_ij , and the constant of proportionality can only depend on time τ .

• The only (2, 0) tensor fields on the surface Στ consistent with homogeneity and isotropy are those
proportional to the inverse spatial metric (ḡ^{−1})^{ij} , and the constant of proportionality can only depend on
time τ .

Recall that we use i, j, k etc. to refer to spatial indices. Using these facts together with the form of the
metric, some fairly tedious calculations lead to the following expressions for the Ricci tensor components:

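\[
\begin{aligned}
R_{00} &= -3\frac{\ddot a}{a} \\
R_{0i} &= 0 \\
R_{ij} &= \left( a\ddot a + 2\dot a^2 + 2k \right) \bar{g}_{ij},
\end{aligned}
\]

where ‘dots’ represent derivatives with respect to τ and ḡ_ij are the components of the spatial part of the metric. From these we can calculate

\[
\begin{aligned}
G_{00} + \Lambda g_{00} &= 3\,\frac{\dot a^2 + k}{a^2} - \Lambda \\
G_{ij} + \Lambda g_{ij} &= \left( -2a\ddot a - \dot a^2 - k + a^2\Lambda \right) \bar{g}_{ij}.
\end{aligned}
\]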

The energy momentum tensor must also respect the symmetries imposed by homogeneity and isotropy.
This means that we can write
\[
\begin{aligned}
T_{00} &:= \rho \\
T_{ij} &:= p\,a^2\,\bar{g}_{ij},
\end{aligned}
\]

where these equations define the ‘density’ and ‘pressure’ – we are not necessarily assuming that the
matter is a fluid.

7.2 The Friedmann equations

The Einstein equations in a homogeneous and isotropic spacetime are called the Friedmann equations.
They are the following system of ODEs, which are derived from the expressions above for the components
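of the Einstein tensor and the energy momentum tensor:

\[
\begin{aligned}
3\,\frac{\dot a^2 + k}{a^2} - \Lambda &= 8\pi\rho \qquad &(7.1) \\
2a\ddot a + \dot a^2 + k - a^2\Lambda &= -8\pi p a^2. \qquad &(7.2)
\end{aligned}
\]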

Sometimes the following equation, following from the two above, is useful:

\[
\frac{\ddot a}{a} = -\frac{4}{3}\pi(\rho + 3p) + \frac{1}{3}\Lambda.
\]
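For readers who want the intermediate step, here is a short sketch of how this acceleration equation follows from (7.1) and (7.2): divide (7.2) by a^2, then use (7.1) to eliminate the combination involving ȧ and k. No assumptions beyond the two Friedmann equations are used.

```latex
\begin{align*}
\text{(7.2)} \;\Rightarrow\;
  2\,\frac{\ddot a}{a} &= -8\pi p + \Lambda - \frac{\dot a^2 + k}{a^2}, \\
\text{(7.1)} \;\Rightarrow\;
  \frac{\dot a^2 + k}{a^2} &= \frac{8\pi\rho + \Lambda}{3}, \\
\text{hence}\quad
  \frac{\ddot a}{a} &= -\frac{4\pi}{3}(\rho + 3p) + \frac{\Lambda}{3}.
\end{align*}
```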

If we supplement these equations with an equation of state, expressing the pressure p as a function
of the density ρ, then this forms a closed system of ODEs, which can be solved as follows: first we write
p in terms of ρ in equation (7.2), and then we substitute for ρ in terms of a and ȧ using equation (7.1).
This will lead to a nonlinear second order ODE for the scale factor a.

An evolution equation for the density ρ

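By differentiating equation (7.1) with respect to τ , and then using equations (7.2) and (7.1), we find

\[
\begin{aligned}
8\pi\dot\rho &= \frac{3\dot a}{a^3}\left( 2a\ddot a - 2\dot a^2 - 2k \right) \\
&= \frac{3\dot a}{a^3}\left( -3\dot a^2 - 3k + a^2\Lambda - 8\pi a^2 p \right) \\
&= -24\pi\,\frac{\dot a}{a}\,(p + \rho),
\end{aligned}
\]

and so we obtain the equation for the derivative of the density

\[
\dot\rho = -3\,\frac{\dot a}{a}\,(p + \rho). \qquad (7.3)
\]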

Equations of state
Often equations of state of the following form are considered:

p = wρ,

where w is a constant. In this case the density evolves as

\[
\frac{\dot\rho}{\rho} = -3(1 + w)\,\frac{\dot a}{a}
\quad\Rightarrow\quad \rho \propto a^{-3(1+w)}.
\]

There are certain particular values of the constant w which have physical meanings:

1. Dust: w = 0, ρ ∝ a^{−3} . In this case the pressure vanishes for any value of the density. This is often
used to model “ordinary” matter – stars, galaxies, dark matter etc. – since on very large scales
there is negligible pressure between these objects. Note that in this case the energy density is inversely
proportional to the volume element: as the universe expands (and a increases), matter simply dilutes.
2. Radiation: w = 1/3, ρ ∝ a^{−4} . If you have studied statistical physics then you will know that
radiation can be treated as a perfect fluid with p = ρ/3. In this case the energy density decreases
both due to the increase in the volume element and due to redshift of the photons (see section 7.3).

3. Dark energy: w = −1, ρ = const. In this case both the pressure and density are constant,
independent of the behaviour of the scale factor a. In fact, in this case the energy-momentum
tensor is just proportional to the metric g (exercise). This allows us to reinterpret the cosmological
constant Λ as a component of “matter” rather than “gravity” – moving it onto the right hand side
of the Einstein equations. However, there is no clear microscopic understanding of the origin of
this “matter”: it is usually thought of as the energy of the vacuum, i.e. the ground state energy
of the quantum fields describing matter. However, cosmological observations place the value of Λ
as almost zero, whereas calculations using quantum field theory predict that it should be around
10^120 times larger³!
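The scaling ρ ∝ a^{−3(1+w)} for the three cases above can be tabulated in a few lines. This is a minimal sketch; the function name `density` and the normalisation ρ = 1 at a = 1 are illustrative assumptions, not part of the notes.

```python
def density(a, w, rho0=1.0):
    """Density at scale factor a for p = w * rho, normalised so density(1, w) = rho0."""
    return rho0 * a ** (-3.0 * (1.0 + w))

EQUATIONS_OF_STATE = {"dust": 0.0, "radiation": 1.0 / 3.0, "dark energy": -1.0}

for name, w in EQUATIONS_OF_STATE.items():
    # Effect on the density of doubling the scale factor.
    print(name, density(2.0, w))
```

Doubling a divides the dust density by 8 and the radiation density by 16, while leaving the dark energy density unchanged.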

7.3 Cosmological redshift and the Hubble constant

Suppose Alice and Bob are both comoving observers. In terms of the coordinates (τ, r, θ, ϕ), suppose
that Alice follows the worldline (τ, 0, 0, 0) and Bob the worldline (τ, rB , θB , ϕB ), where rB , θB and ϕB
are constants. Suppose that Bob sends light signals at regular intervals ∆τB to Alice, who receives them
at time intervals ∆τA .
3 This has been called “the worst theoretical prediction in the history of physics.” - M.P. Hobson, G.P. Efstathiou &
A.N. Lasenby (2006), General Relativity: An introduction for physicists.

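Radial null lines in a cosmological spacetime are null geodesics (this follows from isotropy, but exercise:
check it explicitly). Hence the tangent to an (ingoing) radial null geodesic, parametrised by τ , is

\[
X = \partial_\tau - \frac{\sqrt{1 - kr^2}}{a}\,\partial_r
\]

(note that τ is not an affine parameter along this curve unless a is constant, but this does not affect
the argument below) and the path of a null geodesic is (τ, r(τ ), θ0 , ϕ0 ) where

\[
\frac{dr}{d\tau} = -\frac{\sqrt{1 - kr^2}}{a}
\]

so if a light ray leaves Bob at time τ = τB and arrives at Alice at τ = τA , then

\[
-\int_{r_B}^{0} \frac{dr}{\sqrt{1 - kr^2}} = \int_{\tau_B}^{\tau_A} \frac{d\tau}{a(\tau)}
\]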

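Performing the same calculation for the subsequent signal, and noting that the left hand side is
independent of τ , we have

\[
\int_{\tau_B + \Delta\tau_B}^{\tau_A + \Delta\tau_A} \frac{d\tau}{a(\tau)} = \int_{\tau_B}^{\tau_A} \frac{d\tau}{a(\tau)}
\]

If we suppose that ∆τA and ∆τB are very small compared to τA − τB , then we find that, to leading
order,

\[
\frac{\Delta\tau_A}{a(\tau_A)} = \frac{\Delta\tau_B}{a(\tau_B)}
\]

so the ratio of the received period to the emitted period (equivalently, of the emitted frequency to the
received frequency) is

\[
\frac{\Delta\tau_A}{\Delta\tau_B} = \frac{a(\tau_A)}{a(\tau_B)}
\]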

In an expanding universe, the scale factor a grows over time. Since τA > τB , a(τA ) > a(τB ) and
so ∆τA > ∆τB . So Alice sees the signals at a lower frequency than they are emitted by Bob: this is
cosmological redshift.
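The redshift formula can be checked numerically in a concrete model. The sketch below assumes a flat dust universe with scale factor a(τ) = τ^{2/3} (normalised so that a(1) = 1), for which the integral of dτ/a can be done in closed form; the function names and the emission and reception times are hypothetical choices for illustration.

```python
def a(tau):
    """Scale factor of a flat dust model, normalised so that a(1) = 1."""
    return tau ** (2.0 / 3.0)

def conformal_time(tau):
    """Integral of d(tau)/a(tau) from 0 to tau, for a = tau**(2/3)."""
    return 3.0 * tau ** (1.0 / 3.0)

def arrival_time(tau_emit, delta_eta):
    """Time at which a signal emitted at tau_emit has covered the interval delta_eta."""
    return ((conformal_time(tau_emit) + delta_eta) / 3.0) ** 3

tau_B, tau_A = 1.0, 8.0
# The integral of d(tau)/a between emission and reception is fixed by Bob's radius.
delta_eta = conformal_time(tau_A) - conformal_time(tau_B)

d_tau_B = 1e-6  # small interval between two signals sent by Bob
d_tau_A = arrival_time(tau_B + d_tau_B, delta_eta) - tau_A

print("measured ratio of periods:", d_tau_A / d_tau_B)
print("a(tau_A) / a(tau_B):      ", a(tau_A) / a(tau_B))
```

The two printed numbers agree up to corrections that vanish as the interval between signals shrinks.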
Next suppose that Alice and Bob are close together (relative to the time scale on which a varies).
Then, expanding the expression above, we have

\[
\begin{aligned}
\frac{\Delta\tau_A}{\Delta\tau_B}
&= a(\tau_A)\Big( a(\tau_A) - (\tau_A - \tau_B)\,\dot a(\tau_A) \Big)^{-1} + O\big((\tau_A - \tau_B)^2\big) \\
&= 1 + (\tau_A - \tau_B)\,\frac{\dot a(\tau_A)}{a(\tau_A)} + O\big((\tau_A - \tau_B)^2\big) \\
&= 1 + (\tau_A - \tau_B)\,H(\tau_A) + O\big((\tau_A - \tau_B)^2\big)
\end{aligned}
\]

where H(τ ) := ȧ(τ )/a(τ ) is the “Hubble constant”. Note that the Hubble constant is not actually constant,
but depends on time!
The expression ∆τA /∆τB ≈ 1 + (τA − τB )H is known as Hubble’s law. The quantity (τA − τB ) measures,
in a natural sense, the distance from Bob to Alice: it is the time taken for light to travel from Bob
to Alice. So Hubble’s law says that light from nearby galaxies (assumed to move, roughly, along the
worldlines of comoving observers) is redshifted in a way which scales linearly with the distance to that
galaxy. Hubble’s discovery of this law, with H > 0, provided the first clear evidence in favour of an
expanding universe.
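To see how good the linear approximation in Hubble’s law is, one can compare it with the exact ratio a(τA)/a(τB). The sketch below again assumes the flat dust model a(τ) = τ^{2/3}, for which H = 2/(3τ); the reception time and the light travel times are hypothetical values.

```python
def a(tau):
    """Scale factor of a flat dust model."""
    return tau ** (2.0 / 3.0)

def hubble(tau):
    """H = (da/dtau) / a for a = tau**(2/3)."""
    return 2.0 / (3.0 * tau)

tau_A = 1.0
for light_travel_time in (0.001, 0.01, 0.1):
    tau_B = tau_A - light_travel_time
    exact = a(tau_A) / a(tau_B)
    linear = 1.0 + light_travel_time * hubble(tau_A)
    # The difference should shrink quadratically with the light travel time.
    print(light_travel_time, exact, linear, exact - linear)
```

The error of the linear formula falls roughly by a factor of 100 each time the light travel time falls by a factor of 10, as expected for a second-order remainder.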

7.4 Solutions to the Friedmann equations

So far we have derived the Friedmann equations, discussed a few different models for matter, and shown
that, in the case of an expanding universe, comoving observers will observe cosmological redshift. Our
last task will be to actually solve the Friedmann equations, and to examine some of the properties of the
solutions.

7.4.1 The Einstein static universe

The original motivation for introducing the cosmological constant Λ was to find a cosmological model
which is static, so ȧ = ä = 0. This solution is also supposed to model the universe now, when the matter
is well approximated by ‘dust’, so we have p = 0. Substituting into the Friedmann equations (7.2), (7.1)
we obtain
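\[
\begin{aligned}
\frac{3k}{a^2} - \Lambda &= 8\pi\rho \\
k - a^2\Lambda &= 0
\end{aligned}
\]
from which it follows that
\[
k = a^2\Lambda, \qquad \Lambda = 4\pi\rho.
\]
Since we are dealing with ordinary matter we have ρ > 0, so Λ > 0 and k > 0, and hence we can set k = +1. This
means that the solution is given, in terms of the energy density ρ, as
\[
k = 1, \qquad \Lambda = 4\pi\rho, \qquad a = \frac{1}{\sqrt{4\pi\rho}}.
\]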

Unfortunately, this solution is dynamically unstable: small perturbations lead to a solution which
either rapidly expands or collapses (see example sheet 4).
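As a consistency check, the static solution can be substituted back into equations (7.1) and (7.2). This is a minimal sketch in units with G = c = 1; the function names and the value of ρ are hypothetical.

```python
import math

def einstein_static(rho):
    """Return (Lambda, a) for the static dust solution with k = +1."""
    lam = 4.0 * math.pi * rho
    a = 1.0 / math.sqrt(lam)
    return lam, a

def friedmann_residuals(a, a_dot, a_ddot, k, lam, rho, p):
    """Left hand side minus right hand side of equations (7.1) and (7.2)."""
    r1 = 3.0 * (a_dot**2 + k) / a**2 - lam - 8.0 * math.pi * rho
    r2 = 2.0 * a * a_ddot + a_dot**2 + k - a**2 * lam + 8.0 * math.pi * p * a**2
    return r1, r2

rho = 0.05  # hypothetical dust density
lam, a = einstein_static(rho)
res1, res2 = friedmann_residuals(a, 0.0, 0.0, 1.0, lam, rho, 0.0)
print(lam, a, res1, res2)
```

Both residuals vanish up to floating point rounding, confirming that the pair (Λ, a) solves the static equations.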

7.4.2 Matter dominated universes with Λ = 0

Observations now show conclusively that the universe is not static, but expanding, so let’s look for such
a solution. Again, we’ll look for a “matter dominated universe”, where the matter is well-approximated
by dust, and for simplicity we’ll take the cosmological constant Λ = 0.
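In this case the Friedmann equations are
\[
\begin{aligned}
3\,\frac{\dot a^2 + k}{a^2} &= 8\pi\rho \\
2a\ddot a + \dot a^2 + k &= 0
\end{aligned}
\]
Using this second equation, we find that
\[
\frac{d}{d\tau}\left( a\dot a^2 + ka \right) = \dot a\left( 2a\ddot a + \dot a^2 + k \right) = 0
\]
and so
\[
\dot a^2 = \frac{C}{a} - k
\]
where C is constant. We can identify this constant using the other Friedmann equation:
\[
C = \frac{8\pi}{3}\rho a^3
\]
Recall that, in the case of dust, ρ ∝ a^{−3} so C is indeed a constant.
First, let’s consider the flat case k = 0. Then we have
\[
\begin{aligned}
\frac{d}{d\tau}\left( a^{3/2} \right) &= \frac{3}{2} C^{1/2} \\
\Rightarrow\quad a &= \left( \frac{3}{2} C^{1/2} \right)^{2/3} \tau^{2/3} \\
\rho &= \frac{1}{6\pi}\,\tau^{-2}
\end{aligned}
\]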
where we have chosen the solution where ȧ > 0 and where a = 0 at τ = 0. This universe expands from
a ‘big bang’ at τ = 0, where the scale factor goes to zero and the density becomes infinite as τ −2 .
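The flat dust solution can be verified numerically against ȧ^2 = C/a and against the definition of C. The following sketch uses a central finite difference for ȧ; the value of C, the step size and the function names are illustrative assumptions.

```python
import math

C = 2.0  # hypothetical value of the constant C = (8*pi/3) * rho * a**3

def a(tau):
    """Flat (k = 0) dust solution with a = 0 at tau = 0."""
    return (1.5 * math.sqrt(C) * tau) ** (2.0 / 3.0)

def rho(tau):
    """Density of the flat dust solution."""
    return 1.0 / (6.0 * math.pi * tau**2)

def a_dot(tau, h=1e-6):
    """Central finite-difference approximation to da/dtau."""
    return (a(tau + h) - a(tau - h)) / (2.0 * h)

for tau in (0.5, 1.0, 4.0):
    lhs = a_dot(tau) ** 2
    rhs = C / a(tau)
    recovered_C = (8.0 * math.pi / 3.0) * rho(tau) * a(tau) ** 3
    print(tau, lhs, rhs, recovered_C)
```

At each sample time the first two columns agree, and the last column reproduces the chosen value of C.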
Note that there is no “place” where the big bang happens: the solution remains homogeneous and
isotropic at all times. Rather, the scale factor goes to zero, meaning that the proper distance on a surface
Στ between any two comoving observers goes to zero as τ → 0, and the density ρ → ∞ everywhere as
τ → 0. Nor is there a time ‘before’ the big bang: the spacetime as a whole is a solution to the Einstein
equations for all τ > 0, and spacetime terminates in a singularity at τ = 0.
What about closed (k = 1) or open (k = −1) universes? Let’s take k = 1 first. Then we have

\[
\dot a^2 = \frac{C}{a} - 1
\]
First we substitute a = Cb^2 . Then we have
\[
\frac{b^2}{\sqrt{1 - b^2}}\,\frac{db}{d\tau} = \pm\frac{1}{2C}
\]

Now substituting b = sin u and doing the integral and a bit of algebra, we find
\[
C\left( \arcsin\sqrt{\frac{a}{C}} - \sqrt{\frac{a}{C}}\sqrt{1 - \frac{a}{C}} \right) = \pm\tau + \text{const.}
\]

We can choose the constant to be zero, so that again a = 0 when τ = 0. In this case we should also
choose the + sign, so that for small positive values of τ , the scale factor a is positive.
For small values of τ , we find (exercise) to leading order
\[
a = \left( \frac{3}{2} \right)^{2/3} C^{1/3}\,\tau^{2/3}
\]

so that the scale factor approaches zero as τ^{2/3} . As before, ρ ∼ a^{−3} so the energy density becomes infinite
as τ^{−2} .
There is an important difference in this case, however: at the time τ = Cπ we again have a = 0. So,
in the closed case, there is a big bang followed by a big crunch, when the scale factor decreases to zero
in the future and the universe recollapses!
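The closed solution is easiest to explore in parametric form: with the substitutions a = Cb^2 and b = sin u used above (taking the + sign and zero integration constant), one has a = C sin^2 u and τ = C(u − sin u cos u). The sketch below checks ȧ^2 = C/a − 1 along this curve and evaluates the recollapse time; the value of C and the function names are hypothetical.

```python
import math

C = 1.0  # hypothetical value of the constant C

def a_of_u(u):
    """Scale factor along the parametrised closed dust solution."""
    return C * math.sin(u) ** 2

def tau_of_u(u):
    """Comoving time along the parametrised closed dust solution."""
    return C * (u - math.sin(u) * math.cos(u))

def a_dot(u, h=1e-6):
    """da/dtau along the curve, from central differences in the parameter u."""
    da = a_of_u(u + h) - a_of_u(u - h)
    dtau = tau_of_u(u + h) - tau_of_u(u - h)
    return da / dtau

for u in (0.3, 1.0, 2.5):
    print(tau_of_u(u), a_of_u(u), a_dot(u) ** 2, C / a_of_u(u) - 1.0)

# At u = pi the scale factor returns to zero: the big crunch.
print(tau_of_u(math.pi), a_of_u(math.pi))
```

The parameter u runs from 0 (big bang) through π/2 (maximum size a = C) to π (big crunch at τ = Cπ), which is why the parametric form is convenient beyond the range where the arcsin formula applies directly.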
Finally, consider the open case k = −1. Following similar algebra as in the closed case, we obtain the
solution
\[
C\left( \sqrt{\frac{a}{C}}\sqrt{1 + \frac{a}{C}} - \operatorname{arsinh}\sqrt{\frac{a}{C}} \right) = \tau
\]
As τ → 0 this solution behaves in the same way as the closed and flat cases, but this time, for large
values of τ we have a ∼ τ , so these universes grow faster than their flat counterparts.

7.4.3 Radiation dominated universes

So far we have only looked at “matter dominated” universes, where the matter content is well approxi-
mated by dust, or pressure-free fluid. We can also consider the case where the primary matter content
of the universe is described by radiation, so that the pressure is given by p = ρ/3 instead of p = 0.
Again, setting Λ = 0 and setting
\[
B = \frac{8\pi}{3}\,a^4\rho
\]
we find that B is a constant (so ρ ∼ a^{−4} ).

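Following similar calculations as in the matter dominated case, the solutions are found to be (exercise)
\[
\begin{aligned}
k = 0 \quad&\Rightarrow\quad a = \sqrt{2}\,B^{1/4}\,\tau^{1/2} \\
k = 1 \quad&\Rightarrow\quad a = \sqrt{B}\,\sqrt{1 - \left( 1 - \frac{\tau}{\sqrt{B}} \right)^2} \\
k = -1 \quad&\Rightarrow\quad a = \sqrt{B}\,\sqrt{\left( 1 + \frac{\tau}{\sqrt{B}} \right)^2 - 1}
\end{aligned}
\]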
Qualitatively these solutions behave similarly to the matter dominated scenarios: in the flat and open
cases the universe just goes on expanding, while in the closed case the universe expands initially and
subsequently contracts, ending in a big crunch. The rates, however, are different from those in the matter
dominated cases.
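These three solutions can be checked against the first Friedmann equation, which for radiation with Λ = 0 reduces to ȧ^2 = B/a^2 − k. The sketch below uses a finite difference for ȧ; the value of B, the sample time and the function names are hypothetical.

```python
import math

B = 4.0  # hypothetical value of B = (8*pi/3) * a**4 * rho

def a_radiation(tau, k):
    """Radiation dominated solutions for k = 0, 1, -1 (with a = 0 at tau = 0)."""
    root_b = math.sqrt(B)
    if k == 0:
        return math.sqrt(2.0) * B**0.25 * math.sqrt(tau)
    if k == 1:
        return root_b * math.sqrt(1.0 - (1.0 - tau / root_b) ** 2)
    if k == -1:
        return root_b * math.sqrt((1.0 + tau / root_b) ** 2 - 1.0)
    raise ValueError("k must be -1, 0 or 1")

def residual(tau, k, h=1e-6):
    """(da/dtau)**2 - (B / a**2 - k), with da/dtau from central differences."""
    a_dot = (a_radiation(tau + h, k) - a_radiation(tau - h, k)) / (2.0 * h)
    return a_dot**2 - (B / a_radiation(tau, k) ** 2 - k)

for k in (0, 1, -1):
    print(k, a_radiation(1.0, k), residual(1.0, k))
```

In the closed case the formula is only meaningful for 0 ≤ τ ≤ 2√B, the interval between the big bang and the big crunch.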

7.4.4 The de Sitter spacetime

Now we’ll add back in the cosmological constant Λ, but for simplicity we’ll now consider the case in
which there is no matter (or, if you prefer, the only matter content of the universe is “dark energy”).
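In this case, the Friedmann equations are
\[
\begin{aligned}
3\,\frac{\dot a^2 + k}{a^2} - \Lambda &= 0 \\
2a\ddot a + \dot a^2 + k - a^2\Lambda &= 0
\end{aligned}
\]
Multiplying the first equation by a^3 and then taking a derivative with respect to τ , we see that (assuming
a > 0, and ȧ ≠ 0 except at isolated points) any smooth solution a(τ ) to the first equation will automatically
solve the second equation (exercise). So we can concentrate entirely on the first equation.
The solutions to this equation are
\[
\begin{aligned}
k = 0 \quad&\Rightarrow\quad a \propto e^{\pm\sqrt{\Lambda/3}\,\tau} \\
k = 1 \quad&\Rightarrow\quad a = \sqrt{\frac{3}{\Lambda}}\,\cosh\left( \sqrt{\frac{\Lambda}{3}}\,\tau \right) \\
k = -1 \quad&\Rightarrow\quad a = \sqrt{\frac{3}{\Lambda}}\,\sinh\left( \sqrt{\frac{\Lambda}{3}}\,\tau \right)
\end{aligned}
\]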
Λ 3

It appears that we have three different solutions as before. But in fact, all three solutions can be
shown to represent different parts of the same spacetime – called de Sitter spacetime. Moreover, this
spacetime has no ‘big bang’ – in fact, in the right coordinates, it can be seen that this spacetime is
actually static, so it doesn’t evolve over time. The surface τ = 0 in the open case (k = −1), where
the metric appears to be singular (since a = 0) actually just represents a coordinate singularity, like
the surface r = 2M in Schwarzschild. Note that, in this case, ρ ≡ 0 so we cannot argue for a ‘physical
singularity’ by saying that the energy density is infinite, as we could before!
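Each of the three scale factors can be checked against the first Friedmann equation with ρ = 0. The sketch below does this with a finite difference for ȧ; the value of Λ, the sample time, the normalisation of the k = 0 solution and the function names are hypothetical choices.

```python
import math

LAM = 3.0  # hypothetical value of the cosmological constant

def a_de_sitter(tau, k):
    """The three solutions quoted above (expanding branch, a(0) = 1, for k = 0)."""
    rate = math.sqrt(LAM / 3.0)
    if k == 0:
        return math.exp(rate * tau)
    if k == 1:
        return math.cosh(rate * tau) / rate
    if k == -1:
        return math.sinh(rate * tau) / rate
    raise ValueError("k must be -1, 0 or 1")

def residual(tau, k, h=1e-6):
    """Value of 3 * (a_dot**2 + k) / a**2 - Lambda, which should vanish."""
    a_dot = (a_de_sitter(tau + h, k) - a_de_sitter(tau - h, k)) / (2.0 * h)
    return 3.0 * (a_dot**2 + k) / a_de_sitter(tau, k) ** 2 - LAM

for k in (0, 1, -1):
    print(k, residual(0.7, k))
```

All three residuals are zero up to the finite-difference error, even though the three scale factors look very different as functions of τ.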

7.4.5 Mixture models

A realistic model of the universe contains both ‘dust’ and ‘radiation’, as well as a cosmological constant
Λ > 0 (albeit with Λ almost equal to zero!). In this case we can try to solve the Friedmann equations
by including two different components of the fluid, as well as a cosmological constant.
Recall the equation that we derived for the evolution of the density:
\[
\frac{\dot\rho}{\rho} = -3(1 + w)\,\frac{\dot a}{a}
\]

from which it follows that ρ ∝ a^{−3(1+w)} . Ultimately, this equation is derived from the conservation of the
energy momentum tensor for the fluid, ∇_µ T^{µν} = 0. If we have several fluids which are non-interacting,
then each of their energy-momentum tensors is separately conserved, and so for each fluid we will have

\[
\rho_{(f)} \propto a^{-3(1 + w_{(f)})}
\]

where the subscript (f ) labels the different fluids.


Although in general the equations cannot be solved explicitly for a general mixture of this sort, we
can still obtain the following picture of what’s going on:

1. At early times, when a is very small, the energy density of radiation goes as ρ(rad) ∼ a−4 , and this
is the dominant component. So at early times the universe is well approximated by a radiation
dominated solution.
2. The energy density of dust behaves as ρ(dust) ∼ a−3 . So as a grows larger, eventually the dust com-
ponent dominates over the radiation component, and the universe behaves as a matter dominated
solution.
3. The energy density of ‘dark energy’ (or the cosmological constant) is constant, but very small.
Eventually, if the universe continues expanding, the scale factor a becomes so large that dark
energy dominates over the matter component. At this point, the universe becomes approximately
de Sitter.
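The three eras in the list above can be illustrated by comparing the individual densities as functions of a. The sketch below uses made-up present-day values (at a = 1) purely for illustration; they are not fitted to observations, and the function names are hypothetical.

```python
def densities(a, rho_rad0, rho_dust0, rho_lambda):
    """Densities of radiation, dust and dark energy at scale factor a,
    given their values at a = 1."""
    return rho_rad0 * a**-4, rho_dust0 * a**-3, rho_lambda

def dominant(a, rho_rad0, rho_dust0, rho_lambda):
    """Name of the component with the largest density at scale factor a."""
    names = ("radiation", "dust", "dark energy")
    values = densities(a, rho_rad0, rho_dust0, rho_lambda)
    return names[values.index(max(values))]

# Hypothetical, illustrative values only.
RAD0, DUST0, LAMBDA_DENSITY = 1e-4, 0.3, 0.7

# Scale factors at which two components have equal density.
a_rad_dust = RAD0 / DUST0
a_dust_lambda = (DUST0 / LAMBDA_DENSITY) ** (1.0 / 3.0)
print("radiation = dust at a =", a_rad_dust)
print("dust = dark energy at a =", a_dust_lambda)

for a in (1e-6, 1e-2, 10.0):
    print(a, dominant(a, RAD0, DUST0, LAMBDA_DENSITY))
```

Because the exponents −4, −3 and 0 are ordered, the sequence radiation, dust, dark energy is the same for any positive choice of the three constants; only the transition values of a change.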

There is one important addendum to this picture which is often incorporated into modern cosmology,
although it remains slightly controversial. To solve certain observational conundrums, cosmologists often
suppose that there was an additional time, very early on in the evolution of the universe (before the
radiation dominated era) when there was a period of exponential, de Sitter-like growth. This requires
some very exotic matter which is able to mimic a large cosmological constant at early times, before falling
away so that it is unobservable at the present time.

Appendix A

Normal coordinates

Let p ∈ M be a point in a Lorentzian manifold with metric g. We want to show that there are local
coordinates in some region around p such that

1. $g_{ab}\big|_p = \mathrm{diag}(-1, 1, 1, 1)$

2. $\Gamma^a_{bc}\big|_p = 0$

Consider the tangent space at p, Tp (M). The idea is to match up vectors in the tangent space at p
with points along the corresponding geodesics.

First, consider the metric at p, $g|_p$. This is a quadratic form on the vector space Tp (M) – moreover,
it is non-degenerate and has signature (−, +, +, +). By standard linear algebra, it is possible to find a
basis for the vector space, say (e0 , e1 , e2 , e3 ), such that the components of $g|_p$ with respect to this basis are

\[
g|_p(e_a, e_b) = \mathrm{diag}(-1, 1, 1, 1)
\]

Note that these are only the components of g with respect to the vectors ea , which (for the time being)
are not related to any kind of coordinate system, so we have not yet achieved our first goal.
Next, we introduce the exponential map, defined as follows:

\[
\begin{aligned}
\exp_p : T_p(\mathcal{M}) &\to \mathcal{M} \\
X &\mapsto \gamma_X(1)
\end{aligned}
\]

where γX is the affinely parametrised geodesic through p with tangent vector X at p, and with γX (0) = p.
In other words, expp (X) is the point on the manifold reached by travelling a unit affine parameter distance
along the geodesic through p with initial tangent vector X.
This map is a smooth bijection in a neighbourhood of the origin. This can be seen by working in
some (arbitrary) local coordinates in a neighbourhood of p: we have

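\[
\begin{aligned}
(\phi_U \circ \exp_p) : T_p(\mathcal{M}) &\to \mathbb{R}^n \\
X &\mapsto \big( x^0(1), x^1(1), \ldots, x^{n-1}(1) \big)
\end{aligned}
\]

where
\[
\begin{aligned}
&\frac{d^2 x^a(s)}{ds^2} + \Gamma^a_{bc}\Big|_{x(s)} \frac{dx^b(s)}{ds}\frac{dx^c(s)}{ds} = 0 \\
&\big( x^0(0), x^1(0), \ldots, x^{n-1}(0) \big) = \phi_U(p) \\
&\frac{dx^a(s)}{ds}\Big|_{s=0} = X^a = X\big( \phi_U^{-1}(x^a) \big)
\end{aligned}
\]
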
For any fixed vector X, the system of equations above has a unique solution for sufficiently small s
– this follows from standard ODE theory. These equations are also invariant under s ↦ λs, X ↦ λ^{−1} X,
from which it follows that the map (ϕU ◦ expp ) is well defined for all vectors X which are sufficiently
close to the origin.
To check that this map gives a bijection in a neighbourhood of the origin we can use the inverse
function theorem. It is clear that (ϕU ◦ expp ) maps the origin of Tp (M) to ϕU (p), which we can take to be
the origin of R^n by shifting the coordinates. If we write y^a for the solution to

\[
\begin{aligned}
&\frac{d^2 y^a(s)}{ds^2} + \Gamma^a_{bc}\Big|_{y(s)} \frac{dy^b(s)}{ds}\frac{dy^c(s)}{ds} = 0 \\
&\big( y^0(0), y^1(0), \ldots, y^{n-1}(0) \big) = \phi_U(p) \\
&\frac{dy^a(s)}{ds}\Big|_{s=0} = \epsilon Y^a
\end{aligned}
\]

and then expand to first order in ϵ, we see that y satisfies

\[
\begin{aligned}
&\frac{d^2 y^a(s)}{ds^2} = O(\epsilon^2) \\
&\big( y^0(0), y^1(0), \ldots, y^{n-1}(0) \big) = \phi_U(p) \\
&\frac{dy^a(s)}{ds}\Big|_{s=0} = \epsilon Y^a
\end{aligned}
\]

the solution to which is y^a (s) = (ϕU (p))^a + ϵY^a s + O(ϵ^2 ). In particular, with ϕU (p) = 0, y^a (1) = ϵY^a + O(ϵ^2 ). From this
we see that the differential of (ϕU ◦ expp ) at the origin is the identity map, and hence (by the inverse
function theorem) (ϕU ◦ expp ) is invertible in a neighbourhood of the origin.
Finally, we note that, since ϕU is itself a bijection, and we have just seen that (ϕU ◦expp ) is a bijection,
it follows that expp is a bijection (in a neighbourhood of the origin).
Now that we know that the exponential map is a bijection in a neighbourhood of the origin, we can
use it to define some new local coordinates xa near p. We do this using the vectors ea as follows:

\[
\phi_V(q) = \big( x^0(q), x^1(q), \ldots, x^{n-1}(q) \big)
\quad\text{where}\quad \exp_p(x^a e_a) = q
\]

Since expp is a bijection in a neighbourhood of the origin and the ea form a basis, this uniquely defines
the constants xa , as long as q is sufficiently close to p.



Now, by the definition of vector fields, the vector $\partial/\partial x^a\big|_p$ is the tangent to the curve of constant x^b ,
b ≠ a, through the point p, parametrised by x^a . From the definitions above, this is the curve

\[
x^a \mapsto \exp_p(x^a e_a) \quad \text{(no sum)}
\]

which is simply the geodesic through p, with initial tangent vector ea , and with affine parameter x^a .
Hence we have
\[
\frac{\partial}{\partial x^a}\Big|_p = e_a
\]
and, since the ea are orthonormal, the components of the metric g at the point p are
\[
g_{ab}\big|_p = g\left( \frac{\partial}{\partial x^a}, \frac{\partial}{\partial x^b} \right)\Big|_p = g(e_a, e_b) = \mathrm{diag}(-1, 1, 1, 1)
\]

This coordinate system {xa } therefore satisfies the first of our desired conditions. We now need to
check the second condition, that is, that the Christoffel symbols vanish at p.

To check this, note that, for any set of constants X^a , the curve given by

\[
x^a(s) = sX^a
\]

is an affinely parametrised geodesic. Hence these curves satisfy the geodesic equation, i.e.

\[
\begin{aligned}
&\frac{d^2 x^a(s)}{ds^2} + \Gamma^a_{bc}\Big|_{x(s)} \frac{dx^b(s)}{ds}\frac{dx^c(s)}{ds} = 0 \\
&\Rightarrow\quad \Gamma^a_{bc}\Big|_p X^b X^c = 0
\end{aligned}
\]

Define $Z^a_{(b)} = \delta^a_b$. Then choosing $X^a = Z^a_{(b)}$ in the formula above, we conclude that

\[
\Gamma^a_{bb}\Big|_p = 0 \quad \text{(no sum)}
\]

On the other hand, if we choose $X^a = Z^a_{(b)} + Z^a_{(c)}$ then we find that

\[
\Gamma^a_{bb}\Big|_p + \Gamma^a_{bc}\Big|_p + \Gamma^a_{cb}\Big|_p + \Gamma^a_{cc}\Big|_p = 0 \quad \text{(no sum)}
\]

and since the first and last terms were already shown to vanish, it follows that

\[
\Gamma^a_{bc}\Big|_p + \Gamma^a_{cb}\Big|_p = 0
\]

Finally, since the Christoffel symbols are symmetric in the lower indices, we see that the Christoffel
symbols must vanish at p.
