
General Relativity

Alessandro Tomasiello

These are my lecture notes for general relativity. Topics denoted with a ∗ are slightly
more mathematical, and might not be strictly necessary to understand the rest.

Contents
1 Introduction 3
1.1 The Equivalence Principle . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Some consequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3 Curvature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

2 Special relativity 9
2.1 Lorentz transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.2 Four-component notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

3 Non-linear coordinate changes 17


3.1 Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.2 Accelerated observer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.3 Vector fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.4 Tensors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.5 Lie derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.6 Manifolds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.6.1 Diffeomorphisms∗ . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.7 Vectors on manifolds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.7.1 Vectors as infinitesimal diffeomorphisms∗ . . . . . . . . . . . . . . . 31
3.8 Metrics on manifolds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

4 Curvature 34
4.1 Origin of the idea of curvature . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.2 Covariant derivative . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.3 Covariant derivatives of other tensors . . . . . . . . . . . . . . . . . . . . . 40

4.4 Levi-Civita connection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.5 The Riemann tensor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.6 Properties of the Riemann tensor . . . . . . . . . . . . . . . . . . . . . . . 44
4.7 Geodesics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.8 Geodesics deviation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

5 Einstein’s equations 52
5.1 Old derivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
5.2 Modern derivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
5.3 Variation of the action . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
5.4 Normal coordinates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
5.5 Beyond general relativity?∗ . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

6 Black holes 64
6.1 Spherical symmetry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
6.2 The Schwarzschild solution . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
6.3 Falling in . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
6.4 Nothing happens at the horizon . . . . . . . . . . . . . . . . . . . . . . . . 72
6.5 Penrose diagrams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
6.6 Schwarzschild geodesics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
6.7 Charged black holes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
6.8 Rotating black holes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84

7 Time evolution 88
7.1 Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
7.2 Cosmology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
7.3 de Sitter space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
7.4 Anti-de Sitter space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100

8 Gravitational waves 102

A Forms∗ 106

B Spinors∗ 107

1 Introduction
General relativity (GR) is our current theory of gravity.
It is one of the very few theories that seems to have been initially motivated at least
as much by theoretical considerations as by experimental clues (although of course it was
only accepted after experimental verification).
Let us see then at least a sketch of the logic that takes us to GR. It will not be a proof,
of course, but I hope I will make the idea more plausible, before we turn it into equations
in later sections.
After Einstein introduced Special Relativity, it became clear that gravity had to be
somehow made compatible with it. People at first started playing with what today we
would call a scalar field theory. Eventually, however, progress came by thinking about
what made gravity special.1

1.1 The Equivalence Principle


Gravity couples to everything. The analogue of “electric charge” for gravity is just mass:
$$F = G\, \frac{m_{g,1}\, m_{g,2}}{r^2} \,. \tag{1.1.1}$$
Moreover, these “gravitational” masses $m_g$ are experimentally observed to be the same
as the “inertial” mass in $F = m_i a$. These observations go back famously to
Galileo and his experiments with inclined planes (and with the leaning tower of
Pisa), but they have been repeated over the years with increasing accuracy (for a summary
see for example [1, Ch. I.2]).
Einstein decided to take this experimental fact and elevate it to a “principle”. This
step is somewhat similar to his decision to postulate that the speed of light is constant, which
as we know led him to Special Relativity a few years earlier.
In its most concise form, this Equivalence Principle is simply

$$m_i = m_g \,. \tag{1.1.2}$$

To understand the principle better, and to make it a bit more precise, let us think
about what it implies. Two classic thought experiments can help us. First,
¹ For a historical introduction, see [1].

imagine being in a sealed box in outer space, far from Earth or any other gravitational field.
Someone is pulling the box from the top, with a constant acceleration exactly equal to g,
the gravitational field on the surface of Earth. Since we are not in an inertial frame, we
experience an apparent force towards the bottom. Any object of mass m will experience
a downward force equal to mg. This is exactly the effect that we would experience on
the surface of Earth. So we might be fooled into believing that we are back on Earth,
even if we are really in a box in outer space. In a hypothetical Universe where $m_i \neq m_g$,
we could realize that the box is being pulled; but in ours, $m_i = m_g$, and we cannot tell
the difference. In this situation, the Equivalence Principle says that in fact there are no
experiments that can be performed that can tell us whether we are on Earth or in a box
in outer space being pulled at the right acceleration. Well. . . not quite: this is true if the
box is small enough, for a reason and in a sense that I will make precise below.
A second thought experiment is this. Imagine now being in a box that is falling freely
towards Earth. Here, “freely” means that the box is not subject to any other force, only to
gravitation. Now, apparent forces and gravitational forces cancel exactly, again because
$m_i = m_g$, and we might think we are in outer space, well away from any gravitational
field.
(In the classic version of this second experiment, the box is an elevator, sometimes
with Einstein in it — perhaps reflecting the frustrations of the first students of GR. In
fact, today we have much better examples: take the International Space Station (ISS),
which orbits our planet at a height of around 400 km. This is about the distance
from Milan to Trieste, but many people seem to assume the ISS is very far indeed, since
its astronauts appear weightless. In fact, the inverse square law (1.1.1)
implies that the gravitational field at that height is weaker than at the surface by only a
little more than 10%! The real reason for the weightlessness is that the ISS is in “free
fall”, in the sense I just explained: namely, it is in first approximation only subject to
gravitational forces.)
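As a quick sanity check of that 10% figure, here is a short computation; the numerical values for Earth's radius and the ISS altitude are rough standard figures I am assuming, not something derived in these notes:

```python
# Ratio of the gravitational field at ISS altitude to the field at the
# surface, using the inverse-square law (1.1.1). Assumed rough values:
R_earth = 6.371e6   # Earth's mean radius, in meters
h_iss = 4.0e5       # ISS altitude, ~400 km as quoted in the text

ratio = (R_earth / (R_earth + h_iss)) ** 2   # g(R+h)/g(R) = (R/(R+h))^2
reduction = 1 - ratio                        # fractional weakening
```

With these numbers the field at the ISS is about 88–89% of its surface value, i.e. weaker by a bit more than 10%, as claimed.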
Let me now tell you why the box has to be small enough. The actual gravitational field
of the Earth is not constant: we could try to measure the tiny difference between the grav-
itational field at our feet and at our head, which from (1.1.1) is
$$GMm\left(\frac{1}{R^2} - \frac{1}{(R+\delta r)^2}\right) \sim \frac{2GMm}{R^3}\,\delta r \,,$$
where $M$ and $R$ are Earth’s mass and radius respectively; as a fraction of the surface
gravitational force this is of order $3\cdot 10^{-7}$. But if we have an apparatus sensitive
enough and we fail to detect it, we can conclude that we are not on Earth, but are being
pulled by a mysterious prankster. These variations of the gravitational field are called
“tidal forces”, because tides on Earth are indeed due to the non-constancy of the
gravitational field of the Moon. The Principle can still be formulated by saying that,
given a certain experimental sensitivity, one can design a box small enough that these
tidal forces are not detected.
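To see where the $3\cdot 10^{-7}$ estimate comes from, one can compare the exact difference of (1.1.1) over a height $\delta r$ with the first-order approximation; the numerical values below (and the choice $\delta r \sim 1$ m) are assumptions of mine:

```python
G = 6.674e-11    # Newton's constant (SI units, assumed standard value)
M = 5.972e24     # Earth's mass in kg
R = 6.371e6      # Earth's radius in m
dr = 1.0         # feet-to-head height difference, ~1 m

# Exact difference of the field of (1.1.1) between radii R and R + dr:
exact = G * M * (1 / R**2 - 1 / (R + dr)**2)
# First-order approximation 2 G M dr / R^3 from the text:
approx = 2 * G * M * dr / R**3
# The variation as a fraction of the surface field g = G M / R^2:
frac = approx / (G * M / R**2)   # equals 2 dr / R, of order 3e-7
```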
So far we have only looked at experiments that have to do with mechanics. Character-
istically, Einstein pushed it one step further and postulated it to be true for experiments
of any kind. This is sometimes called Einstein’s Equivalence Principle (EEP), to
distinguish it from the “Weak” Equivalence Principle we have discussed so far. Let me
restate it:
In a small enough region of spacetime, no experiments can detect the existence of a
gravitational field.
In other words, no experiments can tell the difference between a gravitational field
and an apparent force, due to the observer being in an accelerated frame.
The Relativity Principle, which was the basis of Special Relativity, told us that all
the laws of physics are the same in all inertial frames. Essentially the EEP now tells us
that the laws of physics are the same in all frames, inertial or not. An accelerated frame
has the disadvantage that apparent forces appear; however, the EEP tells us that these
forces should be promoted to the same dignity as a gravitational field. (Or perhaps the
gravitational field should be demoted to the rank of an apparent force; we will come back
to this point of view later.)
Actually, even defining an inertial frame is impossible in the presence of gravity. The best
we can do is to define a locally inertial frame, which is the frame of a “freely falling”
observer in the sense I defined earlier. The “locally” is to emphasize that even in such
a frame one can detect tidal forces if one takes the box to be large enough. One can
now even formulate the EEP as the statement that in a locally inertial frame, within
small enough distances, the laws of physics have the same form as in an inertial frame in
the absence of gravity.

1.2 Some consequences


After all these ways of stating the EEP, let us look at two famous consequences: the
deflection of light by a gravitational field, and the gravitational redshift.
Let us first think again about our box pulled at a constant acceleration, along a
direction that we will call “vertical”. If we throw an object horizontally, it will go “down”
in the vertical direction a little bit, just like in a gravitational field; this is of course just
the equivalence principle again. Let us now take a laser and shine it horizontally from
one side of the box to the other (for a length L). What will happen? From the time when


Figure 1: A light ray is emitted on the left side of the box at t = 0 and detected at t = 1 on
the right side. In the meantime the box has moved upwards, to the position indicated by the
dashed outline. The effect is perceived by an observer in a box as a curving light trajectory.

a photon is emitted on one side of the box to the time when it hits the other side, the box
will have moved upwards by a vertical distance $\Delta z = \frac{1}{2} a \Delta t^2 = \frac{1}{2} a \frac{L^2}{c^2}$. So we will observe
that the laser gets deflected vertically by this tiny amount (figure 1).
If the EEP is correct, we should not be able to tell the difference between this situation
and the situation in which the box is in a gravitational field. So the EEP implies that light
rays are deflected by gravity. This is in contrast with Newtonian gravity, since photons are
massless and shouldn’t experience any gravitational attraction,² but it is indeed observed;
this was actually the first test of general relativity, performed by Eddington in 1919.
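The deflection $\Delta z = \frac{1}{2} a L^2/c^2$ is indeed tiny for any reasonable box; for example, taking a box of width $L = 10$ m (an assumed value) accelerated at $a = g$:

```python
g = 9.81        # acceleration of the box, chosen equal to surface gravity (m/s^2)
c = 2.998e8     # speed of light (m/s)
L = 10.0        # horizontal width of the box in meters (assumed)

dt = L / c                 # light's crossing time
dz = 0.5 * g * dt**2       # vertical deflection, Delta z = (1/2) a L^2 / c^2
```

This gives a deflection of a few times $10^{-15}$ m, far below anything measurable inside such a box.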
Let us now think about shooting our laser “upwards”, i.e. in the same direction as the
acceleration. A photon of wavelength $\lambda$ is emitted at the floor, say, and detected at the
ceiling, at a vertical distance $L$, a time $\Delta t = L/c$ later. During this time, the detector
will have become faster by $\Delta v = a \Delta t = a \frac{L}{c}$. Because of the Doppler effect, the detector will see a
photon of a different wavelength; the difference will be $\Delta\lambda = \lambda \frac{\Delta v}{c} = \lambda a \frac{L}{c^2}$.
Again, if the EEP is correct, the same conclusion must hold for a box in a gravitational
field. Thus a photon traveling in a gravitational field should undergo a gravitational
redshift. This effect is again observed experimentally.
² If one considers the deflection of the trajectory of a particle with small $m$, and then takes the $m \to 0$
limit, one obtains a non-zero deflection, but the result is off by a factor of 1/2.
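The fractional shift $\Delta\lambda/\lambda = aL/c^2$ is just as small; here is the number for a vertical distance of 100 m (an assumed value), with $a = g$:

```python
g = 9.81       # acceleration, equal to surface gravity (m/s^2)
c = 2.998e8    # speed of light (m/s)
L = 100.0      # vertical distance traveled by the photon, in meters (assumed)

frac_shift = g * L / c**2   # Delta lambda / lambda = a L / c^2
```

This is of order $10^{-14}$: measuring the gravitational redshift over laboratory distances requires extraordinarily precise spectroscopy.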

1.3 Curvature
The EEP says that the gravitational field is always locally undetectable. In particular,
within small distances, particles propagate just as they would in inertial frames: they go
straight. However, over large distances particles are deflected by gravity, and we have
seen that this happens even to light.
This is a little unsettling. If even light does not travel in a straight line, what could
possibly be straight? Indeed, let us just define a straight line in space to be one on which
light travels.
This is a good idea, but it leads to important consequences. For example, we can
imagine sending two laser beams which are almost parallel, but which go left and right of
a star. The gravity of the star will deflect the laser beams, and if its gravity
is strong enough (and the angle between the beams is small enough) the two might cross
again at some other point behind the star. This is the basis of the so-called “gravitational
lensing” effect; it is a violation of the famous “fifth axiom” of Euclid, according to which
two lines cannot intersect at more than one point.

(a) (b)

Figure 2: (a) On a sphere, the sum of the inner angles of a triangle is not π. Here is an example
where it is $\frac{3}{2}\pi$. (b) Two geodesics can intersect in two points.

Or, you can imagine drawing a huge triangle, shooting laser beams from three vertices
located outside Earth. The triangle sides cannot be straight, because light is deflected;
hence the inner angles of the triangle will be larger than usual. Their sum will
not be π, but larger. This is again in contrast to Euclidean geometry. Such violations are
commonly encountered in the geometry of curved spaces. Trajectories that “go straight”
are defined to be the ones with the shortest length, and they are called “geodesics”. A
well-known example is the surface of a sphere; both violations of Euclidean geometry we

just described occur in this case (figure 2). Geodesics are (pieces of) great circles, such
as the meridians on the surface of the Earth (hence the name geodesics).
Thus, according to our definition of “going straight” as being described by light, we
now have to conclude that somehow three-dimensional space itself is curved. This is
harder to understand than the surface of a two-dimensional sphere; people often ask if
this happens because there is some “outside” four-dimensional space, in which our three-
dimensional space is embedded. This auxiliary four-dimensional space is not necessary
at all: mathematicians have developed tools to talk about curvature of a space without
embedding it into anything. We will see such tools later on.
Of course the trajectories of ordinary massive objects also curve, in the presence of gravity.
But they are not geodesics in space: we can throw an object from one point to another
along several different trajectories (figure 3(a)). In fact, we said earlier that the
geodesic is the trajectory of a ray of light.
But we also know that in special relativity space and time are intimately connected;
a frame transformation mixes them. This suggests that perhaps we should think about
the path of a particle in spacetime (sometimes called “worldline”), rather than just in
space. Two trajectories that start and end at the same point in space are now actually
connecting two different points in spacetime (figure 3(b)). Actually, special relativity also
tells us that a straight trajectory in spacetime is the one that maximizes proper time, rather than
minimizing length.

(a) (b)

Figure 3: (a) Two trajectories for objects in free fall in space. (b) The same trajectories in
spacetime.

All this suggests that the trajectories in spacetime of all objects are geodesics, but in
a curved spacetime, rather than in curved space. This is a remarkable conclusion: if all
objects are really going “straight in a curved spacetime”, there is no need to postulate a
force of gravity any more!

There is another famous thought experiment that suggests that spacetime is curved,
and not just space. Think about the gravitational redshift of section 1.2. Suppose we
emit at a position z in a gravitational field a light wave of wavelength λ, and we measure
the period of the wave, namely the time ∆t = λ/c between two “crests” of the wave.
When the wave arrives at a position z ′ where the gravitational field is different, the
wavelength will have changed to a new value λ′ . Hence we are forced to conclude that
the time interval between the two crests ∆t = λ′ /c is different at this new point; time is
“running slower”. (We are not playing with words: this effect is very real and can also be
measured by sending a clock to a higher altitude for a while and taking it back down.) In
a spacetime diagram, the trajectories of the two crests seem parallel, but their distance
in time decreases. This again seems to suggest that spacetime is in fact curved.
Let me summarize our conclusions:
There is no “force of gravity”. Objects in free fall (i.e. not subject to other forces)
describe spacetime trajectories which are locally straight. These trajectories are globally
curved because spacetime itself is curved.
Since gravitational effects happen around matter, we also have to conclude that:
Matter curves spacetime.
So far it all sounds a little vague, because we have not really defined what this “curva-
ture” means. Fortunately, this is a topic that mathematicians had thought about before
the EEP came along, so we can try to use their investigations for physical purposes. We
will see that their definitions are exactly what we need for physics. The idea of curved
space started as a vague logical possibility, hidden in the fact that the fifth axiom of Eu-
clid could not be proven; then it became an esoteric mathematical subject; and was finally
revealed to be an established property of spacetime. Think about it next time you sneer at an
idea for being too abstract!

2 Special relativity
Let me start by reviewing special relativity.³

³ In what follows I have essentially copied my lecture notes on Electromagnetism for mathematicians.

2.1 Lorentz transformations
Our basic postulate, that the speed of light is the same in any inertial frame, is very
unintuitive to most people. In everyday experience, velocities sum: if I travel with velocity
v1 towards an object headed towards me with velocity v2 , I should see that the object
is coming towards me with total velocity v1 + v2 . However, this expectation is wrong.
Indeed, we just postulated that the speed of light is the same in any inertial frame. So if
I travel with velocity v towards a ray of light, which has velocity c, I will see that ray of
light coming towards me... with velocity c, not c + v. So velocities shouldn’t sum.
From the postulate that c is the same in any frame, one can derive rather easily
the most famous relativistic effects: relativity of simultaneity, time dilation, length
contraction. I will not go through the details of those derivations, but you can find them
discussed nicely for example in [2, Sec. 12.1.2].
For example, a Lorentz transformation (or “boost”) of velocity v in the x direction
reads
$$\begin{pmatrix} ct' \\ x' \\ y' \\ z' \end{pmatrix} = \begin{pmatrix} \gamma & \beta\gamma & 0 & 0 \\ \beta\gamma & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} ct \\ x \\ y \\ z \end{pmatrix} \,, \tag{2.1.1}$$
where $(t, x, y, z)$ and $(t', x', y', z')$ are the coordinates in the two frames, and
$$\beta \equiv \frac{v}{c} \,, \qquad \gamma = \frac{1}{\sqrt{1 - \beta^2}} \,. \tag{2.1.2}$$

To see how velocities are actually composed, let us consider an object moving with
velocity u and perform a boost v in the direction opposite to it. Using (2.1.1),
$$u = \frac{dx}{dt} \;\to\; \frac{\gamma(dx + v\,dt)}{\gamma(dt + v\,dx/c^2)} = \frac{u + v}{1 + \frac{uv}{c^2}} \,. \tag{2.1.3}$$
When u and v are much smaller than c, this is very well approximated by u + v. At the
other extreme, we see that if u = c, (2.1.3) gives c, as it should.
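The two limits just mentioned are easy to check numerically; here is a small sketch of the composition law (2.1.3), in units where the function's `c` defaults to 1:

```python
def add_velocities(u, v, c=1.0):
    """Relativistic velocity composition (2.1.3): (u + v)/(1 + u v / c^2)."""
    return (u + v) / (1 + u * v / c**2)

# Light stays light under any boost:
print(add_velocities(1.0, 0.5))    # → 1.0
# Small velocities compose almost additively:
print(add_velocities(1e-4, 2e-4))  # ≈ 3e-4
```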
On the other hand, we can define the rapidity
$$\lambda \equiv \cosh^{-1}\!\left(\frac{1}{\sqrt{1-\beta^2}}\right) \,, \tag{2.1.4}$$
in terms of which (2.1.1) reads
$$\begin{pmatrix} ct' \\ x' \\ y' \\ z' \end{pmatrix} = \begin{pmatrix} \cosh\lambda & \sinh\lambda & 0 & 0 \\ \sinh\lambda & \cosh\lambda & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} ct \\ x \\ y \\ z \end{pmatrix} \,. \tag{2.1.5}$$
In other words, a Lorentz transformation looks like an ordinary rotation, where the
trigonometric functions have been replaced by hyperbolic functions. The effect on the
(t, x) axes is shown in figure 4. In such a diagram, time dilation and length contraction
effects can be easily visualized; see figure 6. Rapidities do sum: it is easy to show that
$$\Lambda(\lambda_1 + \lambda_2) = \Lambda(\lambda_1)\,\Lambda(\lambda_2) \,. \tag{2.1.6}$$
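Relation (2.1.6) is just the addition formulas for cosh and sinh; here is a quick numerical check on the nontrivial 2×2 block (the rapidity values are arbitrary):

```python
import numpy as np

def boost_2x2(lam):
    """The (ct, x) block of the boost (2.1.5) with rapidity lam."""
    return np.array([[np.cosh(lam), np.sinh(lam)],
                     [np.sinh(lam), np.cosh(lam)]])

l1, l2 = 0.3, 1.1
lhs = boost_2x2(l1 + l2)               # boost with the summed rapidity
rhs = boost_2x2(l1) @ boost_2x2(l2)    # product of the two boosts
```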

Figure 4: A Lorentz boost; here we took c = 1. The dashed lines are branches of two hyperbolas.

In fact, Lorentz transformations are just like ordinary rotations, except that they have
to keep invariant the quadratic form
$$\tau^2 \equiv t^2 - \frac{1}{c^2}\left(x^2 + y^2 + z^2\right) \,, \tag{2.1.7}$$
rather than the ordinary “Euclidean” quadratic form x2 + y 2 + z 2 . τ is known as proper
time, because (when τ 2 > 0) it is the time measured by an observer moving in a straight
path in spacetime from the origin to the point (t, x, y, z). (When τ 2 < 0, iτ is real
and is (1/c)× the distance measured between the origin and (t, x, y, z) by an observer
for which those two events are simultaneous.) Lorentz transformations are just linear

11
transformations of the spacetime R4 that keep the quadratic form (2.1.7) invariant. This
definition also includes ordinary rotations in R3 , mixing (x, y, z) and keeping t invariant —
as well as “boosts” such as the one in (2.1.1). The most general element of this “Lorentz
group” depends on six parameters, as opposed to the three parameters on which a rotation
in R3 depends.⁴

Figure 5: Paths in spacetime for a twin at rest and a twin traveling back and forth at almost
the speed of light.

The quadratic form (2.1.7) has slightly unusual properties, of course. A path γ between
two points in Euclidean space, measured by the integral $\int_\gamma dl = \int_\gamma \sqrt{dx^2 + dy^2 + dz^2}$, can
be as long as one wants; the shortest path is a straight line, which minimizes the distance.
Between two points in the spacetime R4 (sometimes called “events” to stress the fact that
time is one of their coordinates), proper time τ is measured by the integral
$$\int_\gamma d\tau = \int_\gamma \sqrt{dt^2 - (dx^2 + dy^2 + dz^2)/c^2} \,, \tag{2.1.8}$$

and it can be made as short as one wants; one can simply travel very close to light rays,
i.e. very nearly at the speed of light — see figure 5. The longest path is now a straight
line in spacetime, which maximizes proper time. This is Einstein’s famous paradox of the two
twins.
⁴ Saying that it is a “group” just means that it is closed under matrix multiplication, that the product
is associative, that there exists an “identity” transformation that leaves all of R4 invariant (it is just the
identity matrix), and that every element has an inverse.
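To make the twin “paradox” quantitative: for a round trip at constant speed β, (2.1.8) gives the traveling twin a proper time shorter by a factor $\sqrt{1-\beta^2}$. A sketch with assumed numbers:

```python
import math

T = 10.0      # total coordinate time of the trip, in years (assumed)
beta = 0.99   # traveling twin's speed in units of c (assumed)

tau_rest = T                              # straight worldline: tau = T
tau_travel = T * math.sqrt(1 - beta**2)   # out-and-back at speed beta
```

At β = 0.99 the traveler ages about 1.4 years while the stay-at-home twin ages 10.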

Figure 6: (a) Time dilation effects: a time interval appears dilated in the moving frame. (b)
Length contraction effects: the worldlines of the object’s boundary mark its length in the rest
frame and in the moving frame.

The idea of extremizing a functional while moving in spacetime might be familiar to
you; it is the principle of least action! In fact the proper time (2.1.8) can be interpreted
as the action of a relativistic particle:
$$S = -mc^2 \int d\tau = -mc^2 \int dt\, \sqrt{1 - \frac{\dot{x}^2}{c^2}} = -mc^2 \int dt\, \sqrt{1 - \beta^2} \,, \tag{2.1.9}$$
where $\dot{x}^i = \partial_t x^i$ and $\dot{x}^2 = \dot{x}^i \dot{x}^i$. As a cross-check, we see that for small velocities $S \sim \int dt\, m\left(-c^2 + \frac{1}{2}\dot{x}^2\right)$, which (apart from a constant, which comes from rest energy) is the
usual action for a non-relativistic free particle. It is also instructive to derive the equations
of motion from (2.1.9):
$$\delta S \propto \int dt\, \frac{\dot{x}_i\, \delta\dot{x}_i}{\sqrt{1-\beta^2}} = -\int dt\, \delta x_i\, \partial_t\!\left(\frac{\dot{x}_i}{\sqrt{1-\beta^2}}\right) \,; \tag{2.1.10}$$

expanding the derivative gives
$$\left(\delta^{ij} + \frac{\dot{x}^i \dot{x}^j}{c^2(1-\beta^2)}\right) \ddot{x}_j = 0 \,. \tag{2.1.11}$$

The matrix in round brackets has two eigenvalues equal to 1, and one equal to $(1-\beta^2)^{-1}$,
so it is non-degenerate; it follows that
$$\ddot{x}^i = 0 \,, \tag{2.1.12}$$
which is a straight trajectory in spacetime, as expected. (You might wonder at this point
why we bother introducing (2.1.9), which gives the same equations of motion as the non-
relativistic action; the answer is that we see the difference when we introduce external fields,
and (2.1.9) turns out to give the correct relativistic answer in that case.) If you think the
derivation we just gave of the equations of motion is too complicated, here is a trick: the
action
$$S = -\frac{c^2}{2} \int dt \left( \frac{1-\beta^2}{e} + e\, m^2 \right) \,, \tag{2.1.13}$$
with an auxiliary field e, is classically equivalent to (2.1.9), as can be seen by using e’s
equation of motion, which reads $e = \frac{1}{m}\sqrt{1-\beta^2}$. The equation of motion for $x^i$ from
(2.1.13) is now directly (2.1.12).
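One can check the claimed equivalence symbolically; here is a small sketch with sympy, treating β as a constant (which is enough to verify the algebraic equation of motion for e and the substitution back):

```python
import sympy as sp

beta, e, m, c = sp.symbols('beta e m c', positive=True)

# Lagrangian of the auxiliary-field action (2.1.13):
L_aux = -(c**2 / 2) * ((1 - beta**2) / e + e * m**2)

# Claimed solution of e's equation of motion:
e_star = sp.sqrt(1 - beta**2) / m

# It should make dL/de vanish...
eom = sp.simplify(sp.diff(L_aux, e).subs(e, e_star))
# ...and substituting it back should reproduce the Lagrangian of (2.1.9):
L_back = sp.simplify(L_aux.subs(e, e_star))
L_orig = -m * c**2 * sp.sqrt(1 - beta**2)
```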

We emphasized earlier that one could travel “very nearly” at the speed of light, because
traveling at the speed of light is only possible for a particle which has zero mass, such as
a photon, which is indeed a quantum of light. This is because the analogue of (2.1.7) in
momentum space is the mass:

$$(mc^2)^2 = E^2 - c^2\left(p_x^2 + p_y^2 + p_z^2\right) \,. \tag{2.1.14}$$

For a body at rest, pi = 0 and we recover the famous relation E = mc2 . To get the energy
and momentum for a moving particle, we can use a boost as in (2.1.1), to obtain

$$E = mc^2 \gamma \,, \qquad \vec{p} = m\vec{v}\,\gamma \,. \tag{2.1.15}$$

Notice that for small velocities $E = mc^2\gamma \sim mc^2\left(1 + \frac{1}{2}\beta^2 + \ldots\right) = mc^2 + \frac{1}{2}mv^2 + \ldots$; this
is the energy “contained” in the mass, plus the Newtonian kinetic energy. On the other
hand, for a particle with m = 0, (2.1.15) cannot be applied at all; going back to (2.1.14)
we find that E = c|p|.
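A quick check that the boosted expressions (2.1.15) satisfy the invariant relation (2.1.14); the numbers are arbitrary, and I keep c explicit as in this section:

```python
import math

m = 1.0       # mass (arbitrary units)
c = 3.0e8     # speed of light
v = 0.8 * c   # an arbitrary speed

gamma = 1 / math.sqrt(1 - (v / c)**2)
E = m * c**2 * gamma   # energy from (2.1.15)
p = m * v * gamma      # momentum from (2.1.15)

invariant = E**2 - c**2 * p**2   # should equal (m c^2)^2, as in (2.1.14)
```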

2.2 Four-component notation


We saw that a boost such as (2.1.1) leaves the quadratic form (2.1.7) invariant. Let us
try to be more systematic about this. Notice that
$$\tau^2 = -x^t \eta\, x \,, \tag{2.2.1}$$
where
$$\eta \equiv \mathrm{diag}(-1, 1, 1, 1) = \begin{pmatrix} -1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \,. \tag{2.2.2}$$

By the way, when we consider R4 with this quadratic form, we call it R3,1 to emphasize
its signature, and we call it Minkowski space (or spacetime).
If x = (ct, x, y, z)t is the position four-vector, a Lorentz transformation can now be
defined as a linear action
x → Λx (2.2.3)
such that τ 2 is invariant: thus
Λt ηΛ = η . (2.2.4)
Notice that the inner product of two vectors transforming as in (2.2.3) is not invariant:
xt1 x2 → xt1 Λt Λx2 ̸= xt1 x2 . (2.2.4) shows us how to cure this; we can introduce a different
type of vectors, which would transform as

y → ηΛηy . (2.2.5)

Vectors transforming like the position four-vector (see (2.2.3)) are sometimes called con-
travariant, and vectors transforming as in (2.2.5) covariant. Now we have that the in-
ner product between a contravariant and a covariant vector is indeed invariant: y t x →
(ηΛηy)t Λx = y t ηΛt ηΛx = y t x. Notice that η turns a contravariant vector into a covariant
one: for example ηx is a covariant vector. This is consistent with the fact that (2.2.1) is
an invariant. The vice versa is also true: η turns a covariant vector into a contravariant
one.
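The defining property (2.2.4) is easy to verify numerically for the boost (2.1.1); here is a sketch (β = 0.6 is an arbitrary choice):

```python
import numpy as np

eta = np.diag([-1.0, 1.0, 1.0, 1.0])   # the matrix (2.2.2)

beta = 0.6
gamma = 1 / np.sqrt(1 - beta**2)
# The boost (2.1.1) in the x direction:
Lam = np.array([[gamma,        beta * gamma, 0, 0],
                [beta * gamma, gamma,        0, 0],
                [0,            0,            1, 0],
                [0,            0,            0, 1]])

check = Lam.T @ eta @ Lam   # should reproduce eta, as in (2.2.4)
```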
It is often convenient to present these concepts using indices. The position four-vector
reads
$$x^\mu \equiv \begin{pmatrix} ct \\ x \\ y \\ z \end{pmatrix} \,. \tag{2.2.6}$$
The index µ is traditionally taken to run from 0 to 3, so that x0 = ct. As we have
seen, the position four-vector is a contravariant vector. There is a visual trick that will
allow us to tell the difference between a covariant and a contravariant vector. Namely,
contravariant vectors will have their index up, as in (2.2.6); covariant vectors will have
their index down: we will write them as $y_\mu$. Now, since we know that the inner product
of two contravariant vectors is not invariant, while the inner product of a covariant and a
contravariant vector is invariant, we will refine the summed-index notation to say
that a pair of summed indices should always be one up and one down: $y^t x = y_\mu x^\mu$.

This has several consequences. From (2.2.3) we see that Λ needs to have one index up
and one down: so we write $\Lambda^\mu{}_\nu$. So (2.2.3) reads
$$x^\mu \to \Lambda^\mu{}_\nu\, x^\nu \,. \tag{2.2.7}$$

On the other hand, (2.2.1) shows that η has to be written with two indices down:
$$\tau^2 = -x^\mu \eta_{\mu\nu} x^\nu \,. \tag{2.2.8}$$
As we observed earlier, η turns a contravariant vector into a covariant one; in our index
language, this simply means that it can be used to lower an index, $x_\mu \equiv \eta_{\mu\nu} x^\nu$. We also
observed that η turns a covariant vector into a contravariant one; so in fact we can also
write η as $\eta^{\mu\nu}$. This is consistent with the fact that $\eta^{\mu\nu}\eta_{\nu\rho} = \delta^\mu{}_\rho$, or in other words
η² = Id. We can now use $\eta^{\mu\nu}$ to raise indices.
Now (2.2.4) reads
$$(\Lambda^t)_\mu{}^\nu\, \eta_{\nu\rho}\, \Lambda^\rho{}_\sigma = \eta_{\mu\sigma} \,. \tag{2.2.9}$$
Since $(\Lambda^t)_\mu{}^\nu = \Lambda^\nu{}_\mu$, by lowering and raising indices we can rewrite (2.2.4) as
$$\Lambda_\rho{}^\mu\, \Lambda^\rho{}_\nu = \delta^\mu{}_\nu \,. \tag{2.2.10}$$

Finally, just like we have introduced a position four-vector, we can also introduce a
momentum four-vector $p^\mu$, whose entries are $p^0 = \frac{1}{c}E$ and $p^i$. Then (2.1.14) can be written
as
$$c^2 m^2 = -\eta^{\mu\nu} p_\mu p_\nu \,. \tag{2.2.11}$$
(2.1.15) can then be written as
$$p^\mu = m\gamma u^\mu = m\gamma \frac{\partial x^\mu}{\partial t} = m v^\mu \,, \qquad v^\mu \equiv \frac{\partial x^\mu}{\partial \tau} \,; \tag{2.2.12}$$
we have used $\frac{dt}{d\tau} = \gamma$. Notice that $u^\mu$ has components $(c, v^i)$; on the other hand, $v^\mu$
satisfies
$$\eta_{\mu\nu} v^\mu v^\nu = -1 \,. \tag{2.2.13}$$
For an observer at rest, $v^\mu = (1, 0, 0, 0)^t$.
From now on it becomes increasingly cumbersome to keep track of the powers of c; we
will set
c=1 (2.2.14)
in what follows.

3 Non-linear coordinate changes
In the previous section, we have reviewed the physics of how inertial frames are related to
each other, via Lorentz transformations. However, as we explained in the introduction,
understanding gravity motivates us to consider also what happens in non-inertial frames
— indeed we also mentioned that there is no such thing as a perfectly inertial frame in
the presence of gravity.
Thus in this section we will start looking at more general coordinate changes.

3.1 Metrics
Lorentz transformations are linear, in the sense that the new coordinates $x'^\mu$ are a linear
combination of the old ones $x^\mu$:
$$x'^\mu = \Lambda^\mu{}_\nu x^\nu \,. \tag{3.1.1}$$
A slight generalization of this is taking a constant matrix that is not Lorentz, $x'^\mu = M^\mu{}_\nu x^\nu$.
A famous example is given by the so-called light-cone coordinates $x'^\pm = x^0 \pm x^1$, $x'^{2,3} = x^{2,3}$,
in terms of which
$$g_{\mu\nu} = \begin{pmatrix} 0 & -1/2 & 0 & 0 \\ -1/2 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \,. \tag{3.1.2}$$
However, the discussion of the equivalence principle in the introduction should have
convinced us that we will also need to consider more general nonlinear coordinate changes,
where x′µ are arbitrary functions of the xµ :

x′µ = x′µ (x0 , . . . , x3 ) . (3.1.3)

We will see a concrete example soon in Sec. 3.2.


It is customary to introduce the squared line element

ds2 = ηµν dxµ dxν . (3.1.4)

This is just −dτ² ; it is more suited to measuring lengths rather than times. Under (3.1.3):

ds² = η_µν dx^µ dx^ν = η_µν (∂x^µ/∂x′^ρ)(∂x^ν/∂x′^σ) dx′^ρ dx′^σ .   (3.1.5)

We can define

g_ρσ ≡ η_µν (∂x^µ/∂x′^ρ)(∂x^ν/∂x′^σ) = (Jᵗ η J)_ρσ ,   (3.1.6)

where ∂x^µ/∂x′^ρ is the Jacobian matrix; then (3.1.5) is

ds² = g_µν dx′^µ dx′^ν .   (3.1.7)

This is formally the same expression as (3.1.5), but with gµν playing the same role as ηµν
in the original coordinates. In other words, gµν is the expression for the metric tensor in
the x′ coordinates. Notice that for a Lorentz transformation (3.1.1) J = Λ, and (recalling
(2.2.9)) we recover gµν = ηµν .
The crucial novelty is that this new gµν will in general depend nonlinearly on the
coordinates. It is not completely arbitrary: at every point it is still symmetric, non-
degenerate (det g ̸= 0), and it still has one negative and three positive eigenvalues (just
like η). In other words, at every point it is a quadratic form of signature (1, 3).
Inspired by this, we then define a metric as a non-degenerate point-dependent quadratic
form gµν (x). We are not requiring that this gµν (x) should be obtained from ηµν by a co-
ordinate change, so this is possibly a generalization. (In fact it is not clear at this point
whether any metric can be generated this way; we will see later that this is not the case
at all.) We will mostly be interested in metrics that have signature (1, 3) everywhere,
which are called Lorentzian; a metric whose eigenvalues are all positive everywhere is
instead called Riemannian. This latter case is considered mostly by mathematicians, but
it is also useful in physics whenever one has a favorite time frame, so that it makes sense
to consider a metric on three-dimensional space. (This is the case in cosmology, or if one
wants to set up a Hamiltonian treatment.)
The line-element ds2 = gµν dxµ dxν is a physically meaningful quantity, so it should
be the same in all coordinates, even if the x and g change. So under general coordinate
changes (3.1.3) we have
g_µν → g′_µν = (∂x^ρ/∂x′^µ)(∂x^σ/∂x′^ν) g_ρσ .   (3.1.8)

(3.1.8) generalizes (3.1.6). Again, when ∂x/∂x′ is a constant matrix M , (3.1.8) becomes the ordinary transformation g → Mᵗ g M for quadratic forms.
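The transformation (3.1.6) is mechanical enough to check symbolically. The sketch below (again using sympy, our own choice of tool) applies it to the light-cone coordinates introduced above and reproduces (3.1.2):

```python
import sympy as sp

# Sketch of (3.1.6): compute g = J^T eta J for the light-cone change of
# coordinates x'^± = x^0 ± x^1, whose inverse is x^0 = (x'^+ + x'^-)/2,
# x^1 = (x'^+ - x'^-)/2.
xp, xm, x2, x3 = sp.symbols("x_p x_m x_2 x_3")
old = [(xp + xm)/2, (xp - xm)/2, x2, x3]   # x^mu in terms of x'^rho
new = [xp, xm, x2, x3]

J = sp.Matrix(4, 4, lambda m, r: sp.diff(old[m], new[r]))  # J^mu_rho = dx^mu/dx'^rho
eta = sp.diag(-1, 1, 1, 1)
g = sp.simplify(J.T * eta * J)
print(g)  # matches (3.1.2): g_{+-} = g_{-+} = -1/2, g_{22} = g_{33} = 1
```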

3.2 Accelerated observer


As an example of a non-trivial metric, let us see how Minkowski space looks when viewed by a (uniformly) accelerated observer.
Recall from (2.2.12) the definition v^µ = ∂x^µ/∂τ , which obeys η_µν v^µ v^ν = −1, and that for an observer at rest v^µ = (1, 0, 0, 0)ᵗ . We want to consider uniform acceleration in
the x direction, and thus completely ignore directions y, z. Obviously at any given
time τ an observer sees, in their reference frame at that time, again the velocity vector v^µ (τ ) = (1, 0). Uniform acceleration means that v^µ (τ + dτ ) ∼ (1, a dτ ). We can think of this as Λ(β = a dτ ) v^µ (τ ), where Λ(β = a dτ ) is an infinitesimal boost (2.1.1) with
parameter β = adτ .
The velocity vector at any time should then be the result of composing many such infinitesimal boosts. We write τ = N dτ , with N ≫ 1; then v^µ (N dτ ) = Λ(a dτ )^N v^µ (τ = 0). Since rapidities sum, (2.1.6), this is Λ(N a dτ ) v^µ (0) = Λ(aτ ) v^µ (0). If the object is initially at rest, v^µ (τ = 0) = (1, 0)ᵗ . All in all:
v^µ (τ ) = Λ(aτ ) v^µ (τ = 0) = ⎛ cosh(aτ)  sinh(aτ) ⎞ ⎛ 1 ⎞ = ⎛ cosh(aτ) ⎞ .   (3.2.1)
                               ⎝ sinh(aτ)  cosh(aτ) ⎠ ⎝ 0 ⎠   ⎝ sinh(aτ) ⎠

Integrating, we obtain

x⁰ = (1/a) sinh(aτ ) ,   x¹ = (1/a) cosh(aτ ) .   (3.2.2)
The worldline is the hyperbola (x¹)² − (x⁰)² = 1/a². For τ → ∞ the trajectory
is asymptotic to a light ray: velocity is approaching c, as it should. (In (3.2.2) we have
fixed an integration constant so that this light ray goes through the origin.)
Now we want to take coordinates “adapted” to such an accelerated observer. The
new space coordinate ξ (replacing x) should be such that the ξ =constant loci are tra-
jectories of other accelerated objects in the observer’s frame. Imagine the observer has a
stick extended along x; we would like the two endpoints of this stick to be two lines at
ξ =constant. This is achieved by boosting the stick by a parameter α = aτ . So these lines
are all hyperbolae. As for the new time coordinate η, we would like it to be such that
the loci η =constant are simultaneous from the point of view of the accelerated observer.
Thus these lines should be orthogonal to v µ with respect to η. Since xµ is orthogonal to
v µ , such lines are in fact going through the origin. Another way of seeing this is again
imagining to boost the x axis by α = aτ ; see figure 7.
Both requirements are satisfied by choosing
x⁰ = (1/a) e^{aξ} sinh(aη) ,   x¹ = (1/a) e^{aξ} cosh(aη) .   (3.2.3)
We can now compute the metric in the new coordinates (η, ξ) using (3.1.6). It is even
faster if we write dxµ in terms of dx′µ = (dη, dξ) and replace in ds2 = ηµν dxµ dxν . Either
way:
ds² = e^{2aξ} (−dη² + dξ²) .   (3.2.4)
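The computation leading to the metric above can be automated; a sketch with sympy, pulling the Minkowski line element back through (3.2.3):

```python
import sympy as sp

# Sketch: pull ds^2 = -(dx^0)^2 + (dx^1)^2 back through the change (3.2.3),
# x^0 = exp(a*xi)*sinh(a*eta)/a, x^1 = exp(a*xi)*cosh(a*eta)/a.
a, eta, xi = sp.symbols('a eta xi', real=True)
x0 = sp.exp(a*xi) * sp.sinh(a*eta) / a
x1 = sp.exp(a*xi) * sp.cosh(a*eta) / a

J = sp.Matrix(2, 2, lambda m, n: sp.diff([x0, x1][m], [eta, xi][n]))
g = sp.simplify(J.T * sp.diag(-1, 1) * J)
print(g)  # diag(-exp(2*a*xi), exp(2*a*xi)): the Rindler metric (3.2.4)
```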

Figure 7: The region on the right is the one covered by the coordinates (3.2.3); lines with
constant η and ξ are in black and red respectively.

Notice that coordinates (η, ξ) only cover one fourth of Minkowski space: the region where
x¹ > |x⁰|. For this reason, the metric (3.2.4) deserves a name: it is called Rindler space.
Notice that points in {x1 ≤ x0 } cannot reach the accelerated observer: if a point in this
region emits even a light ray, it cannot ever cross the line x1 = x0 and reach the Rindler
region. On the other hand, points in {x1 ≤ −x0 } cannot be reached by the Rindler
region, for the same reason. The components of the boundary of Rindler space are called
horizons.
Yet another interesting set of coordinates is obtained by taking Z = (1/a) e^{aξ} ; we also relabel the transverse coordinates x², x³ as x₁, x₂. Using now (3.1.8), or directly via the line element, we obtain

ds² = −a² Z² dη² + dx₁² + dx₂² + dZ² .   (3.2.5)

The space part of this is now the usual Euclidean line element; we will see later that the
non-trivial 00 component signals a gravitational potential. This is then a version of one
of our thought experiments in the introduction: in the reference frame of an accelerated
observer, a sort of gravitational force manifests itself.

3.3 Vector fields


We have seen that a metric is a point-dependent quadratic form: the transformation law
(3.1.8) is the same we know for quadratic forms in linear algebra, with the basis change

matrix now being point-dependent. This suggests that other concepts of linear algebra
will now have to become point-dependent as well.
Let us look at the concept of “vector”. A point-dependent analogue of this is the idea
of “vector field”: the choice of a “little arrow at every point”. This has already appeared
in many places in physics. In R3 , for example, you have seen the velocity field of a fluid,
or the electric field. Our vector fields will now be in spacetime, but it will be intuitively
useful to keep those examples in mind.
Drawing a little arrow at each point does not require a choice of a coordinate system.
However, its components along the coordinate directions do depend on such a choice. A priori it is unclear where the
index should go; namely, if the components should be written as v µ or vµ . A possible way
to answer is to imagine taking a directional derivative along v, limϵ→0 (f (x + ϵv) − f (x))/ϵ.
This object cannot depend on a coordinate choice, but only on f and on v. An ordinary
partial derivative transforms as
∂_µ ≡ ∂/∂x^µ  →  ∂′_µ = (∂x^ν/∂x′^µ) ∂_ν .   (3.3.1)
∂xµ ∂x′µ
So the directional derivative is independent of the coordinate choice if we write the com-
ponents with an upper index, v µ , postulating a transformation law
v^µ → v′^µ = (∂x′^µ/∂x^ν) v^ν .   (3.3.2)
Indeed the Jacobian matrix is the inverse of that in (3.3.1); so the two cancel each other:
v′^µ ∂′_µ = (∂x′^µ/∂x^ν) v^ν (∂x^ρ/∂x′^µ) ∂_ρ = δ^ρ_ν v^ν ∂_ρ = v^µ ∂_µ .   (3.3.3)
We obtain an operator that has the same effect in any coordinate system:
v f ≡ v^µ ∂_µ f = lim_{ϵ→0} [ f (x + ϵ v) − f (x) ] / ϵ .   (3.3.4)
We define a vector field v^µ to be an object that under nonlinear coordinate transformations changes as in (3.3.2). Comparing with special relativity, where ∂x′^µ/∂x^ν = Λ^µ_ν ,
= Λµ ν ,
we see that this new concept extends that of contravariant vector. The crucial new point
is that a vector field is point-dependent. Just like for metrics, even if we start from a v µ
that is constant throughout spacetime, after a nonlinear change of coordinates we end up
with a nonconstant one.
In special relativity, an important role of the metric was to compute the length of a
vector, ηµν v µ v ν ; an example was (2.2.8). If we try to define a similar object with a general
metric,
v 2 ≡ gµν v µ v ν , (3.3.5)

we see that this quantity will not transform as a function unless we make v µ transform
non-trivially: this gives us again (3.3.2). Again, the Jacobian matrix in (3.3.2) is the
inverse of that in (3.1.8); so the two cancel each other in (3.3.5).
The directional derivative (3.3.4) can also be thought of as the generator of the in-
finitesimal transformation
xµ → xµ + ϵv µ (x) . (3.3.6)
If we think of (3.3.6) as a map (active transformation), the “little arrow at every point”
v µ is telling us where the infinitesimal map is “pointing towards”. In other words, the
little arrow is a small displacement from each point to a point nearby.
We use these generators extensively in other areas of physics. The vector v^µ = δ^µ_1 ,
i.e. v = ∂₁ , generates translations along direction x¹ . The vector v = x∂_y − y∂_x generates
rotations around the axis x3 . These are of course proportional to momentum and angular
momentum operators in quantum mechanics.

Example 3.1 Let us go back to the change of coordinates (3.2.3). If x′^µ = (η, ξ, x², x³), the nontrivial part of ∂x^µ/∂x′^ν is

e^{aξ} ⎛ cosh(aη)  sinh(aη) ⎞ = a ⎛ x¹  x⁰ ⎞ .
       ⎝ sinh(aη)  cosh(aη) ⎠     ⎝ x⁰  x¹ ⎠

Thus for example we compute ∂_ξ = a(x⁰ ∂₀ + x¹ ∂₁ ). This vector is “radially directed” in the 0–1 plane, in the sense that at every point (x⁰ , x¹ ) its components are proportional to (x⁰ , x¹ ) itself. Indeed it is tangent to the curves η = const, which are straight lines emanating from the origin; see figure 7.

Conversely, one can “exponentiate” any vector field to recover a family of maps. In
the translation example, one can write
exp[τ ∂_x ] f (x) = f (x) + τ ∂_x f (x) + ½ τ² ∂_x² f (x) + . . . = f (x + τ ) ,   (3.3.7)
where we have used the Taylor expansion formula. (We have assumed that the “test
function” f is analytic.) This is a family of translations, parameterized by the parameter
τ . For a general vector field, performing the exponential is not so easy, but in theory it
can be done. Essentially one wants to solve the differential equation
∂x^µ/∂τ = v^µ ,   (3.3.8)
which means that one looks for a trajectory which is tangent to v µ at every point.
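For the rotation generator mentioned above this can be checked explicitly. A sketch with sympy; that the flow is a rotation by angle τ is the claim being verified, not an input:

```python
import sympy as sp

# Sketch of (3.3.8) for v = x d_y - y d_x: the flow starting at (x0, y0)
# should be a rotation by angle tau.
tau, x0, y0 = sp.symbols('tau x0 y0')
x = x0*sp.cos(tau) - y0*sp.sin(tau)
y = x0*sp.sin(tau) + y0*sp.cos(tau)

# components of v along the trajectory: v^x = -y, v^y = x
assert sp.simplify(sp.diff(x, tau) + y) == 0   # dx/dtau = -y
assert sp.simplify(sp.diff(y, tau) - x) == 0   # dy/dtau = +x
print("the rotation flow is tangent to v = x d_y - y d_x at every point")
```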
A non-trivial property of vector fields is that their commutator as operators is also a
vector field. Indeed let us act on a test function f :

[v, w]f = v µ ∂µ (wν ∂ν f ) − wµ ∂µ (v ν ∂ν f ) = (v µ (∂µ wν ) − wµ (∂µ v ν ))∂ν f . (3.3.9)

The second derivatives have cancelled. So [v, w] is a new first-order differential operator,
which can be thought of as a new vector field, called the Lie bracket of the two vector
fields v and w, with components⁵

[v, w]ν = v µ (∂µ wν ) − wµ (∂µ v ν ) . (3.3.10)

As a cross-check, one can check that [v, w] transforms as in (3.3.2). The operation [ , ]
turns the space of vector fields into a Lie algebra.
(3.3.10) has an intuitive interpretation. Let us apply the infinitesimal map associated
to v, and then the one associated to w. A point with coordinates xµ is first sent to
xµ + ϵv µ (x); then, in the second step, this is sent to6

(xµ + ϵv µ (x)) + ϵ′ wµ (x + ϵv) ∼ xµ + ϵv µ + ϵ′ wµ + ϵϵ′ v ν ∂ν wµ . (3.3.11)

On the other hand, we can apply first the infinitesimal map associated to w and then
the one associated to v; we get a similar result, but with (v ↔ w, ϵ ↔ ϵ′ ). The difference
between the two is
ϵϵ′ [v, w]µ . (3.3.12)
This is the geometrical interpretation of the Lie bracket; see also figure 8.
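The component formula (3.3.10) is easy to implement; a sketch with sympy, tried on a rotation and a translation of R² (our own example):

```python
import sympy as sp

# Sketch of (3.3.10): a generic Lie bracket of vector fields on R^2,
# tried on v = x d_y - y d_x (a rotation) and w = d_x (a translation).
x, y = sp.symbols('x y')
coords = [x, y]

def lie_bracket(v, w):
    # [v, w]^nu = v^mu d_mu w^nu - w^mu d_mu v^nu
    return [sum(v[m]*sp.diff(w[n], coords[m]) - w[m]*sp.diff(v[n], coords[m])
                for m in range(len(coords)))
            for n in range(len(coords))]

v = [-y, x]   # x d_y - y d_x
w = [1, 0]    # d_x
print(lie_bracket(v, w))  # [0, -1], i.e. [v, w] = -d_y
```

Note the antisymmetry: `lie_bracket(w, v)` gives `[0, 1]`, i.e. +∂_y.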

Figure 8: The geometrical meaning of the Lie bracket.

3.4 Tensors
We have seen that vector fields generalize contravariant vectors. What is the generaliza-
tion of a covariant vector? Inspired by the discussion in Sec. 2.2, we would like this to be
⁵ [v^µ ∂_µ , w^ν ∂_ν ] can be evaluated more directly if we recall from quantum mechanics that [∂_µ , α] = (∂_µ α), where α is a multiplication operator. However, such a formula is established exactly by acting on a test function as in (3.3.9).
⁶ ϵ and ϵ′ can be taken to be equal, but they are independent parameters. Effects of order ϵ² and ϵ′² will cancel out in the end.

an object with components ωµ , such that

v µ ωµ (3.4.1)

does not depend on the coordinate choice. For this to work, we need to postulate
ω_µ → ω′_µ = (∂x^ν/∂x′^µ) ω_ν .   (3.4.2)
Indeed, with a computation similar to (3.3.3):
v′^µ ω′_µ = (∂x′^µ/∂x^ν) v^ν (∂x^ρ/∂x′^µ) ω_ρ = δ^ρ_ν v^ν ω_ρ = v^µ ω_µ .   (3.4.3)
An object that transforms in this way is called a form, or sometimes a one-form. Notice
that we are not requiring (3.4.1) to not depend on the point, only on the coordinate
system; in general, it will be a function.
One can combine ωµ with the infinitesimal coordinate differences dxµ , to write

ω ≡ ωµ dxµ , (3.4.4)

similar to what we do for the line element and the metric. Abstractly, we can think that
dx^µ is the linear map with the property dx^µ (∂_ν ) = δ^µ_ν ; applying ω to v = v^µ ∂_µ then gives
back (3.4.1).

Example 3.2 • The partial derivatives of a function, ∂_µ f , are the components of a form. Contracting this form with a vector as in (3.4.1) gives back v f in (3.3.4).

• Similar to Sec. 2.2, using a metric we can now lower the index of a vector:

v_µ ≡ g_µν v^ν .   (3.4.5)

The resulting object transforms as a one-form: this can be seen by using the transformation law of g and v separately, (3.1.8) and (3.3.2). Two of the coordinate change matrices contract and give an identity; only one remains, giving the correct transformation law for a one-form. Likewise, given a one-form ω_µ we can define a vector field ω^µ ≡ g^µν ω_ν . This trick can be generalized to any tensor: we can lower an index using g_µν , or raise an index using g^µν , and we always obtain a new tensor.

We can also consider objects with many indices. A tensor of type (k, l) is an object
with k contravariant and l covariant indices:
T^{µ1...µk}_{ν1...νl} → (∂x′^{µ1}/∂x^{ρ1}) · · · (∂x′^{µk}/∂x^{ρk}) (∂x^{σ1}/∂x′^{ν1}) · · · (∂x^{σl}/∂x′^{νl}) T^{ρ1...ρk}_{σ1...σl} .   (3.4.6)
Example 3.3 • From (3.1.8) we see that a metric is a (0, 2) tensor.

• Since g_µν is invertible (it has non-zero determinant at every point), we can define its inverse. This will no longer transform with two factors of ∂x^µ/∂x′^ν as in (3.1.8), but with their inverses, which are ∂x′^µ/∂x^ν . So it is a tensor of type (2, 0), and we denote it with two upper indices: it is the tensor g^µν , such that

g^µν g_νρ = δ^µ_ρ .   (3.4.7)

However, not every object that can be written in terms of indices is automatically a
tensor! Take for example a derivative of a vector field, ∂µ v ν . Let us transform it:
∂_µ v^ν → ∂′_µ v′^ν = (∂x^ρ/∂x′^µ) ∂_ρ [ (∂x′^ν/∂x^σ) v^σ ] = (∂x^ρ/∂x′^µ)(∂x′^ν/∂x^σ) ∂_ρ v^σ + (∂x^ρ/∂x′^µ)(∂²x′^ν/∂x^ρ ∂x^σ) v^σ .   (3.4.8)
In the last expression, we see that the first term is just like what we should have for a
(1, 1)-tensor from (3.4.6); the second term, however, spoils the fun. Hence ∂µ v ν is not a
tensor.
A similar problem arises for derivatives of one-forms, ∂_µ ω_ν . However, the antisymmetrization⁷ of such a derivative, ∂_[µ ω_ν] ≡ ½ (∂_µ ω_ν − ∂_ν ω_µ ), is a tensor:

∂_[µ ω_ν] → ∂′_[µ ω′_ν] = (∂x^ρ/∂x′^[µ) ∂_ρ [ (∂x^σ/∂x′^ν]) ω_σ ] = (∂x^ρ/∂x′^[µ)(∂x^σ/∂x′^ν]) ∂_ρ ω_σ + (∂x^ρ/∂x′^[µ) ∂_ρ (∂x^σ/∂x′^ν]) ω_σ
= (∂x^ρ/∂x′^µ)(∂x^σ/∂x′^ν) ∂_[ρ ω_σ] + (∂²x^σ/∂x′^[µ ∂x′^ν]) ω_σ = (∂x^ρ/∂x′^µ)(∂x^σ/∂x′^ν) ∂_[ρ ω_σ] ,   (3.4.9)

where the last step uses the fact that the second derivative ∂²x^σ/∂x′^µ ∂x′^ν is symmetric in µ, ν, so its antisymmetrization vanishes.
So in particular the electromagnetic relativistic field-strength

Fµν ≡ 2∂[µ Aν] (3.4.10)

is a tensor. This works more generally: given a completely antisymmetric (0, k) tensor
Aµ1 ...µk , its antisymmetrized derivative ∂[µ Aµ1 ...µk ] is also a tensor. For this reason, anti-
symmetric (0, k) tensors are easier to study: they don’t need all the machinery we will
introduce later to deal with derivatives, and they are often studied separately. They are
often called k-forms. This explains the funny name we gave to (3.4.2).
Given the special role of forms, one also defines a special symbol for an antisymmetrized
⊗:
dx^µ ∧ dx^ν ≡ ½ (dx^µ ⊗ dx^ν − dx^ν ⊗ dx^µ ) ,   (3.4.11)
⁷ Square brackets will denote antisymmetrization of indices; round brackets will denote symmetrization. Sometimes we include also vertical lines to clarify which indices are being (anti)-symmetrized: for example, ∂_(µ| T_ν|ρ) = ½ (∂_µ T_νρ + ∂_ρ T_νµ ).

similarly to what we did earlier for symmetric (0, 2) tensors such as the metric. The
electro-magnetic field strength can then be written compactly as
F = ½ F_µν dx^µ ∧ dx^ν .   (3.4.12)
It is also convenient to introduce a symbol d for the antisymmetrized derivative, so that
(3.4.10) reads for example F = dA. We explore this notation further in appendix A.
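As an illustration of (3.4.10), one can compute F for a sample potential; the particular A below (a constant magnetic field along z, in a gauge of our choosing) is our own example, not one from the text:

```python
import sympy as sp

# Sketch of (3.4.10): F_{mu nu} = d_mu A_nu - d_nu A_mu for the hypothetical
# potential A = (0, -B*y/2, B*x/2, 0), a constant magnetic field along z.
t, x, y, z, B = sp.symbols('t x y z B')
coords = [t, x, y, z]
A = [0, -B*y/2, B*x/2, 0]

F = sp.Matrix(4, 4, lambda m, n: sp.diff(A[n], coords[m]) - sp.diff(A[m], coords[n]))
print(F)  # only F_{xy} = -F_{yx} = B survive; F is antisymmetric by construction
```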
The antisymmetrization trick only works for derivatives of (0, k) antisymmetric ten-
sors. For vectors, for example, it is hard to see how to antisymmetrize the expression
∂µ v ν . We will have to think of something better; we will see this in section 4.2.

3.5 Lie derivatives


To any vector v, we have seen that we can associate a linear first-order differential operator
(3.3.4). It acts on functions, and it gives back functions. We now would like to define a
similar derivative operator Lv , but one which acts on vectors wν , not on functions.
In order to do that, think now of v as being associated to an infinitesimal map (3.3.6),
which sends a point p with coordinates xµ to a point p′ with coordinates xµ + ϵv µ . Given
a vector wν , we want to compare its values at the points p′ and p. Intuitively, to perform
the comparison we want to apply to wν (p′ ) a change of coordinates J that makes the
value of the coordinates of p′ equal to the coordinates of p. Schematically:

L_v w = lim_{ϵ→0} [ J(x′ → x) w(p′ ) − w(p) ] / ϵ .   (3.5.1)

Concretely, the change of coordinates J(x′ → x) takes x^µ → x^µ − ϵ v^µ . So we have

(J(x′ → x) w(p′ ))^ν = (∂x′^ν/∂x^ρ) w^ρ (p′ ) = (δ^ν_ρ − ϵ ∂_ρ v^ν )(w^ρ + ϵ v^µ ∂_µ w^ρ ) ∼ w^ν + ϵ (v^µ ∂_µ w^ν − ∂_µ v^ν w^µ ) .   (3.5.2)
Thus (3.5.1) gives
Lv w = [v, w] . (3.5.3)
We saw in section 3.4 that the derivative of a tensor is not a tensor. What we just saw is
that the Lie bracket is in some sense a derivative operator that acts on tensors. However,
it is not a very satisfactory one. We cannot take a “partial derivative” of a vector w at a
point p; we need an entire second vector field v, and we need to know its values even at
nearby points (as evident from the fact that the derivative of v also appears in [v, w], and
from figure 8). We will see in section 4.2 that a better definition of “partial derivative” can be

found, but only if we introduce “by hand” some extra data on how one should compare
a vector at one point p with a vector at nearby point p′ .
L_v can be defined similarly on forms. Since forms transform with a ∂x/∂x′ rather than with a ∂x′/∂x, there is a sign difference, and we end up with

(L_v ω)_ν = v^µ ∂_µ ω_ν + ω_µ ∂_ν v^µ .   (3.5.4)

One can check that


L_v (w^ν ω_ν ) = v^µ ∂_µ (w^ν ω_ν ) .   (3.5.5)
This is as it should be, since after all wν ων is a function (recall (3.4.3) again). We could
actually have determined (3.5.4) by imposing this; we will follow this approach later for
“covariant” derivatives.
One can similarly define Lie derivatives for any tensor. The idea is clear: we act on
any upper index as in

(L_v T)^{µ1...µk}_{ν1...νl} = v^µ ∂_µ T^{µ1...µk}_{ν1...νl}
    − T^{µ µ2...µk}_{ν1...νl} ∂_µ v^{µ1} − T^{µ1 µ...µk}_{ν1...νl} ∂_µ v^{µ2} − . . . − T^{µ1...µ}_{ν1...νl} ∂_µ v^{µk}   (3.5.6)
    + T^{µ1...µk}_{ν ν2...νl} ∂_{ν1} v^ν + T^{µ1...µk}_{ν1 ν...νl} ∂_{ν2} v^ν + . . . + T^{µ1...µk}_{ν1...ν} ∂_{νl} v^ν .

This has an interpretation similar to (3.5.1): it represents the action of an infinitesimal


map on T . More abstractly we can write

L_v = v^µ ∂_µ + ∂_µ v^ν ρ^µ_ν ,   (3.5.7)

where ρ is an algebraic operator that summarizes the action on all the indices of T . In
group theory terms, it is the representation of the gl(4) Lie algebra on the space of (k, l)
tensors: so it satisfies
[ρ^µ_ρ , ρ^ν_σ ] = ρ^µ_σ δ^ν_ρ − ρ^ν_ρ δ^µ_σ .   (3.5.8)

An important particular case is the Lie derivative of the metric. In this case (3.5.6)
gives
(L_v g)_µν = v^ρ ∂_ρ g_µν + ∂_µ v^ρ g_ρν + ∂_ν v^ρ g_µρ .   (3.5.9)
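A sketch with sympy of this formula, for the rotation generator v = x∂_y − y∂_x of section 3.3 acting on the flat Euclidean metric of R² (our own example): the Lie derivative vanishes, i.e. the rotation preserves the metric.

```python
import sympy as sp

# Sketch of (3.5.9): (L_v g)_{mn} for g = diag(1, 1) and v = x d_y - y d_x.
x, y = sp.symbols('x y')
coords = [x, y]
g = sp.eye(2)
v = [-y, x]

Lg = sp.zeros(2, 2)
for m in range(2):
    for n in range(2):
        Lg[m, n] = (sum(v[r]*sp.diff(g[m, n], coords[r]) for r in range(2))
                    + sum(sp.diff(v[r], coords[m])*g[r, n] for r in range(2))
                    + sum(sp.diff(v[r], coords[n])*g[m, r] for r in range(2)))
print(Lg)  # the zero matrix: the rotation is a symmetry of the flat metric
```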

3.6 Manifolds
In the introduction we have seen that in general relativity the concept of curvature is
going to replace the concept of “force of gravity”. As an intuitive example of curved
space, we have used the surface of a sphere.

Crucially, the surface of a sphere cannot be covered using a single set of coordinates.
One often uses polar coordinates θ, ϕ, but ϕ is ill-defined at the two “poles” θ = 0 and
θ = π, in the sense that one cannot assign uniquely a value of ϕ at those two points. A
set of coordinates which does not have any such problem anywhere does not exist on the
sphere.
It is true that in this case one can see the sphere “from outside”, and use coordinates
x¹ , x² , x³ in R³ . This is however not an “intrinsic” description of the sphere, in the sense

that it involves an auxiliary “outer” space. When we want to describe curvature of three-
dimensional space, or even four-dimensional spacetime, using an auxiliary “outer space”
is not going to be practical; moreover, such an “outer space” would have no physical
meaning whatsoever.
Fortunately, mathematicians have developed formal tools to deal with spaces such as
the sphere without having to appeal to any “outer space”. The method is to use different
coordinate systems on different regions. This is rarely really needed in applications of
general relativity, since we have no empirical indication that our Universe has nontriv-
ial “topology” such as in the case of a sphere. Nevertheless, differential geometry, the
field of mathematics where the idea of curvature was first developed, uses this language
extensively, and it is useful sometimes to be able to have a look at its language.
We are now going to give a quick introduction to manifolds in any dimension.⁸ We
will soon return to our more physical concerns.

Definition 3.1 A homeomorphism ϕ between two spaces A and B is a continuous bijec-


tive map whose inverse ϕ−1 is also continuous. A and B are then said to be homeomor-
phic.

Definition 3.2 A manifold ( varietà) M of dimension N ≡ dim(M ) is a space which


is locally homeomorphic to RN : namely, such that for every point p ∈ M there exists an
open neighborhood Up of p which is homeomorphic to an open subset of RN .

In this definition, no concept of “derivative” entered. For this reason, we don’t know
yet what it means to take derivatives of functions on a manifold. A possible way is to give
coordinates on M . However, as we remarked earlier, on some spaces a single coordinate
system is not enough. The remedy for this is to introduce coordinate systems which cover
some open subsets, and then “glue” them together with smooth coordinate changes.
⁸ What follows is based on my lecture notes on group theory, where the idea of manifold was needed to define formally a Lie group.

Definition 3.3 Let M be a manifold of dimension N . A chart (U, ϕ) is an open set
U ⊂ M with a homeomorphism ϕ from U to a subset of RN . An atlas is a set of charts
{(Ui , ϕi )} such that ∪i Ui = M . The manifold M is said to be a smooth manifold
(varietà liscia) if it has an atlas such that the transition functions

g_ij ≡ ϕ_j ◦ ϕ_i⁻¹   (3.6.1)

are C ∞ (whenever Ui ∩ Uj is non-empty; otherwise the composition doesn’t exist).

Let us analyze this definition a bit. First of all, instead of considering homeomorphisms
around every point, we have “economized” by covering the space with open sets Ui ; each
of these can serve as a neighborhood for all the points it contains.

Example 3.4 • R^N is a manifold, and a smooth manifold.

• The sphere
S^N = { Σ_{i=1}^{N+1} x_i² = 1 } ⊂ R^{N+1}   (3.6.2)
is a smooth manifold. Let us see this explicitly, but for simplicity in the case N = 2, so that S² = {x₁² + x₂² + x₃² = 1} ⊂ R³ . Consider the two open sets:

U_S = S² − (0, 0, 1) ,   U_N = S² − (0, 0, −1) .   (3.6.3)

S and N stand for South and North: US is the sphere without the “North Pole”
(0, 0, 1), UN is the sphere without the “South Pole” (0, 0, −1). The map ϕS from US
to R2 is the stereographic projection illustrated in figure 9.

Figure 9: Stereographic projection.

A similar map ϕN can be defined from UN to R2 . The Emblem of the United Nations
results from applying ϕN to a subset of the surface of the Earth.

Since any point in S 2 has either US or UN (or both) as possible neighborhoods, and
since ϕS and ϕN are homeomorphisms, S 2 is a manifold.
This discussion generalizes easily to S N : again one can define two open sets, this
time US = S N − (0, 0, . . . , 1), and UN = S N − (0, 0, . . . , −1).
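One can verify that the two charts of S² glue smoothly. The explicit chart maps below are an assumption (the text only draws ϕ_S in figure 9), but with these conventions the transition function comes out as a simple inversion, manifestly C^∞ away from the origin; a sketch with sympy:

```python
import sympy as sp

# Sketch (chart conventions are our assumption): parametrize S^2 and take
#   phi_S(p) = (x1, x2)/(1 - x3)   (projection from the North Pole),
#   phi_N(p) = (x1, x2)/(1 + x3)   (projection from the South Pole).
# On the overlap the transition g_SN = phi_N o phi_S^{-1} should be
# (X, Y) -> (X, Y)/(X^2 + Y^2).
th, ph = sp.symbols('theta phi')
x1, x2, x3 = sp.sin(th)*sp.cos(ph), sp.sin(th)*sp.sin(ph), sp.cos(th)
X, Y = x1/(1 - x3), x2/(1 - x3)    # phi_S
U, V = x1/(1 + x3), x2/(1 + x3)    # phi_N

assert sp.simplify(X/(X**2 + Y**2) - U) == 0
assert sp.simplify(Y/(X**2 + Y**2) - V) == 0
print("g_SN(X, Y) = (X, Y)/(X^2 + Y^2), smooth wherever it is defined")
```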

Now, you can think of the ϕi as a way of assigning a coordinate system to each Ui : if
you think of a coordinate xm in RN as a map from RN to R, then in the chart Ui we can
also define a map
x_i^m : M → R ,   x_i^m (p) = x^m (ϕ_i (p)) ;   (3.6.4)

and we can think of this map as a coordinate on U_i . Here we are using Latin indices (m)
rather than Greek (µ) to emphasize that we are working in arbitrary dimension, rather
than four, and also that we are not using the Lorentzian structure that we emphasized
for special relativity.
On Ui ∩Uj (when it’s not empty), we have then two sets of coordinates, because we can
apply both maps ϕ_i and ϕ_j . The transition functions g_ij in (3.6.1) relate the coordinate
systems x_i^m and x_j^m . Notice that g_ij is a map from a subset of R^N to another, so it makes
sense to demand that it is C ∞ .

3.6.1 Diffeomorphisms∗

Since we now have a coordinate system on M , we can decide when a function is differen-
tiable:

Definition 3.4 A function f : M → R is C^∞ or smooth at p ∈ M when, if (U, ϕ) is the chart that contains p, the function f ◦ ϕ⁻¹ is C^∞ . The space of C^∞ functions on M will be denoted C^∞ (M ).

Again here ◦ is the composition of maps (from right to left). The trick is that ϕ⁻¹ takes
us from a subset of R^N to M , and then f from M to R; so f ◦ ϕ⁻¹ goes from a subset
of RN to R, and we can decide whether it’s differentiable or not. This just amounts to
taking derivatives in the coordinate system xm defined by the chart (U, ϕ).

Definition 3.5 Let M1 and M2 be two smooth manifolds. A map f from M1 to M2 is


defined to be C ∞ , or smooth, at p ∈ M1 if, calling (U, ϕ) the chart that contains p ∈ M1
and (V, ψ) the chart that contains f (p) ∈ M2 , the map ψ ◦ f ◦ ϕ−1 is C ∞ .
A map f : M1 → M2 is a diffeomorphism if it is bijective, and if f and f −1 are
both C ∞ everywhere.

Again the idea is that ψ ◦ f ◦ ϕ−1 is a map from a subset of Rn to another, and so we can
decide whether it is C ∞ by taking partial derivatives.
We have had to define a diffeomorphism after having defined a smooth manifold,
whereas we defined a homeomorphism before defining a manifold. We couldn’t have
defined a diffeomorphism before defining a smooth manifold, because we would have had
to decide what “differentiable” means! Whereas we know what “continuous” means as
soon as we have a topology.
When M1 = M2 = M , a diffeomorphism is simply a smooth map of M into itself. In
each local chart, such a map can be expressed as

x′m = x′m (x1 , . . . , xN ) . (3.6.5)

3.7 Vectors on manifolds


In section 3.3 we defined a vector field in R⁴ . With (3.3.2) we made sure that it was independent of the choice of coordinates. This suggests how to define a vector field on a smooth
manifold: we can just define it on each Ui (which is after all homeomorphic to an open set
in RN ), and then make sure that the definitions we have given coincide on each Ui ∩ Uj .
This leads to the following definition:

Definition 3.6 A vector field on a manifold is given by a vector field v_i on each U_i , such that on each intersection U_i ∩ U_j we have v_i = v_j . In terms of coordinates, this reads v_i^m ∂/∂x_i^m = v_j^n ∂/∂x_j^n , or in other words

v_i^m = (∂x_i^m/∂x_j^n) v_j^n .   (3.7.1)

3.7.1 Vectors as infinitesimal diffeomorphisms∗

Mathematicians however like being able to give definitions which are as independent from
coordinate systems as possible. For example we saw in section 3.3 that we can view a
vector field as an linear first-order differential operator (see (3.3.4)). We can define such
an operator in a way that does not refer in any way to a coordinate system. Consider
a family of maps (or “diffeomorphisms”, as we learned) ϕt from M to itself, depending
smoothly on a real parameter t, such that ϕ0 = Id, the identity map. Now we can define
an associate differential operator Dϕ as follows: it takes a function f (p) on M , and it

gives another function Dϕ [f ](p) defined by

D_ϕ [f ](p) ≡ lim_{δt→0} [ f (ϕ_δt (p)) − f (p) ] / δt .   (3.7.2)
This looks just like the definition of a derivative. In fact:

Example 3.5 Let M = R, and ϕ_t the family of diffeomorphisms given by translations: ϕ_t (x) = x + t. Then we see that D_ϕ [f ](x) ≡ lim_{δt→0} [ f (x + δt) − f (x) ] / δt = f ′(x).

More generally, if we have a family of maps that infinitesimally looks like x^µ → x^µ + δt v^µ (x) in a certain coordinate system, the associated D_ϕ will simply be v^µ (x) ∂_µ , which is our older definition (3.3.4).
Clearly, in the family of diffeomorphisms, only the ones very close to the identity
count in the definition (3.7.2): namely the maps ϕδt , for δt small. You can think of this as
another manifestation of the “small displacement” way of thinking about a vector field.
From this point of view we can also describe the meaning of the Lie bracket (3.3.10)
of two vector fields v and w (see figure 8). From a point p, we can follow first w for a
time δt to a nearby point p′ , and then from that point we can follow v. Or we can do the
opposite: first follow v to a nearby point p′′ , and then from that point we can follow w. In
general we will not end up in the same point, but in two different points. The difference
of these two points is a “small displacement” that is the value of [v, w] at p.
To conclude, we mention another possible point of view on vectors. We can view
the value of a vector field at a point as the velocity of an infinitesimal trajectory going
through it. If we think about a surface in R3 (such as a sphere), such a velocity vector is
tangent to the sphere. For this reason the value of a vector field at a certain point p is
said to belong to the tangent space Tp at that point. In pictures you will often see vectors
depicted this way.

3.8 Metrics on manifolds


A metric on a manifold can be defined in a similar way as in definition 3.6:

Definition 3.7 A metric on a manifold is given by a metric g^i_mn on each U_i , such that on each intersection U_i ∩ U_j the line elements ds_i² = ds_j² . In terms of coordinates, this reads g^i_mn dx_i^m dx_i^n = g^j_mn dx_j^m dx_j^n , or in other words

g^i_mn = (∂x_j^p/∂x_i^m)(∂x_j^q/∂x_i^n) g^j_pq .   (3.8.1)

Let us see a simple example.

Example 3.6 Let us consider S 2 ⊂ R3 . In R3 we have polar coordinates (r, θ, ϕ):

x1 = r sin(θ) cos(ϕ) , x2 = r sin(θ) sin(ϕ) , x3 = r cos(θ) . (3.8.2)

On S 2 , which is now defined as the locus {r = 1}, (θ, ϕ) are only good coordinates on the
intersection UN ∩ US .
The usual Euclidean metric on R³ , ds² = dx₁² + dx₂² + dx₃² , reads in these coordinates

ds² = dr² + r² (dθ² + sin²θ dϕ²) .   (3.8.3)

If we now restrict to the locus {r = 1}, which defines S² , we see that the metric is defined by the line element

ds²_{S²} = dθ² + sin²θ dϕ² .   (3.8.4)
In other words, in the coordinate system (θ, ϕ), the metric reads

g_mn = ⎛ 1      0     ⎞ .   (3.8.5)
       ⎝ 0   sin²(θ)  ⎠

At θ = 0 and θ = π, this quadratic form becomes singular! However, notice that we said
from the beginning that the coordinates (θ, ϕ) are only valid on the open set UN ∩ US , or
in other words for θ ∈ (0, π).
A coordinate system that covers all of UN can be obtained using the stereographic
projection in figure 9. Namely, one can use the coordinates (x, y) on the plane R2 as
coordinates on UN . In terms of the previous coordinates (θ, ϕ), these read
   
    x = tan(θ/2) cos(ϕ) ,   y = tan(θ/2) sin(ϕ) .   (3.8.6)
The line element (3.8.4) in these coordinates reads
    ds^2 = 4 (dx^2 + dy^2)/(1 + x^2 + y^2)^2 ;   (3.8.7)
the corresponding g_{mn} = 4(1 + x^2 + y^2)^{-2} δ_{mn} is also called the Fubini–Study metric. The
point θ = 0 corresponds to (x = 0, y = 0); notice that at this point the metric is now
perfectly non-degenerate.

This example teaches us that, before concluding that a metric has a problem at a cer-
tain point or locus, we should first make sure that we are working in the right coordinates.
This lesson will be very useful when we consider black holes.
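One can cross-check (3.8.7) by pulling back the Euclidean metric of R3 through the inverse of the stereographic map (3.8.6). Here is a short sympy sketch (the helper names are ours, not from the notes):

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)
rho2 = x**2 + y**2

# Inverse of the stereographic projection (3.8.6) for the unit sphere r = 1:
# tan(theta/2) = sqrt(x^2 + y^2) gives the embedding coordinates below.
X = sp.Matrix([2*x/(1 + rho2), 2*y/(1 + rho2), (1 - rho2)/(1 + rho2)])

# Pull back ds^2 = dx1^2 + dx2^2 + dx3^2: g_mn = (dX/dx^m) . (dX/dx^n)
J = X.jacobian([x, y])
g = sp.simplify(J.T * J)

# Compare with (3.8.7): g_mn = 4 (1 + x^2 + y^2)^(-2) delta_mn
expected = 4/(1 + rho2)**2 * sp.eye(2)
assert sp.simplify(g - expected) == sp.zeros(2, 2)
```

Note that the result is manifestly non-degenerate at (x, y) = (0, 0), as stated in the text.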

4 Curvature
In the previous section, we have seen that nonlinear coordinate changes lead us to consider
general point-dependent metrics. We will now see what it means for such metrics to be
curved.

4.1 Origin of the idea of curvature


Before we get into the formalism we will eventually use, let me start with an intuitive and
informal discussion that touches on some concepts used in other fields.
At an intuitive level, the word “curved” simply means “not straight”. For a trajectory
in R2 or R3 , we can quantify this intuition in several different ways. For example we can
look at the local radius of curvature; this can vary from point to point. Traditionally,
what is called “curvature” in this context is the inverse κ of the radius of curvature, so
that κ = 0 when the radius of curvature is infinite (that is, when the trajectory is going
straight).

Figure 10: The “radius of curvature” R for a trajectory at a point p.

How should we define the curvature of a surface? Intuitively, one might think that
any object on which a trajectory is curved should be considered curved. To see that this
is too naive, consider a cone. This looks curved, but if we have a paper cone and we cut
it appropriately, we can make it flat (see figure 11).9 The same holds for a cylinder. On
the other hand, a sphere cannot be flattened in this fashion, as you can convince yourself by
trying to cut open a ping-pong ball, say.
9
This statement is slightly imprecise at the tip of the cone, where the definitions we will introduce
later would in fact say that the cone is infinitely curved. At any other point, however, the statement is
correct.

These two examples show that our naive definition of curvature is not “intrinsic” to
the surface: it also has something to do with the way the surface is “realized” (one says
embedded ) inside an “outer space” R3 . We introduced the idea of manifold precisely to
be able to describe spaces in an intrinsic fashion, without any reference to an outer space.
Let me stress again that this is essential for physics: we want to be able to say that
spacetime is curved, without having to refer to an unphysical “outer space” for which
there is no evidence.

Figure 11: A cone and a cylinder look curved, but are in fact flat.

Fortunately, Gauss found that there is a definition of curvature that is “intrinsic”,


in the sense that it does not depend on the embedding. First consider all the possible
trajectories through a point p, and consider their curvatures. There are infinitely many
such trajectories, one for each outgoing direction from p. For a surface, we can consider
the minimum κ1 and the maximum κ2 of these curvatures. Now, the Gaussian curvature
is defined as κ1 κ2 . Gauss’ Theorema Egregium states that this is independent of the
embedding of the surface.
Indeed we see in figure 12 that a cylinder has κ1 = 0, so that its Gaussian curvature is
also 0; this is in agreement with our earlier observation in figure 11. On the other hand,
both κi ̸= 0 for the sphere, confirming that it is intrinsically curved.
Applications of this idea abound. For example, if we have a flat surface and we curve
it in one direction, it will have to remain flat in the other direction. This is useful in
making corrugated iron, cardboard, and even in eating pizza slices.
There are higher-dimensional generalizations. One basically takes an embedding in


Figure 12: Gaussian curvature is the product of the two principal curvatures κ1 and κ2 . For a
cylinder κ1 = 0, so that Gaussian curvature is zero, in agreement with figure 11. For a sphere
Gaussian curvature is non-zero.

RN , and one approximates the surface around p by a quadratic equation. One then
extracts a quadratic form from this quadratic equation; its determinant is the Gaussian
curvature.
This Gaussian curvature shows that there are ways to define curvature “intrinsically”,
or in other words even if one cannot view the surface or manifold from an “outer space”.
However, the definition we have seen so far is given in terms of such an outer space, even
if it is later proved to be independent of such an embedding. It would be nicer to have
definitions that do not require an embedding at all. For this, we will have to work a little
harder.

4.2 Covariant derivative


We will now give a more modern definition of curvature, that can be more easily adapted
to higher dimensions. The idea will basically be to generalize the observation in figure 13;
namely, that on a curved space one cannot carry around a vector while keeping it “always
in the same direction”, or “parallel”. If one tries to do so along a closed path, one ends
up in general with a vector that does not coincide with the one one started with.
We first have to decide what it means to keep a vector “always in the same direction”.
A possible first guess is to keep it constant. This does not make sense, however: if the
entries v^µ of a vector are constant in one coordinate system x^µ, its entries in another
coordinate system x'^µ will be v'^µ = (∂x'^µ/∂x^ν) v^ν, which will in general not be constant.
Another way to see the problem is to recall that ∂µ v ν is not a vector, as we remarked
in section 3.4. If v ν is constant in a coordinate system xµ , then ∂µ v ν = 0. If ∂µ v ν were a
tensor, from the tensor transformation law (3.4.6) it would follow that ∂µ′ v ′ν = 0 in any

Figure 13: Transporting a vector around a curved space brings it back rotated. The vector is
depicted as a little flag, which we transport along the big triangle; by the end of the journey, in
spite of our efforts to keep the flag “always pointing in the same direction”, we end up with a
flag that points east instead of north.

other coordinate system x′ν . Since ∂µ v ν is not a tensor, that conclusion is not valid.
Most of you have already encountered a situation where a derivative did not transform
well. Consider an electrically charged quantum field ψ. Under gauge transformations

Aµ → Aµ + ∂µ λ (4.2.1)

it transforms as ψ → eiλ(x) ψ. (If you prefer, you can run this argument almost verbatim
with ψ being a wave function.) The derivative ∂µ ψ transforms as ∂µ (eiλ ψ) = eiλ (∂µ ψ +
i∂µ λψ). Thus, if ∂µ ψ vanishes, it will not vanish any more after a gauge transformation.
This creates problems in writing gauge invariant Lagrangians. The way to repair this
problem is to define
Dµ ψ ≡ ∂µ ψ − iAµ ψ , (4.2.2)
so that now
Dµ ψ → eiλ (∂µ + i∂µ λ − iAµ − i∂µ λ)ψ = eiλ Dµ ψ . (4.2.3)
Thus Dµ ψ transforms “in the same way” as ψ itself; one says that Dµ is a “covariant”
derivative.
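The gauge covariance (4.2.3) is easy to verify with a computer algebra system; here is a minimal one-dimensional sympy sketch (the names D, psi, A, lam are ours):

```python
import sympy as sp

x = sp.symbols('x')
psi = sp.Function('psi')(x)
A = sp.Function('A')(x)
lam = sp.Function('lam')(x)

# Covariant derivative (4.2.2): D psi = (d/dx - iA) psi
D = lambda f, a: sp.diff(f, x) - sp.I*a*f

# Gauge transformation (4.2.1): A -> A + lam', psi -> e^{i lam} psi
transformed = D(sp.exp(sp.I*lam)*psi, A + sp.diff(lam, x))

# (4.2.3): the result is e^{i lam} D psi, i.e. D psi transforms like psi itself
assert sp.simplify(transformed - sp.exp(sp.I*lam)*D(psi, A)) == 0
```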
A further generalization of this concept appears for Yang–Mills theories, where Aµ =
T a Aaµ is a vector which takes values in a Lie algebra: each of the T a is a matrix, such that
the commutator of two of them closes on a third: [T^a, T^b] = f^{ab}_c T^c, where the numbers
f^{ab}_c are called structure constants. The covariant derivative of a field will look like

(Dµ ψ)i ≡ ∂µ ψ i + Aiµ j ψ j . (4.2.4)

All this motivates us to look for a derivative of a vector which is similarly covariant.
We can just try to add to the derivative an analogue of the four-vector electro-magnetic
potential Aµ , or of its Yang–Mills generalization Aiµ j . In our case, the field on which we
want to act is a vector v µ : the index i is replaced by an index µ. So the potential we
should use has the form Aνµ ρ . Traditionally one actually uses the symbol Γ rather than
A; also, one uses in this context the symbol ∇ rather than the symbol D. So we are led
to define the covariant derivative of a vector as

∇µ v ν ≡ ∂µ v ν + Γνµρ v ρ . (4.2.5)

Γνµρ is called a connection. (In fact, mathematicians use the same word also for the electro-
magnetic and Yang–Mills potentials we reviewed above.) The fact that it has three indices
might be a bit confusing, but remember that these indices have a different role and origin:
the lower-left one has to do with the derivative, the other two are the entries of a matrix.
Perhaps it would be clearer (but clumsier) to write it as

(Γµ )νρ , (4.2.6)

which would emphasize its similarity to the Yang–Mills potential Aiµ j .


Let us now check that (4.2.5) indeed transforms well under diffeomorphisms (coordi-
nate changes), which was what motivated the definition in the first place. We need to
postulate an appropriate “gauge transformation”; we include a first term that would be
present if Γ were a tensor, and a second additional term (similar to ∂µ λ in (4.2.1)) to
cancel the non-tensorial piece in (3.4.8):

    Γ^ν_{µρ} → Γ'^ν_{µρ} = (∂x^λ/∂x'^µ) (∂x^τ/∂x'^ρ) [ (∂x'^ν/∂x^σ) Γ^σ_{λτ} − ∂^2 x'^ν/(∂x^λ ∂x^τ) ] .   (4.2.7)

With this transformation indeed we have


    ∇_µ v^ν → ∇'_µ v'^ν = (∂x^ρ/∂x'^µ) (∂x'^ν/∂x^σ) ∇_ρ v^σ .   (4.2.8)
Thus we can call (4.2.5) the covariant derivative of the vector v^µ. We can define more generally
a covariant derivative in the direction of any vector:

∇w v µ = wν ∇ν v µ . (4.2.9)

An alternative way of thinking about ∇ is the following. Recall that v µ can be


thought of as the coefficients of a linear first-order differential operator v = v µ ∂µ . This is

coordinate-invariant, in the sense that v = v ′µ ∂µ′ . We can now define ∇µ so that it is the
usual derivative ∂µ on the coefficients v ν , and so that

∇µ ∂ν = Γρµν ∂ρ . (4.2.10)

Then

∇µ (v ν ∂ν ) = (∂µ v ν )∂ν + v ν (Γρµν ∂ρ ) = (∂µ v ν + Γνµρ v ρ )∂ν = (∇µ v ν )∂ν . (4.2.11)

This point of view can be clearer sometimes, especially in the context of the tetrad for-
malism (which we consider in appendix B).
A connection gives us a way to take a covariant derivative of a vector. So, even if
the expression “constant vector” doesn’t really make sense, it makes sense to define a
covariantly constant vector as a v µ such that ∇µ v ν = 0.
Another way of saying this is that on a curved space there is no natural way of
comparing one tangent vector v at a point p to another tangent vector v ′ at another
point p′ . In flat space, we would perform this comparison by “transporting” v ′ at p while
keeping it “parallel”. On a general manifold, this “parallel transport” is not obvious, and
corresponds in fact to the choice of connection Γ. To see this even more clearly, let us
suppose a vector field v ν has the value v0ν at a point with coordinates xµ : v ν (x) = v0ν .
Then the vector field will be covariantly constant in the direction δxµ if at the point
xµ + δxµ it is equal to
v ν (xµ + δxµ ) = v0ν − δxµ Γνµρ v0ρ . (4.2.12)
Indeed we can see that in this situation δxµ ∇µ v ν = 0 (at the point xµ ). If we wanted, we
might even define the covariant derivative this way: the coefficients Γ tell us exactly in
which way a vector has to rotate (in a given coordinate system) for it to be considered
“parallel”, or “covariantly constant”. A finite version of (4.2.12) would be this: if along
a curve γ the vector v is such that
∇X v = 0 (4.2.13)
for X tangent to γ, then we say that v has been “parallel transported” along γ. For
example, the flag in figure 13 has been parallel transported along the triangular curve.
Conversely, we can think of the covariant derivative as the infinitesimal version of parallel
transport.
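The phenomenon of figure 13 can be reproduced numerically from (4.2.13): transport a vector around the latitude circle θ = θ0 of the unit sphere (3.8.4). The connection coefficients quoted in the comments are those of the distinguished connection of section 4.4 for the round metric; the step count and variable names in this pure-Python sketch are ours:

```python
import math

# Transport around the latitude theta = theta0, curve phi -> (theta0, phi).
# (4.2.12) gives dv^mu/dphi = -Gamma^mu_{phi rho} v^rho, with (see section 4.4)
#   Gamma^theta_{phi phi} = -sin(th) cos(th) ,  Gamma^phi_{theta phi} = cot(th) .
theta0 = 1.0
s, c = math.sin(theta0), math.cos(theta0)

def rhs(v):
    vth, vph = v
    return (s*c*vph, -(c/s)*vth)

v = [1.0, 0.0]                  # start pointing along d/dtheta
N = 20000
h = 2*math.pi/N
for _ in range(N):              # 4th-order Runge-Kutta around the full circle
    k1 = rhs(v)
    k2 = rhs([v[i] + h/2*k1[i] for i in range(2)])
    k3 = rhs([v[i] + h/2*k2[i] for i in range(2)])
    k4 = rhs([v[i] + h*k3[i] for i in range(2)])
    v = [v[i] + h/6*(k1[i] + 2*k2[i] + 2*k3[i] + k4[i]) for i in range(2)]

# The vector comes back rotated: v^theta(2 pi) = cos(2 pi cos(theta0)) != 1.
assert abs(v[0] - math.cos(2*math.pi*c)) < 1e-8
assert abs(v[1] + math.sin(2*math.pi*c)/s) < 1e-8
```

The rotation angle 2π cos θ0 differs from a full turn by 2π(1 − cos θ0), the solid angle enclosed by the latitude — the finite version of the phenomenon in figure 13.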
On a manifold which can be covered by a single patch, we can choose Γ in any way we
like in a certain coordinate system, and then define it via (4.2.7) on any other coordinate
system. On a manifold which has to be covered by more than one patch (such as a sphere),

however, at this point it is not clear whether a Γ that transforms as in (4.2.7) exists at
all; or, if it exists, if it is unique, or if there are many. We will see in section 4.4 that the
latter is the case, and also that there is a distinguished Γ, which is the one we implicitly
used in figure 13.

4.3 Covariant derivatives of other tensors


Before we investigate existence and uniqueness of a connection Γ, let us also define covari-
ant derivatives of other objects. Let us start with one-forms. As we have seen in (3.4.9),
the antisymmetric derivative ∂[µ ων] is already a tensor, but the non-antisymmetrized
derivative ∂µ ων is not. A priori one might imagine using a different connection Γ̃. It
would be nice, however, to have a covariant derivative that obeys the Leibniz identity.
Consider the “inner product”
ω(v) = ωµ v µ . (4.3.1)
This is a function: the fact that the indices are saturated signals that it transforms without
any factors of ∂x'/∂x. Thus its derivative ∂ν (ωµ v^µ) is already automatically a tensor. The
Leibniz identity would say

∂ν (v µ ωµ ) = (∇ν v µ )ωµ + v µ (∇ν ωµ ) . (4.3.2)

We know already ∇ν v µ ; so we can define ∇ν ωµ so that (4.3.2) holds. This gives

∇µ ων = ∂µ ων − Γρµν ωρ . (4.3.3)

Notice the similarity with (4.2.5); again µ has to do with the direction where the derivative
is being taken. Once again we can regard (4.3.3) to arise from ω = ωµ dxµ and

∇µ (dxν ) = −Γνµρ dxρ , (4.3.4)

just like in (4.2.10).


In a similar way one can define covariant derivatives of any tensor:

    ∇_µ T^{µ1...µk}_{ν1...νl} = ∂_µ T^{µ1...µk}_{ν1...νl}
        + Γ^{µ1}_{µρ} T^{ρµ2...µk}_{ν1...νl} + Γ^{µ2}_{µρ} T^{µ1ρµ3...µk}_{ν1...νl} + ... + Γ^{µk}_{µρ} T^{µ1...µk−1ρ}_{ν1...νl}   (4.3.5)
        − Γ^ρ_{µν1} T^{µ1...µk}_{ρν2...νl} − Γ^ρ_{µν2} T^{µ1...µk}_{ν1ρν3...νl} − ... − Γ^ρ_{µνl} T^{µ1...µk}_{ν1...νl−1ρ} .

The formula is complicated, but the idea is clear: on each upper index Γ acts as in (4.2.5),
on each lower index as in (4.3.3).

4.4 Levi-Civita connection
We have seen that a connection Γ, defined as an object that transforms as in (4.2.7), allows
us to define a covariant derivative. How many such objects exist on a manifold? Which one
should we use to define the notion of curvature we want for gravity?
Suppose first we have two connections Γ1 and Γ2 . Both of them will transform as in
(4.2.7). It follows that their difference transforms like a (1, 2)-tensor:

    (Γ1 − Γ2)^ν_{µρ} → (Γ1 − Γ2)'^ν_{µρ} = (∂x^λ/∂x'^µ) (∂x^τ/∂x'^ρ) (∂x'^ν/∂x^σ) (Γ1 − Γ2)^σ_{λτ} ,   (4.4.1)
simply because the second term in (4.2.7) cancels out. This means that, given a connection
Γ, we can build infinitely many others by adding to it a (1, 2)-tensor. (In mathematics,
this is often called an “affine space”: it is a set whose differences are in a vector space.)
It remains to see if there exists any such object at all. We will find one by imposing
some “minimality” condition.
The first condition comes by recalling that the antisymmetric derivative ∂[µ ων] is al-
ready a tensor. As far as this combination is concerned, introducing the covariant deriva-
tive was an overkill. This suggests that one should impose that the antisymmetrized
covariant derivative is the same as the antisymmetrized ordinary derivative:

∇[µ ων] = ∂[µ ων] . (4.4.2)

Looking at (4.3.3), we see that this imposes

    T^µ_{νρ} ≡ Γ^µ_{[νρ]} = 0 .   (4.4.3)

T is called torsion;10 from (4.2.7) it is immediate to see that it is a tensor. Let me


stress again that having vanishing torsion is not a mathematical necessity; it is just a
“minimality” condition that we are imposing in order to single out a preferred connection.
Notice moreover that an analogue of (4.4.2) holds more generally for k-forms: ∇_{[µ1} ω_{µ2...µk+1]} =
∂_{[µ1} ω_{µ2...µk+1]}. Again this plays well with our comment about k-forms after (3.4.10).
So far Γ has nothing to do with a metric. Given a metric gµν , a natural condition to
impose is that it is covariantly constant, in the sense of (4.3.5):

0 = ∇µ gνρ = ∂µ gνρ − Γσµν gσρ − Γσµρ gνσ = ∂µ gνρ − 2Γσµ(ν gρ)σ . (4.4.4)
10
To gain an intuitive understanding of what torsion is, go back to (4.2.12). Parallel transport v^ν
along δx^µ = w^µ; then exchange roles, and parallel transport w^ν along δx^µ = v^µ; the difference is exactly
(Γ^ν_{µρ} − Γ^ν_{ρµ}) v^µ w^ρ.

The Levi-Civita connection ΓLC is a Γ such that both (4.4.3) and (4.4.4) hold; in
other words, a connection that has no torsion and under which the metric is covariantly
constant (or, as we often say, which is “compatible” with the metric). We will show that
it exists by determining it explicitly. Let us rewrite (4.4.4):

    ∂_µ g_{νρ} = 2Γ^σ_{µ(ν} g_{ρ)σ} = 2Γ_{(ν|µ|ρ)} = Γ_{ρµν} + Γ_{νµρ} .   (4.4.5)

We lowered the index on Γ by putting it first: Γµνρ ≡ gµσ Γσνρ . Symmetrizing (4.4.5) in µ
and ν, and using (4.4.5) again:
    ∂_{(µ} g_{ν)ρ} = Γ_{ρµν} + Γ_{(µν)ρ} = Γ_{ρµν} + ½ ∂_ρ g_{µν} .   (4.4.6)
Comparing the first and last expressions in this equality we have determined Γ_{ρµν}, as
promised. Going back to Γ^µ_{νρ} = g^{µσ} Γ_{σνρ}:

    Γ^µ_{νρ} = ½ g^{µσ} (∂_ν g_{ρσ} + ∂_ρ g_{νσ} − ∂_σ g_{νρ}) .   (Levi-Civita connection.)   (4.4.7)

So now we know that at least one connection ΓLC exists, and we have an expression
for it in terms of the metric. By our observation around (4.4.1), we also know that there
are infinitely many others, obtained by summing a tensor δΓ to ΓLC . In what follows, we
will focus on the Levi-Civita and forget about the tensor δΓ. Physically this latter tensor
is just some extra matter field that can always be introduced later, if we find a motivation
(experimental or theoretical) to do so; it is not conceptually different from introducing
any other field in our action.
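Formula (4.4.7) is mechanical enough to implement directly; here is a sympy sketch (the function name is ours) which computes the Levi-Civita connection of the round metric (3.8.4) and checks two of its nonzero components:

```python
import sympy as sp

th, ph = sp.symbols('theta phi', positive=True)

def christoffel(g, coords):
    """Levi-Civita connection, eq. (4.4.7):
    Gamma^mu_{nu rho} = 1/2 g^{mu s} (d_nu g_{rho s} + d_rho g_{nu s} - d_s g_{nu rho})."""
    ginv = g.inv()
    n = len(coords)
    return [[[sp.simplify(sum(
        ginv[m, s]*(sp.diff(g[r, s], coords[nu]) + sp.diff(g[nu, s], coords[r])
                    - sp.diff(g[nu, r], coords[s])) for s in range(n))/2)
        for r in range(n)] for nu in range(n)] for m in range(n)]

# Round metric on S^2, eq. (3.8.4); Gamma[mu][nu][rho] = Gamma^mu_{nu rho}
Gamma = christoffel(sp.Matrix([[1, 0], [0, sp.sin(th)**2]]), [th, ph])

assert sp.simplify(Gamma[0][1][1] + sp.sin(th)*sp.cos(th)) == 0   # Gamma^theta_{phi phi}
assert sp.simplify(Gamma[1][0][1] - sp.cos(th)/sp.sin(th)) == 0   # Gamma^phi_{theta phi}
```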

4.5 The Riemann tensor


We still haven’t seen how to define curvature. Let us go back for a moment to our analogy
with the electromagnetic field. Recall that the four-vector potential Aµ was an inspiration
for us to introduce the concept of connection Γµνρ .
In electromagnetism, the field-strength is actually expressed by Fµν = 2∂[µ Aν] . One
way this occurs is if we take two covariant derivatives: from (4.2.2) we get

    [Dµ , Dν ]ψ = 2D[µ Dν] ψ = −iFµν ψ .   (4.5.1)

This indicates that the effect of two covariant derivatives doesn’t commute, unlike for two
ordinary partial derivatives: ∂[µ ∂ν] ψ = 0. Notice that the result does not contain any
derivative of ψ any more.

What happens if we do the same with covariant derivatives? Let us first compute

∇µ ∇ν v ρ = ∇µ (∂ν v ρ + Γρνσ v σ )
= ∂µ (∂ν v ρ + Γρνσ v σ ) + Γρµλ (∂ν v λ + Γλνσ v σ ) − Γλµν (∂λ v ρ + Γρλσ v σ ) (4.5.2)
= ∂µ ∂ν v ρ + (∂µ Γρνσ )v σ + Γρµλ Γλνσ v σ + 2Γρ(µ|λ ∂|ν) v λ − Γλµν (∂λ v ρ + Γρλσ v σ ) .

In the second step we have used the definition of a covariant derivative for a tensor,
(4.3.5). Notice the index symmetrization in one of the terms: recall footnote 7. We can
now compute the commutator of two covariant derivatives, such as in (4.5.1). This is
again antisymmetric in µ and ν, so many terms in (4.5.2) disappear. If we define

Rρ σµν = 2(∂[µ Γρν]σ + Γρ[µ|λ Γλν]σ ) = ∂µ Γρνσ − ∂ν Γρµσ + Γρµλ Γλνσ − Γρνλ Γλµσ , (4.5.3)

we have
[∇µ , ∇ν ]v ρ = 2∇[µ ∇ν] v ρ = Rρ σµν v σ . (4.5.4)
Very similarly to (4.5.1), there are no derivatives of v any more.
Rµ νρσ in (4.5.3) is called Riemann tensor. As the name says, it transforms as a
tensor, in spite of being defined in terms of the connection Γ. This can be seen very
laboriously from its definition (4.5.3), but it follows more quickly from (4.5.4), since the
left-hand side is a tensor by construction. Just like Fµν is gauge-invariant while Aµ is not,
so Rµ νρσ is a tensor while Γ is not. The analogy with the field-strength of a non-abelian gauge
field is even more convincing: recall that it is defined as

Fµν = 2(∂[µ Aν] + A[µ Aν] ) = 2∂[µ Aν] + [Aµ , Aν ] , (4.5.5)

which is formally identical to (4.5.3) once we remember that F and A are matrices. In
other words, (4.5.3) is a particular case of (4.5.5) where the indices of the matrices coincide
with spacetime indices.
A derivative in one direction is an infinitesimal difference; a covariant derivative is the
infinitesimal version of the parallel transport whose intuitive meaning is shown in figure
13 (see discussion around (4.2.13)). The fact that ∇µ and ∇ν don't commute suggests
that if we parallel transport a vector first in direction µ and then in direction ν, or first in
direction ν and then in direction µ, we won’t obtain the same result; the Riemann tensor
in (4.5.4) should measure this. Let us see this explicitly. Consider a vector v ρ at a point
p with coordinates xµ ; if we parallel transport it along direction µ by an amount ϵ, we
get to a point p′ ; according to (4.2.12), the parallel transported vector at that point is

v0ρ − ϵΓρµσ v0σ . (4.5.6)

Let us now parallel transport this along direction ν; applying again (4.2.12) we get

    v_0^ρ − ϵ Γ^ρ_{µσ}(p) v_0^σ − ϵ' Γ^ρ_{νσ}(p') [ v_0^σ − ϵ Γ^σ_{µλ}(p) v_0^λ ] .   (4.5.7)

Since p′ has coordinates xρ + ϵδµρ , Γρνσ (p′ ) ∼ Γρνσ (p) + ϵ∂µ Γρνσ (p). So (4.5.7) reads

v0ρ − (ϵΓρµσ + ϵ′ Γρνσ )v0σ − ϵϵ′ (∂µ Γρνσ − Γρνλ Γλµσ )v0σ . (4.5.8)

Let us now perform the same operation in reverse order, that is first in direction ν and
then in direction µ, and subtract the result from (4.5.8); see also footnote 6. We get

ϵϵ′ Rρ σµν v0σ . (4.5.9)

As we expected from (4.5.4), this measures how much parallel transport fails to commute. It
can also be interpreted as the rotation of a vector parallel transported along a rectangle
of area ϵϵ′ spanning the plane µν. This is the infinitesimal version of the phenomenon
shown in figure 13.
Thus the Riemann tensor measures curvature. Since it is a tensor, it does so in a
coordinate-invariant way. If it is non-zero in a coordinate system, it will be non-zero in
any coordinate system, since it transforms like a tensor! Given a non-trivial metric, if its
Riemann tensor is non-zero, it cannot be equal to η in any coordinate system. For the
Rindler metric (3.2.4), on the other hand, the Riemann tensor is zero. [Exercise!]
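The exercise can be done symbolically: compute Γ from (4.4.7) for the Rindler metric (with a = 1, dropping the two flat transverse directions) and then the Riemann tensor from (4.5.3). A sympy sketch, with variable names of our choosing:

```python
import sympy as sp

eta, xi = sp.symbols('eta xi')
coords = [eta, xi]
g = sp.Matrix([[-sp.exp(2*xi), 0], [0, sp.exp(2*xi)]])  # Rindler metric, a = 1
ginv = g.inv()
n = 2

# Levi-Civita connection (4.4.7): Gam[mu][nu][rho] = Gamma^mu_{nu rho}
Gam = [[[sum(ginv[m, s]*(sp.diff(g[r, s], coords[nu]) + sp.diff(g[nu, s], coords[r])
             - sp.diff(g[nu, r], coords[s])) for s in range(n))/2
         for r in range(n)] for nu in range(n)] for m in range(n)]

def R(rho, sig, mu, nu):
    """Riemann tensor R^rho_{sigma mu nu}, eq. (4.5.3)."""
    val = sp.diff(Gam[rho][nu][sig], coords[mu]) - sp.diff(Gam[rho][mu][sig], coords[nu])
    val += sum(Gam[rho][mu][l]*Gam[l][nu][sig] - Gam[rho][nu][l]*Gam[l][mu][sig]
               for l in range(n))
    return sp.simplify(val)

# Every component vanishes: Rindler space is flat, as claimed.
assert all(R(r, s, m, nu) == 0
           for r in range(n) for s in range(n) for m in range(n) for nu in range(n))
```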

4.6 Properties of the Riemann tensor


We will now derive some properties of the Riemann tensor.
First, we know already by the definition (4.5.3) that it is antisymmetric in the last
two indices:
Rµνρσ = −Rµνσρ . (4.6.1)

Next, let us remark that there are analogues of the property (4.5.4) on any tensor. Let
us see this on a one-form ω. It would be possible to repeat the steps that led to (4.5.4), but
it is more fun to proceed as follows. First, since ωρ v ρ is a function, ∇ν (ωρ v ρ ) = ∂ν (ωρ v ρ ).
Because of the zero torsion property, then we have

∇[µ ∇ν] (ωρ v ρ ) = ∂[µ ∂ν] (ωρ v ρ ) = 0 . (4.6.2)

On the other hand, using the Leibniz property (4.3.2) of ∇, we get

∇[µ ∇ν] (ωρ v ρ ) = (∇[µ ∇ν] ωρ )v ρ + ωρ (∇[µ ∇ν] v ρ ) . (4.6.3)

(The terms with first derivatives cancel out because of the index antisymmetrization.)
Thus we obtain
2∇[µ ∇ν] ωρ = −Rσ ρµν ωσ . (4.6.4)
If we antisymmetrize the index ρ as well, we get −R^σ_{[ρµν]} ω_σ = 2∇_{[µ}∇_ν ω_{ρ]}; recall from
(4.4.2) that ∇_{[ν} ω_{ρ]} = ∂_{[ν} ω_{ρ]}, and from our comment below (4.4.2) that the same is true
for k-forms; so in fact ∇_{[µ}∇_ν ω_{ρ]} = ∇_{[µ}∂_ν ω_{ρ]} = ∂_{[µ}∂_ν ω_{ρ]}, which is zero because partial
derivatives commute. So R^σ_{[ρµν]} ω_σ = 0; since this has to be true for any ω, it follows that

Rσ [ρµν] = 0 , (4.6.5)

which is sometimes called the first Bianchi identity.


There are analogues of (4.5.4) and (4.6.4) for any tensor; it is easy to see the pattern.
For example, for a (0, 2) tensor:

2∇[µ ∇ν] Tρσ = −Rλ ρµν Tλσ − Rλ σµν Tρλ . (4.6.6)

For example, if we apply this to the metric gρσ :

2∇[µ ∇ν] gρσ = −Rσρµν − Rρσµν . (4.6.7)

The left-hand side vanishes because of (4.4.5); we have lowered indices with the metric,
as usual. Thus:
Rµνρσ = −Rνµρσ , (4.6.8)
which is of course similar to (4.6.1).
We will now derive one last algebraic property. Start by rewriting (4.6.5) as

Rµνρσ = 2Rµ[σρ]ν . (4.6.9)

Now antisymmetrize independently the indices µ and ν, and use (4.6.1), (4.6.8):

Rµνρσ = 2R[µ|[σρ]|ν] = 2R[σ|[µν]|ρ] = Rσρνµ ; (4.6.10)

or in other words
Rµνρσ = Rρσµν . (4.6.11)
So the Riemann tensor is symmetric under exchange of the first pair of indices and the
second pair.
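All three algebraic symmetries can be verified explicitly on the round sphere (3.8.4), whose only independent component is R_{θϕθϕ} = sin^2 θ. A sympy sketch (names ours):

```python
import sympy as sp
from itertools import product

th, ph = sp.symbols('theta phi', positive=True)
coords = [th, ph]
g = sp.Matrix([[1, 0], [0, sp.sin(th)**2]])   # round metric (3.8.4)
ginv = g.inv()
n = 2

# Levi-Civita connection (4.4.7)
Gam = [[[sum(ginv[m, s]*(sp.diff(g[r, s], coords[nu]) + sp.diff(g[nu, s], coords[r])
             - sp.diff(g[nu, r], coords[s])) for s in range(n))/2
         for r in range(n)] for nu in range(n)] for m in range(n)]

def Rup(rho, sig, mu, nu):
    """R^rho_{sigma mu nu}, eq. (4.5.3)."""
    return (sp.diff(Gam[rho][nu][sig], coords[mu]) - sp.diff(Gam[rho][mu][sig], coords[nu])
            + sum(Gam[rho][mu][l]*Gam[l][nu][sig] - Gam[rho][nu][l]*Gam[l][mu][sig]
                  for l in range(n)))

# Lower the first index: R_{mu nu rho sigma}
R = {i: sp.simplify(sum(g[i[0], a]*Rup(a, i[1], i[2], i[3]) for a in range(n)))
     for i in product(range(n), repeat=4)}

for m, nu, r, s in product(range(n), repeat=4):
    assert sp.simplify(R[m, nu, r, s] + R[m, nu, s, r]) == 0   # (4.6.1)
    assert sp.simplify(R[m, nu, r, s] + R[nu, m, r, s]) == 0   # (4.6.8)
    assert sp.simplify(R[m, nu, r, s] - R[r, s, m, nu]) == 0   # (4.6.11)

assert sp.simplify(R[0, 1, 0, 1] - sp.sin(th)**2) == 0         # R_{theta phi theta phi}
```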
We can use the algebraic conditions we have obtained so far to count the number of
independent components of the Riemann tensor. We can consider it as a matrix R_IJ,
where each index I, J is actually a pair of antisymmetrized ordinary indices. If the
spacetime dimension is d, this gives (d(d−1)/2)^2 = ¼ d^2 (d−1)^2 components. Moreover, we have to take into
account the first Bianchi identity (4.6.5), which has d × (d(d−1)(d−2)/6) components. All in all this
gives

    ¼ d^2 (d−1)^2 − (1/6) d^2 (d−1)(d−2) = (1/12) d^2 (d^2 − 1)   (4.6.12)

components. In d = 4, this means 20 independent components.
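The counting (4.6.12) is easy to tabulate; a plain-Python sketch:

```python
# Independent components of the Riemann tensor, as in (4.6.12):
# (d choose 2)^2 pair-antisymmetric components, minus the d * (d choose 3)
# conditions imposed by the first Bianchi identity (4.6.5).
def riemann_components(d):
    pairs = (d*(d - 1)//2)**2
    bianchi = d * (d*(d - 1)*(d - 2)//6)
    return pairs - bianchi

assert riemann_components(4) == 20
assert all(riemann_components(d) == d*d*(d*d - 1)//12 for d in range(2, 10))
```

In particular d = 2 gives a single component, consistent with the sphere example where everything reduces to R_{θϕθϕ}.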
Finally, let us apply (4.6.6) to ∇ρ ωσ , where ω is any 1-form:

2∇[µ ∇ν] ∇ρ ωσ = −Rλ ρµν ∇λ ωσ − Rλ σµν ∇ρ ωλ . (4.6.13)

If we antisymmetrize the index ρ as well, one of the terms on the right-hand side vanishes
because of (4.6.5):
2∇[µ ∇ν ∇ρ] ωσ = −Rλ σ[µν ∇ρ] ωλ . (4.6.14)
On the other hand, we can act with an extra derivative on (4.6.4) to compute

    2∇_{[µ}(∇_ν ∇_{ρ]} ω_σ) = ∇_{[µ}(−R^λ_{|σ|νρ]} ω_λ) = −(∇_{[µ} R^λ_{|σ|νρ]}) ω_λ − R^λ_{σ[νρ} ∇_{µ]} ω_λ .   (4.6.15)

Comparing this with (4.6.14), we see that

∇[µ Rνρ]λσ = 0 . (4.6.16)

This is a differential condition, unlike the algebraic ones we have derived so far. It is
called the second Bianchi identity, or sometimes just Bianchi identity.

4.7 Geodesics
Another possible way to define a curved space is that “there is no straight line”. What
does “going straight” mean when the line element ds2 = gµν dxµ dxν depends on the point?
In flat space, straight lines minimize distance. As we reviewed in section 2, straight
paths in spacetime maximize proper time difference. For a general gµν , we might try to
generalize this definition. The proper time difference of an infinitesimal displacement in
spacetime is
    dτ = √(−ds^2) = √(−ds^2/dλ^2) dλ = √(−ẋ^2) dλ ,   ẋ^2 ≡ g_{µν} ẋ^µ ẋ^ν ,   (4.7.1)

where ẋ^µ ≡ dx^µ/dλ. Here λ is just some coordinate parameterizing the worldline (the trajectory in spacetime). Summing many such contributions along a path, we define a geodesic
in such a way that
    S = −m ∫ dλ √(−g_{µν} ẋ^µ ẋ^ν) = −m ∫ dλ √(−ẋ^2)   (4.7.2)

is minimized. Notice that for g = η this is simply the action for a relativistic particle
(2.1.9). (In that equation ẋ2 = ẋi ẋi , whereas in our current definition (4.7.1) ẋ2 also
includes time.) So (4.7.2) is simply the action for a free massive particle in a general
metric. As we have anticipated and will see in the next section, the physical meaning
of the metric will be the gravitational field. So in fact this will become the action for a
particle only subject to gravitational field, or in other words in free fall.
We can obtain equations of motion for xµ (λ) from (4.7.2) in the usual way, by varying

the action. Since δ√f = δf/(2√f), we can just vary the argument of the square root:

    δS ∝ ∫ (1/√(−ẋ^2)) δ(ẋ^2) = ∫ (1/√(−ẋ^2)) (δx^ρ ∂_ρ g_{µν} ẋ^µ ẋ^ν + 2 δẋ^ρ g_{ρν} ẋ^ν)   (4.7.3)
       = ∫ (1/√(−ẋ^2)) δx^ρ [ ∂_ρ g_{µν} ẋ^µ ẋ^ν − 2 ∂_λ(g_{ρν} ẋ^ν) + (∂_λ(ẋ^2)/ẋ^2) g_{ρν} ẋ^ν ] ;
in the second line we have integrated by parts. The equations of motion are now obtained
by setting the square bracket to zero:
    ∂_ρ g_{µν} ẋ^µ ẋ^ν − 2 ∂_λ(g_{ρν} ẋ^ν) + (∂_λ(ẋ^2)/ẋ^2) g_{ρν} ẋ^ν = 0 .   (4.7.4)
So far, λ was an arbitrary coordinate on the worldline, but now that we have obtained
the equations of motion we are allowed to take λ = τ , the proper time. Going back to
(4.7.1), we see that this implies

ẋ2 = −1 (m ̸= 0) . (4.7.5)

The last term in (4.7.4) then drops out:


 
    0 = ẍ^µ + g^{µν} ( ∂_ρ g_{σν} − ½ ∂_ν g_{ρσ} ) ẋ^ρ ẋ^σ   (4.7.6)
or in other words, recalling the expression (4.4.7) for the Levi-Civita connection:

ẍµ + Γµρσ ẋρ ẋσ = 0 . (4.7.7)

If we don’t use the choice (4.7.5) and use a completely arbitrary λ, the derivation is
more complicated: we have to use

∂λ (ẋ2 ) = ∂λ gµν ẋµ ẋν + 2gµν ẍµ ẋν = 2gµν ẋµ (ẍν + Γνρσ ẋρ ẋσ ) (4.7.8)

and the equation of motion becomes

    ( ẋ^2 δ^µ_ν − ẋ^µ g_{νλ} ẋ^λ ) ( ẍ^ν + Γ^ν_{ρσ} ẋ^ρ ẋ^σ ) = 0 .   (4.7.9)

The matrix in the first bracket has a kernel, spanned by ẋν . So we get a slight general-
ization of (4.7.7):
ẍµ + Γµρσ ẋρ ẋσ = f ẋµ , (4.7.10)
where f is some function of λ. This looks more general than (4.7.7), but in fact any
solution can be made to solve (4.7.7) after reparameterizing λ so as to satisfy
our earlier assumption (4.7.5).
Another point of view on (4.7.8) is that the derivative with respect to τ of (4.7.5)
implies a linear combination of the equations (4.7.7):

ẋ2 = −1 ⇒ 2gµν ẋµ (ẍν + Γνρσ ẋρ ẋσ ) = 0 . (4.7.11)

This implies that one of the (4.7.7) can be eliminated in favor of (4.7.5); this fact will be
very useful later.
We can also use the trick in (2.1.13): consider the action
    S̃ = ½ ∫ dλ ( ẋ^2/e − e m^2 ) .   (4.7.12)

For m ̸= 0, it is classically equivalent to (4.7.2), as one can see by replacing e using its
equation of motion. Now varying (4.7.12) is similar to (4.7.3), but without the last term;
we hence get back to (4.7.7). So we could have obtained (4.7.7) without having to deal
with the square root. A more concrete advantage is that (4.7.12) also makes sense when
m = 0, and shows that (4.7.7) is still applicable in that case. The only difference is that
now
ẋ2 = 0 (m = 0) . (4.7.13)

Let us now analyze the meaning of the equation we have obtained, (4.7.7). A solution
of this should now maximize proper time difference in spacetime; it deserves the name
of geodesic, even if it is a trajectory in spacetime rather than just in space. We can also
rewrite it as a covariant derivative:

    ẍ^µ + Γ^µ_{ρσ} ẋ^ρ ẋ^σ = ∇_U U^µ ,   U ≡ ∂_τ = ẋ^µ ∂_µ .   (4.7.14)

(We have used ẋµ ∂µ = ∂τ in the first term.) So another way to write the equation of
motion is
∇U U = 0 . (4.7.15)
It means that one is trying to move while keeping the direction of motion parallel. This
expresses in equations the intuitive idea of “trying to go straight”.

A fun way of thinking about the action (4.7.2) is the following. For simplicity imagine
also that the metric is stationary, namely that gµν do not depend on t. (This notion is
coordinate-dependent, of course.) Choose the gauge x0 = t; for small velocities, we can
expand (4.7.2) as we did after (2.1.9). We will get S ∼ −m ∫ √(−g_{00}) + kinetic terms. The
term we have isolated is present even when the particle is at rest, and can be interpreted
as a potential:

    Vgrav = m √(−g_{00}) .   (4.7.16)

This will sometimes give us an intuitive way of thinking about particle trajectories.
However, you should always keep in mind that this potential is dependent on the choice
of coordinates. In a certain coordinate system it might be nontrivial, while in another it
might even be constant!

Example 4.1 [Rindler geodesics.] Let us compute the geodesics for the Rindler met-
ric of Sec. 3.2, with a = 1 for simplicity: ds2 = e2ξ (−dη 2 + dξ 2 ) + dy 2 + dz 2 . (Of course
we actually know them already in the original Minkowski coordinates.) We will keep y
and z constant, and focus on what happens in η and ξ.
First of all let us consider light-like geodesics (m = 0). Since we are looking at two
directions only, there is a trick: we can impose that vectors tangent to the geodesic should
have zero norm. In other words, ds^2 = 0 along these geodesics. So we simply get η̇ = ±ξ̇.
So in an η–ξ graph these are simply 45° lines.
Let us now turn to massive geodesics, m ̸= 0. The action (4.7.2) reads
    ∫ dτ e^ξ √(η̇^2 − ξ̇^2) .   (4.7.17)

Its equations of motion read


 
    ∂_τ(e^{2ξ} η̇) = 0 ,   ∂_τ(e^{2ξ} ξ̇) = −1 .   (4.7.18)

(4.7.5) also gives us


    −η̇^2 + ξ̇^2 = −e^{−2ξ} .   (4.7.19)
This has the advantage of being a first-order equation. If we take its first derivative and
combine it with the first in (4.7.18), we obtain the second. (This is basically because of
(4.7.11).) So we can work directly with (4.7.19) and the first in (4.7.18). Moreover, the
latter can be rewritten as a conservation law:

e2ξ η̇ = −E (4.7.20)

where E is some constant. We will see later that there is a symmetry behind this, and
that E can be interpreted as an energy. Now we have two equations for η̇ and ξ̇. If we
are just interested in finding the worldline in spacetime, we can work with dξ/dη = ξ̇/η̇,
which gives

dξ/dη = √(1 − E^{−2} e^{2ξ}) . (4.7.21)

The solution can be found easily by quadrature, and reads

cosh(η − η0) = E e^{−ξ} . (4.7.22)
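As a sanity check, the closed form (4.7.22) can be tested numerically against the first-order equation (4.7.21). The following short Python sketch (my addition, not part of the original derivation; E and η0 are arbitrary sample values) compares a finite-difference derivative of ξ(η) = log(E/cosh(η − η0)) with the right-hand side of (4.7.21):

```python
import math

# Hypothetical numerical check of (4.7.22) against (4.7.21).
E, eta0 = 3.0, 0.0

def xi(eta):
    # xi(eta) solving cosh(eta - eta0) = E * exp(-xi)
    return math.log(E / math.cosh(eta - eta0))

for eta in (-1.5, -0.7, -0.1):   # eta < eta0: the branch where xi is increasing
    h = 1e-6
    lhs = (xi(eta + h) - xi(eta - h)) / (2 * h)            # d(xi)/d(eta)
    rhs = math.sqrt(1 - math.exp(2 * xi(eta)) / E**2)      # right-hand side of (4.7.21)
    assert abs(lhs - rhs) < 1e-6, (eta, lhs, rhs)
print("(4.7.22) solves (4.7.21)")
```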

Figure 14: Geodesics in Rindler space.

We see a graph of some geodesics in figure 14. Massive geodesics go towards positive
ξ almost at the speed of light; then turn back and go towards negative ξ. This is not
surprising if we think about the gravitational potential (4.7.16), which in this case reads
Vgrav = m e^ξ : it is an exponential potential growing in the direction of positive ξ. It is as
if the massive particles hit against a wall and turned back. Those with higher and higher
energies climb to a higher value of ξ before turning back. In the limit where their energy
is very large, they become the massless geodesics, which are simply lines at 45°.
Thinking about this in a slightly different way, look at the geodesic from p to p′ in
figure 14. A particle going from p to p′ has to maximize proper time; so it prefers going

to a region where −g00 is larger as quickly as possible, and then it turns back to arrive at
p′ .
Finally, if we consider the alternative coordinate system leading to (3.2.5), we see that
the gravitational potential there is

Vgrav = maZ . (4.7.23)

This linear potential corresponds to a constant force directed towards negative Z; so this
is a uniform gravitational field. Thus this coordinate system realizes the first thought
experiment in the introduction.

4.8 Geodesic deviation


We will now see what happens to two nearby geodesics.
Consider first a family of geodesics γs , labelled by a parameter s; in other words,
for each value of s we have a geodesic. Each of these is then parameterized by an affine
parameter λ. Together they describe a “sheet” (a two-dimensional subspace) in spacetime.
Let us define the two vectors U = U µ ∂µ = ∂λ and d = dµ ∂µ = ∂s (for “displacement”).
These two commute:
[U, d] = [∂λ , ∂s ] = 0 . (4.8.1)
(More generally, Frobenius’ theorem says that two vector fields v1 , v2 can define a two-
dimensional subspace inside spacetime if and only if their Lie bracket is a linear combi-
nation of themselves: [v1 , v2 ] = a1 v1 + a2 v2 .)
The Lie bracket is [U, d]ν = U µ ∂µ dν − dµ ∂µ U ν = 0, but thanks to the torsion-free
condition on the Levi-Civita connection we can rewrite this as U µ ∇µ dν − dµ ∇µ U ν = 0;
in other words,
∇U dµ = ∇d U µ = dν ∇ν U µ . (4.8.2)
This tells us how the “displacement” d, the distance between two geodesics, behaves
with time evolution along them. In fact, we can also compute the second derivative:

∇U ∇U dµ = ∇U ∇d U µ = ∇d ∇U U µ + [∇U , ∇d ]U µ . (4.8.3)

In our situation, U is also the velocity field of a family of geodesics, so along the entire
sheet we have (∇U U )ν = U µ ∇µ U ν = 0 (we are taking U to be normalized such that U 2
is constant). Thus the first term vanishes. The second reads

[∇U , ∇d ]U µ = (U ν ∇ν dρ − dν ∇ν U ρ )∇ρ U µ + 2U ν dρ ∇[ν ∇ρ] U µ . (4.8.4)

The operator in parenthesis is again nothing but [U, d], so it vanishes. The second part
can be reexpressed in terms of the Riemann tensor; so we conclude

∇U ∇U dµ = Rµ σνρ U ν U σ dρ . (4.8.5)

This tells us that curvature is a measure of how geodesics are “attracted” to each other.
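To make (4.8.5) concrete, here is a numerical illustration of my own (not from the notes) on the unit 2-sphere, where the curvature predicts that two geodesics crossing at a small angle δ have separation d(λ) = δ sin λ, i.e. d satisfies d′′ = −d. We integrate the geodesic equations with a simple RK4 stepper and measure the separation directly:

```python
import math

# Geodesic deviation on the unit sphere: two geodesics leaving the same point
# at a small relative angle delta should have separation delta*sin(lambda).

def rhs(state):
    th, ph, dth, dph = state
    # geodesic equations on the sphere: Gamma^th_phph = -sin th cos th,
    # Gamma^ph_thph = cot th
    return (dth, dph,
            math.sin(th) * math.cos(th) * dph**2,
            -2.0 * (math.cos(th) / math.sin(th)) * dth * dph)

def rk4(state, lam, h=1e-3):
    for _ in range(int(round(lam / h))):
        k1 = rhs(state)
        k2 = rhs([s + 0.5*h*k for s, k in zip(state, k1)])
        k3 = rhs([s + 0.5*h*k for s, k in zip(state, k2)])
        k4 = rhs([s + h*k for s, k in zip(state, k3)])
        state = [s + h/6.0*(a + 2*b + 2*c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4)]
    return state

def point(th, ph):     # embedding of the sphere in R^3
    return (math.sin(th)*math.cos(ph), math.sin(th)*math.sin(ph), math.cos(th))

delta, lam = 1e-2, 1.0
a = rk4([math.pi/2, 0.0, 0.0, 1.0], lam)                          # the equator
b = rk4([math.pi/2, 0.0, math.sin(delta), math.cos(delta)], lam)  # tilted by delta
dot = sum(x*y for x, y in zip(point(a[0], a[1]), point(b[0], b[1])))
sep = math.acos(max(-1.0, min(1.0, dot)))        # great-circle separation
assert abs(sep - delta*math.sin(lam)) < 1e-7     # matches delta*sin(lambda)
```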

5 Einstein’s equations
At the end of the previous section, we finally determined particle trajectories in a curved
spacetime: this is how curvature affects matter. We will now see how matter affects curvature.

5.1 Old derivation


We will assume that the equations transform tensorially under general coordinate changes:
this is the mathematical expression of the equivalence principle.
We will start by comparing the geometric formalism introduced so far with the intu-
ition from Newtonian gravity. Recall that the gravitational potential Vgrav was found in

(4.7.16) to be given by m √(−g00), at least in stationary metrics (∂t gµν = 0). We will also
assume that g0i = 0; a stationary metric which satisfies this is called static.
In situations where the gravitational field is weak, we would expect to be able to apply
Newtonian gravity. On the one hand, we would expect the metric to be very close to the
Minkowski one:
gµν ∼ ηµν + hµν , (5.1.1)
where h is a small (0, 2) tensor. In this regime, √(−g00) ∼ √(1 − h00) ∼ 1 − ½ h00 . On the
other hand, the gravitational potential energy should obey Poisson’s equation

∆Vgrav = 4πmGN ρ , (5.1.2)

where ∆ = ∂i ∂i is the Laplacian, ρ is the mass density and GN is Newton’s constant. So


we have
∆h00 ∼ −8πGN ρ . (5.1.3)
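As a quick check of the Newtonian input (an illustration of mine, not from the notes): away from the source ρ = 0, so the potential of a point mass, Φ = −GM/r, must be annihilated by the Laplacian. A finite-difference Laplacian confirms this:

```python
import math

# The Newtonian point-mass potential is harmonic away from the origin.
GM = 1.0

def phi(x, y, z):
    return -GM / math.sqrt(x*x + y*y + z*z)

def laplacian(f, x, y, z, h=1e-3):
    # standard 7-point finite-difference Laplacian
    return (f(x+h,y,z) + f(x-h,y,z) + f(x,y+h,z) + f(x,y-h,z)
            + f(x,y,z+h) + f(x,y,z-h) - 6*f(x,y,z)) / h**2

assert abs(laplacian(phi, 1.0, 0.5, -0.3)) < 1e-4   # = 0 up to truncation error
```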

We now want to see if we can somehow promote this equation to one that is covariant
under any coordinate change. The left-hand side suggests that the equation should be the
00 component of some tensor. The right-hand side confirms that: the energy density is in
fact the 00 component of the stress-energy tensor Tµν . This can be defined by Noether

procedure; as we will see later, it can also be defined in an alternative way, which makes
it automatically symmetric. It also has the property

∇µ Tµν = 0 , (5.1.4)

which is nothing but conservation of energy and momentum.


So we need some tensor that has two symmetric indices, and that contains a second
derivative of the metric. One tensor that contains a second derivative of the metric is
the Riemann tensor Rµ νρσ , but it has four indices, not two. This suggests that we should
contract two of its indices. We cannot contract the first and second, nor the third and
fourth, because in those indices it is antisymmetric (recall (4.6.8) and (4.6.1)). Thus, let
us define the contraction of the first and third indices:

Rµν ≡ Rρ µρν . (5.1.5)

This is called the Ricci tensor. From (4.6.11) we see that it is symmetric:

Rµν = Rνµ . (5.1.6)

Thus it looks promising. Let us evaluate it under the assumption (5.1.1):

Rµνρσ ∼ 2∂[ρ| Γµ|σ]ν = ∂ν ∂[ρ hσ]µ − ∂µ ∂[ρ hσ]ν , (5.1.7)


Rµν ∼ 2∂[ρ Γρµ]ν ∼ −½ □hµν + ∂(µ ∂^ρ hν)ρ − ½ ∂µ ∂ν h^ρ ρ ; (5.1.8)

where □ = −∂0² + ∆ is the d’Alembertian. In particular, for its 00 component,

R00 ∼ −½ ∆h00 + ∂0 ∂i h0i − ½ ∂0² hii . (5.1.9)

Since we are in the static case, actually

R00 ∼ −½ ∆h00 . (5.1.10)
Together with (5.1.2) we see
R00 ∼ 4πGN T00 . (5.1.11)

This suggests that we postulate the equation


Rµν =? 4πGN Tµν . (5.1.12)

Unfortunately, this cannot be correct: if we take ∇µ of this equation, using (5.1.4) we get

∇µ Rµν = 0 . (5.1.13)

This is in general not true. There is, however, a very similar relation, that can be obtained
by contracting (4.6.16) with g µσ g νλ :

2∇µ Rµρ = ∇ρ R , (5.1.14)

where the Ricci scalar


R = Rµ µ (5.1.15)
is the trace of the Ricci tensor. Thus the Einstein tensor
Gµν = Rµν − ½ Rgµν (5.1.16)
now has the property
∇µ Gµν = 0 , (5.1.17)
similar to (5.1.4).
This suggests that we amend (5.1.12) into

Gµν = αTµν (5.1.18)

for some coefficient α. To determine α, take the trace on both sides: using g µν gµν = 4 and
(5.1.16) we get −R = αT ≡ αT µ µ . So our hypothetical equation (5.1.18) can be written
as Rµν = α(Tµν − ½ gµν T ). Now, in our non-relativistic regime energy transport should
be small with respect to energy density: Tij ≪ T00 . So T ∼ −T00 + Tii ∼ −T00 . Thus the
00 component of our equation now reads R00 = α(T00 − ½ T00 ) = (α/2) T00 . Comparing again
with (5.1.10) and (5.1.3) we get α = 8πGN . So, to summarize, we conclude


Rµν − ½ Rgµν = 8πGN Tµν . (5.1.19)
These are Einstein’s equations for gravity. (From now on we will denote GN simply by
G.) As we described, they can also be written as
 
Rµν = 8πGN (Tµν − ½ gµν T ) . (5.1.20)
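The equivalence of (5.1.19) and (5.1.20) is pure index algebra, so it can be verified numerically. In this sketch (my addition, not from the notes) we take the Minkowski metric and an arbitrary symmetric matrix as a stand-in for Tµν, define Rµν by the trace-reversed form (5.1.20), and recover the original form (5.1.19):

```python
import math

eta = [[-1,0,0,0],[0,1,0,0],[0,0,1,0],[0,0,0,1]]
inv_eta = eta                                   # eta is its own inverse
T = [[3,1,0,2],[1,5,1,0],[0,1,4,1],[2,0,1,6]]   # any symmetric matrix
k = 8 * math.pi                                 # 8 pi G_N, with G_N = 1

trT = sum(inv_eta[m][n] * T[n][m] for m in range(4) for n in range(4))
# trace-reversed form (5.1.20)
R = [[k * (T[m][n] - 0.5 * eta[m][n] * trT) for n in range(4)] for m in range(4)]
trR = sum(inv_eta[m][n] * R[n][m] for m in range(4) for n in range(4))
# Einstein tensor, form (5.1.19)
for m in range(4):
    for n in range(4):
        G_mn = R[m][n] - 0.5 * trR * eta[m][n]
        assert abs(G_mn - k * T[m][n]) < 1e-9
print("the two forms agree")
```

Note how trR = −k trT appears along the way, which is exactly the relation R = −αT used in the text.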

5.2 Modern derivation


Today a theoretical physicist would try to guess first an action, and then derive from
it the equations of motion. This is actually how Hilbert proceeded; this led him to the
correct equations before Einstein.

54
The action for a field should be an integral over spacetime of a Lagrangian density:
S = ∫ d4 x L. For the gravitational field, L should be constructed in terms of the metric
and its associated curvature, the Riemann tensor.
At first we might guess L to be a function, since it has no indices. However, we should
be careful about the measure d4 x. Under coordinate changes, this transforms as
d4 x → d4 x′ = det(∂x′^µ/∂x^ν) d4 x . (5.2.1)

This transformation law is not a particular case of (3.4.6). An object that transforms
picking up a factor of the determinant of the coordinate change matrix (the Jacobian
determinant, or often simply Jacobian) is often called a density. It is also possible to
define a generalization of (3.4.6) with an additional factor of the Jacobian determinant;
such objects are called tensor densities.
We can “repair” d4 x by noticing that the determinant of the metric transforms as
g ≡ det gµν → det(g′µν) = det(gµν) [det(∂x^µ/∂x′^ν)]² , (5.2.2)

which follows from (3.1.8). Thus the combination



vol ≡ √(−g) d4 x (5.2.3)

is actually invariant. (The minus sign is because g < 0 in Lorentzian signature.)
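The transformation law (5.2.2), and hence the invariance of vol, can be checked in a simple two-dimensional Euclidean example (my illustration, not from the notes): Cartesian coordinates versus polar coordinates x = r cos θ, y = r sin θ.

```python
import math

# Check of (5.2.2): det g' = det g * (det J)^2, flat 2d metric, polar coordinates.
r, th = 1.7, 0.6
# Jacobian J^mu_nu = d x^mu / d x'^nu
J = [[math.cos(th), -r*math.sin(th)],
     [math.sin(th),  r*math.cos(th)]]
detJ = J[0][0]*J[1][1] - J[0][1]*J[1][0]          # = r
# transformed metric g'_{mn} = J^a_m J^b_n g_{ab}, with g the identity
gp = [[sum(J[a][m]*J[a][n] for a in range(2)) for n in range(2)] for m in range(2)]
det_gp = gp[0][0]*gp[1][1] - gp[0][1]*gp[1][0]    # = r**2
assert abs(detJ - r) < 1e-12
assert abs(det_gp - 1.0 * detJ**2) < 1e-12
```

Indeed √g d²x = r dr dθ is the familiar polar-coordinate area element.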


It is actually also natural to view vol as a four-form:

vol = √(−g) dx0 ∧ dx1 ∧ dx2 ∧ dx3 . (5.2.4)

Indeed once one introduces forms it is natural to say that only k-forms can be integrated
over a space of dimension k. One can also write
vol = (1/4!) ϵµνρσ dxµ ∧ dxν ∧ dxρ ∧ dxσ , (5.2.5)

where ϵµνρσ is the tensor which is totally antisymmetric and such that ϵ0123 = √(−g).11
In any case, from (5.2.3) we see that it is more natural to write

S = ∫ d4 x √(−g) L ; (5.2.6)

11
It is a nice exercise to check that this is indeed a tensor. You will have to use the fact that
ϵ0 µ1 µ2 µ3 µ4 M µ1 ν1 M µ2 ν2 M µ3 ν3 M µ4 ν4 = (det M ) ϵ0 ν1 ν2 ν3 ν4 , where ϵ0 µνρσ is the antisymmetric tensor such that ϵ0 0123 = 1.

55

since d4 x √(−g) is invariant, the Lagrangian density L must be invariant as well — or, in
other words, it must be a function.
As anticipated, we expect this function L to be constructed from the metric gµν and
from the Riemann tensor Rµνρσ . To obtain a function, we should somehow get rid of
the indices; we can use the inverse metric g µν . For example we can write g µν gµν ; this is
certainly a function, but g µν gνρ = δ µ ρ, so in fact g µν gνµ = δµµ = 4 — not a very interesting
function. The next attempt might be to use the inverse metric and the Riemann tensor:
this produces in fact the Ricci scalar (5.1.15):

g µν g ρσ Rµρνσ = g ρσ Rρσ = R . (5.2.7)

This possibility looks particularly interesting, because R contains two derivatives of the
metric. (Γ contains only one derivative; Riemann contains two; the contractions don’t
introduce any further derivatives.) If we want to think about gµν as the gravitational
field, then a function with two derivatives of gµν looks like a possible kinetic term. There
is no other function with this feature: if we start to contract for example the Riemann
tensor with itself, we end up with four or more derivatives of gµν . For example

R2 , Rµν Rµν , Rµνρσ Rµνρσ (5.2.8)

are some possible four-derivative terms.


Another possible extension would be to use a connection which is different from the
Levi–Civita connection that we introduced in section 4.4. One might feel that our motiva-
tion for choosing that connection was a bit mathematical in nature rather than motivated
by physics. Thus one could consider a non-zero “non-metricity” ∇µ gνρ or torsion T µ νρ .
However, recall from (4.4.1) that the difference of two connections is a tensor. In other
words, any connection can be written as Γ = ΓLC + δΓ, where δΓ is a tensor. One can
find an expression for δΓ in terms of ∇µ gνρ and the torsion T µ νρ . Doing this in the end results
simply in introducing two tensor fields on top of the field gµν that we have already. This
is a possibility, but not one that is especially more physically motivated than introducing
other matter fields, such as a gauge potential Aµ , a scalar ϕ etc. Moreover, coupling tensor
fields to gravity introduces problems at the quantum level. Hence, we will not consider
this possibility in these notes.
We are thus led to considering the simplest possibility, namely that L ∝ R:

SEH = (1/16πG) ∫ d4 x √(−g) R . (5.2.9)
This is called the Einstein–Hilbert action. The coefficient in front is necessary for dimen-
sional reasons, but of course the precise value of 16π cannot be fixed with the
considerations in this section; for that we have to go back to the weak-field situation (5.1.1).
Notice, however, that there was far less guesswork involved in the approach of this section.

5.3 Variation of the action


We will now see that (5.2.9) indeed leads to (5.1.19), when coupled to the action for
matter.
First let us vary (5.2.9). There are three terms:
δSEH = (1/16πG) ∫ d4 x √(−g) [ δ log(√(−g)) R + δg µν Rµν + g µν δRµν ] . (5.3.1)
For the first term, we notice the general formula log det M = Tr log M , which upon
variation produces

δ log det M = δTr log M = Trδ log M = Tr(M −1 δM ) = −Tr(M δ(M −1 )) . (5.3.2)

In particular we obtain
δ log(√(−g)) = ½ g µν δgµν = −½ δg µν gµν . (5.3.3)
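The identity behind (5.3.2)–(5.3.3) is easy to test by finite differences on a small matrix (an illustration of mine, not from the notes):

```python
import math

# Check of delta log det M = Tr(M^-1 delta M) on a 2x2 example.
def det2(M):
    return M[0][0]*M[1][1] - M[0][1]*M[1][0]

M  = [[2.0, 0.3], [0.1, 1.5]]
dM = [[0.2, -0.1], [0.4, 0.3]]                    # an arbitrary variation
eps = 1e-6

Mp = [[M[i][j] + eps*dM[i][j] for j in range(2)] for i in range(2)]
lhs = (math.log(det2(Mp)) - math.log(det2(M))) / eps   # finite-difference variation

d = det2(M)
Minv = [[ M[1][1]/d, -M[0][1]/d],
        [-M[1][0]/d,  M[0][0]/d]]
rhs = sum(Minv[i][j]*dM[j][i] for i in range(2) for j in range(2))  # Tr(M^-1 dM)
assert abs(lhs - rhs) < 1e-5
```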
As for the last term in (5.3.1), we go back to (4.5.3) and we observe that, under
variation of Γ,
½ δRρ σµν = ∂[µ δΓρν]σ + δΓρ[µ|λ Γλν]σ + Γρ[µ|λ δΓλν]σ = ∇[µ δΓρν]σ . (5.3.4)
It follows that

g µν δRµν = g µν (∇ρ δΓρµν − ∇µ δΓρρν ) = ∇ρ (δΓρ µ µ − δΓµ µ ρ ) . (5.3.5)

The appearance of this derivative suggests that this part of the variation is a total deriva-
tive. Let us see this more explicitly. From (4.4.7) we see
Γµ µν = ½ g µρ ∂ν gµρ = ½ Tr(g −1 ∂ν g) = ½ ∂ν log g , (5.3.6)
where the last step is similar to (5.3.3). It follows that
∇ρ V ρ = ∂ρ V ρ + Γµ µρ V ρ = (1/√(−g)) ∂ρ (√(−g) V ρ ) . (5.3.7)
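Formula (5.3.7) can be tested in flat two-dimensional space in polar coordinates, where the relevant density is √g = r (a quick illustration of mine, not from the notes). The vector field V = x ∂x + y ∂y has polar components V^r = r, V^θ = 0, and its Cartesian divergence is ∂x x + ∂y y = 2:

```python
# Check of the covariant divergence formula in flat 2d polar coordinates.
def sqrt_g(r, th):
    return r          # sqrt(det g) for ds^2 = dr^2 + r^2 dtheta^2

def Vr(r, th):
    return r          # radial component of V = x d_x + y d_y

r0, th0, h = 1.3, 0.4, 1e-6
# (1/sqrt g) d_r (sqrt g * V^r), via a central finite difference
div = (sqrt_g(r0+h, th0)*Vr(r0+h, th0) - sqrt_g(r0-h, th0)*Vr(r0-h, th0)) / (2*h)
div /= sqrt_g(r0, th0)
assert abs(div - 2.0) < 1e-6   # agrees with the Cartesian divergence
```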
Thus in particular the third term in (5.3.1), using (5.3.5) and (5.3.7), reads
∫ d4 x √(−g) g µν δRµν = ∫ d4 x ∂ρ [√(−g) (δΓρµ µ − δΓµ µρ )] . (5.3.8)

Thus that term is indeed a total derivative, and it can be discarded in most situations.
In some cases, one actually has to be careful about the boundary conditions, and this
term cannot be discarded. One then has to add to (5.2.9) a boundary term, the so-called
Gibbons–Hawking term. We will not discuss it here.
Thus (5.3.1) becomes

δSEH = (1/16πG) ∫ d4 x √(−g) δg µν (Rµν − ½ Rgµν ) . (5.3.9)
Thus the equations of motion for pure gravity (gravity without matter) would be
Rµν − ½ gµν R = 0. Contracting this with g µν (which is in a sense “taking the trace” of this
equation) we get R = 0. So in the absence of matter the equations of motion are simply

Rµν = 0 . (5.3.10)

This does not mean that spacetime should be flat: Rµν = 0 does not imply Rµνρσ = 0.
Indeed, a symmetric tensor in d dimensions has ½ d(d + 1) components, which for d = 4
gives 10 components; whereas the Riemann tensor has 20 independent components, as
we found back in (4.6.12). Had Rµν = 0 implied Rµνρσ = 0, the absence of matter in a
region would have implied that spacetime is flat there; a particle traveling
in that region would have felt no effect from gravity. In other words, gravity would not
have propagated. (The same counting argument shows that in d = 2 and d = 3 Rµν = 0
does imply that Rµνρσ = 0; this means that gravity indeed does not propagate in those
dimensions.)
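The counting used in this argument can be made explicit (a small addition of mine; the Riemann count d²(d²−1)/12 is the standard formula, consistent with the value 20 quoted above for d = 4):

```python
# Component counting: symmetric 2-tensor vs. Riemann tensor in d dimensions.
for d in (2, 3, 4):
    ricci = d*(d+1)//2            # components of a symmetric 2-tensor
    riemann = d*d*(d*d-1)//12     # independent components of Riemann
    print(d, ricci, riemann)
# prints: 2 3 1 / 3 6 6 / 4 10 20 -- only for d = 4 does Riemann carry more
# information than Ricci, so Ricci-flatness no longer forces flatness.
```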
To compare with (5.1.19), we should also add the Lagrangian for the matter fields,
which again is best parameterized as in (5.2.6):

Sm = ∫ d4 x √(−g) Lm . (5.3.11)

It turns out that the stress-energy tensor satisfies


Tµν = −(2/√(−g)) δ(√(−g) Lm )/δg µν . (5.3.12)
The right-hand side is automatically symmetric; Tµν , as defined by the Noether procedure,
is not. However, we have already remarked that it can be made symmetric, possibly by
adding total derivative terms to the Lagrangian.
Now

δSm = −½ ∫ d4 x √(−g) δg µν Tµν . (5.3.13)
Putting this together with (5.3.9) we indeed get (5.1.19).

5.4 Normal coordinates
We based our quest for the equations of motion on the assumption that the equations of
motion should have the same expression in all coordinate systems, which we motivated
from the equivalence principle.
We now pause to check the equivalence principle more directly. It states that we
cannot detect a gravitational field in small enough box. Moreover, we have learned that
gravity manifests itself as spacetime curvature. It follows that every point p should have
a neighborhood where the metric is approximately the Minkowski one.
To see that such a coordinate system exists, consider the geodesics xµ (λ) that originate
from p, and call aµ their initial velocity vector. (The coordinate λ on the geodesic repre-
sents distance or proper time, depending on whether the geodesic is spacelike or timelike.)
If by convention we assign coordinates 0 to p, then xµ (0) = 0, ∂λ xµ (0) = aµ . Now label a
point p′ with geodesic distance λ from p with the coordinates

xµ = λaµ . (5.4.1)

In other words, we are defining a coordinate system in which all geodesics exiting p look
like straight lines.
The metric gµν (p) can be set to ηµν by a linear coordinate change. But we would also
like to characterize the metric at nearby points. We can do this by a Taylor expansion:
gµν = ηµν + ∂ρ gµν (p)xρ + ½ ∂ρ ∂σ gµν (p)xρ xσ + O(x3 ) . (5.4.2)
2

To identify the first and second derivatives in this expression, we can use the geodesic
equation (4.7.7): ẍµ + Γµνρ ẋν ẋρ = 0. By our assumption (5.4.1), the second derivative of
x is zero and the first is just a:

Γµνρ (x = λa)aν aρ = 0 . (5.4.3)

Evaluating this at λ = 0 we get Γµνρ (x = 0)aν aρ = Γµνρ (p)aν aρ = 0. Since this should be
true for all a, Γµνρ (p) = 0. Recalling (4.4.5),

∂µ gνρ (p) = 0 . (5.4.4)

Next, we take ∂λ of (5.4.3) and set λ = 0. This gives 0 = Γ̇µνρ (p)aν aρ = ∂α Γµνρ (p)aα aν aρ
(recalling that ẋµ = aµ ). So
∂(α Γµνρ) (p) = 0 . (5.4.5)

Since Γµνρ (p) = 0, the Riemann tensor at p is just Rρ σµν (p) = (∂µ Γρνσ − ∂ν Γρµσ )(p). It
follows that
Rρ (σµ)ν (p) = ∂(µ Γρσ)ν − ∂ν Γρµσ = −(3/2) ∂ν Γρµσ , (5.4.6)

where the last equality uses (5.4.5). We can now determine the second derivative of the metric as well:

∂σ ∂µ gνρ (p) = (∂σ Γρµν + ∂σ Γνµρ )(p) = −(2/3)(Rρ(µν)σ + Rν(µρ)σ )(p) = (2/3) Rµ(νρ)σ (p) , (5.4.7)

where the first equality uses (4.4.5) and the second uses (5.4.6).
Using (5.4.4) and (5.4.7) in the expansion (5.4.2) we finally obtain
gµν = ηµν + ⅓ Rρµνσ (p)xρ xσ + O(x3 ) . (5.4.8)
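Expansion (5.4.8) can be tested on the unit 2-sphere (a sketch of mine, not from the notes). In normal coordinates x^i around a point p one has the exact metric g_ij = n_i n_j + (sin ρ/ρ)² (δ_ij − n_i n_j) with ρ = |x|, n = x/ρ, while the curvature at p is R_ijkl = δ_ik δ_jl − δ_il δ_jk:

```python
import math

# Compare the exact normal-coordinate metric on the unit sphere with the
# expansion (5.4.8); the quadratic term there reduces to (x_i x_j - rho^2 delta_ij)/3.
x = (0.05, 0.02)                      # a point close to p
rho = math.hypot(*x)
n = (x[0]/rho, x[1]/rho)
s = (math.sin(rho)/rho)**2

for i in range(2):
    for j in range(2):
        delta = 1.0 if i == j else 0.0
        exact = n[i]*n[j] + s*(delta - n[i]*n[j])
        approx = delta + (x[i]*x[j] - rho**2*delta)/3.0
        assert abs(exact - approx) < rho**4   # agreement up to O(x^3) corrections
print("(5.4.8) verified on the sphere")
```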

5.5 Beyond general relativity?∗


In section 5.2 we derived the action (5.2.9) from simple considerations. We first saw that
L in (5.2.6) should be a function; we then saw that the only function with two derivatives
of the metric is the Ricci scalar R.
This leads to Einstein’s equations and general relativity as we know it, which is a
strikingly successful theory in many respects. It does have problems, however, at the
quantum level: it is non-renormalizable. This might motivate us to have a second look at
our motivations for selecting (5.2.9) as our action. Why, for example, could we not include
higher-derivative terms such as (5.2.8)? Is there a more systematic way of deciding what
the correct action is?
First let us recall what non-renormalizability is. If one tries to compute perturbative
scattering amplitudes in any theory, one often finds that they diverge. That is often be-
cause the parameters appearing in the Lagrangian are not actually the physical quantities
(such as masses, charges etc.). Relating the parameters in the Lagrangian to the physical
ones is called “renormalizing” them. It is often done by performing the path integral only
for modes of momentum up to a cutoff scale Λ; after performing the physical computation
we are interested in, we send Λ → ∞. If the theory is renormalizable, the physical result
remains finite in that limit.
Sometimes one needs to add new “counterterms” that were not present in the original
Lagrangian. The original theory is then called “non-renormalizable”. One possible reac-
tion is to change the original theory, so that now it also contains the new counterterms.
However, often the new theory requires new counterterms, and the procedure never ends.
The theory is no longer predictive: one would need infinitely many experiments to fix the

new parameters in the Lagrangian, before one can compute the original quantity one was
interested in.
We can perhaps make the same point less precisely, but more strongly, as follows. Let
us work in “natural units”, as one usually does in particle physics: i.e. let us set in this
section
ℏ=c=1 (5.5.1)
so that energies have dimensions of length−1 . To set up perturbation theory, we can
again use the weak-field approximation (5.1.1); expanding (5.2.9) we get, schematically,
S ∼ (1/G) ∫ d4 x ((∂h)² + h(∂h)² + h²(∂h)² + . . .). We can canonically normalize the kinetic
term by redefining h → √G h, which gives

S ∼ ∫ d4 x ((∂h)² + √G h(∂h)² + G h²(∂h)² + . . .) (5.5.2)

So a scattering amplitude of four gravitons, for example, at tree level starts with G (which
in our natural units has dimension [length]2 , so it has already the correct dimension for a
cross-section). At one-loop level, there will be at least two powers of G, from two vertices.
But to get a [length]2 we need to combine this with two powers of the cutoff Λ; so the
whole amplitude looks like
G + G2 Λ2 + . . . . (5.5.3)
But it is now clear that Λ cannot be sent to infinity. In fact, if the one-loop term is bigger
than the tree-level one, it is clear that perturbation theory is breaking down. If we view
this as a scattering amplitude, we have to impose that the probability of interaction does
not get larger than one. Either way, we expect the theory to lose meaning when the cutoff
Λ is of order
MPl ≡ 1/√G , (5.5.4)
called the Planck mass; its value is around 10^19 GeV. Thus we expect general relativity
to stop making sense quantum mechanically beyond energies of MPl .
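Restoring units, MPl = √(ℏc/G). A quick conversion with rounded SI values (my addition, not from the notes) reproduces the quoted order of magnitude:

```python
import math

# Planck mass from SI constants (rounded values).
hbar, c, G = 1.0546e-34, 2.9979e8, 6.674e-11

m_pl_kg = math.sqrt(hbar * c / G)            # ~ 2.2e-8 kg
E_GeV = m_pl_kg * c**2 / 1.602e-10           # rest energy in GeV (1 GeV = 1.602e-10 J)
print(f"M_Pl ~ {E_GeV:.2e} GeV")             # ~ 1.2e19 GeV
assert 1e19 < E_GeV < 1.4e19
```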
After this intuitive discussion, here is a more technical one. At one loop one finds di-
vergencies proportional to the four-derivative terms in (5.2.8). One combination of them
is actually a total derivative; the other two vanish on-shell for pure gravity, since its equa-
tions of motion in absence of matter read Rµν = 0. This implies that the divergencies can
be reabsorbed by a redefinition of the metric. This raised some hopes that gravity might
be renormalizable after all. However, this feature of the theory was already destroyed by
the presence of a single scalar.

At this point several people tried to read this as a constraint on the matter content of
the theory of the real world: namely, they tried to add various particles of various spins,
to make the theory one-loop renormalizable just as pure gravity is. This eventually led
to supergravity, the supersymmetric extension of general relativity.
However, higher loops destroy these hopes. In pure gravity at two loops, one finds a
divergence proportional to
Rµν ρσ Rρσ λτ Rλτ µν ; (5.5.5)
this cannot be eliminated by a field redefinition. Supergravity also has divergencies,
although at higher loops. In any case, supergravity finds ultimate redemption in string
theory.
Amusingly, if one adds to the action the first two terms in (5.2.8):


S = ∫ d4 x √(−g) ((1/16πG) R + αR² + βRµν Rµν ) (5.5.6)

the resulting theory is renormalizable. This might be surprising right now, but we’re
going to see it is actually quite natural. This theory, however, has another problem: it
has negative-energy states!
Summing up: our original theory (5.2.9) is non-renormalizable. Should we throw
away (5.2.9) and general relativity altogether? Well, a more modern point of view on
non-renormalizable theories is that they are effective theories. They are perfectly sensible
theories, as long as one keeps the cutoff (and hence the energies of all processes) well
below the energy where the theory loses its meaning — in general relativity’s case, well
below MPl .
In this “Wilsonian” point of view, these effective theories are the result of a renormal-
ization group (RG) flow from a more complete theory. Suppose we have a quantum field
theory (QFT) which is defined at all energies, with action S. We now perform the path
integral on all modes with p2 larger than a certain mass M 2 :
e^{−SM} = ∫_{p²>M²} [DΦ] e^{−S} (5.5.7)

where [DΦ] denotes the integration measure over all fields. SM is called “effective action”;
it is a very good approximation to S for processes that happen at energies far smaller
than M . The RG flow is this procedure that takes S → SM for varying M . In general,
SM will contain all the operators OI allowed by symmetries, with coefficients of order

M^{4−dim(OI)} ; (5.5.8)

the exponent is dictated by dimensional analysis.12 A non-renormalizable theory is usually
simply an effective theory for some more fundamental theory S; there are in general
infinitely many OI — these are the infinitely many counterterms we would have had to
add in the previous approach. From this point of view, the only problem that can happen
is that sometimes a coefficient turns out to be much smaller than one would expect from (5.5.8).
Let us see what this means for general relativity. There is an unknown theory of
quantum gravity; suppose we have performed the path integral over all modes with
p² > MPl², just like in (5.5.7). Then we expect an action which has all the possible operators
allowed by symmetries — namely, functions that contain the Riemann tensor and of the
metric — weighed by powers of MPl , according to (5.5.8).
Let us be more specific. In the natural units ℏ = c = 1 of this section, gµν is dimen-
sionless, and Riemann has length dimension −2 because of the two derivatives; in other
words, it has mass dimension 2. Hence the operator R in (5.2.9) has mass dimension 2,
and according to (5.5.8) it should appear in the action with a coefficient of order MPl² ;
recalling (5.5.4), we see that this is correct. At mass dimension 4, we find the operators
in (5.2.8). According to (5.5.8), these should appear in the action with dimensionless
coefficients. At mass dimension 6, one will find many more possibilities, one of which is
(5.5.5); this will be weighed by a coefficient of order 1/MPl², which will be rather small. The
series is infinite: at each mass dimension, one finds more and more possible combinations.
In fact, wait a second: what about mass dimension 0? There is also a perfectly sensible
operator 1, which we shouldn’t discard; according to (5.5.8), it has coefficient MPl⁴. Thus,
schematically:

SMPl = ∫ d4 x √(−g) (MPl⁴ + MPl² R + R² + (1/MPl²) R³ + . . .) (5.5.9)
This is what one expects as an effective theory for a theory of quantum gravity. The
various proposals for a theory of quantum gravity (e.g. string theory) should all give instances
of (5.5.9).
General relativity (5.2.9) should be viewed as an approximation to (5.5.9); the idea is
that we have not been able to test the presence of the terms R2 , R3 . . . in the Lagrangian
yet, because gravity is so weak to begin with. From this point of view, the theory (5.5.6)
is renormalizable simply because we can formally push the cutoff MPl to infinity, and
get rid of all terms R3 and higher. (Such operators, with mass dimension larger than 4,
are called irrelevant because their effect becomes smaller and smaller under this limit.)
On the other hand, (5.5.6) with β = 0 is non-renormalizable, because terms with β ̸= 0
12
Notice that already our perturbative action has coefficients dictated by (5.5.8), if we remember (5.5.4).

get generated; but it can still be used as a pretty useful model (and one currently favored by
data) for the initial phases of the Universe, the so-called inflation that we will mention
in section 7.2.
What about the term MPl⁴? We should expect it is also present in general relativity.
Conventionally one introduces a Λ:

SEHΛ = (1/16πG) ∫ d4 x √(−g) (R − 2Λ) . (5.5.10)
As we will see, such a Λ, called the cosmological constant, is indeed observed — but it is
not at all of the order suggested by (5.5.9): Λ/G is smaller than MPl⁴ by about 120 orders
of magnitude! This is the so-called cosmological constant problem.

6 Black holes
Now that we know the laws of general relativity, we can explore its solutions. One of the
easiest things to do is just to analyze the effect on spacetime of a single pointlike mass.
This is particularly easy, due to the presence of many symmetries. As we will see, in this
case the curvature is so large that light itself cannot escape: this is the famous black hole.

6.1 Spherical symmetry


Intuitively, if we place a pointlike mass M at a point in space (let us call it “the origin”)
there is rotational symmetry around it. That should mean that after a rotation around
the point, the metric should remain the same. But in a space which no longer has the
usual Euclidean metric, what does “rotation” mean exactly?
First some notation. An isometry is a map J : x → x′ under whose action the metric
remains invariant: namely, the transformed metric as defined in (3.1.8) is equal to the
original metric:

g′µν = gµν . (6.1.1)
For example, the flat space metric in R3 is invariant under rotations and translations.
As we have also seen, the flat space metric in R3,1 is invariant under rotations, Lorentz
transformations and translations. Isometries have the property that they form a group:
roughly speaking, this means that the composition of two isometries is still an isometry,
and that each isometry has an inverse.13 The set of rotations around a point in R3 is a
13
For more details, see for example my notes on group theory [3].

group that is usually denoted SO(3). The set of rotations and Lorentz transformations in
R3,1 is similarly called SO(3,1). These groups are smooth manifolds themselves; groups
with this property are called Lie groups.
We will define rotational symmetry as the presence of an isometry group which is “iso-
morphic” to SO(3); this means roughly speaking that there is a one-to-one correspondence
between between isometries of our metric g and rotations in R3 , so that to every rotation
R in R3 one can associate an isometry JR , and viceversa, and that JR1 R2 = JR1 JR2 .
In practice, there is a concrete way of checking the presence of such an isomorphism.
Most of the properties of a Lie group can be understood by looking at elements that are
very near the identity element, i.e. the map Id that sends each point to itself. The set of
such infinitesimal maps is a vector space, which also has an antisymmetric “bracket” with
properties similar to the commutator of matrices or to the Poisson bracket; the formal
name is Lie algebra. It is easier to check an isomorphism between two Lie algebras than
between two Lie groups. The Lie algebra of the rotation group SO(3) is just the algebra
of the generators ℓi of angular momentum:

[ℓi , ℓj ] = ϵijk ℓk . (6.1.2)

First let us see concretely how this Lie algebra of isometries manifests itself. We
have learned that an infinitesimal map can be thought of as a vector field. We have also
learned that the infinitesimal action of a map is the Lie derivative of section 3.5. Just
like in (3.5.1), if the map J ∼ Id + ϵv, we have (Lv g)µν = limϵ→0 (1/ϵ)(g′ − g)µν . If J is an
isometry, (6.1.1), it follows that
(Lv g)µν = 0 . (6.1.3)
Such a vector v is called a Killing vector. In other words, a Killing vector is the
infinitesimal version of an isometry.
A concrete way to check if a vector v is Killing is given by using in (6.1.3) the expression
(3.5.9). We can also re-express this as

2∇(µ vν) = 2∂(µ vν) − 2Γρµν vρ


= 2∂(µ gν)ρ v ρ + 2gρ(ν ∂µ) v ρ − (∂µ gνρ + ∂ν gµρ − ∂ρ gµν )v ρ (6.1.4)
= v ρ ∂ρ gµν + 2∂(µ v ρ gν)ρ = (Lv g)µν .

So a Killing vector can also be characterized as a vector satisfying

∇(µ vν) = 0 . (6.1.5)

We have also seen that the Lie bracket (3.3.10) of two vector fields gives a third vector
field. The Lie bracket has the correct properties so that the space of all vector fields can
be considered a Lie algebra; it is an infinite-dimensional one. However, it is also true
that the Lie bracket of two Killing vectors is a Killing vector. This follows from the nice
property
L[v,w] = [Lv , Lw ] , (6.1.6)
which can be shown from (3.5.7), (3.5.8). It follows that the space of Killing vectors of a
given metric is a Lie algebra.
So an alternative criterion for rotational symmetry is that the metric's Lie algebra of Killing vectors contains the Lie algebra of rotations (6.1.2). In other words, there need to be three Killing
vectors ℓi whose Lie brackets satisfy (6.1.2).
On the flat-space metric ds2 = ηµν dxµ dxν these Killing vectors can be written as

ℓ1 = −x2 ∂3 + x3 ∂2 , ℓ2 = −x3 ∂1 + x1 ∂3 , ℓ3 = −x1 ∂2 + x2 ∂1 . (6.1.7)

One can also write out how these look in polar coordinates (3.8.2):14

ℓ1 = sin ϕ∂θ + cos ϕ cot θ∂ϕ , ℓ2 = − cos ϕ∂θ + sin ϕ cot θ∂ϕ , ℓ3 = −∂ϕ . (6.1.8)

One can verify that these are indeed Killing vectors of the flat metric in polar coordinates,
(3.8.3). One can also verify that they satisfy indeed (6.1.2).
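The second verification can be automated. The following sketch (an illustration added here, assuming sympy is available) computes the Lie brackets of the vector fields (6.1.8) in the (θ, ϕ) coordinates and checks that they close into (6.1.2):

```python
import sympy as sp

theta, phi = sp.symbols('theta phi')
coords = [theta, phi]

# Components (v^theta, v^phi) of the Killing vectors (6.1.8)
l = [
    [sp.sin(phi),  sp.cos(phi) * sp.cot(theta)],   # l1
    [-sp.cos(phi), sp.sin(phi) * sp.cot(theta)],   # l2
    [sp.S(0),      sp.S(-1)],                      # l3
]

def bracket(v, w):
    """Lie bracket [v, w]^mu = v^nu d_nu w^mu - w^nu d_nu v^mu."""
    return [sum(v[n] * sp.diff(w[m], coords[n])
                - w[n] * sp.diff(v[m], coords[n]) for n in range(2))
            for m in range(2)]

# Check [l_i, l_j] = epsilon_ijk l_k, i.e. [l1, l2] = l3 and cyclic.
for i, j, k in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
    diff = [sp.simplify(a - b) for a, b in zip(bracket(l[i], l[j]), l[k])]
    assert diff == [0, 0]
print("the l_i close into the so(3) algebra (6.1.2)")
```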
In fact, in these polar coordinates, it is clear that the r part of the metric plays no
role. So one way to have rotational symmetry is to include a piece f 2 (dθ2 + sin2 θdϕ2 ) in
the metric. A priori, f could be any function of the remaining two coordinates. However,
we can just take f to be one of the two coordinates, and call it r. So we end up with a
term r2 (dθ2 + sin2 θdϕ2 ). (The mass M will be located at r = 0.) Besides r, θ and ϕ, let
us call the remaining coordinate t̃.
Could there be terms involving only one among θ and ϕ, such as drdθ, dt̃dϕ and so
on? Well, such a term would not be rotationally invariant. Indeed one can show that
there is no one-form ω built out of dθ and dϕ which is invariant under the ℓi in (6.1.8) —
namely, such that Lℓi ω = 0. So at this point we end up with

gt̃t̃ dt̃2 + 2grt̃ drdt̃ + grr dr2 + r2 (dθ2 + sin2 θdϕ2 ) . (6.1.9)
14 Perhaps these don't look good to you at θ = 0, π, because of the presence of the cotangent. But recall from section 3.7 that vector fields on a manifold such as S 2 are defined by giving a different expression on each open set. (6.1.8) is appropriate only on UN ∩ US = S 2 \ {θ = 0, π}, in the notation of section 3.8.
As we did there for the metric, it is possible to check that the expression for the ℓi in the coordinates on
UN and US has no poles.

Now let us see if we can get rid of the mixed terms drdt̃ with a coordinate change. r
is already fixed by its appearance in front of the S 2 metric (dθ2 + sin2 θdϕ2 ); so we can
only change t̃. We will take it to be a function of r and of a new coordinate t: t̃ = t̃(t, r).
Focusing on the t̃, r directions:

gt̃t̃ dt̃2 +2grt̃ drdt̃+grr dr2 = gt̃t̃ (∂t t̃)2 dt2 +2(gt̃t̃ ∂t t̃∂r t̃+grt̃ ∂t t̃)dtdr+(gt̃t̃ (∂r t̃)2 +2grt̃ ∂r t̃+grr )dr2 .
(6.1.10)
We see that we can cancel the dtdr term if we take ∂r t̃ = −grt̃ /gt̃t̃ . So we end up with

ds2 = gtt dt2 + grr dr2 + r2 (dθ2 + sin2 θdϕ2 ) . (6.1.11)

What do the coefficients gtt and grr in (6.1.11) depend on? Under the ℓi in (6.1.8),
they transform as functions: indeed (Lℓi g)tt = ℓµi ∂µ gtt , since the ℓi don’t have components
along t; and similarly for grr . So, for ℓi to be Killing vectors, gtt and grr should be
functions of θ, ϕ that are invariant under the ℓi . There are no such functions, except the
constant. So gtt and grr cannot depend on θ, ϕ, but only on t, r.
Summing up, with the requirement of rotational invariance, the metric can be put in
the form (6.1.11), with gtt = gtt (t, r), grr = grr (t, r).

6.2 The Schwarzschild solution


We will now apply the equations of motion of general relativity to the spherically sym-
metric metric (6.1.11). However, to simplify computations, we will anticipate one of the
results: namely, in fact the equations of motion imply that gtt and grr do not depend on
t.
gtt = gtt (r) , grr = grr (r) . (6.2.1)
If we didn’t make this assumption, it would eventually come out anyway from the anal-
ysis, but with more complicated computations. Moreover, it is convenient to define two
functions U and V :
gtt = −e2U (r) , grr = e2V (r) , (6.2.2)
so that we enforce already the fact that the metric should have signature (−1, 1, 1, 1).
Notice that (6.2.2) also implies that the vector

∂t (6.2.3)

is a Killing vector for (6.1.11).

We need to compute the Ricci tensor Rµν . This is unfortunately a bit lengthy, but
there are a few possible tricks to speed up the computation, depending on taste.
The first thing to do is to compute the connection Γµνρ . (Many people find it more
convenient to compute the Ricci tensor via the spin connection ω of appendix B rather
than via Γ; I will leave this to you as an exercise.) One possibility is to just use the
definition (4.4.7). A possible alternative is to write down the particle action (4.7.2), and
extract Γµνρ from (4.7.7). Let us sketch this: the action reads
∫ dτ √(−ẋ2 ) = ∫ dτ √(e2U ṫ2 − e2V ṙ2 − r2 (θ̇2 + sin2 θϕ̇2 )) , (6.2.4)

We derive the equations of motion, and we set −ẋ2 to 1 after varying, as in (4.7.5). (That
is why we call τ the coordinate in (6.2.4); we have also denoted (˙ ) = ∂τ .) The equations
of motion now read
 
∂τ (e2U ṫ) = 0 , ∂τ (e2V ṙ) = −U ′ e2U ṫ2 + V ′ e2V ṙ2 + r(θ̇2 + sin2 θϕ̇2 ) ,
∂τ (r2 θ̇) = r2 sin θ cos θ ϕ̇2 , ∂τ (r2 sin2 θ ϕ̇) = 0 . (6.2.5)
We can now extract Γµνρ from the ẋν ẋρ term in (4.7.7). The non-zero components are

Γttr = U ′ ; Γrtt = e2(U −V ) U ′ , Γrrr = V ′ , Γrϕϕ = sin2 θ Γrθθ = − sin2 θ r e−2V ;
Γθrθ = 1/r , Γθϕϕ = − sin θ cos θ ; Γϕrϕ = 1/r , Γϕθϕ = cot θ . (6.2.6)
where now ( )′ = ∂r .
We now have to compute the Ricci tensor. From its definition (5.1.5) and (4.5.3) we
can obtain several equivalent expressions:
Rµν = ∂ρ Γρµν − ∂ν Γρρµ + Γρρσ Γσµν − Γρµσ Γσνρ
= ∂ρ Γρµν − (1/2)∂µ ∂ν log g + (1/2)∂ρ log g Γρµν − Γρµσ Γσνρ (6.2.7)
= ∂ρ Γρµν − (1/2)∇µ ∂ν log g − Γρµσ Γσνρ .
It can be useful to multiply by dxµ dxν , to write fewer equations. For example:

∂ρ Γρµν dxµ dxν = V ′′ dr2 +∂r (U ′ e2(U −V ) )dt2 −∂r (re−2V )(dθ2 +sin2 θdϕ2 )−cos 2θdϕ2 . (6.2.8)

One can similarly compute the other terms. In total we get


    
Rµν dxµ dxν = e2(U −V ) (U ′′ + U ′ (U ′ − V ′ ) + (2/r)U ′ ) dt2 + (−U ′′ + U ′ (V ′ − U ′ ) + (2/r)V ′ ) dr2
+ (re−2V (V ′ − U ′ ) + 1 − e−2V )(dθ2 + sin2 θdϕ2 ) . (6.2.9)

Now we can apply this to the equations of motion. Away from the origin, the equations
are simply (5.3.10), Rµν = 0.15 Taking sums and differences:
 
U ′′ + (U ′ − V ′ )(U ′ + 1/r) = 0 , U ′ + V ′ = 0 , r(V ′ − U ′ ) − 1 + e2V = 0 . (6.2.10)
Notice the low number of second derivatives. If we think of the r coordinate as a “time”,
this is quite unusual: more typical equations of motion of actions with two derivatives
contain double derivatives. Equations with only first derivatives look more like constraints.
We will see in section 7.1 that this is no coincidence.
From (6.2.10) we see in particular that U + V is a constant. Up to redefinition of t,
we can simply take V = −U from now on. Then
 
U ′′ + 2U ′ (U ′ + 1/r) = 0 , 2rU ′ = e−2U − 1 . (6.2.11)
It is easy to see that in fact the first of these two follows from the second, which is then
all we have to solve; this can be done by quadrature, obtaining
e2U = 1 − rS /r , (6.2.12)
where rS is an integration constant called the Schwarzschild radius. Obviously this solution
seems to do something funny at r = 0, where the mass is located; indeed at this point
one should not solve the vacuum equations of motion, as we have done, but one should
take care of a delta-like Tµν coming from the pointlike mass M at the origin.
Rather than doing so, we can fix the integration constant by interpreting our results

physically. We can go back to our gravitational potential energy Vgrav = m√(−gtt ) from
(4.7.16); from (6.2.2) we see (see figure 15)
Vgrav = meU = m√(1 − rS /r) . (6.2.13)

At large distances we expect to recover the Newtonian behavior. Indeed


Vgrav ∼ m − (m/2)(rS /r) + ... (r ≫ rS ) . (6.2.14)
Apart from the constant piece, we see indeed the 1/r behavior of Newtonian gravity. More precisely, it should behave like −GmM/r. Comparison with (6.2.14) gives then

rS = 2GM . (6.2.15)
15 In this section we are setting to zero the cosmological constant we introduced in (5.5.10).

Figure 15: Gravitational potential for the Schwarzschild black hole.

All in all, our metric (6.2.2) has become

ds2 = −(1 − 2GM/r) dt2 + dr2 /(1 − 2GM/r) + r2 (dθ2 + sin2 θdϕ2 ) . (6.2.16)

This is known as the Schwarzschild metric.
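The whole computation can also be checked by machine. The following sketch (added here as an illustration, assuming sympy is available) computes Γ from its definition and Rµν from the first line of (6.2.7), and verifies that (6.2.16) solves the vacuum equations:

```python
import sympy as sp

t, r, th, ph, G, M = sp.symbols('t r theta phi G M', positive=True)
x = [t, r, th, ph]
f = 1 - 2 * G * M / r

# Schwarzschild metric (6.2.16), signature (-, +, +, +)
g = sp.diag(-f, 1 / f, r**2, r**2 * sp.sin(th)**2)
ginv = g.inv()

# Christoffel symbols Gamma^a_{bc} = (1/2) g^{ad}(d_b g_{dc} + d_c g_{db} - d_d g_{bc})
Gam = [[[sp.simplify(sum(ginv[a, d] * (sp.diff(g[d, c], x[b]) + sp.diff(g[d, b], x[c])
                                       - sp.diff(g[b, c], x[d])) for d in range(4)) / 2)
         for c in range(4)] for b in range(4)] for a in range(4)]

# Ricci tensor, first line of (6.2.7)
def ricci(mu, nu):
    expr = sum(sp.diff(Gam[rho][mu][nu], x[rho]) - sp.diff(Gam[rho][rho][mu], x[nu])
               + sum(Gam[rho][rho][s] * Gam[s][mu][nu]
                     - Gam[rho][mu][s] * Gam[s][nu][rho] for s in range(4))
               for rho in range(4))
    return sp.simplify(expr)

assert all(ricci(mu, nu) == 0 for mu in range(4) for nu in range(4))
print("R_mu_nu = 0: the Schwarzschild metric solves the vacuum equations")
```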

6.3 Falling in
We have already noticed the divergent (but expected) behavior at r = 0, where the mass
M is located. However, something peculiar also happens at r = rS = 2GM . Here gtt = 0,
and grr diverges. Moreover, in the region r < rS , gtt > 0: the t coordinate has become
spatial! The signature of the metric is still correctly (−1, 1, 1, 1), since grr < 0; the r
coordinate has become timelike. Thus, if an observer happens to be in the region r < rS ,
it cannot stay at constant r, since time now flows in the r direction. The shape of the
gravitational potential (6.2.13) seems to suggest that it flows towards small r, so that
such an observer is inevitably sucked towards r = 0. We will soon explore this feature in
detail; the locus r = rS is called horizon.
However, it is worth emphasizing again that (6.2.16) is valid for a pointlike mass,
which is of course an abstraction. A real object would have a finite radius rM , and
using (6.2.16) would be justified only for r > rM ; for r < rM we would have to include
the effects of the non-zero Tµν in the interior, and the resulting metric would change.
For most objects, rM > rS . For example, for Earth we have M⊕ ∼ 6 · 1024 Kg; since
G ∼ 6.7 · 10−11 N m2 /Kg2 = 6.7 · 10−11 m3 /(Kg s2 ), and recalling from (2.2.14) that 1 s ∼ 3 · 108 m, we obtain rS,⊕ = 2GM⊕ ∼ 1 cm. The Schwarzschild metric is a good approximation only down to Earth's surface. So in these cases there is in fact no horizon.
Nevertheless, there do exist objects which are smaller than their Schwarzschild radius:
these are called black holes. Not only do they exist; they are also important from a
theoretical point of view. So we will now discuss (6.2.16) in depth.
First, let us see if we should take the region r ≤ rS seriously. Since gtt ≥ 0 there, one

might be tempted to conclude that spacetime actually ends there. For example let us try
to send objects towards that region, by letting them fall in along geodesics. For simplicity
let us consider purely radial motion, i.e. θ and ϕ constant.
First let us consider light rays. Their trajectories are null everywhere, so ds2 = 0 along them, just like we did in Sec. 3.2. So from (6.2.16) we get (1 − rS /r)1/2 dt = ±(1 − rS /r)−1/2 dr, or in other words

dt = ± dr/(1 − rS /r) ; (6.3.1)

this can be integrated to


 
t = ±r∗ + t0 , r∗ ≡ r + rS log(r/rS − 1) ; (6.3.2)

the ± corresponds to outgoing/ingoing light rays. Ingoing rays are shown in figure 16. As
one can see, t diverges near r = rS . From the point of view of an external observer, the
light ray never enters the horizon. This is of course even more true for a massive geodesic,
since a massive object is even slower. One can formally continue (6.3.2) in r < rS by
replacing log(r − rS ) with log(rS − r); as we noticed earlier, now r is the time coordinate.
However, at this point one could doubt the physical reality of the region inside the horizon.
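As a small symbolic check (an addition to the notes), one can verify that r∗ defined in (6.3.2) really integrates (6.3.1), i.e. that dr∗ /dr = (1 − rS /r)−1 :

```python
import sympy as sp

r, rS = sp.symbols('r r_S', positive=True)

# tortoise coordinate from (6.3.2)
r_star = r + rS * sp.log(r / rS - 1)

# its derivative must reproduce (6.3.1): dr*/dr = (1 - rS/r)^(-1)
assert sp.simplify(sp.diff(r_star, r) - 1 / (1 - rS / r)) == 0
print("dr*/dr = (1 - rS/r)^(-1)")
```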

Figure 16: An infalling light-like geodesic.

Before proceeding, we pause to note an interesting and easy consequence of this dis-
cussion. Suppose an observer at a fixed r1 sends radial light rays to another observer at r2
once every ∆t. Since in our case the geodesic equation doesn’t contain t explicitly, these
rays will arrive at r2 also every ∆t. However, the physical time should be computed with
the metric: at r1 , ∆τ1 = √(−gtt (r1 )) ∆t, and the frequency is thus ν1 = 1/∆τ1 ; likewise at

r2 . So the frequencies are related by
ν2 /ν1 = √(−gtt (r1 ))/√(−gtt (r2 )) = √((1 − rS /r1 )/(1 − rS /r2 )) . (6.3.3)

In particular, if r1 < r2 , ν2 < ν1 . This gravitational redshift is of course the phenomenon


we had predicted in the introduction using the equivalence principle.
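To get a feeling for the size of the effect, one can evaluate (6.3.3) for some sample radii (the specific values below are illustrative choices, not taken from the notes):

```python
# Illustrative radii, in units of the Schwarzschild radius
rS = 1.0

def freq_ratio(r1, r2):
    """nu2/nu1 from (6.3.3), for a signal emitted at r1 and received at r2."""
    return ((1 - rS / r1) / (1 - rS / r2)) ** 0.5

ratio = freq_ratio(3 * rS, 10 * rS)
print(f"nu2/nu1 = {ratio:.3f}")  # less than 1: the signal is redshifted
```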
Let us now consider a massive geodesic, and let us compute the proper time that it
takes to reach the horizon. The first equation of motion in (6.2.5) implies

ṫ = −Ee−2U = −E/(1 − rS /r) . (6.3.4)

(We will see later that E can be interpreted as energy.) In (6.2.5) we were assuming ẋ2 = −1, so that the geodesic is parameterized by proper time; so we also know e2U ṫ2 − e−2U ṙ2 = 1. This implies also ṙ2 = E 2 − e2U = E 2 − 1 + rS /r. We can fix the integration constant E for example by requiring that at the initial radius r0 the particle starts with zero velocity, dr/dt|r=r0 = 0; this requires −E = eU |r=r0 = √(1 − rS /r0 ). The total proper time for the particle to reach the horizon is then


∫ dτ = ∫ dr/√(E 2 − e2U ) = √(r0 /rS ) ∫_{rS}^{r0} dr/√(r0 /r − 1) , (6.3.5)

which is finite. So in fact an infalling object reaches the horizon in finite proper time.
This seems to suggest we should take the horizon (and perhaps its interior) seriously.
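The finiteness of (6.3.5) can also be checked numerically. The sketch below (illustrative values rS = 1, r0 = 10, not from the notes) uses a simple midpoint rule and compares against the closed form obtained from the substitution r = r0 sin2 χ:

```python
import math

# Illustrative values, in units of the Schwarzschild radius
rS, r0 = 1.0, 10.0

# Midpoint-rule estimate of (6.3.5); the integrand diverges at r = r0,
# but the singularity is integrable and the midpoints avoid the endpoint.
N = 200000
h = (r0 - rS) / N
tau = math.sqrt(r0 / rS) * sum(
    h / math.sqrt(r0 / (rS + (i + 0.5) * h) - 1) for i in range(N))

# Closed form from the substitution r = r0 sin^2(chi)
chi1 = math.asin(math.sqrt(rS / r0))
exact = math.sqrt(r0 / rS) * r0 * (math.pi / 2 - chi1
                                   + math.sin(chi1) * math.cos(chi1))

assert abs(tau - exact) / exact < 1e-2
print(f"proper time to reach the horizon: tau = {tau:.2f} (finite)")
```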

6.4 Nothing happens at the horizon


Perhaps all the confusion we are encountering with the horizon is due to a bad choice of
coordinates. Remember for example how (3.8.4) looked singular at the north pole, while
in different coordinates it looks like (3.8.7), which is completely smooth. Or, for that
matter, notice how even the flat metric dr2 + r2 dϕ2 on R2 looks singular at the origin.
Could it be that there is a different system of coordinates where the strange features of
(6.2.16) are cured?
The first test of such an idea is to compute the curvature. After all, the curvature is
an object that transforms well under coordinate changes, and thus it can reveal things
which are true in all coordinate systems. For example, we have already remarked that
we can understand whether a metric can be (locally) brought to the Minkowski form
by a coordinate change by computing its Riemann tensor Rµνρσ ; if it vanishes, then it is

possible to find such a coordinate change. In our case, we don’t want to check whether
the metric is flat (we don’t expect it to be the case), but rather if we can get rid of its
strange features. What should we require of the Riemann tensor? If gµν itself has some
components that blow up or go to zero in a certain coordinate system, we can reasonably
expect that Rµνρσ will also have components that blow up. Indeed for example one can check that Rrϕrϕ blows up at the horizon. But in another coordinate system, perhaps the Jacobian matrix ∂x′µ /∂xν acting on the r index will cure this. Perhaps a better diagnostic
tool is a scalar quantity, which transforms without Jacobian matrices: for example, if
Rµνρσ Rµνρσ blows up, it will blow up in all coordinate systems. For the Schwarzschild
solution (6.2.16), one can check
Rµνρσ Rµνρσ = 12 rS2 /r6 . (6.4.1)
So the origin r = 0 will be peculiar in all coordinate systems, and is in fact called a
singularity for this reason; on the other hand, the horizon r = rS has a chance of being
“cured” in some coordinate system. (In general, it is possible that Rµνρσ Rµνρσ is finite
but that no such coordinate system exists.)
The first step towards obtaining such coordinates is to replace r with r∗ in (6.3.2),
where the radial geodesics are easier-looking:
ds2 = ((r − rS )/r)(−dt2 + dr∗2 ) + r2 (dθ2 + sin2 θdϕ2 ) . (6.4.2)
This is already a simplification, but now we want to get rid of the factor r − rS . If we
define
u ± ≡ t ± r∗ , (6.4.3)
the factor −dt2 + dr∗2 becomes −du+ du− . If we now change coordinates to v+ (u+ ) and
v− (u− ), we can turn this to −(∂u+ /∂v+ )(∂u− /∂v− ) dv+ dv− . So by a change of coordinates we can multiply the coefficient (r − rS )/r by a function of u+ times a function of u− . Now observe that
(6.3.2) also means
er∗ /rS = ((r − rS )/rS ) er/rS . (6.4.4)
but er∗ /rS = eu+ /2rS e−u− /2rS is indeed a product of a function of u+ times a function of u− .
So it is a good idea to choose v± so that exactly these functions appear as ∂u± /∂v± . In other words we take
v± = ±2rS e±u± /2rS . (6.4.5)
Indeed we now get
(rS /r) e−r/rS (−dT 2 + dR2 ) + r2 (dθ2 + sin2 θdϕ2 ) , (6.4.6)
where we have also defined v± = T ± R to get the metric in more familiar time/radial
coordinates. These are called Kruskal–Szekeres coordinates. We see that the bothersome
r − rS has disappeared, and the metric has now nothing special at r = rS . r is not one of
the coordinates, so it should be understood as a function of them determined implicitly
by (6.4.4), (6.4.3), (6.4.5). We see in particular that r = rS corresponds to v± = 0 and
hence T 2 = R2 , while r = 0 corresponds to T 2 − R2 = v+ v− = 4rS2 . We show all this in
figure 17.
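The relation between (T, R) and r can be checked numerically: composing (6.3.2), (6.4.3) and (6.4.5), one should find T 2 − R2 = v+ v− = 4rS2 (1 − r/rS )er/rS , independent of t. A sketch (an added illustration, valid in the region r > rS where (6.4.5) applies):

```python
import sympy as sp

t, r, rS = sp.symbols('t r r_S', positive=True)

r_star = r + rS * sp.log(r / rS - 1)      # tortoise coordinate (6.3.2), r > rS
u_p, u_m = t + r_star, t - r_star         # null coordinates (6.4.3)
v_p = 2 * rS * sp.exp(u_p / (2 * rS))     # Kruskal coordinates (6.4.5)
v_m = -2 * rS * sp.exp(-u_m / (2 * rS))
T, R = (v_p + v_m) / 2, (v_p - v_m) / 2

# T^2 - R^2 = v_+ v_- should equal 4 rS^2 (1 - r/rS) e^{r/rS}:
# it is independent of t and vanishes exactly at the horizon r = rS.
f = sp.lambdify((t, r, rS),
                T**2 - R**2 - 4 * rS**2 * (1 - r / rS) * sp.exp(r / rS))
assert all(abs(f(tv, rv, 1.0)) < 1e-6
           for tv in (0.0, 1.0) for rv in (1.5, 2.0, 5.0))
print("T^2 - R^2 = 4 rS^2 (1 - r/rS) e^{r/rS} outside the horizon")
```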


Figure 17: The Schwarzschild black hole in Kruskal–Szekeres coordinates. The light dotted
lines are hyperbolæ, which correspond to fixed r.

Since the metric has nothing special at r = rS , we now see clearly that we should
take it seriously until r = 0 (the orange hyperbolæ in figure 17). Notice also that radial
massless geodesics in (6.4.6) are given by T = ±R: namely, they are 45° lines in figure 17. Even the strange infalling massless geodesic of figure 16 now has nothing strange about it.
Using this fact and taking time to flow in the direction of positive T , we see that from
every point we can only possibly evolve in points which are contained in a cone from that
point. Imagine we are in the region r > rS ; this corresponds to T 2 < R2 , and consists of
the two quadrants labeled I and I ′ in figure 17. If we cross the horizon r = rS , we go into
the region r < rS (T 2 > R2 ; quadrant II in figure 17), and we see that we cannot go back
to I or I ′ . It is for this reason that this geometry is called a black hole: not even light
can escape from inside the horizon r = rS .
However, in figure 17 we also see that the region r < rS (T 2 > R2 ) also contains a second


Figure 18: A closeup of figure 17. We see the future cones of a point p outside the horizon, and
the future cone of a point p′ inside it.

region II ′ , with T < 0. If we start directly here, generically we will eventually get out
and end up in the quadrant I or I ′ . This region is sometimes called a white hole. It might
look a little surprising, but it is a consequence of the fact that the Schwarzschild solution
is invariant under time reversal t → −t. In the original Schwarzschild coordinates, the
light geodesics coming out of the white hole look like the ones in figure 16, after a time
reversal. These geodesics should evidently exist as well; Kruskal coordinates simply make
this clearer.
On the other hand, this is evidently only possible in a Universe where the black hole
has existed forever: these time-reversed geodesics have an asymptote at r = rS that goes
towards negative t. This is clearly not true for real black holes, since the Universe itself
is not eternal. A black hole in the actual Universe is formed by gravitational collapse of
a star; it will actually contain only regions I and II, and in particular no white hole.

6.5 Penrose diagrams


In general, given a metric, it is crucial to understand its causal structure: namely, which
points in the spacetime are possibly in the future of a given point. This was pretty easy
in figures 17 and 18, because radial light geodesics travel at 45° (and of course massive
geodesics have to stay inside them).

It is even a bit more practical to choose coordinates that make spacetime look compact.
To see how this works, let us first warm up with Minkowski spacetime. We can start from
polar coordinates, adding time to (3.8.3):

ds2 = −dt2 + dr2 + r2 (dθ2 + sin2 θdϕ2 ) . (6.5.1)

Similarly to (6.4.3), we define u± = t ± r, so that −dt2 + dr2 = −du+ du− . These


coordinates have an infinite range, although of course u+ ≥ u− . We can make them have
finite range by a further change of coordinates:

u± = tan p± . (6.5.2)

This changes the range to the set {p± ∈ [−π/2, π/2], p+ ≥ p− }: see figure 19. The former
coordinate r can this time also be expressed explicitly in terms of p± :
r = (1/2)(u+ − u− ) = sin(p+ − p− )/(2 cos p+ cos p− ) . (6.5.3)
Thus (6.5.1) reads now
 
ds2 = (1/(cos2 p+ cos2 p− )) ( −dp+ dp− + (1/4) sin2 (p+ − p− )(dθ2 + sin2 θdϕ2 ) ) . (6.5.4)
Once again we have coordinates in which radial light-like geodesics are lines at 45°. So
once again the future of any point can be inferred easily; see again figure 19. But these
coordinates have the extra property that their range is compact; this makes them even
more useful. The coordinate range in such a coordinate system is called a Penrose diagram.
Notice that a massive particle will inevitably end up at the upper vertex of the triangle.
Only a massless geodesic can end up on the upper diagonal side, since it needs to travel
at 45° to get there.
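The step from (6.5.2) to (6.5.3) rests on the identity tan p+ − tan p− = sin(p+ − p− )/(cos p+ cos p− ), which can be confirmed symbolically (a small added check):

```python
import sympy as sp

pp, pm = sp.symbols('pp pm')  # stand for p_+ and p_-

# u_pm = tan(p_pm); (6.5.3) uses
# tan(p_+) - tan(p_-) = sin(p_+ - p_-)/(cos(p_+) cos(p_-))
lhs = sp.tan(pp) - sp.tan(pm)
rhs = sp.sin(pp - pm) / (sp.cos(pp) * sp.cos(pm))
assert sp.simplify(sp.expand_trig(lhs - rhs)) == 0
print("(6.5.3) checks out")
```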
We can perform a similar coordinate change in the Schwarzschild solution as well. We
can use this time
v± = 2rS tan(p± ) . (6.5.5)
This makes the diagram in figure 17 compact, and turns it into the Penrose diagram in
figure 20(a).
We mentioned at the end of the previous section that actual black holes are formed by
collapse of a star; the Penrose diagram would look more or less like figure 20(b). It is a bit
of a mix between the one for Minkowski and the one for Schwarzschild. Actually, quantum
mechanical considerations lead us to believe that black holes eventually evaporate; this
would further modify the diagram, in a way that you might try to guess.


Figure 19: Penrose diagram for Minkowski space. We show the possible future of one point.
The dashed lines have constant r.

6.6 Schwarzschild geodesics


Having clarified the nature of our solution, we now study non-radial geodesics. Recall
that the Schwarzschild metric is relevant not only for a black hole, but also for the region
outside a spherically symmetric planet or star. We will learn a few tricks that are useful
for other metrics as well.
The equations of motion are again (6.2.5); but they are rather complicated, and it is
better to eliminate as many second derivatives as possible. We can use again the condition
−ẋ2 = 1; actually, we can also include the massless case if we write −ẋ2 = ϵ, with ϵ = 1
for massive particles and ϵ = 0 for massless ones. So we have

e2U ṫ2 − e−2U ṙ2 − r2 (θ̇2 + sin2 θϕ̇2 ) = ϵ (6.6.1)

which is compatible with (6.2.5). You can check that the derivative with respect to
λ of (6.6.1) is a linear combination of (6.2.5). So we can replace one of those equations
with (6.6.1). This is a manifestation of the general remark (4.7.11).
We can also look for constants of motion. We expect these to be associated to the
symmetries of the system. Indeed, given a Killing vector K µ , the quantity

Kµ ẋµ (6.6.2)

Figure 20: (a): Penrose diagram for the Schwarzschild solution. The dashed lines have constant r. (b): Penrose diagram for a collapsing star, which eventually produces a black hole.

is conserved along geodesics. To see this:

∂τ (Kµ ẋµ ) = ∂ν Kµ ẋν ẋµ + Kµ ẍµ = (∂ν Kµ − Γρνµ Kρ )ẋµ ẋν = ∇(ν Kµ) ẋµ ẋν = 0 . (6.6.3)

In particular, to the Killing vector ∂t we can associate the quantity −e2U ṫ, which is nothing
but the E we had defined in (6.3.4). As usual, the conserved quantity associated to time
translation can be interpreted as energy. We also have our rotational Killing vectors
(6.1.8). The associated conserved quantities Li = ℓiµ ẋµ are the components of angular
momentum. We can show that

L2 ≡ Li Li = r4 (θ̇2 + sin2 θϕ̇2 ) . (6.6.4)

Putting all together we find


 
ṙ2 + V (r) = E 2 , V (r) ≡ (1 − rS /r)(L2 /r2 + ϵ) . (6.6.5)

(In fact it is even smarter to arrange our coordinates so that the geodesic is in the θ = π/2
plane; then L1 = L2 = 0, L = L3 = r2 ϕ̇, and (6.6.5) follows effortlessly.)
(6.6.5) now has the familiar form of a particle moving in a radial potential, and it can
be used to infer many things about the existence and properties of orbits. Expanding the
product in V we find four terms; the 1/r3 one would be absent in Newtonian mechanics,
the other three would be present. (The 1/r term is the Newtonian potential, the 1/r2
represents centrifugal force.)

We can use (6.6.5) for example to study closed geodesics. In the massless case ϵ = 0,
we notice that V has a maximum at r = 3GM . Thus for an appropriate value of E we
can let r be constant at that value. This means that we have a closed orbit for light
rays, although it is unstable.
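The claim about the light ring can be checked directly from (6.6.5) with ϵ = 0 (a small added verification):

```python
import sympy as sp

r, G, M, L = sp.symbols('r G M L', positive=True)
rS = 2 * G * M

# massless potential: (6.6.5) with epsilon = 0
V = (1 - rS / r) * L**2 / r**2

extrema = sp.solve(sp.diff(V, r), r)
assert extrema == [3 * G * M]
# and it is a maximum, so the circular light orbit is unstable
assert sp.simplify(sp.diff(V, r, 2).subs(r, 3 * G * M)).is_negative
print("unstable photon orbit at r = 3GM")
```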
For the massive case, V is more complicated, but it can still be studied exactly. The
extrema of V are obtained by solving a second-order equation:

r = r± ≡ (L2 ± √(L4 − 12G2 M 2 L2 ))/(2GM ) . (6.6.6)

For L2 > 12G2 M 2 there is a maximum at r− and a minimum at r+ , corresponding to two circular orbits: the first is unstable and near rS , the other is stable and farther away. Of course this stable one is the one relevant for the motion of planets around a star.
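One can verify symbolically that V ′ = 0 for ϵ = 1 is indeed a quadratic equation in r with the roots r± of (6.6.6) (an added check, assuming sympy):

```python
import sympy as sp

r, G, M, L = sp.symbols('r G M L', positive=True)
rS = 2 * G * M

# massive-particle potential: (6.6.5) with epsilon = 1
V = (1 - rS / r) * (L**2 / r**2 + 1)

# V'(r) = 0, multiplied by r^4, is the quadratic rS r^2 - 2 L^2 r + 3 rS L^2
quad = sp.expand(sp.diff(V, r) * r**4)
r_plus = (L**2 + sp.sqrt(L**4 - 12 * G**2 * M**2 * L**2)) / (2 * G * M)
r_minus = (L**2 - sp.sqrt(L**4 - 12 * G**2 * M**2 * L**2)) / (2 * G * M)

# rS (r - r_+)(r - r_-) must reproduce the same quadratic
assert sp.expand(quad - rS * (r - r_plus) * (r - r_minus)) == 0
print("extrema of V at r_+- as in (6.6.6)")
```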
However, usually orbits are not perfectly circular. An orbit that is only near-circular is
represented in (6.6.5) as the oscillation of r around the minimum. Famously, in Newtonian
gravity these are ellipses; let us see how these are modified in GR. To find the shape of the
orbits in spacetime, we set θ = π/2 as mentioned earlier; then we compute dr/dϕ = ṙ/ϕ̇
from (6.6.5) and L = r2 ϕ̇, getting the single differential equation
∂ϕ r = (r2 /L)√(E 2 − V (r)) . (6.6.7)

This can be solved by quadrature, i.e. by writing it as dϕ = L dr/(r2 √(E 2 − V )) and per-
forming the integral on both sides. It gets a little easier after defining x ≡ 1/r. In the
Newtonian case the integral would be a familiar arcsin, which would reproduce the famil-
iar ellipses. For GR, it is known as an elliptic integral;16 it defines a less familiar but still
quite well-known special function, so the geodesic can still be written in closed form.
Here however let us just try to compute a famous effect: the perihelion precession. The
perihelion and aphelion of an orbit are the minimum and maximum of r. In the Newtonian
case, again in terms of x = 1/r, E 2 − V (1/x) is a quadratic polynomial; the coefficient
of x2 is one, so we can parametrize it as (x − x1 )(x2 − x), where xmin = x1 = 1/rmax and
xmax = x2 = 1/rmin . Then the total angle between perihelion and aphelion is
∆ϕ = ∫_{x1}^{x2} dx/√(E 2 − V (1/x)) = ∫_{x1}^{x2} dx/√((x − x1 )(x2 − x))
= ∫_{−x0}^{x0} dx̃/√(x02 − x̃2 ) = [arcsin(x̃/x0 )]_{−x0}^{x0} = π , (6.6.8)

which is the result we expect for an ellipse. We have defined x̃ = x − δ, δ = (x1 + x2 )/2 = GM/L2 , and x0 ≡ (x2 − x1 )/2.
16 No direct relation of this word with the ellipses of the Newtonian case.

For GR, E 2 − V (1/x) is cubic; the coefficient of the quadratic term is still one, so we
can parameterize it as
E 2 − V (1/x) = (1 + (x1 + x2 )/x3 )−1 (x − x1 )(x2 − x)(1 − x/x3 ) . (6.6.9)
The extra zero x3 ∼ rS−1 ≫ xmin , x, xmax ; so we can approximate
∆ϕ = ∫_{xmin}^{xmax} dx/√(E 2 − V (1/x)) ∼ (1 + rS δ) ∫_{xmin}^{xmax} (dx/√((x − x1 )(x2 − x))) (1 + rS x/2) (6.6.10)
∼ (1 + rS δ) ∫_{−x0}^{x0} (dx̃/√(x02 − x̃2 )) (1 + rS (x̃ + δ)/2) = π(1 + rS δ)(1 + rS δ/2) ∼ π(1 + (3/2) rS δ) .

Now δ ∼ GM/L2 , and we can define the average radius a and eccentricity e through
rmin ≡ a(1 − e), rmax ≡ a(1 + e), to finally obtain
 
∆ϕ ∼ π(1 + 3GM/(a(1 − e2 ))) . (6.6.11)
This mismatch with (6.6.8) is the famous precession. Again an exact result could be
obtained in terms of elliptic functions.

6.7 Charged black holes


Let us now examine what happens if we make the mass also electrically charged. Actually,
this is never the case for real black holes, but the computation is instructive and the
resulting solution is a good testing ground for many theoretical ideas.
Adding electric charge means that our particle is now also coupled to the electromag-
netic force. In flat space, this is described by the action SEM = −∫ d4 x Fµν F µν . Since
the equivalence principle prescribes that physics should look the same in all coordinate
systems, we should extend this so that it is invariant under coordinate changes. So we
should extend it to curved spacetimes by choosing an Lm in (5.3.11), which transforms
like a scalar and which reduces to Fµν F µν in flat space. This is clearly Lm = Fµν F µν
itself, where indices are now understood as being raised by the curved inverse metric g µν :

SEM = −(1/4e2 ) ∫ d4 x √−g Fµν F µν . (6.7.1)
Now we can derive the stress-energy tensor from (5.3.12). It has two contributions, one

from varying the inverse metrics implicit in raising indices, one from varying the √−g:
e2 Tµν = Fµρ Fν ρ − (1/4)gµν Fρσ F ρσ . (6.7.2)
Let us start again from the rotationally-symmetric metric (6.1.11), with (6.2.2). The
Ricci tensor remains (6.2.9), but since the action now also includes (6.7.1), the solution
will no longer be the Schwarzschild metric. First of all, we have to decide what the
electromagnetic field strength Fµν is like. Since we took the metric to be rotationally
invariant, we should also require it for Fµν . In other words:

Lℓi Fµν = 0 . (6.7.3)

It is not hard to show, along the lines of our analysis in section 6.1, that the only two-
forms which satisfy this are dt ∧ dr and sin(θ)dθ ∧ dϕ; moreover, their coefficient can
only depend on r. In fact, for both we have a physical interpretation. Ftr corresponds
to a radial electric field, which is appropriate for an electric charge at the origin. On the
other hand, Fθϕ corresponds to a radial magnetic field, which would be appropriate for a
magnetic monopole at the origin (which might or might not exist in nature). So we will
only keep
F = Ftr dt ∧ dr . (6.7.4)
All the other components of F thus vanish (except of course Frt = −Ftr ). As we men-
tioned, rotational invariance also requires Ftr = Ftr (r).
Before we look at the equations of motion for gravity, we should also think about the
ones for electromagnetism! These are the familiar ones in flat space, with the extra
effect of the presence of gravity. They read

∂[µ Fνρ] = 0 , ∇µ F µν = 0 . (6.7.5)

The first follows automatically from (3.4.10), while the second can be derived from (6.7.1).
We note that the second can also be written as

∂µ (√−g F µν ) = 0 (6.7.6)

by using (5.3.6). The only component is

∂r (e−(U +V ) r2 Ftr ) = 0 . (6.7.7)

This can be solved by taking


Ftr = (q/r2 ) eU +V , (6.7.8)
where q is a constant which represents the electric charge of the particle.
Now we can consider the Einstein equations, in the form (5.1.20). We can see from
(6.7.2) that T = 0; so (5.1.20) simplifies to Rµν = 8πGTµν . We can also compute
Tµν dxµ dxν = (1/2e2 ) Ftr2 e−2(U +V ) (e2U dt2 − e2V dr2 + r2 (dθ2 + sin2 θdϕ2 )) ; (6.7.9)
notice the signs.
Now, the first thing we can do is to check whether the equation U ′ + V ′ = 0 in (6.2.10)
has been modified. Luckily, it has not; so we can still take V = −U . Of the two remaining equations of motion, it is still true that one is first order, and that it implies the
other, which is second order. This first order equation reads
−e2U (2rU ′ + 1) + 1 = (4πG/e2 ) Ftr2 r2 . (6.7.10)
Using (6.7.8), choosing e appropriately and writing e2U (2rU ′ + 1) = ∂r (re2U ), we can
easily find
e2U = 1 − 2GM/r + Gq 2 /r2 . (6.7.11)
Thus the resulting solution is a rather minimal modification of the Schwarzschild solution:
ds2 = −(1 − 2GM/r + Gq 2 /r2 ) dt2 + dr2 /(1 − 2GM/r + Gq 2 /r2 ) + r2 (dθ2 + sin2 θdϕ2 ) , (6.7.12)

known as the Reissner–Nordström solution.


Notice that now there are two values of r where gtt = 0 and grr = ∞:
r = r± ≡ GM ± √(G2 M 2 − Gq 2 ) . (6.7.13)
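A quick check of (6.7.13), added as an illustration: gtt = 0 gives the quadratic r2 − 2GM r + Gq 2 = 0, which factorizes with roots r± :

```python
import sympy as sp

r, G, M, q = sp.symbols('r G M q', positive=True)

# g_tt = -(1 - 2GM/r + G q^2/r^2) from (6.7.12); g_tt = 0 gives
# the quadratic r^2 - 2 G M r + G q^2 = 0
quad = r**2 - 2 * G * M * r + G * q**2
r_plus = G * M + sp.sqrt(G**2 * M**2 - G * q**2)
r_minus = G * M - sp.sqrt(G**2 * M**2 - G * q**2)

assert sp.expand(quad - (r - r_plus) * (r - r_minus)) == 0
# the two roots merge in the extremal case sqrt(G) q = GM, i.e. q = sqrt(G) M
assert sp.simplify((r_plus - r_minus).subs(q, sp.sqrt(G) * M)) == 0
print("horizons at r_+- = GM +- sqrt(G^2 M^2 - G q^2)")
```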

For q = 0, r+ = rS = 2GM . On the other hand, for the critical value √G q = GM , the two r± coincide, and for √G q > GM they become imaginary; in other words, in that case gtt never vanishes.

We see in figure 21 a graph of the gravitational potential Vgrav in the case √G q < GM and in the “extremal” case √G q = GM . These are to be compared with the potential
for the Schwarzschild case (q = 0) in figure 15. At large r the potentials in figure 21 start
going down like 1/r, just like in figure 15. However, at smaller r the contribution from
the electric charge dominates, and the potential goes up again.
In the case of figure 21(a), we see that Vgrav hits zero at the two radii r = r± . A
particle at r > r+ is attracted towards the black hole; in the region between the two radii,
r− < r < r+ , grr is negative and r becomes the time coordinate, just like inside the horizon
in the Schwarzschild black hole; so we don’t show Vgrav in that region. However, when the
particle goes in the region r < r− , grr is positive again, and r is again a spatial variable.
Vgrav is now repulsive, and the particle tends to get back in the region r− < r < r+ , then
back inside r− , and so on. Going to a less confusing set of coordinates, one can obtain a
Penrose diagram (see figure 22) which shows that in fact at each such back-and-forth one
ends up in a new copy of the spacetime.

Figure 21: Gravitational potential for (a) the q < √G M case, and (b) the “extremal” q = √G M case.

In the “extremal” q = GM case, r± coincide, and the radial coordinate is always
spatial; in figure 21(b) we see that the potential touches zero, and goes up again. (This
case is of special relevance in supergravity theories, where it is the only case that does
not break supersymmetry.)

Finally, there is also a case where q > √G M . In this case gtt is never zero, and one
can travel all the way to the origin r = 0 without ever crossing a horizon. This might
seem good news, but it should in fact be a little unsettling. It means that an observer can
be influenced by what happens near r = 0. The curvature there is very large (just like in
(6.4.1) for Schwarzschild). So we have a naked singularity: a singularity at r = 0 which
is not covered by a horizon. From our discussion in section 5.5, we expect conventional
general relativity to break down when curvature is too large (basically when the Riemann
tensor is > M²Pl , in natural units). So in this case physics everywhere can potentially be
influenced by a region where general relativity does not apply. From this point of view,
horizons are a good thing: they prevent physics in the rest of spacetime from being influenced
by what happens in a region where we do not know which theory should apply. It is
thus natural to conjecture (weak) cosmic censorship, namely that there are no dynamic
processes in nature that create naked singularities such as the one for (6.7.12) with q > √G M .

Actually a similar problem also presents itself already in the subextremal q < √G M
case, in the region r < r− . For this reason one sometimes calls r = r− a “Cauchy”
horizon: inside it, one can no longer predict the future from initial data. A different
“strong” cosmic censorship conjecture postulates that not even this can happen. In fact,
various stability analyses seem to suggest that the region inside this Cauchy horizon is
unstable in various ways, and that even a small amount of matter entering will modify it
so much that a new singularity develops there, preventing the inner region from even existing.

Figure 22: Penrose diagram for q < √G M .

6.8 Rotating black holes


Black holes in nature are not electrically charged, but they do rotate, often very fast. To
describe such black holes, we can forget about electromagnetism, but we should also not
assume full rotational invariance any more. The rotation singles out a direction, and we
expect that only one of the three ℓi in (6.1.8) is now still a symmetry.
The process to find a solution is now significantly more involved than the one we saw
for Schwarzschild, and we will not describe it. It is no longer a good idea to use polar
coordinates; there is a different set of “elliptical” coordinates, which generalize to three
dimensions the ones we see in figure 23.
In two dimensions, elliptic coordinates are defined by

    x = √(r² + a²) sin θ ,   y = r cos θ .   (6.8.1)

Notice that if we define sinh λ ≡ r/a, we have √(r² + a²) = a cosh λ, and (6.8.1) can be
rewritten as

    x + iy = a sin(θ + iλ) .   (6.8.2)
Indeed the ellipses and hyperbolæ in figure 23 can be thought of as the level sets of the
two harmonic functions Ref and Imf , where f is the holomorphic function arcsin(x + iy).
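The relation (6.8.2) can be spot-checked numerically; the sample values of a, r, θ below are arbitrary:

```python
import cmath
import math

# Arbitrary sample point (r, theta) and focal distance a
a, r, th = 1.3, 0.7, 0.4
lam = math.asinh(r / a)                      # sinh(lambda) = r/a

x = math.sqrt(r**2 + a**2) * math.sin(th)    # (6.8.1)
y = r * math.cos(th)

# (6.8.2): x + iy = a sin(theta + i lambda)
z = a * cmath.sin(complex(th, lam))
assert abs(z - complex(x, y)) < 1e-12
```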
It is easy to generalize (6.8.1) to three dimensions:
    x = √(r² + a²) sin θ cos ϕ ,   y = √(r² + a²) sin θ sin ϕ ,   z = r cos θ .   (6.8.3)

Figure 23: Elliptic coordinates in two dimensions. The red ellipses have constant r; the gray
hyperbolæ have constant θ. The distance between the foci is a.

The sets r = const are now the ellipsoids

    (x² + y²)/(r² + a²) + z²/r² = 1 ;   (6.8.4)

the sets θ = const are hyperboloids. We can think of figure 23 as depicting a cross-section
at y = 0 of these ellipsoids and hyperboloids.
In these coordinates, the rotating black hole solution reads

    ds² = −dt² + (2mr/ρ²)(dt − a sin²θ dϕ)² + (r² + a²) sin²θ dϕ² + ρ² (dr²/Δ + dθ²)   (6.8.5)

where now m = GM , and

ρ2 = r2 + a2 cos2 θ , ∆ = r2 − 2mr + a2 . (6.8.6)

(6.8.5) is called the Kerr solution, and as we will see it describes a rotating black hole. At
large r, or for m → 0, (6.8.5) reduces to −dt2 + the flat metric on R3 , but in the unfamiliar
elliptic coordinates (6.8.3). In particular, r = 0 is not a point as in spherical coordinates,
but rather a disk: in figure 23 we see it sideways, and it appears as a segment.
It is not easy to describe in simple words how this solution was found. Imposing that
there are only two Killing vectors leads to a complicated system of partial differential
equations, that are very hard to solve. Several interesting solutions have the property of

being algebraically special.17 Kerr’s insight was to impose this condition as well as the
expected symmetries. This constrains the problem enough that it reduces it to ordinary
differential equations, i.e. that all remaining functions have to depend on r alone.18
The physical interpretation of (6.8.5) is made harder by the fact that it is non-diagonal:
there are cross-terms dtdϕ. These terms are in fact the origin of the rotation. First of
all, we would like to see if there is a horizon. Our first instinct might be to check gtt : it
is equal to −1 + 2mr/ρ², and it vanishes first at the locus r = r₀,

    r₀(θ) = m + √(m² − a² cos²θ) .   (6.8.7)

Inside this locus, the vector ∂t (which is Killing, just like for the other solutions we have
seen so far) is spacelike. However, this is not the horizon. Indeed at this point r has not
become a time direction, and we are not forced to follow it. That only happens when
grr = ρ²/Δ changes sign, which first happens at r = r₊, one of the zeros of Δ:

    r± ≡ m ± √(m² − a²) .   (6.8.8)

This seems a bit paradoxical: in the region

r+ < r < r0 (6.8.9)

neither ∂t nor ∂r are timelike. Where does time point towards? To see this, we have to
look at the cross-terms dtdϕ in (6.8.5). The t–ϕ block of gµν is

    ( −1 + 2mr/ρ²                 −(2amr/ρ²) sin²θ                )
    ( −(2amr/ρ²) sin²θ     (r² + a²) sin²θ + (2mr/ρ²) a² sin⁴θ )  .   (6.8.10)

To see if there is a timelike direction hidden here, we can compute its determinant, which
after some simplification reads

    gtt gϕϕ − gtϕ² = −Δ sin²θ ,   (6.8.11)
17 This is defined in terms of the Weyl tensor, which is obtained from the Riemann tensor by projecting
it on a certain irreducible representation. Just like Riemann, it can be thought of as a map from the
space of two-planes to itself, or in other words a 6 × 6 matrix. It turns out that it can only have four
different eigenvalues. When at least two coincide, the solution is said to be algebraically special. This is
called the Petrov classification. A theorem by Robinson and Trautman gives an independent characterization
as the existence of a light-like vector field ξ which is geodesic and shear-free: ∇(µ ξν) = (1/3) gµν ∇ρ ξ^ρ . The
latter version is what Kerr used. It turns out that ξ = −dt − dr + a sin²θ dϕ.
18 If one is able to guess the structure of (6.8.5), perhaps from looking at the Minkowski or Schwarzschild
metric in elliptic coordinates, or by somehow knowing the ξ in the previous footnote, it is then possible
to solve the equations of motion for the remaining functions of r.

which in fact is the reason for the name we gave to ∆. So, when r > r+ , r is still spacelike,
but there is a mix of ∂t and ∂ϕ which is timelike. So, in the region (6.8.9), we are not
yet forced to follow the r direction, but we are forced to follow a mix of t and ϕ: in other
words, we are forced to rotate. In this region, which is called ergosphere, even light rays
have to rotate, because time flows in a rotating direction.
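The determinant identity (6.8.11) can be verified directly; in the sketch below cos²θ is written as 1 − sin²θ (with s standing for sin²θ), so the check is purely algebraic:

```python
import sympy as sp

r, a, m = sp.symbols('r a m', positive=True)
s = sp.symbols('s', positive=True)          # s = sin^2(theta)

rho2 = r**2 + a**2*(1 - s)                  # rho^2, using cos^2 = 1 - sin^2
Delta = r**2 - 2*m*r + a**2

# the t-phi block (6.8.10)
gtt = -1 + 2*m*r/rho2
gtphi = -2*a*m*r*s/rho2
gphiphi = (r**2 + a**2)*s + 2*m*r*a**2*s**2/rho2

# (6.8.11): gtt gphiphi - gtphi^2 = -Delta sin^2(theta)
assert sp.simplify(gtt*gphiphi - gtphi**2 + Delta*s) == 0
```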


Figure 24: The various regions of a Kerr black hole. Notice that the surfaces r = r0 and r = r+
(the boundaries of the ergosphere) touch at θ = 0. The black ring is the singularity at r = 0,
θ = π/2.

For r < r+ , ∆ is negative, and the time direction becomes r: so r = r+ is really


the horizon, from which we cannot escape outside. At even smaller radii, we encounter
the smaller surface r = r− , where r becomes spacelike again. The solution is singular
(i.e. curvature explodes) at the locus {r = 0, θ = π/2}, which is the boundary of the disc
at r = 0 and thus is a ring, not a single point. See figure 24 for a summary. (You might
also want to look at the lovely figure 30 in [5]).
In fact the ring is in a sense a “branch cut surface”, going through which one ends
up in a second copy of the same spacetime, where r < 0. In this second copy one can
even find a region where ∂ϕ is timelike: since ϕ ∼ ϕ + 2π, there are closed timelike curves,
meaning that time repeats itself.

Our discussion of black hole solutions is of course far from exhaustive. For example,
one can combine charge and rotation, to obtain the so-called Kerr–Newman solution. One
can also accelerate the black hole; and one can add the effect of the cosmological constant

Λ of (5.5.10).19 However, the general black hole solution only depends on relatively few
parameters. For example, if one specifies that the solution should be stationary and
asymptotically flat, the parameters are only three: mass, charge, angular momentum.
One cannot “distort” a black hole by throwing stuff in it: the perturbations die out
quickly, and one gets a simple solution. This property is sometimes expressed by saying
that “black holes have no hair”. However, it does not really hold if one couples the theory
to an arbitrary number of matter fields.

7 Time evolution
We will now focus on time evolution. The most important application of this is to the
evolution of the Universe itself, i.e. to cosmology — although we will give only a brief
introduction to that huge and important subject.

7.1 Constraints
We will start with some general considerations. We will separate time from the space
directions; the disadvantage of this is that diffeomorphisms that mix time with space are
no longer a manifest symmetry of the theory. The advantage, however, is that this will
show explicitly that the theory has constraints, as usual for a gauge theory. We have seen
already a manifestation of this in the equations of motion for “radial evolution”, (6.2.10),
and we will see another in the cosmological setting of section 7.2.
Thus let us take spacetime to be a direct product of R with a coordinate t, and a
“space” manifold M3 (along which we will use indices i, j = 1, 2, 3.) A convenient choice
of variables is the following:

ds2 = gµν dxµ dxν = −N 2 dt2 + γij (dxi + N i dt)(dxj + N j dt) . (7.1.1)

N and N i are sometimes called “lapse” and “shift”.20 γij represents the metric of the
three-dimensional manifold M3 .
It is also useful to introduce the one-form

n = nµ dxµ = N dt , (7.1.2)
19 The general solution of general relativity coupled to electromagnetism that takes all these solutions
into account is known as Plebanski–Demianski solution. There are several techniques to find exact
solutions of general relativity; see for example [6].
20 See for example [7, Chap. 21.4] for some explanation of these names.

whose corresponding vector reads nµ ∂µ = − N1 (∂t − N i ∂i ), and

Pµν = gµν + nµ nν . (7.1.3)

This is a projector: Pµ ν Pν ρ = Pµ ρ . Since Pµν nν = 0, we can think of it as a “space”


projector. (Such a projector is called mathematically first fundamental form.) Finally,
we will need to consider the “time derivative” of P : formally
    Kµν = (1/2) Ln Pµν .   (7.1.4)
(This is also called second fundamental form.) It can be shown that Kµν = ∇µ nν + nµ aν ,
where aµ = nν ∇ν nµ ; and from this, by direct evaluation, we see
 
    Kij = (1/N) ( (1/2) γ̇ij + ∇(i Nj) ) ,   (7.1.5)

where ∇i is the Levi-Civita connection of γij .
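As an illustrative numerical check (with randomly generated lapse, shift and spatial metric, not values from the text), one can confirm that Pµν of (7.1.3), built from the metric (7.1.1) and the one-form (7.1.2), is a projector annihilating n:

```python
import numpy as np

rng = np.random.default_rng(0)

# Random ADM data: lapse N, shift N^i, positive-definite spatial metric gamma_ij
N = 1.0 + rng.random()
Ni = rng.random(3)                                   # N^i (upper index)
A = rng.random((3, 3))
gam = A @ A.T + 3*np.eye(3)

# Assemble g_{mu nu} from (7.1.1)
g = np.zeros((4, 4))
g[1:, 1:] = gam
g[0, 0] = -N**2 + Ni @ gam @ Ni
g[0, 1:] = g[1:, 0] = gam @ Ni

# n_mu = (N, 0, 0, 0) from (7.1.2); P_{mu nu} = g_{mu nu} + n_mu n_nu (7.1.3)
n = np.array([N, 0.0, 0.0, 0.0])
P = g + np.outer(n, n)

ginv = np.linalg.inv(g)
Pud = ginv @ P                           # mixed-index P^mu_nu
assert np.allclose(Pud @ Pud, Pud)       # projector: P^2 = P
assert np.allclose(P @ (ginv @ n), 0)    # P_{mu nu} n^nu = 0
```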


We can now show how the Lagrangian (5.2.9) of general relativity is written in terms
of the variables in (7.1.1):
    ∫ d⁴x √−g R = ∫ dt d³x N √γ (R₃ + K^ij Kij − K²)   (7.1.6)

where R3 is the Ricci scalar of γij , and γ = det(γij ) its determinant; moreover K = K i i =
Kij γ ij . The terms K 2 and K ij Kij are a kinetic term for γij ; R3 is instead a sort of potential
for it. What is peculiar about this action, however, is that the time derivatives of the
variables N and N i never appear. Thus their equations of motion will only contain first
time derivatives, rather than second time derivatives as more usual equations of motion
do. Indeed by varying N we get (paying attention to the prefactor 1/N in (7.1.5))

R3 = K ij Kij − K 2 ; (7.1.7)

varying N i , and integrating a ∇i by parts we get

∇i K ij = 0 . (7.1.8)

These indeed contain only first time derivatives, via the term γ̇ij in (7.1.5).
In the Hamiltonian formalism, (7.1.7) and (7.1.8) can be thought of as constraints,
since they contain only variables and their first time derivatives. N and N i , which enforce

these constraints, can be then thought of as Lagrange multipliers.21 To see this more
clearly, consider introducing momenta
    π = δL/δṄ ,   πi = δL/δṄ^i ,   π^ij = δL/δγ̇ij .   (7.1.9)
As we mentioned, the Lagrangian does not depend on Ṅ or Ṅi , so we have

π=0, πi = 0 (7.1.10)

identically: these are already constraints. The other momenta are π^ij = √γ (K^ij − γ^ij K).
If we define (7.1.6) to be ∫ dt L, the Hamiltonian is now

    H = ∫ d³x (π Ṅ + πi Ṅ^i + π^ij γ̇ij − L)
      = ∫ d³x ( π Ṅ + πi Ṅ^i + N √γ (K^ij Kij − K² − R₃) + Ni ∇j π^ij ) .   (7.1.11)

Since π = 0, we should also have π̇ = 0; but this is {H, π} = √γ (K^ij Kij − K² − R₃). Thus we
recover (7.1.7). Similarly, πi = 0 recovers (7.1.8).
These constraints can also be derived more geometrically. There are various identities,
known collectively as Gauss–Codazzi equations, relating the curvature of a subspace to
the curvature of a bigger space which contains it. We can apply this to space M3 inside
spacetime R × M3 . For example, for the scalar curvature we have

R3 = R + 2Rµν nµ nν + K µν Kµν − K 2 , (7.1.12)

where Rµν is the spacetime Ricci tensor. Contracting Einstein’s equations Rµν − 21 gµν R = 0
with nµ nν , we see that R + 2Rµν nµ nν = 0. Thus we recover (7.1.7). One can recover
(7.1.8) in a similar way. See [8, Chap. 10.2] for more details.

7.2 Cosmology
We are now going to consider time evolution of our Universe, seen at very large scales.
Observations show that space (at fixed time) is very symmetric. One often says that
it is homogeneous (no point in it is special) and isotropic (no direction is special). Math-
ematically, this means that there are many Killing vectors. An example is the flat R3
21 Actually, in field theory one usually calls “Lagrange multiplier” a field such that none of its
derivatives (along either time or space) appear in the Lagrangian. It is only in the context of the present
Hamiltonian discussion, which in field theory would be a little clumsy, that we can call N and N i “La-
grange multipliers”. Nevertheless, (7.1.7) and (7.1.8) are very commonly called Hamiltonian constraint
and diffeomorphism constraint respectively.

metric, dx² + dy² + dz². This has six Killing vectors: three translations ∂x , ∂y , ∂z , which
express homogeneity, and three rotations (6.1.7). Together, these generate the Lie algebra
of the so-called “Euclidean group” ISO(3).
Six is the maximum number of Killing vectors one can have in d = 3. (In general the
maximum number is d(d+1)/2.) So R³ is said to be a “maximally symmetric space”. There
are other possibilities, which are classified.
One is the three-sphere S 3 . This is an analogue of the two-sphere S 2 in example 3.6,
but in one dimension higher. It can be realized as the locus
    S³ = { Σi (x^i)² = 1 } ⊂ R⁴ .   (7.2.1)

(Just as we stressed in our discussion about the origin of curvature in section 4.1, it is
important to remember that this R4 has no physical reality; it is just an auxiliary entity.)
The six Killing vectors are simply the generators of the rotations in this R4 , which can
be expressed explicitly as x[i ∂j] , in analogy to (6.1.7). The corresponding group is called
SO(4). To find a nice metric for S 3 , we can solve the constraint (7.2.1) by introducing
polar coordinates; we get

ds2S 3 = dρ2 + sin2 ρ(dθ2 + sin2 θdϕ2 ) . (7.2.2)

Another possibility is the so-called hyperbolic space H3 , which can be obtained in a


similar way by starting from a hyperboloid. Again this has six Killing vectors; the sym-
metry group is SO(3,1) (which is coincidentally also the Lorentz group in four spacetime
dimensions). The metric can be obtained from (7.2.2) by changing sin ρ → sinh ρ:

ds2H3 = dρ2 + sinh2 ρ(dθ2 + sin2 θdϕ2 ) . (7.2.3)

All of these have a particularly simple Riemann tensor:

    ³Rijkl = k(gik gjl − gjk gil )   (7.2.4)

(the ³ is to recall that this is for three-dimensional space, without time); here k = 0 for
R³, k = 1 for S³, and k = −1 for H3 . From this it also follows that

    ³Rij = 2k gij .   (7.2.5)

A space with this property is called an Einstein space; there are many which do not have
the property (7.2.4).

There are also other possibilities, obtained from these three (R3 , S 3 , H3 ) by identifying
points in certain ways. By such identifications we can for example turn R3 into a compact
space, such as the three-torus T³; there are fancier possibilities for H3 . Although current
observations show that ³Rij is very nearly zero, they cannot rule out the cases S³, H3
or their quotients. Indeed it would be possible that space has a metric of the form
ds² = a² ds²S³ , with a a very large number. Under a constant rescaling of the metric, it
is easy to see that the Ricci tensor remains invariant. So the Ricci tensor of a² ds²S³ is
(2k/a²) times the metric itself, and we see that a very large a can be compatible with
observations.
Having described the situation at fixed time, let us see what happens with time evolu-
tion. We can avoid spoiling the symmetries we mentioned by not mixing time with space.
Moreover, we can make the a we just introduced a function of time. This leads to the
Friedmann–Lemaı̂tre–Robertson–Walker (FLRW) metric:

ds2 = −dt2 + a2 (t)ds23 , (7.2.6)

where ds23 = gij3 dxi dxj is any of the maximally symmetric spaces we discussed — in
particular enjoying the properties (7.2.4) and (7.2.5). The variables we are using here
are a bit different from the ones in section 7.1, because of all the symmetries we have
imposed. We have set N to 1; this can always be done by changing the coordinate t. We
have set N i = 0 because it would violate the isotropy condition. Also, we have redefined
γij = a2 gij3 , with the idea that all the time dependence goes in a, and that the new gij3 is
time-independent.
Just like we did in the context of black holes, we have to compute the Ricci tensor
of (7.2.6); the dependence on t plays the role that was previously played by r. The
Levi-Civita connection reads

    Γ^i_jk = Γ³^i_jk ,   Γ^i_j0 = (ȧ/a) δ^i_j ,   Γ⁰_jk = a ȧ g³jk ,   (7.2.7)

where Γ³^i_jk denotes the connection of the three-dimensional metric, and (˙) = ∂t . From
this it is easy to compute the Ricci tensor

    Rij = ³Rij + (2ȧ² + a ä) g³ij ,   R00 = −3 ä/a .   (7.2.8)
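For the flat case k = 0, g³ij = δij, the Christoffel symbols (7.2.7) and the Ricci tensor (7.2.8) can be checked by brute force; the Gamma and Ricci helpers below are ad hoc implementations of the standard formulas, not library calls:

```python
import sympy as sp

t, x, y, z = sp.symbols('t x y z')
a = sp.Function('a')(t)
X = [t, x, y, z]
g = sp.diag(-1, a**2, a**2, a**2)      # flat FLRW (k = 0), cf. (7.2.6)
ginv = g.inv()

def Gamma(l, m, nu):
    # Gamma^l_{m nu} of the Levi-Civita connection
    return sp.Rational(1, 2)*sum(ginv[l, s]*(sp.diff(g[s, m], X[nu])
            + sp.diff(g[s, nu], X[m]) - sp.diff(g[m, nu], X[s])) for s in range(4))

def Ricci(m, nu):
    return sp.simplify(sum(sp.diff(Gamma(l, m, nu), X[l]) - sp.diff(Gamma(l, m, l), X[nu])
            + sum(Gamma(l, l, s)*Gamma(s, m, nu) - Gamma(l, nu, s)*Gamma(s, m, l)
                  for s in range(4)) for l in range(4)))

ad = a.diff(t)
assert sp.simplify(Gamma(1, 1, 0) - ad/a) == 0        # Gamma^i_{j0} = (adot/a) delta^i_j
assert sp.simplify(Gamma(0, 1, 1) - a*ad) == 0        # Gamma^0_{jk} = a adot g^3_{jk}
assert sp.simplify(Ricci(0, 0) + 3*a.diff(t, 2)/a) == 0             # R_00 = -3 addot/a
assert sp.simplify(Ricci(1, 1) - (2*ad**2 + a*a.diff(t, 2))) == 0   # R_ij for k = 0
```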
This would be enough to study the evolution of the metric in the absence of matter. It
is more interesting and more realistic, however, to also include the effect of matter. Since
we are working at very large scales, we should impose again homogeneity and isotropy.
This leads to
T00 = ρ , Tij = pγij = pa2 gij3 . (7.2.9)

In terms of the tensors introduced in section 7.1, we can write this as

Tµν = ρnµ nν + pPµν = (ρ + p)nµ nν + pgµν , (7.2.10)

where recall that in our case N = 1, so that nµ dxµ = dt. Notice that these ρ and p are
not completely unconstrained: they have to satisfy the continuity equation (5.1.4). Using
(7.2.7) we get

    ρ̇ = −3 (ȧ/a) (ρ + p)   (7.2.11)
for the ν = 0 component of (5.1.4). The ν = i component is automatic.
We now have all the elements to evaluate the Einstein equations; it is convenient to
use the form (5.1.20). The 00 component gives
    ä/a = −(4πG/3) (ρ + 3p) .   (7.2.12)
The ij components are all equal, in the sense that those equations are all multiplied by
gij3 . The overall factor is
    (k + ȧ²)/a² = (8πG/3) ρ .   (7.2.13)
We see once again an equation with first derivatives only. In fact one can check that in
absence of matter (when ρ = 0) this is simply (7.1.7).
Since (7.2.13) can then be thought of as a constraint, one should check that its deriva-
tives do not give rise to extra constraints. We have not shown this in general, but we
will do so in our current example. These extra constraints might come about by taking
a derivative of (7.2.13), and substituting ä from (7.2.12). This would give rise to an-
other first order equation, which might be inconsistent with (7.2.13). In fact, deriving
(7.2.13) and using (7.2.12) gives (7.2.11)! So everything is consistent: we don’t get a new
constraint.
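This consistency argument can be run symbolically; a quick sketch:

```python
import sympy as sp

t, G, k = sp.symbols('t G k')
a = sp.Function('a')(t)
rho = sp.Function('rho')(t)
p = sp.Function('p')(t)

friedmann = (k + a.diff(t)**2)/a**2 - sp.Rational(8, 3)*sp.pi*G*rho   # (7.2.13) = 0
accel = a.diff(t, 2)/a + sp.Rational(4, 3)*sp.pi*G*(rho + 3*p)        # (7.2.12) = 0
continuity = rho.diff(t) + 3*(a.diff(t)/a)*(rho + p)                  # (7.2.11) = 0

# d/dt of the constraint, after eliminating a'' via (7.2.12) and k via
# (7.2.13), is proportional to the continuity equation: no new constraint
d = friedmann.diff(t)
d = d.subs(a.diff(t, 2), sp.solve(sp.Eq(accel, 0), a.diff(t, 2))[0])
d = d.subs(k, sp.solve(sp.Eq(friedmann, 0), k)[0])
assert sp.simplify(d + sp.Rational(8, 3)*sp.pi*G*continuity) == 0
```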
From a practical standpoint, this also means that we can ignore (7.2.12) and use simply
(7.2.13) and (7.2.11), which are both first-order and easier to solve. These are collectively
called Friedmann equations. Sometimes one also defines the Hubble parameter

    H ≡ ȧ/a .   (7.2.14)
To study the Friedmann equations (7.2.11), (7.2.13), we need to know a little more
about ρ and p. For different types of energy, there are different types of relations between
them, but these relations are usually linear, so that it’s a good idea to define w ≡ p/ρ.
(7.2.11) then becomes ρ̇/ρ = −3(1 + w) ȧ/a , which is solved by

ρ ∝ a−3(1+w) . (7.2.15)
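A quick symbolic check of (7.2.15):

```python
import sympy as sp

t, w, C = sp.symbols('t w C')
a = sp.Function('a')(t)

rho = C*a**(-3*(1 + w))                                   # candidate solution (7.2.15)
continuity = rho.diff(t) + 3*(1 + w)*(a.diff(t)/a)*rho    # (7.2.11) with p = w rho
assert sp.simplify(continuity) == 0
```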

There are several components to the current energy density. First of all we have matter.
At the present moment, it is distributed with such small density that the pressure can be
ignored, so that
p=0, w=0, ρm = ρm0 a−3 . (7.2.16)
For massless particles, things are different. We can see this for photons: the stress-energy
tensor (6.7.2) has T = 0. On the other hand, from (7.2.9) or (7.2.10) we have T = −ρ+3p.
So in this case we have
    p = (1/3) ρ ,   w = 1/3 ,   ρr = ρr0 a⁻⁴ ;   (7.2.17)
3 3
the subscript r stands for “radiation”. Finally, it is time to also consider the cosmological
constant term Λ that we considered in (5.5.10); we will soon see why it deserves its name.
It modifies the equations of motion as follows:
    Rµν − (1/2) gµν R + Λ gµν = 8πG Tµν .   (7.2.18)
Alternatively, the term with Λ can be considered as yet another contribution to the right-hand
side, defining TΛ µν = −(1/(8πG)) Λ gµν . Comparing with (7.2.10) we see that Λ contributes

p = −ρ , w = −1 , ρΛ = ρΛ0 = const . (7.2.19)

Perhaps counterintuitively, this type of contribution to the energy density stays constant
even as the Universe expands (that is, as a(t) increases).
The total energy density ρ is simply the sum of the contributions (7.2.16), (7.2.17)
and (7.2.19). The various integration constants are the value of each contribution today.
a(t) can then be determined from (7.2.13).
All these constants have been determined by observations. First of all, it turns out
that the term k/a² is very nearly zero right now; in other words, from (7.2.13) we see

    ρ = 3H₀²/(8πG) ≡ ρc ,   (7.2.20)

where H₀ ≡ H(t = 0) is the value of H = ȧ/a right now. Moreover, the observed values of
the integration constants are

    ρΛ0 ∼ 0.7 ρc ,   ρm0 ∼ 0.3 ρc ,   ρr0 ∼ 10⁻⁵ ρc .   (7.2.21)

The solution for a(t) coming from (7.2.13) for these parameters is shown in figure 25.
In fact, the values (7.2.21) were obtained by fitting various observations for a at various
times in the past with the theoretical results discussed in this section.


Figure 25: The evolution of a(t) for our universe, with t = 0 being now. We see the
beginning of the Universe in the past, and acceleration in the future.

Perhaps surprisingly, ρΛ is the dominant contribution to the energy density today. It


is for this reason that the Universe is accelerating, as one can see from the future part of
figure 25. To understand this, consider for a moment the case where only Λ is present.
(7.2.13) becomes

    ȧ² = (Λ/3) a² − k   (7.2.22)

which has a solution

    adS = (1/2) (e^(t/L) + k L² e^(−t/L)) ,   L ≡ √(3/Λ) .   (7.2.23)
The various possibilities k = (0, 1, −1) in fact all correspond to the same space, called de
Sitter space. It can be described as a hyperboloid (a bit like hyperbolic space, but with
time), which for different values of k is “sliced” in different ways. For k = 0, for example,
this is simply an exponential behavior; we see the beginning of this in the future of figure
25. We will see this in more detail in the next subsection.
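One can check that (7.2.23), in the form written here, solves (7.2.22) for any k, and that for k = 1 it reduces to L cosh(t/L) after a shift of t (cf. (7.3.4) below):

```python
import sympy as sp

t, L = sp.symbols('t L', positive=True)
k = sp.symbols('k')

a = sp.Rational(1, 2)*(sp.exp(t/L) + k*L**2*sp.exp(-t/L))    # (7.2.23)
# (7.2.22) with Lambda = 3/L^2
assert sp.simplify(a.diff(t)**2 - (a**2/L**2 - k)) == 0

# for k = 1, shifting t -> t + L log L gives (L/2)(e^{t/L} + e^{-t/L}) = L cosh(t/L)
a1 = sp.expand(a.subs(k, 1).subs(t, t + L*sp.log(L)))
assert sp.simplify(a1 - L*(sp.exp(t/L) + sp.exp(-t/L))/2) == 0
```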
Going back to our Universe, the radiation part in (7.2.21) is minuscule today, but
(7.2.17) shows that it was dominant at some point in the past. Even earlier, it is thought
that the Universe underwent a time where the cosmological constant was much higher
than today (perhaps because of some phase transition), which accelerated the Universe
much faster than it is accelerating today, in a period of rapid expansion called inflation.
Of course the detailed study of the history of the Universe is a fascinating subject, which
would be well worth a course of its own.

7.3 de Sitter space
We now discuss in more detail de Sitter (dS) space, introduced in the previous subsection.
This is the behavior of the Universe at large time, t → +∞: since the matter and radiation
components decrease as a−3 and a−4 respectively, in this limit only the contribution of Λ
remains, and we have to solve (7.2.22), whose solution is (7.2.23).
We claim dS is a hyperboloid in R⁵, described by the equation22

    −X₀² + Σ_{i=1..4} Xi² = L² ,   (7.3.1)

with the metric

    ds²dS4 = −dX₀² + Σ_{i=1..4} dXi² .   (7.3.2)

To see this, we can parameterize (7.3.1) by taking

X0 = L sinh(t/L) , Xi = L cosh(t/L)X̂i (7.3.3)


where Σ_{i=1..4} X̂i² = 1 parameterizes a three-sphere S³. Now (7.3.2) becomes

ds2dS4 = −dt2 + cosh2 (t/L)ds2S 3 . (7.3.4)

This is indeed of the form (7.2.6), with ds23 = ds2S 3 (and hence k = 1) and a given by
(7.2.23).
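The slicing (7.3.3)–(7.3.4) can be verified by pulling back (7.3.2); to keep the sketch short one S³ angle is suppressed (the ϕ direction works identically):

```python
import sympy as sp

t, L = sp.symbols('t L', positive=True)
chi, th = sp.symbols('chi theta')

# Embedding (7.3.3); hat{X} parameterizes an equatorial S^2 inside S^3
X0 = L*sp.sinh(t/L)
Xhat = [sp.cos(chi), sp.sin(chi)*sp.cos(th), sp.sin(chi)*sp.sin(th)]
X = [X0] + [L*sp.cosh(t/L)*h for h in Xhat]

# hyperboloid constraint (7.3.1)
assert sp.simplify(-X[0]**2 + sum(x**2 for x in X[1:]) - L**2) == 0

# pull back the flat metric (7.3.2) onto (t, chi, theta)
coords = [t, chi, th]
eta = [-1, 1, 1, 1]
g = sp.Matrix(3, 3, lambda i, j: sp.simplify(sum(
    e*sp.diff(f, coords[i])*sp.diff(f, coords[j]) for e, f in zip(eta, X))))

# expect -dt^2 + L^2 cosh^2(t/L)(dchi^2 + sin^2 chi dtheta^2), i.e. (7.3.4)
assert sp.simplify(g[0, 0] + 1) == 0
assert sp.simplify(g[0, 1]) == 0 and sp.simplify(g[0, 2]) == 0
assert sp.simplify(g[1, 1] - L**2*sp.cosh(t/L)**2) == 0
assert sp.simplify(g[2, 2] - L**2*sp.cosh(t/L)**2*sp.sin(chi)**2) == 0
```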
Many other coordinate systems can be obtained in a similar way, by grouping the
squares in (7.3.1) in different ways. For example we can group

X02 − X12 − X22 − X32 = L2 sinh2 (t/L) , X4 = L cosh(t/L) , (7.3.5)

which obviously satisfies (7.3.1). So this time the constant-time loci are themselves hy-
perboloids X02 − X12 − X22 − X32 = λ2 ; we can now parameterize these with

X0 = λ cosh r , Xi = λ sinh rxi (7.3.6)


P3 2 2 2
where i=1 xi = 1 describes a two-sphere S . Now, at fixed λ the metric −dX0 +
dX12 + dX22 + dX32 with the embedding coordinates (7.3.6) gives the metric of Euclidean
hyperbolic three-space (7.2.3); and the metric (7.3.2) with the embedding (7.3.5) gives

ds2dS4 = −dt2 + L2 sinh2 (t/L)ds2H3 . (7.3.7)


22 One should be careful about the sign of the right-hand side: if we take −L², we obtain a Euclidean
space, which happens to be an analytic continuation of the anti-de Sitter space we will describe in the next
subsection.

This is again of the form (7.2.6), with ds23 = ds2H3 (hence k = −1) and a given by (7.2.23).
Reproducing the k = 0 case in (7.2.23) is harder. We introduce

X0 ± X4 = Lu± , Xi = Lu+ xi , (7.3.8)

where now xi are unconstrained. (7.3.1) becomes

    u₋ = xi xi u₊ − u₊⁻¹ .   (7.3.9)

Taking u+ = et/L , (7.3.2) becomes

    ds²dS4 = −dt² + e^(2t/L) ds²R³ ,   (7.3.10)

once again of the form (7.2.6), with ds23 = ds2R3 (k = 0), a given by (7.2.23).
We show a lower-dimensional analogue of these three ways of slicing dS4 in figure 26.
The lines on the hyperboloid represent constant-t loci; they are circles, hyperbolas and
parabolas, for the k = 1, −1, 0 cases, respectively. They are obtained by intersecting the
hyperboloid with different planes.

Figure 26: Three-dimensional hyperboloids analogous to (7.3.1), with constant-t loci cor-
responding to the coordinate systems in (7.3.4), (7.3.7), (7.3.10) respectively.

Given how we obtained dS from (7.3.1), it is clear that it inherits the symmetries of
that equation: all linear transformations in R⁵ that leave (7.3.1) invariant lead to isometries.
We can write (7.3.1) as η⁵µν X^µ X^ν = L², with µ, ν = 0, . . . , 4 and η⁵ of signature (−1, 1, 1, 1, 1). So
these transformations form the group

SO(4, 1) (7.3.11)

which is then the symmetry group of dS4 . So the space has 10 Killing vectors, which is
the maximum number one can have in d = 4 (since d(d+1)/2 = 10). So dS is a maximally
symmetric spacetime, and it obeys a property similar to (7.2.4):

    Rµνρσ = (Λ/3)(gµρ gνσ − gµσ gνρ )   (7.3.12)

from which Rµν = Λ gµν , which is equivalent to (7.2.18) for Tµν = 0 and Λ = 3/L²; the latter
is in agreement with (7.2.23).
To draw a Penrose diagram for de Sitter space, we can start from the coordinates in
(7.3.4), which cover all of it. If we define a coordinate η via
    cosh(t/L) = 1/cos η ,   (7.3.13)

(7.3.4) turns into

    ds²dS4 = (L²/cos²η)(−dη² + ds²S³) .   (7.3.14)
Recall that ds2S 3 = dρ2 + sin2 ρds2S 2 . Ignoring the S 2 as usual, the coordinates η ∈
[−π/2, π/2] and ρ ∈ [0, π] span a square. The vertical sides of the square don’t represent
a boundary: they are the loci ρ = 0, π, or in other words the poles of the S 3 . The
horizontal sides, on the other hand, represent the infinite past and future t → ±∞ (η → ±π/2).
This square is shown in figure 27 on the left. The other two squares are copies of
the same Penrose diagram, but showing in green the regions covered by the hyperbolic
(7.3.7) and flat (7.3.10) coordinate systems respectively. Looking at the hyperboloids in
figure 26, we can imagine straightening them into finite cylinders and turning them ninety
degrees to the left; one can then see the green regions of figure 27 reflected in the regions
covered by the constant-t lines in figure 26.
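A short symbolic check of the change of coordinates (7.3.13), restricted to 0 < η < π/2 (the other half is symmetric):

```python
import sympy as sp

eta, L = sp.symbols('eta L', positive=True)

# (7.3.13) on 0 < eta < pi/2 (the t > 0 half)
t = L*sp.acosh(1/sp.cos(eta))
dt_deta = sp.diff(t, eta)

conf = L**2/sp.cos(eta)**2
# -dt^2 + cosh^2(t/L) ds_S3^2 = (L^2/cos^2 eta)(-deta^2 + ds_S3^2), i.e. (7.3.14)
assert sp.simplify(dt_deta**2 - conf) == 0
assert sp.simplify(sp.cosh(t/L)**2 - conf/L**2) == 0
```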
There is another important coordinate system, where the metric is not of the form
(7.2.6) but is static. To obtain it, group −X02 + X42 = L2 sin2 ψ and X12 + X22 + X32 =
L2 cos2 ψ. Now we can complete the parameterization as

X0 = L sin ψ sinh T , X4 = L sin ψ cosh T , Xi = L cos ψxi (7.3.15)


where Σ_{i=1..3} xi² = 1. Now (7.3.2) becomes

    ds²dS4 = L² (dψ² − sin²ψ dT² + cos²ψ ds²S²) .   (7.3.16)

This is static: no metric component depends on T , and there are no components mixing T
with a spatial coordinate. We can also redefine t = LT , r = L cos ψ, getting
    ds²dS4 = −(1 − r²/L²) dt² + dr²/(1 − r²/L²) + r² ds²S² .   (7.3.17)

Figure 27: The Penrose diagram of de Sitter space is a square. We see here in green the
loci covered by the coordinate systems in (7.3.4), (7.3.7), (7.3.10) respectively.

This is of the “Schwarzschild” form we saw in section 6.2:

−e2U dt2 + e−2U dr2 + r2 ds2S 2 (7.3.18)


with e^(2U) = 1 − r²/L². Indeed we could have found this similarly to how we found Schwarzschild,
by imposing rotation invariance (which is part of invariance under (7.3.11)) and including a
cosmological term in the Einstein equations, as in (7.2.18) (with Tµν = 0). This coordinate
system again doesn’t cover all of dS, but only the region in green in figure 28; it is called
“static patch”.
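Both the grouping behind (7.3.15) and the equivalence of (7.3.17) with (7.3.16) can be checked symbolically:

```python
import sympy as sp

psi, T, L = sp.symbols('psi T L', positive=True)

# embedding (7.3.15): the X0, X4 pair indeed gives -X0^2 + X4^2 = L^2 sin^2 psi
X0 = L*sp.sin(psi)*sp.sinh(T)
X4 = L*sp.sin(psi)*sp.cosh(T)
assert sp.simplify(-X0**2 + X4**2 - L**2*sp.sin(psi)**2) == 0

# substituting t = L T, r = L cos psi in (7.3.17) reproduces (7.3.16)
r = L*sp.cos(psi)
f = 1 - r**2/L**2
assert sp.simplify(-f*L**2 + L**2*sp.sin(psi)**2) == 0   # -(1-r^2/L^2) dt^2 -> -L^2 sin^2 psi dT^2
assert sp.simplify(sp.diff(r, psi)**2/f - L**2) == 0     # dr^2/(1-r^2/L^2) -> L^2 dpsi^2
```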

Figure 28: On the left, constant-t loci for dS in coordinates (7.3.17); on the right, region
covered by those coordinates.

Given that it is static, we can also look at
    Vgrav = m e^U = m √(1 − r²/L²) .   (7.3.19)
We see that it is repulsive. So objects tend to get away from r = 0. Moreover, in the
region r > L we have gtt > 0, grr < 0, just like inside the horizon of a black hole.
Note also that r = 0 is not a special locus: it is the southern pole of the S 3 , but we can use
the SO(4, 1) symmetry to put any other point at r = 0.
So in de Sitter space any observer is surrounded by a horizon! This represents the fact
that at some point objects start getting away from each other very fast — so much so that
at some point they cannot reach each other with a message any more. The final fate of
any observer is complete loneliness.

7.4 Anti-de Sitter space


Anti-de Sitter space (AdS) is obtained similarly by considering the hyperboloid
    X₀² + X₄² − Σ_{i=1..3} Xi² = L²   (7.4.1)

with metric
3
X
ds2AdS4 = −dX02 − dX42 + dXi2 . (7.4.2)
i=1

The symmetry group is now

SO(3, 2) ,   (7.4.3)

which is again ten-dimensional; so this is also maximally symmetric, and again it satisfies
(7.3.12), this time with Λ = −3/L^2.
A set of coordinates covering the whole space can be obtained by parameterizing

X_0 = L cosh ρ cos τ ,   X_4 = L cosh ρ sin τ ,   X_i = L sinh ρ x_i   (7.4.4)

where x_1^2 + x_2^2 + x_3^2 = 1. Then (7.4.2) gives

ds^2_{AdS_4} = L^2 ( −cosh^2 ρ dτ^2 + dρ^2 + sinh^2 ρ ds^2_{S^2} ) .   (7.4.5)

A constant-τ slice is nothing but hyperbolic space (7.2.3). Here τ would be limited to be
in [0, 2π], but one usually “unwraps” it and takes

τ ∈ ℝ .   (7.4.6)

Figure 29: AdS as a hyperboloid.
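A quick sympy check (a sketch; α and β below are assumed coordinates on the unit S^2, so that x_1^2 + x_2^2 + x_3^2 = 1 automatically): the parameterization (7.4.4) satisfies the hyperboloid constraint (7.4.1), and pulling (7.4.2) back through it reproduces the global metric (7.4.5).

```python
import sympy as sp

tau, rho, al, be, L = sp.symbols('tau rho alpha beta L', positive=True)
u = [tau, rho, al, be]
x1, x2, x3 = sp.sin(al)*sp.cos(be), sp.sin(al)*sp.sin(be), sp.cos(al)

X = [L*sp.cosh(rho)*sp.cos(tau), L*sp.cosh(rho)*sp.sin(tau),    # X_0, X_4
     L*sp.sinh(rho)*x1, L*sp.sinh(rho)*x2, L*sp.sinh(rho)*x3]   # X_i

# hyperboloid constraint (7.4.1)
assert sp.simplify(X[0]**2 + X[1]**2 - X[2]**2 - X[3]**2 - X[4]**2 - L**2) == 0

eta = sp.diag(-1, -1, 1, 1, 1)   # signature of (7.4.2): -dX_0^2 - dX_4^2 + dX_i^2
J = sp.Matrix([[sp.diff(X[I], u[m]) for m in range(4)] for I in range(5)])
g = sp.simplify(J.T * eta * J)   # induced metric

expected = L**2 * sp.diag(-sp.cosh(rho)**2, 1, sp.sinh(rho)**2,
                          sp.sinh(rho)**2 * sp.sin(al)**2)
assert sp.simplify(g - expected) == sp.zeros(4, 4)
```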

We can also put (7.4.5) in a “Schwarzschild-like” form by redefining t = Lτ, r = L sinh ρ:

ds^2_{AdS_4} = − ( 1 + r^2/L^2 ) dt^2 + dr^2 / ( 1 + r^2/L^2 ) + r^2 ds^2_{S^2} .   (7.4.7)

Alternatively we can introduce tan θ = sinh ρ, which gives

ds^2_{AdS_4} = (L^2 / cos^2 θ) ( −dτ^2 + dθ^2 + sin^2 θ ds^2_{S^2} ) .   (7.4.8)
Now θ ∈ [0, π/2]. So the constant-τ time slice is now realized as a ball in ℝ^3, of radius
π/2.²³ Since in (7.4.8) radial light rays propagate at 45° in the (τ, θ) plane, forgetting the
S^2 we can obtain the Penrose diagram of AdS_4, which is an infinite vertical strip, with a
vertical line on the left representing r = 0 and a vertical boundary on the right representing
r = ∞, an actual spatial infinity (unlike in the dS case, where neither vertical line was a
boundary). Reinstating the S^2, we can imagine AdS as an infinite vertical cylinder.
Moreover, since all these coordinate systems are static, it is useful to look at the gravitational potential

V_grav = Lm cosh ρ = Lm √(1 + r^2/L^2) = Lm / cos θ .   (7.4.9)
A graph of this in the θ coordinate (forgetting the ϕ of S^2) is given in figure 30. We
see that the potential tends to make all objects “fall” towards the center. In fact the
isometries (7.4.3) are such that any point can be taken to be the center. What happens
is that if from any point we shoot objects in all directions, and they evolve following their
geodesics, they refocus at the same space point, in a time ∆τ = 2π.
²³ There is a famous two-dimensional analogue of this: the Poincaré metric on a disk ⊂ ℝ². M. C. Escher
has given several interesting pictorial representations of this space. See also the book [9] for a nice visual
introduction to hyperbolic geometry.

Figure 30: Gravitational potential of AdS. The radial coordinate is θ in (7.4.8).

(7.4.5) is not of FLRW form (7.2.6). AdS can also be put in FLRW form, but in fewer
ways. This can be seen already from (7.2.22): since now Λ < 0, the right-hand side is
negative for both k = 0 and k = 1, while the left-hand side is positive. So these two cases
are not possible. The case k = −1 can be realized by taking

X_0 = L sin t cosh r ,   X_4 = L cos t ,   X_i = L sin t sinh r x_i ,   (7.4.10)

where x_1^2 + x_2^2 + x_3^2 = 1. Then (7.4.2) gives

ds^2_{AdS_4} = L^2 ( −dt^2 + sin^2 t ds^2_{H^3} ) .   (7.4.11)

This is of the form (7.2.6) once one rescales t and takes a = L sin t; this is formally in
agreement with (7.2.23) once one takes into account that Λ is now negative.
Finally, there is also an embedding similar to (7.3.8) that turns the metric into

ds^2_{AdS_4} = L^2 ( dr^2 + e^{2r} ds^2_{Mink_3} ) .   (7.4.12)

These are called Poincaré coordinates.

8 Gravitational waves
We will now look at one last striking consequence of general relativity: the existence of
gravitational waves.
These are small perturbations around a metric. For simplicity we will work in vacuum
(Tµν = 0) and consider perturbations hµν around the flat metric, as in (5.1.1): gµν ∼
ηµν + hµν . In this weak-gravity regime, the Ricci tensor has the expression (5.1.8). Back

when we motivated the Einstein equations, we only looked at the 00 component of the
equations of motion; now we need all components. We have

R_µν − (1/2) g_µν R = −(1/2) □h̄_µν + ∂^ρ ∂_(µ h̄_ν)ρ − (1/2) η_µν ∂^ρ ∂^σ h̄_ρσ   (8.0.1)

where

h̄_µν = h_µν − (1/2) h η_µν ,   h ≡ h^µ_µ = η^µν h_µν .   (8.0.2)
As we always did when working in the weak-gravity regime (5.1.1), we raise and lower
indices with the flat metric ηµν ; in particular, □ ≡ η µν ∂µ ∂ν is the flat-space d’Alembertian.
h̄_µν is called the “trace-reversed” metric perturbation, because its trace h̄ = −h. One
could also avoid introducing this new object: since we are working with T_µν = 0, the equations
of motion are in fact R_µν = 0, and we could have used the old (5.1.8) instead of (8.0.1).
But (8.0.2) does simplify the analysis that follows a bit; moreover, it is then
more easily generalized to situations where T_µν ≠ 0.
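A tiny numerical sanity check of (8.0.2) (a sketch, not in the notes): trace reversal flips the sign of the trace, and applying it twice gives back h_µν.

```python
import numpy as np

eta = np.diag([-1.0, 1.0, 1.0, 1.0])
rng = np.random.default_rng(0)
h = rng.normal(size=(4, 4))
h = h + h.T                                # a random symmetric perturbation

def trace(m):                              # h = eta^{mu nu} h_{mu nu}
    return np.einsum('mn,mn->', np.linalg.inv(eta), m)

hbar = h - 0.5*trace(h)*eta                # (8.0.2)
assert np.isclose(trace(hbar), -trace(h))  # "trace-reversed": bar-h = -h
hbarbar = hbar - 0.5*trace(hbar)*eta
assert np.allclose(hbarbar, h)             # reversing twice is the identity
```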
Just like for electromagnetism, writing down the equations of motion is only part of
the story. In that case, the equations of motion in vacuum are □A_µ − ∂_µ (∂_ν A^ν) = 0,
but there is also a gauge transformation A_µ → A_µ + ∂_µ λ. The latter can be used to
set ∂_µ A^µ to zero, so that the equation of motion simplifies to □A_µ = 0, which can be
solved by going to momentum space by Fourier transform, where it reads k^2 A_µ(k) = 0. So
A_µ(k) is supported on k^2 = 0, and moreover k^µ A_µ = 0. So at this point, for any k, the
vector A_µ(k) is orthogonal to it, and thus can point in three directions, one of which is
k itself. (k is orthogonal to itself, since k^2 = 0.) But we have not fixed gauge invariance
completely: if one picks λ such that □λ = 0, then ∂_µ A^µ = 0 remains true. In momentum
space, A_µ → A_µ + k_µ λ, and we see that we can remove the component of A_µ parallel to
k_µ. This leaves us with two polarizations.
As we know, gravity is also a gauge theory, with coordinate changes as its gauge trans-
formations. So we will now try to perform a similar analysis as with electromagnetism.
The infinitesimal action of a coordinate change consists of Lie derivatives: δgµν = (Lξ g)µν ,
for ξ a vector field. Applying the formula (3.5.9) of a Lie derivative to gµν = ηµν + hµν
and keeping the lowest-order term, we see

δh_µν = 2∂_(µ ξ_ν) .   (8.0.3)

From this we get h → h + 2∂^µ ξ_µ, and hence

δh̄_µν = 2∂_(µ ξ_ν) − η_µν ∂^ρ ξ_ρ .   (8.0.4)

This might look more complicated than (8.0.3), but in fact it has an advantage: it gives

∂^µ h̄_µν → ∂^µ h̄_µν + □ξ_ν ,   (8.0.5)

thanks to a cancellation of two terms. Now, if ∂^µ h̄_µν ≠ 0, we can take ξ_ν(k) = −(1/k^2) k^µ h̄_µν(k)
in momentum space; this makes the right-hand side of (8.0.5) vanish, and so ∂^µ h̄′_µν = 0. In other words, we
can always choose the gauge

∂^µ h̄_µν = 0 .   (8.0.6)
(The transformation of ∂ µ hµν is slightly more complicated, but we could have also used
gauge transformations to set this one to zero. It will soon make no difference.) This does
not fix gauge invariance completely: as long as we take ξν such that □ξν = 0, we won’t
spoil the condition (8.0.6) we have achieved. In any case, now the equation of motion,
which consists in setting (8.0.1) to zero, simplifies to

□h̄_µν = 0 .   (8.0.7)

So far we have been general, but to achieve further progress we now focus on a single
“monochromatic” wave,

h̄_µν = Re ( h^0_µν e^{i k_µ x^µ} ) .   (8.0.8)

Then (8.0.7) and (8.0.6) give

k^2 = 0 ,   k^µ h^0_µν = 0 .   (8.0.9)

The trace h^0 now transforms as h^0 → h^0 + 2k·ξ. Again, if h^0 ≠ 0, we can pick ξ_µ such
that 2k·ξ = −h^0. So we can assume

h^0 = (h^0)^µ_µ = 0 .   (8.0.10)

Since h̄µν only differs from hµν by a trace, from now on h̄µν = hµν . Still, if we gauge
transform by a ξ such that k · ξ = 0, we won’t spoil (8.0.10); so we have three more gauge
parameters for every choice of momentum k.
To fix the residual gauge invariance, choose now a timelike vector u^µ such that k·u = 1.
Now

u^µ h^0_µν → u^µ ( h^0_µν + 2k_(µ ξ_ν) ) = u^µ h^0_µν + M^µ_ν ξ_µ ,   M^µ_ν ≡ δ^µ_ν + u^µ k_ν .   (8.0.11)

The matrix M is non-degenerate: on the space k^⊥ of vectors v orthogonal to k, it is the
identity (M^µ_ν v^ν = v^µ); on u it has eigenvalue 2 (M^µ_ν u^ν = 2u^µ); and moreover, by our
assumptions u and k^⊥ span ℝ^4. So we can now pick ξ_ρ = −(M^−1)^ν_ρ u^µ h^0_µν, and set the

right-hand side of (8.0.11) to zero. Notice that such a ξ is orthogonal to k, since M is
the identity on k ⊥ , to which k belongs, and hµν k ν = 0; so we don’t spoil (8.0.10). In
conclusion, we can also set
uµ h0µν = 0 . (8.0.12)

To summarize, we have fixed our gauge invariance by setting (8.0.6) (or (8.0.9)),
(8.0.10), and (8.0.12). This is called the transverse traceless (TT) gauge.
To see more concretely what it means, take for example u = (1, 0, 0, 0), and split the
indices µ into 0 and i. Then the TT gauge becomes

h^0_0ν = 0 ,   k^i h^0_ij = 0 ,   h^0_ii = 0 .   (8.0.13)

Even more concretely we can pick for example k = (1, 1, 0, 0); then

          ⎛ 0  0   0     0   ⎞
h^0_µν =  ⎜ 0  0   0     0   ⎟   (8.0.14)
          ⎜ 0  0   h_+   h_× ⎟
          ⎝ 0  0   h_×  −h_+ ⎠
where h_+ and h_× are two arbitrary complex numbers. So there are two possible polarizations,
just like for electromagnetism. This could also be predicted by looking at the
theory of unitary representations of the Poincaré group.
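This counting can also be checked by brute-force linear algebra (a numerical sketch, using the u and k chosen above): the conditions k^µ h^0_µν = 0, u^µ h^0_µν = 0 and the vanishing of the trace are 9 linear constraints of rank 8 on the 10 components of the symmetric h^0_µν, leaving a two-dimensional space.

```python
import numpy as np
from itertools import combinations_with_replacement

eta = np.diag([-1.0, 1.0, 1.0, 1.0])
k = np.array([1.0, 1.0, 0.0, 0.0])                 # the null momentum chosen above
u = np.array([1.0, 0.0, 0.0, 0.0])

pairs = list(combinations_with_replacement(range(4), 2))   # 10 symmetric components

def sym(p):                                        # basis element of symmetric matrices
    m = np.zeros((4, 4))
    m[p] = m[p[::-1]] = 1.0
    return m

basis = [sym(p) for p in pairs]
rows = []
for nu in range(4):
    rows.append([np.einsum('m,mn->n', k, b)[nu] for b in basis])   # k^m h_{mn} = 0
    rows.append([np.einsum('m,mn->n', u, b)[nu] for b in basis])   # u^m h_{mn} = 0
rows.append([np.einsum('mn,mn->', np.linalg.inv(eta), b) for b in basis])  # trace = 0

A = np.array(rows)                                 # 9 constraints on 10 components
assert 10 - np.linalg.matrix_rank(A) == 2          # the two polarizations h_+, h_x
```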
To see what these waves represent physically, we can also look at their effect on some
matter. We will take some particles and analyze the effect of an incoming wave. We
take the coordinate system so that one of the particles is at the origin, and at rest. The
position of a nearby particle is given by a deviation vector dµ , which is governed by the
geodesic deviation equation (4.8.5), with U µ = (1, 0, 0, 0). We then get

d̈^i = R^i_00j d^j .   (8.0.15)

Moreover, from (5.1.7)

R^i_00j = ∂_0 ∂_[0 h_j]i − ∂_i ∂_[0 h_j]0 = (1/2) ḧ_ij .   (8.0.16)

So

d̈^i = (1/2) ḧ_ij d^j .   (8.0.17)

Since h_ij is small, we can solve this as

d^i ∼ d_0^i + (1/2) h_ij d_0^j .   (8.0.18)
So the waves displace nearby particles in a way dictated by the polarization tensor h^0_µν.
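For instance (a small numerical sketch, with illustrative values and a pure + polarization assumed), applying (8.0.18) to a ring of test particles transverse to a wave travelling along x shows the characteristic stretching along one axis and squeezing along the other, reversing every half period:

```python
import numpy as np

h_plus, h_cross, omega = 0.2, 0.0, 1.0        # pure + polarization (illustrative values)
phi0 = np.linspace(0, 2*np.pi, 8, endpoint=False)
d0 = np.stack([np.cos(phi0), np.sin(phi0)])   # initial ring in the (y, z) plane

def ring(t):
    # d^i = d0^i + (1/2) h_ij(t) d0^j, eq. (8.0.18), for a wave along x
    h = np.array([[h_plus, h_cross], [h_cross, -h_plus]]) * np.cos(omega*t)
    return d0 + 0.5 * h @ d0

# at t = 0 the ring is stretched along y and squeezed along z;
# half a period later (omega t = pi) the deformation is reversed
assert np.allclose(ring(0.0)[0], 1.1*d0[0]) and np.allclose(ring(0.0)[1], 0.9*d0[1])
assert np.allclose(ring(np.pi)[0], 0.9*d0[0]) and np.allclose(ring(np.pi)[1], 1.1*d0[1])
```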

Acknowledgments
Thanks to all the students who have pointed out errors in these notes; in particular to
M. Sacchi and S. Liguori.

A Forms∗
As we saw in section 3.4, k-forms are antisymmetric (0, k) tensors. In this appendix we
explore a bit further their properties, in the “intrinsic notation” that we used for example
when writing a 1-form as ω = ωµ dxµ , and that we saw for a two-form in (3.4.12).
In general, a k-form can be expanded as

ω = (1/k!) ω_µ1…µk dx^µ1 ∧ … ∧ dx^µk .   (A.0.1)

The k! is consistent with (3.4.12), and it is useful in order to avoid repetitions. For
example, (3.4.12) reads explicitly F = (1/2)(F_01 dx^0 ∧ dx^1 + F_10 dx^1 ∧ dx^0 + …) = F_01 dx^0 ∧ dx^1 + … .
The antisymmetrized derivative that we saw for example in (3.4.10) is implemented by
introducing

d ≡ dx^µ ∂_µ ,   (A.0.2)

so that we have

dω_k = (1/k!) ∂_µ ω_µ1…µk dx^µ ∧ dx^µ1 ∧ … ∧ dx^µk = (1/k!) ∂_[µ ω_µ1…µk] dx^µ ∧ dx^µ1 ∧ … ∧ dx^µk .   (A.0.3)

Notice that the antisymmetrization in the last step can be inserted for free. Of course d
takes a k-form to a (k + 1)-form.
Another important operator is the contraction ι_µ, which takes a k-form to a (k − 1)-form:

ι_µ dx^µ1 ∧ … ∧ dx^µk = k δ_µ^[µ1 dx^µ2 ∧ … ∧ dx^µk] .   (A.0.4)
So for example ι_0 dx^0 ∧ dx^1 = dx^1. One then defines ι_v ≡ v^µ ι_µ. If we apply this to a
k-form such as (A.0.1), we get

ι_v ω = (1/(k−1)!) v^µ ω_µµ2…µk dx^µ2 ∧ … ∧ dx^µk .   (A.0.5)

The Lie derivative of a form is particularly nice in this formalism. Playing with the
definition (3.5.6), it turns out that

L_v = dι_v + ι_v d   (A.0.6)

on a form of any degree k. This is called Cartan’s magic formula.
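Cartan’s magic formula can be verified directly in components (a sympy sketch in two dimensions; the vector field and the 1-form below are arbitrary choices), comparing (A.0.6) with the component expression of the Lie derivative of a 1-form:

```python
import sympy as sp

x, y = sp.symbols('x y')
coords = [x, y]
v = [x*y, sp.sin(x)]                    # an arbitrary vector field v^mu
w = [x + y**2, sp.cos(x*y)]             # an arbitrary 1-form omega_mu

# (L_v w)_m = v^n d_n w_m + w_n d_m v^n
lie = [sum(v[n]*sp.diff(w[m], coords[n]) + w[n]*sp.diff(v[n], coords[m])
           for n in range(2)) for m in range(2)]

# d(iota_v w)_m = d_m (v^n w_n)
d_iv = [sp.diff(sum(v[n]*w[n] for n in range(2)), coords[m]) for m in range(2)]
# (iota_v dw)_m = v^n (d_n w_m - d_m w_n)
iv_d = [sum(v[n]*(sp.diff(w[m], coords[n]) - sp.diff(w[n], coords[m]))
            for n in range(2)) for m in range(2)]

assert all(sp.simplify(lie[m] - d_iv[m] - iv_d[m]) == 0 for m in range(2))
```

Since no property of the particular v and ω is used, the two sides agree identically, as the general proof guarantees.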

B Spinors∗
All matter particles are actually described by spinors, rather than by tensors. In this
section we are going to see how to describe spinors on a curved spacetime.
First let us quickly recall what they are in Minkowski space (for more details see for
example [3, Sec. 4.10.2]). If one introduces matrices γ^µ such that

{γ^µ, γ^ν} = 2η^µν ,   (B.0.1)

it turns out that the matrices −(1/2) γ^µν, where γ^µν ≡ γ^[µ γ^ν], give a representation of the Lorentz algebra,
called the spinor representation. The elements of the vector space on which the γ^µ act
are called spinors. Thus under a Lorentz transformation (exp ω)^µ_ν, where ω is a matrix
in the Lie algebra of the Lorentz group, a spinor ψ transforms as ψ → exp[−(1/2) ω_µν γ^µν] ψ.
The γ^µ can be taken to be 4 × 4 matrices; every choice of these matrices is equivalent, up
to change of basis. Thus spinors have four components.
Now let us try to understand what spinors should be in curved spacetime. It is natural
to extend (B.0.1) by saying that now

{γ^µ, γ^ν} = 2g^µν .   (B.0.2)

To find such matrices, there is an easy strategy. We can define a basis of vectors, called the
vierbein,²⁴ E_a ≡ E_a^µ ∂_µ, a = 0, …, 3, which are orthonormal, in the sense that

E_a · E_b = E_a^µ E_b^ν g_µν = η_ab ,   (B.0.3)

where η_ab = diag(−1, 1, 1, 1) is the old flat metric. (Because of this, the indices a, b, … are
sometimes called “flat”, while the usual indices µ, ν, … are called “curved”.) Now we can
take a choice of flat-space gamma matrices, and call them γ^a, where {γ^a, γ^b} = 2η^ab. It
then follows that γ^µ ≡ E_a^µ γ^a satisfies (B.0.2).
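As a sketch of this strategy in a lower-dimensional Euclidean analogue (Pauli matrices as the flat gammas and δ_ab in place of η_ab; these choices are ours, not from the notes), one can check {γ^µ, γ^ν} = 2g^µν numerically for the round S²:

```python
import numpy as np

theta = 0.7                                 # a sample point on S^2
g = np.diag([1.0, np.sin(theta)**2])        # round metric d(theta)^2 + sin^2(theta) d(phi)^2
E = np.diag([1.0, 1.0/np.sin(theta)])       # E_a^mu: E_1 = d_theta, E_2 = (1/sin theta) d_phi

# flat gammas in 2d Euclidean signature: {gamma^a, gamma^b} = 2 delta^{ab}
gam_flat = [np.array([[0, 1], [1, 0]], dtype=complex),
            np.array([[0, -1j], [1j, 0]])]

# curved gammas gamma^mu = E_a^mu gamma^a
gam = [sum(E[a, mu]*gam_flat[a] for a in range(2)) for mu in range(2)]

ginv = np.linalg.inv(g)
for mu in range(2):
    for nu in range(2):
        anti = gam[mu] @ gam[nu] + gam[nu] @ gam[mu]
        assert np.allclose(anti, 2*ginv[mu, nu]*np.eye(2))
```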
Spinors transform under Lorentz transformations, and thus they also transform under
diffeomorphisms. This creates an issue similar to the one we saw for tensors: their derivatives
∂_µ ψ will not transform well. Moreover, we now have a new issue. Given a choice of
vierbein E_a, one can find a new one by a local Lorentz transformation

E_a → Λ_a^b E_b ,   (B.0.4)
²⁴ This simply means “four-legged” in German. Another popular name is tetrad; in the mathematical
literature one also more seldom sees soldering form. In dimensions different from four, one can use
vielbein, which means “multi-legged”.

where Λ is simply a point-dependent element of the Lorentz group. This changes the
gamma matrices γ^µ as well. Any equation where gamma matrices act on spinors (such
as the Dirac equation) is now in danger of becoming ambiguous, unless we declare that
under (B.0.4) spinors themselves have to transform as ψ → exp(−(1/2) ω_ab γ^ab) ψ, where again
Λ = exp(ω).
We thus have to introduce now a derivative that is covariant under both diffeomorphisms
and local Lorentz transformations. The key is to learn how to perform covariant
derivatives directly on the vectors E_a. Since they are a basis, the covariant derivative of
one of them can be expanded as a linear combination of them:

∇_µ E_a ≡ −ω_µa^b E_b .   (B.0.5)

This is similar to (4.2.10). We defined the coefficients ω with a minus sign because, even
though the E_a are vectors, the label a is a lower index. This is a matter of convention, of course.
Since the E_a are in turn linear combinations of the vectors ∂_µ, we can also reexpress these
new coefficients ω in terms of the Γ’s:

∇_µ (E_a^ν ∂_ν) = (∂_µ E_a^ν) ∂_ν + E_a^ν ∇_µ ∂_ν = (∂_µ E_a^ν) ∂_ν + E_a^ν Γ^ρ_µν ∂_ρ ,   (B.0.6)

which together with (B.0.5) gives

∂_µ E_a^ν + Γ^ν_µρ E_a^ρ + ω_µa^b E_b^ν = 0 .   (B.0.7)

If one wants, one can see the left-hand side as a covariant derivative acting both on the
a index and on the ν index.
One can similarly introduce a basis of one-forms e^a = e^a_µ dx^µ ≡ (η^ab E_b^ν g_νµ) dx^µ. These
are dual to the E_a, in the sense that

e^a · E_b = e^a_µ E_b^µ = η^ac E_c^ρ g_ρµ E_b^µ = η^ac η_cb = δ^a_b .   (B.0.8)

It is easy to check that they are also orthonormal, in the sense that

e^a · e^b = e^a_µ g^µν e^b_ν = η^ab .   (B.0.9)

Just as for (B.0.5), we can introduce a covariant derivative for the one-forms e^a. The
coefficients need to be related to those in (B.0.5):

∇_µ e^a = ω_µb^a e^b .   (B.0.10)

Indeed with this definition and using (B.0.5) one can see that ∇_µ (e^a · E_b) = ∇_µ (δ^a_b) = 0,
as it should. Compatibility with (4.3.4) gives this time

∂_µ e^a_ν − Γ^ρ_µν e^a_ρ − ω_µb^a e^b_ν = 0 .   (B.0.11)

It is interesting to antisymmetrize this equation in µ and ν. The Γ term cancels out
because of the vanishing-torsion condition. In form notation, defining the connection one-forms
ω^a_b ≡ −ω_µb^a dx^µ, we obtain

de^a + ω^a_b ∧ e^b = 0 .   (B.0.12)

Moreover, taking a covariant derivative of (B.0.9) gives

0 = ∇_µ (e^a · e^b) = ω_µc^a e^c · e^b + e^a · (ω_µc^b e^c) = ω_µc^a η^cb + η^ac ω_µc^b .   (B.0.13)

Defining ω^ac ≡ ω^a_b η^bc, this means that

ω^ab = −ω^ba ;   (B.0.14)

namely, ω^ab (with both indices up) is antisymmetric.


Now we can give the covariant derivative of a spinor as

∇_µ ψ ≡ ∂_µ ψ + (1/4) ω_µab γ^ab ψ .   (B.0.15)
Although we will not show it here, this definition is covariant with respect both to diffeo-
morphisms and to local Lorentz transformations. It is also compatible with the definitions
for vectors, in the sense that if from a spinor one defines a bilinear J µ ≡ ψ̄γ µ ψ (which
physically can be a current, if the spinors are coupled to a gauge field), the covariant
derivative of Jµ as a vector coincides with what one would obtain from (B.0.15) and
(B.0.11).
Since the ω simply represent the Levi-Civita connection in a different basis, it should
not come as a surprise that one can obtain the Riemann tensor from them as well:

dω^ab + ω^a_c ∧ ω^cb = R^ab ≡ (1/2) R^ab_µν dx^µ ∧ dx^ν ,   (B.0.16)

where naturally R^ab_µν ≡ e^a_ρ e^b_σ R^ρσ_µν. Together, (B.0.12) and (B.0.16) are called Cartan’s
structure equations. Sometimes (B.0.16) is a good strategy to obtain the Riemann
tensor.²⁵
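As an example of this strategy (a sympy sketch for the round S², with Euclidean flat indices so that ω^2_1 = −ω^1_2): for e^1 = dθ, e^2 = sin θ dφ, the claim is that ω^1_2 = −cos θ dφ solves (B.0.12), and that (B.0.16) then gives R^12_θφ = sin θ, i.e. unit curvature.

```python
import sympy as sp

th, ph = sp.symbols('theta phi')
x = [th, ph]
e = [[1, 0], [0, sp.sin(th)]]            # e^a_mu: e^1 = d(theta), e^2 = sin(theta) d(phi)
w12 = [0, -sp.cos(th)]                   # candidate omega^1_2 = -cos(theta) d(phi)

def d(al):                               # exterior derivative of a 1-form: (d al)_{mn}
    return sp.Matrix([[sp.diff(al[n], x[m]) - sp.diff(al[m], x[n])
                       for n in range(2)] for m in range(2)])

def wedge(a, b):                         # (a ^ b)_{mn} for two 1-forms
    return sp.Matrix([[a[m]*b[n] - a[n]*b[m] for n in range(2)] for m in range(2)])

# (B.0.12): de^1 + omega^1_2 ^ e^2 = 0 and de^2 + omega^2_1 ^ e^1 = 0
assert sp.simplify(d(e[0]) + wedge(w12, e[1])) == sp.zeros(2, 2)
assert sp.simplify(d(e[1]) - wedge(w12, e[0])) == sp.zeros(2, 2)

# (B.0.16): R^1_2 = d omega^1_2 (the omega ^ omega term vanishes in two dimensions)
R12 = d(w12)
assert sp.simplify(R12[0, 1] - sp.sin(th)) == 0   # R^12_{theta phi} = sin(theta)
```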
Quite reasonably, one also has an analogue of (4.5.4) and (4.6.6):

∇_[µ ∇_ν] ψ = (1/4) R_ab µν γ^ab ψ .   (B.0.19)
²⁵ A possible (but not always convenient) way of extracting the ω^ab is to introduce the anholonomy
coefficients C^a_bc via

de^a + (1/2) C^a_bc e^b ∧ e^c = 0 ;   (B.0.17)

one then has

ω_µb^a = η^ac e^d_µ ( (1/2) C_bdc + C_(dc)b ) .   (B.0.18)

Exercises
1. Find the geodesics of the metric (3.8.4) on the two-sphere S^2, by minimizing the
length ∫ dℓ. (This is similar to our treatment in section 4.7, but with some differences
due to the different signature of the metric.)

2. In a similar fashion, compute the geodesics of the Poincaré metric

ds^2_H = (1/y^2)(dx^2 + dy^2)   (B.0.20)

on the half-plane H = {y > 0} ⊂ ℝ^2. (There is a trick to solve the differential
equation, but at this point you can claim success if you have derived the equations
and if you have found at least some solutions.)

3. We have seen that the ℓ_i in (6.1.8) are Killing vectors of the metric (3.8.4) on S^2.
Consider now the vectors

k_1 = cos θ cos ϕ ∂_θ − (sin ϕ / sin θ) ∂_ϕ ,   k_2 = cos θ sin ϕ ∂_θ + (cos ϕ / sin θ) ∂_ϕ ,   k_3 = − sin θ ∂_θ .   (B.0.21)

Show that the Lie derivatives L_ki g_µν of the metric on S^2 are proportional to g_µν
itself. (Vectors with this property are called conformal Killing vectors.)

4. Compute the Lie brackets [k_i, k_j] for the vectors in (B.0.21), and the Lie brackets
[k_i, ℓ_j] with the ℓ_j in (6.1.8). What Lie algebra do the k_i and ℓ_i generate?

5. Find some Killing vectors for the metric (B.0.20).

6. Compute the Riemann and Ricci tensors for the metric (B.0.20). Check that it has
the properties (7.2.4), (7.2.5) for a certain k. Do the same for the S^2 metric (3.8.4).
Can (B.0.20) be put in the form (3.8.4) by a change of coordinates?

7. [Palatini formalism.] Assume Γ^µ_νρ has zero torsion, but is not necessarily the Levi-Civita
connection. Define the curvature of Γ via the usual formula (4.5.3). We can
now write the action ∫ d^4x √−g g^µν R_µν(Γ), which contains the two fields g and Γ.
Show that the equations of motion of Γ restrict it to be the Levi-Civita connection.

8. Check that the components of the electromagnetic stress-energy tensor (6.7.2) reproduce
in flat space the energy density and Poynting vector that you learned in your EM
class. Rewrite the conservation equation (5.1.4) in these terms.

9. Derive the stress-energy tensor (5.3.12) for a free massless scalar (S = ∫ d^4x √−g g^µν ∂_µϕ ∂_νϕ).



10. Derive the equations of motion for the Brans–Dicke action ∫ d^4x √−g (e^−ϕ R + g^µν ∂_µϕ ∂_νϕ).

11. Find the metric for a spherically symmetric black hole in the presence of a non-zero
cosmological constant Λ. (Use the equations of motion (7.2.18).) Draw the gravitational
potential V_grav and comment on the difference between Λ > 0 and Λ < 0.

12. Set up the differential equations for a spherically symmetric black hole in the presence
of a free massless scalar ϕ, using the stress-energy tensor worked out in exercise 9.

References
[1] S. Weinberg, Gravitation and Cosmology: Principles and Applications of the General
Theory of Relativity. Wiley, 1972.

[2] D. J. Griffiths, Introduction to electrodynamics. Prentice Hall, 1999.

[3] A. Tomasiello, “Group theory.” https://dl.dropboxusercontent.com/u/9571828/mathphys.pdf.

[4] S. M. Carroll, Spacetime and geometry. An introduction to general relativity. Addison–Wesley, 2004.

[5] S. Hawking and G. Ellis, The large scale structure of space-time, vol. 1. Cambridge
University Press, 1975.

[6] H. Stephani, D. Kramer, M. MacCallum, C. Hoenselaers, and E. Herlt, Exact Solutions
of Einstein’s Field Equations. Cambridge University Press, 2003.

[7] C. W. Misner, K. S. Thorne, and J. A. Wheeler, Gravitation. Macmillan, 1973.

[8] R. M. Wald, General relativity. University of Chicago Press, 2010.

[9] D. Mumford, C. Series, and D. Wright, Indra’s pearls: The vision of Felix Klein.
Cambridge University Press, 2002.

