
Introduction to

Relativity and
Cosmology I

J. Mark Heinzle
Gravitational Physics, Faculty of Physics
University of Vienna

Version 20/01/2010
CHAPTER 1

AETHER

1.1 Galilean space (. . . and time)

What is space? Which mathematical concept provides the best description for
what we experience as our spatial world?

This question is neither naïve nor trivial. There exists a multitude of concepts
in mathematics that could apply a priori. Space could be merely a set, it could
be a topological space, maybe Hausdorff, a semi-group, a field, a manifold, a
complex manifold, a vector space, . . .

Our daily experience tells us that space is some kind of ‘continuum’ of ‘points’.
It is a matter of course for us to be able to form ‘arrows’ (‘vectors’) $\vec{pq}$ connecting
points p, q in space. Most importantly, we can add vectors and multiply
vectors with scalars (∈ R). This suggests that our spatial world has a struc-
ture intimately connected with the structure of a vector space. However, it is
obvious that a vector space is not a good model for our spatial world. This is
simply because there doesn’t seem to exist a distinguished zero vector; quite
the contrary, all spatial points seem to be on an equal footing. Collecting these
intuitive observations results in the following statement:

Space is an affine space over a three-dimensional real vector space.


The affine space we live in is endowed with a geometric structure that is so
fundamental that it is easily overlooked. We can measure the length of vectors
(e.g., in meters) as well as angles between vectors. The mere existence of the
concepts of length and angles entails that the vector space underlying our affine
space is a Euclidean vector space, i.e., it carries a scalar product.

Space is an affine space over a three-dimensional Euclidean vector space.

Definition. We call this model of our spatial reality Galilean space.

Remark. All these “experiments” that we perform every day (we form vectors
connecting points, compute lengths and angles) indicate that Galilean space
is the correct mathematical description of the space of our daily experience.
Note, however, that we would come to a radically different conclusion if the
scale of our intuitive perception were different. But we think in meters and not
in megaparsecs. . .

What is time? This is a somewhat more difficult question. Let us quote Sir
Isaac Newton:

“Absolute, true and mathematical time, of itself, and from its own nature
flows equably without regard to anything external, and by another name
is called duration: relative, apparent and common time, is some sensible
and external (whether accurate or unequable) measure of duration by
the means of motion, which is commonly used instead of true time . . . ”
Summarizing, we don’t quite know what time is (at least not yet), but we have
a lot of first hand experience with it. For our present purposes this is sufficient.

In order to formulate the laws of physics it is essential to choose a coordinate
system.¹ A coordinate system enables us to make measurements; we can
quantify position by real numbers, and we are able to perform a mathematical
analysis of our physical fields: we form derivatives, gradients, divergences and
formulate (differential) equations for our physical unknowns.

We are free to choose any curvilinear coordinate system we can come up with.²

¹ Strictly speaking, a coordinate system (or chart) is simply a bijective map of $\mathbb{R}^3$ into
Galilean space. In the present context, however, dependence on time is permitted (so
that a coordinate system is a one-parameter family of maps of $\mathbb{R}^3$ into Galilean space).
² And frequently, curvilinear coordinates are very useful. For instance, when one aims at
studying rotating bodies, spherical coordinates are often advantageous.


However, among the vast number³ of possible coordinate systems, there exists
a special subfamily: the inertial frames (inertial coordinates). The existence
of these inertial frames is guaranteed by the principle of relativity:

Special principle of relativity

There exists a family of coordinate systems, which we call inertial frames
of reference, w.r.t. which the laws of nature take one and the same form. In
other words, the mathematical formulation of the laws of physics is identical
for all ‘inertial observers’.

In the Galilean context, an inertial frame is thus a coordinate system of Galilean
space, for which the laws of (Newtonian) physics assume their conventional
form.

Consider, for example, Newton’s first law which describes the motion of a point
particle in the absence of exterior forces. Newton’s first law states that these
point particles move along straight lines in Galilean space. In an inertial frame
of reference, Newton’s first law is represented by the equation⁴

\[ \ddot{x}^i(t) = 0 . \]

Another law of (Newtonian) physics is the Poisson equation describing the
gravitational field of a massive body. In an inertial frame of reference we have

\[ \Delta\phi = \rho , \]

where $\phi$ is the potential and $\rho$ the matter density (the constant factor $4\pi G$ is suppressed here). The Laplacian $\Delta$ is given
by

\[ \Delta = \Bigl(\frac{\partial}{\partial x^1}\Bigr)^2 + \Bigl(\frac{\partial}{\partial x^2}\Bigr)^2 + \Bigl(\frac{\partial}{\partial x^3}\Bigr)^2 = \partial_1^2 + \partial_2^2 + \partial_3^2 = \delta^{ij}\,\partial_i\partial_j . \]

In a general curvilinear coordinate system, e.g., in spherical coordinates, a
straight line does not appear as straight; consequently, the differential equation
representing Newton’s first law is different. (Of course, the same is true for the
Poisson equation.) The price we pay by not choosing an inertial coordinate
system is that the laws of physics take a more complicated form (that includes
terms that are interpreted as fictitious forces).
³ The vast number is of course ∞.
⁴ A point particle is represented by a curve $t \mapsto x(t)$; the components of the vector $x$ are
denoted by $x^i$, $i = 1, 2, 3$. This curve is a straight line if and only if $\ddot{x}(t) = 0$ (or, in
component notation, $\ddot{x}^i(t) = 0$).


1.2 Galilean transformations

Definition 1.1. A Galilean transformation is a coordinate transformation⁵ of
Galilean space that maps one inertial frame into another.

Taking account of the definition of an inertial frame we find an alternative
definition.

Definition 1.1′. A Galilean transformation is a coordinate transformation⁶ of
Galilean space that leaves invariant the laws of (Newtonian) physics.

Remark. We emphasize that the principle of relativity implies that all laws of
(Newtonian) physics are invariant under Galilean transformations.

On the basis of definition 1.1′ we are able to derive the Galilean transformations;
we choose to make use of Newton’s first law.

Suppose that $(t, x)$, where $x = (x^1, x^2, x^3)$, and $(\bar{t}, \bar{x})$, where $\bar{x} = (\bar{x}^1, \bar{x}^2, \bar{x}^3)$,
are inertial coordinates. By definition, the Galilean transformation mapping
the one inertial coordinate system into the other is of the form⁷

\[ \bar{t} = t , \qquad \bar{x} = \bar{x}(t, x) ; \tag{1.1} \]

expressed in component form, the latter reads $\bar{x}^i = \bar{x}^i(t, x^1, x^2, x^3)$. In (1.1) we
assume that the origin of space and time remain unchanged; if this is not the
case, (1.1) may contain additional constants representing a spatial translation
and a temporal translation, i.e.,

\[ \bar{t} = t + \bar{t}_0 , \qquad \bar{x} = \bar{x}(t, x) + \bar{x}_0 . \tag{1.1$'$} \]

Let $x(t)$ represent the freely moving point particle w.r.t. the first inertial
frame, i.e., $\ddot{x}^i(t) = 0$. W.r.t. the second inertial frame (1.1), this curve reads
$\bar{x}(t) = \bar{x}\bigl(t, x(t)\bigr)$; we may replace $t$ by $\bar{t}$. We obtain

\[ \dot{\bar{x}}^i = \frac{\partial \bar{x}^i}{\partial t} + \frac{\partial \bar{x}^i}{\partial x^j}\,\dot{x}^j , \]

\[ \ddot{\bar{x}}^i = \frac{\partial^2 \bar{x}^i}{\partial t^2} + 2\,\frac{\partial^2 \bar{x}^i}{\partial t\,\partial x^j}\,\dot{x}^j + \frac{\partial^2 \bar{x}^i}{\partial x^j\,\partial x^k}\,\dot{x}^j \dot{x}^k + \frac{\partial \bar{x}^i}{\partial x^j}\,\underbrace{\ddot{x}^j}_{=0} . \]

⁵ Strictly speaking, a Galilean transformation is a one-parameter family of coordinate
transformations of Galilean space; this is because there might be a dependence on time,
see (1.1).
⁶ See footnote 5.
⁷ It is implicit in definition 1.1′ that time remains unchanged under a Galilean transformation
(because it is a coordinate transformation ‘of Galilean space’).

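Since Newton’s law must be invariant under the transformation (because $(\bar{t}, \bar{x})$
is assumed to be an inertial frame) we find that

\[ 0 = \frac{\partial^2 \bar{x}^i}{\partial t^2} + 2\,\frac{\partial^2 \bar{x}^i}{\partial t\,\partial x^j}\,\dot{x}^j + \frac{\partial^2 \bar{x}^i}{\partial x^j\,\partial x^k}\,\dot{x}^j \dot{x}^k . \]

Since $\dot{x}^i$ is arbitrary, this results in

\[ \frac{\partial}{\partial t}\frac{\partial \bar{x}^i}{\partial t} = 0 , \quad \frac{\partial}{\partial x^j}\frac{\partial \bar{x}^i}{\partial t} = 0 \quad\text{and}\quad \frac{\partial}{\partial t}\frac{\partial \bar{x}^i}{\partial x^k} = 0 , \quad \frac{\partial}{\partial x^j}\frac{\partial \bar{x}^i}{\partial x^k} = 0 , \]

from which we conclude that

\[ \frac{\partial \bar{x}^i}{\partial t} \equiv -v^i = \text{const} \quad\text{and}\quad \frac{\partial \bar{x}^i}{\partial x^k} \equiv R^i{}_k = \text{const} . \]

Accordingly, (1.1) becomes

\[ \bar{t} = t , \qquad \bar{x}^i = R^i{}_k\, x^k - v^i t ; \tag{1.2} \]

in matrix notation the latter reads $\bar{x} = R\,x - v\,t$.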
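The statement that (1.2) preserves Newton's first law can be checked numerically. The following is a minimal sketch, not part of the original notes: the rotation angle, the boost velocity and the initial data are arbitrary illustrative values, and the helper names (`rotation_z`, `galilean`, `second_difference`) are hypothetical. It maps a straight-line motion into a second frame via (1.2) and estimates the acceleration there with a central second difference.

```python
import math

def rotation_z(angle):
    """3x3 rotation matrix about the third axis (an element of SO(3))."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def galilean(R, v, t, x):
    """Apply (1.2): t_bar = t, x_bar^i = R^i_k x^k - v^i t."""
    x_bar = [sum(R[i][k] * x[k] for k in range(3)) - v[i] * t for i in range(3)]
    return t, x_bar

def free_particle(a, b, t):
    """Straight-line motion x(t) = a + b t, i.e. a solution of x'' = 0."""
    return [a[i] + b[i] * t for i in range(3)]

def second_difference(curve, t, h=0.5):
    """Central second difference, a discrete stand-in for the acceleration."""
    xm, x0, xp = curve(t - h), curve(t), curve(t + h)
    return [(xp[i] - 2.0 * x0[i] + xm[i]) / h**2 for i in range(3)]

# Illustrative (hypothetical) data for the check.
R = rotation_z(0.7)
v = [1.0, -2.0, 0.5]
a, b = [0.0, 1.0, 2.0], [3.0, 0.0, -1.0]

def transformed_curve(t):
    """The free particle as seen in the second inertial frame."""
    return galilean(R, v, t, free_particle(a, b, t))[1]

acc = second_difference(transformed_curve, t=1.3)
print(acc)  # each component vanishes up to rounding error
```

Because (1.2) is affine in $(t, x)$, the transformed curve is again of the form $\bar{a} + \bar{b}\,t$, which is why the discrete acceleration vanishes for any step size.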

The matrix R is not arbitrary. It is straightforward to prove that R ∈ O(3),
i.e., R is either a rotation (R ∈ SO(3)) or a reflection (R ∈ O(3), det R = −1).
(Note, however, that this does not follow directly from the invariance of New-
ton’s first law—it is necessary to invoke another law of physics.)
Exercise. Use the invariance of the Poisson equation under (1.2) to prove that
R ∈ O(3).

In (1.2) the origin of space and time remain unchanged; if this is not the
case, (1.2) may contain additional constants representing a spatial translation
and a temporal translation, i.e.,

\[ \bar{t} = t + \bar{t}_0 , \qquad \bar{x}^i = R^i{}_k\, x^k - v^i t + \bar{x}_0^i . \tag{1.2$'$} \]


Of particular interest among the Galilean transformations (1.2′) are the so-called
Galilean boosts

\[ \bar{t} = t , \qquad \bar{x}^i = x^i - v^i t . \tag{1.3} \]
Galilean boosts are coordinate transformations between inertial frames that
are in uniform relative motion. Let us summarize the statement of (1.2′ ) as a
corollary.
Corollary 1.2. An (orientation-preserving) Galilean transformation is a spa-
tial translation by a constant vector, a temporal translation by a constant pa-
rameter, a rotation by a constant angle, a Galilean boost, or a combination of
these.
Remark. Reflections are Galilean transformations as well; these do not, however,
preserve the orientation of the coordinates.
Remark. We see that inertial frames are those coordinate systems that are
adapted to the geometric structure of (Galilean) space (and time). Inertial
frames essentially correspond to orthonormal frames or orthonormal frames in
uniform motion.

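Let us conclude this section with another example illustrating Galilean invariance.
Consider the equations of motion of two gravitating point particles with
masses $m_1$, $m_2$,

\[ m_1 \ddot{x}_1 = -m_1 m_2\, \frac{x_1 - x_2}{|x_1 - x_2|^3} , \tag{1.4a} \]

\[ m_2 \ddot{x}_2 = -m_1 m_2\, \frac{x_2 - x_1}{|x_1 - x_2|^3} . \tag{1.4b} \]

A change of inertial frame is a Galilean transformation (1.2′), i.e., $\bar{t} = t + \bar{t}_0$
and

\[ \bar{x}_1^i = R^i{}_k\, x_1^k - v^i t + \bar{x}_0^i , \qquad \bar{x}_2^i = R^i{}_k\, x_2^k - v^i t + \bar{x}_0^i . \]

Let us restrict ourselves to a Galilean boost, i.e., $\bar{t} = t$ and

\[ \bar{x}_1^i = x_1^i - v^i t , \qquad \bar{x}_2^i = x_2^i - v^i t . \]

It follows that

\[ \dot{\bar{x}}_1^i = \dot{x}_1^i - v^i , \quad \dot{\bar{x}}_2^i = \dot{x}_2^i - v^i , \quad \ddot{\bar{x}}_1^i = \ddot{x}_1^i , \quad \ddot{\bar{x}}_2^i = \ddot{x}_2^i , \quad \bar{x}_1^i - \bar{x}_2^i = x_1^i - x_2^i , \]

and we infer that

\[ m_1 \ddot{\bar{x}}_1 = -m_1 m_2\, \frac{\bar{x}_1 - \bar{x}_2}{|\bar{x}_1 - \bar{x}_2|^3} , \tag{1.4a$'$} \]

\[ m_2 \ddot{\bar{x}}_2 = -m_1 m_2\, \frac{\bar{x}_2 - \bar{x}_1}{|\bar{x}_1 - \bar{x}_2|^3} ; \tag{1.4b$'$} \]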


hence the equations are identical w.r.t. the inertial frame (t̄, x̄).
Exercise. Show the invariance of (1.4) under a general Galilean transformation.
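The boost invariance of (1.4) rests on the fact that only the difference $x_1 - x_2$ enters the right-hand sides. A small numerical sketch (an illustration added here, with hypothetical function names and arbitrary sample values, in the same units as (1.4), i.e. with the gravitational constant set to one) evaluates the accelerations from the original and from the boosted positions:

```python
def newton_accelerations(x1, x2, m1, m2):
    """Accelerations of both particles according to (1.4a), (1.4b)."""
    d = [x1[i] - x2[i] for i in range(3)]     # x1 - x2
    r3 = sum(c * c for c in d) ** 1.5         # |x1 - x2|^3
    a1 = [-m2 * c / r3 for c in d]            # (1.4a) divided by m1
    a2 = [m1 * c / r3 for c in d]             # (1.4b) divided by m2
    return a1, a2

def boost(x, v, t):
    """Galilean boost (1.3): x_bar = x - v t."""
    return [x[i] - v[i] * t for i in range(3)]

# Arbitrary sample configuration (hypothetical values).
x1, x2 = [1.0, 0.0, 0.0], [0.0, 2.0, 0.0]
m1, m2 = 2.0, 3.0
v, t = [0.3, -0.1, 0.2], 5.0

original = newton_accelerations(x1, x2, m1, m2)
boosted = newton_accelerations(boost(x1, v, t), boost(x2, v, t), m1, m2)
print(original)
print(boosted)  # agrees with the line above up to rounding error
```

Since the accelerations themselves are unchanged by a boost, equal right-hand sides in both frames are exactly what (1.4a′) and (1.4b′) assert. The general case with a rotation is left as in the exercise above.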

1.3 The Euler equations

The Euler equations describe the dynamics of inviscid fluids. The equations
read

\[ \partial_t \rho + \partial_k\bigl(\rho u^k\bigr) = 0 , \tag{1.5a} \]

\[ \partial_t\bigl(\rho u^i\bigr) + \partial_k\bigl(\rho u^i u^k + p\,\delta^{ik}\bigr) = 0 , \tag{1.5b} \]

where $\rho = \rho(t, x)$ is the density and $p = p(t, x)$ the pressure of the fluid;
$u = u(t, x)$ is the velocity field.

The first equation is the continuity equation and represents conservation of
mass; in index-free notation it reads

\[ \partial_t \rho + \nabla\cdot(\rho u) = 0 . \]

Introducing the Lagrangian derivative (material derivative, convective derivative)

\[ \frac{d}{dt} = \frac{\partial}{\partial t} + u\cdot\nabla \]

we may write

\[ \frac{d\rho}{dt} = -\rho\,\nabla\cdot u . \tag{1.5a$'$} \]

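The second equation, i.e., (1.5b), corresponds to Newton’s second law and
encodes the conservation of momentum. Expanding the derivatives in (1.5b) gives

\[ (\partial_t \rho)\,u^i + \rho\,\partial_t u^i + \partial_k(\rho u^k)\,u^i + \rho u^k \partial_k u^i + \partial^i p = \rho\,\bigl(\partial_t + u^k \partial_k\bigr)u^i + \partial^i p , \]

where the first and third terms on the left-hand side cancel by virtue of (1.5a). We may therefore rewrite (1.5b) as

\[ \rho\,\bigl(\partial_t + u\cdot\nabla\bigr)u = -\nabla p \]

or

\[ \rho\,\frac{du}{dt} = -\nabla p . \tag{1.5b$'$} \]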

The equations (1.5a) and (1.5b) do not form a closed system. It is required
that we prescribe an equation of state,

\[ p = p(\rho) , \tag{1.5c} \]

relating the density and the pressure. It is common to also consider an equation
representing the conservation of energy

\[ \partial_t \epsilon + \partial_k\bigl((\epsilon + p)\,u^k\bigr) = 0 ; \]

here, $\epsilon$ denotes the energy density.

The variables of the system of equations (1.5) are

\[ \rho(t, x) , \quad p(t, x) , \quad u(t, x) . \]

Let us consider a boosted inertial frame, i.e.,

\[ \bar{t} = \bar{t}(t, x) = t , \qquad \bar{x} = \bar{x}(t, x) = x - v t . \tag{1.6} \]

W.r.t. the second frame the functions are

\[ \bar{\rho}(\bar{t}, \bar{x}) , \quad \bar{p}(\bar{t}, \bar{x}) , \quad \bar{u}(\bar{t}, \bar{x}) . \]

The transformation of these functions under (1.6) is straightforward;

\[ \bar{\rho}(\bar{t}, \bar{x}) = \rho(t, x) , \quad \bar{p}(\bar{t}, \bar{x}) = p(t, x) , \quad \bar{u}^i(\bar{t}, \bar{x}) = u^i(t, x) - v^i ; \tag{1.7} \]

the functions $\rho$ and $p$ transform as scalar fields, $u$ as a vector field (when
regarded as fields on Galilean spacetime).

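The Euler equations are invariant under Galilean transformations. Consider,
first, equation (1.5a):

\[
\begin{aligned}
&\frac{\partial}{\partial t}\rho(t, x) + \frac{\partial}{\partial x^k}\bigl(\rho(t, x)\,u^k(t, x)\bigr) \\
&\quad = \Bigl[\frac{\partial \bar{t}}{\partial t}\frac{\partial}{\partial \bar{t}} + \frac{\partial \bar{x}^i}{\partial t}\frac{\partial}{\partial \bar{x}^i}\Bigr]\bar{\rho}(\bar{t}, \bar{x}) + \Bigl[\frac{\partial \bar{t}}{\partial x^k}\frac{\partial}{\partial \bar{t}} + \frac{\partial \bar{x}^i}{\partial x^k}\frac{\partial}{\partial \bar{x}^i}\Bigr]\Bigl(\bar{\rho}(\bar{t}, \bar{x})\bigl(\bar{u}^k(\bar{t}, \bar{x}) + v^k\bigr)\Bigr) \\
&\quad = \frac{\partial}{\partial \bar{t}}\bar{\rho}(\bar{t}, \bar{x}) - v^i \frac{\partial}{\partial \bar{x}^i}\bar{\rho}(\bar{t}, \bar{x}) + \frac{\partial}{\partial \bar{x}^k}\bigl(\bar{\rho}(\bar{t}, \bar{x})\,\bar{u}^k(\bar{t}, \bar{x}) + \bar{\rho}(\bar{t}, \bar{x})\,v^k\bigr) \\
&\quad = \frac{\partial}{\partial \bar{t}}\bar{\rho}(\bar{t}, \bar{x}) + \frac{\partial}{\partial \bar{x}^k}\bigl(\bar{\rho}(\bar{t}, \bar{x})\,\bar{u}^k(\bar{t}, \bar{x})\bigr) .
\end{aligned}
\tag{1.8}
\]

Hence, if (1.5a) holds w.r.t. the first system of coordinates, it holds w.r.t. the second
system of coordinates as well.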

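To prove (1.8) we have made use of the relations

\[ \frac{\partial}{\partial t} = \frac{\partial \bar{t}}{\partial t}\frac{\partial}{\partial \bar{t}} + \frac{\partial \bar{x}^i}{\partial t}\frac{\partial}{\partial \bar{x}^i} = \frac{\partial}{\partial \bar{t}} - v^i \frac{\partial}{\partial \bar{x}^i} , \tag{1.9a} \]

\[ \frac{\partial}{\partial x^k} = \frac{\partial \bar{t}}{\partial x^k}\frac{\partial}{\partial \bar{t}} + \frac{\partial \bar{x}^i}{\partial x^k}\frac{\partial}{\partial \bar{x}^i} = \frac{\partial}{\partial \bar{x}^k} . \tag{1.9b} \]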


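These relations can be rewritten as

\[ \frac{\partial}{\partial t} = \frac{\partial}{\partial \bar{t}} - v\cdot\bar{\nabla} , \qquad \nabla = \bar{\nabla} , \]

and used to show invariance of the Lagrangian derivative:

\[ \frac{d}{dt} = \frac{\partial}{\partial t} + u\cdot\nabla = \frac{\partial}{\partial \bar{t}} - v\cdot\bar{\nabla} + (\bar{u} + v)\cdot\bar{\nabla} = \frac{\partial}{\partial \bar{t}} + \bar{u}\cdot\bar{\nabla} = \frac{d}{d\bar{t}} . \]

This invariance leads directly to the invariance of equation (1.5b); we use the
form (1.5b′).

\[ \rho(t, x)\,\frac{d}{dt}u(t, x) + \nabla p(t, x) = \bar{\rho}(\bar{t}, \bar{x})\,\frac{d}{d\bar{t}}\bigl(\bar{u}(\bar{t}, \bar{x}) + v\bigr) + \bar{\nabla}\bar{p}(\bar{t}, \bar{x}) = \bar{\rho}(\bar{t}, \bar{x})\,\frac{d}{d\bar{t}}\bar{u}(\bar{t}, \bar{x}) + \bar{\nabla}\bar{p}(\bar{t}, \bar{x}) . \]

Finally, to complete the proof that (1.5) is invariant under a Galilean boost,
we note that

\[ \bar{p}(\bar{t}, \bar{x}) = p(t, x) = p\bigl(\rho(t, x)\bigr) = p\bigl(\bar{\rho}(\bar{t}, \bar{x})\bigr) . \]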

In summary, like every good law of (Newtonian) physics, the Euler equa-
tions (1.5) are invariant under Galilean transformations.
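The invariance of the continuity equation (1.5a) can be illustrated numerically in one spatial dimension. The sketch below is an illustration added here, not part of the original notes. It assumes a hypothetical sample profile, $\rho = 1 + 0.1\sin(x - At)$ advected with constant velocity $u = A$, which satisfies (1.5a) but is not claimed to solve the full system (1.5). The fields are transformed according to (1.6) and (1.7), and the residual of the continuity equation is estimated by central differences in both frames.

```python
import math

A = 0.8   # advection speed of the sample profile (hypothetical value)
V = 0.3   # boost velocity (hypothetical value)

def rho(t, x):
    """Sample density in the original frame."""
    return 1.0 + 0.1 * math.sin(x - A * t)

def u(t, x):
    """Constant velocity field in the original frame."""
    return A

def rho_bar(t_bar, x_bar):
    """(1.7) with (1.6): rho_bar(t_bar, x_bar) = rho(t, x), x = x_bar + V t_bar."""
    return rho(t_bar, x_bar + V * t_bar)

def u_bar(t_bar, x_bar):
    """(1.7): u_bar = u - V."""
    return u(t_bar, x_bar + V * t_bar) - V

def continuity_residual(density, velocity, t, x, h=1e-4):
    """Central-difference estimate of d_t rho + d_x (rho u) at (t, x)."""
    d_t = (density(t + h, x) - density(t - h, x)) / (2.0 * h)
    flux = lambda s: density(t, s) * velocity(t, s)
    d_x = (flux(x + h) - flux(x - h)) / (2.0 * h)
    return d_t + d_x

print(continuity_residual(rho, u, 0.4, 1.1))          # close to zero
print(continuity_residual(rho_bar, u_bar, 0.4, 1.1))  # close to zero as well
```

Both residuals vanish up to discretization error, in line with (1.8). Replacing `u_bar` by the untransformed `u` in the second call gives a residual of order `V`, which shows that the shift of the velocity in (1.7) is essential.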

1.4 Propagation of perturbations

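Let us consider a simple (‘equilibrium’) solution of the Euler equations (1.5):

\[ \rho(t, x) \equiv \mathring{\rho} = \text{const} , \quad u(t, x) \equiv 0 , \quad p(t, x) \equiv \mathring{p} = p(\mathring{\rho}) . \tag{1.10} \]

A small perturbation of this solution is given by

\[ \rho(t, x) = \mathring{\rho}\,\bigl(1 + \phi(t, x)\bigr) , \qquad |\phi(t, x)| \ll 1 , \tag{1.11a} \]

\[ |u(t, x)| \ll 1 . \tag{1.11b} \]

Using this ansatz we find

\[ \frac{d}{dt}\rho = \frac{\partial}{\partial t}\bigl(\mathring{\rho}(1 + \phi)\bigr) + u\cdot\nabla\bigl(\mathring{\rho}(1 + \phi)\bigr) = \mathring{\rho}\,\Bigl[\frac{\partial \phi}{\partial t} + u\cdot\nabla\phi\Bigr] , \]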


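which we may insert into the continuity equation (1.5a′) to obtain

\[ \mathring{\rho}\,\Bigl[\frac{\partial \phi}{\partial t} + u\cdot\nabla\phi\Bigr] = -\mathring{\rho}\,(1 + \phi)\,\nabla\cdot u \]

and thus

\[ \frac{\partial \phi}{\partial t} = -(1 + \phi)\,\nabla\cdot u - u\cdot\nabla\phi . \tag{1.12a} \]

For (1.5b′) we get

\[ \mathring{\rho}\,(1 + \phi)\,\Bigl[\frac{\partial u}{\partial t} + (u\cdot\nabla)u\Bigr] = -\nabla p = -p'(\rho)\,\nabla\rho = \Bigl[-p'(\mathring{\rho}) - p''(\mathring{\rho})\,\mathring{\rho}\,\phi - \cdots\Bigr]\,\mathring{\rho}\,\nabla\phi \]

and therefore

\[ \frac{\partial u}{\partial t} = -(u\cdot\nabla)u - \Bigl[c_s^2 + p''(\mathring{\rho})\,\mathring{\rho}\,\phi + \cdots\Bigr](1 + \phi)^{-1}\,\nabla\phi . \tag{1.12b} \]

In this equation we have introduced the quantity $c_s$ which has the dimension
of a velocity,

\[ p'(\mathring{\rho}) = \frac{dp}{d\rho}\bigg|_{\rho = \mathring{\rho}} = c_s^2 ; \tag{1.13} \]

we will identify it with the speed of sound.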
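To get a feeling for (1.13), one can evaluate $c_s$ for a concrete equation of state. The sketch below is an illustration added here and rests on assumptions that are not part of the notes: a polytropic law $p = K\rho^{\gamma}$ with $\gamma = 1.4$, and rough sea-level values for air ($p \approx 101325$ Pa, $\rho \approx 1.204$ kg/m³). The derivative $p'(\mathring{\rho})$ is taken numerically and compared with the closed form $\gamma\,\mathring{p}/\mathring{\rho}$ that follows from the assumed law.

```python
import math

def pressure(rho, K, gamma):
    """Assumed polytropic equation of state p = K rho^gamma (illustrative)."""
    return K * rho ** gamma

def sound_speed(rho0, K, gamma, h=1e-6):
    """c_s = sqrt(p'(rho0)) as in (1.13); derivative by central difference."""
    dp = (pressure(rho0 + h, K, gamma) - pressure(rho0 - h, K, gamma)) / (2.0 * h)
    return math.sqrt(dp)

# Assumed background state: approximate values for air at sea level.
GAMMA = 1.4
RHO0 = 1.204        # kg/m^3
P0 = 101325.0       # Pa
K = P0 / RHO0 ** GAMMA

c_numeric = sound_speed(RHO0, K, GAMMA)
c_closed = math.sqrt(GAMMA * P0 / RHO0)   # p'(rho0) = gamma p0 / rho0 for this law
print(c_numeric, c_closed)  # both are a little above 343 m/s
```

The agreement of the two numbers only confirms the differentiation; that the result is close to the measured speed of sound in air is what motivates identifying $c_s$ with it.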

Provided that the spatial gradients of $\phi$ and $u$ are of the same small order as $\phi$
and $u$ themselves, we may neglect the terms of higher order and thereby obtain
a simple linear system of equations,

\[ \frac{\partial \phi}{\partial t} = -\nabla\cdot u , \tag{1.14a} \]

\[ \frac{\partial u}{\partial t} = -c_s^2\,\nabla\phi . \tag{1.14b} \]

From (1.14) we infer that

\[ \Box\phi = -\frac{1}{c_s^2}\,\frac{\partial^2}{\partial t^2}\phi + \Delta\phi = 0 , \tag{1.15} \]

i.e., $\phi = \phi(t, x)$ satisfies a wave equation. The solutions of (1.15) are compression
and rarefaction waves that propagate with the speed $c_s$. Similarly,

\[ \Box u = -\frac{1}{c_s^2}\,\frac{\partial^2}{\partial t^2}u + \Delta u = 0 . \]

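The main observation for us is the following: The Euler equations (1.5) are
invariant under Galilean transformations. From these equations we derive an
equation describing the propagation of perturbations of an equilibrium state of
the medium; this is the wave equation (1.15). But this equation is not invariant
under Galilean transformations. To see this we consider a Galilean boost and
use (1.9) to find

\[
\begin{aligned}
\Box\phi(t, x) &= -\frac{1}{c_s^2}\,\partial_t^2\phi(t, x) + \delta^{ij}\partial_i\partial_j\phi(t, x) \\
&= -\frac{1}{c_s^2}\,\bigl(\bar{\partial}_t - v^i\bar{\partial}_i\bigr)\bigl(\bar{\partial}_t - v^j\bar{\partial}_j\bigr)\bar{\phi}(\bar{t}, \bar{x}) + \delta^{ij}\bar{\partial}_i\bar{\partial}_j\bar{\phi}(\bar{t}, \bar{x}) \\
&= -\frac{1}{c_s^2}\,\bar{\partial}_t^2\bar{\phi}(\bar{t}, \bar{x}) + \frac{2}{c_s^2}\,v^i\bar{\partial}_i\bar{\partial}_t\bar{\phi}(\bar{t}, \bar{x}) - \frac{v^i v^j}{c_s^2}\,\bar{\partial}_i\bar{\partial}_j\bar{\phi}(\bar{t}, \bar{x}) + \delta^{ij}\bar{\partial}_i\bar{\partial}_j\bar{\phi}(\bar{t}, \bar{x}) \\
&= \bar{\Box}\bar{\phi}(\bar{t}, \bar{x}) + \frac{2}{c_s^2}\,v^i\bar{\partial}_i\bar{\partial}_t\bar{\phi}(\bar{t}, \bar{x}) - \frac{v^i v^j}{c_s^2}\,\bar{\partial}_i\bar{\partial}_j\bar{\phi}(\bar{t}, \bar{x}) \neq \bar{\Box}\bar{\phi}(\bar{t}, \bar{x}) .
\end{aligned}
\]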

As a simple and explicit example let us consider a one-dimensional problem. In
one (spatial) dimension, the wave equation

\[ \Box\phi(t, x) = 0 \tag{1.16} \]

reduces to

\[ \Bigl[-\frac{1}{c_s^2}\,\partial_t^2 + \partial_x^2\Bigr]\phi(t, x) = \Bigl[-\frac{1}{c_s}\,\partial_t + \partial_x\Bigr]\Bigl[\frac{1}{c_s}\,\partial_t + \partial_x\Bigr]\phi(t, x) = 0 . \tag{1.17} \]

Its general solution can be represented as the linear combination of a wave
traveling to the right and a wave traveling to the left, i.e.,

\[ \phi(t, x) = \lambda_1\,\phi_1(x - c_s t) + \lambda_2\,\phi_2(x + c_s t) , \tag{1.18} \]

where $\phi_1$ and $\phi_2$ are arbitrary functions and $\lambda_1, \lambda_2 \in \mathbb{R}$.

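In a boosted inertial frame the equation describing the perturbation is given
by

\[ \bar{\Box}\bar{\phi}(\bar{t}, \bar{x}) + \frac{2}{c_s^2}\,v^i\bar{\partial}_i\bar{\partial}_t\bar{\phi}(\bar{t}, \bar{x}) - \frac{v^i v^j}{c_s^2}\,\bar{\partial}_i\bar{\partial}_j\bar{\phi}(\bar{t}, \bar{x}) = 0 . \tag{$\overline{1.16}$} \]

In one dimension this reduces to

\[
\begin{aligned}
&\Bigl[-\frac{1}{c_s^2}\,\bar{\partial}_t^2 + \bar{\partial}_x^2\Bigr]\bar{\phi}(\bar{t}, \bar{x}) + \frac{2}{c_s^2}\,v\,\bar{\partial}_x\bar{\partial}_t\bar{\phi}(\bar{t}, \bar{x}) - \frac{v^2}{c_s^2}\,\bar{\partial}_x^2\bar{\phi}(\bar{t}, \bar{x}) \\
&\quad = \Bigl(1 - \frac{v^2}{c_s^2}\Bigr)\Bigl[-\frac{1}{c_s + v}\,\bar{\partial}_t + \bar{\partial}_x\Bigr]\Bigl[\frac{1}{c_s - v}\,\bar{\partial}_t + \bar{\partial}_x\Bigr]\bar{\phi}(\bar{t}, \bar{x}) = 0 ,
\end{aligned}
\tag{$\overline{1.17}$}
\]

where $v$ is merely a number instead of a vector. The general solution of $(\overline{1.17})$
is

\[ \bar{\phi}(\bar{t}, \bar{x}) = \bar{\lambda}_1\,\bar{\phi}_1\bigl(\bar{x} - (c_s - v)\,\bar{t}\bigr) + \bar{\lambda}_2\,\bar{\phi}_2\bigl(\bar{x} + (c_s + v)\,\bar{t}\bigr) , \tag{$\overline{1.18}$} \]

where again $\bar{\phi}_1$ and $\bar{\phi}_2$ are arbitrary functions and $\bar{\lambda}_1, \bar{\lambda}_2 \in \mathbb{R}$. The solution is
thus a linear combination of a wave traveling at the speed $c_s - v$ and a wave
traveling at the speed $c_s + v$.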

As a simple special case consider an inertial frame whose relative velocity to
the original inertial frame is $v = c_s$. Then the general solution in the boosted frame reads

\[ \bar{\phi}(\bar{t}, \bar{x}) = \bar{\lambda}_1\,\bar{\phi}_1(\bar{x}) + \bar{\lambda}_2\,\bar{\phi}_2\bigl(\bar{x} + 2 c_s\,\bar{t}\bigr) . \]

This makes perfect sense. An observer who is traveling at the speed $c_s$ observes
a class of waves that do not propagate at all (i.e., “standing waves”), and
a class of waves that propagate in the direction opposite to the direction of
motion, which are waves with velocity $2c_s$. These two classes are represented
by $\bar{\phi}_1$ and $\bar{\phi}_2$, respectively.
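The one-dimensional perturbation equation in the boosted frame, $-c_s^{-2}\,\bar{\partial}_t^2\bar{\phi} + 2 v c_s^{-2}\,\bar{\partial}_x\bar{\partial}_t\bar{\phi} + (1 - v^2/c_s^2)\,\bar{\partial}_x^2\bar{\phi} = 0$, and its two families of solutions can be checked with finite differences. The following sketch is an illustration added here: it works in units where $c_s = 1$, and the two profiles (a sine and a Gaussian) are arbitrary choices, not taken from the notes.

```python
import math

C_S = 1.0   # speed of sound in units where c_s = 1 (assumption for the demo)

def boosted_operator(phi, t, x, v, h=1e-3):
    """Finite-difference estimate of
    -(1/c_s^2) phi_tt + (2 v / c_s^2) phi_xt + (1 - v^2/c_s^2) phi_xx
    for a function phi(t, x) given in the boosted coordinates."""
    phi_tt = (phi(t + h, x) - 2.0 * phi(t, x) + phi(t - h, x)) / h**2
    phi_xx = (phi(t, x + h) - 2.0 * phi(t, x) + phi(t, x - h)) / h**2
    phi_xt = (phi(t + h, x + h) - phi(t + h, x - h)
              - phi(t - h, x + h) + phi(t - h, x - h)) / (4.0 * h**2)
    return (-phi_tt + 2.0 * v * phi_xt) / C_S**2 + (1.0 - v**2 / C_S**2) * phi_xx

def right_mover(v):
    """Profile depending on x_bar - (c_s - v) t_bar only."""
    return lambda t, x: math.sin(x - (C_S - v) * t)

def left_mover(v):
    """Profile depending on x_bar + (c_s + v) t_bar only."""
    return lambda t, x: math.exp(-(x + (C_S + v) * t) ** 2)

v = 0.4
print(boosted_operator(right_mover(v), 0.2, 0.3, v))  # close to zero
print(boosted_operator(left_mover(v), 0.2, 0.3, v))   # close to zero

# For v = c_s the right-moving profile no longer depends on time.
frozen = right_mover(C_S)
print(frozen(0.0, 0.7) == frozen(5.0, 0.7))
```

A wave moving with the unboosted speed, e.g. `right_mover(0.0)`, does not satisfy the boosted equation for `v = 0.4`, which is the numerical counterpart of the statement that the wave equation is not Galilean invariant.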

The example shows that inertial frames need not necessarily be equivalent.
Clearly, the fundamental laws of (Newtonian) physics must be the same un-
der a change of inertial frame (Galilean transformation). But if there exists a
medium (represented by a ‘background’ solution), then this medium automat-
ically distinguishes a frame of reference (“absolute space”); derived equations
(which rely on the existence of a background solution) might take a distinguished
(simple) form with respect to the distinguished frame of reference. (In
the inertial frame $(t, x)$, where the fluid is at rest, the equation for the perturbation
is (1.16); in a boosted inertial frame $(\bar{t}, \bar{x})$ we obtain its boosted counterpart $(\overline{1.16})$; the rest
frame of the fluid is obviously distinguished.)

Let us conclude this section by analyzing the properties of the wave equation
in some more detail. As has been demonstrated above, the wave equation

\[ \Box\phi(t, x) = -\frac{1}{c_s^2}\,\partial_t^2\phi(t, x) + \delta^{ij}\partial_i\partial_j\phi(t, x) = 0 \tag{1.19} \]

is not invariant under Galilean transformations. However, there exists a class
of transformations under which (1.19) is in fact invariant. To see this let us
first define

\[ \eta_s = \begin{pmatrix} -\dfrac{1}{c_s^2} & & & \\ & 1 & & \\ & & 1 & \\ & & & 1 \end{pmatrix} . \tag{1.20} \]


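In addition we set $x^0 = t$ and thus $\partial_0 = \partial_t$. Using this notation the wave
equation (1.19) becomes

\[ \Box\phi\bigl(x^0, x^1, x^2, x^3\bigr) = \eta_s^{\mu\nu}\,\partial_\mu\partial_\nu\phi\bigl(x^0, x^1, x^2, x^3\bigr) = 0 . \tag{1.19$'$} \]

It is common to specify the arguments of the functions collectively; e.g., instead
of $\phi\bigl(x^0, x^1, x^2, x^3\bigr)$ we will write $\phi(x^\sigma)$.

Consider a (linear) change of coordinates

\[ \bar{x}^\mu = \bar{x}^\mu(x^\sigma) = [L_s]^\mu{}_\nu\,x^\nu . \tag{1.21} \]

This implies

\[ \partial_\mu = \frac{\partial}{\partial x^\mu} = \frac{\partial \bar{x}^\nu}{\partial x^\mu}\frac{\partial}{\partial \bar{x}^\nu} = \frac{\partial \bar{x}^\nu}{\partial x^\mu}\,\bar{\partial}_\nu = [L_s]^\nu{}_\mu\,\bar{\partial}_\nu . \]

Furthermore, the function $\phi(x^\sigma)$ is assumed to transform like a scalar function,
i.e., $\bar{\phi}(\bar{x}^\sigma) = \phi(x^\sigma)$.

A straightforward computation now leads to

\[
\begin{aligned}
\Box\phi(x^\sigma) &= \eta_s^{\mu\nu}\,\partial_\mu\partial_\nu\phi(x^\sigma) = \eta_s^{\mu\nu}\,\partial_\mu\bigl([L_s]^\beta{}_\nu\,\bar{\partial}_\beta\bar{\phi}(\bar{x}^\sigma)\bigr) \\
&= \eta_s^{\mu\nu}\,[L_s]^\beta{}_\nu\,\partial_\mu\bar{\partial}_\beta\bar{\phi}(\bar{x}^\sigma) = \eta_s^{\mu\nu}\,[L_s]^\alpha{}_\mu\,[L_s]^\beta{}_\nu\,\bar{\partial}_\alpha\bar{\partial}_\beta\bar{\phi}(\bar{x}^\sigma) \\
&\overset{!}{=} \bar{\Box}\bar{\phi}(\bar{x}^\sigma) = \eta_s^{\alpha\beta}\,\bar{\partial}_\alpha\bar{\partial}_\beta\bar{\phi}(\bar{x}^\sigma) ,
\end{aligned}
\]

where the symbol $\overset{!}{=}$ means ‘is required to be equal to’. Consequently, invariance
of the d’Alembertian is equivalent to requiring that

\[ [L_s]^\alpha{}_\mu\,[L_s]^\beta{}_\nu\,\eta_s^{\mu\nu} = \eta_s^{\alpha\beta} , \tag{1.22} \]

which means that $\eta_s$ remains invariant under (1.21).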

An example of a transformation (1.21) satisfying (1.22) is

\[ \begin{pmatrix} \bar{t} \\ \bar{x}^1 \\ \bar{x}^2 \\ \bar{x}^3 \end{pmatrix} = \begin{pmatrix} \gamma_s & -\gamma_s v/c_s^2 & 0 & 0 \\ -\gamma_s v & \gamma_s & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} t \\ x^1 \\ x^2 \\ x^3 \end{pmatrix} , \tag{1.23} \]

where $\gamma_s = (1 - v^2/c_s^2)^{-1/2}$. This is shown by a direct calculation. It is somewhat
strange that the transformation (1.23) involves a non-trivial transformation
of time, $t \mapsto \bar{t} = \gamma_s\,\bigl(t - v\,c_s^{-2}\,x^1\bigr)$. However, we don’t have to wrack our
brains about this curiosity now. We know that the equation (1.19) is not
a fundamental equation of physics (but arising from a background solution);
therefore, the transformation properties of (1.19) cannot tell us anything about
fundamental physics.
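The "direct calculation" behind (1.23) can be replayed numerically. The sketch below is an illustration added here: it builds $\eta_s$ from (1.20) and the matrix of (1.23) as nested lists and evaluates the left-hand side of (1.22). It assumes the normalization $\gamma_s = (1 - v^2/c_s^2)^{-1/2}$, which is the one for which (1.22) holds; the numerical values of $v$ and $c_s$ are arbitrary.

```python
import math

def eta(c_s):
    """eta_s from (1.20): diag(-1/c_s^2, 1, 1, 1) as a 4x4 nested list."""
    m = [[0.0] * 4 for _ in range(4)]
    m[0][0] = -1.0 / c_s**2
    for i in range(1, 4):
        m[i][i] = 1.0
    return m

def boost_matrix(v, c_s):
    """Matrix of (1.23), assuming gamma_s = (1 - v^2/c_s^2)^(-1/2)."""
    g = 1.0 / math.sqrt(1.0 - v**2 / c_s**2)
    L = [[0.0] * 4 for _ in range(4)]
    L[0][0], L[0][1] = g, -g * v / c_s**2
    L[1][0], L[1][1] = -g * v, g
    L[2][2] = L[3][3] = 1.0
    return L

def transform_eta(L, e):
    """Left-hand side of (1.22): L^a_m L^b_n eta^{mn}."""
    return [[sum(L[a][m] * L[b][n] * e[m][n] for m in range(4) for n in range(4))
             for b in range(4)] for a in range(4)]

def max_deviation(v, c_s):
    """Largest entry of |L L eta - eta|; zero means (1.22) is satisfied."""
    e = eta(c_s)
    out = transform_eta(boost_matrix(v, c_s), e)
    return max(abs(out[a][b] - e[a][b]) for a in range(4) for b in range(4))

print(max_deviation(100.0, 343.0))  # zero up to rounding error
```

Only the upper-left 2x2 block is non-trivial: its diagonal entries reproduce $-1/c_s^2$ and $1$ precisely because $\gamma_s^2\,(1 - v^2/c_s^2) = 1$, and the off-diagonal entries cancel identically.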

1.5 Maxwell’s equations and the wave equation

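In standard notation, Maxwell’s equations in vacuum read

\[ \vec{\nabla}\cdot\vec{E} = 0 , \qquad \vec{\nabla}\cdot\vec{B} = 0 , \tag{1.24a} \]

\[ \vec{\nabla}\times\vec{E} = -\partial_t\vec{B} , \qquad \vec{\nabla}\times\vec{B} = \frac{1}{c^2}\,\partial_t\vec{E} , \tag{1.24b} \]

where, of course, $c$ is the speed of light,

\[ c = 299\,792\,458\ \text{m/s} . \]

When we form $\vec{\nabla}\times\vec{\nabla}\times\vec{E}$, then we obtain

\[ \vec{\nabla}\times\bigl(\vec{\nabla}\times\vec{E}\bigr) = \vec{\nabla}\bigl(\underbrace{\vec{\nabla}\cdot\vec{E}}_{=0}\bigr) - \Delta\vec{E} \]

on the one hand, and

\[ \vec{\nabla}\times\bigl(\vec{\nabla}\times\vec{E}\bigr) = \vec{\nabla}\times\bigl(-\partial_t\vec{B}\bigr) = -\partial_t\bigl(\vec{\nabla}\times\vec{B}\bigr) = -\partial_t\Bigl(\frac{1}{c^2}\,\partial_t\vec{E}\Bigr) \]

on the other hand. Therefore,

\[ \Box\vec{E} = -\frac{1}{c^2}\,\partial_t^2\vec{E} + \Delta\vec{E} = 0 . \tag{1.25} \]

Analogously, $\Box\vec{B} = 0$. Therefore, the components of the electric and the
magnetic field, $\vec{E}$ and $\vec{B}$, satisfy the free wave equation.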

1.6 The luminiferous and electromagnetic ether

So, this is the idea: Light and electromagnetic waves in general require a
propagation medium; this medium is called luminiferous and electromagnetic
ether (alternative spelling: aether).


Electromagnetic waves are like perturbations of the ether (which is the back-
ground solution). There exists some (non-linear) theory of the ether (a theory
that has not been discovered yet); the linearized equations that describe the
perturbations manifest themselves as the Maxwell equations (and thus as the
wave equation for the electric and magnetic field). The (unknown) ether equa-
tions are certainly Galilean invariant; the Maxwell equations on the other hand
are not; the validity of the Maxwell equations is restricted to a distinguished
inertial frame, the rest frame of the ether.
Remark. A good model to have in mind is the one discussed in sections 1.3
and 1.4. There is one difference, however. In the case of a perfect fluid and
the Euler equations, the perturbations are longitudinal waves (compression
and rarefaction); in the electromagnetic case, perturbations of the ether are
transversal waves (which is immediate from the polarization of light). As a
consequence, a fluid cannot be a viable model for the ether; the ether must
have more complicated properties.

Since light and electromagnetic waves are omnipresent, so is ether. In particular,
(outer) space is filled with ether, and the earth travels through it on its
course around the sun. What, then, is the movement of the earth relative to
the ether? Does earth interact with ether? Does ether interact with matter?

To answer these questions several experiments were conceived and carried out
in the nineteenth century. We focus on Hoek’s experiment and the Michelson–
Morley experiment.

1.7 Hoek

Based on an earlier experiment by Fizeau, in 1868, Martinus Hoek performed
an experiment to measure the motion of the earth through the ether and/or
the effect of matter on ether.

The basic idea is the detection of interference of light beams taking different
paths. Suppose that the earth moves with velocity v through the ether. For a
light beam that is parallel to the direction of motion of the earth, the velocity
is c − v, when measured in the rest frame of the laboratory (earth); for a light
beam that is antiparallel to the direction of motion of the earth, this velocity
is c + v. Therefore, if a light ray makes a round-trip through a device of length
$l$, the travel time is

\[ t = \frac{l}{c + v} + \frac{l}{c - v} = \frac{2l}{c} + \frac{2l}{c}\,\frac{v^2}{c^2} + O\bigl(v^4/c^4\bigr) . \tag{1.26} \]

The velocity of the earth in its course around the sun is approximately 30 km/s;
hence

\[ \frac{v}{c} \simeq 10^{-4} , \qquad \frac{v^2}{c^2} \simeq 10^{-8} . \]

Therefore, since the travel time (1.26) differs from $2l/c$ at second order in $v/c$
only, the effect to be measured is very small.
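The size of the second-order effect in (1.26) is easy to make concrete. The short sketch below is an illustration added here: the device length of 1 m is a hypothetical value chosen only for the demonstration. It compares the exact round-trip time with $2l/c$ and with the expansion in (1.26).

```python
C = 299_792_458.0   # speed of light in m/s
V_EARTH = 3.0e4     # orbital velocity of the earth, about 30 km/s

def round_trip_time(l, v, c=C):
    """Exact travel time of (1.26): l/(c + v) + l/(c - v)."""
    return l / (c + v) + l / (c - v)

def second_order_estimate(l, v, c=C):
    """Expansion of (1.26) up to and including the order v^2/c^2."""
    return 2.0 * l / c * (1.0 + v**2 / c**2)

l = 1.0  # hypothetical device length in meters
excess = round_trip_time(l, V_EARTH) / (2.0 * l / C) - 1.0
print(excess)                  # relative excess over 2l/c
print(V_EARTH**2 / C**2)       # v^2/c^2, of order 1e-8
```

The relative excess agrees with $v^2/c^2$ to within rounding, i.e. the correction is of order $10^{-8}$ of the travel time, which is why a design sensitive to first order in $v/c$ was sought.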

Hoek’s experiment was designed in such a way that the expected effect was of
first order in $v/c$, because, at that time, effects of second order in $v/c$ were
beyond experimental reach. Two light rays pass through vacuum (or air) in
one direction and through a medium (water) in the opposite direction. Recall
that the speed of light in a medium is $c/n$,
where $n$ is the refractive index of the medium. In the simplest approach to the
problem, we expect the travel times

\[ t_1 = \frac{l}{\frac{c}{n} + v} + \frac{l}{c - v} = \frac{(1 + n)\,l}{c} + \frac{(1 - n^2)\,l}{c}\,\frac{v}{c} + O\bigl(v^2/c^2\bigr) , \]

\[ t_2 = \frac{l}{\frac{c}{n} - v} + \frac{l}{c + v} = \frac{(1 + n)\,l}{c} - \frac{(1 - n^2)\,l}{c}\,\frac{v}{c} + O\bigl(v^2/c^2\bigr) \]

for the two light rays going through the apparatus in opposite directions. The
difference in travel times (at first order in $v/c$) should manifest itself in interference
fringes.
Remark. Since the direction of motion of the earth through ether is unknown,
the experimental device could be rotated. (In addition, the experiment was
performed several times several months apart: the earth could be at rest relative
to ether at a certain time; since the earth revolves around the sun, some
months later, it should be moving with 60 km/s.)

The experiment gave a null result. The motion of the earth through ether could
not be detected. However, the explanation of this result was straightforward
(and completely consistent with the earlier results by Fizeau). Ether interacts
with matter which causes the phenomenon of ‘ether drag’:


Suppose that some matter moves with velocity $v$ (w.r.t. the universal ether).
Then the ether within the material is (partially) dragged along, so that the
velocity of the ‘local’ ether is

\[ v_{\text{ether in matter}} = d\,v ; \tag{1.27} \]

$d$ is the dragging coefficient. If $d = 0$ there is no dragging; if $0 < d < 1$ there
is partial dragging; if $d = 1$ the dragging is complete.

Let us consider Hoek’s experiment in the light of ‘ether drag’: If the velocity of
the medium (water)⁸ w.r.t. the universal ether is $v$, then its velocity w.r.t. the
‘local’ ether contained within it is $(1 - d)v$. We thus obtain the travel times

\[ t_1 = \frac{l}{\frac{c}{n} + (1 - d)v} + \frac{l}{c - v} = \frac{(1 + n)\,l}{c} + \frac{\bigl[1 - (1 - d)n^2\bigr]\,l}{c}\,\frac{v}{c} + O\bigl(v^2/c^2\bigr) , \]

\[ t_2 = \frac{l}{\frac{c}{n} - (1 - d)v} + \frac{l}{c + v} = \frac{(1 + n)\,l}{c} - \frac{\bigl[1 - (1 - d)n^2\bigr]\,l}{c}\,\frac{v}{c} + O\bigl(v^2/c^2\bigr) \]

for the two light rays going through the apparatus in opposite directions.

The null result of the experiment is explained by partial ether drag, if and only
if $1 - (1 - d)n^2 = 0$, which corresponds to a dragging coefficient of

\[ d = 1 - \frac{1}{n^2} . \tag{1.28} \]

This was exactly the value suggested by Fresnel some decades earlier and confirmed
earlier measurements and results by Fizeau.
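The cancellation of the first-order term for the coefficient (1.28) can be made explicit numerically. The sketch below is an illustration added here, with hypothetical inputs: a path length of 1 m, the approximate refractive index of water $n \approx 1.333$, and $v = 30$ km/s. It evaluates the exact travel times of the two rays with and without dragging and compares their difference with the first-order prediction $2\,[1 - (1 - d)n^2]\,(l/c)(v/c)$.

```python
C = 299_792_458.0   # speed of light in m/s

def travel_times(l, n, v, d, c=C):
    """Exact travel times t1, t2 of the two rays for dragging coefficient d."""
    k = (1.0 - d) * v                       # medium velocity w.r.t. the local ether
    t1 = l / (c / n + k) + l / (c - v)
    t2 = l / (c / n - k) + l / (c + v)
    return t1, t2

def fresnel_coefficient(n):
    """Dragging coefficient (1.28): d = 1 - 1/n^2."""
    return 1.0 - 1.0 / n**2

def first_order_difference(l, n, v, d, c=C):
    """t1 - t2 at first order in v/c: 2 [1 - (1 - d) n^2] (l/c) (v/c)."""
    return 2.0 * (1.0 - (1.0 - d) * n**2) * (l / c) * (v / c)

# Hypothetical inputs for the demonstration.
l, n, v = 1.0, 1.333, 3.0e4

t1, t2 = travel_times(l, n, v, d=0.0)
dt_no_drag = t1 - t2
t1, t2 = travel_times(l, n, v, d=fresnel_coefficient(n))
dt_fresnel = t1 - t2

print(dt_no_drag, first_order_difference(l, n, v, 0.0))
print(dt_fresnel)  # many orders of magnitude smaller than dt_no_drag
```

Without dragging the difference matches the first-order formula; with Fresnel's value only a residual far below the first-order term remains, consistent with the null result.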

Due to the effect of partial ether drag, Hoek’s experiment could not detect the
movement of the earth w.r.t. the (universal) ether. The terms of first order in
$v/c$ disappear; the remaining effect would be of second order in $v/c$ and could
not be detected with Hoek’s experimental means.

1.8 Michelson and Morley

Toward the end of the nineteenth century the possibility to construct a high
precision interferometer to measure effects of quadratic order in v/c came into
reach.
⁸ In Hoek’s experiment, the velocity of the medium coincides with the velocity of earth
through the (universal) ether, since the medium is at rest w.r.t. the laboratory.


In the Michelson interferometer a light beam is divided into two parts that
travel along orthogonal paths to meet again where interference is observed.
Suppose that one of the two light rays moves in the direction of the movement
of earth through ether; we call this ray the parallel ray, the other ray is called
the perpendicular ray. In the laboratory frame, the velocity vectors are

\[ u_\parallel = \begin{pmatrix} c \pm v \\ 0 \end{pmatrix} \tag{$\parallel$} \]

for the parallel ray and

\[ u_\perp = \begin{pmatrix} 0 \\ \pm w \end{pmatrix} \]

for the perpendicular ray, where $w$ is to be determined. A simple Galilean
transformation shows that $u_\perp$ reads

\[ u'_\perp = \begin{pmatrix} v \\ \pm w \end{pmatrix} \]

in the rest frame of the ether. Since the absolute value of this velocity must be
$c$, i.e.,

\[ u'_\perp \cdot u'_\perp = v^2 + w^2 = c^2 , \]

we find

\[ w = \sqrt{c^2 - v^2} = c\,\sqrt{1 - \frac{v^2}{c^2}} . \]

Hence,

\[ u_\perp = \begin{pmatrix} 0 \\ \pm\sqrt{c^2 - v^2} \end{pmatrix} . \tag{$\perp$} \]

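Let us compute the travel times of the two light rays. We obtain

\[ t_\parallel = \frac{l}{c + v} + \frac{l}{c - v} = \frac{2l}{c} + \frac{2l}{c}\,\frac{v^2}{c^2} + O\bigl(v^4/c^4\bigr) , \]

\[ t_\perp = \frac{l}{\sqrt{c^2 - v^2}} + \frac{l}{\sqrt{c^2 - v^2}} = \frac{2l}{c} + \frac{l}{c}\,\frac{v^2}{c^2} + O\bigl(v^4/c^4\bigr) . \]

The difference causes a phase shift between the two beams that produces interference
fringes. This interference pattern should change if the interferometer
is rotated; e.g.,

\[ \Delta t_{0^\circ} = t_\parallel - t_\perp = +\frac{l}{c}\,\frac{v^2}{c^2} + O\bigl(v^4/c^4\bigr) , \tag{1.29a} \]

\[ \Delta t_{90^\circ} = t_\perp - t_\parallel = -\frac{l}{c}\,\frac{v^2}{c^2} + O\bigl(v^4/c^4\bigr) ; \tag{1.29b} \]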


therefore, the movement of the earth through ether should be straightforward
to detect.

The first experiment was performed by Michelson in 1881. However, there was
a fundamental error in Michelson’s computations; he used the value w = c
to compute the travel time of the perpendicular ray. This error leads to an
additional factor of 2 in (1.29), hence Michelson expected an effect twice the
actual size. The error of his experimental design would have been sufficiently
small to measure this larger effect, but the actual effect is merely half the size
and Michelson’s experiment was not precise enough to measure (1.29).

A reduction of experimental error was necessary to obtain reliable results. The
experiment was repeated (with a better design) by Michelson and Morley in
1887 and again in the 1890s. The experimental error was reassuringly small.

The experiment gave a null result.

The conclusion drawn by Michelson and Morley was that the earth drags the
entire ether along, possibly gravitationally. (This view was consistent with an
earlier suggestion by Stokes.) However, as had been known long before, the
observation of stellar aberration is proof that this kind of ether drag is impossi-
ble. (The phenomenon of stellar aberration would be very different, if the light
reaching us from the stars passed through an ether whose velocity changes with
position.) Moreover, the connection of that kind of complete ether drag with
the partial dragging established by Fizeau and Hoek would remain mysterious.

The attempts to explain the null result of the interferometry experiment became
rather desperate. Since the laboratory where Michelson and Morley performed
their experiment was situated in the basement of a building, it was suggested
that there could be complete ether drag in such surroundings. But experiments
on mountains and in balloons (!) confirmed the null result.9 As a last resort,
Fitzgerald and Lorentz proposed the contraction of lengths in the direction of
motion of earth through ether. However, the mechanism that caused such a
contraction remained unclear.

In 1905, Einstein made a clear cut. The main idea is simple: There is no ether.

9 The occasional non-null results were all shown to be due to bad experimental design.


CHAPTER 2

EINSTEIN

2.1 Lorentz transformations

This is the line of reasoning: There is no ether. The Maxwell equations are not
‘derived’ equations that describe the perturbations of a medium; the Maxwell
equations are ‘fundamental’. In other words: ‘Laws of nature’. But then,
obviously, the principle of relativity comes into play. If the principle of relativity
holds, then the set of coordinate systems w.r.t. which the Maxwell equations
take their standard form are the inertial frames of reference.

The coordinate transformations that leave invariant the Maxwell equations
must therefore be the transformations that map one inertial frame into another.
(These transformations will be given the name Lorentz/Poincaré transformations.)

So, then, what are the coordinate transformations that leave invariant the
Maxwell equations?

Let us use the results of section 1.5. The Maxwell equations encompass the
wave equation. It thus makes sense to begin by studying the coordinate
transformations that leave invariant the wave equation, i.e.,
$$ \Box\phi = -\frac{1}{c^2}\,\partial_t^2 \phi + \Delta\phi = 0 . \qquad (2.1) $$

As at the end of section 1.4, we begin by writing the wave equation in a
convenient form. Using the Kronecker symbol (and the Einstein summation
convention) the wave equation (2.1) becomes
$$ \Box\phi(t, x) = -\frac{1}{c^2}\,\partial_t^2 \phi(t, x) + \delta^{ij}\,\partial_i \partial_j \phi(t, x) = 0 . \qquad (2.2) $$
We define a coordinate x0 that encodes time by

x0 = ct .

Accordingly, instead of coordinates $(t, x^1, x^2, x^3)$ we use a system of coordinates
$(x^0, x^1, x^2, x^3)$. It is standard in relativity to have indices that run over
$0, \ldots, 3$ (instead of $1, \ldots, 4$).
Remark. One can interpret the coordinate x0 as measuring time in ‘light meters’
instead of in seconds (where a ‘light meter’ is the time that light takes to travel
through one meter). More or less equivalently, one can measure time in seconds
and length in light seconds.

Using $(x^0, x^1, x^2, x^3)$, the wave equation reads
$$ \Box\phi(x^0, x^1, x^2, x^3) = -\partial_0^2 \phi(x^0, x^1, x^2, x^3) + \delta^{ij}\,\partial_i \partial_j \phi(x^0, x^1, x^2, x^3) = 0 . \qquad (2.3) $$

Let us define $\eta$ to be the matrix
$$ \eta = \begin{pmatrix} -1 & & & \\ & 1 & & \\ & & 1 & \\ & & & 1 \end{pmatrix} . \qquad (2.4) $$

Using this notation the wave equation (2.3) becomes
$$ \Box\phi(x^0, x^1, x^2, x^3) = \eta^{\mu\nu}\,\partial_\mu \partial_\nu \phi(x^0, x^1, x^2, x^3) = 0 . \qquad (2.5) $$

It is common to specify the arguments of the functions collectively; e.g., instead
of $\phi(x^0, x^1, x^2, x^3)$ we will write $\phi(x^\sigma)$.

Having completed the preparatory steps, let us now consider an arbitrary
change of coordinates,
$$ \bar{x}^\mu = \bar{x}^\mu(x^\sigma) . \qquad (2.6) $$
This implies
$$ \partial_\mu = \frac{\partial}{\partial x^\mu} = \frac{\partial \bar{x}^\nu}{\partial x^\mu}\,\frac{\partial}{\partial \bar{x}^\nu} = \frac{\partial \bar{x}^\nu}{\partial x^\mu}\,\bar{\partial}_\nu . $$

Furthermore, the function $\phi(x^\sigma)$ is assumed to transform like a scalar function,
i.e., $\bar{\phi}(\bar{x}^\sigma) = \phi(x^\sigma)$.

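A straightforward computation now leads to
$$ \Box\phi(x^\sigma) = \eta^{\mu\nu}\,\partial_\mu \partial_\nu \phi(x^\sigma) = \eta^{\mu\nu}\,\partial_\mu \left( \frac{\partial \bar{x}^\beta}{\partial x^\nu}\,\bar{\partial}_\beta \bar{\phi}(\bar{x}^\sigma) \right) $$
$$ = \eta^{\mu\nu}\,\frac{\partial \bar{x}^\beta}{\partial x^\nu}\,\partial_\mu \bar{\partial}_\beta \bar{\phi}(\bar{x}^\sigma) + \eta^{\mu\nu}\,\frac{\partial^2 \bar{x}^\beta}{\partial x^\mu \partial x^\nu}\,\bar{\partial}_\beta \bar{\phi}(\bar{x}^\sigma) $$
$$ = \eta^{\mu\nu}\,\frac{\partial \bar{x}^\alpha}{\partial x^\mu}\,\frac{\partial \bar{x}^\beta}{\partial x^\nu}\,\bar{\partial}_\alpha \bar{\partial}_\beta \bar{\phi}(\bar{x}^\sigma) + \eta^{\mu\nu}\,\frac{\partial^2 \bar{x}^\beta}{\partial x^\mu \partial x^\nu}\,\bar{\partial}_\beta \bar{\phi}(\bar{x}^\sigma) . \qquad (2.7) $$
If and only if the coordinate system $(\bar{x}^0, \bar{x}^1, \bar{x}^2, \bar{x}^3)$ is an inertial frame of
reference, the wave equation is invariant under the change of coordinates (2.6),
i.e.,
$$ \Box\phi(x^\sigma) \overset{!}{=} \bar{\Box}\bar{\phi}(\bar{x}^\sigma) = \eta^{\alpha\beta}\,\bar{\partial}_\alpha \bar{\partial}_\beta \bar{\phi}(\bar{x}^\sigma) . \qquad (2.8) $$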

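Using (2.7) we conclude that invariance of the wave equation is equivalent to
requiring that
$$ \eta^{\mu\nu}\,\frac{\partial \bar{x}^\alpha}{\partial x^\mu}\,\frac{\partial \bar{x}^\beta}{\partial x^\nu} = \eta^{\alpha\beta} , \qquad \eta^{\mu\nu}\,\frac{\partial^2 \bar{x}^\beta}{\partial x^\mu \partial x^\nu} = 0 . \qquad (2.9) $$
From these equations it is possible (though not simple) to conclude that the
transformation (2.6) must satisfy
$$ \frac{\partial \bar{x}^\mu}{\partial x^\nu} = \mathrm{const} , $$
i.e., the transformation must be linear. Let us define
$$ L^\mu{}_\nu := \frac{\partial \bar{x}^\mu}{\partial x^\nu} = \mathrm{const} . $$
Then the coordinate transformation (2.6) reads
$$ \bar{x}^\mu = L^\mu{}_\nu\, x^\nu . \qquad (2.10) $$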

Remark. Strictly speaking, we conclude from $L^\mu{}_\nu = \partial \bar{x}^\mu / \partial x^\nu = \mathrm{const}$ that
$$ \bar{x}^\mu = L^\mu{}_\nu\, x^\nu + \bar{a}^\mu , \qquad (2.10') $$

for some constant vector āµ . However, this merely corresponds to an additional
translation (in time and/or space). For simplicity we suppress this translational
freedom for the moment.


In addition to (2.10), we conclude from (2.9) that invariance of the wave
equation requires
$$ L^\alpha{}_\mu\, L^\beta{}_\nu\, \eta^{\mu\nu} = \eta^{\alpha\beta} , \qquad (2.11) $$
which means that $\eta$ remains invariant under (2.10).

Let us finally call the coordinate transformations we consider here by their
well-known name:
Definition 2.1. A Lorentz transformation is a coordinate transformation of
spacetime that maps one inertial frame into another.

Taking account of the definition of an inertial frame we find an alternative
definition.

Definition 2.1′. A Lorentz transformation is a coordinate transformation of
spacetime that leaves invariant the laws of physics.1
Remark. So far, the only (relativistic) laws of physics we know are the Maxwell
equations and the wave equation (which derives from the Maxwell equations).
The wave equation is the one we used above.
Remark. Strictly speaking, a Lorentz transformation is a coordinate transformation
of spacetime that maps one inertial frame into another (or: leaves invariant
the laws of physics) and keeps the origin (of spacetime) fixed. If the origin is not
kept fixed, we call the transformation a Poincaré transformation; compare (2.10)
and (2.10′).

Our considerations on the wave equation that led us to (2.10) and (2.11) thus
prove the following theorem:
Theorem 2.2. The Lorentz transformations are uniquely characterized as the
transformations
$$ \bar{x}^\mu = L^\mu{}_\nu\, x^\nu $$
that leave $\eta$ invariant, i.e.,
$$ L^\mu{}_\alpha\, L^\nu{}_\beta\, \eta^{\alpha\beta} = \eta^{\mu\nu} . \qquad (2.12) $$

More commonly, equation (2.12) is written as
$$ L^\mu{}_\alpha\, L^\nu{}_\beta\, \eta_{\mu\nu} = \eta_{\alpha\beta} , \qquad (2.12') $$
where $\eta_{\mu\nu}$ is again given as the matrix
$$ \eta = \begin{pmatrix} -1 & & & \\ & 1 & & \\ & & 1 & \\ & & & 1 \end{pmatrix} , $$
see (2.4).

1 By laws of physics we mean the actual (relativistic) laws and not the Newtonian approximations.
Remark. In matrix notation, equation (2.12′) reads
$$ L^T \eta\, L = \eta . $$
It is simple to see that the two representations (2.12) and (2.12′) are equivalent:
We multiply (2.12) with $[L^{-1}]^\sigma{}_\mu$ and $[L^{-1}]^\lambda{}_\nu$ and obtain
$$ [L^{-1}]^\sigma{}_\mu\, [L^{-1}]^\lambda{}_\nu\, L^\mu{}_\alpha\, L^\nu{}_\beta\, \eta^{\alpha\beta} = [L^{-1}]^\sigma{}_\mu\, [L^{-1}]^\lambda{}_\nu\, \eta^{\mu\nu} . $$
Since $[L^{-1}]^\sigma{}_\mu\, L^\mu{}_\alpha = \delta^\sigma{}_\alpha$ and $[L^{-1}]^\lambda{}_\nu\, L^\nu{}_\beta = \delta^\lambda{}_\beta$ we get
$$ \eta^{\sigma\lambda} = [L^{-1}]^\sigma{}_\mu\, [L^{-1}]^\lambda{}_\nu\, \eta^{\mu\nu} . $$
A simple change of indices leads to
$$ [L^{-1}]^\mu{}_\alpha\, [L^{-1}]^\nu{}_\beta\, \eta^{\alpha\beta} = \eta^{\mu\nu} . \qquad (2.12^{-1}) $$

The interpretation of (2.12$^{-1}$) is obvious: If $L$ is a Lorentz transformation,
i.e., if $L$ satisfies (2.12), then $L^{-1}$ is a Lorentz transformation as well, i.e., $L^{-1}$
satisfies (2.12) as well. (Note that (2.12) and (2.12$^{-1}$) are identical in form.) In
matrix notation, equation (2.12$^{-1}$) reads
$$ L^{-1}\, \eta^{-1}\, (L^{-1})^T = \eta^{-1} . $$
Note that $\eta$ and $\eta^{-1}$ are identical as matrices; the reason for choosing $\eta^{-1}$
here will become clear later (when we discuss metrics).2 Therefore,
$$ \left[ L^{-1}\, \eta^{-1}\, (L^{-1})^T \right]^{-1} = \eta , $$
which implies
$$ L^T \eta\, L = \eta . $$
In index notation, we thus arrive at
$$ L^\mu{}_\alpha\, L^\nu{}_\beta\, \eta_{\mu\nu} = \eta_{\alpha\beta} , \qquad (2.13) $$
which is (2.12′).

2 The index structure of the Minkowski metric $\eta$ is $\eta_{\mu\nu}$; the index structure of its inverse $\eta^{-1}$
is $\eta^{\mu\nu}$. (Regarded as matrices the two are equal.) The reason is that the matrix identity
$\eta^{-1}\eta = 1$ corresponds to $\eta^{\mu\nu}\eta_{\nu\sigma} = \delta^\mu{}_\sigma$ in index notation.
Remark. The equivalence of (2.12) and (2.12′ ) is in fact rather obvious and does
not need any particular work (however, we may consider the above a useful
exercise). One simply takes into account the action of L on contravariant and
covariant tensors; see appendix.

2.2 Examples of Lorentz transformations

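In section 2.1 we have seen that the Lorentz transformations
$$ \bar{x}^\mu = L^\mu{}_\nu\, x^\nu \qquad (2.14) $$
are those transformations that leave $\eta$ invariant, i.e.,
$$ L^\mu{}_\alpha\, L^\nu{}_\beta\, \eta_{\mu\nu} = \eta_{\alpha\beta} . \qquad (2.15) $$
Recall that $\eta_{\mu\nu}$ is given as the matrix
$$ \eta = \begin{pmatrix} -1 & & & \\ & 1 & & \\ & & 1 & \\ & & & 1 \end{pmatrix} . \qquad (2.16) $$
In matrix notation, equation (2.15) reads
$$ L^T \eta\, L = \eta . \qquad (2.15') $$

In this section we will see what Lorentz transformations actually look like.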

The first question we ask is: How many Lorentz transformations are there?
Every Lorentz transformation is represented by a (4 × 4) matrix, which makes
16 unknowns. Looking at equation (2.15) we see that there are 10 equations
that L must satisfy (since η is a symmetric matrix, 6 of the 16 equations
are redundant). 16 − 10 = 6. In other words, we expect that the Lorentz
transformations form a 6-parameter family.

More specifically, the Lorentz transformations form a 6-parameter group.3
The group property simply amounts to the statement that the composition
of Lorentz transformations is again a Lorentz transformation. To see this, let
$L_1$ and $L_2$ be Lorentz transformations, i.e., $L_1^T \eta\, L_1 = \eta$ and $L_2^T \eta\, L_2 = \eta$.
Then $(L_1 L_2)^T \eta\, L_1 L_2 = L_2^T L_1^T \eta\, L_1 L_2 = L_2^T \eta\, L_2 = \eta$; in other words,
$L_1 L_2$ is a Lorentz transformation.

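To derive the Lorentz transformations from (2.15) we make an ansatz. We
choose a matrix $L$ such that it is trivial apart from a two-by-two matrix block.
There are six4 possibilities, namely
$$ \begin{pmatrix} L^0{}_0 & L^0{}_1 & 0 & 0 \\ L^1{}_0 & L^1{}_1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} , \quad
\begin{pmatrix} L^0{}_0 & 0 & L^0{}_2 & 0 \\ 0 & 1 & 0 & 0 \\ L^2{}_0 & 0 & L^2{}_2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} , \quad
\begin{pmatrix} L^0{}_0 & 0 & 0 & L^0{}_3 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ L^3{}_0 & 0 & 0 & L^3{}_3 \end{pmatrix} , $$
$$ \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & L^1{}_1 & L^1{}_2 & 0 \\ 0 & L^2{}_1 & L^2{}_2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} , \quad
\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & L^1{}_1 & 0 & L^1{}_3 \\ 0 & 0 & 1 & 0 \\ 0 & L^3{}_1 & 0 & L^3{}_3 \end{pmatrix} , \quad
\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & L^2{}_2 & L^2{}_3 \\ 0 & 0 & L^3{}_2 & L^3{}_3 \end{pmatrix} . $$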

Provided that the ansatz is successful (i.e., provided that there exist Lorentz
transformations with this structure), we automatically obtain the complete
six-parameter family of Lorentz transformations.

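Let us denote the non-trivial (2 × 2) block by
$$ K = \begin{pmatrix} a & b \\ c & d \end{pmatrix} , $$
irrespective of where it sits within the matrix $L$. Using the ansatz for $L$
in (2.15), i.e., in $L^T \eta\, L = \eta$, we find that a matrix $L$ of the (2 × 2) block
form is a Lorentz transformation if and only if
$$ K^T \begin{pmatrix} \mp 1 & 0 \\ 0 & 1 \end{pmatrix} K = \begin{pmatrix} \mp 1 & 0 \\ 0 & 1 \end{pmatrix} . \qquad (2.17) $$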

There are two cases:

+ The sign is a plus sign if the (2 × 2) block $K$ is a part of the spatial
  components of the matrix $L$; this corresponds to the latter three of the
  six possibilities above.
− The sign is a minus sign if the (2 × 2) block $K$ involves time and space,
  i.e., the 0-component and one spatial component; this corresponds to the
  first three of the six possibilities above.

3 This six-parameter group is the so-called Lorentz group, which is either denoted by L or
by O(3, 1).
4 Oh, I see.

The + case is particularly simple. In this case, equation (2.17) reduces to
$$ K^T K = 1 , \qquad (2.18) $$
which means that $K$ must be an orthogonal matrix, i.e., $K \in O(2)$. If we
require the transformation to preserve the orientation of the coordinates, then
we restrict ourselves to $K \in SO(2)$, i.e., to the rotations,
$$ K = \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} \cos\varphi & -\sin\varphi \\ \sin\varphi & \cos\varphi \end{pmatrix} , \qquad (2.19) $$
with $\varphi \in [0, 2\pi)$.

We have reached the conclusion that rotations are Lorentz transformations;
this agrees with our intuition: A rotation (by a constant angle) changes one
inertial frame into another.5 By using our ansatz we have derived three different
rotations (in the order displayed): a rotation around the $x^3$-axis, a rotation
around the $x^2$-axis, and a rotation around the $x^1$-axis, i.e.,
$$ \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\varphi & -\sin\varphi & 0 \\ 0 & \sin\varphi & \cos\varphi & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} , \quad
\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\varphi & 0 & -\sin\varphi \\ 0 & 0 & 1 & 0 \\ 0 & \sin\varphi & 0 & \cos\varphi \end{pmatrix} , \quad
\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & \cos\varphi & -\sin\varphi \\ 0 & 0 & \sin\varphi & \cos\varphi \end{pmatrix} . $$

Clearly, we can compose these three elementary rotations to a general rotation
in space; therefore,
$$ L = \begin{pmatrix} 1 & \\ & R \end{pmatrix} , \qquad (2.20) $$
where $R = (R^i{}_j)_{i,j}$ is a (3 × 3) rotation matrix, i.e., $R \in SO(3)$, is a Lorentz
transformation. For these rotations, (2.14) becomes
$$ \bar{x}^0 = x^0 , \qquad \bar{x}^i = R^i{}_j\, x^j . $$
Note again that there exist three rotational degrees of freedom, since the group
SO(3) is a three-parameter group (characterized, e.g., by the Euler angles).

5 Evidently, rotations are Galilean transformations and Lorentz transformations at the same
time.

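Let us now consider the − case, i.e., (2.17) with the minus sign. We obtain
$$ K^T \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix} K
= \begin{pmatrix} a & c \\ b & d \end{pmatrix} \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} a & b \\ c & d \end{pmatrix}
= \begin{pmatrix} -a^2 + c^2 & -ab + cd \\ -ab + cd & -b^2 + d^2 \end{pmatrix} , $$
which is required to satisfy
$$ \begin{pmatrix} -a^2 + c^2 & -ab + cd \\ -ab + cd & -b^2 + d^2 \end{pmatrix} = \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix} . \qquad (2.21) $$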

From the equation $a^2 - c^2 = 1$ in (2.21) we conclude that there exists $u \in \mathbb{R}$
such that
$$ a = \cosh u , \qquad c = -\sinh u . $$
It is not necessary to restrict oneself to the case $a > 0$ ($a = -\cosh u$ would
be admissible as well); however, we choose to study Lorentz transformations
that keep the direction of time fixed; it is not difficult to see that this requires
$a > 0$. (The minus in front of $\sinh u$ is merely a matter of convenience and not
a restriction.) Analogously, the equation $-b^2 + d^2 = 1$ in (2.21) implies
$$ b = -\sinh w , \qquad d = \cosh w $$
for some $w \in \mathbb{R}$. We choose not to consider spatial reflections, so that we
assume $d > 0$. Inserting these relations into the remaining equation in (2.21)
yields
$$ \sinh u \cosh w - \cosh u \sinh w = \sinh(u - w) \overset{!}{=} 0 , $$
from which we easily conclude that
$$ u = w . $$

Therefore,
$$ K = \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} \cosh u & -\sinh u \\ -\sinh u & \cosh u \end{pmatrix} . \qquad (2.22) $$
Using $K$ in the ansatzes we obtain the three remaining Lorentz transformations,
i.e.,
$$ L = \begin{pmatrix} \cosh u & -\sinh u & 0 & 0 \\ -\sinh u & \cosh u & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} , \qquad (2.23) $$
and the two other ones where the (2 × 2) block involves $x^2$ or $x^3$ instead of $x^1$,
i.e.,
$$ \begin{pmatrix} \cosh u & 0 & -\sinh u & 0 \\ 0 & 1 & 0 & 0 \\ -\sinh u & 0 & \cosh u & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} , \quad
\begin{pmatrix} \cosh u & 0 & 0 & -\sinh u \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ -\sinh u & 0 & 0 & \cosh u \end{pmatrix} . $$

Thereby we have successfully completed our search for the Lorentz transforma-
tions. We have found six elementary Lorentz transformations; because of the
group property, every Lorentz transformation can be represented as a combi-
nation (composition) of these six elementary transformations.
Remark. The parameter u is called rapidity.
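The defining condition $L^T \eta L = \eta$ and the group property can be checked numerically for the elementary transformations. The sketch below is an illustration only, not part of the original notes; the helper names (`matmul`, `is_lorentz`, `boost_x1`, `rotation_x3`) are hypothetical, and the matrices are written w.r.t. the coordinates $(x^0, x^1, x^2, x^3)$. It also shows that composing two boosts along the same axis adds their rapidities.

```python
import math

ETA = [[-1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

def matmul(a, b):
    """Product of two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transpose(a):
    return [list(row) for row in zip(*a)]

def close(a, b, tol=1e-12):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(4) for j in range(4))

def boost_x1(u):
    """Elementary transformation with the (x0, x1) block, rapidity u."""
    ch, sh = math.cosh(u), math.sinh(u)
    return [[ch, -sh, 0, 0], [-sh, ch, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

def rotation_x3(phi):
    """Elementary transformation with the (x1, x2) block, angle phi."""
    co, si = math.cos(phi), math.sin(phi)
    return [[1, 0, 0, 0], [0, co, -si, 0], [0, si, co, 0], [0, 0, 0, 1]]

def is_lorentz(l):
    """Check the matrix condition L^T eta L = eta."""
    return close(matmul(matmul(transpose(l), ETA), l), ETA)

composite = matmul(boost_x1(0.3), rotation_x3(0.7))
print(is_lorentz(boost_x1(0.3)), is_lorentz(rotation_x3(0.7)), is_lorentz(composite))
# Two boosts along the same axis compose to a boost with the summed rapidity.
print(close(matmul(boost_x1(0.3), boost_x1(0.5)), boost_x1(0.8)))
```

The additivity of $u$ under composition is one reason why the rapidity is a convenient parameter.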

2.3 Lorentz boosts

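Before we proceed, let us reintroduce dimensions into our considerations. The
coordinate $x^0$ is defined as $x^0 = ct$; therefore, the Lorentz transformation (2.14),
i.e., $\bar{x}^\mu = L^\mu{}_\nu\, x^\nu$, which reads
$$ \begin{pmatrix} \bar{x}^0 \\ \bar{x}^1 \\ \bar{x}^2 \\ \bar{x}^3 \end{pmatrix}
= \begin{pmatrix} \cosh u & -\sinh u & 0 & 0 \\ -\sinh u & \cosh u & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} x^0 \\ x^1 \\ x^2 \\ x^3 \end{pmatrix} , $$
can be rewritten as
$$ \begin{pmatrix} \bar{t} \\ \bar{x}^1 \\ \bar{x}^2 \\ \bar{x}^3 \end{pmatrix}
= \begin{pmatrix} \cosh u & -c^{-1}\sinh u & 0 & 0 \\ -c\,\sinh u & \cosh u & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} t \\ x^1 \\ x^2 \\ x^3 \end{pmatrix} \qquad (2.24) $$
when we use $t$ (and $\bar{t}$) instead of $x^0$ (and $\bar{x}^0$).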

It is not difficult to interpret this Lorentz transformation. Let us denote the
inertial frame with coordinates $(t, x)$ by $X$ and the inertial frame with
coordinates $(\bar{t}, \bar{x})$ by $\bar{X}$. Take a family of particles that are at rest w.r.t. the
inertial observer (inertial frame of reference) $\bar{X}$; these particles are described
by $\bar{x}^1 = \mathrm{const}$, $\bar{x}^2 = \mathrm{const}$, $\bar{x}^3 = \mathrm{const}$ in the coordinate system $\bar{X}$.
According to (2.24), since $\bar{x}^1 = -c\,(\sinh u)\, t + (\cosh u)\, x^1$, the coordinates the
inertial observer $X$ ascribes to these particles are
$$ x^1 = \mathrm{const} + \underbrace{c \tanh u}_{v}\; t , \qquad x^2 = \mathrm{const} , \qquad x^3 = \mathrm{const} , \qquad (2.25) $$
i.e., the particles are in uniform motion in the direction of the $x^1$-axis with
some constant velocity $v$ which is given by
$$ v = c \tanh u . \qquad (2.26) $$
Consequently, $v$ is the constant velocity of the inertial observer $\bar{X}$ as seen by
the observer $X$. (Note that a motion in the direction of the positive $x^1$-axis is
described by positive values of $v$, while a motion in the direction of the negative
$x^1$-axis is described by negative values of $v$.) We draw an interesting conclusion
from the formula (2.26): Since the modulus of $\tanh u$ is always less than 1, we
find
$$ |v| < c . \qquad (2.27) $$
This means that if $X$ and $\bar{X}$ are inertial observers, then the speed $|v|$
describing their relative motion must necessarily be less than the speed of
light.
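The relation $v = c \tanh u$ is easy to explore numerically. The following sketch is illustrative only; the function name `velocity_from_rapidity` and the sample rapidities are assumptions made for this example. It evaluates the velocity for a few rapidities and checks the hyperbolic identity $\cosh u = 1/\sqrt{1 - \tanh^2 u}$, which relates the entries of the matrix in (2.24) to $v/c$.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def velocity_from_rapidity(u, c=C):
    """Relative velocity v = c tanh(u) of the frame with rapidity u."""
    return c * math.tanh(u)

# Illustrative rapidities; v grows monotonically with u but stays below c.
samples = [velocity_from_rapidity(u) for u in (-3.0, -0.5, 0.0, 0.5, 3.0)]
for v in samples:
    print(f"v/c = {v / C:+.6f}")

# Hyperbolic identity: cosh(u) = 1 / sqrt(1 - tanh(u)^2)
u = 0.5
lhs = math.cosh(u)
rhs = 1.0 / math.sqrt(1.0 - math.tanh(u) ** 2)
print(lhs, rhs)
```

The sample rapidities are kept moderate on purpose: in floating-point arithmetic the value of tanh u becomes indistinguishable from 1 once |u| is large enough, although mathematically |v| < c always holds.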
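Definition 2.3. The γ-factor associated with the relative velocity $v$ is defined
as
$$ \gamma = \gamma(v) = \frac{1}{\sqrt{1 - v^2/c^2}} . \qquad (2.28) $$

Rewriting (2.24) in terms of $v$ and $\gamma(v)$, using $\cosh u = \gamma$ and
$\sinh u = \gamma v / c$ (which follow from (2.26)), yields the standard form for the
Lorentz boost in $x^1$-direction:
$$ \begin{pmatrix} \bar{t} \\ \bar{x}^1 \\ \bar{x}^2 \\ \bar{x}^3 \end{pmatrix}
= \begin{pmatrix} \gamma & -\gamma v/c^2 & 0 & 0 \\ -\gamma v & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} t \\ x^1 \\ x^2 \\ x^3 \end{pmatrix} \qquad (2.29) $$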

A Lorentz boost is the prototype of a Lorentz transformation. Often the
terms ‘Lorentz boost’ and ‘Lorentz transformation’ are used synonymously and
interchangeably.6 Lorentz boosts involve a change of both the time and the
spatial coordinates; they are thus the special relativistic analog of the Galilean
boosts. This is because Lorentz boosts describe the change of coordinates
between an observer $X$ and an observer $\bar{X}$ that moves with constant velocity
$v$ w.r.t. $X$.

6 Admittedly, this is a bit misleading since also rotations are Lorentz transformations. But
rotations are boring transformations, so it is easy to turn a blind eye to them.

Remark. For all inertial observers the speed of light is equal to c. This is
already implicit in our assumptions, since we have required the wave equation
(which contains c) to hold w.r.t. all inertial frames. As an exercise we can check
consistency with (2.29). To that end consider a photon which moves according
to x1 = ct w.r.t. the frame X. In coordinates X̄ we obtain t̄ = γ(1 − v/c)t and
x̄1 = γ(1 − v/c)ct from (2.29). Consequently, x̄1 = ct̄, i.e., also for the observer
X̄ the photon moves with velocity c.
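The consistency check of this remark can be reproduced numerically. The sketch below is a hedged illustration: the function name `boost_x1` and the sample values (units with c = 1, v = 0.6, t = 2) are assumptions chosen for the example. It applies the boost (2.29) to an event on the photon's path $x^1 = ct$.

```python
import math

def boost_x1(t, x, v, c):
    """Lorentz boost (2.29) restricted to the (t, x^1) coordinates."""
    gamma = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    t_bar = gamma * (t - v * x / c**2)
    x_bar = gamma * (x - v * t)
    return t_bar, x_bar

C = 1.0   # units in which c = 1 (assumed for readability)
V = 0.6   # illustrative relative velocity, gamma = 1.25
t = 2.0   # an arbitrary time on the photon's path

# Event on the photon path x^1 = c t in frame X, transformed to frame X-bar.
t_bar, x_bar = boost_x1(t, C * t, V, C)
print(t_bar, x_bar, x_bar / t_bar)
```

The ratio of the transformed coordinates is again c, so the photon moves with the speed of light in both frames.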

Let us summarize. A Lorentz boost in $x^1$-direction is the coordinate
transformation between inertial observers in uniform relative motion; the inertial
observer $\bar{X}$ is seen by $X$ to move with velocity $v$ in the direction of $x^1$.

In particular, the origin of $\bar{X}$, i.e., $\bar{x} = \bar{o}$, is seen by $X$ to move according to
$$ \bar{o} : \quad x(t) = \begin{pmatrix} vt \\ 0 \\ 0 \end{pmatrix} . $$

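The Lorentz boosts associated with the other axes are completely analogous.
Here is the complete list:

Lorentz boost in direction of $x^1$:
$$ \bar{o} : \; x(t) = \begin{pmatrix} vt \\ 0 \\ 0 \end{pmatrix} , \qquad
\begin{pmatrix} \bar{t} \\ \bar{x}^1 \\ \bar{x}^2 \\ \bar{x}^3 \end{pmatrix}
= \begin{pmatrix} \gamma & -\gamma v/c^2 & 0 & 0 \\ -\gamma v & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} t \\ x^1 \\ x^2 \\ x^3 \end{pmatrix} \qquad (2.30a) $$

Lorentz boost in direction of $x^2$:
$$ \bar{o} : \; x(t) = \begin{pmatrix} 0 \\ vt \\ 0 \end{pmatrix} , \qquad
\begin{pmatrix} \bar{t} \\ \bar{x}^1 \\ \bar{x}^2 \\ \bar{x}^3 \end{pmatrix}
= \begin{pmatrix} \gamma & 0 & -\gamma v/c^2 & 0 \\ 0 & 1 & 0 & 0 \\ -\gamma v & 0 & \gamma & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} t \\ x^1 \\ x^2 \\ x^3 \end{pmatrix} \qquad (2.30b) $$

Lorentz boost in direction of $x^3$:
$$ \bar{o} : \; x(t) = \begin{pmatrix} 0 \\ 0 \\ vt \end{pmatrix} , \qquad
\begin{pmatrix} \bar{t} \\ \bar{x}^1 \\ \bar{x}^2 \\ \bar{x}^3 \end{pmatrix}
= \begin{pmatrix} \gamma & 0 & 0 & -\gamma v/c^2 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ -\gamma v & 0 & 0 & \gamma \end{pmatrix}
\begin{pmatrix} t \\ x^1 \\ x^2 \\ x^3 \end{pmatrix} \qquad (2.30c) $$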

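Naturally there exist Lorentz boosts associated with an arbitrary direction.
Such a boost is given by
$$ \bar{o} : \; x(t) = \vec{v}\, t = \begin{pmatrix} v^1 \\ v^2 \\ v^3 \end{pmatrix} t , \qquad
\begin{pmatrix} \bar{t} \\ \bar{x}^1 \\ \bar{x}^2 \\ \bar{x}^3 \end{pmatrix}
= \begin{pmatrix} \gamma & -\gamma\, \vec{v}^{\,T}/c^2 \\ -\gamma\, \vec{v} & 1 + \dfrac{\gamma - 1}{v^2}\, \vec{v}\, \vec{v}^{\,T} \end{pmatrix}
\begin{pmatrix} t \\ x^1 \\ x^2 \\ x^3 \end{pmatrix} , $$
i.e., the transformation matrix is
$$ L = L_{\mathrm{general}}(\vec{v}) = \begin{pmatrix} \gamma & -\gamma\, \vec{v}^{\,T}/c^2 \\ -\gamma\, \vec{v} & 1 + \dfrac{\gamma - 1}{v^2}\, \vec{v}\, \vec{v}^{\,T} \end{pmatrix} , \qquad (2.31) $$
where $\vec{v}$ is an arbitrary vector (with $|\vec{v}| < c$ of course). This general Lorentz
boost can be obtained by applying suitable rotations to (2.30a),
$$ L_{\mathrm{general}}(\vec{v}) = \begin{pmatrix} 1 & \\ & R^T \end{pmatrix}
\begin{pmatrix} \gamma & -\gamma |v|/c^2 & & \\ -\gamma |v| & \gamma & & \\ & & 1 & \\ & & & 1 \end{pmatrix}
\begin{pmatrix} 1 & \\ & R \end{pmatrix} , $$
where $R$ is a rotation matrix that rotates the vector $\vec{v}$ into the vector $(|v|, 0, 0)^T$.
We see that the family of Lorentz boosts is a three-parameter family, where
the parameters are the three components of $\vec{v}$ in (2.31).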
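As a plausibility check of (2.31), one can build the general boost matrix for a sample velocity and verify that it leaves $\eta$ invariant. The sketch below is illustrative and rests on stated assumptions: it works in units with c = 1 (so the matrix acts on $(x^0, x^1, x^2, x^3)$), and the names `general_boost` and `preserves_eta` as well as the sample velocity are hypothetical.

```python
import math

ETA = [[-1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

def general_boost(v):
    """Matrix (2.31) for a velocity 3-vector v, in units with c = 1."""
    v2 = sum(vi * vi for vi in v)
    if v2 >= 1.0:
        raise ValueError("the speed must be less than c = 1")
    l = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    if v2 == 0.0:
        return l  # zero velocity: identity transformation
    gamma = 1.0 / math.sqrt(1.0 - v2)
    l[0][0] = gamma
    for i in range(3):
        l[0][i + 1] = -gamma * v[i]
        l[i + 1][0] = -gamma * v[i]
        for j in range(3):
            l[i + 1][j + 1] += (gamma - 1.0) / v2 * v[i] * v[j]
    return l

def preserves_eta(l, tol=1e-12):
    """Check L^mu_alpha L^nu_beta eta_{mu nu} = eta_{alpha beta}."""
    for a in range(4):
        for b in range(4):
            s = sum(l[m][a] * ETA[m][n] * l[n][b]
                    for m in range(4) for n in range(4))
            if abs(s - ETA[a][b]) > tol:
                return False
    return True

L_sample = general_boost((0.3, -0.4, 0.5))   # illustrative velocity, |v|^2 = 0.5
L_axis = general_boost((0.6, 0.0, 0.0))      # reduces to the boost along x^1
print(preserves_eta(L_sample), L_axis[1][1])
```

For a velocity along the $x^1$-axis the spatial block collapses to the familiar entries of (2.30a), which is a useful sanity check on the formula.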

To obtain the inverse of the Lorentz boost in $x^1$-direction (2.29) we perform a
straightforward calculation: we simply compute the inverse matrix. We find
that the inverse Lorentz boost is
$$ \begin{pmatrix} t \\ x^1 \\ x^2 \\ x^3 \end{pmatrix}
= \begin{pmatrix} \gamma & \gamma v/c^2 & 0 & 0 \\ \gamma v & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} \bar{t} \\ \bar{x}^1 \\ \bar{x}^2 \\ \bar{x}^3 \end{pmatrix} . \qquad (2.32) $$

The result (2.32) is intuitively clear. The transformation (2.29) represents
the fact that the observer $\bar{X}$ is in uniform motion with velocity $v$ w.r.t. the
observer $X$. Then the inverse transformation must represent the fact that $X$ is
in uniform motion with velocity $(-v)$ w.r.t. the observer $\bar{X}$. Accordingly, the
inverse transformation is again described by the Lorentz transformation (2.29),
where $v$ is replaced by $(-v)$, and we obtain (2.32).
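The statement that the boost with velocity $-v$ inverts the boost with velocity $v$ can be confirmed by multiplying the two matrices. This is a minimal sketch restricted to the non-trivial $(t, x^1)$ block; the names `boost_block` and `matmul2` and the sample velocity of 0.8 c are assumptions made for illustration.

```python
import math

def boost_block(v, c):
    """The (t, x^1) block of the Lorentz boost (2.29)."""
    gamma = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    return [[gamma, -gamma * v / c**2],
            [-gamma * v, gamma]]

def matmul2(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

C = 299_792_458.0   # speed of light in m/s
V = 0.8 * C         # illustrative relative velocity

# Boost with -v applied after the boost with +v should give the identity.
product = matmul2(boost_block(-V, C), boost_block(V, C))
print(product)
```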


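Finally, let us discuss the Newtonian limit of a Lorentz boost. Suppose we
have two inertial frames
$$ X : \{t, x^1, x^2, x^3\} \qquad \bar{X} : \{\bar{t}, \bar{x}^1, \bar{x}^2, \bar{x}^3\} \qquad (2.33) $$
related by a Lorentz boost (2.29) with a relative velocity $v \ll c$. Expanding $\gamma$
we obtain
$$ \gamma = \frac{1}{\sqrt{1 - v^2/c^2}} = \frac{1}{1 - \frac{1}{2}\frac{v^2}{c^2} + O\!\left(\frac{v^4}{c^4}\right)} = 1 + \frac{1}{2}\frac{v^2}{c^2} + O\!\left(\frac{v^4}{c^4}\right) , \qquad (2.34) $$
and therefore
$$ \bar{t} = \left( 1 + \frac{1}{2}\frac{v^2}{c^2} + O\!\left(\frac{v^4}{c^4}\right) \right) \left( t - \frac{v}{c^2}\, x^1 \right) = t \left[ 1 + O\!\left(\frac{v^2}{c^2}\right) \right] - \frac{x^1}{c}\, O\!\left(\frac{v}{c}\right) , $$
$$ \bar{x}^1 = \left( 1 + \frac{1}{2}\frac{v^2}{c^2} + O\!\left(\frac{v^4}{c^4}\right) \right) \left( -vt + x^1 \right) = \left( -vt + x^1 \right) \left[ 1 + O\!\left(\frac{v^2}{c^2}\right) \right] . $$

When we keep only the leading-order terms we get
$$ \bar{t} \simeq t , \qquad (2.35a) $$
$$ \bar{x}^1 \simeq x^1 - vt , \qquad (2.35b) $$
i.e., we recover the Galilean boost in $x^1$-direction.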
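To get a feeling for how good the Galilean approximation is at everyday speeds, one can compare the two boosts numerically. The sketch below is illustrative; the function names and the sample event (t = 1 s, x = 1 km, v = 30 m/s) are assumptions chosen for this example.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def lorentz_boost(t, x, v, c=C):
    """Lorentz boost (2.29) restricted to the (t, x^1) coordinates."""
    gamma = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    return gamma * (t - v * x / c**2), gamma * (x - v * t)

def galilean_boost(t, x, v):
    """Galilean boost (2.35): absolute time, shifted position."""
    return t, x - v * t

# Illustrative everyday values: 1 s, 1 km, 30 m/s.
t, x, v = 1.0, 1000.0, 30.0
t_l, x_l = lorentz_boost(t, x, v)
t_g, x_g = galilean_boost(t, x, v)
print(t_l - t_g, x_l - x_g)
```

For these values the differences are many orders of magnitude below anything measurable with ordinary clocks and rulers, in line with $v \ll c$.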

2.4 The Minkowski metric

As demonstrated in section 2.1 it is convenient to use
$$ (x^0, x^1, x^2, x^3) $$
instead of coordinates $(t, x^1, x^2, x^3)$. It is understood that the zero component
$x^0$ encodes time. To account for units, one sets
$$ x^0 = ct . $$

Remark. Recall that one can interpret the coordinate x0 as measuring time in
‘light meters’ instead of in seconds (where a ‘light meter’ is the time that light
takes to travel through one meter). More or less equivalently, one can measure
time in seconds and length in light seconds.


The components of a four-vector $v$ will be denoted by Greek indices; by $v^\mu$ we
typically mean the collection of the four components (abstract index notation)
and not one particular component. The spatial components will be denoted by
Latin indices, which run from 1 to 3 (and are thus ‘spatial indices’); by $v^i$ we
typically mean the collection of the three spatial components; alternatively we
write $\vec{v}$.

$\mu, \nu, \ldots = 0, 1, 2, 3$
$i, j, \ldots = 1, 2, 3$

Using these conventions a Poincaré transformation relating an inertial frame of
reference $X$ with another inertial frame $X'$ can be written as
$$ x^{\mu'} = L^{\mu}{}_{\nu}\, x^\nu + a^{\mu'} . $$
A Lorentz transformation reads
$$ x^{\mu'} = L^{\mu}{}_{\nu}\, x^\nu . $$

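For example, a boost in direction of $x^1$ takes the form
$$ \begin{pmatrix} x^{0'} \\ x^{1'} \\ x^{2'} \\ x^{3'} \end{pmatrix}
= \begin{pmatrix} \gamma & -\gamma v & & \\ -\gamma v & \gamma & & \\ & & 1 & \\ & & & 1 \end{pmatrix}
\begin{pmatrix} x^0 \\ x^1 \\ x^2 \\ x^3 \end{pmatrix} , \qquad (2.36) $$
i.e., using the coordinate $x^0$ corresponds to setting
$$ c = 1 $$
in (2.29) and other formulas. In particular, the γ-factor is
$$ \gamma = \gamma(v) = \frac{1}{\sqrt{1 - v^2}} . \qquad (2.37) $$

Remark. Since $x^0 = ct$, all velocities are measured w.r.t. $c$. In particular,
$|v| < c$ now becomes
$$ |v| < 1 . \qquad (2.38) $$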

W.r.t. some inertial frame of reference we make the following definition:

Definition 2.4. The Minkowski pseudo-scalar product (Minkowski metric) is
the non-degenerate bilinear form
$$ \eta(v, w) = -v^0 w^0 + v^1 w^1 + v^2 w^2 + v^3 w^3 = -v^0 w^0 + \vec{v} \cdot \vec{w} , \qquad (2.39a) $$
where $v$ and $w$ are arbitrary vectors. Equivalently, we can write
$$ \eta(v, w) = (v^0, v^1, v^2, v^3) \begin{pmatrix} -1 & & & \\ & 1 & & \\ & & 1 & \\ & & & 1 \end{pmatrix} \begin{pmatrix} w^0 \\ w^1 \\ w^2 \\ w^3 \end{pmatrix} = v^T \eta\, w , \qquad (2.39b) $$
or
$$ \eta(v, w) = \eta_{\mu\nu}\, v^\mu w^\nu , \quad \text{where} \quad \left( \eta_{\mu\nu} \right)_{\mu,\nu} = \begin{pmatrix} -1 & & & \\ & 1 & & \\ & & 1 & \\ & & & 1 \end{pmatrix} . \qquad (2.39c) $$
Theorem 2.5. The Lorentz transformations are uniquely characterized as the
transformations $L$ that leave $\eta(\cdot, \cdot)$ invariant, i.e.,
$$ \eta(Lv, Lw) = \eta(v, w) \qquad (2.40) $$
for all $v$, $w$.

Proof. Suppose that $L$ is a Lorentz transformation. By theorem 2.2 of
section 2.1 this means that
$$ L^T \eta\, L = \eta $$
(or $L^\mu{}_\nu\, L^\sigma{}_\lambda\, \eta_{\mu\sigma} = \eta_{\nu\lambda}$ in index notation). We obtain
$$ \eta(Lv, Lw) = (Lv)^T \eta\, (Lw) = v^T \left( L^T \eta\, L \right) w = v^T \eta\, w = \eta(v, w) , $$
i.e., $\eta(\cdot, \cdot)$ is left invariant. Conversely, suppose that $\eta(\cdot, \cdot)$ is left invariant by
a transformation $L$, i.e.,
$$ \eta(Lv, Lw) = \eta(v, w) $$
for all $v$, $w$. Then
$$ 0 = \eta(Lv, Lw) - \eta(v, w) = (Lv)^T \eta\, (Lw) - v^T \eta\, w = v^T \left( L^T \eta\, L - \eta \right) w . $$
Since this holds for all $v$, $w$, the term in brackets must vanish, i.e.,
$$ L^T \eta\, L - \eta = 0 . $$
We conclude that $L$ is a Lorentz transformation.
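The invariance (2.40) can be illustrated with concrete numbers. The sketch below is an illustration under stated assumptions: units with c = 1, a boost in $x^1$-direction with v = 0.6, and two arbitrarily chosen four-vectors; the function names `eta` and `boost_x1` are hypothetical.

```python
import math

def eta(v, w):
    """Minkowski pseudo-scalar product (2.39a) of two four-vectors."""
    return -v[0] * w[0] + v[1] * w[1] + v[2] * w[2] + v[3] * w[3]

def boost_x1(vec, v):
    """Boost (2.36) in x^1-direction acting on a four-vector, with c = 1."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    x0, x1, x2, x3 = vec
    return (gamma * (x0 - v * x1), gamma * (x1 - v * x0), x2, x3)

# Two arbitrary four-vectors (illustrative values).
a = (1.0, 2.0, -0.5, 3.0)
b = (0.3, -1.0, 4.0, 0.2)

before = eta(a, b)
after = eta(boost_x1(a, 0.6), boost_x1(b, 0.6))
print(before, after)
```

The components of both vectors change under the boost, but their pseudo-scalar product does not.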


Theorem 2.5 is central to our following considerations. The Minkowski metric
(2.39a), although defined w.r.t. some inertial observer, is the same for every
other inertial observer. This shows that spacetime is endowed with a natural
geometric structure, a pseudo-scalar product: the (coordinate-independent,
frame-independent, observer-independent) Minkowski metric.

The geometric structure of spacetime has been unveiled. In the following sec-
tions, this geometric structure will be the cornerstone for everything.



CHAPTER 3

MINKOWSKI

3.1 Minkowski spacetime

We collect our previous considerations and condense our findings into what is in
fact the actual postulate of special relativity: The model of our spatiotemporal
reality is Minkowski spacetime.

Spacetime is an affine space modeled over a four-dimensional Lorentzian
vector space.

Remark. Recall that (Galilean) space is an affine space modeled over a three-
dimensional Euclidean vector space, see section 1.1. Note the analogy.

In connection with this definition some explanations are in order.

Definition 3.1. A Lorentzian vector space is a four-dimensional vector space
that is endowed with a pseudo-scalar product of signature $(-+++)$. We denote
this pseudo-scalar product by $\eta = \eta(\cdot, \cdot)$ and call it Minkowski metric.

The existence of a pseudo-scalar product $\eta(\cdot, \cdot)$ of signature $(-+++)$ means
that there exist bases of the vector space, $\{e_0, e_1, e_2, e_3\}$, such that
$$ \eta(e_\mu, e_\nu) = \eta_{\mu\nu} = \begin{pmatrix} -1 & & & \\ & 1 & & \\ & & 1 & \\ & & & 1 \end{pmatrix} , \qquad (3.1) $$
i.e., $\eta(e_0, e_0) = -1$, $\eta(e_1, e_1) = 1$, $\eta(e_2, e_2) = 1$, $\eta(e_3, e_3) = 1$, and $\eta(e_\mu, e_\nu) = 0$
for $\mu \neq \nu$. Bases of this type are called pseudo-orthonormal.

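Let $v$ and $w$ be four-vectors, i.e.,
$$ v^\mu = \begin{pmatrix} v^0 \\ \vec{v} \end{pmatrix} = \begin{pmatrix} v^0 \\ v^1 \\ v^2 \\ v^3 \end{pmatrix} , \qquad
w^\mu = \begin{pmatrix} w^0 \\ \vec{w} \end{pmatrix} = \begin{pmatrix} w^0 \\ w^1 \\ w^2 \\ w^3 \end{pmatrix} , \qquad (3.2) $$
w.r.t. some (pseudo-)orthonormal basis $\{e_0, e_1, e_2, e_3\}$. W.r.t. this basis, the
(pseudo-)scalar product of these vectors reads
$$ \eta(v, w) = \eta_{\mu\nu}\, v^\mu w^\nu = v^T \eta\, w , \qquad (3.3) $$
or, alternatively,
$$ \eta(v, w) = v^\mu w_\mu , \qquad (3.4) $$
where $w_\mu = \eta_{\mu\nu}\, w^\nu$, or, explicitly,
$$ \eta(v, w) = -v^0 w^0 + \vec{v} \cdot \vec{w} = -v^0 w^0 + v^1 w^1 + v^2 w^2 + v^3 w^3 . \qquad (3.5) $$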

By definition, Minkowski spacetime is an affine space modeled over a
four-dimensional Lorentzian vector space. (In fact, in special relativity, spacetime
is Minkowski spacetime.) By choosing an origin in Minkowski spacetime, it
becomes a vector space (i.e., it is identified with the Lorentzian vector space
underlying it).

Definition 3.2. A point in Minkowski space is called an event. An event contains
the information of when and where. An element of a Lorentzian vector space
is called a four-vector.

Remark. Frequently, for brevity, Minkowski spacetime is called Minkowski
space.


3.2 The geometry of Minkowski spacetime

Since the pseudo-scalar product $\eta$ is by definition not positive definite,
the square $\eta(u, u) = u_\mu u^\mu$ of a four-vector $u$ can have any sign.
Definition 3.3. A four-vector $u \neq 0$ is called
• spacelike, if $\eta(u, u) > 0$,
• null, if $\eta(u, u) = 0$,
• timelike, if $\eta(u, u) < 0$.

The basis vector $e_0$ satisfies $\eta(e_0, e_0) = \eta_{00} = -1$ and is therefore timelike.
The basis vectors $e_i$ ($i = 1, 2, 3$) satisfy $\eta(e_i, e_i) = \eta_{ii} = 1$ and are therefore
spacelike.
Remark. We will see later that timelike four-vectors are used to describe the
motion of massive particles, i.e., velocities less than the speed of light; null
vectors are used to describe the motion of massless particles (photons). Finally,
spacelike four-vectors are used to describe the motion of superluminal
particles a.k.a. tachyons; since such particles probably do not exist, the main
role of spacelike vectors is to measure lengths.

A note of advice. The reader is strongly recommended to prepare their own lecture
notes and to include the figures/Minkowski diagrams drawn on the blackboard during
the lecture course.

A special case we will consider frequently is two-dimensional Minkowski space.
It is given by suppressing the 2- and the 3-component of vectors; in other
words: it is the subspace spanned by the vectors $\{e_0, e_1\}$. In two-dimensional
Minkowski space, w.r.t. the (pseudo-)orthonormal basis $\{e_0, e_1\}$, we have
$$ \eta(v, w) = -v^0 w^0 + v^1 w^1 = (v^0, v^1) \begin{pmatrix} -1 & \\ & 1 \end{pmatrix} \begin{pmatrix} w^0 \\ w^1 \end{pmatrix} . \qquad (3.6) $$
Consider a vector
$$ u = r \begin{pmatrix} \sin\varphi \\ \cos\varphi \end{pmatrix} $$
in two-dimensional Minkowski space; $r \neq 0$, $\varphi \in (-\pi, \pi]$. Let us compute
$\eta(u, u)$:
$$ \eta(u, u) = r^2 \left( -\sin^2\varphi + \cos^2\varphi \right) = r^2 \cos(2\varphi) . $$

It follows that

• $u$ is spacelike, if $|\varphi| < 45^\circ$ or $|\varphi| > 135^\circ$,
• $u$ is null, if $|\varphi| = 45^\circ$ or $|\varphi| = 135^\circ$,
• $u$ is timelike, if $45^\circ < |\varphi| < 135^\circ$.

The most conspicuous structure is the set of the two null lines, which are the
straight lines consisting of null vectors (at $\varphi = \pm 45^\circ$).

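Now consider two vectors
$$ u = r_u \begin{pmatrix} \sin\varphi \\ \cos\varphi \end{pmatrix} \quad \text{and} \quad v = r_v \begin{pmatrix} \sin\psi \\ \cos\psi \end{pmatrix} $$
in two-dimensional Minkowski space. We ask the question when these two
vectors are orthogonal. Let us therefore compute $\eta(u, v)$:
$$ \eta(u, v) = r_u r_v \left( -\sin\varphi \sin\psi + \cos\varphi \cos\psi \right) = r_u r_v \cos(\varphi + \psi) . $$
It follows that
$$ u \text{ and } v \text{ are orthogonal if } \psi = 90^\circ - \varphi , \qquad (3.7) $$
which is the case if $u$ and $v$ are related by a reflection at the null line with
angle $45^\circ$.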
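The orthogonality condition (3.7) can be tested with sample angles. The following sketch is illustrative; the helper names `eta2` and `polar` and the chosen values (φ = 70°, ψ = 20°) are assumptions. It also confirms that, for these values, one of the two orthogonal vectors is timelike and the other spacelike.

```python
import math

def eta2(u, v):
    """Minkowski product (3.6) in two-dimensional Minkowski space."""
    return -u[0] * v[0] + u[1] * v[1]

def polar(r, phi_deg):
    """Vector r (sin(phi), cos(phi)) with components (u^0, u^1)."""
    phi = math.radians(phi_deg)
    return (r * math.sin(phi), r * math.cos(phi))

u = polar(2.0, 70.0)         # phi = 70 degrees
v = polar(3.0, 90.0 - 70.0)  # psi = 90 degrees - phi, see (3.7)

print(eta2(u, v))   # vanishes up to rounding: u and v are orthogonal
print(eta2(u, u))   # negative: u is timelike
print(eta2(v, v))   # positive: v is spacelike
```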

In two-dimensional Minkowski space we thus arrive at the simple consequence:
when $u$ is timelike, then its orthogonal vector $v$ is spacelike, and conversely,
when $u$ is spacelike, then its orthogonal vector $v$ is timelike. Note that a null
vector $k$ is self-orthogonal, since $\eta(k, k) = 0$.

Finally, let us investigate the set of ‘unit vectors’, i.e., vectors
$$ u = \begin{pmatrix} u^0 \\ u^1 \end{pmatrix} $$
such that $\eta(u, u) = 1$ (for spacelike vectors) or $\eta(u, u) = -1$ (for timelike
vectors); note that null vectors $k$ cannot be normalized since $\eta(k, k) = 0$.

If $u$ is spacelike and normalized, i.e., $\eta(u, u) = 1$, this means that
$$ \eta(u, u) = -(u^0)^2 + (u^1)^2 = 1 . $$
This is the equation of a unit hyperbola (situated in the spacelike part of
Minkowski space), whose asymptotes are the two null lines. Likewise, if $u$
is timelike and normalized, i.e., $\eta(u, u) = -1$, this means that
$$ \eta(u, u) = -(u^0)^2 + (u^1)^2 = -1 , $$
which is again the equation of a unit hyperbola, whose asymptotes are the two
null lines.

Let us now consider four-dimensional Minkowski space. Consider a four-vector
\[
u = \begin{pmatrix} u^0 \\ u^1 \\ u^2 \\ u^3 \end{pmatrix} \tag{3.8}
\]
(w.r.t. some inertial basis $\{e_0, e_1, e_2, e_3\}$).

• Suppose that u is a null vector. Then
\[
\eta(u, u) = -(u^0)^2 + \vec{u}^{\,2} = 0 \quad\Leftrightarrow\quad |u^0| = |\vec{u}| .
\]
Accordingly, the null vectors lie on a cone in Minkowski space, which we
call the light cone. The set $u^0 > 0$ forms the future light cone, the set
$u^0 < 0$ is the past light cone.
• Suppose that u is a timelike vector that is normalized, i.e., η(u, u) = −1.
Then
\[
\eta(u, u) = -(u^0)^2 + \vec{u}^{\,2} = -1 \quad\Leftrightarrow\quad (u^0)^2 = 1 + \vec{u}^{\,2} .
\]
This is a unit hyperboloid in Minkowski space, which consists of two
disconnected parts. The part with $u^0 > 0$ we call the unit mass shell.
(The nomenclature will become clear later.) If u is not normalized, i.e.,
η(u, u) = −const, then $(u^0)^2 = \text{const} + \vec{u}^{\,2}$, which is also a hyperboloid.
• Suppose that u is a spacelike vector that is normalized, i.e., η(u, u) = 1.
Then
\[
\eta(u, u) = -(u^0)^2 + \vec{u}^{\,2} = 1 \quad\Leftrightarrow\quad (u^0)^2 = -1 + \vec{u}^{\,2} ,
\]
which is again a unit hyperboloid in Minkowski space.

In two-dimensional Minkowski space the light cone reduces to the two null lines
with a slope of ±45°, and the hyperboloids reduce to hyperbolas.

We see that the family of timelike vectors falls into two disconnected classes:
future-directed timelike vectors and past-directed timelike vectors. The analog
is true for null vectors.
Definition 3.4. A timelike four-vector u is called future-directed if $u^0 > 0$ and
past-directed if $u^0 < 0$. The analog holds for null vectors.


Concerning orthogonality let us consider the following generalization of spacelike
versus timelike for four-dimensional Minkowski space.
Proposition 3.5. Suppose that u is a timelike four-vector. Then its orthogonal
complement (i.e., all four-vectors orthogonal to u) is spacelike.

Proof. Since η(u, u) < 0, w.l.o.g. we can assume η(u, u) = −1 (otherwise we
perform a rescaling). Set $e_0 = u$ and complete $e_0$ to an orthonormal basis
$\{e_0, e_1, e_2, e_3\}$. W.r.t. this basis we have
\[
\eta(v, v) = -(v^0)^2 + (v^1)^2 + (v^2)^2 + (v^3)^2 , \tag{3.9}
\]
where $v = v^\mu e_\mu$. By construction, the orthogonal complement of $u = e_0$ is the
subspace spanned by $\{e_1, e_2, e_3\}$, i.e., the subspace of all four-vectors v with
$v^0 = 0$. Therefore, if we assume that v is orthogonal to u (where we assume
$v \neq 0$, of course), then equation (3.9) yields
\[
\eta(v, v) = (v^1)^2 + (v^2)^2 + (v^3)^2 > 0 ,
\]
i.e., v is spacelike.

Remark. The orthogonal complement of a null vector k is the tangential
hyperplane of the light cone through k; it contains spacelike vectors and the line
⟨k⟩ of null vectors. The orthogonal complement of a spacelike vector contains
spacelike vectors, null vectors, and timelike vectors.
Definition 3.6. Two events p, q in Minkowski space are
• spacelike separated, if $u = \overrightarrow{pq}$ is spacelike,
• null separated, if $u = \overrightarrow{pq}$ is null,
• timelike separated, if $u = \overrightarrow{pq}$ is timelike.

It is important to emphasize that everything we did in this section is completely
observer-independent (frame-independent, coordinate-independent). The
definition of timelikeness (and the other concepts) is only based on the scalar
product η(·, ·), which exists independently of any choice of observer.

3.3 Inertial observers and the relativity of simultaneity

By an inertial frame of reference (inertial observer) we simply mean a
coordinate system that is adapted to the geometric structure; this involves the
choice of an arbitrary event (point) as the origin o and a pseudo-orthonormal
basis $\{e_0, e_1, e_2, e_3\}$, i.e.,
\[
\eta(e_\mu, e_\nu) = \eta_{\mu\nu} ,
\]
which we can also write as
\[
\eta(e_0, e_0) = -1 , \qquad \eta(e_0, e_i) = 0 , \qquad \eta(e_i, e_j) = \delta_{ij} ,
\]
where i, j = 1, 2, 3. The vector $e_0$ thus lies on the unit mass shell (since it is
timelike, future-oriented, and normalized), while the vectors $e_i$ are normalized
spacelike vectors orthogonal to it.
Remark. Recall that in two-dimensional Minkowski space we consider only $e_0$
and $e_1$, where the two vectors are related by a reflection at the straight line
with slope 45°, see (3.7).

An inertial observer X assigns coordinates to each event (point) in Minkowski
space; these coordinates are denoted by
\[
(x^\mu)_{\mu = 0, \ldots, 3} = (x^0, \vec{x}) = (t, \vec{x}) .
\]
Henceforth we suppress the distinction between $x^0$ and t and use the two
interchangeably. In other words, when we write t, then we refer to time measured
in units such that
\[
c = 1 .
\]
These coordinates rely on the decomposition of four-vectors v w.r.t. the basis
$\{e_0, e_1, e_2, e_3\}$, i.e.,
\[
v = v^\mu e_\mu = v^0 e_0 + v^i e_i .
\]
We write
\[
v = \begin{pmatrix} v^0 \\ \vec{v} \end{pmatrix}
\]
w.r.t. the observer X.

The time-lines of an observer X are the straight lines given by $\vec{x} = \text{const}$. The
special time-line $\vec{x} = 0$ is the time-axis of the observer. Clearly, the time-lines
are associated with the vector $e_0$ of the basis, which is simply because the
direction of the time-lines is given by $e_0$. Since the vector $e_0$ is distinguished
from the other basis vectors by the fact that $\eta(e_0, e_0) = -1$ (instead of +1),
and since the time-lines are constructed from it, the vector $e_0$ is of central
importance.

Frequently, this vector is thus denoted by a separate symbol (which is typically
u, v, or w). It is called the four-velocity of the inertial observer X; it is X’s
‘arrow of time’. When we write

“Consider an inertial observer X with four-velocity u ...” ,

it is understood that u is a four-vector satisfying $u^2 = \eta(u, u) = -1$ (i.e., a
timelike vector on the unit mass shell), and we mean that the observer’s time-lines
are determined by u. Accordingly, the observer is associated with a coordinate
system where $e_0 = u$; this vector is simply supplemented by three orthonormal
basis vectors $\{e_1, e_2, e_3\}$, so that $\{u = e_0, e_1, e_2, e_3\}$ form an orthonormal basis.
Remark. The four-velocity u of an observer determines his coordinate system
up to the freedom of choosing $\{e_1, e_2, e_3\}$ in the orthogonal complement of u.
This freedom corresponds to rotations of $\{e_1, e_2, e_3\}$.
Remark. These considerations are intimately connected with proposition 3.5,
which guarantees that the orthogonal complement of u consists of spacelike
vectors (since u is timelike).

Suppose X is an inertial observer with four-velocity u. It is trivial to remark
that
\[
u = \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix} \quad \text{w.r.t. } \{u = e_0, e_1, e_2, e_3\} , \tag{3.10}
\]
in other words, X’s own four-velocity w.r.t. X’s own coordinate system is the
zeroth unit vector (3.10).

Consider an inertial observer X with four-velocity u, i.e., an inertial frame
$\{u = e_0, e_1, e_2, e_3\}$. The time-lines of the observer are the straight lines of
constant $\vec{x}$, e.g., the time-axis $\vec{x} = 0$ is given as the line $\{t u \mid t \in \mathbb{R}\}$. In
contrast, the planes of simultaneity are the planes of constant time t, i.e.,
t = const.

Let x be an event in Minkowski spacetime; X ascribes to x the coordinates
\[
x = \begin{pmatrix} t \\ \vec{x} \end{pmatrix} .
\]
Multiplication (in the sense of the scalar product) with the four-velocity u of
X yields
\[
\eta(u, x) = -t .
\]


Therefore, the plane of simultaneity t = τ (where τ = const) is given by the
set of all events x such that η(u, x) = −τ, i.e.,
\[
\text{Plane } t = \tau \quad\Leftrightarrow\quad \{ x \mid \eta(u, x) = -\tau \} . \tag{3.11}
\]
This is reminiscent of the standard Hesse normal form¹ for the representation
of a plane in a Euclidean space. In (3.11) the four-vector u is the normal vector
(in the sense of the Minkowski metric, not in the standard Euclidean sense) of
the plane.

The plane t = 0 is $\{ x \mid \eta(u, x) = 0 \}$, which is simply the plane orthogonal to
X’s four-velocity u (and thus the plane spanned by the spatial basis vectors
$\{e_1, e_2, e_3\}$); the planes t = τ are the planes parallel to it.

Corollary 3.7. For an observer X with four-velocity u (where η(u, u) = −1),
the plane of simultaneity t = 0 is given by the set of all events x that are
orthogonal [in the sense of the Minkowski metric η(·, ·)] to u, i.e.,
\[
\eta(u, x) = \eta_{\mu\nu} u^\mu x^\nu = 0 . \tag{3.12}
\]
The planes of simultaneity t = τ (where τ is arbitrary) are the planes parallel
to it, i.e.,
\[
\eta(u, x) = \eta_{\mu\nu} u^\mu x^\nu = -\tau . \tag{3.13}
\]

The corollary provides a coordinate-independent (frame-independent,
observer-independent) way of defining simultaneity. This is because (3.13) is written
only in terms of the Minkowski metric η(·, ·) and therefore does not make use
of coordinates. We will come back to the issue of coordinate-independence in
the next section.
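As a small numerical illustration of (3.13), the following Python sketch (not part of the original notes; the names `eta` and `observer_time` and the sample events are illustrative assumptions) computes the time −η(u, x) that an observer assigns to an event. Two events that share a plane of simultaneity for X lie on different planes for a second observer.

```python
def eta(v, w):
    """Minkowski scalar product, signature (-, +, +, +); index 0 is the time component."""
    return -v[0] * w[0] + sum(a * b for a, b in zip(v[1:], w[1:]))

def observer_time(u, x):
    """Time t = -eta(u, x) assigned to the event x by the observer with
    four-velocity u (x is taken relative to the observer's origin)."""
    return -eta(u, x)

# All components are given w.r.t. the frame of X.
u = (1.0, 0.0, 0.0, 0.0)      # four-velocity of X itself
w = (1.25, 0.75, 0.0, 0.0)    # another unit timelike vector: -1.25**2 + 0.75**2 = -1

x1 = (2.0, 1.0, 0.0, 0.0)
x2 = (2.0, -3.0, 5.0, 0.0)

print(observer_time(u, x1), observer_time(u, x2))  # both events lie on the plane t = 2 of X
print(observer_time(w, x1), observer_time(w, x2))  # the second observer assigns different times
```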

Simultaneity is a relative concept, i.e., observer-dependent. To see this, consider
two arbitrary events p and q that are spacelike separated (i.e., the four-vector
$\overrightarrow{pq}$ is spacelike). There exist observers such that

• the event p comes before the event q;
• the event p and the event q are simultaneous;
• the event p comes after the event q;

in other words: two events that are spacelike separated do not have a unique
chronological order; their chronological order is observer-dependent. In the
following we prove this claim.

¹ A plane (in $\mathbb{R}^3$) is the set of all $\vec{x}$ such that $\vec{n} \cdot \vec{x} = \text{const}$, where $\vec{n}$ is the Euclidean normal
vector of the plane. We could also write $\delta(\vec{n}, \vec{x}) = \text{const}$, where δ(·, ·) is the standard
scalar product.

W.l.o.g. we assume that p coincides with the origin (otherwise we make a
translation). Since q is spacelike separated from p we are able to choose an inertial
observer X whose (timelike) four-velocity u is orthogonal to (the spacelike
vector) q. W.r.t. X we have
\[
p = \begin{pmatrix} 0 \\ \vec{o} \end{pmatrix} , \qquad q = \begin{pmatrix} 0 \\ \vec{q} \end{pmatrix} .
\]
By construction, for the observer X, p and q lie in the same plane of
simultaneity (namely t = 0) and are thus simultaneous; note that η(u, p) = 0 and
η(u, q) = 0.

Let X′ be another observer with four-velocity w, where we assume that w has
the form
\[
w = \begin{pmatrix} w^0 \\ \vec{w} \end{pmatrix}
\]
for the observer X, i.e., w.r.t. $\{u = e_0, e_1, e_2, e_3\}$. The planes of simultaneity
t′ = τ of the observer X′ are given by the planes $\{x \mid \eta(w, x) = -\tau\}$, see (3.13).
Clearly, since η(w, p) = 0, p lies on the plane t′ = 0; in contrast, by computing
$\eta(w, q) = \vec{w} \cdot \vec{q}$ we find that q lies on a plane t′ = τ > 0 when $\vec{w} \cdot \vec{q} < 0$ and on
a plane t′ = τ < 0 when $\vec{w} \cdot \vec{q} > 0$. In other words, for an observer X′ with
four-velocity w and $\vec{w}$ such that $\vec{w} \cdot \vec{q} < 0$, the event q comes at a time t′ = τ
after t′ = 0 (which is the time of p); for an observer X′ with four-velocity w
and $\vec{w}$ such that $\vec{w} \cdot \vec{q} > 0$, the event q comes at a time t′ = τ before t′ = 0. This
proves the claim.
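The three cases of the proof can be reproduced with concrete numbers. The sketch below is an illustration only, not part of the original notes; the function name `time_order` and the sample four-velocities are assumptions chosen so that η(w, w) = −1 holds exactly.

```python
def eta(v, w):
    """Minkowski scalar product, signature (-, +, +, +); index 0 is the time component."""
    return -v[0] * w[0] + sum(a * b for a, b in zip(v[1:], w[1:]))

def time_order(w, p, q):
    """Compare the times t' = -eta(w, .) that the observer with four-velocity w
    assigns to the events p and q."""
    tp, tq = -eta(w, p), -eta(w, q)
    if tp < tq:
        return "p before q"
    if tp > tq:
        return "p after q"
    return "simultaneous"

# Components w.r.t. the frame of X; p is the origin, q is spacelike separated from p.
p = (0.0, 0.0, 0.0, 0.0)
q = (0.0, 1.0, 0.0, 0.0)

u       = (1.0,   0.0,  0.0, 0.0)   # observer X: eta(u, q) = 0
w_minus = (1.25, -0.75, 0.0, 0.0)   # spatial part has negative product with q
w_plus  = (1.25,  0.75, 0.0, 0.0)   # spatial part has positive product with q

for name, w in (("u", u), ("w_minus", w_minus), ("w_plus", w_plus)):
    print(name, time_order(w, p, q))
```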

For a better understanding of these results, Minkowski diagrams are essential.

Finally, consider two arbitrary events p and q that are timelike/null separated
(where the timelike/null four-vector $\overrightarrow{pq}$ is assumed to be future-oriented). For
all observers the event p comes before the event q. The proof is left as an
exercise.

Definition 3.8. The future of an event o is the set of all events that are
timelike/null separated from o by a future-oriented timelike/null vector. The
past of an event o is the set of all events that are timelike/null separated from
o by a past-oriented timelike/null vector. The present of an event o is the set of
all events that are spacelike separated from o.
Remark. In other words, the future light cone and its interior are the future,
the past light cone and its interior are the past, and the exterior of the light
cone is the present of an event o.

3.4 Four-velocities and three-velocities of observers

Consider an inertial observer X with four-velocity u, i.e., X has an inertial
frame $\{u = e_0, e_1, e_2, e_3\}$. Let’s denote by X′ a second inertial observer, whose
four-velocity is given by w and whose frame is $\{w = e'_0, e'_1, e'_2, e'_3\}$.

Consider the observer X′, whose four-velocity is w. The time-lines of X′ are
the lines of constant values of $\vec{x}\,'$; in particular, the time-axis is $\vec{x}\,' = 0$, or
$\langle w \rangle = \{t' w \mid t' \in \mathbb{R}\}$. Evidently, X′ says that his four-velocity w is
\[
\begin{pmatrix} 1 \\ \vec{o} \end{pmatrix} .
\]
The observer X does not agree. W.r.t. the basis used by X, the four-vector w
is
\[
\begin{pmatrix} w^0 \\ \vec{w} \end{pmatrix} ,
\]
where $-(w^0)^2 + \vec{w}^{\,2} = -1$ since w is a unit vector. Note that w is always the
same vector, but once as seen by X′ (i.e., decomposed w.r.t. the frame of X′),
once as seen by X (i.e., decomposed w.r.t. the basis of X).

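Accordingly, the time-axis of X′ looks like
\[
\langle w \rangle
= \left\{ \lambda \begin{pmatrix} w^0 \\ \vec{w} \end{pmatrix} \,\middle|\, \lambda \in \mathbb{R} \right\}
= \left\{ \lambda w^0 \begin{pmatrix} 1 \\ \vec{w}/w^0 \end{pmatrix} \,\middle|\, \lambda \in \mathbb{R} \right\}
= \left\{ \tilde{\lambda} \begin{pmatrix} 1 \\ \vec{w}/w^0 \end{pmatrix} \,\middle|\, \tilde{\lambda} \in \mathbb{R} \right\} .
\]
Hence, $t = \tilde{\lambda}$ and $\vec{x} = \tilde{\lambda}\, \vec{w}/w^0$, which implies
\[
\vec{x} = \frac{\vec{w}}{w^0}\, t .
\]
In other words, for the inertial observer X, the observer X′ is in uniform motion
with velocity
\[
\vec{v} = \frac{\vec{w}}{w^0} = \frac{\vec{w}}{\sqrt{1 + \vec{w}^{\,2}}} .
\]
Accordingly,
\[
\vec{w} = \frac{\vec{v}}{\sqrt{1 - \vec{v}^{\,2}}} = \gamma \vec{v}
\]
and $w^0 = \gamma$. This implies that, while w is the zeroth unit vector as seen by X′,
it looks like
\[
w = \begin{pmatrix} w^0 \\ \vec{w} \end{pmatrix} = \gamma \begin{pmatrix} 1 \\ \vec{v} \end{pmatrix} \quad \text{w.r.t. } X , \tag{3.14}
\]
i.e., w.r.t. the basis $\{e_0, e_1, e_2, e_3\}$ used by X.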

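Let us summarize. For the four-velocity w of the inertial observer X′ we find
\[
w = \begin{pmatrix} w^{0\prime} \\ \vec{w}\,' \end{pmatrix} = \begin{pmatrix} 1 \\ \vec{o} \end{pmatrix} \quad \text{w.r.t. } X' = \{e'_0, e'_1, e'_2, e'_3\} , \tag{3.15a}
\]
\[
w = \begin{pmatrix} w^0 \\ \vec{w} \end{pmatrix} = \gamma \begin{pmatrix} 1 \\ \vec{v} \end{pmatrix} \quad \text{w.r.t. } X = \{e_0, e_1, e_2, e_3\} , \tag{3.15b}
\]
where $\vec{v}$ is the relative velocity of the observer X′ as seen by X.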
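The passage between the three-velocity and the four-velocity components in (3.15b) is mechanical and can be sketched in a few lines of Python. This is an illustration under the conventions of the text (c = 1, signature −+++), not part of the original notes; the function names are hypothetical.

```python
import math

def eta(v, w):
    """Minkowski scalar product, signature (-, +, +, +); index 0 is the time component."""
    return -v[0] * w[0] + sum(a * b for a, b in zip(v[1:], w[1:]))

def four_velocity(v):
    """Components gamma * (1, v) w.r.t. X of the four-velocity of an observer
    moving with three-velocity v relative to X (|v| < 1, units with c = 1)."""
    v2 = sum(c * c for c in v)
    if v2 >= 1.0:
        raise ValueError("need |v| < 1")
    gamma = 1.0 / math.sqrt(1.0 - v2)
    return (gamma,) + tuple(gamma * c for c in v)

def three_velocity(w):
    """Recover v = w_vec / w^0 from the components of a four-velocity."""
    return tuple(c / w[0] for c in w[1:])

w = four_velocity((0.6, 0.0, 0.0))
print(w)                  # approximately (1.25, 0.75, 0.0, 0.0)
print(eta(w, w))          # approximately -1
print(three_velocity(w))  # approximately (0.6, 0.0, 0.0)
```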

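Obviously, the scenario (3.15) can also be described by interchanging the roles
of X and X′:
\[
u = \begin{pmatrix} u^0 \\ \vec{u} \end{pmatrix} = \begin{pmatrix} 1 \\ \vec{o} \end{pmatrix} \quad \text{w.r.t. } X = \{e_0, e_1, e_2, e_3\} , \tag{3.16a}
\]
\[
u = \begin{pmatrix} u^{0\prime} \\ \vec{u}\,' \end{pmatrix} = \gamma \begin{pmatrix} 1 \\ -\vec{v} \end{pmatrix} \quad \text{w.r.t. } X' = \{e'_0, e'_1, e'_2, e'_3\} . \tag{3.16b}
\]
When X′ moves with $\vec{v}$ w.r.t. X, then X′ sees X move away with $(-\vec{v})$.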

When we form the scalar product η(u, w) from (3.15) we obtain
\[
\eta(u, w) = -\gamma . \tag{3.17}
\]
Since the expression η(u, w) is coordinate-independent (frame-independent,
observer-independent), the result is the same w.r.t. all coordinate systems. (In
particular, one can easily check that the result is the same when computed
w.r.t. $\{e_0, e_1, e_2, e_3\}$ and $\{e'_0, e'_1, e'_2, e'_3\}$.) Forming the product η(u, w) thus
leads to a coordinate-independent way of defining the absolute value of the
relative velocity $\vec{v}$ between two observers.


Corollary 3.9. Consider two observers X and X′ represented by the
four-velocities u and w, respectively. Then $|\vec{v}|$ is given by
\[
\gamma = \frac{1}{\sqrt{1 - |\vec{v}|^2}} = -\eta(u, w) . \tag{3.18}
\]
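Corollary 3.9 can be used even when neither four-velocity is the zeroth unit vector, since η(u, w) does not depend on the frame in which it is evaluated. The following Python sketch is an illustration only (hypothetical function names, not from the original notes): both observers move relative to a third frame, and the relative speed is read off from γ = −η(u, w).

```python
import math

def eta(v, w):
    """Minkowski scalar product, signature (-, +, +, +); index 0 is the time component."""
    return -v[0] * w[0] + sum(a * b for a, b in zip(v[1:], w[1:]))

def four_velocity(v):
    """Components gamma * (1, v) for three-velocity v with |v| < 1 (units with c = 1)."""
    gamma = 1.0 / math.sqrt(1.0 - sum(c * c for c in v))
    return (gamma,) + tuple(gamma * c for c in v)

def relative_speed(u, w):
    """|v| obtained from gamma = -eta(u, w), cf. (3.18)."""
    gamma = -eta(u, w)
    # max(...) guards against tiny negative values caused by rounding when u = w.
    return math.sqrt(max(0.0, 1.0 - 1.0 / gamma**2))

# Two observers moving in opposite directions w.r.t. some third inertial frame.
u = four_velocity((0.5, 0.0, 0.0))
w = four_velocity((-0.5, 0.0, 0.0))
print(relative_speed(u, w))   # approximately 0.8, since -eta(u, w) = 5/3
```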


CHAPTER 4

PARTICLES

4.1 World lines

A point particle is represented in Minkowski space by a curve
\[
\mathbb{R} \ni \lambda \mapsto x(\lambda) ,
\]
where the parametrization is completely irrelevant. It is the image of the curve
(i.e., the so-called ‘geometric curve’) that encodes all the information on the
whereabouts and ‘whenabouts’ of the particle.¹
Definition 4.1. The curve in Minkowski space representing a point particle is
called the world line of the particle.
Remark. An inertial observer X is characterized by a family of world lines,
namely by the straight time-lines $\vec{x} = \text{const}$.

Consider the world line of a particle, i.e., the curve
\[
\mathbb{R} \ni \lambda \mapsto x(\lambda) . \tag{4.1}
\]
We can form the (field of) tangent vectors along the world line; the tangent
vector w(λ) = dx(λ)/dλ is a four-vector,
\[
\mathbb{R} \ni \lambda \mapsto w(\lambda) = \frac{d}{d\lambda} x(\lambda) . \tag{4.2}
\]

¹ See, however, the treatment and interpretation of null lines in section 4.6.


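W.r.t. some inertial coordinate system X we have
\[
x(\lambda) = \begin{pmatrix} t(\lambda) \\ \vec{x}(\lambda) \end{pmatrix} , \qquad
w(\lambda) = \frac{d}{d\lambda} x(\lambda) = \frac{d}{d\lambda} \begin{pmatrix} t(\lambda) \\ \vec{x}(\lambda) \end{pmatrix} = \begin{pmatrix} w^0(\lambda) \\ \vec{w}(\lambda) \end{pmatrix} .
\]
The standard three-velocity $\vec{v}$ of the particle, as seen by X, is given by
\[
\vec{v}(\lambda) = \frac{\vec{w}(\lambda)}{w^0(\lambda)} .
\]
To see this we note that
\[
\vec{v} = \frac{d\vec{x}}{dt} = \frac{d\vec{x}}{d\lambda} \frac{d\lambda}{dt} = \frac{d\vec{x}(\lambda)}{d\lambda} \left( \frac{dt(\lambda)}{d\lambda} \right)^{-1} = \frac{\vec{w}}{w^0} .
\]
Therefore, we can write
\[
x(\lambda) = \begin{pmatrix} t(\lambda) \\ \vec{x}(\lambda) \end{pmatrix} , \qquad
w(\lambda) = \frac{d}{d\lambda} x(\lambda) = \frac{d}{d\lambda} \begin{pmatrix} t(\lambda) \\ \vec{x}(\lambda) \end{pmatrix} = w^0(\lambda) \begin{pmatrix} 1 \\ \vec{v}(\lambda) \end{pmatrix} . \tag{4.3}
\]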

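Definition 4.2. A curve $\lambda \mapsto x(\lambda)$ is called
• spacelike, if $w(\lambda) = \frac{d}{d\lambda} x(\lambda)$ is spacelike for all λ,
• null, if $w(\lambda) = \frac{d}{d\lambda} x(\lambda)$ is null for all λ,
• timelike, if $w(\lambda) = \frac{d}{d\lambda} x(\lambda)$ is timelike for all λ.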

Timelike world lines describe the motion of particles at velocities less than the
speed of light, null world lines describe the motion of particles at the speed of
light. This is a simple consequence of (4.3): If the velocity is smaller than the
speed of light, i.e., $|\vec{v}| < 1$, then the vector $(1, \vec{v})^T$ is timelike (and, incidentally,
future-oriented), because $-1 + \vec{v}^{\,2} < 0$; hence w is timelike. If the velocity
equals the speed of light, then $(1, \vec{v})^T$ is null, because $-1 + \vec{v}^{\,2} = 0$; hence the
tangent vector w is a null vector. Finally, if the velocity is superluminal, then
$(1, \vec{v})^T$ and thus w is spacelike.
\[
|\vec{v}| < 1 \quad\Leftrightarrow\quad w = w^0 \begin{pmatrix} 1 \\ \vec{v} \end{pmatrix} \text{ is timelike} , \tag{4.4a}
\]
\[
|\vec{v}| = 1 \quad\Leftrightarrow\quad w = w^0 \begin{pmatrix} 1 \\ \vec{v} \end{pmatrix} \text{ is null} , \tag{4.4b}
\]
\[
|\vec{v}| > 1 \quad\Leftrightarrow\quad w = w^0 \begin{pmatrix} 1 \\ \vec{v} \end{pmatrix} \text{ is spacelike} . \tag{4.4c}
\]

It is important to stress that this classification is observer-independent (which is
simply because the r.h. side does not make use of coordinates). While different
observers might not agree on the issue of how fast (in m/s) a particle actually
is (for one observer, the particle is at rest, for the other it might move fast),
they always agree on the issue of whether the particle moves slower than the
speed of light, at the speed of light, or faster.

A simple parametrization of the world line (4.3) is to use the coordinate time
t of some observer X as the parameter λ. Then (4.3) becomes
\[
x(t) = \begin{pmatrix} t \\ \vec{x}(t) \end{pmatrix} , \qquad
w(t) = \frac{d}{dt} x(t) = \frac{d}{dt} \begin{pmatrix} t \\ \vec{x}(t) \end{pmatrix} = \begin{pmatrix} 1 \\ \vec{v}(t) \end{pmatrix} .
\]
In general, reparametrizations of a curve change the length of the tangent
vectors. Obviously, the coordinate time reparametrization corresponds to setting
$w^0 = 1$ in (4.3).

4.2 Tachyons

Tachyons are supposed to be particles that are represented by spacelike curves
and are thus associated with superluminal velocities. Probably, tachyons do
not exist, since they would cause violation of causality. This will be shown in
the following.

Again it is recommended to use Minkowski diagrams to illustrate the text.

Assume that an observer X sends a tachyon from event p to event q. By
definition, tachyons move along spacelike curves, i.e., in the ‘present’, which is
the exterior of the light cone of p. In brief: p and q are spacelike separated.
This has some counterintuitive consequences: For X, p comes before q (p is
‘emission’, q is ‘reception’ of the tachyon). However, there exist observers for
whom p and q are simultaneous. There even exist observers for whom p comes
after q. For such observers, the tachyon is seen to come out of the receiver at
q, move in the direction of the emitter, and vanish in the emitter at p. This is
weird.

Violation of causality becomes inevitable when we consider a scenario involving
two observers X and X′ who are both able to send tachyons and make the
following agreement: X sends a tachyon to X′ and X′ sends a tachyon back
immediately upon reception. For simplicity, we assume that the tachyons move
extremely fast, namely with infinite velocity. (Note, however, that the same
argument still holds if we assume the velocity to be only slightly larger than
the speed of light.) The observer X has installed a tachyon emitter/receiver
at $\vec{x} = 0$, so that the tachyon emitter’s world line is the time-axis of X. X
sends a tachyon to X′ at t = 0, i.e., from the event $p = (0, \vec{o})$, along the plane
of simultaneity t = 0 to the event $q = (0, \vec{q})$. X′ receives the tachyon at q
and sends it back immediately; it reaches the emitter/receiver that X uses at
the event $p' = (t, \vec{o})$. Since X′ is an observer who is completely equivalent
to X, X′ is able to return tachyons at the same speed, i.e., again along a
plane of simultaneity. However, for X′, whose four-velocity is u′, the planes
of simultaneity are not t = const, but t′ = τ′ = const, or, in
coordinate-independent notation, the planes of all events x such that η(u′, x) = −τ′,
see (3.13). Let the four-velocity u′ of X′ be given by
\[
u' = \gamma \begin{pmatrix} 1 \\ \vec{v} \end{pmatrix} ,
\]
where $\vec{v}$ is the velocity of X′ as seen by X, see (3.15). The event q lies on the
plane of simultaneity t′ = τ′ with
\[
\eta(u', q) = \eta\!\left( \gamma \begin{pmatrix} 1 \\ \vec{v} \end{pmatrix} , \begin{pmatrix} 0 \\ \vec{q} \end{pmatrix} \right) = \gamma\, \vec{v} \cdot \vec{q} = -\tau' .
\]
To compute the time t when X receives the tachyon that comes back we
compute the intersection of that plane with the world line $(t, \vec{o})$ of the
emitter/receiver X uses:
\[
\eta(u', p') = \underbrace{\eta\!\left( \gamma \begin{pmatrix} 1 \\ \vec{v} \end{pmatrix} , \begin{pmatrix} t \\ \vec{o} \end{pmatrix} \right)}_{-\gamma t} \overset{!}{=} \gamma\, \vec{v} \cdot \vec{q} .
\]


It follows that the time of reception (corresponding to event p′) is given by
\[
t = -\vec{v} \cdot \vec{q} , \tag{4.5}
\]
which is negative (when we have made the right choices, i.e., $\vec{v} \cdot \vec{q} > 0$). But this
means that X receives the tachyon that X′ has sent back before the original
tachyon has been emitted (at t = 0). Then what is supposed to happen if X
decides, during the time-interval $(-\vec{v} \cdot \vec{q}, 0)$, “I don’t feel like sending a tachyon any
longer”? Then X′ couldn’t have received anything, and consequently, wouldn’t
have sent back anything. But then where did the tachyon that X received at
$t = -\vec{v} \cdot \vec{q}$ come from?
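To make the paradox concrete, the two steps of the computation (find the plane t′ = τ′ through q, then intersect it with X’s time-axis) can be sketched in Python. This is an illustration under the idealization of infinitely fast tachyons used in the text; the function name `return_time` and the sample numbers are assumptions, not part of the original notes.

```python
import math

def eta(v, w):
    """Minkowski scalar product, signature (-, +, +, +); index 0 is the time component."""
    return -v[0] * w[0] + sum(a * b for a, b in zip(v[1:], w[1:]))

def return_time(v, q_vec):
    """Time t (in X's coordinates) at which the returned tachyon arrives at x = 0.

    X emits at t = 0 towards the position q_vec; X' moves with three-velocity v
    relative to X and returns the tachyon along its own plane of simultaneity."""
    gamma = 1.0 / math.sqrt(1.0 - sum(c * c for c in v))
    u_prime = (gamma,) + tuple(gamma * c for c in v)
    q = (0.0,) + tuple(q_vec)
    tau_prime = -eta(u_prime, q)   # q lies on the plane t' = tau'
    # On X's time-axis, eta(u', (t, 0)) = -gamma * t must equal -tau'.
    return tau_prime / gamma

t = return_time((0.6, 0.0, 0.0), (5.0, 0.0, 0.0))
print(t)   # approximately -3, i.e. before the emission at t = 0
```

The result agrees with (4.5), t = −v·q = −0.6 · 5.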

Evidently, the existence of tachyons leads to a violation of causality.
Henceforth, we exclude the possibility that such particles could exist and postulate:

Particles move along world lines that are timelike or null.

4.3 Proper time

Consider a particle described by the world line
\[
\mathbb{R} \ni \lambda \mapsto x(\lambda) , \tag{4.6a}
\]
and let
\[
w(\lambda) = \frac{d}{d\lambda} x(\lambda) \tag{4.6b}
\]
be the tangent vector, which we suppose to be timelike for all λ. We ask
the question of how much time passes for the particle between two events
$x_1 = x(\lambda_1)$ and $x_2 = x(\lambda_2)$.

Consider the particle at a fixed value of λ, i.e., at a fixed event x(λ). To measure
how much time passes in the infinitesimal interval [λ, λ + dλ], the particle needs
an observer. But not any observer:

Time is measured in an inertial coordinate system in which the particle is
(momentarily) at rest, the momentary rest frame.

If the particle is not in uniform motion, there does not exist one global rest
frame. However, there exists a momentary rest frame, i.e., an inertial observer
for whom the particle is at rest at the instant of time λ.


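This inertial frame is associated with the observer whose four-velocity u is
parallel to the tangent vector w at λ, i.e., u is given by
\[
u = \frac{w(\lambda)}{\sqrt{-\eta\big(w(\lambda), w(\lambda)\big)}} . \tag{4.7}
\]
Note that $u^2 = \eta(u, u) = -1$ as it is required for an observer’s four-velocity.

W.r.t. the rest frame $\{u = e_0, e_1, e_2, e_3\}$ the world line’s tangent vector at λ,
i.e., the tangent at the event x(λ), reads
\[
w(\lambda) = \begin{pmatrix} w^0(\lambda) \\ \vec{w}(\lambda) \end{pmatrix} = \begin{pmatrix} \sqrt{-\eta\big(w(\lambda), w(\lambda)\big)} \\ \vec{o} \end{pmatrix} \quad \text{w.r.t. } \{u = e_0, e_1, e_2, e_3\} .
\]
In the infinitesimal interval [λ, λ + dλ] the world line connects the event x(λ)
with the event
\[
x(\lambda + d\lambda) = x(\lambda) + w(\lambda)\, d\lambda = x(\lambda) + \begin{pmatrix} \sqrt{-\eta\big(w(\lambda), w(\lambda)\big)} \\ \vec{o} \end{pmatrix} d\lambda .
\]
Accordingly,
\[
x(\lambda + d\lambda) - x(\lambda) = \begin{pmatrix} t(\lambda + d\lambda) \\ \vec{x}(\lambda + d\lambda) \end{pmatrix} - \begin{pmatrix} t(\lambda) \\ \vec{x}(\lambda) \end{pmatrix} = \begin{pmatrix} \sqrt{-\eta\big(w(\lambda), w(\lambda)\big)}\, d\lambda \\ \vec{o} \end{pmatrix} .
\]
The time that has elapsed for the particle between the events x(λ) and x(λ + dλ)
we call ds. It coincides with the element dt = t(λ + dλ) − t(λ) in the coordinates
of the momentary rest frame and can thus be read off straightforwardly as the
0th component of the four-vector x(λ + dλ) − x(λ). Therefore,
\[
ds = \sqrt{-\eta\big(w(\lambda), w(\lambda)\big)}\, d\lambda . \tag{4.8}
\]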

Remark. Since the particle is not in uniform motion in general, the momentary
rest frame changes along the world line of the particle. Therefore, to
avoid ambiguities in the notation we do not denote the time that elapses along
a particle’s world line by t (which is the coordinate time of some particular
momentary rest frame), but by s. This is the proper time of the particle.

Definition 4.3. The proper time (denoted by s) along a world line of a particle
is the flow of time as measured in the momentary rest frames of the particle.


In order to obtain the proper time s it merely remains to integrate (4.8) along
the particle’s world line. The proper time s along a world line is given by
\[
s = \int \sqrt{-\eta\big(w(\lambda), w(\lambda)\big)}\, d\lambda . \tag{4.9a}
\]
Accordingly, the proper time ∆s that elapses between two events $x_1 = x(\lambda_1)$
and $x_2 = x(\lambda_2)$ is
\[
\Delta s = s|_{x_2} - s|_{x_1} = \int_{\lambda_1}^{\lambda_2} \sqrt{-\eta\big(w(\lambda), w(\lambda)\big)}\, d\lambda . \tag{4.9b}
\]

Remark. It is not difficult to show that these formulas are in fact independent
of the chosen parametrization of the world line: Let κ denote the parameter
of an alternative parametrization, i.e., λ = λ(κ) (with dλ/dκ > 0); then the tangent vector w.r.t.
the reparametrized curve is
\[
\tilde{w}(\kappa) = \frac{d}{d\kappa} x\big(\lambda(\kappa)\big) = \left( \frac{d}{d\lambda} x(\lambda) \right) \frac{d\lambda}{d\kappa} = w(\lambda)\, \frac{d\lambda}{d\kappa} ,
\]
where λ = λ(κ). Therefore,
\[
\sqrt{-\eta\big(\tilde{w}(\kappa), \tilde{w}(\kappa)\big)}\, d\kappa = \sqrt{-\eta\big(w(\lambda), w(\lambda)\big)}\, \frac{d\lambda}{d\kappa}\, d\kappa = \sqrt{-\eta\big(w(\lambda), w(\lambda)\big)}\, d\lambda ,
\]
i.e., as expected, the concept of proper time depends only on the geometric
curve and not on the actual parametrization of the world line.
Remark. The definition of proper time is coordinate-independent
(frame-independent), since only the Minkowski metric η(·, ·) appears in (4.9).
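The parametrization independence of (4.9) can also be checked numerically. The sketch below is an illustration only, not part of the original notes: the sample world line, the reparametrization λ = κ³ + κ, the step sizes, and the function names are all assumptions. It approximates the tangent by central differences and the integral by the midpoint rule, so the two results agree only up to discretization error.

```python
import math

def eta(v, w):
    """Minkowski scalar product, signature (-, +, ..., +); index 0 is the time component."""
    return -v[0] * w[0] + sum(a * b for a, b in zip(v[1:], w[1:]))

def proper_time(curve, a, b, n=20000, eps=1e-6):
    """Midpoint-rule approximation of the integral of sqrt(-eta(w, w)) over [a, b],
    where w is the tangent of `curve`, estimated by central differences."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        lam = a + (i + 0.5) * h
        hi, lo = curve(lam + eps), curve(lam - eps)
        w = tuple((p - m) / (2 * eps) for p, m in zip(hi, lo))
        total += math.sqrt(-eta(w, w)) * h
    return total

def x_of_lambda(lam):
    """Sample timelike world line in 1+1 dimensions: speed 0.5 |cos(lam)| < 1."""
    return (lam, 0.5 * math.sin(lam))

def x_of_kappa(kappa):
    """The same geometric curve with lambda = kappa**3 + kappa (monotonically increasing)."""
    return x_of_lambda(kappa**3 + kappa)

s_lambda = proper_time(x_of_lambda, 0.0, 2.0)  # lambda runs from 0 to 2
s_kappa = proper_time(x_of_kappa, 0.0, 1.0)    # kappa from 0 to 1 covers the same events
print(s_lambda, s_kappa)                       # the two values agree up to numerical error
```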

Let us consider a (fixed) inertial observer X, whose coordinates are $(t, \vec{x})$.
(This coordinate system is completely arbitrary; in particular, it need not be
the momentary rest frame of the particle under consideration at any time.)
W.r.t. this observer’s coordinate time t we parametrize the world line of the
particle. The world line then reads
\[
\mathbb{R} \ni t \mapsto x(t) = \begin{pmatrix} t \\ \vec{x}(t) \end{pmatrix} , \tag{4.10}
\]
and the tangent vector is
\[
\frac{d}{dt} x(t) = \begin{pmatrix} 1 \\ d\vec{x}(t)/dt \end{pmatrix} = \begin{pmatrix} 1 \\ \vec{v}(t) \end{pmatrix} ,
\]


where $\vec{v}$ is the three-velocity of the particle at time t as seen by the observer
X. Inserting this into (4.8) and (4.9) and replacing λ by our current choice of
parameter t, we obtain
\[
ds = \sqrt{1 - \vec{v}^{\,2}(t)}\, dt , \tag{4.11a}
\]
\[
s = \int \sqrt{1 - \vec{v}^{\,2}(t)}\, dt . \tag{4.11b}
\]
The proper time that elapses for the particle during an interval $[t_i, t_f]$ of
coordinate time t, i.e., between the events $x_i = x(t_i)$ and $x_f = x(t_f)$, is thus
\[
\Delta s = s|_{x_f} - s|_{x_i} = \int_{t_i}^{t_f} \sqrt{1 - \vec{v}^{\,2}(t)}\, dt . \tag{4.11c}
\]
The factor $\sqrt{1 - \vec{v}^{\,2}}$ is the time dilation factor. During the time dt, which passes
for the observer X, the particle’s proper time only increases by ds < dt (whenever $\vec{v} \neq 0$). This
gives rise to the so-called twin paradox.
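For a given speed profile the integral (4.11c) is easily evaluated numerically. The following Python sketch is an illustration, not part of the original notes; the function name `proper_time`, the midpoint rule and the sample speed profiles are assumptions.

```python
import math

def proper_time(speed, t_i, t_f, n=10000):
    """Midpoint-rule approximation of the integral of sqrt(1 - v(t)^2) dt, cf. (4.11c).

    `speed` maps the coordinate time t of X to |v(t)| < 1 (units with c = 1)."""
    h = (t_f - t_i) / n
    return h * sum(math.sqrt(1.0 - speed(t_i + (k + 0.5) * h) ** 2) for k in range(n))

# Constant speed 0.6: the time dilation factor is 0.8 throughout.
print(proper_time(lambda t: 0.6, 0.0, 10.0))                     # approximately 8

# A varying speed profile (arbitrary example); the result is below the coordinate time 10.
print(proper_time(lambda t: 0.9 * abs(math.sin(t)), 0.0, 10.0))
```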

4.4 The twin paradox

We consider two particles (let’s call them twins) that are represented by the
two (timelike) world lines
\[
\mathbb{R} \ni t \mapsto x_A(t) = \begin{pmatrix} t \\ \vec{x}_A(t) \end{pmatrix}
\quad\text{and}\quad
\mathbb{R} \ni t \mapsto x_B(t) = \begin{pmatrix} t \\ \vec{x}_B(t) \end{pmatrix} .
\]
Let us assume that twin A “stays at home”, i.e., twin A is always at rest w.r.t.
an inertial coordinate system. W.l.o.g. the spatial coordinates of twin A are
zero, i.e.,
\[
x_A(t) = \begin{pmatrix} t \\ \vec{o} \end{pmatrix} .
\]
Twin B is supposed to lead a more adventurous life than her sibling. At time
$t = t_i = 0$, twin B leaves A for a life “somewhere in Minkowski space”. Only at
time $t = t_f$, twin B returns and keeps company with A again. Accordingly,
\[
x_B(t) = \begin{pmatrix} t \\ \vec{x}_B(t) \end{pmatrix} , \quad \text{where } \vec{x}_B(0) = \vec{x}_B(t_f) = 0 ,
\]
and
\[
\frac{d}{dt} x_B(t) = \begin{pmatrix} 1 \\ \vec{v}_B(t) \end{pmatrix} .
\]


For twin A, his sister has been away for a time $t_f$. In other words, the proper
time $\Delta s_A$ for A between departure and arrival of B is $\Delta s_A = t_f$. This is obvious
since A is always at rest w.r.t. the inertial coordinate system. Alternatively,
one can compute $\Delta s_A$ explicitly by inserting $\vec{v} = \vec{v}_A = \vec{o}$ into (4.11).

The proper time $\Delta s_B$ that passes for twin B is different. We obtain
\[
\Delta s_B = \int_0^{t_f} \sqrt{1 - \vec{v}_B^{\,2}(t)}\, dt
\]
from (4.11). By assumption, $\vec{v}_B(t) \neq 0$ at least for some t; therefore the square
root is less than one at least for some t. Consequently,
\[
\Delta s_B = \int_0^{t_f} \underbrace{\sqrt{1 - \vec{v}_B^{\,2}(t)}}_{<1}\, dt < \int_0^{t_f} dt = t_f = \Delta s_A
\]
and we find that
\[
\Delta s_B < \Delta s_A . \tag{4.12}
\]
We conclude that, during her voyage, B has aged at a slower rate than A, so
that, upon her return, twin B is younger than twin A.
Remark. Note that we have not made any assumptions about the specifics of
B’s journey. The fact that $\Delta s_B < \Delta s_A$ holds independently of these details.
Remark. A common point of confusion is the following. Doesn’t relativity mean
that I could also take twin B’s point of view? Viewed by B, it is A who moves
away and comes back. Then $\Delta s_B > \Delta s_A$, no? Indeed: no. The difference is
clear: A is at rest, all the time, w.r.t. an inertial coordinate system; B isn’t (at
best, B is described by a sequence of momentary rest frames). A can take the
point of view of an inertial observer, while B can’t, because accelerations are
involved in the world line of B.
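A standard special case makes (4.12) tangible: B travels out and back at constant speed, with the turnaround idealized as instantaneous. The numbers and the function name in this Python sketch are illustrative assumptions, not part of the original notes; on each leg of constant speed the integral (4.11c) reduces to a product.

```python
import math

def elapsed_proper_time(legs):
    """Sum of dt * sqrt(1 - v^2) over legs of constant velocity, cf. (4.11c).

    Each leg is a pair (coordinate duration dt, one-dimensional velocity v with |v| < 1)."""
    return sum(dt * math.sqrt(1.0 - v * v) for dt, v in legs)

t_f = 10.0
twin_a = [(t_f, 0.0)]                        # at rest throughout
twin_b = [(t_f / 2, 0.8), (t_f / 2, -0.8)]   # out and back at speed 0.8

print(elapsed_proper_time(twin_a))   # 10.0
print(elapsed_proper_time(twin_b))   # approximately 6: B is younger upon return
```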

4.5 Four-velocities

Consider again a particle represented by a timelike world line $\mathbb{R} \ni \lambda \mapsto x(\lambda)$.
Let s denote the proper time along this world line. It suggests itself to use
proper time s instead of the (arbitrary) parameter λ to parametrize the world
line, i.e.,
\[
\mathbb{R} \ni s \mapsto x(s) . \tag{4.13}
\]


The tangent vector of this curve in this parametrization we call u; it is
\[
u(s) = \frac{d}{ds} x(s) , \tag{4.14}
\]
and it has an important property:

Proposition 4.4. The tangent vector u = u(s) of a world line that is
parametrized w.r.t. proper time s is normalized, i.e., η(u, u) = −1.

Proof. To prove the proposition we make a straightforward computation. Since
\[
u = \frac{dx}{ds} = \frac{dx}{d\lambda} \frac{d\lambda}{ds} = \frac{dx}{d\lambda} \left( \frac{ds}{d\lambda} \right)^{-1} = \frac{w}{\sqrt{-\eta(w, w)}} ,
\]
we find
\[
\eta(u, u) = \frac{1}{-\eta(w, w)}\, \eta(w, w) = -1 ,
\]
as claimed.

Remark. The proposition is a direct consequence of our definition of proper
time. Proper time is the coordinate time associated with the momentary rest
frame; the four-velocity of the momentary rest frame and the tangent
vector (4.14) must thus coincide; $u = (1, 0, 0, 0)^T$ in the coordinates of the
momentary rest frame (at some instant of time). Accordingly, η(u, u) = −1.

Definition 4.5. The tangent vector u = u(s) of a world line representing a
particle (where the world line is parametrized w.r.t. proper time) is called the
particle’s four-velocity; η(u, u) = −1.

Remark. The concepts of a particle’s four-velocity and the four-velocity of an
inertial observer are intimately related, the main difference being that the
four-velocity of a particle is defined along the particle’s world line, while the
four-velocity of an inertial observer can be thought of as a (constant) vector field
on the entire Minkowski space.

Consider a (fixed) inertial observer X, whose coordinates are $(t, \vec{x})$. As seen
previously, w.r.t. this observer’s coordinate time t, the world line of the particle
reads
\[
\mathbb{R} \ni t \mapsto x(t) = \begin{pmatrix} t \\ \vec{x}(t) \end{pmatrix} , \tag{4.15a}
\]
and the tangent vector is
\[
\frac{d}{dt} x(t) = \begin{pmatrix} 1 \\ d\vec{x}(t)/dt \end{pmatrix} = \begin{pmatrix} 1 \\ \vec{v}(t) \end{pmatrix} ,
\]
where $\vec{v} = \vec{v}(t)$ is the three-velocity of the particle at time t as seen by the
observer X. The four-velocity u associated with the particle corresponds to
the normalized tangent vector; since the squared norm of the tangent vector is
$-1 + \vec{v}^{\,2} = -\gamma^{-2}$, we again obtain
\[
u(t) = \gamma \begin{pmatrix} 1 \\ \vec{v}(t) \end{pmatrix} . \tag{4.15b}
\]
Corollary 4.6. W.r.t. a given observer X (whose basis is $\{e_0, e_1, e_2, e_3\}$), the
four-velocity of a particle can be written as
\[
u = \gamma \begin{pmatrix} 1 \\ \vec{v} \end{pmatrix} \quad \text{w.r.t. } X , \tag{4.16}
\]
where $\vec{v} = \vec{v}(t)$ is the standard three-velocity of the particle w.r.t. this observer.
Remark. Equations (3.15) and (4.16) resemble each other closely, the main
difference being that $\vec{v} = \text{const}$ in (3.15) while $\vec{v} = \vec{v}(t)$ varies along the world
line of the particle in (4.16).

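As an exercise let us rederive (4.16) in a slightly different way. Let us
reparametrize the world line (4.15a) by proper time $s$, i.e., we express $t$ as a
function of $s$ and $\vec{x} = \vec{x}(t(s))$,
\[
  \mathbb{R} \ni s \mapsto x(t(s)) = \begin{pmatrix} t(s) \\ \vec{x}(t(s)) \end{pmatrix} . \tag{4.17}
\]
The derivative w.r.t. $s$ yields
\[
  u(t(s)) = \frac{d}{ds} \begin{pmatrix} t(s) \\ \vec{x}(t(s)) \end{pmatrix}
          = \begin{pmatrix} \frac{dt}{ds}(s) \\[2pt] \frac{d\vec{x}}{dt}(t(s))\, \frac{dt}{ds}(s) \end{pmatrix}
          = \frac{dt}{ds}(s) \begin{pmatrix} 1 \\ \vec{v}(t(s)) \end{pmatrix} . \tag{4.18}
\]
From
\[
  \frac{ds}{dt} = \sqrt{1 - \vec{v}^{\,2}(t)} ,
\]
see (4.11), we conclude that
\[
  \frac{dt}{ds}(s) = \frac{1}{\sqrt{1 - \vec{v}^{\,2}(t(s))}} = \gamma ,
\]
therefore (4.18) becomes
\[
  u(t(s)) = \gamma \begin{pmatrix} 1 \\ \vec{v}(t(s)) \end{pmatrix} ,
\]
i.e., we reproduce (4.16).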
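The normalization can also be checked numerically. The following is a minimal sketch, assuming the signature (−, +, +, +) and c = 1 used in these notes; the helper names `eta` and `four_velocity` and the sample velocity are illustrative choices, not part of the text.

```python
import math

def eta(a, b):
    """Minkowski product with signature (-, +, +, +) and c = 1."""
    return -a[0] * b[0] + sum(x * y for x, y in zip(a[1:], b[1:]))

def four_velocity(v):
    """Four-velocity gamma * (1, v) for a three-velocity v with |v| < 1, cf. (4.16)."""
    speed_sq = sum(c * c for c in v)
    gamma = 1.0 / math.sqrt(1.0 - speed_sq)
    return [gamma] + [gamma * c for c in v]

v = (0.3, -0.4, 0.5)         # arbitrary sample three-velocity (assumption), |v|^2 = 0.5
tangent = [1.0, *v]          # dx/dt, the tangent vector before normalization
u = four_velocity(v)

print(eta(tangent, tangent)) # -(1 - |v|^2) = -1/gamma^2
print(eta(u, u))             # -1 up to rounding
```

The tangent vector w.r.t. coordinate time has squared norm −1/γ², which is why multiplying by γ produces a unit timelike vector.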

Corollary 3.9 of section 3.4 states that the (absolute value of the) relative
velocity between two inertial observers is obtained in a coordinate-independent
(observer-independent) manner via the scalar product. The analog holds for
the velocity of a particle w.r.t. a given observer:

Consider a particle described by a timelike world line $\mathbb{R} \ni s \mapsto x^\mu(s)$, where $s$
denotes proper time; let $u^\mu = dx^\mu/ds$ be the four-velocity.² W.r.t. an observer
$X$ the four-velocity $u^\mu$ becomes
\[
  u^\mu = \gamma \begin{pmatrix} 1 \\ \vec{v} \end{pmatrix} , \tag{4.19}
\]
see (4.16); canonically, $u$ is parametrized w.r.t. proper time $s$, but, equivalently,
we may parametrize it w.r.t. $X$'s coordinate time $t$. Let $u_X^\mu$ denote the
four-velocity of the observer $X$. W.r.t. the coordinates of $X$, we have
\[
  u_X^\mu = \begin{pmatrix} 1 \\ \vec{o} \end{pmatrix} . \tag{4.20}
\]
Hence, $\eta_{\mu\nu} u_X^\mu u^\nu = -\gamma$, i.e.,
\[
  \eta(u_X, u) = -\gamma , \tag{4.21}
\]
or, when we write out the dependence on $t$,
\[
  \gamma(t) = \frac{1}{\sqrt{1 - \vec{v}^{\,2}(t)}} = -\eta\big(u_X, u(t)\big) . \tag{4.21$'$}
\]
This equation thus determines $|\vec{v}(t)|$ in a coordinate-independent way.³


² Here we are using abstract index notation for a change.
³ Recall that coordinate-independence means that the objects involved (which are η(·, ·),
uX , and u in the present context) are given in an abstract way (i.e., without resorting
to coordinates). To compute the product η(uX , u) we can choose any coordinate system
that comes to mind.


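Example. Consider the two world lines
\[
  x_A(t) = \begin{pmatrix} t \\ \vec{x}_A(t) \end{pmatrix}
         = \begin{pmatrix} t \\ -\tfrac{1}{2} \sin t \\ 0 \\ 0 \end{pmatrix} ,
  \qquad
  x_B(t) = \begin{pmatrix} t \\ \vec{x}_B(t) \end{pmatrix}
         = \begin{pmatrix} t \\ \tfrac{3}{4} \arctan t \\ 0 \\ 0 \end{pmatrix} ,
  \tag{4.22}
\]
which are given in some inertial coordinate system $\{t, \vec{x}\}$. (In principle we
could reparametrize each of these world lines with proper time; however, since
this involves elliptic integrals, we refrain from doing so in the present context.)
The four-velocities associated with (4.22) are
\[
  u_A(t) = \gamma_A \begin{pmatrix} 1 \\ \vec{v}_A(t) \end{pmatrix}
         = \gamma_A \begin{pmatrix} 1 \\ -\tfrac{1}{2} \cos t \\ 0 \\ 0 \end{pmatrix} ,
  \qquad
  u_B(t) = \gamma_B \begin{pmatrix} 1 \\ \vec{v}_B(t) \end{pmatrix}
         = \gamma_B \begin{pmatrix} 1 \\ \tfrac{3}{4}\, \tfrac{1}{1 + t^2} \\ 0 \\ 0 \end{pmatrix} ,
\]
where $\gamma_A = \gamma(\vec{v}_A)$ and $\gamma_B = \gamma(\vec{v}_B)$. For instance, at $t = 0$, we have
\[
  u_A\big|_{t=0} = \gamma_A \begin{pmatrix} 1 \\ -\tfrac{1}{2} \\ 0 \\ 0 \end{pmatrix} ,
  \qquad
  u_B\big|_{t=0} = \gamma_B \begin{pmatrix} 1 \\ \tfrac{3}{4} \\ 0 \\ 0 \end{pmatrix} ,
  \tag{4.23}
\]
where $\gamma_A = 2/\sqrt{3}$ and $\gamma_B = 4/\sqrt{7}$, since $|\vec{v}_A| = 1/2$ and $|\vec{v}_B| = 3/4$. To obtain
the relative velocity $|\vec{v}| = |\vec{v}_{AB}|$ between the two particles (for $t = 0$) we may
apply (4.21) in the straightforward way, i.e.,
\[
  \gamma = \frac{1}{\sqrt{1 - |\vec{v}|^2}} = -\eta(u_A, u_B) ,
\]
which yields
\[
  |\vec{v}|\,\big|_{t=0} = \frac{10}{11} . \tag{4.24}
\]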
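The value (4.24) is easy to reproduce numerically. The sketch below evaluates $-\eta(u_A, u_B)$ at $t = 0$ and solves $\gamma = (1 - |\vec{v}|^2)^{-1/2}$ for $|\vec{v}|$; the function names are illustrative assumptions, and only the case $t = 0$ of the example is evaluated.

```python
import math

def eta(a, b):
    """Minkowski product with signature (-, +, +, +) and c = 1."""
    return -a[0] * b[0] + sum(x * y for x, y in zip(a[1:], b[1:]))

def four_velocity_1d(v):
    """Four-velocity of a particle moving with velocity v along the x1-axis."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return [gamma, gamma * v, 0.0, 0.0]

def v_A(t):
    return -0.5 * math.cos(t)        # derivative of -(1/2) sin t

def v_B(t):
    return 0.75 / (1.0 + t * t)      # derivative of (3/4) arctan t

def relative_speed(u1, u2):
    """Solve gamma = -eta(u1, u2) for |v|, cf. (4.21)."""
    gamma = -eta(u1, u2)
    return math.sqrt(1.0 - 1.0 / gamma**2)

speed = relative_speed(four_velocity_1d(v_A(0.0)), four_velocity_1d(v_B(0.0)))
print(speed, 10 / 11)                # both approximately 0.909
```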
Exercise. In the above example, when we proceed analogously for $t \neq 0$, the
'relative velocity' is found to be
\[
  |\vec{v}| = 2\, \frac{3 + 2(1 + t^2) \cos t}{8(1 + t^2) + 3 \cos t} . \tag{4.25}
\]
Argue that (4.25) is basically meaningless in general for $t \neq 0$. For instance,
for $t = 1.90172$, the formula yields $|\vec{v}| = 0$; this suggests that we can conclude
that the particles A and B represented by the world lines are at rest w.r.t.
each other at that time. But this does not make sense; why? And why is the
situation for $t = 0$ different?


Exercise. And now it becomes really challenging. . . . Define $t_A = 1.86274$
and $t_B = 2.05225$ (± errors in the next digits). Argue that it makes sense
to say that the particles are at rest w.r.t. each other when particle A is at
the event $(t_A, \vec{x}_A(t_A))$ and B at $(t_B, \vec{x}_B(t_B))$. There exist other pairs $(t_A, t_B)$
which exhibit the same property. Compute another pair (using Mathematica or
Maple). Can one deduce something more general from this example?

4.6 Photons

The simplest solutions of Maxwell’s (vacuum) equations are plane waves. Let X
be an inertial observer, whose basis is {u = e0 , e1 , e2 , e3 } and whose coordinates
are (t, ~x). In these coordinates, a plane wave is given by

\[
  \phi(x) = \phi(t, \vec{x}) = a\, e^{i(-\omega t + \vec{k} \cdot \vec{x})} , \tag{4.26}
\]

where a is the amplitude, ω the angular frequency, and ~k the wave vector. The
phase velocity ω/|~k| coincides with the speed of light (where c = 1), hence

ω = |~k| . (4.27)

A simple computation shows that (4.26) with (4.27) is indeed a solution of the
free wave equation $\Box \phi = 0$.

Define the four-vector $k$ according to
\[
  k = (k^\mu)_{\mu = 0, \ldots, 3} = \begin{pmatrix} \omega \\ \vec{k} \end{pmatrix} . \tag{4.28}
\]
(Note that this is w.r.t. $\{e_0, e_1, e_2, e_3\}$, i.e., w.r.t. the observer $X$ under
consideration.) By construction, $k$ is a null vector, since $\eta(k, k) = -\omega^2 + \vec{k}^2 = 0$.
Using the Minkowski metric, the plane wave (4.26) can be written as
\[
  \phi(x) = a\, e^{i \eta(k, x)} = a\, e^{i k_\mu x^\mu} , \tag{4.26$'$}
\]
which is a coordinate-independent (observer-independent) expression.

A plane wave (of light) corresponds to free photons. As we will see in the
following, the world looks different for these particles, which is due to the fact
that k is null.


In the limit of geometric optics we deal with light rays. As implied by (4.26$'$),
in Minkowski space, a light ray (a photon) is described by a null line, i.e., a
straight world line whose tangent is a null vector,
\[
  \mathbb{R} \ni \lambda \mapsto x^\mu(\lambda) = a^\mu + \lambda k^\mu , \tag{4.29}
\]
where $a^\mu$ and $k^\mu$ are constant four-vectors, and where
\[
  k_\mu k^\mu = \eta(k, k) = 0 . \tag{4.30}
\]

For photons, the concept of proper time does not make sense. This is because
we have used momentary rest frames in the derivation of proper time, which
do not exist for photons—there do not exist observers who see photons at rest.
(Formally, by using the formulas of section 4.3, time seems to be at a standstill
for photons. However, it is beyond speculation how photons “perceive” time—
in fact, we can be quite certain that photons do not “perceive” anything at all.)
Accordingly, the concept of a four-velocity does not exist for photons either.

W.r.t. a chosen observer $X$ (whose basis is $\{e_0, e_1, e_2, e_3\}$), the four-vector $k$
naturally decomposes into
\[
  k = \begin{pmatrix} \omega \\ \vec{k} \end{pmatrix} \quad \text{w.r.t. } X , \tag{4.31}
\]
where $k^0 = \omega$ is the angular frequency and $\vec{k}$ (with $|\vec{k}| = \omega$) the wave vector
as seen by the observer $X$. This is the direct analog of (4.16) for particles that
are represented by timelike world-lines (massive particles).
Remark. In the context of timelike world-lines we have seen that the parametri-
zation of the world-line is irrelevant. The parametrization w.r.t. proper time
is beneficial, in particular when we aim at reading off directly the particle’s
three-velocity, but this parametrization need not be enforced. With photons,
the case is different. A photon with a prescribed frequency and wave vector
is described not by the null world-line alone, but by the null line plus the
length of the null tangent vector k. (Reparametrizations of (4.29) would leave
invariant the “path” of the photon in space and time, but they would change
the frequency and the wave vector of the photon.)

Let us conclude by discussing how to obtain the frequency $\omega$ in a
coordinate-independent way (i.e., we are looking for an analog of (4.21)). Consider a null
line
\[
  \mathbb{R} \ni \lambda \mapsto x^\mu(\lambda) = a^\mu + \lambda k^\mu ,
\]
which describes a photon; $k^\mu$ is a null vector, which can be represented as
in (4.31) w.r.t. $X$. The value of the (angular) frequency $\omega$ can be obtained in
a coordinate-independent way. We simply note that
\[
  \omega = -\eta(u_X, k) = -\eta_{\mu\nu}\, u_X^\mu k^\nu . \tag{4.32}
\]



CHAPTER 5

RELATIVISTIC EFFECTS

5.1 Addition of velocities and the Doppler effect

Let there be given two observers (or particles), which we call $X$ and $Y$, and
which are represented by the four-velocities $u_X$ and $u_Y$, respectively. In
particular, we have $u_X^2 = \eta(u_X, u_X) = -1$ and $u_Y^2 = \eta(u_Y, u_Y) = -1$.

In addition consider a third four-vector, which is assumed to be either another
timelike normalized vector $u_Z$ representing a third observer (or particle) $Z$, or
a null vector $k$ representing a light ray.

Case I. The third four-vector is uZ , which is timelike and normalized and thus
represents a third observer (or particle). Let vXY denote the (absolute value of
the) relative velocity between X and Y and vY Z the relative velocity between
Y and Z. The question we ask is the following: Given vXY and vY Z , what is
the relative velocity vXZ between X and Z?

Example. Obviously, in Galilean physics, in one (spatial) dimension, we have
$v_{XZ} = |v_{XY} \pm v_{YZ}|$, where the ± sign depends on the directions of the velocities,
i.e., on whether the motion is parallel or antiparallel. In relativity, this simple
addition of velocities fails. For instance, in the example of section 4.5 we have
seen that when the velocity of a particle A w.r.t. a given observer is 1/2 (in
the negative $x^1$-direction), and the velocity of another particle B is 3/4 (in
the positive $x^1$-direction), then the relative velocity between A and B is not
1/2 + 3/4 = 5/4 (which would be larger than the speed of light) but 10/11;
see (4.23) and (4.24).

Case II. Alternatively, we consider the case when the third four-vector is a null
vector $k$, i.e., $k^2 = \eta(k, k) = 0$. In this case, $k$ describes a null line, i.e., a light
ray. Let $\omega_Y$ denote the (angular) frequency as seen by $Y$ and let $v_{XY}$ denote the
relative velocity between $X$ and $Y$. The question we ask is the following: Given $v_{XY}$
and $\omega_Y$, what is the angular frequency $\omega_X$ of the photon as seen by $X$?

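The two cases can be treated rather analogously. In the following we will take
$Y$ as our reference observer; hence, in case I,
\[
  u_X = \gamma_{XY} \begin{pmatrix} 1 \\ \vec{v}_{XY} \end{pmatrix} , \qquad
  u_Y = \begin{pmatrix} 1 \\ \vec{o} \end{pmatrix} , \qquad
  u_Z = \gamma_{YZ} \begin{pmatrix} 1 \\ \vec{v}_{YZ} \end{pmatrix} , \tag{5.1}
\]
and, in case II,
\[
  u_X = \gamma_{XY} \begin{pmatrix} 1 \\ \vec{v}_{XY} \end{pmatrix} , \qquad
  u_Y = \begin{pmatrix} 1 \\ \vec{o} \end{pmatrix} , \qquad
  k = \begin{pmatrix} \omega_Y \\ \vec{k}_Y \end{pmatrix} . \tag{5.2}
\]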

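Case I. Relativistic addition of velocities.

To compute the relative velocity $v_{XZ}$ between $X$ and $Z$ we use (4.21). We
obtain
\[
  \gamma_{XZ} = -\eta(u_X, u_Z)
              = -\big( -\gamma_{XY} \gamma_{YZ} + \gamma_{XY} \gamma_{YZ}\, \vec{v}_{XY} \cdot \vec{v}_{YZ} \big) ,
\]
and thus
\[
  \gamma_{XZ}^2 = \gamma_{XY}^2\, \gamma_{YZ}^2\, \big( 1 - \vec{v}_{XY} \cdot \vec{v}_{YZ} \big)^2 .
\]
Let $\alpha$ denote the angle between $\vec{v}_{XY}$ and $\vec{v}_{YZ}$ and set $v_{XY} = |\vec{v}_{XY}|$, $v_{XZ} = |\vec{v}_{XZ}|$,
and $v_{YZ} = |\vec{v}_{YZ}|$. A straightforward calculation then yields
\[
  v_{XZ} = \frac{\sqrt{v_{XY}^2 + v_{YZ}^2 - 2 v_{XY} v_{YZ} \cos\alpha - v_{XY}^2 v_{YZ}^2 (1 - \cos^2\alpha)}}
                {1 - v_{XY} v_{YZ} \cos\alpha} . \tag{5.3}
\]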

Let us concentrate on some special cases. The case $\alpha = 0$ means that $\vec{v}_{XY}$ and
$\vec{v}_{YZ}$ are parallel; $\alpha = \pi$ means that $\vec{v}_{XY}$ and $\vec{v}_{YZ}$ are antiparallel. In these cases
we obtain
\[
  v_{XZ} = \frac{v_{XY} \pm v_{YZ}}{1 \pm v_{XY} v_{YZ}} . \tag{5.4}
\]
The sign is a + sign, if the velocities are antiparallel, and a − sign, if the
velocities are parallel (as seen by $Y$).

In the case $\alpha = \pi/2$ the velocities $\vec{v}_{XY}$ and $\vec{v}_{YZ}$ are orthogonal. It easily follows
that
\[
  v_{XZ} = \sqrt{v_{XY}^2 + v_{YZ}^2 - v_{XY}^2 v_{YZ}^2} . \tag{5.5}
\]
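As a cross-check, the closed formula (5.3) can be compared with the invariant definition $\gamma_{XZ} = -\eta(u_X, u_Z)$. The following is a minimal sketch; the function names and the sample values are assumptions made for illustration.

```python
import math

def eta(a, b):
    """Minkowski product with signature (-, +, +, +) and c = 1."""
    return -a[0] * b[0] + sum(x * y for x, y in zip(a[1:], b[1:]))

def four_velocity(v):
    """Four-velocity gamma * (1, v) for a three-velocity v with |v| < 1."""
    gamma = 1.0 / math.sqrt(1.0 - sum(c * c for c in v))
    return [gamma] + [gamma * c for c in v]

def relative_speed_formula(v_xy, v_yz, alpha):
    """Relative speed between X and Z according to (5.3)."""
    c = math.cos(alpha)
    num = v_xy**2 + v_yz**2 - 2 * v_xy * v_yz * c - v_xy**2 * v_yz**2 * (1 - c * c)
    return math.sqrt(num) / (1 - v_xy * v_yz * c)

def relative_speed_invariant(vec_xy, vec_yz):
    """Relative speed obtained from gamma = -eta(u_X, u_Z), cf. (4.21)."""
    gamma = -eta(four_velocity(vec_xy), four_velocity(vec_yz))
    return math.sqrt(1.0 - 1.0 / gamma**2)

v_xy, v_yz, alpha = 0.6, 0.7, 1.1          # sample values (assumption)
vec_xy = (v_xy, 0.0, 0.0)
vec_yz = (v_yz * math.cos(alpha), v_yz * math.sin(alpha), 0.0)

print(relative_speed_formula(v_xy, v_yz, alpha))
print(relative_speed_invariant(vec_xy, vec_yz))    # same value up to rounding
print(relative_speed_formula(0.5, 0.75, math.pi))  # antiparallel case, 10/11
```

The last line reproduces the value 10/11 of (4.24), since the velocities 1/2 and 3/4 in that example are antiparallel.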

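Case II. Relativistic Doppler effect.

To compute the frequency $\omega_X$ that is measured by the observer $X$ we use (4.32).
We obtain
\[
  \omega_X = -\eta(u_X, k) = -\big( -\gamma_{XY}\, \omega_Y + \gamma_{XY}\, \vec{v}_{XY} \cdot \vec{k}_Y \big) .
\]
Let $\alpha$ denote the angle between $\vec{v}_{XY}$ and $\vec{k}_Y$ (and recall that $|\vec{k}_Y| = \omega_Y$). Then
we get
\[
  \omega_X^2 = \gamma_{XY}^2\, \omega_Y^2\, \big( 1 - v_{XY} \cos\alpha \big)^2
\]
and, finally,
\[
  \omega_X = \omega_Y\, \frac{1 - v_{XY} \cos\alpha}{\sqrt{1 - v_{XY}^2}} . \tag{5.6}
\]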

The longitudinal Doppler effect corresponds to the case $\alpha = 0$ or $\alpha = \pi$, i.e.,
the direction of motion and the direction of the light ray coincide. In this case
we arrive at
\[
  \omega_X = \omega_Y\, \frac{1 \mp v_{XY}}{\sqrt{1 - v_{XY}^2}}
           = \omega_Y \sqrt{\frac{1 \mp v_{XY}}{1 \pm v_{XY}}} . \tag{5.7}
\]
The interpretation is straightforward: If $X$ moves parallel to the direction of
the photon ("$X$ chases the photon"), where our reference is $Y$, then we have a
− sign in the numerator of (5.7) and
\[
  \omega_X = \omega_Y \sqrt{\frac{1 - v_{XY}}{1 + v_{XY}}} < \omega_Y ; \tag{5.8a}
\]
accordingly, there is a redshift. On the other hand, if $X$ moves antiparallel to
the direction of the photon ("$X$ heads toward the photon"), where our reference
is $Y$, then we have a + sign in the numerator of (5.7) and
\[
  \omega_X = \omega_Y \sqrt{\frac{1 + v_{XY}}{1 - v_{XY}}} > \omega_Y , \tag{5.8b}
\]
i.e., the frequency increases and the wave length undergoes a blueshift.


The transversal Doppler effect corresponds to $\alpha = \pi/2$. We find
\[
  \omega_X = -\eta(u_X, k) = \gamma_{XY}\, \omega_Y . \tag{5.9}
\]
Since $\gamma_{XY} = (1 - v_{XY}^2)^{-1/2}$ is larger than 1, we find $\omega_X > \omega_Y$, i.e., a blueshift.
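The three Doppler cases can be illustrated with a short computation that evaluates $\omega_X = -\eta(u_X, k)$ directly in $Y$'s coordinates and compares it with (5.6). This is a sketch under the conventions above; the helper names and the values $v_{XY} = 0.6$, $\omega_Y = 2$ are illustrative assumptions.

```python
import math

def eta(a, b):
    """Minkowski product with signature (-, +, +, +) and c = 1."""
    return -a[0] * b[0] + sum(x * y for x, y in zip(a[1:], b[1:]))

def doppler(omega_y, v_xy, alpha):
    """Frequency seen by X according to (5.6)."""
    return omega_y * (1 - v_xy * math.cos(alpha)) / math.sqrt(1 - v_xy**2)

def frequency_from_metric(omega_y, v_xy, alpha):
    """Frequency seen by X computed as -eta(u_X, k) in Y's coordinates, cf. (4.32)."""
    gamma = 1.0 / math.sqrt(1 - v_xy**2)
    u_x = [gamma, gamma * v_xy, 0.0, 0.0]          # X moves along the x1-axis
    k = [omega_y, omega_y * math.cos(alpha), omega_y * math.sin(alpha), 0.0]
    return -eta(u_x, k)

omega_y, v_xy = 2.0, 0.6                           # sample values (assumption)
for alpha in (0.0, math.pi / 2, math.pi):
    print(alpha, doppler(omega_y, v_xy, alpha), frequency_from_metric(omega_y, v_xy, alpha))
```

For these values the output shows a redshift for $\alpha = 0$ ($\omega_X \approx 1$), and blueshifts for $\alpha = \pi/2$ ($\omega_X \approx 2.5$) and $\alpha = \pi$ ($\omega_X \approx 4$).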

Remark. Note that for the transversal Doppler effect the ‘symmetry’ between
the observers is broken. The fact that Y sees the observer X and the light ray
k move in orthogonal directions does not imply that X observes the same for
Y and k. On the contrary, the ‘transversality property’ is not ‘relative’.

5.2 The aberration of light

Aberration is a well-known effect. It is best experienced on a rainy day. When
it rains, the angle at which the rain drops are falling on my umbrella (provided
I have one) depends on how fast I walk. (From the perspective of a car driver
rain seems to fall almost horizontally.) The aberration of light underlies the
same principle; however, since a consistent treatment of light is necessarily
founded in relativity, we need the full machinery of special relativity to obtain
a satisfying mathematical description.

To be able to study aberration we need a concept of angles. In Euclidean
geometry (where the vector space is equipped with a positive definite scalar
product $\langle \cdot | \cdot \rangle$), the angle between two vectors $v$ and $w$ is defined as
\[
  \cos\alpha_{vw} = \frac{\langle v | w \rangle}{\|v\|\, \|w\|} , \tag{5.10}
\]
where $\| \cdot \| = \sqrt{\langle \cdot | \cdot \rangle}$. However, Minkowski space is not a Euclidean space; the
Minkowski metric is not a scalar product, but a pseudo-scalar product. An
attempt to generalize (5.10) would be
\[
  \cos\alpha_{vw} = \frac{\eta(v, w)}{\sqrt{|\eta(v, v)|}\, \sqrt{|\eta(w, w)|}} . \tag{5.11}
\]
The absolute values of $\eta(\cdot, \cdot)$ in the denominator ensure that the square roots
exist. However, if $v$ (or $w$) is a null vector, then the r.h.s. of (5.11) would be
infinite in general; hence, (5.11) does not make sense for null vectors. Suppose
$v$ is timelike and $w$ is timelike or spacelike; then the inverse Cauchy-Schwarz
inequality states that
\[
  \eta(v, w)^2 \geq \eta(v, v)\, \eta(w, w) ,
\]
where the equality sign refers to the case when $v$ and $w$ are collinear. Hence, in
general, $\eta(v, w) < -\sqrt{|\eta(v, v)|}\, \sqrt{|\eta(w, w)|}$ or $\eta(v, w) > \sqrt{|\eta(v, v)|}\, \sqrt{|\eta(w, w)|}$.
For (5.11) this means $\cos\alpha_{vw} < -1$ or $\cos\alpha_{vw} > 1$. This is impossible.
Exercise. Let v and w be spacelike four-vectors and define αvw according
to (5.11). What is the interpretation of αvw ?

Since an observer-independent concept of angles between four-vectors does not
exist, we turn our attention to the case when an inertial coordinate system
(inertial observer) $X$ is given. (Let $X$'s four-velocity be denoted by $u_X$.) Let
$v$ and $w$ be two four-vectors; w.r.t. the coordinate system $X$ we have the
decomposition¹
\[
  u_X = \begin{pmatrix} 1 \\ \vec{o} \end{pmatrix} , \qquad
  v = \begin{pmatrix} v^0 \\ \vec{v} \end{pmatrix} , \qquad
  w = \begin{pmatrix} w^0 \\ \vec{w} \end{pmatrix} . \tag{5.12}
\]
Now, life is easy: We can consider the spatial parts of the four-vectors ('spatial'
meaning 'spatial' w.r.t. $X$) and compute the conventional (Euclidean) angles
between these three-vectors. The spatial parts are $\vec{v}$ and $\vec{w}$ and the angle $\alpha_{vw}$
between the two vectors is then simply given by
\[
  \cos\alpha_{vw} = \frac{\vec{v} \cdot \vec{w}}{|\vec{v}|\, |\vec{w}|} , \tag{5.13}
\]
cf. (5.10).

Let us find a coordinate-independent formula for (5.13). Define the spatial
projection of a four-vector $v$ w.r.t. the observer $X$ as
\[
  P_X v = v + \eta(u_X, v)\, u_X . \tag{5.14}
\]
W.r.t. $X$'s coordinates we find
\[
  P_X v = \begin{pmatrix} 0 \\ \vec{v} \end{pmatrix} , \qquad
  P_X w = \begin{pmatrix} 0 \\ \vec{w} \end{pmatrix} .
\]
Therefore, $P_X v$ and $P_X w$ are indeed the projections of $v$ and $w$ onto the
(linear) plane of simultaneity $\{t = 0\}$ of $X$. Since $P_X v$ and $P_X w$ are purely spatial
(w.r.t. $X$), the (Euclidean) scalar product coincides with the Minkowski
product on such objects; e.g.,
\[
  \vec{v} \cdot \vec{w} = \eta(P_X v, P_X w) , \qquad
  |\vec{v}| = \sqrt{\vec{v} \cdot \vec{v}} = \sqrt{\eta(P_X v, P_X v)} .
\]
¹ Let us exclude the degenerate case where $\vec{v} = \vec{o}$ or $\vec{w} = \vec{o}$.


Consequently, equation (5.13) takes the form
\[
  \cos\alpha_{vw} = \frac{\eta(P_X v, P_X w)}{\sqrt{\eta(P_X v, P_X v)}\, \sqrt{\eta(P_X w, P_X w)}} . \tag{5.15}
\]
In the following we restrict ourselves to light rays and angles in between, since
this is the relevant case for our applications. Let $k$ and $l$ be null vectors
representing two light rays. According to (5.15) the angle between $k$ and $l$ is
\[
  \cos\alpha_{kl} = \frac{\eta(P_X k, P_X l)}{\sqrt{\eta(P_X k, P_X k)}\, \sqrt{\eta(P_X l, P_X l)}} . \tag{5.16}
\]

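Let us insert (5.14) for $P_X k$ and $P_X l$. On the one hand,
\begin{align*}
  \eta(P_X k, P_X l)
    &= \eta\big( k + \eta(u_X, k)\, u_X ,\; l + \eta(u_X, l)\, u_X \big) \\
    &= \eta(k, l) + 2 \eta(u_X, k)\, \eta(u_X, l)
       + \eta(u_X, k)\, \eta(u_X, l)\, \underbrace{\eta(u_X, u_X)}_{-1} \\
    &= \eta(k, l) + \eta(u_X, k)\, \eta(u_X, l) ,
\end{align*}
and, on the other hand,
\begin{align*}
  \eta(P_X k, P_X k)
    &= \eta\big( k + \eta(u_X, k)\, u_X ,\; k + \eta(u_X, k)\, u_X \big) \\
    &= \underbrace{\eta(k, k)}_{0} + 2 \eta(u_X, k)\, \eta(u_X, k)
       + \eta(u_X, k)\, \eta(u_X, k)\, \underbrace{\eta(u_X, u_X)}_{-1} \\
    &= \eta(u_X, k)^2 ,
\end{align*}
and the analog for $P_X l$.

Accordingly, the angle between (the spatial parts of) $k$ and $l$ is given by
\[
  \cos\alpha_{kl} = \frac{\eta(k, l) + \eta(u_X, k)\, \eta(u_X, l)}{\eta(u_X, k)\, \eta(u_X, l)} . \tag{5.17}
\]
Equivalently, we have
\[
  \cos\alpha_{kl} - 1 = \frac{\eta(k, l)}{\eta(u_X, k)\, \eta(u_X, l)} . \tag{5.17$'$}
\]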
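Formula (5.17) is straightforward to evaluate. The sketch below computes $\cos\alpha_{kl}$ for an observer at rest in the chosen coordinates, where it must reduce to the Euclidean angle between the spatial directions; the helpers `cos_angle` and `null_vector` and the sample directions are assumptions for illustration.

```python
import math

def eta(a, b):
    """Minkowski product with signature (-, +, +, +) and c = 1."""
    return -a[0] * b[0] + sum(x * y for x, y in zip(a[1:], b[1:]))

def cos_angle(u_obs, k, l):
    """Cosine of the angle between the light rays k and l as seen by u_obs, cf. (5.17)."""
    a = eta(u_obs, k)
    b = eta(u_obs, l)
    return (eta(k, l) + a * b) / (a * b)

def null_vector(direction, omega=1.0):
    """Future-pointing null vector with frequency omega and given spatial direction."""
    norm = math.sqrt(sum(c * c for c in direction))
    return [omega] + [omega * c / norm for c in direction]

u_x = [1.0, 0.0, 0.0, 0.0]                       # observer at rest in these coordinates
k = null_vector((1.0, 0.0, 0.0), omega=2.0)
l = null_vector((1.0, 1.0, 0.0), omega=0.5)

print(cos_angle(u_x, k, l))                      # 1/sqrt(2), i.e. an angle of 45 degrees
```

The two null vectors were deliberately given different frequencies: the result depends only on the directions of $k$ and $l$, not on their lengths.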

Remark. The formula (5.17$'$) is quite useful. Imagine an astronomer who
observes two stars. The two stars correspond to points on the celestial sphere; by
construction, the angle between the two corresponds to the angle $\alpha_{kl}$ between
the (projections of the) null lines $k$ and $l$, which represent the light rays emitted
by the stars. Formula (5.17$'$) thus shows how the apparent angle between the
stars changes depending on the state of motion of the observer (as described
by $u_X$).
Remark. Rescalings of $k$ and $l$ (i.e., $k \mapsto \lambda_k k$ and $l \mapsto \lambda_l l$ where $\lambda_k$, $\lambda_l$ are
positive), leave $\alpha_{kl}$ invariant; this is intuitively clear, because angles are defined
between the directions determined by the vectors and thus do not depend on
the actual lengths of the vectors.

Equipped with (5.17$'$) we are now able to address the problem of the aberration
of light: Consider two observers, $X$ (with four-velocity $u_X$) and $Y$ (with $u_Y$),
and two (future-pointing) null vectors $k$ and $l$ (representing light rays). We ask
how the angle between the light rays, as observed by $X$ and by $Y$, changes
depending on the relative velocity between $X$ and $Y$.

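Let us introduce coordinates associated with $X$; then $u_X$, $u_Y$, and the two
(future-pointing) null vectors read
\[
  u_X = \begin{pmatrix} 1 \\ \vec{o} \end{pmatrix} , \qquad
  u_Y = \gamma \begin{pmatrix} 1 \\ \vec{v} \end{pmatrix} , \qquad
  k = \begin{pmatrix} k^0 \\ \vec{k} \end{pmatrix} , \qquad
  l = \begin{pmatrix} l^0 \\ \vec{l} \end{pmatrix} . \tag{5.18}
\]
Since $\eta(k, k) = 0$ we have $k^0 = |\vec{k}|$ (and recall that $k^0$ is essentially the frequency
of the light ray). Rescalings of $k$ and $l$ do not affect the angle between the two
rays, hence, w.l.o.g., we may set
\[
  u_X = \begin{pmatrix} 1 \\ \vec{o} \end{pmatrix} , \qquad
  u_Y = \gamma \begin{pmatrix} 1 \\ \vec{v} \end{pmatrix} , \qquad
  k = \begin{pmatrix} 1 \\ \vec{n}_k \end{pmatrix} , \qquad
  l = \begin{pmatrix} 1 \\ \vec{n}_l \end{pmatrix} , \tag{5.18$'$}
\]
where $\vec{n}_k$ and $\vec{n}_l$ are unit vectors, i.e., $|\vec{n}_k| = |\vec{n}_l| = 1$.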

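We denote the angle between (the spatial projections of) $k$ and $l$ w.r.t. $X$ by
$\theta_X$; the angle the observer $Y$ measures, we denote by $\theta_Y$. According to (5.17$'$)
we have
\[
  \cos\theta_X - 1 = \frac{\eta(k, l)}{\eta(u_X, k)\, \eta(u_X, l)}
                   = \frac{-1 + \vec{n}_k \cdot \vec{n}_l}{(-1)(-1)}
                   = -1 + \vec{n}_k \cdot \vec{n}_l . \tag{5.19}
\]
This does not come as a surprise, of course; $\cos\theta_X = \vec{n}_k \cdot \vec{n}_l$ is the standard
formula. However, $Y$ measures a different angle:
\begin{align*}
  \cos\theta_Y - 1 &= \frac{\eta(k, l)}{\eta(u_Y, k)\, \eta(u_Y, l)}
                    = \frac{-1 + \vec{n}_k \cdot \vec{n}_l}{\gamma (-1 + \vec{v} \cdot \vec{n}_k)\, \gamma (-1 + \vec{v} \cdot \vec{n}_l)} \\
                   &= \frac{\cos\theta_X - 1}{\gamma^2 (1 - \vec{v} \cdot \vec{n}_k)(1 - \vec{v} \cdot \vec{n}_l)} . \tag{5.20}
\end{align*}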


Setting $\vec{v} \cdot \vec{n}_k = |\vec{v}|\, |\vec{n}_k| \cos\alpha_k = v \cos\alpha_k$ and $\vec{v} \cdot \vec{n}_l = v \cos\alpha_l$, we have expressed
$\theta_Y$ in terms of $\theta_X$ and the angles $\alpha_k$, $\alpha_l$ between the light rays and the direction
of motion of $Y$. Here and henceforth we use the abbreviation $v = |\vec{v}|$.

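Let us specialize (5.20) to the important case where one of the light rays, say
$k$, is aligned with the direction of relative motion between the observers, i.e.,
$\vec{n}_k \parallel \vec{v}$. In other words, we set $\vec{n}_k = \vec{v}/v$, so that
\[
  k = \begin{pmatrix} 1 \\ \vec{v}/v \end{pmatrix} , \qquad
  l = \begin{pmatrix} 1 \\ \vec{n}_l \end{pmatrix} . \tag{5.21}
\]
In this case, we obtain
\[
  \cos\theta_X = \frac{\vec{v} \cdot \vec{n}_l}{v} ,
\]
and (5.20) reduces to
\begin{align*}
  \cos\theta_Y - 1 &= \frac{\cos\theta_X - 1}{\gamma^2 (1 - \vec{v} \cdot \vec{v}/v)(1 - \vec{v} \cdot \vec{n}_l)}
                    = \frac{\cos\theta_X - 1}{\gamma^2 (1 - v)(1 - v \cos\theta_X)} \\
                   &= \frac{(1 + v)(\cos\theta_X - 1)}{1 - v \cos\theta_X} . \tag{5.22}
\end{align*}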
This formula describes the relativistic aberration (aberration of light). Writing
$\theta = \theta_X$ and $\theta' = \theta_Y$ and applying standard algebraic manipulations we see that
\begin{align*}
  \cos\theta' &= \frac{\cos\theta - v}{1 - v \cos\theta} , \tag{5.23a} \\
  \tan\frac{\theta'}{2} &= \sqrt{\frac{1 + v}{1 - v}}\; \tan\frac{\theta}{2} , \tag{5.23b} \\
  \sin\theta' &= \sqrt{1 - v^2}\; \frac{\sin\theta}{1 - v \cos\theta} . \tag{5.23c}
\end{align*}
Note that equivalence between the formulas (5.23a) and (5.23b) follows from the
identity $\tan\alpha = (1 - \cos 2\alpha)/(\sin 2\alpha)$. Since $\sin\theta'$ is not bijective on $\theta' \in [0, \pi)$,
to invert equation (5.23c) it must be completed by the additional requirement
that $\theta' \in [0, \pi/2)$ if $\cos\theta > v$ and $\theta' \in (\pi/2, \pi)$ if $\cos\theta < v$.
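The mutual consistency of the three aberration formulas can be checked numerically. The following sketch obtains $\theta'$ from (5.23a) and then evaluates both sides of (5.23b) and (5.23c); the function name and the sample values $\theta = 1$, $v = 0.5$ are assumptions.

```python
import math

def aberrated_angle(theta, v):
    """Angle theta' obtained from theta via (5.23a), for 0 <= theta <= pi and 0 <= v < 1."""
    return math.acos((math.cos(theta) - v) / (1 - v * math.cos(theta)))

theta, v = 1.0, 0.5                              # sample values (assumption)
theta_p = aberrated_angle(theta, v)

# Both sides of (5.23b)
lhs_b = math.tan(theta_p / 2)
rhs_b = math.sqrt((1 + v) / (1 - v)) * math.tan(theta / 2)

# Both sides of (5.23c)
lhs_c = math.sin(theta_p)
rhs_c = math.sqrt(1 - v * v) * math.sin(theta) / (1 - v * math.cos(theta))

print(theta, theta_p)                            # theta' is larger than theta
print(lhs_b, rhs_b)
print(lhs_c, rhs_c)
```

Using `acos` on (5.23a) picks the branch $\theta' \in [0, \pi]$ automatically, which is why no extra case distinction is needed here, in contrast to an inversion of (5.23c).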

From (5.23b) it is simple to conclude that

θ′ > θ , (5.24)

which means that the angle θ′ measured by the moving observer X′ (the observer
Y above) is larger than the angle θ measured by X (provided that θ > 0).


If the velocity v is close to the velocity of light (which is 1 in our units), then
θ ′ ≫ θ. This has a strange effect on the field of vision. Objects that are
actually located behind appear right in front of an observer if the observer
moves sufficiently fast. For details we refer to the lecture course.

5.3 Lorentz contraction

So far we have only considered particles in Minkowski space (which were rep-
resented as world lines). An extended rod (in a state of uniform motion) is
represented by a world sheet, which is a two-dimensional domain whose bound-
aries are two parallel world lines (which represent the two ends of the rod).

The two ends of the rod (which we call E1 and E2) are represented by world
lines that are straight lines (since the rod is in uniform motion). W.l.o.g. we
assume that the world line of E1 passes through the origin. We thus have
\[
  \text{E1}: \; s \mapsto x_1^\mu(s) = s\, u^\mu , \qquad
  \text{E2}: \; s \mapsto x_2^\mu(s) = \ell^\mu + s\, u^\mu , \tag{5.25}
\]
where $u^\mu$ is the four-velocity of the rod, i.e., $\eta(u, u) = \eta_{\mu\nu} u^\mu u^\nu = -1$, and $\ell^\mu$
is a constant four-vector representing the displacement of E1 and E2. Every
vector in $\ell + \langle u \rangle$ can take the role of the displacement vector; hence, w.l.o.g.
we may assume that $\ell^\mu$ is orthogonal to $u^\mu$, i.e., $\eta(\ell, u) = \eta_{\mu\nu} \ell^\mu u^\nu = 0$.

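In the rest frame of the rod, i.e., for a comoving observer, we have $u = (1, \vec{o})^T$,
and hence
\[
  \text{E1}: \; s \mapsto x_1^\mu(s) = s \begin{pmatrix} 1 \\ \vec{o} \end{pmatrix} , \qquad
  \text{E2}: \; s \mapsto x_2^\mu(s) = \begin{pmatrix} 0 \\ \vec{\ell} \end{pmatrix} + s \begin{pmatrix} 1 \\ \vec{o} \end{pmatrix} . \tag{5.25$'$}
\]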

Definition 5.1. The proper length of a rod is the length that is measured in
the rest frame of the rod.

The proper length $L$ of (5.25$'$) is obviously given by $L^2 = |\vec{\ell}|^2$; equivalently, we
write
\[
  L^2 = \eta(\ell, \ell) . \tag{5.26}
\]

Let $X$ denote an inertial observer with four-velocity $u_X$. What is the length of
the rod according to $X$? The length of the rod is simply the distance between
the two ends E1 and E2 at a given instant of time (where by time we obviously
mean $X$'s time $t$).

Let us therefore calculate two events on the world lines E1 and E2, respectively,
that are simultaneous (for $X$); for simplicity, we determine the two events on
the plane of simultaneity $t = 0$. From (3.13) we obtain
\begin{align*}
  \text{E1}: \quad & \eta(x_1(s), u_X) = 0 \;\Leftrightarrow\; s\, \eta(u, u_X) = 0 \;\Leftrightarrow\; s = 0 , \\
  \text{E2}: \quad & \eta(x_2(s), u_X) = 0 \;\Leftrightarrow\; \eta(\ell, u_X) + s\, \eta(u, u_X) = 0
                     \;\Leftrightarrow\; s = -\frac{\eta(\ell, u_X)}{\eta(u, u_X)} .
\end{align*}

Therefore, the events
\[
  x_1 = 0 \;\text{ on E1} \qquad \text{and} \qquad
  x_2 = \ell - \frac{\eta(\ell, u_X)}{\eta(u, u_X)}\, u \;\text{ on E2} \tag{5.27}
\]
are simultaneous for the inertial observer $X$. (In coordinates adapted to $X$,
these events would read $(0, \vec{o})^T$ and $(0, \vec{\ell}_X)^T$ for some $\vec{\ell}_X$. In the
coordinates (5.29) we have $(0, \vec{o})^T$ and $(\vec{\ell} \cdot \vec{v}, \vec{\ell})^T$. We could use the latter to
derive (5.30).)

The length $L_X$ of the rod that $X$ measures is the distance between the two
events; hence,
\[
  L_X^2 = \eta\Big( \ell - \frac{\eta(\ell, u_X)}{\eta(u, u_X)}\, u ,\; \ell - \frac{\eta(\ell, u_X)}{\eta(u, u_X)}\, u \Big) .
\]
Simple algebraic manipulations using $\eta(\ell, u) = 0$ and $\eta(u, u) = -1$ show that
\[
  L_X^2 = \eta(\ell, \ell) - \frac{\eta(\ell, u_X)^2}{\eta(u, u_X)^2}
        = L^2 - \frac{\eta(\ell, u_X)^2}{\eta(u, u_X)^2} . \tag{5.28}
\]

An immediate consequence is that $L_X \leq L$; note that the comoving observer
reproduces the result $L_X = L$, since the second term in (5.28) vanishes when
$u_X = u$.

Let us use the coordinates of the comoving observer to compute $L_X$ in terms
of the relative velocity. (Since (5.28) is formulated in a coordinate-independent
way we may choose any coordinates we like.) In these coordinates we have
\[
  u = \begin{pmatrix} 1 \\ \vec{o} \end{pmatrix} , \qquad
  \ell = \begin{pmatrix} 0 \\ \vec{\ell} \end{pmatrix} , \qquad
  u_X = \gamma \begin{pmatrix} 1 \\ \vec{v} \end{pmatrix} , \tag{5.29}
\]
where $\vec{v}$ is the velocity of $X$ as seen by the comoving observer. Then (5.28)
yields
\[
  L_X^2 = L^2 - (\vec{v} \cdot \vec{\ell})^2 = L^2 - \vec{v}^{\,2}\, \vec{\ell}^{\,2} \cos^2\alpha , \tag{5.30}
\]
where $\alpha$ is the angle between $\vec{v}$ and $\vec{\ell}$ (as seen by the comoving observer).
Therefore,
\[
  L_X = L \sqrt{1 - |\vec{v}|^2 \cos^2\alpha} ; \tag{5.31}
\]
in particular,
\[
  L \geq L_X . \tag{5.32}
\]
Remark. We see that the proper length of the rod (which is the length in the
rest frame) can also be viewed as the maximizer of the lengths of the rod as
seen by inertial observers.

Two cases are of special interest. First, the transversal case: Suppose that the
rod and the direction of motion are orthogonal, i.e., $\vec{\ell} \perp \vec{v}$. Then $\cos\alpha = 0$
(i.e., $\alpha = \pi/2$) and
\[
  L_X = L . \tag{5.33}
\]
In other words, there does not exist a transversal contraction of lengths.

Second, the longitudinal case: Suppose that the direction of motion is parallel
(or antiparallel) to the rod itself, i.e., $\vec{\ell} \parallel \vec{v}$. Then $\cos\alpha = \pm 1$ (i.e., $\alpha = 0$ or
$\alpha = \pi$) and
\[
  L_X = L \gamma^{-1} = L \sqrt{1 - |\vec{v}|^2} . \tag{5.34}
\]
This effect is called the Lorentz contraction. For the observer $X$ the rod appears
contracted by a factor $\gamma^{-1}$.
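The coordinate-independent formula (5.28) can be evaluated directly. The sketch below uses the comoving coordinates (5.29) and reproduces the longitudinal and transversal cases; the names `measured_length` and `observer` and the numbers ($L = 2$, $|\vec{v}| = 0.6$) are illustrative assumptions.

```python
import math

def eta(a, b):
    """Minkowski product with signature (-, +, +, +) and c = 1."""
    return -a[0] * b[0] + sum(x * y for x, y in zip(a[1:], b[1:]))

def measured_length(ell, u, u_x):
    """Length L_X from (5.28); assumes eta(ell, u) = 0 and eta(u, u) = -1."""
    ratio = eta(ell, u_x) / eta(u, u_x)
    return math.sqrt(eta(ell, ell) - ratio**2)

def observer(v):
    """Four-velocity gamma * (1, v) of an observer with three-velocity v."""
    gamma = 1.0 / math.sqrt(1.0 - sum(c * c for c in v))
    return [gamma] + [gamma * c for c in v]

# Comoving coordinates (5.29): rod at rest, proper length 2 along the x1-axis.
u = [1.0, 0.0, 0.0, 0.0]
ell = [0.0, 2.0, 0.0, 0.0]

L_long = measured_length(ell, u, observer((0.6, 0.0, 0.0)))   # motion along the rod
L_trans = measured_length(ell, u, observer((0.0, 0.6, 0.0)))  # motion orthogonal to the rod

print(L_long)    # 2 * sqrt(1 - 0.36) = 1.6
print(L_trans)   # 2.0, no transversal contraction
```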

A well-known "paradox" involving Lorentz contractions is the ladder paradox. If
a ladder is traveling at high speed it may undergo a sufficient length contraction
to fit into a much smaller garage. On the other hand, from the point of view of
an observer who is comoving with the ladder, it is the garage that is Lorentz
contracted to a length that is smaller than its proper length, which means that
the garage will be unable to contain the ladder at all. For a discussion we refer
to the lecture course.



CHAPTER 6

ENERGY AND MOMENTUM

6.1 Introduction

In Newtonian (Galilean) physics the momentum $\vec{p}$ of a point particle is defined
as $\vec{p} = m \vec{v}$, where $m$ is the particle's mass; the kinetic energy is $E = \tfrac{1}{2} m \vec{v}^{\,2}$;
here, $\vec{v}$ is the velocity of the particle w.r.t. a (Galilean) inertial observer.
Conservation of momentum and energy is crucial. Consider a system of particles
$m_i$ ($i = 1, 2, \ldots, n$) that come together, interact, and move out again as a
different system $m'_j$ ($j = 1, 2, \ldots, n'$). Suppose for simplicity that we have only
one spatial degree of freedom. Then
\[
  \sum_i m_i v_i = \sum_j m'_j v'_j \qquad \text{and} \qquad
  \sum_i m_i v_i^2 = \sum_j m'_j v_j'^{\,2} . \tag{6.1}
\]

The (Galilean) principle of relativity implies that conservation of momentum
and energy must hold in each inertial frame. For an observer that moves with
velocity $u$ w.r.t. the original observer, the Galilean transformation yields
\[
  \sum_i m_i (v_i + u) = \sum_j m'_j (v'_j + u) , \qquad
  \sum_i m_i (v_i + u)^2 = \sum_j m'_j (v'_j + u)^2 . \tag{6.2}
\]
Expanding the first equation in the variable $u$ results in
\[
  \sum_i m_i v_i + u \sum_i m_i = \sum_j m'_j v'_j + u \sum_j m'_j ,
\]
and the second equation becomes
\[
  \sum_i m_i v_i^2 + 2u \sum_i m_i v_i + u^2 \sum_i m_i
    = \sum_j m'_j v_j'^{\,2} + 2u \sum_j m'_j v'_j + u^2 \sum_j m'_j ,
\]
which is consistent with (6.1) and in addition implies a conservation of the sum
of the masses.

And in relativity? Suppose the definitions of energy and momentum are the
same in relativistic physics. Since (6.1) must hold in each (relativistic) inertial
frame by the principle of relativity, we find that conservation of momentum
looks like
\[
  \sum_i m_i\, \frac{v_i + u}{1 + v_i u} = \sum_j m'_j\, \frac{v'_j + u}{1 + v'_j u} \tag{6.3}
\]
for an observer who moves with velocity $u$ (w.r.t. the nameless observer we had
chosen initially); here we have used the relativistic addition of velocities (5.4).
Expanding the l.h. side of (6.3) in $u$ yields
\[
  \sum_i m_i v_i + u \Big( \sum_i m_i - \sum_i m_i v_i^2 \Big)
    + u^2 \Big( -\sum_i m_i v_i + \sum_i m_i v_i^3 \Big)
    + u^3 \Big( \sum_i m_i v_i^2 - \sum_i m_i v_i^4 \Big) + \ldots
\]
and the analogous expression for the r.h. side. Equating the two sides then
leads to
\[
  \sum_i m_i (v_i)^k = \sum_j m'_j (v'_j)^k \tag{6.4}
\]
for all $k \in \mathbb{N}$. This is ridiculous.
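The expansion used above rests on the single-particle series $(v + u)/(1 + vu) = v + u(1 - v^2) + u^2(-v + v^3) + u^3(v^2 - v^4) + \ldots$, which can be checked numerically for small $u$. A minimal sketch, with function names and sample values chosen for illustration:

```python
def boosted_velocity(v, u):
    """Relativistic addition of collinear velocities, cf. (5.4)."""
    return (v + u) / (1 + v * u)

def series(v, u):
    """Expansion of (v + u)/(1 + v*u) in powers of u, truncated after order u^3."""
    return v + u * (1 - v**2) + u**2 * (-v + v**3) + u**3 * (v**2 - v**4)

v, u = 0.3, 1e-2                 # sample values (assumption); u small
error = abs(boosted_velocity(v, u) - series(v, u))

print(boosted_velocity(v, u))
print(series(v, u))
print(error)                     # of order u^4, here roughly 2e-10
```

The truncation error is governed by the next term, $u^4(-v^3 + v^5)$, so it shrinks by a factor of about $10^4$ when $u$ is reduced by a factor of 10.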

We conclude that either conservation of momentum and energy fails in relativity
or the concept of momentum and energy is different. It is the latter. . .

6.2 Four-momentum of massive particles

Definition 6.1 (Four-momentum). Consider a particle represented by a timelike
world line $s \mapsto x(s)$. Let $u$ denote the four-velocity of the particle. Then
the particle's four-momentum is given by
\[
  p = m u , \tag{6.5}
\]
where $m$ is the mass of the particle.


Since $u^2 = \eta(u, u) = -1$ we have
\[
  p^2 = \eta(p, p) = -m^2 . \tag{6.6}
\]
Let $X$ be an arbitrary inertial observer. W.r.t. $X$, the four-momentum of the
particle is
\[
  p = m u = m \gamma \begin{pmatrix} 1 \\ \vec{v} \end{pmatrix} . \tag{6.7}
\]
Remark. In SI units (where $c \neq 1$) we obtain
\[
  p = m \gamma \begin{pmatrix} c \\ \vec{v} \end{pmatrix} . \tag{6.7$'$}
\]
Note that this leaves (6.6) untouched, since $\eta(x, y) = -\frac{1}{c^2} x^0 y^0 + \vec{x} \cdot \vec{y}$ in this
case.

As usual we distinguish a temporal and spatial part of $p$, i.e., we have
\[
  p = \begin{pmatrix} p^0 \\ \vec{p} \end{pmatrix} = m \gamma \begin{pmatrix} c \\ \vec{v} \end{pmatrix} . \tag{6.8}
\]
The vector $\vec{p}$ is the three-momentum of the particle w.r.t. the observer $X$,
\[
  \vec{p} = m \gamma \vec{v} . \tag{6.9}
\]
In particular it is different from the Newtonian momentum. However, in the
limit of small velocities we obtain
\[
  \vec{p} = m \vec{v} \Big( 1 + \frac{|\vec{v}|^2}{2 c^2} + O\Big( \frac{|\vec{v}|^4}{c^4} \Big) \Big) , \tag{6.10}
\]
i.e., we recover the Newtonian momentum for small velocities.


Remark. Despite the claim in numerous textbooks, it is a fact that the mass
of a particle does not (I repeat: does not) increase with the velocity of the
particle.¹ The idea of a "relativistic mass" has its roots in the obsession that
the momentum must be written as $\vec{p} = m \vec{v}$. This implies that one must tamper
with the mass and make it depend on $|\vec{v}|$. However, the simple truth is that
the three-momentum is not proportional to the three-velocity, but contains a
γ-factor, see (6.9). This leaves the mass as it is and as we love it: as a constant.
¹ This sentence is scandalous anyway, since it suggests that velocity is an absolute quantity.


The temporal part of $p$ is given by $p^0$. Evidently it has the dimension of a
momentum, hence $p^0 c$ has the dimension of an energy:
\[
  E = p^0 c = m \gamma c^2 . \tag{6.11}
\]
In the Newtonian approximation we obtain
\[
  E = m c^2 \Big( 1 + \frac{|\vec{v}|^2}{2 c^2} + \frac{3 |\vec{v}|^4}{8 c^4} + \ldots \Big)
    = m c^2 + \frac{m |\vec{v}|^2}{2} + \frac{3 m |\vec{v}|^4}{8 c^2} + \ldots . \tag{6.12}
\]
The energy $E$ is the energy of the particle (w.r.t. the observer $X$). W.r.t. a
frame where the particle is at rest the energy reduces to the rest energy,
\[
  E_{\mathrm{rest}} = m c^2 . \tag{6.13}
\]
This relationship between mass and energy is the foundation of the equivalence
of mass and energy.

W.r.t. a frame where the particle is in motion the energy consists of the rest
energy term and a term that is easily identified as the generalization of the
Newtonian kinetic energy. (In fact, for small velocities, the kinetic energy
reduces to the standard Newtonian kinetic energy.) Hence,
\[
  E = E_{\mathrm{rest}} + E_{\mathrm{kin}} , \tag{6.14}
\]
where
\[
  E_{\mathrm{kin}} = E - E_{\mathrm{rest}} = m c^2 (\gamma - 1)
                   = \frac{m |\vec{v}|^2}{2} + \frac{3 m |\vec{v}|^4}{8 c^2} + \ldots . \tag{6.15}
\]
In units where
\[
  c = 1 , \tag{6.16}
\]
the formulas are even simpler: Since
\[
  p = m \gamma \begin{pmatrix} 1 \\ \vec{v} \end{pmatrix} \tag{6.17}
\]
we have
\begin{align*}
  E &= p^0 = m \gamma , \tag{6.18a} \\
  \vec{p} &= m \gamma \vec{v} . \tag{6.18b}
\end{align*}
Moreover,
\[
  p^2 = \eta(p, p) = -E^2 + \vec{p}^{\,2} = -m^2 . \tag{6.19}
\]
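A short numerical illustration of (6.17)–(6.19) in units with $c = 1$: the sketch below builds the four-momentum, reads off energy and three-momentum, and compares the kinetic energy with its Newtonian value. The function names and the sample values $m = 2$, $|\vec{v}| = 0.6$ are assumptions.

```python
import math

def eta(a, b):
    """Minkowski product with signature (-, +, +, +) and c = 1."""
    return -a[0] * b[0] + sum(x * y for x, y in zip(a[1:], b[1:]))

def four_momentum(m, v):
    """Four-momentum m * gamma * (1, v) of a particle of mass m, cf. (6.17)."""
    gamma = 1.0 / math.sqrt(1.0 - sum(c * c for c in v))
    return [m * gamma] + [m * gamma * c for c in v]

m, v = 2.0, (0.6, 0.0, 0.0)      # sample values (assumption); gamma = 1.25
p = four_momentum(m, v)

E = p[0]                         # energy (6.18a): m * gamma = 2.5
E_kin = E - m                    # kinetic energy, (6.15) with c = 1: 0.5
E_newton = 0.5 * m * 0.6**2      # Newtonian kinetic energy: 0.36

print(E, p[1])                   # 2.5 and 1.5
print(eta(p, p))                 # -m^2 = -4, cf. (6.19)
print(E_kin, E_newton)
```

At $|\vec{v}| = 0.6$ the relativistic kinetic energy already exceeds the Newtonian value noticeably, in line with the higher-order terms in (6.15).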


6.3 Four-momentum of photons

Definition 6.2 (Four-momentum). Consider a photon represented by a null
line with null vector $k$. Then the photon's four-momentum is given by
\[
  p = \hbar k , \tag{6.20}
\]
where $\hbar$ is the reduced Planck constant.

This definition has its roots in quantum mechanics. When $\nu$ is the frequency
and $\omega$ the angular frequency of a photon, then its energy is given by $E = h \nu =
\hbar \omega$. Likewise the three-momentum is $\vec{p} = \hbar \vec{k}$. Since $\omega$ and $\vec{k}$ are the building
blocks of $k$, the definition (6.20) ensues.

Note that
p2 = η(p, p) = 0 (6.21)
for photons.

6.4 Four-momentum conservation

Suppose we have a collision of particles. The n incoming particles have four-


momenta pi , i = 1, 2, . . . , n. The n′ outgoing particles have four-momenta p′j ,
j = 1, 2, . . . , n′ . Then the four-momenta are conserved, i.e.,
X X
pi = p′j . (6.22)
i j

Since this equation includes the energies and the spatial components, it formu-
lates conservation of energy and (three-)momentum at the same time.
Example (Decay of a particle). Suppose we have one particle with mass $m$ that
splits into two particles, each of mass $m'$. Conservation of four-momentum
reads
$$p = p'_1 + p'_2\,, \tag{6.23}$$
where $p = mu$, $p'_1 = m'u'_1$, and $p'_2 = m'u'_2$, and where $u$, $u'_1$, $u'_2$ are the
four-velocities. W.l.o.g. we can do the computations in the rest frame of the
first particle, hence
$$m\begin{pmatrix} 1 \\ \vec o \end{pmatrix} = m'\gamma'_1\begin{pmatrix} 1 \\ \vec v'_1 \end{pmatrix} + m'\gamma'_2\begin{pmatrix} 1 \\ \vec v'_2 \end{pmatrix}. \tag{6.24}$$


The equation for the (three-)momentum implies that $\vec v'_1 = \vec v'$ and $\vec v'_2 = -\vec v'$ and
$\gamma'_1 = \gamma'_2 = \gamma'$. Therefore, the equation for the energy results in

$$m = 2m'\gamma' = 2m'\frac{1}{\sqrt{1 - |\vec v'|^2}}\,. \tag{6.25}$$

In particular we find
$$m > 2m'\,. \tag{6.26}$$

The Newtonian conservation of mass states that the sum of the masses on the
l.h. side equals the sum of the masses on the r.h. side. This type of conservation
of mass does not hold in general in special relativity.
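
As a small illustration of (6.25), the following sketch solves $m = 2m'\gamma'$ for the speed of the decay products in the rest frame of the decaying particle (units with $c = 1$). The function names are hypothetical and chosen only for this example.

```python
import math

def decay_speed(m, m_prime):
    """Speed |v'| (c = 1) of each decay product, obtained from m = 2 m' gamma'."""
    if m < 2 * m_prime:
        raise ValueError("need m >= 2 m' for the decay to be possible")
    gamma_prime = m / (2 * m_prime)          # energy equation (6.25)
    return math.sqrt(1 - 1 / gamma_prime**2)

def mass_deficit(m, m_prime):
    """Rest mass converted into kinetic energy of the two decay products."""
    return m - 2 * m_prime

v = decay_speed(10.0, 4.0)        # gamma' = 1.25
deficit = mass_deficit(10.0, 4.0)
print(v, deficit)
```

The mass deficit $m - 2m'$ is exactly the total kinetic energy $2m'(\gamma' - 1)$ of the two outgoing particles.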

6.5 Relativistic billiards²

To analyze the physics of billiards we study an elastic collision of two particles
of equal rest mass, one of which is originally at rest. An elastic collision is one
in which the masses of the particles are unchanged.

Let the four-momenta of the particles before the collision be denoted by $p$ and $q$,
respectively, and by $p'$ and $q'$ after the collision. Four-momentum conservation
implies
$$p + q = p' + q'\,. \tag{6.27}$$

W.l.o.g. we set the mass $m$ of the particles to one, so that $p^2 = -1$, $q^2 = -1$,
$p'^2 = -1$, $q'^2 = -1$ (where in our notation $p^2 = \eta(p, p)$, ...). From (6.27) we
are able to derive a number of simple identities,

$$\eta(p, p') = \eta(q, q')\,, \quad \eta(p, q) = \eta(p', q')\,, \quad \eta(p, q') = \eta(p', q)\,; \tag{6.28}$$

for instance, the latter follows by computing and equating the (Minkowski)
norms of $p - q'$ and $p' - q$.

By assumption, one of the particles is originally at rest; we choose this to be
the second particle, which is described by $q$. Hence the inertial frame we use is
represented by a four-velocity that is equal to $q$. This inertial frame is called
the laboratory frame.

² The treatment of relativistic billiards in this section is inspired by [Beig:2001] R. Beig,
Relativistic Billiards in the Lab Frame, Preprint UWTh-Ph-2001-9, 2001 (unpublished).


Let $\vartheta_{pp'}$ denote the angle between the (spatial) direction of the incoming
particle $p$ and the outgoing particle $p'$. Using (5.14) in (5.15) we get
$$\cos\vartheta_{pp'} = \frac{\eta(p, p') + \eta(q, p)\,\eta(q, p')}{\sqrt{-1 + \eta(q, p)^2}\,\sqrt{-1 + \eta(q, p')^2}}\,.$$
Since

$$\eta(p, p') = \eta(p, p + q - q') = -1 + \eta(p, q) - \eta(p, q') = -1 + \eta(p, q) - \eta(p', q)\,,$$

we further obtain
$$\cos\vartheta_{pp'} = \frac{(-1 + \eta(p, q))(1 + \eta(p', q))}{\sqrt{-1 + \eta(q, p)^2}\,\sqrt{-1 + \eta(q, p')^2}} = \sqrt{\frac{1 - \eta(p, q)}{-1 - \eta(p, q)}}\,\sqrt{\frac{-1 - \eta(p', q)}{1 - \eta(p', q)}}\,,$$

or, by straightforward manipulations,
$$\tan\vartheta_{pp'} = \frac{\sqrt{2\eta(p', q) - 2\eta(p, q)}}{\sqrt{1 - \eta(p, q)}\,\sqrt{-1 - \eta(p', q)}}\,.$$
In complete analogy we compute the angle $\vartheta_{pq'}$ between the (spatial) direction
of the incoming particle $p$ and the outgoing particle $q'$:
$$\tan\vartheta_{pq'} = \frac{\sqrt{2\eta(q', q) - 2\eta(p, q)}}{\sqrt{1 - \eta(p, q)}\,\sqrt{-1 - \eta(q', q)}}\,.$$

We can view the product $\eta(p, q)$ as given (since this quantity is characteristic
of the experimental set-up), while the quantities $\eta(p', q)$ and $\eta(q', q)$ appearing
in the formulas are unknown. However,

$$\eta(q', q) = \eta(p + q - p', q) = -1 + \eta(p, q) - \eta(p', q)\,,$$

hence the only true variable is $\eta(p', q)$. Consequently, we can eliminate $\eta(p', q)$
from the equations and then express $\vartheta_{pq'}$ as a function of $\vartheta_{pp'}$ and the given
quantity $\eta(p, q)$. The cleverest way is to simply multiply $\tan\vartheta_{pp'}$ and
$\tan\vartheta_{pq'}$; we obtain
$$\tan\vartheta_{pp'}\tan\vartheta_{pq'} = \frac{2}{1 - \eta(p, q)}\,. \tag{6.29}$$
Since $\eta(p, q) = -\gamma_v = -(1 - |v|^2)^{-1/2}$, where $v$ is the relative velocity between
the particles $p$ and $q$ (or, simply, the velocity of the particle $p$ in the lab frame),
we have
$$\tan\vartheta_{pp'}\tan\vartheta_{pq'} = \frac{2}{1 + \gamma_v}\,. \tag{6.30}$$


Consequently,
$$\vartheta_{pp'} + \vartheta_{pq'} < 90^\circ\,, \tag{6.31}$$
since $\gamma_v > 1$.
Remark. In the Newtonian limit we deal with small velocities, hence $\gamma \simeq 1$.
The formula thus reduces to

$$\tan\vartheta_{pp'}\tan\vartheta_{pq'} = 1\,, \tag{6.32}$$

which implies that $\vartheta_{pp'} + \vartheta_{pq'} = 90^\circ$. For a proof we use that

$$\tan a\,\tan b = \frac{\cos(a - b) - \cos(a + b)}{\cos(a - b) + \cos(a + b)}\,.$$

If you’ve ever played billiards, you needn’t be convinced that $90^\circ$ is the correct
result.
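
The relation (6.30) can be checked numerically. The sketch below does not follow the derivation in the text: as an assumption made only for this illustration, it constructs the collision in the centre-of-momentum frame (where each unit-mass particle merely changes direction), boosts the outgoing four-momenta to the lab frame, and reads off the two angles. The helper names `boost_x` and `lab_angles` are hypothetical.

```python
import math

def boost_x(p, u):
    """Map p = (E, px, py) from a frame S' to the frame S in which S' moves with velocity u along +x (c = 1)."""
    g = 1.0 / math.sqrt(1.0 - u * u)
    E, px, py = p
    return (g * (E + u * px), g * (px + u * E), py)

def lab_angles(v, theta_cm):
    """Lab-frame angles of the outgoing unit-mass particles p' and q'.

    v        -- lab speed of the incoming particle p (q is at rest)
    theta_cm -- scattering angle in the centre-of-momentum frame (free parameter)
    """
    gamma_v = 1.0 / math.sqrt(1.0 - v * v)
    u = gamma_v * v / (gamma_v + 1.0)     # total momentum / total energy
    g_u = 1.0 / math.sqrt(1.0 - u * u)
    k = g_u * u                           # momentum of each particle in that frame
    c, s = math.cos(theta_cm), math.sin(theta_cm)
    p_out = boost_x((g_u, k * c, k * s), u)
    q_out = boost_x((g_u, -k * c, -k * s), u)
    th_p = math.atan2(abs(p_out[2]), p_out[1])
    th_q = math.atan2(abs(q_out[2]), q_out[1])
    return th_p, th_q, gamma_v

th_p, th_q, gamma_v = lab_angles(0.8, 1.0)
product = math.tan(th_p) * math.tan(th_q)
print(product, 2.0 / (1.0 + gamma_v), math.degrees(th_p + th_q))
```

The single free parameter `theta_cm` plays the role of the one unknown that four-momentum conservation leaves undetermined; the product of the tangents does not depend on it.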

Let us conclude this section with an extended remark. Let us try to under-
stand what four-momentum conservation is able to tell us. We assume that p
and q are known from the experimental set-up of the problem; in particular,
the (three-)velocity associated with p is given. Therefore, since the masses
are known, the unknowns of the problem are the (three-)velocities of the two
particles after the collision. (From their three-velocities we are able to con-
struct their four-velocities.) In other words, there are 6 unknown variables.
On the other hand, four-momentum conservation provides us with 4 equations.
Consequently, we will not be able to solve the problem completely when we
only use the conservation equations; to completely solve the problem we must
analyze the equations of motion and the (field) theory that models the in-
teraction between the particles. In the present case, however, and in a large
number of other cases, we pose a type of question that can be answered by
using four-momentum conservation alone.

Let us discuss the problem at hand in detail. Physical intuition suggests that
the problem is effectively two-dimensional; the three-velocities involved will
be coplanar (i.e., lie in a plane). In fact, this is the first result implied by
four-momentum conservation. (We have thus used one of the four equations
of (6.27); three more to go.) Since the three-velocity associated with p must lie
in the plane (and p is known), the unknown position of the plane corresponds
to one degree of freedom (which is, e.g., a variable representing the plane’s
rotational angle about the axis through p). The one equation of (6.27) that
we have used so far thus reduces the 6 unknowns to 4 + 1 (the latter being


the plane’s angle). There are 4 remaining unknowns which we take to be the
following: The absolute values of the velocities $v_{p'}$ and $v_{q'}$ of the outgoing
particles and the angles $\vartheta_{pp'}$ and $\vartheta_{pq'}$ between the (spatial) direction of the
incoming particle $p$ and the outgoing particles $p'$ and $q'$, respectively. Since
four-momentum conservation provides three conditions that we have not used
up yet, we will be able to compute all four unknowns $v_{p'}$, $v_{q'}$, $\vartheta_{pp'}$, and $\vartheta_{pq'}$
except for one that remains unspecified. For example, we will be able to express
three of the variables as functions of the remaining one. (A good choice for
the remaining unknown is $v_{p'}$, which is equivalent to $\eta(p', q)$.) The reader is
encouraged to reread the analysis of this section in the light of the above.
Da capo al fine.

6.6 The Compton effect

Let us consider a set-up that is similar to the one for the billiards. We assume
that $p$ is the four-momentum of an incoming photon that scatters off an electron
with four-momentum $q$. The interaction leads to $p'$ for the photon and $q'$ for
the electron,
$$p + q = p' + q'\,. \tag{6.33}$$
The vectors $p$ and $p'$ are null vectors, $p^2 = 0$ and $p'^2 = 0$, while $q^2 = -m_e^2$ and
$q'^2 = -m_e^2$, where $m_e$ is the electron mass.

The lab frame is the inertial frame where the electron is at rest initially, i.e.,
the inertial frame with four-velocity $u$ such that

$$q = m_e u\,. \tag{6.34}$$

Let us compute the angle $\vartheta_{pp'}$ between the direction of the incoming photon $p$
and the outgoing photon $p'$ w.r.t. the lab frame. Since $p$ and $p'$ are null we can
directly apply (5.17′), i.e.,

$$\cos\vartheta_{pp'} - 1 = \frac{\eta(p, p')}{\eta(u, p)\,\eta(u, p')}\,. \tag{6.35}$$
By definition, $p = \hbar k$, where $k$ is the null vector of the photon; hence

$$\eta(u, p) = \hbar\,\eta(u, k) = -\hbar\omega\,, \tag{6.36}$$

and analogously, $\eta(u, p') = -\hbar\omega'$; here, $\omega$ and $\omega'$ are the (angular) frequencies
of the photons $p$ and $p'$, respectively, as seen in the lab frame defined by $u$.


The product $\eta(p, p')$ is also related to the angular frequencies. When we
multiply (6.33) with $p$ we obtain

$$\eta(p, p) + \eta(p, q) = \eta(p, p') + \eta(p, q') = \eta(p, p') + \eta(p', q)\,,$$

where we have used (6.28) in the last step. Since $\eta(p, p) = 0$ we conclude that
$$\eta(p, p') = \eta(q, p) - \eta(q, p') = m_e\hbar\left[\eta(u, k) - \eta(u, k')\right] = m_e\hbar\left(-\omega + \omega'\right).$$

Combining these results leads to

$$\cos\vartheta_{pp'} - 1 = \frac{m_e(-\omega + \omega')}{\hbar\,\omega\omega'} \tag{6.37}$$
or
$$\frac{1}{\omega'} - \frac{1}{\omega} = \frac{\hbar}{m_e}\left(1 - \cos\vartheta_{pp'}\right), \tag{6.38}$$
which is in turn equivalent to
$$\lambda' - \lambda = \frac{h}{m_e}\left(1 - \cos\vartheta_{pp'}\right), \tag{6.38$'$}$$

where $\lambda$ and $\lambda'$ denote the wave lengths of $p$ and $p'$. In units where $c \neq 1$ the
factor on the r.h. side reads
$$\lambda_C = \frac{h}{m_e c}\,; \tag{6.39}$$
this is the Compton wave length.

Equation (6.38) shows how the change of wave length of the scattered photon
depends on the scattering angle.
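
To get a feeling for the size of the effect, the following sketch evaluates (6.38′) in SI units, i.e., with the factor $\lambda_C = h/(m_e c)$ from (6.39). The numerical constants are standard SI/CODATA values and are not taken from these notes; the function name is hypothetical.

```python
import math

H = 6.62607015e-34       # Planck constant in J s (exact SI value)
M_E = 9.1093837015e-31   # electron mass in kg (CODATA 2018)
C = 299792458.0          # speed of light in m/s (exact SI value)

COMPTON_WAVELENGTH = H / (M_E * C)   # lambda_C from (6.39), in metres

def compton_shift(theta):
    """Wave length shift lambda' - lambda in metres for scattering angle theta (radians)."""
    return COMPTON_WAVELENGTH * (1.0 - math.cos(theta))

for degrees in (0, 90, 180):
    print(degrees, compton_shift(math.radians(degrees)))
```

The shift vanishes in the forward direction, equals $\lambda_C$ (about $2.4 \times 10^{-12}$ m) at $90^\circ$, and reaches its maximum $2\lambda_C$ for backscattering; it is independent of the wave length of the incoming photon.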



CHAPTER 7

ACCELERATED MOTION

7.1 Acceleration

Consider a particle represented by a timelike world line

$$\mathbb{R} \ni s \mapsto x^\mu(s)\,, \tag{7.1}$$

where $s$ is proper time. The four-velocity is given by the derivative of $x^\mu(s)$
w.r.t. $s$, i.e.,
$$u^\mu(s) = \dot x^\mu(s) = \frac{d}{ds}x^\mu(s)\,. \tag{7.2}$$
Definition 7.1. The four-acceleration is the second derivative of $x(s)$ w.r.t.
proper time, i.e.,
$$a^\mu(s) = \ddot x^\mu(s) = \frac{d^2}{ds^2}x^\mu(s)\,. \tag{7.3}$$

Differentiating the condition $\eta(\dot x(s), \dot x(s)) = -1$ we obtain

$$\eta(\dot x(s), \ddot x(s)) = 0\,,$$

which is the same as

$$\eta(u, a) = 0\,. \tag{7.4}$$


Hence, the four-acceleration $a^\mu$ is orthogonal (in the Minkowski sense) to the
four-velocity $u^\mu$. This makes the four-acceleration a spacelike vector, see
proposition 3.5.

Suppose a particle with world line $x^\mu(s)$ has vanishing four-acceleration. From
$\ddot x^\mu(s) = 0$ it follows that $\dot x^\mu(s)$ is a constant four-vector (whose norm must be
$-1$ because $s$ is supposed to be proper time). Accordingly, $x^\mu(s)$ describes a
straight world line, i.e., a particle in uniform motion.

Evidently, the concept of four-acceleration is rather simple. However, to make
contact with the concept of ‘three-acceleration’ we have to work hard. We begin
by recalling that (w.r.t. a given observer) a four-velocity $u^\mu$ can be written as
$$u^\mu = \begin{pmatrix} u^0 \\ \vec u \end{pmatrix} = \gamma\begin{pmatrix} 1 \\ \vec v \end{pmatrix}. \tag{7.5}$$

Hence the three-velocity $\vec v$ (which is the relative velocity between the particle
and the observer) has a simple relationship with the spatial components of the
four-velocity; namely, $\vec u = \gamma\vec v$. For accelerations life is not so simple. Let us
compute $a^\mu(s)$ from $u^\mu(s)$:
$$a^\mu = \frac{d}{ds}\left[\gamma\begin{pmatrix} 1 \\ \vec v \end{pmatrix}\right] = \frac{d\gamma}{ds}\begin{pmatrix} 1 \\ \vec v \end{pmatrix} + \gamma\begin{pmatrix} 0 \\ \frac{d\vec v}{ds} \end{pmatrix}. \tag{7.6}$$

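Since $ds = \gamma^{-1}dt$, see (4.11), we obtain

$$a^\mu = \gamma\frac{d\gamma}{dt}\begin{pmatrix} 1 \\ \vec v \end{pmatrix} + \gamma^2\begin{pmatrix} 0 \\ \frac{d\vec v}{dt} \end{pmatrix} = \gamma^4\left(\vec v\,\frac{d\vec v}{dt}\right)\begin{pmatrix} 1 \\ \vec v \end{pmatrix} + \gamma^2\begin{pmatrix} 0 \\ \frac{d\vec v}{dt} \end{pmatrix}, \tag{7.7}$$

where we have used that $d\gamma/dt = \gamma^3\,\vec v\,(d\vec v/dt)$. We call

$$\vec a_N = \frac{d\vec v}{dt} \tag{7.8}$$
the Newtonian three-acceleration. Accordingly,
$$a^\mu = \gamma^4(\vec v\,\vec a_N)\begin{pmatrix} 1 \\ \vec v \end{pmatrix} + \gamma^2\begin{pmatrix} 0 \\ \vec a_N \end{pmatrix}. \tag{7.9}$$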
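
The identity for $d\gamma/dt$ that is used in (7.7) follows from the chain rule applied to $\gamma = (1 - \vec v^{\,2})^{-1/2}$. The short computation below spells out this step; as elsewhere in this section, the product of two three-vectors written side by side denotes the Euclidean scalar product.

```latex
\frac{d\gamma}{dt}
  = \frac{d}{dt}\left(1 - \vec v\,\vec v\right)^{-1/2}
  = -\frac{1}{2}\left(1 - \vec v^{\,2}\right)^{-3/2}
    \left(-2\,\vec v\,\frac{d\vec v}{dt}\right)
  = \gamma^3\,\vec v\,\frac{d\vec v}{dt}\,.
```

Multiplying by the additional factor $\gamma$ in front of $d\gamma/dt$ gives the coefficient $\gamma^4\,(\vec v\,\vec a_N)$ in (7.9).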

The Newtonian three-acceleration $\vec a_N$ is a poor measure of the acceleration.
To see this we consider a particle whose velocity is close to the speed of light
(w.r.t. the given observer). When this particle is further accelerated, its velocity
increases only marginally; hence $\vec a_N$ is essentially zero. However, the momentum
still increases by large amounts because $\vec p = m\gamma\vec v$ (and not $\vec p = m\vec v$). A
large gain in momentum should correspond to a large acceleration. This is not
properly reflected by the quantity $\vec a_N$.

In search of a better measure for the three-acceleration we analyze the particle
in a coordinate system in which the particle is instantaneously at rest. Let $\tilde X$
be a coordinate system (with coordinates $\{\tilde t, \tilde x^i\}$) such that the particle is at
rest at time $\tilde t = \tilde t_r$. Hence the velocity $\tilde{\vec v}(\tilde t)$ (which is the relative velocity of
the particle w.r.t. $\tilde X$) satisfies $\tilde{\vec v}(\tilde t_r) = \vec o$, i.e.,
$$\tilde u^\mu = \tilde\gamma\begin{pmatrix} 1 \\ \tilde{\vec v} \end{pmatrix}, \qquad \tilde u^\mu\Big|_{\tilde t = \tilde t_r} = \begin{pmatrix} 1 \\ \vec o \end{pmatrix}. \tag{7.10}$$
Furthermore, inserting $\tilde{\vec v}(\tilde t_r) = \vec o$ in (7.7) we get
$$\tilde a^\mu\Big|_{\tilde t = \tilde t_r} = \begin{pmatrix} 0 \\ \frac{d\tilde{\vec v}}{d\tilde t}\Big|_{\tilde t = \tilde t_r} \end{pmatrix}. \tag{7.11}$$
We call
$$\vec a_p = \frac{d\tilde{\vec v}}{d\tilde t}\Big|_{\tilde t = \tilde t_r} \tag{7.12}$$
the proper acceleration of the particle; it is the (Newtonian) acceleration
measured in a momentary rest frame.
Remark. It is not difficult to argue that the proper acceleration is the actual
acceleration experienced by the particle. Namely, in the momentary rest frame
the particle’s velocities for times $\tilde t$ in a small interval $[\tilde t_r - \epsilon, \tilde t_r + \epsilon]$ are small.
Consequently, Newtonian physics is a good approximation in $[\tilde t_r - \epsilon, \tilde t_r + \epsilon]$ (and,
in fact, exact at $\tilde t = \tilde t_r$), so that the Newtonian concepts (like $d\tilde{\vec v}/d\tilde t$) reflect
the physical reality in a correct way.

We write (7.11) as
$$a^\mu = \begin{pmatrix} 0 \\ \vec a_p \end{pmatrix}, \tag{7.13}$$
where it is understood that the coordinate system that is used is a momentary
rest frame. From (7.13) it is easy to see that

$$a^2 = \eta(a, a) = \vec a_p^{\,2}\,. \tag{7.14}$$

Hence the magnitude of the proper acceleration coincides with the norm of the
four-acceleration.


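Our next aim is to express the proper acceleration $\vec a_p$ in terms of the Newtonian
acceleration $\vec a_N$ for arbitrary observers. Equation (7.9) results in

$$\begin{aligned} a^2 &= \gamma^8(\vec v\,\vec a_N)^2(-1 + \vec v^{\,2}) + 2\gamma^6(\vec v\,\vec a_N)^2 + \gamma^4\vec a_N^{\,2} = \gamma^6(\vec v\,\vec a_N)^2 + \gamma^4\vec a_N^{\,2} \\ &= \gamma^6\left[(\vec v\,\vec a_N)^2 + (1 - \vec v^{\,2})\,\vec a_N^{\,2}\right] = \gamma^6\left[1 - \vec v^{\,2}\sin^2\alpha\right]\vec a_N^{\,2}\,, \end{aligned} \tag{7.15}$$

where $\alpha$ is the angle between $\vec v$ and $\vec a_N$.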
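
Equation (7.15) is easy to verify numerically. The sketch below builds the components of $a^\mu$ from (7.9) for a sample velocity and Newtonian acceleration (arbitrary values chosen for illustration, $c = 1$) and compares its Minkowski norm with the right-hand side of (7.15). The function names are hypothetical.

```python
import math

def dot(x, y):
    """Euclidean scalar product of two three-vectors."""
    return sum(a * b for a, b in zip(x, y))

def four_acceleration(v, a_n):
    """Components of a^mu according to (7.9); v and a_n are three-vectors, c = 1."""
    gamma = 1.0 / math.sqrt(1.0 - dot(v, v))
    va = dot(v, a_n)
    a0 = gamma**4 * va
    spatial = [gamma**4 * va * vi + gamma**2 * ai for vi, ai in zip(v, a_n)]
    return [a0] + spatial

def minkowski_norm(a):
    """eta(a, a) with signature (-, +, +, +)."""
    return -a[0] ** 2 + dot(a[1:], a[1:])

def rhs_7_15(v, a_n):
    """gamma^6 (1 - v^2 sin^2 alpha) a_N^2, with alpha the angle between v and a_N."""
    v2, an2 = dot(v, v), dot(a_n, a_n)
    gamma = 1.0 / math.sqrt(1.0 - v2)
    cos_alpha = dot(v, a_n) / math.sqrt(v2 * an2)
    return gamma**6 * (1.0 - v2 * (1.0 - cos_alpha**2)) * an2

v = (0.3, 0.4, 0.0)
a_n = (1.0, 2.0, 0.5)
print(minkowski_norm(four_acceleration(v, a_n)), rhs_7_15(v, a_n))
```

Both numbers agree up to rounding, and they are positive, as they must be for a spacelike four-acceleration.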

If $\vec v$ and $\vec a_N$ are (anti)parallel, i.e., in the longitudinal case, we have $\alpha = 0$
(or $\alpha = 180^\circ$), so that
$$a^2 = \vec a_p^{\,2} = \gamma^6\vec a_N^{\,2}$$
and thus
$$\vec a_p = \gamma^3\vec a_N\,. \tag{7.16}$$
Since the problem is effectively (spatially) one-dimensional, we can simply write

$$a_p = \gamma^3 a_N\,, \tag{7.16$'$}$$

where $a_p$ and $a_N$ denote the magnitudes of the respective accelerations. This
leads to the formula
$$a_p = \gamma^3\frac{dv}{dt} \tag{7.16$''$}$$
that describes the linear (proper) acceleration of a particle (where the word
‘linear’ is synonymous with ‘longitudinal’).

If $\vec v$ and $\vec a_N$ are orthogonal, i.e., in the transversal case, we have $\alpha = 90^\circ$, so
that
$$a^2 = \vec a_p^{\,2} = \gamma^4\vec a_N^{\,2}$$
and thus
$$\vec a_p = \gamma^2\vec a_N\,, \tag{7.17}$$
which describes the orthogonal (proper) acceleration of a particle.

Combining the two formulas it follows that in the general case

$$\vec a_p = \gamma^3\vec a_N^{\parallel} + \gamma^2\vec a_N^{\perp}\,, \tag{7.18}$$

where $\vec a_N^{\parallel}$ is the component of $\vec a_N$ parallel to $\vec v$ and $\vec a_N^{\perp}$ the component orthogonal
to $\vec v$, i.e.,
$$\vec a_N^{\parallel} = (\vec a_N\,\vec v)\,\frac{\vec v}{|\vec v|^2}\,, \qquad \vec a_N^{\perp} = \vec a_N - (\vec a_N\,\vec v)\,\frac{\vec v}{|\vec v|^2}\,.$$


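Remark. Finally, the four-acceleration can be expressed in terms of the proper
acceleration as
$$a^\mu = \gamma(\vec v\,\vec a_p)\begin{pmatrix} 1 \\ \frac{\vec v}{\vec v^{\,2}} \end{pmatrix} + \begin{pmatrix} 0 \\ \vec a_p - (\vec v\,\vec a_p)\frac{\vec v}{\vec v^{\,2}} \end{pmatrix} = \gamma\begin{pmatrix} \vec v\,\vec a_p \\ \vec a_p^{\parallel} \end{pmatrix} + \begin{pmatrix} 0 \\ \vec a_p^{\perp} \end{pmatrix}.$$

Remark. Occasionally, equation (7.15) is expressed not in terms of $\vec a_N = d\vec v/dt$
but in $d\vec v/ds$. Since $d\vec v/ds = \gamma\,d\vec v/dt$, we have
$$a^2 = \gamma^4\left[1 - \vec v^{\,2}\sin^2\alpha\right]\left(\frac{d\vec v}{ds}\right)^2 \tag{7.15$'$}$$
and thus
$$a_p = \gamma^2\frac{dv}{ds} \tag{7.16$'''$}$$
in the case of linear acceleration.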

7.2 Constant linear acceleration

Constant acceleration means constant proper acceleration. Since $a^2 = \vec a_p^{\,2}$, we
require that the norm of the four-acceleration is constant, i.e.,
$$a^2 = \eta(a, a) = \mathrm{const}\,. \tag{7.19}$$

In the case of linear acceleration, the motion of the particle is along a straight
line (in space). In slight abuse of notation we denote by $a$ also the constant
value of the acceleration, i.e., $a = \sqrt{\eta(a, a)} = \mathrm{const}$. Equation (7.16″) reads
$$\gamma^3\frac{dv}{dt} = a = \mathrm{const}\,, \tag{7.20}$$
which is a differential equation that can be integrated rather easily by noting that
$$\int(1 - v^2)^{-3/2}\,dv = \frac{v}{\sqrt{1 - v^2}}\,. \tag{7.21}$$
Exercise. (A time travel to the beginner’s course in analysis). If I don’t want
to bother Mathematica or Maple, then a standard way to solve the integral is
the following. We substitute $v$ by $\sin w$ and get
$$\int(1 - v^2)^{-3/2}\,dv = \int\frac{1}{\cos^2 w}\,dw = \tan w = \tan\arcsin v = \frac{v}{\sqrt{1 - v^2}}\,;$$
that’s it.


Solving
$$\frac{v}{\sqrt{1 - v^2}} = at$$
for $v$ yields
$$v(t) = \pm\frac{at}{\sqrt{1 + a^2t^2}}\,; \tag{7.22}$$
note that by neglecting the constant of integration we have set $v(0) = 0$.
Another integration then leads to
$$x(t) = x_0 \pm\frac{1}{a}\left(\sqrt{1 + a^2t^2} - 1\right). \tag{7.23}$$
When we consider the case $x_0 = \pm 1/a$, then
$$x(t) = \pm\frac{1}{a}\sqrt{1 + a^2t^2}\,. \tag{7.24}$$
Since
$$-t^2 + x(t)^2 = \frac{1}{a^2}\,, \tag{7.25}$$
the world line of the particle
$$t \mapsto \begin{pmatrix} t \\ x(t) \\ 0 \\ 0 \end{pmatrix} \tag{7.26}$$
represents an (equilateral) hyperbola. Therefore, constant linear acceleration
leads to ‘hyperbolic motion’.
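
The following sketch checks the hyperbolic motion numerically for the branch with the plus sign: it evaluates (7.22) and (7.24), confirms the invariant (7.25), and verifies the differential equation (7.20) with a central finite difference. The sample value of $a$, the step size, and the function names are assumptions made for this illustration.

```python
import math

def velocity(t, a):
    """v(t) from (7.22), plus sign, c = 1."""
    return a * t / math.sqrt(1.0 + (a * t) ** 2)

def position(t, a):
    """x(t) from (7.24), plus sign, i.e. x(0) = 1/a."""
    return math.sqrt(1.0 + (a * t) ** 2) / a

def proper_acceleration(t, a, h=1e-5):
    """gamma^3 dv/dt, with dv/dt approximated by a central difference."""
    v = velocity(t, a)
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    dv_dt = (velocity(t + h, a) - velocity(t - h, a)) / (2.0 * h)
    return gamma**3 * dv_dt

a = 0.5
for t in (0.0, 1.0, 10.0):
    invariant = -t * t + position(t, a) ** 2     # should equal 1/a^2
    print(t, velocity(t, a), invariant, proper_acceleration(t, a))
```

The velocity approaches 1 for large $t$ but never reaches it, while the proper acceleration stays at the prescribed value $a$.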

In the following we present an alternative derivation of this result that is better
adapted to the four-vector formalism (and more elegant). Linear acceleration
means that the motion of the particle is along a straight line (in space); hence,
the world line of the particle must be in a two-dimensional plane (which we
choose to be the $(t, x)$-plane). The four-acceleration
$$a^\mu = \begin{pmatrix} a^0 \\ a^1 \\ 0 \\ 0 \end{pmatrix} \tag{7.27}$$
satisfies $a^2 = \eta(a, a) = -(a^0)^2 + (a^1)^2 = \mathrm{const}$, hence we can parametrize $a^\mu$
as
$$a^\mu(s) = a\begin{pmatrix} \sinh\alpha(s) \\ \cosh\alpha(s) \\ 0 \\ 0 \end{pmatrix}, \tag{7.28}$$
–98– version 20/01/2010


Chapter 7. Accelerated motion Constant linear acceleration

where $\alpha(s)$ is a function of $s$ that is unspecified a priori. Since $u^\mu$ must be
orthogonal to $a^\mu$, see (7.4), and normalized, $\eta(u, u) = -1$, we automatically
obtain
$$u^\mu(s) = \begin{pmatrix} \cosh\alpha(s) \\ \sinh\alpha(s) \\ 0 \\ 0 \end{pmatrix}. \tag{7.29}$$

On the other hand,
$$\frac{d}{ds}u^\mu = a^\mu\,, \tag{7.30}$$
from which we easily infer that $\alpha(s) = as$ (plus a constant which we set to
zero). Integrating $u^\mu$ we arrive at $x^\mu$,
$$x^\mu(s) = \frac{1}{a}\begin{pmatrix} \sinh as \\ \cosh as \\ 0 \\ 0 \end{pmatrix}, \tag{7.31}$$

where we have again set the constants of integration to zero. This equation
describes a hyperbola in Minkowski space.
Exercise. In units where $c \neq 1$ we have

$$t(s) = \frac{c}{a}\sinh\left(\frac{as}{c}\right) \quad\text{and}\quad x(s) = \frac{c^2}{a}\cosh\left(\frac{as}{c}\right).$$

Since $x(0)$ is $c^2/a$, the distance $\Delta x$ a particle covers in proper time $s$ is

$$\Delta x = \frac{c^2}{a}\left[\cosh\left(\frac{as}{c}\right) - 1\right].$$

Suppose that the acceleration $a$ equals $1g$, where $g$ is the acceleration due to
gravity. (This is the most pleasant acceleration for spacefarers.) Then
$$\Delta x\,[\text{in light years}] = 0.97\left[\cosh\left(1.03\,s\,[\text{in years}]\right) - 1\right].$$

Compute how long it takes to get to the center of our galaxy (which is about
25000 light years from us). (Be amazed: it’s only 10 years 6 months.)
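
A quick numerical check of the figures quoted in the exercise can be done as follows. The sketch inverts $\Delta x = (c^2/a)[\cosh(as/c) - 1]$ for the proper time $s$. The value $g = 9.81\ \mathrm{m/s^2}$ and the Julian year of 365.25 days are assumptions of this sketch, and the names are hypothetical; as in the exercise, the rocket accelerates all the way and does not brake at the destination.

```python
import math

C = 299792458.0                  # speed of light in m/s
G = 9.81                         # assumed value of 1g in m/s^2
YEAR = 365.25 * 24 * 3600.0      # Julian year in seconds
LIGHT_YEAR = C * YEAR            # in metres

def length_scale_ly(a=G):
    """c^2/a expressed in light years (about 0.97 for a = 1g)."""
    return C**2 / a / LIGHT_YEAR

def rate_per_year(a=G):
    """a/c expressed in 1/years (about 1.03 for a = 1g)."""
    return a / C * YEAR

def proper_time_years(distance_ly, a=G):
    """Proper time needed to cover distance_ly, starting from rest."""
    return math.acosh(distance_ly / length_scale_ly(a) + 1.0) / rate_per_year(a)

print(length_scale_ly(), rate_per_year(), proper_time_years(25000.0))
```

Because the distance grows like $\cosh$, the required proper time grows only logarithmically with the distance: doubling the distance adds well under a year.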


7.3 Circular motion

Consider a particle in uniform circular motion. W.r.t. some observer its world
line is given by
$$x^\mu(t) = \begin{pmatrix} t \\ r\cos\omega t \\ r\sin\omega t \\ 0 \end{pmatrix}, \tag{7.32}$$
where $r$ is the radius of the circular orbit and $\omega$ the constant angular velocity.
Since $|\vec v| = \omega r$, the product $\omega r$ must be less than 1 to ensure that the circular
velocity is less than the speed of light.

The four-velocity is given by
$$u^\mu(t) = \gamma\begin{pmatrix} 1 \\ -r\omega\sin\omega t \\ r\omega\cos\omega t \\ 0 \end{pmatrix}, \tag{7.33}$$
where
$$\gamma = \frac{1}{\sqrt{1 - \omega^2r^2}} = \mathrm{const}\,.$$
This implies that proper time $s$ is a linear function of $t$, namely
$$s = \int\gamma^{-1}\,dt = \gamma^{-1}t\,. \tag{7.34}$$

Consequently, when we parametrize $x^\mu$ and $u^\mu$ w.r.t. proper time we get
$$x^\mu(s) = \begin{pmatrix} \gamma s \\ r\cos(\omega\gamma s) \\ r\sin(\omega\gamma s) \\ 0 \end{pmatrix},$$
$$u^\mu(s) = \frac{d}{ds}x^\mu(s) = \gamma\begin{pmatrix} 1 \\ -r\omega\sin(\omega\gamma s) \\ r\omega\cos(\omega\gamma s) \\ 0 \end{pmatrix}.$$
The four-acceleration is the derivative of $u^\mu(s)$ w.r.t. $s$, i.e.,
$$a^\mu(s) = \frac{d}{ds}u^\mu(s) = \gamma^2\begin{pmatrix} 0 \\ -r\omega^2\cos(\omega\gamma s) \\ -r\omega^2\sin(\omega\gamma s) \\ 0 \end{pmatrix}. \tag{7.35}$$


It is easy to see that the magnitude of the acceleration is constant,
$$a^2 = \left(\gamma^2r\omega^2\right)^2. \tag{7.36}$$

Furthermore, the spatial part of the acceleration $\vec a$ is orthogonal to $\vec v$.

Remark. An alternative derivation of (7.35) uses the general considerations of
section 7.1. We have
$$\vec v = \begin{pmatrix} -r\omega\sin\omega t \\ r\omega\cos\omega t \\ 0 \end{pmatrix} \quad\text{and therefore}\quad \vec a_N = \begin{pmatrix} -r\omega^2\cos\omega t \\ -r\omega^2\sin\omega t \\ 0 \end{pmatrix}.$$

Inserting this result into (7.9) leads to (7.35). The proper acceleration is

$$\vec a_p = \gamma^2\vec a_N$$

and the vector $\vec a$ (which is the spatial part of the four-acceleration) is equal to $\vec a_p$.
Since the acceleration (any of $\vec a_N$, $\vec a_p$, $\vec a$) is orthogonal to $\vec v$ we are in the
‘transversal case’ of section 7.1.

The magnitude $a$ of the acceleration (which coincides with $|\vec a_p|$) is given by

$$a = \gamma^2r\omega^2\,, \tag{7.37}$$

which can be written as
$$a = \frac{r\omega^2}{1 - r^2\omega^2}\,. \tag{7.38}$$
This is the formula for the relativistic centripetal acceleration. In the Newtonian
case we simply have
$$a_N = r\omega^2\,. \tag{7.39}$$
The relativistic centripetal acceleration is larger and diverges as $r\omega \to 1$.
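
As a consistency check of this section, the sketch below evaluates the four-velocity and the four-acceleration (7.35) at some proper time and confirms $\eta(u, u) = -1$, $\eta(u, a) = 0$, and $\sqrt{\eta(a, a)} = r\omega^2/(1 - r^2\omega^2)$. The sample values and function names are assumptions made for illustration ($c = 1$).

```python
import math

def eta(a, b):
    """Minkowski product with signature (-, +, +, +)."""
    return -a[0] * b[0] + sum(x * y for x, y in zip(a[1:], b[1:]))

def four_velocity(r, omega, s):
    """u^mu(s) for uniform circular motion, parametrized by proper time s."""
    gamma = 1.0 / math.sqrt(1.0 - (r * omega) ** 2)
    phase = omega * gamma * s
    return [gamma, -gamma * r * omega * math.sin(phase),
            gamma * r * omega * math.cos(phase), 0.0]

def four_acceleration(r, omega, s):
    """a^mu(s) from (7.35)."""
    gamma = 1.0 / math.sqrt(1.0 - (r * omega) ** 2)
    phase = omega * gamma * s
    k = gamma**2 * r * omega**2
    return [0.0, -k * math.cos(phase), -k * math.sin(phase), 0.0]

def centripetal_acceleration(r, omega):
    """Relativistic centripetal acceleration (7.38)."""
    return r * omega**2 / (1.0 - (r * omega) ** 2)

r, omega, s = 0.6, 1.0, 0.7
u, a = four_velocity(r, omega, s), four_acceleration(r, omega, s)
print(eta(u, u), eta(u, a), math.sqrt(eta(a, a)), centripetal_acceleration(r, omega))
```

For $r\omega = 0.6$ the relativistic value exceeds the Newtonian one, $r\omega^2$, by the factor $\gamma^2 = 1/0.64$.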

7.4 The relativistic rocket

The rocket we consider is represented by a world line $x(s)$ in Minkowski
spacetime, where we choose to parametrize the world line by proper time $s$; at lift-off
we set $s = 0$. If we denote the rocket’s four-velocity by $u = \dot x(s)$ and by $m$ its
mass, its four-momentum is given by

$$p = mu\,. \tag{7.40}$$


Since a rocket obtains thrust by ejecting its propellant, its mass $m$ decreases
with time. We are thus careful and write

$$p(s) = m(s)\,u(s)\,. \tag{7.40$'$}$$

The four-momentum of the propellant (let’s say ‘gas’ for simplicity) that is
emitted in a small time interval $[s, s + ds]$ is given by

$$dp_{\mathrm{gas}} = dm_{\mathrm{gas}}\,u_{\mathrm{gas}}\,, \tag{7.41}$$

where $dm_{\mathrm{gas}} = \dot m_{\mathrm{gas}}\,ds$ is the exhaust mass. (Note that each of the quantities
depends on $s$ in general.) Consequently, the four-momentum of the gas that
has been emitted up to time $s$ is given by
$$p_{\mathrm{gas}} = \int dm_{\mathrm{gas}}\,u_{\mathrm{gas}} = \int_0^s \dot m_{\mathrm{gas}}(s')\,u_{\mathrm{gas}}(s')\,ds'\,. \tag{7.42}$$

The overdot denotes differentiation w.r.t. proper time $s$.

Conservation of momentum means that the total momentum, i.e., the sum of
the momentum of the rocket and the gas, remains constant, i.e.,

$$p + p_{\mathrm{gas}} = \mathrm{const}\,; \tag{7.43}$$

recall that $p = p(s)$ is the momentum of the rocket and $p_{\mathrm{gas}}$ the momentum of
the gas emitted up to time $s$. Differentiating (7.43) we obtain

$$\dot p + \dot p_{\mathrm{gas}} = \dot m u + m\dot u + \dot m_{\mathrm{gas}}\,u_{\mathrm{gas}} = 0\,. \tag{7.44}$$

When we multiply this equation with $u$ (in the Minkowski sense), then

$$\dot m\,\eta(u, u) + m\,\eta(u, \dot u) + \dot m_{\mathrm{gas}}\,\eta(u, u_{\mathrm{gas}}) = 0\,. \tag{7.45}$$

Since $\eta(u, u) = -1$, we have $\eta(u, \dot u) = 0$, see (7.4); moreover, $\eta(u, u_{\mathrm{gas}}) = -\gamma_{\mathrm{gas}}$
with $\gamma_{\mathrm{gas}} = (1 - v_{\mathrm{gas}}^2)^{-1/2}$, where $v_{\mathrm{gas}}$ is the relative velocity between the gas
emitted at time $s$ and the rocket at time $s$. Therefore, equation (7.45) results
in
$$-\dot m - \dot m_{\mathrm{gas}}\,\gamma_{\mathrm{gas}} = 0\,. \tag{7.46}$$
We can thus replace $\dot m_{\mathrm{gas}}$ in (7.44) to obtain
$$\dot m u + m\dot u - \dot m\,\gamma_{\mathrm{gas}}^{-1}\,u_{\mathrm{gas}} = 0\,. \tag{7.47}$$


Equivalently, we have
$$\dot m u + ma = \dot m\,\gamma_{\mathrm{gas}}^{-1}\,u_{\mathrm{gas}}\,, \tag{7.48}$$
where we have written $a$ instead of $\dot u$. It is now straightforward to take the
Minkowski norms of the l.h. side and the r.h. side,

$$-\dot m^2 + m^2a^2 = -\dot m^2\gamma_{\mathrm{gas}}^{-2}\,, \tag{7.49}$$

from which we conclude that

$$m^2a^2 = \dot m^2\left(-\gamma_{\mathrm{gas}}^{-2} + 1\right) = \dot m^2v_{\mathrm{gas}}^2\,. \tag{7.50}$$

Finally, we have arrived at

$$ma = -\dot m\,v_{\mathrm{gas}}\,, \tag{7.51}$$
where (in slight abuse of notation) $a$ ($= a_p$) now denotes the magnitude of the
(proper) acceleration. (Clearly, the acceleration is antiparallel to the velocity
of the emitted gas, hence the minus sign.) Equation (7.51) is the relativistic
rocket equation.
Exercise. The Newtonian rocket equation takes exactly the same form. But
there are obvious differences. Which ones?

As an example, let us consider a very simple rocket: let us assume that the
velocity $v_{\mathrm{gas}}$ is a constant over time, i.e., that the gas is always emitted at the
same speed. From (7.16‴) we have
$$a = \gamma^2\frac{dv}{ds} = \gamma^2\dot v\,, \tag{7.52}$$
where $v$ denotes the velocity of the rocket w.r.t. some given observer. (Since
we have assumed that lift-off is at time $s = 0$, we have $v = 0$ at $s = 0$.) The
rocket equation becomes
$$m\gamma^2\dot v = -\dot m\,v_{\mathrm{gas}}\,, \tag{7.53}$$
which we can write as
$$\frac{\dot v}{1 - v^2} = -v_{\mathrm{gas}}\frac{\dot m}{m} \tag{7.54}$$
and thus integrate:
$$\frac{1}{2}\left[\log(1 + v) - \log(1 - v)\right] = -v_{\mathrm{gas}}\left(\log m - \log m_0\right). \tag{7.55}$$
Here, $m_0$ is an integration constant; it is the mass of the rocket at lift-off.
Finally,
$$\sqrt{\frac{1 - v}{1 + v}} = \left(\frac{m_0}{m}\right)^{-v_{\mathrm{gas}}} \tag{7.56}$$
and thus
$$v = \frac{1 - \left(\frac{m}{m_0}\right)^{2v_{\mathrm{gas}}}}{1 + \left(\frac{m}{m_0}\right)^{2v_{\mathrm{gas}}}}\,. \tag{7.57}$$
We conclude that the rocket’s velocity comes arbitrarily close to 1 (i.e., $c$)
provided $m/m_0$ becomes arbitrarily small.
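
The result (7.57) can be explored with a few lines of code. The sketch below evaluates the relativistic velocity for a given mass ratio $m/m_0$ and exhaust speed $v_{\mathrm{gas}}$ (units with $c = 1$) and, for comparison, the Newtonian value $v_{\mathrm{gas}}\log(m_0/m)$, which is the standard non-relativistic rocket formula and is not derived in this section. The function names are hypothetical.

```python
import math

def rocket_velocity(mass_ratio, v_gas):
    """Velocity from (7.57); mass_ratio = m/m0 in (0, 1], v_gas in (0, 1]."""
    x = mass_ratio ** (2.0 * v_gas)
    return (1.0 - x) / (1.0 + x)

def newtonian_velocity(mass_ratio, v_gas):
    """Non-relativistic comparison value v_gas * log(m0/m)."""
    return v_gas * math.log(1.0 / mass_ratio)

# Photon rocket (v_gas = 1) that has burnt half of its initial mass:
print(rocket_velocity(0.5, 1.0))
# Slow exhaust: relativistic and Newtonian values nearly coincide.
print(rocket_velocity(0.5, 1e-3), newtonian_velocity(0.5, 1e-3))
# Tiny remaining mass: the velocity approaches, but never reaches, 1.
print(rocket_velocity(1e-6, 0.5))
```

Equation (7.57) is equivalent to $v = \tanh\left(v_{\mathrm{gas}}\log(m_0/m)\right)$, which makes both the Newtonian limit and the bound $v < 1$ apparent.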

7.5 Four-force

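Consider a particle represented by a timelike world line $x^\mu(s)$, where $s$ is proper
time. Let $p^\mu$ denote the four-momentum of the particle.
Definition 7.2. The four-force acting on the particle is given as the derivative
of the four-momentum w.r.t. proper time, i.e.,
$$F^\mu(s) = \dot p^\mu(s) = \frac{d}{ds}p^\mu(s)\,. \tag{7.58}$$

Since
$$p^\mu = \begin{pmatrix} E \\ \vec p \end{pmatrix} \tag{7.59}$$
w.r.t. some observer, we have
$$F^\mu = \begin{pmatrix} \frac{dE}{ds} \\ \frac{d\vec p}{ds} \end{pmatrix} = \gamma\begin{pmatrix} \frac{dE}{dt} \\ \frac{d\vec p}{dt} \end{pmatrix} = \gamma\begin{pmatrix} \frac{dE}{dt} \\ \vec F \end{pmatrix}, \tag{7.60}$$

where $\vec F$ is the (relativistic) three-force. Interestingly enough, the power $dE/dt$
appears as the zero component of the four-force.

Written out explicitly, we have

$$\frac{dE}{dt} = \frac{d}{dt}(m\gamma) = \gamma\frac{dm}{dt} + m\gamma^3\,\vec v\,\frac{d\vec v}{dt}\,, \tag{7.61a}$$
$$\vec F = \frac{d}{dt}\left(m\gamma\vec v\right) = \gamma\vec v\,\frac{dm}{dt} + m\left[\gamma^3\left(\vec v\,\frac{d\vec v}{dt}\right)\vec v + \gamma\frac{d\vec v}{dt}\right]. \tag{7.61b}$$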

When $\{\tilde t, \tilde{\vec x}\}$ is an inertial system in which the particle is at rest at some time
$\tilde t = \tilde t_r$, then
$$\frac{d\tilde E}{d\tilde t}\Big|_{\tilde t = \tilde t_r} = \frac{dm}{d\tilde t}\Big|_{\tilde t = \tilde t_r} \tag{7.62}$$
and
$$\tilde{\vec F}\Big|_{\tilde t = \tilde t_r} = m\,\frac{d\tilde{\vec v}}{d\tilde t}\Big|_{\tilde t = \tilde t_r} =: \vec F_p\,. \tag{7.63}$$
The quantity $\vec F_p$ is the ‘proper force’, which is experienced by the particle in
the momentary rest frame. Obviously,

$$\vec F_p = m\vec a_p\,. \tag{7.64}$$

Remark. We call a force heatlike if it does not change the particle’s velocity.
Then (7.61) becomes
$$F^\mu = \gamma\frac{dm}{dt}\,u^\mu\,, \tag{7.65}$$
where $u^\mu$ is the (constant) four-velocity of the particle. In the rest frame,
$$\tilde F^\mu = \begin{pmatrix} \frac{dm}{d\tilde t} \\ \vec o \end{pmatrix}. \tag{7.66}$$

By definition, a particle is not accelerated by a heatlike force, but there is an
increase in mass.

We call a force pure if it does not change the particle’s mass, i.e., if

$$m \equiv \mathrm{const} \tag{7.67}$$

for all times. Then
$$\tilde F^\mu\Big|_{\tilde t = \tilde t_r} = \begin{pmatrix} 0 \\ \vec F_p \end{pmatrix} \tag{7.68}$$
in the momentary rest frame. Accordingly, the particle’s four-velocity and the
four-force are orthogonal, i.e.,

$$\eta(F, u) = 0\,. \tag{7.69}$$

This is consistent with our expectations, since for a pure force we have
$$F^\mu = \frac{d}{ds}p^\mu(s) = ma^\mu\,, \tag{7.70}$$
and $a^\mu$ is orthogonal to $u^\mu$.

As a direct consequence of $\eta(F, u) = 0$ we have

$$\frac{dE}{dt} = \vec F\,\vec v\,. \tag{7.71}$$


Evidently, w.r.t. the momentary rest frame of the particle, the four-velocity of
the observer (let’s call it $w^\mu$) is given by
$$\gamma\begin{pmatrix} 1 \\ -\vec v \end{pmatrix}. \tag{7.72}$$

Accordingly,
$$-\gamma\frac{dE}{dt} = \eta(F, w) = -\gamma\,\vec F_p\,\vec v\,, \tag{7.73}$$
where we have evaluated the Minkowski product once in the momentary rest
frame of the particle and once in the observer’s inertial frame. We conclude
that
$$\frac{dE}{dt} = \vec F\,\vec v = \vec F_p\,\vec v\,. \tag{7.74}$$
Alternatively, we can write
$$dE = \vec F_p\,d\vec x\,, \tag{7.75}$$
which represents the (infinitesimal) work done by the force on the particle.
Remark. Finally, we note that the four-force can be expressed in terms of the
proper force as
$$F^\mu = \gamma(\vec v\,\vec F_p)\begin{pmatrix} 1 \\ \frac{\vec v}{\vec v^{\,2}} \end{pmatrix} + \begin{pmatrix} 0 \\ \vec F_p - (\vec v\,\vec F_p)\frac{\vec v}{\vec v^{\,2}} \end{pmatrix} = \gamma\begin{pmatrix} \vec v\,\vec F_p \\ \vec F_p^{\parallel} \end{pmatrix} + \begin{pmatrix} 0 \\ \vec F_p^{\perp} \end{pmatrix}. \tag{7.76}$$

This is in complete analogy with the remark at the end of section 7.1.

Our analysis in this section so far can be summarized as follows: Given the
motion of a particle and its associated four-velocity, four-momentum, and four-
acceleration we are able to compute (what we call) the four-force (and the
proper force) that is connected with the particle’s motion. In particular, if the
particle’s mass m is constant along its world line, then

$$ma^\mu = F^\mu\,, \tag{7.77}$$

which is the analog of Newton’s second law.

However, in Newtonian physics, it is straightforward to do the ‘opposite’. If
there is given a force field $\vec F(t, \vec x)$, then Newton’s second law $m\vec a = \vec F$ is a
system of ordinary differential equations (ODEs) for $\vec x(t)$, i.e.,
$$\ddot{\vec x}(t) = \frac{d^2\vec x(t)}{dt^2} = \vec F\left(t, \vec x\right),$$
which can be solved for arbitrary initial data $\vec x(0)$, $\vec v(0) = \dot{\vec x}(0)$, to yield $\vec x(t)$
and thus the motion of the particle in the force field.

The relativistic analog (7.77) cannot be interpreted in the analogous way.
Suppose that we are given a force field $F^\mu(x^\sigma)$ on Minkowski space and initial
data for the particle, i.e., $x^\mu(0) = \mathring{x}^\mu$ and $u^\mu(0) = \dot x^\mu(0) = \mathring{u}^\mu$ (where we
continue to use the dot as referring to proper time). Then
$$m\ddot x^\mu = F^\mu(x^\sigma)\,, \qquad x^\mu(0) = \mathring{x}^\mu\,, \qquad \dot x^\mu(0) = \mathring{u}^\mu$$
is not a well-posed IVP (initial value problem). The reason is that $u_\mu u^\mu = -1$
is required; hence the l.h.s. is orthogonal to $\mathring{u}^\mu$ at initial time, independently
of the choice of $\mathring{u}^\mu$, but the r.h.s., i.e., $F^\mu(x^\sigma)$, cannot be orthogonal to all
four-vectors $\mathring{u}^\mu$ simultaneously (unless $F^\mu = 0$).

We conclude that the concept of a four-force field (as a vector field on Minkowski
space) does not make sense. To resolve the problem we must realize that it is
impossible to prescribe $F^\mu = F^\mu(x^\sigma)$; instead, we attempt to prescribe $F^\mu$ as a
function of the spacetime position and the four-velocity, i.e., $F^\mu = F^\mu(x^\sigma, u^\lambda)$.

Let us test the simplest ansatz that comes to mind: We assume linearity in $u^\lambda$,
i.e., we make the ansatz
$$F_\mu(x^\sigma, u^\lambda) = F_{\mu\nu}(x^\sigma)\,u^\nu\,. \tag{7.78}$$
Note that, equivalently, we are able to write $F^\mu(x^\sigma, u^\lambda) = F^{\mu\nu}(x^\sigma)\,u_\nu$, when
we raise indices. Let us suppress the dependence on the spacetime position
and simply write $F_\mu = F_{\mu\nu}u^\nu$. The requirement is that $F^\mu$ be orthogonal to
$u^\mu$ (for arbitrary $u^\mu$), i.e., $F_\mu u^\mu = 0$ (for arbitrary $u^\mu$). Let’s see whether the
ansatz (7.78) can guarantee that. We obtain
$$F_\mu u^\mu = F_{\mu\nu}u^\mu u^\nu = \left(F_{(\mu\nu)} + F_{[\mu\nu]}\right)u^\mu u^\nu = F_{(\mu\nu)}u^\mu u^\nu \overset{!}{=} 0\,,$$
which is zero for all $u^\mu$ if and only if $F_{(\mu\nu)} = 0$.
Exercise. Prove that, if
$$F_{(\mu\nu)}u^\mu u^\nu = 0$$
for arbitrary $u^\mu$ (where $u_\mu u^\mu = -1$), then $F_{(\mu\nu)} = 0$.
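
The role of the antisymmetry can be illustrated numerically. In the sketch below, the components $F_{\mu\nu}$ are stored as a nested list; an arbitrary antisymmetric sample matrix gives $F_\mu u^\mu = 0$ for a randomly chosen four-velocity, whereas a symmetric choice (here, as an assumption for illustration, $F_{\mu\nu} = \eta_{\mu\nu}$) gives $F_\mu u^\mu = -1$. All numerical values and names are made up for this example.

```python
import math

def four_velocity(v):
    """u^mu = gamma (1, v) for a three-velocity v with |v| < 1 (c = 1)."""
    gamma = 1.0 / math.sqrt(1.0 - sum(c * c for c in v))
    return [gamma] + [gamma * c for c in v]

def force_covector(F, u):
    """F_mu = F_{mu nu} u^nu, the ansatz (7.78)."""
    return [sum(F[mu][nu] * u[nu] for nu in range(4)) for mu in range(4)]

def contraction(F, u):
    """F_mu u^mu = F_{mu nu} u^mu u^nu."""
    return sum(f * c for f, c in zip(force_covector(F, u), u))

ANTISYMMETRIC = [[0.0, 1.0, 2.0, 3.0],
                 [-1.0, 0.0, 4.0, 5.0],
                 [-2.0, -4.0, 0.0, 6.0],
                 [-3.0, -5.0, -6.0, 0.0]]

# Symmetric counterexample: the Minkowski metric itself.
SYMMETRIC = [[-1.0, 0.0, 0.0, 0.0],
             [0.0, 1.0, 0.0, 0.0],
             [0.0, 0.0, 1.0, 0.0],
             [0.0, 0.0, 0.0, 1.0]]

u = four_velocity((0.1, 0.2, 0.3))
print(contraction(ANTISYMMETRIC, u), contraction(SYMMETRIC, u))
```

Only the symmetric part of $F_{\mu\nu}$ survives the contraction with $u^\mu u^\nu$, which is why it has to vanish.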

We thus find that if $F_{\mu\nu} = F_{[\mu\nu]}$, i.e., if $F_{\mu\nu}$ is assumed to be antisymmetric,
then Newton’s second law (7.77) together with the ansatz (7.78) lead to a
well-posed IVP,
$$m\ddot x^\mu = F^{\mu\nu}(x^\sigma)\,u_\nu\,, \qquad x^\mu(0) = \mathring{x}^\mu\,, \qquad \dot x^\mu(0) = \mathring{u}^\mu\,.$$


In this context, the prescribed field is not a vector field but an antisymmetric
tensor field Fµν (xσ ) on Minkowski space. (The force is then a derived quantity,
see (7.78).) We will see in the next chapter that these considerations are
part of an important theory, namely the theory of electromagnetism. It is
remarkable that nature does the same thing as we did with the simplest possible
ansatz (7.78).

Now we become overzealous, of course. Having been so successful with the
simple ansatz (7.78) to obtain (one aspect of) electromagnetism, it suggests
itself to try the next simple one to obtain relativistic gravity. We assume that
$F^\mu$ does not depend linearly but quadratically on $u^\mu$, i.e., we make the ansatz

$$F_\mu(x^\lambda, u^\rho) = G_{\mu\nu\sigma}(x^\lambda)\,u^\nu u^\sigma\,. \tag{7.79}$$

Since $u^\nu u^\sigma$ is symmetric in $\nu$ and $\sigma$, we can assume w.l.o.g. that $G_{\mu\nu\sigma}$
is symmetric in $\nu$ and $\sigma$ as well. One tensor that is symmetric, we have come
to know really well: $\eta_{\nu\sigma}$. Our intention is to build $G_{\mu\nu\sigma}$ from the Minkowski
metric and one additional four-vector field that is regarded as the four-gradient
of a function $V = V(x^\lambda)$. A first try is to put

$$G_{\mu\nu\sigma} = V_{,\mu}\,\eta_{\nu\sigma}\,,$$

where we use the comma notation for the derivative, e.g., $V_{,\mu} = \partial_\mu V = \partial V/\partial x^\mu$.
However, the required orthogonality $F_\mu u^\mu = 0$ does not hold for this simple
ansatz; to ensure $F_\mu u^\mu = 0$ we set

$$G_{\mu\nu\sigma} = V_{,\mu}\,\eta_{\nu\sigma} - \eta_{\mu(\nu}V_{,\sigma)}\,. \tag{7.80}$$

It is straightforward to check that

$$F_\mu u^\mu = G_{\mu\nu\sigma}u^\mu u^\nu u^\sigma = \left[V_{,\mu}\,\eta_{\nu\sigma} - \eta_{\mu(\nu}V_{,\sigma)}\right]u^\mu u^\nu u^\sigma = -V_{,\mu}u^\mu + V_{,\sigma}u^\sigma = 0\,,$$

independently of $u^\mu$, as required.

Let us investigate Newton’s second law with (7.79), i.e.,

$$a^\mu = G^\mu{}_{\nu\sigma}(x^\lambda)\,u^\nu u^\sigma\,; \tag{7.81}$$

for simplicity we have set the particle’s mass to 1. The r.h.s. is

$$G^\mu{}_{\nu\sigma}u^\nu u^\sigma = \left[V^{,\mu}\,\eta_{\nu\sigma} - \delta^\mu{}_{(\nu}V_{,\sigma)}\right]u^\nu u^\sigma = -V^{,\mu} - \left(u^\sigma V_{,\sigma}\right)u^\mu\,; \tag{7.82}$$

its temporal component (zero component) is

$$G^0{}_{\nu\sigma}u^\nu u^\sigma = -\eta^{00}V_{,t} - \gamma^2\left(V_{,t} + \vec v\,\vec\nabla V\right), \tag{7.82$_0$}$$


since $u^\mu = \gamma(1, \vec v)^T$ and thus $u^\sigma V_{,\sigma} = \gamma V_{,t} + \gamma v^iV_{,i}$. The spatial components
of (7.82) are
$$G^i{}_{\nu\sigma}u^\nu u^\sigma = -\delta^{ij}V_{,j} - \gamma^2\left(V_{,t} + \vec v\,\vec\nabla V\right)v^i\,. \tag{7.82$_i$}$$
The temporal and spatial components of the l.h.s. of (7.81) are given by
$$a^\mu = \gamma^4\left(\vec v\,\frac{d\vec v}{dt}\right)\begin{pmatrix} 1 \\ \vec v \end{pmatrix} + \gamma^2\begin{pmatrix} 0 \\ \frac{d\vec v}{dt} \end{pmatrix},$$

see (7.7).

Let us consider the Newtonian limit, i.e., the limit of small velocities. Then
equation (7.82$_0$) becomes $\partial_tV - \partial_tV - \vec v\,\vec\nabla V = -\vec v\,\vec\nabla V$ and (7.82$_i$) becomes
$-\vec\nabla V$. Therefore, from (7.81), in the Newtonian limit, we obtain

$$\vec v\,\frac{d\vec v}{dt} = -\vec v\,\vec\nabla V\,, \qquad \frac{d\vec v}{dt} = -\vec\nabla V\,.$$

If $V = V(\vec x)$, then $\vec v\,\vec\nabla V\left(\vec x(t)\right) = \partial_tV\left(\vec x(t)\right)$; accordingly, the equations read

$$\frac{d}{dt}\left(\frac{m\vec v^{\,2}}{2} + V(\vec x)\right) = 0\,, \qquad \vec a = \vec F\,,$$

where $\vec F = -\vec\nabla V$ is the negative gradient of the potential, i.e., the force.
(Since we did not explicitly introduce a coupling constant in (7.79), $V$ should
be regarded as mass times potential, which makes $-\vec\nabla V$ the force.) We recover
the fundamental aspects of Newtonian gravity in the limit of small velocities:
The first equation is conservation of energy, which is the sum of kinetic plus
potential energy; the second equation is the law of motion.

The theory based on (7.79) and (7.80), which leads to (7.81), is a relativistic
scalar theory of gravity. Unfortunately, despite its appeal, it is false (i.e., in
contradiction with observations).


CHAPTER 8

MAXWELL

8.1 The electromagnetic field tensor

Suppose we have an (exterior) electromagnetic field represented by an electric
field $\vec E(t, \vec x)$ and a magnetic field $\vec B(t, \vec x)$, where $\{t, \vec x\}$ are the coordinates of
an inertial frame. A (test) particle that is moving in this electromagnetic field
is subject to the Lorentz force, i.e.,

\[ \frac{d\vec p}{dt} = e \bigl( \vec E + \vec v \times \vec B \bigr) , \tag{8.1} \]

where $\vec p$ and $\vec v$ are the particle’s three-momentum and three-velocity, respectively,
and e denotes the particle’s charge. This equation of motion is known
in the case of a slowly moving particle; it is unclear a priori how to extend this
equation to any $\vec v$ with $|\vec v| < 1$.

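In a momentary rest frame, which is represented by $\{\underset{\sim}{t}, \underset{\sim}{\vec x}\}$ (rest-frame quantities
carry a tilde underneath), the Lorentz force is a purely electric force,

\[ \frac{d\underset{\sim}{\vec p}}{d\underset{\sim}{t}} = e\, \underset{\sim}{\vec E} , \tag{8.2} \]

since $\underset{\sim}{\vec v} = \vec o$ (at the considered time $\underset{\sim}{t} = \underset{\sim}{t}_r$ where the particle is instantaneously
at rest). Using (7.68) we associate (8.2) with a four-force$^{1}$

\[ \mathcal F^\mu = \begin{pmatrix} 0 \\ \vec F_p \end{pmatrix} = \begin{pmatrix} 0 \\ e\, \underset{\sim}{\vec E} \end{pmatrix} , \tag{8.3} \]

where $\underset{\sim}{t} = \underset{\sim}{t}_r$ is understood. Equation (7.76) then yields

\[ \mathcal F^\mu = e\gamma \begin{pmatrix} \vec v\cdot\underset{\sim}{\vec E} \\ \underset{\sim}{\vec E}_\parallel \end{pmatrix}
   + e \begin{pmatrix} 0 \\ \underset{\sim}{\vec E}_\perp \end{pmatrix} \tag{8.4} \]

for the four-force w.r.t. an arbitrary observer. However, from (8.1) we obtain

\[ \mathcal F^\mu = e\gamma \begin{pmatrix} \vec v\cdot\vec E \\ \vec E_\parallel + \vec E_\perp + \vec v\times\vec B \end{pmatrix} , \tag{8.5} \]

where we have used (7.60) and (7.74). Comparing (8.4) and (8.5) we obtain

\[ \underset{\sim}{\vec E}_\parallel = \vec E_\parallel , \qquad
   \underset{\sim}{\vec E}_\perp = \gamma \bigl( \vec E_\perp + \vec v\times\vec B \bigr) . \tag{8.6} \]

In particular, we note that $\vec E$ and $\vec B$ mix: There is no observer-independent
distinction between the electric and the magnetic field. For instance, while
one observer might see only an electric field (and no magnetic field), another
observer sees both an electric and a magnetic field (according to the above
transformation).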

Strictly speaking, so far we have only considered the limit of small velocities,
since we hesitated to automatically assume (8.1) for large velocities. However,
it is a fact (which is amply supported by experiments) that (8.1) is indeed the
correct representation of the Lorentz force for all velocities. In the following we
present an additional plausibility argument that supplements the experimental
evidence.

We have argued that the decomposition of the electromagnetic field into an
electric part $\vec E$ and a magnetic part $\vec B$ is not observer-independent. So, our aim
is to collect these fields into one quantity that represents the electromagnetic
field as a whole. What could this quantity be? Obviously, the electromagnetic
field cannot be a four-vector, simply because we have six independent
quantities (namely, the three components of $\vec E$ and the three components of
$\vec B$). It is suggestive to assume that the electromagnetic field is a tensor

\[ F_{\mu\nu} . \tag{8.7} \]

Clearly, such a tensor has 16 entries; hence, many of the components must be
redundant to encode the electromagnetic field.

1
To avoid ambiguities we denote the four-force by a calligraphic letter in this chapter. The
normal letter F is reserved for the electromagnetic field (8.7).

Given $F_{\mu\nu}$, there is not much choice to derive a four-force from it other than
via the simple rule

\[ \mathcal F^\mu = e F^\mu{}_\nu u^\nu , \tag{8.8} \]

where $u^\mu$ is the four-velocity of the particle. If we require the Lorentz four-force
(8.8) to be a pure force, then

\[ \mathcal F^\mu u_\mu = e F_{\mu\nu} u^\mu u^\nu = 0 , \tag{8.9} \]

see (7.69). Consequently, the field tensor $F_{\mu\nu}$ must be antisymmetric, i.e.,
$F_{\mu\nu} = -F_{\nu\mu}$, or

\[ F_{\mu\nu} = F_{[\mu\nu]} , \tag{8.10} \]

cf. the considerations of section 7.5.

Since $F_{\mu\nu}$ is antisymmetric, there are six independent components. Let us
choose an arbitrary observer; the temporal components are denoted by 0, the
spatial components are denoted by Latin indices. Using the antisymmetry we
see that, on the one hand, we have the three components $F_{i0}$ (or, equivalently,
their negative counterparts $F_{0i}$); we set $F_{i0} = \delta_{ij} E^j$. On the other hand, we
have the three components encoded by $F_{ij}$, which we can write as $F_{ij} = \epsilon_{ijk} B^k$.
W.r.t. the chosen observer we further get

\[ u^\mu = \gamma \begin{pmatrix} 1 \\ v^i \end{pmatrix} , \tag{8.11} \]

hence

\[ \mathcal F_\mu = e F_{\mu\nu} u^\nu
   = e\gamma \bigl( F_{0i} v^i ,\; F_{i0} + F_{ij} v^j \bigr)
   = e\gamma \bigl( -\delta_{ij} v^i E^j ,\; \delta_{ij} E^j + \epsilon_{ijk} v^j B^k \bigr) \tag{8.12} \]

and the Lorentz force reads

\[ \mathcal F^\mu = e\gamma \begin{pmatrix} \delta_{ij} v^i E^j \\ E^i + \epsilon_{ijk} v^j B^k \end{pmatrix}
   = e\gamma \begin{pmatrix} \vec v\cdot\vec E \\ \vec E + \vec v\times\vec B \end{pmatrix} . \tag{8.13} \]

As a consequence we obtain

\[ \vec{\mathcal F} = \frac{d\vec p}{dt} = e \bigl( \vec E + \vec v\times\vec B \bigr) \tag{8.14} \]

for the Lorentz (three-)force, cf. (7.60). Comparison with (8.1) suggests that the
three-vectors built from the components $F_{i0}$ and $F_{ij}$ (left-hand sides below)
coincide with the electric and the magnetic field of (8.1) (right-hand sides),

\[ \vec E = \vec E , \qquad \vec B = \vec B . \tag{8.15} \]

We have thus found the four-tensor representation of the electromagnetic field
and the (observer-dependent) decomposition into electric part $\vec E$ and magnetic
part $\vec B$.

Let us summarize. The electromagnetic field is described by an antisymmetric
tensor field $F_{\mu\nu}$ that encodes both the electric and magnetic degrees of freedom.
This electromagnetic field tensor is observer-independent.$^{2}$ When an observer
is chosen, the electromagnetic field tensor is represented by an antisymmetric
matrix, where $F_{i0} = \delta_{ij} E^j$ and $F_{ij} = \epsilon_{ijk} B^k$, i.e.,

\[ F_{\mu\nu} = \begin{pmatrix}
   0   & -E^1 & -E^2 & -E^3 \\
   E^1 & 0    & B^3  & -B^2 \\
   E^2 & -B^3 & 0    & B^1  \\
   E^3 & B^2  & -B^1 & 0
   \end{pmatrix} \tag{8.16} \]

w.r.t. the chosen observer. The Lorentz force acting on a particle with four-velocity
$u^\mu$ is

\[ \mathcal F^\mu = e F^\mu{}_\nu u^\nu , \tag{8.17} \]

where e is the charge of the particle. In the chosen coordinates we have

\[ \mathcal F^\mu = \gamma \begin{pmatrix} e\, \vec v\cdot\vec E \\ e \bigl( \vec E + \vec v\times\vec B \bigr) \end{pmatrix} , \tag{8.18} \]

so that the three-force takes its conventional form

\[ \vec{\mathcal F} = e \bigl( \vec E + \vec v\times\vec B \bigr) . \tag{8.19} \]
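As an illustration of the summary, the following sketch builds the matrix (8.16) from sample fields and contracts it with a four-velocity. It assumes the signature (−,+,+,+), c = 1, and arbitrary sample values; the function names are hypothetical.

```python
import math

ETA = [-1.0, 1.0, 1.0, 1.0]  # diag(-1, 1, 1, 1)

def cross(a, b):
    return [a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0]]

def field_tensor(E, B):
    """Covariant components F_{mu nu}, arranged as in (8.16)."""
    return [[0.0,  -E[0], -E[1], -E[2]],
            [E[0],  0.0,   B[2], -B[1]],
            [E[1], -B[2],  0.0,   B[0]],
            [E[2],  B[1], -B[0],  0.0]]

def lorentz_four_force(e, E, B, v):
    """Contravariant four-force e F^mu_nu u^nu; the index is raised with eta."""
    gamma = 1.0 / math.sqrt(1.0 - sum(c * c for c in v))
    u = [gamma] + [gamma * c for c in v]
    F = field_tensor(E, B)
    return [ETA[m] * e * sum(F[m][n] * u[n] for n in range(4)) for m in range(4)]

def expected_four_force(e, E, B, v):
    """Right-hand side of (8.18): gamma (e v.E, e (E + v x B))."""
    gamma = 1.0 / math.sqrt(1.0 - sum(c * c for c in v))
    vxB = cross(v, B)
    time_part = e * gamma * sum(vi * Ei for vi, Ei in zip(v, E))
    return [time_part] + [e * gamma * (E[i] + vxB[i]) for i in range(3)]

e, E, B, v = 2.0, [1.0, -0.5, 0.25], [0.3, 0.8, -1.1], [0.2, 0.1, -0.4]
lhs = lorentz_four_force(e, E, B, v)
rhs = expected_four_force(e, E, B, v)
print(all(abs(a - b) < 1e-12 for a, b in zip(lhs, rhs)))
```

Dividing the spatial part by γ gives the conventional three-force (8.19).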

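Let us conclude this section by defining another antisymmetric tensor that is
closely connected with $F_{\mu\nu}$. The dual of the field tensor is defined as

\[ {*F}_{\gamma\delta} = \tfrac12\, \epsilon_{\alpha\beta\gamma\delta} F^{\alpha\beta} , \tag{8.20} \]

where $F^{\alpha\beta} = \eta^{\alpha\mu} \eta^{\beta\nu} F_{\mu\nu}$ as usual. (In the theory of (differential) n-forms, $*$
is known as the Hodge star operator; $*F$ is the Hodge dual of $F$.) The tensor
${*F}_{\mu\nu}$ is antisymmetric.
Remark. Viewed as a map on antisymmetric tensors $T_{\mu\nu}$, duality is an anti-involution,
i.e.,

\[ {*}{*} = -\mathrm{id} . \tag{8.21} \]

To show this we use the definition (8.20):

\[ {*}{*F}_{\alpha\beta} = \tfrac12\, \epsilon_{\alpha\beta\gamma\delta}\, {*F}^{\gamma\delta}
   = \tfrac14\, \epsilon_{\alpha\beta\gamma\delta}\, \epsilon^{\gamma\delta\varepsilon\zeta} F_{\varepsilon\zeta}
   = \tfrac14\, \epsilon_{\gamma\delta\alpha\beta}\, \epsilon^{\gamma\delta\varepsilon\zeta} F_{\varepsilon\zeta}
   = \tfrac14 (-4)\, \delta_\alpha{}^{[\varepsilon} \delta_\beta{}^{\zeta]} F_{\varepsilon\zeta}
   = -\delta_\alpha{}^{\varepsilon} \delta_\beta{}^{\zeta} F_{\varepsilon\zeta} = -F_{\alpha\beta} . \]

Here we have used that

\[ \epsilon_{\gamma\delta\alpha\beta}\, \epsilon^{\gamma\delta\varepsilon\zeta}
   = -4\, \delta^{\varepsilon}{}_{[\alpha} \delta^{\zeta}{}_{\beta]}
   = -4\, \delta_\alpha{}^{[\varepsilon} \delta_\beta{}^{\zeta]}
   = -4\, \delta_{[\alpha}{}^{[\varepsilon} \delta_{\beta]}{}^{\zeta]} , \]

which can be proved by elementary considerations about permutations, as done
in (B.25).

2
The indices of $F_{\mu\nu}$ are abstract indices.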

Making use of the definition (8.20), which relates ${*F}_{\mu\nu}$ and $F_{\mu\nu}$, we find that
${*F}_{\mu\nu}$ takes the form

\[ {*F}_{\mu\nu} = \begin{pmatrix}
   0    & B^1  & B^2  & B^3  \\
   -B^1 & 0    & E^3  & -E^2 \\
   -B^2 & -E^3 & 0    & E^1  \\
   -B^3 & E^2  & -E^1 & 0
   \end{pmatrix} \tag{8.22} \]

w.r.t. the chosen frame of reference. Note that (8.22) arises from (8.16) by
replacing

\[ B^i \mapsto E^i , \qquad E^i \mapsto -B^i . \]
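The dual can be computed mechanically from (8.20). The sketch below assumes the orientation convention $\epsilon_{0123} = +1$ (the choice that reproduces (8.22) from (8.16)) and the signature (−,+,+,+); names and sample values are illustrative only.

```python
from itertools import permutations

ETA = [-1.0, 1.0, 1.0, 1.0]  # diag(-1, 1, 1, 1)

def parity(perm):
    """Sign (+1 or -1) of a permutation of (0, 1, 2, 3), counted via transpositions."""
    perm, sign = list(perm), 1
    for i in range(len(perm)):
        while perm[i] != i:
            j = perm[i]
            perm[i], perm[j] = perm[j], perm[i]
            sign = -sign
    return sign

# epsilon_{abcd} with epsilon_{0123} = +1 (assumed convention); zero if an index repeats.
EPSILON = {p: parity(p) for p in permutations(range(4))}

def field_tensor(E, B):
    """Covariant components F_{mu nu}, arranged as in (8.16)."""
    return [[0.0,  -E[0], -E[1], -E[2]],
            [E[0],  0.0,   B[2], -B[1]],
            [E[1], -B[2],  0.0,   B[0]],
            [E[2],  B[1], -B[0],  0.0]]

def dual(F):
    """*F_{gamma delta} = (1/2) epsilon_{alpha beta gamma delta} F^{alpha beta}, cf. (8.20)."""
    F_up = [[ETA[a] * ETA[b] * F[a][b] for b in range(4)] for a in range(4)]
    return [[0.5 * sum(EPSILON.get((a, b, g, d), 0) * F_up[a][b]
                       for a in range(4) for b in range(4))
             for d in range(4)] for g in range(4)]

def close(A, B, tol=1e-12):
    return all(abs(A[m][n] - B[m][n]) < tol for m in range(4) for n in range(4))

E, B = [1.0, 2.0, 3.0], [0.5, -1.5, 2.5]
F = field_tensor(E, B)
star_F = dual(F)
minus_F = [[-x for x in row] for row in F]

print(close(star_F, field_tensor([-b for b in B], E)))  # substitution E -> -B, B -> E
print(close(dual(star_F), minus_F))                     # anti-involution (8.21)
```

Both checks rely only on the definition (8.20); with the opposite orientation convention the matrix (8.22) would change sign, while the anti-involution property would be unaffected.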

8.2 Transformations of $\vec E$ and $\vec B$

Suppose we are given an electromagnetic field tensor $F_{\mu\nu}$ (where we reemphasize
that this is an abstract object that exists independently of any choice of
coordinates). W.r.t. a chosen coordinate frame X (= observer X), the tensor
components take the form (8.16), where $\vec E$ and $\vec B$ are the electric and magnetic
3-field as seen by X. W.r.t. a different observer $X'$, the field tensor is
represented by a different matrix, whose entries $\vec E'$ and $\vec B'$ are the electric and
magnetic 3-field as seen by $X'$.

Let L denote the Lorentz transformation connecting the inertial frames of reference
X and $X'$; e.g., L is a boost (2.31). Then$^{3}$

\[ \begin{pmatrix}
   0    & -E'^1 & -E'^2 & -E'^3 \\
   E'^1 & 0     & B'^3  & -B'^2 \\
   E'^2 & -B'^3 & 0     & B'^1  \\
   E'^3 & B'^2  & -B'^1 & 0
   \end{pmatrix}
   = (L^{-1})^T
   \begin{pmatrix}
   0   & -E^1 & -E^2 & -E^3 \\
   E^1 & 0    & B^3  & -B^2 \\
   E^2 & -B^3 & 0    & B^1  \\
   E^3 & B^2  & -B^1 & 0
   \end{pmatrix}
   L^{-1} . \]

We thereby obtain the transformation of the electric and the magnetic fields
under a change of observer.

However, in the spirit of these lecture notes, we prefer to avoid computations
of this kind; we favor coordinate-independent considerations.

Consider an arbitrary inertial observer X whose four-velocity is $u^\mu$. The electric
four-field seen by the observer X is given as

\[ E^\mu = F^\mu{}_\nu u^\nu , \tag{8.23a} \]

which takes the form

\[ E^\mu = \begin{pmatrix} 0 \\ \vec E \end{pmatrix} \tag{8.23b} \]

in the observer’s coordinates. Analogously, the magnetic four-field seen by X
is given as

\[ B^\mu = -{*F}^\mu{}_\nu u^\nu , \tag{8.24a} \]

which takes the form

\[ B^\mu = \begin{pmatrix} 0 \\ \vec B \end{pmatrix} \tag{8.24b} \]

w.r.t. X. Although evident from the definitions (8.23a) and (8.24a), we emphatically
stress that the electric/magnetic four-fields (the four-fields themselves,
not merely their coordinate representations) depend on the choice of observer.$^{4}$
Remark. Consider a particle with charge e and four-velocity uµ . The electric
four-field sensed by the particle is E µ = F µν uν . (Strictly speaking, this is
the four-field seen by the co-moving observer X, whose four-velocity coincides
with that of the particle.) Therefore, the Lorentz force acting on the particle is
simply F µ = eE µ . Hence, from the particle’s point of view, the Lorentz force
is a purely electric force.
3
We refer to (A.19) and (A.22), where we note that the transformation of tensors is a
straightforward generalization; e.g., T̂ij = (A−1 )ki (A−1 )lj T̆kl .
4
If there existed four-vector fields E µ and B µ representing the electric and magnetic field in
an observer-independent fashion, we would automatically have an observer-independent
decomposition of the electromagnetic field into an electric and a magnetic part. But this
contradicts the mixing of the electric and magnetic field, see section 8.1.


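Remark for experts. Let $E^\mu$ and $B^\mu$ be the electric four-field and magnetic
four-field associated with the four-velocity $u^\mu$. It is not difficult to show that

\[ F_{\mu\nu} = u_\mu E_\nu - u_\nu E_\mu + \epsilon_{\mu\nu\sigma\tau} u^\sigma B^\tau , \tag{8.25a} \]
\[ {*F}_{\mu\nu} = -u_\mu B_\nu + u_\nu B_\mu + \epsilon_{\mu\nu\sigma\tau} u^\sigma E^\tau . \tag{8.25b} \]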

Equipped with the definitions (8.23a) and (8.24a) it becomes straightforward
to investigate the transformation of the electric and magnetic field. Obviously,
the magnitude of the electric field $|\vec E|$ (as it is experienced by X) has the
representation$^{5}$

\[ |\vec E|^2 = E^\mu E_\mu . \tag{8.26a} \]

Analogously, the magnitude of the magnetic field $|\vec B|$ is given by

\[ |\vec B|^2 = B^\mu B_\mu . \tag{8.26b} \]

Furthermore, the angle between the fields $\vec E$ and $\vec B$ is obtained from

\[ \vec E\cdot\vec B = E_\mu B^\mu . \tag{8.26c} \]

The formulas (8.26) extract the physically relevant information about the elec-
tric field and the magnetic field from a given electromagnetic field tensor Fµν .

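Now suppose there are two observers, X and $X'$, with four-velocities u and $u'$,
respectively. In the coordinates used by X we have

\[ u^\mu = \begin{pmatrix} 1 \\ \vec o \end{pmatrix} , \qquad
   u'^\mu = \gamma \begin{pmatrix} 1 \\ \vec v \end{pmatrix} , \tag{8.27} \]

and

\[ F^\mu{}_\nu = \begin{pmatrix}
   0   & E^1  & E^2  & E^3  \\
   E^1 & 0    & B^3  & -B^2 \\
   E^2 & -B^3 & 0    & B^1  \\
   E^3 & B^2  & -B^1 & 0
   \end{pmatrix} , \tag{8.28} \]

where $\vec E$ and $\vec B$ are the electric and the magnetic field as seen by X. (Compare
the index structure of (8.16) and (8.28).) Let us denote by $\vec E'$ and $\vec B'$ the fields
as seen by $X'$. The associated four-fields are $E'^\mu$ and $B'^\mu$, which are given by

\[ E'^\mu = F^\mu{}_\nu u'^\nu
   = \gamma \begin{pmatrix} \vec v\cdot\vec E \\ \vec E + \vec v\times\vec B \end{pmatrix}
   = \gamma \begin{pmatrix} \vec v\cdot\vec E_\parallel \\ \vec E_\parallel + \vec E_\perp + \vec v\times\vec B \end{pmatrix} \tag{8.29a} \]

and$^{6}$

\[ B'^\mu = -{*F}^\mu{}_\nu u'^\nu
   = \gamma \begin{pmatrix} \vec v\cdot\vec B \\ \vec B - \vec v\times\vec E \end{pmatrix}
   = \gamma \begin{pmatrix} \vec v\cdot\vec B_\parallel \\ \vec B_\parallel + \vec B_\perp - \vec v\times\vec E \end{pmatrix} . \tag{8.29b} \]

5
These considerations are in complete analogy with (7.13) and (7.14). A comment on
the notation, though: Where we wrote $\eta(v, v)$ earlier, we write $v^\mu v_\mu$ here.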
B B

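Equation (8.26) then leads to

\[ |\vec E'|^2 = E'^\mu E'_\mu
   = \gamma^2 \Bigl[ -\vec v^2 (\vec E_\parallel)^2 + (\vec E_\parallel)^2 + (\vec E_\perp + \vec v\times\vec B)^2 \Bigr]
   = (\vec E_\parallel)^2 + \gamma^2 (\vec E_\perp + \vec v\times\vec B)^2 \tag{8.30a} \]

and, in complete analogy,

\[ |\vec B'|^2 = B'^\mu B'_\mu
   = \gamma^2 \Bigl[ -\vec v^2 (\vec B_\parallel)^2 + (\vec B_\parallel)^2 + (\vec B_\perp - \vec v\times\vec E)^2 \Bigr]
   = (\vec B_\parallel)^2 + \gamma^2 (\vec B_\perp - \vec v\times\vec E)^2 . \tag{8.30b} \]

In addition we have

\[ \vec E'\cdot\vec B' = E'_\mu B'^\mu = \vec E\cdot\vec B . \tag{8.30c} \]

Hence, summarizing,

\[ |\vec E'_\parallel|^2 + |\vec E'_\perp|^2 = |\vec E_\parallel|^2 + \gamma^2 (\vec E_\perp + \vec v\times\vec B)^2 , \]
\[ |\vec B'_\parallel|^2 + |\vec B'_\perp|^2 = |\vec B_\parallel|^2 + \gamma^2 (\vec B_\perp - \vec v\times\vec E)^2 , \]
\[ |\vec E'_\parallel|\, |\vec B'_\parallel| + |\vec E'_\perp|\, |\vec B'_\perp| = |\vec E_\parallel|\, |\vec B_\parallel| + |\vec E_\perp|\, |\vec B_\perp| . \]

From these formulas we infer that

\[ \vec E'_\parallel = \vec E_\parallel , \qquad \vec E'_\perp = \gamma \bigl( \vec E_\perp + \vec v\times\vec B \bigr) , \tag{8.31a} \]
\[ \vec B'_\parallel = \vec B_\parallel , \qquad \vec B'_\perp = \gamma \bigl( \vec B_\perp - \vec v\times\vec E \bigr) , \tag{8.31b} \]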

when the spatial axes of the observers are properly aligned. Equations (8.31)
are the transformation rules for the electric field and the magnetic field under
a change of inertial frame of reference. While X sees an electric field $\vec E$ and a
magnetic field $\vec B$, the observer $X'$ sees $\vec E'$ and $\vec B'$.
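The matrix form of the transformation stated at the beginning of this section can be compared with the rules (8.31) numerically. The sketch below is restricted to a boost along the x-axis and assumes that L has the standard form with $t' = \gamma(t - v x)$; function names and sample values are illustrative.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)] for i in range(4)]

def transpose(A):
    return [list(row) for row in zip(*A)]

def boost_x(v):
    """Lorentz boost along x with t' = gamma (t - v x), c = 1 (assumed form of L)."""
    g = 1.0 / math.sqrt(1.0 - v * v)
    return [[g, -g * v, 0.0, 0.0], [-g * v, g, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]

def field_tensor(E, B):
    """Covariant components F_{mu nu}, arranged as in (8.16)."""
    return [[0.0,  -E[0], -E[1], -E[2]],
            [E[0],  0.0,   B[2], -B[1]],
            [E[1], -B[2],  0.0,   B[0]],
            [E[2],  B[1], -B[0],  0.0]]

def fields_via_matrix(E, B, v):
    """F' = (L^-1)^T F L^-1, then read off E' and B' using the layout (8.16)."""
    L_inv = boost_x(-v)  # the inverse of a boost is the boost with -v
    F_new = matmul(matmul(transpose(L_inv), field_tensor(E, B)), L_inv)
    E_new = [F_new[1][0], F_new[2][0], F_new[3][0]]
    B_new = [F_new[2][3], F_new[3][1], F_new[1][2]]
    return E_new, B_new

def fields_via_rule(E, B, v):
    """(8.31) specialised to velocity (v, 0, 0)."""
    g = 1.0 / math.sqrt(1.0 - v * v)
    return ([E[0], g * (E[1] - v * B[2]), g * (E[2] + v * B[1])],
            [B[0], g * (B[1] + v * E[2]), g * (B[2] - v * E[1])])

E, B, v = [1.0, 2.0, 3.0], [0.5, -1.5, 2.5], 0.6
E1, B1 = fields_via_matrix(E, B, v)
E2, B2 = fields_via_rule(E, B, v)
print(all(abs(a - b) < 1e-12 for a, b in zip(E1 + B1, E2 + B2)))
```

For a boost in a general direction the same comparison works once `boost_x` is replaced by the general boost matrix.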

6
The ‘strange’ form of $E'^\mu$ and $B'^\mu$ is due to the fact that we use the coordinates of the
observer X to write down the fields experienced by $X'$. Clearly, in the own coordinates
of the observer $X'$, $E'^\mu = (0, \vec E')^T$ and analogously for $B'^\mu$.

We conclude this section by discussing the invariants of the electromagnetic
field $F_{\mu\nu}$. As we have already seen from (8.30c), the scalar product

\[ \vec E\cdot\vec B \]

is an invariant: it is the same for all observers. A straightforward computation
using (8.16) and (8.22) yields

\[ \vec E\cdot\vec B = \tfrac14\, F_{\mu\nu}\, {*F}^{\mu\nu} . \tag{8.32} \]

Since this is a manifestly coordinate-independent expression, invariance is a
matter of course.

The squared magnitudes of the electric and magnetic field are not invariant
under a change of observer, see (8.30). In terms of the electromagnetic field
tensor we have

\[ |\vec E|^2 = \eta^{\alpha\beta} F_{\alpha\gamma} F_{\beta\delta}\, u^\gamma u^\delta , \qquad
   |\vec B|^2 = \eta^{\alpha\beta}\, {*F}_{\alpha\gamma}\, {*F}_{\beta\delta}\, u^\gamma u^\delta . \]

However, the difference

\[ \vec E^2 - \vec B^2 \]

is an invariant quantity, i.e., this quantity remains unchanged under a change
of observer,

\[ \vec E'^2 - \vec B'^2 = \vec E^2 - \vec B^2 . \]

To prove this claim, we can either perform a straightforward computation based
on (8.31), or we note that

\[ \vec E^2 - \vec B^2 = -\tfrac12\, F_{\mu\nu} F^{\mu\nu} = \tfrac12\, {*F}_{\mu\nu}\, {*F}^{\mu\nu} . \tag{8.33} \]

Since $\vec E^2 - \vec B^2$ has a coordinate-independent representation, it is automatically
an invariant.
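The "straightforward computation based on (8.31)" can be delegated to a few lines of code. This is a minimal sketch for a boost velocity in an arbitrary direction, assuming c = 1; the helper names and sample values are illustrative.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return [a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0]]

def transform_fields(E, B, v):
    """Apply (8.31): parallel parts unchanged, perpendicular parts mixed and scaled."""
    v2 = dot(v, v)
    gamma = 1.0 / math.sqrt(1.0 - v2)
    E_par = [dot(E, v) / v2 * c for c in v]
    B_par = [dot(B, v) / v2 * c for c in v]
    vxB, vxE = cross(v, B), cross(v, E)
    E_new = [E_par[i] + gamma * (E[i] - E_par[i] + vxB[i]) for i in range(3)]
    B_new = [B_par[i] + gamma * (B[i] - B_par[i] - vxE[i]) for i in range(3)]
    return E_new, B_new

E, B, v = [1.0, 2.0, 3.0], [0.5, -1.5, 2.5], [0.3, -0.2, 0.5]
E_new, B_new = transform_fields(E, B, v)

same_scalar_product = abs(dot(E_new, B_new) - dot(E, B)) < 1e-9
same_difference = abs((dot(E_new, E_new) - dot(B_new, B_new))
                      - (dot(E, E) - dot(B, B))) < 1e-9
magnitude_changed = abs(dot(E_new, E_new) - dot(E, E)) > 1e-6
print(same_scalar_product, same_difference, magnitude_changed)
```

The individual magnitudes change under the boost, while the two combinations stay fixed up to rounding.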
Remark. The invariants $\vec E\cdot\vec B$ and $\vec E^2 - \vec B^2$ are the only invariants of the electromagnetic
field tensor. To show this we invoke a rather general argument.
Let $A_{jk}$ be an antisymmetric matrix. Antisymmetric matrices have imaginary
eigenvalues. Moreover, eigenvalues come in complex conjugate pairs, i.e., if
$\lambda \in \mathbb C$ is an eigenvalue, so is $\bar\lambda$. Therefore, if our space is 2n-dimensional, the
eigenvalues of $A_{jk}$ must be $i a_1, -i a_1, i a_2, -i a_2, \ldots, i a_n, -i a_n$ for some $a_k \in \mathbb R$,
$k = 1, \ldots, n$. The invariants of $A_{jk}$ must be combinations of the eigenvalues;
accordingly, $A_{jk}$ can possess merely n independent invariants: $a_i$, $i = 1, \ldots, n$.
Therefore, in our particular case, there can be up to two independent invariants
(which we have already found). In fact, one easily finds that the characteristic
polynomial of $F^\mu{}_\nu$ is given by

\[ \lambda^4 - \bigl( \vec E^2 - \vec B^2 \bigr) \lambda^2 - \bigl( \vec E\cdot\vec B \bigr)^2 = 0 , \]

which explicitly contains the two invariants.$^{7}$
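A quick numerical cross-check of the characteristic polynomial is possible with the mixed-index matrix (8.28). The polynomial tested in this sketch is $\lambda^4 - (\vec E^2 - \vec B^2)\lambda^2 - (\vec E\cdot\vec B)^2$, which is what the determinant of (8.28) minus $\lambda$ times the identity evaluates to; the determinant routine and the sample values are illustrative.

```python
def det(M):
    """Determinant by Laplace expansion along the first row (fine for 4x4)."""
    if len(M) == 1:
        return M[0][0]
    total = 0.0
    for j, entry in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def mixed_field_tensor(E, B):
    """Components F^mu_nu, arranged as in (8.28)."""
    return [[0.0,  E[0],  E[1],  E[2]],
            [E[0], 0.0,   B[2], -B[1]],
            [E[1], -B[2], 0.0,   B[0]],
            [E[2], B[1], -B[0],  0.0]]

def characteristic_polynomial(E, B, lam):
    """det(F^mu_nu - lam * identity)."""
    F = mixed_field_tensor(E, B)
    shifted = [[F[m][n] - (lam if m == n else 0.0) for n in range(4)] for m in range(4)]
    return det(shifted)

def invariant_form(E, B, lam):
    """lam^4 - (E^2 - B^2) lam^2 - (E.B)^2."""
    E2 = sum(c * c for c in E)
    B2 = sum(c * c for c in B)
    EB = sum(x * y for x, y in zip(E, B))
    return lam ** 4 - (E2 - B2) * lam ** 2 - EB ** 2

E, B = [1.0, 2.0, 3.0], [0.5, -1.5, 2.5]
samples = [-2.0, -0.5, 0.0, 1.0, 3.0]
print(all(abs(characteristic_polynomial(E, B, lam) - invariant_form(E, B, lam)) < 1e-9
          for lam in samples))
```

Since a quartic is fixed by its values at five points, agreement at the five sample values of λ already settles the identity for the chosen fields.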


Remark for experts. Computations become simpler$^{8}$ in the language of differential
forms. We refrain from discussing differential forms in these lecture notes;
however, here’s a teaser. Let E and B denote the 1-forms associated with the
electric and magnetic four-field; the components are $E_\mu$ and $B_\mu$, respectively.
Then equation (8.25a) becomes

\[ F = u \wedge E + {*}(u \wedge B) , \]

which implies

\[ {*F} = {*}(u \wedge E) + {*}{*}(u \wedge B) = -u \wedge B + {*}(u \wedge E) , \]

i.e., (8.25b). Let A and B be arbitrary 2-forms; the definition of the Hodge
star, $A \wedge ({*B}) = \eta(A, B)\,\mathrm{vol}$, leads to $A \wedge ({*B}) = ({*A}) \wedge B$. In particular, since
${*}{*} = -\mathrm{id}$, we find $({*A}) \wedge ({*A}) = -A \wedge A$ for arbitrary A. Now, on the one
hand,

\[ F \wedge ({*F}) = \eta(F, F)\,\mathrm{vol} = \tfrac12\, F_{\mu\nu} F^{\mu\nu}\,\mathrm{vol} ; \]

on the other hand,

\begin{align*}
F \wedge ({*F}) &= \bigl( u \wedge E + {*}(u \wedge B) \bigr) \wedge \bigl( -u \wedge B + {*}(u \wedge E) \bigr) \\
 &= (u \wedge E) \wedge {*}(u \wedge E) - {*}(u \wedge B) \wedge (u \wedge B) \\
 &= \eta(u \wedge E, u \wedge E)\,\mathrm{vol} - \eta(u \wedge B, u \wedge B)\,\mathrm{vol} \\
 &= \left( \begin{vmatrix} \eta(u,u) & \eta(u,E) \\ \eta(E,u) & \eta(E,E) \end{vmatrix}
    - \begin{vmatrix} \eta(u,u) & \eta(u,B) \\ \eta(B,u) & \eta(B,B) \end{vmatrix} \right) \mathrm{vol} \\
 &= \bigl( -\eta(E, E) + \eta(B, B) \bigr)\,\mathrm{vol} .
\end{align*}

This yields (8.33) and thus invariance of $\vec E^2 - \vec B^2$.

Equation (8.32) is obtained by considering $F \wedge F$. On the one hand,

\begin{align*}
F \wedge F &= \bigl( u \wedge E + {*}(u \wedge B) \bigr) \wedge \bigl( u \wedge E + {*}(u \wedge B) \bigr) \\
 &= (u \wedge E) \wedge {*}(u \wedge B) + {*}(u \wedge B) \wedge (u \wedge E) \\
 &= 2\, \eta(u \wedge E, u \wedge B)\,\mathrm{vol}
  = 2 \begin{vmatrix} \eta(u,u) & \eta(u,E) \\ \eta(B,u) & \eta(E,B) \end{vmatrix} \mathrm{vol} \\
 &= -2\, \eta(E, B)\,\mathrm{vol} .
\end{align*}

On the other hand, $F \wedge F = -\eta(F, {*F})\,\mathrm{vol}$, and $\eta(F, {*F}) = (1/2) F_{\mu\nu}\, {*F}^{\mu\nu}$;
hence equation (8.32) follows.

7
There is a small subtlety here that involves the index structure of the tensor. We refrain
from giving any details, since the general picture remains unaffected.
8
“Simpler”.

8.3 The field of a uniformly moving charge

Consider a uniformly moving point charge. It is characterized by a straight
world line with a four-velocity vector that we denote by $w^\mu$; w.l.o.g. we assume
that the world line passes through the origin. What is the electromagnetic field
generated by this point charge?

Let’s pretend that we haven’t ever heard about a Coulomb field. But still we
would like to know the electromagnetic field generated by the point charge.
Is this possible? Yes. We can derive the electromagnetic field of a uniformly
moving charge from basic mathematical and geometric considerations.

The field tensor of a uniformly moving charge

The electromagnetic field is represented by an antisymmetric tensor field $F_{\mu\nu}$;
at event $x^\mu$, the field is $F_{\mu\nu}(x^\sigma)$. This tensor must be built from coordinate-independent
entities; however, there are but three available coordinate-independent
entities: The event $x^\mu$ itself, the four-velocity $w^\mu$ of the point charge, and
the distance r between the event and the world line of the charge.

By the distance r between an event $x^\mu$ and a timelike (straight) line

\[ \{ z^\mu + s w^\mu \mid w^\mu w_\mu = -1,\ s \in \mathbb R \} \]

we mean the normal distance, i.e.,

\[ r^2 = \bigl( P_w(x - z) \bigr)^2 = \eta\bigl( P_w(x - z), P_w(x - z) \bigr)
       = \bigl( P_w x^\mu - P_w z^\mu \bigr) \bigl( P_w x_\mu - P_w z_\mu \bigr) , \]

where $P_w v^\mu = v^\mu + \eta(v, w) w^\mu$ is the spatial projection of an arbitrary four-vector
$v^\mu$ onto the orthogonal complement of $w^\mu$, see (5.14). We thus obtain

\[ r^2 = \bigl( P_w(x - z) \bigr)^2 = \eta(x - z, x - z) + \eta(w, x - z)^2 . \tag{8.34} \]

The notion of distance in Minkowski space is analogous to the notion of distance
in a Euclidean space. However, note that the distance between an event and a
timelike (straight) line is the maximal distance; this is in contrast to Euclidean
geometry, where the normal distance is the minimal distance.
Remark. The reader is encouraged to go back to section 5.3 and reinvestigate
the considerations on proper length and the Lorentz contraction by making use
of the concept of distance.

Let us return to the problem at hand. The assumption that the world line of
the charge passes through the origin amounts to setting z µ = 0 in (8.34); hence
the distance between the event xµ and the particle’s world line is

\[ r^2 = (P_w x)^2 = \eta(x, x) + \eta(w, x)^2 . \tag{8.35} \]

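The electromagnetic field tensor must be built from $x^\mu$, $w^\mu$, and r. There are
only two possible ways: Either

\[ F_{\mu\nu} = \frac{1}{f(r)}\, w_{[\mu} x_{\nu]} , \tag{8.36$_E$} \]

or

\[ \bar F_{\mu\nu} = \frac{1}{f(r)}\, \tfrac12\, \epsilon_{\mu\nu\sigma\tau}\, w^{[\sigma} x^{\tau]} , \tag{8.36$_B$} \]

where f(r) is some function of r.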

Let us denote the rest frame of the charge by $X'$ and the temporal and spatial
coordinates of this comoving observer by $t'$ and $\vec x'$, respectively. Let us compute
$w_{[\mu} x_{\nu]}$ in these coordinates. Since, w.r.t. $X'$, the four-velocity $w^\mu$ of the particle
is

\[ w^\mu = \begin{pmatrix} 1 \\ \vec o \end{pmatrix} , \]

and the event $x^\mu$ has coordinates

\[ x^\mu = \begin{pmatrix} t' \\ \vec x' \end{pmatrix} , \]

it is easy to see that $w_{[\mu} x_{\nu]}$ is represented by a matrix of the form

\[ w_{[\mu} x_{\nu]} = \frac12 \begin{pmatrix}
   0     & -x'^1 & -x'^2 & -x'^3 \\
   x'^1  & 0     & 0     & 0     \\
   x'^2  & 0     & 0     & 0     \\
   x'^3  & 0     & 0     & 0
   \end{pmatrix} \]

in the coordinates of $X'$. (Recall that $w_\mu = \eta_{\mu\nu} w^\nu$; hence $w_0 = -1$ whenever
$w^0 = 1$.)

Therefore, with (8.36$_E$),

\[ F_{\mu\nu} = \frac{1}{2 f(r)} \begin{pmatrix}
   0     & -x'^1 & -x'^2 & -x'^3 \\
   x'^1  & 0     & 0     & 0     \\
   x'^2  & 0     & 0     & 0     \\
   x'^3  & 0     & 0     & 0
   \end{pmatrix} \]

in the coordinates of $X'$, so that (8.16) leads to

\[ \vec E' = \vec E'(t', \vec x') = \frac{\vec x'}{2 f(r)} , \qquad \vec B' = \vec B'(t', \vec x') = \vec o . \tag{8.37} \]

The distance r is simply given by $r = |\vec x'|$ in these coordinates. (Note that
$P_w x^\mu = (0, \vec x')^T$.)

The function f(r) can be determined (at least heuristically) by using geometric
arguments involving surfaces of spheres. We obtain $f(r) \propto r^3$ and thus

\[ \vec E' = \tilde e\, \frac{\vec x'}{r^3} , \qquad \vec B' = \vec o . \tag{8.38} \]

This is the Coulomb field of a point charge. In this context, $\tilde e = e/(4\pi\epsilon_0)$,
where e is the charge, and $\epsilon_0$ is the vacuum permittivity or dielectric constant.

Summarizing, the electromagnetic field tensor that corresponds to the Coulomb
field is

\[ F_{\mu\nu}(x^\rho) = \tilde e\, \frac{2 w_{[\mu} x_{\nu]}}{r^3} = \tilde e\, \frac{w_\mu x_\nu - w_\nu x_\mu}{r^3} , \tag{8.39} \]

where $w^\mu$ is the four-velocity of the charge, r the distance between $x^\rho$ and the
particle’s world line, and $\tilde e = e/(4\pi\epsilon_0)$. In the coordinates of the comoving
observer $X'$ we have

\[ F_{\mu\nu} = \frac{\tilde e}{r^3} \begin{pmatrix}
   0     & -x'^1 & -x'^2 & -x'^3 \\
   x'^1  & 0     & 0     & 0     \\
   x'^2  & 0     & 0     & 0     \\
   x'^3  & 0     & 0     & 0
   \end{pmatrix} \tag{8.39$'$} \]

and thus (8.38). Some comments are in order.
Remark. To derive (8.39) we have employed (8.36$_E$). What about (8.36$_B$)?
Using (8.36$_B$) we obtain the dual of (8.39$'$), i.e.,

\[ \bar F_{\mu\nu} = {*F}_{\mu\nu} = \frac{\tilde e}{r^3} \begin{pmatrix}
   0 & 0     & 0     & 0     \\
   0 & 0     & x'^3  & -x'^2 \\
   0 & -x'^3 & 0     & x'^1  \\
   0 & x'^2  & -x'^1 & 0
   \end{pmatrix} , \]

when we use the coordinates of $X'$. This electromagnetic field tensor would
correspond to

\[ \vec{\bar E}' = \vec o , \qquad \vec{\bar B}' = \tilde e\, \frac{\vec x'}{r^3} , \tag{8.40} \]

which is the field of a (hypothetical) magnetic monopole. In this context, $\tilde e$
encodes the magnetic charge of the monopole. We reject this possibility on
physical grounds.
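Remark. An elegant derivation of (8.38) from (8.39) makes use of the concepts
of section 8.2. The electric four-field $E'^\mu$ seen by $X'$ is

\[ E'^\mu = F^\mu{}_\nu w^\nu = \tilde e\, \frac{w^\mu x_\nu - w_\nu x^\mu}{r^3}\, w^\nu
   = \tilde e\, \frac{\eta(w, x) w^\mu + x^\mu}{r^3} = \tilde e\, \frac{P_w x^\mu}{r^3} . \]

Since $P_w x^\mu = (0, \vec x')^T$ we obtain (8.38). The magnetic four-field $B'^\mu$ seen by
$X'$ is

\[ B'^\mu = -\tfrac12\, \epsilon^\mu{}_{\nu\sigma\tau} F^{\nu\sigma} w^\tau
   = -\frac{\tilde e}{r^3}\, \epsilon^\mu{}_{\nu\sigma\tau}\, w^{[\nu} x^{\sigma]} w^\tau
   = -\frac{\tilde e}{r^3}\, \epsilon^\mu{}_{\nu\sigma\tau}\, w^\nu x^\sigma w^\tau = 0 . \]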

Electric field for an arbitrary observer

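Our goal is to compute the electric and magnetic field as seen by an arbitrary
observer X, whose four-velocity is $u^\mu$ and thus different from the particle’s
four-velocity $w^\mu$. W.r.t. the coordinates $\{t, \vec x\}$ used by X we have

\[ u^\mu = \begin{pmatrix} 1 \\ \vec o \end{pmatrix} , \qquad
   w^\mu = \gamma \begin{pmatrix} 1 \\ \vec v \end{pmatrix} , \qquad
   x^\mu = \begin{pmatrix} t \\ \vec x \end{pmatrix} . \tag{8.41} \]

The electric field $\vec E$ as seen by X is given via the electric four-field

\[ E_\mu = F_{\mu\nu} u^\nu , \tag{8.42} \]

see (8.23a). We obtain

\[ E_\mu = \frac{\tilde e}{r^3} \bigl( w_\mu x_\nu - w_\nu x_\mu \bigr) u^\nu
         = \frac{\tilde e}{r^3} \bigl( -t\, w_\mu + \gamma x_\mu \bigr) , \tag{8.43} \]

which results in

\[ E^\mu(t, \vec x) = \frac{\tilde e}{r^3}\, \gamma \begin{pmatrix} 0 \\ \vec x - \vec v t \end{pmatrix} \tag{8.44} \]

in the coordinates of X. The vector on the r.h.s. of this equation represents
the distance between the point at which the field is measured and the position
of the charged particle at time t,

\[ \kappa^\mu = x^\mu - t \gamma^{-1} w^\mu = \begin{pmatrix} 0 \\ \vec x - \vec v t \end{pmatrix} , \qquad
   \vec\kappa = \vec x - \vec v t . \tag{8.45} \]

We thus write

\[ E^\mu(t, \vec x) = \frac{\tilde e}{r^3}\, \gamma \kappa^\mu \qquad \text{and} \qquad
   \vec E(t, \vec x) = \frac{\tilde e}{r^3}\, \gamma \vec\kappa . \tag{8.46} \]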

It remains to express $r^3$ in terms of the coordinates $(t, \vec x)$, ideally in terms of
$\vec\kappa$. To this end we first recall from (8.34) that

\[ r^2 = \eta(P_w x, P_w x) = x^\mu x_\mu + \bigl( w^\nu x_\nu \bigr)^2 . \]

Second we note that

\[ P_w x^\mu = P_w \kappa^\mu , \]

therefore

\[ r^2 = \eta(P_w \kappa, P_w \kappa) = \kappa^\mu \kappa_\mu + \bigl( w^\nu \kappa_\nu \bigr)^2 , \]

which can be treated in a straightforward manner.$^{9}$ We obtain

\[ r^2 = \vec\kappa^2 + \bigl( \gamma\, \vec\kappa\cdot\vec v \bigr)^2
       = \vec\kappa^2 + \gamma^2 \vec v^2 \vec\kappa^2 \cos^2\alpha
       = \gamma^2 \vec\kappa^2 \bigl( 1 - \vec v^2 \sin^2\alpha \bigr) , \tag{8.47} \]

where $\alpha$ is the angle between $\vec\kappa$ and $\vec v$. Inserting this result into (8.46) we
finally arrive at

\[ E^\mu(t, \vec x) = \frac{\tilde e}{\gamma^2 \bigl( 1 - \vec v^2 \sin^2\alpha \bigr)^{3/2}}\, \frac{\kappa^\mu}{|\vec\kappa|^3} , \tag{8.48} \]

so that

\[ \vec E(t, \vec x) = \frac{\tilde e}{\gamma^2 \bigl( 1 - \vec v^2 \sin^2\alpha \bigr)^{3/2}}\, \frac{\vec\kappa}{|\vec\kappa|^3} . \tag{8.49} \]

We conclude that the electric field of a moving charge is compressed by a factor
of $\gamma^{-2}$ in the direction of motion. However, it is stretched by a factor of $\gamma$ in
the plane orthogonal to the velocity. If the velocity is close to the speed of
light, the electric field is almost concentrated in that plane.

9
Note the intimate connection of these considerations with the results of section 5.3.
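The two limiting factors can be read off from (8.49) directly. The sketch below evaluates the ratio of the field strength of the moving charge to the Coulomb value $\tilde e/|\vec\kappa|^2$ as a function of the speed and the angle α; the function name and the sample speed are illustrative, and c = 1 is assumed.

```python
import math

def field_strength_ratio(speed, alpha):
    """|E| from (8.49) divided by the Coulomb value e~/|kappa|^2."""
    gamma_squared = 1.0 / (1.0 - speed * speed)
    return 1.0 / (gamma_squared * (1.0 - speed * speed * math.sin(alpha) ** 2) ** 1.5)

speed = 0.6
gamma = 1.0 / math.sqrt(1.0 - speed * speed)

along_motion = field_strength_ratio(speed, 0.0)            # alpha = 0
transverse = field_strength_ratio(speed, math.pi / 2.0)    # alpha = pi/2

print(abs(along_motion - gamma ** -2) < 1e-12)   # compressed by gamma^-2
print(abs(transverse - gamma) < 1e-12)           # stretched by gamma
```

For speeds close to 1 the transverse ratio grows like γ while the longitudinal ratio falls like $\gamma^{-2}$, which is the concentration of the field in the orthogonal plane described above.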

Magnetic field for an arbitrary observer

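The magnetic four-field seen by the observer X with four-velocity $u^\mu$ is

\[ B_\mu = -{*F}_{\mu\nu} u^\nu
   = -\tfrac12\, \epsilon_{\mu\nu\sigma\tau}\, \frac{\tilde e}{r^3}\, 2 w^{[\sigma} x^{\tau]} u^\nu
   = -\epsilon_{\mu\nu\sigma\tau}\, \frac{\tilde e}{r^3}\, w^\sigma x^\tau u^\nu . \]

We anticipate that there is a simple relation between $E^\mu$ and $B^\mu$. Hence we
use (8.43) in the form $\tilde e x^\mu / r^3 = \gamma^{-1} \bigl( E^\mu + (\tilde e t / r^3)\, w^\mu \bigr)$ and obtain

\[ B_\mu = -\gamma^{-1} \epsilon_{\mu\nu\sigma\tau}\, w^\sigma \bigl( E^\tau + (\tilde e t / r^3)\, w^\tau \bigr) u^\nu
         = \gamma^{-1} u^\nu \epsilon_{\nu\mu\sigma\tau}\, w^\sigma E^\tau , \]

where the term proportional to $w^\sigma w^\tau$ drops out by antisymmetry. W.r.t. the
observer’s coordinates we have (8.41), hence $u^\mu$ is represented by $\delta^\mu{}_0$,
which leads to

\[ B_i = \gamma^{-1} \epsilon_{0 i \sigma\tau}\, w^\sigma E^\tau = \gamma^{-1} \epsilon_{0ijk}\, w^j E^k = \epsilon_{ijk}\, v^j E^k , \]

i.e.,

\[ \vec B = \vec v \times \vec E , \tag{8.50} \]

where $\vec E$ is given by (8.49).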

An alternative derivation

Naturally there exist alternative derivations of the electromagnetic field of a
uniformly moving charge. The ‘standard’ derivation is based on the transformation
formula (8.31) (which did not enter at all in our considerations above!)
and application of the Lorentz transformation. Let us sketch this alternative
derivation.

We presuppose equation (8.38), which is the electric (Coulomb) field in the
rest frame of the particle. W.r.t. an inertial frame X, in which the charged
particle is moving with velocity $\vec v$, the field is $\vec E$ and $\vec B$. From (8.31) we infer

\[ \vec E_\parallel = \vec E'_\parallel , \qquad \vec E_\perp = \gamma \bigl( \vec E'_\perp - \vec v\times\vec B' \bigr) , \tag{8.51a} \]
\[ \vec B_\parallel = \vec B'_\parallel , \qquad \vec B_\perp = \gamma \bigl( \vec B'_\perp + \vec v\times\vec E' \bigr) , \tag{8.51b} \]

hence

\[ \vec E_\parallel = \vec E'_\parallel , \qquad \vec E_\perp = \gamma \vec E'_\perp , \tag{8.52a} \]
\[ \vec B_\parallel = 0 , \qquad \vec B_\perp = \gamma\, \vec v\times\vec E' . \tag{8.52b} \]

It remains to express $\vec x'$ in terms of t and $\vec x$, which can be done by using the
Lorentz transformation (2.31),

\[ \begin{pmatrix} t' \\ x'^1 \\ x'^2 \\ x'^3 \end{pmatrix}
   = \begin{pmatrix} \gamma & -\gamma \vec v^T \\ -\gamma \vec v & \mathbb 1 + \dfrac{\gamma - 1}{v^2}\, \vec v \vec v^T \end{pmatrix}
     \begin{pmatrix} t \\ x^1 \\ x^2 \\ x^3 \end{pmatrix} . \tag{8.53} \]

Standard (but tiresome) algebraic manipulations show that the results can be
expressed in terms of the vector $\vec\kappa = \vec x - \vec v t$. Finally, we arrive again at (8.49).
The simplest way to compute the magnetic field $\vec B$ is to again make use
of (8.52):

\[ \vec E_\parallel = \vec E'_\parallel , \qquad \vec E_\perp = \gamma \vec E'_\perp , \]
\[ \vec B_\parallel = 0 , \qquad \vec B_\perp = \gamma\, \vec v\times\vec E' = \gamma\, \vec v\times\vec E'_\perp . \]

Consequently,

\[ \vec B_\parallel = 0 , \qquad \vec B_\perp = \vec v\times\bigl( \gamma \vec E'_\perp \bigr) = \vec v\times\vec E_\perp = \vec v\times\vec E , \]

and therefore

\[ \vec B = \vec v \times \vec E . \tag{8.54} \]

This completes our discussion of the electromagnetic field of a uniformly moving
charge.
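The "standard (but tiresome) algebraic manipulations" can be bypassed by a numerical comparison. The sketch below follows the alternative derivation step by step (boost the event into the rest frame, evaluate the Coulomb field (8.38), transform back with (8.52)) and compares the result with the closed forms (8.49) and (8.50). It assumes c = 1, $\tilde e = 1$ by default, a nonzero velocity, and an event off the world line; all names are illustrative.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return [a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0]]

def field_via_rest_frame(t, x, v, e_tilde=1.0):
    """Alternative derivation: (8.53) -> (8.38) -> (8.52)."""
    v2 = dot(v, v)
    gamma = 1.0 / math.sqrt(1.0 - v2)
    vx = dot(v, x)
    # Spatial part of the boost (8.53): rest-frame position of the event.
    x_rest = [x[i] + (gamma - 1.0) * vx * v[i] / v2 - gamma * v[i] * t for i in range(3)]
    r = math.sqrt(dot(x_rest, x_rest))
    E_rest = [e_tilde * c / r ** 3 for c in x_rest]          # Coulomb field (8.38)
    E_par = [dot(E_rest, v) / v2 * c for c in v]
    E = [E_par[i] + gamma * (E_rest[i] - E_par[i]) for i in range(3)]   # (8.52a)
    B = [gamma * c for c in cross(v, E_rest)]                            # (8.52b)
    return E, B

def field_closed_form(t, x, v, e_tilde=1.0):
    """Closed forms (8.49) and (8.50) in terms of kappa = x - v t."""
    v2 = dot(v, v)
    gamma = 1.0 / math.sqrt(1.0 - v2)
    kappa = [x[i] - v[i] * t for i in range(3)]
    k2 = dot(kappa, kappa)
    sin2_alpha = 1.0 - dot(kappa, v) ** 2 / (k2 * v2)
    factor = e_tilde / (gamma ** 2 * (1.0 - v2 * sin2_alpha) ** 1.5 * k2 ** 1.5)
    E = [factor * c for c in kappa]
    return E, cross(v, E)

t, x, v = 0.7, [1.0, -2.0, 0.5], [0.3, 0.4, -0.2]
E1, B1 = field_via_rest_frame(t, x, v)
E2, B2 = field_closed_form(t, x, v)
print(all(abs(a - b) < 1e-12 for a, b in zip(E1 + B1, E2 + B2)))
```

Agreement at a generic event is of course no proof, but it is a useful guard against sign errors in the parallel/perpendicular bookkeeping.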


8.4 Distributions of particles

Previously we have studied in detail the motion of a single particle. In this
section we analyze a continuous distribution of particles.

A distribution of particles is characterized by a function $\rho(t, \vec x)$ that represents
the particle density at time t and position $\vec x$. (Alternatively, the function $\rho(t, \vec x)$
might represent the mass density or the charge density. It is the latter that is
relevant in electromagnetism.) Accordingly,

\[ N(t) = \int_V \rho(t, \vec x)\, d^3x \tag{8.55a} \]

represents the number of particles in a given volume V at time t. It is customary
to write

\[ dN = \rho\, d^3x . \tag{8.55b} \]

In addition to $\rho(t, \vec x)$ there exists a velocity field $\vec v(t, \vec x)$, which represents the
velocity of the particles at position $\vec x$, at time t. The particle current is simply
given as $\vec\jmath = \rho \vec v$. If particles are conserved, the continuity equation holds,

\[ \frac{\partial\rho}{\partial t} + \vec\nabla\cdot\vec\jmath = 0 . \tag{8.56} \]

(If $\rho$ represents the charge density, then the continuity equation reflects the
conservation of charge.)

Obviously, the definitions and formulas given here make sense only once coordinates
$\{t, \vec x\}$ have been chosen. However, while the velocity field $\vec v(t, \vec x)$
undergoes the obvious transformation under a change of coordinates, the density
$\rho(t, \vec x)$ does not: $\rho$ is not a scalar function. This is because its definition
involves volumes, which change under a change of observer (Lorentz transformation),
see section 5.3, where we discuss the Lorentz contraction. Using the
results of that section we can find an easy fix by simply compensating for the
change of volume with the right $\gamma$ factor. However, to neatly embed the treatment
of distributions of particles into the four-vector formalism of Minkowski
spacetime, let us proceed a bit more systematically and geometrically.

A distribution of particles in Minkowski space is a collection of particles represented
by a scalar field $\rho_p(x^\sigma)$ and a vector field $u^\mu(x^\sigma)$ on Minkowski space.
The latter encodes the four-velocity of the particles at the event $x^\sigma$, the former
is the particle density measured in the rest frame of the particles at the event
$x^\sigma$. We call $\rho_p$ the proper density of the distribution.

Remark. Consider a fixed event $x^\sigma$. For an observer with four-velocity $u^\mu(x^\sigma)$,
we have

\[ u^\mu = \begin{pmatrix} 1 \\ \vec o \end{pmatrix} \tag{8.57} \]

w.r.t. the observer’s coordinates $\{t, \vec x\}$ and basis $\{u = e_0, e_1, e_2, e_3\}$. The proper
density (at $x^\sigma$) is the density in these coordinates, i.e.,

\[ dN = \rho_p\, d^3x . \tag{8.58} \]

(Note that the volume element $d^3x$ is derived from the rest frame coordinates
we use.)
Definition 8.1. The particle four-current density of a distribution of particles
is

\[ j^\mu = \rho_p u^\mu . \tag{8.59} \]

Since $j^\mu$ is constructed from a scalar and a four-vector, it is a four-vector.

W.r.t. an arbitrary observer X we have

\[ j^\mu = \rho_p u^\mu = \rho_p \gamma \begin{pmatrix} 1 \\ \vec v \end{pmatrix} . \tag{8.60} \]

In order to interpret the quantity $\rho_p \gamma$ in terms of the density $\rho$ measured by X
we must consider volumes.

By definition, the proper density $\rho_p$ at the event $x^\mu$ represents the number
of particles contained in a unit volume in the rest frame of the particles at
$x^\mu$. For an observer X that moves w.r.t. this rest frame with velocity $-\vec v$,
‘longitudinal lengths’ undergo a Lorentz contraction by a factor of $\gamma^{-1}$, while
‘transversal lengths’ are unaffected.$^{10}$ Consequently, volumes (like a cube)
appear ‘contracted’ by a factor of $\gamma^{-1}$ in the direction of motion of X. Under
the change of observer we consider, we thus find that

\[ \rho = \rho_p \gamma \tag{8.61} \]

is the density seen by X.

Accordingly, we can write the four-current density (8.60) as

\[ j^\mu = \rho \begin{pmatrix} 1 \\ \vec v \end{pmatrix} , \tag{8.60$'$} \]

where $\rho$ and $\vec v$ are the particle density and the velocity field of the particle
distribution as seen by X.

The continuity equation takes a manifestly covariant form when we use the
four-current $j^\mu$. From (8.56) we directly obtain

\[ \partial_\mu j^\mu = 0 , \tag{8.62} \]

i.e., the four-current density is divergence-free. Recall that $\partial_0 = \partial/\partial x^0 = \partial/\partial t$
and that $\partial_i = \partial/\partial x^i$.

10
The observer X moves with velocity $-\vec v$ w.r.t. the local rest frame of the particles. Hence, in
the observer’s coordinates, the particles move with velocity $\vec v$, which is reflected in (8.60).
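A small numerical illustration of (8.59) to (8.61): the sketch below builds $j^\mu$ from a proper density and a three-velocity and reads off the density and current seen by the observer. It assumes the signature (−,+,+,+) and c = 1; the sample values and names are illustrative.

```python
import math

ETA = [-1.0, 1.0, 1.0, 1.0]  # diag(-1, 1, 1, 1)

def four_current(rho_proper, v):
    """j^mu = rho_p u^mu with u^mu = gamma (1, v), cf. (8.59) and (8.60)."""
    gamma = 1.0 / math.sqrt(1.0 - sum(c * c for c in v))
    return [rho_proper * gamma] + [rho_proper * gamma * c for c in v]

def minkowski_square(a):
    """eta(a, a) = a^mu a_mu."""
    return sum(ETA[m] * a[m] * a[m] for m in range(4))

rho_proper, v = 3.0, [0.6, 0.0, 0.0]
j = four_current(rho_proper, v)

rho_observed = j[0]                                   # density seen by X, (8.61)
current_over_density = [j[i] / j[0] for i in (1, 2, 3)]  # should reproduce v, (8.60')
gamma = 1.0 / math.sqrt(1.0 - 0.36)

print(abs(rho_observed - gamma * rho_proper) < 1e-12)
print(abs(minkowski_square(j) + rho_proper ** 2) < 1e-12)   # j^mu j_mu = -rho_p^2
```

The invariant $j^\mu j_\mu = -\rho_p^2$ makes explicit that the proper density is the observer-independent quantity, whereas $\rho = j^0$ is not.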

A little more about volumes

In analogy to the rigid rod considered in section 5.3, consider an extended


body in uniform motion that takes up a fixed volume; let the four-velocity of
the body be uµ . The rest frame of the body is given by the co-moving observer
X, whose four-velocity coincides with that of the body.

In the simplest case, the volume is a cuboid that is spanned by three spacelike vectors $a^\mu$, $b^\mu$, $c^\mu$. W.l.o.g. we could choose these vectors to be orthogonal to the four-velocity $u^\mu$, so that they lie in the observer’s plane of simultaneity; this is not necessary, however. W.r.t. the co-moving observer’s coordinates we have
\[ u^\mu = \begin{pmatrix} 1 \\ \vec o \end{pmatrix} , \quad a^\mu = \begin{pmatrix} a^0 \\ \vec a \end{pmatrix} , \quad b^\mu = \begin{pmatrix} b^0 \\ \vec b \end{pmatrix} , \quad c^\mu = \begin{pmatrix} c^0 \\ \vec c \end{pmatrix} . \tag{8.63} \]
Each point $p^\mu$ in the volume taken up by the body propagates along a world line $p^\mu + s u^\mu$. Now, the proper volume of the cuboid (which is the volume measured in the rest frame $X$) is simply
\[ V = V_p = \det\begin{pmatrix} \vec a & \vec b & \vec c \end{pmatrix} = \epsilon_{ijk}\, a^i b^j c^k . \tag{8.64} \]

Using the coordinate representation (8.63) it is not difficult to convince oneself that this is equivalent to
\[ V = V_p = \epsilon_{\mu\alpha\beta\gamma}\, u^\mu a^\alpha b^\beta c^\gamma . \tag{8.65} \]
Since the volume form $\epsilon_{\alpha\beta\gamma\delta}$ is invariant under a change of inertial coordinates, this formula holds independently of the chosen coordinates.

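For a different observer $X'$, whose four-velocity is $u'^\mu$, the body moves with some velocity $\vec v$, i.e.,
\[ u'^\mu = \begin{pmatrix} 1 \\ \vec o \end{pmatrix} , \quad u^\mu = \gamma \begin{pmatrix} 1 \\ \vec v \end{pmatrix} \tag{8.66} \]
w.r.t. $X'$. W.l.o.g. the three vectors $a^\mu$, $b^\mu$, and $c^\mu$ can be arranged to be simultaneous in the coordinates of $X'$, i.e.,
\[ a^\mu = \begin{pmatrix} 0 \\ \vec a' \end{pmatrix} , \quad b^\mu = \begin{pmatrix} 0 \\ \vec b' \end{pmatrix} , \quad c^\mu = \begin{pmatrix} 0 \\ \vec c' \end{pmatrix} \tag{8.67} \]
w.r.t. $X'$. The volume measured by $X'$ is given by
\[ V' = \det\begin{pmatrix} \vec a' & \vec b' & \vec c' \end{pmatrix} = \epsilon_{ijk}\, a'^i b'^j c'^k \tag{8.68} \]
or, equivalently, by¹¹
\[ V' = \epsilon_{\mu\alpha\beta\gamma}\, u'^\mu a^\alpha b^\beta c^\gamma . \tag{8.69} \]
In contrast, the proper volume is given by (8.65). Inserting $u^\mu = \gamma\,(u'^\mu + v^i e'_i{}^\mu)$ into (8.65) we compute
\[ V_p = \gamma\, \epsilon_{\mu\alpha\beta\gamma}\, u'^\mu a^\alpha b^\beta c^\gamma + \gamma\, \underbrace{\epsilon_{\mu\alpha\beta\gamma}\, v^i e'_i{}^\mu a^\alpha b^\beta c^\gamma}_{=0} = \gamma V' . \tag{8.70} \]
Consequently, we find
\[ V' = \gamma^{-1} V_p , \tag{8.71} \]
i.e., there is a ‘volume contraction’.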
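
The volume contraction (8.71) can be checked numerically. The following is a minimal sketch, assuming units with c = 1, the convention $\epsilon_{0123} = 1$ (so that the contraction (8.65) is a 4×4 determinant), and a boost along the x-axis; the edge vectors and the speed 0.6 are illustrative values, not taken from the text.

```python
import numpy as np

def boost_x(v):
    """Boost taking rest-frame components to the frame X' in which
    the body moves with velocity +v along x (units with c = 1)."""
    gamma = 1.0 / np.sqrt(1.0 - v * v)
    L = np.eye(4)
    L[0, 0] = L[1, 1] = gamma
    L[0, 1] = L[1, 0] = gamma * v
    return L, gamma

def eps_contract(u, a, b, c):
    # epsilon_{mu alpha beta gamma} u^mu a^alpha b^beta c^gamma, assuming epsilon_{0123} = 1
    return np.linalg.det(np.array([u, a, b, c]))

# Rest-frame data, cf. (8.63); the numbers are illustrative only.
u_rest = np.array([1.0, 0.0, 0.0, 0.0])
a_rest = np.array([0.0, 2.0, 0.5, 0.0])
b_rest = np.array([0.0, 0.3, 3.0, 0.0])
c_rest = np.array([0.0, 0.1, 0.2, 1.5])
V_p = eps_contract(u_rest, a_rest, b_rest, c_rest)   # proper volume (8.65)

# Components w.r.t. the observer X'.
L, gamma = boost_x(0.6)
u, a, b, c = (L @ w for w in (u_rest, a_rest, b_rest, c_rest))

def make_simultaneous(w, u):
    # Slide the end point along the body's world line (direction u)
    # until its time component vanishes in X', cf. (8.67).
    return w - (w[0] / u[0]) * u

a_s, b_s, c_s = (make_simultaneous(w, u) for w in (a, b, c))

# Volume measured by X', cf. (8.68): determinant of the spatial parts.
V_prime = np.linalg.det(np.array([a_s[1:], b_s[1:], c_s[1:]]))

print("V_p =", V_p, " V' =", V_prime, " gamma * V' =", gamma * V_prime)
```

Sliding the edge vectors along $u^\mu$ does not change the determinant in (8.65), which is why the proper volume can be evaluated with either set of vectors.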

8.5 Maxwell’s equations

W.r.t. an inertial coordinate system, Maxwell’s equations read
\begin{align}
\vec\nabla \cdot \vec E &= 4\pi\rho , & \vec\nabla \cdot \vec B &= 0 , \tag{8.72a} \\
\vec\nabla \times \vec E &= -\partial_t \vec B , & \vec\nabla \times \vec B &= \partial_t \vec E + 4\pi \vec j , \tag{8.72b}
\end{align}
where $\rho = \rho(t, \vec x)$ is the charge density and $\vec j = \rho \vec v$ is the charge current density. Our aim is to find a version of Maxwell’s equations that is manifestly covariant.

¹¹ The fact that we have chosen $a^\mu$, $b^\mu$, $c^\mu$ to be orthogonal to $u'^\mu$ is important here.

Let us first consider the equations with source terms, i.e.,
\[ \vec\nabla \cdot \vec E = 4\pi\rho , \qquad \vec\nabla \times \vec B - \partial_t \vec E = 4\pi \vec j . \tag{8.73} \]
In the preceding section we have seen that $\rho$ and $\vec j$ can be collected into a four-vector, the four-current density
\[ j^\mu = \rho \begin{pmatrix} 1 \\ \vec v \end{pmatrix} . \]
Recalling that $F_{i0} = \delta_{ij} E^j$ and $F_{ij} = \epsilon_{ijk} B^k$ it is straightforward to see that the equations (8.73) take the form
\[ \partial_\nu F^{\mu\nu} = 4\pi j^\mu . \tag{8.73′} \]
This is the first of the Maxwell equations in their relativistic formulation. (Clearly, since $\mu = 0, \ldots, 3$, we have four equations encoded in (8.73′).)

Example. For instance, let $\mu$ be a spatial index $k$. Then
\[ \partial_\nu F^{k\nu} = \partial_0 F^{k0} + \partial_i F^{ki} = \partial_t \big( \eta^{00} E^k \big) + \partial_i \big( \epsilon^{kij} B_j \big) = -\partial_t E^k + \big( \vec\nabla \times \vec B \big)^k , \]
from which the second equation of (8.73) follows.

Remark. Often, partial derivatives are denoted by commas, e.g., for a function $f$ we have $f_{,\mu} \equiv \partial_\mu f$. Using this notation, $\partial_\nu F^{\mu\nu}$ becomes $F^{\mu\nu}{}_{,\nu}$, so that (8.73′) is written as
\[ F^{\mu\nu}{}_{,\nu} = 4\pi j^\mu . \tag{8.73′′} \]

Second we consider the remaining equations
\[ \vec\nabla \cdot \vec B = 0 , \qquad \vec\nabla \times \vec E + \partial_t \vec B = 0 . \tag{8.74} \]
A straightforward calculation shows that these equations are equivalent to
\[ \partial_{[\mu} F_{\nu\sigma]} = 0 . \tag{8.74′} \]
Equivalently, we can write
\[ F_{[\mu\nu,\sigma]} = 0 . \tag{8.74′′} \]

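Example. For instance, consider $\partial_{[0} F_{ij]}$, which is
\begin{align*}
\partial_{[0} F_{ij]} &= \tfrac13 \big[ \partial_0 F_{ij} + \partial_i F_{j0} + \partial_j F_{0i} \big] = \tfrac13 \big[ \partial_t \big( \epsilon_{ijk} B^k \big) + \partial_i E_j - \partial_j E_i \big] \\
&= \tfrac13 \big[ \epsilon_{ijk}\, \partial_t B^k + 2 \partial_{[i} E_{j]} \big] .
\end{align*}
Then
\begin{align*}
3 \epsilon^{lij} \partial_{[0} F_{ij]} &= \epsilon^{lij} \epsilon_{ijk}\, \partial_t B^k + 2 \epsilon^{lij} \partial_i E_j = \big( \delta^j_j \delta^l_k - \delta^j_k \delta^l_j \big) \partial_t B^k + 2 \epsilon^{lij} \partial_i E_j \\
&= \big( 3 \delta^l_k - \delta^l_k \big) \partial_t B^k + 2 \epsilon^{lij} \partial_i E_j = 2 \partial_t B^l + 2 \big( \vec\nabla \times \vec E \big)^l
\end{align*}
and the second equation in (8.74) follows.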

Collecting the results we obtain the Maxwell equations in their manifestly covariant relativistic version:
\[ F_{[\mu\nu,\sigma]} = 0 , \qquad F^{\mu\nu}{}_{,\nu} = 4\pi j^\mu . \tag{8.75} \]
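
The equivalence of (8.75) with the three-dimensional form (8.72) can be verified symbolically. The sketch below assumes the component matrix of $F_{\mu\nu}$ that the text quotes from (8.16), the signature (− + + +), and units with c = 1; the helper names (`curl`, `div3`, `cyclic`) are introduced here for illustration only.

```python
import sympy as sp

t, x, y, z = sp.symbols("t x y z", real=True)
X = (t, x, y, z)
E = [sp.Function(f"E{i}")(*X) for i in (1, 2, 3)]
B = [sp.Function(f"B{i}")(*X) for i in (1, 2, 3)]

eta = sp.diag(-1, 1, 1, 1)            # Minkowski metric, signature (-+++)
F_low = sp.Matrix([                   # F_{mu nu}, with F_{i0} = E^i and F_{ij} = eps_{ijk} B^k
    [0,    -E[0], -E[1], -E[2]],
    [E[0],  0,     B[2], -B[1]],
    [E[1], -B[2],  0,     B[0]],
    [E[2],  B[1], -B[0],  0],
])
F_up = eta * F_low * eta              # F^{mu nu}; eta is its own inverse

def curl(V):
    return [sp.diff(V[2], y) - sp.diff(V[1], z),
            sp.diff(V[0], z) - sp.diff(V[2], x),
            sp.diff(V[1], x) - sp.diff(V[0], y)]

def div3(V):
    return sum(sp.diff(V[i], X[i + 1]) for i in range(3))

# Inhomogeneous equations: d_nu F^{mu nu} should equal (div E, curl B - d_t E).
divF = [sum(sp.diff(F_up[m, n], X[n]) for n in range(4)) for m in range(4)]
expected = [div3(E)] + [curl(B)[k] - sp.diff(E[k], t) for k in range(3)]
inhom_residual = [sp.simplify(divF[m] - expected[m]) for m in range(4)]

# Homogeneous equations: the cyclic sum is proportional to d_[mu F_{nu sigma]}.
def cyclic(m, n, s):
    return (sp.diff(F_low[n, s], X[m]) + sp.diff(F_low[s, m], X[n])
            + sp.diff(F_low[m, n], X[s]))

hom_residual = [
    sp.simplify(cyclic(1, 2, 3) - div3(B)),                          # div B
    sp.simplify(cyclic(0, 2, 3) - (sp.diff(B[0], t) + curl(E)[0])),  # (d_t B + curl E)_1
    sp.simplify(cyclic(0, 3, 1) - (sp.diff(B[1], t) + curl(E)[1])),  # (d_t B + curl E)_2
    sp.simplify(cyclic(0, 1, 2) - (sp.diff(B[2], t) + curl(E)[2])),  # (d_t B + curl E)_3
]
print(inhom_residual, hom_residual)
```

All residuals vanish identically, i.e., the four components of $F^{\mu\nu}{}_{,\nu}$ and the four independent components of $F_{[\mu\nu,\sigma]}$ reproduce the left-hand sides of (8.73) and (8.74).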

Remark. Alternatively, the homogeneous Maxwell equation can be expressed in terms of the dual field tensor ${}^*F_{\mu\nu}$. The Maxwell equations then read
\[ {}^*F^{\mu\nu}{}_{,\nu} = 0 , \qquad F^{\mu\nu}{}_{,\nu} = 4\pi j^\mu , \tag{8.76} \]
which is a simple consequence of (8.20) and (8.75).


Remark for experts. Let us define the complex tensor¹²
\[ W_{\mu\nu} = F_{\mu\nu} + i\, {}^*F_{\mu\nu} . \tag{8.77} \]
This tensor is antisymmetric like $F_{\mu\nu}$ and ${}^*F_{\mu\nu}$; moreover, $W_{\mu\nu}$ is anti-self-dual, i.e.,
\[ {}^*W_{\mu\nu} = {}^*F_{\mu\nu} + i\, {}^{**}F_{\mu\nu} \overset{(8.21)}{=} {}^*F_{\mu\nu} - i F_{\mu\nu} = -i W_{\mu\nu} . \tag{8.78} \]
The complex conjugate tensor $\bar W_{\mu\nu}$ is self-dual, i.e., ${}^*\bar W_{\mu\nu} = i \bar W_{\mu\nu}$. In terms of $W_{\mu\nu}$, the Maxwell equations read
\[ W^{\mu\nu}{}_{,\nu} = 4\pi j^\mu . \tag{8.79} \]

¹² Some mathematical background: The duality map $*$ is a linear map on the linear space of antisymmetric tensors $F_{\mu\nu}$. Since $*$ is an anti-involution, i.e., $** = -\mathrm{id}$, the only possible eigenvalues of $*$ can be $\pm i$. If $W_{\mu\nu}$ is an eigenvector of $*$ associated with the eigenvalue $(-i)$, then the complex conjugate vector $\bar W_{\mu\nu}$ is automatically an eigenvector associated with the eigenvalue $i$.

Remark for experts. In the language of differential forms, the antisymmetric field $F_{\mu\nu}$ is a 2-form, which is simply denoted by $F$. The operation $F_{[\mu\nu,\sigma]}$ then corresponds to the exterior derivative of this 2-form and is written as $dF$. The operation $F_\mu{}^\nu{}_{,\nu}$ is the co-differential $\delta F = {*}\, d\, {*} F$. Hence, using differential forms, the Maxwell equations (8.75) look particularly simple:
\[ dF = 0 , \qquad \delta F = 4\pi j . \tag{8.80} \]
Here, $j$ is the current one-form, whose components are $j_\mu$. Alternatively, one writes
\[ dF = 0 , \qquad d\,{*}F = 4\pi J , \tag{8.81} \]
where $J = {*}j$ is the current three-form.

A simple consequence of Maxwell’s equations is the continuity equation. Indeed,
\[ 4\pi\, \partial_\mu j^\mu = \partial_\mu \partial_\nu F^{\mu\nu} = 0 , \tag{8.82} \]
which follows directly from the antisymmetry of $F_{\mu\nu}$ (where we note that $\partial_\mu \partial_\nu$ is symmetric). The continuity equation implies the conservation of charge.

8.6 Four-potential

A fundamental result in analysis concerns the existence of potentials of vector fields. For instance, if $\vec v(\vec x)$ is a vector field on $\mathbb{R}^3$ (or some other simply connected space), then there exists a (scalar) potential $\phi(\vec x)$ if and only if the curl of the vector field vanishes, i.e.,
\[ \vec v = \vec\nabla \phi \quad \Leftrightarrow \quad \vec\nabla \times \vec v = 0 . \tag{8.83} \]
Since $(\vec\nabla \times \vec v)_i = \epsilon_{ijk} \partial_j v_k$, this is in turn equivalent to requiring that $\partial_{[i} v_{j]} = 0$. In general, if $v^i(x)$, $x \in \mathbb{R}^n$, is a vector field, then there exists a (scalar) potential $\phi(x)$ if and only if $v_{[i,j]} = 0$, i.e.,
\[ v_i = \partial_i \phi \quad \Leftrightarrow \quad v_{[i,j]} = 0 . \tag{8.84} \]
This potential $\phi$ can be determined by a path integral,
\[ \phi(x) = \int_0^x v_i \, ds^i . \tag{8.85} \]

The statement (8.84) can be generalized to antisymmetric tensors (n-forms): In particular, if $F_{\mu\nu}$ is an antisymmetric tensor, then there exists a four-potential $A_\mu$ if and only if $F_{[\mu\nu,\sigma]} = 0$. Therefore, for the electromagnetic field, the Maxwell equations automatically guarantee the existence of a potential $A_\mu$, whose antisymmetric derivatives yield $F_{\mu\nu}$. More specifically,
\[ F_{\mu\nu} = 2 A_{[\mu,\nu]} \quad \Leftrightarrow \quad F_{[\mu\nu,\sigma]} = 0 . \tag{8.86} \]

Remark. The first guess for the relation between $F_{\mu\nu}$ and the potential $A_\mu$ is probably $F_{\mu\nu} = A_{\mu,\nu}$. However, to ensure that the l.h. side is antisymmetric, we must perform an antisymmetrization. The factor of 2 in (8.86) is introduced for aesthetic reasons; alternatively, we can write (8.86) as
\[ F_{\mu\nu} = A_{\mu,\nu} - A_{\nu,\mu} = \partial_\nu A_\mu - \partial_\mu A_\nu . \tag{8.86′} \]

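It is not necessary to invoke a general theorem to prove (8.86). Instead, we can explicitly construct a four-potential in analogy to (8.85) (by means of what is known as the Poincaré lemma). Let us define $A_\mu$ according to
\[ A_\mu(x) = \int_0^1 F_{\mu\nu}(\lambda x)\, \lambda x^\nu \, d\lambda . \tag{8.87} \]
To prove (the nontrivial direction of) the statement (8.86), we must show that $2 A_{[\mu,\nu]} = F_{\mu\nu}$. We proceed step by step:
\[ A_{\mu,\nu} = \int_0^1 \big[ F_{\mu\sigma}(\lambda x)\, x^\sigma \big]_{,\nu} \lambda \, d\lambda = - \int_0^1 \big[ F_{\sigma\mu}(\lambda x)\, x^\sigma \big]_{,\nu} \lambda \, d\lambda , \]
hence
\begin{align*}
A_{[\mu,\nu]} &= \int_0^1 \big[ F_{\sigma[\nu}(\lambda x)\, x^\sigma \big]_{,\mu]} \lambda \, d\lambda = \int_0^1 \big( F_{\sigma[\nu,\mu]}(\lambda x)\, \lambda x^\sigma + F_{\sigma[\nu}(\lambda x)\, \delta^\sigma_{\mu]} \big) \lambda \, d\lambda \\
&= \int_0^1 \big( F_{\sigma[\nu,\mu]}(\lambda x)\, \lambda x^\sigma + F_{[\mu\nu]}(\lambda x) \big) \lambda \, d\lambda \\
&= \int_0^1 F_{\sigma[\nu,\mu]}(\lambda x)\, \lambda^2 x^\sigma \, d\lambda + \int_0^1 F_{\mu\nu}(\lambda x)\, \lambda \, d\lambda .
\end{align*}
In the next step we use that $F_{[\mu\nu,\sigma]} = 0$ implies
\[ F_{\sigma[\nu,\mu]} = \tfrac12 F_{\mu\nu,\sigma} . \tag{∗} \]
This yields
\begin{align*}
A_{[\mu,\nu]} &\overset{(∗)}{=} \frac12 \int_0^1 F_{\mu\nu,\sigma}(\lambda x)\, \lambda^2 x^\sigma \, d\lambda + \int_0^1 F_{\mu\nu}(\lambda x)\, \lambda \, d\lambda \\
&= \frac12 \int_0^1 \Big( \frac{d}{d\lambda} F_{\mu\nu}(\lambda x) \Big) \lambda^2 \, d\lambda + \int_0^1 F_{\mu\nu}(\lambda x)\, \lambda \, d\lambda .
\end{align*}
Finally, by an integration by parts, we obtain
\[ A_{[\mu,\nu]} = \frac12 F_{\mu\nu}(\lambda x)\, \lambda^2 \Big|_0^1 - \int_0^1 F_{\mu\nu}(\lambda x)\, \lambda \, d\lambda + \int_0^1 F_{\mu\nu}(\lambda x)\, \lambda \, d\lambda = \frac12 F_{\mu\nu}(x) . \]

Summing up, we have explicitly constructed a four-potential $A_\mu$ from $F_{\mu\nu}$ via (8.87); it satisfies
\[ F_{\mu\nu} = A_{\mu,\nu} - A_{\nu,\mu} . \tag{8.86′} \]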
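
The construction (8.87) can be tried out on a concrete example. The sketch below is only an illustration: the polynomial test potential `A_test` is a hypothetical choice, used solely to manufacture a field tensor that satisfies $F_{[\mu\nu,\sigma]} = 0$; the potential is then rebuilt from $F_{\mu\nu}$ alone and its antisymmetrized derivative is compared with $F_{\mu\nu}$.

```python
import sympy as sp

x = sp.symbols("x0:4", real=True)
lam = sp.symbols("lambda", positive=True)

# Hypothetical test potential (illustration only); any smooth choice would do.
A_test = [x[1] * x[2], x[0]**2 * x[3], x[0] * x[1] * x[3], x[2]**2 + x[0] * x[1]]

def field_from_potential(A):
    # F_{mu nu} = A_{mu,nu} - A_{nu,mu}, cf. (8.86')
    return sp.Matrix(4, 4, lambda m, n: sp.diff(A[m], x[n]) - sp.diff(A[n], x[m]))

F = field_from_potential(A_test)

# Poincare-lemma construction (8.87): A_mu(x) = int_0^1 F_{mu nu}(lambda x) lambda x^nu dlambda
scale = {xi: lam * xi for xi in x}
A_rebuilt = [
    sp.integrate(
        sum(F[m, n].subs(scale, simultaneous=True) * lam * x[n] for n in range(4)),
        (lam, 0, 1),
    )
    for m in range(4)
]

F_rebuilt = field_from_potential(A_rebuilt)
residual = (F_rebuilt - F).applyfunc(sp.expand)
print(residual)                      # zero matrix

# The rebuilt potential generally differs from A_test by a gradient.
difference = [sp.expand(A_rebuilt[m] - A_test[m]) for m in range(4)]
print(difference)
```

The second printout shows that the reconstructed potential need not coincide with the one used to generate $F_{\mu\nu}$; only the antisymmetrized derivatives agree.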

It is important to note that there does not exist a unique four-potential. Let $\hat A_\mu$ and $\breve A_\mu$ be four-potentials of $F_{\mu\nu}$, i.e.,
\[ F_{\mu\nu} = 2 \hat A_{[\mu,\nu]} , \qquad F_{\mu\nu} = 2 \breve A_{[\mu,\nu]} . \]
Then $A_\mu = \breve A_\mu - \hat A_\mu$ satisfies
\[ A_{[\mu,\nu]} = 0 , \]
which implies that there exists a scalar function $\Lambda$ so that $A_\mu = \partial_\mu \Lambda$, see equation (8.84). Accordingly,
\[ \breve A_\mu = \hat A_\mu + \Lambda_{,\mu} . \tag{8.88} \]
Conversely, if $\hat A_\mu$ is a four-potential of $F_{\mu\nu}$, then $\breve A_\mu$ defined by (8.88) is a four-potential as well. The non-uniqueness of the four-potential expressed by (8.88) is called gauge freedom.

In the following we formulate Maxwell’s equations in terms of a four-potential $A_\mu$. By construction, $F_{[\mu\nu,\sigma]} = 0$ holds automatically, since $F_{\mu\nu}$ is derived from a four-potential $A_\mu$ via (8.86′). The second of Maxwell’s equations becomes
\[ 4\pi j^\mu = \partial_\nu F^{\mu\nu} = \partial_\nu \big( \partial^\nu A^\mu - \partial^\mu A^\nu \big) = \Box A^\mu - \partial^\mu \partial_\nu A^\nu , \tag{8.89} \]
where $\Box = \partial_\nu \partial^\nu$. To get rid of the second term, we require the four-potential to satisfy the Lorenz gauge condition¹³
\[ \partial_\nu A^\nu = 0 . \tag{8.90} \]
Clearly, a given four-potential $A^\mu$ will in general not satisfy the Lorenz gauge condition (8.90); however, we can always make use of the gauge freedom (8.88) to achieve (8.90). To see this, assume that $\partial_\nu A^\nu \neq 0$; then there exists a scalar function $\Lambda$ and a modified four-potential $\bar A^\mu = A^\mu + \partial^\mu \Lambda$, such that
\[ \partial_\nu \bar A^\nu = \partial_\nu \big( A^\nu + \partial^\nu \Lambda \big) = \partial_\nu A^\nu + \Box \Lambda = 0 , \]
since the inhomogeneous wave equation
\[ \Box \Lambda = -\partial_\nu A^\nu \]
possesses a solution $\Lambda$.

Exercise. Show that the four-potential $A_\mu$ defined by (8.87) automatically satisfies the Lorenz gauge condition, if $j^\mu = 0$.

Using a four-potential $A^\mu$ in Lorenz gauge, i.e., $\partial_\mu A^\mu = 0$, the second Maxwell equation thus reads
\[ \Box A^\mu = 4\pi j^\mu . \tag{8.91} \]
Hence, each component of the four-potential satisfies a wave equation.
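
As an illustration of (8.90) and (8.91) in the vacuum case $j^\mu = 0$, the following sketch checks a plane-wave potential symbolically. The specific potential (a transverse wave travelling in the z-direction with amplitudes `a`, `b` and frequency `omega`) is an assumed example, as are the signature (− + + +) and units with c = 1.

```python
import sympy as sp

t, x, y, z = sp.symbols("t x y z", real=True)
omega, a, b = sp.symbols("omega a b", real=True)
X = (t, x, y, z)
eta = sp.diag(-1, 1, 1, 1)

# Assumed example: transverse plane wave travelling along z, components A^mu.
phase = omega * (z - t)
A_up = [sp.S.Zero, a * sp.cos(phase), b * sp.cos(phase), sp.S.Zero]
A_low = [sum(eta[m, n] * A_up[n] for n in range(4)) for m in range(4)]

def box(f):
    # d'Alembertian d_nu d^nu for signature (-+++)
    return -sp.diff(f, t, 2) + sp.diff(f, x, 2) + sp.diff(f, y, 2) + sp.diff(f, z, 2)

# Lorenz gauge condition (8.90): d_nu A^nu = 0
lorenz = sp.simplify(sum(sp.diff(A_up[n], X[n]) for n in range(4)))

# Wave equation (8.91) with j^mu = 0
wave = [sp.simplify(box(A_up[m])) for m in range(4)]

# Cross-check via the field tensor: F_{mu nu} = A_{mu,nu} - A_{nu,mu}, then d_nu F^{mu nu} = 0
F_low = sp.Matrix(4, 4, lambda m, n: sp.diff(A_low[m], X[n]) - sp.diff(A_low[n], X[m]))
F_up = eta * F_low * eta
divF = [sp.simplify(sum(sp.diff(F_up[m, n], X[n]) for n in range(4))) for m in range(4)]

print(lorenz, wave, divF)
```

In this gauge the source-free Maxwell equation reduces to four decoupled wave equations, which the plane wave satisfies component by component.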

Remark. W.r.t. some inertial coordinate system we have
\[ F_{\mu\nu} = \begin{pmatrix} 0 & -E^1 & -E^2 & -E^3 \\ E^1 & 0 & B^3 & -B^2 \\ E^2 & -B^3 & 0 & B^1 \\ E^3 & B^2 & -B^1 & 0 \end{pmatrix} , \]
see (8.16). Likewise, the four-potential $A^\mu$ is given by
\[ A^\mu = \begin{pmatrix} \phi \\ \vec A \end{pmatrix} , \tag{8.92} \]
where we have set $A^0 = \phi$ (so that $A_0 = -\phi$). Since $F_{\mu\nu} = 2 A_{[\mu,\nu]}$ we have
\begin{align*}
E_i &= F_{i0} = A_{i,0} - A_{0,i} = \partial_t A_i + \partial_i \phi , \\
B_i &= \tfrac12 \epsilon_{ijk} F^{jk} = \epsilon_{ijk} A^{j,k} = -\epsilon_{ikj}\, \partial^k A^j
\end{align*}
and hence
\[ \vec E = \partial_t \vec A + \vec\nabla \phi , \qquad \vec B = -\vec\nabla \times \vec A . \tag{8.93} \]
Therefore, the components $\phi$ and $\vec A$ of the four-potential coincide (up to a minus sign¹⁴) with the electric scalar potential and the magnetic vector potential, which are known from the standard formulation of Maxwell’s equations in terms of potentials.

The Maxwell equation (8.91) for the four-potential $A^\mu$ in Lorenz gauge can be solved if boundary conditions are prescribed. In particular, we obtain the well-known advanced and retarded solutions discussed in every textbook on electromagnetism.

¹³ The Lorenz gauge condition was formulated by Ludvig Valentin Lorenz (b. 1829 in Helsingør, d. 1891) and not by Hendrik Antoon Lorentz (b. 1853 in Arnhem, d. 1928 in Haarlem).

8.7 Energy-momentum tensor

The energy-momentum tensor (stress-energy tensor) of the electromagnetic field is defined as
\[ T_{\mu\nu} = \frac{1}{4\pi} \Big( F_{\mu\sigma} F_\nu{}^\sigma - \frac14 F_{\sigma\lambda} F^{\sigma\lambda}\, \eta_{\mu\nu} \Big) . \tag{8.94} \]

Remark for experts. Using the tensor $W_{\mu\nu} = F_{\mu\nu} + i\, {}^*F_{\mu\nu}$, see (8.77), and its complex conjugate, the energy-momentum tensor $T_{\mu\nu}$ can be constructed in a simpler way,
\[ T_{\mu\nu} = \frac{1}{4\pi}\, \frac12\, W_{\mu\sigma} \bar W_\nu{}^\sigma . \tag{8.94′} \]

¹⁴ If we had chosen the signature of the Minkowski metric to be $\operatorname{sign} \eta = (+ - - -)$ instead of $(- + + +)$, there would not appear a minus sign here. However, our choice of signature is well adapted to other situations; in particular, when we go over from special relativity to general relativity.

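Proof. To prove that the two expressions for $T_{\mu\nu}$ coincide we perform a straightforward calculation.¹⁵
\begin{align*}
W_{\mu\sigma} \bar W_\nu{}^\sigma &= \big( F_{\mu\sigma} + i\, {}^*F_{\mu\sigma} \big) \big( F_\nu{}^\sigma - i\, {}^*F_\nu{}^\sigma \big) \\
&= F_{\mu\sigma} F_\nu{}^\sigma + 2i \underbrace{{}^*F_{[\mu}{}^\sigma F_{\nu]\sigma}}_{\eta_{\tau[\mu}\, {}^*F^{\tau\sigma} F_{\nu]\sigma}} + {}^*F_{\mu\sigma} \underbrace{{}^*F_\nu{}^\sigma}_{\eta_{\nu\pi}\, {}^*F^{\pi\sigma}} \\
&= F_{\mu\sigma} F_\nu{}^\sigma + i\, \eta_{\tau[\mu} F_{\nu]\sigma}\, \epsilon^{\tau\sigma\rho\pi} F_{\rho\pi} + \tfrac14\, \epsilon_{\mu\sigma\tau\rho} F^{\tau\rho}\, \eta_{\nu\pi}\, \epsilon^{\pi\sigma\lambda\xi} F_{\lambda\xi} . \tag{8.95}
\end{align*}
The second term can be manipulated by making use of the results of Appendix B. We have
\[ \epsilon^{\tau\sigma\rho\pi} F_{\nu\sigma} F_{\rho\pi} = \epsilon^{\tau\sigma\rho\pi} F_{\nu[\sigma} F_{\rho\pi]} \]
because of (B.10) and further
\[ \epsilon^{\tau\sigma\rho\pi} F_{\nu\sigma} F_{\rho\pi} = \epsilon^{\tau\sigma\rho\pi} F_{[\nu\sigma} F_{\rho\pi]} \]
because of (B.28a). In the case of a four-dimensional vector space (which is our case), modulo constants there exists only one totally antisymmetric tensor of order four, the $\epsilon$-tensor, cf. (B.12); hence
\[ F_{[\nu\sigma} F_{\rho\pi]} \propto \epsilon_{\nu\sigma\rho\pi} \]
and thus
\[ \epsilon^{\tau\sigma\rho\pi} F_{\nu\sigma} F_{\rho\pi} \propto \epsilon^{\tau\sigma\rho\pi} \epsilon_{\nu\sigma\rho\pi} = \epsilon^{\sigma\rho\pi\tau} \epsilon_{\sigma\rho\pi\nu} = -6\, \delta^\tau_\nu , \]
where we have used (B.25). Therefore,
\[ i\, \eta_{\tau[\mu} F_{\nu]\sigma}\, \epsilon^{\tau\sigma\rho\pi} F_{\rho\pi} \propto \eta_{\tau[\mu} \delta^\tau_{\nu]} = \eta_{[\nu\mu]} = 0 , \]
i.e., the second term in (8.95) vanishes. The third term in (8.95) can be manipulated along the following lines:
\[ \epsilon_{\mu\sigma\tau\rho}\, \epsilon^{\pi\sigma\lambda\xi} F^{\tau\rho} F_{\lambda\xi}\, \eta_{\nu\pi} = \epsilon_{\sigma\mu\tau\rho}\, \epsilon^{\sigma\pi\lambda\xi} F^{\tau\rho} F_{\lambda\xi}\, \eta_{\nu\pi} = -\delta^{\pi\lambda\xi}_{\mu\tau\rho}\, F^{\tau\rho} F_{\lambda\xi}\, \eta_{\nu\pi} . \]
Here we have used (B.25). In the next step we apply (B.16), i.e.,
\[ \epsilon_{\mu\sigma\tau\rho}\, \epsilon^{\pi\sigma\lambda\xi} F^{\tau\rho} F_{\lambda\xi}\, \eta_{\nu\pi} = \Big( -2\, \delta^\pi_\mu\, \delta^{[\lambda}_\tau \delta^{\xi]}_\rho - 4\, \delta^{[\lambda}_\mu \delta^{\xi]}_{[\tau} \delta^\pi_{\rho]} \Big) F^{\tau\rho} F_{\lambda\xi}\, \eta_{\nu\pi} . \]
The antisymmetrizations can be dropped because of (B.10) and we obtain
\[ \epsilon_{\mu\sigma\tau\rho}\, \epsilon^{\pi\sigma\lambda\xi} F^{\tau\rho} F_{\lambda\xi}\, \eta_{\nu\pi} = -2\, \eta_{\mu\nu}\, \delta^\lambda_\tau \delta^\xi_\rho\, F^{\tau\rho} F_{\lambda\xi} - 4\, \delta^\lambda_\mu \delta^\xi_\tau \delta^\pi_\rho\, F^{\tau\rho} F_{\lambda\xi}\, \eta_{\nu\pi} . \]
Continuing in the obvious way we get
\begin{align*}
\tfrac14\, \epsilon_{\mu\sigma\tau\rho}\, \epsilon^{\pi\sigma\lambda\xi} F^{\tau\rho} F_{\lambda\xi}\, \eta_{\nu\pi} &= -\tfrac12\, \eta_{\mu\nu} F^{\lambda\xi} F_{\lambda\xi} - F^{\tau\rho} F_{\mu\tau}\, \eta_{\nu\rho} \\
&= -\tfrac12\, \eta_{\mu\nu} F^{\lambda\xi} F_{\lambda\xi} + F_{\mu\tau} F_\nu{}^\tau .
\end{align*}
Inserting this result into (8.95) we finally arrive at
\[ \tfrac12\, W_{\mu\sigma} \bar W_\nu{}^\sigma = \tfrac12 \Big[ F_{\mu\sigma} F_\nu{}^\sigma - \tfrac12\, \eta_{\mu\nu} F^{\lambda\xi} F_{\lambda\xi} + F_{\mu\tau} F_\nu{}^\tau \Big] = F_{\mu\sigma} F_\nu{}^\sigma - \tfrac14\, \eta_{\mu\nu} F^{\lambda\xi} F_{\lambda\xi} , \]
which proves the claim (8.94′).

¹⁵ The calculation is straightforward, but this doesn’t mean that it’s easy. Actually, it isn’t. In fact, it would be easier to define $F_{\mu\nu}$, ${}^*F_{\mu\nu}$ and $W_{\mu\nu}$ as matrices in Mathematica or Maple and let the computer do the rest. However, we are here on a training ground for our later lives as (theoretical) physicists. (Yes, this might be part of what you’ll be doing. . . )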

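The energy-momentum tensor (8.94) is a symmetric tensor, i.e.,
\[ T_{\mu\nu} = T_{(\mu\nu)} . \tag{8.96a} \]
To see this we simply note that $F_{\nu\sigma} F_\mu{}^\sigma = F_\nu{}^\sigma F_{\mu\sigma} = F_{\mu\sigma} F_\nu{}^\sigma$, hence the first term is symmetric; for the second term, symmetry is evident.

Another important property of the electromagnetic energy-momentum tensor is its tracelessness, i.e.,
\[ T^\mu{}_\mu = 0 . \tag{8.96b} \]
The proof is straightforward:
\[ T^\mu{}_\mu = \frac{1}{4\pi} \Big[ F^\mu{}_\sigma F_\mu{}^\sigma - \frac14 F_{\sigma\lambda} F^{\sigma\lambda} \underbrace{\delta^\mu_\mu}_{=4} \Big] = \frac{1}{4\pi} \Big[ F_{\mu\sigma} F^{\mu\sigma} - F_{\sigma\lambda} F^{\sigma\lambda} \Big] = 0 . \]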

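Next we compute the divergence $T^{\mu\nu}{}_{,\nu}$ of the energy-momentum tensor:
\begin{align*}
4\pi\, T_\mu{}^\nu{}_{,\nu} &= F_{\mu\sigma,\nu} F^{\nu\sigma} + F_{\mu\sigma} F^{\nu\sigma}{}_{,\nu} - \tfrac14 \big( F_{\sigma\lambda,\nu} F^{\sigma\lambda} + F_{\sigma\lambda} F^{\sigma\lambda}{}_{,\nu} \big) \delta_\mu{}^\nu \\
&\overset{(8.75)}{=} F_{\mu\sigma,\nu} F^{\nu\sigma} - 4\pi F_{\mu\sigma} j^\sigma - \tfrac12 F_{\sigma\lambda,\mu} F^{\sigma\lambda} \\
&\overset{(∗)}{=} F_{\mu\sigma,\nu} F^{\nu\sigma} - 4\pi F_{\mu\sigma} j^\sigma - F_{\mu[\lambda,\sigma]} F^{\sigma\lambda} \\
&= F_{\mu\sigma,\nu} F^{\nu\sigma} - 4\pi F_{\mu\sigma} j^\sigma - F_{\mu\lambda,\sigma} F^{\sigma\lambda} \\
&= -4\pi F_{\mu\sigma} j^\sigma ,
\end{align*}
where we have used equation (∗) of section 8.6, i.e., $F_{\sigma[\nu,\mu]} = \tfrac12 F_{\mu\nu,\sigma}$. We note that $T_{\mu\nu}$ is divergence-free in the absence of sources, i.e.,
\[ T^{\mu\nu}{}_{,\nu} = 0 \quad \text{if } j^\mu = 0 ; \tag{8.97} \]
however, in general we obtain
\[ T^{\mu\nu}{}_{,\nu} = -F^\mu{}_\nu\, j^\nu . \tag{8.98} \]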

Let us investigate the r.h.s. of this equation. The charge four-current density $j^\mu$ is given by
\[ j^\mu = \rho_e u^\mu , \]
see (8.59), where in the present context $\rho_e$ is the proper charge density (i.e., charge per proper volume) of the charge distribution. Hence,
\[ F^\mu{}_\nu\, j^\nu = \rho_e F^\mu{}_\nu\, u^\nu . \tag{8.99} \]
The Lorentz force acting on a point particle (with charge $e$) is
\[ F^\mu = e F^\mu{}_\nu\, u^\nu . \]
Since $\rho_e$ is charge per proper volume, we conclude that (8.99) represents a force density $\mathcal{F}^\mu$,
\[ \mathcal{F}^\mu = F^\mu{}_\nu\, j^\nu ; \tag{8.100} \]
it is the Lorentz force (per proper volume) acting on the charge distribution of particles.

Now let $\{t, \vec x\}$ be inertial coordinates of an inertial observer $X$. The total four-force acting on an infinitesimal volume $d^3x$ is
\[ F^\mu_{\mathrm{tot}} = \mathcal{F}^\mu\, \gamma\, d^3x , \tag{8.101} \]
where the factor $\gamma$ enters because the proper volume of $d^3x$ is $\gamma\, d^3x$, see (8.71). The (total) four-force is connected with the derivative (w.r.t. proper time¹⁶) of the (total) four-momentum in $d^3x$, which is
\[ \frac{d}{ds}\, p^\mu_{\mathrm{tot}} = \gamma\, \frac{d}{dt}\, p^\mu_{\mathrm{tot}} . \tag{8.102} \]

¹⁶ Proper time is defined along the integration curves of the four-vector field $u^\mu$.

The four-momentum $p^\mu_{\mathrm{tot}}$ is given by the four-momentum density $\mathcal{P}^\mu$ times the volume $d^3x$, i.e.,
\[ p^\mu_{\mathrm{tot}} = \mathcal{P}^\mu\, d^3x . \tag{8.103} \]
Hence, the expression $\mathcal{F}^\mu\, d^3x$ is connected with the time-derivative of $\mathcal{P}^\mu\, d^3x$. (Equality holds under appropriate conditions on the distribution of particles or when we consider not small volumes but the entire space $\mathbb{R}^3$.)
Remark. Note that the four-momentum density $\mathcal{P}^\mu$ is specific to the chosen observer; a different observer would define a different four-momentum density $\mathcal{P}^{\mu\prime}$.¹⁷ This is because $\mathcal{P}^\mu$ is a density w.r.t. coordinate volume (and not proper volume). In the case of simple matter we have $\mathcal{P}^\mu = \rho u^\mu$, where in this context $\rho$ is the mass density (w.r.t. coordinate volume). This will be discussed further in the next section.

Making use of these considerations we proceed with equation (8.98). The integral version is
\[ \int T^{\mu\nu}{}_{,\nu}\, d^3x + \int F^\mu{}_\nu\, j^\nu\, d^3x = 0 . \tag{8.104} \]
When we use that $T^{\mu\nu}{}_{,\nu} = \partial_t T^{\mu 0} + \partial_i T^{\mu i}$ we obtain
\[ \frac{d}{dt} \int T^{\mu 0}\, d^3x + \int \partial_i T^{\mu i}\, d^3x \tag{8.105} \]
for the first term of (8.104). The integral over the spatial divergence $\partial_i T^{\mu i}$ can be transformed into a boundary integral:
\[ \int_V \partial_i T^{\mu i}\, d^3x = \int_{\partial V} T^{\mu i}\, d\sigma_i , \tag{8.106} \]
where $d\sigma_i$ is the surface element of the boundary $\partial V$ of the volume $V$. If the fall-off as $|\vec x| \to \infty$ of the involved fields is sufficiently fast, then the boundary integral vanishes in the limit of infinitely large spheres. Therefore,
\[ \int_{\mathbb{R}^3} \partial_i T^{\mu i}\, d^3x = 0 . \]
The second term in equation (8.104) becomes
\[ \int_{\mathbb{R}^3} F^\mu{}_\nu\, j^\nu\, d^3x = \int_{\mathbb{R}^3} \mathcal{F}^\mu\, d^3x = \frac{d}{dt} \int_{\mathbb{R}^3} \mathcal{P}^\mu\, d^3x . \tag{8.107} \]
Collecting the terms we thus obtain
\[ \frac{d}{dt} \int_{\mathbb{R}^3} \big( T^{\mu 0} + \mathcal{P}^\mu \big)\, d^3x = 0 . \tag{8.108} \]

¹⁷ This is in contrast to the four-current $j^\mu$ which is observer-independent.

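The first conclusion we draw is that the quantity
\[ T^{\mu 0} \tag{8.109} \]
is the four-momentum density of the electromagnetic field (as seen by the observer $X$). Consequently, the energy density is
\begin{align*}
T^{00} &= \frac{1}{4\pi} \Big( F^{0\sigma} F^0{}_\sigma - \frac14 F_{\sigma\lambda} F^{\sigma\lambda}\, \eta^{00} \Big) = \frac{1}{4\pi} \Big( \vec E^2 - \frac12 \big( \vec E^2 - \vec B^2 \big) \Big) \\
&= \frac{1}{4\pi}\, \frac12 \big( \vec E^2 + \vec B^2 \big) , \tag{8.110}
\end{align*}
where we have used (8.33). Analogously, the three-momentum density is
\begin{align*}
T^{i0} &= \frac{1}{4\pi} \Big( F^{i\sigma} F^0{}_\sigma - \frac14 F_{\sigma\lambda} F^{\sigma\lambda}\, \eta^{i0} \Big) = \frac{1}{4\pi}\, F^{ij} F^0{}_j = \frac{1}{4\pi}\, \epsilon^{ijk} B_k E_j \\
&= \frac{1}{4\pi} \big( \vec E \times \vec B \big)^i . \tag{8.111}
\end{align*}
The vector $\vec E \times \vec B$ is the well-known Poynting vector.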
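
The component formulas (8.110) and (8.111), as well as the symmetry and tracelessness properties (8.96), can be checked numerically from the definition (8.94). The sketch below assumes the component matrix of $F_{\mu\nu}$ quoted in the text from (8.16) and the signature (− + + +); the field values are arbitrary illustrative numbers and the function names are introduced here for convenience.

```python
import numpy as np

eta = np.diag([-1.0, 1.0, 1.0, 1.0])      # Minkowski metric, signature (-+++)

def field_tensor(E, B):
    """F_{mu nu} with F_{i0} = E^i and F_{ij} = eps_{ijk} B^k."""
    E1, E2, E3 = E
    B1, B2, B3 = B
    return np.array([[0.0, -E1, -E2, -E3],
                     [E1,  0.0,  B3, -B2],
                     [E2, -B3,  0.0,  B1],
                     [E3,  B2, -B1,  0.0]])

def energy_momentum(E, B):
    """Return T_{mu nu} and T^{mu nu} according to (8.94)."""
    F_low = field_tensor(E, B)
    F_up = eta @ F_low @ eta
    first = F_low @ eta @ F_low.T          # F_{mu sigma} F_nu^sigma
    invariant = np.sum(F_low * F_up)       # F_{sigma lambda} F^{sigma lambda}
    T_low = (first - 0.25 * invariant * eta) / (4.0 * np.pi)
    T_up = eta @ T_low @ eta
    return T_low, T_up

E = np.array([1.0, 2.0, 3.0])              # illustrative field values
B = np.array([-0.5, 0.4, 2.0])
T_low, T_up = energy_momentum(E, B)

energy_density = T_up[0, 0]
momentum_density = T_up[1:, 0]
trace = np.sum(eta * T_up)                 # T^mu_mu = eta_{mu nu} T^{mu nu}

print("T^00 :", energy_density, " expected:", (E @ E + B @ B) / (8.0 * np.pi))
print("T^i0 :", momentum_density, " expected:", np.cross(E, B) / (4.0 * np.pi))
print("trace:", trace)
```

For any choice of field values the printed pairs agree, and the trace vanishes up to rounding.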

The second conclusion concerns the balance equation (8.108). The sum of the total momentum (in $\mathbb{R}^3$) of the electromagnetic field and the total momentum of the charged matter is constant, i.e., we have conservation of energy and momentum of the system (matter + fields).

The energy density $T^{00}$ and the momentum density $T^{i0}$ are w.r.t. the chosen observer $X$. Let $w^\mu$ denote the four-velocity of this observer, i.e.,
\[ w^\mu = \begin{pmatrix} 1 \\ \vec o \end{pmatrix} \]
w.r.t. the observer’s coordinates. Then $T^{\mu 0}$ can be written in the coordinate-independent manner
\[ T^{\mu 0} = -T^{\mu\nu} w_\nu = -T^\mu{}_\nu\, w^\nu . \tag{8.112} \]
In other words, $-T^{\mu\nu} w_\nu$ is the four-momentum density of the electromagnetic field as seen by an observer with four-velocity $w^\mu$; likewise,
\[ T^{00} = T^{\mu\nu} w_\mu w_\nu = T_{\mu\nu} w^\mu w^\nu \tag{8.113} \]
is the energy density seen by this observer.

Finally, let us analyze the balance equation for general volumes in the vacuum case (which corresponds to $j^\mu = 0$, $\mathcal{P}^\mu = 0$). We obtain
\begin{align}
\frac{d}{dt} \int_V T^{00}\, d^3x + \int_{\partial V} T^{0i}\, d\sigma_i &= 0 \quad \text{and} \tag{8.114a} \\
\frac{d}{dt} \int_V T^{k0}\, d^3x + \int_{\partial V} T^{ki}\, d\sigma_i &= 0 . \tag{8.114b}
\end{align}
The first equation states that the energy of the electromagnetic field in a given volume is transported through the boundary via the Poynting vector $T^{i0}$. The second equation formulates the loss of momentum in a volume through its boundary in terms of the ‘stress-tensor’ $T^{ij}$.


CHAPTER 9

BEYOND MINKOWSKI

9.1 The equivalence principle

Newtonian gravity

The basis of Newtonian gravity is a gravitational potential $\Phi$. The potential generated by a point particle located at $\vec z$ is
\[ \Phi(\vec x) = -G\, \frac{m_{ag}}{|\vec x - \vec z|} . \]
In this context, $m_{ag}$ is a property of the point particle, which we call its active gravitational mass; a priori it might be different from its (inertial) mass. More generally, the gravitational potential is determined by the Poisson equation
\[ \Delta \Phi = 4\pi G \rho_{ag} . \]
In this equation, $\rho_{ag}$ is the density of the active gravitational mass of the configuration that generates the gravitational field, and $G$ is the gravitational constant (which takes the value $G = (6.6743 \pm 0.0007) \times 10^{-11}\ \mathrm{m^3\, kg^{-1}\, s^{-2}}$). The gravitational field is the vector field
\[ \vec\phi = -\vec\nabla \Phi . \]

The force exerted on a (point) particle is proportional to the gravitational field
\[ \vec F = m_{pg}\, \vec\phi , \]
where $m_{pg}$ is a property of the point particle that we call its passive gravitational mass; a priori it might be different from the active gravitational mass and the (inertial) mass. The passive gravitational mass is a measure of the coupling to a gravitational field.

When we invoke Newton’s third law of motion we see that the concepts of active and passive gravitational mass coincide. Consider two point particles with gravitational masses $m_{ag}$, $m_{pg}$ and $M_{ag}$, $M_{pg}$, respectively. Then the force exerted by the first particle on the second must equal the force exerted by the second particle on the first, i.e.,
\[ -G\, \frac{m_{ag} M_{pg}}{r^2} = -G\, \frac{M_{ag} m_{pg}}{r^2} , \]
where $r$ is the distance between the particles. We conclude that
\[ \frac{m_{ag}}{m_{pg}} = \frac{M_{ag}}{M_{pg}} , \]
which implies that we can set
\[ m_{ag} = m_{pg} \]
by adjusting units. In other words, there is but one gravitational mass $m_g$ that enters the equations, i.e.,
\[ \Delta \Phi = 4\pi G \rho_g , \qquad \vec F = m_g\, \vec\phi , \]
universally for every type of matter.

The motion of a point particle in a gravitational field is described by Newton’s second law, i.e., $m_i\, \vec a = \vec F$, or
\[ m_i\, \ddot{\vec x} = m_g\, \vec\phi . \tag{9.1} \]
Here, $m_i$ is the inertial mass of the particle. The inertial mass is the mass that determines the particle’s inertia, its ‘resistance to accelerations’; it appears in the energy and momentum formulas and is thus the ‘kinematical mass’.

The Galilean equivalence principle is a postulate in Newtonian gravity.¹ ‘The motion of a test particle in a gravitational field depends solely on its initial conditions (i.e., position and velocity).’ Using this principle in combination with equation (9.1) we conclude that the ratio of inertial over gravitational mass is a universal constant, i.e.,
\[ \frac{m_i}{m_g} = \mathrm{const} . \]
Adjusting units we obtain equality of inertial and gravitational mass,
\[ m_i = m_g . \tag{9.2} \]
There is just one concept of mass $m$ that enters the equations, i.e.,
\[ \Delta \Phi = 4\pi G \rho , \qquad \vec F = m\, \vec\phi , \]
and (9.1) becomes
\[ \ddot{\vec x} = \vec\phi . \tag{9.3} \]
It is evident that the equivalence principle is in fact equivalent to the equality of inertial and gravitational mass.

¹ The equivalence principle in Einstein’s general theory of relativity is not a postulate but a consequence of the field equations.

The equivalence principle is under constant experimental tests. Loránd Eötvös’ experiments that were conducted at the beginning of the last century showed that the relative difference between inertial mass and gravitational mass must be less than $10^{-9}$. Modern experiments have lowered that bound to approximately $10^{-12}$; a result of 2008 (PRL 100, 041101) shows that
\[ \Big( \frac{m_i}{m_g} \Big)_{\mathrm{Ti}} = \big[ 1 + (0.3 \pm 1.8) \times 10^{-13} \big] \Big( \frac{m_i}{m_g} \Big)_{\mathrm{Be}} \]
for titanium and beryllium test masses.

To study the implications of the equivalence principle let us consider a homogeneous gravitational field, i.e.,
\[ \vec\phi = \vec g , \tag{9.4} \]
where $\vec g$ is a constant vector; we use the abbreviation $g = |\vec g|$. This gravitational field corresponds to a (time-independent) potential $\Phi(\vec x) = -\vec g \cdot \vec x$; it is a vacuum solution of the Poisson equation.² The motion of point particles in the homogeneous gravitational field (9.4) is determined by
\[ \ddot{\vec x} = \vec g . \tag{9.5} \]

² But $\Phi(\vec x) \not\to 0$ as $|\vec x| \to \infty$. It is thus a valid viewpoint to say that $\vec\phi = \vec g$ isn’t a real gravitational field at all, because it is not the field of an isolated body.

However, equation (9.5) can be interpreted in a completely equivalent manner as follows. Consider a (Galilean) spacetime in the absence of gravitational fields. Instead of inertial coordinates $\vec x$ consider an accelerated frame of reference, i.e.,
\[ \vec x' = \vec x + \tfrac12 t^2\, \vec g . \]
W.r.t. these coordinates, Newton’s second law (in the absence of gravitational fields), $m \ddot{\vec x} = 0$, becomes
\[ \ddot{\vec x}' = \vec g . \tag{9.5′} \]

We conclude that the effect of a homogeneous gravitational field corresponds exactly to the effect of an apparent force in a uniformly accelerated frame of reference (where gravitational fields are absent). In fact, it is impossible to determine from experiments in a closed laboratory whether the lab is (at a ‘fixed position’) in a homogeneous gravitational field or in a state of uniform acceleration.
Remark. Using accelerated coordinates
\[ \vec x' = \vec x - \tfrac12 t^2\, \vec g \]
in a homogeneous gravitational field we find that the gravitational field can be compensated by the acceleration, i.e., $\ddot{\vec x}' = 0$. Therefore, it is impossible to determine whether a laboratory is an inertial frame (in the absence of gravitational fields) or freely falling in a homogeneous gravitational field.

This strict ‘equivalence of gravitation and acceleration’ is restricted to the case of homogeneous gravitational fields. If there are tidal forces (i.e., non-vanishing second derivatives of the potential), then, by considering separate points, it becomes possible to distinguish between effects of the gravitational field and effects from apparent forces in an accelerated frame.
Remark. Particularly, it becomes possible to distinguish between inertial frames (in the absence of gravitational fields) and freely falling frames of reference in a gravitational field.
Example. Consider a sphere consisting of (thousands of) particles. If these particles are at rest w.r.t. some inertial frame in the absence of gravitational fields, then this spherical configuration of particles remains unchanged forever. Now consider the same configuration in free fall in the gravitational field of the Earth; we assume that the particles are initially at rest. The trajectories of the particles are radial, i.e., straight lines meeting at the center of gravity of the Earth. Furthermore, the acceleration of a particle that is closer to the center of the Earth is larger than that of a particle farther away from the center. It is straightforward to see that what is initially a sphere of particles becomes distorted after some time: the sphere becomes a prolate spheroid. Therefore, by considering separate points, it is possible to detect the presence of a gravitational field despite the fact that the configuration is in free fall.

However, ‘equivalence of gravitation and acceleration’ is still approximately true, at least locally, i.e., in regions where (and as long as) the gravitational field is approximately constant. Let us make this statement more specific.

Consider
\[ \ddot{\vec x}(t) = \vec\phi\big( t, \vec x(t) \big) + \vec K , \tag{9.6} \]
which models the motion of a particle in a gravitational field $\vec\phi(t, \vec x)$, where we include the possibility of an additional force.³ For the majority of thought experiments, the gravitational field is assumed to be time-independent, i.e., $\vec\phi = \vec\phi(\vec x)$; this is convenient but not necessary. Let $\bar{\vec x}(t)$ be a particular solution (obtained, e.g., by prescribing initial conditions) and define, in slight abuse of notation, $\bar{\vec\phi}(t) = \vec\phi\big( t, \bar{\vec x}(t) \big)$. When we take a trajectory $\vec x(t)$ of (9.6) that is sufficiently close to $\bar{\vec x}(t)$ at $t = \bar t$, then $\vec\phi\big( t, \vec x(t) \big) \approx \vec\phi\big( t, \bar{\vec x}(t) \big) = \bar{\vec\phi}(t)$ in the time interval containing $\bar t$ where $\vec x(t)$ is sufficiently close to $\bar{\vec x}(t)$. Accordingly, (9.6) reads
\[ \ddot{\vec x}(t) \approx \bar{\vec\phi}(t) + \vec K \tag{9.7} \]
in this time interval. Considering particles with initial data increasingly close to the initial data of $\bar{\vec x}(t)$ at $\bar t$, which is $\bar{\vec x}(\bar t)$ and $\dot{\bar{\vec x}}(\bar t)$, equation (9.7) holds for increasingly long times.

Let us define an accelerated frame of reference by
\[ \vec x' = \vec x + \int dt \int dt\; \bar{\vec\phi}(t) . \]
We then obtain
\[ \ddot{\vec x}'(t) = \ddot{\vec x}(t) + \bar{\vec\phi}(t) . \]
Now consider Newton’s second law $\ddot{\vec x}(t) = \vec K$ in the absence of gravitational fields. When expressed in the accelerated frame we get
\[ \ddot{\vec x}'(t) = \bar{\vec\phi}(t) + \vec K . \tag{9.7′} \]

³ For instance, imagine a (small) ball sitting on a table; the table provides the force $\vec K$ so that the ball doesn’t move. In the general case, $\vec K$ depends on time and on the spatial position; we choose to suppress this dependence. Free fall corresponds to $\vec K \equiv \vec o$.

Comparing (9.7) and (9.7′) we conclude that in a sufficiently small neighborhood of $\bar{\vec x}(t)$, for some time, the effects of a gravitational field are virtually indistinguishable from the effects of an apparent force in an accelerated frame (where gravitation is absent).

Summing up, we observe ‘equivalence of gravitation and acceleration’ as a local phenomenon; in a small neighborhood of each trajectory, for short times, the gravitational effects can be reinterpreted, at least up to a certain accuracy, as effects of apparent forces in an accelerated frame. The closer the trajectory is followed, the better the ‘equivalence’.
Remark. Using accelerated frames we are able to compensate the effects of gravity; these are the freely falling frames of reference. Along each separate trajectory, the gravitational field can be transformed to zero in these coordinates, i.e., a freely falling frame is equivalent to an inertial frame along the trajectory under consideration (and along this trajectory alone). In a (small) neighborhood of this trajectory, this strict equivalence is broken (when tidal forces are present); however, the statement is still true in an approximative sense: Define the freely falling frame of reference by
\[ \vec x' = \vec x - \bar{\vec x}(t) , \]
where $\bar{\vec x}(t)$ is a solution of (9.6) with $\vec K = \vec o$. Then (9.6) yields
\[ \ddot{\vec x}'(t) = \ddot{\vec x}(t) - \ddot{\bar{\vec x}}(t) = \vec\phi\big( t, \vec x(t) \big) - \vec\phi\big( t, \bar{\vec x}(t) \big) = \vec\phi\big( t, \bar{\vec x}(t) + \vec x'(t) \big) - \vec\phi\big( t, \bar{\vec x}(t) \big) . \]
For small $\vec x'(t)$ the r.h.s. is approximately zero, i.e.,
\[ \ddot{\vec x}'(t) \approx 0 , \]
which implies that the freely falling frame in a gravitational field is, in a neighborhood of the trajectory under consideration, equivalent, to zeroth order, to an inertial frame (in the absence of gravitational fields). More specifically we obtain
\[ \ddot x'^i(t) = \phi^i{}_{,j}\big( t, \bar{\vec x}(t) \big)\, x'^j(t) + O(\vec x'^2) = -\Phi_{,ij}\big( t, \bar{\vec x}(t) \big)\, x'^j(t) + O(\vec x'^2) . \]
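
The tidal term $-\Phi_{,ij}\, x'^j$ can be made explicit for the field of a point mass, which also illustrates why the freely falling sphere of the earlier example is stretched into a prolate spheroid. The following sketch assumes the potential $\Phi = -GM/r$ of a point mass at the origin and evaluates the tidal matrix at a point on the z-axis at distance `r0`; the symbol names are chosen here for illustration.

```python
import sympy as sp

x, y, z = sp.symbols("x y z", real=True)
G, M, r0 = sp.symbols("G M r0", positive=True)

# Assumed example: Newtonian potential of a point mass M at the origin.
r = sp.sqrt(x**2 + y**2 + z**2)
Phi = -G * M / r

# Tidal matrix -Phi_{,ij}, which governs the relative acceleration x''^i = -Phi_{,ij} x'^j.
tidal = -sp.hessian(Phi, (x, y, z))

# Evaluate on the reference trajectory, taken to sit on the z-axis at distance r0.
tidal_on_axis = tidal.subs({x: 0, y: 0, z: r0}).applyfunc(sp.simplify)
print(tidal_on_axis)

# Vacuum Poisson equation: the tidal matrix is trace-free away from the source.
print(sp.simplify(tidal.trace()))
```

The radial entry is positive (neighboring particles are pulled apart along the direction of fall) while the two transversal entries are negative (particles are pushed together), and the three entries sum to zero in vacuum.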

We conclude this section by addressing an additional aspect of the ‘entangle-


ment’ between gravitation and acceleration: It is impossible from local obser-
vations to separate the effects of a gravitational field from those of an apparent
force in an accelerated frame. Let us elaborate. Suppose the existence of a
gravitational field
~ x) ,
φ(~ (9.8a)

–150– version 20/01/2010


Chapter 9. Beyond Minkowski The equivalence principle

which we assume to be time-independent for simplicity. The motion of point


particles in this gravitational field is determined by
¨ = φ(~
~x ~ x) . (9.8b)

Consider a different gravitational field



~ ~x) = φ
ψ(t, ~ ~x + 1 t2 ~g − ~g , (9.9a)
2

where ~g is constant. The equation of motion of point particles in this gravita-


¨ = ψ(t,
tional field is ~x ~ ~x).

Instead of inertial coordinates $\vec{x}$ consider an accelerated frame of
reference, i.e.,

\[ \vec{x}' = \vec{x} + \tfrac{1}{2} t^2 \vec{g} . \]

W.r.t. this accelerated frame of reference, the equation of motion of point
particles in the gravitational field $\vec{\psi}$ becomes

\[
\ddot{\vec{x}}{}' = \ddot{\vec{x}} + \vec{g} = \vec{\psi}(t, \vec{x}) + \vec{g}
= \vec{\phi}\big(\vec{x} + \tfrac{1}{2} t^2 \vec{g}\big) - \vec{g} + \vec{g} ,
\]

i.e.,

\[ \ddot{\vec{x}}{}' = \vec{\phi}(\vec{x}') . \tag{9.9b} \]

This equation is identical to (9.8b).
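The equivalence of the two descriptions can also be seen numerically. The sketch below works in one spatial dimension and assumes, purely for illustration, the field φ(x) = −sin x and the value g = 0.3; neither is taken from the text. It integrates the motion in the field ψ in inertial coordinates, transforms the result to the accelerated frame, and compares it with a direct integration of ẍ′ = φ(x′).

```python
import math


def rk4(accel, x, v, t_end, n_steps):
    """Integrate x'' = accel(t, x) with the classical Runge-Kutta scheme."""
    h = t_end / n_steps
    t = 0.0
    for _ in range(n_steps):
        k1x, k1v = v, accel(t, x)
        k2x, k2v = v + 0.5 * h * k1v, accel(t + 0.5 * h, x + 0.5 * h * k1x)
        k3x, k3v = v + 0.5 * h * k2v, accel(t + 0.5 * h, x + 0.5 * h * k2x)
        k4x, k4v = v + h * k3v, accel(t + h, x + h * k3x)
        x += h * (k1x + 2 * k2x + 2 * k3x + k4x) / 6
        v += h * (k1v + 2 * k2v + 2 * k3v + k4v) / 6
        t += h
    return x, v


G = 0.3  # constant acceleration g (illustrative value)


def phi(x):
    """Time-independent field of (9.8a); the sine profile is an assumption."""
    return -math.sin(x)


def psi(t, x):
    """Field (9.9a): psi(t, x) = phi(x + t^2 g / 2) - g."""
    return phi(x + 0.5 * G * t * t) - G


x0, v0, t_end, n = 0.4, 0.1, 2.0, 2000

# Motion in the field psi, described in inertial coordinates x ...
x_inertial, v_inertial = rk4(psi, x0, v0, t_end, n)
# ... and re-expressed in the accelerated frame x' = x + t^2 g / 2.
x_accelerated = x_inertial + 0.5 * G * t_end**2
v_accelerated = v_inertial + G * t_end

# Motion in the field phi, equation (9.8b), with the same initial data.
x_direct, v_direct = rk4(lambda t, x: phi(x), x0, v0, t_end, n)

print(abs(x_accelerated - x_direct), abs(v_accelerated - v_direct))
```

Both differences are at the level of the integration error, as (9.9b) predicts.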

Let us summarize. The gravitational field (9.8a) yields the equation of
motion (9.8b); the gravitational field (9.9a), which corresponds to

\[ \vec{\psi}'(\vec{x}') = \vec{\psi}(t, \vec{x}) = \vec{\phi}(\vec{x}') - \vec{g} \tag{9.9a$'$} \]

in accelerated coordinates, leads to the same equation of motion, i.e.,
equation (9.9b), w.r.t. an accelerated frame.

We conclude that it is impossible to distinguish through local experiments
whether scenario (9.8) or (9.9) is the ‘true one’: Do we perform our experiments
in an inertial frame of reference, where the gravitational field is
$\vec{\phi}$, or in a uniformly accelerated frame (with acceleration $\vec{g}$)
where the gravitational field is $\vec{\psi} = \vec{\phi} - \vec{g}$?
Remark. Recall that if and only if $\vec{\phi}$ is constant, i.e., in the case
of a homogeneous gravitational field, it is possible to get rid of the
gravitational field entirely, i.e., $\vec{\psi} = \vec{o}$.⁴ (But note that, in
general, $\vec{\phi}$ is not homogeneous.)

⁴ It is justified to say that a homogeneous gravitational field is not a real
gravitational field, since it disappears through a coordinate transformation.


There is an important exception, however. The gravitational field of an isolated
body satisfies ‘boundary conditions at infinity’, i.e., there are decay
conditions: $\Phi \to 0$ as $|\vec{x}| \to \infty$. If the gravitational field is
that of an isolated body, then this global piece of information removes the
ambiguity between (9.8) and (9.9). The decay condition fixes the arbitrary
constant in (9.9a′).

Beyond Newtonian gravity

In the (special) relativistic context, the equivalence principle is rather
meaningless as long as we lack a relativistic theory of gravity. ‘The motion of
a test particle in a gravitational field depends solely on its initial
conditions (i.e., position and velocity).’ What is a gravitational field in
relativity?⁵

Let us postpone this question. Instead, let us simply assume that the relativistic
theory of gravity is a theory in which the equivalence principle is implemented
in a well-defined manner. Let us further assume that the equivalence principle
amounts to ‘local equivalence of gravitation and acceleration’, which implies
that results on accelerated frames of reference will shed light on effects in a
relativistic theory of gravity.

The world line of a uniformly accelerated observer⁶ is a hyperbola in Minkowski
space, cf. (7.31). W.l.o.g. we assume the motion to be in the x-direction of a
(fixed) inertial frame of reference (whose coordinates we denote by
{t, x, y, z}). Then

\[
\begin{pmatrix} 0 \\ \mathring{x} - \frac{1}{a} \\ \mathring{y} \\ \mathring{z} \end{pmatrix}
+ \frac{1}{a} \begin{pmatrix} \sinh as \\ \cosh as \\ 0 \\ 0 \end{pmatrix}
\quad \text{with } s \in \mathbb{R} \text{ and } \mathring{x}, \mathring{y}, \mathring{z} \text{ arbitrary}
\tag{9.10}
\]

is a family of uniformly accelerated observers (with proper acceleration a). As
seen in (9.4) et seq., a uniformly accelerated frame of reference in Galilean
spacetime is equivalent to a time-independent homogeneous gravitational field
of Newtonian gravity. It thus seems natural to guess that the family (9.10)
of observers, which represents a uniformly accelerated frame of reference in
Minkowski spacetime, is equivalent to a time-independent homogeneous
gravitational field in a relativistic theory of gravity. In other words, is it
possible to reinterpret the world lines (9.10) as the world lines of observers
‘sitting at fixed positions’ in a gravitational field? Interestingly enough, the
answer is no.

⁵ And what is a test particle? It turns out that the latter question is even
trickier than the former.

⁶ In special relativity we used to use the word ‘observer’ as a shorthand for
‘inertial observer’; we break with this convention. Henceforth, by ‘observer’ we
simply mean an ‘idealized physicist’ characterized by a timelike world line; a
‘family of observers’ is a congruence of timelike world lines that gives rise to
local coordinates.

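Consider world lines of the form (9.10) that are infinitesimally separated, i.e.,

\[
\frac{1}{a} \begin{pmatrix} \sinh as \\ \cosh as \end{pmatrix}
\quad \text{and} \quad
\begin{pmatrix} 0 \\ dx \end{pmatrix} + \frac{1}{a} \begin{pmatrix} \sinh as' \\ \cosh as' \end{pmatrix} ;
\tag{9.11}
\]

we suppress the trivial y- and z-components. The proper distance $d\ell$ is
obtained by solving

\[
\frac{1}{a} \begin{pmatrix} \sinh as \\ \cosh as \end{pmatrix}
+ d\ell \begin{pmatrix} \sinh as \\ \cosh as \end{pmatrix}
= \begin{pmatrix} 0 \\ dx \end{pmatrix} + \frac{1}{a} \begin{pmatrix} \sinh as' \\ \cosh as' \end{pmatrix} ,
\]

because $(\sinh as, \cosh as)^T$ is the spatial frame vector (i.e., the
normalized spacelike vector orthogonal to the four-velocity). The result is

\[ d\ell = (\cosh as)\, dx . \tag{9.12} \]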

We infer that the (proper) distance between neighboring world lines is growing
with time. This is despite the fact that the acceleration along the two world
lines is identical.
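The growth of the proper distance can be made concrete with a short computation. The sketch below assumes units with c = 1 and a finite (not infinitesimal) coordinate separation dx. The closed-form expression in `proper_distance` is derived here and does not appear in the text: it follows from requiring that the event ρ (sinh as, cosh as), with ρ = 1/a + ℓ, lies on the second world line of (9.11). For small dx it reduces to (9.12).

```python
import math


def proper_distance(a, s, dx):
    """Proper distance, measured along the spatial frame vector at proper
    time s, from the first world line of (9.11) to the second one.

    The event rho * (sinh(as), cosh(as)) lies on the second world line if
    (X - dx)^2 - T^2 = 1/a^2, which gives the quadratic solved below.
    """
    rho = dx * math.cosh(a * s) + math.sqrt(1.0 / a**2 + (dx * math.sinh(a * s))**2)
    return rho - 1.0 / a


a = 1.0          # proper acceleration (c = 1); illustrative value
dx_small = 1e-6  # small separation, to compare with the first-order result

ratio = proper_distance(a, 1.5, dx_small) / dx_small
print(ratio, math.cosh(a * 1.5))   # the two numbers agree to first order in dx

# The distance grows with proper time s although both accelerations are equal.
distances = [proper_distance(a, s, 0.01) for s in (0.0, 1.0, 2.0, 3.0)]
print(distances)
```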
Remark. Imagine two observers on two world lines (9.11) that are separated
by some (finite) initial distance. To get a measure of their distance, observer
one sends light signals to observer two, who signals back instantaneously upon
receipt. Observer one measures the times that pass between emission and
reception. Interestingly enough, these times increase beyond all bounds; in fact,
after some time, the signals sent by observer two cannot even reach observer
one any longer (which is a simple consequence of the fact that the asymptotes
of the two world lines are parallel null lines). Note that this is despite the fact
that the two observers experience the same acceleration.

It is clear that the property of increasing distances between observers is
inconsistent with an interpretation of the world lines (9.11) as the world lines
of observers ‘sitting at fixed positions’ in a gravitational field. We see that
the strict equivalence of acceleration and gravitation (of uniformly accelerated
frames and time-independent homogeneous fields) that is present in Newtonian
theory is absent (or modified) in relativity.


For our present purposes this does not matter. The core of the equivalence
principle, which is the local equivalence of gravitation and acceleration, is un-
touched by these considerations (which are of course of a global nature). Let us
be persistent nonetheless; let us find a different family of accelerated observers.

We require a family of uniformly accelerated observers whose pairwise distances
remain constant. In analogy with (9.11) consider one particular world line

\[ x^\mu(s) = \frac{1}{a} \begin{pmatrix} \sinh as \\ \cosh as \end{pmatrix} \tag{9.13} \]

that corresponds to a uniformly accelerated observer. Note that the
four-velocity is $(\cosh as, \sinh as)^T$ while the spatial frame vector is
$a\,x^\mu(s)$. As an alternative to (9.13) we may characterize the world line by
the condition

\[ x^\mu x_\mu = \frac{1}{a^2} \tag{9.13$'$} \]

(and the vanishing of the y- and z-component). Let $d \in \mathbb{R}$ and
consider the set of events whose proper distance to the world line (9.13) is
|d|, i.e.,

\[ \big\{ \bar{x}^\mu(s) = x^\mu(s) + d\,a\,x^\mu(s) ,\; s \in \mathbb{R} \big\} . \tag{9.14} \]

Then

\[ \bar{x}^\mu \bar{x}_\mu = (1 + a d)^2\, x^\mu x_\mu = \Big( \frac{1 + a d}{a} \Big)^2 , \]

hence (9.14) is again a hyperbola and thus represents the motion of an observer
with constant acceleration.⁷ However, the acceleration differs from the original
acceleration a; it is

\[ \bar{a} = \frac{a}{1 + a d} . \tag{9.15} \]

Conversely, the pair of world lines

\[ x^\mu x_\mu = \frac{1}{a^2} , \qquad \bar{x}^\mu \bar{x}_\mu = \frac{1}{\bar{a}^2} \tag{9.16} \]

is equidistant (with $d = 1/\bar{a} - 1/a$, which follows from (9.15)). For two
uniformly accelerated observers to retain a constant distance, the accelerations
must be different.
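Relations (9.14) to (9.16) lend themselves to a direct numerical check. The following sketch assumes units with c = 1 and the signature (−, +) for the suppressed two-dimensional picture; the sample values of a and d and the function names are illustrative.

```python
import math


def minkowski_sq(t, x):
    """Minkowski square -t^2 + x^2 of a vector in the (t, x) plane."""
    return -t * t + x * x


def worldline(a, s):
    """World line (9.13): (1/a) * (sinh(as), cosh(as))."""
    return (math.sinh(a * s) / a, math.cosh(a * s) / a)


def shifted_event(a, d, s):
    """Event (9.14): x(s) + d * a * x(s) = (1 + a d) * x(s)."""
    t, x = worldline(a, s)
    return ((1.0 + a * d) * t, (1.0 + a * d) * x)


def shifted_acceleration(a, d):
    """Acceleration (9.15) of the shifted world line."""
    return a / (1.0 + a * d)


a, d = 2.0, 0.25
a_bar = shifted_acceleration(a, d)

checks = []
for s in (-1.0, 0.0, 0.7, 1.9):
    t, x = worldline(a, s)
    tb, xb = shifted_event(a, d, s)
    on_hyperbola = minkowski_sq(tb, xb) - 1.0 / a_bar**2     # should vanish
    separation = math.sqrt(minkowski_sq(tb - t, xb - x))     # should equal d
    checks.append((on_hyperbola, separation))

d_from_accelerations = 1.0 / a_bar - 1.0 / a                 # should equal d
print(checks, d_from_accelerations)
```

The separation is the same for every s, while the two accelerations a and ā differ.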

Summarizing we see that the family of hyperbolas

\[ \big\{ x^\mu \;\big|\; x^\mu x_\mu = \mathrm{const} > 0 \big\} \tag{9.17} \]

in some timelike 2-plane (e.g., the plane spanned by {t, x}) represents a family
of uniformly accelerated observers (where, however, each observer experiences
a different acceleration) such that the pairwise distances between the observers
remain constant.

⁷ A posteriori we make the restriction $d > -a^{-1}$.

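Take a particular world line (9.13). Consider, for $|\sigma| < a^{-1}$, the set

\[ \big\{ \bar{x}^\mu(s) = x^\mu(s) + \sigma\, \dot{x}^\mu(s) ,\; s \in \mathbb{R} \big\} . \tag{9.18} \]

Each event $\bar{x}^\mu(s)$ is separated from $x^\mu(s)$ by a constant amount of
proper time $\sigma$ along $\dot{x}^\mu(s)$. Since $x_\mu \dot{x}^\mu = 0$ and
$\dot{x}^\mu \dot{x}_\mu = -1$, we find

\[
\bar{x}^\mu \bar{x}_\mu = x^\mu x_\mu + \sigma^2\, \dot{x}^\mu \dot{x}_\mu
= \frac{1}{a^2} - \sigma^2 = \mathrm{const} ,
\]

i.e., the world line $\bar{x}^\mu(s)$ is again a hyperbola. Therefore, two world
lines of (9.17) are spatially and temporally equidistant.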

It is tempting to take a leap of faith and interpret the family of accelerated
observers (9.17) as a family of observers, each at a fixed point, in a
time-independent gravitational field. A straight line, on the other hand, would then
correspond to an inertial observer; we would interpret this observer as being
in free fall in the gravitational field. But there is a ‘but’: The ‘gravitational
field’ so constructed is trivial. In the Newtonian case, a time-independent
homogeneous gravitational field uniquely corresponds to a family of accelerated
observers (in Galilean space without gravity). It is thus valid to say that
this gravitational field is not a proper gravitational field at all. Likewise, in
the relativistic case, we cannot construct a real gravitational field by simply
considering a family of accelerated observers in Minkowski space. Minkowski
space is Minkowski space, irrespective of whether we use inertial coordinates
or accelerated coordinates. And Minkowski space is a spacetime where gravity
is absent. We will come back to this issue in section 9.4.

However, we may repeat that, for our present purposes, this does not matter.
The core of the equivalence principle, which is the local equivalence of grav-
itation and acceleration, is untouched by these considerations (which are of
course of a global nature). We have good reasons to believe that a single ac-
celerated observer, represented by a particular world line in Minkowski space,
is completely equivalent to an observer experiencing accelerations caused by a
gravitational field. Furthermore, locally, in a neighborhood of that observer,
and for some finite time, the effects stemming from the acceleration of the
frame of reference, are approximately equivalent to the effects stemming from
gravity.


In the subsequent section we use the equivalence principle to gain a number of
insights on gravitational effects. These considerations are local in nature, so
that the equivalence principle is expected to hold (at least approximately).

9.2 Clocks and light in a gravitational field

We do not yet have a relativistic theory of gravity at hand. There is only one
property of that theory we expect to hold (and which we thus assume): The
equivalence principle, i.e., the local (approximate) equivalence between gravi-
tation and acceleration discussed in section 9.1. Despite this severe restriction
we are able to derive a number of results.

Imagine the Piazza del Duomo (Piazza dei Miracoli) in Pisa and Galileo Galilei
standing on top of the Leaning Tower and performing free fall experiments. But
let us twist history and imagine that Galileo intends to drop . . . clocks. On the
lawn at the base of the Leaning Tower, Galileo’s assistant, let’s call him Albert,
who is like Galileo equipped with precise chronometers, is eager and ready to
begin with the measurements.

Suppose that the clocks Galileo drops are perfectly synchronized with his
own chronometer—a millisecond on the clocks is a millisecond on Galileo’s
chronometer. The gravitational field of the Earth accelerates a falling clock
until it reaches the velocity v at the base of the tower. Shortly before Albert
breaks the fall of the clock, he makes time measurements by comparing the
clock’s time with his chronometer’s time. The result is fascinating: The clock
and Albert’s chronometer are asynchronous and, really and truly, the clock
runs faster than the chronometer by a factor of

\[ 1 + \frac{1}{2} \frac{v^2}{c^2} . \]
Galileo is sceptical of the findings his assistant relates to him upon his return
at the base of the tower; he does not comprehend the results. But surprisingly,
his assistant understands. “This is how it works,” says Albert, “we use the
equivalence principle to explain the results.”

Galileo is a stationary observer in a gravitational field; the acceleration
experienced by Galileo is g. By the equivalence principle, Galileo can be
equivalently regarded as a uniformly accelerated observer (with acceleration g);
in Minkowski space this corresponds to a hyperbolic world line with acceleration
g, see section 9.1. Albert and the falling clock are spatially close to Galileo
and the experiment is rather short; the condition of locality in the equivalence
principle is thus satisfied and we may regard Albert as another uniformly accel-
erating observer, represented by another hyperbola in Minkowski space. The
freely falling clock, on the other hand, is to be identified with an inertial clock
by the equivalence principle; free fall in a gravitational field is equivalent to
inertial motion, which corresponds to a straight line in Minkowski space. Since
the clock is inertial we are permitted to apply our collective special relativistic
reasoning to measurements made w.r.t. this clock. At the moment Galileo drops
the clock, the relative velocity between Galileo’s chronometer and the clock is
zero. Hence, obviously, there is no time dilation between the two. However,
shortly before Albert catches the clock, the relative velocity between Albert’s
chronometer and the clock is v. Hence, as seen from the clock’s perspective,
the chronometer undergoes a time dilation by the factor of
\[ \sqrt{1 - \frac{v^2}{c^2}} \approx 1 - \frac{1}{2} \frac{v^2}{c^2} , \]
see the considerations of section 4.3. Note that the clock is inertial, while the
chronometer is not (cf. the twin paradox in section 4.4).

We conclude that gravity influences the course of (proper) time; ‘deeper down’
in the gravitational field, time runs slower than ‘higher up’.

Albert recommends to Galileo to solve an exercise to better understand the
above: “Make a Minkowski diagram that represents the experiment. And note
that the clock’s world line is tangent to the hyperbola that represents you.”

Galileo is not convinced but suggests a second experiment. He proposes to
simply take a clock, which is initially synchronized with the chronometers,
throw it up into the air, as high as the tower, and catch it again when it falls
back to the ground. No sooner said than done. Galileo throws the clock up into
the air and Albert manages to catch it again some seconds later. The readings
corroborate Albert’s ideas: The clock is ahead of the chronometers by some
tiny fractions of a second. Galileo is astounded; he had thought the results of
the first experiment to be due to some virtual effect resulting from the fact that
the measurements are taken at different locations, on the base and on the top of
the tower, and that these measurements should not be directly compared. Now,
however, Galileo holds in his hands two clocks that no longer show the same
time despite the fact that they had been perfectly synchronized initially. “Piece
of cake,” says Albert, “just remember the twin paradox and the equivalence
principle.”


Galileo and Albert correspond to a stationary observer in a gravitational field;
the acceleration is g. By the equivalence principle, they can be equivalently
regarded as a uniformly accelerated observer that is represented in Minkowski
space by a hyperbolic world line. The clock, on the other hand, is freely falling
on its entire trajectory; both on its way up and on its way down there are no
forces except gravity that act on the clock. Free fall in a gravitational field
is equivalent to inertial motion; therefore, the clock corresponds to a straight
line, which represents inertial motion, in Minkowski spacetime. Albert correctly
concludes that the experimental set-up can be represented, with good accuracy,
since the condition of locality is satisfied, by a hyperbola, representing Galileo
and Albert, and a secant of this hyperbola, representing the clock. The result
then follows straightforwardly from the standard considerations on the twin
paradox. The inertial twin, i.e., the clock, is ‘older’ than the accelerated twin,
i.e., Galileo and Albert. The time that has passed on the clock during its
round trip is greater than the time that has passed on Galileo’s and Albert’s
chronometers: The clock is ahead of the chronometers by approximately
\[
\int_{t_i}^{t_f} \bigg( 1 - \sqrt{1 - \frac{v^2(t)}{c^2}} \bigg)\, dt
\approx \int_{t_i}^{t_f} \frac{1}{2} \frac{v^2(t)}{c^2}\, dt ,
\]

where ti and tf are the initial and final time, i.e., throw and catch, respectively;
v(t) is the relative velocity; |v(ti )| = |v(tf )| and v = 0 at the turning point; see
section 4.4.
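To get a feeling for the size of this effect, the approximate integral can be evaluated for a concrete throw. The sketch below assumes the Newtonian velocity profile v(t) = v₀ − g t with v₀ = √(2gh), the value g = 9.81 m/s², and a tower height of h = 56 m; these numbers are illustrative assumptions and are not given in the text. The closed form v₀³/(3 g c²) used for comparison is obtained here by integrating the quadratic velocity profile.

```python
import math

C = 299_792_458.0   # speed of light in m/s
G_ACC = 9.81        # gravitational acceleration in m/s^2 (assumed value)


def clock_gain_numeric(h, n=10_000):
    """Midpoint-rule estimate of the integral of v(t)^2 / (2 c^2) over the
    flight, for a clock thrown up to height h with v(t) = v0 - g t."""
    v0 = math.sqrt(2.0 * G_ACC * h)
    t_flight = 2.0 * v0 / G_ACC      # throw at t_i = 0, catch at t_f
    dt = t_flight / n
    total = 0.0
    for k in range(n):
        v = v0 - G_ACC * (k + 0.5) * dt
        total += 0.5 * v * v / C**2 * dt
    return total


def clock_gain_closed_form(h):
    """Same integral evaluated analytically: v0^3 / (3 g c^2)."""
    v0 = math.sqrt(2.0 * G_ACC * h)
    return v0**3 / (3.0 * G_ACC * C**2)


h_tower = 56.0   # metres; rough height of the tower, an assumption
gain_numeric = clock_gain_numeric(h_tower)
gain_exact = clock_gain_closed_form(h_tower)
print(gain_numeric, gain_exact)   # seconds by which the thrown clock is ahead
```

The result is a tiny fraction of a second, in line with the narrative; measuring it would require far better clocks than Galileo's chronometers.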

Still hesitant to accept Albert’s explanations, Galileo takes matters into his
own hands. While he mounts the tower again, he orders Albert to stay at
the foot of the tower until he returns. After an hour, Galileo descends
again; with some despair he compares his chronometer with Albert’s, which
had remained at the ground for the entire time. And there it is again, the time
difference. While on Galileo’s chronometer the time ∆t has passed, Albert’s
chronometer shows that only

\[ \Big( 1 - \frac{gh}{c^2} \Big) \Delta t \]

has passed for Albert; h is of course the height of the Leaning Tower. “Did
you tamper with your chronometer?” Galileo cries accusingly. But Albert is a
physicist of impeccable character. “Let me explain,” he says.

Galileo and Albert are observers who take fixed positions in a stationary
gravitational field. The equivalence principle tells us that, equivalently,
Galileo and Albert are represented by uniformly accelerated observers in
Minkowski space, i.e., by two hyperbolic world lines. The condition of locality
in the equivalence principle is satisfied, since the spatial separation, i.e.,
h, is small. We take the pair of hyperbolas (9.16) to be Galileo’s and Albert’s
world lines, respectively, where a is replaced by g and the distance d
corresponds to the height h. Therefore, we have

\[
\text{Albert:} \quad \frac{1}{g} \begin{pmatrix} \sinh g s_A \\ \cosh g s_A \end{pmatrix} ,
\qquad
\text{Galileo:} \quad \frac{1}{\bar{g}} \begin{pmatrix} \sinh \bar{g} s_G \\ \cosh \bar{g} s_G \end{pmatrix} ,
\]

where $s_A$ and $s_G$ are Albert’s and Galileo’s proper time, respectively;
furthermore, from (9.15), where we use SI units, we find

\[ \bar{g} = \frac{g}{1 + gh/c^2} \approx g \Big( 1 - \frac{gh}{c^2} \Big) . \]

Suppose that Galileo leaves Albert when $s_A = 0$ and $s_G = 0$. To compute the
proper times $s_A$ and $s_G$ that have passed until Galileo’s return, we equate
the two time components, i.e.,

\[ \frac{1}{g} \sinh(g s_A) = \frac{1}{\bar{g}} \sinh(\bar{g} s_G) ; \]

in this context we have neglected the time Galileo needs to ascend the tower
and related subtleties. We thus need to solve

\[ \sinh(\bar{g} s_G) \approx \Big( 1 - \frac{gh}{c^2} \Big) \sinh(g s_A) , \]

which yields

\[ \bar{g} s_G \approx g s_A - \tanh(g s_A)\, \frac{gh}{c^2} \approx g s_A \]

and thus

\[
s_G \approx s_A \Big( 1 + \frac{gh}{c^2} \Big) , \qquad
s_A \approx s_G \Big( 1 - \frac{gh}{c^2} \Big) .
\]

This is in perfect accordance with the result of the experiment.

“You seem sure of yourself, Albert,” Galileo observes, “but, according to the
first experiment, should not the relation

\[ s_A \approx s_G \sqrt{1 - \frac{v^2}{c^2}} \]

hold? Don’t you remember the time dilation factor you computed with the
clock’s terminal velocity v?” “Sure, I do. Obviously, we have

\[ v = g t \quad \text{and} \quad h = \tfrac{1}{2} g t^2 . \]

It follows that

\[ v = \sqrt{2 g h} \]

and we obtain

\[
\sqrt{1 - \frac{v^2}{c^2}} \approx 1 - \frac{1}{2} \frac{v^2}{c^2}
= 1 - \frac{1}{2} \frac{2 g h}{c^2} = 1 - \frac{g h}{c^2} .
\]

Quod erat demonstrandum.”
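Albert's argument can be retraced with numbers. The sketch below assumes g = 9.81 m/s² and a tower height of h = 56 m (illustrative values, not given in the text). Because the effect is of order 10⁻¹⁴, evaluating 1 − √(1 − v²/c²) naively in floating point would lose most of its digits; the helper therefore uses `math.log1p` and `math.expm1`, which is a numerical choice made here and not part of the physics.

```python
import math

C = 299_792_458.0   # speed of light in m/s


def terminal_velocity(g, h):
    """Velocity after free fall through the height h: v = sqrt(2 g h)."""
    return math.sqrt(2.0 * g * h)


def dilation_deficit(v):
    """Return 1 - sqrt(1 - v^2/c^2), evaluated without cancellation:
    sqrt(1 - b^2) = exp(0.5 * log1p(-b^2))."""
    return -math.expm1(0.5 * math.log1p(-(v / C) ** 2))


g, h = 9.81, 56.0                      # assumed values
v = terminal_velocity(g, h)
deficit_from_velocity = dilation_deficit(v)
deficit_from_height = g * h / C**2     # the term gh/c^2

print(v, deficit_from_velocity, deficit_from_height)
```

Both deficits agree to all relevant digits, which is the content of Albert's "Quod erat demonstrandum".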

Finally, Galileo is convinced: “Time in a gravitational field is relative! But I’m
beginning to wonder whether we couldn’t have obtained similar results by using
light. Come Albert, let us try.” Galileo procures a light source emitting light at
a precisely specified frequency and mounts the tower again. From the top of the
tower Galileo signals downward to Albert, who measures the frequency of the
light beam he receives. Interestingly enough, the frequency Albert measures is
larger by a factor of

\[ 1 + \frac{gh}{c^2} \]

than Galileo’s frequency. A short debate between Albert and Galileo ensues,
whereupon Galileo announces: “I’ve come to terms with the equivalence prin-
ciple. So this time, I’ll give the explanation myself.”

By the equivalence principle, Galileo is represented by a hyperbolic world line
in Minkowski space and Albert by a nearby one. The beam of light, on the
other hand, is in ‘free fall’ and thus corresponds to a null line in Minkowski
space intersecting Galileo’s world line (at the time of emission) and Albert’s
world line (at the time of receipt). The time the light takes to reach the ground
is h/c. Albert experiences the acceleration g during this time; hence, in the
Minkowski picture, he has reached the velocity

\[ v = \frac{gh}{c} \]

when the beam of light hits him. In this picture we may simply apply
formula (5.7) of section 5.1, i.e., we regard the frequency shift as being due
to the (longitudinal) Doppler effect. The frequency $\nu_A$ that Albert
measures is

\[
\nu_A = \sqrt{\frac{1 + v/c}{1 - v/c}}\; \nu_G
\approx \sqrt{\Big( 1 + \frac{v}{c} \Big) \Big( 1 + \frac{v}{c} \Big)}\; \nu_G
\approx \Big( 1 + \frac{v}{c} \Big) \nu_G .
\]

Inserting v we obtain

\[ \nu_A \approx \Big( 1 + \frac{gh}{c^2} \Big) \nu_G , \]

which reproduces the measurement perfectly. Furthermore, this result is in
perfect accord with the previous considerations on time. Since

\[ s_A \approx \Big( 1 - \frac{gh}{c^2} \Big) s_G \]

we expect that frequencies behave like the reciprocals of $s_A$ and $s_G$, i.e.,

\[
\nu_A \approx \Big( 1 - \frac{gh}{c^2} \Big)^{-1} \nu_G
\approx \Big( 1 + \frac{gh}{c^2} \Big) \nu_G .
\]

A consistent picture of the gravitational redshift effect emerges. Light traveling
from ‘deeper down’ to ‘higher up’ in a gravitational field is redshifted; on the
opposite path there is a blueshift.
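The Doppler argument can be evaluated for the tower. The sketch assumes g = 9.81 m/s² and h = 56 m (illustrative values, not given in the text). It uses the identity √((1+β)/(1−β)) = exp(artanh β) so that the tiny excess over 1 is computed without loss of precision; this reformulation is a numerical convenience introduced here.

```python
import math

C = 299_792_458.0   # speed of light in m/s


def doppler_excess(v):
    """Return sqrt((1 + v/c) / (1 - v/c)) - 1 for approach velocity v,
    using sqrt((1+b)/(1-b)) = exp(atanh(b)) to avoid cancellation."""
    return math.expm1(math.atanh(v / C))


def blueshift_excess(g, h):
    """Relative frequency excess (nu_A - nu_G) / nu_G for light sent
    downward through the height h; Albert's velocity is v = g h / c."""
    return doppler_excess(g * h / C)


g, h = 9.81, 56.0                 # assumed values
excess = blueshift_excess(g, h)
first_order = g * h / C**2        # the term gh/c^2 from the text

print(excess, first_order)
```

The exact Doppler factor and the first-order expression differ only at relative order gh/c², far below anything measurable in this setting.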

Galileo and Albert decide to pack their gear and return to their humble abode
in the poorer quarters of Pisa. On their way home, Albert is deeply immersed in
thought. Finally, he asks: “Master Galilei, may I confront you with a Gedanken-
experiment of mine? It concerns another property of light, its bending in the
gravitational field of the Earth.”

Suppose that the Earth is flat, which implies that the gravitational field is
exactly the same (in magnitude and direction) along the surface. On this flat
Earth Galileo and Albert stand some hundred meters apart from each other.
Galileo sends a beam of light in the direction toward Albert; the initial height
of the beam is precisely specified, and, at the point of emission, the beam is
exactly parallel to the Earth’s surface. Interestingly enough, the height of the
beam, when Albert receives it, is less than the original height by

\[ \frac{1}{2} \frac{g\, d^2}{c^2} , \]
where d is the distance between Galileo and Albert.

It is again the equivalence principle that provides an explanation. Instead of
the Earth and its gravitational field, imagine a planar surface that is uniformly
accelerating through space, with Galileo and Albert on it. If the distance
between the two is d, then the time the beam of light takes on its voyage from
Galileo to Albert is

\[ t = \frac{d}{c} . \]

During this time, Albert is uniformly accelerating with an acceleration g.
Therefore, his position w.r.t. the original position at the time of emission of
the light ray has changed by

\[ \Delta h = \frac{1}{2} g t^2 = \frac{1}{2} \frac{g\, d^2}{c^2} , \]

as claimed.

It is a little more complicated but instructive to consider the same
Gedankenexperiment in the Minkowski space picture. Galileo’s world line, the
emitter’s world line, to be exact, is represented by a hyperbola in Minkowski
space; since the set-up requires two spatial dimensions it becomes necessary to
picture a three-dimensional Minkowski space spanned by
$\langle t, x, y \rangle$. W.l.o.g. we choose the emitter’s hyperbolic world line
to lie in the $\langle t, x \rangle$ plane and its vertex to coincide with the
origin; furthermore, the origin is assumed to be the time of emission. Albert
cannot be treated as a point particle; indeed, to measure the beam of light he
requires a screen; the beam of light creates a spot on this screen whose height
can then easily be measured. The screen corresponds to a family of hyperbolas
which are parallel to Galileo’s world line; the vertices of these hyperbolas are
given by t = 0 and y = d, while x varies. To see this we simply note that x is
the direction of acceleration in the Minkowski picture, which means that x is
connected with the height in the gravitational field. (Of course, it is not the
height itself; the family of accelerated observers defines equal heights; the
height is thus the x-coordinate of a world line at its point of intersection
with t = 0.) The beam of light emitted from the origin is represented by a null
line; it is the null line in the $\langle t, y \rangle$ plane (which is tangent
to the emitter’s world line); this requirement corresponds to the condition that
the beam be parallel to the ground initially. The light ray intersects Albert’s
screen at the point

\[ t = \frac{d}{c} , \quad x = 0 , \quad y = d . \]

Tracing back this point along its hyperbolic world line to t = 0 yields

\[ t = 0 , \quad x = -\frac{g\, d^2}{2 c^2} , \quad y = d . \]

Since, at t = 0, the Minkowski coordinate x measures the height in the
gravitational field, the height of the spot that the beam of light produces on
the screen is less, by

\[ \frac{g\, d^2}{2 c^2} , \]

than the original height (since emission was at t = 0, x = 0). The Minkowski
picture thus reproduces, in a more formal way, the previous result.
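The two ways of obtaining the drop of the light spot can be compared numerically. The sketch below assumes g = 9.81 m/s² and a distance d = 100 m (illustrative values). For the Minkowski picture it further assumes that the screen point hit by the light moves on a hyperbola with proper acceleration g, x(t) = x_v + (c²/g)(√(1 + (g t/c)²) − 1), where x_v is its position at t = 0; this explicit parametrization is supplied here and is not written out in the text.

```python
import math

C = 299_792_458.0   # speed of light in m/s


def drop_accelerating_surface(g, d):
    """Delta h = g t^2 / 2 with the light travel time t = d / c."""
    t = d / C
    return 0.5 * g * t * t


def drop_hyperbola(g, d):
    """Drop -x_v obtained by tracing the event (t = d/c, x = 0) back to t = 0
    along a hyperbola with proper acceleration g (assumed parametrization).

    (c^2/g) * (sqrt(1 + u^2) - 1) is rewritten as
    (c^2/g) * u^2 / (sqrt(1 + u^2) + 1) to avoid cancellation for tiny u.
    """
    u = g * d / C**2
    return (C**2 / g) * u * u / (math.sqrt(1.0 + u * u) + 1.0)


g, d = 9.81, 100.0                       # assumed values
drop_simple = drop_accelerating_surface(g, d)
drop_minkowski = drop_hyperbola(g, d)
print(drop_simple, drop_minkowski)       # both in metres
```

For terrestrial values the drop is far below the size of an atom, which is why the text treats the set-up as a Gedankenexperiment.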

Galileo approves of Albert’s ideas and entrusts him with an exercise. “Turn
the Gedankenexperiment into an actual experiment; allow for the fact that the
surface of the Earth is curved.”

A day of exciting experiments comes to a close. Bidding Albert a goodnight,
Galileo retires into his chamber. But his sleep is restless. Fragments of formulas
appear and vanish in front of his inner eye. The term gh of

\[ \frac{gh}{c^2} \]

is particularly insistent and irritating; it dances in his mind and cries: “I’m the
gravitational potential, haven’t you noticed?” And the nightmare continues.
Galileo rides on a beam of light; he leaves the Earth’s surface and travels higher
and higher until he sees the Earth as a blue ball floating in space. “Compute
my redshift”, the light ray commands. Galileo is willing to obey but he fails.
And Albert steps forward from a dark corner of space and says: “Never, never
ever, try to push the boundaries of the equivalence principle. Be aware of its
limitations. Equivalence of gravitation and acceleration is merely local. Global
considerations of this kind are bound to fail.” And out of nothing a signpost
appears that says: “Realm of General Relativity. Border crossing.”

9.3 Metrics

Definition 9.1. A metric is a sufficiently smooth non-degenerate symmetric
tensor field of rank (0, 2) on a differentiable manifold. A spacetime is a
four-dimensional manifold equipped with a Lorentzian metric.

In the following we discuss the concepts this definition is based on without going
into too much detail.⁸ The notion of a manifold is a fundamental concept
in differential geometry. We avoid a formal definition and simply say that
a manifold of dimension n is a ‘space’ that locally looks like (an open set
of) $\mathbb{R}^n$, i.e., the defining property of a manifold of dimension n is
to admit coordinate systems of n real coordinates, at least locally, at every
point.⁹ A simple example is the space $\mathbb{R}^2$ (or $\mathbb{R}^n$), which
is automatically equipped with a (global) coordinate system. Similarly, every
vector space and affine space of dimension n is a manifold of dimension n.
Another two-dimensional manifold is the sphere $S^2$; it is not difficult to
equip $S^2$ with coordinates.¹⁰ A similar example is the torus $T^2$. In fact,
every sufficiently well-behaved surface in $\mathbb{R}^3$ (or $\mathbb{R}^4$,
like the Klein bottle) is a two-dimensional manifold. More generally,
hypersurfaces of dimension n in $\mathbb{R}^{n+1}$ (or $\mathbb{R}^N$,
$N > n + 1$) are n-dimensional manifolds. It is possible, although probably not
advisable, to visualize every n-dimensional manifold as a hypersurface in
$\mathbb{R}^N$.

⁸ These lecture notes are not the right place to discuss the mathematics
underlying general relativity in detail. The reader is referred to the lecture
course “Einführung in die Relativitätstheorie und Kosmologie II” instead.

To (intuitively) understand the concept of tensor fields let us begin by
considering vector fields. Consider a (fixed) point x of a manifold M and a
coordinate system with coordinates $\{x^1, \dots, x^n\}$. Picture the coordinate
system as a coordinate grid covering a neighborhood of the point. For a function
f defined on M the expression

\[ \partial_{x^i} f \,\big|_x \quad \text{or} \quad \frac{\partial f}{\partial x^i}\Big|_x \tag{9.19} \]

is well-defined; it measures the change of the function f along the grid line
denoted by $x^i$ at the point x; note that the coordinate line $x^i$ is
characterized by $x^j = \mathrm{const}$ for all j different from i. The
expression (9.19) is the directional derivative of the function f along the
coordinate line $x^i$. We interpret the derivative

\[ \partial_{x^i} \big|_x \quad \text{or} \quad \frac{\partial}{\partial x^i}\Big|_x \tag{9.20} \]

as a vector at the point $x \in M$. A coordinate grid defined by n coordinates
gives rise to n linearly independent vectors

\[
\frac{\partial}{\partial x^1}\Big|_x , \;
\frac{\partial}{\partial x^2}\Big|_x , \;
\frac{\partial}{\partial x^3}\Big|_x , \; \dots , \;
\frac{\partial}{\partial x^n}\Big|_x .
\]

Using these vectors as the basis we find that an arbitrary vector at $x \in M$
reads

\[ v(x) = v^i(x)\, \frac{\partial}{\partial x^i}\Big|_x . \]

Using the language of differential geometry we define vectors as derivations on
smooth functions.

⁹ Consider a set M. A coordinate system (or chart) is simply an injective map
$\varphi$ of an open subset of $\mathbb{R}^n$ into M. The existence of a family
of compatible charts (‘atlas’), where compatibility of two charts $\varphi_1$,
$\varphi_2$ means that $\varphi_2^{-1} \circ \varphi_1$ is a diffeomorphism on an
open set, makes M a manifold of dimension n.

¹⁰ Note, however, that at least two coordinate patches are necessary to cover
the entire sphere $S^2$.

Example. A simple example is the two-dimensional plane. Choosing an origin
and a basis $\{e_1, e_2\}$ we obtain a coordinate system; the coordinate grid
consists of straight lines parallel to $e_1$ and $e_2$. Therefore,

\[ \frac{\partial}{\partial x^1}\Big|_x = e_1 , \qquad \frac{\partial}{\partial x^2}\Big|_x = e_2 \]

for every point x.
Example. Consider the manifold $S^2$ as embedded into $\mathbb{R}^3$. A vector
v(x) at a point $x \in S^2$ is a tangent vector to the sphere; it is a vector in
$\mathbb{R}^3$ but not, in any way, an element of $S^2$. However, by considering
coordinate grids and directional derivatives we can give these tangent vectors
an intrinsic characterization that does not make use of the ambient vector space
$\mathbb{R}^3$.
Remark. The use of ‘directional derivatives’ as (basis) vectors is not merely
a mathematical quirk. In general there does not exist any ambient vector
space that simplifies matters, which in turn compels us to use concepts that
are defined intrinsically. Note that the universe is a four-dimensional manifold
that is not embedded in a five- or higher dimensional vector space.

When x is not fixed but regarded as varying on the manifold we obtain a vector
field. Hence a vector field on a manifold M corresponds to a vector at each
point, i.e.,

\[ v = v^i\, \frac{\partial}{\partial x^i} , \tag{9.21} \]

where $v^i$ depends on the position on M.

The basis vector fields

\[
\frac{\partial}{\partial x^1} , \;
\frac{\partial}{\partial x^2} , \;
\frac{\partial}{\partial x^3} , \; \dots , \;
\frac{\partial}{\partial x^n}
\tag{9.22}
\]

form a so-called coordinate frame.
Remark. It is important to note that a particular coordinate frame is in general
not defined globally on the manifold; it is clear that its domain is the domain
of the coordinate system (i.e., the chart).¹¹

The representation of a vector field depends on the chosen coordinate system.


A change of coordinates induces a change of basis. Let {x1 , . . . , xn } be a
coordinate system and {x̄1 , . . . , x̄n } be a different coordinate system. Then
∂ ∂ x̄j ∂
= (9.23)
∂xi ∂xi ∂ x̄j
11
A good example is the manifold S 2 . Since at least two charts are needed to cover S 2 , there
are at least two different coordinate frames needed to represent vector fields globally.

version 20/01/2010 –165–


Metrics Chapter 9. Beyond Minkowski

by the chain rule. The vector field v reads

∂ ∂
v i (xk ) and v̄ i (x̄k )
∂xi ∂ x̄i
w.r.t. the first and the second coordinate system, respectively. It is straight-
forward to see that
$$ \bar v^i(\bar x^k) = \frac{\partial \bar x^i}{\partial x^j}\, v^j(x^k) . \tag{9.24} $$
Compare (9.23) and (9.24) with (A.18) and (A.19) by setting $A^i{}_j = \partial \bar x^i / \partial x^j$.
The transformation behavior (9.24) of vector fields under a change of coordinates
generalizes the transformation behavior of scalar fields (A.61).
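
As a small numerical illustration of (9.24), the following sketch transforms vector components from Cartesian coordinates (x, y) to polar coordinates (ρ, ϕ) on R² \ {0}. The choice of coordinate systems, the sample point and all function names are assumptions made for this example only.

```python
import math

def jacobian_polar_wrt_cartesian(x, y):
    """Matrix d(rho, phi)/d(x, y), i.e. d xbar^i / d x^j, away from the origin."""
    rho2 = x * x + y * y
    rho = math.sqrt(rho2)
    return [[x / rho, y / rho],
            [-y / rho2, x / rho2]]

def transform_vector(v, x, y):
    """Apply (9.24): vbar^i = (d xbar^i / d x^j) v^j."""
    J = jacobian_polar_wrt_cartesian(x, y)
    return [J[i][0] * v[0] + J[i][1] * v[1] for i in range(2)]

x0, y0 = 3.0, 4.0
# Rotation field v = -y d/dx + x d/dy at (3, 4); in polar components it is d/dphi.
v_polar = transform_vector([-y0, x0], x0, y0)
# Radial field w = x d/dx + y d/dy at (3, 4); in polar components it is rho d/drho.
w_polar = transform_vector([x0, y0], x0, y0)
```

At the sample point ρ = 5, so the two results should be close to (0, 1) and (5, 0), respectively.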

To define covector fields (a.k.a. 1-forms) on a manifold M we use the coordinate coframe
$$ dx^1 , \ dx^2 , \ dx^3 , \ \ldots , \ dx^n . \tag{9.25} $$
It generalizes the concept of a dual basis, cf. (A.8′), since we require
$$ dx^i \Big( \frac{\partial}{\partial x^j} \Big) = \delta^i{}_j . $$
Hence, at each point, a covector maps vectors to the real numbers. A covector
field corresponds to a covector at each point, i.e.,

$$ a = a_i \, dx^i , \tag{9.26} $$

where $a_i$ depends on the position on M. A covector field a takes a vector field v and generates a function on M via
$$ a(v) = a_i \, dx^i \Big( v^j \frac{\partial}{\partial x^j} \Big) = a_i v^i . \tag{9.27} $$

The transformation of covector fields under a change of coordinates is as intuitive as the transformation of vector fields. We have
$$ dx^i = \frac{\partial x^i}{\partial \bar x^j} \, d\bar x^j \tag{9.28} $$

for the coframe and, from $a_i \, dx^i = \bar a_i \, d\bar x^i$,
$$ \bar a_i(\bar x^k) = \frac{\partial x^j}{\partial \bar x^i} \, a_j(x^k) . \tag{9.29} $$


This is in complete analogy with (A.20) and (A.22), when we note that
$$ \frac{\partial x^i}{\partial \bar x^j} = \Big( \frac{\partial \bar x^i}{\partial x^j} \Big)^{-1} . $$

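The relation (9.27) is independent of the choice of coordinates. To see this explicitly we argue that
$$ \bar a_i \bar v^i = \frac{\partial x^j}{\partial \bar x^i} \, a_j \, \frac{\partial \bar x^i}{\partial x^k} \, v^k = \frac{\partial x^j}{\partial \bar x^i} \frac{\partial \bar x^i}{\partial x^k} \, a_j v^k = \delta^j{}_k \, a_j v^k = a_j v^j . $$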
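
The coordinate independence of (9.27) can also be checked numerically. The sketch below uses the change from Cartesian to polar coordinates on R² \ {0} as an assumed example; the component values and function names are illustrative and not taken from the text.

```python
import math

def jacobian(x, y):
    """d xbar^i / d x^j for xbar = (rho, phi), x = (x, y)."""
    r2 = x * x + y * y
    r = math.sqrt(r2)
    return [[x / r, y / r], [-y / r2, x / r2]]

def inverse_jacobian(x, y):
    """d x^i / d xbar^j, i.e. derivatives of x = rho cos(phi), y = rho sin(phi)."""
    r = math.hypot(x, y)
    c, s = x / r, y / r
    return [[c, -r * s], [s, r * c]]

def transform_vector(v, x, y):
    """(9.24): vbar^i = (d xbar^i / d x^j) v^j."""
    J = jacobian(x, y)
    return [sum(J[i][j] * v[j] for j in range(2)) for i in range(2)]

def transform_covector(a, x, y):
    """(9.29): abar_i = (d x^j / d xbar^i) a_j."""
    K = inverse_jacobian(x, y)
    return [sum(K[j][i] * a[j] for j in range(2)) for i in range(2)]

def contract(a, v):
    """(9.27): a(v) = a_i v^i."""
    return sum(ai * vi for ai, vi in zip(a, v))

x0, y0 = 1.5, -0.7
a = [2.0, 5.0]    # covector components w.r.t. dx, dy
v = [-1.0, 3.0]   # vector components w.r.t. d/dx, d/dy
original = contract(a, v)
transformed = contract(transform_covector(a, x0, y0), transform_vector(v, x0, y0))
```

Both contractions agree up to rounding, although the individual components of a and v change under the transformation.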

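Tensor fields are obtained via the tensor product of vector and covector fields.
For example,
$$ T = T^{i}{}_{j}{}^{kl}{}_{m} \, \frac{\partial}{\partial x^i} \otimes dx^j \otimes \frac{\partial}{\partial x^k} \otimes \frac{\partial}{\partial x^l} \otimes dx^m \tag{9.30} $$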
is a tensor of rank (3, 2). The transformation of tensor fields under a change
of coordinates generalizes the transformation of vector and covector fields in
the obvious way: Each contravariant (i.e., upper) index follows (9.24), each
covariant (i.e., lower) index behaves like (9.29), i.e.,
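$$ \bar T^{i}{}_{j}{}^{kl}{}_{m}(\bar x^n) = \frac{\partial \bar x^i}{\partial x^{i'}} \frac{\partial x^{j'}}{\partial \bar x^j} \frac{\partial \bar x^k}{\partial x^{k'}} \frac{\partial \bar x^l}{\partial x^{l'}} \frac{\partial x^{m'}}{\partial \bar x^m} \; T^{i'}{}_{j'}{}^{k'l'}{}_{m'}(x^n) . \tag{9.31} $$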

The metric is a tensor field of rank (0, 2), i.e.,12
$$ g = g_{ij} \, dx^i \otimes dx^j ; \tag{9.32} $$
the components are functions. Symmetry means that
$$ g_{ij} = g_{ji} . \tag{9.33} $$
Non-degeneracy means that the inverse metric, which we denote by $g^{ij}$, exists, i.e.,
$$ g^{ij} g_{jk} = \delta^i{}_k . \tag{9.34} $$

It is common to call a metric a line element $ds^2$. An equivalent notation for (9.32) is
$$ ds^2 = g_{ij} \, dx^i \otimes dx^j . \tag{9.32$'$} $$
12
When we go over to four-dimensional manifolds we typically use Greek indices, i.e.,
g = gµν dxµ ⊗ dxν .


It is customary to omit the tensor product signs, i.e., to write

$$ ds^2 = g_{ij} \, dx^i dx^j . \tag{9.32$''$} $$

A metric generalizes the concept of a non-degenerate bilinear form, see appendix A.3; it is a field of non-degenerate bilinear forms (scalar products or pseudo-scalar products), i.e., a non-degenerate bilinear form at each point of the manifold. Therefore, a metric functions exactly like a (pseudo-)scalar product, at each point of the manifold:

Metrics measure ‘lengths’ of vectors. The ‘squared norm function’ of a vector field v, i.e.,
$$ g(v, v) = g_{ij} v^i v^j , $$
is a function on M. At the point x ∈ M, this expression defines the squared
norm of the vector v(x). Writing out the coordinate dependence explicitly,
where x is represented by the coordinates $(x^1, x^2, \ldots, x^n)$, we have
$$ g(v, v)|_x = g_{ij}(x^k) \, v^i(x^k) \, v^j(x^k) . $$

If $g(v, v)|_x$ is positive for all $v(x) \neq 0$ and at each point x, which corresponds to
positive definiteness at each point, then the metric is called Riemannian; below
we discuss a prominent example. Likewise, the angle between the two vectors
v(x), w(x) at the point x ∈ M is defined by using $g(v, w)|_x$ in the obvious way.

Metrics are not necessarily Riemannian. If the signature of g|x , at each point
x, is (− + ++), then the metric is called Lorentzian. This means that it is
possible to choose, at each point x ∈ M , a basis such that the components gij
(or rather gµν ) of the metric form the diagonal matrix diag(−1, 1, 1, 1). Let us
reiterate: It is possible to bring the metric to the standard Minkowski form,
but merely at each point x ∈ M separately. There does not exist a coordinate
system and an associated coordinate frame such that g = ηij dxi dxj globally
unless the metric is the Minkowski metric itself.13
Exercise. Show that the non-degeneracy of a metric, see (9.34), can be charac-
terized alternatively as in (A.25c).

Metrics raise and lower indices; e.g., a vector field $v = v^i \partial_i$ becomes a covector
field with components
$$ v_i = g_{ij} v^j . $$
13
If we arrange for $g = g_{ij} \, dx^i dx^j$ to fulfill $g_{ij}|_x = \eta_{ij}$ at a point x, then $g_{ij}|_y \neq \eta_{ij}$ for almost
all points y in a neighborhood of x. The diagonal form cannot be achieved on open sets.


Exercise. Show that the inverse metric $g^{ij}$ is obtained by raising the two indices
of the metric $g_{ij}$.

Let us discuss a prominent Riemannian metric, the standard metric on the unit
sphere $S^2$. In a first step we take the standard metric14
$$ dx^2 + dy^2 $$
on $\mathbb{R}^2$ and express it in polar coordinates, i.e., x = ρ cos ϕ, y = ρ sin ϕ. Strictly speaking, the polar coordinates form a coordinate system (chart) on $\mathbb{R}^2 \setminus \{0\}$
and not on the entire space $\mathbb{R}^2$. Therefore, the coordinate basis $\{\partial_\rho, \partial_\varphi\}$ and
the coframe {dρ, dϕ} are well-defined15 on $\mathbb{R}^2 \setminus \{0\}$ but not on the entire space
$\mathbb{R}^2$. Since

dx = (dρ) cos ϕ − ρ (sin ϕ)dϕ , dy = (dρ) sin ϕ + ρ (cos ϕ)dϕ ,

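we obtain
$$ dx^2 + dy^2 = \big( (\cos\varphi) \, d\rho - \rho \, (\sin\varphi) \, d\varphi \big)^2 + \big( (\sin\varphi) \, d\rho + \rho \, (\cos\varphi) \, d\varphi \big)^2 = d\rho^2 + \rho^2 \, d\varphi^2 , \tag{9.35} $$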

which thus is the standard Euclidean metric in polar coordinates.


Exercise. Use dρ2 + ρ2 dϕ2 to compute the length of the vector field ∂ϕ in
dependence on the position. What is the angle between ∂ρ and ∂ϕ ? What is
the length of the vector field ρ−1 ∂ϕ ? And what about the vector field ρ∂ρ + ∂ϕ ?

In the second step we take the standard metric

dx2 + dy 2 + dz 2

on R3 and express it in spherical coordinates, i.e.,

x = r sin ϑ cos ϕ , y = r sin ϑ sin ϕ , z = r cos ϑ .

We first note that

x = ρ cos ϕ , y = ρ sin ϕ , z = r cos ϑ


14
Note that dx2 + dy 2 = δij dxi dxj with i, j = 1, 2.
15
There is one more subtlety connected with the periodic nature of ϕ; recall that ϕ = 0 and
ϕ = 2π are identified. We ignore this issue here.


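with ρ = r sin ϑ. Using dρ = sin ϑ dr + r cos ϑ dϑ we obtain, from (9.35),
$$ dx^2 + dy^2 = d\rho^2 + \rho^2 \, d\varphi^2 = \big( \sin\vartheta \, dr + r \cos\vartheta \, d\vartheta \big)^2 + r^2 \sin^2\vartheta \, d\varphi^2 , $$
and thus
$$ \begin{aligned} dx^2 + dy^2 + dz^2 &= \big( \sin\vartheta \, dr + r \cos\vartheta \, d\vartheta \big)^2 + r^2 \sin^2\vartheta \, d\varphi^2 + \big( \cos\vartheta \, dr - r \sin\vartheta \, d\vartheta \big)^2 \\ &= dr^2 + r^2 \, d\vartheta^2 + r^2 \sin^2\vartheta \, d\varphi^2 . \end{aligned} \tag{9.36} $$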

Setting r = const in (9.36) yields the standard metric on a sphere of radius r, i.e., $r^2 \, d\vartheta^2 + r^2 \sin^2\vartheta \, d\varphi^2$; setting r = 1 yields the standard metric on the unit
sphere $S^2$, i.e.,
$$ g_{S^2} = d\vartheta^2 + \sin^2\vartheta \, d\varphi^2 . \tag{9.37} $$
The metric (9.37) allows the computation of lengths and angles of vectors
on the sphere; these computations are purely intrinsic because the quantities
employed are intrinsic to the sphere (like the metric (9.37) and vectors); the
ambient vector space R3 does not enter these computations.
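
The result (9.36) can be cross-checked numerically by pulling back the Euclidean metric of R³ through the spherical coordinate map. The following sketch approximates the Jacobian by central differences; the step size, the sample point and the function names are assumptions made for this illustration.

```python
import math

def embed(r, theta, phi):
    """Spherical coordinates -> Cartesian coordinates in R^3."""
    return (r * math.sin(theta) * math.cos(phi),
            r * math.sin(theta) * math.sin(phi),
            r * math.cos(theta))

def pulled_back_metric(r, theta, phi, h=1e-6):
    """Components g_ij = sum_a (dX^a/dq^i)(dX^a/dq^j) with q = (r, theta, phi)."""
    q = [r, theta, phi]
    columns = []
    for k in range(3):
        plus, minus = list(q), list(q)
        plus[k] += h
        minus[k] -= h
        p, m = embed(*plus), embed(*minus)
        columns.append([(p[a] - m[a]) / (2 * h) for a in range(3)])
    return [[sum(columns[i][a] * columns[j][a] for a in range(3))
             for j in range(3)] for i in range(3)]

r0, theta0, phi0 = 2.0, 0.8, 1.1
g = pulled_back_metric(r0, theta0, phi0)
# Diagonal predicted by (9.36): (1, r^2, r^2 sin^2(theta)); off-diagonal entries vanish.
expected_diagonal = [1.0, r0 ** 2, (r0 * math.sin(theta0)) ** 2]
```

The ambient space enters only through `embed`; once the components are known, lengths and angles on the sphere follow from the lower-right 2×2 block alone.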

In the next section we turn our attention to the most basic example of Lo-
rentzian metrics: The Minkowski metric itself.

9.4 The Minkowski metric in accelerated coordinates

The simplest Lorentzian metric is the Minkowski metric

$$ \eta = -dt^2 + (dx^1)^2 + (dx^2)^2 + (dx^3)^2 . $$

Recall that $dt^2 = dt \otimes dt$, $(dx^1)^2 = dx^1 \otimes dx^1$, etc. In the inertial coordinates
$\{t, x^1, x^2, x^3\}$ the components of the Minkowski metric are constant.

Consider the metric


$$ -2 \, du \, dv + dy^2 + dz^2 \tag{9.38} $$
for (u, v, y, z) ∈ R4 . The assertion is that this metric is again the Minkowski
metric η, but in so-called ‘double null’ instead of inertial coordinates. The


proof is straightforward. The components of the metric are
$$ \begin{pmatrix} 0 & -1 & 0 & 0 \\ -1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} , $$

in particular the components are constant and do not depend on the posi-
tion. Therefore we are able to go over to orthonormal coordinates; we refer to
section A.4 for further details. We find that
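$$ u = \frac{1}{\sqrt{2}} \, (t + x) , \qquad v = \frac{1}{\sqrt{2}} \, (t - x) \tag{9.39} $$
yields the desired result. Indeed,
$$ \begin{aligned} -2 \, du \, dv + dy^2 + dz^2 &= -2 \, \frac{1}{\sqrt{2}} (dt + dx) \, \frac{1}{\sqrt{2}} (dt - dx) + dy^2 + dz^2 \\ &= -dt^2 + dx^2 + dy^2 + dz^2 . \end{aligned} $$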

The coordinates (u, v, y, z) are called ‘double null’ for the reason that the coordinate lines defined by u and v are null lines. Indeed,
$$ \eta(\partial_u, \partial_u) = 0 , \qquad \eta(\partial_v, \partial_v) = 0 , $$
as can be read off from (9.38). Hence, in double null coordinates, the light cone
of a point (in 2-dimensional Minkowski space) is given by the coordinate lines
through that point.

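From (9.39) we have
$$ \partial_t = \frac{\partial}{\partial t} = \frac{\partial u}{\partial t} \frac{\partial}{\partial u} + \frac{\partial v}{\partial t} \frac{\partial}{\partial v} = \frac{1}{\sqrt{2}} \big( \partial_u + \partial_v \big) . $$
Using (9.38) we find that
$$ \eta(\partial_t, \partial_t) = -2 \, \frac{1}{\sqrt{2}} \, \frac{1}{\sqrt{2}} = -1 , $$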
hence ∂t is a timelike unit vector, as expected.
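
The computations of this paragraph can be reproduced with a few lines of matrix arithmetic. The sketch below, with illustrative names that are not part of the text, pulls the components of (9.38) back through the Jacobian of (9.39) and evaluates the metric on the coordinate vectors.

```python
import math

# Components of (9.38) in the coordinates (u, v, y, z).
G_NULL = [[0, -1, 0, 0], [-1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

S = 1 / math.sqrt(2)
# Jacobian d(u, v, y, z)/d(t, x, y, z) of the transformation (9.39).
JAC = [[S, S, 0, 0], [S, -S, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

def pull_back(g, jac):
    """Components in the (t, x, y, z) coordinates: J^T g J."""
    n = len(g)
    return [[sum(jac[a][i] * g[a][b] * jac[b][j]
                 for a in range(n) for b in range(n))
             for j in range(n)] for i in range(n)]

def inner(g, v, w):
    """g(v, w) = g_ab v^a w^b."""
    n = len(g)
    return sum(g[a][b] * v[a] * w[b] for a in range(n) for b in range(n))

eta = pull_back(G_NULL, JAC)           # should be diag(-1, 1, 1, 1)
d_u = [1, 0, 0, 0]                     # coordinate vector d/du
d_t = [S, S, 0, 0]                     # d/dt = (d/du + d/dv) / sqrt(2)
norm_u = inner(G_NULL, d_u, d_u)       # null: 0
norm_t = inner(G_NULL, d_t, d_t)       # timelike unit vector: -1
```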

Let us turn our attention to the metric

$$ ds^2 = -x'^2 \, dt'^2 + dx'^2 + dy'^2 + dz'^2 , \tag{9.40} $$


where (t′ , x′ , y ′ , z ′ ) ∈ R4 with x′ > 0. The claim is that (9.40) is again the
Minkowski metric, represented in an accelerated frame of reference.

To establish the claim consider the coordinate transformation
$$ t' = \operatorname{artanh} \frac{t}{x} , \qquad x' = \sqrt{-t^2 + x^2} , \tag{9.41} $$
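and y′ = y, z′ = z. The coordinates {t, x, y, z} will then turn out to be standard
inertial coordinates. We have
$$ \Big( \frac{\partial x'^\mu}{\partial x^\sigma} \Big)_{\mu,\sigma} = \begin{pmatrix} \dfrac{\partial t'}{\partial t} & \dfrac{\partial t'}{\partial x} \\[2ex] \dfrac{\partial x'}{\partial t} & \dfrac{\partial x'}{\partial x} \end{pmatrix} = \begin{pmatrix} \dfrac{x}{-t^2 + x^2} & -\dfrac{t}{-t^2 + x^2} \\[2ex] \dfrac{-t}{\sqrt{-t^2 + x^2}} & \dfrac{x}{\sqrt{-t^2 + x^2}} \end{pmatrix} \tag{9.42} $$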

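and therefore
$$ \begin{aligned} ds^2 &= -(-t^2 + x^2) \Big( \frac{\partial t'}{\partial t} \, dt + \frac{\partial t'}{\partial x} \, dx \Big)^2 + \Big( \frac{\partial x'}{\partial t} \, dt + \frac{\partial x'}{\partial x} \, dx \Big)^2 + dy^2 + dz^2 \\ &= \Big( -(-t^2 + x^2) \Big( \frac{\partial t'}{\partial t} \Big)^2 + \Big( \frac{\partial x'}{\partial t} \Big)^2 \Big) \, dt^2 \\ &\quad + \Big( -2 (-t^2 + x^2) \, \frac{\partial t'}{\partial t} \frac{\partial t'}{\partial x} + 2 \, \frac{\partial x'}{\partial t} \frac{\partial x'}{\partial x} \Big) \, dt \, dx \\ &\quad + \Big( -(-t^2 + x^2) \Big( \frac{\partial t'}{\partial x} \Big)^2 + \Big( \frac{\partial x'}{\partial x} \Big)^2 \Big) \, dx^2 + dy^2 + dz^2 \\ &= -dt^2 + dx^2 + dy^2 + dz^2 , \end{aligned} $$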

hence (9.40) is indeed the Minkowski metric as claimed.
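
The algebra above is easily confirmed at sample events. The following sketch restricts to the (t, x) plane, takes the Jacobian (9.42) as given, and pulls the components of (9.40) back to inertial coordinates; the sample events and function names are assumptions of this illustration.

```python
import math

def rindler_from_inertial(t, x):
    """(9.41), restricted to the (t, x) plane; used here for x > |t|."""
    return math.atanh(t / x), math.sqrt(x * x - t * t)

def jacobian(t, x):
    """(9.42): rows (t', x'), columns (t, x)."""
    d = x * x - t * t
    r = math.sqrt(d)
    return [[x / d, -t / d], [-t / r, x / r]]

def pulled_back_metric(t, x):
    """Components of (9.40) expressed w.r.t. dt, dx: J^T diag(-x'^2, 1) J."""
    _, xp = rindler_from_inertial(t, x)
    g = [[-xp * xp, 0.0], [0.0, 1.0]]
    J = jacobian(t, x)
    return [[sum(J[a][i] * g[a][b] * J[b][j]
                 for a in range(2) for b in range(2))
             for j in range(2)] for i in range(2)]

samples = [(0.0, 1.0), (0.5, 2.0), (-1.2, 3.0)]
results = [pulled_back_metric(t, x) for t, x in samples]
```

Each entry of `results` should reproduce the components diag(−1, 1) of the Minkowski metric up to rounding.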

The coordinates (9.41) are called ‘Rindler coordinates’. These coordinates do not cover the entire Minkowski space but merely a part of it, namely, the
Cartesian product of the set of events that are spacelike separated from the
origin in the ⟨t, x⟩ plane times the ⟨y, z⟩ plane. A ‘Rindler observer’ is an
observer with world line x′ = const, y′ = const, z′ = const; note that the
tangent $\partial_{t'}$ is timelike as required. In inertial coordinates, a Rindler observer
is represented by a hyperbolic world line, hence it is a uniformly accelerated
observer.
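
To make the hyperbolic world line explicit, one can invert (9.41) for x > |t|, which gives t = x′ sinh t′ and x = x′ cosh t′. The sketch below samples the world line x′ = const of a Rindler observer and checks that −t² + x² stays constant along it; the value of x′ and the sampling are arbitrary choices made for the example.

```python
import math

def inertial_from_rindler(tp, xp):
    """Inverse of (9.41) on the region x > |t|."""
    return xp * math.sinh(tp), xp * math.cosh(tp)

def rindler_from_inertial(t, x):
    """(9.41), restricted to the (t, x) plane."""
    return math.atanh(t / x), math.sqrt(x * x - t * t)

XP0 = 2.0                                        # the observer sits at x' = 2
times = [0.25 * k for k in range(-8, 9)]         # sampled values of t'
worldline = [inertial_from_rindler(tp, XP0) for tp in times]
invariants = [x * x - t * t for t, x in worldline]   # all equal to x'^2 = 4
roundtrip = [rindler_from_inertial(t, x) for t, x in worldline]
```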

The Minkowski metric in Rindler coordinates is useful in several contexts. Due to its association with uniformly accelerated observers there is a close connection with the equivalence principle. E.g., it is possible (and instructive) to
reinvestigate the results of section 9.2 with the aid of (9.40).


An important lesson to learn from the considerations of this section is that it might be difficult to tell whether a metric is truly different from a metric that is
already well-known or just a well-known metric ‘in disguise’, i.e., represented in
unusual coordinates. At least in the case of the Minkowski metric it turns out
that there is a good criterion, the vanishing or non-vanishing of the Riemann
curvature tensor associated with the metric.16 In anticipation of things to
come, we conclude this section with the vague remark that, in general relativity,
gravity is represented by metrics that are different from the Minkowski metric.

9.5 Geodesics

Consider two points p1 and p2 on a manifold with a Riemannian metric. What is the distance between p1 and p2?

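Well, if c is a (differentiable) curve connecting the two points, then the length
of c is
$$ \int ds = \int_{\lambda_1}^{\lambda_2} g(w, w)^{1/2} \, d\lambda = \int_{\lambda_1}^{\lambda_2} \Big( g_{ij}\big(x^k(\lambda)\big) \, \frac{dx^i}{d\lambda} \frac{dx^j}{d\lambda} \Big)^{1/2} d\lambda , \tag{9.43} $$
where
$$ w^i(\lambda) = \frac{dx^i(\lambda)}{d\lambda} $$
is the field of tangent vectors along the curve c, which is parametrized17 as
$\lambda \mapsto x^i(\lambda)$.18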
Example. We use (9.37) in (9.43) to compute the circumference of a circle of
latitude. This circle is represented by the curve ϑ = const parametrized by
ϕ ∈ [0, 2π). The tangent vector field is $\partial_\varphi$; the squared length of the tangent vectors is
$$ g_{S^2}(\partial_\varphi, \partial_\varphi) = \sin^2\vartheta . $$
The length of the curve is computed through the path integral, i.e.,
$$ s = \int ds = \int_0^{2\pi} \sqrt{g_{S^2}(\partial_\varphi, \partial_\varphi)} \; d\varphi = \int_0^{2\pi} \sqrt{\sin^2\vartheta} \; d\varphi = 2\pi \sin\vartheta . $$
16
The metric is Minkowski in the former case; in the latter it is not.
17
The choice of parametrization is irrelevant. Simple exercise: Show the invariance of (9.43)
under reparametrizations.
18
For simplicity we assume that the entire curve is contained in the domain of one chart
with coordinates (x1 , . . . , xn ). If two (or more) charts are necessary to cover the curve we
divide the curve into two (or more) parts whose lengths we compute separately by (9.43);
adding up we obtain the total length.


In connection with the remarks of section 9.3 we note that our computation is
based on purely intrinsic quantities of the sphere (namely the metric and the
tangent vector field of the curve); the ambient vector space did not enter our
considerations.
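
For curves where the integral (9.43) is less convenient, it can be approximated numerically, again using only the metric (9.37). The following is a minimal sketch based on the midpoint rule and a finite-difference tangent; the discretization parameters and the sample curves are assumptions of this example.

```python
import math

def sphere_metric(theta, phi):
    """Components of (9.37) w.r.t. the coordinates (theta, phi)."""
    return [[1.0, 0.0], [0.0, math.sin(theta) ** 2]]

def curve_length(curve, lam1, lam2, n=2000, h=1e-6):
    """Midpoint-rule approximation of (9.43) for curve(lam) -> (theta, phi)."""
    step = (lam2 - lam1) / n
    total = 0.0
    for k in range(n):
        lam = lam1 + (k + 0.5) * step
        p_plus, p_minus = curve(lam + h), curve(lam - h)
        w = [(p_plus[i] - p_minus[i]) / (2 * h) for i in range(2)]  # tangent
        g = sphere_metric(*curve(lam))
        speed2 = sum(g[i][j] * w[i] * w[j] for i in range(2) for j in range(2))
        total += math.sqrt(speed2) * step
    return total

THETA0 = math.pi / 3
# Circle of latitude theta = THETA0; expected length 2 pi sin(THETA0).
circumference = curve_length(lambda lam: (THETA0, lam), 0.0, 2 * math.pi)
# Piece of a meridian phi = 0 from theta = 0.1 to pi - 0.1; expected length pi - 0.2.
meridian = curve_length(lambda lam: (lam, 0.0), 0.1, math.pi - 0.1)
```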

The distance between two points p1 and p2 on a Riemannian manifold we take to be the length of the shortest curve connecting p1 and p2 (assuming that
the lower bound in length is actually attained). Note that this curve need not
be unique; e.g., there exist infinitely many curves of equal (shortest) length
between the north and the south pole of a sphere. In any case, the shortest
curve is a (local) extremum of length; we call such a curve a geodesic. The
notion of a geodesic is the generalization of a straight line in $\mathbb{R}^n$.

Let us turn to spacetimes, i.e., manifolds with Lorentzian metrics (where the
signature is (− + ++) as usual). A curve is said to be timelike if its tangent v
is everywhere timelike, i.e., g(v, v) < 0; it is null if its tangent
is everywhere null; it is spacelike if its tangent is everywhere spacelike;
cf. definition 4.2.

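Every spacelike curve has a length defined by its ‘arc length’ (9.43). The ‘arc
length’ of timelike curves, on the other hand, represents proper time. Consider
a timelike curve c,
$$ \lambda \mapsto x^\mu(\lambda) , $$
which connects two events p1 and p2 that are represented by $x^\mu(\lambda_1)$ and $x^\mu(\lambda_2)$,
respectively. Let
$$ w^\mu(\lambda) = \frac{dx^\mu(\lambda)}{d\lambda} $$
denote the tangent vector field along c. The proper time that passes along this
curve is
$$ s = \int_{\lambda_1}^{\lambda_2} \big( -g(w, w) \big)^{1/2} \, d\lambda = \int_{\lambda_1}^{\lambda_2} \Big( -g_{\mu\nu}\big(x^\sigma(\lambda)\big) \, \frac{dx^\mu}{d\lambda} \frac{dx^\nu}{d\lambda} \Big)^{1/2} d\lambda . \tag{9.44} $$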

This is the straightforward generalization of equation (4.9) that gives proper time along a curve in Minkowski space.19

A curve that extremizes (9.44) we call a (timelike) geodesic. To derive the con-
dition on a curve of being a geodesic, i.e., the geodesic equation, we regard (9.44)
19
As in (9.43) we assume, for simplicity, that the entire curve can be represented by one set
of local coordinates.


as a Lagrangian action and use variational analysis. Let
$$ L(x^\mu, \dot x^\mu) = \big( -g_{\mu\nu}(x^\sigma) \, \dot x^\mu \dot x^\nu \big)^{1/2} , \tag{9.45a} $$

where in this context (and in this context alone) we make use of the abbreviation $\dot x^\mu = dx^\mu / d\lambda$. (In general, we reserve the dot notation for differentiation
w.r.t. proper time.20) Varying the action
$$ s = \int L \, d\lambda , \tag{9.45b} $$

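where we keep the endpoints fixed, we obtain the Euler-Lagrange equations
$$ \frac{\partial L}{\partial x^\mu} - \frac{d}{d\lambda} \frac{\partial L}{\partial \dot x^\mu} = 0 . \tag{9.46} $$
For the first term we find
$$ \frac{\partial L}{\partial x^\mu} = -\frac{1}{2} \, \frac{1}{L} \, g_{\sigma\lambda,\mu} \, \dot x^\sigma \dot x^\lambda , $$
the second term is
$$ \frac{\partial L}{\partial \dot x^\mu} = -\frac{1}{L} \, g_{\mu\lambda} \, \dot x^\lambda , $$
hence the Euler-Lagrange equations read
$$ -\frac{1}{2} \, \frac{1}{L} \, g_{\sigma\lambda,\mu} \, \dot x^\sigma \dot x^\lambda + \frac{d}{d\lambda} \Big( \frac{1}{L} \, g_{\mu\lambda} \, \dot x^\lambda \Big) = 0 . \tag{9.47} $$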
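We choose to use a parametrization of the curve w.r.t. proper time (which is
an ‘affine parametrization’, cf. the remark on page 177), whence
$$ \frac{d}{ds} = \frac{1}{L} \, \frac{d}{d\lambda} , $$
see (9.45). Accordingly, when we multiply (9.47) with $L^{-1}$ we obtain
$$ -\frac{1}{2} \, g_{\sigma\lambda,\mu} \, \frac{dx^\sigma}{ds} \frac{dx^\lambda}{ds} + \frac{d}{ds} \Big( g_{\mu\lambda} \, \frac{dx^\lambda}{ds} \Big) = 0 . $$
Since $g_{\mu\lambda} = g_{\mu\lambda}(x^\kappa)$, this further results in
$$ g_{\mu\lambda} \, \frac{d^2 x^\lambda}{ds^2} + g_{\mu\lambda,\sigma} \, \frac{dx^\sigma}{ds} \frac{dx^\lambda}{ds} - \frac{1}{2} \, g_{\sigma\lambda,\mu} \, \frac{dx^\sigma}{ds} \frac{dx^\lambda}{ds} = 0 . $$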
20
More generally, we use the dot in connection with affine parameters, see below.


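Let us reintroduce the dot notation, but this time according to the standard
convention that the overdot refers to differentiation w.r.t. proper time. The
equation then becomes
$$ g_{\mu\lambda} \, \ddot x^\lambda + g_{\mu\lambda,\sigma} \, \dot x^\sigma \dot x^\lambda - \frac{1}{2} \, g_{\sigma\lambda,\mu} \, \dot x^\sigma \dot x^\lambda = 0 . $$

Using the fact that $\dot x^\lambda \dot x^\sigma$ is symmetric in λ and σ we write the second term as
$$ g_{\mu(\lambda,\sigma)} \, \dot x^\sigma \dot x^\lambda . $$

Summarizing,
$$ g_{\mu\lambda} \, \ddot x^\lambda + \frac{1}{2} \big( g_{\mu\lambda,\sigma} + g_{\mu\sigma,\lambda} - g_{\lambda\sigma,\mu} \big) \, \dot x^\sigma \dot x^\lambda = 0 \tag{9.48} $$
corresponds to the Euler-Lagrange equations. This equation is the condition
that the curve extremizes ‘arc length’ (proper time), i.e., the geodesic equation.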

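Define
$$ \Gamma_{\mu\nu\sigma} = \frac{1}{2} \big( g_{\mu\nu,\sigma} + g_{\mu\sigma,\nu} - g_{\nu\sigma,\mu} \big) . $$
As an aide-mémoire one can use the ‘curly braces’ notation
$$ A_{\{ijk\}} := A_{ijk} - A_{jki} + A_{kij} ; $$
the indices run over the cyclic permutations of (ijk), where positive and negative signs alternate.21 We thereby obtain
$$ \Gamma_{\mu\nu\sigma} = \frac{1}{2} \, g_{\{\mu\nu,\sigma\}} = \frac{1}{2} \big( g_{\mu\nu,\sigma} - g_{\nu\sigma,\mu} + g_{\sigma\mu,\nu} \big) . $$
Expressed in terms of $\Gamma_{\mu\nu\sigma}$, equation (9.48) becomes
$$ g_{\mu\nu} \, \ddot x^\nu + \Gamma_{\mu\sigma\lambda} \, \dot x^\sigma \dot x^\lambda = 0 . \tag{9.48$'$} $$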

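We define the Christoffel symbols as22
$$ \Gamma^\mu{}_{\nu\sigma} = g^{\mu\lambda} \, \Gamma_{\lambda\nu\sigma} = \frac{1}{2} \, g^{\mu\lambda} \big( g_{\lambda\nu,\sigma} + g_{\lambda\sigma,\nu} - g_{\nu\sigma,\lambda} \big) . \tag{9.49} $$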
21
A note of caution: This notation is not particularly common in the literature.
22
The Christoffel symbols (9.49) represent the so-called Levi-Civita connection associated
with the metric g; see also footnote 26.


On multiplying (9.48′) with the inverse of the metric, it becomes the geodesic
equation
$$ \ddot x^\mu + \Gamma^\mu{}_{\sigma\lambda} \, \dot x^\sigma \dot x^\lambda = 0 . \tag{9.50} $$

Let us summarize what we succeeded in showing: Let c be a timelike curve that is parametrized w.r.t. proper time (or, more generally, affinely parametrized23).
Then c is a geodesic if and only if it satisfies the geodesic equation (9.50).

The case of spacelike geodesics is completely analogous. A spacelike curve that is affinely parametrized (which in the spacelike case means that it is parametrized
w.r.t. arc length24) is a geodesic if and only if (9.50) holds.

Finally, consider null geodesics. Since the tangent field is a null vector field,
the ‘arc length’ is zero. A straightforward analog of the extremization con-
siderations is thus not available. However, we may simply resort to (9.50).
A (parametrized) null curve is called an (affinely parametrized) null geodesic
if (9.50) is satisfied.
Remark. Let us explain the terminology ‘affine parametrization’ for timelike
and spacelike geodesics. Suppose that a timelike geodesic is parametrized
w.r.t. proper time s. Let $t = \lambda_1 + \lambda_2 s$ with $\lambda_1, \lambda_2 \in \mathbb{R}$ and $\lambda_2 \neq 0$. It is common to
refer to parameters t of this kind as affine parameters, since the transformation
$s \mapsto t$ is obviously an affine transformation. The importance of affine parameters lies in the fact that the geodesic equation (9.50) is invariant under affine
reparametrizations.
Example. Using an inertial frame of reference in Minkowski space, the Christoffel symbols vanish, i.e.,
$$ \Gamma^\mu{}_{\nu\sigma} = 0 . $$

Therefore the geodesic equation becomes $\ddot x^\mu = 0$, which yields the straight
lines.

From (9.49) it is immediate that $\Gamma^\mu{}_{\nu\sigma}$ is symmetric in (νσ).25 It is important to note that the Christoffel symbols do not form a tensor field. We will elaborate
on the transformation of the Christoffel symbols shortly; but first, another
example.
23
Proper time is a particular choice of ‘affine parameter’. We refer to the remark on page 177.
24
Or a multiple thereof; cf. footnote 23 and the remark on page 177.
25
Note, however, that this symmetry requires the choice of a coordinate frame.


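Example. Consider the (Riemannian) manifold $S^2$ with the standard metric (9.37). Set $x^1 = \vartheta$ and $x^2 = \varphi$; then
$$ \Gamma^i{}_{jk} = \frac{1}{2} \, g^{il} \big( g_{lj,k} + g_{lk,j} - g_{jk,l} \big) $$
leads to
$$ \begin{aligned} \Gamma^1{}_{jk} &= \frac{1}{2} \, g^{1l} \big( g_{lj,k} + g_{lk,j} - g_{jk,l} \big) = \frac{1}{2} \, g^{11} \big( g_{1j,k} + g_{1k,j} - g_{jk,1} \big) , \\ \Gamma^2{}_{jk} &= \frac{1}{2} \, g^{2l} \big( g_{lj,k} + g_{lk,j} - g_{jk,l} \big) = \frac{1}{2} \, g^{22} \big( g_{2j,k} + g_{2k,j} \big) , \end{aligned} $$
since the metric is diagonal and independent of the second coordinate. We find
$$ \begin{aligned} \Gamma^1{}_{11} &= \frac{1}{2} \, g^{11} \big( g_{11,1} + g_{11,1} - g_{11,1} \big) = 0 , \\ \Gamma^1{}_{12} &= \frac{1}{2} \, g^{11} \big( g_{11,2} + g_{12,1} - g_{12,1} \big) = 0 , \\ \Gamma^1{}_{22} &= \frac{1}{2} \, g^{11} \big( g_{12,2} + g_{12,2} - g_{22,1} \big) = -\sin\vartheta \cos\vartheta , \\ \Gamma^2{}_{11} &= \frac{1}{2} \, g^{22} \big( g_{21,1} + g_{21,1} \big) = 0 , \\ \Gamma^2{}_{12} &= \frac{1}{2} \, g^{22} \big( g_{21,2} + g_{22,1} \big) = \frac{1}{\sin\vartheta} \cos\vartheta , \\ \Gamma^2{}_{22} &= \frac{1}{2} \, g^{22} \big( g_{22,2} + g_{22,2} \big) = 0 . \end{aligned} $$
The geodesic equation (9.50) thus reads
$$ \begin{aligned} &\ddot\vartheta - \sin\vartheta \cos\vartheta \, \dot\varphi \, \dot\varphi = 0 , \\ &\ddot\varphi + 2 \, \frac{\cos\vartheta}{\sin\vartheta} \, \dot\vartheta \, \dot\varphi = 0 . \end{aligned} \tag{9.51} $$
Note the factor of 2 in the second equation which is due to the fact that
$\Gamma^2{}_{21} = \Gamma^2{}_{12}$. Simple solutions of (9.51) are the circles ϕ = const and ϑ = π/2;
for a simplification of the ODEs (9.51) we refer to the exercise course.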
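
The Christoffel symbols of this example can be cross-checked by evaluating (9.49) numerically. The sketch below approximates the derivatives of the metric by central differences; the step size, the sample point and the function names are assumptions of this illustration. Indices 0 and 1 in the code stand for ϑ and ϕ.

```python
import math

def metric(x):
    """Components of (9.37); x = (theta, phi)."""
    return [[1.0, 0.0], [0.0, math.sin(x[0]) ** 2]]

def inverse_metric(x):
    return [[1.0, 0.0], [0.0, 1.0 / math.sin(x[0]) ** 2]]

def metric_derivative(x, k, h=1e-5):
    """Central-difference approximation of g_{ij,k}, returned as a matrix in (i, j)."""
    xp, xm = list(x), list(x)
    xp[k] += h
    xm[k] -= h
    gp, gm = metric(xp), metric(xm)
    return [[(gp[i][j] - gm[i][j]) / (2 * h) for j in range(2)] for i in range(2)]

def christoffel(x):
    """Gamma[i][j][k] = (1/2) g^{il} (g_{lj,k} + g_{lk,j} - g_{jk,l}), cf. (9.49)."""
    dg = [metric_derivative(x, k) for k in range(2)]   # dg[k][i][j] = g_{ij,k}
    ginv = inverse_metric(x)
    return [[[0.5 * sum(ginv[i][l] * (dg[k][l][j] + dg[j][l][k] - dg[l][j][k])
                        for l in range(2))
              for k in range(2)] for j in range(2)] for i in range(2)]

THETA0 = 1.0
Gamma = christoffel([THETA0, 0.3])
# Closed forms from the example above:
gamma_theta_phiphi = -math.sin(THETA0) * math.cos(THETA0)
gamma_phi_thetaphi = math.cos(THETA0) / math.sin(THETA0)
```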
Remark. By equation (9.49), the Christoffel symbols can be computed directly
from the metric. However, in many cases this is rather time-consuming. An
alternative way to obtain the Christoffel symbols is to proceed in analogy
with (9.45) et seq., i.e., to derive the Euler-Lagrange equations of the Lagrangian
$$ L(x^\mu, \dot x^\mu) = \big( \pm g_{\mu\nu}(x^\sigma) \, \dot x^\mu \dot x^\nu \big)^{1/2} . $$


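(Note that L ≡ ±1 in the affine parametrization w.r.t. proper time or arc length.) As an example consider again the standard metric (9.37) on $S^2$ which
yields
$$ L = \big( \dot\vartheta^2 + \sin^2\vartheta \, \dot\varphi^2 \big)^{1/2} . $$
The Euler-Lagrange equations are
$$ \frac{\partial L}{\partial \vartheta} - \frac{d}{ds} \frac{\partial L}{\partial \dot\vartheta} = 0 , \qquad \frac{\partial L}{\partial \varphi} - \frac{d}{ds} \frac{\partial L}{\partial \dot\varphi} = 0 . $$

Since L ≡ 1 we may replace L by $L^2$ in these equations to simplify matters, i.e.,
$$ \frac{\partial L^2}{\partial \vartheta} - \frac{d}{ds} \frac{\partial L^2}{\partial \dot\vartheta} = 0 , \qquad \frac{\partial L^2}{\partial \varphi} - \frac{d}{ds} \frac{\partial L^2}{\partial \dot\varphi} = 0 . $$
We then obtain
$$ 2 \sin\vartheta \cos\vartheta \, \dot\varphi^2 - \frac{d}{ds} \big( 2 \dot\vartheta \big) = 0 , \qquad -\frac{d}{ds} \big( 2 \sin^2\vartheta \, \dot\varphi \big) = 0 , $$
which results in
$$ \ddot\vartheta - \sin\vartheta \cos\vartheta \, \dot\varphi^2 = 0 , \qquad \ddot\varphi + 2 \, \frac{\cos\vartheta}{\sin\vartheta} \, \dot\vartheta \dot\varphi = 0 . $$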
From these geodesic equations, the Christoffel symbols can be read off.
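
The Euler-Lagrange form of the equations exhibits two quantities that are constant along every solution: L² = ϑ̇² + sin²ϑ ϕ̇² and, by the second equation, sin²ϑ ϕ̇. The following sketch integrates the geodesic equations with a classical Runge-Kutta scheme and monitors both quantities. The integrator, the step size and the initial data are choices made for this illustration only.

```python
import math

def rhs(state):
    """First-order form of the geodesic equations on S^2."""
    theta, phi, dtheta, dphi = state
    return (dtheta,
            dphi,
            math.sin(theta) * math.cos(theta) * dphi ** 2,
            -2.0 * math.cos(theta) / math.sin(theta) * dtheta * dphi)

def rk4_step(state, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    def shifted(s, k, f):
        return tuple(si + f * ki for si, ki in zip(s, k))
    k1 = rhs(state)
    k2 = rhs(shifted(state, k1, h / 2))
    k3 = rhs(shifted(state, k2, h / 2))
    k4 = rhs(shifted(state, k3, h))
    return tuple(s + h / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def speed_squared(state):
    theta, _, dtheta, dphi = state
    return dtheta ** 2 + math.sin(theta) ** 2 * dphi ** 2

def conserved_momentum(state):
    theta, _, _, dphi = state
    return math.sin(theta) ** 2 * dphi

# Unit-speed initial data at theta = 1, pointing obliquely.
start = (1.0, 0.0, 0.6, 0.8 / math.sin(1.0))
state = start
for _ in range(2000):          # integrate up to arc length s = 2
    state = rk4_step(state, 1e-3)
```

That the speed stays equal to one confirms that the solution remains parametrized w.r.t. arc length.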

Christoffel symbols and covariant derivative

By (9.49), the Christoffel symbols are a collection of numbers which we write as an array with three indices. However, the Christoffel symbols do not form a
tensor field. This becomes explicit when we consider the transformation of the
Christoffel symbols under a change of coordinates.

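Consider the geodesic equation (9.50). Being the tangent vector field to a curve,
$\dot x^\mu$ behaves like any other vector field under a change of coordinates, i.e.,
$$ \dot{\bar x}^\mu = \frac{d\bar x^\mu}{ds} = \frac{\partial \bar x^\mu}{\partial x^\nu} \frac{dx^\nu}{ds} = \frac{\partial \bar x^\mu}{\partial x^\nu} \, \dot x^\nu , \tag{9.52a} $$
cf. (9.24). The expression
$$ \ddot x^\mu = \frac{d^2 x^\mu}{ds^2} , $$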


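on the other hand, is not a vector. We simply note that
$$ \ddot{\bar x}^\mu = \frac{d}{ds} \frac{d\bar x^\mu}{ds} = \frac{d}{ds} \Big( \frac{\partial \bar x^\mu}{\partial x^\lambda} \, \dot x^\lambda \Big) = \frac{\partial^2 \bar x^\mu}{\partial x^\lambda \partial x^\sigma} \, \dot x^\sigma \dot x^\lambda + \frac{\partial \bar x^\mu}{\partial x^\lambda} \, \ddot x^\lambda \tag{9.52b} $$
under the change of coordinates $x^\mu \mapsto \bar x^\mu = \bar x^\mu(x^\sigma)$.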
Exercise. In chapter 7 we defined the acceleration as aµ = ẍµ and treated it
as a four-vector. Why is this valid in the context of special relativity? Argue
with the aid of (9.52b).

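Consider a geodesic c. Expressed w.r.t. the coordinate system $\{x^\mu\}$ this means
that c is represented by the (affinely parametrized) curve $s \mapsto x^\mu(s)$ that satisfies the geodesic equation
$$ \ddot x^\mu + \Gamma^\mu{}_{\sigma\lambda}(x^\kappa) \, \dot x^\sigma \dot x^\lambda = 0 , \tag{9.53a} $$
cf. (9.50). Equivalently, we are able to use a different coordinate system $\{\bar x^\mu\}$
and represent c as $s \mapsto \bar x^\mu(s)$ with
$$ \ddot{\bar x}^\mu + \bar\Gamma^\mu{}_{\sigma\lambda}(\bar x^\kappa) \, \dot{\bar x}^\sigma \dot{\bar x}^\lambda = 0 . \tag{9.53b} $$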

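Insertion of (9.52) yields
$$ \frac{\partial \bar x^\mu}{\partial x^\lambda} \, \ddot x^\lambda + \frac{\partial^2 \bar x^\mu}{\partial x^\lambda \partial x^\sigma} \, \dot x^\sigma \dot x^\lambda + \bar\Gamma^\mu{}_{\sigma\lambda}(\bar x^\kappa) \, \frac{\partial \bar x^\sigma}{\partial x^{\sigma'}} \frac{\partial \bar x^\lambda}{\partial x^{\lambda'}} \, \dot x^{\sigma'} \dot x^{\lambda'} = 0 , $$
or, equivalently, on multiplication with $\partial x^{\mu'} / \partial \bar x^\mu$,
$$ \ddot x^{\mu'} + \underbrace{\Big( \frac{\partial x^{\mu'}}{\partial \bar x^\mu} \frac{\partial^2 \bar x^\mu}{\partial x^{\sigma'} \partial x^{\lambda'}} + \bar\Gamma^\mu{}_{\sigma\lambda}(\bar x^\kappa) \, \frac{\partial x^{\mu'}}{\partial \bar x^\mu} \frac{\partial \bar x^\sigma}{\partial x^{\sigma'}} \frac{\partial \bar x^\lambda}{\partial x^{\lambda'}} \Big)}_{\Gamma^{\mu'}{}_{\sigma'\lambda'}} \, \dot x^{\sigma'} \dot x^{\lambda'} = 0 . \tag{9.54} $$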

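Comparison with (9.53a) thus entails that
$$ \Gamma^{\mu'}{}_{\sigma'\lambda'} = \frac{\partial x^{\mu'}}{\partial \bar x^\mu} \frac{\partial^2 \bar x^\mu}{\partial x^{\sigma'} \partial x^{\lambda'}} + \bar\Gamma^\mu{}_{\sigma\lambda} \, \frac{\partial x^{\mu'}}{\partial \bar x^\mu} \frac{\partial \bar x^\sigma}{\partial x^{\sigma'}} \frac{\partial \bar x^\lambda}{\partial x^{\lambda'}} , \tag{9.55a} $$
or, equivalently,
$$ \bar\Gamma^\mu{}_{\sigma\lambda}(\bar x^\kappa) = \frac{\partial \bar x^\mu}{\partial x^\nu} \frac{\partial^2 x^\nu}{\partial \bar x^\sigma \partial \bar x^\lambda} + \Gamma^{\mu'}{}_{\sigma'\lambda'}(x^\kappa) \, \frac{\partial \bar x^\mu}{\partial x^{\mu'}} \frac{\partial x^{\sigma'}}{\partial \bar x^\sigma} \frac{\partial x^{\lambda'}}{\partial \bar x^\lambda} . \tag{9.55b} $$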

From (9.55) we see that the Christoffel symbols are merely an array of real
numbers and not a tensor; the first term in the transformation formula is
‘untensorial’, cf. (9.31).
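
The role of the ‘untensorial’ term is nicely illustrated by the Euclidean plane. In Cartesian coordinates the Christoffel symbols vanish, so by (9.55b) the Christoffel symbols in polar coordinates consist of the first term alone. The sketch below evaluates this term; the choice of example and the names used are assumptions made for the illustration. Indices 0 and 1 of the barred system stand for ρ and ϕ.

```python
import math

def polar_christoffel(rho, phi):
    """Gammabar[mu][sigma][lam] = (d xbar^mu/d x^nu) (d^2 x^nu / d xbar^sigma d xbar^lam)."""
    c, s = math.cos(phi), math.sin(phi)
    # d xbar^mu / d x^nu with xbar = (rho, phi) and x = (x, y)
    dxbar_dx = [[c, s], [-s / rho, c / rho]]
    # Second derivatives of x = rho cos(phi) and y = rho sin(phi) w.r.t. (rho, phi)
    d2x = [[[0.0, -s], [-s, -rho * c]],
           [[0.0, c], [c, -rho * s]]]
    return [[[sum(dxbar_dx[mu][nu] * d2x[nu][sg][lm] for nu in range(2))
              for lm in range(2)] for sg in range(2)] for mu in range(2)]

RHO0 = 2.0
G = polar_christoffel(RHO0, 0.7)
```

The only non-vanishing components are the ρ component with two lower ϕ indices, which equals −ρ, and the ϕ component with mixed lower indices, which equals 1/ρ. A tensor that vanishes in one coordinate system vanishes in all of them, which these symbols evidently do not.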


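Exercise. Show the equivalence of (9.55a) and (9.55b). This amounts to proving
that
$$ -\frac{\partial^2 \bar x^\mu}{\partial x^\nu \partial x^\sigma} \frac{\partial x^\nu}{\partial \bar x^\lambda} \frac{\partial x^\sigma}{\partial \bar x^\rho} = \frac{\partial \bar x^\mu}{\partial x^\kappa} \frac{\partial^2 x^\kappa}{\partial \bar x^\lambda \partial \bar x^\rho} . $$
Use that
$$ \frac{\partial \bar x^\mu}{\partial x^\kappa} \frac{\partial x^\kappa}{\partial \bar x^\nu} = \delta^\mu{}_\nu , $$
and differentiate w.r.t., say, $x^\pi$.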

Finally, let us briefly introduce the notion of covariant derivative.26 Let $f = f(x^k)$ be a function; then the covariant derivative $\nabla_i$ of f is defined to
coincide with the regular partial derivative, i.e.,27
$$ \nabla_i f(x^k) = \partial_i f(x^k) . \tag{9.56a} $$

Let v be a vector field, i.e., $v = v^i(x^k) \partial_i$. Then the covariant derivative of v is
a tensor field whose components we write as $\nabla_i v^j$; these are defined to be
$$ \nabla_i v^j = \partial_i v^j + \Gamma^j{}_{ik} v^k . \tag{9.56b} $$

Let a be a covector field, i.e., $a = a_i(x^k) \, dx^i$. Then the covariant derivative is
a tensor field whose components we write as $\nabla_i a_j$; these are defined to be
$$ \nabla_i a_j = \partial_i a_j - \Gamma^k{}_{ij} a_k . \tag{9.56c} $$

The covariant derivative of tensor fields generalizes the covariant derivative of vector and covector fields in the obvious way: Each contravariant (i.e., upper)
index behaves according to (9.56b), each covariant (i.e., lower) index behaves
according to (9.56c); e.g.,
$$ \nabla_i g_{jk} = \partial_i g_{jk} - \Gamma^l{}_{ij} g_{lk} - \Gamma^l{}_{ik} g_{jl} . $$

Let u be a vector field. The covariant derivative in the direction of u is
$$ \nabla_u = u^i \nabla_i . $$

It is immediate from (9.56) that the covariant derivative $\nabla_u$ preserves the rank
of the tensor: If, e.g., v is a vector, then so is $\nabla_u v$. (The components of $\nabla_u v$
are $\nabla_u v^i$.) If, e.g., T is a tensor of rank (p, q), then so is $\nabla_u T$.
26
For the abstract (and beautiful) definition of covariant derivatives in terms of connections
we refer to the lecture course ”Einführung in die Relativitätstheorie und Kosmologie II”.
27
We use Latin indices for a change to indicate the general nature of these considerations.


Several comments are in order. First, a comment on notation. It is customary to use the ‘comma notation’ for the partial derivative and the ‘semicolon
notation’ for the covariant derivative, i.e., (9.56) is written as28
$$ f_{;i} = f_{,i} , \qquad v^j{}_{;i} = v^j{}_{,i} + \Gamma^j{}_{ik} v^k , \qquad a_{j;i} = a_{j,i} - \Gamma^k{}_{ij} a_k . \tag{9.56$'$} $$

Second, a comment on consistency. In (9.27) we have seen that a covector field a acts on vector fields v to produce functions $a(v) = a_i v^i$. Let us compute
$\nabla_i \big( a(v) \big)$. On the one hand, since this is a function, we obtain from (9.56a)
$$ \nabla_i \big( a_j v^j \big) = \partial_i \big( a_j v^j \big) = a_{j,i} \, v^j + a_j \, v^j{}_{,i} . $$

On the other hand, we apply (9.56b) and (9.56c) and the Leibniz rule for
derivative operators, i.e.,
$$ \begin{aligned} \nabla_i \big( a_j v^j \big) &= \big( \nabla_i a_j \big) v^j + a_j \big( \nabla_i v^j \big) = \big( a_{j,i} - \Gamma^k{}_{ij} a_k \big) v^j + a_j \big( v^j{}_{,i} + \Gamma^j{}_{ik} v^k \big) \\ &= a_{j,i} \, v^j + a_j \, v^j{}_{,i} . \end{aligned} $$

Since this reproduces the original result, we find that the three definitions (9.56)
are consistent (i.e., consistent with the Leibniz rule).

Third, the proof of the crucial claim: We show the tensorial character of the
covariant derivative. Consider ∇i v j ; we prove that these are the components

28
The last equation looks nicer when expressed as aj;i = aj,i − Γkji ak ; we simply use the
symmetry of the Christoffel symbols in the two lower indices. Note, however, that this
symmetry requires the choice of a coordinate frame.


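of a tensor of rank (1, 1). We have
$$ \begin{aligned} \bar\nabla_i \bar v^j &= \bar\partial_i \bar v^j + \bar\Gamma^j{}_{ik} \bar v^k = \frac{\partial}{\partial \bar x^i} \Big( \frac{\partial \bar x^j}{\partial x^k} \, v^k \Big) + \bar\Gamma^j{}_{ik} \bar v^k \\ &= \frac{\partial x^l}{\partial \bar x^i} \frac{\partial}{\partial x^l} \Big( \frac{\partial \bar x^j}{\partial x^k} \, v^k \Big) + \bar\Gamma^j{}_{ik} \bar v^k \\ &= \frac{\partial x^l}{\partial \bar x^i} \frac{\partial^2 \bar x^j}{\partial x^k \partial x^l} \, v^k + \frac{\partial x^l}{\partial \bar x^i} \frac{\partial \bar x^j}{\partial x^k} \frac{\partial v^k}{\partial x^l} + \bar\Gamma^j{}_{ik} \bar v^k \\ &= \frac{\partial x^l}{\partial \bar x^i} \frac{\partial^2 \bar x^j}{\partial x^k \partial x^l} \, v^k + \frac{\partial x^l}{\partial \bar x^i} \frac{\partial \bar x^j}{\partial x^k} \, \partial_l v^k + \bar\Gamma^j{}_{ik} \, \frac{\partial \bar x^k}{\partial x^l} \, v^l \\ &= \frac{\partial x^l}{\partial \bar x^i} \frac{\partial^2 \bar x^j}{\partial x^k \partial x^l} \, v^k + \frac{\partial x^l}{\partial \bar x^i} \frac{\partial \bar x^j}{\partial x^k} \, \partial_l v^k \\ &\quad + \Big( \frac{\partial \bar x^j}{\partial x^{j'}} \frac{\partial x^{i'}}{\partial \bar x^i} \frac{\partial x^{k'}}{\partial \bar x^k} \, \Gamma^{j'}{}_{i'k'} - \frac{\partial \bar x^j}{\partial x^{j'}} \frac{\partial x^{i'}}{\partial \bar x^i} \frac{\partial x^{k'}}{\partial \bar x^k} \frac{\partial x^{j'}}{\partial \bar x^n} \frac{\partial^2 \bar x^n}{\partial x^{i'} \partial x^{k'}} \Big) \frac{\partial \bar x^k}{\partial x^l} \, v^l \\ &= \frac{\partial x^l}{\partial \bar x^i} \frac{\partial^2 \bar x^j}{\partial x^k \partial x^l} \, v^k + \frac{\partial x^l}{\partial \bar x^i} \frac{\partial \bar x^j}{\partial x^k} \, \partial_l v^k \\ &\quad + \frac{\partial \bar x^j}{\partial x^{j'}} \frac{\partial x^{i'}}{\partial \bar x^i} \, \delta^{k'}{}_l \, \Gamma^{j'}{}_{i'k'} \, v^l - \frac{\partial x^{i'}}{\partial \bar x^i} \, \delta^j{}_n \, \delta^{k'}{}_l \, \frac{\partial^2 \bar x^n}{\partial x^{i'} \partial x^{k'}} \, v^l \\ &= \frac{\partial x^l}{\partial \bar x^i} \frac{\partial^2 \bar x^j}{\partial x^k \partial x^l} \, v^k + \frac{\partial \bar x^j}{\partial x^k} \frac{\partial x^l}{\partial \bar x^i} \, \partial_l v^k + \frac{\partial \bar x^j}{\partial x^{j'}} \frac{\partial x^{i'}}{\partial \bar x^i} \, \Gamma^{j'}{}_{i'l} \, v^l - \frac{\partial x^{i'}}{\partial \bar x^i} \frac{\partial^2 \bar x^j}{\partial x^{i'} \partial x^l} \, v^l \\ &= \frac{\partial \bar x^j}{\partial x^k} \frac{\partial x^l}{\partial \bar x^i} \big( \partial_l v^k + \Gamma^k{}_{ln} v^n \big) \\ &= \frac{\partial \bar x^j}{\partial x^k} \frac{\partial x^l}{\partial \bar x^i} \, \nabla_l v^k , \end{aligned} $$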
which is what we intended to show. Here we have used the transformation of
vectors, see (9.24), and the transformation rule for the Christoffel symbols in
the form (9.55a).

The covariant derivative is intimately connected with the notion of parallel transport. Let c be a curve with tangent vector field u. A vector field v is
called parallelly transported along c if
$$ \nabla_u v = 0 $$
along this curve. If the curve is represented by $\lambda \mapsto x^\mu(\lambda)$ in local coordinates, then $u^\mu = dx^\mu / d\lambda$ and $\nabla_u v$ reads
$$ u^\mu\big(x^\kappa(\lambda)\big) \, \nabla_\mu v^\nu\big(x^\kappa(\lambda)\big) , $$
when written out explicitly.


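Let c be a timelike curve and $s \mapsto x^\mu(s)$ an affine parametrization (i.e., s is proper time or a multiple thereof). The tangent vector field is
$$ u^\mu(s) = \dot x^\mu(s) = \frac{dx^\mu(s)}{ds} . $$
The covariant derivative of this vector along the curve c itself is
$$ \begin{aligned} \nabla_u u^\mu = u^\nu \nabla_\nu u^\mu &= u^\nu \partial_\nu u^\mu + u^\nu \, \Gamma^\mu{}_{\nu\sigma} \, u^\sigma = \frac{dx^\nu}{ds} \frac{\partial}{\partial x^\nu} \frac{dx^\mu}{ds} + \Gamma^\mu{}_{\nu\sigma} \, \dot x^\nu \dot x^\sigma \\ &= \frac{d^2 x^\mu}{ds^2} + \Gamma^\mu{}_{\nu\sigma} \, \dot x^\nu \dot x^\sigma = \ddot x^\mu + \Gamma^\mu{}_{\nu\sigma} \, \dot x^\nu \dot x^\sigma . \end{aligned} $$
Comparing with (9.50) we conclude that a curve c is a geodesic if and only if
its tangent vector is parallelly propagated along c, i.e., parallel to itself along
the curve. The geodesic equation is simply $\nabla_u u = 0$ or
$$ \nabla_{\dot x} \dot x = 0 . \tag{9.57} $$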

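We conclude this section with a property of parallel transport: Parallel transport respects lengths and angles. Let c be a curve and u its tangent field.
Suppose that v and w are vector fields that are parallelly propagated along
c. Then g(v, v), g(w, w), and g(v, w) are constant along c. To see this we
compute, e.g.,
$$ \nabla_u \big( g(v, w) \big) = \nabla_u \big( g_{\mu\nu} v^\mu w^\nu \big) = \big( \nabla_u g_{\mu\nu} \big) v^\mu w^\nu + g_{\mu\nu} \big( (\nabla_u v^\mu) \, w^\nu + v^\mu \, \nabla_u w^\nu \big) = \big( \nabla_u g_{\mu\nu} \big) v^\mu w^\nu . $$

We then note that
$$ \nabla_\sigma g_{\mu\nu} = 0 , $$
which is because
$$ \begin{aligned} g_{\mu\nu;\sigma} &= g_{\mu\nu,\sigma} - \Gamma^\kappa{}_{\sigma\mu} \, g_{\kappa\nu} - \Gamma^\kappa{}_{\sigma\nu} \, g_{\mu\kappa} = g_{\mu\nu,\sigma} - \Gamma_{\nu\sigma\mu} - \Gamma_{\mu\sigma\nu} \\ &= g_{\mu\nu,\sigma} - \frac{1}{2} \big( g_{\{\nu\sigma,\mu\}} + g_{\{\mu\sigma,\nu\}} \big) \\ &= g_{\mu\nu,\sigma} - \frac{1}{2} \big( g_{\nu\sigma,\mu} - g_{\sigma\mu,\nu} + g_{\mu\nu,\sigma} + g_{\mu\sigma,\nu} - g_{\sigma\nu,\mu} + g_{\nu\mu,\sigma} \big) = 0 . \end{aligned} $$
In other words, the metric is a parallel tensor field, i.e., parallelly propagated
along arbitrary curves. Continuing the above argument we find that
$$ \nabla_u \big( g(v, w) \big) = \big( \nabla_u g_{\mu\nu} \big) v^\mu w^\nu = 0 , $$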


from which we conclude that
$$ g(v, w) = \text{const} $$
along c. The claim follows.
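
This property can be observed numerically on the sphere. Along a circle of latitude ϑ = const, parametrized by ϕ, the tangent is $\partial_\varphi$, and written out with (9.56b) the condition $\nabla_u v = 0$ becomes the system dv^ϑ/dϕ = sin ϑ cos ϑ v^ϕ, dv^ϕ/dϕ = −(cos ϑ / sin ϑ) v^ϑ, where the Christoffel symbols of the standard metric (9.37) have been inserted. The sketch below integrates this system once around the circle; the integrator, the latitude and the initial vectors are assumptions of this illustration.

```python
import math

THETA = 1.0   # colatitude of the circle; an arbitrary choice

def rhs(v):
    """Right-hand side of the parallel transport equations along theta = THETA."""
    v_theta, v_phi = v
    return (math.sin(THETA) * math.cos(THETA) * v_phi,
            -math.cos(THETA) / math.sin(THETA) * v_theta)

def transport(v, n=4000):
    """Transport v once around the circle (phi from 0 to 2 pi) with RK4 steps."""
    h = 2 * math.pi / n
    for _ in range(n):
        k1 = rhs(v)
        k2 = rhs((v[0] + h / 2 * k1[0], v[1] + h / 2 * k1[1]))
        k3 = rhs((v[0] + h / 2 * k2[0], v[1] + h / 2 * k2[1]))
        k4 = rhs((v[0] + h * k3[0], v[1] + h * k3[1]))
        v = (v[0] + h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
             v[1] + h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))
    return v

def g(v, w):
    """The metric (9.37) evaluated on two vectors at colatitude THETA."""
    return v[0] * w[0] + math.sin(THETA) ** 2 * v[1] * w[1]

v0, w0 = (1.0, 0.0), (0.3, 2.0)
v1, w1 = transport(v0), transport(w0)
```

Up to discretization error, g(v, v), g(w, w) and g(v, w) take the same values before and after the transport.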
Remark. As a special case consider a geodesic c. Then the tangent u is by
definition parallel along c. If v is another parallel vector field along c, then
g(u, v) = const
along c. In particular, g(u, u) = const, as expected, since c is affinely para-
metrized.
Exercise. If c satisfies the geodesic equation (9.57), then the causal character
(timelike/null/spacelike) of c cannot change along the curve. Prove this.

9.6 A scalar relativistic theory of gravity

In section 7.5 we have made an attempt to formulate a relativistic theory of gravity. In this theory the gravitational field is basically represented by a
scalar function (‘gravitational potential’) $V = V(x^\lambda)$ on Minkowski spacetime;
we thus speak of a scalar theory of gravity. The potential $V(x^\lambda)$ gives rise to
a gravitational field tensor
Gµνσ = Gµνσ (xλ ) = V,µ ηνσ − ηµ(ν V,σ) , (9.58)
see (7.80), and the motion of a test particle is described by Newton’s second
law, where the field tensor generates the force, i.e.,
mi u̇µ = mg Gµνσ uν uσ (9.59)

or

mi ẍµ = mg Gµνσ ẋν ẋσ . (9.59′ )


In complete analogy with the arguments of section 9.1 we find that imposing
the equivalence principle leads to equality of inertial and gravitational mass,
i.e., $m_i = m_g$. Therefore,
\[ \ddot x^\mu = G^\mu{}_{\nu\sigma}\, \dot x^\nu \dot x^\sigma \tag{9.60} \]
holds irrespective of the particle’s mass. Note that the dot stands for
differentiation w.r.t. proper time s, i.e., a solution $s \mapsto x^\mu(s)$ of (9.60) is a
world line parametrized by ‘arc length’; this is equivalent to the condition
$u^\mu u_\mu = -1$ (where $u^\mu = \dot x^\mu$).

Remark. To formulate a complete scalar theory of gravity we need a relativistic
analog of the Poisson equation for the potential V. A reasonable equation (in
the vacuum case) is the wave equation $\Box V = 0$. However, as mentioned in
section 7.5, this scalar theory of gravity is falsified by experiments. (The
‘correct’ theory is general relativity.)

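A quick glance at (9.60) reveals its similarity to the geodesic equation (9.50).
Note that $G^\mu{}_{\nu\sigma}$ is symmetric in $(\nu\sigma)$, as are the Christoffel symbols of a
Levi-Civita connection. The Christoffel symbols of which metric? Define the metric

\[ g_{\mu\nu} = e^{2V} \eta_{\mu\nu} . \tag{9.61} \]

The associated Christoffel symbols are

\[
\begin{aligned}
\Gamma_{\mu\nu\sigma} &= \tfrac{1}{2}\, g_{\{\mu\nu,\sigma\}} = \tfrac{1}{2}\bigl( g_{\mu\nu,\sigma} - g_{\nu\sigma,\mu} + g_{\sigma\mu,\nu} \bigr) = \tfrac{1}{2}\bigl( 2 g_{\mu(\nu,\sigma)} - g_{\nu\sigma,\mu} \bigr) \\
&= -e^{2V} \bigl( V_{,\mu}\,\eta_{\nu\sigma} - 2\eta_{\mu(\nu} V_{,\sigma)} \bigr) ,
\end{aligned}
\tag{9.62}
\]

which closely resemble (9.58). Therefore we conjecture that (9.60) coincides
with the geodesic equation in the spacetime with metric $g_{\mu\nu} = e^{2V}\eta_{\mu\nu}$.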
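
The closed form (9.62) can be checked numerically. The following is a minimal sketch, not part of the original notes: it assumes a hypothetical smooth test potential (the functions `potential` and `grad_potential` are illustrative choices), builds the metric (9.61), differentiates it by central differences, and compares the resulting Christoffel symbols with the right-hand side of (9.62).

```python
import math

# Minkowski metric eta_{mu nu} = diag(-1, 1, 1, 1)
ETA = [[(-1.0 if m == 0 else 1.0) if m == n else 0.0 for n in range(4)]
       for m in range(4)]


def potential(x):
    """Hypothetical smooth test potential V(x); any smooth function would do."""
    return 0.3 * math.sin(x[0]) + 0.1 * x[1] * x[2] - 0.2 * x[3] ** 2


def grad_potential(x):
    """Analytic partial derivatives V_{,mu} of the test potential above."""
    return [0.3 * math.cos(x[0]), 0.1 * x[2], 0.1 * x[1], -0.4 * x[3]]


def metric(x):
    """g_{mu nu} = exp(2V) eta_{mu nu}, cf. (9.61)."""
    factor = math.exp(2.0 * potential(x))
    return [[factor * ETA[m][n] for n in range(4)] for m in range(4)]


def christoffel_numeric(x, h=1e-5):
    """Gamma_{mu nu sigma} = (g_{mu nu,sigma} - g_{nu sigma,mu} + g_{sigma mu,nu}) / 2,
    with the partial derivatives approximated by central differences."""
    dg = []  # dg[s][m][n] approximates g_{mn,s}
    for s in range(4):
        x_plus, x_minus = list(x), list(x)
        x_plus[s] += h
        x_minus[s] -= h
        g_plus, g_minus = metric(x_plus), metric(x_minus)
        dg.append([[(g_plus[m][n] - g_minus[m][n]) / (2.0 * h) for n in range(4)]
                   for m in range(4)])
    return [[[0.5 * (dg[s][m][n] - dg[m][n][s] + dg[n][s][m]) for s in range(4)]
             for n in range(4)] for m in range(4)]


def christoffel_closed_form(x):
    """Right-hand side of (9.62), with 2 eta_{mu(nu} V_{,sigma)} written out."""
    factor = math.exp(2.0 * potential(x))
    dV = grad_potential(x)
    return [[[-factor * (dV[m] * ETA[n][s] - ETA[m][n] * dV[s] - ETA[m][s] * dV[n])
              for s in range(4)] for n in range(4)] for m in range(4)]


point = [0.4, -0.7, 1.1, 0.2]  # arbitrary event, chosen for illustration
numeric = christoffel_numeric(point)
closed = christoffel_closed_form(point)
max_diff = max(abs(numeric[m][n][s] - closed[m][n][s])
               for m in range(4) for n in range(4) for s in range(4))
print("largest deviation between numeric and closed form:", max_diff)
```

The deviation should be of the size of the finite-difference error only; the agreement does not depend on the particular test potential.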

Let us be precise. Consider a world line in Minkowski space that is parametrized
w.r.t. proper time, i.e., $x^\mu(s)$. Note that proper time s is the proper time w.r.t.
the Minkowski metric $\eta_{\mu\nu}$. Let $\bar x^\mu(\bar s)$ denote the same world line in a different
parametrization, i.e.,
\[ x^\mu(s) = \bar x^\mu(\bar s) , \]
where $\bar s$ denotes proper time w.r.t. the metric $g_{\mu\nu}$ given by (9.61).

And here’s the claim: $x^\mu(s)$ satisfies the equation (9.60) in Minkowski space
if and only if $\bar x^\mu(\bar s)$ satisfies the geodesic equation in the spacetime with
metric (9.61).

Let us prove this claim. Since we use two concepts of proper time in parallel,
it is advisable to avoid the use of the overdot. Let s denote proper time w.r.t.
$\eta_{\mu\nu}$ and $\bar s$ proper time w.r.t. $g_{\mu\nu}$; then

\[ d\bar s = e^{V} ds . \tag{9.63} \]

To see this we simply note that $g(v, v) = e^{2V}\eta(v, v)$ for every vector $v^\mu$ and in
particular for every timelike vector; if x(λ) denotes an arbitrary parametrization
of a timelike world line, then, by (4.8) and (9.44),
\[ d\bar s = \sqrt{-g\bigl(dx/d\lambda,\, dx/d\lambda\bigr)}\; d\lambda = e^{V} \sqrt{-\eta\bigl(dx/d\lambda,\, dx/d\lambda\bigr)}\; d\lambda = e^{V} ds . \]

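A very intuitive way of deriving (9.63) is to use the line element notation, i.e.,

\[ g_{\mu\nu} = e^{2V}\eta_{\mu\nu} \quad\Leftrightarrow\quad d\bar s^2 = e^{2V} ds^2 , \]

from which (9.63) can be read off immediately.

We set
\[ u^\mu(s) = \frac{d}{ds}\, x^\mu(s) \quad\text{and}\quad \bar u^\mu(\bar s) = \frac{d}{d\bar s}\, \bar x^\mu(\bar s) . \]
Since
\[ \frac{d}{d\bar s} = e^{-V} \frac{d}{ds} , \]
we find that
\[ \bar u^\mu(\bar s) = e^{-V(x^\lambda(s))}\, u^\mu(s) . \]
Suppressing the arguments we simply write $\bar u^\mu = e^{-V} u^\mu$. Note that $u^\mu$ is
normalized w.r.t. $\eta_{\mu\nu}$ while $\bar u^\mu$ is normalized w.r.t. $g_{\mu\nu}$, i.e.,

\[ \eta_{\mu\nu} u^\mu u^\nu = -1 , \qquad g_{\mu\nu} \bar u^\mu \bar u^\nu = -1 . \]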

This is consistent with the requirement that the world line be parametrized
w.r.t. proper time s and s̄, respectively.
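
The two normalization conditions can be illustrated with concrete numbers. The sketch below is an illustration under stated assumptions: the value of the potential at the event (`V_value`) and the particle's coordinate speed (`speed`) are arbitrary made-up inputs. It builds a unit timelike vector u w.r.t. the Minkowski metric, rescales it by exp(-V), and evaluates both norms.

```python
import math


def eta(v, w):
    """Minkowski product eta(v, w) with signature (-, +, +, +)."""
    return -v[0] * w[0] + sum(v[i] * w[i] for i in range(1, 4))


V_value = 0.37  # hypothetical value of the potential V at the event considered
speed = 0.6     # hypothetical coordinate speed of the particle (units with c = 1)

gamma = 1.0 / math.sqrt(1.0 - speed ** 2)
u = [gamma, gamma * speed, 0.0, 0.0]           # four-velocity w.r.t. proper time s
u_bar = [math.exp(-V_value) * c for c in u]    # rescaled vector e^{-V} u

norm_eta = eta(u, u)                                    # eta(u, u)
norm_g = math.exp(2.0 * V_value) * eta(u_bar, u_bar)    # g(u_bar, u_bar), g = e^{2V} eta
print(norm_eta, norm_g)
```

Both printed numbers should equal -1 up to rounding, independently of the chosen value of V.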

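Based on these preparations we obtain

\[ \frac{d}{d\bar s}\, \bar u^\mu(\bar s) = e^{-V} \frac{d}{ds} \Bigl( e^{-V} u^\mu(s) \Bigr) = e^{-2V} \Bigl( \frac{du^\mu(s)}{ds} - \frac{dV(x^\lambda(s))}{ds}\, u^\mu(s) \Bigr) . \]

Furthermore,
\[ \frac{dV(x^\lambda(s))}{ds} = \frac{\partial V}{\partial x^\mu} \frac{dx^\mu}{ds} = V_{,\mu}\, u^\mu(s) , \]
and hence
\[ \frac{d}{d\bar s}\, \bar u^\mu(\bar s) = e^{-2V} \Bigl( \frac{du^\mu(s)}{ds} - V_{,\tau}\, u^\tau(s)\, u^\mu(s) \Bigr) . \tag{9.64} \]

The equation of motion (9.60) reads

\[ \frac{du^\mu}{ds} = G^\mu{}_{\nu\sigma}\, u^\nu u^\sigma . \tag{9.65} \]
It is important to note that the indices are raised and lowered with $\eta_{\mu\nu}$ in this
context; in particular,
\[ G^\mu{}_{\nu\sigma} = \eta^{\mu\rho} G_{\rho\nu\sigma} . \]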

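Let us do the comparison with (9.62). Because

\[ G^\mu{}_{\nu\sigma} = \eta^{\mu\rho} G_{\rho\nu\sigma} = \eta^{\mu\rho} \bigl( -e^{-2V} \Gamma_{\rho\nu\sigma} + \eta_{\rho(\nu} V_{,\sigma)} \bigr) = -e^{-2V} \eta^{\mu\rho}\, \Gamma_{\rho\nu\sigma} + \delta^\mu{}_{(\nu} V_{,\sigma)} , \]

and
\[ g^{\mu\nu} = e^{-2V} \eta^{\mu\nu} , \]
where $g^{\mu\nu}$ is the inverse of $g_{\mu\nu}$, i.e., $g^{\mu\nu} g_{\nu\sigma} = \delta^\mu{}_\sigma$, we obtain

\[ G^\mu{}_{\nu\sigma} = -g^{\mu\rho}\, \Gamma_{\rho\nu\sigma} + \delta^\mu{}_{(\nu} V_{,\sigma)} = -\Gamma^\mu{}_{\nu\sigma} + \delta^\mu{}_{(\nu} V_{,\sigma)} , \tag{9.66} \]

where the indices on the r.h.s. are raised and lowered with $g_{\mu\nu}$. An equation
like (9.66) is extremely ‘risky’; it is preferable to write

\[ \eta^{\mu\rho} G_{\rho\nu\sigma} = -g^{\mu\rho}\, \Gamma_{\rho\nu\sigma} + \delta^\mu{}_{(\nu} V_{,\sigma)} \]

to avoid confusion about which indices are raised and lowered with which
metric.²⁹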

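Inserting (9.65) into (9.64) and using (9.66) we find

\[
\begin{aligned}
\frac{d}{d\bar s}\, \bar u^\mu(\bar s)
&= e^{-2V} \bigl[ G^\mu{}_{\nu\sigma}\, u^\nu(s) u^\sigma(s) - V_{,\tau}\, u^\tau(s)\, u^\mu(s) \bigr] \\
&= e^{-2V} \bigl[ -\Gamma^\mu{}_{\nu\sigma}\, u^\nu(s) u^\sigma(s) + \delta^\mu{}_{(\nu} V_{,\sigma)}\, u^\nu(s) u^\sigma(s) - V_{,\tau}\, u^\tau(s)\, u^\mu(s) \bigr] \\
&= -\Gamma^\mu{}_{\nu\sigma}\, \bar u^\nu(\bar s)\, \bar u^\sigma(\bar s) + e^{-2V} \bigl[ \delta^\mu{}_{\nu} V_{,\sigma}\, u^\nu(s) u^\sigma(s) - V_{,\tau}\, u^\tau(s)\, u^\mu(s) \bigr] \\
&= -\Gamma^\mu{}_{\nu\sigma}\, \bar u^\nu(\bar s)\, \bar u^\sigma(\bar s) + e^{-2V} \bigl[ u^\sigma(s) V_{,\sigma}\, u^\mu(s) - V_{,\tau}\, u^\tau(s)\, u^\mu(s) \bigr] \\
&= -\Gamma^\mu{}_{\nu\sigma}\, \bar u^\nu(\bar s)\, \bar u^\sigma(\bar s) .
\end{aligned}
\]

This is the geodesic equation for $\bar x(\bar s)$, i.e.,

\[ \frac{d}{d\bar s}\, \bar u^\mu(\bar s) + \Gamma^\mu{}_{\nu\sigma}\, \bar u^\nu(\bar s)\, \bar u^\sigma(\bar s) = 0 \tag{9.67} \]
or
\[ \frac{d^2}{d\bar s^2}\, \bar x^\mu(\bar s) + \Gamma^\mu{}_{\nu\sigma}\, \frac{d\bar x^\nu(\bar s)}{d\bar s}\, \frac{d\bar x^\sigma(\bar s)}{d\bar s} = 0 . \tag{9.67'} \]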
This completes the proof of the claim.
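
The algebraic core of the proof is the pointwise identity G^mu_{nu sigma} u^nu u^sigma - V_{,tau} u^tau u^mu = -Gamma^mu_{nu sigma} u^nu u^sigma, where the index of G is raised with eta and the index of Gamma with g. The sketch below is an illustration with made-up input data (the potential value, its gradient and the vector u are arbitrary assumptions); it evaluates both sides from (9.58) and (9.62) directly.

```python
import math

# Minkowski metric; for diag(-1, 1, 1, 1) the inverse has the same components.
ETA = [[(-1.0 if m == 0 else 1.0) if m == n else 0.0 for n in range(4)]
       for m in range(4)]


def identity_sides(V_value, dV, u):
    """Return (left, right) with
    left^mu  = G^mu_{nu sigma} u^nu u^sigma - V_{,tau} u^tau u^mu,
    right^mu = -Gamma^mu_{nu sigma} u^nu u^sigma."""
    idx = range(4)
    factor = math.exp(2.0 * V_value)
    # G_{rho nu sigma} from (9.58), symmetrization written out
    G_low = [[[dV[r] * ETA[a][b] - 0.5 * (ETA[r][a] * dV[b] + ETA[r][b] * dV[a])
               for b in idx] for a in idx] for r in idx]
    # Gamma_{rho nu sigma} from (9.62)
    Gamma_low = [[[-factor * (dV[r] * ETA[a][b] - ETA[r][a] * dV[b] - ETA[r][b] * dV[a])
                   for b in idx] for a in idx] for r in idx]
    dV_u = sum(dV[t] * u[t] for t in idx)  # V_{,tau} u^tau
    left, right = [], []
    for m in idx:
        # first index of G raised with eta^{mu rho}
        G_uu = sum(ETA[m][r] * G_low[r][a][b] * u[a] * u[b]
                   for r in idx for a in idx for b in idx)
        # first index of Gamma raised with g^{mu rho} = exp(-2V) eta^{mu rho}
        Gamma_uu = sum(ETA[m][r] * Gamma_low[r][a][b] * u[a] * u[b]
                       for r in idx for a in idx for b in idx) / factor
        left.append(G_uu - dV_u * u[m])
        right.append(-Gamma_uu)
    return left, right


# Arbitrary illustrative data: potential value, gradient V_{,mu}, and a vector u^mu.
left, right = identity_sides(0.25, [0.3, -0.1, 0.2, 0.05], [1.5, 0.5, -0.3, 0.8])
print(left)
print(right)
```

The identity is purely algebraic, so it holds for any vector u; the normalization of u is only needed to interpret the parameter as proper time.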
²⁹ A large number of mistakes in general relativity are due to this kind of confusion.

Let us recapitulate the essence of what we have shown. To formulate a scalar
relativistic theory of gravity we consider a potential $V(x^\lambda)$ and a resulting
tensor field (9.58) on Minkowski spacetime and use (9.60) as the equation of
motion of a particle in the gravitational field. However, there is an equivalent
viewpoint. The gravitational field is not represented by a field on Minkowski
spacetime but by a different spacetime, which is characterized by a metric that
differs from the Minkowski metric; in our case, the metric (9.61). The motion of
test particles is simply geodesic motion (free fall) w.r.t. this spacetime metric.

Unfortunately, the scalar theory of gravity we have discussed here does not
represent our physical reality correctly. But we have come across a fundamental
principle:

Gravitation is modeled by spacetime metrics. Minkowski spacetime represents
a spacetime where gravitational interaction is absent: the Minkowski
metric is flat. Gravity is present in curved spacetimes, which are spacetimes
with metrics different from the Minkowski metric.

In these curved spacetimes, the motion of test particles (in the absence of other
forces) is geodesic motion, i.e., free fall.

9.7 The equivalence principle

Suppose $(M, g_{\mu\nu})$ is a spacetime, i.e., M a four-dimensional manifold equipped
with a Lorentzian metric. The motion of a freely falling test body is geodesic
motion; this is irrespective of the composition and the mass of the test body.³⁰
The equivalence principle is thus automatically implemented. Let $\{z^\mu\}$ denote a
local coordinate system³¹; then the equation of motion is the geodesic equation,
i.e.,
\[ \nabla_{\dot z}\, \dot z^\mu = \ddot z^\mu + \Gamma^\mu{}_{\sigma\lambda}\, \dot z^\sigma \dot z^\lambda = 0 . \tag{9.68} \]

In sections 9.1 and 9.2 we have taken the local equivalence between gravitation
and acceleration to good account. Let us finally formalize this idea.

Take an arbitrary point p ∈ M. In a neighborhood of p we construct a system
of coordinates that are called Riemann normal coordinates. Let us describe
the procedure.³² Choose an orthonormal basis of vectors at the point p, which
we call $e_0|_p, \ldots, e_3|_p$. Choose a quadruple $(x^0, x^1, x^2, x^3) \in \mathbb{R}^4$ and define
the vector $x|_p = x^\mu e_\mu|_p$. Solve the geodesic equation (9.68) with initial data
$z(0) = p$ and $\dot z(0) = x|_p$ (i.e., $\dot z^\mu(0) = x^\mu$). The solution $t \mapsto z(t)$ is the unique
geodesic passing through p whose tangent vector at p is $x|_p$. Flow along with
this geodesic until the affine parameter has reached 1, i.e., follow the geodesic to
the point z(1). Finally, assign the point z(1) on the spacetime the coordinates
$x^\mu$.

³⁰ We avoid the subtleties connected with the concept of a ‘test body’.
³¹ We save the notation $\{x^\mu\}$ for the coordinate system of Riemann normal coordinates we
construct below.
Remark. Suppose that we take $\lambda x|_p$ instead of $x|_p$ (with λ ∈ R) as the initial
vector. The geodesic $t \mapsto \bar z(t)$ with initial data $\bar z(0) = p$ and $\dot{\bar z}(0) = \lambda x|_p$ is
merely an affine reparametrization of the original geodesic; regarded as
geometric curves, the two coincide. Hence the point $\bar z(1)$ coincides with z(λ).
Exercise. An alternative description of the construction of Riemann normal
coordinates is the following: Take a point p and any point q of a neighborhood;
let $t \mapsto z(t)$ be the unique geodesic joining p and q, where the parametrization
is chosen to satisfy z(0) = p and z(1) = q. The coordinates assigned to q are
the components of $\dot z(0)$ w.r.t. the orthonormal basis $e_0|_p, \ldots, e_3|_p$ at p. Fill in
the small gaps in the argumentation.
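
To get a feeling for the construction, the following sketch carries it out on the unit sphere in R^3, where geodesics are great circles and can be written down in closed form. This is an illustrative Riemannian toy model rather than a spacetime, and the function names (`geodesic`, `point_with_normal_coordinates`) are hypothetical. It also checks the statement of the remark above, namely that starting with the rescaled vector λx leads to the point z(λ).

```python
import math


def geodesic(p, e1, e2, x, t):
    """Point z(t) of the great circle on the unit sphere with z(0) = p and
    initial tangent vector x[0]*e1 + x[1]*e2, where (e1, e2) is an
    orthonormal basis of the tangent plane at p."""
    speed = math.hypot(x[0], x[1])
    if speed == 0.0:
        return list(p)
    direction = [(x[0] * e1[i] + x[1] * e2[i]) / speed for i in range(3)]
    return [math.cos(speed * t) * p[i] + math.sin(speed * t) * direction[i]
            for i in range(3)]


def point_with_normal_coordinates(p, e1, e2, x):
    """The point z(1) that is assigned the normal coordinates x."""
    return geodesic(p, e1, e2, x, 1.0)


p = [0.0, 0.0, 1.0]                          # base point: north pole
e1, e2 = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]    # orthonormal basis at p
x = [0.6, -0.8]                              # some coordinate pair

q = point_with_normal_coordinates(p, e1, e2, x)
lam = 0.5
q_scaled = point_with_normal_coordinates(p, e1, e2, [lam * x[0], lam * x[1]])
q_along = geodesic(p, e1, e2, x, lam)        # z(lambda) on the original geodesic
print(q, q_scaled, q_along)
```

Normal coordinates on the sphere cover only a neighborhood of p: all coordinate pairs of length π are mapped to the antipodal point.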

Using the procedure outlined above we are able to assign coordinates $x^\mu$ to
each point in a neighborhood of p. The coordinates so constructed are the
Riemann normal coordinates. These coordinates have convenient properties.
By construction we have
\[ \partial_\mu \big|_p = \frac{\partial}{\partial x^\mu} \Big|_p = e_\mu \big|_p . \]
Therefore, in these coordinates the metric is $ds^2 = g_{\mu\nu}\, dx^\mu dx^\nu$, where
\[ g_{\mu\nu} \big|_p = \eta_{\mu\nu} , \tag{9.69} \]
because $g_{\mu\nu} = g(\partial_\mu, \partial_\nu)$. But Riemann normal coordinates achieve much more
than (9.69). In these coordinates we have
\[ g_{\mu\nu}(x^\sigma) = \eta_{\mu\nu} - \frac{1}{3}\, R_{\mu\sigma\nu\rho}\, x^\sigma x^\rho + \text{terms of third order} , \tag{9.70} \]
where $R_{\mu\sigma\nu\rho}$ is the Riemann tensor. (We refrain from giving a proof of (9.70).)
From (9.70) we obtain (9.69) and
\[ g_{\mu\nu,\sigma} \big|_p = 0 , \]
which implies that
\[ \Gamma^\mu{}_{\nu\sigma} \big|_p = 0 . \tag{9.71} \]
We conclude that, in Riemann normal coordinates in a neighborhood of a point
p, the spacetime looks like Minkowski to zeroth and first order. In particular,
the equation of motion of test particles (geodesic equation) is $\ddot x^\mu|_p = 0$ and
$\ddot x^\mu \approx 0$ in a neighborhood of p because of (9.71).

³² To get a good picture of the construction it is useful to imagine a simple Riemannian
manifold like the unit sphere instead of a Lorentzian manifold.

There exists a slightly more difficult construction of coordinates in a
neighborhood of a given timelike geodesic (instead of a point); these are the
Fermi-Walker coordinates. The final statements are similar: In these coordinates,
along the geodesic, we have

\[ g_{\mu\nu} = \eta_{\mu\nu} \quad\text{and}\quad \Gamma^\mu{}_{\nu\sigma} = 0 . \]

Therefore, in a sufficiently small neighborhood of the geodesic, the spacetime
looks like Minkowski space. This is the mathematical formulation of the
equivalence principle. We have used a freely falling frame of reference to establish local
approximate equivalence of the curved spacetime and flat Minkowski space.


APPENDIX A

MATHEMATICAL BACKGROUND

A.1 Vector spaces and dual spaces

Conventions

Let us consider an abstract vector space V of dimension dim V = n over the
field of real numbers R. When we choose a basis

\[ \{e_1, e_2, \ldots, e_n\} \]

of V, then each vector v ∈ V admits a unique decomposition with respect to
this basis, or, in other words, v can be written as a unique linear combination
of the basis vectors. In this way, we assign to each vector v an n-tuple of real
numbers which are called the components of the vector.¹

To ‘formalize’ this we make use of conventions that are customary in several
branches of mathematics and theoretical physics, and in particular in relativity:

• The components of a vector are denoted by upper indices (contravariant
index notation).

• In expressions that contain a particular index twice, once as a subscript,
once as a superscript, it is implied that we sum over all possible values of
that index (Einstein summation convention).

¹ Thereby, the abstract vector space V becomes isomorphic to $\mathbb{R}^n$. Note, however, that this
isomorphism is not canonical (which means ‘not unique’ in this context).

The decomposition of a vector v can thus be written as

\[ v = v^i e_i . \tag{A.1} \]

The Einstein summation convention implies that summation over i is understood;
equation (A.1) is merely a space-saving way of writing $v = \sum_{i=1}^{n} v^i e_i$ or
$v = v^1 e_1 + v^2 e_2 + \cdots + v^n e_n$. The real numbers $v^1, v^2, \ldots, v^n$ are the components
of the vector v.
Remark. In several instances Greek indices are used instead of Latin indices. In
relativity, the use of Greek indices is standard² in the case of four dimensions,
i.e., n = 4. For example, we would write

\[ v = v^\mu e_\mu . \tag{A.2} \]

In this context, Greek indices are assumed to run from zero to three, i.e., the
basis is $\{e_0, e_1, e_2, e_3\}$, the components of the vector are $v^0, v^1, v^2, v^3$, and (A.2)
means $v = v^0 e_0 + v^1 e_1 + v^2 e_2 + v^3 e_3$. (Note again that there’s nothing deep in
this; we are merely discussing conventions of writing things down.)

It is common not to distinguish between the abstract vector v and the collection
of its components w.r.t. the chosen basis³; we therefore typically write
\[ v = \begin{pmatrix} v^1 \\ \vdots \\ v^n \end{pmatrix} . \tag{A.3} \]

It is important to note that (A.3) tacitly assumes that a basis has been selected;
otherwise (A.3) does not make sense.

Rather confusing, at first sight at least, is the abstract index notation. Adopting
this notation (A.3) becomes
\[ v^i = \begin{pmatrix} v^1 \\ \vdots \\ v^n \end{pmatrix} . \tag{A.4} \]

² Unfortunately not all relativists follow this convention; some stick to Latin indices.
³ This is true when we consider only one (fixed) basis. However, when we deal with two
different bases at the same time, a distinction is useful; see (A.31) below.

Here, $v^i$ does not denote the ith component of the vector, but the collection of
all components i = 1, 2, . . . , n, and hence the vector itself. In abstract index
notation we would typically write “Consider a vector $v^i \in V$ and . . . ”, where the
superscript i is not an actual index but merely a dummy indicating the vector
character of the object; this is particularly useful when we deal simultaneously
with vectors, covectors, and tensors, see below. In this script we use both the
standard notation and the abstract index notation.
Remark. In relativity, in dimension four, we would write
\[ v^\mu = \begin{pmatrix} v^0 \\ v^1 \\ v^2 \\ v^3 \end{pmatrix} \tag{A.5} \]

in abstract index notation. Again, µ does not denote any particular value, but
the collection of all µ = 0, . . . , 3.

The dual space

Consider a vector space V of dimension n over the real numbers. We now define
the dual space $V^*$ associated with V.

Definition A.1. The dual space $V^*$ is the space of linear functionals on V.
Elements of $V^*$ are called covectors.

Let $a \in V^*$; by definition, a is a map

\[ a : V \to \mathbb{R} \tag{A.6a} \]
\[ v \mapsto a(v) \in \mathbb{R} , \tag{A.6b} \]

that is linear, i.e.,

\[ a(v + \lambda w) = a(v) + \lambda\, a(w) \quad \text{for all } v, w \in V,\ \lambda \in \mathbb{R} . \tag{A.6c} \]

As a matter of course, $V^*$ is a vector space, since addition and scalar
multiplication of elements of $V^*$ are well-defined (and obey the vector space axioms).
Namely, for $a_1 \in V^*$, $a_2 \in V^*$, and λ ∈ R, $a_1 + \lambda a_2$ is defined as the map
$v \mapsto a_1(v) + \lambda a_2(v)$, which is again in $V^*$.

In principle, a basis on $V^*$ can be chosen independently of a basis on V;
however, when the vector space V is equipped with a basis $\{e_1, e_2, \ldots, e_n\}$ it is
convenient to use the so-called dual basis (or: co-basis)

\[ \{e^1, e^2, \ldots, e^n\} ; \tag{A.7} \]

note the convention that co-basis vectors are indexed by superscripts. In order
to define the co-basis vector $e^i$, since it is a map of the type (A.6), we must
prescribe how it acts on vectors v ∈ V. We define

\[ e^i(v) = v^i , \tag{A.8} \]

i.e., $e^i$ applied to a vector yields the ith component of the vector. Since $v = v^j e_j$,
we have
\[ \underbrace{e^i(v)}_{v^i} = e^i(v^j e_j) = v^j\, e^i(e_j) \]

for all v. We thus see that an equivalent definition of $e^i$ is

\[ e^i(e_j) = \delta^i{}_j . \tag{A.8'} \]

Note in passing that it is not difficult to prove that the dimension of $V^*$
coincides with the dimension of V, as we have tacitly presupposed above.

For an arbitrary covector $a \in V^*$ we have the decomposition

\[ a = a_i e^i , \tag{A.9} \]

where the Einstein summation convention is used. By convention, the covector
components are denoted by lower indices: $a_1, a_2, \ldots, a_n$. These components
are collected into a row vector, i.e.,

\[ a = (a_1, a_2, \ldots, a_n) . \tag{A.10} \]

Remark. In abstract index notation we would write “Given a covector $a_i \in V^*$,
then . . . ”, and
\[ a_i = (a_1, a_2, \ldots, a_n) \]
instead of (A.10).

Let v ∈ V and $a \in V^*$; by definition, a(v) is a real number. Making use of the
component decomposition it is simple to compute a(v):

\[ a(v) = a_i\, e^i(v) = a_i v^i , \tag{A.11} \]

where summation is always understood. As a special case, when a is applied
to a basis vector $e_j \in V$, we obtain

\[ a(e_j) = a_i\, e^i(e_j) = a_i\, \delta^i{}_j = a_j . \tag{A.12} \]

Hence when the covector a is applied to the basis vector $e_j$ we obtain the
jth component of the covector, $a_j$. (This equation should be compared with
equation (A.8).)
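
The relations (A.8), (A.8′), (A.11) and (A.12) can be tried out in a small concrete setting. The sketch below is an illustration under stated assumptions: it takes V = R^2, describes vectors by their standard coordinates, picks an arbitrary non-standard basis, and represents each co-basis element as a row acting by the dot product. With these choices the dual basis is given by the rows of the inverse of the matrix whose columns are the basis vectors. All names and numbers are made up for the example.

```python
def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def apply(covector, vector):
    """Evaluate a covector (row) on a vector (column), both in standard coordinates."""
    return sum(c * x for c, x in zip(covector, vector))


# Hypothetical basis of R^2: e_1 = (2, 1), e_2 = (1, 1) in standard coordinates.
basis = [[2.0, 1.0], [1.0, 1.0]]
E = [[basis[j][i] for j in range(2)] for i in range(2)]  # basis vectors as columns
dual_basis = inverse_2x2(E)                              # row i represents e^i

# (A.8'): e^i(e_j) = delta^i_j
delta = [[apply(dual_basis[i], basis[j]) for j in range(2)] for i in range(2)]

# (A.8): e^i(v) returns the i-th component of v = v^j e_j
components = [3.0, -2.0]
v = [sum(components[j] * basis[j][k] for j in range(2)) for k in range(2)]
recovered = [apply(dual_basis[i], v) for i in range(2)]

# (A.11), (A.12): a = a_i e^i, a(v) = a_i v^i, a(e_j) = a_j
a_components = [5.0, 7.0]
a = [sum(a_components[i] * dual_basis[i][k] for i in range(2)) for k in range(2)]
pairing = apply(a, v)
pairing_by_components = sum(a_components[i] * components[i] for i in range(2))
print(delta, recovered, pairing, pairing_by_components)
```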

A.2 Transformation of (co)vectors

In this section we analyze how vectors and covectors transform⁴ under a change
of basis. To that end suppose that the vector space V is equipped with two
different bases,

\[ \{\hat e_1, \ldots, \hat e_n\} \quad\text{versus}\quad \{\breve e_1, \ldots, \breve e_n\} . \]

The decomposition of a vector v w.r.t. the first basis is

\[ v = \hat v^i \hat e_i , \tag{A.13} \]

which we prefer to write as
\[ v = \begin{pmatrix} \hat v^1 \\ \vdots \\ \hat v^n \end{pmatrix} . \tag{A.13'} \]
Analogously, the decomposition of v w.r.t. the second basis is

\[ v = \breve v^i \breve e_i , \tag{A.14} \]

which we choose to write as
\[ v = \begin{pmatrix} \breve v^1 \\ \vdots \\ \breve v^n \end{pmatrix} . \tag{A.14'} \]
It is important to keep in mind that (A.13′) and (A.14′) only make sense w.r.t.
the bases we use. To avoid ambiguities it is necessary to always specify the basis.
⁴ Note that (co)vectors do not change under a change of basis. It is merely the components
of the (co)vectors w.r.t. the different bases that differ (and can be transformed into each
other).

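For instance, we could write
\[ v = \begin{pmatrix} \hat v^1 \\ \vdots \\ \hat v^n \end{pmatrix} \quad \text{w.r.t. } \{\hat e_1, \ldots, \hat e_n\} \tag{A.15} \]
and
\[ v = \begin{pmatrix} \breve v^1 \\ \vdots \\ \breve v^n \end{pmatrix} \quad \text{w.r.t. } \{\breve e_1, \ldots, \breve e_n\} . \tag{A.16} \]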
Remark. In relativity, bases correspond to observers. Hence, when we introduce
a (four-)vector we must specify the observer, w.r.t. which we decompose the
vector.

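The decomposition of a covector $a \in V^*$ w.r.t. the co-bases $\{\hat e^1, \ldots, \hat e^n\}$ and
$\{\breve e^1, \ldots, \breve e^n\}$ is analogous. We have

\[ a = \hat a_i \hat e^i \quad\text{and}\quad a = \breve a_i \breve e^i , \tag{A.17} \]

or, equivalently,

\[ a = (\hat a_1, \ldots, \hat a_n) \quad \text{w.r.t. } \{\hat e^1, \ldots, \hat e^n\} \tag{A.17'a} \]
\[ a = (\breve a_1, \ldots, \breve a_n) \quad \text{w.r.t. } \{\breve e^1, \ldots, \breve e^n\} . \tag{A.17'b} \]

Now suppose that the two bases are related via a basis transformation, i.e.,

\[ \breve e_i = A^j{}_i\, \hat e_j , \tag{A.18} \]

where the coefficients $A^j{}_i$ can be collected into a non-singular (n × n) matrix.
We obtain

\[ v = \hat v^i \hat e_i , \qquad v = \breve v^k \breve e_k = \breve v^k A^i{}_k\, \hat e_i = A^i{}_k\, \breve v^k\, \hat e_i , \]

therefore,

\[ \hat v^i = A^i{}_k\, \breve v^k . \tag{A.19} \]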

For the transformation of the dual basis we make the ansatz

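\[ \breve e^i = B^i{}_j\, \hat e^j . \tag{A.20} \]

We then obtain
\[ \delta^j{}_i = \breve e^j(\breve e_i) = B^j{}_l\, \hat e^l\bigl(A^k{}_i\, \hat e_k\bigr) = B^j{}_l\, A^k{}_i\, \underbrace{\hat e^l(\hat e_k)}_{\delta^l{}_k} = B^j{}_k\, A^k{}_i , \]
which implies that B is the inverse of A,

\[ B = A^{-1} , \quad\text{i.e.,}\quad B^i{}_k = (A^{-1})^i{}_k . \tag{A.21} \]

Finally, for covectors we get

\[ a = \hat a_i \hat e^i , \qquad a = \breve a_k \breve e^k = \breve a_k B^k{}_i\, \hat e^i = B^k{}_i\, \breve a_k\, \hat e^i , \]
so that

\[ \hat a_i = B^k{}_i\, \breve a_k . \tag{A.22} \]

We conclude that
\[ a(v) = \hat a_i \hat v^i = B^k{}_i\, \breve a_k\, A^i{}_l\, \breve v^l = \underbrace{B^k{}_i\, A^i{}_l}_{\delta^k{}_l}\, \breve a_k \breve v^l = \breve a_k \breve v^k , \tag{A.23} \]
i.e., we find consistency.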
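
The transformation rules (A.19) and (A.22) and the consistency check (A.23) can be reproduced with a few lines of code. The following sketch uses an arbitrary invertible 2 × 2 matrix A and made-up breve components; the storage convention `A[i][k]` for $A^i{}_k$ (upper index first) is an assumption of the example.

```python
def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


A = [[1.0, 2.0], [3.0, 5.0]]   # hypothetical basis transformation, A[i][k] = A^i_k
B = inverse_2x2(A)             # B = A^{-1}, cf. (A.21)

v_breve = [2.0, -1.0]          # components of v w.r.t. the breve basis
a_breve = [4.0, 3.0]           # components of a w.r.t. the breve co-basis

# (A.19): hat v^i = A^i_k breve v^k
v_hat = [sum(A[i][k] * v_breve[k] for k in range(2)) for i in range(2)]
# (A.22): hat a_i = B^k_i breve a_k
a_hat = [sum(B[k][i] * a_breve[k] for k in range(2)) for i in range(2)]

# (A.23): the number a(v) does not depend on the basis used to compute it
pairing_hat = sum(a_hat[i] * v_hat[i] for i in range(2))
pairing_breve = sum(a_breve[k] * v_breve[k] for k in range(2))
print(v_hat, a_hat, pairing_hat, pairing_breve)
```

Note that vector components transform with A while covector components transform with the inverse matrix; this is what makes the pairing basis independent.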

A.3 Bilinear forms

Non-degenerate symmetric bilinear forms

Let V be a finite-dimensional real vector space. In order to be able to define
the concepts of orthogonality, the length of a vector, and related concepts, the
vector space must be endowed with an additional geometric structure: a scalar
product or a generalization thereof.
Definition A.2. A non-degenerate symmetric bilinear form b on V is a map
\[ b : V \times V \to \mathbb{R} \tag{A.24} \]
that satisfies the conditions of

• bilinearity: for all u, v, w ∈ V and λ ∈ R we have

\[
\begin{aligned}
b(u + \lambda v, w) &= b(u, w) + \lambda\, b(v, w) \\
b(u, v + \lambda w) &= b(u, v) + \lambda\, b(u, w) ;
\end{aligned}
\tag{A.25a}
\]

• symmetry: for all v, w ∈ V there holds

\[ b(v, w) = b(w, v) ; \tag{A.25b} \]

• non-degeneracy: the following implication for v ∈ V holds:

\[ b(v, w) = 0 \quad \forall w \in V \quad\Rightarrow\quad v = 0 . \tag{A.25c} \]

Since b is a map with two slots, we will occasionally write b(·, ·). This is
reminiscent of the notation ⟨·, ·⟩ that is employed for scalar products.
Example. A scalar product ⟨·, ·⟩ is defined as a positive definite symmetric
bilinear form. It is the prime example for a non-degenerate symmetric bilinear
form. To show this, we must prove that positive definiteness implies
non-degeneracy, cf. (A.25c): Thus, for v ∈ V assume that ⟨v, w⟩ = 0 ∀w ∈ V. Since
this holds for all w it holds necessarily also for w = v, i.e., ⟨v, v⟩ = 0. However,
the requirement of positive definiteness implies that ⟨v, v⟩ is always positive
unless v = 0. Hence, from ⟨v, v⟩ = 0 we conclude that v = 0. We have thus
established (A.25c).
Definition A.3. Let $\{e_1, e_2, \ldots, e_n\}$ be a basis of V. The components of the
bilinear form b = b(·, ·) are given by applying b to the basis vectors: we define
$b_{ij}$ as
\[ b_{ij} = b(e_i, e_j) . \tag{A.26} \]
Remark for experts. The resemblance between (A.12) and (A.26) is not a
coincidence. In fact, a bilinear form is a co-tensor of rank 2, i.e., an element
of $V^* \otimes V^*$, namely $b = b_{ij}\, e^i \otimes e^j$ (cf. $a = a_i e^i$). Its behavior thus naturally
generalizes the behavior of covectors.

The components $b_{ij}$ can be collected into an (n×n) matrix, which we call again
b in slight abuse of notation,
\[
b = \bigl( b_{ij} \bigr)_{i,j} =
\begin{pmatrix}
b_{11} & b_{12} & \cdots & b_{1n} \\
b_{21} & b_{22} & \cdots & b_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
b_{n1} & b_{n2} & \cdots & b_{nn}
\end{pmatrix} . \tag{A.27}
\]

This matrix is obviously symmetric, because b(·, ·) is symmetric. Furthermore,
it is not difficult to show that non-degeneracy is equivalent to the matrix b
being non-singular.

Using the component representation of the bilinear form we obtain for v, w ∈ V:

\[ b(v, w) = b(v^i e_i, w^j e_j) = v^i w^j\, b(e_i, e_j) = b_{ij}\, v^i w^j . \tag{A.28} \]

This is evidently the (tensor) analog of (A.11). In matrix notation, when we
use (A.3) and (A.27) we can write (A.28) as
\[
b(v, w) = v^T b\, w = (v^1, \ldots, v^n)
\begin{pmatrix}
b_{11} & \cdots & b_{1n} \\
\vdots & \ddots & \vdots \\
b_{n1} & \cdots & b_{nn}
\end{pmatrix}
\begin{pmatrix} w^1 \\ \vdots \\ w^n \end{pmatrix} . \tag{A.28'}
\]

Example. The standard scalar product on $\mathbb{R}^3$ can be written as
\[
\langle v, w \rangle = \delta_{ij}\, v^i w^j = v^T \mathbb{1}\, w = (v^1, v^2, v^3)
\begin{pmatrix} 1 & & \\ & 1 & \\ & & 1 \end{pmatrix}
\begin{pmatrix} w^1 \\ w^2 \\ w^3 \end{pmatrix} = v^T w .
\]

Example. Consider the two-dimensional plane with the bilinear form
\[
b(v, w) = v^1 w^1 - 2 v^1 w^2 - 2 v^2 w^1 - 2 v^2 w^2 = (v^1, v^2)
\begin{pmatrix} 1 & -2 \\ -2 & -2 \end{pmatrix}
\begin{pmatrix} w^1 \\ w^2 \end{pmatrix} , \tag{A.29}
\]

the definition of which is given w.r.t. some chosen basis $\{e_1, e_2\}$. This bilinear
form is clearly symmetric and non-degenerate, since the matrix is symmetric
and non-singular.
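
The component formula (A.28) and the claims made about the example (A.29) are easy to verify by machine. The sketch below is illustrative only; the helper names `bilinear` and `explicit_form` are made up. It evaluates the form once via the matrix and once via the written-out polynomial, and checks symmetry and the non-vanishing determinant.

```python
# Matrix of the bilinear form (A.29) w.r.t. the chosen basis {e_1, e_2}.
B_FORM = [[1.0, -2.0],
          [-2.0, -2.0]]


def bilinear(b, v, w):
    """b(v, w) = b_ij v^i w^j, cf. (A.28)."""
    n = len(v)
    return sum(b[i][j] * v[i] * w[j] for i in range(n) for j in range(n))


def explicit_form(v, w):
    """The written-out expression in (A.29)."""
    return v[0] * w[0] - 2 * v[0] * w[1] - 2 * v[1] * w[0] - 2 * v[1] * w[1]


v, w = [3.0, -1.0], [0.5, 4.0]   # arbitrary sample vectors
value_matrix = bilinear(B_FORM, v, w)
value_explicit = explicit_form(v, w)

is_symmetric = all(B_FORM[i][j] == B_FORM[j][i] for i in range(2) for j in range(2))
determinant = B_FORM[0][0] * B_FORM[1][1] - B_FORM[0][1] * B_FORM[1][0]
print(value_matrix, value_explicit, is_symmetric, determinant)
```

A non-zero determinant means that the matrix is non-singular, which is equivalent to non-degeneracy of the form.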
Remark. Equation (A.28′) can be written in another form that is used
frequently, namely as
\[ b(v, w) = v^T b\, w = \langle v, b\, w \rangle , \tag{A.28''} \]

where ⟨·, ·⟩ denotes the standard scalar product w.r.t. the basis under
consideration.⁵ In quantum mechanics it is customary to write ⟨v|b|w⟩ instead
of (A.28′′).
⁵ There is a slight subtlety involved here that we choose to suppress. In (A.28′′), strictly
speaking, b denotes the endomorphism on V whose matrix representation $\bigl( b^i{}_j \bigr)_{i,j}$ is given
by (A.27).

Orthogonality

A non-degenerate symmetric bilinear form b = b(·, ·) can be regarded as a
generalized scalar product, since it can be employed to define several geometric
concepts that are usually associated with a scalar product in a similar way.
The most important of these concepts is the concept of orthogonality.

Definition A.4. Two vectors v and w in V are called (pseudo-)orthogonal, if
b(v, w) = 0.

Since b(·, ·) is in general not the standard scalar product, the concept of
orthogonality differs from the one we are used to in Euclidean geometry. This is
illustrated by the following example.
Example. Consider again the two-dimensional plane with the bilinear form
given by (A.29). A simple calculation shows that the vectors
\[ v = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \quad\text{and}\quad w = \begin{pmatrix} 2 \\ 1 \end{pmatrix} \]

are orthogonal: b(v, w) = 1·2 − 2·1·1 − 2·0·2 − 2·0·1 = 0.

Several basic properties carry over from Euclidean geometry, an important one
being the following:

Proposition A.5. Let V be an n-dimensional vector space endowed with a
non-degenerate symmetric bilinear form and let v ∈ V, v ≠ 0. Then the orthogonal
complement of v is an (n − 1)-dimensional subspace of V.

Proof. The orthogonal complement of v is the set of all vectors w orthogonal
to v, i.e., {w | b(v, w) = 0}. Define $\bar v = b\, v$, where b denotes the (symmetric)
matrix (A.27). Since b is non-singular (so that ker b = 0) and v ≠ 0, it follows
that $\bar v \neq 0$. According to (A.28′′) we can write

\[ b(v, w) = b(w, v) = \langle w, b\, v \rangle = \langle w, \bar v \rangle = \langle \bar v, w \rangle . \]

Therefore, b(v, w) = 0 if and only if $\langle \bar v, w \rangle = 0$, and the orthogonal complement
of v coincides with the standard Euclidean orthogonal complement of $\bar v$. Since
the latter is clearly an (n − 1)-dimensional subspace of V, the proposition is
established.
established.

In Euclidean geometry, the length of a vector v is given as its norm ‖v‖, where
$\|v\| = \sqrt{\langle v, v \rangle}$. A non-degenerate symmetric bilinear form b = b(·, ·), however,
does not define a norm in general. This is simply because there might exist
vectors v such that b(v, v) = 0 or b(v, v) < 0.
Example. Consider again the two-dimensional plane with the bilinear form
given by (A.29). There exist vectors v such that b(v, v) < 0; for example,
\[ \text{for } v = \begin{pmatrix} 0 \\ 1 \end{pmatrix} \text{ we have } b(v, v) = -2 . \]
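
The examples on orthogonality and on squares of vectors w.r.t. the form (A.29) can be checked directly. The sketch below is illustrative; the null vector (2 + √6, 1) is not taken from the text but was obtained here by solving b(v, v) = 0 for the first component, so it is an added illustration of the case b(v, v) = 0.

```python
import math

# Matrix of the bilinear form (A.29).
B_FORM = [[1.0, -2.0],
          [-2.0, -2.0]]


def bilinear(b, v, w):
    """b(v, w) = b_ij v^i w^j."""
    n = len(v)
    return sum(b[i][j] * v[i] * w[j] for i in range(n) for j in range(n))


def square(b, v):
    """The 'squared norm' b(v, v), which may be positive, zero or negative."""
    return bilinear(b, v, v)


orthogonal_pair = bilinear(B_FORM, [1.0, 0.0], [2.0, 1.0])   # vectors of the example
positive_square = square(B_FORM, [1.0, 0.0])
negative_square = square(B_FORM, [0.0, 1.0])
# Added illustration: a non-zero vector with vanishing square (a 'null' vector).
null_square = square(B_FORM, [2.0 + math.sqrt(6.0), 1.0])
print(orthogonal_pair, positive_square, negative_square, null_square)
```

The vectors (1, 0) and (2, 1) are clearly not orthogonal in the Euclidean sense, which illustrates that orthogonality depends on the bilinear form in use.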

Although we are thus unable to define norms in the proper sense, we see that
we can easily ascribe the “squared norm” b(v, v) to each vector v. This concept
is of central importance in non-Euclidean geometry and in applications.
Remark. Occasionally (and in particular in relativity) we speak of the square
of a vector v. Despite being prone to confusion, it is also common to write $v^2$
for the expression b(v, v). (This is reminiscent of the notation $\vec v^{\,2} = \langle \vec v, \vec v \rangle$ in
Euclidean space $\mathbb{R}^3$.)

Based on the above geometric implications of non-degenerate bilinear forms
one often speaks of pseudo-scalar products. To emphasize this, one often goes
so far as to use ⟨·, ·⟩ to denote a bilinear form b(·, ·) even though it might not
be positive definite. We conclude this section with a statement that completes
Proposition A.5.
Proposition A.6. Let V be an n-dimensional vector space endowed with a
pseudo-scalar product b(·, ·). Let v ∈ V and denote by $v^\perp$ the orthogonal
complement of v. Then
\[ v \in v^\perp \quad\Leftrightarrow\quad b(v, v) = 0 . \tag{A.30} \]
Accordingly, when b(v, v) ≠ 0, then $v \notin v^\perp$ and $V = \langle v \rangle \oplus v^\perp$.

Proof. The proof is trivial when we recall that $v^\perp = \{w \mid b(v, w) = 0\}$.

A.4 Transformation of bilinear forms

It is almost a tautological remark when we say that the component
representation of a pseudo-scalar product b(·, ·), in either of its forms (A.28) or (A.28′),
depends on the basis we choose. In the following we investigate how the
component representation $(b_{ij})_{i,j}$ transforms under a change of basis.

Bilinear forms and basis transformations

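Suppose we have two bases in V,

\[ \{\hat e_1, \ldots, \hat e_n\} \quad\text{versus}\quad \{\breve e_1, \ldots, \breve e_n\} . \]

The decomposition of the vector v w.r.t. the first basis is⁶

\[ v = \hat v^i \hat e_i , \quad\text{or equivalently}\quad v \hookrightarrow \hat v = \begin{pmatrix} \hat v^1 \\ \vdots \\ \hat v^n \end{pmatrix} . \tag{A.31a} \]

Analogously, the decomposition of v w.r.t. the second basis is

\[ v = \breve v^i \breve e_i , \quad\text{or equivalently}\quad v \hookrightarrow \breve v = \begin{pmatrix} \breve v^1 \\ \vdots \\ \breve v^n \end{pmatrix} . \tag{A.31b} \]

Evidently, the analogous decompositions hold for the vector w. Also the bilinear
form b = b(·, ·) possesses two component representations, which are given by

\[ \hat b_{ij} = b(\hat e_i, \hat e_j) \quad\text{versus}\quad \breve b_{ij} = b(\breve e_i, \breve e_j) . \]

Accordingly, via (A.28) and (A.28′), b(v, w) becomes

\[ b(v, w) = \hat b_{ij}\, \hat v^i \hat w^j = \hat v^T\, \hat b\, \hat w , \tag{A.32a} \]
\[ b(v, w) = \breve b_{ij}\, \breve v^i \breve w^j = \breve v^T\, \breve b\, \breve w . \tag{A.32b} \]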

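Now, the two bases are related via a basis transformation, i.e.,

\[ \breve e_i = A^j{}_i\, \hat e_j , \]

where the coefficients $A^j{}_i$ can be collected into a non-singular (n × n) matrix.
This implies that

\[ \breve b_{ij} = b(\breve e_i, \breve e_j) = b\bigl(A^k{}_i\, \hat e_k,\, A^l{}_j\, \hat e_l\bigr) = A^k{}_i\, A^l{}_j\, b(\hat e_k, \hat e_l) = A^k{}_i\, A^l{}_j\, \hat b_{kl} , \tag{A.33} \]

or in matrix notation
\[ \breve b = A^T\, \hat b\, A . \tag{A.33'} \]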
⁶ Recall that the representation of a vector v as a column vector presupposes the choice of
a basis, see (A.3). Although this is a trivial fact, it tends to be forgotten easily.

Based on (A.33) we can write (A.32) as

\[ b(v, w) = \hat b_{ij}\, \hat v^i \hat w^j = A^k{}_i\, A^l{}_j\, \hat b_{kl}\, \breve v^i \breve w^j . \tag{A.34} \]

We see that when $\hat b_{ij}$ is used to compute b(v, w) in hatted coordinates, then
$A^k{}_i A^l{}_j \hat b_{kl}$ is used in breve coordinates. We have thus proved the following
proposition:

Proposition A.7. On an n-dimensional vector space consider a bilinear form
whose component representation is $(b_{ij})_{i,j}$ w.r.t. some basis $\{e_1, \ldots, e_n\}$. With
respect to a different basis $\{\bar e_1, \ldots, \bar e_n\}$, the component representation is given
by
\[ \bar b_{ij} = A^k{}_i\, A^l{}_j\, b_{kl} , \tag{A.35} \]

which corresponds to $A^T b\, A$ in matrix notation, where $A = (A^i{}_j)_{i,j}$ is the basis
transformation matrix, $\bar e_i = A^k{}_i\, e_k$.

Remark. The transformation behavior (A.35) does not come as a surprise. In
fact, it reflects the standard transformation of general co-tensors $T_{i_1 \cdots i_n}$, since
$b_{ij}$ can be viewed as a co-tensor of rank 2.
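Example. Consider again the two-dimensional plane that is endowed with the
bilinear form b = b(·, ·), whose component representation w.r.t. some given
basis $\{e_1, e_2\}$ is (A.29), i.e.,
\[ \bigl( b_{ij} \bigr)_{i,j} = \begin{pmatrix} 1 & -2 \\ -2 & -2 \end{pmatrix} . \tag{A.36} \]

Now consider a second basis, $\{\bar e_1, \bar e_2\}$, which is related to the original basis via
a basis transformation, i.e.,
\[ \bar e_i = A^k{}_i\, e_k , \]

where
\[ \bigl( A^i{}_j \bigr)_{i,j} = \frac{1}{\sqrt{30}} \begin{pmatrix} \sqrt{2} & -2\sqrt{3} \\ 2\sqrt{2} & \sqrt{3} \end{pmatrix} . \]

W.r.t. the new basis $\{\bar e_1, \bar e_2\}$ the bilinear form exhibits the component
representation
\[ \bigl( \bar b_{ij} \bigr)_{i,j} = \bigl( A^k{}_i\, A^l{}_j\, b_{kl} \bigr)_{i,j} = \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix} . \tag{A.37} \]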
The relevance of this particular example will become clear below.
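
The transformation law (A.35) can be applied to this example numerically. In the sketch below the entries of A are those of the matrix (1/√30) · ((√2, −2√3), (2√2, √3)), stored as `A[i][j]` for $A^i{}_j$ so that the columns are the new basis vectors; the function name `transform_form` is an illustrative choice.

```python
import math


def transform_form(b, A):
    """bar b_ij = A^k_i A^l_j b_kl, i.e. the matrix A^T b A, cf. (A.35)."""
    n = len(b)
    return [[sum(A[k][i] * A[l][j] * b[k][l] for k in range(n) for l in range(n))
             for j in range(n)] for i in range(n)]


b = [[1.0, -2.0],
     [-2.0, -2.0]]                       # component representation (A.36)

s = 1.0 / math.sqrt(30.0)
A = [[s * math.sqrt(2.0), -2.0 * s * math.sqrt(3.0)],
     [2.0 * s * math.sqrt(2.0), s * math.sqrt(3.0)]]   # A[i][j] = A^i_j

b_bar = transform_form(b, A)
print(b_bar)
```

Up to rounding, the result should be the diagonal matrix with entries −1 and +1 stated in (A.37).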

Orthogonal transformations

Definition A.8. Consider a vector space endowed with a bilinear form b =
b(·, ·). A basis transformation $(A^i{}_j)_{i,j}$ that does not change the component
representation of the bilinear form, i.e.,
\[ \bigl( A^k{}_i\, A^l{}_j\, b_{kl} \bigr)_{i,j} = \bigl( b_{ij} \bigr)_{i,j} \quad (\text{or equivalently } A^T b\, A = b) , \tag{A.38} \]

is called a (pseudo-)orthogonal basis transformation w.r.t. b(·, ·).

Example. Consider the two-dimensional plane with a bilinear form given by
\[ \bigl( b_{ij} \bigr)_{i,j} = \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix} \tag{A.39} \]

w.r.t. some basis $\{e_0, e_1\}$ (where we follow the relativistic convention of
indexing, for a change). A straightforward computation shows that the basis
transformation determined by the matrix
\[ \bigl( A^i{}_j \bigr)_{i,j} = \begin{pmatrix} \sqrt{2} & -1 \\ -1 & \sqrt{2} \end{pmatrix} \]

is an example of an orthogonal basis transformation. This means that w.r.t.
the basis $\{\bar e_0, \bar e_1\}$, given by $\bar e_i = A^j{}_i\, e_j$ (i, j = 0, 1), the bilinear form reads
\[ \bigl( \bar b_{ij} \bigr)_{i,j} = \bigl( A^k{}_i\, A^l{}_j\, b_{kl} \bigr)_{i,j} = \bigl( b_{ij} \bigr)_{i,j} . \]
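
The ‘straightforward computation’ of this example can be delegated to a short script. The sketch below is illustrative: it evaluates condition (A.38) for the form diag(−1, 1) and the matrix with rows (√2, −1) and (−1, √2); the helper name `transform_form` is a made-up choice.

```python
import math


def transform_form(b, A):
    """Components A^k_i A^l_j b_kl of the form in the new basis (matrix A^T b A)."""
    n = len(b)
    return [[sum(A[k][i] * A[l][j] * b[k][l] for k in range(n) for l in range(n))
             for j in range(n)] for i in range(n)]


b = [[-1.0, 0.0],
     [0.0, 1.0]]                      # component representation (A.39)
A = [[math.sqrt(2.0), -1.0],
     [-1.0, math.sqrt(2.0)]]          # A[i][j] = A^i_j

b_bar = transform_form(b, A)
is_pseudo_orthogonal = all(abs(b_bar[i][j] - b[i][j]) < 1e-12
                           for i in range(2) for j in range(2))
print(b_bar, is_pseudo_orthogonal)
```

The matrix A is not orthogonal in the Euclidean sense (its columns have Euclidean length √3), which shows again that the notion depends on the bilinear form.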

Remark. In our analysis we have focused on basis transformations on V and
on the associated transformation of the component representation of a bilinear
form b(·, ·). If we consider linear maps instead of basis transformations,⁷ the
discussion is similar: Consider a vector space V with bilinear form b(·, ·), a
vector space $\bar V$ with bilinear form $\bar b(\cdot, \cdot)$, and a linear map $A : V \to \bar V$. Then
A is called orthogonal, if $\bar b(Av, Aw) = b(v, w)$ for all v, w ∈ V. Furthermore, if
$(A^i{}_p)_{i,p}$ is the matrix representing the linear map A (w.r.t. bases $\{e_1, \ldots, e_n\}$
in V and $\{\bar e_1, \ldots, \bar e_{\bar n}\}$ in $\bar V$), then $(A^i{}_p)_{i,p}$ satisfies $A^i{}_p\, A^j{}_q\, \bar b_{ij} = b_{pq}$. If
$\bigl(\bar V, \bar b(\cdot, \cdot)\bigr) = \bigl(V, b(\cdot, \cdot)\bigr)$, then this reduces to (A.38).
⁷ Linear maps are sometimes called active transformations, while basis transformations are
called passive transformations. We have chosen to use the nomenclature that is more
common in mathematics.

Orthonormal bases

In the context of scalar products, orthonormal bases play an important role.
This is because the component representation of a scalar product ⟨·, ·⟩ assumes
its simplest form w.r.t. an orthonormal basis: if $\{e_1, \ldots, e_n\}$ is an orthonormal
basis w.r.t. ⟨·, ·⟩, then the component representation of ⟨·, ·⟩ is

\[ \langle v, w \rangle = \delta_{ij}\, v^i w^j = v^T \mathbb{1}\, w , \]
or equivalently $\langle v, w \rangle = v^T w = \sum_{i=1}^{n} v^i w^i$. This is generalized to pseudo-scalar
products by virtue of the following theorem.

Theorem A.9. Let b(·, ·) be a non-degenerate symmetric bilinear form on V .
There exist adapted bases {e1 , . . . , en }, which we call orthonormal bases, such
that the component representation $b_{ij}$ is diagonal and normalized, i.e.,

$b = (b_{ij})_{i,j} = \operatorname{diag}(\underbrace{-1, -1, \ldots, -1}_{n_- \text{ times}}, \underbrace{+1, +1, \ldots, +1}_{n_+ \text{ times}})$, (A.40)

where $n_-$ and $n_+$ are characteristic of the bilinear form b(·, ·).

Proof. We perform a proof by induction. For a one-dimensional vector space,
the statement of the theorem is trivial. Suppose that we have proved the
theorem for all vector spaces of dimension less than n. To show that the theorem
also holds for n-dimensional vector spaces, let (V, b(·, ·)) be n-dimensional.
Take an arbitrary vector with non-zero square and call it e1 . (Such a vector
exists, since b(·, ·) is symmetric and non-degenerate.) Since b(e1 , e1 ) ≠ 0 we are
able to rescale this vector to obtain either b(e1 , e1 ) = +1 or b(e1 , e1 ) = −1. By
Proposition A.5 the orthogonal complement of e1 is an (n − 1)-dimensional vector space
T (a subspace of V ), which is naturally endowed with a symmetric bilinear
form, namely the restriction b(·, ·)|T×T of the bilinear form b(·, ·). This bilinear
form on T is non-degenerate as is shown by the following argument: Let v ∈ T .
Assume that b(v, w) = 0 for all w ∈ T ; then b(v, λe1 + w) = 0 for all λ ∈ R
(because b(v, e1 ) = 0 for v ∈ T ).
Since ⟨e1 ⟩ ⊕ T = V by Proposition A.6, it follows that b(v, w) = 0 for all w ∈ V .
Hence v = 0 by non-degeneracy of b(·, ·), which establishes non-degeneracy of
b(·, ·)|T×T .

Now, by the induction hypothesis, the orthogonal complement T possesses an
orthonormal basis w.r.t. b(·, ·)|T×T , which we denote by {e2 , . . . , en }. We have
thus constructed an orthonormal basis {e1 , e2 , . . . , en } of (V, b(·, ·)), such that,
after a possible rearrangement of the basis vectors, we obtain (A.40). The fact


that the number of minus-signs and plus-signs does not depend on the actual
choice of basis follows from dimensional considerations.

Definition A.10. The signature of a non-degenerate symmetric bilinear form
b(·, ·) is defined to be the distribution of minus-signs and plus-signs in the
normal form (A.40) of b, i.e.,

$\operatorname{sign} b = (\underbrace{-\,-\,\cdots\,-\,-}_{n_- \text{ times}}\ \underbrace{+\,+\,\cdots\,+\,+}_{n_+ \text{ times}})$. (A.41)

Example. The signature of the bilinear form (A.36) is (−+), since there exists
an orthonormal basis, w.r.t. which it takes the form (A.37).

W.r.t. an orthonormal basis the pseudo-scalar product of two vectors v and w
can be written as

$b(v, w) = v^T \operatorname{diag}(-1, \ldots, -1, +1, \ldots, +1)\, w = -\sum_{i \le n_-} v^i w^i + \sum_{i > n_-} v^i w^i$.

As with conventional scalar products, an orthonormal basis w.r.t. b(·, ·) is not
unique. If we have one orthonormal basis {e1 , . . . , en }, then other orthonormal
bases can be obtained via orthogonal basis transformations, since these
transformations do not change the form $b = \operatorname{diag}(-1, \ldots, -1, +1, \ldots, +1)$.

The signature of a non-degenerate symmetric bilinear form b(·, ·) can be ob-


tained easily by computing the eigenvalues of its component representation bij .
We formulate this as another theorem:
Theorem A.11. The signature of b(·, ·) is given by the number of negative
versus the number of positive eigenvalues of b, i.e.,

$\operatorname{sign} b = (\underbrace{-\,-\,\cdots\,-\,-}_{n_- \text{ times}}\ \underbrace{+\,+\,\cdots\,+\,+}_{n_+ \text{ times}})$ ⇔ b possesses $n_-$ negative and $n_+$ positive eigenvalues.

Proof. The proof of this theorem can also be regarded as an alternative proof
of Theorem A.9. Starting from a given basis {e1 , . . . , en }, where b(·, ·) is rep-
resented by bij , we explicitly construct an orthonormal basis {ē1 , . . . , ēn }.


To that end recall from (A.35) that with $\bar e_i = A^j{}_i\, e_j$ the components $b_{ij}$
transform according to $\bar b_{ij} = A^k{}_i\, A^l{}_j\, b_{kl}$, or, in matrix notation,

$\bar b = A^T b A$. (A.42)

Viewed as a matrix, b is symmetric and non-singular; therefore, b can be
diagonalized by means of an orthogonal matrix O,

$O^{-1} b\, O = O^T b\, O = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$. (A.43)

The eigenvalues λi are real (since b is symmetric) and non-zero (since b is not
singular). Assume that there exist n− negative eigenvalues and n+ positive
eigenvalues. Without loss of generality we may arrange the eigenvalues in such
an order that λ1 < 0, . . . , λn− < 0 and λn− +1 > 0, . . . , λn > 0. Let us
introduce the matrix
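$\Lambda = \operatorname{diag}\bigl(\sqrt{|\lambda_1|}, \ldots, \sqrt{|\lambda_n|}\bigr)$;

the diagonal matrix diag (λ1 , . . . , λn ) admits the following (unique) decomposition

$\operatorname{diag}(\lambda_1, \ldots, \lambda_n) = \Lambda\, \operatorname{diag}(\underbrace{-1, \ldots, -1}_{n_- \text{ times}}, \underbrace{+1, \ldots, +1}_{n_+ \text{ times}})\, \Lambda$.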

Combining this equation with (A.43) we find

$\underbrace{\Lambda^{-1} O^T}_{A^T}\; b\; \underbrace{O\, \Lambda^{-1}}_{A} = \operatorname{diag}(\underbrace{-1, \ldots, -1}_{n_- \text{ times}}, \underbrace{+1, \ldots, +1}_{n_+ \text{ times}})$.

Therefore, by defining $A = (A^i{}_j)_{i,j}$ to be $A = O\, \Lambda^{-1}$, we achieve

$A^T b A = \bar b = \operatorname{diag}(\underbrace{-1, \ldots, -1}_{n_- \text{ times}}, \underbrace{+1, \ldots, +1}_{n_+ \text{ times}})$, (A.44)

as intended. We have thus found an orthonormal basis {ē1 , . . . , ēn }, in which
the components of b(·, ·) are diagonal and normalized,

$b(\bar e_i, \bar e_j) = \bar b_{ij} = \operatorname{diag}(\underbrace{-1, \ldots, -1}_{n_- \text{ times}}, \underbrace{+1, \ldots, +1}_{n_+ \text{ times}})$.

This ends the proof of the theorem.
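The construction used in the proof (diagonalize with an orthogonal matrix O, then rescale with Λ⁻¹) can be sketched for the simplest non-trivial case of a symmetric, non-singular 2×2 matrix. This is only an illustration under stated assumptions: the function name `normalizing_transformation`, the closed-form rotation angle used to diagonalize a 2×2 matrix, and the sample matrix `b` are our own choices and do not appear in the notes.

```python
import math

def matmul(X, Y):
    """Product of two matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def normalizing_transformation(b):
    """Return (A, eigenvalues) such that A^T b A = diag(+-1), negative entries
    first, for a symmetric and non-singular 2x2 matrix b."""
    # Step 1: rotation O with O^T b O diagonal, cf. (A.43).
    theta = 0.5 * math.atan2(2.0 * b[0][1], b[0][0] - b[1][1])
    c, s = math.cos(theta), math.sin(theta)
    O = [[c, -s], [s, c]]
    D = matmul(transpose(O), matmul(b, O))
    lam = [D[0][0], D[1][1]]
    # Step 2: reorder the columns of O so that negative eigenvalues come first.
    order = sorted(range(2), key=lambda i: lam[i])
    O = [[O[r][i] for i in order] for r in range(2)]
    lam = [lam[i] for i in order]
    # Step 3: A = O Lambda^{-1} with Lambda = diag(sqrt(|lambda_i|)).
    A = [[O[r][i] / math.sqrt(abs(lam[i])) for i in range(2)] for r in range(2)]
    return A, lam

b = [[1.0, 2.0], [2.0, 1.0]]     # hypothetical example with eigenvalues -1 and 3
A, eigenvalues = normalizing_transformation(b)
b_bar = matmul(transpose(A), matmul(b, A))   # should be diag(-1, +1), cf. (A.44)
print(eigenvalues, b_bar)
```

Since the sample matrix has one negative and one positive eigenvalue, its signature is (−+), in agreement with Theorem A.11.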


A.5 Raising and lowering of indices

A non-degenerate symmetric bilinear form b(·, ·) generates a canonical map
between V and its dual space V ∗ . Using the musical notation it is called ♭
(“flat”) and its inverse is $\sharp = \flat^{-1}$ (“sharp”):

$\flat : V \to V^*$ (A.45a)
$\sharp = \flat^{-1} : V \leftarrow V^*$ (A.45b)

The map ♭ : V → V ∗ is defined according to

$\flat : V \to V^*$ (A.46a)
$v \mapsto \flat(v) := b(v, \cdot) \in V^*$; (A.46b)

in other words, when v is a vector, ♭(v) is a covector that takes arguments
w ∈ V according to

$\flat(v)(w) = b(v, w)$. (A.47)

The map ♭ is a map of rank n, which relies on the non-degeneracy of b(·, ·). In
other words, ♭(v) ≠ 0 whenever v ≠ 0.

Written in components, we see that this abstract definition lays the foundation
for the procedure commonly known as “lowering of indices”. Let us write

$v = v^i e_i$ and $\flat(v) = v^\flat = v^\flat_i\, e^i$. (A.48)

To obtain the i-th component $v^\flat_i$ of the covector we use (A.12):

$v^\flat_i = v^\flat(e_i) = \flat(v)(e_i) = b(v, e_i) = v^j\, b(e_j, e_i) = b_{ji}\, v^j = b_{ij}\, v^j$.

It is customary to drop the ♭ and simply write

$v_i = b_{ij}\, v^j$. (A.49)

Although this notation is slightly misleading, it offers so many advantages that
we will always use it. It is important, however, to always keep in mind that
while $v^i \in V$, $v_i$ is not a vector but a covector, $v_i \in V^*$. The abstract index
notation is particularly useful in this context.

The counterpart of (A.49) is the procedure we call “raising of indices” that
arises when we use the map ♯ instead of ♭:

$v^i = b^{ij}\, v_j$. (A.50)



Since $\sharp = \flat^{-1}$, $(b^{ij})_{i,j}$ denotes the inverse matrix of $(b_{ij})_{i,j}$, i.e.,

$b^{ij}\, b_{jk} = \delta^i{}_k$. (A.51)

It suggests itself that in order to lower/raise indices of a general tensor we have
to contract every index, i.e.,

$T^{i_1 \cdots i_n}{}_{j_1 \cdots j_m} = b^{i_1 k_1} \cdots b^{i_n k_n}\, b_{j_1 l_1} \cdots b_{j_m l_m}\, T_{k_1 \cdots k_n}{}^{l_1 \cdots l_m}$.

Remark. By definition, we obtain $b^{ij}$ from $b_{ij}$ when we take the inverse matrix.
However, it is equally possible to compute $b^{ij}$ from $b_{ij}$ by raising both indices.
To see that we compute

$b^{ij} = \delta^i{}_l\, b^{jl} = b^{ik}\, b_{kl}\, b^{jl} = b^{ik}\, b^{jl}\, b_{kl}$,

which proves the claim.

Using raising and lowering of indices provides a means to write the pseudo-scalar
product in a convenient form. Namely, instead of $b(v, w) = b_{ij}\, v^i w^j$ we
are able to write

$b(v, w) = v^i w_i = v_i w^i$. (A.52)

Accordingly, the square of a vector becomes

$v^2 = b(v, v) = v^i v_i = v_i v^i$. (A.53)

In applications in relativity, where we use Greek indices that run from zero to
three, expressions of this kind are abundant:

$v^\mu w_\mu$, $v^2 = v^\mu v_\mu$, etc.
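As a small illustration of (A.49), (A.50) and (A.52), the following sketch lowers and raises indices with a diagonal metric of signature (−+++), the normal form used in these notes. The function names `lower`, `raise_index` and `contract`, as well as the sample vectors, are hypothetical and chosen only for this example.

```python
# Assumed metric: components diag(-1, +1, +1, +1) w.r.t. an orthonormal basis.
ETA = [[-1, 0, 0, 0],
       [0, 1, 0, 0],
       [0, 0, 1, 0],
       [0, 0, 0, 1]]
ETA_INV = ETA   # for this particular matrix the inverse equals the matrix itself

def lower(b, v_up):
    """v_i = b_ij v^j, cf. (A.49)."""
    n = len(v_up)
    return [sum(b[i][j] * v_up[j] for j in range(n)) for i in range(n)]

def raise_index(b_inv, v_down):
    """v^i = b^ij v_j, cf. (A.50); b_inv must be the inverse matrix of b."""
    n = len(v_down)
    return [sum(b_inv[i][j] * v_down[j] for j in range(n)) for i in range(n)]

def contract(up, down):
    """Contraction v^i w_i of a vector with a covector, cf. (A.52)."""
    return sum(a * c for a, c in zip(up, down))

v = [2, 1, 0, 0]
w = [3, 0, 1, 0]

v_low = lower(ETA, v)                 # components v_i
b_vw = contract(v, lower(ETA, w))     # b(v, w) = v^i w_i
v_squared = contract(v, v_low)        # v^2 = v^i v_i
print(v_low, b_vw, v_squared)
```

Raising the lowered components again with `raise_index(ETA_INV, v_low)` returns the original components, which reflects ♯ = ♭⁻¹.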

A.6 Affine geometry

Alternative A: Non-technical introduction

Consider an abstract vector space V . By virtue of the vector space axioms,


elements of V (a.k.a. vectors) can be added and multiplied with scalars. In
particular, there exists one element of V that is distinguished from all other
elements: the zero vector. The world of vectors is thus not a world where
equality reigns. Let us go over to affine spaces.


Definition A.12. An affine space over a vector space V is a set A that can
be identified with V provided that one (arbitrary) element is distinguished as
the origin. The elements of an affine space are called points.

An affine space can thus be viewed as a “forgetful vector space”—a vector space
that has lost its information on the zero vector. By actively choosing an origin,
the affine space becomes a vector space again. Note, however, the immanent
arbitrariness: any point can be chosen as the origin. As a consequence, all
points in an affine space are equal. There is a price to pay, though:

Although, by definition, we can choose an origin o (which can be any point of
A) and turn A into a vector space,

$A \ni p \longleftrightarrow \vec{op} \in V$,

this does not provide a sensible way of defining addition and scalar multiplication
of points. This is because the definition of these operations cannot be made
independently of the concrete choice of origin. The complete arbitrariness of
the choice of the origin o thus prohibits that addition and scalar multiplication
of points of A can be defined canonically.
Example. Consider the affine 2-plane A2 over R2 . There is no canonical
prescription defining addition of points p, q ∈ A2 ; neither does λp for λ ∈ R make
sense. When we distinguish one point o ∈ A2 , then A2 becomes a vector space
with vectors $\vec{op}$, $\vec{oq}$, . . . . Although it is possible to add $\vec{op} + \vec{oq}$, the result is
clearly not invariant under a change of origin (and thus pretty useless).
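The origin dependence described in the example can be made concrete with a short sketch. We model points of the affine 2-plane by pairs of integers (itself a choice of chart, made here only for illustration); the function names and the sample points are hypothetical.

```python
def vec(p, q):
    """The vector from point p to point q."""
    return (q[0] - p[0], q[1] - p[1])

def translate(p, v):
    """The point p + v."""
    return (p[0] + v[0], p[1] + v[1])

def add_points_via_origin(o, p, q):
    """'Sum' of the points p and q after declaring o to be the origin:
    identify p and q with the vectors op and oq, add them, and go back."""
    op, oq = vec(o, p), vec(o, q)
    return translate(o, (op[0] + oq[0], op[1] + oq[1]))

p, q = (1, 2), (3, 1)
sum_with_first_origin = add_points_via_origin((0, 0), p, q)    # (4, 3)
sum_with_second_origin = add_points_via_origin((1, 1), p, q)   # (3, 2)
print(sum_with_first_origin, sum_with_second_origin)

# In contrast, adding a vector to a point involves no choice of origin:
print(translate(p, vec(p, q)) == q)
```

The two "sums" differ, whereas the translation of a point by a vector does not refer to any origin at all.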

While points in A cannot be added nor multiplied with scalars, it is intuitively


clear that we can “add” points of A and vectors of V , i.e., for any p ∈ A,
v ∈ V , the “sum” p + v is a well-defined element of A (and this procedure is
independent of any choice of origin).

Appealing to our mathematical intuition once again, we see that for any pair
of points p, q ∈ A there exists a unique vector v such that p + v = q; one writes
v = q − p or $v = \vec{pq}$. Loosely speaking, we can say that (a copy of) the vector
space V is attached to each point p ∈ A.

Alternative B: Technical introduction

We begin by introducing the concept of a group action, which is relevant for


many applications in theoretical physics. Let X denote a set and G a group.


Definition A.13. A group G is said to act on a set X (on the left) if every
g ∈ G induces a bijective map gX on the set X, i.e.,

$g_X : X \to X$ (A.54a)
$x \mapsto g \cdot x$, (A.54b)

such that

$g_1 \cdot (g_2 \cdot x) = (g_1 \cdot g_2) \cdot x \quad \forall x \in X,\ \forall g_1, g_2 \in G$, (A.55a)
$e \cdot x = x \quad \forall x \in X$, (A.55b)

where e denotes the identity element of the group G.

Remark. The conditions (A.55) ensure that the mapping G → Bij(X), g ↦ gX
is a homomorphism of groups.

The notation
x ↦ g · x
is chosen to suggest that elements of X can be “multiplied” with elements of G
(on the left). In some cases (when the group operation of G is denoted by a
plus sign instead of by a dot) it is preferable to write

x ↦ x + g

instead of x ↦ g · x, to suggest that one can “add” an element of G to an
element of X. Accordingly, equation (A.55) then reads

(x + g1 ) + g2 = x + (g1 + g2 ) , x + 0 = x ,

where now 0 denotes the identity element of G. (We have chosen an action on
the right in this case; however, since (G, +) is commonly used for abelian groups,
this makes no actual difference.)

It is important to note that group action is neither multiplication nor addition


in the standard sense (groups, rings, fields), since those standard algebraic
operations are defined on one single set only, namely as maps G × G → G.
Example (Scalar multiplication in vector spaces). Consider a vector space V
over the field of the real numbers R. The vector space axioms guarantee the
existence of an action of (the multiplication group underlying) R on V . This
is the scalar multiplication

v ↦ λ · v (λ ∈ R, v ∈ V ) . (A.56)


For all λ, µ ∈ R we have

λ · (µ · v) = (λµ) · v and 1 · v = v .

It is standard to omit the dots and simply write λ(µv) = (λµ)v and 1v = v.
Example. The general linear group GL(n, R) (special linear group, orthogonal
group,. . . ) acts on the vector space Rn as a group of endomorphisms.

There exist several types of group actions. We merely discuss the type that
is essential for our purposes: A group action is called simply transitive, if for
any two elements x1 , x2 ∈ X there exists exactly one element g ∈ G such that
g · x1 = x2 (or x1 + g = x2 in the ‘+’ notation).
Definition A.14. An affine space over a vector space V is a set A, on which
the (abelian group underlying the) vector space V acts simply transitively. The
elements of an affine space are called points.

Hence, for all p ∈ A, for all v ∈ V , there exists the action

p ↦ p + v . (A.57)

Furthermore, since the action is assumed to be simply transitive, for each pair
of points p, q ∈ A there exists a unique vector v such that p + v = q. One
typically writes v = q − p or $v = \vec{pq}$.

It follows that, if we distinguish one point in A, which we call o (the origin),
then each point p ∈ A can be identified with $\vec{op} \in V$ and conversely,

$A \ni p \longleftrightarrow \vec{op} \in V$.

In this way, A can be identified with the underlying vector space V . However,
this identification is not canonical, since the choice of origin is completely free.

Affine coordinates

A natural question to ask is how an affine space A can be endowed with a
coordinate system. In this context a coordinate system (or chart) is simply
a bijective map from Rn to A; it assigns to each point p in A a real n-tuple (‘its
coordinates’). Since the affine space A is modeled over a vector space V , it
suggests itself to use this structure in order to construct a coordinate system.
Let us begin with the following definition.


Definition A.15. Let A be an affine space over an n-dimensional vector space
V . An affine basis is a tuple (o, {e1 , . . . , en }), where o is a point in A (‘the
origin’) and {e1 , . . . , en } a basis of V .

By using an affine basis we obtain coordinates on A in a simple way: Let p ∈ A.
To assign coordinates to p, we first note that p = o + x, where $x = \vec{op} \in V$, i.e.,

$A \ni p \longleftrightarrow x = \vec{op} \in V$.

Obviously, the vector x ∈ V is described by its components w.r.t. the basis,
i.e., $x = x^i e_i$,

$x = \vec{op} \in V \longleftrightarrow (x^i) \in \mathbb{R}^n$.

Hence the coordinates of p are given by

$A \ni p \longleftrightarrow x = \vec{op} \in V \longleftrightarrow (x^i) \in \mathbb{R}^n$. (A.58)

Definition A.16. A coordinate map Rn → A of the type (A.58) is called an


affine chart or an affine coordinate system.

Suppose we have two affine bases on an affine space A, i.e.,

(o, {e1 , . . . , en }) versus (ō, {ē1 , . . . , ēn }) ,

where the bases {e1 , . . . , en } and {ē1 , . . . , ēn } on V are related by the basis
transformation8

$e_i = G^j{}_i\, \bar e_j$, (A.59a)

where $G^i{}_k$ is a non-singular matrix. Let $v^i$ denote the components of some
vector w.r.t. the basis {e1 , . . . , en } and $\bar v^i$ its components w.r.t. {ē1 , . . . , ēn }.
Then (A.59a) induces the transformation

$\bar v^i = G^i{}_k\, v^k$ (A.59b)

on the vector components.

Now consider a point p ∈ A, whose affine coordinates w.r.t. the two affine bases
are

p ↔ $(x^i) \in \mathbb{R}^n$ w.r.t. (o, {e1 , . . . , en }) and p ↔ $(\bar x^i) \in \mathbb{R}^n$ w.r.t. (ō, {ē1 , . . . , ēn })
8
Previously, in section A.4, we typically used the form $\bar e_i = A^j{}_i\, e_j$ for a basis
transformation and called $(A^k{}_l)_{k,l}$ the basis transformation matrix. The previous equation clearly
corresponds to (A.59a) with $A = G^{-1}$.


respectively, where $x = x^i e_i = \vec{op}$ and $\bar x = \bar x^i \bar e_i = \vec{\bar o p}$. Hence the relation
$\vec{\bar o p} = \vec{op} + \vec{\bar o o}$ can simply be rewritten as $\bar x^i \bar e_i = x^i e_i + \bar a^i \bar e_i$. The vector $\bar a^i \bar e_i$
represents the translation between o and ō, i.e., $\bar a^k \bar e_k = \vec{\bar o o}$. Using (A.59a) it is
simple to convince oneself that the affine coordinates $x^i$ and $\bar x^i$ are related by

$\bar x^i = G^i{}_k\, x^k + \bar a^i$. (A.60)

Equation (A.60) represents a change of affine coordinates. It is illustrative to
juxtapose (A.59b) and (A.60).
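The change of affine coordinates can be checked on a concrete example. The sketch below models A and V by pairs of numbers and uses exact rational arithmetic; the two affine bases, the point p, and the helper names `components` and `diff` are hypothetical choices made only for this illustration.

```python
from fractions import Fraction

def components(basis, v):
    """Components of v in R^2 w.r.t. basis = (e1, e2), via Cramer's rule."""
    (a, c), (b, d) = basis                  # e1 = (a, c), e2 = (b, d)
    det = a * d - b * c
    return (Fraction(v[0] * d - b * v[1], det),
            Fraction(a * v[1] - c * v[0], det))

def diff(p, q):
    """The vector from point p to point q."""
    return (q[0] - p[0], q[1] - p[1])

# Two affine bases (origin, basis of V) and a sample point.
o, e = (0, 0), ((1, 0), (1, 1))
o_bar, e_bar = (1, 1), ((2, 0), (0, 1))
p = (3, 2)

x = components(e, diff(o, p))                  # affine coordinates x^i
x_bar = components(e_bar, diff(o_bar, p))      # affine coordinates xbar^i

# G^j_i from e_i = G^j_i ebar_j, cf. (A.59a): column i holds the components
# of e_i w.r.t. the barred basis; G[j][i] stores G^j_i.
cols = [components(e_bar, e_i) for e_i in e]
G = [[cols[i][j] for i in range(2)] for j in range(2)]

a_bar = components(e_bar, diff(o_bar, o))      # translation vector from obar to o

# Right-hand side of (A.60): xbar^i = G^i_k x^k + abar^i
x_bar_from_formula = tuple(sum(G[i][k] * x[k] for k in range(2)) + a_bar[i]
                           for i in range(2))
print(x, x_bar, x_bar_from_formula)
```

Both ways of computing the barred coordinates agree, which is the content of (A.60); dropping `a_bar` from the formula reproduces the purely linear rule (A.59b) for vector components.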

As a matter of course it is not necessary to use affine coordinates on A. Instead


we can employ any set of curvilinear coordinates that come to mind: spherical
coordinates, cylindrical coordinates, . . . . However, affine coordinates exhibit
one characteristic property that makes them stand out from the crowd of all
possible coordinate systems:
Proposition A.17. Let A be an affine space over a real9 vector space. A
coordinate system ϕ : Rn → A is an affine coordinate system if and only if all
straight lines in A appear as straight lines in Rn .
Remark. We must append the following definitions: An m-dimensional affine
subspace of A is a set p + ⟨v1 , v2 , . . . , vm ⟩, where {v1 , . . . , vm } is a set of
m linearly independent vectors. One-dimensional affine subspaces are called
(straight) lines.

For a proof of the proposition see any textbook on affine geometry.

Orthonormal affine coordinates

Remark. Suppose that the vector space V is endowed with a pseudo-scalar
product b(·, ·). Under a change of basis (A.59a), the components transform
according to

$b_{ij} = G^k{}_i\, G^l{}_j\, \bar b_{kl}$,

where $b_{ij}$ are the components of the pseudo-scalar product w.r.t. the basis
{e1 , . . . , en } and $\bar b_{ij}$ the components w.r.t. {ē1 , . . . , ēn }. This suggests defining
a subclass of affine coordinate changes (A.60) that is of particular significance:
(pseudo-)orthogonal coordinate changes. In this case, $G^i{}_j$ is a (pseudo-)orthogonal
basis transformation so that the component representation $b_{ij}$ remains
unaffected by the coordinate change. The translation vector remains unspecified
in this context.
9
The assumption that the field be R is important in this context.


Definition A.18. Let A be an affine space over an n-dimensional vector space
V that is endowed with a pseudo-scalar product. An orthonormal affine basis
is a tuple (o, {e1 , . . . , en }), where o is a point in A and {e1 , . . . , en } an
orthonormal basis of V .

The affine coordinate system of A that is obtained by using an orthonormal
affine basis is called orthonormal affine coordinate system. W.r.t. these
coordinates, since {e1 , . . . , en } is an orthonormal basis of V , the pseudo-scalar
product b(·, ·) on V assumes the normal form

$(b_{ij})_{i,j} = \operatorname{diag}(\underbrace{-1, -1, \ldots, -1}_{n_- \text{ times}}, \underbrace{+1, +1, \ldots, +1}_{n_+ \text{ times}})$,

see Theorem A.9.

A.7 Transformation of scalar fields

Let A be an abstract space (like an affine space or a manifold) and let

ψ : Rn → A, (t, x) ↦ ψ(t, x) and ψ′ : Rn → A, (t′ , x′ ) ↦ ψ′ (t′ , x′ )

be two different coordinate systems on A (where the fact that the coordinates
are denoted by a tuple (t, x) instead of only x is of course completely irrelevant).
A particular point p ∈ A has thus two different coordinate representations

$\mathbb{R}^n \xrightarrow{\ \psi\ } A \xleftarrow{\ \psi'\ } \mathbb{R}^n$, (t, x) ↦ p ↤ (t′ , x′ ) ,

and the coordinate transformation between the two charts is given by the map

$\varphi = (\psi')^{-1} \circ \psi : \mathbb{R}^n \to \mathbb{R}^n$, (t, x) ↦ (t′ , x′ )

that maps (t, x) into (t′ , x′ ).

Consider a function

Φ : A → R, p ↦ Φ(p)


on the abstract space A. To obtain its coordinate representations we must
compose Φ with the charts. W.r.t. the coordinates given by the chart ψ, the
function Φ looks like

$\phi : (t, x) \xrightarrow{\ \psi\ } p = \psi(t, x) \xrightarrow{\ \Phi\ } \Phi(p) = \Phi(\psi(t, x)) = (\Phi \circ \psi)(t, x)$.

W.r.t. the coordinates given by the chart ψ′ we obtain

$\phi' : (t', x') \xrightarrow{\ \psi'\ } p = \psi'(t', x') \xrightarrow{\ \Phi\ } \Phi(p) = \Phi(\psi'(t', x')) = (\Phi \circ \psi')(t', x')$.

Hence, in brief, the coordinate representations of the abstract function Φ are

$\phi = \Phi \circ \psi$ and $\phi' = \Phi \circ \psi'$.

Since the second relation gives $\Phi = \phi' \circ (\psi')^{-1}$, we obtain

$\phi = \Phi \circ \psi = \bigl(\phi' \circ (\psi')^{-1}\bigr) \circ \psi = \phi' \circ \bigl((\psi')^{-1} \circ \psi\bigr) = \phi' \circ \varphi$,

i.e., the functions φ and φ′ behave under coordinate transformations ϕ according
to

$\phi = \phi' \circ \varphi$,

which can also be written as φ(t, x) = φ′ (ϕ(t, x)), or, since (t′ , x′ ) = ϕ(t, x), as

$\phi(t, x) = \phi'(t', x')$. (A.61)

This is the standard transformation behavior of scalar fields.
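The relation (A.61) can be illustrated with two concrete charts. In the sketch below the abstract space is modeled by pairs of numbers, and the two charts differ by a Galilean boost with velocity `U`; the charts, the field `Phi` and the value of `U` are hypothetical choices made only for this example.

```python
U = 3   # assumed relative velocity between the two charts

def psi(t, x):
    """Chart psi: coordinates (t, x) to a point of the abstract space."""
    return (t, x)

def psi_prime(t_p, x_p):
    """Chart psi': coordinates (t', x') to a point of the abstract space."""
    return (t_p, x_p + U * t_p)

def psi_prime_inverse(p):
    t, x = p
    return (t, x - U * t)

def varphi(t, x):
    """Coordinate transformation (psi')^{-1} o psi, here (t, x) -> (t, x - U t)."""
    return psi_prime_inverse(psi(t, x))

def Phi(p):
    """A scalar function on the abstract space."""
    t, x = p
    return x * x + t

def phi(t, x):
    """Coordinate representation of Phi w.r.t. psi."""
    return Phi(psi(t, x))

def phi_prime(t_p, x_p):
    """Coordinate representation of Phi w.r.t. psi'."""
    return Phi(psi_prime(t_p, x_p))

t, x = 2, 5
t_p, x_p = varphi(t, x)
print(phi(t, x), phi_prime(t_p, x_p), phi_prime(t, x))
```

The first two printed values coincide, as (A.61) demands, whereas evaluating φ′ at the unprimed coordinate values generally gives a different number: φ and φ′ are different functions on Rn that represent the same function Φ on A.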



APPENDIX B

(ANTI) SYMMETRIZATION

Consider an arbitrary tensor $T_{i_1 \cdots i_k}$. Let $P_{(1 \cdots k)}$ denote the set of all k!
permutations of the k-tuple (1, 2, . . . , k). Then the symmetric part of the tensor
$T_{i_1 \cdots i_k}$ is defined as

$T_{(i_1 \cdots i_k)} := \frac{1}{k!} \sum_{\sigma \in P_{(1 \cdots k)}} T_{i_{\sigma(1)} \cdots i_{\sigma(k)}}$. (B.1)

Likewise, the totally antisymmetric part is defined as

$T_{[i_1 \cdots i_k]} := \frac{1}{k!} \sum_{\sigma \in P_{(1 \cdots k)}} \operatorname{sgn}\sigma\; T_{i_{\sigma(1)} \cdots i_{\sigma(k)}}$, (B.2)

where sgn σ = ±1 is the sign of the permutation.

Example. For a tensor $A_{ij}$ we have

$A_{(ij)} = \tfrac{1}{2}\bigl(A_{ij} + A_{ji}\bigr)$, $A_{[ij]} = \tfrac{1}{2}\bigl(A_{ij} - A_{ji}\bigr)$. (B.3)

Therefore,
Aij = A(ij) + A[ij] , (B.4)
which shows that every tensor of rank 2 (which corresponds to a matrix) can
be uniquely decomposed into its symmetric plus its antisymmetric part.
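The definitions (B.1) and (B.2) translate directly into a sum over permutations. The following sketch stores a tensor as a dictionary from index tuples to numbers; this representation, the function names `perm_sign` and `project`, and the sample components are assumptions made for illustration only.

```python
from fractions import Fraction
from itertools import permutations, product
from math import factorial

def perm_sign(perm):
    """Sign of a permutation, computed by counting inversions."""
    inversions = sum(1 for a in range(len(perm))
                     for b in range(a + 1, len(perm)) if perm[a] > perm[b])
    return -1 if inversions % 2 else 1

def project(T, k, n, antisymmetric):
    """Symmetric part (B.1) or totally antisymmetric part (B.2) of a rank-k
    tensor T whose indices take the values 0, ..., n-1."""
    out = {}
    for idx in product(range(n), repeat=k):
        total = Fraction(0)
        for sigma in permutations(range(k)):
            sign = perm_sign(sigma) if antisymmetric else 1
            total += sign * T[tuple(idx[s] for s in sigma)]
        out[idx] = total / factorial(k)
    return out

# A rank-2 tensor in two dimensions with arbitrary sample components.
A = {(0, 0): 1, (0, 1): 2, (1, 0): 4, (1, 1): 3}
A_sym = project(A, 2, 2, antisymmetric=False)
A_anti = project(A, 2, 2, antisymmetric=True)

# Decomposition (B.4): A_ij = A_(ij) + A_[ij]
decomposes = all(A_sym[idx] + A_anti[idx] == A[idx] for idx in A)
print(A_sym[(0, 1)], A_anti[(0, 1)], decomposes)
```

For rank 2 the two parts add up to the original tensor; repeating the experiment with a generic rank-3 tensor shows that the sum of the two parts no longer reproduces the tensor.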


Example. For a tensor $a_i b_j$ constructed from two covectors $a_i$ and $b_j$ we have

$a_{(i} b_{j)} = \tfrac{1}{2}\bigl(a_i b_j + a_j b_i\bigr)$, $a_{[i} b_{j]} = \tfrac{1}{2}\bigl(a_i b_j - a_j b_i\bigr)$. (B.5)

Example. A tensor Aij is symmetric if and only if Aij = Aji . Equivalently, we


can write
Aij = Aji ⇔ Aij = A(ij) ⇔ A[ij] = 0 . (B.6)
Analogously, a tensor Aij is antisymmetric if and only if

Aij = −Aji ⇔ Aij = A[ij] ⇔ A(ij) = 0 . (B.7)

Example. For a tensor $A_{ijk}$ we have

$A_{[ijk]} = \tfrac{1}{6}\bigl(A_{ijk} - A_{ikj} + A_{jki} - A_{jik} + A_{kij} - A_{kji}\bigr)$ (B.8)

and the analogous result for A(ijk) . Clearly, the analog of (B.4) does not hold.
Exercise. Let n be the dimension of the vector space and let $A_{i_1 \cdots i_k}$ be a tensor
of rank k (with k ≤ n). Show that $A_{[i_1 \cdots i_k]}$ has $\binom{n}{k}$ independent entries, while
$A_{(i_1 \cdots i_k)}$ has $\binom{n+k-1}{k}$.

Example. The tensor $A_{[ij]}$ has $\binom{n}{2}$ independent entries, while $A_{(ij)}$ has $\binom{n+1}{2}$.
The sum of the two is $n^2$, which equals the number of coefficients of $A_{ij}$; compare
this result with (B.4). In contrast, the tensor $A_{[ijk]}$ has $\binom{n}{3}$ independent
entries, while $A_{(ijk)}$ has $\binom{n+2}{3}$. Show that the sum is $n(n^2 + 2)/3$, which (for n > 1) is less
than the $n^3$ coefficients of $A_{ijk}$.
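The counting in the exercise can be cross-checked by brute force: the independent entries of a totally antisymmetric tensor correspond to strictly increasing index tuples, those of a totally symmetric tensor to non-decreasing index tuples. The sketch below is not a proof, only a numerical check; the function names are our own.

```python
from itertools import combinations, combinations_with_replacement
from math import comb

def count_antisymmetric(n, k):
    """Independent entries of A_[i1...ik]: strictly increasing index tuples."""
    return len(list(combinations(range(n), k)))

def count_symmetric(n, k):
    """Independent entries of A_(i1...ik): non-decreasing index tuples."""
    return len(list(combinations_with_replacement(range(n), k)))

for n in range(1, 7):
    # Binomial formulas of the exercise.
    assert count_antisymmetric(n, 2) == comb(n, 2)
    assert count_symmetric(n, 2) == comb(n + 1, 2)
    assert count_antisymmetric(n, 3) == comb(n, 3)
    assert count_symmetric(n, 3) == comb(n + 2, 3)
    # Rank 2: the two parts account for all n^2 coefficients.
    assert comb(n, 2) + comb(n + 1, 2) == n ** 2
    # Rank 3: the sum equals n (n^2 + 2) / 3.
    assert 3 * (comb(n, 3) + comb(n + 2, 3)) == n * (n ** 2 + 2)

rank3_sum_in_4d = comb(4, 3) + comb(6, 3)
print(rank3_sum_in_4d, 4 ** 3)
```

In four dimensions, for instance, the symmetric and antisymmetric parts of a rank-3 tensor carry 24 of the 64 coefficients.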

When a symmetric object is contracted with an antisymmetric object, the result
is zero. For instance, let $A_{ijk}$ be totally antisymmetric and $S^{ij}$ symmetric.
Then

$A_{ijk} S^{jk} = A_{i[jk]} S^{jk} = \tfrac{1}{2} A_{ijk} S^{jk} - \tfrac{1}{2} A_{ikj} \underbrace{S^{jk}}_{S^{kj}} = \tfrac{1}{2} A_{ijk} S^{jk} - \tfrac{1}{2} A_{ijk} S^{jk} = 0$. (B.9)

When an (anti)symmetrized object is contracted with an (anti)symmetric object,
the (anti)symmetrization is superfluous. For instance, let $A_{ijk}$ be totally
antisymmetric and $T^{ij}$ an arbitrary object. Then

$A_{ijk} T^{[jk]} = \tfrac{1}{2} A_{ijk} T^{jk} - \tfrac{1}{2} \underbrace{A_{ijk}}_{-A_{ikj}} T^{kj} = \tfrac{1}{2} A_{ijk} T^{jk} + \tfrac{1}{2} A_{ijk} T^{jk} = A_{ijk} T^{jk}$. (B.10)


Example. A well-known totally antisymmetric tensor is the ε-tensor. In the
case of a three-dimensional vector space we have

$\epsilon_{ijk} = \begin{cases} +1 & \text{if } \operatorname{sgn}(ijk) = +1 \\ -1 & \text{if } \operatorname{sgn}(ijk) = -1 , \end{cases}$ (B.11a)

and $\epsilon_{ijk} = 0$ otherwise. In the case of an n-dimensional vector space, the
definition is analogous, where ε now carries n indices. For example, in four
dimensions,

$\epsilon_{\alpha\beta\gamma\delta} = \begin{cases} +1 & \text{if } \operatorname{sgn}(\alpha\beta\gamma\delta) = +1 \\ -1 & \text{if } \operatorname{sgn}(\alpha\beta\gamma\delta) = -1 , \end{cases}$ (B.11b)

and $\epsilon_{\alpha\beta\gamma\delta} = 0$ otherwise. It is important to note that in an n-dimensional vector
space, every totally antisymmetric tensor of rank n must be proportional to
the ε-tensor, i.e., in n dimensions,

$A_{[i_1 \cdots i_n]} \propto \epsilon_{i_1 \cdots i_n}$. (B.12)

Example. A generalization of the ε-tensor is the ‘generalized Kronecker symbol’:

$\delta^{i_1 \cdots i_k}_{j_1 \cdots j_k} = \begin{cases} +1 & \text{if } (i_1 \cdots i_k) \text{ is an even permutation of } (j_1 \cdots j_k) \\ -1 & \text{if } (i_1 \cdots i_k) \text{ is an odd permutation of } (j_1 \cdots j_k) \\ 0 & \text{if } (i_1 \cdots i_k) \text{ is not a permutation of } (j_1 \cdots j_k) \end{cases}$ (B.13)

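Elementary considerations about permutations imply that

$\delta^{i_1 \cdots i_k}_{j_1 \cdots j_k} = \det\bigl(\delta^{i_p}_{j_q}\bigr)_{p,q}$, (B.14)

hence

$\delta^{i_1 \cdots i_k}_{j_1 \cdots j_k} = \sum_{\sigma \in P_{(1 \cdots k)}} \operatorname{sgn}\sigma\; \delta^{i_1}_{j_{\sigma(1)}} \cdots \delta^{i_k}_{j_{\sigma(k)}} = k!\; \delta^{i_1}_{[j_1} \cdots \delta^{i_k}_{j_k]}$. (B.15a)

Alternatively we obtain

$\delta^{i_1 \cdots i_k}_{j_1 \cdots j_k} = k!\; \delta^{[i_1}_{j_1} \cdots \delta^{i_k]}_{j_k} = k!\; \delta^{[i_1}_{[j_1} \cdots \delta^{i_k]}_{j_k]}$. (B.15b)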
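The equivalence of the case-by-case definition (B.13) with the permutation sum (B.15a), which is the Leibniz expansion of the determinant in (B.14), can be verified by enumeration. In the sketch below the function names are our own, and we assume that the symbol vanishes whenever an index value is repeated, which is what the determinant gives.

```python
from itertools import permutations, product

def perm_sign(perm):
    """Sign of a permutation, computed by counting inversions."""
    inversions = sum(1 for a in range(len(perm))
                     for b in range(a + 1, len(perm)) if perm[a] > perm[b])
    return -1 if inversions % 2 else 1

def gen_delta(upper, lower):
    """Generalized Kronecker symbol via the permutation sum (B.15a)."""
    total = 0
    for sigma in permutations(range(len(upper))):
        term = perm_sign(sigma)
        for p in range(len(upper)):
            term *= 1 if upper[p] == lower[sigma[p]] else 0
        total += term
    return total

def gen_delta_by_cases(upper, lower):
    """Generalized Kronecker symbol via the case distinction (B.13).
    Assumption: the symbol is zero if an index value is repeated."""
    if len(set(lower)) != len(lower) or sorted(upper) != sorted(lower):
        return 0
    sigma = tuple(lower.index(i) for i in upper)   # upper[p] == lower[sigma[p]]
    return perm_sign(sigma)

n = 3
agree = all(gen_delta(u, l) == gen_delta_by_cases(u, l)
            for k in (1, 2, 3)
            for u in product(range(n), repeat=k)
            for l in product(range(n), repeat=k))
print(agree, gen_delta((0, 1, 2), (1, 0, 2)))
```

With indices counted from zero, `gen_delta((0, 1, 2), lower)` reproduces the three-dimensional ε-tensor evaluated at `lower`.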

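In some contexts the nested variants of (B.15) are useful. We merely consider
a particular example,

$\delta^{\alpha\mu\nu}_{\kappa\sigma\tau} = 2\, \delta^\alpha_\kappa\, \delta^\mu_{[\sigma} \delta^\nu_{\tau]} + 4\, \delta^\alpha_{[\sigma}\, \delta^{[\mu}_{\tau]}\, \delta^{\nu]}_\kappa$. (B.16)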


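The proof of this relation is not difficult:

$\delta^{\alpha\mu\nu}_{\kappa\sigma\tau} = 6\, \delta^\alpha_{[\kappa} \delta^\mu_\sigma \delta^\nu_{\tau]} \overset{(B.8)}{=} 2\, \delta^\alpha_\kappa\, \delta^\mu_{[\sigma} \delta^\nu_{\tau]} + 2\, \delta^\alpha_{[\sigma} \delta^\mu_{\tau]}\, \delta^\nu_\kappa + 2\, \delta^\alpha_{[\tau} \delta^\mu_{\hat\kappa} \delta^\nu_{\sigma]}$,

where by convention, hatted indices are excluded from the antisymmetrization;
for instance, $A_{[\tau\hat\kappa\sigma]} = \tfrac{1}{2}\bigl(A_{\tau\kappa\sigma} - A_{\sigma\kappa\tau}\bigr)$; continuing the calculation we get

$\delta^{\alpha\mu\nu}_{\kappa\sigma\tau} = 2\, \delta^\alpha_\kappa\, \delta^\mu_{[\sigma} \delta^\nu_{\tau]} + 2\, \delta^\alpha_{[\sigma} \delta^\mu_{\tau]}\, \delta^\nu_\kappa + 2 \underbrace{\delta^\alpha_{[\tau} \delta^\nu_{\sigma]}}_{-\delta^\alpha_{[\sigma} \delta^\nu_{\tau]}} \delta^\mu_\kappa = 2\, \delta^\alpha_\kappa\, \delta^\mu_{[\sigma} \delta^\nu_{\tau]} + 4\, \delta^\alpha_{[\sigma}\, \delta^{[\mu}_{\tau]}\, \delta^{\nu]}_\kappa$,

which proves the claim. Finally we note that $\epsilon_{i_1 \cdots i_k}$ is a special case of the
generalized Kronecker symbol, because

$\epsilon_{i_1 \cdots i_k} = \delta^{1 \cdots k}_{i_1 \cdots i_k}$. (B.17)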
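Example. A common source of confusion is the ε-tensor with upstairs indices.
On the one hand, there exists the tensor $\epsilon^{i_1 \cdots i_k}$ defined in analogy to (B.11),
i.e.,

$\epsilon^{i_1 \cdots i_k} = \begin{cases} +1 & \text{if } \operatorname{sgn}(i_1 \cdots i_k) = +1 \\ -1 & \text{if } \operatorname{sgn}(i_1 \cdots i_k) = -1 , \end{cases}$ (B.18)

and $\epsilon^{i_1 \cdots i_k} = 0$ otherwise. However, if the vector space is endowed with a metric
(scalar product), then the symbol $\epsilon^{i_1 \cdots i_k}$ is often used to denote the tensor that
is generated by raising the indices of $\epsilon_{i_1 \cdots i_k}$. For instance, in Minkowski space,
in contrast to (B.18), the symbol $\epsilon^{\alpha\beta\gamma\delta}$ might be used to denote

$\epsilon^{\alpha\beta\gamma\delta} = \eta^{\alpha\alpha'} \eta^{\beta\beta'} \eta^{\gamma\gamma'} \eta^{\delta\delta'} \epsilon_{\alpha'\beta'\gamma'\delta'}$. (B.19)

In particular

$\epsilon^{0123} = \eta^{0\alpha'} \eta^{1\beta'} \eta^{2\gamma'} \eta^{3\delta'} \epsilon_{\alpha'\beta'\gamma'\delta'} = \det \eta = -1$. (B.20)

We conclude that

$\epsilon^{\alpha\beta\gamma\delta} = \begin{cases} -1 & \text{if } \operatorname{sgn}(\alpha\beta\gamma\delta) = +1 \\ +1 & \text{if } \operatorname{sgn}(\alpha\beta\gamma\delta) = -1 , \end{cases}$ (B.21)

and $\epsilon^{\alpha\beta\gamma\delta} = 0$ otherwise. Let us summarize this notational confusion with the
‘equation’

$\epsilon^{\alpha\beta\gamma\delta} = -\epsilon^{\alpha\beta\gamma\delta}$, (B.22)

where the l.h.s. is the tensor generated by raising the indices of $\epsilon_{\alpha\beta\gamma\delta}$ and the
r.h.s. is the ‘normal’ ε-tensor with upstairs indices. Which of the two conventions
is followed is (hopefully) clear from the context.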


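Example. Often one is confronted with contractions of ε-tensors. Consider
$\epsilon_{i_1 \cdots i_k}$ and let $\epsilon^{i_1 \cdots i_k}$ be the normal epsilon tensor with upstairs indices defined
by (B.18). Let k = p + q. The fundamental relation is

$\epsilon_{i_1 \cdots i_p j_1 \cdots j_q}\, \epsilon^{i_1 \cdots i_p l_1 \cdots l_q} = p!\; \delta^{l_1 \cdots l_q}_{j_1 \cdots j_q}$, (B.23)

where the factor p! enters, because there are p! permutations of the p-tuple
(i1 · · · ip ) that are summed over. In the special case of four dimensions we thus
obtain the following relations:

$\epsilon_{\nu_1\nu_2\nu_3\nu_4}\, \epsilon^{\nu_1\nu_2\nu_3\nu_4} = 4!$ (B.24a)
$\epsilon_{\nu_1\nu_2\nu_3\alpha}\, \epsilon^{\nu_1\nu_2\nu_3\beta} = 3!\; \delta^\beta_\alpha$ (B.24b)
$\epsilon_{\nu_1\nu_2\alpha_1\alpha_2}\, \epsilon^{\nu_1\nu_2\beta_1\beta_2} = 2!\; \delta^{\beta_1\beta_2}_{\alpha_1\alpha_2}$ (B.24c)
$\epsilon_{\nu\alpha_1\alpha_2\alpha_3}\, \epsilon^{\nu\beta_1\beta_2\beta_3} = \delta^{\beta_1\beta_2\beta_3}_{\alpha_1\alpha_2\alpha_3}$ (B.24d)
$\epsilon_{\alpha_1\alpha_2\alpha_3\alpha_4}\, \epsilon^{\beta_1\beta_2\beta_3\beta_4} = \delta^{\beta_1\beta_2\beta_3\beta_4}_{\alpha_1\alpha_2\alpha_3\alpha_4}$ (B.24e)

When we use the ε-tensor (B.21), which is the negative of the tensor used
above, we obtain the following relations:

$\epsilon_{\nu_1\nu_2\nu_3\nu_4}\, \epsilon^{\nu_1\nu_2\nu_3\nu_4} = -4! = -24$ (B.25a)
$\epsilon_{\nu_1\nu_2\nu_3\alpha}\, \epsilon^{\nu_1\nu_2\nu_3\beta} = -3!\; \delta^\beta_\alpha = -6\, \delta^\beta_\alpha$ (B.25b)
$\epsilon_{\nu_1\nu_2\alpha_1\alpha_2}\, \epsilon^{\nu_1\nu_2\beta_1\beta_2} = -2!\; \delta^{\beta_1\beta_2}_{\alpha_1\alpha_2} = -4\, \delta^{\beta_1}_{[\alpha_1} \delta^{\beta_2}_{\alpha_2]}$ (B.25c)
$\epsilon_{\nu\alpha_1\alpha_2\alpha_3}\, \epsilon^{\nu\beta_1\beta_2\beta_3} = -\delta^{\beta_1\beta_2\beta_3}_{\alpha_1\alpha_2\alpha_3} = -6\, \delta^{\beta_1}_{[\alpha_1} \delta^{\beta_2}_{\alpha_2} \delta^{\beta_3}_{\alpha_3]}$ (B.25d)
$\epsilon_{\alpha_1\alpha_2\alpha_3\alpha_4}\, \epsilon^{\beta_1\beta_2\beta_3\beta_4} = -\delta^{\beta_1\beta_2\beta_3\beta_4}_{\alpha_1\alpha_2\alpha_3\alpha_4} = -24\, \delta^{\beta_1}_{[\alpha_1} \delta^{\beta_2}_{\alpha_2} \delta^{\beta_3}_{\alpha_3} \delta^{\beta_4}_{\alpha_4]}$ (B.25e)

Here we have used (B.15). The relations (B.25) are useful in many contexts.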
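The fundamental relation (B.23) lends itself to a brute-force check over all index values. The sketch below uses the ‘normal’ ε-tensor (B.18) for the upstairs indices, counts indices from zero, and evaluates the generalized Kronecker symbol through the permutation sum (B.15a); all function names are our own and the check is an illustration, not a proof.

```python
from itertools import permutations, product
from math import factorial

def perm_sign(perm):
    """Sign of a permutation, computed by counting inversions."""
    inversions = sum(1 for a in range(len(perm))
                     for b in range(a + 1, len(perm)) if perm[a] > perm[b])
    return -1 if inversions % 2 else 1

def eps(idx):
    """epsilon-tensor in n = len(idx) dimensions; index values 0, ..., n-1."""
    return perm_sign(idx) if len(set(idx)) == len(idx) else 0

def gen_delta(upper, lower):
    """Generalized Kronecker symbol via the permutation sum (B.15a)."""
    total = 0
    for sigma in permutations(range(len(upper))):
        if all(upper[p] == lower[sigma[p]] for p in range(len(upper))):
            total += perm_sign(sigma)
    return total

def check_contraction(n, p):
    """Check (B.23) in n dimensions with p contracted index pairs, for all
    values of the q = n - p free index pairs."""
    q = n - p
    for j in product(range(n), repeat=q):
        for l in product(range(n), repeat=q):
            lhs = sum(eps(i + j) * eps(i + l)
                      for i in product(range(n), repeat=p))
            if lhs != factorial(p) * gen_delta(l, j):
                return False
    return True

results_3d = [check_contraction(3, p) for p in range(4)]
results_4d = [check_contraction(4, p) for p in (1, 2, 3, 4)]   # (B.24d) to (B.24a)
print(results_3d, results_4d)
```

The relations (B.25) follow from the same computation by multiplying the upstairs ε-tensor with −1, in accordance with (B.22).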
Example. If (anti)symmetrization involves two identical objects, the additional
symmetry results in simplifications. The simplest example is

$a_{[i} a_{j]} = 0$, (B.26)

which is proved straightforwardly: $a_{[i} a_{j]} = \tfrac{1}{2}\bigl(a_i a_j - a_j a_i\bigr) = 0$. Analogously,

$a_{(i} a_{j)} = a_i a_j$, (B.27)


because $a_i a_j$ is already symmetric. For tensors of higher order things get more
complicated; we merely consider one example. Let $F_{ab}$ be antisymmetric, i.e.,
$F_{[ab]} = F_{ab}$. Then

$F_{[ab} F_{cd]} = F_{a[b} F_{cd]}$. (B.28a)

The latter can be written as

$F_{a[b} F_{cd]} = \tfrac{1}{3}\bigl(F_{ab} F_{cd} + 2\, F_{a[c} F_{d]b}\bigr)$. (B.28b)

Let us prove (B.28a). (Recall that hatted indices are excluded from the
antisymmetrization; for example, in $[ab\hat{c}d]$ the antisymmetrization involves only
the indices [abd], hence $A_{[ab\hat{c}d]} = \tfrac{1}{6}\bigl(A_{abcd} - A_{bacd} + A_{bdca} + \ldots\bigr)$.)

$F_{[ab} F_{cd]} = \tfrac{1}{4}\bigl(F_{a[b} F_{cd]} - F_{[b\hat{a}} F_{cd]} + F_{[bc} F_{\hat{a}d]} - F_{[bc} F_{d]a}\bigr)$
$= \tfrac{1}{4}\bigl(F_{a[b} F_{cd]} - F_{[cd} \underbrace{F_{b]a}}_{-F_{\hat{a}b]}} + F_{a[d} F_{bc]} - F_{[bc} \underbrace{F_{d]a}}_{-F_{\hat{a}d]}}\bigr)$
$= \tfrac{1}{4}\bigl(F_{a[b} F_{cd]} + F_{a[b} F_{cd]} + F_{a[d} F_{bc]} + F_{a[d} F_{bc]}\bigr) = F_{a[b} F_{cd]}$

We have only used elementary considerations about permutations; furthermore,
we have employed the antisymmetry of $F_{ab}$ and the fact that cyclic permutations
of a triple do not change signs. The proof of (B.28b) is simpler, since

$F_{a[b} F_{cd]} = \tfrac{1}{3}\bigl(F_{ab} F_{[cd]} + F_{a[c} F_{d]b} + F_{a[d} \underbrace{F_{\hat{b}c]}}_{-F_{c]b}}\bigr)$

and the result follows.
