Heinzle. Introduction To Relativity and Cosmology PDF
Relativity and
Cosmology I
J. Mark Heinzle
Gravitational Physics, Faculty of Physics
University of Vienna
Version 20/01/2010
CHAPTER 1
AETHER
What is space? Which mathematical concept provides the best description for
what we experience as our spatial world?
This question is neither naïve nor trivial. There exists a multitude of concepts
in mathematics that could apply a priori. Space could be merely a set, it could
be a topological space, maybe Hausdorff, a semi-group, a field, a manifold, a
complex manifold, a vector space, . . .
Our daily experience tells us that space is some kind of ‘continuum’ of ‘points’.
It is a matter of course for us to be able to form ‘arrows’ (‘vectors’) $\overrightarrow{pq}$ connecting
points p, q in space. Most importantly, we can add vectors and multiply
vectors with scalars (∈ R). This suggests that our spatial world has a struc-
ture intimately connected with the structure of a vector space. However, it is
obvious that a vector space is not a good model for our spatial world. This is
simply because there doesn’t seem to exist a distinguished zero vector; quite
the contrary, all spatial points seem to be on an equal footing. Collecting these
intuitive observations results in the following statement:
Remark. All these “experiments” that we perform every day—we form vectors
connecting points, compute lengths and angles— indicate that Galilean space
is the correct mathematical description of the space of our daily experience.
Note, however, that we would come to a radically different conclusion if the
scale of our intuitive perception were different. But we think in meters and not
in megaparsecs. . .
What is time? This is a somewhat more difficult question. Let us quote Sir
Isaac Newton:
“Absolute, true and mathematical time, of itself, and from its own nature
flows equably without regard to anything external, and by another name
is called duration: relative, apparent and common time, is some sensible
and external (whether accurate or unequable) measure of duration by
the means of motion, which is commonly used instead of true time . . . ”
Summarizing, we don’t quite know what time is (at least not yet), but we have
a lot of first hand experience with it. For our present purposes this is sufficient.
We are free to choose any curvilinear coordinate system we can come up with.2
1
Strictly speaking, a coordinate system—or chart—is simply a bijective map of R3 into
Galilean space. In the present context, however, dependence on time is permitted (so
that a coordinate system is a one-parameter family of maps of R3 into Galilean space).
2
And frequently, curvilinear coordinates are very useful. For instance, when one aims at
studying rotating bodies, spherical coordinates are often advantageous.
However, among the vast number3 of possible coordinate systems, there exists
a special subfamily: the inertial frames (inertial coordinates). The existence
of these inertial frames is guaranteed by the principle of relativity:
Consider, for example, Newton’s first law which describes the motion of a point
particle in the absence of exterior forces. Newton’s first law states that these
point particles move along straight lines in Galilean space. In an inertial frame
of reference, Newton’s first law is represented by the equation4
$$\ddot{x}^i(t) = 0 .$$
Remark. We emphasize that the principle of relativity implies that all laws of
(Newtonian) physics are invariant under Galilean transformations.
On the basis of definition 1.1′ we are able to derive the Galilean transformations;
we choose to make use of Newton’s first law.
Suppose that $(t, x)$, where $x = (x^1, x^2, x^3)$, and $(\bar{t}, \bar{x})$, where $\bar{x} = (\bar{x}^1, \bar{x}^2, \bar{x}^3)$,
are inertial coordinates. By definition, the Galilean transformation mapping
the one inertial coordinate system into the other is of the form7
$$\bar{t} = t , \qquad \bar{x} = \bar{x}(t, x) ; \tag{1.1}$$
expressed in component form, the latter reads $\bar{x}^i = \bar{x}^i(t, x^1, x^2, x^3)$. In (1.1) we
assume that the origin of space and time remain unchanged; if this is not the
case, (1.1) may contain additional constants representing a spatial translation
and a temporal translation, i.e.,
Let $x(t)$ represent the freely moving point particle w.r.t. the first inertial
frame, i.e., $\ddot{x}^i(t) = 0$. W.r.t. the second inertial frame (1.1), this curve reads
5
Strictly speaking, a Galilean transformation is a one-parameter family of coordinate trans-
formations of Galilean space; this is because there might be a dependence on time,
see (1.1).
6
See footnote 5.
7
It is implicit in definition 1.1′ that time remains unchanged under a Galilean transformation
(because it is a coordinate transformation ‘of Galilean space’).
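$\bar{x}(t) = \bar{x}\bigl(t, x(t)\bigr)$; we may replace $t$ by $\bar{t}$. We obtain
$$\dot{\bar{x}}^i = \frac{\partial \bar{x}^i}{\partial t} + \frac{\partial \bar{x}^i}{\partial x^j}\,\dot{x}^j ,$$
$$\ddot{\bar{x}}^i = \frac{\partial^2 \bar{x}^i}{\partial t^2} + 2\,\frac{\partial^2 \bar{x}^i}{\partial t\,\partial x^j}\,\dot{x}^j + \frac{\partial^2 \bar{x}^i}{\partial x^j\,\partial x^k}\,\dot{x}^j \dot{x}^k + \frac{\partial \bar{x}^i}{\partial x^j}\,\underbrace{\ddot{x}^j}_{=0} .$$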
Since Newton’s law must be invariant under the transformation (because (t̄, x̄)
is assumed to be an inertial frame) we find that
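$$\frac{\partial \bar{x}^i}{\partial t} \equiv -v^i = \text{const} \qquad\text{and}\qquad \frac{\partial \bar{x}^i}{\partial x^k} \equiv R^i{}_k = \text{const} .$$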
Accordingly, (1.1) becomes
In (1.2) the origin of space and time remain unchanged; if this is not the
case, (1.2) may contain additional constants representing a spatial translation
and a temporal translation, i.e.,
Of particular interest among the Galilean transformations (1.2′ ) are the so-
called Galilean boosts
$$\bar{t} = t , \qquad \bar{x}^i = x^i - v^i t . \tag{1.3}$$
Galilean boosts are coordinate transformations between inertial frames that
are in uniform relative motion. Let us summarize the statement of (1.2′ ) as a
corollary.
Corollary 1.2. An (orientation-preserving) Galilean transformation is a spa-
tial translation by a constant vector, a temporal translation by a constant pa-
rameter, a rotation by a constant angle, a Galilean boost, or a combination of
these.
Remark. Also reflections are Galilean transformations; these do not, however,
preserve the orientation of the coordinates.
Remark. We see that inertial frames are those coordinate systems that are
adapted to the geometric structure of (Galilean) space (and time). Inertial
frames essentially correspond to orthonormal frames or orthonormal frames in
uniform motion.
Let us conclude this section with another example illustrating Galilean invari-
ance. Consider the equations of motion of two gravitating point particles with
masses m1 , m2 ,
$$m_1 \ddot{x}_1 = -m_1 m_2\,\frac{x_1 - x_2}{|x_1 - x_2|^3} , \tag{1.4a}$$
$$m_2 \ddot{x}_2 = -m_1 m_2\,\frac{x_2 - x_1}{|x_1 - x_2|^3} . \tag{1.4b}$$
A change of inertial frame is a Galilean transformation (1.2′), i.e., $\bar{t} = t + \bar{t}_0$
and
$$\bar{x}_1^i = R^i{}_k\,x_1^k - v^i t + \bar{x}_0^i , \qquad \bar{x}_2^i = R^i{}_k\,x_2^k - v^i t + \bar{x}_0^i .$$
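Let us restrict ourselves to a Galilean boost, i.e., $\bar{t} = t$ and
$$\bar{x}_1^i = x_1^i - v^i t , \qquad \bar{x}_2^i = x_2^i - v^i t .$$
It follows that
$$\dot{\bar{x}}_1^i = \dot{x}_1^i - v^i , \quad \dot{\bar{x}}_2^i = \dot{x}_2^i - v^i , \quad \ddot{\bar{x}}_1^i = \ddot{x}_1^i , \quad \ddot{\bar{x}}_2^i = \ddot{x}_2^i , \quad \bar{x}_1^i - \bar{x}_2^i = x_1^i - x_2^i ,$$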
and we infer that
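$$m_1 \ddot{\bar{x}}_1 = -m_1 m_2\,\frac{\bar{x}_1 - \bar{x}_2}{|\bar{x}_1 - \bar{x}_2|^3} , \tag{1.4a$'$}$$
$$m_2 \ddot{\bar{x}}_2 = -m_1 m_2\,\frac{\bar{x}_2 - \bar{x}_1}{|\bar{x}_1 - \bar{x}_2|^3} , \tag{1.4b$'$}$$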
hence the equations are identical w.r.t. the inertial frame (t̄, x̄).
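The boost invariance derived above can be spot-checked numerically. The following is a minimal Python sketch, not part of the original notes: the helper names `accel_on_1` and `boost` and all numerical values are illustrative assumptions, and units with G = 1 are used as in (1.4).

```python
# Sketch: the right-hand side of (1.4a), divided by m1, evaluated before and
# after a Galilean boost (1.3). All names and numbers are illustrative.

def accel_on_1(x1, x2, m2):
    """Acceleration of particle 1 from (1.4a): -m2 (x1 - x2) / |x1 - x2|^3."""
    d = [a - b for a, b in zip(x1, x2)]
    r3 = sum(c * c for c in d) ** 1.5
    return [-m2 * c / r3 for c in d]

def boost(x, v, t):
    """Galilean boost (1.3): x_bar^i = x^i - v^i t."""
    return [xi - vi * t for xi, vi in zip(x, v)]

x1, x2 = [1.0, 0.0, 0.0], [0.0, 2.0, 0.0]   # particle positions at time t
v, t, m2 = [0.3, -0.1, 0.5], 4.0, 2.0       # boost velocity, time, mass

a_original = accel_on_1(x1, x2, m2)
a_boosted = accel_on_1(boost(x1, v, t), boost(x2, v, t), m2)
```

Because the boost shifts both positions by the same vector, the relative vector and hence the acceleration agree in both frames up to rounding.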
Exercise. Show the invariance of (1.4) under a general Galilean transformation.
The Euler equations describe the dynamics of inviscid fluids. The equations
read
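$$\partial_t \rho + \partial_k\bigl(\rho u^k\bigr) = 0 , \tag{1.5a}$$
$$\partial_t\bigl(\rho u^i\bigr) + \partial_k\bigl(\rho u^i u^k + p\,\delta^{ik}\bigr) = 0 , \tag{1.5b}$$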
where ρ = ρ(t, x) is the density and p = p(t, x) the pressure of the fluid;
u = u(t, x) is the velocity field.
The second equation, i.e., (1.5b), corresponds to Newton’s second law and
encodes the conservation of momentum. Using
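$$(\partial_t \rho)\,u^i + \rho\,\partial_t u^i + \partial_k(\rho u^k)\,u^i + \rho u^k \partial_k u^i + \partial^i p = \rho\,(\partial_t + u^k \partial_k)\,u^i + \partial^i p$$
we may rewrite it as
$$\rho\,(\partial_t + u\cdot\nabla)\,u = -\nabla p$$
or
$$\rho\,\frac{du}{dt} = -\nabla p . \tag{1.5b$'$}$$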
The equations (1.5a) and (1.5b) do not form a closed system. It is required
that we prescribe an equation of state,
p = p(ρ) , (1.5c)
relating the density and the pressure. It is common to also consider an equation
representing the conservation of energy
$$\partial_t \epsilon + \partial_k\bigl[(\epsilon + p)\,u^k\bigr] = 0 ;$$
here, $\epsilon$ denotes the energy density.
This invariance leads directly to the invariance of equation (1.5b); we use the
form (1.5b′ ).
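$$\rho(t,x)\,\frac{d}{dt}u(t,x) + \nabla p(t,x) = \bar\rho(\bar t,\bar x)\,\frac{d}{d\bar t}\bigl(\bar u(\bar t,\bar x) + v\bigr) + \bar\nabla \bar p(\bar t,\bar x) = \bar\rho(\bar t,\bar x)\,\frac{d}{d\bar t}\bar u(\bar t,\bar x) + \bar\nabla \bar p(\bar t,\bar x) .$$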
Finally, to complete the proof that (1.5) is invariant under a Galilean boost,
we note that
$$\bar p(\bar t,\bar x) = p(t,x) = p\bigl(\rho(t,x)\bigr) = p\bigl(\bar\rho(\bar t,\bar x)\bigr) .$$
In summary, like every good law of (Newtonian) physics, the Euler equa-
tions (1.5) are invariant under Galilean transformations.
Provided that the spatial gradients of φ and u are of the same small order as φ
and u themselves, we may neglect the terms of higher order and thereby obtain
a simple linear system of equations,
$$\frac{\partial \phi}{\partial t} = -\nabla\cdot u , \tag{1.14a}$$
$$\frac{\partial u}{\partial t} = -c_s^2\,\nabla\phi . \tag{1.14b}$$
From (1.14) we infer that
$$\Box\phi = -\frac{1}{c_s^2}\,\frac{\partial^2}{\partial t^2}\phi + \Delta\phi = 0 , \tag{1.15}$$
i.e., φ = φ(t, x) satisfies a wave equation. The solutions of (1.15) are compres-
sion and rarefaction waves that propagate with the speed cs . Similarly,
$$\Box u = -\frac{1}{c_s^2}\,\frac{\partial^2}{\partial t^2}u + \Delta u = 0 .$$
The main observation for us is the following: The Euler equations (1.5) are
invariant under Galilean transformations. From these equations we derive an
$$\Box\phi(t,x) = 0 \tag{1.16}$$
reduces to
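$$\Bigl[-\frac{1}{c_s^2}\,\partial_t^2 + \partial_x^2\Bigr]\phi(t,x) = \Bigl[-\frac{1}{c_s}\,\partial_t + \partial_x\Bigr]\Bigl[\frac{1}{c_s}\,\partial_t + \partial_x\Bigr]\phi(t,x) = 0 . \tag{1.17}$$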
Its general solution can be represented as the linear combination of a wave
traveling to the right and a wave traveling to the left, i.e.,
$$\phi(t,x) = \lambda_1\,\phi_1(x - c_s t) + \lambda_2\,\phi_2(x + c_s t) , \tag{1.18}$$
This makes perfect sense. An observer who is traveling at the speed $c_s$ observes
a class of waves that do not propagate at all (i.e., “standing waves”), and
a class of waves that propagate in the direction opposite to the direction of
motion, which are waves with velocity $2c_s$. These two classes are represented
by $\bar\phi_1$ and $\bar\phi_2$, respectively.
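The two classes of waves seen by the moving observer can be illustrated with a small Python sketch. It is not taken from the notes: it assumes that the perturbation transforms as a scalar, i.e. that the boosted observer evaluates the solution (1.18) at x = x̄ + c_s t̄, and the Gaussian profiles, the value of `C_S`, and the function names are arbitrary illustrative choices.

```python
import math

C_S = 340.0  # illustrative sound speed in m/s (assumed value)

def phi1(s):
    """Arbitrary profile of the right-moving wave, phi1(x - c_s t)."""
    return math.exp(-s * s)

def phi2(s):
    """Arbitrary profile of the left-moving wave, phi2(x + c_s t)."""
    return math.exp(-s * s)

def phi_rest(t, x, lam1=1.0, lam2=1.0):
    """General solution (1.18) in the rest frame of the fluid."""
    return lam1 * phi1(x - C_S * t) + lam2 * phi2(x + C_S * t)

def phi_boosted(t_bar, x_bar, lam1=1.0, lam2=1.0):
    """Same field seen by an observer moving with speed c_s:
    t = t_bar, x = x_bar + c_s * t_bar (scalar transformation assumed)."""
    return phi_rest(t_bar, x_bar + C_S * t_bar, lam1, lam2)

# Right-moving part: frozen in the boosted frame.
frozen_early = phi_boosted(0.0, 1.0, lam1=1.0, lam2=0.0)
frozen_late = phi_boosted(0.5, 1.0, lam1=1.0, lam2=0.0)

# Left-moving part: its peak sits at x_bar = -2 c_s t_bar.
peak_late = phi_boosted(0.5, -2.0 * C_S * 0.5, lam1=0.0, lam2=1.0)
```

In this sketch the first profile depends on x̄ alone, while the second depends on x̄ + 2 c_s t̄, which matches the description of the two classes above.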
The example shows that inertial frames need not necessarily be equivalent.
Clearly, the fundamental laws of (Newtonian) physics must be the same un-
der a change of inertial frame (Galilean transformation). But if there exists a
medium (represented by a ‘background’ solution), then this medium automat-
ically distinguishes a frame of reference (“absolute space”); derived equations
(which rely on the existence of a background solution) might take a distin-
guished (simple) form with respect to the distinguished frame of reference. (In
the inertial frame (t, x), where the fluid is at rest, the equation for the per-
turbation is (1.16); in a boosted inertial frame (t̄, x̄) we obtain (1.16); the rest
frame of the fluid is obviously distinguished.)
Let us conclude this section by analyzing the properties of the wave equation
in some more detail. As has been demonstrated above, the wave equation
$$\Box\phi(t,x) = -\frac{1}{c_s^2}\,\partial_t^2\phi(t,x) + \delta^{ij}\,\partial_i\partial_j\phi(t,x) = 0 \tag{1.19}$$
is not invariant under Galilean transformations. However, there exists a class
of transformations under which (1.19) is in fact invariant. To see this let us
first define
$$\eta_s = \begin{pmatrix} -\frac{1}{c_s^2} & & & \\ & 1 & & \\ & & 1 & \\ & & & 1 \end{pmatrix} . \tag{1.20}$$
It is common to specify the arguments of the functions collectively; e.g., instead
of $\phi(x^0, x^1, x^2, x^3)$ we will write $\phi(x^\sigma)$.
This implies
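$$\partial_\mu = \frac{\partial}{\partial x^\mu} = \frac{\partial \bar x^\nu}{\partial x^\mu}\,\frac{\partial}{\partial \bar x^\nu} = \frac{\partial \bar x^\nu}{\partial x^\mu}\,\bar\partial_\nu = [L_s]^\nu{}_\mu\,\bar\partial_\nu .$$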
Furthermore, the function φ(xσ ) is assumed to transform like a scalar function,
i.e., φ̄(x̄σ ) = φ(xσ ).
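$${} = \eta_s^{\mu\nu}\,[L_s]^\beta{}_\nu\,\partial_\mu \bar\partial_\beta\,\bar\phi(\bar x^\sigma) = \eta_s^{\mu\nu}\,[L_s]^\alpha{}_\mu\,[L_s]^\beta{}_\nu\,\bar\partial_\alpha \bar\partial_\beta\,\bar\phi(\bar x^\sigma) \overset{!}{=} \bar\Box\,\bar\phi(\bar x^\sigma) = \eta_s^{\alpha\beta}\,\bar\partial_\alpha \bar\partial_\beta\,\bar\phi(\bar x^\sigma) ,$$
where the symbol $\overset{!}{=}$ means ‘is required to be equal to’. Consequently, invariance
of the d’Alembertian is equivalent to requiring that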
of time, $t \mapsto \bar t = \gamma_s\bigl(t - v\,c_s^{-2}\,x^1\bigr)$. However, we don’t have to wrack our
brains about this curiosity now. We know that the equation (1.19) is not
a fundamental equation of physics (but arising from a background solution);
therefore, the transformation properties of (1.19) cannot tell us anything about
fundamental physics.
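$$\vec\nabla \times \bigl(\vec\nabla \times \vec E\bigr) = \vec\nabla\bigl(\underbrace{\vec\nabla\cdot\vec E}_{=0}\bigr) - \Delta\vec E$$
$$\Box\vec E = -\frac{1}{c^2}\,\partial_t^2 \vec E + \Delta\vec E = 0 . \tag{1.25}$$
Analogously, $\Box\vec B = 0$. Therefore, the components of the electric and the
magnetic field, $\vec E$ and $\vec B$, satisfy the free wave equation.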
So, this is the idea: Light and electromagnetic waves in general require a
propagation medium; this medium is called luminiferous and electromagnetic
ether (alternative spelling: aether).
Electromagnetic waves are like perturbations of the ether (which is the back-
ground solution). There exists some (non-linear) theory of the ether (a theory
that has not been discovered yet); the linearized equations that describe the
perturbations manifest themselves as the Maxwell equations (and thus as the
wave equation for the electric and magnetic field). The (unknown) ether equa-
tions are certainly Galilean invariant; the Maxwell equations on the other hand
are not; the validity of the Maxwell equations is restricted to a distinguished
inertial frame, the rest frame of the ether.
Remark. A good model to have in mind is the one discussed in sections 1.3
and 1.4. There is one difference, however. In the case of a perfect fluid and
the Euler equations, the perturbations are longitudinal waves (compression
and rarefaction); in the electromagnetic case, perturbations of the ether are
transversal waves (which is immediate from the polarization of light). As a
consequence, a fluid cannot be a viable model for the ether; the ether must
have more complicated properties.
To answer these questions several experiments were conceived and carried out
in the nineteenth century. We focus on Hoek’s experiment and the Michelson–
Morley experiment.
1.7 Hoek
The basic idea is the detection of interference of light beams taking different
paths. Suppose that the earth moves with velocity v through the ether. For a
light beam that is parallel to the direction of motion of the earth, the velocity
is c − v, when measured in the rest frame of the laboratory (earth); for a light
beam that is antiparallel to the direction of motion of the earth, this velocity
is c + v. Therefore, if a light ray makes a round-trip through a device of length
$l$, the travel time is
$$t = \frac{l}{c+v} + \frac{l}{c-v} = \frac{2l}{c} + \frac{2l}{c}\,\frac{v^2}{c^2} + O\bigl(v^4/c^4\bigr) . \tag{1.26}$$
The velocity of the earth in its course around the sun is approximately 30 km/s;
hence
$$\frac{v}{c} \simeq 10^{-4} , \qquad \frac{v^2}{c^2} \simeq 10^{-8} .$$
c c2
Therefore, since the travel time (1.26) differs from 2l/c at second order in v/c
only, the effect to be measured is very small.
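To get a feeling for the size of the effect, the round-trip time (1.26) can be evaluated numerically. The snippet below is a rough sketch under stated assumptions: the device length `L_ARM` is a made-up value, and the function names are illustrative.

```python
C = 299_792_458.0   # speed of light in m/s
V = 30_000.0        # orbital speed of the earth in m/s (the 30 km/s of the text)
L_ARM = 10.0        # hypothetical device length in m

def round_trip_exact(l, v, c=C):
    """Exact round-trip time l/(c+v) + l/(c-v) from (1.26)."""
    return l / (c + v) + l / (c - v)

def round_trip_second_order(l, v, c=C):
    """Expansion of (1.26) up to and including the v^2/c^2 term."""
    return 2 * l / c * (1 + v**2 / c**2)

beta = V / C
# Relative excess over the ether-at-rest value 2l/c; should be close to beta^2.
excess = round_trip_exact(L_ARM, V) / (2 * L_ARM / C) - 1
```

With these numbers `beta` is about 1e-4 and `excess` about 1e-8, i.e. the deviation from 2l/c is a second-order effect.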
Hoek’s experiment was designed in such a way that the expected effect was of
first order in v/c, because, at that time, effects of second order in v/c were
beyond experimental reach. Two light rays pass through vacuum (or air) in
one direction and through a medium (water) in the opposite direction. Recall
that the speed of light in a medium is
$$\frac{c}{n} ,$$
where n is the refractive index of the medium. In the simplest approach to the
problem, we expect the travel times
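$$t_1 = \frac{l}{\frac{c}{n} + v} + \frac{l}{c - v} = \frac{(1+n)\,l}{c} + \frac{(1-n^2)\,l}{c}\,\frac{v}{c} + O\bigl(v^2/c^2\bigr)$$
$$t_2 = \frac{l}{\frac{c}{n} - v} + \frac{l}{c + v} = \frac{(1+n)\,l}{c} - \frac{(1-n^2)\,l}{c}\,\frac{v}{c} + O\bigl(v^2/c^2\bigr)$$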
for the two light rays going through the apparatus in opposite directions. The
difference in travel times (at first order in v/c) should manifest itself in inter-
ference fringes.
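The first-order difference of the two naive travel times can be checked numerically. This is a sketch only: the arm length, the refractive index 1.33 (roughly that of water), and the helper names are assumptions chosen for illustration, not values from Hoek’s actual setup.

```python
C = 299_792_458.0  # speed of light in m/s

def naive_times(l, n, v, c=C):
    """Travel times t1, t2 of the two rays without any ether drag."""
    t1 = l / (c / n + v) + l / (c - v)
    t2 = l / (c / n - v) + l / (c + v)
    return t1, t2

def first_order_difference(l, n, v, c=C):
    """Leading term of t1 - t2 read off from the expansions: 2 (1 - n^2) l v / c^2."""
    return 2 * (1 - n**2) * l * v / c**2

l, n, v = 1.0, 1.33, 3.0e4       # hypothetical length, index, earth velocity
t1, t2 = naive_times(l, n, v)
exact_difference = t1 - t2
approx_difference = first_order_difference(l, n, v)
```

Since t1 − t2 is odd in v, the next correction is of third order, so the two numbers agree to about one part in 10^8 here.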
Remark. Since the direction of motion of the earth through ether is unknown,
the experimental device could be rotated. (In addition, the experiment was
performed several times several months apart: the earth could be at rest relative
to ether at a certain time; since the earth revolves around the sun, some
months later, it should be moving with 60 km/s.)
The experiment gave a null result. The motion of the earth through ether could
not be detected. However, the explanation of this result was straightforward
(and completely consistent with the earlier results by Fizeau). Ether interacts
with matter which causes the phenomenon of ‘ether drag’:
Suppose that some matter moves with velocity v (w.r.t. the universal ether).
Then the ether within the material is (partially) dragged along, so that the
velocity of the ‘local’ ether is
Let us consider Hoek’s experiment in the light of ‘ether drag’: If the velocity of
the medium (water)8 w.r.t. the universal ether is v, then its velocity w.r.t. the
‘local’ ether contained within it is (1 − d)v. We thus obtain the travel times
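$$t_1 = \frac{l}{\frac{c}{n} + (1-d)\,v} + \frac{l}{c - v} = \frac{(1+n)\,l}{c} + \frac{\bigl[1 - (1-d)\,n^2\bigr]\,l}{c}\,\frac{v}{c} + O\bigl(v^2/c^2\bigr)$$
$$t_2 = \frac{l}{\frac{c}{n} - (1-d)\,v} + \frac{l}{c + v} = \frac{(1+n)\,l}{c} - \frac{\bigl[1 - (1-d)\,n^2\bigr]\,l}{c}\,\frac{v}{c} + O\bigl(v^2/c^2\bigr)$$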
for the two light rays going through the apparatus in opposite directions.
The null result of the experiment is explained by partial ether drag, if and only
if $1 - (1-d)\,n^2 = 0$, which corresponds to a dragging coefficient of
$$d = 1 - \frac{1}{n^2} . \tag{1.28}$$
This was exactly the value suggested by Fresnel some decades earlier and confirmed
earlier measurements and results by Fizeau.
Due to the effect of partial ether drag, Hoek’s experiment could not detect the
movement of the earth w.r.t. the (universal) ether. The terms of first order in
v/c disappear; the remaining effect would be of second order in v/c and could
not be detected with Hoek’s experimental means.
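The cancellation of the first-order term for the dragging coefficient (1.28) can be verified with a short Python sketch. The numerical inputs and function names are illustrative assumptions; only the formulas for the travel times and for d are taken from the text.

```python
C = 299_792_458.0  # speed of light in m/s

def fresnel_drag(n):
    """Dragging coefficient (1.28): d = 1 - 1/n^2."""
    return 1 - 1 / n**2

def first_order_coefficient(n, d):
    """Bracket multiplying the first-order term: 1 - (1 - d) n^2."""
    return 1 - (1 - d) * n**2

def times_with_drag(l, n, v, d, c=C):
    """Travel times of the two rays with partial ether drag d."""
    t1 = l / (c / n + (1 - d) * v) + l / (c - v)
    t2 = l / (c / n - (1 - d) * v) + l / (c + v)
    return t1, t2

l, n, v = 1.0, 1.33, 3.0e4                      # hypothetical values
t1_drag, t2_drag = times_with_drag(l, n, v, fresnel_drag(n))
t1_none, t2_none = times_with_drag(l, n, v, 0.0)
residual = abs(t1_drag - t2_drag)               # what remains with Fresnel's d
undragged = abs(t1_none - t2_none)              # first-order effect without drag
```

With d from (1.28) the remaining difference is many orders of magnitude below the undragged first-order effect, consistent with the null result.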
Toward the end of the nineteenth century the possibility to construct a high
precision interferometer to measure effects of quadratic order in v/c came into
reach.
8
In Hoek’s experiment, the velocity of the medium coincides with the velocity of earth
through the (universal) ether, since the medium is at rest w.r.t. the laboratory.
In the Michelson interferometer a light beam is divided into two parts that
travel along orthogonal paths to meet again where interference is observed.
Suppose that one of the two light rays moves in the direction of the movement
of earth through ether; we call this ray the parallel ray, the other ray is called
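the perpendicular ray. In the laboratory frame, the velocity vectors are
$$u_\parallel = \begin{pmatrix} c \pm v \\ 0 \end{pmatrix} \tag{$\parallel$}$$
for the parallel ray and
$$u_\perp = \begin{pmatrix} 0 \\ \pm w \end{pmatrix}$$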
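for the perpendicular ray, where $w$ is to be determined. A simple Galilean
transformation shows that $u_\perp$ reads
$$u'_\perp = \begin{pmatrix} v \\ \pm w \end{pmatrix}$$
in the rest frame of the ether. Since the absolute value of this velocity must be
$c$, i.e.,
$$u'_\perp \cdot u'_\perp = v^2 + w^2 = c^2 ,$$
we find
$$w = \sqrt{c^2 - v^2} = c\,\sqrt{1 - \frac{v^2}{c^2}} .$$
Hence,
$$u_\perp = \begin{pmatrix} 0 \\ \pm\sqrt{c^2 - v^2} \end{pmatrix} . \tag{$\perp$}$$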
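Let us compute the travel times of the two light rays. We obtain
$$t_\parallel = \frac{l}{c+v} + \frac{l}{c-v} = \frac{2l}{c} + \frac{2l}{c}\,\frac{v^2}{c^2} + O\bigl(v^4/c^4\bigr) ,$$
$$t_\perp = \frac{l}{\sqrt{c^2 - v^2}} + \frac{l}{\sqrt{c^2 - v^2}} = \frac{2l}{c} + \frac{l}{c}\,\frac{v^2}{c^2} + O\bigl(v^4/c^4\bigr) .$$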
The difference causes a phase shift between the two beams that produces interference
fringes. This interference pattern should change if the interferometer
is rotated; e.g.,
$$\Delta t_{0^\circ} = t_\parallel - t_\perp = +\frac{l}{c}\,\frac{v^2}{c^2} + O\bigl(v^4/c^4\bigr) , \tag{1.29a}$$
$$\Delta t_{90^\circ} = t_\perp - t_\parallel = -\frac{l}{c}\,\frac{v^2}{c^2} + O\bigl(v^4/c^4\bigr) ; \tag{1.29b}$$
The first experiment was performed by Michelson in 1881. However, there was
a fundamental error in Michelson’s computations; he used the value w = c
to compute the travel time of the perpendicular ray. This error leads to an
additional factor of 2 in (1.29), hence Michelson expected an effect twice the
actual size. The error of his experimental design would have been sufficiently
small to measure this larger effect, but the actual effect is merely half the size
and Michelson’s experiment was not precise enough to measure (1.29).
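The factor of 2 mentioned above can be reproduced numerically. The following sketch compares the time difference obtained with the correct perpendicular speed w = sqrt(c² − v²) to the one obtained with w = c; the arm length and the function names are illustrative assumptions.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def t_parallel(l, v, c=C):
    """Round-trip time of the parallel ray."""
    return l / (c + v) + l / (c - v)

def t_perp(l, v, c=C):
    """Round-trip time of the perpendicular ray with w = sqrt(c^2 - v^2)."""
    return 2 * l / math.sqrt(c * c - v * v)

def t_perp_with_w_equal_c(l, v, c=C):
    """Perpendicular travel time if one (incorrectly) sets w = c."""
    return 2 * l / c

l, v = 1.0, 3.0e4                                  # hypothetical arm length, earth velocity
delta_correct = t_parallel(l, v) - t_perp(l, v)    # (1.29a)
delta_w_equal_c = t_parallel(l, v) - t_perp_with_w_equal_c(l, v)
ratio = delta_w_equal_c / delta_correct
```

For v/c of order 1e-4 the ratio differs from 2 only at relative order v²/c².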
The conclusion drawn by Michelson and Morley was that the earth drags the
entire ether along, possibly gravitatively. (This view was consistent with an
earlier suggestion by Stokes.) However, as had been known long before, the
observation of stellar aberration is proof that this kind of ether drag is impossi-
ble. (The phenomenon of stellar aberration would be very different, if the light
reaching us from the stars passed through an ether whose velocity changes with
position.) Moreover, the connection of that kind of complete ether drag with
the partial dragging established by Fizeau and Hoek would remain mysterious.
The attempts to explain the null result of the interferometry experiment became
rather desperate. Since the laboratory where Michelson and Morley performed
their experiment was situated in the basement of a building, it was suggested
that there could be complete ether drag in such a surrounding. But experiments
on mountains and in balloons (!) confirmed the null results.9 As a last resort,
Fitzgerald and Lorentz proposed the contraction of lengths in the direction of
motion of earth through ether. However, the mechanism that caused such a
contraction remained unclear.
In 1905, Einstein made a clear cut. The main idea is simple: There is no ether.
9
The occasional non-null results were all shown to be due to bad experimental design.
EINSTEIN
This is the line of reasoning: There is no ether. The Maxwell equations are not
‘derived’ equations that describe the perturbations of a medium; the Maxwell
equations are ‘fundamental’. In other words: ‘Laws of nature’. But then,
obviously, the principle of relativity comes into play. If the principle of relativity
holds, then the set of coordinate systems w.r.t. which the Maxwell equations
take their standard form are the inertial frames of reference.
So, then, what are the coordinate transformations that leave invariant the
Maxwell equations?
Let us use the results of section 1.5. The Maxwell equations encompass the
wave equation. It thus makes sense to begin by studying the coordinate trans-
formations that leave invariant the wave equation, i.e.,
$$\Box\phi = -\frac{1}{c^2}\,\partial_t^2\phi + \Delta\phi = 0 . \tag{2.1}$$
Like at the end of section 1.4 we begin by writing the wave equation in a
convenient form. Using the Kronecker symbol (and the Einstein summation
convention) the wave equation (2.1) becomes
$$\Box\phi(t,x) = -\frac{1}{c^2}\,\partial_t^2\phi(t,x) + \delta^{ij}\,\partial_i\partial_j\phi(t,x) = 0 . \tag{2.2}$$
We define a coordinate x0 that encodes time by
$$x^0 = ct .$$
It is common to specify the arguments of the functions collectively; e.g., instead
of $\phi(x^0, x^1, x^2, x^3)$ we will write $\phi(x^\sigma)$.
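Remark. Strictly speaking, we conclude from $L^\mu{}_\nu = \partial \bar x^\mu / \partial x^\nu = \text{const}$ that
$$\bar x^\mu = L^\mu{}_\nu\,x^\nu + \bar a^\mu$$
for some constant vector $\bar a^\mu$. However, this merely corresponds to an additional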
translation (in time and/or space). For simplicity we suppress this translational
freedom for the moment.
In addition to (2.10), we conclude from (2.9) that invariance of the wave equa-
tion requires
$$L^\alpha{}_\mu\,L^\beta{}_\nu\,\eta^{\mu\nu} = \eta^{\alpha\beta} , \tag{2.11}$$
which means that η remains invariant under (2.10).
Our considerations on the wave equation that led us to (2.10) and (2.11) thus
prove the following theorem:
Theorem 2.2. The Lorentz transformations are uniquely characterized as the
transformations
$$\bar x^\mu = L^\mu{}_\nu\,x^\nu$$
that leave $\eta$ invariant, i.e.,
$$L^\mu{}_\alpha\,L^\nu{}_\beta\,\eta^{\alpha\beta} = \eta^{\mu\nu} . \tag{2.12}$$
see (2.4).
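Remark. In matrix notation, equation (2.12′) reads
$$L^T \eta\,L = \eta .$$
It is simple to see that the two representations (2.12) and (2.12′) are equivalent:
We multiply (2.12) with $[L^{-1}]^\sigma{}_\mu$ and $[L^{-1}]^\lambda{}_\nu$ and obtain
$$[L^{-1}]^\sigma{}_\mu\,L^\mu{}_\alpha\,[L^{-1}]^\lambda{}_\nu\,L^\nu{}_\beta\,\eta^{\alpha\beta} = [L^{-1}]^\sigma{}_\mu\,[L^{-1}]^\lambda{}_\nu\,\eta^{\mu\nu} .$$
Since $[L^{-1}]^\sigma{}_\mu\,L^\mu{}_\alpha = \delta^\sigma_\alpha$ and $[L^{-1}]^\lambda{}_\nu\,L^\nu{}_\beta = \delta^\lambda_\beta$ we get
$$L^{-1}\,\eta^{-1}\,(L^{-1})^T = \eta^{-1} .$$
Note that $\eta$ and $\eta^{-1}$ are identical as matrices; the reason for choosing $\eta^{-1}$
here will become clear later (when we discuss metrics).2 Therefore,
$$\bigl(L^{-1}\,\eta^{-1}\,(L^{-1})^T\bigr)^{-1} = \eta ,$$
which implies
$$L^T \eta\,L = \eta .$$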
2
The index structure of the Minkowski metric $\eta$ is $\eta_{\mu\nu}$; the index structure of its inverse $\eta^{-1}$
is $\eta^{\mu\nu}$. (Regarded as matrices the two are equal.) The reason is that the matrix identity
$\eta^{-1}\eta = 1$ corresponds to $\eta^{\mu\nu}\eta_{\nu\sigma} = \delta^\mu_\sigma$ in index notation.
which is (2.12′ ).
Remark. The equivalence of (2.12) and (2.12′ ) is in fact rather obvious and does
not need any particular work (however, we may consider the above a useful
exercise). One simply takes into account the action of L on contravariant and
covariant tensors; see appendix.
$$L^T \eta\,L = \eta . \tag{2.15$'$}$$
In this section we will see what Lorentz transformations actually look like.
The first question we ask is: How many Lorentz transformations are there?
Every Lorentz transformation is represented by a (4 × 4) matrix, which makes
16 unknowns. Looking at equation (2.15) we see that there are 10 equations
that L must satisfy (since η is a symmetric matrix, 6 of the 16 equations
are redundant). 16 − 10 = 6. In other words, we expect that the Lorentz
transformations form a 6-parameter family.
Provided that the ansatz is successful (i.e., provided that there exist Lorentz
transformations with this structure), we automatically obtain the complete
six-parameter family of Lorentz transformations.
irrespective of where it sits within the matrix L. Using the ansatz for L
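in (2.15), i.e., in $L^T \eta\,L = \eta$, we find that a matrix $L$ of the (2 × 2) block
form is a Lorentz transformation if and only if
$$K^T \begin{pmatrix} \mp 1 & 0 \\ 0 & 1 \end{pmatrix} K = \begin{pmatrix} \mp 1 & 0 \\ 0 & 1 \end{pmatrix} . \tag{2.17}$$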
Note again that there exist three rotational degrees of freedom, since the group
SO(3) is a three-parameter group (characterized, e.g., by the Euler angles).
Let us now consider the − case, i.e., (2.17) with the minus sign. We obtain
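$$K^T \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix} K = \begin{pmatrix} a & c \\ b & d \end{pmatrix} \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} -a^2 + c^2 & -ab + cd \\ -ab + cd & -b^2 + d^2 \end{pmatrix} ,$$
$$b = -\sinh w , \qquad d = \cosh w$$
$$u = w .$$
Therefore,
$$K = \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} \cosh u & -\sinh u \\ -\sinh u & \cosh u \end{pmatrix} . \tag{2.22}$$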
Using K in the ansatzes we obtain the three remaining Lorentz transformations,
i.e.,
$$L = \begin{pmatrix} \cosh u & -\sinh u & 0 & 0 \\ -\sinh u & \cosh u & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} , \tag{2.23}$$
and the two other ones where the (2 × 2) block involves x2 or x3 instead of x1 ,
i.e.,
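$$\begin{pmatrix} \cosh u & 0 & -\sinh u & 0 \\ 0 & 1 & 0 & 0 \\ -\sinh u & 0 & \cosh u & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} , \qquad \begin{pmatrix} \cosh u & 0 & 0 & -\sinh u \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ -\sinh u & 0 & 0 & \cosh u \end{pmatrix} .$$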
Thereby we have successfully completed our search for the Lorentz transforma-
tions. We have found six elementary Lorentz transformations; because of the
group property, every Lorentz transformation can be represented as a combi-
nation (composition) of these six elementary transformations.
Remark. The parameter u is called rapidity.
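That the matrix (2.23) satisfies the defining condition L^T η L = η, and that two boosts along the same axis compose to a boost of the same form, can be confirmed with a few lines of Python. This is a minimal sketch using nested lists; the helper names (`matmul`, `boost_x1`, `is_lorentz`, `close`) are illustrative and the rapidities are arbitrary test values.

```python
import math

# Matrix eta = diag(-1, 1, 1, 1) for the coordinates (x^0 = ct, x^1, x^2, x^3).
ETA = [[-1.0, 0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0, 1.0]]

def matmul(a, b):
    """Product of two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transpose(a):
    return [list(row) for row in zip(*a)]

def boost_x1(u):
    """Elementary Lorentz transformation (2.23) with rapidity u."""
    ch, sh = math.cosh(u), math.sinh(u)
    return [[ch, -sh, 0.0, 0.0],
            [-sh, ch, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def close(a, b, tol=1e-12):
    """Entrywise comparison of two 4x4 matrices."""
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(4) for j in range(4))

def is_lorentz(m):
    """True if m^T eta m = eta up to rounding errors."""
    return close(matmul(matmul(transpose(m), ETA), m), ETA)

composed = matmul(boost_x1(0.3), boost_x1(0.4))
```

In this example `composed` agrees with `boost_x1(0.7)`, i.e. rapidities of boosts along a common axis add.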
can be rewritten as
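$$\begin{pmatrix} \bar t \\ \bar x^1 \\ \bar x^2 \\ \bar x^3 \end{pmatrix} = \begin{pmatrix} \cosh u & -c^{-1}\sinh u & 0 & 0 \\ -c\,\sinh u & \cosh u & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} t \\ x^1 \\ x^2 \\ x^3 \end{pmatrix} \tag{2.24}$$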
i.e., the particles are in uniform motion in the direction of the x1 -axis with
some constant velocity v which is given by
v = c tanh u . (2.26)
Consequently, v is the constant velocity of the inertial observer X̄ as seen by
the observer X. (Note that a motion in the direction of the positive x1 -axis is
described by positive values of v, while a motion in the direction of the negative
x1 -axis is described by negative values of v.) We draw an interesting conclusion
from the formula (2.26): Since the modulus of tanh u is always less than 1, we
find
|v| < c . (2.27)
This means that if X and X̄ are inertial observers, then the relative velocity
|v| describing their relative motion must necessarily be less than the speed of
light.
Definition 2.3. The γ-factor associated with the relative velocity |v| is defined
as
$$\gamma = \gamma(v) = \frac{1}{\sqrt{1 - v^2/c^2}} . \tag{2.28}$$
Rewriting (2.24) in terms of v and γ(v) yields the standard form for the
Lorentz boost in x1 -direction:
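$$\begin{pmatrix} \bar t \\ \bar x^1 \\ \bar x^2 \\ \bar x^3 \end{pmatrix} = \begin{pmatrix} \gamma & -\gamma v/c^2 & 0 & 0 \\ -\gamma v & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} t \\ x^1 \\ x^2 \\ x^3 \end{pmatrix} \tag{2.29}$$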
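The step from (2.24) to (2.29) rests on the identities cosh u = γ and sinh u = γ v / c for v = c tanh u. The Python sketch below compares the (t, x¹) blocks of both matrices numerically; the function names and the sample rapidity are assumptions made for illustration.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def gamma(v, c=C):
    """Gamma factor (2.28)."""
    return 1.0 / math.sqrt(1.0 - (v / c) ** 2)

def velocity_from_rapidity(u, c=C):
    """Relation (2.26): v = c tanh u."""
    return c * math.tanh(u)

def block_from_rapidity(u, c=C):
    """Upper-left 2x2 block of (2.24), acting on (t, x^1)."""
    return [[math.cosh(u), -math.sinh(u) / c],
            [-c * math.sinh(u), math.cosh(u)]]

def block_from_velocity(v, c=C):
    """Upper-left 2x2 block of (2.29), acting on (t, x^1)."""
    g = gamma(v, c)
    return [[g, -g * v / c**2],
            [-g * v, g]]

u = 1.2                                   # arbitrary sample rapidity
block_u = block_from_rapidity(u)
block_v = block_from_velocity(velocity_from_rapidity(u))
```

Both blocks agree entry by entry up to rounding, which is the content of the rewriting.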
Remark. For all inertial observers the speed of light is equal to c. This is
already implicit in our assumptions, since we have required the wave equation
(which contains c) to hold w.r.t. all inertial frames. As an exercise we can check
consistency with (2.29). To that end consider a photon which moves according
to x1 = ct w.r.t. the frame X. In coordinates X̄ we obtain t̄ = γ(1 − v/c)t and
x̄1 = γ(1 − v/c)ct from (2.29). Consequently, x̄1 = ct̄, i.e., also for the observer
X̄ the photon moves with velocity c.
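The consistency check of the remark can also be run numerically for several boost velocities. This is a small sketch based on (2.29); the function names and the chosen velocities are illustrative.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def boost_x1(t, x1, v, c=C):
    """(t, x^1) transformed with the standard boost (2.29)."""
    g = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    return g * (t - v * x1 / c**2), g * (x1 - v * t)

def photon_speed_in_boosted_frame(v, t=1.0, c=C):
    """Speed x_bar / t_bar of a photon that moves according to x^1 = c t in X."""
    t_bar, x_bar = boost_x1(t, c * t, v, c)
    return x_bar / t_bar

speeds = [photon_speed_in_boosted_frame(f * C) for f in (0.1, 0.6, -0.9)]
```

Each entry of `speeds` equals c up to rounding, independently of the relative velocity of the observers.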
The Lorentz boosts associated with the other axes are completely analogous.
Here is the complete list:
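where $\vec v$ is an arbitrary vector (with $|\vec v| < c$ of course). This general Lorentz
boost can be obtained by applying suitable rotations to (2.30a),
$$L_{\text{general}}(\vec v) = \begin{pmatrix} 1 & \\ & R^T \end{pmatrix} \begin{pmatrix} \gamma & -\gamma|v|/c^2 & & \\ -\gamma|v| & \gamma & & \\ & & 1 & \\ & & & 1 \end{pmatrix} \begin{pmatrix} 1 & \\ & R \end{pmatrix} ,$$
i.e., $R$ is a rotation matrix that rotates the vector $\vec v$ into the vector $(|v|, 0, 0)^T$.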
We see that the family of Lorentz boosts is a three-parameter family, where
the parameters are the three components of ~v in (2.31).
and therefore
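$$\bar t = \Bigl[1 + \frac{1}{2}\frac{v^2}{c^2} + O\Bigl(\frac{v^4}{c^4}\Bigr)\Bigr]\Bigl(t - \frac{v}{c^2}\,x^1\Bigr) = t\,\Bigl[1 + O\Bigl(\frac{v^2}{c^2}\Bigr)\Bigr] - \frac{x^1}{c}\,O\Bigl(\frac{v}{c}\Bigr) ,$$
$$\bar x^1 = \Bigl[1 + \frac{1}{2}\frac{v^2}{c^2} + O\Bigl(\frac{v^4}{c^4}\Bigr)\Bigr]\bigl(-vt + x^1\bigr) = \bigl(-vt + x^1\bigr)\Bigl[1 + O\Bigl(\frac{v^2}{c^2}\Bigr)\Bigr] .$$
$$\bar t \simeq t \tag{2.35a}$$
$$\bar x^1 \simeq x^1 - vt \tag{2.35b}$$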
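How well the Galilean boost approximates the Lorentz boost at everyday velocities can be seen from a quick numerical comparison. The sketch below uses made-up sample values (a velocity of 30 m/s, an event at t = 10 s and x¹ = 1000 m); the function names are illustrative.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def lorentz_boost(t, x1, v, c=C):
    """(t, x^1) transformed with the Lorentz boost (2.29)."""
    g = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    return g * (t - v * x1 / c**2), g * (x1 - v * t)

def galilean_boost(t, x1, v):
    """(t, x^1) transformed with the Galilean boost (2.35)."""
    return t, x1 - v * t

t, x1 = 10.0, 1000.0                           # hypothetical event
slow = 30.0                                    # everyday velocity in m/s
t_l, x_l = lorentz_boost(t, x1, slow)
t_g, x_g = galilean_boost(t, x1, slow)

fast = 0.5 * C                                 # relativistic velocity
t_l_fast, _ = lorentz_boost(t, x1, fast)
```

For the slow velocity the two transformations differ far below any everyday measurement accuracy, whereas for v = c/2 the time coordinates differ by more than a second in this example.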
$$(x^0, x^1, x^2, x^3)$$
$$x^0 = ct .$$
Remark. Recall that one can interpret the coordinate x0 as measuring time in
‘light meters’ instead of in seconds (where a ‘light meter’ is the time that light
takes to travel through one meter). More or less equivalently, one can measure
time in seconds and length in light seconds.
µ, ν, . . . = 0, 1, 2, 3
i, j, . . . = 1, 2, 3
$$x^{\mu'} = L^\mu{}_\nu\,x^\nu .$$
$$c = 1 .$$
$$\gamma = \gamma(v) = \frac{1}{\sqrt{1 - v^2}} . \tag{2.37}$$
$$\eta(v, w) = -v^0 w^0 + v^1 w^1 + v^2 w^2 + v^3 w^3 = -v^0 w^0 + \vec v\cdot\vec w , \tag{2.39a}$$
i.e., η(·, ·) is left invariant. Conversely, suppose that η(·, ·) is left invariant by
a transformation L, i.e.,
η(Lv, Lw) = η(v, w)
for all v, w. Then
Since this holds for all v, w, the term in brackets must vanish, i.e.,
$$L^T \eta\,L - \eta = 0 .$$
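The invariance of η(·, ·) under a Lorentz transformation can be illustrated with concrete numbers. The sketch below uses units with c = 1 and the boost along the x¹-axis; the two four-vectors and the velocity 0.8 are arbitrary sample values, and the function names are illustrative.

```python
import math

def eta(a, b):
    """Scalar product (2.39a) in units with c = 1."""
    return -a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3]

def boost_x1(vec, v):
    """Apply the x^1-boost with velocity v (|v| < 1) to a four-vector."""
    g = 1.0 / math.sqrt(1.0 - v * v)   # gamma factor (2.37)
    return [g * (vec[0] - v * vec[1]), g * (vec[1] - v * vec[0]), vec[2], vec[3]]

a = [2.0, 1.0, -0.5, 3.0]              # arbitrary four-vectors
b = [0.5, -1.5, 2.0, 1.0]

before = eta(a, b)
after = eta(boost_x1(a, 0.8), boost_x1(b, 0.8))
```

Both values coincide up to rounding, although the individual components of the vectors change under the boost.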
The geometric structure of spacetime has been unveiled. In the following sec-
tions, this geometric structure will be the cornerstone for everything.
MINKOWSKI
We collect our previous considerations and condense our findings into what is in
fact the actual postulate of special relativity: The model of our spatiotemporal
reality is Minkowski spacetime.
Remark. Recall that (Galilean) space is an affine space modeled over a three-
dimensional Euclidean vector space, see section 1.1. Note the analogy.
that there exist bases of the vector space, $\{e_0, e_1, e_2, e_3\}$, such that
$$\eta(e_\mu, e_\nu) = \eta_{\mu\nu} = \begin{pmatrix} -1 & & & \\ & 1 & & \\ & & 1 & \\ & & & 1 \end{pmatrix} , \tag{3.1}$$
or, alternatively,
$$\eta(v, w) = v^\mu w_\mu \tag{3.4}$$
$$\eta(v, w) = -v^0 w^0 + \vec v\cdot\vec w = -v^0 w^0 + v^1 w^1 + v^2 w^2 + v^3 w^3 . \tag{3.5}$$
A note of advice. The reader is strongly recommended to prepare their own lecture
notes and to include the figures/Minkowski diagrams drawn on the blackboard during
the lecture course.
It follows that
The most conspicuous structure is the set of the two null lines, which are the
straight lines consisting of null vectors (at ϕ = ±45°).
which is again the equation of a unit hyperbola, whose asymptotes are the two
null lines.
In two-dimensional Minkowski space the light cone reduces to the two null lines
with a slope of ±45°, and the hyperboloids reduce to hyperbolas.
We see that the family of timelike vectors falls into two disconnected classes:
Future-directed timelike vectors and past-directed timelike vectors. The analog
is true for null vectors.
Definition 3.4. A timelike four-vector $u$ is called future-directed if $u^0 > 0$ and
past-directed if $u^0 < 0$. The analogous definition holds for null vectors.
where i, j = 1, 2, 3. The vector e0 thus lies on the unit mass shell (since it is
timelike, future-oriented, and normalized), while the vectors ei are normalized
spacelike vectors orthogonal to it.
Remark. Recall that in two-dimensional Minkowski space we consider only e0
and e1 , where the two vectors are related by a reflection at the straight line
with slope 45°, see (3.7).
Henceforth we suppress the distinction between x0 and t and use the two inter-
changeably. In other words, when we write t, then we refer to time measured
in units such that
c=1.
These coordinates rely on the decomposition of four-vectors v w.r.t. the basis
$\{e_0, e_1, e_2, e_3\}$, i.e.,
$$v = v^\mu e_\mu = v^0 e_0 + v^i e_i .$$
We write
$$v = \begin{pmatrix} v^0 \\ \vec v \end{pmatrix}$$
w.r.t. the observer X.
The time-lines of an observer X are the straight lines given by ~x = const. The
special time-line ~x = 0 is the time-axis of the observer. Clearly, the time-lines
are associated with the vector e0 of the basis, which is simply because the
direction of the time-lines is given by e0 . Since the vector e0 is distinguished
from the other basis vectors by the fact that η(e0 , e0 ) = −1 (instead of +1),
and since the time-lines are constructed from it, the vector e0 is of central
importance.
Multiplication (in the sense of the scalar product) with the four-velocity u of
X yields
$$\eta(u, x) = -t .$$
This is reminiscent of the standard Hesse normal form1 for the representation
of a plane in a Euclidean space. In (3.11) the four-vector u is the normal vector
(in the sense of the Minkowski metric, not in the standard Euclidean sense) of
the plane.
The plane $t = 0$ is $\{x \mid \eta(u, x) = 0\}$, which is simply the plane orthogonal to
X’s four-velocity $u$ (and thus the plane spanned by the spatial basis vectors
$\{e_1, e_2, e_3\}$); the planes $t = \tau$ are the parallel planes.
in other words: two events that are spacelike separated do not have a unique
chronological order—their chronological order is observer-dependent. In the
following we prove this claim.
W.l.o.g. we assume that p coincides with the origin (otherwise we make a trans-
lation). Since q is spacelike separated from p we are able to choose an inertial
observer X whose (timelike) four-velocity u is orthogonal to (the spacelike vec-
tor) q. W.r.t. X we have
$$p = \begin{pmatrix} 0 \\ \vec{o} \end{pmatrix} , \qquad q = \begin{pmatrix} 0 \\ \vec{q} \end{pmatrix} .$$
By construction, for the observer X, p and q lie in the same plane of simul-
taneity (namely t = 0) and are thus simultaneous; note that η(u, p) = 0 and
η(u, q) = 0.
Finally, consider two arbitrary events p and q that are timelike/null separated
(where the timelike/null four-vector $\overrightarrow{pq}$ is assumed to be future-oriented). For
all observers the event p comes before the event q. The proof is left as an
exercise.
Definition 3.8. The future of an event o is the set of all events that are
timelike/null separated from o by a future-oriented timelike/null vector. The
past of an event o is the set of all events that are timelike/null separated from
o by a past-oriented timelike/null vector.
Hence, $t = \tilde\lambda$ and $\vec{x} = \tilde\lambda\, \vec{w}/w^0$, which implies
$$\vec{x} = \frac{\vec{w}}{w^0}\, t .$$
In other words, for the inertial observer X, the observer X ′ is in uniform motion
with velocity
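$$\vec{v} = \frac{\vec{w}}{w^0} = \frac{\vec{w}}{\sqrt{1 + \vec{w}^2}} .$$
Accordingly,
$$\vec{w} = \frac{\vec{v}}{\sqrt{1 - \vec{v}^2}} = \gamma \vec{v}$$
and $w^0 = \gamma$. This implies that, while w is the zeroth unit vector as seen by X′,
it looks like
$$w = \begin{pmatrix} w^0 \\ \vec{w} \end{pmatrix} = \gamma \begin{pmatrix} 1 \\ \vec{v} \end{pmatrix} \quad \text{w.r.t. } X , \tag{3.14}$$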
i.e., w.r.t. the basis {e0 , e1 , e2 , e3 } used by X.
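$$w = \begin{pmatrix} w^{0'} \\ \vec{w}' \end{pmatrix} = \begin{pmatrix} 1 \\ \vec{o} \end{pmatrix} \quad \text{w.r.t. } X' = \{e'_0, e'_1, e'_2, e'_3\} , \tag{3.15a}$$
$$w = \begin{pmatrix} w^0 \\ \vec{w} \end{pmatrix} = \gamma \begin{pmatrix} 1 \\ \vec{v} \end{pmatrix} \quad \text{w.r.t. } X = \{e_0, e_1, e_2, e_3\} , \tag{3.15b}$$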
Obviously, the scenario (3.15) can also be described by interchanging the roles
of X and X ′ .
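$$u = \begin{pmatrix} u^0 \\ \vec{u} \end{pmatrix} = \begin{pmatrix} 1 \\ \vec{o} \end{pmatrix} \quad \text{w.r.t. } X = \{e_0, e_1, e_2, e_3\} , \tag{3.16a}$$
$$u = \begin{pmatrix} u^{0'} \\ \vec{u}' \end{pmatrix} = \gamma \begin{pmatrix} 1 \\ -\vec{v} \end{pmatrix} \quad \text{w.r.t. } X' = \{e'_0, e'_1, e'_2, e'_3\} . \tag{3.16b}$$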
When X ′ moves with ~v w.r.t. X, then X ′ sees X move away with (−~v ).
η(u, w) = −γ . (3.17)
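As a small numerical illustration of (3.17), the following Python sketch builds the four-velocities of X and X′ in the basis of X and recovers γ and the relative speed from the scalar product alone. The helper names `eta` and `four_velocity` and the sample speed 0.6 are assumptions made for this example; they do not appear in the notes.

```python
import math

def eta(a, b):
    """Minkowski product with signature (-, +, +, +), in units with c = 1."""
    return -a[0] * b[0] + sum(x * y for x, y in zip(a[1:], b[1:]))

def four_velocity(v):
    """Four-velocity gamma * (1, v) for a three-velocity v with |v| < 1."""
    gamma = 1.0 / math.sqrt(1.0 - sum(x * x for x in v))
    return [gamma] + [gamma * x for x in v]

u = four_velocity([0.0, 0.0, 0.0])   # observer X, cf. (3.16a)
w = four_velocity([0.6, 0.0, 0.0])   # observer X', cf. (3.15b), sample speed 0.6

gamma = -eta(u, w)                        # relation (3.17)
speed = math.sqrt(1.0 - 1.0 / gamma**2)   # invert gamma = 1 / sqrt(1 - v^2)
print(gamma, speed)
```

Only the scalar product enters, so the same two lines would give the same γ if both four-velocities were expressed in the basis of X′ instead.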
PARTICLES
$$\vec{v}(\lambda) = \frac{\vec{w}(\lambda)}{w^0(\lambda)} .$$
Timelike world lines describe the motion of particles at velocities less than the
speed of light, null world lines describe the motion of particles at the speed of
light. This is a simple consequence of (4.3): If the velocity is smaller than the
speed of light, i.e., |~v | < 1, then the vector (1, ~v )T is timelike (and, incidentally,
future-oriented), because −1 + ~v 2 < 0; hence w is timelike. If the velocity
equals the speed of light, then (1, ~v )T is null, because −1 + ~v 2 = 0; hence the
tangent vector w is a null vector. Finally, if the velocity is superluminous, then
A simple parametrization of the world line (4.3) is to use the coordinate time
t of some observer X as the parameter λ. Then (4.3) becomes
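$$x(t) = \begin{pmatrix} t \\ \vec{x}(t) \end{pmatrix} ,$$
$$w(t) = \frac{d}{dt} x(t) = \frac{d}{dt} \begin{pmatrix} t \\ \vec{x}(t) \end{pmatrix} = \begin{pmatrix} 1 \\ \vec{v}(t) \end{pmatrix} .$$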
In general, reparametrizations of a curve change the length of the tangent vec-
tors. Obviously, the coordinate time reparametrization corresponds to setting
w0 = 1 in (4.3).
4.2 Tachyons
the exterior of the light cone of p. In brief: p and q are spacelike separated.
This has some counterintuitive consequences: For X, p comes before q (p is
‘emission’, q is ‘reception’ of the tachyon). However, there exist observers for
whom p and q are simultaneous. There even exist observers for whom p comes
after q. For such observers, the tachyon is seen to come out of the receiver at
q, move in the direction of the emitter, and vanish in the emitter p. This is
weird.
where ~v is the velocity of X ′ as seen by X, see (3.15). The event q lies on the
plane of simultaneity t′ = τ ′ with
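$$\eta(u', q) = \eta\!\left( \gamma \begin{pmatrix} 1 \\ \vec{v} \end{pmatrix} , \begin{pmatrix} 0 \\ \vec{q} \end{pmatrix} \right) = \gamma\, \vec{v}\, \vec{q} = -\tau' .$$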
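To compute the time t when X receives the tachyon that comes back we
compute the intersection of that plane with the world line (t, 0) of the
emitter/receiver that X uses:
$$\eta(u', p') = \underbrace{\eta\!\left( \gamma \begin{pmatrix} 1 \\ \vec{v} \end{pmatrix} , \begin{pmatrix} t \\ 0 \end{pmatrix} \right)}_{-\gamma t} \overset{!}{=} \gamma\, \vec{v}\, \vec{q} .$$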
t = −~v ~q , (4.5)
which is negative (when we have made the right choices, i.e., ~v ~q > 0). But this
means that X receives the tachyon that X ′ has sent back before the original
tachyon has been emitted (at t = 0). What is supposed to happen, then, if X
decides, during the time interval $(-\vec{v}\,\vec{q}, 0)$, “I don’t feel like sending a tachyon any
longer”? Then X′ couldn’t have received anything, and consequently, wouldn’t
have sent back anything. But then where did the tachyon that X received at
t = −~v ~q come from?
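The arithmetic behind (4.5) can be retraced numerically. The sketch below assumes, as in the formula for η(u′, q) above, that the reception event has the form q = (0, q⃗) w.r.t. X; the sample values for v⃗ and q⃗ and the helper `eta` are illustrative assumptions.

```python
import math

def eta(a, b):
    """Minkowski product with signature (-, +, +, +), in units with c = 1."""
    return -a[0] * b[0] + sum(x * y for x, y in zip(a[1:], b[1:]))

v = [0.5, 0.0, 0.0]                    # velocity of X' as seen by X (sample value)
gamma = 1.0 / math.sqrt(1.0 - sum(x * x for x in v))
u_prime = [gamma] + [gamma * x for x in v]

q = [0.0, 2.0, 0.0, 0.0]               # reception event, assumed of the form (0, q_vec)

# Plane of simultaneity of X' through q: all x with eta(u', x) = eta(u', q).
plane_value = eta(u_prime, q)          # equals gamma * (v . q) = -tau'

# Points on the world line of X are (t, 0, 0, 0); for them eta(u', x) = -gamma * t.
t_return = -plane_value / gamma        # this reproduces (4.5): t = -(v . q)
print(t_return)
```

With v⃗ q⃗ > 0 the returned value is negative, i.e., the reply arrives on the world line of X before the emission at t = 0.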
R ∋ λ 7→ x(λ) , (4.6a)
and let
$$w(\lambda) = \frac{d}{d\lambda} x(\lambda) \tag{4.6b}$$
be the tangent vector, which we suppose to be timelike for all λ. We ask
the question of how much time passes for the particle between two events
x1 = x(λ1 ) and x2 = x(λ2 ).
Consider the particle at a fixed value of λ, i.e., at a fixed event x(λ). To measure
how much time passes in the infinitesimal interval [λ, λ+ dλ], the particle needs
an observer. But not any observer:
If the particle is not in uniform motion, there does not exist one global rest
frame. However, there exists a momentary rest frame, i.e., an inertial observer
for whom the particle is at rest at the instant of time λ.
$$u = \frac{w(\lambda)}{\sqrt{-\eta\big(w(\lambda), w(\lambda)\big)}} . \tag{4.7}$$
In the infinitesimal interval [λ, λ + dλ] the world line connects the event x(λ)
with the event
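$$x(\lambda + d\lambda) = x(\lambda) + w(\lambda)\, d\lambda = x(\lambda) + \begin{pmatrix} \sqrt{-\eta\big(w(\lambda), w(\lambda)\big)} \\ \vec{o} \end{pmatrix} d\lambda .$$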
Accordingly,
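$$x(\lambda + d\lambda) - x(\lambda) = \begin{pmatrix} t(\lambda + d\lambda) \\ \vec{x}(\lambda + d\lambda) \end{pmatrix} - \begin{pmatrix} t(\lambda) \\ \vec{x}(\lambda) \end{pmatrix} = \begin{pmatrix} \sqrt{-\eta\big(w(\lambda), w(\lambda)\big)}\, d\lambda \\ \vec{o} \end{pmatrix}$$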
The time that has elapsed for the particle between the events x(λ) and x(λ+dλ)
we call ds. It coincides with the element dt = t(λ+dλ)−t(λ) in the coordinates
of the momentary rest frame and can thus be read off straightforwardly as the
0th component of the four-vector x(λ + dλ) − x(λ). Therefore,
$$ds = \sqrt{-\eta\big(w(\lambda), w(\lambda)\big)}\, d\lambda . \tag{4.8}$$
Remark. Since the particle is not in uniform motion in general, the momen-
tary rest frame changes along the world line of the particle. Therefore, to
avoid ambiguities in the notation we do not denote the time that elapses along
a particle’s world line by t (which is the coordinate time of some particular
momentary rest frame), but by s. This is the proper time of the particle.
Definition 4.3. The proper time (denoted by s) along a world line of a particle
is the flow of time as measured in the momentary rest frames of the particle.
In order to obtain the proper time s it merely remains to integrate (4.8) along
the particle’s world line. The proper time s along a world line is given by
$$s = \int \sqrt{-\eta\big(w(\lambda), w(\lambda)\big)}\, d\lambda . \tag{4.9a}$$
Accordingly, the proper time ∆s that elapses between two events x1 = x(λ1 )
and x2 = x(λ2 ) is
$$\Delta s = s|_{x_2} - s|_{x_1} = \int_{\lambda_1}^{\lambda_2} \sqrt{-\eta\big(w(\lambda), w(\lambda)\big)}\, d\lambda . \tag{4.9b}$$
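Remark. It is not difficult to show that these formulas are in fact independent
of the chosen parametrization of the world line: Let κ denote the parameter
of an alternative parametrization, i.e., λ = λ(κ) with dλ/dκ > 0; then the tangent
vector w.r.t. the reparametrized curve is
$$\tilde{w}(\kappa) = \frac{d}{d\kappa} x(\lambda) = \frac{d}{d\lambda} x(\lambda)\, \frac{d\lambda}{d\kappa} = w(\lambda)\, \frac{d\lambda}{d\kappa} ,$$
where λ = λ(κ). Therefore,
$$\sqrt{-\eta\big(\tilde{w}(\kappa), \tilde{w}(\kappa)\big)}\, d\kappa = \sqrt{-\eta\big(w(\lambda), w(\lambda)\big)}\, \frac{d\lambda}{d\kappa}\, d\kappa = \sqrt{-\eta\big(w(\lambda), w(\lambda)\big)}\, d\lambda ,$$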
i.e., as expected, the concept of proper time depends only on the geometric
curve and not on the actual parametrization of the world line.
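The parametrization independence of (4.9b) can also be checked numerically. The sketch below is only an illustration: the circular world line with speed 0.6, the reparametrization λ = κ², and the midpoint rule with finite-difference tangents are all choices made for this example.

```python
import math

def eta(a, b):
    """Minkowski product with signature (-, +, +, +), in units with c = 1."""
    return -a[0] * b[0] + sum(x * y for x, y in zip(a[1:], b[1:]))

def world_line(lam, radius=1.0, omega=0.6):
    """Hypothetical example: uniform circular motion with speed radius * omega = 0.6."""
    return [lam, radius * math.cos(omega * lam), radius * math.sin(omega * lam), 0.0]

def proper_time(curve, p1, p2, steps=20000):
    """Midpoint-rule approximation of (4.9b); tangent from central differences."""
    h = (p2 - p1) / steps
    total = 0.0
    for i in range(steps):
        mid = p1 + (i + 0.5) * h
        a = curve(mid - 0.5 * h)
        b = curve(mid + 0.5 * h)
        w = [(y - x) / h for x, y in zip(a, b)]   # approximate tangent vector
        total += math.sqrt(-eta(w, w)) * h
    return total

# Parametrization by lambda on [0, 10] ...
s_lambda = proper_time(world_line, 0.0, 10.0)
# ... and by kappa with lambda = kappa^2 on [0, sqrt(10)] (same geometric curve).
s_kappa = proper_time(lambda k: world_line(k * k), 0.0, math.sqrt(10.0))
print(s_lambda, s_kappa)
```

Both values should agree with each other and with the exact result 10 · √(1 − 0.6²) = 8 up to discretization error.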
Remark. The definition of proper time is coordinate-independent (frame-inde-
pendent), since only the Minkowski metric η(·, ·) appears in (4.9).
Let us consider a (fixed) inertial observer X, whose coordinates are (t, ~x).
(This coordinate system is completely arbitrary; in particular, it need not be
the momentary rest frame of the particle under consideration at any time.)
W.r.t. this observer’s coordinate time t we parametrize the world line of the
particle. The world line then reads
$$\mathbb{R} \ni t \mapsto x(t) = \begin{pmatrix} t \\ \vec{x}(t) \end{pmatrix} , \tag{4.10}$$
The proper time that elapses for the particle during an interval [ti , tf ] of coor-
dinate time t, i.e., between the events xi = x(ti ) and xf = x(tf ), is thus
$$\Delta s = s|_{x_f} - s|_{x_i} = \int_{t_i}^{t_f} \sqrt{1 - \vec{v}^{\,2}(t)}\; dt . \tag{4.11c}$$
The factor $\sqrt{1 - \vec{v}^{\,2}}$ is the time dilation factor. During the time dt, which passes
for the observer X, the particle’s proper time only increases by ds < dt. This
gives rise to the so-called twin paradox.
and
$$\frac{d}{dt} x_B(t) = \begin{pmatrix} 1 \\ \vec{v}_B(t) \end{pmatrix} .$$
For twin A, his sister has been away for a time tf . In other words, the proper
time ∆sA for A between departure and arrival of B is ∆sA = tf . This is obvious
since A is always at rest w.r.t. the inertial coordinate system. Alternatively,
one can compute ∆sA explicitly by inserting ~v = ~vA = ~o into (4.11).
The proper time ∆sB that passes for twin B is different. We obtain
$$\Delta s_B = \int_0^{t_f} \sqrt{1 - \vec{v}_B^{\,2}(t)}\; dt$$
from (4.11). By assumption, $\vec{v}_B(t) \neq 0$ at least for some t; therefore the square
root is less than one at least for some t. Consequently,
$$\Delta s_B = \int_0^{t_f} \underbrace{\sqrt{1 - \vec{v}_B^{\,2}(t)}}_{<1}\; dt < \int_0^{t_f} dt = t_f = \Delta s_A$$
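To make the inequality concrete, the following sketch integrates (4.11) numerically for a made-up velocity profile of twin B. The profile v_B(t) = 0.8 sin(2πt/t_f), which brings B back to the starting point at t = t_f, and the value t_f = 10 are assumptions chosen purely for illustration.

```python
import math

def proper_time(speed, t_final, steps=100000):
    """Midpoint-rule approximation of the integral of sqrt(1 - v(t)^2) over [0, t_final]."""
    h = t_final / steps
    return sum(math.sqrt(1.0 - speed((i + 0.5) * h) ** 2) * h for i in range(steps))

t_f = 10.0

def v_A(t):
    """Twin A stays at rest in the inertial coordinate system."""
    return 0.0

def v_B(t):
    """Hypothetical round trip of twin B: out and back along a line, peak speed 0.8."""
    return 0.8 * math.sin(2.0 * math.pi * t / t_f)

ds_A = proper_time(v_A, t_f)
ds_B = proper_time(v_B, t_f)
print(ds_A, ds_B)
```

Since the speed never exceeds 0.8, the square root is at least 0.6, so the result for B must lie strictly between 0.6 t_f and t_f.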
4.5 Four-velocities
$$u(s) = \frac{d}{ds} x(s) , \tag{4.14}$$
and it has an important property:
Proposition 4.4. The tangent vector u = u(s) of a world line that is para-
metrized w.r.t. proper time s is normalized, i.e., η(u, u) = −1.
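$$u = \frac{dx}{ds} = \frac{dx}{d\lambda} \frac{d\lambda}{ds} = \frac{dx}{d\lambda} \left( \frac{ds}{d\lambda} \right)^{-1} = \frac{w}{\sqrt{-\eta(w, w)}} ,$$
we find
$$\eta(u, u) = \frac{1}{-\eta(w, w)}\, \eta(w, w) = -1 ,$$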
as claimed.
Consider a (fixed) inertial observer X, whose coordinates are (t, ~x). As seen
previously, w.r.t. this observer’s coordinate time t, the world line of the particle
reads
$$\mathbb{R} \ni t \mapsto x(t) = \begin{pmatrix} t \\ \vec{x}(t) \end{pmatrix} , \tag{4.15a}$$
where ~v = ~v (t) is the standard three-velocity of the particle w.r.t. this observer.
Remark. Equations (3.15) and (4.16) resemble each other closely, the main
difference being that $\vec{v}$ = const in (3.15) while $\vec{v} = \vec{v}(t)$ varies along the world
line of the particle in (4.16).
From
$$\frac{ds}{dt} = \sqrt{1 - \vec{v}^{\,2}(t)} ,$$
see (4.11), we conclude that
$$\frac{dt}{ds}(s) = \frac{1}{\sqrt{1 - \vec{v}^{\,2}\big(t(s)\big)}} = \gamma ,$$
Corollary 3.9 of section 3.4 states that the (absolute value of the) relative
velocity between two inertial observers is obtained in a coordinate-independent
(observer-independent ) manner via the scalar product. The analog holds for
the velocity of a particle w.r.t. a given observer:
η(uX , u) = −γ , (4.21)
$$\gamma(t) = \frac{1}{\sqrt{1 - \vec{v}^{\,2}(t)}} = -\eta\big(u_X, u(t)\big) . \tag{4.21'}$$
4.6 Photons
The simplest solutions of Maxwell’s (vacuum) equations are plane waves. Let X
be an inertial observer, whose basis is {u = e0 , e1 , e2 , e3 } and whose coordinates
are (t, ~x). In these coordinates, a plane wave is given by
$$\phi(x) = \phi(t, \vec{x}) = a\, e^{i(-\omega t + \vec{k}\vec{x})} , \tag{4.26}$$
where a is the amplitude, ω the angular frequency, and ~k the wave vector. The
phase velocity ω/|~k| coincides with the speed of light (where c = 1), hence
ω = |~k| . (4.27)
A simple computation shows that (4.26) with (4.27) is indeed a solution of the
free wave equation $\Box\phi = 0$.
(Note that this is w.r.t. {e0 , e1 , e2 , e3 }, i.e., w.r.t. the observer X under con-
sideration.) By construction, k is a null vector, since η(k, k) = −ω 2 + ~k2 = 0.
Using the Minkowski metric, the plane wave (4.26) can be written as
$$\phi(x) = a\, e^{i\eta(k, x)} = a\, e^{i k_\mu x^\mu} , \tag{4.26'}$$
A plane wave (of light) corresponds to free photons. As we will see in the
following, the world looks different for these particles, which is due to the fact
that k is null.
In the limit of geometric optics we deal with light rays. As implied by (4.26′ ),
in Minkowski space, a light ray (a photon) is described by a null line, a straight
world line whose tangent is a null vector.
$$k_\mu k^\mu = \eta(k, k) = 0 . \tag{4.30}$$
For photons, the concept of proper time does not make sense. This is because
we have used momentary rest frames in the derivation of proper time, which
do not exist for photons—there do not exist observers who see photons at rest.
(Formally, by using the formulas of section 4.3, time seems to be at a standstill
for photons. However, it is beyond speculation how photons “perceive” time—
in fact, we can be quite certain that photons do not “perceive” anything at all.)
Accordingly, the concept of a four-velocity does not exist for photons either.
RELATIVISTIC EFFECTS
Let there be given two observers (or particles), which we call X and Y , and
which are represented by the four-velocities uX and uY , respectively. In partic-
ular, we have uX2 = η(uX , uX ) = −1 and uY2 = η(uY , uY ) = −1.
Case I. The third four-vector is uZ , which is timelike and normalized and thus
represents a third observer (or particle). Let vXY denote the (absolute value of
the) relative velocity between X and Y and vY Z the relative velocity between
Y and Z. The question we ask is the following: Given vXY and vY Z , what is
the relative velocity vXZ between X and Z?
the positive x1 -direction), then the relative velocity between A and B is not
1/2 + 3/4 = 5/4 (which would be larger than the speed of light) but 10/11;
see (4.23) and (4.24).
Case II. Alternatively, we consider the case when the third four-vector is a null
vector k, i.e., k2 = η(k, k) = 0. In this case, k describes a null line, i.e., a light
ray. Let ωY denote the (angular) frequency as seen by Y and vXY the relative
velocity between X and Y . The question we ask is the following: Given vXY
and ωY , what is the angular frequency ωX of the photon as seen by X?
The two cases can be treated rather analogously. In the following we will take
Y as our reference observer; hence, in case I,
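$$u_X = \gamma_{XY} \begin{pmatrix} 1 \\ \vec{v}_{XY} \end{pmatrix} , \qquad u_Y = \begin{pmatrix} 1 \\ \vec{o} \end{pmatrix} , \qquad u_Z = \gamma_{YZ} \begin{pmatrix} 1 \\ \vec{v}_{YZ} \end{pmatrix} . \tag{5.1}$$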
Let us concentrate on some special cases. The case α = 0 means that ~vXY and
~vY Z are parallel; α = π means that ~vXY and ~vY Z are antiparallel. In these cases
we obtain
$$v_{XZ} = \frac{v_{XY} \pm v_{YZ}}{1 \pm v_{XY} v_{YZ}} . \tag{5.4}$$
The sign is a + sign, if the velocities are antiparallel, and a − sign, if the
velocities are parallel (as seen by Y ).
In the case α = π/2 the velocities $\vec{v}_{XY}$ and $\vec{v}_{YZ}$ are orthogonal. It easily follows
that
$$v_{XZ} = \sqrt{v_{XY}^2 + v_{YZ}^2 - v_{XY}^2 v_{YZ}^2} . \tag{5.5}$$
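Both special cases can be reproduced from the four-velocities (5.1) alone, using that the γ-factor of the relative velocity is minus the scalar product of the two four-velocities. The following sketch does this for the sample speeds 1/2 and 3/4; the function names are assumptions introduced for the example.

```python
import math

def eta(a, b):
    """Minkowski product with signature (-, +, +, +), in units with c = 1."""
    return -a[0] * b[0] + sum(x * y for x, y in zip(a[1:], b[1:]))

def four_velocity(v):
    """Four-velocity gamma * (1, v) w.r.t. the reference observer Y."""
    gamma = 1.0 / math.sqrt(1.0 - sum(x * x for x in v))
    return [gamma] + [gamma * x for x in v]

def relative_speed(u_a, u_b):
    """Relative speed from gamma = -eta(u_a, u_b)."""
    gamma = -eta(u_a, u_b)
    return math.sqrt(1.0 - 1.0 / gamma**2)

u_X = four_velocity([0.5, 0.0, 0.0])

# Antiparallel velocities as seen by Y: '+' signs in (5.4).
v_antiparallel = relative_speed(u_X, four_velocity([-0.75, 0.0, 0.0]))

# Parallel velocities as seen by Y: '-' signs in (5.4).
v_parallel = relative_speed(u_X, four_velocity([0.75, 0.0, 0.0]))

# Orthogonal velocities: formula (5.5).
v_orthogonal = relative_speed(u_X, four_velocity([0.0, 0.75, 0.0]))

print(v_antiparallel, v_parallel, v_orthogonal)
```

The antiparallel case gives 10/11, the parallel case (3/4 − 1/2)/(1 − 3/8) = 2/5, and the orthogonal case the square root of 1/4 + 9/16 − 9/64.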
Let α denote the angle between ~vXY and ~kY (and recall that |~kY | = ωY ). Then
we get
$$\omega_X^2 = \gamma_{XY}^2\, \omega_Y^2 \left( 1 - v_{XY} \cos\alpha \right)^2$$
and, finally,
$$\omega_X = \omega_Y\, \frac{1 - v_{XY} \cos\alpha}{\sqrt{1 - v_{XY}^2}} . \tag{5.6}$$
i.e., the frequency increases and the wave length undergoes a blueshift.
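A short sketch of the Doppler formula (5.6), cross-checked against the coordinate-independent expression ω = −η(u, k) for the frequency seen by an observer with four-velocity u. The sample speed 0.6 and the function names are assumptions made for this illustration; all components are taken w.r.t. the reference observer Y.

```python
import math

def eta(a, b):
    """Minkowski product with signature (-, +, +, +), in units with c = 1."""
    return -a[0] * b[0] + sum(x * y for x, y in zip(a[1:], b[1:]))

def doppler_factor(v_xy, alpha):
    """Ratio omega_X / omega_Y from (5.6); alpha is the angle between v_XY and k_Y."""
    return (1.0 - v_xy * math.cos(alpha)) / math.sqrt(1.0 - v_xy**2)

def frequency_seen_by(u, k):
    """Angular frequency of the null vector k for an observer with four-velocity u."""
    return -eta(u, k)

v = 0.6
gamma = 1.0 / math.sqrt(1.0 - v * v)
u_X = [gamma, gamma * v, 0.0, 0.0]       # X moves along e1 w.r.t. Y

for alpha in (0.0, math.pi / 2.0, math.pi):
    k = [1.0, math.cos(alpha), math.sin(alpha), 0.0]   # omega_Y = 1
    print(alpha, doppler_factor(v, alpha), frequency_seen_by(u_X, k))
```

For v = 0.6 the factor is 1/2 when X moves along the ray (α = 0), 2 when X moves against it (α = π), and γ = 1.25 in the transversal case.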
Remark. Note that for the transversal Doppler effect the ‘symmetry’ between
the observers is broken. The fact that Y sees the observer X and the light ray
k move in orthogonal directions does not imply that X observes the same for
Y and k. On the contrary, the ‘transversality property’ is not ‘relative’.
The absolute values of η(·, ·) in the denominator ensure that the square roots
exist. However, if v (or w) is a null vector, then the r.h.s. of (5.11) would be
infinite in general; hence, (5.11) does not make sense for null vectors. Suppose
v is timelike and w is timelike or spacelike; then the inverse Cauchy-Schwarz
inequality states that
η(v, w)2 ≥ η(v, v)η(w, w) ,
Now, life is easy: We can consider the spatial parts of the four-vectors (‘spatial’
meaning ‘spatial’ w.r.t. X) and compute the conventional (Euclidean) angles
between these three-vectors. The spatial parts are $\vec{v}$ and $\vec{w}$ and the angle $\alpha_{vw}$
between the two vectors is then simply given by
$$\cos\alpha_{vw} = \frac{\vec{v}\, \vec{w}}{|\vec{v}|\, |\vec{w}|} , \tag{5.13}$$
cf. (5.10).
Therefore, PX v and PX w are indeed the projections of v and w onto the (lin-
ear) plane of simultaneity {t = 0} of X. Since PX v and PX w are purely spatial
(w.r.t. X), the (Euclidean) scalar product coincides with the Minkowski prod-
uct on such objects; e.g.,
$$\vec{v}\, \vec{w} = \eta(P_X v, P_X w) , \qquad |\vec{v}| = \sqrt{\vec{v}\, \vec{v}} = \sqrt{\eta(P_X v, P_X v)} .$$
¹ Let us exclude the degenerate case where $\vec{v} = \vec{o}$ or $\vec{w} = \vec{o}$.
$$\cos\alpha_{vw} = \frac{\eta(P_X v, P_X w)}{\sqrt{\eta(P_X v, P_X v)}\, \sqrt{\eta(P_X w, P_X w)}} . \tag{5.15}$$
In the following we restrict ourselves to light rays and angles in between, since
this is the relevant case for our applications. Let k and l be null vectors
representing two light rays. According to (5.15) the angle between k and l is
$$\cos\alpha_{kl} = \frac{\eta(P_X k, P_X l)}{\sqrt{\eta(P_X k, P_X k)}\, \sqrt{\eta(P_X l, P_X l)}} . \tag{5.16}$$
Accordingly, the angle between (the spatial parts of) k and l is given by
Equivalently, we have
$$\cos\alpha_{kl} - 1 = \frac{\eta(k, l)}{\eta(u_X, k)\, \eta(u_X, l)} . \tag{5.17'}$$
Remark. The formula (5.17′ ) is quite useful. Imagine an astronomer who ob-
serves two stars. The two stars correspond to points on the celestial sphere; by
construction, the angle between the two corresponds to the angle αkl between
the (projections of the) null lines k and l, which represent the light rays emitted
by the stars. Formula (5.17′ ) thus shows how the apparent angle between the
stars changes depending on the state of motion of the observer (as described
by uX ).
Remark. Rescalings of k and l (i.e., k 7→ λk k and l 7→ λl l where λk , λl are
positive), leave αkl invariant; this is intuitively clear, because angles are defined
between the directions determined by the vectors and thus do not depend on
the actual lengths of the vectors.
Equipped with (5.17′ ) we are now able to address the problem of the aberration
of light: Consider two observers, X (with four-velocity uX ) and Y (with uY ),
and two (future-pointing) null vectors k and l (representing light rays). We ask
how the angle between the light rays, as observed by X and by Y,
changes depending on the relative velocity between X and Y .
Since η(k, k) = 0 we have k0 = |~k| (and recall that k0 is essentially the frequency
of the light ray). Rescalings of k and l do not affect the angle between the two
rays, hence, w.l.o.g., we may set
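$$u_X = \begin{pmatrix} 1 \\ \vec{o} \end{pmatrix} , \qquad u_Y = \gamma \begin{pmatrix} 1 \\ \vec{v} \end{pmatrix} , \qquad k = \begin{pmatrix} 1 \\ \vec{n}_k \end{pmatrix} , \qquad l = \begin{pmatrix} 1 \\ \vec{n}_l \end{pmatrix} , \tag{5.18'}$$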
where ~nk and ~nl are unit vectors, i.e., |~nk | = |~nl | = 1.
We denote the angle between (the spatial projections of) k and l w.r.t. X by
θX ; the angle the observer Y measures, we denote by θY . According to (5.17′ )
we have
$$\cos\theta_X - 1 = \frac{\eta(k, l)}{\eta(u_X, k)\, \eta(u_X, l)} = \frac{-1 + \vec{n}_k \vec{n}_l}{(-1)(-1)} = -1 + \vec{n}_k \vec{n}_l . \tag{5.19}$$
This does not come as a surprise, of course; cos θX = ~nk ~nl is the standard
formula. However, Y measures a different angle:
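$$\cos\theta_Y - 1 = \frac{\eta(k, l)}{\eta(u_Y, k)\, \eta(u_Y, l)} = \frac{-1 + \vec{n}_k \vec{n}_l}{\gamma(-1 + \vec{v}\vec{n}_k)\, \gamma(-1 + \vec{v}\vec{n}_l)} = \frac{\cos\theta_X - 1}{\gamma^2 (1 - \vec{v}\vec{n}_k)(1 - \vec{v}\vec{n}_l)} . \tag{5.20}$$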
Setting ~v~nk = |~v ||~nk | cos αk = v cos αk and ~v~nl = v cos αl , we have expressed
θY in terms of θX and the angles αk , αl between the light rays and the direction
of motion of Y . Here and henceforth we use the abbreviation v = |~v |.
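The aberration formula can be evaluated directly from (5.17′) without ever boosting the light rays. In the sketch below the two rays, the relative speed 0.6, and the function name `angle_between_rays` are illustrative assumptions; the components follow the conventions of (5.18′).

```python
import math

def eta(a, b):
    """Minkowski product with signature (-, +, +, +), in units with c = 1."""
    return -a[0] * b[0] + sum(x * y for x, y in zip(a[1:], b[1:]))

def angle_between_rays(u, k, l):
    """Angle between the null vectors k and l for an observer u, from (5.17')."""
    cos_alpha = 1.0 + eta(k, l) / (eta(u, k) * eta(u, l))
    return math.acos(max(-1.0, min(1.0, cos_alpha)))   # clamp against rounding

v = 0.6
gamma = 1.0 / math.sqrt(1.0 - v * v)
u_X = [1.0, 0.0, 0.0, 0.0]
u_Y = [gamma, gamma * v, 0.0, 0.0]   # Y moves along e1 w.r.t. X

k = [1.0, 1.0, 0.0, 0.0]             # ray aligned with the direction of motion
l = [1.0, 0.0, 1.0, 0.0]             # ray orthogonal to it, as seen by X

theta_X = angle_between_rays(u_X, k, l)
theta_Y = angle_between_rays(u_Y, k, l)
print(math.degrees(theta_X), math.degrees(theta_Y))
```

For these values (5.20) gives cos θ_Y − 1 = −1/(γ²(1 − v)) = −1.6, so Y measures an angle with cosine −0.6, which is larger than the right angle measured by X.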
Let us specialize (5.20) to the important case where one of the light rays, say
k, is aligned with the direction of relative motion between the observers, i.e.,
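$\vec{n}_k \parallel \vec{v}$. In other words, we set $\vec{n}_k = \vec{v}/v$, so that
$$k = \begin{pmatrix} 1 \\ \vec{v}/v \end{pmatrix} , \qquad l = \begin{pmatrix} 1 \\ \vec{n}_l \end{pmatrix} . \tag{5.21}$$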
θ′ > θ , (5.24)
which means that the angle measured by the observer X ′ is larger than the
angle measured by X (provided that θ > 0).
If the velocity v is close to the velocity of light (which is 1 in our units), then
θ ′ ≫ θ. This has a strange effect on the field of vision. Objects that are
actually located behind appear right in front of an observer if the observer
moves sufficiently fast. For details we refer to the lecture course.
So far we have only considered particles in Minkowski space (which were rep-
resented as world lines). An extended rod (in a state of uniform motion) is
represented by a world sheet, which is a two-dimensional domain whose bound-
aries are two parallel world lines (which represent the two ends of the rod).
The two ends of the rod (which we call E1 and E2) are represented by world
lines that are straight lines (since the rod is in uniform motion). W.l.o.g. we
assume that the world line of E1 passes through the origin. We thus have
where uµ is the four-velocity of the rod, i.e., η(u, u) = ηµν uµ uν = −1, and ℓµ
is a constant four-vector representing the displacement of E1 and E2. Every
vector in $\ell + \langle u \rangle$ can take the role of the displacement vector; hence, w.l.o.g.
we may assume that $\ell^\mu$ is orthogonal to $u^\mu$, i.e., $\eta(\ell, u) = \eta_{\mu\nu} \ell^\mu u^\nu = 0$.
In the rest frame of the rod, i.e., for a comoving observer, we have u = (1, ~o)T ,
and hence
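$$E1:\ s \mapsto x_1^\mu(s) = s \begin{pmatrix} 1 \\ \vec{o} \end{pmatrix} , \qquad E2:\ s \mapsto x_2^\mu(s) = \begin{pmatrix} 0 \\ \vec{\ell} \end{pmatrix} + s \begin{pmatrix} 1 \\ \vec{o} \end{pmatrix} . \tag{5.25'}$$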
Definition 5.1. The proper length of a rod is the length that is measured in
the rest frame of the rod.
Let us therefore calculate two events on the world lines E1 and E2, respectively,
that are simultaneous (for X); for simplicity, we determine the two events on
the plane of simultaneity t = 0. From (3.13) we obtain
$$x_1 = 0 \ \text{ on E1} \qquad \text{and} \qquad x_2 = \ell - \frac{\eta(\ell, u_X)}{\eta(u, u_X)}\, u \ \text{ on E2} \tag{5.27}$$
The length LX of the rod that X measures is the distance between the two
events; hence,
$$L_X^2 = \eta\!\left( \ell - \frac{\eta(\ell, u_X)}{\eta(u, u_X)}\, u ,\ \ell - \frac{\eta(\ell, u_X)}{\eta(u, u_X)}\, u \right) .$$
Simple algebraic manipulations using η(ℓ, u) = 0 and η(u, u) = −1 show that
$$L_X^2 = \eta(\ell, \ell) - \frac{\eta(\ell, u_X)^2}{\eta(u, u_X)^2} = L^2 - \frac{\eta(\ell, u_X)^2}{\eta(u, u_X)^2} . \tag{5.28}$$
where α is the angle between ~v and ~ℓ (as seen by the comoving observer).
Therefore,
$$L_X = L \sqrt{1 - |\vec{v}|^2 \cos^2\alpha} ; \tag{5.31}$$
in particular,
L ≥ LX . (5.32)
Remark. We see that the proper length of the rod (which is the length in the
rest frame) can also be viewed as the maximizer of the lengths of the rod as
seen by inertial observers.
Two cases are of special interest. First, the transversal case: Suppose that the
rod and the direction of motion are orthogonal, i.e., ~ℓ ⊥ ~v . Then cos α = 0
(i.e., α = π/2) and
LX = L . (5.33)
In other words, there does not exist a transversal contraction of lengths.
Second, the longitudinal case: Suppose that the direction of motion is parallel
(or antiparallel) to the rod itself, i.e., $\vec{\ell} \parallel \vec{v}$. Then $\cos^2\alpha = 1$ (i.e., α = 0 or α = π) and
$$L_X = L\gamma^{-1} = L \sqrt{1 - |\vec{v}|^2} . \tag{5.34}$$
This effect is called the Lorentz contraction. For the observer X the rod appears
contracted by a factor γ −1 .
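The coordinate-independent expression (5.28) is easy to evaluate numerically and to compare with (5.31). In the following sketch all vectors are given in the rest frame of the rod; the function name `measured_length` and the sample values L = 2, |v⃗| = 0.6 are assumptions for illustration.

```python
import math

def eta(a, b):
    """Minkowski product with signature (-, +, +, +), in units with c = 1."""
    return -a[0] * b[0] + sum(x * y for x, y in zip(a[1:], b[1:]))

def measured_length(L, v, alpha):
    """Length L_X from (5.28); alpha is the angle between v and the rod."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    u = [1.0, 0.0, 0.0, 0.0]                                     # four-velocity of the rod
    ell = [0.0, L * math.cos(alpha), L * math.sin(alpha), 0.0]   # orthogonal to u
    u_X = [gamma, gamma * v, 0.0, 0.0]                           # observer X moves along e1
    return math.sqrt(eta(ell, ell) - eta(ell, u_X) ** 2 / eta(u, u_X) ** 2)

L_longitudinal = measured_length(2.0, 0.6, 0.0)           # Lorentz contraction (5.34)
L_transversal = measured_length(2.0, 0.6, math.pi / 2.0)  # no contraction (5.33)
L_oblique = measured_length(2.0, 0.6, math.pi / 3.0)      # general case (5.31)
print(L_longitudinal, L_transversal, L_oblique)
```

The longitudinal value is 2 · 0.8 = 1.6, the transversal value stays at 2, and the oblique value lies in between, in line with (5.32).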
6.1 Introduction
which is consistent with (6.1) and in addition implies a conservation of the sum
of the masses.
And in relativity? Suppose the definitions of energy and momentum are the
same in relativistic physics. Since (6.1) must hold in each (relativistic) inertial
frame by the principle of relativity, we find that conservation of momentum
looks like
$$\sum_i m_i\, \frac{v_i + u}{1 + v_i u} = \sum_j m'_j\, \frac{v'_j + u}{1 + v'_j u} \tag{6.3}$$
for an observer who moves with velocity u (w.r.t. the nameless observer we had
chosen initially); here we have used the relativistic addition of velocities (5.4).
Expanding (6.3) in u yields
$$\sum_i m_i v_i + u \Big( \sum_i m_i - \sum_i m_i v_i^2 \Big) + u^2 \Big( -\sum_i m_i v_i + \sum_i m_i v_i^3 \Big) + u^3 \Big( \sum_i m_i v_i^2 - \sum_i m_i v_i^4 \Big) + \dots$$
and the analogous expression for the r.h. side. Equating the two sides then
leads to
$$\sum_i m_i (v_i)^k = \sum_j m'_j (v'_j)^k \tag{6.4}$$
for all k ∈ N. This is ridiculous.
Note that this leaves (6.6) untouched, since $\eta(x, y) = -\frac{1}{c^2} x^0 y^0 + \vec{x}\, \vec{y}$ in this
case.
The vector $\vec{p}$ is the three-momentum of the particle w.r.t. the observer X,
$$\vec{p} = m\gamma\vec{v} . \tag{6.9}$$
E = p0 c = mγc2 . (6.11)
This relationship between mass and energy is the foundation of the equivalence
of mass and energy.
W.r.t. a frame where the particle is in motion the energy consists of the rest
energy term and a term that is easily identified as the generalization of the
Newtonian kinetic energy. (In fact, for small velocities, the kinetic energy
reduces to the standard Newtonian kinetic energy.) Hence,
where
$$E_{\text{kin}} = E - E_{\text{rest}} = mc^2 (\gamma - 1) = \frac{m|\vec{v}|^2}{2} + \frac{3m|\vec{v}|^4}{8c^2} + \dots . \tag{6.15}$$
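The quality of the expansion (6.15) is easy to probe numerically. The sketch below compares the exact kinetic energy with the Newtonian term and with the two-term series; the unit choices m = 1, c = 1 and the sample speed 0.1 are assumptions for illustration.

```python
import math

def kinetic_energy_exact(m, v, c=1.0):
    """E_kin = m c^2 (gamma - 1)."""
    gamma = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    return m * c**2 * (gamma - 1.0)

def kinetic_energy_series(m, v, c=1.0):
    """First two terms of the expansion (6.15)."""
    return m * v**2 / 2.0 + 3.0 * m * v**4 / (8.0 * c**2)

m, v = 1.0, 0.1
exact = kinetic_energy_exact(m, v)
newtonian = m * v**2 / 2.0
two_terms = kinetic_energy_series(m, v)
print(exact, newtonian, two_terms)
```

At a tenth of the speed of light the Newtonian value is already accurate to better than one percent, and the first relativistic correction removes most of the remaining difference.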
In units where
c=1, (6.16)
the formulas are even simpler: Since
$$p = m\gamma \begin{pmatrix} 1 \\ \vec{v} \end{pmatrix} \tag{6.17}$$
we have
$$E = p^0 = m\gamma , \tag{6.18a}$$
$$\vec{p} = m\gamma\vec{v} . \tag{6.18b}$$
Moreover,
$$p^2 = \eta(p, p) = -E^2 + \vec{p}^{\,2} = -m^2 . \tag{6.19}$$
$$p = \hbar k , \tag{6.20}$$
This definition has its roots in quantum mechanics. When ν is the frequency
and ω the angular frequency of a photon, then its energy is given by E = hν =
ħω. Likewise the three-momentum is $\vec{p} = \hbar\vec{k}$. Since ω and $\vec{k}$ are the building
blocks of k, the definition (6.20) ensues.
Note that
p2 = η(p, p) = 0 (6.21)
for photons.
Since this equation includes the energies and the spatial components, it formu-
lates conservation of energy and (three-)momentum at the same time.
Example (Decay of a particle). Suppose we have one particle with mass m that
splits into two particles, each of mass m′ . Conservation of four-momentum
reads
p = p′1 + p′2 , (6.23)
where p = mu and p′1 = m′ u′1 and p′2 = m′ u′2 and where u, u′1 , u′2 are the
four-velocities. W.l.o.g. we can do the computations in the rest frame of the
first particle, hence
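$$m \begin{pmatrix} 1 \\ \vec{o} \end{pmatrix} = m' \gamma'_1 \begin{pmatrix} 1 \\ \vec{v}'_1 \end{pmatrix} + m' \gamma'_2 \begin{pmatrix} 1 \\ \vec{v}'_2 \end{pmatrix} . \tag{6.24}$$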
The equation for the (three-)momentum implies that ~v1′ = ~v ′ and ~v2′ = −~v ′ and
γ1′ = γ2′ = γ ′ . Therefore, the equation for the energy results in
$$m = 2m'\gamma' = 2m'\, \frac{1}{\sqrt{1 - |\vec{v}'|^2}} . \tag{6.25}$$
In particular we find
m > 2m′ . (6.26)
The Newtonian conservation of mass states that the sum of the masses on the
l.h. side equals the sum of the masses on the r.h. side. This type of conservation
of mass does not hold in general in special relativity.
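The decay example can be turned into a few lines of code: given m and m′, equation (6.25) fixes γ′ and hence the speed of the decay products. The function name `daughter_speed` and the sample masses m = 1, m′ = 0.4 are assumptions for this sketch.

```python
import math

def daughter_speed(m, m_prime):
    """Speed |v'| of each decay product in the rest frame of the parent, from (6.25)."""
    gamma = m / (2.0 * m_prime)
    if gamma < 1.0:
        raise ValueError("decay is impossible: m < 2 m'")
    return math.sqrt(1.0 - 1.0 / gamma**2)

m, m_prime = 1.0, 0.4
v = daughter_speed(m, m_prime)
gamma = 1.0 / math.sqrt(1.0 - v * v)

# Four-momenta of the two products, moving back to back along e1, cf. (6.24).
p1 = [m_prime * gamma, m_prime * gamma * v, 0.0, 0.0]
p2 = [m_prime * gamma, -m_prime * gamma * v, 0.0, 0.0]
total = [a + b for a, b in zip(p1, p2)]
print(v, total)
```

Here 2m′ = 0.8 < m = 1; the mass difference shows up as the kinetic energy of the decay products, and the total four-momentum equals that of the parent at rest.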
Let the four-momenta of the particles before the collision be denoted by p and q,
respectively, and by p′ and q ′ after the collision. Four-momentum conservation
implies
p + q = p′ + q ′ . (6.27)
W.l.o.g. we set the mass m of the particles to one, so that p2 = −1, q 2 = −1,
p′2 = −1, q ′2 = −1 (where in our notation p2 = η(p, p),. . . ). From (6.27) we
are able to derive a number of simple identities,
for instance, the latter follows by computing and equating the (Minkowski)
norms of p − q ′ and p′ − q.
Let ϑpp′ denote the angle between the (spatial) direction of the incoming par-
ticle p and the outgoing particle p′ . Using (5.14) in (5.15) we get
$$\cos\vartheta_{pp'} = \frac{\eta(p, p') + \eta(q, p)\, \eta(q, p')}{\sqrt{-1 + \eta(q, p)^2}\, \sqrt{-1 + \eta(q, p')^2}} .$$
Since
we further obtain
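$$\cos\vartheta_{pp'} = \frac{\big(-1 + \eta(p, q)\big)\big(1 + \eta(p', q)\big)}{\sqrt{-1 + \eta(q, p)^2}\, \sqrt{-1 + \eta(q, p')^2}} = \sqrt{\frac{1 - \eta(p, q)}{-1 - \eta(p, q)}}\, \sqrt{\frac{-1 - \eta(p', q)}{1 - \eta(p', q)}} ,$$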
We can view the product η(p, q) as given (since this quantity is characteristic
of the experimental set-up), while the quantities η(p′ , q) and η(q ′ , q) appearing
in the formulas are unknown. However,
hence the only true variable is η(p′ , q). Consequently, we can eliminate η(p′ , q)
from the equations and then express ϑpq′ as a function of ϑpp′ and the given
quantity η(p, q). The most clever way is to simply multiply tan ϑpp′ and
tan ϑpq′ ; we obtain
$$\tan\vartheta_{pp'} \tan\vartheta_{pq'} = \frac{2}{1 - \eta(p, q)} . \tag{6.29}$$
Since $\eta(p, q) = -\gamma_v = -(1 - |v|^2)^{-1/2}$, where v is the relative velocity between
the particles p and q (or, simply, the velocity of the particle p in the lab frame),
we have
$$\tan\vartheta_{pp'} \tan\vartheta_{pq'} = \frac{2}{1 + \gamma_v} . \tag{6.30}$$
Consequently,
ϑpp′ + ϑpq′ < 90◦ , (6.31)
since γv > 1.
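Formula (6.30) can be checked on an explicitly constructed collision. The sketch below is one possible construction, not the derivation used in the text: it sets up the elastic scattering of two unit-mass particles in the centre-of-momentum frame (where the scattering angle `chi` is a free parameter), transforms the outgoing momenta to the lab frame, and reads off the two angles. The function names and the sample values are assumptions.

```python
import math

def to_lab(p_cm, beta):
    """Lab-frame components (E, px, py) of a momentum given in a frame moving with +beta along x."""
    g = 1.0 / math.sqrt(1.0 - beta * beta)
    e, px, py = p_cm
    return (g * (e + beta * px), g * (px + beta * e), py)

def scattering_angles(v_lab, chi):
    """Lab angles of p' and q' w.r.t. the incoming direction, plus gamma_v."""
    gamma_v = 1.0 / math.sqrt(1.0 - v_lab * v_lab)
    beta = gamma_v * v_lab / (gamma_v + 1.0)   # velocity of the CM frame (equal masses)
    g = 1.0 / math.sqrt(1.0 - beta * beta)
    # In the CM frame both particles have speed beta before and after the collision.
    p_out = to_lab((g, g * beta * math.cos(chi), g * beta * math.sin(chi)), beta)
    q_out = to_lab((g, -g * beta * math.cos(chi), -g * beta * math.sin(chi)), beta)
    theta_p = math.atan2(abs(p_out[2]), p_out[1])
    theta_q = math.atan2(abs(q_out[2]), q_out[1])
    return theta_p, theta_q, gamma_v

theta_p, theta_q, gamma_v = scattering_angles(0.8, 1.0)
product = math.tan(theta_p) * math.tan(theta_q)
print(product, 2.0 / (1.0 + gamma_v), math.degrees(theta_p + theta_q))
```

For v = 0.8 one has γ_v = 5/3, so the product of the tangents should be 3/4 for every value of `chi`, and the sum of the two angles stays below 90°.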
Remark. In the Newtonian limit we deal with small velocities, hence γ ≃ 1.
The formula thus reduces to
which implies that ϑpp′ + ϑpq′ = 90◦ . For a proof we use that
$$\tan a \tan b = \frac{\cos(a - b) - \cos(a + b)}{\cos(a - b) + \cos(a + b)} .$$
If you’ve ever played billiards, you needn’t be convinced that 90° is the correct
result.
Let us conclude this section with an extended remark. Let us try to under-
stand what four-momentum conservation is able to tell us. We assume that p
and q are known from the experimental set-up of the problem; in particular,
the (three-)velocity associated with p is given. Therefore, since the masses
are known, the unknowns of the problem are the (three-)velocities of the two
particles after the collision. (From their three-velocities we are able to con-
struct their four-velocities.) In other words, there are 6 unknown variables.
On the other hand, four-momentum conservation provides us with 4 equations.
Consequently, we will not be able to solve the problem completely when we
only use the conservation equations; to completely solve the problem we must
analyze the equations of motion and the (field) theory that models the in-
teraction between the particles. In the present case, however, and in a large
number of other cases, we pose a type of question that can be answered by
using four-momentum conservation alone.
Let us discuss the problem at hand in detail. Physical intuition suggests that
the problem is effectively two-dimensional; the three-velocities involved will
be coplanar (i.e., lie in a plane). In fact, this is the first result implied by
four-momentum conservation. (We have thus used one of the four equations
of (6.27); three more to go.) Since the three-velocity associated with p must lie
in the plane (and p is known), the unknown position of the plane corresponds
to one degree of freedom (which is, e.g., a variable representing the plane’s
rotational angle about the axis through p). The one equation of (6.27) that
we have used so far thus reduces the 6 unknowns to 4 + 1 (the latter being
the plane’s angle). There are 4 remaining unknowns which we take to be the
following: The absolute values of the velocities vp′ and vq′ of the outgoing
particles and the angles ϑpp′ and ϑpq′ between the (spatial) direction of the
incoming particle p and the outgoing particles p′ and q ′ , respectively. Since
four-momentum conservation provides three conditions that we have not used
up yet, we will be able to compute all four unknowns vp′ , vq′ , ϑpp′ , and ϑpq′
except for one that remains unspecified. For example, we will be able to express
three of the variables as functions of the remaining one. (A good choice for
the remaining unknown is vp′ , which is equivalent to η(p′ , q).) The reader is
encouraged to reread the analysis of this section in the light of the above.
Da capo al fine.
Let us consider a set-up that is similar to the one for the billiards. We assume
that p is the four-momentum of an ingoing photon that scatters at an electron
with four-momentum q. The interaction leads to p′ for the photon and q ′ for
the electron,
p + q = p′ + q ′ . (6.33)
The vectors p and p′ are null vectors, $p^2 = 0$ and $p'^2 = 0$, while $q^2 = -m_e^2$ and
$q'^2 = -m_e^2$, where $m_e$ is the electron mass.
The lab frame is the inertial frame where the electron is at rest initially, i.e.,
the inertial frame with four-velocity u such that
q = me u . (6.34)
Let us compute the angle ϑpp′ between the direction of the incoming photon p
and the outgoing photon p′ w.r.t. the lab frame. Since p and p′ are null we can
directly apply (5.17′ ), i.e.,
$$\cos\vartheta_{pp'} - 1 = \frac{\eta(p, p')}{\eta(u, p)\, \eta(u, p')} . \tag{6.35}$$
By definition, p = ħk, where k is the null vector of the photon; hence
and analogously, η(u, p′) = −ħω′; here, ω and ω′ are the (angular) frequencies
of the photons p and p′, respectively, as seen in the lab frame defined by u.
Also the product η(p, p′ ) is related to the angular frequencies. When we mul-
tiply (6.33) with p we obtain
where we have used (6.28) in the last step. Since η(p, p) = 0 we conclude that
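$$\eta(p, p') = \eta(q, p) - \eta(q, p') = m_e \hbar \big( \eta(u, k) - \eta(u, k') \big) = m_e \hbar \left( -\omega + \omega' \right) .$$
Inserting these relations into (6.35) we obtain
$$\cos\vartheta_{pp'} - 1 = \frac{m_e\, (-\omega + \omega')}{\hbar\, \omega \omega'} \tag{6.37}$$
or
$$\frac{1}{\omega'} - \frac{1}{\omega} = \frac{\hbar}{m_e} \left( 1 - \cos\vartheta_{pp'} \right) , \tag{6.38}$$
which is in turn equivalent to
$$\lambda' - \lambda = \frac{h}{m_e} \left( 1 - \cos\vartheta_{pp'} \right) , \tag{6.38'}$$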
Equation (6.38) shows the change of wave length of the scattered photon in
dependence on the angle.
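To get a feeling for the size of the effect, the following sketch evaluates the wavelength shift in SI units. Restoring the factor c turns the prefactor h/m_e of (6.38′) into h/(m_e c); this conversion, the hard-coded CODATA values, and the function name are assumptions of the example rather than part of the text.

```python
import math

PLANCK_H = 6.62607015e-34          # Planck constant in J s (exact SI value)
ELECTRON_MASS = 9.1093837015e-31   # electron mass in kg (CODATA 2018)
SPEED_OF_LIGHT = 299792458.0       # speed of light in m/s (exact SI value)

def compton_shift(theta):
    """Wavelength shift lambda' - lambda in metres for scattering angle theta (radians)."""
    return PLANCK_H / (ELECTRON_MASS * SPEED_OF_LIGHT) * (1.0 - math.cos(theta))

for angle_deg in (0, 90, 180):
    print(angle_deg, compton_shift(math.radians(angle_deg)))
```

The shift vanishes in the forward direction, equals roughly 2.43 · 10⁻¹² m at 90°, and reaches twice that value for backscattering, independently of the incoming wavelength.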
ACCELERATED MOTION
7.1 Acceleration
R ∋ s 7→ xµ (s) , (7.1)
Suppose a particle with world line xµ (s) has vanishing four-acceleration. From
ẍµ (s) = 0 it follows that ẋµ (s) is a constant four-vector (whose norm must be
−1 because s is supposed to be proper time). Accordingly, xµ (s) describes a
straight world line, i.e., a particle in uniform motion.
Hence the three-velocity ~v (which is the relative velocity between the particle
and the observer) has a simple relationship with the spatial components of the
four-velocity; namely, ~u = γ~v . For accelerations life is not so simple. Let us
compute aµ (s) from uµ (s).
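$$a^\mu = \frac{d}{ds} \left[ \gamma \begin{pmatrix} 1 \\ \vec{v} \end{pmatrix} \right] = \frac{d\gamma}{ds} \begin{pmatrix} 1 \\ \vec{v} \end{pmatrix} + \gamma \begin{pmatrix} 0 \\ \frac{d\vec{v}}{ds} \end{pmatrix} . \tag{7.6}$$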
$$\vec{a}_N = \frac{d\vec{v}}{dt} \tag{7.8}$$
the Newtonian three-acceleration. Accordingly,
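$$a^\mu = \gamma^4 (\vec{v}\, \vec{a}_N) \begin{pmatrix} 1 \\ \vec{v} \end{pmatrix} + \gamma^2 \begin{pmatrix} 0 \\ \vec{a}_N \end{pmatrix} . \tag{7.9}$$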
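In search of a better measure for the three-acceleration we analyze the particle
in a coordinate system in which the particle is instantaneously at rest. Let $\tilde{X}$
be a coordinate system (with coordinates $\{\tilde{t}, \tilde{x}^i\}$) such that the particle is at
rest at time $\tilde{t} = \tilde{t}_r$. Hence the velocity $\tilde{\vec{v}}(\tilde{t})$ (which is the relative velocity of
the particle w.r.t. $\tilde{X}$) satisfies $\tilde{\vec{v}}(\tilde{t}_r) = \vec{o}$, i.e.,
$$\tilde{u}^\mu = \tilde{\gamma} \begin{pmatrix} 1 \\ \tilde{\vec{v}} \end{pmatrix} , \qquad \tilde{u}^\mu \big|_{\tilde{t} = \tilde{t}_r} = \begin{pmatrix} 1 \\ \vec{o} \end{pmatrix} . \tag{7.10}$$
Furthermore, inserting $\tilde{\vec{v}}(\tilde{t}_r) = \vec{o}$ in (7.7) we get
$$\tilde{a}^\mu \big|_{\tilde{t} = \tilde{t}_r} = \begin{pmatrix} 0 \\ \frac{d\tilde{\vec{v}}}{d\tilde{t}} \big|_{\tilde{t} = \tilde{t}_r} \end{pmatrix} . \tag{7.11}$$
We call
$$\vec{a}_p = \frac{d\tilde{\vec{v}}}{d\tilde{t}} \bigg|_{\tilde{t} = \tilde{t}_r} \tag{7.12}$$
the proper acceleration of the particle; it is the (Newtonian) acceleration
measured in a momentary rest frame.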
Remark. It is not difficult to argue that the proper acceleration is the actual
acceleration experienced by the particle. Namely, in the momentary rest frame
the particle’s velocities for times $\tilde{t}$ in a small interval $[\tilde{t}_r - \epsilon, \tilde{t}_r + \epsilon]$ are small.
Consequently, Newtonian physics is a good approximation in $[\tilde{t}_r - \epsilon, \tilde{t}_r + \epsilon]$ (and,
in fact, exact at $\tilde{t} = \tilde{t}_r$), so that the Newtonian concepts (like $d\tilde{\vec{v}}/d\tilde{t}$) reflect
the physical reality in a correct way.
We write (7.11) as
\[ a^\mu = \begin{pmatrix} 0 \\ \vec a_p \end{pmatrix} , \tag{7.13} \]
where it is understood that the coordinate system that is used is a momentary
rest frame. From (7.13) it is easy to see that
Hence the magnitude of the proper acceleration coincides with the norm of the
four-acceleration.
Our next aim is to express the proper acceleration $\vec a_p$ in terms of the Newtonian
acceleration $\vec a_N$ for arbitrary observers. Equation (7.9) results in
\[ a_p = \gamma^3 a_N . \tag{7.16'} \]
If $\vec v$ and $\vec a_N$ are orthogonal, i.e., in the transversal case, we have $\alpha = 90^\circ$, so
that
\[ a^2 = \vec a_p^{\,2} = \gamma^4 \vec a_N^{\,2} \]
and thus
\[ \vec a_p = \gamma^2 \vec a_N , \tag{7.17} \]
which describes the orthogonal (proper) acceleration of a particle.
Remark. Occasionally, equation (7.15) is expressed not in terms of $\vec a_N = d\vec v/dt$
but in $d\vec v/ds$. Since $d\vec v/ds = \gamma \, d\vec v/dt$, we have
\[ a^2 = \gamma^4 \left( 1 - \vec v^2 \sin^2\alpha \right) \left( \frac{d\vec v}{ds} \right)^2 \tag{7.15'} \]
and thus
\[ a_p = \gamma^2 \frac{dv}{ds} \tag{7.16'''} \]
in the case of linear acceleration.
In the case of linear acceleration, the motion of the particle is along a straight
line (in space). In slight abuse of notation we denote by $a$ also the constant
value of the acceleration, i.e., $a = \sqrt{\eta(a, a)} = \text{const}$. Equation (7.16′′) reads
\[ \gamma^3 \frac{dv}{dt} = a = \text{const} , \tag{7.20} \]
which is a differential equation that can be integrated rather easily by noting that
\[ \int (1 - v^2)^{-3/2} \, dv = \frac{v}{\sqrt{1 - v^2}} . \tag{7.21} \]
Exercise. (A time travel to the beginner’s course in analysis). If I don’t want
to bother Mathematica or Maple, then a standard way to solve the integral is
the following. We substitute v by sin w and get
\[ \int (1 - v^2)^{-3/2} \, dv = \int \frac{1}{\cos^2 w} \, dw = \tan w = \tan \arcsin v = \frac{v}{\sqrt{1 - v^2}} ; \]
that’s it.
Solving
\[ \frac{v}{\sqrt{1 - v^2}} = a t \]
for $v$ yields
\[ v(t) = \pm \frac{a t}{\sqrt{1 + a^2 t^2}} ; \tag{7.22} \]
note that by neglecting the constant of integration we have set v(0) = 0.
Another integration then leads to
\[ x(t) = x_0 \pm \frac{1}{a} \left( \sqrt{1 + a^2 t^2} - 1 \right) . \tag{7.23} \]
When we consider the case $x_0 = \pm 1/a$, then
\[ x(t) = \pm \frac{1}{a} \sqrt{1 + a^2 t^2} . \tag{7.24} \]
Since
\[ -t^2 + x(t)^2 = \frac{1}{a^2} , \tag{7.25} \]
the world line of the particle
\[ t \mapsto \begin{pmatrix} t \\ x(t) \\ 0 \\ 0 \end{pmatrix} \tag{7.26} \]
represents an (equilateral) hyperbola. Therefore, constant linear acceleration
leads to ‘hyperbolic motion’.
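The closed-form solution can be checked numerically against the differential equation (7.20) and the hyperbola condition (7.25). The sketch below assumes the positive sign in (7.22) and (7.24), units with c = 1, and an illustrative value a = 0.5; the derivative dv/dt is approximated by a central finite difference, so agreement is only up to discretization error.

```python
import math

def v(t, a):
    """Velocity (7.22) for constant proper acceleration a, with v(0) = 0 and c = 1."""
    return a * t / math.sqrt(1.0 + a * a * t * t)

def x(t, a):
    """Position (7.24), i.e. (7.23) with x0 = 1/a."""
    return math.sqrt(1.0 + a * a * t * t) / a

def gamma3_dvdt(t, a, h=1e-6):
    """Left-hand side of (7.20), with dv/dt from a central finite difference."""
    dvdt = (v(t + h, a) - v(t - h, a)) / (2.0 * h)
    gamma = 1.0 / math.sqrt(1.0 - v(t, a) ** 2)
    return gamma ** 3 * dvdt

a = 0.5  # illustrative value, not taken from the text
for t in (0.0, 1.0, 3.0):
    # Columns: t, gamma^3 dv/dt (should be close to a), -t^2 + x^2, 1/a^2
    print(t, gamma3_dvdt(t, a), -t * t + x(t, a) ** 2, 1.0 / a ** 2)
```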
where we have again set the constants of integration to zero. This equation
describes a hyperbola in Minkowski space.
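Exercise. In units where $c \neq 1$ we have
\[ t(s) = \frac{c}{a} \sinh \frac{a s}{c} \qquad \text{and} \qquad x(s) = \frac{c^2}{a} \cosh \frac{a s}{c} . \]
\[ \Delta x = \frac{c^2}{a} \left[ \cosh \frac{a s}{c} - 1 \right] . \]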
Suppose that the acceleration a equals 1g, where g is the acceleration due to
gravity. (This is the most pleasant acceleration for spacefarers.) Then
\[ \Delta x \, [\text{in light years}] = 0.97 \left[ \cosh \left( 1.03 \, s \, [\text{in years}] \right) - 1 \right] . \]
Compute how long it takes to get to the center of our galaxy (which is about
25000 light years from us). (Be amazed: it’s only 10 years 6 months.)
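The numbers in this exercise can be reproduced with a few lines of code. This is a sketch under stated assumptions: standard gravity g = 9.80665 m/s², a Julian year of 365.25 days, and uninterrupted acceleration over the whole distance (no deceleration phase); the function name `proper_time_years` is introduced here for illustration.

```python
import math

C = 299792458.0               # speed of light, m/s
G_ACC = 9.80665               # standard gravity (1 g), m/s^2 (assumed value)
YEAR = 365.25 * 24 * 3600.0   # Julian year in seconds (assumed convention)
LIGHT_YEAR = C * YEAR         # metres

def proper_time_years(distance_ly, a=G_ACC):
    """Proper time s (in years) to cover distance_ly at constant proper acceleration a,
    obtained by solving Delta x = (c^2/a) [cosh(a s / c) - 1] for s."""
    dx = distance_ly * LIGHT_YEAR
    s = (C / a) * math.acosh(1.0 + a * dx / C ** 2)
    return s / YEAR

print(C ** 2 / G_ACC / LIGHT_YEAR)   # coefficient c^2/a in light years, about 0.97
print(G_ACC * YEAR / C)              # coefficient a/c in 1/years, about 1.03
print(proper_time_years(25000.0))    # roughly ten and a half years
```

Because the distance enters only through an inverse hyperbolic cosine, doubling the distance adds well under a year of proper time.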
Consider a particle in uniform circular motion. W.r.t. some observer its world
line is given by
\[ x^\mu(t) = \begin{pmatrix} t \\ r \cos \omega t \\ r \sin \omega t \\ 0 \end{pmatrix} , \tag{7.32} \]
where r is the radius of the circular orbit and ω the constant angular velocity.
Since |~v | = ωr, the product ωr must be less than 1 to ensure that the circular
velocity is less than the speed of light.
Inserting this result into (7.9) leads to (7.35). The proper acceleration is
\[ \vec a_p = \gamma^2 \vec a_N \]
and the vector $\vec a$ (which is the spatial part of the four-acceleration) is equal to $\vec a_p$.
Since the acceleration (any of $\vec a_N$, $\vec a_p$, $\vec a$) is orthogonal to $\vec v$ we are in the
‘transversal case’ of section 7.1.
\[ a = \gamma^2 r \omega^2 , \tag{7.37} \]
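As a cross-check of (7.37), one can build the four-acceleration of the circular orbit from the general formula (7.9) and compute its Minkowski norm. The following sketch assumes c = 1, signature (−+++), and illustrative values for r, ω and t; all function names are introduced here for illustration.

```python
import math

def four_acceleration(v, a_n):
    """Four-acceleration from (7.9) for three-velocity v and Newtonian acceleration a_n."""
    v2 = sum(comp * comp for comp in v)
    gamma = 1.0 / math.sqrt(1.0 - v2)
    va = sum(p * q for p, q in zip(v, a_n))
    time_part = gamma ** 4 * va
    space_part = [gamma ** 4 * va * vi + gamma ** 2 * ai for vi, ai in zip(v, a_n)]
    return [time_part] + space_part

def minkowski_norm(a):
    """sqrt(eta(a, a)) with signature (-+++), for a spacelike four-vector a."""
    return math.sqrt(-a[0] ** 2 + sum(comp * comp for comp in a[1:]))

r, omega, t = 2.0, 0.3, 1.7   # illustrative values with omega * r < 1
# Velocity and Newtonian acceleration obtained by differentiating (7.32) w.r.t. t.
v = [-r * omega * math.sin(omega * t), r * omega * math.cos(omega * t), 0.0]
a_n = [-r * omega ** 2 * math.cos(omega * t), -r * omega ** 2 * math.sin(omega * t), 0.0]
gamma = 1.0 / math.sqrt(1.0 - (r * omega) ** 2)

print(minkowski_norm(four_acceleration(v, a_n)), gamma ** 2 * r * omega ** 2)
```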
p = mu . (7.40)
Since a rocket obtains thrust by ejecting its propellant, its mass m decreases
with time. We are thus careful and write
The four momentum of the propellant (let’s say ‘gas’ for simplicity) that is
emitted in a small time interval [s, s + ds] is given by
where dmgas = ṁgas ds is the exhaust mass. (Note that each of the quantities
depends on s in general.) Consequently, the four momentum of the gas that
has been emitted up to time s is given by
\[ p_{\text{gas}} = \int dm_{\text{gas}} \, u_{\text{gas}} = \int_0^s \dot m_{\text{gas}}(s') \, u_{\text{gas}}(s') \, ds' . \tag{7.42} \]
Conservation of momentum means that the total momentum, i.e., the sum of
the momentum of the rocket and the gas, remains constant, i.e.,
recall that p = p(s) is the momentum of the rocket and pgas the momentum of
the gas emitted up to time s. Differentiating (7.43) we obtain
When we multiply this equation with u (in the Minkowski sense), then
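Since $\eta(u, u) = -1$, we have $\eta(u, \dot u) = 0$, see (7.4); moreover, $\eta(u, u_{\text{gas}}) = -\gamma_{\text{gas}}$
with $\gamma_{\text{gas}} = (1 - v_{\text{gas}}^2)^{-1/2}$, where $v_{\text{gas}}$ is the relative velocity between the gas
emitted at time $s$ and the rocket at time $s$. Therefore, equation (7.45) results
in
\[ -\dot m - \dot m_{\text{gas}} \gamma_{\text{gas}} = 0 . \tag{7.46} \]
We can thus replace $\dot m_{\text{gas}}$ in (7.44) to obtain
\[ \dot m u + m \dot u - \dot m \gamma_{\text{gas}}^{-1} u_{\text{gas}} = 0 . \tag{7.47} \]
Equivalently, we have
\[ \dot m u + m a = \dot m \gamma_{\text{gas}}^{-1} u_{\text{gas}} , \tag{7.48} \]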
where we have written a instead of u̇. It is now straightforward to take the
Minkowski norms of the l.h. side and the r.h. side,
As an example, let us consider a very simple rocket: let us assume that the
velocity vgas is a constant over time, i.e., that the gas is always emitted at the
same speed. From (7.16′′′ ) we have
\[ a = \gamma^2 \frac{dv}{ds} = \gamma^2 \dot v , \tag{7.52} \]
where v denotes the velocity of the rocket w.r.t. some given observer. (Since
we have assumed that lift-off is at time s = 0, we have v = 0 at s = 0.) The
rocket equation becomes
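\[ m \gamma^2 \dot v = -\dot m \, v_{\text{gas}} , \tag{7.53} \]
which we can write as
\[ \frac{\dot v}{1 - v^2} = -v_{\text{gas}} \frac{\dot m}{m} \tag{7.54} \]
and thus integrate:
\[ \frac{1}{2} \left[ \log(1 + v) - \log(1 - v) \right] = -v_{\text{gas}} \left( \log m - \log m_0 \right) . \tag{7.55} \]
Here, $m_0$ is an integration constant; it is the mass of the rocket at lift-off.
Finally,
\[ \sqrt{\frac{1 - v}{1 + v}} = \left( \frac{m_0}{m} \right)^{-v_{\text{gas}}} \tag{7.56} \]
and thus
\[ v = \frac{1 - \left( \frac{m}{m_0} \right)^{2 v_{\text{gas}}}}{1 + \left( \frac{m}{m_0} \right)^{2 v_{\text{gas}}}} . \tag{7.57} \]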
We conclude that the rocket’s velocity comes arbitrarily close to 1 (i.e., $c$)
provided $m/m_0$ becomes arbitrarily small.
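The behaviour of (7.57) is easy to explore numerically. The sketch below evaluates it for a few mass ratios and, for comparison, also prints the non-relativistic rocket equation $v = v_{\text{gas}} \log(m_0/m)$, which is not part of the derivation above and is included here only as a familiar reference. Units with c = 1 and the exhaust speed 0.5 are assumptions made for illustration.

```python
import math

def rocket_velocity(mass_ratio, v_gas):
    """Velocity (7.57) in units of c, for mass_ratio = m/m0 and constant exhaust speed v_gas."""
    x = mass_ratio ** (2.0 * v_gas)
    return (1.0 - x) / (1.0 + x)

def newtonian_rocket_velocity(mass_ratio, v_gas):
    """Non-relativistic rocket equation v = v_gas * log(m0/m), for comparison only."""
    return v_gas * math.log(1.0 / mass_ratio)

for ratio in (0.5, 0.1, 1e-3, 1e-9):
    print(ratio, rocket_velocity(ratio, 0.5), newtonian_rocket_velocity(ratio, 0.5))
```

Note that (7.57) can be rewritten as $v = \tanh\left( v_{\text{gas}} \log(m_0/m) \right)$, which makes the bound $v < 1$ manifest, whereas the non-relativistic expression grows without bound.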
7.5 Four-force
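Since
\[ p^\mu = \begin{pmatrix} E \\ \vec p \end{pmatrix} \tag{7.59} \]
w.r.t. some observer, we have
\[ F^\mu = \begin{pmatrix} \frac{dE}{ds} \\ \frac{d\vec p}{ds} \end{pmatrix} = \gamma \begin{pmatrix} \frac{dE}{dt} \\ \frac{d\vec p}{dt} \end{pmatrix} = \gamma \begin{pmatrix} \frac{dE}{dt} \\ \vec F \end{pmatrix} , \tag{7.60} \]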
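When $\{\tilde t, \tilde{\vec x}\}$ is an inertial system in which the particle is at rest at some time
$\tilde t = \tilde t_r$, then
\[ \frac{d\tilde E}{d\tilde t} \bigg|_{\tilde t = \tilde t_r} = \frac{dm}{d\tilde t} \bigg|_{\tilde t = \tilde t_r} \tag{7.62} \]
and
\[ \tilde{\vec F} \Big|_{\tilde t = \tilde t_r} = m \frac{d\tilde{\vec v}}{d\tilde t} \bigg|_{\tilde t = \tilde t_r} =: \vec F_p . \tag{7.63} \]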
The quantity F~p is the ‘proper force’, which is experienced by the particle in
the momentary rest frame. Obviously,
Remark. We call a force heatlike if it does not change the particle’s velocity.
Then (7.61) becomes
\[ F^\mu = \gamma \frac{dm}{dt} u^\mu , \tag{7.65} \]
where $u^\mu$ is the (constant) four-velocity of the particle. In the rest frame,
\[ F^\mu = \begin{pmatrix} \frac{dm}{d\tilde t} \\ \vec o \end{pmatrix} . \tag{7.66} \]
We call a force pure if it does not change the particle’s mass, i.e., if
m ≡ const (7.67)
η(F, u) = 0 . (7.69)
This is consistent with our expectations, since for a pure force we have
\[ F^\mu = \frac{d}{ds} p^\mu(s) = m a^\mu , \tag{7.70} \]
and aµ is orthogonal to uµ .
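Evidently, w.r.t. the momentary rest frame of the particle, the four-velocity of
the observer (let’s call it $w^\mu$) is given by
\[ \gamma \begin{pmatrix} 1 \\ -\vec v \end{pmatrix} . \tag{7.72} \]
Accordingly,
\[ -\gamma \frac{dE}{dt} = \eta(F, w) = -\gamma \, \vec F_p \cdot \vec v , \tag{7.73} \]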
where we have evaluated the Minkowski product once in the momentary rest
frame of the particle and once in the observer’s inertial frame. We conclude
that
\[ \frac{dE}{dt} = \vec F \cdot \vec v = \vec F_p \cdot \vec v . \tag{7.74} \]
Alternatively, we can write
\[ dE = \vec F_p \cdot d\vec x , \tag{7.75} \]
which represents the (infinitesimal) work done by the force on the particle.
Remark. Finally, we note that the four-force can be expressed in terms of the
proper force as
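\[ F^\mu = \gamma \left( \vec v \cdot \vec F_p \right) \begin{pmatrix} 1 \\ \frac{\vec v}{v^2} \end{pmatrix} + \begin{pmatrix} 0 \\ \vec F_p - \left( \vec v \cdot \vec F_p \right) \frac{\vec v}{v^2} \end{pmatrix} = \gamma \begin{pmatrix} \vec v \cdot \vec F_p \\ \vec F_p^{\parallel} \end{pmatrix} + \begin{pmatrix} 0 \\ \vec F_p^{\perp} \end{pmatrix} . \tag{7.76} \]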
This is in complete analogy with the remark at the end of section 7.1.
Our analysis in this section so far can be summarized as follows: Given the
motion of a particle and its associated four-velocity, four-momentum, and four-
acceleration we are able to compute (what we call) the four-force (and the
proper force) that is connected with the particle’s motion. In particular, if the
particle’s mass m is constant along its world line, then
\[ m a^\mu = F^\mu , \tag{7.77} \]
which can be solved for arbitrary initial data $\vec x(0)$, $\vec v(0) = \dot{\vec x}(0)$, to yield $\vec x(t)$
and thus the motion of the particle in the force field.
The relativistic analog (7.77) cannot be interpreted in the analogous way.
Suppose that we are given a force field $F^\mu(x^\sigma)$ on Minkowski space and initial
data for the particle, i.e., $x^\mu(0) = \mathring x^\mu$ and $u^\mu(0) = \dot x^\mu(0) = \mathring u^\mu$ (where we
continue to use the dot as referring to proper time). Then
\[ m \ddot x^\mu = F^\mu(x^\sigma) , \qquad x^\mu(0) = \mathring x^\mu , \qquad \dot x^\mu(0) = \mathring u^\mu \]
is not a well-posed IVP (initial value problem). The reason is that $u^\mu u_\mu = -1$
is required; hence the l.h.s. is orthogonal to $\mathring u^\mu$ at initial time, independently
of the choice of $\mathring u^\mu$, but the r.h.s., i.e., $F^\mu(x^\sigma)$, cannot be orthogonal to all
four-vectors $\mathring u^\mu$ simultaneously (unless $F^\mu = 0$).
We conclude that the concept of a four-force field (as a vector field on Minkowski
space) does not make sense. To resolve the problem we must realize that it is
impossible to prescribe $F^\mu = F^\mu(x^\sigma)$; instead, we attempt to prescribe $F^\mu$ as a
function of the spacetime position and the four-velocity, i.e., $F^\mu = F^\mu(x^\sigma, u^\lambda)$.
Let us test the simplest ansatz that comes to mind: We assume linearity in $u^\lambda$,
i.e., we make the ansatz
\[ F_\mu(x^\sigma, u^\lambda) = F_{\mu\nu}(x^\sigma) \, u^\nu . \tag{7.78} \]
Note that, equivalently, we are able to write $F^\mu(x^\sigma, u^\lambda) = F^\mu{}_\nu(x^\sigma) \, u^\nu$, when
we raise indices. Let us suppress the dependence on the spacetime position
and simply write $F_\mu = F_{\mu\nu} u^\nu$. The requirement is that $F^\mu$ be orthogonal to
$u^\mu$ (for arbitrary $u^\mu$), i.e., $F_\mu u^\mu = 0$ (for arbitrary $u^\mu$). Let’s see whether the
ansatz (7.78) can guarantee that. We obtain
\[ F_\mu u^\mu = F_{\mu\nu} u^\mu u^\nu = \left( F_{(\mu\nu)} + F_{[\mu\nu]} \right) u^\mu u^\nu = F_{(\mu\nu)} u^\mu u^\nu \overset{!}{=} 0 , \]
which is zero for all $u^\mu$ if and only if $F_{(\mu\nu)} = 0$.
Exercise. Prove that, if
\[ F_{(\mu\nu)} u^\mu u^\nu = 0 \]
for arbitrary $u^\mu$ (where $u^\mu u_\mu = -1$), then $F_{(\mu\nu)} = 0$.
In this context, the prescribed field is not a vector field but an antisymmetric
tensor field Fµν (xσ ) on Minkowski space. (The force is then a derived quantity,
see (7.78).) We will see in the next chapter that these considerations are
part of an important theory, namely the theory of electromagnetism. It is
remarkable that nature does the same thing as we did with the simplest possible
ansatz (7.78).
where we use the comma notation for the derivative, e.g., V,µ = ∂µ V = ∂V /∂xµ .
However, the required orthogonality Fµ uµ = 0 does not hold for this simple
ansatz; to ensure Fµ uµ = 0 we set
independently of uµ , as required.
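\[ a^\mu = G^\mu{}_{\nu\sigma}(x^\lambda) \, u^\nu u^\sigma ; \tag{7.81} \]
since $u^\mu = \gamma (1, \vec v)^T$ and thus $u^\sigma V_{,\sigma} = \gamma V_{,t} + \gamma v^i V_{,i}$. The spatial components
of (7.82) are
\[ G^i{}_{\nu\sigma} u^\nu u^\sigma = -\delta^{ij} V_{,j} - \gamma^2 \left( V_{,t} + \vec v \cdot \vec\nabla V \right) v^i . \tag{7.82${}_i$} \]
The temporal and spatial components of the l.h.s. of (7.81) are given by
\[ a^\mu = \gamma^4 \left( \vec v \cdot \frac{d\vec v}{dt} \right) \begin{pmatrix} 1 \\ \vec v \end{pmatrix} + \gamma^2 \begin{pmatrix} 0 \\ \frac{d\vec v}{dt} \end{pmatrix} , \]
see (7.7).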
Let us consider the Newtonian limit, i.e., the limit of small velocities. Then
equation (7.82${}_0$) becomes $\partial_t V - \partial_t V - \vec v \cdot \vec\nabla V = -\vec v \cdot \vec\nabla V$ and (7.82${}_i$) becomes
$-\vec\nabla V$. Therefore, from (7.81), in the Newtonian limit, we obtain
\[ \vec v \cdot \frac{d\vec v}{dt} = -\vec v \cdot \vec\nabla V , \qquad \frac{d\vec v}{dt} = -\vec\nabla V . \]
If $V = V(\vec x)$, then $\vec v \cdot \vec\nabla V\big(\vec x(t)\big) = \partial_t V\big(\vec x(t)\big)$; accordingly, the equations read
\[ \frac{d}{dt} \left[ \frac{m \vec v^2}{2} + V(\vec x) \right] = 0 , \qquad \vec a = \vec F , \]
where $\vec F = -\vec\nabla V$ is the negative gradient of the potential, i.e., the force.
(Since we did not explicitly introduce a coupling constant in (7.79), $V$ should
be regarded as mass times potential, which makes $-\vec\nabla V$ the force.) We recover
the fundamental aspects of Newtonian gravity in the limit of small velocities:
The first equation is conservation of energy, which is the sum of kinetic plus
potential energy; the second equation is the law of motion.
The theory based on (7.79) and (7.80), which leads to (7.81), is a relativistic
scalar theory of gravity. Unfortunately, despite its appeal, it is false (i.e., in
contradiction with observations).
MAXWELL
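\[ \frac{d\vec p}{dt} = e \left( \vec E + \vec v \times \vec B \right) , \tag{8.1} \]
In a momentary rest frame, which is represented by $\{\tilde t, \tilde{\vec x}\}$, the Lorentz force is
a purely electric force,
\[ \frac{d\tilde{\vec p}}{d\tilde t} = e \tilde{\vec E} , \tag{8.2} \]
since $\tilde{\vec v} = \vec o$ (at the considered time $\tilde t = \tilde t_r$ where the particle is instantaneously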
where we have used (7.60) and (7.74). Comparing (8.4) and (8.5) we obtain
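\[ \tilde{\vec E}_\parallel = \vec E_\parallel , \qquad \tilde{\vec E}_\perp = \gamma \left( \vec E_\perp + \vec v \times \vec B \right) . \tag{8.6} \]
In particular, we note that $\vec E$ and $\vec B$ mix: There is no observer-independent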
distinction between the electric and the magnetic field. For instance, while
one observer might see only an electric field (and no magnetic field), another
observer sees both an electric and a magnetic field (according to the above
transformation).
Strictly speaking, so far we have only considered the limit of small velocities,
since we hesitated to automatically assume (8.1) for large velocities. However,
it is a fact (which is amply supported by experiments) that (8.1) is indeed the
correct representation of the Lorentz force for all velocities. In the following we
present an additional plausibility argument that supplements the experimental
evidence.
Clearly, such a tensor has 16 entries; hence, many of the components must be
redundant to encode the electromagnetic field.
Given Fµν , there is not much choice to derive a four-force from it other than
via the simple rule
\[ F^\mu = e F^\mu{}_\nu u^\nu , \tag{8.8} \]
where $u^\mu$ is the four-velocity of the particle. If we require the Lorentz
four-force (8.8) to be a pure force, then
\[ F^\mu u_\mu = e F_{\mu\nu} u^\mu u^\nu = 0 , \tag{8.9} \]
see (7.69). Consequently, the field tensor $F_{\mu\nu}$ must be antisymmetric, i.e.,
$F_{\mu\nu} = -F_{\nu\mu}$, or
\[ F_{\mu\nu} = F_{[\mu\nu]} , \tag{8.10} \]
cf. the considerations of section 7.5.
As a consequence we obtain
d~
p
F~ = = e E~ + ~v × B
~ (8.14)
dt
for the Lorentz (three-)force, cf. (7.60). Comparison with (8.1) suggests
E~ = E
~ , ~=B
B ~. (8.15)
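w.r.t. the chosen observer. The Lorentz force acting on a particle with
four-velocity $u^\mu$ is
\[ F^\mu = e F^\mu{}_\nu u^\nu , \tag{8.17} \]
where $e$ is the charge of the particle. In the chosen coordinates we have
\[ F^\mu = \gamma \begin{pmatrix} e \, \vec v \cdot \vec E \\ e \left( \vec E + \vec v \times \vec B \right) \end{pmatrix} , \tag{8.18} \]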
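The component form (8.18) and the pure-force property (8.9) can be verified numerically. The sketch below assumes the mixed-index matrix $F^\mu{}_\nu$ in the layout of (8.28), signature (−+++), units with c = 1, and arbitrary illustrative values for the charge, the fields and the velocity; all function names are introduced here for illustration.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def dot(a, b):
    """Euclidean scalar product of two 3-vectors."""
    return sum(p * q for p, q in zip(a, b))

def field_tensor_mixed(E, B):
    """Mixed-index matrix F^mu_nu, laid out as in (8.28)."""
    return [[0.0,   E[0],  E[1],  E[2]],
            [E[0],  0.0,   B[2], -B[1]],
            [E[1], -B[2],  0.0,   B[0]],
            [E[2],  B[1], -B[0],  0.0]]

def four_velocity(v):
    """u^mu = gamma (1, v) for three-velocity v."""
    gamma = 1.0 / math.sqrt(1.0 - dot(v, v))
    return [gamma] + [gamma * vi for vi in v]

def lorentz_four_force(e, E, B, v):
    """F^mu = e F^mu_nu u^nu, cf. (8.17)."""
    F = field_tensor_mixed(E, B)
    u = four_velocity(v)
    return [e * sum(F[m][n] * u[n] for n in range(4)) for m in range(4)]

def minkowski(a, b):
    """Minkowski product eta(a, b) with signature (-+++)."""
    return -a[0] * b[0] + dot(a[1:], b[1:])

# Illustrative values, not taken from the text.
e, E, B, v = 1.5, [0.3, -1.2, 0.5], [0.7, 0.1, -0.4], [0.2, -0.1, 0.6]
force = lorentz_four_force(e, E, B, v)
gamma = four_velocity(v)[0]
expected = [gamma * e * dot(v, E)] + [gamma * e * (Ei + wi) for Ei, wi in zip(E, cross(v, B))]

print(force)
print(expected)                              # right-hand side of (8.18)
print(minkowski(force, four_velocity(v)))    # vanishes up to rounding: pure force
```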
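\[ \epsilon_{\gamma\delta\alpha\beta} \, \epsilon^{\gamma\delta\varepsilon\zeta} = -4 \delta^{\varepsilon}_{[\alpha} \delta^{\zeta}_{\beta]} = -4 \delta_{\alpha}^{[\varepsilon} \delta_{\beta}^{\zeta]} = -4 \delta^{[\varepsilon}_{[\alpha} \delta^{\zeta]}_{\beta]} , \]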
Making use of either the definition (8.20) between ∗Fµν and Fµν we find that
∗Fµν takes the form
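\[ {*F}_{\mu\nu} = \begin{pmatrix} 0 & B^1 & B^2 & B^3 \\ -B^1 & 0 & E^3 & -E^2 \\ -B^2 & -E^3 & 0 & E^1 \\ -B^3 & E^2 & -E^1 & 0 \end{pmatrix} \tag{8.22} \]
w.r.t. the chosen frame of reference. Note that (8.22) arises from (8.16) by
replacing
\[ B^i \mapsto E^i , \qquad E^i \mapsto -B^i . \]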
8.2 Transformations of $\vec E$ and $\vec B$
Let L denote the Lorentz transformation connecting the inertial frames of ref-
Remark for experts. Let E µ and B µ be the electric four-field and magnetic
four-field associated with the four-velocity uµ . It is not difficult to show that
\[ \vec E \cdot \vec B = E_\mu B^\mu . \tag{8.26c} \]
The formulas (8.26) extract the physically relevant information about the elec-
tric field and the magnetic field from a given electromagnetic field tensor Fµν .
Now suppose there are two observers, X and X ′ , with four-velocities u and u′ ,
respectively. In the coordinates used by X we have
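\[ u^\mu = \begin{pmatrix} 1 \\ \vec o \end{pmatrix} , \qquad u'^\mu = \gamma \begin{pmatrix} 1 \\ \vec v \end{pmatrix} , \tag{8.27} \]
and
\[ F^\mu{}_\nu = \begin{pmatrix} 0 & E^1 & E^2 & E^3 \\ E^1 & 0 & B^3 & -B^2 \\ E^2 & -B^3 & 0 & B^1 \\ E^3 & B^2 & -B^1 & 0 \end{pmatrix} , \tag{8.28} \]
where $\vec E$ and $\vec B$ are the electric and the magnetic field as seen by $X$. (Compare
the index structure of (8.16) and (8.28).) Let us denote by $\vec E'$ and $\vec B'$ the fields
as seen by $X'$. The associated four-fields are $E'^\mu$ and $B'^\mu$, which are given by
\[ E'^\mu = F^\mu{}_\nu u'^\nu = \gamma \begin{pmatrix} \vec v \cdot \vec E \\ \vec E + \vec v \times \vec B \end{pmatrix} = \gamma \begin{pmatrix} \vec v \cdot \vec E_\parallel \\ \vec E_\parallel + \vec E_\perp + \vec v \times \vec B \end{pmatrix} \tag{8.29a} \]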
5
These considerations are in complete analogy with (7.13) and (7.14). A comment on
the notation, though: Where we wrote $\eta(v, v)$ earlier, we write $v^\mu v_\mu$ here.
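and⁶
\[ B'^\mu = -{*F}^\mu{}_\nu u'^\nu = \gamma \begin{pmatrix} \vec v \cdot \vec B \\ \vec B - \vec v \times \vec E \end{pmatrix} = \gamma \begin{pmatrix} \vec v \cdot \vec B_\parallel \\ \vec B_\parallel + \vec B_\perp - \vec v \times \vec E \end{pmatrix} . \tag{8.29b} \]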
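\[ {} = (\vec E_\parallel)^2 + \gamma^2 \left( \vec E_\perp + \vec v \times \vec B \right)^2 \tag{8.30a} \]
\[ {} = (\vec B_\parallel)^2 + \gamma^2 \left( \vec B_\perp - \vec v \times \vec E \right)^2 . \tag{8.30b} \]
In addition we have
\[ \vec E' \cdot \vec B' = E'_\mu B'^\mu = \vec E \cdot \vec B . \tag{8.30c} \]
Hence, summarizing,
\[ |\vec E'_\parallel|^2 + |\vec E'_\perp|^2 = |\vec E_\parallel|^2 + \gamma^2 \left( \vec E_\perp + \vec v \times \vec B \right)^2 , \]
\[ |\vec B'_\parallel|^2 + |\vec B'_\perp|^2 = |\vec B_\parallel|^2 + \gamma^2 \left( \vec B_\perp - \vec v \times \vec E \right)^2 , \]
\[ |\vec E'_\parallel| \, |\vec B'_\parallel| + |\vec E'_\perp| \, |\vec B'_\perp| = |\vec E_\parallel| \, |\vec B_\parallel| + |\vec E_\perp| \, |\vec B_\perp| . \]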
when the spatial axes of the observers are properly aligned. Equations (8.31)
are the transformation rules for the electric field and the magnetic field under
a change of inertial frame of reference. While $X$ sees an electric field $\vec E$ and a
magnetic field $\vec B$, the observer $X'$ sees $\vec E'$ and $\vec B'$.
6
The ‘strange’ form of $E'^\mu$ and $B'^\mu$ is due to the fact that we use the coordinates of the
observer $X$ to write down the fields experienced by $X'$. Clearly, in the own coordinates
of the observer $X'$, $E'^\mu = (0, \vec E')^T$ and analogously for $B'^\mu$.
\[ \vec E \cdot \vec B = \frac{1}{4} F_{\mu\nu} \, {*F}^{\mu\nu} . \tag{8.32} \]
The squared magnitudes of the electric and magnetic field are not invariant
under a change of observer, see (8.30). In terms of the electromagnetic field
tensor we have
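\[ |\vec E|^2 = \eta^{\alpha\beta} F_{\alpha\gamma} F_{\beta\delta} u^\gamma u^\delta , \qquad |\vec B|^2 = \eta^{\alpha\beta} \, {*F}_{\alpha\gamma} \, {*F}_{\beta\delta} u^\gamma u^\delta . \]
\[ \vec E^2 - \vec B^2 = -\frac{1}{2} F_{\mu\nu} F^{\mu\nu} = \frac{1}{2} \, {*F}_{\mu\nu} \, {*F}^{\mu\nu} . \tag{8.33} \]
Since $\vec E^2 - \vec B^2$ has a coordinate-independent representation, it is automatically
an invariant.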
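The invariance of $\vec E \cdot \vec B$ and $\vec E^2 - \vec B^2$ can be illustrated numerically. The sketch below assumes the component-wise rules that underlie (8.30), namely that the parts parallel to $\vec v$ are unchanged while $\vec E'_\perp = \gamma(\vec E_\perp + \vec v \times \vec B)$ and $\vec B'_\perp = \gamma(\vec B_\perp - \vec v \times \vec E)$, with properly aligned spatial axes, c = 1 and a non-zero velocity. The field and velocity values are arbitrary illustrative choices.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def dot(a, b):
    """Euclidean scalar product of two 3-vectors."""
    return sum(p * q for p, q in zip(a, b))

def boost_fields(E, B, v):
    """Fields seen by an observer moving with velocity v (c = 1, v != 0)."""
    v2 = dot(v, v)
    gamma = 1.0 / math.sqrt(1.0 - v2)
    vxB, vxE = cross(v, B), cross(v, E)
    E_new, B_new = [], []
    for i in range(3):
        E_par = dot(E, v) / v2 * v[i]   # component of E along v
        B_par = dot(B, v) / v2 * v[i]   # component of B along v
        E_new.append(E_par + gamma * (E[i] - E_par + vxB[i]))
        B_new.append(B_par + gamma * (B[i] - B_par - vxE[i]))
    return E_new, B_new

def invariants(E, B):
    """The pair (E.B, E^2 - B^2)."""
    return dot(E, B), dot(E, E) - dot(B, B)

E, B, v = [0.3, -1.2, 0.5], [0.7, 0.1, -0.4], [0.2, -0.1, 0.6]
E2, B2 = boost_fields(E, B, v)
print(invariants(E, B))
print(invariants(E2, B2))   # agrees with the line above up to rounding
print(dot(E, E), dot(E2, E2))   # the individual magnitudes do change
```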
Remark. The invariants $\vec E \cdot \vec B$ and $\vec E^2 - \vec B^2$ are the only invariants of the
electromagnetic field tensor. To show this we invoke a rather general argument.
Let Ajk be an antisymmetric matrix. Antisymmetric matrices have imaginary
eigenvalues. Moreover, eigenvalues come in complex conjugate pairs, i.e., if
λ ∈ C is an eigenvalue, so is λ̄. Therefore, if our space is 2n-dimensional, the
eigenvalues of Ajk must be ia1 , −ia1 , ia2 , −ia2 , . . . , ian , −ian for some ak ∈ R,
k = 1, . . . , n. The invariants of Ajk must be combinations of the eigenvalues;
accordingly, Ajk can possess merely n independent invariants: ai , i = 1, . . . , n.
Therefore, in our particular case, there can be up to two independent invariants
(which we have already found). In fact, one easily finds that the characteristic
polynomial of F µν is given by
\[ \lambda^4 - \left( \vec E^2 - \vec B^2 \right) \lambda^2 - \left( \vec E \cdot \vec B \right)^2 = 0 , \]
F = u ∧ E + ∗(u ∧ B) ,
which implies
∗F = ∗(u ∧ E) + ∗ ∗ (u ∧ B) = −u ∧ B + ∗(u ∧ E) ,
i.e., (8.25b). Let A and B be arbitrary 2-forms; the definition of the Hodge
star, A ∧ (∗B) = η(A, B)vol, leads to A ∧ (∗B) = (∗A) ∧ B. In particular, since
∗ ∗ = −id, we find (∗A) ∧ (∗A) = −A ∧ A for arbitrary A. Now, on the one
hand,
\[ F \wedge ({*F}) = \eta(F, F) \, \text{vol} = \tfrac{1}{2} F_{\mu\nu} F^{\mu\nu} \, \text{vol} ; \]
This yields (8.33) and thus invariance of $\vec E^2 - \vec B^2$.
7
There is a small subtlety here that involves the index structure of the tensor. We refrain
from giving any details, since the general picture remains unaffected.
8
“Simpler”.
= −2 η(E, B) vol .
On the other hand, F ∧ F = −η(F, ∗F ) vol, and η(F, ∗F ) = (1/2)Fµν ∗F µν ;
hence equation (8.32) follows.
Let’s pretend that we haven’t ever heard about a Coulomb field. But still we
would like to know the electromagnetic field generated by the point charge.
Is this possible? Yes. We can derive the electromagnetic field of a uniformly
moving charge from basic mathematical and geometric considerations.
Let us return to the problem at hand. The assumption that the world line of
the charge passes through the origin amounts to setting z µ = 0 in (8.34); hence
the distance between the event xµ and the particle’s world line is
The electromagnetic field tensor must be built from xµ , wµ , and r. There are
only two possible ways: Either
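\[ F_{\mu\nu} = \frac{1}{f(r)} \, w_{[\mu} x_{\nu]} , \tag{8.36${}_E$} \]
or
\[ \bar F_{\mu\nu} = \frac{1}{f(r)} \, \frac{1}{2} \epsilon_{\mu\nu\sigma\tau} w^{[\sigma} x^{\tau]} , \tag{8.36${}_B$} \]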
Let us denote the rest frame of the charge by $X'$ and the temporal and spatial
coordinates of this comoving observer by $t'$ and $\vec x'$, respectively. Let us compute
$w_{[\mu} x_{\nu]}$ in these coordinates. Since, w.r.t. $X'$, the four-velocity $w^\mu$ of the particle
is
\[ w^\mu = \begin{pmatrix} 1 \\ \vec o \end{pmatrix} , \]
and the event $x^\mu$ has coordinates
\[ x^\mu = \begin{pmatrix} t' \\ \vec x' \end{pmatrix} , \]
\[ \vec E' = \vec E'(t', \vec x') = \frac{\vec x'}{2 f(r)} , \qquad \vec B' = \vec B'(t', \vec x') = \vec o . \tag{8.37} \]
The function $f(r)$ can be determined (at least heuristically) by using geometric
arguments involving surfaces of spheres. We obtain $f(r) \propto r^3$ and thus
\[ \vec E' = \tilde e \, \frac{\vec x'}{r^3} , \qquad \vec B' = \vec o . \tag{8.38} \]
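where $w^\mu$ is the four-velocity of the charge, $r$ the distance between $x^\rho$ and the
particle’s world line, and $\tilde e = e/(4\pi\epsilon_0)$. In the coordinates of the comoving
observer $X'$ we have
\[ F_{\mu\nu} = \frac{\tilde e}{r^3} \begin{pmatrix} 0 & -x'^1 & -x'^2 & -x'^3 \\ x'^1 & 0 & 0 & 0 \\ x'^2 & 0 & 0 & 0 \\ x'^3 & 0 & 0 & 0 \end{pmatrix} \tag{8.39'} \]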
and thus (8.38). Some comments are in order.
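Remark. To derive (8.39) we have employed (8.36${}_E$). What about (8.36${}_B$)?
Using (8.36${}_B$) we obtain the dual of (8.39′), i.e.,
\[ \bar F_{\mu\nu} = {*F}_{\mu\nu} = \frac{\tilde e}{r^3} \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & x'^3 & -x'^2 \\ 0 & -x'^3 & 0 & x'^1 \\ 0 & x'^2 & -x'^1 & 0 \end{pmatrix} , \]
when we use the coordinates of $X'$. This electromagnetic field tensor would
correspond to
\[ \bar{\vec E}' = \vec o , \qquad \bar{\vec B}' = \tilde e \, \frac{\vec x'}{r^3} , \tag{8.40} \]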
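Our goal is to compute the electric and magnetic field as seen by an arbitrary
observer $X$, whose four-velocity is $u^\mu$ and thus different from the particle’s
four-velocity $w^\mu$. W.r.t. the coordinates $\{t, \vec x\}$ used by $X$ we have
\[ u^\mu = \begin{pmatrix} 1 \\ \vec o \end{pmatrix} , \qquad w^\mu = \gamma \begin{pmatrix} 1 \\ \vec v \end{pmatrix} , \qquad x^\mu = \begin{pmatrix} t \\ \vec x \end{pmatrix} . \tag{8.41} \]
\[ E_\mu = F_{\mu\nu} u^\nu , \tag{8.42} \]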
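We thus write
\[ E^\mu(t, \vec x) = \frac{\tilde e}{r^3} \, \gamma \kappa^\mu \qquad \text{and} \qquad \vec E(t, \vec x) = \frac{\tilde e}{r^3} \, \gamma \vec\kappa . \tag{8.46} \]
where $\alpha$ is the angle between $\vec\kappa$ and $\vec v$. Inserting this result into (8.46) we
finally arrive at
\[ E^\mu(t, \vec x) = \frac{\tilde e \, \kappa^\mu}{\gamma^2 \left( 1 - \vec v^2 \sin^2\alpha \right)^{3/2} |\vec\kappa|^3} , \tag{8.48} \]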
9
Note the intimate connection of these considerations with the results of section 5.3.
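so that
\[ \vec E(t, \vec x) = \frac{\tilde e \, \vec\kappa}{\gamma^2 \left( 1 - \vec v^2 \sin^2\alpha \right)^{3/2} |\vec\kappa|^3} . \tag{8.49} \]
i.e.,
\[ \vec B = \vec v \times \vec E , \tag{8.50} \]
where $\vec E$ is given by (8.49).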
An alternative derivation
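hence
\[ \vec E_\parallel = \vec E'_\parallel , \qquad \vec E_\perp = \gamma \vec E'_\perp , \tag{8.52a} \]
\[ \vec B_\parallel = 0 , \qquad \vec B_\perp = \gamma \, \vec v \times \vec E' . \tag{8.52b} \]
It remains to express $\vec x'$ in terms of $t$ and $\vec x$, which can be done by using the
Lorentz transformation (2.31),
\[ \begin{pmatrix} t' \\ x'^1 \\ x'^2 \\ x'^3 \end{pmatrix} = \begin{pmatrix} \gamma & -\gamma \vec v^T \\ -\gamma \vec v & 1 + \frac{\gamma - 1}{v^2} \vec v \vec v^T \end{pmatrix} \begin{pmatrix} t \\ x^1 \\ x^2 \\ x^3 \end{pmatrix} . \tag{8.53} \]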
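Standard (but tiresome) algebraic manipulations show that the results can be
expressed in terms of the vector $\vec\kappa = \vec x - \vec v t$. Finally, we arrive again at (8.49).
The simplest way to compute the magnetic field $\vec B$ is to again make use
of (8.52):
\[ \vec E_\parallel = \vec E'_\parallel , \qquad \vec E_\perp = \gamma \vec E'_\perp , \]
\[ \vec B_\parallel = 0 , \qquad \vec B_\perp = \gamma \, \vec v \times \vec E' = \gamma \, \vec v \times \vec E'_\perp . \]
Consequently,
\[ \vec B_\parallel = 0 , \qquad \vec B_\perp = \vec v \times \gamma \vec E'_\perp = \vec v \times \vec E_\perp = \vec v \times \vec E , \]
and therefore
\[ \vec B = \vec v \times \vec E . \tag{8.54} \]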
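The ‘standard (but tiresome)’ algebra can be sidestepped by a numerical comparison of the two derivations. The sketch below computes the field once by transforming the Coulomb field (8.38) with (8.52) and the spatial part of (8.53), and once from the closed form (8.49), and also checks (8.54). It assumes c = 1, $\tilde e = 1$ by default, a non-zero velocity, and arbitrary illustrative values for the event and the velocity; the function names are introduced here for illustration.

```python
import math

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def field_via_boost(t, x, v, e_tilde=1.0):
    """E and B from (8.52), with x' taken from the spatial part of (8.53)."""
    v2 = dot(v, v)
    gamma = 1.0 / math.sqrt(1.0 - v2)
    vx = dot(v, x)
    x_p = [-gamma * vi * t + xi + (gamma - 1.0) / v2 * vx * vi for vi, xi in zip(v, x)]
    r = math.sqrt(dot(x_p, x_p))
    E_p = [e_tilde * comp / r ** 3 for comp in x_p]          # Coulomb field (8.38)
    E_p_par = [dot(E_p, v) / v2 * vi for vi in v]            # part of E' along v
    E = [par + gamma * (full - par) for par, full in zip(E_p_par, E_p)]
    B = [gamma * comp for comp in cross(v, E_p)]             # (8.52b)
    return E, B

def field_closed_form(t, x, v, e_tilde=1.0):
    """E from (8.49), with kappa = x - v t and alpha the angle between kappa and v."""
    kappa = [xi - vi * t for xi, vi in zip(x, v)]
    k = math.sqrt(dot(kappa, kappa))
    v2 = dot(v, v)
    cos_alpha = dot(kappa, v) / (k * math.sqrt(v2))
    sin2_alpha = 1.0 - cos_alpha ** 2
    denom = (1.0 / (1.0 - v2)) * (1.0 - v2 * sin2_alpha) ** 1.5 * k ** 3
    return [e_tilde * comp / denom for comp in kappa]

t, x, v = 0.7, [1.0, -2.0, 0.5], [0.3, 0.4, -0.2]   # illustrative values
E_boost, B_boost = field_via_boost(t, x, v)
E_closed = field_closed_form(t, x, v)
print(E_boost)
print(E_closed)                   # agrees with the line above up to rounding
print(B_boost, cross(v, E_closed))   # (8.54): B = v x E
```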
Obviously, the definitions and formulas given here make sense only once co-
ordinates {t, ~x} have been chosen. However, while the velocity field ~v (t, ~x)
undergoes the obvious transformation under a change of coordinates, the den-
sity ρ(t, ~x) does not: ρ is not a scalar function. This is because its definition
involves volumes, which change under a change of observer (Lorentz transfor-
mation), see section 5.3, where we discuss the Lorentz contraction. Using the
results of that section we can find an easy fix by simply compensating for the
change of volume with the right γ factor. However, to neatly embed the treat-
ment of distributions of particles into the four-vector formalism of Minkowski
spacetime, let us proceed a bit more systematically and geometrically.
\[ dN = \rho_p \, d^3x . \tag{8.58} \]
(Note that the volume element $d^3x$ is derived from the rest frame coordinates
we use.)
Definition 8.1. The particle four-current density of a distribution of particles
is
\[ j^\mu = \rho_p u^\mu . \tag{8.59} \]
Since $j^\mu$ is constructed from a scalar and a four-vector, it is a four-vector.
where ρ and ~v are the particle density and the velocity field of the particle
distribution as seen by X.
The continuity equation takes a manifestly covariant form when we use the
four-current $j^\mu$. From (8.56) we directly obtain
\[ \partial_\mu j^\mu = 0 , \tag{8.62} \]
In the simplest case, the volume is a cuboid that is spanned by three spacelike
vectors $a^\mu$, $b^\mu$, $c^\mu$. W.l.o.g. we could choose these vectors to be orthogonal to
the four-velocity $u^\mu$, so that they lie in the observer’s plane of simultaneity;
this is not necessary, however. W.r.t. the co-moving observer’s coordinates we
have
\[ u^\mu = \begin{pmatrix} 1 \\ \vec o \end{pmatrix} , \qquad a^\mu = \begin{pmatrix} a^0 \\ \vec a \end{pmatrix} , \qquad b^\mu = \begin{pmatrix} b^0 \\ \vec b \end{pmatrix} , \qquad c^\mu = \begin{pmatrix} c^0 \\ \vec c \end{pmatrix} . \tag{8.63} \]
Each point $p^\mu$ in the volume taken up by the body propagates along a world
line $p^\mu + s u^\mu$. Now, the proper volume of the cuboid (which is the volume
measured in the rest frame $X$) is simply
\[ V = V_p = \det \left( \vec a \;\; \vec b \;\; \vec c \right) = \epsilon_{ijk} a^i b^j c^k . \tag{8.64} \]
\[ V = V_p = \epsilon_{\mu\alpha\beta\gamma} u^\mu a^\alpha b^\beta c^\gamma . \tag{8.65} \]
Since the volume form $\epsilon_{\alpha\beta\gamma\delta}$ is invariant under a change of inertial coordinates,
this formula holds independently of the chosen coordinates.
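For a different observer $X'$, whose four-velocity is $u'^\mu$, the body moves with
some velocity $\vec v$, i.e.,
\[ u'^\mu = \begin{pmatrix} 1 \\ \vec o \end{pmatrix} , \qquad u^\mu = \gamma \begin{pmatrix} 1 \\ \vec v \end{pmatrix} \tag{8.66} \]
where $\rho = \rho(t, \vec x)$ is the charge density and $\vec\jmath = \rho \vec v$ is the charge current density.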
Our aim is to find a version of Maxwell’s equations that is manifestly covariant.
11
The fact that we have chosen $a^\mu$, $b^\mu$, $c^\mu$ to be orthogonal to $u'^\mu$ is important here.
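\[ \vec\nabla \cdot \vec E = 4\pi\rho , \qquad \vec\nabla \times \vec B - \partial_t \vec E = 4\pi \vec\jmath . \tag{8.73} \]
In the preceding section we have seen that $\rho$ and $\vec\jmath$ can be collected into a
four-vector, the four-current density
\[ j^\mu = \rho \begin{pmatrix} 1 \\ \vec v \end{pmatrix} . \]
Recalling that $F_{i0} = \delta_{ij} E^j$ and $F_{ij} = \epsilon_{ijk} B^k$ it is straightforward to see that
the equations (8.73) take the form
\[ \partial_\nu F^{\mu\nu} = 4\pi j^\mu . \tag{8.73'} \]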
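\[ \vec\nabla \cdot \vec B = 0 , \qquad \vec\nabla \times \vec E + \partial_t \vec B = 0 . \tag{8.74} \]
\[ \partial_{[0} F_{ij]} = \frac{1}{3} \left[ \partial_0 F_{ij} + \partial_i F_{j0} + \partial_j F_{0i} \right] = \frac{1}{3} \left[ \partial_t \epsilon_{ijk} B^k + \partial_i E_j - \partial_j E_i \right] = \frac{1}{3} \left[ \epsilon_{ijk} \partial_t B^k + 2 \partial_{[i} E_{j]} \right] . \]
Then
\[ 3 \epsilon_{lij} \partial_{[0} F_{ij]} = \epsilon_{lij} \epsilon_{ijk} \partial_t B^k + 2 \epsilon_{lij} \partial_i E_j = \left( \delta_{jj} \delta_{lk} - \delta_{jk} \delta_{lj} \right) \partial_t B^k + 2 \epsilon_{lij} \partial_i E_j \]
\[ {} = \left( 3 \delta_{lk} - \delta_{lk} \right) \partial_t B^k + 2 \epsilon_{lij} \partial_i E_j = 2 \partial_t B^l + 2 \left( \vec\nabla \times \vec E \right)^l \]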
Collecting the results we obtain the Maxwell equations in their manifestly co-
variant relativistic version:
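This tensor is antisymmetric like $F_{\mu\nu}$ and ${*F}_{\mu\nu}$; moreover, $W_{\mu\nu}$ is anti-self-dual,
i.e.,
\[ {*W}_{\mu\nu} = {*F}_{\mu\nu} + i \, {*{*F}}_{\mu\nu} \overset{(8.21)}{=} {*F}_{\mu\nu} - i F_{\mu\nu} = -i W_{\mu\nu} . \tag{8.78} \]
The complex conjugate tensor $\bar W_{\mu\nu}$ is self-dual, i.e., ${*\bar W}_{\mu\nu} = i \bar W_{\mu\nu}$. In terms
of $W_{\mu\nu}$, the Maxwell equations read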
dF = 0 , δF = 4πj . (8.80)
8.6 Four-potential
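\[ \vec v = \vec\nabla\phi \quad \Leftrightarrow \quad \vec\nabla \times \vec v = 0 . \tag{8.83} \]
\[ v_i = \partial_i \phi \quad \Leftrightarrow \quad v_{[i,j]} = 0 . \tag{8.84} \]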
Remark. The first guess for the relation between Fµν and the potential Aµ is
probably Fµν = Aµ,ν . However, to ensure that the l.h. side is antisymmetric,
we must perform an antisymmetrization. The factor of 2 in (8.86) is introduced
for aesthetic reasons; alternatively, we can write (8.86) as
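To prove (the nontrivial direction of) the statement (8.86), we must show that
$2A_{[\mu,\nu]} = F_{\mu\nu}$. We proceed step by step:
\[ A_{\mu,\nu} = \int_0^1 \left( F_{\mu\sigma}(\lambda x) \, x^\sigma \right)_{,\nu} \lambda \, d\lambda = -\int_0^1 \left( F_{\sigma\mu}(\lambda x) \, x^\sigma \right)_{,\nu} \lambda \, d\lambda , \]
hence
\[ A_{[\mu,\nu]} = \int_0^1 \left( F_{\sigma[\nu}(\lambda x) \, x^\sigma \right)_{,\mu]} \lambda \, d\lambda = \int_0^1 \left( F_{\sigma[\nu,\mu]}(\lambda x) \, \lambda x^\sigma + F_{\sigma[\nu} \delta^\sigma_{\mu]} \right) \lambda \, d\lambda \]
\[ {} = \int_0^1 \left( F_{\sigma[\nu,\mu]}(\lambda x) \, \lambda x^\sigma + F_{[\mu\nu]} \right) \lambda \, d\lambda \]
\[ {} = \int_0^1 F_{\sigma[\nu,\mu]}(\lambda x) \, \lambda^2 x^\sigma \, d\lambda + \int_0^1 F_{\mu\nu} \, \lambda \, d\lambda . \]
\[ A_{[\mu,\nu]} \overset{(*)}{=} \frac{1}{2} \int_0^1 F_{\mu\nu,\sigma}(\lambda x) \, \lambda^2 x^\sigma \, d\lambda + \int_0^1 F_{\mu\nu} \, \lambda \, d\lambda \]
\[ {} = \frac{1}{2} \int_0^1 \left( \frac{d}{d\lambda} F_{\mu\nu}(\lambda x) \right) \lambda^2 \, d\lambda + \int_0^1 F_{\mu\nu} \, \lambda \, d\lambda \]
\[ A_{[\mu,\nu]} = \frac{1}{2} \Big[ F_{\mu\nu}(\lambda x) \, \lambda^2 \Big]_0^1 - \int_0^1 F_{\mu\nu} \, \lambda \, d\lambda + \int_0^1 F_{\mu\nu} \, \lambda \, d\lambda = \frac{1}{2} F_{\mu\nu}(x) \]
\[ F_{\sigma[\nu,\mu]} = \frac{1}{2} F_{\mu\nu,\sigma} . \tag{$*$} \]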
It is important to note that there does not exist a unique four-potential. Let
$\bar A_\mu$ and $\tilde A_\mu$ be four-potentials of $F_{\mu\nu}$, i.e.,
A[µ,ν] = 0 ,
To get rid of the second term, we require the four-potential to satisfy the Lorenz
gauge condition¹³
\[ \partial_\nu A^\nu = 0 . \tag{8.90} \]
Clearly, a given four-potential $A^\mu$ will in general not satisfy the Lorenz gauge
condition (8.90); however, we can always make use of the gauge freedom (8.88)
to achieve (8.90). To see this, assume that $\partial_\nu A^\nu \neq 0$; then there exists a scalar
function $\Lambda$ and a modified four-potential $\bar A_\mu = A_\mu + \partial_\mu \Lambda$, such that
\[ \partial_\nu \bar A^\nu = \partial_\nu \left( A^\nu + \partial^\nu \Lambda \right) = \partial_\nu A^\nu + \Box \Lambda = 0 , \]
\[ \Box \Lambda = -\partial_\nu A^\nu \]
possesses a solution $\Lambda$.
Exercise. Show that the four-potential $A_\mu$ defined by (8.87) automatically
satisfies the Lorenz gauge condition, if $j^\mu = 0$.
where we have set $A^0 = \phi$ (so that $A_0 = -\phi$). Since $F_{\mu\nu} = 2A_{[\mu,\nu]}$ we have
and hence
\[ \vec E = \partial_t \vec A + \vec\nabla\phi , \qquad \vec B = -\vec\nabla \times \vec A . \tag{8.93} \]
Therefore, the components $\phi$ and $\vec A$ of the four-potential coincide (up to a minus
sign¹⁴) with the electric scalar potential and the magnetic vector potential,
which are known from the standard formulation of Maxwell’s equations in terms
of potentials.
The Maxwell equation (8.91) for the four-potential Aµ in Lorenz gauge can
be solved if boundary conditions are prescribed. In particular, we obtain the
well-known advanced and retarded solutions discussed in every textbook on
electromagnetism.
Remark for experts. Using the tensor $W_{\mu\nu} = F_{\mu\nu} + i \, {*F}_{\mu\nu}$, see (8.77), and its
complex conjugate, the energy-momentum tensor $T_{\mu\nu}$ can be constructed in a
simpler way,
\[ T_{\mu\nu} = \frac{1}{4\pi} \, \frac{1}{2} W_{\mu\sigma} \bar W_\nu{}^\sigma . \tag{8.94'} \]
14
If we had chosen the signature of the Minkowski metric to be sign η = (+ − −−) instead of
(− + ++), there would not appear a minus sign here. However, our choice of signature
is well adapted to other situations; in particular, when we go over from special relativity
to general relativity.
Proof. To prove that the two expressions for Tµν coincide we perform a straight-
forward calculation.15
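\[ W_{\mu\sigma} \bar W_\nu{}^\sigma = \left( F_{\mu\sigma} + i \, {*F}_{\mu\sigma} \right) \left( F_\nu{}^\sigma - i \, {*F}_\nu{}^\sigma \right) = F_{\mu\sigma} F_\nu{}^\sigma + 2i \underbrace{{*F}_{[\mu}{}^\sigma F_{\nu]\sigma}}_{\eta_{\tau[\mu} {*F}^{\tau\sigma} F_{\nu]\sigma}} + {*F}_{\mu\sigma} \underbrace{{*F}_\nu{}^\sigma}_{\eta_{\nu\pi} {*F}^{\pi\sigma}} \]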
The second term can be manipulated by making use of the results of Ap-
pendix B. We have
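\[ \epsilon^{\tau\sigma\rho\pi} F_{\nu\sigma} F_{\rho\pi} = \epsilon^{\tau\sigma\rho\pi} F_{\nu[\sigma} F_{\rho\pi]} \]
because of (B.10) and further
and thus
\[ \epsilon^{\tau\sigma\rho\pi} F_{\nu\sigma} F_{\rho\pi} \propto \epsilon^{\tau\sigma\rho\pi} \epsilon_{\nu\sigma\rho\pi} = \epsilon^{\sigma\rho\pi\tau} \epsilon_{\sigma\rho\pi\nu} = -6 \delta^\tau_\nu , \]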
where we have used (B.25). Therefore,
i.e., the second term in (8.95) vanishes. The third term in (8.95) can be ma-
nipulated along the following lines:
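Here we have used (B.25). In the next step we apply (B.16), i.e.,
\[ \epsilon_{\mu\sigma\tau\rho} \epsilon^{\pi\sigma\lambda\xi} F^{\tau\rho} F_{\lambda\xi} \eta_{\nu\pi} = \left( -2 \delta_\mu^\pi \delta_\tau^{[\lambda} \delta_\rho^{\xi]} - 4 \delta_\mu^{[\lambda} \delta_{[\tau}^{\xi]} \delta_{\rho]}^{\pi} \right) F^{\tau\rho} F_{\lambda\xi} \eta_{\nu\pi} . \]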
15
The calculation is straightforward, but this doesn’t mean that it’s easy. Actually, it isn’t.
In fact, it would be easier to define Fµν , ∗F µν and Wµν as matrices in Mathematica or
Maple and let the computer do the rest. However, we are here on a training ground for our
later lives as (theoretical) physicists. (Yes, this might be part of what you’ll be doing. . . )
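\[ \epsilon_{\mu\sigma\tau\rho} \epsilon^{\pi\sigma\lambda\xi} F^{\tau\rho} F_{\lambda\xi} \eta_{\nu\pi} = -2 \eta_{\mu\nu} \delta_\tau^\lambda \delta_\rho^\xi F^{\tau\rho} F_{\lambda\xi} - 4 \delta_\mu^\lambda \delta_\tau^\xi \delta_\rho^\pi F^{\tau\rho} F_{\lambda\xi} \eta_{\nu\pi} . \]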
To see this we simply note that $F_{\nu\sigma} F_\mu{}^\sigma = F_\nu{}^\sigma F_{\mu\sigma} = F_{\mu\sigma} F_\nu{}^\sigma$, hence the first
term is symmetric; for the second term, symmetry is evident.
where we have used equation (∗) of page 136. We note that $T_{\mu\nu}$ is
divergence-free in the absence of sources, i.e.,
\[ T^{\mu\nu}{}_{,\nu} = 0 \quad \text{if } j^\mu = 0 ; \tag{8.97} \]
\[ T^{\mu\nu}{}_{,\nu} = -F^\mu{}_\nu j^\nu . \tag{8.98} \]
Let us investigate the r.h.s. of this equation. The charge four-current density
$j^\mu$ is given by
\[ j^\mu = \rho_e u^\mu , \]
see (8.59), where in the present context $\rho_e$ is the proper charge density (i.e.,
charge per proper volume) of the charge distribution. Hence,
\[ F^\mu{}_\nu j^\nu = \rho_e F^\mu{}_\nu u^\nu . \tag{8.99} \]
\[ F^\mu = e F^\mu{}_\nu u^\nu . \]
Since $\rho_e$ is charge per proper volume, we conclude that (8.99) represents a force
density $\mathcal F^\mu$,
\[ \mathcal F^\mu = F^\mu{}_\nu j^\nu ; \tag{8.100} \]
it is the Lorentz force (per proper volume) acting on the charge distribution of
particles.
Now let $\{t, \vec x\}$ be inertial coordinates of an inertial observer $X$. The total
four-force acting on an infinitesimal volume $d^3x$ is
\[ F^\mu_{\text{tot}} = \mathcal F^\mu \, \gamma \, d^3x , \tag{8.101} \]
where the factor $\gamma$ enters because the proper volume of $d^3x$ is $\gamma \, d^3x$, see (8.71).
The (total) four-force is connected with the derivative (w.r.t. proper time¹⁶) of
the (total) four-momentum in $d^3x$, which is
\[ \frac{d}{ds} p^\mu_{\text{tot}} = \gamma \frac{d}{dt} p^\mu_{\text{tot}} . \tag{8.102} \]
16
Proper time is defined along the integration curves of the four-vector field $u^\mu$.
Making use of these considerations we proceed with equation (8.98). The
integral version is
\[ \int T^{\mu\nu}{}_{,\nu} \, d^3x + \int F^\mu{}_\nu j^\nu \, d^3x = 0 . \tag{8.104} \]
where dσi is the surface element of the boundary ∂V of the volume V . If the
fall-off as |~x| → ∞ of the involved fields is sufficiently fast, then the boundary
integral vanishes in the limit of infinitely large spheres. Therefore,
\[ \int_{\mathbb{R}^3} \partial_i T^{\mu i} \, d^3x = 0 . \]
\[ \frac{d}{dt} \int_{\mathbb{R}^3} \left( T^{\mu 0} + P^\mu \right) d^3x = 0 . \tag{8.108} \]
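is the four-momentum density of the electromagnetic field (as seen by the
observer $X$). Consequently, the energy density is
\[ T^{00} = \frac{1}{4\pi} \left( F^{0\sigma} F^0{}_\sigma - \frac{1}{4} F_{\sigma\lambda} F^{\sigma\lambda} \eta^{00} \right) = \frac{1}{4\pi} \left( \vec E^2 - \frac{1}{2} \left( \vec E^2 - \vec B^2 \right) \right) = \frac{1}{4\pi} \, \frac{1}{2} \left( \vec E^2 + \vec B^2 \right) , \tag{8.110} \]
where we have used (8.33). Analogously, the three-momentum density is
\[ T^{i0} = \frac{1}{4\pi} \left( F^{i\sigma} F^0{}_\sigma - \frac{1}{4} F_{\sigma\lambda} F^{\sigma\lambda} \eta^{i0} \right) = \frac{1}{4\pi} F^{ij} F^0{}_j = \frac{1}{4\pi} \epsilon^{ijk} B_k E_j = \frac{1}{4\pi} \left( \vec E \times \vec B \right)^i . \tag{8.111} \]
The vector $\vec E \times \vec B$ is the well-known Poynting vector.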
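The index gymnastics behind (8.110) and (8.111) can be checked by brute force. The sketch below builds $F_{\mu\nu}$ from the conventions $F_{i0} = E_i$ and $F_{ij} = \epsilon_{ijk} B_k$, raises indices with the metric of signature (−+++), and evaluates the energy-momentum tensor in the form used in (8.110). The field values are arbitrary illustrative choices and the function names are introduced here for illustration.

```python
import math

ETA = [-1.0, 1.0, 1.0, 1.0]   # diagonal of the Minkowski metric, signature (-+++)

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def field_tensor_lower(E, B):
    """F_{mu nu} with F_{i0} = E_i and F_{ij} = eps_{ijk} B_k."""
    return [[0.0,  -E[0], -E[1], -E[2]],
            [E[0],  0.0,   B[2], -B[1]],
            [E[1], -B[2],  0.0,   B[0]],
            [E[2],  B[1], -B[0],  0.0]]

def energy_momentum(E, B):
    """T^{mu nu} = (1/4pi) (F^{mu s} F^{nu}_s - (1/4) eta^{mu nu} F_{sl} F^{sl})."""
    low = field_tensor_lower(E, B)
    # Raise both indices; the metric is diagonal, so this is a sign per component.
    up = [[ETA[m] * ETA[n] * low[m][n] for n in range(4)] for m in range(4)]
    scalar = sum(low[s][l] * up[s][l] for s in range(4) for l in range(4))
    T = [[0.0] * 4 for _ in range(4)]
    for m in range(4):
        for n in range(4):
            quad = sum(up[m][s] * up[n][s] * ETA[s] for s in range(4))
            eta_mn = ETA[m] if m == n else 0.0
            T[m][n] = (quad - 0.25 * eta_mn * scalar) / (4.0 * math.pi)
    return T

E, B = [0.3, -1.2, 0.5], [0.7, 0.1, -0.4]
T = energy_momentum(E, B)
E2 = sum(comp * comp for comp in E)
B2 = sum(comp * comp for comp in B)
print(T[0][0], (E2 + B2) / (8.0 * math.pi))            # energy density (8.110)
print([T[i + 1][0] for i in range(3)])                 # momentum density (8.111)
print([comp / (4.0 * math.pi) for comp in cross(E, B)])
```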
The second conclusion concerns the balance equation (8.108). The sum of the
total momentum (in R3 ) of the electromagnetic field and the total momentum
of the charged matter is constant, i.e., we have conservation of energy and
momentum of the system (matter + fields).
The energy density T 00 and the momentum density T i0 are w.r.t. the chosen
observer X. Let wµ denote the four-velocity of this observer, i.e.,
\[ w^\mu = \begin{pmatrix} 1 \\ \vec{o} \end{pmatrix} \]
\[ T^{00} = T^{\mu\nu} w_\mu w_\nu = T_{\mu\nu} w^\mu w^\nu \tag{8.113} \]
Finally, let us analyze the balance equation for general volumes in the vacuum
case (which corresponds to j µ = 0, Pµ = 0). We obtain
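\[ \frac{d}{dt} \int_V T^{00}\, d^3x + \int_{\partial V} T^{0i}\, d\sigma_i = 0 \quad \text{and} \tag{8.114a} \]
\[ \frac{d}{dt} \int_V T^{k0}\, d^3x + \int_{\partial V} T^{ki}\, d\sigma_i = 0 . \tag{8.114b} \]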
The first equation states that the energy of the electromagnetic field in a given
volume is transported through the boundary via the Poynting vector $T^{i0}$. The
second equation formulates the loss of momentum in a volume through its
boundary in terms of the ‘stress-tensor’ $T^{ij}$.
BEYOND MINKOWSKI
Newtonian gravity
In this context, $m_{ag}$ is a property of the point particle, which we call its active
gravitational mass; a priori it might be different from its (inertial) mass. More
generally, the gravitational potential is determined by the Poisson equation
\[ \Delta\Phi = 4\pi G \rho_{ag} . \]
In this equation, $\rho_{ag}$ is the density of the active gravitational mass of the
configuration that generates the gravitational field, and G is the gravitational
constant (which takes the value $G = (6.6743 \pm 0.0007) \times 10^{-11}\,\mathrm{m^3\,kg^{-1}\,s^{-2}}$).
The gravitational field is the vector field
\[ \vec{\phi} = -\vec{\nabla}\Phi . \]
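The relation between potential and field can be checked numerically. The following is a minimal sketch, assuming the point-mass potential Φ = −GM/r (which solves the Poisson equation away from the source); the Earth mass and radius are illustrative values, and the function names are hypothetical.

```python
import math

G = 6.6743e-11  # gravitational constant in m^3 kg^-1 s^-2, as quoted in the text

def potential(x, y, z, mass):
    """Newtonian potential Phi = -G*mass/r of a point mass at the origin."""
    return -G * mass / math.sqrt(x * x + y * y + z * z)

def field(x, y, z, mass, eps=1.0):
    """Gravitational field phi = -grad(Phi), via central differences (step eps in m)."""
    gx = -(potential(x + eps, y, z, mass) - potential(x - eps, y, z, mass)) / (2 * eps)
    gy = -(potential(x, y + eps, z, mass) - potential(x, y - eps, z, mass)) / (2 * eps)
    gz = -(potential(x, y, z + eps, mass) - potential(x, y, z - eps, mass)) / (2 * eps)
    return gx, gy, gz

M_EARTH = 5.972e24  # kg, assumed illustrative value
R_EARTH = 6.371e6   # m, assumed illustrative value

gx, gy, gz = field(R_EARTH, 0.0, 0.0, M_EARTH)
print(gx, gy, gz)                   # x-component close to -G*M/R^2
print(-G * M_EARTH / R_EARTH**2)    # analytic value for comparison
```

The field points toward the mass and its magnitude comes out close to the familiar surface acceleration of roughly 9.8 m/s².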
When we invoke Newton’s third law of motion we see that the concepts of
active and passive gravitational mass coincide. Consider two point particles
with gravitational masses $m_{ag}$, $m_{pg}$ and $M_{ag}$, $M_{pg}$, respectively. Then the
force exerted by the first particle on the second must equal the force exerted
by the second particle on the first, i.e.,
\[ -G\, \frac{m_{ag} M_{pg}}{r^2} = -G\, \frac{M_{ag} m_{pg}}{r^2} , \]
where r is the distance between the particles. We conclude that
\[ \frac{m_{ag}}{m_{pg}} = \frac{M_{ag}}{M_{pg}} , \]
which implies that we can set
\[ m_{ag} = m_{pg} \]
by adjusting units. In other words, there is but one gravitational mass $m_g$ that
enters the equations, i.e.,
\[ \Delta\Phi = 4\pi G \rho_g , \qquad \vec{F} = m_g \vec{\phi} , \]
\[ m_i = m_g . \tag{9.2} \]
There is just one concept of mass m that enters the equations, i.e.,
\[ \Delta\Phi = 4\pi G \rho , \qquad \vec{F} = m \vec{\phi} , \]
Consider
\[ \ddot{\vec{x}}(t) = \vec{\phi}\big(t, \vec{x}(t)\big) + \vec{K} , \tag{9.6} \]
which models the motion of a particle in a gravitational field $\vec{\phi}(t, \vec{x})$, where
we include the possibility of an additional force.3 For the majority of thought
experiments, the gravitational field is assumed to be time-independent, i.e.,
$\vec{\phi} = \vec{\phi}(\not{t}, \vec{x})$; this is convenient but not necessary. Let $\vec{x}_0(t)$ be a particular
solution (obtained, e.g., by prescribing initial conditions) and define, in slight abuse
of notation, $\vec{\phi}(t) = \vec{\phi}\big(t, \vec{x}_0(t)\big)$. When we take a trajectory $\vec{x}(t)$ of (9.6) that is
sufficiently close to $\vec{x}_0(t)$ at $t = t_0$, then
$\vec{\phi}\big(t, \vec{x}(t)\big) \approx \vec{\phi}\big(t, \vec{x}_0(t)\big) = \vec{\phi}(t)$ in the time
interval containing $t_0$ where $\vec{x}(t)$ is sufficiently close to $\vec{x}_0(t)$. Accordingly, (9.6)
reads
\[ \ddot{\vec{x}}(t) \approx \vec{\phi}(t) + \vec{K} \tag{9.7} \]
in this time interval. Considering particles with initial data increasingly close
to the initial data of $\vec{x}_0(t)$ at $t_0$, which is $\vec{x}_0(t_0)$ and $\dot{\vec{x}}_0(t_0)$, equation (9.7) holds for
increasingly long times.
We then obtain
\[ \ddot{\vec{x}}'(t) = \ddot{\vec{x}}(t) + \vec{\phi}(t) . \]
Now consider Newton’s second law $\ddot{\vec{x}}(t) = \vec{K}$ in the absence of gravitational
fields. When expressed in the accelerated frame we get
\[ \ddot{\vec{x}}'(t) = \vec{\phi}(t) + \vec{K} . \tag{9.7'} \]
3 For instance, imagine a (small) ball sitting on a table; the table provides the force $\vec{K}$ so
that the ball doesn’t move. In the general case, $\vec{K}$ depends on time and on the spatial
position; we choose to suppress this dependence. Free fall corresponds to $\vec{K} \equiv \vec{o}$.
\[ \vec{x}' = \vec{x} - \vec{x}_0(t) , \]
which implies that the freely falling frame in a gravitational field is, in a
neighborhood of the trajectory under consideration, equivalent, to zeroth order, to
an inertial frame (in the absence of gravitational fields). More specifically we
obtain
\[ \ddot{x}'^i(t) = \phi^i{}_{,j}\big(t, \vec{x}_0(t)\big)\, x'^j(t) + O(\vec{x}'^2) = -\Phi_{,ij}\big(t, \vec{x}_0(t)\big)\, x'^j(t) + O(\vec{x}'^2) . \]
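The linearization in the separation can be illustrated numerically. The sketch below is restricted, as an assumption, to purely radial motion in the field of a point mass with GM = 1; it compares the exact difference of the field at two neighboring radii with the linear term and shows that the remainder scales with the square of the separation. The function names are hypothetical.

```python
def phi(r, gm=1.0):
    """Radial Newtonian field of a point mass, phi = -GM/r^2 (units with GM = 1 assumed)."""
    return -gm / r**2

def tidal_coefficient(r, gm=1.0):
    """d(phi)/dr = -Phi_{,rr} = 2*GM/r^3, the radial component of phi^i_{,j}."""
    return 2.0 * gm / r**3

def second_order_ratio(r, sep):
    """(exact relative acceleration - linear term) / sep^2; tends to phi''(r)/2 = -3*GM/r^4."""
    exact = phi(r + sep) - phi(r)
    linear = tidal_coefficient(r) * sep
    return (exact - linear) / sep**2

for sep in (1e-1, 1e-2, 1e-3):
    print(sep, second_order_ratio(1.0, sep))
```

The printed ratio settles near a constant as the separation shrinks, which is the numerical signature of an O(x′²) remainder.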
i.e.,
\[ \ddot{\vec{x}}' = \vec{\phi}(\vec{x}') . \tag{9.9b} \]
This equation is identical to (9.8b).
Let us summarize. The gravitational field (9.8a) yields the equation of mo-
tion (9.8b); the gravitational field (9.9a), which corresponds to
\[ \vec{\psi}'(\vec{x}') = \vec{\psi}(t, \vec{x}) = \vec{\phi}(\vec{x}') - \vec{g} \tag{9.9a'} \]
in accelerated coordinates, leads to the same equation of motion, i.e., equa-
tion (9.9b), w.r.t. an accelerated frame.
Let us postpone this question. Instead, let us simply assume that the relativistic
theory of gravity is a theory in which the equivalence principle is implemented
in a well-defined manner. Let us further assume that the equivalence principle
amounts to ‘local equivalence of gravitation and acceleration’, which implies
that results on accelerated frames of reference will shed light on effects in a
relativistic theory of gravity.
Consider world lines of the form (9.10) that are infinitesimally separated, i.e.,
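\[ \frac{1}{a} \begin{pmatrix} \sinh as \\ \cosh as \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 0 \\ dx \end{pmatrix} + \frac{1}{a} \begin{pmatrix} \sinh as' \\ \cosh as' \end{pmatrix} ; \tag{9.11} \]
because $(\sinh as, \cosh as)^T$ is the spatial frame vector (i.e., the normalized
spacelike vector orthogonal to the four-velocity). The result is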
We infer that the (proper) distance between neighboring world lines is growing
with time. This is despite the fact that the acceleration along the two world
lines is identical.
Remark. Imagine two observers on two world lines (9.11) that are separated
by some (finite) initial distance. To get a measure of their distance, observer
one sends light signals to observer two, who signals back instantaneously upon
receipt. Observer one measures the times that pass between emission and
reception. Interestingly enough, these times grow beyond all bounds; in fact,
after some time, the signals sent by observer two cannot even reach observer
one any longer (which is a simple consequence of the fact that the asymptotes
of the two world lines are parallel null lines). Note that this is despite the fact
that the two observers experience the same acceleration.
For our present purposes this does not matter. The core of the equivalence
principle, which is the local equivalence of gravitation and acceleration, is un-
touched by these considerations (which are of course of a global nature). Let us
be persistent nonetheless; let us find a different family of accelerated observers.
Then
\[ \bar{x}_\mu \bar{x}^\mu = (1 + a d)^2\, x_\mu x^\mu = \left( \frac{1 + a d}{a} \right)^2 , \]
hence (9.14) is again a hyperbola and thus represents the motion of an observer
with constant acceleration.7 However, the acceleration differs from the original
acceleration a; it is
\[ \bar{a} = \frac{a}{1 + a d} . \tag{9.15} \]
Conversely, the pair of world lines
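\[ x_\mu x^\mu = \frac{1}{a^2} , \qquad \bar{x}_\mu \bar{x}^\mu = \frac{1}{\bar{a}^2} \tag{9.16} \]
is equidistant (with $d = 1/\bar{a} - 1/a$, which follows from solving (9.15) for d). For
two uniformly accelerated observers to retain a constant distance, the
accelerations must be different.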
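The constancy of the distance can be verified with a few lines of arithmetic. The sketch below works in 1+1 dimensions with c = 1 and pairs events on the two hyperbolas (9.16) at equal rapidity, i.e., along the common lines of simultaneity through the origin; this pairing and the function names are assumptions made for illustration.

```python
import math

def worldline_point(acc, rapidity):
    """Event (t, x) on the hyperbola x_mu x^mu = 1/acc^2 at the given rapidity (c = 1)."""
    return math.sinh(rapidity) / acc, math.cosh(rapidity) / acc

def proper_distance(a, d, rapidity):
    """Minkowski distance between the two observers, paired at equal rapidity."""
    a_bar = a / (1.0 + a * d)                      # equation (9.15)
    t1, x1 = worldline_point(a, rapidity)
    t2, x2 = worldline_point(a_bar, rapidity)
    return math.sqrt((x2 - x1) ** 2 - (t2 - t1) ** 2)

for chi in (0.0, 1.0, 3.0):
    print(chi, proper_distance(2.0, 0.5, chi))     # stays at d = 0.5
```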
in some timelike 2-plane (e.g., the plane spanned by {t, x}) represents a family
of uniformly accelerated observers (where, however, each observer experiences
a different acceleration) such that the pairwise distances between the observers
remain constant.
Take a particular world line (9.13). Consider, for $|\sigma| < a^{-1}$, the set
\[ \left\{ \bar{x}^\mu(s) = x^\mu(s) + \sigma\, \dot{x}^\mu(s) , \; s \in \mathbb{R} \right\} . \tag{9.18} \]
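Each event $\bar{x}^\mu(s)$ is separated from $x^\mu(s)$ by a constant amount of proper time
σ along $\dot{x}^\mu(s)$. We find
\[ \bar{x}_\mu \bar{x}^\mu = x_\mu x^\mu + \sigma^2\, \dot{x}_\mu \dot{x}^\mu = \frac{1}{a^2} - \sigma^2 = \mathrm{const} , \]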
i.e., the world line x̄µ (s) is again a hyperbola. Therefore, two world lines
of (9.17) are spatially and temporally equidistant.
However, we may repeat that, for our present purposes, this does not matter.
The core of the equivalence principle, which is the local equivalence of grav-
itation and acceleration, is untouched by these considerations (which are of
course of a global nature). We have good reasons to believe that a single ac-
celerated observer, represented by a particular world line in Minkowski space,
is completely equivalent to an observer experiencing accelerations caused by a
gravitational field. Furthermore, locally, in a neighborhood of that observer,
and for some finite time, the effects stemming from the acceleration of the
frame of reference are approximately equivalent to the effects stemming from
gravity.
We do not yet have a relativistic theory of gravity at hand. There is only one
property of that theory we expect to hold (and which we thus assume): The
equivalence principle, i.e., the local (approximate) equivalence between gravi-
tation and acceleration discussed in section 9.1. Despite this severe restriction
we are able to derive a number of results.
Imagine the Piazza del Duomo (Piazza dei Miracoli) in Pisa and Galileo Galilei
standing on top of the Leaning Tower and performing free fall experiments. But
let us twist history and imagine that Galileo intends to drop . . . clocks. On the
lawn at the base of the Leaning Tower, Galileo’s assistant, let’s call him Albert,
who is like Galileo equipped with precise chronometers, is eager and ready to
begin with the measurements.
Suppose that the clocks Galileo drops are perfectly synchronized with his
own chronometer—a millisecond on the clocks is a millisecond on Galileo’s
chronometer. The gravitational field of the Earth accelerates a falling clock
until it reaches the velocity v at the base of the tower. Shortly before Albert
breaks the fall of the clock, he makes time measurements by comparing the
clock’s time with his chronometer’s time. The result is fascinating: The clock
and Albert’s chronometer are asynchronous and, really and truly, the clock
runs faster than the chronometer by a factor of
\[ 1 + \frac{1}{2} \frac{v^2}{c^2} . \]
Galileo is sceptical of the findings his assistant relates to him upon his return
at the base of the tower; he does not comprehend the results. But surprisingly,
his assistant understands. “This is how it works,” says Albert, “we use the
equivalence principle to explain the results.”
g, see section 9.1. Albert and the falling clock are spatially close to Galileo
and the experiment is rather short; the condition of locality in the equivalence
principle is thus satisfied and we may regard Albert as another uniformly accel-
erating observer, represented by another hyperbola in Minkowski space. The
freely falling clock, on the other hand, is to be identified with an inertial clock
by the equivalence principle; free fall in a gravitational field is equivalent to
inertial motion, which corresponds to a straight line in Minkowski space. Since
the clock is inertial we are permitted to apply our collective special relativistic
reasoning to measurements made w.r.t. this clock. At the moment Galileo drops
the clock, the relative velocity between Galileo’s chronometer and the clock is
zero. Hence, obviously, there is no time dilation between the two. However,
shortly before Albert catches the clock, the relative velocity between Albert’s
chronometer and the clock is v. Hence, as seen from the clock’s perspective,
the chronometer undergoes a time dilation by the factor of
\[ \sqrt{1 - \frac{v^2}{c^2}} \approx 1 - \frac{1}{2} \frac{v^2}{c^2} , \]
see the considerations of section 4.3. Note that the clock is inertial, while the
chronometer is not (cf. the twin paradox in section 4.4).
We conclude that gravity influences the course of (proper) time; ‘deeper down’
in the gravitational field, time runs slower than ‘higher up’.
where $t_i$ and $t_f$ are the initial and final time, i.e., throw and catch, respectively;
v(t) is the relative velocity; $|v(t_i)| = |v(t_f)|$ and v = 0 at the turning point; see
section 4.4.
Still hesitant to accept Albert’s explanations, Galileo takes matters into his
own hands. While he mounts the tower again, he orders Albert to stay at
the foot of the tower until he returns. After about an hour, Galileo descends
again; with some despair he compares his chronometer with Albert’s, which
had remained at the ground for the entire time. And there it is again, the time
difference. While on Galileo’s chronometer the time ∆t has passed, Albert’s
chronometer shows that only
\[ \left( 1 - \frac{gh}{c^2} \right) \Delta t \]
has passed for Albert; h is of course the height of the Leaning Tower. “Did
you tamper with your chronometer?” Galileo cries accusingly. But Albert is a
physicist of impeccable character. “Let me explain,” he says.
Galileo and Albert are observers who take fixed positions in a stationary gravi-
tational field. The equivalence principle tells us that, equivalently, Galileo and
Albert are represented by uniformly accelerated observers in Minkowski space,
i.e., by two hyperbolic world lines. The condition of locality in the equivalence
principle is satisfied, since the spatial separation, i.e., h, is small. We take
the pair of hyperbolas (9.16) to be Galileo’s and Albert’s world lines, respec-
tively, where a is replaced by g and the distance d corresponds to the height h.
Therefore, we have
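\[ \text{Albert:} \quad \frac{1}{g} \begin{pmatrix} \sinh g s_A \\ \cosh g s_A \end{pmatrix} \]
\[ \text{Galileo:} \quad \frac{1}{\bar{g}} \begin{pmatrix} \sinh \bar{g} s_G \\ \cosh \bar{g} s_G \end{pmatrix} , \]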
where $s_A$ and $s_G$ are Albert’s and Galileo’s proper time, respectively;
furthermore, from (9.15), where we use SI units, we find
\[ \bar{g} = \frac{g}{1 + gh/c^2} \approx g \left( 1 - \frac{gh}{c^2} \right) . \]
Suppose that Galileo leaves Albert when $s_A = 0$ and $s_G = 0$. To compute the
proper times $s_A$ and $s_G$ that have passed until Galileo’s return, we equate the
two time components, i.e.,
\[ \frac{1}{g} \sinh(g s_A) = \frac{1}{\bar{g}} \sinh(\bar{g} s_G) ; \]
in this context we have neglected the time Galileo needs to ascend the tower
and related subtleties. We thus need to solve
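\[ \sinh(\bar{g} s_G) \approx \left( 1 - \frac{gh}{c^2} \right) \sinh(g s_A) , \]
which yields
\[ \bar{g} s_G \approx g s_A - \tanh(g s_A)\, \frac{gh}{c^2} \approx g s_A \]
and thus
\[ s_G \approx s_A \left( 1 + \frac{gh}{c^2} \right) , \]
\[ s_A \approx s_G \left( 1 - \frac{gh}{c^2} \right) . \]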
This is in perfect accordance with the result of the experiment.
“You seem sure of yourself, Albert,” Galileo observes, “but, according to the
first experiment, should not the relation
\[ s_A \approx s_G \sqrt{1 - \frac{v^2}{c^2}} \]
hold? Don’t you remember the time dilation factor you computed with the
clock’s terminal velocity v?” “Sure, I do. Obviously, we have
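\[ v = gt \quad \text{and} \quad h = \tfrac{1}{2} g t^2 . \]
It follows that
\[ v = \sqrt{2gh} \]
and we obtain
\[ \sqrt{1 - \frac{v^2}{c^2}} \approx 1 - \frac{1}{2} \frac{v^2}{c^2} = 1 - \frac{1}{2} \frac{2gh}{c^2} = 1 - \frac{gh}{c^2} . \]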
Quod erat demonstrandum.”
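To get a feeling for the size of the effect, the two expressions for the first-order deficit can be evaluated numerically. The sketch below assumes g = 9.81 m/s² and a tower height of 56 m; both numbers and all function names are illustrative assumptions, not values taken from the text.

```python
import math

C = 299_792_458.0   # speed of light in m/s (exact SI value)
G_ACC = 9.81        # m/s^2, assumed surface gravity
HEIGHT = 56.0       # m, assumed approximate height of the Leaning Tower

def terminal_velocity(g, h):
    """Velocity after free fall through height h: v = sqrt(2 g h)."""
    return math.sqrt(2.0 * g * h)

def deficit_from_velocity(v):
    """First-order deficit (1/2) v^2/c^2 of the factor sqrt(1 - v^2/c^2)."""
    return 0.5 * v * v / (C * C)

def deficit_from_height(g, h):
    """First-order deficit g h / c^2 of the factor 1 - g h / c^2."""
    return g * h / (C * C)

v = terminal_velocity(G_ACC, HEIGHT)
print(v)                                    # a few tens of m/s
print(deficit_from_velocity(v))             # of order 1e-15
print(deficit_from_height(G_ACC, HEIGHT))   # same value up to rounding
```

Both routes give the same deficit, of the order of a few parts in 10^15, which illustrates how demanding an actual measurement of this kind is.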
Inserting v we obtain
\[ \nu_A \approx \left( 1 + \frac{gh}{c^2} \right) \nu_G , \]
which reproduces the measurement perfectly. Furthermore, this result is in
perfect accord with the previous considerations on time. Since
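\[ s_A \approx \left( 1 - \frac{gh}{c^2} \right) s_G \]
we expect that frequencies behave like the reciprocals of $s_A$ and $s_G$, i.e.,
\[ \nu_A \approx \left( 1 - \frac{gh}{c^2} \right)^{-1} \nu_G \approx \left( 1 + \frac{gh}{c^2} \right) \nu_G . \]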
A consistent picture of the gravitational redshift effect emerges. Light traveling
from ‘deeper down’ to ‘higher up’ in a gravitational field is redshifted; on the
opposite path there is a blueshift.
Galileo and Albert decide to pack their gear and return to their humble abode
in the poorer quarters of Pisa. On their way home, Albert is deeply immersed in
thought. Finally, he asks: “Master Galilei, may I confront you with a Gedanken-
experiment of mine? It concerns another property of light, its bending in the
gravitational field of the Earth.”
Suppose that the Earth is flat, which implies that the gravitational field is
exactly the same (in magnitude and direction) along the surface. On this flat
Earth Galileo and Albert stand some hundred meters apart from each other.
Galileo sends a beam of light in the direction toward Albert; the initial height
of the beam is precisely specified, and, at the point of emission, the beam is
exactly parallel to the Earth’s surface. Interestingly enough, the height of the
beam, when Albert receives it, is less than the original height by
\[ \frac{1}{2} \frac{g d^2}{c^2} , \]
where d is the distance between Galileo and Albert.
\[ \Delta h = \frac{1}{2} g t^2 = \frac{1}{2} \frac{g d^2}{c^2} , \]
as claimed.
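For orientation, the drop of the beam can be put into numbers. This is a minimal sketch, assuming g = 9.81 m/s² and a separation of 100 m (the text only says "some hundred meters"); the function name is hypothetical.

```python
C = 299_792_458.0  # speed of light in m/s (exact SI value)

def beam_drop(g, distance):
    """Height loss (1/2) g t^2 of a light beam over the travel time t = distance / c."""
    t = distance / C
    return 0.5 * g * t * t

print(beam_drop(9.81, 100.0))   # of order 1e-13 m, far below the size of an atom
```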
\[ t = 0 , \qquad x = -\frac{g d^2}{2 c^2} , \qquad y = d . \]
Since, at t = 0, the Minkowski coordinate x measures the height in the grav-
itational field, the height of the spot that the beam of light produces on the
screen is less, by
\[ \frac{g d^2}{2 c^2} , \]
than the original height (since emission was at t = 0, x = 0). The Minkowski
picture thus reproduces, in a more formal way, the previous result.
Galileo approves of Albert’s ideas and entrusts him with an exercise. “Turn
the Gedankenexperiment into an actual experiment; allow for the fact that the
surface of the Earth is curved.”
\[ 1 \pm \frac{gh}{c^2} \]
is particularly insistent and irritating; it dances in his mind and cries: “I’m the
gravitational potential, haven’t you noticed?” And the nightmare continues.
Galileo rides on a beam of light; he leaves the Earth’s surface and travels higher
and higher until he sees the Earth as a blue ball floating in space. “Compute
my redshift”, the light ray commands. Galileo is willing to obey but he fails.
And Albert steps forward from a dark corner of space and says: “Never, never
ever, try to push the boundaries of the equivalence principle. Be aware of its
limitations. Equivalence of gravitation and acceleration is merely local. Global
considerations of this kind are bound to fail.” And out of nothing a signpost
appears that says: “Realm of General Relativity. Border crossing.”
9.3 Metrics
In the following we discuss the concepts this definition is based on without going
into too much detail.8 The notion of a manifold is a fundamental concept
in differential geometry. We avoid a formal definition and simply say that
a manifold of dimension n is a ‘space’ that locally looks like (an open set
of) $\mathbb{R}^n$, i.e., the defining property of a manifold of dimension n is to admit
8 These lecture notes are not the right place to discuss the mathematics underlying
general relativity in detail. The reader is referred to the lecture course “Einführung in die
Relativitätstheorie und Kosmologie II” instead.
When x is not fixed but regarded as varying on the manifold we obtain a vector
field. Hence a vector field on a manifold M corresponds to a vector at each
point, i.e.,
\[ v = v^i \frac{\partial}{\partial x^i} , \tag{9.21} \]
where $v^i$ depends on the position on M.
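\[ v^i(x^k)\, \frac{\partial}{\partial x^i} \quad \text{and} \quad \bar{v}^i(\bar{x}^k)\, \frac{\partial}{\partial \bar{x}^i} \]
w.r.t. the first and the second coordinate system, respectively. It is
straightforward to see that
\[ \bar{v}^i(\bar{x}^k) = \frac{\partial \bar{x}^i}{\partial x^j}\, v^j(x^k) . \tag{9.24} \]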
Compare (9.23) and (9.24) with (A.18) and (A.19) by setting $A^i{}_j = \partial \bar{x}^i / \partial x^j$.
The transformation behavior (9.24) of vector fields under a change of coordi-
nates generalizes the transformation behavior of scalar field (A.61).
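The transformation law (9.24) can be tested on the tangent vector of a curve. The sketch below takes polar coordinates (ρ, ϕ) as the first system and Cartesian coordinates as the second; the test curve and all function names are hypothetical choices. The Jacobian-transformed tangent is compared with a finite-difference tangent computed directly in Cartesian coordinates.

```python
import math

def to_cartesian(rho, phi):
    """Coordinate change xbar = (x, y) as a function of x = (rho, phi)."""
    return rho * math.cos(phi), rho * math.sin(phi)

def jacobian(rho, phi):
    """Matrix d(xbar^i)/d(x^j)."""
    return [[math.cos(phi), -rho * math.sin(phi)],
            [math.sin(phi),  rho * math.cos(phi)]]

def transform_vector(v, rho, phi):
    """Apply (9.24): vbar^i = d(xbar^i)/d(x^j) v^j."""
    jac = jacobian(rho, phi)
    return [sum(jac[i][j] * v[j] for j in range(2)) for i in range(2)]

def curve(lam):
    """An arbitrary test curve in polar coordinates (hypothetical choice)."""
    return 1.0 + lam, lam * lam

def cartesian_tangent(lam, eps=1e-6):
    """Tangent of the same curve, differentiated numerically in Cartesian coordinates."""
    plus = to_cartesian(*curve(lam + eps))
    minus = to_cartesian(*curve(lam - eps))
    return [(p - m) / (2 * eps) for p, m in zip(plus, minus)]

lam = 0.7
polar_tangent = [1.0, 2.0 * lam]          # d(rho)/d(lam), d(phi)/d(lam)
print(transform_vector(polar_tangent, *curve(lam)))
print(cartesian_tangent(lam))             # agrees up to finite-difference error
```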
\[ dx^i \left( \frac{\partial}{\partial x^j} \right) = \delta^i{}_j . \]
Hence, at each point, a covector maps vectors to the real numbers. A covector
field corresponds to a covector at each point, i.e.,
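\[ a = a_i\, dx^i , \tag{9.26} \]
\[ a(v) = a_i\, dx^i \left( v^j \frac{\partial}{\partial x^j} \right) = a_i v^i . \tag{9.27} \]
\[ dx^i = \frac{\partial x^i}{\partial \bar{x}^j}\, d\bar{x}^j \tag{9.28} \]
\[ \bar{a}_i(\bar{x}^k) = \frac{\partial x^j}{\partial \bar{x}^i}\, a_j(x^k) . \tag{9.29} \]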
This is in complete analogy with (A.20) and (A.22), when we note that
\[ \frac{\partial x^i}{\partial \bar{x}^j} = \left( \frac{\partial \bar{x}^i}{\partial x^j} \right)^{-1} . \]
Tensor fields are obtained via the tensor product of vector and covector fields.
For example,
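\[ T = T^{i}{}_{j}{}^{kl}{}_{m}\; \frac{\partial}{\partial x^i} \otimes dx^j \otimes \frac{\partial}{\partial x^k} \otimes \frac{\partial}{\partial x^l} \otimes dx^m \tag{9.30} \]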
is a tensor of rank (3, 2). The transformation of tensor fields under a change
of coordinates generalizes the transformation of vector and covector fields in
the obvious way: Each contravariant (i.e., upper) index follows (9.24), each
covariant (i.e., lower) index behaves like (9.29), i.e.,
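\[ \bar{T}^{i}{}_{j}{}^{kl}{}_{m}(\bar{x}^n) = \frac{\partial \bar{x}^i}{\partial x^{i'}}\, \frac{\partial x^{j'}}{\partial \bar{x}^j}\, \frac{\partial \bar{x}^k}{\partial x^{k'}}\, \frac{\partial \bar{x}^l}{\partial x^{l'}}\, \frac{\partial x^{m'}}{\partial \bar{x}^m}\; T^{i'}{}_{j'}{}^{k'l'}{}_{m'}(x^n) . \tag{9.31} \]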
If $g(v, v)|_x$ is positive for all $v(x) \neq 0$ and at each point x, which corresponds to
positive definiteness at each point, then the metric is called Riemannian; below
we discuss a prominent example. Likewise, the angle between the two vectors
v(x), w(x) at the point x ∈ M is defined by using g(v, w)|x in the obvious way.
Metrics are not necessarily Riemannian. If the signature of g|x , at each point
x, is (− + ++), then the metric is called Lorentzian. This means that it is
possible to choose, at each point x ∈ M , a basis such that the components gij
(or rather gµν ) of the metric form the diagonal matrix diag(−1, 1, 1, 1). Let us
reiterate: It is possible to bring the metric to the standard Minkowski form,
but merely at each point x ∈ M separately. There does not exist a coordinate
system and an associated coordinate frame such that g = ηij dxi dxj globally
unless the metric is the Minkowski metric itself.13
Exercise. Show that the non-degeneracy of a metric, see (9.34), can be charac-
terized alternatively as in (A.25c).
Metrics raise and lower indices; e.g., a vector field $v = v^i \partial_i$ becomes a covector
field with components
\[ v_i = g_{ij} v^j . \]
13 If we achieve $g = g_{ij}\, dx^i dx^j$ to fulfill $g_{ij}|_x = \eta_{ij}$ at a point x, then $g_{ij}|_y \neq \eta_{ij}$ for almost
all points in a neighborhood of x. The diagonal form cannot be achieved on open sets.
Exercise. Show that the inverse metric $g^{ij}$ is obtained by raising the two indices
of the metric $g_{ij}$.
Let us discuss a prominent Riemannian metric, the standard metric on the unit
sphere S 2 . In a first step we take the standard metric14
dx2 + dy 2
we obtain
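\[ dx^2 + dy^2 = \big( (\cos\varphi)\, d\rho - \rho\, (\sin\varphi)\, d\varphi \big)^2 + \big( (\sin\varphi)\, d\rho + \rho\, (\cos\varphi)\, d\varphi \big)^2 \]
\[ dx^2 + dy^2 + dz^2 \]
and thus
\[ dx^2 + dy^2 + dz^2 = \big( \sin\vartheta\, dr + r \cos\vartheta\, d\vartheta \big)^2 + r^2 \sin^2\vartheta\, d\varphi^2 + \big( \cos\vartheta\, dr - r \sin\vartheta\, d\vartheta \big)^2 \]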
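Expanding the squares, the cross terms cancel and the expression collapses to dr² + r² dϑ² + r² sin²ϑ dϕ². The following sketch checks this numerically by pulling back the Euclidean metric with a finite-difference Jacobian of the standard spherical coordinate map; the function names are hypothetical.

```python
import math

def embed(r, theta, phi):
    """Standard spherical coordinates: (r, theta, phi) -> (x, y, z)."""
    return (r * math.sin(theta) * math.cos(phi),
            r * math.sin(theta) * math.sin(phi),
            r * math.cos(theta))

def induced_metric(r, theta, phi, eps=1e-6):
    """Components g_ab = sum_n (dx^n/dq^a)(dx^n/dq^b) with q = (r, theta, phi)."""
    point = [r, theta, phi]
    columns = []
    for k in range(3):
        plus, minus = list(point), list(point)
        plus[k] += eps
        minus[k] -= eps
        columns.append([(p - m) / (2 * eps) for p, m in zip(embed(*plus), embed(*minus))])
    return [[sum(columns[a][n] * columns[b][n] for n in range(3)) for b in range(3)]
            for a in range(3)]

g = induced_metric(2.0, 0.8, 0.3)
for row in g:
    print(row)   # close to diag(1, r^2, r^2 sin^2(theta))
```

Setting r = 1 and dr = 0 leaves dϑ² + sin²ϑ dϕ², the metric of the unit sphere.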
In the next section we turn our attention to the most basic example of Lo-
rentzian metrics: The Minkowski metric itself.
Recall that $dt^2 = dt \otimes dt$, $(dx^1)^2 = dx^1 \otimes dx^1$, etc. In the inertial coordinates
$\{t, x^1, x^2, x^3\}$ the components of the Minkowski metric are constant.
in particular the components are constant and do not depend on the posi-
tion. Therefore we are able to go over to orthonormal coordinates; we refer to
section A.4 for further details. We find that
\[ u = \frac{1}{\sqrt{2}}\, (t + x) , \qquad v = \frac{1}{\sqrt{2}}\, (t - x) \tag{9.39} \]
yields the desired result. Indeed,
\[
\begin{aligned}
-2\, du\, dv + dy^2 + dz^2 &= -2\, \frac{1}{\sqrt{2}} (dt + dx)\, \frac{1}{\sqrt{2}} (dt - dx) + dy^2 + dz^2 \\
&= -dt^2 + dx^2 + dy^2 + dz^2 .
\end{aligned}
\]
The coordinates (u, v, y, z) are called ‘double null’ for the reason that the co-
ordinate lines defined by u and v are null lines. Clearly,
η(∂u , ∂u ) = 0 , η(∂v , ∂v ) = 0 ,
which is obvious from (9.38). Hence, in double null coordinates, the light cone
of a point (in 2-dimensional Minkowski space) is given by the coordinate lines
through that point.
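The null character of the coordinate vectors can also be read off from the metric components in (u, v). The sketch below restricts to the (t, x) plane, inverts (9.39) to t = (u + v)/√2, x = (u − v)/√2, and transforms the metric with the resulting constant Jacobian; variable and function names are hypothetical.

```python
import math

ETA = [[-1.0, 0.0], [0.0, 1.0]]   # Minkowski metric in (t, x), signature (-, +)

S = 1.0 / math.sqrt(2.0)
JAC = [[S,  S],    # dt/du, dt/dv
       [S, -S]]    # dx/du, dx/dv

def metric_in_uv():
    """Components eta_ab = (dx^m/dq^a) eta_mn (dx^n/dq^b) with q = (u, v)."""
    return [[sum(JAC[m][a] * ETA[m][n] * JAC[n][b] for m in range(2) for n in range(2))
             for b in range(2)] for a in range(2)]

g_uv = metric_in_uv()
print(g_uv)   # diagonal entries vanish, off-diagonal entries equal -1, i.e. -2 du dv
```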
where (t′ , x′ , y ′ , z ′ ) ∈ R4 with x′ > 0. The claim is that (9.40) is again the
Minkowski metric, represented in an accelerated frame of reference.
and therefore
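\[
\begin{aligned}
ds^2 &= -(-t^2 + x^2) \left( \frac{\partial t'}{\partial t}\, dt + \frac{\partial t'}{\partial x}\, dx \right)^2 + \left( \frac{\partial x'}{\partial t}\, dt + \frac{\partial x'}{\partial x}\, dx \right)^2 + dy^2 + dz^2 \\
&= \left[ -(-t^2 + x^2) \left( \frac{\partial t'}{\partial t} \right)^2 + \left( \frac{\partial x'}{\partial t} \right)^2 \right] dt^2 \\
&\quad + \left[ -2(-t^2 + x^2)\, \frac{\partial t'}{\partial t} \frac{\partial t'}{\partial x} + 2\, \frac{\partial x'}{\partial t} \frac{\partial x'}{\partial x} \right] dt\, dx \\
&\quad + \left[ -(-t^2 + x^2) \left( \frac{\partial t'}{\partial x} \right)^2 + \left( \frac{\partial x'}{\partial x} \right)^2 \right] dx^2 + dy^2 + dz^2 \\
&= -dt^2 + dx^2 + dy^2 + dz^2 ,
\end{aligned}
\]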
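The same statement can be checked in the opposite direction, by pulling the Minkowski metric back to the accelerated coordinates. The sketch below assumes the standard transformation t = x′ sinh t′, x = x′ cosh t′, which is consistent with the relation x′² = −t² + x² used above; the explicit transformation is an assumption made for this illustration, and the function names are hypothetical.

```python
import math

def to_inertial(t_acc, x_acc):
    """Assumed map (t', x') -> (t, x) = (x' sinh t', x' cosh t'), valid for x' > 0."""
    return x_acc * math.sinh(t_acc), x_acc * math.cosh(t_acc)

def pulled_back_metric(t_acc, x_acc, eps=1e-6):
    """Components of -dt^2 + dx^2 in the coordinates (t', x'), via central differences."""
    point = [t_acc, x_acc]
    cols = []
    for k in range(2):
        plus, minus = list(point), list(point)
        plus[k] += eps
        minus[k] -= eps
        tp, xp = to_inertial(*plus)
        tm, xm = to_inertial(*minus)
        cols.append(((tp - tm) / (2 * eps), (xp - xm) / (2 * eps)))
    return [[-cols[a][0] * cols[b][0] + cols[a][1] * cols[b][1] for b in range(2)]
            for a in range(2)]

g = pulled_back_metric(0.4, 1.5)
print(g)   # close to [[-x'^2, 0], [0, 1]] with x' = 1.5
```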
9.5 Geodesics
Well, if c is a (differentiable) curve connecting the two points, then the length
of c is
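\[ \int ds = \int_{\lambda_1}^{\lambda_2} g(w, w)^{1/2}\, d\lambda = \int_{\lambda_1}^{\lambda_2} \left( g_{ij}\big(x^k(\lambda)\big)\, \frac{dx^i}{d\lambda} \frac{dx^j}{d\lambda} \right)^{1/2} d\lambda , \tag{9.43} \]
where
\[ w^i(\lambda) = \frac{dx^i(\lambda)}{d\lambda} \]
is the field of tangent vectors along the curve c, which is parametrized17 as
$\lambda \mapsto x^i(\lambda)$.18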
Example. We use (9.37) in (9.43) to compute the circumference of a circle of
latitude. This circle is represented by the curve ϑ = const parametrized by
ϕ ∈ [0, 2π). The tangent vector field is $\partial_\varphi$; the squared length of the tangent vectors is
\[ g_{S^2}(\partial_\varphi, \partial_\varphi) = \sin^2\vartheta . \]
The length of the curve is computed through the line integral (9.43), i.e.,
\[ s = \int ds = \int_0^{2\pi} \sqrt{g_{S^2}(\partial_\varphi, \partial_\varphi)}\; d\varphi = \int_0^{2\pi} \sqrt{\sin^2\vartheta}\; d\varphi = 2\pi \sin\vartheta . \]
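The length functional (9.43) is easy to evaluate numerically for arbitrary curves on the sphere. The sketch below assumes the unit-sphere metric dϑ² + sin²ϑ dϕ², uses a midpoint rule with finite-difference tangents, and reproduces 2π sin ϑ for a circle of latitude; the function name and the step count are hypothetical choices.

```python
import math

def curve_length(curve, lam1, lam2, steps=2000):
    """Midpoint-rule evaluation of (9.43) on the unit sphere.

    `curve` maps the parameter lambda to the coordinates (theta, phi).
    """
    h = (lam2 - lam1) / steps
    total = 0.0
    for n in range(steps):
        lam = lam1 + (n + 0.5) * h
        th_a, ph_a = curve(lam - 0.5 * h)
        th_b, ph_b = curve(lam + 0.5 * h)
        theta, _ = curve(lam)
        dth, dph = (th_b - th_a) / h, (ph_b - ph_a) / h      # tangent components
        total += math.sqrt(dth ** 2 + math.sin(theta) ** 2 * dph ** 2) * h
    return total

theta0 = 1.0
print(curve_length(lambda lam: (theta0, lam), 0.0, 2.0 * math.pi))
print(2.0 * math.pi * math.sin(theta0))   # analytic circumference for comparison
```

Only the metric components and the tangent of the curve enter; the ambient space is never used.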
16 The metric is Minkowski in the former case; in the latter it is not.
17 The choice of parametrization is irrelevant. Simple exercise: Show the invariance of (9.43)
under reparametrizations.
18 For simplicity we assume that the entire curve is contained in the domain of one chart
with coordinates $(x^1, \ldots, x^n)$. If two (or more) charts are necessary to cover the curve we
divide the curve into two (or more) parts whose lengths we compute separately by (9.43);
adding up we obtain the total length.
In connection with the remarks of section 9.3 we note that our computation is
based on purely intrinsic quantities of the sphere (namely the metric and the
tangent vector field of the curve); the ambient vector space did not enter our
considerations.
Let us turn to spacetimes, i.e., manifolds with Lorentzian metrics (where the
signature is (− + ++) as usual). A curve is said to be timelike if its tangent
is everywhere timelike, i.e., g(v, v) < 0 for the tangent v; it is null if its
tangent is everywhere null; it is spacelike if its tangent is everywhere spacelike;
cf. definition 4.2.
Every spacelike curve has a length defined by its ‘arc length’ (9.43). The ‘arc
length’ of timelike curves, on the other hand, represents proper time. Consider
a timelike curve c,
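\[ \lambda \mapsto x^\mu(\lambda) , \]
which connects two events $p_1$ and $p_2$ that are represented by $x^\mu(\lambda_1)$ and $x^\mu(\lambda_2)$,
respectively. Let
\[ w^\mu(\lambda) = \frac{dx^\mu(\lambda)}{d\lambda} \]
denote the tangent vector field along c. The proper time that passes along this
curve is
\[ s = \int_{\lambda_1}^{\lambda_2} \big( -g(w, w) \big)^{1/2}\, d\lambda = \int_{\lambda_1}^{\lambda_2} \left( -g_{\mu\nu}\big(x^\sigma(\lambda)\big)\, \frac{dx^\mu}{d\lambda} \frac{dx^\nu}{d\lambda} \right)^{1/2} d\lambda . \tag{9.44} \]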
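As an illustration of (9.44), the proper time along a uniformly accelerated world line in 1+1-dimensional Minkowski space can be computed numerically. The sketch below assumes c = 1 and parametrizes the hyperbola x² − t² = 1/a² by the inertial time t, for which the closed-form result is arsinh(a t)/a; the function names are hypothetical.

```python
import math

def proper_time(worldline, lam1, lam2, steps=4000):
    """Evaluate (9.44) for the metric -dt^2 + dx^2 by summing over short chords."""
    h = (lam2 - lam1) / steps
    total = 0.0
    for n in range(steps):
        t_a, x_a = worldline(lam1 + n * h)
        t_b, x_b = worldline(lam1 + (n + 1) * h)
        wt, wx = (t_b - t_a) / h, (x_b - x_a) / h      # tangent components
        total += math.sqrt(wt * wt - wx * wx) * h      # (-g(w, w))^(1/2) d(lambda)
    return total

def hyperbola(t, a=1.0):
    """Uniformly accelerated world line, parametrized by inertial time t."""
    return t, math.sqrt(1.0 / (a * a) + t * t)

print(proper_time(hyperbola, 0.0, 2.0))   # numerical proper time
print(math.asinh(2.0))                    # closed form for a = 1
```

The proper time is noticeably shorter than the elapsed inertial time of 2, as expected for a moving clock.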
A curve that extremizes (9.44) we call a (timelike) geodesic. To derive the con-
dition on a curve of being a geodesic, i.e., the geodesic equation, we regard (9.44)
19 As in (9.43) we assume, for simplicity, that the entire curve can be represented by one set
of local coordinates.
where in this context (and in this context alone) we make use of the
abbreviation $\dot{x}^\mu = dx^\mu / d\lambda$. (In general, we reserve the dot notation for differentiation
w.r.t. proper time.20 ) Varying the action
\[ s = \int L\, d\lambda , \tag{9.45b} \]
\[ \frac{d}{ds} = \frac{1}{L} \frac{d}{d\lambda} , \]
see (9.45). Accordingly, when we multiply (9.47) with $L^{-1}$ we obtain
Let us reintroduce the dot notation, but this time according to the standard
convention that the overdot refers to differentiation w.r.t. proper time. The
equation then becomes
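\[ g_{\mu\lambda}\, \ddot{x}^\lambda + g_{\mu\lambda,\sigma}\, \dot{x}^\sigma \dot{x}^\lambda - \frac{1}{2}\, g_{\sigma\lambda,\mu}\, \dot{x}^\sigma \dot{x}^\lambda = 0 . \]
Using the fact that $\dot{x}^\lambda \dot{x}^\sigma$ is symmetric in λ and σ we write the second term as
Summarizing,
\[ g_{\mu\lambda}\, \ddot{x}^\lambda + \frac{1}{2} \left( g_{\mu\lambda,\sigma} + g_{\mu\sigma,\lambda} - g_{\lambda\sigma,\mu} \right) \dot{x}^\sigma \dot{x}^\lambda = 0 \tag{9.48} \]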
corresponds to the Euler-Lagrange equations. This equation is the condition
that the curve extremizes ‘arc length’ (proper time), i.e., the geodesic equation.
Define
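\[ \Gamma_{\mu\nu\sigma} = \frac{1}{2} \left( g_{\mu\nu,\sigma} + g_{\mu\sigma,\nu} - g_{\nu\sigma,\mu} \right) . \]
As an aide-mémoire one can use the ‘curly braces’ notation
the indices run over the cyclic permutations of (ijk), where positive and
negative signs alternate.21 We thereby obtain
\[ \Gamma_{\mu\nu\sigma} = \frac{1}{2}\, g_{\{\mu\nu,\sigma\}} = \frac{1}{2} \left( g_{\mu\nu,\sigma} - g_{\nu\sigma,\mu} + g_{\sigma\mu,\nu} \right) . \]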
Expressed in terms of Γµνσ , equation (9.48) becomes
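\[ \Gamma^\mu{}_{\nu\sigma} = g^{\mu\lambda}\, \Gamma_{\lambda\nu\sigma} = \frac{1}{2}\, g^{\mu\lambda} \left( g_{\lambda\nu,\sigma} + g_{\lambda\sigma,\nu} - g_{\nu\sigma,\lambda} \right) . \tag{9.49} \]
21 A note of caution: This notation is not particularly common in the literature.
22 The Christoffel symbols (9.49) represent the so-called Levi-Cività connection associated
with the metric g; see also footnote 26.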
On multiplying (9.48′) with the inverse of the metric, it becomes the geodesic
equation
\[ \ddot{x}^\mu + \Gamma^\mu{}_{\sigma\lambda}\, \dot{x}^\sigma \dot{x}^\lambda = 0 . \tag{9.50} \]
Finally, consider null geodesics. Since the tangent field is a null vector field,
the ‘arc length’ is zero. A straightforward analog of the extremization con-
siderations is thus not available. However, we may simply resort to (9.50).
A (parametrized) null curve is called an (affinely parametrized) null geodesic
if (9.50) is satisfied.
Remark. Let us explain the terminology ‘affine parametrization’ for timelike
and spacelike geodesics. Suppose that a timelike geodesic is parametrized
w.r.t. proper time s. Let t = λ1 + λ2 s with λ1 , λ2 ∈ R. It is common to
refer to parameters t of this kind as affine parameters, since the transformation
$s \mapsto t$ is obviously an affine transformation. The importance of affine parame-
ters lies in the fact that the geodesic equation (9.50) is invariant under affine
reparametrizations.
Example. Using an inertial frame of reference in Minkowski space, the Christof-
fel symbols vanish, i.e.,
\[ \Gamma^\mu{}_{\nu\sigma} = 0 . \]
Therefore the geodesic equation becomes ẍµ = 0, which yields the straight
lines.
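\[ \Gamma^i{}_{jk} = \frac{1}{2}\, g^{il} \left( g_{lj,k} + g_{lk,j} - g_{jk,l} \right) \]
leads to
\[
\begin{aligned}
\Gamma^1{}_{jk} &= \frac{1}{2}\, g^{1l} \left( g_{lj,k} + g_{lk,j} - g_{jk,l} \right) = \frac{1}{2}\, g^{11} \left( g_{1j,k} + g_{1k,j} - g_{jk,1} \right) , \\
\Gamma^2{}_{jk} &= \frac{1}{2}\, g^{2l} \left( g_{lj,k} + g_{lk,j} - g_{jk,l} \right) = \frac{1}{2}\, g^{22} \left( g_{2j,k} + g_{2k,j} \right) ,
\end{aligned}
\]
since the metric is diagonal and independent of the second coordinate. We find
\[
\begin{aligned}
\Gamma^1{}_{11} &= \frac{1}{2}\, g^{11} \left( g_{11,1} + g_{11,1} - g_{11,1} \right) = 0 , \\
\Gamma^1{}_{12} &= \frac{1}{2}\, g^{11} \left( g_{11,2} + g_{12,1} - g_{12,1} \right) = 0 , \\
\Gamma^1{}_{22} &= \frac{1}{2}\, g^{11} \left( g_{12,2} + g_{12,2} - g_{22,1} \right) = -\sin\vartheta \cos\vartheta , \\
\Gamma^2{}_{11} &= \frac{1}{2}\, g^{22} \left( g_{21,1} + g_{21,1} \right) = 0 , \\
\Gamma^2{}_{12} &= \frac{1}{2}\, g^{22} \left( g_{21,2} + g_{22,1} \right) = \frac{1}{\sin\vartheta} \cos\vartheta , \\
\Gamma^2{}_{22} &= \frac{1}{2}\, g^{22} \left( g_{22,2} + g_{22,2} \right) = 0 .
\end{aligned}
\]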
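These values can be cross-checked by evaluating (9.49) numerically. The sketch below assumes the unit-sphere metric diag(1, sin²ϑ) in the coordinates (ϑ, ϕ), with index 0 standing for ϑ and index 1 for ϕ, and approximates the derivatives of the metric by central differences; the function names are hypothetical.

```python
import math

def metric(theta, phi):
    """Unit-sphere metric in coordinates (theta, phi)."""
    return [[1.0, 0.0], [0.0, math.sin(theta) ** 2]]

def christoffel(theta, phi, eps=1e-5):
    """Gamma[i][j][k] = (1/2) g^{il} (g_{lj,k} + g_{lk,j} - g_{jk,l})."""
    point = (theta, phi)
    g = metric(*point)
    g_inv = [[1.0 / g[0][0], 0.0], [0.0, 1.0 / g[1][1]]]   # valid because g is diagonal
    dg = []                                                 # dg[k][i][j] = g_{ij,k}
    for k in range(2):
        plus, minus = list(point), list(point)
        plus[k] += eps
        minus[k] -= eps
        gp, gm = metric(*plus), metric(*minus)
        dg.append([[(gp[i][j] - gm[i][j]) / (2 * eps) for j in range(2)] for i in range(2)])
    return [[[0.5 * sum(g_inv[i][l] * (dg[k][l][j] + dg[j][l][k] - dg[l][j][k])
                        for l in range(2))
              for k in range(2)] for j in range(2)] for i in range(2)]

gam = christoffel(0.9, 0.2)
print(gam[0][1][1], -math.sin(0.9) * math.cos(0.9))   # Gamma^1_22 in the text's numbering
print(gam[1][0][1], math.cos(0.9) / math.sin(0.9))    # Gamma^2_12 in the text's numbering
```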
The geodesic equation (9.50) thus reads
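\[
\begin{aligned}
\ddot{\vartheta} - \sin\vartheta \cos\vartheta\; \dot{\varphi}\, \dot{\varphi} &= 0 , \\
\ddot{\varphi} + 2\, \frac{\cos\vartheta}{\sin\vartheta}\; \dot{\vartheta}\, \dot{\varphi} &= 0 .
\end{aligned}
\tag{9.51}
\]
Note the factor of 2 in the second equation which is due to the fact that
$\Gamma^2{}_{21} = \Gamma^2{}_{12}$. Simple solutions of (9.51) are the circles ϕ = const and ϑ = π/2;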
for a simplification of the ODEs (9.51) we refer to the exercise course.
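Solutions of (9.51) that are not aligned with the coordinate lines can be explored numerically. The sketch below integrates the system with a classical fourth-order Runge-Kutta scheme, starting on the equator with a tilted unit tangent; the initial data, step count, and function names are hypothetical choices. Since geodesics of the unit sphere are great circles of circumference 2π, the curve should close after arc length 2π while the squared speed ϑ̇² + sin²ϑ ϕ̇² stays equal to 1.

```python
import math

def rhs(state):
    """First-order form of (9.51); state = (theta, phi, dtheta/ds, dphi/ds)."""
    theta, phi, dtheta, dphi = state
    return (dtheta, dphi,
            math.sin(theta) * math.cos(theta) * dphi * dphi,
            -2.0 * math.cos(theta) / math.sin(theta) * dtheta * dphi)

def rk4_step(state, h):
    k1 = rhs(state)
    k2 = rhs(tuple(s + 0.5 * h * k for s, k in zip(state, k1)))
    k3 = rhs(tuple(s + 0.5 * h * k for s, k in zip(state, k2)))
    k4 = rhs(tuple(s + h * k for s, k in zip(state, k3)))
    return tuple(s + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def integrate(state, s_total, steps):
    h = s_total / steps
    for _ in range(steps):
        state = rk4_step(state, h)
    return state

def speed_squared(state):
    """g(x_dot, x_dot) for the unit-sphere metric."""
    theta, _, dtheta, dphi = state
    return dtheta ** 2 + math.sin(theta) ** 2 * dphi ** 2

start = (math.pi / 2.0, 0.0, 0.6, 0.8)            # on the equator, unit speed
final = integrate(start, 2.0 * math.pi, 4000)
print(final[0] - math.pi / 2.0, final[1] - 2.0 * math.pi)   # both close to zero
print(speed_squared(final))                                  # close to 1
```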
Remark. By equation (9.49), the Christoffel symbols can be computed directly
from the metric. However, in many cases this is rather time-consuming. An
alternative way to obtain the Christoffel symbols is to proceed in analogy
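with (9.45) et seq., i.e., to derive the Euler-Lagrange equations of the
Lagrangian
\[ L(x^\mu, \dot{x}^\mu) = \left( \pm g_{\mu\nu}(x^\sigma)\, \dot{x}^\mu \dot{x}^\nu \right)^{1/2} . \]
\[ \frac{\partial L}{\partial \vartheta} - \frac{d}{ds} \frac{\partial L}{\partial \dot{\vartheta}} = 0 , \qquad \frac{\partial L}{\partial \varphi} - \frac{d}{ds} \frac{\partial L}{\partial \dot{\varphi}} = 0 . \]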
Consider the geodesic equation (9.50). Being the tangent vector field to a curve,
ẋµ behaves like any other vector field under a change of coordinates, i.e.,
Consider a geodesic c. Expressed w.r.t. the coordinate system {xµ } this means
that c is represented by the (affinely parametrized) curve s 7→ xµ (s) that sat-
isfies the geodesic equation
cf. (9.50). Equivalently, we are able to use a different coordinate system {x̄µ }
and represent c as s 7→ x̄µ (s) with
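\[ \ddot{\bar{x}}^\mu + \bar{\Gamma}^\mu{}_{\sigma\lambda}(\bar{x}^\kappa)\, \dot{\bar{x}}^\sigma \dot{\bar{x}}^\lambda = 0 . \tag{9.53b} \]
\[ \frac{\partial \bar{x}^\mu}{\partial x^\lambda}\, \ddot{x}^\lambda + \frac{\partial^2 \bar{x}^\mu}{\partial x^\lambda \partial x^\sigma}\, \dot{x}^\sigma \dot{x}^\lambda + \bar{\Gamma}^\mu{}_{\sigma\lambda}(\bar{x}^\kappa)\, \frac{\partial \bar{x}^\sigma}{\partial x^{\sigma'}} \frac{\partial \bar{x}^\lambda}{\partial x^{\lambda'}}\, \dot{x}^{\sigma'} \dot{x}^{\lambda'} = 0 , \]
or, equivalently, on multiplication with $\partial x^{\mu'} / \partial \bar{x}^\mu$,
\[ \ddot{x}^{\mu'} + \underbrace{\left( \frac{\partial x^{\mu'}}{\partial \bar{x}^\mu} \frac{\partial^2 \bar{x}^\mu}{\partial x^{\sigma'} \partial x^{\lambda'}} + \bar{\Gamma}^\mu{}_{\sigma\lambda}(\bar{x}^\kappa)\, \frac{\partial x^{\mu'}}{\partial \bar{x}^\mu} \frac{\partial \bar{x}^\sigma}{\partial x^{\sigma'}} \frac{\partial \bar{x}^\lambda}{\partial x^{\lambda'}} \right)}_{\Gamma^{\mu'}{}_{\sigma'\lambda'}} \dot{x}^{\sigma'} \dot{x}^{\lambda'} = 0 . \tag{9.54} \]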
From (9.55) we see that the Christoffel symbols are merely an array of real
numbers and not a tensor; the first term in the transformation formula is
‘untensorial’, cf. (9.31).
Exercise. Show the equivalence of (9.55a) and (9.55b). This amounts to proving
that
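\[ -\frac{\partial^2 \bar{x}^\mu}{\partial x^\nu \partial x^\sigma}\, \frac{\partial x^\nu}{\partial \bar{x}^\lambda} \frac{\partial x^\sigma}{\partial \bar{x}^\rho} = \frac{\partial \bar{x}^\mu}{\partial x^\kappa}\, \frac{\partial^2 x^\kappa}{\partial \bar{x}^\lambda \partial \bar{x}^\rho} . \]
Use that
\[ \frac{\partial \bar{x}^\mu}{\partial x^\kappa} \frac{\partial x^\kappa}{\partial \bar{x}^\nu} = \delta^\mu{}_\nu , \]
and differentiate w.r.t., say, $x^\pi$.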
Let v be a vector field, i.e., $v = v^i(x^k)\, \partial_i$. Then the covariant derivative of v is
a tensor field whose components we write as $\nabla_i v^j$; these are defined to be
\[ \nabla_i v^j = \partial_i v^j + \Gamma^j{}_{ik}\, v^k . \tag{9.56b} \]
Let a be a covector field, i.e., $a = a_i(x^k)\, dx^i$. Then the covariant derivative is
a tensor field whose components we write as $\nabla_i a_j$; these are defined to be
\[ \nabla_i a_j = \partial_i a_j - \Gamma^k{}_{ij}\, a_k . \tag{9.56c} \]
\[ \nabla_u = u^i \nabla_i . \]
It is immediate from (9.56) that the covariant derivative $\nabla_u$ preserves the rank
of the tensor: If, e.g., v is a vector, then so is $\nabla_u v$. (The components of $\nabla_u v$
are $\nabla_u v^i$.) If, e.g., T is a tensor of rank (p, q), then so is $\nabla_u T$.
26 For the abstract (and beautiful) definition of covariant derivatives in terms of connections
we refer to the lecture course “Einführung in die Relativitätstheorie und Kosmologie II”.
27 We use Latin indices for a change to indicate the general nature of these considerations.
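\[ \nabla_i \left( a_j v^j \right) = \partial_i \left( a_j v^j \right) = a_{j,i}\, v^j + a_j\, v^j{}_{,i} . \]
On the other hand, we apply (9.56b) and (9.56c) and the Leibniz rule for
derivative operators, i.e.,
\[
\begin{aligned}
\nabla_i \left( a_j v^j \right) &= \left( \nabla_i a_j \right) v^j + a_j \nabla_i v^j = \left( a_{j,i} - \Gamma^k{}_{ij}\, a_k \right) v^j + a_j \left( v^j{}_{,i} + \Gamma^j{}_{ik}\, v^k \right) \\
&= a_{j,i}\, v^j + a_j\, v^j{}_{,i} .
\end{aligned}
\]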
Since this reproduces the original result, we find that the three definitions (9.56)
are consistent (i.e., consistent with the Leibniz rule).
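The cancellation of the Christoffel terms can be observed numerically. The sketch below works on the unit sphere with the Christoffel symbols of the diag(1, sin²ϑ) metric (index 0 for ϑ, index 1 for ϕ); the component functions of the vector and covector fields are arbitrary hypothetical choices, and partial derivatives are approximated by central differences.

```python
import math

def gamma(theta):
    """Christoffel symbols sym[k][i][j] = Gamma^k_{ij} of the unit sphere."""
    sym = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    sym[0][1][1] = -math.sin(theta) * math.cos(theta)
    sym[1][0][1] = sym[1][1][0] = math.cos(theta) / math.sin(theta)
    return sym

def v(theta, phi):
    """Hypothetical vector field components v^j, chosen only for illustration."""
    return [theta * math.sin(phi), math.cos(theta) + phi]

def a(theta, phi):
    """Hypothetical covector field components a_j, chosen only for illustration."""
    return [theta * phi, math.sin(theta + phi)]

def partial(field, i, point, eps=1e-6):
    """Central-difference partial derivative of all components of `field` w.r.t. x^i."""
    plus, minus = list(point), list(point)
    plus[i] += eps
    minus[i] -= eps
    return [(p - m) / (2 * eps) for p, m in zip(field(*plus), field(*minus))]

def contraction(theta, phi):
    """The scalar a_j v^j, wrapped in a list so that `partial` applies."""
    return [sum(aj * vj for aj, vj in zip(a(theta, phi), v(theta, phi)))]

def leibniz_sides(i, point):
    sym = gamma(point[0])
    vv, aa = v(*point), a(*point)
    dv, da = partial(v, i, point), partial(a, i, point)
    nabla_a = [da[j] - sum(sym[k][i][j] * aa[k] for k in range(2)) for j in range(2)]  # (9.56c)
    nabla_v = [dv[j] + sum(sym[j][i][k] * vv[k] for k in range(2)) for j in range(2)]  # (9.56b)
    lhs = partial(contraction, i, point)[0]            # partial_i (a_j v^j)
    rhs = sum(nabla_a[j] * vv[j] + aa[j] * nabla_v[j] for j in range(2))
    return lhs, rhs

for i in range(2):
    print(leibniz_sides(i, (0.9, 0.4)))   # the two entries of each pair agree
```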
Third, the proof of the crucial claim: We show the tensorial character of the
covariant derivative. Consider ∇i v j ; we prove that these are the components
28 The last equation looks nicer when expressed as $a_{j;i} = a_{j,i} - \Gamma^k{}_{ji}\, a_k$; we simply use the
symmetry of the Christoffel symbols in the two lower indices. Note, however, that this
symmetry requires the choice of a coordinate frame.
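\[ \nabla_u v = 0 \]
\[ u^\mu(s) = \dot{x}^\mu(s) = \frac{dx^\mu(s)}{ds} . \]
The covariant derivative of this vector along the curve c itself is
\[
\begin{aligned}
\nabla_u u^\mu = u^\nu \nabla_\nu u^\mu &= u^\nu \partial_\nu u^\mu + u^\nu\, \Gamma^\mu{}_{\nu\sigma}\, u^\sigma = \frac{dx^\nu}{ds} \frac{\partial}{\partial x^\nu} \frac{dx^\mu}{ds} + \Gamma^\mu{}_{\nu\sigma}\, \dot{x}^\nu \dot{x}^\sigma \\
&= \frac{d^2 x^\mu}{ds^2} + \Gamma^\mu{}_{\nu\sigma}\, \dot{x}^\nu \dot{x}^\sigma = \ddot{x}^\mu + \Gamma^\mu{}_{\nu\sigma}\, \dot{x}^\nu \dot{x}^\sigma .
\end{aligned}
\]
Comparing with (9.50) we conclude that a curve c is a geodesic if and only if
its tangent vector is parallelly propagated along c, i.e., parallel to itself along
the curve. The geodesic equation is simply $\nabla_u u = 0$ or
\[ \nabla_{\dot{x}}\, \dot{x} = 0 . \tag{9.57} \]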
or
A quick glance at (9.60) reveals its similarity to the geodesic equation (9.50).
Note that $G^\mu{}_{\nu\sigma}$ is symmetric in (νσ) as are the Christoffel symbols of a
Levi-Cività connection. The Christoffel symbols of which metric? Define the metric
And here’s the claim: $x^\mu(s)$ satisfies the equation (9.60) in Minkowski space
if and only if $\bar{x}^\mu(\bar{s})$ satisfies the geodesic equation in the spacetime with
metric (9.61).
Let us prove this claim. Since we use two concepts of proper time in parallel, it is advisable to avoid the use of the overdot. Let $s$ denote proper time w.r.t. $\eta_{\mu\nu}$ and $\bar s$ proper time w.r.t. $g_{\mu\nu}$; then
\[ d\bar s = e^{V} ds . \tag{9.63} \]
To see this we simply note that $g(v, v) = e^{2V} \eta(v, v)$ for every vector $v^\mu$ and in particular for every timelike vector; if $x(\lambda)$ denotes an arbitrary parametrization of a timelike world line, then, by (4.8) and (9.44),
\[ d\bar s = \sqrt{-g\big(dx/d\lambda,\, dx/d\lambda\big)}\; d\lambda = e^{V} \sqrt{-\eta\big(dx/d\lambda,\, dx/d\lambda\big)}\; d\lambda = e^{V} ds . \]
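The relation (9.63) can be checked numerically at a single point. This is only a sketch: the value of the potential `V`, the sample vector, and the helper names are hypothetical, and the signature $(-+++)$ is assumed for $\eta$.

```python
import math

ETA_DIAG = (-1.0, 1.0, 1.0, 1.0)  # assumed: Minkowski metric with signature (-+++)

def eta(v, w):
    """Minkowski product eta(v, w)."""
    return sum(e * a * b for e, a, b in zip(ETA_DIAG, v, w))

def g(v, w, V):
    """Conformally rescaled product g(v, w) = exp(2V) * eta(v, w)."""
    return math.exp(2.0 * V) * eta(v, w)

V = 0.25                     # hypothetical value of the potential at the point
v = (2.0, 0.5, -0.3, 1.0)    # a timelike vector dx/dlambda, eta(v, v) < 0

ds = math.sqrt(-eta(v, v))       # proper-time increment per unit lambda w.r.t. eta
ds_bar = math.sqrt(-g(v, v, V))  # the same increment w.r.t. g
```

Here `ds_bar` agrees with `math.exp(V) * ds`, as (9.63) states.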
A very intuitive way of deriving (9.63) is to use the line element notation, i.e.,
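We set
\[ u^\mu(s) = \frac{d}{ds}\, x^\mu(s) \qquad\text{and}\qquad \bar u^\mu(\bar s) = \frac{d}{d\bar s}\, \bar x^\mu(\bar s) . \]
Since
\[ \frac{d}{d\bar s} = e^{-V} \frac{d}{ds} , \]
we find that
\[ \bar u^\mu(\bar s) = e^{-V(x^\lambda(s))}\, u^\mu(s) . \]
Suppressing the arguments we simply write $\bar u^\mu = e^{-V} u^\mu$. Note that $u^\mu$ is normalized w.r.t. $\eta_{\mu\nu}$ while $\bar u^\mu$ is normalized w.r.t. $g_{\mu\nu}$, i.e.,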
This is consistent with the requirement that the world line be parametrized
w.r.t. proper time s and s̄, respectively.
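Furthermore,
\[ \frac{dV(x^\lambda(s))}{ds} = \frac{\partial V}{\partial x^\mu} \frac{dx^\mu}{ds} = V_{,\mu}\, u^\mu(s) , \]
and hence
\[ \frac{d}{d\bar s}\, \bar u^\mu(\bar s) = e^{-2V} \left[ \frac{du^\mu(s)}{ds} - V_{,\tau}\, u^\tau(s)\, u^\mu(s) \right] . \tag{9.64} \]
\[ \frac{du^\mu}{ds} = G^\mu{}_{\nu\sigma}\, u^\nu u^\sigma . \tag{9.65} \]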
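It is important to note that the indices are raised and lowered with $\eta_{\mu\nu}$ in this context; in particular,
\[ G^\mu{}_{\nu\sigma} = \eta^{\mu\rho}\, G_{\rho\nu\sigma} . \]
and
\[ g^{\mu\nu} = e^{-2V} \eta^{\mu\nu} , \]
where $g^{\mu\nu}$ is the inverse of $g_{\mu\nu}$, i.e., $g^{\mu\nu} g_{\nu\sigma} = \delta^\mu{}_\sigma$, we obtain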
where the indices on the r.h.s. are raised and lowered with $g_{\mu\nu}$. An equation like (9.66) is extremely ‘risky’; it is preferable to write
to avoid confusion about which indices are raised and lowered with which metric.29
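\begin{align*}
\frac{d}{d\bar s}\, \bar u^\mu(\bar s)
 &= e^{-2V} \big[ G^\mu{}_{\nu\sigma}\, u^\nu(s)\, u^\sigma(s) - V_{,\tau}\, u^\tau(s)\, u^\mu(s) \big] \\
 &= e^{-2V} \big[ -\Gamma^\mu{}_{\nu\sigma}\, u^\nu(s)\, u^\sigma(s) + \delta^\mu{}_{(\nu} V_{,\sigma)}\, u^\nu(s)\, u^\sigma(s) - V_{,\tau}\, u^\tau(s)\, u^\mu(s) \big] \\
 &= -\Gamma^\mu{}_{\nu\sigma}\, \bar u^\nu(\bar s)\, \bar u^\sigma(\bar s) + e^{-2V} \big[ \delta^\mu{}_\nu V_{,\sigma}\, u^\nu(s)\, u^\sigma(s) - V_{,\tau}\, u^\tau(s)\, u^\mu(s) \big] \\
 &= -\Gamma^\mu{}_{\nu\sigma}\, \bar u^\nu(\bar s)\, \bar u^\sigma(\bar s) + e^{-2V} \big[ u^\sigma(s)\, V_{,\sigma}\, u^\mu(s) - V_{,\tau}\, u^\tau(s)\, u^\mu(s) \big]
\end{align*}
\[ \frac{d}{d\bar s}\, \bar u^\mu(\bar s) + \Gamma^\mu{}_{\nu\sigma}\, \bar u^\nu(\bar s)\, \bar u^\sigma(\bar s) = 0 \tag{9.67} \]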
or
Unfortunately, the scalar theory of gravity we have discussed here does not
represent our physical reality correctly. But we have come across a fundamental
principle:
In these curved spacetimes, the motion of test particles (in the absence of other
forces) is geodesic motion, i.e., free fall.
In sections 9.1 and 9.2 we have taken the local equivalence between gravitation
and acceleration to good account. Let us finally formalize this idea.
MATHEMATICAL BACKGROUND
Conventions
\[ \{ e_1, e_2, \dots, e_n \} \]
\[ v = v^i e_i . \tag{A.1} \]
\[ v = v^\mu e_\mu . \tag{A.2} \]
In this context, Greek indices are assumed to run from zero to three, i.e., the basis is $\{e_0, e_1, e_2, e_3\}$, the components of the vector are $v^0, v^1, v^2, v^3$, and (A.2) means $v = v^0 e_0 + v^1 e_1 + v^2 e_2 + v^3 e_3$. (Note again that there’s nothing deep in this: we are merely discussing conventions of writing up things.)
It is common not to distinguish between the abstract vector v and the collection
of its components w.r.t. the chosen basis3 ; we therefore typically write
\[ v = \begin{pmatrix} v^1 \\ \vdots \\ v^n \end{pmatrix} . \tag{A.3} \]
It is important to note that (A.3) tacitly assumes that a basis has been selected;
otherwise (A.3) does not make sense.
Rather confusing, at first sight at least, is the abstract index notation. Adopting
this notation (A.3) becomes
\[ v^i = \begin{pmatrix} v^1 \\ \vdots \\ v^n \end{pmatrix} . \tag{A.4} \]
2 Unfortunately not all relativists follow this convention; some stick to Latin indices.
3 This is true when we consider only one (fixed) basis. However, when we deal with two different bases at the same time, a distinction is useful; see (A.31) below.
Here, $v^i$ does not denote the $i$th component of the vector, but the collection of all components $i = 1, 2, \dots, n$, and hence the vector itself. In abstract index notation we would typically write “Consider a vector $v^i \in V$ and . . . ”, where the superscript $i$ is not an actual index but merely a dummy indicating the vector character of the object; this is particularly useful when we deal simultaneously with vectors, covectors, and tensors, see below. In this script we use both the standard notation and the abstract index notation.
Remark. In relativity, in dimension four, we would write
\[ v^\mu = \begin{pmatrix} v^0 \\ v^1 \\ v^2 \\ v^3 \end{pmatrix} \tag{A.5} \]
in abstract index notation. Again, $\mu$ does not denote any particular value, but the collection of all $\mu = 0, \dots, 3$.
Consider a vector space V of dimension n over the real numbers. We now define
the dual space V ∗ associated with V .
\[ a : V \to \mathbb{R} \tag{A.6a} \]
\[ v \mapsto a(v) \in \mathbb{R} , \tag{A.6b} \]
\[ \{ e^1, e^2, \dots, e^n \} ; \tag{A.7} \]
note the convention that co-basis vectors are indexed by superscripts. In order to define the co-basis vector $e^i$, since it is a map of the type (A.6), we must prescribe how it acts on vectors $v \in V$. We define
\[ e^i(v) = v^i , \tag{A.8} \]
i.e., $e^i$ applied to a vector yields the $i$th component of the vector. Since $v = v^j e_j$, we have
\[ \underbrace{e^i(v)}_{v^i} = e^i(v^j e_j) = v^j\, e^i(e_j) \]
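The defining property (A.8) of the co-basis can be made concrete in a small sketch. The basis of $\mathbb{R}^2$ chosen below and the helper names are hypothetical; covectors are represented as rows, so that the co-basis is given by the rows of the inverse of the matrix whose columns are the basis vectors.

```python
def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Hypothetical basis of R^2, written in Cartesian components.
e = [(1.0, 0.0), (1.0, 1.0)]                          # e_1, e_2
E = [[e[j][i] for j in range(2)] for i in range(2)]   # columns of E are the e_j
cobasis = inverse_2x2(E)                              # row i represents e^i

def apply(covector, vector):
    """Evaluate a covector (row) on a vector (column)."""
    return sum(c * x for c, x in zip(covector, vector))

# e^i(e_j) for all i, j
pairing = [[apply(cobasis[i], e[j]) for j in range(2)] for i in range(2)]

# Build v = v^j e_j from given components and recover them via e^i(v) = v^i.
components = (3.0, -2.0)
v = tuple(sum(components[j] * e[j][k] for j in range(2)) for k in range(2))
recovered = tuple(apply(cobasis[i], v) for i in range(2))
```

The array `pairing` is the identity matrix, and `recovered` reproduces `components`, which is exactly what (A.8) demands.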
Note in passing that it is not difficult to prove that the dimension of V ∗ coin-
cides with the dimension of V , as we have tacitly presupposed above.
\[ a = a_i e^i , \tag{A.9} \]
\[ a = (a_1, a_2, \dots, a_n) . \tag{A.10} \]
Hence when the covector $a$ is applied to the basis vector $e_j$ we obtain the $j$th component of the covector, $a_j$. (This equation should be compared with equation (A.8).)
In this section we analyze how vectors and covectors transform4 under a change
of basis. To that end suppose that the vector space V is equipped with two
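different bases,
\[ v = \hat v^i \hat e_i , \tag{A.13} \]
\[ v = \breve v^i \breve e_i , \tag{A.14} \]
and
\[ v = \begin{pmatrix} \breve v^1 \\ \vdots \\ \breve v^n \end{pmatrix} \quad \text{w.r.t. } \{ \breve e_1, \dots, \breve e_n \} . \tag{A.16} \]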
Remark. In relativity, bases correspond to observers. Hence, when we introduce
a (four-)vector we must specify the observer, w.r.t. which we decompose the
vector.
or, equivalently,
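Now suppose that the two bases are related via a basis transformation, i.e.,
\[ v = \hat v^i \hat e_i \]
\[ v = \breve v^k \breve e_k = \breve v^k A^i{}_k\, \hat e_i = A^i{}_k\, \breve v^k\, \hat e_i , \]
therefore,
\[ \hat v^i = A^i{}_k\, \breve v^k . \tag{A.19} \]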
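We then obtain
\[ \delta^j{}_i = \breve e^j(\breve e_i) = B^j{}_l\, \hat e^l \big( A^k{}_i\, \hat e_k \big) = B^j{}_l A^k{}_i\, \underbrace{\hat e^l(\hat e_k)}_{\delta^l{}_k} = B^j{}_k A^k{}_i , \]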
which implies that B is the inverse of A,
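We conclude that
\[ a(v) = \hat a_i \hat v^i = B^k{}_i\, \breve a_k\, A^i{}_l\, \breve v^l = \underbrace{B^k{}_i A^i{}_l}_{\delta^k{}_l}\, \breve a_k \breve v^l = \breve a_k \breve v^k , \tag{A.23} \]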
i.e., we find consistency.
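The consistency statement (A.23) can be replayed with numbers. The sketch below assumes a hypothetical transformation matrix `A` (chosen with determinant one so that its inverse `B` has integer entries) and hypothetical component values; vector components transform with `A` as in (A.19), covector components with `B`.

```python
def matmul(X, Y):
    """Product of two matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

A = [[2.0, 1.0], [1.0, 1.0]]      # hypothetical basis transformation A^i_k
B = [[1.0, -1.0], [-1.0, 2.0]]    # its inverse B^k_i

v_breve = [3.0, -1.0]             # vector components w.r.t. the breve basis
a_breve = [0.5, 4.0]              # covector components w.r.t. the breve co-basis

# (A.19): v_hat^i = A^i_k v_breve^k
v_hat = [sum(A[i][k] * v_breve[k] for k in range(2)) for i in range(2)]
# a_hat_i = B^k_i a_breve_k (the row a_breve multiplied by B from the right)
a_hat = [sum(B[k][i] * a_breve[k] for k in range(2)) for i in range(2)]

pairing_hat = sum(a * v for a, v in zip(a_hat, v_hat))
pairing_breve = sum(a * v for a, v in zip(a_breve, v_breve))
```

Both pairings give the same number, which reflects that $a(v)$ does not depend on the basis used to compute it.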
Since $b$ is a map with two slots, we will occasionally write $b(\cdot, \cdot)$. This is reminiscent of the notation $\langle \cdot, \cdot \rangle$ that is employed for scalar products.
Example. A scalar product $\langle \cdot, \cdot \rangle$ is defined as a positive definite symmetric bilinear form. It is the prime example for a non-degenerate symmetric bilinear form. To show this, we must prove that positive definiteness implies non-degeneracy, cf. (A.25c): Thus, for $v \in V$ assume that $\langle v, w \rangle = 0 \ \forall w \in V$. Since this holds for all $w$ it holds necessarily also for $w = v$, i.e., $\langle v, v \rangle = 0$. However, the requirement of positive definiteness implies that $\langle v, v \rangle$ is always positive unless $v = 0$. Hence, from $\langle v, v \rangle = 0$ we conclude that $v = 0$. We have thus established (A.25c).
Definition A.3. Let $\{e_1, e_2, \dots, e_n\}$ be a basis of $V$. The components of the bilinear form $b = b(\cdot, \cdot)$ are given by applying $b$ to the basis vectors: we define $b_{ij}$ as
\[ b_{ij} = b(e_i, e_j) . \tag{A.26} \]
Remark for experts. The resemblance between (A.12) and (A.26) is not a coincidence. In fact, a bilinear form is a co-tensor of rank 2, i.e., an element of $V^* \otimes V^*$, namely $b = b_{ij}\, e^i \otimes e^j$ (cf. $a = a_i e^i$). Its behavior thus naturally generalizes the behavior of covectors.
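The components $b_{ij}$ can be collected into an $(n \times n)$ matrix, which we call again $b$ in slight abuse of notation,
\[ b = (b_{ij})_{i,j} = \begin{pmatrix} b_{11} & b_{12} & \cdots & b_{1n} \\ b_{21} & b_{22} & \cdots & b_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ b_{n1} & b_{n2} & \cdots & b_{nn} \end{pmatrix} . \tag{A.27} \]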
the definition of which is given w.r.t. some chosen basis $\{e_1, e_2\}$. This bilinear
form is clearly symmetric and non-degenerate, since the matrix is symmetric
and non-singular.
Remark. Equation (A.28′) can be written in another form that is used frequently, namely as
\[ b(v, w) = v^T b\, w = \langle v, b\, w \rangle , \tag{A.28′′} \]
where $\langle \cdot, \cdot \rangle$ denotes the standard scalar product w.r.t. the basis under consideration.5 In quantum mechanics it is customary to write $\langle v | b | w \rangle$ instead of (A.28′′).
5 There is a slight subtlety involved here that we choose to suppress. In (A.28′′), strictly speaking, $b$ denotes the endomorphism on $V$ whose matrix representation $(b^i{}_j)_{i,j}$ is given by (A.27).
Orthogonality
Since b(·, ·) is in general not the standard scalar product, the concept of or-
thogonality differs from the one we are used to in Euclidean geometry. This is
illustrated by the following example.
Example. Consider again the two-dimensional plane with the bilinear form
given by (A.29). A simple calculation shows that the vectors
\[ v = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \qquad\text{and}\qquad w = \begin{pmatrix} 2 \\ 1 \end{pmatrix} \]
are orthogonal.
Several basic properties carry over from Euclidean geometry, an important one
being the following:
In Euclidean geometry, the length of a vector $v$ is given as its norm $\|v\|$, where $\|v\| = \sqrt{\langle v, v \rangle}$. A non-degenerate symmetric bilinear form $b = b(\cdot, \cdot)$, however, does not define a norm in general. This is simply because there might exist vectors $v$ such that $b(v, v) = 0$ or $b(v, v) < 0$.
Example. Consider again the two-dimensional plane with the bilinear form
given by (A.29). There exist vectors $v$ such that $b(v, v) < 0$; for example,
\[ \text{for } v = \begin{pmatrix} 0 \\ 1 \end{pmatrix} \text{ we have } b(v, v) = -2 . \]
Although we are thus unable to define norms in the proper sense, we see that
we can easily ascribe the “squared norm” b(v, v) to each vector v. This concept
is of central importance in non-Euclidean geometry and in applications.
Remark. Occasionally (and in particular in relativity) we speak of the square of a vector $v$. Despite being prone to confusion, it is also common to write $v^2$ for the expression $b(v, v)$. (This is reminiscent of the notation $\vec v^{\,2} = \langle \vec v, \vec v \rangle$ in Euclidean space $\mathbb{R}^3$.)
Evidently, the analogous decompositions hold for the vector w. Also the bilinear
form b = b(·, ·) possesses two component representations, which are given by
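Now, the two bases are related via a basis transformation, i.e.,
\[ \breve e_i = A^j{}_i\, \hat e_j , \]
\[ \breve b_{ij} = b(\breve e_i, \breve e_j) = b\big( A^k{}_i\, \hat e_k,\; A^l{}_j\, \hat e_l \big) = A^k{}_i A^l{}_j\, b(\hat e_k, \hat e_l) = A^k{}_i A^l{}_j\, \hat b_{kl} , \tag{A.33} \]
or in matrix notation
\[ \breve b = A^T\, \hat b\, A . \tag{A.33′} \]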
6
Recall that the representation of a vector v as a column vector presupposes the choice of
a basis, see (A.3). Although this is a trivial fact, it tends to be forgotten easily.
We see that when $\hat b_{ij}$ is used to compute $b(v, w)$ in hatted coordinates, then $A^k{}_i A^l{}_j\, \hat b_{kl}$ is used in breve coordinates. We have thus proved the following proposition:
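Now consider a second basis, $\{\bar e_1, \bar e_2\}$, which is related to the original basis via a basis transformation, i.e.,
\[ \bar e_i = A^k{}_i\, e_k , \]
where
\[ (A^i{}_j)_{i,j} = \frac{1}{\sqrt{30}} \begin{pmatrix} \sqrt{2} & -2\sqrt{3} \\ 2\sqrt{2} & \sqrt{3} \end{pmatrix} . \]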
W.r.t. the new basis $\{\bar e_1, \bar e_2\}$ the bilinear form exhibits the component representation
\[ (\bar b_{ij})_{i,j} = \big( A^k{}_i A^l{}_j\, b_{kl} \big)_{i,j} = \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix} . \tag{A.37} \]
The relevance of this particular example will become clear below.
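A quick numerical check of (A.37), offered as a sketch. The matrix `A` is entered as it reads in the example above. The matrix `b` is an assumption: it is the symmetric matrix for which this `A` yields (A.37), and it is not copied from (A.36). The last two checks additionally assume that the same form is meant in the earlier examples referring to (A.29). All helper names are hypothetical.

```python
import math

def transform(b, A):
    """Components b_bar_ij = A^k_i A^l_j b_kl, i.e. the matrix product A^T b A."""
    n = len(b)
    return [[sum(A[k][i] * A[l][j] * b[k][l] for k in range(n) for l in range(n))
             for j in range(n)] for i in range(n)]

def bilinear(b, v, w):
    """b(v, w) = b_ij v^i w^j."""
    return sum(b[i][j] * v[i] * w[j] for i in range(len(v)) for j in range(len(w)))

# ASSUMPTION: inferred from A and (A.37), not quoted from the text.
b = [[1.0, -2.0], [-2.0, -2.0]]

s = 1.0 / math.sqrt(30.0)
A = [[s * math.sqrt(2.0), -2.0 * s * math.sqrt(3.0)],
     [2.0 * s * math.sqrt(2.0), s * math.sqrt(3.0)]]

b_bar = transform(b, A)                         # expected to be close to diag(-1, +1)
orthogonal = bilinear(b, (1.0, 0.0), (2.0, 1.0))
negative_square = bilinear(b, (0.0, 1.0), (0.0, 1.0))
```

With this choice of `b`, the transformed components agree with (A.37) up to rounding, the vectors $(1, 0)$ and $(2, 1)$ are orthogonal, and $(0, 1)$ has the square $-2$.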
Orthogonal transformations
w.r.t. some basis $\{e_0, e_1\}$ (where we follow the relativistic convention of indexing, for a change). A straightforward computation shows that the basis transformation determined by the matrix
\[ (A^i{}_j)_{i,j} = \begin{pmatrix} \sqrt{2} & -1 \\ -1 & \sqrt{2} \end{pmatrix} . \]
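The "straightforward computation" can be sketched as follows. It is assumed here (the form itself is not reproduced above) that the bilinear form is the two-dimensional Minkowski form $\mathrm{diag}(-1, +1)$ w.r.t. $\{e_0, e_1\}$; the helper name `transform` is hypothetical.

```python
import math

def transform(b, A):
    """Components b_bar_ij = A^k_i A^l_j b_kl, i.e. the matrix product A^T b A."""
    n = len(b)
    return [[sum(A[k][i] * A[l][j] * b[k][l] for k in range(n) for l in range(n))
             for j in range(n)] for i in range(n)]

eta = [[-1.0, 0.0], [0.0, 1.0]]   # ASSUMPTION: the form under consideration
r2 = math.sqrt(2.0)
A = [[r2, -1.0], [-1.0, r2]]      # the matrix from the text

eta_bar = transform(eta, A)       # components w.r.t. the transformed basis
```

Under this assumption the components are unchanged, `eta_bar` being equal to `eta` up to rounding. The matrix has the shape of a boost with $\gamma = \sqrt{2}$ and $\gamma\beta = 1$.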
Orthonormal bases
\[ \langle v, w \rangle = \delta_{ij}\, v^i w^j = v^T\, \mathbb{1}\, w , \]
or equivalently $\langle v, w \rangle = v^T w = \sum_{i=1}^{n} v^i w^i$. This is generalized to pseudo-scalar products by virtue of the following theorem.
that the number of minus-signs and plus-signs does not depend on the actual
choice of basis follows from dimensional considerations.
Example. The signature of the bilinear form (A.36) is (−+), since there exists
an orthonormal basis, w.r.t. which it takes the form (A.37).
Proof. The proof of this theorem can also be regarded as an alternative proof of Theorem A.9. Starting from a given basis $\{e_1, \dots, e_n\}$, where $b(\cdot, \cdot)$ is represented by $b_{ij}$, we explicitly construct an orthonormal basis $\{\bar e_1, \dots, \bar e_n\}$.
To that end recall from (A.35) that with $\bar e_i = A^j{}_i\, e_j$ the components of $b_{ij}$ transform according to $\bar b_{ij} = A^k{}_i A^l{}_j\, b_{kl}$, or, in matrix notation,
\[ \bar b = A^T b\, A . \tag{A.42} \]
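The eigenvalues $\lambda_i$ are real (since $b$ is symmetric) and non-zero (since $b$ is not singular). Assume that there exist $n_-$ negative eigenvalues and $n_+$ positive eigenvalues. Without loss of generality we may arrange the eigenvalues in such an order that $\lambda_1 < 0, \dots, \lambda_{n_-} < 0$ and $\lambda_{n_- + 1} > 0, \dots, \lambda_n > 0$. Let us introduce the matrix
\[ \Lambda = \begin{pmatrix} \sqrt{|\lambda_1|} & & \\ & \ddots & \\ & & \sqrt{|\lambda_n|} \end{pmatrix} ; \]
the diagonal matrix $\mathrm{diag}(\lambda_1, \dots, \lambda_n)$ admits the following (unique) decomposition
\[ \mathrm{diag}(\lambda_1, \dots, \lambda_n) = \Lambda\; \mathrm{diag}\big( \underbrace{-1, \dots, -1}_{n_- \text{ times}},\, \underbrace{+1, \dots, +1}_{n_+ \text{ times}} \big)\; \Lambda . \]
\[ \underbrace{\Lambda^{-1} O^T}_{A^T}\; b\; \underbrace{O\, \Lambda^{-1}}_{A} = \mathrm{diag}\big( \underbrace{-1, \dots, -1}_{n_- \text{ times}},\, \underbrace{+1, \dots, +1}_{n_+ \text{ times}} \big) . \]
Therefore, by defining $A = (A^i{}_j)_{i,j}$ to be $A = O\, \Lambda^{-1}$, we achieve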
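The construction $A = O\, \Lambda^{-1}$ used in the proof can be sketched in code. To stay within the standard library the sketch is restricted to the $2 \times 2$ case, where the orthogonal matrix $O$ is a rotation whose angle is known in closed form; this restriction, the function names, and the sample matrix are assumptions made for illustration only.

```python
import math

def orthonormalizing_matrix(b):
    """Return A = O Lambda^{-1} for a symmetric, non-singular 2x2 matrix b."""
    b11, b12, b22 = b[0][0], b[0][1], b[1][1]
    # Rotation angle for which O^T b O is diagonal.
    t = 0.5 * math.atan2(2.0 * b12, b11 - b22)
    c, s = math.cos(t), math.sin(t)
    # Eigenvectors (columns of O) paired with their eigenvalues.
    cols = [((c, s), c * c * b11 + 2.0 * c * s * b12 + s * s * b22),
            ((-s, c), s * s * b11 - 2.0 * c * s * b12 + c * c * b22)]
    cols.sort(key=lambda item: item[1])   # negative eigenvalues first
    # Column i of A is the i-th eigenvector divided by sqrt(|lambda_i|).
    scaled = [[x / math.sqrt(abs(lam)) for x in vec] for vec, lam in cols]
    return [[scaled[j][i] for j in range(2)] for i in range(2)]

def transform(b, A):
    """Matrix product A^T b A, cf. (A.42)."""
    n = len(b)
    return [[sum(A[k][i] * A[l][j] * b[k][l] for k in range(n) for l in range(n))
             for j in range(n)] for i in range(n)]

b = [[1.0, -2.0], [-2.0, -2.0]]   # hypothetical sample form with eigenvalues -3 and 2
A = orthonormalizing_matrix(b)
b_bar = transform(b, A)            # close to diag(-1, +1)
```

For a general dimension $n$ one would replace the closed-form rotation by a numerical eigenvalue routine for symmetric matrices; the rescaling by $\Lambda^{-1}$ stays the same.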
\[ \flat : V \to V^* \tag{A.45a} \]
\[ \sharp = \flat^{-1} : V \leftarrow V^* \tag{A.45b} \]
\[ \flat : V \to V^* \tag{A.46a} \]
\[ v \mapsto \flat(v) := b(v, \cdot) \in V^* ; \tag{A.46b} \]
Written in components, we see that this abstract definition lays the foundation
for the procedure commonly known as “lowering of indices”. Let us write
\[ v_i = b_{ij}\, v^j . \tag{A.49} \]
\[ v^i = b^{ij}\, v_j . \tag{A.50} \]
Since $\sharp = \flat^{-1}$, $(b^{ij})_{i,j}$ denotes the inverse matrix of $(b_{ij})_{i,j}$, i.e.,
Remark. By definition, we obtain $b^{ij}$ from $b_{ij}$ when we take the inverse matrix. However, it is equally possible to compute $b^{ij}$ from $b_{ij}$ by raising both indices. To see that we compute
\[ b^{ij} = \delta^i{}_l\, b^{jl} = b^{ik} b_{kl}\, b^{jl} = b^{ik} b^{jl}\, b_{kl} , \]
Using raising and lowering of indices provides a means to write the pseudo-scalar product in a convenient form. Namely, instead of $b(v, w) = b_{ij}\, v^i w^j$ we are able to write
\[ b(v, w) = v^i w_i = v_i w^i . \tag{A.52} \]
Accordingly, the square of a vector becomes
\[ v^2 = b(v, v) = v^i v_i = v_i v^i . \tag{A.53} \]
In applications in relativity, where we use Greek indices that run from zero to three, expressions of this kind are abundant:
\[ v^\mu w_\mu , \qquad v^2 = v^\mu v_\mu , \qquad \text{etc.} \]
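Lowering an index and the resulting forms (A.52) and (A.53) of the pseudo-scalar product can be sketched as follows. The Minkowski metric with signature $(-+++)$ is used for $b_{ij}$; the sample vectors and the function names are hypothetical.

```python
ETA = [[-1.0, 0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0, 1.0]]
ETA_INV = ETA  # this particular matrix is its own inverse

def lower(b, v):
    """v_i = b_ij v^j, cf. (A.49)."""
    return [sum(b[i][j] * v[j] for j in range(len(v))) for i in range(len(v))]

def raise_index(b_inv, a):
    """v^i = b^ij v_j, cf. (A.50)."""
    return [sum(b_inv[i][j] * a[j] for j in range(len(a))) for i in range(len(a))]

def contract(upper, lower_):
    """Sum over one upper and one lower index."""
    return sum(x * y for x, y in zip(upper, lower_))

v = [2.0, 1.0, 0.0, -1.0]
w = [1.0, 3.0, 2.0, 0.5]

b_vw = sum(ETA[i][j] * v[i] * w[j] for i in range(4) for j in range(4))
v_low, w_low = lower(ETA, v), lower(ETA, w)
v_squared = contract(v, v_low)
```

The three expressions `b_vw`, `contract(v, w_low)` and `contract(w, v_low)` coincide, in line with (A.52), and raising the lowered index returns the original components.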
Definition A.12. An affine space over a vector space V is a set A that can
be identified with V provided that one (arbitrary) element is distinguished as
the origin. The elements of an affine space are called points.
An affine space can thus be viewed as a “forgetful vector space”—a vector space
that has lost its information on the zero vector. By actively choosing an origin,
the affine space becomes a vector space again. Note, however, the inherent
arbitrariness: any point can be chosen as the origin. As a consequence, all
points in an affine space are equal. There is a price to pay, though:
Appealing to our mathematical intuition once again, we see that for any pair
of points $p, q \in A$ there exists a unique vector $v$ such that $p + v = q$; one writes $v = q - p$ or $v = \overrightarrow{pq}$. Loosely speaking, we can say that (a copy of) the vector space $V$ is attached to each point $p \in A$.
Definition A.13. A group G is said to act on a set X (on the left) if every
$g \in G$ induces a bijective map $g_X$ on the set $X$, i.e.,
\[ g_X : X \to X \tag{A.54a} \]
\[ x \mapsto g \cdot x , \tag{A.54b} \]
such that
The notation
\[ x \mapsto g \cdot x \]
is chosen to suggest that elements of $X$ can be “multiplied” with elements of $G$ (on the left). In some cases (when the group operation of $G$ is denoted by a plus sign instead of by a dot) it is preferable to write
\[ x \mapsto x + g \]
\[ (x + g_1) + g_2 = x + (g_1 + g_2) , \qquad x + 0 = x , \]
where now $0$ denotes the identity element of $G$. (We have chosen an action on the right in this case; however, since $(G, +)$ is commonly used for abelian groups, this makes no actual difference.)
\[ v \mapsto \lambda \cdot v \qquad (\lambda \in \mathbb{R},\ v \in V) . \tag{A.56} \]
λ · (µ · v) = (λµ) · v and 1 · v = v .
It is standard to omit the dots and simply write λ(µv) = (λµ)v and 1v = v.
Example. The general linear group GL(n, R) (special linear group, orthogonal
group,. . . ) acts on the vector space Rn as a group of endomorphisms.
There exist several types of group actions. We merely discuss the type that is essential for our purposes: A group action is called simply transitive if for any two elements $x_1, x_2 \in X$ there exists exactly one element $g \in G$ such that $g \cdot x_1 = x_2$ (or $x_1 + g = x_2$ in the ‘+’ notation).
Definition A.14. An affine space over a vector space V is a set A, on which
the (abelian group underlying the) vector space V acts simply transitively. The
elements of an affine space are called points.
\[ p \mapsto p + v . \tag{A.57} \]
Furthermore, since the action is assumed to be simply transitive, for each pair of points $p, q \in A$ there exists a unique vector $v$ such that $p + v = q$. One typically writes $v = q - p$ or $v = \overrightarrow{pq}$.
\[ A \ni p \longleftrightarrow \overrightarrow{op} \in V . \]
In this way, A can be identified with the underlying vector space V . However,
this identification is not canonical, since the choice of origin is completely free.
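A toy model may help to see why the identification of $A$ with $V$ depends on the origin. In the sketch below points and vectors are both stored as tuples of numbers, which is itself a choice of coordinates made for illustration; the function names and sample values are hypothetical.

```python
def translate(p, v):
    """Action of the vector v on the point p: p -> p + v, cf. (A.57)."""
    return tuple(pi + vi for pi, vi in zip(p, v))

def difference(q, p):
    """The unique vector v = q - p with p + v = q (simple transitivity)."""
    return tuple(qi - pi for qi, pi in zip(q, p))

p, q = (1.0, 2.0), (4.0, -1.0)
v = difference(q, p)

# Two different origins identify the same point p with two different vectors.
o1, o2 = (0.0, 0.0), (1.0, 1.0)
x1 = difference(p, o1)
x2 = difference(p, o2)
```

The vectors `x1` and `x2` differ by exactly the vector from `o1` to `o2`: changing the origin shifts every identification by the same amount.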
Affine coordinates
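\[ A \ni p \longleftrightarrow x = \overrightarrow{op} \in V \longleftrightarrow x^i \in \mathbb{R}^n . \]
\[ A \ni p \longleftrightarrow x = \overrightarrow{op} \in V \longleftrightarrow x^i \in \mathbb{R}^n . \tag{A.58} \]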
where the bases $\{e_1, \dots, e_n\}$ and $\{\bar e_1, \dots, \bar e_n\}$ on $V$ are related by the basis transformation8
\[ e_i = G^j{}_i\, \bar e_j , \tag{A.59a} \]
where $G^i{}_k$ is a non-singular matrix. Let $v^i$ denote the components of some vector w.r.t. the basis $\{e_1, \dots, e_n\}$ and $\bar v^i$ its components w.r.t. $\{\bar e_1, \dots, \bar e_n\}$. Then (A.59a) induces the transformation
\[ \bar v^i = G^i{}_k\, v^k \tag{A.59b} \]
Now consider a point $p \in A$, whose affine coordinates w.r.t. the two affine bases are
\[ p \leftrightarrow x^i \in \mathbb{R}^n \ \text{ w.r.t. } o, \{e_1, \dots, e_n\} \qquad\qquad p \leftrightarrow \bar x^i \in \mathbb{R}^n \ \text{ w.r.t. } \bar o, \{\bar e_1, \dots, \bar e_n\} \]
8 Previously, in section A.4, we typically used the form $\bar e_i = A^j{}_i\, e_j$ for a basis transformation and called $(A^k{}_l)_{k,l}$ the basis transformation matrix. The previous equation clearly corresponds to (A.59a) with $A = G^{-1}$.
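\[ \psi : \mathbb{R}^n \to A , \quad (t, x) \mapsto \psi(t, x) \qquad\qquad \psi' : \mathbb{R}^n \to A , \quad (t', x') \mapsto \psi'(t', x') \]
be two different coordinate systems on $A$ (where the fact that the coordinates are denoted by a tuple $(t, x)$ instead of only $x$ is of course completely irrelevant). A particular point $p \in A$ has thus two different coordinate representations
\[ \mathbb{R}^n \xrightarrow{\ \psi\ } A \xleftarrow{\ \psi'\ } \mathbb{R}^n , \qquad (t, x) \mapsto p , \quad (t', x') \mapsto p , \]
and the coordinate transformation between the two charts is given by the map
\[ \phi = (\psi')^{-1} \circ \psi : \mathbb{R}^n \to \mathbb{R}^n , \qquad (t, x) \mapsto (t', x') \]
Consider a function
\[ \Phi : A \to \mathbb{R} , \qquad p \mapsto \Phi(p) \]
\[ \varphi' : (t', x') \xrightarrow{\ \psi'\ } p = \psi'(t', x') \xrightarrow{\ \Phi\ } \Phi(p) = \Phi(\psi'(t', x')) = (\Phi \circ \psi')(t', x') . \]
\[ \varphi = \Phi \circ \psi \qquad\text{and}\qquad \varphi' = \Phi \circ \psi' . \]
Therefore, we obtain
\[ \varphi = \Phi \circ \psi = \big( \varphi' \circ (\psi')^{-1} \big) \circ \psi = \varphi' \circ \big( (\psi')^{-1} \circ \psi \big) = \varphi' \circ \phi , \]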
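The composition rule for the coordinate representations of a function can be tried out on a toy example. Everything specific below is an assumption made for illustration: the set $A$ is modelled by pairs of numbers, the second chart differs from the first by a Galilean boost with a hypothetical velocity `W`, and the function `Phi` is arbitrary.

```python
W = 0.5  # hypothetical relative velocity between the two coordinate systems

def psi(t, x):
    """First chart: coordinates (t, x) -> point of A."""
    return (t, x)

def psi_prime(tp, xp):
    """Second chart: coordinates (t', x') -> point of A, with x = x' + W t'."""
    return (tp, xp + W * tp)

def psi_prime_inv(p):
    """Inverse of the second chart: point of A -> coordinates (t', x')."""
    t, x = p
    return (t, x - W * t)

def coord_change(t, x):
    """Coordinate transformation (psi')^{-1} o psi : (t, x) -> (t', x')."""
    return psi_prime_inv(psi(t, x))

def Phi(p):
    """A function on A (arbitrary choice)."""
    return p[0] + p[1] ** 2

def rep(t, x):
    """Representation of Phi in the first chart: Phi o psi."""
    return Phi(psi(t, x))

def rep_prime(tp, xp):
    """Representation of Phi in the second chart: Phi o psi'."""
    return Phi(psi_prime(tp, xp))

value = rep(2.0, 3.0)
value_via_prime = rep_prime(*coord_change(2.0, 3.0))
```

Both values agree: the unprimed representation is the primed one composed with the coordinate transformation.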
(ANTI) SYMMETRIZATION
Consider an arbitrary tensor $T_{i_1 \cdots i_k}$. Let $P(1 \cdots k)$ denote the set of all $k!$ permutations of the $k$-tuple $(1, 2, \dots, k)$. Then the symmetric part of the tensor $T_{i_1 \cdots i_k}$ is defined as
\[ T_{(i_1 \cdots i_k)} := \frac{1}{k!} \sum_{\sigma \in P(1 \cdots k)} T_{i_{\sigma(1)} \cdots i_{\sigma(k)}} . \tag{B.1} \]
Therefore,
\[ A_{ij} = A_{(ij)} + A_{[ij]} , \tag{B.4} \]
which shows that every tensor of rank 2 (which corresponds to a matrix) can
be uniquely decomposed into its symmetric plus its antisymmetric part.
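For rank 2 the decomposition (B.4) is easy to spell out. The sketch below uses a hypothetical sample matrix and hypothetical function names.

```python
def symmetric_part(A):
    """A_(ij) = (A_ij + A_ji) / 2."""
    n = len(A)
    return [[0.5 * (A[i][j] + A[j][i]) for j in range(n)] for i in range(n)]

def antisymmetric_part(A):
    """A_[ij] = (A_ij - A_ji) / 2."""
    n = len(A)
    return [[0.5 * (A[i][j] - A[j][i]) for j in range(n)] for i in range(n)]

A = [[1.0, 2.0, 0.0],
     [4.0, 5.0, -2.0],
     [6.0, 0.0, 3.0]]

S = symmetric_part(A)
N = antisymmetric_part(A)
```

Adding `S` and `N` entry by entry returns `A`, and no other split into a symmetric and an antisymmetric matrix does so.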
and the analogous result for $A_{(ijk)}$. Clearly, the analog of (B.4) does not hold.
Exercise. Let $n$ be the dimension of the vector space and let $A_{i_1 \cdots i_k}$ be a tensor of rank $k$ (with $k \le n$). Show that $A_{[i_1 \cdots i_k]}$ has $\binom{n}{k}$ independent entries, while $A_{(i_1 \cdots i_k)}$ has $\binom{n+k-1}{k}$.
Example. The tensor $A_{[ij]}$ has $\binom{n}{2}$ independent entries, while $A_{(ij)}$ has $\binom{n+1}{2}$. The sum of the two is $n^2$, which equals the number of coefficients of $A_{ij}$; compare this result with (B.4). In contrast, the tensor $A_{[ijk]}$ has $\binom{n}{3}$ independent entries, while $A_{(ijk)}$ has $\binom{n+2}{3}$. Show that the sum is $n(n^2 + 2)/3$, which is less than the $n^3$ coefficients of $A_{ijk}$.
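The counting in the exercise can be confirmed by brute force. The sketch below counts strictly increasing index tuples (independent entries of a totally antisymmetric tensor) and weakly increasing ones (independent entries of a totally symmetric tensor); the function names are hypothetical.

```python
from itertools import combinations, combinations_with_replacement
from math import comb

def independent_antisymmetric(n, k):
    """Number of index tuples i1 < ... < ik with entries in 0..n-1."""
    return sum(1 for _ in combinations(range(n), k))

def independent_symmetric(n, k):
    """Number of index tuples i1 <= ... <= ik with entries in 0..n-1."""
    return sum(1 for _ in combinations_with_replacement(range(n), k))

n = 4
rank2_total = independent_antisymmetric(n, 2) + independent_symmetric(n, 2)
rank3_total = independent_antisymmetric(n, 3) + independent_symmetric(n, 3)
```

For $n = 4$ the rank-2 total equals $n^2 = 16$, whereas the rank-3 total is $n(n^2 + 2)/3 = 24$, well short of $n^3 = 64$.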
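hence
\[ \delta^{i_1 \cdots i_k}_{j_1 \ldots j_k} = \sum_{\sigma \in P(1 \cdots k)} \mathrm{sgn}\, \sigma\; \delta^{i_1}_{j_{\sigma(1)}} \cdots \delta^{i_k}_{j_{\sigma(k)}} = k!\; \delta^{i_1}_{[j_1} \cdots \delta^{i_k}_{j_k]} . \tag{B.15a} \]
Alternatively we obtain
\[ \delta^{i_1 \cdots i_k}_{j_1 \ldots j_k} = k!\; \delta^{[i_1}_{j_1} \cdots \delta^{i_k]}_{j_k} = k!\; \delta^{[i_1}_{[j_1} \cdots \delta^{i_k]}_{j_k]} . \tag{B.15b} \]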
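In some contexts the nested variants of (B.15) are useful. We merely consider a particular example,
\[ \delta^{\alpha\mu\nu}_{\kappa\sigma\tau} = 2\, \delta^\alpha_\kappa\, \delta^\mu_{[\sigma} \delta^\nu_{\tau]} + 4\, \delta^\alpha_{[\sigma} \delta^{[\mu}_{\tau]} \delta^{\nu]}_\kappa . \tag{B.16} \]
\[ \delta^{\alpha\mu\nu}_{\kappa\sigma\tau} = 2\, \delta^\alpha_\kappa\, \delta^\mu_{[\sigma} \delta^\nu_{\tau]} + 2\, \delta^\alpha_{[\sigma} \delta^\mu_{\tau]}\, \delta^\nu_\kappa + 2 \underbrace{\delta^\alpha_{[\tau} \delta^\nu_{\sigma]}}_{-\delta^\alpha_{[\sigma} \delta^\nu_{\tau]}} \delta^\mu_\kappa = 2\, \delta^\alpha_\kappa\, \delta^\mu_{[\sigma} \delta^\nu_{\tau]} + 4\, \delta^\alpha_{[\sigma} \delta^{[\mu}_{\tau]} \delta^{\nu]}_\kappa , \]
which proves the claim. Finally we note that $\epsilon_{i_1 \cdots i_k}$ is a special case of the generalized Kronecker symbol, because
\[ \epsilon_{i_1 \cdots i_k} = \delta^{1 \cdots k}_{i_1 \cdots i_k} . \tag{B.17} \]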
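Example. A common source of confusion is the $\epsilon$-tensor with upstairs indices. On the one hand, there exists the tensor $\epsilon^{i_1 \cdots i_k}$ defined in analogy to (B.11), i.e.,
\[ \epsilon^{i_1 \cdots i_k} = \begin{cases} +1 & \text{if } \mathrm{sgn}(i_1 \cdots i_k) = +1 \\ -1 & \text{if } \mathrm{sgn}(i_1 \cdots i_k) = -1 , \end{cases} \tag{B.18} \]
and $\epsilon^{i_1 \cdots i_k} = 0$ otherwise. However, if the vector space is endowed with a metric (scalar product), then the symbol $\epsilon^{i_1 \cdots i_k}$ is often used to denote the tensor that is generated by raising the indices of $\epsilon_{i_1 \cdots i_k}$. For instance, in Minkowski space, in contrast to (B.18), the symbol $\epsilon^{\alpha\beta\gamma\delta}$ might be used to denote
\[ \epsilon^{\alpha\beta\gamma\delta} = \eta^{\alpha\alpha'} \eta^{\beta\beta'} \eta^{\gamma\gamma'} \eta^{\delta\delta'}\, \epsilon_{\alpha'\beta'\gamma'\delta'} . \tag{B.19} \]
In particular
\[ \epsilon^{0123} = \eta^{0\alpha'} \eta^{1\beta'} \eta^{2\gamma'} \eta^{3\delta'}\, \epsilon_{\alpha'\beta'\gamma'\delta'} = \det \eta = -1 . \tag{B.20} \]
We conclude that
\[ \epsilon^{\alpha\beta\gamma\delta} = \begin{cases} -1 & \text{if } \mathrm{sgn}(\alpha\beta\gamma\delta) = +1 \\ +1 & \text{if } \mathrm{sgn}(\alpha\beta\gamma\delta) = -1 , \end{cases} \tag{B.21} \]
and $\epsilon^{\alpha\beta\gamma\delta} = 0$ otherwise. Let us summarize this notational confusion with the ‘equation’
\[ \epsilon^{\alpha\beta\gamma\delta} = -\epsilon^{\alpha\beta\gamma\delta} , \tag{B.22} \]
where the l.h.s. is the tensor generated by raising the indices of $\epsilon_{\alpha\beta\gamma\delta}$ and the r.h.s. is the ‘normal’ $\epsilon$-tensor with upstairs indices. Which of the two conventions is followed is (hopefully) clear from the context.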
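The sign flip summarized in (B.22) can be verified by enumerating all index combinations. The sketch assumes the Minkowski metric $\mathrm{diag}(-1, +1, +1, +1)$ and the normalization $\epsilon_{0123} = +1$; the function names are hypothetical.

```python
from itertools import product

def sign(p):
    """Sign of a permutation p of (0, ..., n-1), obtained by counting inversions."""
    inversions = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inversions % 2 else 1

def epsilon(*idx):
    """Levi-Civita symbol: sign of idx if it is a permutation of (0, ..., n-1), else 0."""
    return sign(idx) if sorted(idx) == list(range(len(idx))) else 0

ETA_DIAG = (-1, 1, 1, 1)  # diagonal of the inverse Minkowski metric

def epsilon_raised(a, b, c, d):
    """epsilon_{abcd} with all four indices raised by the (diagonal) metric, cf. (B.19)."""
    return ETA_DIAG[a] * ETA_DIAG[b] * ETA_DIAG[c] * ETA_DIAG[d] * epsilon(a, b, c, d)

flipped_everywhere = all(
    epsilon_raised(*idx) == -epsilon(*idx) for idx in product(range(4), repeat=4)
)
```

Since the metric is diagonal, each non-vanishing component picks up exactly one factor $-1$ from the time index, which is the content of (B.20) and (B.21).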
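\[ \epsilon_{i_1 \cdots i_p\, j_1 \cdots j_q}\; \epsilon^{i_1 \cdots i_p\, l_1 \cdots l_q} = p!\; \delta^{l_1 \ldots l_q}_{j_1 \cdots j_q} , \tag{B.23} \]
where the factor $p!$ enters, because there are $p!$ permutations of the $p$-tuple $(i_1 \cdots i_p)$ that are summed over. In the special case of four dimensions we thus obtain the following relations:
\[ \epsilon_{\nu_1 \nu_2 \nu_3 \alpha}\; \epsilon^{\nu_1 \nu_2 \nu_3 \beta} = 3!\; \delta^\beta_\alpha \tag{B.24b} \]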
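Relation (B.24b) can be confirmed by direct summation. In this sketch the upstairs $\epsilon$ is taken in the sense of (B.18), so that it has the same numerical components as the downstairs one; the function names are hypothetical.

```python
from itertools import product

def sign(p):
    """Sign of a permutation p of (0, ..., n-1), obtained by counting inversions."""
    inversions = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inversions % 2 else 1

def epsilon(*idx):
    """Levi-Civita symbol: sign of idx if it is a permutation of (0, ..., n-1), else 0."""
    return sign(idx) if sorted(idx) == list(range(len(idx))) else 0

def contraction(alpha, beta):
    """Sum over nu1, nu2, nu3 of epsilon_{nu1 nu2 nu3 alpha} epsilon^{nu1 nu2 nu3 beta}."""
    return sum(epsilon(i, j, k, alpha) * epsilon(i, j, k, beta)
               for i, j, k in product(range(4), repeat=3))

table = [[contraction(a, b) for b in range(4)] for a in range(4)]
```

The resulting table is $3! = 6$ times the identity matrix. With the raised tensor (B.21), which is the negative of this one, each entry changes sign.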
When we use the ǫ-tensor (B.21), which is the negative of the tensor used
above, we obtain the following relations:
Here we have used (B.15). The relations (B.25) are useful in many contexts.
Example. If (anti)symmetrization involves two identical objects, the additional
symmetry results in simplifications. The simplest example is
which is proved straightforwardly: $a_{[i} a_{j]} = \frac{1}{2} \big( a_i a_j - a_j a_i \big) = 0$. Analogously,
because $a_i a_j$ is already symmetric. For tensors of higher order things get more complicated; we merely consider one example. Let $F_{ab}$ be antisymmetric, i.e., $F_{[ab]} = F_{ab}$. Then
\[ F_{a[b} F_{cd]} = \frac{1}{3} \big( F_{ab} F_{cd} + 2\, F_{a[c} F_{d]b} \big) . \tag{B.28b} \]
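Identity (B.28b) can be tested for all index values with exact arithmetic. The sketch uses a hypothetical antisymmetric $4 \times 4$ array `F` and hypothetical function names. It also compares with the total antisymmetrization $F_{[ab} F_{cd]}$, on the assumption that (B.28a) is the statement $F_{[ab} F_{cd]} = F_{a[b} F_{cd]}$, as the proof below suggests.

```python
from fractions import Fraction
from itertools import permutations, product

def sign(p):
    """Sign of a permutation p of (0, ..., n-1), obtained by counting inversions."""
    inversions = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inversions % 2 else 1

# Hypothetical antisymmetric sample tensor F_ab.
F = [[0, 1, 2, 3],
     [-1, 0, 4, 5],
     [-2, -4, 0, 6],
     [-3, -5, -6, 0]]

def partial_antisym(a, b, c, d):
    """F_{a[b} F_{cd]}: antisymmetrize F_ab F_cd over the three indices b, c, d."""
    slots = (b, c, d)
    total = 0
    for p in permutations(range(3)):
        x, y, z = (slots[i] for i in p)
        total += sign(p) * F[a][x] * F[y][z]
    return Fraction(total, 6)

def rhs_b28b(a, b, c, d):
    """(F_ab F_cd + 2 F_{a[c} F_{d]b}) / 3, with the bracket written out."""
    return Fraction(F[a][b] * F[c][d] + F[a][c] * F[d][b] - F[a][d] * F[c][b], 3)

def total_antisym(a, b, c, d):
    """F_{[ab} F_{cd]}: antisymmetrize over all four indices."""
    slots = (a, b, c, d)
    total = 0
    for p in permutations(range(4)):
        w, x, y, z = (slots[i] for i in p)
        total += sign(p) * F[w][x] * F[y][z]
    return Fraction(total, 24)

all_indices = list(product(range(4), repeat=4))
b28b_holds = all(partial_antisym(*i) == rhs_b28b(*i) for i in all_indices)
b28a_holds = all(total_antisym(*i) == partial_antisym(*i) for i in all_indices)
```

Both flags come out true for this sample; a numerical check of this kind does not replace the proof, but it catches sign and factor errors quickly.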
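Let us prove (B.28a). (Recall that hatted indices are excluded from the antisymmetrization; for example, in $[ab\hat{c}d]$ the antisymmetrization involves only the indices $[abd]$, hence $A_{[ab\hat{c}d]} = \frac{1}{6} \big( A_{ab\hat{c}d} - A_{ba\hat{c}d} + A_{bd\hat{c}a} + \dots \big)$.)
\begin{align*}
F_{[ab} F_{cd]} &= \frac{1}{4} \Big( F_{a[b} F_{cd]} - F_{[b\hat{a}} F_{cd]} + F_{[bc} F_{\hat{a}d]} - F_{[bc} F_{d]a} \Big) \\
 &= \frac{1}{4} \Big( F_{a[b} F_{cd]} - F_{[cd} \underbrace{F_{b]a}}_{-F_{\hat{a}b]}} + F_{a[d} F_{bc]} - F_{[bc} \underbrace{F_{d]a}}_{-F_{\hat{a}d]}} \Big) \\
 &= \frac{1}{4} \Big( F_{a[b} F_{cd]} + F_{a[b} F_{cd]} + F_{a[d} F_{bc]} + F_{a[d} F_{bc]} \Big) = F_{a[b} F_{cd]}
\end{align*}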
We have only used elementary considerations about permutations; furthermore,
we have employed the antisymmetry of Fab and the fact that cyclic permuta-
tions of a triple do not change signs. The proof of (B.28b) is simpler, since
1
Fa[b Fcd] = Fab F[cd] + Fa[c Fd]b + Fa[d Fb̂c]
3 |{z}
−Fc]b