Harvard - Black Holes, Hawking Radiation, and the Firewall - Noah Miller
Abstract
Here I give a friendly presentation of the black hole information
problem and the firewall paradox for computer science people who
don’t know physics (but would like to). Most of the notes are just
requisite physics background. There are six sections. 1: Special Rela-
tivity. 2: General Relativity. 3: Quantum Field Theory. 4: Statistical
Mechanics. 5: Hawking Radiation. 6: The Information Paradox.
Contents

1 Special Relativity
  1.1 Causality and light cones
  1.2 Space-time interval
  1.3 Penrose Diagrams

2 General Relativity
  2.1 The metric
  2.2 Geodesics
  2.3 Einstein’s field equations
  2.4 The Schwarzschild metric
  2.5 Black Holes
  2.6 Penrose Diagram for a Black Hole
  2.7 Black Hole Evaporation

3 Quantum Field Theory
  3.5 The Hamiltonian
  3.6 The Ground State
  3.7 Particles
  3.8 Entanglement properties of the ground state

4 Statistical Mechanics
  4.1 Entropy
  4.2 Temperature and Equilibrium
  4.3 The Partition Function
  4.4 Free energy
  4.5 Phase Transitions
  4.6 Example: Box of Gas
  4.7 Shannon Entropy
  4.8 Quantum Mechanics, Density Matrices
  4.9 Example: Two state system
  4.10 Entropy of Mixed States
  4.11 Classicality from environmental entanglement
  4.12 The Quantum Partition Function

5 Hawking Radiation
  5.1 Quantum Field Theory in Curved Space-time
  5.2 Hawking Radiation
  5.3 The shrinking black hole
  5.4 Hawking Radiation is thermal
  5.5 Partner Modes
1 Special Relativity
1.1 Causality and light cones
There are four dimensions: the three spatial dimensions and time.
Every “event” that happens takes place at a coordinate labelled by
(t, x, y, z).
Figure 2: The worldline of something moving with velocity v.
Figure 3: A lightcone.
The “past light cone” consists of all of the space-time points that can
send a message to that point. The “future light cone” consists of all of the
space-time points that can receive a message from that point.
1.2 Space-time interval
In special relativity, time passes slower for things that are moving. If
your friend were to pass you by in a very fast spaceship, you would see
their watch tick slower, their heartbeat thump slower, and their mind
process information slower.
If your friend is moving with velocity v, you will see their time pass
slower by a factor of
γ = 1/√(1 − v²/c²).    (2)
For small v, γ ≈ 1. As v approaches c, γ shoots to infinity.
Let’s say your friend starts at a point (t1 , x1 ) and moves at a constant
velocity v to a point (t2 , x2 ).
Define
∆t = t2 − t1
∆x = x2 − x1 .
From your perspective, your friend has moved forward in time by ∆t.
However, because time passes slower for your friend, their watch will
have only ticked forward the amount
∆τ ≡ ∆t/γ.    (3)

Here, ∆τ is the so-called “proper time” that your friend experiences along
their journey from (t1 , x1 ) to (t2 , x2 ).
Everybody will agree on what ∆τ is. Sure, people using different
coordinate systems will not agree on the exact values of t1 , x1 , t2 , x2 ,
or v. However, they will all agree on the value of ∆τ . This is because
∆τ is a physical quantity! We can just look at our friend’s watch and
see how much it ticked along its journey!
Figure 5: The time elapsed on your friend’s watch during their journey
is the invariant “proper time” of that space-time interval.
Usually, people like to write this in a different way, using v = ∆x/∆t:

(∆τ)² = (∆t)²/γ²
      = (∆t)²(1 − v²/c²)
      = (∆t)² − (1/c²)(∆x)².

In three spatial dimensions the same steps give (∆τ)² = (∆t)² − (1/c²)((∆x)² + (∆y)² + (∆z)²).
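If you like, you can check these formulas numerically. Here is a small Python sketch (the language choice and the numbers are mine, not from the notes) that computes γ and ∆τ for a fast-moving friend:

```python
import math

C = 299_792_458.0  # speed of light, m/s

def gamma(v: float) -> float:
    """Lorentz factor, Eq. 2: gamma = 1/sqrt(1 - v^2/c^2)."""
    return 1.0 / math.sqrt(1.0 - (v / C) ** 2)

def proper_time(dt: float, dx: float) -> float:
    """Proper time along a constant-velocity trip: sqrt(dt^2 - dx^2/c^2)."""
    return math.sqrt(dt**2 - (dx / C) ** 2)

# A friend crosses dx = 8 light-seconds in dt = 10 seconds (v = 0.8c).
dt, dx = 10.0, 8.0 * C
v = dx / dt
print(gamma(v))             # 5/3 ≈ 1.667
print(proper_time(dt, dx))  # ≈ 6.0 seconds
print(dt / gamma(v))        # same answer, via Eq. 3
```

Both routes to ∆τ agree, as they must, since Eq. 3 and the interval formula are algebraically the same statement.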
1.3 Penrose Diagrams
Penrose diagrams are used by physicists to study the “causal struc-
ture of space-time,” i.e., which points can affect (and be affected by)
other points. One difficult thing about our space-time diagrams is that
t and x range from −∞ to ∞. Therefore, it would be nice to reparam-
eterize them so that they have a finite range. This will allow us to look
at all of space-time on a finite piece of paper.
Doing this will severely distort our diagram and the distances be-
tween points. However, we don’t really care about the exact distances
between points. The only thing we care about preserving is 45◦ angles.
We are happy to distort everything else.
To recap, a Penrose diagram is just a reparameterization of our usual
space-time diagram that

1. fits all of space-time into a finite diagram,

2. preserves 45◦ angles, so light rays still travel along 45◦ lines, and

3. lets us easily see how all space-time points are causally related.
Figure 7: Lines of constant t and constant x.
Let’s talk about a few features of the diagram. The bottom corner is
the “distant past.” All particles moving slower than c will emerge from
there. Likewise, the top corner is the “distant future,” where all particles
moving slower than c will end up. Even though each is just one point
in our picture, they really represent an infinite number of points.
The right corner and left corner are two points called “spacelike in-
finity.” Nothing physical ever comes out of those points.
The diagonal edges are called “lightlike infinity.” Photons emerge
from one diagonal, travel at a 45◦ angle, and end up at another diagonal.
2 General Relativity
2.1 The metric
Space-time is actually curved, much like the surface of the Earth.
However, locally, the Earth doesn’t look very curved. While it is not
clear how to measure large distances on a curved surface, there is no
trouble measuring distances on a tiny scale where things are basically
flat.
Figure 10: A curved surface is flat on tiny scales. Here, the distance,
A.K.A proper time, between nearby points is labelled dτ .
Say you have two points which are very close together on a curved
space-time, and an observer travels between the two at a constant ve-
locity. Say the two points are separated by the infinitesimal interval
dxµ, where µ = 0, 1, 2, 3.
In general we can write the proper time dτ elapsed on the observer’s
watch as

dτ² = Σ_{µ,ν=0}^{3} gµν dxµ dxν.    (5)
combinations of dxµ , is the most general possible form for a distance we
could have. I should note that Eq. 5 could be written as

dτ² = aᵀ M a

where a is the column vector of differentials (dt, dx, dy, dz) and M is
the matrix of metric components gµν. For flat space-time,

gµν = diag(1, −1, −1, −1).    (7)
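If you want to see the aᵀMa form of Eq. 5 concretely, here is a small numpy sketch (my own illustration, not from the notes) that contracts a displacement against the flat metric of Eq. 7:

```python
import numpy as np

# Flat (Minkowski) metric, Eq. 7, in units with c = 1.
g = np.diag([1.0, -1.0, -1.0, -1.0])

def dtau_squared(g: np.ndarray, dx: np.ndarray) -> float:
    """dtau^2 = sum_{mu,nu} g_{mu nu} dx^mu dx^nu (Eq. 5), i.e. a^T M a."""
    return float(dx @ g @ dx)

dx_timelike = np.array([1.0, 0.5, 0.0, 0.0])   # (dt, dx, dy, dz), slower than light
print(dtau_squared(g, dx_timelike))            # 0.75 > 0: inside the lightcone

dx_spacelike = np.array([0.5, 1.0, 0.0, 0.0])  # would require faster-than-light travel
print(dtau_squared(g, dx_spacelike))           # -0.75 < 0: outside the lightcone
```

The sign of the result is exactly the dτ² > 0 / dτ² < 0 distinction discussed below Eq. 7.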
However, in general relativity, we are interested in dynamical metrics
which vary from point to point.
I should mention one further thing. Just because Eq. 5 says dτ 2 =
(something), that doesn’t mean that dτ 2 is the square of some quantity
“dτ .” This is because the metric gµν is not positive definite. We can see
that for two nearby points that are contained within each other’s light-
cones, dτ 2 > 0. However, if they are outside of each other’s lightcones,
then dτ 2 < 0, meaning dτ 2 is not the square of some dτ . If dτ 2 = 0,
then the points are on the “rim” of each other’s light cones.
While the metric gives us an infinitesimal notion of distance, we have
to integrate it in order to figure out a macroscopic notion of distance.
Say you have a path in space-time. The total “length” of that path ∆τ
is just the integral of dτ along the path.

∆τ = ∫ dτ = ∫ √( Σ_{µ,ν} gµν dxµ dxν )    (8)
Figure 11: The proper time ∆τ along a path in space-time gives the
elapsed proper time for a clock which follows that path.
2.2 Geodesics
Let’s think about 2D flat space-time again. Imagine all the paths
that start at (t1 , x1 ) and end at (t2 , x2 ). If we integrate dτ along this
path, we will get the proper time experienced by an observer travelling
along that path.
Figure 12: Each path from (t1 , x1 ) to (t2 , x2 ) has a different proper
time ∆τ = ∫ dτ.
Remember that when things travel faster, time passes slower. The
more wiggly a path is, the faster that observer is travelling on average,
and the less proper time passes for them. The observer travelling on the
straight path experiences the most proper time of all.
Newton taught us that things move in straight lines if not acted on
by external forces. There is another way to understand this fact: things
move on paths that maximize their proper time when not acted on by
an external force.
This remains true in general relativity. Things like to move on
paths that maximize

∆τ = ∫ dτ.
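We can check the claim that wiggly paths experience less proper time. Here is a small Python sketch (my own toy example, in units where c = 1) comparing a straight path with a path that dashes out and back:

```python
import math

def proper_time(path):
    """Total proper time sum of sqrt(dt^2 - dx^2) along a piecewise-linear
    path of (t, x) points, in units with c = 1 (assumes every segment is
    slower than light, so each square root is real)."""
    tau = 0.0
    for (t1, x1), (t2, x2) in zip(path, path[1:]):
        tau += math.sqrt((t2 - t1) ** 2 - (x2 - x1) ** 2)
    return tau

straight = [(0, 0), (10, 0)]          # stay put for 10 units of time
wiggly   = [(0, 0), (5, 3), (10, 0)]  # dash out at v = 0.6c and come back

print(proper_time(straight))  # 10.0
print(proper_time(wiggly))    # 8.0 -- less proper time, as claimed
```

The straight (inertial) path maximizes ∆τ; any detour at finite speed loses proper time, which is the twin-paradox statement in miniature.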
The stress energy tensor Tµν can be thought of as a shorthand for
the energy density in space. Wherever there is stuff, there is a non-zero
Tµν . The exact form of Tµν depends on what the “stuff” actually is.
More specifically, the different components of Tµν correspond to dif-
ferent physical quantities.
Figure 14: Tµν is large where there is stuff, and 0 in the vacuum of
space.
Notice, however, that the Earth itself also has some matter density,
so it curves space-time as well. The thing is that it curves space-time
a lot less than the sun does. If we want to solve for the motion of
the Earth, we pretend it doesn’t have any mass and just moves in the
fixed “background” metric created by the sun. However, this is only an
approximation.
dτ² = (1 − 2GM/r) dt² − dr²/(1 − 2GM/r) − r²(dθ² + sin²θ dφ²).    (11)

gµν = diag( 1 − 2GM/r,  −1/(1 − 2GM/r),  −r²,  −r² sin²θ )    (12)
Here we are using the spherical coordinates
(t, r, θ, φ)
where r is the radial coordinate and θ and φ are the “polar” and “az-
imuthal” angles on the surface of a sphere, respectively.
This is the first metric that was found using Einstein’s equations. It
was derived by a German man named Karl Schwarzschild. He wanted to
figure out what the metric was for a non-rotating spherically symmetric
gravitating body of mass M , like the sun. Outside of the radius of the
sun, the Schwarzschild metric does give the correct form for the metric
there. Inside the sun, the metric needs to be modified and becomes more
complicated.
Interestingly, the metric “blows up” at the origin r = 0. Karl
Schwarzschild just assumed that this wasn’t physical. Because a real
planet or star would need the metric to be modified inside of its vol-
ume, this singularity would not exist in those cases. He assumed that
the singularity would not be able to form in real life under any circum-
stances. Einstein himself was disturbed by the singularity, and made
a number of flawed arguments for why such singularities can’t exist. We know now
that he wasn’t right, and that these singularities really do form in real
life inside of what we call “black holes.”
In one of the amazing coincidences of history, “Schwarz” means “black”
in German while “schild” means “shield.” It appears that Karl Schwarzschild
was always destined to discover black holes, even if he himself didn’t
know that.
Note that at r = rs , the dt component of the metric becomes 0 and
the dr component becomes infinite. This particular singularity isn’t
“real.” It’s a “coordinate singularity.” There are other coordinates we
could use, like the Kruskal–Szekeres coordinates that do not have this
unattractive feature. We will ignore this.
The more important thing to note is that the dt and dr components
flip signs as r dips below 2GM . This is very significant. Remember that
the flat space metric is

dτ² = dt² − dx² − dy² − dz²   (in units with c = 1).

The only thing that distinguishes time and space is a sign in the metric!
This sign flips once you cross the event horizon.
Here is why this is important. Say that a massive particle moves a
tiny bit to a nearby space-time point which is separated from the original
point by dτ . If the particle is moving slower than c, then dτ 2 > 0.
However, inside of a black hole, as per Eq. 11, we can see that when
dτ 2 > 0, the particle must either be travelling into the center of the
black hole or away from it. This is just because 11 is of the form
dτ² = (+)dt² + (−)dr² + (−)dθ² + (−)dφ²   if r > 2GM
dτ² = (−)dt² + (+)dr² + (−)dθ² + (−)dφ²   if r < 2GM

where (+) denotes a positive quantity and (−) denotes a negative quan-
tity. In order that dτ² > 0, we must have dt² > 0 outside of the event
horizon but dr² > 0 inside the horizon, so dr cannot be 0.
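You can see the sign flip directly by evaluating the dt and dr components of the Schwarzschild metric (Eq. 12) on either side of r = 2GM. A small Python sketch (my own, in units with c = 1 and GM = 1, so the horizon sits at r = 2):

```python
def schwarzschild_diag(r: float, GM: float = 1.0):
    """Diagonal (g_tt, g_rr) components of the Schwarzschild metric, Eq. 12,
    in units with c = 1; the angular components are omitted."""
    f = 1.0 - 2.0 * GM / r
    return f, -1.0 / f

outside = schwarzschild_diag(r=3.0)  # r > 2GM
inside  = schwarzschild_diag(r=1.0)  # r < 2GM

print(outside)  # g_tt > 0, g_rr < 0: t is the timelike direction
print(inside)   # g_tt < 0, g_rr > 0: r has become the timelike direction
```

Outside the horizon the signs look just like flat space; inside, dt² and dr² have traded signs, which is the whole reason infalling particles must keep moving to smaller r.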
Furthermore, if the particle started outside of the event horizon and
then went in, travelling with dr < 0 along its path, then by continuity
it has no choice but to keep travelling inside with dr < 0 until it hits
the singularity.
The reason that a particle cannot “turn around” and leave the black
hole is the exact same reason why you cannot “turn around” and go back
in time. If you think about it, there is a similar “horizon” between you
and your childhood. You can never go back. If you wanted to go back
in time, at some point you would have to travel faster than the speed of
light (faster than 45◦ ).
The r coordinate becomes “time-like” behind the event horizon.
Figure 15: Going back in time requires going faster than c, which is
impossible.
Figure 16: Once you have passed the event horizon of a black hole, r
and t “flip,” so now going into the future means going further into the
black hole until you hit the singularity.
Figure 17: Penrose diagram of maximally extended space-time with
Schwarzschild metric.
There is a lot to unpack here. Let’s start with the right hand dia-
mond. This is space-time outside of the black hole, where everyone is
safe. The upper triangle is the interior of the black hole. Because the
boundary is a 45◦ angle, once you enter you cannot leave. This is the
event horizon. The jagged line up top is the singularity that you are
destined to hit once you enter the black hole. From the perspective of
people outside the black hole, it takes an infinite amount of time for
something enter the black hole. It only enters at t = +∞
Figure 18: Penrose diagram of black hole with some lines of constant
r and t labelled.
Figure 19: Two worldlines in this space-time, one which enters the
black hole and one which does not.
I’m sure you noticed that there are two other parts to the diagram.
The bottom triangle is the interior of the “white hole” and the left hand
diamond is another universe! This other universe is invisible to the
Schwarzschild coordinates, and only appears once the coordinates are
“maximally extended.”
First let’s look at the white hole. There’s actually nothing too crazy
about it. If something inside the black hole is moving away from the
singularity (with dr > 0) it has no choice but to keep doing so until it
leaves the event horizon. So the stuff that starts in the bottom triangle
is the stuff that comes out of the black hole. (In this context, however,
we call it the white hole). It enters our universe at t = −∞. It is
impossible for someone on the outside to enter the white hole. If they
try, they will only enter the black hole instead. This is because they
can’t go faster than 45◦ !
Figure 20: Stuff can come out of the white hole and enter our universe
at t = −∞.
Okay, now what the hell is up with this other universe? It’s exactly
the same as our universe, but different. Note that two people in the
different universes can both enter the black hole and meet inside. How-
ever, they are both doomed to hit the singularity soon after. The two
universes have no way to communicate outside of the black hole.
Figure 21: People from parallel universes can meet inside the black
hole.
But wait! Hold the phone! Black holes exist in real life, right? Is
there a mirror universe on the other side of every black hole????
No. The Schwarzschild metric describes an “eternal black hole” that
has been there since the beginning of time and will be there until the
end of time. Real black holes are not like this. They form when stars
collapse. It is more complicated to figure out what the metric is if you
want to take stellar collapse into account, but it can be done. I will not
write the metric, but I will draw the Penrose diagram.
Figure 22: A Penrose diagram for a black hole that forms via stellar
collapse.
Because the black hole forms at some finite time, there is no white
hole in our Penrose diagram. Likewise, there is no mirror universe.
It’s interesting to turn the Penrose diagram upside down, which is
another valid solution to Einstein’s equations. This depicts a universe
in which a white hole has existed since the beginning of the universe. It
keeps spewing out material, getting smaller and smaller, until it disap-
pears at some finite time. No one can enter the white hole. If they try,
they will only see it spew material faster and faster as they get closer.
The white hole will dissolve right before their eyes. That is why they
can’t enter it.
Figure 23: The Penrose diagram for a white hole that exists for some
finite time.
2.7 Black Hole Evaporation
I have not mentioned anything about quantum field theory yet, but
I will give you a spoiler: black holes evaporate. This was discovered by
Stephen Hawking in 1975. They radiate energy in the form of very low
energy particles until they do not exist any more. This is a unique fea-
ture of what happens to black holes when you take quantum field theory
into account, and is very surprising. Having said that, this process is
extremely slow. A black hole with the mass of our sun would take 10^67
years to evaporate. Let’s take a look at the Penrose diagram for a black
hole which forms via stellar collapse and then evaporates.
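The 10^67-year figure can be reproduced from the standard Hawking evaporation-time formula t = 5120πG²M³/(ħc⁴), which these notes do not derive; take this Python sketch as my own back-of-the-envelope check rather than anything rigorous:

```python
import math

# Approximate physical constants, SI units
G     = 6.674e-11   # gravitational constant
HBAR  = 1.055e-34   # reduced Planck constant
C     = 2.998e8     # speed of light
M_SUN = 1.989e30    # solar mass, kg
YEAR  = 3.156e7     # seconds per year

def evaporation_time_years(M: float) -> float:
    """Hawking evaporation time t = 5120 * pi * G^2 * M^3 / (hbar * c^4),
    converted to years. (Standard textbook result; the notes only quote
    the ~10^67-year answer for a solar-mass black hole.)"""
    t_seconds = 5120 * math.pi * G**2 * M**3 / (HBAR * C**4)
    return t_seconds / YEAR

print(evaporation_time_years(M_SUN))  # on the order of 10^67 years
```

Note the M³ scaling: a black hole twice as massive takes eight times longer to evaporate, and small black holes evaporate (comparatively) fast.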
Figure 24: The Penrose diagram for a black hole which forms via stellar
collapse and then evaporates.
3 Quantum Field Theory
While reading this section, forget I told you anything about general
relativity. This section only applies to flat Minkowski space and has
nothing to do with black holes.
Ĥ : H → H    (17)
3.2 Quantum Field Theory vs Quantum Mechanics
3.3 The Hilbert Space of QFT: Wavefunctionals
A classical field is a function φ(x) from space into R.
φ : R3 → R. (22)
φ ∈ C ∞ (R3 ). (23)
Now I’m going to tell you what a quantum field state is. Are you
ready? A quantum field state is a functional from classical fields to
complex numbers.
Ψ : C ∞ (R3 ) → C (24)
HQFT = all such wave functionals (25)
These are called “wave functionals.” Let’s say you have two wave func-
tionals Ψ and Φ. The inner product is this infinite dimensional integral,
which integrates over all possible classical field configurations:
⟨Ψ|Φ⟩ = ∫ ∏_{x∈R³} dφ(x) Ψ[φ]* Φ[φ].    (26)
on a supercomputer. Furthermore, physicists have many reasons to
believe that if we had a full theory of quantum gravity, we would realize
that quantum field theory as we know it does break down at very tiny
Planck-length sized distances. Most likely it would not be anything as
crude as a literal lattice, but something must be going on at really small
lengths. Anyway, because we don’t have a theory of quantum gravity,
this is the best we can do for now.
The physical interpretation is that if |Ψ[φ]|2 is very big for a partic-
ular φ, then the quantum field is very likely to “be” in the classical field
configuration φ.
Note that we have a basis of wave functionals given by
Ψφ0 [φ] ∝ 1 if φ = φ0 , and 0 if φ ≠ φ0 .    (27)

|Ψφ0 ⟩

(You should think of these as the i in |i⟩. Each classical field φ0 labels
a “coordinate” of the QFT Hilbert space.) All other wave functionals
can be written as a linear combination of these wave functionals with
complex coefficients. However, this basis of the Hilbert space is physi-
cally useless. You would never ever see a quantum field state like these
in real life. (The reason is that they have infinite energy.) I will tell you
about a more useful basis for quantum field states a bit later.
Ô : H → H (28)
There are two important sets of observables I have to tell you about
for these wave functionals. They are called φ̂(x) and π̂(x).
There are an infinite number of them, one for each x. You should think
of the measurements φ̂(x) and π̂(x) as measurements occurring at the
point x in space. They are linear operators.
Note that our previously defined Ψφ0 are eigenstates of the operator φ̂(x).
For any φ0 ∈ C∞(R³), we have

φ̂(x) Ψφ0 = φ0 (x) Ψφ0 .
is the three dimensional volume measure.) This is just the infinite di-
mensional version of the partial derivative in multivariable calculus.
∂xi /∂xj = δij .    (37)
Basically, π̂(x) measures the rate of change of the wave functional with
respect to one particular field variable φ(x). (The i is there to make it
self-adjoint.) I don’t want to get bogged down in its physical interpre-
tation.
where

∂x φ̂(x, y, z) = lim_{∆x→0} [ φ̂(x + ∆x, y, z) − φ̂(x, y, z) ] / ∆x.
Let’s get an intuitive understanding for this Hamiltonian by looking at
it term by term.
The
π̂(x)²
term means that a wavefunctional has a lot of energy if it changes quickly
when a particular field variable is varied.
For the other two terms, let’s imagine that our fields are well ap-
proximated by the state Ψφ0 , i.e. it is one of those basis states we talked
about previously. This means it is “close” to being a “classical” field.
Then the
(∇φ̂)2
term means that a wavefunctional has a lot of energy if φ0 has a big
gradient. Similarly, the
m2 φ̂2
term means the wave functional has a lot of energy if φ0 is non-zero in
a lot of places.
Figure 26: Some sample classical field configurations and the relative
size of Ψ0 when evaluated at each one. The upper-left field maximizes
Ψ0 because it is 0. The upper-right field is pretty close to 0, so Ψ0 is
still pretty big. The lower-left field makes Ψ0 small because it contains
large field values. The lower-right field makes Ψ0 small because its
frequency |k| is large even though the Fourier coefficient is not that large.
3. The m2 φ2 term wants likely classical field configurations to have
field values φ(x) close to 0. This is minimized by making Ψ0 peak
around the classical field configuration φ(x) = 0.
Now that we have some appreciation for the ground state, I want to
rewrite it in a suggestive way:
Ψ0 [φ] ∝ exp( −(1/2ħ) ∫ d³k √(k² + m²) |φk |² )
       ∝ ∏_{k∈R³} exp( −(1/2ħ) √(k² + m²) |φk |² ).
We can see that |Ψ0 i “factorizes” nicely when written in terms of the
Fourier components φk of the classical field input.
3.7 Particles
You ask, “Alright, great, I can see what a quantum field is. But what
does this have to do with particles?”
Great question. These wave functionals seem to have nothing to do
with particles. However, the particle states are hiding in these wave
functionals, somehow. It turns out that we can make wave functionals
that describe a state with a certain number of particles possessing some
specified momenta ~k. Here is how you do it:
Let’s say that for each k, there are nk particles present with momenta
~k. Schematically, the wavefunctionals corresponding to these states are
Ψ[φ] ∝ ∏_{k∈R³} Fnk (φk , φ−k ) exp( −(1/2ħ) √(k² + m²) |φk |² ).    (43)
These states are definite energy states because they are eigenstates of
the Hamiltonian.
Ĥ |nk1 , nk2 , nk3 , . . .⟩ = ( E0 + Σ_{k∈R³} nk ħ√(k² + m²) ) |nk1 , nk2 , nk3 , . . .⟩    (45)
(Remember that E0 is the energy of the ground state |Ψ0 i.) If you ever
took a class in special relativity, you would have learned that the energy
E of a particle with momentum p~ and mass m is equal to
E² = p²c² + m²c⁴.    (46)
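Eq. 46 is easy to sanity-check numerically. A tiny Python sketch (my own addition; the √(k² + m²) in Eq. 45 is the same relation in units where ħ = c = 1):

```python
import math

def energy(p: float, m: float, c: float = 1.0) -> float:
    """Relativistic energy from Eq. 46: E = sqrt(p^2 c^2 + m^2 c^4)."""
    return math.sqrt((p * c) ** 2 + (m * c**2) ** 2)

print(energy(p=0.0, m=1.0))  # 1.0: rest energy, E = mc^2
print(energy(p=3.0, m=4.0))  # 5.0: a 3-4-5 sanity check
print(energy(p=2.0, m=0.0))  # 2.0: massless particle, E = pc
```

The two limits are the familiar ones: at p = 0 you recover E = mc², and at m = 0 you recover the photon relation E = pc.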
Figure 27: A localized wave packet is the sum of completely delocalized
definite-frequency waves. Note that you can’t localize a wave packet into
a volume that isn’t at least a few times as big as its wavelength.
order to calculate what the probability of such an event is.
(Once again, we might imagine that our tensor product is not truly taken
over all of R3 , but perhaps over a lattice of Planck-length spacing, for
all we know.) Each local Hilbert space Hx is given by all normalizable
functions from R → C. Following mathematicians, we might call such
functions L2 (R).
Hx = L2 (R)
Fixing x, each state in Hx simply assigns a complex number to each
possible classical value of φ(x). Once we tensor together all Hx , we
recover our space of field wave functionals. The question I now ask you
is: what are the position-space entanglement properties of the ground
state?
Let’s back up a bit and remind ourselves what the ground state is
again. We wrote it in terms of the Fourier components:
Ψ0 [φ] ∝ exp( −(1/2ħ) ∫ d³k √(k² + m²) |φk |² )

φk ≡ ∫ d³x/(2π)^(3/2) φ(x) e^(−ik·x)
We can plug in the bottom expression into the top expression to express
Ψ0 [φ] in terms of the position space classical field φ(x).
Ψ0 [φ] ∝ exp( −(1/2ħ) ∫∫∫ d³k d³x d³y/(2π)³ e^(−ik·(x−y)) √(k² + m²) φ(x)φ(y) )
       ∝ exp( −(1/2ħ) ∫∫ d³x d³y/(2π)³ f(|x − y|) φ(x)φ(y) )
One could in principle perform the k integral to compute f (|x − y|),
although I won’t do that here. (There’s actually a bit of funny business
you have to do, introducing a “regulator” to make the integral converge.)
The important thing to note is that the values of the field variables φ(x)
and φ(y) are entangled together by f (|x − y|), and the wave functional
Ψ0 does not factorize nicely in position space the way it did in Fourier
space. The bigger f (|x−y|) is, the larger the entanglement between Hx
and Hy is. We can see that in the ground state, the value of the field at
one point is quite entangled with the field at other points. Indeed, there
is a lot of short-range entanglement all throughout the universe. How-
ever, it turns out that f (|x − y|) becomes very small at large distances.
Therefore, nearby field variables are highly entangled, while distant field
variables are not very entangled.
This is not such a mysterious property. If your quantum field is in
the ground state, and you measure the value of the field at some x to
be φ(x) then all this means is that nearby field values are likely to also
be close to φ(x). This is just because the ground state wave functional
is biggest for classical fields that vary slowly in space.
You might wonder if this entanglement somehow violates causality.
Long story short, it doesn’t. This entanglement can’t be used to send
information faster than light. (However, it does have some unintuitive
consequences, such as the Reeh–Schlieder theorem.)
Let me wrap this up by saying what this has to do with the Firewall
paradox. Remember, in this section we have only discussed QFT in flat
space! However, while the space-time at the horizon of a black hole is
curved, it isn’t curved that much. Locally, it looks pretty flat. There-
fore, one would expect for quantum fields in the vicinity of the horizon
to behave much like they would in flat space. This means that low en-
ergy quantum field states will still have a strong amount of short-range
entanglement because short-range entanglement lowers the energy of the
state. (This is because of the (∇φ̂)2 term in the Hamiltonian.) However,
the Firewall paradox uses the existence of this entanglement across the
horizon to make a contradiction. One resolution to the contradiction is
to say that there’s absolutely no entanglement across the horizon what-
soever. This would mean that there is an infinite energy density at the
horizon, contradicting the assumption that nothing particularly special
happens there.
4 Statistical Mechanics
4.1 Entropy
Statistical Mechanics is a branch of physics that pervades all other
branches. Statistical mechanics is relevant to Newtonian mechanics,
relativity, quantum mechanics, and quantum field theory.
(x1 , y1 , z1 , px1 , py1 , pz1 , . . . , xN , yN , zN , pxN , pyN , pzN ) ∈ R^(6N)
A “microstate” is a state of the above form. It contains absolutely
all the physical information that an omniscient observer could know. If
you were to know the exact microstate of a system and knew all of the
laws of physics, you could in principle deduce what the microstate will
be at all future times and what the microstate was at all past times.
However, practically speaking, we can never know the true microstate
of a system. For example, you could never know the positions and mo-
menta of every damn particle in a box of gas. The only things we can
actually measure are macroscopic variables such as internal energy, vol-
ume, and particle number (U, V, N ). A “macrostate” is just a set of
microstates. For examples, the “macrostate” of a box of gas labelled by
(U, V, N ) would be the set of all microstates with energy U , volume V ,
and particle number N . The idea is that if you know what macrostate
your system is in, you know that your system is equally likely to truly
be in any of the microstates it contains.
Figure 29: You may know the macrostate, but only God knows the
microstate.
Entropy is a quantity associated with a macrostate. If a macrostate is just a set of Ω microstates,
then the entropy S of the system is
S ≡ k log Ω. (48)
Here, k is Boltzmann’s constant. It is a physical constant with units of
energy / temperature.
k ≡ 1.38065 × 10⁻²³ Joules/Kelvin    (49)
The only reason that we need k to define S is because the human race
defined units of temperature before they defined entropy. (We’ll see
how temperature factors into any of this soon.) Otherwise, we probably
would have set k = 1 and temperature would have the same units as
energy.
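To make Ω and S = k log Ω concrete, here is a toy Python example (my own; coins standing in for particles), where the macrostate is “n heads out of N coins” and the microstates are the particular assignments of heads:

```python
import math

K_B = 1.380649e-23  # Boltzmann's constant, J/K

def entropy(omega: int) -> float:
    """S = k log(Omega), Eq. 48."""
    return K_B * math.log(omega)

# Toy macrostate: 100 coins, macrostate = "exactly 50 heads".
# Microstates = which particular coins are heads: Omega = C(100, 50).
omega = math.comb(100, 50)
print(omega)           # about 1e29 microstates
print(entropy(omega))  # about 9.2e-22 J/K

# The "all heads" macrostate contains exactly one microstate: zero entropy.
print(entropy(math.comb(100, 100)))  # 0.0
```

The half-heads macrostate dwarfs the all-heads one, which is the usual statistical-mechanics moral: high-entropy macrostates are the overwhelmingly likely ones simply because they contain vastly more microstates.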
You might be wondering how we actually count Ω. As you probably
noticed, the phase space R6N is not discrete. In that situation, we
integrate over a phase space volume with the measure
d3 x1 d3 p1 . . . d3 xN d3 pN .
However, this isn’t completely satisfactory because position and mo-
mentum are dimensionful quantities while Ω should be a dimensionless
number. We should therefore divide by a constant with units of posi-
tion times momentum. Notice, however, that because S only depends
on log Ω, any constant rescaling of Ω will only alter S by a constant and
will therefore never affect the change in entropy ∆S of some process. So
while we have to divide by a constant, whichever constant we divide by
doesn’t affect the physics.
Anyway, even though we are free to choose whatever dimensionful
constant we want, the “best” is actually Planck’s constant h! Therefore,
for a classical macrostate that occupies a phase space volume Vol,
Z N
1 1 Y
Ω= d3 xi d3 pi . (50)
N ! h3N Vol i=1
in there and know where everything is. Therefore, the macrostate you
use to describe your room contains very few microstates and has a small
entropy. However, according to your mother who has not studied your
room very carefully, the entropy of your room is very large. The point
is that while everyone might agree your room is messy, the entropy of
your room really depends on how little you know about it.
Say system A could be in one of ΩA possible microstates and system
B could be in ΩB possible microstates. Therefore, the total AB system
could be in ΩA ΩB possible microstates. Therefore, the entropy SAB of
both systems combined is just the sum of entropies of both sub-systems.
Let’s say that the internal energy of system A is UA and the internal
energy of system B is UB . Crucially, note that the total energy of
combined system
UAB = UA + UB
is constant over time! This is because energy of the total system is
conserved. Therefore,
dUA = −dUB .
Now, the combined system will maximize its entropy when UA and UB
have some particular values. Knowing the value of UA is enough though,
because UB = UAB − UA . Therefore, entropy is maximized when
0 = ∂SAB /∂UA .    (54)
However, we can rewrite this as
0 = ∂SAB /∂UA
  = ∂SA /∂UA + ∂SB /∂UA
  = ∂SA /∂UA − ∂SB /∂UB
  = 1/TA − 1/TB .
Therefore, our two systems are in equilibrium if they have the same
temperature!
TA = TB (55)
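We can watch this happen numerically. Here is a Python sketch (my own toy model, using the hypothetical entropy function S(U) = (3N/2) k ln U, which is the energy-dependent part of a monatomic ideal gas's entropy) that scans over how a fixed total energy is split between A and B and locates the entropy maximum:

```python
import numpy as np

K = 1.0  # work in units where Boltzmann's constant k = 1

def S(U, N):
    """Toy entropy S = (3N/2) k ln(U) -- only the U-dependence matters here."""
    return 1.5 * N * K * np.log(U)

N_A, N_B, U_TOT = 2.0, 6.0, 8.0

# Scan over every way of splitting the fixed total energy between A and B.
U_A = np.linspace(0.01, U_TOT - 0.01, 100_000)
S_total = S(U_A, N_A) + S(U_TOT - U_A, N_B)

best = U_A[np.argmax(S_total)]
# For this S(U), 1/T = dS/dU = (3N/2)k / U, so T = U / ((3/2) N k).
T_A = best / (1.5 * N_A * K)
T_B = (U_TOT - best) / (1.5 * N_B * K)
print(best)      # ~2.0: A gets energy in proportion to its size
print(T_A, T_B)  # the two temperatures agree at the entropy maximum
```

The scan lands on the split where ∂S_A/∂U_A = ∂S_B/∂U_B, i.e. T_A = T_B, exactly as the derivation above Eq. 55 says it must.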
If there are other macroscopic variables we are using to define our macrostates, like volume V or particle number N, then there will be other quantities that must be equal in equilibrium, assuming our two systems compete for volume or trade particles back and forth. In these cases, we define the quantities P and µ to be
$$\frac{P}{T} \equiv \left( \frac{\partial S}{\partial V} \right)_{U,N} \qquad \frac{\mu}{T} \equiv -\left( \frac{\partial S}{\partial N} \right)_{U,V} \qquad (56)$$
and equilibrium requires
$$P_A = P_B \qquad \mu_A = \mu_B . \qquad (57)$$
(You might object that pressure has another definition, namely force divided by area. It would be incumbent on us to check that this definition matches that definition in the relevant situations where both definitions have meaning. Thankfully, it does.)
4.3 The Partition Function
Figure 33: A large environment E and system S have a fixed total en-
ergy Etot . E is called a “heat bath” because it is very big. The combined
system has a temperature T .
$$E_{tot} = E + E_E \qquad (58)$$
Here is the important part. Say that our heat bath has a lot of energy: $E_{tot} \gg E$. As far as the heat bath is concerned, E is a very small amount of energy. Therefore,
$$\Omega_E(E_{tot} - E) = \exp\left( \frac{1}{k} S_E(E_{tot} - E) \right) \approx \exp\left( \frac{1}{k} S_E(E_{tot}) - \frac{E}{kT} \right)$$
by Taylor expanding SE in E and using the definition of temperature.
We now have
$$\text{Prob}(E) \propto \Omega_S(E) \exp\left( -\frac{E}{kT} \right).$$
$\Omega_S(E)$ is sometimes called the “degeneracy” of E. In any case, we can easily see what the ratio of Prob(E_1) and Prob(E_2) must be:
$$\frac{\text{Prob}(E_1)}{\text{Prob}(E_2)} = \frac{\Omega_S(E_1) \, e^{-E_1/kT}}{\Omega_S(E_2) \, e^{-E_2/kT}}$$
Furthermore, we can use the fact that all probabilities must sum to 1 in order to calculate the absolute probability. We define
$$Z(T) \equiv \sum_E \Omega_S(E) \, e^{-E/kT} = \sum_s e^{-E_s/kT} \qquad (61)$$
where $\sum_s$ is a sum over all states of S. Finally, we have
$$\text{Prob}(E) = \frac{\Omega_S(E) \, e^{-E/kT}}{Z(T)} \qquad (62)$$
However, more than being a mere proportionality factor, Z(T) takes on a life of its own, so it is given the special name of the “partition function.” Interestingly, Z(T) is a function that depends on T and not E. It is not a function that has anything to do with a particular macrostate. Rather, it is a function that has to do with every microstate at some temperature. Oftentimes, we also define
$$\beta \equiv \frac{1}{kT}$$
and write
$$Z(\beta) = \sum_s e^{-\beta E_s}. \qquad (63)$$
The partition function Z(β) has many amazing properties. For one, it can be used to write an endless number of clever identities. Here is one. Say you want to compute the expected energy ⟨E⟩ your system has at temperature T.
$$\langle E \rangle = \sum_s E_s \, \text{Prob}(E_s) = \frac{\sum_s E_s \, e^{-\beta E_s}}{Z(\beta)} = -\frac{1}{Z} \frac{\partial Z}{\partial \beta} = -\frac{\partial}{\partial \beta} \log Z$$
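If you'd like to see this identity in action, here is a minimal numerical sanity check. The two-state system with energies 0 and 1, and the value of β, are made-up toy choices (units where k = 1):

```python
import numpy as np

# Toy two-state system with made-up energies (units where k = 1).
energies = np.array([0.0, 1.0])
beta = 0.7  # arbitrary inverse temperature beta = 1/kT

def Z(b):
    """Partition function Z(beta) = sum over states of exp(-beta * E_s)."""
    return np.sum(np.exp(-b * energies))

# <E> computed directly from the Boltzmann probabilities...
probs = np.exp(-beta * energies) / Z(beta)
E_direct = np.sum(energies * probs)

# ...and via the identity <E> = -d(log Z)/d(beta), by finite difference.
h = 1e-6
E_identity = -(np.log(Z(beta + h)) - np.log(Z(beta - h))) / (2 * h)

assert abs(E_direct - E_identity) < 1e-8
```

The two answers agree to within the finite-difference error, which is the whole content of the identity.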
This expresses the expected energy hEi as a function of temperature.
(We could also calculate hE n i for any n if we wanted to.)
Where the partition function really shines is in the “thermodynamic limit.” Usually, people define the thermodynamic limit by demanding that quantities like the entropy and energy of the system scale extensively with the number of particles N:
$$S_S = N S_1 \qquad E = N E_1$$
The thing to really gawk at in the above equation is that the probability that S has some energy E is given by
$$\text{Prob}(E) \propto e^{N(\ldots)}.$$
Prob(E) will change radically. Therefore, Prob(E) will be extremely concentrated at some particular energy, and deviating slightly from that maximum will cause Prob(E) to plummet.
Figure 34: In the thermodynamic limit, the system S will have a well
defined energy.
Let’s just appreciate this for a second. Our original definition of S(U) was
$$S(U) = k \log(\Omega(U))$$
and our original definition of temperature was
$$\frac{1}{T} = \frac{\partial S}{\partial U}.$$
In other words, T is a function of U . However, we totally reversed logic
when we coupled our system to a larger environment. We no longer
knew what the exact energy of our system was. I am now telling you
that instead of calculating T as a function of U , when N is large we are
actually able to calculate U as a function of T ! Therefore, instead of
having to calculate Ω(U ), we can just calculate Z(T ) instead.
I should stress, however, that Z(T ) is still a perfectly worthwhile
thing to calculate even when your system S isn’t “big.” It will still give
you the exact average energy hEi when your system is in equilibrium
with a bigger environment at some temperature. What’s special about
the thermodynamic limit is that you no longer have to imagine the heat
bath is there in order to interpret your results, because any “average
quantity” will basically just be an actual, sharply defined, “quantity.” In
short,
$$Z(\beta) = \Omega(U) \, e^{-\beta U} \qquad \text{(thermodynamic limit)} \qquad (66)$$
It’s worth mentioning that the other contributions to Z(β) will also be absolutely huge; they just won’t be as stupendously huge as the term due to U.
Okay, enough adulation for the partition function. Let’s do something with it again. Using the above equation there is a very easy way to figure out what $S_S(U)$ is in terms of Z(β).
$$S_S(U) = k \log \Omega_S(U) = k \log\big( Z e^{\beta U} \big) \qquad \text{(thermodynamic limit)}$$
$$= k \log Z + k \beta U = k \left( 1 - \beta \frac{\partial}{\partial \beta} \right) \log Z$$
(Gah. Another amazing identity, all thanks to the partition function.)
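As a sanity check: for a canonical ensemble this identity holds exactly, since $k \log Z + k\beta \langle E \rangle$ equals the Gibbs entropy $-k \sum_s p_s \log p_s$ of the Boltzmann distribution. Here is a numerical check with a made-up four-level spectrum (k = 1 throughout):

```python
import numpy as np

# Made-up four-level spectrum, k = 1.
energies = np.array([0.0, 0.5, 1.3, 2.0])
beta = 1.1

weights = np.exp(-beta * energies)
Z = weights.sum()
p = weights / Z                # Boltzmann probabilities
E_avg = np.sum(energies * p)   # <E>

# k(1 - beta d/d beta) log Z  =  k log Z + k beta <E>
S_identity = np.log(Z) + beta * E_avg

# Gibbs/Shannon entropy of the same probability distribution.
S_gibbs = -np.sum(p * np.log(p))

assert abs(S_identity - S_gibbs) < 1e-10
```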
This game that we played, coupling our system S to a heat bath so
we could calculate U as a function of T instead of T as a function of
U , can be replicated with other quantities like the chemical potential µ
(defined in Eq. 57). We could now imagine that S is trading particles
with a larger environment. Our partition function would then be a
function of µ in addition to T .
Z = Z(µ, T )
In the thermodynamic limit, we could once again use our old tricks to
find N in terms of µ.
4.4 Free energy
$$F \equiv U - TS \qquad (67)$$
(This is also called the “Helmholtz Free Energy.”) F is defined for any
system with some well defined internal energy U and entropy S when
present in a larger environment which has temperature T . Crucially,
the system does not need to be in thermal equilibrium with the environ-
ment. In other words, free energy is a quantity associated with some
system which may or may not be in equilibrium with an environment at
temperature T .
can let the second law of thermodynamics do all the hard work, transferring energy into our system at no cost to us! I should warn you that ∆F is actually not equal to the change in internal energy ∆U that occurs during this equilibration. This is apparent just from its definition. (Although it does turn out that F is equal to the “useful work” you can extract from such a system.)
The reason I’m telling you about F is because it is a useful quantity for determining what will happen to a system at temperature T. Namely, in the thermodynamic limit, the system will minimize F by equilibrating with the environment.
Recall Eq. 66 (reproduced below).
$$Z(\beta) = \exp\left( \tfrac{1}{k} S - \beta U \right) \qquad \text{(at equilibrium in thermodynamic limit)}$$
$$= \exp(-\beta F).$$
First off, we just derived another amazing identity of the partition func-
tion. More importantly, recall that U , as written in Eq. 66, is defined
to be the energy that maximizes Ω(U )e−βU , A.K.A. the energy that
maximizes the entropy of the world. Because we know that the entropy
of the world always wants to be maximized, we can clearly see that F
wants to be minimized, as claimed.
Therefore, F is a very useful quantity! It always wants to be min-
imized at equilibrium. It can therefore be used to detect interesting
phenomena, such as phase transitions.
Figure 36: A phase transition, right below the critical temperature Tc ,
at Tc , and right above Tc .
This can indeed happen, and is in fact what a physicist would call a “first order phase transition.” We can see that there will be a discontinuity in the first derivative of Z(T) at T_c. You might be wondering how this is possible, given that, from its definition, Z is clearly an analytic function, as it is a sum of analytic functions. The thing to remember is that we are using the thermodynamic limit, and the sum of an infinite number of analytic functions may not be analytic.
Because there is a discontinuity in the first derivative of Z(β), there will be a discontinuity in $E = -\frac{\partial}{\partial \beta} \log Z$. This is just the “latent heat”
you learned about in high school. In real life systems, it takes some
time for enough energy to be transferred into a system to overcome
the latent heat energy barrier. This is why it takes so long for a pot
of water to boil or a block of ice to melt. Furthermore, during these
lengthy phase transitions, the pot of water or block of ice will actually
be at a constant temperature, the “critical temperature” (100◦ C and 0◦ C
respectively). Once the phase transition is complete, the temperature
can start changing again.
4.6 Example: Box of Gas
For concreteness, I will compute the partition function for an ideal
gas. By ideal, I mean that the particles do not interact with each other.
Let N be the number of particles in the box and m be the mass of each particle. Suppose the particles exist in a box of volume V. The positions and momenta of the particles are $\vec{x}_i$ and $\vec{p}_i$ for i = 1 … N. The energy is given by the sum of the kinetic energies of all the particles:
$$E = \sum_{i=1}^{N} \frac{\vec{p}_i^{\,2}}{2m}. \qquad (68)$$
Therefore,
$$Z(\beta) = \sum_s e^{-\beta E_s}$$
$$= \frac{1}{N!} \frac{1}{h^{3N}} \int \prod_{i=1}^{N} d^3 x_i \, d^3 p_i \, \exp\left( -\beta \sum_{i=1}^{N} \frac{\vec{p}_i^{\,2}}{2m} \right)$$
$$= \frac{1}{N!} \frac{V^N}{h^{3N}} \prod_{i=1}^{N} \int d^3 p_i \, \exp\left( -\beta \frac{\vec{p}_i^{\,2}}{2m} \right)$$
$$= \frac{1}{N!} \frac{V^N}{h^{3N}} \left( \frac{2 m \pi}{\beta} \right)^{3N/2}$$
If N is large, the thermodynamic limit is satisfied. Therefore,
$$U = -\frac{\partial}{\partial \beta} \log Z = -\frac{\partial}{\partial \beta} \log\left( \frac{1}{N!} \frac{V^N}{h^{3N}} \left( \frac{2 m \pi}{\beta} \right)^{3N/2} \right) = \frac{3N}{2\beta} = \frac{3}{2} N k T.$$
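We can double-check this $-\partial \log Z / \partial \beta$ bookkeeping numerically, using the log of the partition function we just derived. All constants here are order-one toy values, not physical ones:

```python
import math

# Order-one toy values for all the constants.
N, V, h, m = 5, 2.0, 1.0, 1.0

def log_Z(beta):
    """log of the ideal-gas partition function derived above."""
    return (N * math.log(V) - math.lgamma(N + 1) - 3 * N * math.log(h)
            + 1.5 * N * math.log(2 * m * math.pi / beta))

beta = 0.8
eps = 1e-6

# U = -d(log Z)/d(beta), evaluated by central finite difference...
U_numeric = -(log_Z(beta + eps) - log_Z(beta - eps)) / (2 * eps)
# ...should match the closed form U = 3N/(2 beta) = (3/2) N k T.
U_exact = 1.5 * N / beta

assert abs(U_numeric - U_exact) < 1e-6
```

Notice that V, h, and m all drop out of U, exactly as the closed form $3N/2\beta$ says they should.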
You could add interactions between the particles by adding some potential energy V between each pair of particles (unrelated to the volume V):
$$E = \sum_{i=1}^{N} \frac{\vec{p}_i^{\,2}}{2m} + \frac{1}{2} \sum_{i,j} V(|\vec{x}_i - \vec{x}_j|) \qquad (69)$$
Figure 38: An example for an interaction potential V between particles
as a function of distance r.
4.7 Shannon Entropy
It turns out that entropy is maximized when all the probabilities $p_s$ are equal to each other. Say there are Ω states and each $p_s = \Omega^{-1}$. Then
$$S = \log \Omega \qquad (71)$$
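A quick numerical check that the uniform distribution wins (Ω = 8 is an arbitrary choice, and the competing distributions are just random test cases; k = 1):

```python
import numpy as np

rng = np.random.default_rng(0)

def shannon_entropy(p):
    """S = -sum_s p_s log p_s (terms with p_s = 0 contribute 0)."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

omega = 8  # arbitrary number of states
uniform = np.full(omega, 1 / omega)
S_max = shannon_entropy(uniform)

# The uniform distribution gives exactly log(omega)...
assert abs(S_max - np.log(omega)) < 1e-12

# ...and no random distribution over omega outcomes ever beats it.
for _ in range(1000):
    p = rng.random(omega)
    p /= p.sum()
    assert shannon_entropy(p) <= S_max + 1e-12
```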
One tiny technicality when dealing with the Shannon entropy is interpreting the value of
$$0 \log 0.$$
It is a bit troublesome because log 0 = −∞. However, it turns out that the correct value to assign the above quantity is
$$0 \log 0 \equiv 0,$$
because
$$\lim_{x \to 0} x \log x = 0.$$
2. Uncertainty due to the fact that you may not know the exact
quantum state your system is in anyway. (This is sometimes called
“classical uncertainty.”)
ρ : H → H. (72)
which one it is in. This would be an example of a “classical superposi-
tion” of quantum states. Usually, we think of classical superpositions as
having a thermodynamical nature, but that doesn’t have to be the case.
Anyway, say that your lab mate thinks there’s a 50% chance the
system could be in either state. The density matrix corresponding to
this classical superposition would be
$$\rho = \frac{1}{2} |\psi_1\rangle\langle\psi_1| + \frac{1}{2} |\psi_2\rangle\langle\psi_2| .$$
More generally, if you have a set of N quantum states $|\psi_i\rangle$ each with a classical probability $p_i$, then the corresponding density matrix would be
$$\rho = \sum_{i=1}^{N} p_i \, |\psi_i\rangle\langle\psi_i| . \qquad (73)$$
We can therefore see that for our state $|\psi\rangle$,
$$\langle \hat{O} \rangle = \text{Tr}\big( |\psi\rangle\langle\psi| \, \hat{O} \big). \qquad (77)$$
A density matrix of the form
$$\rho = |\psi\rangle\langle\psi|$$
for some $|\psi\rangle$ is said to represent a “pure state,” because you know with 100% certainty which quantum state your system is in. Note that for a pure state,
$$\rho^2 = \rho \qquad \text{(for pure state)}.$$
It turns out that the above condition is a necessary and sufficient condition for determining if a density matrix represents a pure state.
If a density matrix is instead a combination of different states in a classical superposition, it is said to represent a “mixed state.” This is sort of bad terminology, because a mixed state is not a “state” in the Hilbert space H, but whatever.
H = C2
The pure state density matrix is different from the mixed state because
of the non-zero off diagonal terms. These are sometimes called “inter-
ference terms.” The reason is that states in a quantum superposition
can “interfere” with each other, while states in a classical superposition
can’t.
Let’s now look at the expectation value of the following operators for both density matrices:
$$\sigma_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \qquad \sigma_x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$$
They are given by
$$\langle \sigma_z \rangle_{\text{Mixed}} = \text{Tr}\left[ \begin{pmatrix} \frac{1}{2} & 0 \\ 0 & \frac{1}{2} \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \right] = 0$$
$$\langle \sigma_z \rangle_{\text{Pure}} = \text{Tr}\left[ \begin{pmatrix} \frac{1}{2} & \frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \right] = 0$$
$$\langle \sigma_x \rangle_{\text{Mixed}} = \text{Tr}\left[ \begin{pmatrix} \frac{1}{2} & 0 \\ 0 & \frac{1}{2} \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \right] = 0$$
$$\langle \sigma_x \rangle_{\text{Pure}} = \text{Tr}\left[ \begin{pmatrix} \frac{1}{2} & \frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \right] = 1$$
So we can see that a measurement given by σ_z cannot distinguish between ρ_Mixed and ρ_Pure, while a measurement given by σ_x can distinguish between them! There really is a difference between classical superpositions and quantum superpositions, but you can only see this difference if you exploit the off-diagonal terms!
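You can verify all four traces above (and the earlier purity condition ρ² = ρ) with a few lines of numpy:

```python
import numpy as np

# rho_Mixed: a 50/50 classical mixture of |0> and |1>.
rho_mixed = np.array([[0.5, 0.0],
                      [0.0, 0.5]])

# rho_Pure: the quantum superposition (|0> + |1>)/sqrt(2).
psi = np.array([1.0, 1.0]) / np.sqrt(2)
rho_pure = np.outer(psi, psi)

sigma_z = np.array([[1.0, 0.0], [0.0, -1.0]])
sigma_x = np.array([[0.0, 1.0], [1.0, 0.0]])

def expval(rho, op):
    """<O> = Tr(rho O)."""
    return np.trace(rho @ op)

assert abs(expval(rho_mixed, sigma_z)) < 1e-12  # sigma_z sees no difference...
assert abs(expval(rho_pure,  sigma_z)) < 1e-12
assert abs(expval(rho_mixed, sigma_x)) < 1e-12  # ...but sigma_x does.
assert abs(expval(rho_pure,  sigma_x) - 1) < 1e-12

# Purity check: rho^2 = rho only holds for the pure state.
assert np.allclose(rho_pure @ rho_pure, rho_pure)
assert not np.allclose(rho_mixed @ rho_mixed, rho_mixed)
```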
$$|\psi\rangle_A |\psi\rangle_B$$
for some $|\psi\rangle_A \in H_A$ and $|\psi\rangle_B \in H_B$.
So, for example, if $H_A = \mathbb{C}^2$ and $H_B = \mathbb{C}^2$, then the state
$$|0\rangle \otimes \left( \frac{1}{\sqrt{2}} |0\rangle - \frac{i}{\sqrt{2}} |1\rangle \right)$$
would not be entangled, while the state
$$\frac{1}{\sqrt{2}} \big( |0\rangle |0\rangle + |1\rangle |1\rangle \big)$$
would be entangled.
Let’s say a state starts out unentangled. How would it then become
entangled over time? Well, say the two systems A and B have Hamilto-
nians ĤA and ĤB . If we want the systems to interact weakly, i.e. “trade
energy,” we’ll also need to add an interaction term to the Hamiltonian.
Figure 39: Air molecules bumping up against a quantum system S will
entangle with it.
Notice that the experimentalist will not have access to the observ-
ables in the environment. Associated with HS is a set of observables
ÔS . If you tensor these observables together with the identity,
$$\hat{O}_S \otimes 1_E$$
you now have an observable which only measures quantities in the HS
subsector of the full Hilbert space. The thing is that entanglement
within the environment gets in the way of measuring ÔS ⊗ 1E in the
way the experimenter would like.
Say, for example, $H_S = \mathbb{C}^2$ and $H_E = \mathbb{C}^N$ for some very big N. Any state in $H_S \otimes H_E$ will be of the form
$$c_0 |0\rangle |\psi_0\rangle + c_1 |1\rangle |\psi_1\rangle \qquad (79)$$
for some $c_0, c_1 \in \mathbb{C}$ and $|\psi_0\rangle, |\psi_1\rangle \in H_E$. The expectation value for our observable is
$$\langle \hat{O}_S \otimes 1_E \rangle = \big( c_0^* \langle 0| \langle \psi_0| + c_1^* \langle 1| \langle \psi_1| \big) \, \hat{O}_S \otimes 1_E \, \big( c_0 |0\rangle |\psi_0\rangle + c_1 |1\rangle |\psi_1\rangle \big)$$
$$= |c_0|^2 \langle 0| \hat{O}_S |0\rangle + |c_1|^2 \langle 1| \hat{O}_S |1\rangle + 2 \operatorname{Re}\big( c_0^* c_1 \langle 0| \hat{O}_S |1\rangle \langle \psi_0 | \psi_1 \rangle \big)$$
The thing is that, if the environment E is very big, then any two randomly given vectors $|\psi_0\rangle, |\psi_1\rangle \in H_E$ will generically have almost no overlap:
$$\langle \psi_0 | \psi_1 \rangle \approx e^{-N}$$
(This is just a fact about random vectors in high dimensional vector spaces.) Therefore, the expectation value of this observable will be
$$\langle \hat{O}_S \otimes 1_E \rangle \approx |c_0|^2 \langle 0| \hat{O}_S |0\rangle + |c_1|^2 \langle 1| \hat{O}_S |1\rangle .$$
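This fact about random vectors is easy to check numerically. (The dimensions and sample counts below are arbitrary choices. The typical overlap of two random unit vectors in dimension d falls off like $1/\sqrt{d}$; in the physical setting the Hilbert-space dimension grows exponentially with the particle number N, which is where the $e^{-N}$ suppression comes from.)

```python
import numpy as np

rng = np.random.default_rng(42)

def random_state(dim):
    """A random unit vector in C^dim."""
    v = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    return v / np.linalg.norm(v)

# Average |<psi_0|psi_1>| over pairs of random states, as dimension grows.
means = []
for dim in (10, 1_000, 100_000):
    overlaps = [abs(np.vdot(random_state(dim), random_state(dim)))
                for _ in range(50)]
    means.append(np.mean(overlaps))

# The typical overlap shrinks rapidly with dimension: each factor of 100
# in dimension knocks the overlap down by roughly a factor of 10.
assert means[0] > means[1] > means[2]
```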
Because there is no cross term between |0⟩ and |1⟩, we can see that when we measure our observable, our system S seems to be in a classical superposition, A.K.A. a mixed state!
This can be formalized by what is called a “partial trace.” Say that the states $|\phi_i\rangle_E$ comprise an orthonormal basis of $H_E$. Say we have some density matrix ρ representing a state in the full Hilbert space. We can “trace over the E degrees of freedom” to receive a density matrix in the S Hilbert space:
$$\rho_S \equiv \text{Tr}_E(\rho) \equiv \sum_i {}_E\langle \phi_i | \, \rho \, |\phi_i\rangle_E . \qquad (80)$$
You might be wondering why anyone would want to take this partial trace. Well, I would say that if you can’t perform measurements on the E degrees of freedom, why are you describing them? It turns out that the partially traced density matrix gives us the expectation values for any observables in S. Once we compute ρ_S, by tracing over E, we can then calculate the expectation value of any observable $\hat{O}_S$ by just calculating the trace over S of $\rho_S \hat{O}_S$:
$$\text{Tr}\big( \rho \, \hat{O}_S \otimes 1_E \big) = \text{Tr}_S\big( \rho_S \hat{O}_S \big).$$
Even though the whole world is in some particular state in HS ⊗ HE ,
when you only perform measurements on one part of it, that part might
as well only be in a mixed state for all you know! Entanglement looks
like a mixed state when you only look at one part of a Hilbert space.
Furthermore, when the environment is very large, the off diagonal “in-
terference terms” in the density matrix are usually very close to zero,
meaning the state looks very mixed.
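Here is a small sketch of the partial trace in code. The reshape/einsum trick is just one way to implement Eq. 80, and the two-qubit Bell state is a toy example:

```python
import numpy as np

def partial_trace_E(rho, dim_S, dim_E):
    """Trace out the E factor of a density matrix on H_S tensor H_E,
    implementing rho_S = sum_i <phi_i|_E rho |phi_i>_E (Eq. 80)."""
    rho = rho.reshape(dim_S, dim_E, dim_S, dim_E)
    return np.einsum('ikjk->ij', rho)

# Maximally entangled two-qubit state (|00> + |11>)/sqrt(2).
psi = np.zeros(4)
psi[0] = psi[3] = 1 / np.sqrt(2)
rho = np.outer(psi, psi)

rho_S = partial_trace_E(rho, 2, 2)

# The reduced state is maximally mixed: the interference terms are gone.
assert np.allclose(rho_S, np.eye(2) / 2)
```

Even though the full two-qubit state is pure, the reduced density matrix on one qubit is exactly the maximally mixed state, which is the claim in the text.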
This is the idea of “entanglement entropy.” If you have an entangled state and trace out the states in one part of the Hilbert space, you will receive a mixed density matrix. That density matrix will have some Von Neumann entropy, and in this context we would call it “entanglement entropy.” The more entanglement entropy your state has, the more entangled it is! And, as we can see, when you can only look at one tiny part of a state when it is heavily entangled, it appears to be in a classical superposition instead of a quantum superposition!
The process by which quantum states in real life become entangled with the surrounding environment is called “decoherence.” It is one of the most viciously efficient processes in all of physics, and is the reason why it took the human race so long to discover quantum mechanics. It’s very ironic that entanglement, a quintessentially quantum phenomenon, when taken to dramatic extremes, hides quantum mechanics from view
entirely!
I would like to point out an important difference between a clas-
sical macrostate and a quantum mixed state. In classical mechanics,
the subtle perturbing effects of the environment on the system make it
difficult to keep track of the exact microstate a system is in. However,
in principle you can always just re-measure your system very precisely
and figure out what the microstate is all over again. This isn’t the case
in quantum mechanics when your system becomes entangled with the
environment. The problem is that once your system entangles with the
environment, that entanglement is almost certainly never going to undo
itself. In fact, it’s just going to spread from the air molecules in your
laboratory to the surrounding building, then the whole university, then
the state, the country, the planet, the solar system, the galaxy, and then
the universe! And unless you “undo” all of that entanglement, the show’s
over! You’d just have to start from scratch and prepare your system in
a pure state all over again.
Obviously, this is just the same Z(T) that we saw in classical mechanics! They are really not different at all. However, there is something very interesting in the above expression. The operator
$$\exp\big( -\hat{H}/kT \big)$$
looks just like the time evolution operator $\exp\big( -i\hat{H}t/\hbar \big)$ if we just replace
$$-\frac{i}{\hbar} t \longrightarrow -\beta.$$
It seems as though β is, in some sense, an “imaginary time.” Rotating the time variable into the imaginary direction is called a “Wick Rotation,”
and is one of the most simple, mysterious, and powerful tricks in the
working physicist’s toolbelt. There’s a whole beautiful story here with
the path integral, but I won’t get into it.
Anyway, a mixed state is said to be “thermal” if it is of the form
$$\rho_{\text{Thermal}} = \frac{1}{Z(T)} \sum_s e^{-E_s/kT} \, |E_s\rangle\langle E_s| \qquad (82)$$
$$= \frac{1}{Z(\beta)} e^{-\beta \hat{H}}$$
for some temperature T, where $|E_s\rangle$ are the energy eigenstates with eigenvalues $E_s$. If you let your system equilibrate with an environment at some temperature T, and then trace out the environmental degrees of freedom, you will find your system in the thermal mixed state.
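To make this concrete, here is a sketch of building $\rho_{\text{Thermal}}$ for a made-up 2×2 Hamiltonian (all numbers are arbitrary toy values; k = 1 would be the usual convention, kept explicit here):

```python
import numpy as np

k, T = 1.0, 0.5          # arbitrary toy choices
beta = 1 / (k * T)

# A small made-up Hermitian Hamiltonian.
H = np.array([[0.0, 0.3],
              [0.3, 1.0]])

# Build rho_Thermal = exp(-beta H) / Z via the eigendecomposition of H.
E, V = np.linalg.eigh(H)          # eigenvalues E_s and eigenvectors |E_s>
weights = np.exp(-beta * E)
Z = weights.sum()
rho_thermal = (V * (weights / Z)) @ V.T.conj()

# Sanity checks: unit trace, and the populations in the energy basis
# follow the Boltzmann distribution e^{-beta E_s} / Z.
assert abs(np.trace(rho_thermal) - 1) < 1e-12
pops = np.diag(V.T.conj() @ rho_thermal @ V)
assert np.allclose(pops, weights / Z)
```

In the energy eigenbasis the thermal state is diagonal, with Boltzmann weights on the diagonal, which is exactly Eq. 82.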
5 Hawking Radiation
5.1 Quantum Field Theory in Curved Space-time
When you have some space-time manifold in general relativity, you
can slice it up into a bunch of “space-like” surfaces that represent a
choice of instances in time. These are called “time slices.” All the normal
vectors of the surface must be time-like.
Once you make these time slices, you can formulate a quantum field
theory in the space-time. A quantum field state on a time slice Σ is just
a wave functional
$$\Psi : C^{\infty}(\Sigma) \to \mathbb{C}.$$
(Of course, once again this is just the case for a spin-0 boson, and will
be more complicated for different types of quantum fields, such as the
ones we actually see in nature.) Therefore, we have a different Hilbert
space for each time slice Σ.
This might seem a bit weird to you. Usually we think about all states
as living in one Hilbert space, and the state evolves in time according
to the Schrödinger equation. Here, we have a different Hilbert space for
each time slice and the Schrödinger equation evolves a state from one
Hilbert space into a state in a different Hilbert space. This is just a
convenient way of talking about quantum fields in curved space-time,
and is nothing “new” exactly. We are not really modifying quantum
mechanics in any way, we’re just using new language.
The reason for this is very subtle, and difficult to explain in words.
Perhaps one day I will be able to explain why the black hole emits Hawk-
ing radiation in a way that is both intuitive and correct, but as of now
I cannot, so I won’t. I will say, however, that the emission of Hawking
radiation crucially relies on the fact that different observers have differ-
ent notions of what they would justifiably call a particle. While there
were initially no particles in the quantum field before the black hole
formed, the curvature caused by the black hole messes up the definition of what a “particle” is, and so all of a sudden particles start appearing out
of nowhere. You shouldn’t necessarily think of the particles as coming
off of the horizon of the black hole, even though the formation of the
horizon is crucial for Hawking radiation to be emitted. Near the horizon,
the “definition of what a particle is” is a very fuzzy thing. However, once
you get far enough away from the black hole, you would be justified in
claiming that it is emitting particles. Capisce?
Now, in real life, for a black hole that has approximately the mass
of a star, this Hawking radiation will be extremely low-energy, perhaps
even as low-energy as Jeb. In fact, the bigger the black hole, the lower
the energy of the radiation. The Hawking radiation from any actually
existing black hole is far too weak to have been detected experimentally.
5.3 The shrinking black hole
However, Hawking didn’t stop there. The black hole is emitting
particles, and those particles must come from somewhere. Furthermore,
Einstein’s theory of general relativity tells us that energy has some effect
on space-time, given by
$$R_{\mu\nu} - \frac{1}{2} g_{\mu\nu} R = \frac{8\pi G}{c^4} T_{\mu\nu}.$$
However, there is an issue. What is Tµν for the quantum field? In
quantum mechanics, a state can be a superposition of states with dif-
ferent energies. However, there is only one space-time manifold, not a
superposition of multiple space-time manifolds! So what do we do?
The answer? We don’t know what to do! This is one way to view the problem of quantum gravity. We’re okay with states living on time-slices in a curved manifold. No issues there! But when we want to study how the quantum field then affects the manifold it’s living on, we have no idea what to do.
In other words, we have a perfectly good theory of classical gravity.
But we don’t know what the “Hilbert space” of gravity is! There are
many proposals for what quantum gravity could be, but there are no
proposals that are all of the following:
1. Consistent
2. Complete
3. Predictive
4. Applicable to the universe we actually live in
5. Confirmed by experiment.
In fact, maybe there is no “Hilbert space for gravity,” and in order to
figure out the correct theory of quantum gravity we will have to finally
supersede quantum mechanics. But there are currently no proposals
that do this. For example, the notion of a Hilbert space remains intact
in both string theory and loop quantum gravity.
But certainly we don’t need to know the complete theory of quantum
gravity in order to figure out what happens to our black hole, right? For
example, all of the particles in the earth and the sun are quantum in
nature, and yet we have no trouble describing the motion of Earth’s
orbit. So even though we don’t have a complete theory of quantum
gravity, we can still analyze certain situations, right?
Indeed. While the stress energy tensor for a quantum field does not
have a definite value, we can still define the expectation value for the
stress energy tensor, hT̂µν i. We could then guess that the effect of the
quantum field on space time is given by
$$R_{\mu\nu} - \frac{1}{2} g_{\mu\nu} R = \frac{8\pi G}{c^4} \langle \hat{T}_{\mu\nu} \rangle.$$
This is the so called “semi-classical approximation” which Hawking used
to figure out how the radiation affects the black hole. This has caused
much consternation since.
You might argue that a black hole is a very extreme object because
of its singularity. Presumably, one would need a theory of quantum
gravity to properly describe what goes on in the singularity of a black
hole where space-time is infinitely curved. So then why are we using the
semi-classical approximation in a situation where it does not apply?
The answer is that, yes, we are not yet able to describe the singularity
of the black hole. However, we are not trying to. We are only trying
to describe what is going on at the horizon, where space time is not
particularly curved at all. So our use of the semi-classical approximation
ought to be perfectly justified.
Anyway, because energetic particles are leaving the black hole, Hawk-
ing realized that, assuming the semi-classical approximation is reason-
able, the black hole itself will actually shrink, which would never happen
classically!
Because of this, the black hole will shrink more and more as time goes
on, emitting higher energy radiation as it does so. Therefore, as it gets
smaller it also shrinks faster. The Hawking radiation would eventually
become very energetic and detectable by astronomers here on Earth.
Presumably, in its final moments it would explode like a firecracker in
the night sky. (The semi-classical approximation would certainly not
apply as the black hole poofs out of existence.)
Figure 42: The black hole evaporating, emitting increasingly high en-
ergy Hawking radiation, shrinking, and then eventually disappearing.
However, we have never actually seen this happen. The black holes that we know about are simply too big and would be shrinking too slowly. A stellar mass black hole would take $10^{67}$ years to evaporate in this way.
But maybe much smaller black holes formed due to density fluctuations in the early universe instead of from stellar collapse. Perhaps these black holes would just be finishing up their evaporation process now and would be emitting Hawking radiation energetic enough to detect. Such black holes would be called “primordial black holes,” but while plausible, they have never been observed and remain hypothetical.
5.5 Partner Modes
mass. This is drawn in Fig. 43. Note that the outgoing mode starts
out near the horizon with a small wavelength and high energy, but its
wavelength gets redshifted as it escapes the gravitational pull of the
black hole.
The crucial thing about these two modes is that they are heavily
entangled. By that, I mean that if the outgoing mode has some occupa-
tion number then the infalling mode must also have the same occupation
number. (Speaking fuzzily, for every particle that comes out, one part-
ner particle must fall in.) So if we think about the Hilbert space of
a single mode (assuming we are talking about approximately localized wavepackets) we can imagine states are given by linear combinations of states of the form
$$|n\rangle_k$$
where the integer n is the occupation number of the k mode. The Hilbert space of the partner modes is given by the tensor product $H_{k,\text{in}} \otimes H_{k,\text{out}}$. Hawking’s discovery was that the modes were entangled sort of like
$$\sum_n f(n) \, |n\rangle_{k,\text{in}} |n\rangle_{k,\text{out}} . \qquad (86)$$
Hopefully you can see what I mean by the modes being entangled.
To reiterate, when we trace out the infalling mode, the density matrix of the outgoing mode looks thermal. Therefore, the outside observer will not be able to see superpositions between different occupation number states in the outgoing mode. This is just another way of saying that
$$\frac{1}{\sqrt{2}} |0\rangle |0\rangle + \frac{1}{\sqrt{2}} |1\rangle |1\rangle \qquad \text{and} \qquad \frac{1}{\sqrt{2}} |0\rangle + \frac{1}{\sqrt{2}} |1\rangle$$
are different. It’s just that now our Hilbert space is spanned by mode occupation numbers instead of 0 and 1.
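We can sketch this numerically. Take $f(n) \propto x^n$ for an arbitrary toy parameter x (a geometric profile, which is the form a thermal state of a single mode takes), truncate the occupation number, and trace out the infalling mode. The reduced density matrix comes out diagonal, with no interference terms:

```python
import numpy as np

n_max = 6                 # truncate each mode's occupation number
x = 0.5                   # arbitrary toy parameter; f(n) proportional to x^n
f = x ** np.arange(n_max)
f = f / np.linalg.norm(f) # normalize the state

# The state sum_n f(n) |n>_in |n>_out: only the |n,n> components are nonzero.
psi = np.zeros((n_max, n_max))
np.fill_diagonal(psi, f)
rho = np.outer(psi.reshape(-1), psi.reshape(-1))

# Trace out the infalling mode (the first tensor factor).
rho_out = np.einsum('jmjn->mn', rho.reshape(n_max, n_max, n_max, n_max))

# The outgoing mode's density matrix is diagonal, with probabilities
# |f(n)|^2 -- a geometric, thermal-looking distribution.
assert np.allclose(rho_out, np.diag(f ** 2))
```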
6 The Information Paradox
6.1 What should the entropy of a black hole be?
Pretend that you didn’t know black holes radiate Hawking radiation.
What would you guess the entropy of the black hole to be, based on the
theory of general relativity?
An outside observer can measure a small number of quantities which
characterize the black hole. (This is assuming the black hole has finished
its collapsing process and has settled down into a stable configuration.)
There’s obviously the mass of the black hole, which is its most important
quantity. Interestingly, if the star was spinning before it collapsed, the
black hole will also have some angular momentum and its equator will
bulge out a bit. So the black hole is also characterized by an angular
momentum vector. Furthermore, if the star had some net electric charge,
the black hole will also have a charge.
However, if an outside observer knows these quantities, they will
know everything about the black hole. So we should expect for the
entropy of a black hole to be 0.
But maybe that’s not quite fair. After all, the contents of the star
should somehow be contained in the singularity, hidden behind the hori-
zon. Interestingly, all of the specific details of the star from before the
collapse do not have any effect on the properties of the resulting black
hole. The only stuff that matters is the total mass, total angular momentum, etc. That leaves an infinite number of possible stars that could
all have produced the same black hole. So actually, we should expect
the entropy of a black hole to be ∞.
However, instead of being 0 or ∞, it seems as though the actual “entropy” of a black hole is an average of the two: finite, but stupendously large. Here are some numerical estimates taken from [3]. The entropy of the universe (minus all the black holes) mostly comes from cosmic microwave background radiation, and is about $10^{87}$ (setting k = 1). Meanwhile, the entropy of a solar mass black hole is $10^{78}$. The entropy of our sun, as it is now, is a much smaller $10^{60}$. The entropy of the supermassive black hole in the center of our galaxy is $10^{88}$, larger than the rest of the universe combined (minus black holes). The entropy of any of the largest known supermassive black holes would be $10^{96}$.
There is a simple “argument” which suggests that black holes are the
most efficient information storage devices in the universe: if you wanted
to store a lot of information in a region smaller than a black hole horizon,
it would probably have to be so dense that it would just be a black hole
anyway, as it would be contained inside its own Schwarzschild horizon.
law of thermodynamics. Moreover, when two black holes merge, the
area of the final black hole will always exceed the sum of the areas of
the two original black holes.
principle of equivalence” and used it to help him figure out his theory of
general relativity. In other words, general relativity, which is the more
fundamental theory of gravity, left behind a “clue” in the less funda-
mental theory of Newtonian gravity. If you correctly identify physical
principles which should hold in the more fundamental theory, you can
use them to figure out what that more fundamental theory actually is.
Physicists now believe that “conservation of information” is one of
those principles, on par with the principle of equivalence. Because information is never truly lost in any known physical process, and because it sounds appropriately profound, it might be useful to adopt the attitude that information is never lost, and see where that takes us.
In that spirit, many physicists disagree with Hawking’s original claim
that information is truly lost in the black hole. They don’t know exactly
why Hawking was wrong, but they think that if they assume Hawking
is wrong, it will help them figure out something about quantum gravity.
(And I think that does make some sense.)
But then what is the paradox in the “information paradox?” Well,
there is no paradox in the literal sense of the word. See, a paradox is
when you derive a contradiction. But the thing we derive, that infor-
mation is lost in the black hole, is only a “contradiction” if we assume
that information is never lost to an outside observer. (And if we’re be-
ing honest, seeing as we do not yet have a theory of quantum gravity,
we don’t yet know for sure if that’s false.) In other words, it’s only a
“paradox” if we assume it’s a paradox, and that’s not much of a paradox
at all.
But so what? Who cares? These are just words. Even if it’s not a “paradox” in the dictionary sense of the word, it’s still something to think about nonetheless.
To summarize, most physicists believe that the process of black hole
evaporation should truly be unitary. If they knew how it was unitary,
there would no longer be a “paradox.”
There’s one possible resolution I’d like to discuss briefly. What if
the black hole never “poofs” away in the final stage of evolution, but
some quantum gravitational effect we do not yet understand stabilizes
it instead, allowing for some Planck-sized object to stick around? Such
an object would be called a “remnant.” The so-called “remnant solution”
to the information paradox is not a very popular one. People don’t like
the idea of a very tiny, low-mass object holding an absurdly large amount
of information and being entangled with a very large number of other
particles. It seems much more reasonable to people that the information
of what went into the black hole is being released via the radiation in a
way too subtle for us to currently understand.
Figure 44: The Penrose diagram containing a black hole which evap-
orates, with some time-slices drawn. Σ1 is the time slice in the infinite
past and Σ3 is the time slice in the infinite future. Σ2 passes through
the point where the black hole poofs out of existence, dividing the slice
into two halves.
Furthermore, let
$$U_{ji} : \mathcal{H}_i \to \mathcal{H}_j$$
be the unitary time evolution operator that evolves a state in $\mathcal{H}_i$ to a
state in $\mathcal{H}_j$. Note that
$$U_{ij} = U_{ji}^{-1}.$$
Crucially, the Hamiltonian for our quantum field is local. That means
that the degrees of freedom on the “in” half of Σ2 can’t make it out to
Σ3 . However, it turns out this entire picture is incompatible with unitary
time evolution. Why?
Well, consider the unitary operator
$$U_{23} U_{31}.$$
This evolves an initial state on Σ1 to a state on Σ3, and then de-evolves
it backwards to a state on Σ2. Say we have some initial state
$$|\psi_1\rangle \in \mathcal{H}_1$$
and act on it with $U_{23} U_{31}$. We will call the result $|\psi_2\rangle$:
$$|\psi_2\rangle \equiv U_{23} U_{31} |\psi_1\rangle \in \mathcal{H}_2.$$
However, if we want an outside observer to be able to reconstruct what
went into the black hole, the density matrix corresponding to $|\psi_2\rangle$
must be pure once we trace out the “in” degrees of freedom. That is,
$$\mathrm{Tr}_{\rm in}\big(|\psi_2\rangle \langle \psi_2|\big)$$
must be pure. This is only possible if
$$|\psi_2\rangle = |\psi_{\rm in}\rangle |\psi_{\rm out}\rangle$$
for some
$$|\psi_{\rm in}\rangle \in \mathcal{H}_{\rm in}, \qquad |\psi_{\rm out}\rangle \in \mathcal{H}_{\rm out}.$$
Therefore, inverting our unitary operator, we can now write
$$|\psi_1\rangle = U_{13} U_{32} |\psi_{\rm in}\rangle |\psi_{\rm out}\rangle.$$
Here comes the key step. If the Hamiltonian is local, and only the “out”
part of a state can go off to affect the state on Σ3, then if we replace
$|\psi_{\rm in}\rangle$ with some other state, the above equation should still hold.
In other words, we should have both equations
$$|\psi_1\rangle = U_{13} U_{32} |\psi_{\rm in}\rangle |\psi_{\rm out}\rangle$$
$$|\psi_1\rangle = U_{13} U_{32} |\psi_{\rm in}'\rangle |\psi_{\rm out}\rangle$$
for any two distinct states
$$|\psi_{\rm in}\rangle, |\psi_{\rm in}'\rangle \in \mathcal{H}_{\rm in}.$$
However, subtracting one of those equations from the other, we see that
$$0 = U_{13} U_{32} \big(|\psi_{\rm in}\rangle - |\psi_{\rm in}'\rangle\big) |\psi_{\rm out}\rangle.$$
This is a contradiction: a unitary operator must be invertible, so it cannot send the nonzero vector $\big(|\psi_{\rm in}\rangle - |\psi_{\rm in}'\rangle\big)|\psi_{\rm out}\rangle$ to zero!
(Some of you might recognize that we have emulated the proof of the
“no cloning” theorem of quantum mechanics. Here, however, we have
proven something more like a “no destruction” theorem, seeing as the state in $\mathcal{H}_{\rm in}$
crashes into the singularity and is destroyed.)
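The fact used above, that tracing out the “in” degrees of freedom leaves a pure state only when the full state is a product state, is easy to check numerically. Here is a minimal numpy sketch (my own illustration; the two-qubit states are toy choices, not anything from the argument above):

```python
import numpy as np

def partial_trace_in(psi, d_in, d_out):
    """Reduced density matrix on H_out after tracing out the 'in' factor."""
    m = psi.reshape(d_in, d_out)   # m[i, j] = coefficient of |i>_in |j>_out
    return m.T @ m.conj()          # rho_out[j, j'] = sum_i m[i, j] * conj(m[i, j'])

def purity(rho):
    """Tr(rho^2), which equals 1 if and only if rho is pure."""
    return float(np.real(np.trace(rho @ rho)))

# A product state: the reduced state is pure.
product = np.kron(np.array([1, 1]) / np.sqrt(2), np.array([1, 0]))
print(purity(partial_trace_in(product, 2, 2)))   # 1.0

# An entangled Bell state: the reduced state is maximally mixed.
bell = np.array([1, 0, 0, 1]) / np.sqrt(2)
print(purity(partial_trace_in(bell, 2, 2)))      # 0.5
```

Only when the reduced state has purity 1 can the state be written as $|\psi_{\rm in}\rangle|\psi_{\rm out}\rangle$, which is exactly the property the argument above leans on.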
So wait, what gives? When we assumed that time evolution was
unitary, we derived a contradiction. What is the resolution to this con-
tradiction?
One possible resolution is to postulate that the inside of the black
hole does not exist.
“radical conservatism” we should still allow for people to jump into the
black hole and see things the way Einstein’s theory would predict they
would see it. The crucial realization is that, for the person who jumped
into the black hole, the outside universe may as well not exist.
Figure 46: Maybe someone who jumps into a black hole relinquishes
the right to describe what goes on outside of it.
the region of space that is contained within one Planck length of the
horizon.
Figure 47: The “stretched horizon” is the region that is within one
Planck length lp of the horizon.
First, as the infalling observer nears the horizon, the outside observer
will see them drape themselves over the horizon like a tablecloth. (This
is actually a prediction of general relativity.) In the limit that the in-
falling observer is much less massive than the black hole, they will never
actually enter the black hole but only asymptotically approach the hori-
zon. However, if the infalling observer has some finite mass, their own
gravitational field will distort the horizon a bit to allow the observer to
enter it at some very large yet finite time.
Susskind proposed that something different happens. Instead of en-
tering the black hole at some finite time, the infalling observer will in-
stead be stopped at the stretched horizon, which is quite hot when you
get up close. At this point they will be smeared all over the horizon like
cream cheese on a bagel. Then, the Hawking radiation coming off of the
horizon will hit the observer on its way out, carrying the information
about them which has been plastered on the horizon.
So the outside observer, who is free to collect this radiation, should
be able to reconstruct all the information about the person who went in.
Of course, that person will have burned up at the stretched horizon and
will be dead. From the infalling observer’s perspective, however, they
were able to pass peacefully through the black hole and sail on to the
singularity. So from their perspective they live, while from the outside it
looks like they died. However, no contradiction can be reached, because
nobody has access to both realities.
Having said this, in order that no contradiction can be derived, it
must take some time for the infalling observer to “thermalize” (equilibrate)
on the horizon. Otherwise, the outside observer could see the
infalling observer die and then rocket straight into the black hole
themselves to meet the still-alive person once again before they hit the
horizon, thus producing a contradiction.
Somehow, according to the BHC worldview, the information out-
side the horizon is redundant with the information inside the horizon.
Perhaps the two observers are simply viewing the same Hilbert space
through different bases.
black hole remains near the black hole (perhaps in the stretched hori-
zon). Therefore, the radiation we collect at early times will still remain
heavily entangled with the degrees of freedom near the black hole, and
as such the state will look mixed to us because we cannot yet observe
all the complicated entanglement.
Furthermore, as we continue to collect radiation, generically speaking
the radiation will still be heavily entangled with those near-horizon de-
grees of freedom.
However, once we hit the Page time, something special happens. The
entanglement entropy of the outgoing radiation finally starts decreasing,
as we are finally able to start seeing entanglements between all this
seemingly random radiation we have painstakingly collected. Don Page
proposed the following graph of what the entanglement entropy of the
outgoing radiation should look like. It is fittingly called the “Page curve.”
Some people like to say that if one could calculate the Page curve
from first principles, the information paradox would be solved.
The Page curve starts by increasing linearly until the Page time. Let
me explain the intuition behind the shape of this graph. As more and
more information leaves the black hole in the form of Hawking radiation,
we are “tracing out” fewer and fewer of the near-horizon degrees of free-
dom. The dimension of our density matrix grows bigger and bigger, and
because the outgoing radiation is still so entangled with the near-horizon
degrees of freedom, the density matrix will still have off-diagonal terms
which are essentially zero. Recall that if you tensor together a Hilbert
space of dimension n with a Hilbert space of dimension m, the resulting
Hilbert space has dimension n × m. Therefore, once the black hole’s
entropy has reduced by half, the dimension of the Hilbert space we are
tracing out finally becomes smaller than the dimension of the Hilbert
space we are not tracing out. The off-diagonal terms spring into our
density matrix, growing in size and number as the black hole continues
to shrink. Finally, once the black hole is gone, we can easily see that all
the resulting radiation is in a pure state.
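You can watch this shape emerge in a toy model. Treat the black hole plus its radiation as n qubits in a random pure state, with k of the qubits already “radiated away.” Page’s insight was that the entanglement entropy of a random pure state is very close to the maximum possible, the log of the smaller factor’s dimension. A short numpy sketch (my own toy model, not a real evaporation calculation):

```python
import numpy as np

rng = np.random.default_rng(0)

def entanglement_entropy(psi, d_a, d_b):
    """Von Neumann entropy (in bits) of subsystem A for a pure state psi."""
    s = np.linalg.svd(psi.reshape(d_a, d_b), compute_uv=False)
    p = s**2                      # Schmidt coefficients = eigenvalues of rho_A
    p = p[p > 1e-12]
    return float(-np.sum(p * np.log2(p)))

def random_pure_state(d):
    psi = rng.normal(size=d) + 1j * rng.normal(size=d)
    return psi / np.linalg.norm(psi)

# Toy evaporation: n qubits total, k of them already radiated away.
n = 12
for k in range(n + 1):
    psi = random_pure_state(2**n)
    S = entanglement_entropy(psi, 2**k, 2**(n - k))
    print(k, round(S, 2))
```

The printed entropies hug min(k, n − k): roughly linear growth up to the Page time at k = n/2, then a linear decrease back to zero once the “black hole” is gone, which is exactly the Page curve.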
Let me now dumb down the thought experiment conducted in the
AMPS paper. (I will try to keep the relevant details but not reproduce
the technical justifications for why this thought experiment should work,
and to be honest I do not understand all of them.) Say an observer,
commonly named Alice, collects all the Hawking radiation coming out of
a black hole and waits for the Page time to come and go. At maybe about
1.5 times the Page time, Alice is now able to see significant entanglement
in all the radiation she has collected. Alice then dives into the black hole,
and sees an outgoing Hawking mode escaping.
Figure 49: Alice diving into the black hole after the Page time to see
the outgoing mode emerge, just like in Fig. 43.
tioned before. (Here I am using the so-called “no drama” postulate,
which is really just the equivalence principle. Alice ought to still be able
to use regular old quantum field theory just fine as she passes through
the horizon. As I explained previously, a quantum field which is not
highly entangled on short distances will have a very large energy den-
sity, thus violating the “no drama” postulate.) The contradiction is that
the outgoing mode cannot be entangled both with all the radiation Alice
has already collected and also with the nearby infalling mode.
Why not? Well, it has to do with something called the “strong
subadditivity of entanglement entropy.” Say you tensor together three
Hilbert spaces $\mathcal{H}_A$, $\mathcal{H}_B$, and $\mathcal{H}_C$:
$$\mathcal{H}_{ABC} = \mathcal{H}_A \otimes \mathcal{H}_B \otimes \mathcal{H}_C.$$
Given a state on $\mathcal{H}_{ABC}$, you can calculate the density matrix $\rho_A$ that
comes from tracing over both B and C. Likewise, you can also calculate the
density matrices $\rho_B$ and $\rho_C$ that come from tracing over both A and C
or both A and B, and similarly $\rho_{AB}$, $\rho_{BC}$, and so on.
You can then calculate the entanglement entropies for each density
matrix. Strong subadditivity is the statement that
$$S_{AB} + S_{BC} \geq S_B + S_{ABC}.$$
Now, to the particular case at hand, let

A = all the Hawking radiation that came out before Alice jumped in,
B = the next outgoing mode leaving the horizon,
C = the infalling partner mode on the other side of the horizon.

The first fact we will use is that, because the “no drama” postulate makes
B and C (essentially) maximally entangled, we have $S_{BC} \approx 0$ and
$S_{ABC} \approx S_A$, so strong subadditivity ($S_{AB} + S_{BC} \geq S_B + S_{ABC}$) reduces to
$$S_{AB} \geq S_B + S_A. \tag{89}$$
The second fact we will use is that, because Alice is conducting this
experiment after the Page time, the emission of the B mode will decrease
the entanglement entropy:
$$S_A > S_{AB}.$$
Combining the two facts gives
$$S_A > S_B + S_A, \tag{90}$$
in other words $S_B < 0$. But entanglement entropies are never negative, and
indeed $S_B > 0$ because B is entangled with C,
giving us a contradiction.
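Strong subadditivity itself can feel unintuitive, but it is easy to test numerically. The sketch below (my own check, using randomly generated three-qubit density matrices and a home-rolled partial trace) verifies $S_{AB} + S_{BC} \geq S_B + S_{ABC}$ on many random states:

```python
import numpy as np

rng = np.random.default_rng(1)

def entropy(rho):
    """Von Neumann entropy in bits."""
    p = np.linalg.eigvalsh(rho)
    p = p[p > 1e-12]
    return float(-np.sum(p * np.log2(p)))

def ptrace(rho, dims, keep):
    """Partial trace keeping only the subsystems listed in `keep`."""
    n = len(dims)
    t = rho.reshape(dims + dims)   # one ket axis and one bra axis per factor
    cur = n
    for i in sorted((j for j in range(n) if j not in keep), reverse=True):
        t = np.trace(t, axis1=i, axis2=i + cur)   # contract ket i with bra i
        cur -= 1
    d = int(np.prod([dims[i] for i in keep]))
    return t.reshape(d, d)

def random_density_matrix(d):
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T           # positive semidefinite by construction
    return rho / np.trace(rho)

dims = [2, 2, 2]                   # qubits A, B, C
for _ in range(100):
    rho = random_density_matrix(8)
    S = lambda keep: entropy(ptrace(rho, dims, keep))
    # strong subadditivity: S_AB + S_BC >= S_B + S_ABC
    assert S([0, 1]) + S([1, 2]) >= S([1]) + S([0, 1, 2]) - 1e-9
print("strong subadditivity held in every trial")
```

Of course, 100 random trials prove nothing; the point is only that the inequality the argument leans on is not some exotic assumption but a bread-and-butter property of quantum states.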
Morally speaking, the above argument shows that BHC wants “too
much.” If all the information comes out of the black hole, then the
outgoing mode must be highly entangled with all the radiation that
already came out once the Page time has passed. But if we subscribe
to the “no drama” principle, then Alice shouldn’t need to know about
all that old radiation to describe what’s happening near the horizon.
The relevant degrees of freedom should be right in front of her, just like
Hawking thought originally.
(Another way people like to explain this paradox is to invoke some-
thing called the “monogamy of entanglement,” saying that the outgoing
mode can’t both be entangled with near-horizon degrees of freedom and
all the outgoing radiation.)
Now I’m sure there’s a question on your mind. Where does any
“Firewall” come into this? Well, one suggestion that the AMPS paper
makes for resolving the paradox is to say that the outgoing Hawking
mode isn’t entangled with any near-horizon degrees of freedom in the
way QFT predicts. In other words, they suggest ditching the no-drama
principle. As I discussed earlier in the notes, breaking entanglement on
short distances in quantum field theory means that the energy density
becomes extremely high, due to the gradient term in the Hamiltonian.
This would be the so-called “Firewall.” Perhaps it means that space-
time ends at the horizon, and that you really can’t enter a black hole
after all.
One final thing I should mention is that Alice doesn’t actually have
to cross the horizon in order to figure out if the outgoing mode and the
infalling partner mode are entangled. It is enough for her to conduct
repeated measurements on multiple different outgoing modes. For ex-
ample, say you could conduct measurements on many spins, with the
knowledge that they were all prepared the same way. You may start by
conducting measurements using the observable σz . If all the measure-
ments come out to be +1, then you can be pretty sure that they were
all in the +1 eigenstate of σz. However, if half are +1 and the other
half are −1, then you don’t yet know if your states are in a mixed state
or just in a superposition of σz eigenstates. You could then conduct
measurements with σx and σy on the remaining spins to figure out if
your states really were mixed the whole time. Going back to Alice, she
could try to detect superpositions between the different $|n\rangle_{k,\rm out}$ states
for many different modes k. If there are no such superpositions, she
would deduce that the outgoing modes really are entangled with their
infalling partner modes without ever entering the black hole.
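The spin example above is easy to simulate. In the sketch below (my own illustration), the pure superposition state |+⟩ and the maximally mixed state give identical σz statistics, but σx measurements immediately tell them apart:

```python
import numpy as np

# Pauli matrices
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

# Two preparations with identical sigma_z statistics:
plus = np.array([1, 1], dtype=complex) / np.sqrt(2)   # the superposition |+>
rho_pure = np.outer(plus, plus.conj())                # pure state
rho_mixed = np.eye(2, dtype=complex) / 2              # maximally mixed state

def expval(rho, op):
    """Expectation value Tr(rho * op) of an observable."""
    return float(np.real(np.trace(rho @ op)))

print(expval(rho_pure, sz), expval(rho_mixed, sz))   # 0.0 0.0: indistinguishable
print(expval(rho_pure, sx), expval(rho_mixed, sx))   # 1.0 0.0: sigma_x tells them apart
```

Both states give 50/50 outcomes for σz, but only the genuine superposition has ⟨σx⟩ = 1, which is the sense in which measuring in a second basis reveals whether the ensemble was mixed all along.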
6 of [1]. The question we must ask is: why should Alice be allowed to
describe everything she sees using density matrices, anyway? Certainly,
in order to actually reach a contradiction, there first must be some mea-
surement she could conduct which could actually show that the outgoing
mode B really is entangled with all the old radiation A. But how can
she perform this measurement anyway?
In order to do this, she would have to first “distill the qubits” in A
which are entangled with B. But doing that is not so easy. In fact, it
turns out that is a very difficult computation for a quantum computer
to do. It would probably take a quantum circuit of exponential size to
do, and by the time Alice finished, the black hole would have already
evaporated. That is, the problem is likely to be intractable. It takes
exponential time to distill the qubit, but only polynomial time for the
black hole to go away. More specifically, Harlow and Hayden showed that
if Alice is able to distill the entanglement in time, then SZK ⊆ BQP.
Apparently, computer scientists have many reasons to believe that that
is not the case.
This would be a pretty weird resolution to the Firewall paradox.
What happens if Alice just gets, like, really lucky and finishes her distil-
lation in time to jump in? (I should mention that not enough is known
about the Harlow-Hayden resolution to know if such luck is really possible.
However, it also cannot yet be ruled out.) Would the firewall exist
in that case? Computer scientists are fine with resolutions like Har-
low and Hayden’s, because they don’t really care about the case where
you’re just super lucky. It’s of no concern to them. But physicists are
not used to the laws of physics being altered so dramatically by luck,
even if the luck required is exponentially extreme. Can a whole region
of space-time really go away just like that?
References
[1] Scott Aaronson. The complexity of quantum states and transformations:
from quantum money to black holes. arXiv preprint arXiv:1607.05256, 2016.
[3] Daniel Harlow. Jerusalem lectures on black holes and quantum information.
Reviews of Modern Physics, 88(1):015002, 2016.
[4] Stephen W. Hawking. Particle creation by black holes. Communications
in Mathematical Physics, 43(3):199–220, 1975.