Ergodic Theory and Recurrence


1

Recurrence

Ergodic theory studies the behavior of dynamical systems with respect to measures that remain invariant under time evolution. Indeed, it aims to describe
those properties that are valid for the trajectories of almost all initial states
of the system, that is, all but a subset that has zero weight for the invariant
measure. Our first task, in Section 1.1, will be to explain what we mean by
‘dynamical system’ and ‘invariant measure’.
The roots of the theory date back to the first half of the 19th century.
By 1838, the French mathematician Joseph Liouville observed that every
energy-preserving system in classical (Newtonian) mechanics admits a natural
invariant volume measure in the space of configurations. Just a bit later, in
1845, the great German mathematician Carl Friedrich Gauss pointed out that
the transformation

$$(0, 1] \to \mathbb{R}, \qquad x \mapsto \text{fractional part of } \frac{1}{x},$$

which has an important role in number theory, admits an invariant measure
equivalent to the Lebesgue measure (in the sense that the two have the same
zero measure sets). These are two of the examples of applications of ergodic
theory that we discuss in Section 1.3. Many others are introduced throughout
this book.
The first important result was found by the great French mathematician
Henri Poincaré by the end of the 19th century. Poincaré was particularly
interested in the motion of celestial bodies, such as planets and comets, which
is described by certain differential equations originating from Newton’s law of
universal gravitation. Starting from Liouville’s observation, Poincaré realized
that for almost every initial state of the system, that is, almost every value of
the initial position and velocity, the solution of the differential equation comes
back arbitrarily close to that initial state, unless it goes to infinity. Even more,
this recurrence property is not specific to (celestial) mechanics: it is shared by
any dynamical system that admits a finite invariant measure. That is the theme
of Section 1.2.

The same theme reappears in Section 1.5, in a more elaborate context: there,
we deal with any finite number of dynamical systems commuting with each
other, and we seek simultaneous returns of the orbits of all those systems to the
neighborhood of the initial state. This kind of result has important applications
in combinatorics and number theory, as we will see.
The recurrence phenomenon is also behind the constructions that we present
in Section 1.4. The basic idea is to fix some positive measure subset of
the domain and to consider the first return to that subset. This first-return
transformation is often easier to analyze, and it may be used to shed much
light on the behavior of the original transformation.

1.1 Invariant measures


Let $(M, \mathcal{B}, \mu)$ be a measure space and $f : M \to M$ be a measurable transformation. We say that the measure $\mu$ is invariant under $f$ if

$$\mu(E) = \mu\big(f^{-1}(E)\big) \quad \text{for every measurable set } E \subset M. \tag{1.1.1}$$

We also say that $\mu$ is $f$-invariant, or that $f$ preserves $\mu$, to mean just the same. Notice that the definition (1.1.1) makes sense, since the pre-image of a measurable set under a measurable transformation is still a measurable set. Heuristically, the definition means that the probability that a point picked “at random” is in a given subset is equal to the probability that its image is in that subset.
It is possible, and convenient, to extend this definition to other types of dynamical systems, beyond transformations. We are especially interested in flows, that is, families of transformations $f^t : M \to M$, with $t \in \mathbb{R}$, satisfying the following conditions:

$$f^0 = \mathrm{id} \quad \text{and} \quad f^{s+t} = f^s \circ f^t \quad \text{for every } s, t \in \mathbb{R}. \tag{1.1.2}$$

In particular, each transformation $f^t$ is invertible and the inverse is $f^{-t}$. Flows arise naturally in connection with differential equations of the form

$$\frac{d\gamma}{dt}(t) = X(\gamma(t))$$

in the following way: under suitable conditions on the vector field $X$ (for example, when $X$ is Lipschitz and every solution is defined for all times), for each point $x$ in the domain $M$ there exists exactly one solution $t \mapsto \gamma_x(t)$ of the differential equation with $\gamma_x(0) = x$; then $f^t(x) = \gamma_x(t)$ defines a flow in $M$.
We say that a measure $\mu$ is invariant under a flow $(f^t)_t$ if it is invariant under each one of the transformations $f^t$, that is, if

$$\mu(E) = \mu\big(f^{-t}(E)\big) \quad \text{for every measurable set } E \subset M \text{ and } t \in \mathbb{R}. \tag{1.1.3}$$
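As an illustration of (1.1.2) and (1.1.3), here is a minimal sketch using an example of our own choosing, not taken from the text: the vector field $X(x, y) = (-y, x)$ on the plane, whose flow $f^t$ is the rotation by angle $t$. It checks the group property numerically, checks that $t \mapsto f^t(p)$ solves the differential equation, and computes $\det Df^t$, which equals 1; by the change-of-variables formula (compare Exercise 1.1.5), this is why Lebesgue measure on the plane is invariant under this flow. The names `flow`, `jacobian_det` and `close`, the sample point and the tolerances are hypothetical choices made for the sketch.

```python
import math

def flow(t, p):
    """Time-t map f^t of the vector field X(x, y) = (-y, x): rotation of the plane by angle t."""
    x, y = p
    c, s = math.cos(t), math.sin(t)
    return (c * x - s * y, s * x + c * y)

def jacobian_det(t):
    """det Df^t; the derivative of f^t is the constant matrix [[c, -s], [s, c]]."""
    c, s = math.cos(t), math.sin(t)
    return c * c - (-s) * s

def close(p, q, tol=1e-9):
    """Componentwise comparison of two points up to a tolerance."""
    return all(abs(a - b) < tol for a, b in zip(p, q))

p, s, t = (0.3, -1.7), 0.4, 2.5

# Group property (1.1.2): f^0 = id, f^{s+t} = f^s o f^t, and f^{-t} inverts f^t.
print(close(flow(0.0, p), p),
      close(flow(s + t, p), flow(s, flow(t, p))),
      close(flow(-t, flow(t, p)), p))

# t -> f^t(p) solves d(gamma)/dt = X(gamma): compare a central difference with X.
h = 1e-6
(a0, b0), (a1, b1) = flow(t - h, p), flow(t + h, p)
x, y = flow(t, p)
print(close(((a1 - a0) / (2 * h), (b1 - b0) / (2 * h)), (-y, x), tol=1e-6))

print(jacobian_det(t))  # 1 up to rounding, so Lebesgue measure is preserved
```

Rotations were chosen because both the flow and its Jacobian are available in closed form, so nothing in the check depends on a numerical ODE solver.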



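Proposition 1.1.1. Let $f : M \to M$ be a measurable transformation and $\mu$ be a measure on $M$. Then $f$ preserves $\mu$ if and only if

$$\int \varphi \, d\mu = \int \varphi \circ f \, d\mu \tag{1.1.4}$$

for every $\mu$-integrable function $\varphi : M \to \mathbb{R}$.

Proof. Suppose that the measure $\mu$ is invariant under $f$. We are going to show that the relation (1.1.4) is valid for increasingly broader classes of functions. Let $\mathcal{X}_B$ denote the characteristic function of any measurable subset $B$. Then

$$\mu(B) = \int \mathcal{X}_B \, d\mu \quad \text{and} \quad \mu\big(f^{-1}(B)\big) = \int \mathcal{X}_{f^{-1}(B)} \, d\mu = \int (\mathcal{X}_B \circ f) \, d\mu.$$

Thus, the hypothesis $\mu(B) = \mu(f^{-1}(B))$ means that (1.1.4) is valid for characteristic functions. Then, by linearity of the integral, (1.1.4) is valid for all simple functions. Next, given any integrable $\varphi : M \to \mathbb{R}$, consider a sequence $(s_n)_n$ of simple functions converging to $\varphi$ and such that $|s_n| \le |\varphi|$ for every $n$. That such a sequence exists is guaranteed by Proposition A.1.33. Then, using the dominated convergence theorem (Theorem A.2.11) twice (for the second application, note that $|s_n \circ f| \le |\varphi| \circ f$, and that $|\varphi| \circ f$ is integrable because the previous step, combined with the monotone convergence theorem, gives $\int |\varphi| \circ f \, d\mu = \int |\varphi| \, d\mu < \infty$):

$$\int \varphi \, d\mu = \lim_n \int s_n \, d\mu = \lim_n \int (s_n \circ f) \, d\mu = \int (\varphi \circ f) \, d\mu.$$

This shows that (1.1.4) holds for every integrable function if $\mu$ is invariant. The converse is also contained in the arguments we just presented.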
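A quick numerical sanity check of (1.1.4), under stated assumptions: we take $\mu$ to be Lebesgue measure on $[0,1]$ and $f(x) = 10x - [10x]$, the map of Section 1.3.1 (shown there to preserve $\mu$), and compare midpoint-rule approximations of $\int \varphi \, d\mu$ and $\int \varphi \circ f \, d\mu$ with the exact value of the integral. The test function $\varphi(x) = e^x \cos 3x$ and the number of nodes are arbitrary choices; this illustrates the identity, it does not prove it.

```python
import math

def decimal_map(x):
    """f(x) = 10x - [10x], the fractional part of 10x (see Section 1.3.1)."""
    return 10 * x - math.floor(10 * x)

def midpoint_integral(g, n=100_000):
    """Midpoint-rule approximation of the integral of g over [0, 1] (Lebesgue measure)."""
    h = 1.0 / n
    return h * sum(g((i + 0.5) * h) for i in range(n))

def phi(x):
    """An arbitrary integrable test function."""
    return math.exp(x) * math.cos(3 * x)

lhs = midpoint_integral(phi)                            # integral of phi
rhs = midpoint_integral(lambda x: phi(decimal_map(x)))  # integral of phi o f
# exact value: a primitive of e^x cos(3x) is e^x (cos 3x + 3 sin 3x) / 10
exact = (math.e * (math.cos(3) + 3 * math.sin(3)) - 1) / 10
print(lhs, rhs, exact)
```

The three printed numbers should agree to several decimal places.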

1.1.1 Exercises
1.1.1. Let $f : M \to M$ be a measurable transformation. Show that a Dirac measure $\delta_p$ is invariant under $f$ if and only if $p$ is a fixed point of $f$. More generally, a probability measure $\delta_{p,k} = k^{-1}\big(\delta_p + \delta_{f(p)} + \cdots + \delta_{f^{k-1}(p)}\big)$ is invariant under $f$ if and only if $f^k(p) = p$.
1.1.2. Prove the following version of Proposition 1.1.1. Let $M$ be a metric space, $f : M \to M$ be a measurable transformation and $\mu$ be a measure on $M$. Show that $f$ preserves $\mu$ if and only if

$$\int \varphi \, d\mu = \int \varphi \circ f \, d\mu$$

for every bounded continuous function $\varphi : M \to \mathbb{R}$.


1.1.3. Prove that if $f : M \to M$ preserves a measure $\mu$ then, given any $k \ge 2$, the iterate $f^k$ also preserves $\mu$. Is the converse true?
1.1.4. Suppose that $f : M \to M$ preserves a probability measure $\mu$. Let $B \subset M$ be a measurable set satisfying any one of the following conditions:
(a) $\mu(B \setminus f^{-1}(B)) = 0$;
(b) $\mu(f^{-1}(B) \setminus B) = 0$;
(c) $\mu(B \,\Delta\, f^{-1}(B)) = 0$;
(d) $f(B) \subset B$.
Show that there exists $C \subset M$ such that $f^{-1}(C) = C$ and $\mu(B \,\Delta\, C) = 0$.
1.1.5. Let $f : U \to U$ be a $C^1$ diffeomorphism on an open set $U \subset \mathbb{R}^d$. Show that the Lebesgue measure $m$ is invariant under $f$ if and only if $|\det Df| \equiv 1$.

1.2 Poincaré recurrence theorem


We are going to study two versions of Poincaré’s theorem. The first one (Section 1.2.1) is formulated in the context of (finite) measure spaces. The theorem of Kač, which we state and prove in Section 1.2.2, provides a quantitative complement to that statement. The second version of the recurrence theorem (Section 1.2.3) assumes that the ambient space is a topological space with certain additional properties. We will also prove a third version of the recurrence theorem, due to Birkhoff, whose statement is purely topological.

1.2.1 Measurable version


Our first result asserts that, given any finite invariant measure, almost every point in any positive measure set $E$ returns to $E$ an infinite number of times:

Theorem 1.2.1 (Poincaré recurrence). Let $f : M \to M$ be a measurable transformation and $\mu$ be a finite measure invariant under $f$. Let $E \subset M$ be any measurable set with $\mu(E) > 0$. Then, for $\mu$-almost every point $x \in E$ there exist infinitely many values of $n$ for which $f^n(x)$ is also in $E$.

Proof. Denote by $E_0$ the set of points $x \in E$ that never return to $E$. As a first step, let us prove that $E_0$ has zero measure. To this end, let us observe that the pre-images $f^{-n}(E_0)$, $n \ge 1$, are pairwise disjoint. Indeed, suppose there exist $m > n \ge 1$ such that $f^{-m}(E_0)$ intersects $f^{-n}(E_0)$. Let $x$ be a point in the intersection and $y = f^n(x)$. Then $y \in E_0$ and $f^{m-n}(y) = f^m(x) \in E_0$. Since $E_0 \subset E$, this means that $y$ returns to $E$ at least once, which contradicts the definition of $E_0$. This contradiction proves that the pre-images are pairwise disjoint, as claimed.
Since $\mu$ is invariant, we also have that $\mu(f^{-n}(E_0)) = \mu(E_0)$ for all $n \ge 1$. It follows that

$$\mu\Big(\bigcup_{n=1}^{\infty} f^{-n}(E_0)\Big) = \sum_{n=1}^{\infty} \mu\big(f^{-n}(E_0)\big) = \sum_{n=1}^{\infty} \mu(E_0).$$

The expression on the left-hand side is finite, since the measure $\mu$ is assumed to be finite. On the right-hand side we have a sum of infinitely many terms that are all equal. The only way such a sum can be finite is if the terms vanish. So, $\mu(E_0) = 0$, as claimed.
Now let us denote by $F$ the set of points $x \in E$ that return to $E$ a finite number of times. It is clear from the definition that every point $x \in F$ has some iterate $f^k(x)$ in $E_0$ (namely, its last iterate in $E$). In other words,

$$F \subset \bigcup_{k=0}^{\infty} f^{-k}(E_0).$$

Since $\mu(E_0) = 0$ and $\mu$ is invariant, it follows that

$$\mu(F) \le \mu\Big(\bigcup_{k=0}^{\infty} f^{-k}(E_0)\Big) \le \sum_{k=0}^{\infty} \mu\big(f^{-k}(E_0)\big) = \sum_{k=0}^{\infty} \mu(E_0) = 0.$$

Thus, $\mu(F) = 0$, as we wanted to prove.
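Theorem 1.2.1 can be watched at work numerically. The sketch below assumes the rotation $f(x) = x + \alpha \bmod 1$ with $\alpha = (\sqrt{5} - 1)/2$, which preserves Lebesgue measure (circle rotations are discussed in Section 1.3.3), and the set $E = [0, 0.1)$. For a grid of starting points in $E$ it counts the returns to $E$ up to a finite horizon. The map, the set, the grid and the horizon are illustrative choices, floating-point orbits only approximate the true ones and, of course, a finite simulation cannot exhibit infinitely many returns.

```python
import math

ALPHA = (math.sqrt(5) - 1) / 2  # rotation number (illustrative choice)
A, B = 0.0, 0.1                 # the set E = [A, B), of Lebesgue measure 0.1

def rotate(x):
    """One step of the rotation x -> x + ALPHA (mod 1), which preserves Lebesgue measure."""
    return (x + ALPHA) % 1.0

def return_times(x, horizon):
    """All n in 1..horizon with f^n(x) in E (computed along a floating-point orbit)."""
    times, y = [], x
    for n in range(1, horizon + 1):
        y = rotate(y)
        if A <= y < B:
            times.append(n)
    return times

starts = [A + (B - A) * (k + 0.5) / 50 for k in range(50)]  # grid of points of E
counts = [len(return_times(x, 1000)) for x in starts]
print(min(counts), max(counts))  # every sampled point keeps coming back to E
```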

Theorem 1.2.1 implies an analogous result for continuous time systems: if $\mu$ is a finite invariant measure of a flow $(f^t)_t$, then for every measurable set $E \subset M$ with positive measure and for $\mu$-almost every $x \in E$ there exist times $t_j \to +\infty$ such that $f^{t_j}(x) \in E$. Indeed, if $\mu$ is invariant under the flow then, in particular, it is invariant under the so-called time-1 map $f^1$. So, the statement we just made follows immediately from Theorem 1.2.1 applied to $f^1$ (the times $t_j$ one finds in this way are integers). Similar observations apply to the other versions of the recurrence theorem that we present in the sequel.
On the other hand, the theorem in the next section is specific to discrete time systems.

1.2.2 Kač theorem


Let $f : M \to M$ be a measurable transformation and $\mu$ be a finite measure invariant under $f$. Let $E \subset M$ be any measurable set with $\mu(E) > 0$. Consider the first-return time function $\rho_E : E \to \mathbb{N} \cup \{\infty\}$, defined by

$$\rho_E(x) = \min\{n \ge 1 : f^n(x) \in E\} \tag{1.2.1}$$

if the set on the right-hand side is non-empty, and $\rho_E(x) = \infty$ if, on the contrary, $x$ has no iterate in $E$. According to Theorem 1.2.1, the second alternative occurs only on a set with zero measure.
The next result shows that this function is integrable and even provides the value of the integral. For the statement we need the following notation:

$$E_0 = \{x \in E : f^n(x) \notin E \text{ for every } n \ge 1\} \quad \text{and} \quad E_0^* = \{x \in M : f^n(x) \notin E \text{ for every } n \ge 0\}.$$

In other words, $E_0$ is the set of points in $E$ that never return to $E$ and $E_0^*$ is the set of points in $M$ that never enter $E$. We have seen in Theorem 1.2.1 that $\mu(E_0) = 0$.

Theorem 1.2.2 (Kač). Let $f : M \to M$ be a measurable transformation, $\mu$ be a finite invariant measure and $E \subset M$ be a positive measure set. Then the function $\rho_E$ is integrable and

$$\int_E \rho_E \, d\mu = \mu(M) - \mu(E_0^*).$$

Proof. For each $n \ge 1$, define

$$E_n = \{x \in E : f(x) \notin E, \dots, f^{n-1}(x) \notin E, \text{ but } f^n(x) \in E\} \quad \text{and}$$
$$E_n^* = \{x \in M : x \notin E, f(x) \notin E, \dots, f^{n-1}(x) \notin E, \text{ but } f^n(x) \in E\}.$$

That is, $E_n$ is the set of points of $E$ that return to $E$ for the first time exactly at time $n$,

$$E_n = \{x \in E : \rho_E(x) = n\},$$

and $E_n^*$ is the set of points that are not in $E$ and enter $E$ for the first time exactly at time $n$. It is clear that these sets are measurable and, hence, $\rho_E$ is a measurable function. Moreover, the sets $E_n$, $E_n^*$, $n \ge 0$, constitute a partition of the ambient space: they are pairwise disjoint and their union is the whole of $M$. So, since $\mu(E_0) = 0$,

$$\mu(M) = \sum_{n=0}^{\infty} \big(\mu(E_n) + \mu(E_n^*)\big) = \mu(E_0^*) + \sum_{n=1}^{\infty} \big(\mu(E_n) + \mu(E_n^*)\big). \tag{1.2.2}$$

Now observe that

$$f^{-1}(E_n^*) = E_{n+1}^* \cup E_{n+1} \quad \text{for every } n \ge 1. \tag{1.2.3}$$

Indeed, $f(y) \in E_n^*$ means that the first iterate of $f(y)$ that belongs to $E$ is $f^n(f(y)) = f^{n+1}(y)$, and that occurs if and only if $y \in E_{n+1}^*$ or else $y \in E_{n+1}$. This proves the equality (1.2.3). So, given that $\mu$ is invariant,

$$\mu(E_n^*) = \mu\big(f^{-1}(E_n^*)\big) = \mu(E_{n+1}^*) + \mu(E_{n+1}) \quad \text{for every } n \ge 1.$$

Applying this relation successively, we find that

$$\mu(E_n^*) = \mu(E_m^*) + \sum_{i=n+1}^{m} \mu(E_i) \quad \text{for every } m > n \ge 1. \tag{1.2.4}$$

The relation (1.2.2) implies that $\mu(E_m^*) \to 0$ when $m \to \infty$. So, taking the limit as $m \to \infty$ in the equality (1.2.4), we find that

$$\mu(E_n^*) = \sum_{i=n+1}^{\infty} \mu(E_i). \tag{1.2.5}$$

To complete the proof, substitute (1.2.5) into the equality (1.2.2). In this way we find that

$$\mu(M) - \mu(E_0^*) = \sum_{n=1}^{\infty} \Big(\sum_{i=n}^{\infty} \mu(E_i)\Big) = \sum_{n=1}^{\infty} n\,\mu(E_n) = \int_E \rho_E \, d\mu,$$

as we wanted to prove.
In some cases, for example when the system $(f, \mu)$ is ergodic (this property will be defined and studied later, starting from Chapter 4), the set $E_0^*$ has zero measure. Then the conclusion of the Kač theorem means that

$$\frac{1}{\mu(E)} \int_E \rho_E \, d\mu = \frac{\mu(M)}{\mu(E)} \tag{1.2.6}$$

for every measurable set $E$ with positive measure. The left-hand side of this expression is the mean return time to $E$. So, (1.2.6) asserts that the mean return time is inversely proportional to the measure of $E$.
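For the rotation $f(x) = x + \alpha \bmod 1$ with $\alpha = (\sqrt{5} - 1)/2$ and $E = [0, 0.1)$ (again an illustrative choice of ours), every orbit is dense in the circle, so $E_0^*$ is empty and (1.2.6) predicts a mean return time of $\mu(M)/\mu(E) = 10$. The following sketch approximates $\frac{1}{\mu(E)} \int_E \rho_E \, d\mu$ by averaging $\rho_E$ over a uniform grid of points of $E$; the grid size and the iteration cap `max_iter` are hypothetical parameters.

```python
import math

ALPHA = (math.sqrt(5) - 1) / 2  # rotation number (illustrative choice)
A, B = 0.0, 0.1                 # E = [A, B): mu(E) = 0.1, mu(M) = 1

def first_return_time(x, max_iter=10_000):
    """rho_E(x) = min{n >= 1 : f^n(x) in E} for f(x) = x + ALPHA (mod 1);
    returns math.inf if no return is seen within max_iter steps."""
    y = x
    for n in range(1, max_iter + 1):
        y = (y + ALPHA) % 1.0
        if A <= y < B:
            return n
    return math.inf

N = 10_000
grid = [A + (B - A) * (i + 0.5) / N for i in range(N)]  # midpoints of a uniform partition of E
mean_return = sum(first_return_time(x) for x in grid) / N
print(mean_return)  # Kac theorem: mu(M) / mu(E) = 10
```

For this particular choice of $\alpha$ and $E$, the function $\rho_E$ takes only the values 5, 8 and 13, each on a subinterval of $E$, and the grid average comes out very close to 10.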
Remark 1.2.3. By definition, $E_n^* = f^{-n}(E) \setminus \bigcup_{k=0}^{n-1} f^{-k}(E)$. So, the fact that the sum (1.2.2) is finite implies that the measure of $E_n^*$ converges to zero when $n \to \infty$. This fact will be useful later.

1.2.3 Topological version


Now let us suppose that $M$ is a topological space, endowed with its Borel $\sigma$-algebra $\mathcal{B}$. A point $x \in M$ is recurrent for a transformation $f : M \to M$ if there exists a sequence $n_j \to \infty$ of natural numbers such that $f^{n_j}(x) \to x$. Analogously, we say that $x \in M$ is recurrent for a flow $(f^t)_t$ if there exists a sequence $t_j \to +\infty$ of real numbers such that $f^{t_j}(x) \to x$ when $j \to \infty$.
In the next theorem we assume that the topological space $M$ admits a countable basis of open sets, that is, there exists a countable family $\{U_k : k \in \mathbb{N}\}$ of open sets such that every open subset of $M$ may be written as a union of elements $U_k$ of this family. This condition holds in most interesting examples; for instance, every separable metric space (in particular, every compact metric space and every subset of $\mathbb{R}^d$) admits a countable basis of open sets.
Theorem 1.2.4 (Poincaré recurrence). Suppose that $M$ admits a countable basis of open sets. Let $f : M \to M$ be a measurable transformation and $\mu$ be a finite measure on $M$ invariant under $f$. Then, $\mu$-almost every $x \in M$ is recurrent for $f$.

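Proof. For each $k$, denote by $\tilde{U}_k$ the set of points $x \in U_k$ that return to $U_k$ only finitely many times (possibly never). According to Theorem 1.2.1, every $\tilde{U}_k$ has zero measure. Consequently, the countable union

$$\tilde{U} = \bigcup_{k \in \mathbb{N}} \tilde{U}_k$$

also has zero measure. Hence, to prove the theorem it suffices to check that every point $x$ that is not in $\tilde{U}$ is recurrent. That is easy, as we are going to see.
Consider $x \in M \setminus \tilde{U}$ and let $U$ be any neighborhood of $x$. By definition, there exists some element $U_k$ of the basis of open sets such that $x \in U_k$ and $U_k \subset U$. Since $x$ is not in $\tilde{U}$, we also have that $x \notin \tilde{U}_k$. In other words, there exist infinitely many $n \ge 1$ such that $f^n(x)$ is in $U_k$. In particular, $f^n(x)$ is also in $U$ for infinitely many values of $n$. Since the neighborhood $U$ is arbitrary, and $x$ has a decreasing sequence of neighborhoods $W_1 \supset W_2 \supset \cdots$ such that every neighborhood of $x$ contains some $W_j$ (finite intersections of the elements $U_k$ that contain $x$), we may choose $n_1 < n_2 < \cdots$ with $f^{n_j}(x) \in W_j$, so that $f^{n_j}(x) \to x$. This proves that $x$ is a recurrent point.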

Let us point out that the conclusions of Theorems 1.2.1 and 1.2.4 are false,
in general, if the measure μ is not finite:
Example 1.2.5. Let $f : \mathbb{R} \to \mathbb{R}$ be the translation by 1, that is, the transformation defined by $f(x) = x + 1$ for every $x \in \mathbb{R}$. It is easy to check that $f$ preserves the Lebesgue measure on $\mathbb{R}$ (which is infinite). On the other hand, no point $x \in \mathbb{R}$ is recurrent for $f$. According to the recurrence theorem, this last observation implies that $f$ cannot admit any finite invariant measure other than the zero measure.

However, it is possible to extend these statements to certain cases of infinite measures: see Exercise 1.2.2.
To conclude, we present a purely topological version of Theorem 1.2.4,
called the Birkhoff recurrence theorem, that makes no reference at all to
invariant measures:

Theorem 1.2.6 (Birkhoff recurrence). If $f : M \to M$ is a continuous transformation on a compact metric space $M$, then there exists some point $x \in M$ that is recurrent for $f$.

Proof. Consider the family $\mathcal{I}$ of all non-empty closed sets $X \subset M$ that are invariant under $f$, in the sense that $f(X) \subset X$. This family is non-empty, since $M \in \mathcal{I}$. We claim that an element $X \in \mathcal{I}$ is minimal for the inclusion relation if and only if the orbit of every $x \in X$ is dense in $X$. Indeed, it is clear that if $X$ is a closed invariant subset then $X$ contains the closure of the orbit of each one of its elements. Hence, in order to be minimal, $X$ must coincide with every one of these closures. Conversely, for the same reason, if $X$ coincides with the orbit closure of each one of its points then it has no proper subset that is closed and invariant. That is, $X$ is minimal. This proves our claim. In particular, every point $x$ in a minimal set $X$ is recurrent: since $f(x) \in X$, the orbit of $f(x)$ is dense in $X$, so $x$ is a limit of iterates $f^n(x)$ with $n \ge 1$; unless $x$ is periodic (in which case it is trivially recurrent), the exponents $n$ cannot stay bounded, since otherwise some $f^n(x)$, $n \ge 1$, would be equal to $x$. Therefore, to prove the theorem it suffices to prove that there exists some minimal set.
We claim that every totally ordered set $\{X_\alpha\} \subset \mathcal{I}$ admits a lower bound. Indeed, consider $X = \bigcap_\alpha X_\alpha$. Observe that $X$ is non-empty, since the $X_\alpha$ are compact and they form a totally ordered family (so every finite intersection of them is one of the $X_\alpha$, hence non-empty). It is clear that $X$ is closed and invariant under $f$, and it is equally clear that $X$ is a lower bound for the set $\{X_\alpha\}$. That proves our claim. Now it follows from Zorn’s lemma that $\mathcal{I}$ does contain minimal elements.

Theorem 1.2.6 can also be deduced from Theorem 1.2.4 together with
the fact, which we will prove later (in Chapter 2), that every continuous
transformation on a compact metric space admits some invariant probability
measure.

1.2.4 Exercises
1.2.1. Show that the following statement is equivalent to Theorem 1.2.1, meaning that each one of them can be obtained from the other. Let $f : M \to M$ be a measurable transformation and $\mu$ be a finite invariant measure. Let $E \subset M$ be any measurable set with $\mu(E) > 0$. Then there exist $N \ge 1$ and a positive measure set $D \subset E$ such that $f^N(x) \in E$ for every $x \in D$.
1.2.2. Let f : M → M be an invertible transformation and suppose that μ is an invariant
measure, not necessarily finite. Let B ⊂ M be a set with finite measure. Prove
that, given any measurable set E ⊂ M with positive measure, μ-almost every
point x ∈ E either returns to E an infinite number of times or has only a finite
number of iterates in B.
1.2.3. Let $f : M \to M$ be an invertible transformation and suppose that $\mu$ is a $\sigma$-finite invariant measure: there exists an increasing sequence of measurable subsets $M_k$ with $\mu(M_k) < \infty$ for every $k$ and $\bigcup_k M_k = M$. We say that a point $x$ goes to infinity if, for every $k$, there exists only a finite number of iterates of $x$ that are in $M_k$. Show that, given any $E \subset M$ with positive measure, $\mu$-almost every point $x \in E$ returns to $E$ an infinite number of times or else goes to infinity.
1.2.4. Let $f : M \to M$ be a measurable transformation, not necessarily invertible, $\mu$ be an invariant probability measure and $D \subset M$ be a set with positive measure. Prove that almost every point of $D$ spends a positive fraction of time in $D$:

$$\limsup_n \frac{1}{n} \#\{0 \le j \le n - 1 : f^j(x) \in D\} > 0$$

for $\mu$-almost every $x \in D$. [Note: One may replace lim sup by lim inf in the statement, but the proof of that fact will have to wait until Chapter 3.]
1.2.5. Let $f : M \to M$ be a measurable transformation preserving a finite measure $\mu$. Given any measurable set $A \subset M$ with $\mu(A) > 0$, let $n_1 < n_2 < \cdots$ be the sequence of values of $n$ such that $\mu(f^{-n}(A) \cap A) > 0$. The goal of this exercise is to prove that $V_A = \{n_1, n_2, \dots\}$ is syndetic, that is, that there exists $C > 0$ such that $n_{i+1} - n_i \le C$ for every $i$.
(a) Show that for any increasing sequence $k_1 < k_2 < \cdots$ there exist $j > i \ge 1$ such that $\mu(A \cap f^{-(k_j - k_i)}(A)) > 0$.
(b) Given any infinite sequence $\ell = (l_j)_j$ of natural numbers, denote by $S(\ell)$ the set of all finite sums of consecutive elements of $\ell$. Show that $V_A$ intersects $S(\ell)$ for every $\ell$.
(c) Deduce that the set $V_A$ is syndetic.
[Note: Exercise 3.1.2 provides a different proof of this fact.]
1.2.6. Show that if $f : [0,1] \to [0,1]$ is a measurable transformation preserving the Lebesgue measure $m$, then $m$-almost every point $x \in [0,1]$ satisfies

$$\liminf_n n\,|f^n(x) - x| \le 1.$$

[Note: Boshernitzan [Bos93] proved a much more general result, namely that $\liminf_n n^{1/d}\, d(f^n(x), x) < \infty$ for $\mu$-almost every point and every probability measure $\mu$ invariant under $f : M \to M$, assuming $M$ is a separable metric space whose $d$-dimensional Hausdorff measure is $\sigma$-finite.]
1.2.7. Define $f : [0,1] \to [0,1]$ by $f(x) = (x + \omega) - [x + \omega]$, where $\omega$ represents the golden ratio $(1 + \sqrt{5})/2$. Given $x \in [0,1]$, check that $n\,|f^n(x) - x| = n^2 |\omega - q_n|$ for every $n$, where $(q_n)_n \to \omega$ is the sequence of rational numbers given by $q_n = [x + n\omega]/n$. Using that the roots of the polynomial $R(z) = z^2 - z - 1$ are precisely $\omega$ and $\omega - \sqrt{5}$, prove that $\liminf_n n^2 |\omega - q_n| \ge 1/\sqrt{5}$. [Note: This shows that the constant 1 in Exercise 1.2.6 cannot be replaced by any constant smaller than $1/\sqrt{5}$. It is not known whether 1 is the smallest constant such that the statement holds for every transformation on the interval.]
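A numerical look at Exercise 1.2.7 (not a solution): the sketch computes $n\,|f^n(x) - x|$ for the rotation by the golden ratio $\omega$, using $f^n(x) = (x + n\omega) - [x + n\omega]$ directly rather than iterating, at the arbitrarily chosen point $x = 0.3$, and reports the smallest value over a range of $n$. The exercise predicts that these values cannot stay below $1/\sqrt{5} \approx 0.4472$ for large $n$; the point, the range of $n$ and the helper names are assumptions of the sketch.

```python
import math

OMEGA = (1 + math.sqrt(5)) / 2  # golden ratio

def f_power(x, n):
    """f^n(x) = (x + n*omega) - [x + n*omega], computed in one step."""
    y = x + n * OMEGA
    return y - math.floor(y)

def scaled_gaps(x, n_min, n_max):
    """The values n * |f^n(x) - x| for n_min <= n <= n_max."""
    return [n * abs(f_power(x, n) - x) for n in range(n_min, n_max + 1)]

gaps = scaled_gaps(0.3, 100, 20_000)
print(min(gaps), 1 / math.sqrt(5))
```

The smallest values occur at Fibonacci numbers $n$, where they come very close to $1/\sqrt{5}$.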

1.3 Examples
Next, we describe some simple examples of invariant measures for transformations and flows that help us interpret the significance of the Poincaré recurrence theorems and also lead to some interesting conclusions.

1.3.1 Decimal expansion


Our first example is the transformation defined on the interval $[0,1]$ in the following way:

$$f : [0,1] \to [0,1], \quad f(x) = 10x - [10x].$$

Here and in what follows, we use $[y]$ to denote the integer part of a real number $y$, that is, the largest integer smaller than or equal to $y$. So, $f$ is the map sending each $x \in [0,1]$ to the fractional part of $10x$. Figure 1.1 represents the graph of $f$.
We claim that the Lebesgue measure $\mu$ on the interval is invariant under the transformation $f$, that is, it satisfies

$$\mu(E) = \mu\big(f^{-1}(E)\big) \quad \text{for every measurable set } E \subset [0,1]. \tag{1.3.1}$$

This can be checked as follows. Let us begin by supposing that $E$ is an interval. Then, as illustrated in Figure 1.1, its pre-image $f^{-1}(E)$ consists of ten intervals, each of which is ten times shorter than $E$. Hence, the Lebesgue measure of $f^{-1}(E)$ is equal to the Lebesgue measure of $E$. This proves that (1.3.1) does hold in the case of intervals. As a consequence, it also holds when $E$ is a finite union of intervals.

Figure 1.1. Fractional part of 10x

Now, the family of all finite unions of intervals is an algebra that generates the Borel $\sigma$-algebra of $[0,1]$. Hence, to conclude the proof it is enough to use the following general fact:

Lemma 1.3.1. Let $f : M \to M$ be a measurable transformation and $\mu$ be a finite measure on $M$. Suppose that there exists some algebra $\mathcal{A}$ of measurable subsets of $M$ such that $\mathcal{A}$ generates the $\sigma$-algebra $\mathcal{B}$ of $M$ and $\mu(E) = \mu(f^{-1}(E))$ for every $E \in \mathcal{A}$. Then the latter remains true for every set $E \in \mathcal{B}$, that is, the measure $\mu$ is invariant under $f$.

Proof. We start by proving that $\mathcal{C} = \{E \in \mathcal{B} : \mu(E) = \mu(f^{-1}(E))\}$ is a monotone class. Let $E_1 \subset E_2 \subset \cdots$ be any increasing sequence of elements of $\mathcal{C}$ and let $E = \bigcup_{i=1}^{\infty} E_i$. By Theorem A.1.14 (see Exercise A.1.9),

$$\mu(E) = \lim_i \mu(E_i) \quad \text{and} \quad \mu\big(f^{-1}(E)\big) = \lim_i \mu\big(f^{-1}(E_i)\big).$$

So, using the fact that $E_i \in \mathcal{C}$,

$$\mu(E) = \lim_i \mu(E_i) = \lim_i \mu\big(f^{-1}(E_i)\big) = \mu\big(f^{-1}(E)\big).$$

Hence, $E \in \mathcal{C}$. In precisely the same way (using, this time, that $\mu$ is finite), one gets that the intersection of any decreasing sequence of elements of $\mathcal{C}$ is in $\mathcal{C}$. This proves that $\mathcal{C}$ is indeed a monotone class.
Now it is easy to deduce the conclusion of the lemma. Indeed, since $\mathcal{C}$ is assumed to contain $\mathcal{A}$, we may use the monotone class theorem (Theorem A.1.18) to conclude that $\mathcal{C}$ contains the $\sigma$-algebra $\mathcal{B}$ generated by $\mathcal{A}$. That is precisely what we wanted to prove.

Now we explain how one may use the fact that the Lebesgue measure is invariant under $f$, together with the Poincaré recurrence theorem, to reach some interesting conclusions. The transformation $f$ is directly related to the usual decimal expansion of a real number: if $x$ is given by

$$x = 0.a_0 a_1 a_2 a_3 \cdots$$

with $a_i \in \{0, 1, 2, 3, 4, 5, 6, 7, 8, 9\}$ and $a_i \ne 9$ for infinitely many values of $i$, then its image under $f$ is given by

$$f(x) = 0.a_1 a_2 a_3 \cdots.$$

Thus, more generally, the $n$-th iterate of $f$ can be expressed in the following way, for every $n \ge 1$:

$$f^n(x) = 0.a_n a_{n+1} a_{n+2} \cdots \tag{1.3.2}$$
Let $E$ be the subset of points $x \in [0,1]$ whose decimal expansion starts with the digit 7, that is, such that $a_0 = 7$. According to Theorem 1.2.1, almost every element in $E$ has infinitely many iterates that are also in $E$. By the expression (1.3.2), this means that there are infinitely many values of $n$ such that $a_n = 7$. So, we have shown that almost every number $x$ whose decimal expansion starts with 7 has infinitely many digits equal to 7.
Of course, instead of 7 we may consider any other digit. Even more, there is a similar result (see Exercise 1.3.2) when, instead of a single digit, one considers a block of $k \ge 1$ consecutive digits. Later on, in Chapter 3, we will prove a much stronger fact: for almost every number $x \in [0,1]$, every digit occurs with frequency $1/10$ (more generally, every block of $k \ge 1$ digits occurs with frequency $1/10^k$) in the decimal expansion of $x$.
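Iterating $f$ in floating point would be misleading here: $f$ multiplies errors by 10, so a double-precision orbit loses roughly one decimal digit of accuracy per step. The sketch below instead uses exact rational arithmetic (Python's `fractions.Fraction`) to check (1.3.2) on a terminating decimal and to read the digits $a_j = [10 f^j(x)]$ off the orbit. The helper names `decimal_map`, `iterate` and `digits` are ours.

```python
import math
from fractions import Fraction

def decimal_map(x: Fraction) -> Fraction:
    """f(x) = 10x - [10x], computed exactly with rational arithmetic."""
    y = 10 * x
    return y - math.floor(y)

def iterate(x: Fraction, n: int) -> Fraction:
    """The n-th iterate f^n(x)."""
    for _ in range(n):
        x = decimal_map(x)
    return x

def digits(x: Fraction, count: int) -> list:
    """Digits a_0, ..., a_{count-1} of x, read off the orbit as a_j = [10 f^j(x)]."""
    return [math.floor(10 * iterate(x, j)) for j in range(count)]

x = Fraction(1234567, 10**7)  # x = 0.1234567
print(iterate(x, 3))          # 4567/10000, i.e. f^3(x) = 0.4567, as (1.3.2) predicts
print(digits(x, 7))           # [1, 2, 3, 4, 5, 6, 7]
```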

1.3.2 Gauss map


The system we present in this section is related to another important algorithm
in number theory, the continued fraction expansion, which plays a central role
in the problem of finding the best rational approximation to any real number.
Let us start with a brief presentation of this algorithm.
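Given any number $x_0 \in (0,1)$, let

$$a_1 = \left[\frac{1}{x_0}\right] \quad \text{and} \quad x_1 = \frac{1}{x_0} - a_1.$$

Note that $a_1$ is a natural number, $x_1 \in [0,1)$ and

$$x_0 = \frac{1}{a_1 + x_1}.$$

Supposing that $x_1$ is different from zero, we may repeat this procedure, defining

$$a_2 = \left[\frac{1}{x_1}\right] \quad \text{and} \quad x_2 = \frac{1}{x_1} - a_2.$$

Then

$$x_1 = \frac{1}{a_2 + x_2} \quad \text{and so} \quad x_0 = \cfrac{1}{a_1 + \cfrac{1}{a_2 + x_2}}.$$

Now we may proceed by induction: for each $n \ge 1$ such that $x_{n-1} \in (0,1)$, define

$$a_n = \left[\frac{1}{x_{n-1}}\right] \quad \text{and} \quad x_n = \frac{1}{x_{n-1}} - a_n = G(x_{n-1})$$

(where $G$ is the Gauss map, defined below), and observe that

$$x_0 = \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{\ddots + \cfrac{1}{a_n + x_n}}}}. \tag{1.3.3}$$

It can be shown that the sequence

$$z_n = \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{\ddots + \cfrac{1}{a_n}}}} \tag{1.3.4}$$

converges to $x_0$ when $n \to \infty$. This is usually expressed by writing

$$x_0 = \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{\ddots + \cfrac{1}{a_n + \cdots}}}}, \tag{1.3.5}$$

which is called the continued fraction expansion of $x_0$.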
Note that the sequence $(z_n)_n$ defined by the relation (1.3.4) consists of rational numbers. Moreover, one can show that these are the best rational approximations of the number $x_0$, in the sense that each $z_n$ is closer to $x_0$ than any other rational number whose denominator is smaller than or equal to the denominator of $z_n$ (written in irreducible form). Observe also that to obtain (1.3.5) we had to assume that $x_n \in (0,1)$ for every $n \in \mathbb{N}$. If in the course of the process one encounters some $x_n = 0$, then the algorithm halts and we consider (1.3.3) to be the continued fraction expansion of $x_0$. It is clear that this can happen only if $x_0$ itself is a rational number.
This continued fraction algorithm is intimately related to a certain dynamical system on the interval $[0,1]$ that we now describe. The Gauss map $G : [0,1] \to [0,1]$ is defined by

$$G(x) = \frac{1}{x} - \left[\frac{1}{x}\right] = \text{fractional part of } 1/x$$

if $x \in (0,1]$, and $G(0) = 0$. The graph of $G$ can be easily sketched (see Figure 1.2), starting from the following observation: for every $x$ in each interval $I_k = (1/(k+1), 1/k]$, the integer part of $1/x$ is equal to $k$ and so $G(x) = 1/x - k$.
The continued fraction expansion of any number $x_0 \in (0,1)$ can be obtained from the Gauss map in the following way: for each $n \ge 1$, the natural number $a_n$ is determined by

$$G^{n-1}(x_0) \in I_{a_n},$$

and the real number $x_n$ is simply the $n$-th iterate $G^n(x_0)$ of the point $x_0$. This process halts whenever we encounter some $x_n = 0$; as we explained previously, this can only happen if $x_0$ is a rational number (see Exercise 1.3.4). In particular, the algorithm never halts on the set of irrational numbers in $(0,1)$, which has full Lebesgue measure: for every such $x_0$, all the iterates $G^n(x_0)$ lie in $(0,1)$.
Figure 1.2. Gauss map
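The continued fraction algorithm described above can be sketched directly in terms of the Gauss map. The sketch assumes a rational input $x_0 \in (0,1)$, represented exactly with `fractions.Fraction`, so that the algorithm halts as explained above; `max_terms` is a hypothetical safety cap, and the function names are ours. It returns the digits $a_n$ and the rational numbers $z_n$ of (1.3.4).

```python
import math
from fractions import Fraction

def gauss_map(x: Fraction) -> Fraction:
    """G(x) = 1/x - [1/x] for x in (0, 1], and G(0) = 0."""
    if x == 0:
        return Fraction(0)
    y = 1 / x
    return y - math.floor(y)

def continued_fraction(x0: Fraction, max_terms: int = 50) -> list:
    """Digits a_1, a_2, ... of x0: a_n = [1/G^{n-1}(x0)], i.e. G^{n-1}(x0) lies in I_{a_n}.
    Stops as soon as some x_n = G^n(x0) equals 0."""
    digits, x = [], x0
    while x != 0 and len(digits) < max_terms:
        digits.append(math.floor(1 / x))
        x = gauss_map(x)
    return digits

def convergents(digits: list) -> list:
    """z_n = 1/(a_1 + 1/(a_2 + ... + 1/a_n)) as in (1.3.4), evaluated from the inside out."""
    result = []
    for n in range(1, len(digits) + 1):
        z = Fraction(0)
        for a in reversed(digits[:n]):
            z = 1 / (a + z)
        result.append(z)
    return result

a = continued_fraction(Fraction(13, 30))
print(a)               # [2, 3, 4]
print(convergents(a))  # [Fraction(1, 2), Fraction(3, 7), Fraction(13, 30)]
```

Exact rationals are used because floating-point iteration of $G$ amplifies rounding errors, so the digits it produces soon stop being those of $x_0$.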

A remarkable fact that makes this transformation interesting from the point of view of ergodic theory is that $G$ admits an invariant probability measure that, in addition, is equivalent to the Lebesgue measure on the interval. Indeed, consider the measure defined by

$$\mu(E) = \int_E \frac{c}{1+x} \, dx \quad \text{for every measurable set } E \subset [0,1], \tag{1.3.6}$$

where $c$ is a positive constant. The integral is well defined, since the function in the integral is continuous on the interval $[0,1]$. Moreover, this function takes values inside the interval $[c/2, c]$ and that implies

$$\frac{c}{2}\, m(E) \le \mu(E) \le c\, m(E) \quad \text{for every measurable set } E \subset [0,1]. \tag{1.3.7}$$

In particular, $\mu$ is indeed equivalent to the Lebesgue measure $m$.

Proposition 1.3.2. The measure $\mu$ is invariant under $G$. Moreover, if we choose $c = 1/\log 2$ then $\mu$ is a probability measure.

Proof. We are going to use the following lemma:

Lemma 1.3.3. Let $f : [0,1] \to [0,1]$ be a transformation such that there exist pairwise disjoint open intervals $I_1, I_2, \dots$ satisfying

1. the union $\bigcup_k I_k$ has full Lebesgue measure in $[0,1]$ and
2. the restriction $f_k = f \mid I_k$ to each $I_k$ is a diffeomorphism onto $(0,1)$.

Let $\rho : [0,1] \to [0, \infty)$ be an integrable function (relative to the Lebesgue measure) satisfying

$$\rho(y) = \sum_{x \in f^{-1}(y)} \frac{\rho(x)}{|f'(x)|} \tag{1.3.8}$$

for almost every $y \in [0,1]$. Then the measure $\mu = \rho \, dx$ is invariant under $f$.
Proof. Let $\varphi = \mathcal{X}_E$ be the characteristic function of an arbitrary measurable set $E \subset [0,1]$. Changing variables in the integral,

$$\int_{I_k} \varphi(f(x))\rho(x) \, dx = \int_0^1 \varphi(y)\rho\big(f_k^{-1}(y)\big)\,\big|(f_k^{-1})'(y)\big| \, dy.$$

Note that $(f_k^{-1})'(y) = 1/f'(f_k^{-1}(y))$. So, the previous relation implies that

$$\int_0^1 \varphi(f(x))\rho(x) \, dx = \sum_{k=1}^{\infty} \int_{I_k} \varphi(f(x))\rho(x) \, dx = \sum_{k=1}^{\infty} \int_0^1 \varphi(y)\,\frac{\rho(f_k^{-1}(y))}{|f'(f_k^{-1}(y))|} \, dy. \tag{1.3.9}$$

Using the monotone convergence theorem (Theorem A.2.9) and the hypothesis (1.3.8), we see that the last expression in (1.3.9) is equal to

$$\int_0^1 \varphi(y) \sum_{k=1}^{\infty} \frac{\rho(f_k^{-1}(y))}{|f'(f_k^{-1}(y))|} \, dy = \int_0^1 \varphi(y)\rho(y) \, dy.$$

In this way we find that $\int_0^1 \varphi(f(x))\rho(x) \, dx = \int_0^1 \varphi(y)\rho(y) \, dy$. Since $\mu = \rho \, dx$ and $\varphi = \mathcal{X}_E$, this means that $\mu(f^{-1}(E)) = \mu(E)$ for every measurable set $E \subset [0,1]$. In other words, $\mu$ is invariant under $f$.

To conclude the proof of Proposition 1.3.2 we must show that the condition (1.3.8) holds for $\rho(x) = c/(1+x)$ and $f = G$. Let $G_k$ denote the restriction of $G$ to the interval $I_k = (1/(k+1), 1/k)$, for $k \ge 1$. Note that $G_k^{-1}(y) = 1/(y+k)$ for every $k$. Note also that $G'(x) = (1/x)' = -1/x^2$ for every $x \ne 0$. Therefore,

$$\sum_{k=1}^{\infty} \frac{\rho(G_k^{-1}(y))}{|G'(G_k^{-1}(y))|} = \sum_{k=1}^{\infty} \frac{c\,(y+k)}{y+k+1} \left(\frac{1}{y+k}\right)^2 = \sum_{k=1}^{\infty} \frac{c}{(y+k)(y+k+1)}. \tag{1.3.10}$$

Observing that

$$\frac{1}{(y+k)(y+k+1)} = \frac{1}{y+k} - \frac{1}{y+k+1},$$

we see that the last sum in (1.3.10) has a telescopic structure: the sum of its first $K$ terms equals $c/(y+1) - c/(y+K+1)$, because all the intermediate terms occur twice, with opposite signs, and cancel out. Letting $K \to \infty$, we find that

$$\sum_{k=1}^{\infty} \frac{c}{(y+k)(y+k+1)} = \frac{c}{y+1} = \rho(y).$$

This proves that the equality (1.3.8) is indeed satisfied and, hence, we may use Lemma 1.3.3 to conclude that $\mu$ is invariant under $G$.
Finally, observing that $c \log(1+x)$ is a primitive of the function $\rho(x)$, we find that

$$\mu([0,1]) = \int_0^1 \frac{c}{1+x} \, dx = c \log 2.$$

So, picking $c = 1/\log 2$ ensures that $\mu$ is a probability measure.
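Both ingredients of the proof of Proposition 1.3.2 can be checked numerically. The sketch below (i) evaluates the $K$-th partial sum of (1.3.10), which by the telescoping argument should equal $\rho(y) - c/(y+K+1)$, and (ii) compares $\mu(G^{-1}([0,t]))$ with $\mu([0,t])$, using that on $I_k$ the pre-image of $[0,t]$ is the interval $[1/(k+t), 1/k]$ and that $\mu([a,b]) = c \log((1+b)/(1+a))$. Truncating at $K$ branches is an approximation (the neglected tail is of order $t/K$); the cutoff $K$, the sample points and the helper names are arbitrary choices of ours.

```python
import math

C = 1 / math.log(2)  # normalising constant c = 1/log 2 from Proposition 1.3.2

def rho(x):
    """Density of the Gauss measure, rho(x) = c / (1 + x)."""
    return C / (1 + x)

def transfer_partial_sum(y, K):
    """K-th partial sum of the left-hand side of (1.3.10): the sum over k = 1..K of
    rho(x_k) / |G'(x_k)|, where x_k = G_k^{-1}(y) = 1/(y + k) and |G'(x)| = 1/x^2."""
    total = 0.0
    for k in range(1, K + 1):
        x_k = 1 / (y + k)
        total += rho(x_k) / (1 / x_k**2)
    return total

def gauss_measure(a, b):
    """mu([a, b]) = c * log((1 + b) / (1 + a)) for 0 <= a <= b <= 1."""
    return C * math.log((1 + b) / (1 + a))

def preimage_measure(t, K):
    """mu(G^{-1}([0, t])), truncated to the branches k = 1..K;
    on I_k the pre-image of [0, t] is the interval [1/(k + t), 1/k]."""
    return sum(gauss_measure(1 / (k + t), 1 / k) for k in range(1, K + 1))

K = 100_000
for y in (0.0, 0.3, 0.9):
    # telescoping predicts: partial sum + c/(y + K + 1) == rho(y), up to rounding
    print(y, transfer_partial_sum(y, K) + C / (y + K + 1), rho(y))
for t in (0.1, 0.5, 1.0):
    # invariance predicts mu(G^{-1}([0, t])) == mu([0, t]), up to a tail of order t/K
    print(t, preimage_measure(t, K), gauss_measure(0.0, t))
```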

This proposition allows us to use ideas from ergodic theory, applied to the
Gauss map, to obtain interesting conclusions in number theory. For example
(see Exercise 1.3.3), the natural number 7 occurs infinitely many times in the
continued fraction expansion of almost every number $x_0 \in (1/8, 1/7)$, that is, one has $a_n = 7$ for infinitely many values of n ∈ N. Later on, in Chapter 3, we will prove a much more precise statement that contains the following conclusion: for almost every $x_0 \in (0,1)$ the number 7 occurs with frequency
$$\frac{1}{\log 2}\log\frac{64}{63}$$
in the continued fraction expansion of $x_0$. Try to guess right away where this number comes from!
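Readers who want to test their guess numerically may find the following experiment useful. It is only a heuristic: instead of a "typical" real number it uses random rationals p/q with a very large denominator, whose continued fraction digits are computed exactly by Euclid's algorithm (for x = p/q, one step of the Gauss map produces the digit ⌊q/p⌋ and replaces x by (q mod p)/p). The sample size, the seed and the size of the denominator are arbitrary choices; the observed frequency of the digit 7 should be close to, but not exactly equal to, the frequency stated above.

```python
import math
import random

def continued_fraction_digits(p, q):
    """Digits a_1, a_2, ... of the rational p/q in (0, 1), by Euclid's algorithm."""
    digits = []
    while p:
        a, r = divmod(q, p)    # 1/x = q/p = a + r/p: digit a, and G(x) = r/p
        digits.append(a)
        p, q = r, p
    return digits

def digit_frequency(digit, n_samples=20, decimal_places=1500, seed=1):
    """Frequency of `digit` among the digits of random rationals p / 10**decimal_places."""
    rng = random.Random(seed)
    q = 10 ** decimal_places
    hits = total = 0
    for _ in range(n_samples):
        digits = continued_fraction_digits(rng.randrange(1, q), q)
        hits += digits.count(digit)
        total += len(digits)
    return hits / total

predicted = math.log(64 / 63) / math.log(2)
print(f"observed {digit_frequency(7):.4f}, predicted {predicted:.4f}")
```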

1.3.3 Circle rotations


Let us consider on the real line R the equivalence relation ∼ that identifies any two numbers whose difference is an integer:
x ∼ y ⇔ x − y ∈ Z.
We represent by [x] ∈ R/Z the equivalence class of each x ∈ R and denote
by R/Z the space of all equivalence classes. This space is called the circle
and is also denoted by S1 . The reason for this terminology is that R/Z can be
identified in a natural way with the unit circle {z ∈ C : |z| = 1} on the complex
plane, by means of the map
$$\varphi : \mathbb{R}/\mathbb{Z} \to \{z \in \mathbb{C} : |z| = 1\}, \quad [x] \mapsto e^{2\pi x i}. \qquad (1.3.11)$$
Note that φ is well defined: since the function $x \mapsto e^{2\pi x i}$ is periodic of period 1, the expression $e^{2\pi x i}$ does not depend on the choice of a representative x for the class [x]. Moreover, φ is a bijection.
The circle R/Z inherits from the real line R the structure of an abelian group,
given by the operation
[x] + [y] = [x + y].
Observe that this is well defined: the equivalence class on the right-hand side
does not depend on the choice of representatives x and y for the classes on the
left-hand side. Given θ ∈ R, we call rotation of angle θ the transformation
Rθ : R/Z → R/Z, [x] → [x + θ ] = [x] + [θ ].

Note that $R_\theta$ corresponds, via the identification (1.3.11), to the transformation $z \mapsto e^{2\pi\theta i} z$ on {z ∈ C : |z| = 1}. The latter is just the restriction to the unit circle
of the rotation of angle 2π θ around the origin in the complex plane. It is clear
from the definition that R0 is the identity map and Rθ ◦ Rτ = Rθ+τ for every θ
and τ . In particular, every Rθ is invertible and the inverse is R−θ .
We can also endow S1 with a natural structure of a probability space, as
follows. Let π : R → S1 be the canonical projection, that assigns to each x ∈ R
its equivalence class [x]. We say that a set E ⊂ S1 is measurable if π −1 (E) is a
measurable subset of the real line. Next, let m be the Lebesgue measure on the
real line. We define the Lebesgue measure μ on the circle to be given by
 
$$\mu(E) = m\big(\pi^{-1}(E) \cap [k, k+1)\big) \quad\text{for every } k \in \mathbb{Z}.$$
Note that the right-hand side of this equality does not depend on k, since, by definition, $\pi^{-1}(E) \cap [k, k+1) = \big(\pi^{-1}(E) \cap [0,1)\big) + k$ and the measure m is invariant under translations.
It is clear from the definition that μ is a probability. Moreover, μ is invariant
under every rotation Rθ (according to Exercise 1.3.8, it is the only probability
measure with this property). This can be shown as follows. By definition, $\pi^{-1}(R_\theta^{-1}(E)) = \pi^{-1}(E) - \theta$ for every measurable set $E \subset S^1$. Let k be the integer part of θ. Since m is invariant under all the translations,
$$m\big((\pi^{-1}(E) - \theta) \cap [0,1)\big) = m\big(\pi^{-1}(E) \cap [\theta, \theta+1)\big) = m\big(\pi^{-1}(E) \cap [\theta, k+1)\big) + m\big(\pi^{-1}(E) \cap [k+1, \theta+1)\big).$$
Note that $\pi^{-1}(E) \cap [k+1, \theta+1) = \big(\pi^{-1}(E) \cap [k, \theta)\big) + 1$, because $\pi^{-1}(E)$ is invariant under integer translations. So, the expression on the right-hand side of the previous equality may be written as
$$m\big(\pi^{-1}(E) \cap [\theta, k+1)\big) + m\big(\pi^{-1}(E) \cap [k, \theta)\big) = m\big(\pi^{-1}(E) \cap [k, k+1)\big).$$
Combining these two relations we find that
$$\mu\big(R_\theta^{-1}(E)\big) = m\big(\pi^{-1}(R_\theta^{-1}(E)) \cap [0,1)\big) = m\big(\pi^{-1}(E) \cap [k, k+1)\big) = \mu(E)$$
for every measurable set $E \subset S^1$.
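The splitting argument above can be mirrored numerically for sets E that are finite unions of arcs. In this Python sketch, E is represented (an assumption made for simplicity) by disjoint half-open subintervals [a, b) of [0, 1). The preimage R_θ^{-1}([a, b)) = [a − θ, b − θ) is reduced modulo 1 and, when it crosses 1, split into two pieces, in the same way as the interval [θ, θ + 1) is split at k + 1 in the proof. The arcs and angles are arbitrary sample values.

```python
import math

def preimage_under_rotation(arcs, theta):
    """R_theta^{-1}(E) for E a union of arcs [a, b) in [0, 1), as a list of arcs."""
    pieces = []
    for a, b in arcs:
        start = (a - theta) % 1.0
        length = b - a
        if start + length <= 1.0:
            pieces.append((start, start + length))
        else:  # the translated arc crosses 1: split it into two pieces
            pieces.append((start, 1.0))
            pieces.append((0.0, start + length - 1.0))
    return pieces

def lebesgue(arcs):
    """Total length of a finite union of disjoint arcs."""
    return sum(b - a for a, b in arcs)

E = [(0.05, 0.2), (0.5, 0.93)]
for theta in (0.1, math.sqrt(2), -2.75):
    pre = preimage_under_rotation(E, theta)
    print(f"theta={theta:.4f}: mu(E) = {lebesgue(E):.6f}, mu(R^-1(E)) = {lebesgue(pre):.6f}")
```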
The rotations Rθ : S1 → S1 exhibit two very different types of dynamical
behavior, depending on the value of θ . If θ is rational, say θ = p/q with p ∈ Z
and q ∈ N, then
$$R_\theta^q([x]) = [x + q\theta] = [x] \quad\text{for every } [x].$$
Consequently, in this case every point of $S^1$ is periodic, and its period divides q (it is exactly q when p/q is in lowest terms). In the
opposite case we have:
Proposition 1.3.4. If θ is irrational then $O([x]) = \{R_\theta^n([x]) : n \in \mathbb{N}\}$ is a dense subset of the circle R/Z for every [x].

Proof. We claim that the set D = {m + nθ : m ∈ Z, n ∈ N} is dense in R. Indeed, consider any number r ∈ R. Given any ε > 0, we may choose p ∈ Z and q ∈ N such that |qθ − p| < ε (for instance, by Dirichlet's approximation theorem). Note that the number a = qθ − p is necessarily different from zero, since θ is irrational. Let us suppose that a is positive (the case when a is negative is analogous). Since r is approximated by m + nθ if and only if r + j is approximated by (m + j) + nθ, for any j ∈ Z, we may assume that r ≥ a. Subdividing the real line into intervals of length a, we see that there exists an integer l ≥ 1 such that 0 ≤ r − la < a. This implies that
|r − (lqθ − lp)| = |r − la| < a < ε.
As m = −lp is an integer, n = lq is a natural number and ε is arbitrary, this proves that r is in the closure of the set D, for every r ∈ R.
Now, given y ∈ R and ε > 0, we may take r = y − x and, using the previous paragraph, we may find m ∈ Z and n ∈ N such that |m + nθ − (y − x)| < ε. This implies that the distance from [y] to the iterate $R_\theta^n([x])$ is less than ε. Since x, y and ε are arbitrary, this shows that every orbit O([x]) is dense in $S^1$.

In particular, it follows that every point on the circle is recurrent for Rθ (this
is also true when θ is rational). The previous proposition also leads to some
interesting conclusions in the study of the invariant measures of Rθ . Among
other things, we will learn later (in Chapter 6) that if θ is irrational then the
Lebesgue measure is the unique probability measure that is preserved by Rθ .
Related to this, we will see that the orbits of Rθ are uniformly distributed
subsets of S1 .
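The dichotomy between rational and irrational angles is easy to observe numerically. The Python sketch below computes the largest gap between consecutive points of the finite orbit segment {R_θ^n([0]) : 1 ≤ n ≤ N}. For θ = p/q in lowest terms the orbit takes only q values, so the largest gap stays equal to 1/q however large N is; for an irrational θ (here the golden mean, an arbitrary choice) it shrinks as N grows, in line with Proposition 1.3.4. Exact rational arithmetic (fractions.Fraction) is used in the rational case to avoid rounding.

```python
import math
from fractions import Fraction

def largest_gap(theta, n_points):
    """Largest gap between consecutive points of {n*theta mod 1 : 1 <= n <= n_points}."""
    points = sorted({(n * theta) % 1 for n in range(1, n_points + 1)})
    gaps = [b - a for a, b in zip(points, points[1:])]
    gaps.append(1 - points[-1] + points[0])   # the gap that wraps around through [0]
    return max(gaps)

golden = (math.sqrt(5) - 1) / 2
for n in (10, 100, 1000):
    print(n, largest_gap(Fraction(3, 7), n), round(largest_gap(golden, n), 5))
```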

1.3.4 Rotations on tori


The notions we just presented can be generalized to arbitrary dimension, as we are going to explain. For each d ≥ 1, consider the equivalence relation on $\mathbb{R}^d$ that identifies any two vectors whose difference is an integer vector:
$$(x_1, \dots, x_d) \sim (y_1, \dots, y_d) \iff (x_1 - y_1, \dots, x_d - y_d) \in \mathbb{Z}^d.$$
We denote by [x] or $[(x_1, \dots, x_d)]$ the equivalence class of any $x = (x_1, \dots, x_d)$. Then we call the d-dimensional torus, or simply the d-torus, the space
$$\mathbb{T}^d = \mathbb{R}^d/\mathbb{Z}^d = (\mathbb{R}/\mathbb{Z})^d$$
formed by those equivalence classes. Let m be the Lebesgue measure on $\mathbb{R}^d$. The operation
$$[(x_1, \dots, x_d)] + [(y_1, \dots, y_d)] = [(x_1 + y_1, \dots, x_d + y_d)]$$
is well defined and turns $\mathbb{T}^d$ into an abelian group. Given $\theta = (\theta_1, \dots, \theta_d) \in \mathbb{R}^d$, we call
$$R_\theta : \mathbb{T}^d \to \mathbb{T}^d, \quad R_\theta([x]) = [x] + [\theta]$$
the rotation by θ (sometimes, $R_\theta$ is also called the translation by θ). The map
$$\varphi : [0,1]^d \to \mathbb{T}^d, \quad (x_1, \dots, x_d) \mapsto [(x_1, \dots, x_d)]$$
is surjective and allows us to define a Lebesgue probability measure μ on the d-torus, through the following formula:
$$\mu(B) = m\big(\varphi^{-1}(B)\big) \quad\text{for every } B \subset \mathbb{T}^d \text{ such that } \varphi^{-1}(B) \text{ is measurable.}$$
This measure μ is invariant under $R_\theta$ for every θ.
We say that a vector $\theta = (\theta_1, \dots, \theta_d) \in \mathbb{R}^d$ is rationally independent if, for any integer numbers $n_0, n_1, \dots, n_d$, we have that
$$n_0 + n_1\theta_1 + \dots + n_d\theta_d = 0 \implies n_0 = n_1 = \dots = n_d = 0.$$
Otherwise, we say that θ is rationally dependent. One can show that θ is rationally independent if and only if the rotation $R_\theta$ is minimal, meaning that the orbit $O([x]) = \{R_\theta^n([x]) : n \in \mathbb{N}\}$ of every $[x] \in \mathbb{T}^d$ is a dense subset of $\mathbb{T}^d$. In this regard, see Exercises 1.3.9–1.3.10 and also Corollary 4.2.3.
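Minimality can be probed (though of course not proved) numerically on the 2-torus. The Python sketch below divides $\mathbb{T}^2$ into a 10 × 10 grid of squares and records which squares contain one of the first N iterates of [0] under R_θ. For θ = (√2, √3), which is rationally independent, all 100 squares should be visited, whereas for the rationally dependent vector θ = (√2, √2) the orbit stays on the diagonal {[(x, x)]} and only the 10 diagonal squares are hit. The grid size, N and the two vectors are arbitrary choices.

```python
import math

def visited_cells(theta, n_iter, grid=10):
    """Grid squares of T^2 containing R_theta^n([0]) for some 1 <= n <= n_iter."""
    cells = set()
    for n in range(1, n_iter + 1):
        # compute n*theta mod 1 directly, to avoid accumulating rounding errors
        x = (n * theta[0]) % 1.0
        y = (n * theta[1]) % 1.0
        cells.add((int(grid * x), int(grid * y)))
    return cells

independent = (math.sqrt(2), math.sqrt(3))
dependent = (math.sqrt(2), math.sqrt(2))
print(len(visited_cells(independent, 5000)), len(visited_cells(dependent, 5000)))
```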

1.3.5 Conservative maps


Let M be an open subset of the Euclidean space $\mathbb{R}^d$ and f : M → M be a C^1 diffeomorphism. This means that f is a bijection, both f and its inverse $f^{-1}$ are differentiable and the two derivatives are continuous. Denote by vol the restriction to M of the Lebesgue measure (volume measure) on $\mathbb{R}^d$. The formula of change of variables asserts that, for any measurable set B ⊂ M,
$$\mathrm{vol}(f(B)) = \int_B |\det Df|\,dx. \qquad (1.3.12)$$
One can easily deduce the following consequence:
Lemma 1.3.5. A C1 diffeomorphism f : M → M preserves the volume measure
vol if and only if the absolute value | det Df | of its Jacobian is equal to 1 at
every point.

Proof. Suppose that the absolute value |det Df| of the Jacobian is equal to 1 at every point. Let E be any measurable set and $B = f^{-1}(E)$. The formula (1.3.12) yields
$$\mathrm{vol}(E) = \int_B 1\,dx = \mathrm{vol}(B) = \mathrm{vol}(f^{-1}(E)).$$
This means that f preserves the measure vol and so we proved the “if” part of the statement.
To prove the “only if,” suppose that |det Df(x)| > 1 for some point x ∈ M. Then, since the Jacobian is continuous, there exists a bounded neighborhood U of x and some number σ > 1 such that
|det Df(y)| ≥ σ for all y ∈ U.
Then, applying (1.3.12) to B = U, we get that
$$\mathrm{vol}(f(U)) \ge \int_U \sigma\,dx = \sigma\,\mathrm{vol}(U).$$
Denote E = f(U). Since 0 < vol(U) < ∞, the previous inequality implies that $\mathrm{vol}(E) > \mathrm{vol}(f^{-1}(E))$. Hence, f does not leave vol invariant. In precisely the same way, one shows that if |det Df(x)| < 1 for some point x ∈ M then f does not leave the measure vol invariant.
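Lemma 1.3.5 reduces volume preservation to a pointwise condition that is easy to test. As an illustration (this example is not taken from the text), the Chirikov standard map f(x, y) = (x + y + K sin x, y + K sin x) is a C^1 diffeomorphism of $\mathbb{R}^2$, with inverse (x, y) ↦ (x − y, y − K sin(x − y)), and its Jacobian matrix has determinant (1 + K cos x) − K cos x = 1 everywhere. The Python sketch below confirms this with central finite differences at a few points; the step size h, the parameter K and the sample points are arbitrary choices.

```python
import math

K = 0.8  # parameter of the standard map (arbitrary)

def standard_map(x, y):
    return (x + y + K * math.sin(x), y + K * math.sin(x))

def jacobian_det(f, x, y, h=1e-6):
    """det Df(x, y), with the partial derivatives estimated by central differences."""
    fx_plus, fx_minus = f(x + h, y), f(x - h, y)
    fy_plus, fy_minus = f(x, y + h), f(x, y - h)
    d1_dx = (fx_plus[0] - fx_minus[0]) / (2 * h)
    d2_dx = (fx_plus[1] - fx_minus[1]) / (2 * h)
    d1_dy = (fy_plus[0] - fy_minus[0]) / (2 * h)
    d2_dy = (fy_plus[1] - fy_minus[1]) / (2 * h)
    return d1_dx * d2_dy - d1_dy * d2_dx

for point in [(0.0, 0.0), (1.3, -0.4), (2.9, 5.0)]:
    print(point, jacobian_det(standard_map, *point))
```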

1.3.6 Conservative flows


Now we discuss the invariance of the volume measure in the setting of flows $f^t : M \to M$, t ∈ R. As before, take M to be an open subset of the Euclidean space $\mathbb{R}^d$. Let us suppose that the flow is C^1, in the sense that the map $(t, x) \mapsto f^t(x)$ is differentiable and all the derivatives are continuous. Then, in particular, every flow transformation $f^t : M \to M$ is a C^1 diffeomorphism: the inverse is $f^{-t}$. Since $f^0$ is the identity map, $\det Df^t(x)$ never vanishes and the Jacobian varies continuously with t, we have that $\det Df^t(x) > 0$ at every point.
Applying Lemma 1.3.5 in this context, we find that the flow preserves the volume measure if and only if
$$\det Df^t(x) = 1 \quad\text{for every } x \in M \text{ and every } t \in \mathbb{R}. \qquad (1.3.13)$$
However, this is not very useful in practice because most of the time we do
not have an explicit expression for f t and, hence, it is not clear how to check
the condition (1.3.13). Fortunately, there is a reasonably explicit expression for
the Jacobian of the flow that can be used in some interesting situations. Let us
explain this.
Let us suppose that the flow $f^t : M \to M$ corresponds to the trajectories of a C^1 vector field $F : M \to \mathbb{R}^d$. In other words, each $t \mapsto f^t(x)$ is the solution of the differential equation
$$\frac{dy}{dt} = F(y) \qquad (1.3.14)$$
that has x as the initial condition (when dealing with differential equations we always assume that their solutions are defined for all times).
The Liouville formula relates the Jacobian of $f^t$ to the divergence div F of the vector field:
$$\det Df^t(x) = \exp\left(\int_0^t \operatorname{div} F(f^s(x))\,ds\right) \quad\text{for every } x \text{ and every } t.$$
Recall that the divergence of a vector field F is the trace of its Jacobian matrix, that is
$$\operatorname{div} F = \frac{\partial F_1}{\partial x_1} + \dots + \frac{\partial F_d}{\partial x_d}. \qquad (1.3.15)$$
Combining the Liouville formula with (1.3.13), we obtain:

Lemma 1.3.6 (Liouville). The flow $(f^t)_t$ associated with a C^1 vector field F preserves the volume measure if and only if the divergence of F is identically zero.
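The Liouville formula can be tested numerically in dimension 2. The Python sketch below integrates the pendulum field F(x, y) = (y, −sin x), whose divergence is 0, and the damped field F(x, y) = (y, −sin x − γy), whose divergence is the constant −γ, with a classical fourth-order Runge–Kutta scheme, and estimates $\det Df^t(x)$ by central finite differences. By the Liouville formula, the two determinants should be close to 1 and to $e^{-\gamma t}$, respectively. The integrator, the step counts and the parameters are illustrative assumptions, not part of the text.

```python
import math

def rk4_flow(F, x, t, steps=2000):
    """Approximate f^t(x) for the ODE dy/dt = F(y) with the classical RK4 scheme."""
    dt = t / steps
    y = list(x)
    for _ in range(steps):
        k1 = F(y)
        k2 = F([yi + 0.5 * dt * ki for yi, ki in zip(y, k1)])
        k3 = F([yi + 0.5 * dt * ki for yi, ki in zip(y, k2)])
        k4 = F([yi + dt * ki for yi, ki in zip(y, k3)])
        y = [yi + dt / 6 * (a + 2 * b + 2 * c + d)
             for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]
    return y

def flow_jacobian_det(F, x, t, h=1e-5):
    """det Df^t(x) for a planar flow, by central finite differences in x."""
    columns = []
    for j in range(2):
        shift = [h if i == j else 0.0 for i in range(2)]
        plus = rk4_flow(F, [xi + si for xi, si in zip(x, shift)], t)
        minus = rk4_flow(F, [xi - si for xi, si in zip(x, shift)], t)
        columns.append([(p - m) / (2 * h) for p, m in zip(plus, minus)])
    # columns[j][i] approximates the derivative of the i-th coordinate of f^t w.r.t. x_j
    return columns[0][0] * columns[1][1] - columns[1][0] * columns[0][1]

GAMMA = 0.3  # damping coefficient (arbitrary)

def pendulum(y):          # div F = 0
    return [y[1], -math.sin(y[0])]

def damped_pendulum(y):   # div F = -GAMMA
    return [y[1], -math.sin(y[0]) - GAMMA * y[1]]

t = 2.0
print(flow_jacobian_det(pendulum, [1.0, 0.5], t))
print(flow_jacobian_det(damped_pendulum, [1.0, 0.5], t), math.exp(-GAMMA * t))
```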

We can extend this discussion to the case when M is any Riemannian manifold of dimension d ≥ 2. The reader who is unfamiliar with this notion may wish to check Appendix A.4.5 before proceeding.
For simplicity, let us suppose that the manifold is orientable. Then the volume measure on M is given by a differentiable d-form ω, called the volume form (this remains true in the non-orientable case, except that the form ω is defined up to sign only). What this means is that the volume of any measurable set B contained in the domain of local coordinates $(x_1, \dots, x_d)$ is given by
$$\mathrm{vol}(B) = \int_B \rho(x_1, \dots, x_d)\,dx_1 \cdots dx_d,$$
where $\omega = \rho\,dx_1 \wedge \dots \wedge dx_d$ is the expression of the volume form in those local coordinates. Let F be a C^1 vector field on M. Writing
$$F(x_1, \dots, x_d) = \big(F_1(x_1, \dots, x_d), \dots, F_d(x_1, \dots, x_d)\big),$$
we may express the divergence as
$$\operatorname{div} F = \frac{1}{\rho}\left(\frac{\partial(\rho F_1)}{\partial x_1} + \dots + \frac{\partial(\rho F_d)}{\partial x_d}\right)$$
(it can be shown that the right-hand side does not depend on the choice of the local coordinates). A proof of the following generalization of Lemma 1.3.6 can be found in Sternberg [Ste58]:

Theorem 1.3.7 (Liouville). The flow $(f^t)_t$ associated with a C^1 vector field F on a Riemannian manifold preserves the volume measure on the manifold if and only if div F = 0 at every point.

It then follows from the recurrence theorem for flows that, if the manifold has finite volume (for example, if M is compact) and div F = 0, almost every point is recurrent for the flow of F.
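The factor 1/ρ in the coordinate expression of div F is what makes it independent of the chosen coordinates. A small numerical illustration, using polar coordinates (r, α) on the plane minus the origin as an assumed example: there the volume form is ω = r dr ∧ dα, so ρ(r, α) = r. The radial field x ∂/∂x + y ∂/∂y has polar components (r, 0) and Cartesian divergence 2; the formula with ρ = r recovers the value 2, whereas dropping the factor 1/ρ would give 2r. The rotation field −y ∂/∂x + x ∂/∂y has polar components (0, 1) and is divergence-free, so by Theorem 1.3.7 its flow preserves area. The Python sketch below evaluates the formula with central differences.

```python
def coordinate_divergence(rho, F, point, h=1e-6, use_density_factor=True):
    """(1/rho) * sum_i d(rho * F_i)/dx_i at `point`, using central differences."""
    total = 0.0
    for i in range(len(point)):
        plus, minus = list(point), list(point)
        plus[i] += h
        minus[i] -= h
        total += (rho(plus) * F(plus)[i] - rho(minus) * F(minus)[i]) / (2 * h)
    return total / rho(point) if use_density_factor else total

def polar_density(p):   # omega = r dr ^ d(alpha), so rho(r, alpha) = r
    return p[0]

def radial(p):          # x d/dx + y d/dy = r d/dr: components (r, 0)
    return (p[0], 0.0)

def rotation(p):        # -y d/dx + x d/dy = d/d(alpha): components (0, 1)
    return (0.0, 1.0)

for point in [(0.5, 0.3), (2.0, 1.0)]:
    print(point,
          coordinate_divergence(polar_density, radial, point),
          coordinate_divergence(polar_density, radial, point, use_density_factor=False),
          coordinate_divergence(polar_density, rotation, point))
```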

1.3.7 Exercises
1.3.1. Use Lemma 1.3.3 to give another proof of the fact that the decimal expansion
transformation f (x) = 10x − [10x] preserves the Lebesgue measure on the
interval.
1.3.2. Prove that, for almost every number x ∈ [0, 1] whose decimal expansion contains the block 617 (for instance, x = 0.3375617264 · · · ), that block occurs infinitely many times in the decimal expansion of x. Even more, the block 617 occurs infinitely many times in the decimal expansion of almost every x ∈ [0, 1].
1.3.3. Prove that the number 617 appears infinitely many times in the continued fraction expansion of almost every number $x_0 \in (1/618, 1/617)$, that is, one has $a_n = 617$ for infinitely many values of n ∈ N.

1.3.4. Let G be the Gauss map. Show that a number x ∈ (0, 1) is rational if and only if there exists n ≥ 1 such that $G^n(x) = 0$.
1.3.5. Consider the sequence 1, 2, 4, 8, . . . , $a_n = 2^n$, . . . of all the powers of 2. Prove that, given any digit i ∈ {1, . . . , 9}, there exist infinitely many values of n for which $a_n$ starts with that digit.
1.3.6. Prove the following extension of Lemma 1.3.3. Let f : M → M be a C^1 local diffeomorphism on a compact Riemannian manifold M. Let vol be the volume measure on M and ρ : M → [0, ∞) be a continuous function. Then f preserves the measure μ = ρ vol if and only if
$$\sum_{x \in f^{-1}(y)} \frac{\rho(x)}{|\det Df(x)|} = \rho(y) \quad\text{for every } y \in M.$$
When f is invertible this means that f preserves the measure μ if and only if $\rho(x) = \rho(f(x))\,|\det Df(x)|$ for every x ∈ M.
1.3.7. Check that if A is a d × d matrix with integer coefficients and determinant different from zero then the transformation $f_A : \mathbb{T}^d \to \mathbb{T}^d$ defined on the torus by $f_A([x]) = [A(x)]$ preserves the Lebesgue measure on $\mathbb{T}^d$.
1.3.8. Show that the Lebesgue measure on S1 is the only probability measure invariant
under all the rotations of S1 , even if we restrict to rational rotations. [Note: We
will see in Chapter 6 that, for any irrational θ , the Lebesgue measure is the
unique probability measure invariant under Rθ .]
1.3.9. Suppose that θ = (θ1 , . . . , θd ) is rationally dependent. Show that there exists a
continuous non-constant function ϕ : Td → C such that ϕ ◦ Rθ = ϕ. Conclude
that there exist non-empty open subsets U and V of Td that are disjoint and
invariant under Rθ , in the sense that Rθ (U) = U and Rθ (V) = V. Deduce that no
orbit O([x]) of the rotation Rθ is dense in Td .
1.3.10. Suppose that θ = (θ1, . . . , θd) is rationally independent. Prove that if V is a non-empty open subset of $\mathbb{T}^d$ invariant under $R_\theta$, then V is dense in $\mathbb{T}^d$. Conclude that $\bigcup_{n \in \mathbb{Z}} R_\theta^n(U)$ is dense in the torus, for every non-empty open subset U. Deduce that there exists [x] whose orbit O([x]) under the rotation $R_\theta$ is dense in $\mathbb{T}^d$. Conclude that O([y]) is dense in $\mathbb{T}^d$ for every [y].
1.3.11. Let U be an open subset of $\mathbb{R}^{2d}$ and H : U → R be a C^2 function. Denote by $(p_1, \dots, p_d, q_1, \dots, q_d)$ the coordinate variables in $\mathbb{R}^{2d}$. The Hamiltonian vector field associated with H is defined by
$$F(p_1, \dots, p_d, q_1, \dots, q_d) = \left(\frac{\partial H}{\partial q_1}, \dots, \frac{\partial H}{\partial q_d}, -\frac{\partial H}{\partial p_1}, \dots, -\frac{\partial H}{\partial p_d}\right).$$
Check that the flow defined by F preserves the volume measure.


1.3.12. Let f : U → U be a C^1 diffeomorphism preserving the volume measure on an open subset U of $\mathbb{R}^d$. Let H : U → R be a first integral of f, that is, a C^1 function such that H ∘ f = H. Let c be a regular value of H and ds be the volume measure defined on the hypersurface $H_c = H^{-1}(c)$ by the restriction of the Riemannian metric of $\mathbb{R}^d$. Prove that the restriction of f to the hypersurface $H_c$ preserves the measure $ds/\|\operatorname{grad} H\|$.

1.4 Induction
In this section we describe a general method, based on the Poincaré recurrence
theorem, to construct from a given system (f , μ) other systems, that we refer to
as systems induced by (f , μ). The reason this is interesting is the following. On
the one hand, it is often the case that an induced system is easier to analyze,
because it has better global properties than the original one. On the other hand,
interesting conclusions about the original system can often be obtained from
analyzing the induced one. Examples will appear shortly.

1.4.1 First-return map


Let f : M → M be a measurable transformation and μ be an invariant
probability measure. Let E ⊂ M be a measurable set with μ(E) > 0 and
ρ(x) = ρE (x) be the first-return time of x to E, as given by (1.2.1). The
first-return map to the domain E is the map g given by
$$g(x) = f^{\rho(x)}(x)$$
whenever ρ(x) is finite. The Poincaré recurrence theorem ensures that this is the case for μ-almost every x ∈ E and so g is defined on a full measure subset of E. We also denote by $\mu_E$ the restriction of μ to the measurable subsets of E.
Proposition 1.4.1. The measure μE is invariant under the map g : E → E.
Proof. For every k ≥ 1, denote by $E_k$ the subset of points x ∈ E such that ρ(x) = k. By definition, $g(x) = f^k(x)$ for every $x \in E_k$. Let B be any measurable subset of E. Then
$$\mu(g^{-1}(B)) = \sum_{k=1}^{\infty} \mu\big(f^{-k}(B) \cap E_k\big). \qquad (1.4.1)$$
On the other hand, since μ is f-invariant,
$$\mu(B) = \mu\big(f^{-1}(B)\big) = \mu\big(f^{-1}(B) \cap E_1\big) + \mu\big(f^{-1}(B) \setminus E\big). \qquad (1.4.2)$$
Analogously,
$$\mu\big(f^{-1}(B) \setminus E\big) = \mu\big(f^{-2}(B) \setminus f^{-1}(E)\big) = \mu\big(f^{-2}(B) \cap E_2\big) + \mu\big(f^{-2}(B) \setminus (E \cup f^{-1}(E))\big).$$
Replacing this expression in (1.4.2), we find that
$$\mu(B) = \sum_{k=1}^{2} \mu\big(f^{-k}(B) \cap E_k\big) + \mu\Big(f^{-2}(B) \setminus \bigcup_{k=0}^{1} f^{-k}(E)\Big).$$
Repeating this argument successively, we obtain
$$\mu(B) = \sum_{k=1}^{n} \mu\big(f^{-k}(B) \cap E_k\big) + \mu\Big(f^{-n}(B) \setminus \bigcup_{k=0}^{n-1} f^{-k}(E)\Big). \qquad (1.4.3)$$
Now let us go to the limit when n → ∞. It is clear that the last term is bounded above by $\mu\big(f^{-n}(E) \setminus \bigcup_{k=0}^{n-1} f^{-k}(E)\big)$. So, using Remark 1.2.3, that term converges to zero when n → ∞. In this way we conclude that
$$\mu(B) = \sum_{k=1}^{\infty} \mu\big(f^{-k}(B) \cap E_k\big).$$
Together with (1.4.1), this shows that $\mu(g^{-1}(B)) = \mu(B)$ for every measurable subset B of E. That is to say, the measure $\mu_E$ is invariant under g.
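Proposition 1.4.1 can be illustrated by simulation. In the Python sketch below (an example chosen for illustration), f is the doubling map x ↦ 2x mod 1, which preserves the Lebesgue measure, and E = [0, 1/2). Points are sampled uniformly in E, that is, according to the normalized restriction of μ to E, and pushed forward by the first-return map g; if μ_E is g-invariant, the images should again be uniformly distributed in E, so about half of them should fall in [0, 1/4). The empirical average of the return time is printed as well: since only the point 1 ∉ [0, 1) never enters E, the Kac theorem (Theorem 1.2.2) suggests a value close to 1/μ(E) = 2. The sample size and the seed are arbitrary.

```python
import random

def doubling(x):
    return (2 * x) % 1.0   # exact in binary floating point

def in_E(x):
    return x < 0.5          # E = [0, 1/2)

def first_return(x, f=doubling, in_set=in_E, max_steps=200):
    """Return (g(x), rho(x)), where rho(x) is the first-return time of x to E."""
    y = f(x)
    for n in range(1, max_steps + 1):
        if in_set(y):
            return y, n
        y = f(y)
    raise RuntimeError("no return to E within max_steps")

rng = random.Random(0)
samples = [0.5 * rng.random() for _ in range(100_000)]   # uniform on E
images, times = zip(*(first_return(x) for x in samples))
lower_quarter = sum(y < 0.25 for y in images) / len(images)
print(f"fraction of g-images in [0, 1/4): {lower_quarter:.3f}")
print(f"mean return time: {sum(times) / len(times):.3f}")
```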

Example 1.4.2. Consider the transformation f : [0, ∞) → [0, ∞) defined by


f (0) = 0 and f (x) = 1/x if x ∈ (0, 1) and f (x) = x − 1 if x ≥ 1.
Let E = [0, 1]. The time ρ of first return to E is given by
ρ(0) = 1 and ρ(x) = k + 1 if x ∈ (1/(k + 1), 1/k) with k ≥ 1.
So, the first-return map to E is given by
g(0) = 0 and g(x) = 1/x − k if x ∈ (1/(k + 1), 1/k) with k ≥ 1.
In other words, up to the countable set {1/k : k ≥ 1}, which has zero Lebesgue measure, g is the Gauss map. We saw in Section 1.3.2 that the Gauss map
admits an invariant probability measure equivalent to the Lebesgue measure on
[0, 1). From this, one can draw some interesting conclusions about the original
map f . For instance, using the ideas in the next section one finds that f admits
an (infinite) invariant measure equivalent to the Lebesgue measure on [0, ∞).
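The computation of ρ and g in this example can be double-checked with exact rational arithmetic. The Python sketch below iterates the map f of the example until the orbit returns to E = [0, 1], and compares the result with the Gauss map G(x) = 1/x − ⌊1/x⌋. The sample points are arbitrary rationals that are not of the form 1/k; at the points x = 1/k with k ≥ 2, the orbit returns after k steps and lands at 1, whereas G(1/k) = 0.

```python
import math
from fractions import Fraction

def f(x):
    """The map of Example 1.4.2 on [0, infinity)."""
    if x == 0:
        return Fraction(0)
    return 1 / x if x < 1 else x - 1

def first_return_to_unit_interval(x):
    """(g(x), rho(x)) for the first return of x to E = [0, 1]."""
    y, n = f(x), 1
    while y > 1:
        y, n = f(y), n + 1
    return y, n

def gauss(x):
    return 1 / x - math.floor(1 / x)

for x in [Fraction(3, 10), Fraction(5, 13), Fraction(16, 113)]:
    g_x, rho_x = first_return_to_unit_interval(x)
    k = math.floor(1 / x)              # x lies in (1/(k+1), 1/k)
    print(f"x={x}: g(x)={g_x}, G(x)={gauss(x)}, rho(x)={rho_x}, k+1={k + 1}")
```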

1.4.2 Induced transformations


In an opposite direction, given any measure ν invariant under g : E → E, we
may construct a certain related measure νρ that is invariant under f : M → M.
For this, g does not even have to be a first-return map: the construction that we
present below is valid for any map induced from f , that is, any map of the form
$$g : E \to E, \quad g(x) = f^{\rho(x)}(x), \qquad (1.4.4)$$
where ρ : E → N is a measurable function (it suffices that ρ is defined on some full measure subset of E). As before, we denote by $E_k$ the subset of points x ∈ E such that ρ(x) = k. Then we define
$$\nu_\rho(B) = \sum_{n=0}^{\infty} \sum_{k>n} \nu\big(f^{-n}(B) \cap E_k\big), \qquad (1.4.5)$$
for every measurable set B ⊂ M.

Proposition 1.4.3. The measure $\nu_\rho$ defined in (1.4.5) is invariant under f and satisfies $\nu_\rho(M) = \int_E \rho\,d\nu$. In particular, $\nu_\rho$ is finite if and only if the function ρ is integrable with respect to ν.

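Proof. First, let us prove that $\nu_\rho$ is invariant. By the definition (1.4.5),
$$\nu_\rho\big(f^{-1}(B)\big) = \sum_{n=0}^{\infty}\sum_{k>n} \nu\big(f^{-(n+1)}(B) \cap E_k\big) = \sum_{n=1}^{\infty}\sum_{k\ge n} \nu\big(f^{-n}(B) \cap E_k\big).$$
We may rewrite this expression as follows:
$$\nu_\rho\big(f^{-1}(B)\big) = \sum_{n=1}^{\infty}\sum_{k>n} \nu\big(f^{-n}(B) \cap E_k\big) + \sum_{k=1}^{\infty} \nu\big(f^{-k}(B) \cap E_k\big). \qquad (1.4.6)$$
Concerning the last term, observe that
$$\sum_{k=1}^{\infty} \nu\big(f^{-k}(B) \cap E_k\big) = \nu\big(g^{-1}(B)\big) = \nu(B) = \sum_{k=1}^{\infty} \nu\big(B \cap E_k\big),$$
since ν is invariant under g. Replacing this in (1.4.6), we see that
$$\nu_\rho\big(f^{-1}(B)\big) = \sum_{n=1}^{\infty}\sum_{k>n} \nu\big(f^{-n}(B) \cap E_k\big) + \sum_{k=1}^{\infty} \nu\big(B \cap E_k\big) = \nu_\rho(B)$$
for every measurable set B ⊂ M. The second claim is a direct consequence of the definitions:
$$\nu_\rho(M) = \sum_{n=0}^{\infty}\sum_{k>n} \nu\big(f^{-n}(M) \cap E_k\big) = \sum_{n=0}^{\infty}\sum_{k>n} \nu(E_k) = \sum_{k=1}^{\infty} k\,\nu(E_k) = \int_E \rho\,d\nu.$$
This completes the proof.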
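Proposition 1.4.3 does not require g to be a first-return map, and this is easy to see on a finite example. In the Python sketch below (all data chosen arbitrarily for illustration), M = Z/8Z, f(x) = x + 1 mod 8 and E = {0, 3, 5}; the times ρ(0) = 5, ρ(5) = 6 and ρ(3) = 5, which are not first-return times, make g(x) = f^{ρ(x)}(x) the cyclic permutation 0 → 5 → 3 → 0 of E, so the uniform probability ν on E is g-invariant. On a finite set, formula (1.4.5) amounts to placing the mass ν({x}) on each of the points x, f(x), ..., f^{ρ(x)−1}(x).

```python
from fractions import Fraction

N = 8
E = [0, 3, 5]
rho = {0: 5, 5: 6, 3: 5}                  # return times (not first-return times)
nu = {x: Fraction(1, len(E)) for x in E}  # uniform probability on E

def f(y):
    return (y + 1) % N

def g(x):
    y = x
    for _ in range(rho[x]):
        y = f(y)
    return y

# g must permute E, so that the uniform measure nu is g-invariant
assert sorted(g(x) for x in E) == sorted(E)

# formula (1.4.5): each x in E_k contributes nu({x}) to f^n(x), 0 <= n < k
nu_rho = {y: Fraction(0) for y in range(N)}
for x in E:
    y = x
    for _ in range(rho[x]):
        nu_rho[y] += nu[x]
        y = f(y)

# f is a bijection with f^{-1}({y}) = {y - 1}; invariance on points suffices here
invariant = all(nu_rho[(y - 1) % N] == nu_rho[y] for y in range(N))
total = sum(nu_rho.values())
print(invariant, total, sum(rho[x] * nu[x] for x in E))
```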

It is interesting to analyze how this construction relates to the one in the previous section when g is a first-return map of f and the measure ν is the restriction μ | E of some invariant measure μ of f:

Corollary 1.4.4. If g is the first-return map of f to a measurable subset E and ν = μ | E, then

1. $\nu_\rho(B) = \nu(B) = \mu(B)$ for every measurable set B ⊂ E.
2. $\nu_\rho(B) \le \mu(B)$ for every measurable set B ⊂ M.

Proof. By definition, $f^{-n}(E) \cap E_k = \emptyset$ for every 0 < n < k. This implies that, given any measurable set B ⊂ E, all the terms with n > 0 in the definition (1.4.5) are zero. Hence, $\nu_\rho(B) = \sum_{k>0} \nu(B \cap E_k) = \nu(B)$, as claimed in the first part of the statement.
Consider any measurable set B ⊂ M. Then,
$$\mu(B) = \mu(B \cap E) + \mu(B \cap E^c) = \nu(B \cap E) + \mu(B \cap E^c) = \sum_{k=1}^{\infty} \nu(B \cap E_k) + \mu(B \cap E^c). \qquad (1.4.7)$$
Since μ is invariant, $\mu(B \cap E^c) = \mu\big(f^{-1}(B) \cap f^{-1}(E^c)\big)$. Then, as in the previous equality,
$$\mu(B \cap E^c) = \mu\big(f^{-1}(B) \cap E \cap f^{-1}(E^c)\big) + \mu\big(f^{-1}(B) \cap E^c \cap f^{-1}(E^c)\big) = \sum_{k=2}^{\infty} \nu\big(f^{-1}(B) \cap E_k\big) + \mu\big(f^{-1}(B) \cap E^c \cap f^{-1}(E^c)\big).$$
Replacing this in (1.4.7), we find that
$$\mu(B) = \sum_{n=0}^{1}\sum_{k>n} \nu\big(f^{-n}(B) \cap E_k\big) + \mu\Big(f^{-1}(B) \cap \bigcap_{n=0}^{1} f^{-n}(E^c)\Big).$$
Repeating this argument successively, we get that
$$\mu(B) = \sum_{n=0}^{N}\sum_{k>n} \nu\big(f^{-n}(B) \cap E_k\big) + \mu\Big(f^{-N}(B) \cap \bigcap_{n=0}^{N} f^{-n}(E^c)\Big) \ge \sum_{n=0}^{N}\sum_{k>n} \nu\big(f^{-n}(B) \cap E_k\big) \quad\text{for every } N \ge 1.$$
Taking the limit as N → ∞, we conclude that $\mu(B) \ge \nu_\rho(B)$.

We also have from the Kac theorem (Theorem 1.2.2) that
$$\nu_\rho(M) = \int_E \rho\,d\nu = \int_E \rho\,d\mu = \mu(M) - \mu(E_0^*).$$
So, since $\nu_\rho \le \mu$ by Corollary 1.4.4 and μ(M) is finite, it follows that $\nu_\rho = \mu$ if and only if $\mu(E_0^*) = 0$.


Example 1.4.5 (Manneville–Pomeau). Given d > 0, let a be the only number in (0, 1) such that $a(1 + a^d) = 1$. Then define f : [0, 1] → [0, 1] as follows:
$$f(x) = x(1 + x^d) \ \text{ if } x \in [0, a] \quad\text{and}\quad f(x) = \frac{x - a}{1 - a} \ \text{ if } x \in (a, 1].$$
The graph of f is depicted on the left-hand side of Figure 1.3. Observe that |f'(x)| ≥ 1 at every point, and the inequality is strict at every x > 0. Let $(a_n)_n$ be the sequence on the interval [0, a] defined by $a_1 = a$ and $f(a_{n+1}) = a_n$ for n ≥ 1. We also write $a_0 = 1$. Some properties of this sequence are studied in Exercise 1.4.2.
Now consider the map $g(x) = f^{\rho(x)}(x)$, where
$$\rho : [0, 1] \to \mathbb{N}, \quad \rho(x) = 1 + \min\{n \ge 0 : f^n(x) \in (a, 1]\}.$$
In other words, ρ(x) = k and so $g(x) = f^k(x)$ for every $x \in (a_k, a_{k-1}]$. The graph of g is represented on the right-hand side of Figure 1.3. Note that the restriction of g to each interval $(a_k, a_{k-1}]$ is a bijection onto (0, 1]. A key point is that the induced map g is expanding:
$$|g'(x)| \ge \frac{1}{1 - a} > 1 \quad\text{for every } x \in (0, 1].$$
Figure 1.3. Construction of an induced transformation (panels: graph of f on the left and graph of g on the right, both over [0, 1], with the points a3, a2, a1 marked).

Using the ideas that will be developed in Chapter 11, one can show that g admits a unique invariant probability measure ν equivalent to the Lebesgue measure on (0, 1]. In fact, the density (Radon–Nikodym derivative) of ν with respect to the Lebesgue measure is bounded away from zero and infinity. Then, the f-invariant measure $\nu_\rho$ in (1.4.5) is equivalent to the Lebesgue measure. It follows (see Exercise 1.4.2) that this measure is finite if and only if d ∈ (0, 1).
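The dichotomy at d = 1 can be explored numerically. Since ρ = k on $(a_k, a_{k-1}]$ and the density of ν is bounded away from zero and infinity, $\nu_\rho(M) = \int \rho\,d\nu$ is finite exactly when $\sum_k k(a_{k-1} - a_k) < \infty$, that is, when ρ is Lebesgue-integrable. The Python sketch below computes a by bisection, then $a_2, a_3, \dots$ by inverting the increasing branch x ↦ x(1 + x^d) with bisection, and prints partial sums of $\sum_k k(a_{k-1} - a_k)$. One expects them to stabilize for d = 0.5 and to keep growing for d = 2. The cut-offs and the bisection depth are arbitrary, and the output is numerical evidence, not a proof (compare Exercise 1.4.2).

```python
def solve_increasing(h, target, lo, hi, iterations=80):
    """Solve h(x) = target for an increasing function h on [lo, hi] by bisection."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if h(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def mp_sequence(d, n):
    """[a_0, a_1, ..., a_n] with a_0 = 1, a_1 = a and f(a_{j+1}) = a_j."""
    def branch(x):                                 # f(x) = x(1 + x^d) on [0, a]
        return x * (1 + x ** d)
    a = solve_increasing(branch, 1.0, 0.0, 1.0)    # a(1 + a^d) = 1
    seq = [1.0, a]
    while len(seq) <= n:
        seq.append(solve_increasing(branch, seq[-1], 0.0, seq[-1]))
    return seq

def integral_of_rho(seq, n):
    """sum_{k=1}^{n} k (a_{k-1} - a_k), the Lebesgue integral of rho over (a_n, 1]."""
    return sum(k * (seq[k - 1] - seq[k]) for k in range(1, n + 1))

for d in (0.5, 2.0):
    seq = mp_sequence(d, 4000)
    print(d, [round(integral_of_rho(seq, n), 4) for n in (1000, 2000, 4000)])
```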

1.4.3 Kakutani–Rokhlin towers


It is possible, and useful, to generalize the previous constructions even further,
by omitting the initial transformation f : M → M altogether. More precisely,
given a transformation g : E → E, a measure ν on E invariant under g and a
measurable function ρ : E → N, we are going to construct a transformation
f : M → M and a measure νρ invariant under f such that E can be identified
with a subset of M, g is the first-return map of f to E, with first-return time
given by ρ, and the restriction of νρ to E coincides with ν.
This transformation f is called the Kakutani–Rokhlin tower of g with time
ρ. The measure νρ is finite if and only if ρ is integrable with respect to ν. They
are constructed as follows. Begin by defining
$$M = \{(x, n) : x \in E \text{ and } 0 \le n < \rho(x)\} = \bigcup_{k=1}^{\infty}\bigcup_{n=0}^{k-1} E_k \times \{n\}.$$
In other words, M consists of k copies of each set $E_k = \{x \in E : \rho(x) = k\}$, “piled up” on top of each other. We call each $\bigcup_{k>n} E_k \times \{n\}$ the n-th floor of M. See Figure 1.4.
Next, define f : M → M as follows:
$$f(x, n) = \begin{cases} (x, n+1) & \text{if } n < \rho(x) - 1,\\ (g(x), 0) & \text{if } n = \rho(x) - 1.\end{cases}$$
Figure 1.4. Kakutani–Rokhlin tower of g with time ρ (panel labels: ground floor, 1st floor, 2nd floor, . . . , (k−1)-st floor, k-th floor; columns over E1, E2, E3, . . . , Ek; an arrow labeled g).

In other words, each point (x, n) is “lifted” one floor at a time, until reaching the
floor ρ(x) − 1; at that stage, the point “falls” directly to (g(x), 0) on the ground
(zero-th) floor. The ground floor E × {0} is naturally identified with the set E.
Besides, the first-return map to E × {0} corresponds precisely to g : E → E.
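Finally, the measure $\nu_\rho$ is defined by
$$\nu_\rho \mid (E_k \times \{n\}) = \nu \mid E_k$$
for every 0 ≤ n < k. It is clear that the restriction of $\nu_\rho$ to the ground floor coincides with ν. Moreover, $\nu_\rho$ is invariant under f (for subsets of the ground floor this follows from the invariance of ν under g) and
$$\nu_\rho(M) = \sum_{k=1}^{\infty} k\,\nu(E_k) = \int_E \rho\,d\nu.$$
This completes the construction of the Kakutani–Rokhlin tower.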
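The construction is completely explicit, so it can be coded directly for a finite base. In the Python sketch below (hypothetical data chosen for illustration), E has four points, g is a cyclic permutation of E, ν is the uniform probability on E and ρ takes the values 1, 2, 3 and 2; the points of the tower are stored as pairs (x, n) with 0 ≤ n < ρ(x), as in the definition of M. The script checks that the first-return map of f to the ground floor is g, that $\nu_\rho$ is f-invariant and that $\nu_\rho(M) = \int_E \rho\,d\nu$.

```python
from fractions import Fraction

E = ["p", "q", "r", "s"]
g = {"p": "q", "q": "r", "r": "s", "s": "p"}    # a permutation of E
rho = {"p": 1, "q": 2, "r": 3, "s": 2}
nu = {x: Fraction(1, len(E)) for x in E}         # g-invariant probability on E

M = [(x, n) for x in E for n in range(rho[x])]   # the tower

def f(point):
    x, n = point
    return (x, n + 1) if n < rho[x] - 1 else (g[x], 0)

# nu_rho gives every point (x, n) of the tower the mass nu({x})
nu_rho = {(x, n): nu[x] for (x, n) in M}

def first_return_to_ground(x):
    point = f((x, 0))
    while point[1] != 0:
        point = f(point)
    return point[0]

preimage_mass = {pt: Fraction(0) for pt in M}
for pt in M:
    preimage_mass[f(pt)] += nu_rho[pt]           # accumulates nu_rho(f^{-1}({y}))

print(all(first_return_to_ground(x) == g[x] for x in E),
      preimage_mass == nu_rho,
      sum(nu_rho.values()), sum(rho[x] * nu[x] for x in E))
```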

1.4.4 Exercises
1.4.1. Let f : S^1 → S^1 be the transformation f(x) = 2x mod Z. Show that the function τ(x) = min{k ≥ 0 : f^k(x) ∈ (1/2, 1)} is integrable with respect to the Lebesgue measure. State and prove a corresponding result for any C^1 transformation g : S^1 → S^1 that is close to f, in the sense that $\sup_x \{|g(x) - f(x)|, |g'(x) - f'(x)|\}$ is sufficiently small.
1.4.2. Consider the measure $\nu_\rho$ and the sequence $(a_n)_n$ defined in Example 1.4.5. Check that $\nu_\rho$ is always σ-finite. Show that $(a_n)_n$ is decreasing and converges to zero. Moreover, show that there exist $c_1, c_2, c_3, c_4 > 0$ such that
$$c_1 \le a_j\,j^{1/d} \le c_2 \quad\text{and}\quad c_3 \le (a_j - a_{j+1})\,j^{1+1/d} \le c_4 \quad\text{for every } j. \qquad (1.4.8)$$
Deduce that the f-invariant measure $\nu_\rho$ is finite if and only if d ∈ (0, 1).
1.4.3. Let σ : Σ → Σ be the map defined on the space $\Sigma = \{1, \dots, d\}^{\mathbb{Z}}$ by $\sigma((x_n)_n) = (x_{n+1})_n$. Describe the first-return map g to the subset $\{(x_n)_n \in \Sigma : x_0 = 1\}$.
1.4.4. [Kakutani–Rokhlin lemma] Let f : M → M be an invertible transformation and μ be an invariant probability measure without atoms and such that $\mu\big(\bigcup_{n \in \mathbb{N}} f^n(E)\big) = 1$ for every E ⊂ M with μ(E) > 0. Show that for every n ≥ 1 and ε > 0 there exists a measurable set B ⊂ M such that the iterates B, f(B), . . . , $f^{n-1}(B)$ are pairwise disjoint and the complement of their union has measure less than ε. In particular, this holds for every invertible system that is aperiodic, that is, whose periodic points form a zero measure set.
1.4.5. Let f : M → M be a transformation and $(H_j)_{j \ge 1}$ be a collection of subsets of M such that if $x \in H_n$ then $f^j(x) \in H_{n-j}$ for every 0 ≤ j < n. Let H be the set of points that belong to $H_j$ for infinitely many values of j, that is, $H = \bigcap_{k=1}^{\infty}\bigcup_{j=k}^{\infty} H_j$. For y ∈ H, define $\tau(y) = \min\{j \ge 1 : y \in H_j\}$ and $T(y) = f^{\tau(y)}(y)$. Observe that T maps H inside H. Moreover, show that
$$\limsup_n \frac{1}{n}\#\{1 \le j \le n : x \in H_j\} \ge \theta > 0 \implies \liminf_k \frac{1}{k}\sum_{i=0}^{k-1} \tau(T^i(x)) \le \frac{1}{\theta}.$$

1.4.6. Let f : M → M be a transformation preserving a measure μ. Let $(H_j)_{j \ge 1}$ and τ : M → N be as in Exercise 1.4.5. Consider the sequence of functions $(\tau_n)_n$ defined by $\tau_1(x) = \tau(x)$ and $\tau_n(x) = \tau\big(f^{\tau_{n-1}(x)}(x)\big) + \tau_{n-1}(x)$ for n > 1. Suppose that
$$\limsup_n \frac{1}{n}\#\{1 \le j \le n : x \in H_j\} \ge \theta > 0 \quad\text{for } \mu\text{-almost every } x \in M.$$
Show that $\tau_{n+1}(x)/\tau_n(x) \to 1$ for μ-almost every x ∈ M. [Note: Sequences with this property are called non-lacunary.]

1.5 Multiple recurrence theorems


Now we consider finite families of commuting maps fi : M → M, i = 1, . . . , q,
that is, such that
fi ◦ fj = fj ◦ fi for every i, j ∈ {1, . . . , q}.
Our goal is to explain that the results in Section 1.2 extend to this setting: we
find points that are simultaneously recurrent for these transformations.
The first result in this direction generalizes the Birkhoff recurrence theorem
(Theorem 1.2.6):
Theorem 1.5.1 (Birkhoff multiple recurrence). Let M be a compact metric space and $f_1, \dots, f_q : M \to M$ be continuous commuting maps. Then there exists a ∈ M and a sequence $(n_k)_k \to \infty$ such that
$$\lim_k f_i^{n_k}(a) = a \quad\text{for every } i = 1, \dots, q. \qquad (1.5.1)$$

The key point here is that the sequence (nk )k does not depend on i: we say
that the point a is simultaneously recurrent for all the maps fi , i = 1, . . . , q.
A proof of Theorem 1.5.1 is given in Section 1.5.1. Next, we discuss the
following generalization of the Poincaré recurrence theorem (Theorem 1.2.1):
Theorem 1.5.2 (Poincaré multiple recurrence). Let (M, B, μ) be a probability space and $f_i : M \to M$, i = 1, . . . , q be measurable commuting maps that preserve the measure μ. Then, given any set E ⊂ M with positive measure, there exists n ≥ 1 such that
$$\mu\big(E \cap f_1^{-n}(E) \cap \dots \cap f_q^{-n}(E)\big) > 0.$$

In other words, for a positive measure subset of points x ∈ E, their orbits under all the maps $f_i$, i = 1, . . . , q return to E simultaneously at time n (we say that n is a simultaneous return of x to E): once more, the crucial point in the statement is that n does not depend on i.
The proof of Theorem 1.5.2 will not be presented here; we refer the
interested reader to the book of Furstenberg [Fur77]. We are just going to
mention some direct consequences and, in Chapter 2, we will use this theorem
to prove the Szemerédi theorem on the existence of arithmetic progressions
inside “dense” subsets of integer numbers.
To begin with, observe that the set of simultaneous returns is always infinite. Indeed, let n be as in the statement of Theorem 1.5.2. Applying the theorem to the set $F = E \cap f_1^{-n}(E) \cap \dots \cap f_q^{-n}(E)$, we find m ≥ 1 such that
$$\mu\big(E \cap f_1^{-(m+n)}(E) \cap \dots \cap f_q^{-(m+n)}(E)\big) \ge \mu\big(F \cap f_1^{-m}(F) \cap \dots \cap f_q^{-m}(F)\big) > 0.$$
Thus, m + n is also a simultaneous return to E, for all the points in some subset of E with positive measure.
It follows that, for any set E ⊂ M with μ(E) > 0 and for μ-almost every
point x ∈ E, there exist infinitely many simultaneous returns of x to E. Indeed,
suppose there is a positive measure set F ⊂ E such that every point of F has a
finite number of simultaneous returns to E. On the one hand, up to replacing
F by a suitable subset, we may suppose that the simultaneous returns to E
of all the points of F are bounded by some k ≥ 1. On the other hand, using
the previous paragraph, there exists n > k such that $G = F \cap f_1^{-n}(F) \cap \dots \cap f_q^{-n}(F)$ has positive measure. Now, it is clear from the definition that n is a
simultaneous return to E of every x ∈ G. This contradicts the choice of F, thus
proving our claim.
Another direct corollary is the Birkhoff multiple recurrence theorem
(Theorem 1.5.1). Indeed, if fi : M → M, i = 1, . . . , q are continuous commuting
transformations on a compact metric space then there exists some probability
measure μ that is invariant under all these transformations (this fact will be
checked in the next chapter, see Exercise 2.2.2). From this point on, we may
argue exactly as in the proof of Theorem 1.2.4. More precisely, consider
any countable basis {Uk } for the topology of M. According to the previous
paragraph, for every k there exists a set Ũk ⊂ Uk with zero measure such
that every point in Uk \ Ũk has infinitely many simultaneous returns to Uk .
Then $\tilde U = \bigcup_k \tilde U_k$ has measure zero and every point in its complement is simultaneously recurrent, in the sense of Theorem 1.5.1.
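For commuting circle rotations $f_i = R_{\theta_i}$, simultaneous returns can be found explicitly, since $f_i^n([0])$ is close to [0] exactly when nθ_i is close to an integer. The Python sketch below (the angles and the tolerance are arbitrary choices) lists the times n at which all the iterates lie within ε of [0] at once. For q = 2 angles and ε = 0.01, Dirichlet's simultaneous approximation theorem guarantees such a time with n ≤ 101² = 10201, because it provides n ≤ 101² with every nθ_i within 1/101 < 0.01 of an integer.

```python
import math

def circle_distance(t):
    """Distance from [t] to [0] in R/Z, i.e. from t to the nearest integer."""
    frac = t % 1.0
    return min(frac, 1.0 - frac)

def simultaneous_returns(thetas, eps, n_max):
    """All n <= n_max such that R_theta^n([0]) is within eps of [0] for every theta."""
    return [n for n in range(1, n_max + 1)
            if all(circle_distance(n * t) < eps for t in thetas)]

thetas = (math.sqrt(2), math.sqrt(3))
returns = simultaneous_returns(thetas, eps=0.01, n_max=100_000)
print(returns[:5])
```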

1.5.1 Birkhoff multiple recurrence theorem


In this section we prove Theorem 1.5.1 in the case when the transformations
f1 , . . . , fq are homeomorphisms of M, which suffices for all our purposes in the
present chapter. The general case may be deduced easily (see Exercise 2.4.7)
using the concept of natural extension, which we will present in the next
chapter.
The theorem may be reformulated in the following useful way. Consider the
transformation $F : M^q \to M^q$ defined on the product space $M^q = M \times \cdots \times M$
by $F(x_1, \dots, x_q) = (f_1(x_1), \dots, f_q(x_q))$. Denote by $\Delta_q$ the diagonal of $M^q$, that is,
the subset of points of the form $\tilde x = (x, \dots, x)$. Theorem 1.5.1 claims, precisely,
that there exist $\tilde a \in \Delta_q$ and $(n_k)_k \to \infty$ such that
$$\lim_k F^{n_k}(\tilde a) = \tilde a. \qquad (1.5.2)$$
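As a numerical illustration of (1.5.2), consider the simplest commuting homeomorphisms of a compact metric space: the rotations $f_i(x) = x + \alpha_i \pmod 1$ of the circle $\mathbb R/\mathbb Z$. The sketch below (the function names and the choice $\alpha = (\sqrt 2, \sqrt 3)$ are ours, purely for illustration) searches for the first n such that every coordinate of $F^n(\tilde x)$ is within ε of x, that is, $F^n(\tilde x)$ is ε-close to $\tilde x$ for the maximum of the coordinate distances. Since rotations are isometries, the answer does not depend on x. Shrinking ε can only delay the first return, and these return times tend to infinity because the rotations have no common periodic points, as required of $(n_k)_k$ in (1.5.2).

```python
import math


def circle_dist(a, b):
    """Distance between a and b on the circle R/Z."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)


def first_simultaneous_return(x, alphas, eps, n_max):
    """Smallest n in [1, n_max] such that every rotation f_i(x) = x + alphas[i]
    (mod 1) satisfies d(f_i^n(x), x) < eps, i.e. F^n(x~) is eps-close to the
    diagonal point x~ = (x, ..., x) in the max-distance; None if none found."""
    for n in range(1, n_max + 1):
        if all(circle_dist(x + n * a, x) < eps for a in alphas):
            return n
    return None


alphas = [math.sqrt(2), math.sqrt(3)]
for eps in (0.1, 0.05, 0.01):
    print(eps, first_simultaneous_return(0.25, alphas, eps, 10**5))
```

By Dirichlet's simultaneous approximation theorem a return within ε = 0.01 exists below roughly $101^2$ iterates, so the search bound used here is comfortably large.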

The proof of Theorem 1.5.1 is by induction on the number q of transformations. The case q = 1 is contained in Theorem 1.2.6. Consider any $q \ge 2$
and suppose that the statement is true for every family of q − 1 commuting
homeomorphisms. We are going to prove that it is true for the family $f_1, \dots, f_q$.
Let G be the (abelian) group generated by the homeomorphisms $f_1, \dots, f_q$.
We say that a set $X \subset M$ is G-invariant if $g(X) \subset X$ for every $g \in G$. Observing
that the inverse $g^{-1}$ is also in G, we see that this implies $g(X) = X$ for every
$g \in G$. Just as we did in Theorem 1.2.6, we may use Zorn’s lemma to conclude
that there exists some minimal, non-empty, closed, G-invariant set $X \subset M$ (this
is Exercise 1.5.2). The statement of the theorem is not affected if we replace M
by X. Thus, it is no restriction to assume that the ambient space M is minimal.
This assumption is used as follows:

Lemma 1.5.3. If M is minimal then for every non-empty open set $U \subset M$ there
exists a finite subset $H \subset G$ such that
$$\bigcup_{h \in H} h^{-1}(U) = M.$$

Proof. For any $x \in M$, the closure of the orbit $G(x) = \{g(x) : g \in G\}$ is a
non-empty, closed, G-invariant subset of M. So, the hypothesis that M is
minimal implies that every orbit $G(x)$ is dense in M. In particular, there is
$g \in G$ such that $g(x) \in U$. This proves that $\{g^{-1}(U) : g \in G\}$ is an open cover of
M. By compactness, it follows that there exists a finite subcover, as claimed.
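For a concrete instance of Lemma 1.5.3, take $M = \mathbb R/\mathbb Z$ and let G be generated by a single rotation $R_\alpha(x) = x + \alpha \pmod 1$, which acts minimally when α is irrational. The sketch below (hypothetical helper names; the arc and the values of α are arbitrary) checks whether the arcs $R_\alpha^{-j}(U)$, $0 \le j \le n$, already cover the circle, and finds the least such n. For α = 1/2 the search fails: every orbit has only two points, the action is not minimal, and the conclusion of the lemma breaks down for this short arc.

```python
import math


def translates_cover_circle(a, length, alpha, n):
    """True if the open arcs R^{-j}(U), j = 0, ..., n, cover R/Z, where
    U = (a, a + length) mod 1 and R(x) = x + alpha mod 1."""
    # R^{-j}(U) is the open arc of the same length starting at a - j*alpha.
    starts = sorted((a - j * alpha) % 1.0 for j in range(n + 1))
    # Open arcs of equal length cover the circle iff every cyclic gap between
    # consecutive starting points is strictly smaller than that length.
    gaps = [t - s for s, t in zip(starts, starts[1:])]
    gaps.append(starts[0] + 1.0 - starts[-1])
    return max(gaps) < length


def cover_size(a, length, alpha, n_max=10_000):
    """Size of the smallest H = {R^0, ..., R^n} whose preimages of U cover, or None."""
    for n in range(n_max + 1):
        if translates_cover_circle(a, length, alpha, n):
            return n + 1
    return None


golden = (math.sqrt(5) - 1) / 2
print(cover_size(0.3, 0.1, golden))          # finitely many translates suffice
print(cover_size(0.3, 0.1, 0.5, n_max=100))  # rational rotation: no cover found
```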

Consider the product $M^q$ endowed with the distance function
$$d\big((x_1, \dots, x_q), (y_1, \dots, y_q)\big) = \max\{d(x_i, y_i) : 1 \le i \le q\}.$$
Note that the map $M \to \Delta_q$, $x \mapsto \tilde x = (x, \dots, x)$ is a homeomorphism, and even
an isometry for this choice of a distance. Every open set $U \subset M$ corresponds
to an open set $\tilde U \subset \Delta_q$ through this homeomorphism. Given any $g \in G$,
we denote by $\tilde g : M^q \to M^q$ the homeomorphism defined by $\tilde g(x_1, \dots, x_q) = (g(x_1), \dots, g(x_q))$. The fact that the group G is abelian implies that $\tilde g$ commutes
with F; note also that every $\tilde g$ preserves the diagonal $\Delta_q$. Then the conclusion
of Lemma 1.5.3 may be rewritten in the following form:
$$\bigcup_{h \in H} \tilde h^{-1}(\tilde U) = \Delta_q. \qquad (1.5.3)$$

Lemma 1.5.4. Given ε > 0 there exist $\tilde x \in \Delta_q$, $\tilde y \in \Delta_q$ and $n \ge 1$ such that
$$d(F^n(\tilde x), \tilde y) < \varepsilon.$$

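Proof. Define $g_i = f_i \circ f_q^{-1}$ for each $i = 1, \dots, q-1$. Since the maps $f_i$ commute
with each other, so do the maps $g_i$; moreover, they are homeomorphisms. Then, by the induction hypothesis applied to $g_1, \dots, g_{q-1}$, there exist $y \in M$ and
$(n_k)_k \to \infty$ such that
$$\lim_k g_i^{n_k}(y) = y \quad \text{for every } i = 1, \dots, q-1.$$
Denote $x_k = f_q^{-n_k}(y)$ and consider $\tilde x_k = (x_k, \dots, x_k) \in \Delta_q$. Since each $f_i$ commutes with $f_q$, we have $f_i^{n} f_q^{-n} = (f_i f_q^{-1})^{n} = g_i^{n}$ for every n. Then,
$$F^{n_k}(\tilde x_k) = \big(f_1^{n_k} f_q^{-n_k}(y), \dots, f_{q-1}^{n_k} f_q^{-n_k}(y), f_q^{n_k} f_q^{-n_k}(y)\big) = \big(g_1^{n_k}(y), \dots, g_{q-1}^{n_k}(y), y\big)$$
converges to $(y, \dots, y, y)$ when $k \to \infty$. This proves the lemma, with $\tilde x = \tilde x_k$,
$\tilde y = (y, \dots, y, y)$ and $n = n_k$ for every k sufficiently large.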
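The change of variables in this proof can be tested on rotations $f_i(x) = x + \alpha_i \pmod 1$, for which $g_i = f_i \circ f_q^{-1}$ is the rotation by $\alpha_i - \alpha_q$. The sketch below (our own helper names and sample angles, not from the text) first picks n with $g_i^n(y)$ close to y for every $i < q$, then sets $x = f_q^{-n}(y)$ and measures $d(F^n(\tilde x), \tilde y)$ in the max-distance. The two quantities agree up to rounding, as the identity $F^{n}(\tilde x) = (g_1^{n}(y), \dots, g_{q-1}^{n}(y), y)$ predicts.

```python
import math


def circle_dist(a, b):
    """Distance between a and b on the circle R/Z."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)


def g_return_time(alphas, eps, n_max):
    """First n with d(g_i^n(y), y) < eps for all i < q, where g_i is the
    rotation by alphas[i] - alphas[-1] (the answer does not depend on y)."""
    for n in range(1, n_max + 1):
        if all(circle_dist(n * (a - alphas[-1]), 0.0) < eps for a in alphas[:-1]):
            return n
    return None


def lemma_distance(y, alphas, n):
    """d(F^n(x~), y~) with x = f_q^{-n}(y), x~ = (x, ..., x), y~ = (y, ..., y)."""
    x = (y - n * alphas[-1]) % 1.0                # x = f_q^{-n}(y)
    image = [(x + n * a) % 1.0 for a in alphas]   # F^n(x~), coordinatewise
    return max(circle_dist(p, y) for p in image)  # max-distance on M^q


alphas = [math.sqrt(2), math.sqrt(3), math.pi]
n = g_return_time(alphas, 0.01, 10**5)
print(n, lemma_distance(0.7, alphas, n))
```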

The next step is to show that the point ỹ in Lemma 1.5.4 is arbitrary:

Lemma 1.5.5. Given ε > 0 and $\tilde z \in \Delta_q$ there exist $\tilde w \in \Delta_q$ and $m \ge 1$ such that
$d(F^m(\tilde w), \tilde z) < \varepsilon$.

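Proof. Given ε > 0 and $\tilde z \in \Delta_q$, let $\tilde U \subset \Delta_q$ be the open ball of center $\tilde z$ and radius
ε/2. By Lemma 1.5.3 and the observation (1.5.3), we may find a finite set
$H \subset G$ such that the sets $\tilde h^{-1}(\tilde U)$, $h \in H$ cover $\Delta_q$. Since the elements of G are
(uniformly) continuous functions on the compact space M, there exists δ > 0 such that
$$d(\tilde x_1, \tilde x_2) < \delta \;\Rightarrow\; d(\tilde h(\tilde x_1), \tilde h(\tilde x_2)) < \varepsilon/2 \quad \text{for every } h \in H \text{ and } \tilde x_1, \tilde x_2 \in M^q.$$
By Lemma 1.5.4 there exist $\tilde x, \tilde y \in \Delta_q$ and $n \ge 1$ such that $d(F^n(\tilde x), \tilde y) < \delta$. Fix
$h \in H$ such that $\tilde y \in \tilde h^{-1}(\tilde U)$, that is, $d(\tilde h(\tilde y), \tilde z) < \varepsilon/2$. Then,
$$d\big(\tilde h(F^n(\tilde x)), \tilde z\big) \le d\big(\tilde h(F^n(\tilde x)), \tilde h(\tilde y)\big) + d\big(\tilde h(\tilde y), \tilde z\big) < \varepsilon/2 + \varepsilon/2.$$
Take $\tilde w = \tilde h(\tilde x)$ and $m = n$. Since $\tilde h$ commutes with $F^n$, the previous inequality implies that
$d(F^m(\tilde w), \tilde z) < \varepsilon$, as we wanted to prove.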

Next, we prove that one may take x̃ = ỹ in Lemma 1.5.4:

Lemma 1.5.6 (Bowen). Given ε > 0 there exist $\tilde v \in \Delta_q$ and $k \ge 1$ with
$$d(F^k(\tilde v), \tilde v) < \varepsilon.$$

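Proof. Given ε > 0 and $\tilde z_0 \in \Delta_q$, consider the sequences $\varepsilon_j$, $m_j$ and $\tilde z_j$, $j \ge 1$,
defined by recurrence as follows. Initially, take $\varepsilon_1 = \varepsilon/2$.
• By Lemma 1.5.5 there are $\tilde z_1 \in \Delta_q$ and $m_1 \ge 1$ with $d(F^{m_1}(\tilde z_1), \tilde z_0) < \varepsilon_1$.
• By the continuity of $F^{m_1}$, there exists $\varepsilon_2 < \varepsilon_1$ such that $d(\tilde z, \tilde z_1) < \varepsilon_2$ implies
$d(F^{m_1}(\tilde z), \tilde z_0) < \varepsilon_1$.
Next, given any $j \ge 2$:
• By Lemma 1.5.5 there are $\tilde z_j \in \Delta_q$ and $m_j \ge 1$ with $d(F^{m_j}(\tilde z_j), \tilde z_{j-1}) < \varepsilon_j$.
• By the continuity of $F^{m_j}$, there exists $\varepsilon_{j+1} < \varepsilon_j$ such that $d(\tilde z, \tilde z_j) < \varepsilon_{j+1}$
implies $d(F^{m_j}(\tilde z), \tilde z_{j-1}) < \varepsilon_j$.
In particular, applying these properties successively at the steps $j, j-1, \dots, i+1$, we get for any $i < j$:
$$d\big(F^{m_{i+1} + \cdots + m_j}(\tilde z_j), \tilde z_i\big) < \varepsilon_{i+1} \le \frac{\varepsilon}{2}.$$
Since $\Delta_q$ is compact, we can find i, j with i < j such that $d(\tilde z_i, \tilde z_j) < \varepsilon/2$. Take
$k = m_{i+1} + \cdots + m_j$ and $\tilde v = \tilde z_j$. Then,
$$d(F^k(\tilde z_j), \tilde z_j) \le d(F^k(\tilde z_j), \tilde z_i) + d(\tilde z_i, \tilde z_j) < \varepsilon.$$
This completes the proof of the lemma.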

Now we are ready to conclude the proof of Theorem 1.5.1. For that, let us
consider the function
$$\varphi : \Delta_q \to [0, \infty), \quad \varphi(\tilde x) = \inf\{d(F^n(\tilde x), \tilde x) : n \ge 1\}.$$
Observe that φ is upper semi-continuous: given any ε > 0, every point $\tilde x$ admits
some neighborhood V such that $\varphi(\tilde y) < \varphi(\tilde x) + \varepsilon$ for every $\tilde y \in V$. This is an
immediate consequence of the fact that φ is given by the infimum of a family of
continuous functions. Then (Exercise 1.5.4), φ admits some continuity point $\tilde a$.
We are going to show that this point satisfies the conclusion of Theorem 1.5.1.
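Let us begin by observing that $\varphi(\tilde a) = 0$. Indeed, suppose that $\varphi(\tilde a)$ is
positive. Then, by continuity, there exist β > 0 and an open neighborhood $V \subset \Delta_q$ of $\tilde a$
such that $\varphi(\tilde y) \ge \beta > 0$ for every $\tilde y \in V$. Then,
$$d(F^n(\tilde y), \tilde y) \ge \beta \quad \text{for every } \tilde y \in V \text{ and } n \ge 1. \qquad (1.5.4)$$
On the other hand, applying Lemma 1.5.3, in the form (1.5.3), to the open set V, we find a finite set $H \subset G$
such that for every $\tilde x \in \Delta_q$ there exists $h \in H$ with $\tilde h(\tilde x) \in V$. Since the transformations $\tilde h$ are uniformly continuous, we
may fix α > 0 such that
$$d(\tilde z, \tilde w) < \alpha \;\Rightarrow\; d\big(\tilde h(\tilde z), \tilde h(\tilde w)\big) < \beta \quad \text{for every } h \in H. \qquad (1.5.5)$$
By Lemma 1.5.6, there exist $\tilde x \in \Delta_q$ and $n \ge 1$ such that $d(\tilde x, F^n(\tilde x)) < \alpha$. Fix $h \in H$ with $\tilde h(\tilde x) \in V$. Then, using
(1.5.5) and recalling that F commutes with every $\tilde h$,
$$d\big(\tilde h(\tilde x), F^n(\tilde h(\tilde x))\big) < \beta.$$
This contradicts (1.5.4) applied to $\tilde y = \tilde h(\tilde x)$. This contradiction proves that $\varphi(\tilde a) = 0$, as claimed.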

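In other words, there exists $(n_k)_k \to \infty$ such that $d(F^{n_k}(\tilde a), \tilde a) \to 0$ when $k \to \infty$ (if the infimum defining $\varphi(\tilde a)$ is attained at some n, then $F^n(\tilde a) = \tilde a$ and we may take $n_k = kn$). This means that (1.5.2) is satisfied and, hence, the proof of Theorem 1.5.1
is complete.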

1.5.2 Exercises
1.5.1. Show, by means of examples, that the conclusion of Theorem 1.5.1 is generally
false if the transformations fi do not commute with each other.
1.5.2. Let G be the abelian group generated by commuting homeomorphisms f1 , . . . , fq :
M → M on a compact metric space. Prove that there exists some minimal element
X ⊂ M for the inclusion relation in the family of non-empty, closed, G-invariant
subsets of M.
1.5.3. Show that if ϕ : M → R is an upper semi-continuous function on a compact
metric space then ϕ attains its maximum, that is, there exists p ∈ M such that
ϕ(p) ≥ ϕ(x) for every x ∈ M.
1.5.4. Show that if ϕ : M → R is an (upper or lower) semi-continuous function on a
compact metric space then the set of continuity points of ϕ contains a countable
intersection of open and dense subsets of M. In particular, the set of continuity
points is dense in M.
1.5.5. Let f : M → M be a measurable transformation preserving a finite measure μ.
Given k ≥ 1 and a positive measure set A ⊂ M, show that for almost every x ∈ A
there exists n ≥ 1 such that $f^{jn}(x) \in A$ for every $1 \le j \le k$.
1.5.6. Let $f_1, \dots, f_q : M \to M$ be commuting homeomorphisms on a compact metric
space. A point $x \in M$ is called non-wandering if for every neighborhood U of x
there exist $n_1, \dots, n_q \ge 1$ such that $f_1^{n_1} \cdots f_q^{n_q}(U)$ intersects U. The non-wandering
set is the set $\Omega(f_1, \dots, f_q)$ of all non-wandering points. Prove that $\Omega(f_1, \dots, f_q)$ is
non-empty and compact.
2
Existence of invariant measures

In this chapter we prove the following result, which guarantees the existence
of invariant measures for a broad class of transformations:

Theorem 2.1 (Existence of invariant measures). Let f : M → M be a
continuous transformation on a compact metric space. Then there exists some
probability measure on M invariant under f.

The main point in the proof is to introduce a certain topology in the set
M1 (M) of probability measures on M, which we call the weak∗ topology. The idea
is that two measures are close, with respect to this topology, if the integrals
they assign to (many) bounded continuous functions are close. The precise
definition and some of the properties of the weak∗ topology are presented
in Section 2.1. The crucial property, that makes this topology so useful for
proving the existence theorem, is that it turns M1 (M) into a compact space
(Theorem 2.1.5).
The proof of Theorem 2.1 is given in Section 2.2. We will also see,
through examples, that the hypotheses of continuity and compactness cannot
be omitted.
In Section 2.3 we insert the construction of the weak∗ topology into a
broader framework from functional analysis and we also take the opportunity
to introduce the notion of the Koopman operator of a transformation, which
will be very useful in the sequel. In particular, as we are going to see, it allows
us to give an alternative proof of Theorem 2.1, based on tools from functional
analysis.
In Section 2.4 we describe certain explicit constructions of invariant
measures for two important classes of systems: skew-products and natural
extensions (or inverse limits) of non-invertible transformations.
Finally, in Section 2.5 we discuss some important applications of the idea of
multiple recurrence (Section 1.5) in the context of combinatorial number theory.
Theorem 2.1.5 has an important role in the arguments, which is the reason why
this discussion was postponed to the present chapter.

2.1 Weak∗ topology


In this section M will always be a metric space. Our goal is to define the
so-called weak∗ topology in the set M1 (M) of Borel probability measures on
M and to discuss its main properties.
Let d(·, ·) be the distance function on M and let B(x, δ) denote the ball of center
x ∈ M and radius δ > 0. Given B ⊂ M, we define
$$d(x, B) = \inf\{d(x, y) : y \in B\}$$
and we call the set $B^\delta$ of points x ∈ M with $d(x, B) < \delta$ the δ-neighborhood of B.

2.1.1 Definition and properties of the weak∗ topology


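Given a measure $\mu \in \mathcal M_1(M)$, a finite set $\Phi = \{\varphi_1, \dots, \varphi_N\}$ of bounded
continuous functions $\varphi_i : M \to \mathbb R$ and a number ε > 0, we define
$$V(\mu, \Phi, \varepsilon) = \Big\{\nu \in \mathcal M_1(M) : \Big|\int \varphi_i \, d\nu - \int \varphi_i \, d\mu\Big| < \varepsilon \text{ for every } i\Big\}. \qquad (2.1.1)$$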

Note that the intersection of any two such sets contains some set of this form.
Thus, the family $\{V(\mu, \Phi, \varepsilon) : \Phi, \varepsilon\}$ may be taken as a basis of neighborhoods
of each $\mu \in \mathcal M_1(M)$.
The weak∗ topology is the topology defined by these bases of neighborhoods. In other words, the open sets in the weak∗ topology are the sets $A \subset \mathcal M_1(M)$ such that for every $\mu \in A$ there exists some $V(\mu, \Phi, \varepsilon)$ contained in A.
Observe that the definition depends only on the topology of M, not on its dis-
tance. Furthermore, this topology is Hausdorff: Proposition A.3.3 implies that
if μ and ν are distinct probabilities then there exist ε > 0 and some bounded
continuous function φ : M → R such that V(μ, {φ}, ε) ∩ V(ν, {φ}, ε) = ∅.

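Lemma 2.1.1. A sequence $(\mu_n)_{n \in \mathbb N}$ converges to a measure $\mu \in \mathcal M_1(M)$ in the
weak∗ topology if and only if
$$\int \varphi \, d\mu_n \to \int \varphi \, d\mu \quad \text{for every bounded continuous function } \varphi : M \to \mathbb R.$$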

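Proof. To prove the “only if” claim, consider any set $\Phi = \{\varphi\}$ consisting of a
single bounded continuous function φ. Since $(\mu_n)_n \to \mu$, for any ε > 0 there
exists $\bar n \ge 1$ such that $\mu_n \in V(\mu, \Phi, \varepsilon)$ for every $n \ge \bar n$. This means, precisely,
that
$$\Big|\int \varphi \, d\mu_n - \int \varphi \, d\mu\Big| < \varepsilon \quad \text{for every } n \ge \bar n.$$
In other words, the sequence $(\int \varphi \, d\mu_n)_n$ converges to $\int \varphi \, d\mu$.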


The converse asserts that if $(\int \varphi \, d\mu_n)_n$ converges to $\int \varphi \, d\mu$ for every
bounded continuous function φ then, given any $\Phi = \{\varphi_1, \dots, \varphi_N\}$ and ε > 0,
there exists $\bar n \ge 1$ such that $\mu_n \in V(\mu, \Phi, \varepsilon)$ for $n \ge \bar n$. To check that this is
so, note that the hypothesis ensures that for every i there exists $\bar n_i$
such that
$$\Big|\int \varphi_i \, d\mu_n - \int \varphi_i \, d\mu\Big| < \varepsilon \quad \text{for every } n \ge \bar n_i.$$
Taking $\bar n = \max\{\bar n_1, \dots, \bar n_N\}$ we get that $\mu_n \in V(\mu, \Phi, \varepsilon)$ for every $n \ge \bar n$.
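A standard example shows what Lemma 2.1.1 does and does not say. On $M = \mathbb R$, the Dirac masses $\mu_n = \delta_{1/n}$ converge to $\mu = \delta_0$ in the weak∗ topology, because $\int \varphi \, d\mu_n = \varphi(1/n) \to \varphi(0)$ for every bounded continuous φ. Nevertheless $\mu_n(\{0\}) = 0$ for all n while $\mu(\{0\}) = 1$, so the measures of individual sets need not converge (here {0} is not a continuity set of μ). The short check below evaluates a few test functions of our own choosing.

```python
import math


def integrate_dirac(phi, point):
    """Integral of phi with respect to the Dirac mass at `point`."""
    return phi(point)


def dirac_measure(point, indicator):
    """Mass that the Dirac mass at `point` gives to the set {x : indicator(x)}."""
    return 1.0 if indicator(point) else 0.0


def is_zero(x):
    """Indicator function of the closed set {0}."""
    return x == 0.0


test_functions = {
    "cos": math.cos,
    "gaussian": lambda x: math.exp(-x * x),
    "truncated abs": lambda x: min(1.0, abs(x)),
}
errors = {}
for name, phi in test_functions.items():
    # |int phi d(mu_n) - int phi d(mu)| for mu_n = delta_{1/n} and mu = delta_0
    errors[name] = [abs(integrate_dirac(phi, 1.0 / n) - integrate_dirac(phi, 0.0))
                    for n in (10, 100, 1000)]
    print(name, errors[name])

# Masses of {0}: 0 for every mu_n, but 1 for the limit mu.
print([dirac_measure(1.0 / n, is_zero) for n in (10, 100, 1000)],
      dirac_measure(0.0, is_zero))
```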

2.1.2 Portmanteau theorem


Now let us discuss other useful ways of defining the weak∗ topology. Indeed,
the relations (2.1.2), (2.1.3), (2.1.4) and (2.1.5) below introduce other natural
choices for neighborhoods of a probability measure μ ∈ M1 . In Theorem 2.1.2
we prove that all these choices give rise to the same topology in M1 (M), which
coincides with the weak∗ topology.
A direct variation of the definition of weak∗ topology is obtained by taking
as the basis of neighborhoods the family of sets
$$V(\mu, \Psi, \varepsilon) = \Big\{\eta \in \mathcal M_1(M) : \Big|\int \psi_i \, d\eta - \int \psi_i \, d\mu\Big| < \varepsilon \text{ for every } i\Big\}, \qquad (2.1.2)$$
where ε > 0 and $\Psi = \{\psi_1, \dots, \psi_N\}$ is a family of Lipschitz functions. The next
definition is formulated in terms of closed subsets. Given any finite family
$\mathcal F = \{F_1, \dots, F_N\}$ of closed subsets of M and given any ε > 0, consider
$$V_f(\mu, \mathcal F, \varepsilon) = \{\nu \in \mathcal M_1(M) : \nu(F_i) < \mu(F_i) + \varepsilon \text{ for every } i\}. \qquad (2.1.3)$$
The next construction is analogous, just with open subsets instead of closed
subsets. Given any finite family $\mathcal A = \{A_1, \dots, A_N\}$ of open subsets of M and
given any ε > 0, consider
$$V_a(\mu, \mathcal A, \varepsilon) = \{\nu \in \mathcal M_1(M) : \nu(A_i) > \mu(A_i) - \varepsilon \text{ for every } i\}. \qquad (2.1.4)$$
We call a continuity set of a measure μ any Borel subset B of M whose
boundary ∂B has zero measure for μ. Given any finite family $\mathcal B = \{B_1, \dots, B_N\}$
of continuity sets of μ and given any ε > 0, consider
$$V_c(\mu, \mathcal B, \varepsilon) = \{\nu \in \mathcal M_1(M) : |\mu(B_i) - \nu(B_i)| < \varepsilon \text{ for every } i\}. \qquad (2.1.5)$$
Given any two topologies T1 and T2 in the same set, we say that T1 is weaker
than T2 (or, equivalently, that T2 is stronger than T1 ) if every subset that is open
for T1 is also open for T2 . We say that the two topologies are equivalent if they
have exactly the same open sets.
Theorem 2.1.2. The topologies defined by the bases of neighborhoods (2.1.1),
(2.1.2), (2.1.3), (2.1.4) and (2.1.5) are all equivalent.

Proof. Since every Lipschitz function is continuous, it is clear that the
topology (2.1.2) is weaker than the topology (2.1.1).
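To show that the topology (2.1.3) is weaker than the topology (2.1.2),
consider any finite family $\mathcal F = \{F_1, \dots, F_N\}$ of closed subsets of M. According
to Lemma A.3.4, for each δ > 0 and each i there exists a Lipschitz function
$\psi_i : M \to [0, 1]$ such that $\mathcal X_{F_i} \le \psi_i \le \mathcal X_{F_i^\delta}$. Observe that $\bigcap_\delta F_i^\delta = F_i$, because
$F_i$ is closed, and so $\mu(F_i^\delta) \to \mu(F_i)$ when $\delta \to 0$. Fix δ > 0 small enough so
that $\mu(F_i^\delta) - \mu(F_i) < \varepsilon/2$ for every i. Let Ψ be the set of functions $\psi_1, \dots, \psi_N$
obtained in this way. Since $\nu(F_i) \le \int \psi_i \, d\nu$ and $\int \psi_i \, d\mu \le \mu(F_i^\delta)$, we have
$$\Big|\int \psi_i \, d\nu - \int \psi_i \, d\mu\Big| < \varepsilon/2 \;\Rightarrow\; \nu(F_i) - \mu(F_i^\delta) < \varepsilon/2 \;\Rightarrow\; \nu(F_i) < \mu(F_i) + \varepsilon$$
for every i. In other words, $V(\mu, \Psi, \varepsilon/2)$ is contained in $V_f(\mu, \mathcal F, \varepsilon)$.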
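Lemma A.3.4 is only quoted here. One standard way to produce such a function (we do not claim it is the construction used in Lemma A.3.4) is $\psi(x) = \max\{0, 1 - d(x, F)/\delta\}$: it equals 1 on F, vanishes outside $F^\delta$, and is $(1/\delta)$-Lipschitz because $x \mapsto d(x, F)$ is 1-Lipschitz. The sketch checks these properties numerically for an arbitrarily chosen finite (hence closed) subset F of the real line.

```python
def distance_to_set(x, F):
    """d(x, F) = min{|x - y| : y in F} for a finite set F of reals."""
    return min(abs(x - y) for y in F)


def lipschitz_bump(F, delta):
    """psi(x) = max(0, 1 - d(x, F)/delta), so that X_F <= psi <= X_{F^delta}."""
    return lambda x: max(0.0, 1.0 - distance_to_set(x, F) / delta)


F = [0.0, 0.5, 2.0]
delta = 0.2
psi = lipschitz_bump(F, delta)
grid = [i / 1000 for i in range(-1000, 3001)]   # sample points in [-1, 3]
print(psi(0.5), psi(0.6), psi(1.0))
```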


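It is easy to see that the topologies (2.1.3) and (2.1.4) are equivalent.
Indeed, let $\mathcal F = \{F_1, \dots, F_N\}$ be any finite family of closed subsets and let
$\mathcal A = \{A_1, \dots, A_N\}$, where each $A_i$ is the complement of $F_i$. Clearly, since $\nu(A_i) = 1 - \nu(F_i)$ and $\mu(A_i) = 1 - \mu(F_i)$,
$$V_f(\mu, \mathcal F, \varepsilon) = \{\nu \in \mathcal M_1(M) : \nu(F_i) < \mu(F_i) + \varepsilon \text{ for every } i\} = \{\nu \in \mathcal M_1(M) : \nu(A_i) > \mu(A_i) - \varepsilon \text{ for every } i\} = V_a(\mu, \mathcal A, \varepsilon).$$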
Next, let us show that the topology (2.1.5) is weaker than these equivalent
topologies (2.1.3) and (2.1.4). Given any finite family B = {B1 , . . . , BN } of
continuity sets of μ, let Fi be the closure and Ai be the interior of each Bi .
Denote F = {F1 , . . . , FN } and A = {A1 , . . . , AN }. Since μ(Fi ) = μ(Bi ) = μ(Ai ),
ν(Fi ) < μ(Fi ) + ε ⇒ ν(Bi ) < μ(Bi ) + ε
ν(Ai ) > μ(Ai ) − ε ⇒ ν(Bi ) > μ(Bi ) − ε
for every i. This means that Vf (μ, F, ε)∩Va (μ, A, ε) is contained in Vc (μ, B, ε).
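Finally, let us prove that the topology (2.1.1) is weaker than the topology
(2.1.5). Let $\Phi = \{\varphi_1, \dots, \varphi_N\}$ be a finite family of bounded continuous
functions. Fix an integer number ℓ such that $\sup |\varphi_i(x)| < \ell$ for every i.
For each i, the pre-images $\varphi_i^{-1}(s)$, $s \in [-\ell, \ell]$ are pairwise disjoint. Hence,
$\mu(\varphi_i^{-1}(s)) = 0$ except for a countable set of values of s. In particular, we
may choose $k \in \mathbb N$ and points $-\ell = t_0 < t_1 < \cdots < t_{k-1} < t_k = \ell$ such that
$t_j - t_{j-1} < \varepsilon/2$ and $\mu(\varphi_i^{-1}(t_j)) = 0$ for every i and j. Then, each
$$B_{i,j} = \varphi_i^{-1}((t_{j-1}, t_j])$$
is a continuity set of μ, because its boundary is contained in $\varphi_i^{-1}(\{t_{j-1}, t_j\})$. Moreover, since the sets $B_{i,1}, \dots, B_{i,k}$ partition M,
$$\sum_{j=1}^k t_j \, \mu(B_{i,j}) \ge \int \varphi_i \, d\mu \ge \sum_{j=1}^k t_{j-1} \, \mu(B_{i,j}) > \sum_{j=1}^k t_j \, \mu(B_{i,j}) - \varepsilon/2,$$
and we also have similar inequalities for the integrals relative to ν. It follows
that
$$\Big|\int \varphi_i \, d\mu - \int \varphi_i \, d\nu\Big| \le \sum_{j=1}^k \ell \, |\mu(B_{i,j}) - \nu(B_{i,j})| + \varepsilon/2 \qquad (2.1.6)$$
for every i. Denote $\mathcal B = \{B_{i,j} : i = 1, \dots, N \text{ and } j = 1, \dots, k\}$. Then the relation
(2.1.6) implies that $V_c(\mu, \mathcal B, \varepsilon/(2k\ell))$ is contained in $V(\mu, \Phi, \varepsilon)$.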
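The discretization step can be checked on finitely supported measures. The sketch below (all names and the two sample measures are our own choices) computes the masses of the level sets $B_j = \varphi^{-1}((t_{j-1}, t_j])$ for an evenly spaced grid with mesh $h < \varepsilon/2$, and verifies the sandwich $\sum_j t_{j-1}\mu(B_j) \le \int \varphi\,d\mu \le \sum_j t_j \mu(B_j)$ together with the estimate (2.1.6). Unlike the proof, it does not choose the $t_j$ away from the atoms of the level sets: the inequalities do not need this, which only matters for the $B_j$ to be continuity sets.

```python
import math


def level_set_masses(phi, measure, ell, k):
    """Masses of B_j = phi^{-1}((t_{j-1}, t_j]), j = 1..k, where t_j = -ell + j*h
    and h = 2*ell/k; `measure` is a list of (point, weight) pairs, |phi| < ell."""
    h = 2.0 * ell / k
    masses = [0.0] * (k + 1)  # masses[0] is unused
    for x, w in measure:
        j = math.ceil((phi(x) + ell) / h)  # the j with t_{j-1} < phi(x) <= t_j
        masses[min(k, max(1, j))] += w
    return masses, h


def integral(phi, measure):
    return sum(w * phi(x) for x, w in measure)


def phi(x):
    return math.sin(3.0 * x)  # sup |phi| <= 1 < ell


ell, eps = 2, 0.1
k = math.floor(4 * ell / eps) + 1  # guarantees h = 2*ell/k < eps/2
mu = [(0.1 * i, 1 / 20) for i in range(20)]
nu = [(0.1 * i + 0.003, 1 / 20) for i in range(20)]

m_mu, h = level_set_masses(phi, mu, ell, k)
m_nu, _ = level_set_masses(phi, nu, ell, k)
t = [-ell + j * h for j in range(k + 1)]
lower = sum(t[j - 1] * m_mu[j] for j in range(1, k + 1))
upper = sum(t[j] * m_mu[j] for j in range(1, k + 1))
bound = ell * sum(abs(m_mu[j] - m_nu[j]) for j in range(1, k + 1)) + eps / 2
print(lower, integral(phi, mu), upper)
print(abs(integral(phi, mu) - integral(phi, nu)), "<=", bound)
```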

2.1.3 The weak∗ topology is metrizable


Now assume that the metric space M is separable. We will see in Exercise 2.1.3
that the weak∗ topology on M1 (M) is separable. Here we show that it is also
metrizable: we exhibit a distance function on M1 (M) that induces the weak∗
topology.
Given $\mu, \nu \in \mathcal M_1(M)$, define $D(\mu, \nu)$ to be the infimum of all numbers δ > 0
such that
$$\mu(B) < \nu(B^\delta) + \delta \quad \text{and} \quad \nu(B) < \mu(B^\delta) + \delta \qquad (2.1.7)$$
for every Borel set $B \subset M$.
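For measures with finite support the infimum in (2.1.7) can be approximated by brute force. It suffices to test sets B contained in the support of the measure on the left-hand side, because μ(B) only depends on B ∩ supp μ while shrinking B can only shrink $B^\delta$; moreover, if δ satisfies (2.1.7) then so does every larger δ, so bisection applies. The sketch below (hypothetical function names; exponential in the support size, so only for tiny examples) recovers, for instance, $D(\delta_0, \delta_x) = \min\{|x|, 1\}$ on the real line.

```python
from itertools import combinations


def _holds_one_way(mu, nu, delta, dist):
    """True if mu(B) < nu(B^delta) + delta for every non-empty B in supp(mu).
    Measures are dicts point -> weight; B^delta = {y : dist(y, B) < delta}."""
    points = list(mu)
    for r in range(1, len(points) + 1):
        for B in combinations(points, r):
            mass_B = sum(mu[b] for b in B)
            mass_nbhd = sum(w for y, w in nu.items()
                            if any(dist(y, b) < delta for b in B))
            if mass_B >= mass_nbhd + delta:
                return False
    return True


def levy_prokhorov(mu, nu, dist, tol=1e-9):
    """Approximate D(mu, nu) from (2.1.7) by bisection on delta."""
    def admissible(delta):
        return (_holds_one_way(mu, nu, delta, dist)
                and _holds_one_way(nu, mu, delta, dist))

    lo, hi = 0.0, 1.0 + tol  # every delta > 1 is admissible for probabilities
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if admissible(mid):
            hi = mid
        else:
            lo = mid
    return hi


def real_line(a, b):
    return abs(a - b)


print(levy_prokhorov({0.0: 1.0}, {0.3: 1.0}, real_line))
print(levy_prokhorov({0.0: 1.0}, {5.0: 1.0}, real_line))
```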
Lemma 2.1.3. The function D is a distance on M1 (M).

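Proof. Let us start by showing that D(μ, ν) = 0 implies μ = ν. Indeed, in that case (2.1.7) holds for every δ > 0, and letting δ → 0 (recall that $\bigcap_\delta B^\delta = \bar B$) we get
$$\mu(B) \le \nu(\bar B) \quad \text{and} \quad \nu(B) \le \mu(\bar B)$$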
for every Borel set B ⊂ M, where B̄ denotes the closure of B. When B is closed,
these inequalities mean that μ(B) = ν(B). As we have seen previously, any two
measures that coincide on the closed subsets are necessarily the same.
We leave it to the reader to check all the other conditions in the definition of
a distance (Exercise 2.1.5).

This distance D is called the Levy–Prohorov metric on M1 (M). In what
follows we denote by $B_D(\mu, r)$ the ball of radius r > 0 around any $\mu \in \mathcal M_1(M)$.
Proposition 2.1.4. If M is a separable metric space then the topology induced
by the Levy–Prohorov distance D coincides with the weak∗ topology on
M1 (M).

Proof. Let ε > 0 and $\mathcal F = \{F_1, \dots, F_N\}$ be a finite family of closed subsets of
M. Fix δ ∈ (0, ε/2) such that $\mu(F_i^\delta) < \mu(F_i) + \varepsilon/2$ for every i. If $\nu \in B_D(\mu, \delta)$
then
$$\nu(F_i) < \mu(F_i^\delta) + \delta < \mu(F_i) + \varepsilon \quad \text{for every } i,$$
which means that ν ∈ Vf (μ, F, ε). This shows that the topology induced by
the distance D is stronger than the topology (2.1.3) which, as we have seen, is
equivalent to the weak∗ topology.
We are left to prove that if M is separable then the weak∗ topology is stronger
than the topology induced by D. For that, let $\{p_1, p_2, \dots\}$ be any countable
dense subset of M. Given ε > 0, let us fix δ ∈ (0, ε/3). For each j, the spheres
$\{x : d(x, p_j) = r\}$, r > 0, are pairwise disjoint, and each of them contains the boundary $\partial B(p_j, r)$. So, we may find r > 0
arbitrarily small such that $\mu(\partial B(p_j, r)) = 0$ for every j. Fix any such r, with
r ∈ (0, δ/3). The family $\{B(p_j, r) : j = 1, 2, \dots\}$ is a countable cover of M by
continuity sets of μ. Fix k ≥ 1 such that the set $U = \bigcup_{j=1}^k B(p_j, r)$ satisfies
$$\mu(U) > 1 - \delta. \qquad (2.1.8)$$

Figure 2.1. Partition defined by a finite cover

Next, let us consider the (finite) partition $\mathcal P$ of U defined by the family of balls
$\{B(p_j, r) : j = 1, \dots, k\}$.
That is, the elements of $\mathcal P$ are the maximal sets $P \subset U$ such that, for each j,
either P is contained in $B(p_j, r)$ or P is disjoint from $B(p_j, r)$. See Figure 2.1.
Now let $\mathcal E$ be the family of all finite unions of elements of $\mathcal P$. Note that the
boundary of every element of $\mathcal E$ has measure zero, since it is contained in the
union of the boundaries of the balls $B(p_j, r)$, $1 \le j \le k$. That is, every element
of $\mathcal E$ is a continuity set of μ.
If $\nu \in V_c(\mu, \mathcal E, \delta)$ then
$$|\mu(E) - \nu(E)| < \delta \quad \text{for every } E \in \mathcal E. \qquad (2.1.9)$$
In particular, (2.1.8) together with (2.1.9) imply that
$$\nu(U) > 1 - 2\delta. \qquad (2.1.10)$$
Now, given any Borel subset B, denote by $E_B$ the union of all the elements of
$\mathcal P$ that intersect B. Then $E_B \in \mathcal E$ and so the relation (2.1.9) yields
$|\mu(E_B) - \nu(E_B)| < \delta$.
Observe that B is contained in $E_B \cup U^c$. Moreover, $E_B \subset B^\delta$ because any two points in the same
element of $\mathcal P$ are at distance less than $2r < \delta$. These facts, together with (2.1.8)
and (2.1.10), imply that
$$\mu(B) \le \mu(E_B) + \delta < \nu(E_B) + 2\delta \le \nu(B^\delta) + 2\delta,$$
$$\nu(B) \le \nu(E_B) + 2\delta < \mu(E_B) + 3\delta \le \mu(B^\delta) + 3\delta.$$
Since 3δ < ε, these relations imply that $\nu \in B_D(\mu, \varepsilon)$.

One can show that if M is a complete separable metric space then
the Levy–Prohorov metric on M1 (M) is complete (and separable, by
Exercise 2.1.3). See, for example, Theorem 6.8 in Billingsley [Bil68].

2.1.4 The weak∗ topology is compact


In this section we take the metric space M to be compact. We are going to
prove the following fundamental result:

Theorem 2.1.5. The space M1 (M) is compact for the weak∗ topology.

Since we already know that M1 (M) is metrizable, it suffices to prove:

Proposition 2.1.6. Every sequence (μk )k∈N in M1 (M) has some subsequence
that converges in the weak∗ topology.

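Proof. Let $\{\varphi_n : n \in \mathbb N\}$ be a countable dense subset of the unit ball of
$C^0(M)$ (recall Theorem A.3.13). For each $n \in \mathbb N$, the sequence of real numbers
$\int \varphi_n \, d\mu_k$, $k \in \mathbb N$ is bounded by 1. Hence, for each $n \in \mathbb N$ there exists a sequence
$(k_j^n)_{j \in \mathbb N}$ such that
$$\int \varphi_n \, d\mu_{k_j^n} \text{ converges to some number } \Phi_n \in \mathbb R \text{ when } j \to \infty.$$
Moreover, each sequence $(k_j^{n+1})_{j \in \mathbb N}$ may be chosen to be a subsequence of the
previous $(k_j^n)_{j \in \mathbb N}$. Define $\ell_j = k_j^j$ for each $j \in \mathbb N$. By construction, $(\ell_j)_{j \in \mathbb N}$ is a
subsequence of every $(k_j^n)_{j \in \mathbb N}$, up to finitely many terms. Hence,
$$\lim_j \int \varphi_n \, d\mu_{\ell_j} = \Phi_n \quad \text{for every } n \in \mathbb N.$$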
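One can easily deduce that
$$\Phi(\phi) = \lim_j \int \phi \, d\mu_{\ell_j} \qquad (2.1.11)$$
exists, for every function $\phi \in C^0(M)$. Indeed, suppose first that ϕ is in the unit
ball of $C^0(M)$. Given any ε > 0, we may find $n \in \mathbb N$ such that $\|\phi - \varphi_n\| \le \varepsilon$.
Then,
$$\Big|\int \phi \, d\mu_{\ell_j} - \int \varphi_n \, d\mu_{\ell_j}\Big| \le \varepsilon$$
for every j. Since $\int \varphi_n \, d\mu_{\ell_j}$ converges (to $\Phi_n$), it follows that
$$\limsup_j \int \phi \, d\mu_{\ell_j} - \liminf_j \int \phi \, d\mu_{\ell_j} \le 2\varepsilon.$$
Since ε is arbitrary, we find that $\lim_j \int \phi \, d\mu_{\ell_j}$ exists. This proves (2.1.11) when
the function is in the unit ball. The general case reduces immediately to this
one, just replacing ϕ with $\phi / \|\phi\|$ (when $\phi \ne 0$). In this way, we have completed the proof of
(2.1.11).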
Finally, it is clear that the operator $\Phi : C^0(M) \to \mathbb R$ defined by (2.1.11) is
linear and positive: $\Phi(\phi) \ge \min \phi \ge 0$ whenever $\phi \ge 0$ at all points. Moreover,
$\Phi(1) = 1$. Thus, by Theorem A.3.11, there exists some Borel probability
measure μ on M such that $\Phi(\phi) = \int \phi \, d\mu$ for every continuous function ϕ.
Now, the equality in (2.1.11) may be rewritten as
$$\int \phi \, d\mu = \lim_j \int \phi \, d\mu_{\ell_j} \quad \text{for every } \phi \in C^0(M).$$
According to Lemma 2.1.1, this means that the subsequence $(\mu_{\ell_j})_{j \in \mathbb N}$ converges
to μ in the weak∗ topology.
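The bookkeeping in the diagonal argument above can be made concrete. In the toy sketch below, the nested subsequences are $k_j^n = 2^n j$ (each one a subsequence of the previous one), chosen only for illustration; in the proof they come from the compactness of bounded sets of real numbers. The diagonal $\ell_j = k_j^j$ is strictly increasing and its tail $(\ell_j)_{j \ge n}$ lies inside $(k_j^n)_j$, while earlier terms may not: this is what "up to finitely many terms" means.

```python
def k(n, j):
    """Toy nested subsequences: k^n_j = 2**n * j (j >= 1); k^{n+1} is a
    subsequence of k^n because multiples of 2**(n+1) are multiples of 2**n."""
    return 2**n * j


def diagonal(j):
    """The diagonal sequence l_j = k^j_j."""
    return k(j, j)


def in_subsequence(value, n):
    """True if `value` is a term of (k^n_j)_{j >= 1}."""
    return value >= 2**n and value % 2**n == 0


ells = [diagonal(j) for j in range(1, 16)]
print(ells[:5])
```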

As we observed previously, Theorem 2.1.5 is an immediate consequence of
the proposition we have just proved.

2.1.5 Theorem of Prohorov


The theorem that we are going to state in this section provides a very general
criterion for a family of probability measures to be compact. Indeed, the
class of metric spaces M to which it applies includes virtually all interesting
examples.

Definition 2.1.7. A set $\mathcal M$ of Borel measures on a topological space M is tight if
for every ε > 0 there exists a compact set $K \subset M$ such that $\mu(K^c) < \varepsilon$ for every
measure $\mu \in \mathcal M$.

Note that when $\mathcal M$ consists of a single measure this definition corresponds
exactly to Definition A.3.6. Clearly, tightness is a hereditary property: if a set
is tight then all its subsets are also tight. Note also that if M is a compact metric
space then the space $\mathcal M_1(M)$ of all probability measures is a tight set (take K = M). So, the
next result is an extension of Theorem 2.1.5:

Theorem 2.1.8 (Prohorov). Let M be a complete separable metric space. A
set $\mathcal K \subset \mathcal M_1(M)$ is tight if and only if every sequence in $\mathcal K$ admits some
subsequence that is convergent in the weak∗ topology of $\mathcal M_1(M)$.

Proof. We only prove that tightness implies the existence of convergent subsequences, which is the most useful part
of the statement. Then, in Exercise 2.1.8, we invite the reader to prove the
converse.
Suppose that $\mathcal K$ is tight. Consider an increasing sequence $(K_l)_l$ of compact
subsets of M such that $\eta(K_l^c) \le 1/l$ for every l and every $\eta \in \mathcal K$. Fix any
sequence $(\mu_n)_n$ in $\mathcal K$. To begin with, we claim that for every l there exists a
subsequence $(n_j)_j$ and there exists a measure $\nu_l$ on M such that $\nu_l(K_l^c) = 0$ and
$(\mu_{n_j} \mid K_l)_j$ converges to $\nu_l$, in the sense that
$$\int_{K_l} \psi \, d\mu_{n_j} \to \int_{K_l} \psi \, d\nu_l \quad \text{for every continuous function } \psi : K_l \to \mathbb R. \qquad (2.1.12)$$

Indeed, that is a simple consequence of Theorem 2.1.5: up to restricting to a
subsequence, we may suppose that the limit $b_l = \lim_n \mu_n(K_l)$ exists (note that
$1 \ge b_l \ge 1 - 1/l$); it follows from the theorem, applied to the compact metric space $K_l$, that the sequence of normalized
restrictions
$$\big((\mu_n \mid K_l)/\mu_n(K_l)\big)_n$$
admits a subsequence converging to some probability measure $\eta_l \in \mathcal M_1(K_l)$; to
conclude the proof of the claim it suffices to regard $\eta_l$ as a probability measure
on M with $\eta_l(K_l^c) = 0$ and to choose $\nu_l = b_l \eta_l$.
Next, using a diagonal argument analogous to the one in Proposition 2.1.6,
we may choose a subsequence $(n_j)_j$ in such a way that (2.1.12) holds, simultaneously, for every l ≥ 1. Observe that the sequence $(\nu_l)_l$ is non-decreasing:
given k > l and any continuous function $\varphi : M \to [0, 1]$,
$$\int \varphi \, d\nu_l = \lim_j \int_{K_l} \varphi \, d\mu_{n_j} \le \lim_j \int_{K_k} \varphi \, d\mu_{n_j} = \int \varphi \, d\nu_k.$$
Analogously, for any k > l and any continuous function $\varphi : M \to [0, 1]$,
$$\int \varphi \, d\nu_k - \int \varphi \, d\nu_l = \lim_j \int_{K_k \setminus K_l} \varphi \, d\mu_{n_j} \le \limsup_j \mu_{n_j}(K_l^c) \le 1/l.$$
Using Exercise A.3.5, we may translate this in terms of measures of sets (rather
than integrals of functions): for every k > l and every Borel set $E \subset M$,
$$\nu_l(E) \le \nu_k(E) \le \nu_l(E) + 1/l. \qquad (2.1.13)$$

Define $\nu(E) = \lim_l \nu_l(E)$ for each Borel set E (the limit exists by (2.1.13)). We claim that ν is a probability
measure on M. It is immediate from the definition that $\nu(\emptyset) = 0$ and that
ν is additive. Furthermore, $\nu(M) = \lim_l \nu_l(M) = \lim_l b_l = 1$. To show that ν
is countably additive (σ-additive), we use the criterion of continuity at the
empty set (Theorem A.1.14). Consider any decreasing sequence $(B_n)_n$ of Borel
subsets of M with $\bigcap_n B_n = \emptyset$. Given ε > 0, choose l such that 1/l < ε. Since
$\nu_l$ is countably additive, Theorem A.1.14 shows that $\nu_l(B_n) < \varepsilon$ for every n
sufficiently large. Hence, $\nu(B_n) \le \nu_l(B_n) + 1/l < 2\varepsilon$ for every n sufficiently
large. This proves that $(\nu(B_n))_n$ converges to zero and, by Theorem A.1.14, it
follows that ν is indeed countably additive.
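The definition of ν implies (see Exercise 2.1.1 or Exercise 2.1.4) that $(\nu_l)_l$
converges to ν in the weak∗ topology. So, given ε > 0 and any bounded
continuous function $\phi : M \to \mathbb R$, we have that $|\int \phi \, d\nu_l - \int \phi \, d\nu| < \varepsilon$ for every
l sufficiently large. Fix l such that, in addition, $\sup |\phi| / l < \varepsilon$. Then,
$$\Big|\int \phi \, d\mu_{n_j} - \int \phi \, d\nu_l\Big| \le \Big|\int_{K_l^c} \phi \, d\mu_{n_j}\Big| + \Big|\int_{K_l} \phi \, d\mu_{n_j} - \int_{K_l} \phi \, d\nu_l\Big| \le 2\varepsilon$$
for every j sufficiently large. This shows that $|\int \phi \, d\mu_{n_j} - \int \phi \, d\nu| < 3\varepsilon$ whenever
j is large enough. Thus, $(\mu_{n_j})_j$ converges to ν in the weak∗ topology.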
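To make Definition 2.1.7 and Theorem 2.1.8 concrete, here is a standard pair of examples on $M = \mathbb R$ (our own choice, not taken from the text). The Gaussian laws $N(0, \sigma^2)$ with $0 < \sigma \le 1$ form a tight family: the mass outside $K = [-L, L]$ is largest for σ = 1, so a single L serves all of them. The laws $N(m, 1)$, $m \in \mathbb N$, are not tight: for any L, the law with $m = L + 1$ gives more than 1/2 of its mass to the complement of $[-L, L]$, and, consistently with Theorem 2.1.8, this sequence has no weak∗ convergent subsequence because its mass escapes to infinity. The sketch computes these tails with `math.erfc`.

```python
import math


def gaussian_tail(L, m, sigma):
    """Mass of N(m, sigma^2) outside the compact interval [-L, L]."""
    s = sigma * math.sqrt(2.0)
    upper = 0.5 * math.erfc((L - m) / s)   # P(X > L)
    lower = 0.5 * math.erfc((L + m) / s)   # P(X < -L)
    return upper + lower


def radius_for(eps, m=0.0, sigma=1.0):
    """Approximately the smallest L with gaussian_tail(L, m, sigma) < eps
    (the tail is decreasing in L, so bisection applies)."""
    hi = 1.0
    while gaussian_tail(hi, m, sigma) >= eps:
        hi *= 2.0
    lo = 0.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if gaussian_tail(mid, m, sigma) < eps:
            hi = mid
        else:
            lo = mid
    return hi


L = radius_for(0.01)  # sigma = 1 is the worst case when m = 0
print(L, [gaussian_tail(L, 0.0, s) for s in (0.1, 0.5, 1.0)])
print([gaussian_tail(R, R + 1.0, 1.0) for R in (1.0, 10.0, 100.0)])
```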

2.1.6 Exercises
2.1.1. Let M be a metric space and (μn )n be a sequence in M1 (M). Show that the
following conditions are all equivalent:
1. (μn )n converges to a probability measure μ in the weak∗ topology.
2. lim supn μn (F) ≤ μ(F) for every closed set F ⊂ M.
3. lim infn μn (A) ≥ μ(A) for every open set A ⊂ M.
4. limn μn (B) = μ(B) for every continuity set B of μ.
5. $\lim_n \int \psi \, d\mu_n = \int \psi \, d\mu$ for every Lipschitz function $\psi : M \to \mathbb R$.
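2.1.2. Fix any dense subset $\mathcal F$ of the unit ball of $C^0(M)$. Show that a sequence $(\mu_n)_{n \in \mathbb N}$
of probability measures on M converges to some $\mu \in \mathcal M_1(M)$ in the weak∗
topology if and only if
$$\int \varphi \, d\mu_n \text{ converges to } \int \varphi \, d\mu, \text{ for every } \varphi \in \mathcal F.$$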

2.1.3. Show that the subset formed by the measures with finite support is dense in
M1 (M), relative to the weak∗ topology. Assuming that the metric space M is
separable, conclude that M1 (M) is also separable.
2.1.4. The uniform topology in $\mathcal M_1(M)$ is defined by the basis of neighborhoods
$$V_u(\mu, \varepsilon) = \{\nu \in \mathcal M_1(M) : |\mu(B) - \nu(B)| < \varepsilon \text{ for every measurable set } B\},$$
and the pointwise topology is defined by the basis of neighborhoods
$$V_p(\mu, \mathcal B, \varepsilon) = \{\nu \in \mathcal M_1(M) : |\mu(B_i) - \nu(B_i)| < \varepsilon \text{ for every } i\},$$
where ε > 0, N ≥ 1 and $\mathcal B = \{B_1, \dots, B_N\}$ is a finite family of measurable sets.
Check that the uniform topology is stronger than the pointwise topology and the
latter is stronger than the weak∗ topology. Show, by means of examples, that
these relations may be strict.
2.1.5. Complete the proof of Lemma 2.1.3.
2.1.6. Let Vk , k = 1, 2, . . . be random variables with real values, that is, measurable
functions Vk : (X, B, μ) → R defined in some probability space (X, B, μ). The
distribution function of Vk is the monotone function Fk : R → [0, 1] defined by
Fk (a) = μ({x ∈ X : Vk (x) ≤ a}). We say that (Vk )k converges in distribution to
some random variable V if limk Fk (a) = F(a) for every continuity point a of the
distribution function F of the random variable V. What does this have to do with
the weak∗ topology?
2.1.7. Let (μn )n∈N be a sequence of probability measures converging to some μ in the
weak∗ topology. Let B be a continuity set of μ with μ(B) > 0. Prove that the
normalized restrictions (μn | B)/μn (B) converge to the normalized restriction
(μ | B)/μ(B) when n → ∞. What can be said if we replace continuity sets by
closed sets or by open sets?
2.1.8. (Converse to the theorem of Prohorov) Prove that if K ⊂ M1 (M) is such that
every sequence in K admits some convergent subsequence then K is tight.

2.2 Proof of the existence theorem


Given any f : M → M and any measure η on M, we denote by $f_*\eta$, and call the
iterate (or image) of η under f, the measure defined by
$$f_*\eta(B) = \eta(f^{-1}(B))$$
for each measurable set $B \subset M$. Note that the measure η is invariant under f if
and only if $f_*\eta = \eta$.

Lemma 2.2.1. Let η be a measure and φ be a bounded measurable function.
Then
$$\int \varphi \, d f_*\eta = \int \varphi \circ f \, d\eta. \qquad (2.2.1)$$

Proof. If φ is the characteristic function of a measurable set B then the relation
(2.2.1) means that $f_*\eta(B) = \eta(f^{-1}(B))$, which holds by definition. By linearity
of the integral, it follows that (2.2.1) holds whenever φ is a simple function.
Finally, since every bounded measurable function can be approximated by
simple functions (see Proposition A.1.33), it follows that the claim in the
lemma is true in general.
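For a finitely supported η, Lemma 2.2.1 reduces to moving each atom x, with its weight, to f(x). The exact-arithmetic sketch below (hypothetical helper names; the doubling map $f(x) = 2x \bmod 1$, the measure and the test function are arbitrary choices) checks (2.2.1) with rational numbers, so that the two sides can be compared exactly.

```python
from fractions import Fraction


def push_forward(f, eta):
    """f_* eta for eta = {point: weight}: (f_* eta)(B) = eta(f^{-1}(B)), so the
    weight of each atom x is transported to f(x)."""
    image = {}
    for x, w in eta.items():
        image[f(x)] = image.get(f(x), 0) + w
    return image


def integrate(phi, measure):
    return sum(w * phi(x) for x, w in measure.items())


def doubling(x):
    return (2 * x) % 1


def phi(x):
    return x * x


eta = {Fraction(0): Fraction(1, 4), Fraction(1, 3): Fraction(1, 4),
       Fraction(2, 3): Fraction(1, 2)}
lhs = integrate(phi, push_forward(doubling, eta))   # integral of phi w.r.t. f_* eta
rhs = integrate(lambda x: phi(doubling(x)), eta)    # integral of phi o f w.r.t. eta
print(lhs, rhs)
```

Both sides evaluate to 1/6: the atoms at 1/3 and 2/3 swap places under the doubling map, and the atom at 0 is fixed.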

Proposition 2.2.2. If f : M → M is continuous then $f_* : \mathcal M_1(M) \to \mathcal M_1(M)$ is
continuous relative to the weak∗ topology.

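Proof. Let ε > 0 and $\Phi = \{\varphi_1, \dots, \varphi_N\}$ be any family of bounded continuous
functions. Since f is continuous, the family $\Psi = \{\varphi_1 \circ f, \dots, \varphi_N \circ f\}$ also consists
of bounded continuous functions. By the previous lemma,
$$\Big|\int \varphi_i \, d(f_*\mu) - \int \varphi_i \, d(f_*\nu)\Big| = \Big|\int (\varphi_i \circ f) \, d\mu - \int (\varphi_i \circ f) \, d\nu\Big|$$
and so the left-hand side is smaller than ε if the right-hand side is smaller than
ε. That means that
$$f_*\big(V(\mu, \Psi, \varepsilon)\big) \subset V(f_*\mu, \Phi, \varepsilon) \quad \text{for every } \mu, \Phi \text{ and } \varepsilon,$$
and this last fact shows that $f_*$ is continuous.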

At this point, Theorem 2.1 could be deduced from the classical Schauder–Tychonoff fixed point theorem for continuous operators in locally convex topological vector
spaces. A topological vector space is a vector space V endowed with a
topology relative to which both operations of V (addition and multiplication
by a scalar) are continuous; it is locally convex if every point has a basis of convex neighborhoods. A set K ⊂ V is said to be convex if (1 − t)x + ty ∈ K
for every x, y ∈ K and every t ∈ [0, 1].

Theorem 2.2.3 (Schauder–Tychonoff). Let F : V → V be a continuous
transformation on a locally convex topological vector space V. Suppose that there exists a
non-empty compact convex set K ⊂ V such that F(K) ⊂ K. Then F(v) = v for some v ∈ K.

Theorem 2.1 corresponds to the special case when V = M(M) is the space
of complex measures, K = M1 (M) is the space of probability measures on M
and F = f∗ is the action of f on M(M).
However, the situation of Theorem 2.1 is much simpler than the general
case of the Schauder–Tychonoff theorem because the operator $f_*$ is not only
continuous but also linear. This allows for a direct and elementary proof of
Theorem 2.1 that also provides some additional information about the invariant
measure.
To that end, let ν be any probability measure on M: for example, ν could be
the Dirac mass at any point. Form the sequence of probability measures
$$\mu_n = \frac{1}{n}\sum_{j=0}^{n-1} f_*^j\nu, \tag{2.2.2}$$
where f∗^j ν is the image of ν under the iterate f^j. By Theorem 2.1.5, this
sequence has some accumulation point, that is, there exists some subsequence
(nk)k∈N and some probability measure μ ∈ M1(M) such that
$$\frac{1}{n_k}\sum_{j=0}^{n_k-1} f_*^j\nu \to \mu \tag{2.2.3}$$
in the weak∗ topology. Now we only need to prove:

Lemma 2.2.4. Every accumulation point of a sequence (μn )n∈N of the form
(2.2.2) is a probability measure invariant under f .

Proof. By Lemma 2.2.1, ∫φ d(f∗^j ν) = ∫(φ ∘ f^j) dν for every j. So the relation
(2.2.3) asserts that, given any family Φ = {φ1, . . . , φN} of bounded continuous
functions and given any ε > 0, we have
$$\Big|\frac{1}{n_k}\sum_{j=0}^{n_k-1}\int(\varphi_i\circ f^j)\,d\nu - \int\varphi_i\,d\mu\Big| < \varepsilon/2 \tag{2.2.4}$$
for every i and every k sufficiently large. By Proposition 2.2.2,
$$f_*\mu = f_*\Big(\lim_k \frac{1}{n_k}\sum_{j=0}^{n_k-1} f_*^j\nu\Big) = \lim_k \frac{1}{n_k}\sum_{j=1}^{n_k} f_*^j\nu. \tag{2.2.5}$$
Now observe that
$$\Big|\frac{1}{n_k}\sum_{j=0}^{n_k-1}\int(\varphi_i\circ f^j)\,d\nu - \frac{1}{n_k}\sum_{j=1}^{n_k}\int(\varphi_i\circ f^j)\,d\nu\Big| = \frac{1}{n_k}\Big|\int\varphi_i\,d\nu - \int(\varphi_i\circ f^{n_k})\,d\nu\Big| \le \frac{2}{n_k}\sup|\varphi_i|,$$
and the latter expression is smaller than ε/2 for every i and every k sufficiently
large. Combining this fact with (2.2.4), we conclude that
$$\Big|\frac{1}{n_k}\sum_{j=1}^{n_k}\int(\varphi_i\circ f^j)\,d\nu - \int\varphi_i\,d\mu\Big| < \varepsilon \tag{2.2.6}$$
for every i and every k sufficiently large. This means that
$$\frac{1}{n_k}\sum_{j=1}^{n_k} f_*^j\nu \to \mu$$
when k → ∞. However, (2.2.5) means that this sequence converges to f∗μ. By
uniqueness of the limit (the weak∗ topology is Hausdorff), it follows that f∗μ = μ.
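The heart of the proof is the telescoping estimate. For ν = δx the averages (2.2.2) are μn = (1/n) Σ_{j<n} δ_{f^j(x)}, and ∫φ d(f∗μn) − ∫φ dμn = (φ(f^n(x)) − φ(x))/n, whose absolute value is at most 2 sup|φ|/n. The sketch below checks this numerically; the rotation f(x) = x + α mod 1 with α = √2 − 1, the test function φ(x) = cos 2πx and the starting point are illustrative assumptions, not part of the text.

```python
import math

ALPHA = math.sqrt(2) - 1  # illustrative irrational rotation number

def f(x):
    """Rotation of the circle R/Z by ALPHA."""
    return (x + ALPHA) % 1.0

def phi(x):
    """A continuous test function on the circle with sup |phi| = 1."""
    return math.cos(2 * math.pi * x)

def orbit(x, n):
    """The points x, f(x), ..., f^{n-1}(x)."""
    points = []
    for _ in range(n):
        points.append(x)
        x = f(x)
    return points

def invariance_defect(x, n):
    """|∫ phi d(f_* mu_n) - ∫ phi dmu_n| for mu_n = (1/n) sum_{j<n} delta_{f^j(x)}."""
    points = orbit(x, n)
    mean_phi = sum(phi(p) for p in points) / n
    mean_phi_f = sum(phi(f(p)) for p in points) / n   # ∫ phi∘f dmu_n = ∫ phi d(f_* mu_n)
    return abs(mean_phi_f - mean_phi)

for n in (10, 100, 1000):
    print(n, invariance_defect(0.3, n), 2 / n)
```

The defect decays like 1/n for any map; continuity of f is only needed to carry this almost-invariance over to the limit measure, through Proposition 2.2.2.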

Now the proof of Theorem 2.1 is complete. The examples that follow show
that neither of the two hypotheses in the theorem, continuity and compactness,
may be omitted.

Example 2.2.5. Consider f : (0, 1] → (0, 1] given by f (x) = x/2. Suppose that
f admits some invariant probability measure: we are going to show that this is
actually not true. By the recurrence theorem (Theorem 1.2.4), relative to that
probability measure almost every point of (0, 1] is recurrent. However, it is
clear that there are no recurrent points: the orbit of every x ∈ (0, 1] converges
to zero and, in particular, does not accumulate on the initial point x. Hence, f
is an example of a continuous transformation (on a non-compact space) that
does not have any invariant probability measure.
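The absence of recurrent points in Example 2.2.5 can also be seen directly: |f^n(x) − x| = x(1 − 2^{−n}) ≥ x/2 for every n ≥ 1, so the orbit never comes back within distance x/2 of its starting point. A small numerical check, where the helper name `closest_return` and the sample points are illustrative choices:

```python
def f(x):
    """The map f(x) = x/2 of Example 2.2.5."""
    return x / 2

def closest_return(x, n_max):
    """Smallest distance |f^n(x) - x| over 1 <= n <= n_max."""
    best, y = float("inf"), x
    for _ in range(n_max):
        y = f(y)
        best = min(best, abs(y - x))
    return best

for x in (1.0, 0.3, 1e-6):
    print(x, closest_return(x, 200), x / 2)   # the orbit never returns closer than x/2
```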

Example 2.2.6. Slightly modifying the previous construction, we see that the
same phenomenon may occur in compact spaces if the transformation is not
continuous. Consider f : [0, 1] → [0, 1] given by f(x) = x/2 if x ≠ 0 and f(0) =
1. For the same reason as before, no point x ∈ (0, 1] is recurrent. So, if there
exists some invariant probability measure μ then it must give full weight to the
sole recurrent point x = 0. In other words, μ must be the Dirac mass supported
at zero, that is, the measure δ0 defined by
δ0(E) = 1 if 0 ∈ E and δ0(E) = 0 if 0 ∉ E.
However, the measure δ0 is not invariant under f : for example, the measurable
set E = {0} has measure 1 and yet its pre-image f −1 (E) is the empty set, which
has measure zero. Thus, this transformation f has no invariant probability
measures.
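Example 2.2.6 also shows where the proof of Lemma 2.2.4 breaks down without continuity. Starting from ν = δ0, the averages μn are carried by the orbit 0, 1, 1/2, 1/4, . . . and converge to δ0; so do their images f∗μn, and yet f∗δ0 = δ1. In other words, f∗ is not weak∗ continuous at δ0, so Proposition 2.2.2 is not available. The sketch below makes this visible, using the test function φ(x) = x as an illustrative assumption.

```python
def f(x):
    """Discontinuous map of Example 2.2.6: f(0) = 1 and f(x) = x/2 for x != 0."""
    return 1.0 if x == 0 else x / 2

def phi(x):
    """The continuous test function phi(x) = x on [0, 1]."""
    return x

def cesaro_integrals(n):
    """Return (∫ phi dmu_n, ∫ phi d(f_* mu_n)) where mu_n = (1/n) sum_{j<n} delta_{f^j(0)}."""
    points, x = [], 0.0
    for _ in range(n):
        points.append(x)
        x = f(x)
    integral = sum(phi(p) for p in points) / n
    integral_image = sum(phi(f(p)) for p in points) / n   # ∫ phi∘f dmu_n = ∫ phi d(f_* mu_n)
    return integral, integral_image

for n in (10, 100, 1000):
    print(n, cesaro_integrals(n))      # both entries tend to phi(0) = 0
print(phi(f(0.0)))                      # ∫ phi d(f_* delta_0) = phi(1) = 1
```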

Our third example is of a different nature. We include it to stress the
limitations of Theorem 2.1 (which are inherent to its great generality): the
measures whose existence is ensured by the theorem may be completely trivial;
for example, in the situation that we are going to describe “almost every point”
just means the point x = 0. For this reason, an important objective in ergodic
theory is to construct more sophisticated invariant measures, with additional
interesting properties such as, for instance, being equivalent to the Lebesgue
measure.

Example 2.2.7. Consider f : [0, 1] → [0, 1] given by f(x) = x/2. This is
a continuous transformation on a compact space. So, by Theorem 2.1, f
admits some invariant probability measure. Using the same arguments as in
the previous example, we find that there exists a unique invariant probability
measure, namely, the Dirac mass δ0 at the origin. Note that in this case the
measure δ0 is indeed invariant.

As an immediate application of Theorem 2.1, we have the following
alternative proof of the Birkhoff recurrence theorem (Theorem 1.2.6). Suppose
that f : M → M is a continuous transformation on a compact metric space.
By Theorem 2.1, there exists some f-invariant probability measure μ. Every
compact metric space admits a countable basis of open sets. So, we may apply
Theorem 1.2.4 to conclude that μ-almost every point is recurrent. In particular,
the set of recurrent points is non-empty, as stated by Theorem 1.2.6.

2.2.1 Exercises
2.2.1. Prove the following generalization of Lemma 2.2.4. Let f : M → M be a
continuous transformation on a compact metric space, ν be a probability measure
on M and (In )n be a sequence of intervals of natural numbers such that #In
converges to infinity when n goes to infinity. Then every accumulation point of
the sequence
$$\mu_n = \frac{1}{\#I_n}\sum_{j\in I_n} f_*^j\nu$$
is an f-invariant probability measure.


2.2.2. Let f1, . . . , fq : M → M be any finite family of commuting continuous
transformations on a compact metric space. Prove that there exists some probability
measure μ that is invariant under fi for every i ∈ {1, . . . , q}. In fact, the conclusion
remains true for any countable family {fj : j ∈ N} of commuting continuous
transformations on a compact metric space.
2.2.3. Let f : [0, 1] → [0, 1] be the decimal expansion transformation. Show that for
every k ≥ 1 there exists some invariant probability measure whose support is
formed by exactly k points (in particular, f admits infinitely many invariant
probability measures). Determine whether there are invariant probability measures μ
such that
(a) the support of μ is infinite countable;
(b) the support of μ is non-countable but has empty interior;
(c) the support of μ has non-empty interior but μ is singular with respect to the
Lebesgue measure m.
2.2.4. Prove the theorem of existence of invariant measures for continuous flows:
every continuous flow (f t )t∈R on a compact metric space admits some invariant
probability measure.
2.2.5. Show that the transformation f : [−1, 1] → [−1, 1], f(x) = 1 − 2x² has some
invariant probability measure equivalent to the Lebesgue measure on the interval.
2.2.6. Let f : M → M be an invertible measurable transformation and m be a probability
measure on M such that m(A) = 0 if and only if m(f (A)) = 0. We say that the pair
(f , m) is totally dissipative if there exists a measurable set W ⊂ M whose iterates
f j (W), j ∈ Z are pairwise disjoint and such that their union has full measure. Prove
that if (f, m) is totally dissipative then f admits some σ-finite invariant measure
equivalent to m. This measure is necessarily infinite.
2.2.7. Let f : M → M be an invertible measurable transformation and m be a probability
measure on M such that m(A) = 0 if and only if m(f (A)) = 0. We say that the
pair (f , m) is conservative if there is no measurable set W ⊂ M with positive
measure whose iterates f j (W), j ∈ Z are pairwise disjoint. Show that if (f , m) is
conservative then, for every measurable set X ⊂ M, m-almost every point of X
returns to X infinitely many times.
2.2.8. Suppose that (f , m) is conservative. Show that f admits a σ -finite invariant
measure μ equivalent to m if and only if there exist sets X1 ⊂ · · · ⊂ Xn ⊂ · · ·
with M = ⋃n Xn and m(Xn) < ∞ for every n, such that the first-return map fn to
each Xn admits a finite invariant measure μn absolutely continuous with respect
to the restriction of m to Xn .
2.2.9. Find conservative pairs (f , m) such that f has no finite invariant measures
equivalent to m. [Observation: Ornstein [Orn60] gave examples such that f does
not even have σ -finite invariant measures equivalent to m.]

2.3 Comments in functional analysis


The definition of weak∗ topology in the space of probability measures is a
special case of a construction from functional analysis that is worthwhile
recalling here. It leads us to introducing a certain linear isometry Uf in
the space L1 (μ), called the Koopman operator of the system (f , μ). These
operators have an important role in ergodic theory because they allow for
powerful tools from Analysis to be used in the study of invariant measures.
To illustrate this fact, we present an alternative proof of Theorem 2.1 based on
the spectral properties of the Koopman operator.

2.3.1 Duality and weak topologies


Let E be a Banach space, that is, a vector space endowed with a complete norm.
The dual of E is the space E∗ of all continuous linear functionals defined on E.
This is also a Banach space, with the norm
$$\|g\| = \sup\Big\{\frac{|g(v)|}{\|v\|} : v \in E\setminus\{0\}\Big\}. \tag{2.3.1}$$

The weak topology in the space E is the topology defined by the following
basis of neighborhoods:

V(v, {g1 , . . . , gN }, ε) = {w ∈ E : |gi (v) − gi (w)| < ε for every i}, (2.3.2)

where g1 , . . . , gN ∈ E∗ . In terms of sequences, it satisfies

(vn )n → v ⇒ (g(vn ))n → g(v) for every g ∈ E∗ .

The weak∗ topology in the dual space E∗ is the topology defined by the
following basis of neighborhoods:

V ∗ (g, {v1 , . . . , vN }, ε) = {h ∈ E∗ : |g(vi ) − h(vi )| < ε for every i}, (2.3.3)

where v1 , . . . , vN ∈ E. It satisfies

(gn )n → g ⇒ (gn (v))n → g(v) for every v ∈ E.

The weak∗ topology has the following remarkable property:

Theorem 2.3.1 (Banach–Alaoglu). The closed unit ball of E∗ is compact for
the weak∗ topology.

The construction carried out in the previous sections corresponds to the
situation where E is the space C0(M) of (complex) continuous functions and
E∗ is the space M(M) of complex measures on a compact metric space
M: according to the theorem of Riesz–Markov (Theorem A.3.12), M(M)
corresponds to the dual of C0(M) when we identify each measure μ ∈ M(M)
with the linear functional Iμ(φ) = ∫φ dμ. Note that the definition of the norm
(2.3.1) implies that
$$\|\mu\| = \sup\Big\{\frac{|\int\varphi\,d\mu|}{\sup|\varphi|} : \varphi\in C^0(M)\setminus\{0\}\Big\}.$$
In particular, the set M1(M) of probability measures is contained in the unit
ball of M(M). Since this set is closed for the weak∗ topology, we conclude that
Theorem 2.1.5 is also a direct consequence of the theorem of Banach–Alaoglu.
Now consider any continuous transformation f : M → M and the correspond-
ing action f∗ : M(M) → M(M), μ → f∗ μ in the space of complex measures.
Then f∗ is a linear operator on M(M) and it is continuous with respect to
the weak∗ topology. There exists another continuous linear operator naturally
associated with f , namely Uf : C0 (M) → C0 (M), φ → φ ◦ f . Observe that these
two operators are dual, in the following sense (remember Lemma 2.2.1):
$$\int U_f(\varphi)\,d\mu = \int(\varphi\circ f)\,d\mu = \int\varphi\,d(f_*\mu). \tag{2.3.4}$$

These observations motivate the important notion that we are going to
introduce in the next section.

2.3.2 Koopman operator


Let (M, B) be a measurable space, f : M → M be a measurable transformation
and μ be an f -invariant measure. The Koopman operator of (f , μ) is the linear
operator
Uf : L1 (μ) → L1 (μ), Uf (φ) = φ ◦ f .
Note that Uf is well defined and is an isometry, that is, it preserves the norm in
the Banach space L1(μ): since μ is invariant under f,
$$\|U_f(\varphi)\|_1 = \int|U_f(\varphi)|\,d\mu = \int|\varphi|\circ f\,d\mu = \int|\varphi|\,d\mu = \|\varphi\|_1. \tag{2.3.5}$$

Moreover, Uf is a positive linear operator: Uf(φ) ≥ 0 at μ-almost every point
whenever φ ≥ 0 at μ-almost every point. For future reference, we summarize
these facts in the following proposition:

Proposition 2.3.2. The Koopman operator Uf : L1(μ) → L1(μ) of any system
(f, μ) is a positive linear isometry.

The property (2.3.5) implies that the operator Uf is injective. In general, Uf
is not surjective (see Exercise 2.3.5). It is clear that if f is invertible then Uf is
an isomorphism: the inverse is just the Koopman operator U_{f⁻¹} of the inverse
of f.
We may also consider versions of the Koopman operator defined on the
spaces Lp (μ),
Uf : Lp (μ) → Lp (μ), Uf (φ) = φ ◦ f
for each p ∈ [1, ∞]. Proposition 2.3.2 remains valid in all these cases: all these
operators are positive linear isometries.
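On a finite set the isometry property can be checked by hand. The sketch below assumes M = {0, . . . , 5}, a permutation f of M and the uniform probability μ (which is invariant under every permutation); it compares ‖Uf φ‖p^p with ‖φ‖p^p for a few values of p, and also shows that the equality may fail when the measure is not invariant. The permutation, the function φ and the helper names are illustrative choices only.

```python
from fractions import Fraction

M = range(6)
perm = [2, 0, 1, 4, 5, 3]                   # illustrative permutation: f(i) = perm[i]
uniform = {i: Fraction(1, 6) for i in M}    # uniform probability, invariant under any permutation

def koopman(phi):
    """U_f(phi) = phi ∘ f for a function on M stored as a list of values."""
    return [phi[perm[i]] for i in M]

def p_norm_power(phi, mu, p):
    """||phi||_p^p = sum_i |phi(i)|^p mu({i}); the p-th power keeps the arithmetic exact."""
    return sum(abs(phi[i]) ** p * mu[i] for i in M)

phi = [3, -1, 0, 2, 5, -4]
for p in (1, 2, 3):
    print(p, p_norm_power(koopman(phi), uniform, p), p_norm_power(phi, uniform, p))

# A measure that is not invariant under f: the L1 norm is no longer preserved.
skewed = {0: Fraction(1, 2), 1: Fraction(1, 2), 2: 0, 3: 0, 4: 0, 5: 0}
print(p_norm_power(koopman(phi), skewed, 1), p_norm_power(phi, skewed, 1))
```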
When M is a compact metric space and f is continuous, it is particularly
interesting to investigate the action of Uf restricted to the space C0 (M) of
continuous functions:
Uf : C0 (M) → C0 (M).
It is clear that this operator is continuous relative to the norm of uniform
convergence. As we have seen previously, the dual space of C0 (M) is naturally
identified with the space M(M) of complex measures on M. Moreover, the
relation (2.3.4) shows that, under that identification, the dual operator
Uf∗ : C0 (M)∗ → C0 (M)∗
corresponds precisely to the action f∗ : M(M) → M(M) of the transformation
f on M(M). This fact allows us to give an alternative proof of Theorem 2.1,
based on certain facts from spectral theory.
For that, we need to recall some notions from the theory of positive linear
operators. The reader can find a lot more details in Deimling [Dei85], including
the proofs of the results quoted in the following.

Let E be a Banach space. A closed convex subset C is called a cone of E if
it satisfies:
$$\lambda C \subset C \ \text{ for every } \lambda \ge 0 \quad\text{and}\quad C\cap(-C) = \{0\}. \tag{2.3.6}$$
We call the cone C normal if
$$\inf\{\|x+y\| : x, y \in C \text{ such that } \|x\| = \|y\| = 1\} > 0.$$
Let us fix a cone C of E. Given any continuous linear operator T : E → E,
we say that T is positive over C if the image T(C) ⊂ C. Given a continuous
linear functional φ : E → R, we say that φ is positive over C if φ(v) ≥ 0 for
every v ∈ C. By definition, the dual cone C∗ is the cone of E∗ formed by all
the linear functionals positive over C.

Example 2.3.3. The cone C0+(M) = {ϕ ∈ C0(M) : ϕ ≥ 0} is a normal cone
of C0(M) (Exercise 2.3.3). By the Riesz–Markov theorem (Theorem A.3.11),
the dual cone is naturally identified with the space of finite positive measures
on M.

Denote by r(T) the spectral radius of the continuous linear operator T:
$$r(T) = \lim_{n} \sqrt[n]{\|T^n\|}.$$

Then r(T) = r(T∗), where T∗ : E∗ → E∗ represents the linear operator dual
to T. The next result is a consequence of the theorem of Banach–Mazur; see
Proposition 7.2 in Deimling [Dei85]:

Theorem 2.3.4. Let C be a normal cone of a Banach space E and T : E → E
be a linear operator positive over C. Then r(T∗) is an eigenvalue of the dual
operator T∗ : E∗ → E∗ and it admits some eigenvector v∗ ∈ C∗.

As an application, let us give an alternative proof of the existence of invariant
measures for continuous transformations on compact spaces. Consider the
cone C = C0+(M) of E = C0(M). As we observed before, the dual cone C∗
is the space of finite positive measures on M. It is clear from the definition
that the operator T = Uf is positive over C. Also, its spectral radius is equal
to 1, since sup|T^n(ϕ)| ≤ sup|ϕ| for every ϕ ∈ C0(M) and T^n(1) = 1, so that
‖T^n‖ = 1 for every n. So, by Theorem 2.3.4, there exists some finite positive
measure μ on M that is an eigenvector of the dual operator T∗ = f∗ associated
with the eigenvalue 1. In other words, the measure μ is invariant. Multiplying
by a suitable constant, we may suppose that μ is a probability measure.
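A finite-dimensional caricature of this argument may help. Assume that M = {0, . . . , n − 1} is a finite set with the discrete topology, so that C0(M) is just Rⁿ with the sup norm. Then Uf is the 0–1 matrix whose row i has a single 1, in column f(i); every power has the same shape, so ‖Uf^k‖ = 1 for all k and r(Uf) = 1. The dual operator sends a measure μ to f∗μ, and the uniform probability on any periodic cycle of f is a non-negative eigenvector with eigenvalue 1, as Theorem 2.3.4 predicts. The map `f` below is an arbitrary example and the helper names are hypothetical.

```python
from fractions import Fraction

f = [1, 2, 0, 2, 3, 4]      # illustrative map on M = {0,...,5}; 0 -> 1 -> 2 -> 0 is a cycle
n = len(f)

def matrix_power_sup_norm(k):
    """Operator sup-norm (max absolute row sum) of the matrix of U_f^k: phi -> phi ∘ f^k."""
    rows = []
    for i in range(n):
        j = i
        for _ in range(k):
            j = f[j]
        rows.append([1 if col == j else 0 for col in range(n)])
    return max(sum(abs(a) for a in row) for row in rows)

def dual(mu):
    """U_f^* acting on a measure given as a list of weights: (f_* mu)[j] = sum_{f(i)=j} mu[i]."""
    image = [Fraction(0)] * n
    for i in range(n):
        image[f[i]] += mu[i]
    return image

def periodic_cycle(start):
    """Iterate from start until a point repeats; return the cycle that is reached."""
    seen, x = [], start
    while x not in seen:
        seen.append(x)
        x = f[x]
    return seen[seen.index(x):]

cycle = periodic_cycle(5)
mu = [Fraction(1, len(cycle)) if i in cycle else Fraction(0) for i in range(n)]
print([matrix_power_sup_norm(k) ** (1 / k) for k in (1, 5, 20)])   # Gelfand's formula: each term is r(U_f) = 1
print(cycle, dual(mu) == mu)
```

In infinite dimensions no such explicit cycle is available in general, which is why the abstract Theorem 2.3.4 is needed.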

2.3.3 Exercises
2.3.1. Let ℓ1 be the space of summable sequences of complex numbers, endowed with
the norm ‖(an)n‖1 = Σ_{n=0}^∞ |an|. Let ℓ∞ be the space of bounded sequences and
c0 be the space of sequences converging to zero, both endowed with the norm
‖(an)n‖∞ = sup_{n≥0} |an|.
(a) Check that ℓ∞, ℓ1 and c0 are Banach spaces.
(b) Show that the map (an)n ↦ ((bn)n ↦ Σn an bn) defines norm-preserving
isomorphisms from ℓ∞ to the dual space (ℓ1)∗ and from ℓ1 to the dual space
(c0)∗.
2.3.2. Show that a sequence (xk)k in ℓ1 (write xk = (x^k_n)n for each k) converges in the
topology defined by the norm if and only if it converges in the weak topology,
that is, if and only if (Σn an x^k_n)k converges for every (an)n ∈ ℓ∞. [Observation:
This does not imply that the two topologies are the same. Why not?] Show that
this is no longer true if we replace the weak topology by the weak∗ topology.
2.3.3. Check that C0+(M) is a normal cone.
2.3.4. Let Rθ : S1 → S1 be an irrational rotation and m be the Lebesgue measure on the
circle. Calculate the eigenvalues and the eigenvectors of the Koopman operator
Uθ : L2 (m) → L2 (m). Show that the spectrum of Uθ coincides with the unit circle
{z ∈ C : |z| = 1}.
2.3.5. Show, through examples, that the Koopman operator Uf need not be surjective.
2.3.6. Let U : H → H be an isometry of a Hilbert space H. By Exercise A.6.8, the image
of U is a closed subspace of H. Deduce that there exist closed subspaces V and
W such that U(V) = V, the iterates of W are pairwise orthogonal and orthogonal
to V, and
$$H = V \oplus \bigoplus_{n=0}^{\infty} U^n(W).$$
Furthermore, U is an isomorphism if and only if W = {0}.


2.3.7. Let φ : E → R be a continuous convex functional on a separable Banach space E.
Assume that φ is differentiable in all directions at some point u ∈ E. Prove that
there exists at most one bounded linear functional T : E → R tangent to φ at u,
that is, such that T(v) ≤ φ(u + v) − φ(u) for every v ∈ E. If φ is differentiable at
u then the derivative Dφ(u) is a linear functional tangent to φ at u. [Observation:
The smoothness theorem of Mazur (Theorem 1.20 in Phelps [Phe93]) states that
the set of points where φ is differentiable and, consequently, there exists a unique
linear functional tangent to φ is a residual subset of E.]

2.4 Skew-products and natural extensions


In this section we describe two general constructions that are quite useful
in ergodic theory. The first one is a basic model for the situation where
two dynamical systems are coupled in the following way: the first system
is autonomous but the second one is not, because its evolution depends on
the evolution of the former. The second construction associates an invertible
system with any given system, in such a way that their invariant measures are
in one-to-one correspondence. This permits reduction to the invertible case for
many statements about general, not necessarily invertible systems.

2.4.1 Measures on skew-products


Let (X, A) and (Y, B) be measurable spaces. We call a skew-product any
measurable transformation F : X × Y → X × Y of the form F(x, y) =
(f(x), g(x, y)). Represent by π : X × Y → X the canonical projection to the
first coordinate. By definition,
$$\pi\circ F = f\circ\pi. \tag{2.4.1}$$
Let m be a probability measure on X × Y invariant under F and let μ = π∗ m be
its projection to X. Using that m is invariant under F, we get that
f∗ μ = f∗ π∗ m = π∗ F∗ m = π∗ m = μ,
that is, μ is invariant under f . The next proposition provides a partial converse
to this conclusion: under suitable hypotheses, every f -invariant measure is the
projection of some F-invariant measure.
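The identity π∗F∗ = f∗π∗ used in this computation holds for every measure m, invariant or not, and can be checked on a finite example. The sketch assumes X = {0, 1, 2} with f a cyclic permutation, Y = {0, 1}, and a fibre map g(x, y) = (x + y) mod 2; these choices, and the helper `push`, are only illustrative. With these choices F is a bijection of X × Y, so the uniform measure is F-invariant and its projection is f-invariant.

```python
from fractions import Fraction
from itertools import product

X, Y = range(3), range(2)

def f(x):
    """Base dynamics: cyclic permutation of X = {0, 1, 2}."""
    return (x + 1) % 3

def g(x, y):
    """Illustrative fibre dynamics, depending on the base point x."""
    return (x + y) % 2

def F(point):
    """Skew-product F(x, y) = (f(x), g(x, y))."""
    x, y = point
    return (f(x), g(x, y))

def pi(point):
    """Projection to the first coordinate."""
    return point[0]

def push(h, measure):
    """Push-forward h_* of a finitely supported measure {point: weight}."""
    image = {}
    for p, w in measure.items():
        image[h(p)] = image.get(h(p), 0) + w
    return image

# An arbitrary (non-invariant) probability on X × Y: pi_* F_* m = f_* pi_* m anyway.
m = {(0, 0): Fraction(1, 2), (1, 1): Fraction(1, 4), (2, 0): Fraction(1, 4)}
print(push(pi, push(F, m)) == push(f, push(pi, m)))

# The uniform measure is F-invariant (F is a bijection of X × Y), so its projection is f-invariant.
uniform = {p: Fraction(1, 6) for p in product(X, Y)}
mu = push(pi, uniform)
print(push(F, uniform) == uniform, push(f, mu) == mu)
```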

Proposition 2.4.1. Let X be a complete separable metric space, Y be a
compact metric space and F be continuous. Then, for every probability
measure μ on X invariant under f there exists some probability measure m
on X × Y invariant under F and such that π∗m = μ.

Proof. Given any f-invariant probability measure μ on X, let
𝒦 ⊂ M1(X × Y) be the set of measures η on X × Y such that π∗η = μ.
Consider any η ∈ 𝒦. Then, π∗F∗η = f∗π∗η = f∗μ = μ. This proves that 𝒦 is
invariant under F∗. Next, note that the projection π : X × Y → X is continuous
and, thus, the operator π∗ is continuous relative to the weak∗ topology. So,
𝒦 is closed in M1(X × Y). By Proposition A.3.7, given any ε > 0 there
exists a compact set K ⊂ X such that μ(K^c) < ε. Then K × Y is compact
and η((K × Y)^c) = μ(K^c) < ε for every η ∈ 𝒦. This proves that the set 𝒦
is tight. Consider any η ∈ 𝒦. Each of the averages
$$\frac{1}{n}\sum_{j=0}^{n-1} F_*^j\eta$$
belongs to 𝒦, because 𝒦 is convex and invariant under F∗. By the theorem of
Prohorov (Theorem 2.1.8), this sequence has some accumulation point, and it lies
in 𝒦 because 𝒦 is closed; call it m. Arguing as in the proof of Lemma 2.2.4,
we conclude that m is invariant under F.

2.4.2 Natural extensions


We are going to see that, given any surjective transformation f : M → M, one
can always find an extension fˆ : M̂ → M̂ that is invertible. By extension we
mean that there exists a surjective map π : M̂ → M such that π ◦ fˆ = f ◦ π .
This fact is very useful, for it makes it possible to reduce to the invertible case
the proofs of many statements about general systems. We comment on the
surjective hypothesis in Example 2.4.2: we will see that this hypothesis can be
omitted in many interesting cases.
To begin with, take M̂ to be the set of all pre-orbits of f , that is, all sequences
(xn )n≤0 indexed by the non-positive integers and satisfying f (xn ) = xn+1 for
every n < 0. Consider the map π : M̂ → M sending each sequence (xn )n≤0 to
its term x0 of order zero. Observe that π(M̂) = M. Finally, define fˆ : M̂ → M̂
to be the shift by one unit to the left:
fˆ(. . . , xn , . . . , x0 ) = (. . . , xn , . . . , x0 , f (x0 )). (2.4.2)
It is clear that fˆ is well defined and satisfies π ◦ fˆ = f ◦ π . Moreover, fˆ is
invertible: the inverse is the shift to the right:
(. . . , yn , . . . , y−1 , y0 ) → (. . . , yn , . . . , y−2 , y−1 ).
If M is a measurable space then we may turn M̂ into a measurable space by
endowing it with the σ -algebra generated by the measurable cylinders
[Ak , . . . , A0 ] = {(xn )n≤0 ∈ M̂ : xi ∈ Ai for i = k, . . . , 0}, (2.4.3)
where k ≤ 0 and Ak , . . . , A0 are measurable subsets of M. Then π is a
measurable map, since
π −1 (A) = [A]. (2.4.4)
Moreover, fˆ is measurable if f is measurable:
 
fˆ−1 ([Ak , . . . , A0 ]) = Ak , . . . , A−2 , A−1 ∩ f −1 (A0 ) . (2.4.5)
The inverse of fˆ is also measurable, since
fˆ([Ak , . . . , A0 ]) = [Ak , . . . , A0 , M]. (2.4.6)
Analogously, if M is a topological space then we may turn M̂ into a
topological space by endowing it with the topology generated by the open
cylinders [Ak , . . . , A0 ], where k ≤ 0 and Ak , . . . , A0 are open subsets of M.
The relations (2.4.4) and (2.4.6) show that π and fˆ−1 are continuous, whereas
(2.4.5) shows that fˆ is continuous if f is continuous. Observe that if M admits
a countable basis U of open sets then the cylinders [Ak, . . . , A0] with k ≤ 0 and
Ak, . . . , A0 ∈ U constitute a countable basis of open sets for M̂.
If M is a metric space, with distance d, then the following function is a
distance on M̂:
$$\hat d(\hat x,\hat y) = \sum_{n=-\infty}^{0} 2^{n}\min\{d(x_n,y_n),1\}, \tag{2.4.7}$$

where x̂ = (xn )n≤0 and ŷ = (yn )n≤0 . It follows immediately from the definition
that if x̂ and ŷ belong to the same pre-image π −1 (x) then
d̂(fˆj (x̂), fˆj (ŷ)) ≤ 2−j d̂(x̂, ŷ) for every j ≥ 0.
So, every pre-image π −1 (x) is a stable set, that is, a subset restricted to which
the transformation fˆ is uniformly contracting.
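The contraction along the fibres π⁻¹(x) can be observed on truncated pre-orbits. The sketch assumes that M is the circle R/Z with the doubling map f(x) = 2x mod 1 (an illustrative choice: every point x has the two pre-images x/2 and (x + 1)/2). A pre-orbit (x−N, . . . , x0) is stored as a list, d̂ is approximated by the finite sum over −N ≤ n ≤ 0, and f̂ is the shift (2.4.2), keeping the list length fixed by forgetting the oldest coordinate. Truncation only drops non-negative terms, so the inequality d̂(f̂(x̂), f̂(ŷ)) ≤ d̂(x̂, ŷ)/2 for points in the same fibre still holds.

```python
from fractions import Fraction
import random

def f(x):
    """Doubling map on the circle R/Z, points represented in [0, 1)."""
    return (2 * x) % 1

def d(x, y):
    """Distance on the circle R/Z."""
    t = abs(x - y) % 1
    return min(t, 1 - t)

def pre_orbit(x0, choices):
    """Build [x_{-N}, ..., x_0] with f(x_{n-1}) = x_n, choosing the pre-image branch b in {0, 1}."""
    points = [x0]
    for b in choices:
        points.insert(0, (points[0] + b) / 2)
    return points

def d_hat(xs, ys):
    """Truncated (2.4.7): sum over -N <= n <= 0 of 2^n min(d(x_n, y_n), 1); list index i is n = i - N."""
    N = len(xs) - 1
    return sum(Fraction(1, 2 ** (N - i)) * min(d(a, b), 1) for i, (a, b) in enumerate(zip(xs, ys)))

def f_hat(xs):
    """Shift (2.4.2): append f(x_0) and drop the oldest coordinate to keep the length N + 1."""
    return xs[1:] + [f(xs[-1])]

random.seed(0)
x0 = Fraction(1, 3)
xs = pre_orbit(x0, [random.randint(0, 1) for _ in range(12)])
ys = pre_orbit(x0, [random.randint(0, 1) for _ in range(12)])
for j in range(4):
    print(j, d_hat(xs, ys))
    assert d_hat(f_hat(xs), f_hat(ys)) <= d_hat(xs, ys) / 2   # both lie in the same fibre
    xs, ys = f_hat(xs), f_hat(ys)
```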
Example 2.4.2. Given any transformation g : M → M, consider its maximal
invariant set
$$M_g = \bigcap_{n=1}^{\infty} g^n(M).$$
Clearly, g(Mg) ⊂ Mg. Suppose that
(i) M is compact and g is continuous or (ii) #g⁻¹(y) < ∞ for every y.
Then (Exercise 2.4.3), the restriction f = (g | Mg) : Mg → Mg is surjective. This
restriction contains all the interesting dynamics of g. For example, assuming
that g^n(M) is a measurable set for every n, every probability measure invariant
under g is also invariant under f. Analogously, every point that is recurrent for
g is also recurrent for f, at least in case (i). For this reason, we also refer to the
natural extension of f = (g | Mg) as the natural extension of g.

A set Λ ⊂ M such that f⁻¹(Λ) = Λ is called an invariant set of f. There is
a corresponding notion for the transformation f̂. The next proposition shows
that every closed invariant set of f admits a unique lift to a closed invariant set
of the transformation f̂:

Proposition 2.4.3. Assume that M is a topological space. If Λ ⊂ M is a closed
set invariant under f then Λ̂ = π⁻¹(Λ) is the only closed set invariant under f̂
and satisfying π(Λ̂) = Λ.

Proof. Since π is continuous, if Λ is closed then Λ̂ = π⁻¹(Λ) is also closed.
Moreover, if Λ is invariant under f then Λ̂ is invariant under f̂:
$$\hat f^{-1}(\hat\Lambda) = (\pi\circ\hat f)^{-1}(\Lambda) = (f\circ\pi)^{-1}(\Lambda) = \pi^{-1}(f^{-1}(\Lambda)) = \pi^{-1}(\Lambda) = \hat\Lambda.$$
In the converse direction, let Λ̂ ⊂ M̂ be a closed set invariant under f̂ and
such that π(Λ̂) = Λ. It is clear that Λ̂ ⊂ π⁻¹(Λ). To prove the other inclusion,
we must show that, given any x0 ∈ Λ, if x̂ ∈ π⁻¹(x0) then x̂ ∈ Λ̂. Let us write
x̂ = (xn)n≤0. Consider n ≤ 0 and any neighborhood of x̂ of the form
V = [An, . . . , A0], An, . . . , A0 open subsets of M.
By the definition of natural extension, x0 = f^{−n}(xn) and, hence, xn ∈ (f^{−n})⁻¹(Λ) = Λ.
Then, the hypothesis π(Λ̂) = Λ implies that π(ŷn) = xn for some ŷn ∈ Λ̂.
Since Λ̂ is invariant under f̂, we have that f̂^{−n}(ŷn) ∈ Λ̂. Moreover, the property
π(ŷn) = xn implies that
$$\hat f^{-n}(\hat y_n) = (\dots, y_{n,k}, \dots, y_{n,-1}, y_{n,0} = x_n, x_{n+1}, \dots, x_{-1}, x_0).$$
It follows that f̂^{−n}(ŷn) ∈ V, since V contains x̂ and its definition only depends
on the coordinates indexed by j ∈ {n, . . . , 0}. This proves that x̂ is accumulated
by elements of Λ̂. Since Λ̂ is closed, it follows that x̂ ∈ Λ̂.

Now let μ̂ be an invariant measure of fˆ and let μ = π∗ μ̂. The property


π ◦ fˆ = f ◦ π implies that μ is invariant under f :
f∗ μ = f∗ π∗ μ̂ = π∗ fˆ∗ μ̂ = π∗ μ̂ = μ.
We say that μ̂ is a lift of μ. The next result, which is a kind of version of
Proposition 2.4.3 for measures, is due to Rokhlin [Rok61]:

Proposition 2.4.4. Assume that M is a complete separable metric space and
f : M → M is continuous. Then every probability measure μ invariant under f
admits a unique lift, that is, there is a unique measure μ̂ on M̂ invariant under
f̂ and such that π∗μ̂ = μ.

Uniqueness is easy to establish and is independent of the hypotheses on the
space M and the transformation f. Indeed, if μ̂ is a lift of μ then (2.4.4) and
(2.4.5) imply that the measure of every cylinder is uniquely determined: applying
(2.4.5) −k times shows that the pre-image of [Ak, . . . , A0] under f̂^{−k} is
π⁻¹(Ak ∩ f⁻¹(Ak+1) ∩ · · · ∩ (f^{−k})⁻¹(A0)), and μ̂ is invariant under f̂, so
$$\hat\mu([A_k,\dots,A_0]) = \hat\mu\big(\pi^{-1}(A_k\cap f^{-1}(A_{k+1})\cap\cdots\cap (f^{-k})^{-1}(A_0))\big) = \mu\big(A_k\cap f^{-1}(A_{k+1})\cap\cdots\cap (f^{-k})^{-1}(A_0)\big). \tag{2.4.8}$$
The proof of existence will be proposed to the reader in Exercise 5.2.4, using
ideas to be developed in Chapter 5. We will also see in Exercise 8.5.7 that those
arguments remain valid in the somewhat more general setting of Lebesgue
spaces. But existence of the lift is not true in general, for arbitrary probability
spaces, as shown by the example in Exercise 1.15 in the book of Przytycki and
Urbański [PU10].

2.4.3 Exercises
2.4.1. Let M be a compact metric space and X be a set of continuous maps f : M →
M, endowed with a probability measure ν. Consider the skew-product F : X^N ×
M → X^N × M defined by F((fn)n, x) = ((fn+1)n, f0(x)). Show that F admits some
invariant probability measure m of the form m = ν^N × μ. Moreover, a measure m
of this form is invariant under F if and only if the measure μ is stationary for ν,
that is, if and only if μ(E) = ∫ f∗μ(E) dν(f) for every measurable set E ⊂ M.
2.4.2. Let f : M → M be a surjective transformation, fˆ : M̂ → M̂ be its natural extension
and π : M̂ → M be the canonical projection. Show that if g : N → N is an
invertible transformation such that f ◦p = p◦g for some map p : N → M then there
exists a unique map p̂ : N → M̂ such that π ◦ p̂ = p and p̂ ◦ g = fˆ ◦ p̂. Suppose that
M and N are compact spaces and the maps p and g are continuous. Show that if p
is surjective then p̂ is surjective (and so g : N → N is an extension of fˆ : M̂ → M̂).
2.4.3. Check the claims in Example 2.4.2.
2.4.4. Show that if (M, d) is a complete separable metric space then the same holds
for the space (M̂, d̂) of the pre-orbits of any continuous surjective transformation
f : M → M.
2.4.5. The purpose of this exercise and the next is to generalize the notion of
natural extension to finite families of commuting transformations. Let M be
a compact space and f1 , . . . , fq : M → M be commuting surjective continuous
transformations. Let M̂ be the set of all sequences (xn1 ,...,nq )n1 ,...,nq ≤0 , indexed by
the q-tuples of non-positive integer numbers, such that
fi (xn1 ,...,ni ,...,nq ) = xn1 ,...,ni +1,...,nq for every i and every (n1 , . . . , nq ).
Let π : M̂ → M be the map sending (xn1 ,...,nq )n1 ,...,nq ≤0 to the point x0,...,0 .
For each i, let fˆi : M̂ → M̂ be the map sending (xn1 ,...,ni ,...nq )n1 ,...,nq ≤0 to
(xn1 ,...,ni +1,...nq )n1 ,...,nq ≤0 .
(a) Prove that M̂ is a compact space. Moreover, M̂ is metrizable if M is
metrizable.
(b) Show that every f̂i : M̂ → M̂ is a homeomorphism with π ∘ f̂i = fi ∘ π.
Moreover, these homeomorphisms commute.
(c) Prove that π is continuous and surjective. In particular, M̂ is non-empty.
2.4.6. Let M be a compact space and g1 , . . . , gq : M → M be commuting continuous
transformations. Define $M_g = \bigcap_{n=1}^{\infty} g_1^{n}\cdots g_q^{n}(M)$.
(a) Check that $M_g = \bigcap_{n_1,\dots,n_q} g_1^{n_1}\cdots g_q^{n_q}(M)$, where the intersection is over all
q-tuples (n1 , . . . , nq ) with ni ≥ 1 for every i.
(b) Show that gi (Mg ) ⊂ Mg and the restriction fi = gi | Mg is surjective, for
every i.
[Observation: It is clear that these restrictions fi commute.]
2.4.7. Use the construction in Exercises 2.4.5 and 2.4.6 to extend the proof of
Theorem 1.5.1 to the case when the transformations fi are not necessarily
invertible.

2.5 Arithmetic progressions


In this section we prove two fundamental results of combinatorial arithmetic,
the theorem of van der Waerden and the theorem of Szemerédi, using the
multiple recurrence theorems (Theorem 1.5.1 and Theorem 1.5.2) introduced
in Section 1.5.
We call a partition of the set Z of integers any finite family of
pairwise disjoint sets S1, . . . , Sk ⊂ Z whose union is the whole of Z. Recall that
a (finite) arithmetic progression is a sequence of the form

m + n, m + 2n, . . . , m + qn, with m ∈ Z and n, q ≥ 1.

The number q is called the length of the progression.


The next theorem was originally proven by the Dutch mathematician Bartel
van der Waerden [vdW27] in the 1920s:

Theorem 2.5.1 (van der Waerden). Given any partition {S1 , . . . , Sl } of Z, there
exists j ∈ {1, . . . , l} such that Sj contains arithmetic progressions of every length.
In other words, for every q ≥ 1 there exist m ∈ Z and n ≥ 1 such that m + in ∈ Sj
for every 1 ≤ i ≤ q.
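A finite version of Theorem 2.5.1, which follows from it by a standard compactness argument, states that for given l and q every partition of {1, . . . , N} into l classes has a class containing a progression of length q, as soon as N is large enough. For l = 2 and q = 3 the smallest such N is known to be 9. The brute-force sketch below, with hypothetical helper names, simply enumerates all colorings; it is only meant for tiny values of N.

```python
from itertools import product

def has_progression(S, q):
    """True if the set S contains m + n, m + 2n, ..., m + qn for some m and some n >= 1."""
    if q <= 1:
        return bool(S)
    if not S:
        return False
    span = max(S) - min(S)
    for start in S:                      # start plays the role of m + n
        for n in range(1, span + 1):
            if all(start + i * n in S for i in range(q)):
                return True
    return False

def every_coloring_has_progression(N, l, q):
    """Check whether every partition of {1, ..., N} into l classes has a class with a q-term progression."""
    for coloring in product(range(l), repeat=N):
        classes = [{k + 1 for k in range(N) if coloring[k] == c} for c in range(l)]
        if not any(has_progression(S, q) for S in classes):
            return False
    return True

print(every_coloring_has_progression(8, 2, 3))   # the coloring {1,2,5,6} / {3,4,7,8} avoids 3-term progressions
print(every_coloring_has_progression(9, 2, 3))
```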

Some time afterwards, the Hungarian mathematicians Pál Erdős and Pál
Turán [ET36] conjectured the following statement, which is stronger than the
theorem of van der Waerden: any set S ⊂ Z whose upper density is positive
contains arithmetic progressions of every length. This was proven by another
Hungarian mathematician, Endre Szemerédi [Sze75], almost four decades
later. To state the theorem of Szemerédi precisely, we need to define the notion
of upper density of a subset of Z.
We call an interval of the set Z any subset I of the form {n ∈ Z : a ≤ n < b}
with a ≤ b. The cardinal of an interval I is the number #I = b − a. The upper
density of a subset S of Z is the number
$$D_u(S) = \limsup_{\#I\to\infty} \frac{\#(S\cap I)}{\#I},$$
where I represents any interval of Z. The lower density Dl(S) of a subset S of Z
is defined analogously, just replacing limit superior with limit inferior. In other
words, Du(S) is the largest and Dl(S) is the smallest number D such that
$$\frac{\#(S\cap I_j)}{\#I_j} \to D \quad\text{for some sequence of intervals } I_j\subset Z \text{ with } \#I_j\to\infty.$$
In the next lemma we collect some simple properties of the upper and lower
densities. The proof is left as an exercise (Exercise 2.5.1).
Lemma 2.5.2. For any S ⊂ Z,
0 ≤ Dl (S) ≤ Du (S) ≤ 1 and Dl (S) = 1 − Du (Z \ S).
Moreover, if S1 , . . . , Sl is a partition of Z then
Dl (S1 ) + · · · + Dl (Sl ) ≤ 1 ≤ Du (S1 ) + · · · + Du (Sl ).
Example 2.5.3. Let S be the set of even numbers. For any interval I ⊂ Z, we
have #(S ∩ I) = #I/2 if the cardinality of I is even and #(S ∩ I) = (#I ± 1)/2
if the cardinality of I is odd; the sign ± is positive if the smallest element of I
is an even number and negative otherwise. It follows immediately that
Du(S) = Dl(S) = 1/2.
Example 2.5.4. Let S be the following subset of Z:
{1, 3, 4, 7, 8, 9, 13, 14, 15, 16, 21, 22, 23, 24, 25, 31, 32, 33, 34, 35, 36, 43, . . .}.
That is, for each k ≥ 1 we include in S a block of k consecutive integers and
then we omit the next k integers. On the one hand, S contains intervals
of every length. Consequently, Du (S) = 1. On the other hand, the complement
of S also contains intervals of every length. So, Dl (S) = 1 − Du (Z \ S) = 0.
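Upper and lower densities are limits over all long intervals, so no finite computation determines them. Still, Examples 2.5.3 and 2.5.4 can be explored numerically. The Python sketch below is illustrative: the names `block_set` and `density_range` and the window sizes are assumptions, not part of the text. It computes the smallest and largest value of #(S ∩ I)/#I over all intervals I of a fixed length inside a finite window, updating a sliding count.

```python
from fractions import Fraction

def block_set(limit):
    """Elements below `limit` of the set S of Example 2.5.4: starting at 1, for
    k = 1, 2, 3, ... keep a block of k consecutive integers, then omit the next k."""
    s, start, k = set(), 1, 1
    while start < limit:
        s.update(range(start, min(start + k, limit)))
        start += 2 * k
        k += 1
    return s

def density_range(s, length, window):
    """Smallest and largest value of #(S ∩ I)/#I over the intervals
    I = {a, ..., a + length - 1} contained in {0, ..., window - 1}."""
    count = sum(1 for n in range(length) if n in s)
    lo = hi = count
    for a in range(1, window - length + 1):
        # slide I one step to the right: gain a + length - 1, lose a - 1
        count += ((a + length - 1) in s) - ((a - 1) in s)
        lo, hi = min(lo, count), max(hi, count)
    return Fraction(lo, length), Fraction(hi, length)

evens = set(range(0, 10_000, 2))
print(density_range(evens, 51, 10_000))              # (Fraction(25, 51), Fraction(26, 51))
print(density_range(block_set(10_000), 50, 10_000))  # (Fraction(0, 1), Fraction(1, 1))
```

For the even numbers both values stay within 1/(2#I) of 1/2, while for the set of Example 2.5.4 they already reach 0 and 1 for intervals of length 50, in line with Dl(S) = 0 and Du(S) = 1.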
Notice that, in both examples, the set S contains arithmetic progressions of
every length. Actually, in Example 2.5.3 the set S even contains arithmetic
progressions of infinite length. That is not true in Example 2.5.4, because in
this case the complement of S contains arbitrarily long intervals.
Theorem 2.5.5 (Szemerédi). If S is a subset of Z with positive upper density
then it contains arithmetic progressions of every length.
The theorem of van der Waerden is an easy consequence of the theorem of
Szemerédi. Indeed, if S1, . . . , Sl is a partition of Z then, by Lemma 2.5.2,
Du(S1) + · · · + Du(Sl) ≥ 1, and so there exists j such that Du(Sj) > 0. By
Theorem 2.5.5, such an Sj contains arithmetic progressions of every length.
The original proofs of these results were combinatorial. Later, Furstenberg
(see [Fur81]) observed that the two theorems could also be deduced from
ideas in ergodic theory: we will show in Section 2.5.1 how to obtain the
theorem of van der Waerden from the multiple recurrence theorem of Birkhoff
(Theorem 1.5.1); similar arguments yield the theorem of Szemerédi from the
multiple recurrence theorem of Poincaré (Theorem 1.5.2), as we will see in
Section 2.5.2.
Szemerédi's theorem remains at the center of a very active research area. In particular,
alternative proofs of Theorem 2.5.5 have been given by other authors. Recently,
this led to the following spectacular result of the British mathematician Ben
Green and the Australian mathematician Terence Tao [GT08]: the set of
prime numbers contains arithmetic progressions of every length. This is not a
consequence of the theorem of Szemerédi, because the upper density of the set
of prime numbers is zero, but the theorem of Szemerédi does play an important
role in the proof. On the other hand, the Green–Tao theorem is a special case
of yet another conjecture of Erdős (recall that, by a classical result of Euler, the
sum of the inverses of the prime numbers diverges): if S ⊂ N is such that the sum
of the inverses diverges, that is, such that
$$\sum_{n\in S}\frac{1}{n} = \infty,$$
then S contains arithmetic progressions of every length. This more general
statement remains open.

2.5.1 Theorem of van der Waerden


In this section we prove Theorem 2.5.1. The idea of the proof is to reduce the
conclusion of the theorem to a claim about the shift map
$$\sigma:\Sigma\to\Sigma,\qquad (\alpha_n)_{n\in\mathbb{Z}}\mapsto(\alpha_{n+1})_{n\in\mathbb{Z}}$$
in the space $\Sigma = \{1, 2, \dots, l\}^{\mathbb{Z}}$ of two-sided sequences with values in the
set {1, 2, . . . , l}. This claim will then be proved using the multiple recurrence
theorem of Birkhoff.
Observe that every partition {S1, . . . , Sl} of Z into l ≥ 2 subsets determines an
element α = (αn)n∈Z of Σ, through αn = i ⇔ n ∈ Si. Conversely, every α ∈ Σ
determines a partition of Z into subsets
$$S_i = \{n\in\mathbb{Z} : \alpha_n = i\},\qquad i = 1, \dots, l.$$
We are going to show that for every α ∈ Σ and every q ≥ 1, there exist m ∈ Z
and n ≥ 1 such that
$$\alpha_{m+n} = \alpha_{m+2n} = \cdots = \alpha_{m+qn}. \tag{2.5.1}$$
In view of what we have just observed, this means that for every partition
{S1, . . . , Sl} and every q ≥ 1 there exists i ∈ {1, . . . , l} such that Si contains some
arithmetic progression of length q. Since there are finitely many Si, that implies
that some Sj contains arithmetic progressions of arbitrarily large lengths. This
is the same as saying that Sj contains arithmetic progressions of every length
because, clearly, every arithmetic progression of length q contains arithmetic
progressions of every length smaller than q. This reduces the proof of the
theorem to proving the claim in (2.5.1).
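To that end, let us consider on Σ the distance defined by $d(\beta,\gamma) = 2^{-N(\beta,\gamma)}$,
where
$$N(\beta,\gamma) = \max\{N\ge 0 : \beta_n = \gamma_n \text{ for every } n\in\mathbb{Z} \text{ with } |n| < N\}.$$
Note that
$$d(\beta,\gamma) < 1 \quad\text{if and only if}\quad \beta_0 = \gamma_0. \tag{2.5.2}$$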
Since the metric space (Σ, d) is compact, the closure $\mathcal{Z} = \overline{\{\sigma^n(\alpha) : n\in\mathbb{Z}\}}$ of
the trajectory of α is also compact. Moreover, $\mathcal{Z}$ is invariant under the shift
map. Let us consider the transformations $f_1 = \sigma, f_2 = \sigma^2, \dots, f_q = \sigma^q$ defined
from $\mathcal{Z}$ to $\mathcal{Z}$. It is clear from the definition that these transformations are continuous
and commute with each other. So, we may use Theorem 1.5.1 to conclude that there exist
$\theta\in\mathcal{Z}$ and a sequence $(n_k)_k\to\infty$ such that
$$\lim_k f_i^{n_k}(\theta) = \theta \quad\text{for every } i = 1, 2, \dots, q.$$
Observe that $f_i^{n_k} = \sigma^{i n_k}$. In particular, we may fix $n = n_k$ such that the iterates
$\sigma^n(\theta), \sigma^{2n}(\theta), \dots, \sigma^{qn}(\theta)$ are all within a distance of less than 1/2 from the
point θ. Consequently, by the triangle inequality,
$$d\big(\sigma^{in}(\theta), \sigma^{jn}(\theta)\big) < 1 \quad\text{for every } 1\le i, j\le q.$$
Then, as θ is in the closure $\mathcal{Z}$ of the orbit of α and the maps $\sigma^n, \sigma^{2n}, \dots, \sigma^{qn}$ are
continuous, we may find m ∈ Z such that $\sigma^m(\alpha)$ is so close to θ that
$$d\big(\sigma^{m+in}(\alpha), \sigma^{m+jn}(\alpha)\big) < 1 \quad\text{for every } 1\le i, j\le q.$$
Taking into account the observation (2.5.2) and the fact that the 0-th coordinate of $\sigma^k(\alpha)$
is $\alpha_k$, this means that $\alpha_{m+n} = \cdots = \alpha_{m+qn}$, as we wanted to prove. This completes
the proof of the theorem of van der Waerden.
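The distance d and the observation (2.5.2) used in this proof are easy to experiment with. In the Python sketch below, offered as an illustration under stated assumptions with hypothetical names, a two-sided sequence is represented by a function from integers to symbols, and N(β, γ) is computed by scanning the coordinates ±N outward. Since such a scan cannot run forever, it is truncated at a parameter `cap` (an assumption of the sketch), so equal sequences get distance 2^(−cap) rather than 0.

```python
def agreement_depth(beta, gamma, cap=64):
    """N(beta, gamma) = max{N >= 0 : beta_n == gamma_n for all |n| < N},
    truncated at `cap` so that the scan stops even when beta == gamma."""
    N = 0
    while N < cap and beta(N) == gamma(N) and beta(-N) == gamma(-N):
        N += 1
    return N

def distance(beta, gamma, cap=64):
    """d(beta, gamma) = 2**(-N(beta, gamma)); equal sequences get 2**(-cap), not 0."""
    return 2.0 ** (-agreement_depth(beta, gamma, cap))

def shift(alpha, k=1):
    """sigma^k(alpha), i.e. the sequence n -> alpha_{n+k}."""
    return lambda n: alpha(n + k)

# A hypothetical element of {1, 2}^Z: alpha_n = 1 if n is a multiple of 3, else 2.
alpha = lambda n: 1 if n % 3 == 0 else 2
delta = lambda n: 1 if n == 5 else alpha(n)   # differs from alpha only at n = 5

print(distance(alpha, delta))              # 0.03125, i.e. 2**-5
print(distance(alpha, shift(alpha, 1)))    # 1.0, since alpha_0 != alpha_1
print(distance(alpha, shift(alpha, 3)))    # equals 2.0**-64 (the cap): sigma^3(alpha) == alpha
```

Because coordinates n and −n are compared together, a single disagreement at coordinate 5 already gives d = 2^(−5), however the sequences behave elsewhere.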

2.5.2 Theorem of Szemerédi


Now let us prove Theorem 2.5.5. We use the same kind of dictionary between
partitions of Z and sequences of integer numbers that was used in the previous
section to prove the theorem of van der Waerden.
Let S be a subset of integer numbers with positive upper density, that is, such
that there exist c > 0 and intervals $I_j = [a_j, b_j)$ of Z satisfying
$$\lim_j \#I_j = \infty \quad\text{and}\quad \lim_j\frac{\#(S\cap I_j)}{\#I_j}\ge c.$$
Let us associate with S the sequence $\alpha = (\alpha_j)_{j\in\mathbb{Z}}\in\Sigma = \{0,1\}^{\mathbb{Z}}$ defined by
$\alpha_j = 1 \Leftrightarrow j\in S$.
Consider the shift map σ : Σ → Σ and the subset A = {β ∈ Σ : β0 = 1} of Σ.
Note that both A and its complement are open cylinders of Σ. Thus, A is both
open and closed in Σ. Moreover, for every j ∈ Z,
$$\sigma^j(\alpha)\in A \;\Leftrightarrow\; \alpha_j = 1 \;\Leftrightarrow\; j\in S.$$
So, to prove the theorem of Szemerédi it suffices to show that for every k ∈ N
there exist m ∈ Z and n ≥ 1 such that
$$\sigma^{m+n}(\alpha), \sigma^{m+2n}(\alpha), \dots, \sigma^{m+kn}(\alpha)\in A. \tag{2.5.3}$$
For that, let us consider the sequence μj of probability measures defined on
Σ by
$$\mu_j = \frac{1}{\#I_j}\sum_{i\in I_j}\delta_{\sigma^i(\alpha)}. \tag{2.5.4}$$
Since the space M1(Σ) of all probability measures on Σ is compact
(Theorem 2.1.5), up to replacing (μj)j by some subsequence we may suppose
that it converges in the weak∗ topology to some probability measure μ on Σ.
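Observe that μ is a σ-invariant probability measure, for, given any
continuous function ϕ : Σ → R,
$$\begin{aligned}
\int(\varphi\circ\sigma)\,d\mu_j &= \frac{1}{\#I_j}\sum_{i\in I_j}\varphi(\sigma^{i+1}(\alpha)) = \frac{1}{\#I_j}\sum_{i\in I_j}\varphi(\sigma^{i}(\alpha)) + \frac{1}{\#I_j}\Big(\varphi(\sigma^{b_j}(\alpha)) - \varphi(\sigma^{a_j}(\alpha))\Big)\\
&= \int\varphi\,d\mu_j + \frac{1}{\#I_j}\Big(\varphi(\sigma^{b_j}(\alpha)) - \varphi(\sigma^{a_j}(\alpha))\Big)
\end{aligned}$$
and, taking the limit when j → ∞ (recall that ϕ is bounded and #Ij → ∞), it follows that $\int(\varphi\circ\sigma)\,d\mu = \int\varphi\,d\mu$.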
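The boundary-term identity above can be checked exactly on a concrete sequence. The Python sketch below is illustrative only: it takes S to be the even integers, a continuous function ϕ that depends on the coordinates 0 and 1 only, and the interval I = {0, . . . , 6}. These choices and the helper names are assumptions made for the example. Integrals against μj are finite averages, so exact rational arithmetic applies.

```python
from fractions import Fraction

def shift(beta, k):
    """sigma^k(beta): the sequence n -> beta(n + k)."""
    return lambda n: beta(n + k)

def empirical_integral(phi, alpha, a, b):
    """Integral of phi with respect to (1/#I) * sum_{i in I} delta_{sigma^i(alpha)},
    where I = {a, ..., b - 1}, computed with exact rational arithmetic."""
    return Fraction(sum(phi(shift(alpha, i)) for i in range(a, b)), b - a)

# Illustrative choices: S = even integers and alpha its indicator sequence in {0, 1}^Z.
alpha = lambda n: 1 if n % 2 == 0 else 0
phi = lambda beta: 3 * beta(0) + beta(1)            # depends on coordinates 0 and 1 only
phi_after_shift = lambda beta: phi(shift(beta, 1))  # phi o sigma
indicator_A = lambda beta: beta(0)                  # A = {beta : beta_0 = 1}

a, b = 0, 7
lhs = empirical_integral(phi_after_shift, alpha, a, b) - empirical_integral(phi, alpha, a, b)
boundary = Fraction(phi(shift(alpha, b)) - phi(shift(alpha, a)), b - a)
print(lhs, boundary)                                 # -2/7 -2/7
print(empirical_integral(indicator_A, alpha, a, b))  # 4/7, i.e. #(S ∩ I)/#I
```

Since ϕ is bounded, the boundary term is at most 2 sup|ϕ|/#Ij in absolute value, which is why it disappears in the limit.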
Observe also that μ(A) > 0. Indeed, since A is closed, Theorem 2.1.2 ensures
that
$$\mu(A)\ge\limsup_j\mu_j(A) = \limsup_j\frac{\#(S\cap I_j)}{\#I_j}\ge c.$$
Given any k ≥ 1, consider $f_i = \sigma^i$ for i = 1, . . . , k. It is clear that these
transformations commute with each other. So, we are in a position to apply
Theorem 1.5.2 to conclude that there exists some n ≥ 1 such that
$$\mu\big(A\cap\sigma^{-n}(A)\cap\cdots\cap\sigma^{-kn}(A)\big) > 0.$$
Since A is open and σ is continuous, this intersection is an open set, so Theorem 2.1.2 implies that
$$\mu_l\big(A\cap\sigma^{-n}(A)\cap\cdots\cap\sigma^{-kn}(A)\big) > 0$$
for every l sufficiently large. By the definition (2.5.4) of μl, this means that
there exists some m ∈ Il such that
$$\sigma^m(\alpha)\in A\cap\sigma^{-n}(A)\cap\cdots\cap\sigma^{-kn}(A).$$
In particular, $\sigma^{m+in}(\alpha)\in A$ for every i = 1, . . . , k, as we wanted to prove.
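The last step translates the membership of σ^m(α) in A ∩ σ^(−n)(A) ∩ · · · ∩ σ^(−kn)(A) into the statement that m, m + n, . . . , m + kn all lie in S. The Python sketch below computes the μl-measure of that intersection for a finite interval Il directly from this description and lists the corresponding values of m. The helper name and the set S of integers congruent to 0 or 1 mod 5 are illustrative assumptions.

```python
from fractions import Fraction

def multiple_return_mass(S, a, b, n, k):
    """mu_l(A ∩ sigma^{-n}(A) ∩ ... ∩ sigma^{-kn}(A)) for the empirical measure on
    I_l = {a, ..., b - 1}. Since sigma^m(alpha) lies in this set exactly when
    m, m + n, ..., m + kn all lie in S, it is the proportion of such m in I_l."""
    good = [m for m in range(a, b) if all(m + i * n in S for i in range(k + 1))]
    return Fraction(len(good), b - a), good

# Illustrative set: integers in [0, 100) congruent to 0 or 1 modulo 5.
S = {x for x in range(100) if x % 5 in (0, 1)}
print(multiple_return_mass(S, 0, 20, 5, 3)[0])  # 2/5
print(multiple_return_mass(S, 0, 20, 1, 1))     # (Fraction(1, 5), [0, 5, 10, 15])
print(multiple_return_mass(S, 0, 20, 2, 2)[0])  # 0
```

For this S the measure is positive for n = 5 but vanishes for n = 2 with k = 2, which illustrates that Theorem 1.5.2 only guarantees the existence of some suitable n.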

2.5.3 Exercises
2.5.1. Prove Lemma 2.5.2.
2.5.2. Show that the conclusion of Theorem 2.5.1 remains valid for partitions of finite
subsets of Z, as long as they are sufficiently large. More precisely: given q, l ≥ 1
there exists N ≥ 1 such that, for any partition of the set {1, 2, . . . , N} into l subsets,
at least one of these subsets contains arithmetic progressions of length q.
2.5.3. A point x ∈ M is said to be super non-wandering if, given any neighborhood
U of x and any k ≥ 1, there exists n ≥ 1 such that $\bigcap_{j=0}^{k} f^{-jn}(U)\neq\emptyset$. Show
that the theorem of van der Waerden is equivalent to the following statement:
every invertible transformation on a compact metric space has some super
non-wandering point.
2.5.4. Prove the following generalization of the theorem of van der Waerden to arbitrary
dimension, called the Grünwald theorem: given any partition $\mathbb{N}^k = S_1\cup\cdots\cup S_l$
and any q ≥ 1, there exist j ∈ {1, . . . , l}, d ∈ N and $b\in\mathbb{N}^k$ such that
$$b + d\,(a_1, \dots, a_k)\in S_j \quad\text{for any } 1\le a_i\le q \text{ and any } 1\le i\le k.$$


3
Ergodic theorems

In this chapter we present the fundamental results of ergodic theory. To
motivate the kind of statements that we are going to discuss, let us consider
a measurable set E ⊂ M with positive measure and an arbitrary point x ∈ M.
We want to analyze the set of times at which the iterates of x visit E, that is,
$$\{j\ge 0 : f^j(x)\in E\}.$$
For example, the Poincaré recurrence theorem states that this set is infinite,
for almost every x ∈ E. We would like to have more precise quantitative
information. Let us call the mean sojourn time of x in E the value of
$$\tau(E,x) = \lim_{n\to\infty}\frac1n\,\#\{0\le j<n : f^j(x)\in E\}. \tag{3.0.1}$$

There is an analogous notion for flows, defined by
$$\tau(E,x) = \lim_{T\to\infty}\frac1T\,m\big(\{0\le t\le T : f^t(x)\in E\}\big), \tag{3.0.2}$$

where m is the Lebesgue measure on the real line. It would be interesting to


know, for example, under which conditions the mean sojourn time is positive.
But before tackling this problem one must answer an even more basic question:
when do the limits in (3.0.1)–(3.0.2) exist?
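As a first numerical illustration of (3.0.1), one can estimate the mean sojourn time along a finite piece of an orbit. The Python sketch below uses a circle rotation f(y) = y + θ mod 1 and an interval E; both are illustrative assumptions rather than part of the text at this point, and a finite n only approximates the limit. For the irrational θ chosen here the estimate is close to the length of E. For θ = 1/4 the orbit of 0 is periodic and the proportion of time spent in E = [0, 0.3) is 1/2, which differs from the Lebesgue measure of E.

```python
import math

def mean_sojourn_estimate(x, theta, E, n):
    """(1/n) * #{0 <= j < n : f^j(x) in E} for the rotation f(y) = y + theta mod 1
    and an interval E = [lo, hi) in [0, 1). The j-th iterate is computed directly
    as x + j * theta mod 1, which avoids accumulating rounding errors."""
    lo, hi = E
    hits = sum(1 for j in range(n) if lo <= (x + j * theta) % 1.0 < hi)
    return hits / n

theta = (math.sqrt(5) - 1) / 2       # an irrational rotation number (illustrative choice)
print(mean_sojourn_estimate(0.1, theta, (0.0, 0.3), 100_000))  # close to 0.3
print(mean_sojourn_estimate(0.0, 0.25, (0.0, 0.3), 1_000))     # 0.5: orbit is {0, 1/4, 1/2, 3/4}
```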
These questions go back to the work of the Austrian physicist Ludwig Boltzmann
(1844–1906), who developed the kinetic theory of gases. Boltzmann
was an emphatic supporter of the atomic theory, according to which gases
are formed by a large number of small moving particles, constantly colliding
with each other, at a time when this theory was still highly controversial.
In principle, it should be possible to explain the behavior of a gas by
applying the laws of classical mechanics to each one of these particles
(molecules). In practice, this is not realistic because the number of molecules is
huge.
The proposal of the kinetic theory was, then, to try to explain the behavior
of gases at a macroscopic scale as the statistical combination of the motions
of all their molecules. To formulate the theory in precise mathematical terms,
Boltzmann was forced to make an assumption that became known as the
ergodic hypothesis. In modern language, the ergodic hypothesis claims that, for
the kind of systems (Hamiltonian flows) that describe the motions of particles
of a gas, the mean sojourn time in any measurable set E exists and is equal to
the measure of E, for almost every point x.
Efforts to validate (or not) this hypothesis led to important developments,
in mathematics (ergodic theory, dynamical systems) as well as in physics
(statistical mechanics). In this chapter we concentrate on results concerning the
existence of the mean sojourn time. The question of whether τ (E, x) = μ(E)
for almost every x is the subject of Chapter 4.
Denoting by ϕ the characteristic function of the set E, we may rewrite the
expression on the right-hand side of (3.0.1) as
$$\lim_{n\to\infty}\frac1n\sum_{j=0}^{n-1}\varphi(f^j(x)). \tag{3.0.3}$$

This suggests a natural generalization of the original question: does the limit
in (3.0.3) exist for more general functions ϕ, for example, for all integrable
functions?
The ergodic theorem of von Neumann (Theorem 3.1.6) states that the limit in
(3.0.3) does exist, in the space L2 (μ), for every function ϕ ∈ L2 (μ). The ergodic
theorem of Birkhoff (Theorem 3.2.3) goes a lot further, by asserting that the
convergence holds at μ-almost every point, for every ϕ ∈ L1 (μ). In particular,
the limit in (3.0.1) is well defined for μ-almost every x (Theorem 3.2.1).
We give a direct proof of the theorem of von Neumann and we also show
how it can be deduced from the theorem of Birkhoff. Concerning the latter, we
are going to see that it can be obtained as a special case of an even stronger
result, the subadditive ergodic theorem of Kingman (Theorem 3.3.3). This
theorem asserts that ψn /n converges almost everywhere, for any sequence of
functions ψn such that ψm+n ≤ ψm + ψn ◦ f m for every m, n.
All these results remain valid for flows, as we comment upon in Section 3.4.

3.1 Ergodic theorem of von Neumann


In this section we state and prove the ergodic theorem of von Neumann. We
begin by reviewing some general ideas concerning isometries in Hilbert spaces.
See Appendices A.6 and A.7 for more information on this topic.

3.1.1 Isometries in Hilbert spaces


Let H be a Hilbert space and F be a closed subspace of H. Then,
$$H = F\oplus F^{\perp}, \tag{3.1.1}$$
where F⊥ = {w ∈ H : v · w = 0 for every v ∈ F} is the orthogonal complement
of F. The projection PF : H → F associated with the decomposition (3.1.1) is
called the orthogonal projection to F. It is uniquely characterized by
$$\|x - P_F(x)\| = \min\{\|x - v\| : v\in F\}.$$
Observe that PF(v) = v for every v ∈ F and, consequently, $P_F^2 = P_F$.
Example 3.1.1. Consider the Hilbert space L2(μ), with the inner product
$$\varphi\cdot\psi = \int\varphi\,\bar\psi\,d\mu.$$
Let F be the subspace of constant functions. Given any ϕ ∈ L2(μ), the difference
ϕ − PF(ϕ) lies in F⊥, and so (PF(ϕ) − ϕ) · 1 = 0, that is,
$$P_F(\varphi)\cdot 1 = \varphi\cdot 1.$$
Since PF(ϕ) is a constant function and μ is a probability measure, the expression
on the left-hand side is equal to (the constant value of) PF(ϕ). The expression on
the right-hand side is equal to ∫ϕ dμ. Therefore, the orthogonal projection to the
subspace F is given by
$$P_F(\varphi) = \int\varphi\,d\mu.$$

Recall that the adjoint operator U ∗ : H → H of a continuous linear operator


U : H → H is defined by the relation
U ∗ u · v = u · Uv for every u, v ∈ H. (3.1.2)
The operator U is said to be an isometry if it preserves the inner product:
Uu · Uv = u · v for every u, v ∈ H. (3.1.3)
This is equivalent to saying that U preserves the norm of H (see Exercise A.6.9).
Another equivalent condition is U∗U = id. Indeed,
Uu · Uv = u · v for every u, v ⇔ U ∗ Uu · v = u · v for every u, v.
The property U ∗ U = id implies that U is injective. In general, an isometry need
not be surjective. See Exercises 2.3.5 and 2.3.6. If an isometry is surjective then
it is an isomorphism; such isometries are also called unitary operators.
Example 3.1.2. If f : M → M preserves a measure μ then, as we saw in
Section 2.3.2, the Koopman operator Uf : L2 (μ) → L2 (μ) is an isometry. If
f is invertible then Uf is a unitary operator.
We call the set of invariant vectors of a continuous linear operator U : H → H
the subspace
I(U) = {v ∈ H : Uv = v}.
Observe that I(U) is a closed vector subspace, since U is continuous and linear.
When U is an isometry, we have that I(U) = I(U ∗ ):
Lemma 3.1.3. If U : H → H is an isometry then Uv = v if and only if U ∗ v = v.
Proof. Since U∗U = id, it is clear that Uv = v implies U∗v = U∗Uv = v. Now assume
that U∗v = v. Then, Uv · v = v · U∗v = v · v = ‖v‖². So, using the fact that U
preserves the norm of H,
$$\|Uv - v\|^2 = (Uv - v)\cdot(Uv - v) = \|Uv\|^2 - Uv\cdot v - v\cdot Uv + \|v\|^2 = 0.$$
This means that Uv = v.

To close this brief digression, let us quote a classical result from functional
analysis, due to Marshall H. Stone, that permits the reduction of the study of
Koopman operators of continuous time systems to the discrete case.
Let Ut : H → H, t ∈ R be a 1-parameter group of linear operators on a
Banach space: by this we mean that U0 = id and Ut+s = Ut Us for every t, s ∈ R.
We say that the group is strongly continuous if
$$\lim_{t\to t_0}U_t v = U_{t_0}v \quad\text{for every } t_0\in\mathbb{R} \text{ and } v\in H.$$

Theorem 3.1.4 (Stone). If Ut : H → H, t ∈ R is a strongly continuous


1-parameter group of unitary operators on a complex Hilbert space then there
exists a self-adjoint operator A, defined on a dense subspace D(A) of H, such
that Ut | D(A) = eitA for every t ∈ R.

A proof may be found in Yosida [Yos68, § IX.9] and a simple application is
given in Exercise 3.1.5. The operator iA is called the infinitesimal generator of
the group. It may be retrieved through
$$iAv = \lim_{t\to0}\frac1t\big(U_t v - v\big). \tag{3.1.4}$$
See Yosida [Yos68, § IX.3] for a proof of the fact that the limit on the
right-hand side exists for every v in a dense subspace of H.

Example 3.1.5. Let H be the Banach space of continuous functions ϕ : S1 →
C, with the norm of uniform convergence. Define Ut(ϕ)(x) = ϕ(x + t) for every
function ϕ ∈ H. Observe that (Ut)t is a strongly continuous 1-parameter group
of isometries of H. The infinitesimal generator is given by
$$iA\varphi(x) = \lim_{t\to0}\frac1t\big(U_t\varphi(x) - \varphi(x)\big) = \lim_{t\to0}\frac1t\big(\varphi(x+t) - \varphi(x)\big) = \varphi'(x).$$
Its domain is the subset of functions of class C1, which is well known to be
dense in H.
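The convergence of the difference quotients in Example 3.1.5 can be observed numerically for a smooth function. The Python sketch below takes ϕ(x) = sin(2πx) on S1 = R/Z (an illustrative choice) and approximates the uniform norm by a maximum over a finite grid, which is an assumption of the sketch rather than the exact norm. By Taylor's formula the error is at most (t/2) sup|ϕ''| = 2π²t.

```python
import math

def generator_quotient_error(phi, dphi, t, grid=1000):
    """Max over the grid x = i/grid in S^1 = [0, 1) of
    |(phi(x + t) - phi(x)) / t - phi'(x)|, a grid proxy for the uniform norm."""
    xs = [i / grid for i in range(grid)]
    return max(abs((phi((x + t) % 1.0) - phi(x)) / t - dphi(x)) for x in xs)

phi = lambda x: math.sin(2 * math.pi * x)                 # a smooth function on S^1
dphi = lambda x: 2 * math.pi * math.cos(2 * math.pi * x)  # its derivative

for t in (1e-2, 1e-3, 1e-4):
    # Taylor's formula bounds the error by (t / 2) * sup|phi''| = 2 * pi**2 * t.
    print(t, generator_quotient_error(phi, dphi, t), 2 * math.pi ** 2 * t)
```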

3.1.2 Statement and proof of the theorem


Our first ergodic theorem is:

Theorem 3.1.6 (von Neumann). Let U : H → H be an isometry in a Hilbert
space H and P be the orthogonal projection to the subspace I(U) of invariant
vectors of U. Then,
$$\lim_{n\to\infty}\frac1n\sum_{j=0}^{n-1}U^j v = Pv \quad\text{for every } v\in H. \tag{3.1.5}$$

Proof. Let L(U) be the set of vectors v ∈ H of the form v = Uu − u for some
u ∈ H and let L̄(U) be its closure. We claim that

I(U) = L̄(U)⊥ . (3.1.6)

This can be checked as follows. Consider any v ∈ I(U) and w ∈ L̄(U). By


Lemma 3.1.3, we have that v ∈ I(U ∗ ), that is, U ∗ v = v. Moreover, by definition
of L̄(U), there are un ∈ H, n ≥ 1 such that (Uun − un )n → w. Since

v · (Uun − un ) = v · Uun − v · un = U ∗ v · un − v · un = 0

for every n, we conclude that v · w = 0. This proves that I(U) ⊂ L̄(U)⊥ . Next,
consider any v ∈ L̄(U)⊥ . Then, in particular,

v · (Uu − u) = 0 or, equivalently, U∗v · u − v · u = 0

for every u ∈ H. This means that U ∗ v = v. Using Lemma 3.1.3 once more,
we deduce that v ∈ I(U). This shows that L̄(U)⊥ ⊂ I(U), which completes the
proof of (3.1.6). As a consequence, using (3.1.1),

H = I(U) ⊕ L̄(U). (3.1.7)

Now we prove the identity (3.1.5), successively, for v ∈ I(U), for v ∈ L̄(U)
and for any v ∈ H. Begin by supposing that v ∈ I(U). On the one hand, Pv = v.
On the other hand,
$$\frac1n\sum_{j=0}^{n-1}U^j v = \frac1n\sum_{j=0}^{n-1}v = v$$
for every n, and so this sequence converges to v when n → ∞. Combining
these two observations we get (3.1.5) in this case.
Next, suppose that v ∈ L(U). Then, by definition, there exists u ∈ H such
that v = Uu − u. It is clear that
$$\frac1n\sum_{j=0}^{n-1}U^j v = \frac1n\sum_{j=0}^{n-1}\big(U^{j+1}u - U^j u\big) = \frac1n\big(U^n u - u\big).$$
The norm of this last expression is bounded by 2‖u‖/n and, consequently,
converges to zero when n → ∞. This shows that
$$\lim_n\frac1n\sum_{j=0}^{n-1}U^j v = 0 \quad\text{for every } v\in L(U). \tag{3.1.8}$$
More generally, suppose that v ∈ L̄(U). Then, there exist vectors vk ∈ L(U)
converging to v when k → ∞. Observe that
$$\Big\|\frac1n\sum_{j=0}^{n-1}U^j v - \frac1n\sum_{j=0}^{n-1}U^j v_k\Big\| \le \frac1n\sum_{j=0}^{n-1}\big\|U^j(v - v_k)\big\| \le \|v - v_k\|$$
for every n and every k. Together with (3.1.8), this implies that
$$\lim_n\frac1n\sum_{j=0}^{n-1}U^j v = 0 \quad\text{for every } v\in\bar L(U). \tag{3.1.9}$$

Since (3.1.6) implies that Pv = 0 for every v ∈ L̄(U), this shows that (3.1.5)
holds also when v ∈ L̄(U).
The general case of (3.1.5) follows immediately, as H = I(U) ⊕ L̄(U).
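A finite-dimensional case makes Theorem 3.1.6 concrete. If f is a permutation of a finite set carrying the uniform probability measure, the Koopman operator Uv = v ∘ f is an isometry, I(U) consists of the vectors that are constant on each cycle of f, and P averages over cycles. The Python sketch below (hypothetical names, exact arithmetic) compares the averages in (3.1.5) with Pv. When n is a common multiple of the cycle lengths they agree exactly; otherwise the difference is of order 1/n, as in the proof. Exercise 3.2.1 treats the same situation pointwise.

```python
from fractions import Fraction

def koopman(perm, v):
    """(U v)(x) = v(perm(x)) for functions on X = {0, ..., r - 1}."""
    return [v[perm[x]] for x in range(len(v))]

def cycle_projection(perm, v):
    """Orthogonal projection P onto I(U): average v over the cycle of each point."""
    out = []
    for x in range(len(v)):
        cycle, y = [x], perm[x]
        while y != x:
            cycle.append(y)
            y = perm[y]
        out.append(Fraction(sum(v[z] for z in cycle), len(cycle)))
    return out

def cesaro_average(perm, v, n):
    """(1/n) * sum_{j=0}^{n-1} U^j v, computed with exact fractions."""
    total, w = [Fraction(0)] * len(v), list(v)
    for _ in range(n):
        total = [s + c for s, c in zip(total, w)]
        w = koopman(perm, w)
    return [s / n for s in total]

perm = [1, 2, 0, 4, 3]      # cycles (0 1 2) and (3 4)
v = [3, 0, 0, 1, -1]
print(cycle_projection(perm, v) == [1, 1, 1, 0, 0])              # True
print(cesaro_average(perm, v, 6) == cycle_projection(perm, v))   # True: 6 = lcm(3, 2)
print(cesaro_average(perm, v, 7)[0])                             # 9/7, off by 2/7 from Pv
```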

3.1.3 Convergence in L2 (μ)


Given a measurable transformation f : M → M and an invariant probability
measure μ on M, we say that a measurable function ψ : M → R is invariant
if ψ ◦ f = ψ at μ-almost every point. The following result is a special case of
Theorem 3.1.6:

Theorem 3.1.7. Given any ϕ ∈ L2(μ), let ϕ̃ be the orthogonal projection of ϕ
to the subspace of invariant functions. Then the sequence
$$\frac1n\sum_{j=0}^{n-1}\varphi\circ f^j \tag{3.1.10}$$
converges to ϕ̃ in the space L2(μ). If f is invertible, then the sequence
$$\frac1n\sum_{j=0}^{n-1}\varphi\circ f^{-j} \tag{3.1.11}$$
also converges to ϕ̃ in L2(μ).

Proof. Let U = Uf : L2 (μ) → L2 (μ) be the Koopman operator of (f , μ). Note


that a function ψ is in the subspace I(U) of invariant functions if and only
if ψ ◦ f = ψ at μ-almost every point. By Theorem 3.1.6, the sequence in
(3.1.10) converges in the space L2 (μ) to the orthogonal projection ϕ̃ of ϕ to
the subspace I(U). This proves the first claim.
The second one is analogous, taking instead U = Uf −1 , which is the inverse
of Uf . We get that the sequence in (3.1.11) converges in L2 (μ) to the orthogonal
projection of ϕ to the subspace I(Uf −1 ). Observing that I(Uf −1 ) = I(Uf ), we
conclude that the limit of this sequence is just the same function ϕ̃ as before.

3.1.4 Exercises
3.1.1. Show that under the hypotheses of the von Neumann ergodic theorem one has
the following stronger conclusion:
$$\frac{1}{n-m}\sum_{j=m}^{n-1}\varphi\circ f^j \to P(\varphi) \quad\text{when } n - m\to\infty.$$

3.1.2. Use the previous exercise to show that, given any A ⊂ M with μ(A) > 0, the set
of values of n ∈ N such that μ(A ∩ f −n (A)) > 0 is syndetic. [Observation: We
have seen a different proof of this fact in Exercise 1.2.5.]
3.1.3. Prove that the set F = {ϕ ∈ L1 (μ) : ϕ is f -invariant} is a closed subspace of L1 (μ).
3.1.4. State and prove a version of the von Neumann ergodic theorem for flows.
3.1.5. Let ft : M → M, t ∈ R be a continuous flow on a compact metric space M and μ be
an invariant probability measure. Check that the 1-parameter group Ut : L2 (μ) →
L2 (μ), t ∈ R of Koopman operators ϕ → Ut ϕ = ϕ ◦ ft is strongly continuous.
Show that μ is ergodic if and only if 0 is a simple eigenvalue of the infinitesimal
generator of the group.

3.2 Birkhoff ergodic theorem


The theorem that we present in this section was proven by George David
Birkhoff,¹ one of the most prominent American mathematicians of his generation
and author of many other fundamental contributions to dynamics. It is a substantial
improvement on the von Neumann ergodic theorem, because its conclusion
is stated in terms of convergence at μ-almost every point, which in this context
is a stronger property than convergence in L2(μ), as explained in Section 3.2.3.

3.2.1 Mean sojourn time


We start by stating the version of the theorem for mean sojourn times:

Theorem 3.2.1 (Birkhoff). Let f : M → M be a measurable transformation
and μ be a probability measure invariant under f. Given any measurable set
E ⊂ M, the mean sojourn time
$$\tau(E,x) = \lim_n\frac1n\,\#\{j = 0, 1, \dots, n-1 : f^j(x)\in E\}$$
exists at μ-almost every point x ∈ M. Moreover, $\int\tau(E,x)\,d\mu(x) = \mu(E)$.

Observe that if τ (E, x) exists for some x ∈ M then

τ (E, f (x)) = τ (E, x). (3.2.1)

1 His son Garrett Birkhoff was also a mathematician, and is well known for his work in algebra.
The notion of projective distance that we use in Section 12.3 is due to him.
Indeed, by definition,
$$\begin{aligned}
\tau(E,f(x)) &= \lim_{n\to\infty}\frac1n\sum_{j=1}^{n}\mathcal{X}_E(f^j(x))\\
&= \lim_{n\to\infty}\Big[\frac1n\sum_{j=0}^{n-1}\mathcal{X}_E(f^j(x)) - \frac1n\big(\mathcal{X}_E(x) - \mathcal{X}_E(f^n(x))\big)\Big]\\
&= \tau(E,x) - \lim_{n\to\infty}\frac1n\big(\mathcal{X}_E(x) - \mathcal{X}_E(f^n(x))\big).
\end{aligned}$$
Since the characteristic function is bounded, the last limit is equal to zero. This
proves (3.2.1).
The next example shows that the mean sojourn time does not exist for every
point, in general:

Example 3.2.2. Consider the number x ∈ (0, 1) defined by the decimal
expansion x = 0.a1a2a3 . . . , where $a_i = 1$ if $2^k\le i<2^{k+1}$ with k even and
$a_i = 0$ if $2^k\le i<2^{k+1}$ with k odd. In other words,
x = 0.10011110000000011111111111111110 . . . ,
where the lengths of the alternating blocks of 0s and 1s are given by successive
powers of 2. Let f : [0, 1] → [0, 1] be the transformation defined in Section 1.3.1
and let E = [0, 1/10). That is, E is the set of all points whose decimal expansion
starts with the digit 0. It is easy to check that if $n = 2^k - 1$ with k even then
$$\frac1n\sum_{j=0}^{n-1}\mathcal{X}_E(f^j(x)) = \frac{2^1 + 2^3 + \cdots + 2^{k-1}}{2^k - 1} = \frac23.$$
On the other hand, if one takes $n = 2^k - 1$ with k odd then
$$\frac1n\sum_{j=0}^{n-1}\mathcal{X}_E(f^j(x)) = \frac{2^1 + 2^3 + \cdots + 2^{k-2}}{2^k - 1} = \frac{2^k - 2}{3(2^k - 1)}\to\frac13$$
as k → ∞. Thus, the mean sojourn time of x in the set E does not exist.
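The two ratios in Example 3.2.2 can be verified with exact arithmetic. In the Python sketch below (hypothetical names), `digit(i)` returns a_i, using the fact that the k with 2^k ≤ i < 2^(k+1) is one less than the number of binary digits of i. The average counts the indices i = j + 1 with a_i = 0, since f^j(x) ∈ E exactly when the (j + 1)-th decimal digit of x is 0.

```python
from fractions import Fraction

def digit(i):
    """a_i = 1 if 2^k <= i < 2^(k+1) with k even, and a_i = 0 if k is odd (i >= 1)."""
    k = i.bit_length() - 1          # the unique k with 2^k <= i < 2^(k+1)
    return 1 if k % 2 == 0 else 0

def sojourn_average(n):
    """(1/n) * sum_{j=0}^{n-1} X_E(f^j(x)) with E = [0, 1/10): f^j(x) lies in E
    exactly when the (j+1)-th decimal digit a_{j+1} of x is 0."""
    return Fraction(sum(1 for i in range(1, n + 1) if digit(i) == 0), n)

print("".join(str(digit(i)) for i in range(1, 33)))  # 10011110000000011111111111111110
print([sojourn_average(2**k - 1) for k in (2, 4, 6)])  # [Fraction(2, 3), Fraction(2, 3), Fraction(2, 3)]
print([sojourn_average(2**k - 1) for k in (3, 5, 7)])  # [Fraction(2, 7), Fraction(10, 31), Fraction(42, 127)]
```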

3.2.2 Time averages


As we observed previously,
$$\tau(E,x) = \lim_n\frac1n\sum_{j=0}^{n-1}\varphi(f^j(x)), \quad\text{where } \varphi = \mathcal{X}_E.$$

The next statement extends Theorem 3.2.1 to the case when ϕ is any integrable
function:
Theorem 3.2.3 (Birkhoff). Let f : M → M be a measurable transformation
and μ be a probability measure invariant under f. Given any integrable
function ϕ : M → R, the limit
$$\tilde\varphi(x) = \lim_{n\to\infty}\frac1n\sum_{j=0}^{n-1}\varphi(f^j(x)) \tag{3.2.2}$$
exists at μ-almost every point x ∈ M. Moreover, the function ϕ̃ defined in this
way is integrable and satisfies
$$\int\tilde\varphi(x)\,d\mu(x) = \int\varphi(x)\,d\mu(x).$$

In a little while, we will obtain this theorem as a special case of a more


general result, the subadditive ergodic theorem. The limit ϕ̃ is called the
time average, or orbital average, of ϕ. The next proposition shows that time
averages are constant on the orbit of μ-almost every point, which generalizes
(3.2.1):

Proposition 3.2.4. Let ϕ : M → R be an integrable function. Then,

ϕ̃(f (x)) = ϕ̃(x) for μ-almost every point x ∈ M. (3.2.3)

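Proof. By definition,
$$\begin{aligned}
\tilde\varphi(f(x)) &= \lim_{n\to\infty}\frac1n\sum_{j=1}^{n}\varphi(f^j(x)) = \lim_{n\to\infty}\Big[\frac1n\sum_{j=0}^{n-1}\varphi(f^j(x)) + \frac1n\big(\varphi(f^n(x)) - \varphi(x)\big)\Big]\\
&= \tilde\varphi(x) + \lim_{n\to\infty}\frac1n\big(\varphi(f^n(x)) - \varphi(x)\big).
\end{aligned}$$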

We need the following lemma:

Lemma 3.2.5. If φ is an integrable function then $\lim_n (1/n)\phi(f^n(x)) = 0$ for
μ-almost every point x ∈ M.

Proof. Fix any ε > 0. Since μ is invariant, we have that
$$\mu\big(\{x\in M : |\phi(f^n(x))|\ge n\varepsilon\}\big) = \mu\big(\{x\in M : |\phi(x)|\ge n\varepsilon\}\big) = \sum_{k=n}^{\infty}\mu\Big(\Big\{x\in M : k\le\frac{|\phi(x)|}{\varepsilon}<k+1\Big\}\Big).$$
Adding these expressions over n ∈ N, we obtain
$$\sum_{n=1}^{\infty}\mu\big(\{x\in M : |\phi(f^n(x))|\ge n\varepsilon\}\big) = \sum_{k=1}^{\infty}k\,\mu\Big(\Big\{x\in M : k\le\frac{|\phi(x)|}{\varepsilon}<k+1\Big\}\Big) \le \int\frac{|\phi|}{\varepsilon}\,d\mu.$$
Since φ is integrable, by assumption, all these expressions are finite. That
implies that the set B(ε) of all points x such that |φ(f^n(x))| ≥ nε for infinitely
many values of n has zero measure (check Exercise A.1.6). Now, the definition
of B(ε) implies that for every x ∉ B(ε) there exists p ≥ 1 such that |φ(f^n(x))| <
nε for every n ≥ p. Consider the set $B = \bigcup_{i=1}^{\infty}B(1/i)$. Then B has zero measure
and $\lim_n (1/n)\phi(f^n(x)) = 0$ for every x ∉ B.

Applying Lemma 3.2.5 to the function φ = ϕ we obtain the identity in


(3.2.3). This completes the proof of Proposition 3.2.4.

In general, the total measure subset of points for which the limit in (3.2.2)
exists depends on the function ϕ under consideration. However, in some
situations it is possible to choose such a set independent of the function. A
useful example of such a situation is:

Theorem 3.2.6. Let M be a compact metric space, f : M → M be a
measurable map and μ be an f-invariant probability measure. Then there exists
some measurable set G ⊂ M with μ(G) = 1 such that
$$\frac1n\sum_{j=0}^{n-1}\varphi(f^j(x))\to\tilde\varphi(x) \tag{3.2.4}$$
for every x ∈ G and every continuous function ϕ : M → R.

Proof. By the Birkhoff ergodic theorem, for every continuous function ϕ there
exists G(ϕ) ⊂ M such that μ(G(ϕ)) = 1 and (3.2.4) holds for every x ∈ G(ϕ).
By Theorem A.3.13, the space C0(M) of continuous functions admits some
countable dense subset {ϕk : k ∈ N}. Take
$$G = \bigcap_{k=1}^{\infty}G(\varphi_k).$$
Since the intersection is countable, it is clear that μ(G) = 1. So, it suffices to
prove that (3.2.4) holds for every continuous function ϕ whenever x ∈ G. This
can be done as follows. Given ϕ ∈ C0(M) and any ε > 0, take k ∈ N such that
$$\|\varphi - \varphi_k\| = \sup\{|\varphi(x) - \varphi_k(x)| : x\in M\}\le\varepsilon.$$
Then, given any point x ∈ G,
$$\limsup_n\frac1n\sum_{j=0}^{n-1}\varphi(f^j(x)) \le \lim_n\frac1n\sum_{j=0}^{n-1}\varphi_k(f^j(x)) + \varepsilon = \tilde\varphi_k(x) + \varepsilon,$$
$$\liminf_n\frac1n\sum_{j=0}^{n-1}\varphi(f^j(x)) \ge \lim_n\frac1n\sum_{j=0}^{n-1}\varphi_k(f^j(x)) - \varepsilon = \tilde\varphi_k(x) - \varepsilon.$$
This implies that
$$\limsup_n\frac1n\sum_{j=0}^{n-1}\varphi(f^j(x)) - \liminf_n\frac1n\sum_{j=0}^{n-1}\varphi(f^j(x)) \le 2\varepsilon.$$
Since ε is arbitrary, it follows that the limit ϕ̃(x) exists, as stated.

In general, one cannot say anything about the speed of convergence in
Theorem 3.2.3. For example, it follows from a theorem of Kakutani and
Petersen (check pages 94 to 99 of Petersen [Pet83]) that if the measure μ
is ergodic² and non-atomic then, given any sequence (an)n of positive real
numbers with limn an = 0, there exists some bounded measurable function ϕ
with
$$\limsup_n\frac{1}{a_n}\Big|\frac1n\sum_{j=0}^{n-1}\varphi(f^j(x)) - \int\varphi\,d\mu\Big| = +\infty.$$

Another interesting observation is that there is no analogue of the Birkhoff
ergodic theorem for infinite invariant measures. Indeed, suppose that μ is a
σ-finite, but infinite, invariant measure of a transformation f : M → M. We say
that a measurable set W ⊂ M is wandering if the pre-images $f^{-i}(W)$, i ≥ 0
are pairwise disjoint. Suppose that μ is ergodic and conservative, that is, such
that every wandering set has zero measure. Then, given any sequence (an)n of
positive real numbers,

1. either, for every non-negative ϕ ∈ L1(μ),
$$\liminf_n\frac{1}{a_n}\sum_{j=0}^{n-1}\varphi\circ f^j = 0 \quad\text{at almost every point;}$$
2. or, there exists $(n_k)_k\to\infty$ such that, for every non-negative ϕ ∈ L1(μ) with ∫ϕ dμ > 0,
$$\lim_k\frac{1}{a_{n_k}}\sum_{j=0}^{n_k-1}\varphi\circ f^j = \infty \quad\text{at almost every point.}$$

This and other related facts about infinite measures are proved in Section 2.4
of Aaronson [Aar97].

3.2.3 Theorem of von Neumann and consequences


The theorem of von Neumann (Theorem 3.1.7) may also be deduced directly
from the theorem of Birkhoff, as we are going to explain.

2 We say that an invariant measure μ is ergodic if f −1 (A) = A up to measure zero implies that
either μ(A) = 0 or μ(Ac ) = 0. The study of ergodic measures will be the subject of the next
chapter.
Consider any ϕ ∈ L2(μ) and let ϕ̃ be the corresponding time average. We
start by showing that ϕ̃ ∈ L2(μ) and its norm satisfies ‖ϕ̃‖2 ≤ ‖ϕ‖2. Indeed,
$$|\tilde\varphi| \le \lim_n\frac1n\sum_{j=0}^{n-1}|\varphi\circ f^j| \quad\text{and, hence,}\quad |\tilde\varphi|^2 \le \lim_n\Big(\frac1n\sum_{j=0}^{n-1}|\varphi\circ f^j|\Big)^2.$$
Then, by the Fatou lemma (Theorem A.2.10),
$$\Big(\int|\tilde\varphi|^2\,d\mu\Big)^{1/2} \le \liminf_n\Big(\int\Big(\frac1n\sum_{j=0}^{n-1}|\varphi\circ f^j|\Big)^2 d\mu\Big)^{1/2}. \tag{3.2.5}$$
We can use the Minkowski inequality (Theorem A.5.3) to bound the sequence
on the right-hand side from above:
$$\Big(\int\Big(\frac1n\sum_{j=0}^{n-1}|\varphi\circ f^j|\Big)^2 d\mu\Big)^{1/2} \le \frac1n\sum_{j=0}^{n-1}\Big(\int|\varphi\circ f^j|^2\,d\mu\Big)^{1/2}. \tag{3.2.6}$$
Since μ is invariant under f, the expression on the right-hand side is equal to
$\big(\int|\varphi|^2\,d\mu\big)^{1/2}$. So, (3.2.5) and (3.2.6) imply that ‖ϕ̃‖2 ≤ ‖ϕ‖2 < ∞.
Now let us show that $(1/n)\sum_{j=0}^{n-1}\varphi\circ f^j$ converges to ϕ̃ in L2(μ). Initially,
suppose that the function ϕ is bounded, that is, there exists C > 0 such that
|ϕ| ≤ C. Then,
$$\Big|\frac1n\sum_{j=0}^{n-1}\varphi\circ f^j\Big|\le C \quad\text{for every } n \quad\text{and}\quad |\tilde\varphi|\le C.$$
Then we may use the dominated convergence theorem (Theorem A.2.11), with the
constant function (2C)² as dominating function, to conclude that
$$\lim_n\int\Big|\frac1n\sum_{j=0}^{n-1}\varphi\circ f^j - \tilde\varphi\Big|^2 d\mu = \int\lim_n\Big|\frac1n\sum_{j=0}^{n-1}\varphi\circ f^j - \tilde\varphi\Big|^2 d\mu = 0.$$
In other words, $(1/n)\sum_{j=0}^{n-1}\varphi\circ f^j$ converges to ϕ̃ in the space L2(μ). We are
left to extend this conclusion to arbitrary functions ϕ in L2(μ). For that, let us
consider some sequence (ϕk)k of bounded functions converging to ϕ in L2(μ).
For example:
$$\varphi_k(x) = \begin{cases}\varphi(x) & \text{if } |\varphi(x)|\le k,\\ 0 & \text{otherwise.}\end{cases}$$
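Denote by ϕ̃k the corresponding time averages. Given any ε > 0, let k0 be fixed
such that ‖ϕ − ϕk‖2 < ε/3 for every k ≥ k0. Note that $\|(\varphi - \varphi_k)\circ f^j\|_2$ is equal
to ‖ϕ − ϕk‖2 for every j ≥ 0, because the measure μ is invariant. Thus,
$$\Big\|\frac1n\sum_{j=0}^{n-1}(\varphi - \varphi_k)\circ f^j\Big\|_2 \le \|\varphi - \varphi_k\|_2 < \varepsilon/3 \quad\text{for every } n\ge 1 \text{ and } k\ge k_0. \tag{3.2.7}$$
Observe also that ϕ̃ − ϕ̃k is the time average of the function ϕ − ϕk. So, the
argument used above to prove that ‖ϕ̃‖2 ≤ ‖ϕ‖2 gives that
$$\|\tilde\varphi - \tilde\varphi_k\|_2 \le \|\varphi - \varphi_k\|_2 < \varepsilon/3 \quad\text{for every } k\ge k_0. \tag{3.2.8}$$
By the bounded case treated above, for every k ≥ 1 there exists n0(k) ≥ 1 such that
$$\Big\|\frac1n\sum_{j=0}^{n-1}\varphi_k\circ f^j - \tilde\varphi_k\Big\|_2 < \varepsilon/3 \quad\text{for every } n\ge n_0(k). \tag{3.2.9}$$
Adding (3.2.7), (3.2.8), (3.2.9) with k = k0 and using the triangle inequality, we get that
$$\Big\|\frac1n\sum_{j=0}^{n-1}\varphi\circ f^j - \tilde\varphi\Big\|_2 < \varepsilon \quad\text{for every } n\ge n_0(k_0).$$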

This completes the proof of the theorem of von Neumann from the theorem of
Birkhoff.
Exercise 3.2.5 contains an extension of these conclusions to any Lp (μ)
space.

Corollary 3.2.7. The time average ϕ̃ of any function ϕ ∈ L2(μ) coincides with
the orthogonal projection P(ϕ) of ϕ to the subspace of invariant functions.

Proof. On the one hand, Theorem 3.1.7 gives that $(1/n)\sum_{j=0}^{n-1}\varphi\circ f^j$ converges
to P(ϕ) in L2(μ). On the other hand, we have just shown that this sequence
converges to ϕ̃ in the space L2(μ). So, by uniqueness of the limit, P(ϕ) = ϕ̃.

Corollary 3.2.8. If f : M → M is invertible then the time averages of any function $\varphi \in L^2(\mu)$ relative to f and to $f^{-1}$ coincide at μ-almost every point:
$$\lim_n\frac{1}{n}\sum_{j=0}^{n-1}\varphi\circ f^{-j} = \lim_n\frac{1}{n}\sum_{j=0}^{n-1}\varphi\circ f^{j} \quad\text{at } \mu\text{-almost every point.} \tag{3.2.10}$$

Proof. By Corollary 3.2.7, applied to $f^{-1}$ and to f, the limit on the left-hand side of (3.2.10) is the orthogonal projection of ϕ to the subspace of functions invariant under $f^{-1}$, whereas the limit on the right-hand side is the orthogonal projection of ϕ to the subspace of functions invariant under f. These two subspaces are exactly the same, because $\psi\circ f = \psi$ if and only if $\psi\circ f^{-1} = \psi$. Thus, the two limits coincide in $L^2(\mu)$, that is, at μ-almost every point.
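As a numerical illustration of Corollary 3.2.8 (a sketch, not part of the proof), one may compare forward and backward Birkhoff averages in an invertible example. The example is an assumption: f is the rotation $x \mapsto x + \alpha \bmod 1$ with $\alpha = (\sqrt5 - 1)/2$, which preserves Lebesgue measure, and ϕ is the indicator function of [0, 1/3). For this rotation both averages should approach $\int\varphi\,d\mu = 1/3$. The helper names are hypothetical.

```python
import math

ALPHA = (math.sqrt(5) - 1) / 2  # irrational rotation number (an assumption of this sketch)

def rotate(x, j):
    """Return f^j(x) = x + j*alpha mod 1; a negative j gives the inverse map f^{-|j|}."""
    return (x + j * ALPHA) % 1.0

def phi(x):
    """Indicator function of the interval [0, 1/3)."""
    return 1.0 if x < 1.0 / 3.0 else 0.0

def birkhoff_average(x, n, direction):
    """(1/n) * sum_{j=0}^{n-1} phi(f^{direction*j}(x)), with direction = +1 (f) or -1 (f^{-1})."""
    return sum(phi(rotate(x, direction * j)) for j in range(n)) / n

forward = birkhoff_average(0.123, 100_000, +1)
backward = birkhoff_average(0.123, 100_000, -1)
print(forward, backward)
```

Both numbers are finite-time averages, so they agree with each other and with 1/3 only approximately.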

3.2.4 Exercises
3.2.1. Let X = {x1 , . . . , xr } be a finite set and σ : X → X be a permutation. We call σ a
cyclic permutation if it admits a unique orbit (containing all r elements of X).
1. Prove that, for any cyclic permutation σ, any function ϕ : X → R and any x ∈ X,
$$\lim_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1}\varphi(\sigma^i(x)) = \frac{\varphi(x_1) + \cdots + \varphi(x_r)}{r}.$$
2. More generally, prove that for any permutation σ, any function ϕ : X → R and any x ∈ X,
$$\lim_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1}\varphi(\sigma^i(x)) = \frac{\varphi(x) + \varphi(\sigma(x)) + \cdots + \varphi(\sigma^{p-1}(x))}{p},$$
where p ≥ 1 is the cardinality of the orbit of x.
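A small exact computation can be used to test examples for Exercise 3.2.1 (it checks instances, it does not prove the statement). The permutation σ = (0 1 2)(3 4) of X = {0, 1, 2, 3, 4} and the values of ϕ are hypothetical; fractions keep the arithmetic exact, so when n is a multiple of the orbit size p the time average equals the orbit average exactly.

```python
from fractions import Fraction

def time_average(sigma, phi, x, n):
    """(1/n) * sum_{i=0}^{n-1} phi(sigma^i(x)), computed exactly with fractions."""
    total = Fraction(0)
    for _ in range(n):
        total += phi[x]
        x = sigma[x]
    return total / n

def orbit_average(sigma, phi, x):
    """(phi(x) + phi(sigma(x)) + ... + phi(sigma^{p-1}(x))) / p, where p is the orbit size."""
    orbit = [x]
    y = sigma[x]
    while y != x:
        orbit.append(y)
        y = sigma[y]
    return Fraction(sum(phi[z] for z in orbit), len(orbit))

# Hypothetical example: sigma = (0 1 2)(3 4), stored as the list of images sigma[x].
sigma = [1, 2, 0, 4, 3]
phi = [1, 5, 9, 2, 4]
for x in range(5):
    print(x, time_average(sigma, phi, x, 3001), orbit_average(sigma, phi, x))
```

For n not divisible by p, the two values differ by a term of order 1/n, in line with part 2.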


3.2.2. Check that Lemma 3.2.5 can also be deduced from the Birkhoff ergodic theorem
and then we may even weaken the hypothesis: it suffices to suppose that φ is
measurable and ψ = φ ◦ f − φ is integrable.
3.2.3. A function ϕ : Z → R is said to be uniformly quasi-periodic if for every ε > 0 there exists L(ε) ∈ N such that every interval {n + 1, . . . , n + L(ε)} in the set of integers contains some τ such that |ϕ(k + τ) − ϕ(k)| < ε for every k ∈ Z. Any such τ is called an ε-quasi-period of ϕ.
(a) Prove that if ϕ is uniformly quasi-periodic then ϕ is bounded.
(b) Show that for every ε > 0 there exists ρ ≥ 1 such that
$$\Big|\frac{1}{\rho}\sum_{j=n\rho+1}^{(n+1)\rho}\varphi(j) - \frac{1}{\rho}\sum_{j=1}^{\rho}\varphi(j)\Big| < 2\varepsilon \quad\text{for every } n \ge 1.$$
(c) Show that the sequence $(1/n)\sum_{j=1}^{n}\varphi(j)$ converges to some real number when n → ∞.
(d) More generally, prove that $\lim_n (1/n)\sum_{k=1}^{n}\varphi(x + k)$ exists for every x ∈ Z and is independent of x.
3.2.4. Prove that for Lebesgue-almost every x ∈ [0, 1], the geometric mean of the integers $a_1, \dots, a_n, \dots$ in the continued fraction expansion of x converges to some real number: in other words, there exists b ∈ R such that $\lim_n (a_1 a_2\cdots a_n)^{1/n} = b$.
[Observation: Compare with Exercise 4.2.12.]
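To see Exercise 3.2.4 at work, the sketch below computes the geometric mean of the first 1500 partial quotients of a pseudo-random point. Assumptions: a random dyadic rational $p/2^{8000}$ stands in for a Lebesgue-typical x, and its digits are produced exactly by Euclid's algorithm, which is the Gauss map $G(x) = 1/x - \lfloor 1/x\rfloor$ written in integers. The limit b is known as Khinchin's constant, approximately 2.685, so the printed mean should be of that order (the convergence is slow). Function names are hypothetical.

```python
import math
import random

def continued_fraction_digits(p, q, count):
    """First `count` partial quotients a_1, a_2, ... of x = p/q, where 0 < p < q.

    One step of the Gauss map sends p/q to 1/x - floor(1/x) = (q mod p)/p and
    produces the digit floor(1/x) = q // p, so this is Euclid's algorithm.
    """
    digits = []
    while p != 0 and len(digits) < count:
        a, r = divmod(q, p)
        digits.append(a)
        p, q = r, p
    return digits

def geometric_mean(digits):
    """(a_1 a_2 ... a_n)^(1/n), computed through logarithms to avoid huge products."""
    return math.exp(sum(math.log(a) for a in digits) / len(digits))

rng = random.Random(2024)
BITS = 8000
numerator = rng.getrandbits(BITS) | 1      # odd, so 0 < x = numerator / 2**BITS < 1
digits = continued_fraction_digits(numerator, 2**BITS, 1500)
print(len(digits), geometric_mean(digits))
```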
3.2.5. Let ϕ : M → R be an integrable function and $\tilde\varphi$ be the corresponding time average, given by Theorem 3.2.3. Show that if $\varphi \in L^p(\mu)$ for some p > 1 then $\tilde\varphi \in L^p(\mu)$ and $\|\tilde\varphi\|_p \le \|\varphi\|_p$. Moreover, $(1/n)\sum_{j=0}^{n-1}\varphi\circ f^j$ converges to $\tilde\varphi$ in the space $L^p(\mu)$.


3.2.6. Prove the Birkhoff ergodic theorem for flows: if μ is a probability measure invariant under a flow f and $\varphi \in L^1(\mu)$ then the function
$$\tilde\varphi(x) = \lim_{T\to\infty}\frac{1}{T}\int_0^T\varphi(f^t(x))\,dt$$
is defined at μ-almost every point and $\int\tilde\varphi\,d\mu = \int\varphi\,d\mu$.


3.2.7. Prove that if a continuous transformation f : M → M of a compact metric space
M admits exactly one invariant probability measure μ, and this measure is such
that μ(A) > 0 for every non-empty open set A ⊂ M, then every orbit of f is dense
in M.

3.3 Subadditive ergodic theorem


A sequence of functions $\varphi_n : M \to \mathbb{R}$ is said to be subadditive for a transformation f : M → M if
$$\varphi_{m+n} \le \varphi_m + \varphi_n\circ f^m \quad\text{for every } m, n \ge 1. \tag{3.3.1}$$
Example 3.3.1. A sequence $\varphi_n : M \to \mathbb{R}$ is additive for the transformation f if $\varphi_{m+n} = \varphi_m + \varphi_n\circ f^m$ for every m, n ≥ 1. For example, the time sums
$$\varphi_n(x) = \sum_{j=0}^{n-1}\varphi(f^j(x))$$
of any function ϕ : M → R form an additive sequence. In fact, every additive sequence is of this form, with $\varphi = \varphi_1$. Of course, additive sequences are also subadditive.
For the next example we need the notion of the norm of a square matrix A of dimension d, which is defined as follows:
$$\|A\| = \sup\Big\{\frac{\|Av\|}{\|v\|} : v \in \mathbb{R}^d\setminus\{0\}\Big\}. \tag{3.3.2}$$
Compare with (2.3.1). It follows directly from the definition that the norm of the product of two matrices is less than or equal to the product of the norms of those matrices:
$$\|AB\| \le \|A\|\,\|B\|. \tag{3.3.3}$$
Example 3.3.2. Let A : M → GL(d) be a measurable function with values in the linear group, that is, the set GL(d) of invertible square matrices of dimension d. Define
$$\phi^n(x) = A(f^{n-1}(x))\cdots A(f(x))A(x)$$
for every n ≥ 1 and x ∈ M. Then the sequence $\varphi_n(x) = \log\|\phi^n(x)\|$ is subadditive. Indeed, $\phi^{m+n}(x) = \phi^n(f^m(x))\,\phi^m(x)$ and so, using (3.3.3),
$$\varphi_{m+n}(x) = \log\|\phi^n(f^m(x))\,\phi^m(x)\| \le \log\|\phi^m(x)\| + \log\|\phi^n(f^m(x))\| = \varphi_m(x) + \varphi_n(f^m(x))$$
for every m, n and x.
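The inequality of Example 3.3.2 can be checked numerically. The sketch below assumes d = 2, so that the norm (3.3.2) has a closed formula (the largest singular value), and models the orbit of x by a hypothetical list `mats` with $A(f^i(x)) = $ `mats[i]`, as for a shift map. It computes the largest value of $\varphi_{m+n}(x) - \varphi_m(x) - \varphi_n(f^m(x))$ over $1 \le m, n \le 20$; by (3.3.3) this cannot be positive, apart from rounding.

```python
import math
import random

def matmul(a, b):
    """Product of 2x2 matrices stored as ((a11, a12), (a21, a22))."""
    return ((a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]),
            (a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]))

def op_norm(m):
    """Norm (3.3.2) for the Euclidean norm on R^2: the largest singular value of m."""
    (a, b), (c, d) = m
    s = a * a + b * b + c * c + d * d        # sigma_1^2 + sigma_2^2
    det = a * d - b * c                      # +/- sigma_1 * sigma_2
    return math.sqrt((s + math.sqrt(max(s * s - 4 * det * det, 0.0))) / 2)

def cocycle(mats, start, n):
    """phi^n at the point f^start(x): mats[start+n-1] ... mats[start+1] mats[start]."""
    prod = ((1.0, 0.0), (0.0, 1.0))
    for i in range(start, start + n):
        prod = matmul(mats[i], prod)         # new factors multiply on the left
    return prod

# Hypothetical data: A(f^i(x)) = mats[i] along one orbit.
rng = random.Random(7)
mats = [((rng.uniform(-2, 2), rng.uniform(-2, 2)), (rng.uniform(-2, 2), rng.uniform(-2, 2)))
        for _ in range(40)]

def log_norm(start, n):
    """varphi_n(f^start(x)) = log ||phi^n(f^start(x))||."""
    return math.log(op_norm(cocycle(mats, start, n)))

worst = max(log_norm(0, m + n) - log_norm(0, m) - log_norm(m, n)
            for m in range(1, 21) for n in range(1, 21))
print(worst)
```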
Recall that, given any function ϕ : M → R, we denote by ϕ + : M → R its
positive part, which is defined by ϕ + (x) = max{ϕ(x), 0}.
Theorem 3.3.3 (Kingman). Let μ be a probability measure invariant under a transformation f : M → M and let $\varphi_n : M \to \mathbb{R}$, n ≥ 1, be a subadditive sequence of measurable functions such that $\varphi_1^+ \in L^1(\mu)$. Then $(\varphi_n/n)_n$ converges at μ-almost every point to some function $\varphi : M \to [-\infty, +\infty)$ that is invariant under f. Moreover, $\varphi^+ \in L^1(\mu)$ and
$$\int\varphi\,d\mu = \lim_n\frac{1}{n}\int\varphi_n\,d\mu = \inf_n\frac{1}{n}\int\varphi_n\,d\mu \in [-\infty, +\infty).$$
The proof of Theorem 3.3.3 that we are going to present is due to Avila
and Bochi [AB], who started from a proof of the Birkhoff ergodic theorem
(Theorem 3.2.3) by Katznelson and Weiss [KW82]. An important observation
is that Theorem 3.2.3 is not used in the arguments. This allows us to obtain the
theorem of Birkhoff as a particular case of Theorem 3.3.3.

3.3.1 Preparing the proof


A sequence (an )n in [−∞, +∞) is said to be subadditive if am+n ≤ am + an for
every m, n ≥ 1.

Lemma 3.3.4. If $(a_n)_n$ is a subadditive sequence then
$$\lim_n\frac{a_n}{n} = \inf_n\frac{a_n}{n} \in [-\infty, \infty). \tag{3.3.4}$$

Proof. If $a_m = -\infty$ for some m then, by subadditivity, $a_n = -\infty$ for every n > m. In that case, both sides of (3.3.4) are equal to −∞ and so the lemma holds. From now on let us assume that $a_n \in \mathbb{R}$ for every n.
Let $L = \inf_n (a_n/n) \in [-\infty, +\infty)$ and B be any real number larger than L. Then we may find k ≥ 1 such that
$$\frac{a_k}{k} < B.$$
For n > k, we may write n = kp + q, where p and q are integers such that p ≥ 1 and 1 ≤ q ≤ k. Then, by subadditivity,
$$a_n \le a_{kp} + a_q \le p\,a_k + a_q \le p\,a_k + \alpha,$$
where $\alpha = \max\{a_i : 1 \le i \le k\}$. Hence,
$$\frac{a_n}{n} \le \frac{pk}{n}\,\frac{a_k}{k} + \frac{\alpha}{n}.$$
Observe that pk/n converges to 1 and α/n converges to zero when n → ∞. So, since $a_k/k < B$, we have that
$$L \le \frac{a_n}{n} < B$$
for every n sufficiently large. Making B → L, we conclude that
$$\lim_n\frac{a_n}{n} = L = \inf_n\frac{a_n}{n}.$$
This completes the argument.
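Lemma 3.3.4 can be illustrated with $a_n = \log\|A^n\|$ for a fixed matrix A, a sequence that is subadditive by (3.3.3). For this particular sequence, the common value of the limit and the infimum is the logarithm of the spectral radius of A (Gelfand's formula). The non-normal matrix below is a hypothetical choice with spectral radius 1 and $\|A^n\|$ bounded by about 20, so $a_n/n$ decreases to 0 without ever reaching it. The sketch assumes d = 2 and the Euclidean operator norm.

```python
import math

def matmul(a, b):
    """Product of 2x2 matrices stored as ((a11, a12), (a21, a22))."""
    return ((a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]),
            (a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]))

def op_norm(m):
    """Largest singular value of a 2x2 matrix (the norm (3.3.2))."""
    (a, b), (c, d) = m
    s = a * a + b * b + c * c + d * d
    det = a * d - b * c
    return math.sqrt((s + math.sqrt(max(s * s - 4 * det * det, 0.0))) / 2)

A = ((1.0, 10.0), (0.0, 0.5))   # hypothetical non-normal matrix with spectral radius 1

def log_norms_of_powers(a, count):
    """[a_1, ..., a_count] with a_n = log ||A^n||, a subadditive sequence."""
    values, power = [], ((1.0, 0.0), (0.0, 1.0))
    for _ in range(count):
        power = matmul(a, power)
        values.append(math.log(op_norm(power)))
    return values

a = log_norms_of_powers(A, 200)                # a[n - 1] holds a_n
ratios = [a[n - 1] / n for n in range(1, 201)]
print(ratios[0], ratios[9], ratios[-1], min(ratios))
```

The slow decay of $a_n/n$ (roughly like $\log 20/n$) shows why Lemma 3.3.4 asserts an infimum rather than a minimum.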



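Now let $(\varphi_n)_n$ be as in Theorem 3.3.3. By subadditivity,
$$\varphi_n \le \varphi_1 + \varphi_1\circ f + \cdots + \varphi_1\circ f^{n-1}.$$
This relation remains valid if we replace $\varphi_n$ and $\varphi_1$ by their positive parts $\varphi_n^+$ and $\varphi_1^+$. Hence, the hypothesis that $\varphi_1^+ \in L^1(\mu)$ implies that $\varphi_n^+ \in L^1(\mu)$ for every n. Moreover, since $(\varphi_n)_n$ is subadditive and μ is invariant under f,
$$a_n = \int\varphi_n\,d\mu, \quad n \ge 1,$$
is a subadditive sequence in [−∞, +∞). Therefore, by Lemma 3.3.4, the limit
$$L = \lim_n\frac{a_n}{n} = \inf_n\frac{a_n}{n} \in [-\infty, \infty)$$
exists. Define $\varphi_- : M \to [-\infty, \infty]$ and $\varphi_+ : M \to [-\infty, \infty]$ through
$$\varphi_-(x) = \liminf_n\frac{\varphi_n}{n}(x) \quad\text{and}\quad \varphi_+(x) = \limsup_n\frac{\varphi_n}{n}(x).$$
Clearly, $\varphi_-(x) \le \varphi_+(x)$ for every x ∈ M. We are going to prove that
$$\int\varphi_-\,d\mu \ge L \ge \int\varphi_+\,d\mu, \tag{3.3.5}$$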

as long as each function $\varphi_n$ is bounded from below. Consequently, since $\varphi_- \le \varphi_+$, the two functions $\varphi_-$ and $\varphi_+$ coincide at μ-almost every point and their integral is equal to L. Thus, the theorem will be proven in this case, with $\varphi = \varphi_- = \varphi_+$ (the fact that ϕ is invariant under f is part of Exercise 3.3.2). At the end, we remove that boundedness assumption using a truncation trick.

3.3.2 Key lemma


In this section we assume that $\varphi_- > -\infty$ at every point. Fix ε > 0 and define, for each k ∈ N,
$$E_k = \big\{x \in M : \varphi_j(x) \le j\,(\varphi_-(x) + \varepsilon) \text{ for some } j \in \{1, \dots, k\}\big\}.$$
It is clear that $E_k \subset E_{k+1}$ for every k. Moreover, the definition of $\varphi_-(x)$ implies that $M = \bigcup_k E_k$. Define also
$$\psi_k(x) = \begin{cases}\varphi_-(x) + \varepsilon & \text{if } x \in E_k,\\ \varphi_1(x) & \text{if } x \in E_k^c.\end{cases}$$
It follows from the definition of $E_k$ that $\varphi_1(x) > \varphi_-(x) + \varepsilon$ for every $x \in E_k^c$. Combining this fact with the previous observations, we see that the sequence $(\psi_k(x))_k$ is non-increasing and converges to $\varphi_-(x) + \varepsilon$, for every x ∈ M. In particular, by the monotone convergence theorem (Theorem A.2.9),
$$\int\psi_k\,d\mu \to \int(\varphi_- + \varepsilon)\,d\mu \quad\text{as } k \to \infty.$$

The crucial step in the proof of the theorem is the following estimate:
Figure 3.1. Decomposition of the trajectory of a point. [Two panels marking the times $m_0, n_1, m_1, \dots, n_l, m_l$ up to n, with the stretches spent in $E_k^c$ labeled; in the second panel, $n_{l+1} < n$.]

Lemma 3.3.5. For every n > k ≥ 1 and μ-almost every x ∈ M,
$$\varphi_n(x) \le \sum_{i=0}^{n-k-1}\psi_k(f^i(x)) + \sum_{i=n-k}^{n-1}\max\{\psi_k, \varphi_1\}(f^i(x)).$$

Proof. Take x ∈ M such that $\varphi_-(x) = \varphi_-(f^j(x))$ for every j ≥ 1 (this holds at μ-almost every point, according to Exercise 3.3.2). Consider the sequence, possibly finite, of integers
$$m_0 \le n_1 < m_1 \le n_2 < m_2 \le \cdots \tag{3.3.6}$$
defined inductively as follows (see also Figure 3.1).
Define $m_0 = 0$. Let $n_j$ be the smallest integer greater than or equal to $m_{j-1}$ satisfying $f^{n_j}(x) \in E_k$ (if it exists). Then, by the definition of $E_k$, there exists $m_j$ such that $1 \le m_j - n_j \le k$ and
$$\varphi_{m_j - n_j}(f^{n_j}(x)) \le (m_j - n_j)\big(\varphi_-(f^{n_j}(x)) + \varepsilon\big). \tag{3.3.7}$$
This completes the definition of the sequence (3.3.6). Now, given n > k, let l ≥ 0 be the largest integer such that $m_l \le n$. By subadditivity,
$$\varphi_{n_j - m_{j-1}}(f^{m_{j-1}}(x)) \le \sum_{i=m_{j-1}}^{n_j - 1}\varphi_1(f^i(x))$$
for every j = 1, . . . , l such that $m_{j-1} \ne n_j$, and analogously for $\varphi_{n-m_l}(f^{m_l}(x))$ when $m_l < n$. Thus,
$$\varphi_n(x) \le \sum_{i\in I}\varphi_1(f^i(x)) + \sum_{j=1}^{l}\varphi_{m_j - n_j}(f^{n_j}(x)), \tag{3.3.8}$$
where $I = \bigcup_{j=1}^{l}[m_{j-1}, n_j) \cup [m_l, n)$. Observe that
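$$\varphi_1(f^i(x)) = \psi_k(f^i(x)) \quad\text{for every } i \in \bigcup_{j=1}^{l}[m_{j-1}, n_j) \cup [m_l, \min\{n_{l+1}, n\}),$$
since $f^i(x) \in E_k^c$ in all these cases. Moreover, since $\varphi_-$ is constant on orbits (see Exercise 3.3.2) and $\psi_k \ge \varphi_- + \varepsilon$, the relation (3.3.7) gives that
$$\varphi_{m_j - n_j}(f^{n_j}(x)) \le \sum_{i=n_j}^{m_j - 1}\big(\varphi_-(f^i(x)) + \varepsilon\big) \le \sum_{i=n_j}^{m_j - 1}\psi_k(f^i(x))$$
for every j = 1, . . . , l. In this way, using (3.3.8), we conclude that
$$\varphi_n(x) \le \sum_{i=0}^{\min\{n_{l+1}, n\} - 1}\psi_k(f^i(x)) + \sum_{i=n_{l+1}}^{n-1}\varphi_1(f^i(x)),$$
where the last sum is empty unless $n_{l+1} < n$. Finally, $n_{l+1} > n - k$, since otherwise $m_{l+1} \le n_{l+1} + k \le n$, contradicting the choice of l. So the first sum contains all the terms with $0 \le i \le n - k - 1$, and every remaining term has $i \ge n - k$ and is bounded by $\max\{\psi_k, \varphi_1\}(f^i(x))$. This proves the lemma.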

3.3.3 Estimating ϕ−
Towards establishing (3.3.5), in this section we prove the following lemma:

Lemma 3.3.6. $\int\varphi_-\,d\mu = L$.

Proof. Suppose for a while that $\varphi_n/n$ is uniformly bounded from below, that is, that there exists κ > 0 such that $\varphi_n/n \ge -\kappa$ for every n. Applying Fatou's lemma (Theorem A.2.10) to the sequence of non-negative functions $\varphi_n/n + \kappa$, we get that $\varphi_-$ is integrable and
$$\int\varphi_-\,d\mu \le \lim_n\int\frac{\varphi_n}{n}\,d\mu = L.$$
To prove the opposite inequality, observe that Lemma 3.3.5 and the invariance of μ imply
$$\frac{1}{n}\int\varphi_n\,d\mu \le \frac{n-k}{n}\int\psi_k\,d\mu + \frac{k}{n}\int\max\{\psi_k, \varphi_1\}\,d\mu. \tag{3.3.9}$$
Note that $\max\{\psi_k, \varphi_1\} \le \max\{\varphi_- + \varepsilon, \varphi_1^+\}$, and this last function is integrable. So, the limit superior of the last term in (3.3.9) as n → ∞ is less than or equal to zero. Hence, making n → ∞ we get that $L \le \int\psi_k\,d\mu$ for every k. Then, making k → ∞, we conclude that
$$L \le \int\varphi_-\,d\mu + \varepsilon.$$
Finally, making ε → 0 we get that $L \le \int\varphi_-\,d\mu$. This proves the lemma when $\varphi_n/n$ is uniformly bounded from below.
We are left to remove this hypothesis. Define, for each κ > 0,
$$\varphi_n^\kappa = \max\{\varphi_n, -\kappa n\} \quad\text{and}\quad \varphi_-^\kappa = \max\{\varphi_-, -\kappa\}.$$
The sequence $(\varphi_n^\kappa)_n$ satisfies all the conditions of Theorem 3.3.3: indeed, it is subadditive and the positive part of $\varphi_1^\kappa$ is integrable. Moreover, it is clear that $\varphi_-^\kappa = \liminf_n(\varphi_n^\kappa/n)$. So, the argument in the previous paragraph shows that
$$\int\varphi_-^\kappa\,d\mu = \inf_n\frac{1}{n}\int\varphi_n^\kappa\,d\mu. \tag{3.3.10}$$
By the monotone convergence theorem (Theorem A.2.9), we also have that
$$\int\varphi_n\,d\mu = \inf_\kappa\int\varphi_n^\kappa\,d\mu \quad\text{and}\quad \int\varphi_-\,d\mu = \inf_\kappa\int\varphi_-^\kappa\,d\mu. \tag{3.3.11}$$
Combining the relations (3.3.10) and (3.3.11), we get that
$$\int\varphi_-\,d\mu = \inf_\kappa\int\varphi_-^\kappa\,d\mu = \inf_\kappa\inf_n\frac{1}{n}\int\varphi_n^\kappa\,d\mu = \inf_n\frac{1}{n}\int\varphi_n\,d\mu = L.$$
This completes the proof of the lemma.

3.3.4 Bounding ϕ+
To complete the proof of (3.3.5), we are now going to show that $\int\varphi_+\,d\mu \le L$ as long as $\inf_x\varphi_n(x)$ is finite for every n. Let us start by proving the following auxiliary result:

Lemma 3.3.7. For any fixed k,
$$\limsup_n\frac{\varphi_{kn}}{n} = k\limsup_n\frac{\varphi_n}{n}.$$
Proof. The inequality ≤ is clear, since $(\varphi_{kn}/kn)_n$ is a subsequence of $(\varphi_n/n)_n$. To prove the opposite inequality, let us write $n = kq_n + r_n$ with $r_n \in \{1, \dots, k\}$. By subadditivity,
$$\varphi_n \le \varphi_{kq_n} + \varphi_{r_n}\circ f^{kq_n} \le \varphi_{kq_n} + \psi\circ f^{kq_n} \quad\text{for every } n > k,$$
where $\psi = \max\{\varphi_1^+, \dots, \varphi_k^+\}$. Observe that $n/q_n \to k$ when n → ∞. Moreover, as $\psi \in L^1(\mu)$, we may use Lemma 3.2.5 to see that $\psi\circ f^n/n$ converges to zero at μ-almost every point. Hence, dividing all the terms in the previous relation by n and taking the limit superior as n → ∞, we get that
$$\limsup_n\frac{1}{n}\varphi_n \le \limsup_n\frac{1}{n}\varphi_{kq_n} + \limsup_n\frac{1}{n}\psi\circ f^{kq_n} = \frac{1}{k}\limsup_q\frac{1}{q}\varphi_{kq},$$
as stated in the lemma.

Lemma 3.3.8. Suppose that $\inf_x\varphi_n(x) > -\infty$ for every n. Then $\int\varphi_+\,d\mu \le L$.
Proof. For each k and n ≥ 1, consider $\theta_n = -\sum_{j=0}^{n-1}\varphi_k\circ f^{jk}$. Observe that
$$\int\theta_n\,d\mu = -n\int\varphi_k\,d\mu \quad\text{for every } n, \tag{3.3.12}$$
since $f^k$ preserves the measure μ. Since the sequence $(\varphi_n)_n$ is subadditive, $\theta_n \le -\varphi_{kn}$ for every n. Hence, using Lemma 3.3.7,
$$\theta_- = \liminf_n\frac{\theta_n}{n} \le -\limsup_n\frac{\varphi_{kn}}{n} = -k\limsup_n\frac{\varphi_n}{n} = -k\varphi_+,$$
and so
$$\int\theta_-\,d\mu \le -k\int\varphi_+\,d\mu. \tag{3.3.13}$$
Observe also that the sequence $(\theta_n)_n$ is additive for the transformation $f^k$: $\theta_{m+n} = \theta_m + \theta_n\circ f^{km}$ for every m, n ≥ 1. Since $\theta_1 = -\varphi_k$ is bounded from above by $-\inf\varphi_k$, we also have that the function $\theta_1^+$ is bounded and, consequently, integrable. Thus, we may apply Lemma 3.3.6, with $f^k$ in place of f, together with the equality (3.3.12), to conclude that
$$\int\theta_-\,d\mu = \inf_n\int\frac{\theta_n}{n}\,d\mu = -\int\varphi_k\,d\mu. \tag{3.3.14}$$
Putting (3.3.13) and (3.3.14) together we get that
$$\int\varphi_+\,d\mu \le \frac{1}{k}\int\varphi_k\,d\mu.$$
Finally, taking the infimum over k we get that $\int\varphi_+\,d\mu \le L$.

Lemmas 3.3.6 and 3.3.8 imply the relation (3.3.5) and, thus, Theorem 3.3.3 is proven when $\inf\varphi_k > -\infty$ for every k. In the general case, consider
$$\varphi_n^\kappa = \max\{\varphi_n, -\kappa n\}, \quad \varphi_-^\kappa = \max\{\varphi_-, -\kappa\} \quad\text{and}\quad \varphi_+^\kappa = \max\{\varphi_+, -\kappa\}$$
for every constant κ > 0. The previous arguments may be applied to the sequence $(\varphi_n^\kappa)_n$ for each fixed κ > 0. Hence, $\varphi_+^\kappa = \varphi_-^\kappa$ at μ-almost every point for every κ > 0. Since $\varphi_-^\kappa \to \varphi_-$ and $\varphi_+^\kappa \to \varphi_+$ when κ → ∞, it follows that $\varphi_- = \varphi_+$ at μ-almost every point. The proof of Theorem 3.3.3 is complete.

3.3.5 Lyapunov exponents


We have observed previously that every sequence of time sums
$$\varphi_n = \sum_{j=0}^{n-1}\varphi\circ f^j, \quad n \ge 1,$$
is additive and, in particular, subadditive. Therefore, the ergodic theorem of Birkhoff (Theorem 3.2.3) is a particular case of Theorem 3.3.3. Another important consequence of the subadditive ergodic theorem is the theorem of Furstenberg–Kesten that we state next.
Let f : M → M be a measurable transformation and μ be an invariant probability measure. Consider any measurable function θ : M → GL(d) with values in the group GL(d). The cocycle defined by θ over f is the sequence of functions defined by
$$\phi^n(x) = \theta(f^{n-1}(x))\cdots\theta(f(x))\,\theta(x) \quad\text{for } n \ge 1 \qquad\text{and}\qquad \phi^0(x) = \mathrm{id}$$
for every x ∈ M. We leave it to the reader to check that
$$\phi^{m+n}(x) = \phi^n(f^m(x))\cdot\phi^m(x) \quad\text{for every } m, n \in \mathbb{N} \text{ and } x \in M. \tag{3.3.15}$$
It is also easy to check that, conversely, any sequence $(\phi^n)_n$ with this property is the cocycle defined by $\theta = \phi^1$ over the transformation f.
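Theorem 3.3.9 (Furstenberg–Kesten). If $\log^+\|\theta\| \in L^1(\mu)$ then
$$\lambda_{\max}(x) = \lim_n\frac{1}{n}\log\|\phi^n(x)\|$$
exists at μ-almost every point. Moreover, $\lambda_{\max}^+ \in L^1(\mu)$ and
$$\int\lambda_{\max}\,d\mu = \lim_n\frac{1}{n}\int\log\|\phi^n\|\,d\mu = \inf_n\frac{1}{n}\int\log\|\phi^n\|\,d\mu.$$
If $\log^+\|\theta^{-1}\| \in L^1(\mu)$ then
$$\lambda_{\min}(x) = \lim_n -\frac{1}{n}\log\|\phi^n(x)^{-1}\|$$
exists at μ-almost every point. Moreover, $\lambda_{\min}^- \in L^1(\mu)$ and
$$\int\lambda_{\min}\,d\mu = \lim_n -\frac{1}{n}\int\log\|(\phi^n)^{-1}\|\,d\mu = \sup_n -\frac{1}{n}\int\log\|(\phi^n)^{-1}\|\,d\mu.$$
To deduce this result from Theorem 3.3.3 it suffices to note that the sequences
$$\varphi_n^{\max}(x) = \log\|\phi^n(x)\| \quad\text{and}\quad \varphi_n^{\min}(x) = \log\|\phi^n(x)^{-1}\|$$
are subadditive (recall Example 3.3.2), with $(\varphi_1^{\max})^+ = \log^+\|\theta\|$ and $(\varphi_1^{\min})^+ = \log^+\|\theta^{-1}\|$.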
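A rough numerical sketch of Theorem 3.3.9, under explicit assumptions: the base is a Bernoulli shift, so that $\theta(f^i(x))$ is an independent uniform choice between two hypothetical matrices A, B ∈ SL(2, R), and d = 2, so that $\|M^{-1}\| = \|M\|/|\det M|$. The product is renormalized at each step and the logarithms of the scaling factors are accumulated, which gives $\log\|\phi^n(x)\|$ without overflow. Since $\det\phi^n = 1$ here, the two estimates satisfy $\lambda_{\min} = -\lambda_{\max}$, consistent with (c1) below. Names are hypothetical.

```python
import math
import random

def matmul(a, b):
    """Product of 2x2 matrices stored as ((a11, a12), (a21, a22))."""
    return ((a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]),
            (a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]))

def op_norm(m):
    """Largest singular value of a 2x2 matrix (the norm (3.3.2))."""
    (a, b), (c, d) = m
    s = a * a + b * b + c * c + d * d
    det = a * d - b * c
    return math.sqrt((s + math.sqrt(max(s * s - 4 * det * det, 0.0))) / 2)

def extremal_exponents(choices, n, seed=0):
    """Estimate (lambda_max, lambda_min) for products of matrices drawn i.i.d. from `choices`."""
    rng = random.Random(seed)
    prod = ((1.0, 0.0), (0.0, 1.0))
    log_norm = 0.0      # accumulates log ||phi^n(x)||
    log_det = 0.0       # accumulates log |det phi^n(x)|
    for _ in range(n):
        theta = rng.choice(choices)           # theta(f^i(x)) for the Bernoulli shift f
        (a, b), (c, d) = theta
        log_det += math.log(abs(a * d - b * c))
        prod = matmul(theta, prod)
        scale = op_norm(prod)
        log_norm += math.log(scale)
        prod = tuple(tuple(entry / scale for entry in row) for row in prod)  # renormalize
    lam_max = log_norm / n
    # For a 2x2 matrix M, ||M^{-1}|| = ||M|| / |det M|, hence
    # -(1/n) log ||(phi^n)^{-1}|| = (1/n) (log |det phi^n| - log ||phi^n||).
    lam_min = (log_det - log_norm) / n
    return lam_max, lam_min

A = ((2.0, 1.0), (1.0, 1.0))   # hypothetical matrices with determinant 1
B = ((1.0, 1.0), (0.0, 1.0))
lam_max, lam_min = extremal_exponents([A, B], 20_000)
print(lam_max, lam_min)
```

Renormalizing at every step is a standard numerical device; the accumulated logarithm equals $\log\|\phi^n(x)\|$ exactly in exact arithmetic.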
The multiplicative ergodic theorem of Oseledets, which we are going to state in the following, provides a major refinement of the conclusion of the Furstenberg–Kesten theorem. It asserts that, under the same hypotheses as Theorem 3.3.9, for μ-almost every x ∈ M there exist a positive integer k = k(x), real numbers $\lambda_1(x) > \cdots > \lambda_k(x)$ and a filtration (that is, a decreasing sequence of vector subspaces)
$$\mathbb{R}^d = V_x^1 \supsetneq \cdots \supsetneq V_x^k \supsetneq V_x^{k+1} = \{0\} \tag{3.3.16}$$
such that, for every i ∈ {1, . . . , k} and μ-almost every x ∈ M,
(a1) $k(f(x)) = k(x)$ and $\lambda_i(f(x)) = \lambda_i(x)$ and $\theta(x)\cdot V_x^i = V_{f(x)}^i$;
(b1) $\lim_n\frac{1}{n}\log\|\phi^n(x)v\| = \lambda_i(x)$ for every $v \in V_x^i\setminus V_x^{i+1}$;
(c1) $\lim_n\frac{1}{n}\log|\det\phi^n(x)| = \sum_{i=1}^{k}d_i(x)\lambda_i(x)$, where $d_i(x) = \dim V_x^i - \dim V_x^{i+1}$.
Moreover, the numbers k(x) and $\lambda_1(x), \dots, \lambda_k(x)$ and the subspaces $V_x^1, \dots, V_x^k$ depend measurably on the point x.
The numbers λi (x) are called the Lyapunov exponents of θ at the point x.
They satisfy λ1 = λmax and λk = λmin . For this reason, we also call λmax (x)
and λmin (x) the extremal Lyapunov exponents of θ at the point x. Each di (x) is
called the multiplicity of the Lyapunov exponent λi (x).
When f is invertible, we may extend the sequence $\phi^n$ to the whole of Z, through
$$\phi^{-n}(x) = \phi^n(f^{-n}(x))^{-1} \quad\text{for every } n \ge 1 \text{ and } x \in M.$$
Assuming also that $\log^+\|\theta^{-1}\| \in L^1(\mu)$, one obtains a stronger conclusion than before: more than a filtration, there is a decomposition
$$\mathbb{R}^d = E_x^1 \oplus \cdots \oplus E_x^k \tag{3.3.17}$$
such that, for every i = 1, . . . , k,
(a2) $\theta(x)\cdot E_x^i = E_{f(x)}^i$ and $V_x^i = V_x^{i+1}\oplus E_x^i$; so, $\dim E_x^i = \dim V_x^i - \dim V_x^{i+1}$;
(b2) $\lim_{n\to\pm\infty}\frac{1}{n}\log\|\phi^n(x)v\| = \lambda_i(x)$ for every $v \in E_x^i$ different from zero;
(c2) $\lim_{n\to+\infty}\frac{1}{n}\log|\det\phi^n(x)| = \sum_{i=1}^{k}d_i(x)\lambda_i(x)$, where $d_i(x) = \dim E_x^i$.

The reader will find a much more detailed discussion of these results,
including proofs, in Chapter 4 of [Via14].

3.3.6 Exercises
3.3.1. Give a direct proof of the Birkhoff ergodic theorem (Theorem 3.2.3), using the
approach in the proof of Theorem 3.3.3.
3.3.2. Given a subadditive sequence $(\varphi_n)_n$ with $\varphi_1^+ \in L^1(\mu)$, show that the functions
$$\varphi_- = \liminf_n\frac{\varphi_n}{n} \quad\text{and}\quad \varphi_+ = \limsup_n\frac{\varphi_n}{n}$$
are f-invariant, that is, they satisfy $\varphi_-(x) = \varphi_-\circ f(x)$ and $\varphi_+(x) = \varphi_+\circ f(x)$ for μ-almost every x ∈ M.
3.3.3. State and prove the subadditive ergodic theorem for flows.
3.3.4. Let M be a compact manifold and f : M → M be a diffeomorphism of class C¹ that preserves the Lebesgue measure μ. Check that
$$\sum_{i=1}^{k(x)}d_i(x)\lambda_i(x) = 0 \quad\text{at } \mu\text{-almost every point } x \in M,$$
where $\lambda_i(x)$, i = 1, . . . , k(x), are the Lyapunov exponents of Df at the point x and $d_i(x)$, i = 1, . . . , k(x), are the corresponding multiplicities.
3.3.5. Let $(\varphi_n)_n$ be a subadditive sequence of functions for some transformation f : M → M. We call the time constant of $(\varphi_n)_n$ the number
$$\lim_n\frac{1}{n}\int\varphi_n\,d\mu$$
when it exists. Assuming that the limit does exist and is finite, show that we may write $\varphi_n = \psi_n + \gamma_n$ for each n, in such a way that $(\psi_n)_n$ is an additive sequence and $(\gamma_n)_n$ is a subadditive sequence with time constant equal to zero.
3.3.6. Under the assumptions of the Furstenberg–Kesten theorem, show that the sequence $\psi_n = (1/n)\log\|\phi^n\|$ is uniformly integrable, in the following sense: for every ε > 0 there exists δ > 0 such that
$$\mu(E) < \delta \;\Rightarrow\; \int_E\psi_n^+\,d\mu < \varepsilon \quad\text{for every } n.$$

3.3.7. Under the assumptions of the Furstenberg–Kesten theorem, let $\tilde\psi_k$ denote the time average of the function $\psi_k = (1/k)\log\|\phi^k\|$ relative to the transformation $f^k$. Show that $\lambda_{\max}(x) \le \tilde\psi_k(x)$ for every k and μ-almost every x. Using Exercise 3.3.6, show that for every ρ > 0 and μ-almost every x there exists k such that $\tilde\psi_k(x) \le \lambda_{\max}(x) + \rho$.

3.4 Discrete time and continuous time


Most of the time, we focus our presentation on dynamical systems with discrete time. However, almost everything that was said so far extends,
more or less straightforwardly, to systems with continuous time. One reason
why the two theories are so similar is that one may relate systems of either kind
to systems of the other kind, through certain constructions. That is the subject
of the present section. For simplicity, we stick to the case of invertible systems.
The general case may be handled using the notion of natural extension, which
was described in Section 2.4.2.

3.4.1 Suspension flows


Our first construction associates with every invertible map f : M → M and
every measurable function τ : M → (0, ∞) a flow gt : N → N, t ∈ R, that we
call the suspension of f with return time τ , whose recurrence properties are
directly related to the recurrence properties of f . In particular, we associate a
measure ν invariant under this flow with every measure μ invariant under f .
For this construction we assume that the function τ is such that
$$\sum_{j=1}^{\infty}\tau(f^j(x)) = \sum_{j=1}^{\infty}\tau(f^{-j}(x)) = +\infty \tag{3.4.1}$$
for every x ∈ M. That is the case, for instance, if τ is bounded away from zero.
The first step is to construct the domain N of the suspension flow. Let us
consider the transformation F : M × R → M × R defined by
F(x, s) = (f (x), s − τ (x)).
Note that F is invertible. Let ∼ be the equivalence relation defined in M × R
by
(x, s) ∼ (x̃, s̃) ⇔ there exists n ∈ Z such that F n (x, s) = (x̃, s̃).
We denote by N the set of equivalence classes and by π : M × R → N the
canonical projection associating with every (x, s) ∈ M × R the corresponding
equivalence class.
Now consider the flow Gt : M × R → M × R given by Gt (x, s) = (x, s + t). It
is clear that Gt ◦ F = F ◦ Gt for every t ∈ R. This ensures that Gt , t ∈ R induces
a flow gt , t ∈ R in the quotient space N, given by
gt (π(x, s)) = π(Gt (x, s)) for every x ∈ M and s, t ∈ R. (3.4.2)
Indeed, if π(x, s) = π(x̃, s̃) then there exists n ∈ Z such that F n (x, s) = (x̃, s̃).
Hence,
Gt (x̃, s̃) = Gt ◦ F n (x, s) = F n ◦ Gt (x, s)
and so π(Gt (x, s)) = π(Gt (x̃, s̃)). This shows that the flow gt , t ∈ R is well
defined.
To better understand how this flow is related to the transformation f, we need to revisit the construction from a more concrete point of view. Let us consider D = {(x, s) ∈ M × R : 0 ≤ s < τ(x)}. We claim that D is a fundamental domain for the equivalence relation ∼, that is, it contains exactly one representative of each equivalence class. Uniqueness of the representative is immediate: just observe that if (x, s) ∈ D then $F^n(x, s) = (x_n, s_n)$ with $s_n < 0$ for every n > 0 and $s_n \ge \tau(f^n(x))$ for every n < 0. To prove existence, we need the condition (3.4.1): it ensures that the iterates $(x_n, s_n) = F^n(x, s)$ of any (x, s) satisfy
$$\lim_{n\to+\infty}s_n = -\infty \quad\text{and}\quad \lim_{n\to-\infty}s_n = +\infty.$$
Then, taking n maximum such that $s_n \ge 0$, we have $s_n - \tau(x_n) = s_{n+1} < 0$, so that $(x_n, s_n) \in D$. In this way, the claim is proved. Now observe that the claim means that the restriction of the projection π to the domain D is a bijection onto N. Thus, we may identify N with D and, in particular, we may consider $g^t$, t ∈ R, as a flow in D.
In just the same way, we may identify M with the subset Σ = π(M × {0}) of N. Observing that
$$g^{\tau(x)}(\pi(x, 0)) = \pi(x, \tau(x)) = \pi(f(x), 0), \tag{3.4.3}$$
we see that, through this identification, the transformation f : M → M corresponds to the first-return map (or Poincaré return map) of the suspension flow to Σ. See Figure 3.2.
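The identification of N with D suggests a direct way to compute the suspension flow: move up the fiber and, whenever the roof s = τ(x) is reached, apply F, that is, pass to (f(x), s − τ(x)); in negative time, apply $F^{-1}$ below the floor s = 0. The sketch below assumes a concrete example, the rotation f(x) = x + 2/7 mod 1 with return time τ(x) = 1 + x (bounded away from zero, so (3.4.1) holds), and uses exact rational arithmetic so that (3.4.3) and the flow property $g^{t_1}\circ g^{t_2} = g^{t_1+t_2}$ can be checked exactly. All names are hypothetical.

```python
from fractions import Fraction

ALPHA = Fraction(2, 7)  # hypothetical rotation number

def f(x):
    """Invertible base map: rotation x -> x + alpha mod 1."""
    return (x + ALPHA) % 1

def f_inv(x):
    return (x - ALPHA) % 1

def tau(x):
    """Return time, bounded away from zero so that (3.4.1) holds."""
    return 1 + x

def suspension_flow(point, t):
    """Apply g^t to a point (x, s) of the fundamental domain D = {0 <= s < tau(x)}."""
    x, s = point
    s += t
    # Forward: at the roof s = tau(x), apply F, i.e. pass to (f(x), s - tau(x)).
    while s >= tau(x):
        s -= tau(x)
        x = f(x)
    # Backward: below the floor s = 0, apply F^{-1}, i.e. pass to (f^{-1}(x), s + tau(f^{-1}(x))).
    while s < 0:
        x = f_inv(x)
        s += tau(x)
    return (x, s)

x0 = Fraction(1, 3)
print(suspension_flow((x0, Fraction(0)), tau(x0)), (f(x0), 0))   # relation (3.4.3)
```

For instance, `suspension_flow((x, 0), tau(x))` returns `(f(x), 0)`, which is (3.4.3) read in D; since each point has a unique representative in D, the flow property holds exactly.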
Now let μ be a measure on M invariant under f . Let us denote by ds the
Lebesgue measure on the real line R. It is clear that the (infinite) measure
μ × ds is invariant under the flow Gt , t ∈ R. Moreover, it is invariant under the
transformation F, since μ is invariant under f . We call the suspension of μ with
return time τ the measure ν defined on N by

$$\nu = \pi_*(\mu\times ds \mid D). \tag{3.4.4}$$

In other words, ν is the measure such that
$$\int\psi\,d\nu = \int\int_0^{\tau(x)}\psi(\pi(x, s))\,ds\,d\mu(x)$$
for every bounded measurable function ψ : N → [0, ∞). In particular,
$$\nu(N) = \int 1\,d\nu = \int\tau(x)\,d\mu(x) \tag{3.4.5}$$
is finite if and only if the function τ is integrable with respect to μ.

Figure 3.2. Suspension flow. [The fiber over x ∈ M has height τ(x); its top (x, τ(x)) is glued to the base point (f(x), 0).]

Proposition 3.4.1. The flow gt , t ∈ R preserves the measure ν.

Proof. Let us fix t ∈ R. Given any measurable set B ⊂ N, let $\hat B = \pi^{-1}(B)\cap D$. By the definition of ν, we have that $\nu(B) = (\mu\times ds)(\hat B)$. For each n ∈ Z, let $\hat B_n$ be the set of all pairs $(x, s) \in \hat B$ such that $G^{-t}(x, s) \in F^n(D)$ and let $B_n = \pi(\hat B_n)$. Since D is a fundamental domain, $\{\hat B_n : n \in \mathbb{Z}\}$ is a partition of $\hat B$ and $\{B_n : n \in \mathbb{Z}\}$ is a partition of B. Moreover, $\hat B_n = \pi^{-1}(B_n)\cap D$ and, consequently, $\nu(B_n) = (\mu\times ds)(\hat B_n)$ for every n. The definition of the suspension flow gives that
$$\pi^{-1}\big(g^{-t}(B_n)\big) = G^{-t}\big(\pi^{-1}(B_n)\big) = G^{-t}\Big(\bigcup_{k\in\mathbb{Z}}F^k(\hat B_n)\Big) = \bigcup_{k\in\mathbb{Z}}F^k\big(G^{-t}(\hat B_n)\big).$$
Observing that $F^{-n}(G^{-t}(\hat B_n)) \subset D$, we conclude that
$$\nu\big(g^{-t}(B_n)\big) = (\mu\times ds)\big(\pi^{-1}(g^{-t}(B_n))\cap D\big) = (\mu\times ds)\big(F^{-n}(G^{-t}(\hat B_n))\big).$$
As the measure μ × ds is invariant under both F and $G^t$, the last expression is equal to $(\mu\times ds)(\hat B_n)$. Therefore,
$$\nu(g^{-t}(B)) = \sum_{n\in\mathbb{Z}}\nu(g^{-t}(B_n)) = \sum_{n\in\mathbb{Z}}(\mu\times ds)(\hat B_n) = (\mu\times ds)(\hat B) = \nu(B).$$
This proves that ν is invariant under the flow $g^t$, t ∈ R.

In Exercise 3.4.2 we invite the reader to relate the recurrence properties of the systems (f, μ) and $(g^t, \nu)$.

3.4.2 Poincaré maps


Next, we present a kind of inverse for the construction described in the previous section. Let $g^t : N \to N$, t ∈ R, be a measurable flow and ν be an invariant measure. Let Σ ⊂ N be a cross-section of the flow, that is, a subset of N such that for every x ∈ Σ there exists τ(x) ∈ (0, ∞] such that $g^t(x) \notin \Sigma$ for every t ∈ (0, τ(x)) and $g^{\tau(x)}(x) \in \Sigma$ whenever τ(x) is finite. We call τ(x) the first-return time of x to Σ. Our goal is to construct, starting from ν, a measure μ that is invariant under the first-return map (or Poincaré return map)
$$f : \{x \in \Sigma : \tau(x) < \infty\} \to \Sigma, \quad f(x) = g^{\tau(x)}(x).$$
Observe that this map is injective.
For each ρ > 0, denote $\Sigma_\rho = \{x \in \Sigma : \tau(x) \ge \rho\}$. Given $A \subset \Sigma_\rho$ and δ ∈ (0, ρ], consider $A_\delta = \{g^t(x) : x \in A \text{ and } 0 \le t < \delta\}$. Observe that the map $(x, t) \mapsto g^t(x)$ is a bijection from A × [0, δ) to $A_\delta$. Assume that Σ is endowed with a σ-algebra of measurable subsets such that

1. the function τ and the maps f and $f^{-1}$ are measurable;
2. if $A \subset \Sigma_\rho$ is measurable then $A_\delta \subset N$ is measurable, for every δ ∈ (0, ρ].

Lemma 3.4.2. Let A be a measurable subset of $\Sigma_\rho$ for some ρ > 0. Then, the function $\delta \mapsto \nu(A_\delta)/\delta$ is constant in the interval (0, ρ].

Proof. Consider any δ ∈ (0, ρ] and l ≥ 1. It is clear that
$$A_\delta = \bigcup_{i=0}^{l-1}g^{i\delta/l}(A_{\delta/l}),$$
and this is a disjoint union. Using that ν is invariant under the flow $g^t$, t ∈ R, we conclude that $\nu(A_\delta) = l\,\nu(A_{\delta/l})$ for every δ ∈ (0, ρ] and every l ≥ 1. Then, writing r = p/l with 1 ≤ p < l and decomposing $A_{p\delta/l}$ in the same way into the p sets $g^{i\delta/l}(A_{\delta/l})$, 0 ≤ i < p, we get $\nu(A_{r\delta}) = r\,\nu(A_\delta)$ for every δ ∈ (0, ρ] and every rational number r ∈ (0, 1). Using, furthermore, the fact that both sides of this relation vary monotonically with r, we get that the equality remains true for every real number r ∈ (0, 1). This implies the conclusion of the lemma.

Given any measurable subset A of $\Sigma_\rho$, ρ > 0, let us define
$$\mu(A) = \frac{\nu(A_\delta)}{\delta} \quad\text{for any } \delta \in (0, \rho]. \tag{3.4.6}$$
Next, given any measurable subset A of Σ, let
$$\mu(A) = \sup_\rho\mu(A\cap\Sigma_\rho). \tag{3.4.7}$$
See Figure 3.3. We leave it to the reader to check that μ is a measure in Σ (Exercise 3.4.1). We call it the flux of ν through Σ under the flow.
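The flux (3.4.6) can be estimated by Monte Carlo in a simple assumed example: the linear flow $g^t(x, y) = (x + ta, y + tb) \bmod 1$ on the 2-torus with b > 0, which preserves the area measure ν, and the cross-section Σ = {y = 0}, on which every point has first-return time 1/b. For an arc A ⊂ Σ of length ℓ, the flow box $A_\delta$ is a parallelogram of area bδℓ, so the flux should be μ(A) = bℓ, independently of δ, as Lemma 3.4.2 predicts. The values of a and b and the helper names are hypothetical.

```python
import random

A_SLOPE, B_SLOPE = 0.4142, 0.7   # hypothetical velocity (a, b) of the linear flow, b > 0

def in_flow_box(u, v, interval, delta):
    """Is (u, v) in A_delta = {g^t(x, 0) : x in A, 0 <= t < delta}?

    Here A = [interval[0], interval[1]) on the circle y = 0. We assume b * delta < 1,
    so the box does not wrap around vertically and the time t = v / b is unique.
    """
    if not (0.0 <= v < B_SLOPE * delta):
        return False
    t = v / B_SLOPE
    x = (u - t * A_SLOPE) % 1.0      # flow the point back to the cross-section
    return interval[0] <= x < interval[1]

def flux_estimate(interval, delta, samples=200_000, seed=1):
    """Monte Carlo estimate of nu(A_delta) / delta, with nu the area measure on the torus."""
    rng = random.Random(seed)
    hits = sum(in_flow_box(rng.random(), rng.random(), interval, delta) for _ in range(samples))
    return hits / samples / delta

# Both values should be close to b * length = 0.7 * 0.3 = 0.21.
print(flux_estimate((0.0, 0.3), 0.5), flux_estimate((0.0, 0.3), 1.0))
```

Here the density of μ with respect to length on Σ is the constant b, the normal component of the velocity field.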

Proposition 3.4.3. Suppose that the measure ν is finite. Then the measure μ in Σ is invariant under the Poincaré map f.

Figure 3.3. Flux of a measure through a cross-section. [A set A ⊂ Σ and its image f(A), with the flow box $f(A)_\delta$.]

Proof. Start by observing that the map f is essentially surjective: the complement of the image f(Σ) has measure zero. Indeed, suppose that there exists a set E with μ(E) > 0 contained in Σ \ f(Σ). It is no restriction to assume that $E \subset \Sigma_\rho$ for some ρ > 0. Then, $\nu(E_\rho) > 0$. Since ν is finite, by assumption, we may apply the Poincaré recurrence theorem to the flow $g^{-t}$, t ∈ R. We get that there exists $z \in E_\rho$ such that $g^{-s}(z) \in E_\rho$ for arbitrarily large values of s > 0. By definition, $z = g^t(y)$ for some y ∈ E and some t ∈ [0, ρ). Since $g^{-s}(z) \in E_\rho$ for some large s > 0, the backward trajectory of y intersects Σ. Hence, there exists x ∈ Σ such that f(x) = y. This contradicts the choice of E. Thus, the claim is proved.
Given any measurable set B ⊂ Σ, let us denote $A = f^{-1}(B)$. Moreover, given ε ∈ (0, 1/2), let us consider a countable partition of B into measurable subsets $B_i$ satisfying the following conditions: for every i there is $\rho_i > 0$ such that

1. $B_i$ and $A_i = f^{-1}(B_i)$ are contained in $\Sigma_{\rho_i}$;
2. $\sup(\tau \mid A_i) - \inf(\tau \mid A_i) < \varepsilon\rho_i$.

Next, choose $t_i < \inf(\tau \mid A_i) \le \sup(\tau \mid A_i) < s_i$ such that $s_i - t_i < \varepsilon\rho_i$. Fix $\delta_i = \rho_i/2$ and write $A_{i,\delta}$ and $B_{i,\delta}$ for the sets $(A_i)_\delta$ and $(B_i)_\delta$. Then, using the fact that f is essentially surjective,
$$g^{t_i}(A_{i,\delta_i}) \supset B_{i,\delta_i - (s_i - t_i)} \quad\text{and}\quad g^{s_i}(A_{i,\delta_i}) \subset B_{i,\delta_i + (s_i - t_i)}.$$
Hence, using the hypothesis that ν is invariant,
$$\nu(A_{i,\delta_i}) = \nu(g^{t_i}(A_{i,\delta_i})) \ge \nu(B_{i,\delta_i - (s_i - t_i)}), \qquad \nu(A_{i,\delta_i}) = \nu(g^{s_i}(A_{i,\delta_i})) \le \nu(B_{i,\delta_i + (s_i - t_i)}).$$
Dividing by $\delta_i$ and using Lemma 3.4.2, we get that
$$\mu(A_i) \ge \Big(1 - \frac{s_i - t_i}{\delta_i}\Big)\mu(B_i) \ge (1 - 2\varepsilon)\mu(B_i), \qquad \mu(A_i) \le \Big(1 + \frac{s_i - t_i}{\delta_i}\Big)\mu(B_i) \le (1 + 2\varepsilon)\mu(B_i).$$
Finally, adding over all the values of i, we conclude that
$$(1 - 2\varepsilon)\mu(B) \le \mu(A) \le (1 + 2\varepsilon)\mu(B).$$
Since ε is arbitrary, this proves that the measure μ is invariant under f.

3.4.3 Exercises
3.4.1. Check that the function μ defined by (3.4.6)–(3.4.7) is a measure.
3.4.2. In the context of Section 3.4.1, suppose that M is a topological space and f : M →
M and τ : M → (0, ∞) are continuous. Let gt : N → N be the suspension flow
and ν be the suspension of some Borel measure μ invariant under f .
(a) Show that if x ∈ M is recurrent for the transformation f then π(x, s) ∈ N is
recurrent for the flow gt , for every s ∈ R.
(b) Show that if π(x, s) ∈ N is recurrent for the flow gt , for some s ∈ R, then
x ∈ M is recurrent for f .

(c) Conclude that the set of recurrent points of f has total measure for μ if and
only if the set of recurrent points of gt , t ∈ R has total measure for ν. In
particular, this happens if at least one of the measures μ or ν is finite.
3.4.3. Let $g^t : N \to N$, t ∈ R, be the flow defined by a vector field X of class C¹ on a compact Riemannian manifold N. Assume that this flow preserves the volume measure ν associated with the Riemannian metric. Let Σ be a hypersurface of N transverse to X and $\nu_\Sigma$ be the volume measure on Σ associated with the restriction of the Riemannian metric. Define $\phi : \Sigma \to (0, \infty)$ through $\phi(y) = |X(y)\cdot n(y)|$, where n(·) is a unit vector field orthogonal to Σ. Show that the measure $\eta = \phi\,\nu_\Sigma$ is invariant under the Poincaré map f : Σ → Σ of the flow. Indeed, η coincides with the flux of ν through Σ.
3.4.4. The following construction has a significant role in the theory of interval exchanges. Let $\hat N \subset \mathbb{R}^4_+$ be the set of all 4-tuples $(\lambda_1,\lambda_2,h_1,h_2)$ of positive real numbers, endowed with the standard volume measure $\hat\nu = d\lambda_1\,d\lambda_2\,dh_1\,dh_2$. Define
$$F : \hat N \to \hat N, \quad F(\lambda_1,\lambda_2,h_1,h_2) = \begin{cases} (\lambda_1-\lambda_2,\ \lambda_2,\ h_1,\ h_1+h_2) & \text{if } \lambda_1 > \lambda_2,\\ (\lambda_1,\ \lambda_2-\lambda_1,\ h_1+h_2,\ h_2) & \text{if } \lambda_1 < \lambda_2.\end{cases}$$
($F$ is not defined when $\lambda_1 = \lambda_2$.) Let $N$ be the quotient of $\hat N$ by the equivalence relation $z \sim \tilde z \Leftrightarrow F^n(z) = \tilde z$ for some $n \in \mathbb{Z}$ and let $\pi : \hat N \to N$ be the canonical projection. Define
$$G_t : \hat N \to \hat N,\ t \in \mathbb{R}, \quad G_t(\lambda_1,\lambda_2,h_1,h_2) = (e^t\lambda_1,\ e^t\lambda_2,\ e^{-t}h_1,\ e^{-t}h_2).$$
Let $\hat a : \hat N \to (0,\infty)$ be the functional given by $\hat a(\lambda_1,\lambda_2,h_1,h_2) = \lambda_1h_1 + \lambda_2h_2$. For each $c > 0$, let $\hat N_c$ be the subset of all $x \in \hat N$ such that $\hat a(x) = c$, let $\hat\nu_c$ be the volume measure defined on $\hat N_c$ by the restriction of the Riemannian metric of $\mathbb{R}^4_+$ and let $\hat\eta_c = \hat\nu_c / \|\operatorname{grad}\hat a\|$.
(a) Show that $F$ preserves the functional $\hat a$ and so there exists a functional $a : N \to (0,\infty)$ such that $a\circ\pi = \hat a$. Show that $G_t$ commutes with $F$ and preserves $\hat a$. Hence, $(G_t)_t$ induces a flow $(g_t)_t$ in the quotient space $N$ that preserves the functional $a$. Check that $F$ and $(G_t)_t$ preserve $\hat\nu$ and $\hat\eta_c$ for every $c$.
(b) Check that $D = \{(\lambda_1,\lambda_2,h_1,h_2) : \lambda_1+\lambda_2 \ge 1 > \max\{\lambda_1,\lambda_2\}\}$ is a fundamental domain for $\sim$. Consider the measure $\nu = \pi_*(\hat\nu \mid D)$ on $N$. Check that the definition does not depend on the choice of the fundamental domain and show that $\nu$ is invariant under the flow $(g_t)_t$. Is the measure $\nu$ finite?
(c) Check that $\Sigma = \pi(\{(\lambda_1,\lambda_2,h_1,h_2) : \lambda_1+\lambda_2 = 1\})$ is a cross-section of the flow $(g_t)_t$. Calculate the Poincaré map $f : \Sigma \to \Sigma$ and the corresponding first-return time function $\tau$. Calculate the flux $\mu$ of the measure $\nu$ through $\Sigma$. Is the measure $\mu$ finite?
(d) For every $c > 0$, let $N_c = \pi(\hat N_c)$ and $\eta_c = \pi_*(\hat\eta_c \mid D)$. Show that $N_c$ and $\eta_c$ are invariant under $(g_t)_t$, for every $c > 0$. Check that $\eta_c(N_c) < \infty$ for every $c$. Conclude that $\nu$-almost every point is recurrent for the flow $(g_t)_t$.
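The algebraic identities in part (a) can be sanity-checked numerically before proving them. The Python sketch below is only an illustration, not a proof, and the helper names `F`, `G`, `a_hat` and `check_exercise_344` are hypothetical. It samples random points of $\hat N$ away from the line $\lambda_1 = \lambda_2$ and tests $\hat a\circ F = \hat a$, $\hat a\circ G_t = \hat a$ and $F\circ G_t = G_t\circ F$ up to floating-point tolerance; it does not test the preservation of $\hat\nu$ or $\hat\eta_c$.

```python
import math
import random

def F(z):
    """The map F of Exercise 3.4.4 on 4-tuples (lambda1, lambda2, h1, h2)."""
    l1, l2, h1, h2 = z
    if l1 > l2:
        return (l1 - l2, l2, h1, h1 + h2)
    if l1 < l2:
        return (l1, l2 - l1, h1 + h2, h2)
    raise ValueError("F is not defined when lambda1 == lambda2")

def G(t, z):
    """The map G_t: lengths scaled by e^t, heights scaled by e^(-t)."""
    l1, l2, h1, h2 = z
    return (math.exp(t) * l1, math.exp(t) * l2, math.exp(-t) * h1, math.exp(-t) * h2)

def a_hat(z):
    """The functional a_hat = lambda1*h1 + lambda2*h2."""
    l1, l2, h1, h2 = z
    return l1 * h1 + l2 * h2

def check_exercise_344(samples=1000, seed=0, tol=1e-9):
    """Test a_hat∘F = a_hat, a_hat∘G_t = a_hat and F∘G_t = G_t∘F at random points."""
    rng = random.Random(seed)
    for _ in range(samples):
        z = tuple(rng.uniform(0.1, 2.0) for _ in range(4))
        if abs(z[0] - z[1]) < 1e-6:
            continue  # stay away from the line where F is undefined
        t = rng.uniform(-1.0, 1.0)
        assert math.isclose(a_hat(F(z)), a_hat(z), rel_tol=tol)
        assert math.isclose(a_hat(G(t, z)), a_hat(z), rel_tol=tol)
        for u, v in zip(F(G(t, z)), G(t, F(z))):
            assert math.isclose(u, v, rel_tol=tol, abs_tol=tol)
    return True

print(check_exercise_344())  # True
```

On each of the regions $\lambda_1 > \lambda_2$ and $\lambda_1 < \lambda_2$ the map $F$ is linear with an integer matrix of determinant 1, which is one ingredient for the preservation of $\hat\nu$ asked for in (a).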
4
Ergodicity

The theorems presented in the previous chapter fully establish the first part of
Boltzmann’s ergodic hypothesis: for any measurable set E, the mean sojourn
time τ (E, x) is well defined for almost every point x. The second part of the
ergodic hypothesis, that is, the claim that τ (E, x) should coincide with the
measure of E for almost every x, is a statement of a different nature and is
the subject of the present chapter.
In this chapter we always take μ to be a probability measure invariant under
some measurable transformation f : M → M. We say that the system (f , μ) is
ergodic if, given any measurable set E, we have τ (E, x) = μ(E) for μ-almost
every point x ∈ M. We are going to see that this is equivalent to saying that
the system is dynamically indivisible, in the sense that every invariant set
has either full measure or zero measure. Other equivalent formulations of the
ergodicity property are discussed in Section 4.1. One of them is that time
averages coincide with space averages: for every integrable function $\varphi$,
$$\lim_n \frac{1}{n}\sum_{j=0}^{n-1}\varphi(f^j(x)) = \int\varphi\,d\mu \quad\text{at } \mu\text{-almost every point.}$$

In Section 4.2 we illustrate, by means of examples, several techniques to prove or disprove ergodicity. Most of them will be utilized again later in more complex situations. Next, we take the following viewpoint: we fix the dynamical system and analyze the properties of ergodic measures within the space of all invariant measures of that dynamical system. As we are going to see in Section 4.3, the ergodic measures are precisely the extremal elements of that space.
In Section 4.4 we give a brief outline of the historical development of ergodic theory in the context of conservative systems. The main highlights are KAM theory, so named in homage to Andrey Kolmogorov, Vladimir Arnold and Jürgen Moser, and hyperbolic dynamics, which was initiated by Stephen Smale, Dmitry Anosov, Yakov Sinai and their collaborators. The two theories deal with distinct types of dynamical behavior, elliptic and hyperbolic, and they reach opposing conclusions: roughly speaking, hyperbolic systems are ergodic whereas elliptic systems are not.

4.1 Ergodic systems


We use the expressions “the measure μ is ergodic with respect to the
transformation f ” or “the transformation f is ergodic with respect to the
measure μ” to mean the same thing, namely, that the system (f , μ) is ergodic.
Recall that, by definition, this means that the mean sojourn time of μ-almost every point in any measurable set coincides with the measure of that set.
This condition can be rephrased in several equivalent ways, as we are going to
see next.

4.1.1 Invariant sets and functions


A measurable function $\varphi : M \to \mathbb{R}$ is said to be invariant if $\varphi = \varphi\circ f$ at $\mu$-almost every point. In other words, $\varphi$ is invariant if it is constant on every trajectory of $f$ outside a zero measure subset. Moreover, we say that a measurable set $B \subset M$ is invariant if its characteristic function $\mathcal{X}_B$ is an invariant function. Equivalently, $B$ is invariant if it differs from its pre-image $f^{-1}(B)$ by a zero measure set:
$$\mu\big(B \,\Delta\, f^{-1}(B)\big) = 0.$$

Exercise 1.1.4 collects some equivalent formulations of this property. It is easy to check that the family of all invariant sets is a σ-algebra, that is, it is closed under countable unions and intersections and under passage to the complement.

Example 4.1.1. Let $f : [0,1] \to [0,1]$ be the decimal expansion transformation introduced in Section 1.3.1, and $\mu$ be the Lebesgue measure. Clearly, the set $A = \mathbb{Q}\cap[0,1]$ of rational numbers is invariant. Other interesting examples are the sets of points $x = 0.a_1a_2\ldots$ in $[0,1]$ with prescribed proportions of digits $a_i$ with each value $k \in \{0,\ldots,9\}$. More precisely, given any vector $p = (p_0,\ldots,p_9)$ such that $p_i \ge 0$ for every $i$ and $\sum_i p_i = 1$, define
$$A_p = \Big\{x : \lim_n \frac{1}{n}\#\{1 \le i \le n : a_i = k\} = p_k \text{ for } k = 0,\ldots,9\Big\}.$$
Observe that if $x = 0.a_1a_2\ldots$ then every point $y \in f^{-1}(x)$ may be written as $y = 0.ba_1a_2\ldots$ with $b \in \{0,\ldots,9\}$. It is clear that the extra digit $b$ does not affect the proportion of digits with any of the values $0,\ldots,9$ in the decimal expansion. Thus, $y \in A_p$ if and only if $x \in A_p$. This implies that $A_p$ is indeed invariant under $f$.

Example 4.1.2. Let $\varphi : [0,1] \to \mathbb{R}$ be a function in $L^1(\mu)$. According to the ergodic theorem of Birkhoff (Theorem 3.2.3), the time average $\tilde\varphi$ is an invariant function. So, every level set
$$B_c = \{x \in [0,1] : \tilde\varphi(x) = c\}$$
is an invariant set. Observe also that every invariant function is of this form: it is clear that if $\varphi$ is invariant then it coincides with its time average $\tilde\varphi$ at $\mu$-almost every point.

The next proposition collects a few equivalent ways to define ergodicity. We say that a function $\varphi$ is constant at $\mu$-almost every point if there exists $c \in \mathbb{R}$ such that $\varphi(x) = c$ for $\mu$-almost every $x \in M$.

Proposition 4.1.3. Let $\mu$ be an invariant probability measure of a measurable transformation $f : M \to M$. The following conditions are all equivalent:
(i) For every measurable set $B \subset M$ one has $\tau(B,x) = \mu(B)$ for $\mu$-almost every point.
(ii) For every measurable set $B \subset M$ the function $\tau(B,\cdot)$ is constant at $\mu$-almost every point.
(iii) For every integrable function $\varphi : M \to \mathbb{R}$ one has $\tilde\varphi(x) = \int\varphi\,d\mu$ for $\mu$-almost every point.
(iv) For every integrable function $\varphi : M \to \mathbb{R}$ the time average $\tilde\varphi : M \to \mathbb{R}$ is constant at $\mu$-almost every point.
(v) For every invariant integrable function $\psi : M \to \mathbb{R}$ one has $\psi(x) = \int\psi\,d\mu$ for $\mu$-almost every point.
(vi) Every invariant integrable function $\psi : M \to \mathbb{R}$ is constant at $\mu$-almost every point.
(vii) For every invariant subset $A$ we have either $\mu(A) = 0$ or $\mu(A) = 1$.

Proof. It is immediate that (i) implies (ii), that (iii) implies (iv) and that
(v) implies (vi). It is also clear that (v) implies (iii) and (vi) implies (iv),
because the time average is an invariant function (recall Proposition 3.2.4).
Analogously, (iii) implies (i) and (iv) implies (ii), because the mean sojourn
time is a time average (of the characteristic function of B). We are left to prove
the following implications:
(ii) implies (vii): Let $A$ be an invariant set. Then $\tau(A,x) = 1$ for $\mu$-almost every $x \in A$ and $\tau(A,x) = 0$ for $\mu$-almost every $x \in A^c$. Since $\tau(A,\cdot)$ is assumed to be constant at $\mu$-almost every point, $A$ and $A^c$ cannot both have positive measure; it follows that $\mu(A) = 0$ or $\mu(A) = 1$.
(vii) implies (v): Let $\psi$ be an invariant integrable function. Then every sublevel set
$$B_c = \{x \in M : \psi(x) \le c\}$$
is an invariant set. So, the hypothesis implies that $\mu(B_c) \in \{0,1\}$ for every $c \in \mathbb{R}$. Since $c \mapsto \mu(B_c)$ is non-decreasing, it follows that there exists $\bar c \in \mathbb{R}$ such that $\mu(B_c) = 0$ for every $c < \bar c$ and $\mu(B_c) = 1$ for every $c \ge \bar c$. Then $\psi = \bar c$ at $\mu$-almost every point. Hence, $\int\psi\,d\mu = \bar c$ and so $\psi = \int\psi\,d\mu$ at $\mu$-almost every point.

4.1.2 Spectral characterization


The next proposition characterizes the ergodicity property in terms of the Koopman operator $U_f(\varphi) = \varphi\circ f$:
Proposition 4.1.4. Let $\mu$ be an invariant probability measure of a measurable transformation $f : M \to M$. The following conditions are equivalent:
(i) $(f,\mu)$ is ergodic.
(ii) For any pair of measurable sets $A$ and $B$ one has
$$\lim_n \frac{1}{n}\sum_{j=0}^{n-1}\mu\big(f^{-j}(A)\cap B\big) = \mu(A)\mu(B). \qquad (4.1.1)$$
(iii) For any functions $\varphi \in L^p(\mu)$ and $\psi \in L^q(\mu)$, with $1/p + 1/q = 1$, one has
$$\lim_n \frac{1}{n}\sum_{j=0}^{n-1}\int(U_f^j\varphi)\,\psi\,d\mu = \int\varphi\,d\mu\int\psi\,d\mu. \qquad (4.1.2)$$

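Proof. It is clear that (iii) implies (ii): just take $\varphi = \mathcal{X}_A$ and $\psi = \mathcal{X}_B$, and note that $U_f^j\mathcal{X}_A = \mathcal{X}_{f^{-j}(A)}$. To show that (ii) implies (i), let $A$ be an invariant set. Then $\mu(f^{-j}(A)\cap A) = \mu(A)$ for every $j$, so taking $B = A$ in hypothesis (ii) we get that
$$\mu(A) = \lim_n \frac{1}{n}\sum_{j=0}^{n-1}\mu\big(f^{-j}(A)\cap A\big) = \mu(A)^2.$$
This implies that $\mu(A) = 0$ or $\mu(A) = 1$.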


Now it suffices to prove that (i) implies (iii). Consider any $\varphi \in L^p(\mu)$ and $\psi \in L^q(\mu)$. By ergodicity and the ergodic theorem of Birkhoff (Theorem 3.2.3) we have that
$$\frac{1}{n}\sum_{j=0}^{n-1}U_f^j\varphi \to \int\varphi\,d\mu \qquad (4.1.3)$$
at $\mu$-almost every point. Initially, assume that $|\varphi| \le k$ for some $k \ge 1$. Then, for every $n \in \mathbb{N}$,
$$\Big|\Big(\frac{1}{n}\sum_{j=0}^{n-1}U_f^j\varphi\Big)\psi\Big| \le k|\psi|.$$
So, since $k|\psi| \in L^1(\mu)$, we may use the dominated convergence theorem (Theorem A.2.11) to conclude that
$$\int\Big(\frac{1}{n}\sum_{j=0}^{n-1}U_f^j\varphi\Big)\psi\,d\mu \to \int\varphi\,d\mu\int\psi\,d\mu.$$
This proves the claim (4.1.2) when $\varphi$ is bounded. All that is left to do is remove this restriction. Given any $\varphi \in L^p(\mu)$ and $k \ge 1$, define
$$\varphi_k(x) = \begin{cases} k & \text{if } \varphi(x) > k,\\ \varphi(x) & \text{if } \varphi(x) \in [-k,k],\\ -k & \text{if } \varphi(x) < -k.\end{cases}$$

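Fix $\varepsilon > 0$. By the previous argument, for every $k \ge 1$ one has
$$\Big|\int\Big(\frac{1}{n}\sum_{j=0}^{n-1}U_f^j\varphi_k\Big)\psi\,d\mu - \int\varphi_k\,d\mu\int\psi\,d\mu\Big| < \varepsilon \qquad (4.1.4)$$
if $n$ is large enough (depending on $k$). Next, observe that $\|\varphi_k-\varphi\|_p \to 0$ when $k \to \infty$: this is clear when $p = \infty$, because $\varphi_k = \varphi$ for every $k > \|\varphi\|_\infty$; for $p < \infty$ use the monotone convergence theorem (Theorem A.2.9). Hence, using the Hölder inequality (Theorem A.5.5), we have that
$$\Big|\int(\varphi_k-\varphi)\,d\mu\int\psi\,d\mu\Big| \le \|\varphi_k-\varphi\|_p\,\Big|\int\psi\,d\mu\Big| < \varepsilon, \qquad (4.1.5)$$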
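for every $k$ sufficiently large. Similarly,
$$\begin{aligned}\Big|\int\Big(\frac{1}{n}\sum_{j=0}^{n-1}U_f^j(\varphi_k-\varphi)\Big)\psi\,d\mu\Big| &\le \frac{1}{n}\sum_{j=0}^{n-1}\Big|\int U_f^j(\varphi_k-\varphi)\,\psi\,d\mu\Big|\\ &\le \frac{1}{n}\sum_{j=0}^{n-1}\|U_f^j(\varphi_k-\varphi)\|_p\,\|\psi\|_q\\ &= \|\varphi_k-\varphi\|_p\,\|\psi\|_q < \varepsilon,\end{aligned} \qquad (4.1.6)$$
for every $n$ and every sufficiently large $k$ (independently of $n$); the equality holds because $\mu$ is invariant, so that $\|U_f^j g\|_p = \|g\|_p$ for every $g \in L^p(\mu)$. Fix $k$ so that (4.1.5) and (4.1.6) hold and then take $n$ sufficiently large such that (4.1.4) also holds. Summing the three relations (4.1.4) to (4.1.6), we get that
$$\Big|\int\Big(\frac{1}{n}\sum_{j=0}^{n-1}U_f^j\varphi\Big)\psi\,d\mu - \int\varphi\,d\mu\int\psi\,d\mu\Big| < 3\varepsilon$$
for every $n$ sufficiently large. This gives condition (iii).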

In the case $p = q = 2$, condition (4.1.2) may be expressed in terms of the inner product $\cdot$ in the space $L^2(\mu)$. In this way we get that $(f,\mu)$ is ergodic if and only if
$$\lim_n \frac{1}{n}\sum_{j=0}^{n-1}\big(U_f^j\varphi - (\varphi\cdot 1)\big)\cdot\psi = 0 \quad\text{for every } \varphi,\psi \in L^2(\mu). \qquad (4.1.7)$$

We will use a few times the following elementary facts: given any measurable sets $A$ and $B$,
$$|\mu(A)-\mu(B)| = |\mu(A\setminus B)-\mu(B\setminus A)| \le \mu(A\setminus B)+\mu(B\setminus A) = \mu(A\,\Delta\,B), \qquad (4.1.8)$$
and given any sets $A_1, A_2, B_1, B_2$,
$$(A_1\cap A_2)\,\Delta\,(B_1\cap B_2) \subset (A_1\,\Delta\,B_1)\cup(A_2\,\Delta\,B_2). \qquad (4.1.9)$$
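Both facts are elementary, but they are easy to sanity-check by brute force. The Python sketch below (hypothetical helper names) draws random subsets of a finite set $\{0,\ldots,n-1\}$, uses the normalized counting measure in place of $\mu$, and tests (4.1.8) and (4.1.9). For Python sets, the operator `^` is exactly the symmetric difference $\Delta$ and `<=` tests inclusion.

```python
import random

def mu(s, n):
    """Normalized counting measure of a subset s of {0, ..., n-1}."""
    return len(s) / n

def check_symmetric_difference_facts(n=40, trials=2000, seed=1):
    """Brute-force test of (4.1.8) and (4.1.9) on random subsets of {0, ..., n-1}."""
    rng = random.Random(seed)

    def random_subset():
        return {x for x in range(n) if rng.random() < 0.5}

    for _ in range(trials):
        A, B, A1, A2, B1, B2 = (random_subset() for _ in range(6))
        # (4.1.8): |mu(A) - mu(B)| <= mu(A Δ B); for sets, ^ is the symmetric difference.
        assert abs(mu(A, n) - mu(B, n)) <= mu(A ^ B, n) + 1e-12
        # (4.1.9): (A1 ∩ A2) Δ (B1 ∩ B2) is contained in (A1 Δ B1) ∪ (A2 Δ B2).
        assert ((A1 & A2) ^ (B1 & B2)) <= ((A1 ^ B1) | (A2 ^ B2))
    return True

print(check_symmetric_difference_facts())  # True
```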
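Corollary 4.1.5. Assume that the condition (4.1.1) in Proposition 4.1.4 holds for every $A$ and $B$ in some algebra $\mathcal{A}$ that generates the σ-algebra of measurable sets. Then $(f,\mu)$ is ergodic.
Proof. Let $A$ and $B$ be arbitrary measurable sets. By the approximation theorem (Theorem A.1.19), given any $\varepsilon > 0$ there are $A_0$ and $B_0$ in $\mathcal{A}$ such that $\mu(A\,\Delta\,A_0) < \varepsilon$ and $\mu(B\,\Delta\,B_0) < \varepsilon$. Observe that, by (4.1.8) and (4.1.9),
$$\begin{aligned}\big|\mu\big(f^{-j}(A)\cap B\big) - \mu\big(f^{-j}(A_0)\cap B_0\big)\big| &\le \mu\big(f^{-j}(A)\,\Delta\,f^{-j}(A_0)\big) + \mu(B\,\Delta\,B_0)\\ &= \mu(A\,\Delta\,A_0) + \mu(B\,\Delta\,B_0) < 2\varepsilon\end{aligned}$$
(the equality uses the fact that $\mu$ is an invariant measure) for every $j$, and
$$|\mu(A)\mu(B) - \mu(A_0)\mu(B_0)| \le \mu(A\,\Delta\,A_0) + \mu(B\,\Delta\,B_0) < 2\varepsilon.$$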
Then, the hypothesis
$$\lim_n \frac{1}{n}\sum_{j=0}^{n-1}\mu\big(f^{-j}(A_0)\cap B_0\big) = \mu(A_0)\mu(B_0)$$
implies that
$$\begin{aligned}-4\varepsilon &\le \liminf_n \frac{1}{n}\sum_{j=0}^{n-1}\mu\big(f^{-j}(A)\cap B\big) - \mu(A)\mu(B)\\ &\le \limsup_n \frac{1}{n}\sum_{j=0}^{n-1}\mu\big(f^{-j}(A)\cap B\big) - \mu(A)\mu(B) \le 4\varepsilon.\end{aligned}$$
Since $\varepsilon$ is arbitrary, this proves that the condition (4.1.1) holds for all pairs of measurable sets. According to Proposition 4.1.4, it follows that the system is ergodic.

In the same spirit, it suffices to check part (iii) of Proposition 4.1.4 on dense subsets:
Corollary 4.1.6. Assume that the condition (4.1.2) in Proposition 4.1.4 holds for every $\varphi$ and $\psi$ in dense subsets of $L^p(\mu)$ and $L^q(\mu)$, respectively. Then $(f,\mu)$ is ergodic.
We leave the proof of this fact to the reader (see Exercise 4.1.3).

4.1.3 Exercises
4.1.1. Let $(M,\mathcal{A})$ be a measurable space and $f : M \to M$ be a measurable transformation. Prove that if $p \in M$ is a periodic point of period $k$, then the measure $\mu_p = \frac{1}{k}\big(\delta_p + \delta_{f(p)} + \cdots + \delta_{f^{k-1}(p)}\big)$ is ergodic.
4.1.2. Let $\mu$ be an invariant probability measure, not necessarily ergodic, of a measurable transformation $f : M \to M$. Show that the following limit exists for any pair of measurable sets $A$ and $B$:
$$\lim_n \frac{1}{n}\sum_{i=0}^{n-1}\mu\big(f^{-i}(A)\cap B\big).$$

4.1.3. Show that an invariant probability measure $\mu$ is ergodic for a transformation $f : M \to M$ if and only if any one of the following conditions holds:
(a) $\mu\big(\bigcup_{n\ge 0}f^{-n}(A)\big) = 1$ for every measurable set $A$ with $\mu(A) > 0$;
(b) given any measurable sets $A, B$ with $\mu(A)\mu(B) > 0$, there is $n \ge 1$ such that $\mu\big(f^{-n}(A)\cap B\big) > 0$;
(c) the convergence in condition (iii) of Proposition 4.1.4 holds for some choice of $p$, $q$ and some dense subset of functions $\varphi \in L^p(\mu)$ and $\psi \in L^q(\mu)$;
(d) there is $p \in [1,\infty]$ such that every invariant function $\varphi \in L^p(\mu)$ is constant at $\mu$-almost every point;
(e) every integrable function $\varphi$ satisfying $\varphi\circ f \ge \varphi$ at $\mu$-almost every point (or $\varphi\circ f \le \varphi$ at $\mu$-almost every point) is constant at $\mu$-almost every point.
4.1.4. Take M to be a metric space. Prove that an invariant probability measure μ
is ergodic for f : M → M if and only if the time average of every bounded
uniformly continuous function ϕ : M → R is constant at μ-almost every
point.
4.1.5. Take $M$ to be a metric space. We call the basin of an invariant probability measure $\mu$ the set $B(\mu)$ of all points $x \in M$ such that
$$\lim_{n\to\infty}\frac{1}{n}\sum_{j=0}^{n-1}\varphi(f^j(x)) = \int\varphi\,d\mu$$
for every bounded continuous function $\varphi : M \to \mathbb{R}$. Check that the basin is an invariant set. Moreover, if $\mu$ is ergodic then $B(\mu)$ has full $\mu$-measure.
4.1.6. Show that if μ and η are distinct ergodic probability measures of a transformation
f : M → M, then η and μ are mutually singular.
4.1.7. Let μ be a probability measure invariant under some transformation f : M → M.
Show that the product measure μ2 = μ × μ is invariant under the transformation
f2 : M × M → M × M defined by f2 (x, y) = (f (x), f (y)). Moreover, if (f2 , μ2 ) is
ergodic then (f , μ) is ergodic. Is the converse true?
4.1.8. Let $\mu$ be a probability measure invariant under some transformation $f : M \to M$. Assume that $(f^n,\mu)$ is ergodic for every $n \ge 1$. Show that if $\varphi$ is a non-constant eigenfunction of the Koopman operator $U_f$ then the eigenvalue is not a root of unity and every set on which $\varphi$ is constant has zero measure.

4.2 Examples
In this section we use a number of examples to illustrate different methods for
checking whether a system is ergodic or not.

4.2.1 Rotations on tori


Initially, let us consider the case of a rotation $R_\theta : S^1 \to S^1$ on the circle $S^1 = \mathbb{R}/\mathbb{Z}$. As observed in Section 1.3.3, the Lebesgue measure $m$ is invariant under $R_\theta$. We want to analyze the ergodic behavior of the system $(R_\theta, m)$ for different values of $\theta$.
If $\theta$ is rational, say $\theta = p/q$ in irreducible form, then $R_\theta^q(x) = x$ for every $x \in S^1$. Then, given any segment $I \subset S^1$ with positive length less than $1/q$, the set
$$A = I \cup R_\theta(I) \cup \cdots \cup R_\theta^{q-1}(I)$$
is invariant under $R_\theta$ and its Lebesgue measure satisfies $0 < m(A) \le q\,m(I) < 1$. Thus, if $\theta$ is rational then the Lebesgue measure is not ergodic. The converse is much more interesting:

Proposition 4.2.1. If $\theta$ is irrational then $R_\theta$ is ergodic relative to the Lebesgue measure.

We are going to mention two different proofs of this fact. The first one,
which we detail below, uses some simple facts from Fourier analysis. The
second one, which we leave as an exercise (Exercise 4.2.6), is based on a
density point argument similar to the one we will use in Section 4.2.2 to prove
that the decimal expansion map is ergodic relative to the Lebesgue measure.
We denote by $L^2(m)$ the Hilbert space of measurable functions $\psi$ whose square is integrable, that is, such that
$$\int|\psi|^2\,dm < \infty.$$
It is convenient to consider functions with values in $\mathbb{C}$, and we will do so. We use the well-known fact that the family of functions
$$\phi_k : S^1 \to \mathbb{C},\quad x \mapsto e^{2\pi ikx},\quad k \in \mathbb{Z}$$
is a Hilbert basis of this space: given any $\varphi \in L^2(m)$ there exists a unique sequence $(a_k)_{k\in\mathbb{Z}}$ of complex numbers such that
$$\varphi(x) = \sum_{k\in\mathbb{Z}}a_k e^{2\pi ikx} \quad\text{for almost every } x \in S^1. \qquad (4.2.1)$$

This is called the Fourier series expansion of $\varphi \in L^2(m)$. Then
$$\varphi\big(R_\theta(x)\big) = \sum_{k\in\mathbb{Z}}a_k e^{2\pi ik\theta}e^{2\pi ikx}. \qquad (4.2.2)$$

Assume that $\varphi$ is invariant. Then (4.2.1) and (4.2.2) coincide. By uniqueness of the coefficients in the Fourier series expansion, this happens if and only if
$$a_k e^{2\pi ik\theta} = a_k \quad\text{for every } k \in \mathbb{Z}.$$
The hypothesis that $\theta$ is irrational means that $e^{2\pi ik\theta} \ne 1$ for every $k \ne 0$. Hence, the relation that we just obtained implies that $a_k = 0$ for every $k \ne 0$. In other words, $\varphi(x) = a_0$ for $m$-almost every $x \in S^1$. This proves that every invariant $L^2$ function is constant $m$-almost everywhere. In particular, the characteristic function $\varphi = \mathcal{X}_A$ of any invariant set $A \subset S^1$ is constant at $m$-almost every point. This is the same as saying that $A$ has either zero measure or full measure. Hence, by Proposition 4.1.3, the measure $m$ is ergodic.
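The contrast between rational and irrational rotations can also be seen through time averages, in the spirit of condition (iv) of Proposition 4.1.3. The Python sketch below (function names hypothetical; floating-point orbits are only an approximation of the exact ones) computes the Birkhoff average of the characteristic function of $I = [0,1/4)$ along the orbit of a point. For $\theta = 1/2$ the result depends on the initial point, so the time average is not constant; for an irrational $\theta$ it is close to $m(I) = 1/4$.

```python
import math

def birkhoff_average(theta, x0, phi, n):
    """(1/n) * sum_{j<n} phi(R_theta^j(x0)) on the circle R/Z."""
    return sum(phi((x0 + j * theta) % 1.0) for j in range(n)) / n

def indicator_quarter(x):
    """Characteristic function of I = [0, 1/4)."""
    return 1.0 if x < 0.25 else 0.0

golden = (math.sqrt(5) - 1) / 2
print(birkhoff_average(0.5, 0.1, indicator_quarter, 1000))        # 0.5
print(birkhoff_average(0.5, 0.4, indicator_quarter, 1000))        # 0.0
print(birkhoff_average(golden, 0.1, indicator_quarter, 100_000))  # close to 0.25
```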
These observations extend in a natural way to the rotation on the $d$-torus $\mathbb{T}^d$, for any $d \ge 1$:
Proposition 4.2.2. If $\theta = (\theta_1,\ldots,\theta_d)$ is rationally independent then the rotation $R_\theta : \mathbb{T}^d \to \mathbb{T}^d$ is ergodic relative to the Lebesgue measure.
This may be proved by the same argument as in the case $d = 1$, using the fact (see Exercise 4.2.1) that the family of functions
$$\phi_{k_1,\ldots,k_d} : \mathbb{T}^d \to \mathbb{C},\quad (x_1,\ldots,x_d) \mapsto e^{2\pi i(k_1x_1+\cdots+k_dx_d)},\quad (k_1,\ldots,k_d) \in \mathbb{Z}^d$$
is a Hilbert basis of the space $L^2(m)$.
Corollary 4.2.3. If $\theta = (\theta_1,\ldots,\theta_d)$ is rationally independent then the rotation $R_\theta : \mathbb{T}^d \to \mathbb{T}^d$ is minimal, that is, every orbit $\mathcal{O}(x) = \{R_\theta^n(x) : n \in \mathbb{N}\}$ is dense in $\mathbb{T}^d$.

Proof. Let us consider in $\mathbb{T}^d$ the flat distance, defined by
$$d([\xi],[\eta]) = \inf\{d(\xi',\eta') : \xi',\eta' \in \mathbb{R}^d,\ \xi' \sim \xi,\ \eta' \sim \eta\}.$$
Observe that this distance is preserved by every rotation. Let $\{U_k : k \in \mathbb{N}\}$ be a countable basis of nonempty open sets of $\mathbb{T}^d$ and $m$ be the Lebesgue measure on $\mathbb{T}^d$. By ergodicity, there is $W \subset \mathbb{T}^d$, with total Lebesgue measure, such that $\tau(U_k,x) = m(U_k) > 0$ for every $k$ and every $x \in W$. In particular, the orbit of $x$ is dense in $\mathbb{T}^d$ for every $x \in W$. Now consider an arbitrary point $x \in \mathbb{T}^d$ and any $y \in W$. Then, for every $\delta > 0$ there exists $k \ge 1$ such that $d(R_\theta^k(y), x) < \delta$. Since $R_\theta$ preserves the distance, it follows that $d(R_\theta^{n+k}(y), R_\theta^n(x)) < \delta$ for every $n \ge 1$. Since the orbit of $y$ is dense, so is $\{R_\theta^{n+k}(y) : n \ge 1\}$ (it differs from the orbit of $y$ by finitely many points, and $\mathbb{T}^d$ has no isolated points). This implies that the orbit of $x$ is $2\delta$-dense, that is, it intersects the $2\delta$-neighborhood of every point. Since $\delta$ is arbitrary, this implies that the orbit of $x$ is dense in the ambient torus.
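For $d = 1$, density of an orbit can be probed numerically through the largest gap left on the circle by its first $n$ points: the orbit is dense exactly when this gap tends to 0 as $n \to \infty$. The Python sketch below (hypothetical names; floating-point arithmetic is an approximation) compares an irrational angle with $\theta = 1/3$, whose orbits consist of only three points.

```python
import math

def max_gap(theta, x0, n):
    """Largest gap between consecutive points of {R_theta^j(x0) : 0 <= j < n} on R/Z."""
    pts = sorted((x0 + j * theta) % 1.0 for j in range(n))
    gaps = [b - a for a, b in zip(pts, pts[1:])]
    gaps.append(1.0 - pts[-1] + pts[0])  # the gap that wraps around 0
    return max(gaps)

golden = (math.sqrt(5) - 1) / 2
for n in (10, 100, 1000):
    print(n, max_gap(golden, 0.0, n))  # shrinks as n grows
print(max_gap(1 / 3, 0.0, 1000))       # stays close to 1/3
```

A finite computation can only suggest density, of course; the corollary is what guarantees that the gap tends to 0 for every initial point.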

In fact, the irrational rotations on the circle or, more generally, on any torus
have a much stronger property than ergodicity: they are uniquely ergodic,
meaning that they admit a unique invariant probability measure (which is
the Lebesgue measure, of course). Uniquely ergodic systems are studied in
Chapter 6.

4.2.2 Decimal expansion


Consider the transformation f : [0, 1] → [0, 1], f (x) = 10x − [10x] introduced
in Section 1.3.1. We have seen that f preserves the Lebesgue measure m.

Proposition 4.2.4. The transformation $f$ is ergodic relative to the Lebesgue measure $m$.
Proof. By Proposition 4.1.3, it suffices to prove that every invariant set $A$ with positive measure has total measure. The main ingredient is the derivation theorem (Theorem A.2.15), according to which almost every point of $A$ is a density point of $A$. More precisely (see also Exercise A.2.9), $m$-almost every point $a \in A$ satisfies
$$\lim_{\varepsilon\to 0}\ \inf\Big\{\frac{m(I\cap A)}{m(I)} : I \text{ an interval such that } a \in I \subset B(a,\varepsilon)\Big\} = 1. \qquad (4.2.3)$$
Let us fix a density point $a \in A$. Since the set of points of the form $m/10^k$, $k \in \mathbb{N}$, $0 \le m \le 10^k$ has zero measure, it is no restriction to suppose that $a$ is not of that form. Let us consider the family of intervals
$$I(k,m) = \Big(\frac{m-1}{10^k}, \frac{m}{10^k}\Big), \quad k \in \mathbb{N},\ m = 1,\ldots,10^k.$$
It is clear that for each $k \in \mathbb{N}$ there exists a unique $m = m_k$ such that $I(k,m_k)$ contains the point $a$. Denote $I_k = I(k,m_k)$. The property (4.2.3) implies that
$$\frac{m(I_k\cap A)}{m(I_k)} \to 1 \quad\text{when } k \to \infty.$$
Observe also that each $f^k$ is an affine bijection from $I_k$ to the interval $(0,1)$. This has the following immediate consequence, which is crucial for our argument:
Lemma 4.2.5 (Distortion). For every $k \in \mathbb{N}$, one has
$$\frac{m(f^k(E_1))}{m(f^k(E_2))} = \frac{m(E_1)}{m(E_2)} \qquad (4.2.4)$$
for any measurable subsets $E_1$ and $E_2$ of $I_k$.

Applying this fact to $E_1 = I_k\cap A$ and $E_2 = I_k$ we find that
$$\frac{m\big(f^k(I_k\cap A)\big)}{m\big((0,1)\big)} = \frac{m(I_k\cap A)}{m(I_k)}.$$
Clearly, $m\big((0,1)\big) = 1$. Moreover, as we take $A$ to be invariant, $f^k(I_k\cap A)$ is contained in $A$. In this way we get that
$$m(A) \ge \frac{m(I_k\cap A)}{m(I_k)} \quad\text{for every } k.$$
Since the sequence on the right-hand side converges to 1 when $k \to \infty$, it follows that $m(A) = 1$, as we wanted to prove.

The proof of Lemma 4.2.5 relies on the fact that the transformation $f$ is affine on each interval $\big((m-1)/10, m/10\big)$; that may give the impression that the method of proof that we just presented is restricted to a very special class of examples. In fact, this is not so; quite the contrary.
The reason is that there are many situations where one can obtain a slightly weaker version of the statement of Lemma 4.2.5 that is, nevertheless, still sufficient to conclude the proof of ergodicity. In a few words, instead of the claim that the two sides of (4.2.4) are equal, one can often show that the quotient between the two terms is bounded by some uniform constant. That is called the bounded distortion property. As an illustration of these ideas, in Section 4.2.4 we prove that the Gauss transformation is ergodic.
Next, we describe an application of Proposition 4.2.4 in the context of number theory. We say that a number $x \in \mathbb{R}$ is 10-normal if every block of digits $(b_1,\ldots,b_l)$, $l \ge 1$, occurs with frequency $10^{-l}$ in the decimal expansion of $x$. Rational numbers are never 10-normal, of course, and it is also easy to give irrational numbers that are not 10-normal, such as $x = 0.101001000100001000001\cdots$. It is also not difficult to construct 10-normal numbers, for example, the Champernowne constant $x = 0.12345678910111213141516171819202122\cdots$, which is obtained by concatenation of the successive natural numbers.
However, it is usually difficult to check whether a given number is 10-normal or not. For example, that remains unknown for the numbers $\pi$, $e$ and even $\sqrt{2}$. On the other hand, using the previous proposition one can easily prove that almost every number is 10-normal:

Proposition 4.2.6. The set of 10-normal numbers $x \in \mathbb{R}$ has full Lebesgue measure in the real line.

Proof. Since the fact of being 10-normal or not is independent of the integer part of the number, we only need to show that almost every $x \in [0,1]$ is 10-normal. Consider $f : [0,1] \to [0,1]$ defined by $f(x) = 10x - [10x]$. For each block $(b_1,\ldots,b_l) \in \{0,\ldots,9\}^l$, consider the interval
$$I_{b_1,\ldots,b_l} = \Big[\frac{\kappa}{10^l}, \frac{\kappa+1}{10^l}\Big), \quad\text{with } \kappa = \sum_{i=1}^{l}b_i10^{l-i}.$$
Recall that if $x = 0.a_0a_1\cdots a_ka_{k+1}\cdots$ then $f^k(x) = 0.a_ka_{k+1}\cdots$ for every $k \ge 1$. Hence, $f^k(x) \in I_{b_1,\ldots,b_l}$ if and only if $(a_k,\ldots,a_{k+l-1}) = (b_1,\ldots,b_l)$. So, the mean sojourn time $\tau(I_{b_1,\ldots,b_l},x)$ is equal to the frequency of the block $(b_1,\ldots,b_l)$ in the decimal expansion of $x$. Using the Birkhoff ergodic theorem and the fact that the transformation $f$ is ergodic with respect to the Lebesgue measure $m$, we conclude that for every $(b_1,\ldots,b_l)$ there exists a full Lebesgue measure subset $B(b_1,\ldots,b_l)$ of the interval $[0,1]$ such that
$$\tau(I_{b_1,\ldots,b_l},x) = m(I_{b_1,\ldots,b_l}) = \frac{1}{10^l} \quad\text{for every } x \in B(b_1,\ldots,b_l).$$
Let $B$ be the intersection of $B(b_1,\ldots,b_l)$ over all values of $b_1,\ldots,b_l$ in $\{0,\ldots,9\}$ and every $l \ge 1$. Since this is a countable intersection of full measure sets, $m(B) = 1$, and every $x \in B$ is 10-normal.
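The quantity used in the proof, the frequency of a block among the digits of $x$, is easy to compute on a finite stretch of digits. The Python sketch below (hypothetical names) works directly with digit sequences, since floating-point numbers do not carry enough decimal digits. As an assumption of the illustration only, a seeded pseudo-random digit stream stands in for the digits of a Lebesgue-typical $x$; the sparse expansion $0.101001000100001\cdots$ mentioned above serves as a non-normal contrast.

```python
import random

def block_frequency(digits, block):
    """Fraction of positions k with digits[k:k+l] == block, where l = len(block).

    For the decimal digits of x this approximates tau(I_{b_1...b_l}, x).
    """
    l = len(block)
    n = len(digits) - l + 1
    hits = sum(1 for k in range(n) if tuple(digits[k:k + l]) == tuple(block))
    return hits / n

def sparse_ones_digits(n):
    """First n digits of 0.101001000100001...: each 1 is followed by one more 0."""
    digits, zeros = [], 1
    while len(digits) < n:
        digits.append(1)
        digits.extend([0] * zeros)
        zeros += 1
    return digits[:n]

rng = random.Random(2024)
typical = [rng.randrange(10) for _ in range(200_000)]
print(block_frequency(typical, (3,)))                      # close to 10**-1
print(block_frequency(typical, (4, 2)))                    # close to 10**-2
print(block_frequency(sparse_ones_digits(100_000), (1,)))  # close to 0, far from 0.1
```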

More generally, for any integer $d \ge 2$, we say that $x \in \mathbb{R}$ is $d$-normal if every block $(b_1,\ldots,b_l) \in \{0,\ldots,d-1\}^l$, $l \ge 1$, occurs with frequency $d^{-l}$ in the expansion of $x$ in base $d$. Finally, we say that $x$ is a normal number if it is $d$-normal for every $d \ge 2$. Everything that was said before for $d = 10$ extends immediately to general $d$. In particular, the set of $d$-normal numbers has full Lebesgue measure for every $d \ge 2$. Taking the intersection over all the values of $d$, we conclude that Lebesgue-almost every real number is normal (Borel's normal number theorem).

4.2.3 Bernoulli shifts


Let $(X,\mathcal{C},\nu)$ be a probability space. In this section we consider the product space $\Sigma = X^{\mathbb{N}}$, endowed with the product σ-algebra $\mathcal{B} = \mathcal{C}^{\mathbb{N}}$ and the product measure $\mu = \nu^{\mathbb{N}}$. As explained in Appendix A.2.3, this means that: $\Sigma$ is the set of all sequences $(x_n)_{n\in\mathbb{N}}$ with $x_n \in X$ for every $n$; $\mathcal{B}$ is the σ-algebra generated by the measurable cylinders
$$[m; A_m,\ldots,A_n] = \{(x_i)_{i\in\mathbb{N}} : x_i \in A_i \text{ for } m \le i \le n\}$$
with $m \le n$ and $A_i \in \mathcal{C}$ for each $i$; and $\mu$ is the probability measure on $\Sigma$ characterized by
$$\mu([m; A_m,\ldots,A_n]) = \prod_{i=m}^{n}\nu(A_i). \qquad (4.2.5)$$
We may think of the elements of $\Sigma$ as representing the results of a sequence of random experiments with values in $X$ and all subject to the same probability distribution $\nu$: given any measurable set $A \subset X$, the probability of $x_i \in A$ is equal to $\nu(A)$ for every $i$. Moreover, in this model the results of the successive experiments are independent: indeed, the relation (4.2.5) means that the probability of $x_i \in A_i$ for every $m \le i \le n$ is the product of the probabilities of the individual events $x_i \in A_i$.
In this section we introduce a dynamical system $\sigma : \Sigma \to \Sigma$ on the space $\Sigma$, called the shift map, which preserves the measure $\mu$. The pair $(\sigma,\mu)$ is called a Bernoulli shift. The main result is that every Bernoulli shift is an ergodic system.
It is worth pointing out that $\mathbb{N}$ may be replaced with $\mathbb{Z}$ throughout the construction. That is, we may take $\Sigma$ to be the space of two-sided sequences $(\ldots,x_{-n},\ldots,x_0,\ldots,x_n,\ldots)$. Up to minor adjustments, which we leave to the reader, all that follows remains valid in that case. In addition, in the two-sided case the shift map is invertible.
The shift map $\sigma : \Sigma \to \Sigma$ is defined by
$$\sigma\big((x_n)_n\big) = (x_{n+1})_n.$$

That is, by definition, $\sigma$ sends each sequence $(x_0,x_1,\ldots,x_n,\ldots)$ to the sequence $(x_1,\ldots,x_n,\ldots)$. Observe that the pre-image of any cylinder is still a cylinder:
$$\sigma^{-1}([m; A_m,\ldots,A_n]) = [m+1; A_m,\ldots,A_n]. \qquad (4.2.6)$$
It follows that the map $\sigma$ is measurable with respect to the σ-algebra $\mathcal{B}$. Moreover,
$$\mu\big(\sigma^{-1}([m; A_m,\ldots,A_n])\big) = \nu(A_m)\cdots\nu(A_n) = \mu\big([m; A_m,\ldots,A_n]\big),$$
and (using Lemma 1.3.1) that ensures that the measure $\mu$ is invariant under $\sigma$.

Proposition 4.2.7. Every Bernoulli shift (σ , μ) is ergodic.

Proof. Let $A$ be an invariant measurable set. We want to prove that $\mu(A) = 0$ or $\mu(A) = 1$. We use the following fact:
Lemma 4.2.8. If $B$ and $C$ are finite unions of pairwise disjoint cylinders, then
$$\mu\big(B\cap\sigma^{-j}(C)\big) = \mu(B)\mu\big(\sigma^{-j}(C)\big) = \mu(B)\mu(C),$$
for every $j$ sufficiently large.

Proof. First, suppose that $B$ and $C$ are both cylinders: $B = [k; B_k,\ldots,B_l]$ and $C = [m; C_m,\ldots,C_n]$. Then,
$$\sigma^{-j}(C) = [m+j; C_m,\ldots,C_n] \quad\text{for each } j.$$
Consider any $j$ large enough that $m + j > l$. Then,
$$\begin{aligned}B\cap\sigma^{-j}(C) &= \{(x_i)_i : x_k \in B_k,\ldots,x_l \in B_l,\ x_{m+j} \in C_m,\ldots,x_{n+j} \in C_n\}\\ &= [k; B_k,\ldots,B_l,X,\ldots,X,C_m,\ldots,C_n],\end{aligned}$$
where $X$ appears exactly $m+j-l-1$ times. By the definition (4.2.5), this gives that
$$\mu\big(B\cap\sigma^{-j}(C)\big) = \prod_{i=k}^{l}\nu(B_i)\cdot 1^{m+j-l-1}\cdot\prod_{i=m}^{n}\nu(C_i) = \mu(B)\mu(C).$$

This proves the conclusion of the lemma when both sets are cylinders. The
general case follows easily, using the fact that μ is finitely additive.
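For a finite alphabet, the computation in the proof of Lemma 4.2.8 can be replayed with exact rational arithmetic. In the Python sketch below (all names hypothetical) a cylinder $[m; A_m,\ldots,A_n]$ is stored as a pair `(m, [A_m, ..., A_n])` of a starting index and a list of subsets of $X$. Its measure is the product (4.2.5), its pre-image under $\sigma^j$ shifts the starting index as in (4.2.6), and the intersection of two separated cylinders fills the gap with copies of $X$.

```python
from fractions import Fraction
from math import prod

def nu_of(subset, nu):
    """nu(A) for a subset A of the finite alphabet, with nu given as a dict."""
    return sum((nu[x] for x in subset), Fraction(0))

def cylinder_measure(cyl, nu):
    """mu([m; A_m, ..., A_n]) = nu(A_m) * ... * nu(A_n), as in (4.2.5)."""
    _, sets = cyl
    return prod((nu_of(A, nu) for A in sets), start=Fraction(1))

def shift_preimage(cyl, j):
    """sigma^{-j}([m; A_m, ..., A_n]) = [m + j; A_m, ..., A_n], iterating (4.2.6)."""
    m, sets = cyl
    return (m + j, sets)

def intersect_separated(b, c, alphabet):
    """[k; B_k, ..., B_l] ∩ [m; C_m, ..., C_n] when m > l: fill the gap with X."""
    k, bsets = b
    m, csets = c
    l = k + len(bsets) - 1
    if m <= l:
        raise ValueError("cylinders overlap; this sketch only handles m > l")
    return (k, list(bsets) + [alphabet] * (m - l - 1) + list(csets))

X = frozenset("01")
nu = {"0": Fraction(1, 3), "1": Fraction(2, 3)}
B = (0, [frozenset("0"), X, frozenset("1")])  # [0; {0}, X, {1}]
C = (1, [frozenset("1"), frozenset("1")])     # [1; {1}, {1}]
BC = intersect_separated(B, shift_preimage(C, 5), X)
print(cylinder_measure(BC, nu))                           # 8/81
print(cylinder_measure(B, nu) * cylinder_measure(C, nu))  # 8/81
```

Here $m + j - l - 1 = 1 + 5 - 2 - 1 = 3$ copies of $X$ are inserted, exactly as in the proof.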

Proceeding with the proof of Proposition 4.2.7, suppose for the moment that the invariant set $A$ belongs to the algebra $\mathcal{B}_0$ whose elements are the finite unions of pairwise disjoint cylinders. Then, on the one hand, we may apply the previous lemma with $B = C = A$ to conclude that $\mu(A\cap\sigma^{-j}(A)) = \mu(A)^2$ for every large $j$. On the other hand, since $A$ is invariant, the left-hand side of this identity is equal to $\mu(A)$ for every $j$. It follows that $\mu(A) = \mu(A)^2$, which means that either $\mu(A) = 0$ or $\mu(A) = 1$.
Now let $A$ be an arbitrary invariant set. By the approximation theorem (Theorem A.1.19), given any $\varepsilon > 0$ there exists $B \in \mathcal{B}_0$ such that $\mu(A\,\Delta\,B) < \varepsilon$. By Lemma 4.2.8 we may fix $j$ such that
$$\mu\big(B\cap\sigma^{-j}(B)\big) = \mu(B)\mu\big(\sigma^{-j}(B)\big) = \mu(B)^2. \qquad (4.2.7)$$
Using (4.1.8) and (4.1.9) and the fact that $\mu$ is invariant, we get that
$$\big|\mu\big(A\cap\sigma^{-j}(A)\big) - \mu\big(B\cap\sigma^{-j}(B)\big)\big| \le 2\mu(A\,\Delta\,B) < 2\varepsilon \qquad (4.2.8)$$
(a similar fact was deduced during the proof of Corollary 4.1.5). Moreover,
$$\big|\mu(A)^2 - \mu(B)^2\big| \le 2\,|\mu(A)-\mu(B)| < 2\varepsilon. \qquad (4.2.9)$$
Putting the relations (4.2.7), (4.2.8) and (4.2.9) together, and recalling that $\mu(A\cap\sigma^{-j}(A)) = \mu(A)$ because $A$ is invariant, we conclude that $|\mu(A) - \mu(A)^2| < 4\varepsilon$. Since $\varepsilon$ is arbitrary, we deduce that $\mu(A) = \mu(A)^2$ and, hence, either $\mu(A) = 0$ or $\mu(A) = 1$.

When $X$ is a topological space and $\mathcal{C}$ is the corresponding Borel σ-algebra, we may endow $\Sigma$ with the product topology which, by definition, is the topology generated by the cylinders $[m; A_m,\ldots,A_n]$ where $A_m,\ldots,A_n$ are open subsets of $X$. The property (4.2.6) implies that the shift map $\sigma : \Sigma \to \Sigma$ is continuous with respect to this topology. The theorem of Tychonoff (see [Dug66]) asserts that $\Sigma$ is compact if $X$ is compact.
A relevant special case is when X is a finite set endowed with the discrete
topology, that is, such that every subset of X is open. A map f : M → M in a
topological space M is said to be transitive if there exists some x ∈ M whose
trajectory f n (x), n ≥ 1 is dense in M. We leave it to the reader (Exercise 4.2.2)
to prove the following result:

Proposition 4.2.9. Let $X$ be a finite set and $\Sigma$ be either $X^{\mathbb{N}}$ or $X^{\mathbb{Z}}$. Then the shift map $\sigma : \Sigma \to \Sigma$ is transitive. Moreover, the set of all periodic points of $\sigma$ is dense in $\Sigma$.
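One standard construction behind the one-sided case is to concatenate all finite words over $X$: the resulting point contains every finite word as a block, so its orbit enters every cylinder $[0; a_0,\ldots,a_r]$, while repeating a single word $w$ forever gives a periodic point in $[0; w]$. This is offered only as a hint, not as the intended solution of Exercise 4.2.2, and the Python names below are hypothetical. The sketch builds a finite prefix of such a point and checks it.

```python
from itertools import product

def transitive_prefix(alphabet, max_len):
    """Concatenation of all words of length 1, 2, ..., max_len over `alphabet`."""
    out = []
    for length in range(1, max_len + 1):
        for word in product(alphabet, repeat=length):
            out.extend(word)
    return out

def occurs(word, seq):
    """True if `word` appears as a block seq[k:k + len(word)] for some k."""
    w = list(word)
    return any(seq[k:k + len(w)] == w for k in range(len(seq) - len(w) + 1))

def periodic_point(word, length):
    """First `length` terms of the periodic sequence w w w ..., a point of [0; w]."""
    return [word[i % len(word)] for i in range(length)]

seq = transitive_prefix("ab", 4)
print(all(occurs(w, seq) for r in range(1, 5) for w in product("ab", repeat=r)))  # True
p = periodic_point("abb", 12)
print(p[3:] == p[:-3])  # True: sigma^3 fixes the periodic point (checked on a window)
```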

The following informal statement, which is one of many versions of the monkey paradox, illustrates the meaning of the ergodicity of the Bernoulli measure $\mu$: A monkey hitting keys at random on a typewriter keyboard for an infinite amount of time will almost surely type the complete text of “Os Lusíadas”.¹
¹ Monumental epic poem by the 16th-century Portuguese poet Luís de Camões.
To “prove” this statement we need to formulate it a bit more precisely. The possible texts typed by the monkey correspond to the sequences $(x_n)_{n\in\mathbb{N}}$ in the (finite) set $X$ of all the characters on the keyboard: letters, digits, space, punctuation signs, and so on. Denote by $\sigma : \Sigma \to \Sigma$ the shift map in the space $\Sigma = X^{\mathbb{N}}$. It is assumed that each character $* \in X$ has a positive probability $p_*$ of being hit at each time. This corresponds to the probability measure
$$\nu = \sum_{*\in X}p_*\delta_*$$
on the set $X$. Furthermore, it is assumed that the character hit at each time is independent of all the previous ones. This means that the distribution of the sequences of characters $(x_n)_n$ is governed by the Bernoulli probability measure $\mu = \nu^{\mathbb{N}}$.
The text of “Os Lusíadas” corresponds to a certain finite (albeit very long) sequence of characters $(l_0,\ldots,l_N)$. Consider the cylinder $L = [0; l_0,\ldots,l_N]$. Then
$$\mu(L) = \prod_{j=0}^{N}p_{l_j}$$
is positive (although very small). A sequence $(x_n)_n$ contains a complete copy of “Os Lusíadas” precisely if $\sigma^k\big((x_n)_n\big) \in L$ for some $k \ge 0$. By the Birkhoff ergodic theorem and the fact that $(\sigma,\mu)$ is ergodic, the set $K$ of values of $k$ for which that happens satisfies
$$\lim_n \frac{1}{n}\#\big(K\cap[0,n-1]\big) = \mu(L) > 0, \qquad (4.2.10)$$
with full probability. In particular, for almost all sequences $(x_n)_n$ the set $K$ is infinite, which means that $(x_n)_n$ contains infinitely many copies of “Os Lusíadas”. Actually, (4.2.10) yields an even stronger conclusion: still with full probability, the copies of our poem correspond to a positive (although small) fraction of all the typed characters. In other words, on average, the monkey types a new copy of “Os Lusíadas” every so many (a great many) years.
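A toy version of the monkey argument can be simulated. In the Python sketch below (names hypothetical) a short target word over a small alphabet replaces “Os Lusíadas”, a seeded pseudo-random generator plays the monkey, and we compute the empirical frequency of times $k$ with $\sigma^k((x_n)_n) \in L$, which is the finite-$n$ version of the left-hand side of (4.2.10). For the two examples shown, $\mu(L) = (1/2)^3 = 1/8$ and $\mu(L) = 0.2\cdot 0.5 = 0.1$.

```python
import random

def monkey_text(alphabet, probs, length, seed=0):
    """Independent characters with P(char = alphabet[i]) = probs[i] (a Bernoulli sequence)."""
    rng = random.Random(seed)
    return "".join(rng.choices(alphabet, weights=probs, k=length))

def hit_frequency(text, target):
    """(1/n) #{0 <= k < n : text[k:k+len(target)] == target}, n = number of windows."""
    n = len(text) - len(target) + 1
    return sum(1 for k in range(n) if text.startswith(target, k)) / n

text = monkey_text("ab", [0.5, 0.5], 200_000, seed=7)
print(hit_frequency(text, "aba"))  # close to mu(L) = 1/8

text = monkey_text("abc", [0.5, 0.3, 0.2], 200_000, seed=7)
print(hit_frequency(text, "ca"))   # close to 0.2 * 0.5 = 0.1
```

Overlapping occurrences (such as the two copies of `aba` inside `ababa`) are both counted, which matches the definition of $K$ as a set of times $k$.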

4.2.4 Gauss map


As we have seen in Section 1.3.2, the Gauss map $G(x) = 1/x - [1/x]$ has an invariant probability measure $\mu$ equivalent to the Lebesgue measure, namely:
$$\mu(E) = \frac{1}{\log 2}\int_E\frac{dx}{1+x}. \qquad (4.2.11)$$
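The invariance of (4.2.11) can be checked numerically on intervals $E = [0,t]$. On each branch $(1/(m+1), 1/m]$ one has $G(x) = 1/x - m$, so $G^{-1}([0,t])$ is, up to endpoints, the disjoint union of the intervals $[1/(m+t), 1/m]$, $m \ge 1$. The Python sketch below (hypothetical names) truncates that union after finitely many branches, which introduces an error of order $t/\text{terms}$, and compares $\mu(G^{-1}([0,t]))$ with $\mu([0,t]) = \log(1+t)/\log 2$.

```python
import math

def gauss_mu(a, b):
    """mu([a, b]) = (1/log 2) * integral_a^b dx/(1+x), the measure (4.2.11)."""
    return (math.log1p(b) - math.log1p(a)) / math.log(2)

def gauss_mu_preimage(t, terms=100_000):
    """mu(G^{-1}([0, t])) for 0 < t < 1, summing over the first `terms` branches."""
    return sum(gauss_mu(1.0 / (m + t), 1.0 / m) for m in range(1, terms + 1))

for t in (0.1, 0.5, 0.9):
    print(t, gauss_mu_preimage(t), gauss_mu(0.0, t))  # the two values nearly agree
```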
Proposition 4.2.10. The system (G, μ) is ergodic.

This can be proved using a more elaborate version of the method introduced
in Section 4.2.2. We are going to outline the arguments in the proof, referring to
Section 4.2.2 for those parts that are common to both situations and addressing
in more detail the main new difficulty.
Let $A$ be an invariant set with positive measure. We want to show that $\mu(A) = 1$. On the one hand, it remains true that for almost every point $a \in [0,1]$ there exists a sequence of intervals $I_k$ containing $a$ and such that $G^k$ maps $I_k$ bijectively and differentiably onto $(0,1)$. Indeed, such intervals can be found as follows. First, consider
$$I(1,m) = \Big(\frac{1}{m+1}, \frac{1}{m}\Big),$$
for each m ≥ 1. Next, define, by recurrence,
 
I(k, m1 , . . . , mk ) = I(1, m1 ) ∩ G−k+1 I(k − 1, m2 , . . . , mk )
for m1 , . . . , mk ≥ 1. Then, it suffices to take as Ik the interval I(k, m1 , . . . , mk )
that contains a. This is well defined for every k ≥ 1 and every point a in the
complement of a countable set, namely, the set ∞ −k
k=0 G ({0, 1}).
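The intervals $I(k, m_1, \dots, m_k)$ are the cylinders of the continued fraction expansion: a point belongs to $I(k, m_1, \dots, m_k)$ exactly when the first k digits of its expansion are $m_1, \dots, m_k$. Under that reading, the sketch below computes the digits of a point with floating-point arithmetic and the endpoints of the corresponding cylinder exactly with `fractions.Fraction`; the sample point π − 3 and the helper names are illustrative choices.

```python
import math
from fractions import Fraction


def gauss_digits(x, k):
    """First k digits m_j, where G^(j-1)(x) lies in I(1, m_j) = (1/(m_j + 1), 1/m_j)."""
    digits = []
    for _ in range(k):
        m = math.floor(1 / x)
        digits.append(m)
        x = 1 / x - m            # one step of the Gauss map G (floating point)
    return digits


def cylinder(digits):
    """Exact endpoints of I(k, m_1, ..., m_k).

    Each point of the cylinder is h(y) = 1/(m_1 + 1/(m_2 + ... + 1/(m_k + y)))
    with y = G^k(x) in (0, 1), so the endpoints are h(0) and h(1).
    """
    ends = []
    for y in (Fraction(0), Fraction(1)):
        z = y
        for m in reversed(digits):
            z = 1 / (m + z)      # inverse branch of G on I(1, m)
        ends.append(z)
    return min(ends), max(ends)


a = math.pi - 3                  # sample point; its digits begin 7, 15, 1, 292
for k in range(1, 5):
    lo, hi = cylinder(gauss_digits(a, k))
    # a lies in I_k, and diam I_k <= 2^(-[k/2]) as used later in the proof
    print(k, lo < Fraction(a) < hi, float(hi - lo), 2.0 ** -(k // 2))
```

Building the cylinder from the inverse branches, rather than by iterating G forward, keeps the endpoints exact.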
On the other hand, although the restriction of $G^k$ to each $I_k$ is a differentiable
bijection, it is not affine. For that reason, the analogue of (4.2.4) cannot hold in
the present case. This difficulty is bypassed by the result that follows, which
is an example of distortion control: it is important to note that the constant K
is independent of $I_k$, $E_1$, $E_2$ and, most of all, k.

Proposition 4.2.11 (Bounded distortion). There exists K > 1 such that, given
any k ≥ 1 and any interval $I_k$ such that $G^k$ restricted to $I_k$ is a differentiable
bijection,
$$\frac{\mu(G^k(E_1))}{\mu(G^k(E_2))} \le K \, \frac{\mu(E_1)}{\mu(E_2)}$$
for any measurable subsets $E_1$ and $E_2$ of the interval $I_k$.

For the proof of this proposition we need the following two auxiliary results:

Lemma 4.2.12. For every x ∈ (0, 1] we have
$$|G'(x)| \ge 1 \quad\text{and}\quad |(G^2)'(x)| \ge 2 \quad\text{and}\quad |G''(x)/G'(x)^2| \le 2.$$

Proof. Recall that G(x) = 1/x − m on each interval (1/(m + 1), 1/m].
Therefore,
$$G'(x) = -\frac{1}{x^2} \quad\text{and}\quad G''(x) = \frac{2}{x^3}.$$
The first identity implies that $|G'(x)| \ge 1$ for every x ∈ (0, 1]. Moreover,
$|G'(x)| \ge 2$ whenever x ≤ 2/3. On the other hand, x ≥ 2/3 implies that
G(x) = 1/x − 1 < 2/3 and, consequently, $|G'(G(x))| \ge 2$. Combining these
observations we find that $|(G^2)'(x)| = |G'(x)| \, |G'(G(x))| \ge 2$ for every x ∈ (0, 1].
Finally, $|G''(x)/G'(x)^2| = 2|x| \le 2$ also for every x ∈ (0, 1].
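Here is a quick numerical sanity check of Lemma 4.2.12, using the explicit formulas $G'(x) = -1/x^2$ and $G''(x) = 2/x^3$ from the proof. The sketch evaluates the three quantities on a uniform grid of midpoints (the grid size is an arbitrary choice) and reports the worst values, which should respect the bounds 1, 2 and 2. A finite grid can only illustrate the lemma, not prove it.

```python
import math


def G(x):
    """Gauss map G(x) = 1/x - [1/x] on (0, 1]."""
    return 1 / x - math.floor(1 / x)


def dG(x):
    """G'(x) = -1/x^2 (valid away from the endpoints 1/m)."""
    return -1 / x ** 2


def d2G(x):
    """G''(x) = 2/x^3."""
    return 2 / x ** 3


def check_lemma(n=100_000):
    """Worst values of |G'|, |(G^2)'| and |G''/G'^2| on a midpoint grid of (0, 1)."""
    xs = [(i + 0.5) / n for i in range(n)]
    min_d1 = min(abs(dG(x)) for x in xs)
    # chain rule: (G^2)'(x) = G'(x) * G'(G(x)); skip points whose image is 0
    min_d2 = min(abs(dG(x) * dG(G(x))) for x in xs if G(x) > 0)
    max_ratio = max(abs(d2G(x) / dG(x) ** 2) for x in xs)
    return min_d1, min_d2, max_ratio


print(check_lemma())
```

In fact, on the branch (1/(m+1), 1/m] one has $|(G^2)'(x)| = 1/(1 - mx)^2$, so the computed minimum of $|(G^2)'|$ is about 4, comfortably above the bound 2.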

Lemma 4.2.13. There exists C > 1 such that, given any k ≥ 1 and any interval
$I_k$ such that $G^k$ restricted to $I_k$ is a differentiable bijection,
$$\frac{|(G^k)'(x)|}{|(G^k)'(y)|} \le C \quad \text{for any } x \text{ and } y \text{ in } I_k.$$

Proof. Let g be a local inverse of G, that is, a differentiable function defined on
some interval and such that G(g(z)) = z for every z in the domain of definition.
Note that
$$\big( \log |G' \circ g| \big)'(z) = \frac{G''(g(z)) \, g'(z)}{G'(g(z))} = \frac{G''(g(z))}{G'(g(z))^2}.$$
Therefore, the last estimate in Lemma 4.2.12 implies that
$$\Big| \big( \log |G' \circ g| \big)'(z) \Big| \le 2 \quad \text{for every } g \text{ and every } z. \qquad (4.2.12)$$
In other words, every function of the form $\log |G' \circ g|$ admits 2 as a Lipschitz
constant. Observe also that if $x, y \in I_k$ then
$$\log \frac{|(G^k)'(x)|}{|(G^k)'(y)|} = \sum_{j=0}^{k-1} \Big( \log |G'(G^j(x))| - \log |G'(G^j(y))| \Big) = \sum_{j=1}^{k} \Big( \log |G' \circ g_j(G^j(x))| - \log |G' \circ g_j(G^j(y))| \Big),$$
where $g_j$ denotes a local inverse of G defined on the interval $[G^j(x), G^j(y)]$ and
sending it onto the interval between $G^{j-1}(x)$ and $G^{j-1}(y)$.
Using the estimate (4.2.12), we get that
$$\log \frac{|(G^k)'(x)|}{|(G^k)'(y)|} \le 2 \sum_{j=1}^{k} |G^j(x) - G^j(y)| = 2 \sum_{i=0}^{k-1} |G^{k-i}(x) - G^{k-i}(y)|. \qquad (4.2.13)$$
Now, the first two estimates in Lemma 4.2.12 imply that
$$|G^k(x) - G^k(y)| \ge 2^{[i/2]} \, |G^{k-i}(x) - G^{k-i}(y)|$$
for every i = 0, . . . , k. Replacing in (4.2.13), we conclude that
$$\log \frac{|(G^k)'(x)|}{|(G^k)'(y)|} \le 2 \sum_{i=0}^{k-1} 2^{-[i/2]} \, |G^k(x) - G^k(y)| \le 8 \, |G^k(x) - G^k(y)| \le 8,$$
because $\sum_{i=0}^{\infty} 2^{-[i/2]} = 4$ and $|G^k(x) - G^k(y)| \le 1$.
Now it suffices to take $C = e^8$.
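Lemma 4.2.13 can be illustrated by sampling. Every point of $I(k, m_1, \dots, m_k)$ is reached from some $y = G^k(x)$ in (0, 1) through the inverse branches $z \mapsto 1/(m + z)$, so $\log|(G^k)'|$ can be evaluated along the cylinder without iterating G forward, which would amplify rounding errors. The proof gives $\log(|(G^k)'(x)|/|(G^k)'(y)|) \le 8$, so the spread computed below should never exceed 8; the digit sequences and the number of samples are arbitrary test choices.

```python
import math


def log_derivative(digits, y):
    """log |(G^k)'(x)| for the point x of I(k, m_1, ..., m_k) with G^k(x) = y.

    The orbit is rebuilt backwards from y through the inverse branches
    z -> 1/(m + z); since |G'(z)| = 1/z^2, each step contributes -2 log z.
    """
    z, total = y, 0.0
    for m in reversed(digits):
        z = 1 / (m + z)              # z = G^j(x) for j = k-1, ..., 0
        total += -2 * math.log(z)
    return total


def distortion_spread(digits, samples=2_000):
    """max - min of log |(G^k)'| over sample points of the cylinder."""
    values = [log_derivative(digits, (i + 0.5) / samples) for i in range(samples)]
    return max(values) - min(values)


for digits in ([1] * 20, [1, 2, 3, 4, 5], [7, 15, 1, 292], [50] * 10):
    print(digits[:5], distortion_spread(digits))
```

The spread stays bounded even when the derivatives themselves are enormous (as for the digit sequence of length 10 made of 50s), which is exactly the content of the lemma.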

Proof of Proposition 4.2.11. Let m be the Lebesgue measure on [0, 1]. It
follows from Lemma 4.2.13 that
$$\frac{m(G^k(E_1))}{m(G^k(E_2))} = \frac{\int_{E_1} |(G^k)'| \, dm}{\int_{E_2} |(G^k)'| \, dm} \le C \, \frac{m(E_1)}{m(E_2)}.$$
On the other hand, since 1 ≤ 1 + x ≤ 2 for x ∈ [0, 1], the definition (4.2.11) implies that
$$\frac{1}{2 \log 2} \, m(E) \le \mu(E) \le \frac{1}{\log 2} \, m(E)$$
for every measurable set E ⊂ [0, 1]. Combining these two relations, we find
that
$$\frac{\mu(G^k(E_1))}{\mu(G^k(E_2))} \le 2 \, \frac{m(G^k(E_1))}{m(G^k(E_2))} \le 2C \, \frac{m(E_1)}{m(E_2)} \le 4C \, \frac{\mu(E_1)}{\mu(E_2)}.$$
Hence, it suffices to take K = 4C.

We are ready to conclude that (G, μ) is ergodic. Let A be an invariant set with
μ(A) > 0. Then A also has positive Lebesgue measure, since μ is absolutely
continuous with respect to the Lebesgue measure. Let a be a density point of
A whose future trajectory is contained in the open interval (0, 1). Consider the
sequence $(I_k)_k$ of the intervals $I(k, m_1, \dots, m_k)$ that contain a. It follows from
Lemma 4.2.12 that
$$\operatorname{diam} I_k \le \sup \left\{ \frac{1}{|(G^k)'(x)|} : x \in I_k \right\} \le 2^{-[k/2]}$$
for every k ≥ 1. In particular, the diameter of $I_k$ converges to zero and so
$$\frac{\mu(I_k \cap A)}{\mu(I_k)} \to 1 \quad \text{when } k \to \infty. \qquad (4.2.14)$$
Let us take $E_1 = I_k \cap A^c$ and $E_2 = I_k$. By Proposition 4.2.11,
$$\frac{\mu(G^k(I_k \cap A^c))}{\mu(G^k(I_k))} \le K \, \frac{\mu(I_k \cap A^c)}{\mu(I_k)}.$$
Observe that $G^k(I_k \cap A^c) = A^c$ up to a zero measure set, because the set A
is assumed to be invariant. Recall also that $G^k(I_k) = (0, 1)$, which has full
measure. Therefore, the previous inequality may be written as
$$\mu(A^c) \le K \, \frac{\mu(I_k \cap A^c)}{\mu(I_k)}.$$
According to (4.2.14), the expression on the right-hand side converges to zero
when k → ∞. It follows that $\mu(A^c) = 0$, as we wanted to prove.
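One concrete consequence of Proposition 4.2.10, via the Birkhoff ergodic theorem applied to the indicator function of I(1, m): for Lebesgue-almost every x, the digit m appears in the continued fraction expansion of x with frequency $\mu(I(1, m)) = \log_2\big((m+1)^2 / (m(m+2))\big)$ (the Gauss-Kuzmin frequencies). Floating-point orbits of G are unreliable, so the sketch below uses instead a random rational p/q with a very large denominator, on which one step of the Gauss map is exactly one step of Euclid's algorithm. Only an initial stretch of that orbit is used, where it should still resemble the orbit of a typical real number; the size of q, the number of digits and the seed are assumptions of this illustration.

```python
import math
import random


def digit_frequencies(num_digits=3_000, bits=20_000, seed=1):
    """Empirical frequencies of the digits m_j = [1 / G^(j-1)(x)] for a random rational x.

    If x = p/q then [1/x] = q // p and G(x) = (q % p) / p, so the orbit of x under
    the Gauss map is computed exactly by Euclid's algorithm.
    """
    rng = random.Random(seed)
    q = 2 ** bits
    p = rng.randrange(1, q)
    counts = {}
    for _ in range(num_digits):
        m, r = divmod(q, p)
        counts[m] = counts.get(m, 0) + 1
        if r == 0:                 # the orbit reached 0: stop early
            break
        p, q = r, p                # x -> G(x) = r / p
    total = sum(counts.values())
    return {m: c / total for m, c in counts.items()}


def gauss_digit_probability(m):
    """mu(I(1, m)) = log2((m + 1)^2 / (m (m + 2))), from (4.2.11)."""
    return math.log2((m + 1) ** 2 / (m * (m + 2)))


freq = digit_frequencies()
for m in range(1, 5):
    print(m, round(freq.get(m, 0.0), 3), round(gauss_digit_probability(m), 3))
```

The denominator $2^{20000}$ leaves room for several thousand digits that still mimic a random real, which is why only the first 3000 are used.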

4.2.5 Linear endomorphisms of the torus


Recall that we call the torus of dimension d (or just d-torus) the quotient space
$\mathbb{T}^d = \mathbb{R}^d / \mathbb{Z}^d$, that is, the space of all equivalence classes of the equivalence
relation defined in $\mathbb{R}^d$ by $x \sim y \Leftrightarrow x - y \in \mathbb{Z}^d$. This quotient inherits from $\mathbb{R}^d$
the structure of a differentiable manifold of dimension d. In what follows we
assume that $\mathbb{T}^d$ is also endowed with the flat Riemannian metric, which makes
it locally isometric to the Euclidean space $\mathbb{R}^d$. Let m be the volume measure
associated with this Riemannian metric (see Appendix A.4.5).
Let A be a d-by-d matrix with integer coefficients and determinant different
from zero. Then $A(\mathbb{Z}^d) \subset \mathbb{Z}^d$ and, consequently, A induces a transformation
$$f_A : \mathbb{T}^d \to \mathbb{T}^d, \quad f_A([x]) = [A(x)],$$
where [x] denotes the equivalence class that contains $x \in \mathbb{R}^d$. These transformations are called linear endomorphisms of the torus.
Note that $f_A$ is differentiable and the derivative $Df_A([x])$ at each point is
canonically identified with A. In particular, the Jacobian $\det Df_A([x])$ is constant
equal to det A. It follows (Exercise 4.2.9) that the degree of $f_A$ is equal to |det A|.
In particular, $f_A$ is invertible if and only if |det A| = 1. In this case, the inverse
is the transformation $f_{A^{-1}}$ induced by the inverse matrix $A^{-1}$; observe that $A^{-1}$
is also a matrix with integer coefficients.
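The count of pre-images used in this section can be verified exactly for small examples: the pre-images of [y] are the classes $[A^{-1}(y + n)]$, $n \in \mathbb{Z}^d$, and two of them coincide exactly when the corresponding n differ by an element of $A(\mathbb{Z}^d)$. The sketch below is restricted to d = 2 with an arbitrary rational point y; it enumerates the pre-images with exact rational arithmetic and should find |det A| distinct points in each case.

```python
from fractions import Fraction
from itertools import product


def preimages_on_torus(A, y):
    """All pre-images of the point [y] under f_A, for an integer 2x2 matrix A.

    Pre-images are the classes [A^{-1}(y + n)], n in Z^2. Since adj(A) A = det(A) I,
    the lattice |det A| Z^2 is contained in A(Z^2), so it is enough to let n range
    over {0, ..., |det A| - 1}^2. Exact fractions keep distinct classes apart.
    """
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("A must have nonzero determinant")
    inv = [[Fraction(d, det), Fraction(-b, det)],
           [Fraction(-c, det), Fraction(a, det)]]
    D = abs(det)
    found = set()
    for n1, n2 in product(range(D), repeat=2):
        v = (y[0] + n1, y[1] + n2)
        x = tuple(inv[i][0] * v[0] + inv[i][1] * v[1] for i in range(2))
        found.add(tuple(t % 1 for t in x))     # reduce modulo Z^2
    return found


y = (Fraction(1, 3), Fraction(2, 7))          # an arbitrary rational point
for A in ([[2, 1], [1, 1]], [[3, 1], [1, 2]], [[2, 0], [0, 2]], [[1, 2], [3, 4]]):
    print(A, len(preimages_on_torus(A, y)))
```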
In any case, $f_A$ preserves the Lebesgue measure on $\mathbb{T}^d$. This may be seen as
follows. Since $f_A$ is a local diffeomorphism, the pre-image of any measurable
set D with sufficiently small diameter consists of |det A| (= degree of $f_A$)
pairwise disjoint sets $D_i$, each of which is mapped diffeomorphically onto D.
By the formula of change of variables, $m(D) = |\det A| \, m(D_i)$ for every i. This
proves that $m(D) = m(f_A^{-1}(D))$ for every measurable set D with small diameter.
Hence, $f_A$ does preserve the Lebesgue measure m. Next we prove the following
fact:

Theorem 4.2.14. The system $(f_A, m)$ is ergodic if and only if no eigenvalue of
the matrix A is a root of unity.

Proof. Suppose that no eigenvalue of A is a root of unity. Consider any
function $\varphi \in L^2(m)$ and let
$$\varphi([x]) = \sum_{k \in \mathbb{Z}^d} c_k \, e^{2\pi i (k \cdot x)}$$
be its Fourier series expansion (with $k \cdot x = k_1 x_1 + \cdots + k_d x_d$). The coefficients
$c_k \in \mathbb{C}$ satisfy
$$\sum_{k \in \mathbb{Z}^d} |c_k|^2 = \|\varphi\|_2^2 < \infty. \qquad (4.2.15)$$
Then, the Fourier series expansion of $\varphi \circ f_A$ is:
$$\varphi(f_A([x])) = \sum_{k \in \mathbb{Z}^d} c_k \, e^{2\pi i (k \cdot A(x))} = \sum_{k \in \mathbb{Z}^d} c_k \, e^{2\pi i (A^*(k) \cdot x)},$$
where $A^*$ denotes the adjoint of A. Suppose that φ is an invariant function, that
is, $\varphi \circ f_A = \varphi$ at m-almost every point. Then, since the Fourier series expansion
is unique, we must have
$$c_{A^*(k)} = c_k \quad \text{for every } k \in \mathbb{Z}^d. \qquad (4.2.16)$$
We claim that the trajectory of every $k \ne 0$ under the transformation $A^*$ is
infinite. Indeed, if the trajectory of some $k \ne 0$ were finite then there would
exist $l, r \in \mathbb{Z}$ with $l \ge 0$ and $r > 0$ such that $(A^*)^{l+r}(k) = (A^*)^l(k)$. Since $A^*$ is invertible,
this means that $(A^*)^r(k) = k$, which could only happen
if $A^*$ had some eigenvalue λ such that $\lambda^r = 1$. Since A and $A^*$ have the same
eigenvalues, that would mean that A has some eigenvalue which is a root of
unity, which is excluded by the hypothesis. Hence, the trajectory of every
$k \ne 0$ is infinite, as claimed. Then, by (4.2.16), $c_k$ takes the same value at the infinitely many
distinct points of the trajectory of k, and by (4.2.15) this common value must be zero; that is,
$c_k = 0$ for every $k \ne 0$. Thus, $\varphi = c_0$ at m-almost every point. This
proves that the system $(f_A, m)$ is ergodic.
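To prove the converse, assume that A admits some eigenvalue which is a
root of unity. Then the same holds for $A^*$ and, hence, there exists r > 0 such
that 1 is an eigenvalue of $(A^*)^r$. Since $(A^*)^r$ has integer coefficients, it follows (see
Exercise 4.2.8) that there exists some $k \in \mathbb{Z}^d \setminus \{0\}$ such that $(A^*)^r(k) = k$. Fix k
and consider the function $\varphi \in L^2(m)$ defined by
$$\varphi([x]) = \sum_{j=0}^{r-1} e^{2\pi i ((A^*)^j(k) \cdot x)} = \sum_{j=0}^{r-1} e^{2\pi i (k \cdot A^j(x))}.$$
Then φ is an invariant function for $f_A$, because composing with $f_A$ replaces each $(A^*)^j(k)$
by $(A^*)^{j+1}(k)$ and $(A^*)^r(k) = k$. Moreover, φ is not constant at m-almost every
point, since its Fourier expansion involves only the nonzero frequencies $(A^*)^j(k)$. Hence, $(f_A, m)$ is not ergodic.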
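Theorem 4.2.14 turns the ergodicity of $f_A$ into a finite algebraic check. The sketch below rests on two standard facts that are not part of the text: a primitive r-th root of unity that is an eigenvalue of an integer d-by-d matrix has a minimal polynomial of degree totient(r) ≤ d (Euler's totient function), and totient(r) ≥ √(r/2). Hence it suffices to test whether $\det(A^r - I) = 0$ for r = 1, ..., 2d². Determinants are computed exactly with fractions; the function names are illustrative.

```python
from fractions import Fraction


def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][t] * B[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]


def det(M):
    """Exact determinant by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    n = len(M)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            result = -result
        result *= M[col][col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n):
                M[r][c] -= factor * M[col][c]
    return result


def has_root_of_unity_eigenvalue(A):
    """True if some eigenvalue of the integer matrix A is a root of unity.

    Such an eigenvalue of order r forces det(A^r - I) = 0 with r <= 2 d^2.
    """
    d = len(A)
    identity = [[int(i == j) for j in range(d)] for i in range(d)]
    power = identity
    for r in range(1, 2 * d * d + 1):
        power = mat_mul(power, A)
        shifted = [[power[i][j] - identity[i][j] for j in range(d)] for i in range(d)]
        if det(shifted) == 0:
            return True
    return False


for A in ([[2, 1], [1, 1]], [[0, -1], [1, 0]], [[1, 1], [0, 1]],
          [[2, 0], [0, 3]], [[0, 1], [-1, -1]]):
    verdict = "not ergodic" if has_root_of_unity_eigenvalue(A) else "ergodic"
    print(A, verdict)
```

For instance, the matrix [[0, 1], [-1, -1]] has characteristic polynomial x² + x + 1, whose roots are primitive cube roots of unity, so the test reports it as not ergodic.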

4.2.6 Hopf argument


In this section we present an alternative, more geometric, method to prove the
ergodicity of certain linear endomorphisms of the torus. This is based on an
argument introduced by Eberhard F. Hopf in his pioneering work [Hop39] on
the ergodicity of geodesic flows on surfaces with negative Gaussian curvature.
In the present linear context, the Hopf argument may be used whenever
|det A| = 1 and the matrix A is hyperbolic, that is, A has no eigenvalues in
the unit circle. But its strongest point is that it may be extended to much more
general differentiable systems, not necessarily linear. Some of these extensions
are mentioned in Section 4.4.
The hypothesis that the matrix A is hyperbolic means that the space $\mathbb{R}^d$ may
be written as a direct sum $\mathbb{R}^d = E^s \oplus E^u$ such that:

1. $A(E^s) = E^s$ and all the eigenvalues of $A \mid E^s$ have absolute value smaller than 1;
2. $A(E^u) = E^u$ and all the eigenvalues of $A \mid E^u$ have absolute value bigger than 1.

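Then there exist constants C > 0 and λ < 1 such that
$$\|A^n(v^s)\| \le C \lambda^n \|v^s\| \quad \text{for every } v^s \in E^s \text{ and every } n \ge 0,$$
$$\|A^{-n}(v^u)\| \le C \lambda^n \|v^u\| \quad \text{for every } v^u \in E^u \text{ and every } n \ge 0. \qquad (4.2.17)$$

Example 4.2.15. Consider $A = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}$. The eigenvalues of A are
$$\lambda_u = \frac{3 + \sqrt{5}}{2} > 1 > \lambda_s = \frac{3 - \sqrt{5}}{2} > 0$$
and the corresponding eigenspaces are:
$$E^u = \left\{ (x, y) \in \mathbb{R}^2 : y = \frac{\sqrt{5} - 1}{2} \, x \right\} \quad\text{and}\quad E^s = \left\{ (x, y) \in \mathbb{R}^2 : y = -\frac{\sqrt{5} + 1}{2} \, x \right\}.$$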
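The data of Example 4.2.15 can be double-checked numerically. Since $E^s$ and $E^u$ are one-dimensional, $A^n v^s = \lambda_s^n v^s$ and $A^{-n} v^u = \lambda_u^{-n} v^u = \lambda_s^n v^u$ (because $\lambda_u \lambda_s = \det A = 1$), so (4.2.17) holds for this matrix with C = 1 and λ = λ_s. The sketch verifies the eigen-relations up to floating-point rounding, the identity $\lambda_u \lambda_s = 1$ and, since A is symmetric, the orthogonality of $E^u$ and $E^s$.

```python
import math

A = [[2, 1], [1, 1]]
sqrt5 = math.sqrt(5)
lam_u, lam_s = (3 + sqrt5) / 2, (3 - sqrt5) / 2
v_u = (1.0, (sqrt5 - 1) / 2)      # spans E^u
v_s = (1.0, -(sqrt5 + 1) / 2)     # spans E^s


def apply(M, v):
    return (M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1])


def residual(lam, v):
    """Size of A v - lam v; zero up to rounding for an eigenpair."""
    w = apply(A, v)
    return max(abs(w[0] - lam * v[0]), abs(w[1] - lam * v[1]))


print(residual(lam_u, v_u), residual(lam_s, v_s))
print(lam_u * lam_s)                                 # det A = 1
print(v_u[0] * v_s[0] + v_u[1] * v_s[1])             # E^u is orthogonal to E^s
```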

The family of all affine subspaces of $\mathbb{R}^d$ of the form $v + E^s$, with $v \in \mathbb{R}^d$,
defines a partition $\mathcal{F}^s$ of $\mathbb{R}^d$ that we call the stable foliation and whose elements
we call stable leaves of A. This partition is invariant under A, meaning that
the image of any stable leaf is still a stable leaf. Moreover, by (4.2.17),
the transformation A contracts distances uniformly inside each stable leaf.
Analogously, the family of all affine subspaces of $\mathbb{R}^d$ of the form $v + E^u$ with
$v \in \mathbb{R}^d$ defines the unstable foliation $\mathcal{F}^u$ of $\mathbb{R}^d$, whose elements are called
unstable leaves. The unstable foliation is also invariant and the transformation
A expands distances uniformly inside unstable leaves.
Mapping $\mathcal{F}^s$ and $\mathcal{F}^u$ by the canonical projection $\pi : \mathbb{R}^d \to \mathbb{T}^d$, we obtain
foliations $W^s$ and $W^u$ of the torus that we call stable foliation and unstable
foliation of the transformation $f_A$. See Figure 4.1. The previous observations
show that these foliations are invariant under $f_A$. Moreover:

(i) $d(f_A^j(x), f_A^j(y)) \to 0$ when $j \to +\infty$, for any points x and y in the same stable leaf;
(ii) $d(f_A^j(y), f_A^j(z)) \to 0$ when $j \to -\infty$, for any points y and z in the same unstable leaf.

Figure 4.1. Stable foliation and unstable foliation in the torus (labels: $W^u(x)$, $W^s(x)$).
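Property (i) can be observed directly for the matrix of Example 4.2.15. In the sketch below, the starting point x, the displacement t along $E^s$ and the number of iterates are arbitrary choices; $y = x + t v^s$ lies on the stable leaf of x, both points are iterated by $f_A$ on the torus, and their flat distance is recorded. Rounding errors along the unstable direction grow like $\lambda_u^j$, so the computation uses Python's `decimal` module with 80 significant digits; with that precision the successive ratios of distances should agree with λ_s to many digits.

```python
from decimal import Decimal, getcontext, ROUND_FLOOR

getcontext().prec = 80                    # keeps rounding errors far below the distances

A = [[2, 1], [1, 1]]
sqrt5 = Decimal(5).sqrt()
lam_s = (3 - sqrt5) / 2
v_s = (Decimal(1), -(sqrt5 + 1) / 2)      # direction of the stable leaves


def frac(t):
    """t mod 1, in [0, 1)."""
    return t - t.to_integral_value(rounding=ROUND_FLOOR)


def f_A(p):
    """The map f_A on T^2, with points represented in [0, 1)^2."""
    return (frac(A[0][0] * p[0] + A[0][1] * p[1]),
            frac(A[1][0] * p[0] + A[1][1] * p[1]))


def torus_distance(p, q):
    """Flat distance on T^2: shortest difference modulo Z^2 in each coordinate."""
    total = Decimal(0)
    for a, b in zip(p, q):
        delta = frac(a - b)
        delta = min(delta, 1 - delta)
        total += delta * delta
    return total.sqrt()


x = (Decimal("0.1"), Decimal("0.2"))
t = Decimal("0.1")
y = (frac(x[0] + t * v_s[0]), frac(x[1] + t * v_s[1]))   # y lies on W^s(x)

distances = []
for j in range(41):
    distances.append(torus_distance(x, y))
    x, y = f_A(x), f_A(y)

for j in (0, 10, 20, 40):
    print(j, float(distances[j]))
```

Replacing `v_s` by the unstable direction and iterating the inverse map would illustrate property (ii) in the same way.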

We are going to use this geometric information to prove that $(f_A, m)$ is
ergodic. To that end, let $\varphi : \mathbb{T}^d \to \mathbb{R}$ be any continuous function and consider
the time averages
$$\varphi^+(x) = \lim_n \frac{1}{n} \sum_{j=0}^{n-1} \varphi(f_A^j(x)) \quad\text{and}\quad \varphi^-(x) = \lim_n \frac{1}{n} \sum_{j=0}^{n-1} \varphi(f_A^{-j}(x)),$$
which are defined for m-almost every $x \in \mathbb{T}^d$. By Corollary 3.2.8, there exists
a full measure set $X \subset \mathbb{T}^d$ such that
$$\varphi^+(x) = \varphi^-(x) \quad \text{for every } x \in X. \qquad (4.2.18)$$
Let us denote by $W^s(x)$ and $W^u(x)$, respectively, the stable leaf and the
unstable leaf of $f_A$ through each point $x \in \mathbb{T}^d$.

Lemma 4.2.16. The function $\varphi^+$ is constant on each leaf of $W^s$: if $\varphi^+(x)$ exists
and $y \in W^s(x)$ then $\varphi^+(y)$ exists and it is equal to $\varphi^+(x)$. Analogously, $\varphi^-$ is
constant on each leaf of $W^u$.

Proof. According to property (i) above, $d(f_A^j(x), f_A^j(y))$ converges to zero when
$j \to \infty$. Noting that φ is uniformly continuous (because its domain is compact),
it follows that
$$\varphi(f_A^j(x)) - \varphi(f_A^j(y)) \to 0 \quad \text{when } j \to \infty.$$
In particular, the Cesàro limit
$$\lim_n \frac{1}{n} \sum_{j=0}^{n-1} \Big( \varphi(f_A^j(x)) - \varphi(f_A^j(y)) \Big)$$
is also zero. That implies that $\varphi^+(y)$ exists and is equal to $\varphi^+(x)$. The argument
for $\varphi^-$ is entirely analogous.

Given any open subset R of the torus and any x ∈ R, denote by $W^s(x, R)$
the connected component of $W^s(x) \cap R$ that contains x and by $W^u(x, R)$ the
connected component of $W^u(x) \cap R$ that contains x. We call R a rectangle if
$W^s(x, R)$ intersects $W^u(y, R)$ at a unique point, for every x and y in R. See
Figure 4.2.

Lemma 4.2.17. Given any rectangle $R \subset \mathbb{T}^d$, there exists a measurable set
$Y_R \subset X \cap R$ such that $m(R \setminus Y_R) = 0$ and, given any x and y in $Y_R$, there exist
points x' and y' in $X \cap R$ such that $x' \in W^s(x, R)$ and $y' \in W^s(y, R)$ and
$y' \in W^u(x')$.

Proof. Let us denote by $m^s_x$ the Lebesgue measure on the stable leaf $W^s(x)$
of each point $x \in \mathbb{T}^d$. Note that $m(R \setminus X) = 0$, since X has full measure in $\mathbb{T}^d$.
Then, by the theorem of Fubini,
$$m^s_x\big( W^s(x, R) \setminus X \big) = 0 \quad \text{for } m\text{-almost every } x \in R.$$
Define $Y_R = \{ x \in X \cap R : m^s_x( W^s(x, R) \setminus X ) = 0 \}$. Then $Y_R$ has full measure in
R. Given $x, y \in Y_R$, consider the map $\pi : W^s(x, R) \to W^s(y, R)$ defined by
$$\pi(x') = \text{intersection between } W^u(x', R) \text{ and } W^s(y, R).$$

Figure 4.2. Rectangle in $\mathbb{T}^d$ (labels: R, $W^s(x)$, $W^s(y)$, x, x', y, y').

This map is affine and, consequently, it has the following property, called
absolute continuity:
$$m^s_x(E) = 0 \iff m^s_y(\pi(E)) = 0.$$
In particular, the image of $W^s(x, R) \cap X$ has full measure in $W^s(y, R)$ and,
consequently, it intersects $W^s(y, R) \cap X$. So, there exists $x' \in W^s(x, R) \cap X$
whose image $y' = \pi(x')$ is in $W^s(y, R) \cap X$. Observing that x' and y' are in
the same unstable leaf, by the definition of π, we see that these points satisfy
the conditions in the conclusion of the lemma.

Consider any rectangle R. Given any x, y in $Y_R$, consider the points x', y' in X
given by Lemma 4.2.17. Using Lemma 4.2.16 as well, we obtain
$$\varphi^-(x) = \varphi^+(x) = \varphi^+(x') = \varphi^-(x') = \varphi^-(y') = \varphi^+(y') = \varphi^+(y) = \varphi^-(y).$$
This shows that the functions $\varphi^+$ and $\varphi^-$ coincide with one another and are
constant in $Y_R$.
Now let $R_1, \dots, R_N$ be a finite cover of the torus by rectangles. Consider
the set
$$Y = \bigcup_{j=1}^{N} Y_j, \quad \text{where } Y_j = Y_{R_j}.$$
Observe that m(Y) = 1, since $Y \cap R_j \supset Y_j$ has full measure in $R_j$ for every j.
We claim that $\varphi^+ = \varphi^-$ is constant on the whole Y. Indeed, given any
$k, l \in \{1, \dots, N\}$ we may find $j_0 = k, j_1, \dots, j_{n-1}, j_n = l$ such that each $R_{j_i}$ intersects
$R_{j_{i-1}}$ (that is just because the torus is path-connected). Recalling that $R_j$ is an
open set and $Y_j$ is a full measure subset, we get that each $Y_{j_i}$ intersects $Y_{j_{i-1}}$.
Then, $\varphi^+ = \varphi^-$ is constant on the union of all the $Y_{j_i}$. This proves our claim.
In this way, we have shown that the time averages $\varphi^\pm$ of any continuous
function φ are constant at m-almost every point. Consequently (see
Exercise 4.1.4), the system $(f_A, m)$ is ergodic.
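The conclusion can be illustrated numerically: for the matrix of Example 4.2.15 and a continuous φ, the time average along a typical orbit should approach $\int \varphi \, dm$. Floating-point orbits of $f_A$ are unreliable because errors grow by a factor $\lambda_u \approx 2.62$ per step, so the sketch below starts from a random point of the grid $(\mathbb{Z}/N)^2$ with $N = 10^{2500}$ and iterates exactly with integer arithmetic modulo N; the first 5000 iterates then differ from those of any real point within 1/N of the start by less than $10^{-400}$. The test functions $\cos(2\pi x_1)$ and $\cos^2(2\pi x_1)$, with space averages 0 and 1/2, and the seed are choices made for this illustration.

```python
import math
import random


def cat_map_birkhoff_average(phi, n=5_000, digits=2_500, seed=2):
    """(1/n) * sum_{j<n} phi(f_A^j(x)) for A = [[2, 1], [1, 1]], iterated exactly.

    x = (a/N, b/N) with N = 10**digits is a random grid point. On such points f_A
    is integer arithmetic modulo N, and the first n iterates differ from those of
    any real point within 1/N of x by at most about 2.62**n / N.
    """
    rng = random.Random(seed)
    N = 10 ** digits
    a, b = rng.randrange(N), rng.randrange(N)
    total = 0.0
    for _ in range(n):
        total += phi(a / N, b / N)
        a, b = (2 * a + b) % N, (a + b) % N    # (x1, x2) -> (2 x1 + x2, x1 + x2) mod 1
    return total / n


def phi1(x1, x2):
    """cos(2 pi x1): continuous on the torus, with integral 0."""
    return math.cos(2 * math.pi * x1)


def phi2(x1, x2):
    """cos(2 pi x1)^2: continuous on the torus, with integral 1/2."""
    return math.cos(2 * math.pi * x1) ** 2


print(cat_map_birkhoff_average(phi1), cat_map_birkhoff_average(phi2))
```

Both averages should land within a few hundredths of 0 and 1/2 respectively; this illustrates, but of course does not prove, that the Birkhoff averages are constant almost everywhere.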

4.2.7 Exercises
4.2.1. Prove Proposition 4.2.2.
4.2.2. Prove Proposition 4.2.9.
4.2.3. Let I = [0, 1] and f : I → I be the function defined by
$$f(x) = \begin{cases} 2x & \text{if } 0 \le x < 1/3, \\ 2x - 2/3 & \text{if } 1/3 \le x < 1/2, \\ 2x - 1/3 & \text{if } 1/2 \le x < 2/3, \\ 2x - 1 & \text{if } 2/3 \le x \le 1. \end{cases}$$
Show that f is ergodic with respect to the Lebesgue measure m.


4.2.4. Let X be a finite set and $\Sigma = X^{\mathbb{N}}$. Prove that every infinite compact subset of $\Sigma$
invariant under the shift map $\sigma : \Sigma \to \Sigma$ contains some non-periodic point.
4.2.5. Let X be a topological space, endowed with the corresponding Borel σ-algebra
$\mathcal{C}$, and let $\Sigma = X^{\mathbb{N}}$. Show that if X has a countable basis of open sets then the
Borel σ-algebra of $\Sigma$ (for the product topology) coincides with the product
σ-algebra $\mathcal{B} = \mathcal{C}^{\mathbb{N}}$. The same is true for $\Sigma = X^{\mathbb{Z}}$ and $\mathcal{B} = \mathcal{C}^{\mathbb{Z}}$.
4.2.6. In this exercise we propose an alternative proof of Proposition 4.2.1. Assume
that θ is irrational. Let A be an invariant set with positive measure. Recalling
that the orbit $\{R_\theta^n(a) : n \in \mathbb{Z}\}$ of every $a \in S^1$ is dense in $S^1$, show that no point
of $S^1$ is a density point of $A^c$. Conclude that μ(A) = 1.
4.2.7. Assume that θ is irrational. Let $\varphi : S^1 \to \mathbb{R}$ be any continuous function. Show
that
$$\tilde\varphi(x) = \lim_{n \to \infty} \frac{1}{n} \sum_{j=0}^{n-1} \varphi(R_\theta^j(x)) \qquad (4.2.19)$$
exists at every point and, in fact, the limit is uniform. Deduce that $\tilde\varphi$ is constant.
Conclude that $R_\theta$ has a unique invariant probability measure.
4.2.8. Let A be a square matrix of dimension d with rational coefficients and let λ be
a rational eigenvalue of A. Show that there exists some eigenvector with integer
coefficients, that is, some $k \in \mathbb{Z}^d \setminus \{0\}$ such that Ak = λk.
4.2.9. Show that if f : M → M is a local diffeomorphism on a compact Riemannian
manifold then
$$\operatorname{degree} f = \int |\det Df| \, dm,$$
where m denotes the volume measure induced by the Riemannian metric of M,
normalized in such a way that m(M) = 1. In particular, for any square matrix A
of dimension d with integer coefficients, the degree of the linear endomorphism
$f_A : \mathbb{T}^d \to \mathbb{T}^d$ is equal to |det A|.
4.2.10. A number x ∈ (0, 1) has continued fraction expansion of bounded type if the
sequence $(a_n)_n$ constructed in Section 1.3.2 is bounded. Prove that the set
$L \subset (0, 1)$ of points with continued fraction expansion of bounded type has Lebesgue
measure zero.
4.2.11. Let f : M → M be a measurable transformation, μ be an ergodic invariant
measure and $\varphi : M \to \mathbb{R}$ be a measurable function with $\int \varphi \, d\mu = +\infty$. Prove
that $\lim_n (1/n) \sum_{j=0}^{n-1} \varphi(f^j(x)) = +\infty$ for μ-almost every x ∈ M.
4.2.12. Observe that the number b in Exercise 3.2.4 is independent of x in a set with full
Lebesgue measure. Prove that the arithmetic mean of the numbers $a_1, \dots, a_n, \dots$
goes to infinity: $\lim_n (1/n)(a_1 + \cdots + a_n) = +\infty$.

4.3 Properties of ergodic measures


In this section we take the transformation f : M → M to be fixed and we analyze
the set $\mathcal{M}_e(f)$ of probability measures that are ergodic with respect to f as a
subset of the space $\mathcal{M}_1(f)$ of all probability measures invariant under f.
Recall that a measure ν is said to be absolutely continuous with respect to
another measure μ if μ(E) = 0 implies ν(E) = 0. Then we write ν ≪ μ. This
relation is transitive: if ν ≪ μ and μ ≪ λ then ν ≪ λ. The first result asserts
that the ergodic probability measures are minimal for this order relation:
Lemma 4.3.1. If μ and ν are invariant probability measures such that μ is
ergodic and ν is absolutely continuous with respect to μ then μ = ν.

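Proof. Let $\varphi : M \to \mathbb{R}$ be any bounded measurable function. Since μ is
invariant and ergodic, the time average
$$\tilde\varphi(x) = \lim_{n \to \infty} \frac{1}{n} \sum_{j=0}^{n-1} \varphi(f^j(x))$$
is constant: $\tilde\varphi(x) = \int \varphi \, d\mu$ at μ-almost every point. Since ν ≪ μ, it follows that
the equality also holds at ν-almost every point. In particular,
$$\int \varphi \, d\nu = \int \tilde\varphi \, d\nu = \int \varphi \, d\mu$$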
(the first equality is part of the Birkhoff ergodic theorem). Therefore, the
integrals of each bounded measurable function ϕ with respect to μ and with
respect to ν coincide. In particular, considering characteristic functions, we
conclude that μ = ν.

It is clear that if $\mu_1$ and $\mu_2$ are probability measures invariant under the
transformation f then so is $(1 - t)\mu_1 + t\mu_2$, for any t ∈ (0, 1). This means that
the space $\mathcal{M}_1(f)$ of all probability measures invariant under f is convex. The
next proposition asserts that the ergodic probability measures are the extremal
elements of this convex set:
Proposition 4.3.2. An invariant probability measure μ is ergodic if and only
if it is not possible to write it as $\mu = (1 - t)\mu_1 + t\mu_2$ with t ∈ (0, 1) and
$\mu_1, \mu_2 \in \mathcal{M}_1(f)$ with $\mu_1 \ne \mu_2$.

Proof. To prove the “if” claim, assume that μ is not ergodic. Then there exists
some invariant set A with 0 < μ(A) < 1. Define $\mu_1$ and $\mu_2$ to be the normalized
restriction of μ to the set A and to its complement $A^c$, respectively:
$$\mu_1(E) = \frac{\mu(E \cap A)}{\mu(A)} \quad\text{and}\quad \mu_2(E) = \frac{\mu(E \cap A^c)}{\mu(A^c)}.$$
Since A and $A^c$ are invariant sets and μ is an invariant measure, both $\mu_1$ and
$\mu_2$ are still invariant probability measures, and $\mu_1 \ne \mu_2$ because $\mu_1(A) = 1$
while $\mu_2(A) = 0$. Moreover,
$$\mu = \mu(A)\mu_1 + \mu(A^c)\mu_2$$
and, consequently, μ is not extremal.
To prove the converse, assume that μ is ergodic and $\mu = (1 - t)\mu_1 + t\mu_2$ for
some t ∈ (0, 1). It is clear that μ(E) = 0 implies $\mu_1(E) = \mu_2(E) = 0$, that is, $\mu_1$
and $\mu_2$ are absolutely continuous with respect to μ. Hence, by Lemma 4.3.1,
$\mu_1 = \mu = \mu_2$. This shows that μ is extremal.
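A finite toy model makes Proposition 4.3.2 concrete. For a permutation f of a finite set, a probability vector is invariant exactly when it is constant on each cycle, so the invariant measures form a simplex whose vertices (the extremal points) are the uniform measures on single cycles, and these are the ergodic ones. The sketch below, with an arbitrary permutation and weight t, builds them and checks that a nontrivial convex combination is invariant but gives the invariant set {0, 1, 2} a measure strictly between 0 and 1, so it is not ergodic.

```python
from fractions import Fraction


def cycles(perm):
    """Cycle decomposition of a permutation of {0, ..., n-1}, given as a list."""
    seen, result = set(), []
    for start in range(len(perm)):
        if start not in seen:
            cycle, x = [], start
            while x not in seen:
                seen.add(x)
                cycle.append(x)
                x = perm[x]
            result.append(cycle)
    return result


def is_invariant(perm, weights):
    """mu(f^{-1}{y}) = mu({y}) for all y; for a bijection this means mu({f(x)}) = mu({x})."""
    return all(weights[perm[x]] == weights[x] for x in range(len(perm)))


def ergodic_measures(perm):
    """Uniform probability on each cycle: the extremal invariant measures."""
    n = len(perm)
    measures = []
    for cycle in cycles(perm):
        w = [Fraction(0)] * n
        for x in cycle:
            w[x] = Fraction(1, len(cycle))
        measures.append(w)
    return measures


perm = [1, 2, 0, 4, 3]                     # cycles {0, 1, 2} and {3, 4}
mu1, mu2 = ergodic_measures(perm)
t = Fraction(1, 3)
mu = [(1 - t) * a + t * b for a, b in zip(mu1, mu2)]
print(mu1, mu2, is_invariant(perm, mu))    # mu is invariant but not ergodic
```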

Let us also point out that distinct ergodic measures “live” in disjoint subsets
of the space M (see also Exercise 4.3.6):
Lemma 4.3.3. Assume that the σ-algebra of M admits some countable
generating subset Γ. Let $\{\mu_i : i \in I\}$ be an arbitrary family of ergodic
probability measures, all distinct. Then these measures $\mu_i$ are mutually
singular: there exist pairwise disjoint measurable subsets $\{P_i : i \in I\}$ invariant
under f and such that $\mu_i(P_i) = 1$ for every i ∈ I.

Proof. Let $\mathcal{A}$ be the algebra generated by Γ. Note that $\mathcal{A}$ is countable, since it
coincides with the union of the (finite) algebras generated by the finite subsets
of Γ. For each i ∈ I, define
$$P_i = \bigcap_{A \in \mathcal{A}} \{ x \in M : \tau(A, x) = \mu_i(A) \}.$$
Each $P_i$ is invariant under f, since $\tau(A, f(x)) = \tau(A, x)$.
Since $\mu_i$ is ergodic, $\{ x \in M : \tau(A, x) = \mu_i(A) \}$ has full $\mu_i$-measure for each A.
Using that $\mathcal{A}$ is countable, it follows that $\mu_i(P_i) = 1$ for every i ∈ I. Moreover,
if there exists $x \in P_i \cap P_j$ then $\mu_i(A) = \tau(A, x) = \mu_j(A)$ for every $A \in \mathcal{A}$. Since $\mathcal{A}$ is an
algebra that generates the σ-algebra, this implies $\mu_i = \mu_j$, that is, i = j. This proves that the $P_i$ are pairwise disjoint.

Now assume that f : M → M is a continuous transformation in a topological
space M. We say that f is transitive if there exists some x ∈ M such that
$\{f^n(x) : n \in \mathbb{N}\}$ is dense in M. The next lemma provides a useful characterization of
transitivity. Recall that a topological space M is called a Baire space if the
intersection of any countable family of open dense subsets is dense in M. Every
complete metric space is a Baire space and the same is true for every locally
compact topological space (see [Dug66]).
Lemma 4.3.4. Let M be a Baire space with a countable basis of open sets.
Then f : M → M is transitive if and only if for every pair of nonempty open sets U and V
there exists k ≥ 1 such that $f^{-k}(U)$ intersects V.

Proof. Assume that f is transitive and let x ∈ M be a point whose orbit
$\{f^n(x) : n \in \mathbb{N}\}$ is dense. Then there exists m ≥ 1 such that $f^m(x) \in V$ and (using the
fact that $\{f^n(x) : n > m\}$ is also dense) there exists n > m such that $f^n(x) \in U$.
Take k = n − m. Then $f^m(x) \in f^{-k}(U) \cap V$. This proves the “only if” part of the
statement.
To prove the converse, let $\{U_j : j \in \mathbb{N}\}$ be a countable basis of open subsets
of M. The hypothesis ensures that the open set $\bigcup_{k=1}^{\infty} f^{-k}(U_j)$ is dense in M for
every j ∈ N. Then the intersection
$$X = \bigcap_{j=1}^{\infty} \bigcup_{k=1}^{\infty} f^{-k}(U_j)$$
is a dense subset of M. In particular, it is non-empty. On the other hand, by
definition, if x ∈ X then for every j ∈ N there exists k ≥ 1 such that $f^k(x) \in U_j$.
Since the $U_j$ constitute a basis of open subsets of M, this means that
$\{f^k(x) : k \in \mathbb{N}\}$ is dense in M.

Proposition 4.3.5. Let M be a Baire space with a countable basis of open sets.
If μ is an ergodic probability measure then the restriction of f to the support
of μ is transitive.

Proof. Start by noting that supp μ has a countable basis of open sets, because
it is a subspace of M, and it is a Baire space, since it is closed in M. Let
U and V be nonempty open subsets of supp μ. By the definition of support, μ(U) > 0
and μ(V) > 0. Define $B = \bigcup_{k=1}^{\infty} f^{-k}(U)$. Then μ(B) > 0, because $B \supset f^{-1}(U)$
and $\mu(f^{-1}(U)) = \mu(U) > 0$ by invariance, and $f^{-1}(B) \subset B$. By ergodicity (see
Exercise 1.1.4) it follows that μ(B) = 1.
Then B must intersect V. This proves that there exists k ≥ 1 such that $f^{-k}(U)$
intersects V. By Lemma 4.3.4, it follows that the restriction $f : \operatorname{supp}\mu \to \operatorname{supp}\mu$
is transitive.

4.3.1 Exercises
4.3.1. Let M be a topological space with a countable basis of open sets, f : M → M
be a measurable transformation and μ be an ergodic probability measure. Show
that the orbit $\{f^n(x) : n \ge 0\}$ of μ-almost every point x ∈ M is dense in the support
of μ.
4.3.2. Let f : M → M be a continuous transformation in a compact metric space. Given
a function $\varphi : M \to \mathbb{R}$, prove that there exists an invariant probability measure $\mu_\varphi$
such that
$$\int \varphi \, d\mu_\varphi = \sup_{\eta \in \mathcal{M}_1(f)} \int \varphi \, d\eta.$$

4.3.3. Let g : E → E be a transformation induced by f : M → M, that is, a transformation
of the form $g(x) = f^{\rho(x)}(x)$ with ρ : E → N (see Section 1.4.2). Let ν be an
invariant probability measure of g and $\nu_\rho$ be the invariant measure of f defined
by (1.4.5). Assume that $\nu_\rho(M) < \infty$ and denote $\mu = \nu_\rho / \nu_\rho(M)$. Show that (f, μ)
is ergodic if and only if (g, ν) is ergodic.
4.3.4. Let f : M → M be a continuous transformation in a separable complete metric
space. Given any invariant probability measure μ, let μ̂ be its lift to the natural
extension f̂ : M̂ → M̂ (see Section 2.4.2). Show that (f̂ , μ̂) is ergodic if and only
if (f , μ) is ergodic.
4.3.5. Let f : M → M be a measurable transformation and μ be an invariant measure.
Let $g^t : N \to N$, $t \in \mathbb{R}$, be a suspension flow of f and ν be the corresponding
suspension of the measure μ (see Section 3.4.1). Assume that ν(N) < ∞ and
denote $\hat\nu = \nu / \nu(N)$. Show that $\hat\nu$ is ergodic for the flow $(g^t)_t$ if and only if μ is
ergodic for f.
4.3.6. Show that for finite or countable families of ergodic measures the conclusion of
Lemma 4.3.3 holds even if the σ -algebra is not countably generated.
4.3.7. Give an example of a metric space M and a transformation f : M → M such that
there exists a sequence of ergodic Borel measures μn converging, in the weak∗
topology, to a non-ergodic invariant measure μ.
4.3.8. Let M be a metric space, f : M → M be a continuous transformation and μ be an
ergodic probability measure. Show that $\frac{1}{n} \sum_{j=0}^{n-1} f^j_* \nu$ converges to μ in the weak∗
topology for any probability measure ν on M absolutely continuous with respect
to μ, but not necessarily invariant.
4.3.9. Let X = {1, . . . , d} and $\sigma : \Sigma \to \Sigma$ be the shift map in $\Sigma = X^{\mathbb{N}}$ or $\Sigma = X^{\mathbb{Z}}$.
(1) Show that for every δ > 0 there exists k ≥ 1 such that, given $x_1, \dots, x_s \in \Sigma$
and $m_1, \dots, m_s \ge 1$, there exists a periodic point $y \in \Sigma$ with period $n_s$ and
such that $d(f^{j+n_i}(y), f^j(x_i)) < \delta$ for every $0 \le j < m_i$, where $n_1 = 0$ and
$n_i = (m_1 + k) + \cdots + (m_{i-1} + k)$ for $1 < i \le s$.
(2) Let $\varphi : \Sigma \to \mathbb{R}$ be a continuous function and $\tilde\varphi$ be its Birkhoff average.
Show that, given ε > 0, points $x_1, \dots, x_s \in \Sigma$ where the Birkhoff average
of φ is well defined, and numbers $\alpha_1, \dots, \alpha_s > 0$ such that $\sum_i \alpha_i = 1$, there
exists a periodic point $y \in \Sigma$ satisfying $|\tilde\varphi(y) - \sum_i \alpha_i \tilde\varphi(x_i)| < \varepsilon$.
(3) Conclude that the set $\mathcal{M}_e(\sigma)$ of ergodic probability measures is dense in
the space $\mathcal{M}_1(\sigma)$ of all invariant probability measures.

4.4 Comments in conservative dynamics


The ergodic theorem of Birkhoff, proven in the 1930’s, provided a solid math-
ematical foundation to the statement of the Boltzmann ergodic hypothesis, but
left entirely open the question of its veracity. In this section we briefly survey
the main results obtained since then, in the context of conservative systems,
that is, dynamical systems that preserve a volume measure on a manifold.
Let us start by mentioning that, in a certain abstract sense, the majority of
conservative systems are ergodic. That is the sense of the theorem that we
state next, which was proven in the early 1940’s by John Oxtoby and Stanislav
Ulam [OU41]. Recall that a subset of a Baire space is called residual if it
may be written as a countable intersection of open and dense subsets. By the
definition of Baire space, every residual subset is dense.

Theorem 4.4.1 (Oxtoby, Ulam). For every compact Riemannian manifold M
there exists a residual subset R of the space $\mathrm{Homeo}_{\mathrm{vol}}(M)$ of all conservative
homeomorphisms of M such that every element of R is ergodic.

The results presented below imply that the conclusion of this theorem is
no longer true when one replaces $\mathrm{Homeo}_{\mathrm{vol}}(M)$ by the space $\mathrm{Diffeo}^k_{\mathrm{vol}}(M)$
of conservative diffeomorphisms of class $C^k$, at least for k > 3. Essentially
nothing is known in this regard in the cases k = 2 and k = 3. On the other hand,
Artur Avila, Sylvain Crovisier and Amie Wilkinson have recently announced
a $C^1$ version of the previous theorem: for every compact Riemannian manifold
M, there exists a residual subset R of the space $\mathrm{Diffeo}^1_{\mathrm{vol}}(M)$ of conservative
diffeomorphisms of class $C^1$ such that every f ∈ R with positive entropy $h_{\mathrm{vol}}(f)$
is ergodic. The notion of entropy will be studied in Chapter 9.

4.4.1 Hamiltonian systems


The systems that interested Boltzmann, relative to the motion of gas molecules,
may, in principle, be described by the laws of Newtonian classical mechanics.
In the so-called Hamiltonian formalism of classical mechanics, the states
of the system are represented by “generalized coordinates” $q_1, \dots, q_d$ and
“generalized momenta” $p_1, \dots, p_d$, and the system’s evolution is described by
the solutions of the Hamilton–Jacobi equations:
$$\frac{dq_j}{dt} = \frac{\partial H}{\partial p_j} \quad\text{and}\quad \frac{dp_j}{dt} = -\frac{\partial H}{\partial q_j}, \qquad j = 1, \dots, d, \qquad (4.4.1)$$
where H (the total energy of the system) is a $C^2$ function of the variables
$q = (q_1, \dots, q_d)$ and $p = (p_1, \dots, p_d)$; the integer d ≥ 1 is the number of degrees of
freedom.
Example 4.4.2 (Harmonic pendulum). Let d = 1 and $H(q, p) = p^2/2 - g \cos q$,
where g is a positive constant and $(q, p) \in \mathbb{R}^2$. The Hamilton–Jacobi equations
$$\frac{dq}{dt} = p \quad\text{and}\quad \frac{dp}{dt} = -g \sin q$$
describe the motion of a pendulum subject to a constant gravitational field:
the coordinate q measures the angle with respect to the position of (stable)
equilibrium and p measures the angular momentum. Then $p^2/2$ is the kinetic
energy and $-g \cos q$ is the potential energy. Thus, the Hamiltonian H is the
total energy.
Note that H is always a first integral of the system, that is, it is constant
along the flow trajectories:
$$\frac{dH}{dt} = \sum_{j=1}^{d} \left( \frac{\partial H}{\partial q_j} \frac{dq_j}{dt} + \frac{\partial H}{\partial p_j} \frac{dp_j}{dt} \right) \equiv 0.$$
Thus, we may consider the restriction of the flow to each energy hypersurface
$H_c = \{(q, p) : H(q, p) = c\}$. The volume measure $dq_1 \cdots dq_d \, dp_1 \cdots dp_d$ is called
the Liouville measure. Observe that the divergence of the vector field
$$F = \left( \frac{\partial H}{\partial p_1}, \dots, \frac{\partial H}{\partial p_d}, -\frac{\partial H}{\partial q_1}, \dots, -\frac{\partial H}{\partial q_d} \right)$$
is identically zero, since the terms $\partial^2 H / \partial q_j \partial p_j$ cancel in pairs. Thus (recall Section 1.3.6) the Liouville measure is
invariant under the Hamiltonian flow. It follows (see Exercise 1.3.12) that
the restriction of the flow to each energy hypersurface $H_c$ admits an invariant
measure $\mu_c$ that is given by
$$\mu_c(E) = \int_E \frac{ds}{\|\operatorname{grad} H\|} \quad \text{for every measurable set } E \subset H_c,$$
where ds denotes the volume element on the hypersurface. Then, the ergodic
hypothesis may be viewed as claiming that, in general, Hamiltonian systems
are ergodic with respect to this invariant measure $\mu_c$ on (almost) every energy
hypersurface.
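For Example 4.4.2, the two facts above (conservation of H and preservation of the Liouville measure) can be observed with a symplectic integrator. The sketch below uses the Störmer-Verlet (leapfrog) scheme, which is not the exact Hamiltonian flow but a composition of shears; each step therefore has Jacobian determinant 1, and along the discrete orbit the energy error stays small (of order dt²) instead of drifting. The choice g = 1, the step size, the initial condition and the finite-difference increment are assumptions of this illustration.

```python
import math


def leapfrog_step(q, p, dt, g=1.0):
    """One Stormer-Verlet (leapfrog) step for H(q, p) = p^2/2 - g cos q."""
    p_half = p - 0.5 * dt * g * math.sin(q)        # dp/dt = -dH/dq = -g sin q
    q_new = q + dt * p_half                        # dq/dt =  dH/dp = p
    p_new = p_half - 0.5 * dt * g * math.sin(q_new)
    return q_new, p_new


def energy(q, p, g=1.0):
    return 0.5 * p * p - g * math.cos(q)


def max_energy_drift(q0=1.0, p0=0.0, dt=0.01, steps=10_000):
    """Largest |H(q_n, p_n) - H(q_0, p_0)| along the discrete orbit."""
    q, p = q0, p0
    e0 = energy(q, p)
    worst = 0.0
    for _ in range(steps):
        q, p = leapfrog_step(q, p, dt)
        worst = max(worst, abs(energy(q, p) - e0))
    return worst


def step_jacobian_det(q, p, dt=0.01, h=1e-6):
    """Determinant of the derivative of one step, by central finite differences."""
    qa, pa = leapfrog_step(q + h, p, dt)
    qb, pb = leapfrog_step(q - h, p, dt)
    qc, pc = leapfrog_step(q, p + h, dt)
    qd, pd = leapfrog_step(q, p - h, dt)
    dq_dq, dp_dq = (qa - qb) / (2 * h), (pa - pb) / (2 * h)
    dq_dp, dp_dp = (qc - qd) / (2 * h), (pc - pd) / (2 * h)
    return dq_dq * dp_dp - dq_dp * dp_dq


print(max_energy_drift(), step_jacobian_det(1.0, 0.3))
```

A non-symplectic scheme such as the explicit Euler method would, by contrast, let the energy grow steadily, which is one reason symplectic integrators are the usual choice for long simulations of Hamiltonian systems.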
The first important result in this context was announced by Andrey Kolmogorov
at the International Congress of Mathematicians ICM 1954 and was
substantiated, soon afterwards, by the works of Vladimir Arnold and Jürgen
Moser. This led to the deep theory of so-called almost integrable systems that
is known as KAM theory, in homage to its founders, and to which several
other mathematicians contributed in a decisive manner, including Helmut
Rüssmann, Michael Herman, Eduard Zehnder, Jean-Christophe Yoccoz and
Jürgen Pöschel, among others. Let us explain what is meant by “almost
integrable”.
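A Hamiltonian system with d degrees of freedom is said to be integrable (in the sense of Liouville) if it admits d first integrals I1, . . . , Id:

• independent: that is, such that the gradients
$$\operatorname{grad} I_j = \left( \frac{\partial I_j}{\partial q_1}, \frac{\partial I_j}{\partial p_1}, \dots, \frac{\partial I_j}{\partial q_d}, \frac{\partial I_j}{\partial p_d} \right), \qquad 1 \le j \le d,$$
are linearly independent at every point on an open and dense subset of the domain;
• in involution: that is, such that the Poisson brackets
$$\{I_j, I_k\} = \sum_{i=1}^{d} \left( \frac{\partial I_j}{\partial q_i}\frac{\partial I_k}{\partial p_i} - \frac{\partial I_j}{\partial p_i}\frac{\partial I_k}{\partial q_i} \right)$$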
are all identically zero.
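For concreteness, the brackets in this definition can be approximated by finite differences. The sketch below is ours: it stores a point as (q1, . . . , qd, p1, . . . , pd), an ordering different from the one used for the gradients above that does not affect the bracket, and checks for d = 2 that the planar Kepler Hamiltonian H = ‖p‖²/2 − 1/‖q‖ and the angular momentum L = q1p2 − q2p1 are in involution, and that {q1, p1} = 1.

```python
import math

def partial(f, x, i, h=1e-5):
    """Central-difference approximation of the i-th partial derivative of f at x."""
    xp, xm = list(x), list(x)
    xp[i] += h
    xm[i] -= h
    return (f(xp) - f(xm)) / (2 * h)

def poisson_bracket(f, g, x):
    """{f, g}(x) = sum_i (df/dq_i * dg/dp_i - df/dp_i * dg/dq_i), x = (q_1..q_d, p_1..p_d)."""
    d = len(x) // 2
    total = 0.0
    for i in range(d):
        total += partial(f, x, i) * partial(g, x, d + i)
        total -= partial(f, x, d + i) * partial(g, x, i)
    return total

def kepler(x):
    q1, q2, p1, p2 = x
    return 0.5 * (p1 ** 2 + p2 ** 2) - 1.0 / math.hypot(q1, q2)

def angular_momentum(x):
    q1, q2, p1, p2 = x
    return q1 * p2 - q2 * p1

def q1_coord(x):
    return x[0]

def p1_coord(x):
    return x[2]

point = [0.8, -0.3, 0.2, 1.1]
print(poisson_bracket(kepler, angular_momentum, point))  # close to 0
print(poisson_bracket(q1_coord, p1_coord, point))         # close to 1
```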

It follows from the previous remarks that every system with d = 1 degree
of freedom is integrable: the Hamiltonian H itself is a first integral. Another
important example:

Example 4.4.3. For any number d ≥ 1 of degrees of freedom, assume that the Hamiltonian H depends only on the variables p = (p1, . . . , pd). Then the Hamilton–Jacobi equations (4.4.1) reduce to
$$\frac{dq_j}{dt} = \frac{\partial H}{\partial p_j}(p) \quad \text{and} \quad \frac{dp_j}{dt} = -\frac{\partial H}{\partial q_j}(p) = 0.$$
The second equation means that each pj is a first integral; it is easy to see that these first integrals are independent and in involution. Then the expression on the right-hand side of the first equation is independent of time. Hence, the solution is given by
$$q_j(t) = q_j(0) + \frac{\partial H}{\partial p_j}\big(p(0)\big)\, t.$$
As we explain next, this example is entirely typical of integrable systems.
A classical theorem of Liouville asserts that if the system is integrable then the Hamilton–Jacobi equations may be solved completely by quadratures. In the proof (see Arnold [Arn78]) one constructs certain functions ϕ = (ϕ1, . . . , ϕd) with values in Td which, together with the first integrals I = (I1, . . . , Id) ∈ Rd, constitute canonical coordinates of the system (they are called action-angle coordinates). What we mean by “canonical” is that the coordinate change
$$\Phi : (q, p) \mapsto (\varphi, I)$$
preserves the form of the Hamilton–Jacobi equations: (4.4.1) becomes
$$\frac{d\varphi_j}{dt} = \frac{\partial \tilde H}{\partial I_j} \quad \text{and} \quad \frac{dI_j}{dt} = -\frac{\partial \tilde H}{\partial \varphi_j}, \qquad (4.4.2)$$
where H̃ = H ∘ Φ⁻¹ is the expression of the Hamiltonian in the new coordinates. Since the Ij are first integrals, the second equation yields
$$0 = \frac{dI_j}{dt} = -\frac{\partial \tilde H}{\partial \varphi_j}.$$
This means that H̃ does not depend on the variables ϕj and so we are in the type of situation described in Example 4.4.3. Each trajectory of the Hamiltonian flow is constrained inside a torus {I = const} and, according to the first equation in (4.4.2), it is linear in the coordinate ϕ:
$$\varphi_j(t) = \varphi_j(0) + \omega_j(I)\, t, \quad \text{where } \omega_j(I) = \frac{\partial \tilde H}{\partial I_j}(I).$$
In terms of the original coordinates (q, p), we conclude that the trajectories of the Hamiltonian flow are given by
$$t \mapsto \Phi^{-1}\big(\varphi(0) + \omega(I)t,\, I\big) = \Phi_{\varphi(0),I}\big(\omega(I)t\big), \qquad (4.4.3)$$
where $\Phi_{\varphi(0),I}(x) = \Phi^{-1}(\varphi(0) + x, I)$ defines a Zd-periodic function $\Phi_{\varphi(0),I} : \mathbb{R}^d \to M$ and ω(I) = (ω1(I), . . . , ωd(I))
is called a frequency vector. We say that the trajectory is quasi-periodic.
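A worked example may help fix the normalizations. For the harmonic oscillator with one degree of freedom, H(q, p) = (q² + p²)/2 (our example, not taken from the text), action-angle coordinates with ϕ ∈ T = R/Z can be chosen as below, writing H̃ for the Hamiltonian in the new coordinates. With r = √(I/π) and α = 2πϕ one has dq ∧ dp = r dα ∧ dr, and r dr = dI/(2π), which gives the second line.

```latex
\begin{aligned}
&q = \sqrt{I/\pi}\,\sin 2\pi\varphi, \qquad p = \sqrt{I/\pi}\,\cos 2\pi\varphi, \qquad I > 0,\\
&dq \wedge dp = d\varphi \wedge dI, \qquad
\tilde H(\varphi, I) = \frac{q^2 + p^2}{2} = \frac{I}{2\pi}, \qquad
\omega(I) = \frac{\partial \tilde H}{\partial I} = \frac{1}{2\pi},\\
&\varphi(t) = \varphi(0) + \frac{t}{2\pi}
\;\Longrightarrow\;
q(t) = \sqrt{I/\pi}\,\sin\big(2\pi\varphi(0) + t\big), \quad
p(t) = \sqrt{I/\pi}\,\cos\big(2\pi\varphi(0) + t\big).
\end{aligned}
```

Each invariant torus {I = const} is the circle of radius √(I/π), which encloses area I, and one checks directly that dq/dt = p and dp/dt = −q.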

4.4.2 Kolmogorov–Arnold–Moser theory


It is clear that integrable systems are never ergodic. However, since integrability is a very rare property, this alone would not be an obstruction to most Hamiltonian systems being ergodic. Nevertheless, the fundamental result that we state next asserts that generic integrable systems are robustly non-ergodic: every nearby Hamiltonian flow is also non-ergodic.
Let H0 be an integrable Hamiltonian, written in action-angle coordinates (ϕ, I). More precisely, let Bd be a ball in Rd and assume that H0(ϕ, I) is defined for every (ϕ, I) ∈ Td × Bd but depends only on the coordinate I. We call H0 non-degenerate if its Hessian matrix is invertible:
$$\det\left( \frac{\partial^2 H_0}{\partial I_i \, \partial I_j} \right)_{i,j} \neq 0 \quad \text{at every point.} \qquad (4.4.4)$$
Observe that the Hessian matrix of H0 coincides with the Jacobian matrix of
the function I → ω(I). Therefore, the twist condition (4.4.4) means that the
map assigning to each value of I the corresponding frequency vector ω(I) is a
local diffeomorphism.
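Two simple examples (ours, chosen for illustration) show the difference between degenerate and non-degenerate integrable Hamiltonians:

```latex
\begin{aligned}
H_0(I) = \tfrac12\,(I_1^2 + \cdots + I_d^2)
  &\;\Longrightarrow\; \omega(I) = I, \quad
  \Big(\frac{\partial^2 H_0}{\partial I_i\,\partial I_j}\Big)_{i,j} = \mathrm{Id}, \quad \det = 1 \neq 0;\\
H_0(I) = \alpha_1 I_1 + \cdots + \alpha_d I_d
  &\;\Longrightarrow\; \omega(I) = (\alpha_1, \dots, \alpha_d), \quad
  \Big(\frac{\partial^2 H_0}{\partial I_i\,\partial I_j}\Big)_{i,j} = 0, \quad \det = 0.
\end{aligned}
```

In the first case the frequency map is the identity, so distinct tori carry distinct frequency vectors. In the second case, which is the form taken (up to normalization of the actions) by d uncoupled harmonic oscillators, all tori share the same frequency vector and (4.4.4) fails.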
The next theorem states that, under this condition, most of the invariant tori
of the Hamiltonian flow of H0 persist for any nearby system:

Theorem 4.4.4. Let H0 be an integrable non-degenerate Hamiltonian of class C∞. Then there exists a neighborhood V of H0 in the space C∞(Td × Bd, R) such that for every H ∈ V there exists a compact set K ⊂ Td × Bd satisfying:

(i) K is a union of differentiable tori of the form {(ϕ, u(ϕ)) : ϕ ∈ Td}, each of which is invariant under the Hamiltonian flow of H;
(ii) the restriction of the Hamiltonian flow of H to each of these tori is conjugate to a linear flow on Td;
(iii) the set K has positive volume and, in fact, the volume of its complement goes to zero when H → H0.

In particular, the Hamiltonian flow of H cannot be ergodic.

The last claim holds because the set K may be decomposed into positive volume
subsets that are also unions of invariant tori and, thus, are invariant. The proof
of the theorem shows that the persistence or not of a given invariant torus of H0
is intimately related to the arithmetic properties of the corresponding frequency
vector. Let us explain this.
Given c > 0 and τ > 0, we say that a vector ω0 ∈ Rd is (c, τ)-Diophantine if
$$|k \cdot \omega_0| \ge \frac{c}{\|k\|^{\tau}} \quad \text{for every } k \in \mathbb{Z}^d \setminus \{0\}, \qquad (4.4.5)$$
where ‖k‖ = |k1| + · · · + |kd|. Diophantine vectors are rationally independent;
in fact, the condition (4.4.5) means that ω0 is badly approximated by rationally
dependent vectors. We say that ω0 is τ -Diophantine if it is (c, τ )-Diophantine
for some c > 0. The set of τ -Diophantine vectors is non-empty if and only if
τ ≥ d − 1; moreover, it has full measure in Rd if τ is strictly larger than d − 1
(see Exercise 4.4.1).
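The definition involves infinitely many k, so it cannot be verified by computation alone, but a finite search shows how it behaves. The following sketch (our illustration; the vectors and the cut-off N are arbitrary choices) computes the minimum of |k · ω0| ‖k‖^τ over 0 < ‖k‖ ≤ N. For ω0 = (1, (1 + √5)/2) and τ = d − 1 = 1 the minimum stays away from zero as N grows, while for the rationally dependent vector (1, 2) it vanishes.

```python
import math
from itertools import product

def diophantine_constant(omega, tau, n_max):
    """Minimum of |k . omega| * ||k||_1**tau over integer k with 0 < ||k||_1 <= n_max.

    omega is (c, tau)-Diophantine iff this is >= c for every n_max, so a
    finite search only gives an upper bound for the best constant c.
    """
    best = math.inf
    for k in product(range(-n_max, n_max + 1), repeat=len(omega)):
        norm = sum(abs(ki) for ki in k)
        if norm == 0 or norm > n_max:
            continue
        value = abs(sum(ki * wi for ki, wi in zip(k, omega))) * norm ** tau
        best = min(best, value)
    return best

golden = (1.0, (1.0 + math.sqrt(5.0)) / 2.0)
print(diophantine_constant(golden, tau=1.0, n_max=100))      # bounded away from 0
print(diophantine_constant((1.0, 2.0), tau=1.0, n_max=100))  # 0.0: rationally dependent
```

A positive output for one cut-off is only evidence; proving the Diophantine property requires an argument, such as one based on the continued fraction expansion of (1 + √5)/2.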
While proving Theorem 4.4.4, it is shown that, given c > 0, τ ≥ d − 1 and any compact subset of ω(Bd), one can find a neighborhood V of H0 such that, for every H ∈ V and every (c, τ)-Diophantine vector ω0 in that compact subset, the Hamiltonian flow of H admits a differentiable invariant torus restricted to which the flow is conjugate to the linear flow t ↦ ϕ(t) = ϕ(0) + tω0.
Next, we discuss a version of Theorem 4.4.4 for discrete time systems or, more precisely, symplectic transformations. We call a symplectic manifold (see Arnold [Arn78, Chapter 8]) any differentiable manifold M endowed with a symplectic form, that is, a closed non-degenerate differential 2-form θ. By “non-degenerate” we mean that for every x ∈ M and every u ∈ TxM with u ≠ 0 there exists v such that θx(u, v) ≠ 0. Existence of a symplectic form on M implies that the dimension is even: write dim M = 2d. Moreover, the d-th power θ^d = θ ∧ · · · ∧ θ is a volume form on M.
A differentiable transformation f : M → M is said to be symplectic if it
preserves the symplectic form, meaning that θx (u, v) = θf (x) (Df (x)u, Df (x)v)
for every x ∈ M and any u, v ∈ Tx M. Then, in particular, f preserves the volume
form θ d .

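Example 4.4.5. Let M = R2d, with coordinates (q1, . . . , qd, p1, . . . , pd), and let θ be the differential 2-form defined by
$$\theta_x = dq_1 \wedge dp_1 + \cdots + dq_d \wedge dp_d \qquad (4.4.6)$$
for every x. Then θ is a symplectic form on M. Actually, a classical theorem of Darboux states that for every symplectic form there exists some atlas of the manifold such that the expression of the symplectic form in any local chart is of the type (4.4.6). Consider any transformation of the form
$$f_0(q_1, \dots, q_d, p_1, \dots, p_d) = (q_1 + \omega_1(p), \dots, q_d + \omega_d(p), p_1, \dots, p_d),$$
where the Jacobian matrix (∂ωi/∂pj)i,j is symmetric (this holds, for instance, whenever ω = (ω1, . . . , ωd) is the gradient of some function of p, and it is automatic when d = 1). Using
$$Df_0 \cdot \frac{\partial}{\partial q_j} = \frac{\partial}{\partial q_j} \quad \text{and} \quad Df_0 \cdot \frac{\partial}{\partial p_j} = \frac{\partial}{\partial p_j} + \sum_i \frac{\partial \omega_i}{\partial p_j}\frac{\partial}{\partial q_i},$$
together with the symmetry of (∂ωi/∂pj)i,j,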
we see that f0 is symplectic with respect to the form θ .
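For d ≥ 2 the symmetry of the Jacobian of ω is exactly what makes f0 symplectic: from the formulas for Df0 one gets θ(Df0 ∂/∂pj, Df0 ∂/∂pk) = ∂ωk/∂pj − ∂ωj/∂pk. The sketch below (ours) checks this numerically at one point of R⁴: it estimates Df0 by central differences, writes θ(u, v) = uᵀJv with J the matrix of (4.4.6) in the coordinates (q1, q2, p1, p2), and measures how far Df0ᵀ J Df0 is from J, for ω the gradient of h(p) = p1² + p2 sin p1 and for the non-symmetric choice ω(p) = (p2, 0).

```python
import math

def twist_map(x, omega):
    """f0(q, p) = (q + omega(p), p) on R^{2d}, with x = [q_1..q_d, p_1..p_d]."""
    d = len(x) // 2
    q, p = x[:d], x[d:]
    w = omega(p)
    return [qi + wi for qi, wi in zip(q, w)] + list(p)

def jacobian(f, x, h=1e-6):
    """Central-difference Jacobian matrix A[i][j] = d f_i / d x_j."""
    n = len(x)
    A = [[0.0] * n for _ in range(n)]
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        fp, fm = f(xp), f(xm)
        for i in range(n):
            A[i][j] = (fp[i] - fm[i]) / (2 * h)
    return A

def symplectic_defect(A):
    """Largest entry of A^T J A - J, J the matrix of dq_1^dp_1 + ... + dq_d^dp_d."""
    n = len(A)
    d = n // 2
    J = [[0.0] * n for _ in range(n)]
    for i in range(d):
        J[i][d + i], J[d + i][i] = 1.0, -1.0
    JA = [[sum(J[i][k] * A[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    AtJA = [[sum(A[k][i] * JA[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    return max(abs(AtJA[i][j] - J[i][j]) for i in range(n) for j in range(n))

def grad_omega(p):
    # gradient of h(p) = p_1**2 + p_2*sin(p_1): symmetric Jacobian
    return [2 * p[0] + p[1] * math.cos(p[0]), math.sin(p[0])]

def shear_omega(p):
    # omega(p) = (p_2, 0): Jacobian [[0, 1], [0, 0]] is not symmetric
    return [p[1], 0.0]

x0 = [0.1, 0.4, 0.7, -0.2]
ok = symplectic_defect(jacobian(lambda x: twist_map(x, grad_omega), x0))
bad = symplectic_defect(jacobian(lambda x: twist_map(x, shear_omega), x0))
print(ok, bad)  # ok is rounding error only; bad is close to 1
```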

Example 4.4.6 (Cotangent bundle). Let M be a manifold of class Cr with r ≥ 3. By definition, the cotangent space Tq∗M at each point q ∈ M is the dual of the tangent space TqM, and the cotangent bundle of M is the disjoint union $T^*M = \bigsqcup_{q \in M} T_q^*M$ of all cotangent spaces. See Appendix A.4.3. The cotangent bundle is a manifold of class Cr−1 and the canonical projection π∗ : T∗M → M mapping each Tq∗M to the corresponding base point q is a map of class Cr−1. A very important feature of the cotangent bundle is that it always admits a canonical symplectic form, that is, one that depends only on the manifold M. That can be seen as follows. Let α be the differential 1-form on T∗M defined by
$$\alpha_{(q,p)} : T_{(q,p)}(T^*M) \to \mathbb{R}, \qquad \alpha_{(q,p)} = p \circ D\pi^*(q, p)$$
for each (q, p) ∈ T∗M. It is clear that α is well defined and of class Cr−2.
Consider the exterior derivative θ ∗ = dα. One can check (for instance, using
local coordinates) that θ ∗ is non-degenerate at every point and, thus, is a
symplectic form in T ∗ M.
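In local coordinates this check is short; the following computation is a sketch of it. Take coordinates (q1, . . . , qd) on M and write each covector as p = p1 dq1 + · · · + pd dqd, so that (q, p) are coordinates on T∗M. Since Dπ∗(q, p)(δq, δp) = δq,

```latex
\alpha_{(q,p)}(\delta q, \delta p) = \sum_{i=1}^{d} p_i\, \delta q_i,
\qquad\text{that is,}\qquad
\alpha = \sum_{i=1}^{d} p_i\, dq_i
\qquad\text{and}\qquad
\theta^* = d\alpha = \sum_{i=1}^{d} dp_i \wedge dq_i .
```

Up to sign, this is the form (4.4.6), which is non-degenerate.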

There is no corresponding statement for the tangent bundle TM. However, once we fix a Riemannian metric on M it is possible to endow the tangent bundle with a (non-canonical) symplectic form:
Example 4.4.7 (Tangent bundle). Let M be a Riemannian manifold of class Cr with r ≥ 3. Then we may identify the tangent bundle TM with the cotangent bundle T∗M through the map Ψ : TM → T∗M that maps each point (q, v) with v ∈ TqM to the point (q, p) with p ∈ Tq∗M defined by
$$p : T_qM \to \mathbb{R}, \qquad p(w) = v \cdot_q w.$$
Indeed, Ψ is a diffeomorphism and it maps fibers of TM to fibers of T∗M, preserving the base point. In particular, we may use Ψ to transport the symplectic form θ∗ in Example 4.4.6 to a symplectic form θ in TM:
$$\theta_{(q,v)}(w_1, w_2) = \theta^*_{\Psi(q,v)}\big(D\Psi(q,v)\,w_1,\; D\Psi(q,v)\,w_2\big)$$
for any w1, w2 ∈ T(q,v)(TM). It is clear from the construction that, unlike θ∗, this form θ depends on the Riemannian metric in M.
By analogy with the case of flows, we call a transformation f0 integrable if
there exist coordinates (q, p) ∈ Td × Bd such that f0 (q, p) = (q + ω(p), p) for
every (q, p). Moreover, we say that f0 is non-degenerate if

the map p → ω(p) is a local diffeomorphism. (4.4.7)

Theorem 4.4.8. Let f0 be a non-degenerate integrable transformation of class C∞. Then there exists a neighborhood V of f0 in the space C∞(Td × Bd, Rd) such that for every symplectic transformation² f ∈ V there exists a compact set K ⊂ Td × Bd satisfying:

(i) K is a union of differentiable tori of the form {(q, u(q)) : q ∈ Td}, each of which is invariant under f;
(ii) the restriction of the transformation f to each of these tori is conjugate to a translation on Td;
(iii) the set K has positive volume and, in fact, the volume of the complement converges to zero when f → f0.

In particular, the transformation f cannot be ergodic.


Just as in the previous (continuous time) situation, the set K is formed by
tori restricted to which the dynamics is conjugate to a Diophantine rotation.
Theorems 4.4.4 and 4.4.8 extend to systems of class Cr with r finite but
sufficiently large, depending on the dimension. For example, the version of
Theorem 4.4.8 for d = 1 is true for r > 3 and false for r < 3; in the boundary
case r = 3, parts (i) and (ii) of the theorem remain valid but part (iii) does not.
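A classical one-parameter family to experiment with in the case d = 1 is the standard (Chirikov) map (q, p) ↦ (q + p′, p′), p′ = p + (k/2π) sin 2πq, with q ∈ R/Z; it is our choice of example and is not discussed in the text. For k = 0 it is the integrable twist map (q, p) ↦ (q + p, p), with ω(p) = p, and for every k it preserves area. The sketch below checks the area preservation numerically and records the width of the interval visited by the momentum along one orbit, which for small k stays small, in line with the orbit being confined between invariant circles.

```python
import math

def standard_map_lift(q, p, k):
    """Standard map on the universal cover R x R (q is not reduced mod 1)."""
    p_new = p + k / (2 * math.pi) * math.sin(2 * math.pi * q)
    return q + p_new, p_new

def jacobian_determinant(q, p, k, h=1e-6):
    """Central-difference estimate of the Jacobian determinant of the lift at (q, p)."""
    qa, pa = standard_map_lift(q + h, p, k)
    qb, pb = standard_map_lift(q - h, p, k)
    qc, pc = standard_map_lift(q, p + h, k)
    qd, pd = standard_map_lift(q, p - h, k)
    # det [[dq'/dq, dq'/dp], [dp'/dq, dp'/dp]]
    return ((qa - qb) * (pc - pd) - (qc - qd) * (pa - pb)) / (4 * h * h)

def momentum_range(q, p, k, n_iter):
    """Length of the smallest interval containing p along n_iter iterates (q kept mod 1)."""
    lo = hi = p
    for _ in range(n_iter):
        q, p = standard_map_lift(q, p, k)
        q %= 1.0
        lo, hi = min(lo, p), max(hi, p)
    return hi - lo

print(jacobian_determinant(0.3, 0.2, k=0.8))              # close to 1
print(momentum_range(0.0, 0.618, k=0.0, n_iter=1000))     # exactly 0 when k = 0
print(momentum_range(0.1, 0.618, k=0.1, n_iter=10000))    # small for small k
```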
² Relative to the canonical symplectic form (4.4.6).

The notion of Hamiltonian flow extends to any symplectic manifold (M, θ), as follows. Let H : M → R be a function of class C2 and let dH(z) : TzM → R denote its derivative at each point z ∈ M. By the definition of symplectic form, each θz : TzM × TzM → R is a non-degenerate alternating 2-form. Hence, there exists exactly one vector XH(z) ∈ TzM such that
$$\theta_z\big(X_H(z), v\big) = dH(z)\,v \quad \text{for every } v \in T_zM.$$
The map z ↦ XH(z) is a vector field of class C1 on the manifold M. This is called the Hamiltonian vector field associated with H. The corresponding flow, given by the differential equation
$$\frac{dz}{dt} = X_H(z), \qquad (4.4.8)$$
is the Hamiltonian flow associated with H. We leave it to the reader to check that (4.4.8) corresponds precisely to the Hamilton–Jacobi equations (4.4.1)
when M = R2d and θ is the symplectic form in Example 4.4.5.
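In coordinates, the defining relation θz(XH(z), v) = dH(z)v is a linear system for XH(z). The sketch below is our own illustration: it assumes the form (4.4.6) written as θz(u, v) = uᵀJv, with J = [[0, Id], [−Id, 0]] in the coordinates (q, p), so the relation becomes Jᵀ XH = grad H. It solves that system by Gaussian elimination for a sample Hamiltonian and compares the result with the right-hand side (∂H/∂p, −∂H/∂q) of the equations in Example 4.4.3.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def standard_J(d):
    """Matrix of dq_1^dp_1 + ... + dq_d^dp_d in the coordinates (q_1..q_d, p_1..p_d)."""
    n = 2 * d
    J = [[0.0] * n for _ in range(n)]
    for i in range(d):
        J[i][d + i], J[d + i][i] = 1.0, -1.0
    return J

def hamiltonian_vector_field(grad_h):
    """X with theta(X, v) = dH(v) for all v, i.e. J^T X = grad H."""
    n = len(grad_h)
    J = standard_J(n // 2)
    J_transpose = [[J[j][i] for j in range(n)] for i in range(n)]
    return solve(J_transpose, grad_h)

# Sample Hamiltonian (our choice): H = (p1^2 + p2^2)/2 + cos(q1) + q1*q2
q1, q2, p1, p2 = 0.4, -1.3, 0.9, 0.2
grad_h = [-math.sin(q1) + q2, q1, p1, p2]    # (dH/dq1, dH/dq2, dH/dp1, dH/dp2)
X = hamiltonian_vector_field(grad_h)
expected = [p1, p2, math.sin(q1) - q2, -q1]  # (dH/dp, -dH/dq)
print(X, expected)
```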

4.4.3 Elliptic periodic points


The ideas behind the results stated in the previous section may be used to
describe the behavior of conservative systems in the neighborhood of elliptic
periodic points. Let us explain this briefly, starting with the symplectic case in
dimension 2.
When M is a surface, the notions of symplectic form and area form coincide.
Thus, a differentiable transformation f : M → M is symplectic if and only if
it preserves area. Let ζ ∈ M be an elliptic fixed point, that is, such that the eigenvalues of Df(ζ) are in the unit circle. Let λ and λ̄ be the eigenvalues. We say that the fixed point ζ is non-degenerate if λ^k ≠ 1 for every 1 ≤ k ≤ 4. Then, by the Birkhoff normal form theorem (see Arnold [Arn78, Appendix 7]), there exist canonical coordinates (x, y) ∈ R2 in the neighborhood of the fixed point, with ζ = (0, 0), such that the transformation f has the form
$$f(\theta, \rho) = (\theta + \omega_0 + \omega_1 \rho, \rho) + R(\theta, \rho) \quad \text{with } |R(\theta, \rho)| \le C|\rho|^2, \qquad (4.4.9)$$
where (θ, ρ) ∈ S1 × R are the “polar” coordinates defined by
$$x = \sqrt{\rho}\,\cos 2\pi\theta \quad \text{and} \quad y = \sqrt{\rho}\,\sin 2\pi\theta.$$
Observe that the normal form f0 : (θ, ρ) ↦ (θ + ω0 + ω1ρ, ρ) is integrable. Moreover, f0 satisfies the twist condition (4.4.7) as long as ω1 ≠ 0 (this condition does not depend on the choice of the canonical coordinates, just on the transformation f). Then one may apply the methods of Theorem 4.4.8 to conclude that there exists a set K with positive area that is formed by invariant circles with Diophantine rotation numbers, that is, such that the restriction of f to each of these circles is conjugate to a Diophantine rotation. Even more, the fixed point ζ is a density point of this set:
$$\lim_{r \to 0} \frac{m\big(B(\zeta, r) \setminus K\big)}{m\big(B(\zeta, r)\big)} = 0,$$
where B(ζ , r) represents the ball of radius r > 0 around ζ .
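The non-degeneracy condition λ^k ≠ 1, 1 ≤ k ≤ 4, is easy to test in examples. As an illustration of our own (not from the text) we use the standard map (q, p) ↦ (q + p′, p′), p′ = p + (κ/2π) sin 2πq, on R/Z × R. Its fixed point (1/2, 0) has derivative [[1 − κ, 1], [−κ, 1]], with determinant 1 and trace 2 − κ, so it is elliptic for 0 < κ < 4. The sketch computes the eigenvalues and checks the condition; it does not check the further twist condition ω1 ≠ 0 of the normal form.

```python
import cmath
import math

def derivative_at(q, kappa):
    """Df(q, p) for f(q, p) = (q + p', p'), p' = p + kappa/(2 pi) * sin(2 pi q)."""
    c = kappa * math.cos(2 * math.pi * q)
    return [[1.0 + c, 1.0], [c, 1.0]]

def eigenvalues_2x2(a):
    tr = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    disc = cmath.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2

def is_elliptic(eigs, tol=1e-9):
    """All eigenvalues on the unit circle."""
    return all(abs(abs(lam) - 1.0) < tol for lam in eigs)

def is_nondegenerate(lam, tol=1e-9):
    """lam**k != 1 for k = 1, 2, 3, 4 (the condition stated in the text)."""
    return all(abs(lam ** k - 1.0) > tol for k in range(1, 5))

for kappa in (0.5, 2.0, 3.0):
    lam, lam_bar = eigenvalues_2x2(derivative_at(0.5, kappa))
    print(kappa, is_elliptic((lam, lam_bar)), is_nondegenerate(lam))
```

For κ = 2 the eigenvalues are ±i, so λ⁴ = 1, and for κ = 3 they are primitive cube roots of unity; in both cases the fixed point is elliptic but degenerate.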



We will refer to points ζ as in the previous paragraph as generic elliptic fixed points. An important consequence of what we just said is that generic elliptic fixed points of area-preserving transformations are stable: the trajectory of any point close to ζ remains close to ζ for all times, as it is “trapped” inside some small invariant circle. This feature does not extend to higher dimensions, as we will explain shortly.
Still in dimension two, we want to mention other important dynamical
phenomena that take place in the neighborhood of generic elliptic fixed points.
Let us start by presenting a very useful tool, known as the Poincaré–Birkhoff
fixed point theorem or Poincaré's last theorem. The statement was proposed by
Poincaré, who also presented some special cases, a few months before his
death; the general case was proved by Birkhoff [Bir13] in the following year.
Let A = S1 × [a, b], with 0 < a < b, and let f : A → A be a homeomorphism that preserves each of the boundary components of the annulus A. We say that f is a twist homeomorphism if it rotates the two boundary components in opposite senses or, more precisely, if there exists some lift F : R × [a, b] → R × [a, b], F(θ, ρ) = (Θ(θ, ρ), R(θ, ρ)), of the map f to the universal cover of the annulus, such that
$$\big(\Theta(\theta, a) - \theta\big)\big(\Theta(\theta, b) - \theta\big) < 0 \quad \text{for every } \theta \in \mathbb{R}. \qquad (4.4.10)$$

Theorem 4.4.9 (Poincaré–Birkhoff fixed point). If f : A → A is a twist homeomorphism that preserves area then f admits at least two fixed points in the interior of A.
As mentioned previously, every generic elliptic fixed point ζ is accumulated by invariant circles with Diophantine rotation numbers. Any two such circles bound an annulus around ζ. Applying Theorem 4.4.9 (or, more precisely, its corollary in Exercise 4.4.6) one gets that any such annulus contains at least a pair of periodic orbits with the same period.
In a sense, these pairs of periodic orbits are what is left of the invariant
circles of the normal form f0 with rational rotation numbers, which are
usually destroyed by the addition of the term R in (4.4.9). Their periods go
to infinity when one approaches ζ . Generically, one of these periodic orbits is
hyperbolic (saddle points) and the other one is elliptic. An example is sketched
in Figure 4.3: the elliptic fixed point ζ is surrounded by a hyperbolic periodic
orbit and an elliptic periodic orbit, marked with the letters p and q, respectively,
both with period 4. Two invariant circles around ζ are also represented.
The Swiss mathematician Eduard Zehnder proved that, generically, the
hyperbolic periodic orbits exhibit transverse homoclinic points, that is, their
stable manifolds and unstable manifolds intersect transversely, as depicted
in Figure 4.3. This implies that the geometry of the stable manifolds and
unstable manifolds is extremely complex. Moreover, the elliptic periodic orbits
satisfy the genericity conditions mentioned previously. This means that all
the dynamical complexity that we are describing in the neighborhood of ζ is reproduced in the neighborhood of each one of these “satellite” elliptic orbits (which have their own “satellites”, etc.).

Figure 4.3. Invariant circles, periodic orbits and homoclinic intersections in the neighborhood of a generic elliptic fixed point

Figure 4.4. Computational evidence for the presence of invariant circles, elliptic islands and transverse homoclinic intersections
Moreover, a theory developed by the French physicist Serge Aubry and
the American mathematician John Mather shows that ζ is also accumulated
by certain infinite, totally disconnected invariant sets, restricted to which
the transformation f is minimal (all the orbits are dense). In a sense, these
Aubry–Mather sets are remnants of the invariant circles of the normal form
f0 with irrational non-Diophantine rotation numbers that are also typically
destroyed by the addition of the perturbation term R in (4.4.9).
Figure 4.4 illustrates a good part of what we have been saying. It depicts
several computer-calculated trajectories of an area-preserving transformation.
The behavior of these trajectories suggests the presence of invariant circles,
elliptic satellites with their own invariant circles, and even hyperbolic orbits
with associated transverse homoclinic intersections. One can also observe the
presence of certain trajectories with “chaotic” behavior, apparently related to
those homoclinic intersections.
More generally, let f : M → M be a symplectic diffeomorphism on a symplectic manifold M of any (even) dimension 2d ≥ 2. We say that a fixed point ζ ∈ M is elliptic if all the eigenvalues of the derivative Df(ζ) are in the unit circle. Let λ1, λ̄1, . . . , λd, λ̄d be those eigenvalues. We say that ζ is non-degenerate if
$$\lambda_1^{k_1} \cdots \lambda_d^{k_d} \neq 1 \quad \text{for every } (k_1, \dots, k_d) \in \mathbb{Z}^d \text{ with } 1 \le |k_1| + \cdots + |k_d| \le 4$$
(in particular, the eigenvalues are all distinct). Then, by the Birkhoff normal form theorem (see Arnold [Arn78, Appendix 7]), there exist canonical coordinates (x1, . . . , xd, y1, . . . , yd) ∈ R2d in a neighborhood of ζ such that ζ = (0, . . . , 0, 0, . . . , 0) and the transformation f has the form
$$f(\theta, \rho) = \big(\theta + \omega_0 + \omega_1(\rho), \rho\big) + R(\theta, \rho) \quad \text{with } \|R(\theta, \rho)\| \le \text{const}\, \|\rho\|^2,$$
where ω0 ∈ Rd, ω1 : Rd → Rd is a linear map and (θ, ρ) ∈ Td × Rd are the “polar” coordinates defined by
$$x_j = \sqrt{\rho_j}\,\cos 2\pi\theta_j \quad \text{and} \quad y_j = \sqrt{\rho_j}\,\sin 2\pi\theta_j, \qquad j = 1, \dots, d.$$
Assuming that ω1 is an isomorphism (this is yet another generic condition on the transformation f), we have that the normal form
$$f_0 : (\theta, \rho) \mapsto \big(\theta + \omega_0 + \omega_1(\rho), \rho\big)$$
is integrable and satisfies the twist condition (4.4.7). Applying the ideas of Theorem 4.4.8, one concludes that ζ is a density point of a set K formed by invariant tori of dimension d, restricted to which the transformation f is conjugate to a Diophantine rotation.
In particular, symplectic transformations with generic elliptic fixed (or
periodic) points are never ergodic. Observe, on the other hand, that for d > 1
a torus of dimension d does not separate the ambient space M into two
connected components. Therefore, the argument we used before to conclude
that generic elliptic fixed points on surfaces are stable does not extend to higher
dimensions. In fact, it is known that when d > 1 elliptic fixed points are usually
unstable: trajectories starting arbitrarily close to the fixed point may escape
from a fixed neighborhood of it. This is related to the phenomenon known as
Arnold diffusion, which is a very active research topic in this area.
Finally, let us mention that this theory also applies to continuous time
conservative systems. We say that a stationary point ζ of a Hamiltonian flow
is elliptic if all the eigenvalues of the derivative of the vector field at the point
ζ are pure imaginary numbers. Arguments similar to those in the discrete time
case show that, under generic hypotheses, ζ is a density point of a set formed
by invariant tori of dimension d restricted to each of which the Hamiltonian
flow is conjugate to a linear flow.
Moreover, there are corresponding results for periodic trajectories of Hamiltonian flows. One way to obtain such results is by considering a
cross-section to the flow at some point of the periodic trajectory and applying
the previous ideas to the corresponding Poincaré map. In this way one finds
that, under generic conditions, elliptic periodic trajectories of Hamiltonian
flows are accumulated by sets with positive volume consisting of invariant tori
of the flow.
The theory of Kolmogorov–Arnold–Moser has many other applications, in a
wide variety of situations in mathematics that go beyond the scope of this book.
The reader may find more complete information in the following references:
Arnold [Arn78], Bost [Bos86], Yoccoz [Yoc92], de la Llave [dlL93] and
Arnold, Kozlov and Neishtadt [AKN06], among others.

4.4.4 Geodesic flows


Let M be a compact Riemannian manifold. Some of the notions that are used
here are recalled in Appendix A.4.
It follows from the theory of ordinary differential equations that for each
(x, v) ∈ TM there exists a unique geodesic γx,v : R → M of the manifold M
such that γx,v (0) = x and γ̇x,v (0) = v. Moreover, the family of transformations
defined by
f t : (x, v) → (γx,v (t), γ̇x,v (t))
is a flow on the tangent bundle TM, which is called the geodesic flow of M. We
denote by T1M the unit tangent bundle, formed by the pairs (x, v) ∈ TM with ‖v‖ = 1. The unit tangent bundle is invariant under the geodesic flow.
Equivalently, the geodesic flow may be defined as the Hamiltonian flow in the tangent bundle TM (with the symplectic structure defined in Example 4.4.7) associated with the Hamiltonian function H(x, v) = ‖v‖²/2. So, (f t)t preserves the Liouville measure of the tangent bundle.
In this context, the Liouville measure may be described as follows. Every
inner product in a finite-dimensional vector space induces a volume element³
in that space, relative to which the cube spanned by any orthonormal basis
has volume 1. In particular, the Riemannian metric induces a volume element
dv on each tangent space Tx M. Integrating this volume element along M, we
get a volume measure dx on the manifold itself. The Liouville measure of TM
is given, locally, by the product dxdv. Moreover, its restriction m to the unit
tangent bundle is given, locally, by the product dxdα, where dα is the measure
of angle on the unit sphere of Tx M.
The fact that H is a first integral means that the norm ‖v‖ is constant along trajectories of the flow. In particular, (f t)t leaves the unit tangent bundle invariant. Furthermore, the geodesic flow preserves the restriction m of the Liouville measure to T1M. However, the behavior of geodesic flows is, usually, very different from the dynamics of the almost integrable systems that we described in Section 4.4.2.

³ That is, a volume form defined up to sign: the sign is not determined because the inner product does not detect the orientation of the vector space.
For example, the Austrian mathematician Eberhard F. Hopf [Hop39] proved
in 1939 that if M is a compact surface with negative Gaussian curvature at
every point then its geodesic flow is ergodic. Almost three decades later, his
theorem was extended to manifolds in any dimension, through the following
remarkable result of the Russian mathematician Dmitry Anosov [Ano67]:
Theorem 4.4.10 (Anosov). Let M be a compact manifold with negative
sectional curvature. Then the geodesic flow on the unit tangent bundle is
ergodic with respect to the Liouville measure on T 1 M.
Thus, the geodesic flows of manifolds with negative curvature were the first
important class of Hamiltonian systems for which the ergodic hypothesis could
be validated rigorously.

4.4.5 Anosov systems


There are two fundamental steps in the proof of Theorem 4.4.10. The first
one is to show that every geodesic flow on a manifold with negative curvature
is uniformly hyperbolic. This means that every trajectory γ of the flow is
contained in invariant submanifolds W s (γ ) and W u (γ ) that intersect each other
transversely along γ and satisfy:
• every trajectory in W s (γ ) is exponentially asymptotic to γ in the future;
• every trajectory in W u (γ ) is exponentially asymptotic to γ in the past;

(see Figure 4.5), with exponential convergence rates that are uniform, that is,
independent of γ . Moreover, the geodesic flow is transitive. The second main
step in the proof of Theorem 4.4.10 consists of showing that every transitive,
uniformly hyperbolic flow (or transitive Anosov flow) of class C2 that preserves
volume is ergodic. We will comment on this last issue in a little while.
There exists a corresponding notion for discrete time systems: we say that
a diffeomorphism f : N → N on a compact Riemannian manifold is uniformly hyperbolic (or an Anosov diffeomorphism) if the tangent space to the manifold at every point z ∈ N admits a direct sum decomposition TzN = Ezs ⊕ Ezu such that the decomposition is invariant under the derivative of f:
$$Df(z)E^s_z = E^s_{f(z)} \quad \text{and} \quad Df(z)E^u_z = E^u_{f(z)} \quad \text{for every } z \in N, \qquad (4.4.11)$$
and the derivative contracts Ezs and expands Ezu, uniformly:
$$\sup_{z \in N} \big\|Df(z) \mid E^s_z\big\| < 1 \quad \text{and} \quad \sup_{z \in N} \big\|Df^{-1}(z) \mid E^u_z\big\| < 1 \qquad (4.4.12)$$
(for some choice of a norm compatible with the Riemannian metric on N).

Figure 4.5. Hyperbolic behavior
One can prove that for each z ∈ N the set W s(z) of points whose forward trajectory is asymptotic to the trajectory of z is a differentiable (immersed) submanifold of N tangent to Ezs at the point z; analogously, the set W u(z) of points whose backward trajectory is asymptotic to the trajectory of z is a differentiable submanifold tangent to Ezu at the point z. These submanifolds form foliations (that is, decompositions of N into differentiable submanifolds) that are invariant under the diffeomorphism:
$$f\big(W^s(z)\big) = W^s\big(f(z)\big) \quad \text{and} \quad f\big(W^u(z)\big) = W^u\big(f(z)\big) \quad \text{for every } z \in N.$$
We call W s(z) the stable manifold (or stable leaf) and W u(z) the unstable manifold (or unstable leaf) of the point z ∈ N.
Concerning the second part of the proof of Theorem 4.4.10, the crucial technical tool to prove that every transitive, uniformly hyperbolic diffeomorphism of class C2 that preserves volume is ergodic is the following theorem of Anosov and Sinai [AS67]:
Theorem 4.4.11 (Absolute continuity). The stable and unstable foliations of any Anosov diffeomorphism (or flow) of class C2 are absolutely continuous:
1. if X ⊂ N has zero volume then X ∩ W s(x) has volume zero inside W s(x) for almost every x ∈ N;
2. if Y ⊂ Σ is a zero volume subset of some submanifold Σ transverse to the stable foliation, then the union of the stable manifolds through the points of Y has zero volume in N;
and analogously for the unstable foliation.
Ergodicity of the system may then be deduced using the Hopf argument,
which we introduced in a special case in Section 4.2.6. Let us explain this.
Given any continuous function ϕ : N → R, let Eϕ be the set of all points
z ∈ N for which the forward and backward time averages, ϕ + (z) and ϕ − (z),
are well defined and coincide. This set Eϕ has full volume, as we have
seen in Corollary 3.2.8. Observe also that ϕ + is constant on each stable
manifold and ϕ − is constant on each unstable manifold. So, by the first part
of Theorem 4.4.11, the intersection Yz = W u (z) ∩ Eϕ has full volume in W u (z)
for almost every z ∈ N. Moreover, ϕ − = ϕ + is constant on each Yz . Fix any
such z. The transitivity hypothesis implies that the union of all stable manifolds
through the points of W u (z) is the whole ambient manifold N. Hence, using the
second part of Theorem 4.4.11, the union of the stable manifolds through the
points of Yz has full volume in N. Clearly, ϕ + is constant on this union. This
shows that the time average of every continuous function ϕ is constant on a
full measure set. Hence, f is ergodic.
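The simplest example to which this argument applies is the automorphism of the torus T2 = R2/Z2 induced by the matrix A = [[2, 1], [1, 1]] (often called the cat map; a standard example, though not one discussed in this section). Since Df ≡ A is constant, the splitting of R2 into the eigenlines of A, with eigenvalues λs = (3 − √5)/2 < 1 < λu = (3 + √5)/2, satisfies (4.4.11) and (4.4.12) with the Euclidean norm, and det A = 1 means that the map preserves area. The eigenlines have irrational slope, so every stable and unstable manifold is dense in T2. The sketch below checks the numerical facts.

```python
import math

A = ((2, 1), (1, 1))

def cat_map(x, y):
    """The automorphism of the torus R^2/Z^2 induced by A."""
    return (2 * x + y) % 1.0, (x + y) % 1.0

def apply(m, v):
    return (m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1])

def norm(v):
    return math.hypot(v[0], v[1])

lam_u = (3 + math.sqrt(5)) / 2   # expanding eigenvalue
lam_s = (3 - math.sqrt(5)) / 2   # contracting eigenvalue, equal to 1/lam_u
e_u = (1.0, lam_u - 2)           # eigenvectors solve (2 - lam) x + y = 0
e_s = (1.0, lam_s - 2)

det_A = A[0][0] * A[1][1] - A[0][1] * A[1][0]
contraction = norm(apply(A, e_s)) / norm(e_s)   # ||Df | E^s|| = lam_s < 1
expansion = norm(apply(A, e_u)) / norm(e_u)     # ||Df | E^u|| = lam_u, so ||Df^-1 | E^u|| = 1/lam_u < 1
orthogonality = e_u[0] * e_s[0] + e_u[1] * e_s[1]  # A is symmetric, so E^s is orthogonal to E^u
print(det_A, contraction, expansion, orthogonality)
```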
We close this section by observing that all the known examples of Anosov
diffeomorphisms are transitive. The corresponding statement for Anosov flows
is false (see Verjovsky [Ver99]). Another open problem in this setting is
whether ergodicity still holds when the Anosov system is only of class C1 . It is
known (see [Bow75b, RY80]) that in this case the absolute continuity theorem
(Theorem 4.4.11) is false, in general.

4.4.6 Billiards
As we have seen in Sections 4.4.2 and 4.4.3, non-ergodic systems are quite
common in the realm of Hamiltonian flows and symplectic transformations.
However, this fact alone is not sufficient to invalidate the ergodic hypothesis
of Boltzmann in the context where it was formulated. Indeed, ideal gases are
a special class of systems and it is conceivable that ergodicity could be typical
in this more restricted setting, even if it is not typical for general Hamiltonian
systems.
In the 1960’s, the Russian mathematician and theoretical physicist Yakov
Sinai [Sin63] conjectured that Hamiltonian systems formed by spherical
hard balls that hit each other elastically are ergodic. Hard ball systems (see
Example 4.4.13 for a precise definition) had been proposed as a model for the
behavior of ideal gases by the American scientist Josiah Willard Gibbs who,
together with Boltzmann and Scottish mathematician and theoretical physicist
James Clerk Maxwell, created the area of statistical mechanics. The ergodic
hypothesis of Boltzmann–Sinai, as Sinai’s conjecture is often referred to, is the
main topic in the present section.
In fact, we are going to discuss the problem of ergodicity for somewhat
more general systems, called billiards, whose formal definition was first given
by Birkhoff in the 1930’s.
In its simplest form, a billiard is given by a bounded connected domain Ω ⊂ R2, called the billiard table, whose boundary ∂Ω is formed by a finite number of differentiable curves. We call corners the points of the boundary where it fails to be differentiable; by hypothesis, they constitute a finite set C ⊂ ∂Ω. One considers a point particle moving uniformly along straight lines inside Ω, with elastic reflections on the boundary. That is, whenever the particle hits ∂Ω \ C it is reflected in such a way that the angle of incidence equals the angle of reflection. When the particle hits some corner it is absorbed: its trajectory is not defined from then on.
Figure 4.6. Dynamics of billiards

Let us denote by n the unit vector field orthogonal to the boundary ∂Ω and pointing to the inside of Ω. It defines an orientation in ∂Ω \ C: a vector t tangent to the boundary is positive if the basis {t, n} of R2 is positive. It is clear that the motion of the particle is characterized completely by the sequence of collisions with the boundary. Moreover, each such collision may be described by the position s ∈ ∂Ω and the angle of reflection θ ∈ (−π/2, π/2). Therefore, the evolution of the billiard is governed by the transformation
$$f : (\partial\Omega \setminus C) \times (-\pi/2, \pi/2) \to \partial\Omega \times (-\pi/2, \pi/2), \qquad (4.4.13)$$
that associates with each collision (s, θ) the subsequent one (s′, θ′). See Figure 4.6.
In the example on the left-hand side of Figure 4.6 the billiard table is a polygon, that is, the boundary consists of a finite number of straight line segments. The trajectory represented in the figure hits one of the corners. Nearby trajectories, to either side, collide with distinct boundary segments, with very different angles of incidence. In particular, the billiard transformation (4.4.13) is not continuous. Discontinuities may occur even in the absence of corners. For example, on the right-hand side of Figure 4.6 the boundary has four connected components, all of which are differentiable curves. Consider the trajectory represented in the figure, tangent to one of the boundary components. Nearby trajectories, to either side, hit different boundary components. Consequently, the billiard map is discontinuous in this case too.
Example 4.4.12 (Circular billiard table). On the left-hand side of Figure 4.7 we represent a billiard in the unit disk Ω ⊂ R². The corresponding billiard transformation is given by
f : (s, θ) ↦ (s − (π − 2θ), θ).
The behavior of this transformation is described geometrically on the right-hand side of Figure 4.7: the chord joining two consecutive collisions subtends a central angle π − 2θ, and the angle θ is the same at both endpoints. Observe that f preserves the area measure ds dθ and satisfies the twist condition (4.4.4). Note also that f is integrable (in the sense of Section 4.4.2) and, in particular, the area measure is not ergodic. We will soon see (Theorem 4.4.14) that every planar billiard preserves a natural measure equivalent to the area measure on ∂Ω × (−π/2, π/2).
136 Ergodicity
Figure 4.7. Billiard on a circular table

Using these observations, KAM theory allows one to prove that billiards on almost circular tables are not ergodic with respect to that invariant measure.
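The formula for f in Example 4.4.12 can be checked against a direct geometric computation. The Python sketch below traces one chord of the unit disk and compares the outcome with (s − (π − 2θ), θ). It fixes conventions that the text leaves to Figure 4.7, so treat them as assumptions. The point with parameter s is (cos s, sin s), n(s) = −(cos s, sin s), and the positive tangent is t(s) = (−sin s, cos s). The outgoing velocity is cos θ n(s) − sin θ t(s). The helper names are illustrative.

```python
import math

TWO_PI = 2.0 * math.pi

def circular_billiard_step(s, theta):
    """One collision-to-collision step in the unit disk, traced geometrically.

    Assumed conventions: the boundary point with parameter s is (cos s, sin s),
    n(s) = -(cos s, sin s) is the inward normal, t(s) = (-sin s, cos s) is the
    positive tangent, and the outgoing velocity is cos(theta) n(s) - sin(theta) t(s).
    """
    px, py = math.cos(s), math.sin(s)
    nx, ny = -px, -py
    tx, ty = -py, px
    vx = math.cos(theta) * nx - math.sin(theta) * tx
    vy = math.cos(theta) * ny - math.sin(theta) * ty
    # |p| = |v| = 1, so the line p + L v meets the circle again at L = -2 (p . v).
    chord = -2.0 * (px * vx + py * vy)
    qx, qy = px + chord * vx, py + chord * vy
    s_new = math.atan2(qy, qx) % TWO_PI
    # Elastic reflection at q: w = v - 2 (v . n') n', where n' = -q.
    mx, my = -qx, -qy
    dot = vx * mx + vy * my
    wx, wy = vx - 2.0 * dot * mx, vy - 2.0 * dot * my
    ux, uy = -qy, qx  # positive tangent t(s') at the new collision point
    # Read off theta' from w = cos(theta') n(s') - sin(theta') t(s').
    theta_new = math.atan2(-(wx * ux + wy * uy), wx * mx + wy * my)
    return s_new, theta_new

def formula_step(s, theta):
    """The closed form f(s, theta) = (s - (pi - 2 theta), theta), with s taken mod 2 pi."""
    return (s - (math.pi - 2.0 * theta)) % TWO_PI, theta

def angle_gap(a, b):
    """Distance between two angles modulo 2 pi."""
    d = (a - b) % TWO_PI
    return min(d, TWO_PI - d)

worst = 0.0
for s in (0.0, 1.0, 4.0):
    for theta in (-1.2, -0.3, 0.0, 0.7, 1.4):
        s1, t1 = circular_billiard_step(s, theta)
        s2, t2 = formula_step(s, theta)
        worst = max(worst, angle_gap(s1, s2), abs(t1 - t2))
print(f"largest discrepancy: {worst:.1e}")
```

With the opposite sign convention for θ, the same computation gives s + (π − 2θ) instead. In either convention, θ is unchanged from one collision to the next.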

The definition of billiard extends immediately to bounded connected domains Ω in any Euclidean space Rᵈ, d ≥ 1, whose boundary consists of a finite number of differentiable hypersurfaces intersecting each other along submanifolds with codimension larger than 1. We denote by C the union of these submanifolds. As before, we endow ∂Ω with the orientation induced by the unit vector n orthogonal to the boundary and pointing to the “inside” of Ω. Elastic reflections on the boundary are defined by the following two conditions: (i) the incident trajectory segment, the reflected trajectory segment and the normal vector n are coplanar and (ii) the angle of incidence equals the angle of reflection. The billiard transformation is defined as in (4.4.13), having as domain
{(s, v) ∈ (∂Ω \ C) × Sᵈ⁻¹ : v · n(s) > 0}.
Even more generally, we may take as a billiard table any bounded connected domain in a Riemannian manifold whose boundary is formed by a finite number of differentiable hypersurfaces intersecting along higher codimension submanifolds. The definitions are analogous, except that the trajectories between consecutive reflections on the boundary are segments of geodesics and angles are measured according to the Riemannian metric of the manifold.

Example 4.4.13 (Ideal gases and billiards). Ideally, a gas is formed by a large number N of molecules (N ≈ 10²⁷) that move uniformly along straight lines between collisions and collide with each other elastically. See the right-hand side of Figure 4.8. For simplicity, let us assume that the molecules are identical spheres and that they are contained in the torus⁴ of dimension
⁴ One may replace the torus Tᵈ by a more plausible container, such as the d-dimensional cube [0, 1]ᵈ, for example. However, the analysis is a bit more complicated in that case, because we must take into account the collisions of the balls with the container’s walls.

Figure 4.8. Model for an ideal gas

d ≥ 2. Let us also assume that all the molecules move with constant unit speed.
This system can be modelled by a billiard, as follows.
For 1 ≤ i ≤ N, denote by pi ∈ Tᵈ the position of the center of the i-th molecule Mi. Let ρ > 0 be the radius of each molecule. Then each configuration of the system is entirely described by a value of p = (p1, …, pN) in the set
Ω = {p = (p1, …, pN) ∈ Tᴺᵈ : ‖pi − pj‖ ≥ 2ρ for every i ≠ j}
(this set is connected, as long as the radius ρ is sufficiently small).
In the absence of collisions, the point p moves along a straight line inside Ω, with constant speed. When two molecules Mi and Mj collide, ‖pi − pj‖ = 2ρ and the velocity vectors change in the following way. Let vi and vj be the velocity vectors of the two molecules immediately before the collision and let Rij be the straight line through pi and pj. The elasticity hypothesis means that the velocity vectors vi′ and vj′ immediately after the collision are given by (see the right-hand side of Figure 4.8):
(i) the components of vi′ and vj′ in the direction of Rij are equal to the components of vj and vi, respectively, in that direction, that is, the two molecules exchange their components along Rij;
(ii) the components of vi and vi′ in the direction orthogonal to Rij are equal and the same is true for vj and vj′.
This means, precisely, that the point p undergoes elastic reflection on the hypersurface {p ∈ ∂Ω : ‖pi − pj‖ = 2ρ} of the boundary of Ω (see Exercise 4.4.4). Therefore, the motion of the point p corresponds exactly to the evolution of the billiard on the table Ω.
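Exercise 4.4.4 can be explored numerically. The Python sketch below works in R²ᵈ and ignores the torus identification, which is a simplifying assumption. It reflects the velocity (v1, v2) of the billiard point on the hypersurface ‖p1 − p2‖ = 2ρ. The unit normal to that hypersurface at (p1, p2) is (u, −u)/√2, with u = (p1 − p2)/‖p1 − p2‖. Reading off the new molecular velocities shows two things. The components along R12 are exchanged between the two molecules. The orthogonal components are unchanged. The function names and the random test data are illustrative.

```python
import math
import random

def dot(a, b):
    """Euclidean inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))

def reflect_on_collision_surface(p1, p2, v1, v2):
    """Reflect (v1, v2) on the hypersurface |p1 - p2| = const at (p1, p2).

    The unit normal there is (u, -u)/sqrt(2), with u = (p1 - p2)/|p1 - p2|.
    The reflection v -> v - 2 (v . n) n then reduces to subtracting
    delta * u from v1 and adding it to v2, where delta = v1 . u - v2 . u.
    """
    diff = [a - b for a, b in zip(p1, p2)]
    r = math.sqrt(dot(diff, diff))
    u = [c / r for c in diff]  # unit vector along the line R_12
    delta = dot(v1, u) - dot(v2, u)
    w1 = [a - delta * c for a, c in zip(v1, u)]
    w2 = [b + delta * c for b, c in zip(v2, u)]
    return w1, w2, u

def orthogonal_part(v, u):
    """Component of v orthogonal to the unit vector u."""
    k = dot(v, u)
    return [a - k * c for a, c in zip(v, u)]

random.seed(0)
d, rho = 3, 0.1
p1 = [random.uniform(-1, 1) for _ in range(d)]
g = [random.gauss(0, 1) for _ in range(d)]
p2 = [a + 2 * rho * c / math.sqrt(dot(g, g)) for a, c in zip(p1, g)]  # |p1 - p2| = 2 rho
v1 = [random.gauss(0, 1) for _ in range(d)]
v2 = [random.gauss(0, 1) for _ in range(d)]
w1, w2, u = reflect_on_collision_surface(p1, p2, v1, v2)
print("along R_12 before:", dot(v1, u), dot(v2, u))
print("along R_12 after: ", dot(w1, u), dot(w2, u))
```

The reflection is orthogonal in R²ᵈ, so the quantity ‖v1‖² + ‖v2‖² is unchanged.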
The next result places billiards well inside the domain of interest of ergodic theory. Let ds be the volume measure induced on the boundary ∂Ω by the Riemannian metric of the ambient manifold; in the planar case (that is, when Ω ⊂ R²), ds is just the arc-length. Denote by dθ the angle measure on each hemisphere {v ∈ Sᵈ⁻¹ : v · n(s) > 0}, and by θ the angle between v and n(s).
Theorem 4.4.14. The transformation f preserves the measure ν = cos θ ds dθ on the domain {(s, v) ∈ ∂Ω × Sᵈ⁻¹ : v · n(s) > 0}.
Figure 4.9. Calculating the derivative of the billiard map

In what follows we sketch the proof for planar billiards. The reader should
have no trouble checking that all the arguments extend naturally to arbitrary
dimension.
Consider any family of trajectories starting from a given boundary point (this means that s is fixed), as represented on the left-hand side of Figure 4.9. Let this family be parameterized by the angle of reflection θ. Denote by ℓ(s, s′) the length of the line segment connecting s to s′. Then ℓ(s, s′) dθ = dh = cos θ′ ds′ and, thus,
$$\frac{\partial s'}{\partial \theta} = \frac{\ell(s, s')}{\cos \theta'}.$$
To calculate the derivative of θ′ with respect to θ, observe that the variation of θ′ is the sum of two components: the first one corresponds to the variation of θ, whereas the second one arises from the variation of the normal vector n(s′) as the collision point s′ varies. By the definition of curvature, this second component is equal to κ(s′) ds′. It follows that dθ′ = dθ + κ(s′) ds′ and, consequently,
$$\frac{\partial \theta'}{\partial \theta} = 1 + \kappa(s')\,\frac{\partial s'}{\partial \theta} = 1 + \kappa(s')\,\frac{\ell(s, s')}{\cos \theta'}.$$
This can be summarized as follows:
$$Df(s, \theta)\,\frac{\partial}{\partial \theta} = \frac{\ell(s, s')}{\cos \theta'}\,\frac{\partial}{\partial s'} + \Bigl(1 + \kappa(s')\,\frac{\ell(s, s')}{\cos \theta'}\Bigr)\frac{\partial}{\partial \theta'}. \tag{4.4.14}$$


Next, consider any family of parallel trajectories, as represented on the right-hand side of Figure 4.9. Let this family be parameterized by the arc-length t in the direction orthogonal to the trajectories. The variations of s and s′ along this family are given by −cos θ ds = dt = cos θ′ ds′. Since the trajectories all have the same direction, the variations of the angles θ and θ′ arise solely from the variations of the normal vectors n(s) and n(s′) as s and s′ vary. That is, dθ = κ(s) ds and dθ′ = κ(s′) ds′. Therefore,
$$Df(s, \theta)\Bigl(-\frac{1}{\cos \theta}\,\frac{\partial}{\partial s} - \frac{\kappa(s)}{\cos \theta}\,\frac{\partial}{\partial \theta}\Bigr) = \frac{1}{\cos \theta'}\,\frac{\partial}{\partial s'} + \frac{\kappa(s')}{\cos \theta'}\,\frac{\partial}{\partial \theta'}. \tag{4.4.15}$$

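Let J(s, θ) be the matrix of the derivative Df(s, θ) with respect to the bases {∂/∂s, ∂/∂θ} and {∂/∂s′, ∂/∂θ′}. The relations (4.4.14) and (4.4.15) say that J(s, θ) A = B. Here the columns of A are the coordinates of ∂/∂θ and of the vector on the left-hand side of (4.4.15). The columns of B are the coordinates of their images. Hence
$$\det J(s,\theta) = \frac{\det B}{\det A} = \frac{\det\begin{pmatrix} \dfrac{\ell(s,s')}{\cos\theta'} & \dfrac{1}{\cos\theta'} \\[2mm] 1 + \kappa(s')\dfrac{\ell(s,s')}{\cos\theta'} & \dfrac{\kappa(s')}{\cos\theta'} \end{pmatrix}}{\det\begin{pmatrix} 0 & -\dfrac{1}{\cos\theta} \\[2mm] 1 & -\dfrac{\kappa(s)}{\cos\theta} \end{pmatrix}} = \frac{-1/\cos\theta'}{1/\cos\theta} = -\frac{\cos\theta}{\cos\theta'},$$
so that
$$|\det J(s,\theta)| = \frac{\cos\theta}{\cos\theta'}. \tag{4.4.16}$$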

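So, by the change of variables formula,
$$\int \varphi \, d\nu = \int \varphi(s', \theta') \cos\theta' \, ds' \, d\theta' = \int \varphi(f(s, \theta)) \cos\theta' \, \frac{\cos\theta}{\cos\theta'} \, ds \, d\theta = \int \varphi(f(s, \theta)) \cos\theta \, ds \, d\theta = \int (\varphi \circ f) \, d\nu$$
for every bounded measurable function ϕ. This proves that f preserves the measure ν = cos θ ds dθ, as we stated.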
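The determinant step in this proof is pure 2 × 2 linear algebra, so it can be double-checked numerically for arbitrary values of ℓ(s, s′), κ(s), κ(s′), θ and θ′. The Python sketch below writes (4.4.14) and (4.4.15) as J A = B. It then compares det J = det B / det A with cos θ / cos θ′. The sampling ranges are arbitrary choices. With the sign conventions of (4.4.14) and (4.4.15) as written, det J comes out negative, so the check concerns its absolute value, which is all the change of variables uses.

```python
import math
import random

def det2(a, b, c, d):
    """Determinant of the 2x2 matrix [[a, b], [c, d]]."""
    return a * d - b * c

def jacobian_det(ell, kappa, kappa_new, theta, theta_new):
    """det J obtained from (4.4.14)-(4.4.15) as det B / det A, where J A = B.

    Columns of A: d/dtheta and the parallel-family vector of (4.4.15), written in
    the basis {d/ds, d/dtheta}.  Columns of B: their images under Df(s, theta),
    written in the basis {d/ds', d/dtheta'}.
    """
    c, c_new = math.cos(theta), math.cos(theta_new)
    det_a = det2(0.0, -1.0 / c,
                 1.0, -kappa / c)
    det_b = det2(ell / c_new, 1.0 / c_new,
                 1.0 + kappa_new * ell / c_new, kappa_new / c_new)
    return det_b / det_a

random.seed(1)
for _ in range(5):
    ell = random.uniform(0.1, 3.0)  # chord length l(s, s')
    kappa, kappa_new = random.uniform(-2, 2), random.uniform(-2, 2)
    theta, theta_new = random.uniform(-1.5, 1.5), random.uniform(-1.5, 1.5)
    dj = jacobian_det(ell, kappa, kappa_new, theta, theta_new)
    ratio = math.cos(theta) / math.cos(theta_new)
    print(f"det J = {dj:+.6f}   cos(theta)/cos(theta') = {ratio:.6f}")
```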
We call a billiard dispersing if the boundary of the billiard table is strictly convex at every point, when viewed from the inside. In the planar case, with the orientation conventions that we adopted, this means that the curvature κ is negative at every point. Figure 4.10 presents two examples. In the first one, Ω ⊂ R² and the boundary is a connected set formed by the union of five differentiable curves. In the second example, Ω ⊂ T² and the boundary has three connected components, all of which are differentiable and convex.
The class of dispersing billiards was introduced by Sinai in his 1970 article [Sin70]. The name “dispersing” refers to the fact that in such billiards any (thin) beam of parallel trajectories becomes divergent upon reflection on the boundary, as illustrated on the left-hand side of Figure 4.10. Sinai observed that dispersing billiards are hyperbolic systems in a non-uniform sense. Invariant sub-bundles E^s_z and E^u_z as in (4.4.11) exist at almost every point z. Instead of (4.4.12), however, the derivative is contracting along E^s_z and expanding along E^u_z only asymptotically, that is, for sufficiently large iterates (depending on the point z).

Figure 4.10. Dispersive billiards



Figure 4.11. Bunimovich stadium and mushroom

The billiards associated with ideal gases (Example 4.4.13) with N = 2 molecules are dispersing: it is easy to see that {(p1, p2) ∈ R²ᵈ : ‖p1 − p2‖ = 2ρ} is a convex hypersurface. Consequently, these billiards are hyperbolic, in the sense of the previous paragraph. Using a subtle version of the Hopf argument, Sinai proved in [Sin70] that such billiards are ergodic, at least when d = 2. This was later extended to arbitrary dimension d ≥ 2 by Sinai and his student Nikolai Chernov [SC87], still in the case N = 2. Thus, dispersing billiards were the first class of billiards for which ergodicity was proven rigorously.
The case N ≥ 3 of the Boltzmann–Sinai ergodic hypothesis is a lot more difficult, because the corresponding billiards are not dispersing: the hypersurface
{(p1, p2, …, pN) ∈ Rᴺᵈ : ‖p1 − p2‖ = 2ρ}
has cylinder geometry, with zero curvature along the directions of the variables pi, i > 2. Such billiards are called semi-dispersing. Most results in this setting are due to the Hungarian mathematicians András Krámli, Nándor Simányi and Domokos Szász. In [KSS91, KSS92] they proved hyperbolicity and ergodicity for N = 3, and also for N = 4 assuming that d ≥ 3. Later, Simányi [Sim02] proved hyperbolicity in the general case: any number of spheres, in any dimension. The problem of ergodicity remains open in general, although there are many other partial results.
There are now several known examples of ergodic billiards that are not
dispersing. This even includes some billiards whose boundary curvature is
non-negative at every point. The best-known example is the Bunimovich
stadium, whose boundary is formed by two semi-circles and two straight line
segments. See Figure 4.11. This billiard is hyperbolic, but this property arises from a different mechanism, called defocusing: a beam of parallel trajectories reflecting on a concave segment of the billiard table wall is first focused, but then disperses. Another interesting example is the Bunimovich mushroom, where hyperbolic behavior and elliptic behavior coexist on disjoint invariant sets, both with positive measure.

4.4.7 Exercises
4.4.1. We say that ω ∈ Rᵈ is τ-Diophantine if it is (c, τ)-Diophantine, that is, if it satisfies (4.4.5), for some c > 0. Prove that the set of τ-Diophantine vectors is
non-empty if and only if τ ≥ d − 1. Moreover, show that the set has full Lebesgue measure in Rᵈ whenever τ is strictly larger than d − 1.
4.4.2. Consider a billiard on a rectangular table. Check that every trajectory that does
not hit any corner either is periodic or is dense in the billiard table.
4.4.3. Show that every billiard on an acute triangle exhibits some periodic trajectory.
[Observation: the same is true for right triangles, but the problem is open for
obtuse triangles.]
4.4.4. Consider the billiard model for ideal gases in Example 4.4.13. Check that elastic collisions between any two molecules correspond to the elastic reflections of the billiard point particle on the boundary of Ω.
4.4.5. Prove Theorem 4.4.9 under the additional hypothesis that the function ρ ↦ Θ(θ, ρ) is monotone (increasing or decreasing) for every θ ∈ R.
4.4.6. Consider the context of Theorem 4.4.9 but, instead of (4.4.10), assume that f rotates the two boundary components of A with different velocities: there exist some lift F : R × [a, b] → R × [a, b] and integers p, q ∈ Z with q ≥ 1 such that, denoting F^q = (Θ^q, R^q),
$$\bigl(\Theta^q(\theta, a) - p - \theta\bigr)\bigl(\Theta^q(\theta, b) - p - \theta\bigr) < 0 \quad \text{for every } \theta \in \mathbb{R}. \tag{4.4.17}$$
Show that f has at least two periodic orbits with period q in the interior of A.
4.4.7. Let Ω be a convex domain in the plane whose boundary ∂Ω is a differentiable curve. Show that the billiard on Ω has infinitely many periodic orbits.
5
Ergodic decomposition

For compact convex subsets of finite-dimensional vector spaces, every element of the convex set may be written as a convex combination of the extremal elements. For example, every point in a triangle may be written as a convex combination of the vertices of the triangle. In view of the results in Section 4.3, it is natural to ask whether a similar property holds in the space of invariant probability measures, that is, whether every invariant measure is a convex combination of ergodic measures.
The ergodic decomposition theorem, which we prove in this chapter
(Theorem 5.1.3), asserts that the answer is positive, except that the number of “terms” in this combination need not be finite, nor even countable.
This theorem has several important applications; in particular, it permits the
reduction of the proof of many results to the case when the system is ergodic.
We are going to deduce the ergodic decomposition theorem from another
important result from measure theory, the Rokhlin disintegration theorem.
The simplest instance of this theorem holds when we have a partition of a probability space (M, μ) into finitely many measurable subsets P1, …, PN with positive measure. Then, obviously, we may write μ as a convex combination
$$\mu = \mu(P_1)\,\mu_1 + \cdots + \mu(P_N)\,\mu_N$$
of its normalized restrictions μi(E) = μ(E ∩ Pi)/μ(Pi) to each of the partition elements. The Rokhlin disintegration theorem (Theorem 5.1.11) states that this type of disintegration of the probability measure is possible for any partition P (possibly uncountable!) that can be obtained as the limit of an increasing sequence of finite partitions.

5.1 Ergodic decomposition theorem


Before stating the ergodic decomposition theorem, let us analyze a couple of
examples that help motivate and clarify its content:
5.1 Ergodic decomposition theorem 143

Example 5.1.1. Let f : [0, 1] → [0, 1] be given by f(x) = x². The Dirac measures δ0 and δ1 are invariant and ergodic for f. It is also clear that x = 0 and x = 1 are the only recurrent points of f: every other orbit decreases strictly to 0, so it never returns close to its starting point. By the Poincaré recurrence theorem, μ-almost every point is recurrent for any invariant probability measure μ, so every such μ must satisfy μ({0, 1}) = 1. Then μ = μ({0})δ0 + μ({1})δ1 is a (finite) convex combination of the ergodic measures.
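Here is a short numerical illustration of Example 5.1.1, written as a Python sketch. For every starting point in [0, 1) the iterates x, x², x⁴, … converge to 0, so the time average of a continuous function φ tends to φ(0). The orbit of 1, by contrast, stays at 1. The test function φ(x) = x and the number of iterates are arbitrary choices.

```python
def birkhoff_average(phi, f, x, n):
    """Time average (1/n) * sum_{j=0}^{n-1} phi(f^j(x))."""
    total = 0.0
    for _ in range(n):
        total += phi(x)
        x = f(x)
    return total / n

def square(x):
    """The map f(x) = x^2 of Example 5.1.1."""
    return x * x

def identity(x):
    """Test function phi(x) = x."""
    return x

for x0 in (0.0, 0.5, 0.9, 0.999, 1.0):
    print(x0, birkhoff_average(identity, square, x0, 10_000))
```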
Example 5.1.2. Let f : T² → T² be given by f(x, y) = (x + y, y). The Lebesgue measure m on the torus is preserved by f. Observe that every horizontal circle Hy = S¹ × {y} is invariant under f and the restriction f : Hy → Hy is the rotation Ry. Let my be the Lebesgue measure on Hy. Observe that my is also invariant under f. Moreover, my is ergodic whenever y is irrational. On the other hand, by the Fubini theorem,
$$m(E) = \int_{S^1} m_y(E)\, dy \quad \text{for every measurable set } E. \tag{5.1.1}$$
The identity is not affected if we restrict the integral to the subset of irrational values of y, since the rational values form a set with zero Lebesgue measure. Then (5.1.1) presents m as an (uncountable) convex combination of ergodic measures.
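Example 5.1.2 can also be explored numerically. The Python sketch below iterates f(x, y) = (x + y, y) on the torus, represented as [0, 1)², and computes Birkhoff averages. The observable ψ(x, y) = y has time average y along every orbit. This differs from ∫ ψ dm = 1/2 unless y = 1/2, so m is not ergodic. By contrast, for an irrational y the time average of φ(x, y) = cos(2πx) is close to ∫ φ dmy = 0. The observables, starting points and number of iterates are illustrative choices.

```python
import math

def skew_step(x, y):
    """One iterate of f(x, y) = (x + y, y) on the torus, represented as [0, 1)^2."""
    return (x + y) % 1.0, y

def time_average(obs, x, y, n):
    """Birkhoff average (1/n) * sum_{j<n} obs(f^j(x, y))."""
    total = 0.0
    for _ in range(n):
        total += obs(x, y)
        x, y = skew_step(x, y)
    return total / n

def phi(x, y):
    return math.cos(2 * math.pi * x)

def psi(x, y):
    return y

y_irr = math.sqrt(2) - 1  # an irrational height (up to floating-point rounding)
print(time_average(psi, 0.3, y_irr, 10_000))  # approximately y_irr, not 1/2
print(time_average(phi, 0.3, y_irr, 10_000))  # small: the integral of phi dm_y is 0
print(time_average(phi, 0.0, 0.0, 10_000))    # on H_0 every point is fixed
```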

5.1.1 Statement of the theorem


Let us start by introducing some useful terminology. In what follows, (M, B, μ) is a probability space and P is a partition of M into measurable subsets. We denote by π : M → P the canonical projection that assigns to each point x ∈ M the element P(x) of the partition that contains it. This projection map endows P with the structure of a probability space, as follows. Firstly, by definition, a subset Q of P is measurable if and only if its pre-image
π⁻¹(Q) = union of all P ∈ P that belong to Q
is a measurable subset of M. It is easy to check that this definition is consistent: the family B̂ of measurable subsets is a σ-algebra in P. Then, we define the quotient measure μ̂ by
μ̂(Q) = μ(π⁻¹(Q)) for every Q ∈ B̂.

Theorem 5.1.3 (Ergodic decomposition). Let M be a complete separable metric space, f : M → M be a measurable transformation and μ be an invariant probability measure. Then there exist a measurable set M0 ⊂ M with μ(M0) = 1, a partition P of M0 into measurable subsets and a family {μP : P ∈ P} of probability measures on M, satisfying
(i) μP(P) = 1 for μ̂-almost every P ∈ P;
(ii) P → μP(E) is measurable, for every measurable set E ⊂ M;
(iii) μP is invariant and ergodic for μ̂-almost every P ∈ P;
(iv) μ(E) = ∫ μP(E) dμ̂(P), for every measurable set E ⊂ M.
144 Ergodic decomposition

Part (iv) of the theorem means that μ is a convex combination of the ergodic
probability measures μP , where the “weight” of each μP is determined by the
probability measure μ̂. Part (ii) ensures that the integral in (iv) is well defined.
Moreover (see Exercise 5.1.3), it implies that the map P → M1 (M) given by
P → μP is measurable.

5.1.2 Disintegration of a measure


We are going to deduce Theorem 5.1.3 from an important result in measure
theory, the Rokhlin disintegration theorem, which has many other applications.
To state this theorem we need the following notion.
Definition 5.1.4. A disintegration of μ with respect to a partition P is a family {μP : P ∈ P} of probability measures on M such that, for every measurable set E ⊂ M:
(i) μP(P) = 1 for μ̂-almost every P ∈ P;
(ii) the map P → R, defined by P → μP(E), is measurable;
(iii) μ(E) = ∫ μP(E) dμ̂(P).

Recall that the partition P inherits from M a natural structure of probability space, with a σ-algebra B̂ and a probability measure μ̂. The measures μP are called conditional probabilities of μ with respect to P.
Example 5.1.5. Let P = {P1, …, Pn} be a finite partition of M into measurable subsets with μ(Pi) > 0 for every i. The quotient measure μ̂ is given by μ̂({Pi}) = μ(Pi). Consider the normalized restriction μi of μ to each Pi:
$$\mu_i(E) = \frac{\mu(E \cap P_i)}{\mu(P_i)} \quad \text{for every measurable set } E \subset M.$$
Then {μ1, …, μn} is a disintegration of μ with respect to P: it is clear that
$$\mu(E) = \sum_{i=1}^{n} \hat\mu(\{P_i\})\, \mu_i(E) \quad \text{for every measurable set } E \subset M.$$
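For a finite probability space, the construction of Example 5.1.5 can be written out directly. In the Python sketch below, M is a hypothetical six-point set with arbitrarily chosen weights, and the partition has three elements. Exact rational arithmetic (`fractions.Fraction`) confirms the identity μ(E) = Σ μ̂({Pi}) μi(E) for every subset E of M.

```python
from fractions import Fraction
from itertools import chain, combinations

# A hypothetical six-point probability space and a partition into three pieces.
mu = {0: Fraction(1, 12), 1: Fraction(1, 6), 2: Fraction(1, 4),
      3: Fraction(1, 12), 4: Fraction(1, 3), 5: Fraction(1, 12)}
partition = [frozenset({0, 1}), frozenset({2, 3, 4}), frozenset({5})]

def measure(E):
    """mu(E) for any iterable of points E."""
    return sum((mu[x] for x in E), Fraction(0))

def conditional(P):
    """Normalized restriction mu_P(E) = mu(E ∩ P) / mu(P)."""
    mP = measure(P)
    return lambda E: measure(set(E) & P) / mP

conditionals = [conditional(P) for P in partition]
quotient = [measure(P) for P in partition]  # mu-hat({P_i}) = mu(P_i)

def all_subsets(points):
    pts = list(points)
    return chain.from_iterable(combinations(pts, k) for k in range(len(pts) + 1))

ok = all(measure(E) == sum(q * c(E) for q, c in zip(quotient, conditionals))
         for E in all_subsets(mu))
print(ok)  # True
```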
This construction extends immediately to countable partitions. In the next
example we treat an uncountable case:
Example 5.1.6. Let M = T² and P be the partition of M into horizontal circles S¹ × {y}, y ∈ S¹. Let m be the Lebesgue measure on T² and m̂ be the Lebesgue measure on S¹. Denote by my the Lebesgue measure (arc-length) on each horizontal circle S¹ × {y}. By the Fubini theorem,
$$m(E) = \int m_y(E)\, d\hat m(y) \quad \text{for every measurable set } E \subset \mathbb{T}^2.$$
Hence, {my : y ∈ S¹} is a disintegration of m with respect to P.


The next proposition asserts that disintegrations are essentially unique, when
they exist. The hypothesis is very general: it holds, for example, if M is a
topological space with a countable basis of open sets and B is the Borel
σ -algebra:

Proposition 5.1.7. Assume that the σ-algebra B admits some countable generator. If {μP : P ∈ P} and {μ′P : P ∈ P} are disintegrations of μ with respect to P, then μP = μ′P for μ̂-almost every P ∈ P.

Proof. Let Γ be a countable generator of the σ-algebra B and A be the algebra generated by Γ. Note that A is countable, since it coincides with the union of the (finite) algebras generated by the finite subsets of Γ. For each A ∈ A, consider the sets
QA = {P ∈ P : μP(A) > μ′P(A)} and RA = {P ∈ P : μP(A) < μ′P(A)}.
If P ∈ QA then P is contained in π⁻¹(QA) and, using property (i) in the definition of disintegration, μP(A ∩ π⁻¹(QA)) = μP(A) for μ̂-almost every such P. Otherwise, P is disjoint from π⁻¹(QA) and, hence, μP(A ∩ π⁻¹(QA)) = 0 for μ̂-almost every such P. Moreover, these conclusions remain valid when one takes μ′P in place of μP. Hence, using property (iii) in the definition of disintegration,
$$\mu\bigl(A \cap \pi^{-1}(Q_A)\bigr) = \int \mu_P\bigl(A \cap \pi^{-1}(Q_A)\bigr)\, d\hat\mu(P) = \int_{Q_A} \mu_P(A)\, d\hat\mu(P),$$
$$\mu\bigl(A \cap \pi^{-1}(Q_A)\bigr) = \int \mu'_P\bigl(A \cap \pi^{-1}(Q_A)\bigr)\, d\hat\mu(P) = \int_{Q_A} \mu'_P(A)\, d\hat\mu(P).$$
Therefore ∫_{QA} (μP(A) − μ′P(A)) dμ̂(P) = 0. Since the integrand is strictly positive on QA, this implies that μ̂(QA) = 0 for every A ∈ A. A similar argument shows that μ̂(RA) = 0 for every A ∈ A. So, since A is countable,
$$\bigcup_{A \in \mathcal{A}} (Q_A \cup R_A)$$
is also a subset of P with measure zero. For every P in the complement of this subset, the measures μP and μ′P coincide on the generating algebra A and, consequently, they coincide on the whole σ-algebra B.
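The countability of A in this proof rests on the fact that finitely many sets generate a finite algebra. The Python sketch below illustrates this on a hypothetical eight-point set. It builds the algebra generated by k sets by closing under complements and unions. It then compares the result with the description through atoms, the non-empty intersections of the generators or their complements. Every element of the algebra is a union of atoms, which gives at most 2^(2^k) elements.

```python
from itertools import combinations

def generated_algebra(ground, generators):
    """Smallest family containing the generators, closed under complement and union."""
    full = frozenset(ground)
    algebra = {frozenset(), full} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        current = list(algebra)
        new_sets = {full - a for a in current}
        new_sets |= {a | b for a, b in combinations(current, 2)}
        if not new_sets <= algebra:
            algebra |= new_sets
            changed = True
    return algebra

def atoms(ground, generators):
    """Non-empty classes of points with the same membership pattern in every generator."""
    classes = {}
    for x in ground:
        key = tuple(x in g for g in generators)
        classes.setdefault(key, set()).add(x)
    return [frozenset(c) for c in classes.values()]

ground = set(range(8))
generators = [{0, 1, 2, 3}, {2, 3, 4, 5}, {1, 3, 5, 7}]
alg = generated_algebra(ground, generators)
n_atoms = len(atoms(ground, generators))
print(len(alg), 2 ** n_atoms, 2 ** (2 ** len(generators)))
```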

On the other hand, disintegrations may fail to exist:

Example 5.1.8. Let f : S¹ → S¹ be an irrational rotation and P be the partition of S¹ whose elements are the orbits {fⁿ(x) : n ∈ Z} of f. Assume that there exists a disintegration {μP : P ∈ P} of the Lebesgue measure μ with respect to P. Consider the iterates {f∗μP : P ∈ P} of the conditional probabilities. Since the partition elements are invariant sets, f∗μP(P) = μP(f⁻¹(P)) = μP(P) = 1 for μ̂-almost every P. It is clear that, given any measurable set E ⊂ S¹,
P → f∗μP(E) = μP(f⁻¹(E))
is a measurable function. Moreover, since μ is an invariant measure,
$$\mu(E) = \mu(f^{-1}(E)) = \int \mu_P(f^{-1}(E))\, d\hat\mu(P) = \int f_*\mu_P(E)\, d\hat\mu(P).$$

These observations show that {f∗μP : P ∈ P} is a disintegration of μ with respect to the partition P. By uniqueness (Proposition 5.1.7), it follows that f∗μP = μP for μ̂-almost every P. That is, almost every conditional probability μP is invariant. That is a contradiction, because P = {fⁿ(x) : n ∈ Z} is an infinite countable set. An invariant probability measure with μP(P) = 1 would give the same weight to every point of P, and infinitely many equal weights cannot add up to 1.

The theorem of Rokhlin states that disintegrations always exist if the partition P is the limit of an increasing sequence of countable partitions and the space M is reasonably well behaved. The precise statement is given in the section that follows.

5.1.3 Measurable partitions


We say that P is a measurable partition if there exists some measurable set M0 ⊂ M with full measure such that, restricted to M0,
$$\mathcal{P} = \bigvee_{n=1}^{\infty} \mathcal{P}_n$$
for some increasing sequence P1 ≺ P2 ≺ ⋯ ≺ Pn ≺ ⋯ of countable partitions (see also Exercise 5.1.1). By Pi ≺ Pi+1 we mean that every element of Pi+1 is contained in some element of Pi or, equivalently, every element of Pi coincides with a union of elements of Pi+1. Then we say that Pi is coarser than Pi+1 or, equivalently, that Pi+1 is finer than Pi.
Represent by ⋁_{n=1}^∞ Pn the partition whose elements are the non-empty intersections of the form ⋂_{n=1}^∞ Pn with Pn ∈ Pn for every n. Equivalently, this is the coarsest partition such that
$$\mathcal{P}_n \prec \bigvee_{k=1}^{\infty} \mathcal{P}_k \quad \text{for every } n.$$

It follows immediately from the definition that every countable partition is measurable. It is easy to find examples of uncountable measurable partitions:

Example 5.1.9. Let M = T², endowed with the Lebesgue measure m, and let P be the partition of M into horizontal circles S¹ × {y}. Then P is a measurable partition. To see that, consider
Pn = {S¹ × I(i, n) : i = 1, …, 2ⁿ},
where I(i, n), 1 ≤ i ≤ 2ⁿ, is the segment of S¹ = R/Z corresponding to the interval [(i − 1)/2ⁿ, i/2ⁿ) ⊂ R. The sequence (Pn)n is increasing and P = ⋁_{n=1}^∞ Pn.
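The partitions Pn of Example 5.1.9 are easy to encode. A point (x, y) lies in S¹ × I(i, n) exactly when i − 1 = ⌊2ⁿ y⌋. The Python sketch below checks two properties, which together are what P = ⋁ Pn expresses. Points on the same horizontal circle stay in the same element of every Pn. Points at different heights are eventually separated. The function names are illustrative.

```python
import math

def cell_index(point, n):
    """Index i in {1, ..., 2^n} of the element S^1 x I(i, n) of P_n containing point.

    The point (x, y) lies in S^1 x I(i, n) exactly when (i - 1)/2^n <= y < i/2^n.
    """
    _, y = point
    return math.floor(y * 2 ** n) + 1

def separation_level(p, q, max_n=60):
    """Smallest n <= max_n such that p and q lie in different elements of P_n."""
    for n in range(1, max_n + 1):
        if cell_index(p, n) != cell_index(q, n):
            return n
    return None  # never separated up to max_n

a, b = (0.1, 0.3), (0.7, 0.3)  # same horizontal circle S^1 x {0.3}
c = (0.1, 0.3 + 1e-6)          # a slightly higher circle
print(separation_level(a, b))   # None
print(separation_level(a, c))
```

Once two points are separated at some level n, they stay separated at every later level, because Pn ≺ Pn+1.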

On the other hand, not all partitions are measurable:


Example 5.1.10. Let f : M → M be a measurable transformation and μ be an ergodic probability measure. Let P be the partition of M whose elements are the orbits of f. Then P is not measurable, unless some orbit of f has full measure. Indeed, suppose that there exists a non-decreasing sequence P1 ≺ P2 ≺ ⋯ ≺ Pn ≺ ⋯ of countable partitions such that P = ⋁_{n=1}^∞ Pn restricted to some full measure subset. This last condition implies that almost every orbit of f is contained in some element Pn of the partition Pn. In other words, up to measure zero, every element of Pn is invariant under f. By ergodicity, every element of Pn has measure 0 or 1. Since Pn is countable and its elements have total measure 1, it follows that for every n there exists exactly one Pn ∈ Pn such that μ(Pn) = 1. Denote P = ⋂_{n=1}^∞ Pn. Then P is an element of the partition ⋁_{n=1}^∞ Pn = P, that is, P is an orbit of f. Moreover, μ(P) = 1, since P is a countable intersection of sets with full measure.

Theorem 5.1.11 (Rokhlin disintegration). Assume that M is a complete separable metric space and P is a measurable partition. Then the probability measure μ admits some disintegration with respect to P.
Theorem 5.1.11 is proven in Section 5.2. The hypothesis that P is measurable is actually also necessary for the conclusion of the theorem (see Exercise 5.2.2).

5.1.4 Proof of the ergodic decomposition theorem


At this point we are going to use Theorem 5.1.11 to prove the ergodic decomposition theorem. Since M is a separable metric space, it admits a countable basis U of open sets; let A be the algebra generated by U. Note that A is countable and that it generates the Borel σ-algebra of M. By the ergodic theorem of Birkhoff, for every A ∈ A there exists a set MA ⊂ M with μ(MA) = 1 such that the mean sojourn time τ(A, x) is well defined for every x ∈ MA. Let M0 = ⋂_{A∈A} MA. Note that μ(M0) = 1, since the intersection is countable.
Now consider the partition P of M0 defined as follows: two points x, y ∈ M0 are in the same element of P if and only if τ(A, x) = τ(A, y) for every A ∈ A. We claim that this partition is measurable. To prove it, consider any enumeration {Ak : k ∈ N} of the elements of the algebra A and let {qk : k ∈ N} be an enumeration of the rational numbers. For each n ∈ N, consider the partition Pn of M0 defined as follows: two points x, y ∈ M0 are in the same element of Pn if and only if, given any i, j ∈ {1, …, n},
either τ(Ai, x) ≤ qj and τ(Ai, y) ≤ qj
or τ(Ai, x) > qj and τ(Ai, y) > qj.
It is clear that every Pn is a finite partition, with no more than 2^(n²) elements, and that Pn ≺ Pn+1 for every n. It follows immediately from the definition that x and y are in the same element of ⋁_{n=1}^∞ Pn if and only if τ(Ai, x) = τ(Ai, y) for every i. This means that
$$\mathcal{P} = \bigvee_{n=1}^{\infty} \mathcal{P}_n,$$
which implies our claim.
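The partitions Pn used in this proof can be mimicked on finite data. In the Python sketch below, each point is represented only by a vector of hypothetical mean sojourn times (τ(A1, x), τ(A2, x), τ(A3, x)). A short list of rationals stands in for the enumeration {qk}. The label of x in Pn records, for all i, j ≤ n, whether τ(Ai, x) ≤ qj. The code counts the elements of Pn met by the data and compares that count with the bound 2^(n²).

```python
from fractions import Fraction

def pn_label(tau, rationals, n):
    """Label of the element of P_n containing a point whose sojourn vector is tau.

    The label records, for all i, j in {1, ..., n} (0-based in the code),
    whether tau(A_i, x) <= q_j.
    """
    return tuple(tau[i] <= rationals[j] for i in range(n) for j in range(n))

# Hypothetical sojourn-time vectors (tau(A_1, x), tau(A_2, x), tau(A_3, x)).
points = {
    "x1": (Fraction(1, 2), Fraction(1, 3), Fraction(1, 4)),
    "x2": (Fraction(1, 2), Fraction(1, 3), Fraction(1, 4)),
    "x3": (Fraction(1, 2), Fraction(2, 3), Fraction(1, 4)),
    "x4": (Fraction(1, 5), Fraction(1, 3), Fraction(3, 4)),
    "x5": (Fraction(1, 5), Fraction(1, 3), Fraction(3, 4)),
}
rationals = [Fraction(1, 2), Fraction(1, 3), Fraction(3, 5)]  # stand-in for q_1, q_2, q_3

def labels_at(n):
    """Map each point name to its P_n label."""
    return {name: pn_label(tau, rationals, n) for name, tau in points.items()}

for n in (1, 2, 3):
    used = len(set(labels_at(n).values()))
    print(f"n={n}: {used} elements of P_n met, bound 2^(n^2) = {2 ** (n * n)}")
```

Points with equal sojourn vectors always receive equal labels. Raising n can only split elements, never merge them, which is the relation Pn ≺ Pn+1.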


So, by Theorem 5.1.11, there exists some disintegration {μP : P ∈ P} of μ
with respect to P. Parts (i), (ii) and (iv) of Theorem 5.1.3 are contained in the
definition of disintegration. To prove part (iii) it suffices to show that μP is
invariant and ergodic for μ̂-almost every P, which is what we do now.
Consider the family of probability measures {f∗μP : P ∈ P}. Observe that every P ∈ P is an invariant set, since mean sojourn times are constant on orbits. It follows that
f∗μP(P) = μP(f⁻¹(P)) = μP(P) = 1 for μ̂-almost every P.
Moreover, given any measurable set E ⊂ M, the function
P → f∗μP(E) = μP(f⁻¹(E))
is measurable and, using the fact that μ is invariant under f,
$$\mu(E) = \mu(f^{-1}(E)) = \int \mu_P(f^{-1}(E))\, d\hat\mu(P) = \int f_*\mu_P(E)\, d\hat\mu(P).$$

This shows that {f∗μP : P ∈ P} is a disintegration of μ with respect to P. By uniqueness (Proposition 5.1.7), it follows that f∗μP = μP for almost every P.
We are left to prove that μP is ergodic for almost every P. Since μ(M0) = 1, we have that μP(M0 ∩ P) = 1 for almost every P. Hence, it is enough to prove that, given any P ∈ P and any measurable set E ⊂ M, the mean sojourn time τ(E, x) is well defined for every x ∈ M0 ∩ P and is constant on that set. Indeed, if E is an invariant set then τ(E, x) equals 1 for x ∈ E and 0 for x ∉ E, so the constancy of τ(E, ·) on M0 ∩ P forces μP(E) ∈ {0, 1}. Fix P and denote by C the class of all measurable sets E for which this holds. By construction, C contains the generating algebra A. Observe that if E1, E2 ∈ C with E1 ⊃ E2 then E1 \ E2 ∈ C, because
τ(E1 \ E2, x) = τ(E1, x) − τ(E2, x)
is well defined and constant on M0 ∩ P. In particular, if E ∈ C then Eᶜ is also in C. Analogously, C is closed under countable pairwise disjoint unions: if the sets Ej ∈ C are pairwise disjoint then
$$\tau\Bigl(\bigcup_j E_j, x\Bigr) = \sum_j \tau(E_j, x)$$
is well defined and constant on M0 ∩ P. It is easy to deduce that C is a monotone class: given any sequences An, Bn ∈ C, n ≥ 1, with An ⊂ An+1 and Bn ⊃ Bn+1 for every n, the two previous observations yield
$$\bigcup_{n=1}^{\infty} A_n = A_1 \cup \bigcup_{n=1}^{\infty} (A_{n+1} \setminus A_n) \in \mathcal{C} \quad\text{and}\quad \bigcap_{n=1}^{\infty} B_n = \Bigl(\bigcup_{n=1}^{\infty} B_n^c\Bigr)^c \in \mathcal{C}.$$

By Theorem A.1.18, it follows that C contains the Borel σ-algebra of M. This completes the deduction of Theorem 5.1.3 from Theorem 5.1.11.

5.1.5 Exercises
5.1.1. Show that a partition P is measurable if and only if there exist measurable subsets M0, E1, E2, …, En, … such that μ(M0) = 1 and, restricted to M0,
$$\mathcal{P} = \bigvee_{n=1}^{\infty} \{E_n, M \setminus E_n\}.$$

5.1.2. Let μ be an ergodic probability measure for a transformation f. Then μ is also invariant under fᵏ for any k ≥ 2. Describe the ergodic decomposition of μ for the iterate fᵏ.
5.1.3. Let M be a metric space and X be a measurable space. Prove that the following conditions are all equivalent:
(a) the map ν : X → M1(M), x → νx is measurable;
(b) the map X → R, x → ∫ ϕ dνx is measurable, for every bounded continuous function ϕ : M → R;
(c) the map X → R, x → ∫ ψ dνx is measurable, for every bounded measurable function ψ : M → R;
(d) the map X → R, x → νx(E) is measurable, for every measurable set E ⊂ M.
5.1.4. Prove that if {μP : P ∈ P} is a disintegration of μ then
$$\int \psi \, d\mu = \int \Bigl(\int \psi \, d\mu_P\Bigr) d\hat\mu(P)$$
for every bounded measurable function ψ : M → R.


5.1.5. Let μ be a probability measure invariant under a measurable transformation
f : M → M. Let f̂ : M̂ → M̂ be the natural extension of f and μ̂ be the lift
of μ (Section 2.4.2). Relate the ergodic decomposition of μ to the ergodic
decomposition of μ̂.
5.1.6. When M is a compact metric space, we may obtain the ergodic decomposition of an invariant probability measure μ by taking for M0 the subset of points x ∈ M such that
$$\mu_x = \lim_{n} \frac{1}{n} \sum_{j=0}^{n-1} \delta_{f^j(x)}$$
exists in the weak∗ topology and taking for P the partition of M0 defined by P(x) = P(y) ⇔ μx = μy. Check the details of this alternative proof of Theorem 5.1.3 for compact metric spaces.
5.1.7. Let σ : Σ → Σ be the shift map in Σ = {1, …, d}^Z. Consider the partition W^s of Σ into “stable sets”
W^s((an)n) = {(xn)n : xn = an for every n ≥ 0}.
Given any probability measure μ invariant under σ, let {μP : P ∈ P} be an ergodic decomposition of μ. Check that P ≺ W^s, restricted to a full measure subset of Σ.

5.2 Rokhlin disintegration theorem


Now we prove Theorem 5.1.11. Fix any increasing sequence P1 ≺ P2 ≺ ⋯ ≺ Pn ≺ ⋯ of countable partitions such that P = ⋁_{n=1}^∞ Pn restricted to some full measure set M0 ⊂ M. As before, we use Pn(x) to denote the element of Pn that contains a given point x ∈ M.

5.2.1 Conditional expectations


Let ψ : M → R be any bounded measurable function. For each n ≥ 1, define en(ψ) : M → R as follows:
$$e_n(\psi, x) = \begin{cases} \dfrac{1}{\mu(P_n(x))} \displaystyle\int_{P_n(x)} \psi \, d\mu & \text{if } \mu(P_n(x)) > 0, \\[2mm] 0 & \text{otherwise.} \end{cases} \tag{5.2.1}$$
Since the partitions Pn are countable, the second case of the definition corresponds to a subset of points with total measure zero. Observe also that en(ψ) is constant on each Pn ∈ Pn; let us denote by En(ψ, Pn) the value of that constant. Then,
$$\int \psi \, d\mu = \sum_{P_n} \int_{P_n} \psi \, d\mu = \sum_{P_n} \mu(P_n)\, E_n(\psi, P_n) = \int e_n(\psi)\, d\mu \tag{5.2.2}$$
for every n ∈ N (the sums involve only partition elements Pn ∈ Pn with positive measure).
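To see (5.2.1) and (5.2.2) at work in a simple setting, take as an illustrative assumption (not part of the text) M = [0, 1) with Lebesgue measure μ. Let Pn be the partition into the 2ⁿ dyadic intervals [k/2ⁿ, (k + 1)/2ⁿ). For ψ(t) = t², the Python sketch below computes en(ψ, x) exactly with fractions and confirms (5.2.2). It also shows en(ψ, x) approaching ψ(x). In this example the limit partition ⋁ Pn is the partition into points, so the limit e(ψ) coincides with ψ.

```python
import math
from fractions import Fraction

def integral_of_square(a, b):
    """Exact integral of t^2 over [a, b]."""
    return (b ** 3 - a ** 3) / 3

def dyadic_cell(x, n):
    """Endpoints of the element [k/2^n, (k+1)/2^n) of P_n that contains x."""
    k = math.floor(x * 2 ** n)
    return Fraction(k, 2 ** n), Fraction(k + 1, 2 ** n)

def e_n(x, n):
    """e_n(psi, x) from (5.2.1) for psi(t) = t^2; every dyadic cell has positive measure."""
    a, b = dyadic_cell(x, n)
    return integral_of_square(a, b) / (b - a)

def integral_of_e_n(n):
    """Sum over P_n of mu(P_n) * E_n(psi, P_n), the middle term of (5.2.2)."""
    total = Fraction(0)
    for k in range(2 ** n):
        a, b = Fraction(k, 2 ** n), Fraction(k + 1, 2 ** n)
        total += (b - a) * (integral_of_square(a, b) / (b - a))
    return total

x = Fraction(1, 3)
for n in (1, 4, 8, 16):
    print(n, float(e_n(x, n)), "vs psi(x) =", float(x * x))
print(integral_of_e_n(6) == integral_of_square(Fraction(0), Fraction(1)))  # True
```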

Lemma 5.2.1. Given any bounded measurable function ψ : M → R, there exists a subset Mψ of M with μ(Mψ) = 1 such that
(i) e(ψ, x) = limn en(ψ, x) exists for every x ∈ Mψ;
(ii) e(ψ) : Mψ → R is measurable and constant on each P ∈ P;
(iii) ∫ ψ dμ = ∫ e(ψ) dμ.

Proof. Initially, suppose that ψ ≥ 0. For each α < β, let S(α, β) be the set of points x ∈ M such that
$$\liminf_n e_n(\psi, x) < \alpha < \beta < \limsup_n e_n(\psi, x).$$
Since |en(ψ)| ≤ sup |ψ|, the sequence en(ψ, x) diverges if and only if its lim inf is smaller than its lim sup, that is, if and only if x ∈ S(α, β) for some pair of rational numbers α < β. In other words, the limit e(ψ, x) exists if and only if x belongs to the intersection Mψ of all S(α, β)ᶜ with rational α < β. As this is a countable intersection, in order to prove that μ(Mψ) = 1 it suffices to show that μ(S(α, β)) = 0 for every α < β. We do this next.
Let α and β be fixed and denote S = S(α, β). Given x ∈ S, fix any sequence of integers 1 ≤ a^x_1 < b^x_1 < ⋯ < a^x_i < b^x_i < ⋯ such that
$$e_{a^x_i}(\psi, x) < \alpha \quad \text{and} \quad e_{b^x_i}(\psi, x) > \beta \quad \text{for every } i \ge 1.$$

Define $A_i$ to be the union of the partition elements $A_i(x) = \mathcal{P}_{a_i^x}(x)$ and $B_i$ to be the union of the partition elements $B_i(x) = \mathcal{P}_{b_i^x}(x)$ obtained in this way, for all points $x \in S$. By construction, $S \subset A_{i+1} \subset B_i \subset A_i$ for every $i \ge 1$. In particular, $S$ is contained in the set
$$
\tilde S = \bigcap_{i=1}^{\infty} B_i = \bigcap_{i=1}^{\infty} A_i.
$$

Since the sequence $\mathcal{P}_n$, $n \ge 1$, is monotone increasing, any two of the sets $A_i(x) = \mathcal{P}_{a_i^x}(x)$ that form $A_i$ are either disjoint or one is contained in the other. It follows that the maximal sets $A_i(x)$ are pairwise disjoint and, hence, they constitute a partition of $A_i$. Since $e_{a_i^x}(\psi, x) < \alpha$ means that the average of $\psi$ over $A_i(x)$ is less than $\alpha$, adding only over such maximal sets with positive measure we get
$$
\int_{A_i} \psi \, d\mu = \sum_{A_i(x)} \int_{A_i(x)} \psi \, d\mu \le \sum_{A_i(x)} \alpha \mu(A_i(x)) = \alpha \mu(A_i),
$$

for every $i \ge 1$. Analogously,
$$
\int_{B_i} \psi \, d\mu = \sum_{B_i(x)} \int_{B_i(x)} \psi \, d\mu \ge \sum_{B_i(x)} \beta \mu(B_i(x)) = \beta \mu(B_i).
$$

Since $A_i \supset B_i$ and we are assuming that $\psi \ge 0$, it follows that
$$
\alpha \mu(A_i) \ge \int_{A_i} \psi \, d\mu \ge \int_{B_i} \psi \, d\mu \ge \beta \mu(B_i)
$$

for every $i \ge 1$. Taking the limit as $i \to \infty$ (both sequences $(A_i)_i$ and $(B_i)_i$ are decreasing), we find that $\alpha \mu(\tilde S) \ge \beta \mu(\tilde S)$. Since $\alpha < \beta$, this implies that $\mu(\tilde S) = 0$ and, hence, $\mu(S) = 0$. This proves the claim when $\psi$ is non-negative. The general case follows immediately, since we may always write $\psi = \psi^+ - \psi^-$, where $\psi^\pm$ are measurable, non-negative and bounded. Note that $e_n(\psi) = e_n(\psi^+) - e_n(\psi^-)$ for every $n \ge 1$ and, hence, the conclusion of the lemma holds for $\psi$ if it holds for $\psi^+$ and $\psi^-$. This ends the proof of claim (i).
The other claims are simple consequences of the definition. The fact that
e(ψ) is measurable follows directly from Proposition A.1.31. Since Pn is
coarser than P, it is clear that en (ψ) is constant on each P ∈ P, restricted to a
subset of M with full measure. Hence, the same is true for e(ψ). This proves
part (ii). Observe also that |en (ψ)| ≤ sup |ψ| for every n ≥ 1. Hence, we may
use the dominated convergence theorem to pass to the limit in (5.2.2). In this
way, we get part (iii).

We are especially interested in the case when $\psi$ is a characteristic function: $\psi = \mathcal{X}_A$ for some measurable set $A \subset M$. In this case, the definition means that
$$
e(\psi, x) = \lim_n \frac{\mu(\mathcal{P}_n(x) \cap A)}{\mu(\mathcal{P}_n(x))}. \tag{5.2.3}
$$
We denote by $\mathcal{P}_A$ the subset of elements $P$ of the partition $\mathcal{P}$ that intersect $M_\psi$. Observe that $\hat\mu(\mathcal{P}_A) = 1$. Moreover, we define $E(A) : \mathcal{P}_A \to \mathbb{R}$ by setting
$E(A, P) = e(\psi, x)$ for any $x \in M_\psi \cap P$. Note that $e(\psi) = E(A) \circ \pi$. Hence, the function $E(A)$ is measurable and satisfies:
$$
\mu(A) = \int \psi \, d\mu = \int e(\psi) \, d\mu = \int E(A) \, d\hat\mu. \tag{5.2.4}
$$
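To see the quotient in (5.2.3) at work, the following Python sketch uses a toy model of our own choosing: $M = [0, 1)$ with Lebesgue measure, $\mathcal{P}_n$ the dyadic intervals of length $2^{-n}$, and the hypothetical set $A = [0, 1/3)$. The function name `quotient` is ours. At points away from $1/3$ the quotient tends to 1 or 0, while at the exceptional point $x = 1/3$ it alternates between $2/3$ and $1/3$, which is consistent with Lemma 5.2.1 asserting convergence only on a full measure set $M_\psi$.

```python
import math
from fractions import Fraction

# Assumed toy model (illustrative only): M = [0, 1) with Lebesgue measure mu,
# P_n = dyadic intervals of length 2^-n, and psi = X_A with A = [0, 1/3).
A_END = Fraction(1, 3)

def quotient(n, x):
    """mu(P_n(x) ∩ A) / mu(P_n(x)), the n-th term of the limit in (5.2.3)."""
    k = math.floor(x * 2**n)
    a, b = Fraction(k, 2**n), Fraction(k + 1, 2**n)
    overlap = max(Fraction(0), min(b, A_END) - a)  # length of [a, b) ∩ [0, 1/3)
    return overlap / (b - a)

for x in (Fraction(1, 5), Fraction(1, 2), Fraction(1, 3)):
    print(x, [str(quotient(n, x)) for n in range(1, 7)])
```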

5.2.2 Criterion for σ-additivity


The hypothesis that the ambient space $M$ is a complete separable metric space is used in the proof of the following important criterion for σ-additivity:
Proposition 5.2.2. Let $M$ be a complete separable metric space and $\mathcal{A}$ be the algebra generated by a countable basis $\mathcal{U} = \{U_k : k \in \mathbb{N}\}$ of open sets of $M$. Let $\mu : \mathcal{A} \to [0, 1]$ be an additive function with $\mu(M) = 1$. Then $\mu$ extends to a probability measure on the Borel σ-algebra of $M$.
First, let us outline the proof. We consider the product space $\Sigma = \{0, 1\}^{\mathbb{N}}$, endowed with the topology generated by the cylinders
$$
[0; a_0, \dots, a_s] = \{(i_k)_{k \in \mathbb{N}} : i_0 = a_0, \dots, i_s = a_s\}, \quad s \ge 0.
$$
Note that $\Sigma$ is compact (Exercise A.1.11). Using the fact that $M$ is a complete metric space, we will show that the map
$$
\gamma : M \to \Sigma, \quad \gamma(x) = \big(\mathcal{X}_{U_k}(x)\big)_{k \in \mathbb{N}}
$$
is a measurable embedding of $M$ inside $\Sigma$. Moreover, the function $\mu$ yields an additive function $\nu$ defined on the algebra $\mathcal{A}'$ generated by the cylinders of $\Sigma$. This algebra is compact (Definition A.1.15), since every element is compact. Hence, $\nu$ extends to a probability measure on the Borel σ-algebra of $\Sigma$; we still represent this extension by $\nu$. We will show that the image $\gamma(M)$ has full measure for $\nu$. Then, the push-forward $\gamma^{-1}_* \nu$ is a probability measure on the Borel σ-algebra of $M$. Finally, we will check that this probability measure is an extension of the function $\mu$.
Now let us detail these arguments. In what follows, given any set $A \subset M$, we denote $A^1 = A$ and $A^0 = A^c$.
Lemma 5.2.3. The image $\gamma(M)$ is a Borel subset of $\Sigma$.

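Proof. Let $x \in M$ and $(i_k)_k = \gamma(x)$. It is clear that

(A) $\bigcap_{j=0}^{k} U_j^{i_j} \neq \emptyset$ for every $k \in \mathbb{N}$,

since $x$ is in the intersection. Moreover, since $\mathcal{U}$ is a basis of open sets of $M$,

(B) there exists $k \in \mathbb{N}$ such that $i_k = 1$ and $\operatorname{diam} U_k \le 1$, and

(C) for every $k \in \mathbb{N}$ such that $i_k = 1$ there exists $l(k) > k$ such that $i_{l(k)} = 1$, $\bar U_{l(k)} \subset U_k$ and $\operatorname{diam} U_{l(k)} \le \operatorname{diam} U_k / 2$.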

Conversely, suppose that $(i_k)_k \in \Sigma$ satisfies conditions (A), (B) and (C). We are going to show that there exists $x \in M$ such that $\gamma(x) = (i_k)_k$. For that, define
$$
F_n = \bigcap_{k=0}^{n} V_k,
$$
where $V_k = U_k^c$ if $i_k = 0$ and $V_k = \bar U_{l(k)}$ if $i_k = 1$. Then $(F_n)_n$ is a decreasing sequence of closed sets. Condition (A) ensures that $F_n \neq \emptyset$ for every $n \ge 1$: indeed, $F_n$ contains $\bigcap_{j=0}^{m} U_j^{i_j}$, where $m$ is the largest of $n$ and the $l(k)$ with $k \le n$ and $i_k = 1$. Conditions (B) and (C) imply that the diameter of $F_n$ converges to zero as $n \to \infty$. Then, since $M$ is a complete metric space, the intersection $\bigcap_n F_n$ contains some point $x$. By construction, $F_n$ is contained in $\bigcap_{k=0}^{n} U_k^{i_k}$ for every $n$. It follows that
$$
x \in \bigcap_{k=0}^{\infty} U_k^{i_k},
$$
that is, $\gamma(x) = (i_k)_k$. In this way, we have shown that the image of $\gamma$ consists precisely of the sequences that satisfy conditions (A), (B) and (C).
To conclude the proof it suffices to show that the subset described by each of these conditions may be constructed from cylinders through countable unions and intersections. Given $k \in \mathbb{N}$, let $N(k)$ be the set of $(k+1)$-tuples $(a_0, \dots, a_k)$ in $\{0, 1\}^{k+1}$ such that $U_0^{a_0} \cap \cdots \cap U_k^{a_k} \neq \emptyset$. Condition (A) corresponds to the subset
$$
\bigcap_{k=0}^{\infty} \ \bigcup_{(a_0, \dots, a_k) \in N(k)} [0; a_0, \dots, a_k].
$$
Let $D = \{k \in \mathbb{N} : \operatorname{diam} U_k \le 1\}$. Then, condition (B) corresponds to
$$
\bigcup_{k \in D} \ \bigcup_{(a_0, \dots, a_{k-1})} [0; a_0, \dots, a_{k-1}, 1].
$$
Finally, given any $k \in \mathbb{N}$, let $L(k)$ be the set of all $l > k$ such that $\bar U_l \subset U_k$ and $\operatorname{diam} U_l \le \operatorname{diam} U_k / 2$. Condition (C) corresponds to the subset
$$
\bigcap_{k=0}^{\infty} \ \bigcup_{a_0, \dots, a_{k-1}} \Big( [0; a_0, \dots, a_{k-1}, 0] \ \cup \bigcup_{l \in L(k)} \ \bigcup_{a_{k+1}, \dots, a_{l-1}} [0; a_0, \dots, a_{k-1}, 1, a_{k+1}, \dots, a_{l-1}, 1] \Big).
$$
This completes the proof of the lemma.

Corollary 5.2.4. The map $\gamma : M \to \gamma(M)$ is a measurable bijection whose inverse is also measurable.

Proof. Given any points $x \neq y$ in $M$, there exists $k \in \mathbb{N}$ such that $U_k$ contains one of the points but not the other. This ensures that $\gamma$ is injective. For any $s \ge 0$ and $a_0, \dots, a_s \in \{0, 1\}$,
$$
\gamma^{-1}([0; a_0, \dots, a_s]) = U_0^{a_0} \cap \cdots \cap U_s^{a_s}. \tag{5.2.5}
$$
This implies that $\gamma$ is measurable, because the cylinders generate the Borel σ-algebra of $\Sigma$. Next, observe that
$$
\gamma(U_0^{a_0} \cap \cdots \cap U_s^{a_s}) = [0; a_0, \dots, a_s] \cap \gamma(M) \tag{5.2.6}
$$
for every $s, a_0, \dots, a_s$. Using Lemma 5.2.3, it follows that $\gamma(U_0^{a_0} \cap \cdots \cap U_s^{a_s})$ is a Borel subset of $\Sigma$ for every $s, a_0, \dots, a_s$. Since these sets generate the Borel σ-algebra of $M$, this proves that the inverse transformation $\gamma^{-1}$ is measurable.
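The coding map $\gamma$ and the injectivity argument in the proof of Corollary 5.2.4 can be tried out in a toy case. The Python sketch below is only an illustration under our own assumptions: $M = [0, 1]$ and the countable basis consists of the intervals $(j/2^m, (j+2)/2^m) \cap [0, 1]$, enumerated level by level; the function names `basis`, `gamma_prefix` and `first_difference` are hypothetical. It computes a finite prefix of $\gamma(x) \in \Sigma$ and finds an index $k$ at which the codes of two distinct points differ.

```python
from fractions import Fraction
from itertools import islice

# Assumed basis (illustrative only): U_0, U_1, ... are the intervals (j/2^m, (j+2)/2^m),
# intersected with M = [0, 1], listed for m = 1, 2, ... and j = -1, 0, ..., 2^m - 1.

def basis():
    """Generate the endpoints (lo, hi) of U_0, U_1, U_2, ... in order."""
    m = 1
    while True:
        for j in range(-1, 2**m):
            yield Fraction(j, 2**m), Fraction(j + 2, 2**m)
        m += 1

def gamma_prefix(x, K):
    """First K coordinates of gamma(x) = (X_{U_k}(x))_k in Sigma = {0, 1}^N."""
    return [1 if lo < x < hi else 0 for lo, hi in islice(basis(), K)]

def first_difference(x, y, K=10_000):
    """Smallest k < K such that U_k contains exactly one of x, y (None if there is none)."""
    for k, (i, j) in enumerate(zip(gamma_prefix(x, K), gamma_prefix(y, K))):
        if i != j:
            return k
    return None

x, y = Fraction(1, 3), Fraction(1, 3) + Fraction(1, 1000)
print(gamma_prefix(x, 12))
print(first_difference(x, y))
```

Distinct points are separated because every point has basis neighborhoods of arbitrarily small diameter, which is exactly what the proof uses to show that $\gamma$ is injective.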
Now we are ready to prove that $\mu$ extends to a probability measure on the Borel σ-algebra of $M$, as claimed in Proposition 5.2.2. For that, let us consider the algebra $\mathcal{A}'$ generated by the cylinders of $\Sigma$. Note that the elements of $\mathcal{A}'$ are the finite pairwise disjoint unions of cylinders. In particular, every element of $\mathcal{A}'$ is compact (cylinders are closed subsets of the compact space $\Sigma$) and, consequently, $\mathcal{A}'$ is a compact algebra (Definition A.1.15). Define:
$$
\nu([0; a_0, \dots, a_s]) = \mu(U_0^{a_0} \cap \cdots \cap U_s^{a_s}), \tag{5.2.7}
$$
for every $s \ge 0$ and $a_0, \dots, a_s$ in $\{0, 1\}$. Then $\nu$ is an additive function on the set of all cylinders, with values in $[0, 1]$. It extends in a natural way to an additive function defined on the algebra $\mathcal{A}'$, which we still denote as $\nu$.
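The definition (5.2.7) and the additivity of $\nu$ on cylinders can be checked mechanically in a small example. In the Python sketch below, which rests on our own assumptions rather than on the text, $M = [0, 1]$, $\mu$ is Lebesgue measure evaluated on sets built from four hypothetical open intervals $U_0, \dots, U_3$, and sets are stored as finite unions of intervals, ignoring endpoints since they have measure zero. The sketch verifies $\nu(\Sigma) = 1$ and $\nu([0; w]) = \nu([0; w, 0]) + \nu([0; w, 1])$ for short words $w$.

```python
from fractions import Fraction as F
from itertools import product

# Assumed data (illustrative only): four open intervals U_0, ..., U_3 in M = [0, 1].
U = [(F(0), F(1, 2)), (F(1, 4), F(3, 4)), (F(1, 3), F(1)), (F(1, 8), F(5, 8))]

def intersect(S, c, d):
    """S ∩ (c, d), where S is a list of disjoint intervals (lo, hi)."""
    out = []
    for lo, hi in S:
        a, b = max(lo, c), min(hi, d)
        if a < b:
            out.append((a, b))
    return out

def remove(S, c, d):
    """S minus (c, d); endpoints are ignored because they have measure zero."""
    out = []
    for lo, hi in S:
        if lo < min(hi, c):
            out.append((lo, min(hi, c)))
        if max(lo, d) < hi:
            out.append((max(lo, d), hi))
    return out

def nu(word):
    """nu([0; a_0, ..., a_s]) = mu(U_0^{a_0} ∩ ... ∩ U_s^{a_s}), as in (5.2.7)."""
    S = [(F(0), F(1))]
    for k, a in enumerate(word):
        c, d = U[k]
        S = intersect(S, c, d) if a == 1 else remove(S, c, d)
    return sum((hi - lo for lo, hi in S), F(0))

# Additivity on cylinders: [0; w] is the disjoint union of [0; w, 0] and [0; w, 1].
for s in range(len(U)):
    for w in product((0, 1), repeat=s):
        assert nu(w) == nu(w + (0,)) + nu(w + (1,))
assert nu(()) == 1  # the empty word stands for Sigma itself: nu(Sigma) = mu(M) = 1
```

Since $\mu$ is already a measure in this toy case, the sketch only illustrates the bookkeeping; the content of Proposition 5.2.2 is that σ-additivity is obtained from the compactness of the algebra generated by the cylinders.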
It is clear that $\nu(\Sigma) = 1$. Moreover, since the algebra $\mathcal{A}'$ is compact, we may use Theorem A.1.14 to conclude that the function $\nu : \mathcal{A}' \to [0, 1]$ is σ-additive. Hence, by Theorem A.1.13, the function $\nu$ extends to a probability measure defined on the Borel σ-algebra of $\Sigma$. Given any cover $\mathcal{C}$ of $\gamma(M)$ by cylinders, it follows from the definition (5.2.7) that
$$
\nu\Big(\bigcup_{C \in \mathcal{C}} C\Big) = \mu\Big(\gamma^{-1}\Big(\bigcup_{C \in \mathcal{C}} C\Big)\Big) = \mu(M) = 1.
$$
Taking the infimum over all covers, we conclude that $\nu(\gamma(M)) = 1$.
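By Corollary 5.2.4, the push-forward $\gamma^{-1}_* \nu$ is a Borel probability measure on $M$. By definition, and using the relation (5.2.6),
$$
\begin{aligned}
\gamma^{-1}_* \nu(U_0^{a_0} \cap \cdots \cap U_s^{a_s}) &= \nu\big(\gamma(U_0^{a_0} \cap \cdots \cap U_s^{a_s})\big) = \nu\big([0; a_0, \dots, a_s] \cap \gamma(M)\big)\\
&= \nu([0; a_0, \dots, a_s]) = \mu(U_0^{a_0} \cap \cdots \cap U_s^{a_s})
\end{aligned}
$$
for any $s, a_0, \dots, a_s$. Since every element of $\mathcal{A}$ is a finite disjoint union of sets of this form, this implies that $\gamma^{-1}_* \nu$ is an extension of the function $\mu : \mathcal{A} \to [0, 1]$. Therefore, the proof of Proposition 5.2.2 is complete.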

5.2.3 Construction of conditional measures


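Let $\mathcal{U} = \{U_k : k \in \mathbb{N}\}$ be a countable basis of open sets of $M$ (it exists because $M$ is a separable metric space) and $\mathcal{A}$ be the algebra generated by $\mathcal{U}$. It is clear that $\mathcal{A}$ generates the Borel σ-algebra of $M$. Observe also that $\mathcal{A}$ is countable: it coincides with the union of the (finite) algebras generated by the subsets $\{U_k : 0 \le k \le n\}$, for every $n \ge 1$. Define
$$
\mathcal{P}^* = \bigcap_{A \in \mathcal{A}} \mathcal{P}_A.
$$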
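Then $\hat\mu(\mathcal{P}^*) = 1$, since $\mathcal{P}^*$ is a countable intersection of sets with full $\hat\mu$-measure. For each $P \in \mathcal{P}^*$, define
$$
\mu_P : \mathcal{A} \to [0, 1], \quad \mu_P(A) = E(A, P). \tag{5.2.8}
$$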
In particular, $\mu_P(M) = E(M, P) = 1$. It is clear that $\mu_P$ is an additive function: the definition (5.2.3) gives that
$$
A \cap B = \emptyset \ \Rightarrow\ E(A \cup B, P) = E(A, P) + E(B, P) \quad \text{for every } P \in \mathcal{P}^*.
$$
By Proposition 5.2.2, it follows that this function extends to a probability measure defined on the Borel σ-algebra of $M$, which we still denote as $\mu_P$.
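For intuition about the limits $E(A, P)$ that define $\mu_P$ in (5.2.8), the Python sketch below works out a toy example of our own choosing (not from the text): $M = [0, 1]^2$ with the probability density $p(x, y) = x + y$, the partition $\mathcal{P}$ into vertical segments $\{x\} \times [0, 1]$, the partitions $\mathcal{P}_n$ into vertical strips of width $2^{-n}$, and $A = [0, 1] \times [0, t]$. The functions `strip_quotient` and `conditional` are hypothetical names. The quotients $\mu(P_n \cap A)/\mu(P_n)$ should converge to $\int_0^t (x + y)\,dy \big/ \int_0^1 (x + y)\,dy$ along the segment through $x$.

```python
import math
from fractions import Fraction

# Assumed toy model (illustrative only): mu has density p(x, y) = x + y on [0, 1]^2,
# P_n consists of the strips [k/2^n, (k+1)/2^n) x [0, 1], and A = [0, 1] x [0, t].

def strip_quotient(n, x, t):
    """mu(P_n(x) ∩ A) / mu(P_n(x)), with both integrals computed in closed form."""
    h = Fraction(1, 2**n)
    c = math.floor(x / h) * h                  # left end of the strip containing x
    base = c * h + h * h / 2                   # integral of x over [c, c + h]
    num = t * base + h * t * t / 2             # ∫_c^{c+h} ∫_0^t (x + y) dy dx
    den = base + h / 2                         # ∫_c^{c+h} ∫_0^1 (x + y) dy dx
    return num / den

def conditional(x, t):
    """Limit value mu_P(A) for the segment P = {x} x [0, 1]."""
    return (x * t + t * t / 2) / (x + Fraction(1, 2))

x, t = Fraction(1, 3), Fraction(1, 2)
print([float(strip_quotient(n, x, t)) for n in (1, 4, 8, 16)], float(conditional(x, t)))
```

In this example each $\mu_P$ is the measure on the segment $\{x\} \times [0, 1]$ with density proportional to $y \mapsto x + y$, which matches condition (i) of Definition 5.1.4.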
We are left to check that this family of measures {μP : P ∈ P∗ } satisfies all the
conditions in the definition of disintegration (Definition 5.1.4).
Let us start with condition (i). Let $P \in \mathcal{P}^*$ and, for every $n \ge 1$, let $P_n$ be the element of the partition $\mathcal{P}_n$ that contains $P$. Observe that if $A \in \mathcal{A}$ is such that $A \cap P_n = \emptyset$ for some $n$ then
$$
\mu_P(A) = E(A, P) = \lim_m \frac{\mu(A \cap P_m)}{\mu(P_m)} = 0,
$$
since $P_m \subset P_n$ for every $m \ge n$. Fix $n$. For each $s \ge 0$, let $P_s^n$ be the union of all sets of the form $U_0^{a_0} \cap \cdots \cap U_s^{a_s}$ that intersect $P_n$. By the previous observation, the sets of this form that do not intersect $P_n$ have measure zero for $\mu_P$. Therefore, $\mu_P(P_s^n) = 1$ for every $s \ge 0$. Passing to the limit when $s \to \infty$, we conclude that $\mu_P(U) = 1$ for every open set $U$ that contains $P_n$. Since the measure $\mu_P$ is regular (Proposition A.3.2), it follows that $\mu_P(P_n) = 1$. Passing to the limit when $n \to \infty$, we find that $\mu_P(P) = 1$ for every $P \in \mathcal{P}^*$.
Finally, let $\mathcal{C}$ denote the family of all measurable sets $E \subset M$ for which conditions (ii) and (iii) hold. By construction (recall Lemma 5.2.1), given any $A \in \mathcal{A}$, the function $P \mapsto \mu_P(A) = E(A, P)$ is measurable and satisfies
$$
\mu(A) = \int E(A, P) \, d\hat\mu(P) = \int \mu_P(A) \, d\hat\mu(P).
$$
This means that $\mathcal{A} \subset \mathcal{C}$. We claim that $\mathcal{C}$ is a monotone class. Indeed, suppose that $B$ is the union of an increasing sequence $(B_j)_j$ of sets in $\mathcal{C}$. Then, by Proposition A.1.31,
$$
P \mapsto \mu_P(B) = \sup_j \mu_P(B_j) \quad \text{is a measurable function}
$$
and, using the monotone convergence theorem,
$$
\mu(B) = \lim_n \mu(B_n) = \lim_n \int \mu_P(B_n) \, d\hat\mu = \int \lim_n \mu_P(B_n) \, d\hat\mu = \int \mu_P(B) \, d\hat\mu.
$$
This means that $B \in \mathcal{C}$. Analogously, if $B$ is the intersection of a decreasing sequence of sets in $\mathcal{C}$ then $P \mapsto \mu_P(B)$ is measurable and $\mu(B) = \int \mu_P(B) \, d\hat\mu(P)$ (now using the dominated convergence theorem, since $0 \le \mu_P \le 1$). That is, $B \in \mathcal{C}$. This proves that $\mathcal{C}$ is a monotone class, as we claimed. By Theorem A.1.18 it follows that $\mathcal{C}$ coincides with the Borel σ-algebra of $M$.
The proof of Theorem 5.1.11 is complete.

5.2.4 Exercises
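5.2.1. Let $\mathcal{P}$ and $\mathcal{Q}$ be measurable partitions of $(M, \mathcal{B}, \mu)$ such that $\mathcal{P} \prec \mathcal{Q}$ up to measure zero. Let $\{\mu_P : P \in \mathcal{P}\}$ be a disintegration of $\mu$ with respect to $\mathcal{P}$ and, for every $P \in \mathcal{P}$, let $\{\mu_{P,Q} : Q \in \mathcal{Q}, Q \subset P\}$ be a disintegration of $\mu_P$ with respect to $\mathcal{Q}$. Let $\pi : \mathcal{Q} \to \mathcal{P}$ be the canonical projection, such that $Q \subset \pi(Q)$ for almost every $Q \in \mathcal{Q}$. Show that $\{\mu_{\pi(Q),Q} : Q \in \mathcal{Q}\}$ is a disintegration of $\mu$ with respect to $\mathcal{Q}$.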
5.2.2. (Converse to the theorem of Rokhlin) Let M be a complete separable metric
space. Show that if P satisfies the conclusion of Theorem 5.1.11, that is, if μ
admits a disintegration with respect to P, then the partition P is measurable.
5.2.3. Let $\mathcal{P}_1 \prec \cdots \prec \mathcal{P}_n \prec \cdots$ be an increasing sequence of countable partitions such that the union $\bigcup_n \mathcal{P}_n$ generates the σ-algebra $\mathcal{B}$ of measurable sets, up to measure zero. Show that the conditional expectation $e(\psi) = \lim_n e_n(\psi)$ coincides with $\psi$ at almost every point, for every bounded measurable function $\psi$.
5.2.4. Prove Proposition 2.4.4, using Proposition 5.2.2.
6
Unique ergodicity

This chapter is dedicated to a distinguished class of dynamical systems, characterized by the fact that they admit exactly one invariant probability measure. Initially, in Section 6.1, we give alternative formulations of this property and we analyze the properties of the unique invariant measure.
The relation between unique ergodicity and minimality is another important
theme. A dynamical system is said to be minimal if every orbit is dense in the
ambient space. As we observe in Section 6.2, every uniquely ergodic system is
minimal, restricted to the support of the invariant measure, but the converse is
not true, in general.
The main construction of uniquely ergodic transformations is algebraic in
nature. In Section 6.3 we introduce the notion of the Haar measure of a
topological group and we show that every transitive translation on a compact
metrizable topological group is minimal and even uniquely ergodic: the Haar
measure is the unique invariant probability measure.
In Section 6.4 we present a remarkable application of the idea of unique ergodicity in arithmetic: the theorem of Hermann Weyl on the equidistribution of polynomial sequences.
Throughout this chapter, unless stated otherwise, it is understood that M is
a compact metric space and f : M → M is a continuous transformation.

6.1 Unique ergodicity


We say that a transformation f : M → M is uniquely ergodic if it admits
exactly one invariant probability measure. The corresponding notion for flows
is defined in precisely the same way. This denomination is justified by the
observation that the invariant probability measure μ is necessarily ergodic.
Indeed, suppose there existed some invariant set A ⊂ M with 0 < μ(A) < 1.
Then the normalized restriction of $\mu$ to $A$, defined by
$$
\mu_A(E) = \frac{\mu(E \cap A)}{\mu(A)} \quad \text{for every measurable set } E \subset M,
$$
would be an invariant probability measure, different from $\mu$, which would contradict the assumption that $f$ is uniquely ergodic.
Proposition 6.1.1. The following conditions are equivalent:
(i) $f$ admits a unique invariant probability measure;
(ii) $f$ admits a unique ergodic probability measure;
(iii) for every continuous function $\varphi : M \to \mathbb{R}$, the sequence of time averages $n^{-1} \sum_{j=0}^{n-1} \varphi(f^j(x))$ converges at every point to a constant;
(iv) for every continuous function $\varphi : M \to \mathbb{R}$, the sequence of time averages $n^{-1} \sum_{j=0}^{n-1} \varphi \circ f^j$ converges uniformly to a constant.
Proof. It is easy to see that (ii) implies (i). Indeed, since every invariant probability measure is a convex combination of ergodic measures (Theorem 5.1.3), if there is a unique ergodic probability measure then the invariant probability measure is also unique. It is clear that (iv) implies (iii), since uniform convergence implies pointwise convergence. To see that (iii) implies (ii), suppose that $\mu$ and $\nu$ are ergodic probability measures of $f$. Then, given any continuous function $\varphi : M \to \mathbb{R}$,
$$
\lim_n \frac{1}{n} \sum_{j=0}^{n-1} \varphi(f^j(x)) =
\begin{cases}
\int \varphi \, d\mu & \text{at } \mu\text{-almost every point},\\
\int \varphi \, d\nu & \text{at } \nu\text{-almost every point}.
\end{cases}
$$
Since, by assumption, the limit does not depend on the point $x$, it follows that
$$
\int \varphi \, d\mu = \int \varphi \, d\nu
$$
for every continuous function $\varphi$. Using Proposition A.3.3 we find that $\mu = \nu$.


We are left to prove that (i) implies (iv). Start by recalling that $f$ admits some invariant probability measure $\mu$ (by Theorem 2.1). The idea is to show that if (iv) does not hold then there exists some invariant probability measure $\nu \neq \mu$ and, hence, (i) does not hold either. Suppose then that (iv) does not hold, that is, that there exists some continuous function $\varphi : M \to \mathbb{R}$ such that $n^{-1} \sum_{j=0}^{n-1} \varphi \circ f^j$ does not converge uniformly to any constant; in particular, it does not converge uniformly to $\int \varphi \, d\mu$. By definition, this means that there exists $\varepsilon > 0$ such that for every $k \ge 1$ there exist $n_k \ge k$ and $x_k \in M$ such that
$$
\Big| \frac{1}{n_k} \sum_{j=0}^{n_k - 1} \varphi(f^j(x_k)) - \int \varphi \, d\mu \Big| \ge \varepsilon. \tag{6.1.1}
$$

Let us consider the sequence of probability measures
$$
\nu_k = \frac{1}{n_k} \sum_{j=0}^{n_k - 1} \delta_{f^j(x_k)}.
$$
Since the space $\mathcal{M}_1(M)$ of probability measures on $M$ is compact for the weak* topology (Theorem 2.1.5), up to replacing this sequence by a subsequence,
we may suppose that it converges to some probability measure $\nu$ on $M$. By Lemma 2.2.4, applied to the Dirac measures $\delta_{x_k}$, the probability measure $\nu$ is invariant under $f$. On the other hand, the fact that $(\nu_k)_k$ converges to $\nu$ in the weak* topology implies that
$$
\int \varphi \, d\nu = \lim_k \int \varphi \, d\nu_k = \lim_k \frac{1}{n_k} \sum_{j=0}^{n_k - 1} \varphi(f^j(x_k)).
$$

Then, recalling (6.1.1), we have that
$$
\Big| \int \varphi \, d\nu - \int \varphi \, d\mu \Big| \ge \varepsilon.
$$
In particular, $\nu \neq \mu$. This concludes the argument.
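Proposition 6.1.1 can be contrasted with a map that is not uniquely ergodic. As an illustrative example of our own (not taken from the text), $f(x) = x^2$ on $M = [0, 1]$ preserves both $\delta_0$ and $\delta_1$, so condition (iii) must fail; the Python sketch below shows that the time averages of the continuous function $\varphi(x) = x$ are close to 0 at points $x < 1$ but equal 1 at $x = 1$.

```python
# Illustrative example (our own assumption): f(x) = x^2 on M = [0, 1].
# Both delta_0 and delta_1 are invariant, so f is not uniquely ergodic.

def f(x):
    return x * x

def phi(x):
    """A continuous observable on [0, 1]."""
    return x

def time_average(x, n):
    """(1/n) * sum_{j=0}^{n-1} phi(f^j(x))."""
    total = 0.0
    for _ in range(n):
        total += phi(x)
        x = f(x)
    return total / n

for x0 in (0.5, 0.99, 1.0):
    print(x0, time_average(x0, 10_000))
```

The pointwise limit is 0 on $[0, 1)$ and 1 at $x = 1$, so it is not constant; the two values correspond to the two ergodic measures $\delta_0$ and $\delta_1$.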

6.1.1 Exercises
6.1.1. Give an example of a transformation $f : M \to M$ in a compact metric space such that $(1/n) \sum_{j=0}^{n-1} \varphi \circ f^j$ converges uniformly, for every continuous function $\varphi : M \to \mathbb{R}$, but $f$ is not uniquely ergodic.
6.1.2. Let $f : M \to M$ be a transitive continuous transformation in a compact metric space. Show that if $(1/n) \sum_{j=0}^{n-1} \varphi \circ f^j$ converges uniformly, for every continuous function $\varphi : M \to \mathbb{R}$, then $f$ is uniquely ergodic.
6.1.3. Let $f : M \to M$ be an isometric homeomorphism in a compact metric space $M$. Show that if $\mu$ is an ergodic measure for $f$ then, for every $n \in \mathbb{N}$, the function $\varphi(x) = d(x, f^n(x))$ is constant on the support of $\mu$.

6.2 Minimality
Let $\Lambda \subset M$ be a closed invariant set of $f : M \to M$. We say that $\Lambda$ is minimal if it coincides with the closure of the orbit $\{f^n(x) : n \ge 0\}$ of every point $x \in \Lambda$. We say that the transformation $f$ is minimal if the ambient space $M$ is a minimal set.
Recall that the support of a measure $\mu$ is the set of all points $x \in M$ such that $\mu(V) > 0$ for every neighborhood $V$ of $x$. It follows immediately from the definition that the complement of the support is an open set: if $x \notin \operatorname{supp} \mu$ then there exists an open neighborhood $V$ of $x$ such that $\mu(V) = 0$; then $V$ is contained in the complement of the support. Therefore, $\operatorname{supp} \mu$ is a closed set.
It is also easy to see that the support of any invariant measure is an invariant
set, in the following sense: f (supp μ) ⊂ supp μ. Indeed, let x ∈ supp μ and
let V be any neighborhood of y = f (x). Since f is continuous, f −1 (V) is a
neighborhood of x. Then μ(f −1 (V)) > 0, because x ∈ supp μ. Hence, using
that μ is invariant, μ(V) > 0. This proves that y ∈ supp μ.

Proposition 6.2.1. If $f : M \to M$ is uniquely ergodic then the support of the unique invariant probability measure $\mu$ is a minimal set.

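Proof. Suppose that there exists $x \in \operatorname{supp} \mu$ whose orbit $\{f^j(x) : j \ge 0\}$ is not dense in the support of $\mu$. This means that there exists some open subset $U$ of $M$ such that $U \cap \operatorname{supp} \mu$ is non-empty and
$$
f^j(x) \notin U \cap \operatorname{supp} \mu \quad \text{for every } j \ge 0. \tag{6.2.1}
$$
Let $\nu$ be any accumulation point of the sequence of probability measures
$$
\nu_n = n^{-1} \sum_{j=0}^{n-1} \delta_{f^j(x)}, \quad n \ge 1,
$$
with respect to the weak* topology. Accumulation points do exist, by Theorem 2.1.5, and $\nu$ is an invariant probability measure, by Lemma 2.2.4. Since $\operatorname{supp} \mu$ is forward invariant and contains $x$, the whole orbit of $x$ lies in $\operatorname{supp} \mu$; hence, condition (6.2.1) means that $\nu_n(U) = 0$ for every $n \ge 1$. Then, using Theorem 2.1.2 (see also part 3 of Exercise 2.1.1) we have that $\nu(U) = 0$. Since $f$ is uniquely ergodic, $\nu = \mu$ and so $\mu(U) = 0$. This implies that no point of $U$ is in the support of $\mu$, which contradicts the fact that $U \cap \operatorname{supp} \mu$ is non-empty.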

The converse to Proposition 6.2.1 is false in general:

Theorem 6.2.2 (Furstenberg). There exists some real-analytic diffeomorphism $f : \mathbb{T}^2 \to \mathbb{T}^2$ that is minimal, preserves the Lebesgue measure $m$ on the torus, but is not ergodic for $m$. In particular, $f$ is not uniquely ergodic.
In the remainder of this section we give a brief sketch of the proof of
this result. A detailed presentation may be found in the original paper of
Furstenberg [Fur61], as well as in Mañé [Mañ87]. In Section 7.3.1 we mention
other examples of minimal transformations that are not uniquely ergodic.
To prove Theorem 6.2.2, we look for a transformation $f : \mathbb{T}^2 \to \mathbb{T}^2$ of the form $f(x, y) = (x + \alpha, y + \phi(x))$, where $\alpha$ is an irrational number and $\phi : S^1 \to \mathbb{R}$ is a real-analytic function with $\int \phi(x) \, dx = 0$. Note that $f$ preserves the Lebesgue measure on $\mathbb{T}^2$. Let us also consider the map $f_0 : \mathbb{T}^2 \to \mathbb{T}^2$ given by $f_0(x, y) = (x + \alpha, y)$. Note that no orbit of $f_0$ is dense in $\mathbb{T}^2$ and that the system $(f_0, m)$ is not ergodic.
Let us consider the cohomological equation
$$
u(x + \alpha) - u(x) \equiv \phi(x). \tag{6.2.2}
$$
If φ and α are such that (6.2.2) admits some measurable solution u : S1 → R
then (f0 , m) and (f , m) are ergodically equivalent (see Exercise 6.2.1) and,
consequently, (f , m) is not ergodic. On the other hand, one can show that if
(6.2.2) admits no continuous solution then f is minimal (the converse to this
fact is Exercise 6.2.2). Therefore, it suffices to find φ and α such that the
cohomological equation admits a measurable solution but not a continuous
solution.
It is convenient to express these conditions in terms of the Fourier expansion $\phi(x) = \sum_{n \in \mathbb{Z}} a_n e^{2\pi i n x}$. To ensure that $\phi$ is real-analytic it is enough to
require that:
$$
\text{there exists } \rho < 1 \text{ such that } |a_n| \le \rho^{|n|} \text{ for every } |n| \text{ sufficiently large.} \tag{6.2.3}
$$
Indeed, in that case the series $\sum_{n \in \mathbb{Z}} a_n z^n$ converges uniformly on every annulus $\{z \in \mathbb{C} : r \le |z| \le r^{-1}\}$ with $r > \rho$. In particular, its sum on the unit circle, which coincides with $\phi$, is a real-analytic function. Since we want $\phi$ to take values in the real line and to have zero average, we must also require:
$$
a_0 = 0 \quad \text{and} \quad a_{-n} = \bar a_n \quad \text{for every } n \ge 1. \tag{6.2.4}
$$

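According to Exercise 6.2.3, the cohomological equation admits a solution in the space $L^2(m)$ if and only if
$$
\sum_{n=1}^{\infty} \Big| \frac{a_n}{e^{2\pi i n \alpha} - 1} \Big|^2 < \infty. \tag{6.2.5}
$$
Moreover, the solution is uniquely determined up to an additive constant: $u = \sum_{n \in \mathbb{Z}} b_n e^{2\pi i n x}$ with
$$
b_n = \frac{a_n}{e^{2\pi i n \alpha} - 1} \quad \text{for every } n \neq 0, \tag{6.2.6}
$$
while $b_0$ is arbitrary.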
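Formula (6.2.6) can be tested numerically on truncated Fourier series. The Python sketch below is an illustration under our own assumptions: $\alpha = (\sqrt{5} - 1)/2$ and $a_n = \rho^{|n|}$ for $0 < |n| \le N$ (with $\rho = 1/2$ and $N = 40$, hypothetical choices compatible with (6.2.3) and (6.2.4)), and $b_0 = 0$. For these truncated series the identity $u(x + \alpha) - u(x) = \phi(x)$ holds exactly, so the residual only reflects floating-point rounding; the sketch also prints a partial sum of (6.2.5).

```python
import cmath
import math

# Illustrative choices (assumptions): golden-mean rotation number and
# a_n = RHO^|n| for 0 < |n| <= N, so that a_0 = 0 and a_{-n} = conj(a_n).
ALPHA = (math.sqrt(5) - 1) / 2
RHO, N = 0.5, 40

a = {n: RHO ** abs(n) for n in range(-N, N + 1) if n != 0}
# (6.2.6): b_n = a_n / (e^{2 pi i n alpha} - 1) for n != 0; we take b_0 = 0.
b = {n: a[n] / (cmath.exp(2j * math.pi * n * ALPHA) - 1) for n in a}

def series(coeffs, x):
    """Evaluate the finite Fourier sum  sum_n c_n e^{2 pi i n x}."""
    return sum(c * cmath.exp(2j * math.pi * n * x) for n, c in coeffs.items())

def residual(x):
    """|u(x + alpha) - u(x) - phi(x)| for the truncated series u and phi."""
    return abs(series(b, x + ALPHA) - series(b, x) - series(a, x))

print(max(residual(k / 100) for k in range(100)))
print(sum(abs(b[n]) ** 2 for n in range(1, N + 1)))  # partial sum in (6.2.5)
```

For this $\alpha$ the denominators $|e^{2\pi i n\alpha} - 1| = 2|\sin(\pi n \alpha)|$ decay at most like a constant times $1/n$, while $a_n$ decays geometrically, so the terms in (6.2.5) are summable; compare Exercise 6.2.4.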
Fejér's theorem (see [Zyg68]) states that if $u$ is a continuous function then the sequence of partial sums of its Fourier expansion converges uniformly to $u$ in the sense of Cesàro:
$$
\frac{1}{n} \sum_{k=1}^{n} \Big( \sum_{j=-k}^{k} b_j e^{2\pi i j x} \Big) \ \text{converges uniformly to } u(x). \tag{6.2.7}
$$
Hence, to ensure that $u$ is not continuous it suffices to require (compare (6.2.7) at $x = 0$):
$$
\Big( \sum_{j=-k}^{k} b_j \Big)_k \ \text{is not Cesàro convergent.} \tag{6.2.8}
$$
In this way, the problem is reduced to finding $\alpha$ and $(a_n)_n$ that satisfy (6.2.3), (6.2.4), (6.2.5) and (6.2.8). Exercise 6.2.4 hints at the issues involved in the choice of such objects.

6.2.1 Exercises
6.2.1. Show that if u is a measurable solution of the cohomological equation (6.2.2)
then h : T2 → T2 , h(x, y) = (x, y + u(x)) is an ergodic equivalence between (f0 , m)
and (f , m), that is, h is an invertible measurable transformation that preserves the
measure m and conjugates the two maps f and f0 . Deduce that (f , m) cannot be
ergodic.
6.2.2. Show that if u is a continuous solution of the cohomological equation (6.2.2) then
h : T2 → T2 , h(x, y) = (x, y + u(x)) is a topological conjugacy between f0 and f .
In particular, f cannot be transitive.
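6.2.3. Check that if $u(x) = \sum_{n \in \mathbb{Z}} b_n e^{2\pi i n x}$ is a solution of (6.2.2) then
$$
b_n = \frac{a_n}{e^{2\pi i n \alpha} - 1} \quad \text{for every } n \neq 0. \tag{6.2.9}
$$
Moreover, $u \in L^2(m)$ if and only if $\sum_{n=1}^{\infty} |b_n|^2 < \infty$.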

6.2.4. We say that an irrational number $\alpha$ is Diophantine if there exist $c > 0$ and $\tau > 0$ such that $|q\alpha - p| \ge c|q|^{-\tau}$ for any $p, q \in \mathbb{Z}$ with $q \neq 0$. Show that the condition (6.2.5) is satisfied whenever $\alpha$ is Diophantine and $\phi$ satisfies (6.2.3).
6.2.5. (Theorem of Gottschalk) Let $f : M \to M$ be a continuous map in a compact metric space $M$. Show that the closure of the orbit of a point $x \in M$ is a minimal set if and only if $R_\varepsilon = \{n \in \mathbb{Z} : d(x, f^n(x)) < \varepsilon\}$ is a syndetic set for every $\varepsilon > 0$.
6.2.6. Let $f : M \to M$ be a continuous map in a compact metric space $M$. We say that $x, y \in M$ are close if $\inf_n d(f^n(x), f^n(y)) = 0$. Show that if $x \in M$ is such that the closure of its orbit is a minimal set then, for every neighborhood $U$ of $x$ and every point $y$ close to $x$, there exists an increasing sequence $(n_i)_i$ such that $f^{n_{i_1} + \cdots + n_{i_k}}(x)$ and $f^{n_{i_1} + \cdots + n_{i_k}}(y)$ are in $U$ for any $i_1 < \cdots < i_k$ and $k \ge 1$.
6.2.7. (Theorem of Hindman) A theorem of Auslander and Ellis (see [Fur81,
Theorem 8.7]) states that in the conditions of Exercise 6.2.6 the closure of the
orbit of every y ∈ M contains some point x that is close to y and such that the
closure of its orbit is a minimal set. Deduce the following refinement of the
theorem of van der Waerden: given any decomposition N = S1 ∪ · · · ∪ Sq of the set
of natural numbers into pairwise disjoint sets, there exists j such that Sj contains
a sequence $n_1 < \cdots < n_i < \cdots$ such that $n_{i_1} + \cdots + n_{i_k} \in S_j$ for every $k \ge 1$ and any $i_1 < \cdots < i_k$.

6.3 Haar measure


We are going to see that every compact topological group carries a remarkable
probability measure, called the Haar measure, that is invariant under every
translation and every surjective group endomorphism. Assuming that the group
is metrizable, every transitive translation is uniquely ergodic, with the Haar
measure as the unique invariant probability measure.

6.3.1 Rotations on tori


Fix $d \ge 1$ and a rationally independent vector $\theta = (\theta_1, \dots, \theta_d)$. As we have seen in Section 4.2.1, the rotation $R_\theta : \mathbb{T}^d \to \mathbb{T}^d$ is ergodic with respect to the Lebesgue measure $m$ on the torus. Our goal now is to show that, in fact, $R_\theta$ is uniquely ergodic.
According to Proposition 6.1.1, we only have to show that, given any continuous function $\varphi : \mathbb{T}^d \to \mathbb{R}$, there exists $c_\varphi \in \mathbb{R}$ such that
$$
\varphi_n = \frac{1}{n} \sum_{j=0}^{n-1} \varphi \circ R_\theta^j \ \text{converges to } c_\varphi \text{ at every point.} \tag{6.3.1}
$$
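Take $c_\varphi = \int \varphi \, dm$. By ergodicity, the sequence $(\varphi_n)_n$ of time averages converges to $c_\varphi$ at $m$-almost every point. In particular, since sets of full Lebesgue measure are dense, $\varphi_n(x) \to c_\varphi$ for a dense subset of values of $x \in \mathbb{T}^d$.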
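Let $d$ be the distance induced in the torus $\mathbb{T}^d = \mathbb{R}^d / \mathbb{Z}^d$ by the usual norm in $\mathbb{R}^d$: the distance between any two points in the torus is the minimum of the distances between all their representatives in $\mathbb{R}^d$. It is clear that the rotation $R_\theta$ preserves that distance:
$$
d(R_\theta(x), R_\theta(y)) = d(x, y) \quad \text{for every } x, y \in \mathbb{T}^d.
$$
Then, using that $\varphi$ is uniformly continuous (because $\mathbb{T}^d$ is compact), given any $\varepsilon > 0$ we may find $\delta > 0$ such that
$$
d(x, y) < \delta \ \Rightarrow\ d(R_\theta^j(x), R_\theta^j(y)) < \delta \ \Rightarrow\ |\varphi(R_\theta^j(x)) - \varphi(R_\theta^j(y))| < \varepsilon
$$
for every $j \ge 0$. Then,
$$
d(x, y) < \delta \ \Rightarrow\ |\varphi_n(x) - \varphi_n(y)| < \varepsilon \quad \text{for every } n \ge 1.
$$
Since $\delta$ does not depend on $n$, this proves that the sequence $(\varphi_n)_n$ is equicontinuous.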
This allows us to use the theorem of Ascoli to prove the claim (6.3.1), as follows. Suppose that there exists $\bar x \in \mathbb{T}^d$ such that $(\varphi_n(\bar x))_n$ does not converge to $c_\varphi$. Then there exist $c \neq c_\varphi$ and some subsequence $(n_k)_k$ such that $\varphi_{n_k}(\bar x)$ converges to $c$ when $k \to \infty$. By the theorem of Ascoli (note that $|\varphi_n| \le \sup |\varphi|$ for every $n$), up to restricting to a subsequence we may suppose that $(\varphi_{n_k})_k$ is uniformly convergent. Let $\psi$ be its limit. Then $\psi$ is a continuous function such that $\psi(x) = c_\varphi$ for a dense subset of values of $x \in \mathbb{T}^d$ but $\psi(\bar x) = c$ is different from $c_\varphi$. Such a function cannot exist, since a continuous function that is constant on a dense set is constant. This contradiction proves our claim that $R_\theta$ is uniquely ergodic.
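A quick numerical illustration of (6.3.1): the Python sketch below, with choices of our own ($d = 2$, $\theta = (\sqrt 2, \sqrt 3)$, for which $1, \theta_1, \theta_2$ are rationally independent, and $\varphi(x, y) = \cos(2\pi x)\cos(2\pi y)$, so that $c_\varphi = 0$), evaluates the time averages $\varphi_n$ on a grid of starting points and reports their largest absolute value. For this particular $\varphi$ the maxima should decrease roughly like $1/n$, uniformly in the starting point.

```python
import math

# Illustrative choices (assumptions): d = 2, theta = (sqrt 2, sqrt 3),
# phi(x, y) = cos(2 pi x) cos(2 pi y), whose integral c_phi is 0.
THETA = (math.sqrt(2), math.sqrt(3))

def phi(x, y):
    # phi is 1-periodic in each variable, so no reduction modulo Z^2 is needed
    return math.cos(2 * math.pi * x) * math.cos(2 * math.pi * y)

def phi_n(x, y, n):
    """Time average (1/n) * sum_{j<n} phi(R_theta^j(x, y)) from (6.3.1)."""
    return sum(phi(x + j * THETA[0], y + j * THETA[1]) for j in range(n)) / n

grid = [(i / 8, k / 8) for i in range(8) for k in range(8)]
for n in (10, 100, 1000):
    print(n, max(abs(phi_n(x, y, n)) for x, y in grid))
```

The grid only samples finitely many points, so this is a consistency check rather than a proof; the argument above handles all points at once through equicontinuity.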

6.3.2 Topological groups and Lie groups


Recall that a topological group is a group $(G, \cdot)$ endowed with a topology with respect to which the two operations
$$
G \times G \to G, \ (g, h) \mapsto gh \quad \text{and} \quad G \to G, \ g \mapsto g^{-1} \tag{6.3.2}
$$
are continuous. In all that follows it is assumed that the topology of $G$ is such that every set consisting of a single point is closed. When $G$ is a manifold and the operations in (6.3.2) are differentiable, we say that $(G, \cdot)$ is a Lie group. See Exercise 6.3.1.
The Euclidean space $\mathbb{R}^d$ is a topological group, and even a Lie group, relative to addition $+$, and the same holds for the torus $\mathbb{T}^d$. Recall that $\mathbb{T}^d$ is the quotient of $\mathbb{R}^d$ by its subgroup $\mathbb{Z}^d$. This construction may be generalized as follows:

Example 6.3.1. Given any closed normal subgroup $H$ of a topological group $G$, let $G/H$ be the set of equivalence classes for the equivalence relation defined
in $G$ by $x \sim y \Leftrightarrow x^{-1} y \in H$. Denote by $xH$ the equivalence class that contains each $x \in G$. Consider the following group operation in $G/H$:
$$
xH \cdot yH = (x \cdot y)H.
$$
The hypothesis that $H$ is a normal subgroup ensures that this operation is well defined. Let $\pi : G \to G/H$ be the canonical projection, given by $\pi(x) = xH$. Consider in $G/H$ the quotient topology, defined in the following way: a function $\psi : G/H \to X$ is continuous if and only if $\psi \circ \pi : G \to X$ is continuous. The hypothesis that $H$ is closed ensures that the points are closed subsets of $G/H$. It follows easily from the definitions that $G/H$ is a topological group. Recall also that if the group $G$ is abelian then all subgroups are normal.

Example 6.3.2 (Linear group). The set $G = GL(d, \mathbb{R})$ of invertible real matrices of dimension $d$ is a Lie group for the multiplication of matrices, called the real linear group of dimension $d$. Indeed, $G$ may be identified with an open subset of the Euclidean space $\mathbb{R}^{d^2}$ and, thus, has a natural structure of a differentiable manifold. Moreover, it follows directly from the definitions that the multiplication of matrices and the inversion map $A \mapsto A^{-1}$ are differentiable with respect to this manifold structure. $G$ has many important Lie subgroups, such as the special linear group $SL(d, \mathbb{R})$, consisting of the matrices with determinant 1, and the orthogonal group $O(d, \mathbb{R})$, formed by the orthogonal matrices.

We call left-translation and right-translation associated with an element $g$ of the group $G$, respectively, the maps
$$
L_g : G \to G, \ L_g(h) = gh \quad \text{and} \quad R_g : G \to G, \ R_g(h) = hg.
$$

An endomorphism of $G$ is a continuous map $\phi : G \to G$ that preserves the group operation, that is, such that $\phi(gh) = \phi(g)\phi(h)$ for every $g, h \in G$. When $\phi$ is an invertible endomorphism, that is, a bijection whose inverse is also an endomorphism, we call it an automorphism.

Example 6.3.3. Let $A$ be a $d \times d$ matrix with integer coefficients that is invertible as a real matrix, that is, $\det A \neq 0$. Then, as we have seen in Section 4.2.5, $A$ induces an endomorphism $f_A : \mathbb{T}^d \to \mathbb{T}^d$. It can be shown that every surjective endomorphism of the torus $\mathbb{T}^d$ is of this form.
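The following Python sketch checks, with exact rational arithmetic, that an integer matrix induces a well-defined group endomorphism of the torus. The matrix $A = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}$ and the helper names `mod1`, `f_A` and `add` are our own illustrative choices: the sketch verifies that $f_A$ does not depend on the representative in $\mathbb{R}^2$ and that $f_A(u + v) = f_A(u) + f_A(v)$ in $\mathbb{T}^2 = \mathbb{R}^2 / \mathbb{Z}^2$.

```python
import math
import random
from fractions import Fraction

# Illustrative choice (assumption): the integer matrix A = [[2, 1], [1, 1]], det A = 1.
A = ((2, 1), (1, 1))

def mod1(v):
    """Project a point of R^2 (with rational coordinates) to T^2 = R^2 / Z^2."""
    return tuple(c - math.floor(c) for c in v)

def f_A(v):
    """The map of T^2 induced by x -> A x."""
    return mod1((A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1]))

def add(u, v):
    """Group operation of T^2."""
    return mod1((u[0] + v[0], u[1] + v[1]))

rng = random.Random(0)

def random_point():
    return (Fraction(rng.randrange(1000), 997), Fraction(rng.randrange(1000), 991))

for _ in range(100):
    u, v = random_point(), random_point()
    shift = (u[0] + rng.randrange(-5, 6), u[1] + rng.randrange(-5, 6))  # same point of T^2
    assert f_A(shift) == f_A(u)                      # well defined on the quotient
    assert f_A(add(u, v)) == add(f_A(u), f_A(v))     # preserves the group operation
print("all checks passed")
```

The same checks pass for any integer matrix; the condition $\det A \neq 0$ is what makes $f_A$ surjective.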

A topological group is locally compact if every $g \in G$ has some compact neighborhood. For example, every Lie group is locally compact. On the other hand, the additive group of rational numbers, with the topology inherited from the real line, is not locally compact.
The following theorem is the starting point of the ergodic theory of locally
compact groups:

Theorem 6.3.4 (Haar). Let G be a locally compact topological group. Then:

(i) There exists some Borel measure μG on G that is invariant under all
left-translations, finite on compact sets and positive on open sets;
(ii) If η is a measure invariant under all left-translations and finite on
compact sets then η = cμG for some c > 0.
(iii) μG (G) < ∞ if and only if G is compact.

We are going to sketch the proof of parts (i) and (ii) in the special case
when G is a Lie group. It will be apparent that in this case μG is a volume
measure on G. The proof of part (iii), for any topological group, is proposed in
Exercise 6.3.4.
Starting with part (i), let $e$ be the unit element and $d \ge 1$ be the dimension of the Lie group. Consider any inner product $\cdot$ in the tangent space $T_e G$. For each $g \in G$, represent by $\mathcal{L}_g : T_e G \to T_g G$ the derivative of the left-translation $L_g$ at the point $e$. Next, consider the inner product defined in $T_g G$ in the following way:
$$
u \cdot v = \mathcal{L}_g^{-1}(u) \cdot \mathcal{L}_g^{-1}(v) \quad \text{for every } u, v \in T_g G.
$$

It is clear that this inner product depends differentiably on $g$. Therefore, it defines a Riemannian metric in $G$. It is also clear from the construction that this metric is invariant under left-translations: noting that $\mathcal{L}_{hg} = DL_h(g) \mathcal{L}_g$, we see that
$$
\begin{aligned}
DL_h(g)(u) \cdot DL_h(g)(v) &= \mathcal{L}_{hg}^{-1} DL_h(g)(u) \cdot \mathcal{L}_{hg}^{-1} DL_h(g)(v)\\
&= \mathcal{L}_g^{-1}(u) \cdot \mathcal{L}_g^{-1}(v) = u \cdot v
\end{aligned}
$$
for any $g, h \in G$ and $u, v \in T_g G$. Let $\mu_G$ be the volume measure induced by this Riemannian metric. This measure may be characterized in the following way.
Given any $x = (x_1, \dots, x_d)$ in $G$, expressed in local coordinates, consider
$$
\rho(x) = \left( \det \begin{pmatrix} g_{1,1}(x) & \cdots & g_{1,d}(x) \\ \vdots & \ddots & \vdots \\ g_{d,1}(x) & \cdots & g_{d,d}(x) \end{pmatrix} \right)^{1/2}
\quad \text{where} \quad g_{i,j} = \frac{\partial}{\partial x_i} \cdot \frac{\partial}{\partial x_j}.
$$
Then $\mu_G(B) = \int_B \rho(x) \, dx_1 \cdots dx_d$, for any measurable set $B$ contained in the domain of the local coordinates. Noting that the function $\rho$ is continuous and non-zero, for every local chart, it follows that $\mu_G$ is positive on open sets and finite on compact sets. Moreover, since the Riemannian metric is invariant under left-translations, the measure $\mu_G$ is also invariant under left-translations.
Now we move on to discussing part (ii) of Theorem 6.3.4. Let $\nu$ be any measure as in the statement. Denote by $B(g, r)$ the open ball of center $g$ and radius $r$,
relative to the distance associated with the Riemannian metric. In other words,
B(g, r) is the set of all points in G that may be connected to g by some curve of
length less than r. Fix ρ > 0 such that ν(B(e, ρ)) is finite (such a ρ does exist
166 Unique ergodicity

because G is locally compact and ν is finite on compact sets). We claim that


$$\limsup_{r \to 0} \frac{\nu(B(g,r))}{\mu_G(B(g,r))} \le \frac{\nu(B(e,\rho))}{\mu_G(B(e,\rho))} \tag{6.3.3}$$
for every g ∈ G. This may be seen as follows.
First, the lim sup on the left-hand side of the inequality does not depend on $g$: both measures are invariant under left-translations and, since the Riemannian distance is left-invariant, $L_g$ maps $B(e, r)$ onto $B(g, r)$. Therefore, it is enough to consider the case $g = e$. Let $(r_n)_n$ be any sequence converging to zero and such that

$$\lim_n \frac{\nu(B(e,r_n))}{\mu_G(B(e,r_n))} = \limsup_{r \to 0} \frac{\nu(B(e,r))}{\mu_G(B(e,r))}. \tag{6.3.4}$$

By the Vitali lemma (Theorem A.2.16), we may find (gj )j in B(e, ρ) and (nj )j
in N such that

1. the balls B(gj , rnj ) are contained in B(e, ρ) and they are pairwise disjoint;
2. the union of these balls has full μG -measure in B(e, ρ).

Moreover, given any $a \in \mathbb{R}$ smaller than the limit in (6.3.4), we may suppose that the integers $n_j$ are sufficiently large that $\nu(B(g_j, r_{n_j})) \ge a\,\mu_G(B(g_j, r_{n_j}))$ for every $j$. It follows that

$$\nu(B(e,\rho)) \ge \sum_j \nu(B(g_j, r_{n_j})) \ge \sum_j a\,\mu_G(B(g_j, r_{n_j})) = a\,\mu_G(B(e,\rho)).$$

Since $a$ may be taken arbitrarily close to the limit in (6.3.4), this proves the claim (6.3.3).
Next, we claim that $\nu$ is absolutely continuous with respect to $\mu_G$. Indeed, let $b$ be any number larger than the quotient on the right-hand side of (6.3.3). Given any measurable set $B \subset G$ with $\mu_G(B) = 0$, and given any $\varepsilon > 0$, let $\{B(g_j, r_j) : j\}$ be a cover of $B$ by balls of small radii, such that $\nu(B(g_j, r_j)) \le b\,\mu_G(B(g_j, r_j))$ and $\sum_j \mu_G(B(g_j, r_j)) \le \varepsilon$. Then,

$$\nu(B) \le \sum_j \nu(B(g_j, r_j)) \le b \sum_j \mu_G(B(g_j, r_j)) \le b\varepsilon.$$

Since $\varepsilon > 0$ is arbitrary, it follows that $\nu(B) = 0$. Therefore, $\nu \ll \mu_G$, as claimed.


Now, by the Lebesgue derivation theorem (Theorem A.2.15),

$$\frac{d\nu}{d\mu_G}(g) = \lim_{r \to 0} \frac{1}{\mu_G(B(g,r))} \int_{B(g,r)} \frac{d\nu}{d\mu_G}\,d\mu_G = \lim_{r \to 0} \frac{\nu(B(g,r))}{\mu_G(B(g,r))}$$

for $\mu_G$-almost every $g \in G$. The limit on the right-hand side does not depend on $g$ and, by (6.3.3), it is finite. Let $c \in \mathbb{R}$ be that limit. Then $\nu = c\mu_G$, as stated in part (ii) of Theorem 6.3.4.
In the case when the group G is compact, it follows from Theorem 6.3.4 that there exists a unique probability measure that is invariant under left-translations, positive on open sets and finite on compact sets. This probability measure μG is called the Haar measure of the group. For example,
the normalized Lebesgue measure is the Haar measure on the torus Td . See
also Exercises 6.3.5 and 6.3.6. The Haar measure features some additional
properties:
Corollary 6.3.5. Assume that G is compact. Then the Haar measure μG is
invariant under right-translations and under every surjective endomorphism
of G.

Proof. Given any g ∈ G, consider the probability measure (Rg )∗ μG . Observe


that Lh ◦ Rg = Rg ◦ Lh for every h ∈ G. Hence,
(Lh )∗ (Rg )∗ μG = (Rg )∗ (Lh )∗ μG = (Rg )∗ μG .
In other words, (Rg )∗ μG is invariant under every left-translation. By unique-
ness, it follows that (Rg )∗ μG = μG for every g ∈ G, as claimed.
Given any surjective endomorphism φ : G → G, consider the probability measure φ∗μG. Given any h ∈ G, choose some g ∈ φ^{−1}(h). Observe that L_h ∘ φ = φ ∘ L_g.
Hence,
(Lh )∗ φ∗ μG = φ∗ (Lg )∗ μG = φ∗ μG .
In other words, φ∗ μG is invariant under every left-translation. By uniqueness,
it follows that φ∗ μG = μG , as claimed.

More generally, when we do not assume $G$ to be compact, the argument in the first part of the proof of Corollary 6.3.5 shows that for every $g \in G$ there exists $\lambda(g) > 0$ such that

$$(R_g)_*\mu_G = \lambda(g)\,\mu_G.$$

The map $G \to (0, \infty)$, $g \mapsto \lambda(g)$, is a group homomorphism.
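For a concrete illustration of this remark, consider the affine group of the line, $\{x \mapsto ax + b : a > 0\}$, written as pairs $(a, b)$ with product $(a, b)(c, d) = (ac, ad + b)$ (composition of maps). This group, and the density $1/a^2$ used below for its left-invariant measure, are assumptions of the sketch rather than material from the text; `mul`, `density` and `pullback_ratio` are hypothetical helper names. A diffeomorphism $T$ preserves the measure $\rho(x)\,dx$ exactly when $\rho(T(p))\,|\det DT(p)| = \rho(p)$, so the code estimates that ratio with finite differences: it comes out as 1 for left-translations and as the constant $1/c$ for the right-translation by $(c, d)$. So right-translations rescale this left-invariant measure by a constant factor depending on $g$, which is the phenomenon recorded by the homomorphism $\lambda$.

```python
import numpy as np

# Assumed example: affine group {x -> a*x + b : a > 0}, elements stored as (a, b),
# with product (a, b)*(c, d) = (a*c, a*d + b) (composition of affine maps).
def mul(g, h):
    a, b = g
    c, d = h
    return np.array([a * c, a * d + b])

def density(p):
    """Assumed left Haar density: d(mu)/(da db) = 1 / a**2."""
    return 1.0 / p[0] ** 2

def jacobian_det(T, p, eps=1e-6):
    """|det DT(p)|, estimated by central finite differences."""
    J = np.empty((2, 2))
    for k in range(2):
        step = np.zeros(2)
        step[k] = eps
        J[:, k] = (T(p + step) - T(p - step)) / (2 * eps)
    return abs(np.linalg.det(J))

def pullback_ratio(T, p):
    """rho(T(p)) * |det DT(p)| / rho(p); identically 1 iff T preserves rho(x) dx."""
    return density(T(p)) * jacobian_det(T, p) / density(p)

g = np.array([3.0, 1.0])        # the group element (c, d) = (3, 1)
left = lambda p: mul(g, p)      # left-translation L_g
right = lambda p: mul(p, g)     # right-translation R_g

for p in [np.array([0.5, 2.0]), np.array([4.0, -1.0])]:
    print(pullback_ratio(left, p), pullback_ratio(right, p))  # about 1.0 and about 1/3
```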

6.3.3 Translations on compact metrizable groups


We call a distance d in a topological group G left-invariant if it is invariant
under every left-translation: d(Lh (g1 ), Lh (g2 )) = d(g1 , g2 ) for every g1 , g2 ,
h ∈ G. Analogously, we call a distance right-invariant if it is invariant under
every right-translation. In this section we always take the group G to be com-
pact and metrizable. We start by observing that it is always possible to choose
the distance in G in such a way that it is invariant under all the translations:
Lemma 6.3.6. If G is a compact metrizable topological group then there exists
some distance compatible with the topology of G that is both left-invariant and
right-invariant.

Proof. Let $(U_n)_n$ be a countable basis of neighborhoods of the unit element $e$ of $G$. By Lemma A.3.4, for every $n$ there exists a continuous function $\phi_n : G \to [0, 1]$ such that $\phi_n(e) = 0$ and $\phi_n(z) = 1$ for every $z \in G \setminus U_n$. Define

$$\phi : G \to [0, 1], \qquad \phi(z) = \sum_{n=1}^{\infty} 2^{-n}\phi_n(z).$$
Then, $\phi$ is continuous and $\phi(e) = 0 < \phi(z)$ for every $z \ne e$ (if $z \ne e$ then $z \notin U_n$ for some $n$, so that $\phi(z) \ge 2^{-n}$). Now define

$$d(x, y) = \sup\{|\phi(gxh) - \phi(gyh)| : (g, h) \in G^2\} \tag{6.3.5}$$

for every $x, y \in G$. The supremum is finite, since $\phi$ takes values in $[0, 1]$. It is easy to see that $d$ is a distance in $G$. Indeed, note that $d(x, y) = 0$ means that $\phi(gxh) = \phi(gyh)$ for every $g, h \in G$. In particular, taking $g = e$ and $h = y^{-1}$, we get that $\phi(xy^{-1}) = \phi(e)$. By the construction of $\phi$, this implies that $x = y$. All the other axioms of the notion of distance follow directly from the definition of $d$. It is also clear from the definition that $d$ is invariant under both left-translations and right-translations.
We are left to prove that the distance $d$ is compatible with the topology of the group $G$. First, given any neighborhood $V$ of a point $x \in G$, there exists $\delta > 0$ such that $B(x, \delta) \subset V$. Indeed, $U = x^{-1}V$ is a neighborhood of $e \in G$, so it contains some $U_n$; by the construction of $\phi$, we have $\phi(z) \ge 2^{-n}\phi_n(z) = 2^{-n}$ for every $z \notin U$, and we take $\delta = 2^{-n}$. Then $y \notin V$ implies that $x^{-1}y \notin U$ and, hence, $\phi(x^{-1}y) \ge \delta$ or, in other words, $|\phi(e) - \phi(x^{-1}y)| \ge \delta$. Taking $g = x^{-1}$ and $h = e$ in the definition (6.3.5), we see that this last inequality implies that $d(x, y) \ge \delta$, that is, $y \notin B(x, \delta)$. Now let us check the converse: given $x \in G$ and $\delta > 0$, there exists some neighborhood $V$ of $x$ contained in $B(x, \delta)$. By continuity, for every pair $(g, h) \in G^2$ there exists an open neighborhood $U \times V \times W$ of $(g, x, h)$ in $G^3$ such that

$$|\phi(gxh) - \phi(g'x'h')| < \delta/4 \quad \text{for every } (g', x', h') \in U \times V \times W. \tag{6.3.6}$$

The sets $U \times W$ obtained in this way, with $x$ fixed and $(g, h)$ variable, form an open cover of the compact space $G^2$. Let $U_i \times W_i$, $i = 1, \dots, k$, be a finite subcover, associated with pairs $(g_i, h_i)$, and let $V_i$, $i = 1, \dots, k$, be the corresponding neighborhoods of $x$. Take $V = \bigcap_{i=1}^{k} V_i$ and consider any $y \in V$. Given any $(g, h) \in G^2$, choose $i$ such that $(g, h) \in U_i \times W_i$. Applying (6.3.6), with $(g_i, x, h_i)$ in the place of $(g, x, h)$, to the points $(g, x, h)$ and $(g, y, h)$ of $U_i \times V_i \times W_i$, we get

$$|\phi(gxh) - \phi(gyh)| \le |\phi(gxh) - \phi(g_ixh_i)| + |\phi(g_ixh_i) - \phi(gyh)| < \delta/2.$$

It follows that $d(x, y) \le \delta/2$ and, consequently, $y \in B(x, \delta)$.
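The construction (6.3.5) can be tried out on a finite group, which is compact and metrizable for the discrete topology. The sketch below is a minimal brute-force illustration under stated assumptions: it uses the symmetric group $S_3$ and a hypothetical function ϕ(z) = (number of points moved by z)/3 in place of the series built from Lemma A.3.4 (the argument only needs ϕ with values in [0, 1] and ϕ(e) = 0 < ϕ(z) for z ≠ e). It then checks the distance axioms and invariance under left and right translations.

```python
from itertools import permutations, product

# Elements of S_3 as tuples p with p[i] = image of i; composition (p*q)(i) = p[q[i]].
G = list(permutations(range(3)))
e = (0, 1, 2)

def mul(p, q):
    return tuple(p[q[i]] for i in range(3))

def phi(z):
    """Hypothetical choice: fraction of points moved by z (zero only at the identity)."""
    return sum(z[i] != i for i in range(3)) / 3

def dist(x, y):
    """The distance (6.3.5): sup over (g, h) of |phi(g x h) - phi(g y h)|."""
    return max(abs(phi(mul(mul(g, x), h)) - phi(mul(mul(g, y), h)))
               for g, h in product(G, G))

# Brute-force checks of the metric axioms and of two-sided invariance.
for x, y, z in product(G, G, G):
    assert (dist(x, y) == 0) == (x == y)
    assert dist(x, y) == dist(y, x)
    assert dist(x, z) <= dist(x, y) + dist(y, z) + 1e-12
for x, y, g in product(G, G, G):
    assert abs(dist(mul(g, x), mul(g, y)) - dist(x, y)) < 1e-12
    assert abs(dist(mul(x, g), mul(y, g)) - dist(x, y)) < 1e-12
print("d is a bi-invariant distance on S_3")
```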

Example 6.3.7. Given a matrix $A \in GL(d, \mathbb{R})$, denote by $\|A\|$ its operator norm, that is, $\|A\| = \sup\{\|Av\| : \|v\| = 1\}$. Observe that $\|OA\| = \|A\| = \|AO\|$ for every $O$ in the orthogonal group $O(d, \mathbb{R})$. Define

$$d(A, B) = \log\big(1 + \|A^{-1}B - \mathrm{id}\| + \|B^{-1}A - \mathrm{id}\|\big).$$

Then $d$ is a distance in $GL(d, \mathbb{R})$, invariant under left-translations:

$$d(CA, CB) = \log\big(1 + \|A^{-1}C^{-1}CB - \mathrm{id}\| + \|B^{-1}C^{-1}CA - \mathrm{id}\|\big) = d(A, B)$$

for every $C \in GL(d, \mathbb{R})$. This distance is not invariant under right-translations in $GL(d, \mathbb{R})$ (Exercise 6.3.3). However, it is right-invariant (and left-invariant) restricted to the orthogonal group $O(d, \mathbb{R})$: for every $O \in O(d, \mathbb{R})$,

$$\begin{aligned} d(AO, BO) &= \log\big(1 + \|O^{-1}A^{-1}BO - \mathrm{id}\| + \|O^{-1}B^{-1}AO - \mathrm{id}\|\big) \\ &= \log\big(1 + \|O^{-1}(A^{-1}B - \mathrm{id})O\| + \|O^{-1}(B^{-1}A - \mathrm{id})O\|\big) \\ &= d(A, B). \end{aligned}$$
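A quick numerical sanity check of the two invariance properties in Example 6.3.7, written as a sketch with NumPy: `np.linalg.norm(M, 2)` returns the largest singular value, which is the operator norm for the Euclidean norm on $\mathbb{R}^d$. Random matrices stand in for $A$, $B$, $C$, and a random orthogonal $O$ is taken from a QR factorization; the seed and the dimension are arbitrary choices, and `dist` is a hypothetical name.

```python
import numpy as np

def op_norm(M):
    """Operator norm sup{|Mv| : |v| = 1}, i.e. the largest singular value of M."""
    return np.linalg.norm(M, 2)

def dist(A, B):
    """d(A, B) = log(1 + ||A^{-1}B - id|| + ||B^{-1}A - id||), as in Example 6.3.7."""
    I = np.eye(A.shape[0])
    return np.log(1 + op_norm(np.linalg.solve(A, B) - I) + op_norm(np.linalg.solve(B, A) - I))

rng = np.random.default_rng(0)
d = 3
A, B, C = (rng.standard_normal((d, d)) for _ in range(3))   # invertible with probability 1
O, _ = np.linalg.qr(rng.standard_normal((d, d)))           # an orthogonal matrix

print(dist(A, B))
print(dist(C @ A, C @ B))   # left-translation: same value up to rounding
print(dist(A @ O, B @ O))   # right-translation by an orthogonal matrix: same value
print(dist(A @ C, B @ C))   # right-translation by a general matrix: typically different
```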

Theorem 6.3.8. Let G be a compact metrizable topological group and let


g ∈ G. The following conditions are equivalent:

(i) Lg is uniquely ergodic;


(ii) Lg is ergodic with respect to μG ;
(iii) the subgroup {gn : n ∈ Z} generated by g is dense in G.

Proof. It is clear that (i) implies (ii). To prove that (ii) implies (iii), consider the
invariant distance d given by Lemma 6.3.6. Let H be the closure of {gn : n ∈ Z}
and consider the continuous function

ϕ(x) = min{d(x, y) : y ∈ H}.

Observe that this function is invariant under Lg : using that gH = H, we get that

ϕ(x) = min{d(x, y) : y ∈ H} = min{d(gx, gy) : y ∈ H}


= min{d(gx, z) : z ∈ H} = ϕ(gx) for every x ∈ G.
Since $H$ is closed, $\phi(x) = 0$ if and only if $x \in H$. If $H \ne G$ then $\mu_G(G \setminus H) > 0$, as the Haar measure is positive on open sets. In that case, the function $\phi$ is not constant $\mu_G$-almost everywhere, which implies that $L_g$ cannot be ergodic with respect to $\mu_G$.
Finally, to prove that (iii) implies (i), let us show that if $\mu$ is a probability measure invariant under $L_g$ then $\mu = \mu_G$. For that, it suffices to check that $\mu$ is invariant under every left-translation in $G$. Fix $h \in G$. Since $\mu$ is invariant under $L_g$, and $L_g$ is invertible,

$$\int \phi(x)\,d\mu(x) = \int \phi(g^n x)\,d\mu(x)$$

for every $n \in \mathbb{Z}$ and every continuous function $\phi : G \to \mathbb{R}$. On the other hand, the hypothesis ensures that there exists a sequence of integers $n_j$ such that $g^{n_j} \to h$. Given any (uniformly) continuous function $\phi : G \to \mathbb{R}$ and any $\varepsilon > 0$, fix $\delta > 0$ such that $|\phi(x) - \phi(y)| < \varepsilon$ whenever $d(x, y) < \delta$. If $j$ is sufficiently large then, by right-invariance of $d$,

$$d(g^{n_j}x, hx) = d(g^{n_j}, h) < \delta \quad \text{for every } x \in G.$$

Hence, $|\phi(g^{n_j}x) - \phi(hx)| < \varepsilon$ for every $x$ and, consequently,

$$\Big|\int \big(\phi(x) - \phi(hx)\big)\,d\mu\Big| = \Big|\int \big(\phi(g^{n_j}x) - \phi(hx)\big)\,d\mu\Big| < \varepsilon.$$

Since $\varepsilon$ is arbitrary, it follows that $\int \phi\,d\mu = \int \phi \circ L_h\,d\mu$ for every continuous function $\phi$ and every $h \in G$. This implies that $\mu$ is invariant under $L_h$ for every $h \in G$, as claimed.
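On the circle $\mathbb{R}/\mathbb{Z}$ (written additively), Theorem 6.3.8 says that the rotation $x \mapsto x + g$ is uniquely ergodic exactly when $\{ng : n \in \mathbb{Z}\}$ is dense, that is, when $g$ is irrational. A minimal sketch, assuming the test function ϕ(x) = cos(8πx) (an arbitrary choice, with ∫ϕ dm = 0): for $g = \sqrt2 - 1$ the time averages from different starting points all approach 0, while for $g = 1/4$, which generates a finite, non-dense subgroup, the time averages depend on the starting point, so that $L_g$ has more than one invariant probability measure.

```python
import math

def birkhoff_average(g, x, n, phi):
    """(1/n) * sum_{j<n} phi(x + j*g mod 1): time average along the orbit of x under L_g."""
    return sum(phi((x + j * g) % 1.0) for j in range(n)) / n

phi = lambda x: math.cos(8 * math.pi * x)   # test function with integral 0

irrational = math.sqrt(2) - 1
for x in (0.0, 0.125, 0.3):
    print(round(birkhoff_average(irrational, x, 10_000, phi), 4))   # all close to 0

for x in (0.0, 0.125):
    print(round(birkhoff_average(0.25, x, 10_000, phi), 4))         # 1.0 and -1.0
```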

6.3.4 Odometers
Odometers, or adding machines, are mathematical models for the mechanisms
that register the distance (number of kilometers) travelled by a car or the
amount of electricity (number of energy units) consumed in a house. They
come with a dynamics, which consists of advancing the counter by one unit at each step. The main difference with respect to real-life odometers is that our
idealized counters allow for an infinite number of digits.
Fix any base $d \ge 2$, for example $d = 10$, and consider the set $X = \{0, 1, \dots, d - 1\}$, endowed with the discrete topology. Let $M = X^{\mathbb{N}}$ be the set of all sequences $\alpha = (\alpha_n)_n$ with values in $X$, endowed with the product topology. This topology is metrizable: it is compatible, for instance, with the distance defined in $M$ by

$$d(\alpha, \alpha') = 2^{-N(\alpha, \alpha')} \quad\text{where}\quad N(\alpha, \alpha') = \min\{j \ge 0 : \alpha_j \ne \alpha'_j\}. \tag{6.3.7}$$
Observe also that M is compact, being the product of compact spaces (theorem
of Tychonoff).
Let us introduce in M the following operation of “sum with carry”: given $\alpha = (\alpha_n)_n$ and $\beta = (\beta_n)_n$ in M, define $\alpha + \beta = (\gamma_n)_n$ as follows. First,
• if $\alpha_0 + \beta_0 < d$ then $\gamma_0 = \alpha_0 + \beta_0$ and $\delta_1 = 0$;
• if $\alpha_0 + \beta_0 \ge d$ then $\gamma_0 = \alpha_0 + \beta_0 - d$ and $\delta_1 = 1$.

Next, for every $n \ge 1$,
• if $\alpha_n + \beta_n + \delta_n < d$ then $\gamma_n = \alpha_n + \beta_n + \delta_n$ and $\delta_{n+1} = 0$;
• if $\alpha_n + \beta_n + \delta_n \ge d$ then $\gamma_n = \alpha_n + \beta_n + \delta_n - d$ and $\delta_{n+1} = 1$.

The auxiliary sequence $(\delta_n)_n$ records precisely the carries. The map $+ : M \times M \to M$ defined in this way turns M into an abelian topological group and the distance (6.3.7) is invariant under all the translations (Exercise 6.3.8).
Now consider the “translation by 1” $f : M \to M$ defined by

$$f\big((\alpha_n)_n\big) = (\alpha_n)_n + (1, 0, \dots, 0, \dots) = (0, \dots, 0, \alpha_k + 1, \alpha_{k+1}, \dots, \alpha_n, \dots)$$

where $k \ge 0$ is the smallest value of $n$ such that $\alpha_n < d - 1$; if there exists no such $k$, that is, if $(\alpha_n)_n$ is the constant sequence equal to $d - 1$, then the image $f((\alpha_n)_n)$ is the constant sequence equal to 0. We leave it to the reader to check that this transformation $f : M \to M$ is uniquely ergodic (Exercise 6.3.9).
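The addition defined above only propagates the auxiliary digits $\delta_n$ towards higher indices, so the first $N$ digits of $\alpha + \beta$ depend only on the first $N$ digits of $\alpha$ and $\beta$. The following sketch exploits this, assuming elements of M are truncated to their first $N$ digits (listed from index 0 upwards, as in the definition); the function names are hypothetical. Starting from the zero sequence, the orbit visits the cylinder $[b_0, \dots, b_{k-1}]$ with frequency exactly $d^{-k}$ along the first $d^N$ iterates, in line with Exercise 6.3.9.

```python
def add_with_carry(alpha, beta, d):
    """First N digits of alpha + beta in M (digits listed from index 0 upwards)."""
    gamma, carry = [], 0
    for a, b in zip(alpha, beta):
        s = a + b + carry
        gamma.append(s % d)            # gamma_n
        carry = 1 if s >= d else 0     # delta_{n+1}
    return gamma                       # the carry out of the last digit is dropped

def odometer(alpha, d):
    """Translation by 1: alpha + (1, 0, 0, ...), truncated to len(alpha) digits."""
    return add_with_carry(alpha, [1] + [0] * (len(alpha) - 1), d)

def cylinder_frequency(x, word, n, d):
    """Fraction of 0 <= j < n with f^j(x) in the cylinder [b_0, ..., b_{k-1}]."""
    hits, k = 0, len(word)
    for _ in range(n):
        hits += x[:k] == list(word)
        x = odometer(x, d)
    return hits / n

d, N = 10, 4
print(add_with_carry([9, 9, 3, 0], [1, 0, 0, 0], d))   # [0, 0, 4, 0]
print(cylinder_frequency([0] * N, (7, 2), d ** N, d))   # 0.01
```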
It is possible to generalize this construction somewhat, in the following way. Take $M = \prod_{n=0}^{\infty} \{0, 1, \dots, d_n - 1\}$, where $(d_n)_n$ is any sequence of integers larger than 1. Just as in the previous particular case, this set has the structure of a metrizable compact abelian group and the “translation by 1” is uniquely ergodic.

[Figure 6.1. Example of the piling method. Left: a pile with levels $I_0, \dots, I_{k-1}$. Right: the piling procedure for $d = 3$.]

Example 6.3.9. A (simple) pile in an interval¹ $I$ is an ordered family $S$ of pairwise disjoint subintervals $I_0, \dots, I_{k-1}$ with the same length and whose union is $I$. Write $I_k = I_0$. We associate with $S$ the transformation $f : I \to I$ whose restriction to each $I_j$ is the translation mapping $I_j$ to $I_{j+1}$. Graphically, we represent the subintervals “piled up” on top of each other in order: from the bottom $I_0$ to the top $I_{k-1}$. Then $f$ is nothing but the translation “upwards”, except at the top of the pile. See the left-hand side of Figure 6.1.
Let us consider a sequence (Sn )n of piles in the same interval I, constructed
as follows. Fix any integer number d ≥ 2. Take S0 = {I}. For each n ≥ 1, take
as Sn the pile obtained by dividing Sn−1 into d columns, all with the same
width, and piling them up on top of each other. This procedure is described on
the right-hand side of Figure 6.1 for d = 3. Let fn : I → I be the transformation
associated with each Sn . We leave it to the reader to show (Exercise 6.3.10) that
the sequence (fn )n converges at every point to a transformation f : I → I that
preserves the Lebesgue measure. Moreover, this transformation f is uniquely
ergodic.
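A minimal sketch of the construction in Example 6.3.9, assuming $I = [0, 1)$ and that, when $S_{n-1}$ is cut into $d$ columns, the columns are stacked from left (bottom) to right (top), each keeping the order of the pile; the text does not fix these conventions, so they are assumptions here, and `pile` and `f_n` are hypothetical names. Exact `Fraction` arithmetic avoids rounding at the endpoints of the subintervals. In this sketch the value $f_n(x)$ stops changing once $n$ is large enough (for $x = 3/4$ and $d = 2$ it equals $1/8$ from $n = 3$ on), and the limit acts on the base-$d$ digits of $x$ like the odometer of Section 6.3.4.

```python
from fractions import Fraction

def pile(n, d):
    """Left endpoints (bottom to top) and common width of the pile S_n in I = [0, 1)."""
    levels, width = [Fraction(0)], Fraction(1)
    for _ in range(n):
        width /= d
        # Cut each level into d equal slices; column c is made of the c-th slices,
        # kept in pile order, and the columns are stacked from left (bottom) to right (top).
        levels = [left + c * width for c in range(d) for left in levels]
    return levels, width

def f_n(x, n, d):
    """The map associated with S_n: move x one level up; the top level goes to the bottom."""
    levels, width = pile(n, d)
    for i, left in enumerate(levels):
        if left <= x < left + width:
            return levels[(i + 1) % len(levels)] + (x - left)
    raise ValueError("x must lie in [0, 1)")

x = Fraction(3, 4)
print([str(f_n(x, n, 2)) for n in range(1, 6)])   # ['1/4', '0', '1/8', '1/8', '1/8']
```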

This is only one of the simplest applications of the so-called piling method,
which is a very effective tool to produce examples with interesting properties.
The reader may find a detailed discussion of this method in Section 6 of
Friedman [Fri69]. Another application, a bit more elaborate, will be given in
Example 8.2.3.

Example 6.3.10 (Substitutions). We are going to mention briefly a construction of a combinatorial nature that generalizes the definition of odometer and provides several other interesting examples of minimal and even uniquely
ergodic systems. For more information, including about the relations between such systems and the odometer, we recommend the book of Queffélec [Que87] and the paper of Ferenczi, Fisher and Talet [FFT09].

¹ For definiteness, take all intervals to be closed on the left and open on the right.
We call a substitution in a finite alphabet A any map associating with each
letter α ∈ A a word s(α) formed by a finite number of letters of A. A few
examples, for A = {0, 1}: Thue–Morse substitution s(0) = 01 and s(1) = 10;
Fibonacci substitution s(0) = 01 and s(1) = 0; Feigenbaum substitution s(0) =
11 and s(1) = 10; Cantor substitution s(0) = 010 and s(1) = 111; and Chacon
substitution s(0) = 0010 and s(1) = 1. We may iterate a substitution by defining $s^1(\alpha) = s(\alpha)$ and

$$s^{k+1}(\alpha) = s(\alpha_1) \cdots s(\alpha_n) \quad\text{if}\quad s^k(\alpha) = \alpha_1 \cdots \alpha_n.$$

We call a substitution $s$ primitive (or aperiodic) if there exists $k \ge 1$ such that for any $\alpha, \beta \in A$ the word $s^k(\alpha)$ contains the letter $\beta$.
Let $A$ be endowed with the discrete topology and $\Sigma = A^{\mathbb{N}}$ be the space of all sequences in $A$, endowed with the product topology. Denote by $S : \Sigma \to \Sigma$ the map induced in that space by a given substitution $s$: the image of each $(a_0, \dots, a_n, \dots) \in \Sigma$ is the sequence of the letters that constitute the word obtained when one concatenates the finite words $s(a_0), \dots, s(a_n), \dots$ Suppose that there exists some letter $\alpha_0 \in A$ such that the word $s(\alpha_0)$ has length larger than 1 and starts with the letter $\alpha_0$. That is the case for all the examples listed above (with $\alpha_0 = 1$ for the Feigenbaum substitution and $\alpha_0 = 0$ for the others). Then (Exercise 6.3.11), $S$ admits a unique fixed point $x = (x_n)_n$ with $x_0 = \alpha_0$.
Consider the restriction $\sigma : X \to X$ of the shift map $\sigma : \Sigma \to \Sigma$ to the closure $X \subset \Sigma$ of the orbit $\{\sigma^n(x) : n \ge 0\}$ of the point $x$. If the substitution $s$ is primitive then $\sigma : X \to X$ is minimal and uniquely ergodic (see Section 5 in [Que87]). That holds, for instance, for the Thue–Morse, Fibonacci and Feigenbaum substitutions.
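The following sketch iterates a substitution on strings. Because $s(\alpha_0)$ starts with $\alpha_0$, each word $s^k(\alpha_0)$ is a prefix of $s^{k+1}(\alpha_0)$, so these words are the initial segments of the fixed point $x$ of $S$ with $x_0 = \alpha_0$. The letter frequency printed at the end, close to $(\sqrt5 - 1)/2 \approx 0.618$ for the Fibonacci substitution, is a known property of the Fibonacci word, quoted here as background rather than taken from the text; `iterate_substitution` is a hypothetical name.

```python
def iterate_substitution(s, letter, k):
    """Return s^k(letter), where s maps each letter to a word (a string)."""
    word = letter
    for _ in range(k):
        word = "".join(s[a] for a in word)
    return word

thue_morse = {"0": "01", "1": "10"}
fibonacci = {"0": "01", "1": "0"}

print(iterate_substitution(thue_morse, "0", 3))   # 01101001
print(iterate_substitution(fibonacci, "0", 4))    # 01001010

# Since s(0) starts with 0, each s^k(0) is a prefix of s^(k+1)(0).
w = iterate_substitution(fibonacci, "0", 20)
print(w.count("0") / len(w))   # close to (sqrt(5) - 1) / 2 = 0.618...
```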

6.3.5 Exercises
6.3.1. Let G be a manifold and · be a group operation in G such that the map (g, h) →
g · h is of class C1 . Show that g → g−1 is also of class C1 .
6.3.2. Let G be a compact topological space such that every point admits a countable
basis of neighborhoods and let · be a group operation in G such that the map
(g, h) → g · h is continuous. Show that g → g−1 is also continuous.
6.3.3. Show that the distance d in Example 6.3.7 is not right-invariant.
6.3.4. Prove part (iii) of Theorem 6.3.4: a locally compact group G is compact if and
only if its Haar measure is finite.
6.3.5. Identify $GL(1, \mathbb{R})$ with the multiplicative group $\mathbb{R} \setminus \{0\}$. Check that the measure $\mu$ defined on $GL(1, \mathbb{R})$ by
$$\int_{GL(1,\mathbb{R})} \phi\,d\mu = \int_{\mathbb{R} \setminus \{0\}} \frac{\phi(x)}{|x|}\,dx$$
is both left-invariant and right-invariant. Find a measure invariant under all the translations of $GL(1, \mathbb{C})$.
6.4 Theorem of Weyl 173

6.3.6. Identify $GL(2, \mathbb{R})$ with $\{(a_{11}, a_{12}, a_{21}, a_{22}) \in \mathbb{R}^4 : a_{11}a_{22} - a_{12}a_{21} \ne 0\}$, in such a way that $\det(a_{11}, a_{12}, a_{21}, a_{22}) = a_{11}a_{22} - a_{12}a_{21}$. Show that the measure $\mu$ defined by
$$\int_{GL(2,\mathbb{R})} \phi\,d\mu = \int \frac{\phi(x_{11}, x_{12}, x_{21}, x_{22})}{|\det(x_{11}, x_{12}, x_{21}, x_{22})|^2}\,dx_{11}\,dx_{12}\,dx_{21}\,dx_{22}$$
is both left-invariant and right-invariant. Find a measure invariant under all the translations of $GL(2, \mathbb{C})$.
6.3.7. Let G be a compact metrizable group and let g ∈ G. Check that the following
conditions are equivalent:
(1) Lg is uniquely ergodic;
(2) Lg is transitive: there is x ∈ G such that {gn x : n ∈ Z} is dense in G;
(3) Lg is minimal: {gn y : n ∈ Z} is dense in G for every y ∈ G.
6.3.8. Show that the operation + : M × M → M defined in Section 6.3.4 is continuous
and endows M with the structure of an abelian group. Moreover, every
translation in this group preserves the distance defined in (6.3.7).
6.3.9. Let $f : M \to M$ be an odometer, as defined in Section 6.3.4, with $d = 10$. Given $b_0, \dots, b_{k-1}$ in $\{0, \dots, 9\}$, denote by $[b_0, \dots, b_{k-1}]$ the set of all sequences $\beta \in M$ with $\beta_0 = b_0, \dots, \beta_{k-1} = b_{k-1}$. Show that
$$\lim_n \frac{1}{n}\,\#\{0 \le j < n : f^j(x) \in [b_0, \dots, b_{k-1}]\} = \frac{1}{10^k}$$
for every $x \in M$. Moreover, this limit is uniform. Conclude that $f$ admits a unique invariant probability measure and calculate that measure explicitly.
6.3.10. Check the claims in Example 6.3.9.
6.3.11. Prove that if $s$ is a substitution in a finite alphabet $A$ and $\alpha \in A$ is such that $s(\alpha)$ has length larger than 1 and starts with the letter $\alpha$, then the transformation $S : \Sigma \to \Sigma$ defined in Example 6.3.10 admits a unique fixed point that starts with the letter $\alpha \in A$.

6.4 Theorem of Weyl


In this section we use ideas that were discussed previously to prove a beautiful
theorem of Hermann Weyl [Wey16] about the distribution of polynomial
sequences.
Consider any polynomial function P : R → R with real coefficients and
degree d ≥ 1:
P(x) = a0 + a1 x + a2 x2 + · · · + ad xd .
Composing $P$ with the canonical projection $\mathbb{R} \to S^1$, we obtain a polynomial function $P_* : \mathbb{R} \to S^1$ with values in the circle $S^1 = \mathbb{R}/\mathbb{Z}$. Define
$$z_n = P_*(n), \quad \text{for every } n \ge 1.$$
We may think of $z_n$ as the fractional part of the real number $P(n)$. We want to understand how the sequence $(z_n)_n$ is distributed on the circle.
Definition 6.4.1. We say that a sequence $(x_n)_n$ in $S^1$ is equidistributed if, for any continuous function $\phi : S^1 \to \mathbb{R}$,

$$\lim_{n \to \infty} \frac{1}{n} \sum_{j=1}^{n} \phi(x_j) = \int \phi(x)\,dx.$$

According to Exercise 6.4.1, this is equivalent to saying that, for every segment $I \subset S^1$, the asymptotic fraction of terms of the sequence that lie in $I$ is equal to the length $m(I)$.

Theorem 6.4.2 (Weyl). If at least one of the coefficients a1 , a2 , . . . , ad is


irrational then the sequence zn = P∗ (n), n ∈ N is equidistributed.
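A numerical illustration of Theorem 6.4.2 for $P(x) = \sqrt2\,x^2$ (an arbitrary choice), offered as a sketch: the fractional parts of $n^2\sqrt2$ are computed exactly to 40 binary digits with integer square roots (`math.isqrt`, available from Python 3.8), and the proportion of terms in a few segments is compared with their lengths. For contrast, $P(x) = x^2/3$ has only rational coefficients, and its values stay in a finite set. The sample size and the segments are arbitrary, and a finite sample only approximates the limit in the theorem.

```python
from math import isqrt

def frac_sqrt2_times(m, bits=40):
    """Fractional part of m*sqrt(2), truncated to `bits` binary digits (exact integer arithmetic)."""
    scaled = isqrt(2 * m * m << (2 * bits))    # = floor(m * sqrt(2) * 2**bits)
    return (scaled % (1 << bits)) / (1 << bits)

def fraction_in(points, a, b):
    """Fraction of the points lying in the segment [a, b) of S^1 = [0, 1)."""
    return sum(a <= p < b for p in points) / len(points)

N = 20_000
z = [frac_sqrt2_times(n * n) for n in range(1, N + 1)]   # z_n = P_*(n) for P(x) = sqrt(2) x^2
for a, b in [(0.0, 0.25), (0.25, 0.5), (0.1, 0.7)]:
    print((a, b), round(fraction_in(z, a, b), 3), "vs length", b - a)

w = [(n * n / 3) % 1 for n in range(1, N + 1)]   # all coefficients rational
print(sorted(set(round(v, 6) for v in w)))        # [0.0, 0.333333]
```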

In order to develop some intuition about this theorem, let us start by considering the special case $d = 1$. In this case the polynomial function reduces to $P(x) = a_0 + a_1x$. Let us consider the transformation

$$f : S^1 \to S^1, \quad f(\theta) = \theta + a_1.$$

By assumption, the coefficient $a_1$ is irrational. Therefore, as we have seen in Section 6.3.1, this transformation admits a unique invariant probability measure, which is the Lebesgue measure $m$. Consequently, given any continuous function $\phi : S^1 \to \mathbb{R}$ and any point $\theta \in S^1$,

$$\lim_{n \to \infty} \frac{1}{n} \sum_{j=1}^{n} \phi(f^j(\theta)) = \int \phi\,dm.$$

Take $\theta = a_0$. Then $f^j(\theta) = a_0 + a_1j = z_j$. Hence, the previous relation yields

$$\lim_{n \to \infty} \frac{1}{n} \sum_{j=1}^{n} \phi(z_j) = \int \phi\,dm.$$

This is precisely what it means to say that $(z_j)_j$ is equidistributed.

6.4.1 Ergodicity
Now we extend the previous arguments to any degree d ≥ 1. Consider the
transformation f : Td → Td defined on the d-dimensional torus Td by the
following expression:

f (θ1 , θ2 , . . . , θd ) = (θ1 + α, θ2 + θ1 , . . . , θd + θd−1 ), (6.4.1)

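where $\alpha$ is an irrational number to be chosen later. Note that $f$ is invertible: the inverse is given by

$$f^{-1}(\theta_1, \theta_2, \dots, \theta_d) = \big(\theta_1 - \alpha,\ \theta_2 - \theta_1 + \alpha,\ \dots,\ \theta_d - \theta_{d-1} + \cdots + (-1)^{d-1}\theta_1 + (-1)^d\alpha\big).$$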
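The inverse formula is easy to get wrong by a sign, so here is a short exact check, a sketch rather than part of the argument: coordinates are `Fraction`s reduced mod 1, and a rational $\alpha$ is used only so that the arithmetic is exact (the identity $f^{-1} \circ f = \mathrm{id}$ is purely algebraic and does not use the irrationality of $\alpha$). Indices are 0-based in the code, so `theta[0]` stands for $\theta_1$; the function names are hypothetical.

```python
from fractions import Fraction

def f(theta, alpha):
    """The map (6.4.1) on the torus, each coordinate reduced mod 1."""
    new = [theta[0] + alpha] + [theta[i] + theta[i - 1] for i in range(1, len(theta))]
    return tuple(t % 1 for t in new)

def f_inverse(theta, alpha):
    """Coordinate k (0-based): theta[k] - theta[k-1] + ... + (-1)^k theta[0] + (-1)^(k+1) alpha."""
    out = []
    for k in range(len(theta)):
        s = sum((-1) ** (k - i) * theta[i] for i in range(k + 1)) + (-1) ** (k + 1) * alpha
        out.append(s % 1)
    return tuple(out)

alpha = Fraction(7, 19)   # rational only so that the arithmetic is exact
theta = (Fraction(1, 3), Fraction(2, 5), Fraction(5, 7), Fraction(1, 11))
print(f_inverse(f(theta, alpha), alpha) == theta)   # True
print(f(f_inverse(theta, alpha), alpha) == theta)   # True
```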


Note also that the derivative of $f$ at each point is given by the matrix

$$\begin{pmatrix} 1 & 0 & 0 & \cdots & 0 & 0 \\ 1 & 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 1 & 1 \end{pmatrix},$$

whose determinant is 1. This ensures that $f$ preserves the Lebesgue measure on the torus (recall Lemma 1.3.5).
Proposition 6.4.3. The Lebesgue measure on Td is ergodic for f .

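Proof. We are going to use a variation of the Fourier series expansion argument in Proposition 4.2.1. Let $\phi : \mathbb{T}^d \to \mathbb{R}$ be any function in $L^2(m)$. Write

$$\phi(\theta) = \sum_{n \in \mathbb{Z}^d} a_n e^{2\pi i\, n \cdot \theta}$$

with $\theta = (\theta_1, \dots, \theta_d)$, $n = (n_1, \dots, n_d)$ and $n \cdot \theta = n_1\theta_1 + \cdots + n_d\theta_d$. The square of the $L^2$-norm of $\phi$ is given by

$$\sum_{n \in \mathbb{Z}^d} |a_n|^2 = \int |\phi(\theta)|^2\,d\theta_1 \cdots d\theta_d < \infty. \tag{6.4.2}$$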

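Observe that

$$\phi(f(\theta)) = \sum_{n \in \mathbb{Z}^d} a_n e^{2\pi i(n_1(\theta_1 + \alpha) + n_2(\theta_2 + \theta_1) + \cdots + n_d(\theta_d + \theta_{d-1}))} = \sum_{n \in \mathbb{Z}^d} a_n e^{2\pi i n_1\alpha} e^{2\pi i L(n) \cdot \theta},$$

where $L(n) = (n_1 + n_2, n_2 + n_3, \dots, n_{d-1} + n_d, n_d)$. Suppose that the function $\phi$ is invariant, that is, $\phi \circ f = \phi$ at almost every point. Then

$$a_n e^{2\pi i n_1\alpha} = a_{L(n)} \quad \text{for every } n \in \mathbb{Z}^d. \tag{6.4.3}$$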
This implies that $a_n$ and $a_{L(n)}$ have the same absolute value. On the other hand, the integrability relation (6.4.2) implies that there exists at most a finite number of terms with any given absolute value different from zero. It follows that $a_n = 0$ for every $n \in \mathbb{Z}^d$ whose orbit $L^j(n)$, $j \in \mathbb{Z}$, is infinite. Observing the expression of $L$, we deduce that $a_n = 0$ except, possibly, if $n_2 = \cdots = n_d = 0$: indeed, the last coordinate of $L^j(n)$ is always $n_d$, and if $n_d \ne 0$ then the coordinate of index $d - 1$ equals $n_{d-1} + jn_d$, which is unbounded; if $n_d = 0$ the same argument applies to $n_{d-1}$, and so on down to $n_2$. For the remaining values of $n$, that is, for every $n = (n_1, 0, \dots, 0)$, one has that $L(n) = n$ and, thus, the relation (6.4.3) becomes

$$a_n = a_n e^{2\pi i n_1\alpha}.$$

Since $\alpha$ is irrational, the last factor is different from 1 whenever $n_1$ is non-zero. Therefore, this relation implies that $a_n = 0$ also for $n = (n_1, 0, \dots, 0)$ with $n_1 \ne 0$. In this way, we have shown that if $\phi$ is an invariant function then all the terms in its Fourier series vanish except, possibly, the constant term. This means that $\phi$ is constant at almost every point, which proves that the Lebesgue measure is ergodic for $f$.
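The key step above is the dichotomy for orbits of $L$ on $\mathbb{Z}^d$: points with $n_2 = \cdots = n_d = 0$ are fixed, and every other orbit is unbounded. A small sketch (with hypothetical names `L` and `orbit_prefix`) that lists a few forward orbits for $d = 3$:

```python
def L(n):
    """L(n) = (n1 + n2, n2 + n3, ..., n_{d-1} + n_d, n_d) acting on Z^d."""
    return tuple(n[i] + n[i + 1] for i in range(len(n) - 1)) + (n[-1],)

def orbit_prefix(n, steps):
    """The first `steps` points n, L(n), L^2(n), ... of the forward orbit of n."""
    out = [n]
    for _ in range(steps - 1):
        out.append(L(out[-1]))
    return out

print(orbit_prefix((5, 0, 0), 4))   # fixed point, since n2 = n3 = 0
print(orbit_prefix((0, 1, 0), 4))   # first coordinate grows linearly
print(orbit_prefix((0, 0, 1), 5))   # first coordinate grows quadratically
```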

6.4.2 Unique ergodicity


The last step in the proof of Theorem 6.4.2 is the following result:

Proposition 6.4.4. The transformation f is uniquely ergodic: the Lebesgue


measure on the torus is the unique invariant probability measure.

Proof. The proof is by induction on the degree d of the polynomial P. The case
of degree 1 was treated previously. Therefore, we only need to explain how the
case of degree d may be deduced from the case of degree d − 1. For that, we
write Td = Td−1 × S1 and
f : Td−1 × S1 → Td−1 × S1 , f (θ0 , η) = (f0 (θ0 ), η + θd−1 ), (6.4.4)
where θ0 = (θ1 , . . . , θd−1 ) and f0 (θ0 ) = (θ1 + α, θ2 + θ1 , . . . , θd−1 + θd−2 ). By
induction, the transformation
f0 : Td−1 → Td−1
is uniquely ergodic. Let us denote by π : Td → Td−1 the projection π(θ ) = θ0 .

Lemma 6.4.5. For any probability measure μ invariant under f , the projection
π∗ μ coincides with the Lebesgue measure m0 on Td−1 .

Proof. Given any measurable set $E \subset \mathbb{T}^{d-1}$,

$$(\pi_*\mu)(f_0^{-1}(E)) = \mu(\pi^{-1}f_0^{-1}(E)).$$

Using that $\pi \circ f = f_0 \circ \pi$ and the fact that $\mu$ is $f$-invariant, we get that the expression on the right-hand side is equal to

$$\mu(f^{-1}\pi^{-1}(E)) = \mu(\pi^{-1}(E)) = (\pi_*\mu)(E).$$

Therefore, $(\pi_*\mu)(f_0^{-1}(E)) = (\pi_*\mu)(E)$ for every measurable subset $E$, that is, $\pi_*\mu$ is invariant under $f_0$. It is clear that $\pi_*\mu$ is a probability measure. Since $f_0$ is uniquely ergodic, it follows that $\pi_*\mu$ coincides with the Lebesgue measure $m_0$ on $\mathbb{T}^{d-1}$.

Now suppose that $\mu$, besides being invariant, is also ergodic for $f$. By Theorem 3.2.6, and by ergodicity, the set $G(\mu) \subset \mathbb{T}^d$ of all points $\theta \in \mathbb{T}^d$ such that

$$\lim_n \frac{1}{n} \sum_{j=0}^{n-1} \phi(f^j(\theta)) = \int \phi\,d\mu \quad \text{for any continuous function } \phi : \mathbb{T}^d \to \mathbb{R} \tag{6.4.5}$$
has full measure. Let $G_0(\mu)$ be the set of all $\theta_0 \in \mathbb{T}^{d-1}$ such that $G(\mu)$ intersects $\{\theta_0\} \times S^1$. In other words, $G_0(\mu) = \pi(G(\mu))$. It is clear that $\pi^{-1}(G_0(\mu))$ contains $G(\mu)$ and, thus, has full measure. Hence, using Lemma 6.4.5,

$$m_0(G_0(\mu)) = \mu(\pi^{-1}(G_0(\mu))) = 1. \tag{6.4.6}$$

For the same reasons, this relation remains valid for the Lebesgue measure:

$$m_0(G_0(m)) = m(\pi^{-1}(G_0(m))) = 1. \tag{6.4.7}$$

The identities (6.4.6) and (6.4.7) imply that the intersection between $G_0(\mu)$ and $G_0(m)$ has full measure for $m_0$. So, in particular, these two sets cannot be disjoint. Let $\theta_0$ be any point in the intersection. By definition, $G(\mu)$ intersects $\{\theta_0\} \times S^1$. But the next result asserts that $G(m)$ contains $\{\theta_0\} \times S^1$:

Lemma 6.4.6. If $\theta_0 \in G_0(m)$ then $\{\theta_0\} \times S^1$ is contained in $G(m)$.

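Proof. The crucial observation is that the measure $m$ is invariant under every transformation of the form

$$R_\beta : \mathbb{T}^{d-1} \times S^1 \to \mathbb{T}^{d-1} \times S^1, \quad (\zeta, \eta) \mapsto (\zeta, \eta + \beta).$$

The hypothesis $\theta_0 \in G_0(m)$ means that there exists some $\eta \in S^1$ such that $(\theta_0, \eta) \in G(m)$, that is,

$$\lim_n \frac{1}{n} \sum_{j=0}^{n-1} \phi(f^j(\theta_0, \eta)) = \int \phi\,dm$$

for every continuous function $\phi : \mathbb{T}^d \to \mathbb{R}$. Any other point of $\{\theta_0\} \times S^1$ may be written as $(\theta_0, \eta + \beta) = R_\beta(\theta_0, \eta)$ for some $\beta \in S^1$. Recalling (6.4.1), we see that

$$f(R_\beta(\tau_0, \zeta)) = (\tau_1 + \alpha, \tau_2 + \tau_1, \dots, \tau_{d-1} + \tau_{d-2}, \zeta + \beta + \tau_{d-1}) = R_\beta(f(\tau_0, \zeta))$$

for every $(\tau_0, \zeta) \in \mathbb{T}^{d-1} \times S^1$, where $\tau_0 = (\tau_1, \dots, \tau_{d-1})$. Hence, by induction,

$$f^j(\theta_0, \eta + \beta) = f^j(R_\beta(\theta_0, \eta)) = R_\beta(f^j(\theta_0, \eta))$$

for every $j \ge 1$. Therefore, given any continuous function $\phi : \mathbb{T}^d \to \mathbb{R}$,

$$\lim_n \frac{1}{n} \sum_{j=0}^{n-1} \phi(f^j(\theta_0, \eta + \beta)) = \lim_n \frac{1}{n} \sum_{j=0}^{n-1} (\phi \circ R_\beta)(f^j(\theta_0, \eta)) = \int (\phi \circ R_\beta)\,dm = \int \phi\,dm.$$

This proves that $(\theta_0, \eta + \beta)$ is in $G(m)$ for every $\beta \in S^1$, as stated.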
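The identity $f \circ R_\beta = R_\beta \circ f$ used in this proof can also be checked mechanically. A minimal sketch with exact rational arithmetic (rational $\alpha$ and $\beta$ only to keep the comparison exact, since the identity is algebraic), where the last coordinate of the tuple plays the role of $\eta$ and `R`, `commutes` are hypothetical names:

```python
from fractions import Fraction

def f(theta, alpha):
    """The map (6.4.1) on the torus; the last coordinate plays the role of eta."""
    new = [theta[0] + alpha] + [theta[i] + theta[i - 1] for i in range(1, len(theta))]
    return tuple(t % 1 for t in new)

def R(theta, beta):
    """R_beta: rotate only the last (circle) coordinate by beta."""
    return theta[:-1] + ((theta[-1] + beta) % 1,)

def commutes(theta, alpha, beta, steps):
    """Check f^j(R_beta(theta)) == R_beta(f^j(theta)) for j = 1, ..., steps."""
    a, b = R(theta, beta), theta
    for _ in range(steps):
        a, b = f(a, alpha), f(b, alpha)
        if a != R(b, beta):
            return False
    return True

alpha, beta = Fraction(3, 7), Fraction(2, 9)
theta = (Fraction(1, 5), Fraction(4, 11), Fraction(5, 13))
print(commutes(theta, alpha, beta, 20))   # True
```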

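It follows from what we said so far that $G(\mu)$ and $G(m)$ intersect each other at some point of $\{\theta_0\} \times S^1$. In view of the definition (6.4.5), this implies that the two measures have the same integral for every continuous function. According to Proposition A.3.3, this implies that $\mu = m$. Thus $m$ is the only ergodic invariant probability measure of $f$. Since the ergodic measures are precisely the extreme points of the (non-empty, compact, convex) set of invariant probability measures, $m$ is the unique invariant probability measure, as we wanted to prove.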
Corollary 6.4.7. The orbit of every point $\theta \in \mathbb{T}^d$ is equidistributed on the torus $\mathbb{T}^d$, in the sense that

$$\lim_n \frac{1}{n} \sum_{j=0}^{n-1} \psi(f^j(\theta)) = \int \psi\,dm$$

for every continuous function $\psi : \mathbb{T}^d \to \mathbb{R}$.

Proof. This follows immediately from Propositions 6.1.1 and 6.4.4.
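Corollary 6.4.7 can be watched numerically. The sketch below, for $d = 2$ and $\alpha = \sqrt2$ (an arbitrary irrational choice), computes time averages of the continuous function $\psi(\theta_1, \theta_2) = \cos(2\pi\theta_2) + \sin^2(2\pi\theta_1)$, whose integral with respect to $m$ is $1/2$, starting from a few different points. Floating-point iteration and a finite number of steps mean the agreement is only approximate; the function names are hypothetical.

```python
import math

def f(theta, alpha):
    """The map (6.4.1) for d = 2: (t1, t2) -> (t1 + alpha, t2 + t1), mod 1."""
    t1, t2 = theta
    return ((t1 + alpha) % 1.0, (t2 + t1) % 1.0)

def time_average(psi, theta, alpha, n):
    """(1/n) * sum_{j<n} psi(f^j(theta))."""
    total = 0.0
    for _ in range(n):
        total += psi(theta)
        theta = f(theta, alpha)
    return total / n

def psi(theta):
    """Continuous on T^2; its integral with respect to m is 0 + 1/2."""
    t1, t2 = theta
    return math.cos(2 * math.pi * t2) + math.sin(2 * math.pi * t1) ** 2

alpha = math.sqrt(2)
for start in [(0.0, 0.0), (0.3, 0.7), (0.9, 0.1)]:
    print(start, round(time_average(psi, start, alpha, 100_000), 3))   # each close to 0.5
```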

6.4.3 Proof of the theorem of Weyl


To complete the proof of Theorem 6.4.2, we introduce the polynomial functions $p_1, \dots, p_d$ defined by

$$p_d(x) = P(x) \quad\text{and}\quad p_{j-1}(x) = p_j(x + 1) - p_j(x) \ \text{ for } j = 2, \dots, d. \tag{6.4.8}$$
Lemma 6.4.8. The polynomial $p_j(x)$ has degree $j$, for every $1 \le j \le d$. Moreover, $p_1(x) = \alpha x + \beta$ with $\alpha = d!\,a_d$.

Proof. By definition, $p_d(x) = P(x)$ has degree $d$. Hence, to prove the first claim it suffices to show that if $p_j(x)$ has degree $j$ then $p_{j-1}(x)$ has degree $j - 1$. In order to do that, let

$$p_j(x) = b_jx^j + b_{j-1}x^{j-1} + \cdots + b_0,$$

where $b_j \ne 0$. Then

$$p_j(x + 1) = b_j(x + 1)^j + b_{j-1}(x + 1)^{j-1} + \cdots + b_0 = b_jx^j + (jb_j + b_{j-1})x^{j-1} + (\text{terms of degree less than } j - 1).$$

Subtracting one expression from the other, we get that

$$p_{j-1}(x) = jb_jx^{j-1} + (\text{terms of degree less than } j - 1)$$

has degree $j - 1$. This proves the first claim in the lemma. This calculation also shows that the leading coefficient of $p_{j-1}(x)$ (the coefficient of the term of highest degree) is obtained by multiplying the leading coefficient of $p_j(x)$ by $j$. Consequently, the leading coefficient of $p_1$ must be equal to $d!\,a_d$, as claimed in the last part of the lemma.

Lemma 6.4.9. For every $n \ge 0$,

$$f^n\big(p_1(0), p_2(0), \dots, p_d(0)\big) = \big(p_1(n), p_2(n), \dots, p_d(n)\big).$$

Proof. The proof is by induction on $n$. Since the case $n = 0$ is obvious, we only need to treat the inductive step. Recall that $f$ was defined in (6.4.1). If

$$f^{n-1}(p_1(0), p_2(0), \dots, p_d(0)) = (p_1(n-1), p_2(n-1), \dots, p_d(n-1))$$

then $f^n(p_1(0), p_2(0), \dots, p_d(0))$ is equal to

$$(p_1(n-1) + \alpha,\ p_2(n-1) + p_1(n-1),\ \dots,\ p_d(n-1) + p_{d-1}(n-1)).$$

Using the definition (6.4.8) and Lemma 6.4.8, we find that this expression is equal to

$$(p_1(n), p_2(n), \dots, p_d(n)),$$

and that proves the lemma.
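Lemmas 6.4.8 and 6.4.9 can be checked with exact arithmetic. In this sketch polynomials are coefficient lists $[c_0, c_1, \dots]$, `forward_difference` computes $p(x+1) - p(x)$ through the binomial theorem, and `f_lift` is the map (6.4.1) on $\mathbb{R}^d$ without reduction mod 1 (the identity of Lemma 6.4.9 already holds before projecting to the torus). The polynomial $P(x) = \tfrac12 + 3x + \tfrac23x^3$ is an arbitrary example, and all names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial

def forward_difference(c):
    """Coefficients of p(x + 1) - p(x), given the coefficients c[0] + c[1] x + ... of p."""
    out = [0] * (len(c) - 1)
    for k, ck in enumerate(c):
        for i in range(k):                 # (x + 1)^k - x^k = sum_{i<k} C(k, i) x^i
            out[i] += ck * comb(k, i)
    return out

def evaluate(c, x):
    return sum(ck * x ** k for k, ck in enumerate(c))

def difference_polynomials(a):
    """[p_1, ..., p_d] as in (6.4.8), for P with coefficients a = [a_0, ..., a_d]."""
    polys = [a]                            # p_d = P
    while len(polys[-1]) > 2:              # stop at degree 1
        polys.append(forward_difference(polys[-1]))
    return polys[::-1]

def f_lift(theta, alpha):
    """Lift of (6.4.1) to R^d (no reduction mod 1)."""
    return [theta[0] + alpha] + [theta[i] + theta[i - 1] for i in range(1, len(theta))]

a = [Fraction(1, 2), Fraction(3), Fraction(0), Fraction(2, 3)]   # P(x) = 1/2 + 3x + (2/3) x^3
p = difference_polynomials(a)
alpha = p[0][1]                                  # leading coefficient of p_1
print(alpha == factorial(3) * a[-1])            # Lemma 6.4.8: True

theta = [evaluate(pj, 0) for pj in p]
for n in range(1, 8):
    theta = f_lift(theta, alpha)
    assert theta == [evaluate(pj, n) for pj in p]   # Lemma 6.4.9, checked exactly
```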

Finally, we are ready to prove that the sequence $z_n = P_*(n)$, $n \in \mathbb{N}$, is equidistributed. We treat two cases separately.
First, suppose that the leading coefficient $a_d$ of $P(x)$ is irrational. Then the number $\alpha$ in Lemma 6.4.8 is irrational and, thus, the results in Section 6.4.2 are valid for the transformation $f : \mathbb{T}^d \to \mathbb{T}^d$. Let $\phi : S^1 \to \mathbb{R}$ be any continuous function. Consider $\psi : \mathbb{T}^d \to \mathbb{R}$ defined by

$$\psi(\theta_1, \theta_2, \dots, \theta_d) = \phi(\theta_d).$$

Fix $\theta = (p_1(0), p_2(0), \dots, p_d(0))$. Since $p_d = P$, Lemma 6.4.9 gives $\psi(f^j(\theta)) = \phi(z_j)$ for every $j$; using Corollary 6.4.7, we get that

$$\lim_n \frac{1}{n} \sum_{j=0}^{n-1} \phi(z_j) = \lim_n \frac{1}{n} \sum_{j=0}^{n-1} \psi(f^j(\theta)) = \int \psi\,dm = \int \phi\,dx.$$

This ends the proof of Theorem 6.4.2 in the case when $a_d$ is irrational.
Now suppose that $a_d$ is rational. Write $a_d = p/q$ with $p \in \mathbb{Z}$ and $q \in \mathbb{N}$. It is clear that we may write $z_n$ as a sum

$$z_n = x_n + y_n, \qquad x_n = \pi(a_dn^d) \quad\text{and}\quad y_n = Q_*(n),$$

where $Q(x) = a_0 + a_1x + \cdots + a_{d-1}x^{d-1}$, $\pi : \mathbb{R} \to S^1$ is the canonical projection and $Q_* = \pi \circ Q$. To begin with, observe that

$$\frac{p}{q}(n + q)^d - \frac{p}{q}n^d$$

is an integer, for every $n \in \mathbb{N}$, because $(n + q)^d - n^d$ is divisible by $q$. This means that the sequence $x_n$, $n \in \mathbb{N}$, is periodic (with period $q$) in the circle $\mathbb{R}/\mathbb{Z}$. In particular, it takes no more than $q$ distinct values. Observe also that, since $a_d$ is rational, the hypothesis of the theorem implies that some of the coefficients $a_1, \dots, a_{d-1}$ of $Q$ are irrational. Hence, by induction on the degree, the sequence $y_n$, $n \in \mathbb{N}$, is equidistributed. More than that, the subsequences

$$y_{qn+r} = Q_*(qn + r), \quad n \in \mathbb{N},$$

are equidistributed for every $r \in \{0, 1, \dots, q - 1\}$. In fact, as the reader may readily check, these sequences may be written as $y_{qn+r} = Q^{(r)}_*(n)$ for some polynomial $Q^{(r)}$ of degree at most $d - 1$ with some irrational non-constant coefficient and, thus, the induction hypothesis applies to each one of them as well. From these two observations it follows that every subsequence $z_{qn+r} = x_r + y_{qn+r}$, $n \in \mathbb{N}$, being a translate of an equidistributed sequence, is equidistributed. Consequently, $z_n$, $n \in \mathbb{N}$, is also equidistributed. This completes the proof of Theorem 6.4.2.
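A quick exact check of the periodicity used in the rational case, with the arbitrary choices $a_d = 5/6$ and $d = 3$: the values $a_dn^d \bmod 1$ repeat with period $q = 6$ and take at most $q$ distinct values. The function name `x` is a hypothetical stand-in for $x_n$.

```python
from fractions import Fraction

def x(n, a_d, d):
    """x_n = a_d * n^d reduced mod 1, computed exactly."""
    return (a_d * n ** d) % 1

p, q, d = 5, 6, 3
a_d = Fraction(p, q)
print(all(x(n + q, a_d, d) == x(n, a_d, d) for n in range(200)))   # True: period q
print(sorted({x(n, a_d, d) for n in range(q)}))                   # at most q values
```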

6.4.4 Exercises
6.4.1. Show that a sequence $(z_j)_j$ is equidistributed on the circle if and only if
$$\lim_n \frac{1}{n}\,\#\{1 \le j \le n : z_j \in I\} = m(I)$$
for every segment $I \subset S^1$, where $m(I)$ denotes the length of $I$.



6.4.2. Show that the sequence $(\sqrt{n} \bmod \mathbb{Z})_n$ is equidistributed on the circle. Does the same hold for the sequence $(\log n \bmod \mathbb{Z})_n$?
6.4.3. Koksma [Kok35] proved that the sequence $(a^n \bmod \mathbb{Z})_n$ is equidistributed on the circle for Lebesgue-almost every $a > 1$. That is not true for every $a > 1$. Indeed, consider the golden ratio $a = (1 + \sqrt{5})/2$. Check that the sequence $(a^n \bmod \mathbb{Z})_n$ converges to $0 \in S^1$ when $n \to \infty$; in particular, it is not equidistributed on the circle.
7
Correlations

The models of dynamical systems that interest us the most, transformations and flows, are deterministic: the state of the system at any time determines the whole future trajectory; when the system is invertible, the past trajectory is equally determined. However, these systems may also present stochastic (that is, “partly random”) behavior: at some level coarser than that of individual trajectories, information about the past is gradually lost as the system is iterated. That is the subject of the present chapter.
In probability theory one calls the correlation between two random variables $X$ and $Y$ the number
$$C(X, Y) = E\big[(X - E[X])(Y - E[Y])\big] = E[XY] - E[X]E[Y].$$
Note that the expression (X − E[X])(Y − E[Y]) is positive if X and Y
are on the same side (either larger or smaller) of their respective means,
E[X] and E[Y], and it is negative otherwise. Therefore, the sign of C(X, Y)
indicates whether the two variables exhibit, predominantly, the same behavior
or opposite behaviors, relative to their means. Furthermore, correlation close
to zero indicates that the two behaviors are little, if at all, related to each other.
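As a sanity check on the two expressions for $C(X, Y)$, here is a minimal Python sketch that evaluates both on a finite probability space with exact rational arithmetic. The finite setting and the helper names are assumptions made for illustration only.

```python
from fractions import Fraction


def expectation(values, probs):
    """E[V] for a random variable taking the value values[k] with probability probs[k]."""
    return sum((v * p for v, p in zip(values, probs)), Fraction(0))


def correlation(xs, ys, probs):
    """C(X, Y) on a finite probability space, computed in both ways."""
    ex, ey = expectation(xs, probs), expectation(ys, probs)
    centered = expectation([(x - ex) * (y - ey) for x, y in zip(xs, ys)], probs)
    shortcut = expectation([x * y for x, y in zip(xs, ys)], probs) - ex * ey
    assert centered == shortcut  # the two expressions agree exactly
    return centered


quarter = [Fraction(1, 4)] * 4  # four equally likely outcomes
print(correlation([1, 2, 3, 4], [1, 2, 3, 4], quarter))      # same behavior: 5/4
print(correlation([1, 2, 3, 4], [-1, -2, -3, -4], quarter))  # opposite behavior: -5/4
print(correlation([1, 1, 2, 2], [1, 2, 1, 2], quarter))      # independent: 0
```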
Given an invariant probability measure μ of a dynamical system f : M →
M and given measurable functions ϕ, ψ : M → R, we want to analyze the
evolution of the correlations
$$C_n(\varphi, \psi) = C(\varphi \circ f^n, \psi)$$
when time n goes to infinity. We may think of ϕ and ψ as quantities that are
measured in the system, such as temperature, acidity (pH), kinetic energy,
and so forth. Then Cn (ϕ, ψ) measures how much the value of ϕ at time n
is correlated with the value of ψ at time zero; to what extent one value
“influences” the other.
For example, if ϕ = XA and ψ = XB are characteristic functions, then ψ(x)
provides information on the position of the initial point x, whereas ϕ(f n (x))
informs on the position of its n-th iterate f n (x). If the correlation Cn (ϕ, ψ) is
small, then the first information is of little use to make predictions about the
second one. That kind of behavior, where correlations approach zero as time n
increases, is quite common in important models, as we are going to see.
We start by introducing the notions of (strong) mixing and weak mixing
systems, and by studying their basic properties (Section 7.1). In Sections 7.2
and 7.3 we discuss these notions in the context of Markov shifts, which
generalize Bernoulli shifts, and of interval exchanges, which are an extension
of the class of circle rotations. In Section 7.4 we analyze, in quantitative terms,
the speed of decay of correlations for certain classes of functions.

7.1 Mixing systems


Let $f: M \to M$ be a measurable transformation and μ be an invariant probability measure. The correlations sequence of two measurable functions $\varphi, \psi: M \to \mathbb{R}$ is defined by
$$C_n(\varphi, \psi) = \int (\varphi \circ f^n)\,\psi \, d\mu - \int \varphi \, d\mu \int \psi \, d\mu, \quad n \in \mathbb{N}. \tag{7.1.1}$$

We say that the system (f, μ) is mixing if
$$\lim_n C_n(\mathcal{X}_A, \mathcal{X}_B) = \lim_n \big[\mu\big(f^{-n}(A) \cap B\big) - \mu(A)\mu(B)\big] = 0, \tag{7.1.2}$$
for any measurable sets $A, B \subset M$. In other words, when n grows the probability of the event $\{x \in B \text{ and } f^n(x) \in A\}$ converges to the product of the probabilities of the events $\{x \in B\}$ and $\{f^n(x) \in A\}$.
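Analogously, given a flow $f^t: M \to M$, $t \in \mathbb{R}$ and an invariant probability measure μ, we define
$$C_t(\varphi, \psi) = \int (\varphi \circ f^t)\,\psi \, d\mu - \int \varphi \, d\mu \int \psi \, d\mu, \quad t \in \mathbb{R} \tag{7.1.3}$$
and we say that the system $(f^t, \mu)$ is mixing if
$$\lim_{t \to +\infty} C_t(\mathcal{X}_A, \mathcal{X}_B) = \lim_{t \to +\infty} \big[\mu\big(f^{-t}(A) \cap B\big) - \mu(A)\mu(B)\big] = 0, \tag{7.1.4}$$
for any measurable sets $A, B \subset M$.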

7.1.1 Properties
A mixing system is necessarily ergodic. Indeed, suppose that there exists some invariant set $A \subset M$ with $0 < \mu(A) < 1$. Taking $B = A^c$, we get $f^{-n}(A) \cap B = \emptyset$ for every n. Then, $\mu(f^{-n}(A) \cap B) = 0$ for every n, whereas $\mu(A)\mu(B) \neq 0$. In particular, (f, μ) is not mixing. The example that follows shows that ergodicity is strictly weaker than mixing:
Example 7.1.1. Let θ ∈ R be an irrational number. As we have seen in Section 4.2.1, the rotation $R_\theta: S^1 \to S^1$ is ergodic with respect to the Lebesgue measure m. However, $(R_\theta, m)$ is not mixing. Indeed, if $A, B \subset S^1$ are two small intervals then $R_\theta^{-n}(A) \cap B$ is empty and, thus, $m(R_\theta^{-n}(A) \cap B) = 0$ for infinitely many values of n. Since $m(A)m(B) \neq 0$, it follows that the condition in (7.1.2) does not hold.
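The argument of Example 7.1.1 can be watched numerically. The Python sketch below is a minimal illustration under our own choices $\theta = (\sqrt{5} - 1)/2$ and $A = B = [0, 0.05)$. It computes $m(R_\theta^{-n}(A) \cap B)$ for arcs, exactly up to floating point rounding, using $R_\theta^{-n}(A) = A - n\theta$. The helper names are hypothetical.

```python
import math


def circle_overlap(s1, l1, s2, l2):
    """Lebesgue measure of [s1, s1 + l1) ∩ [s2, s2 + l2) on the circle R/Z (l1, l2 < 1)."""
    d = (s2 - s1) % 1.0
    # After translating the first arc to [0, l1), the second arc meets [0, 1)
    # in [d, min(d + l2, 1)) and, when it wraps around, also in [0, d + l2 - 1).
    part1 = max(0.0, min(l1, d + l2) - d)
    part2 = max(0.0, min(l1, d + l2 - 1.0))
    return part1 + part2


def rotation_overlap(theta, n, a, la, b, lb):
    """m(R_theta^{-n}(A) ∩ B) for the arcs A = [a, a + la) and B = [b, b + lb)."""
    return circle_overlap((a - n * theta) % 1.0, la, b, lb)  # R_theta^{-n}(A) = A - n*theta


theta = (math.sqrt(5) - 1) / 2  # an irrational rotation number
length = 0.05                   # A = B = [0, 0.05)
overlaps = [rotation_overlap(theta, n, 0.0, length, 0.0, length) for n in range(1, 1001)]
empty = sum(1 for v in overlaps if v == 0.0)
print(f"R^-n(A) ∩ B is empty for {empty} of the first 1000 values of n")
print(f"largest value of m(R^-n(A) ∩ B) - m(A)m(B): {max(overlaps) - length ** 2:.4f}")
```

For these choices the intersection is empty for roughly nine out of every ten values of $n$, yet it is occasionally large. So $m(R_\theta^{-n}(A) \cap B)$ keeps oscillating instead of settling at $m(A)m(B)$.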

It is clear from the definition (7.1.2) that if (f, μ) is mixing then $(f^k, \mu)$ is mixing for every $k \in \mathbb{N}$. The corresponding statement for ergodicity is false: the map $f(x) = 1 - x$ on the set {0, 1} is ergodic with respect to the measure $(\delta_0 + \delta_1)/2$ but the second iterate $f^2$ is not.

Lemma 7.1.2. Assume that $\lim_n \mu(f^{-n}(A) \cap B) = \mu(A)\mu(B)$ for every pair of sets A and B in an algebra $\mathcal{A}$ that generates the σ-algebra of measurable sets. Then (f, μ) is mixing.
Proof. Let $\mathcal{C}$ be the family of all measurable sets A such that $\mu(f^{-n}(A) \cap B)$ converges to $\mu(A)\mu(B)$ for every $B \in \mathcal{A}$. By assumption, $\mathcal{C}$ contains $\mathcal{A}$. We claim that $\mathcal{C}$ is a monotone class. Indeed, let $A = \bigcup_k A_k$ be the union of an increasing sequence $A_1 \subset \cdots \subset A_k \subset \cdots$ of elements of $\mathcal{C}$. Given ε > 0, there exists $k_0 \ge 1$ such that
$$\mu(A) - \mu(A_k) = \mu(A \setminus A_k) < \varepsilon$$
for every $k \ge k_0$. Moreover, for every n ≥ 1,
$$\mu\big(f^{-n}(A) \cap B\big) - \mu\big(f^{-n}(A_k) \cap B\big) = \mu\big(f^{-n}(A \setminus A_k) \cap B\big) \le \mu\big(f^{-n}(A \setminus A_k)\big) = \mu(A \setminus A_k) < \varepsilon.$$
For each fixed $k \ge k_0$, the fact that $A_k \in \mathcal{C}$ ensures that there exists $n(k) \ge 1$ such that
$$\big|\mu\big(f^{-n}(A_k) \cap B\big) - \mu(A_k)\mu(B)\big| < \varepsilon \quad \text{for every } n \ge n(k).$$
Adding these three inequalities we conclude that
$$\big|\mu\big(f^{-n}(A) \cap B\big) - \mu(A)\mu(B)\big| < 3\varepsilon \quad \text{for every } n \ge n(k_0).$$
Since ε > 0 is arbitrary, this shows that $A \in \mathcal{C}$. In the same way, one proves that the intersection of any decreasing sequence of elements of $\mathcal{C}$ is still an element of $\mathcal{C}$. So, $\mathcal{C}$ is indeed a monotone class. By the monotone class theorem (Theorem A.1.18), it follows that $\mathcal{C}$ contains every measurable set: for every measurable set A one has
$$\lim_n \mu\big(f^{-n}(A) \cap B\big) = \mu(A)\mu(B) \quad \text{for every } B \in \mathcal{A}.$$
All that is left to do is to deduce that this property holds for every measurable set B. This follows from precisely the same kind of arguments as we have just detailed, as the reader may readily check.

Example 7.1.3. Every Bernoulli shift (recall Section 4.2.3) is mixing. Indeed, given any two cylinders $A = [p; A_p, \dots, A_q]$ and $B = [r; B_r, \dots, B_s]$,
$$\mu\big(f^{-n}(A) \cap B\big) = \mu\big([r; B_r, \dots, B_s, X, \dots, X, A_p, \dots, A_q]\big) = \mu\big([r; B_r, \dots, B_s]\big)\,\mu\big([p; A_p, \dots, A_q]\big) = \mu(A)\mu(B)$$
for every $n > s - p$. Let $\mathcal{A}$ be the algebra generated by the cylinders: its elements are the finite pairwise disjoint unions of cylinders. It follows from what we have just said that $\mu(f^{-n}(A) \cap B) = \mu(A)\mu(B)$ for every pair of sets $A, B \in \mathcal{A}$ and every n sufficiently large. Since $\mathcal{A}$ generates the σ-algebra of measurable sets, we may use Lemma 7.1.2 to conclude that the system is mixing, as stated.
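The cylinder computation in Example 7.1.3 is easy to reproduce with exact arithmetic. The Python sketch below is only an illustration. A cylinder is stored as a dictionary from positions to sets of allowed symbols, a representation we chose for convenience. Preimages under the shift move every position $n$ steps to the right, and for a Bernoulli measure the measure of a cylinder is the product of the ν-weights of its sets.

```python
from fractions import Fraction


def cylinder_measure(cyl, nu):
    """Bernoulli measure of a cylinder written as {position: set of allowed symbols}."""
    measure = Fraction(1)
    for symbols in cyl.values():
        measure *= sum((nu[x] for x in symbols), Fraction(0))
    return measure


def shift_preimage(cyl, n):
    """f^{-n}([p; A_p, ..., A_q]) = [p + n; A_p, ..., A_q]."""
    return {pos + n: symbols for pos, symbols in cyl.items()}


def intersect(c1, c2):
    """Intersection of two cylinders: allowed symbols are intersected position by position."""
    out = dict(c1)
    for pos, symbols in c2.items():
        out[pos] = out[pos] & symbols if pos in out else symbols
    return out


nu = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6)}  # X = {0, 1, 2}
A = {0: {0}, 1: {1, 2}}  # A = [0; {0}, {1, 2}], so p = 0
B = {0: {1}, 1: {0}}     # B = [0; {1}, {0}], so s = 1
product_AB = cylinder_measure(A, nu) * cylinder_measure(B, nu)
values = [cylinder_measure(intersect(shift_preimage(A, n), B), nu) for n in range(4)]
print(values, product_AB)
```

With $A$ starting at $p = 0$ and $B$ ending at $s = 1$, the two numbers agree from $n = 2$ on. For $n = 0, 1$ the overlapping positions spoil the product formula.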
Example 7.1.4. Let $g: S^1 \to S^1$ be defined by $g(x) = kx$, where k ≥ 2 is an integer number, and let m be the Lebesgue measure on the circle. The system (g, m) is equivalent to a Bernoulli shift, in the following sense. Let $X = \{0, 1, \dots, k-1\}$ and let $f: M \to M$ be the shift map in $M = X^{\mathbb{N}}$. Consider the product measure $\mu = \nu^{\mathbb{N}}$ in M, where ν is the probability measure defined by $\nu(A) = \#A/k$ for every $A \subset X$. The map
$$h: M \to S^1, \quad h\big((a_n)_n\big) = \sum_{n=1}^{\infty} \frac{a_{n-1}}{k^n}$$
is a bijection, restricted to a full measure subset, and both h and its inverse are measurable. Moreover, $h_*\mu = m$ and $h \circ f = g \circ h$ at almost every point. We say that h is an ergodic equivalence between (g, m) and (f, μ). Through it, properties of one system may be translated to corresponding properties for the other system. In particular, recalling Example 7.1.3, we get that (g, m) is mixing: given any measurable sets $A, B \subset S^1$,
$$m\big(g^{-n}(A) \cap B\big) = \mu\big(h^{-1}(g^{-n}(A) \cap B)\big) = \mu\big(f^{-n}(h^{-1}(A)) \cap h^{-1}(B)\big) \to \mu\big(h^{-1}(A)\big)\,\mu\big(h^{-1}(B)\big) = m(A)m(B)$$
when n → ∞.
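Both ingredients of Example 7.1.4 can be checked with exact rational arithmetic. In the Python sketch below, the helper names are hypothetical, and finite digit strings stand in for infinite sequences, with the remaining digits taken to be 0. It verifies $h \circ f = g \circ h$ on an example. It then computes $m(g^{-n}(A) \cap B)$ for arcs, using that $g^{-n}([a, b))$ is the union of the $k^n$ intervals $[(a + j)/k^n, (b + j)/k^n)$.

```python
from fractions import Fraction


def h(digits, k):
    """h((a_n)_n) = sum_{n>=1} a_{n-1} / k^n for a finite digit string (later digits are 0)."""
    return sum((Fraction(a, k ** (i + 1)) for i, a in enumerate(digits)), Fraction(0))


def g(x, k):
    """The circle map g(x) = kx mod 1."""
    return (k * x) % 1


def preimage_intersection(k, n, A, B):
    """m(g^{-n}(A) ∩ B) for arcs A = [a, b), B = [c, d) contained in [0, 1).

    g^{-n}([a, b)) is the disjoint union of the k**n intervals
    [(a + j)/k**n, (b + j)/k**n), j = 0, ..., k**n - 1."""
    a, b, c, d = (Fraction(x) for x in (*A, *B))
    N = k ** n
    total = Fraction(0)
    for j in range(N):
        lo, hi = (a + j) / N, (b + j) / N
        total += max(Fraction(0), min(hi, d) - max(lo, c))
    return total


# h conjugates the shift to g: dropping the first digit corresponds to applying g.
print(g(h([1, 0, 1, 1], 2), 2), h([0, 1, 1], 2))  # both 3/8

half, quarter, third, fifth = Fraction(1, 2), Fraction(1, 4), Fraction(1, 3), Fraction(1, 5)
print([preimage_intersection(2, n, (0, half), (quarter, half)) for n in range(1, 5)])
print(float(preimage_intersection(2, 10, (0, third), (fifth, half))), float(third * (half - fifth)))
```

For dyadic arcs the correlation vanishes exactly after a few steps. For the arcs $[0, 1/3)$ and $[1/5, 1/2)$ the error is at most $2/2^n$, since only the two subintervals of length $2^{-n}$ containing the endpoints of $B$ contribute to it.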


Example 7.1.5. For surjective endomorphisms of the torus (Section 4.2.5)
mixing and ergodicity are equivalent properties: the system (fA , m) is mixing
if and only if no eigenvalue of the matrix A is a root of unity (compare
Theorem 4.2.14). In Exercise 7.1.4 we invite the reader to prove this fact; a
stronger statement will appear in Exercise 8.4.2. More generally, relative to
the Haar measure, a surjective endomorphism of a compact group is mixing if
and only if it is ergodic. In fact, even stronger statements are true, as we will
comment upon in Section 9.5.3.
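For $d = 2$ the criterion of Example 7.1.5 can be tested directly from the trace $t$ and determinant $D$ of an integer matrix $A$ with $\det A \neq 0$. The Python sketch below illustrates this case only, and its function name is hypothetical. It relies on two standard facts:

- a rational root of a monic integer polynomial is an integer;
- a root of unity of degree 2 over $\mathbb{Q}$ is a root of $x^2 + x + 1$, $x^2 + 1$ or $x^2 - x + 1$.

```python
def has_root_of_unity_eigenvalue(A):
    """For an integer 2x2 matrix A with det A != 0, decide whether some eigenvalue
    of A is a root of unity, using the characteristic polynomial x^2 - t x + D."""
    (a, b), (c, d) = A
    t, D = a + d, a * d - b * c
    if D == 0:
        raise ValueError("expected det A != 0")

    def char(x):
        return x * x - t * x + D

    # The only integer roots of unity are 1 and -1.
    if char(1) == 0 or char(-1) == 0:
        return True
    # Otherwise the roots are either integers other than +-1 (not roots of unity),
    # or the polynomial is irreducible over Q; then a root of unity among its roots
    # forces it to be x^2 + x + 1, x^2 + 1 or x^2 - x + 1.
    return D == 1 and t in (-1, 0, 1)


print(has_root_of_unity_eigenvalue(((2, 1), (1, 1))))   # False: criterion for mixing holds
print(has_root_of_unity_eigenvalue(((0, -1), (1, 0))))  # True: eigenvalues are i and -i
```

For instance, the matrix with rows (2, 1) and (1, 1) passes the test, so by the criterion quoted above the corresponding endomorphism of the torus is mixing with respect to the Haar measure. The rotation matrix with rows (0, −1) and (1, 0), whose eigenvalues are ±i, fails it.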
Let us also discuss the topological version of the notion of a mixing system.
For that, take the ambient M to be a topological space. A transformation f :
M → M is said to be topologically mixing if, given any non-empty open sets
U, V ⊂ M, there exists n0 ∈ N such that f −n (U) ∩ V is non-empty for every n ≥
n0 . This is similar to but strictly stronger than the hypothesis of Lemma 4.3.4:
in the lemma we asked f −n (U) to intersect V for some value of n, whereas now
we request that to happen for every n sufficiently large.
Proposition 7.1.6. If (f, μ) is mixing then the restriction of f to the support of μ is topologically mixing.
Proof. Denote X = supp(μ). Let A, B ⊂ X be non-empty open subsets of X. By the definition of support of a measure, μ(A) > 0 and μ(B) > 0. Hence, since μ is mixing, there exists $n_0$ such that $\mu(f^{-n}(A) \cap B) > \mu(A)\mu(B)/2 > 0$ for every $n \ge n_0$. In particular, $f^{-n}(A) \cap B \neq \emptyset$ for every $n \ge n_0$.

It follows directly from this proposition that if f admits some invariant probability measure μ that is mixing and positive on open sets, then f is topologically mixing. For example, given any finite set X = {1, ..., d}, the shift map
$$f: X^{\mathbb{Z}} \to X^{\mathbb{Z}} \quad \big(\text{or } f: X^{\mathbb{N}} \to X^{\mathbb{N}}\big)$$
is topologically mixing. Indeed, for any probability measure ν supported on the whole of X, the Bernoulli measure $\mu = \nu^{\mathbb{Z}}$ (or $\mu = \nu^{\mathbb{N}}$) is mixing and positive on open sets, as we have seen in Example 7.1.3. Analogously, by Example 7.1.4, every transformation $f: S^1 \to S^1$ of the form $f(x) = kx$ with k ≥ 2 is topologically mixing.

Example 7.1.7. Translations in a metrizable group G are never topologically mixing. Indeed, consider any left-translation $L_g$ (the case of right-translations is analogous). We may suppose that g is not the unit element e since otherwise it is obvious that $L_g$ is not topologically mixing. Fix some distance d invariant under all the translations of the group G (recall Lemma 6.3.6) and let $\alpha = d(e, g^{-1})$. Consider U = V = ball of radius α/4 around e. Every $L_g^{-n}(U)$ is a ball of radius α/4. Assume that $L_g^{-n}(U)$ intersects V. Then $L_g^{-n}(U)$ is contained in the ball of radius 3α/4 around e and, thus, $L_g^{-n-1}(U)$ is contained in the ball of radius 3α/4 around $g^{-1}$. Consequently, $L_g^{-n-1}(U)$ does not intersect V. Since n is arbitrary, this shows that $L_g$ is not topologically mixing.

7.1.2 Weak mixing


A system (f, μ) is weak mixing if, given any measurable sets A, B ⊂ M,
$$\lim_n \frac{1}{n}\sum_{j=0}^{n-1} \big|C_j(\mathcal{X}_A, \mathcal{X}_B)\big| = \lim_n \frac{1}{n}\sum_{j=0}^{n-1} \big|\mu\big(f^{-j}(A) \cap B\big) - \mu(A)\mu(B)\big| = 0. \tag{7.1.5}$$

It is clear from the definition that every mixing system is also weak mixing. On the other hand, every weak mixing system is ergodic. Indeed, if A ⊂ M is an invariant set then
$$\lim_n \frac{1}{n}\sum_{j=0}^{n-1} \big|C_j(\mathcal{X}_A, \mathcal{X}_{A^c})\big| = \mu(A)\mu(A^c)$$
and, hence, the hypothesis implies that μ(A) = 0 or $\mu(A^c) = 0$.

Example 7.1.8. Translations in metrizable compact groups are never weak mixing with respect to the Haar measure μ (or any other invariant measure positive on open sets). Indeed, as observed in Example 7.1.7, it is always possible to choose open sets U and V such that $f^{-n}(U) \cap V$ is empty for at least one in every two consecutive values of n. Then,
$$\liminf_n \frac{1}{n}\sum_{j=0}^{n-1} \big|\mu\big(f^{-j}(U) \cap V\big) - \mu(U)\mu(V)\big| \ge \frac{1}{2}\mu(U)\mu(V) > 0.$$
In this way we get several examples of ergodic systems, even uniquely ergodic ones, that are not weak mixing.
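The lower bound in Example 7.1.8 can be observed numerically for an irrational rotation of the circle, which is a translation in a compact group. The Python sketch below is a minimal illustration with our own choices $\theta = (\sqrt{5} - 1)/2$ and $U = V = [0, 0.1)$, and it evaluates the Cesàro averages appearing in (7.1.5). Since the points $j\theta \bmod \mathbb{Z}$ are equidistributed, these averages should approach $\int_0^1 \big|\max(0, 0.1 - \|t\|) - 0.01\big|\,dt = 0.0162$, where $\|t\|$ is the distance from $t$ to the nearest integer. In particular they stay above $\mu(U)\mu(V)/2 = 0.005$ rather than tending to zero. The helper names are hypothetical.

```python
import math


def circle_overlap(s1, l1, s2, l2):
    """Lebesgue measure of [s1, s1 + l1) ∩ [s2, s2 + l2) on the circle R/Z (l1, l2 < 1)."""
    d = (s2 - s1) % 1.0
    part1 = max(0.0, min(l1, d + l2) - d)    # piece of the second arc inside [0, 1)
    part2 = max(0.0, min(l1, d + l2 - 1.0))  # piece that wraps around past 1
    return part1 + part2


def cesaro_average(theta, n, length):
    """(1/n) * sum_{j<n} |m(R_theta^{-j}(U) ∩ V) - m(U) m(V)| for U = V = [0, length)."""
    total = 0.0
    for j in range(n):
        overlap = circle_overlap((-j * theta) % 1.0, length, 0.0, length)
        total += abs(overlap - length * length)
    return total / n


theta = (math.sqrt(5) - 1) / 2
averages = {n: cesaro_average(theta, n, 0.1) for n in (100, 1_000, 10_000)}
print(averages)  # stays near 0.0162 instead of tending to 0
```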

We are going to see in Section 7.3.2 that the family of interval exchanges
contains many systems that are weak mixing (and uniquely ergodic) but are
not mixing.
The proof of the next result is analogous to the proof of Lemma 7.1.2 and is left to the reader:
Lemma 7.1.9. Assume that $\lim_n \frac{1}{n}\sum_{j=0}^{n-1} \big|\mu(f^{-j}(A) \cap B) - \mu(A)\mu(B)\big| = 0$ for every pair of sets A and B in some algebra $\mathcal{A}$ that generates the σ-algebra of measurable sets. Then (f, μ) is weak mixing.

Example 7.1.10. Given a system (f, μ), let us consider the product transformation $f_2: M \times M \to M \times M$ given by $f_2(x, y) = (f(x), f(y))$. It is easy to see that $f_2$ preserves the product measure $\mu_2 = \mu \times \mu$. If $(f_2, \mu_2)$ is ergodic then (f, μ) is ergodic: just note that if A ⊂ M is invariant under f and μ(A) ∈ (0, 1) then A × A is invariant under $f_2$ and $\mu_2(A \times A) \in (0, 1)$.
The converse is not true in general, that is, $(f_2, \mu_2)$ may not be ergodic even if (f, μ) is ergodic. For example, if $f: S^1 \to S^1$ is an irrational rotation and d is a distance invariant under rotations, then any neighborhood $\{(x, y) : d(x, y) < r\}$ of the diagonal is invariant under $f_2$.

The next result shows that this type of phenomenon cannot occur in the
category of weak mixing systems:

Proposition 7.1.11. The following conditions are equivalent:


(i) (f , μ) is weak mixing;
(ii) (f2 , μ2 ) is weak mixing;
(iii) (f2 , μ2 ) is ergodic.

Proof. To prove that (i) implies (ii), consider any measurable sets A, B, C, D in M. Then
$$\begin{aligned}
&\big|\mu_2\big(f_2^{-j}(A \times B) \cap (C \times D)\big) - \mu_2(A \times B)\mu_2(C \times D)\big| \\
&\quad = \big|\mu\big(f^{-j}(A) \cap C\big)\,\mu\big(f^{-j}(B) \cap D\big) - \mu(A)\mu(B)\mu(C)\mu(D)\big| \\
&\quad \le \big|\mu\big(f^{-j}(A) \cap C\big) - \mu(A)\mu(C)\big| + \big|\mu\big(f^{-j}(B) \cap D\big) - \mu(B)\mu(D)\big|.
\end{aligned}$$
Therefore, the hypothesis (i) implies that
$$\lim_n \frac{1}{n}\sum_{j=0}^{n-1} \big|\mu_2\big(f_2^{-j}(A \times B) \cap (C \times D)\big) - \mu_2(A \times B)\mu_2(C \times D)\big| = 0.$$
It follows that
$$\lim_n \frac{1}{n}\sum_{j=0}^{n-1} \big|\mu_2\big(f_2^{-j}(X) \cap Y\big) - \mu_2(X)\mu_2(Y)\big| = 0$$
for any X, Y in the algebra generated by the products E × F of measurable subsets of M, that is, the algebra formed by the finite pairwise disjoint unions of such products. Since this algebra generates the σ-algebra of measurable subsets of M × M, we may use Lemma 7.1.9 to conclude that $(f_2, \mu_2)$ is weak mixing.
It is immediate that (ii) implies (iii), since every weak mixing system is ergodic. To prove that (iii) implies (i), observe that
$$\frac{1}{n}\sum_{j=0}^{n-1} \big(\mu\big(f^{-j}(A) \cap B\big) - \mu(A)\mu(B)\big)^2 = \frac{1}{n}\sum_{j=0}^{n-1} \Big[\mu\big(f^{-j}(A) \cap B\big)^2 - 2\mu(A)\mu(B)\,\mu\big(f^{-j}(A) \cap B\big) + \mu(A)^2\mu(B)^2\Big].$$
The right-hand side may be rewritten as
$$\frac{1}{n}\sum_{j=0}^{n-1} \Big[\mu_2\big(f_2^{-j}(A \times A) \cap (B \times B)\big) - \mu_2(A \times A)\mu_2(B \times B)\Big] - 2\mu(A)\mu(B)\,\frac{1}{n}\sum_{j=0}^{n-1} \Big[\mu\big(f^{-j}(A) \cap B\big) - \mu(A)\mu(B)\Big].$$
Since $(f_2, \mu_2)$ is ergodic and, consequently, (f, μ) is also ergodic, part (ii) of Proposition 4.1.4 gives that both terms in this expression converge to zero. In this way, we conclude that
$$\lim_n \frac{1}{n}\sum_{j=0}^{n-1} \big(\mu\big(f^{-j}(A) \cap B\big) - \mu(A)\mu(B)\big)^2 = 0$$
for any measurable sets A, B ⊂ M. Using Exercise 7.1.2, we deduce that (f, μ) is weak mixing.

7.1.3 Spectral characterization


In this section we discuss equivalent formulations of the notions of mixing and
weak mixing systems, in terms of the Koopman operator.
Proposition 7.1.12. The following conditions are equivalent:


(i) (f , μ) is mixing.
(ii) There exist p, q ∈ [1, ∞] with 1/p + 1/q = 1 such that Cn (ϕ, ψ) → 0 for
any ϕ ∈ Lp (μ) and ψ ∈ Lq (μ).
(iii) The condition in part (ii) holds for ϕ in some dense subset of Lp (μ) and
ψ in some dense subset of Lq (μ).

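Proof. Condition (i) is the special case of (ii) for characteristic functions. Since the correlations $(\varphi, \psi) \mapsto C_n(\varphi, \psi)$ are bilinear functions, condition (i) implies that $C_n(\varphi, \psi) \to 0$ for any simple functions φ and ψ. This implies (iii), since the simple functions form a dense subset of $L^r(\mu)$ for any r ≥ 1.
To show that (iii) implies (ii), let us begin by observing that the correlations $C_n(\varphi, \psi)$ are equicontinuous functions of φ and ψ. Indeed, given $\varphi_1, \varphi_2 \in L^p(\mu)$ and $\psi_1, \psi_2 \in L^q(\mu)$, the Hölder inequality (Theorem A.5.5), together with the invariance of μ (which gives $\|\varphi \circ f^n\|_p = \|\varphi\|_p$), gives that
$$\Big|\int (\varphi_1 \circ f^n)\psi_1 \, d\mu - \int (\varphi_2 \circ f^n)\psi_2 \, d\mu\Big| \le \|\varphi_1 - \varphi_2\|_p \|\psi_1\|_q + \|\varphi_2\|_p \|\psi_1 - \psi_2\|_q.$$
Moreover,
$$\Big|\int \varphi_1 \, d\mu \int \psi_1 \, d\mu - \int \varphi_2 \, d\mu \int \psi_2 \, d\mu\Big| \le \|\varphi_1 - \varphi_2\|_1 \|\psi_1\|_1 + \|\varphi_2\|_1 \|\psi_1 - \psi_2\|_1.$$
Adding these inequalities, and noting that $\|\cdot\|_1 \le \|\cdot\|_r$ for every r ≥ 1, we get that
$$\big|C_n(\varphi_1, \psi_1) - C_n(\varphi_2, \psi_2)\big| \le 2\|\varphi_1 - \varphi_2\|_p \|\psi_1\|_q + 2\|\varphi_2\|_p \|\psi_1 - \psi_2\|_q \tag{7.1.6}$$
for every n ≥ 1. Then, given ε > 0 and any $\varphi \in L^p(\mu)$ and $\psi \in L^q(\mu)$, we may take φ′ and ψ′ in the dense subsets mentioned in the hypothesis such that
$$\|\varphi - \varphi'\|_p < \varepsilon \quad \text{and} \quad \|\psi - \psi'\|_q < \varepsilon.$$
In particular, $\|\varphi'\|_p < \|\varphi\|_p + \varepsilon$ and $\|\psi'\|_q < \|\psi\|_q + \varepsilon$. Then, (7.1.6) gives that
$$|C_n(\varphi, \psi)| \le |C_n(\varphi', \psi')| + 2\varepsilon\big(\|\varphi\|_p + \|\psi\|_q + 2\varepsilon\big) \quad \text{for every } n.$$
Moreover, by hypothesis, $|C_n(\varphi', \psi')| < \varepsilon$ for every n sufficiently large. Since ε is arbitrary, these two inequalities imply that $C_n(\varphi, \psi)$ converges to zero when n → ∞. This proves property (ii).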

The same argument proves the following version of Proposition 7.1.12 for
the weak mixing property:
Proposition 7.1.13. The following conditions are equivalent:
(i) (f , μ) is weak mixing.

(ii) There exist p, q ∈ [1, ∞] with 1/p + 1/q = 1 such that $\frac{1}{n}\sum_{j=1}^{n} |C_j(\varphi, \psi)|$ converges to 0 for any $\varphi \in L^p(\mu)$ and $\psi \in L^q(\mu)$.
(iii) The condition in part (ii) holds for ϕ in some dense subset of Lp (μ) and
ψ in some dense subset of Lq (μ).
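In the case p = q = 2, we may express the correlations in terms of the inner product $\langle \cdot, \cdot \rangle$ in the Hilbert space $L^2(\mu)$:
$$C_n(\varphi, \psi) = \big\langle U_f^n \varphi - \langle \varphi, 1 \rangle, \psi \big\rangle \quad \text{for every } \varphi, \psi \in L^2(\mu).$$
Therefore, Proposition 7.1.12 gives that (f, μ) is mixing if and only if
$$\lim_n \big\langle U_f^n \varphi - \langle \varphi, 1 \rangle, \psi \big\rangle = 0 \quad \text{for every } \varphi, \psi \in L^2(\mu), \tag{7.1.7}$$
and Proposition 7.1.13 gives that (f, μ) is weak mixing if and only if
$$\lim_n \frac{1}{n}\sum_{j=1}^{n} \big|\big\langle U_f^j \varphi - \langle \varphi, 1 \rangle, \psi \big\rangle\big| = 0 \quad \text{for every } \varphi, \psi \in L^2(\mu). \tag{7.1.8}$$
The condition (7.1.7) means that $U_f^n \varphi$ converges weakly to $\langle \varphi, 1 \rangle = \int \varphi \, d\mu$, while (7.1.8) is a Cesàro version of that assertion. Compare both conditions with the characterization of ergodicity in (4.1.7).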

Corollary 7.1.14. Let $f: M \to M$ be a mixing transformation relative to some invariant probability measure μ. Let ν be any probability measure on M, absolutely continuous with respect to μ. Then $f^n_* \nu$ converges pointwise to μ, that is, $\nu(f^{-n}(B)) \to \mu(B)$ for every measurable set B ⊂ M.
Proof. Let $\varphi = \mathcal{X}_B$ and $\psi = d\nu/d\mu$. Note that $\varphi \in L^\infty(\mu)$ and $\psi \in L^1(\mu)$. Hence, by Proposition 7.1.12 (with p = ∞ and q = 1),
$$\int (\mathcal{X}_B \circ f^n)\,\frac{d\nu}{d\mu}\,d\mu = \int (U_f^n \varphi)\,\psi \, d\mu \to \int \varphi \, d\mu \int \psi \, d\mu = \int \mathcal{X}_B \, d\mu \int \frac{d\nu}{d\mu}\,d\mu.$$
The sequence on the left-hand side coincides with $\int (\mathcal{X}_B \circ f^n)\,d\nu = \nu(f^{-n}(B))$. The right-hand side is equal to $\mu(B)\int 1 \, d\nu = \mu(B)$.
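Corollary 7.1.14 can be seen at work for the map $g(x) = 2x$ of Example 7.1.4, which is mixing for the Lebesgue measure $m$. The Python sketch below takes, as an arbitrary example, the measure ν with density $2x$ on $[0, 1)$ and computes $\nu(g^{-n}(B))$ exactly. Summing over the $k^n$ intervals that make up $g^{-n}([a, b))$ gives the closed form $\nu(g^{-n}([a, b))) = (b - a)\big(1 + (a + b - 1)/k^n\big)$, which indeed tends to $m([a, b)) = b - a$. The function names are hypothetical.

```python
from fractions import Fraction


def nu_interval(lo, hi):
    """nu([lo, hi)) for the probability measure nu with density 2x on [0, 1)."""
    return hi * hi - lo * lo


def nu_preimage(k, n, a, b):
    """nu(g^{-n}([a, b))) for g(x) = kx mod 1, using that g^{-n}([a, b)) is the
    union of the intervals [(a + j)/k**n, (b + j)/k**n), j = 0, ..., k**n - 1."""
    a, b = Fraction(a), Fraction(b)
    N = k ** n
    return sum((nu_interval((a + j) / N, (b + j) / N) for j in range(N)), Fraction(0))


half = Fraction(1, 2)
print([nu_preimage(2, n, 0, half) for n in range(6)])  # 1/4, 3/8, 7/16, ... -> m(B) = 1/2
```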

7.1.4 Exercises
7.1.1. Show that (f , μ) is mixing if and only if μ(f −n (A) ∩ A) → μ(A)2 for every
measurable set A.
7.1.2. Let $(a_n)_n$ be a bounded sequence of real numbers. Prove that
$$\lim_n \frac{1}{n}\sum_{j=1}^{n} |a_j| = 0$$
if and only if there exists E ⊂ N with density zero at infinity (that is, with $\lim_n \frac{1}{n}\#\big(E \cap \{0, \dots, n-1\}\big) = 0$) such that the restriction of $(a_n)_n$ to the complement of E converges to zero when n → ∞. Deduce that
$$\lim_n \frac{1}{n}\sum_{j=1}^{n} |a_j| = 0 \iff \lim_n \frac{1}{n}\sum_{j=1}^{n} a_j^2 = 0.$$

7.1.3. Prove that if μ is weak mixing for f then μ is weak mixing for every iterate f k ,
k ≥ 1.
7.1.4. Show that if no eigenvalue of $A \in SL(d, \mathbb{Z})$ is a root of unity then the linear endomorphism $f_A: \mathbb{T}^d \to \mathbb{T}^d$ induced by A is mixing, with respect to the Haar measure.
7.1.5. Let $f: M \to M$ be a measurable transformation in a metric space. Check that an invariant probability measure μ is mixing if and only if $(f^n_* \eta)_n$ converges to μ in the weak* topology for every probability measure η absolutely continuous with respect to μ.
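7.1.6. (Multiple von Neumann ergodic theorem). Show that if (f, μ) is weak mixing then
$$\frac{1}{N}\sum_{n=0}^{N-1} (\varphi_1 \circ f^n) \cdots (\varphi_k \circ f^{kn}) \to \int \varphi_1 \, d\mu \cdots \int \varphi_k \, d\mu \quad \text{in } L^2(\mu),$$
for any bounded measurable functions $\varphi_1, \dots, \varphi_k$.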

7.2 Markov shifts


In this section we introduce an important class of systems that generalizes
the notion of Bernoulli shift. As explained previously, Bernoulli shifts model
sequences of identical experiments such that the outcome of each experiment is
independent of all the others. In the definition of Markov shifts we weaken this
independence condition: we allow each outcome to depend on the preceding
one, but not the others. More generally, Markov shifts may be used to model the
so-called finite memory processes, that is, sequences of experiments for which
there exists k ≥ 1 such that the outcome of each experiment depends only on
the outcomes of the k previous experiments. In this regard, see Exercise 7.2.4.
To define a Markov shift, let $(X, \mathcal{A})$ be a measurable space and $\Sigma = X^{\mathbb{N}}$ (or $\Sigma = X^{\mathbb{Z}}$) be the space of all sequences in X, endowed with the product σ-algebra. Let us consider the shift map
$$\sigma: \Sigma \to \Sigma, \quad \sigma\big((x_n)_n\big) = (x_{n+1})_n.$$
Let us be given a family $\{P(x, \cdot) : x \in X\}$ of probability measures on X that depend measurably on the point x. They will be called transition probabilities: for each measurable set E ⊂ X, the number P(x, E) is meant to represent the probability that $x_{n+1} \in E$, given that $x_n = x$. A probability measure p in X is called a stationary measure, relative to the family of transition probabilities, if it satisfies
$$\int P(x, E)\,dp(x) = p(E) \quad \text{for every measurable set } E \subset X. \tag{7.2.1}$$
Heuristically, this means that, relative to p, the probability of the event $x_{n+1} \in E$ is equal to the probability of the event $x_n \in E$.
Fix any stationary measure p (assuming it exists) and then define
$$\mu\big([m; A_m, \dots, A_n]\big) = \int_{A_m} dp(x_m) \int_{A_{m+1}} dP(x_m, x_{m+1}) \cdots \int_{A_n} dP(x_{n-1}, x_n) \tag{7.2.2}$$
for every cylinder $[m; A_m, \dots, A_n]$ of Σ. One can show (check Exercise 7.2.1)
that this function extends to a probability measure in the σ -algebra generated
by the cylinders. This probability measure is invariant under the shift map σ ,
because the right-hand side of (7.2.2) does not depend on m. Every probability
measure μ obtained in this way is called a Markov measure; moreover, the
system (σ , μ) is called a Markov shift.
Example 7.2.1 (Bernoulli measure). Suppose that P(x, ·) does not depend on x, that is, that there exists a probability measure ν on X such that P(x, ·) = ν for every x ∈ X. Then
$$\int P(x, E)\,dp(x) = \int \nu(E)\,dp(x) = \nu(E)$$
for every probability measure p and every measurable set E ⊂ X. Therefore, there exists exactly one stationary measure, namely p = ν. The definition in (7.2.2) gives
$$\mu\big([m; A_m, \dots, A_n]\big) = \int_{A_m} d\nu(x_m) \int_{A_{m+1}} d\nu(x_{m+1}) \cdots \int_{A_n} d\nu(x_n) = \nu(A_m)\,\nu(A_{m+1}) \cdots \nu(A_n).$$

Example 7.2.2. Suppose that the set X is finite, say $X = \{1, \dots, d\}$ for some d ≥ 2. Any family of transition probabilities P(x, ·) on X is completely characterized by the values
$$P_{i,j} = P(i, \{j\}), \quad 1 \le i, j \le d. \tag{7.2.3}$$
Moreover, a measure p on the set X is completely characterized by the values $p_i = p(\{i\})$, $1 \le i \le d$. With these notations, the definition (7.2.1) translates to
$$\sum_{i=1}^{d} p_i P_{i,j} = p_j \quad \text{for every } 1 \le j \le d. \tag{7.2.4}$$
Moreover, a Markov measure μ is determined by
$$\mu\big([m; a_m, \dots, a_n]\big) = p_{a_m} P_{a_m, a_{m+1}} \cdots P_{a_{n-1}, a_n}. \tag{7.2.5}$$
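Formula (7.2.5) is straightforward to implement. The Python sketch below is an illustration that numbers the symbols $0, \dots, d-1$ instead of $1, \dots, d$, and that uses a $2 \times 2$ stochastic matrix and stationary vector chosen by us. It also checks the two consistency properties behind the extension of (7.2.2) to a measure. Refining a cylinder by one symbol on the right uses that the rows of $P$ sum to 1, and refining on the left uses the stationarity relation (7.2.4).

```python
from fractions import Fraction as F
from itertools import product


def markov_cylinder(word, p, P):
    """mu([m; a_m, ..., a_n]) = p_{a_m} P_{a_m,a_{m+1}} ... P_{a_{n-1},a_n}, as in (7.2.5).

    `word` lists the symbols a_m, ..., a_n; the starting position m does not
    enter the formula, which is why mu is invariant under the shift."""
    measure = p[word[0]]
    for i, j in zip(word, word[1:]):
        measure *= P[i][j]
    return measure


P = [[F(1, 2), F(1, 2)],
     [F(1, 3), F(2, 3)]]
p = [F(2, 5), F(3, 5)]  # satisfies sum_i p_i P_{i,j} = p_j, that is, (7.2.4)

w = [0, 1, 1]
print(markov_cylinder(w, p, P))                               # 2/15
print(sum(markov_cylinder(w + [b], p, P) for b in range(2)))  # refine on the right: 2/15
print(sum(markov_cylinder([b] + w, p, P) for b in range(2)))  # refine on the left: 2/15
print(sum(markov_cylinder(list(u), p, P) for u in product(range(2), repeat=4)))  # 1
```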
In the remainder of this book we always restrict ourselves to finite Markov shifts, that is, to the context of Example 7.2.2. We take the set X endowed with the discrete topology and the corresponding Borel σ-algebra. Observe that the matrix
$$P = \big(P_{i,j}\big)_{1 \le i, j \le d}$$
defined by (7.2.3) satisfies the following conditions:
(i) $P_{i,j} \ge 0$ for every $1 \le i, j \le d$;
(ii) $\sum_{j=1}^{d} P_{i,j} = 1$ for every $1 \le i \le d$.
We say that P is a stochastic matrix. Conversely, any matrix that satisfies (i) and (ii) defines a family of transition probabilities on the set X. Observe also that, denoting $p = (p_1, \dots, p_d)$, the relation (7.2.4) corresponds to
$$P^* p = p, \tag{7.2.6}$$
where $P^*$ denotes the transpose of the matrix P. In other words, the stationary measures correspond precisely to the eigenvectors of the transposed matrix for the eigenvalue 1 that have non-negative coefficients (normalized so that the coefficients add up to 1). Using the following classical result, one can show that such eigenvectors always exist:

Theorem 7.2.3 (Perron–Frobenius). Let A be a d × d matrix with non-negative coefficients. Then there exists λ ≥ 0 and some vector v ≠ 0 with non-negative coefficients such that Av = λv and λ ≥ |γ| for every eigenvalue γ of A.
If A has some power whose coefficients are all positive then λ > 0 and it has some eigenvector v whose coefficients are all positive. In fact, λ > |γ| for any other eigenvalue γ of A. Moreover, the eigenvalue λ has multiplicity 1 and it is the only eigenvalue of A having some eigenvector with non-negative coefficients.

A proof of the Perron–Frobenius theorem may be found in Meyers [Mey00], for example. Applying this theorem to the matrix $A = P^*$, we conclude that there exist λ ≥ 0 and p ≠ 0 with $p_i \ge 0$ for every i, such that
$$\sum_{i=1}^{d} p_i P_{i,j} = \lambda p_j \quad \text{for every } 1 \le j \le d.$$
Adding over j = 1, ..., d we get that
$$\sum_{j=1}^{d}\sum_{i=1}^{d} p_i P_{i,j} = \lambda \sum_{j=1}^{d} p_j.$$
Using property (ii) of the stochastic matrix, the left-hand side of this equality may be written as
$$\sum_{i=1}^{d} p_i \sum_{j=1}^{d} P_{i,j} = \sum_{i=1}^{d} p_i.$$
Comparing the last two equalities and recalling that the sum of the coefficients of p is a positive number, we conclude that λ = 1. This proves our claim that there always exist vectors p ≠ 0 satisfying (7.2.6).
When $P^n$ has positive coefficients for some n ≥ 1, it follows from Theorem 7.2.3 that the eigenvector is unique up to scaling, and it may be chosen with positive coefficients.
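When the stationary vector is unique, it can be computed exactly by solving the linear system (7.2.6) together with the normalization $p_1 + \dots + p_d = 1$. The Python sketch below does this by Gauss–Jordan elimination over the rationals. It is a minimal illustration rather than an efficient solver, and the function name is hypothetical.

```python
from fractions import Fraction as F


def stationary_vector(P):
    """Solve P* p = p together with p_1 + ... + p_d = 1 by exact Gauss-Jordan elimination.

    Assumes the solution is unique (for instance, when some power of P has only
    positive entries); raises ValueError when the linear system is singular."""
    d = len(P)
    # Equation j of (P* - I) p = 0 reads sum_i (P[i][j] - delta_ij) p_i = 0.
    rows = [[F(P[i][j]) - (1 if i == j else 0) for i in range(d)] + [F(0)]
            for j in range(d)]
    # These d equations add up to 0 = 0, so one is redundant: use normalization instead.
    rows[-1] = [F(1)] * d + [F(1)]
    for col in range(d):
        pivot = next((r for r in range(col, d) if rows[r][col] != 0), None)
        if pivot is None:
            raise ValueError("the stationary vector is not unique")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        pivot_value = rows[col][col]
        rows[col] = [x / pivot_value for x in rows[col]]
        for r in range(d):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return [rows[j][d] for j in range(d)]


print(stationary_vector([[F(1, 2), F(1, 2)], [F(1, 3), F(2, 3)]]))      # [2/5, 3/5]
print(stationary_vector([[0, 1, 0], [0, 0, 1], [F(1, 2), 0, F(1, 2)]]))  # [1/4, 1/4, 1/2]
```

Both example matrices have a power with positive coefficients. For the second one, with rows (0, 1, 0), (0, 0, 1) and (1/2, 0, 1/2), the fifth power already has this property. So the theorem guarantees that the printed answers are the only stationary vectors.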
Example 7.2.4. In general, p is not unique and it may also happen that there is no eigenvector with positive coefficients. For example, consider:
$$P = \begin{pmatrix}
1-a & a & 0 & 0 & 0 \\
b & 1-b & 0 & 0 & 0 \\
0 & 0 & 1-c & c & 0 \\
0 & 0 & d & 1-d & 0 \\
e & 0 & 0 & 0 & 1-e
\end{pmatrix}$$
where a, b, c, d, e ∈ (0, 1). A vector $p = (p_1, p_2, p_3, p_4, p_5)$ satisfies $P^* p = p$ if and only if $ap_1 = bp_2$ and $cp_3 = dp_4$ and $p_5 = 0$. Therefore, the eigenspace has dimension 2 and no eigenvector has positive coefficients.
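The claims of Example 7.2.4 can be spot-checked with exact arithmetic. The Python sketch below fixes one arbitrary choice of a, b, c, d, e in (0, 1); this choice is our own assumption, and any other would do. It builds the matrix and checks two things. The vectors (b, a, 0, 0, 0) and (0, 0, d, c, 0), which satisfy $ap_1 = bp_2$ and $cp_3 = dp_4$, are fixed by $P^*$. The vector (0, 0, 0, 0, 1) is not. The helper names are hypothetical.

```python
from fractions import Fraction as F


def example_matrix(a, b, c, d, e):
    """The 5x5 stochastic matrix of Example 7.2.4."""
    return [[1 - a, a, 0, 0, 0],
            [b, 1 - b, 0, 0, 0],
            [0, 0, 1 - c, c, 0],
            [0, 0, d, 1 - d, 0],
            [e, 0, 0, 0, 1 - e]]


def transpose_times(P, p):
    """The vector P* p, whose j-th entry is sum_i p_i P_{i,j}."""
    n = len(P)
    return [sum(p[i] * P[i][j] for i in range(n)) for j in range(n)]


a, b, c, d, e = F(1, 2), F(1, 4), F(1, 3), F(1, 5), F(1, 2)
P = example_matrix(a, b, c, d, e)
q1 = [b, a, 0, 0, 0]  # a p_1 = b p_2
q2 = [0, 0, d, c, 0]  # c p_3 = d p_4
print(transpose_times(P, q1) == q1, transpose_times(P, q2) == q2)  # True True
print(transpose_times(P, [0, 0, 0, 0, 1]))  # not fixed: p_5 must vanish
```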
On the other hand, suppose that p is such that $p_i = 0$ for some i. Let μ be the corresponding Markov measure and let $\Sigma_i = (X \setminus \{i\})^{\mathbb{N}}$ (or $\Sigma_i = (X \setminus \{i\})^{\mathbb{Z}}$). Then $\mu(\Sigma_i) = 1$, since $\mu([n; i]) = p_i = 0$ for every n. This means that we may eliminate the symbol i, and still have a system that is equivalent to the original one. Therefore, up to removing from the set X a certain number of superfluous symbols, we may always take the eigenvector p to have positive coefficients.
Denote by $\Sigma_P$ the set of all sequences $(x_n)_n \in \Sigma$ satisfying
$$P_{x_n, x_{n+1}} > 0 \quad \text{for every } n, \tag{7.2.7}$$
that is, such that all the transitions are “allowed” by P. It is clear from the definition that $\Sigma_P$ is invariant under the shift map σ. The transformations $\sigma: \Sigma_P \to \Sigma_P$ constructed in this way are called shifts of finite type and will be studied in more detail in Section 10.2.2.
Lemma 7.2.5. The set $\Sigma_P$ is closed in Σ and, given any solution of $P^* p = p$ with positive coefficients, the support of the corresponding Markov measure μ coincides with $\Sigma_P$.
Proof. Let $x^k = (x^k_n)_n$, $k \in \mathbb{N}$ be any sequence in $\Sigma_P$ and suppose that it converges in Σ to some $x = (x_n)_n$. By the definition of the topology in Σ, this means that for every n there exists $k_n \ge 1$ such that $x^k_n = x_n$ for every $k \ge k_n$. So, given any n, taking $k \ge \max\{k_n, k_{n+1}\}$ we conclude that $P_{x_n, x_{n+1}} = P_{x^k_n, x^k_{n+1}} > 0$. This shows that $x \in \Sigma_P$ and that proves the first part of the lemma.
To prove the second part, recall that the cylinders $[m; x_m, \dots, x_n]$ form a basis of neighborhoods of any $x = (x_n)_n$ in Σ. If $x \in \Sigma_P$ then
$$\mu\big([m; x_m, \dots, x_n]\big) = p_{x_m} P_{x_m, x_{m+1}} \cdots P_{x_{n-1}, x_n} > 0$$
for every cylinder and, thus, $x \in \operatorname{supp}\mu$. If $x \notin \Sigma_P$ then there exists n such that $P_{x_n, x_{n+1}} = 0$. In that case, $\mu([n; x_n, x_{n+1}]) = 0$ and so $x \notin \operatorname{supp}\mu$.

Example 7.2.6. There are three possibilities for the support of a Markov measure in Example 7.2.4. If $p = (p_1, p_2, 0, 0, 0)$ with $p_1, p_2 > 0$ then we may eliminate the symbols 3, 4, 5. All the sequences of symbols 1, 2 are admissible. Hence $\operatorname{supp}\mu = \{1, 2\}^{\mathbb{N}}$. Analogously, if $p = (0, 0, p_3, p_4, 0)$ with $p_3, p_4 > 0$ then $\operatorname{supp}\mu = \{3, 4\}^{\mathbb{N}}$. In all the other cases, $p = (p_1, p_2, p_3, p_4, 0)$ with $p_1, p_2, p_3, p_4 > 0$. Eliminating the symbol 5, we get that the set of admissible sequences is
$$\Sigma_P = \{1, 2\}^{\mathbb{N}} \cup \{3, 4\}^{\mathbb{N}}.$$
Both sets in this union are invariant and have positive measure. So, in this case the Markov shift (σ, μ) is not ergodic. But it follows from the theory presented in the next section that in the previous two cases the system (σ, μ) is indeed ergodic.

In the next lemma we collect some simple properties of stochastic matrices that will be useful in what follows:
Lemma 7.2.7. Let P be a stochastic matrix and $p = (p_1, \dots, p_d)$ be a solution of $P^* p = p$. For every n ≥ 0, denote by $P^n_{i,j}$, $1 \le i, j \le d$ the coefficients of the matrix $P^n$. Then:
(i) $\sum_{j=1}^{d} P^n_{i,j} = 1$ for every $1 \le i \le d$ and every n ≥ 1;
(ii) $\sum_{i=1}^{d} p_i P^n_{i,j} = p_j$ for every $1 \le j \le d$ and every n ≥ 1;
(iii) the hyperplane $H = \{(h_1, \dots, h_d) : h_1 + \cdots + h_d = 0\}$ is invariant under the matrix $P^*$.
Proof. Condition (ii) in the definition of stochastic matrix may be written as Pu = u, with u = (1, ..., 1). Then $P^n u = u$ for every n ≥ 1. This is just another way of writing claim (i). Analogously, $P^* p = p$ implies that $(P^*)^n p = p$ for every n ≥ 1, which is another way of writing claim (ii). Observe that H is the orthogonal complement of the vector u. Since u is invariant under P, it follows that H is invariant under the transposed matrix $P^*$, as claimed in (iii): indeed, $\langle P^* h, u \rangle = \langle h, Pu \rangle = \langle h, u \rangle = 0$ for every $h \in H$.

7.2.1 Ergodicity
In this section we always take $p = (p_1, \dots, p_d)$ to be a solution of $P^* p = p$ with $p_i > 0$ for every i, normalized in such a way that $\sum_i p_i = 1$. Let μ be the corresponding Markov measure. We want to understand which conditions the stochastic matrix P must satisfy for the system (σ, μ) to be ergodic.
We say that a stochastic matrix P is irreducible if for every $1 \le i, j \le d$ there exists n ≥ 0 such that $P^n_{i,j} > 0$. In other words, P is irreducible if any outcome i may be followed by any outcome j, after a certain number n of steps which may depend on i and j.
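Irreducibility only depends on which entries of $P$ are positive, so it can be decided by a graph search. Indeed, $P^n_{i,j} > 0$ exactly when there is a path of length $n$ from $i$ to $j$ in the directed graph with an edge $i \to j$ whenever $P_{i,j} > 0$. The Python sketch below uses breadth-first search; it is an illustration with a hypothetical function name.

```python
from collections import deque
from fractions import Fraction as F


def is_irreducible(P):
    """Check irreducibility via the directed graph with an edge i -> j whenever P[i][j] > 0.

    P^n_{i,j} > 0 exactly when some path of length n leads from i to j (n = 0 covers
    i = j), so P is irreducible iff every state can reach every other state."""
    d = len(P)
    for start in range(d):
        seen = {start}
        queue = deque([start])
        while queue:  # breadth-first search from `start`
            i = queue.popleft()
            for j in range(d):
                if P[i][j] > 0 and j not in seen:
                    seen.add(j)
                    queue.append(j)
        if len(seen) < d:
            return False
    return True


print(is_irreducible([[0, 1], [1, 0]]))              # True
print(is_irreducible([[F(1, 2), F(1, 2)], [0, 1]]))  # False: state 1 never leaves
```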

Theorem 7.2.8. The Markov shift (σ , μ) is ergodic if and only if the matrix P
is irreducible.

The remainder of the present section is dedicated to the proof of this theorem. We start by proving the following useful estimate:
Lemma 7.2.9. Let $A = [m; a_m, \dots, a_q]$ and $B = [r; b_r, \dots, b_s]$ be cylinders of Σ with r > q. Then
$$\mu(A \cap B) = \mu(A)\mu(B)\,\frac{P^{r-q}_{a_q, b_r}}{p_{b_r}}.$$
Proof. We may write A ∩ B as a disjoint union
$$A \cap B = \bigcup_{x} [m; a_m, \dots, a_q, x_{q+1}, \dots, x_{r-1}, b_r, \dots, b_s],$$
over all $x = (x_{q+1}, \dots, x_{r-1}) \in X^{r-q-1}$. Then,
$$\begin{aligned}
\mu(A \cap B) &= \sum_{x} p_{a_m} P_{a_m, a_{m+1}} \cdots P_{a_{q-1}, a_q} P_{a_q, x_{q+1}} \cdots P_{x_{r-1}, b_r} P_{b_r, b_{r+1}} \cdots P_{b_{s-1}, b_s} \\
&= \mu(A) \Big(\sum_{x} P_{a_q, x_{q+1}} \cdots P_{x_{r-1}, b_r}\Big) \frac{1}{p_{b_r}}\,\mu(B).
\end{aligned}$$
The sum in this last expression is equal to $P^{r-q}_{a_q, b_r}$. Therefore,
$$\mu(A \cap B) = \mu(A)\mu(B)\,P^{r-q}_{a_q, b_r} / p_{b_r},$$
as stated.
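Lemma 7.2.9 can be checked by brute force on a small example. The Python sketch below works with a $3 \times 3$ stochastic matrix and its stationary vector chosen by us, and numbers the symbols from 0. It computes $\mu(A \cap B)$ by summing over the free symbols $x_{q+1}, \dots, x_{r-1}$, as in the proof, and compares the result with the closed formula of the lemma. All helper names are hypothetical.

```python
from fractions import Fraction as F
from itertools import product


def cylinder(word, p, P):
    """mu of the cylinder with consecutive symbols `word`, as in (7.2.5)."""
    measure = p[word[0]]
    for i, j in zip(word, word[1:]):
        measure *= P[i][j]
    return measure


def matrix_power(P, k):
    """P^k by repeated multiplication, with exact arithmetic."""
    d = len(P)
    result = [[F(int(i == j)) for j in range(d)] for i in range(d)]
    for _ in range(k):
        result = [[sum(result[i][l] * P[l][j] for l in range(d)) for j in range(d)]
                  for i in range(d)]
    return result


def intersection_by_summation(a_word, m, b_word, r, p, P):
    """mu(A ∩ B) for A = [m; a_word], B = [r; b_word], r > q: sum over x_{q+1}, ..., x_{r-1}."""
    q = m + len(a_word) - 1
    return sum(cylinder(list(a_word) + list(x) + list(b_word), p, P)
               for x in product(range(len(P)), repeat=r - q - 1))


def intersection_by_formula(a_word, m, b_word, r, p, P):
    """The right-hand side of Lemma 7.2.9."""
    q = m + len(a_word) - 1
    a_q, b_r = a_word[-1], b_word[0]
    return (cylinder(a_word, p, P) * cylinder(b_word, p, P)
            * matrix_power(P, r - q)[a_q][b_r] / p[b_r])


P = [[0, 1, 0], [0, 0, 1], [F(1, 2), 0, F(1, 2)]]
p = [F(1, 4), F(1, 4), F(1, 2)]
A_word, B_word = [0, 1], [2, 0, 1]  # A = [0; 0, 1] and B = [5; 2, 0, 1], so q = 1, r = 5
print(intersection_by_summation(A_word, 0, B_word, 5, p, P),
      intersection_by_formula(A_word, 0, B_word, 5, p, P))  # both 5/64
```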

Lemma 7.2.10. A stochastic matrix P is irreducible if and only if
$$\lim_n \frac{1}{n}\sum_{l=0}^{n-1} P^l_{i,j} = p_j \quad \text{for every } 1 \le i, j \le d. \tag{7.2.8}$$
Proof. Assume that (7.2.8) holds. Recall that $p_j > 0$ for every j. Then, given any $1 \le i, j \le d$, we have $P^l_{i,j} > 0$ for infinitely many values of l. In particular, P is irreducible.
To prove the converse, consider A = [0; i] and B = [0; j]. By Lemma 7.2.9,
$$\frac{1}{n}\sum_{l=0}^{n-1} \mu\big(A \cap \sigma^{-l}(B)\big) = \mu(A)\mu(B)\,\frac{1}{p_j}\,\frac{1}{n}\sum_{l=0}^{n-1} P^l_{i,j}.$$
According to Exercise 4.1.2, the left-hand side converges when n → ∞. Therefore,
$$Q_{i,j} = \lim_n \frac{1}{n}\sum_{l=0}^{n-1} P^l_{i,j}$$
exists for every $1 \le i, j \le d$. Consider the matrix $Q = (Q_{i,j})_{i,j}$, that is,
$$Q = \lim_n \frac{1}{n}\sum_{l=0}^{n-1} P^l. \tag{7.2.9}$$
Using Lemma 7.2.7(ii) and taking the limit when n → ∞, we get that
$$\sum_{i=1}^{d} p_i Q_{i,j} = p_j \quad \text{for every } 1 \le j \le d. \tag{7.2.10}$$
Observe also that, given any k ≥ 1,
$$P^k Q = \lim_n \frac{1}{n}\sum_{l=0}^{n-1} P^{k+l} = \lim_n \frac{1}{n}\sum_{l=0}^{n-1} P^l = Q. \tag{7.2.11}$$
It follows that $Q_{i,j}$ does not depend on i. Indeed, suppose that there exist r and s such that $Q_{r,j} < Q_{s,j}$. Of course, we may choose s in such a way that $Q_{s,j} = \max_i Q_{i,j}$. Since P is irreducible, there exists k such that $P^k_{s,r} > 0$. Hence, using (7.2.11) followed by Lemma 7.2.7(i),
$$Q_{s,j} = \sum_{i=1}^{d} P^k_{s,i} Q_{i,j} < \Big(\sum_{i=1}^{d} P^k_{s,i}\Big) Q_{s,j} = Q_{s,j},$$
which is a contradiction. This contradiction proves that $Q_{i,j}$ does not depend on i, as claimed. Write $Q_j = Q_{i,j}$ for any i. The property (7.2.10) gives that
$$p_j = \sum_{i=1}^{d} Q_{i,j} p_i = Q_j \Big(\sum_{i=1}^{d} p_i\Big) = Q_j,$$
for every j. This finishes the proof of the lemma.
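The matrix $P$ with rows (0, 1) and (1, 0) is irreducible, with $p = (1/2, 1/2)$, but its powers alternate between the identity and $P$, so $P^l_{i,j}$ has no limit. The Python sketch below computes the Cesàro averages in (7.2.8) exactly and shows them approaching $p_j = 1/2$ all the same. It is an illustration with hypothetical helper names.

```python
from fractions import Fraction


def mat_mul(X, Y):
    """Product of two square matrices given as lists of rows."""
    d = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(d)) for j in range(d)] for i in range(d)]


def cesaro_powers(P, n):
    """(1/n) * (P^0 + P^1 + ... + P^(n-1)), computed exactly."""
    d = len(P)
    power = [[Fraction(int(i == j)) for j in range(d)] for i in range(d)]  # P^0 = I
    total = [[Fraction(0)] * d for _ in range(d)]
    for _ in range(n):
        total = [[total[i][j] + power[i][j] for j in range(d)] for i in range(d)]
        power = mat_mul(power, P)
    return [[x / n for x in row] for row in total]


swap = [[0, 1], [1, 0]]  # irreducible, with stationary vector p = (1/2, 1/2)
print(cesaro_powers(swap, 10))  # every entry equals p_j = 1/2
print(cesaro_powers(swap, 11))  # [[6/11, 5/11], [5/11, 6/11]], still tending to 1/2
```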

Proof of Theorem 7.2.8. Suppose that μ is ergodic. Let A = [0; i] and B = [0; j].
By Proposition 4.1.4,
1  
n−1
lim μ A ∩ σ −l (B) = μ(A)μ(B) = pi pj . (7.2.12)
n n
l=0

On the other hand, by Lemma 7.2.9, we have that μ(A∩σ −l (B)) = pi Pli,j . Using
this identity in (7.2.12) and dividing both sides by pi we find that
1 l
n−1
lim Pi,j = pj .
n n
l=0

Note that j is arbitrary. So, by Lemma 7.2.10, this proves that P is irreducible.
Now suppose that the matrix P is irreducible. We want to conclude that μ is ergodic. According to Corollary 4.1.5, it is enough to prove that
$$\lim_n \frac{1}{n}\sum_{l=0}^{n-1}\mu\big(A\cap\sigma^{-l}(B)\big) = \mu(A)\mu(B) \tag{7.2.13}$$
for any A and B in the algebra generated by the cylinders. Since the elements of this algebra are the finite pairwise disjoint unions of cylinders, it suffices to consider the case when A and B are cylinders, say $A = [m; a_m, \dots, a_q]$ and $B = [r; b_r, \dots, b_s]$. Observe also that the validity of (7.2.13) is not affected if one replaces B by some pre-image $\sigma^{-j}(B)$. So, it is no restriction to suppose that r > q. Then, by Lemma 7.2.9,
$$\frac{1}{n}\sum_{l=0}^{n-1}\mu\big(A\cap\sigma^{-l}(B)\big) = \mu(A)\mu(B)\,\frac{1}{p_{b_r}}\,\frac{1}{n}\sum_{l=0}^{n-1}P^{r-q+l}_{a_q,b_r}$$

for every n. By Lemma 7.2.10,
$$\lim_n \frac{1}{n}\sum_{l=0}^{n-1}P^{r-q+l}_{a_q,b_r} = \lim_n \frac{1}{n}\sum_{l=0}^{n-1}P^{l}_{a_q,b_r} = p_{b_r}.$$
This proves the property (7.2.13) for the cylinders A and B.

7.2.2 Mixing
In this section we characterize the Markov shifts that are mixing, in terms of
the corresponding stochastic matrix P. As before, we take p to be a normalized
solution of P∗ p = p with positive coefficients and μ to be the corresponding
Markov measure.
We say that a stochastic matrix P is aperiodic if there exists n ≥ 1 such that $P^n_{i,j} > 0$ for every 1 ≤ i, j ≤ d. In other words, P is aperiodic if some power $P^n$ has only positive coefficients. The relation between the notions of aperiodicity and irreducibility is analyzed in Exercise 7.2.6.
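Since only the zero pattern of P matters, aperiodicity can be decided by a finite computation. The sketch below relies on a classical fact that is not proved in this text, Wielandt's bound: a non-negative d × d matrix has some power with all entries positive if and only if its power of order (d − 1)² + 1 does. The function names are ours.

```python
def bool_mul(A, B):
    """Boolean matrix product: entry (i, j) is True iff A[i][k] and B[k][j] for some k."""
    d = len(A)
    return [[any(A[i][k] and B[k][j] for k in range(d)) for j in range(d)]
            for i in range(d)]

def is_aperiodic(P):
    """Test whether some power of P has only positive entries, using Wielandt's bound."""
    d = len(P)
    pattern = [[P[i][j] > 0 for j in range(d)] for i in range(d)]
    power = pattern
    for _ in range((d - 1) ** 2):   # afterwards, power is the zero pattern of P^((d-1)^2 + 1)
        power = bool_mul(power, pattern)
    return all(all(row) for row in power)

print(is_aperiodic([[0.7, 0.3], [0.4, 0.6]]))   # True: P itself is positive
print(is_aperiodic([[0, 1], [1, 0]]))           # False: the powers alternate between P and I
```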

Theorem 7.2.11. The Markov shift (σ , μ) is mixing if and only if the matrix P
is aperiodic.

For the proof of Theorem 7.2.11 we need the following fact:

Lemma 7.2.12. A stochastic matrix P is aperiodic if and only if
$$\lim_l P^l_{i,j} = p_j \quad\text{for every } 1 \le i, j \le d. \tag{7.2.14}$$

Proof. Since we assume that $p_j > 0$ for every j, it is clear that (7.2.14) implies that $P^l_{i,j} > 0$ for every i, j and every l sufficiently large.
Now suppose that P is aperiodic. Then we may apply the theorem of Perron–Frobenius (Theorem 7.2.3) to the matrix $A = P^*$. Since p is an eigenvector of A with positive coefficients, we get that λ = 1 and all the other eigenvalues of A are smaller than 1 in absolute value. By Lemma 7.2.7(iii), the hyperplane H formed by the vectors $(h_1, \dots, h_d)$ with $h_1 + \cdots + h_d = 0$ is invariant under A. It is clear that H is transverse to the direction of p. Then the decomposition
$$\mathbb{R}^d = \mathbb{R}p \oplus H \tag{7.2.15}$$
is invariant under A and the restriction of A to the hyperplane H is a contraction, meaning that its spectral radius is smaller than 1. It follows that the sequence $(A^l)_l$ converges to the projection onto the first summand of (7.2.15) along H, that is, to the matrix B characterized by Bp = p and Bh = 0 for every h ∈ H. In other words, $(P^l)_l$ converges to $B^*$. Since $Bx = (x_1 + \cdots + x_d)\,p$ for every $x \in \mathbb{R}^d$, we have
$$B_{i,j} = p_i \quad\text{for every } 1 \le i, j \le d.$$
Therefore, $\lim_l P^l_{i,j} = B_{j,i} = p_j$ for every i, j.
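For 2 × 2 matrices the limit in Lemma 7.2.12 is explicit. Assuming 0 < a, b < 1 (an assumption of this sketch, which makes P aperiodic), the matrix with rows (1 − a, a) and (b, 1 − b) has stationary vector p = (b/(a + b), a/(a + b)), and its second eigenvalue is 1 − a − b, so $P^l_{i,j} - p_j$ decays like $|1 - a - b|^l$. The sketch below iterates P and measures how far every row of $P^l$ is from p.

```python
def mat_mul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

a, b = 0.3, 0.2                      # illustrative transition probabilities in (0, 1)
P = [[1 - a, a], [b, 1 - b]]
p = [b / (a + b), a / (a + b)]       # normalized solution of P^* p = p

power = P
for l in range(2, 41):
    power = mat_mul(power, P)        # after the loop, power = P^40

# distance of the rows of P^40 from p; of the order of |1 - a - b|^40 = 0.5^40
error = max(abs(power[i][j] - p[j]) for i in range(2) for j in range(2))
print(power, error)
```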

Proof of Theorem 7.2.11. Suppose that the measure μ is mixing. Let A = [0; i] and B = [0; j]. By Lemma 7.2.9, we have that $\mu(A\cap\sigma^{-l}(B)) = p_iP^l_{i,j}$ for every l. Therefore,
$$p_i\lim_l P^l_{i,j} = \lim_l \mu\big(A\cap\sigma^{-l}(B)\big) = \mu(A)\mu(B) = p_ip_j.$$
Dividing both sides by $p_i$ we get that $\lim_l P^l_{i,j} = p_j$. According to Lemma 7.2.12, this proves that P is aperiodic.
Now suppose that the matrix P is aperiodic. We want to conclude that μ is mixing. According to Lemma 7.1.2, it is enough to prove that
$$\lim_l \mu\big(A\cap\sigma^{-l}(B)\big) = \mu(A)\mu(B) \tag{7.2.16}$$
for any A and B in the algebra generated by the cylinders. Since the elements of this algebra are the finite pairwise disjoint unions of cylinders, it suffices to treat the case when A and B are cylinders, say $A = [m; a_m, \dots, a_q]$ and $B = [r; b_r, \dots, b_s]$. By Lemma 7.2.9,
$$\mu\big(A\cap\sigma^{-l}(B)\big) = \mu(A)\mu(B)\,\frac{1}{p_{b_r}}\,P^{r-q+l}_{a_q,b_r}$$
for every l > q − r. Then, using Lemma 7.2.12,
$$\lim_l \mu\big(A\cap\sigma^{-l}(B)\big) = \mu(A)\mu(B)\,\frac{1}{p_{b_r}}\lim_l P^{r-q+l}_{a_q,b_r} = \mu(A)\mu(B)\,\frac{1}{p_{b_r}}\lim_l P^{l}_{a_q,b_r} = \mu(A)\mu(B).$$
This proves the property (7.2.16) for cylinders A and B.

Example 7.2.13. In Example 7.2.4 we found different types of Markov measures, depending on the choice of the probability eigenvector p. In the first case, $p = (p_1, p_2, 0, 0, 0)$ and the measure μ is supported on $\{1, 2\}^{\mathbb N}$. Once the superfluous symbols 3, 4, 5 have been removed, the stochastic matrix reduces to
$$P = \begin{pmatrix} 1-a & a \\ b & 1-b \end{pmatrix}.$$
Since this matrix is aperiodic, the Markov measure μ is mixing. The second case is entirely analogous. In the third case, $p = (p_1, p_2, p_3, p_4, 0)$ and, after removing the superfluous symbol 5, the stochastic matrix reduces to
$$P = \begin{pmatrix} 1-a & a & 0 & 0 \\ b & 1-b & 0 & 0 \\ 0 & 0 & 1-c & c \\ 0 & 0 & d & 1-d \end{pmatrix}.$$
This matrix is not irreducible and, hence, the Markov measures that one finds in this case are not ergodic (recall also Example 7.2.6).
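Irreducibility also depends only on which entries of P are positive: P is irreducible exactly when every state can reach every state in the directed graph that has an edge i → j whenever $P_{i,j} > 0$. The sketch below checks this with a breadth-first search; taking a = b = c = d = 1/2 in the block matrix above is our own illustrative choice.

```python
from collections import deque

def reachable(P, start):
    """States j with P^n[start][j] > 0 for some n >= 1 (breadth-first search on the graph of P)."""
    d = len(P)
    seen, queue = set(), deque([start])
    while queue:
        i = queue.popleft()
        for j in range(d):
            if P[i][j] > 0 and j not in seen:
                seen.add(j)
                queue.append(j)
    return seen

def is_irreducible(P):
    """True if every state can reach every state, i.e. P is irreducible."""
    d = len(P)
    return all(reachable(P, i) == set(range(d)) for i in range(d))

h = 0.5   # stands for a = b = c = d = 1/2 (illustrative values)
block = [[1 - h, h, 0, 0],
         [h, 1 - h, 0, 0],
         [0, 0, 1 - h, h],
         [0, 0, h, 1 - h]]
print(is_irreducible(block))                     # False: the first two states never lead to the last two
print(is_irreducible([[1 - h, h], [h, 1 - h]]))  # True
```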

Example 7.2.14. It is not difficult to find examples of irreducible matrices that are not aperiodic:
$$P = \begin{pmatrix} 0 & 1/2 & 0 & 1/2 \\ 1/2 & 0 & 1/2 & 0 \\ 0 & 1/2 & 0 & 1/2 \\ 1/2 & 0 & 1/2 & 0 \end{pmatrix}.$$
Indeed, we see that $P^n_{i,j} > 0$ if and only if n has the same parity as i − j. Note that
$$P^2 = \begin{pmatrix} 1/2 & 0 & 1/2 & 0 \\ 0 & 1/2 & 0 & 1/2 \\ 1/2 & 0 & 1/2 & 0 \\ 0 & 1/2 & 0 & 1/2 \end{pmatrix}.$$
Exercise 7.2.6 shows that every irreducible matrix has a form of this type.
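The parity pattern above, and the integer κ of Exercise 7.2.6, can be confirmed by computing a few powers exactly. The sketch below uses 0-based state indices (this does not change the parity of i − j), records the return times n ≤ 12 with $P^n_{0,0} > 0$, and takes their greatest common divisor. The cut-off 12 is our choice; it is enough here because $P^n_{0,0} > 0$ for every even n, but in general a finite horizon only gives κ when it is taken large enough.

```python
from fractions import Fraction
from math import gcd

def mat_mul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

h = Fraction(1, 2)
P = [[0, h, 0, h],
     [h, 0, h, 0],
     [0, h, 0, h],
     [h, 0, h, 0]]
d = len(P)

parity_ok = True
return_times = []            # values n with P^n_{0,0} > 0
power = P
for n in range(1, 13):       # power holds P^n
    for i in range(d):
        for j in range(d):
            same_parity = (n - (i - j)) % 2 == 0
            if (power[i][j] > 0) != same_parity:
                parity_ok = False
    if power[0][0] > 0:
        return_times.append(n)
    power = mat_mul(power, P)

kappa = 0
for n in return_times:
    kappa = gcd(kappa, n)    # greatest common divisor of the observed return times

print(parity_ok, return_times[:3], kappa)   # True [2, 4, 6] 2
```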

7.2.3 Exercises
7.2.1. Let X = {1, . . . , d}, let $P = (P_{i,j})_{i,j}$ be a stochastic matrix and let $p = (p_i)_i$ be a probability vector such that $P^*p = p$. Show that the function defined on the set of all cylinders by
$$\mu\big([m; a_m, \dots, a_n]\big) = p_{a_m}P_{a_m,a_{m+1}}\cdots P_{a_{n-1},a_n}$$
extends to a measure on the Borel σ-algebra of $\Sigma = X^{\mathbb N}$ (or $\Sigma = X^{\mathbb Z}$), invariant under the shift map $\sigma\colon \Sigma \to \Sigma$.
7.2.2. Prove that every weak mixing Markov shift is actually mixing.
7.2.3. Let X = {1, . . . , d} and let μ be a Markov measure for the shift map $\sigma\colon X^{\mathbb Z} \to X^{\mathbb Z}$. Does it follow that μ is also a Markov measure for the inverse $\sigma^{-1}\colon X^{\mathbb Z} \to X^{\mathbb Z}$?
7.2.4. Let X be a finite set and $\Sigma = X^{\mathbb Z}$ (or $\Sigma = X^{\mathbb N}$). Let μ be a probability measure on Σ, invariant under the shift map $\sigma\colon \Sigma \to \Sigma$. Given k ≥ 0, we say that μ has memory k if
$$\frac{\mu([m-l; a_{m-l}, \dots, a_{m-1}, a_m])}{\mu([m-l; a_{m-l}, \dots, a_{m-1}])} = \frac{\mu([m-k; a_{m-k}, \dots, a_{m-1}, a_m])}{\mu([m-k; a_{m-k}, \dots, a_{m-1}])}$$
for every l ≥ k, every m and every $(a_n)_n \in \Sigma$ (by convention, the equality holds whenever at least one of the denominators is zero). Check that the measures with memory 0 are the Bernoulli measures and the measures with memory 1 are the Markov measures. Show that every measure with memory k ≥ 2 is equivalent to a Markov measure in the space $\tilde\Sigma = \tilde X^{\mathbb Z}$ (or $\tilde\Sigma = \tilde X^{\mathbb N}$), where $\tilde X = X^k$.
7.2.5. The goal is to show that the set of all measures with finite memory is dense in the space $\mathcal M_1(\sigma)$ of all probability measures invariant under the shift map $\sigma\colon \Sigma \to \Sigma$. Given any invariant probability measure μ and any k ≥ 1, consider the function $\mu_k$ defined on the set of all cylinders by
• $\mu_k = \mu$ for cylinders with length less than or equal to k;
• for every l ≥ k, every m and every $(a_n)_n \in \Sigma$,
$$\frac{\mu_k([m-l; a_{m-l}, \dots, a_{m-1}, a_m])}{\mu_k([m-l; a_{m-l}, \dots, a_{m-1}])} = \frac{\mu([m-k; a_{m-k}, \dots, a_{m-1}, a_m])}{\mu([m-k; a_{m-k}, \dots, a_{m-1}])}.$$
Show that $\mu_k$ extends to a probability measure on the Borel σ-algebra of Σ, invariant under the shift map and with memory k. Check that $\lim_k \mu_k = \mu$ in the weak* topology.
7.2.6. Let P be an irreducible stochastic matrix. The goal is to show that there exist
κ ≥ 1 and a partition of X into κ subsets such that the restriction of Pκ to each of
these subsets is aperiodic. To do so:
(1) For every i ∈ X, define $R(i) = \{n \ge 1 : P^n_{i,i} > 0\}$. Show that R(i) is closed under addition: if $n_1, n_2 \in R(i)$ then $n_1 + n_2 \in R(i)$.
(2) Let R ⊂ N be closed under addition and let κ ≥ 1 be the greatest common
divisor of its elements. Show that there exists m ≥ 1 such that R ∩ [m, ∞) =
κN ∩ [m, ∞).
(3) Show that the greatest common divisor κ of the elements of R(i) does not
depend on i ∈ X and that P is aperiodic if and only if κ = 1.
(4) Assume that κ ≥ 2. Find a partition {Xr : 0 ≤ r < κ} of X such that the
restriction of Pκ to each Xr is aperiodic.
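The formula of Exercise 7.2.1 is straightforward to evaluate, and the two consistency relations behind the extension can be checked exactly: a cylinder is the disjoint union of its one-symbol extensions on the right (this uses that the rows of P add up to 1), and also of its one-symbol extensions on the left (this is where $P^*p = p$ enters). In this sketch the 3-state matrix and its stationary vector are our own illustrative choices, and symbols are numbered from 0.

```python
from fractions import Fraction as F
from itertools import product

P = [[F(1, 2), F(1, 2), F(0)],
     [F(1, 4), F(1, 2), F(1, 4)],
     [F(0), F(1, 2), F(1, 2)]]
p = [F(1, 4), F(1, 2), F(1, 4)]      # probability vector with P^* p = p
d = len(p)

def cylinder_measure(word):
    """p_{a_m} P_{a_m,a_{m+1}} ... P_{a_{n-1},a_n} for word = (a_m, ..., a_n); m does not enter."""
    value = p[word[0]]
    for x, y in zip(word, word[1:]):
        value *= P[x][y]
    return value

for length in range(1, 4):
    for word in product(range(d), repeat=length):
        word = list(word)
        # extensions on the right: uses that the rows of P add up to 1
        assert cylinder_measure(word) == sum(cylinder_measure(word + [c]) for c in range(d))
        # extensions on the left: uses P^* p = p
        assert cylinder_measure(word) == sum(cylinder_measure([c] + word) for c in range(d))
print("both consistency relations hold for all words of length at most 3")
```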

7.3 Interval exchanges


By definition, an interval exchange is a bijection of the interval [0, 1) with
a finite number of discontinuities and whose restriction to every continuity
subinterval is a translation. Figure 7.1 describes an example with four
continuity subintervals. To fix ideas, we always take the transformation to be
continuous on the right, that is, we take all continuity subintervals to be closed
on the left and open on the right.
As a direct consequence of the definition, every interval exchange preserves
the Lebesgue measure on [0, 1). These transformations exhibit a very rich
dynamical behavior and they also have important connections with many
other systems, such as polygonal billiards, conservative flows on surfaces and
Teichmüller flows. For example, the construction that we sketch next shows
that interval exchanges arise naturally as Poincaré return maps of conservative
vector fields on surfaces.

Figure 7.1. An interval exchange (continuity subintervals T, C, A, G and their images f(T), f(C), f(A), f(G)).



Example 7.3.1. Let S be an orientable surface and ω be an area form in S, that is, a differential 2-form that is non-degenerate at every point. We may associate with every vector field X a differential 1-form β, defined by
$$\beta_x(v) = \omega_x(X(x), v) \quad\text{for every vector } v \in T_xS.$$
Observe that X and β have the same zeros. Moreover, at all other points the kernel of β coincides with the direction of the vector field. The 1-form β permits the definition of the notion of “transverse arc-length” of curves $c\colon [a, b] \to S$, as follows:
$$\ell(c) = \int_c \beta = \int_a^b \beta_{c(t)}\big(\dot c(t)\big)\,dt.$$
Note that the flow trajectories have transverse arc-length zero. However, for curves transverse to the flow, the corresponding measure is equivalent to the usual arc-length measure, in the sense that they have the same zero measure sets. We can show (see Exercise 7.3.1) that the 1-form β is closed if and only if X preserves area. Then, using Green's theorem, the Poincaré maps of the flow preserve the transverse length. With an additional hypothesis on the zeros of X, the first-return map $f\colon \Sigma \to \Sigma$ to any cross-section Σ is well defined and is continuous outside a finite subset of Σ. Then, parameterizing Σ by transverse arc-length, f is an interval exchange.
An interval exchange is determined by two ingredients. The first one, of a combinatorial nature, concerns the number of continuity subintervals, the order of these subintervals and the order of their images inside the interval [0, 1). This may be encoded by assigning a label (for example, a letter) to each continuity subinterval and to its image, and by listing these labels in their corresponding orders, in two horizontal rows. For example, in the case described in Figure 7.1, we obtain
$$\pi = \begin{pmatrix} T & C & A & G \\ G & A & C & T \end{pmatrix}.$$
Note that the choice of the labels is arbitrary. We denote by $\mathcal A$, and call alphabet, the set of all labels.
The second ingredient, of a metric nature, concerns the lengths of the
subintervals. This may be expressed through a vector with positive coefficients,
indexed by the alphabet: each coefficient determines the length of the
corresponding continuity subinterval (and of its image). In the case of
Figure 7.1 this length vector has the form
$$\lambda = (\lambda_T, \lambda_C, \lambda_A, \lambda_G).$$
The sum of the coefficients of a length vector is always equal to 1.
Then, the interval exchange f : [0, 1) → [0, 1) associated with each pair (π, λ) is defined as follows. For every label $\alpha \in \mathcal A$, denote by $I_\alpha$ the corresponding continuity subinterval and define $w_\alpha = v_1 - v_0$, where $v_0$ is the sum of the lengths $\lambda_\beta$ corresponding to all labels β to the left of α on the top row of π and $v_1$ is the sum of the lengths $\lambda_\gamma$ corresponding to all the labels γ to the left of α on the bottom row of π. Then
$$f(x) = x + w_\alpha \quad\text{for every } x \in I_\alpha.$$
The vector $w = (w_\alpha)_{\alpha\in\mathcal A}$ is called the translation vector. Clearly, for each fixed π, the translation vector is a linear function of the length vector $\lambda = (\lambda_\alpha)_{\alpha\in\mathcal A}$.

Figure 7.2. Rotation viewed as an exchange of two intervals (subintervals A, B and their images f(A), f(B)).
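The recipe for the translation vector is easy to turn into code. In the sketch below π is encoded as two lists of labels (top row, bottom row) and λ as a dictionary; these encodings and the function names are ours, and exact fractions are used to avoid round-off. The example uses the combinatorics of Figure 7.1 with lengths of our choosing.

```python
from fractions import Fraction as F

def translation_vector(top, bottom, lengths):
    """w_alpha = v1 - v0, where v0 (v1) adds the lengths left of alpha on the top (bottom) row."""
    w = {}
    for alpha in top:
        v0 = sum(lengths[beta] for beta in top[:top.index(alpha)])
        v1 = sum(lengths[gamma] for gamma in bottom[:bottom.index(alpha)])
        w[alpha] = v1 - v0
    return w

def make_iet(top, bottom, lengths):
    """Return f with f(x) = x + w_alpha for x in I_alpha (intervals closed on the left)."""
    w = translation_vector(top, bottom, lengths)

    def f(x):
        left = 0
        for alpha in top:                 # locate the continuity subinterval containing x
            if left <= x < left + lengths[alpha]:
                return x + w[alpha]
            left += lengths[alpha]
        raise ValueError("x must lie in [0, 1)")
    return f

top, bottom = list("TCAG"), list("GACT")  # combinatorics of Figure 7.1
lam = {"T": F(1, 10), "C": F(2, 10), "A": F(3, 10), "G": F(4, 10)}  # illustrative lengths
w = translation_vector(top, bottom, lam)
f = make_iet(top, bottom, lam)
print(w)
print(f(F(1, 20)), f(F(7, 10)))
```

With these lengths one gets $w_T = 9/10$, $w_C = 3/5$, $w_A = 1/10$ and $w_G = -3/5$, so the images of $I_T, I_C, I_A, I_G$ are [9/10, 1), [7/10, 9/10), [2/5, 7/10) and [0, 2/5), which indeed appear in the order G, A, C, T of the bottom row.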

Example 7.3.2. The simplest interval exchanges have only two continuity subintervals. See Figure 7.2. Choosing the alphabet $\mathcal A = \{A, B\}$, we get
$$\pi = \begin{pmatrix} A & B \\ B & A \end{pmatrix} \quad\text{and}\quad f(x) = \begin{cases} x + \lambda_B & \text{for } x \in I_A, \\ x - \lambda_A = x + \lambda_B - 1 & \text{for } x \in I_B. \end{cases}$$
This transformation corresponds precisely to the rotation $R_{\lambda_B}$ if we identify [0, 1) with the circle $S^1$ in the natural way. In this sense, the class of interval exchanges is a generalization of the family of circle rotations.

7.3.1 Minimality and ergodicity


As we saw previously, a circle rotation Rθ is minimal if and only if θ is
irrational. Moreover, in that case Rθ is also uniquely ergodic. Given that almost
every number is irrational, this means that minimality and unique ergodicity
are typical in the family of circle rotations. In this section we discuss the two
properties in the broader context of interval exchanges.
Let us start with an observation that has no analogue for rotations. We say that the combinatorics π of an interval exchange is reducible if there exists some position such that the labels to the left of that position in the two rows of π are exactly the same. For example,
$$\pi = \begin{pmatrix} B & X & O & L & F & D \\ X & O & B & F & D & L \end{pmatrix}$$
is reducible, as the labels to the left of the fourth position are the same in both rows: B, O and X. As a consequence, for any length vector λ, the interval exchange f defined by (π, λ) leaves the subinterval $I_B \cup I_O \cup I_X$ invariant. In particular, f cannot be minimal, nor even transitive. In what follows we always assume the combinatorics π to be irreducible.
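Reducibility is a purely combinatorial property, so it can be tested directly on the two rows of π: one looks for a position k < d such that the first k labels of both rows form the same set. A minimal sketch, with the rows encoded as strings (our choice of encoding):

```python
def is_reducible(top, bottom):
    """True if, for some 1 <= k < d, the first k labels of both rows of pi coincide as sets."""
    d = len(top)
    return any(set(top[:k]) == set(bottom[:k]) for k in range(1, d))

print(is_reducible("BXOLFD", "XOBFDL"))   # True: the first three labels are {B, X, O} in both rows
print(is_reducible("TCAG", "GACT"))       # False: the combinatorics of Figure 7.1
```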
It is natural to ask whether the interval exchange is minimal whenever the length vector $\lambda = (\lambda_\alpha)_{\alpha\in\mathcal A}$ is rationally independent, that is, whenever
$$\sum_{\alpha\in\mathcal A} n_\alpha\lambda_\alpha \neq 0$$
for every non-zero vector $(n_\alpha)_{\alpha\in\mathcal A}$ with integer coefficients. This turns out to be true but, in fact, the hypothesis of rational independence is a bit too strong: we are going to present a somewhat more general condition that still implies minimality.
We denote by $\partial I_\alpha$ the left endpoint of each subinterval $I_\alpha$. We say that a pair (π, λ) satisfies the Keane condition if the trajectories of these points are disjoint:
$$f^m(\partial I_\alpha) \neq \partial I_\beta \quad\text{for every } m \ge 1 \text{ and any } \alpha, \beta \in \mathcal A \text{ with } \partial I_\beta \neq 0 \tag{7.3.1}$$
(note that there always exist $\bar\alpha$ and $\bar\beta$ such that $f(\partial I_{\bar\alpha}) = 0 = \partial I_{\bar\beta}$). We leave the proof of the next lemma as an exercise (Exercise 7.3.2):

Lemma 7.3.3. (1) If the pair (π , λ) satisfies the Keane condition then the
combinatorics matrix π is irreducible.
(2) If π is irreducible and λ is rationally independent then the pair (π , λ)
satisfies the Keane condition.

Since the subset of rationally independent vectors has full Lebesgue measure, it follows that the Keane condition is satisfied for almost every length vector λ, if π is irreducible.

Example 7.3.4. In the case of two subintervals (recall Example 7.3.2), the interval exchange has the form $f^m(x) = x + m\lambda_B \bmod \mathbb Z$. Then, the Keane condition means that
$$m\lambda_B \neq \lambda_A + n \quad\text{and}\quad \lambda_A + m\lambda_B \neq \lambda_A + n$$
for every m ∈ ℕ and n ∈ ℤ. It is clear that this holds if and only if the vector $(\lambda_A, \lambda_B)$ is rationally independent.

Example 7.3.5. For exchanges of three or more intervals, the Keane condition is strictly weaker than the rational independence of the length vector. Consider, for example,
$$\pi = \begin{pmatrix} A & B & C \\ C & A & B \end{pmatrix}.$$
Then $f^m(x) = x + m\lambda_C \bmod \mathbb Z$ and, thus, the Keane condition means that
$$\{m\lambda_C,\ \lambda_A + m\lambda_C,\ \lambda_A + \lambda_B + m\lambda_C\} \quad\text{and}\quad \{\lambda_A + n,\ \lambda_A + \lambda_B + n\}$$
are disjoint for every m ∈ ℕ and n ∈ ℤ. Equivalently,
$$p\lambda_C \notin \{q,\ \lambda_A + q\} \quad\text{for every non-zero } p \in \mathbb Z \text{ and every } q \in \mathbb Z.$$
This may hold even when $(\lambda_A, \lambda_B, \lambda_C)$ is rationally dependent.

The following result was proved by Michael Keane [Kea75]:

Theorem 7.3.6 (Keane). If (π, λ) satisfies the Keane condition then the interval exchange f is minimal.
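Theorem 7.3.6 can be illustrated (though of course not proved) numerically: along a single orbit of a minimal interval exchange, the largest gap left between orbit points should shrink as the orbit gets longer. The sketch below uses the combinatorics of Example 7.3.7, which is irreducible, with lengths proportional to 1, √2, √3, √5; these are rationally independent, so by Lemma 7.3.3 and Theorem 7.3.6 the exchange is minimal. The lengths, the starting point, the orbit length and the use of floating-point arithmetic are all assumptions of this illustration.

```python
import math

def make_iet(top, bottom, lengths):
    """f(x) = x + w_alpha on I_alpha, with w_alpha computed from the two rows of pi."""
    w = {}
    for alpha in top:
        v0 = sum(lengths[c] for c in top[:top.index(alpha)])
        v1 = sum(lengths[c] for c in bottom[:bottom.index(alpha)])
        w[alpha] = v1 - v0

    def f(x):
        left = 0.0
        for alpha in top:
            if x < left + lengths[alpha]:
                return x + w[alpha]
            left += lengths[alpha]
        return x + w[top[-1]]          # round-off guard: treat x as lying in the last interval
    return f

def max_gap(points):
    """Largest gap left in [0, 1) by a finite set of points (end gaps included)."""
    pts = sorted(points)
    gaps = [y - x for x, y in zip(pts, pts[1:])]
    return max(gaps + [pts[0], 1.0 - pts[-1]])

raw = {"A": 1.0, "B": math.sqrt(2), "C": math.sqrt(3), "D": math.sqrt(5)}
total = sum(raw.values())
lam = {label: value / total for label, value in raw.items()}   # lengths add up to 1
f = make_iet(list("ABCD"), list("DCBA"), lam)

orbit, x = [], 0.123
for _ in range(20000):
    orbit.append(x)
    x = f(x)

for n in (10, 100, 1000, 20000):
    print(n, max_gap(orbit[:n]))      # the largest gap shrinks as the orbit grows
```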

Example 7.3.7. The Keane condition is not necessary for minimality. For example, consider the interval exchange defined by (π, λ), where
$$\pi = \begin{pmatrix} A & B & C & D \\ D & C & B & A \end{pmatrix},$$
$\lambda_A = \lambda_C$, $\lambda_B = \lambda_D$ and $\lambda_A/\lambda_B = \lambda_C/\lambda_D$ is irrational. Then (π, λ) does not satisfy the Keane condition and yet f is minimal.

As observed previously, every minimal circle rotation is also uniquely ergodic. This is still true for exchanges of three intervals, but not in general. Indeed, Keane gave an example of an exchange of four intervals exhibiting two ergodic probability measures, notwithstanding the fact that the combinatorics matrix π is irreducible and the length vector λ is rationally independent. Keane conjectured that, nevertheless, it should be true that almost every interval exchange is uniquely ergodic. The following remarkable result, obtained independently by Howard Masur [Mas82] and William Veech [Vee82], established this conjecture:

Theorem 7.3.8 (Masur, Veech). Assume that π is irreducible. Then, for Lebesgue-almost every length vector λ, the interval exchange defined by (π, λ) is uniquely ergodic.

Earlier, Michael Keane and Gérard Rauzy [KR80] had shown that unique
ergodicity holds for a residual (Baire second category) subset of length vectors
whenever the combinatorics is irreducible.

7.3.2 Mixing
The interval exchanges provide many examples of systems that are uniquely
ergodic and weak mixing but not (strongly) mixing.
Indeed, the theorem of Masur–Veech (Theorem 7.3.8) asserts that almost every interval exchange is uniquely ergodic. Another deep theorem, due to Artur Avila and Giovanni Forni [AF07], states that almost every interval exchange is weak mixing, once circle rotations (more precisely, interval exchanges with a unique discontinuity point) are excluded. The topological version of this fact had been proved by Arnaldo Nogueira and Donald Rudolph [NR97].

On the other hand, a result of Anatole Katok [Kat80] that we are going to
discuss below asserts that interval exchanges are never mixing:

Theorem 7.3.9. Let f : [0, 1) → [0, 1) be an interval exchange and μ be an invariant probability measure. Then (f, μ) is not mixing.

Proof. We may take μ to be ergodic, for otherwise the conclusion is obvious. If μ has some atom then its support is a periodic orbit and, thus, μ cannot be mixing. Hence, we may also take μ to be non-atomic. Denote by m the Lebesgue measure on the interval and consider the map
$$h\colon [0, 1) \to [0, 1), \quad h(x) = \mu([0, x]).$$
Then h is a homeomorphism and satisfies $h_*\mu = m$. Consequently, the map $g = h \circ f \circ h^{-1}\colon [0, 1) \to [0, 1)$ has finitely many discontinuity points and preserves the Lebesgue measure. In particular, the restriction of g to each continuity subinterval is a translation. Therefore, g is also an interval exchange. It is clear that (f, μ) is mixing if and only if (g, m) is mixing. Therefore, to prove Theorem 7.3.9 it is no restriction to suppose that μ is the Lebesgue measure m. We do that from now on.
Our goal is to find a measurable set X such that $m(X \cap f^{-n}(X))$ does not converge to $m(X)^2$ when n → ∞. Let $d = \#\mathcal A$.

Lemma 7.3.10. Every interval J = [a, b) contained in some $I_\beta$ admits a partition $\{J_1, \dots, J_s\}$ into no more than d + 2 subintervals of the form $J_i = [a_i, b_i)$ and admits natural numbers $t_1, \dots, t_s \ge 1$ such that
(i) $f^n(J_i) \cap J = \emptyset$ for every $0 < n < t_i$ and $1 \le i \le s$;
(ii) $f^{t_i} \mid J_i$ is a translation for every $1 \le i \le s$;
(iii) $\{f^{t_1}(J_1), \dots, f^{t_s}(J_s)\}$ is a partition of J;
(iv) the intervals $f^n(J_i)$, $1 \le i \le s$, $0 \le n < t_i$, are pairwise disjoint;
(v) $\bigcup_{n=0}^{\infty} f^n(J) = \bigcup_{i=1}^{s}\bigcup_{n=0}^{t_i-1} f^n(J_i)$.

Proof. Let B be the set formed by the endpoints a and b of J together with the endpoints $\partial I_\alpha$, $\alpha \in \mathcal A$, minus the origin. Then #B ≤ d + 1. Let $B_J \subset J$ be the set of points x ∈ J for which there exists m ≥ 1 such that $f^m(x) \in B$ and $f^n(x) \notin J$ for every 0 < n < m. The fact that f is injective, together with the definition of m, implies that the map
$$B_J \to B, \quad x \mapsto f^m(x)$$
is injective. In particular, $\#B_J \le \#B \le d + 1$. Consider the partition of J into subintervals $J_i = [a_i, b_i)$ with endpoints $a_i, b_i$ in the set $B_J \cup \{a, b\}$. This partition has at most d + 2 elements. By the Poincaré recurrence theorem, for each i there exists $t_i \ge 1$ such that $f^{t_i}(J_i)$ intersects J. Take $t_i$ minimum with this property. Part (i) of the lemma is an immediate consequence. By the definition of $B_J$, the restriction of $f^{t_i}$ to the interval $J_i$ is a translation, as stated in part (ii), and its image is contained in J. Moreover, the images $f^{t_i}(J_i)$, 1 ≤ i ≤ s, are pairwise disjoint, since f is injective and the $t_i$ are the first-return times to J. In particular,
$$\sum_{i=1}^{s} m\big(f^{t_i}(J_i)\big) = \sum_{i=1}^{s} m(J_i) = m(J)$$
and so $\bigcup_{i=1}^{s} f^{t_i}(J_i) = J$. This proves part (iii). Part (iv) also follows directly from the fact that f is injective and the $t_i$ are the first-return times to J. Finally, part (v) is a direct consequence of part (iii).
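Lemma 7.3.10 can be explored by sampling: follow many points of J until they first come back to J, and record the itinerary of each one (the labels of the continuity subintervals visited). All points of a piece $J_i$ share the same itinerary, so the number of distinct itineraries found is at most s ≤ d + 2, and points with the same itinerary are moved by the same translation, as in part (ii). The data (π, λ) and the interval J below are our own choices, and sampling can miss very short pieces.

```python
import math

def make_labelled_iet(top, bottom, lengths):
    """Return g with g(x) = (alpha, x + w_alpha), where alpha labels the subinterval containing x."""
    w = {}
    for alpha in top:
        v0 = sum(lengths[c] for c in top[:top.index(alpha)])
        v1 = sum(lengths[c] for c in bottom[:bottom.index(alpha)])
        w[alpha] = v1 - v0

    def g(x):
        left = 0.0
        for alpha in top:
            if x < left + lengths[alpha]:
                return alpha, x + w[alpha]
            left += lengths[alpha]
        return top[-1], x + w[top[-1]]    # round-off guard at the right end
    return g

def first_return(g, a, b, x, max_steps=10**6):
    """Iterate from x in J = [a, b) until the orbit re-enters J.

    Returns (return time t, itinerary of labels visited, f^t(x)).
    """
    itinerary = []
    for t in range(1, max_steps + 1):
        alpha, x = g(x)
        itinerary.append(alpha)
        if a <= x < b:
            return t, tuple(itinerary), x
    raise RuntimeError("no return to J within max_steps")

raw = {"A": 1.0, "B": math.sqrt(2), "C": math.sqrt(3), "D": math.sqrt(5)}
lam = {label: v / sum(raw.values()) for label, v in raw.items()}
g = make_labelled_iet(list("ABCD"), list("DCBA"), lam)

a, b = 0.05, 0.10                         # J = [a, b) lies inside I_A (lambda_A is about 0.157)
pieces = {}                               # itinerary -> list of (x, return time, translation)
samples = 2000
for k in range(samples):
    x = a + (b - a) * (k + 0.5) / samples
    t, itinerary, image = first_return(g, a, b, x)
    pieces.setdefault(itinerary, []).append((x, t, image - x))

print("pieces found:", len(pieces), "; Lemma 7.3.10 allows at most", len(lam) + 2)
for data in pieces.values():
    print("return time", data[0][1], "translation", round(data[0][2], 6))
```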

Consider any interval J contained in some $I_\beta$. By ergodicity, the invariant set $\bigcup_{n=0}^{\infty} f^n(J)$ has full measure. By part (v) of Lemma 7.3.10, this set is a finite union of intervals closed on the left and open on the right. Therefore,
$$\bigcup_{n=0}^{\infty} f^n(J) = \bigcup_{i=1}^{s}\bigcup_{n=0}^{t_i-1} f^n(J_i) = I.$$
So, by Lemma 7.3.10(iv), the family $\mathcal P_J = \{f^n(J_i) : 1 \le i \le s \text{ and } 0 \le n < t_i\}$ is a partition of I.

Lemma 7.3.11. Given δ > 0 and N ≥ 1, we may choose the interval J in such a way that diam $\mathcal P_J$ < δ and $t_i \ge N$ for every i.

Proof. It is clear that diam $f^n(J_i)$ = diam $J_i$ ≤ diam J for every i and every n. Hence, diam $\mathcal P_J$ < δ as long as we pick J with diameter smaller than δ. To get the second property in the statement, take any point x ∈ I such that $f^n(x) \neq \partial I_\alpha$ for every 0 ≤ n < N and $\alpha \in \mathcal A$. We claim that $f^n(x) \neq x$ for every 0 < n < N. Otherwise, since $f^n$ is a translation in a neighborhood of x, we would have $f^n(y) = y$ for every point y in that neighborhood, which would contradict the hypothesis that (f, m) is ergodic. This proves our claim. Now it suffices to take $J = [x, x + \varepsilon)$ with $\varepsilon < \min_{0<n<N} d(x, f^n(x))$ to ensure that $t_i \ge N$ for every i.

Lemma 7.3.12. For every 1 ≤ i ≤ s there exist $s_i \le d + 2$ and natural numbers $t_{i,1}, \dots, t_{i,s_i}$ such that $t_{i,j} \ge t_i$ and, given any set A in the algebra $\mathcal A_J$ generated by $\mathcal P_J$, there exists $t_{i,j}$ such that
$$m\big(A \cap f^{-t_{i,j}}(A)\big) \ge \frac{1}{(d+2)^2}\,m(A). \tag{7.3.2}$$

Proof. Applying Lemma 7.3.10 to each of the intervals $J_i$, 1 ≤ i ≤ s, we find $s_i \le d + 2$, a partition $\{J_{i,j} : 1 \le j \le s_i\}$ of the interval $J_i$ and natural numbers $t_{i,j}$ such that each $t_{i,j}$ is the first-return time of the points of $J_{i,j}$ to $J_i$. It is clear that $t_{i,j} \ge t_i$, since $t_i$ is the first-return time of any point of $J_i$ to the interval J. The fact that $J_{i,j} \subset f^{-t_{i,j}}(J_i)$ implies that
$$f^n(J_i) = \bigcup_{j=1}^{s_i} f^n(J_{i,j}) \subset \bigcup_{j=1}^{s_i} f^{-t_{i,j}}\big(f^n(J_i)\big) \quad\text{for every } n \ge 0.$$
Since the algebra $\mathcal A_J$ is formed by the finite pairwise disjoint unions of intervals $f^n(J_i)$, $0 \le n < t_i$, it follows that
$$A \subset \bigcup_{i=1}^{s}\bigcup_{j=1}^{s_i} f^{-t_{i,j}}(A) \quad\text{for every } A \in \mathcal A_J.$$
In particular, $m(A) \le \sum_{i=1}^{s}\sum_{j=1}^{s_i} m\big(A \cap f^{-t_{i,j}}(A)\big)$. Recalling that s ≤ d + 2 and $s_i \le d + 2$ for every i, this implies (7.3.2).

We are ready to conclude the proof of Theorem 7.3.9. For that, let us fix a measurable set X ⊂ [0, 1) with
$$0 < m(X) < \frac{1}{4(d+2)^2}.$$
By Lemma 7.3.11, given any N ≥ 1 we may find an interval J ⊂ [0, 1) such that all the first-return times $t_i \ge N$ and there exists $A \in \mathcal A_J$ such that
$$m(X \,\Delta\, A) < \frac14\,m(X)^2. \tag{7.3.3}$$
Applying Lemma 7.3.12, we get that there exists $t_{i,j} \ge t_i \ge N$ such that
$$m\big(X \cap f^{-t_{i,j}}(X)\big) \ge m\big(A \cap f^{-t_{i,j}}(A)\big) - 2m(X \,\Delta\, A) \ge \frac{1}{(d+2)^2}\,m(A) - \frac12\,m(X)^2.$$
The relation (7.3.3) implies that $m(A) \ge m(X) - m(X \,\Delta\, A) \ge (3/4)\,m(X)$, because $m(X) \le 1$. Therefore, using also that $1/(d+2)^2 > 4m(X)$,
$$m\big(X \cap f^{-t_{i,j}}(X)\big) \ge \frac34\,m(X)\,\frac{1}{(d+2)^2} - \frac12\,m(X)^2 \ge 3m(X)^2 - \frac12\,m(X)^2 > 2m(X)^2.$$
This proves that $\limsup_n m(X \cap f^{-n}(X)) \ge 2m(X)^2$, and so the system (f, m) cannot be mixing.

7.3.3 Exercises
7.3.1. Let ω be an area form on a surface S. Let X be a differentiable vector field on S and β be the differential 1-form defined on S by $\beta_x = \omega_x(X(x), \cdot)$. Show that β is closed if and only if X preserves the area measure.
7.3.2. Prove Lemma 7.3.3.
7.3.3. Show that if (π , λ) satisfies the Keane condition then f has no periodic points.
[Observation: This is a step in the proof of Theorem 7.3.6.]
7.3.4. Let f : [0, 1) → [0, 1) be an irreducible interval exchange and let a ∈ (0, 1)
be the largest of all the discontinuity points of f and f −1 . The Rauzy–Veech
renormalization R(f ) : [0, 1) → [0, 1) is defined by R(f )(x) = g(ax)/a, where
g is the first-return map of f to the interval [0, a). Check that R(f ) is an interval
exchange with the same number of continuity subintervals as f , or less. If f is
described by the data (π , λ), how can we describe R(f )?

7.3.5. Given d ≥ 2 and a bijection σ : ℕ → ℕ without periodic points, consider the transformation f : [0, 1] → [0, 1] where each f(x) is obtained by permuting the digits of the base d expansion of x as prescribed by σ. More precisely, if $x = \sum_{n=1}^{\infty} a_n d^{-n}$ with $a_n \in \{0, \dots, d-1\}$ and infinitely many values of n such that $a_n < d - 1$, then $f(x) = \sum_{n=1}^{\infty} a_{\sigma(n)} d^{-n}$. Show that f preserves the Lebesgue measure m in the interval and that (f, m) is mixing.

7.4 Decay of correlations


In this section we discuss how quickly the correlations sequence Cn (ϕ, ψ)
decays to zero in a mixing system. Since we are dealing with deterministic
systems, we cannot expect interesting estimates to hold for arbitrary functions.
However, as we are going to see, such estimates do exist in many important
cases, if we restrict ϕ and ψ to suitable subsets of functions. Given that the
correlations (ϕ, ψ) → Cn (ϕ, ψ) are bilinear functions, it is natural to consider
subsets that are vector subspaces.
We say that (f, μ) has exponential decay of correlations on a given vector space V if there exists λ < 1 and for every φ, ψ ∈ V there exists A(φ, ψ) > 0 such that
$$|C_n(\varphi,\psi)| \le A(\varphi,\psi)\,\lambda^n \quad\text{for every } n \ge 1. \tag{7.4.1}$$
There are similar notions (polynomial decay, for instance) where the exponential $\lambda^n$ is replaced by some other sequence converging to zero.
To illustrate the theory, let us analyze the issue of decay of correlations in the context of one-sided Markov shifts. That will also allow us to introduce several ideas that will be useful later in more general situations. Let f : M → M be the shift map in $M = X^{\mathbb N}$, where X = {1, . . . , d} is a finite set. Let $P = (P_{i,j})_{i,j}$ be an aperiodic stochastic matrix and $p = (p_i)_i$ be the positive eigenvector of $P^*$, normalized by $p_1 + \cdots + p_d = 1$. Let μ be the Markov measure defined in M by (7.2.2).
Consider $L = G^{-1}P^*G$, where G is the diagonal matrix whose coefficients are $p_1, \dots, p_d$. The coefficients of L are given by
$$L_{i,j} = \frac{p_j}{p_i}\,P_{j,i} \quad\text{for each } 1 \le i, j \le d.$$
Recall that we denote u = (1, . . . , 1) and $H = \{(h_1, \dots, h_d) : h_1 + \cdots + h_d = 0\}$. Let
$$V = \{(v_1, \dots, v_d) : p_1v_1 + \cdots + p_dv_d = 0\}.$$
Then G(u) = p and G(V) = H. Recalling (7.2.15), it follows that the decomposition
$$\mathbb R^d = \mathbb Ru \oplus V \tag{7.4.2}$$
is invariant under L and all the eigenvalues of the restriction of L to V are smaller than 1 in absolute value. We say that L has the spectral gap property if the largest eigenvalue is simple and all the rest of the spectrum is contained in a closed disk with strictly smaller radius.
The transfer operator of the shift map f is the linear operator $\mathcal L_f$ mapping each function ψ : M → ℝ to the function $\mathcal L_f\psi$ : M → ℝ defined by
$$\mathcal L_f\psi(x_1, \dots, x_n, \dots) = \sum_{x_0=1}^{d} L_{x_1,x_0}\,\psi(x_0, x_1, \dots, x_n, \dots). \tag{7.4.3}$$
The transfer operator is dual to the Koopman operator $U_f$, in the following sense:
$$\int \varphi\,(\mathcal L_f\psi)\,d\mu = \int (U_f\varphi)\,\psi\,d\mu \tag{7.4.4}$$
for any bounded measurable functions φ, ψ. Let us prove this fact.


We call a function ϕ : M → R locally constant if there is n ≥ 0 such
that every ϕ(x) depends only on the first n coordinates x0 , . . . , xn−1 of the
point x. For example, characteristic functions of cylinders are locally constant
functions. Since every bounded measurable function is a uniform limit of linear
combinations of characteristic functions of cylinders, it follows that every
bounded measurable function is the uniform limit of some sequence of locally
constant functions. Thus, to prove (7.4.4) it suffices to consider the case when
ϕ and ψ are both locally constant.
Then, consider functions φ and ψ that depend only on the first n coordinates. By the definition of Markov measure,
$$\int \varphi\,(\mathcal L_f\psi)\,d\mu = \sum_{a_1,\dots,a_n} p_{a_1}P_{a_1,a_2}\cdots P_{a_{n-1},a_n}\,\varphi(a_1,\dots,a_n)\,\mathcal L_f\psi(a_1,\dots,a_n).$$
Using the definition of the transfer operator and the identity $p_{a_1}L_{a_1,a_0} = p_{a_0}P_{a_0,a_1}$, the right-hand side of this expression is equal to
$$\sum_{a_0,a_1,\dots,a_n} p_{a_0}P_{a_0,a_1}P_{a_1,a_2}\cdots P_{a_{n-1},a_n}\,\varphi(a_1,\dots,a_n)\,\psi(a_0,a_1,\dots,a_n).$$
Observe that $\varphi(a_1,\dots,a_n) = U_f\varphi(a_0,a_1,\dots,a_n)$. So, using once more the definition of the Markov measure, this last expression is equal to $\int (U_f\varphi)\,\psi\,d\mu$. This proves the duality property (7.4.4).
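On functions of the first coordinate, the duality (7.4.4) and the identity (7.4.7) below reduce to finite sums, which the sketch checks with exact fractions. The stochastic matrix, its stationary vector and the test functions are our own choices; as in the text, $L_{i,j} = p_jP_{j,i}/p_i$, and for φ, ψ depending only on the first coordinate we use $\int (U_f\varphi)\psi\,d\mu = \sum_{i,j} p_iP_{i,j}\,\psi(i)\varphi(j)$.

```python
from fractions import Fraction as F

P = [[F(1, 2), F(1, 2), F(0)],
     [F(1, 4), F(1, 2), F(1, 4)],
     [F(0), F(1, 2), F(1, 2)]]
p = [F(1, 4), F(1, 2), F(1, 4)]         # stationary: sum_i p_i P_{i,j} = p_j
d = len(p)

# L = G^{-1} P^* G, that is, L[i][j] = p_j P_{j,i} / p_i
L = [[p[j] * P[j][i] / p[i] for j in range(d)] for i in range(d)]

def transfer(psi):
    """Transfer operator on functions of the first coordinate: sum_k L[i][k] psi(k)."""
    return [sum(L[i][k] * psi[k] for k in range(d)) for i in range(d)]

def integral(g):
    """Integral with respect to mu of a function of the first coordinate."""
    return sum(p[i] * g[i] for i in range(d))

phi = [F(3), F(-1), F(2)]                # two arbitrary functions of the first coordinate
psi = [F(1), F(5), F(-2)]

lhs = integral([phi[i] * transfer(psi)[i] for i in range(d)])                    # int phi (L_f psi)
rhs = sum(p[i] * P[i][j] * psi[i] * phi[j] for i in range(d) for j in range(d))  # int (U_f phi) psi
print(lhs == rhs)                                    # True: duality (7.4.4)
print(integral(transfer(psi)) == integral(psi))      # True: identity (7.4.7)
print([sum(row) == 1 for row in L])                  # [True, True, True]: L u = u
```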
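As a consequence, we may write the correlations sequence in terms of the iterates of the transfer operator:
$$C_n(\varphi,\psi) = \int (U_f^n\varphi)\,\psi\,d\mu - \int\varphi\,d\mu\int\psi\,d\mu = \int \varphi\Big(\mathcal L_f^n\psi - \int\psi\,d\mu\Big)\,d\mu. \tag{7.4.5}$$
The property Lu = u means that $\sum_j L_{i,j} = 1$ for every i. Since the coefficients of L are non-negative, this has the following useful consequence:
$$\sup|\mathcal L_f\psi| \le \sup|\psi| \quad\text{for every } \psi. \tag{7.4.6}$$
Taking φ ≡ 1 in (7.4.4) we get the following special case, which will also be useful later:
$$\int \mathcal L_f\psi\,d\mu = \int\psi\,d\mu \quad\text{for every } \psi. \tag{7.4.7}$$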

Now let us denote by $E_0$ the subset of functions ψ that depend only on the first coordinate. The map ψ ↦ (ψ(1), . . . , ψ(d)) is an isomorphism between $E_0$ and the Euclidean space $\mathbb R^d$. Moreover, the definition
$$\mathcal L_f\psi(x_1) = \sum_{x_0=1}^{d} L_{x_1,x_0}\,\psi(x_0)$$
shows that the restriction of the transfer operator to $E_0$ corresponds precisely to the operator $L\colon \mathbb R^d \to \mathbb R^d$. Note also that the hyperplane $V \subset \mathbb R^d$ corresponds to the subset of $\psi \in E_0$ such that $\int\psi\,d\mu = 0$. Consider in $E_0$ the norm defined by $\|\psi\|_0 = \sup|\psi|$.
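Fix any number λ strictly between the spectral radius of L restricted to V and 1. Every function $\psi \in E_0$ may be written
$$\psi = c + v \quad\text{with } c = \int\psi\,d\mu \in \mathbb Ru \text{ and } v = \psi - \int\psi\,d\mu \in V.$$
Since $\mathcal L_f^n\psi - \int\psi\,d\mu = L^nv$, the spectral gap property implies that there exists B > 1 such that
$$\sup\Big|\mathcal L_f^n\psi - \int\psi\,d\mu\Big| \le B\,\|\psi\|_0\,\lambda^n \quad\text{for every } n \ge 1. \tag{7.4.8}$$
Using (7.4.5), it follows that
$$|C_n(\varphi,\psi)| \le B\,\|\varphi\|_0\,\|\psi\|_0\,\lambda^n \quad\text{for every } n \ge 1.$$
In this way, we have shown that every aperiodic Markov shift has exponential decay of correlations in $E_0$.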
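In $E_0$ the decay can be observed directly. For φ, ψ depending only on the first coordinate one has $C_n(\varphi, \psi) = \sum_{i,j} p_iP^n_{i,j}\,\psi(i)\varphi(j) - \int\varphi\,d\mu\int\psi\,d\mu$. For the 2 × 2 aperiodic matrix with parameters a, b ∈ (0, 1) (values chosen for illustration), V is one-dimensional and L acts on it as multiplication by 1 − a − b, so the argument above gives $C_n = (1 - a - b)^n C_0$ exactly. The sketch checks this ratio numerically.

```python
def mat_mul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

a, b = 0.3, 0.2                          # illustrative values in (0, 1)
P = [[1 - a, a], [b, 1 - b]]
p = [b / (a + b), a / (a + b)]           # stationary vector
phi, psi = [1.0, -2.0], [3.0, 0.5]       # two functions of the first coordinate

def integral(g):
    """Integral with respect to the Markov measure of a function of the first coordinate."""
    return sum(p[i] * g[i] for i in range(2))

def correlation(n):
    """C_n(phi, psi) = int (phi o f^n) psi dmu - int phi dmu * int psi dmu."""
    Pn = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(n):
        Pn = mat_mul(Pn, P)
    joint = sum(p[i] * Pn[i][j] * psi[i] * phi[j] for i in range(2) for j in range(2))
    return joint - integral(phi) * integral(psi)

C = [correlation(n) for n in range(15)]
ratios = [C[n + 1] / C[n] for n in range(14)]
print(C[0], ratios[:3])                  # each ratio equals 1 - a - b = 0.5 up to round-off
```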
With a little more effort, one can improve this result, by extending the conclusion to a much larger space of functions. Consider in M the distance defined by
$$d\big((x_n)_n, (y_n)_n\big) = 2^{-N(x,y)} \quad\text{where } N(x, y) = \min\{n \ge 0 : x_n \neq y_n\}.$$
Fix any θ > 0 and denote by E the set of functions φ that are θ-Hölder, that is, such that
$$K_\theta(\varphi) = \sup\Big\{\frac{|\varphi(x) - \varphi(y)|}{d(x, y)^\theta} : x \neq y\Big\}$$
is finite. It is clear that E contains all the locally constant functions. We claim:

Theorem 7.4.1. Every aperiodic Markov shift (f, μ) has exponential decay of correlations in the space E of θ-Hölder functions, for any θ > 0.

Observe that $\mathcal L_f(E) \subset E$. The function $\|\psi\| = \sup|\psi| + K_\theta(\psi)$ defines a complete norm in E and the linear operator $\mathcal L_f\colon E \to E$ is continuous relative to this norm (Exercise 7.4.1). One way to prove the theorem is by showing that this operator has the spectral gap property, with invariant decomposition
$$E = \mathbb Ru \oplus \Big\{\psi \in E : \int\psi\,d\mu = 0\Big\}.$$
Once that is done, exactly the same argument that we used before for $E_0$ proves the exponential decay of correlations in E. We do not present the details here (but we will come back to this theme, in a much more general context, near the end of Section 12.3). Instead, we give a direct proof that (7.4.8) may be extended to the space E.
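Given ψ ∈ E and $x = (x_1, \dots, x_n, \dots) \in M$, we have
$$\mathcal L_f^k\psi(x) = \sum_{a_1,\dots,a_k=1}^{d} L_{x_1,a_k}\cdots L_{a_2,a_1}\,\psi(a_1, \dots, a_k, x_1, \dots, x_n, \dots)$$
for every k ≥ 1. Then, given $y = (y_1, \dots, y_n, \dots)$ with $x_1 = y_1 = j$, and noting that the points $(a_1, \dots, a_k, x_1, \dots)$ and $(a_1, \dots, a_k, y_1, \dots)$ are at distance $2^{-k}d(x, y)$,
$$|\mathcal L_f^k\psi(x) - \mathcal L_f^k\psi(y)| \le \sum_{a_1,\dots,a_k=1}^{d} L_{j,a_k}\cdots L_{a_2,a_1}\,K_\theta(\psi)\,2^{-k\theta}d(x, y)^\theta.$$
Using the property $\sum_{i=1}^{d} L_{j,i} = 1$, we conclude that
$$|\mathcal L_f^k\psi(x) - \mathcal L_f^k\psi(y)| \le K_\theta(\psi)\,2^{-k\theta}d(x, y)^\theta \le K_\theta(\psi)\,2^{-k\theta}. \tag{7.4.9}$$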
Given any function ϕ, denote by π ϕ the function that depends only on the first
coordinate and coincides with the mean of ϕ on each cylinder [0; i]:

1
π ϕ(i) = ϕ dμ.
pi [0;i]
It is clear that sup |π ϕ| ≤ sup |ϕ| and π ϕ dμ = ϕ dμ. The inequality (7.4.9)
implies that
sup |Lkf ψ − π(Lkf ψ)| ≤ Kθ (ψ)2−kθ for every k ≥ 1.
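Then, using the property (7.4.6),
$$\sup \big|L_f^{k+l}\psi - L_f^l\pi(L_f^k\psi)\big| \le K_\theta(\psi)\,2^{-k\theta} \quad\text{for every } k,l \ge 1. \tag{7.4.10}$$
Moreover, the properties (7.4.6) and (7.4.7) imply that
$$\sup\big|\pi(L_f^k\psi)\big| \le \sup|\psi| \quad\text{and}\quad \int \pi(L_f^k\psi)\,d\mu = \int\psi\,d\mu.$$
Therefore, the property (7.4.8) gives that
$$\sup\Big|L_f^l\pi(L_f^k\psi) - \int\psi\,d\mu\Big| \le B\sup|\psi|\,\lambda^l \quad\text{for every } l \ge 1. \tag{7.4.11}$$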

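Adding (7.4.10) and (7.4.11), we get that
$$\sup\Big|L_f^{k+l}\psi - \int\psi\,d\mu\Big| \le K_\theta(\psi)\,2^{-k\theta} + B\sup|\psi|\,\lambda^l \quad\text{for every } k,l\ge 1.$$
Fix σ < 1 such that σ² ≥ max{2^{−θ}, λ}. Then the previous inequality (with l ≈ n/2 ≈ k) gives, up to replacing B by a larger constant,
$$\sup\Big|L_f^n\psi - \int\psi\,d\mu\Big| \le B\,\|\psi\|\,\sigma^{n-1} \quad\text{for every } n. \tag{7.4.12}$$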
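The choice "l ≈ n/2 ≈ k" can be made explicit. The following worked step is one possible bookkeeping, assuming 0 < λ < 1 and n ≥ 2. The case n = 1 only requires enlarging the constant, since L_f is continuous relative to ‖·‖. Take k = ⌊n/2⌋ and l = n − k, so that 2k ≥ n − 1 and 2l ≥ n. Since 2^{−θ} ≤ σ² and λ ≤ σ² with σ < 1:

```latex
\begin{aligned}
K_\theta(\psi)\,2^{-k\theta} &\le K_\theta(\psi)\,\sigma^{2k} \le K_\theta(\psi)\,\sigma^{n-1},\\
B\sup|\psi|\,\lambda^{l} &\le B\sup|\psi|\,\sigma^{2l} \le B\sup|\psi|\,\sigma^{n-1},\\
\sup\Big|L_f^{n}\psi-\int\psi\,d\mu\Big|
  &\le \max\{1,B\}\,\big(K_\theta(\psi)+\sup|\psi|\big)\,\sigma^{n-1}
   = \max\{1,B\}\,\|\psi\|\,\sigma^{n-1}.
\end{aligned}
```

This is why the constant in (7.4.12) may have to be larger than the B in (7.4.11).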

Now Theorem 7.4.1 follows from the same argument that we used before for
E0 , with (7.4.12) in the place of (7.4.8).
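For functions that depend only on the first coordinate, the formula for L_f^k ψ with k = 1 reduces to ψ ↦ (Σ_a L_{j,a} ψ(a))_j. In other words, L_f acts as a d × d matrix whose rows sum to 1, and ∫ψ dμ = Σ_i p_i ψ(i). In this setting the exponential decay can be observed numerically.

The sketch below is an illustration only. The way the matrix is built from a transition matrix P and its stationary vector p is an assumption: we reverse the chain, setting L_{j,i} = p_i P_{i,j} / p_j. This choice makes the rows of L sum to 1 and keeps p invariant. All function names are hypothetical.

```python
def reversed_matrix(P, p):
    """Hypothetical transfer matrix L[j][i] = p[i] * P[i][j] / p[j].

    If p is stationary for the stochastic matrix P (p P = p), then every row of L
    sums to 1 and p L = p, so integrals with respect to mu are preserved.
    """
    d = len(p)
    return [[p[i] * P[i][j] / p[j] for i in range(d)] for j in range(d)]


def apply(L, psi):
    """(L psi)(j) = sum_i L[j][i] * psi(i): L_f acting on functions of x_1 only."""
    return [sum(L[j][i] * psi[i] for i in range(len(psi))) for j in range(len(L))]


def sup_errors(L, p, psi, n_max):
    """Return [sup_j |L^n psi(j) - integral of psi| for n = 1, ..., n_max]."""
    mean = sum(pi * v for pi, v in zip(p, psi))   # integral of psi w.r.t. mu
    errors, current = [], list(psi)
    for _ in range(n_max):
        current = apply(L, current)
        errors.append(max(abs(v - mean) for v in current))
    return errors


if __name__ == "__main__":
    P = [[0.5, 0.5], [0.2, 0.8]]   # an aperiodic transition matrix
    p = [2 / 7, 5 / 7]             # its stationary vector: p P = p
    L = reversed_matrix(P, p)
    errs = sup_errors(L, p, [1.0, 0.0], 8)
    for n, e in enumerate(errs, start=1):
        ratio = e / errs[n - 2] if n > 1 else None
        print(n, e, ratio)
```

In this example the successive ratios are 0.3 up to rounding. This is the second eigenvalue of the matrix, and it plays the role of λ in (7.4.8) for this small family of functions.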

7.4.1 Exercises
7.4.1. Show that ‖ϕ‖ = sup|ϕ| + K_θ(ϕ) defines a complete norm in the space E of θ-Hölder functions and that the transfer operator L_f is continuous relative to this norm.
7.4.2. Let f : M → M be a local diffeomorphism on a compact manifold M and d ≥ 2 be the degree of f. Assume that there exists σ > 1 such that ‖Df(x)v‖ ≥ σ‖v‖ for every x ∈ M and every vector v tangent to M at the point x. Fix θ > 0 and let E be the space of θ-Hölder functions ϕ : M → ℝ. For every ϕ ∈ E, define
$$L_f\varphi : M \to \mathbb{R}, \quad L_f\varphi(y) = \frac{1}{d}\sum_{x \in f^{-1}(y)} \varphi(x).$$
(a) Show that inf ϕ ≤ inf L_fϕ ≤ sup L_fϕ ≤ sup ϕ and K_θ(L_fϕ) ≤ σ^{−θ}K_θ(ϕ) for every ϕ ∈ E.
(b) Conclude that L_f : E → E is a continuous linear operator (relative to the norm defined in Exercise 7.4.1) with ‖L_f‖ = 1.
(c) Show that, for every ϕ ∈ E, the sequence (L_f^nϕ)_n converges to a constant ν_ϕ ∈ ℝ when n → ∞. Moreover, there exists C > 0 such that
$$\|L_f^n\varphi - \nu_\varphi\| \le C\sigma^{-n\theta}\|\varphi\| \quad\text{for every } n \text{ and every } \varphi \in E.$$
(d) Conclude that the operator L_f : E → E has the spectral gap property.
(e) Show that the map ϕ ↦ ν_ϕ extends to a Borel probability measure on M (recall Theorem A.3.12).
8
Equivalent systems

This chapter is devoted to the isomorphism problem in ergodic theory: under what conditions should two systems (f, μ) and (g, ν) be considered "the same", and how does one decide, for given systems, whether they are in those conditions?
The fundamental notion is called ergodic equivalence: two systems are said
to be ergodically equivalent if, restricted to subsets with full measure, the
corresponding transformations are conjugated by some invertible map that
preserves the invariant measures. Through such a map, properties of either
system may be translated to corresponding properties of the other system.
Although this is a natural notion of isomorphism in the context of ergodic
theory, it is not an easy one to handle. In general, the only way to prove that two
given systems are equivalent is by exhibiting the equivalence map more or less
explicitly. On the other hand, the most usual way to show that two systems are
not equivalent is by finding some property that holds for one but not the other.
Thus, it is useful to consider a weaker notion, called spectral equivalence:
two systems are spectrally equivalent if their Koopman operators are conjugated
by some unitary operator. Two ergodically equivalent systems are always
spectrally equivalent, but the converse is not true.
The idea of spectral equivalence leads to a rich family of invariants, related
to the spectrum of the Koopman operator, that must have the same value for
any two systems that are equivalent and, thus, may be used to exclude that
possibility. Other invariants, of non-spectral nature, have an equally crucial
role. The most important of all, the entropy, will be treated in Chapter 9.
The notions of ergodic equivalence and spectral equivalence, and the
relations between them, are studied in Sections 8.1 and 8.2, respectively. In
Sections 8.3 and 8.4 we study two classes of systems with opposite dynamical
features: transformations with discrete spectrum, that include the ergodic
translations on compact abelian groups, and transformations with a Lebesgue
spectrum, which have the Bernoulli shifts as the main example.
These two classes of systems, as well as others that we introduced previously (ergodicity, strong mixing, weak mixing), are invariants of spectral equivalence and, hence, also of ergodic equivalence. Finally, in Section 8.5 we discuss a third notion of equivalence, which we call ergodic isomorphism, especially in the context of Lebesgue spaces.

8.1 Ergodic equivalence


Let μ and ν be probability measures invariant under measurable transformations f : M → M and g : N → N, respectively. We say that the systems (f, μ) and (g, ν) are ergodically equivalent if one can find measurable sets X ⊂ M and Y ⊂ N with μ(M \ X) = 0 and ν(N \ Y) = 0, and a measurable bijection φ : X → Y with measurable inverse, such that
$$\varphi_*\mu = \nu \quad\text{and}\quad \varphi\circ f = g\circ\varphi.$$
We leave it to the reader to check that this is indeed an equivalence relation, that is, reflexive, symmetric and transitive.
Observe also that the sets X and Y in the definition may be chosen to be invariant under f and g, respectively. Indeed, consider X_0 = ⋂_{n=0}^∞ f^{−n}(X). It is clear from the definition that X_0 ⊂ X and f(X_0) ⊂ X_0. Since μ is invariant and μ(X) = 1, each f^{−n}(X) has full measure; as the intersection is countable, we have that μ(X_0) = 1. Analogously, Y_0 = ⋂_{n=0}^∞ g^{−n}(Y) is a measurable subset of Y such that ν(Y_0) = 1 and g(Y_0) ⊂ Y_0. Moreover, by construction, Y_0 = φ(X_0). Therefore, the restriction of φ to X_0 is still a bijection onto Y_0.

Example 8.1.1. Let f : [0, 1] → [0, 1] be defined by f(x) = 10x − [10x]. As we saw in Section 1.3.1, this transformation preserves the Lebesgue measure m on [0, 1]. If one represents each number x ∈ [0, 1] by its decimal expansion x = 0.a_0a_1a_2…, the transformation f corresponds, simply, to discarding the digit a_0 and shifting all the other digits of x one unit to the left. That motivates us to consider:
$$\varphi : \{0,1,\dots,9\}^{\mathbb{N}} \to [0,1], \quad \varphi\big((a_n)_n\big) = \sum_{n=0}^{\infty}\frac{a_n}{10^{n+1}} = 0.a_0a_1a_2\dots$$
It is clear that φ is surjective. On the other hand, it is not injective, since certain real numbers have more than one decimal expansion: for example, 0.1000000… = 0.099999…. Actually, this happens if and only if the number admits a finite decimal expansion, that is, one in which all but finitely many digits are zero. The set of such numbers is countable and, hence, is irrelevant from the point of view of the Lebesgue measure. More precisely, let us consider the set X ⊂ {0, 1, …, 9}^ℕ of all sequences with infinitely many symbols different from 0 and infinitely many symbols different from 9, and the set Y ⊂ [0, 1] of all numbers that do not admit a finite decimal expansion (hence, whose decimal expansion is unique). Then the restriction of φ to X is a bijection onto Y. It is easy to check that both φ|X and its inverse are measurable: use the fact that the image of the intersection of X with each cylinder [0; a_0, …, a_{m−1}] is the intersection of Y with some interval of length 10^{−m}. This observation also shows that φ_*ν = m, where ν denotes the Bernoulli measure on {0, 1, …, 9}^ℕ that assigns equal weights to all the digits. Moreover, denoting by σ the shift map in {0, 1, …, 9}^ℕ, we have that
$$\varphi\circ\sigma\big((a_n)_n\big) = 0.a_1a_2\dots a_n\dots = f\circ\varphi\big((a_n)_n\big)$$
for every (a_n)_n ∈ X. This proves that (f, m) is ergodically equivalent to the Bernoulli shift (σ, ν).
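The conjugacy φ∘σ = f∘φ can be checked exactly on finite digit strings, which represent cylinders. The following Python sketch uses exact rational arithmetic, and its helper names are hypothetical. A finite word a_0…a_{m−1} stands for the point of [0, 1] whose expansion continues with zeros. The sketch therefore only probes the identity on truncations, not on the full sets X and Y.

```python
import math
from fractions import Fraction


def phi(digits):
    """phi((a_n)_n) = sum_n a_n / 10**(n+1), evaluated exactly on a finite word."""
    return sum((Fraction(a, 10 ** (n + 1)) for n, a in enumerate(digits)), Fraction(0))


def f(x):
    """f(x) = 10x - [10x], the fractional part of 10x."""
    return 10 * x - math.floor(10 * x)


def shift(digits):
    """The shift map sigma drops the first digit."""
    return digits[1:]


def cylinder_interval(word):
    """phi maps the cylinder [0; a_0, ..., a_{m-1}] into an interval of length 10**-m."""
    left = phi(word)
    return left, left + Fraction(1, 10 ** len(word))


if __name__ == "__main__":
    word = (3, 1, 4, 1, 5, 9, 2, 6)
    print(f(phi(word)) == phi(shift(word)))      # True
    # 0.0999...9 (twenty nines) approaches 0.1 = phi((1,)): the gap is 10**-21.
    print(phi((1,)) - phi((0,) + (9,) * 20) == Fraction(1, 10 ** 21))   # True
```

The second check shows, on truncations, why φ fails to be injective exactly at numbers with a finite decimal expansion.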

Suppose that (f, μ) and (g, ν) are ergodically equivalent. A measurable set A ⊂ M is invariant under f : M → M if and only if φ(A) is invariant under g : N → N. Moreover, ν(φ(A)) = μ(A). Therefore, (f, μ) is ergodic if and only if (g, ν) is ergodic. It is just as easy to obtain similar conclusions for the mixing and the weak mixing properties. Indeed, essentially all the properties that we study in this book are invariants of ergodic equivalence, that is, if they hold for a given system then they also hold for any system that is ergodically equivalent to it. An exception is unique ergodicity, which is a property of a different nature, since it concerns solely the transformation.
This also means that these properties may be used to try to distinguish
systems that are not ergodically equivalent. Still, that is usually a difficult
task. For example, nothing of what was said so far is of much help towards
answering the following question: are the shift maps

σ : {1, 2}Z → {1, 2}Z and ζ : {1, 2, 3}Z → {1, 2, 3}Z , (8.1.1)

endowed with the corresponding Bernoulli measures giving the same weights
to all the symbols, ergodically equivalent? It is easy to see that σ and ζ are
not topologically conjugate (for example: ζ has three fixed points, whereas
σ has only two), but the existence of an ergodic equivalence is a much more
delicate issue. In fact, this type of question motivates most of the content of
the present chapter and also leads to the notion of entropy, which is the subject
of Chapter 9.

Example 8.1.2. Let σ : M → M be the shift map in M = X^ℕ and let μ = ν^ℕ be a Bernoulli measure. Let σ̂ : M̂ → M̂ be the natural extension of σ and μ̂ be the lift of μ (Section 2.4.2). Moreover, let σ̃ : M̃ → M̃ be the shift map in M̃ = X^ℤ and μ̃ = ν^ℤ be the corresponding Bernoulli measure. Then (σ̂, μ̂) is ergodically equivalent to (σ̃, μ̃). An equivalence may be constructed as follows.
By definition, M̂ is the space of pre-orbits of σ : M → M, that is, of all the sequences x̂ = (…, x_{−n}, …, x_0) in M such that σ(x_{−j}) = x_{−j+1} for every j ≥ 1. Moreover, each x_{−j} is a sequence (x_{−j,i})_{i∈ℕ} in X. So, the previous relation means that
$$x_{-j,i+1} = x_{-j+1,i} \quad\text{for every } i \in \mathbb{N} \text{ and every } j \ge 1. \tag{8.1.2}$$
Consider the map φ : M̂ → M̃, x̂ ↦ x̃ given by
$$\tilde x_n = x_{0,n} = x_{-1,n+1} = \cdots \quad (n \ge 0) \qquad\text{and}\qquad \tilde x_{-n} = x_{-n,0} = x_{-n-1,1} = \cdots \quad (n \ge 1).$$
We leave it to the reader to check that φ is indeed an ergodic equivalence between the natural extension (σ̂, μ̂) and the two-sided shift map (σ̃, μ̃).
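The map φ and the relation (8.1.2) can be checked mechanically. In the Python sketch below (an illustration with hypothetical names), a two-sided sequence x̃ is any function from the integers to X. A pre-orbit x̂ is represented by the function (j, i) ↦ x_{−j,i} for j ≥ 0. The inverse of φ is then x_{−j,i} = x̃_{i−j}. Assuming, as usual, that σ̂ appends σ(x_0) to the pre-orbit, the sketch also checks that φ turns σ̂ into the two-sided shift σ̃.

```python
def pre_orbit(x_tilde):
    """Inverse of phi: x_{-j,i} = x~_{i-j}, stored as (j, i) -> x_{-j,i} for j >= 0."""
    return lambda j, i: x_tilde(i - j)


def phi(x_hat):
    """phi(x^) = x~ with x~_n = x_{0,n} (n >= 0) and x~_{-n} = x_{-n,0} (n >= 1)."""
    return lambda n: x_hat(0, n) if n >= 0 else x_hat(-n, 0)


def sigma_hat(x_hat):
    """Natural extension: (..., x_{-1}, x_0) -> (..., x_{-1}, x_0, sigma(x_0))."""
    # The new entry at position 0 is sigma(x_0); old x_{-j+1} moves to position -j.
    return lambda j, i: x_hat(0, i + 1) if j == 0 else x_hat(j - 1, i)


def sigma_tilde(x_tilde):
    """Two-sided shift: (sigma~ x~)_n = x~_{n+1}."""
    return lambda n: x_tilde(n + 1)


def satisfies_8_1_2(x_hat, window=6):
    """Check x_{-j,i+1} = x_{-j+1,i} for 1 <= j < window and 0 <= i < window."""
    return all(
        x_hat(j, i + 1) == x_hat(j - 1, i)
        for j in range(1, window)
        for i in range(window)
    )


if __name__ == "__main__":
    x_tilde = lambda n: (7 * n * n + 3 * n) % 5    # an arbitrary point of X^Z, X = {0,...,4}
    x_hat = pre_orbit(x_tilde)
    print(satisfies_8_1_2(x_hat))                                   # True
    print(all(phi(x_hat)(n) == x_tilde(n) for n in range(-8, 9)))   # True
    print(all(phi(sigma_hat(x_hat))(n) == sigma_tilde(x_tilde)(n) for n in range(-8, 9)))  # True
```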

8.1.1 Exercises
8.1.1. Let f : [0, 1] → [0, 1] be the transformation defined by f(x) = 2x − [2x] and m be the Lebesgue measure on [0, 1]. Exhibit a map g : [0, 1] → [0, 1] and a probability measure ν invariant under g such that (g, ν) is ergodically equivalent to (f, m) and the support of ν has empty interior.
8.1.2. Let f : {1, …, k}^ℕ → {1, …, k}^ℕ and g : {1, …, l}^ℕ → {1, …, l}^ℕ be one-sided shift maps, endowed with Bernoulli measures μ and ν, respectively. Show that, for every set X ⊂ {1, …, k}^ℕ with f^{−1}(X) = X and μ(X) = 1, there exists x ∈ X such that #(X ∩ f^{−1}(x)) = k. Conclude that if k ≠ l then (f, μ) and (g, ν) cannot be ergodically equivalent.
8.1.3. Let X = {1, . . . , d} and consider the shift map σ : X N → X N endowed with
a Markov measure μ. Given any cylinder C = [0; c0 , . . . , cl ] in X N , let μC
be the normalized restriction of μ to C. Show that there exists an induced
transformation σC : C → C (see Section 1.4.2) preserving μC and such that
(σC , μC ) is ergodically equivalent to a Bernoulli shift (σN , ν) in NN .

8.2 Spectral equivalence


Let f : M → M and g : N → N be transformations preserving probability measures μ and ν, respectively. Let U_f : L²(μ) → L²(μ) and U_g : L²(ν) → L²(ν) be the corresponding Koopman operators. We say that (f, μ) and (g, ν) are spectrally equivalent if there exists some unitary operator L : L²(μ) → L²(ν) such that
$$U_g\circ L = L\circ U_f. \tag{8.2.1}$$
We leave it to the reader to check that the relation defined in this way is, indeed, an equivalence relation.
It is easy to see that if two systems are ergodically equivalent then they are spectrally equivalent. Indeed, suppose that there exists an invertible map φ : M → N such that φ_*μ = ν and φ∘f = g∘φ. Then the Koopman operator
$$U_\varphi : L^2(\nu) \to L^2(\mu), \quad U_\varphi(\psi) = \psi\circ\varphi$$
is an isometry and is invertible: the inverse is the Koopman operator associated with φ^{−1}. In other words, U_φ is a unitary operator. Moreover,
$$U_f\circ U_\varphi = U_{\varphi\circ f} = U_{g\circ\varphi} = U_\varphi\circ U_g.$$
Therefore, L = U_φ^{−1} = U_{φ^{−1}} satisfies (8.2.1), and so it is a spectral equivalence between the two systems.
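The identity U_f∘U_φ = U_φ∘U_g can be illustrated on a toy example that is not taken from the text. Take a permutation f of a finite set with the uniform measure (which every permutation preserves), a relabeling bijection φ, and g = φ∘f∘φ^{−1}. In this finite setting L²(μ) is just ℂ^m. The Python sketch below (hypothetical helper names) checks that L = U_{φ^{−1}} preserves inner products and satisfies (8.2.1) on a test vector.

```python
def koopman(h, psi):
    """U_h psi = psi o h, for maps h given as lists (h[x] is the image of x)."""
    return [psi[h[x]] for x in range(len(h))]


def inverse(h):
    """Inverse of a permutation given as a list."""
    inv = [0] * len(h)
    for x, y in enumerate(h):
        inv[y] = x
    return inv


def compose(a, b):
    """(a o b)(x) = a[b[x]]."""
    return [a[b[x]] for x in range(len(b))]


def inner(u, v):
    """u . v = integral of u * conj(v) with respect to the uniform measure."""
    return sum(a * b.conjugate() for a, b in zip(u, v)) / len(u)


if __name__ == "__main__":
    f = [1, 2, 0, 4, 3]                            # a permutation of {0, ..., 4}
    phi = [3, 0, 4, 1, 2]                          # a relabeling bijection
    g = compose(phi, compose(f, inverse(phi)))     # g = phi o f o phi^{-1}
    L = lambda psi: koopman(inverse(phi), psi)     # L = U_{phi^{-1}}
    psi = [1 + 2j, -1j, 0.5, 3, -2 + 1j]
    print(koopman(g, L(psi)) == L(koopman(f, psi)))               # True: U_g o L = L o U_f
    print(abs(inner(L(psi), L(psi)) - inner(psi, psi)) < 1e-12)  # True: L preserves norms
```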

The converse is false, as will be clear from the sequel. For example,
all countably generated two-sided Bernoulli shifts are spectrally equivalent
(Corollary 8.4.12); however, not all have the same entropy (Example 9.1.10)
and so not all are ergodically equivalent.

8.2.1 Invariants of spectral equivalence


Recall that the spectrum spec(A) of a bounded linear operator A : E → E in a complex Banach space E consists of the complex numbers λ such that A − λ id is not invertible. We say that λ ∈ spec(A) is an eigenvalue if A − λ id is not injective, that is, if there exists v ≠ 0 such that Av = λv. Then the dimension of the kernel of A − λ id is called the multiplicity of the eigenvalue.
By definition, the spectrum of a system (f, μ) is the spectrum of the corresponding Koopman operator U_f : L²(μ) → L²(μ). If (f, μ) is spectrally equivalent to (g, ν) then the two systems have the same spectrum: the relation (8.2.1) implies that
$$U_g - \lambda\,\mathrm{id} = L\circ(U_f - \lambda\,\mathrm{id})\circ L^{-1} \tag{8.2.2}$$
and, consequently, U_g − λ id is invertible if and only if U_f − λ id is invertible.
Nevertheless, the spectrum itself is a poor invariant: in particular, all invertible ergodic systems whose invariant measure has no atoms have the same spectrum (Exercise 8.2.1). The associated spectral measure, however, does provide very useful invariants, as we are going to see. The simplest one is the set of atoms of the spectral measure, that is, the set of eigenvalues of the Koopman operator. Note that (8.2.2) also shows that a given λ is an eigenvalue of U_f if and only if it is an eigenvalue of U_g; besides, in that case the two multiplicities are equal.
Observe that 1 is always an eigenvalue of the Koopman operator, since
Uf ϕ = ϕ for every constant function ϕ. By Proposition 4.1.3(v), the system
(f , μ) is ergodic if and only if this eigenvalue has multiplicity 1 for Uf . Thus,
it follows from what we have just said that (f , μ) is ergodic if and only if any
system (g, ν) spectrally equivalent to it is ergodic. In other words, ergodicity is
an invariant of spectral equivalence.
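Analogously, suppose that the system (f, μ) is mixing. Then, by Proposition 7.1.12,
$$\lim_n U_f^n\varphi\cdot\psi = \int\varphi\,d\mu\int\bar\psi\,d\mu$$
for every ϕ, ψ ∈ L²(μ). Now suppose that (g, ν) is spectrally equivalent to (f, μ). Let L be the unitary operator in (8.2.1). The inverse L^{−1} maps eigenvectors of U_g associated with the eigenvalue 1 to eigenvectors of U_f associated with the same eigenvalue 1. Since (f, μ) is mixing, it is ergodic, and so (g, ν) is ergodic as well, by the previous paragraph. Hence, for both systems the eigenvectors associated with the eigenvalue 1 are the non-zero constant functions, and L^{−1} maps constant functions to constant functions. In particular, L^{−1}1 is a constant of absolute value 1; multiplying L by a suitable constant of absolute value 1 (which does not affect (8.2.1)), we may assume that L^{−1}1 = 1. Since L^{−1} is unitary,
$$U_g^n\varphi\cdot\psi = L^{-1}(U_g^n\varphi)\cdot L^{-1}\psi = U_f^n(L^{-1}\varphi)\cdot L^{-1}\psi$$
and, hence, $\lim_n U_g^n\varphi\cdot\psi = \int L^{-1}\varphi\,d\mu\int\overline{L^{-1}\psi}\,d\mu$ for every ϕ, ψ ∈ L²(ν). Also,
$$\int L^{-1}\varphi\,d\mu = L^{-1}\varphi\cdot 1 = L^{-1}\varphi\cdot L^{-1}1 = \varphi\cdot 1 = \int\varphi\,d\nu$$
and, analogously, ∫L^{−1}ψ dμ = ∫ψ dν. In this way, we have shown that
$$\lim_n U_g^n\varphi\cdot\psi = \int\varphi\,d\nu\int\bar\psi\,d\nu$$
for every ϕ, ψ ∈ L²(ν), that is, (g, ν) is also mixing. This shows that the mixing property is an invariant of spectral equivalence.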
The same argument may be used for the weak mixing property, though the
theorem that we prove in Section 8.2.2 below gives us a more interesting proof
of the fact that weak mixing is an invariant of spectral equivalence.

8.2.2 Eigenvalues and weak mixing


As we have seen, the Koopman operator Uf : L2 (μ) → L2 (μ) of a system (f , μ)
is an isometry, that is, it satisfies Uf∗ Uf = id . If f is invertible then the Koopman
operator is unitary, that is, it satisfies Uf∗ Uf = Uf Uf∗ = id . In particular, in that
case Uf is a normal operator. Then the property of weak mixing admits the
following interesting characterization:
Theorem 8.2.1. An invertible system (f , μ) is weak mixing if and only if the
constant functions are the only eigenvectors of the Koopman operator.
In particular, an invertible system (f, μ) is weak mixing if and only if it is ergodic and 1 is the unique eigenvalue of U_f.

Proof. Suppose that (f, μ) is weak mixing. Let ϕ ∈ L²(μ) be any (non-zero) eigenfunction of U_f and λ be the corresponding eigenvalue. Then
$$\int\varphi\,d\mu = \int U_f\varphi\,d\mu = \lambda\int\varphi\,d\mu,$$
and this implies that ∫ϕ dμ = 0 or λ = 1. In the first case,
$$C_j(\varphi,\bar\varphi) = \Big|\int (U_f^j\varphi)\,\bar\varphi\,d\mu\Big| = \Big|\lambda^j\int\varphi\bar\varphi\,d\mu\Big| = \int|\varphi|^2\,d\mu$$
for every j ≥ 1 (recall that |λ| = 1). But then
$$\lim_n \frac{1}{n}\sum_{j=0}^{n-1} C_j(\varphi,\bar\varphi) = \int|\varphi|^2\,d\mu > 0,$$
contradicting the hypothesis that the system is weak mixing. In the second case, using that the system is ergodic (weak mixing implies ergodicity), we find that ϕ is constant at μ-almost every point. This shows that if the system is weak mixing then the constant functions are the only eigenvectors.

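Now suppose that the only eigenvectors of U_f are the constant functions. To conclude that (f, μ) is weak mixing, we must show that
$$\frac{1}{n}\sum_{j=0}^{n-1} C_j(\varphi,\psi)^2 \to 0 \quad\text{for any } \varphi,\psi\in L^2(\mu)$$
(recall Exercise 7.1.2). It follows immediately from the definition that
$$C_j(\varphi,\psi) = C_j(\varphi',\psi), \quad\text{where } \varphi' = \varphi - \int\varphi\,d\mu,$$
and the integral of ϕ′ vanishes. Hence, it is no restriction to suppose that ∫ϕ dμ = 0. Then, using the relation (A.7.6) for the unitary operator L = U_f, we get:
$$C_j(\varphi,\psi)^2 = \Big|\int (U_f^j\varphi)\,\psi\,d\mu\Big|^2 = \Big|\int_{\mathbb C} z^j\,d\theta(z)\Big|^2,$$
where θ denotes the complex measure B ↦ E(B)ϕ · ψ̄, so that ∫ z^j dθ(z) = U_f^jϕ · ψ̄. The expression on the right-hand side may be rewritten as follows:
$$\int_{\mathbb C} z^j\,d\theta(z)\int_{\mathbb C}\bar z^j\,d\bar\theta(z) = \int_{\mathbb C}\int_{\mathbb C} z^j\bar w^j\,d\theta(z)\,d\bar\theta(w).$$
Therefore, given any n ≥ 1,
$$\frac{1}{n}\sum_{j=0}^{n-1} C_j(\varphi,\psi)^2 = \int_{\mathbb C}\int_{\mathbb C}\frac{1}{n}\sum_{j=0}^{n-1}(z\bar w)^j\,d\theta(z)\,d\bar\theta(w). \tag{8.2.3}$$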

We claim that the measure θ is non-atomic. In fact, suppose that there exists λ ∈ ℂ such that θ({λ}) ≠ 0. Then E({λ})ϕ ≠ 0, and we may use Proposition A.7.8 to conclude that the function E({λ})ϕ is an eigenvector of U_f. By the hypothesis about the operator U_f, this implies that E({λ})ϕ is constant at μ-almost every point. Hence, since ∫ϕ dμ = 0,
$$E(\{\lambda\})\varphi\cdot\varphi = \int E(\{\lambda\})\varphi\;\bar\varphi\,d\mu = 0.$$
Lemma A.7.3 also gives that
$$E(\{\lambda\})\varphi\cdot\varphi = E(\{\lambda\})^2\varphi\cdot\varphi = E(\{\lambda\})\varphi\cdot E(\{\lambda\})\varphi.$$
Putting these two identities together, we conclude that E({λ})ϕ = 0, which contradicts the hypothesis. Thus, our claim is proved.

The sequence $\frac{1}{n}\sum_{j=0}^{n-1}(z\bar w)^j$ in (8.2.3) is bounded (by 1, since the spectral measure of the unitary operator U_f is supported on the unit circle) and (see Exercise 8.2.6) converges to zero on the complement of the diagonal Δ = {(z, w) : z = w}. Moreover, the diagonal has measure zero:
$$(\theta\times\bar\theta)(\Delta) = \int\theta(\{y\})\,d\bar\theta(y) = 0,$$
because θ is non-atomic. Then we may use the dominated convergence theorem to conclude that (8.2.3) converges to zero when n → ∞. This proves that (f, μ) is weak mixing if U_f has no non-constant eigenvectors.
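The two facts about the averages (1/n) Σ_{j<n} (z w̄)^j used above (boundedness, and convergence to zero off the diagonal) come from the geometric sum. For u = z w̄ ≠ 1 on the unit circle, |Σ_{j<n} u^j| = |1 − u^n| / |1 − u| ≤ 2 / |1 − u|. The Python sketch below checks this numerically. It is an illustration, not part of the proof, and the function names are hypothetical.

```python
import cmath
import math


def cesaro_mean(u, n):
    """(1/n) * sum_{j=0}^{n-1} u**j, computed term by term."""
    return sum(u ** j for j in range(n)) / n


def geometric_bound(u, n):
    """Bound 2 / (n |1 - u|) for |mean|, valid when |u| = 1 and u != 1."""
    return 2 / (n * abs(1 - u))


if __name__ == "__main__":
    z = cmath.exp(2j * math.pi * 0.10)
    w = cmath.exp(2j * math.pi * 0.35)
    u = z * w.conjugate()                      # off the diagonal: z != w
    for n in (10, 100, 1000, 10000):
        print(n, abs(cesaro_mean(u, n)), geometric_bound(u, n))
    print(cesaro_mean(z * z.conjugate(), 50))  # on the diagonal the mean stays close to 1
```

Away from the diagonal the averages decay like 1/n. On the diagonal they do not decay, which is why the argument needs the diagonal to have (θ × θ̄)-measure zero.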

Suppose that M is a topological space. We say that a continuous map f : M → M is topologically weak mixing if the Koopman operator U_f has no non-constant continuous eigenfunctions. The following fact is an easy consequence of Theorem 8.2.1:
Corollary 8.2.2. If (f, μ) is weak mixing then the restriction of f to the support of μ is topologically weak mixing.

Proof. Let ϕ be a continuous eigenfunction of U_f. By Theorem 8.2.1 (the part of its proof that deduces this from weak mixing does not use invertibility), the function ϕ is constant at μ-almost every point. Hence, by continuity, ϕ is constant (at every point) on the support of μ.

We mentioned in Section 7.3 that almost every interval exchange is weak mixing but not mixing. In the following we describe an explicit construction, based on an extension of ideas that were hinted at in Example 6.3.9. The reader may find this and other variations of those ideas in Section 7.4 of Kalikow and McCutcheon [KM10].
Example 8.2.3 (Chacon). Consider the sequence (S_n)_n of piles defined as follows. First, S_1 = {[0, 2/3)}. Next, for each n ≥ 2, let S_n be the pile obtained by dividing S_{n−1} into three columns with the same width and piling those columns up on top of each other, with an additional interval (of the same width, placed immediately to the right of the interval J_{n−1} covered by S_{n−1}) inserted between the second column and the third one, as illustrated in Figure 8.1.
For example, S_2 = {[0, 2/9), [2/9, 4/9), [6/9, 8/9), [4/9, 6/9)} and
S_3 = {[0, 2/27), [6/27, 8/27), [18/27, 20/27), [12/27, 14/27), [2/27, 4/27), [8/27, 10/27), [20/27, 22/27), [14/27, 16/27), [24/27, 26/27), [4/27, 6/27), [10/27, 12/27), [22/27, 24/27), [16/27, 18/27)}.
Note that each S_n is a pile in the interval J_n = [0, 1 − 3^{−n}). The sequence (f_n)_n of transformations associated with such piles converges at every point to a transformation f : [0, 1) → [0, 1) that preserves the Lebesgue measure m. This system (f, m) is weak mixing but not mixing (Exercise 8.2.7).
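The piles S_n can be generated exactly with rational arithmetic. The Python sketch below is one reading of the construction, and it is an assumption consistent with the lists S_2 and S_3 given above. The intervals of S_{n−1} all have width 2/3^{n−1}, and each is cut into three. The spacer is the interval of width 2/3^n placed immediately to the right of J_{n−1}. The helper f_n moves a point to the level directly above it. All names are hypothetical.

```python
from fractions import Fraction


def next_pile(pile, n):
    """Build S_n from S_{n-1}: stack column 0, column 1, a spacer, then column 2."""
    width = Fraction(2, 3 ** n)
    columns = [
        [(a + c * width, a + (c + 1) * width) for (a, _b) in pile] for c in range(3)
    ]
    right_end = 1 - Fraction(1, 3 ** (n - 1))        # J_{n-1} = [0, right_end)
    spacer = (right_end, right_end + width)
    return columns[0] + columns[1] + [spacer] + columns[2]


def chacon_piles(n_max):
    """Return {n: S_n} for 1 <= n <= n_max, each S_n listed from bottom to top."""
    piles = {1: [(Fraction(0), Fraction(2, 3))]}
    for n in range(2, n_max + 1):
        piles[n] = next_pile(piles[n - 1], n)
    return piles


def f_n(pile, x):
    """Translate x from its level to the level right above it (not defined on the top)."""
    for k, (a, b) in enumerate(pile[:-1]):
        if a <= x < b:
            return x - a + pile[k + 1][0]
    raise ValueError("x lies on the top level or outside J_n")


if __name__ == "__main__":
    piles = chacon_piles(4)
    print([f"[{a}, {b})" for a, b in piles[2]])
    for n, pile in piles.items():
        print(n, len(pile), sum(b - a for a, b in pile))   # heights and total length
```

The check f_2(1/9) = f_3(1/9) in the tests reflects why (f_n)_n converges: each f_n extends the previous one off the top level.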

Figure 8.1. Constructing a weak mixing system that is not mixing

8.2.3 Exercises
8.2.1. Let (f, μ) be an invertible ergodic system with no atoms. Show that every λ in the unit circle {z ∈ ℂ : |z| = 1} is an approximate eigenvalue of the Koopman operator U_f : L²(μ) → L²(μ): there exists some sequence (ϕ_n)_n such that ‖ϕ_n‖ → 1 and ‖U_fϕ_n − λϕ_n‖ → 0. In particular, the spectrum of U_f coincides with the unit circle.
8.2.2. Let m be the Lebesgue measure on the circle and Uα : L2 (m) → L2 (m) be
the Koopman operator of the irrational rotation Rα : S1 → S1 . Calculate the
eigenvalues of Uα and deduce that (Rα , m) and (Rβ , m) are spectrally equivalent
if and only if α = ±β. [Observation: Corollary 8.3.6 provides a more complete
statement.]
8.2.3. Let m be the Lebesgue measure on the circle and, for each integer number k ≥ 2, let U_k : L²(m) → L²(m) be the Koopman operator of the transformation f_k : S¹ → S¹ given by f_k(x) = kx mod ℤ. Check that if p ≠ q then (f_p, m) and (f_q, m) are not ergodically equivalent. Show that, for any k ≥ 2,
$$L^2(m) = \{\text{constants}\} \oplus \bigoplus_{j=0}^{\infty} U_k^j(H_k),$$
where $H_k = \{\sum_{n\in\mathbb Z} a_n e^{2\pi i n x} : a_n = 0 \text{ if } k \mid n\}$ and the terms in the direct sum are pairwise orthogonal. Conclude that (f_p, m) and (f_q, m) are spectrally equivalent for any p and q.
8.2.4. Let f : S1 → S1 be the transformation given by f (x) = kx mod Z and μ be the
Lebesgue measure. Show that (f , μ) is weak mixing if and only if |k| ≥ 2.
8.2.5. Prove that, for any invertible transformation f, if μ is ergodic for every iterate f^n and there exists C > 0 such that
$$\limsup_n \mu\big(f^{-n}(A)\cap B\big) \le C\mu(A)\mu(B)$$
for any measurable sets A and B, then μ is weak mixing. [Observation: This statement is due to Ornstein [Orn72]. In fact, he proved more: under these hypotheses the system is (strongly) mixing.]
8.2.6. Let z and w be two complex numbers with absolute value 1. Check that
(a) $\lim_n \frac{1}{n}\sum_{j=0}^{n-1}|z^j - 1| = 0$ if and only if z = 1;
(b) $\lim_n \frac{1}{n}\sum_{j=0}^{n-1}(z\bar w)^j = 0$ if z ≠ w.
8.2.7. Consider the system (f, m) in Example 8.2.3. Show that
(a) the system (f, m) is ergodic;
(b) the only eigenvectors of the Koopman operator U_f : L²(m) → L²(m) are the constant functions, and hence (f, m) is weak mixing;
(c) lim sup_n m(f^n(A) ∩ A) ≥ 2/27 if we take A = [0, 2/9); in particular, (f, m) is not mixing.

8.3 Discrete spectrum


In this section and the next we study two extreme cases, in terms of the type
of spectral measure of the Koopman operator: systems with discrete spectrum,
whose spectral measure is purely atomic, and systems with Lebesgue spectrum,
whose spectral measure is equivalent to the Lebesgue measure on the unit
circle.
We begin by describing some properties of the eigenvalues and eigenvectors
of the Koopman operator. It is clear that all the eigenvalues are in the unit
circle, since Uf is an isometry.
Proposition 8.3.1. If ϕ_1, ϕ_2 ∈ L²(μ) satisfy U_fϕ_1 = λ_1ϕ_1 and U_fϕ_2 = λ_2ϕ_2 with λ_1 ≠ λ_2, then ϕ_1 · ϕ_2 = 0.
If the system (f, μ) is ergodic then every eigenvalue of U_f is simple, the absolute value of every eigenfunction is constant at μ-almost every point and, moreover, the eigenvalues of U_f form a subgroup of the unit circle.

Proof. The first claim follows from the identity
$$\varphi_1\cdot\varphi_2 = U_f\varphi_1\cdot U_f\varphi_2 = \lambda_1\varphi_1\cdot\lambda_2\varphi_2 = \lambda_1\bar\lambda_2(\varphi_1\cdot\varphi_2) = \lambda_1\lambda_2^{-1}(\varphi_1\cdot\varphi_2),$$
since λ_1λ_2^{−1} ≠ 1.
Now assume that (f, μ) is ergodic and suppose that U_fϕ = λϕ. Then U_f(|ϕ|) = |U_fϕ| = |λϕ| = |ϕ| at μ-almost every point. By ergodicity, this implies that |ϕ| is constant at μ-almost every point. Next, suppose that U_fϕ_1 = λϕ_1, U_fϕ_2 = λϕ_2 and the functions ϕ_1 and ϕ_2 are not identically zero. Since |ϕ_2| is constant at μ-almost every point, ϕ_2(x) ≠ 0 for μ-almost every x. Then ϕ_1/ϕ_2 is well defined. Moreover,
$$U_f\Big(\frac{\varphi_1}{\varphi_2}\Big) = \frac{U_f(\varphi_1)}{U_f(\varphi_2)} = \frac{\lambda\varphi_1}{\lambda\varphi_2} = \frac{\varphi_1}{\varphi_2}.$$
By ergodicity, it follows that the quotient is constant at μ-almost every point. That is, ϕ_1 = cϕ_2 for some c ∈ ℂ.
Finally, let λ_1 and λ_2 be eigenvalues, with eigenfunctions ϕ_1 and ϕ_2. Each |ϕ_i| is a non-zero constant at μ-almost every point, so ϕ_1ϕ̄_2 is a non-zero bounded function and
$$U_f(\varphi_1\bar\varphi_2) = (U_f\varphi_1)\,\overline{U_f\varphi_2} = \lambda_1\bar\lambda_2\,\varphi_1\bar\varphi_2 = \lambda_1\lambda_2^{-1}\,\varphi_1\bar\varphi_2.$$
Thus, the set of eigenvalues is closed under the operation (λ_1, λ_2) ↦ λ_1λ_2^{−1}. Recalling that 1 is always an eigenvalue, it follows that this set is a multiplicative subgroup of the unit circle.
We say that a system (f, μ) has discrete spectrum if the eigenvectors of the Koopman operator U_f : L²(μ) → L²(μ) generate the Hilbert space L²(μ). Observe that this implies that U_f is invertible and, hence, is a unitary operator: its range is closed, because U_f is an isometry, and it contains every eigenvector, hence a dense subspace. This terminology is justified by the following observation (recall also Theorem A.7.9):
Proposition 8.3.2. A system (f, μ) has discrete spectrum if and only if its Koopman operator U_f has a spectral representation of the form
$$T : \bigoplus_j L^2(\sigma_j)^{\chi_j} \to \bigoplus_j L^2(\sigma_j)^{\chi_j}, \quad (\varphi_{j,l})_{j,l} \mapsto \big(z\mapsto z\,\varphi_{j,l}(z)\big)_{j,l}, \tag{8.3.1}$$
where each σ_j is a Dirac measure at a point in the unit circle.

Proof. Suppose that U_f admits a spectral representation of the form (8.3.1) with σ_j = δ_{λ_j} for some λ_j in the unit circle. Each L²(σ_j)^{χ_j} may be canonically identified with a subspace in the direct sum. The restriction of T to that subspace coincides with λ_j id, since
$$z\,\varphi_{j,l}(z) = \lambda_j\varphi_{j,l}(z) \quad\text{at } \sigma_j\text{-almost every point.} \tag{8.3.2}$$
Let (v_{j,l})_l be a Hilbert basis of L²(σ_j)^{χ_j}. Then (v_{j,l})_{j,l} is a Hilbert basis of the direct sum formed by eigenvectors of T. Since T is unitarily conjugate to U_f, it follows that L²(μ) admits a Hilbert basis formed by eigenvectors of the Koopman operator.
Now suppose that (f, μ) has discrete spectrum. Let (λ_j)_j be the eigenvalues of U_f and, for each j, let σ_j = δ_{λ_j} and χ_j be the Hilbert dimension of the eigenspace ker(U_f − λ_j id). Note that the space L²(σ_j) is 1-dimensional, since every function is constant at σ_j-almost every point. Therefore, the Hilbert dimension of L²(δ_{λ_j})^{χ_j} is also equal to χ_j. Hence, there exists some unitary isomorphism
$$L_j : \ker(U_f - \lambda_j\,\mathrm{id}) \to L^2(\delta_{\lambda_j})^{\chi_j}.$$
It is clear that L_j∘U_f∘L_j^{−1} = λ_j id. In other words, recalling the observation (8.3.2),
$$L_j\circ U_f\circ L_j^{-1} : (\varphi_{j,l})_l \mapsto \big(z\mapsto\lambda_j\varphi_{j,l}(z)\big)_l = \big(z\mapsto z\,\varphi_{j,l}(z)\big)_l. \tag{8.3.3}$$
The eigenspaces ker(U_f − λ_j id) generate L²(μ), by hypothesis, and they are pairwise orthogonal, by Proposition 8.3.1. Hence, we may combine the operators L_j to obtain a unitary isomorphism L : L²(μ) → ⊕_j L²(σ_j)^{χ_j}. The relation (8.3.3) gives that
$$L\circ U_f\circ L^{-1} : (\varphi_{j,l})_{j,l} \mapsto \big(z\mapsto z\,\varphi_{j,l}(z)\big)_{j,l}$$
is a spectral representation of U_f of the form we are looking for.

Example 8.3.3. Let m be the Lebesgue measure on the torus 𝕋^d. Consider the Fourier basis {φ_k(x) = e^{2πik·x} : k ∈ ℤ^d} of the Hilbert space L²(m). Let f be the rotation R_θ : 𝕋^d → 𝕋^d corresponding to a given vector θ = (θ_1, …, θ_d). Then
$$U_f\phi_k(x) = \phi_k(x+\theta) = e^{2\pi i k\cdot\theta}\phi_k(x) \quad\text{for every } x\in\mathbb T^d.$$
This shows that every φ_k is an eigenvector of U_f and, hence, (f, m) has discrete spectrum. Note that the group of eigenvalues is
$$G_\theta = \{e^{2\pi i k\cdot\theta} : k\in\mathbb Z^d\}, \tag{8.3.4}$$
that is, the subgroup of the unit circle generated by {e^{2πiθ_j} : j = 1, …, d}.
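The eigenvalue relation U_fφ_k = e^{2πik·θ}φ_k can be checked numerically. So can the fact that k ↦ e^{2πik·θ} turns sums of frequencies into products of eigenvalues, which is what makes G_θ a group. The Python sketch below uses an arbitrarily chosen vector θ and hypothetical function names.

```python
import cmath
import math


def character(k, x):
    """phi_k(x) = exp(2 pi i k . x) on the torus T^d."""
    return cmath.exp(2j * math.pi * sum(ki * xi for ki, xi in zip(k, x)))


def rotation(theta, x):
    """R_theta(x) = x + theta mod Z^d."""
    return tuple((xi + ti) % 1.0 for xi, ti in zip(x, theta))


def eigenvalue(k, theta):
    """e^{2 pi i k . theta}, the eigenvalue of U_f associated with phi_k."""
    return character(k, theta)


if __name__ == "__main__":
    theta = (math.sqrt(2) - 1, math.sqrt(3) - 1)
    x = (0.123, 0.789)
    for k in [(1, 0), (0, 1), (2, -3), (5, 7)]:
        lhs = character(k, rotation(theta, x))        # (U_f phi_k)(x)
        rhs = eigenvalue(k, theta) * character(k, x)  # e^{2 pi i k.theta} phi_k(x)
        print(k, abs(lhs - rhs))                      # of the order of rounding error
```

The reduction mod ℤ^d in the rotation does not affect the characters, because k has integer entries.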

More generally, every ergodic translation in a compact abelian group has discrete spectrum. Conversely, every ergodic system with discrete spectrum is ergodically isomorphic to a translation in a compact abelian group (the notion of ergodic isomorphism is discussed in Section 8.5). Another interesting result is that every subgroup of the unit circle is the group of eigenvalues of some ergodic system with discrete spectrum. These facts are proved in Section 3.3 of the book of Peter Walters [Wal82].
Proposition 8.3.4. Suppose that (f , μ) and (g, ν) are ergodic and have discrete
spectrum. Then (f , μ) and (g, ν) are spectrally equivalent if and only if their
Koopman operators Uf : L2 (μ) → L2 (μ) and Ug : L2 (ν) → L2 (ν) have the
same eigenvalues.

Proof. It is clear that if the Koopman operators are conjugate then they have the same eigenvalues. To prove the converse, let (λ_j)_j be the eigenvalues of the two operators. By Proposition 8.3.1, the eigenvalues are simple. For each j, let u_j and v_j be unit vectors in ker(U_f − λ_j id) and ker(U_g − λ_j id), respectively. Then (u_j)_j and (v_j)_j are Hilbert bases of L²(μ) and L²(ν), respectively, by the hypothesis of discrete spectrum and the orthogonality in Proposition 8.3.1. Consider the isomorphism L : L²(μ) → L²(ν) defined by L(u_j) = v_j. This operator is unitary, since it maps a Hilbert basis to a Hilbert basis, and it satisfies
$$L\circ U_f(u_j) = L(\lambda_ju_j) = \lambda_jv_j = U_g(v_j) = U_g\circ L(u_j)$$
for every j. By linearity and continuity, it follows that L∘U_f = U_g∘L. Therefore, (f, μ) and (g, ν) are spectrally equivalent.

Corollary 8.3.5. If (f, μ) is ergodic, invertible and has discrete spectrum then (f, μ) is spectrally equivalent to (f^{−1}, μ).

Proof. It is clear that λ is an eigenvalue of U_f if and only if λ^{−1} is an eigenvalue of U_{f^{−1}}; moreover, in that case the eigenvectors are the same. In particular, (f^{−1}, μ) has discrete spectrum; it is also ergodic, since f and f^{−1} have the same invariant sets. Since the sets of eigenvalues are groups, it follows that the two operators have the same eigenvalues. Apply Proposition 8.3.4.

Let m be the Lebesgue measure on the torus 𝕋^d. Proposition 8.3.4 also allows us to classify the irrational rotations on the torus up to equivalence, ergodic and spectral:

Corollary 8.3.6. Let θ = (θ_1, …, θ_d) and τ = (τ_1, …, τ_d) be rationally independent vectors and R_θ and R_τ be the corresponding rotations on the torus 𝕋^d. The following conditions are equivalent:
(i) (R_θ, m) and (R_τ, m) are ergodically equivalent;
(ii) (R_θ, m) and (R_τ, m) are spectrally equivalent;
(iii) there exists L ∈ GL(d, ℤ) such that θ = Lτ mod ℤ^d.

We leave the proof to the reader (Exercise 8.3.2). In the special case of the circle, we get that two irrational rotations R_θ and R_τ are equivalent if and only if either R_θ = R_τ or R_θ = R_τ^{−1}. See also Exercise 8.3.3.

8.3.1 Exercises
8.3.1. Suppose that (f, μ) has discrete spectrum and the Hilbert space L²(μ) is separable (this is the case, for instance, if the σ-algebra of measurable sets is countably generated). Show that there exists a sequence (n_k)_k converging to infinity such that ‖U_f^{n_k}ϕ − ϕ‖_2 converges to zero when k → ∞, for every ϕ ∈ L²(μ).
8.3.2. Prove Corollary 8.3.6.
8.3.3. Let m be the Lebesgue measure on S1 and θ = p/q and τ = r/s be two rational
numbers, with gcd(p, q) = 1 = gcd(r, s). Show that the rotations (Rθ , m) and
(Rτ , m) are ergodically equivalent if and only if the denominators q and s are
equal.

8.4 Lebesgue spectrum


This section is devoted to the class of systems whose Koopman operator has the following property (the reason for the terminology will become clear in Proposition 8.4.10):

Definition 8.4.1. Let U : H → H be an isometry in a Hilbert space. We say that U has Lebesgue spectrum if there exists some closed subspace E ⊂ H such that
(i) U(E) ⊂ E;
(ii) ⋂_{n∈ℕ} U^n(E) = {0};
(iii) ⋃_{n∈ℕ} U^{−n}(E) is dense in H.

Given a probability measure μ, we denote by L²_0(μ) = L²_0(M, B, μ) the orthogonal complement, inside the space L²(μ) = L²(M, B, μ), of the subspace of constant functions. In other words,
$$L^2_0(\mu) = \Big\{\varphi\in L^2(\mu) : \int\varphi\,d\mu = 0\Big\}.$$
Note that L²_0(μ) is invariant under the Koopman operator: ϕ ∈ L²_0(μ) if and only if U_fϕ ∈ L²_0(μ). We say that the system (f, μ) has Lebesgue spectrum if the restriction of the Koopman operator to L²_0(μ) has Lebesgue spectrum.

8.4.1 Examples and properties


We start by observing that all Bernoulli shifts have Lebesgue spectrum. It is convenient to treat one-sided shifts and two-sided shifts separately.

Example 8.4.2. Consider a one-sided shift map σ : X^ℕ → X^ℕ and a Bernoulli measure μ = ν^ℕ on X^ℕ. Let E = L²_0(μ). Conditions (i) and (iii) in Definition 8.4.1 are obvious. To prove condition (ii), consider any function ϕ ∈ L²_0(μ) in the intersection, that is, such that for every n ∈ ℕ there exists a function ψ_n ∈ L²_0(μ) satisfying ϕ = ψ_n∘σ^n. We want to show that ϕ is constant at μ-almost every point (and hence zero, since its integral vanishes). Considering real and imaginary parts separately, we may suppose that ϕ is real-valued. For each c ∈ ℝ, consider
$$A_c = \{x\in X^{\mathbb N} : \varphi(x) > c\}.$$
For each n ∈ ℕ, we may write A_c = σ^{−n}({x ∈ X^ℕ : ψ_n(x) > c}). Then A_c belongs to the σ-algebra generated by the cylinders of the form [n; C_n, …, C_m] with m ≥ n. Consequently, μ(A_c ∩ C) = μ(A_c)μ(C) for every cylinder C of the form C = [0; C_0, …, C_{n−1}]. Since n is arbitrary and the cylinders form a generating family, it follows that μ(A_c ∩ B) = μ(A_c)μ(B) for every measurable set B ⊂ X^ℕ. Taking B = A_c, we conclude that μ(A_c) = μ(A_c)²; in other words, μ(A_c) ∈ {0, 1} for every c ∈ ℝ. This proves that ϕ is constant at μ-almost every point, as stated.

Example 8.4.3. Now consider a two-sided shift map $\sigma : X^{\mathbb Z} \to X^{\mathbb Z}$ and a Bernoulli measure $\mu = \nu^{\mathbb Z}$. Let $\mathcal A$ be the σ-algebra generated by the cylinders of the form $[0; C_0, \dots, C_m]$ with $m \ge 0$. Denote by $L^2_0(X^{\mathbb Z}, \mathcal A, \mu)$ the space of all functions $\varphi \in L^2_0(\mu)$ that are measurable with respect to the σ-algebra $\mathcal A$ (in other words, $\varphi(x)$ depends only on the coordinates $x_n$, $n \ge 0$, of the point). Take $E = L^2_0(X^{\mathbb Z}, \mathcal A, \mu)$. Condition (i) in Definition 8.4.1 is obvious. Condition (ii) follows from the same arguments that we used in Example 8.4.2. To prove condition (iii), note that $\bigcup_n U_\sigma^{-n}(E)$ contains the functions $\mathcal X_C - \mu(C)$ for all the cylinders $C$. Therefore, it contains all the zero-mean linear combinations of characteristic functions of sets in the algebra generated by the cylinders. This implies that the union is dense in $L^2_0(\mu)$, as we wanted to prove.

Lemma 8.4.4. If $(f, \mu)$ has Lebesgue spectrum then $\lim_n U_f^n\varphi \cdot \psi = 0$ for every $\varphi \in L^2_0(\mu)$ and every $\psi \in L^2(\mu)$.

Proof. Observe that the sequence $U_f^n\varphi \cdot \psi$ is bounded. Indeed, by the Cauchy–Schwarz inequality (Theorem A.5.4):
$$|U_f^n\varphi \cdot \psi| \le \|U_f^n\varphi\|_2\,\|\psi\|_2 = \|\varphi\|_2\,\|\psi\|_2 \quad\text{for every } n.$$
So, it is enough to prove that every convergent subsequence $U_f^{n_j}\varphi \cdot \psi$ converges to zero. Furthermore, the set $\{U_f^n\varphi : n \in \mathbb N\}$ is bounded in $L^2(\mu)$, because $U_f$ is an isometry. By the theorem of Banach–Alaoglu (Theorems A.6.1 and 2.3.1), every sequence in that set admits some weakly convergent subsequence. Hence, it is no restriction to suppose that $U_f^{n_j}\varphi$ converges weakly to some $\hat\varphi \in L^2(\mu)$.
Let $E$ be a subspace satisfying the conditions in Definition 8.4.1. Initially, suppose that $\varphi \in U_f^{-k}(E)$ for some $k$. Then $U_f^{n_j}\varphi \in U_f^{n_j-k}(E)$. Hence, given any $l \in \mathbb N$, we have that $U_f^{n_j}\varphi \in U_f^l(E)$ for every $j$ sufficiently large. It follows (see Exercise A.6.8) that $\hat\varphi \in U_f^l(E)$ for every $l \in \mathbb N$. By condition (ii) in the definition, this implies that $\hat\varphi = 0$ at $\mu$-almost every point. In particular,
$$\lim_j U_f^{n_j}\varphi \cdot \psi = \hat\varphi \cdot \psi = 0.$$

Now consider any $\varphi \in L^2_0(\mu)$. By condition (iii) in the definition, for every $\varepsilon > 0$ there exist $k \in \mathbb N$ and $\varphi_k \in U_f^{-k}(E)$ such that $\|\varphi - \varphi_k\|_2 \le \varepsilon$. Using the Cauchy–Schwarz inequality once more:
$$|U_f^n\varphi \cdot \psi - U_f^n\varphi_k \cdot \psi| \le \|\varphi - \varphi_k\|_2\,\|\psi\|_2 \le \varepsilon\|\psi\|_2$$
for every $n$. Recalling that $\lim_n U_f^n\varphi_k \cdot \psi = 0$ (by the previous paragraph), we find that
$$-\varepsilon\|\psi\|_2 \le \liminf_n U_f^n\varphi \cdot \psi \le \limsup_n U_f^n\varphi \cdot \psi \le \varepsilon\|\psi\|_2.$$

Making $\varepsilon \to 0$, it follows that $\lim_n U_f^n\varphi \cdot \psi = 0$, as we wanted to prove.

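Corollary 8.4.5. If $(f, \mu)$ has Lebesgue spectrum then $(f, \mu)$ is mixing.

Proof. Since $U_f^n$ maps each constant function to itself, it suffices to observe that
$$C_n(\varphi, \psi) = \Big|U_f^n\varphi \cdot \psi - \int\varphi\,d\mu \int\psi\,d\mu\Big| = \Big|U_f^n\Big(\varphi - \int\varphi\,d\mu\Big) \cdot \psi\Big|$$
and the function $\varphi' = \varphi - \int\varphi\,d\mu$ is in $L^2_0(\mu)$, so that Lemma 8.4.4 gives $\lim_n C_n(\varphi, \psi) = 0$.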
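Corollary 8.4.5 can be observed concretely for Bernoulli shifts, which have Lebesgue spectrum by Example 8.4.2. The sketch below assumes a two-symbol alphabet $X = \{0, 1\}$ with hypothetical weights $\nu(\{0\}) = 0.3$, $\nu(\{1\}) = 0.7$, and computes $C_n(\varphi, \psi)$ exactly by summing over cylinders, for test functions `phi`, `psi` (arbitrary choices) that depend only on the first $k$ coordinates. Then $U_\sigma^n\varphi = \varphi \circ \sigma^n$ depends only on the coordinates $n, \dots, n + k - 1$, so by independence the correlation vanishes as soon as $n \ge k$.

```python
from itertools import product

# Hypothetical Bernoulli weights nu({0}) = 0.3, nu({1}) = 0.7; mu = nu^N.
NU = {0: 0.3, 1: 0.7}

def integrate(func, length):
    """Exact integral with respect to mu of a function of the first `length` coordinates."""
    total = 0.0
    for word in product(NU, repeat=length):
        weight = 1.0
        for symbol in word:
            weight *= NU[symbol]          # mu of the cylinder [0; word]
        total += weight * func(word)
    return total

def correlation(phi, psi, k, n):
    """C_n(phi, psi) = |int (phi o sigma^n) psi dmu - int phi dmu * int psi dmu|,
    for phi and psi depending only on the first k coordinates."""
    joint = integrate(lambda w: phi(w[n:n + k]) * psi(w[:k]), n + k)
    means = integrate(phi, k) * integrate(psi, k)
    return abs(joint - means)

phi = lambda w: w[0] + 2 * w[1]      # hypothetical function of x_0, x_1
psi = lambda w: w[0] * w[1]          # hypothetical function of x_0, x_1
for n in range(5):
    print(n, correlation(phi, psi, 2, n))
```

Here the correlations are nonzero for $n = 0, 1$ and vanish (up to rounding) from $n = 2$ on; for general $L^2$ functions they only tend to zero, which is what the corollary asserts.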

The converse to Corollary 8.4.5 is false, in general: in Example 8.4.13 we present certain mixing systems that do not have Lebesgue spectrum.
The class of systems with Lebesgue spectrum is invariant under spectral equivalence. Indeed, suppose that $(f, \mu)$ has Lebesgue spectrum and $(g, \nu)$ is spectrally equivalent to $(f, \mu)$. Let $L : L^2(\mu) \to L^2(\nu)$ be a unitary operator conjugating the Koopman operators $U_f$ and $U_g$. It follows from the hypothesis and Corollary 8.4.5 that $(f, \mu)$ is weak mixing. Hence, by Theorem 8.2.1, the constant functions are the only eigenvectors of $U_f$. Then the same holds for $U_g$ and so the conjugacy $L$ maps constant functions to constant functions. Then, as $L$ is unitary, its restriction to the orthogonal complement $L^2_0(\mu)$ is a unitary operator onto $L^2_0(\nu)$. Now, given any subspace $E \subset L^2_0(\mu)$ satisfying the conditions (i), (ii), (iii) in Definition 8.4.1 for $U_f$, it is clear that the subspace $L(E) \subset L^2_0(\nu)$ satisfies the corresponding conditions for $U_g$. Hence, $(g, \nu)$ has Lebesgue spectrum.
Given closed subspaces $V \subset W$ of a Hilbert space $H$, we denote by $W \ominus V$ the orthogonal complement of $V$ inside $W$, that is,
$$W \ominus V = W \cap V^\perp = \{w \in W : v \cdot w = 0 \text{ for every } v \in V\}.$$
The proof of the following fact is discussed in Section 8.4.2 and in Exercise 8.4.3:
Proposition 8.4.6. If $U : H \to H$ is an isometry and $E_1$ and $E_2$ are subspaces satisfying the conditions in Definition 8.4.1, then the orthogonal complements $E_1 \ominus U(E_1)$ and $E_2 \ominus U(E_2)$ have the same Hilbert dimension.
This leads to the following definition: the rank of an operator $U : H \to H$ with Lebesgue spectrum is the Hilbert dimension of $E \ominus U(E)$ for any subspace $E$ satisfying the conditions in Definition 8.4.1.

We then define the rank of a system $(f, \mu)$ with Lebesgue spectrum to be the rank of the associated Koopman operator restricted to $L^2_0(\mu)$. It is clear that the rank is less than or equal to the Hilbert dimension of $L^2_0(\mu)$. In particular, if $L^2(\mu)$ is separable then the rank is countable, possibly finite. The majority of interesting examples fall into this category:
Example 8.4.7. Suppose that the probability space $(M, \mathcal B, \mu)$ is countably generated, that is, there exists a countable family $\mathcal G$ of measurable subsets such that every element of $\mathcal B$ coincides, up to measure zero, with some element of the σ-algebra generated by $\mathcal G$. Then $L^2(\mu)$ is separable: the algebra $\mathcal A$ generated by $\mathcal G$ is countable and the linear combinations with rational coefficients of characteristic functions of elements of $\mathcal A$ form a countable dense subset of $L^2(\mu)$.
It is interesting to point out that no examples are known of systems with Lebesgue spectrum of finite rank. For Bernoulli shifts, the rank coincides with the dimension of the corresponding $L^2(\mu)$:
Example 8.4.8. Let $(\sigma, \mu)$ be a one-sided Bernoulli shift (similar considerations apply in the two-sided case). As we have seen in Example 8.4.2, we may take $E = L^2_0(\mu)$. Then, denoting $x = (x_1, \dots, x_n, \dots)$ and recalling that $\mu = \nu^{\mathbb N}$,
$$\begin{aligned}
\varphi \in E \ominus U_\sigma(E) &\iff \int \varphi(x_0, x)\,\psi(x)\,d\mu(x_0, x) = 0 \quad \forall \psi \in L^2_0(\mu)\\
&\iff \int \Big(\int \varphi(x_0, x)\,d\nu(x_0)\Big)\,\psi(x)\,d\mu(x) = 0 \quad \forall \psi \in L^2_0(\mu).
\end{aligned}$$
Hence, $E \ominus U_\sigma(E) = \big\{\varphi \in L^2(\mu) : \int \varphi(x_0, x)\,d\nu(x_0) = 0 \text{ for } \mu\text{-almost every } x\big\}$.
We claim that $\dim(E \ominus U_\sigma(E)) = \dim L^2(\mu)$. The inequality $\le$ is obvious. To prove the other inequality, fix any measurable function $\phi : X \to \mathbb R$ with $\int \phi\,d\nu = 0$ and $\int \phi^2\,d\nu = 1$ (such a function exists as long as $\nu$ is not concentrated at a single point). Consider the linear map $I : L^2(\mu) \to L^2(\mu)$ associating with each $\psi \in L^2(\mu)$ the function $I\psi(x_0, x) = \phi(x_0)\psi(x)$. The assumptions on $\phi$ imply that
$$I\psi \in E \ominus U_\sigma(E) \quad\text{and}\quad \|I\psi\|_2 = \|\psi\|_2 \quad\text{for every } \psi \in L^2(\mu).$$
This shows that $E \ominus U_\sigma(E)$ contains a subspace isometric to $L^2(\mu)$ and, hence, $\dim E \ominus U_\sigma(E) \ge \dim L^2(\mu)$. This concludes the argument.
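The map $I$ of Example 8.4.8 can be tested on functions of finitely many coordinates. The sketch below assumes $X = \{0, 1\}$ with a hypothetical weight $p = \nu(\{0\}) = 0.3$, for which $\phi(0) = \sqrt{(1-p)/p}$ and $\phi(1) = -\sqrt{p/(1-p)}$ satisfy $\int \phi\,d\nu = 0$ and $\int \phi^2\,d\nu = 1$; the test functions `psi` and `chi` are arbitrary choices. It checks that $\|I\psi\|_2 = \|\psi\|_2$, that $\int I\psi(x_0, x)\,d\nu(x_0) = 0$ for every $x$, and that $I\psi$ is orthogonal to $U_\sigma\chi = \chi \circ \sigma$.

```python
from itertools import product
from math import sqrt

p = 0.3                                    # hypothetical weight nu({0}); nu({1}) = 1 - p
NU = {0: p, 1: 1 - p}
PHI = {0: sqrt((1 - p) / p), 1: -sqrt(p / (1 - p))}   # zero mean, unit second moment

def expect(func, length):
    """Exact integral with respect to mu = nu^N of a function of `length` coordinates."""
    total = 0.0
    for word in product(NU, repeat=length):
        weight = 1.0
        for symbol in word:
            weight *= NU[symbol]
        total += weight * func(word)
    return total

def I(psi):
    """(I psi)(x_0, x) = phi(x_0) psi(x), where x = (x_1, x_2, ...)."""
    return lambda word: PHI[word[0]] * psi(word[1:])

k = 3
psi = lambda w: w[0] + 3 * w[1] * w[2]     # hypothetical function of x_1, x_2, x_3
chi = lambda w: w[0] - w[2]                # hypothetical function of x_1, x_2, x_3

norm_psi = sqrt(expect(lambda w: psi(w) ** 2, k))
norm_I_psi = sqrt(expect(lambda w: I(psi)(w) ** 2, k + 1))
# (chi o sigma)(x_0, x) = chi(x): inner product of I psi with U_sigma chi
inner = expect(lambda w: I(psi)(w) * chi(w[1:]), k + 1)
# largest |integral over x_0| of I psi, over all fixed x = (x_1, ..., x_k)
fiber = max(abs(sum(NU[s] * I(psi)((s,) + x) for s in NU)) for x in product(NU, repeat=k))
print(norm_psi, norm_I_psi, inner, fiber)
```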
We say that the shift is of countable type if the probability space $X$ is countably generated. This is automatic, for example, if $X$ is finite, or even countable. In that case, the space $\Sigma = X^{\mathbb N}$ (or $\Sigma = X^{\mathbb Z}$) is also countably generated: if $\mathcal G$ is a countable generator of $X$ then the cylinders $[m; C_m, \dots, C_n]$ with $C_j \in \mathcal G$ form a countable generator of $\Sigma$. Then, as observed in Example 8.4.7, the space $L^2(\mu)$ is separable. Therefore, it follows from Example 8.4.8 that every Bernoulli shift of countable type has Lebesgue spectrum with countable rank.

8.4.2 The invertible case


In this section we take the system $(f, \mu)$ to be invertible. In this context, the notion of Lebesgue spectrum may be formulated in a more transparent way:
Proposition 8.4.9. Let $U : H \to H$ be a unitary operator in a Hilbert space $H$. Then $U$ has Lebesgue spectrum if and only if there exists a closed subspace $F \subset H$ such that the iterates $U^k(F)$, $k \in \mathbb Z$, are pairwise orthogonal and satisfy
$$H = \bigoplus_{k\in\mathbb Z} U^k(F).$$

Proof. Suppose that there exists some subspace $F$ as in the statement. Take $E = \bigoplus_{k=0}^\infty U^k(F)$. Condition (i) in Definition 8.4.1 is immediate:
$$U(E) = \bigoplus_{k=1}^\infty U^k(F) \subset E.$$
As for condition (ii), note that $\varphi \in \bigcap_{n=0}^\infty U^n(E)$ means that $\varphi \in \bigoplus_{k=n}^\infty U^k(F)$ for every $n \ge 0$. This implies that $\varphi$ is orthogonal to $U^k(F)$ for every $k \in \mathbb Z$. Hence, $\varphi = 0$. Finally, by hypothesis, we may write any $\varphi \in H$ as an orthogonal sum $\varphi = \sum_{k\in\mathbb Z}\varphi_k$ with $\varphi_k \in U^k(F)$ for every $k$. Then
$$\sum_{k=-n}^\infty \varphi_k \in \bigoplus_{k=-n}^\infty U^k(F) = U^{-n}(E)$$
for every $n$, and the sequence on the left-hand side converges to $\varphi$ when $n \to \infty$. This gives condition (iii) in the definition.
Now we prove the converse. Given $E$ satisfying the conditions (i), (ii) and (iii) in the definition, take $F = E \ominus U(E)$. It is easy to see that the iterates of $F$ are pairwise orthogonal. We claim that
$$\bigoplus_{k=0}^\infty U^k(F) = E. \tag{8.4.1}$$
Indeed, consider any $v \in E$. It follows immediately from the definition of $F$ that there exist sequences $v_n \in U^n(F)$ and $w_n \in U^n(E)$ such that $v = v_0 + \cdots + v_{n-1} + w_n$ for each $n \ge 1$. We want to show that $(w_n)_n$ converges to zero, to conclude that $v = \sum_{j=0}^\infty v_j$. For that, observe that
$$\|v\|^2 = \sum_{j=0}^{n-1}\|v_j\|^2 + \|w_n\|^2 \quad\text{for every } n$$
and, thus, the series $\sum_{j=0}^\infty \|v_j\|^2$ is summable. Given $\varepsilon > 0$, fix $m \ge 1$ such that the sum of the terms with $j \ge m$ is less than $\varepsilon$. For every $n \ge m$,
$$\|w_m - w_n\|^2 = \|v_m + \cdots + v_{n-1}\|^2 = \|v_m\|^2 + \cdots + \|v_{n-1}\|^2 < \varepsilon.$$
This proves that $(w_n)_n$ is a Cauchy sequence in $H$. Let $w$ be its limit. Since $w_n \in U^n(E) \subset U^m(E)$ for every $m \le n$, taking the limit we get that $w \in U^m(E)$ for every $m$. By condition (ii) in the hypothesis, this implies that $w = 0$.

Therefore, the proof of the claim (8.4.1) is complete. To conclude the proof of the proposition it suffices to observe that
$$\bigoplus_{k\in\mathbb Z} U^k(F) = \overline{\bigcup_{n=0}^\infty \bigoplus_{k=-n}^\infty U^k(F)} = \overline{\bigcup_{n=0}^\infty U^{-n}(E)}.$$
Condition (iii) in the hypothesis implies that this subspace coincides with $H$.

In particular, an invertible system $(f, \mu)$ has Lebesgue spectrum if and only if there exists a closed subspace $F \subset L^2_0(\mu)$ such that
$$L^2_0(\mu) = \bigoplus_{k\in\mathbb Z} U_f^k(F). \tag{8.4.2}$$

The next result is the reason why systems with Lebesgue spectrum are named in this way, and it also leads naturally to the notion of rank:

Proposition 8.4.10. Let $U : H \to H$ be a unitary operator in a Hilbert space. Let $\lambda$ denote the Lebesgue measure on the unit circle. Then $U$ has Lebesgue spectrum if and only if it admits a spectral representation
$$T : L^2(\lambda)^\chi \to L^2(\lambda)^\chi, \quad (\varphi_\alpha)_\alpha \mapsto \big(z \mapsto z\varphi_\alpha(z)\big)_\alpha$$
for some cardinal $\chi$. Moreover, $\chi$ is uniquely determined by $U$.

Proof. Let us start by proving the “if” claim. As we know, the Fourier family $\{z^n : n \in \mathbb Z\}$ is a Hilbert basis of the space $L^2(\lambda)$. Let $V_n$ be the one-dimensional subspace generated by $\varphi(z) = z^n$. Then, $L^2(\lambda) = \bigoplus_{n\in\mathbb Z} V_n$ and, consequently,
$$L^2(\lambda)^\chi = \Big(\bigoplus_{n\in\mathbb Z} V_n\Big)^\chi = \bigoplus_{n\in\mathbb Z} V_n^\chi \tag{8.4.3}$$
($W^\chi$ denotes the orthogonal direct sum of $\chi$ copies of a space $W$). Moreover, the restriction of $T$ to each $V_n^\chi$ is a unitary operator onto $V_{n+1}^\chi$. Take $F' = V_0^\chi$. The relation (8.4.3) means that the iterates $T^n(F') = V_n^\chi$ are pairwise orthogonal and their orthogonal direct sum is the space $L^2(\lambda)^\chi$. Using the unitary conjugacy between $U$ and $T$, we conclude that the preimage of $F'$ under that conjugacy is a subspace $F$ in the conditions of Proposition 8.4.9.
Conversely, suppose that there exists $F$ in the conditions of Proposition 8.4.9. Let $\{v_q : q \in Q\}$ be a Hilbert basis of $F$. Then $\{U^n(v_q) : n \in \mathbb Z, q \in Q\}$ is a Hilbert basis of $H$. Given $q \in Q$, denote by $\delta_q$ the element of the space $L^2(\lambda)^Q$ that is equal to $1$ in the coordinate $q$ and identically zero in all the other coordinates. Define
$$L : H \to L^2(\lambda)^Q, \quad L(U^n(v_q)) = z^n\delta_q \quad\text{for each } n \in \mathbb Z \text{ and } q \in Q.$$
Observe that $L$ is a unitary operator, since $\{z^n\delta_q\}$ is a Hilbert basis of $L^2(\lambda)^Q$. Observe also that $LU = TL$. This provides the spectral representation in the statement of the proposition, with $\chi$ equal to the cardinal of the set $Q$, that is, equal to the Hilbert dimension of the subspace $F$.
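The key step of the “if” part, that $T$ maps $V_n$ onto $V_{n+1}$, says that multiplication by $z$ shifts Fourier coefficients by one index. The sketch below is a discretized check: it assumes functions are sampled at $N$ equally spaced points of the circle and that Fourier coefficients are computed with `numpy.fft.fft`, so indices are taken modulo $N$ (an artifact of the discretization); the test function `phi` is a hypothetical trigonometric polynomial.

```python
import numpy as np

N = 64
theta = 2 * np.pi * np.arange(N) / N
z = np.exp(1j * theta)                     # sample points on the unit circle

def fourier_coefficients(values):
    """c[n] ~ int phi(z) z^(-n) d lambda(z), n = 0, ..., N-1 (negative n stored mod N)."""
    return np.fft.fft(values) / len(values)

phi = 0.25 + np.cos(3 * theta) + 0.5j * np.sin(theta)   # hypothetical test function
c_phi = fourier_coefficients(phi)
c_T_phi = fourier_coefficients(z * phi)                 # T phi(z) = z phi(z)

# Multiplication by z moves the coefficient of z^n to z^(n+1): a shift by one index.
print(np.allclose(c_T_phi, np.roll(c_phi, 1)))
# z^5 spans V_5; its image z^6 under T spans V_6 (coefficient vector concentrated at index 6).
print(np.argmax(np.abs(fourier_coefficients(z * z**5))))
```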

Let $E \subset H$ be any subspace satisfying the conditions in Definition 8.4.1. Then the orthogonal difference $F = E \ominus U(E)$ satisfies the conclusion of Proposition 8.4.9, as we saw during the proof of that proposition. Moreover, according to the proof of Proposition 8.4.10, we may take the cardinal $\chi$ equal to the Hilbert dimension of $F$. Since $\chi$ is uniquely determined, the same holds for the Hilbert dimension of $E \ominus U(E)$. This proves Proposition 8.4.6 in the invertible case. In Exercise 8.4.3 we invite the reader to prove the general case.
We have just shown that the rank of a system with Lebesgue spectrum is
well defined. Next, we are going to see that for invertible systems the rank is a
complete invariant of spectral equivalence:
Corollary 8.4.11. Two invertible systems with Lebesgue spectrum are spectrally equivalent if and only if they have the same rank.

Proof. It is clear that two invertible systems are spectrally equivalent if and
only if they admit the same spectral representation. By Proposition 8.4.10, this
happens if and only if the value of the cardinal χ is the same, that is, if the rank
is the same.

Corollary 8.4.12. All two-sided Bernoulli shifts of countable type are spectrally equivalent.

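Proof. As we saw in Section 8.4.1, every Bernoulli shift of countable type has rank equal to the Hilbert dimension of $L^2(\mu)$, which is countably infinite (as long as $\nu$ is not concentrated at a single point). The claim then follows from Corollary 8.4.11.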

Proofs of the facts that are quoted in the following may be found in
Mañé [Mañ87, Section II.10]:
Example 8.4.13 (Gaussian shifts). Let $A = (a_{i,j})_{i,j\in\mathbb Z}$ be an infinite real matrix. We say that $A$ is positive definite if every finite restriction $A_{m,n} = (a_{i,j})_{m\le i,j<n}$ is positive definite, for any $m < n$. We say that $A$ is symmetric if $a_{i,j} = a_{j,i}$ for any $i, j \in \mathbb Z$. Let $\mu$ be a Borel probability measure on $\Sigma = \mathbb R^{\mathbb Z}$ (similar considerations hold for $\Sigma = \mathbb R^{\mathbb N}$). We say that $\mu$ is a Gaussian measure if there exists some symmetric positive definite matrix $A$ such that $\mu([m; B_m, \dots, B_{n-1}])$ is equal to
$$\frac{1}{(\det A_{m,n})^{1/2}\,(2\pi)^{(n-m)/2}} \int_{B_m\times\cdots\times B_{n-1}} \exp\Big(-\frac12\,(A_{m,n}^{-1}z \cdot z)\Big)\,dz$$
for any $m < n$ and any measurable sets $B_m, \dots, B_{n-1} \subset \mathbb R$. The reason for the factor in front of the integral is explained in Exercise 8.4.4. $A$ is called the covariance matrix of $\mu$. It is uniquely determined by
$$a_{i,j} = \int x_i x_j\,d\mu(x) \quad\text{for each } i, j \in \mathbb Z.$$
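Both the normalizing factor and the covariance formula can be checked numerically in dimension $d = 2$. The sketch below assumes a hypothetical symmetric positive definite block `A` and approximates the integrals by a Riemann sum on a uniform grid over $[-12, 12]^2$, which is extremely accurate for Gaussian integrands. It verifies that the density integrates to $1$ (this is the content of Exercise 8.4.4) and that integrating $z_i z_j$ against it recovers $a_{i,j}$.

```python
import numpy as np

A = np.array([[2.0, 0.8],
              [0.8, 1.0]])                # hypothetical symmetric positive definite block
d = A.shape[0]
A_inv = np.linalg.inv(A)

h = 0.05                                  # grid step
grid = np.arange(-12.0, 12.0 + h / 2, h)
X, Y = np.meshgrid(grid, grid, indexing="ij")
# quadratic form A^{-1} z . z evaluated at z = (X, Y)
quad = A_inv[0, 0] * X**2 + 2 * A_inv[0, 1] * X * Y + A_inv[1, 1] * Y**2
density = np.exp(-quad / 2) / (np.sqrt(np.linalg.det(A)) * (2 * np.pi) ** (d / 2))

cell = h * h                              # area of one grid cell
total_mass = density.sum() * cell
coords = (X, Y)
covariance = np.array([[(coords[i] * coords[j] * density).sum() * cell
                        for j in range(d)] for i in range(d)])
print(total_mass)
print(covariance)
```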

For each symmetric positive definite matrix $A$ there exists a unique Gaussian probability measure $\mu$ that has $A$ as its covariance matrix. Moreover, $\mu$ is invariant under the shift map $\sigma : \Sigma \to \Sigma$ if and only if $a_{i,j} = a_{i+1,j+1}$ for any $i, j \in \mathbb Z$. In that case, the properties of the system $(\sigma, \mu)$ are directly related to the behavior of the covariance sequence
$$\alpha_n = a_{n,0} = U_\sigma^n x_0 \cdot x_0 \quad\text{for each } n \ge 0,$$
where $x_0$ denotes the coordinate function $x \mapsto x_0$. In particular, $(\sigma, \mu)$ is mixing if and only if the covariance sequence converges to zero.
Now, Exercise 8.4.5 shows that if $(\sigma, \mu)$ has Lebesgue spectrum then the covariance sequence is generated by some absolutely continuous probability measure $\nu$ on the unit circle, in the following sense:
$$\alpha_n = \int z^n\,d\nu(z) \quad\text{for each } n \ge 0.$$

(The Riemann–Lebesgue lemma asserts that if $\nu$ is a probability measure absolutely continuous with respect to the Lebesgue measure $\lambda$ on the unit circle then the sequence $\int z^n\,d\nu(z)$ converges to zero when $n \to \infty$.) But Exercise 8.4.6 shows that not every sequence that converges to zero is of this form. Therefore, there exist Gaussian shifts $(\sigma, \mu)$ that are mixing but do not have Lebesgue spectrum.
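To see a covariance sequence of the form $\alpha_n = \int z^n\,d\nu(z)$ with $\nu$ absolutely continuous, the sketch below assumes the hypothetical choice $d\nu = P_r\,d\lambda$, where $P_r(\theta) = (1 - r^2)/(1 - 2r\cos\theta + r^2)$ is the Poisson kernel and $\lambda$ is normalized so that $\lambda(S^1) = 1$; for it, $\alpha_n = r^{|n|}$, which decays to zero as the Riemann–Lebesgue lemma predicts. The integrals are approximated by the trapezoidal rule on a uniform grid, which is very accurate for smooth periodic integrands. The script also builds the finite block $A_{0,8} = (\alpha_{|i-j|})_{0 \le i,j < 8}$ of the corresponding shift-invariant covariance matrix and checks that it is symmetric positive definite. This only illustrates the form of such sequences; it does not decide which Gaussian shifts have Lebesgue spectrum.

```python
import numpy as np

r = 0.6                                   # hypothetical Poisson kernel parameter, 0 < r < 1
N = 4096                                  # number of grid points on the circle
theta = 2 * np.pi * np.arange(N) / N
z = np.exp(1j * theta)
# density of nu with respect to the normalized Lebesgue measure lambda
density = (1 - r**2) / (1 - 2 * r * np.cos(theta) + r**2)

def alpha(n):
    """Trapezoidal approximation of int z^n d nu(z)."""
    return np.mean(z**n * density)

alphas = np.array([alpha(n) for n in range(20)])
print(np.round(alphas.real, 6))           # decays to zero, like r**n

size = 8                                  # finite block with a_{i,j} = alpha_{|i-j|}
A = np.array([[alphas[abs(i - j)].real for j in range(size)] for i in range(size)])
eigenvalues = np.linalg.eigvalsh(A)       # A is symmetric, so eigvalsh applies
print(eigenvalues.min() > 0)
```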

8.4.3 Exercises
8.4.1. Show that every mixing Markov shift has Lebesgue spectrum with countable
rank. [Observation: In Section 9.5.3 we mention stronger results.]
8.4.2. Let $\mu$ be the Haar measure on $\mathbb T^d$ and $f_A : \mathbb T^d \to \mathbb T^d$ be a surjective endomorphism. Assume that no eigenvalue of the matrix $A$ is a root of unity. Check that every orbit of $A^t$ in the set $\mathbb Z^d \setminus \{0\}$ is infinite and use this fact to conclude that $(f_A, \mu)$ has Lebesgue spectrum. Conversely, show that if $(f_A, \mu)$ has Lebesgue spectrum then no eigenvalue of $A$ is a root of unity.
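The first step of Exercise 8.4.2 can be explored numerically. The sketch below iterates $k \mapsto A^t k$ on integer vectors and reports whether a finite stretch of the orbit repeats. The matrices are hypothetical examples: $\begin{pmatrix}2&1\\1&1\end{pmatrix}$, whose eigenvalues $(3 \pm \sqrt5)/2$ are not roots of unity, and the rotation $\begin{pmatrix}0&-1\\1&0\end{pmatrix}$, whose eigenvalues $\pm i$ are. A finite computation cannot prove that an orbit is infinite; it only illustrates the dichotomy.

```python
def transpose_orbit(A, k, steps):
    """Return [k, A^t k, (A^t)^2 k, ...] with steps + 1 integer vectors (A given as a list of rows)."""
    d = len(A)
    orbit = [tuple(k)]
    for _ in range(steps):
        v = orbit[-1]
        # (A^t v)_i = sum_j A[j][i] * v_j
        orbit.append(tuple(sum(A[j][i] * v[j] for j in range(d)) for i in range(d)))
    return orbit

def repeats(orbit):
    """True if some vector occurs twice in the computed stretch of the orbit."""
    return len(set(orbit)) < len(orbit)

cat = [[2, 1], [1, 1]]       # eigenvalues (3 +- sqrt 5)/2: not roots of unity
rot = [[0, -1], [1, 0]]      # eigenvalues +-i: fourth roots of unity

print(repeats(transpose_orbit(cat, (1, 0), 40)))
print(transpose_orbit(rot, (1, 0), 4))
```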
8.4.3. Complete the proof of Proposition 8.4.6, using Exercise 2.3.6 to reduce the general case to the invertible one.
8.4.4. Check that $\int_{\mathbb R} e^{-x^2/2}\,dx = \sqrt{2\pi}$. Use this fact to show that if $A$ is a symmetric positive definite matrix of dimension $d \ge 1$ then
$$\int_{\mathbb R^d} \exp\big(-(A^{-1}z \cdot z)/2\big)\,dz = (\det A)^{1/2}\,(2\pi)^{d/2}.$$

8.4.5. Let $(f, \mu)$ be an invertible system with Lebesgue spectrum. Show that for every $\varphi \in L^2_0(\mu)$ there exists a probability measure $\nu$ absolutely continuous with respect to the Lebesgue measure $\lambda$ on the unit circle $\{z \in \mathbb C : |z| = 1\}$ and such that $U_f^n\varphi \cdot \varphi = \int z^n\,d\nu(z)$ for every $n \in \mathbb Z$.
8.4.6. Let $\lambda$ be the Lebesgue measure on the unit circle. Consider the linear operator $\mathcal F : L^1(\lambda) \to c_0$ defined by
$$\mathcal F(\varphi) = \Big(\int z^n\varphi(z)\,d\lambda(z)\Big)_n.$$
Show that $\mathcal F$ is continuous and injective but not surjective. Therefore, not every sequence of complex numbers $(\alpha_n)_n$ converging to zero may be written as $\alpha_n = \int z^n\,d\nu(z)$ for $n \ge 0$, for some probability measure $\nu$ absolutely continuous with respect to $\lambda$.

8.5 Lebesgue spaces and ergodic isomorphism


The main subject of this section is the class of Lebesgue spaces (also called standard probability spaces), introduced by the Russian mathematician Vladimir A. Rokhlin [Rok62]. These spaces have a distinguished role in measure theory, for two reasons: on the one hand, they exhibit much better properties than a general probability space; on the other hand, they include most interesting examples. In particular, every complete separable metric space endowed with a Borel probability measure is a Lebesgue space.
Initially, we discuss yet another notion of equivalence, intermediate between ergodic equivalence and spectral equivalence, that we call ergodic isomorphism. One of the highlights is that for transformations in Lebesgue spaces the notions of ergodic equivalence and ergodic isomorphism turn out to coincide.

8.5.1 Ergodic isomorphism


Let $(M, \mathcal B, \mu)$ be a probability space. We denote by $\tilde{\mathcal B}$ the quotient of the σ-algebra by the equivalence relation $A \sim B \Leftrightarrow \mu(A \,\Delta\, B) = 0$. Observe that if $A_k \sim B_k$ for every $k \in \mathbb N$ then $\bigcup_k A_k \sim \bigcup_k B_k$, $\bigcap_k A_k \sim \bigcap_k B_k$ and $A_k^c \sim B_k^c$ for every $k \in \mathbb N$. Therefore, the basic operations of set theory are well defined in the quotient $\tilde{\mathcal B}$. Moreover, the measure $\mu$ induces a measure $\tilde\mu$ on $\tilde{\mathcal B}$. The pair $(\tilde{\mathcal B}, \tilde\mu)$ is called the measure algebra of the probability space.
Now let $(M, \mathcal B, \mu)$ and $(N, \mathcal C, \nu)$ be two probability spaces, and $(\tilde{\mathcal B}, \tilde\mu)$ and $(\tilde{\mathcal C}, \tilde\nu)$ be their measure algebras. A homomorphism of measure algebras is a map $H : \tilde{\mathcal B} \to \tilde{\mathcal C}$ that preserves the operations of union, intersection and complement and also preserves the measures: $\tilde\mu(B) = \tilde\nu(H(B))$ for every $B \in \tilde{\mathcal B}$. If $H$ is a bijection, we call it an isomorphism of measure algebras. In that case the inverse $H^{-1}$ is also an isomorphism of measure algebras.
Every measurable map $h : M \to N$ satisfying $h_*\mu = \nu$ defines a homomorphism $\tilde h : \tilde{\mathcal C} \to \tilde{\mathcal B}$, through $B \mapsto h^{-1}(B)$. Moreover, if $h$ is invertible then $\tilde h$ is an isomorphism. In the same way, transformations $f : M \to M$ and $g : N \to N$ preserving the measures in the corresponding probability spaces define homomorphisms $\tilde f : \tilde{\mathcal B} \to \tilde{\mathcal B}$ and $\tilde g : \tilde{\mathcal C} \to \tilde{\mathcal C}$, respectively. We say that the systems $(f, \mu)$ and $(g, \nu)$ are ergodically isomorphic if these homomorphisms are conjugate, that is, if $\tilde f \circ H = H \circ \tilde g$ for some isomorphism $H : \tilde{\mathcal C} \to \tilde{\mathcal B}$.

Ergodically equivalent systems are always ergodically isomorphic: given any ergodic equivalence $h$, it suffices to take $H = \tilde h$. We also have the following relation between ergodic isomorphism and spectral equivalence:
Proposition 8.5.1. If two systems $(f, \mu)$ and $(g, \nu)$ are ergodically isomorphic then they are spectrally equivalent.

Proof. Let $H : \tilde{\mathcal C} \to \tilde{\mathcal B}$ be an isomorphism such that $\tilde f \circ H = H \circ \tilde g$. Consider the linear operator $L : L^2(\nu) \to L^2(\mu)$ constructed as follows. Initially, $L(\mathcal X_C) = \mathcal X_{H(C)}$ for every $C \in \tilde{\mathcal C}$. Note that $\|L(\mathcal X_C)\|_2 = \|\mathcal X_C\|_2$, because $H$ preserves the measures. Extend the definition to the set of simple functions, preserving linearity:
$$L\Big(\sum_{j=1}^k c_j\,\mathcal X_{C_j}\Big) = \sum_{j=1}^k c_j\,\mathcal X_{H(C_j)} \quad\text{for any } k \ge 1,\ c_j \in \mathbb R \text{ and } C_j \in \tilde{\mathcal C}.$$
The definition does not depend on the representation of the simple function as a linear combination of characteristic functions (Exercise 8.5.1). Moreover, $\|L(\varphi)\|_2 = \|\varphi\|_2$ for every simple function. Recall that the set of simple functions is dense in $L^2(\nu)$. Then, by continuity, $L$ extends uniquely to a linear isometry defined on the whole of $L^2(\nu)$. Observe that this isometry is invertible: the inverse is constructed in the same way, starting from the inverse of $H$. Finally,
$$U_f \circ L(\mathcal X_C) = U_f(\mathcal X_{H(C)}) = \mathcal X_{\tilde f(H(C))} = \mathcal X_{H(\tilde g(C))} = L(\mathcal X_{\tilde g(C)}) = L \circ U_g(\mathcal X_C)$$
for every $C \in \tilde{\mathcal C}$. By linearity, it follows that $U_f \circ L(\varphi) = L \circ U_g(\varphi)$ for every simple function; then, by continuity, the same holds for every $\varphi \in L^2(\nu)$.

Summarizing these observations, we have the following relation between the three equivalence relations:
ergodic equivalence ⇒ ergodic isomorphism ⇒ spectral equivalence.
In what follows we discuss some partial converses, starting with the relation
between ergodic isomorphism and spectral equivalence.
The following result of Paul Halmos and John von Neumann [HvN42]
broadens Proposition 8.3.4 and shows that for systems with discrete spectrum
the notions of ergodic isomorphism and spectral equivalence coincide. The
reader may find a proof in Section 3.2 of Walters [Wal75].
Theorem 8.5.2 (Discrete spectrum). If $(f, \mu)$ and $(g, \nu)$ are ergodic systems with discrete spectrum then the following conditions are equivalent:

1. $(f, \mu)$ and $(g, \nu)$ are spectrally equivalent;
2. the Koopman operators of $(f, \mu)$ and $(g, \nu)$ have the same eigenvalues;
3. $(f, \mu)$ and $(g, \nu)$ are ergodically isomorphic.

In particular, every invertible ergodic system with discrete spectrum is ergodically isomorphic to its inverse.

8.5.2 Lebesgue spaces


Let $(M, \mathcal B, \mu)$ be any probability space. Initially, suppose that the measure $\mu$ is non-atomic, that is, that $\mu(\{x\}) = 0$ for every $x \in M$. Let $\mathcal P_1 \prec \cdots \prec \mathcal P_n \prec \cdots$ be an increasing sequence of finite partitions of $M$ into measurable sets. We call the sequence separating if, given any two different points $x, y \in M$, there exists $n \ge 1$ such that $\mathcal P_n(x) \ne \mathcal P_n(y)$. In other words, each non-empty element of the partition $\bigvee_{n=1}^\infty \mathcal P_n$ consists of a single point.
Let $M_{\mathcal P}$ be the subset one obtains by removing from $M$ all the $P \in \bigcup_n \mathcal P_n$ with measure zero. Observe that $M_{\mathcal P}$ has full measure, since only countably many sets are removed. We denote by $\mathcal B_{\mathcal P}$ and $\mu_{\mathcal P}$ the restrictions of $\mathcal B$ and $\mu$, respectively, to $M_{\mathcal P}$. Let $m$ be the Lebesgue measure on $\mathbb R$. The next proposition means that the separating sequence allows one to represent the probability space $(M_{\mathcal P}, \mathcal B_{\mathcal P}, \mu_{\mathcal P})$ as a kind of subspace of the real line. We say “kind of” because, in general, the image $\iota(M_{\mathcal P})$ is not a measurable subset of $\mathbb R$.

Proposition 8.5.3. Given any separating sequence $(\mathcal P_n)_n$, there exists a compact totally disconnected set $K \subset \mathbb R$ and there exists a measurable injective map $\iota : M_{\mathcal P} \to K$ such that the closure of the image $\iota(P)$ of every $P \in \bigcup_n \mathcal P_n$ is an open and closed subset of $K$ with $m\big(\overline{\iota(P)}\big) = \mu(P)$. In particular, $\iota_*\mu$ coincides with the restriction of the Lebesgue measure $m$ to the set $K$.

Proof. Let $\alpha_n = 1 + 1/n$ for $n \ge 1$. We are going to construct a sequence of bijective maps $\psi_n : \mathcal P_n \to \mathcal I_n$, $n \ge 1$, satisfying:

(i) each $\mathcal I_n$ is a finite family of compact pairwise disjoint intervals;
(ii) each element of $\mathcal I_n$, $n > 1$, is contained in some element of $\mathcal I_{n-1}$;
(iii) $m(\psi_n(P)) = \alpha_n\mu(P)$ for every $P \in \mathcal P_n$ and every $n \ge 1$.

To do this, we start by writing $\mathcal P_1 = \{P_1, \dots, P_N\}$. Consider any family $\mathcal I_1 = \{I_1, \dots, I_N\}$ of compact pairwise disjoint intervals such that $m(I_j) = \alpha_1\mu(P_j)$ for every $j$. Let $\psi_1 : \mathcal P_1 \to \mathcal I_1$ be the map associating with each $P_j$ the corresponding $I_j$. Now suppose that, for a given $n \ge 1$, we have already constructed maps $\psi_1, \dots, \psi_n$ satisfying (i), (ii), (iii). For each $P \in \mathcal P_n$, let $I = \psi_n(P)$ and let $P_1, \dots, P_N$ be the elements of $\mathcal P_{n+1}$ contained in $P$. Take compact pairwise disjoint intervals $I_1, \dots, I_N \subset I$ satisfying $m(I_j) = \alpha_{n+1}\mu(P_j)$ for each $j = 1, \dots, N$. This is possible because, by the induction hypothesis,
$$m(I) = \alpha_n\mu(P) = \sum_{j=1}^N \alpha_n\mu(P_j) > \sum_{j=1}^N \alpha_{n+1}\mu(P_j).$$
Then, define $\psi_{n+1}(P_j) = I_j$ for each $j = 1, \dots, N$. Repeating this procedure for each $P \in \mathcal P_n$, we complete the definition of $\psi_{n+1}$ and $\mathcal I_{n+1}$. It is clear that the conditions (i), (ii), (iii) are preserved. This finishes the construction.

Now, let $K = \bigcap_n \bigcup_{I\in\mathcal I_n} I$. It is clear that $K$ is compact and its intersection with any $I \in \mathcal I_n$ is an open and closed subset of $K$. Moreover,
$$\max\{m(I) : I \in \mathcal I_n\} = \alpha_n\max\{\mu(P) : P \in \mathcal P_n\} \to 0 \quad\text{when } n \to \infty \tag{8.5.1}$$
because the sequence $(\mathcal P_n)_n$ is separating and the measure $\mu$ is non-atomic. Hence, $K$ is totally disconnected. For each $x \in M_{\mathcal P}$, the intervals $\psi_n(\mathcal P_n(x))$ form a decreasing sequence of compact sets whose lengths decrease to zero. Define $\iota(x)$ to be the unique point in $\bigcap_n \psi_n(\mathcal P_n(x))$. The hypothesis that the sequence is separating ensures that $\iota$ is injective: if $x \ne y$ then there exists $n \ge 1$ such that $\mathcal P_n(x) \cap \mathcal P_n(y) = \emptyset$ and, thus, $\iota(x) \ne \iota(y)$. By construction, the pre-image of $K \cap I$ is in $\bigcup_n \mathcal P_n$ for every $I \in \bigcup_n \mathcal I_n$. Consider the algebra $\mathcal A$ formed by the finite disjoint unions of sets $K \cap I$ of this form. This algebra is generating and we have just checked that $\iota^{-1}(A)$ is a measurable set for every $A \in \mathcal A$. Therefore, the transformation $\iota$ is measurable.
To check the other properties in the statement of the proposition, begin by noting that, for every $n \ge 1$ and $P \in \mathcal P_n$,
$$\overline{\iota(P)} = \bigcap_{k=n}^\infty \bigcup_Q \psi_k(Q), \tag{8.5.2}$$
where the union is over all the $Q \in \mathcal P_k$ that are contained in $P$. To get the inclusion $\subset$ it suffices to note that $\iota(P) = \bigcup_Q \iota(Q)$ and $\iota(Q) \subset \psi_k(Q)$ for every $Q \in \mathcal P_k$ and every $k$, and that the right-hand side is closed. The converse follows from the fact that $\iota(P)$ intersects every $\psi_k(Q)$ (the intersection contains $\iota(Q)$) and the length of the $\psi_k(Q)$ converges to zero when $k \to \infty$. In this way, (8.5.2) is proven. It follows that
$$m\big(\overline{\iota(P)}\big) = \lim_k m\Big(\bigcup_Q \psi_k(Q)\Big) = \lim_k \sum_Q \alpha_k\mu(Q) = \lim_k \alpha_k\mu(P) = \mu(P).$$
Moreover, (8.5.2) means that $\overline{\iota(P)} = \bigcap_{k=n}^\infty \bigcup_I I$, where the union is over all the $I \in \mathcal I_k$ that are contained in $\psi_n(P)$. The right-hand side of this equality coincides with $K \cap \psi_n(P)$ and, hence, is an open and closed subset of $K$. It also follows from the construction that $\iota^{-1}\big(\overline{\iota(P)}\big) = P$. Consequently, $\iota_*\mu\big(\overline{\iota(P)}\big) = \mu(P) = m\big(\overline{\iota(P)}\big)$ for every $P \in \bigcup_n \mathcal P_n$. Since the algebra of finite pairwise disjoint unions of sets $\overline{\iota(P)}$ generates the measurable structure of $K$, we conclude that $\iota_*\mu = m \mid K$.
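The inductive construction of the maps $\psi_n$ can be carried out explicitly in a concrete case. The sketch below assumes the hypothetical example $M = \{0, 1\}^{\mathbb N}$ with the Bernoulli measure of weights $p = 0.3$ and $1 - p = 0.7$, and the separating sequence $\mathcal P_n$ of cylinders $[0; a_0, \dots, a_{n-1}]$ (words of length $n$). The placement of the subintervals, with equal gaps around them, is an arbitrary choice, since the proof allows any compact pairwise disjoint intervals with the prescribed lengths. The checks confirm properties (i), (ii), (iii) and the shrinking of lengths in (8.5.1).

```python
p = 0.3                                   # hypothetical Bernoulli weight of the symbol 0

def mu(word):
    """Measure of the cylinder [0; word] (word is a tuple of 0s and 1s)."""
    return p ** word.count(0) * (1 - p) ** word.count(1)

def alpha(n):
    return 1 + 1 / n

def build_intervals(depth):
    """levels[n] maps each word of length n (an element of P_n) to its interval psi_n(word)."""
    levels = {1: {}}
    start = 0.0
    for s in (0, 1):                      # I_1: any disjoint intervals with lengths alpha_1 mu(P)
        length = alpha(1) * mu((s,))
        levels[1][(s,)] = (start, start + length)
        start += length + 1.0
    for n in range(1, depth):
        levels[n + 1] = {}
        for word, (a, b) in levels[n].items():
            children = [word + (s,) for s in (0, 1)]
            lengths = [alpha(n + 1) * mu(c) for c in children]
            # leftover room (alpha_n - alpha_{n+1}) mu(word) > 0, split into equal gaps
            gap = ((b - a) - sum(lengths)) / (len(children) + 1)
            left = a + gap
            for c, length in zip(children, lengths):
                levels[n + 1][c] = (left, left + length)
                left += length + gap
    return levels

levels = build_intervals(10)
for n in (1, 5, 10):
    total = sum(b - a for a, b in levels[n].values())
    longest = max(b - a for a, b in levels[n].values())
    print(n, round(total, 12), longest)   # total length alpha_n; longest interval shrinks
```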

We say that a probability space without atoms $(M, \mathcal B, \mu)$ is a Lebesgue space if, for some separating sequence, the image $\iota(M_{\mathcal P})$ is a Lebesgue measurable set. Actually, this property does not depend on the choice of the separating sequence (nor on the families $\mathcal I_n$ in the proof of Proposition 8.5.3), but we do not prove this fact here: the reader may find a proof in [Rok62, §2.2]. Exercise 8.5.6 shows that it is possible to define Lebesgue space in a more direct way, without using Proposition 8.5.3.

Note that if $\iota(M_{\mathcal P})$ is measurable then $\iota(P) = \iota(M_{\mathcal P}) \cap \psi_n(P)$ is measurable for every $P \in \mathcal P_n$ and every $n$. Hence, the inverse $\iota^{-1}$ is also a measurable transformation. Moreover, $m(\iota(M_{\mathcal P})) = \mu(M_{\mathcal P}) = 1 = m(K)$. Therefore, every Lebesgue space $(M, \mathcal B, \mu)$ is isomorphic, as a measure space, to a measurable subset of a compact totally disconnected subset of the real line.
Observe that if the cardinal of $M$ is strictly larger than the cardinal of the continuum then $(M, \mathcal B, \mu)$ admits no separating sequence (a separating sequence assigns distinct sequences of partition elements to distinct points, and there are at most continuum many such sequences) and, thus, cannot be a Lebesgue space. In Exercise 8.5.8 we propose another construction of probability spaces that are not Lebesgue spaces. Despite examples such as these, practically all the probability spaces we deal with are Lebesgue spaces:

Theorem 8.5.4. If $M$ is a complete separable metric space and $\mu$ is a Borel probability measure with no atoms then $(M, \mathcal B, \mu)$ is a Lebesgue space.

Proof. Let $X \subset M$ be a countable dense subset and $\{B_n : n \in \mathbb N\}$ be an enumeration of the set of balls $B(x, 1/k)$ with $x \in X$ and $k \ge 1$. We are going to construct an increasing sequence $(\mathcal P_n)_n$ of finite partitions such that

(i) $\mathcal P_n$ is finer than $\{B_1, B_1^c\} \vee \cdots \vee \{B_n, B_n^c\}$, and
(ii) $E_n = \{x \in M : \mathcal P_n(x) \text{ is not compact}\}$ satisfies $\mu(E_n) \le 2^{-n}$.

We start by considering $\mathcal Q_1 = \{B_1, B_1^c\}$. By Proposition A.3.7, there exist compact sets $K_1 \subset B_1$ and $K_2 \subset B_1^c$ such that $\mu(B_1 \setminus K_1) \le 2^{-1}\mu(B_1)$ and $\mu(B_1^c \setminus K_2) \le 2^{-1}\mu(B_1^c)$. Then take $\mathcal P_1 = \{K_1, B_1 \setminus K_1, K_2, B_1^c \setminus K_2\}$. Now, for each $n \ge 1$, assume that one has already constructed partitions $\mathcal P_1 \prec \cdots \prec \mathcal P_n$ satisfying (i) and (ii). Consider the partition $\mathcal Q_{n+1} = \mathcal P_n \vee \{B_{n+1}, B_{n+1}^c\}$ and let $Q_1, \dots, Q_m$ be its elements. By Proposition A.3.7, there exist compact sets $K_j \subset Q_j$ such that $\mu(Q_j \setminus K_j) \le 2^{-(n+1)}\mu(Q_j)$ for every $j = 1, \dots, m$. Take
$$\mathcal P_{n+1} = \{K_1, Q_1 \setminus K_1, \dots, K_m, Q_m \setminus K_m\}.$$
It is clear that $\mathcal P_{n+1}$ satisfies (i) and (ii). Therefore, our construction is complete.
All that is left is to show that the existence of such a sequence $(\mathcal P_n)_n$ implies the conclusion of the theorem. Property (i) ensures that the sequence is separating. Let $\iota : M_{\mathcal P} \to K$ be a map as in Proposition 8.5.3. Fix any $N \ge 1$ and consider any point $y \in K \setminus \iota(M_{\mathcal P})$. For each $n > N$, let $I_n$ be the interval in the family $\mathcal I_n$ that contains $y$ and let $P_n$ be the element of $\mathcal P_n$ such that $\psi_n(P_n) = I_n$. Note that $(P_n)_n$ is a decreasing sequence. If they were all compact, there would be $x \in \bigcap_{n>N} P_n$ and, by definition, $\iota(x)$ would be equal to $y$. Since we are assuming that $y$ is not in the image of $\iota$, this proves that there exists $l > N$ such that $P_l$ is not compact. Take $l > N$ minimum and let $I_l = \psi_l(P_l)$. Recall that $m(I_l) = \alpha_l\mu(P_l) \le 2\mu(P_l)$. Let $\tilde I_N$ and $\tilde P_N$ be the unions of all these $I_l$ and $P_l$, respectively, when we vary $y$ on the whole $K \setminus \iota(M_{\mathcal P})$. On the one hand, $\tilde I_N$ contains $K \setminus \iota(M_{\mathcal P})$; on the other hand, $\tilde P_N$ is contained in $\bigcup_{l>N} E_l$. Moreover, the $I_l$ are pairwise disjoint (because we took $l$ minimum) and the same holds for the $P_l$. Hence,
$$m\big(\tilde I_N\big) \le 2\mu\big(\tilde P_N\big) \le 2\mu\Big(\bigcup_{l>N} E_l\Big) \le 2^{-N+1}.$$
Then the intersection $\bigcap_N \tilde I_N$ has Lebesgue measure zero and contains $K \setminus \iota(M_{\mathcal P})$. Since $K$ is a Borel set, this shows that $\iota(M_{\mathcal P})$ is a Lebesgue measurable set.
The next result implies that all the Lebesgue spaces with no atoms are isomorphic:
Proposition 8.5.5. If $(M, \mathcal B, \mu)$ is a Lebesgue space with no atoms, there exists an invertible measurable map $h : M \to [0, 1]$ (defined between subsets of full measure) such that $h_*\mu$ coincides with the Lebesgue measure on $[0, 1]$.

Proof. Let $\iota : M_{\mathcal P} \to K$ be a map as in Proposition 8.5.3. Consider the map $g : K \to [0, 1]$ defined by $g(x) = m([a, x] \cap K)$, where $a = \min K$. It follows immediately from the definition that $g$ is non-decreasing and Lipschitz:
$$g(x_2) - g(x_1) = m\big([x_1, x_2] \cap K\big) \le x_2 - x_1,$$
for any $x_1 < x_2$ in $K$. In particular, $g$ is measurable. By monotonicity, the pre-image of any interval $[y_1, y_2] \subset [0, 1]$ is a set of the form $[x_1, x_2] \cap K$ with $x_1, x_2 \in K$ and $g(x_1) = y_1$ and $g(x_2) = y_2$. In particular,
$$m\big([x_1, x_2] \cap K\big) = g(x_2) - g(x_1) = y_2 - y_1 = m([y_1, y_2]).$$
This shows that $g_*(m \mid K) = m \mid [0, 1]$. Let $Y$ be the set of points $y \in [0, 1]$ such that $g^{-1}(\{y\}) = [x_1, x_2] \cap K$ with $x_1, x_2 \in K$ and $x_1 < x_2$. Let $X = g^{-1}(Y)$. Then $m(X) = m(Y) = 0$ because $Y$ is countable. Moreover, the restriction $g : K \setminus X \to [0, 1] \setminus Y$ is bijective. Its inverse is non-decreasing and, consequently, measurable. Now, take $h = g \circ \iota$. It follows from the previous observations that
$$h : M_{\mathcal P} \setminus \iota^{-1}(X) \to g(\iota(M_{\mathcal P})) \setminus Y$$
is a measurable bijection with measurable inverse such that $h_*\mu = m \mid [0, 1]$.
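The map $g$ in this proof is the distribution function of $m \mid K$. The sketch below assumes a hypothetical $K$ given by a finite union of disjoint closed intervals of total length $1$ (the set $K$ of Proposition 8.5.3 is totally disconnected, so this is only a caricature). It checks that $g$ is non-decreasing and 1-Lipschitz, that both endpoints of a gap of $K$ have the same image (the kind of non-injectivity that forces a countable set $Y$ to be discarded), and that the helper `g_inverse`, a hypothetical name, satisfies $g(\texttt{g\_inverse}(y)) = y$.

```python
# Hypothetical K: disjoint closed intervals, sorted, with total length 1.
K = [(0.0, 0.25), (0.5, 0.75), (2.0, 2.5)]

def g(x):
    """g(x) = m([a, x] intersected with K), where a = min K."""
    return sum(max(0.0, min(t, x) - s) for s, t in K)

def g_inverse(y):
    """Leftmost point x of K with g(x) = y, for 0 <= y <= 1."""
    accumulated = 0.0
    for s, t in K:
        if y <= accumulated + (t - s):
            return s + (y - accumulated)
        accumulated += t - s
    return K[-1][1]

print(g(K[-1][1]))                  # total mass m(K)
print(g(0.25), g(0.5))              # both endpoints of the gap (0.25, 0.5) have the same image
print(g_inverse(0.3), g(g_inverse(0.3)))
```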
Now we extend this discussion to general probability spaces $(M, \mathcal B, \mu)$, possibly with atoms. Let $A \subset M$ be the set of all the atoms; note that $A$ is at most countable, possibly finite. If the space is purely atomic, that is, if $\mu(A) = 1$, then, by definition, it is a Lebesgue space. Otherwise, let $M' = M \setminus A$, let $\mathcal B'$ be the restriction of $\mathcal B$ to $M'$ and let $\mu'$ be the normalized restriction of $\mu$ to $\mathcal B'$. By definition, $(M, \mathcal B, \mu)$ is a Lebesgue space if $(M', \mathcal B', \mu')$ is a Lebesgue space.
It is clear that Theorem 8.5.4 remains valid in the general case: every complete separable metric space endowed with a Borel probability measure, possibly with atoms, is a Lebesgue space. Moreover, Proposition 8.5.5 has the following extension to the atomic case: if $(M, \mathcal B, \mu)$ is a Lebesgue space and $A \subset M$ denotes the set of atoms of the measure $\mu$, then there exists an invertible measurable map $h : M \to [0, 1 - \mu(A)] \cup A$ such that $h_*\mu$ coincides with $m$ on the interval $[0, 1 - \mu(A)]$ and coincides with $\mu$ on $A$.

Proposition 8.5.6. Let $(M,B,\mu)$ and $(N,C,\nu)$ be two Lebesgue spaces and $H : \tilde C\to\tilde B$ be an isomorphism between the corresponding measure algebras. Then there exists an invertible measurable map $h : M\to N$ such that $h_*\mu = \nu$ and $H(C) = \tilde h(C)$ for every $C\in\tilde C$. Moreover, $h$ is essentially unique: any two maps satisfying these conditions coincide at $\mu$-almost every point.

We are going to sketch the proof of this proposition in the non-atomic case.
The arguments are based on the ideas and use the notations in the proof of
Proposition 8.5.3.
Let us start with the uniqueness claim. Let $h_1,h_2 : M\to N$ be any two maps such that $(h_1)_*\mu = (h_2)_*\mu = \nu$. Suppose that $h_1(x)\neq h_2(x)$ for every $x$ in a set $E\subset M$ with $\mu(E)>0$. Let $(\mathcal Q_n)_n$ be a separating sequence in $(N,C,\nu)$. Then $\mathcal Q_n(h_1(x))\neq\mathcal Q_n(h_2(x))$ for every $x\in E$ and every $n$ sufficiently large. Hence, we may fix $n$ (large) and $E'\subset E$ with $\mu(E')>0$ such that $\mathcal Q_n(h_1(x))\neq\mathcal Q_n(h_2(x))$ for every $x\in E'$. Consequently, there exist $Q\in\mathcal Q_n$ and $E''\subset E'$ with $\mu(E'')>0$ such that $\mathcal Q_n(h_1(x)) = Q$ and $\mathcal Q_n(h_2(x))\neq Q$ for every $x\in E''$. Therefore, $E''\subset h_1^{-1}(Q)\setminus h_2^{-1}(Q)$. This implies that $\tilde h_1(Q)\neq\tilde h_2(Q)$ and, hence, $\tilde h_1\neq\tilde h_2$.
Next we comment on the existence claim. Let $(\mathcal P_n)_n$ and $(\mathcal Q_n)_n$ be separating sequences in $(M,B,\mu)$ and $(N,C,\nu)$, respectively. Define $\mathcal P_n' = \mathcal P_n\vee H(\mathcal Q_n)$ and $\mathcal Q_n' = \mathcal Q_n\vee H^{-1}(\mathcal P_n)$. Then $(\mathcal P_n')_n$ and $(\mathcal Q_n')_n$ are also separating sequences and $\mathcal P_n' = H(\mathcal Q_n')$ for each $n$. Replacing the original sequences with these, we may assume that $\mathcal P_n = H(\mathcal Q_n)$ for each $n$. Let $\iota : M_{\mathcal P}\to K$ be a map as in Proposition 8.5.3 and $\psi_n : \mathcal P_n\to\mathcal I_n$, $n\ge1$, be the family of bijections used in its construction. Let $j : N_{\mathcal Q}\to L$ and $\phi_n : \mathcal Q_n\to\mathcal J_n$ be the corresponding objects for $(N,C,\nu)$. Since we are assuming that $(M,B,\mu)$ and $(N,C,\nu)$ are Lebesgue spaces, $\iota$ and $j$ are invertible maps over subsets with full measure. Recall also that $m(\psi_n(P)) = \alpha_n\mu(P)$ for each $P\in\mathcal P_n$ and, analogously, $m(\phi_n(Q)) = \alpha_n\nu(Q)$ for each $Q\in\mathcal Q_n$. Hence, $m(\psi_n(P)) = m(\phi_n(Q))$ if $P = H(Q)$, because $\mu(H(Q)) = \nu(Q)$. Then, for each $n$,
$$\psi_n\circ H\circ\phi_n^{-1} : \mathcal J_n\to\mathcal I_n \tag{8.5.3}$$
is a bijection that preserves length. Given $z\in K$ and $n\ge1$, let $I_n$ be the element of $\mathcal I_n$ that contains $z$ and let $J_n$ be the corresponding element of $\mathcal J_n$, via (8.5.3). By construction, $(J_n)_n$ is a nested sequence of compact intervals whose length converges to zero. Let $\varphi(z)$ be the unique point in the intersection. In this way, one has defined a measurable map $\varphi : K\to L$ that preserves the Lebesgue measure. It is clear from the construction that $\varphi$ is invertible and the inverse is also measurable. Now it suffices to take $h = j^{-1}\circ\varphi\circ\iota$.
All that is left is to check that $h$ is invertible. Applying the construction in the previous paragraph to the inverse $H^{-1}$ we find $h' : N\to M$ such that $h'_*\nu = \mu$
240 Equivalent systems

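and $H^{-1} = \tilde h'$. Then, $\widetilde{h'\circ h} = \tilde h\circ\tilde h' = \mathrm{id}$ and $\widetilde{h\circ h'} = \tilde h'\circ\tilde h = \mathrm{id}$. By uniqueness, it follows that $h'\circ h = \mathrm{id}$ and $h\circ h' = \mathrm{id}$ at almost every point.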
Corollary 8.5.7. Let (M, B, μ) and (N, C, ν) be two Lebesgue spaces and
let f : M → M and g : N → N be measurable transformations preserving
the measures in their corresponding domains. Then (f , μ) and (g, ν) are
ergodically equivalent if and only if they are ergodically isomorphic.

Proof. We only need to show that if the systems are ergodically isomorphic
then they are ergodically equivalent. Let H : C˜ → B̃ be an ergodic isomorphism.
By Proposition 8.5.6, there exists an invertible measurable map h : M → N such
that $h_*\mu = \nu$ and $H = \tilde h$. Then,
$$\widetilde{h\circ f} = \tilde f\circ\tilde h = \tilde f\circ H = H\circ\tilde g = \tilde h\circ\tilde g = \widetilde{g\circ h}.$$
By the uniqueness part of Proposition 8.5.6, it follows that h ◦ f = g ◦ h at
μ-almost every point. This shows that h is an ergodic equivalence.

8.5.3 Exercises
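8.5.1. Let $H : \tilde C\to\tilde B$ be a homomorphism of measure algebras. Show that
$$\sum_{i=1}^l b_i\mathcal X_{B_i} = \sum_{j=1}^k c_j\mathcal X_{C_j}\;\Rightarrow\;\sum_{i=1}^l b_i\mathcal X_{H(B_i)} = \sum_{j=1}^k c_j\mathcal X_{H(C_j)}.$$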

8.5.2. Check that the homomorphism of measure algebras $\tilde g : \tilde C\to\tilde B$ induced by a measure-preserving map $g : M\to N$ is injective. Suppose that $N$ is a Lebesgue
space. Show that, given another measure-preserving map h : M → N, the
corresponding homomorphisms g̃ and h̃ coincide if and only if g = h at almost
every point.
8.5.3. Let f : M → M be a measurable transformation in a Lebesgue space (M, B, μ),
preserving the measure μ. Show that (f , μ) is invertible at almost every point
(that is, there exists an invariant full measure subset restricted to which f is a
measurable bijection with measurable inverse) if and only if the corresponding
homomorphism of measure algebras f̃ : B̃ → B̃ is surjective.
8.5.4. Show that the Koopman operator of a system (f , μ) is surjective if and only if
the corresponding homomorphism of measure algebras f̃ : B̃ → B̃ is surjective.
In Lebesgue spaces this happens if and only if the system is invertible at almost
every point.
8.5.5. Show that every system (f , μ) with discrete spectrum in a Lebesgue space is
invertible at almost every point.
8.5.6. Given a separating sequence $\mathcal P_1\prec\cdots\prec\mathcal P_n\prec\cdots$, we call a chain any sequence $(P_n)_n$ with $P_n\in\mathcal P_n$ and $P_{n+1}\subset P_n$ for every $n$. We say that a chain is empty if $\bigcap_nP_n = \emptyset$. Consider the map $\iota : M_{\mathcal P}\to K$ constructed in Proposition 8.5.3. Show that the image $\iota(M_{\mathcal P})$ is a Lebesgue measurable set and $m(K\setminus\iota(M_{\mathcal P})) = 0$ if and only if the empty chains have zero measure in the following sense: for every $\delta>0$ there exists $B\subset M$ such that $B$ is a union of elements of $\bigcup_n\mathcal P_n$ with $\mu(B)<\delta$ and every empty chain $(P_n)_n$ has $P_n\subset B$ for every $n$ sufficiently large.

8.5.7. Prove the following extension of Proposition 2.4.4: if $f : M\to M$ preserves a probability measure $\mu$ and $(M,\mu)$ is a Lebesgue space then $\mu$ admits a (unique) lift $\hat\mu$ to the natural extension $\hat f : \hat M\to\hat M$.
8.5.8. Let $M$ be a subset of $[0,1]$ with exterior measure $m^*(M) = 1$ but which is not a Lebesgue measurable set. Consider the σ-algebra $\mathcal M$ of all sets of the form $M\cap B$, where $B$ is a Lebesgue measurable subset of $[0,1]$. Check that $\mu(M\cap B) = m(B)$ defines a probability measure on $(M,\mathcal M)$ such that $(M,\mathcal M,\mu)$ is not a Lebesgue space.
9
Entropy

The word entropy was invented in 1865 by the German physicist and mathematician Rudolf Clausius, one of the founders of thermodynamics. In the theory of systems in thermodynamical equilibrium, the entropy quantifies the degree of "disorder" in the system. The second law of thermodynamics states that, when an isolated system passes from an equilibrium state to another, the entropy of the final state is never smaller than the entropy of the initial state. For example, when we join two containers with different gases (oxygen
and nitrogen, say), the two gases mix with one another until reaching a new
macroscopic equilibrium, where they are both uniformly distributed in the two
containers. The entropy of the new state is larger than the entropy of the initial
equilibrium, where the two gases were separate.
The notion of entropy plays a crucial role in different fields of science.
An important example, which we explore in our presentation, is the field of
information theory, initiated by the work of the American electrical engineer
Claude Shannon in the mid-20th century. At roughly the same time, the
Russian mathematicians Andrey Kolmogorov and Yakov Sinai were proposing
a definition of the entropy of a system in ergodic theory. The main purpose
was to provide an invariant of ergodic equivalence that, in particular, could
distinguish two Bernoulli shifts. This Kolmogorov–Sinai entropy is the subject
of this chapter.
In Section 9.1 we define the entropy of a transformation with respect to an
invariant probability measure, by analogy with a similar notion in information
theory. The theorem of Kolmogorov–Sinai, which we discuss in Section 9.2, is
a fundamental tool for the actual calculation of the entropy in specific systems.
In Section 9.3 we analyze the concept of entropy from a more local viewpoint,
which is more closely related to Shannon’s formulation of this concept. Next,
in Section 9.4, we illustrate a few methods for calculating the entropy, by
means of concrete examples.
In Section 9.5 we discuss the role of the entropy as an invariant of ergodic equivalence. The highlight is the theorem of Ornstein (Theorem 9.5.2), according to which two two-sided Bernoulli shifts are ergodically equivalent

if and only if they have the same entropy. In that section we also introduce
the class of Kolmogorov systems, which contains the Bernoulli shifts and is
contained in the class of systems with Lebesgue spectrum. In both cases the
inclusion is strict.
In the last couple of sections we present two complementary topics that will
be useful later. The first one (Section 9.6) is the theorem of Jacobs, according
to which the entropy behaves in an affine way with respect to the ergodic
decomposition. The other (Section 9.7) concerns the notion of the Jacobian
and its relations with the entropy.

9.1 Definition of entropy


To motivate the definition of Kolmogorov–Sinai entropy, let us look at the
following basic situation in information theory. Consider some communication
channel transmitting symbols from a certain alphabet A, one after the other.
This could be a telegraph transmitting groups of dots and dashes, according
to the old Morse code, an optical fiber, transmitting packets of zeros and
ones, according to the ASCII binary code, or any other process of sequential
transmission of information, such as our reader’s going through the text of this
book, one letter after the other. The objective is to measure the entropy of the
channel, that is, the mean quantity of information it carries, per unit of time.

9.1.1 Entropy in information theory


It is assumed that each symbol has a given frequency, that is, a given
probability of being used at any time in the communication. For example, if
the channel is transmitting a text in English then the letter E is more likely
to be used than the letter Z, say. The occurrence of rarer symbols, such as Z,
restricts the kind of word or sentence in which they appear and, hence, is more
informative than the presence of commoner symbols, such as E.
This suggests that information should be a function of probability: the more
unlikely a symbol (or a word, defined as a finite sequence of symbols) is, the
more information it carries.
The situation is actually more complicated, because for most communica-
tion codes the probability of using a given symbol also depends on the context.
For example, still assuming that the channel transmits in English, any sequence
of symbols S, Y, S, T, E must be followed by an M: in this case, in view of the
symbols transmitted previously, this letter M is unavoidable, which also means
that it carries no additional information.1
1 We once participated in a “treasure hunt” that consisted in searching the woods for hidden letters that would form the name of a mathematical object. It just so happened that the first three letters that were found were Z, Z and Z. That unfortunate circumstance ended the game prematurely, since at that point the remaining letters could add no information: there is only one mathematical object whose name includes the letter Z three times (the Yoccoz puzzle).

On the other hand, in those situations where symbols are transmitted at random, independently of each other, the information carried by each symbol simply adds to the information conveyed by the previous ones. For example, if the transmission reflects the outcomes of the successive flipping of a fair coin, then the amount of information associated with the outcome (Head, Tail, Tail) must be equal to the sum of the amounts of information associated with each of the symbols Head, Tail and Tail. Now, by independence, the probability of the event (Head, Tail, Tail) is the product of the probabilities of the events Head, Tail and Tail.
This suggests that information should be defined in terms of the logarithm of the probability. In information theory it is usual to consider base 2 logarithms, because essentially all the communication channels one finds in practice are binary. However, there is no reason to stick to that custom in our setting: we will consider natural (base e) logarithms instead.
By definition, the quantity of information associated with a symbol $a\in A$ is given by
$$I(a) = -\log p_a, \tag{9.1.1}$$
where $p_a$ is the probability (frequency) of the symbol $a$. The mean information associated with the alphabet $A$ is given by
$$I(A) = \sum_{a\in A}p_aI(a) = \sum_{a\in A}-p_a\log p_a. \tag{9.1.2}$$
More generally, the quantity of information associated with a word $a_1\dots a_n$ is
$$I(a_1\dots a_n) = -\log p_{a_1\dots a_n}, \tag{9.1.3}$$
where $p_{a_1\dots a_n}$ denotes the probability of the word. In the independent case this probability coincides with the product $p_{a_1}\cdots p_{a_n}$ of the probabilities of the symbols, but not in general. Denoting by $A^n$ the set of all the words of length $n$, we define
$$I(A^n) = \sum_{a_1,\dots,a_n}p_{a_1\dots a_n}I(a_1\dots a_n) = \sum_{a_1,\dots,a_n}-p_{a_1\dots a_n}\log p_{a_1\dots a_n}. \tag{9.1.4}$$
Finally, the entropy of the communication channel is defined by
$$I = \lim_n\frac1nI(A^n). \tag{9.1.5}$$
We invite the reader to check that, when the probabilities of words do not depend on the time at which they are transmitted, the sequence $I(A^n)$ is subadditive and, thus, the limit in (9.1.5) does exist. This is also contained in the much more general theory that we are about to present.
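A minimal Python sketch of (9.1.3)–(9.1.5), assuming a hypothetical source whose word probabilities are generated by a stationary two-state Markov chain, so that, as in the text, the probability of a symbol depends on the context. The transition matrix `T` and stationary vector `pi` are illustrative choices, not data from the text. The function sums over all $2^n$ words, so it is meant only for small $n$; the printed ratios $I(A^n)/n$ decrease, in agreement with the subadditivity mentioned above.

```python
import math
from itertools import product

ALPHABET = (0, 1)
T = [[0.9, 0.1], [0.5, 0.5]]  # hypothetical transition probabilities
pi = [5 / 6, 1 / 6]           # stationary: pi[j] = sum_i pi[i] * T[i][j]


def word_probability(word):
    """p_{a_1...a_n} for the stationary Markov source."""
    p = pi[word[0]]
    for a, b in zip(word, word[1:]):
        p *= T[a][b]
    return p


def information(p):
    """I = -log p, as in (9.1.1) and (9.1.3), with natural logarithms."""
    return -math.log(p)


def block_information(n):
    """I(A^n) from (9.1.4): mean information of the words of length n (0 log 0 = 0)."""
    return sum(p * information(p)
               for p in map(word_probability, product(ALPHABET, repeat=n))
               if p > 0)


for n in (1, 2, 4, 8, 12):
    print(n, block_information(n) / n)
```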

9.1.2 Entropy of a partition


We want to adapt these ideas to our context in ergodic theory. The main
difference is that, while in information theory the alphabet A is usually discrete
(finite or, at most, countable), that is not the case for the domain (space of
states) of most interesting dynamical systems. That issue is dealt with by using
partitions of the domain.
Let $(M,B,\mu)$ be a probability space. In this chapter, by partition we always mean a countable (finite or infinite) family $\mathcal P$ of pairwise disjoint measurable subsets of $M$ whose union has full measure. We denote by $\mathcal P(x)$ the element of the partition that contains a given point $x$. The sum $\mathcal P\vee\mathcal Q$ of two partitions $\mathcal P$ and $\mathcal Q$ is the partition whose elements are the intersections $P\cap Q$ with $P\in\mathcal P$ and $Q\in\mathcal Q$. More generally, given any countable family of partitions $\mathcal P_n$, we define
$$\bigvee_n\mathcal P_n = \Big\{\bigcap_nP_n : P_n\in\mathcal P_n\text{ for each }n\Big\}.$$

With each partition $\mathcal P$ we associate the corresponding information function
$$I_{\mathcal P} : M\to\mathbb R,\qquad I_{\mathcal P}(x) = -\log\mu(\mathcal P(x)). \tag{9.1.6}$$
It is clear that the function $I_{\mathcal P}$ is measurable. By definition, the entropy of the partition $\mathcal P$ is the mean of its information function, that is,
$$H_\mu(\mathcal P) = \int I_{\mathcal P}\,d\mu = \sum_{P\in\mathcal P}-\mu(P)\log\mu(P). \tag{9.1.7}$$

We always abide by the usual (in the theory of Lebesgue integration) convention that $0\log0 = \lim_{x\to0}x\log x = 0$. See Figure 9.1.
Consider the function $\varphi : (0,\infty)\to\mathbb R$ given by $\varphi(x) = -x\log x$. One can readily check that $\varphi''(x) = -1/x<0$. Therefore, $\varphi$ is strictly concave:
$$t_1\varphi(x_1)+\cdots+t_k\varphi(x_k)\le\varphi(t_1x_1+\cdots+t_kx_k) \tag{9.1.8}$$
for every $x_1,\dots,x_k>0$ and $t_1,\dots,t_k>0$ with $t_1+\cdots+t_k = 1$; moreover, the identity holds if and only if $x_1 = \cdots = x_k$. This observation will be useful on several occasions.

Figure 9.1. Graph of the function $\varphi(x) = -x\log x$

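We say that two partitions $\mathcal P$ and $\mathcal Q$ are independent if $\mu(P\cap Q) = \mu(P)\mu(Q)$ for every $P\in\mathcal P$ and every $Q\in\mathcal Q$. Then, $I_{\mathcal P\vee\mathcal Q} = I_{\mathcal P}+I_{\mathcal Q}$ and, therefore, $H_\mu(\mathcal P\vee\mathcal Q) = H_\mu(\mathcal P)+H_\mu(\mathcal Q)$. In general, one has the inequality $\le$, as we are going to see in a while.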
Example 9.1.1. Let $M = [0,1]$ be endowed with the Lebesgue measure. For each $n\ge1$, consider the partition $\mathcal P^n$ of the interval $M$ into the subintervals $\big((i-1)/10^n, i/10^n\big]$ with $1\le i\le10^n$. Then,
$$H_\mu(\mathcal P^n) = \sum_{i=1}^{10^n}-10^{-n}\log10^{-n} = n\log10.$$

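Example 9.1.2. Let $M = \{1,\dots,d\}^{\mathbb N}$ be endowed with a product measure $\mu = \nu^{\mathbb N}$. Denote $p_i = \nu(\{i\})$ for each $i\in\{1,\dots,d\}$. For each $n\ge1$, let $\mathcal P^n$ be the partition of $M$ into the cylinders $[0;a_1,\dots,a_n]$ of length $n$. The entropy of $\mathcal P^n$ is
$$\begin{aligned}H_\mu(\mathcal P^n) &= \sum_{a_1,\dots,a_n}-p_{a_1}\cdots p_{a_n}\log\big(p_{a_1}\cdots p_{a_n}\big)\\ &= \sum_j\sum_{a_1,\dots,a_n}-p_{a_1}\cdots p_{a_j}\cdots p_{a_n}\log p_{a_j}\\ &= \sum_j\sum_{a_j}-p_{a_j}\log p_{a_j}\sum_{a_i,\,i\neq j}p_{a_1}\cdots p_{a_{j-1}}p_{a_{j+1}}\cdots p_{a_n}.\end{aligned}$$
The last sum is equal to 1, since $\sum_ip_i = 1$. Therefore,
$$H_\mu(\mathcal P^n) = \sum_{j=1}^n\sum_{a_j=1}^d-p_{a_j}\log p_{a_j} = \sum_{j=1}^n\sum_{i=1}^d-p_i\log p_i = -n\sum_{i=1}^dp_i\log p_i.$$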
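The computation in Example 9.1.2 can be checked by brute force. The following Python sketch (the function names are ours, for illustration) sums $-\mu(C)\log\mu(C)$ over all $d^n$ cylinders $C = [0;a_1,\dots,a_n]$ and compares the result with $-n\sum_ip_i\log p_i$; taking $d = 10$ and $p_i = 1/10$ also recovers the value $n\log10$ of Example 9.1.1.

```python
import math
from itertools import product


def cylinder_partition_entropy(p, n):
    """H_mu(P^n) for mu = nu^N with nu({i}) = p[i]: sum over all cylinders of length n."""
    total = 0.0
    for word in product(range(len(p)), repeat=n):
        m = math.prod(p[a] for a in word)  # mu([0; a_1, ..., a_n]) = p_{a_1} ... p_{a_n}
        if m > 0:                          # convention 0 log 0 = 0
            total -= m * math.log(m)
    return total


def symbol_entropy(p):
    """-sum_i p_i log p_i."""
    return -sum(q * math.log(q) for q in p if q > 0)


p = [0.5, 0.3, 0.2]
for n in (1, 2, 3, 4):
    print(n, cylinder_partition_entropy(p, n), n * symbol_entropy(p))
```

The cost grows like $d^n$, which is why the closed formula of the example is preferable for anything but small checks.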

Lemma 9.1.3. Every finite partition $\mathcal P$ has finite entropy: $H_\mu(\mathcal P)\le\log\#\mathcal P$ and the identity holds if and only if $\mu(P) = 1/\#\mathcal P$ for every $P\in\mathcal P$.

Proof. Let $\mathcal P = \{P_1,P_2,\dots,P_n\}$ and consider $t_i = 1/n$ and $x_i = \mu(P_i)$. By the concavity property (9.1.8):
$$\frac1nH_\mu(\mathcal P) = \sum_{i=1}^nt_i\varphi(x_i)\le\varphi\Big(\sum_{i=1}^nt_ix_i\Big) = \varphi\Big(\frac1n\Big) = \frac{\log n}{n}.$$
Therefore, $H_\mu(\mathcal P)\le\log n$. Moreover, the identity holds if and only if $\mu(P_i) = 1/n$ for every $i = 1,\dots,n$.

Example 9.1.4. Let $M = [0,1]$ be endowed with the Lebesgue measure $\mu$. Observe that the series $\sum_{k=2}^\infty1/(k(\log k)^2)$ is convergent. Let $c$ be the value of the sum. Then, we may partition $[0,1]$ into intervals $P_k$, $k\ge2$, with $\mu(P_k) = 1/(ck(\log k)^2)$ for every $k$. Let $\mathcal P$ be the partition formed by these subintervals. Then,
$$H_\mu(\mathcal P) = \sum_{k=2}^\infty\frac{\log c+\log k+2\log\log k}{ck(\log k)^2}.$$
By the limit comparison test (the ratio between the general terms converges to $1/c$), the series on the right-hand side has the same behavior as the series $\sum_{k=2}^\infty1/(k\log k)$ which, as we know (use the integral convergence criterion), is divergent. Therefore, $H_\mu(\mathcal P) = \infty$.

This shows that infinite partitions may have infinite entropy. From now on, for the rest of the chapter, we always consider (countable) partitions with finite entropy.
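The conditional entropy of a partition $\mathcal P$ with respect to another partition $\mathcal Q$ is the number
$$H_\mu(\mathcal P/\mathcal Q) = \sum_{P\in\mathcal P}\sum_{Q\in\mathcal Q}-\mu(P\cap Q)\log\frac{\mu(P\cap Q)}{\mu(Q)}. \tag{9.1.9}$$
Intuitively, it measures the amount of information provided by the partition $\mathcal P$ in addition to the information provided by the partition $\mathcal Q$. It is clear that $H_\mu(\mathcal P/\mathcal M) = H_\mu(\mathcal P)$ for every $\mathcal P$, where $\mathcal M$ denotes the trivial partition $\mathcal M = \{M\}$. Moreover, if $\mathcal P$ and $\mathcal Q$ are independent then $H_\mu(\mathcal P/\mathcal Q) = H_\mu(\mathcal P)$. In general, one has the inequality $\le$, as we are going to see later.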
Given two partitions P and Q, we say that P is coarser than Q (or,
equivalently, Q is finer than P) and we write P ≺ Q, if every element of Q
is contained in some element of P, up to measure zero. The sum P ∨ Q may
also be defined as the coarsest of all the partitions R such that P ≺ R and
Q ≺ R.

Lemma 9.1.5. Let P, Q and R be partitions with finite entropy. Then,


(i) Hμ (P ∨ Q/R) = Hμ (P/R) + Hμ (Q/P ∨ R);
(ii) if P ≺ Q then Hμ (P/R) ≤ Hμ (Q/R) and Hμ (R/P) ≥ Hμ (R/Q);
(iii) P ≺ Q if and only if Hμ (P/Q) = 0.

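Proof. By definition,
$$\begin{aligned}H_\mu(\mathcal P\vee\mathcal Q/\mathcal R) &= \sum_{P,Q,R}-\mu(P\cap Q\cap R)\log\frac{\mu(P\cap Q\cap R)}{\mu(R)}\\ &= \sum_{P,Q,R}-\mu(P\cap Q\cap R)\log\frac{\mu(P\cap Q\cap R)}{\mu(P\cap R)}+\sum_{P,Q,R}-\mu(P\cap Q\cap R)\log\frac{\mu(P\cap R)}{\mu(R)}.\end{aligned}$$
The sum on the right-hand side may be rewritten as
$$\sum_{S\in\mathcal P\vee\mathcal R,\,Q\in\mathcal Q}-\mu(S\cap Q)\log\frac{\mu(S\cap Q)}{\mu(S)}+\sum_{P\in\mathcal P,\,R\in\mathcal R}-\mu(P\cap R)\log\frac{\mu(P\cap R)}{\mu(R)} = H_\mu(\mathcal Q/\mathcal P\vee\mathcal R)+H_\mu(\mathcal P/\mathcal R).$$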

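This proves part (i). Next, observe that if $\mathcal P\prec\mathcal Q$ then, since $\mu(Q\cap R)\le\mu(P\cap R)$ whenever $Q\subset P$,
$$\begin{aligned}H_\mu(\mathcal P/\mathcal R) &= \sum_P\sum_R\sum_{Q\subset P}-\mu(Q\cap R)\log\frac{\mu(P\cap R)}{\mu(R)}\\ &\le\sum_P\sum_R\sum_{Q\subset P}-\mu(Q\cap R)\log\frac{\mu(Q\cap R)}{\mu(R)} = H_\mu(\mathcal Q/\mathcal R).\end{aligned}$$
This proves the first half of claim (ii). To prove the second half, note that for any $P\in\mathcal P$ and $R\in\mathcal R$,
$$\frac{\mu(R\cap P)}{\mu(P)} = \sum_{Q\subset P}\frac{\mu(Q)}{\mu(P)}\,\frac{\mu(R\cap Q)}{\mu(Q)}.$$
It is clear that $\sum_{Q\subset P}\mu(Q)/\mu(P) = 1$. Therefore, by (9.1.8),
$$\varphi\Big(\frac{\mu(R\cap P)}{\mu(P)}\Big)\ge\sum_{Q\subset P}\frac{\mu(Q)}{\mu(P)}\,\varphi\Big(\frac{\mu(R\cap Q)}{\mu(Q)}\Big)$$
for every $P\in\mathcal P$ and $R\in\mathcal R$. Consequently,
$$\begin{aligned}H_\mu(\mathcal R/\mathcal P) &= \sum_{P,R}\mu(P)\,\varphi\Big(\frac{\mu(R\cap P)}{\mu(P)}\Big)\ge\sum_{P,R}\mu(P)\sum_{Q\subset P}\frac{\mu(Q)}{\mu(P)}\,\varphi\Big(\frac{\mu(R\cap Q)}{\mu(Q)}\Big)\\ &= \sum_{Q,R}\mu(Q)\,\varphi\Big(\frac{\mu(R\cap Q)}{\mu(Q)}\Big) = H_\mu(\mathcal R/\mathcal Q).\end{aligned}$$
Finally, every term in (9.1.9) is non-negative, so it follows from the definition that $H_\mu(\mathcal P/\mathcal Q) = 0$ if and only if
$$\mu(P\cap Q) = 0\quad\text{or else}\quad\frac{\mu(P\cap Q)}{\mu(Q)} = 1$$
for every $P\in\mathcal P$ and every $Q\in\mathcal Q$. In other words, either $Q$ is disjoint from $P$ (up to measure zero) or else $Q$ is contained in $P$ (up to measure zero). This means that $H_\mu(\mathcal P/\mathcal Q) = 0$ if and only if $\mathcal P\prec\mathcal Q$.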

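In particular, applying part (ii) of the lemma to the pair $\mathcal M\prec\mathcal P$ (that is, with $\mathcal M$ and $\mathcal P$ in the roles of $\mathcal P$ and $\mathcal Q$) we get that
$$H_\mu(\mathcal R/\mathcal P)\le H_\mu(\mathcal R)\quad\text{for any partitions }\mathcal R\text{ and }\mathcal P. \tag{9.1.10}$$
Moreover, taking $\mathcal R = \mathcal M$ in part (i) we find that
$$H_\mu(\mathcal P\vee\mathcal Q) = H_\mu(\mathcal P)+H_\mu(\mathcal Q/\mathcal P)\le H_\mu(\mathcal P)+H_\mu(\mathcal Q). \tag{9.1.11}$$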
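To experiment with Lemma 9.1.5 and the inequalities (9.1.10) and (9.1.11), one can model a finite probability space in Python. In the sketch below (an encoding of our own, not taken from the text) a partition is stored as a list of labels, `lab[x]` naming the element that contains the point `x`; the join is obtained by pairing labels, and `H_cond` evaluates (9.1.9) directly. The random weights and partitions are arbitrary test data.

```python
import math
import random
from collections import defaultdict


def join(*labels):
    """Labels of P v Q v ...: the element containing x is the tuple of its labels."""
    return list(zip(*labels))


def masses(w, lab):
    """mu(P) for each element P, where w[x] is the weight of the point x."""
    m = defaultdict(float)
    for x, a in enumerate(lab):
        m[a] += w[x]
    return m


def H(w, lab):
    """Entropy (9.1.7) of the partition encoded by lab."""
    return -sum(v * math.log(v) for v in masses(w, lab).values() if v > 0)


def H_cond(w, lab_p, lab_q):
    """Conditional entropy H(P/Q) from (9.1.9)."""
    joint = masses(w, join(lab_p, lab_q))
    mq = masses(w, lab_q)
    return -sum(v * math.log(v / mq[q]) for (_, q), v in joint.items() if v > 0)


random.seed(1)
N = 40
raw = [random.random() for _ in range(N)]
w = [r / sum(raw) for r in raw]  # a random probability vector
P, Q, R = ([random.randrange(k) for _ in range(N)] for k in (2, 3, 4))
lhs = H_cond(w, join(P, Q), R)
rhs = H_cond(w, P, R) + H_cond(w, Q, join(P, R))
print(lhs, rhs)  # equal up to rounding, as in Lemma 9.1.5(i)
```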
Let $f : M\to N$ be a measurable transformation and $\mu$ be a probability measure on $M$. Then, $f_*\mu$ is a probability measure on $N$. Moreover, if $\mathcal P$ is a partition of $N$ then $f^{-1}(\mathcal P) = \{f^{-1}(P) : P\in\mathcal P\}$ is a partition of $M$. By definition,
$$\begin{aligned}H_\mu(f^{-1}(\mathcal P)) &= \sum_{P\in\mathcal P}-\mu(f^{-1}(P))\log\mu(f^{-1}(P))\\ &= \sum_{P\in\mathcal P}-f_*\mu(P)\log f_*\mu(P) = H_{f_*\mu}(\mathcal P).\end{aligned} \tag{9.1.12}$$

In particular, if $M = N$ and the measure $\mu$ is invariant under $f$ then
$$H_\mu(f^{-1}(\mathcal P)) = H_\mu(\mathcal P)\quad\text{for every partition }\mathcal P. \tag{9.1.13}$$
We also need the following continuity property:
Lemma 9.1.6. Given $k\ge1$ and $\varepsilon>0$ there exists $\delta>0$ such that, for any finite partitions $\mathcal P = \{P_1,\dots,P_k\}$ and $\mathcal Q = \{Q_1,\dots,Q_k\}$,
$$\mu(P_i\,\Delta\,Q_i)<\delta\text{ for every }i = 1,\dots,k\;\Rightarrow\;H_\mu(\mathcal Q/\mathcal P)<\varepsilon.$$

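Proof. Fix $\varepsilon>0$ and $k\ge1$. Since $\varphi : [0,1]\to\mathbb R$, $\varphi(x) = -x\log x$, is a continuous function with $\varphi(0) = \varphi(1) = 0$, there exists $\rho>0$ such that $\varphi(x)<\varepsilon/k^2$ for every $x\in[0,\rho)\cup(1-\rho,1]$. Let $\delta = \rho/k$. Given partitions $\mathcal P$ and $\mathcal Q$ as in the statement, denote by $\mathcal R$ the partition whose elements are the intersections $P_i\cap Q_j$ with $i\neq j$ and also the set $\bigcup_{i=1}^kP_i\cap Q_i$. Note that $\mu(P_i\cap Q_j)\le\mu(P_i\,\Delta\,Q_i)<\delta$ for every $i\neq j$ and
$$\mu\Big(\bigcup_{i=1}^kP_i\cap Q_i\Big)\ge\sum_{i=1}^k\big(\mu(P_i)-\mu(P_i\,\Delta\,Q_i)\big)>\sum_{i=1}^k\big(\mu(P_i)-\delta\big) = 1-\rho.$$
Therefore, since $\#\mathcal R\le k(k-1)+1\le k^2$,
$$H_\mu(\mathcal R) = \sum_{R\in\mathcal R}\varphi(\mu(R))<\#\mathcal R\,\frac{\varepsilon}{k^2}\le\varepsilon.$$
It is clear from the definition that $\mathcal P\vee\mathcal Q = \mathcal P\vee\mathcal R$. Then, using (9.1.11) and (9.1.10),
$$H_\mu(\mathcal Q/\mathcal P) = H_\mu(\mathcal P\vee\mathcal Q)-H_\mu(\mathcal P) = H_\mu(\mathcal P\vee\mathcal R)-H_\mu(\mathcal P) = H_\mu(\mathcal R/\mathcal P)\le H_\mu(\mathcal R)<\varepsilon.$$
This proves the lemma.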
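A small numerical illustration of Lemma 9.1.6, with the Lebesgue measure on $[0,1]$ and partitions into three intervals (the cut points 0.3 and 0.6 are arbitrary choices of ours). Moving the cut points of $\mathcal P$ inwards by $t$ produces a partition $\mathcal Q$ with $\mu(P_i\,\Delta\,Q_i)\le2t$, and the conditional entropy $H_\mu(\mathcal Q/\mathcal P)$, evaluated from (9.1.9), goes to zero with $t$.

```python
import math


def interval_partition(cuts):
    """Partition of [0, 1] into consecutive intervals with the given interior cut points."""
    pts = [0.0] + sorted(cuts) + [1.0]
    return list(zip(pts, pts[1:]))


def overlap(I, J):
    """Lebesgue measure of the intersection of two intervals."""
    return max(0.0, min(I[1], J[1]) - max(I[0], J[0]))


def cond_entropy(Q, P):
    """H(Q/P) from (9.1.9) for interval partitions Q, P of [0, 1]."""
    total = 0.0
    for I in P:
        mP = I[1] - I[0]
        for J in Q:
            v = overlap(I, J)
            if v > 0:
                total -= v * math.log(v / mP)
    return total


P = interval_partition([0.3, 0.6])
for t in (0.1, 0.01, 0.001, 0.0001):
    Q = interval_partition([0.3 + t, 0.6 - t])
    print(t, cond_entropy(Q, P))
```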

9.1.3 Entropy of a dynamical system


Let f : M → M be a measurable transformation preserving a probability
measure μ. The notion of the entropy of the system (f , μ) that we introduce in
what follows is inspired by (9.1.5).
Given a partition $\mathcal P$ of $M$ with finite entropy, denote
$$\mathcal P^n = \bigvee_{i=0}^{n-1}f^{-i}(\mathcal P)\quad\text{for each }n\ge1.$$
Observe that the element $\mathcal P^n(x)$ that contains $x\in M$ is given by:
$$\mathcal P^n(x) = \mathcal P(x)\cap f^{-1}(\mathcal P(f(x)))\cap\cdots\cap f^{-n+1}(\mathcal P(f^{n-1}(x))).$$
It is clear that the sequence $\mathcal P^n$ is non-decreasing, that is, $\mathcal P^n\prec\mathcal P^{n+1}$ for every $n$. Therefore, by Lemma 9.1.5(ii) with $\mathcal R = \mathcal M$, the sequence of entropies $H_\mu(\mathcal P^n)$ is also non-decreasing. Another important fact is that this sequence is subadditive:

Lemma 9.1.7. $H_\mu(\mathcal P^{m+n})\le H_\mu(\mathcal P^m)+H_\mu(\mathcal P^n)$ for every $m,n\ge1$.

Proof. By definition, $\mathcal P^{m+n} = \bigvee_{i=0}^{m+n-1}f^{-i}(\mathcal P) = \mathcal P^m\vee f^{-m}(\mathcal P^n)$. Therefore, using (9.1.11),
$$H_\mu(\mathcal P^{m+n})\le H_\mu(\mathcal P^m)+H_\mu(f^{-m}(\mathcal P^n)). \tag{9.1.14}$$
On the other hand, since the measure $\mu$ is invariant under $f$, the property (9.1.13) implies that $H_\mu(f^{-m}(\mathcal P^n)) = H_\mu(\mathcal P^n)$ for every $m,n$. Substituting this fact in (9.1.14) we get the conclusion of the lemma.

In view of Lemma 3.3.4, it follows from Lemma 9.1.7 that the limit
$$h_\mu(f,\mathcal P) = \lim_n\frac1nH_\mu(\mathcal P^n) \tag{9.1.15}$$
exists and coincides with the infimum of the sequence $\frac1nH_\mu(\mathcal P^n)$ on the right-hand side. We call $h_\mu(f,\mathcal P)$ the entropy of $f$ with respect to the partition $\mathcal P$. Observe that this entropy is a non-decreasing function of the partition:
$$\mathcal P\prec\mathcal Q\;\Rightarrow\;h_\mu(f,\mathcal P)\le h_\mu(f,\mathcal Q). \tag{9.1.16}$$
Indeed, if $\mathcal P\prec\mathcal Q$ then $\mathcal P^n\prec\mathcal Q^n$ for every $n$. Using Lemma 9.1.5, it follows that $H_\mu(\mathcal P^n)\le H_\mu(\mathcal Q^n)$ for every $n$, and this implies (9.1.16).
Finally, the entropy of the system $(f,\mu)$ is defined by
$$h_\mu(f) = \sup_{\mathcal P}h_\mu(f,\mathcal P), \tag{9.1.17}$$
where the supremum is taken over all the partitions with finite entropy. A useful
observation is that the definition is not affected if we take the supremum only
over the finite partitions (see Exercise 9.1.2).

Example 9.1.8. Suppose that the invariant measure $\mu$ is supported on a periodic orbit. In other words, there exist $x\in M$ and $k\ge1$ such that $f^k(x) = x$ and the measure $\mu$ is given by
$$\mu = \frac1k\big(\delta_x+\delta_{f(x)}+\cdots+\delta_{f^{k-1}(x)}\big).$$
Note that this measure takes only a finite number of values (because the Dirac measure takes only the values 0 and 1). Hence, the entropy function $\mathcal P\mapsto H_\mu(\mathcal P)$ also takes only finitely many values. In particular, $\lim_nn^{-1}H_\mu(\mathcal P^n) = 0$ for every partition $\mathcal P$. This proves that $h_\mu(f) = 0$.
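For a concrete check of Example 9.1.8, take the hypothetical model $M = \{0,\dots,k-1\}$, $f(x) = x+1 \bmod k$, and $\mu$ uniform on this single periodic orbit. Since $\mathcal P^n(x)$ is determined by the itinerary $(\mathcal P(x),\mathcal P(f(x)),\dots,\mathcal P(f^{n-1}(x)))$, the Python sketch below simply counts itineraries. Their number never exceeds $k$, so $H_\mu(\mathcal P^n)\le\log k$ by Lemma 9.1.3 and $H_\mu(\mathcal P^n)/n\to0$.

```python
import math
from collections import Counter


def periodic_partition_entropy(label, n):
    """H_mu(P^n) for f(x) = x + 1 mod k with mu uniform on {0, ..., k-1}.

    label[x] names the element of P containing x; points with the same
    itinerary of length n lie in the same element of P^n.
    """
    k = len(label)
    itineraries = Counter(tuple(label[(x + i) % k] for i in range(n)) for x in range(k))
    return -sum((c / k) * math.log(c / k) for c in itineraries.values())


label = [0, 0, 1, 0, 1, 1]  # a partition of a periodic orbit of length k = 6
for n in (1, 2, 3, 6, 60):
    h = periodic_partition_entropy(label, n)
    print(n, h, h / n)
```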

Example 9.1.9. Consider the decimal expansion map $f : [0,1]\to[0,1]$, given by $f(x) = 10x-[10x]$. As observed previously, $f$ preserves the Lebesgue measure $\mu$ on the interval. Let $\mathcal P$ be the partition of $[0,1]$ into the intervals of the form $((i-1)/10, i/10]$ with $i = 1,\dots,10$. Then, $\mathcal P^n$ is the partition into the intervals of the form $((i-1)/10^n, i/10^n]$ with $i = 1,\dots,10^n$. Using the calculation in Example 9.1.1, we get that
$$h_\mu(f,\mathcal P) = \lim_n\frac1nH_\mu(\mathcal P^n) = \log10.$$
Using the theory in Section 9.2 (the theorem of Kolmogorov–Sinai and its corollaries), one can easily check that this is also the value of the entropy $h_\mu(f)$, that is, $\mathcal P$ realizes the supremum in the definition (9.1.17).

Example 9.1.10. Consider the shift map $\sigma : \Sigma\to\Sigma$ in $\Sigma = \{1,\dots,d\}^{\mathbb N}$ (or $\Sigma = \{1,\dots,d\}^{\mathbb Z}$), with a Bernoulli measure $\mu = \nu^{\mathbb N}$ (respectively, $\mu = \nu^{\mathbb Z}$). Let $\mathcal P$ be the partition of $\Sigma$ into the cylinders $[0;a]$ with $a = 1,\dots,d$. Then, $\mathcal P^n$ is the partition into cylinders $[0;a_1,\dots,a_n]$ of length $n$. Using the calculation in Example 9.1.2, with $p_i = \nu(\{i\})$, we conclude that
$$h_\mu(\sigma,\mathcal P) = \lim_n\frac1nH_\mu(\mathcal P^n) = \sum_{i=1}^d-p_i\log p_i. \tag{9.1.18}$$
The theory presented in Section 9.2 permits us to prove that this is also the value of the entropy $h_\mu(\sigma)$.

It follows from expression (9.1.18) that for every x > 0 there exists some
Bernoulli shift (σ , μ) such that hμ (σ ) = x. We use this observation a few times
in what follows.
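The observation above can be made constructive. The following is a minimal sketch built on our own choice of a one-parameter family of weights, which is only one of many possible choices. For $d\ge2$ the entropy of $p(t) = (1-t, t/(d-1),\dots,t/(d-1))$ increases continuously from $0$ at $t = 0$ to $\log d$ at $t = (d-1)/d$. By (9.1.18), bisection in $t$ then yields a Bernoulli shift with any prescribed entropy $x\in(0,\log d)$.

```python
import math


def bernoulli_entropy(p):
    """Right-hand side of (9.1.18): sum of -p_i log p_i (natural log, 0 log 0 = 0)."""
    return -sum(q * math.log(q) for q in p if q > 0)


def weights_with_entropy(x, tol=1e-12):
    """Probability weights (p_1, ..., p_d) whose Bernoulli shift has entropy x > 0.

    Searches the family p(t) = (1 - t, t/(d-1), ..., t/(d-1)), 0 <= t <= (d-1)/d,
    whose entropy increases continuously from 0 to log d, by bisection on t.
    """
    if x <= 0:
        raise ValueError("x must be positive")
    d = math.floor(math.exp(x)) + 2  # ensures log d > x

    def family(t):
        return [1.0 - t] + [t / (d - 1)] * (d - 1)

    lo, hi = 0.0, (d - 1) / d
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if bernoulli_entropy(family(mid)) < x:
            lo = mid
        else:
            hi = mid
    return family((lo + hi) / 2)


p = weights_with_entropy(1.0)
print(len(p), sum(p), bernoulli_entropy(p))
```

Since $d$ grows like $e^x$, this sketch is only practical for moderate values of $x$.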

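Lemma 9.1.11. $h_\mu(f,\mathcal Q)\le h_\mu(f,\mathcal P)+H_\mu(\mathcal Q/\mathcal P)$ for any partitions $\mathcal P$ and $\mathcal Q$ with finite entropy.

Proof. By Lemma 9.1.5, for every $n\ge1$,
$$\begin{aligned}H_\mu(\mathcal Q^{n+1}/\mathcal P^{n+1}) &= H_\mu\big(\mathcal Q^n\vee f^{-n}(\mathcal Q)/\mathcal P^n\vee f^{-n}(\mathcal P)\big)\\ &\le H_\mu(\mathcal Q^n/\mathcal P^n)+H_\mu\big(f^{-n}(\mathcal Q)/f^{-n}(\mathcal P)\big).\end{aligned}$$
The last term is equal to $H_\mu(\mathcal Q/\mathcal P)$, because the measure $\mu$ is invariant under $f$. Therefore, by induction on $n$, the previous relation proves that
$$H_\mu(\mathcal Q^n/\mathcal P^n)\le nH_\mu(\mathcal Q/\mathcal P)\quad\text{for every }n\ge1. \tag{9.1.19}$$
Using Lemma 9.1.5 once more, it follows that
$$H_\mu(\mathcal Q^n)\le H_\mu(\mathcal P^n\vee\mathcal Q^n) = H_\mu(\mathcal P^n)+H_\mu(\mathcal Q^n/\mathcal P^n)\le H_\mu(\mathcal P^n)+nH_\mu(\mathcal Q/\mathcal P).$$
Dividing by $n$ and taking the limit when $n\to\infty$, we get the conclusion of the lemma.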

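Lemma 9.1.12. $h_\mu(f,\mathcal P) = \lim_nH_\mu\big(\mathcal P/\bigvee_{j=1}^nf^{-j}(\mathcal P)\big)$ for any partition $\mathcal P$ with finite entropy.

Proof. Using Lemma 9.1.5(i) and the fact that the measure $\mu$ is invariant under $f$, we get that
$$\begin{aligned}H_\mu\Big(\bigvee_{j=0}^{n-1}f^{-j}(\mathcal P)\Big) &= H_\mu\Big(\bigvee_{j=1}^{n-1}f^{-j}(\mathcal P)\Big)+H_\mu\Big(\mathcal P/\bigvee_{j=1}^{n-1}f^{-j}(\mathcal P)\Big)\\ &= H_\mu\Big(\bigvee_{j=0}^{n-2}f^{-j}(\mathcal P)\Big)+H_\mu\Big(\mathcal P/\bigvee_{j=1}^{n-1}f^{-j}(\mathcal P)\Big)\end{aligned}$$
for every $n$. By induction, it follows that
$$H_\mu\Big(\bigvee_{j=0}^{n-1}f^{-j}(\mathcal P)\Big) = H_\mu(\mathcal P)+\sum_{k=1}^{n-1}H_\mu\Big(\mathcal P/\bigvee_{j=1}^kf^{-j}(\mathcal P)\Big).$$
Therefore, $h_\mu(f,\mathcal P)$ is given by the Cesàro limit
$$h_\mu(f,\mathcal P) = \lim_n\frac1nH_\mu\Big(\bigvee_{j=0}^{n-1}f^{-j}(\mathcal P)\Big) = \lim_n\frac1n\sum_{k=1}^{n-1}H_\mu\Big(\mathcal P/\bigvee_{j=1}^kf^{-j}(\mathcal P)\Big).$$
On the other hand, Lemma 9.1.5(ii) ensures that $H_\mu\big(\mathcal P/\bigvee_{j=1}^nf^{-j}(\mathcal P)\big)$ is a non-increasing sequence. In particular, $\lim_nH_\mu\big(\mathcal P/\bigvee_{j=1}^nf^{-j}(\mathcal P)\big)$ exists and, consequently, coincides with the Cesàro limit in the previous identity.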
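Lemma 9.1.12 can be tested on a measure that is not Bernoulli. The sketch below assumes a hypothetical stationary Markov measure on $\{0,1\}^{\mathbb N}$ (transition matrix `T`, stationary vector `pi`), takes $f$ to be the shift and $\mathcal P = \{[0;0],[0;1]\}$, and evaluates $H_\mu\big(\mathcal P/\bigvee_{j=1}^nf^{-j}(\mathcal P)\big)$ directly from (9.1.9). For such a measure these conditional entropies all equal $\sum_{i,j}-\pi_iT_{ij}\log T_{ij}$, while $\frac1nH_\mu(\mathcal P^n)$ decreases to the same value, as the lemma predicts.

```python
import math
from itertools import product

T = [[0.9, 0.1], [0.5, 0.5]]  # hypothetical transition matrix
pi = [5 / 6, 1 / 6]           # its stationary probability vector


def cylinder_measure(word):
    """mu([0; a_0, ..., a_{n-1}]) for the stationary Markov measure."""
    m = pi[word[0]]
    for a, b in zip(word, word[1:]):
        m *= T[a][b]
    return m


def H_Pn(n):
    """H_mu(P^n), where P^n is the partition into cylinders of length n."""
    return -sum(m * math.log(m) for m in map(cylinder_measure, product((0, 1), repeat=n)) if m > 0)


def H_P_given_future(n):
    """H_mu(P / f^{-1}(P) v ... v f^{-n}(P)), evaluated directly from (9.1.9)."""
    total = 0.0
    for w in product((0, 1), repeat=n + 1):
        joint = cylinder_measure(w)                                   # mu([0; a_0, a_1, ..., a_n])
        future = sum(cylinder_measure((a,) + w[1:]) for a in (0, 1))  # mu of [a_1, ..., a_n] at positions 1..n
        if joint > 0:
            total -= joint * math.log(joint / future)
    return total


rate = -sum(pi[i] * T[i][j] * math.log(T[i][j]) for i in range(2) for j in range(2))
for n in (1, 2, 4, 8):
    print(n, H_P_given_future(n), H_Pn(n) / n, rate)
```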
Recall that $\mathcal P^n = \bigvee_{j=0}^{n-1}f^{-j}(\mathcal P)$. When $f : M\to M$ is invertible, we also consider $\mathcal P^{\pm n} = \bigvee_{j=-n}^{n-1}f^{-j}(\mathcal P)$.
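Lemma 9.1.13. Let $\mathcal P$ be a partition with finite entropy. For every $k\ge1$, we have $h_\mu(f,\mathcal P) = h_\mu(f,\mathcal P^k)$ and, if $f$ is invertible, $h_\mu(f,\mathcal P) = h_\mu(f,\mathcal P^{\pm k})$.

Proof. Observe that, given any $n\ge1$,
$$\bigvee_{j=0}^{n-1}f^{-j}(\mathcal P^k) = \bigvee_{j=0}^{n-1}f^{-j}\Big(\bigvee_{i=0}^{k-1}f^{-i}(\mathcal P)\Big) = \bigvee_{l=0}^{n+k-2}f^{-l}(\mathcal P) = \mathcal P^{n+k-1}.$$
Therefore,
$$h_\mu(f,\mathcal P^k) = \lim_n\frac1nH_\mu(\mathcal P^{n+k-1}) = \lim_n\frac1nH_\mu(\mathcal P^n) = h_\mu(f,\mathcal P).$$
This proves the first part of the lemma. To prove the second part, note that
$$\bigvee_{j=0}^{n-1}f^{-j}(\mathcal P^{\pm k}) = \bigvee_{j=0}^{n-1}f^{-j}\Big(\bigvee_{i=-k}^{k-1}f^{-i}(\mathcal P)\Big) = \bigvee_{l=-k}^{n+k-2}f^{-l}(\mathcal P) = f^{k}\big(\mathcal P^{n+2k-1}\big)$$
for every $n$ and every $k$. Therefore,
$$h_\mu(f,\mathcal P^{\pm k}) = \lim_n\frac1nH_\mu\big(f^{k}(\mathcal P^{n+2k-1})\big) = \lim_n\frac1nH_\mu(\mathcal P^{n+2k-1}) = h_\mu(f,\mathcal P)$$
(the second equality uses the fact that $\mu$ is invariant under $f$).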



Proposition 9.1.14. One has $h_\mu(f^k) = kh_\mu(f)$ for every $k\in\mathbb N$. If $f$ is invertible then $h_\mu(f^k) = |k|h_\mu(f)$ for every $k\in\mathbb Z$.
Proof. It is clear that the identity holds for $k = 0$, since $f^0 = \mathrm{id}$ and $h_\mu(\mathrm{id}) = 0$. Take $k$ to be non-zero from now on. Let $g = f^k$ and $\mathcal P$ be any partition of $M$ with finite entropy. Recalling that $\mathcal P^k = \mathcal P\vee f^{-1}(\mathcal P)\vee\cdots\vee f^{-k+1}(\mathcal P)$, we see that
$$\mathcal P^{km} = \bigvee_{j=0}^{km-1}f^{-j}(\mathcal P) = \bigvee_{i=0}^{m-1}f^{-ki}\Big(\bigvee_{j=0}^{k-1}f^{-j}(\mathcal P)\Big) = \bigvee_{i=0}^{m-1}g^{-i}(\mathcal P^k).$$
Therefore,
$$k\,h_\mu(f,\mathcal P) = \lim_m\frac1mH_\mu(\mathcal P^{km}) = h_\mu(g,\mathcal P^k). \tag{9.1.20}$$
Since $\mathcal P\prec\mathcal P^k$, this implies that $h_\mu(g,\mathcal P)\le k\,h_\mu(f,\mathcal P)\le h_\mu(g)$ for any $\mathcal P$. Taking the supremum over these partitions $\mathcal P$, it follows that $h_\mu(g)\le k\,h_\mu(f)\le h_\mu(g)$. This proves that $k\,h_\mu(f) = h_\mu(g)$, as stated.
Now suppose that $f$ is invertible. Let $\mathcal P$ be any partition of $M$ with finite entropy. For any $n\ge1$,
$$H_\mu\Big(\bigvee_{j=0}^{n-1}f^{-j}(\mathcal P)\Big) = H_\mu\Big(f^{-n+1}\Big(\bigvee_{i=0}^{n-1}f^i(\mathcal P)\Big)\Big) = H_\mu\Big(\bigvee_{i=0}^{n-1}f^i(\mathcal P)\Big),$$
because the measure $\mu$ is invariant under $f$. Dividing by $n$ and taking the limit when $n\to\infty$, we get that
$$h_\mu(f,\mathcal P) = h_\mu(f^{-1},\mathcal P). \tag{9.1.21}$$
Taking the supremum over these partitions $\mathcal P$, it follows that $h_\mu(f) = h_\mu(f^{-1})$. Replacing $f$ with $f^k$ and using the first half of the proposition, we get that $h_\mu(f^{-k}) = h_\mu(f^k) = k\,h_\mu(f)$ for every $k\in\mathbb N$.

9.1.4 Exercises
9.1.1. Prove that Hμ (P/R) ≤ Hμ (P/Q) + Hμ (Q/R) for any partitions P, Q and R.
9.1.2. Show that the supremum of hμ (f , P) over the finite partitions coincides with the
supremum over all the partitions with finite entropy.
9.1.3. Check that $\lim_nH_\mu\big(\bigvee_{i=0}^{k-1}f^{-i}(\mathcal P)/\bigvee_{j=k}^nf^{-j}(\mathcal P)\big) = k\,h_\mu(f,\mathcal P)$ for every partition $\mathcal P$ with finite entropy and every $k\ge1$.
9.1.4. Let f : M → M be a measurable transformation preserving a probability
measure μ.
(a) Assume that there exists an invariant set A ⊂ M with μ(A) ∈ (0, 1). Let
μA and μB be the normalized restrictions of μ to the sets A and B = Ac ,
respectively. Show that hμ (f ) = μ(A)hμA (f ) + μ(B)hμB (f ).

(b) Suppose that $\mu$ is a convex combination $\mu = \sum_{i=1}^na_i\mu_i$ of ergodic measures $\mu_1,\dots,\mu_n$. Show that $h_\mu(f) = \sum_{i=1}^na_ih_{\mu_i}(f)$.
[Observation: In Section 9.6 we prove much stronger results.]
9.1.5. Let (M, B, μ) and (N, C, ν) be probability spaces and f : M → M and g : N → N be
measurable transformations preserving the measures μ and ν, respectively. We

say that (g, ν) is a factor of (f , μ) if there exists a measurable map, not necessarily
invertible, φ : (M, B) → (N, C) such that φ∗ μ = ν and φ ◦ f = g ◦ φ at almost every
point. Show that in that case hν (g) ≤ hμ (f ).

9.2 Theorem of Kolmogorov–Sinai


In general, the main difficulty in calculating the entropy lies in the calculation
of the supremum in (9.1.17). The methods that we develop in this section
permit the simplification of that task in many cases, by identifying certain
partitions P that realize the supremum, that is, such that hμ (f , P) = hμ (f ).
The main result is:
Theorem 9.2.1 (Kolmogorov–Sinai). Let $\mathcal P_1\prec\cdots\prec\mathcal P_n\prec\cdots$ be a non-decreasing sequence of partitions with finite entropy such that $\bigcup_{n=1}^\infty\mathcal P_n$ generates the σ-algebra of measurable sets, up to measure zero. Then,
$$h_\mu(f) = \lim_nh_\mu(f,\mathcal P_n).$$

Proof. The limit always exists, for property (9.1.16) implies that the sequence
hμ (f , Pn ) is non-decreasing. The inequality ≥ in the statement is a direct
consequence of the definition of entropy. Therefore, we only need to show
that hμ (f , Q) ≤ limn hμ (f , Pn ) for every partition Q with finite entropy. We use
the following fact, which is interesting in itself:
Proposition 9.2.2. Let A be an algebra that generates the σ -algebra of
measurable sets, up to measure zero. For every partition Q with finite
entropy and every ε > 0 there exists some finite partition P ⊂ A such that
Hμ (Q/P) < ε.

Proof. The first step is to reduce the statement to the case when Q is finite. Denote by Qj, j = 1, 2, . . . the elements of Q. For each k ≥ 1, consider the finite partition
$$\mathcal{Q}^k = \Big\{Q_1, \dots, Q_k, M \setminus \bigcup_{j=1}^{k} Q_j\Big\}.$$

Lemma 9.2.3. If Q is a partition with finite entropy then limk Hμ (Q/Qk ) = 0.


Proof. Denote $Q_0 = M \setminus \bigcup_{j=1}^{k} Q_j$. By definition,
$$H_\mu(\mathcal{Q}/\mathcal{Q}^k) = \sum_{i=0}^{k} \sum_{j \ge 1} -\mu(Q_i \cap Q_j) \log \frac{\mu(Q_i \cap Q_j)}{\mu(Q_i)}.$$
All the terms with i ≥ 1 vanish, since in that case μ(Qi ∩ Qj) is equal to zero if i ≠ j and is equal to μ(Qi) if i = j (so that the logarithm vanishes). For i = 0 we have that μ(Q0 ∩ Qj) is equal to zero if j ≤ k and is equal to μ(Qj) if j > k. Therefore,
$$H_\mu(\mathcal{Q}/\mathcal{Q}^k) = \sum_{j>k} -\mu(Q_j) \log \frac{\mu(Q_j)}{\mu(Q_0)} \le \sum_{j>k} -\mu(Q_j) \log \mu(Q_j).$$
The hypothesis that Q has finite entropy means that the series $\sum_{j} -\mu(Q_j) \log \mu(Q_j)$ converges. Hence its tail, the expression on the right-hand side, converges to zero when k → ∞.
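Here is a small numerical sanity check of Lemma 9.2.3. For illustration only, it assumes that the partition Q has cells of measure μ(Qj) = 2^{−j}, j ≥ 1. In that case a direct computation gives Hμ(Q/Q^k) = 2^{1−k} log 2. The sketch evaluates the series from the proof, truncated after finitely many terms, and compares it with that closed form and with the tail bound.

```python
import math

def cond_entropy_given_Qk(k, terms=200):
    """H_mu(Q / Q^k) when mu(Q_j) = 2**-j for j >= 1.

    Only the cell Q_0 = M minus (Q_1 ∪ ... ∪ Q_k) contributes; the infinite
    sum over j > k is truncated after `terms` terms.
    """
    mu_Q0 = 2.0 ** -k  # equals sum_{j > k} 2**-j
    return sum(-(2.0 ** -j) * math.log(2.0 ** -j / mu_Q0)
               for j in range(k + 1, k + 1 + terms))

def entropy_tail(k, terms=200):
    """The upper bound sum_{j > k} -mu(Q_j) log mu(Q_j) used in the proof."""
    return sum(-(2.0 ** -j) * math.log(2.0 ** -j)
               for j in range(k + 1, k + 1 + terms))

for k in (1, 5, 10, 20):
    # closed form for this particular example: 2**(1 - k) * log 2
    print(k, cond_entropy_given_Qk(k), 2 ** (1 - k) * math.log(2), entropy_tail(k))
```

Both columns decay geometrically in k, as the lemma predicts.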

Given ε > 0, fix k ≥ 1 such that $H_\mu(\mathcal{Q}/\mathcal{Q}^k) < \varepsilon/2$. Consider any δ > 0. By the approximation theorem (Theorem A.1.19), for each i = 1, . . . , k there exists Ai ∈ A such that
$$\mu(Q_i \,\Delta\, A_i) < \delta/(2k^2). \tag{9.2.1}$$
Define $P_1 = A_1$, $P_i = A_i \setminus \bigcup_{j=1}^{i-1} A_j$ for i = 2, . . . , k, and $P_0 = M \setminus \bigcup_{j=1}^{k} A_j$. It is clear that P = {P1, . . . , Pk, P0} is a partition of M and also that Pi ∈ A for every i. For i = 1, . . . , k, we have $P_i \,\Delta\, A_i = A_i \setminus P_i = A_i \cap \bigcup_{j=1}^{i-1} A_j$. For any x in this set, there is j < i such that x ∈ Ai ∩ Aj. Since Qi ∩ Qj = ∅, it follows that x ∈ (Ai \ Qi) ∪ (Aj \ Qj). This proves that
$$P_i \,\Delta\, A_i \subset \bigcup_{j=1}^{i} (A_j \setminus Q_j) \subset \bigcup_{j=1}^{i} (A_j \,\Delta\, Q_j),$$
and so μ(Pi Δ Ai) < iδ/(2k²) ≤ δ/(2k). Together with (9.2.1), this implies that
$$\mu(P_i \,\Delta\, Q_i) < \delta/(2k^2) + \delta/(2k) \le \delta/k \quad \text{for } i = 1, \dots, k. \tag{9.2.2}$$
Moreover, $P_0 \,\Delta\, Q_0 \subset \bigcup_{i=1}^{k} (P_i \,\Delta\, Q_i)$, since P and Q^k are partitions of M. Hence, (9.2.2) implies that
$$\mu(P_0 \,\Delta\, Q_0) < \delta. \tag{9.2.3}$$
By Lemma 9.1.6, the relations (9.2.2) and (9.2.3) imply that $H_\mu(\mathcal{Q}^k/\mathcal{P}) < \varepsilon/2$, as long as we take δ > 0 sufficiently small. Then, by the inequality in Exercise 9.1.1,
$$H_\mu(\mathcal{Q}/\mathcal{P}) \le H_\mu(\mathcal{Q}/\mathcal{Q}^k) + H_\mu(\mathcal{Q}^k/\mathcal{P}) < \varepsilon,$$
as stated.
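The disjointification step in the proof of Proposition 9.2.2 can be tried out on a finite toy space. The sets below (universe, Q, A) are hypothetical data chosen for illustration. The sketch builds P from A1, . . . , Ak exactly as in the proof and checks the inclusion Pi Δ Ai ⊂ ⋃_{j≤i} (Aj \ Qj).

```python
def disjointify(A, universe):
    """Build the partition from sets A_1, ..., A_k as in the proof:
    P_1 = A_1, P_i = A_i minus (A_1 ∪ ... ∪ A_{i-1}),
    P_0 = universe minus (A_1 ∪ ... ∪ A_k).
    Returns the list [P_0, P_1, ..., P_k]."""
    parts, covered = [], set()
    for Ai in A:
        parts.append(set(Ai) - covered)
        covered |= set(Ai)
    return [set(universe) - covered] + parts

# Hypothetical data: 12 points, a partition Q_1, Q_2, Q_3 and
# slightly perturbed approximations A_i of the Q_i.
universe = set(range(12))
Q = [{0, 1, 2, 3}, {4, 5, 6, 7}, {8, 9, 10, 11}]
A = [{0, 1, 2, 3, 4}, {4, 5, 6}, {7, 8, 9, 10, 11}]
P = disjointify(A, universe)
print(P)
```

Processing the Ai in order and removing what is already covered is what makes the cells pairwise disjoint.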

Corollary 9.2.4. If (Pn)n is a sequence of partitions as in Theorem 9.2.1 then $\lim_n H_\mu(\mathcal{Q}/\mathcal{P}_n) = 0$ for every partition Q with finite entropy.

Proof. For each n, let An be the algebra generated by $\bigcup_{j=1}^{n} \mathcal{P}_j$. Then (An)n is a non-decreasing sequence and the union $\mathcal{A} = \bigcup_n \mathcal{A}_n$ is the algebra generated by $\bigcup_n \mathcal{P}_n$. Consider any ε > 0. By Proposition 9.2.2, there exists a finite partition P ⊂ A such that Hμ(Q/P) < ε. Since P is finite, there exists m ≥ 1 such that P ⊂ Am and, thus, P is coarser than Pm. Then, using Lemma 9.1.5,
$$H_\mu(\mathcal{Q}/\mathcal{P}_n) \le H_\mu(\mathcal{Q}/\mathcal{P}_m) \le H_\mu(\mathcal{Q}/\mathcal{P}) < \varepsilon \quad \text{for every } n \ge m.$$
This completes the proof of the corollary.

We are ready to conclude the proof of Theorem 9.2.1. By Lemma 9.1.11,
$$h_\mu(f, \mathcal{Q}) \le h_\mu(f, \mathcal{P}_n) + H_\mu(\mathcal{Q}/\mathcal{P}_n) \quad \text{for every } n.$$
Taking the limit as n → ∞ and using Corollary 9.2.4, we get that hμ(f, Q) ≤ limn hμ(f, Pn) for every partition Q with finite entropy.
9.2.1 Generating partitions


In this section and the ones that follow, we deduce several useful consequences
of Theorem 9.2.1.

Corollary 9.2.5. Let P be a partition with finite entropy such that the union of the iterates $\mathcal{P}^n = \bigvee_{j=0}^{n-1} f^{-j}(\mathcal{P})$, n ≥ 1, generates the σ-algebra of measurable sets, up to measure zero. Then, hμ(f) = hμ(f, P).

Proof. It suffices to apply Theorem 9.2.1 to the sequence (P^n)n and to recall that hμ(f, P^n) = hμ(f, P) for every n, by Lemma 9.1.13.

Corollary 9.2.6. Assume that the system (f, μ) is invertible. Let P be a partition with finite entropy such that the union of the iterates $\mathcal{P}^{\pm n} = \bigvee_{j=-n}^{n-1} f^{-j}(\mathcal{P})$, n ≥ 1, generates the σ-algebra of measurable sets, up to measure zero. Then, hμ(f) = hμ(f, P).

Proof. It suffices to apply Theorem 9.2.1 to the sequence (P^{±n})n and to recall that hμ(f, P^{±n}) = hμ(f, P) for every n, by Lemma 9.1.13.

In particular, Corollaries 9.2.5 and 9.2.6 complete the calculation of the entropy of the decimal expansion and the Bernoulli shifts that we started in Examples 9.1.9 and 9.1.10, respectively.
In both situations, Corollary 9.2.5 and Corollary 9.2.6, we say that P is a generating partition or, simply, a generator of the system. Note, however, that this is a slight abuse of language, since the conditions in the two corollaries are not equivalent. For example, for the shift map in M = {1, . . . , d}^Z, the partition P into cylinders {[0; a] : a = 1, . . . , d} is such that the union of the two-sided iterates P^{±n} generates the σ-algebra of measurable sets but the union of the one-sided iterates P^n does not. Whenever it is necessary to distinguish between these two concepts, we talk of a one-sided generator and a two-sided generator, respectively.
In this regard, let us point out that certain invertible systems admit one-sided
generators. For example, if f : S1 → S1 is an irrational rotation and P = {I, S1 \
I} is a partition of the circle into two complementary intervals, then P is a
one-sided generator (and also a two-sided generator, of course). However, this
kind of situation is possible only for systems with entropy zero:

Corollary 9.2.7. Assume that the system (f, μ) is invertible and there exists a partition P with finite entropy such that $\bigcup_{n=1}^{\infty} \mathcal{P}^n$ generates the σ-algebra of measurable sets, up to measure zero. Then, hμ(f) = 0.

Proof. Combining Lemma 9.1.12 and Corollary 9.2.5, we get that
$$h_\mu(f) = h_\mu(f, \mathcal{P}) = \lim_n H_\mu\big(\mathcal{P}/f^{-1}(\mathcal{P}^n)\big).$$
Since $\bigcup_n \mathcal{P}^n$ generates the σ-algebra B of measurable sets, $\bigcup_n f^{-1}(\mathcal{P}^n)$ generates the σ-algebra f^{−1}(B). Now notice that f^{−1}(B) = B, since f is invertible. Hence, Corollary 9.2.4 implies that Hμ(P/f^{−1}(P^n)) converges to zero when n → ∞. It follows that hμ(f) = hμ(f, P) = 0.

Now take M to be a metric space and μ to be a Borel probability measure.

Corollary 9.2.8. Let P1 ≺ · · · ≺ Pn ≺ · · · be a non-decreasing sequence of partitions with finite entropy such that diam Pn(x) → 0 for μ-almost every x ∈ M. Then,
$$h_\mu(f) = \lim_n h_\mu(f, \mathcal{P}_n).$$

Proof. Let U be any non-empty open subset of M. The hypothesis ensures that for μ-almost every x ∈ U there exists n(x) such that the set Px = Pn(x)(x) is contained in U. It is clear that Px belongs to the algebra A generated by $\bigcup_n \mathcal{P}_n$. Observe also that A is countable, since it consists of the finite unions of elements of the partitions Pn together with the complements of such unions. In particular, the map x ↦ Px takes only countably many values. It follows that the union of the sets Px over all such points x ∈ U is in the σ-algebra generated by A. Since this union is contained in U and has full measure in U, it coincides with U up to measure zero. This proves that the σ-algebra generated by A contains all the open sets and, thus, all the Borel sets, up to measure zero. Now, the conclusion follows directly from Theorem 9.2.1.

Example 9.2.9. Let f : S1 → S1 be a homeomorphism and μ be any invariant probability measure. Given a finite partition P of S1 into subintervals, denote by x1, . . . , xm their endpoints. For any j ≥ 1, the partition f^{−j}(P) consists of the subintervals of S1 determined by the points f^{−j}(xi). This implies that, for each n ≥ 1, the elements of P^n have their endpoints in the set
$$\{f^{-j}(x_i) : j = 0, \dots, n-1 \text{ and } i = 1, \dots, m\}.$$
In particular, #P^n ≤ mn. Then, using Lemma 9.1.3,
$$h_\mu(f, \mathcal{P}) = \lim_n \frac{1}{n} H_\mu(\mathcal{P}^n) \le \limsup_n \frac{1}{n} \log \#\mathcal{P}^n \le \lim_n \frac{1}{n} \log (mn) = 0.$$
It follows that hμ(f) = 0: to see that, it suffices to consider any non-decreasing sequence of finite partitions into intervals with diameter going to zero and to apply Corollary 9.2.8.
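For a concrete instance of Example 9.2.9, this sketch assumes the rotation by the golden-ratio angle and the partition of the circle into the half-circles [0, 1/2) and [1/2, 1). It counts the cells of P^n by counting the distinct cut points f^{−j}(xi). This illustrates #P^n ≤ mn (here m = 2), so (1/n) log #P^n → 0.

```python
import math

def count_cells(alpha, endpoints, n):
    """#P^n for the rotation f(x) = x + alpha (mod 1) and the partition P of the
    circle cut at `endpoints`: the cells of P^n are the arcs between the points
    f^{-j}(x_i) = x_i - j*alpha (mod 1), 0 <= j < n."""
    cuts = set()
    for j in range(n):
        for x in endpoints:
            # rounding merges points that coincide up to floating-point error
            cuts.add(round((x - j * alpha) % 1.0, 12))
    return len(cuts)  # c >= 1 distinct points cut the circle into c arcs

alpha = (math.sqrt(5) - 1) / 2  # an irrational rotation number
for n in (10, 100, 1000):
    c = count_cells(alpha, [0.0, 0.5], n)
    print(n, c, math.log(c) / n)
```

For an irrational rotation all the cut points are distinct, so the bound #P^n ≤ 2n is attained. The growth is still only linear, which is why the entropy vanishes.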

Corollary 9.2.10. Let P be a partition with finite entropy such that diam P^n(x) → 0 for μ-almost every x ∈ M. Then, hμ(f) = hμ(f, P).

Proof. It suffices to apply Corollary 9.2.8 to the sequence (P^n)n, recalling that hμ(f, P^n) = hμ(f, P) for every n.

Analogously, if f is invertible and P is a partition with finite entropy such that diam P^{±n}(x) → 0 for μ-almost every x ∈ M, then hμ(f) = hμ(f, P).
It is known that generators do exist in most interesting cases, although it may be difficult to exhibit a generator explicitly. Indeed, suppose that the
ambient M is a Lebesgue space. Rokhlin [Rok67a, §10] proved that if a
system is aperiodic (that is, the set of periodic points has measure zero) and
almost every point has a countable (finite or infinite) set of pre-images, then
there exists some generator. In particular, every invertible aperiodic system
admits some countable generator. In general, this generator may have infinite
entropy. But Rokhlin also showed that every invertible aperiodic system with
finite entropy admits some two-sided generator with finite entropy. Moreover
(Krieger [Kri70]), this generator may be chosen to be finite if the system is
ergodic.

9.2.2 Semi-continuity of the entropy


Next, we examine some properties of the entropy function that associates
with each invariant measure μ of a given transformation f the value of
the corresponding entropy hμ (f ). We are going to see that this function is
usually not continuous. However, under quite broad assumptions, it is upper
semi-continuous: given any ε > 0, one has hν (f ) ≤ hμ (f ) + ε for every ν
sufficiently close to μ. That holds, in particular, for the class of transformations
that we call expansive. These facts have important consequences, some of
which are explored in Sections 9.2.3 and 9.6. Moreover, we return to this
subject in Section 10.5.
Let us start by showing, through an example, that the entropy function may
be discontinuous:

Example 9.2.11. Let f : [0, 1] → [0, 1] be the decimal expansion map. As we saw in Example 9.1.9, the entropy of f with respect to the Lebesgue measure m is hm(f) = log 10. For each k ≥ 1, denote by Fk the set of fixed points of the iterate f^k, namely the points j/(10^k − 1) with 0 ≤ j ≤ 10^k − 2. Observe that Fk is an invariant set with #Fk = 10^k − 1. Observe also that these sets are equidistributed, in the following sense: each interval [(i − 1)/10^k, i/10^k] with 1 ≤ i ≤ 10^k − 1 contains exactly one point of Fk, and the last interval [1 − 10^{−k}, 1] contains none. Consider the sequence of measures
$$\mu_k = \frac{1}{\#F_k} \sum_{x \in F_k} \delta_x.$$
The previous observations imply (check!) that each μk is an invariant probability measure and the sequence (μk)k converges to the Lebesgue measure m in the weak∗ topology. Since μk is supported on a finite set, the same argument as in Example 9.1.8 proves that hμk(f) = 0 for every k. In particular, we have that limk hμk(f) = 0 < hm(f).
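The claims in Example 9.2.11 that μk is invariant and converges to m can be checked with exact rational arithmetic. In this sketch the test function φ(x) = x² is our own choice and stands in for the weak∗ convergence: ∫φ dμk should approach ∫φ dm = 1/3. The set Fk is computed from the characterization of the fixed points of f^k recorded in the code comments.

```python
import math
from fractions import Fraction

def decimal_map(x):
    """f(x) = 10x - [10x], evaluated exactly on rationals."""
    y = 10 * x
    return y - math.floor(y)

def fixed_points(k):
    """Fixed points of f^k. For x in [0, 1), f^k(x) = 10^k x - [10^k x], so
    f^k(x) = x iff (10^k - 1) x is an integer; x = 1 is not fixed since f(1) = 0."""
    N = 10 ** k - 1
    return [Fraction(j, N) for j in range(N)]

def integrate(phi, points):
    """Integral of phi with respect to the uniform probability measure on `points`."""
    return sum(phi(x) for x in points) / len(points)

for k in (1, 2, 3):
    F = fixed_points(k)
    invariant = set(map(decimal_map, F)) == set(F)  # f permutes F_k
    print(k, len(F), invariant, float(integrate(lambda x: x * x, F)))
```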
On the other hand, in general, consider any finite partition P of M whose boundary
$$\partial\mathcal{P} = \bigcup_{P \in \mathcal{P}} \partial P$$
satisfies μ(∂P) = 0. By Theorem 2.1.2 or, more precisely, by the fact that the topology (2.1.5) is equivalent to the weak∗ topology, the function ν ↦ ν(P) is continuous at the point μ, for every P ∈ P. Consequently, the function
$$\nu \mapsto H_\nu(\mathcal{P}) = \sum_{P \in \mathcal{P}} -\nu(P) \log \nu(P)$$
is also continuous at μ. The hypothesis on P also implies that μ(∂P^n) = 0 for every n ≥ 1, since μ is invariant and
$$\partial\mathcal{P}^n \subset \partial\mathcal{P} \cup f^{-1}(\partial\mathcal{P}) \cup \dots \cup f^{-n+1}(\partial\mathcal{P}).$$
Thus, the same argument shows that ν ↦ Hν(P^n) is continuous at μ for every n.

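Proposition 9.2.12. Let P be a finite partition such that μ(∂P) = 0. Then the function ν ↦ hν(f, P) is upper semi-continuous at μ.

Proof. Recall that, by definition and by the subadditivity of the sequence Hν(P^n),
$$h_\nu(f, \mathcal{P}) = \lim_n \frac{1}{n} H_\nu(\mathcal{P}^n) = \inf_n \frac{1}{n} H_\nu(\mathcal{P}^n).$$
By the previous observations, each function ν ↦ (1/n)Hν(P^n) is continuous at μ. It is a well-known easy fact that the infimum of any family of functions that are continuous at μ is upper semi-continuous at μ.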

Corollary 9.2.13. Assume that there exists a finite partition P such that μ(∂P) = 0 and $\bigcup_n \mathcal{P}^n$ generates the σ-algebra of measurable sets of M, up to measure zero. Then the function η ↦ hη(f) is upper semi-continuous at μ.

Proof. By Proposition 9.2.12, given ε > 0 there exists a neighborhood V of μ such that hν(f, P) ≤ hμ(f, P) + ε for every ν ∈ V. By definition, hμ(f, P) ≤ hμ(f). By Corollary 9.2.5, the hypothesis implies that hν(f, P) = hν(f) for every ν. Therefore, hν(f) ≤ hμ(f) + ε for every ν ∈ V.

Now let us suppose that M is a compact metric space. As before, take μ to be a Borel probability measure. By definition, the diameter diam P of a partition P is the supremum of the diameters of its elements. Then we have the following more specialized version of the previous corollary:

Corollary 9.2.14. Assume that there exists ε0 > 0 such that every finite partition P with diam P < ε0 satisfies limn diam P^n = 0. Then, the function μ ↦ hμ(f) is upper semi-continuous. Consequently, that function is bounded and its supremum is attained for some measure μ.

Proof. As we saw in Corollary 9.2.10, the property limn diam P^n = 0 implies that $\bigcup_n \mathcal{P}^n$ generates the σ-algebra of measurable sets. On the other hand, given any invariant probability measure μ, it is easy to choose² a partition P with diameter smaller than ε0 and such that μ(∂P) = 0. It follows from the previous corollary that the entropy function is upper semi-continuous at μ and, since μ is arbitrary, this gives the first claim in the statement.
The other claims are general consequences of semi-continuity and the fact that the domain of the entropy function, that is, the space M1(f) of all invariant probability measures, is compact.
When f is invertible we may replace P^n by $\mathcal{P}^{\pm n} = \bigvee_{j=-n}^{n-1} f^{-j}(\mathcal{P})$ in the statement of Corollaries 9.2.13 and 9.2.14. The proof is analogous, using Corollary 9.2.6 and the version of Corollary 9.2.10 for invertible transformations.

9.2.3 Expansive transformations


Next, we discuss a rather broad class of transformations that satisfy the
conditions in Corollary 9.2.14.
A continuous transformation f : M → M in a metric space is said to be expansive if there exists ε0 > 0 (called a constant of expansivity) such that, given any x, y ∈ M with x ≠ y, there exists n ∈ N such that d(f^n(x), f^n(y)) ≥ ε0. That is, any two distinct orbits of f may be distinguished, at a macroscopic scale, at some stage of the iteration.
When f is invertible, there is also a two-sided version of the notion of expansivity, defined as follows: there exists ε0 > 0 such that, given any x, y ∈ M with x ≠ y, there exists n ∈ Z such that d(f^n(x), f^n(y)) ≥ ε0. It is clear that (one-sided) expansive homeomorphisms are also two-sided expansive.

Example 9.2.15. Let σ : Σ → Σ be the shift map in Σ = {1, . . . , d}^N. Consider in Σ the distance defined by d((xn)n, (yn)n) = 2^{−N}, where N is the smallest value of n such that xn ≠ yn. Note that d(σ^N(x), σ^N(y)) = 2^0 = 1 if x = (xn)n and y = (yn)n are distinct. This shows that σ is an expansive transformation, with ε0 = 1 as a constant of expansivity.
Analogously, the two-sided shift map σ : Σ → Σ in Σ = {1, . . . , d}^Z is two-sided expansive (but not one-sided expansive).
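The computation in Example 9.2.15 can be replayed on finite words. As a simplification of ours, the sketch represents points of Σ by finite prefixes. It checks that shifting N times, where N is the first index of disagreement, produces distance 2^0 = 1 = ε0.

```python
def shift_distance(x, y):
    """d(x, y) = 2**-N, where N is the first index with x_N != y_N.
    Points are represented by finite prefixes (an assumption of this sketch);
    0.0 means the prefixes agree."""
    for n, (a, b) in enumerate(zip(x, y)):
        if a != b:
            return 2.0 ** -n
    return 0.0

def shift(x, k=1):
    """sigma^k on a finite prefix: drop the first k symbols."""
    return x[k:]

x = (1, 2, 1, 1, 3, 2)
y = (1, 2, 1, 2, 3, 2)
print(shift_distance(x, y))                      # 0.125 (N = 3)
print(shift_distance(shift(x, 3), shift(y, 3)))  # 1.0
```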

² For example: for each x choose rx ∈ (0, ε0/2) such that the boundary of the ball of center x and radius rx has measure zero. This is possible because spheres with the same center and different radii are pairwise disjoint, so at most countably many of them have positive measure. Then let U be a finite cover of M by such balls and take for P the partition defined by U, that is, the partition whose elements are the maximal sets that, for each U ∈ U, are either contained in U or disjoint from U; see Figure 2.1.
We leave it to the reader to check (Exercise 9.2.1) that the decimal expansion
transformation f (x) = 10x − [10x] is also expansive. On the other hand, torus
rotations are never expansive.
Proposition 9.2.16. Let f : M → M be an expansive transformation in a compact metric space and let ε0 > 0 be a constant of expansivity. Then limn diam P^n = 0 for every finite partition P with diam P < ε0.

Proof. It is clear that the sequence diam P^n is non-increasing. Suppose that its infimum δ is positive. Then, for every n ≥ 1 there exist points xn and yn such that d(xn, yn) > δ/2 but xn and yn belong to the same element of P^n and, thus, satisfy
$$d(f^j(x_n), f^j(y_n)) \le \operatorname{diam}\mathcal{P} < \varepsilon_0 \quad \text{for every } 0 \le j < n.$$
By compactness, there exists a sequence (nj)j → ∞ such that $(x_{n_j})_j$ and $(y_{n_j})_j$ converge to points x and y, respectively. Then, by continuity of f, d(x, y) ≥ δ/2 > 0 but d(f^j(x), f^j(y)) ≤ diam P < ε0 for every j ≥ 0. This contradicts the hypothesis that ε0 is a constant of expansivity.

Corollary 9.2.17. If f : M → M is an expansive transformation in a compact metric space then the entropy function is upper semi-continuous. Moreover, there exist invariant probability measures μ whose entropy hμ(f) is maximum among all the invariant probability measures of f.

Proof. Combine Proposition 9.2.16 with Corollary 9.2.14.

If the transformation f is invertible and two-sided expansive, we may replace P^n by P^{±n} in Proposition 9.2.16, and the conclusion of Corollary 9.2.17 also remains valid as stated.

9.2.4 Exercises
9.2.1. Show that the decimal expansion f : [0, 1] → [0, 1], f (x) = 10x − [10x] is
expansive and exhibit a constant of expansivity.
9.2.2. Check that for every s > 0 there exists some Bernoulli shift (σ , μ) whose entropy
is equal to s.
9.2.3. Let X = {0} ∪ {1/n : n ≥ 1} and consider the space Σ = X^N endowed with the distance d((xn)n, (yn)n) = 2^{−N}|xN − yN|, where N = min{n ∈ N : xn ≠ yn}.
(a) Verify that the shift map σ : Σ → Σ is not expansive.
(b) For each k ≥ 1, let νk be the probability measure on X that assigns weight 1/2 to each of the points 1/k and 1/(k + 1). Use the Bernoulli measures μk = νk^N to conclude that the entropy function of the shift is not upper semi-continuous.
(c) Let μ be the Bernoulli measure associated with any probability vector (px)x∈X such that $\sum_{x \in X} -p_x \log p_x = \infty$. Show that hμ(σ) is infinite.
9.2.4. Let f : S1 → S1 be a covering map of degree d ≥ 2 and μ be a probability measure
invariant under f . Show that hμ (f ) ≤ log d.
9.2.5. Let P and Q be two partitions with finite entropy. Show that if P is coarser than $\bigvee_{j=0}^{\infty} f^{-j}(\mathcal{Q})$ then hμ(f, P) ≤ hμ(f, Q).
9.2.6. Show that if A is an algebra that generates the σ -algebra of measurable sets, up
to measure zero, then the supremum of hμ (f , P) over the partitions with finite
entropy (or even the finite partitions) P ⊂ A coincides with hμ (f ).
9.2.7. Consider transformations f : M → M and g : N → N preserving probability
measures μ and ν, respectively. Consider f × g : M × N → M × N defined by
(f × g)(x, y) = (f (x), g(y)). Show that f × g preserves the product measure μ × ν
and that hμ×ν (f × g) = hμ (f ) + hν (g).

9.3 Local entropy


The theorem of Shannon–McMillan–Breiman, which we discuss in this
section, provides a complementary view of the concept of entropy, more
detailed and more local in nature. We also mention a topological version of
that idea, which is due to Brin–Katok.

Theorem 9.3.1 (Shannon–McMillan–Breiman). Given any partition P with finite entropy, the limit
$$h_\mu(f, \mathcal{P}, x) = \lim_n -\frac{1}{n} \log \mu(\mathcal{P}^n(x)) \quad \text{exists at } \mu\text{-almost every point.} \tag{9.3.1}$$
The function x ↦ hμ(f, P, x) is μ-integrable, and the limit in (9.3.1) also holds in L1(μ). Moreover,
$$\int h_\mu(f, \mathcal{P}, x)\, d\mu(x) = h_\mu(f, \mathcal{P}).$$
If (f, μ) is ergodic then hμ(f, P, x) = hμ(f, P) at μ-almost every point.

Recall that $\mathcal{P}^n(x) = \mathcal{P}(x) \cap f^{-1}(\mathcal{P}(f(x))) \cap \dots \cap f^{-n+1}(\mathcal{P}(f^{n-1}(x)))$, that is, P^n(x) is formed by the points whose trajectories remain “close” to the trajectory of x during n iterates, in the sense that they visit the same elements of P. Theorem 9.3.1 states that the measure of this set has a well-defined exponential rate of decay: at μ-almost every point,
$$\mu(\mathcal{P}^n(x)) \approx e^{-n h_\mu(f, \mathcal{P}, x)} \quad \text{for every large } n.$$
The proof of the theorem is presented in Section 9.3.1.
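This sketch illustrates Theorem 9.3.1; it is not part of the proof. It samples a μ-typical point of a Bernoulli shift and shows that −(1/n) log μ(P^n(x)) approaches the entropy Σ −pi log pi. The weights p are an arbitrary choice. Summing logarithms, instead of multiplying probabilities, avoids floating-point underflow for large n.

```python
import math
import random

def smb_estimate(p, n, seed=0):
    """-(1/n) log mu(P^n(x)) for one mu-random point x of the Bernoulli shift with
    weights p, where P is the partition into cylinders [0; a]."""
    rng = random.Random(seed)
    x = rng.choices(range(len(p)), weights=p, k=n)  # first n coordinates of x
    # mu(P^n(x)) = mu([0; x_0, ..., x_{n-1}]) = p[x_0] * ... * p[x_{n-1}]
    return -sum(math.log(p[a]) for a in x) / n

p = [0.5, 0.25, 0.25]
h = -sum(q * math.log(q) for q in p)  # entropy of this Bernoulli shift
for n in (10, 1000, 100000):
    print(n, smb_estimate(p, n), h)
```

For a Bernoulli measure the quantity is a Birkhoff average of x ↦ −log p_{x_0}, so here the theorem reduces to the ergodic theorem.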
The theorem of Brin–Katok that we state in the sequel belongs to the same
family of results, but uses a different notion of proximity.

Definition 9.3.2. Let f : M → M be a continuous map in a compact metric space. Given x ∈ M, n ≥ 1 and ε > 0, we call the dynamical ball of length n and radius ε around x the set
$$B(x, n, \varepsilon) = \{y \in M : d(f^j(x), f^j(y)) < \varepsilon \text{ for every } j = 0, 1, \dots, n-1\}.$$
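In other words,
$$B(x, n, \varepsilon) = B(x, \varepsilon) \cap f^{-1}(B(f(x), \varepsilon)) \cap \dots \cap f^{-n+1}(B(f^{n-1}(x), \varepsilon)).$$
Define
$$h^+_\mu(f, \varepsilon, x) = \limsup_n -\frac{1}{n} \log \mu(B(x, n, \varepsilon)) \quad \text{and} \quad h^-_\mu(f, \varepsilon, x) = \liminf_n -\frac{1}{n} \log \mu(B(x, n, \varepsilon)).$$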
Theorem 9.3.3 (Brin–Katok). Let μ be a probability measure invariant under f. The limits
$$\lim_{\varepsilon \to 0} h^+_\mu(f, \varepsilon, x) \quad \text{and} \quad \lim_{\varepsilon \to 0} h^-_\mu(f, \varepsilon, x)$$
exist and are equal, for μ-almost every point. Denoting by hμ(f, x) their common value, the function hμ(f, ·) is integrable and
$$h_\mu(f) = \int h_\mu(f, x)\, d\mu(x).$$
The proof of this result may be found in the original paper of Brin and Katok [BK83], and is not presented here.
Example 9.3.4 (Translations in compact groups). Let G be a compact metrizable group and μ be its Haar measure. Every translation of G has zero entropy with respect to μ. Indeed, consider in G any distance d that is invariant under all the translations (recall Lemma 6.3.6). Relative to such a distance,
$$L_g^j(B(x, \varepsilon)) = B(L_g^j(x), \varepsilon)$$
for every g ∈ G and j ∈ Z. Consequently, B(x, n, ε) = B(x, ε) for every n ≥ 1. Since the Haar measure gives positive weight to every non-empty open set, μ(B(x, ε)) > 0 and so
$$h^{\pm}_\mu(L_g, \varepsilon, x) = \lim_n -\frac{1}{n} \log \mu(B(x, \varepsilon)) = 0$$
for every ε > 0 and x ∈ G. By the theorem of Brin–Katok, it follows that hμ(Lg) = 0 for every g ∈ G. The same argument applies to every right-translation Rg.
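The sketch below contrasts Example 9.3.4 with an expanding map. It estimates μ(B(x, n, ε)) for Lebesgue measure on the circle R/Z, first for a rotation, which is a translation of the compact group R/Z. For comparison (a choice of ours, not from the text), it does the same for the doubling map t ↦ 2t mod 1. The rotation number, base point, ε and grid size are all illustrative choices.

```python
import math

ALPHA = math.sqrt(2) - 1  # hypothetical (irrational) rotation number

def rotation(t):
    return (t + ALPHA) % 1.0

def doubling(t):
    return (2.0 * t) % 1.0

def circle_dist(a, b):
    """Distance on the circle R/Z."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)

def ball_measure(f, x, n, eps, grid=50_000):
    """Estimate the Lebesgue measure of B(x, n, eps) as the fraction of grid points
    y = (i + 1/2)/grid whose iterates f^j(y), 0 <= j < n, stay eps-close to f^j(x)."""
    orbit = [x]
    for _ in range(n - 1):
        orbit.append(f(orbit[-1]))
    count = 0
    for i in range(grid):
        y = (i + 0.5) / grid
        for xj in orbit:
            if circle_dist(xj, y) >= eps:
                break
            y = f(y)
        else:  # no break: all n iterates stayed eps-close
            count += 1
    return count / grid

for n in (1, 3, 6):
    r = ball_measure(rotation, 0.3, n, 0.1)
    d = ball_measure(doubling, 0.3, n, 0.1)
    print(n, r, -math.log(r) / n, d, -math.log(d) / n)
```

For the rotation the dynamical ball does not shrink, so −(1/n) log μ(B(x, n, ε)) tends to 0. For the doubling map with ε = 0.1 the ball is the interval of half-length ε·2^{−(n−1)} around x, so the same rate tends to log 2.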

9.3.1 Proof of the Shannon–McMillan–Breiman theorem


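Consider the sequence of functions ϕn : M → R defined, for n ≥ 2, by
$$\varphi_n(x) = -\log \frac{\mu(\mathcal{P}^n(x))}{\mu(\mathcal{P}^{n-1}(f(x)))}.$$
By telescopic cancellation,
$$-\frac{1}{n} \log \mu(\mathcal{P}^n(x)) = -\frac{1}{n} \log \mu(\mathcal{P}(f^{n-1}(x))) + \frac{1}{n} \sum_{j=0}^{n-2} \varphi_{n-j}(f^j(x)) \tag{9.3.2}$$
for every n and every x.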
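The telescoping identity (9.3.2) is pure algebra, so it can be confirmed numerically. The sketch does this for a Markov measure on {0, 1}^N, with finite words standing in for points; the transition matrix, stationary vector and word are arbitrary choices. It also checks that ϕn ≥ 0. This holds because P^n(x) ⊂ f^{−1}(P^{n−1}(f(x))) and μ is invariant.

```python
import math

def cylinder_measure(word, p, P):
    """Markov measure of [0; a_1, ..., a_m] = p[a_1] P[a_1][a_2] ... P[a_{m-1}][a_m]."""
    m = p[word[0]]
    for a, b in zip(word, word[1:]):
        m *= P[a][b]
    return m

def phi(n, x, p, P):
    """phi_n(x) = -log( mu(P^n(x)) / mu(P^{n-1}(f(x))) ) for n >= 2,
    where x is a word of length >= n and f is the shift."""
    return -math.log(cylinder_measure(x[:n], p, P) / cylinder_measure(x[1:n], p, P))

def both_sides(x, n, p, P):
    """Left- and right-hand sides of (9.3.2); the word x[j:] plays the role of f^j(x)."""
    lhs = -math.log(cylinder_measure(x[:n], p, P)) / n
    rhs = -math.log(cylinder_measure(x[n - 1:n], p, P)) / n
    rhs += sum(phi(n - j, x[j:], p, P) for j in range(n - 1)) / n
    return lhs, rhs

P = [[0.9, 0.1], [0.4, 0.6]]  # transition matrix
p = [0.8, 0.2]                # stationary vector: p P = p
x = [0, 1, 1, 0, 0, 0, 1, 0]
print(both_sides(x, 8, p, P))
```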


Lemma 9.3.5. The sequence $-n^{-1} \log \mu(\mathcal{P}(f^{n-1}(x)))$ converges to zero μ-almost everywhere and in L1(μ).

Proof. Start by noting that the function x ↦ −log μ(P(x)) is integrable:
$$\int |\log \mu(\mathcal{P}(x))|\, d\mu(x) = \int -\log \mu(\mathcal{P}(x))\, d\mu(x) = H_\mu(\mathcal{P}) < \infty.$$
Using Lemma 3.2.5, it follows that $-(n-1)^{-1} \log \mu(\mathcal{P}(f^{n-1}(x)))$ converges to zero at μ-almost every point. Moreover, it is clear that this conclusion is not affected if one replaces n − 1 by n in the denominator. This proves the claim of μ-almost everywhere convergence. Next, using the fact that the measure μ is invariant and Hμ(P) < ∞,
$$\Big\| -\frac{1}{n} \log \mu(\mathcal{P}(f^{n-1}(x))) \Big\|_1 = \frac{1}{n} \int -\log \mu(\mathcal{P}(f^{n-1}(x)))\, d\mu(x) = \frac{1}{n} H_\mu(\mathcal{P})$$
converges to zero when n → ∞. This proves the convergence in L1(μ).

Next, we show that the last term in (9.3.2) also converges μ-almost
everywhere and in L1 (μ).

Lemma 9.3.6. The limit ϕ(x) = limn ϕn(x) exists at μ-almost every point.

Proof. For each n > 1, denote by Qn the partition of M defined by
$$\mathcal{Q}_n(x) = f^{-1}(\mathcal{P}^{n-1}(f(x))) = f^{-1}(\mathcal{P}(f(x))) \cap \dots \cap f^{-n+1}(\mathcal{P}(f^{n-1}(x))).$$
Note that μ(P^{n−1}(f(x))) = μ(Qn(x)), because μ is invariant, and that P^n(x) = P(x) ∩ Qn(x). Therefore,
$$\frac{\mu(\mathcal{P}^n(x))}{\mu(\mathcal{P}^{n-1}(f(x)))} = \frac{\mu(\mathcal{P}(x) \cap \mathcal{Q}_n(x))}{\mu(\mathcal{Q}_n(x))}. \tag{9.3.3}$$
For each P ∈ P and n > 1, consider the conditional expectation (recall Section 5.2.1)
$$e_n(\mathcal{X}_P, x) = \frac{1}{\mu(\mathcal{Q}_n(x))} \int_{\mathcal{Q}_n(x)} \mathcal{X}_P\, d\mu = \frac{\mu(P \cap \mathcal{Q}_n(x))}{\mu(\mathcal{Q}_n(x))}.$$
Comparing with (9.3.3) we see that
$$e_n(\mathcal{X}_P, x) = \frac{\mu(\mathcal{P}^n(x))}{\mu(\mathcal{P}^{n-1}(f(x)))} \quad \text{for every } x \in P.$$
By Lemma 5.2.1, the limit $e(\mathcal{X}_P, x) = \lim_n e_n(\mathcal{X}_P, x)$ exists for μ-almost every x ∈ M and, in particular, for μ-almost every x ∈ P. Since P ∈ P is arbitrary, this proves that
$$\lim_n \frac{\mu(\mathcal{P}^n(x))}{\mu(\mathcal{P}^{n-1}(f(x)))}$$
exists for μ-almost every point. Taking logarithms, we get that limn ϕn(x) exists for μ-almost every point, as stated.
Lemma 9.3.7. The function $\Phi = \sup_n \varphi_n$ is integrable.

Proof. As in the previous lemma, let us consider the partitions Qn defined by Qn(x) = f^{−1}(P^{n−1}(f(x))). Fix any P ∈ P. Given x ∈ P and t > 0, it is clear that Φ(x) > t if and only if ϕn(x) > t for some n. Moreover,
$$\varphi_n(x) > t \iff \mu(P \cap \mathcal{Q}_n(x)) < e^{-t} \mu(\mathcal{Q}_n(x))$$
and, in that case, ϕn(y) > t for every y ∈ P ∩ Qn(x). Therefore, we may write the set {x ∈ P : Φ(x) > t} as a disjoint union $\bigcup_j (P \cap Q_j)$, where each Qj belongs to some partition Q_{n(j)} and
$$\mu(P \cap Q_j) < e^{-t} \mu(Q_j) \quad \text{for every } j.$$
Consequently, for every t > 0 and P ∈ P,
$$\mu(\{x \in P : \Phi(x) > t\}) = \sum_j \mu(P \cap Q_j) \le \sum_j e^{-t} \mu(Q_j) \le e^{-t}. \tag{9.3.4}$$
Then (see Exercise 9.3.1),
$$\int \Phi\, d\mu = \sum_{P \in \mathcal{P}} \int_P \Phi\, d\mu = \sum_{P \in \mathcal{P}} \int_0^\infty \mu(\{x \in P : \Phi(x) > t\})\, dt \le \sum_{P \in \mathcal{P}} \int_0^\infty \min\{e^{-t}, \mu(P)\}\, dt.$$
The last integral may be rewritten as follows:
$$\int_0^{-\log \mu(P)} \mu(P)\, dt + \int_{-\log \mu(P)}^\infty e^{-t}\, dt = -\mu(P) \log \mu(P) + \mu(P).$$
Combining these two relations:
$$\int \Phi\, d\mu \le \sum_{P \in \mathcal{P}} \big(-\mu(P) \log \mu(P) + \mu(P)\big) = H_\mu(\mathcal{P}) + 1 < \infty.$$
This proves the lemma, since Φ is non-negative.
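The elementary integral at the end of the proof of Lemma 9.3.7 can be double-checked numerically. For 0 < a ≤ 1 it states that ∫_0^∞ min{e^{−t}, a} dt = −a log a + a. The trapezoidal rule and the cut-off T used here are implementation choices of this sketch.

```python
import math

def integral_min(a, T=50.0, steps=100_000):
    """Trapezoidal approximation of the integral of min(e^{-t}, a) over [0, T];
    the neglected tail beyond T is at most e^{-T}."""
    h = T / steps
    total = 0.5 * (min(1.0, a) + min(math.exp(-T), a))
    for i in range(1, steps):
        total += min(math.exp(-i * h), a)
    return total * h

for a in (0.5, 0.1, 0.01):
    print(a, integral_min(a), -a * math.log(a) + a)
```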

Lemma 9.3.8. The function ϕ is integrable and (ϕn)n converges to ϕ in L1(μ).

Proof. We saw in Lemma 9.3.6 that (ϕn)n converges to ϕ at μ-almost every point. Since 0 ≤ ϕn ≤ Φ for every n (the lower bound because P^n(x) ⊂ Qn(x)), we also have that 0 ≤ ϕ ≤ Φ. In particular, ϕ is integrable. Moreover, |ϕ − ϕn| ≤ Φ for every n and, thus, we may use the dominated convergence theorem (Theorem A.2.11) to conclude that
$$\lim_n \int |\varphi - \varphi_n|\, d\mu = \int \lim_n |\varphi - \varphi_n|\, d\mu = 0.$$
This proves the convergence in L1(μ).


Lemma 9.3.9. At μ-almost every point and in L1(μ),
$$\lim_n \frac{1}{n} \sum_{j=0}^{n-2} \varphi_{n-j}(f^j(x)) = \lim_n \frac{1}{n} \sum_{j=0}^{n-2} \varphi(f^j(x)).$$

Proof. By the Birkhoff ergodic theorem (Theorem 3.2.3), the limit on the right-hand side exists at μ-almost every point and in L1(μ); indeed, it is equal to the time average of the function ϕ. Therefore, it is enough to show that the difference
$$\frac{1}{n} \sum_{j=0}^{n-2} (\varphi_{n-j} - \varphi) \circ f^j \tag{9.3.5}$$
converges to zero at μ-almost every point and in L1(μ). Since the measure μ is invariant, $\|(\varphi_{n-j} - \varphi) \circ f^j\|_1 = \|\varphi_{n-j} - \varphi\|_1$ for every j. Hence,
$$\Big\| \frac{1}{n} \sum_{j=0}^{n-2} (\varphi_{n-j} - \varphi) \circ f^j \Big\|_1 \le \frac{1}{n} \sum_{j=0}^{n-2} \|\varphi_{n-j} - \varphi\|_1.$$
By Lemma 9.3.8, the sequence on the right-hand side converges to zero, since it is a Cesàro average of a sequence that converges to zero. This implies that (9.3.5) converges to zero in L1(μ). We are left to prove that the sequence converges to zero at μ-almost every point.
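For each fixed k ≥ 2, consider $\Psi_k = \sup_{i > k} |\varphi_i - \varphi|$. Note that Ψk ≤ Φ and, thus, Ψk ∈ L1(μ). Moreover,
$$\begin{aligned}
\frac{1}{n} \sum_{j=0}^{n-2} |\varphi_{n-j} - \varphi| \circ f^j &= \frac{1}{n} \sum_{j=0}^{n-k-1} |\varphi_{n-j} - \varphi| \circ f^j + \frac{1}{n} \sum_{j=n-k}^{n-2} |\varphi_{n-j} - \varphi| \circ f^j \\
&\le \frac{1}{n} \sum_{j=0}^{n-k-1} \Psi_k \circ f^j + \frac{1}{n} \sum_{j=n-k}^{n-2} \Phi \circ f^j.
\end{aligned}$$
By the Birkhoff ergodic theorem, the first term on the right-hand side converges to the time average $\tilde\Psi_k$ at μ-almost every point. By Lemma 3.2.5, the last term converges to zero at μ-almost every point: the lemma implies that $n^{-1} \Phi \circ f^{n-i}$ converges to zero for any fixed i. Hence,
$$\limsup_n \frac{1}{n} \sum_{j=0}^{n-2} |\varphi_{n-j} - \varphi|(f^j(x)) \le \tilde\Psi_k(x) \quad \text{at } \mu\text{-almost every point.} \tag{9.3.6}$$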

We claim that $\lim_k \tilde\Psi_k(x) = 0$ at μ-almost every point. Indeed, the sequence (Ψk)k is non-increasing and, by Lemma 9.3.6, it converges to zero at μ-almost every point. By the monotone convergence theorem (Theorem A.2.9), it follows that ∫Ψk dμ → 0 when k → ∞. Another consequence is that $(\tilde\Psi_k)_k$ is non-increasing. Hence, using the monotone convergence theorem together with the Birkhoff ergodic theorem,
$$\int \lim_k \tilde\Psi_k\, d\mu = \lim_k \int \tilde\Psi_k\, d\mu = \lim_k \int \Psi_k\, d\mu = 0.$$
Since $\tilde\Psi_k$ is non-negative, it follows that $\lim_k \tilde\Psi_k = 0$ at μ-almost every point, as we claimed. Therefore, (9.3.6) implies that
$$\lim_n \frac{1}{n} \sum_{j=0}^{n-2} |\varphi_{n-j} - \varphi| \circ f^j = 0$$
at μ-almost every point. This completes the proof of the lemma.

It follows from (9.3.2) and Lemmas 9.3.5 and 9.3.9 that
$$h_\mu(f, \mathcal{P}, x) = \lim_n -\frac{1}{n} \log \mu(\mathcal{P}^n(x))$$
exists at μ-almost every point and in L1(μ): in fact, it coincides with the time average ϕ̃(x) of the function ϕ. Then, in particular,
$$\int h_\mu(f, \mathcal{P}, x)\, d\mu(x) = \lim_n \int -\frac{1}{n} \log \mu(\mathcal{P}^n(x))\, d\mu(x) = \lim_n \frac{1}{n} H_\mu(\mathcal{P}^n) = h_\mu(f, \mathcal{P}).$$
Moreover, if (f, μ) is ergodic then hμ(f, P, x) = ϕ̃(x) is constant at μ-almost every point. That is, in that case hμ(f, P, x) = hμ(f, P) for μ-almost every point. This closes the proof of Theorem 9.3.1.

9.3.2 Exercises
9.3.1. Check that, for any measurable function ϕ : M → (0, ∞),
$$\int \varphi\, d\mu = \int_0^\infty \mu(\{x \in M : \varphi(x) > t\})\, dt.$$
9.3.2. Use Theorem 9.3.1 to calculate the entropy of a Bernoulli shift in Σ = {1, . . . , d}^N.
9.3.3. Show that the function hμ(f, x) in Theorem 9.3.3 is f-invariant. Conclude that if (f, μ) is ergodic, then hμ(f) = hμ(f, x) for μ-almost every x.
9.3.4. Suppose that (f, μ) is ergodic and let P be a partition with finite entropy. Show that given ε > 0 there exists k ≥ 1 such that for every n ≥ k there exists Bn ⊂ P^n such that
$$e^{-n(h_\mu(f, \mathcal{P}) + \varepsilon)} < \mu(B) < e^{-n(h_\mu(f, \mathcal{P}) - \varepsilon)} \quad \text{for every } B \in \mathcal{B}_n,$$
and the measure of the union of the elements of Bn is larger than 1 − ε.


9.4 Examples
In this section we illustrate the previous results through a few examples.

9.4.1 Markov shifts


Let Σ = {1, . . . , d}^N and σ : Σ → Σ be the shift map. Let μ be the Markov measure associated with a stochastic matrix P = (Pi,j)i,j and a probability vector p = (pi)i. We are going to prove:

Proposition 9.4.1. $h_\mu(\sigma) = \sum_{a=1}^{d} p_a \sum_{b=1}^{d} -P_{a,b} \log P_{a,b}$.

Proof. Consider the partition P of Σ into cylinders [0; a], a = 1, . . . , d. For each n, the iterate P^n is the partition into cylinders [0; a1, . . . , an] of length n. Recalling that μ([0; a1, . . . , an]) = p_{a1} P_{a1,a2} · · · P_{an−1,an}, we see that
$$\begin{aligned}
H_\mu(\mathcal{P}^n) &= \sum_{a_1, \dots, a_n} -p_{a_1} P_{a_1,a_2} \cdots P_{a_{n-1},a_n} \log\big(p_{a_1} P_{a_1,a_2} \cdots P_{a_{n-1},a_n}\big) \\
&= \sum_{a_1} -p_{a_1} \log p_{a_1} \sum_{a_2, \dots, a_n} P_{a_1,a_2} \cdots P_{a_{n-1},a_n} \\
&\quad + \sum_{j=1}^{n-1} \sum_{a_j, a_{j+1}} -\log P_{a_j,a_{j+1}} \sum p_{a_1} P_{a_1,a_2} \cdots P_{a_{n-1},a_n},
\end{aligned} \tag{9.4.1}$$
where the last sum is over all the values of a1, . . . , aj−1, aj+2, . . . , an. On the one hand,
$$\sum_{a_2, \dots, a_n} P_{a_1,a_2} \cdots P_{a_{n-1},a_n} = \sum_{a_n} P^{n-1}_{a_1,a_n} = 1,$$
because P^{n−1} is a stochastic matrix. On the other hand,
$$\sum p_{a_1} P_{a_1,a_2} \cdots P_{a_{n-1},a_n} = \sum_{a_1, a_n} p_{a_1} P^{j-1}_{a_1,a_j} P_{a_j,a_{j+1}} P^{n-j-1}_{a_{j+1},a_n} = \sum_{a_1} p_{a_1} P^{j-1}_{a_1,a_j} P_{a_j,a_{j+1}} = p_{a_j} P_{a_j,a_{j+1}},$$
because P^{n−j−1} is a stochastic matrix and pP^{j−1} = p. Replacing these observations in (9.4.1), we get that
$$H_\mu(\mathcal{P}^n) = \sum_{a_1} -p_{a_1} \log p_{a_1} + \sum_{j=1}^{n-1} \sum_{a_j, a_{j+1}} -p_{a_j} P_{a_j,a_{j+1}} \log P_{a_j,a_{j+1}} = \sum_{a} -p_a \log p_a + (n-1) \sum_{a,b} -p_a P_{a,b} \log P_{a,b}.$$
It follows that $h_\mu(\sigma, \mathcal{P}) = \sum_{a,b} -p_a P_{a,b} \log P_{a,b}$. Since the family of all cylinders [0; a1, . . . , an] generates the σ-algebra of Σ, it follows from Corollary 9.2.5 that hμ(σ) = hμ(σ, P). This completes the proof of the proposition.
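The identity Hμ(P^n) = Σa −pa log pa + (n − 1) Σa,b −pa Pa,b log Pa,b obtained in the proof can be checked by brute force, by enumerating all cylinders of length n. The 3-state matrix below is our own choice. It has zero entries, to exercise the convention 0 log 0 = 0, and p is its stationary vector.

```python
import itertools
import math

def markov_entropy(p, P):
    """sum_{a,b} -p_a P_{a,b} log P_{a,b}, with the convention 0 log 0 = 0."""
    d = len(p)
    return sum(-p[a] * P[a][b] * math.log(P[a][b])
               for a in range(d) for b in range(d) if P[a][b] > 0)

def block_entropy(p, P, n):
    """H_mu(P^n): entropy of the partition into cylinders [0; a_1, ..., a_n]."""
    H = 0.0
    for word in itertools.product(range(len(p)), repeat=n):
        m = p[word[0]]
        for a, b in zip(word, word[1:]):
            m *= P[a][b]
        if m > 0:  # cylinders of measure zero contribute nothing
            H -= m * math.log(m)
    return H

P = [[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]]
p = [0.25, 0.5, 0.25]  # stationary: p P = p
Hp = -sum(q * math.log(q) for q in p)
for n in (1, 2, 6):
    print(n, block_entropy(p, P, n), Hp + (n - 1) * markov_entropy(p, P))
```

Dividing by n and letting n → ∞ recovers the formula of Proposition 9.4.1.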
This conclusion remains valid for two-sided Markov shifts as well, that is, when Σ = {1, . . . , d}^Z. The argument is analogous, using Corollary 9.2.6.

9.4.2 Gauss map


Now we calculate the entropy of the Gauss map G(x) = (1/x) − [1/x] relative to the invariant probability measure
$$\mu(E) = \frac{1}{\log 2} \int_E \frac{dx}{1+x}, \tag{9.4.2}$$
which was already studied in Sections 1.3.2 and 4.2.4. The method that we are going to present extends to a much broader class of systems, including the expanding maps of the interval that are defined and discussed in Example 11.1.16.
Let P be the partition of (0, 1) into the subintervals (1/(m + 1), 1/m) with m ≥ 1. As before, we denote $\mathcal{P}^n = \bigvee_{j=0}^{n-1} G^{-j}(\mathcal{P})$. The following facts are used in what follows:
(A) G^n maps each Pn ∈ P^n diffeomorphically onto (0, 1), for each n ≥ 1.
(B) diam P^n → 0 when n → ∞.
(C) There exists C > 1 such that |(G^n)′(y)|/|(G^n)′(x)| ≤ C for every n ≥ 1 and any x and y in the same element of the partition P^n.
(D) There exist c1, c2 > 0 such that c1 m(Pn) ≤ μ(Pn) ≤ c2 m(Pn) for every n ≥ 1 and every Pn ∈ P^n, where m denotes the Lebesgue measure.
It is immediate from the definition that each P ∈ P is mapped by G diffeomorphically onto (0, 1). Property (A) is a consequence, by induction on n. Using (A) and Lemma 4.2.12, we get that
$$\operatorname{diam} P_n \le \sup_{x \in P_n} \frac{1}{|(G^n)'(x)|} \le 2^{-[n/2]}$$
for every n ≥ 1 and every Pn ∈ P^n. This implies (B). Property (C) is given by Lemma 4.2.13. Finally, (D) follows directly from (9.4.2), with c1 = 1/(2 log 2) and c2 = 1/log 2.
Proposition 9.4.2. $h_\mu(G) = \int \log |G'|\, d\mu$.

Proof. Consider the function ψn(x) = −log μ(P^n(x)), for each n ≥ 1. Observe that
$$H_\mu(\mathcal{P}^n) = \sum_{P_n \in \mathcal{P}^n} -\mu(P_n) \log \mu(P_n) = \int \psi_n(x)\, d\mu(x).$$
Property (D) gives that
$$-\log c_1 \ge \psi_n(x) + \log m(\mathcal{P}^n(x)) \ge -\log c_2.$$
By property (A), we have log m(P^n(x)) = −log |(G^n)′(y)| for some y ∈ P^n(x). Using property (C), it follows that
$$-\log c_1 + \log C \ge \psi_n(x) - \log |(G^n)'(x)| \ge -\log c_2 - \log C$$
for every x and every n. Consequently,
$$-\log(c_1/C) \ge H_\mu(\mathcal{P}^n) - \int \log |(G^n)'|\, d\mu \ge -\log(c_2 C) \tag{9.4.3}$$
for every n. Since the measure μ is invariant under G,
$$\int \log |(G^n)'|\, d\mu = \int \sum_{j=0}^{n-1} \log |G'| \circ G^j\, d\mu = n \int \log |G'|\, d\mu.$$
Then, dividing (9.4.3) by n and taking the limit when n → ∞,
$$h_\mu(G, \mathcal{P}) = \lim_n \frac{1}{n} H_\mu(\mathcal{P}^n) = \int \log |G'|\, d\mu.$$
Now, property (B) ensures that we may apply Corollary 9.2.10 to conclude that
$$h_\mu(G) = h_\mu(G, \mathcal{P}) = \int \log |G'|\, d\mu.$$
This completes the proof of the proposition.

The integral in the statement of the proposition may be calculated explicitly: we leave it to the reader to check (using integration by parts and the fact that $\sum_{j=1}^\infty 1/j^2 = \pi^2/6$) that
$$h_\mu(G) = \int \log |G'| \, d\mu = \int_0^1 \frac{-2 \log x}{(1+x) \log 2} \, dx = \frac{\pi^2}{6 \log 2} \approx 2.37\ldots$$
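A quick numerical check of this closed form can be sketched as follows (an illustration only, not part of the original argument). The substitution $x = e^{-t}$ turns the integral into $\int_0^\infty 2t e^{-t} / ((1 + e^{-t}) \log 2) \, dt$, whose integrand is smooth and decays exponentially, so a plain trapezoid rule on a truncated interval suffices. The function name and the parameters `t_max` and `steps` are our own choices.

```python
import math

def gauss_entropy_integral(t_max: float = 60.0, steps: int = 200_000) -> float:
    """Trapezoid approximation of int_0^1 -2 log x / ((1 + x) log 2) dx.

    With x = exp(-t) the integral becomes
    int_0^inf 2 t exp(-t) / ((1 + exp(-t)) log 2) dt, whose integrand is smooth
    and decays exponentially, so it is truncated at t_max.
    """
    log2 = math.log(2.0)

    def integrand(t: float) -> float:
        e = math.exp(-t)
        return 2.0 * t * e / ((1.0 + e) * log2)

    h = t_max / steps
    total = 0.5 * (integrand(0.0) + integrand(t_max))
    for k in range(1, steps):
        total += integrand(k * h)
    return total * h

closed_form = math.pi ** 2 / (6 * math.log(2))
print(f"{closed_form:.4f}")                                  # 2.3731
print(abs(gauss_entropy_integral() - closed_form) < 1e-6)    # True
```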
Then, recalling that $(G, \mu)$ is ergodic (Section 4.2.4), it follows from the theorem of Shannon–McMillan–Breiman (Theorem 9.3.1) that
$$\lim_n -\frac{1}{n} \log \mu(\mathcal{P}^n(x)) = \frac{\pi^2}{6 \log 2} \quad \text{for } \mu\text{-almost every } x.$$
As the measure $\mu$ is comparable to the Lebesgue measure, up to a constant factor, this means that
$$\operatorname{diam} \mathcal{P}^n(x) \approx e^{-n\pi^2/(6 \log 2)} \quad (\text{up to a factor } e^{\pm \varepsilon n})$$
for $\mu$-almost every $x$ and every $n$ sufficiently large. Observe that $\mathcal{P}^n(x)$ is formed by the points $y$ whose continued fraction expansion coincides with the continued fraction expansion of $x$ up to order $n$.
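The convergence above can be observed numerically. The sketch below is only an illustration: it takes a "random" rational $x = p/q$ with a very large dyadic denominator as a stand-in for a Lebesgue-typical point, computes its first $n$ continued fraction digits by iterating $G$ in exact integer arithmetic, and uses the classical fact that the points sharing the digits $a_1, \dots, a_n$ form an interval of length $1/(q_n(q_n + q_{n-1}))$, where $q_n$ are the continued fraction denominators. The helper names, the seed and the sizes are hypothetical choices.

```python
import math
import random

def cf_digits(p: int, q: int, n: int) -> list[int]:
    """First n continued fraction digits of x = p/q in (0, 1), by iterating G."""
    digits = []
    while p and len(digits) < n:
        a, r = divmod(q, p)   # 1/x = q/p = a + r/p, so G(x) = r/p
        digits.append(a)
        p, q = r, p
    return digits

def neg_log_cylinder_length(digits: list[int]) -> float:
    """-log of the Lebesgue measure of the cylinder [a_1, ..., a_n]."""
    q_prev, q = 0, 1                      # q_{-1} = 0, q_0 = 1
    for a in digits:
        q_prev, q = q, a * q + q_prev     # q_k = a_k q_{k-1} + q_{k-2}
    # The cylinder is an interval of length 1 / (q_n (q_n + q_{n-1})).
    return math.log(q) + math.log(q + q_prev)

random.seed(1)
bits = 20_000
p, q = random.getrandbits(bits) | 1, 2 ** bits   # stand-in for a typical x
digits = cf_digits(p, q, 3000)
rate = neg_log_cylinder_length(digits) / len(digits)
print(rate, math.pi ** 2 / (6 * math.log(2)))    # the two values should be close
```

Exact integers are used because floating-point orbits of $G$ lose all precision after a few dozen iterates.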

9.4.3 Linear endomorphisms of the torus


Given a real number $x > 0$, we denote $\log^+ x = \max\{\log x, 0\}$. In this section
we prove the following result:

Proposition 9.4.3. Let $f_A : \mathbb{T}^d \to \mathbb{T}^d$ be the endomorphism induced on the torus $\mathbb{T}^d$ by some invertible matrix $A$ with integer coefficients. Let $\mu$ be the Haar measure of $\mathbb{T}^d$. Then
$$h_\mu(f_A) = \sum_{i=1}^d \log^+ |\lambda_i|,$$
where $\lambda_1, \dots, \lambda_d$ are the eigenvalues of $A$, counted with multiplicity.
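For $d = 2$ the formula is easy to evaluate. The following sketch (ours; the function name is hypothetical, and it only covers $2 \times 2$ matrices, whose eigenvalues are the roots of $\lambda^2 - (\operatorname{tr} A)\lambda + \det A$) computes $\sum_i \log^+ |\lambda_i|$. For the matrix $\begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}$ it gives $\log((3 + \sqrt{5})/2) \approx 0.9624$.

```python
import cmath
import math

def torus_entropy_2x2(a: int, b: int, c: int, d: int) -> float:
    """Sum of log^+ |lambda| over the eigenvalues of the integer matrix [[a, b], [c, d]]."""
    tr, det = a + d, a * d - b * c
    if det == 0:
        raise ValueError("the matrix must be invertible")
    disc = cmath.sqrt(tr * tr - 4 * det)          # may be purely imaginary
    eigenvalues = ((tr + disc) / 2, (tr - disc) / 2)
    return sum(max(math.log(abs(lam)), 0.0) for lam in eigenvalues)

print(torus_entropy_2x2(2, 1, 1, 1))   # log((3 + sqrt(5)) / 2), about 0.9624
print(torus_entropy_2x2(0, -1, 1, 0))  # eigenvalues +i and -i: 0.0
print(torus_entropy_2x2(2, 0, 0, 3))   # log 2 + log 3
```

Only the moduli of the eigenvalues matter, so complex eigenvalues need no special treatment.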


Initially, let us suppose that the matrix $A$ is diagonalizable. Let $v_1, \dots, v_d$ be a basis of unit vectors of $\mathbb{R}^d$ such that $Av_i = \lambda_i v_i$ for each $i$. Let $u$ be the number of eigenvalues of $A$ with absolute value strictly larger than 1. We may take the eigenvalues to be numbered in such a way that $|\lambda_i| > 1$ if and only if $i \le u$.
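Given $x \in \mathbb{T}^d$, every point $y$ in a neighborhood of $x$ may be written in the form
$$y = x + \sum_{i=1}^d t_i v_i$$
with $t_1, \dots, t_d$ close to zero. Given $\varepsilon > 0$, denote by $D(x, \varepsilon)$ the set of points $y$ of this form with $|t_i| < \varepsilon$ for every $i = 1, \dots, d$. Moreover, for each $n \ge 1$, consider
$$D(x, n, \varepsilon) = \{y \in \mathbb{T}^d : f_A^j(y) \in D(f_A^j(x), \varepsilon) \text{ for every } j = 0, \dots, n-1\}.$$
Observe that $f_A^j(y) = f_A^j(x) + \sum_{i=1}^d t_i \lambda_i^j v_i$ for every $j \ge 0$. Therefore, since $|\lambda_i| > 1$ for $i \le u$ and $|\lambda_i| \le 1$ for $i > u$,
$$D(x, n, \varepsilon) = \Big\{x + \sum_{i=1}^d t_i v_i : |\lambda_i^{n-1} t_i| < \varepsilon \text{ for } i \le u \text{ and } |t_i| < \varepsilon \text{ for } i > u\Big\}.$$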

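Hence, there exists a constant $C_1 > 1$ that depends only on $A$, such that
$$C_1^{-1} \varepsilon^d \prod_{i=1}^u |\lambda_i|^{-n} \le \mu(D(x, n, \varepsilon)) \le C_1 \varepsilon^d \prod_{i=1}^u |\lambda_i|^{-n}$$
for every $x \in \mathbb{T}^d$, $n \ge 1$ and $\varepsilon > 0$ small. It is also clear that there exists a constant $C_2 > 1$ that depends only on $A$, such that
$$B(x, C_2^{-1}\varepsilon) \subset D(x, \varepsilon) \subset B(x, C_2 \varepsilon)$$
for $x \in \mathbb{T}^d$ and $\varepsilon > 0$ small. Then, $B(x, n, C_2^{-1}\varepsilon) \subset D(x, n, \varepsilon) \subset B(x, n, C_2\varepsilon)$ for every $n \ge 1$. Combining these two observations and taking $C = C_1 C_2^d$, we get that
$$C^{-1} \varepsilon^d \prod_{i=1}^u |\lambda_i|^{-n} \le \mu(B(x, n, \varepsilon)) \le C \varepsilon^d \prod_{i=1}^u |\lambda_i|^{-n}$$
for every $x \in \mathbb{T}^d$, $n \ge 1$ and $\varepsilon > 0$ small. Then,
$$h_\mu^+(f_A, \varepsilon, x) = h_\mu^-(f_A, \varepsilon, x) = \lim_n -\frac{1}{n} \log \mu(B(x, n, \varepsilon)) = \sum_{i=1}^u \log |\lambda_i|$$
for $x \in \mathbb{T}^d$ and $\varepsilon > 0$ small. Hence, using the theorem of Brin–Katok (Theorem 9.3.3),
$$h_\mu(f_A) = h_\mu(f_A, x) = \sum_{i=1}^u \log |\lambda_i| = \sum_{i=1}^d \log^+ |\lambda_i|$$
for $\mu$-almost every point $x$, where the last equality holds because $|\lambda_i| > 1$ exactly when $i \le u$. This proves Proposition 9.4.3 in the diagonalizable case.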
The general case may be treated analogously, by writing the matrix $A$ in Jordan canonical form. We leave this task to the reader (Exercise 9.4.2).

9.4.4 Differentiable maps


Here we take $M$ to be a Riemannian manifold (see Appendix A.4.5) and $f : M \to M$ to be a local diffeomorphism, that is, a $C^1$ map whose derivative $Df(x) : T_xM \to T_{f(x)}M$ at each point $x$ is an isomorphism. We are going to state and discuss two important theorems, the Margulis–Ruelle inequality and the Pesin entropy formula, that relate the entropy $h_\mu(f)$ of an invariant measure to the Lyapunov exponents of the derivative $Df$.
Let $\mu$ be any probability measure invariant under $f$. According to the multiplicative ergodic theorem of Oseledets (see Section 3.3.5), for $\mu$-almost every point $x \in M$ there exist $k(x) \ge 1$, real numbers $\lambda_1(x) > \cdots > \lambda_{k(x)}(x)$ and a filtration $T_xM = V_x^1 > \cdots > V_x^{k(x)} > V_x^{k(x)+1} = \{0\}$ such that
$$Df(x) V_x^i = V_{f(x)}^i \quad \text{and} \quad \lim_n \frac{1}{n} \log \|Df^n(x) v\| = \lambda_i(x)$$
for every $v \in V_x^i \setminus V_x^{i+1}$, every $i \in \{1, \dots, k(x)\}$ and $\mu$-almost every $x \in M$. Moreover, all these objects depend measurably on the point $x \in M$. When the measure $\mu$ is ergodic, the number $k(x)$, the Lyapunov exponents $\lambda_i(x)$ and their multiplicities $m_i(x) = \dim V_x^i - \dim V_x^{i+1}$ are all constant on a full measure set.
Define $\rho^+(x)$ to be the sum of all positive Lyapunov exponents, counted with multiplicity:
$$\rho^+(x) = \sum_{i=1}^{k(x)} m_i(x) \lambda_i^+(x) \quad \text{with } \lambda_i^+(x) = \max\{\lambda_i(x), 0\}.$$

The Margulis–Ruelle inequality asserts that the average of $\rho^+$ is always an upper bound for the entropy of $(f, \mu)$. Proofs can be found in Ruelle [Rue78] and Mañé [Mañ87, Section 4.12].
Theorem 9.4.4 (Margulis–Ruelle inequality).
$$h_\mu(f) \le \int \rho^+ \, d\mu.$$

It may happen that all the Lyapunov exponents are positive: that is the case, for instance, for the expanding differentiable maps in Section 11.1. Then, $\rho^+(x)$ is simply the sum of all Lyapunov exponents, counted with multiplicity. Now, it is also part of the theorem of Oseledets (property (c1) in Section 3.3.5) that
$$\sum_{i=1}^{k(x)} m_i(x) \lambda_i(x) = \lim_n \frac{1}{n} \log |\det Df^n(x)|$$

at $\mu$-almost every point. Observe that the right-hand side of this identity is a Birkhoff time average:
$$\lim_n \frac{1}{n} \log |\det Df^n(x)| = \lim_n \frac{1}{n} \sum_{j=0}^{n-1} \log |\det Df|(f^j(x)).$$

So, by the Birkhoff ergodic theorem, the integral of $\rho^+$ coincides with the integral of the function $\log |\det Df|$. Thus, in this case the Margulis–Ruelle inequality becomes:
$$h_\mu(f) \le \int \log |\det Df| \, d\mu.$$

Another interesting particular case is when $f$ is a diffeomorphism. It follows from the version of the Oseledets theorem for invertible maps (also stated in Section 3.3.5) that the Lyapunov exponents of $Df^{-1}$ are the numbers $-\lambda_i(x)$, with multiplicities $m_i(x)$. Then, applying Theorem 9.4.4 to the inverse and recalling (Proposition 9.1.14) that $h_\mu(f) = h_\mu(f^{-1})$, we get that
$$h_\mu(f) \le \int \rho^- \, d\mu, \qquad (9.4.4)$$
where $\rho^-(x) = \sum_{i=1}^{k(x)} m_i(x) \max\{-\lambda_i(x), 0\}$.
Now let us suppose that the invariant measure $\mu$ is absolutely continuous with respect to the volume measure associated with the Riemannian structure of $M$ (see Appendix A.4.5). In this case, assuming a little more regularity, we have a much stronger result:

Theorem 9.4.5 (Pesin entropy formula). Assume that the derivative $Df$ is Hölder continuous and the invariant measure $\mu$ is absolutely continuous. Then
$$h_\mu(f) = \int \rho^+ \, d\mu.$$

This fundamental result was originally proven by Pesin [Pes77]. See also Mañé [Mañ87, Section 4.13] for an alternative proof.
The expression for the entropy of the Haar measure of a linear torus endomorphism, given in Proposition 9.4.3, is a special case of Theorem 9.4.5. Indeed, one can check that the Lyapunov exponents of a linear endomorphism $f_A$ at every point coincide with the logarithms $\log |\lambda_i|$ of the absolute values of the eigenvalues of the matrix $A$, with the same multiplicities. Thus, in this context
$$\rho^+(x) \equiv \sum_{i=1}^d \log^+ |\lambda_i|.$$

Of course, the Haar measure is absolutely continuous. Another special case of the Pesin entropy formula will appear in Section 12.1.8: see (12.1.31)–(12.1.36).
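For a linear endomorphism the derivative is the constant matrix $A$, so the Lyapunov exponents can also be estimated by the standard QR (Benettin-type) procedure, which works along any orbit once the Jacobians $Df(f^j(x))$ are known. The sketch below is restricted to dimension 2 and uses a hypothetical function name. Applied to $A = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}$, the positive exponent should approach $\log((3+\sqrt5)/2) \approx 0.9624$, which by Proposition 9.4.3 is also the entropy of the Haar measure, in agreement with the Pesin entropy formula.

```python
import math

def lyapunov_exponents_2d(jacobians):
    """QR (Benettin-type) estimate of the two Lyapunov exponents along an orbit.

    `jacobians` yields the 2x2 matrices Df(f^j(x)), j = 0, 1, 2, ..., each given
    as ((a, b), (c, d)). Returns (lambda_1, lambda_2), approximately ordered
    with lambda_1 >= lambda_2.
    """
    q1, q2 = (1.0, 0.0), (0.0, 1.0)      # orthonormal frame
    log_r11 = log_r22 = 0.0
    n = 0
    for (a, b), (c, d) in jacobians:
        w1 = (a * q1[0] + b * q1[1], c * q1[0] + d * q1[1])   # J q1
        w2 = (a * q2[0] + b * q2[1], c * q2[0] + d * q2[1])   # J q2
        # Gram-Schmidt: J [q1 q2] = [q1' q2'] [[r11, r12], [0, r22]]
        r11 = math.hypot(*w1)
        q1 = (w1[0] / r11, w1[1] / r11)
        r12 = q1[0] * w2[0] + q1[1] * w2[1]
        v = (w2[0] - r12 * q1[0], w2[1] - r12 * q1[1])
        r22 = math.hypot(*v)
        q2 = (v[0] / r22, v[1] / r22)
        log_r11 += math.log(r11)
        log_r22 += math.log(r22)
        n += 1
    return log_r11 / n, log_r22 / n

cat = ((2, 1), (1, 1))
lam1, lam2 = lyapunov_exponents_2d([cat] * 2000)
rho_plus = max(lam1, 0.0) + max(lam2, 0.0)
print(round(rho_plus, 3))   # about 0.962
```

Since $\det A = 1$, the two estimates sum to zero at every step, as the identity relating the exponents to $\log |\det Df|$ predicts.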

Finally, let us point out that the assumption of absolute continuity is stronger than necessary: the conclusion of Theorem 9.4.5 still holds if the invariant measure is just “absolutely continuous along unstable manifolds”. Roughly speaking, this technical condition means that the conditional probabilities of $\mu$ with respect to a certain measurable partition, whose elements are unstable disks,³ are absolutely continuous with respect to the volume measures induced on each of the disks by the Riemannian metric of $M$. Moreover, and most strikingly, this sufficient condition is also necessary for the Pesin entropy formula to hold. For precise statements, related results and proofs, see [LS82, Led84, LY85a, LY85b].

9.4.5 Exercises
9.4.1. Show that every rotation $R_\theta : \mathbb{T}^d \to \mathbb{T}^d$ has entropy zero with respect to the Haar measure of the torus $\mathbb{T}^d$. [Observation: This is a special case of Example 9.3.4, but for the present statement we do not need the theorem of Brin–Katok.]
9.4.2. Complete the proof of Proposition 9.4.3.
9.4.3. Let $f : M \to M$ be a measurable transformation and $\mu$ be an ergodic probability measure. Let $B \subset M$ be a measurable set with $\mu(B) > 0$, $g : B \to B$ be the first-return map of $f$ to $B$ and $\nu$ be the normalized restriction of $\mu$ to the set $B$ (recall Section 1.4.1). Show that $h_\mu(f) = \mu(B) h_\nu(g)$.
9.4.4. Let $f : M \to M$ be a measure-preserving transformation in a Lebesgue space $(M, \mu)$. Let $\hat f : \hat M \to \hat M$ be the natural extension of $f$ and $\hat\mu$ be the lift of $\mu$ (Exercise 8.5.7). Show that $h_\mu(f) = h_{\hat\mu}(\hat f)$.
9.4.5. Prove that if $f$ is the time-1 map of a smooth flow on a surface $M$ then $h_\mu(f) = 0$ for every invariant ergodic measure $\mu$. [Observation: Using Theorem 9.6.2 below, it follows that the entropy is zero for every invariant measure.]

9.5 Entropy and equivalence


The notion of entropy was originally introduced in ergodic theory as a means
to distinguish systems that are not ergodically equivalent, especially in the case
of systems that are spectrally equivalent and, thus, cannot be distinguished by
spectral invariants. It is easy to see that the entropy is, indeed, an invariant of
ergodic equivalence:

Proposition 9.5.1. Let f : M → M and g : N → N be transformations


preserving probability measures μ in M and ν in N, respectively. If (f , μ) is
ergodically equivalent to (g, ν), then hμ (f ) = hν (g).

3 Unstable disks are differentiably embedded disks that are contracted exponentially under
negative iteration; in the non-invertible case, the definition is formulated in terms of the natural
extension of the transformation.

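Proof. Let $\varphi : M \to N$ be an ergodic equivalence between the two systems. This means that $\varphi_*\mu = \nu$ and there exist full measure subsets $X \subset M$ and $Y \subset N$ such that $\varphi$ is a measurable bijection from $X$ to $Y$, with measurable inverse. Moreover, as observed in Section 8.1, the sets $X$ and $Y$ may be chosen to be invariant. Given any partition $\mathcal{P}$ of $M$ with finite entropy for $\mu$, let $\mathcal{P}_X$ be its restriction to $X$ and $\mathcal{Q}_Y = \varphi(\mathcal{P}_X)$ be the image of $\mathcal{P}_X$ under $\varphi$. Then $\mathcal{Q} = \mathcal{Q}_Y \cup \{Y^c\}$ is a partition of $N$ and, since $X$ and $Y$ are full measure subsets,
$$H_\nu(\mathcal{Q}) = \sum_{Q \in \mathcal{Q}_Y} -\nu(Q) \log \nu(Q) = \sum_{P \in \mathcal{P}_X} -\mu(P) \log \mu(P) = H_\mu(\mathcal{P}).$$
Since $X$ and $Y$ are both invariant, $\mathcal{P}_X^n$ is the restriction of $\mathcal{P}^n$ to the subset $X$ and $\mathcal{Q}^n = \mathcal{Q}_Y^n \cup \{Y^c\}$ for every $n$. Moreover,
$$\mathcal{Q}_Y^n = \bigvee_{j=0}^{n-1} g^{-j}(\mathcal{Q}_Y) = \varphi\Big(\bigvee_{j=0}^{n-1} f^{-j}(\mathcal{P}_X)\Big) = \varphi(\mathcal{P}_X^n)$$
for every $n$. Thus, the previous argument proves that $H_\nu(\mathcal{Q}^n) = H_\mu(\mathcal{P}^n)$ for every $n$ and so
$$h_\nu(g, \mathcal{Q}) = \lim_n \frac{1}{n} H_\nu(\mathcal{Q}^n) = \lim_n \frac{1}{n} H_\mu(\mathcal{P}^n) = h_\mu(f, \mathcal{P}).$$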

Taking the supremum over P, we conclude that hν (g) ≥ hμ (f ). The converse


inequality is entirely analogous.

Using this observation, Kolmogorov and Sinai concluded that not all two-sided Bernoulli shifts are ergodically equivalent, despite their being spectrally equivalent, as we saw in Corollary 8.4.12. This also shows that spectral equivalence is strictly weaker than ergodic equivalence. In fact, as observed in Exercise 9.2.2, for every $s > 0$ there exists some two-sided Bernoulli shift $(\sigma, \mu)$ such that $h_\mu(\sigma) = s$. Therefore, a single spectral equivalence class may contain a whole continuum of ergodic equivalence classes.
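For a Bernoulli shift with probability vector $p$, the entropy is $-\sum_i p_i \log p_i$. The sketch below computes it and also produces, for a given $s > 0$, one probability vector whose Bernoulli shift has entropy $s$. It uses bisection over vectors of the form $(t, (1-t)/(k-1), \dots, (1-t)/(k-1))$ with $\log k \ge s$, whose entropy decreases from $\log k$ to $0$ as $t$ runs over $[1/k, 1]$. This is just one possible construction, not necessarily the one intended in Exercise 9.2.2, and the function names are hypothetical.

```python
import math

def bernoulli_entropy(p):
    """Entropy -sum_i p_i log p_i of the Bernoulli shift with probability vector p."""
    return -sum(x * math.log(x) for x in p if x > 0)

def vector_with_entropy(s, tol=1e-13):
    """One probability vector (t, (1-t)/(k-1), ..., (1-t)/(k-1)) with entropy s > 0."""
    k = max(2, math.ceil(math.exp(s)))          # guarantees log k >= s

    def vec(t):
        return [t] + [(1 - t) / (k - 1)] * (k - 1)

    # On [1/k, 1] the entropy decreases from log k to 0, so bisection applies.
    lo, hi = 1 / k, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if bernoulli_entropy(vec(mid)) > s:
            lo = mid
        else:
            hi = mid
    return vec((lo + hi) / 2)

# Spectrally equivalent (both have countable Lebesgue spectrum), different entropies:
print(bernoulli_entropy([1/2, 1/2]), bernoulli_entropy([1/3, 1/3, 1/3]))  # log 2, log 3
p = vector_with_entropy(1.0)
print(len(p), bernoulli_entropy(p))   # 3 states, entropy close to 1.0
```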

9.5.1 Bernoulli automorphisms


The converse to Proposition 9.5.1 is false, in general. Indeed, we saw
in Example 9.2.9 (and Corollary 9.2.7) that all the circle rotations have
entropy zero. But an irrational rotation is never ergodically equivalent to a
rational rotation, since the former is ergodic and the latter is not. Besides,
Corollary 8.3.6 shows that irrational rotations are also not ergodically
equivalent to each other, in general. The case of rational rotations is treated
in Exercise 8.3.3.
However, a remarkable result due to Donald Ornstein [Orn70] states that the
entropy is a complete invariant for two-sided Bernoulli shifts:
Theorem 9.5.2 (Ornstein). Two-sided Bernoulli shifts in Lebesgue spaces are
ergodically equivalent if and only if they have the same entropy.

We call a Bernoulli automorphism any system that is ergodically equivalent to a two-sided Bernoulli shift. In the sequel we will find several examples of such systems. The theorem of Ornstein may be reformulated as follows: two Bernoulli automorphisms in Lebesgue spaces are ergodically equivalent if and only if they have the same entropy.
Let us point out that the theorem of Ornstein does not extend to one-sided
Bernoulli shifts. Indeed, Exercise 8.1.2 shows that in the non-invertible case
there are other invariants of ergodic equivalence, including the degree of the
transformation (the number of pre-images).
William Parry and Peter Walters [PW72b, PW72a, Wal73] proved, among
other results, that one-sided Bernoulli shifts corresponding to probability
vectors p = (p1 , . . . , pk ) and q = (q1 , . . . , ql ) are ergodically equivalent if and
only if k = l and the vector p is a permutation of the vector q. In Exercise 9.7.7,
after introducing the notion of the Jacobian, we invite the reader to prove this
fact.

9.5.2 Systems with entropy zero


In this section we study some properties of systems whose entropy is equal to
zero. The main result (Proposition 9.5.5) is that such systems are invertible at
almost every point, if the ambient space is a Lebesgue space. It is worthwhile
comparing this statement with Corollary 9.2.7. At the end of the section we
briefly discuss the spectral types of systems with entropy zero.
In what follows, $(M, \mathcal{B}, \mu)$ is a probability space and $f : M \to M$ is a measure-preserving transformation. In the second half of the section we take $(M, \mathcal{B}, \mu)$ to be a Lebesgue space.

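Lemma 9.5.3. For every $\varepsilon > 0$ there exists $\delta > 0$ such that if $\mathcal{P}$ and $\mathcal{Q}$ are partitions with finite entropy and $H_\mu(\mathcal{P}/\mathcal{Q}) < \delta$ then for every $P \in \mathcal{P}$ there exists a union $P'$ of elements of $\mathcal{Q}$ satisfying $\mu(P \,\Delta\, P') < \varepsilon$.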

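Proof. Let $s = 1 - \varepsilon/2$ and $\delta = -(\varepsilon/2) \log s$. For each $P \in \mathcal{P}$ consider
$$S = \{Q \in \mathcal{Q} : \mu(P \cap Q) \ge s\,\mu(Q)\}.$$
Let $P'$ be the union of all the elements of $S$. On the one hand,
$$\mu(P' \setminus P) = \sum_{Q \in S} \mu(Q \setminus P) \le \sum_{Q \in S} (1 - s)\mu(Q) \le \frac{\varepsilon}{2}. \qquad (9.5.1)$$
On the other hand, since $\mu(P \cap Q) < s\,\mu(Q)$ for every $Q \notin S$,
$$H_\mu(\mathcal{P}/\mathcal{Q}) = \sum_{R \in \mathcal{P}} \sum_{Q \in \mathcal{Q}} -\mu(R \cap Q) \log \frac{\mu(R \cap Q)}{\mu(Q)} \ge \sum_{Q \notin S} -\mu(P \cap Q) \log s = -\mu(P \setminus P') \log s.$$
This implies that
$$\mu(P \setminus P') \le \frac{H_\mu(\mathcal{P}/\mathcal{Q})}{-\log s} < \frac{\delta}{-\log s} = \frac{\varepsilon}{2}. \qquad (9.5.2)$$
Putting (9.5.1) and (9.5.2) together, we get the conclusion of the lemma.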
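The construction in this proof is completely explicit for finite partitions. The sketch below, with hypothetical function names, takes the table of values $\mu(P_i \cap Q_j)$, computes $H_\mu(\mathcal{P}/\mathcal{Q})$, builds the family $S$ for a chosen $P_i$ and returns $\mu(P_i \,\Delta\, P')$. The numbers in the example are made up so that $H_\mu(\mathcal{P}/\mathcal{Q}) < \delta$ with $\varepsilon = 0.1$.

```python
import math

def column_masses(joint):
    """mu(Q_j) for each element Q_j of the partition indexing the columns."""
    return [sum(row[j] for row in joint) for j in range(len(joint[0]))]

def conditional_entropy(joint):
    """H_mu(P/Q) from the table joint[i][j] = mu(P_i ∩ Q_j)."""
    cols = column_masses(joint)
    return -sum(m * math.log(m / cols[j])
                for row in joint for j, m in enumerate(row) if m > 0)

def approximating_union(joint, i, eps):
    """Return (S, mu(P_i Δ P')), where P' is the union of the Q_j with j in S."""
    s = 1 - eps / 2
    cols = column_masses(joint)
    S = [j for j, m in enumerate(joint[i]) if m >= s * cols[j]]
    extra = sum(cols[j] - joint[i][j] for j in S)                    # mu(P' \ P_i)
    missing = sum(m for j, m in enumerate(joint[i]) if j not in S)   # mu(P_i \ P')
    return S, extra + missing

eps = 0.1
delta = -(eps / 2) * math.log(1 - eps / 2)
joint = [[0.3, 0.0001, 0.0],
         [0.0, 0.2999, 0.4]]
print(conditional_entropy(joint) < delta)   # True
for i in range(len(joint)):
    print(approximating_union(joint, i, eps))
```

Here $H_\mu(\mathcal{P}/\mathcal{Q}) \approx 0.0009$ while $\delta \approx 0.0026$, and both symmetric differences are about $0.0001$, well below $\varepsilon$.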
The next lemma states that the rate $h_\mu(f, \mathcal{P})$ of information (relative to the partition $\mathcal{P}$) generated by the system at each iteration is zero if and only if the future determines the present, in the sense that information relative to the 0-th iterate may be deduced from the ensemble of information relative to the future iterates.
Lemma 9.5.4. Let $\mathcal{P}$ be a partition with finite entropy. Then, $h_\mu(f, \mathcal{P}) = 0$ if and only if $\mathcal{P} \prec \bigvee_{j=1}^\infty f^{-j}(\mathcal{P})$.

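Proof. Suppose that $h_\mu(f, \mathcal{P})$ is zero. Using Lemma 9.1.12, we obtain that $H_\mu(\mathcal{P}/\bigvee_{j=1}^n f^{-j}(\mathcal{P}))$ converges to zero when $n \to \infty$. Then, by Lemma 9.5.3, for each $l \ge 1$ and each $P \in \mathcal{P}$ there exist $n_l \ge 1$ and a union $P_l$ of elements of the partition $\bigvee_{j=1}^{n_l} f^{-j}(\mathcal{P})$ such that $\mu(P \,\Delta\, P_l) < 2^{-l}$. It is clear that every $P_l$ is a union of elements of $\bigvee_{j=1}^\infty f^{-j}(\mathcal{P})$ and, thus, the same is true for every $\bigcup_{l=n}^\infty P_l$ and also for $P^* = \bigcap_{n=1}^\infty \bigcup_{l=n}^\infty P_l$. Moreover,
$$\mu\Big(P \setminus \bigcup_{l=n}^\infty P_l\Big) = 0 \quad \text{and} \quad \mu\Big(\bigcup_{l=n}^\infty P_l \setminus P\Big) \le \sum_{l=n}^\infty 2^{-l} = 2^{1-n}$$
for every $n$ and, consequently, $\mu(P \,\Delta\, P^*) = 0$. This shows that every element of $\mathcal{P}$ coincides, up to measure zero, with a union of elements of $\bigvee_{j=1}^\infty f^{-j}(\mathcal{P})$, as claimed in the “only if” half of the statement.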
The argument to prove the converse is similar to the one in Proposition 9.2.2. Suppose that $\mathcal{P} \prec \bigvee_{j=1}^\infty f^{-j}(\mathcal{P})$. Write $\mathcal{P} = \{P_j : j = 1, 2, \dots\}$ and, for each $k \ge 1$, consider the finite partition $\mathcal{P}_k = \{P_1, \dots, P_k, M \setminus \bigcup_{j=1}^k P_j\}$. Given any $\varepsilon > 0$, Lemma 9.2.3 ensures that $H_\mu(\mathcal{P}/\mathcal{P}_k) < \varepsilon/2$ for every $k$ sufficiently large. Fix $k$ in these conditions and write $P_0 = M \setminus \bigcup_{j=1}^k P_j$. For each $n \ge 1$ and each $i = 1, \dots, k$, let $Q_i^n$ be the union of the elements of $\bigvee_{j=1}^n f^{-j}(\mathcal{P})$ that intersect $P_i$. The hypothesis ensures that each $(Q_i^n)_n$ is a decreasing sequence whose intersection coincides with $P_i$ up to measure zero. Then, given $\delta > 0$ there exists $n_0$ such that
$$\sum_{i=1}^k \mu(Q_i^n \setminus P_i) < \delta \quad \text{for every } n \ge n_0. \qquad (9.5.3)$$
Define $R_i^n = Q_i^n \setminus \bigcup_{j=1}^{i-1} Q_j^n$ for $i = 1, \dots, k$ and also $R_0^n = M \setminus \bigcup_{j=1}^k Q_j^n$. It is clear from the construction that $\mathcal{R}^n = \{R_1^n, \dots, R_k^n, R_0^n\}$ is a partition of $M$ coarser than $\bigvee_{j=1}^n f^{-j}(\mathcal{P})$. Since $R_i^n \subset Q_i^n$ and $P_i \subset Q_i^n$, and the elements of $\mathcal{P}$ are pairwise disjoint,
$$R_i^n \setminus P_i \subset Q_i^n \setminus P_i \quad \text{and} \quad P_i \setminus R_i^n = P_i \cap \bigcup_{j=1}^{i-1} Q_j^n \subset \bigcup_{j=1}^{i-1} (Q_j^n \setminus P_j)$$
for $i = 1, \dots, k$. Similarly, $R_0^n \subset P_0$ and $P_0 \setminus R_0^n = P_0 \cap \bigcup_{j=1}^k Q_j^n \subset \bigcup_{j=1}^k (Q_j^n \setminus P_j)$. Therefore, the relation (9.5.3) implies that
$$\mu(P_i \,\Delta\, R_i^n) < \delta \quad \text{for every } i = 0, 1, \dots, k \text{ and every } n \ge n_0.$$
Then, assuming that $\delta > 0$ is small, it follows from Lemmas 9.1.5 and 9.1.6 that
$$H_\mu\Big(\mathcal{P}_k \Big/ \bigvee_{j=1}^n f^{-j}(\mathcal{P})\Big) \le H_\mu(\mathcal{P}_k / \mathcal{R}^n) < \varepsilon/2$$
for every $n \ge n_0$. Using Exercise 9.1.1, we get that $H_\mu(\mathcal{P}/\bigvee_{j=1}^n f^{-j}(\mathcal{P})) < \varepsilon$ for every $n \ge n_0$. In this way, it is shown that $H_\mu(\mathcal{P}/\bigvee_{j=1}^n f^{-j}(\mathcal{P})) \to 0$. By Lemma 9.1.12, it follows that $h_\mu(f, \mathcal{P}) = 0$.

As a consequence, we get that every system with entropy zero is invertible


at almost every point:

Proposition 9.5.5. Let (M, B, μ) be a Lebesgue space and f : M → M be a


measure-preserving transformation. If hμ (f ) = 0 then (f , μ) is invertible: there
exists a measurable transformation g : M → M that preserves the measure μ
and satisfies f ◦ g = g ◦ f = id at μ-almost every point.

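Proof. Consider the homomorphism $\tilde f : \tilde{\mathcal{B}} \to \tilde{\mathcal{B}}$ induced by $f$ in the measure algebra of $\mathcal{B}$ (these notions were introduced in Section 8.5). Recall that $\tilde f$ is always injective (Exercise 8.5.2). Given any $B \in \mathcal{B}$, consider the partition $\mathcal{P} = \{B, B^c\}$. The hypothesis $h_\mu(f) = 0$ implies that $h_\mu(f, \mathcal{P}) = 0$. By Lemma 9.5.4, it follows that $\mathcal{P} \prec \bigvee_{j=1}^\infty f^{-j}(\mathcal{P})$. This implies that $\mathcal{P} \subset f^{-1}(\mathcal{B})$ up to measure zero, because $f^{-j}(\mathcal{P}) \subset f^{-1}(\mathcal{B})$ for every $j \ge 1$. Varying $B$, we conclude that $\mathcal{B} \subset f^{-1}(\mathcal{B})$ up to measure zero. In other words, the homomorphism $\tilde f$ is surjective. Hence, $\tilde f$ is an isomorphism of measure algebras. Then, by Proposition 8.5.6, there exists a measurable map $g : M \to M$ preserving the measure $\mu$ and such that the corresponding homomorphism of measure algebras $\tilde g : \tilde{\mathcal{B}} \to \tilde{\mathcal{B}}$ is the inverse of $\tilde f$. In other words, $\tilde f \circ \tilde g = \tilde g \circ \tilde f = \mathrm{id}$. Then (Exercise 8.5.2), $f \circ g = g \circ f = \mathrm{id}$ at $\mu$-almost every point, as claimed.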

These arguments also prove the following fact, which will be useful shortly:

Corollary 9.5.6. In the conditions of Proposition 9.5.5, every σ -algebra A ⊂


B that satisfies f −1 (A) ⊂ A up to measure zero also satisfies f −1 (A) = A up
to measure zero.

Exercise 9.1.5 implies that if (f , μ) has entropy zero then the same is true
for any factor. Therefore, the following fact is also an immediate consequence
of the proposition:

Corollary 9.5.7. In the conditions of Proposition 9.5.5, every factor of (f , μ)


is invertible at almost every point.

It is not completely understood how entropy relates to the spectral type of a system, but there are several partial results, especially for systems with entropy zero.
Rokhlin [Rok67a, § 14] proved that every ergodic system with discrete
spectrum defined in a Lebesgue space has entropy zero. This may also be
deduced from the fact that, as we mentioned in Section 8.3, every ergodic
system with discrete spectrum is ergodically isomorphic to a translation
in a compact abelian group. As we saw in Corollary 8.5.7, in Lebesgue
spaces ergodic isomorphism implies ergodic equivalence. Recall also that
systems with discrete spectrum in Lebesgue spaces are always invertible
(Exercise 8.5.5).
In that same work of Rokhlin it is shown that invertible systems with
singular spectrum defined in Lebesgue spaces have entropy zero and the same
holds for systems with Lebesgue spectrum of finite rank (if they exist). The
case of infinite rank is the focus of the next section. We are going to see that
there are systems with Lebesgue spectrum of infinite rank and entropy zero.
On the other hand, we introduce the important class of so-called Kolmogorov
systems, for which the entropy is necessarily positive, in a strong sense.

9.5.3 Kolmogorov systems


Let $(M, \mathcal{B}, \mu)$ be a non-trivial probability space, that is, one such that not all measurable sets have measure 0 or 1. We use $\bigvee_\alpha \mathcal{U}_\alpha$ to denote the σ-algebra generated by any family of subsets $\mathcal{U}_\alpha$ of $\mathcal{B}$. Let $f : M \to M$ be a transformation preserving the measure $\mu$.

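Definition 9.5.8. We say that $(f, \mu)$ is a Kolmogorov system if there exists some σ-algebra $\mathcal{A} \subset \mathcal{B}$ such that

(i) $f^{-1}(\mathcal{A}) \subset \mathcal{A}$ up to measure zero;
(ii) $\bigcap_{n=0}^\infty f^{-n}(\mathcal{A}) = \{\emptyset, M\}$ up to measure zero;
(iii) $\bigvee_{n=0}^\infty \{B \in \mathcal{B} : f^{-n}(B) \in \mathcal{A}\} = \mathcal{B}$ up to measure zero.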

We leave it to the reader to check that this property is an invariant of ergodic


equivalence (it is not an invariant of spectral equivalence, as will be explained
shortly).
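If $(f, \mu)$ is a Kolmogorov system then $(f^k, \mu)$ is a Kolmogorov system, for every $k \ge 1$. Indeed, if $\mathcal{A}$ satisfies condition (i) then the sequence $f^{-j}(\mathcal{A})$ is non-increasing and, in particular, $f^{-k}(\mathcal{A}) \subset \mathcal{A}$. Then, the conditions (ii) and (iii) imply that
$$\bigcap_{n=0}^\infty f^{-kn}(\mathcal{A}) = \bigcap_{n=0}^\infty f^{-n}(\mathcal{A}) = \{\emptyset, M\} \quad \text{and}$$
$$\bigvee_{n=0}^\infty \{B \in \mathcal{B} : f^{-kn}(B) \in \mathcal{A}\} = \bigvee_{n=0}^\infty \{B \in \mathcal{B} : f^{-n}(B) \in \mathcal{A}\} = \mathcal{B}$$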
up to measure zero. We say that (f , μ) is a Kolmogorov automorphism if it


is invertible and a Kolmogorov system. Then the inverse (f −1 , μ) is also a
Kolmogorov system, as we will see.

Proposition 9.5.9. Every Kolmogorov system has Lebesgue spectrum of


infinite rank. If the σ -algebra B is countably generated then the rank is
countable.

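Proof. Let $\mathcal{A} \subset \mathcal{B}$ be a σ-algebra satisfying the conditions in Definition 9.5.8. Let $E = L^2_0(M, \mathcal{A}, \mu)$ be the subspace of functions $\varphi \in L^2_0(M, \mathcal{B}, \mu)$ that are $\mathcal{A}$-measurable, that is, such that the pre-image $\varphi^{-1}(B)$ of every measurable set $B \subset \mathbb{R}$ is in $\mathcal{A}$ up to measure zero. We are going to show that $E$ satisfies all the conditions in Definition 8.4.1.
Start by observing that $U_f(L^2_0(M, \mathcal{A}, \mu)) = L^2_0(M, f^{-1}(\mathcal{A}), \mu)$. Indeed, it is clear that if $\varphi$ is $\mathcal{A}$-measurable then $U_f\varphi = \varphi \circ f$ is $f^{-1}(\mathcal{A})$-measurable. The inclusion $\subset$ follows immediately. Conversely, given any $B \in f^{-1}(\mathcal{A})$, take $A \in \mathcal{A}$ such that $B = f^{-1}(A)$ and let $c = \mu(A) = \mu(B)$. Then, $\mathcal{X}_B - c = U_f(\mathcal{X}_A - c)$ is in $U_f(L^2_0(M, \mathcal{A}, \mu))$. Since functions of this form span a dense subspace of $L^2_0(M, f^{-1}(\mathcal{A}), \mu)$ and $U_f$ is an isometry, this gives the other inclusion. So, the hypothesis that $f^{-1}(\mathcal{A}) \subset \mathcal{A}$ up to measure zero ensures that $U_f(E) \subset E$.
It also follows that $U_f^n(L^2_0(M, \mathcal{A}, \mu)) = L^2_0(M, f^{-n}(\mathcal{A}), \mu)$ for every $n \ge 0$. Then,
$$\bigcap_{n=0}^\infty U_f^n\big(L^2_0(M, \mathcal{A}, \mu)\big) = L^2_0\Big(M, \bigcap_{n=0}^\infty f^{-n}(\mathcal{A}), \mu\Big).$$
Hence, the hypothesis that $\bigcap_{n=0}^\infty f^{-n}(\mathcal{A}) = \{\emptyset, M\}$ up to measure zero implies that $\bigcap_{n=0}^\infty U_f^n(E) = \{0\}$.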

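Now consider $\mathcal{A}_n = \{B \in \mathcal{B} : f^{-n}(B) \in \mathcal{A}\}$. The sequence $(\mathcal{A}_n)_n$ is non-decreasing, because $f^{-1}(\mathcal{A}) \subset \mathcal{A}$. Moreover, each $\varphi$ is $\mathcal{A}_n$-measurable if and only if $U_f^n\varphi = \varphi \circ f^n$ is $\mathcal{A}$-measurable. This shows that $U_f^{-n}(L^2_0(M, \mathcal{A}, \mu)) = L^2_0(M, \mathcal{A}_n, \mu)$ for every $n \ge 0$. Observe also that
$$\overline{\bigcup_{n=0}^\infty L^2_0(M, \mathcal{A}_n, \mu)} = L^2_0\Big(M, \bigvee_{n=0}^\infty \mathcal{A}_n, \mu\Big). \qquad (9.5.4)$$
Indeed, it is clear that $L^2_0(M, \mathcal{A}_k, \mu) \subset L^2_0(M, \bigvee_{n=0}^\infty \mathcal{A}_n, \mu)$ for every $k$, since $\mathcal{A}_k$ is contained in $\bigvee_{n=0}^\infty \mathcal{A}_n$. The inclusion $\subset$ in (9.5.4) is an immediate consequence of this observation, since $L^2_0(M, \bigvee_{n=0}^\infty \mathcal{A}_n, \mu)$ is a closed subspace. Now consider any $A \in \bigvee_{n=0}^\infty \mathcal{A}_n$. The approximation theorem (Theorem A.1.19) implies that for each $\varepsilon > 0$ there exist $n$ and $A_n \in \mathcal{A}_n$ such that $\mu(A \,\Delta\, A_n) < \varepsilon$. Since $\|\mathcal{X}_{A_n} - \mathcal{X}_A\|_2 = \mu(A \,\Delta\, A_n)^{1/2}$, it follows that $\mathcal{X}_A - \mu(A)$ is in the closure of $\bigcup_{n=0}^\infty L^2_0(M, \mathcal{A}_n, \mu)$. Since functions of this form span a dense subspace of $L^2_0(M, \bigvee_{n=0}^\infty \mathcal{A}_n, \mu)$, this proves the inclusion $\supset$ in (9.5.4). In this way, we have shown that
$$\overline{\bigcup_{n=0}^\infty U_f^{-n}\big(L^2_0(M, \mathcal{A}, \mu)\big)} = L^2_0\Big(M, \bigvee_{n=0}^\infty \mathcal{A}_n, \mu\Big).$$
Therefore, the hypothesis that $\bigvee_{n=0}^\infty \mathcal{A}_n = \mathcal{B}$ up to measure zero implies that $\overline{\bigcup_{n=0}^\infty U_f^{-n}(E)} = L^2_0(M, \mathcal{B}, \mu)$.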

This concludes the proof that $(f, \mu)$ has Lebesgue spectrum. To prove that the rank is infinite, we need the following lemma:
Lemma 9.5.10. Let $\mathcal{A}$ be any σ-algebra satisfying the conditions in Definition 9.5.8. Then for every $A \in \mathcal{A}$ with $\mu(A) > 0$ there exists $B \in \mathcal{A}$ with $B \subset A$ such that $0 < \mu(B) < \mu(A)$.

Proof. Suppose that $\mathcal{A}$ has some element $A$ with positive measure that does not satisfy the conclusion of the lemma. We claim that $A \cap f^{-k}(A)$ has measure zero for every $k \ge 1$. Then,
$$\mu\big(f^{-i}(A) \cap f^{-j}(A)\big) = \mu\big(A \cap f^{-j+i}(A)\big) = 0 \quad \text{for every } 0 \le i < j.$$
Since $\mu(f^{-j}(A)) = \mu(A)$ for every $j \ge 0$, this implies that the measure $\mu$ is infinite, which is a contradiction. This contradiction reduces the proof of the lemma to proving our claim.
To do that, note that condition (i) implies that $f^{-k}(A) \in f^{-k}(\mathcal{A}) \subset \mathcal{A}$. Then it follows from the choice of $A$ that $A \cap f^{-k}(A)$ must have either zero measure or full measure in $A$:
$$\mu\big(A \cap f^{-k}(A)\big) = 0 \quad \text{or else} \quad \mu\big(A \setminus f^{-k}(A)\big) = 0.$$
So, to prove the claim it suffices to exclude the second possibility. Suppose that $\mu(A \setminus f^{-k}(A)) = 0$. Then (Exercise 1.1.4), there exists $B \in \mathcal{A}$ such that $\mu(A \,\Delta\, B) = 0$ and $f^{-k}(B) = B$. It follows that $B = f^{-nk}(B)$ for every $n \ge 1$ and, thus,
$$B \in \bigcap_{n \in \mathbb{N}} f^{-nk}(\mathcal{A}) = \bigcap_{n \in \mathbb{N}} f^{-n}(\mathcal{A}).$$
By condition (ii), this means that the measure of $B$ is either 0 or 1. Since $\mu(B) = \mu(A)$ is positive, it follows that $\mu(A) = \mu(B) = 1$. Then, the hypothesis about $A$ implies that the σ-algebra $\mathcal{A}$ contains only sets with measure 0 or 1. By condition (iii), it follows that the same is true for the σ-algebra $\mathcal{B}$, which contradicts the assumption that the probability space is non-trivial.

On the way toward proving that the orthogonal complement $F = E \ominus U_f(E)$ has infinite dimension, let us start by checking that $F \ne \{0\}$. Indeed, otherwise we would have $U_f(E) = E$ and, thus, $U_f^n(E) = E$ for every $n \ge 1$. By condition (ii), that would imply that $E = \bigcap_n U_f^n(E) = \{0\}$. Then, by condition (iii), we would have $L^2_0(M, \mathcal{B}, \mu) = \{0\}$ and that would contradict the hypothesis that the probability space is non-trivial.

Let $\varphi$ be any non-zero element of $F$, fixed once and for all, and let $N$ be the set of all $x \in M$ such that $\varphi(x) \ne 0$. Then, $N \in \mathcal{A}$ and $\mu(N) > 0$. It is convenient to consider the space $E' = L^2(M, \mathcal{A}, \mu) = E \oplus \{\text{constants}\}$. Observe that $F$ coincides with $E' \ominus U_f(E')$, because the Koopman operator preserves the line of constant functions. Let $E'_N$ be the subspace of functions $\psi \in E'$ that vanish outside $N$, that is, such that $\psi(x) = 0$ for every $x \in N^c$. Applying Lemma 9.5.10 repeatedly, we may find sets $A_j \in \mathcal{A}$, $j \ge 1$, with positive measure, contained in $N$ and pairwise disjoint. Then, $\mathcal{X}_{A_j}$ is in $E'_N$ for every $j$. Moreover, $A_i \cap A_j = \emptyset$ yields $\mathcal{X}_{A_i} \cdot \mathcal{X}_{A_j} = 0$ for every $i \ne j$. This implies that $E'_N$ has infinite dimension.
Now denote by $U_f(E')_N$ the subspace of functions $\psi \in U_f(E')$ that vanish outside $N$. Denote $F_N = E'_N \ominus U_f(E')_N$. The fact that $\dim E'_N = \infty$ ensures that $\dim F_N = \infty$ or $\dim U_f(E')_N = \infty$ (or both). We are going to show that either of these alternatives implies that $\dim F = \infty$.
To treat the first alternative, it suffices to show that $F_N \subset F$. For that, since it is clear that $F_N \subset E'$, it suffices to check that $F_N$ is orthogonal to $U_f(E')$. Consider any $\xi \in F_N$ and $\eta \in E'$. The function $(U_f\eta)\mathcal{X}_N = U_f(\eta\mathcal{X}_{f^{-1}(N)})$ is in $U_f(E')$ and vanishes outside $N$. In other words, it is in $U_f(E')_N$. Then $\xi \cdot U_f\eta = \xi \cdot (U_f\eta)\mathcal{X}_N = 0$, because $\xi$ vanishes outside $N$ and is orthogonal to $U_f(E')_N$. This completes the argument in this case.
Now we treat the second alternative. Given any $U_f\eta \in U_f(E')_N$ and any $n \in \mathbb{N}$, let $\eta_n = \eta\mathcal{X}_{R_n}$ with $R_n = \{x \in M : |\eta(x)| \le n\}$. Then $(\eta_n)_n$ is a sequence of bounded functions converging to $\eta$ in $E'$. Moreover, every $\eta_n$ vanishes outside $f^{-1}(N)$, because $\eta$ does. Then, $(U_f\eta_n)_n$ is a sequence of bounded functions that vanish outside $N$ and, recalling that $U_f$ is an isometry, this sequence converges to $U_f\eta$ in $E'$. This proves that the subspace of bounded functions is dense in $U_f(E')_N$. Then, since we are assuming that $\dim U_f(E')_N = \infty$, this subspace must also have infinite dimension. Choose $\{\xi_k : k \ge 1\} \subset E'$ such that $\{U_f\xi_k : k \ge 1\}$ is a linearly independent subset of $U_f(E')_N$ consisting of bounded functions. Observe that the products $\varphi(U_f\xi_k)$, $k \ge 1$, form a linearly independent subset of $E'$. Moreover, given any $\eta \in E'$,
$$\varphi(U_f\xi_k) \cdot (U_f\eta) = \int \varphi\,(\xi_k \circ f)(\bar\eta \circ f)\,d\mu = \int \varphi\,(\xi_k\bar\eta) \circ f\,d\mu = \varphi \cdot U_f(\bar\xi_k\eta).$$
This last expression is equal to zero because $\bar\xi_k\eta \in E'$ and the function $\varphi \in F$ is orthogonal to $U_f(E')$. Varying $\eta \in E'$, we conclude that $\varphi(U_f\xi_k)$ is orthogonal to $U_f(E')$ for every $k$. This shows that $\{\varphi(U_f\xi_k) : k \ge 1\}$ is contained in $F$ and, thus, $\dim F = \infty$ also in this case.
This completes the proof that $(f, \mu)$ has infinite rank. When $\mathcal{B}$ is countably generated, $L^2_0(M, \mathcal{B}, \mu)$ is separable (Example 8.4.7) and so the rank is necessarily countable.

We say that a partition of (M, B, μ) is trivial if all its elements have measure
0 or 1. Keep in mind that in the present chapter all partitions are assumed to be
countable.

Proposition 9.5.11. A system (f , μ) in a Lebesgue space is a Kolmogorov


system if and only if hμ (f , P) > 0 for every non-trivial partition with finite
entropy. In particular, every Kolmogorov system has positive entropy.
This result is due to Pinsker [Pin60] and to Rokhlin and Sinai [RS61]. The
proof may also be found in Rokhlin [Rok67a, §13]. Let us point out, however,
that the last part of the statement is an immediate consequence of the ideas in
Section 9.5.2. Indeed, suppose that (f , μ) is a Kolmogorov system with zero
entropy. By Corollary 9.5.6, any σ -algebra A that satisfies condition (i) in
Definition 9.5.8 also satisfies f −1 (A) = A up to measure zero. Then, condition
(ii) implies that A is trivial and, by condition (iii), the σ -algebra B itself is
trivial (contradicting the assumption we made at the beginning of this section).
It follows from Proposition 9.5.11 and the relation (9.1.21) that the
inverse of a Kolmogorov automorphism is also a Kolmogorov automorphism.
Unlike what happens for Bernoulli automorphisms (Exercise 9.5.1), in the
Kolmogorov case the two systems (f , μ) and (f −1 , μ) need not be ergodically
equivalent.
Example 9.5.12. The first example of an invertible system with countable
Lebesgue spectrum that is not a Kolmogorov system was found by Girsanov
in 1959, but was never published. Another example, a factor of a certain
Gaussian shift (recall Example 8.4.13) with countable Lebesgue spectrum
but whose entropy vanishes, was exhibited a few years later by Newton
and Parry [NP66]. Also, Gurevič [Gur61] proved that the horocyclic flow on surfaces with constant negative curvature has entropy zero; some time earlier, Parasyuk [Par53] had shown that such flows have countable Lebesgue spectrum.
As we saw in Theorem 8.4.11, all systems with countable Lebesgue
spectrum are spectrally equivalent. Therefore, the fact that systems as in
Example 9.5.12 do exist has the interesting consequence that being a
Kolmogorov system is not an invariant of spectral equivalence.
Example 9.5.13. We saw in Examples 8.4.2 and 8.4.3 that all the Bernoulli
shifts have Lebesgue spectrum. In both cases, one-sided and two-sided, we
exhibited subspaces of L02 (M, B, μ) of the form E = L02 (M, A, μ) for some
σ -algebra A ⊂ B. Therefore, the same argument proves that every Bernoulli
shift is a Kolmogorov system. In particular, every Bernoulli automorphism is
a Kolmogorov automorphism.

There exist Kolmogorov automorphisms that are not Bernoulli automorphisms. The first example, found by Ornstein, is quite elaborate. The following simple construction is due to Kalikow [Kal82]:
Example 9.5.14. Let $\sigma : \Sigma \to \Sigma$ be the shift map in $\Sigma = \{1, 2\}^{\mathbb{Z}}$ and $\mu$ be the Bernoulli measure associated with the probability vector $p = (1/2, 1/2)$. Consider the map $f : \Sigma \times \Sigma \to \Sigma \times \Sigma$ defined as follows:
$$f\big((x_n)_n, (y_n)_n\big) = \big(\sigma((x_n)_n), \sigma^{\pm 1}((y_n)_n)\big)$$
where the sign is $-$ if $x_0 = 1$ and is $+$ if $x_0 = 2$. This map preserves the product measure $\mu \times \mu$. Kalikow showed that $(f, \mu \times \mu)$ is a Kolmogorov automorphism but not a Bernoulli automorphism.
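The dynamics of $f$ can be made concrete by tracking only how far each coordinate has been shifted: $f^j(x, y) = (\sigma^j(x), \sigma^{b_j}(y))$, where $b_j$ changes by $-1$ or $+1$ according to the coordinate $x_{j-1}$, so that for $\mu$-typical $x$ it is a simple random walk. The sketch below is ours, not Kalikow's; representing the sequence $x$ as a Python function of the index, and the helper names, are hypothetical choices.

```python
import random

def kalikow_offsets(x, n):
    """Offsets (a_j, b_j), j = 0..n, with f^j(x, y) = (sigma^{a_j} x, sigma^{b_j} y).

    `x` maps an integer k to the coordinate x_k in {1, 2}. The second coordinate
    y does not influence the offsets, so it is not needed here.
    """
    a, b = 0, 0
    offsets = [(a, b)]
    for _ in range(n):
        # The 0-th coordinate of sigma^a(x) is x_a; it decides the sign.
        b += -1 if x(a) == 1 else 1
        a += 1
        offsets.append((a, b))
    return offsets

def random_sequence(seed):
    """Lazily sampled coordinates of a (1/2, 1/2)-Bernoulli sequence."""
    rng, cache = random.Random(seed), {}
    def x(k):
        if k not in cache:
            cache[k] = rng.choice((1, 2))
        return cache[k]
    return x

print(kalikow_offsets(lambda k: 1 if k % 2 == 0 else 2, 4))
# [(0, 0), (1, -1), (2, 0), (3, -1), (4, 0)]
walk = kalikow_offsets(random_sequence(0), 1000)
print(walk[-1])   # (1000, b_1000) for one random x
```

Reading off the $0$-th coordinate of the second component along the orbit gives $y_{b_j}$, that is, the sequence $y$ sampled along the random walk $(b_j)_j$.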

Consider any Kolmogorov automorphism that is not a Bernoulli automorphism
and let s > 0 be its entropy. Consider any Bernoulli automorphism whose
entropy is equal to s (see Exercise 9.2.2). The two systems have the same
entropy but they cannot be ergodically equivalent, since being a Bernoulli
automorphism is an invariant of ergodic equivalence. Therefore, entropy is
not a complete invariant of ergodic equivalence for Kolmogorov automorphisms.
Actually (see Ornstein and Shields [OS73]), there exists an uncountable family
of Kolmogorov automorphisms that have the same entropy and yet no two of
which are ergodically equivalent.
The properties of Bernoulli automorphisms described in Exercise 9.5.1 do
not extend to the Kolmogorov case: there exist Kolmogorov automorphisms
that are not ergodically equivalent to their inverses (see Ornstein and
Shields [OS73]), and there are also Kolmogorov automorphisms that admit
no k-th root for any value of k ≥ 2 (Clark [Cla72]).
Closing this section, let us discuss the Kolmogorov property for two specific
classes of systems: Markov shifts and automorphisms of compact groups.
Concerning the first class, Friedman and Ornstein [FO70] proved that
every two-sided mixing Markov shift is a Bernoulli automorphism. Recall
(Theorem 7.2.11) that a Markov shift is mixing if and only if the corresponding
stochastic matrix is aperiodic. It follows from the theorem of Friedman and
Ornstein that the entropy is still a complete invariant of ergodic equivalence
in the context of two-sided mixing Markov shifts. Another interesting
consequence is that every two-sided mixing Markov shift is a Kolmogorov
automorphism. Observe, however, that this consequence admits a relatively
easy direct proof (see Exercise 9.5.4).
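Since the mixing criterion above reduces to aperiodicity of the stochastic matrix, it may help to see how that condition could be tested by computer. The sketch below assumes the convention that P is aperiodic when some power $P^n$ has all entries positive, and it stops the search at Wielandt's classical bound $(d-1)^2 + 1$ for d × d matrices (a standard fact, not stated in the text). The function name is hypothetical. Only the zero pattern of P matters, so boolean matrix products are used.

```python
def is_aperiodic(P: list[list[float]]) -> bool:
    """Return True if some power of the square matrix P has all entries positive."""
    d = len(P)
    pattern = [[P[i][j] > 0 for j in range(d)] for i in range(d)]
    power = [row[:] for row in pattern]  # zero pattern of P^1
    # Wielandt: if some power is positive, then already P^((d-1)^2 + 1) is.
    for _ in range((d - 1) ** 2 + 1):
        if all(all(row) for row in power):
            return True
        # Boolean product: zero pattern of the next power of P.
        power = [[any(power[i][k] and pattern[k][j] for k in range(d))
                  for j in range(d)] for i in range(d)]
    return False
```

For example, the matrix with rows (1/2, 1/2) and (1, 0) is aperiodic (its square is positive), while the permutation matrix with rows (0, 1) and (1, 0) is not.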
As for the second class, every ergodic automorphism of a compact group
is a Kolmogorov automorphism. This was proven by Rokhlin [Rok67b] for
abelian groups and by Yuzvinskii [Yuz68] in the general case. In fact, ergodic
automorphisms of metrizable compact groups are Bernoulli automorphisms
(Lind [Lin77], Miles and Thomas [MT78]). In particular, every ergodic
linear automorphism of the torus $\mathbb{T}^d$ is a Bernoulli automorphism; this had
been proved by Katznelson [Kat71]. Recall (Theorem 4.2.14) that a linear
automorphism $f_A$ is ergodic if and only if no eigenvalue of the matrix A is a
root of unity.
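In dimension d = 2 the criterion of Theorem 4.2.14 can be checked exactly with integer arithmetic. The only cyclotomic polynomials of degree at most 2 are x − 1, x + 1, x² + 1 and x² ± x + 1, so an integer 2 × 2 matrix has a root of unity as an eigenvalue exactly when its characteristic polynomial vanishes at 1 or −1, or equals one of the three quadratics. The sketch below is restricted to d = 2 by assumption, and its function name is hypothetical.

```python
def toral_automorphism_is_ergodic(A: list[list[int]]) -> bool:
    """Ergodicity of f_A on T^2: no eigenvalue of A is a root of unity."""
    (a, b), (c, d) = A
    trace, det = a + d, a * d - b * c
    if det not in (1, -1):
        raise ValueError("A must have determinant +1 or -1 to define an automorphism of T^2")

    def charpoly(x: int) -> int:
        return x * x - trace * x + det

    if charpoly(1) == 0 or charpoly(-1) == 0:
        return False  # eigenvalue 1 or -1
    if det == 1 and trace in (-1, 0, 1):
        return False  # x^2 + x + 1, x^2 + 1 or x^2 - x + 1: roots of unity of order 3, 4 or 6
    return True
```

For instance, the matrix with rows (2, 1) and (1, 1) gives an ergodic automorphism, whereas the rotation with rows (0, −1) and (1, 0), whose eigenvalues are ±i, does not.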
9.5 Entropy and equivalence 285

9.5.4 Exact systems


We say that a Kolmogorov system is exact if one may take the σ-algebra $\mathcal{A}$ in
Definition 9.5.8 to be the σ-algebra $\mathcal{B}$ of all measurable sets. Note that in this
case the conditions (i) and (iii) are automatically satisfied. Therefore, a system
(f, μ) is exact if and only if the σ-algebra $\mathcal{B}$ is such that $\bigcap_{n=0}^{\infty} f^{-n}(\mathcal{B})$ is trivial,
meaning that it only contains sets with measure 0 or 1. Equivalently, (f, μ) is
exact if and only if

$$\bigcap_{n=0}^{\infty} U_f^n\big(L^2_0(M, \mathcal{B}, \mu)\big) = \{0\}.$$

This observation also implies that, unlike the Kolmogorov property, exactness
is an invariant of spectral equivalence.
We saw in Example 8.4.2 that every one-sided Bernoulli shift has Lebesgue
spectrum. To prove this, we considered the subspace $E = L^2_0(M, \mathcal{B}, \mu)$.
Therefore, the same argument proves that every one-sided Bernoulli shift is an
exact system. A much larger class of examples, expanding maps endowed with
their equilibrium states, is studied in Chapter 12.
It is immediate that invertible systems are never exact: in the invertible case
$f^{-n}(\mathcal{B}) = \mathcal{B}$ up to measure zero, for every n; therefore, the exactness condition
amounts to saying that the σ-algebra $\mathcal{B}$ is trivial (which is excluded, by
hypothesis).
Figure 9.2 summarizes the relations between the different classes of systems
studied in this book. It is organized in three columns: systems with zero entropy
(which are necessarily invertible, as we saw in Proposition 9.5.5), invertible
systems with positive entropy and non-invertible systems.

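[Figure 9.2: diagram in three columns (h = 0; h > 0, invertible; non-invertible) relating the classes RT, B2, B1, Bernoulli automorphisms, exact systems, Kolmogorov systems, discrete spectrum, Lebesgue spectrum, mixing systems and ergodic systems.]
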
Figure 9.2. Relations between various classes of systems (B1/B2 =
one-sided/two-sided Bernoulli shifts, RT = irrational rotations on tori)

9.5.5 Exercises
9.5.1. Show that if (f, μ) is a Bernoulli automorphism then it is ergodically equivalent
to its inverse $(f^{-1}, \mu)$. Moreover, for every k ≥ 1 there exists a Bernoulli
automorphism (g, ν) that is a k-th root of (f, μ), that is, such that $(g^k, \nu)$
is ergodically equivalent to (f, μ). [Observation: Ornstein [Orn74] proved
that, conversely, every k-th root of a Bernoulli automorphism is a Bernoulli
automorphism.]
9.5.2. Use the notion of density point to show that the decimal expansion map f (x) =
10x − [10x] is exact, relative to the Lebesgue measure.
9.5.3. Show that the Gauss map is exact, relative to its absolutely continuous invariant
measure μ.
9.5.4. Show that the two-sided Markov shift associated with any aperiodic stochastic
matrix P is a Kolmogorov automorphism.
9.5.5. Show that the one-sided Markov shift associated with any aperiodic stochastic
matrix P is an exact system.
9.5.6. Prove that if (f , μ) is exact then hμ (f , P) > 0 for every non-trivial partition P
with finite entropy.

9.6 Entropy and ergodic decomposition


It is not difficult to show that the entropy $h_\mu(f)$ is always an affine function of
the invariant measure μ:

Proposition 9.6.1. Let μ and ν be probability measures invariant under a
transformation f: M → M. Then $h_{t\mu+(1-t)\nu}(f) = t\,h_\mu(f) + (1-t)\,h_\nu(f)$ for every
0 < t < 1.

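Proof. Define φ(x) = −x log x for x > 0 (and φ(0) = 0). On the one hand, since the function
φ is concave,

$$\phi\big(t\mu(B) + (1-t)\nu(B)\big) \ge t\,\phi(\mu(B)) + (1-t)\,\phi(\nu(B))$$

for every measurable set B ⊂ M. On the other hand, given any measurable set
B ⊂ M (when μ(B) = 0 or ν(B) = 0 the corresponding term below is zero),

$$\begin{aligned}
&\phi\big(t\mu(B) + (1-t)\nu(B)\big) - t\,\phi(\mu(B)) - (1-t)\,\phi(\nu(B)) \\
&\quad = -t\mu(B)\log\frac{t\mu(B) + (1-t)\nu(B)}{\mu(B)} - (1-t)\nu(B)\log\frac{t\mu(B) + (1-t)\nu(B)}{\nu(B)} \\
&\quad \le -t\mu(B)\log t - (1-t)\nu(B)\log(1-t),
\end{aligned}$$

because the function −log is decreasing and the two quotients are bounded
below by t and 1 − t, respectively. Therefore, summing over the elements B of
any partition $\mathcal{P}$ with finite entropy (and using that μ and ν are probability measures),

$$H_{t\mu+(1-t)\nu}(\mathcal{P}) \ge t H_\mu(\mathcal{P}) + (1-t) H_\nu(\mathcal{P}) \quad \text{and}$$
$$H_{t\mu+(1-t)\nu}(\mathcal{P}) \le t H_\mu(\mathcal{P}) + (1-t) H_\nu(\mathcal{P}) - t\log t - (1-t)\log(1-t).$$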

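Consequently, applying these inequalities to the partitions $\mathcal{P}^n = \bigvee_{j=0}^{n-1} f^{-j}(\mathcal{P})$,
dividing by n and letting n → ∞ (the constant term −t log t − (1 − t) log(1 − t)
disappears in the limit),

$$h_{t\mu+(1-t)\nu}(f, \mathcal{P}) = t\,h_\mu(f, \mathcal{P}) + (1-t)\,h_\nu(f, \mathcal{P}). \tag{9.6.1}$$

It follows immediately that $h_{t\mu+(1-t)\nu}(f) \le t\,h_\mu(f) + (1-t)\,h_\nu(f)$. Moreover,
the relations (9.1.16) and (9.6.1) imply that

$$h_{t\mu+(1-t)\nu}(f, \mathcal{P}_1 \vee \mathcal{P}_2) \ge t\,h_\mu(f, \mathcal{P}_1) + (1-t)\,h_\nu(f, \mathcal{P}_2)$$

for any partitions $\mathcal{P}_1$ and $\mathcal{P}_2$ with finite entropy. Taking the supremum over
$\mathcal{P}_1$ and $\mathcal{P}_2$ we obtain that $h_{t\mu+(1-t)\nu}(f) \ge t\,h_\mu(f) + (1-t)\,h_\nu(f)$.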
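The two inequalities for $H_{t\mu+(1-t)\nu}(\mathcal{P})$ in the proof can be checked numerically on a finite partition, with each measure given by the masses it assigns to the elements of $\mathcal{P}$. The sketch below (hypothetical function names, illustration only) returns the lower bound, the mixture entropy and the upper bound; when μ and ν live on disjoint elements the upper bound is attained, which shows that the correction −t log t − (1 − t) log(1 − t) cannot be dropped at the level of a single partition.

```python
import math


def phi(x: float) -> float:
    """phi(x) = -x log x, extended by phi(0) = 0."""
    return -x * math.log(x) if x > 0 else 0.0


def partition_entropy(masses: list[float]) -> float:
    """H(P): sum of phi over the masses of the elements of P."""
    return sum(phi(m) for m in masses)


def mixture_bounds(mu: list[float], nu: list[float], t: float) -> tuple[float, float, float]:
    """Return (lower, H of the mixture, upper) for the mixture t*mu + (1 - t)*nu.

    mu[i], nu[i] are the masses of the i-th element of a finite partition; 0 < t < 1.
    """
    mix = [t * a + (1 - t) * b for a, b in zip(mu, nu)]
    lower = t * partition_entropy(mu) + (1 - t) * partition_entropy(nu)
    upper = lower + phi(t) + phi(1 - t)  # adds -t log t - (1 - t) log(1 - t)
    return lower, partition_entropy(mix), upper
```

With μ = (1, 0), ν = (0, 1) and t = 1/2, the lower bound is 0 while the mixture entropy equals the upper bound log 2.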

In particular, given any invariant set A ⊂ M with 0 < μ(A) < 1, we have that

$$h_\mu(f) = \mu(A)\,h_{\mu_A}(f) + \mu(A^c)\,h_{\mu_{A^c}}(f), \tag{9.6.2}$$

where $\mu_A$ and $\mu_{A^c}$ denote the normalized restrictions of μ to the set A and
its complement, respectively (this fact was obtained before, in Exercise 9.1.4).
Another immediate consequence is the following version of Proposition 9.6.1
for finite convex combinations:

$$\mu = \sum_{i=1}^n t_i \mu_i \;\Rightarrow\; h_\mu(f) = \sum_{i=1}^n t_i\, h_{\mu_i}(f), \tag{9.6.3}$$

for any invariant probability measures $\mu_1, \dots, \mu_n$ and any positive numbers
$t_1, \dots, t_n$ with $\sum_{i=1}^n t_i = 1$.
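For a mixture of two Bernoulli measures $\mu_p$ and $\mu_q$ on the shift, (9.6.3) can be watched at work. For a Bernoulli measure the partition into n-cylinders has entropy exactly n h(p), so the two inequalities in the proof of Proposition 9.6.1 show that (1/n) H of the mixture exceeds $t\,h(p) + (1-t)\,h(q)$ by at least 0 and at most (log 2)/n. The following sketch (hypothetical names, an illustration under these assumptions) computes that quantity by brute force over all cylinders.

```python
import itertools
import math


def bernoulli_entropy(p: tuple[float, ...]) -> float:
    """Entropy of the Bernoulli shift with probability vector p: -sum p_a log p_a."""
    return -sum(x * math.log(x) for x in p if x > 0)


def mixture_cylinder_entropy(p: tuple[float, ...], q: tuple[float, ...], t: float, n: int) -> float:
    """Entropy of the partition into n-cylinders for the measure t*mu_p + (1 - t)*mu_q."""
    total = 0.0
    for word in itertools.product(range(len(p)), repeat=n):
        m = t * math.prod(p[a] for a in word) + (1 - t) * math.prod(q[a] for a in word)
        if m > 0:
            total -= m * math.log(m)
    return total
```

With p = (1/2, 1/2), q = (9/10, 1/10) and t = 1/4, the values of mixture_cylinder_entropy(p, q, t, n)/n approach $t\,h(p) + (1-t)\,h(q)$ from above as n grows.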
A much deeper fact, due to Konrad Jacobs [Jac60, Jac63], is that the affinity
property extends to the ergodic decomposition given by Theorem 5.1.3:

Theorem 9.6.2 (Jacobs). Suppose that M is a complete separable metric
space. Given any invariant probability measure μ, let $\{\mu_P : P \in \mathcal{P}\}$ be its
ergodic decomposition. Then $h_\mu(f) = \int h_{\mu_P}(f)\, d\hat\mu(P)$ (if one side is infinite,
so is the other side).

We are going to deduce this result from a general theorem about affine
functionals in the space of probability measures, which we state in Section 9.6.1
and prove in Section 9.6.2.

9.6.1 Affine property


Let M be a complete separable metric space. We saw in Lemma 2.1.3 that the
weak* topology in the space of probability measures $\mathcal{M}_1(M)$ is metrizable.
Moreover (Exercise 2.1.3), the metric space $\mathcal{M}_1(M)$ is separable.
Let W be a probability measure on the Borel σ-algebra of $\mathcal{M}_1(M)$. The
barycenter of W is the probability measure $\mathrm{bar}(W) \in \mathcal{M}_1(M)$ given by

$$\int \psi \, d\,\mathrm{bar}(W) = \int \Big( \int \psi \, d\eta \Big)\, dW(\eta) \tag{9.6.4}$$

for every bounded measurable function ψ: M → ℝ. We leave it to the reader to
check that this relation determines bar(W) uniquely (Exercise 9.6.1) and that
the barycenter is an affine function of the measure (Exercise 9.6.2).
Example 9.6.3. If W is a Dirac measure, that is, if $W = \delta_\nu$ for some
$\nu \in \mathcal{M}_1(M)$, then bar(W) = ν. Using Exercise 9.6.2, we get the following
generalization: if $W = \sum_{i=1}^\infty t_i \delta_{\nu_i}$ with $t_i \ge 0$, $\sum_{i=1}^\infty t_i = 1$ and $\nu_i \in \mathcal{M}_1(M)$
for every i, then $\mathrm{bar}(W) = \sum_{i=1}^\infty t_i \nu_i$.
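When M is a finite set and W is a finite combination of Dirac masses, Example 9.6.3 turns the defining relation (9.6.4) into plain arithmetic: bar(W) is the weighted average of the probability vectors $\nu_i$, and integrating any ψ against it gives the same weighted average of the integrals. The sketch below assumes a finite M = {0, ..., k − 1} and finitely supported W; the function names are hypothetical.

```python
def barycenter(W: list[tuple[float, list[float]]]) -> list[float]:
    """bar(W) for W = sum_i t_i * delta_{nu_i}, each nu_i a probability vector on {0, ..., k-1}."""
    k = len(W[0][1])
    return [sum(t * nu[j] for t, nu in W) for j in range(k)]


def integrate(psi: list[float], measure: list[float]) -> float:
    """Integral of psi (given by its values) with respect to a measure on a finite set."""
    return sum(a * b for a, b in zip(psi, measure))
```

For W = 0.2 δ_{(1, 0, 0)} + 0.8 δ_{(0, 1/2, 1/2)} the barycenter is (0.2, 0.4, 0.4), and for ψ = (1, 2, 3) both sides of (9.6.4) equal 2.2.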
Example 9.6.4. Let $\{\mu_P : P \in \mathcal{P}\}$ be the ergodic decomposition of a probability
measure μ invariant under a measurable transformation f: M → M and $\hat\mu$ be
the associated quotient measure on $\mathcal{P}$ (recall Section 5.1.1). Let W be the
image of the quotient measure $\hat\mu$ under the map $\mathcal{P} \to \mathcal{M}_1(M)$ that assigns
to each $P \in \mathcal{P}$ the conditional probability $\mu_P$. Then (Exercise 5.1.4),

$$\int \psi\, d\mu = \int \Big( \int \psi\, d\mu_P \Big)\, d\hat\mu(P) = \int \Big( \int \psi\, d\eta \Big)\, dW(\eta)$$

for every bounded measurable function ψ: M → ℝ. This means that μ is the
barycenter of W.

A set $\mathcal{M} \subset \mathcal{M}_1(M)$ is said to be strongly convex if $\sum_{i=1}^\infty t_i \nu_i \in \mathcal{M}$ for any
$\nu_i \in \mathcal{M}$ and $t_i \ge 0$ with $\sum_{i=1}^\infty t_i = 1$.
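Theorem 9.6.5. Let $\mathcal{M}$ be a strongly convex subset of $\mathcal{M}_1(M)$ and $H: \mathcal{M} \to \mathbb{R}$
be a non-negative affine functional. If H is upper semi-continuous then

$$H(\mathrm{bar}(W)) = \int H(\eta)\, dW(\eta)$$

for any probability measure W on $\mathcal{M}_1(M)$ with $W(\mathcal{M}) = 1$ and $\mathrm{bar}(W) \in \mathcal{M}$.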


Before proving this result, let us explain how Theorem 9.6.2 may be
obtained from it. The essential step is the following lemma:
Lemma 9.6.6. $h_\mu(f, \mathcal{Q}) = \int h_{\mu_P}(f, \mathcal{Q})\, d\hat\mu(P)$ for any finite partition $\mathcal{Q}$ of M.

Proof. Let $\mathcal{M} = \mathcal{M}_1(f)$, the subspace of invariant probability measures, and
$H: \mathcal{M} \to \mathbb{R}$ be the functional defined by $H(\eta) = h_\eta(f, \mathcal{Q})$. Let W be the
image of the quotient measure $\hat\mu$ under the map $\mathcal{P} \to \mathcal{M}$ that assigns to
each $P \in \mathcal{P}$ the conditional probability $\mu_P$. It is clear that $\mathcal{M}$ is strongly
convex, $W(\mathcal{M}) = 1$ and (recall Example 9.6.4) the barycenter bar(W) = μ
is in $\mathcal{M}$. Proposition 9.6.1 shows that H is affine and it is clear that H is
non-negative. In order to apply Theorem 9.6.5, we also need to check that
H is upper semi-continuous.
Initially, suppose that f is the shift map in a space $\Sigma = X^{\mathbb{N}}$, where X is a
finite set, and $\mathcal{Q}$ is the partition of Σ into cylinders [0; a], a ∈ X. The point
of this partition is that its elements are both open and closed subsets of Σ.
In other words, ∂Q = ∅ for every $Q \in \mathcal{Q}$. By Proposition 9.2.12, it follows that

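the map $\eta \mapsto H(\eta) = h_\eta(f, \mathcal{Q})$ is upper semi-continuous at every point of $\mathcal{M}$.
So, we may indeed apply Theorem 9.6.5 to the functional H. In this way we
get that

$$\begin{aligned}
h_\mu(f, \mathcal{Q}) &= H(\mu) = H(\mathrm{bar}(W)) = \int H(\eta)\, dW(\eta) \\
&= \int H(\mu_P)\, d\hat\mu(P) = \int h_{\mu_P}(f, \mathcal{Q})\, d\hat\mu(P).
\end{aligned}$$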

Now we treat the general case of the lemma, by reduction to the previous
paragraph. Given any finite partition $\mathcal{Q}$, let $\Sigma = \mathcal{Q}^{\mathbb{N}}$ and

$$h: M \to \Sigma, \qquad h(x) = \big(\mathcal{Q}(f^n(x))\big)_{n \in \mathbb{N}}.$$

Observe that h ∘ f = σ ∘ h, where σ: Σ → Σ denotes the shift map. To each
measure η on M we may assign the measure $\eta' = h_*\eta$ on Σ. The previous
relation ensures that if η is invariant under f then η′ is invariant under σ.
Moreover, if η is ergodic for f then η′ is ergodic for σ. Indeed, if B′ ⊂ Σ is
invariant under σ then $B = h^{-1}(B')$ is invariant under f. Assuming that η is
ergodic, it follows that η′(B′) = η(B) is either 0 or 1; hence, η′ is ergodic.
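For a concrete instance of the coding map h, take (as an assumption, not an example from the text) the doubling map x ↦ 2x mod 1 on [0, 1) with $\mathcal{Q} = \{[0, 1/2), [1/2, 1)\}$, labeled 0 and 1. Then h(x) is the sequence of binary digits of x, and the relation h ∘ f = σ ∘ h says that doubling x deletes the first digit. The sketch below uses exact rational arithmetic to avoid rounding; its function names are hypothetical.

```python
from fractions import Fraction


def doubling(x: Fraction) -> Fraction:
    """f(x) = 2x mod 1 on [0, 1), computed exactly."""
    return (2 * x) % 1


def q_label(x: Fraction) -> int:
    """Label of the element of Q = {[0, 1/2), [1/2, 1)} that contains x."""
    return 0 if x < Fraction(1, 2) else 1


def coding(x: Fraction, n: int) -> list[int]:
    """First n terms of h(x) = (Q(f^k(x)))_k, i.e. the first n binary digits of x."""
    digits = []
    for _ in range(n):
        digits.append(q_label(x))
        x = doubling(x)
    return digits
```

For example, coding(Fraction(3, 8), 3) is [0, 1, 1], the binary expansion 0.011 of 3/8, and coding(doubling(x), n) agrees with coding(x, n + 1) with its first term removed.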
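By construction, $\mathcal{Q} = h^{-1}(\mathcal{Q}')$, where $\mathcal{Q}'$ denotes the partition of Σ into the
cylinders [0; Q], $Q \in \mathcal{Q}$. More generally, $\bigvee_{j=0}^{n-1} f^{-j}(\mathcal{Q}) = h^{-1}\big(\bigvee_{j=0}^{n-1} \sigma^{-j}(\mathcal{Q}')\big)$
and, thus,

$$H_\eta\Big(\bigvee_{j=0}^{n-1} f^{-j}(\mathcal{Q})\Big) = H_{\eta'}\Big(\bigvee_{j=0}^{n-1} \sigma^{-j}(\mathcal{Q}')\Big)$$

for every n ∈ ℕ. Dividing by n and taking the limit,

$$h_\eta(f, \mathcal{Q}) = h_{\eta'}(\sigma, \mathcal{Q}') \quad \text{for every } \eta \in \mathcal{M}. \tag{9.6.5}$$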
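Denote $\mu' = h_*\mu$ and $\mu'_P = h_*(\mu_P)$ for each P. For every bounded measurable
function ψ: Σ → ℝ,

$$\begin{aligned}
\int \psi\, d\mu' &= \int (\psi \circ h)\, d\mu = \int \Big( \int (\psi \circ h)\, d\mu_P \Big)\, d\hat\mu(P) \\
&= \int \Big( \int \psi\, d\mu'_P \Big)\, d\hat\mu(P).
\end{aligned} \tag{9.6.6}$$

As the measures $\mu'_P$ are ergodic, the relation (9.6.6) means that $\{\mu'_P : P \in \mathcal{P}\}$
is an ergodic decomposition of μ′. Then, according to the previous paragraph,
$h_{\mu'}(\sigma, \mathcal{Q}') = \int h_{\mu'_P}(\sigma, \mathcal{Q}')\, d\hat\mu(P)$. By the relation (9.6.5) applied to η = μ and
to η = $\mu_P$, this may be rewritten as

$$h_\mu(f, \mathcal{Q}) = \int h_{\mu_P}(f, \mathcal{Q})\, d\hat\mu(P),$$

which is precisely what we wanted to prove.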

Proceeding with the proof of Theorem 9.6.2, consider any increasing
sequence $\mathcal{Q}_1 \prec \cdots \prec \mathcal{Q}_n \prec \cdots$ of finite partitions of M such that $\operatorname{diam} \mathcal{Q}_n(x)$

converges to zero at every x ∈ M (such a sequence may be constructed, for
instance, from a family of balls centered at the points of a countable dense
subset, with radii converging to zero). By Lemma 9.6.6,

$$h_\mu(f, \mathcal{Q}_n) = \int h_{\mu_P}(f, \mathcal{Q}_n)\, d\hat\mu(P) \tag{9.6.7}$$

for every n. According to (9.1.16), the sequence $h_\eta(f, \mathcal{Q}_n)$ is non-decreasing,
for any invariant measure η. Moreover, by Corollary 9.2.5, its limit is equal to
$h_\eta(f)$. Then, we may pass to the limit in (9.6.7) with the aid of the monotone
convergence theorem. In this way we get that

$$h_\mu(f) = \int h_{\mu_P}(f)\, d\hat\mu(P),$$

as we wanted to prove. Note that the argument remains valid even when either
of the two sides of this identity is infinite (then the other one is also infinite).
In this way, we reduced the proof of Theorem 9.6.2 to proving Theorem 9.6.5.

9.6.2 Proof of the Jacobs theorem


Now we prove Theorem 9.6.5. Let us start by proving that the barycenter
function has the following continuity property: if W is concentrated in a
neighborhood V of a given measure ν then the barycenter of W is close to
ν. More precisely:

Lemma 9.6.7. Let W be a probability measure on $\mathcal{M}_1(M)$ and $\nu \in \mathcal{M}_1(M)$.
Given any finite set $\Phi = \{\phi_1, \dots, \phi_N\}$ of bounded continuous functions and any
ε > 0, let V = V(ν, Φ, ε) be as defined in (2.1.1). If W(V) = 1, then bar(W) ∈ V.

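Proof. Consider any i = 1, …, N. By the definition of barycenter and the
hypothesis that the complement of V has measure zero,

$$\Big| \int \phi_i\, d\,\mathrm{bar}(W) - \int \phi_i\, d\nu \Big| = \Big| \int \Big( \int \phi_i\, d\eta - \int \phi_i\, d\nu \Big)\, dW(\eta) \Big| \le \int_V \Big| \int \phi_i\, d\eta - \int \phi_i\, d\nu \Big|\, dW(\eta).$$

By the definition of V, the integrand is smaller than ε at every η ∈ V and, since
W(V) = 1, the last expression is smaller than ε. Therefore,

$$\Big| \int \phi_i\, d\,\mathrm{bar}(W) - \int \phi_i\, d\nu \Big| < \varepsilon$$

for every i = 1, …, N. In other words, bar(W) ∈ V.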

We also use the following simple property of non-negative affine
functionals:

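Lemma 9.6.8. For any non-negative affine functional $H: \mathcal{M} \to \mathbb{R}$, probability
measures $\nu_i \in \mathcal{M}$, i ≥ 1, and non-negative numbers $t_i$, i ≥ 1, with $\sum_{i=1}^\infty t_i = 1$,

$$H\Big( \sum_{i=1}^\infty t_i \nu_i \Big) \ge \sum_{i=1}^\infty t_i H(\nu_i).$$

Proof. Define $s_n = \sum_{i=1}^n t_i$ for every n ≥ 1. Let $R_n = (1 - s_n)^{-1} \sum_{i>n} t_i \nu_i$ if
$s_n < 1$; otherwise, pick $R_n \in \mathcal{M}$ arbitrarily. Note that $R_n \in \mathcal{M}$ because $\mathcal{M}$ is
strongly convex. Then,

$$\sum_{i=1}^\infty t_i \nu_i = \sum_{i=1}^n t_i \nu_i + (1 - s_n) R_n.$$

Since H is affine and the expression on the right-hand side is a (finite) convex
combination, it follows that

$$H\Big( \sum_{i=1}^\infty t_i \nu_i \Big) = \sum_{i=1}^n t_i H(\nu_i) + (1 - s_n) H(R_n) \ge \sum_{i=1}^n t_i H(\nu_i)$$

for every n. Now just make n go to infinity.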

Corollary 9.6.9. If $H: \mathcal{M} \to \mathbb{R}$ is a non-negative affine functional then H is
bounded.

Proof. Suppose that H is not bounded: there exist $\nu_i \in \mathcal{M}$ such that $H(\nu_i) \ge 2^i$
for every i ≥ 1. Consider $\nu = \sum_{i=1}^\infty 2^{-i} \nu_i$. By Lemma 9.6.8,

$$H(\nu) \ge \sum_{i=1}^\infty 2^{-i} H(\nu_i) = \infty.$$

This contradicts the fact that H(ν) is finite.

Now we are ready to prove the inequality ≥ in Theorem 9.6.5. Let us write
μ = bar(W). By the hypothesis of semi-continuity, given any ε > 0 there exist
δ > 0 and a finite family $\Phi = \{\phi_1, \dots, \phi_N\}$ of bounded continuous functions
such that

$$H(\eta) < H(\mu) + \varepsilon \quad \text{for every } \eta \in \mathcal{M} \cap V(\mu, \Phi, \delta). \tag{9.6.8}$$

Since $\mathcal{M}_1(M)$ is a separable metric space, it admits a countable basis of open
sets, and then so does any subspace. Let $\{V_1, \dots, V_n, \dots\}$ be a basis of open sets
of $\mathcal{M}$, with the following properties:

(i) every $V_n$ is contained in $\mathcal{M} \cap V(\nu_n, \Phi, \delta)$ for some $\nu_n \in \mathcal{M}$;
(ii) $H(\eta) < H(\nu_n) + \varepsilon$ for every $\eta \in V_n$.

Consider the partition $\{P_1, \dots, P_n, \dots\}$ of the space $\mathcal{M}$ defined by $P_1 = V_1$ and
$P_n = V_n \setminus (V_1 \cup \cdots \cup V_{n-1})$ for every n > 1. It is clear that the properties (i) and
(ii) remain valid if we replace $V_n$ by $P_n$. We claim that

$$\sum_n W(P_n)\, \nu_n \in V(\mu, \Phi, \delta). \tag{9.6.9}$$

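Indeed, observe that

$$\Big| \int \phi_i\, d\mu - \sum_n W(P_n) \int \phi_i\, d\nu_n \Big| = \Big| \sum_n \int_{P_n} \Big( \int \phi_i\, d\eta - \int \phi_i\, d\nu_n \Big)\, dW(\eta) \Big|$$

for every i. Therefore, property (i) ensures that

$$\Big| \int \phi_i\, d\mu - \sum_n W(P_n) \int \phi_i\, d\nu_n \Big| < \sum_n \delta\, W(P_n) = \delta \quad \text{for every } i,$$

which is precisely what (9.6.9) means. Then, combining (9.6.8), (9.6.9) and
Lemma 9.6.8,

$$\sum_n W(P_n) H(\nu_n) \le H\Big( \sum_n W(P_n)\, \nu_n \Big) < H(\mu) + \varepsilon.$$

On the other hand, property (ii) implies that

$$\int H(\eta)\, dW(\eta) - \sum_n W(P_n) H(\nu_n) = \sum_n \int_{P_n} \big( H(\eta) - H(\nu_n) \big)\, dW(\eta) < \sum_n \varepsilon\, W(P_n) = \varepsilon.$$

Adding the last two inequalities, we get that $\int H(\eta)\, dW(\eta) < H(\mu) + 2\varepsilon$. Since
ε > 0 is arbitrary, this implies that $H(\mu) \ge \int H(\eta)\, dW(\eta)$.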
Now we prove the inequality ≤ in Theorem 9.6.5. Consider any sequence
$(\mathcal{P}_n)_n$ of finite partitions of $\mathcal{M}$ such that, for every $\nu \in \mathcal{M}$, the diameter
of $\mathcal{P}_n(\nu)$ converges to zero when n goes to infinity. For example,
$\mathcal{P}_n = \bigvee_{i=1}^n \{V_i, V_i^c\}$, where $\{V_n : n \ge 1\}$ is any countable basis of open sets of $\mathcal{M}$.

For each fixed n, consider the normalized restriction $W_P$ of the measure W to
each set $P \in \mathcal{P}_n$ (we consider only sets with positive measure: the union of
all the elements of $\bigcup_n \mathcal{P}_n$ with W(P) = 0 has measure zero and so may be
neglected):

$$W_P(A) = \frac{W(A \cap P)}{W(P)} \quad \text{for each measurable set } A \subset \mathcal{M}_1(M).$$

It is clear that $W = \sum_{P \in \mathcal{P}_n} W(P)\, W_P$. Since the barycenter is an affine function
(Exercise 9.6.2), it follows that

$$\mathrm{bar}(W) = \sum_{P \in \mathcal{P}_n} W(P)\, \mathrm{bar}(W_P)$$

and, therefore, since H is affine,

$$H(\mathrm{bar}(W)) = \sum_{P \in \mathcal{P}_n} W(P)\, H(\mathrm{bar}(W_P)).$$

Define $H_n(\eta) = H(\mathrm{bar}(W_{\mathcal{P}_n(\eta)}))$, for each $\eta \in \mathcal{M}$. Then the last identity above
may be rewritten as follows:

$$H(\mathrm{bar}(W)) = \int H_n(\eta)\, dW(\eta) \quad \text{for every } n. \tag{9.6.10}$$

It follows directly from the definition of $H_n$ that $0 \le H_n(\eta) \le \sup H$ for every
n and every η. Recall that sup H < ∞ (Corollary 9.6.9). We also claim that

$$\limsup_n H_n(\eta) \le H(\eta) \quad \text{for every } \eta \in \mathcal{M}. \tag{9.6.11}$$

This may be seen as follows. Given any neighborhood V = V(η, Φ, ε) of η, we
have that $\mathcal{P}_n(\eta) \subset V$ for every large n, because the diameter of $\mathcal{P}_n(\eta)$ converges
to zero. Then, assuming always that $W(\mathcal{P}_n(\eta))$ is positive,

$$W_{\mathcal{P}_n(\eta)}(V) \ge W_{\mathcal{P}_n(\eta)}(\mathcal{P}_n(\eta)) = 1.$$

By Lemma 9.6.7, it follows that $\mathrm{bar}(W_{\mathcal{P}_n(\eta)}) \in V$ for every large n. Now (9.6.11)
is a direct consequence of the hypothesis that H is upper semi-continuous.
Applying Fatou's lemma to the sequence $-H_n + \sup H$, we deduce that

$$\limsup_n \int H_n(\eta)\, dW(\eta) \le \int \limsup_n H_n(\eta)\, dW(\eta) \le \int H(\eta)\, dW(\eta). \tag{9.6.12}$$

Combining the relations (9.6.10) and (9.6.12), we get that

$$H(\mathrm{bar}(W)) \le \int H(\eta)\, dW(\eta),$$

as we wanted to prove.
Now the proof of Theorems 9.6.2 and 9.6.5 is complete.

9.6.3 Exercises
9.6.1. Check that, given any probability measure W on $\mathcal{M}_1(M)$, there exists a unique
probability measure $\mathrm{bar}(W) \in \mathcal{M}_1(M)$ that satisfies (9.6.4).
9.6.2. Show that the barycenter function is strongly affine: if $W_i$, i ≥ 1, are probability
measures on $\mathcal{M}_1(M)$ and $t_i$, i ≥ 1, are non-negative numbers with $\sum_{i=1}^\infty t_i = 1$,
then

$$\mathrm{bar}\Big( \sum_{i=1}^\infty t_i W_i \Big) = \sum_{i=1}^\infty t_i\, \mathrm{bar}(W_i).$$

9.6.3. Show that if $\mathcal{M} \subset \mathcal{M}_1(M)$ is a closed convex set then $\mathcal{M}$ is strongly convex.
Moreover, in that case $W(\mathcal{M}) = 1$ implies that $\mathrm{bar}(W) \in \mathcal{M}$.
9.6.4. Check that the inequality ≥ in Theorem 9.6.2 may also be obtained through the
following, more direct, argument:
(1) Recalling that the function φ(x) = −x log x is concave, show that
$H_\mu(\mathcal{Q}) \ge \int H_{\mu_P}(\mathcal{Q})\, d\hat\mu(P)$ for every finite partition $\mathcal{Q}$.
(2) Deduce that $h_\mu(f, \mathcal{Q}) \ge \int h_{\mu_P}(f, \mathcal{Q})\, d\hat\mu(P)$ for every finite partition $\mathcal{Q}$.
(3) Conclude that $h_\mu(f) \ge \int h_{\mu_P}(f)\, d\hat\mu(P)$.
9.6.5. The inequality ≤ in Theorem 9.6.2 is based on the fact that
$h_\mu(f, \mathcal{Q}) \le \int h_{\mu_P}(f, \mathcal{Q})\, d\hat\mu(P)$ for every finite partition $\mathcal{Q}$, which is part of Lemma 9.6.6.
Find what is wrong with the following "alternative proof":
Let $\mathcal{Q}$ be a finite partition. The theorem of Shannon–McMillan–Breiman
ensures that $h_\mu(f, \mathcal{Q}) = \int h_\mu(f, \mathcal{Q}, x)\, d\mu(x)$, where

$$h_\mu(f, \mathcal{Q}, x) = \lim_n -\frac{1}{n} \log \mu(\mathcal{Q}^n(x)) = \lim_n -\frac{1}{n} \log \int \mu_P(\mathcal{Q}^n(x))\, d\hat\mu(P).$$

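By the Jensen inequality applied to the convex function ψ(x) = −log x,

$$\lim_n -\frac{1}{n} \log \int \mu_P(\mathcal{Q}^n(x))\, d\hat\mu(P) \le \lim_n \int -\frac{1}{n} \log \mu_P(\mathcal{Q}^n(x))\, d\hat\mu(P).$$

Using the fact that $h_{\mu_P}(f, \mathcal{Q}) = h_{\mu_P}(f, \mathcal{Q}, x)$ at almost every point (because the
measure $\mu_P$ is ergodic),

$$\begin{aligned}
\lim_n \int -\frac{1}{n} \log \mu_P(\mathcal{Q}^n(x))\, d\hat\mu(P) &= \int \lim_n -\frac{1}{n} \log \mu_P(\mathcal{Q}^n(x))\, d\hat\mu(P) \\
&= \int h_{\mu_P}(f, \mathcal{Q})\, d\hat\mu(P).
\end{aligned}$$

This shows that $h_\mu(f, \mathcal{Q}, x) \le \int h_{\mu_P}(f, \mathcal{Q})\, d\hat\mu(P)$ for every finite partition $\mathcal{Q}$ and
μ-almost every x. Consequently, $h_\mu(f, \mathcal{Q}) \le \int h_{\mu_P}(f, \mathcal{Q})\, d\hat\mu(P)$ for every finite
partition $\mathcal{Q}$.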

9.7 Jacobians and the Rokhlin formula


Let U be an open subset of $\mathbb{R}^d$ and m be the Lebesgue measure on $\mathbb{R}^d$. Let f: U → U
be a local diffeomorphism. By the formula of change of variables,

$$m(f(A)) = \int_A |\det Df(x)|\, dx \tag{9.7.1}$$

for any measurable subset A of a small ball restricted to which f is injective.
The notion of a Jacobian that we present in this section extends this kind
of relation to much more general transformations and measures. Besides
introducing this concept, we are going to show that Jacobians do exist under
quite general hypotheses. Most importantly, it is possible to express the system's
entropy explicitly in terms of the Jacobian. Actually, we already encountered
an interesting manifestation of this fact in Proposition 9.4.2.
Let f: M → M be a measurable transformation. We say that f is locally
invertible if there exists some countable cover $\{U_k : k \ge 1\}$ of M by
measurable sets such that $f(U_k)$ is a measurable set and the restriction
$f | U_k: U_k \to f(U_k)$ is a bijection with measurable inverse, for every k ≥ 1. Every
measurable subset of some $U_k$ is called an invertibility domain. Note that the
image f(A) of any invertibility domain A is a measurable set. It is also clear that
if f is locally invertible then the pre-image $f^{-1}(y)$ of any y ∈ M is countable: it
contains at most one point in each $U_k$.
Let η be a probability measure on M, not necessarily invariant under f. A
measurable function ξ: M → [0, ∞) is a Jacobian of f with respect to η if the
restriction of ξ to any invertibility domain A is integrable with respect to η and
satisfies

$$\eta(f(A)) = \int_A \xi\, d\eta. \tag{9.7.2}$$

It is important to note (see Exercise 9.7.1) that the definition does not depend
on the choice of the family $\{U_k : k \ge 1\}$.

Example 9.7.1. Let σ: Σ → Σ be the shift map in $\Sigma = \{1, 2, \dots, d\}^{\mathbb{N}}$ and μ be
the Bernoulli measure associated with a probability vector $p = (p_1, \dots, p_d)$. The
restriction of σ to each cylinder [0; a] is an invertible map. Moreover, given any
cylinder $[0; a, a_1, \dots, a_n] \subset [0; a]$,

$$\mu\big(\sigma([0; a, a_1, \dots, a_n])\big) = p_{a_1} \cdots p_{a_n} = \frac{1}{p_a}\, \mu\big([0; a, a_1, \dots, a_n]\big).$$

We invite the reader to deduce that $\mu(\sigma(A)) = (1/p_a)\, \mu(A)$ for every
measurable set A ⊂ [0; a]. Therefore, the function $\xi((x_n)_n) = 1/p_{x_0}$ is a
Jacobian of σ with respect to μ.
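The identity in Example 9.7.1 can be checked on all short cylinders at once: σ maps $[0; a, a_1, \dots, a_n]$ bijectively onto $[0; a_1, \dots, a_n]$, so its measure should be the measure of the original cylinder divided by $p_a$. The sketch below is an illustration only; for convenience the symbols are indexed 0, …, d − 1 rather than 1, …, d, and the function names are hypothetical.

```python
import itertools
import math


def cylinder_measure(p: tuple[float, ...], word: tuple[int, ...]) -> float:
    """Bernoulli measure of the cylinder [0; word]; the empty word gives the whole space."""
    return math.prod(p[a] for a in word)


def jacobian(p: tuple[float, ...], x: tuple[int, ...]) -> float:
    """xi((x_n)_n) = 1 / p_{x_0}: only the zeroth symbol matters."""
    return 1.0 / p[x[0]]


def jacobian_identity_holds(p: tuple[float, ...], max_len: int = 4) -> bool:
    """Check mu(sigma(C)) = mu(C) / p_a for every cylinder C = [0; a, a_1, ..., a_n]."""
    for length in range(1, max_len + 1):
        for word in itertools.product(range(len(p)), repeat=length):
            image_measure = cylinder_measure(p, word[1:])  # sigma(C) = [0; a_1, ..., a_n]
            if not math.isclose(image_measure, cylinder_measure(p, word) / p[word[0]]):
                return False
    return True
```

For p = (0.2, 0.3, 0.5) the check succeeds for all cylinders of length at most 4, and the Jacobian at any point whose zeroth symbol has probability 0.5 equals 2.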

We say that a measure η is non-singular with respect to the transformation
f: M → M if the image of any invertibility domain with measure zero also has
measure zero: if η(A) = 0 then η(f(A)) = 0. For example, if f: U → U is a
local diffeomorphism of an open subset of Rd and η is the Lebesgue measure,
then η is non-singular. For any locally invertible transformation, every invariant
probability measure is non-singular restricted to some full measure invariant
set (Exercise 9.7.8).
It follows immediately from the definition (9.7.2) that if f admits a Jacobian
with respect to a measure η then this measure is non-singular. The converse is
also true:

Proposition 9.7.2. Let f: M → M be a locally invertible transformation and
η be a measure on M, non-singular with respect to f. Then there exists some
Jacobian of f with respect to η and it is essentially unique: any two Jacobians
coincide at η-almost every point.

Proof. We start by proving existence. Given any countable cover $\{U_k : k \ge 1\}$
of M by invertibility domains of f, define $P_1 = U_1$ and $P_k = U_k \setminus (U_1 \cup \cdots \cup U_{k-1})$
for each k > 1. Then $\mathcal{P} = \{P_k : k \ge 1\}$ is a partition of M formed by
invertibility domains. For each $P_k \in \mathcal{P}$, denote by $\eta_k$ the measure defined on
$P_k$ by $\eta_k(A) = \eta(f(A))$. Equivalently, $\eta_k$ is the image under $(f | P_k)^{-1}$ of the
measure η restricted to $f(P_k)$. The hypothesis that η is non-singular implies
that every $\eta_k$ is absolutely continuous with respect to η restricted to $P_k$:

$$\eta(A) = 0 \;\Rightarrow\; \eta_k(A) = \eta(f(A)) = 0$$

for every measurable set $A \subset P_k$. Let $\xi_k = d\eta_k / d(\eta | P_k)$ be the Radon–Nikodym
derivative (Theorem A.2.18). Then, $\xi_k$ is a function defined on $P_k$, integrable
with respect to η and satisfying

$$\eta(f(A)) = \eta_k(A) = \int_A \xi_k\, d\eta \tag{9.7.3}$$

for every measurable set $A \subset P_k$. Consider the function ξ: M → [0, ∞) whose
restriction to each $P_k \in \mathcal{P}$ is given by $\xi_k$. Every subset of $U_k$ may be written

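as a (disjoint) union of subsets of $P_1, \dots, P_k$. Since f is injective on $U_k$, the
images of these subsets are also pairwise disjoint. Applying (9.7.3) to each one of
these subsets and summing the corresponding equalities, we get that

$$\eta(f(A)) = \int_A \xi\, d\eta \quad \text{for every measurable set } A \subset U_k \text{ and } k \ge 1.$$

This proves that ξ is a Jacobian of f with respect to η.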


Now suppose that ξ and ζ are Jacobians of f with respect to η and there
exists B ⊂ M with η(B) > 0 such that ξ(x) ≠ ζ(x) for every x ∈ B. Up to
replacing B by a suitable subset (still with positive measure), and exchanging
the roles of ξ and ζ if necessary, we may suppose that ξ(x) < ζ(x) for every
x ∈ B. Similarly, we may suppose that B is contained in some $U_k$. Then,

$$\eta(f(B)) = \int_B \xi\, d\eta < \int_B \zeta\, d\eta = \eta(f(B)).$$

This contradiction proves that the Jacobian is essentially unique.
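On a finite set the Radon–Nikodym derivative in the proof is just a ratio of point masses: if every point has positive mass, the Jacobian at x is η({f(x)})/η({x}). The toy example below is an assumption for illustration, not taken from the text: M = {0, …, 5} with f(x) = x mod 3, which is injective on the invertibility domains U1 = {0, 1, 2} and U2 = {3, 4, 5}. The sketch checks the defining relation (9.7.2) for every subset of each domain, using exact fractions; all names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Toy system (hypothetical): f(x) = x mod 3 on M = {0, ..., 5}.
DOMAINS = [(0, 1, 2), (3, 4, 5)]  # f is injective on each of these sets


def f(x: int) -> int:
    return x % 3


# A probability measure with positive mass at every point, hence non-singular.
eta = {0: Fraction(1, 12), 1: Fraction(1, 12), 2: Fraction(1, 6),
       3: Fraction(1, 6), 4: Fraction(1, 4), 5: Fraction(1, 4)}


def jacobian(x: int) -> Fraction:
    """Radon-Nikodym derivative of A -> eta(f(A)) with respect to eta, at the point x."""
    return eta[f(x)] / eta[x]


def check_jacobian_definition() -> bool:
    """Check eta(f(A)) = sum over x in A of J(x) * eta({x}) for all subsets A of each domain."""
    for U in DOMAINS:
        for r in range(1, len(U) + 1):
            for A in combinations(U, r):
                image = {f(x) for x in A}
                if sum(eta[y] for y in image) != sum(jacobian(x) * eta[x] for x in A):
                    return False
    return True
```

Here, for example, the Jacobian at the point 4 equals η({1})/η({4}) = 1/3.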

From now on, we denote by $J_\eta f$ the (essentially unique) Jacobian of a locally
invertible transformation f: M → M with respect to a measure η, when it exists.
By definition, $J_\eta f$ is integrable on each invertibility domain. If f is such
that the number of pre-images of any y ∈ M is bounded then the Jacobian is
(globally) integrable: if ℓ ≥ 1 is the maximum number of pre-images and
$\mathcal{P} = \{P_k : k \ge 1\}$ is a partition of M into invertibility domains, then

$$\int J_\eta f\, d\eta = \sum_k \int_{P_k} J_\eta f\, d\eta = \sum_k \eta(f(P_k)) \le \ell,$$

because every point y ∈ M is in no more than ℓ images $f(P_k)$.


The following observation will be useful in the sequel. Let Z ⊂ M be the set
of points where the Jacobian $J_\eta f$ vanishes. Covering Z with a countable family
of invertibility domains and using (9.7.2), we see that f(Z) is a measurable set
and η(f(Z)) = 0. In other words, the set of points y ∈ M such that $J_\eta f(x) > 0$
for every $x \in f^{-1}(y)$ has full measure for η. When the probability measure
η is invariant under f, it also follows that $\eta(f^{-1}(f(Z))) = \eta(f(Z)) = 0$ and so,
since $Z \subset f^{-1}(f(Z))$, η(Z) = 0.
The main result in this section is the following formula for the entropy of an
invariant measure:

Theorem 9.7.3 (Rokhlin formula). Let f: M → M be a locally invertible
transformation and μ be a probability measure invariant under f. Assume
that there is some partition $\mathcal{P}$ with finite entropy such that $\bigvee_n \mathcal{P}^n$ generates
the σ-algebra of M, up to measure zero, and every $P \in \mathcal{P}$ is an invertibility
domain of f. Then $h_\mu(f) = \int \log J_\mu f\, d\mu$.

Proof. Let us consider the sequence of partitions $\mathcal{Q}_n = \bigvee_{j=1}^n f^{-j}(\mathcal{P})$. By
Corollary 9.2.5 and Lemma 9.1.12,

$$h_\mu(f) = h_\mu(f, \mathcal{P}) = \lim_n H_\mu(\mathcal{P}/\mathcal{Q}_n). \tag{9.7.4}$$

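By definition (as before, φ(x) = −x log x),

$$\begin{aligned}
H_\mu(\mathcal{P}/\mathcal{Q}_n) &= \sum_{P \in \mathcal{P}} \sum_{Q_n \in \mathcal{Q}_n} -\mu(P \cap Q_n) \log \frac{\mu(P \cap Q_n)}{\mu(Q_n)} \\
&= \sum_{P \in \mathcal{P}} \sum_{Q_n \in \mathcal{Q}_n} \mu(Q_n)\, \phi\Big( \frac{\mu(P \cap Q_n)}{\mu(Q_n)} \Big).
\end{aligned} \tag{9.7.5}$$

Let $e_n(\psi, x)$ be the conditional expectation of a function ψ with respect to the
partition $\mathcal{Q}_n$ and e(ψ, x) be its limit when n goes to infinity (these notions were
introduced in Section 5.2.1: see (5.2.1) and Lemma 5.2.1). Denoting by $\mathcal{X}_P$ the
characteristic function of P, it is clear from the definition that

$$\frac{\mu(P \cap Q_n)}{\mu(Q_n)} = e_n(\mathcal{X}_P, x) \quad \text{for every } x \in Q_n \text{ and every } Q_n \in \mathcal{Q}_n.$$

Therefore,

$$\sum_{P \in \mathcal{P}} \sum_{Q_n \in \mathcal{Q}_n} \mu(Q_n)\, \phi\Big( \frac{\mu(P \cap Q_n)}{\mu(Q_n)} \Big) = \sum_{P \in \mathcal{P}} \int \phi(e_n(\mathcal{X}_P, x))\, d\mu(x). \tag{9.7.6}$$

By Lemma 5.2.1, the limit $e(\mathcal{X}_P, x) = \lim_n e_n(\mathcal{X}_P, x)$ exists at μ-almost every
x. So, observing that the function φ is bounded, we may use the dominated
convergence theorem to deduce from (9.7.4)–(9.7.6) that

$$h_\mu(f) = \sum_{P \in \mathcal{P}} \int \phi(e(\mathcal{X}_P, x))\, d\mu(x). \tag{9.7.7}$$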

Now we need to relate the expression inside the integral to the Jacobian. This
we do by means of Lemma 9.7.5 below. Beforehand, let us prove the following
change of variables formulas:

Lemma 9.7.4. For any probability measure η non-singular with respect to f,
and any invertibility domain A ⊂ M of f:

(i) $\int_{f(A)} \varphi\, d\eta = \int_A (\varphi \circ f)\, J_\eta f\, d\eta$ for any measurable function φ: f(A) → ℝ such
that the integrals are defined (possibly ±∞).
(ii) $\int_A \psi\, d\eta = \int_{f(A)} (\psi / J_\eta f) \circ (f | A)^{-1}\, d\eta$ for any measurable function ψ: A → ℝ
such that the integrals are defined (possibly ±∞).

Proof. The definition (9.7.2) means that the formula in part (i) holds for the
characteristic function $\varphi = \mathcal{X}_{f(A)}$ for any invertibility domain A. Thus, it holds
for the characteristic function of any measurable subset of f (A), since such a
subset may be written as f (B) for some invertibility domain B ⊂ A. Hence, by
linearity, the identity extends to every simple function defined on f (A). Using
the monotone convergence theorem, we conclude that the identity holds for
every non-negative measurable function. Using linearity once more, we get
the general statement of part (i).

To deduce the claim in (ii), apply (i) to the function $\varphi = (\psi / J_\eta f) \circ (f | A)^{-1}$.
Note that this function is well defined at η-almost every point for, as observed
before, $J_\eta f(x) > 0$ for every x in the pre-image of η-almost every y ∈ M.
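Both change of variables formulas can be tested numerically on a simple case. Take, as an assumption not made in the text, the doubling map f(x) = 2x mod 1 with the Lebesgue measure, whose Jacobian is the constant 2, and the invertibility domain A = [1/2, 1), for which f(A) = [0, 1) and $(f|A)^{-1}(y) = (y+1)/2$. The sketch below evaluates both sides of (i) and (ii) with the midpoint rule for the test functions φ(y) = y² and ψ(x) = x; the helper names are hypothetical.

```python
def midpoint(g, a: float, b: float, n: int = 2000) -> float:
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


def f(x: float) -> float:
    """Doubling map; its Jacobian with respect to Lebesgue measure is 2."""
    return (2 * x) % 1.0


J = 2.0


def inverse_on_A(y: float) -> float:
    """(f|A)^{-1} for the invertibility domain A = [1/2, 1)."""
    return (y + 1) / 2


def varphi(y: float) -> float:  # test function on f(A) = [0, 1)
    return y * y


def psi(x: float) -> float:  # test function on A = [1/2, 1)
    return x


# Part (i): integral of varphi over f(A) versus integral over A of (varphi o f) * J.
lhs_i = midpoint(varphi, 0.0, 1.0)
rhs_i = midpoint(lambda x: varphi(f(x)) * J, 0.5, 1.0)

# Part (ii): integral of psi over A versus integral over f(A) of (psi / J) o (f|A)^{-1}.
lhs_ii = midpoint(psi, 0.5, 1.0)
rhs_ii = midpoint(lambda y: psi(inverse_on_A(y)) / J, 0.0, 1.0)
```

Both sides of (i) approximate 1/3, and both sides of (ii) equal 3/8 up to rounding.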

Lemma 9.7.5. For every bounded measurable function ψ: M → ℝ and every
probability measure η invariant under f,

$$e(\psi, x) = \hat\psi(f(x)) \ \text{ for } \eta\text{-almost every } x, \quad \text{where } \hat\psi(y) = \sum_{z \in f^{-1}(y)} \frac{\psi}{J_\eta f}(z).$$

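Proof. Recall that $\mathcal{Q}_n = \bigvee_{j=1}^n f^{-j}(\mathcal{P})$, that is, $\mathcal{Q}_n(x) = \bigcap_{j=1}^n f^{-j}(\mathcal{P}(f^j(x)))$ for
each x. We also use the sequence of partitions $\mathcal{P}^n = \bigvee_{j=0}^{n-1} f^{-j}(\mathcal{P})$. Observe that
$\mathcal{Q}_n(x) = f^{-1}(\mathcal{P}^n(f(x)))$ and $\mathcal{P}^{n+1}(x) = \mathcal{P}(x) \cap \mathcal{Q}_n(x)$ for every n and every x.
Then,

$$\int_{\mathcal{P}^n(f(x))} \hat\psi\, d\eta = \sum_{P \in \mathcal{P}} \int_{f(P) \cap \mathcal{P}^n(f(x))} \frac{\psi}{J_\eta f} \circ (f | P)^{-1}\, d\eta.$$

Using the formula of change of variables in Lemma 9.7.4(ii), applied to the
invertibility domain $A = P \cap \mathcal{Q}_n(x)$, whose image is $f(A) = f(P) \cap \mathcal{P}^n(f(x))$, the
expression on the right-hand side may be rewritten as

$$\sum_{P \in \mathcal{P}} \int_{P \cap \mathcal{Q}_n(x)} \psi(z)\, d\eta(z) = \int_{\mathcal{Q}_n(x)} \psi\, d\eta.$$

Therefore,

$$\int_{\mathcal{P}^n(f(x))} \hat\psi\, d\eta = \int_{\mathcal{Q}_n(x)} \psi\, d\eta. \tag{9.7.8}$$

Let $e'_n(\hat\psi, y)$ be the conditional expectation of $\hat\psi$ with respect to the partition
$\mathcal{P}^n$, as defined in Section 5.2.1, and let $e'(\hat\psi, y)$ be its limit when n goes to
infinity, given by Lemma 5.2.1. The hypothesis that η is invariant gives that
$\eta(\mathcal{P}^n(f(x))) = \eta(\mathcal{Q}_n(x))$, because $\mathcal{Q}_n(x) = f^{-1}(\mathcal{P}^n(f(x)))$. Dividing both sides of
(9.7.8) by this number, we get that

$$e'_n(\hat\psi, f(x)) = e_n(\psi, x) \quad \text{for every } x \text{ and every } n \ge 1. \tag{9.7.9}$$

Then, taking the limit, $e'(\hat\psi, f(x)) = e(\psi, x)$ for η-almost every x. On the other
hand, according to Exercise 5.2.3, the hypothesis that the partitions $\mathcal{P}^n$ generate
the σ-algebra implies that $e'(\hat\psi, y) = \hat\psi(y)$ for η-almost every y ∈ M; by invariance
of η, this holds at y = f(x) for η-almost every x. Hence $e(\psi, x) = \hat\psi(f(x))$ for
η-almost every x.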

Let us apply this lemma to $\psi = \mathcal{X}_P$ and η = μ. Since f is injective on every
element of $\mathcal{P}$, each intersection $P \cap f^{-1}(y)$ either is empty or contains exactly
one point. Therefore, it follows from Lemma 9.7.5 that $e(\mathcal{X}_P, x) = \hat{\mathcal{X}}_P(f(x))$,
with

$$\hat{\mathcal{X}}_P(y) = \begin{cases} 1 / J_\mu f\big((f | P)^{-1}(y)\big) & \text{if } y \in f(P), \\ 0 & \text{if } y \notin f(P). \end{cases}$$

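Then, recalling that the measure μ is assumed to be invariant, and that
φ(1/J) = (1/J) log J,

$$\int \phi(e(\mathcal{X}_P, x))\, d\mu(x) = \int \phi(\hat{\mathcal{X}}_P(y))\, d\mu(y) = \int_{f(P)} \Big( \frac{1}{J_\mu f} \log J_\mu f \Big) \circ (f | P)^{-1}\, d\mu = \int_P \log J_\mu f\, d\mu$$

(the last step uses the identity in part (ii) of Lemma 9.7.4). Replacing this
expression in (9.7.7), we get that

$$h_\mu(f) = \sum_{P \in \mathcal{P}} \int_P \log J_\mu f\, d\mu = \int \log J_\mu f\, d\mu,$$

as stated in the theorem.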
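Two quick instances of the Rokhlin formula can be evaluated numerically. For a Bernoulli shift, Example 9.7.1 gives $\log J_\mu\sigma = \log(1/p_{x_0})$, so the formula returns $-\sum_a p_a \log p_a$. For the Gauss map $f(x) = 1/x - \lfloor 1/x \rfloor$ with its invariant measure $d\mu = dx/((1+x)\log 2)$, a standard computation not carried out in the text gives $J_\mu f = |f'| \cdot (\rho \circ f)/\rho$, where ρ is the density; by invariance the ρ terms integrate to zero, so $\int \log J_\mu f\, d\mu = \int -2\log x\, d\mu$, whose value is known to be $\pi^2/(6\log 2)$. The sketch below assumes, without proof here, that the partition into the intervals (1/(k+1), 1/k] satisfies the hypotheses of Theorem 9.7.3. Its function names are hypothetical.

```python
import math


def rokhlin_bernoulli(p: tuple[float, ...]) -> float:
    """Integral of log J = log(1 / p_{x_0}) against the Bernoulli measure (Example 9.7.1)."""
    return sum(pa * math.log(1.0 / pa) for pa in p if pa > 0)


def gauss_density(x: float) -> float:
    """Density of the Gauss measure with respect to Lebesgue measure on (0, 1]."""
    return 1.0 / ((1.0 + x) * math.log(2.0))


def rokhlin_gauss(upper: float = 40.0, n: int = 40_000) -> float:
    """Approximate the integral of -2 log x against the Gauss measure.

    The substitution x = exp(-s) turns the integral over (0, 1] into an integral
    over [0, infinity) of a smooth, exponentially decaying function, which is
    truncated at s = upper and evaluated with the midpoint rule.
    """
    h = upper / n
    total = 0.0
    for i in range(n):
        s = (i + 0.5) * h
        x = math.exp(-s)
        total += 2.0 * s * gauss_density(x) * x  # -2 log x = 2s and dx = x ds
    return total * h
```

The first function returns log 2 for p = (1/2, 1/2); the second agrees with $\pi^2/(6\log 2) \approx 2.3731$ to several decimal places.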

9.7.1 Exercises
9.7.1. Check that the definition of a Jacobian does not depend on the choice of the cover
{Uk : k ≥ 1} by invertibility domains.
9.7.2. Let σ: Σ → Σ be the shift map in $\Sigma = \{1, 2, \dots, d\}^{\mathbb{N}}$ and μ be the Markov
measure associated with an aperiodic matrix P. Find the Jacobian of σ with
respect to μ.
9.7.3. Let f: M → M be a locally invertible transformation and η be a probability
measure on M, non-singular with respect to f. Show that for every bounded
measurable function ψ: M → ℝ,

$$\int \psi\, d\eta = \int \sum_{z \in f^{-1}(x)} \frac{\psi}{J_\eta f}(z)\, d\eta(x).$$

9.7.4. Let f: M → M be a locally invertible transformation and η be a probability
measure on M, non-singular with respect to f. Show that η is invariant under
f if and only if

$$\sum_{z \in f^{-1}(x)} \frac{1}{J_\eta f(z)} = 1 \quad \text{for } \eta\text{-almost every } x \in M.$$

Moreover, if η is invariant under f then $J_\eta f \ge 1$ at η-almost every point.


9.7.5. Let f : M → M be a locally invertible transformation and η be a probability measure on M, non-singular with respect to f. Show that, for every k ≥ 1, there exists a Jacobian of $f^k$ with respect to η and it is given by
$$J_\eta f^k(x) = \prod_{j=0}^{k-1} J_\eta f\big(f^j(x)\big) \quad \text{for } \eta\text{-almost every } x.$$
Assuming that f is invertible, what can be said about the Jacobian of $f^{-1}$ with respect to η?
9.7.6. Let f : M → M and g : N → N be locally invertible transformations and let μ
and ν be probability measures invariant under f and g, respectively. Assume that
there exists an ergodic equivalence φ : M → N between the systems (f , μ) and
(g, ν). Show that Jμ f = Jν g ◦ φ at μ-almost every point.
300 Entropy

9.7.7. Let $\sigma_k : \Sigma_k \to \Sigma_k$ and $\sigma_l : \Sigma_l \to \Sigma_l$ be the shift maps in $\Sigma_k = \{1, \dots, k\}^{\mathbb{N}}$ and $\Sigma_l = \{1, \dots, l\}^{\mathbb{N}}$. Let $\mu_k$ and $\mu_l$ be the Bernoulli measures on $\Sigma_k$ and $\Sigma_l$, respectively, associated with probability vectors $p = (p_1, \dots, p_k)$ and $q = (q_1, \dots, q_l)$. Show that the systems $(\sigma_k, \mu_k)$ and $(\sigma_l, \mu_l)$ are ergodically equivalent if and only if $k = l$ and the vectors $p$ and $q$ are obtained from one another by permutation of the components.
9.7.8. Let μ be a probability measure invariant under a locally invertible transformation f : M → M. Show that there exists a full measure set N ⊂ M such that $N \subset f^{-1}(N)$ and μ restricted to N is non-singular with respect to the restriction f : N → N. Conclude that f admits a Jacobian with respect to μ.
10
Variational principle

In 1965, the IBM researchers R. Adler, A. Konheim and M. McAndrew proposed [AKM65] a notion of topological entropy, inspired by the Kolmogorov–Sinai entropy that we studied in the previous chapter, but whose definition does not involve any invariant measure. This notion applies to any continuous transformation in a compact topological space.
Subsequently, Efim Dinaburg [Din70] and Rufus Bowen [Bow71, Bow75a] gave a different, yet equivalent, definition for continuous transformations in compact metric spaces. Although it is a bit more restrictive, the Bowen–Dinaburg definition has the advantage of making the meaning of this concept more transparent: the topological entropy is the exponential growth rate of the number of orbits that can be distinguished at a given precision, in the limit as that precision is made arbitrarily small. Moreover, Bowen extended the definition to non-compact spaces, which is also very useful for applications.
These definitions of topological entropy and their properties are studied in
Section 10.1 where, in particular, we observe that the topological entropy is an
invariant of topological equivalence (topological conjugacy). In Section 10.2
we analyze several concrete examples.
The main result is the following remarkable relation between the topological
entropy and the entropies of the transformation with respect to its invariant
measures:
Theorem 10.1 (Variational principle). If f : M → M is a continuous
transformation in a compact metric space then its topological entropy h(f )
coincides with the supremum of the entropies hμ (f ) of f with respect to all the
invariant probability measures.
This theorem was proved by Dinaburg [Din70, Din71], Goodman [Goo71a]
and Goodwin [Goo71b]. Here, it arises as a special case of a more
general statement, the variational principle for the pressure, which is due to
Walters [Wal75].
The pressure P(f , φ) is a weighted version of the topological entropy h(f ),
where the “weights” are determined by a continuous function φ : M → R,
302 Variational principle

which we call a potential. We study these notions and their properties in Section 10.3. The topological entropy corresponds to the special case when the potential is identically zero. The notion of pressure was brought from statistical mechanics to ergodic theory by the Belgian mathematician and theoretical physicist David Ruelle, one of the founders of differentiable ergodic theory, and was then extended by the British mathematician Peter Walters.
The variational principle (Theorem 10.1) extends to the setting of the pressure, as we are going to see in Section 10.4:
$$P(f, \phi) = \sup\Big\{ h_\mu(f) + \int \phi \, d\mu : \mu \text{ is invariant under } f \Big\} \tag{10.0.1}$$
for every continuous function φ : M → R. An invariant probability measure μ is called an equilibrium state for the potential φ if it realizes the supremum in (10.0.1), that is, if $h_\mu(f) + \int \phi \, d\mu = P(f, \phi)$. The set of all equilibrium states is studied in Section 10.5.

10.1 Topological entropy


Initially, we present the definitions of Adler–Konheim–McAndrew and Bowen–Dinaburg, and we prove that they are equivalent when the ambient space is a compact metric space.

10.1.1 Definition via open covers


The original definition of the topological entropy is very similar to that of the
Kolmogorov–Sinai entropy, with open covers in the place of partitions into
measurable sets.
Let M be a compact topological space. An open cover of M is any family α of open sets whose union is the whole of M. By compactness, every open cover admits a subcover (that is, a subfamily that is still an open cover) with finitely many elements. We define the entropy of the open cover α to be the number
$$H(\alpha) = \log N(\alpha), \tag{10.1.1}$$
where N(α) is the smallest number such that α admits some finite subcover with that number of elements.
Given two open covers α and β, we say that α is coarser than β (or β is finer than α), and we write α ≺ β, if every element of β is contained in some element of α. For example, if β is a subcover of α then α ≺ β. By Exercise 10.1.1,
$$\alpha \prec \beta \;\Rightarrow\; H(\alpha) \le H(\beta). \tag{10.1.2}$$
Given open covers $\alpha_1, \dots, \alpha_n$, we denote by $\alpha_1 \vee \cdots \vee \alpha_n$ their sum, that is, the open cover whose elements are the intersections $A_1 \cap \cdots \cap A_n$ with $A_j \in \alpha_j$ for each j. Note that $\alpha_j \prec \alpha_1 \vee \cdots \vee \alpha_n$ for every j.
10.1 Topological entropy 303

Let f : M → M be a continuous transformation. If α is an open cover of M then so is $f^{-j}(\alpha) = \{f^{-j}(A) : A \in \alpha\}$. For each n ≥ 1, let us denote
$$\alpha^n = \alpha \vee f^{-1}(\alpha) \vee \cdots \vee f^{-n+1}(\alpha).$$
Using Exercise 10.1.2, we see that
$$H(\alpha^{m+n}) = H\big(\alpha^m \vee f^{-m}(\alpha^n)\big) \le H(\alpha^m) + H\big(f^{-m}(\alpha^n)\big) \le H(\alpha^m) + H(\alpha^n)$$
for every m, n ≥ 1. In other words, the sequence $H(\alpha^n)$ is subadditive. Consequently (Lemma 3.3.4),
$$h(f, \alpha) = \lim_n \frac{1}{n} H(\alpha^n) = \inf_n \frac{1}{n} H(\alpha^n) \tag{10.1.3}$$
always exists and is finite. It is called the entropy of f with respect to the open cover α. The relation (10.1.2) implies that
$$\alpha \prec \beta \;\Rightarrow\; h(f, \alpha) \le h(f, \beta). \tag{10.1.4}$$
Finally, we define the topological entropy of f to be
$$h(f) = \sup\{h(f, \alpha) : \alpha \text{ is an open cover of } M\}. \tag{10.1.5}$$
By (10.1.4), if β is a subcover of α then h(f, α) ≤ h(f, β). Since every open cover admits a finite subcover, the definition (10.1.5) does not change when one restricts the supremum to the finite open covers.
Observe that the entropy h(f ) is a non-negative number, possibly infinite
(see Exercise 10.1.6).
Example 10.1.1. Let $f : S^1 \to S^1$ be any homeomorphism (for example, a rotation $R_\theta$) and let α be an open cover of the circle formed by a finite number of open intervals. Let ∂α be the set consisting of the endpoints of those intervals. For each n ≥ 1, the open cover $\alpha^n$ is formed by intervals whose endpoints are in
$$\partial\alpha^n = \partial\alpha \cup f^{-1}(\partial\alpha) \cup \cdots \cup f^{-n+1}(\partial\alpha).$$
Note that $\#\alpha^n \le \#\partial\alpha^n \le n \, \#\partial\alpha$. Therefore,
$$h(f, \alpha) = \lim_n \frac{1}{n} H(\alpha^n) \le \liminf_n \frac{1}{n} \log \#\alpha^n \le \liminf_n \frac{1}{n} \log\big(n \, \#\partial\alpha\big) = 0.$$
Proposition 10.1.12 below gives that $h(f) = \lim_k h(f, \alpha_k)$ for any sequence of open covers $\alpha_k$ with $\operatorname{diam} \alpha_k \to 0$. Then, considering open covers $\alpha_k$ by intervals of length less than 1/k, we conclude from the previous calculation that h(f) = 0 for every homeomorphism of the circle.
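The linear growth of the endpoint sets is easy to observe numerically. The sketch below assumes a rotation $R_\theta(x) = x + \theta \bmod 1$ with a hypothetical irrational $\theta$ and a hypothetical set of endpoints $\partial\alpha$; it builds $\partial\alpha^n = \bigcup_{j<n} R_\theta^{-j}(\partial\alpha)$ and prints the upper bound $\frac{1}{n}\log(n\,\#\partial\alpha)$, which tends to zero.

```python
import math

def preimage_endpoints(theta, endpoints, n):
    """Boundary set of alpha^n for the rotation x -> x + theta (mod 1):
    the union of R_theta^{-j}(endpoints) over j = 0, ..., n-1."""
    points = set()
    for j in range(n):
        for e in endpoints:
            # R_theta^{-j}(e) = e - j*theta (mod 1); rounding merges float duplicates
            points.add(round((e - j * theta) % 1.0, 12))
    return points

theta = (math.sqrt(5) - 1) / 2   # hypothetical irrational rotation number
endpoints = [0.0, 0.25, 0.6]     # hypothetical endpoints of a cover by arcs
for n in (10, 100, 1000):
    count = len(preimage_endpoints(theta, endpoints, n))
    assert count <= n * len(endpoints)                 # at most n points per endpoint
    print(n, count, math.log(n * len(endpoints)) / n)  # upper bound for H(alpha^n)/n
```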
Example 10.1.2. Let $\Sigma = \{1, \dots, d\}^{\mathbb{N}}$ and α be the cover of Σ by the cylinders [0; a], $a = 1, \dots, d$. Consider the shift map σ : Σ → Σ. For each n, the open cover $\alpha^n$ consists of the cylinders of length n:
$$\alpha^n = \{[0; a_0, \dots, a_{n-1}] : a_j = 1, \dots, d\}.$$
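Therefore, since the cylinders of length n are pairwise disjoint (so no proper subfamily of $\alpha^n$ covers Σ), $H(\alpha^n) = \log \#\alpha^n = \log d^n$ and, consequently, $h(\sigma, \alpha) = \log d$. Observe also that $\operatorname{diam} \alpha^n$ converges to zero when n → ∞, relative to the distance defined by (A.2.7). Then, it follows from Corollary 10.1.13 below that $h(\sigma) = h(\sigma, \alpha) = \log d$. The same holds for the two-sided shift σ : Σ → Σ in $\Sigma = \{1, \dots, d\}^{\mathbb{Z}}$.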
Now we show that the topological entropy is an invariant of topological
equivalence. Let f : M → M and g : N → N be continuous transformations in
compact topological spaces M and N. We say that g is a topological factor of
f if there exists a surjective continuous map θ : M → N such that θ ◦ f = g ◦ θ .
When θ may be chosen to be invertible (a homeomorphism), we say that the
two transformations are topologically equivalent, or topologically conjugate,
and we call θ a topological conjugacy between f and g.
Proposition 10.1.3. If g is a topological factor of f then h(g) ≤ h(f ). In
particular, if f and g are topologically equivalent then h(f ) = h(g).

Proof. Let θ : M → N be a surjective continuous map such that θ ◦ f = g ◦ θ .


Given any open cover α of N, the family
$$\theta^{-1}(\alpha) = \{\theta^{-1}(A) : A \in \alpha\}$$
is an open cover of M. Recall that, by definition, the iterated sum $\alpha^n$ is the open cover formed by the sets $\bigcap_{j=0}^{n-1} g^{-j}(A_j)$ with $A_0, A_1, \dots, A_{n-1} \in \alpha$. Analogously, the iterated sum $\theta^{-1}(\alpha)^n$ consists of the sets $\bigcap_{j=0}^{n-1} f^{-j}\big(\theta^{-1}(A_j)\big)$. Clearly,
$$\bigcap_{j=0}^{n-1} f^{-j}\big(\theta^{-1}(A_j)\big) = \bigcap_{j=0}^{n-1} \theta^{-1}\big(g^{-j}(A_j)\big) = \theta^{-1}\Big(\bigcap_{j=0}^{n-1} g^{-j}(A_j)\Big).$$
Noting that the sets of the form on the right-hand side of this identity constitute the pre-image $\theta^{-1}(\alpha^n)$ of $\alpha^n$, we conclude that $\theta^{-1}(\alpha^n) = \theta^{-1}(\alpha)^n$. Since θ is surjective, a family $\gamma \subset \alpha^n$ covers N if and only if $\theta^{-1}(\gamma)$ covers M. Therefore,
$$H\big(\theta^{-1}(\alpha)^n\big) = H\big(\theta^{-1}(\alpha^n)\big) = H(\alpha^n).$$
Since n is arbitrary, it follows that $h(f, \theta^{-1}(\alpha)) = h(g, \alpha)$. Then, taking the supremum over all the open covers α of N:
$$h(g) = \sup_\alpha h(g, \alpha) = \sup_\alpha h\big(f, \theta^{-1}(\alpha)\big) \le h(f).$$

This proves the first part of the proposition. The second part is an immediate
consequence, since in that case f is also a factor of g.

The converse to Proposition 10.1.3 is false, in general. For example, all the homeomorphisms of the circle have topological entropy equal to zero (recall Example 10.1.1) but they are not necessarily topologically equivalent (for example, the identity is not topologically equivalent to any other homeomorphism).

10.1.2 Generating sets and separated sets


Next, we present the definition of topological entropy of Bowen–Dinaburg. Let
f : M → M be a continuous transformation in a metric space M, not necessarily
compact, and let K ⊂ M be any compact subset. When M is compact it suffices
to consider K = M, as observed in (10.1.12) below.
Given ε > 0 and n ∈ N, we say that a set E ⊂ M (n, ε)-generates K if for every x ∈ K there exists a ∈ E such that $d(f^i(x), f^i(a)) < \varepsilon$ for every $i \in \{0, \dots, n-1\}$. In other words,
$$K \subset \bigcup_{a \in E} B(a, n, \varepsilon),$$
where $B(a, n, \varepsilon) = \{x \in M : d(f^i(x), f^i(a)) < \varepsilon \text{ for } i = 0, \dots, n-1\}$ is the dynamical ball of center a, length n and radius ε. Note that $\{B(x, n, \varepsilon) : x \in K\}$ is an open cover of K. Hence, by compactness, there always exist finite (n, ε)-generating sets.
Let us denote by $g_n(f, \varepsilon, K)$ the smallest cardinality of an (n, ε)-generating set of K. We define
$$g(f, \varepsilon, K) = \limsup_n \frac{1}{n} \log g_n(f, \varepsilon, K). \tag{10.1.6}$$
Observe that the function ε ↦ g(f, ε, K) is monotone non-increasing. Indeed, it is clear from the definition that if ε1 < ε2 then every (n, ε1)-generating set is also (n, ε2)-generating. Therefore, $g_n(f, \varepsilon_1, K) \ge g_n(f, \varepsilon_2, K)$ for every n ≥ 1 and, taking the lim sup, $g(f, \varepsilon_1, K) \ge g(f, \varepsilon_2, K)$. This ensures, in particular, that
$$g(f, K) = \lim_{\varepsilon \to 0} g(f, \varepsilon, K) \tag{10.1.7}$$
exists. Finally, we define
$$g(f) = \sup\{g(f, K) : K \subset M \text{ compact}\}. \tag{10.1.8}$$
We also introduce the following dual notion. Given ε > 0 and n ∈ N, we say that a set E ⊂ K is (n, ε)-separated if, given distinct x, y ∈ E, there exists $j \in \{0, \dots, n-1\}$ such that $d(f^j(x), f^j(y)) \ge \varepsilon$. In other words, if x ∈ E then B(x, n, ε) contains no other point of E. We denote by $s_n(f, \varepsilon, K)$ the largest cardinality of an (n, ε)-separated set. We define
$$s(f, \varepsilon, K) = \limsup_n \frac{1}{n} \log s_n(f, \varepsilon, K). \tag{10.1.9}$$
It is clear that if 0 < ε1 < ε2, then every (n, ε2)-separated set is also (n, ε1)-separated. Therefore, $s_n(f, \varepsilon_1, K) \ge s_n(f, \varepsilon_2, K)$ for every n ≥ 1 and, taking the lim sup, $s(f, \varepsilon_1, K) \ge s(f, \varepsilon_2, K)$. In particular,
$$s(f, K) = \lim_{\varepsilon \to 0} s(f, \varepsilon, K) \tag{10.1.10}$$
always exists. Finally, we define
$$s(f) = \sup\{s(f, K) : K \subset M \text{ compact}\}. \tag{10.1.11}$$
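Both notions can be phrased through the distance $d_n(x, y) = \max_{0 \le i < n} d(f^i(x), f^i(y))$, whose open balls of radius ε are exactly the dynamical balls B(x, n, ε). The sketch below makes the two definitions executable for a hypothetical example, the doubling map f(x) = 2x mod 1 on the circle ℝ/ℤ. Since a compact set K cannot be enumerated, the generating property is only tested on a finite sample of points of K, so this is an illustration and not a verification.

```python
def circle_dist(x, y):
    """Distance on the circle R/Z."""
    t = abs(x - y) % 1.0
    return min(t, 1.0 - t)

def doubling(x):
    """Hypothetical example map f(x) = 2x mod 1."""
    return (2.0 * x) % 1.0

def bowen_dist(f, x, y, n, dist=circle_dist):
    """d_n(x, y) = max over 0 <= i < n of dist(f^i(x), f^i(y))."""
    best = 0.0
    for _ in range(n):
        best = max(best, dist(x, y))
        x, y = f(x), f(y)
    return best

def is_generating(f, E, sample, n, eps):
    """Every sample point lies in some dynamical ball B(a, n, eps) with a in E."""
    return all(any(bowen_dist(f, x, a, n) < eps for a in E) for x in sample)

def is_separated(f, E, n, eps):
    """Distinct points of E are at d_n-distance at least eps."""
    return all(bowen_dist(f, E[i], E[j], n) >= eps
               for i in range(len(E)) for j in range(i + 1, len(E)))

E = [k / 8 for k in range(8)]
print(is_separated(doubling, E, 2, 0.2),
      is_generating(doubling, E, [k / 1000 for k in range(1000)], 2, 0.2))
```

For these eight equally spaced points both checks succeed at (n, ε) = (2, 0.2), in line with Lemma 10.1.5 below.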

It is clear that $g(f, K_1) \le g(f, K_2)$ and $s(f, K_1) \le s(f, K_2)$ if $K_1 \subset K_2$. In particular,
$$g(f) = g(f, M) \quad \text{and} \quad s(f) = s(f, M) \quad \text{if } M \text{ is compact}. \tag{10.1.12}$$
Another interesting observation (Exercise 10.1.7) is that the definitions (10.1.8) and (10.1.11) are not affected when we restrict the supremum to compact sets with small diameter.

Proposition 10.1.4. We have g(f, K) = s(f, K) for every compact K ⊂ M. Consequently, g(f) = s(f).

Proof. For the proof we need the following lemma:

Lemma 10.1.5. $g_n(f, \varepsilon, K) \le s_n(f, \varepsilon, K) \le g_n(f, \varepsilon/2, K)$ for every n ≥ 1, every ε > 0 and every compact K ⊂ M.

Proof. Let E ⊂ K be an (n, ε)-separated set with maximal cardinality. Given any y ∈ K \ E, the set E ∪ {y} is not (n, ε)-separated, and so there exists x ∈ E such that $d(f^i(x), f^i(y)) < \varepsilon$ for every $i \in \{0, \dots, n-1\}$. This shows that E is an (n, ε)-generating set of K. Consequently, $g_n(f, \varepsilon, K) \le \#E = s_n(f, \varepsilon, K)$.
To prove the other inequality, let E ⊂ K be an (n, ε)-separated set and F ⊂ M
be an (n, ε/2)-generating set of K. The hypothesis ensures that, given any x ∈ E
there exists some y ∈ F such that d(f i (x), f i (y)) < ε/2 for every i ∈ {0, . . . , n−1}.
Let φ : E → F be a map such that each φ(x) is a point y satisfying this condition.
We claim that the map φ is injective. Indeed, suppose that x, z ∈ E are such that
φ(x) = y = φ(z). Then
$$d(f^i(x), f^i(z)) \le d(f^i(x), f^i(y)) + d(f^i(y), f^i(z)) < \varepsilon/2 + \varepsilon/2 = \varepsilon$$
for every $i \in \{0, \dots, n-1\}$. Since E is (n, ε)-separated, this implies that x = z.
Therefore, φ is injective, as we claimed. It follows that #E ≤ #F and, since E
and F are arbitrary, this proves that sn (f , ε, K) ≤ gn (f , ε/2, K).

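Then, given any ε > 0 and any compact K ⊂ M,
$$\begin{aligned} g(f, \varepsilon, K) &= \limsup_n \frac{1}{n} \log g_n(f, \varepsilon, K) \\ &\le \limsup_n \frac{1}{n} \log s_n(f, \varepsilon, K) = s(f, \varepsilon, K) \\ &\le \limsup_n \frac{1}{n} \log g_n\big(f, \tfrac{\varepsilon}{2}, K\big) = g\big(f, \tfrac{\varepsilon}{2}, K\big). \end{aligned} \tag{10.1.13}$$
Taking the limit when ε → 0, we get that
$$g(f, K) = \lim_{\varepsilon \to 0} g(f, \varepsilon, K) \le \lim_{\varepsilon \to 0} s(f, \varepsilon, K) = s(f, K) \le \lim_{\varepsilon \to 0} g\big(f, \tfrac{\varepsilon}{2}, K\big) = g(f, K).$$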

This proves the first part of the proposition. The second part is an immediate
consequence.
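The first half of the proof of Lemma 10.1.5 is constructive: an (n, ε)-separated set that cannot be enlarged is automatically (n, ε)-generating. The sketch below mimics this for the doubling map on the circle (a hypothetical example), with a finite grid standing in for K = M, so maximality is only with respect to the grid.

```python
def circle_dist(x, y):
    """Distance on the circle R/Z."""
    t = abs(x - y) % 1.0
    return min(t, 1.0 - t)

def doubling(x):
    """Hypothetical example map f(x) = 2x mod 1."""
    return (2.0 * x) % 1.0

def bowen_dist(f, x, y, n):
    """max over 0 <= i < n of circle_dist(f^i(x), f^i(y))."""
    best = 0.0
    for _ in range(n):
        best = max(best, circle_dist(x, y))
        x, y = f(x), f(y)
    return best

def greedy_separated(f, grid, n, eps):
    """Keep a grid point whenever it is at d_n-distance >= eps from all points kept so far.
    The result is (n, eps)-separated and cannot be enlarged within the grid."""
    E = []
    for x in grid:
        # recently kept points are the likeliest to be close, so test them first
        if all(bowen_dist(f, x, a, n) >= eps for a in reversed(E)):
            E.append(x)
    return E

n, eps = 4, 0.1
grid = [k / 2000 for k in range(2000)]
E = greedy_separated(doubling, grid, n, eps)
# As in Lemma 10.1.5: every grid point is eps-close in d_n to some point of E,
# so E is (n, eps)-generating for the grid.
assert all(any(bowen_dist(doubling, x, a, n) < eps for a in E) for x in grid)
print(len(E))
```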

By definition, the diameter of an open cover α of a metric space M is the supremum of the diameters of all the sets A ∈ α.
Proposition 10.1.6. If M is a compact metric space then h(f ) = g(f ) = s(f ).

Proof. By Proposition 10.1.4, it suffices to show that s(f) ≤ h(f) ≤ g(f). Start by fixing ε > 0 and n ≥ 1. Let E ⊂ M be an (n, ε)-separated set and α be any open cover of M with diameter less than ε. If x and y are in the same element of $\alpha^n$ then
$$d(f^i(x), f^i(y)) \le \operatorname{diam} \alpha < \varepsilon \quad \text{for every } i = 0, \dots, n-1.$$
In particular, each element of $\alpha^n$ contains at most one element of E. Consequently, $\#E \le N(\alpha^n)$. Taking E with maximal cardinality, we conclude that $s_n(f, \varepsilon, M) \le N(\alpha^n)$ for every n ≥ 1. So,
$$s(f, \varepsilon, M) = \limsup_n \frac{1}{n} \log s_n(f, \varepsilon, M) \le \lim_n \frac{1}{n} \log N(\alpha^n) = h(f, \alpha) \le h(f). \tag{10.1.14}$$
Making ε → 0, we find that s(f) = s(f, M) ≤ h(f).


Next, given any open cover α of M, let ε > 0 be a Lebesgue number for α, that is, a positive number such that every ball of radius ε is contained in some element of α. Let E ⊂ M be an (n, ε)-generating set of M with minimal cardinality. For each x ∈ E and i = 0, …, n − 1, there exists $A_{x,i} \in \alpha$ such that $B(f^i(x), \varepsilon)$ is contained in $A_{x,i}$. Then,
$$B(x, n, \varepsilon) \subset \bigcap_{i=0}^{n-1} f^{-i}(A_{x,i}).$$
Therefore, the hypothesis that E is a generating set implies that the family $\gamma = \big\{\bigcap_{i=0}^{n-1} f^{-i}(A_{x,i}) : x \in E\big\}$ is an open cover of M. Since $\gamma \subset \alpha^n$, it follows that $N(\alpha^n) \le \#E = g_n(f, \varepsilon, M)$ for every n. Therefore,
$$h(f, \alpha) = \lim_n \frac{1}{n} \log N(\alpha^n) \le \liminf_n \frac{1}{n} \log g_n(f, \varepsilon, M) \le \limsup_n \frac{1}{n} \log g_n(f, \varepsilon, M) = g(f, \varepsilon, M). \tag{10.1.15}$$
Since every positive number smaller than ε is again a Lebesgue number for α, we may make ε → 0, and we get that $h(f, \alpha) \le g(f, M) = g(f)$. Since the open cover α is arbitrary, it follows that h(f) ≤ g(f).

We define the topological entropy of a continuous transformation f : M → M in a metric space M to be g(f) = s(f). Proposition 10.1.6 shows that this definition is compatible with the one we gave in Section 10.1.1
for transformations in compact topological spaces. A relevant difference is that, while for compact spaces the topological entropy depends only on the topology (because h(f) is defined solely in terms of the open sets), in the non-compact case the topological entropy may also depend on the distance function in M. In this regard, see Exercises 10.1.4 and 10.1.5. They also show that in the non-compact case the topological entropy is no longer an invariant of topological conjugacy, although it remains an invariant of uniformly continuous conjugacy.
Example 10.1.7. Assume that f : M → M does not expand distances, that is,
that d(f (x), f (y)) ≤ d(x, y) for every x, y ∈ M. Then the topological entropy
of f is equal to zero. Indeed, the hypothesis implies that B(x, n, ε) = B(x, ε)
for every n ≥ 1. Hence, a set E is (n, ε)-generating if and only if it is
(1, ε)-generating. In particular, the sequence gn (f , ε, K) does not depend on
n and, hence, g(f , ε, K) = 0 for every ε > 0 and every compact set K. Making
ε → 0 and taking the supremum over K we get that g(f ) = 0 (analogously,
s(f ) = 0).
There are two important special cases: contractions, such that there exists
λ < 1 satisfying d(f (x), f (y)) ≤ λd(x, y) for every x, y ∈ M; and isometries, such
that d(f (x), f (y)) = d(x, y) for every x, y ∈ M. We saw in Lemma 6.3.6 that every
compact metrizable group admits a distance relative to which every translation
is an isometry. Therefore, it also follows from the previous observations that
the topological entropy of every translation in a compact metrizable group is
zero.
Recalling that g(f) = g(f, M) and s(f) = s(f, M) when M is compact, we see that the conclusion of Proposition 10.1.6 may be rewritten as follows:
$$h(f) = \lim_{\varepsilon \to 0} \limsup_n \frac{1}{n} \log g_n(f, \varepsilon, M) = \lim_{\varepsilon \to 0} \limsup_n \frac{1}{n} \log s_n(f, \varepsilon, M).$$
From the proof of the proposition we may also obtain the following related identity:
Corollary 10.1.8. If f : M → M is a continuous transformation in a compact metric space then
$$h(f) = \lim_{\varepsilon \to 0} \liminf_n \frac{1}{n} \log g_n(f, \varepsilon, M) = \lim_{\varepsilon \to 0} \liminf_n \frac{1}{n} \log s_n(f, \varepsilon, M).$$
Proof. The relation (10.1.15) gives that
$$h(f, \alpha) \le \liminf_n \frac{1}{n} \log g_n(f, \varepsilon, M)$$
whenever ε > 0 is a Lebesgue number for the open cover α. Making ε → 0, we conclude that
$$h(f) \le \lim_{\varepsilon \to 0} \liminf_n \frac{1}{n} \log g_n(f, \varepsilon, M). \tag{10.1.16}$$

The first inequality in Lemma 10.1.5 implies that
$$\lim_{\varepsilon \to 0} \liminf_n \frac{1}{n} \log g_n(f, \varepsilon, M) \le \lim_{\varepsilon \to 0} \liminf_n \frac{1}{n} \log s_n(f, \varepsilon, M). \tag{10.1.17}$$
Also, it is clear that
$$\lim_{\varepsilon \to 0} \liminf_n \frac{1}{n} \log s_n(f, \varepsilon, M) \le \lim_{\varepsilon \to 0} \limsup_n \frac{1}{n} \log s_n(f, \varepsilon, M). \tag{10.1.18}$$
As we have just observed, the expression on the right-hand side is equal to h(f). Therefore, the inequalities (10.1.16)–(10.1.18) imply the conclusion.

10.1.3 Calculation and properties


We start by proving a version of Lemma 9.1.13 for the topological entropy. The proof is a bit more elaborate because, unlike what happens for partitions, given an open cover α the covers $(\alpha^k)^n$ and $\alpha^{n+k-1}$ need not coincide if the elements of α are not pairwise disjoint.
Example 10.1.9. Let f : M → M be the shift map in $M = \{1, 2, 3\}^{\mathbb{N}}$ (or $M = \{1, 2, 3\}^{\mathbb{Z}}$) and α be the open cover of M consisting of the cylinders [0; {1, 2}] and [0; {1, 3}]. For each n ≥ 1, the cover $\alpha^n$ consists of the $2^n$ cylinders of the form $[0; A_0, \dots, A_{n-1}]$ with $A_j = \{1, 2\}$ or $A_j = \{1, 3\}$. In particular, $\#\alpha^3 = 8$. On the other hand, $(\alpha^2)^2$ contains 12 elements: the 8 elements of $\alpha^3$ together with the 4 cylinders of the form $[0; A_0, \{1\}, A_2]$ with $A_j = \{1, 2\}$ or $A_j = \{1, 3\}$ for j = 0 and j = 2. Hence, $\alpha^{n+k-1} \ne (\alpha^k)^n$ for n = k = 2.
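The counts in Example 10.1.9 can be reproduced mechanically. In the sketch below (an illustration, not part of the text) a cylinder $[0; A_0, \dots, A_{m-1}]$ is encoded as the tuple of its symbol sets, which is faithful here because all the sets involved are non-empty; the intersection $[0; A_0, A_1] \cap \sigma^{-1}[0; B_0, B_1]$ is then $[0; A_0, A_1 \cap B_0, B_1]$.

```python
from itertools import product

# Hypothetical encoding: a cylinder [0; A_0, ..., A_{m-1}] is the tuple (A_0, ..., A_{m-1}).
A = (frozenset({1, 2}), frozenset({1, 3}))  # alpha = {[0; {1,2}], [0; {1,3}]}

def alpha_n(n):
    """Elements of alpha^n: all tuples (A_0, ..., A_{n-1}) with each A_j in A."""
    return set(product(A, repeat=n))

def alpha2_squared():
    """Elements of (alpha^2)^2 = alpha^2 v sigma^{-1}(alpha^2), as cylinders of length 3:
    [0; A0, A1] intersected with sigma^{-1}[0; B0, B1] equals [0; A0, A1 & B0, B1]."""
    return {(a0, a1 & b0, b1) for a0, a1, b0, b1 in product(A, repeat=4)}

print(len(alpha_n(3)), len(alpha2_squared()))  # 8 12, as in Example 10.1.9
```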
Proposition 10.1.10. Let M be a compact topological space, f : M → M be a continuous transformation and α be an open cover of M. Then $h(f, \alpha) = h(f, \alpha^k)$ for every k ≥ 1. Moreover, if f : M → M is a homeomorphism then $h(f, \alpha) = h(f, \alpha^{\pm k})$ for every k ≥ 1, where $\alpha^{\pm k} = \bigvee_{j=-k}^{k-1} f^{-j}(\alpha)$.

Proof. The main point is to show that the open covers $(\alpha^k)^n$ and $\alpha^{n+k-1}$ have the same entropy, for every n ≥ 1. We use the following simple fact, which will be useful again later:
Lemma 10.1.11. Given any open cover α and any n, k ≥ 1,
1. $\alpha^{n+k-1}$ is a subcover of $(\alpha^k)^n$ and, in particular, $(\alpha^k)^n \prec \alpha^{n+k-1}$;
2. for any subcover β of $(\alpha^k)^n$ there exists a subcover γ of $\alpha^{n+k-1}$ such that $\#\gamma \le \#\beta$ and $\gamma \prec \beta$.
Proof. By definition, every element of $\alpha^{n+k-1}$ has the form $B = \bigcap_{l=0}^{n+k-2} f^{-l}(B_l)$ with $B_l \in \alpha$ for every l. It is clear that this may be written in the form
$$B = \bigcap_{i=0}^{n-1} f^{-i}\Big(\bigcap_{j=0}^{k-1} f^{-j}(B_{i+j})\Big)$$
and, thus, $B \in (\alpha^k)^n$. This proves the first claim.

Next, let β be a subcover of $(\alpha^k)^n$. Every element of β has the form
$$A = \bigcap_{i=0}^{n-1} f^{-i}\Big(\bigcap_{j=0}^{k-1} f^{-j}(A_{i,j})\Big) = \bigcap_{l=0}^{n+k-2} f^{-l}\Big(\bigcap_{i+j=l} A_{i,j}\Big),$$
with $A_{i,j} \in \alpha$. Consider $B = \bigcap_{l=0}^{n+k-2} f^{-l}(B_l)$, where $B_l = A_{i,j}$ for some pair (i, j) such that i + j = l. Observe that A ⊂ B and $B \in \alpha^{n+k-1}$. Therefore, the family γ formed by all the sets B obtained in this way satisfies all the conditions in the second claim.

According to the relation (10.1.2), the first part of Lemma 10.1.11 implies that $H((\alpha^k)^n) \le H(\alpha^{n+k-1})$. Clearly, the second part of the lemma implies the opposite inequality. Hence,
$$H(\alpha^{n+k-1}) = H\big((\alpha^k)^n\big) \quad \text{for any } n, k \ge 1, \tag{10.1.19}$$
as we claimed. Therefore,
$$h(f, \alpha^k) = \lim_n \frac{1}{n} H\big((\alpha^k)^n\big) = \lim_n \frac{1}{n} H(\alpha^{n+k-1}) = h(f, \alpha) \quad \text{for every } k.$$
When f is invertible, it follows from the definitions that $\alpha^{\pm k} = f^k(\alpha^{2k})$. Using Exercise 10.1.3, we get that $h(f, \alpha^{\pm k}) = h(f, f^k(\alpha^{2k})) = h(f, \alpha^{2k}) = h(f, \alpha)$.
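Identity (10.1.19) can be checked by brute force on Example 10.1.9, where $(\alpha^2)^2$ has 12 elements and $\alpha^3$ only 8: the extra cylinders $[0; A_0, \{1\}, A_2]$ turn out to be redundant, so both covers have N = 8. The sketch below is only an illustration (exhaustive search over subfamilies is exponential in general); it uses the fact that these cylinders depend only on the first three symbols, so a subfamily covers the full shift if and only if it covers every word of length 3.

```python
from itertools import combinations, product

A = (frozenset({1, 2}), frozenset({1, 3}))   # alpha = {[0; {1,2}], [0; {1,3}]}
WORDS = list(product((1, 2, 3), repeat=3))   # the first three symbols of points of M

def covers(cyl, word):
    """Does the cylinder [0; A_0, A_1, A_2] contain the points starting with word?"""
    return all(w in s for w, s in zip(word, cyl))

def min_subcover_size(family):
    """N(family): size of the smallest subfamily covering every word (brute force)."""
    family = list(family)
    for size in range(1, len(family) + 1):
        for sub in combinations(family, size):
            if all(any(covers(c, w) for c in sub) for w in WORDS):
                return size
    return None

alpha3 = set(product(A, repeat=3))
alpha2_sq = {(a0, a1 & b0, b1) for a0, a1, b0, b1 in product(A, repeat=4)}
print(min_subcover_size(alpha3), min_subcover_size(alpha2_sq))  # both equal 8
```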

The next proposition and its corollary simplify the calculation of the
topological entropy significantly in concrete examples. Recall that, when M
is a metric space, the diameter of an open cover is defined to be the supremum
of the diameters of its elements.
Proposition 10.1.12. Assume that M is a compact metric space. Let $(\beta_k)_k$ be any sequence of open covers of M such that $\operatorname{diam} \beta_k$ converges to zero. Then
$$h(f) = \sup_k h(f, \beta_k) = \lim_k h(f, \beta_k).$$
Proof. Given any open cover α, let ε > 0 be a Lebesgue number of α. Take n ≥ 1 such that $\operatorname{diam} \beta_k < \varepsilon$ for every k ≥ n. By the definition of Lebesgue number, it follows that every element of $\beta_k$ is contained in some element of α. In other words, $\alpha \prec \beta_k$ and, hence, $h(f, \beta_k) \ge h(f, \alpha)$. In view of the definition (10.1.5), this proves that
$$\liminf_k h(f, \beta_k) \ge h(f).$$
It is also clear from the definitions that $h(f) \ge \sup_k h(f, \beta_k) \ge \limsup_k h(f, \beta_k)$. Combining these observations, we obtain the conclusion of the proposition.

Corollary 10.1.13. Assume that M is a compact metric space. If β is an open cover such that
(1) the diameter of the one-sided iterated sum $\beta^k = \bigvee_{j=0}^{k-1} f^{-j}(\beta)$ converges to zero when k → ∞, or
(2) f : M → M is a homeomorphism and the diameter of the two-sided iterated sum $\beta^{\pm k} = \bigvee_{j=-k}^{k-1} f^{-j}(\beta)$ converges to zero when k → ∞,
then h(f) = h(f, β).



Proof. In case (1), Propositions 10.1.10 and 10.1.12 yield
$$h(f) = \lim_k h(f, \beta^k) = h(f, \beta).$$

The proof in case (2) is analogous.

Next, we check that the topological entropy behaves as one could expect
with respect to positive iterates, at least when the transformation is uniformly
continuous:

Proposition 10.1.14. If f : M → M is a uniformly continuous transformation in a metric space then $h(f^k) = k\,h(f)$ for every k ∈ N.

Proof. Fix k ≥ 1 and let K ⊂ M be any compact set. Consider any n ≥ 1 and ε > 0. It is clear that if E ⊂ M is an (nk, ε)-generating set of K for the transformation f then it is also an (n, ε)-generating set of K for the iterate $f^k$. Therefore, $g_n(f^k, \varepsilon, K) \le g_{nk}(f, \varepsilon, K)$. Hence,
$$g(f^k, \varepsilon, K) = \limsup_n \frac{1}{n} \log g_n(f^k, \varepsilon, K) \le \limsup_n \frac{1}{n} \log g_{nk}(f, \varepsilon, K) \le k\, g(f, \varepsilon, K).$$
Making ε → 0 and taking the supremum over K, we see that $h(f^k) \le k\,h(f)$.
The proof of the other inequality uses the assumption that f is uniformly continuous. Take δ > 0 such that d(x, y) < δ implies $d(f^j(x), f^j(y)) < \varepsilon$ for every $j \in \{0, \dots, k-1\}$. If E ⊂ M is an (n, δ)-generating set of K for $f^k$ then E is an (nk, ε)-generating set of K for f. Therefore, $g_{nk}(f, \varepsilon, K) \le g_n(f^k, \delta, K)$. Since $g_m(f, \varepsilon, K)$ is non-decreasing in m (every (m + 1, ε)-generating set is also (m, ε)-generating), it suffices to consider the values m = nk in the lim sup that defines g(f, ε, K), and so this shows that $k\, g(f, \varepsilon, K) \le g(f^k, \delta, K)$. Making ε and δ go to zero, we get that $k\, g(f, K) \le g(f^k, K)$ for every compact set K. Hence, $k\,h(f) \le h(f^k)$.

In particular, since continuous transformations in compact metric spaces are uniformly continuous, Proposition 10.1.14 holds for every continuous transformation in a compact metric space. On the other hand, in the case of homeomorphisms in compact spaces the conclusion extends to negative iterates:

Proposition 10.1.15. If f : M → M is a homeomorphism of a compact metric space then $h(f^{-1}) = h(f)$. Consequently, $h(f^n) = |n|\,h(f)$ for every n ∈ Z.

Proof. Let α be an open cover of M. For every n ≥ 1, denote
$$\alpha_+^n = \alpha \vee f^{-1}(\alpha) \vee \cdots \vee f^{-n+1}(\alpha) \quad \text{and} \quad \alpha_-^n = \alpha \vee f(\alpha) \vee \cdots \vee f^{n-1}(\alpha).$$
Observe that $\alpha_-^n = f^{n-1}(\alpha_+^n)$. Moreover, γ is a finite subcover of $\alpha_+^n$ if and only if $f^{n-1}(\gamma)$ is a finite subcover of $\alpha_-^n$. Since the two subcovers have the same number of elements, it follows that $H(\alpha_+^n) = H(\alpha_-^n)$. Therefore,
$$h(f, \alpha) = \lim_n \frac{1}{n} H(\alpha_+^n) = \lim_n \frac{1}{n} H(\alpha_-^n) = h(f^{-1}, \alpha).$$
Since α is arbitrary, this proves that $h(f) = h(f^{-1})$. The second part of the statement follows from combining the first part with Proposition 10.1.14.

The claim in Proposition 10.1.15 is generally false when the space M is not
compact:

Example 10.1.16. Let M = R with the distance d(x, y) = |x − y| and take f : R → R to be given by f(x) = 2x. We are going to check that $h(f) \ne h(f^{-1})$. Let K = [0, 1] and, given n ≥ 1 and ε > 0, take E ⊂ R to be any (n, ε)-generating set of K. In particular, every point of $f^{n-1}(K) = [0, 2^{n-1}]$ is within less than ε from some point of $f^{n-1}(E)$. Hence,
$$2\varepsilon \, \#E = 2\varepsilon \, \#f^{n-1}(E) \ge 2^{n-1}.$$
This proves that $g_n(f, \varepsilon, K) \ge 2^{n-2}/\varepsilon$ for every n and, thus, $g(f, \varepsilon, K) \ge \log 2$. It follows that $h(f) \ge g(f, K) \ge \log 2$. On the other hand, $f^{-1}$ is a contraction and so it follows from Example 10.1.7 that its topological entropy $h(f^{-1})$ is zero.
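In this example the minimal generating sets can even be counted exactly. For f(x) = 2x we have $d(f^i(x), f^i(y)) = 2^i |x - y|$, so the dynamical ball B(a, n, ε) is the open interval of radius $r = \varepsilon/2^{n-1}$ around a, and an (n, ε)-generating set of K = [0, 1] is just a set of centers of such intervals covering [0, 1]. Since N open intervals of length 2r cover [0, 1] exactly when 2rN > 1, one gets $g_n(f, \varepsilon, K) = \lfloor 2^{n-2}/\varepsilon \rfloor + 1$. This closed formula is our own derivation, offered as an illustration of the bound above.

```python
import math

def min_generating_size(n, eps):
    """Smallest (n, eps)-generating set of K = [0, 1] for f(x) = 2x on R.
    Dynamical balls are open intervals of radius r = eps / 2**(n - 1), and
    N open intervals of length 2r cover [0, 1] iff N * 2r > 1."""
    return math.floor(2 ** (n - 2) / eps) + 1

eps = 0.1
for n in (5, 10, 20, 40):
    g = min_generating_size(n, eps)
    assert g >= 2 ** (n - 2) / eps   # the lower bound used in Example 10.1.16
    print(n, math.log(g) / n)        # slowly approaches log 2 as n grows
```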

10.1.4 Exercises
10.1.1. Let M be a compact topological space. Show that if α and β are open covers of
M such that α ≺ β then H(α) ≤ H(β).
10.1.2. Let f : M → M be a continuous transformation and α, β be open covers of
a compact topological space M. Show that H(α ∨ β) ≤ H(α) + H(β) and
H(f −1 (β)) ≤ H(β). Check that if f is surjective then H(f −1 (β)) = H(β).
10.1.3. Let M be a compact topological space. Show that if f : M → M is a surjective
continuous transformation and β is an open cover of M then h(f , β) =
h(f , f −1 (β)). Moreover, if f is a homeomorphism then h(f , β) = h(f , f (β)).
10.1.4. Let M = (0, ∞) and f : M → M be given by f (x) = 2x. Calculate the topological
entropy of f when one considers in M:
(a) the usual distance d(x, y) = |x − y|;
(b) the distance d(x, y) = | log x − log y|.
[Observation: Hence, in non-compact spaces the topological entropy may
depend on the distance function, not just the topology.]
10.1.5. Consider in M two distances d1 and d2 that are uniformly equivalent: for every ε > 0 there exists δ > 0 such that
$$d_1(x, y) < \delta \Rightarrow d_2(x, y) < \varepsilon \quad \text{and} \quad d_2(x, y) < \delta \Rightarrow d_1(x, y) < \varepsilon.$$
Show that if f : M → M is continuous with respect to either of the two distances then the value of the topological entropy is the same relative to both distances.
10.1.6. Let f : M → M and g : N → N be continuous transformations in compact metric
spaces. Show that if there exists a continuous injective map ψ : M → N such that
ψ ◦ f = g ◦ ψ then h(f ) ≤ h(g). Use this fact to show that the topological entropy
of the shift map σ : [0, 1]Z → [0, 1]Z is infinite (thus, the topological entropy of a
homeomorphism of a compact space need not be finite). [Observation: The first
claim remains valid for non-compact spaces, as long as we require the inverse
ψ −1 : ψ(M) → M to be uniformly continuous.]
10.2 Examples 313

10.1.7. Show that if $K, K_1, \dots, K_l$ are compact sets such that K is contained in $K_1 \cup \cdots \cup K_l$ then $g(f, K) \le \max_j g(f, K_j)$. Conclude that, given any δ > 0,
$$g(f) = \sup\{g(f, K) : K \text{ compact with } \operatorname{diam} K < \delta\}$$
and analogously for s(f).


10.1.8. Prove that the logistic map f : [0, 1] → [0, 1], f (x) = 4x(1 − x) is topologically
conjugate to the map g : [0, 1] → [0, 1] defined by g(x) = 1 − |2x − 1|. Use this
fact to calculate h(f ).
10.1.9. Let A be a finite alphabet and σ : Σ → Σ be the shift map in $\Sigma = A^{\mathbb{N}}$. The complexity of a sequence x ∈ Σ is defined by $c(x) = \lim_n n^{-1} \log c_n(x)$, where $c_n(x)$ is the number of distinct words of length n that appear in x. Show that this limit exists and coincides with the topological entropy of the restriction σ : X → X of the shift map to the closure X of the orbit of x. [Observation: One interesting application we have in mind is in the context of Example 6.3.10, where x is the fixed point of a substitution.]
10.1.10. Check that if θ is the fixed point of the Fibonacci substitution in A = {0, 1}
(see Example 6.3.10) then cn (θ ) = n + 1 for every n and so the complexity c(θ )
is equal to zero. Hence, the topological entropy of the shift map σ : X → X
associated with the Fibonacci substitution is equal to zero.
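Exercise 10.1.10 is easy to explore numerically. The sketch below assumes that the Fibonacci substitution is 0 ↦ 01, 1 ↦ 0 (Example 6.3.10 may use a different labelling, which does not affect the counts), generates a long prefix of its fixed point and counts the distinct subwords of each length. A finite prefix can only undercount $c_n(\theta)$, so this is evidence for the claim, not a proof.

```python
def fibonacci_word(length):
    """Prefix of the fixed point of the substitution 0 -> 01, 1 -> 0 (assumed convention)."""
    w = "0"
    while len(w) < length:
        w = "".join("01" if c == "0" else "0" for c in w)
    return w[:length]

def complexity(word, n):
    """Number of distinct subwords of length n occurring in the given finite word."""
    return len({word[i:i + n] for i in range(len(word) - n + 1)})

w = fibonacci_word(10000)
print([complexity(w, n) for n in range(1, 11)])  # n + 1 for each n
```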

10.2 Examples
Let us use a few concrete situations to illustrate the ideas introduced in the
previous section.

10.2.1 Expansive maps


Recall (Section 9.2.3) that a continuous transformation f : M → M in a
compact metric space is said to be expansive if there exists ε0 > 0 such that
d(f j (x), f j (y)) < ε0 for every j ∈ N implies that x = y. When f : M → M is
invertible, we say that it is two-sided expansive if there exists ε0 > 0 such that
d(f j (x), f j (y)) < ε0 for every j ∈ Z implies that x = y. In both cases, ε0 is called
a constant of expansivity for f .
Proposition 10.2.1. If ε0 > 0 is a constant of expansivity for f then

(i) h(f ) = h(f , α) for every open cover α with diameter less than ε0 ;
(ii) h(f ) = g(f , ε, M) = s(f , ε, M) for every ε < ε0 /2.

In particular, h(f ) < ∞.

Proof. Let α be any open cover of M with diameter less than ε0 . We claim that
limk diam α k = 0. Indeed, suppose that this is not so. It is clear that the sequence
of diameters is non-increasing. Then, there exists δ > 0 and for each k ≥ 1
there exist points xk and yk in the same element of α k such that d(xk , yk ) ≥ δ.
By compactness, we may find a subsequence $(k_j)_j$ such that both $x = \lim_j x_{k_j}$ and $y = \lim_j y_{k_j}$ exist. On the one hand, d(x, y) ≥ δ and so x ≠ y. On the other hand, the fact that $x_k$ and $y_k$ are in the same element of $\alpha^k$ implies that
$$d(f^i(x_k), f^i(y_k)) \le \operatorname{diam} \alpha \quad \text{for every } 0 \le i < k.$$
Passing to the limit, we get that $d(f^i(x), f^i(y)) \le \operatorname{diam} \alpha < \varepsilon_0$ for every i ≥ 0. This contradicts the hypothesis that ε0 is a constant of expansivity for f. This contradiction proves our claim. Using Corollary 10.1.13, it follows that h(f) = h(f, α), as claimed in part (i).
To prove part (ii), let α be the open cover of M formed by the balls of radius ε. Note that $\alpha^n$ contains every dynamical ball B(x, n, ε):
$$B(x, n, \varepsilon) = \bigcap_{j=0}^{n-1} f^{-j}\big(B(f^j(x), \varepsilon)\big) \quad \text{and each } B(f^j(x), \varepsilon) \in \alpha.$$
If E is an (n, ε)-generating set of M then {B(a, n, ε) : a ∈ E} is an open cover of M; in view of what we have just said, it is a subcover of $\alpha^n$. Therefore (recall also Lemma 10.1.5),
$$N(\alpha^n) \le g_n(f, \varepsilon, M) \le s_n(f, \varepsilon, M) \quad \text{for every } n.$$
Passing to the limit, we get that h(f, α) ≤ g(f, ε, M) ≤ s(f, ε, M). Recall that s(f, ε, M) ≤ s(f, M) = h(f). Since diam α ≤ 2ε < ε0, the first part of the proposition yields that h(f) = h(f, α). These relations imply part (ii).
The last claim in the proposition is a direct consequence, since g(f , ε, M),
s(f , ε, M) and h(f , α) are always finite. Indeed, that h(f , α) < ∞ for every open
cover was observed right after the definition (10.1.3). Then (10.1.14) implies
that s(f , ε, M) < ∞ and (10.1.13) implies that g(f , ε, M) < ∞ for every ε > 0.

Exercise 10.2.8 contains an extension of Proposition 10.2.1 to h-expansive transformations, due to Rufus Bowen [Bow72]. Exercise 10.1.6 shows that the topological entropy of a continuous transformation, or even a homeomorphism, in a compact metric space may be infinite if one omits the expansivity assumption.
Next, we prove that for expansive maps the topological entropy is an upper bound on the rate of growth of the number of periodic points. Let $\operatorname{Fix}(f^n)$ denote the set of all points $x \in M$ such that $f^n(x) = x$.

Proposition 10.2.2. If $M$ is a compact metric space and $f : M \to M$ is expansive then
$$\limsup_n \frac{1}{n} \log \#\operatorname{Fix}(f^n) \le h(f).$$
Proof. Let $\varepsilon_0$ be a constant of expansivity for $f$ and $\alpha$ be any open cover of $M$ with $\operatorname{diam} \alpha < \varepsilon_0$. We claim that every element of $\alpha^n$ contains at most one point of $\operatorname{Fix}(f^n)$. Indeed, if $x, y \in \operatorname{Fix}(f^n)$ are in the same element of $\alpha^n$ then

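$d(f^i(x), f^i(y)) \le \operatorname{diam} \alpha < \varepsilon_0$ for every $i = 0, \dots, n-1$. Since $f^n(x) = x$ and $f^n(y) = y$, it follows that $d(f^i(x), f^i(y)) < \varepsilon_0$ for every $i \ge 0$. By expansivity, this implies that $x = y$, which proves our claim. Since every point of $\operatorname{Fix}(f^n)$ lies in some element of any subcover of $\alpha^n$, we get $\#\operatorname{Fix}(f^n) \le N(\alpha^n)$, and so
$$\limsup_n \frac{1}{n} \log \#\operatorname{Fix}(f^n) \le \limsup_n \frac{1}{n} \log N(\alpha^n) = h(f, \alpha).$$
Taking the limit when the diameter of $\alpha$ goes to zero, we get the conclusion of the proposition.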

In some interesting situations, one can show that the topological entropy actually coincides with the rate of growth of the number of periodic points:
$$\lim_n \frac{1}{n} \log \#\operatorname{Fix}(f^n) = h(f). \qquad (10.2.1)$$
That is the case, for example, for the shifts of finite type with aperiodic transition matrix, which we are going to study in Section 10.2.2 (check Proposition 10.2.5 below). More generally, (10.2.1) holds whenever $f : M \to M$ is an expanding transformation in a compact metric space, as we are going to see in Section 11.3.
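The doubling map $f(x) = 2x \bmod 1$ on the circle $\mathbb{R}/\mathbb{Z}$ is not discussed in this section; we borrow it here only as a familiar expanding (hence expansive) map whose topological entropy is $\log 2$. A minimal Python sketch, assuming that example and using helper names of our own, lists $\operatorname{Fix}(f^n)$ with exact rational arithmetic. The count is $2^n - 1$, so $\frac{1}{n}\log\#\operatorname{Fix}(f^n) \to \log 2$, consistent with Proposition 10.2.2 and with (10.2.1).

```python
from fractions import Fraction
import math


def doubling_iterate(x: Fraction, n: int) -> Fraction:
    """Apply the doubling map f(x) = 2x mod 1 to x exactly, n times."""
    for _ in range(n):
        x = (2 * x) % 1
    return x


def fixed_points_of_power(n: int) -> list:
    """Points of Fix(f^n) for the doubling map on [0, 1).

    f^n(x) = 2^n x mod 1, so f^n(x) = x exactly when (2^n - 1) x is an
    integer, i.e. x = k / (2^n - 1) with 0 <= k < 2^n - 1.
    """
    q = 2 ** n - 1
    candidates = [Fraction(k, q) for k in range(q)]
    # Sanity check: iterate the map exactly on every candidate.
    return [x for x in candidates if doubling_iterate(x, n) == x]


for n in range(1, 13):
    count = len(fixed_points_of_power(n))
    assert count == 2 ** n - 1
    # (1/n) log #Fix(f^n) approaches log 2 as n grows.
    print(n, count, math.log(count) / n)
```

Exact fractions avoid the round-off that would make floating-point orbits of the doubling map meaningless after about 50 iterations.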

10.2.2 Shifts of finite type


Let $X = \{1, \dots, d\}$ be a finite set and $A = (A_{i,j})_{i,j}$ be a transition matrix, that is, a square matrix of dimension $d \ge 2$ with coefficients in the set $\{0, 1\}$ and such that no row is identically zero: for every $i$ there exists $j$ such that $A_{i,j} = 1$. Consider the subset $\Sigma_A$ of $\Sigma = X^{\mathbb{N}}$ consisting of all the sequences $(x_n)_n \in \Sigma$ that are $A$-admissible, meaning that
$$A_{x_n, x_{n+1}} = 1 \quad \text{for every } n \in \mathbb{N}. \qquad (10.2.2)$$
It is clear that $\Sigma_A$ is invariant under the shift map $\sigma : \Sigma \to \Sigma$, in the sense that $\sigma(\Sigma_A) \subset \Sigma_A$. Note also that $\Sigma_A$ is closed in $\Sigma$ and, hence, it is a compact metric space (this is similar to Lemma 7.2.5).
The restriction $\sigma_A : \Sigma_A \to \Sigma_A$ of the shift map $\sigma : \Sigma \to \Sigma$ to this invariant compact set is called the one-sided shift of finite type associated with $A$. The two-sided shift of finite type associated with a transition matrix $A$ is defined analogously, considering $\Sigma = X^{\mathbb{Z}}$ and requiring (10.2.2) for every $n \in \mathbb{Z}$. In this case, as part of the definition of a transition matrix, we also require the columns (not just the rows) of $A$ to be non-zero.
The restriction of the shift map $\sigma : \Sigma \to \Sigma$ to the support of any Markov measure is a shift of finite type:
Example 10.2.3. Given a stochastic matrix $P = (P_{i,j})_{i,j}$, define $A = (A_{i,j})_{i,j}$ by
$$A_{i,j} = \begin{cases} 1 & \text{if } P_{i,j} > 0, \\ 0 & \text{if } P_{i,j} = 0. \end{cases}$$
Note that $A$ is a transition matrix: the definition of a stochastic matrix implies that no row of $P$ is identically zero (in the two-sided situation we must assume that the columns of $P$ are also not zero; this is automatic, for example, if the matrix $P$ is aperiodic). Comparing (7.2.7) and (10.2.2), we see that a sequence is $A$-admissible if and only if it is $P$-admissible. Let $\mu$ be the Markov measure determined by a probability vector $p = (p_j)_j$ with positive coefficients and such that $P^* p = p$ (recall Example 7.2.2). By Lemma 7.2.5, the support of $\mu$ coincides with the set $\Sigma_A = \Sigma_P$ of all admissible sequences.

[Figure 10.1. Graph associated with a transition matrix (vertices 1, 2, 3, 4).]
It is useful to associate with any transition matrix $A$ the oriented graph whose vertices are the points of $X = \{1, \dots, d\}$ and such that there exists an edge from vertex $a$ to vertex $b$ if and only if $A_{a,b} = 1$. In other words,
$$G_A = \{(a, b) \in X \times X : A_{a,b} = 1\}.$$
For example, Figure 10.1 describes the graph associated with the matrix
$$A = \begin{pmatrix} 0 & 1 & 1 & 0 \\ 1 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \end{pmatrix}.$$
A path of length $l \ge 1$ in the graph $G_A$ is a sequence $a_0, \dots, a_l$ in $X$ such that $A_{a_{i-1}, a_i} = 1$ for every $i$, that is, such that there always exists an edge connecting $a_{i-1}$ to $a_i$. Given $a, b \in X$ and $l \ge 1$, denote by $A^l_{a,b}$ the number of paths of length $l$ starting at $a$ and ending at $b$, that is, with $a_0 = a$ and $a_l = b$. Observe that:

1. $A^1_{a,b} = 1$ if there exists an edge connecting $a$ to $b$ and $A^1_{a,b} = 0$ otherwise. In other words, $A^1_{a,b} = A_{a,b}$ for every $a, b$.
2. The paths of length $l + m$ starting at $a$ and ending at $b$ are the concatenations of the paths of length $l$ starting at $a$ and ending at some point $z \in X$ with the paths of length $m$ starting at that point $z$ and ending at $b$. Therefore,
$$A^{l+m}_{a,b} = \sum_{z=1}^{d} A^l_{a,z} A^m_{z,b} \quad \text{for every } a, b \in X \text{ and every } l, m \ge 1.$$

It follows, by induction on $l$, that $A^l_{a,b}$ coincides with the coefficient in row $a$ and column $b$ of the matrix $A^l$.
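The two observations above can be checked mechanically. The Python sketch below (helper names are our own) compares a brute-force count of paths in the graph of Figure 10.1 with the entries of the matrix powers $A^l$; vertices $1, \dots, 4$ are stored as indices $0, \dots, 3$.

```python
from itertools import product

# Transition matrix of Figure 10.1; vertex a in {1, 2, 3, 4} is stored as index a - 1.
A = [
    [0, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
]


def mat_mul(X, Y):
    """Product of two square integer matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(X, l):
    """X^l for l >= 1, by repeated multiplication."""
    result = X
    for _ in range(l - 1):
        result = mat_mul(result, X)
    return result


def count_paths(A, a, b, l):
    """Brute-force number of paths a = a_0, a_1, ..., a_l = b in the graph G_A."""
    d = len(A)
    total = 0
    for middle in product(range(d), repeat=l - 1):
        path = (a, *middle, b)
        if all(A[path[i - 1]][path[i]] == 1 for i in range(1, l + 1)):
            total += 1
    return total


# The number of paths A^l_{a,b} agrees with the (a, b) entry of the matrix power.
for l in range(1, 7):
    Al = mat_pow(A, l)
    assert all(count_paths(A, a, b, l) == Al[a][b]
               for a in range(4) for b in range(4))
```

For instance, the two paths $1 \to 2 \to 1$ and $1 \to 3 \to 1$ account for the entry $A^2_{1,1} = 2$.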

The basic topological properties of shifts of finite type are analyzed in Exercise 10.2.2. In the proposition that follows we calculate the topological entropy of these transformations. We need a few prior observations about transition matrices.
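Recall that the spectral radius $\rho(B)$ of a linear map $B : \mathbb{R}^d \to \mathbb{R}^d$ (that is, the largest absolute value of an eigenvalue of $B$) is given by
$$\rho(B) = \lim_n \|B^n\|^{1/n} = \limsup_n |\operatorname{trc} B^n|^{1/n}, \qquad (10.2.3)$$
where $\operatorname{trc}$ denotes the trace of the matrix and $\|\cdot\|$ denotes any norm in the vector space of linear maps (all norms are equivalent, as we are in finite dimension). The trace term is only a lim sup in general: for $B = \operatorname{diag}(1, -1)$ one has $\operatorname{trc} B^n = 0$ for every odd $n$. Most of the time, one uses the operator norm $\|B\| = \sup\{\|Bv\|/\|v\| : v \ne 0\}$, but it will also be useful to consider the norm $\|\cdot\|_s$ defined by
$$\|B\|_s = \sum_{i,j=1}^{d} |B_{i,j}|.$$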

Now take $A$ to be a transition matrix. Since the coefficients of $A$ are non-negative, we may use the Perron–Frobenius theorem (Theorem 7.2.3) to conclude that $A$ admits a non-negative eigenvalue $\lambda_A$ that is equal to the spectral radius. By our definition of the transition matrix, we also have that all the rows of $A$ are non-zero. Then the same is true about $A^n$, for any $n \ge 1$ (Exercise 10.2.5). This implies that all the coefficients of the vector $A^n(1, \dots, 1)$ are positive (and integer) and, thus,
$$\|A^n\| \ge \frac{\|A^n(1, \dots, 1)\|}{\|(1, \dots, 1)\|} \ge 1 \quad \text{for every } n \ge 1.$$
Using (10.2.3), we get that $\lambda_A = \rho(A) \ge 1$ for every transition matrix $A$.
Proposition 10.2.4. The topological entropy $h(\sigma_A)$ of a shift of finite type $\sigma_A : \Sigma_A \to \Sigma_A$ is given by $h(\sigma_A) = \log \lambda_A$, where $\lambda_A$ is the largest eigenvalue of the transition matrix $A$.

Proof. We treat the case of one-sided shifts; the two-sided case is analogous, as the reader may readily check. Consider the open cover $\alpha$ of $\Sigma_A$ formed by the restrictions
$$[0; a]_A = \{(x_j)_j \in \Sigma_A : x_0 = a\}$$
of the cylinders $[0; a]$ of $\Sigma$. For each $n \ge 1$, the open cover $\alpha^n$ is formed by the restrictions
$$[0; a_0, \dots, a_{n-1}]_A = \{(x_j)_j \in \Sigma_A : x_j = a_j \text{ for } j = 0, \dots, n-1\}$$
of the cylinders of length $n$. Observe that $[0; a_0, \dots, a_{n-1}]_A$ is non-empty if and only if $a_0, \dots, a_{n-1}$ is a path (of length $n-1$) in the graph $G_A$: it is evident that this condition is necessary; to see that it is also sufficient, use the assumption that for every $i$ there exists $j$ such that $A_{i,j} = 1$. Since the cylinders are pairwise disjoint, this observation shows that $N(\alpha^n)$ is equal to the total number of paths of length $n-1$ in the graph $G_A$. In other words,
$$N(\alpha^n) = \sum_{i,j=1}^{d} A^{n-1}_{i,j} = \|A^{n-1}\|_s.$$
By the spectral radius formula (10.2.3), it follows that
$$h(\sigma_A, \alpha) = \lim_n \frac{1}{n} \log N(\alpha^n) = \lim_n \frac{1}{n} \log \|A^{n-1}\|_s = \log \rho(A) = \log \lambda_A.$$
Finally, since $\operatorname{diam} \alpha^n \to 0$, Corollary 10.1.13 yields that $h(\sigma_A) = h(\sigma_A, \alpha)$.
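The proof reduces $h(\sigma_A)$ to counting non-empty cylinders. The Python sketch below (helper names are our own) checks $N(\alpha^n) = \|A^{n-1}\|_s$ by brute force for the matrix of Figure 10.1 and then estimates $\log \lambda_A$ in two ways. For this particular matrix one can check by hand that $\det(A - tI) = (1 - t)(-t^3 + 2t^2 + t - 1)$, whose largest root is $\lambda_A = 1 + 2\cos(2\pi/7) \approx 2.247$; the code uses this value only as a reference.

```python
import math
from itertools import product

A = [[0, 1, 1, 0], [1, 1, 0, 1], [1, 0, 1, 0], [1, 0, 0, 1]]  # Figure 10.1


def admissible_words(A, n):
    """Words a_0 ... a_{n-1} with A[a_{i-1}][a_i] == 1 (the non-empty cylinders)."""
    return [w for w in product(range(len(A)), repeat=n)
            if all(A[w[i - 1]][w[i]] == 1 for i in range(1, n))]


def sum_norm_of_power(A, m):
    """||A^m||_s = (1, ..., 1) A^m (1, ..., 1)^T, with A^0 the identity."""
    v = [1] * len(A)
    for _ in range(m):
        v = [sum(A[i][k] * v[k] for k in range(len(A))) for i in range(len(A))]
    return sum(v)


# N(alpha^n) = ||A^{n-1}||_s, checked by brute force for small n.
for n in range(1, 9):
    assert len(admissible_words(A, n)) == sum_norm_of_power(A, n - 1)

# Reference value: largest root of -t^3 + 2t^2 + t - 1 (computed by hand for this A).
lam = 1 + 2 * math.cos(2 * math.pi / 7)

n = 300
slow = math.log(sum_norm_of_power(A, n - 1)) / n        # h(sigma_A, alpha) estimate, error O(1/n)
fast = math.log(sum_norm_of_power(A, n) / sum_norm_of_power(A, n - 1))
print(slow, fast, math.log(lam))
```

The ratio of consecutive norms converges much faster than $\frac{1}{n}\log\|A^{n-1}\|_s$ here because this $A$ is aperiodic, so $\lambda_A$ strictly dominates the other eigenvalues in absolute value; Python integers keep the large entries of $A^{n}$ exact.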

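Proposition 10.2.5. If $\sigma_A : \Sigma_A \to \Sigma_A$ is a shift of finite type then
$$h(\sigma_A) = \limsup_n \frac{1}{n} \log \#\operatorname{Fix}(\sigma_A^n).$$

Proof. We treat the case of one-sided shifts, leaving the two-sided case for the reader. Note that $(x_k)_k \in \Sigma_A$ is a fixed point of $\sigma_A^n$ if and only if $x_k = x_{k-n}$ for every $k \ge n$. In particular, every cylinder $[0; a_0, \dots, a_{n-1}]_A$ contains at most one element of $\operatorname{Fix}(\sigma_A^n)$. Moreover, the cylinder does contain a fixed point if and only if $a_0, \dots, a_{n-1}, a_0$ is a path (of length $n$) in the graph $G_A$. This proves that
$$\#\operatorname{Fix}(\sigma_A^n) = \sum_{i=1}^{d} A^n_{i,i} = \operatorname{trc} A^n$$
for every $n$. Consequently, by (10.2.3),
$$\limsup_n \frac{1}{n} \log \#\operatorname{Fix}(\sigma_A^n) = \limsup_n \frac{1}{n} \log \operatorname{trc} A^n = \log \rho(A).$$
Now the conclusion is a direct consequence of the previous proposition. When $A$ is aperiodic (Exercise 10.2.2), $\lambda_A$ is a simple eigenvalue strictly larger in absolute value than all the others, so the lim sup is actually a limit. In general it is not: for $A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ one has $\#\operatorname{Fix}(\sigma_A^n) = 0$ for every odd $n$.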
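The identity $\#\operatorname{Fix}(\sigma_A^n) = \operatorname{trc} A^n$ can be tested by brute force. In the Python sketch below (helper names are our own), each word $a_0 \dots a_{n-1}$ for which $a_0, \dots, a_{n-1}, a_0$ is a path stands for the periodic sequence obtained by repeating it. The $2 \times 2$ matrix `swap` is our irreducible but not aperiodic example, whose traces vanish at odd times.

```python
from itertools import product


def trace_of_power(A, n):
    """trc A^n, computed from repeated integer matrix products (n >= 0)."""
    d = len(A)
    P = [[int(i == j) for j in range(d)] for i in range(d)]  # identity matrix
    for _ in range(n):
        P = [[sum(P[i][k] * A[k][j] for k in range(d)) for j in range(d)]
             for i in range(d)]
    return sum(P[i][i] for i in range(d))


def count_periodic_words(A, n):
    """Number of words a_0 ... a_{n-1} such that a_0, ..., a_{n-1}, a_0 is a path in G_A.

    Each such word corresponds to exactly one point of Fix(sigma_A^n), namely
    the sequence a_0 ... a_{n-1} a_0 ... a_{n-1} ...
    """
    d = len(A)
    count = 0
    for w in product(range(d), repeat=n):
        closed = w + (w[0],)
        if all(A[closed[i]][closed[i + 1]] == 1 for i in range(n)):
            count += 1
    return count


fig_10_1 = [[0, 1, 1, 0], [1, 1, 0, 1], [1, 0, 1, 0], [1, 0, 0, 1]]
swap = [[0, 1], [1, 0]]  # irreducible but not aperiodic

for n in range(1, 8):
    assert count_periodic_words(fig_10_1, n) == trace_of_power(fig_10_1, n)
    assert count_periodic_words(swap, n) == trace_of_power(swap, n)

print([trace_of_power(swap, n) for n in range(1, 7)])  # [0, 2, 0, 2, 0, 2]
```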

10.2.3 Topological entropy of flows


The definition of topological entropy extends easily to the context of continuous flows $\phi = \{\phi^t : M \to M : t \in \mathbb{R}\}$ in a metric space $M$, as we now explain.
Given $x \in M$, $T > 0$ and $\varepsilon > 0$, the dynamical ball of center $x$, length $T$ and radius $\varepsilon$ is the set
$$B(x, T, \varepsilon) = \{y \in M : d(\phi^t(x), \phi^t(y)) < \varepsilon \text{ for every } 0 \le t \le T\}.$$
Let $K$ be any compact subset of $M$. We say that $E \subset M$ is a $(T, \varepsilon)$-generating set for $K$ if
$$K \subset \bigcup_{x \in E} B(x, T, \varepsilon),$$
and we say that $E \subset K$ is a $(T, \varepsilon)$-separated set if the dynamical ball $B(x, T, \varepsilon)$ of each $x \in E$ contains no other element of $E$.

Denote by $g_T(\phi, \varepsilon, K)$ the smallest cardinality of a $(T, \varepsilon)$-generating set of $K$ and by $s_T(\phi, \varepsilon, K)$ the largest cardinality of a $(T, \varepsilon)$-separated set $E \subset K$. Then, take
$$g(\phi, K) = \lim_{\varepsilon \to 0} \limsup_{T \to \infty} \frac{1}{T} \log g_T(\phi, \varepsilon, K) \quad \text{and} \quad s(\phi, K) = \lim_{\varepsilon \to 0} \limsup_{T \to \infty} \frac{1}{T} \log s_T(\phi, \varepsilon, K)$$
and define
$$g(\phi) = \sup_K g(\phi, K) \quad \text{and} \quad s(\phi) = \sup_K s(\phi, K),$$
where both suprema are taken over all the compact sets $K \subset M$.
The next result, a continuous-time analogue of Proposition 10.1.4, ensures that these two last numbers coincide. We leave the proof up to the reader (Exercise 10.2.3). By definition, the topological entropy of the flow $\phi$ is the number $h(\phi) = g(\phi) = s(\phi)$.

Proposition 10.2.6. We have $g(\phi, K) = s(\phi, K)$ for every compact $K \subset M$. Consequently, $g(\phi) = s(\phi)$.

In the statement that follows we take the flow to be uniformly continuous, that is, such that for every $T > 0$ and $\varepsilon > 0$ there exists $\delta > 0$ such that
$$d(x, y) < \delta \Rightarrow d(\phi^t(x), \phi^t(y)) < \varepsilon \quad \text{for every } t \in [-T, T].$$
Observe that this is automatic for continuous flows when $M$ is compact.

Proposition 10.2.7. If the flow $\phi$ is uniformly continuous then its topological entropy $h(\phi)$ coincides with the topological entropy $h(\phi^1)$ of its time-1 map.

Proof. It suffices to prove that $g(\phi, K) = g(\phi^1, K)$ for every compact $K \subset M$.
It is clear that if $E \subset M$ is $(T, \varepsilon)$-generating for $K$ relative to the flow $\phi$ then $E$ is also $(n, \varepsilon)$-generating for $K$ relative to the time-1 map, for any $n \le T + 1$. In particular, $g_n(\phi^1, \varepsilon, K) \le g_T(\phi, \varepsilon, K)$ whenever $n \le T + 1$. It follows that
$$\limsup_n \frac{1}{n} \log g_n(\phi^1, \varepsilon, K) \le \limsup_{T \to \infty} \frac{1}{T} \log g_T(\phi, \varepsilon, K),$$
and so $g(\phi^1, K) \le g(\phi, K)$.
The hypothesis of uniform continuity is used for the opposite inequality. Given $\varepsilon > 0$, fix $\delta \in (0, \varepsilon)$ such that if $d(x, y) < \delta$ then $d(\phi^t(x), \phi^t(y)) < \varepsilon$ for every $t \in [0, 1]$. If $E \subset M$ is an $(n, \delta)$-generating set of $K$ relative to $\phi^1$ then $E$ is a $(T, \varepsilon)$-generating set of $K$ relative to the flow $\phi$, for any $T \le n$. In particular, $g_T(\phi, \varepsilon, K) \le g_n(\phi^1, \delta, K)$ whenever $T \le n$. It follows that
$$\limsup_{T \to \infty} \frac{1}{T} \log g_T(\phi, \varepsilon, K) \le \limsup_n \frac{1}{n} \log g_n(\phi^1, \delta, K)$$
(given a sequence $(T_j)_j$ that realizes the lim sup on the left-hand side, consider the sequence $(n_j)_j$ given by $n_j = [T_j] + 1$). Making $\varepsilon \to 0$ (then $\delta \to 0$), we get that $g(\phi, K) \le g(\phi^1, K)$.

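We have seen previously that for transformations the topological entropy is an invariant of topological (uniformly continuous) conjugacy. The same is true for flows: this follows from Proposition 10.2.7 and the obvious observation that any flow conjugacy also conjugates the corresponding time-1 maps.
However, in the continuous-time context, one more often uses the concept of topological equivalence, which allows for rescaling of time. Topological equivalence need not preserve the topological entropy: for instance, if $M$ is compact and $\psi^t = \phi^{2t}$ for every $t$, then $\psi$ and $\phi$ have the same orbits, but Proposition 10.2.7 and the identity $h(g^k) = k\,h(g)$ give $h(\psi) = h((\phi^1)^2) = 2h(\phi)$, which differs from $h(\phi)$ whenever $0 < h(\phi) < \infty$.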

10.2.4 Differentiable maps


In this section we take $M$ to be a Riemannian manifold (Appendix A.4.5). Let $f : M \to M$ be a differentiable map and $Df(x) : T_xM \to T_{f(x)}M$ denote the derivative of $f$ at each point $x \in M$. Our goal is to prove that the norm of the derivative, defined by
$$\|Df(x)\| = \sup\left\{\frac{\|Df(x)v\|}{\|v\|} : v \in T_xM \text{ and } v \ne 0\right\},$$
determines an upper bound for the topological entropy $h(f)$ of $f$. For $x > 0$, we denote $\log^+ x = \max\{\log x, 0\}$.

Proposition 10.2.8. Let $f : M \to M$ be a differentiable map in a Riemannian manifold of dimension $d$ such that $\|Df\|$ is bounded. Then
$$h(f) \le d \log^+ \sup \|Df\| < \infty.$$

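Proof. Let $L = \sup\{\|Df(x)\| : x \in M\}$. By the mean value theorem,
$$d(f(x), f(y)) \le L\, d(x, y) \quad \text{for every } x, y \in M.$$
If $L \le 1$ then, as we have seen in Example 10.1.7, the entropy of $f$ is zero. Thus, from now on we may suppose that $L > 1$.
Let $\mathcal{A}$ be an atlas of the manifold $M$ consisting of charts $\varphi_\alpha : U_\alpha \to X_\alpha$ with $X_\alpha = (-2, 2)^d$. Given any compact set $K \subset M$, we may find a finite family $\mathcal{A}_K \subset \mathcal{A}$ such that
$$\{\varphi_\alpha^{-1}((-1, 1)^d) : \varphi_\alpha \in \mathcal{A}_K\}$$
covers $K$. Fix $B > 0$ such that $d(\varphi_\alpha^{-1}(u), \varphi_\alpha^{-1}(v)) \le B\, d(u, v)$ for all $u, v \in [-1, 1]^d$ and $\varphi_\alpha \in \mathcal{A}_K$. Given $n \ge 1$ and $\varepsilon > 0$, fix $\delta = \big(\varepsilon / (B\sqrt{d})\big) L^{-n}$. Denote by $\delta\mathbb{Z}^d$ the set of all points of the form $(\delta k_1, \dots, \delta k_d)$ with $k_j \in \mathbb{Z}$ for every $j = 1, \dots, d$. Let $E \subset M$ be the union of the pre-images $\varphi_\alpha^{-1}(\delta\mathbb{Z}^d \cap (-1, 1)^d)$, with $\varphi_\alpha \in \mathcal{A}_K$.
Note that every point of $(-1, 1)^d$ is at a distance less than $\delta\sqrt{d}$ from some point of $\delta\mathbb{Z}^d \cap (-1, 1)^d$. Therefore, for any $\varphi_\alpha \in \mathcal{A}_K$, every $x \in \varphi_\alpha^{-1}((-1, 1)^d)$ is at a distance less than $B\delta\sqrt{d}$ from some point $a \in \varphi_\alpha^{-1}(\delta\mathbb{Z}^d \cap (-1, 1)^d)$. Then, by the choice of $\delta$,
$$d(f^j(x), f^j(a)) \le L^j B\delta\sqrt{d} < L^n B\delta\sqrt{d} = \varepsilon$$
for every $j = 0, \dots, n-1$. This proves that $E$ is an $(n, \varepsilon)$-generating set for $K$. On the other hand, by construction (taking $\varepsilon$ small enough that $\delta \le 1$),
$$\#E \le \#\mathcal{A}_K \, \#\big(\delta\mathbb{Z}^d \cap (-1, 1)^d\big) \le \#\mathcal{A}_K (2/\delta + 1)^d \le \#\mathcal{A}_K \big(3B\sqrt{d}\,L^n/\varepsilon\big)^d,$$
so the expression on the right-hand side is an upper bound for $g_n(f, \varepsilon, K)$. Consequently,
$$g(f, \varepsilon, K) \le \limsup_n \frac{1}{n} \log\Big(\#\mathcal{A}_K \big(3B\sqrt{d}\,L^n/\varepsilon\big)^d\Big) = d \log L.$$
Making $\varepsilon \to 0$ and taking the supremum over $K$, we get that $h(f) \le d \log L$.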

Combining Propositions 10.1.14 and 10.2.8, we find that
$$h(f) \le \frac{d}{n} \log^+ \sup \|Df^n\| \quad \text{for every } n \ge 1.$$
When $f$ is a diffeomorphism, using Proposition 10.1.15 we also get that
$$h(f) \le \frac{d}{n} \log^+ \sup \|Df^{-n}\| \quad \text{for every } n \ge 1.$$
The following conjecture of Michael Shub [Shu74] is central to the theory of topological entropy:
Conjecture 10.2.9 (Entropy conjecture). If $f : M \to M$ is a diffeomorphism of class $C^1$ in a Riemannian manifold of dimension $d$, then
$$h(f) \ge \max_{1 \le k \le d} \log \rho(f_k), \qquad (10.2.4)$$
where each $\rho(f_k)$ denotes the spectral radius of the action $f_k : H_k(M) \to H_k(M)$ induced by $f$ in the real homology of dimension $k$.
The full statement of the conjecture remains open to date, but several partial answers and related results have been obtained, both positive and negative. Let us summarize what is known in this regard.
It follows from a result of Yano [Yan80] that the inequality (10.2.4) is true for an open and dense subset of the space of homeomorphisms in any manifold of dimension $d \ge 2$. Moreover, it is true for every homeomorphism in certain classes of manifolds, such as the spheres or the infranilmanifolds [MP77b, MP77a, MP08]. On the other hand, Shub [Shu74] exhibited a (non-invertible) Lipschitz map, with zero topological entropy, for which (10.2.4) is false. See Exercise 10.2.7.
A useful way to approach (10.2.4) is by comparing the topological entropy with each one of the spectral radii $\rho(f_k)$. The case $k = d$ is relatively easy. Indeed, for any continuous map $f$ in a manifold of dimension $d$, the spectral radius $\rho(f_d)$ is equal to the absolute value $|\deg f|$ of the degree of the map. In particular, the inequality $h(f) \ge \log \rho(f_d)$ is trivial for any homeomorphism, since then $|\deg f| = 1$. For non-invertible continuous maps, the topological entropy may be less than the logarithm of the absolute value of the degree. However, it was shown in [MP77b] that for differentiable maps one always has $h(f) \ge \log |\deg f|$.
Anthony Manning [Man75] proved that the inequality $h(f) \ge \log \rho(f_1)$ is true for every homeomorphism in a manifold of any dimension $d$. It follows that $h(f) \ge \log \rho(f_{d-1})$, since $h(f) = h(f^{-1})$ and the duality theorem of Poincaré implies that $\rho(f_{d-k}) = \rho((f^{-1})_k)$ for every $0 < k < d$. In particular, the theorem of Manning together with the observations in the previous paragraph proves that the entropy conjecture is true for every homeomorphism in any manifold of dimension $d \le 3$.
Rufus Bowen [Bow78] proved that for any homeomorphism in a manifold the topological entropy $h(f)$ is greater than or equal to the logarithm of the rate of growth of the fundamental group. One can show that this rate of growth is greater than or equal to $\rho(f_1)$. Thus, this result of Bowen implies the theorem of Manning that we have just mentioned.
The main result concerning the entropy conjecture is the theorem of Yosef Yomdin [Yom87], according to which the conjecture is true for every diffeomorphism of class $C^\infty$. The crucial ingredient in the proof is a relation between the topological entropy $h(f)$ and the diffeomorphism's rate of growth of volume, which is defined as follows. For each $1 \le k < d$, let $B^k$ be the unit ball in $\mathbb{R}^k$. Denote by $v(\sigma)$ the $k$-dimensional volume of the image of any differentiable embedding $\sigma : B^k \to M$. Then, define
$$v_k(f) = \sup_\sigma \limsup_n \frac{1}{n} \log v(f^n \circ \sigma),$$
where the supremum is taken over all the embeddings $\sigma : B^k \to M$ of class $C^\infty$. Define also $v(f) = \max\{v_k(f) : 1 \le k < d\}$. It is not difficult to check that
$$\log \rho(f_k) \le v_k(f) \quad \text{for every } 1 \le k < d. \qquad (10.2.5)$$
On the one hand, Sheldon Newhouse [New88] proved that $h(f) \le v(f)$ for every diffeomorphism of class $C^r$ with $r > 1$. On the other hand, Yomdin [Yom87] proved the opposite inequality:
$$v(f) \le h(f), \qquad (10.2.6)$$
for every diffeomorphism of class $C^\infty$ (this inequality is false, in general, in the $C^r$ case with $r < \infty$). Combining (10.2.5) with (10.2.6), and recalling that the case $k = d$ is trivial for diffeomorphisms, one gets the entropy conjecture (10.2.4) for every diffeomorphism of class $C^\infty$.
Concerning systems of class $C^1$, it is also known that the inequality (10.2.4) is true for every Axiom A diffeomorphism with no cycles [SW75], for certain partially hyperbolic diffeomorphisms [SX10] and, more generally, for any $C^1$ diffeomorphism far from homoclinic tangencies [LVY13].

10.2.5 Linear endomorphisms of the torus


In this section we calculate the topological entropy of the linear endomorphisms of the torus:
Proposition 10.2.10. Let $f_A : \mathbb{T}^d \to \mathbb{T}^d$ be the endomorphism induced on the torus $\mathbb{T}^d$ by some square matrix $A$ of dimension $d$ with integer coefficients and non-zero determinant. Then
$$h(f_A) = \sum_{j=1}^{d} \log^+ |\lambda_j|, \qquad (10.2.7)$$
where $\lambda_1, \dots, \lambda_d$ are the eigenvalues of $A$, counted with multiplicity.


We have seen in Proposition 9.4.3 that the entropy of $f_A$ with respect to the Haar measure $\mu$ is equal to the expression on the right-hand side of (10.2.7). By the variational principle (Theorem 10.1), whose proof is contained in Section 10.4 below, the topological entropy is greater than or equal to the entropy of the transformation with respect to any invariant probability measure. Thus,
$$h(f_A) \ge h_\mu(f_A) = \sum_{j=1}^{d} \log^+ |\lambda_j|.$$
In what follows, we focus on proving the opposite inequality:
$$h(f_A) \le \sum_{j=1}^{d} \log^+ |\lambda_j|. \qquad (10.2.8)$$

Initially, assume that $A$ is diagonalizable, that is, that there exists a basis $v_1, \dots, v_d$ of $\mathbb{R}^d$ with $Av_i = \lambda_i v_i$ for each $i$. Then, clearly, we may take the elements of such a basis to be unit vectors. Moreover, up to renumbering the eigenvalues, we may assume that there exists $u \in \{0, \dots, d\}$ such that $|\lambda_i| > 1$ for $1 \le i \le u$ and $|\lambda_i| \le 1$ for every $i > u$. Let $e_1, \dots, e_d$ be the canonical basis of $\mathbb{R}^d$ and $P : \mathbb{R}^d \to \mathbb{R}^d$ be the linear isomorphism defined by $P(e_i) = v_i$ for each $i$. Then $P^{-1}AP$ is a diagonal matrix. Fix $L > 0$ large enough so that $P((0, L)^d)$ contains some unit cube $\prod_{i=1}^{d} [b_i, b_i + 1]$. See Figure 10.2. Let $\pi : \mathbb{R}^d \to \mathbb{T}^d$ be the canonical projection. Then $\pi P((0, L)^d)$ contains the whole torus $\mathbb{T}^d$.
Given $n \ge 1$ and $\varepsilon > 0$, fix $\delta > 0$ such that $\|P\|\delta\sqrt{d} < \varepsilon$. Moreover, for each $i = 1, \dots, d$, take
$$\delta_i = \begin{cases} \delta|\lambda_i|^{-n} & \text{if } i \le u, \\ \delta & \text{if } i > u. \end{cases}$$
Consider the set
$$E = \big\{\pi P(k_1\delta_1, \dots, k_d\delta_d) : k_1, \dots, k_d \in \mathbb{Z} \text{ and } (k_1\delta_1, \dots, k_d\delta_d) \in (0, L)^d\big\}.$$
Observe also that, given any $j \ge 0$,
$$f_A^j(E) \subset \big\{\pi P(k_1\lambda_1^j\delta_1, \dots, k_d\lambda_d^j\delta_d) : k_1, \dots, k_d \in \mathbb{Z}\big\}.$$
[Figure 10.2. Building an $(n, \varepsilon)$-generating set in $\mathbb{T}^d$ (labels: $P$, $L$, $b_1$, $b_1 + 1$, $b_2$, $b_2 + 1$).]

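Consider $0 \le j < n$. By construction, $|\lambda_i^j \delta_i| \le \delta$ for every $i = 1, \dots, d$. Therefore, every point of $\mathbb{R}^d$ is at a distance less than or equal to $\delta\sqrt{d}$ from some point of the form $(k_1\lambda_1^j\delta_1, \dots, k_d\lambda_d^j\delta_d)$. Then (see Figure 10.2), for each $x \in \mathbb{T}^d$ we may find $a \in E$ such that $d(f_A^j(x), f_A^j(a)) \le \|P\|\delta\sqrt{d} < \varepsilon$ for every $0 \le j < n$. This shows that $E$ is an $(n, \varepsilon)$-generating set for $\mathbb{T}^d$. On the other hand,
$$\#E \le \prod_{i=1}^{d} \frac{L}{\delta_i} = \left(\frac{L}{\delta}\right)^d \prod_{i=1}^{u} |\lambda_i|^n.$$
These observations show that $g_n(f_A, \varepsilon, \mathbb{T}^d) \le (L/\delta)^d \prod_{i=1}^{u} |\lambda_i|^n$ for every $n \ge 1$ and $\varepsilon > 0$. Hence,
$$h(f_A) = \lim_{\varepsilon \to 0} \limsup_n \frac{1}{n} \log g_n(f_A, \varepsilon, \mathbb{T}^d) \le \sum_{i=1}^{u} \log |\lambda_i| = \sum_{i=1}^{d} \log^+ |\lambda_i|.$$
This proves Proposition 10.2.10 in the case when $A$ is diagonalizable.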


The general case may be treated in a similar fashion, writing the matrix A in
its Jordan canonical form. The reader is invited to carry out the details.
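As a sanity check of (10.2.7) and of Proposition 10.2.8, the Python sketch below restricts to $d = 2$ (an assumption made for simplicity) and computes $\sum_j \log^+|\lambda_j|$ from the characteristic polynomial, together with the upper bound $d \log^+ \|A\|$. We assume the flat metric on $\mathbb{T}^2$, for which $Df_A(x) = A$ at every point, so $\sup\|Df_A\|$ is the Euclidean operator norm of $A$. The three matrices and all helper names are our own examples.

```python
import cmath
import math


def eigenvalues_2x2(M):
    """Roots of t^2 - (trace) t + det for a 2x2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = M
    tr, det = a + d, a * d - b * c
    disc = cmath.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2


def toral_entropy(A):
    """Right-hand side of (10.2.7): sum of log+ |lambda_j| over the eigenvalues of A."""
    return sum(max(math.log(abs(lam)), 0.0) for lam in eigenvalues_2x2(A))


def operator_norm(A):
    """Euclidean operator norm of a 2x2 matrix: sqrt of the largest eigenvalue of A^T A."""
    (a, b), (c, d) = A
    ata = [[a * a + c * c, a * b + c * d], [a * b + c * d, b * b + d * d]]
    return math.sqrt(max(abs(mu) for mu in eigenvalues_2x2(ata)))


examples = {
    "cat map": [[2, 1], [1, 1]],
    "diagonal": [[2, 0], [0, 3]],
    "rotation by 90 degrees": [[0, -1], [1, 0]],
}
for name, A in examples.items():
    h = toral_entropy(A)
    bound = 2 * max(math.log(operator_norm(A)), 0.0)  # d log+ ||A|| with d = 2
    assert h <= bound + 1e-12
    print(f"{name}: h = {h:.4f}, bound from Proposition 10.2.8 = {bound:.4f}")
```

For the diagonal example the bound $2\log 3 = \log 9$ exceeds $h(f_A) = \log 6$, which illustrates that Proposition 10.2.8 is only an upper bound.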

10.2.6 Exercises
10.2.1. Let $(M_i, d_i)$, $i = 1, 2$ be metric spaces and $f_i : M_i \to M_i$, $i = 1, 2$ be continuous transformations. Let $M = M_1 \times M_2$, $d$ be the distance defined in $M$ by
$$d((x_1, x_2), (y_1, y_2)) = \max\{d_1(x_1, y_1), d_2(x_2, y_2)\}$$
and $f : M \to M$ be the transformation defined by $f(x_1, x_2) = (f_1(x_1), f_2(x_2))$. Show that $h(f) \le h(f_1) + h(f_2)$ and the identity holds if at least one of the spaces is compact.
10.2.2. Let $\sigma_A : \Sigma_A \to \Sigma_A$ be a shift of finite type, either one-sided or two-sided. We say that a transition matrix $A$ is irreducible if for any $i, j \in X$ there exists $n \ge 1$ such that $A^n_{i,j} > 0$ and that $A$ is aperiodic if there exists $n \ge 1$ such that $A^n_{i,j} > 0$ for every $i, j \in X$. Show that:
(a) If $A$ is irreducible then the set of periodic points of $\sigma_A$ is dense in $\Sigma_A$.
(b) $\sigma_A$ is transitive if and only if $A$ is irreducible.
(c) $\sigma_A$ is topologically mixing if and only if $A$ is aperiodic.
[Observation: The irreducibility condition in (b) means that the oriented graph $G_A$ is strongly connected: given any $a, b \in X$ there exists some path in $G_A$ starting at $a$ and ending at $b$.]
10.2.3. Prove Proposition 10.2.6.
10.2.4. Let $M$ be a compact metric space. Show that, given any $\varepsilon > 0$, the restriction of the topological entropy function $f \mapsto h(f)$ to the set of continuous transformations $f : M \to M$ that are $\varepsilon$-expansive (that is, admit $\varepsilon$ as a constant of expansivity) is upper semi-continuous (with respect to the topology of uniform convergence).
10.2.5. Show that if $A$ is a transition matrix then, for every $k \ge 1$, no row of $A^k$ is identically zero. The same is true for the columns of $A^k$, $k \ge 1$, if we assume that $A$ is a transition matrix in the two-sided sense.
10.2.6. (a) Let $f : M \to M$ be a surjective local homeomorphism in a compact metric space and let $d = \inf_y \#f^{-1}(y)$. Prove that $h(f) \ge \log d$.
(b) Let $f : S^1 \to S^1$ be a continuous map in the circle. Show that $h(f)$ is greater than or equal to the logarithm of the absolute value of the degree of $f$, that is, $h(f) \ge \log |\deg f|$.
[Observation: Misiurewicz and Przytycki [MP77b] proved that $h(f) \ge \log |\deg f|$ for every map $f : M \to M$ of class $C^1$ in a compact manifold.]
10.2.7. Consider the map $f : \mathbb{C} \to \mathbb{C}$ defined by $f(z) = z^d / (2|z|^{d-1})$, with $d \ge 2$. Prove that the topological entropy of $f$ is zero, but the degree of $f$ is $d$. Why is this not in contradiction with Exercise 10.2.6?
10.2.8. Let $f : M \to M$ be a continuous map in a compact metric space $M$. Given $\varepsilon > 0$, define
$$g^*(f, \varepsilon) = \sup\{g(f, B(x, \infty, \varepsilon)) : x \in M\},$$
where $B(x, \infty, \varepsilon)$ denotes the set of all $y \in M$ such that $d(f^i(x), f^i(y)) \le \varepsilon$ for every $i \ge 0$. Bowen [Bow72] has shown that, given $b > 0$ and $\delta > 0$, there exists $c > 0$ such that
$$\log g_n(f, \delta, B(x, n, \varepsilon)) < c + (g^*(f, \varepsilon) + b)n \quad \text{for every } x \in M \text{ and } n \ge 1.$$
Using this fact, prove that $h(f) \le g(f, \varepsilon, M) + g^*(f, \varepsilon)$. One says that $f$ is h-expansive if $g^*(f, \varepsilon) = 0$ for some $\varepsilon > 0$. Conclude that in that case $h(f) = g(f, \varepsilon, M)$. [Observation: This generalizes Proposition 10.2.1, since every expansive transformation is also h-expansive.]
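For experimenting with Exercise 10.2.2, here is a small Python sketch (helper names are our own) that decides irreducibility and aperiodicity of a transition matrix from Boolean matrix powers. It relies on two standard facts not proved in the text: for each pair $i, j$, if $A^n_{i,j} > 0$ for some $n \ge 1$ then this already happens for some $1 \le n \le d$ (shortest paths and cycles in $G_A$ have length at most $d$), and, by Wielandt's bound, an aperiodic $A$ satisfies $A^n_{i,j} > 0$ for all $i, j$ already at $n = (d - 1)^2 + 1$. The sketch only tests the definitions; it proves no part of the exercise.

```python
def bool_mul(X, Y):
    """Boolean matrix product: entry (i, j) is 1 iff X[i][k] = Y[k][j] = 1 for some k."""
    n = len(X)
    return [[int(any(X[i][k] and Y[k][j] for k in range(n))) for j in range(n)]
            for i in range(n)]


def is_irreducible(A):
    """True iff for all i, j some power A^n, 1 <= n <= d, has a positive (i, j) entry."""
    d = len(A)
    reach = [[0] * d for _ in range(d)]
    P = A
    for _ in range(d):  # accumulate the supports of A^1, ..., A^d
        reach = [[reach[i][j] or P[i][j] for j in range(d)] for i in range(d)]
        P = bool_mul(P, A)
    return all(all(row) for row in reach)


def is_aperiodic(A):
    """True iff some power A^n is entrywise positive (Wielandt: n <= (d-1)^2 + 1 suffices)."""
    d = len(A)
    P = A
    for _ in range((d - 1) ** 2 + 1):
        if all(all(row) for row in P):
            return True
        P = bool_mul(P, A)
    return False


fig_10_1 = [[0, 1, 1, 0], [1, 1, 0, 1], [1, 0, 1, 0], [1, 0, 0, 1]]
print(is_irreducible(fig_10_1), is_aperiodic(fig_10_1))
```

For the matrix of Figure 10.1 the cube $A^3$ is already entrywise positive, so it is both irreducible and aperiodic.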

10.3 Pressure
In this section we introduce an important extension of the concept of
topological entropy, called (topological) pressure, and we study its main
properties. Throughout, we consider only continuous transformations in
compact metric spaces. Related to this, check Exercises 10.3.4 and 10.3.5.

10.3.1 Definition via open covers


Let $f : M \to M$ be a continuous transformation in a compact metric space. We call a potential in $M$ any continuous function $\phi : M \to \mathbb{R}$. For each $n \in \mathbb{N}$, define $\phi_n : M \to \mathbb{R}$ by $\phi_n = \sum_{i=0}^{n-1} \phi \circ f^i$. Given an open cover $\alpha$ of $M$, let
$$P_n(f, \phi, \alpha) = \inf\Big\{\sum_{U \in \gamma} \sup_{x \in U} e^{\phi_n(x)} : \gamma \text{ is a finite subcover of } \alpha^n\Big\}. \qquad (10.3.1)$$
The sequence $\log P_n(f, \phi, \alpha)$ is subadditive (Exercise 10.3.1) and so the limit
$$P(f, \phi, \alpha) = \lim_n \frac{1}{n} \log P_n(f, \phi, \alpha) \qquad (10.3.2)$$
exists. Define the pressure of the potential $\phi$ with respect to $f$ to be the limit $P(f, \phi)$ of $P(f, \phi, \alpha)$ when the diameter of $\alpha$ goes to zero. The existence of this limit is guaranteed by the following lemma:

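Lemma 10.3.1. There exists $\lim_{\operatorname{diam} \alpha \to 0} P(f, \phi, \alpha)$, that is, there exists some $P(f, \phi) \in \overline{\mathbb{R}}$ such that
$$\lim_k P(f, \phi, \alpha_k) = P(f, \phi)$$
for every sequence $(\alpha_k)_k$ of open covers with $\operatorname{diam} \alpha_k \to 0$.

Proof. Let $(\alpha_k)_k$ and $(\beta_k)_k$ be any sequences of open covers with diameters converging to zero. Given any $\varepsilon > 0$, fix $\delta > 0$ such that $|\phi(x) - \phi(y)| \le \varepsilon$ whenever $d(x, y) \le \delta$. By assumption, $\operatorname{diam} \alpha_k < \delta$ for every $k$ sufficiently large. For fixed $k$, let $\rho > 0$ be a Lebesgue number for $\alpha_k$. By assumption, $\operatorname{diam} \beta_l < \rho$ for every $l$ sufficiently large. By the definition of Lebesgue number, it follows that every element of $\beta_l$ is contained in some element of $\alpha_k$ and, consequently, every $B \in \beta_l^n$ is contained in some $A \in \alpha_k^n$. Observe also that
$$\sup_{x \in A} \phi_n(x) \le n\varepsilon + \sup_{y \in B} \phi_n(y)$$
for every $n \ge 1$, since $\operatorname{diam} \alpha_k < \delta$. This implies that
$$P_n(f, \phi, \alpha_k) \le e^{n\varepsilon} P_n(f, \phi, \beta_l) \quad \text{for every } n \ge 1$$
and, hence, $P(f, \phi, \alpha_k) \le \varepsilon + P(f, \phi, \beta_l)$. Making $l \to \infty$ and then $k \to \infty$, we get that
$$\limsup_k P(f, \phi, \alpha_k) \le \varepsilon + \liminf_l P(f, \phi, \beta_l).$$
Since $\varepsilon > 0$ is arbitrary, it follows that $\limsup_k P(f, \phi, \alpha_k) \le \liminf_l P(f, \phi, \beta_l)$. Exchanging the roles of the two sequences of covers, we conclude that the limits $\lim_k P(f, \phi, \alpha_k)$ and $\lim_l P(f, \phi, \beta_l)$ exist and are equal.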
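To see definition (10.3.1) at work in the simplest case, take (as an illustration not treated in the text) the full one-sided shift on $k$ symbols, a potential $\phi(x) = c_{x_0}$ that depends only on the first coordinate, with hypothetical values $c_a$, and the cover $\alpha$ by the cylinders $[0; a]$. The cylinders of length $n$ are pairwise disjoint, so the only subcover of $\alpha^n$ is $\alpha^n$ itself, and $\phi_n$ is constant on each of them; hence $P_n(\sigma, \phi, \alpha) = \big(\sum_a e^{c_a}\big)^n$. The Python sketch below evaluates (10.3.1) by enumeration and compares $\frac{1}{n}\log P_n$ with $\log \sum_a e^{c_a}$.

```python
import math
from itertools import product


def pressure_n(c, n):
    """P_n(sigma, phi, alpha) from (10.3.1) for the full shift on len(c) symbols,
    with phi(x) = c[x_0] and alpha the cover by the cylinders [0; a].

    The elements of alpha^n are the pairwise disjoint cylinders of length n, so
    the only subcover is alpha^n itself, and on [0; a_0, ..., a_{n-1}] the
    Birkhoff sum phi_n is the constant c[a_0] + ... + c[a_{n-1}].
    """
    return sum(math.exp(sum(c[a] for a in word))
               for word in product(range(len(c)), repeat=n))


c = [0.0, math.log(2.0), -1.0]  # hypothetical potential values on 3 symbols
closed_form = math.log(sum(math.exp(v) for v in c))
for n in (1, 4, 8):
    print(n, math.log(pressure_n(c, n)) / n, closed_form)
```

With the zero potential the enumeration simply counts cylinders, recovering $N(\alpha^n) = k^n$.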

Before we proceed, let us mention a few simple consequences of the definitions. The first is that the pressure of the zero potential coincides with the topological entropy. Indeed, it is immediate from (10.3.1) that $P_n(f, 0, \alpha) = N(\alpha^n)$ for every $n \ge 1$ and, thus, $P(f, 0, \alpha) = h(f, \alpha)$ for every open cover $\alpha$. Let $(\alpha_k)_k$ be any sequence of open covers with diameters going to zero. Then, by Proposition 10.1.12 and the definition of the pressure,
$$h(f) = \lim_k h(f, \alpha_k) = \lim_k P(f, 0, \alpha_k) = P(f, 0). \qquad (10.3.3)$$

Observe, however, that for general potentials $P(f, \phi)$ need not coincide with the supremum of $P(f, \phi, \alpha)$ over all open covers $\alpha$ (see Exercise 10.3.5).
Given any constant $c \in \mathbb{R}$, we have that $P_n(f, \phi + c, \alpha) = e^{cn} P_n(f, \phi, \alpha)$ for every $n \ge 1$ and, consequently, $P(f, \phi + c, \alpha) = P(f, \phi, \alpha) + c$ for any open cover $\alpha$. Hence,
$$P(f, \phi + c) = P(f, \phi) + c. \qquad (10.3.4)$$
Analogously, if $\phi \le \psi$ then $P_n(f, \phi, \alpha) \le P_n(f, \psi, \alpha)$ for every $n \ge 1$, which implies that $P(f, \phi, \alpha) \le P(f, \psi, \alpha)$ for every open cover $\alpha$. That is,
$$\phi \le \psi \Rightarrow P(f, \phi) \le P(f, \psi). \qquad (10.3.5)$$
In particular, since $\inf \phi \le \phi \le \sup \phi$, combining (10.3.3), (10.3.4) and (10.3.5) we have that
$$h(f) + \inf \phi \le P(f, \phi) \le h(f) + \sup \phi \qquad (10.3.6)$$
for every potential $\phi$. An interesting corollary is that if $h(f)$ is finite then $P(f, \phi) < \infty$ for every potential $\phi$ and, otherwise, $P(f, \phi) = \infty$ for every potential $\phi$. An example of this last situation is given in Exercise 10.1.6.
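The relations (10.3.3) to (10.3.6) can be checked numerically on a toy example of our own choosing: the full one-sided shift on $k$ symbols and a potential $\phi(x) = c_{x_0}$ depending only on the first coordinate. Evaluating (10.3.1) on the cover by cylinders $[0; a]$ gives $P_n = \big(\sum_a e^{c_a}\big)^n$, and the finer cylinder covers give the same growth rate, so $P(\sigma, \phi) = \log \sum_a e^{c_a}$. The Python sketch below takes this closed form as given (it is not derived in the text) and uses hypothetical values $c_a$.

```python
import math


def full_shift_pressure(c):
    """P(sigma, phi) = log(sum_a e^{c_a}) for the full shift and phi(x) = c[x_0].

    Computed as a numerically stable log-sum-exp.
    """
    m = max(c)
    return m + math.log(sum(math.exp(v - m) for v in c))


c = [0.3, -1.2, 2.0, 0.0]                   # hypothetical potential values
h = full_shift_pressure([0.0] * len(c))     # (10.3.3): the entropy, here log 4
P = full_shift_pressure(c)

assert abs(h - math.log(4)) < 1e-12
assert abs(full_shift_pressure([v + 5.0 for v in c]) - (P + 5.0)) < 1e-12  # (10.3.4)
assert P <= full_shift_pressure([v + 0.1 for v in c])                      # (10.3.5)
assert h + min(c) <= P <= h + max(c)                                        # (10.3.6)
print(h, P)
```

Subtracting the maximum before exponentiating keeps the computation stable when some $c_a$ are large, without changing the value.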
Another simple consequence of the definition is that the pressure is an invariant of topological conjugacy:

Proposition 10.3.2. Let $f : M \to M$ and $g : N \to N$ be continuous transformations in compact metric spaces. If there exists a homeomorphism $h : M \to N$ such that $h \circ f = g \circ h$ then $P(g, \phi) = P(f, \phi \circ h)$ for every potential $\phi$ in $N$.

Proof. The correspondence $\alpha \mapsto h(\alpha)$ is a bijection between the spaces of open covers of $M$ and $N$, respectively. Moreover, since $h$ and its inverse are (uniformly) continuous, $\operatorname{diam} \alpha_k \to 0$ if and only if $\operatorname{diam} h(\alpha_k) \to 0$. Consider the potential $\psi = \phi \circ h$ in $M$. Note that $\psi_n = \phi_n \circ h$ and so
$$\sup_{x \in U} \psi_n(x) = \sup_{y \in h(U)} \phi_n(y)$$
for every $U \subset M$ and every $n \ge 1$. Hence, $P_n(f, \psi, \alpha) = P_n(g, \phi, h(\alpha))$ for every $n$ and every open cover $\alpha$ of $M$. Thus, $P(f, \psi, \alpha) = P(g, \phi, h(\alpha))$ and, taking the limit when the diameter of $\alpha$ goes to zero, $P(f, \psi) = P(g, \phi)$.

One may replace the supremum by the infimum in (10.3.1), that is, replace $P_n(f, \phi, \alpha)$ with
$$Q_n(f, \phi, \alpha) = \inf\Big\{\sum_{U \in \gamma} \inf_{x \in U} e^{\phi_n(x)} : \gamma \text{ is a finite subcover of } \alpha^n\Big\},$$
although this makes the definition a bit more complicated. In contrast with $\log P_n(f, \phi, \alpha)$, the sequence $\log Q_n(f, \phi, \alpha)$ need not be subadditive. Denote
$$Q^-(f, \phi, \alpha) = \liminf_n \frac{1}{n} \log Q_n(f, \phi, \alpha) \quad \text{and} \quad Q^+(f, \phi, \alpha) = \limsup_n \frac{1}{n} \log Q_n(f, \phi, \alpha).$$
Clearly, $Q^-(f, \phi, \alpha) \le Q^+(f, \phi, \alpha)$ for every open cover $\alpha$ of $M$. Furthermore, $Q_n(f, 0, \alpha) = P_n(f, 0, \alpha) = N(\alpha^n)$ for every $n$ and so $Q^-(f, 0, \alpha) = Q^+(f, 0, \alpha) = P(f, 0, \alpha) = h(f, \alpha)$.

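Corollary 10.3.3. For any potential $\phi : M \to \mathbb{R}$,
$$P(f, \phi) = \lim_{\operatorname{diam} \alpha \to 0} Q^+(f, \phi, \alpha) = \lim_{\operatorname{diam} \alpha \to 0} Q^-(f, \phi, \alpha).$$

Proof. Since $\phi$ is (uniformly) continuous, given any $\varepsilon > 0$ there exists $\delta > 0$ such that
$$\inf_{x \in C} \phi_n(x) \le \sup_{x \in C} \phi_n(x) \le n\varepsilon + \inf_{x \in C} \phi_n(x)$$
for every element $C$ of $\alpha^n$, whenever $\operatorname{diam} \alpha \le \delta$ (indeed, for each $0 \le i < n$ the set $f^i(C)$ is contained in an element of $\alpha$). So,
$$Q_n(f, \phi, \alpha) \le P_n(f, \phi, \alpha) \le e^{n\varepsilon} Q_n(f, \phi, \alpha)$$
for every open cover $\alpha$ with $\operatorname{diam} \alpha \le \delta$. It follows that
$$\limsup_n \frac{1}{n} \log Q_n(f, \phi, \alpha) \le P(f, \phi, \alpha) \le \varepsilon + \liminf_n \frac{1}{n} \log Q_n(f, \phi, \alpha).$$
As the diameter of $\alpha$ goes to zero, we may take $\varepsilon \to 0$. Thus,
$$\lim_{\operatorname{diam} \alpha \to 0} Q^-(f, \phi, \alpha) = \lim_{\operatorname{diam} \alpha \to 0} Q^+(f, \phi, \alpha) = \lim_{\operatorname{diam} \alpha \to 0} P(f, \phi, \alpha) = P(f, \phi),$$
as claimed.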

10.3.2 Generating sets and separated sets


Now we present two alternative definitions of pressure, in terms of generating sets and separated sets. As before, $f : M \to M$ is a continuous transformation in a compact metric space and $\phi : M \to \mathbb{R}$ is a continuous function. Given $n \ge 1$ and $\varepsilon > 0$, define
$$G_n(f, \phi, \varepsilon) = \inf\Big\{\sum_{x \in E} e^{\phi_n(x)} : E \text{ is an } (n, \varepsilon)\text{-generating set for } M\Big\} \quad \text{and}$$
$$S_n(f, \phi, \varepsilon) = \sup\Big\{\sum_{x \in E} e^{\phi_n(x)} : E \text{ is an } (n, \varepsilon)\text{-separated set in } M\Big\}. \qquad (10.3.7)$$
Next, define
$$G(f, \phi, \varepsilon) = \limsup_n \frac{1}{n} \log G_n(f, \phi, \varepsilon) \quad \text{and} \quad S(f, \phi, \varepsilon) = \limsup_n \frac{1}{n} \log S_n(f, \phi, \varepsilon), \qquad (10.3.8)$$
and also
$$G(f, \phi) = \lim_{\varepsilon \to 0} G(f, \phi, \varepsilon) \quad \text{and} \quad S(f, \phi) = \lim_{\varepsilon \to 0} S(f, \phi, \varepsilon) \qquad (10.3.9)$$
(these limits exist because the functions are monotonic in $\varepsilon$).
Note that $G_n(f, 0, \varepsilon) = g_n(f, \varepsilon)$ and $S_n(f, 0, \varepsilon) = s_n(f, \varepsilon)$ for every $n \ge 1$ and every $\varepsilon > 0$. Therefore (Proposition 10.1.6), $G(f, 0) = g(f)$ and $S(f, 0) = s(f)$ coincide with the topological entropy $h(f)$. In fact,
Proposition 10.3.4. $P(f, \phi) = G(f, \phi) = S(f, \phi)$ for every potential $\phi$ in $M$.

Proof. Consider n ≥ 1 and ε > 0. It is clear from the definitions that every maximal (n, ε)-separated set is (n, ε)-generating. Then,
$$\begin{aligned}S_n(f,\phi,\varepsilon)&=\sup\Big\{\sum_{x\in E}e^{\phi_n(x)}:E\text{ is }(n,\varepsilon)\text{-separated}\Big\}\\&=\sup\Big\{\sum_{x\in E}e^{\phi_n(x)}:E\text{ is }(n,\varepsilon)\text{-separated maximal}\Big\}\\&\ge\inf\Big\{\sum_{x\in E}e^{\phi_n(x)}:E\text{ is }(n,\varepsilon)\text{-generating}\Big\}=G_n(f,\phi,\varepsilon)\end{aligned}\tag{10.3.10}$$
for every n and every ε. This implies that G(f, φ, ε) ≤ S(f, φ, ε) for every ε and, thus, G(f, φ) ≤ S(f, φ).
Next, we prove that S(f , φ) ≤ P(f , φ). Let ε and δ be positive numbers such
that d(x, y) ≤ δ implies |φ(x) − φ(y)| ≤ ε. Let α be any open cover of M with
diam α < δ and E ⊂ M be any (n, δ)-separated set. Given any subcover γ of $\alpha^n$, it is obvious that every point of E is contained in some element of γ. On the other hand, the hypothesis that E is (n, δ)-separated implies that each element of γ contains at most one element of E. Therefore,
$$\sum_{x\in E}e^{\phi_n(x)}\le\sum_{U\in\gamma}\sup_{y\in U}e^{\phi_n(y)}.$$
Taking the supremum in E and the infimum in γ, we get that
$$S_n(f,\phi,\delta)\le P_n(f,\phi,\alpha).\tag{10.3.11}$$
It follows that S(f, φ, δ) ≤ P(f, φ, α). Making δ → 0 (hence diam α → 0), we conclude that S(f, φ) ≤ P(f, φ), as stated.
Finally, we prove that P(f, φ) ≤ G(f, φ). Let ε and δ be positive numbers such that d(x, y) ≤ δ implies |φ(x) − φ(y)| ≤ ε. Let α be any open cover of M with diam α < δ and ρ > 0 be a Lebesgue number of α. Let E ⊂ M be any (n, ρ)-generating set for M. For each x ∈ E and i = 0, . . . , n − 1, there exists $A_{x,i}\in\alpha$ such that $B(f^i(x),\rho)$ is contained in $A_{x,i}$. Denote
$$\gamma(x)=\bigcap_{i=0}^{n-1}f^{-i}(A_{x,i}).$$
Observe that $\gamma(x)\in\alpha^n$ and $B(x,n,\rho)\subset\gamma(x)$. Hence, the hypothesis that E is (n, ρ)-generating implies that $\gamma=\{\gamma(x):x\in E\}$ is a subcover of $\alpha^n$. Observe also that
$$\sup_{y\in\gamma(x)}\phi_n(y)\le n\varepsilon+\phi_n(x)\quad\text{for every }x\in E,$$
since $\operatorname{diam}A_{x,i}<\delta$ for every i. It follows that
$$\sum_{U\in\gamma}\sup_{y\in U}e^{\phi_n(y)}\le e^{n\varepsilon}\sum_{x\in E}e^{\phi_n(x)}.$$
This proves that $P_n(f,\phi,\alpha)\le e^{n\varepsilon}G_n(f,\phi,\rho)$ for every n ≥ 1 and, consequently,
$$P(f,\phi,\alpha)\le\varepsilon+\liminf_n\frac1n\log G_n(f,\phi,\rho)\le\varepsilon+G(f,\phi,\rho).\tag{10.3.12}$$
Making ρ → 0 we find that P(f, φ, α) ≤ ε + G(f, φ). Hence, making ε, δ and diam α go to zero, P(f, φ) ≤ G(f, φ).
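The first step of the proof, that a maximal (n, ε)-separated set is automatically (n, ε)-generating, can be seen in a toy computation. The Python sketch below replaces the circle by the finite grid {k/N} under the doubling map f(x) = 2x mod 1. This discretization is introduced only for illustration and does not compute the actual pressure on S¹. We assume the conventions that E is (n, ε)-separated when d_n(x, y) > ε for distinct points of E, and (n, ε)-generating when every point lies within d_n ≤ ε of E. All function names are hypothetical.

```python
import math

def bowen_distance(a, b, n, N):
    """d_n between grid points a/N and b/N under f(x) = 2x mod 1.

    Points are integers mod N (N a power of 2, so the grid is f-invariant);
    at each iterate we use the circle distance min(|x - y|, 1 - |x - y|).
    """
    best = 0.0
    for _ in range(n):
        diff = abs(a - b) % N
        best = max(best, min(diff, N - diff) / N)
        a, b = (2 * a) % N, (2 * b) % N
    return best

def greedy_separated(n, eps, N):
    """Greedily build a maximal (n, eps)-separated subset of the grid."""
    E = []
    for k in range(N):
        if all(bowen_distance(k, e, n, N) > eps for e in E):
            E.append(k)
    return E

def is_generating(E, n, eps, N):
    """True if every grid point is within d_n <= eps of some point of E."""
    return all(any(bowen_distance(k, e, n, N) <= eps for e in E) for k in range(N))

def weighted_sum(E, n, N, phi):
    """sum over x in E of exp(phi_n(x)), phi_n being the Birkhoff sum of phi."""
    total = 0.0
    for k in E:
        s, x = 0.0, k
        for _ in range(n):
            s += phi(x / N)
            x = (2 * x) % N
        total += math.exp(s)
    return total

E = greedy_separated(4, 0.1, 512)
print(len(E), is_generating(E, 4, 0.1, 512))
print(weighted_sum(E, 4, 512, lambda x: math.cos(2 * math.pi * x)))
```

Because the greedy set is both separated and generating on the grid, its weighted sum lies between the grid analogues of $G_n(f,\phi,\varepsilon)$ and $S_n(f,\phi,\varepsilon)$.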

The conclusion of Proposition 10.3.4 may be rewritten as follows:
$$P(f,\phi)=\lim_{s\to0}\limsup_n\frac1n\log G_n(f,\phi,s)=\lim_{s\to0}\limsup_n\frac1n\log S_n(f,\phi,s).\tag{10.3.13}$$
The relations (10.3.12) and (10.3.10) in the proof also give that
$$P(f,\phi)\le\lim_{s\to0}\liminf_n\frac1n\log G_n(f,\phi,s)\le\lim_{s\to0}\liminf_n\frac1n\log S_n(f,\phi,s).$$
Combining these observations (and recalling that lim inf ≤ lim sup), we get:
$$P(f,\phi)=\lim_{s\to0}\liminf_n\frac1n\log G_n(f,\phi,s)=\lim_{s\to0}\liminf_n\frac1n\log S_n(f,\phi,s).\tag{10.3.14}$$

10.3.3 Properties
Properties of the pressure function in the spirit of Proposition 10.1.10
and Corollary 10.1.13 are stated in Exercise 10.3.3. Let us also extend
Propositions 10.1.14 and 10.1.15 to the present context:

Proposition 10.3.5. Let f : M → M be a continuous transformation in a compact metric space and φ be a potential in M. Then:

(1) $P(f^k,\phi_k)=kP(f,\phi)$ for every k ≥ 1.
(2) If f is a homeomorphism then $P(f^{-1},\phi)=P(f,\phi)$.
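Proof. Given a potential φ : M → R and an open cover α, denote $\psi=\sum_{i=0}^{k-1}\phi\circ f^i$ and $\beta=\bigvee_{i=0}^{k-1}f^{-i}(\alpha)$. Let $g=f^k$. It is clear that
$$\sum_{j=0}^{n-1}\psi\circ g^j=\sum_{l=0}^{kn-1}\phi\circ f^l\quad\text{and}\quad\bigvee_{j=0}^{n-1}g^{-j}(\beta)=\bigvee_{l=0}^{nk-1}f^{-l}(\alpha).$$
Then,
$$\begin{aligned}P_n(g,\psi,\beta)&=\inf\Big\{\sum_{U\in\gamma}\sup_{x\in U}e^{\sum_{j=0}^{n-1}\psi(g^j(x))}:\gamma\subset\bigvee_{j=0}^{n-1}g^{-j}(\beta)\Big\}\\&=\inf\Big\{\sum_{U\in\gamma}\sup_{x\in U}e^{\sum_{l=0}^{kn-1}\phi(f^l(x))}:\gamma\subset\bigvee_{l=0}^{nk-1}f^{-l}(\alpha)\Big\}=P_{kn}(f,\phi,\alpha).\end{aligned}$$
Consequently, $P(f^k,\psi,\beta)=kP(f,\phi,\alpha)$ for any α. Making diam α → 0 (note that diam β ≤ diam α → 0), we deduce that $P(f^k,\psi)=kP(f,\phi)$. Since $\psi=\phi_k$, this proves part (1).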
Suppose that f is a homeomorphism. Given an open cover α and an integer number n ≥ 1, denote
$$\phi_n^-=\sum_{j=0}^{n-1}\phi\circ f^{-j}\quad\text{and}\quad\alpha_-^n=\alpha\vee f(\alpha)\vee\cdots\vee f^{n-1}(\alpha).$$
It is clear that $\phi_n^-=\phi_n\circ f^{-(n-1)}$ and $\alpha_-^n=f^{n-1}(\alpha^n)$. Moreover, γ is a subcover of $\alpha^n$ if and only if $\delta=f^{n-1}(\gamma)$ is a subcover of $\alpha_-^n$. Combining these facts, we find that
$$\begin{aligned}P_n(f^{-1},\phi,\alpha)&=\inf\Big\{\sum_{U\in\delta}\sup_{x\in U}e^{\phi_n^-(x)}:\delta\subset\alpha_-^n\Big\}\\&=\inf\Big\{\sum_{V\in\gamma}\sup_{y\in V}e^{\phi_n(y)}:\gamma\subset\alpha^n\Big\}=P_n(f,\phi,\alpha)\end{aligned}$$
for every n ≥ 1. Hence, $P(f^{-1},\phi,\alpha)=P(f,\phi,\alpha)$ for every open cover α. Making diam α → 0, we reach the conclusion in part (2).

Next, we fix the transformation f : M → M and we consider P(f, ·) as a function in the space $C^0(M)$ of all continuous functions, with the norm defined by
$$\|\varphi\|=\sup\{|\varphi(x)|:x\in M\}.$$
We have seen in (10.3.6) that if the topological entropy h(f) is infinite then the pressure function is constant and equal to ∞. In what follows we assume that h(f) is finite. Then, P(f, φ) is finite for every potential φ.

Proposition 10.3.6. The pressure function is Lipschitz, with Lipschitz constant equal to 1: $|P(f,\phi)-P(f,\psi)|\le\|\phi-\psi\|$ for any potentials φ and ψ.
Proof. Clearly, $\phi\le\psi+\|\phi-\psi\|$. Hence, by (10.3.4) and (10.3.5), we have that $P(f,\phi)\le P(f,\psi)+\|\phi-\psi\|$. Exchanging the roles of φ and ψ, one gets the other inequality.

Proposition 10.3.7. The pressure function is convex:
$$P(f,(1-t)\phi+t\psi)\le(1-t)P(f,\phi)+tP(f,\psi)$$
for any potentials φ and ψ in M and any 0 ≤ t ≤ 1.

Proof. Write ξ = (1 − t)φ + tψ. Then $\xi_n=(1-t)\phi_n+t\psi_n$ for every n ≥ 1 and, thus, $\sup(\xi_n\mid U)\le(1-t)\sup(\phi_n\mid U)+t\sup(\psi_n\mid U)$ for every U ⊂ M. Then, by the Hölder inequality (Theorem A.5.5),
$$\sum_{U\in\gamma}\sup_{x\in U}e^{\xi_n(x)}\le\Big(\sum_{U\in\gamma}\sup_{x\in U}e^{\phi_n(x)}\Big)^{1-t}\Big(\sum_{U\in\gamma}\sup_{x\in U}e^{\psi_n(x)}\Big)^{t}$$
for any finite family γ of subsets of M. This implies that, given any open cover α,
$$P_n(f,\xi,\alpha)\le P_n(f,\phi,\alpha)^{1-t}P_n(f,\psi,\alpha)^t$$
for every n ≥ 1 and, hence, $P(f,\xi,\alpha)\le(1-t)P(f,\phi,\alpha)+tP(f,\psi,\alpha)$. Passing to the limit when diam α → 0, we get the conclusion of the proposition.
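For a potential that depends only on the first symbol, on the full shift over finitely many symbols, the pressure has the closed form $P(\sigma,\phi)=\log\sum_a e^{\phi(a)}$. We assume this standard fact here rather than derive it. It gives a cheap sanity check of Propositions 10.3.6 and 10.3.7: the minimal Python sketch below tests the Lipschitz bound in the sup norm, and convexity, on random potentials. The function names and parameters are illustrative.

```python
import math
import random

def pressure(phi):
    """log sum_a exp(phi[a]): pressure of a potential depending only on x_0
    (full shift on len(phi) symbols; closed form assumed, see lead-in)."""
    m = max(phi)
    return m + math.log(math.fsum(math.exp(v - m) for v in phi))

def check_lipschitz_and_convexity(trials=1000, symbols=4, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        phi = [rng.uniform(-3, 3) for _ in range(symbols)]
        psi = [rng.uniform(-3, 3) for _ in range(symbols)]
        # Proposition 10.3.6: |P(phi) - P(psi)| <= sup |phi - psi|
        sup_norm = max(abs(a - b) for a, b in zip(phi, psi))
        if abs(pressure(phi) - pressure(psi)) > sup_norm + 1e-12:
            return False
        # Proposition 10.3.7: P((1 - t) phi + t psi) <= (1 - t) P(phi) + t P(psi)
        t = rng.random()
        xi = [(1 - t) * a + t * b for a, b in zip(phi, psi)]
        if pressure(xi) > (1 - t) * pressure(phi) + t * pressure(psi) + 1e-12:
            return False
    return True

print(check_lipschitz_and_convexity())
```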

We say that two potentials φ, ψ : M → R are cohomologous if there exists a continuous function u : M → R such that φ = ψ + u ∘ f − u. Note that this is an equivalence relation in the space of potentials (Exercise 10.3.6).

Proposition 10.3.8. Let f : M → M be a continuous transformation in a compact topological space. If φ, ψ : M → R are cohomologous potentials then P(f, φ) = P(f, ψ).

Proof. If ψ = φ + u ∘ f − u then $\psi_n(x)=\phi_n(x)+u(f^n(x))-u(x)$ for every n ∈ N. Let K = sup |u|. Then $\big|\sup_{x\in C}\psi_n(x)-\sup_{x\in C}\phi_n(x)\big|\le2K$ for every set C ⊂ M. Hence, for any open cover γ,
$$e^{-2K}\sum_{U\in\gamma}\sup_{x\in U}e^{\phi_n(x)}\le\sum_{U\in\gamma}\sup_{x\in U}e^{\psi_n(x)}\le e^{2K}\sum_{U\in\gamma}\sup_{x\in U}e^{\phi_n(x)}.$$
This implies that, given any open cover α of M,
$$e^{-2K}P_n(f,\phi,\alpha)\le P_n(f,\psi,\alpha)\le e^{2K}P_n(f,\phi,\alpha)$$
for every n. Therefore, P(f, φ, α) = P(f, ψ, α) for every α and, consequently, P(f, φ) = P(f, ψ).
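Proposition 10.3.8 can be checked numerically on the full shift over {0, 1}. We use the illustrative assumption that φ(x) = U(x_0, x_1) and that the pressure equals the logarithm of the spectral radius of the matrix $(e^{U(a,b)})$, a standard fact not proved here. With the hypothetical choice u(x) = w(x_0), the potential ψ = φ + u∘σ − u corresponds to U(a, b) + w(b) − w(a). Its matrix is conjugate to the original one by a diagonal matrix, so the spectral radius is unchanged. The sketch also checks the bound $|\psi_n-\phi_n|\le2K$ used in the proof.

```python
import itertools
import math

def log_spectral_radius(U):
    """log of the Perron eigenvalue of the positive 2x2 matrix exp(U[a][b])."""
    a, b = math.exp(U[0][0]), math.exp(U[0][1])
    c, d = math.exp(U[1][0]), math.exp(U[1][1])
    tr, det = a + d, a * d - b * c
    return math.log((tr + math.sqrt(tr * tr - 4 * det)) / 2)

def cohomologous(U, w):
    """psi = phi + u o sigma - u with u(x) = w[x_0]: U'(a, b) = U(a, b) + w[b] - w[a]."""
    return [[U[a][b] + w[b] - w[a] for b in range(2)] for a in range(2)]

def max_birkhoff_gap(U, V, n):
    """max over words w_0..w_n of |sum_{i<n} V[w_i][w_{i+1}] - sum_{i<n} U[w_i][w_{i+1}]|."""
    gap = 0.0
    for word in itertools.product((0, 1), repeat=n + 1):
        su = sum(U[word[i]][word[i + 1]] for i in range(n))
        sv = sum(V[word[i]][word[i + 1]] for i in range(n))
        gap = max(gap, abs(sv - su))
    return gap

U = [[0.3, -0.5], [1.0, 0.0]]  # illustrative potential
w = [0.8, -1.1]                # illustrative transfer function u(x) = w[x_0]
V = cohomologous(U, w)
print(log_spectral_radius(U), log_spectral_radius(V), max_birkhoff_gap(U, V, 8))
```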

10.3.4 Comments in statistical mechanics


Let us take a pause to explain the relation between the mathematical concept
of pressure and the issues in physics that originated it. This also serves as a
preview to Chapter 12, where this theory will be developed in the context of
expanding maps in metric spaces. The discussion that follows is a combination
of mathematical results and physical considerations, not necessarily rigorous,
and is quite brief: we refer the reader to the classical works of David
Ruelle [Rue04] and Oscar Lanford [Lan73] for actual presentations of the
subject.
The goal of statistical mechanics is to describe the properties of physical
systems consisting of a large number of units that interact with each other. For
example, these units may be particles, such as molecules of a gas, or sites in
a crystal grid, which may or may not be occupied by particles. Avogadro's number, approximately $6.022\times10^{23}$, illustrates what one means by “large” in this context.
The main challenge in this area of mathematical physics is to understand
the phenomena of phase transitions, that is, sudden changes from one physical
state to another: for example, what happens when liquid water turns into ice?
Why does this occur suddenly, at a given freezing temperature? Mathematical
methods developed for tackling this kind of question turn out to be very
useful in other areas of science, such as quantum field theory and, closer to
the scope of this book, the ergodic theory of hyperbolic dynamical systems
(Bowen [Bow75a]).
In order to formulate these problems in mathematical terms, it is convenient
to assume that the set L of units in the system is actually infinite, because finite
systems do not have genuine phase transitions. The best-studied examples are
the lattice systems, for which $L=\mathbb{Z}^d$ with d ≥ 1. It is assumed that each unit
has a finite set F of possible values (or “states”). For example, F = {−1, +1}
in the case of spin systems, with ±1 representing the two possible orientations
of the particle’s “spin”, and F = {0, 1} in the case of lattice gases: 1 means that
the site k ∈ L is occupied by a gas molecule, whereas 0 means that the site is
empty.
Then, the system’s configuration space is a subset Ω of the product space $F^L$. We assume Ω to be closed in $F^L$ and invariant under the shift maps
$$\sigma^n:F^L\to F^L,\quad(\xi_k)_{k\in L}\mapsto(\xi_{k+n})_{k\in L}$$
for every n ∈ L. A state of the system is a probability measure μ on Ω: intuitively, one presumes that at the microscopic level the system oscillates randomly between different configurations ξ ∈ Ω (for example, different positions or velocities of the molecules), all corresponding to the same macroscopic parameters (same temperature, etc.); then, the measure μ describes the probability distribution of these microscopic configurations.
States corresponding to macroscopic configurations that can be physically observed, that is, that actually occur in Nature, are called equilibrium states. This notion has a central role in the theory, in particular, because phase transitions are associated with the coexistence of more than one equilibrium state. Under our hypotheses on Ω, one can show that every equilibrium state μ is invariant under the shift maps $\sigma^n$, n ∈ L. Thus, the study of lattice systems falls naturally within the scope of ergodic theory.
According to the variational principle of statistical mechanics, which
goes back to the principle of least action of Maupertuis, the equilibrium
states are characterized by the fact that they minimize a certain fundamental
quantity, called Gibbs free energy, whose definition involves the energy E, the
temperature T and the entropy S of the system’s state. The pressure of that
state is, simply, the product of the Gibbs free energy and a negative factor
−β whose nature will be explained shortly.1 Therefore, the equilibrium states
are also characterized by the fact that they maximize the pressure among all
probability measures invariant under the shift maps σ n , n ∈ L.
From these facts, one can obtain a rather explicit description of the equi-
librium states for lattice systems: under suitable hypotheses, the equilibrium
states are precisely the Gibbs states invariant under the shift maps. In the
remainder of this section we are going to motivate and define this concept
of Gibbs state, which will also allow us to illustrate the ideas outlined in the
previous paragraphs. By the end of the section we briefly comment on the case
of one-dimensional lattice systems, that is, the case d = 1, whose theory is
much simpler and which is more closely related to the topics treated in this
book.
Let us start by considering the particularly simple case of finite systems, that is, such that the configuration space Ω is finite. The entropy of a state μ in Ω is the number
$$S(\mu)=\sum_{\xi\in\Omega}-\mu(\{\xi\})\log\mu(\{\xi\}).$$
To each configuration ξ ∈ Ω corresponds a value E(ξ) for the energy of the system. Denote by E(μ) the energy of the state μ, that is, the mean
$$E(\mu)=\sum_{\xi\in\Omega}\mu(\{\xi\})E(\xi).$$
Take the system’s absolute temperature T to be constant in time. Then, the Gibbs free energy is defined by
$$G(\mu)=E(\mu)-\kappa TS(\mu),$$

1 From the mathematical point of view, the two quantities are equivalent. Preference for one
denomination or the other has mostly to do with the physical interpretation of the set F: for
spin systems one usually refers to the Gibbs free energy, whereas for lattice gases, where the
elements of F describe the rate of occupation of each site, it is more natural to refer to the
pressure.

where $\kappa=1.380\times10^{-23}\ \mathrm{m^2\,kg\,s^{-2}\,K^{-1}}$ is the Boltzmann constant. In other words, denoting β = 1/(κT),
$$-\beta G(\mu)=\sum_{\xi\in\Omega}\mu(\{\xi\})\big(-\beta E(\xi)-\log\mu(\{\xi\})\big).\tag{10.3.15}$$
This expression is denoted by P(μ) and is called pressure of the state μ.


It is easy to check that the pressure (10.3.15) is maximum (hence, the Gibbs free energy G(μ) is minimum) if and only if
$$\mu(\{\xi\})=\frac{e^{-\beta E(\xi)}}{\sum_{\eta\in\Omega}e^{-\beta E(\eta)}}\quad\text{for every }\xi\in\Omega\tag{10.3.16}$$
(see Lemma 10.4.4 below). Therefore, the Gibbs distribution μ given by (10.3.16) is the unique equilibrium state of the system. In particular, in this simple context there are no phase transitions.
Now we sketch how this analysis can be extended to infinite lattice systems, assuming that the interaction between sites that are far apart is sufficiently weak. It is part of the hypotheses that the energy associated with each configuration ξ ∈ Ω comes from the pairwise interactions between the different sites in the lattice (including self-interactions) and that this interaction is invariant under the shift maps $\sigma^n$, n ∈ L. Then, the energy $E_{k,l}$ resulting from the action of any site k ∈ L on any other site l ∈ L depends only on their relative position and on the values of $\xi_k$ and $\xi_l$. In other words, there exists a function Φ : L × F × F → R such that
$$E_{k,l}=\Phi(k-l,\xi_k,\xi_l)\quad\text{for any }k,l\in L.$$
It is also assumed that the strength of the interaction decays exponentially with the distance between the sites, in the following sense: there exist constants K > 0 and θ > 0 such that
$$|\Phi(m,a,b)|\le Ke^{-\theta|m|}\quad\text{for any }m\in L\text{ and }a,b\in F,\tag{10.3.17}$$
where $|m|=\max\{|m_1|,\dots,|m_d|\}$. In particular (Exercise 10.3.9), the energy
$$\varphi(\xi)=\sum_{k\in L}\Phi(k,\xi_k,\xi_0)$$
resulting from the action of all the sites on the site 0 at the origin is uniformly bounded.
Initially, given any finite set Λ ⊂ L, let us consider the system one obtains by observing only the sites k ∈ Λ and “switching off” their interactions with the sites in the complement of Λ. This is a finite system, as the configuration space is contained in $F^\Lambda$, with energy function given by
$$E_\Lambda(x)=\sum_{l\in\Lambda}\sum_{k\in\Lambda}\Phi(k-l,x_k,x_l)\quad\text{for every }x\in F^\Lambda.$$

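Hence, according to (10.3.16), its Gibbs distribution $\mu_\Lambda$ is given by
$$\mu_\Lambda(\{x\})=\frac{e^{-\beta E_\Lambda(x)}}{\sum_{y\in F^\Lambda}e^{-\beta E_\Lambda(y)}}\quad\text{for each }x\in F^\Lambda.\tag{10.3.18}$$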
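To make (10.3.18) concrete, the sketch below enumerates all configurations of a one-dimensional window Λ = {0, ..., n − 1} with F = {−1, +1} and computes $\mu_\Lambda$ by brute force. The pair interaction Φ(m, a, b) = −J e^{−θ|m|} a b is a hypothetical ferromagnetic choice, not taken from the text. It satisfies the decay condition (10.3.17) with K = J, and the self-interaction term m = 0 is included, as in $E_\Lambda$. The function names are illustrative.

```python
import itertools
import math

def interaction(m, a, b, J=1.0, theta=1.0):
    # Hypothetical pair interaction Phi(m, a, b) = -J e^{-theta |m|} a b,
    # which satisfies (10.3.17) with K = J.
    return -J * math.exp(-theta * abs(m)) * a * b

def energy(x):
    # E_Lambda(x) = sum_{l in Lambda} sum_{k in Lambda} Phi(k - l, x_k, x_l)
    n = len(x)
    return sum(interaction(k - l, x[k], x[l]) for l in range(n) for k in range(n))

def gibbs_distribution(n, beta):
    # mu_Lambda({x}) = exp(-beta E_Lambda(x)) / sum_y exp(-beta E_Lambda(y))
    configs = list(itertools.product((-1, 1), repeat=n))
    weights = [math.exp(-beta * energy(x)) for x in configs]
    Z = math.fsum(weights)
    return {x: w / Z for x, w in zip(configs, weights)}

mu = gibbs_distribution(6, 0.5)
best = max(mu, key=mu.get)
print(best, mu[best])
```

With this ferromagnetic sign, the two fully aligned configurations minimize the energy and therefore carry the largest Gibbs weight.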

The notion of Gibbs state is obtained from this one by “switching back on” the interaction with the sites outside Λ, in the way we are going to explain. Denote by $r_\Lambda(x)$ the expression on the right-hand side of (10.3.18). Observe that
$$r_\Lambda(x)=\Big(\sum_{y\in F^\Lambda}e^{\beta E_\Lambda(x)-\beta E_\Lambda(y)}\Big)^{-1}$$
and recall that
$$E_\Lambda(x)-E_\Lambda(y)=\sum_{l\in\Lambda}\sum_{k\in\Lambda}\big[\Phi(k-l,x_k,x_l)-\Phi(k-l,y_k,y_l)\big].$$

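For $\xi,\eta\in F^L$, define
$$\begin{aligned}E(\xi,\eta)&=\sum_{l\in L}\sum_{k\in L}\big[\Phi(k-l,\xi_k,\xi_l)-\Phi(k-l,\eta_k,\eta_l)\big]\\&=\sum_{l\in L}\sum_{j\in L}\big[\Phi(j,\xi_{j+l},\xi_l)-\Phi(j,\eta_{j+l},\eta_l)\big]=\sum_{l\in L}\big[\varphi(\sigma^l(\xi))-\varphi(\sigma^l(\eta))\big].\end{aligned}$$
It follows from the condition (10.3.17) that this sum converges whenever the two configurations are such that $\xi_k=\eta_k$ for every k in the complement of some finite set (Exercise 10.3.9). Then,
$$\rho_\Lambda(\xi)=\Big(\sum_{\eta|_{\Lambda^c}=\xi|_{\Lambda^c}}e^{\beta E(\xi,\eta)}\Big)^{-1}$$
is well defined for every $\xi\in F^L$.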


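A probability measure μ supported in $\Omega\subset F^L$ is called a Gibbs state if, for every finite set Λ ⊂ L, the disintegration $\{\mu_{\Lambda,\theta}:\theta\in F^{\Lambda^c}\}$ of μ relative to the partition $\{F^\Lambda\times\{\theta\}:\theta\in F^{\Lambda^c}\}$ of the space $F^L=F^\Lambda\times F^{\Lambda^c}$ is given by
$$\mu_{\Lambda,\theta}(\{x\}\times\{\theta\})=\begin{cases}\rho_\Lambda(x,\theta)&\text{if }(x,\theta)\in\Omega\\0&\text{otherwise.}\end{cases}$$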
To conclude this section, we state one of the main results of this formalism
that we have been describing: one-dimensional lattice systems exhibit no phase
transitions. More precisely:
Theorem 10.3.9 (Ruelle). If d = 1 and the interactions decay exponentially
with the distance, then there exists a unique Gibbs state and it is also the unique
equilibrium state.
The arguments in the proof of this theorem (Ruelle [Rue04]) are at the
basis of the thermodynamic formalism of expanding maps, which we are
going to present in Chapter 12. Let us point out that the theorem is false in dimension d ≥ 2. Exercise 10.3.10 highlights one of the specificities of the one-dimensional case that are behind this result.

10.3.5 Exercises
10.3.1. Check that the sequence $\log P_n(f,\phi,\alpha)$ is subadditive.
10.3.2. Show that if f is a homeomorphism then $P(f,\phi,f(\alpha))=P(f,\phi,\alpha)$, $Q^+(f,\phi,f(\alpha))=Q^+(f,\phi,\alpha)$ and $Q^-(f,\phi,f(\alpha))=Q^-(f,\phi,\alpha)$ for every open cover α.
10.3.3. Show that, for any potential φ : M → R:
(a) If α, β are open covers with α ≺ β then $Q^+(f,\phi,\alpha)\le Q^+(f,\phi,\beta)$ and $Q^-(f,\phi,\alpha)\le Q^-(f,\phi,\beta)$.
(b) $Q^+(f,\phi,\alpha)=Q^+(f,\phi,\alpha^k)$ and $Q^-(f,\phi,\alpha)=Q^-(f,\phi,\alpha^k)$ for every k ≥ 1 and every open cover α.
(c) $Q^+(f,\phi,\alpha)=P(f,\phi)=Q^-(f,\phi,\alpha)$ for any open cover α such that $\operatorname{diam}\alpha^k\to0$.
(d) $P(f,\phi,\alpha)=P(f,\phi,\alpha^k)$ for every k ≥ 1 and any open cover α whose elements are pairwise disjoint.
(e) $P(f,\phi,\alpha)=P(f,\phi)$ for any open cover α such that $\operatorname{diam}\alpha^k\to0$ and whose elements are pairwise disjoint.
(f) If f is a homeomorphism, one may replace $\alpha^k$ by $\alpha^{\pm k}$ in statements (b), (c), (d) and (e).
10.3.4. (Walters). Prove that
P(f , φ) = sup{Q− (f , φ, α) : α is an open cover of M}
= sup{Q+ (f , φ, α) : α is an open cover of M}.
[Observation: In particular, the pressure depends only on the topology of M,
not the distance. This also provides a way to extend the definition to continuous
transformations in compact topological spaces.]
10.3.5. Exhibit a homeomorphism f : M → M, a potential φ : M → R and open
covers α and β of a compact metric space M such that α ≺ β and P(f , φ, α) >
P(f , φ, β) = P(f , φ). [Observation: Thus, the conclusions of Exercise 10.3.3(a)
and Exercise 10.3.4 are no longer valid if one replaces Q± (f , φ, α) by P(f , φ, α).]
10.3.6. Check that the cohomology relation
φ ∼ ψ ⇔ ψ = φ + u ◦ f − u for some continuous function u : M → R
is an equivalence relation.
10.3.7. Let $f_i:M_i\to M_i$, i = 1, 2, be continuous transformations in compact metric spaces and, for each i, let $\phi_i$ be a potential in $M_i$. Define
$$f_1\times f_2:M_1\times M_2\to M_1\times M_2,\quad f_1\times f_2(x_1,x_2)=(f_1(x_1),f_2(x_2)),$$
$$\phi_1\times\phi_2:M_1\times M_2\to\mathbb{R},\quad\phi_1\times\phi_2(x_1,x_2)=\phi_1(x_1)+\phi_2(x_2).$$
Show that $P(f_1\times f_2,\phi_1\times\phi_2)=P(f_1,\phi_1)+P(f_2,\phi_2)$.
10.3.8. Consider the transformation $f:S^1\to S^1$ defined by $f(x)=2x\bmod\mathbb{Z}$. Prove that if $\phi:S^1\to\mathbb{R}$ is a Hölder function then
$$P(f,\phi)=\lim_n\frac1n\log\sum_{p\in\operatorname{Fix}(f^n)}e^{\phi_n(p)}.$$
[Observation: We will get a more general result in Exercise 11.3.4.]


10.3.9. Assuming the conditions in Section 10.3.4, prove that:
(a) There exists C > 0 such that $|\varphi(\xi)|\le C$ and $|\varphi(\xi)-\varphi(\eta)|\le Ce^{-\theta N/2}$ for any $\xi,\eta\in F^L$ such that $\xi_k=\eta_k$ whenever |k| < N.
(b) For any finite set Λ ⊂ L, there exists C > 0 such that $|E(\xi,\eta)|\le C$ for any $\xi,\eta\in F^L$ such that $\xi_k=\eta_k$ for every $k\in\Lambda^c$.
10.3.10. Assuming the conditions in Section 10.3.4, prove that if d = 1 then there exists C > 0 such that $|E(\xi,\eta)-E_\Lambda(\xi|_\Lambda)+E_\Lambda(\eta|_\Lambda)|\le C$ for every finite interval $\Lambda\subset\mathbb{Z}$ and any $\xi,\eta\in F^{\mathbb{Z}}$ with $\xi|_{\Lambda^c}=\eta|_{\Lambda^c}$. Consequently,
$$e^{-\beta C}\le\frac{r_\Lambda(\xi|_\Lambda)}{\rho_\Lambda(\xi)}\le e^{\beta C}$$
for every $\xi\in F^{\mathbb{Z}}$ and every finite interval Λ. [Observation: Therefore, the probability distribution of the configurations restricted to each finite interval is not affected in a significant way by the interactions with the sites outside that interval.]

10.4 Variational principle


The variational principle for the pressure, which we state below, was proved
originally by Ruelle [Rue73], under more restrictive assumptions, and was then
extended by Walters [Wal75] to the context we consider here:
Theorem 10.4.1 (Variational principle). Let f : M → M be a continuous transformation in a compact metric space and $M_1(f)$ denote the set of probability measures invariant under f. Then, for every continuous function φ : M → R,
$$P(f,\phi)=\sup\Big\{h_\nu(f)+\int\phi\,d\nu:\nu\in M_1(f)\Big\}.$$

Theorem 10.1 corresponds to the special case φ ≡ 0. In particular, it follows


that the topological entropy of f is zero if and only if hν (f ) = 0 for every
invariant probability measure ν. That is the case, for example, for every
circle homeomorphism (Example 10.1.1) and every translation in a compact
metrizable group (Example 10.1.7). The compactness hypothesis is crucial:
as observed in Exercise 10.4.4, there exist transformations (in non-compact
spaces) without invariant measures and whose topological entropy is positive.
In Sections 10.4.1 and 10.4.2 we present a proof of Theorem 10.4.1 that
is due to Misiurewicz [Mis76]. Before that, let us mention a couple of
consequences.
Corollary 10.4.2. Let f : M → M be a continuous transformation in a compact metric space and $M_e(f)$ denote the set of invariant ergodic probability measures. Then
$$P(f,\phi)=\sup\Big\{h_\nu(f)+\int\phi\,d\nu:\nu\in M_e(f)\Big\}.$$

Proof. Given any $\nu\in M_1(f)$, let $\{\nu_P:P\in\mathcal P\}$ be its ergodic decomposition. By Theorems 5.1.3 and 9.6.2,
$$h_\nu(f)+\int\phi\,d\nu=\int\Big(h_{\nu_P}(f)+\int\phi\,d\nu_P\Big)\,d\hat\nu(P).$$
Since $\nu_P$ is ergodic for $\hat\nu$-almost every P, this implies that
$$\sup\Big\{h_\nu(f)+\int\phi\,d\nu:\nu\in M_1(f)\Big\}\le\sup\Big\{h_\nu(f)+\int\phi\,d\nu:\nu\in M_e(f)\Big\}.$$
The converse inequality is trivial, since $M_e(f)\subset M_1(f)$. Now it suffices to apply Theorem 10.4.1.

Another interesting consequence is that for transformations with finite topological entropy the pressure function determines the set of all invariant probability measures:

Corollary 10.4.3 (Walters). Let f : M → M be a continuous transformation in a compact metric space with topological entropy h(f) < ∞. Let η be any finite signed measure on M. Then, η is a probability measure invariant under f if and only if $\int\phi\,d\eta\le P(f,\phi)$ for every continuous function φ : M → R.

Proof. The “only if” claim is an immediate consequence of Theorem 10.4.1: if η is an invariant probability measure then
$$P(f,\phi)\ge h_\eta(f)+\int\phi\,d\eta\ge\int\phi\,d\eta$$
for every continuous function φ. In what follows we prove the converse.


Let η be a finite signed measure such that $\int\phi\,d\eta\le P(f,\phi)$ for every φ. Consider any φ ≥ 0. For any c > 0 and ε > 0,
$$c\int(\phi+\varepsilon)\,d\eta=-\int-c(\phi+\varepsilon)\,d\eta\ge-P(f,-c(\phi+\varepsilon)).$$
By the relation (10.3.6), we also have that
$$P(f,-c(\phi+\varepsilon))\le h(f)+\sup\big(-c(\phi+\varepsilon)\big)=h(f)-c\inf(\phi+\varepsilon).$$
Therefore, $c\int(\phi+\varepsilon)\,d\eta\ge-h(f)+c\inf(\phi+\varepsilon)$. Since $\inf(\phi+\varepsilon)\ge\varepsilon>0$ and h(f) < ∞, the right-hand side of this inequality is positive when c > 0 is sufficiently large. Hence, $\int(\phi+\varepsilon)\,d\eta>0$. Since ε > 0 is arbitrary, this implies that $\int\phi\,d\eta\ge0$ for every φ ≥ 0. So, η is a positive measure.
The next step is to show that η is a probability measure. By assumption,
$$\int c\,d\eta\le P(f,c)=h(f)+c$$
for every c ∈ R. For c > 0, this implies that η(M) ≤ 1 + h(f)/c. Passing to the limit when c → +∞, we get that η(M) ≤ 1. Analogously, for c < 0 the same inequality gives η(M) ≥ 1 + h(f)/c, and taking the limit when c → −∞ we get η(M) ≥ 1. Therefore, η is a probability measure, as stated.

We are left to prove that η is invariant under f. By assumption, given any c ∈ R and any potential φ,
$$c\int(\phi\circ f-\phi)\,d\eta\le P(f,c(\phi\circ f-\phi)).$$
Since $c(\phi\circ f-\phi)$ is cohomologous to zero, Proposition 10.3.8 shows that the expression on the right-hand side is equal to P(f, 0) = h(f). For c > 0, this implies that
$$\int(\phi\circ f-\phi)\,d\eta\le\frac{h(f)}{c}$$
and, taking the limit when c → +∞, it follows that $\int(\phi\circ f-\phi)\,d\eta\le0$. The same argument, applied to the function −φ, shows that $\int(\phi\circ f-\phi)\,d\eta\ge0$. Hence, $\int\phi\circ f\,d\eta=\int\phi\,d\eta$ for every φ. By Proposition A.3.3, this implies that $f_*\eta=\eta$.

10.4.1 Proof of the upper bound


In this section we prove that, given any invariant probability measure ν,
$$h_\nu(f)+\int\phi\,d\nu\le P(f,\phi).\tag{10.4.1}$$
To do this, let $\mathcal P=\{P_1,\dots,P_s\}$ be any finite partition. We are going to show that if α is an open cover of M with sufficiently small diameter, depending only on $\mathcal P$, then
$$h_\nu(f,\mathcal P)+\int\phi\,d\nu\le\log4+P(f,\phi,\alpha).\tag{10.4.2}$$
Making diam α → 0, it follows that $h_\nu(f,\mathcal P)+\int\phi\,d\nu\le\log4+P(f,\phi)$ for every finite partition $\mathcal P$. Hence, $h_\nu(f)+\int\phi\,d\nu\le\log4+P(f,\phi)$. Now replace the transformation f by $f^k$ and the potential φ by $\phi_k$. Note that $\int\phi_k\,d\nu=k\int\phi\,d\nu$, since ν is invariant under f. Using Propositions 9.1.14 and 10.3.5, we get that
$$kh_\nu(f)+k\int\phi\,d\nu\le\log4+kP(f,\phi)$$
for every k ≥ 1. Dividing by k and taking the limit when k → ∞, we get the inequality (10.4.1).
For proving (10.4.2) we need the following elementary fact:

Lemma 10.4.4. Let $a_1,\dots,a_k$ be real numbers and $p_1,\dots,p_k$ be non-negative numbers such that $p_1+\cdots+p_k=1$. Let $A=\sum_{i=1}^ke^{a_i}$. Then
$$\sum_{i=1}^kp_i(a_i-\log p_i)\le\log A.$$
Moreover, the identity holds if and only if $p_j=e^{a_j}/A$ for every j.


Proof. Write $t_i=e^{a_i}/A$ and $x_i=p_i/e^{a_i}$. Note that $\sum_{i=1}^kt_i=1$. By the concavity property (9.1.8) of the function φ(x) = −x log x,
$$\sum_{i=1}^kt_i\phi(x_i)\le\phi\Big(\sum_{i=1}^kt_ix_i\Big).$$
Note that $t_i\phi(x_i)=(p_i/A)(a_i-\log p_i)$ and $\sum_{i=1}^kt_ix_i=1/A$. So, the previous inequality may be rewritten as follows:
$$\sum_{i=1}^k\frac{p_i}{A}(a_i-\log p_i)\le\frac1A\log A.$$
Multiplying by A we get the inequality in the statement of the lemma. Moreover, by strict concavity, the identity holds if and only if the $x_i$ are all equal, that is, if and only if there exists c such that $p_i=ce^{a_i}$ for every i. Summing over i = 1, . . . , k we see that in that case c = 1/A, as stated.
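With $a_i=-\beta E(\xi_i)$, Lemma 10.4.4 says exactly that the Gibbs distribution (10.3.16) maximizes the pressure (10.3.15). The Python sketch below checks the inequality on random data and checks the equality case $p_j=e^{a_j}/A$. It uses the usual convention 0 log 0 = 0 for vanishing $p_i$, which the lemma allows. The function names are illustrative.

```python
import math
import random

def lemma_lhs(a, p):
    """sum_i p_i (a_i - log p_i), using the convention 0 log 0 = 0."""
    return math.fsum(pi * (ai - math.log(pi)) if pi > 0 else 0.0
                     for ai, pi in zip(a, p))

def log_A(a):
    """log sum_i exp(a_i), computed stably by factoring out max(a)."""
    m = max(a)
    return m + math.log(math.fsum(math.exp(ai - m) for ai in a))

def gibbs_weights(a):
    """The maximizing weights p_j = exp(a_j) / A."""
    la = log_A(a)
    return [math.exp(ai - la) for ai in a]

a = [0.5, -1.0, 2.0]
print(lemma_lhs(a, [1 / 3, 1 / 3, 1 / 3]), lemma_lhs(a, gibbs_weights(a)), log_A(a))
```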

Since the measure ν is regular (Proposition A.3.2), given ε > 0 we may find compact sets $Q_i\subset P_i$ such that $\nu(P_i\setminus Q_i)<\varepsilon$ for every i = 1, . . . , s. Let $Q_0$ be the complement of $\bigcup_{i=1}^sQ_i$ and let $P_0=\emptyset$. Then $\mathcal Q=\{Q_0,Q_1,\dots,Q_s\}$ is a finite partition of M such that $\nu(P_i\,\Delta\,Q_i)<s\varepsilon$ for every i = 0, 1, . . . , s. Hence, by Lemma 9.1.6,
$$H_\nu(\mathcal P/\mathcal Q)\le\log2$$
as long as ε > 0 is sufficiently small (depending only on s). Let ε and $\mathcal Q$ be fixed from now on and assume that the open cover α satisfies
$$\operatorname{diam}\alpha<\min\{d(Q_i,Q_j):1\le i<j\le s\}.\tag{10.4.3}$$
By Lemma 9.1.11, we have that $h_\nu(f,\mathcal P)\le h_\nu(f,\mathcal Q)+H_\nu(\mathcal P/\mathcal Q)\le h_\nu(f,\mathcal Q)+\log2$. Hence, to prove (10.4.2) it suffices to show that
$$h_\nu(f,\mathcal Q)+\int\phi\,d\nu\le\log2+P(f,\phi,\alpha).\tag{10.4.4}$$

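To that end, observe that
$$H_\nu(\mathcal Q^n)+\int\phi_n\,d\nu\le\sum_{Q\in\mathcal Q^n}\nu(Q)\Big(-\log\nu(Q)+\sup_{x\in Q}\phi_n(x)\Big)$$
for every n ≥ 1. Then, by Lemma 10.4.4,
$$H_\nu(\mathcal Q^n)+\int\phi_n\,d\nu\le\log\sum_{Q\in\mathcal Q^n}\sup_{x\in Q}e^{\phi_n(x)}.\tag{10.4.5}$$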

Let γ be any finite subcover of $\alpha^n$. For each $Q\in\mathcal Q^n$, consider any point $x_Q$ in the closure of Q such that $\phi_n(x_Q)=\sup_{x\in Q}\phi_n(x)$. Pick $U_Q\in\gamma$ such that $x_Q\in U_Q$. Then $\sup_{x\in Q}\phi_n(x)\le\sup_{y\in U_Q}\phi_n(y)$ for every $Q\in\mathcal Q^n$. The condition (10.4.3) implies that each element of α intersects the closure of not more than two elements of $\mathcal Q$. Therefore, each element of $\alpha^n$ intersects the closure of not more than $2^n$ elements of $\mathcal Q^n$. In particular, for each U ∈ γ there exist not more than $2^n$ sets $Q\in\mathcal Q^n$ such that $U_Q=U$. Therefore,
$$\sum_{Q\in\mathcal Q^n}\sup_{x\in Q}e^{\phi_n(x)}\le2^n\sum_{U\in\gamma}\sup_{y\in U}e^{\phi_n(y)},\tag{10.4.6}$$
for any finite subcover γ of $\alpha^n$. Combining (10.4.5) and (10.4.6),
$$H_\nu(\mathcal Q^n)+\int\phi_n\,d\nu\le n\log2+\log P_n(f,\phi,\alpha).$$
Since $\int\phi_n\,d\nu=n\int\phi\,d\nu$, dividing by n and taking the limit when n → ∞ we get (10.4.4). This completes the proof of the upper bound (10.4.1).

10.4.2 Approximating the pressure


To finish the proof of Theorem 10.4.1, we now show that for every ε > 0 there exists a probability measure μ invariant under f and such that
$$h_\mu(f)+\int\phi\,d\mu\ge S(f,\phi,\varepsilon).\tag{10.4.7}$$
Clearly, this implies that the supremum of the values of $h_\nu(f)+\int\phi\,d\nu$ when ν varies in $M_1(f)$ is greater than or equal to S(f, φ) = P(f, φ).

For each n ≥ 1, let E be an (n, ε)-separated set such that
$$\sum_{y\in E}e^{\phi_n(y)}\ge\frac12S_n(f,\phi,\varepsilon).\tag{10.4.8}$$
Denote by A the expression on the left-hand side of this inequality. Consider the probability measures $\nu_n$ and $\mu_n$ defined on M by
$$\nu_n=\frac1A\sum_{x\in E}e^{\phi_n(x)}\delta_x\quad\text{and}\quad\mu_n=\frac1n\sum_{j=0}^{n-1}f_*^j\nu_n.$$
By the definition (10.3.8), recalling also that the space of probability measures is compact (Theorem 2.1.5), we may choose a subsequence $(n_j)_j\to\infty$ such that
1. $\frac{1}{n_j}\log S_{n_j}(f,\phi,\varepsilon)$ converges to $S(f,\phi,\varepsilon)$, and
2. $\mu_{n_j}$ converges, in the weak* topology, to some probability measure μ.

We are going to check that such a measure μ is invariant under f and satisfies (10.4.7). For the reader’s convenience, we split the argument into four steps.

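Step 1: First, we prove that μ is invariant. Let ϕ : M → R be any continuous function. For each n ≥ 1,
$$\int\varphi\,d(f_*\mu_n)=\frac1n\sum_{j=1}^n\int\varphi\circ f^j\,d\nu_n=\int\varphi\,d\mu_n+\frac1n\Big(\int\varphi\circ f^n\,d\nu_n-\int\varphi\,d\nu_n\Big)$$
and, consequently,
$$\Big|\int\varphi\,d(f_*\mu_n)-\int\varphi\,d\mu_n\Big|\le\frac2n\sup|\varphi|.$$
Restricting to $n=n_j$ and taking the limit when j → ∞ (note that $f_*\mu_{n_j}\to f_*\mu$ in the weak* topology, because f is continuous), we see that $\int\varphi\,d(f_*\mu)=\int\varphi\,d\mu$ for every continuous function ϕ : M → R. This proves (recall Proposition A.3.3) that $f_*\mu=\mu$, as claimed.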
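The identity in Step 1 is a telescoping sum and holds for any continuous map and any finite combination of point masses. The Python sketch below checks it for a hypothetical choice: the logistic map f(x) = 4x(1 − x) on [0, 1], the test function ϕ(x) = cos 3x, and a measure $\nu_n$ made of four weighted point masses. The names and data are illustrative.

```python
import math

def step1_gap(points, weights, f, varphi, n):
    """For nu = sum_x weights[x] delta_x and mu_n = (1/n) sum_{j<n} f^j_* nu,
    return (int varphi d(f_* mu_n) - int varphi d mu_n, telescoped form)."""
    orbit = [list(points)]                  # orbit[j][i] = f^j(points[i])
    for _ in range(n):
        orbit.append([f(x) for x in orbit[-1]])

    def integral(j):                        # int varphi o f^j d nu
        return math.fsum(w * varphi(x) for w, x in zip(weights, orbit[j]))

    int_mu = math.fsum(integral(j) for j in range(n)) / n
    int_f_mu = math.fsum(integral(j) for j in range(1, n + 1)) / n
    return int_f_mu - int_mu, (integral(n) - integral(0)) / n

f = lambda x: 4 * x * (1 - x)
varphi = lambda x: math.cos(3 * x)
print(step1_gap([0.1, 0.37, 0.62, 0.9], [0.1, 0.2, 0.3, 0.4], f, varphi, 50))
```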

Step 2: Next, we estimate the entropy with respect to $\nu_n$. Let $\mathcal P$ be any finite partition of M such that $\operatorname{diam}\mathcal P<\varepsilon$ and $\mu(\partial\mathcal P)=0$, where $\partial\mathcal P$ denotes the union of the boundaries ∂P of all sets $P\in\mathcal P$. The first condition implies that each element of $\mathcal P^n$ contains at most one element of E. On the other hand, it is clear that every element of E is contained in some element of $\mathcal P^n$. Hence,
$$\begin{aligned}H_{\nu_n}(\mathcal P^n)&=\sum_{x\in E}-\nu_n(\{x\})\log\nu_n(\{x\})=-\sum_{x\in E}\frac1Ae^{\phi_n(x)}\log\Big(\frac1Ae^{\phi_n(x)}\Big)\\&=\log A-\frac1A\sum_{x\in E}e^{\phi_n(x)}\phi_n(x)=\log A-\int\phi_n\,d\nu_n\end{aligned}\tag{10.4.9}$$
(the last equality follows directly from the definition of $\nu_n$).

Step 3: Now we calculate the entropy with respect to $\mu_n$. Consider 1 ≤ k < n. For each r ∈ {0, . . . , k − 1}, let $q_r\ge0$ be the largest integer number such that $r+kq_r\le n$. In other words, $q_r=[(n-r)/k]$. Then,
$$\mathcal P^n=\mathcal P^r\vee\bigvee_{j=0}^{q_r-1}f^{-(kj+r)}(\mathcal P^k)\vee f^{-(kq_r+r)}\big(\mathcal P^{n-(kq_r+r)}\big)$$
(the first term is void if r = 0 and the third one is void if $n=kq_r+r$). Therefore,
$$H_{\nu_n}(\mathcal P^n)\le\sum_{j=0}^{q_r-1}H_{\nu_n}\big(f^{-(kj+r)}(\mathcal P^k)\big)+H_{\nu_n}(\mathcal P^r)+H_{\nu_n}\big(f^{-(kq_r+r)}(\mathcal P^{n-(kq_r+r)})\big).$$
Clearly, $\#\mathcal P^r\le(\#\mathcal P)^k$. Using Lemma 9.1.3, we find that $H_{\nu_n}(\mathcal P^r)\le k\log\#\mathcal P$. For the same reason, the last term in the previous inequality is also bounded by $k\log\#\mathcal P$. Then, using the property (9.1.12),
$$H_{\nu_n}(\mathcal P^n)\le\sum_{j=0}^{q_r-1}H_{f_*^{kj+r}\nu_n}(\mathcal P^k)+2k\log\#\mathcal P\tag{10.4.10}$$
for every r ∈ {0, . . . , k − 1}. Now, every number i ∈ {0, . . . , n − 1} may be written in at most one way as i = kj + r with 0 ≤ r ≤ k − 1 and 0 ≤ j ≤ $q_r-1$. Since all the entropies involved are non-negative, summing (10.4.10) over all the values of r gives
$$kH_{\nu_n}(\mathcal P^n)\le\sum_{i=0}^{n-1}H_{f_*^i\nu_n}(\mathcal P^k)+2k^2\log\#\mathcal P.\tag{10.4.11}$$
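The index bookkeeping in Step 3 is easy to get wrong, so here is a small Python check. For 1 ≤ k < n it lists the starting times kj + r, with 0 ≤ r ≤ k − 1 and 0 ≤ j ≤ q_r − 1, and confirms three things: no time is repeated, each block of length k fits inside {0, ..., n − 1}, and exactly the k − 1 times i > n − k are never used. Because the omitted terms are non-negative entropies, summing (10.4.10) over r is therefore bounded by the full sum over i = 0, ..., n − 1 in (10.4.11). The function name is illustrative.

```python
def block_starts(n, k):
    """Starting times k*j + r (0 <= r < k, 0 <= j < q_r), q_r = (n - r) // k."""
    starts = []
    for r in range(k):
        q_r = (n - r) // k          # largest q with r + k*q <= n
        starts.extend(k * j + r for j in range(q_r))
    return starts

n, k = 11, 3
s = block_starts(n, k)
print(sorted(s), sorted(set(range(n)) - set(s)))
```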

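The concavity property (9.1.8) of the function φ(x) = −x log x implies that
$$\frac1n\sum_{i=0}^{n-1}H_{f_*^i\nu_n}(\mathcal P^k)\le H_{\mu_n}(\mathcal P^k).$$
Combining this inequality with (10.4.11), we see that
$$\frac1nH_{\nu_n}(\mathcal P^n)\le\frac1kH_{\mu_n}(\mathcal P^k)+\frac{2k}{n}\log\#\mathcal P.$$
On the other hand, by the definition of $\mu_n$,
$$\frac1n\int\phi_n\,d\nu_n=\frac1n\sum_{j=0}^{n-1}\int\phi\circ f^j\,d\nu_n=\int\phi\,d\mu_n.$$
Thus, the previous inequality yields
$$\frac1nH_{\nu_n}(\mathcal P^n)+\frac1n\int\phi_n\,d\nu_n\le\frac1kH_{\mu_n}(\mathcal P^k)+\int\phi\,d\mu_n+\frac{2k}{n}\log\#\mathcal P.\tag{10.4.12}$$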
Step 4: Finally, we translate the previous estimates to the limit measure μ. From (10.4.9) and (10.4.12), we get
$$\frac1kH_{\mu_n}(\mathcal P^k)+\int\phi\,d\mu_n\ge\frac1n\log A-\frac{2k}{n}\log\#\mathcal P.$$
By the choice of E in (10.4.8), it follows that
$$\frac1kH_{\mu_n}(\mathcal P^k)+\int\phi\,d\mu_n\ge\frac1n\log S_n(f,\phi,\varepsilon)-\frac1n\log2-\frac{2k}{n}\log\#\mathcal P.\tag{10.4.13}$$
The choice of the partition $\mathcal P$ with $\mu(\partial\mathcal P)=0$ implies that $\mu(\partial\mathcal P^k)=0$ for every k ≥ 1, since μ is invariant (Step 1) and
$$\partial\mathcal P^k\subset\partial\mathcal P\cup f^{-1}(\partial\mathcal P)\cup\cdots\cup f^{-k+1}(\partial\mathcal P).$$
In other words, every element of $\mathcal P^k$ is a continuity set for the measure μ. According to Exercise 2.1.1, it follows that $\mu(P)=\lim_j\mu_{n_j}(P)$ for every $P\in\mathcal P^k$ and, hence,
$$H_\mu(\mathcal P^k)=\lim_jH_{\mu_{n_j}}(\mathcal P^k)\quad\text{for every }k\ge1.$$
Since the function φ is continuous, we also have that $\int\phi\,d\mu=\lim_j\int\phi\,d\mu_{n_j}$. Therefore, restricting (10.4.13) to the subsequence $(n_j)_j$ and taking the limit when j → ∞,
$$\frac1kH_\mu(\mathcal P^k)+\int\phi\,d\mu\ge S(f,\phi,\varepsilon).$$
Taking the limit when k → ∞, we find that
$$h_\mu(f,\mathcal P)+\int\phi\,d\mu\ge S(f,\phi,\varepsilon).$$
Since $h_\mu(f)\ge h_\mu(f,\mathcal P)$, this gives (10.4.7).

This completes the proof of the variational principle (Theorem 10.4.1).

10.4.3 Exercises
10.4.1. Let f : M → M be a continuous transformation in a compact metric space M.
Check that P(f , ϕ) ≤ h(f ) + sup ϕ for every continuous function ϕ : M → R.
10.4.2. Show that if f : M → M is a continuous transformation in a compact metric space and X ⊂ M is a forward invariant set, meaning that f(X) ⊂ X, then $P(f\mid X,\varphi\mid X)\le P(f,\varphi)$.
10.4.3. Give an alternative proof of Proposition 10.3.8, using the variational principle.
10.4.4. Exhibit a continuous transformation f : M → M in a non-compact metric space
M such that f has no invariant probability measure and yet the topological
entropy h(f ) is positive. [Observation: Thus, the variational principle need not
hold when the ambient space is not compact.]
10.4.5. Given numbers $\alpha, \beta > 0$ such that $\alpha + \beta < 1$, define
$$g : [0,\alpha] \cup [1-\beta, 1] \to [0,1], \qquad g(x) = \begin{cases} x/\alpha & \text{if } x \in [0,\alpha],\\ (x-1)/\beta + 1 & \text{if } x \in [1-\beta, 1].\end{cases}$$
Let $K \subset [0,1]$ be the Cantor set formed by the points $x$ such that $g^n(x)$ is defined for every $n \ge 0$, and let $f : K \to K$ be the restriction of $g$. Calculate the function $\psi : \mathbb{R} \to \mathbb{R}$ defined by $\psi(t) = P(f, -t\log g')$. Check that $\psi$ is convex and decreasing and admits a (unique) zero in $(0,1)$. Show that $h_\mu(f) < \int\log g'\,d\mu$ for every probability measure $\mu$ invariant under $f$.
10.4.6. Let f : M → M be a continuous transformation in a compact metric space, such
that the set of ergodic invariant probability measures is finite. Show that for
every potential ϕ : M → R there exists some invariant probability measure that
realizes the supremum in (10.0.1).
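For Exercise 10.4.5, the expected shape of $\psi$ can be checked numerically. The sketch below assumes the formula $\psi(t) = \log(\alpha^t + \beta^t)$. This formula holds because $f$ is conjugate to the full shift on two symbols and $-t\log g'$ is constant on each branch: it equals $t\log\alpha$ on the first branch and $t\log\beta$ on the second. Deriving the formula is the point of the exercise. The function names are illustrative.

```python
import math

def psi(t, alpha, beta):
    """Assumed formula psi(t) = P(f, -t log g') = log(alpha**t + beta**t).

    On [0, alpha] we have g' = 1/alpha, so -t log g' = t log alpha; similarly on [1 - beta, 1].
    """
    return math.log(alpha ** t + beta ** t)

def zero_of_psi(alpha, beta, tol=1e-12):
    """Bisection on (0, 1): psi(0) = log 2 > 0 and psi(1) = log(alpha + beta) < 0."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if psi(mid, alpha, beta) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

t0 = zero_of_psi(1 / 3, 1 / 3)
print(t0, math.log(2) / math.log(3))   # both approximately 0.6309
```

When $\alpha = \beta$, the zero solves $2\alpha^t = 1$, that is, $t = \log 2 / \log(1/\alpha)$. Since $\psi(1) < 0$, the last claim of the exercise follows from the variational principle applied to the potential $-\log g'$.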

10.5 Equilibrium states


Let $f : M \to M$ be a continuous transformation in a compact metric space $M$ and $\varphi : M \to \mathbb{R}$ be a potential. In this section we study the fundamental properties of the set $E(f,\varphi)$ formed by the equilibrium states, that is, the invariant probability measures $\mu$ such that
$$h_\mu(f) + \int\varphi\,d\mu = P(f,\varphi) = \sup\Big\{h_\nu(f) + \int\varphi\,d\nu : \nu \in \mathcal{M}_1(f)\Big\}.$$
In the special case $\varphi \equiv 0$, the elements of $E(f,\varphi)$ are also called measures of maximal entropy. Let us start with a few simple examples.

Example 10.5.1. If $f : M \to M$ has zero topological entropy, then every invariant probability measure $\mu$ is a measure of maximal entropy: $h_\mu(f) = 0 = h(f)$. For any potential $\varphi : M \to \mathbb{R}$,
$$P(f,\varphi) = \sup\Big\{\int\varphi\,d\nu : \nu \in \mathcal{M}_1(f)\Big\}.$$
Hence, $\nu$ is an equilibrium state if and only if $\nu$ maximizes the integral of $\varphi$. Since the function $\nu \mapsto \int\varphi\,d\nu$ is continuous and $\mathcal{M}_1(f)$ is compact with respect to the weak* topology, maxima exist for every potential $\varphi$.

Example 10.5.2. Let $f_A : \mathbb{T}^d \to \mathbb{T}^d$ be the linear endomorphism induced by some matrix $A$ with integer coefficients and non-zero determinant. Let $\mu$ be the Haar measure on $\mathbb{T}^d$. By Propositions 9.4.3 and 10.2.10,
$$h_\mu(f_A) = \sum_{i=1}^d \log^+|\lambda_i| = h(f_A),$$
where $\lambda_1, \dots, \lambda_d$ are the eigenvalues of $A$. In particular, the Haar measure is a measure of maximal entropy for $f_A$.
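The formula in Example 10.5.2 is easy to evaluate. The minimal sketch below computes $\sum_i \log^+|\lambda_i|$ from numerically computed eigenvalues, using two hypothetical example matrices.

```python
import math
import numpy as np

def haar_entropy(A):
    """sum_i log^+ |lambda_i| over the (possibly complex) eigenvalues of A."""
    eigenvalues = np.linalg.eigvals(np.asarray(A, dtype=float))
    return sum(max(0.0, math.log(abs(lam))) for lam in eigenvalues)

# Hypothetical integer matrices with non-zero determinant.
cat_map_matrix = [[2, 1], [1, 1]]   # eigenvalues (3 +- sqrt(5)) / 2
diagonal = [[2, 0], [0, 3]]         # eigenvalues 2 and 3
print(haar_entropy(cat_map_matrix), haar_entropy(diagonal))
```

For the first matrix, the eigenvalue $(3-\sqrt5)/2 < 1$ contributes nothing, so the result is $\log\frac{3+\sqrt5}{2}$. For the diagonal matrix, the result is $\log 2 + \log 3 = \log 6$.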

Example 10.5.3. Let $\sigma : \Sigma \to \Sigma$ be the shift map in $\Sigma = \{1, \dots, d\}^{\mathbb{N}}$ and $\mu$ be the Bernoulli measure associated with a probability vector $p = (p_1, \dots, p_d)$. As observed in Example 9.1.10,
$$h_\mu(\sigma,\mathcal{P}) = \lim_n \frac{1}{n}H_\mu(\mathcal{P}^n) = \sum_{i=1}^d -p_i\log p_i.$$
We leave it to the reader (Exercise 10.5.1) to check that this function attains its maximum precisely when the coefficients $p_i$ are all equal to $1/d$. Moreover, in that case $h_\mu(\sigma) = \log d$. Recall also (Example 10.1.2) that $h(\sigma) = \log d$. Therefore, the Bernoulli measure associated with the vector $p = (1/d, \dots, 1/d)$ is the only measure of maximal entropy among the Bernoulli measures. In fact, the theory that we develop in Chapter 12 shows that this measure is the unique measure of maximal entropy among all invariant measures.
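The sketch below is a quick numerical illustration of Exercise 10.5.1, not a proof. Among randomly drawn probability vectors, none has a larger Bernoulli entropy than the uniform vector, whose entropy is $\log d$. The dimension $d = 4$ and the sampling scheme are arbitrary choices.

```python
import math
import random

def bernoulli_entropy(p):
    """h_mu(sigma) = -sum_i p_i log p_i for the Bernoulli measure with probability vector p."""
    return -sum(x * math.log(x) for x in p if x > 0)

d = 4
uniform = [1 / d] * d
rng = random.Random(1)
largest_other = 0.0
for _ in range(1000):
    weights = [rng.random() for _ in range(d)]
    total = sum(weights)
    p = [w / total for w in weights]            # a random probability vector
    largest_other = max(largest_other, bernoulli_entropy(p))
print(bernoulli_entropy(uniform), math.log(d), largest_other)
```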

Let us start with the following extension of the variational principle:

Proposition 10.5.4. For every potential $\varphi : M \to \mathbb{R}$,
$$P(f,\varphi) = \sup\Big\{h_\mu(f) + \int\varphi\,d\mu : \mu \text{ invariant and ergodic for } f\Big\}.$$

Proof. Consider the function $\Psi : \mathcal{M}_1(f) \to \mathbb{R}$ given by $\Psi(\mu) = h_\mu(f) + \int\varphi\,d\mu$. For each invariant probability measure $\mu$, let $\{\mu_P : P \in \mathcal{P}\}$ be the corresponding ergodic decomposition. It follows from Theorem 9.6.2 that
$$\Psi(\mu) = \int\Psi(\mu_P)\,d\hat\mu(P). \tag{10.5.1}$$
This implies that the supremum of $\Psi$ over all the invariant probability measures is less than or equal to the supremum of $\Psi$ over the ergodic invariant probability measures. Since the opposite inequality is obvious, the two suprema coincide. By the variational principle (Theorem 10.4.1), the supremum of $\Psi$ over all the invariant probability measures is equal to $P(f,\varphi)$.

Proposition 10.5.5. Assume that $h(f) < \infty$. Then the set of equilibrium states for any potential $\varphi : M \to \mathbb{R}$ is a convex subset of $\mathcal{M}_1(f)$. More precisely, given $t \in (0,1)$ and $\mu_1, \mu_2 \in \mathcal{M}_1(f)$,
$$(1-t)\mu_1 + t\mu_2 \in E(f,\varphi) \iff \{\mu_1, \mu_2\} \subset E(f,\varphi).$$
Moreover, an invariant probability measure $\mu$ is in $E(f,\varphi)$ if and only if almost every ergodic component of $\mu$ is in $E(f,\varphi)$.

Proof. As we have seen in (10.3.6), the hypothesis that the topological entropy is finite ensures that $P(f,\varphi) < \infty$ for every potential $\varphi$. Let us consider the functional $\Psi(\mu) = h_\mu(f) + \int\varphi\,d\mu$ introduced in the proof of the previous result. By Proposition 9.6.1, this functional is affine:
$$\Psi((1-t)\mu_1 + t\mu_2) = (1-t)\Psi(\mu_1) + t\Psi(\mu_2)$$
for every $t \in (0,1)$ and any $\mu_1, \mu_2 \in \mathcal{M}_1(f)$. Since $\Psi(\mu_1)$ and $\Psi(\mu_2)$ are bounded above by the finite number $\sup\Psi = P(f,\varphi)$, the value $\Psi((1-t)\mu_1 + t\mu_2)$ equals $\sup\Psi$ if and only if both $\Psi(\mu_1)$ and $\Psi(\mu_2)$ do. This proves the first part of the proposition. The proof of the second part is analogous: the relation (10.5.1) implies that $\Psi(\mu) = \sup\Psi$ if and only if $\Psi(\mu_P) = \sup\Psi$ for $\hat\mu$-almost every $P$.

Corollary 10.5.6. If $E(f,\varphi)$ is non-empty, then it contains ergodic invariant probability measures. Moreover, the extremal elements of the convex set $E(f,\varphi)$ are precisely the ergodic measures contained in it.

Proof. To get the first claim it suffices to consider the ergodic components of
any element of E(f , φ). Let us move on to proving the second claim. If μ ∈
E(f , φ) is ergodic then (Proposition 4.3.2) μ is an extremal element of M1 (f )
and so it must be an extremal element of E(f , φ). Conversely, if μ ∈ E(f , φ) is
not ergodic then we may write
μ = (1 − t)μ1 + tμ2 , with 0 < t < 1 and μ1 , μ2 ∈ M1 (f ).
By Proposition 10.5.5 we have that μ1 , μ2 ∈ E(f , φ), which implies that μ is
not an extremal element of the set E(f , φ).

In general, the set of equilibrium states may be empty. The first example
of this kind was given by Gurevič. The following construction is taken from
Walters [Wal82]:

Example 10.5.7. Let $f_n : M_n \to M_n$ be a sequence of homeomorphisms in compact metric spaces such that the sequence $(h(f_n))_n$ is increasing and bounded. We are going to build a metric space $M$ and a homeomorphism $f : M \to M$ with the following features:

• $M$ is the union of all $M_n$ with an additional point, denoted $\infty$, endowed with a distance function relative to which $(M_n)_n$ converges to $\infty$.
• $f$ fixes the point $\infty$ and its restriction to each $M_n$ coincides with $f_n$.

We are then going to check that $f : M \to M$ has no measure of maximal entropy. Let us explain how this is done.
Denote by $d_n$ the distance in each metric space $M_n$. There is no loss of generality in assuming that $d_n \le 1$ for every $n$. Define $M = \bigcup_n M_n \cup \{\infty\}$ (disjoint union) and consider the distance $d$ defined in $M$ by
$$d(x,y) = \begin{cases} n^{-2}\,d_n(x,y) & \text{if } x \in M_n \text{ and } y \in M_n \text{ with } n \ge 1,\\ \sum_{i=n}^{m} i^{-2} & \text{if } x \in M_n \text{ and } y \in M_m \text{ with } n < m,\\ \sum_{i=n}^{\infty} i^{-2} & \text{if } x \in M_n \text{ and } y = \infty,\end{cases}$$
extended by symmetry. We leave it to the reader to check that $d$ is indeed a distance in $M$ and that $(M,d)$ is a compact space. Let $\beta = \sup_n h(f_n)$. Since the sets $\{\infty\}$ and $M_n$, $n \ge 1$, are invariant and cover the whole of $M$, any ergodic probability measure $\mu$ of $f$ satisfies either $\mu(\{\infty\}) = 1$ or $\mu(M_n) = 1$ for some $n \ge 1$. In the first case, $h_\mu(f) = 0$. In the second, $\mu$ may be viewed as a probability measure invariant under $f_n$ and, consequently, $h_\mu(f) \le h(f_n)$. Since the sequence $(h(f_n))_n$ is increasing, $h(f_n) < h(f_{n+1}) \le \beta$. In particular, $h_\mu(f) < \beta$ for every ergodic probability measure $\mu$ of $f$. The previous observation also shows that
$$\sup\{h_\mu(f) : \mu \text{ invariant and ergodic for } f\} = \sup_n \sup\{h_\mu(f_n) : \mu \text{ invariant and ergodic for } f_n\}.$$
According to Proposition 10.5.4 (with $\varphi \equiv 0$), this means that $h(f) = \sup_n h(f_n) = \beta$. Thus, no ergodic invariant measure of $f$ realizes the topological entropy. By Proposition 10.5.5, it follows that $f$ has no measure of maximal entropy.
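The claim that $d$ is a distance can be sanity-checked on a finite sample. The sketch below assumes, hypothetically, that every $M_n$ is a copy of $[0,1]$ with $d_n(s,t) = |s-t|$. It uses $\sum_{i\ge1} i^{-2} = \pi^2/6$ to compute the tail sums. The sketch tests symmetry, positivity and the triangle inequality on finitely many points. It does not test compactness and proves nothing.

```python
import itertools
import math
import random

INF = "inf"   # the additional point at infinity

def tail(n):
    """sum_{i >= n} i**-2, computed as pi**2 / 6 - sum_{i < n} i**-2."""
    return math.pi ** 2 / 6 - sum(i ** -2 for i in range(1, n))

def dist(x, y):
    """Distance of Example 10.5.7; a point of M_n is a pair (n, s) with s in [0, 1]."""
    if x == y:
        return 0.0
    if INF in (x, y):
        n, _ = y if x == INF else x
        return tail(n)
    (n, s), (m, t) = sorted([x, y])
    if n == m:
        return abs(s - t) / n ** 2                # hypothetical choice d_n(s, t) = |s - t| <= 1
    return sum(i ** -2 for i in range(n, m + 1))  # x in M_n, y in M_m with n < m

rng = random.Random(0)
points = [(n, rng.random()) for n in range(1, 7) for _ in range(3)] + [INF]
```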
Nevertheless, there is a broad class of transformations for which equilibrium
states do exist for every potential:
Lemma 10.5.8. If the entropy function ν → hν (f ) is upper semi-continuous
then E(f , φ) is compact, relative to the weak∗ topology, and non-empty, for any
potential φ : M → R.

Proof. Let $(\mu_n)_n$ be a sequence in $\mathcal{M}_1(f)$ such that $h_{\mu_n}(f) + \int\varphi\,d\mu_n$ converges to $P(f,\varphi)$. Since $\mathcal{M}_1(f)$ is compact (Theorem 2.1.5), there exist a subsequence $(\mu_{n_j})_j$ and some $\mu \in \mathcal{M}_1(f)$ such that $\mu_{n_j} \to \mu$. The assumption implies that $\nu \mapsto h_\nu(f) + \int\varphi\,d\nu$ is upper semi-continuous. Consequently,
$$h_\mu(f) + \int\varphi\,d\mu \ge \limsup_j \Big(h_{\mu_{n_j}}(f) + \int\varphi\,d\mu_{n_j}\Big) = P(f,\varphi),$$
and so $\mu$ is an equilibrium state, as stated. Analogously, taking any sequence $(\nu_n)_n$ in $E(f,\varphi)$, we see that every accumulation point $\nu$ is an equilibrium state. This shows that $E(f,\varphi)$ is closed and, thus, compact.

Corollary 10.5.9. Assume that $f : M \to M$ is an expansive continuous transformation in a compact metric space $M$. Then every potential $\varphi : M \to \mathbb{R}$ admits some equilibrium state.

Proof. Just combine Corollary 9.2.17 with Lemma 10.5.8.

The conclusions of Corollaries 9.2.17 and 10.5.9 remain valid when $f$ is just h-expansive, in the sense of Exercise 10.2.8. See Bowen [Bow72]. Misiurewicz [Mis73] noted that the same is still true when $f$ is just asymptotically h-expansive, meaning that $g^*(f,\varepsilon) \to 0$ when $\varepsilon \to 0$. Buzzi [Buz97] proved that every $C^\infty$ diffeomorphism is asymptotically h-expansive. The corresponding statement for h-expansivity is false: Burguet, Liao and Yang [BLY] found open sets, in the $C^2$ topology, formed by diffeomorphisms that are not h-expansive; $C^\infty$ diffeomorphisms are dense in such sets.
Combining these results of Misiurewicz and Buzzi, one gets the following theorem of Newhouse [New90]: for every $C^\infty$ diffeomorphism $f$, the entropy function $\nu \mapsto h_\nu(f)$ is upper semi-continuous, and so equilibrium states always exist. Yomdin [Yom87] also proved that the topological entropy function $f \mapsto h(f)$ is upper semi-continuous in the realm of $C^\infty$ diffeomorphisms. According to Misiurewicz [Mis73], both conclusions, Newhouse's and Yomdin's, are false in general for $C^r$ diffeomorphisms with $r < \infty$. However, Liao, Viana and Yang [LVY13] proved that both extend to $C^1$ diffeomorphisms away from homoclinic tangencies, and that any such diffeomorphism is h-expansive. In particular, equilibrium states always exist in that generality.
Uniqueness of equilibrium states is a much more delicate problem. It is
very easy to exhibit transformations with infinitely many ergodic equilibrium
states. For example, let f : S1 → S1 be a circle homeomorphism with infinitely
many fixed points. The Dirac measures on those points are ergodic invariant
probability measures. Since the topological entropy h(f ) is equal to zero
(Example 10.5.1), each of those measures is an equilibrium state for any
potential that attains its maximum at the corresponding point.
This type of example is trivial, of course, because the transformation is not
transitive. A more interesting question is whether an indivisibility property,
such as transitivity or topological mixing, ensures uniqueness of the equilibrium state. It turns out that this is also not true. The first counter-example (the so-called Dyck shift) was exhibited by Krieger [Kri75]. Next, we present a
particularly transparent and flexible construction, due to Haydn [Hay]. Other
interesting examples were studied by Hofbauer [Hof77].

Example 10.5.10. Let $X = \{*, 1, 2, 3, 4\}$ and consider the subsets $E = \{2, 4\}$ (the even symbols) and $O = \{1, 3\}$ (the odd symbols). We are going to exhibit a compact set $H \subset X^{\mathbb{Z}}$, invariant under the shift map in $X^{\mathbb{Z}}$, such that the restriction $\sigma : H \to H$ is topologically mixing and yet admits two mutually singular invariant measures $\mu_v$ and $\mu_a$ such that
$$h_{\mu_v}(\sigma) = h_{\mu_a}(\sigma) = \log 2 = h(\sigma).$$

Let us describe this example. By definition, $H = E^{\mathbb{Z}} \cup O^{\mathbb{Z}} \cup H_*$, where $H_*$ consists of the sequences $x \in X^{\mathbb{Z}}$ that satisfy the following rule: whenever one block with $m$ symbols of one type, even or odd, is followed by another block with $n$ symbols of the other type, odd or even, the two blocks are separated by a block of at least $m + n$ symbols $*$. In other words, the following configurations are admissible in sequences $x \in H_*$:
$$x = (\dots, *, e_1, \dots, e_m, \underbrace{*, \dots, *}_{k}, o_1, \dots, o_n, *, \dots) \quad\text{or}$$
$$x = (\dots, *, o_1, \dots, o_m, \underbrace{*, \dots, *}_{k}, e_1, \dots, e_n, *, \dots),$$
with $e_i \in E$, $o_j \in O$ and $k \ge m + n$. Observe that a sequence $x \in H_*$ may start and/or end with an infinite block of $*$, but it can neither start nor end with an infinite block of either even or odd type. It is clear that $H$ is invariant under the shift map. Haydn [Hay] proved the following (see Exercise 10.5.6):

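(i) the shift map $\sigma : H \to H$ is topologically mixing;
(ii) $h(\sigma) = \log 2$.

We know that $E^{\mathbb{Z}}$ and $O^{\mathbb{Z}}$ support the Bernoulli measures $\mu_v$ and $\mu_a$ associated with the probability vector $(1/2, 1/2)$, whose entropy is equal to $\log 2$. Then $\mu_v$ and $\mu_a$ are measures of maximal entropy for $\sigma : H \to H$. They are mutually singular because they are supported on the disjoint sets $E^{\mathbb{Z}}$ and $O^{\mathbb{Z}}$.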

Clearly, this construction may be modified to yield transformations with any given number of ergodic measures of maximal entropy. Haydn [Hay] has also shown how to adapt it to construct examples with multiple equilibrium states for other potentials as well.
In Chapters 11 and 12 we study a class of transformations, called expanding,
for which every Hölder potential admits exactly one equilibrium state. In
particular, these transformations are intrinsically ergodic, that is, they have
a unique measure of maximal entropy.

10.5.1 Exercises
10.5.1. Show that, among the Bernoulli measures of the shift map $\sigma : \Sigma \to \Sigma$ in the space $\Sigma = \{1, \dots, d\}^{\mathbb{Z}}$, the one with the largest entropy is given by the probability vector $(1/d, \dots, 1/d)$.
10.5.2. Let $\sigma : \Sigma \to \Sigma$ be the shift map in $\Sigma = \{1, \dots, d\}^{\mathbb{Z}}$ and $\varphi : \Sigma \to \mathbb{R}$ be a locally constant potential, that is, such that $\varphi$ is constant on each cylinder $[0; i]$. Calculate $P(\sigma,\varphi)$ and show that there exists some equilibrium state that is a Bernoulli measure.

10.5.3. Let $\sigma : \Sigma \to \Sigma$ be the shift map in $\Sigma = \{1, \dots, d\}^{\mathbb{N}}$. An invariant probability measure $\mu$ is called a Gibbs state for a potential $\varphi : \Sigma \to \mathbb{R}$ if there exist $P \in \mathbb{R}$ and $K > 0$ such that
$$K^{-1} \le \frac{\mu(C)}{\exp(\varphi_n(x) - nP)} \le K \tag{10.5.2}$$
for every cylinder $C = [0; i_0, \dots, i_{n-1}]$ and any $x \in C$. Prove that if $\mu$ is a Gibbs state, then $h_\mu(\sigma) + \int\varphi\,d\mu$ coincides with the constant $P$ in (10.5.2). Therefore, $\mu$ is an equilibrium state if and only if $P = P(\sigma,\varphi)$. Prove that for each choice of the constant $P$ there exists at most one ergodic Gibbs state.
10.5.4. Let $f : M \to M$ be a continuous transformation in a compact metric space and $\varphi : M \to \mathbb{R}$ be a continuous function. If $\mu$ is an equilibrium state for $\varphi$, then the functional $F_\mu : C^0(M) \to \mathbb{R}$ defined by $F_\mu(\psi) = \int\psi\,d\mu$ satisfies $F_\mu(\psi) \le P(f,\varphi+\psi) - P(f,\varphi)$ for every $\psi \in C^0(M)$. Conclude that if the pressure function $P(f,\cdot) : C^0(M) \to \mathbb{R}$ is differentiable in every direction at a point $\varphi$, then $\varphi$ admits at most one equilibrium state.
10.5.5. Let f : M → M be a continuous transformation in a compact metric space.
Show that the subset of functions φ : M → R for which there exists a unique
equilibrium state is residual in C0 (M).
10.5.6. Check the claims (i) and (ii) in Example 10.5.10.
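For Exercise 10.5.2, suppose $\varphi$ equals $a_i$ on $[0;i]$. The expected answer is $P(\sigma,\varphi) = \log\sum_i e^{a_i}$, attained by the Bernoulli measure with $p_i = e^{a_i}/\sum_j e^{a_j}$. The sketch below takes this formula as an assumption and compares it only with other Bernoulli measures, so it illustrates the claim rather than proving it. The values $a_i$ are hypothetical.

```python
import math
import random

def log_partition(a):
    """Candidate pressure log(sum_i exp(a_i)) for phi = a_i on the cylinder [0; i]."""
    return math.log(sum(math.exp(x) for x in a))

def bernoulli_free_energy(p, a):
    """h_mu(sigma) + integral of phi d mu for the Bernoulli measure mu with probability vector p."""
    entropy = -sum(x * math.log(x) for x in p if x > 0)
    return entropy + sum(x * y for x, y in zip(p, a))

a = [0.3, -1.0, 2.0]                       # hypothetical values of phi on [0;1], [0;2], [0;3]
Z = sum(math.exp(x) for x in a)
p_star = [math.exp(x) / Z for x in a]      # candidate Bernoulli equilibrium state

rng = random.Random(3)
best_random = -math.inf
for _ in range(1000):
    w = [rng.random() for _ in a]
    total = sum(w)
    best_random = max(best_random, bernoulli_free_energy([x / total for x in w], a))
print(log_partition(a), bernoulli_free_energy(p_star, a), best_random)
```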
11
Expanding maps

The distinctive feature of the transformations $f : M \to M$ that we study in the last two chapters of this book is that they expand the distance between nearby points: there exists a constant $\sigma > 1$ such that
$$d(f(x), f(y)) \ge \sigma\,d(x,y)$$
whenever the distance between $x$ and $y$ is small (a precise definition will be given shortly). There is more than one reason why this class of transformations plays an important role in ergodic theory.
On the one hand, as we are going to see, expanding maps exhibit very rich
dynamical behavior, from the metric and topological point of view as well as
from the ergodic point of view. Thus, they provide a natural and interesting
context for utilizing many of the ideas and methods that have been introduced
so far.
On the other hand, expanding maps lead to paradigms that are useful
for understanding many other systems, technically more complex. A good
illustration of this is the ergodic theory of uniformly hyperbolic systems, for
which an excellent presentation can be found in Bowen [Bow75a].
An important special case of expanding maps is given by the differentiable transformations on manifolds such that
$$\|Df(x)v\| \ge \sigma\|v\|$$
for every $x \in M$ and every vector $v$ tangent to $M$ at the point $x$. We focus on
this case in Section 11.1. The main result (Theorem 11.1.2) is that, under the
hypothesis that the Jacobian det Df is Hölder, the transformation f admits a
unique invariant probability measure absolutely continuous with respect to the
Lebesgue measure. Moreover, that probability measure is ergodic and positive
on the open subsets of M.
In Section 11.2 we extend the notion of an expanding map to metric spaces and give a global description of the topological dynamics of such maps, starting from the study of their periodic points. The main objective is to show that the global dynamics may always be reduced to the topologically exact case (Theorem 11.2.15). In Section 11.3 we complement this analysis by showing that, for these transformations, the topological entropy coincides with the growth rate of the number of periodic points.
The study of expanding maps will proceed in Chapter 12, where we will
develop the so-called thermodynamic formalism for such systems.

11.1 Expanding maps on manifolds


Let $M$ be a compact manifold and $f : M \to M$ be a map of class $C^1$. We say that $f$ is expanding if there exist $\sigma > 1$ and some Riemannian metric on $M$ such that
$$\|Df(x)v\| \ge \sigma\|v\| \quad\text{for every } x \in M \text{ and every } v \in T_xM. \tag{11.1.1}$$
In particular, $f$ is a local diffeomorphism: the condition (11.1.1) implies that $Df(x)$ is an isomorphism for every $x \in M$. In what follows, we call Lebesgue measure on $M$ the volume measure $m$ induced by such a Riemannian metric. The precise choice of the metric is not very important, since the volume measures induced by different Riemannian metrics are all equivalent.

Example 11.1.1. Let $f_A : \mathbb{T}^d \to \mathbb{T}^d$ be the linear endomorphism of the torus induced by some matrix $A$ with integer coefficients and non-zero determinant. Assume that all the eigenvalues $\lambda_1, \dots, \lambda_d$ of $A$ are larger than 1 in absolute value. Then, given any $1 < \sigma < \inf_i |\lambda_i|$, there exists an inner product in $\mathbb{R}^d$ relative to which $\|Av\| \ge \sigma\|v\|$ for every $v$. Indeed, suppose that the eigenvalues are real. Consider a basis of $\mathbb{R}^d$, obtained by suitably rescaling a Jordan basis of $A$, with respect to which $A = D + \varepsilon N$, where $D$ is diagonal and $N$ is nilpotent. The inner product for which this basis is orthonormal has the required property, as long as $\varepsilon > 0$ is small enough. The reader should have no difficulty extending this argument to the case when there are complex eigenvalues. This shows that the transformation $f_A$ is expanding.
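The construction of the adapted inner product can be checked on a concrete matrix. The sketch uses the hypothetical integer matrix $A = \begin{pmatrix}2 & 5\\ 0 & 2\end{pmatrix}$, whose only eigenvalue is 2. In the Euclidean norm, $A$ contracts some vectors. After the Jordan basis is rescaled so that $A = D + \varepsilon N$ with $\varepsilon = 0.1$, every vector is expanded by a factor of at least $2 - \varepsilon$.

```python
import numpy as np

A = np.array([[2.0, 5.0], [0.0, 2.0]])   # hypothetical: integer entries, eigenvalue 2 (twice), det 4

def smallest_expansion(B):
    """min over unit vectors v of the Euclidean norm |Bv|, i.e. the smallest singular value of B."""
    return float(np.linalg.svd(B, compute_uv=False)[-1])

eps = 0.1
P = np.diag([1.0, eps / 5.0])        # rescaled Jordan basis: columns e_1 and (eps/5) e_2
B = np.linalg.inv(P) @ A @ P         # matrix of A in that basis: [[2, eps], [0, 2]] = D + eps N

# For the inner product <v, w> = (P^{-1} v) . (P^{-1} w), the columns of P are orthonormal, and
# |A v| / |v| equals |B u| / |u| with u = P^{-1} v, so A expands like B does.
print(smallest_expansion(A), smallest_expansion(B))   # about 0.70 and 1.95
```

For this inner product, any $\sigma \in (1, 1.9)$ works. Taking $\varepsilon$ smaller pushes the expansion rate closer to 2.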

It is clear from the definition that any map sufficiently close to an expanding
one, relative to the C1 topology, is still expanding. Thus, the observation in
Example 11.1.1 provides a whole open set of examples of expanding maps.
A classical result of Michael Shub [Shu69] asserts a (much deeper) kind of
converse: every expanding map on the torus Td is topologically conjugate to
an expanding linear endomorphism fA .
Given a probability measure $\mu$ invariant under a transformation $f : M \to M$, we call the basin of $\mu$ the set $B(\mu)$ of all points $x \in M$ such that
$$\lim_{n\to\infty}\frac{1}{n}\sum_{j=0}^{n-1}\varphi(f^j(x)) = \int\varphi\,d\mu$$
for every continuous function $\varphi : M \to \mathbb{R}$. Note that the basin is always an invariant set. If $\mu$ is ergodic, then $B(\mu)$ has full $\mu$-measure (Exercise 4.1.5).
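For the doubling map $f(x) = 2x \bmod 1$, an expanding circle map with $f' \equiv 2$, Lebesgue measure $m$ is invariant and ergodic. Hence $m$-almost every point lies in the basin $B(m)$. The minimal sketch below checks a single continuous observable, $\varphi(x) = x$, along a Lebesgue-random orbit. Checking one observable cannot certify that a point lies in the basin. The orbit is handled through its binary digits, because naive floating-point iteration of $2x \bmod 1$ collapses to 0 after a few dozen steps.

```python
import random

def birkhoff_average_doubling(n, digits=40, seed=0):
    """(1/n) sum_{j<n} phi(f^j(x)) for f(x) = 2x mod 1 and phi(x) = x.

    A Lebesgue-random x is encoded by i.i.d. fair binary digits. The integer v holds the
    binary digits j+1, ..., j+digits of x, so v / 2**digits approximates f^j(x) to within
    2**-digits; applying f shifts the digits left and reveals the next random digit.
    """
    rng = random.Random(seed)
    mask = (1 << digits) - 1
    v = rng.getrandbits(digits)
    total = 0
    for _ in range(n):
        total += v
        v = ((v << 1) & mask) | rng.getrandbits(1)
    return total / (n << digits)

print(birkhoff_average_doubling(100_000))   # close to the integral of x dm, which is 1/2
```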
Theorem 11.1.2. Let f : M → M be an expanding map on a compact
(connected) manifold M and assume that the Jacobian x → det Df (x) is Hölder.
Then f admits a unique invariant probability measure μ absolutely continuous
with respect to Lebesgue measure m. Moreover, μ is ergodic, its support
coincides with M and its basin has full Lebesgue measure in M.
First, let us outline the strategy of the proof of Theorem 11.1.2. The details
will be given in the forthcoming sections. The conclusion is generally false if
one omits the hypothesis of Hölder continuity: see Quas [Qua99].
It is easy to check (Exercise 11.1.1) that the pre-image under f of any set
with zero Lebesgue measure m also has zero Lebesgue measure. This means
that the image $f_*\nu$ under $f$ of any measure $\nu$ absolutely continuous with respect to $m$ is also absolutely continuous with respect to $m$. In particular, the $n$-th image $f_*^n m$ is always absolutely continuous with respect to $m$.
In Proposition 11.1.7 we prove that the density (that is, the Radon–Nikodym derivative) of each $f_*^n m$ with respect to $m$ is bounded by some constant independent of $n \ge 1$. We deduce from this fact that every accumulation point of the sequence
$$\frac{1}{n}\sum_{j=0}^{n-1} f_*^j m,$$
with respect to the weak* topology, is an invariant probability measure absolutely continuous with respect to Lebesgue measure, with density bounded by that same constant.

An additional argument, using the fact that $M$ is connected, proves that the accumulation point is unique and has all the properties in the statement of the theorem.
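The strategy above can be visualized with Ulam's method. This is a standard discretization of $f_*$, used here only for illustration; it is not the argument of the text. The circle is cut into $n$ cells, and the push-forward is replaced by a stochastic matrix whose entries are estimated by sampling. The map $f(x) = 2x + 0.1\sin(2\pi x) \bmod 1$ is a hypothetical expanding example, since $f' \ge 2 - 0.2\pi > 1$.

```python
import numpy as np

AMPLITUDE = 0.1   # hypothetical: f'(x) = 2 + 2*pi*AMPLITUDE*cos(2*pi*x) >= 2 - 0.2*pi > 1

def f(x):
    """Expanding circle map f(x) = 2x + AMPLITUDE*sin(2 pi x) mod 1 (degree 2)."""
    return (2 * x + AMPLITUDE * np.sin(2 * np.pi * x)) % 1.0

def ulam_matrix(n_cells, samples=400):
    """P[i, j] ~ m(I_i ∩ f^{-1}(I_j)) / m(I_i) for the cells I_i = [i/n_cells, (i+1)/n_cells)."""
    P = np.zeros((n_cells, n_cells))
    offsets = (np.arange(samples) + 0.5) / samples
    for i in range(n_cells):
        images = f((i + offsets) / n_cells)
        cells = np.minimum((images * n_cells).astype(int), n_cells - 1)
        P[i] = np.bincount(cells, minlength=n_cells) / samples
    return P

def pushforward_densities(n_iter=60, n_cells=200):
    """Cell densities of the approximations of f_*^j m, j < n_iter, and of their Cesaro average."""
    P = ulam_matrix(n_cells)
    mass = np.full(n_cells, 1.0 / n_cells)   # Lebesgue measure m, one entry per cell
    history = []
    for _ in range(n_iter):
        history.append(mass * n_cells)       # density = mass / cell length
        mass = mass @ P                      # approximate next push-forward
    history = np.array(history)
    return history, history.mean(axis=0)

history, cesaro_average = pushforward_densities()
print(history.max(), cesaro_average.min(), cesaro_average.max())
```

The cell densities of the iterates remain of order 1, consistent with a uniform bound on the densities of $f_*^n m$. The Cesàro average approximates the density of the absolutely continuous invariant measure. Cell averages can only hint at the true densities.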

11.1.1 Distortion lemma


To begin the proof of Theorem 11.1.2, let us prove the following elementary fact:
Lemma 11.1.3. Let $f : M \to M$ be a local diffeomorphism of class $C^r$, $r \ge 1$, on a compact Riemannian manifold $M$, and let $\sigma > 0$ be such that $\|Df(x)v\| \ge \sigma\|v\|$ for every $x \in M$ and every $v \in T_xM$. Then there exists $\rho > 0$ such that, for any pre-image $x$ of any point $y \in M$, there exists a map $h : B(y,\rho) \to M$ of class $C^r$ such that $f \circ h = \mathrm{id}$, $h(y) = x$ and
$$d(h(y_1), h(y_2)) \le \sigma^{-1}d(y_1,y_2) \quad\text{for every } y_1, y_2 \in B(y,\rho). \tag{11.1.2}$$

Proof. By the inverse function theorem, for every $\xi \in M$ there exist open neighborhoods $U(\xi) \subset M$ of $\xi$ and $V(\xi) \subset M$ of $f(\xi)$ such that $f$ maps $U(\xi)$ diffeomorphically onto $V(\xi)$. Since $M$ is compact, there exists $\delta > 0$ such that $d(\xi,\xi') \ge \delta$ whenever $\xi \ne \xi'$ and $f(\xi) = f(\xi')$. In particular, we may choose these neighborhoods in such a way that $U(\xi) \cap U(\xi') = \emptyset$ whenever $\xi \ne \xi'$ and $f(\xi) = f(\xi')$. For each $\eta \in M$, let
$$W(\eta) = \bigcap_{\xi \in f^{-1}(\eta)} V(\xi).$$
Since $f^{-1}(\eta)$ is finite (Exercise A.4.6), every $W(\eta)$ is an open set. Fix $\rho > 0$ such that $2\rho$ is a Lebesgue number for the open cover $\{W(\eta) : \eta \in M\}$ of $M$. In particular, for every $y \in M$ there exists $\eta \in M$ such that $B(y,\rho)$ is contained in $W(\eta)$, that is, in $V(\xi)$ for every $\xi \in f^{-1}(\eta)$. Since the $U(\xi)$ are pairwise disjoint and $\#f^{-1}(y) = \operatorname{degree}(f) = \#f^{-1}(\eta)$, given any $x \in f^{-1}(y)$ there exists exactly one $\xi \in f^{-1}(\eta)$ such that $x \in U(\xi)$. Let $h$ be the restriction to $B(y,\rho)$ of the inverse of $f : U(\xi) \to V(\xi)$. By construction, $f \circ h = \mathrm{id}$ and $h(y) = x$. Moreover, $\|Dh(z)\| = \|Df(h(z))^{-1}\| \le \sigma^{-1}$ for every $z$ in the domain of $h$. By the mean value theorem, this implies that $h$ has the property (11.1.2).

Transformations $h$ as in this statement are called inverse branches of the local diffeomorphism $f$. Now assume that $f$ is an expanding map. The condition (11.1.1) means that in this case we may take $\sigma > 1$ in the hypothesis of the lemma. Then the conclusion (11.1.2) implies that the inverse branches are contractions, with a uniform contraction rate.

In particular, we may define inverse branches $h^n$ of any iterate $f^n$, $n \ge 1$, as follows. Given $y \in M$ and $x \in f^{-n}(y)$, let $h_1, \dots, h_n$ be inverse branches of $f$ with
$$h_j(f^{n-j+1}(x)) = f^{n-j}(x) \quad\text{for every } 1 \le j \le n.$$
Since every $h_j$ is a contraction, its image is contained in a ball around $f^{n-j}(x)$ with radius smaller than $\rho$. Then $h^n = h_n \circ \cdots \circ h_1$ is well defined on the closure of the ball of radius $\rho$ around $y$. It is clear that $f^n \circ h^n = \mathrm{id}$ and $h^n(y) = x$. Moreover, each $h^n$ is a contraction:
$$d(h^n(y_1), h^n(y_2)) \le \sigma^{-n}d(y_1,y_2) \quad\text{for every } y_1, y_2 \in B(y,\rho).$$
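For the doubling map $f(x) = 2x \bmod 1$ on the circle, $\sigma = 2$ and the inverse branches are explicit: $h_k(y) = (y+k)/2$ with $k \in \{0,1\}$. The sketch below uses exact rational arithmetic. It composes the branches into $h^n$ and checks $f^n \circ h^n = \mathrm{id}$ and the contraction by $\sigma^{-n}$. The labels and points are hypothetical. Here the branches happen to be defined on the whole of $[0,1)$, and distances are measured in $[0,1)$ without wrapping around.

```python
from fractions import Fraction

def f(x):
    """Doubling map f(x) = 2x mod 1 on the circle [0, 1)."""
    return (2 * x) % 1

def inverse_branch_of_iterate(labels):
    """h^n = h_n o ... o h_1 with h_j(y) = (y + k_j) / 2 and labels = (k_1, ..., k_n) in {0, 1}^n."""
    def h(y):
        for k in labels:        # h_1 is applied first and h_n last
            y = (y + k) / 2
        return y
    return h

h3 = inverse_branch_of_iterate((1, 0, 1))
y1, y2 = Fraction(1, 3), Fraction(2, 5)
x1 = h3(y1)
print(x1, f(f(f(x1))))                         # 2/3 1/3, so f^3(h^3(y1)) = y1
print(abs(h3(y1) - h3(y2)), abs(y1 - y2) / 8)  # 1/120 1/120: contraction by sigma^-3 = 1/8
```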

Lemma 11.1.4. If $f : M \to M$ is a $C^1$ expanding map on a compact manifold, then $f$ is expansive.

Proof. By Lemma 11.1.3, there exists $\rho > 0$ such that, for any pre-image $x$ of a point $y \in M$, there exists a map $h : B(y,\rho) \to M$ of class $C^1$ such that $f \circ h = \mathrm{id}$, $h(y) = x$ and
$$d(h(y_1), h(y_2)) \le \sigma^{-1}d(y_1,y_2) \quad\text{for every } y_1, y_2 \in B(y,\rho).$$
Hence, if $d(f^n(x), f^n(y)) \le \rho$ for every $n \ge 0$, then
$$d(x,y) \le \sigma^{-n}d(f^n(x), f^n(y)) \le \sigma^{-n}\rho \quad\text{for every } n \ge 0,$$
which immediately implies that $x = y$.

The next result provides a good control of the distortion of the iterates of f
and their inverse branches, which is crucial for the proof of Theorem 11.1.2.
This is the only step of the proof where we use the hypothesis that the Jacobian
x → det Df (x) is Hölder. Note that, since f is a local diffeomorphism and M is
compact, the Jacobian is bounded away from zero and infinity. Hence, the logarithm $\log|\det Df|$ is also Hölder: there exist $C_0 > 0$ and $\nu > 0$ such that
$$\big|\log|\det Df(x)| - \log|\det Df(y)|\big| \le C_0\,d(x,y)^\nu \quad\text{for any } x, y \in M.$$

Proposition 11.1.5 (Distortion lemma). There exists $C_1 > 0$ such that, given any $n \ge 1$, any $y \in M$ and any inverse branch $h^n : B(y,\rho) \to M$ of $f^n$,
$$\log\frac{|\det Dh^n(y_1)|}{|\det Dh^n(y_2)|} \le C_1\,d(y_1,y_2)^\nu \le C_1(2\rho)^\nu$$
for every $y_1, y_2 \in B(y,\rho)$.

Proof. Write $h^n$ as a composition $h^n = h_n \circ \cdots \circ h_1$ of inverse branches of $f$. Analogously, let $h^i = h_i \circ \cdots \circ h_1$ for $1 \le i < n$ and $h^0 = \mathrm{id}$. Then,
$$\log\frac{|\det Dh^n(y_1)|}{|\det Dh^n(y_2)|} = \sum_{i=1}^n \Big(\log|\det Dh_i(h^{i-1}(y_1))| - \log|\det Dh_i(h^{i-1}(y_2))|\Big).$$
Note that $\log|\det Dh_i| = -\log|\det Df| \circ h_i$, and recall that every $h_j$ is a $\sigma^{-1}$-contraction. Hence,
$$\log\frac{|\det Dh^n(y_1)|}{|\det Dh^n(y_2)|} \le \sum_{i=1}^n C_0\,d(h^i(y_1), h^i(y_2))^\nu \le \sum_{i=1}^n C_0\,\sigma^{-i\nu}d(y_1,y_2)^\nu.$$
Therefore, to prove the proposition it suffices to take $C_1 = C_0\sum_{i=1}^\infty \sigma^{-i\nu} = C_0/(\sigma^\nu - 1)$.
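On the circle, $\det Dh^n$ is just the derivative $(h^n)'$, so the distortion lemma can be tested numerically. The sketch below uses a hypothetical expanding map $f(x) = 2x + 0.1\sin(2\pi x) \bmod 1$ and computes inverse branches by bisection on a lift. It compares the distortion with the bound $C_1 d(y_1,y_2)$. Here $\nu = 1$ because $\log f'$ is Lipschitz, $C_0$ bounds $|(\log f')'|$, and $C_1 = C_0/(\sigma - 1)$.

```python
import math
import random

AMPLITUDE = 0.1   # hypothetical map f(x) = 2x + AMPLITUDE*sin(2 pi x) mod 1

def lift(x):
    """Lift F of f: increasing on [0, 1], with F(0) = 0 and F(1) = 2."""
    return 2 * x + AMPLITUDE * math.sin(2 * math.pi * x)

def log_derivative(x):
    """log f'(x); on the circle, det Df is just f'."""
    return math.log(2 + 2 * math.pi * AMPLITUDE * math.cos(2 * math.pi * x))

def inverse_branch(y, k, iterations=80):
    """h_k(y) = F^{-1}(y + k), k in {0, 1}, computed by bisection (F is increasing)."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if lift(mid) < y + k:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def log_distortion(y1, y2, labels):
    """log|Dh^n(y1)| - log|Dh^n(y2)| = sum_i [log f'(h^i(y2)) - log f'(h^i(y1))]."""
    total = 0.0
    for k in labels:
        y1, y2 = inverse_branch(y1, k), inverse_branch(y2, k)
        total += log_derivative(y2) - log_derivative(y1)
    return total

sigma = 2 - 2 * math.pi * AMPLITUDE              # f' >= sigma > 1
C0 = 4 * math.pi ** 2 * AMPLITUDE / sigma        # |(log f')'| = |f''/f'| <= C0
C1 = C0 / (sigma - 1)                            # C0 * sum_{i >= 1} sigma**(-i)

rng = random.Random(2)
for n in (1, 5, 20, 60):
    labels = [rng.randint(0, 1) for _ in range(n)]
    print(n, log_distortion(0.2, 0.3, labels), C1 * 0.1)
```

The distortion stays bounded as $n$ grows because the differences along the backward orbit decay geometrically, exactly as in the proof.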

The following corollary makes the geometric meaning of this proposition even more transparent:
Corollary 11.1.6. There exists $C_2 > 0$ such that, for every $y \in M$, every $n \ge 1$, every inverse branch $h^n : B(y,\rho) \to M$ of $f^n$ and any measurable sets $B_1, B_2 \subset B(y,\rho)$ with positive Lebesgue measure,
$$\frac{1}{C_2}\,\frac{m(B_1)}{m(B_2)} \le \frac{m(h^n(B_1))}{m(h^n(B_2))} \le C_2\,\frac{m(B_1)}{m(B_2)}.$$
Proof. Take $C_2 = \exp(2C_1(2\rho)^\nu)$. It follows from Proposition 11.1.5 that
$$m(h^n(B_1)) = \int_{B_1}|\det Dh^n|\,dm \le \exp(C_1(2\rho)^\nu)\,|\det Dh^n(y)|\,m(B_1) \quad\text{and}$$
$$m(h^n(B_2)) = \int_{B_2}|\det Dh^n|\,dm \ge \exp(-C_1(2\rho)^\nu)\,|\det Dh^n(y)|\,m(B_2).$$
Dividing the two inequalities, we get that
$$\frac{m(h^n(B_1))}{m(h^n(B_2))} \le C_2\,\frac{m(B_1)}{m(B_2)}.$$
Exchanging the roles of $B_1$ and $B_2$, we get the other inequality.

The next result, which is also a consequence of the distortion lemma, asserts that the iterates $f_*^n m$ of the Lebesgue measure have uniformly bounded densities:
Proposition 11.1.7. There exists $C_3 > 0$ such that $(f_*^n m)(B) \le C_3\,m(B)$ for every measurable set $B \subset M$ and every $n \ge 1$.

Proof. We may suppose without loss of generality that $B$ is contained in a ball $B_0 = B(z,\rho)$ of radius $\rho$ around some point $z \in M$. Using Corollary 11.1.6, we see that
$$\frac{m(h^n(B))}{m(h^n(B_0))} \le C_2\,\frac{m(B)}{m(B_0)}$$
for every inverse branch $h^n$ of $f^n$ at the point $z$. Moreover, $(f_*^n m)(B) = m(f^{-n}(B))$ is the sum of $m(h^n(B))$ over all the inverse branches, and analogously for $B_0$. In this way, we find that
$$\frac{(f_*^n m)(B)}{(f_*^n m)(B_0)} \le C_2\,\frac{m(B)}{m(B_0)}.$$
It is clear that $(f_*^n m)(B_0) \le (f_*^n m)(M) = 1$. Moreover, the Lebesgue measure of the balls with a fixed radius $\rho$ is bounded below by some constant $\alpha_0 > 0$ that depends only on $\rho$. So, to get the conclusion of the proposition, it suffices to take $C_3 = C_2/\alpha_0$.

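We also need the following auxiliary result. Recall that, given a function $\phi$ and a measure $\nu$, we denote by $\phi\nu$ the measure defined by $(\phi\nu)(B) = \int_B\phi\,d\nu$.
Lemma 11.1.8. Let $\nu$ be a probability measure on a compact metric space $X$ and $\phi : X \to [0,+\infty)$ be an integrable function with respect to $\nu$. Let $\mu_i$, $i \ge 1$, be a sequence of probability measures on $X$ converging, in the weak* topology, to a probability measure $\mu$. If $\mu_i \le \phi\nu$ for every $i \ge 1$, then $\mu \le \phi\nu$.

Proof. Let $B$ be any measurable set. For each $\varepsilon > 0$, let $K_\varepsilon$ be a compact subset of $B$ such that $\mu(B \setminus K_\varepsilon)$ and $(\phi\nu)(B \setminus K_\varepsilon)$ are both less than $\varepsilon$ (such a compact set exists, by Proposition A.3.2). Fix $r > 0$ small enough that the measure of $A_\varepsilon \setminus K_\varepsilon$ is also less than $\varepsilon$, for both $\mu$ and $\phi\nu$, where $A_\varepsilon = \{z : d(z, K_\varepsilon) < r\}$. The set of values of $r$ for which the boundary of $A_\varepsilon$ has positive $\mu$-measure is at most countable (Exercise A.3.2). Hence, changing $r$ slightly if necessary, we may suppose that the boundary of $A_\varepsilon$ has $\mu$-measure zero. Then $\mu = \lim_i \mu_i$ implies that $\mu(A_\varepsilon) = \lim_i \mu_i(A_\varepsilon) \le (\phi\nu)(A_\varepsilon)$. Consequently,
$$\mu(B) < \mu(K_\varepsilon) + \varepsilon \le \mu(A_\varepsilon) + \varepsilon \le (\phi\nu)(A_\varepsilon) + \varepsilon < (\phi\nu)(K_\varepsilon) + 2\varepsilon \le (\phi\nu)(B) + 2\varepsilon.$$
Letting $\varepsilon \to 0$, we conclude that $\mu(B) \le (\phi\nu)(B)$.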

Applying this lemma to our present situation, we obtain:

Corollary 11.1.9. Every accumulation point $\mu$ of the sequence $n^{-1}\sum_{j=0}^{n-1} f_*^j m$ is an invariant probability measure for $f$ that is absolutely continuous with respect to the Lebesgue measure.

Proof. Take $\phi$ constant equal to $C_3$ and let $\nu = m$. Choose a subsequence $(n_i)_i$ such that $\mu_i = n_i^{-1}\sum_{j=0}^{n_i-1} f_*^j m$ converges to some probability measure $\mu$. The invariance of $\mu$ follows from the usual Krylov–Bogolyubov argument. Proposition 11.1.7 ensures that $\mu_i \le \phi\nu$. Then, by Lemma 11.1.8, we also have $\mu \le \phi\nu = C_3 m$. This implies that $\mu \ll m$ with density bounded by $C_3$.

11.1.2 Existence of ergodic measures


Next, we show that the measure μ we have just constructed is the unique
invariant probability measure absolutely continuous with respect to the
Lebesgue measure and, moreover, it is ergodic for f .
Start by fixing a finite partition $\mathcal{P}_0 = \{U_1, \dots, U_s\}$ of $M$ into regions with non-empty interior and diameter less than $\rho$. Then, for each $n \ge 1$, define $\mathcal{P}_n$ to be the partition of $M$ into the images of the $U_i$, $1 \le i \le s$, under the inverse branches of $f^n$. The diameter of each $\mathcal{P}_n$, that is, the supremum of the diameters of its elements, is less than $\rho\sigma^{-n}$.

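Lemma 11.1.10. Let $\mathcal{P}_n$, $n \ge 1$, be a sequence of partitions of a compact metric space $M$, with diameters converging to zero. Let $\nu$ be a probability measure on $M$ and $B$ be any measurable set with $\nu(B) > 0$. Then there exist $V_n \in \mathcal{P}_n$, $n \ge 1$, such that
$$\nu(V_n) > 0 \quad\text{and}\quad \frac{\nu(B \cap V_n)}{\nu(V_n)} \to 1 \quad\text{when } n \to \infty.$$

Proof. Given any $0 < \varepsilon < \nu(B)$, let $K_\varepsilon \subset B$ be a compact set with $\nu(B \setminus K_\varepsilon) < \varepsilon$. Let $K_{\varepsilon,n}$ be the union of all the elements of $\mathcal{P}_n$ that intersect $K_\varepsilon$. Since the diameters of the partitions converge to zero, $\nu(K_{\varepsilon,n} \setminus K_\varepsilon) < \varepsilon$ for every $n$ sufficiently large. Suppose, by contradiction, that
$$\nu(K_\varepsilon \cap V_n) \le \frac{\nu(B)-\varepsilon}{\nu(B)+\varepsilon}\,\nu(V_n)$$
for every $V_n \in \mathcal{P}_n$ that intersects $K_\varepsilon$. It would follow that
$$\nu(K_\varepsilon) \le \sum_{V_n}\nu(K_\varepsilon \cap V_n) \le \frac{\nu(B)-\varepsilon}{\nu(B)+\varepsilon}\sum_{V_n}\nu(V_n) = \frac{\nu(B)-\varepsilon}{\nu(B)+\varepsilon}\,\nu(K_{\varepsilon,n})$$
$$\le \frac{\nu(B)-\varepsilon}{\nu(B)+\varepsilon}\,(\nu(K_\varepsilon)+\varepsilon) \le \nu(B)-\varepsilon < \nu(K_\varepsilon).$$
This contradiction shows that there must exist some $V_n \in \mathcal{P}_n$ such that
$$\nu(V_n) \ge \nu(B \cap V_n) \ge \nu(K_\varepsilon \cap V_n) > \frac{\nu(B)-\varepsilon}{\nu(B)+\varepsilon}\,\nu(V_n)$$
and, consequently, $\nu(V_n) > 0$. Letting $\varepsilon \to 0$, we get the claim.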

In the statements that follow, we say that a measurable set $A \subset M$ is invariant under $f : M \to M$ if $f^{-1}(A) = A$ up to a set of zero Lebesgue measure. According to Exercise 11.1.1, we then also have $f(A) = A$ up to a set of zero Lebesgue measure.
Lemma 11.1.11. If A ⊂ M satisfies f (A) ⊂ A and has positive Lebesgue
measure then A has full Lebesgue measure inside some Ui ∈ P0 , that is, there
exists 1 ≤ i ≤ s such that m(Ui \ A) = 0.

Proof. By Lemma 11.1.10, we may choose $V_n \in \mathcal{P}_n$ so that $m(V_n \setminus A)/m(V_n)$ converges to zero when $n \to \infty$. Let $U_{i(n)} = f^n(V_n)$. By Proposition 11.1.5, applied to the inverse branch of $f^n$ that maps $U_{i(n)}$ to $V_n$, we get that
$$\frac{m(U_{i(n)} \setminus A)}{m(U_{i(n)})} \le \frac{m(f^n(V_n \setminus A))}{m(f^n(V_n))} \le \exp\big(C_1(2\rho)^\nu\big)\,\frac{m(V_n \setminus A)}{m(V_n)}$$
also converges to zero. Since $\mathcal{P}_0$ is finite, there must be $1 \le i \le s$ such that $i(n) = i$ for infinitely many values of $n$. Then $m(U_i \setminus A) = 0$.

Corollary 11.1.12. The transformation $f : M \to M$ admits some ergodic invariant probability measure that is absolutely continuous with respect to the Lebesgue measure.

Proof. It follows from the previous lemma that there exist at most $s = \#\mathcal{P}_0$ pairwise disjoint invariant sets with positive Lebesgue measure. Therefore, $M$ may be partitioned into a finite number of minimal invariant sets $A_1, \dots, A_r$, $r \le s$, with positive Lebesgue measure. Here, minimal means that there are no invariant sets $B_i \subset A_i$ with $0 < m(B_i) < m(A_i)$. Given any absolutely continuous invariant probability measure $\mu$ (such a measure exists by Corollary 11.1.9), there exists some $i$ such that $\mu(A_i) > 0$. The normalized restriction
$$\mu_i(B) = \frac{\mu(B \cap A_i)}{\mu(A_i)}$$
of $\mu$ to any such $A_i$ is invariant and absolutely continuous. Moreover, the assumption that $A_i$ is minimal implies that $\mu_i$ is ergodic.

11.1.3 Uniqueness and conclusion of the proof


The previous argument also shows that there exist only a finite number of
absolutely continuous ergodic probability measures. The last step is to show
that, in fact, such a probability measure is unique. For that we use the fact that
f is topologically exact:
Lemma 11.1.13. Given any non-empty open set U ⊂ M, there exists N ≥ 1
such that f N (U) = M.

Proof. Let x ∈ U and r > 0 be such that the ball of radius r around x is contained in U. Given any n ≥ 1, suppose that f^n(U) does not cover the whole manifold. Then there exists some curve γ connecting f^n(x) to a point y ∈ M \ f^n(U), and that curve may be taken with length smaller than diam M + 1. Lifting¹ γ by the local diffeomorphism f^n, we obtain a curve γ_n connecting x to some point y_n ∈ M \ U. Since f^n expands the length of curves by a factor of at least σ^n, we get r ≤ ℓ(γ_n) ≤ σ^{-n}(diam M + 1). This provides an upper bound on the possible values of n. Hence, f^n(U) = M for every n sufficiently large, as claimed.
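A minimal sketch of Lemma 11.1.13 in the simplest case, assuming the hypothetical example f(x) = 2x mod 1 on the circle ℝ/ℤ (so σ = 2): an open arc of length ℓ ≤ 1/2 is mapped onto an open arc of length 2ℓ, and an arc longer than 1/2 is mapped onto the whole circle. The function below (its name is illustrative) only tracks this length and returns the first N with f^N(U) = M; in line with the bound r ≤ σ^{-n}(diam M + 1), N grows like log₂(1/ℓ).

```python
def exactness_time(arc_length: float) -> int:
    """First N with f^N(U) = circle, for an open arc U of the given length,
    where f(x) = 2x mod 1 (a hypothetical instance of Lemma 11.1.13)."""
    if not 0.0 < arc_length <= 1.0:
        raise ValueError("arc_length must lie in (0, 1]")
    n, length = 0, arc_length
    # While length <= 1, f^n(U) is an open arc of that length (the circle
    # minus one point when length == 1); applying f doubles the length.
    while length <= 1.0:
        length *= 2.0
        n += 1
    return n

print([exactness_time(r) for r in (1.0, 0.3, 0.25, 0.01)])
```

The loop stops as soon as the doubled length exceeds 1, which is exactly when the image arc wraps around the whole circle.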

Corollary 11.1.14. If A ⊂ M has positive Lebesgue measure and satisfies f(A) ⊂ A then A has full Lebesgue measure in the whole manifold M.

Proof. Let U be the interior of a set U_i as in Lemma 11.1.11 and N ≥ 1 be such that f^N(U) = M. Then m(U \ A) = 0. Since f^N(A) ⊂ A, we have M \ A = f^N(U) \ A ⊂ f^N(U) \ f^N(A) ⊂ f^N(U \ A), and the last set has Lebesgue measure zero because f is a local diffeomorphism (Exercise 11.1.1(c)). Hence, M \ A also has Lebesgue measure zero.

The next statement completes the proof of Theorem 11.1.2:


Corollary 11.1.15. Let μ be any absolutely continuous invariant probability
measure. Then μ is ergodic and its basin B(μ) has full Lebesgue measure in
M. Consequently, μ is unique. Moreover, its support is the whole manifold M.

Proof. If A is an invariant set then, by Corollary 11.1.14, either A or its complement A^c has Lebesgue measure zero. Since μ is absolutely continuous, it follows that either μ(A) = 0 or μ(A^c) = 0. This proves that μ is ergodic. Then μ(B(μ)) = 1 and, in particular, m(B(μ)) > 0. Since B(μ) is an invariant set, it follows that it has full Lebesgue measure. Analogously, the support of μ is a compact set with positive Lebesgue measure and f(supp μ) ⊂ supp μ, so it has full Lebesgue measure; being closed, it must then coincide with M.
Finally, let μ and ν be any two absolutely continuous invariant probability
measures. It follows from what we have just said that the two measures are
ergodic and their basins intersect each other. Given any point x in B(μ) ∩ B(ν),
the sequence
$$\frac{1}{n}\sum_{j=0}^{n-1} \delta_{f^j(x)}$$
converges to both μ and ν in the weak∗ topology. Thus, μ = ν.

In general, we say that an invariant probability measure μ of a local diffeomorphism f : M → M is a physical measure if its basin has positive Lebesgue measure. It follows from Corollary 11.1.15 that in the present context there exists a unique physical measure, which is the absolutely continuous invariant measure μ, and its basin has full Lebesgue measure. This last fact may be expressed as follows:
$$\frac{1}{n}\sum_{j=0}^{n-1} \delta_{f^j(x)} \to \mu \quad \text{for Lebesgue almost every } x.$$
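To illustrate the last display, assume the hypothetical example f(x) = 2x mod 1 on the circle, whose absolutely continuous invariant measure μ is the Lebesgue measure itself. Iterating 2x mod 1 in floating point is useless (every orbit collapses to 0 after about 53 steps), so the sketch below samples a Lebesgue-typical x through its binary digits, which are independent fair coin flips, and applies f as a left shift of a 53-digit window. The function name and parameters are illustrative, and the printed average is only expected to be close to ∫ x² dx = 1/3.

```python
import random

def birkhoff_average(phi, n: int, seed: int = 0, bits: int = 53) -> float:
    """(1/n) * sum_{j<n} phi(f^j(x)) for f(x) = 2x mod 1 and a random x.

    x is drawn from Lebesgue measure by drawing its binary digits as
    independent fair coin flips; f^j(x) is approximated by the `bits`
    digits that follow position j."""
    rng = random.Random(seed)
    scale = 1 << bits
    window = rng.getrandbits(bits)  # digits 1..bits of x
    total = 0.0
    for _ in range(n):
        total += phi(window / scale)
        # f shifts the binary expansion one place to the left:
        # drop the leading digit and reveal the next random digit.
        window = ((window << 1) & (scale - 1)) | rng.getrandbits(1)
    return total / n

print(birkhoff_average(lambda x: x * x, 200_000))  # should be close to 1/3
```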

1 Note that any local diffeomorphism from a compact manifold to itself is a covering map.

In Chapter 12 we will find this absolutely continuous invariant probability measure μ through a different approach (Proposition 12.1.20) that also shows that the density h = dμ/dm is Hölder and bounded away from zero. In particular, μ is equivalent to the Lebesgue measure m, not just absolutely continuous. Moreover (Section 12.1.7), the system (f, μ) is exact, not just ergodic. In addition (Lemma 12.1.12), its Jacobian is given by J_μ f = |det Df| (h ∘ f)/h. Hence, by the Rokhlin formula (Theorem 9.7.3),
$$h_\mu(f) = \int \log J_\mu f \, d\mu = \int \log|\det Df| \, d\mu + \int \log(h \circ f) \, d\mu - \int \log h \, d\mu.$$
Since μ is invariant, the last two integrals coincide (they are finite because log h is bounded), and so
$$h_\mu(f) = \int \log|\det Df| \, d\mu.$$
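As a numerical check of h_μ(f) = ∫ log|Df| dμ in dimension one, assume the Gauss map G(x) = 1/x mod 1 on (0, 1), which Example 11.1.16 below lists among the expanding maps of the interval (there the identity is the subject of Exercise 11.1.5). Its invariant density is the classical h(x) = 1/((1 + x) log 2) and log|G′(x)| = −2 log x, so the formula gives the well-known value π²/(6 log 2) ≈ 2.373. The sketch below evaluates the integral with the midpoint rule after the substitution x = e^{−t}; the function name, the step count and the truncation point are arbitrary choices.

```python
import math

def gauss_log_derivative_integral(steps: int = 200_000, t_max: float = 60.0) -> float:
    """Approximate the integral of log|G'(x)| = -2 log x against the Gauss
    measure h(x) dx, h(x) = 1 / ((1 + x) log 2), on (0, 1).

    With x = exp(-t) the integral becomes
    (1 / log 2) * int_0^inf 2 t exp(-t) / (1 + exp(-t)) dt, truncated at t_max."""
    dt = t_max / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt  # midpoint rule
        e = math.exp(-t)
        total += 2.0 * t * e / (1.0 + e)
    return total * dt / math.log(2.0)

exact = math.pi ** 2 / (6.0 * math.log(2.0))
print(gauss_log_derivative_integral(), exact)
```

The substitution removes the logarithmic singularity at x = 0, which is why a plain midpoint rule suffices here.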

Actually, the facts stated in the previous paragraph can already be proven
with the methods available at this point. We invite the reader to do just that
(Exercises 11.1.3 through 11.1.6), in the context of expanding maps of the
interval, which are technically a bit simpler than expanding maps on a general
manifold.
Example 11.1.16. We say that a transformation f : [0, 1] → [0, 1] is an
expanding map of the interval if there exists a countable (possibly finite)
family P of pairwise disjoint open subintervals whose union has full Lebesgue
measure in [0, 1] and which satisfy:
(i) The restriction of f to each P ∈ P is a diffeomorphism onto (0, 1); denote by f_P^{-1} : (0, 1) → P its inverse.
(ii) There exist C > 0 and θ > 0 such that, for every x, y and every P ∈ P,
$$\big|\log|D(f_P^{-1})(x)| - \log|D(f_P^{-1})(y)|\big| \le C|x - y|^{\theta}.$$
(iii) There exist c > 0 and σ > 1 such that, for every n and every x, |Df^n(x)| ≥ cσ^n (whenever the derivative is defined).
This class of transformations includes the decimal expansion and the Gauss
map as special cases. Its properties are analyzed in Exercises 11.1.3
through 11.1.5.
Exercise 11.1.6 deals with a slightly more general class of transformations,
where we replace condition (i) by
(i′) There exists δ > 0 such that the restriction of f to each P ∈ P is a
diffeomorphism onto some interval f (P) of length larger than δ that
contains every element of P that it intersects.
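For the Gauss map G(x) = 1/x mod 1, the intervals P_k = (1/(k + 1), 1/k), k ≥ 1, satisfy condition (i), with inverse branches f_{P_k}^{-1}(x) = 1/(k + x), and (ii) holds with θ = 1 because the derivative of log|D(f_{P_k}^{-1})(x)| = −2 log(k + x) is bounded by 2. A density h defines an invariant measure exactly when Σ_P h(f_P^{-1}(x)) |D(f_P^{-1})(x)| = h(x). The sketch below checks this numerically for the classical density h(x) = 1/((1 + x) log 2), truncating the sum at an arbitrary number of branches; here the terms telescope, h(1/(k + x))/(k + x)² = (1/(k + x) − 1/(k + 1 + x))/log 2, so the truncation error is exactly 1/((K + 1 + x) log 2) for K branches. Function names are illustrative.

```python
import math

def gauss_density(x: float) -> float:
    """Classical Gauss density 1 / ((1 + x) log 2) on (0, 1)."""
    return 1.0 / ((1.0 + x) * math.log(2.0))

def transfer_operator(h, x: float, branches: int = 100_000) -> float:
    """Sum over the first `branches` inverse branches y = 1/(k + x) of the
    Gauss map of h(y) * |dy/dx| = h(1/(k + x)) / (k + x)**2."""
    return sum(h(1.0 / (k + x)) / (k + x) ** 2 for k in range(1, branches + 1))

for x in (0.1, 0.5, 0.9):
    print(x, transfer_operator(gauss_density, x), gauss_density(x))
```

For contrast, the constant density 1 is not invariant: at x = 0.5 the full sum equals π²/2 − 4 ≈ 0.935 rather than 1.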

11.1.4 Exercises
11.1.1. Let f : M → M be a local diffeomorphism in a compact manifold and m be the
Lebesgue measure on M. Check the following facts:

(a) If m(B) = 0 then m(f^{-1}(B)) = 0.
(b) If B is measurable then f(B) is measurable.
(c) If m(B) = 0 then m(f(B)) = 0.
(d) If A = B up to a set of zero Lebesgue measure then f(A) = f(B) and f^{-1}(A) = f^{-1}(B) up to a set of zero Lebesgue measure.
(e) If A is an invariant set then f(A) = A up to a set of zero Lebesgue measure.
11.1.2. Let f : M → M be a transformation of class C1 such that there exist σ > 1 and k ≥ 1 satisfying ‖Df^k(x)v‖ ≥ σ‖v‖ for every x ∈ M and every v ∈ T_xM. Show that there exist θ > 1 and a Riemannian norm |||·||| equivalent to ‖·‖ such that |||Df(x)v||| ≥ θ|||v||| for every x ∈ M and every v ∈ T_xM.
11.1.3. Show that if f : [0, 1] → [0, 1] is an expanding map of the interval and m is the
Lebesgue measure on [0, 1] then there exists a function ρ : (0, 1) → (0, ∞) such
that log ρ is bounded and Hölder and μ = ρm is a probability measure invariant
under f .
11.1.4. Show that the measure μ in Exercise 11.1.3 is exact and is the unique invariant
probability measure of f absolutely continuous with respect to the Lebesgue
measure m.
11.1.5. Show that the measure μ in Exercise 11.1.3 satisfies the Rokhlin formula: assuming that log|f′| ∈ L1(μ), we have that h_μ(f) = ∫ log|f′| dμ.
11.1.6. Prove the following generalization of Exercises 11.1.3 and 11.1.4: if f satisfies the conditions (i′), (ii) and (iii) in Example 11.1.16 then there exists
a finite (non-empty) family of absolutely continuous invariant probability
measures ergodic for f and such that every absolutely continuous invariant
probability measure is a convex combination of those ergodic probability
measures.

11.2 Dynamics of expanding maps


In this section we extend the notion of an expanding map to compact metric
spaces and we mention a few interesting examples. In this general setup, an
expanding map need not be transitive, let alone topologically exact (compare
Lemma 11.1.13). However, Theorems 11.2.14 and 11.2.15 assert that the
dynamics may always be reduced to the topologically exact case. This is
relevant because for the main results in this section we need the transformation
to be topologically exact or, equivalently (Exercise 11.2.2), topologically
mixing.
A continuous transformation f : M → M in a compact metric space M is an
expanding map if there exist constants σ > 1 and ρ > 0 such that for every
p ∈ M the image of the ball B(p, ρ) contains a neighborhood of the closure of
B(f (p), ρ) and
$$d(f(x), f(y)) \ge \sigma\, d(x, y) \quad \text{for every } x, y \in B(p, \rho). \tag{11.2.1}$$
Every expanding map on a manifold, in the sense of Section 11.1, is also
expanding in the present sense:

Example 11.2.1. Let M be a compact Riemannian manifold and f : M → M be a map of class C1 such that ‖Df(x)v‖ ≥ σ‖v‖ for every x ∈ M and every v ∈ T_xM, where σ is a constant larger than 1. Denote K = sup ‖Df‖ (observe that K > 1). Fix ρ > 0 small enough that the restriction of f to every ball B(p, 2Kρ) is a diffeomorphism onto its image. Consider any y ∈ B(f(p), σρ) and let γ : [0, 1] → B(f(p), σρ) be a minimizing geodesic (that is, a geodesic that realizes the distance between its endpoints) with γ(0) = f(p) and γ(1) = y. By the choice of ρ, there exists a differentiable curve β : [0, δ] → B(p, ρ) such that β(0) = p and f(β(t)) = γ(t) for every t. Observe that (using ℓ(·) to denote the length of a curve)
$$d(p, \beta(t)) \le \ell(\beta|_{[0,t]}) \le \sigma^{-1}\,\ell(\gamma|_{[0,t]}) = \sigma^{-1} t\, d(f(p), y) < t\rho$$

for every t. This shows that we may take δ = 1. Then, β(1) ∈ B(p, ρ) and
f (β(1)) = γ (1) = y. In this way, we have shown that f (B(p, ρ)) contains
B(f (p), σρ), which is a neighborhood of the closure of B(f (p), ρ). Now
consider any x, y ∈ B(p, ρ). Note that d(f (x), f (y)) < 2Kρ. Let γ : [0, 1] →
B(f (x), 2Kρ) be a minimizing geodesic connecting f (x) to f (y). Arguing as in
the previous paragraph, we find a differentiable curve β : [0, 1] → B(x, 2Kρ)
connecting x to y and such that f (β(t)) = γ (t) for every t. Then,

$$d(f(x), f(y)) = \ell(\gamma) \ge \sigma\,\ell(\beta) \ge \sigma\, d(x, y).$$

This completes the proof that f is an expanding map.

The following fact is useful for constructing further examples:

Lemma 11.2.2. Assume that f : M → M is an expanding map and Λ ⊂ M is a compact set such that f^{-1}(Λ) = Λ. Then the restriction f : Λ → Λ is also an expanding map.

Proof. It is clear that the condition (11.2.1) remains valid for the restriction. We are left to check that f(Λ ∩ B(p, ρ)) contains a neighborhood of Λ ∩ B(f(p), ρ) inside Λ. By assumption, f(B(p, ρ)) contains some neighborhood V of the closure of B(f(p), ρ). Then Λ ∩ V is a neighborhood of Λ ∩ B(f(p), ρ) inside Λ. Moreover, given any y ∈ Λ ∩ V there exists x ∈ B(p, ρ) such that f(x) = y. Since f^{-1}(Λ) = Λ, this point is necessarily in Λ. This proves that Λ ∩ V is contained in the image f(Λ ∩ B(p, ρ)). Hence, the restriction of f to the set Λ is an expanding map, as stated.

It is not possible to replace the hypothesis of Lemma 11.2.2 by f(Λ) = Λ. See Exercise 11.2.4.

Example 11.2.3. Let J ⊂ [0, 1] be a finite union of (two or more) pairwise disjoint compact intervals. Consider a map f : J → [0, 1] such that the restriction of f to each connected component of J is a diffeomorphism onto [0, 1]. See Figure 11.1. Assume that there exists σ > 1 such that
$$|f'(x)| \ge \sigma \quad \text{for every } x \in J. \tag{11.2.2}$$
Denote Λ = ⋂_{n=0}^{∞} f^{-n}(J). In other words, Λ is the set of all points x whose iterates f^n(x) are defined for every n ≥ 0. It follows from the definition that Λ is compact (one can show that Λ is a Cantor set) and f^{-1}(Λ) = Λ. The restriction f : Λ → Λ is an expanding map. Indeed, fix any ρ > 0 smaller than the distance between any two connected components of J. Then every ball of radius ρ inside Λ is contained in a unique connected component of J and so, by (11.2.2), it is dilated by a factor greater than or equal to σ.

Figure 11.1. Expanding map on a Cantor set (components J1, J2, J3 of J inside [0, 1]).
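To make Example 11.2.3 concrete, assume (hypothetically) J = [0, 1/3] ∪ [2/3, 1] with f(x) = 3x on [0, 1/3] and f(x) = 3x − 2 on [2/3, 1], so that σ = 3, any ρ < 1/3 works, and Λ is the middle-thirds Cantor set. The points whose iterates f^0, ..., f^n all lie in J form 2^{n+1} intervals of length 3^{−(n+1)}, obtained by applying the two inverse branches y ↦ y/3 and y ↦ (y + 2)/3 repeatedly to [0, 1]. The sketch below computes them with exact fractions; names are illustrative.

```python
from fractions import Fraction

# Hypothetical instance of Example 11.2.3: J = [0, 1/3] ∪ [2/3, 1],
# f(x) = 3x on [0, 1/3] and f(x) = 3x - 2 on [2/3, 1], so sigma = 3.
INVERSE_BRANCHES = (
    lambda y: y / 3,        # inverse of f restricted to [0, 1/3]
    lambda y: (y + 2) / 3,  # inverse of f restricted to [2/3, 1]
)

def level(n):
    """Intervals of {x : f^j(x) is defined and lies in J for 0 <= j <= n}."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n + 1):
        # Pull every interval back by both branches (both are increasing).
        intervals = [(g(a), g(b)) for g in INVERSE_BRANCHES for (a, b) in intervals]
    return sorted(intervals)

print(level(1))
```

Intersecting over all n recovers Λ; each step shrinks the intervals by the factor σ^{-1} = 1/3.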

Example 11.2.4. Let σ_A : Σ_A → Σ_A be the one-sided shift of finite type associated with a transition matrix A (these notions were introduced in Section 10.2.2). Consider in Σ_A the distance defined by
$$d\big((x_n)_n, (y_n)_n\big) = 2^{-N}, \quad N = \inf\{n \in \mathbb{N} : x_n \neq y_n\}. \tag{11.2.3}$$
Then σ_A is an expanding map. Indeed, fix ρ ∈ (1/2, 1) and take the constant σ in (11.2.1) equal to 2. The open ball of radius ρ around any point (p_n)_n ∈ Σ_A is just the cylinder [0; p_0]_A that contains the point. The definition (11.2.3) yields
$$d\big((x_{n+1})_n, (y_{n+1})_n\big) = 2\, d\big((x_n)_n, (y_n)_n\big)$$
for any (x_n)_n and (y_n)_n in the cylinder [0; p_0]_A. Moreover, σ_A([0; p_0]_A) is the union of all cylinders [0; q]_A such that A_{p_0,q} = 1. In particular, it contains the cylinder [0; p_1]_A. Since the cylinders are both open and closed in Σ_A, this shows that the image of the ball of radius ρ around (p_n)_n contains a neighborhood of the closure of the ball of radius ρ around (p_{n+1})_n. This completes the proof that every shift of finite type is an expanding map.
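The distance (11.2.3) and the doubling identity above can be checked on finite prefixes. The sketch below is a simplification, not part of the text: sequences are represented by finite lists of symbols (admissibility with respect to A is not checked), and two prefixes that agree everywhere are given distance 0, which only approximates the true distance. Function names are illustrative.

```python
def shift_distance(x, y):
    """Distance (11.2.3), 2**(-N) with N the first index where x and y differ,
    computed on finite prefixes of equal length."""
    for n, (a, b) in enumerate(zip(x, y)):
        if a != b:
            return 2.0 ** (-n)
    return 0.0  # simplification: no difference seen within the available prefix

def shift(x):
    """The shift map on prefixes: drop the first symbol."""
    return x[1:]

x = [0, 1, 1, 0, 1, 0]
y = [0, 1, 0, 0, 1, 1]
# x and y lie in the same cylinder [0; 0], so shifting doubles their distance.
print(shift_distance(x, y), shift_distance(shift(x), shift(y)))  # 0.25 0.5
```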

Example 11.2.5. Let f : S¹ → S¹ be a local diffeomorphism of class C² with degree larger than 1. Assume that all the periodic points of f are hyperbolic, that is, |(f^n)′(x)| ≠ 1 for every x ∈ Fix(f^n) and every n ≥ 1. Let Λ be the complement of the union of the basins of attraction of all the attracting periodic points of f. Then the restriction f : Λ → Λ is an expanding map: this is a consequence of a deep theorem of Ricardo Mañé [Mañ85].

For expanding maps f : M → M in a metric space M, the number of pre-images of a point y ∈ M may vary with y (unless M is connected; see Exercises 11.2.1 and A.4.6). For example, for a shift of finite type σ_A : Σ_A → Σ_A (Example 11.2.4) the number of pre-images of a point y = (y_n)_n ∈ Σ_A is equal to the number of symbols i such that A_{i,y_0} = 1; in general, this number depends on y_0.
On the other hand, it is easy to see that the number of pre-images is always finite, and even bounded: it suffices to consider a finite cover of M by balls of radius ρ and to notice that every point has at most one pre-image in each of those balls. By a slight abuse of language, we call the degree of an expanding map f : M → M the maximum number of pre-images of any point, that is,
$$\operatorname{degree}(f) = \max\{\#f^{-1}(y) : y \in M\}. \tag{11.2.4}$$
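For a shift of finite type, the pre-image count described above is a column sum of the transition matrix. The sketch below computes #σ_A^{-1}(y) as a function of y_0, and the degree (11.2.4), for a hypothetical 3-symbol matrix, assuming every symbol is the first symbol of some sequence in Σ_A (true for this matrix, since every row has a non-zero entry). Function names are illustrative.

```python
def preimage_count(A, y0):
    """Number of pre-images under sigma_A of any sequence starting with the
    symbol y0: the symbols i with A[i][y0] == 1 (the column sum of A at y0)."""
    return sum(1 for row in A if row[y0] == 1)

def degree(A):
    """Degree (11.2.4) of sigma_A, assuming every symbol starts some sequence of Sigma_A."""
    return max(preimage_count(A, y0) for y0 in range(len(A)))

# Hypothetical transition matrix on the symbols 0, 1, 2.
A = [[1, 1, 0],
     [1, 0, 1],
     [1, 0, 1]]
print([preimage_count(A, y0) for y0 in range(3)], degree(A))  # [3, 1, 2] 3
```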

11.2.1 Contracting inverse branches


Let f : M → M be an expanding map. By definition, the restriction of
f to each ball B(p, ρ) of radius ρ is injective and its image contains the
closure of B(f (p), ρ). Thus, the restriction to B(p, ρ) ∩ f −1 (B(f (p), ρ)) is a
homeomorphism onto B(f (p), ρ). We denote by

hp : B(f (p), ρ) → B(p, ρ)

its inverse and call it the inverse branch of f at p. It is clear that h_p(f(p)) = p and f ∘ h_p = id. The condition (11.2.1) implies that h_p is a σ^{-1}-contraction:
$$d(h_p(z), h_p(w)) \le \sigma^{-1} d(z, w) \quad \text{for every } z, w \in B(f(p), \rho). \tag{11.2.5}$$
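For the hypothetical example f(x) = 2x mod 1 on ℝ/ℤ, with σ = 2 and, say, ρ = 1/8 (a value for which this map satisfies the definition of an expanding map), the inverse branch h_p picks, among the two pre-images z/2 and (z + 1)/2 of z, the one closest to p. The sketch below (function names are illustrative) checks f ∘ h_p = id, h_p(f(p)) = p and the σ^{-1}-contraction (11.2.5) at a pair of sample points.

```python
def circle_dist(a, b):
    """Distance on the circle R/Z."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)

def doubling(x):
    """f(x) = 2x mod 1 (hypothetical example, sigma = 2)."""
    return (2.0 * x) % 1.0

def inverse_branch(p, z):
    """h_p(z): the pre-image of z closest to p; meaningful for z near f(p)."""
    z = z % 1.0
    return min((z / 2.0, (z + 1.0) / 2.0), key=lambda c: circle_dist(c, p))

p = 0.3                      # f(p) = 0.6
z, w = 0.58, 0.66            # two points of B(f(p), rho) with rho = 1/8
hz, hw = inverse_branch(p, z), inverse_branch(p, w)
print(doubling(hz), doubling(hw))                   # f o h_p = id
print(circle_dist(hz, hw), 0.5 * circle_dist(z, w))  # equal up to rounding
```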

Lemma 11.2.6. If f : M → M is expanding then, for every y ∈ M,
$$f^{-1}(B(y, \rho)) = \bigcup_{x \in f^{-1}(y)} h_x(B(y, \rho)).$$

Proof. The relation f ∘ h_x = id implies that h_x(B(y, ρ)) is contained in the pre-image of B(y, ρ) for every x ∈ f^{-1}(y). To prove the other inclusion, let z be any point such that f(z) ∈ B(y, ρ). By the definition of an expanding map, f(B(z, ρ)) contains B(f(z), ρ) and, hence, contains y. Let h_z : B(f(z), ρ) → M be the inverse branch of f at z and let x = h_z(y). Both z and h_x(f(z)) are in B(x, ρ). Since f is injective on every ball of radius ρ and f(h_x(f(z))) = f(z), it follows that z = h_x(f(z)). This completes the proof.
Figure 11.2. Inverse branches of f^n (the branches h_p and h_{f(p)} along the points p, f(p), f²(p)).

More generally, for any n ≥ 1, we call the inverse branch of f^n at p the composition
$$h_p^n = h_p \circ \cdots \circ h_{f^{n-1}(p)} : B(f^n(p), \rho) \to B(p, \rho)$$
of the inverse branches of f at the iterates of p. See Figure 11.2. Observe that h_p^n(f^n(p)) = p and f^n ∘ h_p^n = id. Moreover, f^j ∘ h_p^n = h_{f^j(p)}^{n-j} for each 0 ≤ j < n. Hence,
$$d(f^j \circ h_p^n(z), f^j \circ h_p^n(w)) \le \sigma^{j-n} d(z, w) \tag{11.2.6}$$
for every z, w ∈ B(f^n(p), ρ) and every 0 ≤ j ≤ n.
Lemma 11.2.7. If f : M → M is an expanding map then f^n(B(p, n + 1, ε)) = B(f^n(p), ε) for every p ∈ M, n ≥ 0 and ε ∈ (0, ρ].

Proof. The inclusion f^n(B(p, n + 1, ε)) ⊂ B(f^n(p), ε) is an immediate consequence of the definition of a dynamical ball. To prove the converse, consider the inverse branch h_p^n : B(f^n(p), ρ) → B(p, ρ). Given any y ∈ B(f^n(p), ε), let x = h_p^n(y). Then f^n(x) = y and, by (11.2.6),
$$d(f^j(x), f^j(p)) \le \sigma^{j-n} d(f^n(x), f^n(p)) \le d(y, f^n(p)) < \varepsilon$$
for every 0 ≤ j ≤ n. This shows that x ∈ B(p, n + 1, ε).
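The inverse branches h_p^n can be written explicitly for the hypothetical example f(x) = 2x mod 1 on ℝ/ℤ, with σ = 2 and ρ = 1/8 (a value for which this map satisfies the definition of an expanding map): a point at signed distance t from f^n(p), with |t| < ρ, is sent to the point at signed distance t/2^n from p. The sketch below uses exact rational arithmetic (Python's fractions.Fraction) so that n doublings introduce no rounding, and checks the conclusion of Lemma 11.2.7 for one point; all function names are illustrative.

```python
from fractions import Fraction

def doubling(x):
    """f(x) = 2x mod 1 on R/Z, computed exactly on fractions."""
    return (2 * x) % 1

def orbit(x, n):
    """The points x, f(x), ..., f^n(x)."""
    points = [x]
    for _ in range(n):
        points.append(doubling(points[-1]))
    return points

def circle_dist(a, b):
    d = (a - b) % 1
    return min(d, 1 - d)

def inverse_branch_n(p, n, y):
    """h_p^n(y) for y in B(f^n(p), rho), rho = 1/8 (hypothetical choice)."""
    # Signed offset of y from f^n(p), taken in [-1/2, 1/2).
    t = (y - orbit(p, n)[-1] + Fraction(1, 2)) % 1 - Fraction(1, 2)
    return (p + t / 2 ** n) % 1  # f^n multiplies small offsets by 2^n

p, n, eps = Fraction(1, 7), 6, Fraction(1, 8)
y = (orbit(p, n)[-1] + Fraction(1, 10)) % 1  # a point of B(f^n(p), eps)
x = inverse_branch_n(p, n, y)
print(orbit(x, n)[-1] == y)  # f^n(x) = y
print(max(circle_dist(a, b) for a, b in zip(orbit(x, n), orbit(p, n))) < eps)  # x in B(p, n + 1, eps)
```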

Corollary 11.2.8. Every expanding map is expansive.

Proof. Assume that d(f^n(z), f^n(w)) < ρ for every n ≥ 0. Since f is injective on every ball of radius ρ, this implies (by induction on n) that z = h_w^n(f^n(z)) for every n ≥ 0. Then, the property (11.2.6) gives that
$$d(z, w) \le \sigma^{-n} d(f^n(z), f^n(w)) < \rho\,\sigma^{-n}.$$
Letting n → ∞, we get that z = w. So, ρ is a constant of expansivity for f.

11.2.2 Shadowing and periodic points


Given δ > 0, we call a δ-pseudo-orbit of a transformation f : M → M any sequence (x_n)_{n≥0} such that
$$d(f(x_n), x_{n+1}) < \delta \quad \text{for every } n \ge 0.$$
We say that the δ-pseudo-orbit is periodic if there is κ ≥ 1 such that x_n = x_{n+κ} for every n ≥ 0. It is clear that every orbit is a δ-pseudo-orbit, for every δ > 0. For expanding maps we have a kind of converse: every pseudo-orbit is uniformly close to (we say that it is shadowed by) some orbit of the transformation:

Proposition 11.2.9 (Shadowing lemma). Assume that f : M → M is an expanding map. Then, given any ε > 0 there exists δ > 0 such that for every δ-pseudo-orbit (x_n)_n there exists x ∈ M such that d(f^n(x), x_n) < ε for every n ≥ 0.
If ε is small enough, so that 2ε is a constant of expansivity for f, then the point x is unique. If, in addition, the pseudo-orbit is periodic then x is a periodic point.

Proof. It is no restriction to suppose that ε is less than ρ. Fix δ > 0 so that σ^{-1}ε + δ < ε. For each n ≥ 0, let h_n : B(f(x_n), ρ) → B(x_n, ρ) be the inverse branch of f at x_n, and denote by B̄ the closed ball. The property (11.2.5) ensures that
$$h_n\big(\bar B(f(x_n), \varepsilon)\big) \subset \bar B(x_n, \sigma^{-1}\varepsilon) \quad \text{for every } n \ge 0. \tag{11.2.7}$$
Since d(x_n, f(x_{n-1})) < δ and σ^{-1}ε + δ < ε, it follows that
$$h_n\big(\bar B(f(x_n), \varepsilon)\big) \subset B(f(x_{n-1}), \varepsilon) \quad \text{for every } n \ge 1. \tag{11.2.8}$$
Then we may consider the composition h^{n+1} = h_0 ∘ ··· ∘ h_n, and (11.2.8) implies that the sequence of compact sets K_{n+1} = h^{n+1}(B̄(f(x_n), ε)) is nested. Take x in the intersection. For every n ≥ 0, we have that x ∈ K_{n+1} and so f^n(x) belongs to
$$f^n \circ h^{n+1}\big(\bar B(f(x_n), \varepsilon)\big) = h_n\big(\bar B(f(x_n), \varepsilon)\big).$$
By (11.2.7), this implies that d(f^n(x), x_n) ≤ σ^{-1}ε < ε for every n ≥ 0.
The other claims in the proposition are simple consequences, as we now explain. If x′ is another point as in the conclusion of the proposition then
$$d(f^n(x), f^n(x')) \le d(f^n(x), x_n) + d(f^n(x'), x_n) < 2\varepsilon \quad \text{for every } n \ge 0.$$
Since 2ε is an expansivity constant, it follows that x = x′. Moreover, if the pseudo-orbit is periodic, with period κ ≥ 1, then
$$d(f^n(f^\kappa(x)), x_n) = d(f^{n+\kappa}(x), x_{n+\kappa}) < \varepsilon \quad \text{for every } n \ge 0.$$
By uniqueness, it follows that f^κ(x) = x.
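The proof of the shadowing lemma is constructive, and for a concrete map the composition h^{n+1} = h_0 ∘ ··· ∘ h_n can be evaluated directly. The sketch below assumes the hypothetical example f(x) = 2x mod 1 on ℝ/ℤ (σ = 2), takes ε = 0.1 and δ = 0.04 (so that σ^{-1}ε + δ < ε), generates a random δ-pseudo-orbit x_0, ..., x_n, and computes x = h^{n+1}(f(x_n)), which is a point of K_{n+1}; hence d(f^j(x), x_j) ≤ σ^{-1}ε for 0 ≤ j ≤ n. The finite horizon n = 20 keeps the floating-point error of the forward iteration negligible; names, seed and parameters are arbitrary choices.

```python
import random

def circle_dist(a, b):
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)

def doubling(x):
    return (2.0 * x) % 1.0

def inverse_branch(p, z):
    """h_p(z): the pre-image of z under f closest to p."""
    return min((z / 2.0, (z + 1.0) / 2.0), key=lambda c: circle_dist(c, p))

def shadow(xs):
    """x = h_0 o h_1 o ... o h_n (f(x_n)), where h_k is the inverse branch at x_k."""
    z = doubling(xs[-1])
    for xk in reversed(xs):
        z = inverse_branch(xk, z)
    return z

rng = random.Random(1)
eps, delta, n = 0.1, 0.04, 20          # sigma^{-1} * eps + delta = 0.09 < eps
xs = [rng.random()]
for _ in range(n):                     # a delta-pseudo-orbit x_0, ..., x_n
    xs.append((doubling(xs[-1]) + rng.uniform(-0.9 * delta, 0.9 * delta)) % 1.0)

x = shadow(xs)
errors, y = [], x
for xk in xs:                          # compare f^j(x) with x_j
    errors.append(circle_dist(y, xk))
    y = doubling(y)
print(max(errors))                     # at most sigma^{-1} * eps = 0.05
```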

It is worth pointing out that δ depends linearly on ε: the proof of Proposition 11.2.9 shows that we may take δ = cε, where c > 0 depends only on σ (any c < 1 − σ^{-1} will do, since then σ^{-1}ε + δ < ε).
We call a pre-orbit of a point x ∈ M any sequence (x_{-n})_{n≥0} such that x_0 = x and f(x_{-n}) = x_{-n+1} for every n ≥ 1. If x is a periodic point, with period l ≥ 1, then it admits a distinguished periodic pre-orbit (x̄_{-n})_n, such that x̄_{-kl} = x for every integer k ≥ 1.

Lemma 11.2.10. If d(x, y) < ρ then, given any pre-orbit (x_{-n})_n of x, there exists a pre-orbit (y_{-n})_n of y asymptotic to (x_{-n})_n, in the sense that d(x_{-n}, y_{-n}) converges to 0 when n → ∞.

Proof. For each n ≥ 1, let h_n : B(x, ρ) → M be the inverse branch of f^n with h_n(x) = x_{-n}. Define y_{-n} = h_n(y). It is clear that d(x_{-n}, y_{-n}) ≤ σ^{-n} d(x, y). This implies the claim.

Theorem 11.2.11. Let f : M → M be an expanding map in a compact metric space and Λ ⊂ M be the closure of the set of all periodic points of f. Then f(Λ) = Λ and the restriction f : Λ → Λ is an expanding map.

Proof. On the one hand, it is clear that f(Λ) is contained in Λ: if a point x is accumulated by periodic points p_n then f(x) is accumulated by the images f(p_n), which are also periodic points. On the other hand, since f(Λ) is a compact set that contains all the periodic points, it must contain Λ. This shows that f(Λ) = Λ.
Next, we prove that the restriction f : Λ → Λ is an expanding map. It is clear that the property (11.2.1) remains valid for the restriction. So, we only have to show that there exists r ≤ ρ (we are going to take r = σ^{-1}ρ) such that, for every x ∈ Λ, the image f(Λ ∩ B(x, r)) contains a neighborhood of Λ ∩ B(f(x), r) inside Λ. The main ingredient is the following lemma:

Lemma 11.2.12. Let p be a periodic point and h_p : B(f(p), ρ) → B(p, ρ) be the inverse branch of f at p. If y ∈ B(f(p), ρ) is a periodic point then h_p(y) ∈ Λ.

Proof. Write x = h_p(y) and q = f(p). Consider any ε > 0 such that 2ε is a constant of expansivity for f. Take δ > 0 given by the shadowing lemma (Proposition 11.2.9). By Lemma 11.2.10, there exists a pre-orbit (x_{-n})_n of x asymptotic to the periodic pre-orbit (p̄_{-n})_n of p. In particular,
$$d(x_{-k+1}, q) = d(x_{-k+1}, \bar p_{-k+1}) < \delta \tag{11.2.9}$$
for every large multiple k of the period of p. Analogously, there exists a pre-orbit (q_{-n})_n of q asymptotic to the periodic pre-orbit (ȳ_{-n})_n of y. Fix any multiple l of the period of y such that
$$d(q_{-l}, f(x)) = d(q_{-l}, y) < \delta. \tag{11.2.10}$$
Now consider the periodic sequence (z_n)_n, with period k + l, given by
$$z_0 = x,\ z_1 = q_{-l},\ \dots,\ z_l = q_{-1},\ z_{l+1} = x_{-k+1},\ \dots,\ z_{l+k-1} = x_{-1},\ z_{k+l} = x.$$
See Figure 11.3. We claim that (z_n)_n is a δ-pseudo-orbit. Indeed, if n is a multiple of k + l then, by (11.2.10),
$$d(f(z_n), z_{n+1}) = d(f(x), q_{-l}) = d(y, q_{-l}) < \delta.$$

Figure 11.3. Constructing periodic orbits (points p, q, x, y, q_{-1}, q_{-l}, x_{-k+1}).

If n − l is a multiple of k + l then, by (11.2.9),
$$d(f(z_n), z_{n+1}) = d(f(q_{-1}), x_{-k+1}) = d(q, x_{-k+1}) < \delta.$$
In all the other cases, f(z_n) = z_{n+1}. This proves our claim. Now we may use Proposition 11.2.9 to conclude that there exists a periodic point z such that d(f^n(z), z_n) < ε for every n ≥ 0. In particular, d(z, x) < ε. Since ε > 0 is arbitrary, this shows that x is in the closure of the set of periodic points, as stated.

Corollary 11.2.13. Let z ∈ Λ and h_z : B(f(z), ρ) → B(z, ρ) be the inverse branch of f at z. If w ∈ Λ ∩ B(f(z), ρ) then h_z(w) ∈ Λ.

Proof. Since z ∈ Λ, we may find some periodic point p close enough to z that w ∈ B(f(p), ρ) and h_p(w) = h_z(w). Since w ∈ Λ, we may find periodic points y_n ∈ B(f(p), ρ) converging to w. By Lemma 11.2.12, we have that h_p(y_n) ∈ Λ for every n. Passing to the limit (h_p is continuous and Λ is closed), we conclude that h_p(w) ∈ Λ.

We are ready to conclude the proof of Theorem 11.2.11. Take r = σ^{-1}ρ. The property (11.2.6) implies that h_z(B(f(z), ρ)) is contained in B(z, r), for every z ∈ Λ. Then, Corollary 11.2.13 implies that f(Λ ∩ B(z, r)) contains Λ ∩ B(f(z), ρ), which is a neighborhood of the closure of Λ ∩ B(f(z), r) in Λ. Thus the argument is complete.

Theorem 11.2.14. Let f : M → M be an expanding map in a compact metric space and Λ ⊂ M be the closure of the set of periodic points of f. Then
$$M = \bigcup_{k=0}^{\infty} f^{-k}(\Lambda).$$

Proof. Given any x ∈ M, let ω(x) denote its ω-limit set, that is, the set of accumulation points of the iterates f^n(x) when n → ∞. First, we show that ω(x) ⊂ Λ. Then, we deduce that f^k(x) ∈ Λ for some k ≥ 0.
Let ε > 0 be such that 2ε is a constant of expansivity for f. Take δ > 0 given by the shadowing lemma (Proposition 11.2.9) and let α ∈ (0, δ) be such that d(f(z), f(w)) < δ whenever d(z, w) < α. Let y be any point in ω(x). The definition of the ω-limit set implies that there exist r ≥ 0 and s ≥ 1 such that
$$d(f^r(x), y) < \alpha \quad\text{and}\quad d(f^{r+s}(x), y) < \alpha.$$
Consider the periodic sequence (z_n)_n, with period s, given by
$$z_0 = y,\ z_1 = f^{r+1}(x),\ \dots,\ z_{s-1} = f^{r+s-1}(x),\ z_s = y.$$
Observe that d(f(z_0), z_1) = d(f(y), f^{r+1}(x)) < δ (because d(y, f^r(x)) < α), d(f(z_{s-1}), z_s) = d(f^{r+s}(x), y) < α < δ and f(z_n) = z_{n+1} in all the other cases. In particular, (z_n)_n is a δ-pseudo-orbit. Then, by Proposition 11.2.9, there exists some periodic point z such that d(y, z) < ε. Letting ε → 0, we conclude that y is accumulated by periodic points, that is, y ∈ Λ.
Let ε > 0 and δ > 0 be as before. It is no restriction to suppose that δ < ε. Take β ∈ (0, δ/2) such that d(f(z), f(w)) < δ/2 whenever d(z, w) < β. Since ω(x) is contained in Λ, there exist k ≥ 1 and points w_n ∈ Λ such that d(f^{n+k}(x), w_n) < β for every n ≥ 0. Observe that
$$d(f(w_n), w_{n+1}) \le d(f(w_n), f^{n+k+1}(x)) + d(f^{n+k+1}(x), w_{n+1}) < \delta/2 + \beta < \delta$$
for every n ≥ 0. Therefore, (w_n)_n is a δ-pseudo-orbit in Λ. Since the restriction of f to Λ is an expanding map (Theorem 11.2.11), it follows from Proposition 11.2.9 applied to the restriction that there exists w ∈ Λ such that d(f^n(w), w_n) < ε for every n ≥ 0. Then,
$$d(f^n(f^k(x)), f^n(w)) \le d(f^{n+k}(x), w_n) + d(w_n, f^n(w)) < \beta + \varepsilon < 2\varepsilon$$
for every n ≥ 0. Then, by expansivity, f^k(x) = w.

11.2.3 Dynamical decomposition


Theorem 11.2.14 shows that for expanding maps the interesting dynamics are localized in the closure Λ of the set of periodic points. In particular, supp μ ⊂ Λ for every invariant probability measure μ of f. Moreover (Theorem 11.2.11), the restriction of f to Λ is still an expanding map. Thus, up to replacing M by Λ, it is no restriction to suppose that the set of periodic points is dense in M.

Theorem 11.2.15 (Dynamical decomposition). Let f : M → M be an expanding map whose set of periodic points is dense in M. Then there exists a partition of M into non-empty compact sets M_{i,j}, with 1 ≤ i ≤ k and 1 ≤ j ≤ m(i), such that:
(i) M_i = ⋃_{j=1}^{m(i)} M_{i,j} is invariant under f, for every i;
(ii) f(M_{i,j}) = M_{i,j+1} if j < m(i) and f(M_{i,m(i)}) = M_{i,1}, for every i, j;
(iii) each restriction f : M_i → M_i is a transitive expanding map;
(iv) each f^{m(i)} : M_{i,j} → M_{i,j} is a topologically exact expanding map.

Moreover, the number k, the numbers m(i) and the sets M_{i,j} are unique up to renumbering.

Proof. Consider the relation ∼ defined in the set of periodic points of f as follows. Given two periodic points p and q, let (p̄_{-n})_n and (q̄_{-n})_n, respectively, be their periodic pre-orbits. By definition, p ∼ q if and only if there exist pre-orbits (p_{-n})_n of p and (q_{-n})_n of q such that
$$d(p_{-n}, \bar q_{-n}) \to 0 \quad\text{and}\quad d(\bar p_{-n}, q_{-n}) \to 0. \tag{11.2.11}$$
We claim that ∼ is an equivalence relation. It is clear from the definition that the relation ∼ is reflexive and symmetric. To prove that it is also transitive, suppose that p ∼ q and q ∼ r. Then there exist pre-orbits (q_{-n})_n of q and (r_{-n})_n of r asymptotic to the periodic pre-orbits (p̄_{-n})_n of p and (q̄_{-n})_n of q, respectively. Let k ≥ 1 be a multiple of the periods of p and q such that d(r_{-k}, q̄_{-k}) < ρ. Note that q̄_{-k} = q, since k is a multiple of the period of q. Then, by Lemma 11.2.10, there exists a pre-orbit (r′_{-n})_n of the point r′ = r_{-k} such that d(r′_{-n}, q_{-n}) → 0. Then d(r′_{-n}, p̄_{-n}) → 0. Consider the pre-orbit (r″_{-n})_n of r defined by
$$r''_{-n} = \begin{cases} r_{-n} & \text{if } n \le k,\\ r'_{-n+k} & \text{if } n > k.\end{cases}$$
Since k is a multiple of the period of p, we have d(r″_{-n}, p̄_{-n}) = d(r′_{-n+k}, p̄_{-n+k}) for every n > k. Therefore, (r″_{-n})_n is asymptotic to (p̄_{-n})_n. Analogously, one can find a pre-orbit (p″_{-n})_n of p asymptotic to the periodic pre-orbit (r̄_{-n})_n of r. Therefore, p ∼ r, and this proves that the relation ∼ is indeed transitive.
Next, we claim that p ∼ q if and only if f(p) ∼ f(q). Start by supposing that p ∼ q, and let (p_{-n})_n and (q_{-n})_n be pre-orbits of p and q as in (11.2.11). The periodic pre-orbits of p′ = f(p) and q′ = f(q) are given by, respectively,
$$\bar p'_{-n} = \begin{cases} f(p) & \text{if } n = 0\\ \bar p_{-n+1} & \text{if } n \ge 1\end{cases} \quad\text{and}\quad \bar q'_{-n} = \begin{cases} f(q) & \text{if } n = 0\\ \bar q_{-n+1} & \text{if } n \ge 1.\end{cases}$$
Consider the pre-orbits of p′ and q′ given by, respectively,
$$p'_{-n} = \begin{cases} f(p) & \text{if } n = 0\\ p_{-n+1} & \text{if } n \ge 1\end{cases} \quad\text{and}\quad q'_{-n} = \begin{cases} f(q) & \text{if } n = 0\\ q_{-n+1} & \text{if } n \ge 1.\end{cases}$$
It is clear that (p′_{-n})_n is asymptotic to (q̄′_{-n})_n and (q′_{-n})_n is asymptotic to (p̄′_{-n})_n. Hence, f(p) ∼ f(q). Now suppose that f(p) ∼ f(q). The previous argument shows that f^k(p) ∼ f^k(q) for every k ≥ 1. When k is a common multiple of the periods of p and q this means that p ∼ q. This completes the proof of our claim. Note that the statement means that the image and the pre-image of any equivalence class are both equivalence classes.
Observe also that if d(p, q) < ρ then p ∼ q. Indeed, by Lemma 11.2.10
we may find a pre-orbit of q asymptotic to the periodic pre-orbit of p and,
analogously, a pre-orbit of p asymptotic to the periodic pre-orbit of q. It follows
that the equivalence classes are open sets and, since M is compact, they are
finite in number. Moreover, if A and B are two different equivalence classes,
then their closures Ā and B̄ are disjoint: the distance between them is at least
ρ. Since p ∼ q if and only if f (p) ∼ f (q), it follows that the transformation f
permutes the closures of the equivalence classes.
Thus, we may enumerate the closures of the equivalence classes as M_{i,j}, with 1 ≤ i ≤ k and 1 ≤ j ≤ m(i), in such a way that
$$f(M_{i,j}) = M_{i,j+1} \ \text{for } j < m(i) \quad\text{and}\quad f(M_{i,m(i)}) = M_{i,1}. \tag{11.2.12}$$
The properties (i) and (ii) in the statement of the theorem are immediate consequences.
Let us prove property (iii). Since the M_i are pairwise disjoint, it follows from (11.2.12) that f^{-1}(M_i) = M_i for every i. Hence, Lemma 11.2.2 implies that f : M_i → M_i is an expanding map. By Lemma 4.3.4, to show that this map is transitive it suffices to show that given any open subsets U and V of M_i there exists n ≥ 1 such that f^n(U) intersects V. It is no restriction to assume that U ⊂ M_{i,j} for some j. Moreover, up to replacing V by some pre-image f^{-k}(V), we may suppose that V is contained in the same M_{i,j}. Choose periodic points p ∈ U and q ∈ V. By the definition of equivalence classes, there exists some pre-orbit (q_{-n})_n of q asymptotic to the periodic pre-orbit (p̄_{-n})_n of p. In particular, we may find n arbitrarily large such that q_{-n} ∈ U. Then q ∈ f^n(U) ∩ V. Therefore, f : M_i → M_i is transitive.
Next, we prove property (iv). Since the M_{i,j} are pairwise disjoint, it follows from (11.2.12) that f^{-m(i)}(M_{i,j}) = M_{i,j} for every i and j. Hence (Lemma 11.2.2), g = f^{m(i)} : M_{i,j} → M_{i,j} is an expanding map. We also want to prove that g is topologically exact. Let U be a non-empty open subset of M_{i,j} and p be a periodic point of f in U. By (11.2.12), the period κ of p is a multiple of m(i), say κ = s m(i). Let q be any periodic point of f in M_{i,j}. By the definition of the equivalence relation ∼, there exists some pre-orbit (q_{-n})_n of q asymptotic to the periodic pre-orbit (p̄_{-n})_n of p. In particular, d(q_{-κn}, p) → 0 when n → ∞. Then the image of B(q, ρ) under the inverse branch of f^{κn} at q_{-κn} is contained in U for every n sufficiently large. This implies that g^{sn}(U) = f^{κn}(U) contains B(q, ρ) for every n sufficiently large. Since M_{i,j} is compact, we may find a finite cover by balls of radius ρ around periodic points. Applying the previous argument to each of those periodic points, we deduce that g^{sn}(U) contains M_{i,j} for every n sufficiently large. Therefore, g is topologically exact.
We are left to prove that k, the m(i) and the M_{i,j} are unique. Let N_{r,s}, with 1 ≤ r ≤ l and 1 ≤ s ≤ n(r), be another partition as in the statement. Initially, let us consider the partitions 𝓜 = {M_i : 1 ≤ i ≤ k} and 𝓝 = {N_r : 1 ≤ r ≤ l}, where N_r = ⋃_{s=1}^{n(r)} N_{r,s}. Given any i and r, the sets M_i and N_r are open, closed, invariant and transitive. We claim that either M_i ∩ N_r = ∅ or M_i = N_r. Indeed, since the intersection is open, if it is non-empty then it intersects any orbit that is dense in M_i (or in N_r). Since the intersection is also closed and invariant, it follows that it contains M_i (and N_r). In other words, M_i = N_r. This proves our claim. It follows that the partitions 𝓜 and 𝓝 coincide, that is, k = l and M_i = N_i up to renumbering. Now, fix i. The transformation f permutes the M_{i,j} and the N_{i,s} cyclically, with periods m(i) and n(i). Since f^{m(i)n(i)} is transitive on each M_{i,j} and each N_{i,s}, the same argument as in the first part of this paragraph shows that, given any j and s, either M_{i,j} ∩ N_{i,s} = ∅ or M_{i,j} = N_{i,s}. It follows that m(i) = n(i) and the families M_{i,j} and N_{i,s} coincide, up to cyclic renumbering.

The following consequence of the theorem contains Lemma 11.1.13:

Corollary 11.2.16. If $M$ is connected and $f : M \to M$ is an expanding map then the set of periodic points is dense in $M$ and $f$ is topologically exact.

Proof. Let $\Lambda$ denote the closure of the set of periodic points of $f$. We claim that $\Lambda$ is an open subset of $f^{-1}(\Lambda)$. Indeed, consider $\delta \in (0, \rho)$ such that $d(x, y) < \delta$ implies $d(f(x), f(y)) < \rho$. Assume that $x \in f^{-1}(\Lambda)$ is such that $d(x, \Lambda) < \delta$. Then there exists $z \in \Lambda$ such that $d(x, z) < \delta < \rho$ and so $d(f(x), f(z)) < \rho$. Applying Corollary 11.2.13 with $w = f(x)$, we get that $x = h_z(w) \in \Lambda$. Therefore, $\Lambda$ contains its $\delta$-neighborhood inside $f^{-1}(\Lambda)$. This implies our claim.
Then, the set $S = f^{-1}(\Lambda) \setminus \Lambda$ is closed in $f^{-1}(\Lambda)$ and, consequently, it is closed in $M$, so $f^{-n}(S)$ is closed in $M$ for every $n \ge 0$. By Theorem 11.2.14, the space $M$ is a countable pairwise disjoint union of the closed sets $\Lambda$ and $f^{-n}(S)$, $n \ge 0$. By the Baire theorem, at least one of these closed sets has non-empty interior. Since $f$ is an open map, it follows that $\Lambda$ has non-empty interior. Now consider the restriction $f : \Lambda \to \Lambda$. By Theorem 11.2.11, this is an expanding map. Let $\{\Lambda_{i,j} : 1 \le i \le k,\ 1 \le j \le m(i)\}$ be the partition of the domain $\Lambda$ given by Theorem 11.2.15. Then some $\Lambda_{i,j}$ contains an open subset $V$ of $M$. Since $f^{m(i)}$ is topologically exact on $\Lambda_{i,j}$, we have $f^{nm(i)}(V) = \Lambda_{i,j}$ for some $n \ge 1$. Using once more the fact that $f$ is an open map, it follows that the compact set $\Lambda_{i,j}$ is an open subset of $M$. By connectedness, it follows that $M = \Lambda_{i,j}$. This implies that $\Lambda = M$ and $f : M \to M$ is topologically exact.

11.2.4 Exercises
11.2.1. Show that if $f : M \to M$ is a local homeomorphism in a compact connected metric space then the number of pre-images $\#f^{-1}(y)$ is the same for every $y \in M$.
11.2.2. Show that if an expanding map is topologically mixing then it is topologically exact.
11.2.3. Let $f : M \to M$ be a topologically exact transformation in a compact metric space. Show that for every $r > 0$ there exists $N \ge 1$ such that $f^N(B(x, r)) = M$ for every $x \in M$.
11.2.4. Consider the expanding map $f : S^1 \to S^1$ given by $f(x) = 2x \bmod \mathbb{Z}$. Give an example of a compact set $\Lambda \subset S^1$ such that $f(\Lambda) = \Lambda$ but the restriction $f : \Lambda \to \Lambda$ is not an expanding map.
11.2.5. Let $f : M \to M$ be an expanding map and $\Lambda$ be the closure of the set of periodic points of $f$. Show that $h(f) = h(f \mid \Lambda)$.
11.2.6. Let $f : M \to M$ be an expanding map such that the set of periodic points is dense in $M$ and let $M_i$, $M_{i,j}$ be the compact subsets given by Theorem 11.2.15. Show that $h(f) = \max_i h(f \mid M_i)$ and
$$h(f \mid M_i) = \frac{1}{m(i)}\, h(f^{m(i)} \mid M_{i,j}) \quad \text{for any } i, j.$$
11.2.7. Let $\sigma_A : \Sigma_A \to \Sigma_A$ be a shift of finite type. Interpret the decomposition given by Theorem 11.2.15 in terms of the matrix $A$.

11.3 Entropy and periodic points


In this section we analyze the distribution of periodic points of an expanding
map f : M → M from a quantitative point of view. We show (Section 11.3.1)
that the rate of growth of the number of periodic points is equal to
the topological entropy; compare this statement with the discussion in
Section 10.2.1. Another interesting conclusion (Section 11.3.2) is that every
invariant probability measure may be approximated, in the weak∗ topology, by
invariant probability measures supported on periodic orbits. These results are
based on the following property:

Proposition 11.3.1. Let $f : M \to M$ be a topologically exact expanding map. Then, given any $\varepsilon > 0$ there exists $\kappa \ge 1$ such that, given any $x_1, \dots, x_s \in M$, any $n_1, \dots, n_s \ge 1$ and any $k_1, \dots, k_s \ge \kappa$, there exists a point $p \in M$ such that, denoting $m_j = \sum_{i=1}^{j} (n_i + k_i)$ for $j = 1, \dots, s$ and $m_0 = 0$,

(i) $d(f^{m_{j-1}+i}(p), f^i(x_j)) < \varepsilon$ for $0 \le i < n_j$ and $1 \le j \le s$, and
(ii) $f^{m_s}(p) = p$.

Proof. Given $\varepsilon > 0$, take $\delta > 0$ as in the shadowing lemma (Proposition 11.2.9). Without loss of generality, we may suppose that $\delta < \varepsilon$ and $2\varepsilon$ is a constant of expansivity for $f$ (recall Corollary 11.2.8). Since $f$ is topologically exact, given any $z \in M$ there exists $\kappa \ge 1$ such that $f^k(B(z, \delta)) = M$ for every $k \ge \kappa$. Moreover (see Exercise 11.2.3), since $M$ is compact, we may choose $\kappa$ depending only on $\delta$. Let $x_j$, $n_j$, $k_j \ge \kappa$, $j = 1, \dots, s$ be as in the statement. In particular, for each $j = 1, \dots, s-1$ there exists $y_j \in B(f^{n_j}(x_j), \delta)$ such that $f^{k_j}(y_j) = x_{j+1}$. Analogously, there exists $y_s \in B(f^{n_s}(x_s), \delta)$ such that $f^{k_s}(y_s) = x_1$. Consider the
periodic $\delta$-pseudo-orbit $(z_n)_{n \ge 0}$ defined by
$$z_n = \begin{cases} f^{\,n-m_{j-1}}(x_j) & \text{for } 0 \le n - m_{j-1} < n_j \text{ and } j = 1, \dots, s,\\ f^{\,n-m_{j-1}-n_j}(y_j) & \text{for } 0 \le n - m_{j-1} - n_j < k_j \text{ and } j = 1, \dots, s,\\ z_{n-m_s} & \text{for } n \ge m_s. \end{cases}$$
By the second part of the shadowing lemma, there exists some periodic point $p \in M$, with period $m_s$, whose trajectory $\varepsilon$-shadows this periodic pseudo-orbit $(z_n)_n$. In particular, the conditions (i) and (ii) in the statement hold.

The property in the conclusion of Proposition 11.3.1 was introduced by Rufus Bowen [Bow71] and is called specification by periodic points. When condition (i) holds, but the point $p$ is not necessarily periodic, we say that $f$ has the property of specification.
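For the doubling map $f(x) = 2x \bmod 1$ on $S^1 = \mathbb{R}/\mathbb{Z}$ (an example we pick for illustration only), specification by periodic points can be realized explicitly through binary expansions, without invoking the shadowing lemma. The minimal sketch below assumes $\varepsilon = 2^{-\kappa}$: it concatenates the first $n_j + k_j$ binary digits of each $x_j$ into a single word of length $m_s$, and takes $p$ to be the point whose binary expansion repeats that word. Then $f^{m_s}(p) = p$, and for $0 \le i < n_j$ the points $f^{m_{j-1}+i}(p)$ and $f^i(x_j)$ share at least $k_j + 1$ leading binary digits, so they are at distance at most $2^{-(k_j+1)} < \varepsilon$. The helper names (`specify`, `binary_digits`, and so on) and the sample data are hypothetical.

```python
from fractions import Fraction

def doubling(x: Fraction) -> Fraction:
    """The doubling map f(x) = 2x mod 1, in exact rational arithmetic."""
    return (2 * x) % 1

def iterate(x: Fraction, n: int) -> Fraction:
    for _ in range(n):
        x = doubling(x)
    return x

def circle_dist(a: Fraction, b: Fraction) -> Fraction:
    """Distance on the circle R/Z."""
    d = abs(a - b) % 1
    return min(d, 1 - d)

def binary_digits(x: Fraction, length: int) -> list[int]:
    """First `length` binary digits of x in [0, 1)."""
    digits = []
    for _ in range(length):
        x = 2 * x
        bit = 1 if x >= 1 else 0
        digits.append(bit)
        x -= bit
    return digits

def specify(xs, ns, ks):
    """Hypothetical helper: a point p with f^{m_s}(p) = p shadowing the given orbit segments."""
    word = []
    for x, n, k in zip(xs, ns, ks):
        word += binary_digits(x, n + k)
    m = len(word)
    value = int("".join(map(str, word)), 2)
    return Fraction(value, 2 ** m - 1) % 1, m   # binary expansion 0.(word)(word)...

kappa = 6
eps = Fraction(1, 2 ** kappa)
xs = [Fraction(1, 3), Fraction(2, 7), Fraction(5, 11)]
ns = [5, 9, 4]
ks = [kappa, kappa, kappa]
p, m = specify(xs, ns, ks)

assert iterate(p, m) == p                        # condition (ii)
start = 0                                        # plays the role of m_{j-1}
for x, n, k in zip(xs, ns, ks):
    for i in range(n):                           # condition (i)
        assert circle_dist(iterate(p, start + i), iterate(x, i)) < eps
    start += n + k
print("period m_s =", m)
```

Exact rational arithmetic is used so that the periodicity check $f^{m_s}(p) = p$ is an equality rather than a floating-point approximation.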

11.3.1 Rate of growth of periodic points


Let $f : M \to M$ be an expanding map. Then $f$ is expansive (by Lemma 11.1.4) and so it follows from Proposition 10.2.2 that the rate of growth of the number of periodic points is bounded above by the topological entropy:
$$\limsup_n \frac{1}{n} \log \#\operatorname{Fix}(f^n) \le h(f). \tag{11.3.1}$$
In this section we prove that, in fact, the identity holds in (11.3.1). We start with the topologically exact case, where one may even replace the limit superior by a limit:

Proposition 11.3.2. For any topologically exact expanding map $f : M \to M$,
$$\lim_n \frac{1}{n} \log \#\operatorname{Fix}(f^n) = h(f).$$

Proof. Given $\varepsilon > 0$, fix $\kappa \ge 1$ satisfying the conclusion of Proposition 11.3.1 with $\varepsilon/2$ instead of $\varepsilon$. For each $n \ge 1$, let $E$ be an $(n, \varepsilon)$-separated set with maximal cardinality. According to Proposition 11.3.1 (applied with $s = 1$, $x_1 = x$, $n_1 = n$ and $k_1 = \kappa$), for each $x \in E$ there exists $p(x) \in B(x, n, \varepsilon/2)$ with $f^{n+\kappa}(p(x)) = p(x)$. We claim that the map $x \mapsto p(x)$ is injective. Indeed, consider any $y \in E \setminus \{x\}$. Since the set $E$ was chosen to be $(n, \varepsilon)$-separated, $B(x, n, \varepsilon/2) \cap B(y, n, \varepsilon/2) = \emptyset$. This implies that $p(x) \ne p(y)$, which proves our claim. In particular, it follows that
$$\#\operatorname{Fix}(f^{n+\kappa}) \ge \#E = s_n(f, \varepsilon, M) \quad \text{for every } n \ge 1$$
(recall the definition (10.1.9) in Section 10.2.1). Then,
$$\liminf_n \frac{1}{n} \log \#\operatorname{Fix}(f^{n+\kappa}) \ge \liminf_n \frac{1}{n} \log s_n(f, \varepsilon, M).$$
Making $\varepsilon \to 0$ and using Corollary 10.1.8, we find that
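$$\liminf_n \frac{1}{n} \log \#\operatorname{Fix}(f^{n}) \ge \lim_{\varepsilon \to 0} \liminf_n \frac{1}{n} \log s_n(f, \varepsilon, M) = h(f), \tag{11.3.2}$$
where we used that, since $(n+\kappa)/n \to 1$, the lower limit of $\frac{1}{n} \log \#\operatorname{Fix}(f^{n+\kappa})$ coincides with that of $\frac{1}{n} \log \#\operatorname{Fix}(f^{n})$, which does not depend on $\varepsilon$. Together with (11.3.1), this implies the claim in the proposition.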
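A hand-checkable instance of Proposition 11.3.2, with examples of our own choosing: for $f(x) = dx \bmod 1$ on the circle, $d \ge 2$ (a topologically exact expanding map with $h(f) = \log d$), the equation $f^n(x) = x$ means $(d^n - 1)x \in \mathbb{Z}$, so $\operatorname{Fix}(f^n) = \{k/(d^n - 1) : 0 \le k < d^n - 1\}$ and $\frac{1}{n} \log \#\operatorname{Fix}(f^n) = \frac{1}{n} \log(d^n - 1) \to \log d$. The short sketch below verifies the list of fixed points in exact arithmetic for small $n$ and prints the growth rates; the function names are hypothetical.

```python
import math
from fractions import Fraction

def periodic_points(d: int, n: int) -> list[Fraction]:
    """Fix(f^n) for f(x) = d x mod 1: the points k / (d^n - 1), 0 <= k < d^n - 1."""
    q = d ** n - 1
    return [Fraction(k, q) for k in range(q)]

def iterate(d: int, x: Fraction, n: int) -> Fraction:
    for _ in range(n):
        x = (d * x) % 1
    return x

def growth_rate(d: int, n: int) -> float:
    """(1/n) log #Fix(f^n) = (1/n) log(d^n - 1)."""
    return math.log(d ** n - 1) / n

# Exact sanity check for small n: every listed point is fixed by f^n.
for d in (2, 3):
    for n in range(1, 6):
        assert all(iterate(d, x, n) == x for x in periodic_points(d, n))

for n in (2, 5, 10, 40):
    print(n, growth_rate(2, n), growth_rate(3, n))
print("log 2 =", math.log(2), "log 3 =", math.log(3))
```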

Proposition 11.3.2 is not true, in general, if $f$ is not topologically exact. For example, given an arbitrary expanding map $g : M \to M$, consider $f : M \times \{0, 1\} \to M \times \{0, 1\}$ defined by $f(x, i) = (g(x), 1 - i)$. Then $f$ is an expanding map but all its periodic points have even period. In particular, in this case $\operatorname{Fix}(f^n) = \emptyset$ for every odd $n$, so that (with the usual convention $\log 0 = -\infty$)
$$\liminf_n \frac{1}{n} \log \#\operatorname{Fix}(f^n) = -\infty.$$
However, the next proposition shows that this type of example is the worst that can happen. The proof also makes it clear when and how the limit may fail to exist.
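The counterexample can be made concrete with $g(x) = 2x \bmod 1$ (our choice). Since $f^n(x, i) = (g^n(x), i + n \bmod 2)$, a point is fixed by $f^n$ only when $n$ is even and $g^n(x) = x$, so $\#\operatorname{Fix}(f^n)$ is $0$ for odd $n$ and $2(2^n - 1)$ for even $n$. Along even $n$ the growth rate still tends to $\log 2 = h(f)$, in line with the next proposition. A brute-force count in exact arithmetic, over the candidates $x = k/(2^n - 1)$, which are exactly the fixed points of $g^n$ (names hypothetical):

```python
from fractions import Fraction

def g(x: Fraction) -> Fraction:
    """Doubling map on the circle."""
    return (2 * x) % 1

def f(point):
    """Example map f(x, i) = (g(x), 1 - i) on M x {0, 1}."""
    x, i = point
    return (g(x), 1 - i)

def count_fixed(n: int) -> int:
    """#Fix(f^n); Fix(f^n) lies inside Fix(g^n) x {0, 1}, and Fix(g^n) = {k / (2^n - 1)}."""
    q = 2 ** n - 1
    count = 0
    for k in range(q):
        for i in (0, 1):
            start = (Fraction(k, q), i)
            point = start
            for _ in range(n):
                point = f(point)
            count += (point == start)
    return count

for n in range(1, 9):
    print(n, count_fixed(n))   # 0 for odd n, 2 * (2**n - 1) for even n
```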

Proposition 11.3.3. For any expanding map $f : M \to M$,
$$\limsup_n \frac{1}{n} \log \#\operatorname{Fix}(f^n) = h(f).$$
Proof. By Theorem 11.2.11, the restriction of $f$ to the closure $\Lambda$ of the set of periodic points is an expanding map. According to Exercise 11.2.5 this restriction has the same entropy as $f$. Obviously, the two transformations have the same periodic points. Therefore, up to replacing $f$ by this restriction, we may suppose that the set of periodic points is dense in $M$. Then, by the theorem of dynamical decomposition (Theorem 11.2.15), one may write $M$ as a finite union of compact sets $M_{i,j}$, with $1 \le i \le k$ and $1 \le j \le m(i)$, such that each $f^{m(i)} : M_{i,j} \to M_{i,j}$ is a topologically exact expanding map. According to Exercise 11.2.6, there exists $1 \le i \le k$ such that
$$h(f) = \frac{1}{m(i)}\, h(f^{m(i)} \mid M_{i,1}). \tag{11.3.3}$$
It is clear that
$$\limsup_n \frac{1}{n} \log \#\operatorname{Fix}(f^n) \ge \limsup_n \frac{1}{n m(i)} \log \#\operatorname{Fix}(f^{n m(i)}) \ge \frac{1}{m(i)} \limsup_n \frac{1}{n} \log \#\operatorname{Fix}\big((f^{m(i)} \mid M_{i,1})^n\big). \tag{11.3.4}$$
Moreover, Proposition 11.3.2 applied to $f^{m(i)} : M_{i,1} \to M_{i,1}$ yields
$$\lim_n \frac{1}{n} \log \#\operatorname{Fix}\big((f^{m(i)} \mid M_{i,1})^n\big) = h(f^{m(i)} \mid M_{i,1}). \tag{11.3.5}$$
Combining (11.3.3)–(11.3.5), we find that
$$\limsup_n \frac{1}{n} \log \#\operatorname{Fix}(f^n) \ge h(f). \tag{11.3.6}$$
Together with (11.3.1), this gives the equality, as we wanted to prove.

11.3.2 Approximation by atomic measures


Given a periodic point $p$, with period $n \ge 1$, consider the probability measure $\mu_p$ defined by
$$\mu_p = \frac{1}{n}\big(\delta_p + \delta_{f(p)} + \cdots + \delta_{f^{n-1}(p)}\big).$$
Clearly, $\mu_p$ is invariant and ergodic for $f$. We are going to show that if $f$ is a topologically exact expanding map then the set of measures of this form is dense in the space $\mathcal{M}_1(f)$ of all invariant probability measures:
Theorem 11.3.4. Let f : M → M be a topologically exact expanding map.
Then every probability measure μ invariant under f can be approximated, in
the weak∗ topology, by invariant probability measures supported on periodic
orbits.

Proof. Let $\varepsilon > 0$ and $\Phi = \{\phi_1, \dots, \phi_N\}$ be a finite family of continuous functions in $M$. We want to show that the neighborhood $V(\mu, \Phi, \varepsilon)$ defined in (2.1.1) contains some measure $\mu_p$ supported on a periodic orbit. By the theorem of Birkhoff, for $\mu$-almost every point $x \in M$,
$$\tilde\phi_i(x) = \lim_n \frac{1}{n} \sum_{t=0}^{n-1} \phi_i(f^t(x)) \quad \text{exists for every } i. \tag{11.3.7}$$

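Fix $C > \sup|\phi_i| \ge \sup|\tilde\phi_i|$ and take $\delta > 0$ such that
$$d(x, y) < \delta \Rightarrow |\phi_i(x) - \phi_i(y)| < \frac{\varepsilon}{5} \quad \text{for every } i. \tag{11.3.8}$$
Fix $\kappa = \kappa(\delta) \ge 1$ given by the property of specification (Proposition 11.3.1). Choose points $x_j \in M$, $1 \le j \le s$ satisfying (11.3.7) and positive numbers $\alpha_j$, $1 \le j \le s$ such that $\sum_j \alpha_j = 1$ and
$$\Big| \int \tilde\phi_i \, d\mu - \sum_{j=1}^{s} \alpha_j \tilde\phi_i(x_j) \Big| < \frac{\varepsilon}{5} \quad \text{for every } i \tag{11.3.9}$$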
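(use Exercise A.2.6). Take $k_j \equiv \kappa$ and choose integer numbers $n_j$ much bigger than $\kappa$, in such a way that
$$\Big| \frac{n_j}{m_s} - \alpha_j \Big| < \frac{\varepsilon}{5Cs} \tag{11.3.10}$$
(recall that $m_s = \sum_j (n_j + k_j) = s\kappa + \sum_j n_j$) and, using (11.3.7),
$$\Big| \sum_{t=0}^{n_j-1} \phi_i(f^t(x_j)) - n_j \tilde\phi_i(x_j) \Big| < \frac{\varepsilon}{5}\, n_j \quad \text{for } 1 \le i \le N. \tag{11.3.11}$$
Combining (11.3.9) and (11.3.10) with the fact that $\int \tilde\phi_i \, d\mu = \int \phi_i \, d\mu$, we get
$$\Big| \int \phi_i \, d\mu - \sum_{j=1}^{s} \frac{n_j}{m_s} \tilde\phi_i(x_j) \Big| < \frac{\varepsilon}{5} + \frac{\varepsilon}{5Cs}\, s \sup|\tilde\phi_i| < \frac{2\varepsilon}{5}. \tag{11.3.12}$$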

By Proposition 11.3.1, there exists some periodic point $p \in M$, with period $m_s$, such that $d(f^{m_{j-1}+t}(p), f^t(x_j)) < \delta$ for $0 \le t < n_j$ and $1 \le j \le s$. Then, the property (11.3.8) implies that
$$\Big| \sum_{t=0}^{n_j-1} \phi_i(f^{m_{j-1}+t}(p)) - \sum_{t=0}^{n_j-1} \phi_i(f^t(x_j)) \Big| < \frac{\varepsilon}{5}\, n_j \quad \text{for } 1 \le j \le s.$$
Combining this relation with (11.3.11), we obtain
$$\Big| \sum_{t=0}^{n_j-1} \phi_i(f^{m_{j-1}+t}(p)) - n_j \tilde\phi_i(x_j) \Big| < \frac{2\varepsilon}{5}\, n_j \quad \text{for } 1 \le j \le s. \tag{11.3.13}$$

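Since $\sum_j \alpha_j = 1$, the condition (11.3.10) implies that
$$s\kappa = m_s - \sum_{j=1}^{s} n_j < \frac{\varepsilon}{5C}\, m_s.$$
Then (11.3.13) implies that
$$\Big| \sum_{t=0}^{m_s-1} \phi_i(f^t(p)) - \sum_{j=1}^{s} n_j \tilde\phi_i(x_j) \Big| < \frac{2\varepsilon}{5} \sum_{j=1}^{s} n_j + s\kappa \sup|\phi_i| < \frac{3\varepsilon}{5}\, m_s. \tag{11.3.14}$$
Let $\mu_p$ be the invariant probability measure supported on the orbit of $p$. The first term in (11.3.14) coincides with $m_s \int \phi_i \, d\mu_p$. Therefore, dividing (11.3.14) by $m_s$ and adding the result to (11.3.12), we conclude that
$$\Big| \int \phi_i \, d\mu_p - \int \phi_i \, d\mu \Big| < \frac{2\varepsilon}{5} + \frac{3\varepsilon}{5} = \varepsilon \quad \text{for every } 1 \le i \le N.$$
This means that $\mu_p \in V(\mu, \Phi, \varepsilon)$, as we wanted to prove.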
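For the doubling map and Lebesgue measure $m$ (an example of our own choosing), periodic measures close to $m$ can even be written down directly. The sketch below uses a shortcut that is different from, and more special than, the construction in the proof above: if the binary period word of $p$ is a de Bruijn sequence of order $k$ (every binary word of length $k$ occurs exactly once as a cyclic window), then the $2^k$ points of the orbit of $p$ lie in distinct dyadic intervals of length $2^{-k}$, and hence $|\int \phi\, d\mu_p - \int \phi\, dm| \le \operatorname{Lip}(\phi)\, 2^{-k}$. It uses the standard recursive (Lyndon word) de Bruijn generator and the test function $\phi(x) = x(1-x)$, which is continuous on the circle, has Lipschitz constant 1 and integral $1/6$. Function names are hypothetical.

```python
from fractions import Fraction

def de_bruijn_binary(k: int) -> list[int]:
    """Binary de Bruijn sequence of order k (standard Lyndon-word recursion)."""
    a = [0] * (k + 1)
    seq = []

    def db(t: int, p: int) -> None:
        if t > k:
            if k % p == 0:
                seq.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, 2):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return seq

def orbit_average(word: list[int], phi) -> Fraction:
    """Integral of phi with respect to mu_p, where p has binary expansion 0.(word)(word)..."""
    m = len(word)
    q = 2 ** m - 1
    total = Fraction(0)
    for j in range(m):
        rotated = word[j:] + word[:j]                          # binary period of f^j(p)
        x = Fraction(int("".join(map(str, rotated)), 2), q)    # f^j(p) as a rational in [0, 1)
        total += phi(x)
    return total / m

phi = lambda x: x * (1 - x)    # continuous on the circle, Lipschitz constant 1, integral 1/6
for k in range(1, 8):
    error = abs(orbit_average(de_bruijn_binary(k), phi) - Fraction(1, 6))
    print(k, float(error), error <= Fraction(1, 2 ** k))
```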

11.3.3 Exercises
11.3.1. Let $f : M \to M$ be a continuous transformation in a compact metric space $M$. Check that if some iterate $f^l$, $l \ge 1$ has the property of specification, or specification by periodic points, then so does $f$.
11.3.2. Let $f : M \to M$ be a continuous transformation in a metric space with the property of specification. Show that $f$ is topologically mixing.
11.3.3. Let $f : M \to M$ be a topologically mixing expanding map and $\varphi : M \to \mathbb{R}$ be a continuous function. Assume that there exist probability measures $\mu_1$, $\mu_2$ invariant under $f$ and such that $\int \varphi \, d\mu_1 \ne \int \varphi \, d\mu_2$. Show that there exists $x \in M$ such that the time average of $\varphi$ on the orbit of $x$ does not converge. [Observation: One can show (see [BS00]) that the set $M_\varphi$ of points where the time average of $\varphi$ does not converge has full entropy and full Hausdorff dimension.]
11.3.4. Prove the following generalization of Proposition 11.3.2: if $f : M \to M$ is a topologically exact expanding map then
$$P(f, \phi) = \lim_k \frac{1}{k} \log \sum_{p \in \operatorname{Fix}(f^k)} e^{\phi_k(p)} \quad \text{for every Hölder function } \phi : M \to \mathbb{R}.$$
11.3.5. Let $f : M \to M$ be an expanding map of class $C^1$ on a compact manifold $M$. Show that $f$ admits:
(a) A neighborhood $U_0$ in the $C^0$ topology (that is, the topology of uniform convergence) such that $f$ is a topological factor of every $g \in U_0$. In particular, $h(g) \ge h(f)$ for every $g \in U_0$.
(b) A neighborhood $U_1$ in the $C^1$ topology such that every $g \in U_1$ is topologically conjugate to $f$. In particular, $g \mapsto h(g)$ is constant on $U_1$.
12
Thermodynamic formalism

In this chapter we develop the ergodic theory of expanding maps on compact metric spaces. This theory evolved from the kind of ideas in statistical mechanics that we discussed in Section 10.3.4 and, for that reason, is often called thermodynamic formalism. We point out, however, that this last expression is much broader, encompassing not only the original setting of mathematical physics but also applications to other mathematical systems, such as the so-called uniformly hyperbolic diffeomorphisms and flows (in this latter regard, see the excellent monograph of Rufus Bowen [Bow75a]).
The main result in this chapter is the following theorem of David Ruelle,
which we prove in Section 12.1 (the notion of Gibbs state is also introduced in
Section 12.1):

Theorem 12.1 (Ruelle). Let $f : M \to M$ be a topologically exact expanding map on a compact metric space and $\varphi : M \to \mathbb{R}$ be a Hölder function. Then there exists a unique equilibrium state $\mu$ for $\varphi$. Moreover, the measure $\mu$ is exact, it is supported on the whole of $M$ and is a Gibbs state.

Recall that an expanding map is topologically exact if (and only if) it is topologically mixing (Exercise 11.2.2). Moreover, a topologically exact map is necessarily surjective.
In the particular case when $M$ is a Riemannian manifold and $f$ is differentiable, the equilibrium state of the potential $\varphi = -\log|\det Df|$ coincides with the absolutely continuous invariant measure given by Theorem 11.1.2. In particular, it is the unique physical measure of $f$. These facts are proved in Section 12.1.8.
The theorem of Livšic that we present in Section 12.2 complements the theorem of Ruelle in a very elegant way. It asserts that two potentials $\varphi$ and $\psi$ have the same equilibrium state if and only if the difference between them is cohomologous to a constant. In other words, this happens if and only if $\varphi - \psi = c + u \circ f - u$ for some $c \in \mathbb{R}$ and some continuous function $u$. Moreover, and remarkably, it suffices to check this condition on the periodic orbits of $f$.

In Section 12.3 we show that the system $(f, \mu)$ exhibits exponential decay of correlations in the space of Hölder functions, for every equilibrium state $\mu$ of any Hölder potential.
We close this chapter (Section 12.4) with an application of these ideas to a class of geometric and dynamical objects called conformal repellers. We prove the Bowen–Manning formula according to which the Hausdorff dimension of the repeller is given by the unique zero of the function $t \mapsto P(f, t\varphi_u)$.

12.1 Theorem of Ruelle


Let $f : M \to M$ be a topologically exact expanding map and $\varphi$ be a Hölder potential. In what follows, $\rho > 0$ and $\sigma > 1$ are the same constants as in the definition (11.2.1). Recall that we denote by $\varphi_n$ the orbital sums of $\varphi$:
$$\varphi_n(x) = \sum_{j=0}^{n-1} \varphi(f^j(x)) \quad \text{for } x \in M.$$
Before getting into the details of the proof of Theorem 12.1 let us outline the main points. The arguments in the proof turn around the transfer operator (or Ruelle–Perron–Frobenius operator), the linear operator $L : C^0(M) \to C^0(M)$ defined in the Banach space $C^0(M)$ of continuous complex functions by
$$Lg(y) = \sum_{x \in f^{-1}(y)} e^{\varphi(x)} g(x). \tag{12.1.1}$$

Observe that $L$ is well defined: $Lg \in C^0(M)$ whenever $g \in C^0(M)$. Indeed, as we saw in Lemma 11.2.6, for each $y \in M$ there exist inverse branches
$$h_i : B(y, \rho) \to M, \quad i = 1, \dots, k$$
of the transformation $f$ such that $\bigcup_{i=1}^{k} h_i(B(y, \rho))$ coincides with the pre-image of the ball $B(y, \rho)$. Then,
$$Lg = \sum_{i=1}^{k} \big(e^{\varphi} g\big) \circ h_i \tag{12.1.2}$$
restricted to $B(y, \rho)$ and, clearly, this expression defines a continuous function.


It is clear from the definition that $L$ is a positive operator: if $g(x) \ge 0$ for every $x \in M$ then $Lg(y) \ge 0$ for every $y \in M$. It is also easy to check that $L$ is a continuous operator: indeed,
$$\|Lg\| = \sup|Lg| \le \operatorname{degree}(f)\, e^{\sup\varphi} \sup|g| = \operatorname{degree}(f)\, e^{\sup\varphi} \|g\| \tag{12.1.3}$$
for every $g \in C^0(M)$, and that means that $\|L\| \le \operatorname{degree}(f)\, e^{\sup\varphi}$; recall that $\operatorname{degree}(f)$ was defined in (11.2.4).
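To make (12.1.1)–(12.1.3) concrete, here is a minimal sketch for the doubling map $f(x) = 2x \bmod 1$ on $[0, 1)$ (our choice of example), for which $f^{-1}(y) = \{y/2, (y+1)/2\}$ and $\operatorname{degree}(f) = 2$. Functions are represented as Python callables and $L^n g$ is built by composing $L$ with itself, which costs $2^n$ evaluations, so this only makes sense for small $n$. For a constant potential $\varphi \equiv c$ one gets $L^n 1 = (2e^c)^n$; for $\varphi = -\log 2 = -\log|\det Df|$ one gets $L1 = 1$, in agreement with Example 12.1.2 below. For a non-constant potential, $\frac{1}{n} \log L^n 1(y)$ approximates $\log\lambda$ (this uses Corollary 12.1.9, proved below). The names `transfer` and `iterate_transfer` and the sample potentials are hypothetical.

```python
import math

def transfer(phi, g):
    """Transfer operator (12.1.1) for the doubling map: sum of e^{phi(x)} g(x) over f^{-1}(y)."""
    def Lg(y: float) -> float:
        preimages = (y / 2, (y + 1) / 2)      # the two points of f^{-1}(y), y in [0, 1)
        return sum(math.exp(phi(x)) * g(x) for x in preimages)
    return Lg

def iterate_transfer(phi, g, n: int):
    """L^n g, built by composing the operator n times."""
    for _ in range(n):
        g = transfer(phi, g)
    return g

one = lambda x: 1.0

# Constant potential c: L^n 1 = (2 e^c)^n, so lambda = 2 e^c.
c = -0.2
Ln1 = iterate_transfer(lambda x: c, one, 8)
print(Ln1(0.37), (2 * math.exp(c)) ** 8)

# phi = -log 2 = -log|det Df|: L 1 = 1.
L1 = transfer(lambda x: -math.log(2), one)
print(L1(0.1), L1(0.9))

# A non-constant Hölder potential: (1/n) log L^n 1(y) approximates log(lambda).
phi = lambda x: 0.3 * math.cos(2 * math.pi * x)
for n in (4, 8, 12):
    print(n, math.log(iterate_transfer(phi, one, n)(0.25)) / n)
```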
According to the theorem of Riesz–Markov (Theorem A.3.12), the dual
space of the Banach space C0 (M) may be identified with the space M(M)
of all complex Borel measures. Then, the dual of the transfer operator is the linear operator $L^* : \mathcal{M}(M) \to \mathcal{M}(M)$ defined by
$$\int g \, d(L^*\eta) = \int Lg \, d\eta \quad \text{for every } g \in C^0(M) \text{ and } \eta \in \mathcal{M}(M). \tag{12.1.4}$$
This operator is positive, in the sense that if $\eta$ is a positive measure then $L^*\eta$ is also a positive measure.
The first step in the proof (Section 12.1.1) is to show that $L^*$ admits a positive eigenmeasure $\nu$ associated with a positive eigenvalue $\lambda$. We will see that $f$ admits a positive Hölder Jacobian with respect to such a measure, and that the measure is supported on the whole space $M$. Moreover (Section 12.1.2), the eigenmeasure $\nu$ is a Gibbs state: there exists a constant $P \in \mathbb{R}$ and for each $\varepsilon > 0$ there exists $K \ge 1$ such that
$$K^{-1} \le \frac{\nu(B(x, n, \varepsilon))}{\exp\big(\varphi_n(x) - nP\big)} \le K \quad \text{for every } x \in M \text{ and every } n \ge 1, \tag{12.1.5}$$
where $B(x, n, \varepsilon)$ is the dynamical ball defined in (9.3.2). Actually, $P = \log\lambda$.
Behind the proof of the Gibbs property are certain results about distortion control that are also crucial to show (Section 12.1.3) that the transfer operator $L$ itself admits an eigenfunction $h$ associated with the eigenvalue $\lambda$. This function is strictly positive and Hölder. The measure $\mu = h\nu$ is the equilibrium state we are looking for, although that will take a little while to prove.
It follows easily from the properties of $h$ (Section 12.1.4) that $\mu$ is invariant and a Gibbs state, and its support is the whole of $M$. Moreover, $h_\mu(f) + \int \varphi \, d\mu = P$. To conclude that $\mu$ is indeed an equilibrium state, we need to check that $P$ is equal to the pressure $P(f, \varphi)$. This is done (Section 12.1.5) with the help of the Rokhlin formula (Theorem 9.7.3), which also allows us to conclude that if $\eta$ is an equilibrium state then $\eta/h$ is an eigenmeasure of $L^*$ associated with the eigenvalue $\lambda = e^{P(f, \varphi)}$. This last result is the key ingredient for proving that the equilibrium state is unique (Section 12.1.6).
The distortion control is, again, crucial for checking (Section 12.1.7) that the system $(f, \mu)$ is exact. Finally, in Section 12.1.8 we comment on the special case $\varphi = -\log|\det Df|$, when $f$ is an expanding map on a Riemannian manifold. In this case, the reference measure $\nu$ is the Lebesgue measure on the manifold itself. Thus, the equilibrium state $\mu$ is an invariant measure equivalent to the Lebesgue measure, and so it coincides with the invariant measure constructed in Section 11.1.
Before we start to detail these steps, it is convenient to make a couple of
quick comments. First, note that the existence of an equilibrium state follows
immediately from Corollary 10.5.9, since Lemma 11.1.4 asserts that every
expanding map is expansive. However, this fact is not used in the proof:
instead, in Section 12.1.4 we present a much more explicit construction of
the equilibrium state.

The other comment concerns the Rokhlin formula. Let $\mathcal{P}$ be any finite partition of $M$ with $\operatorname{diam}\mathcal{P} < \rho$. For each $n \ge 1$, every element of the partition $\mathcal{P}^n = \bigvee_{j=0}^{n-1} f^{-j}(\mathcal{P})$ is contained in the image $h^{n-1}(P)$ of some $P \in \mathcal{P}$ by an inverse branch $h^{n-1}$ of the iterate $f^{n-1}$. In particular, $\operatorname{diam}\mathcal{P}^n < \sigma^{-n+1}\rho$ for every $n$. Then, $\mathcal{P}$ satisfies the hypotheses of Theorem 9.7.3 at every point. Hence, the Rokhlin formula holds for every invariant probability measure.

12.1.1 Reference measure


Recall that $C^0_+(M)$ denotes the cone of positive continuous functions. As observed previously, this cone is preserved by the transfer operator $L$. The dual cone (recall Example 2.3.3) is defined by
$$C^0_+(M)^* = \{\eta \in C^0(M)^* : \eta(\psi) \ge 0 \text{ for every } \psi \in C^0_+(M)\}$$
and may be seen as the cone of finite positive Borel measures. It follows directly from (12.1.4) that $C^0_+(M)^*$ is preserved by the dual operator $L^*$.

Lemma 12.1.1. Consider the spectral radius $\lambda = \rho(L^*) = \rho(L)$. Then there exists some probability measure $\nu$ on $M$ such that $L^*\nu = \lambda\nu$.

Proof. As we saw in Exercise 2.3.3, the cone $C^0_+(M)$ is normal. Hence, we may apply Theorem 2.3.4 with $E = C^0(M)$, $C = C^0_+(M)$ and $T = L$. The conclusion of the theorem means that $L^*$ admits some eigenvector $\nu \in C^0_+(M)^*$ associated with the eigenvalue $\lambda$. As we have just explained, $\nu$ may be identified with a finite positive measure. Normalizing $\nu$, we may take it to be a probability measure.

In Exercise 12.1.2 we propose an alternative proof of Lemma 12.1.1, based on the Tychonoff–Schauder theorem (Theorem 2.2.3).

Example 12.1.2. Let $f : M \to M$ be a local diffeomorphism on a compact Riemannian manifold $M$. Consider the transfer operator $L$ associated with the potential $\varphi = -\log|\det Df|$. The Lebesgue measure $m$ (that is, the volume measure induced by the Riemannian metric) of $M$ is an eigenmeasure of the transfer operator associated with the eigenvalue $\lambda = 1$:
$$L^* m = m. \tag{12.1.6}$$
To check this fact, it is enough to show that $L^*m(E) = m(E)$ for every measurable set $E$ contained in the image of some inverse branch $h_j : B(y, \rho) \to M$ (because, $M$ being compact, every measurable set may be written as a finite disjoint union of subsets $E$ of this kind). Now, using the expression (12.1.2),
$$L^*m(E) = \int \chi_E \, d(L^*m) = \int (L\chi_E) \, dm = \int \sum_{i=1}^{k} \frac{\chi_E}{|\det Df|} \circ h_i \, dm.$$

Hence, by the choice of $E$ and the formula of change of variables,
$$L^*m(E) = \int \sum_{i=1}^{k} \frac{\chi_E}{|\det Df|} \circ h_i \, dm = \int \chi_E \, dm = m(E).$$
This proves that $m$ is, indeed, a fixed point of $L^*$.
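A quick numerical sanity check of (12.1.6), under assumptions of our own: take the circle map $f(x) = 2x + \frac{a}{2\pi}\sin(2\pi x) \bmod 1$ with $a = 0.5$, a degree-2 expanding map since $f'(x) = 2 + a\cos(2\pi x) \ge 1.5$, and the potential $\varphi = -\log f'$. In dimension one, (12.1.6) says that $\int Lg\, dm = \int g\, dm$, where $Lg(y) = \sum_{f(x) = y} g(x)/f'(x)$. The sketch finds the two preimages by bisection on the lift (one in $[0, 1/2]$ and one in $[1/2, 1]$, because the lift maps $1/2$ to $1$) and integrates with the midpoint rule, which is very accurate for smooth periodic integrands. All names and parameters are hypothetical.

```python
import math

A = 0.5   # perturbation size (our choice); |A| < 1 keeps f'(x) >= 2 - |A| > 1

def lift(x: float) -> float:
    """Lift F(x) = 2x + A sin(2 pi x) / (2 pi): increasing, F(0) = 0, F(1/2) = 1, F(1) = 2."""
    return 2 * x + A * math.sin(2 * math.pi * x) / (2 * math.pi)

def deriv(x: float) -> float:
    """f'(x) = |det Df(x)| in dimension one."""
    return 2 + A * math.cos(2 * math.pi * x)

def solve(target: float, lo: float, hi: float, iters: int = 80) -> float:
    """Bisection for lift(x) = target on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if lift(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def transfer_lebesgue(g, y: float) -> float:
    """L g(y) for phi = -log|det Df|: sum of g(x) / f'(x) over the two preimages of y."""
    xs = (solve(y, 0.0, 0.5), solve(y + 1.0, 0.5, 1.0))
    return sum(g(x) / deriv(x) for x in xs)

def integrate(func, n: int = 400) -> float:
    """Midpoint rule on [0, 1]; very accurate for smooth 1-periodic integrands."""
    return sum(func((i + 0.5) / n) for i in range(n)) / n

g = lambda x: math.exp(math.sin(2 * math.pi * x))
lhs = integrate(lambda y: transfer_lebesgue(g, y))   # integral of L g with respect to m
rhs = integrate(g)                                     # integral of g with respect to m
print(lhs, rhs)
```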


Exercise 12.1.3 gives a similar conclusion for Markov measures.

From now on, we always take $\nu$ to be a reference measure, that is, a probability measure such that $L^*\nu = \lambda\nu$ for some $\lambda > 0$. By the end of the proof of Theorem 12.1 we will find that $\lambda$ is uniquely determined (in view of Lemma 12.1.1, that means that $\lambda$ is necessarily equal to the spectral radius of $L$ and $L^*$) and the measure $\nu$ itself is also unique.
Initially, we show that $f$ admits a Jacobian with respect to $\nu$, which may be written explicitly in terms of the eigenvalue $\lambda$ and the potential $\varphi$:

Lemma 12.1.3. The transformation $f : M \to M$ admits a Jacobian with respect to $\nu$, given by $J_\nu f = \lambda e^{-\varphi}$.

Proof. Let $A$ be any domain of invertibility of $f$. Let $(g_n)_n$ be a sequence of continuous functions converging to the characteristic function of $A$ at $\nu$-almost every point and such that $\sup|g_n| \le 1$ for every $n$ (see Exercise A.3.5). Observe that
$$L(e^{-\varphi} g_n)(y) = \sum_{x \in f^{-1}(y)} g_n(x).$$
The expression on the right-hand side is bounded by the degree of $f$, as defined in (11.2.4), and it converges to $\chi_{f(A)}(y)$ at $\nu$-almost every point. Hence, using the dominated convergence theorem, the sequence
$$\int \lambda e^{-\varphi} g_n \, d\nu = \int e^{-\varphi} g_n \, d(L^*\nu) = \int L(e^{-\varphi} g_n) \, d\nu$$
converges to $\nu(f(A))$. Since the expression on the left-hand side converges to $\int_A \lambda e^{-\varphi} \, d\nu$, we conclude that
$$\nu(f(A)) = \int_A \lambda e^{-\varphi} \, d\nu,$$
which proves the claim.
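In the simplest case the objects in Lemmas 12.1.1 and 12.1.3 are completely explicit. As an illustration of our own choosing, let $f$ be the shift on $\{0, 1, 2\}^{\mathbb{N}}$ and let $\varphi(x) = \log w_{x_0}$ depend only on the first symbol, with positive weights $w_a$. Then $\lambda = \sum_a w_a$, the reference measure $\nu$ is the Bernoulli measure with probabilities $w_a/\lambda$, and each cylinder $[a_0 \dots a_n]$ is a domain of invertibility with $f([a_0 \dots a_n]) = [a_1 \dots a_n]$. The sketch below checks in exact arithmetic that $(L^*\nu)(C) = \int L\chi_C \, d\nu = \lambda\nu(C)$ on cylinders and that $\nu(f(A)) = \int_A \lambda e^{-\varphi} \, d\nu$, as Lemma 12.1.3 predicts. The weights and function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Hypothetical weights: e^{phi} equals weights[a] on the cylinder [a].
weights = [Fraction(1), Fraction(2), Fraction(3)]
lam = sum(weights)                     # eigenvalue lambda of L* for this potential

def nu(word) -> Fraction:
    """Bernoulli measure of the cylinder [word], with probabilities weights[a] / lam."""
    out = Fraction(1)
    for a in word:
        out *= weights[a] / lam
    return out

def dual_transfer(word) -> Fraction:
    """(L* nu)([word]) = integral of L chi_[word] d nu, with L chi_[word] = weights[word[0]] * chi_[word[1:]]."""
    return weights[word[0]] * nu(word[1:])

for n in range(1, 6):
    for word in product(range(3), repeat=n):
        assert dual_transfer(word) == lam * nu(word)      # L* nu = lam nu on cylinders
        jacobian = lam / weights[word[0]]                  # lam e^{-phi}, constant on [word]
        assert nu(word[1:]) == jacobian * nu(word)         # nu(f(A)) = integral over A of J_nu f
print("lambda =", lam)
```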

The next lemma applies, in particular, to the reference measure ν:

Lemma 12.1.4. Let $f : M \to M$ be a topologically exact expanding map and $\eta$ be any Borel probability measure such that $f$ admits a Jacobian with respect to $\eta$. Then $\eta$ is supported on the whole of $M$.

Proof. Suppose, by contradiction, that there exists some non-empty open set $U \subset M$ such that $\eta(U) = 0$. Note that $f$ is an open map, since it is a local homeomorphism.
Thus, the image $f(U)$ is also an open set. Moreover, we may write $U$ as a finite disjoint union of domains of invertibility $A$. For each one of them,
$$\eta(f(A)) = \int_A J_\eta f \, d\eta = 0.$$
Therefore, $\eta(f(U)) = 0$. By induction, it follows that $\eta(f^n(U)) = 0$ for every $n \ge 0$. Since we take $f$ to be topologically exact, there exists $n \ge 1$ such that $f^n(U) = M$. This contradicts the fact that $\eta(M) = 1$.

12.1.2 Distortion and the Gibbs property


In this section we prove certain distortion bounds that have a central role in the proof of Theorem 12.1. The hypothesis that $\varphi$ is Hölder is critical at this stage: most of what follows is false, in general, if the potential is only continuous. As a first application of this distortion control, we prove that every reference measure $\nu$ is a Gibbs state.
Fix constants $K_0 > 0$ and $\alpha > 0$ such that $|\varphi(z) - \varphi(w)| \le K_0\, d(z, w)^\alpha$ for any $z, w \in M$.

Lemma 12.1.5. There exists $K_1 > 0$ such that for every $n \ge 1$, every $x \in M$ and every $y \in B(x, n+1, \rho)$,
$$|\varphi_n(x) - \varphi_n(y)| \le K_1\, d(f^n(x), f^n(y))^\alpha.$$

Proof. By hypothesis, $d(f^i(x), f^i(y)) < \rho$ for every $0 \le i \le n$. Then, for each $j = 1, \dots, n$, the inverse branch $h^j : B(f^n(x), \rho) \to M$ of $f^j$ at the point $f^{n-j}(x)$, which maps $f^n(x)$ to $f^{n-j}(x)$, also maps $f^n(y)$ to $f^{n-j}(y)$. Hence, recalling (11.2.6), $d(f^{n-j}(x), f^{n-j}(y)) \le \sigma^{-j} d(f^n(x), f^n(y))$ for every $j = 1, \dots, n$. Then,
$$|\varphi_n(x) - \varphi_n(y)| \le \sum_{j=1}^{n} |\varphi(f^{n-j}(x)) - \varphi(f^{n-j}(y))| \le \sum_{j=1}^{n} K_0\, \sigma^{-j\alpha}\, d(f^n(x), f^n(y))^\alpha.$$
Therefore, we may take any $K_1 \ge K_0 \sum_{j=0}^{\infty} \sigma^{-j\alpha}$.

As a consequence of Lemma 12.1.5, we obtain the following variation of Proposition 11.1.5 where the usual Jacobian with respect to the Lebesgue measure is replaced by the Jacobian with respect to any reference measure $\nu$:

Corollary 12.1.6. There exists $K_2 > 0$ such that for every $n \ge 1$, every $x \in M$ and every $y \in B(x, n+1, \rho)$,
$$K_2^{-1} \le \frac{J_\nu f^n(x)}{J_\nu f^n(y)} \le K_2.$$

Proof. From the expression of the Jacobian in Lemma 12.1.3 it follows that (recall Exercise 9.7.5)
$$J_\nu f^n(z) = \lambda^n e^{-\varphi_n(z)} \quad \text{for every } z \in M \text{ and every } n \ge 1. \tag{12.1.7}$$
Then, Lemma 12.1.5 yields
$$\Big| \log \frac{J_\nu f^n(x)}{J_\nu f^n(y)} \Big| = |\varphi_n(x) - \varphi_n(y)| \le K_1\, d(f^n(x), f^n(y))^\alpha \le K_1 \rho^\alpha.$$
So, it suffices to take $K_2 = \exp(K_1 \rho^\alpha)$.

Now we may show that every reference measure $\nu$ is a Gibbs state:

Lemma 12.1.7. For every small $\varepsilon > 0$, there exists $K_3 = K_3(\varepsilon) > 0$ such that, denoting $P = \log\lambda$,
$$K_3^{-1} \le \frac{\nu(B(x, n, \varepsilon))}{\exp(\varphi_n(x) - nP)} \le K_3 \quad \text{for every } x \in M \text{ and every } n \ge 1.$$

Proof. Consider $\varepsilon < \rho$. Then, $f \mid B(y, \varepsilon)$ is injective for every $y \in M$ and, consequently, $f^n \mid B(x, n, \varepsilon)$ is injective for every $x \in M$ and every $n$. Then,
$$\nu(f^n(B(x, n, \varepsilon))) = \int_{B(x, n, \varepsilon)} J_\nu f^n(y) \, d\nu(y).$$
Up to reducing $\varepsilon$, we may assume that $d(f(x), f(y)) < \rho$ whenever $d(x, y) < \varepsilon$. This implies that $B(x, n, \varepsilon) \subset B(x, n+1, \rho)$ for every $x \in M$ and $n \ge 1$. Then, by Corollary 12.1.6, the value of $J_\nu f^n$ at any point $y \in B(x, n, \varepsilon)$ differs from $J_\nu f^n(x)$ by a factor bounded by the constant $K_2$. It follows that
$$K_2^{-1}\, \nu(f^n(B(x, n, \varepsilon))) \le J_\nu f^n(x)\, \nu(B(x, n, \varepsilon)) \le K_2\, \nu(f^n(B(x, n, \varepsilon))). \tag{12.1.8}$$
Now, $J_\nu f^n(x) = \lambda^n e^{-\varphi_n(x)} = \exp(nP - \varphi_n(x))$, as we saw in (12.1.7). By Lemma 11.2.7 we also have that $f^n(B(x, n, \varepsilon)) = f(B(f^{n-1}(x), \varepsilon))$, and so
$$\nu(f^n(B(x, n, \varepsilon))) = \int_{B(f^{n-1}(x), \varepsilon)} J_\nu f \, d\nu \tag{12.1.9}$$
for every $x \in M$ and every $n$. It is clear that the left-hand side of (12.1.9) is bounded above by 1. Moreover, $J_\nu f = \lambda e^{-\varphi}$ is bounded away from zero and (by Exercise 12.1.1 and Lemma 12.1.4) the set $\{\nu(B(y, \varepsilon)) : y \in M\}$ is also bounded away from zero. Therefore, the right-hand side of (12.1.9) is bounded below by some number $a > 0$. Using these observations in (12.1.8), we obtain
$$K_2^{-1} a \le \frac{\nu(B(x, n, \varepsilon))}{\exp(\varphi_n(x) - nP)} \le K_2.$$
Now it suffices to take $K_3 = \max\{K_2/a, K_2\}$.
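A two-symbol example (again our own choice) in which the Gibbs ratio is bounded but not identically 1: on the shift $\{0, 1\}^{\mathbb{N}}$ take $\varphi(x) = \log M_{x_0 x_1}$ for a positive $2 \times 2$ matrix $M$. One can check from Lemma 12.1.3 that the reference measure is $\nu([a_0 \dots a_{n-1}]) = \lambda^{-(n-1)} M_{a_0 a_1} \cdots M_{a_{n-2} a_{n-1}}\, v_{a_{n-1}}$, where $\lambda$ is the Perron eigenvalue of $M$ and $v$ the corresponding right eigenvector, normalized by $v_0 + v_1 = 1$. For the metric $d(x, y) = 2^{-\min\{k : x_k \ne y_k\}}$, the dynamical ball $B(x, n, 1)$ is exactly the cylinder $[x_0 \dots x_{n-1}]$, and the Gibbs ratio works out to $\lambda v_{x_{n-1}} / M_{x_{n-1} x_n}$, which stays between two constants for all $n$. The sketch checks additivity of $\nu$, the relation $L^*\nu = \lambda\nu$ on cylinders, and the Gibbs bounds numerically; the matrix and function names are hypothetical.

```python
import math
from itertools import product

M = [[2.0, 1.0], [1.0, 1.0]]       # e^{phi(x)} = M[x0][x1]; an arbitrary positive matrix

def perron(matrix, iters=200):
    """Perron eigenvalue and right eigenvector (normalized to sum 1) by power iteration."""
    v = [1.0] * len(matrix)
    lam = 0.0
    for _ in range(iters):
        w = [sum(row[j] * v[j] for j in range(len(v))) for row in matrix]
        lam = sum(w) / sum(v)
        v = [x / sum(w) for x in w]
    return lam, v

lam, v = perron(M)
P = math.log(lam)

def nu(word):
    """Reference measure of the cylinder [word], len(word) >= 1."""
    out = v[word[-1]]
    for a, b in zip(word, word[1:]):
        out *= M[a][b] / lam
    return out

def phi_n(word):
    """phi_n(x) = sum_{k<n} log M[x_k][x_{k+1}]; needs x_0, ..., x_n, so len(word) = n + 1."""
    return sum(math.log(M[a][b]) for a, b in zip(word, word[1:]))

lower = lam * min(v) / max(max(row) for row in M)
upper = lam * max(v) / min(min(row) for row in M)
worst = [math.inf, 0.0]
for n in range(1, 12):
    for word in product(range(2), repeat=n + 1):
        cyl = word[:n]
        assert abs(sum(nu(cyl + (b,)) for b in range(2)) - nu(cyl)) < 1e-12     # additivity
        if n >= 2:
            assert abs(M[cyl[0]][cyl[1]] * nu(cyl[1:]) - lam * nu(cyl)) < 1e-12  # L* nu = lam nu
        ratio = nu(cyl) / math.exp(phi_n(word) - n * P)
        worst = [min(worst[0], ratio), max(worst[1], ratio)]
print("Gibbs ratios lie in", worst, "within", (lower, upper))
```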

12.1.3 Invariant density


Next, we show that the transfer operator $L$ admits some positive eigenfunction $h$ associated with the eigenvalue $\lambda$. We are going to find $h$ as a Cesàro accumulation point of the sequence of functions $\lambda^{-n} L^n 1$. To show that there does exist some accumulation point, we start by proving that this sequence is uniformly bounded and equicontinuous.

Lemma 12.1.8. There exists $K_4 > 0$ such that
$$-K_4\, d(y_1, y_2)^\alpha \le \log \frac{L^n 1(y_1)}{L^n 1(y_2)} \le K_4\, d(y_1, y_2)^\alpha$$
for every $n \ge 1$ and any $y_1, y_2 \in M$ with $d(y_1, y_2) < \rho$.
Proof. It follows from (12.1.2) that, given any continuous function $g$,
$$L^n g = \sum_i \big(e^{\varphi_n} g\big) \circ h_i^n \quad \text{restricted to each ball } B(y, \rho),$$
where the sum is over all inverse branches $h_i^n : B(y, \rho) \to M$ of the iterate $f^n$. In particular,
$$\frac{L^n 1(y_1)}{L^n 1(y_2)} = \frac{\sum_i e^{\varphi_n(h_i^n(y_1))}}{\sum_i e^{\varphi_n(h_i^n(y_2))}}.$$
By Lemma 12.1.5, for each of these inverse branches $h_i^n$ one has
$$|\varphi_n(h_i^n(y_1)) - \varphi_n(h_i^n(y_2))| \le K_1\, d(y_1, y_2)^\alpha.$$
Consequently,
$$e^{-K_1 d(y_1, y_2)^\alpha} \le \frac{L^n 1(y_1)}{L^n 1(y_2)} \le e^{K_1 d(y_1, y_2)^\alpha}.$$
Therefore, one may take any $K_4 \ge K_1$.
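A numerical illustration of Lemma 12.1.8 for an example we choose: the doubling map with $\varphi(x) = 0.3\cos(2\pi x)$, which is Lipschitz ($\alpha = 1$) with $K_0 = 0.6\pi$. Here $\sigma = 2$, so the constant from Lemma 12.1.5 is $K_1 = K_0 \sum_{j \ge 0} 2^{-j} = 2K_0$, and one may take $K_4 = K_1$. For $y_1, y_2 \in [0, 1)$ with $|y_1 - y_2| < 1/4$ (our assumed stand-in for $d(y_1, y_2) < \rho$), the inverse branches of $f^n$ are $y \mapsto (y + i)/2^n$, $0 \le i < 2^n$, with the same $i$ for $y_1$ and $y_2$, so $L^n 1$ can be summed directly. The sketch samples random pairs and counts violations of the bound; there should be none.

```python
import math
import random

K0 = 0.3 * 2 * math.pi          # Lipschitz constant of phi on the circle (alpha = 1)
K1 = 2 * K0                     # K0 * sum_{j>=0} 2^{-j}, as in Lemma 12.1.5 with sigma = 2

def phi(x: float) -> float:
    return 0.3 * math.cos(2 * math.pi * x)

def birkhoff_sum(x: float, n: int) -> float:
    """phi_n(x) = phi(x) + phi(f(x)) + ... + phi(f^{n-1}(x)) for f(x) = 2x mod 1."""
    total = 0.0
    for _ in range(n):
        total += phi(x)
        x = (2 * x) % 1.0
    return total

def Ln1(y: float, n: int) -> float:
    """L^n 1(y): sum of exp(phi_n) over the 2^n preimages (y + i) / 2^n."""
    return sum(math.exp(birkhoff_sum((y + i) / 2 ** n, n)) for i in range(2 ** n))

random.seed(1)
violations = 0
for n in (2, 5, 8):
    for _ in range(40):
        y1 = random.random()
        y2 = min(max(y1 + random.uniform(-0.2, 0.2), 0.0), 0.999)
        d = abs(y1 - y2)                     # equals the circle distance, since d < 1/4
        log_ratio = math.log(Ln1(y1, n) / Ln1(y2, n))
        if abs(log_ratio) > K1 * d + 1e-12:
            violations += 1
print("violations:", violations)
```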

It follows that the sequence $\lambda^{-n} L^n 1$ is bounded away from zero and infinity:
Corollary 12.1.9. There exists $K_5 > 0$ such that $K_5^{-1} \le \lambda^{-n} L^n 1(x) \le K_5$ for every $n \ge 1$ and any $x \in M$.

Proof. Start by observing that, for every $n \ge 1$,
$$\int L^n 1 \, d\nu = \int 1 \, d(L^{*n}\nu) = \lambda^n \int d\nu = \lambda^n.$$
In particular, for every $n \ge 1$,
$$\min_{y \in M} \lambda^{-n} L^n 1(y) \le 1 \le \max_{y \in M} \lambda^{-n} L^n 1(y). \tag{12.1.10}$$
Since $f$ is topologically exact, there exists $N \ge 1$ such that $f^N(B(x, \rho)) = M$ for every $x \in M$ (check Exercise 11.2.3). Now, given any $x, y \in M$, we may find $x' \in B(x, \rho)$ such that $f^N(x') = y$. Then, on the one hand,
$$L^{n+N} 1(y) = \sum_{z \in f^{-N}(y)} e^{\varphi_N(z)} L^n 1(z) \ge e^{\varphi_N(x')} L^n 1(x') \ge e^{-cN} L^n 1(x').$$
On the other hand, Lemma 12.1.8 gives that $L^n 1(x') \ge L^n 1(x) \exp(-K_4 \rho^\alpha)$. Take $c = \sup|\varphi|$ and $K \ge \exp(K_4 \rho^\alpha)\, e^{cN} \lambda^N$. Combining the previous inequalities, we get that
$$L^{n+N} 1(y) \ge \exp(-K_4 \rho^\alpha)\, e^{-cN} L^n 1(x) \ge K^{-1} \lambda^N L^n 1(x)$$
for every $x, y \in M$. Therefore, for every $n \ge 1$,
$$\min \lambda^{-(n+N)} L^{n+N} 1 \ge K^{-1} \max \lambda^{-n} L^n 1. \tag{12.1.11}$$
Combining (12.1.10) and (12.1.11), we get:
$$\max \lambda^{-n} L^n 1 \le K \min \lambda^{-(n+N)} L^{n+N} 1 \le K \quad \text{for every } n \ge 1,$$
$$\min \lambda^{-n} L^n 1 \ge K^{-1} \max \lambda^{-(n-N)} L^{n-N} 1 \ge K^{-1} \quad \text{for every } n > N.$$
To conclude the proof, we only have to extend this last estimate to the values $n = 1, \dots, N$. For that, observe that each $L^n 1$ is a positive continuous function. Since $M$ is compact, it follows that the minimum of $L^n 1$ is positive for every $n$. Then, we may take $K_5 \ge K$ such that $\min \lambda^{-n} L^n 1 \ge K_5^{-1}$ for every $n = 1, \dots, N$.

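It follows immediately from Corollary 12.1.9 that the positive eigenvalue $\lambda$ is uniquely determined: indeed, for any $x \in M$ we have $\lambda = \lim_n \big(L^n 1(x)\big)^{1/n}$, and the right-hand side does not depend on the choice of the reference measure $\nu$. By Lemma 12.1.1, this implies that $\lambda = \rho(L) = \rho(L^*)$. We are also going to see, in a while, that $\lambda = e^{P(f, \varphi)}$.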

Lemma 12.1.10. There exists $K_6 > 0$ such that
$$|\lambda^{-n} L^n 1(x) - \lambda^{-n} L^n 1(y)| \le K_6\, d(x, y)^\alpha \quad \text{for any } n \ge 1 \text{ and } x, y \in M.$$
In particular, the sequence $\lambda^{-n} L^n 1$ is equicontinuous.

Proof. Initially, suppose that $d(x, y) < \rho$. By Lemma 12.1.8,
$$L^n 1(x) \le L^n 1(y) \exp(K_4\, d(x, y)^\alpha)$$
and, hence,
$$\lambda^{-n} L^n 1(x) - \lambda^{-n} L^n 1(y) \le \big(\exp(K_4\, d(x, y)^\alpha) - 1\big)\, \lambda^{-n} L^n 1(y).$$
Take $K > 0$ such that $|\exp(K_4 t) - 1| \le K|t|$ whenever $|t| \le \rho^\alpha$. Then, using Corollary 12.1.9,
$$\lambda^{-n} L^n 1(x) - \lambda^{-n} L^n 1(y) \le K K_5\, d(x, y)^\alpha.$$
Reversing the roles of $x$ and $y$, we conclude that
$$|\lambda^{-n} L^n 1(x) - \lambda^{-n} L^n 1(y)| \le K K_5\, d(x, y)^\alpha \quad \text{whenever } d(x, y) < \rho.$$
When $d(x, y) \ge \rho$, Corollary 12.1.9 gives that
$$|\lambda^{-n} L^n 1(x) - \lambda^{-n} L^n 1(y)| \le 2K_5 \le 2K_5 \rho^{-\alpha} d(x, y)^\alpha.$$
Hence, it suffices to take $K_6 \ge \max\{K K_5, 2K_5 \rho^{-\alpha}\}$ to get the first part of the statement. The second part is an immediate consequence.

We are ready to show that the transfer operator $L$ admits some eigenfunction associated with the eigenvalue $\lambda$. Corollary 12.1.9 and Lemma 12.1.10 imply that the time averages
$$h_n = \frac{1}{n} \sum_{i=0}^{n-1} \lambda^{-i} L^i 1$$
define an equicontinuous bounded sequence. Then, by the theorem of Ascoli–Arzelà, there exists some subsequence $(h_{n_i})_i$ converging uniformly to a continuous function $h$.

Lemma 12.1.11. The function $h$ satisfies $Lh = \lambda h$. Moreover, $\int h \, d\nu = 1$ and
$$K_5^{-1} \le h(x) \le K_5 \quad \text{and} \quad |h(x) - h(y)| \le K_6\, d(x, y)^\alpha \quad \text{for every } x, y \in M.$$

Proof. Consider any subsequence $(h_{n_i})_i$ converging to $h$. As the transfer operator $L$ is continuous,
$$Lh = \lim_i L h_{n_i} = \lim_i \frac{1}{n_i} \sum_{k=0}^{n_i-1} \lambda^{-k} L^{k+1} 1 = \lim_i \frac{\lambda}{n_i} \sum_{k=1}^{n_i} \lambda^{-k} L^k 1 = \lim_i \Big[ \frac{\lambda}{n_i} \sum_{k=0}^{n_i-1} \lambda^{-k} L^k 1 + \frac{\lambda}{n_i} \big(\lambda^{-n_i} L^{n_i} 1 - 1\big) \Big].$$
The first term on the right-hand side converges to $\lambda h$ whereas the second one converges to zero, because the sequence $\lambda^{-n} L^n 1$ is uniformly bounded. It follows that $Lh = \lambda h$, as we stated.
Note that $\int \lambda^{-n} L^n 1 \, d\nu = \lambda^{-n} \int 1 \, d(L^{*n}\nu) = \int 1 \, d\nu = 1$ for every $n \in \mathbb{N}$, by the definition of $\nu$. It follows that $\int h_n \, d\nu = 1$ for every $n$ and, using the dominated convergence theorem, $\int h \, d\nu = 1$. All the other claims in the statement follow, in an entirely analogous way, from Corollary 12.1.9 and Lemma 12.1.10.

12.1.4 Construction of the equilibrium state


Consider the measure defined by $\mu = h\nu$, that is,
$$\mu(A) = \int_A h\, d\nu \quad\text{for each measurable set } A \subset M.$$

We are going to see that $\mu$ is an equilibrium state for the potential $\varphi$ and satisfies all the other conditions in Theorem 12.1.
From Lemma 12.1.11 we get that $\mu(M) = \int h\, d\nu = 1$ and so $\mu$ is a probability measure. Moreover,
$$K_5^{-1}\nu(A) \le \mu(A) \le K_5\,\nu(A) \tag{12.1.12}$$
for every measurable set $A \subset M$. In particular, $\mu$ is equivalent to the reference measure $\nu$. This fact, together with Lemma 12.1.4, gives that $\operatorname{supp}\mu = M$. It also follows from the relation (12.1.12), together with Lemma 12.1.7, that $\mu$ is a Gibbs state: taking $L = K_5K$, we find that
$$L^{-1} \le \frac{\mu(B(x,n,\varepsilon))}{\exp(\varphi_n(x) - nP)} \le L \tag{12.1.13}$$
for every $x \in M$ and every $n \ge 1$. Recall that $P = \log\lambda$.
Lemma 12.1.12. The probability measure $\mu$ is invariant under $f$. Moreover, $f$ admits a Jacobian with respect to $\mu$, given by $J_\mu f = \lambda e^{-\varphi}(h\circ f)/h$.

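Proof. Start by noting that $\mathcal{L}\big((g_1\circ f)g_2\big) = g_1\,\mathcal{L}g_2$ for any continuous functions $g_1, g_2 : M \to \mathbb{R}$. Indeed, for every $y \in M$,
$$\mathcal{L}\big((g_1\circ f)g_2\big)(y) = \sum_{x\in f^{-1}(y)} e^{\varphi(x)}g_1(f(x))\,g_2(x) = g_1(y)\sum_{x\in f^{-1}(y)} e^{\varphi(x)}g_2(x) = g_1(y)\,\mathcal{L}g_2(y). \tag{12.1.14}$$
Thus, for every continuous function $g : M \to \mathbb{R}$,
$$\int(g\circ f)\, d\mu = \lambda^{-1}\int(g\circ f)h\, d(\mathcal{L}^*\nu) = \lambda^{-1}\int\mathcal{L}\big((g\circ f)h\big)\, d\nu = \lambda^{-1}\int g\,\mathcal{L}h\, d\nu = \int gh\, d\nu = \int g\, d\mu.$$
In view of Proposition A.3.3, this proves that the probability measure $\mu$ is invariant under $f$.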
To prove the second claim, consider any domain of invertibility $A$ of $f$. Then, using Lemma 9.7.4(i),
$$\mu(f(A)) = \int_{f(A)} 1\, d\mu = \int_{f(A)} h\, d\nu = \int_A J_\nu f\,(h\circ f)\, d\nu = \int_A J_\nu f\,\frac{h\circ f}{h}\, d\mu.$$
By Lemma 12.1.3, this means that
$$J_\mu f = J_\nu f\,\frac{h\circ f}{h} = \lambda e^{-\varphi}\,\frac{h\circ f}{h},$$
as stated.

Corollary 12.1.13. The invariant probability measure $\mu = h\nu$ satisfies
$$h_\mu(f) + \int\varphi\, d\mu = P.$$

Proof. Combining the Rokhlin formula (Theorem 9.7.3) with the second part of Lemma 12.1.12,
$$h_\mu(f) = \int\log J_\mu f\, d\mu = \log\lambda - \int\varphi\, d\mu + \int(\log h\circ f - \log h)\, d\mu.$$
Since $\mu$ is invariant and $\log h$ is bounded (Corollary 12.1.9), the last term is equal to zero. This shows that $h_\mu(f) = P - \int\varphi\, d\mu$, as stated.

To complete the proof that $\mu = h\nu$ is an equilibrium state, all that we need to do is to check that $P = \log\lambda$ is equal to the pressure $P(f,\varphi)$. This is done in Corollary 12.1.15 below.

12.1.5 Pressure and eigenvalues


Let $\eta$ be any probability measure invariant under $f$ and such that
$$h_\eta(f) + \int\varphi\, d\eta \ge P \tag{12.1.15}$$
(for example: the probability measure $\mu$ constructed in the previous section).


Let $g_\eta = 1/J_\eta f$ (the Jacobian $J_\eta f$ does exist, by Exercise 9.7.8) and consider also the function $g = \lambda^{-1}e^{\varphi}h/(h\circ f)$. Observe that
$$\sum_{x\in f^{-1}(y)} g(x) = \sum_{x\in f^{-1}(y)}\frac{e^{\varphi(x)}h(x)}{\lambda h(y)} = \frac{\mathcal{L}h(y)}{\lambda h(y)} = 1 \tag{12.1.16}$$
for every $y \in M$. Moreover, since $\eta$ is invariant under $f$, Exercise 9.7.4 gives that
$$\sum_{x\in f^{-1}(y)} g_\eta(x) = 1 \quad\text{for } \eta\text{-almost every } y \in M. \tag{12.1.17}$$

Using (12.1.15) and the Rokhlin formula (Theorem 9.7.3),
$$0 \le h_\eta(f) + \int\varphi\, d\eta - P = \int(-\log g_\eta + \varphi - \log\lambda)\, d\eta. \tag{12.1.18}$$
By the definition of $g$ and the hypothesis that $\eta$ is invariant, the integral on the right-hand side of (12.1.18) is equal to
$$\int(-\log g_\eta + \log g + \log h\circ f - \log h)\, d\eta = \int\log\frac{g}{g_\eta}\, d\eta. \tag{12.1.19}$$
Recalling the definition of $g_\eta$, Exercise 9.7.3 gives that
$$\int\log\frac{g}{g_\eta}\, d\eta = \int\sum_{x\in f^{-1}(y)} g_\eta(x)\log\frac{g}{g_\eta}(x)\, d\eta(y). \tag{12.1.20}$$

At this point we need the following elementary fact:

Lemma 12.1.14. Let $p_i, b_i$, $i = 1, \dots, k$ be positive real numbers such that $\sum_{i=1}^k p_i = 1$. Then
$$\sum_{i=1}^k p_i\log b_i \le \log\Big(\sum_{i=1}^k p_ib_i\Big),$$
and the identity holds if and only if the numbers $b_j$ are all equal to $\sum_{i=1}^k p_ib_i$.

Proof. Take $a_i = \log(p_ib_i)$ in Lemma 10.4.4. Then the inequality in the conclusion of Lemma 10.4.4 corresponds exactly to the inequality in the present lemma. Moreover, the identity holds if and only if
$$p_j = \frac{e^{a_j}}{\sum_i e^{a_i}} \iff p_j = \frac{p_jb_j}{\sum_i p_ib_i} \iff b_j = \sum_i p_ib_i$$
for every $j = 1, \dots, k$.
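Lemma 12.1.14 is the concavity of the logarithm (Jensen's inequality) in disguise. As a quick sanity check, the Python sketch below, with hypothetical weights and values, computes the gap $\log(\sum_i p_ib_i) - \sum_i p_i\log b_i$, which should be nonnegative and vanish when all the $b_i$ coincide. In the application that follows, $p_i = g_\eta(x_i)$ and $b_i = g(x_i)/g_\eta(x_i)$.

```python
import math


def jensen_gap(p, b):
    """log(sum_i p_i b_i) - sum_i p_i log b_i for positive p_i summing to 1 and positive b_i."""
    if abs(sum(p) - 1.0) > 1e-12 or min(p) <= 0 or min(b) <= 0:
        raise ValueError("need positive weights summing to 1 and positive values")
    mean = sum(pi * bi for pi, bi in zip(p, b))
    return math.log(mean) - sum(pi * math.log(bi) for pi, bi in zip(p, b))


p = [0.2, 0.3, 0.5]
print(jensen_gap(p, [1.0, 4.0, 0.5]))  # strictly positive: the b_i are not all equal
print(jensen_gap(p, [2.0, 2.0, 2.0]))  # zero up to rounding: all the b_i are equal
```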

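For each $y \in M$, take $p_i = g_\eta(x_i)$ and $b_i = g(x_i)/g_\eta(x_i)$, where the $x_i$ are the pre-images of $y$. The identity (12.1.17) means that $\sum_i p_i = 1$ for $\eta$-almost every $y$. Then, we may apply Lemma 12.1.14:
$$\sum_{x\in f^{-1}(y)} g_\eta(x)\log\frac{g}{g_\eta}(x) \le \log\sum_{x\in f^{-1}(y)} g_\eta(x)\frac{g}{g_\eta}(x) = \log\sum_{x\in f^{-1}(y)} g(x) = 0 \tag{12.1.21}$$
for $\eta$-almost every $y$; in the last step we used (12.1.16). Combining the relations (12.1.18) through (12.1.21), we find that the expression in (12.1.18) is both nonnegative and nonpositive, so that
$$h_\eta(f) + \int\varphi\, d\eta - P = \int\log\frac{g}{g_\eta}\, d\eta = 0. \tag{12.1.22}$$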

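Corollary 12.1.15. $P(f,\varphi) = P = \log\rho(\mathcal{L})$.

Proof. By (12.1.22), we have that $h_\eta(f) + \int\varphi\, d\eta = P$ for every invariant probability measure $\eta$ such that $h_\eta(f) + \int\varphi\, d\eta \ge P$. In particular, no invariant probability measure gives a value strictly larger than $P$, while $\mu$ attains the value $P$ (Corollary 12.1.13). By the variational principle (Theorem 10.4.1), it follows that $P(f,\varphi) = P$. The second identity has been observed before, right after Corollary 12.1.9.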
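For a concrete instance where every quantity is explicit, consider, as an illustrative assumption, the full one-sided shift on $d$ symbols with a potential $\varphi(x) = c_{x_0}$ depending only on the first symbol. Then $\mathcal{L}1 = \sum_i e^{c_i}$ is constant, so $h \equiv 1$, $\lambda = \sum_i e^{c_i}$ and, by Corollary 12.1.15, $P(f,\varphi) = \log\lambda$. For the Bernoulli measure $\eta$ with weights $p$, whose entropy is $-\sum_i p_i\log p_i$, the Python sketch checks that $h_\eta(f) + \int\varphi\, d\eta \le P$, with equality for the weights $p_i = e^{c_i}/\lambda$. The vector `c` and the function names are hypothetical; only Bernoulli measures are tested here, not all invariant measures.

```python
import math


def free_energy(p, c):
    """h_eta(f) + integral of phi for the Bernoulli measure eta with weights p,
    when phi(x) = c[x_0] on the full shift."""
    entropy = -sum(pi * math.log(pi) for pi in p if pi > 0)
    return entropy + sum(pi * ci for pi, ci in zip(p, c))


def pressure(c):
    """log(lambda) with lambda = sum_i exp(c_i), the constant value of L1."""
    return math.log(sum(math.exp(ci) for ci in c))


c = [0.0, 1.0, -0.5]
lam = sum(math.exp(ci) for ci in c)
gibbs = [math.exp(ci) / lam for ci in c]   # weights of the expected equilibrium state
for p in ([1 / 3] * 3, [1.0, 0.0, 0.0], gibbs):
    print(p, free_energy(p, c), "<=", pressure(c))
```

The equality case is exactly the equality case of Lemma 10.4.4 with $a_i = c_i$.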

At this point we have completed the proof that the measure $\mu = h\nu$ constructed in the previous section is an equilibrium state for $\varphi$. The statement that follows arises from the same kind of ideas and is the basis for proving that this equilibrium state is unique:

Corollary 12.1.16. If $\eta$ is an equilibrium state for $\varphi$ then $\operatorname{supp}\eta = M$ and
$$J_\eta f = \lambda e^{-\varphi}(h\circ f)/h \quad\text{and}\quad \mathcal{L}^*(\eta/h) = \lambda(\eta/h).$$

Proof. The first claim is an immediate consequence of the second one and Lemma 12.1.4.
Note that the identity in (12.1.22) also implies that equality holds in (12.1.21) for $\eta$-almost every $y \in M$. According to Lemma 12.1.14, that happens if and only if the numbers $b_i = g(x_i)/g_\eta(x_i)$ are all equal. In other words, for $\eta$-almost every $y \in M$ there exists a number $c(y)$ such that
$$\frac{g(x)}{g_\eta(x)} = c(y) \quad\text{for every } x \in f^{-1}(y).$$
Moreover, recalling the identities (12.1.16) and (12.1.17),
$$c(y) = \sum_{x\in f^{-1}(y)} c(y)\,g_\eta(x) = \sum_{x\in f^{-1}(y)} g(x) = 1$$
for $\eta$-almost every $y$. It follows that $g_\eta = g$ at $\eta$-almost every point, and so the function $1/g = \lambda e^{-\varphi}(h\circ f)/h$ is a Jacobian of $f$ with respect to $\eta$. This proves the second claim.
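To prove the third claim, let $\xi : M \to \mathbb{R}$ be any continuous function. On the one hand, by the definition of the transfer operator,
$$\int\xi\, d\mathcal{L}^*\Big(\frac{\eta}{h}\Big) = \int\mathcal{L}\xi\, d\Big(\frac{\eta}{h}\Big) = \int\frac{1}{h(y)}\sum_{x\in f^{-1}(y)} e^{\varphi(x)}\xi(x)\, d\eta(y). \tag{12.1.23}$$
By the definition of the function $g$,
$$\frac{e^{\varphi(x)}}{h(y)} = \frac{\lambda g(x)}{h(x)} \quad\text{for every } x \in f^{-1}(y).$$
Replacing this identity in (12.1.23), we obtain:
$$\int\xi\, d\mathcal{L}^*\Big(\frac{\eta}{h}\Big) = \int\sum_{x\in f^{-1}(y)}\frac{\lambda g\xi}{h}(x)\, d\eta(y). \tag{12.1.24}$$
Then, recalling that $g = g_\eta = 1/J_\eta f$, we may use Exercise 9.7.3 to conclude that
$$\int\xi\, d\mathcal{L}^*\Big(\frac{\eta}{h}\Big) = \int\sum_{x\in f^{-1}(y)}\frac{\lambda g\xi}{h}(x)\, d\eta(y) = \int\frac{\lambda\xi}{h}\, d\eta.$$
Since the continuous function $\xi$ is arbitrary, this shows that $\mathcal{L}^*(\eta/h) = \lambda(\eta/h)$, as stated.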

12.1.6 Uniqueness of the equilibrium state


Let us start by proving the following distortion bound:
Corollary 12.1.17. There exists $K_7 > 0$ such that for every equilibrium state $\eta$, every $n \ge 1$, every $x \in M$ and every $y \in B(x, n+1, \rho)$,
$$K_7^{-1} \le \frac{J_\eta f^n(x)}{J_\eta f^n(y)} \le K_7.$$
Proof. By Corollary 12.1.16,
$$J_\eta f^n = \lambda^n e^{-\varphi_n}\,\frac{h\circ f^n}{h} = J_\nu f^n\,\frac{h\circ f^n}{h}$$
for each $n \ge 1$. Then, using Corollary 12.1.6 and Lemma 12.1.11,
$$K_2^{-1}K_5^{-4} \le \frac{J_\eta f^n(x)}{J_\eta f^n(y)} = \frac{J_\nu f^n(x)}{J_\nu f^n(y)}\cdot\frac{h(f^n(x))\,h(y)}{h(f^n(y))\,h(x)} \le K_2K_5^4.$$
Therefore, it suffices to take $K_7 = K_2K_5^4$.



Lemma 12.1.18. All the equilibrium states of $\varphi$ are equivalent measures.

Proof. Let $\eta_1$ and $\eta_2$ be equilibrium states. Consider any finite partition $\mathcal{P}$ of $M$ such that every $P \in \mathcal{P}$ has non-empty interior and diameter less than $\rho$. Since $\operatorname{supp}\eta_1 = \operatorname{supp}\eta_2 = M$ (by Corollary 12.1.16), the finite set $\{\eta_i(P) : i = 1, 2 \text{ and } P \in \mathcal{P}\}$ consists of positive numbers and so it is bounded from zero. Consequently, there exists $C_1 > 0$ such that
$$\frac{1}{C_1} \le \frac{\eta_1(P)}{\eta_2(P)} \le C_1 \quad\text{for every } P \in \mathcal{P}. \tag{12.1.25}$$
We are going to show that this relation extends to every measurable subset of $M$, up to replacing $C_1$ by a convenient constant $C_2 > C_1$.
For each $n \ge 1$, let $\mathcal{Q}_n$ be the partition of $M$ formed by the images $h^n(P)$ of the elements of $\mathcal{P}$ under the inverse branches $h^n$ of the iterate $f^n$. By the definition of Jacobian, $\eta_i(P) = \int_{h^n(P)} J_{\eta_i}f^n\, d\eta_i$. Hence, using Corollary 12.1.17,
$$K_7^{-1}J_{\eta_i}f^n(x) \le \frac{\eta_i(P)}{\eta_i(h^n(P))} \le K_7\,J_{\eta_i}f^n(x)$$
for any $x \in h^n(P)$. Recalling that $J_{\eta_1}f = J_{\eta_2}f$ (Corollary 12.1.16), it follows that
$$K_7^{-2} \le \frac{\eta_2(P)\,\eta_1(h^n(P))}{\eta_1(P)\,\eta_2(h^n(P))} \le K_7^2. \tag{12.1.26}$$
Combining (12.1.25) and (12.1.26), and taking $C_2 = C_1K_7^2$, we get that
$$\frac{1}{C_2} \le \frac{\eta_1(h^n(P))}{\eta_2(h^n(P))} \le C_2 \tag{12.1.27}$$
for every $P \in \mathcal{P}$, every inverse branch $h^n$ of $f^n$ and every $n \ge 1$. In other words, the property in (12.1.25) holds for every element of $\mathcal{Q}_n$, with $C_2$ in the place of $C_1$.
Now observe that $\operatorname{diam}\mathcal{Q}_n \le 2\sigma^{-n}\rho$ for every $n$. Given any measurable set $B$ and any $\delta > 0$, we may use Proposition A.3.2 to find a compact set $F \subset B$ and an open set $A \supset B$ such that $\eta_i(A\setminus F) < \delta$ for $i = 1, 2$. Let $Q_n$ be the union of all the elements of the partition $\mathcal{Q}_n$ that intersect $F$. It is clear that $Q_n \supset F$ and, assuming that $n$ is large enough, $Q_n \subset A$. Then,
$$\eta_1(B) \le \eta_1(A) < \eta_1(Q_n) + \delta \quad\text{and}\quad \eta_2(B) \ge \eta_2(F) > \eta_2(Q_n) - \delta.$$
The relation (12.1.27) gives that $\eta_1(Q_n) \le C_2\,\eta_2(Q_n)$, since $Q_n$ is a (disjoint) union of elements of $\mathcal{Q}_n$. Combining these three inequalities, we obtain
$$\eta_1(B) < C_2\big(\eta_2(B) + \delta\big) + \delta.$$
Since $\delta$ is arbitrary, we conclude that $\eta_1(B) \le C_2\,\eta_2(B)$ for every measurable set $B \subset M$. Reversing the roles of the two measures, we also get $\eta_2(B) \le C_2\,\eta_1(B)$ for every measurable set $B \subset M$.
These inequalities prove that any two equilibrium states are equivalent measures, with Radon–Nikodym derivatives bounded from zero and infinity.

Combining Lemmas 4.3.3 and 12.1.18 we get that all the ergodic equilibrium states are equal. Now, by Proposition 10.5.5, the ergodic components of any equilibrium state are also equilibrium states (ergodic, of course). It follows that there exists a unique equilibrium state, as stated.
There is an alternative proof of the fact that the equilibrium state is unique
that does not use Proposition 10.5.5 and, thus, does not require the theorem of
Jacobs. Indeed, the results in the next section imply that the equilibrium state
μ = hν in Section 12.1.4 is ergodic. By Lemma 12.1.18, that implies that all
the equilibrium states are ergodic. Using Lemma 4.3.3, it follows that all the
equilibrium states must coincide.
As a consequence, the reference measure ν is also unique: if there were two
distinct reference measures, ν1 and ν2 , then μ1 = hν1 and μ2 = hν2 would be
distinct equilibrium states. Analogously, the positive eigenfunction h is unique
up to multiplication by a positive constant.

12.1.7 Exactness
Finally, let us prove that the system $(f,\mu)$ is exact. Recall that this means that if $B \subset M$ is such that there exist measurable sets $B_n$ satisfying $B = f^{-n}(B_n)$ for every $n \ge 1$, then $B$ has measure 0 or measure 1.
Let $B$ be such a subset of $M$ and assume that $\mu(B) > 0$. Let $\mathcal{P}$ be a finite partition of $M$ by subsets with non-empty interior and diameter less than $\rho$. For each $n$, let $\mathcal{Q}_n$ be the partition of $M$ whose elements are the images $h^n(P)$ of the sets $P \in \mathcal{P}$ under the inverse branches $h^n$ of the iterate $f^n$.

Lemma 12.1.19. For every $\varepsilon > 0$ and every $n \ge 1$ sufficiently large there exists some $h^n(P) \in \mathcal{Q}_n$ such that
$$\mu\big(B\cap h^n(P)\big) > (1-\varepsilon)\,\mu(h^n(P)). \tag{12.1.28}$$

Proof. Fix $\varepsilon > 0$. Since the measure $\mu$ is regular (Proposition A.3.2), given any $\delta > 0$ there exist some compact set $F \subset B$ and some open set $A \supset B$ satisfying $\mu(A\setminus F) < \delta$. Since we assume that $\mu(B) > 0$, this inequality implies that $\mu(F) > (1-\varepsilon)\mu(A)$, as long as $\delta > 0$ is sufficiently small (any $\delta < \varepsilon\mu(B)$ will do, because $\mu(F) > \mu(A) - \delta$ and $\mu(A) \ge \mu(B)$). Fix $\delta$ from now on.
Note that $\operatorname{diam}\mathcal{Q}_n < \sigma^{-n}\rho$. Then, for every $n$ sufficiently large, any element $h^n(P)$ of $\mathcal{Q}_n$ that intersects $F$ is contained in $A$. By contradiction, suppose that (12.1.28) is false for every $h^n(P)$. Then, adding over all the $h^n(P)$ that intersect $F$,
$$\mu(F) \le \sum_{P,h^n}\mu\big(F\cap h^n(P)\big) \le \sum_{P,h^n}\mu\big(B\cap h^n(P)\big) \le (1-\varepsilon)\sum_{P,h^n}\mu(h^n(P)) \le (1-\varepsilon)\,\mu(A).$$
This contradicts the choice of $\delta$, and so (12.1.28) is valid for some $h^n(P) \in \mathcal{Q}_n$.



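Consider any $h^n(P) \in \mathcal{Q}_n$ such that (12.1.28) holds. Since $B = f^{-n}(B_n)$ and $f^n\circ h^n = \mathrm{id}$ in its domain, we have that $f^n\big(h^n(P)\setminus B\big) = P\setminus B_n$. Then, applying Corollary 12.1.17 to the measure $\eta = \mu$,
$$\mu(P\setminus B_n) = \int_{h^n(P)\setminus B} J_\mu f^n\, d\mu \le K_7\,\mu\big(h^n(P)\setminus B\big)\,J_\mu f^n(x)$$
$$\text{and}\quad \mu(P) = \int_{h^n(P)} J_\mu f^n\, d\mu \ge K_7^{-1}\,\mu(h^n(P))\,J_\mu f^n(x) \tag{12.1.29}$$
for any $x \in h^n(P)$. Combining (12.1.28) and (12.1.29),
$$\frac{\mu(P\setminus B_n)}{\mu(P)} \le K_7^2\,\frac{\mu\big(h^n(P)\setminus B\big)}{\mu(h^n(P))} \le K_7^2\,\varepsilon.$$
In this way we have shown that, given any $\varepsilon > 0$ and any $n \ge 1$ sufficiently large, there exists some $P \in \mathcal{P}$ such that $\mu(P\setminus B_n) \le K_7^2\varepsilon\,\mu(P)$.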
Since the partition $\mathcal{P}$ is finite, it follows that there exist some $P \in \mathcal{P}$ and some sequence $(n_j)_j \to \infty$ such that
$$\mu(P\setminus B_{n_j}) \to 0 \quad\text{when } j \to \infty. \tag{12.1.30}$$
Let $P$ be fixed from now on. Since, by assumption, $P$ has non-empty interior and $f$ is topologically exact, there exists $N \ge 1$ such that $f^N(P) = M$. Let $P = P_1\cup\dots\cup P_s$ be a finite partition of $P$ into domains of invertibility of $f^N$. Corollaries 12.1.9 and 12.1.16 give that $J_\mu f^N = \lambda^Ne^{-\varphi_N}(h\circ f^N)/h$ is bounded from zero and infinity. Note also that $f^N(P_i\setminus B_{n_j}) = f^N(P_i)\setminus B_{n_j+N}$, because $f^{-n}(B_n) = B$ for every $n$. Combining these two observations with (12.1.30), we find that, given any $i = 1, \dots, s$, the sequence
$$\mu\big(f^N(P_i)\setminus B_{n_j+N}\big) = \mu\big(f^N(P_i\setminus B_{n_j})\big) = \int_{P_i\setminus B_{n_j}} J_\mu f^N\, d\mu$$
converges to zero when $j \to \infty$. Now, $\{f^N(P_i) : i = 1, \dots, s\}$ is a finite cover of $M$ by measurable sets. Therefore, this last conclusion implies that $\mu(M\setminus B_{n_j+N})$ converges to zero, that is, $\mu(B) = \mu(B_{n_j+N})$ converges to 1 when $j \to \infty$ (recall that $\mu(B) = \mu(f^{-n}(B_n)) = \mu(B_n)$ for every $n$, by invariance). That means that $\mu(B) = 1$, of course.
The proof of Theorem 12.1 is complete.

12.1.8 Absolutely continuous measures


In this last section on the theorem of Ruelle we briefly discuss the special case when $f : M \to M$ is a local diffeomorphism on a compact Riemannian manifold and $\varphi = -\log|\det Df|$. It is assumed that $f$ is such that this potential $\varphi$ is Hölder. The first goal is to compare the conclusions of the theorem of Ruelle in this case with the results in Section 11.1:

Proposition 12.1.20. The invariant absolutely continuous probability measure coincides with the equilibrium state $\mu$ of the potential $\varphi = -\log|\det Df|$. Consequently, it is equivalent to the Lebesgue measure $m$, with density $d\mu/dm$ Hölder and bounded from zero and infinity, and it is exact.

Proof. We saw in Example 12.1.2 that the Lebesgue measure $m$ is an eigenvector of the dual $\mathcal{L}^*$ of the transfer operator associated with the potential $\varphi = -\log|\det Df|$: more precisely,
$$\mathcal{L}^*m = m.$$
Applying the previous theory (from Lemma 12.1.3 on) with $\lambda = 1$ and $\nu = m$, we find a Hölder function $h : M \to \mathbb{R}$, bounded from zero and infinity, such that $\mathcal{L}h = h$ and the measure $\mu = hm$ is the equilibrium state of the potential $\varphi$. Recalling Corollary 11.1.15, it follows that $\mu$ is also the unique probability measure invariant under $f$ and absolutely continuous with respect to $m$. The fact that $h$ is positive implies that $\mu$ and $m$ are equivalent measures. Exactness was proven in Section 12.1.7.

It is worthwhile pointing out that, while the absolutely continuous invariant measure is unique (Theorem 11.1.2), the potential $\varphi = -\log|\det Df|$ depends on the choice of the Riemannian metric on $M$, because the determinant does. So, Proposition 12.1.20 also implies that all the potentials of this form, corresponding to different choices of the Riemannian metric, have the same equilibrium state. This type of situation is the subject of Section 12.2 and, in particular, Exercise 12.2.3.
It also follows from the proof of Theorem 12.1 that
$$h_\mu(f) - \int\log|\det Df|\, d\mu = P(f, -\log|\det Df|) = \log\lambda = 0. \tag{12.1.31}$$
Let $\tilde\varphi$ be the time average of the function $\varphi$, given by the Birkhoff ergodic theorem. Then,
$$\int\log|\det Df|\, d\mu = \int -\varphi\, d\mu = \int -\tilde\varphi\, d\mu. \tag{12.1.32}$$
Moreover,
$$-\tilde\varphi(x) = \lim_n\frac{1}{n}\sum_{j=0}^{n-1}\log|\det Df(f^j(x))| = \lim_n\frac{1}{n}\log|\det Df^n(x)| \tag{12.1.33}$$
at $\mu$-almost every point. In the context of our comments about the Oseledets theorem (see the relation (c1) in Section 3.3.5) we mentioned that
$$\lim_n\frac{1}{n}\log|\det Df^n(x)| = \sum_{i=1}^{k(x)} d_i(x)\lambda_i(x), \tag{12.1.34}$$
where $\lambda_1(x), \dots, \lambda_{k(x)}(x)$ are the Lyapunov exponents of the transformation $f$ at the point $x$ and $d_1(x), \dots, d_{k(x)}(x)$ are the corresponding multiplicities.

Combining the relations (12.1.31)–(12.1.34), we find that
$$h_\mu(f) = \int\sum_{i=1}^{k(x)} d_i(x)\lambda_i(x)\, d\mu(x). \tag{12.1.35}$$
Since these functions are invariant (see the relation (a1) in Section 3.3.5) and the measure $\mu$ is ergodic, the functions $k(x)$, $\lambda_i(x)$ and $d_i(x)$ are constant at $\mu$-almost every point. Let us denote by $k$, $\lambda_i$ and $d_i$ these constants. Then (12.1.35) translates into the following theorem:

Theorem 12.1.21. Let $f : M \to M$ be an expanding map on a compact Riemannian manifold, such that the derivative $Df$ is Hölder. Let $\mu$ be the unique invariant probability measure absolutely continuous with respect to the Lebesgue measure on $M$. Then
$$h_\mu(f) = \sum_{i=1}^k d_i\lambda_i, \tag{12.1.36}$$
where $\lambda_i$, $i = 1, \dots, k$ are the Lyapunov exponents of $f$ at $\mu$-almost every point and $d_i$, $i = 1, \dots, k$ are the corresponding multiplicities.

As we pointed out before, in Section 9.4.4, this is a special instance of the Pesin entropy formula (Theorem 9.4.5).

12.1.9 Exercises
12.1.1. Show that if $\eta$ is a Borel measure on a compact metric space then for every $\varepsilon > 0$ there exists $b > 0$ such that $\eta(B(y,\varepsilon)) > b$ for every $y \in \operatorname{supp}\eta$.
12.1.2. Let $f : M \to M$ be an expanding map. Consider the non-linear operator $G : \mathcal{M}_1(M) \to \mathcal{M}_1(M)$ defined in the space $\mathcal{M}_1(M)$ of all Borel probability measures by
$$G(\eta) = \frac{\mathcal{L}^*(\eta)}{\int\mathcal{L}1\, d\eta}.$$
Use the Tychonoff–Schauder theorem (Theorem 2.2.3) to show that $G$ admits some fixed point and deduce Lemma 12.1.1.
12.1.3. Let $\sigma : \Sigma_A \to \Sigma_A$ be the one-sided shift of finite type associated with a given transition matrix $A$ (recall Section 10.2.2). Let $P$ be a stochastic matrix such that $P_{i,j} = 0$ whenever $A_{i,j} = 0$ and $p$ be a probability vector with positive coefficients such that $P^*p = p$. Consider the transfer operator $\mathcal{L}$ associated with the locally constant potential
$$\varphi(i_0, i_1, \dots, i_n, \dots) = -\log\frac{p_{i_1}}{p_{i_0}P_{i_0,i_1}}.$$
Show that the Markov measure $\mu$ associated with the matrix $P$ and the vector $p$ satisfies $\mathcal{L}^*\mu = \mu$.
12.1.4. Let $\lambda$ be any positive number and $\nu$ be a Borel probability measure such that $\mathcal{L}^*\nu = \lambda\nu$. Show that, given any $u \in L^1(\nu)$ and any continuous function $v : M \to \mathbb{R}$,
$$\int(u\circ f)v\, d\nu = \int u\,(\lambda^{-1}\mathcal{L}v)\, d\nu.$$
12.2 Theorem of Livšic


Now we discuss the following issue: when is it the case that two different Hölder potentials $\phi$ and $\psi$ have the same equilibrium state? Observe that, since these are ergodic measures, the two equilibrium states $\mu_\phi$ and $\mu_\psi$ either coincide or are mutually singular (by Lemma 4.3.3).
Recall that two potentials are said to be cohomologous (with respect to $f$) if the difference between them may be written as $u\circ f - u$ for some continuous function $u : M \to \mathbb{R}$.

Theorem 12.2.1 (Livšic). A potential $\varphi : M \to \mathbb{R}$ is cohomologous to zero if and only if $\varphi_n(x) = 0$ for every $x \in \operatorname{Fix}(f^n)$ and every $n \ge 1$.

Proof. It is clear that if $\varphi = u\circ f - u$ for some $u$ then
$$\varphi_n(x) = \sum_{j=1}^{n} u(f^j(x)) - \sum_{j=0}^{n-1} u(f^j(x)) = u(f^n(x)) - u(x) = 0$$
for every $x \in M$ such that $f^n(x) = x$. The converse is a lot more interesting.
Suppose that $\varphi_n(x) = 0$ for every $x \in \operatorname{Fix}(f^n)$ and every $n \ge 1$. Consider any point $z \in M$ whose orbit is dense in $M$; such a point exists because $f$ is topologically exact and, consequently, transitive. Define the function $u$ on the orbit of $z$ through the following relation:
$$u(f^n(z)) = u(z) + \varphi_n(z), \tag{12.2.1}$$
where $u(z)$ is arbitrary. Observe that
$$u(f^{n+1}(z)) - u(f^n(z)) = \varphi_{n+1}(z) - \varphi_n(z) = \varphi(f^n(z)) \tag{12.2.2}$$
for every $n \ge 0$. In other words, the cohomology relation
$$\varphi = u\circ f - u \tag{12.2.3}$$
holds on the orbit of $z$. To extend this relation to the whole of $M$, we use the following fact:

Lemma 12.2.2. The function $u$ is uniformly continuous on the orbit of $z$.

Proof. Given $\varepsilon \in (0,\rho)$, take $\delta > 0$ given by the shadowing lemma (Proposition 11.2.9). Suppose that $k \ge 0$ and $l \ge 1$ are such that $d(f^k(z), f^{k+l}(z)) < \delta$. Then the periodic sequence $(x_n)_n$ of period $l$ given by
$$x_0 = f^k(z),\ x_1 = f^{k+1}(z),\ \dots,\ x_{l-1} = f^{k+l-1}(z),\ x_l = f^k(z)$$
is a $\delta$-pseudo-orbit. Hence, by Proposition 11.2.9, there exists $x \in \operatorname{Fix}(f^l)$ such that $d(f^j(x), f^{k+j}(z)) < \varepsilon$ for every $j \ge 0$. Since we took $\varepsilon < \rho$, this also implies that $x = h^l(f^l(x))$, where $h^l : B(f^{k+l}(z),\rho) \to M$ denotes the inverse branch of $f^l$ that maps $f^{k+l}(z)$ to $f^k(z)$. By (11.2.6), it follows that
$$d(f^j(x), f^{k+j}(z)) \le \sigma^{j-l}\,d(f^l(x), f^{k+l}(z)) \quad\text{for every } 0 \le j \le l. \tag{12.2.4}$$
By the definition (12.2.1),
$$u(f^{k+l}(z)) - u(f^k(z)) = \varphi_{k+l}(z) - \varphi_k(z) = \varphi_l(f^k(z)). \tag{12.2.5}$$
Fix constants $C > 0$ and $\nu > 0$ such that $|\varphi(x) - \varphi(y)| \le C\,d(x,y)^\nu$ for any $x, y \in M$. Then,
$$|\varphi_l(f^k(z)) - \varphi_l(x)| \le \sum_{j=0}^{l-1}|\varphi(f^{k+j}(z)) - \varphi(f^j(x))| \le \sum_{j=0}^{l-1} C\,d(f^j(x), f^{k+j}(z))^\nu.$$
Using (12.2.4) and the fact that $f^l(x) = x$, it follows that
$$|\varphi_l(f^k(z)) - \varphi_l(x)| \le \sum_{j=0}^{l-1} C\sigma^{\nu(j-l)}\,d(x, f^{k+l}(z))^\nu \le C_1\varepsilon^\nu, \tag{12.2.6}$$
where $C_1 = C\sum_{i=0}^\infty\sigma^{-i\nu}$. Recall that, by assumption, $\varphi_l(x) = 0$. Hence, combining (12.2.5) and (12.2.6), we find that $|u(f^{k+l}(z)) - u(f^k(z))| \le C_1\varepsilon^\nu$. This completes the proof of the lemma.

It follows from Lemma 12.2.2 that $u$ admits a (unique) continuous extension to the closure of the orbit of $z$, that is, the ambient space $M$. Then, by continuity of $\varphi$ and $u$, the cohomology relation (12.2.3) extends to the whole $M$. This proves Theorem 12.2.1.
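The periodic-orbit criterion of Theorem 12.2.1 is easy to test numerically for the doubling map $f(x) = 2x \bmod 1$, whose points of $\operatorname{Fix}(f^n)$ are exactly $k/(2^n-1)$ with $0 \le k < 2^n-1$. The Python sketch below uses an arbitrarily chosen smooth function $u$ (a hypothetical choice) and checks that the coboundary $\varphi = u\circ f - u$ has vanishing Birkhoff sums $\varphi_n$ on all these points for $n \le 8$, while $\varphi(x) = \cos 2\pi x$ does not, since its value at the fixed point $0$ is $1$; by the theorem, the latter is therefore not cohomologous to zero. Floating-point arithmetic only approximates the orbit, so "vanishing" means up to rounding.

```python
import math


def periodic_orbit(k, n):
    """Orbit x, f(x), ..., f^{n-1}(x) of x = k / (2^n - 1) in Fix(f^n),
    for the doubling map f(x) = 2x mod 1; the orbit is computed with exact integers."""
    q = 2 ** n - 1
    return [((k * 2 ** j) % q) / q for j in range(n)]


def birkhoff_sum(phi, orbit):
    """phi_n(x) = phi(x) + phi(f(x)) + ... + phi(f^{n-1}(x)) along a listed orbit."""
    return sum(phi(x) for x in orbit)


def u(x):
    return math.cos(2 * math.pi * x) + 0.3 * math.sin(4 * math.pi * x)


def coboundary(x):
    return u((2 * x) % 1.0) - u(x)  # phi = u o f - u


def not_coboundary(x):
    return math.cos(2 * math.pi * x)


worst = max(abs(birkhoff_sum(coboundary, periodic_orbit(k, n)))
            for n in range(1, 9) for k in range(2 ** n - 1))
print("largest |phi_n| on Fix(f^n), n <= 8:", worst)            # rounding level only
print("cos(2 pi x) at the fixed point 0:", not_coboundary(0.0))  # 1.0, not 0
```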

Theorem 12.2.3. Let $f : M \to M$ be a topologically exact expanding map on a compact metric space and $\phi$ and $\psi$ be two Hölder potentials in $M$. The following conditions are equivalent:

(i) $\mu_\phi = \mu_\psi$;
(ii) there exist $c \in \mathbb{R}$ and a function $u : M \to \mathbb{R}$, not necessarily continuous, such that $\phi - \psi = c + u\circ f - u$;
(iii) $\phi - \psi$ is cohomologous to some constant $c \in \mathbb{R}$;
(iv) there exist $c \in \mathbb{R}$ and a Hölder function $u : M \to \mathbb{R}$ such that $\phi - \psi = c + u\circ f - u$;
(v) there exists $c \in \mathbb{R}$ such that $\phi_n(x) - \psi_n(x) = cn$ for every $x \in \operatorname{Fix}(f^n)$ and $n \ge 1$.

Moreover, the constants $c \in \mathbb{R}$ in (ii), (iii), (iv) and (v) coincide; indeed, they are all equal to $P(f,\phi) - P(f,\psi)$.

Proof. It is clear that (iv) implies (iii) and (iii) implies (ii).

If $\phi - \psi = c + u\circ f - u$ for some function $u$ then, given any $x \in \operatorname{Fix}(f^n)$,
$$\phi_n(x) - \psi_n(x) = \sum_{j=0}^{n-1}(\phi - \psi)(f^j(x)) = \sum_{j=0}^{n-1}\big[c + u(f^{j+1}(x)) - u(f^j(x))\big].$$
Since $f^n(x) = x$, the sum of the last two terms over every $j = 0, \dots, n-1$ vanishes. Therefore, $\phi_n(x) - \psi_n(x) = cn$. This proves that (ii) implies (v).
Suppose that $\phi_n(x) - \psi_n(x) = cn$ for every $x \in \operatorname{Fix}(f^n)$ and every $n \ge 1$. That means that the function $\varphi = \phi - \psi - c$ satisfies $\varphi_n(x) = 0$ for every $x \in \operatorname{Fix}(f^n)$ and every $n \ge 1$. Note also that $\varphi$ is Hölder. Hence, by Theorem 12.2.1, there exists a continuous function $u : M \to \mathbb{R}$ such that $\varphi = u\circ f - u$. In other words, $\phi - \psi$ is cohomologous to $c$. This shows that (v) implies (iii).
It follows from (10.3.4) and Proposition 10.3.8 that if $\phi$ is cohomologous to $\psi + c$ then
$$P(f,\phi) = P(f,\psi + c) = P(f,\psi) + c.$$
On the other hand, given any invariant probability measure $\nu$, the integral of $u\circ f - u$ with respect to $\nu$ vanishes, and so
$$h_\nu(f) + \int\phi\, d\nu = h_\nu(f) + \int(\psi + c)\, d\nu = h_\nu(f) + \int\psi\, d\nu + c.$$
Therefore, $\nu$ is an equilibrium state for $\phi$ if and only if $\nu$ is an equilibrium state for $\psi$. This shows that (iii) implies (i).
If $\mu_\phi$ and $\mu_\psi$ coincide then they have the same Jacobian, of course. By Lemma 12.1.12, this means that
$$\lambda_\phi e^{-\phi}\,\frac{h_\phi\circ f}{h_\phi} = \lambda_\psi e^{-\psi}\,\frac{h_\psi\circ f}{h_\psi}. \tag{12.2.7}$$
Let $c = \log\lambda_\phi - \log\lambda_\psi$ and $u = \log h_\phi - \log h_\psi$. Both are well defined, since $\lambda_\phi$, $\lambda_\psi$, $h_\phi$ and $h_\psi$ are all positive. Moreover, since the functions $h_\phi$ and $h_\psi$ are Hölder and bounded from zero and infinity (Corollary 12.1.9), the function $u$ is Hölder. Finally, taking logarithms, (12.2.7) may be rewritten as follows:
$$\phi - \psi = c + u\circ f - u.$$
This shows that (i) implies (iv). The proof of the theorem is complete.

Here is an interesting consequence in the differentiable setting:

Corollary 12.2.4. Let $f : M \to M$ be a differentiable expanding map on a compact Riemannian manifold such that the Jacobian $\det Df$ is Hölder. The absolutely continuous invariant probability measure $\mu$ coincides with the measure of maximum entropy if and only if there exists $c \in \mathbb{R}$ such that $|\det Df^n(x)| = e^{cn}$ for every $x \in \operatorname{Fix}(f^n)$ and every $n \ge 1$.

Proof. As we saw in Proposition 12.1.20, $\mu$ is the equilibrium state of the potential $\varphi = -\log|\det Df|$. It is clear that the measure of maximum entropy $\mu_0$ is the equilibrium state of the zero function. Observe that
$$\varphi_n(x) = -\sum_{j=0}^{n-1}\log|\det Df(f^j(x))| = -\log|\det Df^n(x)|.$$
Therefore, Theorem 12.2.3 gives that $\mu = \mu_0$ if and only if there exists some number $c' \in \mathbb{R}$ such that $-\log|\det Df^n(x)| = 0 + c'n$ for every $x \in \operatorname{Fix}(f^n)$ and every $n \ge 1$; this is the stated condition, with $c = -c'$.

12.2.1 Exercises
12.2.1. Consider the two-sided shift $\sigma : \Sigma \to \Sigma$ in $\Sigma = \{1, \dots, d\}^{\mathbb{Z}}$. Show that for every Hölder function $\varphi : \Sigma \to \mathbb{R}$, there exists a Hölder function $\varphi^+ : \Sigma \to \mathbb{R}$, cohomologous to $\varphi$ and such that $\varphi^+(x) = \varphi^+(y)$ whenever $x = (x_i)_{i\in\mathbb{Z}}$ and $y = (y_i)_{i\in\mathbb{Z}}$ are such that $x_i = y_i$ for $i \ge 0$.
12.2.2. Prove that if the functions $\varphi, \psi : M \to \mathbb{R}$ are such that there exist constants $C, L$ satisfying $|\varphi_n(x) - \psi_n(x) - nC| \le L$ for every $x \in M$, then $P(f,\varphi) = P(f,\psi) + C$ and $\varphi$ is cohomologous to $\psi + C$.
12.2.3. Let $f : M \to M$ be a differentiable expanding map on a compact manifold, with Hölder derivative. Check that any two potentials of the form $\varphi = -\log|\det Df|$, for two different choices of a Riemannian metric on $M$, are cohomologous.
[Observation: In particular, all such potentials have the same equilibrium state, namely, the absolutely continuous invariant probability measure. This was observed before, in Section 12.1.8.]
12.2.4. Given $k \ge 2$, let $f : S^1 \to S^1$ be the (expanding) map given by $f(x) = kx \bmod \mathbb{Z}$. Let $g : S^1 \to S^1$ be a differentiable expanding map of degree $k$. Show that $f$ and $g$ are topologically conjugate.
12.2.5. Given $k \ge 2$, let $f : S^1 \to S^1$ be the map given by $f(x) = kx \bmod \mathbb{Z}$. Let $g : S^1 \to S^1$ be a differentiable expanding map of degree $k$, with Hölder derivative. Show that the following conditions are equivalent:
(a) $f$ and $g$ are conjugated by some diffeomorphism;
(b) $f$ and $g$ are conjugated by some absolutely continuous homeomorphism whose inverse is also absolutely continuous;
(c) $(g^n)'(p) = k^n$ for every $p \in \operatorname{Fix}(g^n)$.

12.3 Decay of correlations


Let $f : M \to M$ be a topologically exact expanding map and $\varphi : M \to \mathbb{R}$ be a Hölder potential. As before, we denote by $\nu$ the reference measure (Section 12.1.1) and by $\mu$ the equilibrium state (Section 12.1.4) of the potential $\varphi$. Recall that $\mu = h\nu$, where the function $h$ is bounded from zero and infinity (Corollary 12.1.9). In particular, $L^1(\mu) = L^1(\nu)$.
Given $b > 0$ and $\beta > 0$, we say that a function $g : M \to \mathbb{R}$ is $(b,\beta)$-Hölder if
$$|g(x) - g(y)| \le b\,d(x,y)^\beta \quad\text{for any } x, y \in M. \tag{12.3.1}$$
We say that $g$ is $\beta$-Hölder if it is $(b,\beta)$-Hölder for some $b > 0$. Then we denote by $H_\beta(g)$ the smallest of such constants $b$. Moreover, fixing $\rho > 0$ as in (11.2.1), we denote by $H_{\beta,\rho}(g)$ the smallest constant $b$ such that the inequality in (12.3.1) holds for any $x, y \in M$ with $d(x,y) < \rho$.
The correlations sequence of two functions $g_1$ and $g_2$, with respect to the invariant measure $\mu$, was defined in (7.1.1):
$$C_n(g_1,g_2) = \Big|\int(g_1\circ f^n)g_2\, d\mu - \int g_1\, d\mu\int g_2\, d\mu\Big|.$$
We also consider a similar notion for the reference measure $\nu$:
$$B_n(g_1,g_2) = \Big|\int(g_1\circ f^n)g_2\, d\nu - \int g_1\, d\mu\int g_2\, d\nu\Big|.$$
In this section we prove that these sequences decay exponentially.

Theorem 12.3.1 (Exponential convergence to equilibrium). Given $\beta \in (0,\alpha]$, there exists $\Lambda < 1$ and for every $\beta$-Hölder function $g_2 : M \to \mathbb{C}$ there exists $K_1(g_2) > 0$ such that
$$B_n(g_1,g_2) \le K_1(g_2)\,\Lambda^n\int|g_1|\, d\nu \quad\text{for every } g_1 \in L^1(\nu) \text{ and every } n \ge 1.$$

The proof is presented in Sections 12.3.1 through 12.3.3. It provides an explicit expression for the factor $K_1(g_2)$. Observe also that
$$B_n(g_1,g_2) = \Big|\int g_1\, d\big(f^n_*(g_2\nu)\big) - \int g_1\, d\Big(\mu\int g_2\, d\nu\Big)\Big|.$$
Then, the conclusion of Theorem 12.3.1 may be interpreted as follows: the iterates of any measure of the form $g_2\nu$ converge to the invariant measure $\mu\int g_2\, d\nu$ exponentially fast.

Theorem 12.3.2 (Exponential decay of correlations). For every $\beta \in (0,\alpha]$ there exists $\Lambda < 1$ and for every $\beta$-Hölder function $g_2 : M \to \mathbb{C}$ there exists $K_2(g_2) > 0$ such that
$$C_n(g_1,g_2) \le K_2(g_2)\,\Lambda^n\int|g_1|\, d\mu \quad\text{for every } g_1 \in L^1(\mu) \text{ and every } n \ge 1.$$
In particular, given any pair $g_1$ and $g_2$ of $\beta$-Hölder functions, there exists $K(g_1,g_2) > 0$ such that $C_n(g_1,g_2) \le K(g_1,g_2)\,\Lambda^n$ for every $n \ge 1$.

Proof. Recall that $\mu = h\nu$ and, according to Corollary 12.1.9, the function $h$ is $\alpha$-Hölder and satisfies $K_5^{-1} \le h \le K_5$ for some $K_5 > 0$. Hence (see Exercise 12.3.5), $g_2$ is $\beta$-Hölder if and only if $g_2h$ is $\beta$-Hölder. Moreover,
$$C_n(g_1,g_2) = \Big|\int(g_1\circ f^n)g_2\, d\mu - \int g_1\, d\mu\int g_2\, d\mu\Big| = \Big|\int(g_1\circ f^n)(g_2h)\, d\nu - \int g_1\, d\mu\int(g_2h)\, d\nu\Big| = B_n(g_1, g_2h).$$
Therefore, it follows from Theorem 12.3.1 that
$$C_n(g_1,g_2) \le K_1(g_2h)\,\Lambda^n\int|g_1|\, d\nu \le K_5K_1(g_2h)\,\Lambda^n\int|g_1|\, d\mu,$$
because $\nu = h^{-1}\mu \le K_5\,\mu$. This proves the first part of the theorem, with $K_2(g_2) = K_5K_1(g_2h)$. The second part is an immediate consequence: if $g_1$ is $\beta$-Hölder then $g_1 \in L^1(\mu)$ and it suffices to take $K(g_1,g_2) = K_2(g_2)\int|g_1|\, d\mu$.
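A toy computation makes the exponential decay visible. For the doubling map $f(x) = 2x \bmod 1$ we have $\varphi = -\log|\det Df| = -\log 2$, so by Proposition 12.1.20 the equilibrium state is the Lebesgue measure. Taking, as an illustrative assumption, $g_1(x) = \cos 2\pi x$ and $g_2(x) = x(1-x)$ (both Lipschitz on the circle), integration by parts gives $\int_0^1 x(1-x)\cos(2\pi mx)\, dx = -1/(2\pi^2m^2)$ for every integer $m \ge 1$, and $\int g_1\, d\mu = 0$; hence $C_n(g_1,g_2) = 1/(2\pi^2 4^n)$, so $\Lambda = 1/4$ works for this particular pair. The Python sketch approximates $C_n$ with a midpoint rule and compares it with that formula; the grid size `N` is an arbitrary choice.

```python
import math


def correlation(g1, g2, n, N=2 ** 14):
    """C_n(g1, g2) for the doubling map and Lebesgue measure,
    approximated by the N-point midpoint rule on [0, 1)."""
    xs = [(i + 0.5) / N for i in range(N)]
    fn = [(2 ** n * x) % 1.0 for x in xs]  # f^n(x) = 2^n x mod 1
    mean = lambda values: sum(values) / N
    joint = mean([g1(y) * g2(x) for x, y in zip(xs, fn)])
    return abs(joint - mean([g1(x) for x in xs]) * mean([g2(x) for x in xs]))


g1 = lambda x: math.cos(2 * math.pi * x)
g2 = lambda x: x * (1 - x)
for n in range(1, 6):
    exact = 1 / (2 * math.pi ** 2 * 4 ** n)
    print(n, correlation(g1, g2, n), exact)
```

The rate $1/4$ is specific to these two observables; Theorem 12.3.2 only asserts that some $\Lambda < 1$, depending on $\beta$, works for all $\beta$-Hölder $g_2$.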

Before we move to prove Theorem 12.3.1, let us make a few quick comments. The issue of decay of correlations was already discussed in Section 7.4, from the viewpoint of the spectral gap property. Here we introduce a different approach. The proof of the theorem that we are going to present is based on the notion of projective distance associated with a cone, which was introduced by Garrett Birkhoff [Bir67]. This tool allows us to obtain exponential convergence to equilibrium (which yields exponential decay of correlations, as we have just shown) without having to analyze the spectrum of the transfer operator. Incidentally, this can also be used to deduce that the spectral gap property does hold in the present context. We will come back to this theme near the end of Section 12.3.

12.3.1 Projective distances


Let $E$ be a Banach space. We call a cone any convex subset $C$ of $E$ such that
$$tC \subset C \text{ for every } t > 0 \quad\text{and}\quad \bar C\cap(-\bar C) = \{0\}, \tag{12.3.2}$$
where $\bar C$ denotes the closure of $C$ (previously we considered only closed cones but at this point it is convenient to loosen that requirement). Given $v_1, v_2 \in C$, define
$$\alpha(v_1,v_2) = \sup\{t > 0 : v_2 - tv_1 \in C\} \quad\text{and}\quad \beta(v_1,v_2) = \inf\{s > 0 : sv_1 - v_2 \in C\}.$$
Figure 12.1 helps illustrate the geometric meaning of these numbers. By convention, $\alpha(v_1,v_2) = 0$ if $v_2 - tv_1 \notin C$ for every $t > 0$ and $\beta(v_1,v_2) = +\infty$ if $sv_1 - v_2 \notin C$ for every $s > 0$.

[Figure 12.1. Defining the projective distance in a cone C. Labels in the figure: $0$, $C$, $v_1$, $v_2$ and $v_2 - \alpha(v_1,v_2)v_1$.]

Note that $\alpha(v_1,v_2)$ is always finite. Indeed, $\alpha(v_1,v_2) = +\infty$ would mean that there exists a sequence $(t_n)_n \to +\infty$ with $v_2 - t_nv_1 \in C$ for every $n$. Then, $s_n = 1/t_n$ would be a sequence of positive numbers converging to zero and such that $s_nv_2 - v_1 \in C$ for every $n$. This would imply that $-v_1 \in \bar C$, which would contradict the second condition in (12.3.2) when $v_1 \ne 0$. An analogous argument shows that $\beta(v_1,v_2)$ is always positive: $\beta(v_1,v_2) = 0$ would imply $-v_2 \in \bar C$.
Given any cone $C \subset E$ and any $v_1, v_2 \in C\setminus\{0\}$, we define
$$\theta(v_1,v_2) = \log\frac{\beta(v_1,v_2)}{\alpha(v_1,v_2)}, \tag{12.3.3}$$
with $\theta(v_1,v_2) = +\infty$ whenever $\alpha(v_1,v_2) = 0$ or $\beta(v_1,v_2) = +\infty$. The remarks in the previous paragraph ensure that $\theta(v_1,v_2)$ is always well defined. We call $\theta$ the projective distance associated with the cone $C$. This terminology is justified by the proposition that follows, which shows that $\theta$ defines a distance in the projective quotient of $C\setminus\{0\}$, that is, in the set of equivalence classes of the relation $\sim$ defined by $v_1 \sim v_2 \Leftrightarrow v_1 = tv_2$ for some $t > 0$.

Proposition 12.3.3. If C is a cone then

(i) θ (v1 , v2 ) = θ (v2 , v1 ) for any v1 , v2 ∈ C;


(ii) θ (v1 , v2 ) + θ (v2 , v3 ) ≥ θ (v1 , v3 ) for any v1 , v2 , v3 ∈ C;
(iii) θ (v1 , v2 ) ≥ 0 for any v1 , v2 ∈ C;
(iv) θ (v1 , v2 ) = 0 if and only if there exists t > 0 such that v1 = tv2 ;
(v) θ (t1 v1 , t2 v2 ) = θ (v1 , v2 ) for any v1 , v2 ∈ C and t1 , t2 > 0.

Proof. If α(v2, v1) > 0 then
$$\alpha(v_2, v_1) = \sup\{t > 0 : v_1 - t v_2 \in C\} = \sup\Big\{t > 0 : \tfrac{1}{t} v_1 - v_2 \in C\Big\} = \big(\inf\{s > 0 : s v_1 - v_2 \in C\}\big)^{-1} = \beta(v_1, v_2)^{-1}.$$
Moreover,
$$\alpha(v_2, v_1) = 0 \iff v_1 - t v_2 \notin C \text{ for every } t > 0 \iff s v_1 - v_2 \notin C \text{ for every } s > 0 \iff \beta(v_1, v_2) = +\infty.$$
Therefore, α(v2 , v1 ) = β(v1 , v2 )−1 in all cases. Exchanging the roles of v1 and
v2 , we also get that β(v2 , v1 ) = α(v1 , v2 )−1 for any v1 , v2 ∈ C. Part (i) of the
proposition is an immediate consequence of these observations.
Next, we claim that α(v1 , v2 )α(v2 , v3 ) ≤ α(v1 , v3 ) for any v1 , v2 , v3 ∈ C. This
is obvious if α(v1 , v2 ) = 0 or α(v2 , v3 ) = 0; therefore, we may suppose that
α(v1 , v2 ) > 0 and α(v2 , v3 ) > 0. Then, by definition, there exist increasing
sequences of positive numbers (rn )n → α(v1 , v2 ) and (sn )n → α(v2 , v3 ) such
that
$$v_2 - r_n v_1 \in C \quad \text{and} \quad v_3 - s_n v_2 \in C \quad \text{for every } n \ge 1.$$
406 Thermodynamic formalism

Since C is convex and invariant under multiplication by positive numbers, it follows that v3 − sn rn v1 = (v3 − sn v2) + sn(v2 − rn v1) ∈ C and so sn rn ≤ α(v1, v3), for


every n ≥ 1. Passing to the limit as n → +∞, we get the claim. An analogous
argument shows that β(v1 , v2 )β(v2 , v3 ) ≥ β(v1 , v3 ) for any v1 , v2 , v3 ∈ C. Part
(ii) of the proposition follows immediately from these inequalities.
Part (iii) means, simply, that α(v1 , v2 ) ≤ β(v1 , v2 ) for any v1 , v2 ∈ C. To prove
this fact, consider t > 0 and s > 0 such that v2 − tv1 ∈ C and sv1 − v2 ∈ C. Then,
by convexity, (s−t)v1 ∈ C. If s−t were negative, then we would have −v1 ∈ C,
which would contradict the last part of (12.3.2). Therefore, s ≥ t for any t and
s as above. This implies that α(v1 , v2 ) ≤ β(v1 , v2 ).
Let v1 , v2 ∈ C be such that θ (v1 , v2 ) = 0. Then, α(v1 , v2 ) = β(v1 , v2 ) = γ for
some γ ∈ (0, +∞). Hence, there exist an increasing sequence (tn )n → γ and a
decreasing sequence (sn )n → γ with

$$v_2 - t_n v_1 \in C \quad \text{and} \quad s_n v_1 - v_2 \in C \quad \text{for every } n \ge 1.$$

Writing v2 − tn v1 = (v2 − γ v1 ) + (γ − tn )v1 , we conclude that v2 − γ v1 is in


the closure C̄ of C. Analogously, γ v1 − v2 ∈ C̄. By the second part of (12.3.2),
it follows that v2 − γ v1 = 0. This proves part (iv) of the proposition.
Finally, consider any t1, t2 > 0 and v1, v2 ∈ C. By definition,
$$\alpha(t_1 v_1, t_2 v_2) = \frac{t_2}{t_1}\,\alpha(v_1, v_2) \quad \text{and} \quad \beta(t_1 v_1, t_2 v_2) = \frac{t_2}{t_1}\,\beta(v_1, v_2).$$
Hence, θ (t1 v1 , t2 v2 ) = θ (v1 , v2 ), as stated in part (v) of the proposition.

Example 12.3.4. Consider the cone C = {(x, y) ∈ E : y > |x|} in E = R2 . The


projective quotient of C may be identified with the interval (−1, 1), through
(x, 1) → x. Given −1 < x1 ≤ x2 < 1, we have:

$$\alpha\big((x_1, 1), (x_2, 1)\big) = \sup\{t > 0 : (x_2, 1) - t(x_1, 1) \in C\} = \sup\{t > 0 : 1 - t > |x_2 - t x_1|\} = \frac{1 - x_2}{1 - x_1}$$
and, analogously,
$$\beta\big((x_1, 1), (x_2, 1)\big) = \frac{1 + x_2}{1 + x_1}.$$
Therefore,
$$\theta\big((x_1, 1), (x_2, 1)\big) = \log R(-1, x_1, x_2, 1), \tag{12.3.4}$$
where
$$R(a, b, c, d) = \frac{(c - a)(d - b)}{(b - a)(d - c)}$$
denotes the cross-ratio of four real numbers a < b ≤ c < d.

In Exercise 12.3.2 we invite the reader to check a similar fact when the
interval is replaced by the unit disk D = {z ∈ C : |z| < 1}.
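As a sanity check of Example 12.3.4, the sketch below approximates α and β directly from their definitions by bisection. It relies on the fact, used in the text, that the admissible values of t (respectively s) form an interval. It then compares θ with the logarithm of the cross-ratio in (12.3.4). The helper names and sample points are illustrative choices, not notation from the text.

```python
import math

def in_cone(v):
    # Open cone C = {(x, y) : y > |x|} of Example 12.3.4.
    x, y = v
    return y > abs(x)

def alpha(v1, v2, iters=200):
    # sup{t > 0 : v2 - t v1 in C}; the admissible t form an interval (0, alpha).
    ok = lambda t: in_cone((v2[0] - t * v1[0], v2[1] - t * v1[1]))
    lo, hi = 0.0, 1.0
    while ok(hi):
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if ok(mid) else (lo, mid)
    return lo

def beta(v1, v2, iters=200):
    # inf{s > 0 : s v1 - v2 in C}; the admissible s form an interval (beta, +infinity).
    ok = lambda s: in_cone((s * v1[0] - v2[0], s * v1[1] - v2[1]))
    lo, hi = 0.0, 1.0
    while not ok(hi):
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (lo, mid) if ok(mid) else (mid, hi)
    return hi

def theta(v1, v2):
    return math.log(beta(v1, v2) / alpha(v1, v2))

def cross_ratio(a, b, c, d):
    return (c - a) * (d - b) / ((b - a) * (d - c))

for x1, x2 in [(-0.3, 0.5), (0.0, 0.9), (-0.8, -0.1)]:
    print(x1, x2, theta((x1, 1.0), (x2, 1.0)), math.log(cross_ratio(-1.0, x1, x2, 1.0)))
```

Because the cone is open, the bisection returns the endpoints of the admissible intervals up to rounding. The supremum and infimum depend only on those endpoints.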

Example 12.3.5. Let E = C0 (M) be the space of continuous functions on a


compact metric space M. Consider the cone C+ = {g ∈ E : g(x) > 0 for x ∈ M}.
For any g1, g2 ∈ C+,
$$\alpha(g_1, g_2) = \sup\{t > 0 : (g_2 - t g_1)(x) > 0 \text{ for every } x \in M\} = \inf\Big\{\frac{g_2}{g_1}(x) : x \in M\Big\}$$
and
$$\beta(g_1, g_2) = \sup\Big\{\frac{g_2}{g_1}(x) : x \in M\Big\}.$$
Therefore,
$$\theta(g_1, g_2) = \log \frac{\sup(g_2/g_1)}{\inf(g_2/g_1)} = \log \sup\Big\{\frac{g_2(x)\, g_1(y)}{g_1(x)\, g_2(y)} : x, y \in M\Big\}. \tag{12.3.5}$$
This projective distance is complete (Exercise 12.3.3) but that is not always the
case (Exercise 12.3.4).
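Formula (12.3.5) is easy to evaluate once M is replaced by a finite sample of points, an assumption made only for this illustration. The sketch below does this and checks, on a few arbitrary positive functions, the properties listed in Proposition 12.3.3:

- symmetry;
- the triangle inequality;
- vanishing on proportional functions;
- invariance under rescaling.

```python
import math

def theta_pos(g1, g2):
    # (12.3.5) on a finite sample: log of sup(g2/g1) / inf(g2/g1).
    ratios = [q / p for p, q in zip(g1, g2)]
    return math.log(max(ratios) / min(ratios))

xs = [k / 100 for k in range(101)]          # finite stand-in for the compact space M
g1 = [2 + math.sin(3 * x) for x in xs]       # arbitrary positive test functions
g2 = [1 + x * x for x in xs]
g3 = [math.exp(x) for x in xs]

d12, d23, d13 = theta_pos(g1, g2), theta_pos(g2, g3), theta_pos(g1, g3)
print(d12, d23, d13)
```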

Next, we observe that the projective distance depends monotonically on the


cone. Indeed, let C1 and C2 be two cones with C1 ⊂ C2 and let αi (·, ·), βi (·, ·),
θi (·, ·), i = 1, 2 be the corresponding functions, as defined previously. It is clear
from the definitions that, given any v1 , v2 ∈ C2 ,

α1 (v1 , v2 ) ≤ α2 (v1 , v2 ) and β1 (v1 , v2 ) ≥ β2 (v1 , v2 )

and, consequently, θ1 (v1 , v2 ) ≥ θ2 (v1 , v2 ).


More generally, let C1 and C2 be cones in Banach spaces E1 and E2 ,
respectively, and let L : E1 → E2 be a linear operator such that L(C1 ) ⊂ C2 .
Then,
$$\begin{aligned} \alpha_1(v_1, v_2) &= \sup\{t > 0 : v_2 - t v_1 \in C_1\} \\ &\le \sup\{t > 0 : L(v_2 - t v_1) \in C_2\} \\ &= \sup\{t > 0 : L(v_2) - t L(v_1) \in C_2\} = \alpha_2(L(v_1), L(v_2)) \end{aligned}$$
and, analogously, β1(v1, v2) ≥ β2(L(v1), L(v2)). Consequently,
$$\theta_2(L(v_1), L(v_2)) \le \theta_1(v_1, v_2) \quad \text{for any } v_1, v_2 \in C_1. \tag{12.3.6}$$

Of course, the inequality (12.3.6) is not strict, in general. However,


according to the next proposition, one does have a strict inequality whenever
L(C1 ) has finite θ2 -diameter in C2 ; actually, in that case L is a contraction with
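respect to the projective distances θ1 and θ2. Recall that the hyperbolic tangent is defined by
$$\tanh x = \frac{1 - e^{-2x}}{1 + e^{-2x}} \quad \text{for every } x \in \mathbb{R}.$$
Keep in mind that tanh takes values in the interval (−1, 1), and in [0, 1) for x ≥ 0. In particular, tanh(D/4) < 1 whenever D is finite.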

Proposition 12.3.6. Let C1 and C2 be cones in Banach spaces E1 and E2 ,


respectively, and let L : E1 → E2 be a linear operator such that L(C1 ) ⊂ C2 .
Suppose that D = sup{θ2(L(v1), L(v2)) : v1, v2 ∈ C1} is finite. Then
$$\theta_2(L(v_1), L(v_2)) \le \tanh\Big(\frac{D}{4}\Big)\,\theta_1(v_1, v_2) \quad \text{for any } v_1, v_2 \in C_1.$$
Proof. Let v1 , v2 ∈ C1 . It is no restriction to suppose that α1 (v1 , v2 ) > 0 and
β1 (v1 , v2 ) < +∞, otherwise θ1 (v1 , v2 ) = +∞ and there is nothing to prove.
Then there exist an increasing sequence (tn )n → α1 (v1 , v2 ) and a decreasing
sequence (sn )n → β1 (v1 , v2 ) such that
$$v_2 - t_n v_1 \in C_1 \quad \text{and} \quad s_n v_1 - v_2 \in C_1.$$
In particular, θ2(L(v2 − tn v1), L(sn v1 − v2)) ≤ D for every n ≥ 1. Fix any D0 > D. Then we may choose positive numbers Tn and Sn such that
$$L(s_n v_1 - v_2) - T_n L(v_2 - t_n v_1) \in C_2 \quad \text{and} \quad S_n L(v_2 - t_n v_1) - L(s_n v_1 - v_2) \in C_2, \tag{12.3.7}$$
and log(Sn/Tn) ≤ D0 for every n ≥ 1. The first part of (12.3.7) gives that
$$(s_n + t_n T_n)\, L(v_1) - (1 + T_n)\, L(v_2) \in C_2$$
and, by definition of β2(·, ·), this implies that
$$\beta_2(L(v_1), L(v_2)) \le \frac{s_n + t_n T_n}{1 + T_n}.$$
Analogously, the second part of (12.3.7) implies that
$$\alpha_2(L(v_1), L(v_2)) \ge \frac{s_n + t_n S_n}{1 + S_n}.$$
Therefore, θ2(L(v1), L(v2)) cannot exceed
$$\log\Big(\frac{s_n + t_n T_n}{1 + T_n} \cdot \frac{1 + S_n}{s_n + t_n S_n}\Big) = \log\Big(\frac{s_n/t_n + T_n}{1 + T_n} \cdot \frac{1 + S_n}{s_n/t_n + S_n}\Big).$$
The last term may be rewritten as
$$\log\Big(\frac{s_n}{t_n} + T_n\Big) - \log(1 + T_n) - \log\Big(\frac{s_n}{t_n} + S_n\Big) + \log(1 + S_n) = \int_0^{\log(s_n/t_n)} \Big(\frac{e^x}{e^x + T_n} - \frac{e^x}{e^x + S_n}\Big)\,dx,$$
and this expression is less than or equal to
$$\sup_{x > 0} \frac{e^x (S_n - T_n)}{(e^x + T_n)(e^x + S_n)}\; \log\frac{s_n}{t_n}.$$
Now we use the following elementary facts:
$$\sup_{y > 0} \frac{y\,(S_n - T_n)}{(y + T_n)(y + S_n)} = \frac{1 - \sqrt{T_n/S_n}}{1 + \sqrt{T_n/S_n}} \le \frac{1 - e^{-D_0/2}}{1 + e^{-D_0/2}} = \tanh\frac{D_0}{4}.$$

Indeed, the supremum is attained for $y = \sqrt{S_n T_n}$, and the inequality is a consequence of the fact that log(Sn/Tn) ≤ D0. This proves that
$$\theta_2(L(v_1), L(v_2)) \le \tanh\Big(\frac{D_0}{4}\Big) \log\frac{s_n}{t_n}.$$
Note also that θ1(v1, v2) = limn log(sn/tn), due to our choice of sn and tn. Hence, taking the limit when n → ∞ and then making D0 → D, we obtain the conclusion of the proposition.
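Proposition 12.3.6 can be tested numerically in the simplest setting where it applies. Take C to be the open positive orthant of R^3, whose projective distance is given by the same formula as (12.3.5), and L a matrix A with positive entries, which maps C into itself. The sketch below assumes the standard fact that the θ-diameter of A(C) is attained at pairs of columns of A. It then checks on random pairs that the contraction factor never exceeds tanh(D/4). The matrix, seed and sample size are arbitrary choices.

```python
import math
import random

def theta(v, w):
    # Projective distance of the open positive orthant: log(max(w/v) / min(w/v)).
    r = [q / p for p, q in zip(v, w)]
    return math.log(max(r) / min(r))

def apply(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def image_diameter(A):
    # theta-diameter of A(C), computed from pairs of columns of A (standard fact, assumed here).
    cols = [list(col) for col in zip(*A)]
    return max(theta(c1, c2) for c1 in cols for c2 in cols)

A = [[2.0, 1.0, 0.5], [1.0, 3.0, 1.0], [0.5, 1.0, 4.0]]   # arbitrary positive matrix
D = image_diameter(A)
k = math.tanh(D / 4)

rng = random.Random(0)
worst = 0.0
for _ in range(10000):
    v = [rng.uniform(0.01, 1.0) for _ in range(3)]
    w = [rng.uniform(0.01, 1.0) for _ in range(3)]
    t = theta(v, w)
    if t > 1e-6:                     # skip nearly proportional pairs (rounding noise)
        worst = max(worst, theta(apply(A, v), apply(A, w)) / t)
print(D, k, worst)
```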

Example 12.3.7. Let C+ be the cone of positive continuous functions in M.


For each L > 1, let C(L) = {g ∈ C+ : sup |g| ≤ L inf |g|}. Then, C(L) has finite
diameter in C+ , for every L > 1. Indeed, we have seen in Example 12.3.5 that
the projective distance θ associated with C+ is given by
$$\theta(g_1, g_2) = \log \sup\Big\{\frac{g_2(x)\, g_1(y)}{g_1(x)\, g_2(y)} : x, y \in M\Big\}.$$
In particular, θ(g1, g2) ≤ 2 log L for any g1, g2 ∈ C(L), since g2(x)/g2(y) ≤ L and g1(y)/g1(x) ≤ L for all x, y ∈ M.
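The bound 2 log L of Example 12.3.7 cannot be improved in general. The sketch below replaces M by a finite set of points, an assumption for illustration only. It samples functions in C(L), checks the bound, and exhibits a two-point pair that attains it.

```python
import math
import random

def theta_pos(g1, g2):
    # Projective distance (12.3.5) of the cone of positive functions, on a finite sample.
    r = [q / p for p, q in zip(g1, g2)]
    return math.log(max(r) / min(r))

def sample_C(rng, L, n=50):
    # Values in [m, m * L], so sup g <= L inf g: a member of C(L) restricted to n points.
    m = rng.uniform(0.5, 2.0)
    return [m * rng.uniform(1.0, L) for _ in range(n)]

L = 3.0
rng = random.Random(1)
largest = max(theta_pos(sample_C(rng, L), sample_C(rng, L)) for _ in range(2000))
extremal = theta_pos([1.0, L], [L, 1.0])      # a two-point pair attaining 2 log L
print(largest, extremal, 2 * math.log(L))
```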

12.3.2 Cones of Hölder functions


Let f : M → M be a topologically exact expanding map and ρ > 0 and σ > 1 be
the constants in the definition (11.2.1). Let L : C0 (M) → C0 (M) be the transfer
operator associated with a Hölder potential ϕ : M → R. Fix constants K0 > 0 and α > 0 such that
$$|\varphi(x) - \varphi(y)| \le K_0\, d(x, y)^\alpha \quad \text{for any } x, y \in M.$$
Given b > 0 and β > 0, we denote by C(b, β) the set of positive functions g ∈ C0(M) whose logarithm is (b, β)-Hölder on balls of radius ρ, that is, such that
$$|\log g(x) - \log g(y)| \le b\, d(x, y)^\beta \quad \text{whenever } d(x, y) < \rho. \tag{12.3.8}$$

Lemma 12.3.8. For any b > 0 and β > 0, the set C(b, β) is a cone in the space
E = C0(M) and the corresponding projective distance is given by
$$\theta(g_1, g_2) = \log \frac{\beta(g_1, g_2)}{\alpha(g_1, g_2)},$$
where α(g1, g2) is the infimum and β(g1, g2) is the supremum of the set
$$\Big\{\frac{g_2}{g_1}(x),\ \frac{\exp(b\,d(x, y)^\beta)\, g_2(x) - g_2(y)}{\exp(b\,d(x, y)^\beta)\, g_1(x) - g_1(y)} : x \ne y \text{ and } d(x, y) < \rho\Big\}.$$

Proof. It is clear that g ∈ C implies tg ∈ C for every t > 0. Moreover, the


closure of C is contained in the set of non-negative functions and so −C̄ ∩ C̄ contains only the zero function. Now, to conclude that C is a cone, we only have to check that it is convex. Consider any g1, g2 ∈ C(b, β). The definition (12.3.8) means that
$$\exp(-b\,d(x, y)^\beta) \le \frac{g_i(x)}{g_i(y)} \le \exp(b\,d(x, y)^\beta)$$
for i = 1, 2 and any x, y ∈ M with d(x, y) < ρ. Then, given t1, t2 > 0,
$$\exp(-b\,d(x, y)^\beta) \le \frac{t_1 g_1(x) + t_2 g_2(x)}{t_1 g_1(y) + t_2 g_2(y)} \le \exp(b\,d(x, y)^\beta)$$
for any x, y ∈ M with d(x, y) < ρ. Hence, t1 g1 + t2 g2 is in C(b, β).
We proceed to calculate the projective distance. By definition, α(g1, g2) is the supremum of all the numbers t > 0 satisfying the following three conditions:
$$\begin{aligned}
(g_2 - t g_1)(x) > 0 &\iff t < \frac{g_2}{g_1}(x), \\
\frac{(g_2 - t g_1)(x)}{(g_2 - t g_1)(y)} \le \exp(b\,d(x, y)^\beta) &\iff t \le \frac{\exp(b\,d(x, y)^\beta)\, g_2(y) - g_2(x)}{\exp(b\,d(x, y)^\beta)\, g_1(y) - g_1(x)}, \\
\frac{(g_2 - t g_1)(x)}{(g_2 - t g_1)(y)} \ge \exp(-b\,d(x, y)^\beta) &\iff t \le \frac{\exp(b\,d(x, y)^\beta)\, g_2(x) - g_2(y)}{\exp(b\,d(x, y)^\beta)\, g_1(x) - g_1(y)},
\end{aligned}$$
for any x, y ∈ M with x ≠ y and d(x, y) < ρ. The second and third conditions coincide after exchanging x and y. Hence, α(g1, g2) is equal to
$$\inf\Big\{\frac{g_2(x)}{g_1(x)},\ \frac{\exp(b\,d(x, y)^\beta)\, g_2(x) - g_2(y)}{\exp(b\,d(x, y)^\beta)\, g_1(x) - g_1(y)} : x \ne y \text{ and } d(x, y) < \rho\Big\}.$$
Analogously, β(g1, g2) is the supremum of this same set.
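To see Lemma 12.3.8 at work, one can discretize M = [0, 1] and use the hypothetical values b = 2, β = 1, ρ = 0.3; the grid and these values are assumptions made only for this illustration. The sketch compares two things:

- the infimum and supremum of the set in the lemma;
- α and β computed directly from their definitions, by bisection on membership in the cone C(b, β).

```python
import math

B, HEXP, RHO = 2.0, 1.0, 0.3                   # hypothetical b, beta, rho
XS = [k / 40 for k in range(41)]               # finite sample standing in for M = [0, 1]
PAIRS = [(i, j) for i in range(len(XS)) for j in range(len(XS))
         if i != j and abs(XS[i] - XS[j]) < RHO]

def in_cone(h, tol=1e-12):
    # Discrete (12.3.8): h > 0 and |log h(x) - log h(y)| <= b d(x, y)^beta when d < rho.
    if min(h) <= 0:
        return False
    return all(abs(math.log(h[i]) - math.log(h[j])) <= B * abs(XS[i] - XS[j]) ** HEXP + tol
               for i, j in PAIRS)

def lemma_set(g1, g2):
    # The set of Lemma 12.3.8: its infimum is alpha(g1, g2), its supremum is beta(g1, g2).
    vals = [q / p for p, q in zip(g1, g2)]
    for i, j in PAIRS:
        e = math.exp(B * abs(XS[i] - XS[j]) ** HEXP)
        vals.append((e * g2[i] - g2[j]) / (e * g1[i] - g1[j]))
    return vals

def alpha_direct(g1, g2, iters=80):
    # sup{t > 0 : g2 - t g1 in C(b, beta)}; the admissible t form an interval (0, alpha).
    ok = lambda t: in_cone([q - t * p for p, q in zip(g1, g2)])
    lo, hi = 0.0, 1.0
    while ok(hi):
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if ok(mid) else (lo, mid)
    return lo

def beta_direct(g1, g2, iters=80):
    # inf{s > 0 : s g1 - g2 in C(b, beta)}; the admissible s form an upward-closed interval.
    ok = lambda s: in_cone([s * p - q for p, q in zip(g1, g2)])
    lo, hi = 0.0, 1.0
    while not ok(hi):
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (lo, mid) if ok(mid) else (mid, hi)
    return hi

g1 = [math.exp(0.5 * math.sin(3 * x)) for x in XS]   # log g1 is 1.5-Lipschitz
g2 = [math.exp(0.8 * x) for x in XS]                 # log g2 is 0.8-Lipschitz
vals = lemma_set(g1, g2)
print(alpha_direct(g1, g2), min(vals))
print(beta_direct(g1, g2), max(vals))
```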

The crucial fact that makes the proof of Theorem 12.3.1 work is that the
transfer operator tends to improve the regularity of functions or, more precisely, their Hölder constants. The next lemma is a concrete manifestation of this fact:

Lemma 12.3.9. For each β ∈ (0, α] there exists a constant λ0 ∈ (0, 1) such that
L(C(b, β)) ⊂ C(λ0 b, β) for every b sufficiently large (depending on β).

Proof. It follows directly from the expression of the transfer operator in


(12.1.1) that Lg is positive whenever g is positive. Therefore, we only have to
check the second condition in the definition of C(λ0 b, β). Consider y1 , y2 ∈ M
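with d(y1, y2) < ρ. The expression (12.1.2) gives that
$$\mathcal{L}g(y_i) = \sum_{j=1}^{k} e^{\varphi(x_{i,j})}\, g(x_{i,j})$$
for i = 1, 2, where the points $x_{i,j} \in f^{-1}(y_i)$ satisfy $d(x_{1,j}, x_{2,j}) \le \sigma^{-1} d(y_1, y_2) < \rho$ for every 1 ≤ j ≤ k. By hypothesis, ϕ is (K0, α)-Hölder. Since we suppose that β ≤ α, it follows that ϕ is (K, β)-Hölder, with $K = K_0 (\operatorname{diam} M)^{\alpha - \beta}$. Therefore,
$$\begin{aligned}
(\mathcal{L}g)(y_1) &= \sum_{j=1}^{k} e^{\varphi(x_{1,j})} g(x_{1,j}) = \sum_{j=1}^{k} e^{\varphi(x_{2,j})} g(x_{2,j})\, \frac{g(x_{1,j})\, e^{\varphi(x_{1,j})}}{g(x_{2,j})\, e^{\varphi(x_{2,j})}} \\
&\le \sum_{j=1}^{k} e^{\varphi(x_{2,j})} g(x_{2,j}) \exp\big(b\,d(x_{1,j}, x_{2,j})^\beta + K\,d(x_{1,j}, x_{2,j})^\beta\big) \\
&\le (\mathcal{L}g)(y_2) \exp\big((b + K)\,\sigma^{-\beta} d(y_1, y_2)^\beta\big)
\end{aligned}$$
for every g ∈ C(b, β). Fix $\lambda_0 \in (\sigma^{-\beta}, 1)$. For every b sufficiently large, namely $b \ge K\sigma^{-\beta}/(\lambda_0 - \sigma^{-\beta})$, we have $(b + K)\sigma^{-\beta} \le b\lambda_0$. Then the previous relation gives that
$$(\mathcal{L}g)(y_1) \le (\mathcal{L}g)(y_2) \exp\big(\lambda_0 b\, d(y_1, y_2)^\beta\big)$$
for any y1, y2 ∈ M with d(y1, y2) < ρ. Exchanging the roles of y1 and y2, we obtain the other inequality.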
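The mechanism in Lemma 12.3.9 can be observed on a concrete example. Purely as an illustration, take:

- the doubling map f(x) = 2x mod 1 on the circle, so σ = 2 and k = 2;
- the hypothetical potential ϕ(x) = 0.3 cos(2πx);
- β = 1.

The sketch estimates the Lipschitz constants of log g and of log Lg from difference quotients. It compares the second with the bound (b + K)σ^{−β} obtained in the proof, and checks that this bound is at most λ0 b for λ0 = 0.75.

```python
import math

TWO_PI = 2 * math.pi
A_PHI = 0.3                    # hypothetical potential phi(x) = 0.3 cos(2 pi x)
K = A_PHI * TWO_PI             # Lipschitz constant of phi
SIGMA, HEXP = 2.0, 1.0         # expansion of f(x) = 2x mod 1; Lipschitz case beta = 1

def phi(x):
    return A_PHI * math.cos(TWO_PI * x)

def transfer(g, y):
    # (Lg)(y) = sum of e^{phi(x)} g(x) over the two preimages x of y under f(x) = 2x mod 1.
    return sum(math.exp(phi(x)) * g(x) for x in (y / 2, (y + 1) / 2))

def log_lip(func, n=4000):
    # Largest difference quotient of log(func) between neighbouring grid points on the circle.
    vals = [math.log(func(k / n)) for k in range(n)]
    return max(abs(vals[(k + 1) % n] - vals[k]) * n for k in range(n))

b = 5.0
def g(x):
    # log g = (b / 2 pi) sin(2 pi x) is b-Lipschitz, so g lies in C(b, 1).
    return math.exp(b * math.sin(TWO_PI * x) / TWO_PI)

est_g = log_lip(g)
est_Lg = log_lip(lambda y: transfer(g, y))
bound = (b + K) * SIGMA ** (-HEXP)          # (b + K) sigma^{-beta}, as in the proof
lam0 = 0.75                                 # any value in (sigma^{-beta}, 1)
print(est_g, est_Lg, bound, lam0 * b)
```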

Next, we use the family of cones C(L) introduced in Example 12.3.7:


Lemma 12.3.10. There exists N ≥ 1 such that for every β > 0 and every b > 0 there exists L > 1 satisfying $\mathcal{L}^N(C(b, \beta)) \subset C(L)$.

Proof. By hypothesis, f is topologically exact. Hence, there exists N ≥ 1 such


that $f^N(B(z, \rho)) = M$ for every z ∈ M. Fix N once and for all. Given g ∈ C(b, β), consider any point z ∈ M such that g(z) = sup g. Consider y1, y2 ∈ M. On the one hand,
$$\mathcal{L}^N g(y_1) = \sum_{x \in f^{-N}(y_1)} e^{\varphi_N(x)} g(x) \le \operatorname{degree}(f^N)\, e^{N \sup|\varphi|}\, g(z).$$
On the other hand, by the choice of N, there exists x ∈ B(z, ρ) such that $f^N(x) = y_2$. Then,
$$\mathcal{L}^N g(y_2) \ge e^{\varphi_N(x)} g(x) \ge e^{-N \sup|\varphi|}\, e^{-b\,d(x, z)^\beta} g(z) \ge e^{-N \sup|\varphi| - b\rho^\beta} g(z).$$
Since y1 and y2 are arbitrary, this proves that
$$\frac{\sup \mathcal{L}^N g}{\inf \mathcal{L}^N g} \le \operatorname{degree}(f^N)\, e^{2N \sup|\varphi| + b\rho^\beta}.$$
Now it suffices to take L equal to the expression on the right-hand side of this inequality.
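For the same illustrative doubling map and potential (hypothetical choices, not taken from the text), the constant produced in the proof of Lemma 12.3.10 can be computed explicitly and compared with the actual oscillation of L^N g. Here ρ = 0.1 is an assumed radius. N is taken to be the first integer for which f^N maps every ball of radius ρ onto the circle.

```python
import math

TWO_PI = 2 * math.pi
A_PHI, b, RHO = 0.3, 5.0, 0.1       # hypothetical potential amplitude, cone constant, radius
phi = lambda x: A_PHI * math.cos(TWO_PI * x)
g = lambda x: math.exp(b * math.sin(TWO_PI * x) / TWO_PI)   # an element of C(b, 1)

# f(x) = 2x mod 1 maps an arc of length 2 rho onto an arc of length 2^N * 2 rho,
# so f^N(B(z, rho)) is the whole circle as soon as 2^N * 2 rho > 1.
N = 1
while 2 ** N * 2 * RHO <= 1:
    N += 1

def transfer_N(y):
    # (L^N g)(y) = sum over the 2^N preimages x = (y + k) / 2^N of exp(S_N phi(x)) g(x).
    total = 0.0
    for k in range(2 ** N):
        x = (y + k) / 2 ** N
        birkhoff = sum(phi((2 ** i * x) % 1.0) for i in range(N))
        total += math.exp(birkhoff) * g(x)
    return total

vals = [transfer_N(j / 1000) for j in range(1000)]
ratio = max(vals) / min(vals)
L_const = 2 ** N * math.exp(2 * N * A_PHI + b * RHO)   # degree(f^N) e^{2N sup|phi| + b rho^beta}
print(N, ratio, L_const)
```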

Combining Lemmas 12.3.9 and 12.3.10, we get that there exists N ≥ 1 and,
given β ∈ (0, α] there exists λ0 ∈ (0, 1) such that, for every b > 0 sufficiently
large (depending on N and β) there exists L > 1, satisfying
$$\mathcal{L}^N(C(b, \beta)) \subset C(\lambda_0^N b, \beta) \cap C(L). \tag{12.3.9}$$
In what follows, we write C(c, β, R) = C(c, β) ∩ C(R) for any c > 0, β > 0 and
R > 1.

Lemma 12.3.11. For every c ∈ (0, b) and R > 1, the set C(c, β, R) ⊂
C(b, β) has finite diameter with respect to the projective distance of the cone
C(b, β).

Proof. We use the expression of θ given by Lemma 12.3.8. On the one hand,
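the hypothesis that g1, g2 ∈ C(c, β) ensures that
$$\frac{\exp(b\,d(x, y)^\beta)\, g_2(x) - g_2(y)}{\exp(b\,d(x, y)^\beta)\, g_1(x) - g_1(y)} = \frac{g_2(x)\,\big(1 - \exp(-b\,d(x, y)^\beta)\, g_2(y)/g_2(x)\big)}{g_1(x)\,\big(1 - \exp(-b\,d(x, y)^\beta)\, g_1(y)/g_1(x)\big)} \ge \frac{g_2(x)}{g_1(x)} \cdot \frac{1 - \exp(-(b - c)\,d(x, y)^\beta)}{1 - \exp(-(b + c)\,d(x, y)^\beta)} \ge \frac{g_2(x)}{g_1(x)} \cdot \frac{b - c}{b + c}$$
for any x, y ∈ M with x ≠ y and d(x, y) < ρ. The last step uses the fact that the function $u \mapsto (1 - e^{-(b-c)u})/(1 - e^{-(b+c)u})$ is increasing on (0, ∞), with limit (b − c)/(b + c) as u → 0. Denote r = (b − c)/(b + c). Then, observing that r ∈ (0, 1),
$$\alpha(g_1, g_2) \ge \inf\Big\{\frac{g_2(x)}{g_1(x)},\ r\,\frac{g_2(x)}{g_1(x)} : x \in M\Big\} = r \inf\Big\{\frac{g_2(x)}{g_1(x)} : x \in M\Big\} \ge r\,\frac{\inf g_2}{\sup g_1}.$$
Analogously,
$$\beta(g_1, g_2) \le \sup\Big\{\frac{g_2(x)}{g_1(x)},\ \frac{1}{r}\,\frac{g_2(x)}{g_1(x)} : x \in M\Big\} = \frac{1}{r} \sup\Big\{\frac{g_2(x)}{g_1(x)} : x \in M\Big\} \le \frac{1}{r}\,\frac{\sup g_2}{\inf g_1}.$$
On the other hand, the hypothesis that g1, g2 ∈ C(R) gives that
$$\frac{\sup g_2}{\inf g_1} \Big/ \frac{\inf g_2}{\sup g_1} = \frac{\sup g_2}{\inf g_2} \cdot \frac{\sup g_1}{\inf g_1} \le R^2.$$
Combining these three inequalities, we conclude that θ(g1, g2) ≤ log(R²/r²) for any g1, g2 ∈ C(c, β, R).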
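In the proof above, the lower bound on the ratio depends on the factor (1 − e^{−(b−c)u})/(1 − e^{−(b+c)u}) with u = d(x, y)^β. The short numerical check below uses the arbitrary values b = 4 and c = 1. It shows that this factor increases with u, so its infimum over small distances is the limit (b − c)/(b + c) as u → 0.

```python
import math

def q(u, b, c):
    # (1 - e^{-(b-c)u}) / (1 - e^{-(b+c)u}), written with expm1 for accuracy at small u.
    return math.expm1(-(b - c) * u) / math.expm1(-(b + c) * u)

b, c = 4.0, 1.0                                  # arbitrary values with 0 < c < b
us = [1e-6 * 10 ** (k / 10) for k in range(61)]  # u from 1e-6 up to 1
vals = [q(u, b, c) for u in us]
r = (b - c) / (b + c)
print(r, vals[0], vals[-1])
```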

Corollary 12.3.12. There exists N ≥ 1 such that for every β ∈ (0, α] and every b > 0 sufficiently large there exists Λ0 < 1 satisfying
$$\theta(\mathcal{L}^N g_1, \mathcal{L}^N g_2) \le \Lambda_0\, \theta(g_1, g_2) \quad \text{for any } g_1, g_2 \in C(b, \beta).$$

Proof. Take N ≥ 1, λ0 ∈ (0, 1) and L > 1 as in (12.3.9) and consider
$$c = \lambda_0^N b \quad \text{and} \quad R = L. \tag{12.3.10}$$
Then c ∈ (0, b) and $\mathcal{L}^N(C(b, \beta)) \subset C(c, \beta, R)$, so it follows from Lemma 12.3.11 that the diameter D of the image $\mathcal{L}^N(C(b, \beta))$ with respect to the projective distance θ is finite. Take Λ0 = tanh(D/4). Now the conclusion of the corollary is an immediate application of Proposition 12.3.6.

12.3.3 Exponential convergence


Fix N ≥ 1, β ∈ (0, α], b > 0 and L > 1 as in Corollary 12.3.12, and then
consider c > 0 and R > 1 given by (12.3.10). As before, we denote by
h the positive eigenfunction (Lemma 12.1.11) and by λ the spectral radius
(Corollary 12.1.15) of the transfer operator L. Recall that h is α-Hölder and
bounded from zero and infinity. Therefore, up to increasing the constants b
and L, if necessary, we may assume that h ∈ C(c, β, R).
The next lemma follows directly from the previous considerations and is a
significant step toward the estimate in Theorem 12.3.1. We continue denoting by ‖·‖ the norm defined in C0(M) by ‖φ‖ = sup{|φ(x)| : x ∈ M}.

Lemma 12.3.13. There exist C > 0 and Λ ∈ (0, 1) such that
$$\Big\| \lambda^{-n} \mathcal{L}^n g - h \int g\, d\nu \Big\| \le C \Lambda^n \int g\, d\nu \quad \text{for } g \in C(c, \beta, R) \text{ and } n \ge 1.$$

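Proof. Let g ∈ C(c, β, R). In particular, g > 0 and so ∫g dν > 0. The conclusion of the lemma is not affected when we multiply g by any positive number. Hence, it is no restriction to suppose that ∫g dν = 1. Then,
$$\int \lambda^{-n} \mathcal{L}^n g\, d\nu = \lambda^{-n} \int g\, d(\mathcal{L}^{*n}\nu) = \int g\, d\nu = 1 = \int h\, d\nu$$
and, hence, $\inf(\lambda^{-n}\mathcal{L}^n g/h) \le 1 \le \sup(\lambda^{-n}\mathcal{L}^n g/h)$ for every n ≥ 1. Now, it follows from the expressions in Lemma 12.3.8, applied to the pair $(h, \lambda^{-jN}\mathcal{L}^{jN} g)$, that
$$\alpha(h, \lambda^{-jN} \mathcal{L}^{jN} g) \le \inf \frac{\lambda^{-jN} \mathcal{L}^{jN} g}{h} \le 1 \quad \text{and} \quad \beta(h, \lambda^{-jN} \mathcal{L}^{jN} g) \ge \sup \frac{\lambda^{-jN} \mathcal{L}^{jN} g}{h} \ge 1.$$
Consequently, recalling that θ is symmetric (part (i) of Proposition 12.3.3),
$$\theta(\lambda^{-jN} \mathcal{L}^{jN} g, h) \ge \log \beta(h, \lambda^{-jN} \mathcal{L}^{jN} g) \ge \log \sup \frac{\lambda^{-jN} \mathcal{L}^{jN} g}{h},$$
$$\theta(\lambda^{-jN} \mathcal{L}^{jN} g, h) \ge -\log \alpha(h, \lambda^{-jN} \mathcal{L}^{jN} g) \ge -\log \inf \frac{\lambda^{-jN} \mathcal{L}^{jN} g}{h}$$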
for every j ≥ 0. Now let D be the diameter of C(c, β, R) with respect
to the projective distance θ (Lemma 12.3.11). By Proposition 12.3.3 and
Corollary 12.3.12,

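$$\theta(\lambda^{-jN} \mathcal{L}^{jN} g, h) = \theta(\mathcal{L}^{jN} g, \mathcal{L}^{jN} h) \le \Lambda_0^j\, \theta(g, h) \le \Lambda_0^j D$$
for every j ≥ 0. Combining this with the previous two inequalities,
$$\exp(-\Lambda_0^j D) \le \inf \frac{\lambda^{-jN} \mathcal{L}^{jN} g}{h} \le \sup \frac{\lambda^{-jN} \mathcal{L}^{jN} g}{h} \le \exp(\Lambda_0^j D)$$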

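for every j ≥ 0. Fix C1 > 0 such that |e^x − 1| ≤ C1|x| whenever |x| ≤ D. Then the previous relation implies that
$$\big|\lambda^{-jN} \mathcal{L}^{jN} g(x) - h(x)\big| \le h(x)\, C_1 \Lambda_0^j D \quad \text{for every } x \in M \text{ and } j \ge 0. \tag{12.3.11}$$
Take C2 = C1 D sup h and $\Lambda = \Lambda_0^{1/N}$. The inequality (12.3.11) means that
$$\|\lambda^{-jN} \mathcal{L}^{jN} g - h\| \le C_2 \Lambda^{jN} \quad \text{for every } j \ge 0.$$
Given any n ≥ 1, write n = jN + r with j ≥ 0 and 0 ≤ r < N. Since the transfer operator $\mathcal{L} : C^0(M) \to C^0(M)$ is continuous and $\mathcal{L}h = \lambda h$,
$$\|\lambda^{-n} \mathcal{L}^n g - h\| = \|\lambda^{-r} \mathcal{L}^r (\lambda^{-jN} \mathcal{L}^{jN} g - h)\| \le (\|\mathcal{L}\|/\lambda)^r\, \|\lambda^{-jN} \mathcal{L}^{jN} g - h\|.$$
Combining the last two inequalities,
$$\|\lambda^{-n} \mathcal{L}^n g - h\| \le (\|\mathcal{L}\|/\lambda)^r\, C_2 \Lambda^{n - r}.$$
This proves the conclusion of the lemma, as long as we take $C \ge C_2 (\|\mathcal{L}\|/(\lambda\Lambda))^r$ for every 0 ≤ r < N.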
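A finite-dimensional analogue may help to visualize Lemma 12.3.13. The substitutions below are assumptions for illustration only:

- L becomes a matrix A with positive entries acting on R^3;
- h becomes its Perron eigenvector;
- ν becomes the positive left eigenvector, normalized so that ν·h = 1.

The sketch computes h, ν and λ by power iteration. It then prints the error ‖λ^{−n}A^n g − h(ν·g)‖ for a positive vector g, which decays geometrically.

```python
def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def perron(A, iters=500):
    # Power iteration: Perron root and positive eigenvector normalized to max entry 1.
    v = [1.0] * len(A)
    for _ in range(iters):
        w = matvec(A, v)
        s = max(w)
        v = [x / s for x in w]
    return matvec(A, v)[0] / v[0], v

A = [[1.0, 2.0, 1.0], [1.0, 1.0, 3.0], [2.0, 1.0, 1.0]]   # arbitrary positive matrix
lam, h = perron(A)
_, nu = perron(transpose(A))
scale = sum(a * b for a, b in zip(nu, h))
nu = [x / scale for x in nu]                 # normalize so that nu . h = 1

g = [1.0, 0.2, 5.0]                          # a positive vector, playing the role of g
target = sum(a * b for a, b in zip(nu, g))   # analogue of the integral of g with respect to nu
u, errors = g[:], []
for n in range(31):
    errors.append(max(abs(ui - hi * target) for ui, hi in zip(u, h)))
    u = [x / lam for x in matvec(A, u)]
for n in (0, 5, 10, 15, 20):
    print(n, errors[n])
```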

Now we are ready to prove Theorem 12.3.1:

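Proof. Start by considering g2 ∈ C(c, β, R). Using the identity in Exercise 12.1.4 and recalling that μ = hν,
$$B_n(g_1, g_2) = \Big| \int g_1 \Big(\lambda^{-n} \mathcal{L}^n g_2 - h \int g_2\, d\nu\Big)\, d\nu \Big| \le \Big\| \lambda^{-n} \mathcal{L}^n g_2 - h \int g_2\, d\nu \Big\| \int |g_1|\, d\nu.$$
Therefore, using Lemma 12.3.13,
$$B_n(g_1, g_2) \le C \Lambda^n \int |g_1|\, d\nu \int g_2\, d\nu. \tag{12.3.12}$$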

Now let g2 : M → R be any β-Hölder function and H = Hβ(g2). Write $g_2 = g_2^+ - g_2^-$ with
$$g_2^+ = \tfrac{1}{2}(|g_2| + g_2) + B \quad \text{and} \quad g_2^- = \tfrac{1}{2}(|g_2| - g_2) + B,$$
with B defined by B = max{H/c, sup|g2|/(R − 1)}. It is clear that the functions $g_2^\pm$ are positive: $g_2^\pm \ge B > 0$. Moreover, they are (H, β)-Hölder:
$$|g_2^\pm(x) - g_2^\pm(y)| \le |g_2(x) - g_2(y)| \le H\, d(x, y)^\beta$$
for x, y ∈ M. Hence, using the mean value theorem and the fact that B ≥ H/c,
$$\big|\log g_2^\pm(x) - \log g_2^\pm(y)\big| \le \frac{|g_2^\pm(x) - g_2^\pm(y)|}{B} \le \frac{H\, d(x, y)^\beta}{B} \le c\, d(x, y)^\beta.$$
Moreover, since B ≥ sup|g2|/(R − 1),
$$\sup g_2^\pm \le \sup|g_2| + B \le R B \le R \inf g_2^\pm.$$

Together with the previous relation, this means that $g_2^\pm \in C(c, \beta, R)$. Then, we may apply (12.3.12) to both functions:
$$B_n(g_1, g_2^\pm) \le C \Lambda^n \int |g_1|\, d\nu \int g_2^\pm\, d\nu$$
and, consequently,
$$B_n(g_1, g_2) \le B_n(g_1, g_2^+) + B_n(g_1, g_2^-) \le C \Lambda^n \int |g_1|\, d\nu \int (g_2^+ + g_2^-)\, d\nu. \tag{12.3.13}$$
Moreover, by the definition of $g_2^\pm$,
$$\int (g_2^+ + g_2^-)\, d\nu = \int |g_2|\, d\nu + 2B \le \int |g_2|\, d\nu + \frac{2H}{c} + \frac{2 \sup|g_2|}{R - 1} \le \frac{2}{c}\, H_\beta(g_2) + \frac{R + 1}{R - 1} \sup|g_2|. \tag{12.3.14}$$
Take C1 = C max{2/c, (R + 1)/(R − 1)} and define
$$K_1(g_2) = 2 C_1 \big(\sup|g_2| + H_\beta(g_2)\big).$$
The relations (12.3.13) and (12.3.14) give that
$$B_n(g_1, g_2) \le C_1 \Lambda^n \int |g_1|\, d\nu\, \big(H_\beta(g_2) + \sup|g_2|\big) \le \frac{1}{2} K_1(g_2)\, \Lambda^n \int |g_1|\, d\nu.$$
This closes the proof of the theorem in the case when g2 is a real function.
The general (complex) case follows easily. Note that K1(ℜg2) ≤ K1(g2), because sup|ℜg2| ≤ sup|g2| and Hβ(ℜg2) ≤ Hβ(g2). Analogously, K1(ℑg2) ≤ K1(g2). Therefore, the previous argument yields
$$B_n(g_1, g_2) \le B_n(g_1, \Re g_2) + B_n(g_1, \Im g_2) \le \frac{1}{2}\big(K_1(\Re g_2) + K_1(\Im g_2)\big)\, \Lambda^n \int |g_1|\, d\nu \le K_1(g_2)\, \Lambda^n \int |g_1|\, d\nu.$$
This completes the proof of Theorem 12.3.1.
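The decomposition $g_2 = g_2^+ - g_2^-$ used above is concrete enough to check numerically. In the sketch below, M = [0, 1] is sampled on a finite grid with d(x, y) = |x − y| and β = 1. The values c = 0.5 and R = 3 and the test function g2 are arbitrary choices. The script verifies four things:

- $g_2^\pm$ are bounded below by B;
- their logarithms are (c, β)-Hölder;
- they satisfy sup ≤ R inf;
- they recover g2.

```python
import math

xs = [k / 100 for k in range(101)]
g2 = [math.sin(7 * x) - 0.3 for x in xs]          # a real Lipschitz function (beta = 1)
c, R = 0.5, 3.0                                    # illustrative cone constants

pairs = [(i, j) for i in range(len(xs)) for j in range(i + 1, len(xs))]
H = max(abs(g2[i] - g2[j]) / (xs[j] - xs[i]) for i, j in pairs)   # discrete Hoelder constant
B = max(H / c, max(abs(v) for v in g2) / (R - 1))
gp = [0.5 * (abs(v) + v) + B for v in g2]         # g2^+
gm = [0.5 * (abs(v) - v) + B for v in g2]         # g2^-

# Largest difference quotient of log g2^+ and log g2^- over all pairs of grid points.
log_holder = max(max(abs(math.log(f[i]) - math.log(f[j])) / (xs[j] - xs[i]) for i, j in pairs)
                 for f in (gp, gm))
print(H, B, log_holder)
```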

We close this section with a few comments about the spectral gap property.
Let Cβ (M) be the vector space of β-Hölder functions g : M → C. We leave it
to the reader (Exercise 12.3.6) to check the following facts:

(i) The function ‖g‖β,ρ = sup|g| + Hβ,ρ(g) is a complete norm in Cβ(M).
(ii) Cβ(M) is invariant under the transfer operator: L(Cβ(M)) ⊂ Cβ(M).
(iii) The restriction L : Cβ(M) → Cβ(M) is continuous with respect to the norm ‖·‖β,ρ.

Note that h ∈ Cβ(M), since β ≤ α. Define V = {g ∈ Cβ(M) : ∫g dν = 0}. Then Cβ(M) = V ⊕ ℂh, because every function g ∈ Cβ(M) may be decomposed, in a unique way, as the sum of a function in V with a multiple of h:
$$g = \Big(g - h \int g\, d\nu\Big) + h \int g\, d\nu.$$
Moreover, the direct sum Cβ(M) = V ⊕ ℂh is invariant under the transfer operator. Indeed, Lh = λh and
$$g \in V \ \Rightarrow\ \int \mathcal{L}g\, d\nu = \int g\, d\mathcal{L}^*\nu = \lambda \int g\, d\nu = 0 \ \Rightarrow\ \mathcal{L}g \in V.$$
It follows that the spectrum of L : Cβ(M) → Cβ(M) is the union of {λ} with the spectrum of the restriction of L to the hyperplane V. In Exercise 12.3.8 we invite the reader to show that the spectral radius of L | V is strictly less than λ. Consequently, L : Cβ(M) → Cβ(M) has the spectral gap property.
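In finite dimension the same contraction mechanism gives a classical estimate, often attributed to Birkhoff and Hopf, for the spectral gap of a matrix with positive entries. The ratio |λ2|/λ1 between the second and the first eigenvalue is at most tanh(D/4), where D is the projective diameter of the image of the positive cone. The sketch below checks this on random 2×2 positive matrices, for which the eigenvalues are explicit and D = |log(ad/(bc))|. It illustrates the analogy and is not part of the argument above. The seed and sampling ranges are arbitrary.

```python
import math
import random

def gap_ratio(a, b, c, d):
    # |lambda_2| / lambda_1 for [[a, b], [c, d]]; eigenvalues are real since (a - d)^2 + 4bc > 0.
    tr = a + d
    root = math.sqrt((a - d) ** 2 + 4 * b * c)
    return abs((tr - root) / 2) / ((tr + root) / 2)

def birkhoff_bound(a, b, c, d):
    # tanh(D / 4), with D the theta-diameter of the image of the positive quadrant.
    D = abs(math.log(a * d / (b * c)))
    return math.tanh(D / 4)

rng = random.Random(2)
excess = max(gap_ratio(*m) - birkhoff_bound(*m)
             for m in ([rng.uniform(0.01, 10.0) for _ in range(4)] for _ in range(20000)))
print(excess)
print(gap_ratio(1.0, 0.1, 0.1, 1.0), birkhoff_bound(1.0, 0.1, 0.1, 1.0))   # equality case
```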
The book of Viviane Baladi [Bal00] contains an in-depth presentation of the
spectral theory of transfer operators and its connections to the issue of decay
of correlations, for differentiable (or piecewise differentiable) expanding maps
and also for uniformly hyperbolic diffeomorphisms.

12.3.4 Exercises
12.3.1. Show that the cross-ratio R(a, b, c, d) is invariant under every Möbius automorphism of the real line, that is, R(φ(a), φ(b), φ(c), φ(d)) = R(a, b, c, d) for any a < b ≤ c < d and every transformation of the form φ(x) = (αx + β)/(γx + δ) with αδ − βγ ≠ 0.
12.3.2. Consider the cone C = {(z, s) ∈ C × R : s > |z|}. Its projective quotient may be
identified with the unit disk D = {z ∈ C : |z| < 1} through (z, 1) → z. Let d be the
distance induced in D, through this identification, by the projective distance of
C. Show that d coincides with the Cayley–Klein distance, which is defined for p, q ∈ D by
$$\log \frac{|aq|\,|pb|}{|ap|\,|bq|},$$
where a and b are the points where the straight line through p and q intersects the boundary of the disk, denoted in such a way that p is between a and q and q is between p and b. [Observation: The Cayley–Klein distance is related to the Poincaré distance in the disk through the map z ↦ 2z/(1 + |z|²).]
12.3.3. Show that the projective distance associated with the cone C+ in Example 12.3.7
is complete, in the following sense: with respect to the projective distance,
every Cauchy sequence (gn )n converges to some element of C+ . Moreover, if
we normalize the functions (for example, fixing any probability measure η on M and requiring that ∫gn dη = 1 = ∫g dη for every n), then (gn)n converges uniformly to g.
12.3.4. Let M be a compact manifold and C1 be the cone of positive differentiable
functions in M. Show that the corresponding projective distance θ1 is not
complete.
12.3.5. Check that if g1 , g2 : M → R are β-Hölder functions, θ : M → M is an
L-Lipschitz transformation and η is a probability measure on M then:
12.4 Dimension of conformal repellers 417

(a) Hβ(g1 g2) ≤ sup|g1| Hβ(g2) + sup|g2| Hβ(g1);
(b) ∫|g1| dη ≤ sup|g1| ≤ ∫|g1| dη + Hβ(g1)(diam M)^β;
(c) Hβ(g ∘ θ) ≤ L^β Hβ(g).
Moreover, the claim in (a) remains true if we replace Hβ by Hβ,ρ . The same
holds for the claim in (c), as long as L ≤ 1.
12.3.6. Let Cβ (M) be the vector space of β-Hölder functions on a compact metric space
M. Prove the properties (i), (ii), (iii) stated at the end of Section 12.3.
12.3.7. Endow Cβ(M) with the norm ‖·‖β,ρ. Let L : Cβ(M) → Cβ(M) be the transfer operator associated with an α-Hölder potential ϕ : M → R, with α ≥ β. Let λ be the spectral radius, ν be the reference measure, h be the eigenfunction and μ = hν be the equilibrium state of the potential ϕ. Consider the transfer operator P : Cβ(M) → Cβ(M) associated with the potential ψ = ϕ + log h − log h ∘ f − log λ.
(a) Check that L is linearly conjugate to λP, and so spec(L) = λ spec(P). Moreover, P1 = 1 and P*μ = μ.
(b) Show that $\int |P^n g|\, d\mu \le \int |g|\, d\mu$ and $\sup|P^n g| \le \sup|g|$, and that there exist constants C > 0 and τ < 1 such that $H_{\beta,\rho}(P^n g) \le \tau^n H_{\beta,\rho}(g) + C \sup|g|$ for every g ∈ Cβ(M) and every n ≥ 1.
12.3.8. The goal of this exercise is to prove that the spectral radius of the restriction of L to the hyperplane V = {g ∈ Cβ(M) : ∫g dν = 0} is strictly less than λ. By part (a) of Exercise 12.3.7, it is enough to consider the case L = P (with λ = 1, ν = μ and h = 1). Fix b, β, R as in Corollary 12.3.12.
(a) Show that there exist K > 1 and r > 0 such that, for every v ∈ V with ‖v‖β,ρ ≤ r, the function g = 1 + v is in the cone C(b, β, R) and satisfies
$$K^{-1}\|v\|_{\beta,\rho} \le \theta(1, g) \le K\|v\|_{\beta,\rho}.$$
(b) Use Corollary 12.3.12 and the previous item to find C > 0 and τ < 1 such that $\|P^n v\|_{\beta,\rho} \le C\tau^n \|v\|_{\beta,\rho}$ for every v ∈ V. Deduce that the spectral radius of P | V is less than or equal to τ < 1.

12.4 Dimension of conformal repellers


In this section we present an application of the theory developed previously
to the calculation of the Hausdorff dimension of certain invariant sets
of expanding maps, that we call conformal repellers. The main result
(Theorem 12.4.3) contains a formula for the value of the Hausdorff dimension
of the repeller in terms of the pressure of certain potentials.
Detailed presentations of the theory of fractal dimensions and its many
applications can be found in the books of Falconer [Fal90], Palis and
Takens [PT93, Chapter 4], Pesin [Pes97] and Bonatti, Díaz and Viana [BDV05,
Chapter 3].

12.4.1 Hausdorff dimension


Let M be a metric space. In this section, we call a cover of M any countable
(possibly finite) family of subsets of M, not necessarily open, whose union is
the whole of M. The diameter of a cover U is the supremum of the diameters
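of its elements. For each d > 0 and δ > 0, define
$$m_d(M, \delta) = \inf\Big\{\sum_{U \in \mathcal{U}} (\operatorname{diam} U)^d : \mathcal{U} \text{ cover with } \operatorname{diam} \mathcal{U} < \delta\Big\}. \tag{12.4.1}$$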

That is, we consider all possible covers of M by subsets with diameter less than
δ and we try to minimize the sum of the diameters raised to the power d. This
number varies with δ in a monotonic fashion: when δ decreases, the class of
admissible covers decreases and, thus, the infimum can only increase. We call
Hausdorff measure of M in dimension d the limit
$$m_d(M) = \lim_{\delta \to 0} m_d(M, \delta). \tag{12.4.2}$$

Note that $m_d(M) \in [0, \infty]$. Moreover, it follows directly from the definition that
$$m_{d_1}(M, \delta) \le \delta^{d_1 - d_2}\, m_{d_2}(M, \delta) \quad \text{for every } \delta > 0 \text{ and any } d_1 > d_2 > 0.$$
Making δ → 0, it follows that $m_{d_1}(M) = 0$ or $m_{d_2}(M) = \infty$ or both (if $m_{d_2}(M) < \infty$, the right-hand side tends to zero). Therefore, there exists a unique d(M) ∈ [0, ∞] such that $m_d(M) = \infty$ for every d < d(M) and $m_d(M) = 0$ for every d > d(M). We call d(M) the Hausdorff dimension of the metric space M.

Example 12.4.1. Consider the usual Cantor set K in the real line. That is,
$$K = \bigcap_{n=0}^{\infty} K^n,$$
where $K^0 = [0, 1]$ and every $K^n$, n ≥ 1, is obtained by removing from each connected component of $K^{n-1}$ the central open subinterval with relative length 1/3. Let d0 = log 2/ log 3. Taking M = K, we are going to show that $m_{d_0}(M) = 1$, which implies that d(M) = d0.
To prove the upper bound, consider, for each n ≥ 0, the cover $\mathcal{V}^n$ of K whose elements are the intersections of K with each of the connected components of $K^n$. It is clear that the sequence $(\mathcal{V}^n)_n$ is increasing: $\mathcal{V}^{n-1} \prec \mathcal{V}^n$ for every n ≥ 1. Note also that $\mathcal{V}^n$ has exactly $2^n$ elements, all with diameter equal to $3^{-n}$. Therefore,
$$\sum_{V \in \mathcal{V}^n} (\operatorname{diam} V)^{d_0} = 2^n\, 3^{-n d_0} = 1 \quad \text{for every } n. \tag{12.4.3}$$
Since $\operatorname{diam} \mathcal{V}^n \to 0$ when n → ∞, it follows that $m_{d_0}(M) \le 1$ and so d(M) ≤ d0.
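The upper-bound computation can be reproduced exactly. The sketch below (helper names are illustrative) builds the connected components of K^n with Python's fractions module and evaluates the sums Σ(diam V)^d over V^n. These sums equal 1 for d = d0, tend to zero for d > d0 and grow for d < d0. Such covers only give upper bounds for m_d(K). The lower bound needs the argument for arbitrary covers.

```python
from fractions import Fraction
import math

def cantor_intervals(n):
    # Connected components of K^n: start from [0, 1] and remove open middle thirds n times.
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        new = []
        for a, b in intervals:
            third = (b - a) / 3
            new += [(a, a + third), (b - third, b)]
        intervals = new
    return intervals

d0 = math.log(2) / math.log(3)
for n in (4, 8, 12):
    comps = cantor_intervals(n)
    sums = {round(d, 3): sum(float(b - a) ** d for a, b in comps) for d in (d0 - 0.1, d0, d0 + 0.1)}
    print(n, len(comps), sums)
```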



The lower bound is a bit more difficult, because one needs to deal with arbitrary covers. We are going to show that, given any cover $\mathcal{U}$ of M,
$$\sum_{U \in \mathcal{U}} (\operatorname{diam} U)^{d_0} \ge 1. \tag{12.4.4}$$
Clearly, this implies that $m_{d_0}(M) \ge 1$ and so d(M) ≥ d0.


Let us call an open segment the intersection of K with any interval of the real line whose endpoints are not in K. It is clear that every subset of K is contained in some open segment whose diameter is only slightly larger. Hence, given any cover $\mathcal{U}$, we can always find covers $\mathcal{U}'$ whose elements are open segments such that $\sum_{U' \in \mathcal{U}'} (\operatorname{diam} U')^{d_0}$ is as close to $\sum_{U \in \mathcal{U}} (\operatorname{diam} U)^{d_0}$ as we want. So, it is no restriction to assume from the start that the elements of $\mathcal{U}$ are open segments. Then, since K is compact, we may also assume that $\mathcal{U}$ is finite. For any open segment U there exists n ≥ 0 such that every element of $\mathcal{V}^m$, m ≥ n, that intersects U is contained in U. Since $\mathcal{U}$ is finite, we may choose the same n for all its elements. We claim that
$$\sum_{U \in \mathcal{U}} (\operatorname{diam} U)^{d_0} \ge \sum_{V \in \mathcal{V}^n} (\operatorname{diam} V)^{d_0}. \tag{12.4.5}$$

Clearly, (12.4.3) and (12.4.5) imply (12.4.4). We are left to prove (12.4.5). The strategy is to modify the cover $\mathcal{U}$ successively, in such a way that the expression on the left-hand side of (12.4.5) never increases and one reaches the cover $\mathcal{V}^n$ after finitely many modifications. For each U ∈ $\mathcal{U}$, let k ≥ 0 be maximum such that U intersects a unique element V of $\mathcal{V}^k$. The choice of n implies that k ≤ n: for k > n, if U intersects an element of $\mathcal{V}^k$ then U contains all the $2^{k-n}$ elements of $\mathcal{V}^k$ inside the same element of $\mathcal{V}^n$. Suppose that k < n. By the choice of k, the set U intersects exactly two elements of $\mathcal{V}^{k+1}$. Let them be denoted V1 and V2 and let U1 and U2 be their intersections with U. Then
$$\operatorname{diam} U_i \le \operatorname{diam} V_i = 3^{-k-1} \quad \text{and} \quad \operatorname{diam} U = \operatorname{diam} U_1 + 3^{-k-1} + \operatorname{diam} U_2.$$
Hence (Exercise 12.4.1),
$$(\operatorname{diam} U)^{d_0} \ge (\operatorname{diam} U_1)^{d_0} + (\operatorname{diam} U_2)^{d_0}.$$
This means that the value on the left-hand side of (12.4.5) does not increase when we replace U by U1 and U2 in the cover $\mathcal{U}$. On the one hand, the new cover satisfies the same conditions as the original: U1 and U2 are open segments (because V1, V2 and U are open segments) and they contain every element of $\mathcal{V}^n$ that they intersect. On the other hand, by construction, each one of them intersects a unique element of $\mathcal{V}^{k+1}$. Therefore, after finitely many repetitions of this procedure we reduce the initial situation to the case where k = n for every U ∈ $\mathcal{U}$. Now, the choice of n implies that in that case each U ∈ $\mathcal{U}$ contains the unique V ∈ $\mathcal{V}^n$ that it intersects. Observe that this means that U = V. In particular, any elements of $\mathcal{U}$ that correspond to the same V ∈ $\mathcal{V}^n$ must coincide. Eliminating such repetitions we obtain the cover $\mathcal{V}^n$.
This completes the calculation.
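The only analytic input in the lower bound is the inequality of Exercise 12.4.1. By homogeneity one may take 3^{−k−1} = 1, so that it reads (u1 + 1 + u2)^{d0} ≥ u1^{d0} + u2^{d0} for u1, u2 ∈ [0, 1]. The sketch below checks it on a grid; this is a numerical sanity check, not a proof. Equality holds at u1 = u2 = 1, that is, when U1 and U2 are the whole of V1 and V2.

```python
import math

d0 = math.log(2) / math.log(3)

def slack(u1, u2):
    # (u1 + 1 + u2)^d0 - u1^d0 - u2^d0, after rescaling 3^(-k-1) to 1.
    return (u1 + 1 + u2) ** d0 - u1 ** d0 - u2 ** d0

grid = [i / 200 for i in range(201)]
worst = min(slack(u1, u2) for u1 in grid for u2 in grid)
print(worst, slack(1.0, 1.0))
```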
In general, the Hausdorff measure of a metric space M in its dimension d(M)
may take any value. In many interesting situations, including Example 12.4.1
and the much more general construction that we treat next, this measure is
positive and finite. But there are many other cases where it is either zero or
infinity.

12.4.2 Conformal repellers


Let D, D1, . . . , DN be compact convex subsets of a Euclidean space ℝ^ℓ such that Di ⊂ D for every i and Di ∩ Dj = ∅ whenever i ≠ j. Assume that
$$\operatorname{vol}(D \setminus D^*) > 0, \tag{12.4.6}$$
where D* = D1 ∪ · · · ∪ DN and vol denotes the volume measure on ℝ^ℓ. Assume also that there exists a map f : D* → D such that the restriction to each Di is a homeomorphism onto D. See Figure 12.2. Note that the sequence of pre-images $f^{-n}(D)$ is decreasing. Their intersection
$$\Lambda = \bigcap_{n=0}^{\infty} f^{-n}(D) \tag{12.4.7}$$
is called a repeller of f. In other words, Λ is the set of points x whose iterates $f^n(x)$ are defined for every n ≥ 1. It is clear that Λ is compact and $f^{-1}(\Lambda) = \Lambda$.
Example 12.4.2. The Cantor set K in Example 12.4.1 is the repeller of the
transformation f : [0, 1/3] ∪ [2/3, 1] → [0, 1] given by f (x) = 3x if x ∈ [0, 1/3]
and f (x) = 3x−2 if x ∈ [2/3, 1]. A more general class of examples in dimension
1 was introduced in Example 11.2.3.
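For the map of Example 12.4.2, the sets f^{−n}(D) with D = [0, 1] can be generated from the two inverse branches x ↦ x/3 and x ↦ (x + 2)/3 of f. The sketch below uses exact rational arithmetic; the helper names are illustrative. It builds f^{−n}([0, 1]) this way and checks that it coincides with the set K^n of Example 12.4.1, obtained by removing middle thirds.

```python
from fractions import Fraction

def preimages(n):
    # f^{-n}([0, 1]) as a sorted list of closed intervals, using the inverse branches of f.
    branches = [lambda x: x / 3, lambda x: (x + 2) / 3]
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        intervals = sorted((h(a), h(b)) for h in branches for a, b in intervals)
    return intervals

def middle_thirds(n):
    # K^n from Example 12.4.1, by repeatedly removing open middle thirds.
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        intervals = [piece for a, b in intervals
                     for piece in ((a, a + (b - a) / 3), (b - (b - a) / 3, b))]
    return sorted(intervals)

def f(x):
    # The map of Example 12.4.2 on [0, 1/3] U [2/3, 1].
    return 3 * x if x <= Fraction(1, 3) else 3 * x - 2

print(preimages(2))
```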
In what follows we take the map f : D∗ → D to be of class C1 ; for points
on the boundary of the domain this just means that f admits a C1 extension to
some neighborhood. We also make the following additional hypotheses.

Figure 12.2. A repeller (the figure shows the domain D, two subdomains Di and Dj, and the maps hi and hj).



The first hypothesis is that the map f is expanding: there exists σ > 1 such
that
Df (x)v ≥ σ for every x ∈ D∗ and every v ∈ R . (12.4.8)

Then it is not difficult to check that the restriction f :  →  to the repeller is


an expanding map in the sense of Section 11.2.
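The second hypothesis is that the logarithm of the Jacobian of $f$ is Hölder: there exist $C > 0$ and $\theta > 0$ such that
$$\log\frac{|\det Df(x)|}{|\det Df(y)|} \le C\|x - y\|^\theta \quad \text{for every } x, y \in D^*. \tag{12.4.9}$$
Up to choosing $C$ sufficiently large, the inequality is automatically satisfied when $x$ and $y$ belong to distinct subdomains $D_i$ and $D_j$. This is because $d(D_i, D_j) > 0$ and the continuous function $|\det Df|$ is bounded from above and away from zero on the compact set $D^*$.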
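The third and last hypothesis is that the map $f$ is conformal:
$$\|Df(x)\|\,\|Df(x)^{-1}\| = 1 \quad \text{for every } x \in D^*. \tag{12.4.10}$$
It is important to note that this condition is automatic when $\ell = 1$. For $\ell = 2$, identify $\mathbb{R}^2$ with $\mathbb{C}$. Then the condition holds if and only if the restriction of $f$ to each $D_i$ is either holomorphic or anti-holomorphic.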
All these conditions are satisfied in the case of the Cantor set (Examples 12.4.1 and 12.4.2). They are also satisfied in Example 11.2.3, as long as we take the derivative of the corresponding map $f$ to be Hölder.

Theorem 12.4.3 (Bowen–Manning formula). Suppose that $f : D^* \to D$ satisfies the conditions (12.4.8), (12.4.9) and (12.4.10). Then the Hausdorff dimension of the repeller is given by
$$d(\Lambda) = \ell\, d_0,$$
where $d_0 \in (0, 1)$ is the unique number such that $P(f, -d_0\log|\det Df|) = 0$.

The reader should be warned that we allow ourselves a slight abuse of language, in order not to overload the notation. Throughout this section, $P(f, \psi)$ always denotes the pressure of a potential $\psi : \Lambda \to \mathbb{R}$ with respect to the restriction $f : \Lambda \to \Lambda$ to the repeller. This holds even if at other points of the argument we consider the map $f$ defined on the whole domain $D^*$.
Before we start proving the theorem, let us mention the following interesting
special case:

Example 12.4.4. Let $f : J \to [0, 1]$ be a map as in Example 11.2.3 and assume that the restriction of $f$ to each connected component $J_i$ of $J$ is affine: the absolute value of the derivative is constant, equal to the inverse of the length $|J_i|$. Then the Hausdorff dimension of the repeller $K$ of the map $f$ is the unique number $\tau$ such that
$$\sum_i |J_i|^\tau = 1. \tag{12.4.11}$$
422 Thermodynamic formalism

To obtain this conclusion from Theorem 12.4.3 it suffices to note that
$$P(f, -t\log|f'|) = \log\sum_i |J_i|^t \quad \text{for every } t. \tag{12.4.12}$$
We let the reader check this last claim (Exercise 12.4.6).
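Equation (12.4.11) is easy to solve numerically. Below is a minimal Python sketch of ours, not the book's method. It assumes the lengths $|J_i|$ are given, with $0 < |J_i| < 1$ and $\sum_i |J_i| < 1$ (the latter is forced by (12.4.6)). It uses bisection, which is valid because $t \mapsto \sum_i |J_i|^t$ is strictly decreasing, equals $N \ge 1$ at $t = 0$, and is less than 1 at $t = 1$. The function name `similarity_dimension` is hypothetical.

```python
import math


def similarity_dimension(lengths, tol=1e-12):
    """Solve sum(|J_i|**tau) = 1 for tau, as in (12.4.11), by bisection.

    Assumes 0 < |J_i| < 1 and sum(|J_i|) < 1, so that the left-hand side
    decreases strictly in tau, is >= 1 at tau = 0 and is < 1 at tau = 1.
    """
    def g(tau):
        return sum(length ** tau for length in lengths) - 1.0

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) > 0:
            lo = mid  # the root lies to the right of mid
        else:
            hi = mid
    return (lo + hi) / 2


print(similarity_dimension([1/3, 1/3]), math.log(2) / math.log(3))
print(similarity_dimension([1/2, 1/4]))
```

For two intervals of length $1/3$ the sketch recovers $\log 2/\log 3 \approx 0.6309$, the dimension of the Cantor set $K$. For lengths $1/2$ and $1/4$, the substitution $x = 2^{-\tau}$ turns (11) into $x + x^2 = 1$, giving $\tau = \log\frac{1+\sqrt5}{2}/\log 2 \approx 0.694$.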

12.4.3 Distortion and conformality


Let us call an inverse branch of $f$ the inverse $h_i : D \to D_i$ of the restriction of $f$ to each domain $D_i$. More generally, we call an inverse branch of $f^n$ any composition
$$h^n = h_{i_0} \circ \cdots \circ h_{i_{n-1}} \tag{12.4.13}$$
with $i_0, \dots, i_{n-1} \in \{1, \dots, N\}$. For each $n \ge 1$, denote by $I^n$ the family of all inverse branches $h^n$ of $f^n$. By construction, the images $h^n(D)$, $h^n \in I^n$, are pairwise disjoint and their union contains $\Lambda$.
The principal goal in this section is to prove the following geometric
estimate, which is at the heart of the proof of Theorem 12.4.3:

Proposition 12.4.5. There exists $C_0 > 1$ such that for every $n \ge 1$, every $h^n \in I^n$, every $E \subset h^n(D)$ and every $x \in h^n(D)$:
$$\frac{1}{C_0}[\operatorname{diam} f^n(E)]^\ell \le [\operatorname{diam} E]^\ell\,|\det Df^n(x)| \le C_0[\operatorname{diam} f^n(E)]^\ell. \tag{12.4.14}$$

To begin the proof of this proposition, observe that our hypotheses imply that every inverse branch $h_i$ of $f$ is a diffeomorphism with $\|Dh_i\| \le \sigma^{-1}$. Then, since $D$ is convex, we may use the mean value theorem to conclude that
$$\|h_i(z) - h_i(w)\| \le \sigma^{-1}\|z - w\| \quad \text{for every } z, w \in D. \tag{12.4.15}$$

For each inverse branch $h^n$ as in (12.4.13), let us consider the sequence of inverse branches
$$h^{n-k} = h_{i_k} \circ \cdots \circ h_{i_{n-1}}, \quad k = 0, \dots, n-1. \tag{12.4.16}$$
Note that $h^{n-k}(D) \subset D_{i_k}$ for each $k$. It follows from (12.4.15) that each $h^{n-k}$ is a $\sigma^{k-n}$-contraction. In particular,
$$\operatorname{diam} h^{n-k}(D) \le \sigma^{k-n}\operatorname{diam} D \quad \text{for every } k = 0, \dots, n-1. \tag{12.4.17}$$

Recall that the convex hull of a set $X \subset \mathbb{R}^\ell$ is the smallest convex set that contains $X$, that is, the set of all convex combinations of points of $X$. It is clear that the convex hull has the same diameter as the set itself. Since $D_i$ is convex for every $i$, the convex hull of each $h^{n-k}(D)$ is contained in $D_{i_k}$. In particular, the derivative $Df$ is defined at every point in the convex hull of every $h^{n-k}(D)$.

Lemma 12.4.6. There exists $C_1 > 1$ such that, for every $n \ge 1$ and every inverse branch $h^n \in I^n$,
$$\prod_{k=0}^{n-1}\frac{|\det Df(z_k)|}{|\det Df(w_k)|} \le C_1$$
for any $z_k, w_k$ in the convex hull of $h^{n-k}(D)$, $k = 0, \dots, n-1$.

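Proof. The condition (12.4.9) gives that
$$\log\frac{|\det Df(z_k)|}{|\det Df(w_k)|} \le C\|z_k - w_k\|^\theta \le C[\operatorname{diam} h^{n-k}(D)]^\theta$$
for each $k = 0, \dots, n-1$. Then, using (12.4.17),
$$\log\prod_{k=0}^{n-1}\frac{|\det Df(z_k)|}{|\det Df(w_k)|} \le \sum_{k=0}^{n-1} C[\operatorname{diam} h^{n-k}(D)]^\theta \le C[\operatorname{diam} D]^\theta\sum_{k=0}^{n-1}\sigma^{(k-n)\theta}.$$
Therefore, it suffices to take $C_1 = \exp\big(C(\operatorname{diam} D)^\theta\sum_{j=1}^{\infty}\sigma^{-j\theta}\big)$.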
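Since $\sigma^{-\theta} < 1$, the series in the definition of $C_1$ is geometric. This gives the following closed form, a routine simplification not stated in the text:
$$C_1 = \exp\Big(C(\operatorname{diam} D)^\theta\sum_{j=1}^{\infty}\sigma^{-j\theta}\Big) = \exp\Big(\frac{C(\operatorname{diam} D)^\theta}{\sigma^\theta - 1}\Big).$$
In particular, $C_1$ depends only on the constants $C$ and $\theta$ in (12.4.9), the expansion rate $\sigma$ in (12.4.8) and $\operatorname{diam} D$. It does not depend on $n$.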

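The time has come for us to exploit the conformality hypothesis (12.4.10). Given any linear isomorphism $L : \mathbb{R}^\ell \to \mathbb{R}^\ell$, it is clear that $|\det L| \le \|L\|^\ell$, and analogously for the inverse. Therefore,
$$1 = |\det L|\,|\det L^{-1}| \le \big(\|L\|\,\|L^{-1}\|\big)^\ell.$$
Hence, if $\|L\|\,\|L^{-1}\| = 1$, then both inequalities $|\det L| \le \|L\|^\ell$ and $|\det L^{-1}| \le \|L^{-1}\|^\ell$ must be equalities. In particular, $|\det L| = \|L\|^\ell$. Therefore, (12.4.10) implies that
$$|\det Df(y)| = \|Df(y)\|^\ell \quad \text{for every } y \in D^*. \tag{12.4.18}$$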
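To see (12.4.10) and (12.4.18) at work in dimension $\ell = 2$, here is a small standard-library Python sketch. It is our own illustration and the helper names are hypothetical. It computes the operator norm of a $2 \times 2$ matrix from the eigenvalues of $M^T M$. It then checks that a similarity (a rotation times a scalar) satisfies $\|L\|\,\|L^{-1}\| = 1$ and $|\det L| = \|L\|^2$, whereas a non-conformal diagonal stretch does not.

```python
import math


def operator_norm_2x2(a, b, c, d):
    """Operator norm (largest singular value) of the matrix [[a, b], [c, d]]."""
    # The squared singular values are the eigenvalues of M^T M.
    p = a * a + c * c
    q = a * b + c * d
    r = b * b + d * d
    half_trace, det = (p + r) / 2, p * r - q * q
    return math.sqrt(half_trace + math.sqrt(max(half_trace ** 2 - det, 0.0)))


def norm_times_inverse_norm(a, b, c, d):
    """Return ||L|| * ||L^{-1}|| for an invertible 2x2 matrix L = [[a, b], [c, d]]."""
    det = a * d - b * c
    return operator_norm_2x2(a, b, c, d) * operator_norm_2x2(d / det, -b / det, -c / det, a / det)


# A similarity: twice a rotation by 0.7 radians (multiplication by 2 e^{0.7 i} on C).
s, t = 2 * math.cos(0.7), 2 * math.sin(0.7)
L = (s, -t, t, s)
det_L = L[0] * L[3] - L[1] * L[2]
print(norm_times_inverse_norm(*L))          # 1 up to rounding: L is conformal
print(det_L, operator_norm_2x2(*L) ** 2)    # both 4 up to rounding, as in (12.4.18)
print(norm_times_inverse_norm(2, 0, 0, 1))  # 2.0: a non-conformal stretch
```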
Now we are ready to prove Proposition 12.4.5:

Proof of Proposition 12.4.5. Let $n$, $h^n$, $E$ and $x$ be as in the statement. Let $w$ be a point of maximum for the norm of $Dh^n$ in the domain $D$. By the mean value theorem, applied to $h^n$ on the convex set $D$,
$$\|x_1 - x_2\| \le \|Dh^n(w)\|\,\|f^n(x_1) - f^n(x_2)\| \tag{12.4.19}$$
for any $x_1, x_2$ in $E$. Observe that $Dh^n(w)$ is the inverse of $Df^n(z)$, with $z = h^n(w)$. Since a composition of conformal linear maps is conformal, it follows that $\|Dh^n(w)\| = \|Df^n(z)\|^{-1}$. Moreover, using Lemma 12.4.6 and (12.4.18),
$$|\det Df^n(x)| \le C_1|\det Df^n(z)| = C_1\|Df^n(z)\|^\ell. \tag{12.4.20}$$
Combining (12.4.19) and (12.4.20), we obtain
$$\|x_1 - x_2\|^\ell \le C_1|\det Df^n(x)|^{-1}\|f^n(x_1) - f^n(x_2)\|^\ell.$$
Varying $x_1, x_2 \in E$, it follows that
$$[\operatorname{diam} E]^\ell \le C_1|\det Df^n(x)|^{-1}[\operatorname{diam} f^n(E)]^\ell.$$
This proves the second inequality in (12.4.14), as long as we take $C_0 \ge C_1$.

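The proof of the other inequality is similar. For each $k = 0, \dots, n-1$, let $z_k$ be a point of maximum for the norm of $Df$ restricted to the convex hull of $h^{n-k}(D)$. For any $x_1, x_2 \in E$, the points $f^k(x_1)$ and $f^k(x_2)$ lie in $h^{n-k}(D)$, so the segment joining them lies in this convex hull. Then, by the mean value theorem and (12.4.18),
$$\|f^{k+1}(x_1) - f^{k+1}(x_2)\| \le \|Df(z_k)\|\,\|f^k(x_1) - f^k(x_2)\| = |\det Df(z_k)|^{1/\ell}\,\|f^k(x_1) - f^k(x_2)\|$$
for every $k$ and any $x_1, x_2 \in E$. Hence,
$$\|f^n(x_1) - f^n(x_2)\|^\ell \le \prod_{k=0}^{n-1}|\det Df(z_k)|\;\|x_1 - x_2\|^\ell. \tag{12.4.21}$$
By Lemma 12.4.6, applied with $w_k = f^k(x)$,
$$\prod_{k=0}^{n-1}|\det Df(z_k)| \le C_1|\det Df^n(x)|. \tag{12.4.22}$$
Write $y_1 = f^n(x_1)$ and $y_2 = f^n(x_2)$. Combining (12.4.21) and (12.4.22), we obtain
$$\|y_1 - y_2\|^\ell \le C_1|\det Df^n(x)|\,\|x_1 - x_2\|^\ell.$$
Varying $y_1, y_2 \in f^n(E)$, we conclude that
$$[\operatorname{diam} f^n(E)]^\ell \le C_1|\det Df^n(x)|\,[\operatorname{diam} E]^\ell.$$
This proves the first inequality in (12.4.14), for any $C_0 \ge C_1$.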

12.4.4 Existence and uniqueness of d0


Now we prove the existence and uniqueness of the number $d_0$ in the statement of Theorem 12.4.3. Denote $\varphi = -\log|\det Df|$ and consider the function
$$\Psi : \mathbb{R} \to \mathbb{R}, \quad \Psi(t) = P(f, t\varphi).$$
We want to show that there exists a unique $d_0$ such that $\Psi(d_0) = 0$.
Uniqueness is easy to prove. Indeed, the hypotheses (12.4.8) and (12.4.10) imply that
$$\varphi = \log|\det Df^{-1}| = \ell\log\|Df^{-1}\| \le -\ell\log\sigma,$$
where $Df^{-1}$ stands for the inverse of the linear map $Df$. Then, given any $s < t$, we have $t\varphi \le s\varphi - (t - s)\ell\log\sigma$. Using (10.3.4) and (10.3.5), it follows that
$$P(f, t\varphi) \le P(f, s\varphi) - (t - s)\ell\log\sigma < P(f, s\varphi).$$
This proves that $\Psi$ is strictly decreasing, and so there exists at most one $d_0 \in \mathbb{R}$ such that $\Psi(d_0) = 0$.
On the other hand, it follows from Proposition 10.3.6 that $\Psi$ is continuous. Hence, to prove the existence of $d_0$ it is enough to show that $\Psi(0) > 0 > \Psi(1)$. This may be done as follows.

Let $\mathcal{L}$ be the open cover of $\Lambda$ whose elements are the images $h(\Lambda)$ of $\Lambda$ under all the inverse branches $h$ of $f$. For each $n \ge 1$, the iterated sum $\mathcal{L}^n$ is formed by the images $h^n(\Lambda)$ of $\Lambda$ under the inverse branches $h^n \in I^n$ of $f^n$. It follows from (12.4.17) that $\operatorname{diam}\mathcal{L}^n \le \sigma^{-n}\operatorname{diam} D$ for every $n$, and so $\operatorname{diam}\mathcal{L}^n \to 0$. Then, since the elements of $\mathcal{L}$ are pairwise disjoint, we may use Exercise 10.3.3 to conclude that
$$P(f, \psi) = P(f, \psi, \mathcal{L}) \quad \text{for every potential } \psi. \tag{12.4.23}$$
In particular, $\Psi(0) = P(f, 0, \mathcal{L}) = h(f, \mathcal{L})$. Note that each family $\mathcal{L}^n$ is a minimal cover of the repeller, that is, no proper subfamily covers $\Lambda$. Therefore, $H(\mathcal{L}^n) = \log\#\mathcal{L}^n = n\log N$ for every $n$ and, consequently, $h(f, \mathcal{L}) = \log N$. This proves that $\Psi(0)$ is positive.
 
Proposition 12.4.7. $\Psi(1) = \lim_n \frac{1}{n}\log\operatorname{vol}\big(f^{-n}(D)\big) < 0$.

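Proof. By (12.4.23), we have that $\Psi(1) = P(f, \varphi, \mathcal{L})$. In other words,
$$\Psi(1) = \lim_n \frac{1}{n}\log P_n(f, \varphi, \mathcal{L}) = \lim_n \frac{1}{n}\log\sum_{h^n \in I^n} e^{\varphi_n(h^n(\Lambda))}.$$
Since $\varphi = -\log|\det Df|$, we have $e^{\varphi_n(h^n(\Lambda))} = \sup_{h^n(\Lambda)}|\det Df^n|^{-1}$. By Lemma 12.4.6, this supremum differs from the supremum of $|\det Df^n|^{-1}$ over $h^n(D)$ by a factor of at most $C_1$, which does not affect the limit. Hence,
$$\Psi(1) = \lim_n \frac{1}{n}\log\sum_{h^n \in I^n}\sup_{h^n(D)}\frac{1}{|\det Df^n|}. \tag{12.4.24}$$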

On the other hand, by the formula of change of variables,
$$\operatorname{vol}\big(f^{-n}(D)\big) = \sum_{h^n \in I^n}\operatorname{vol}\big(h^n(D)\big) = \sum_{h^n \in I^n}\int_D \frac{1}{|\det Df^n|}\circ h^n\,dx.$$

It follows from Lemma 12.4.6 that
$$\inf_{h^n(D)}|\det Df^n| \le |\det Df^n|\big(h^n(z)\big) \le C_1\inf_{h^n(D)}|\det Df^n|$$
for every $z \in D$ and every $h^n \in I^n$. Consequently,
$$\operatorname{vol}\big(f^{-n}(D)\big) \le \operatorname{vol}(D)\sum_{h^n \in I^n}\sup_{h^n(D)}\frac{1}{|\det Df^n|} \le C_1\operatorname{vol}\big(f^{-n}(D)\big).$$

Combining these inequalities with (12.4.24), we conclude that
$$\limsup_n \frac{1}{n}\log\operatorname{vol}\big(f^{-n}(D)\big) \le \Psi(1) \le \liminf_n \frac{1}{n}\log\operatorname{vol}\big(f^{-n}(D)\big).$$
This proves the identity in the statement of the proposition.
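It remains to prove that the volume of the pre-images $f^{-n}(D)$ decays exponentially fast. For that, observe that $f^{-(n+1)}(D) = f^{-n}(D^*)$ is the disjoint union of the images $h^n(D^*)$, with $h^n \in I^n$. Therefore,
$$\frac{\operatorname{vol}\big(f^{-(n+1)}(D)\big)}{\operatorname{vol}\big(f^{-n}(D)\big)} = \frac{\sum_{h^n \in I^n}\operatorname{vol}\big(h^n(D^*)\big)}{\sum_{h^n \in I^n}\operatorname{vol}\big(h^n(D)\big)} \le \max_{h^n \in I^n}\frac{\operatorname{vol}\big(h^n(D^*)\big)}{\operatorname{vol}\big(h^n(D)\big)}. \tag{12.4.25}$$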

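By the formula of change of variables,
$$\operatorname{vol}\big(h^n(D)\big) = \int_D \frac{1}{|\det Df^n|}\circ h^n\,dx \quad\text{and}\quad \operatorname{vol}\big(h^n(D \setminus D^*)\big) = \int_{D \setminus D^*}\frac{1}{|\det Df^n|}\circ h^n\,dx.$$
Hence, using Lemma 12.4.6,
$$\frac{\operatorname{vol}\big(h^n(D \setminus D^*)\big)}{\operatorname{vol}\big(h^n(D)\big)} \ge \frac{1}{C_1}\,\frac{\operatorname{vol}(D \setminus D^*)}{\operatorname{vol}(D)} \tag{12.4.26}$$
for every $h^n \in I^n$. By the hypothesis (12.4.6), the expression on the right-hand side of (12.4.26) is positive. Fix $\beta > 0$ close enough to zero that $1 - e^{-\beta}$ is smaller than that expression. Since $h^n$ is injective, we have $h^n(D \setminus D^*) = h^n(D) \setminus h^n(D^*)$. Then
$$\frac{\operatorname{vol}\big(h^n(D) \setminus h^n(D^*)\big)}{\operatorname{vol}\big(h^n(D)\big)} \ge 1 - e^{-\beta}$$
for every $h^n \in I^n$. We combine this inequality with (12.4.25) and the fact that $\operatorname{vol}\big(h^n(D) \setminus h^n(D^*)\big) = \operatorname{vol} h^n(D) - \operatorname{vol} h^n(D^*)$. This yields
$$\frac{\operatorname{vol}\big(f^{-(n+1)}(D)\big)}{\operatorname{vol}\big(f^{-n}(D)\big)} \le e^{-\beta} \quad \text{for every } n \ge 0$$
(the case $n = 0$ follows directly from the hypothesis (12.4.6)). Hence,
$$\lim_n \frac{1}{n}\log\operatorname{vol}\big(f^{-n}(D)\big) \le -\beta < 0.$$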

This concludes the proof of the proposition.
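Proposition 12.4.7 can be checked by hand in the affine one-dimensional setting of Example 12.4.4. There, (12.4.12) with $t = 1$ gives $\Psi(1) = \log\sum_i |J_i|$. Every inverse branch multiplies lengths by $|J_i|$, so $\operatorname{vol}(f^{-n}(D)) = (\sum_i |J_i|)^n$ exactly. The Python sketch below computes the pre-images interval by interval and confirms that $\frac{1}{n}\log\operatorname{vol}(f^{-n}(D))$ equals $\log\sum_i |J_i| < 0$. It is our own illustration: the data layout (orientation-preserving branches $h_i(x) = c_i + r_i x$ on $D = [0, 1]$) and the function name are assumptions.

```python
import math


def preimage_volume(branches, n):
    """Total length of f^{-n}([0, 1]) for an affine repeller on the line.

    Hypothetical data layout: `branches` lists pairs (c_i, r_i) such that the
    inverse branch h_i(x) = c_i + r_i * x maps D = [0, 1] onto J_i = [c_i, c_i + r_i].
    """
    intervals = [(0.0, 1.0)]
    for _ in range(n):
        # Images h_i(I) of the current intervals under every inverse branch.
        intervals = [(c + r * a, c + r * b) for (a, b) in intervals for (c, r) in branches]
    return sum(b - a for a, b in intervals)


branches = [(0.0, 0.5), (0.75, 0.25)]  # J_1 = [0, 1/2] and J_2 = [3/4, 1]
for n in (1, 5, 10):
    vol = preimage_volume(branches, n)
    print(n, vol, math.log(vol) / n)  # last column: log(3/4), about -0.2877
```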

Figure 12.3 summarizes the conclusions in this section. Recall that the function defined by $\Psi(t) = P(f, -t\log|\det Df|)$ is convex, by Proposition 10.3.5.

12.4.5 Upper bound


Here we show that $d(\Lambda) \le b\ell$ for every $b > 0$ such that $P(f, b\varphi) < 0$. By the observations in the previous section, every $b > d_0$ satisfies this condition, so this proves that $d(\Lambda) \le \ell d_0$.

Figure 12.3. Pressure and Hausdorff dimension (graph of $t \mapsto \Psi(t)$: it takes the value $h(f)$ at $t = 0$ and vanishes at $t = d_0 < 1$).



Let $\mathcal{L}$ be the open cover of $\Lambda$ introduced in the previous section and let $b > 0$ be such that $P(f, b\varphi) < 0$. The property (12.4.23) implies that
$$P(f, b\varphi, \mathcal{L}) = P(f, b\varphi) < -\kappa$$
for some $\kappa > 0$. By the definition (10.3.2), it follows that
$$P_n(f, b\varphi, \mathcal{L}) \le e^{-\kappa n} \quad \text{for every } n \text{ sufficiently large.} \tag{12.4.27}$$
It is clear that $\mathcal{L}^n$ is a minimal cover of $\Lambda$: no proper subfamily covers $\Lambda$. Hence, recalling the definition (10.3.1), the inequality (12.4.27) implies that
$$\sum_{L \in \mathcal{L}^n} e^{b\varphi_n(L)} \le e^{-\kappa n} \quad \text{for every } n \text{ sufficiently large.} \tag{12.4.28}$$
It is clear that every $L \in \mathcal{L}^n$ is compact. Hence, by continuity of the Jacobian,
$$e^{\varphi_n(L)} = \sup_L|\det Df^n|^{-1} = |\det Df^n(x)|^{-1}$$
for some $x \in L$. It is also clear that $f^n(L) = \Lambda$ for every $L \in \mathcal{L}^n$. Then, taking $E = L$ in Proposition 12.4.5,
$$[\operatorname{diam} L]^\ell\,e^{-\varphi_n(L)} \le C_0[\operatorname{diam}\Lambda]^\ell.$$
Combining this inequality with (12.4.28), we obtain that
$$\sum_{L \in \mathcal{L}^n}[\operatorname{diam} L]^{b\ell} \le C_0^b[\operatorname{diam}\Lambda]^{b\ell}\sum_{L \in \mathcal{L}^n} e^{b\varphi_n(L)} \le C_0^b[\operatorname{diam}\Lambda]^{b\ell}e^{-\kappa n}$$
for every $n$ sufficiently large. The expression on the right-hand side converges to zero, and the diameter of the covers $\mathcal{L}^n$ also converges to zero. It follows that $m_{b\ell}(\Lambda) = 0$. Therefore, $d(\Lambda) \le b\ell$.
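For the middle-thirds Cantor set ($\ell = 1$, $N = 2$, $|f'| = 3$), formula (12.4.12) gives $P(f, b\varphi) = \log 2 - b\log 3$. The elements of $\mathcal{L}^n$ are the $2^n$ pieces of $K$, each of diameter $3^{-n}$. The Python sketch below is our own illustration with a hypothetical function name. It evaluates the sums $\sum_{L \in \mathcal{L}^n}[\operatorname{diam} L]^s$ from the argument above and shows that they tend to zero when $s > \log 2/\log 3$, stay equal to 1 at $s = \log 2/\log 3$, and blow up below it.

```python
import math


def cover_sum(n, s):
    """Sum of [diam L]**s over the 2**n elements L of the cover L^n of the Cantor set K.

    Each element of L^n is a copy of K scaled by 3**-n, and diam K = 1.
    """
    return 2 ** n * (3.0 ** -n) ** s


d = math.log(2) / math.log(3)
for s in (d - 0.05, d, d + 0.05):
    print(round(s, 4), [cover_sum(n, s) for n in (10, 20, 40)])
```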

12.4.6 Lower bound


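Now we show that $d(\Lambda) \ge a\ell$ for every $a$ such that $P(f, a\varphi) > 0$. Every $a < d_0$ satisfies this condition, so this implies that $d(\Lambda) \ge \ell d_0$, which completes the proof of Theorem 12.4.3.
As observed in the previous section, the cover $\mathcal{L}$ realizes the pressure and all its iterated sums $\mathcal{L}^n$ are minimal covers of $\Lambda$. Hence, the choice of $a$ implies that there exists $\kappa > 0$ such that
$$P_n(f, a\varphi, \mathcal{L}) = \sum_{L \in \mathcal{L}^n} e^{a\varphi_n(L)} \ge e^{\kappa n} \quad \text{for every } n \text{ sufficiently large.} \tag{12.4.29}$$
Fix such an $n$, large enough that also $e^{\kappa n} > C_0$. Let $\varepsilon > 0$ be a lower bound for the distance between any two elements of $\mathcal{L}^n$. Such a lower bound exists because the elements of $\mathcal{L}^n$ are compact and pairwise disjoint. Fix $\rho \in (0, \varepsilon^{a\ell})$. The reason for this choice will be clear soon. We claim that
$$\sum_{U \in \mathcal{U}}[\operatorname{diam} U]^{a\ell} \ge 2^{-a\ell}\rho \tag{12.4.30}$$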

for every cover $\mathcal{U}$ of $\Lambda$. By definition, this implies that $m_{a\ell}(\Lambda) \ge 2^{-a\ell}\rho > 0$ and, consequently, $d(\Lambda) \ge a\ell$. Therefore, to end the proof of Theorem 12.4.3 it suffices to prove this claim.
Let us suppose that there exists some cover of $\Lambda$ which does not satisfy (12.4.30). Then, using Exercise 12.4.3, there exists some open cover $\mathcal{U}$ of $\Lambda$ with
$$\sum_{U \in \mathcal{U}}[\operatorname{diam} U]^{a\ell} < \rho < \varepsilon^{a\ell}. \tag{12.4.31}$$
By compactness, we may suppose that this open cover $\mathcal{U}$ is finite. The relation (12.4.31) implies that every $U \in \mathcal{U}$ has diameter less than $\varepsilon$. Hence, each $U \in \mathcal{U}$ intersects at most one $L \in \mathcal{L}^n$. Since $\mathcal{L}^n$ covers $\Lambda$ and $U$ is a non-empty subset of $\Lambda$, we also have that $U$ intersects some $L \in \mathcal{L}^n$. This means that $\mathcal{U}$ is the disjoint union of the families
$$\mathcal{U}_L = \{U \in \mathcal{U} : U \cap L \ne \emptyset\}, \quad L \in \mathcal{L}^n.$$
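If $U \in \mathcal{U}_L$ then $U \subset L$. Let us consider the families $f^n(\mathcal{U}_L) = \{f^n(U) : U \in \mathcal{U}_L\}$. Observe that each one of them is a cover of $\Lambda$. Next, apply Proposition 12.4.5 with $x \in L$ a point where $|\det Df^n|$ attains its minimum on $L$, so that $|\det Df^n(x)| = e^{-\varphi_n(L)}$. We may assume $0 < a < d_0 < 1$, because $\Psi$ is decreasing and the claim is trivial for $a \le 0$, so $C_0^a \le C_0$. Together these give
$$\sum_{V \in f^n(\mathcal{U}_L)}[\operatorname{diam} V]^{a\ell} = \sum_{U \in \mathcal{U}_L}[\operatorname{diam} f^n(U)]^{a\ell} \le C_0\,e^{-a\varphi_n(L)}\sum_{U \in \mathcal{U}_L}[\operatorname{diam} U]^{a\ell}. \tag{12.4.32}$$
Therefore,
$$\sum_{U \in \mathcal{U}}[\operatorname{diam} U]^{a\ell} = \sum_{L \in \mathcal{L}^n}\sum_{U \in \mathcal{U}_L}[\operatorname{diam} U]^{a\ell} \ge C_0^{-1}\sum_{L \in \mathcal{L}^n} e^{a\varphi_n(L)}\sum_{V \in f^n(\mathcal{U}_L)}[\operatorname{diam} V]^{a\ell}.$$
Let us suppose that
$$\sum_{V \in f^n(\mathcal{U}_L)}[\operatorname{diam} V]^{a\ell} \ge \sum_{U \in \mathcal{U}}[\operatorname{diam} U]^{a\ell} \quad \text{for every } L \in \mathcal{L}^n.$$
Then, the previous inequality and (12.4.29) imply
$$\sum_{U \in \mathcal{U}}[\operatorname{diam} U]^{a\ell} \ge C_0^{-1}\sum_{L \in \mathcal{L}^n} e^{a\varphi_n(L)}\sum_{U \in \mathcal{U}}[\operatorname{diam} U]^{a\ell} \ge C_0^{-1}e^{\kappa n}\sum_{U \in \mathcal{U}}[\operatorname{diam} U]^{a\ell}.$$
This is a contradiction, because $e^{\kappa n} > C_0$ and the sum on the left-hand side is positive ($\Lambda$ is infinite, so it cannot be covered by finitely many sets of zero diameter). Hence, there exists $L \in \mathcal{L}^n$ such that
$$\sum_{V \in f^n(\mathcal{U}_L)}[\operatorname{diam} V]^{a\ell} < \sum_{U \in \mathcal{U}}[\operatorname{diam} U]^{a\ell} < \rho.$$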

Figure 12.4. Sierpinski triangle



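Thus, we may repeat the previous procedure with $f^n(\mathcal{U}_L)$, which is again a finite open cover of $\Lambda$ satisfying (12.4.31), in the place of $\mathcal{U}$. Observe, however, that $\#f^n(\mathcal{U}_L) = \#\mathcal{U}_L$ is strictly less than $\#\mathcal{U}$. This is because $\mathcal{U}$ must also contain elements that intersect the other elements of $\mathcal{L}^n$. Therefore, this process must stop after a finite number of steps. This contradiction proves the claim (12.4.30).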
The proof of Theorem 12.4.3 is complete. However, it is possible to prove an even stronger result: under the conditions of the theorem, the Hausdorff measure of $\Lambda$ in dimension $d(\Lambda)$ is positive and finite. We leave this statement as a special challenge (Exercise 12.4.7) for the reader who has remained with us until the end of this book!

12.4.7 Exercises
12.4.1. Let $d = \log 2/\log 3$. Show that $(x_1 + 1 + x_2)^d \ge x_1^d + x_2^d$ for every $x_1, x_2 \in [0, 1]$. Moreover, equality holds if and only if $x_1 = x_2 = 1$.
12.4.2. Let $f : M \to N$ be a Lipschitz map, with Lipschitz constant $L$. Show that
$$m_d(f(A)) \le L^d\,m_d(A)$$
for any $d \in (0, \infty)$ and any $A \subset M$. Use this fact to show that if $A \subset \mathbb{R}^n$ and $t > 0$, then $m_d(tA) = t^d\,m_d(A)$, where $tA = \{tx : x \in A\}$.
12.4.3. Represent by $m_d^o(M)$ and $m_d^c(M)$ the numbers defined in the same way as the Hausdorff measure $m_d(M)$ but considering only covers by open sets and covers by closed sets, respectively. Show that $m_d^o(M) = m_d^c(M) = m_d(M)$.
12.4.4. (Mass distribution principle) Let $\mu$ be a finite measure on a compact metric space $M$ and assume that there exist numbers $d, K, \rho > 0$ such that $\mu(B) \le K(\operatorname{diam} B)^d$ for every set $B \subset M$ with diameter less than $\rho$. Show that if $A \subset M$ is such that $\mu(A) > 0$ then $m_d(A) > 0$ and so $d(A) \ge d$.
12.4.5. Use the mass distribution principle to show that the Hausdorff dimension of the
Sierpinski triangle (Figure 12.4) is equal to d0 = log 3/ log 2 and the Hausdorff
measure in dimension d0 is positive and finite.
12.4.6. Check the pressure formula (12.4.12).
12.4.7. Adapting arguments from Exercise 12.4.5, show that under the conditions of Theorem 12.4.3 one has $0 < m_{d(\Lambda)}(\Lambda) < \infty$.
Appendix A
Topics in measure theory, topology and analysis

In this series of appendices we recall several basic concepts and facts in measure theory, topology and functional analysis that are useful throughout the book. Our purpose is to provide the reader with a quick, accessible source of references on measure and integration, general and differential topology, and spectral theory, so as to make this book as self-contained as possible.
We have not attempted to make the material in these appendices completely
sequential: it may happen that a notion mentioned in one section is defined or
discussed in more depth in a later one (check the index).
As a general rule, we omit the proofs. For Appendices A.1, A.2 and A.5,
the reader may find detailed information in the books of Castro [Cas04],
Fernandez [Fer02], Halmos [Hal50], Royden [Roy63] and Rudin [Rud87]. The
presentation in Appendix A.3 is a bit more complete, including the proofs of
most results, but the reader may find additional relevant material in the books
of Billingsley [Bil68, Bil71]. We recommend the books of Hirsch [Hir94]
and do Carmo [dC79] to all those interested in going further into the topics
in Appendix A.4. For more information on the subjects of Appendices A.6
and A.7, including proofs of the results quoted here, check the book of
Halmos [Hal51] and the treatise of Dunford and Schwartz [DS57, DS63],
especially Section IV.4 of the first volume and the initial sections of the second
volume.

A.1 Measure spaces


Measure spaces are the natural environment for the definition of the Lebesgue
integral, which is the main topic to be presented in Appendix A.2. We begin by
introducing the notions of algebra and σ -algebra of subsets of a set, which lead
to the concept of measurable space. Next, we present the notion of measure on
a σ -algebra and we analyze some of its properties. In particular, we mention a
few results on the construction of measures, including Lebesgue measures in
A.1 Measure spaces 431

Euclidean spaces. The last part is dedicated to measurable maps, which are the
maps that preserve the structure of measurable spaces.

A.1.1 Measurable spaces


Given a set X, we often denote by Ac the complement X \ A of each subset A.

Definition A.1.1. An algebra of subsets of a set X is a family B of subsets of X that contains the empty set and is closed under the elementary operations of set theory:

(i) ∅ ∈ B;
(ii) A ∈ B implies Ac ∈ B;
(iii) A ∈ B and B ∈ B implies A ∪ B ∈ B;
(iv) A ∈ B and B ∈ B implies A ∩ B ∈ B;
(v) A ∈ B and B ∈ B implies A \ B ∈ B.

The last two properties are immediate consequences of the previous ones,
since A ∩ B = (Ac ∪ Bc )c and A \ B = A ∩ Bc . Moreover, by associativity,
properties (iii) and (iv) imply that the union and the intersection of any finite
family of elements of B are also in B.
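On a finite set the axioms can be checked mechanically. The following Python sketch, with a hypothetical function name, tests properties (i)–(iii) for a family of subsets of a finite set $X$. By the remark above, (iv) and (v) then hold automatically. On a finite set every algebra is also a σ-algebra, since countable unions reduce to finite ones.

```python
from itertools import combinations


def is_algebra(X, family):
    """Check properties (i)-(iii) of Definition A.1.1 for subsets of a finite set X.

    Properties (iv) and (v) follow from these, as explained in the text.
    """
    X = frozenset(X)
    fam = {frozenset(A) for A in family}
    if frozenset() not in fam:                       # (i) contains the empty set
        return False
    if any(X - A not in fam for A in fam):           # (ii) closed under complements
        return False
    return all(A | B in fam for A, B in combinations(fam, 2))  # (iii) closed under unions


X = {1, 2, 3, 4}
print(is_algebra(X, [set(), {1, 2}, {3, 4}, X]))  # True
print(is_algebra(X, [set(), {1}, {2}, X]))         # False: the complement {2, 3, 4} of {1} is missing
```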

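Definition A.1.2. A σ-algebra of subsets of a set X is an algebra B of subsets of X that is also closed under countable unions:
$$A_j \in \mathcal{B} \text{ for } j = 1, \dots, n, \dots \quad\text{implies}\quad \bigcup_{j=1}^{\infty} A_j \in \mathcal{B}.$$
Then B is also closed under countable intersections:
$$A_j \in \mathcal{B} \text{ for } j = 1, \dots, n, \dots \quad\text{implies}\quad \bigcap_{j=1}^{\infty} A_j = \Big(\bigcup_{j=1}^{\infty} A_j^c\Big)^c \in \mathcal{B}.$$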

Definition A.1.3. A measurable space is a pair (X, B) where X is a set and B is a σ-algebra of subsets of X. The elements of B are called measurable sets.

Next, we describe a few examples of constructions of σ -algebras.

Example A.1.4. For any set X, the following families of subsets are σ-algebras:
$$\{\emptyset, X\} \quad\text{and}\quad 2^X = \{\text{all subsets of } X\}.$$
Moreover, clearly, if B is any algebra of subsets of X then $\{\emptyset, X\} \subset B \subset 2^X$. So, $\{\emptyset, X\}$ is the smallest and $2^X$ is the largest of all algebras of subsets of X.

In the statement that follows, I is an arbitrary set whose sole use is to index
the elements of the family of σ -algebras.
432 Measure theory, topology and analysis

Proposition A.1.5. Consider any non-empty family $\{B_i : i \in I\}$ of σ-algebras of subsets of the same set X. Then the intersection $B = \bigcap_{i \in I} B_i$ is also a σ-algebra of subsets of X.
Given any family E of subsets of X, we may apply Proposition A.1.5 to
the family of all σ -algebras that contain E. Note that this family is non-empty,
since it contains the σ-algebra $2^X$ of all subsets of X. According to the previous
proposition, the intersection of all these σ -algebras is also a σ -algebra. By
construction, this σ -algebra contains E and is contained in every σ -algebra
that contains E. In other words, it is the smallest σ -algebra that contains E.
This leads to the following definition:
Definition A.1.6. The σ -algebra generated by a family E of subsets of X is
the smallest σ -algebra σ (E) that contains E or, in other words, the intersection
of all the σ -algebras that contain E.
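When X is finite, σ(E) can be computed by repeatedly closing a family under complements and unions until nothing new appears. The Python sketch below illustrates this construction. It is not taken from the text, and the function name is hypothetical. The result is the smallest such family, since every σ-algebra containing E must contain each set produced along the way.

```python
def generated_sigma_algebra(X, family):
    """Smallest family of subsets of the finite set X that contains `family`
    and is closed under complements and unions (hypothetical helper).

    On a finite set X this is the sigma-algebra generated by `family`
    (Definition A.1.6), since countable unions reduce to finite ones.
    """
    X = frozenset(X)
    result = {frozenset(), X} | {frozenset(E) for E in family}
    changed = True
    while changed:
        changed = False
        current = list(result)
        for A in current:
            # Close under complements and pairwise unions.
            for C in [X - A] + [A | B for B in current]:
                if C not in result:
                    result.add(C)
                    changed = True
    return result


sigma = generated_sigma_algebra({1, 2, 3, 4}, [{1}, {1, 2}])
print(len(sigma))  # 8: all unions of the "atoms" {1}, {2} and {3, 4}
print(sorted(sorted(A) for A in sigma))
```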
Recall that a topological space is a pair (X, τ ) where X is a set and τ
is a family of subsets of X that contains {∅, X} and is closed under finite
intersections and arbitrary unions. Such a family τ is called a topology and
its elements are called open subsets of X. In this book we take all topological
spaces to be Hausdorff, that is, such that for any pair of distinct points there
exists a pair of disjoint open subsets each of which contains one of the points.
Definition A.1.7. The Borel σ -algebra of a topological space is the σ -algebra
σ (τ ) generated by the topology τ , that is, the smallest σ -algebra that contains
all the open subsets of X. The elements of σ (τ ) are called Borel subsets of X.
The closed subsets of X, being the complements of the open subsets, are also
in the Borel σ -algebra.
Analogously to Proposition A.1.5, the intersection of any non-empty family
{τi : i ∈ I} of topologies of the same set X is also a topology of X. Then, by
the same argument as we used before for σ -algebras, given any family E of
subsets of X there exists a smallest topology τ (E) that contains E. We call it
the topology generated by E.
Example A.1.8. Let (X, B) be a measurable space. The limit superior of a sequence of sets $E_n \in B$ is the set $\limsup_n E_n$ formed by the points $x \in X$ such that $x \in E_n$ for infinitely many values of n. Analogously, the limit inferior of $(E_n)_n$ is the set $\liminf_n E_n$ of points $x \in X$ such that $x \in E_n$ for every value of n sufficiently large. In other words,
$$\liminf_n E_n = \bigcup_{n \ge 1}\bigcap_{m \ge n} E_m \quad\text{and}\quad \limsup_n E_n = \bigcap_{n \ge 1}\bigcup_{m \ge n} E_m.$$
Observe that $\liminf_n E_n \subset \limsup_n E_n$ and both sets are in B.
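For a periodic sequence of sets, the tail unions and intersections over $m \ge n$ reduce to finite ones over $n \le m < n + p$, where $p$ is the period. So the two formulas above can be evaluated exactly. A minimal Python sketch under that assumption (periodicity is our simplifying hypothesis, and the function name is hypothetical):

```python
def limsup_liminf_periodic(E, p):
    """lim sup and lim inf (Example A.1.8) of a sequence of sets E(n) with period p.

    For a p-periodic sequence, the tail union (or intersection) over m >= n
    equals the union (or intersection) over n <= m < n + p, so
    lim sup = intersection over n of tail unions and
    lim inf = union over n of tail intersections
    can be evaluated with finitely many sets.
    """
    def tail_union(n):
        return set().union(*(E(m) for m in range(n, n + p)))

    def tail_intersection(n):
        return set.intersection(*(set(E(m)) for m in range(n, n + p)))

    # The tails repeat with period p, so n = 1, ..., p suffice.
    limsup = set.intersection(*(tail_union(n) for n in range(1, p + 1)))
    liminf = set().union(*(tail_intersection(n) for n in range(1, p + 1)))
    return limsup, liminf


# E_n = {n mod 3, 10}: the point 10 lies in every E_n, each of 0, 1, 2 in infinitely many.
sup_, inf_ = limsup_liminf_periodic(lambda n: {n % 3, 10}, 3)
print(sup_, inf_)
```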
Example A.1.9. The extended line R̄ = [−∞, ∞] is the union of the real line
R = (−∞, +∞) with the two points ±∞ at infinity. This space has a natural

topology, generated by the intervals [−∞, b) and (a, +∞], with a, b ∈ R. It is easy to see that the extended line is homeomorphic to a compact interval in the real line. For example, the function arctan : R → (−π/2, π/2) extends straightforwardly to a homeomorphism (that is, a continuous bijection whose inverse is also continuous) between R̄ and [−π/2, π/2]. We always consider on the extended line the Borel σ-algebra associated with this topology.
Of course, the real line R is a subspace (measurable as well as topological)
of the extended line. The Borel subsets of the real line constitute a large family
and one might even be led to think that every subset of R is a Borel subset.
However, this is not true: a counterexample is constructed in Exercise A.1.4.

A.1.2 Measure spaces


Let (X, B) be a measurable space. The following notions have a central role in
this book:
Definition A.1.10. A measure on (X, B) is a function μ : B → [0, +∞] such that μ(∅) = 0 and
$$\mu\Big(\bigcup_{j=1}^{\infty} A_j\Big) = \sum_{j=1}^{\infty}\mu(A_j)$$
for any countable family of pairwise disjoint sets Aj ∈ B. This last property is
called countable additivity or σ -additivity. Then the triple (X, B, μ) is called
a measure space. If μ(X) < ∞ then we say that μ is a finite measure and if
μ(X) = 1 then we call μ a probability measure. In this last case, (X, B, μ) is
called a probability space.
Example A.1.11. Let X be an arbitrary set, endowed with the σ-algebra $B = 2^X$. Given any $p \in X$, consider the function $\delta_p : 2^X \to [0, +\infty]$ defined by:
$$\delta_p(A) = \begin{cases} 1 & \text{if } p \in A,\\ 0 & \text{if } p \notin A.\end{cases}$$
It is easy to see that $\delta_p$ is a measure. It is usually called the Dirac measure, or Dirac mass, at p.
Definition A.1.12. We say that a measure μ is σ-finite if there exists a sequence $A_1, \dots, A_n, \dots$ of measurable subsets of X such that $\mu(A_i) < \infty$ for every $i \in \mathbb{N}$ and
$$X = \bigcup_{i=1}^{\infty} A_i.$$

We say that a function μ : B → [0, +∞] is finitely additive if
$$\mu\Big(\bigcup_{j=1}^{N} A_j\Big) = \sum_{j=1}^{N}\mu(A_j)$$

for any finite family $A_1, \dots, A_N \in B$ of pairwise disjoint subsets. Note that if μ is σ-additive then it is also finitely additive. Moreover, if μ is finitely additive and is not constant equal to +∞ then μ(∅) = 0.
The main tool for constructing measures is the following theorem:
Theorem A.1.13 (Extension). Let A be an algebra of subsets of X and let
μ0 : A → [0, +∞] be a σ -additive function with μ0 (X) < ∞. Then there exists
a unique measure μ defined on the σ -algebra B generated by A that is an
extension of μ0 , meaning that it satisfies μ(A) = μ0 (A) for every A ∈ A.
Theorem A.1.13 remains valid for σ-finite measures. Moreover, there is a version for finitely additive functions: if μ0 is finitely additive then it admits a finitely additive extension to the σ-algebra B generated by A. However, in this context the extension need not be unique.
The most useful criterion for proving that a given function is σ -additive is
provided by the following theorem:
Theorem A.1.14 (Continuity at the empty set). Let A be an algebra of subsets of X and μ : A → [0, +∞) be a finitely additive function with μ(X) < ∞. Then μ is σ-additive if and only if
$$\lim_n \mu(A_n) = 0 \tag{A.1.1}$$
for every sequence $A_1 \supset \cdots \supset A_j \supset \cdots$ of elements of A with $\bigcap_{j=1}^{\infty} A_j = \emptyset$.
The proof of this theorem is proposed in Exercise A.1.7. Exercise A.1.9
deals with some variations of the statement.
Definition A.1.15. We say that an algebra A is compact if any decreasing
sequence A1 ⊃ · · · ⊃ An ⊃ · · · of non-empty elements of A has non-empty
intersection.
An open cover of a topological space is a family of open subsets whose union is the whole space. A subcover is just a subfamily of elements of a cover whose union is still the whole space. A topological space is compact if every open cover admits some finite subcover. A subset K of a topological space X is compact if the topology of X restricted to K turns the latter into a compact topological space. Every closed subset of a compact space is compact. Conversely, if X is a Hausdorff space then every compact subset is closed. Another important fact is that the intersection $\bigcap_n K_n$ of any decreasing sequence $K_1 \supset \cdots \supset K_n \supset \cdots$ of non-empty compact subsets is non-empty.
Example A.1.16. It follows from what we have just said that if X is a
(Hausdorff) topological space and every element of the algebra A is compact
then A is a compact algebra.
It follows from Theorem A.1.14 that if A is a compact algebra then every
finitely additive function μ : A → [0, +∞) with μ(X) < ∞ is σ -additive.

Hence, by Theorem A.1.13, μ extends uniquely to a measure defined on the σ-algebra generated by A.

Definition A.1.17. We say that a non-empty family C of subsets of X is a monotone class if C contains X and is closed under countable monotone unions and intersections:

• if $A_1 \subset A_2 \subset \cdots$ are in C then $\bigcup_{n \ge 1} A_n \in C$, and
• if $A_1 \supset A_2 \supset \cdots$ are in C then $\bigcap_{n \ge 1} A_n \in C$.

Clearly, the two families $\{\emptyset, X\}$ and $2^X$ are monotone classes. Moreover, if $\{C_i : i \in I\}$ is any family of monotone classes then the intersection $\bigcap_{i \in I} C_i$ is a monotone class. Thus, for every subset A of $2^X$ there exists the smallest monotone class that contains A.

Theorem A.1.18 (Monotone class). The smallest monotone class that contains an algebra A coincides with the σ-algebra σ(A) generated by A.

Another important result about σ-algebras that will be useful later states that every element of a σ-algebra B generated by an algebra A is approximated by the elements of A. That is, the measure of the symmetric difference
$$A \,\Delta\, B = (A \setminus B) \cup (B \setminus A) = (A \cup B) \setminus (A \cap B)$$
can be made arbitrarily small. More precisely:

Theorem A.1.19 (Approximation). Let (X, B, μ) be a probability space and let A be an algebra of subsets of X that generates the σ-algebra B. Then, for every ε > 0 and every B ∈ B there exists A ∈ A such that μ(A Δ B) < ε.

Definition A.1.20. A measure space is complete if every subset of a measurable set with zero measure is also measurable.

It is possible to transform any measure space (X, B, μ) into a complete space, as follows. Let B̄ be the family of all subsets A ⊂ X such that there exist $B_1, B_2 \in B$ with $B_1 \subset A \subset B_2$ and $\mu(B_2 \setminus B_1) = 0$. Then B̄ is a σ-algebra and it contains B. Consider the function μ̄ : B̄ → [0, +∞] defined by $\bar\mu(A) = \mu(B_1) = \mu(B_2)$, for any $B_1, B_2 \in B$ as before. The function μ̄ is well defined, it is a measure on B̄ and its restriction to B coincides with μ. By construction, (X, B̄, μ̄) is a complete measure space. It is called the completion of (X, B, μ).
Given subsets $U_1$ and $U_2$ of the σ-algebra B, we say that $U_1 \subset U_2$ up to measure zero if for every $B_1 \in U_1$ there exists $B_2 \in U_2$ such that $\mu(B_1 \,\Delta\, B_2) = 0$.
By definition, U1 = U2 up to measure zero if U1 ⊂ U2 up to measure zero and
U2 ⊂ U1 up to measure zero. We say that a set U ⊂ B generates the σ -algebra B
up to measure zero if the σ -algebra generated by U is equal to B up to measure
zero. Equivalently, U generates B up to measure zero if the completion of the
σ -algebra generated by U coincides with the completion of B.

By definition, a measure takes values in [0, ∞]. Whenever it is convenient to stress that fact, we speak of a positive measure instead. But it is possible to weaken that requirement and, indeed, such generalizations are useful for our purposes.
We call a signed measure on a measurable space (X, B) any σ-additive function μ : B → [−∞, ∞] such that μ(∅) = 0. More precisely, μ may take either the value −∞ or the value +∞, but not both; this is to avoid the indeterminate expression ∞ − ∞ in the additivity condition.

Theorem A.1.21 (Hahn decomposition). If μ is a signed measure then there exist measurable sets P, N ⊂ X such that P ∪ N = X and P ∩ N = ∅, and
$$\mu(E) \ge 0 \text{ for every measurable } E \subset P \quad\text{and}\quad \mu(E) \le 0 \text{ for every measurable } E \subset N.$$

This means that we may write $\mu = \mu^+ - \mu^-$, where $\mu^+$ and $\mu^-$ are the (positive) measures defined by
$$\mu^+(E) = \mu(E \cap P) \quad\text{and}\quad \mu^-(E) = -\mu(E \cap N).$$
In particular, the sum $|\mu| = \mu^+ + \mu^-$ is also a positive measure; it is called the total variation of the signed measure μ.
If μ takes values in (−∞, ∞) only, we call it a finite signed measure. In this case, the measures $\mu^+$ and $\mu^-$ are finite. The set M(X) of finite signed measures is a real vector space and the function $\|\mu\| = |\mu|(X)$ is a complete norm in this space (see Exercise A.1.10). In other words, $(M(X), \|\cdot\|)$ is a real Banach space. When X is a compact metric space, this Banach space is isomorphic to the dual of the space $C^0(X)$ of continuous real functions on X (theorem of Riesz–Markov).
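On a finite set the Hahn decomposition can be read off directly from point masses. The following Python sketch assumes μ is given by real weights on the points of a finite set X; this setup and the function name are our own illustration. It takes P to be the points of non-negative weight and N the rest, and builds $\mu^+$, $\mu^-$ and the norm $\|\mu\| = |\mu|(X)$.

```python
def hahn_decomposition(weights):
    """Hahn decomposition X = P ∪ N and the measures mu^+ and mu^- for a finite signed measure.

    Hypothetical setup: X is finite and mu(E) is the sum of the real weights of the points of E.
    """
    P = {x for x, w in weights.items() if w >= 0}
    N = set(weights) - P

    def mu(E):
        return sum(weights[x] for x in E)

    def mu_plus(E):
        return mu(set(E) & P)

    def mu_minus(E):
        return -mu(set(E) & N)

    return P, N, mu, mu_plus, mu_minus


weights = {"a": 2.0, "b": -1.5, "c": 0.5, "d": -0.25}
P, N, mu, mu_plus, mu_minus = hahn_decomposition(weights)
X = set(weights)
total_variation = mu_plus(X) + mu_minus(X)  # ||mu|| = |mu|(X)
print(sorted(P), sorted(N))                 # ['a', 'c'] ['b', 'd']
print(total_variation)                      # 4.25
```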
More generally, we call a complex measure on a measurable space (X, B) any σ-additive function μ : B → C. Observe that μ(∅) is necessarily zero. Clearly, we may write $\mu = \Re\mu + i\,\Im\mu$, where the real part $\Re\mu$ and the imaginary part $\Im\mu$ are finite signed measures. The total variation of μ is the finite measure defined by
$$|\mu|(E) = \sup_{\mathcal{P}}\sum_{P \in \mathcal{P}}|\mu(P)|,$$
where the supremum is taken over all countable partitions $\mathcal{P}$ of the measurable set E into measurable subsets. This definition coincides with the one we gave previously in the special case when μ is real. The function $\|\mu\| = |\mu|(X)$ defines a norm in the vector space of complex measures on X, which we also denote as M(X). Moreover, this norm is complete. When X is a compact metric space, the complex Banach space $(M(X), \|\cdot\|)$ is isomorphic to the dual of the space $C^0(X)$ of continuous complex functions on X (theorem of Riesz–Markov).

A.1.3 Lebesgue measure


The notion of Lebesgue measure corresponds to the notion of volume of subsets of the Euclidean space $\mathbb{R}^d$. It is defined as follows.
Let X = [0, 1] and let A be the family of all subsets of the form $A = I_1 \cup \cdots \cup I_N$, where $I_1, \dots, I_N$ are pairwise disjoint intervals. It is easy to check that A is an algebra of subsets of X. Let $m_0 : A \to [0, 1]$ be the function defined on this algebra by
$$m_0(I_1 \cup \cdots \cup I_N) = |I_1| + \cdots + |I_N|,$$
where $|I_j|$ represents the length of each interval $I_j$. Note that $m_0(X) = 1$. In Exercise A.1.8 we ask the reader to show that $m_0$ is σ-additive.
Note that the σ-algebra B generated by A coincides with the Borel σ-algebra of X, since every open subset of X is a countable union of pairwise disjoint intervals. So, by Theorem A.1.13, there exists a unique probability measure m defined on B that is an extension of $m_0$. It is called the Lebesgue measure on [0, 1].
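The function $m_0$ only needs the endpoints of the intervals. A minimal Python sketch, assuming each interval is given as a pair (a, b) with 0 ≤ a ≤ b ≤ 1 (whether an endpoint belongs to the interval does not affect its length, so it is not recorded):

```python
from fractions import Fraction

def m0(intervals):
    """m0(I_1 ∪ ... ∪ I_N) = |I_1| + ... + |I_N| for pairwise disjoint intervals,
    each given by its endpoints (a, b) with 0 <= a <= b <= 1."""
    ordered = sorted(intervals)
    # consecutive intervals may share an endpoint but must not overlap
    for (_, b1), (a2, _) in zip(ordered, ordered[1:]):
        if a2 < b1:
            raise ValueError("intervals overlap")
    return sum((b - a for a, b in ordered), Fraction(0))
```

Pairs that only share an endpoint are accepted, because the sketch cannot tell whether, say, [0, 1/2) and [1/2, 1] are disjoint; this does not change the value of $m_0$.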
More generally, one defines the Lebesgue measure m on the cube $X = [0,1]^d$ of any dimension d ≥ 1 in the following way. First, we call a rectangle in X any subset of the form $R = I_1 \times \cdots \times I_d$, where the $I_j$ are intervals. Then we define
$$m_0(R) = |I_1| \times \cdots \times |I_d|.$$
Next, we consider the algebra A of subsets of X of the form $A = R_1 \cup \cdots \cup R_N$, where $R_1, \dots, R_N$ are pairwise disjoint rectangles, and we define
$$m_0(A) = m_0(R_1) + \cdots + m_0(R_N)$$
for every A in that algebra. The σ-algebra generated by A coincides with the Borel σ-algebra of X. The Lebesgue measure on the cube $X = [0,1]^d$ is the extension of $m_0$ to that σ-algebra.
In order to define the Lebesgue measure on the whole Euclidean space $\mathbb{R}^d$, we decompose the space into cubes of unit size:
$$\mathbb{R}^d = \bigcup_{k_1 \in \mathbb{Z}} \cdots \bigcup_{k_d \in \mathbb{Z}} [k_1, k_1+1) \times \cdots \times [k_d, k_d+1).$$
Each cube $C = [k_1, k_1+1) \times \cdots \times [k_d, k_d+1)$ may be identified with $[0,1)^d$ through the translation $T_{k_1,\dots,k_d}(x) = x - (k_1, \dots, k_d)$, which maps $(k_1, \dots, k_d)$ to the origin. That allows us to define a measure $m_{k_1,\dots,k_d}$ on C by setting
$$m_{k_1,\dots,k_d}(B) = m\big(T_{k_1,\dots,k_d}(B)\big)$$
for every measurable set $B \subset C$, where on the right-hand side m denotes the Lebesgue measure on the cube $[0,1]^d$. Finally, given any measurable set $B \subset \mathbb{R}^d$, define
$$m(B) = \sum_{k_1 \in \mathbb{Z}} \cdots \sum_{k_d \in \mathbb{Z}} m_{k_1,\dots,k_d}\big(B \cap [k_1, k_1+1) \times \cdots \times [k_d, k_d+1)\big).$$
Note that this measure m is σ-finite but not finite.
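To see the definition of m(B) at work on a simple set, the Python sketch below takes a box $B = [a_1, b_1) \times \cdots \times [a_d, b_d)$, splits it along the unit cubes $[k_1, k_1+1) \times \cdots \times [k_d, k_d+1)$, and adds up the measures of the pieces. It relies on each piece being again a box, whose measure is the product of its side lengths, and on translations not changing those lengths; it handles boxes only, not general measurable sets, and the function name is ours.

```python
import itertools
import math
from fractions import Fraction

def box_measure_via_unit_cubes(box):
    """Lebesgue measure of [a_1,b_1) x ... x [a_d,b_d), computed as the sum over the
    unit cubes [k_1,k_1+1) x ... x [k_d,k_d+1) of the measures of the intersections."""
    ranges = [range(math.floor(a), math.ceil(b)) for a, b in box]
    total = Fraction(0)
    for k in itertools.product(*ranges):
        piece = Fraction(1)
        for (a, b), kj in zip(box, k):
            # side length of [a, b) ∩ [kj, kj + 1); translating by -kj keeps it unchanged
            piece *= max(Fraction(0), min(b, kj + 1) - max(a, kj))
        total += piece
    return total
```

As expected, the result is the product $(b_1 - a_1) \cdots (b_d - a_d)$; for example, the box $[-1/2, 3/2) \times [0, 5/2)$ meets six unit squares and has measure 5.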



Example A.1.22. It is worthwhile outlining a classical alternative construction of the Lebesgue measure (see Chapter 2 of Royden [Roy63] for details). We call the Lebesgue exterior measure of an arbitrary set $E \subset \mathbb{R}^d$ the number
$$m^*(E) = \inf \sum_k m_0(R_k),$$
where the infimum is taken over all countable covers $(R_k)_k$ of E by open rectangles. The function $E \mapsto m^*(E)$ is defined for every $E \subset \mathbb{R}^d$, but is not finitely additive (although it is countably subadditive). We say that E is a Lebesgue measurable set if
$$m^*(A) = m^*(A \cap E) + m^*(A \cap E^c) \quad\text{for every } A \subset \mathbb{R}^d.$$
Every rectangle R is a Lebesgue measurable set and satisfies $m^*(R) = m_0(R)$.
The family M of all Lebesgue measurable sets is a σ-algebra. Moreover, the restriction of $m^*$ to M is σ-additive and, hence, a measure. Since M is a σ-algebra containing every rectangle, it contains every Borel set of $\mathbb{R}^d$. The restriction of $m^*$ to the Borel σ-algebra B of $\mathbb{R}^d$ coincides with the Lebesgue measure on $\mathbb{R}^d$.
Actually, M coincides with the completion of the Borel σ-algebra of $\mathbb{R}^d$ with respect to the Lebesgue measure. This and other related properties are part of Exercise A.1.13.

Example A.1.23. Let φ : [0, 1] → R be a positive continuous function. Given any interval I with endpoints 0 ≤ a < b ≤ 1, define
$$\mu_\varphi(I) = \int_a^b \varphi(x)\,dx \quad\text{(Riemann integral)}.$$
Next, extend the definition of $\mu_\varphi$ to the algebra A formed by the finite unions $A = I_1 \cup \cdots \cup I_k$ of pairwise disjoint intervals, through the relation
$$\mu_\varphi(A) = \sum_{j=1}^{k} \mu_\varphi(I_j).$$
The basic properties of the Riemann integral ensure that $\mu_\varphi$ is finitely additive. We leave it to the reader to check that $\mu_\varphi$ is σ-additive on the algebra A (see Exercise A.1.7). Moreover, $\mu_\varphi(\emptyset) = 0$ and $\mu_\varphi([0, 1]) < \infty$, because φ is continuous and, hence, bounded. With the help of Theorem A.1.13, we may extend $\mu_\varphi$ to the whole Borel σ-algebra of [0, 1].
The measure $\mu_\varphi$ that we have just constructed has the following special property: if a set A ⊂ [0, 1] has Lebesgue measure zero then $\mu_\varphi(A) = 0$. This property is called absolute continuity (with respect to the Lebesgue measure) and is studied in more depth in Section A.2.4.
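Numerically, $\mu_\varphi(A)$ for a finite union of disjoint intervals in Example A.1.23 is a sum of Riemann integrals. The sketch below approximates each of them with the composite midpoint rule; the quadrature rule, the number of steps and the function name are our choices for illustration, since the construction itself uses the exact Riemann integral.

```python
def mu_phi(phi, intervals, steps=10_000):
    """Approximate mu_phi(I_1 ∪ ... ∪ I_k) = sum_j ∫_{I_j} phi(x) dx for pairwise
    disjoint intervals I_j with endpoints (a, b) in [0, 1], using the composite
    midpoint rule with `steps` subintervals per interval."""
    total = 0.0
    for a, b in intervals:
        h = (b - a) / steps
        total += h * sum(phi(a + (i + 0.5) * h) for i in range(steps))
    return total
```

For φ(x) = 1 + x and A = [0, 1/4] ∪ [1/2, 1] the exact value is 9/32 + 7/8 = 37/32 = 1.15625; the midpoint rule is exact for affine φ, up to rounding.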
Here is an example of a measure that is positive on every non-empty open set but is not absolutely continuous with respect to the Lebesgue measure:

Example A.1.24. Fix any enumeration $\{r_1, r_2, \dots\}$ of the set Q of rational numbers. Consider the measure μ defined on R by
$$\mu(A) = \sum_{r_i \in A} \frac{1}{2^i}.$$
On the one hand, the measure of any non-empty open subset of the real line is positive, for such a subset must contain some $r_i$. On the other hand, the measure of Q is
$$\mu(\mathbb{Q}) = \sum_{r_i \in \mathbb{Q}} \frac{1}{2^i} = 1.$$
Since Q has Lebesgue measure zero (because it is a countable set), this implies that μ is not absolutely continuous with respect to the Lebesgue measure.
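Example A.1.24 works for any enumeration of Q. To experiment with it, the Python sketch below fixes one concrete choice, which is an assumption of ours: $r_1 = 0$, followed by q, −q for each positive rational q listed in the Calkin–Wilf order 1, 1/2, 2, 1/3, 3/2, …, generated by the recursion $q \mapsto 1/(2\lfloor q \rfloor - q + 1)$. Since μ(A) is an infinite sum, the sketch only computes partial sums over $r_1, \dots, r_n$.

```python
import math
from fractions import Fraction

def rationals():
    """One enumeration r_1, r_2, ... of Q (an illustrative choice):
    0, then q, -q for each positive rational q in Calkin-Wilf order."""
    yield Fraction(0)
    q = Fraction(1)
    while True:
        yield q
        yield -q
        # Calkin-Wilf successor of q: 1 / (2*floor(q) - q + 1)
        q = 1 / (2 * math.floor(q) - q + 1)

def mu_partial(in_A, n_terms):
    """Partial sum of mu(A) = sum over r_i in A of 2^(-i), using r_1, ..., r_{n_terms}.
    `in_A` is a predicate deciding whether a rational number belongs to A."""
    total = Fraction(0)
    for i, r in enumerate(rationals(), start=1):
        if i > n_terms:
            break
        if in_A(r):
            total += Fraction(1, 2 ** i)
    return total
```

Taking A = R gives the partial sums $1 - 2^{-n}$, which increase to μ(Q) = 1, while for a short open interval such as (1/3, 2/5) the partial sums become positive as soon as the enumeration reaches a rational number in that interval.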

This example also motivates the concept of the support of a measure on a topological space (X, τ), which we introduce next. For that, we must recall a few basic ideas from topology.
A subset τ′ of the topology τ is a basis of the topology, or a basis of open sets, if for every x ∈ X and every open set U containing x there exists U′ ∈ τ′ such that x ∈ U′ ⊂ U. We say that the topological space admits a countable basis of open sets if such a subset τ′ may be chosen to be countable. A set V ⊂ X is a neighborhood of a point x ∈ X if there exists some open set U such that x ∈ U ⊂ V. Thus, a subset of X is open if and only if it is a neighborhood of each one of its points. A family υ′ of neighborhoods of a point x ∈ X is a basis of neighborhoods of x if for every neighborhood V of x there exists some V′ ∈ υ′ such that x ∈ V′ ⊂ V. We say that x admits a countable basis of neighborhoods if υ′ may be chosen to be countable. If the topological space admits a countable basis of open sets then every x ∈ X admits a countable basis of neighborhoods, namely, the family of elements of the countable basis of open sets that contain x.

Definition A.1.25. Let (X, τ) be a topological space and μ be a measure on the Borel σ-algebra of X. The support of the measure μ is the set supp μ formed by the points x ∈ X such that μ(V) > 0 for every neighborhood V of x.

It follows immediately from the definition that the support of a measure is a closed set. In Example A.1.24 above, the support of μ is the whole real line, even though μ is concentrated on the countable set Q, in the sense that μ(Q) = μ(R) = 1.

Proposition A.1.26. If X is a topological space with a countable basis of open sets and μ is a non-zero measure on X, then the support supp μ is non-empty.
Proof. If supp μ is empty then for each point x ∈ X we may find an open neighborhood $V_x$ such that $\mu(V_x) = 0$. Let $\{A_j : j = 1, 2, \dots\}$ be a countable basis of the topology of X. Then, for each x ∈ X we may choose i(x) ∈ N such that $x \in A_{i(x)} \subset V_x$; in particular, $\mu(A_{i(x)}) = 0$. Hence,
$$X = \bigcup_{x \in X} V_x = \bigcup_{x \in X} A_{i(x)}$$
and, since the set of indices $J = \{i(x) : x \in X\}$ is countable,
$$\mu(X) = \mu\Big(\bigcup_{x \in X} A_{i(x)}\Big) \le \sum_{i \in J} \mu(A_i) = 0.$$
This contradicts the assumption that μ is non-zero, and so supp μ cannot be empty.

A.1.4 Measurable maps


Measurable maps play a role in measure theory similar to the role of continuous maps in topology: measurability means that the pre-image of every measurable subset is measurable, just as continuity means that the pre-image of every open subset is open.
Definition A.1.27. Given measurable spaces $(X, \mathcal{B})$ and $(Y, \mathcal{C})$, we say that a map f : X → Y is measurable if $f^{-1}(C) \in \mathcal{B}$ for every $C \in \mathcal{C}$.
In general, the family of sets C ⊂ Y such that $f^{-1}(C) \in \mathcal{B}$ is a σ-algebra. So, to prove that f is measurable it suffices to show that $f^{-1}(C_0) \in \mathcal{B}$ for every set $C_0$ in some family $\mathcal{C}_0 \subset \mathcal{C}$ that generates the σ-algebra $\mathcal{C}$. See also Exercise A.1.1.
Example A.1.28. A function f : X → [−∞, ∞] is measurable if and only if the set $f^{-1}((c, +\infty])$ belongs to B for every c ∈ R. This follows from the previous observation, since the family of intervals (c, +∞] generates the Borel σ-algebra of the extended line (recall Example A.1.9). In particular, if a function f takes values in (−∞, +∞) then it is measurable if and only if $f^{-1}((c, +\infty))$ belongs to B for every c ∈ R.
Example A.1.29. If X is a topological space and B is the corresponding Borel
σ -algebra, then every continuous function f : X → R is measurable. Indeed,
continuity means that the pre-image of every open subset of R is an open subset
of X and, hence, is in B. Since the family of open sets generates the Borel
σ -algebra of R, it follows that the pre-image of every Borel subset of the real
line is also in B.
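Example A.1.30. The characteristic function $\mathcal{X}_B : X \to \mathbb{R}$ of a set B ⊂ X is defined by
$$\mathcal{X}_B(x) = \begin{cases} 1, & \text{if } x \in B;\\ 0, & \text{otherwise.} \end{cases}$$
Observe that the function $\mathcal{X}_B$ is measurable if and only if B is a measurable subset: indeed, $\mathcal{X}_B^{-1}(A) \in \{\emptyset, B, X \setminus B, X\}$ for any A ⊂ R.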
Among the basic properties of measurable functions, let us highlight:

Proposition A.1.31. Let f, g : X → [−∞, +∞] be measurable functions and let a, b ∈ R. Then the following functions are also measurable:
$$(af + bg)(x) = af(x) + bg(x) \quad\text{and}\quad (f \cdot g)(x) = f(x) \cdot g(x).$$
Moreover, if $f_n : X \to [-\infty, +\infty]$ is a sequence of measurable functions, then the following functions are also measurable:
$$s(x) = \sup\{f_n(x) : n \ge 1\} \quad\text{and}\quad i(x) = \inf\{f_n(x) : n \ge 1\},$$
$$f^*(x) = \limsup_n f_n(x) \quad\text{and}\quad f_*(x) = \liminf_n f_n(x).$$
In particular, if the limit $f(x) = \lim_n f_n(x)$ exists for every x ∈ X then f is measurable.

The linear combinations of characteristic functions form an important class of measurable functions:
Definition A.1.32. We say that a function s : X → R is simple if there exist constants $\alpha_1, \dots, \alpha_k \in \mathbb{R}$ and pairwise disjoint measurable sets $A_1, \dots, A_k \in B$ such that
$$s = \sum_{j=1}^{k} \alpha_j \mathcal{X}_{A_j}, \tag{A.1.2}$$
where $\mathcal{X}_A$ is the characteristic function of the set A.

Note that every simple function is measurable. In the converse direction,


the result that follows asserts that every measurable function is the limit of
a sequence of simple functions. This fact will be very useful in the next
appendix, when defining the Lebesgue integral.

Proposition A.1.33. Let f : X → [−∞, +∞] be a measurable function. Then there exists a sequence $(s_n)_n$ of simple functions such that $|s_n(x)| \le |f(x)|$ for every n and
$$\lim_n s_n(x) = f(x) \quad\text{for every } x \in X.$$
If f takes values in R, we may take every $s_n$ with values in R. If f is bounded, the sequence $(s_n)_n$ may be chosen such that the convergence is uniform. If f is non-negative, we may take $0 \le s_1 \le s_2 \le \cdots \le f$.
In Exercise A.1.16 the reader is invited to prove this proposition.
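For a non-negative f, one standard choice of simple functions in Proposition A.1.33 (a common textbook construction, not necessarily the one intended in Exercise A.1.16) is $s_n = \min\{n, \lfloor 2^n f \rfloor / 2^n\}$. Each $s_n$ takes the finitely many values $k/2^n$, $0 \le k \le n2^n$, on the measurable sets $f^{-1}([k/2^n, (k+1)/2^n))$ for $k < n2^n$ and $f^{-1}([n, +\infty])$ for $k = n2^n$. The sequence is non-decreasing, $0 \le s_n \le f$, and $0 \le f - s_n < 2^{-n}$ wherever f < n, which gives uniform convergence when f is bounded. The Python sketch below evaluates $s_n$ pointwise; the function name is ours.

```python
import math

def dyadic_simple(f, n):
    """Return s_n = min(n, floor(2^n f) / 2^n) for a function f >= 0 (values may be inf)."""
    def s_n(x):
        value = f(x)
        if value >= n:  # includes value == math.inf
            return float(n)
        # multiplying and dividing by a power of 2 is exact in floating point
        return math.floor(value * 2 ** n) / 2 ** n
    return s_n
```

Because scaling by $2^n$ is exact in floating point (barring overflow), the inequalities $s_n \le s_{n+1} \le f$ also hold exactly in the code, not just up to rounding.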

A.1.5 Exercises
A.1.1. Let X be a set and (Y, C) be a measurable space. Show that, for any
transformation f : X → Y there exists some σ -algebra B of subsets of X such
that the transformation is measurable with respect to the σ -algebras B and C.
A.1.2. Let X be a set and consider the family of subsets
$$B_0 = \{A \subset X : A \text{ is finite or } A^c \text{ is finite}\}.$$
Show that $B_0$ is an algebra. Moreover, $B_0$ is a σ-algebra if and only if the set X is finite. Show also that, in general,
$$B_1 = \{A \subset X : A \text{ is finite or countable or } A^c \text{ is finite or countable}\}$$
is the σ-algebra generated by the algebra $B_0$.


A.1.3. Prove Proposition A.1.5.
A.1.4. The purpose of this exercise is to exhibit a non-Borel subset of the real line.
Let α be any irrational number. Consider the following relation on R : x ∼ y ⇔
there are m, n ∈ Z such that x − y = m + nα. Check that ∼ is an equivalence
relation and every equivalence class intersects [0, 1). Let E0 be a subset of [0, 1)
containing exactly one element of each equivalence class (the existence of such
a set is a consequence of the Axiom of Choice). Show that E0 is not a Borel set.
A.1.5. Let (X, B, μ) be a measure space. Show that if $A_1, A_2, \dots$ are in B then
$$\mu\Big(\bigcup_{j=1}^{\infty} A_j\Big) \le \sum_{j=1}^{\infty} \mu(A_j).$$

A.1.6. (Lemma of Borel–Cantelli). Let $(E_n)_n$ be a countable family of measurable sets in a measure space (X, B, μ). Let F be the set of points that belong to $E_n$ for infinitely many values of n, that is, $F = \limsup_n E_n = \bigcap_{k=1}^{\infty} \bigcup_{n=k}^{\infty} E_n$. Show that if $\sum_n \mu(E_n) < \infty$ then μ(F) = 0.
A.1.7. Prove Theorem A.1.14.
A.1.8. Let A be the collection of subsets of X = [0, 1] that may be written as finite unions of pairwise disjoint intervals. Check that A is an algebra of subsets of X. Let $m_0 : A \to [0, 1]$ be the function defined on this algebra by
$$m_0(I_1 \cup \cdots \cup I_N) = |I_1| + \cdots + |I_N|,$$
where $|I_j|$ represents the length of $I_j$. Show that $m_0$ is σ-additive.


A.1.9. Let B be an algebra of subsets of X and μ : B → [0, +∞) be a finitely additive function with μ(X) < ∞. Show that μ is σ-additive if and only if any one of the following conditions holds:
(a) $\lim_n \mu(A_n) = \mu\big(\bigcap_{j=1}^{\infty} A_j\big)$ for any decreasing sequence $A_1 \supset \cdots \supset A_j \supset \cdots$ of elements of B whose intersection is in B;
(b) $\lim_n \mu(A_n) = \mu\big(\bigcup_{j=1}^{\infty} A_j\big)$ for any increasing sequence $A_1 \subset \cdots \subset A_j \subset \cdots$ of elements of B whose union is in B.
A.1.10. Show that $\|\mu\| = |\mu|(X)$ defines a complete norm in the vector space of finite signed measures on a measurable space (X, B).
A.1.11. Let X = {1, …, d} be a finite set, endowed with the discrete topology, and let $M = X^I$ with I = N or I = Z.
(a) Check that (A.2.7) defines a distance on M and that the topology defined by this distance coincides with the product topology on M. Describe the open balls and the closed balls around any point $x \in X^I$.
(b) Without using the theorem of Tychonoff, show that (M, d) is a compact space.
(c) Let A be the algebra generated by the elementary cylinders of M. Show that every additive function μ : A → [0, 1] with μ(M) = 1 extends to a probability measure on the Borel σ-algebra of M.

A.1.12. Let K ⊂ [0, 1] be the Cantor set, that is, $K = \bigcap_{n=0}^{\infty} K_n$, where $K_0 = [0, 1]$ and each $K_n$ is the set obtained by removing from each connected component C of $K_{n-1}$ the open interval whose center coincides with the center of C and whose length is one third of the length of C. Show that K has Lebesgue measure equal to zero.
A.1.13. Given a set $E \subset \mathbb{R}^d$, prove that the following conditions are equivalent:
(a) E is a Lebesgue measurable set.
(b) E belongs to the completion of the Borel σ-algebra relative to the Lebesgue measure, that is, there exist Borel sets $B_1, B_2 \subset \mathbb{R}^d$ such that $B_1 \subset E \subset B_2$ and $m(B_2 \setminus B_1) = 0$.
(c) (Approximation from above by open sets) Given ε > 0 we can find an open set A such that E ⊂ A and $m^*(A \setminus E) < \varepsilon$.
(d) (Approximation from below by closed sets) Given ε > 0 we can find a closed set F such that F ⊂ E and $m^*(E \setminus F) < \varepsilon$.
A.1.14. Prove Proposition A.1.31.
A.1.15. Let $g_n : M \to \mathbb{R}$, n ≥ 1, be a sequence of measurable functions such that $f(x) = \sum_{n=1}^{\infty} g_n(x)$ converges at every point. Show that the sum f is a measurable function.
A.1.16. Prove Proposition A.1.33.
A.1.17. Let f : X → X be a measurable transformation and ν be a measure on X. Define $(f_*\nu)(A) = \nu(f^{-1}(A))$. Show that $f_*\nu$ is a measure and note that it is finite if and only if ν itself is finite.
A.1.18. Let $\omega_5 : [0, 1] \to [0, 1]$ be the function assigning to each x ∈ [0, 1] the upper frequency of the digit 5 in the decimal expansion of x. In other words, writing $x = 0.a_0 a_1 a_2 \ldots$ with $a_i \ne 9$ for infinitely many values of i,
$$\omega_5(x) = \limsup_n \frac{1}{n} \#\{0 \le j \le n-1 : a_j = 5\}.$$
Prove that the function $\omega_5$ is measurable.

A.2 Integration in measure spaces


In this appendix we define the Lebesgue integral of a measurable function
with respect to a measure. This generalizes the notion of Riemann integral that
is usually presented in calculus or introductory analysis courses to a much
broader class of functions. Indeed, the Riemann integral is not defined for
many useful functions, for example the characteristic functions of arbitrary
measurable sets (see Example A.2.5 below). In contrast, the Lebesgue integral
makes sense for the whole class of measurable functions, which, as we have
seen in Proposition A.1.31, is closed under all the main operations in analysis.
Also in this appendix, we state some important results about the behavior
of the (Lebesgue) integral under limits of sequences. Moreover, we describe
the product of any finite family of finite measures; for probability measures we

even extend this construction to countable families. Near the end, we discuss
the related notions of absolute continuity and Lebesgue derivation.

A.2.1 Lebesgue integral


Throughout this section, we always take (X, B, μ) to be a measure space. We
are going to introduce the notion of Lebesgue integral in a certain number of
steps. The first one deals with the integral of a simple function:

Definition A.2.1. Let $s = \sum_{j=1}^{k} \alpha_j \mathcal{X}_{A_j}$ be a simple function. The integral of s is given by
$$\int s \, d\mu = \sum_{j=1}^{k} \alpha_j \mu(A_j).$$

It is easy to check (Exercise A.2.1) that this definition is consistent: if


two different linear combinations of characteristic functions define the same
function then the values of the integrals obtained from those two linear
combinations are equal.
The next step is to define the integral of a non-negative measurable function.
The idea is to approximate the function by a monotone sequence of simple
functions, using Proposition A.1.33:
Definition A.2.2. Let f : X → [0, ∞] be a non-negative measurable function. Then
$$\int f \, d\mu = \lim_n \int s_n \, d\mu,$$
where $s_1 \le s_2 \le \cdots$ is a non-decreasing sequence of non-negative simple functions such that $\lim_n s_n(x) = f(x)$ for every x ∈ X.
It is not difficult to check (Exercise A.2.2) that this definition is consistent: the value of the integral does not depend on the choice of the sequence $(s_n)_n$.
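As an illustration of Definitions A.2.1 and A.2.2, take X = [0, 1] with the Lebesgue measure m and $f(x) = x^2$. For the dyadic simple functions $s_n = \lfloor 2^n x^2 \rfloor / 2^n$, the set where $s_n = k/2^n$ ($0 \le k < 2^n$) is the interval $[\sqrt{k/2^n}, \sqrt{(k+1)/2^n})$, and the top value 1 is taken only at the point x = 1, which has measure zero. So $\int s_n \, dm$ is an explicit finite sum. The Python sketch below (function name ours) evaluates it; by Definition A.2.2 these numbers increase to $\int x^2 \, dm$, which should be 1/3, and since $0 \le f - s_n < 2^{-n}$ the error is below $2^{-n}$.

```python
import math

def integral_simple_x_squared(n):
    """∫ s_n dm on [0, 1], where s_n = floor(2^n x^2) / 2^n and m is Lebesgue measure.
    s_n = k/2^n on A_k = {x in [0,1] : k/2^n <= x^2 < (k+1)/2^n}, an interval of
    length sqrt((k+1)/2^n) - sqrt(k/2^n); the point x = 1 has measure zero."""
    N = 2 ** n
    return sum((k / N) * (math.sqrt((k + 1) / N) - math.sqrt(k / N))
               for k in range(N))
```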
Next, to extend the definition of integral to an arbitrary measurable function, let us observe that given any function f : X → [−∞, +∞] we can always write $f = f^+ - f^-$ with
$$f^+(x) = \max\{f(x), 0\} \quad\text{and}\quad f^-(x) = \max\{-f(x), 0\}.$$
It is clear that the functions $f^+$ and $f^-$ are non-negative. Moreover, by Proposition A.1.31, they are measurable whenever f is measurable.
Definition A.2.3. Let f : X → [−∞, +∞] be a measurable function. Then
$$\int f \, d\mu = \int f^+ \, d\mu - \int f^- \, d\mu,$$
as long as at least one of the integrals on the right-hand side is finite (with the usual conventions that (+∞) − a = +∞ and a − (+∞) = −∞ for every a ∈ R).

Definition A.2.4. A function f : X → [−∞, +∞] is integrable if it is measurable and its integral is a real number. We denote the set of all integrable functions as $L^1(X, B, \mu)$ or, simply, as $L^1(\mu)$.
Given a measurable function f : X → [−∞, ∞] and a measurable set E, we define the integral of f over E to be
$$\int_E f \, d\mu = \int f \mathcal{X}_E \, d\mu,$$
where $\mathcal{X}_E$ is the characteristic function of the set E.
Example A.2.5. Consider X = [0, 1] endowed with the Lebesgue measure m.
Let f = XB , where B is the subset of rational numbers. Then m(B) = 0 and
so, using Definition A.2.2, the Lebesgue integral of f is equal to zero. On the
other hand, a direct calculation shows that every lower Riemann sum of f is
equal to 0, while every upper Riemann sum of f is equal to 1. So, the Riemann
integral of f does not exist. Indeed, more generally, the Riemann integral of the
characteristic function of a measurable set exists if and only if the boundary of
the set has zero Lebesgue measure. Note that in the present case the boundary
is the whole of [0, 1], which has positive Lebesgue measure.
Example A.2.6. Let $x_1, \dots, x_m \in X$ and $p_1, \dots, p_m > 0$ with $p_1 + \cdots + p_m = 1$. Let μ be the probability measure defined on $2^X$ by
$$\mu = \sum_{i=1}^{m} p_i \delta_{x_i}, \quad\text{where } \delta_{x_i} \text{ is the Dirac mass at } x_i.$$
In other words, $\mu(A) = \sum_{x_i \in A} p_i$ for every subset A of X. Then, for any function f : X → [−∞, +∞],
$$\int f \, d\mu = \sum_{i=1}^{m} p_i f(x_i).$$
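The formula in Example A.2.6 is just a weighted sum, as the short Python sketch below shows; the list representation of the points and weights, and the function names, are illustrative choices. Integrating the characteristic function of a set A recovers μ(A), in agreement with Definition A.2.1.

```python
def integral_discrete(points, weights, f):
    """∫ f dμ for μ = Σ_i p_i δ_{x_i}, computed as Σ_i p_i f(x_i)."""
    if any(p <= 0 for p in weights) or abs(sum(weights) - 1) > 1e-12:
        raise ValueError("weights must be positive and add up to 1")
    return sum(p * f(x) for x, p in zip(points, weights))

def measure_discrete(points, weights, A):
    """μ(A) = Σ_{x_i ∈ A} p_i."""
    return sum(p for x, p in zip(points, weights) if x in A)
```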

Proposition A.2.7. The set $L^1(\mu)$ of all real integrable functions is a real vector space. Moreover, the map $I : L^1(\mu) \to \mathbb{R}$ given by $I(f) = \int f \, d\mu$ is a positive linear functional:
(1) $\int (af + bg) \, d\mu = a \int f \, d\mu + b \int g \, d\mu$, and
(2) $\int f \, d\mu \ge \int g \, d\mu$ if $f(x) \ge g(x)$ for every x.
In particular, $\left|\int f \, d\mu\right| \le \int |f| \, d\mu$ if $|f| \in L^1(\mu)$. Moreover, $|f| \in L^1(\mu)$ if and only if $f \in L^1(\mu)$.
The notion of the Lebesgue integral may be extended to an even broader class of functions, in two different ways. On the one hand, we may consider complex functions f : X → C. In this case, we say that f is integrable if and only if the real part $\Re f$ and the imaginary part $\Im f$ are both integrable. Then, by definition,
$$\int f \, d\mu = \int \Re f \, d\mu + i \int \Im f \, d\mu.$$

On the other hand, we may consider functions that are not necessarily
measurable but coincide with some measurable function on a subset of the
domain with total measure. To explain this, we need the following notion,
which is used frequently throughout the text:

Definition A.2.8. We say that a property holds at μ-almost every point (or
μ-almost everywhere) if the subset of points of X for which it does not hold is
contained in some zero measure set.

For example, we say that a sequence of functions $(f_n)_n$ converges to some function f at μ-almost every point if there exists some measurable set N ⊂ X with μ(N) = 0 such that $f(x) = \lim_n f_n(x)$ for every x ∈ X \ N. Analogously, we say that two functions f and g are equal at μ-almost every point if there exists a measurable set N ⊂ X with μ(N) = 0 such that f(x) = g(x) for every x ∈ X \ N. Clearly, this is an equivalence relation in the space of functions defined on X. Moreover, assuming that the two functions are integrable, it implies that the two integrals coincide:
$$\int f \, d\mu = \int g \, d\mu \quad\text{if } f = g \text{ at } \mu\text{-almost every point}.$$
This observation permits the definition of the integral for any function f, possibly non-measurable, that coincides at μ-almost every point with some measurable function g: it suffices to take $\int f \, d\mu = \int g \, d\mu$.
To close this section, let us observe that the notion of integral may also be extended to signed measures and even complex measures, as follows. Let μ be a signed measure and $\mu = \mu^+ - \mu^-$ be its Hahn decomposition. We say that a function φ is integrable with respect to μ if it is integrable with respect to both $\mu^+$ and $\mu^-$. Then we define
$$\int \varphi \, d\mu = \int \varphi \, d\mu^+ - \int \varphi \, d\mu^-.$$
Similarly, let μ be a complex measure. By definition, a function φ is integrable with respect to μ if it is integrable with respect to both the real part $\Re\mu$ and the imaginary part $\Im\mu$. Then we define
$$\int \varphi \, d\mu = \int \varphi \, d(\Re\mu) + i \int \varphi \, d(\Im\mu).$$

A.2.2 Convergence theorems


Next, we mention three important results concerning the convergence of
functions under the integral sign. The first one deals with monotone sequences
of functions:

Theorem A.2.9 (Monotone convergence). Let $f_n : X \to [-\infty, +\infty]$ be a non-decreasing sequence of non-negative measurable functions. Consider the function $f : X \to [-\infty, +\infty]$ defined by $f(x) = \lim_n f_n(x)$. Then
$$\lim_n \int f_n \, d\mu = \int f \, d\mu.$$

The next result applies to much more general sequences, not necessarily monotone:
Theorem A.2.10 (Lemma of Fatou). Let $f_n : X \to [0, +\infty]$ be a sequence of non-negative measurable functions. Then the function $f : X \to [0, +\infty]$ defined by $f(x) = \liminf_n f_n(x)$ is measurable and satisfies
$$\int \liminf_n f_n \, d\mu \le \liminf_n \int f_n \, d\mu.$$

The most powerful of the results in this section is the dominated convergence theorem, which asserts that we may take the limit under the integral sign whenever the sequence of functions is bounded by some integrable function:
Theorem A.2.11 (Dominated convergence). Let $f_n : X \to \mathbb{R}$ be a sequence of measurable functions and assume that there exists some integrable function g : X → R such that $|f_n(x)| \le |g(x)|$ for μ-almost every x in X. Assume moreover that the sequence $(f_n)_n$ converges at μ-almost every point to some function f : X → R. Then f is integrable and satisfies
$$\lim_n \int f_n \, d\mu = \int f \, d\mu.$$

In Exercise A.2.7 we invite the reader to deduce the dominated convergence


theorem from the Lemma of Fatou.

A.2.3 Product measures


Let $(X_j, \mathcal{A}_j, \mu_j)$, j = 1, …, n, be finite measure spaces, that is, such that $\mu_j(X_j) < \infty$ for every j. One can endow the Cartesian product $X_1 \times \cdots \times X_n$ with the structure of a finite measure space in the following way. Consider on $X_1 \times \cdots \times X_n$ the σ-algebra generated by the family of all subsets of the form $A_1 \times \cdots \times A_n$ with $A_j \in \mathcal{A}_j$. This is called the product σ-algebra and is denoted by $\mathcal{A}_1 \otimes \cdots \otimes \mathcal{A}_n$.
Theorem A.2.12. There exists a unique measure μ on the measurable space $(X_1 \times \cdots \times X_n, \mathcal{A}_1 \otimes \cdots \otimes \mathcal{A}_n)$ such that $\mu(A_1 \times \cdots \times A_n) = \mu_1(A_1) \cdots \mu_n(A_n)$ for every $A_1 \in \mathcal{A}_1, \dots, A_n \in \mathcal{A}_n$. In particular, μ is a finite measure.
The proof of this result (see Theorem 35.B in Halmos [Hal50]) combines the extension theorem (Theorem A.1.13) with the monotone convergence theorem (Theorem A.2.9). The measure μ in the statement is the product of the measures $\mu_1, \dots, \mu_n$ and is denoted by $\mu_1 \times \cdots \times \mu_n$. In this way one defines the product measure space
$$(X_1 \times \cdots \times X_n,\ \mathcal{A}_1 \otimes \cdots \otimes \mathcal{A}_n,\ \mu_1 \times \cdots \times \mu_n).$$

Theorem A.2.12 remains valid when the measures $\mu_j$ are just σ-finite, except that in this case the product measure μ is also only σ-finite.
Next, we describe the product of a countable family of measure spaces. Actually, for now we restrict ourselves to the case of probability spaces. Let $(X_j, \mathcal{B}_j, \mu_j)$, j ∈ I, be probability spaces, that is, $\mu_j(X_j) = 1$ for every j ∈ I. What follows holds for both I = N and I = Z. Consider the Cartesian product
$$\Sigma = \prod_{j \in I} X_j = \{(x_j)_{j \in I} : x_j \in X_j\}. \tag{A.2.1}$$
We call cylinders of Σ all subsets of the form
$$[m; A_m, \dots, A_n] = \{(x_j)_{j \in I} : x_j \in A_j \text{ for } m \le j \le n\}, \tag{A.2.2}$$
where m ∈ I, n ≥ m and $A_j \in \mathcal{B}_j$ for each m ≤ j ≤ n. Note that Σ itself is a cylinder: we may write $\Sigma = [1; X_1]$, for example. By definition, the product σ-algebra on Σ is the σ-algebra B generated by the family of all cylinders. The family A of all finite unions of pairwise disjoint cylinders is an algebra and it generates the product σ-algebra B.

Theorem A.2.13. There exists a unique measure μ on (Σ, B) such that
$$\mu([m; A_m, \dots, A_n]) = \mu_m(A_m) \cdots \mu_n(A_n) \tag{A.2.3}$$
for every cylinder $[m; A_m, \dots, A_n]$. In particular, μ is a probability measure.
The proof of this theorem (see Theorem 38.B in Halmos [Hal50]) uses the extension theorem (Theorem A.1.13) together with the theorem of continuity at the empty set (Theorem A.1.14). The probability measure μ is called the product of the measures $\mu_j$ and is denoted as $\prod_{j \in I} \mu_j$. The probability space (Σ, B, μ) is called the product of the spaces $(X_j, \mathcal{B}_j, \mu_j)$, j ∈ I.
An important special case is when the spaces $(X_j, \mathcal{B}_j, \mu_j)$ are all equal to a given (X, C, ν). The corresponding product space may be used to model a sequence of identical random experiments such that the outcome of each experiment is independent of all the others. To explain this, take X to be the set of possible outcomes of each experiment and let ν be the probability distribution of those outcomes. In this context, the measure $\mu = \nu^I = \prod_{j \in I} \nu$ is usually called the Bernoulli measure defined by ν. Property (A.2.3) corresponds to the identity
$$\mu([m; A_m, \dots, A_n]) = \prod_{j=m}^{n} \nu(A_j), \tag{A.2.4}$$
which may be read in the following way: the probability of any composite event $\{x_m \in A_m, \dots, x_n \in A_n\}$ is equal to the product of the probabilities of the individual events $x_j \in A_j$. So, (A.2.4) does reflect the assumption that the successive experiments are mutually independent.

We have a special interest in the case when X is a finite set, endowed with the σ-algebra $C = 2^X$ of all its subsets. In this case, it is useful to consider the elementary cylinders
$$[m; a_m, \dots, a_n] = \{(x_j)_{j \in I} \in \Sigma : x_m = a_m, \dots, x_n = a_n\}, \tag{A.2.5}$$
corresponding to subsets $A_j$ consisting of a single point $a_j$. Observe that every cylinder is a finite union of pairwise disjoint elementary cylinders. In particular, the σ-algebra generated by the elementary cylinders coincides with the σ-algebra generated by all the cylinders, and the same is true for the generated algebra. Moreover, the relation (A.2.4) may be written as
$$\mu([m; a_m, \dots, a_n]) = p_{a_m} \cdots p_{a_n}, \quad\text{where } p_a = \nu(\{a\}) \text{ for } a \in X. \tag{A.2.6}$$
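When X is finite, formula (A.2.6) makes Bernoulli measures easy to compute. In the Python sketch below, the dictionary p plays the role of the probabilities $p_a = \nu(\{a\})$ and a word $(a_m, \dots, a_n)$ stands for the elementary cylinder $[m; a_m, \dots, a_n]$; the position m is omitted because it does not enter (A.2.6). The helper total_mass (a name of ours) checks that the elementary cylinders of a fixed length, which partition the whole space, have total measure 1.

```python
from fractions import Fraction
from itertools import product

def cylinder_measure(p, word):
    """mu([m; a_m, ..., a_n]) = p[a_m] * ... * p[a_n] for the Bernoulli measure nu^I,
    where p[a] = nu({a}). The starting position m does not enter the formula."""
    result = Fraction(1)
    for a in word:
        result *= p[a]
    return result

def total_mass(p, length):
    """Sum of the measures of all elementary cylinders with `length` symbols;
    these cylinders partition the whole space, so the sum should be 1."""
    return sum(cylinder_measure(p, w) for w in product(p, repeat=length))
```

For $p_0 = 1/3$ and $p_1 = 2/3$, the cylinder [m; 0, 1, 1] has measure (1/3)(2/3)(2/3) = 4/27, and $\mu([m; a]) = \sum_b \mu([m; a, b])$, as additivity requires.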

Consider the finite set X endowed with the discrete topology. The product topology on $\Sigma = X^I$ coincides with the topology generated by the elementary cylinders. Moreover (see Exercise A.1.11), it coincides with the topology associated with the distance defined by
$$d\big((x_i)_{i \in I}, (y_i)_{i \in I}\big) = \theta^N, \tag{A.2.7}$$
where θ ∈ (0, 1) is fixed and $N = N((x_i)_{i \in I}, (y_i)_{i \in I}) \ge 0$ is the largest integer such that $x_i = y_i$ for every i ∈ I with |i| < N (with the convention $\theta^\infty = 0$ when the two sequences coincide).
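The distance (A.2.7) can be computed from the first index at which two sequences differ. The Python sketch below assumes I = N with indices starting at 0 (so that |i| < N means 0 ≤ i < N) and works with finite prefixes of equal length, treating prefixes that agree completely as equal points; both are simplifications for illustration, and the function name is ours.

```python
def cylinder_distance(x, y, theta=0.5):
    """d(x, y) = theta^N, where N is the first index at which the prefixes x and y
    differ (so x_i = y_i for 0 <= i < N); identical prefixes get distance 0."""
    if len(x) != len(y):
        raise ValueError("compare prefixes of the same length")
    for N, (a, b) in enumerate(zip(x, y)):
        if a != b:
            return theta ** N
    return 0.0
```

Since θ < 1, d(x, y) ≤ θᴺ exactly when x and y agree in the first N coordinates, so the closed ball of radius θᴺ around x is the elementary cylinder $[0; x_0, \dots, x_{N-1}]$. The distance even satisfies the strong (ultrametric) triangle inequality d(x, z) ≤ max{d(x, y), d(y, z)}.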

A.2.4 Derivation of measures


Let m be the Lebesgue measure on $\mathbb{R}^d$. Given a measurable subset A of $\mathbb{R}^d$, we say that $a \in \mathbb{R}^d$ is a density point of A if the subset A occupies most of every small neighborhood of a, in the following sense:
$$\lim_{\delta \to 0} \frac{m(B(a, \delta) \cap A)}{m(B(a, \delta))} = 1. \tag{A.2.8}$$
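For d = 1 and a set A that is a finite union of intervals, the ratio in (A.2.8) can be computed exactly. The Python sketch below (function name ours) does so with the open interval B(a, δ) = (a − δ, a + δ) and floating-point endpoints.

```python
def density_ratio(intervals, a, delta):
    """m(B(a, δ) ∩ A) / m(B(a, δ)) in R, where B(a, δ) = (a - δ, a + δ) and A is a
    finite union of pairwise disjoint intervals given by their endpoints (lo, hi)."""
    covered = sum(max(0.0, min(hi, a + delta) - max(lo, a - delta))
                  for lo, hi in intervals)
    return covered / (2 * delta)
```

For A = [0, 1] the ratio tends to 1 at interior points, equals 1/2 at the endpoints 0 and 1 for every small δ, and is 0 outside A; so every point of A except the two endpoints, a set of measure zero, is a density point.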

Theorem A.2.14. Let A be a measurable subset of $\mathbb{R}^d$ with positive Lebesgue measure m(A). Then m-almost every a ∈ A is a density point of A.
In Exercise A.2.11 we propose a proof of this result. It is also a direct consequence of the theorem that we state next. We say that a function $f : \mathbb{R}^d \to \mathbb{R}$ is locally integrable if the product $f \mathcal{X}_K$ is integrable for every compact set $K \subset \mathbb{R}^d$.

Theorem A.2.15 (Lebesgue derivation). Let $X = \mathbb{R}^d$, let B be the Borel σ-algebra and let m be the Lebesgue measure on $\mathbb{R}^d$. Let f : X → R be a locally integrable function. Then
$$\lim_{r \to 0} \frac{1}{m(B(x, r))} \int_{B(x, r)} |f(y) - f(x)| \, dm(y) = 0 \quad\text{at } m\text{-almost every point } x.$$
In particular,
$$\lim_{r \to 0} \frac{1}{m(B(x, r))} \int_{B(x, r)} f(y) \, dm(y) = f(x) \quad\text{at } m\text{-almost every point } x.$$

The crucial ingredient in the proof of these results is the following geometric fact:
Theorem A.2.16 (Lemma of Vitali). Let m be the Lebesgue measure on $\mathbb{R}^d$ and suppose that for every $x \in \mathbb{R}^d$ one is given a sequence $(B_n(x))_n$ of balls centered at x with radii converging to zero. Let $A \subset \mathbb{R}^d$ be a measurable set with m(A) > 0. Then, for every ε > 0 there exist sequences $(x_j)_j$ in $\mathbb{R}^d$ and $(n_j)_j$ in N such that
1. the balls $B_{n_j}(x_j)$ are pairwise disjoint;
2. $m\big(\bigcup_j B_{n_j}(x_j) \setminus A\big) < \varepsilon$ and $m\big(A \setminus \bigcup_j B_{n_j}(x_j)\big) = 0$.
This theorem remains valid if, instead of balls, we take for $(B_n(x))_n$ any sequence of sets such that $\bigcap_n B_n(x) = \{x\}$ and
$$\sup_{x, n} \frac{\sup\{d(x, y) : y \in B_n(x)\}}{\inf\{d(x, z) : z \notin B_n(x)\}} < \infty.$$
The set of measures defined on the same measurable space possesses a natural partial order relation:
Definition A.2.17. Let μ and ν be two measures on the same measurable space (X, B). We say that ν is absolutely continuous with respect to μ if every measurable set E that satisfies μ(E) = 0 also satisfies ν(E) = 0; then we write ν ≪ μ. We say that μ and ν are equivalent if each one of them is absolutely continuous with respect to the other; then we write μ ∼ ν. In other words, two measures are equivalent if they have exactly the same zero measure sets.

Another very important result, known as the theorem of Radon–Nikodym, asserts that if ν ≪ μ then the measure ν may be seen as the product of μ by some measurable function ρ:
Theorem A.2.18 (Radon–Nikodym). If μ and ν are finite measures such that ν ≪ μ then there exists a measurable function ρ : X → [0, +∞] such that ν = ρμ, meaning that
$$\int \varphi \, d\nu = \int \varphi \rho \, d\mu \quad\text{for any bounded measurable function } \varphi : X \to \mathbb{R}. \tag{A.2.9}$$
In particular, $\nu(E) = \int_E \rho \, d\mu$ for every measurable set E ⊂ X. Moreover, ρ is essentially unique: any two functions satisfying (A.2.9) coincide at μ-almost every point.
We call ρ the density, or Radon–Nikodym derivative, of ν relative to μ and we write
$$\rho = \frac{d\nu}{d\mu}.$$
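On a finite set, where measures are given by point masses, the Radon–Nikodym derivative is simply the ratio ρ(x) = ν({x})/μ({x}) at the points with μ({x}) > 0, and it may be chosen arbitrarily on the μ-null points. The Python sketch below (illustrative names, exact fractions) computes ρ, rejects a pair for which ν is not absolutely continuous with respect to μ, and includes a small integration helper for checking (A.2.9) with a particular φ.

```python
from fractions import Fraction

def radon_nikodym(nu, mu):
    """Density rho = d(nu)/d(mu) for finite measures on a finite set, given as
    dictionaries of point masses. Requires nu << mu. Where mu({x}) = 0 the value
    of rho is irrelevant (rho is unique only mu-almost everywhere); we put 0."""
    for x, nu_x in nu.items():
        if nu_x != 0 and mu.get(x, 0) == 0:
            raise ValueError(f"point {x!r} has positive nu-mass but zero mu-mass")
    return {x: (Fraction(nu.get(x, 0)) / mu_x if mu_x != 0 else Fraction(0))
            for x, mu_x in mu.items()}

def integrate(phi, masses):
    """∫ phi d(lambda) for a measure lambda given by point masses."""
    return sum((phi(x) * w for x, w in masses.items()), Fraction(0))
```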

Definition A.2.19. Let μ and ν be two measures in the same measurable


space (X, B). We say that μ and ν are mutually singular if there exist disjoint
measurable subsets A and B such that A ∪ B = X and μ(A) = 0 and ν(B) = 0.
Then we write μ ⊥ ν.

The Lebesgue decomposition theorem states that, given any two finite
measures μ and ν in the same measurable space, we may write ν = νa + νs
where νa and νs are finite measures such that νa  μ and νs ⊥ μ. Combining
this with the theorem of Radon–Nikodym, we get:

Theorem A.2.20 (Lebesgue decomposition). Given any finite measures μ and ν, there exist a measurable function ρ : X → [0, +∞] and a finite measure η such that ν = ρμ + η and η ⊥ μ.
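For finitely supported measures the decomposition in Theorem A.2.20 is explicit: the part of ν carried by the atoms of μ gives ρμ, and the mass of ν sitting on points that μ does not charge gives η. The Python sketch below illustrates only this special case; the dictionary representation and the name `lebesgue_decomposition` are hypothetical choices of ours.

```python
from fractions import Fraction

def lebesgue_decomposition(nu, mu):
    """Return (rho, eta) with nu = rho*mu + eta and eta singular with respect to mu.

    Finitely supported case only: measures are dicts {point: mass}.
    """
    # Absolutely continuous part: the density of nu on the atoms of mu.
    rho = {x: Fraction(nu.get(x, 0)) / Fraction(m) for x, m in mu.items() if m > 0}
    # Singular part: the mass of nu on points that mu does not charge.
    eta = {x: Fraction(w) for x, w in nu.items() if w > 0 and mu.get(x, 0) == 0}
    return rho, eta

mu = {0: Fraction(1, 2), 1: Fraction(1, 2)}
nu = {1: Fraction(1, 3), 2: Fraction(2, 3)}
rho, eta = lebesgue_decomposition(nu, mu)

# The set A = support of eta and its complement B witness eta ⊥ mu: mu(A) = 0 and eta(B) = 0.
support = set(mu) | set(nu)
reconstructed = {x: rho.get(x, 0) * mu.get(x, 0) + eta.get(x, 0) for x in support}
```

Here ρ equals 0 at the point 0 and 2/3 at the point 1, and η is the mass 2/3 at the point 2, which μ does not see.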

A.2.5 Exercises
A.2.1. Prove that the integral of a simple function is well defined: if two linear
combinations of characteristic functions define the same function, then the
values of the integrals obtained from the two combinations coincide.
A.2.2. Show that if $(r_n)_n$ and $(s_n)_n$ are non-decreasing sequences of non-negative functions converging at μ-almost every point to the same function f : M → [0, +∞), then $\lim_n \int r_n \, d\mu = \lim_n \int s_n \, d\mu$.
A.2.3. Prove Proposition A.2.7.
A.2.4. (Tchebysheff–Markov inequality) Let f : M → R be a non-negative function integrable with respect to a finite measure μ. Then, given any real number a > 0,
$$\mu\big(\{x \in M : f(x) \ge a\}\big) \le \frac{1}{a} \int_M f \, d\mu.$$
In particular, if $\int |f| \, d\mu = 0$ then $\mu\big(\{x \in M : f(x) \ne 0\}\big) = 0$.
A.2.5. Let f be an integrable function. Show that for every ε > 0 there exists δ > 0 such that $\left|\int_E f \, d\mu\right| < \varepsilon$ for every measurable set E with μ(E) < δ.
A.2.6. Let $\psi_1, \dots, \psi_N : M \to \mathbb{R}$ be bounded measurable functions defined on a probability space (M, B, μ). Show that for any ε > 0 there exist $x_1, \dots, x_s \in M$ and positive numbers $\alpha_1, \dots, \alpha_s$ such that $\sum_{j=1}^{s} \alpha_j = 1$ and
$$\left| \int \psi_i \, d\mu - \sum_{j=1}^{s} \alpha_j \psi_i(x_j) \right| < \varepsilon \quad \text{for every } i = 1, \dots, N.$$

A.2.7. Deduce the dominated convergence theorem (Theorem A.2.11) from the
Lemma of Fatou (Theorem A.2.10).
452 Measure theory, topology and analysis

A.2.8. A set F of measurable functions f : M → R is said to be uniformly integrable with respect to a probability measure μ if for every α > 0 there exists C > 0 such that $\int_{\{|f| > C\}} |f| \, d\mu < \alpha$ for every f ∈ F. Show that
(a) F is uniformly integrable with respect to μ if and only if there exists L > 0 such that, for every ε > 0, there exists δ > 0 with $\int |f| \, d\mu < L$ and $\int_A |f| \, d\mu < \varepsilon$ for every f ∈ F and every measurable set A with μ(A) < δ.
(b) If there exists a function g : M → R integrable with respect to μ such that |f| ≤ |g| for every f ∈ F (we say that F is dominated by g) then the set F is uniformly integrable with respect to μ.
(c) If the set F is uniformly integrable with respect to μ then $\lim_n \int f_n \, d\mu = \int \lim_n f_n \, d\mu$ for any sequence $(f_n)_n$ in F such that $\lim_n f_n$ exists at μ-almost every point.
A.2.9. Show that a is a density point of a set $A \subset \mathbb{R}^d$ if and only if
$$\lim_{\delta \to 0} \inf\left\{ \frac{m(B \cap A)}{m(B)} : B \text{ a ball with } a \in B \subset B(a, \delta) \right\} = 1. \tag{A.2.10}$$

A.2.10. Let $\mathcal{P}_n$, n ≥ 1, be a sequence of countable partitions of $\mathbb{R}^d$ into measurable subsets. Assume that the diameter $\operatorname{diam} \mathcal{P}_n = \sup\{\operatorname{diam} P : P \in \mathcal{P}_n\}$ converges to zero when n → ∞. Show that, given any measurable set $A \subset \mathbb{R}^d$ with positive Lebesgue measure, it is possible to choose sets $P_n \in \mathcal{P}_n$, n ≥ 1, in such a way that $m(A \cap P_n)/m(P_n) \to 1$ when n → ∞.
A.2.11. Prove Theorem A.2.14.
A.2.12. Consider $x_1, x_2 \in M$ and $p_1, p_2, q_1, q_2 > 0$ with $p_1 + p_2 = q_1 + q_2 = 1$. Let μ and ν be the probability measures given by
$$\mu(A) = \sum_{x_i \in A} p_i, \qquad \nu(A) = \sum_{x_i \in A} q_i,$$
that is, $\mu = p_1 \delta_{x_1} + p_2 \delta_{x_2}$ and $\nu = q_1 \delta_{x_1} + q_2 \delta_{x_2}$. Check that ν ≪ μ and μ ≪ ν and calculate the corresponding Radon–Nikodym derivatives.
A.2.13. Construct a probability measure μ on [0, 1] absolutely continuous with respect
to the Lebesgue measure m and such that there exists a measurable set K ⊂ [0, 1]
with μ(K) = 0 and m(K) = 1/2. In particular, m is not absolutely continuous
with respect to μ. Could we require that m(K) = 1?
A.2.14. Assume that f : M → M is such that there exists a countable cover of M by measurable sets $B_n$, n ≥ 1, such that the restriction of f to each $B_n$ is a bijection onto its image, with measurable inverse. Let η be a probability measure on M such that $A \subset B_n$ and η(A) = 0 implies η(f(A)) = 0. Show that there exists a function $J_\eta : M \to [0, +\infty]$ such that
$$\int_{f(B_n)} \psi \, d\eta = \int_{B_n} (\psi \circ f) J_\eta \, d\eta$$
for every bounded measurable function ψ : M → R and every n. Moreover, $J_\eta$ is essentially unique.
A.2.15. Let $\mu = \mu^+ - \mu^-$ be the Hahn decomposition of a finite signed measure μ. Show that there exist functions $\rho^\pm$ and $\tau^\pm$ such that $\mu^+ = \rho^+ |\mu| = \tau^+ \mu$ and $\mu^- = \rho^- |\mu| = \tau^- \mu$. Which functions are these?
A.3 Measures in metric spaces 453

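A.2.16. Let $(\mu_n)_n$ and $(\nu_n)_n$ be two sequences of measures such that $\mu = \sum_n \mu_n$ and $\nu = \sum_n \nu_n$ are finite measures. Let $\hat\mu_n = \sum_{i=1}^n \mu_i$ and $\hat\nu_n = \sum_{i=1}^n \nu_i$. Show that if $\hat\mu_n \ll \hat\nu_n$ for every n then μ ≪ ν and
$$\frac{d\mu}{d\nu} = \lim_n \frac{d\hat\mu_n}{d\hat\nu_n} \quad \text{at } \nu\text{-almost every point.}$$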

A.3 Measures in metric spaces


In this appendix, unless stated otherwise, μ is a Borel probability measure on a metric space M, that is, a probability measure defined on the Borel σ-algebra of M. In fact, most of the results extend immediately to finite Borel measures.
Recall that a metric space is a pair (M, d) where M is a set and d is a distance
in M, that is, a function d : M × M → R satisfying:

1. d(x, y) ≥ 0 for any x, y and the equality holds if and only if x = y;
2. d(x, y) = d(y, x) for any x, y;
3. d(x, y) ≤ d(x, z) + d(z, y) for any x, y, z.

We denote B(x, r) = {y ∈ M : d(x, y) < r} and call it the ball of center x ∈ M and
radius r > 0.
Every metric space has a natural structure of a topological space where the
family of balls centered at each point is a basis of neighborhoods for that
point. Equivalently, a subset of M is open if and only if it contains some ball
centered at each one of its points. In the converse direction, one says that a
topological space is metrizable if its topology can be defined in this way, from
some distance function.

A.3.1 Regular measures


A first interesting fact is that any probability measure on a metric space is
completely determined by the values it takes on the open subsets (or the closed
subsets) of the space.

Definition A.3.1. A (Borel) measure μ on a topological space is regular if for every measurable subset B and every ε > 0 there exists a closed set F and an open set A such that F ⊂ B ⊂ A and μ(A \ F) < ε.

Proposition A.3.2. Any probability measure on a metric space is regular.

Proof. Let B0 be the family of all Borel subsets B for which the condition in
the definition holds, that is, such that for every ε > 0 there exist a closed set
F and an open set A satisfying F ⊂ B ⊂ A and μ(A \ F) < ε. Begin by noting
that B0 contains all the closed subsets of M. Indeed, let B be any closed set
and let Bδ denote the (open) set of points whose distance to B is less than δ.
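Since B is closed, the sets $B_\delta$ decrease to B as δ decreases to 0; hence, by Theorem A.1.14, $\mu(B_\delta \setminus B) \to 0$ when δ → 0. Therefore, we may take F = B and $A = B_\delta$ for δ > 0 sufficiently small.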
It is immediate that the family B0 is closed under taking the complement, that is, $B^c \in$ B0 whenever B ∈ B0. Furthermore, consider any countable family $B_n$, n = 1, 2, …, of elements of B0 and let $B = \bigcup_{n=1}^{\infty} B_n$. By hypothesis, for every n ∈ N and ε > 0 there exist a closed set $F_n$ and an open set $A_n$ satisfying $F_n \subset B_n \subset A_n$ and $\mu(A_n \setminus F_n) < \varepsilon/2^{n+1}$. The union $A = \bigcup_{n=1}^{\infty} A_n$ is an open set and any finite union $F = \bigcup_{n=1}^{m} F_n$ is a closed set. Fix m large enough that
$$\mu\Big(\bigcup_{n=1}^{\infty} F_n \setminus F\Big) < \varepsilon/2$$
(recall Theorem A.1.14). Then F ⊂ B ⊂ A and
$$\mu(A \setminus F) \le \sum_{n=1}^{\infty} \mu(A_n \setminus F_n) + \mu\Big(\bigcup_{n=1}^{\infty} F_n \setminus F\Big) < \sum_{n=1}^{\infty} \frac{\varepsilon}{2^{n+1}} + \frac{\varepsilon}{2} = \varepsilon.$$

This shows that B ∈ B0. In this way, we have shown that B0 is a σ-algebra. Since it contains all the closed subsets of M, it contains the σ-algebra they generate; that is, B0 contains all the Borel subsets of M.

It follows that, as stated above, the values that the probability measure μ takes on the closed subsets of M determine μ completely: if ν is another probability measure such that μ(F) = ν(F) for every closed set F then, taking complements, μ(A) = ν(A) for every open set A and, applying Proposition A.3.2 to both measures, μ(B) = ν(B) for every Borel set B. In other words, μ = ν. The same argument shows that the values of μ on the open sets also determine the measure completely.
The proposition that we state and prove next implies that the values of the
integrals of the bounded continuous functions also determine the probability
measure completely. Indeed, the same is true for the (smaller) set of bounded
Lipschitz functions.
Recall that a map h : M → N is Lipschitz if there exists some constant C > 0 such that d(h(x), h(y)) ≤ C d(x, y) for every x, y ∈ M. When it is necessary to specify the constant, we say that the function h is C-Lipschitz. More generally, we say that h is Hölder if there exist C, θ > 0 such that $d(h(x), h(y)) \le C\, d(x, y)^\theta$ for every x, y ∈ M. Then we also say that h is θ-Hölder or even (C, θ)-Hölder.

Proposition A.3.3. If μ and ν are probability measures on a metric space M with $\int \varphi \, d\mu = \int \varphi \, d\nu$ for every bounded Lipschitz function ϕ : M → R then μ = ν.

Proof. We are going to use the following simple topological fact:

Lemma A.3.4. Given any closed subset F of M and any δ > 0, there exists
a Lipschitz function gδ : M → [0, 1] such that gδ (x) = 1 for every x ∈ F and
gδ (x) = 0 for every x ∈ M such that d(x, F) ≥ δ.
Proof. Consider the function h : R → [0, 1] given by h(s) = 1 if s ≤ 0 and h(s) = 0 if s ≥ 1 and h(s) = 1 − s if 0 ≤ s ≤ 1. Define
$$g_\delta : M \to [0, 1], \qquad g_\delta(x) = h\Big(\frac{1}{\delta}\, d(x, F)\Big).$$
Note that $g_\delta$ is Lipschitz, since it is a composition of Lipschitz functions: x ↦ d(x, F) is 1-Lipschitz and h is 1-Lipschitz, so $g_\delta$ is (1/δ)-Lipschitz. The other properties in the lemma follow immediately from the definition.

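Now we may finish the proof of Proposition A.3.3. Let F be any closed subset of M and, for every δ > 0, let $g_\delta : M \to [0, 1]$ be a function as in the lemma above. By assumption,
$$\int g_\delta \, d\mu = \int g_\delta \, d\nu \quad \text{for every } \delta > 0.$$
Since F is closed, d(x, F) > 0 for every x ∉ F, so $g_\delta$ converges pointwise to the characteristic function of F as δ → 0. Hence, by the dominated convergence theorem (Theorem A.2.11),
$$\lim_{\delta \to 0} \int g_\delta \, d\mu = \mu(F) \quad \text{and} \quad \lim_{\delta \to 0} \int g_\delta \, d\nu = \nu(F).$$
This shows that μ(F) = ν(F) for every closed subset F. As pointed out before, the latter implies μ = ν.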
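On M = R, both the cutoff of Lemma A.3.4 and the limit used in the proof can be checked numerically. The Python sketch below assumes that F is a finite union of closed intervals and that μ is a finitely supported probability measure (both illustrative choices of ours, as are the function names): it estimates the Lipschitz constant of $g_\delta$ on a grid, which should not exceed 1/δ, and evaluates $\int g_\delta \, d\mu$ for decreasing δ, which should decrease to μ(F).

```python
def h(s):
    """h(s) = 1 for s <= 0, 1 - s for 0 <= s <= 1, and 0 for s >= 1."""
    return min(1.0, max(0.0, 1.0 - s))

def dist(x, F):
    """Distance from x to F, a finite union of closed intervals [a, b] in R."""
    return min(max(a - x, 0.0, x - b) for a, b in F)

def g(x, F, delta):
    """The Lipschitz cutoff g_delta(x) = h(d(x, F) / delta) of Lemma A.3.4."""
    return h(dist(x, F) / delta)

F = [(0.0, 1.0), (3.0, 4.0)]                 # a closed subset of M = R
mu = {0.5: 0.5, 1.05: 0.25, 2.0: 0.25}       # hypothetical atoms -> masses, so mu(F) = 0.5

def integral_of_cutoff(delta):
    """Integral of g_delta with respect to the discrete measure mu."""
    return sum(mass * g(x, F, delta) for x, mass in mu.items())

# Largest difference quotient of g_delta on a grid; the lemma predicts at most 1/delta.
delta = 0.5
grid = [i / 100 for i in range(-200, 701)]
vals = [g(x, F, delta) for x in grid]
lip = max(abs(vals[i + 1] - vals[i]) / (grid[i + 1] - grid[i]) for i in range(len(grid) - 1))

for d in (1.0, 0.1, 0.05, 0.01):
    print(d, integral_of_cutoff(d))
```

Once δ is smaller than the distance from each atom outside F to F, the integral equals μ(F) exactly, which is the discrete counterpart of the dominated convergence step.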

As observed in Example A.1.29, continuous maps are automatically measurable relative to the Borel σ-algebra. The result that we prove next asserts that, under a simple condition on the metric space, there is a kind of converse: measurable maps are continuous, restricted to certain subsets with almost full measure.
A subset of a topological space M is dense if it intersects every open subset
of M. We say that the space M is separable if it admits some countable dense
subset. In the special case of metric spaces this is equivalent to saying that the
topology admits a countable basis of open sets (Exercise A.3.1).
Theorem A.3.5 (Lusin). Let ϕ : M → N be a measurable map with values
in some separable metric space N. Given any ε > 0, there exists a closed set
F ⊂ M such that μ(M \ F) < ε and the restriction of ϕ to F is continuous.

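Proof. Let $\{x_n : n \ge 1\}$ be a countable dense subset of N and, for every k ≥ 1, let $B_{n,k}$ be the ball of center $x_n$ and radius 1/k. Fix ε > 0. By Proposition A.3.2, for every (n, k) we may find an open set $A_{n,k} \subset M$ containing $\varphi^{-1}(B_{n,k})$ and satisfying $\mu(A_{n,k} \setminus \varphi^{-1}(B_{n,k})) < \varepsilon/2^{n+k+1}$. Define
$$E = \bigcap_{n,k=1}^{\infty} \big(\varphi^{-1}(B_{n,k}) \cup A_{n,k}^c\big).$$
On the one hand,
$$\mu(M \setminus E) \le \sum_{n,k=1}^{\infty} \mu\big(A_{n,k} \setminus \varphi^{-1}(B_{n,k})\big) < \sum_{n,k=1}^{\infty} \frac{\varepsilon}{2^{n+k+1}} = \frac{\varepsilon}{2}.$$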
On the other hand, every $\varphi^{-1}(B_{n,k})$ is a relatively open subset of $\varphi^{-1}(B_{n,k}) \cup A_{n,k}^c$, since its complement in that set is the closed set $A_{n,k}^c$. Consequently, $\varphi^{-1}(B_{n,k}) \cap E$ is open in E for every (n, k). Since the balls $B_{n,k}$ form a basis for the topology of N, this shows that the restriction of ϕ to the set E is continuous. To conclude the proof it suffices to use Proposition A.3.2 once more to find a closed set F ⊂ E such that μ(E \ F) < ε/2.

A.3.2 Separable complete metric spaces


Next, we discuss another important property of measures on metric spaces that
are both separable and complete. Recall that the latter means that every Cauchy
sequence converges.

Definition A.3.6. A (Borel) measure μ on a topological space is tight if for every ε > 0 there exists a compact subset K such that $\mu(K^c) < \varepsilon$.

Since every closed subset of a compact metric space is also compact, it follows immediately from Proposition A.3.2 that every probability measure on a compact metric space is tight. However, this conclusion is a lot more general:

Proposition A.3.7. Every probability measure on a separable complete metric space is tight.

Proof. Let $\{p_k : k \in \mathbb{N}\}$ be a countable dense subset of M. Then, for every n ≥ 1, the closed balls $\bar B(p_k, 1/n)$, k ∈ N, form a countable cover of M. Given ε > 0 and n ≥ 1, fix k(n) ≥ 1 in such a way that the (closed) set
$$L_n = \bigcup_{k=1}^{k(n)} \bar B(p_k, 1/n)$$
satisfies $\mu(L_n) > 1 - \varepsilon/2^n$. Take $K = \bigcap_{n=1}^{\infty} L_n$. Note that K is closed and
$$\mu(K^c) \le \sum_{n=1}^{\infty} \mu(L_n^c) < \sum_{n=1}^{\infty} \frac{\varepsilon}{2^n} = \varepsilon.$$

It remains to check that K is compact. For that, it is enough to show that every sequence $(x_i)_i$ in K admits some Cauchy subsequence (since M is complete and K is closed, this subsequence converges to a point of K). Such a subsequence may be found as follows. Since $x_i \in L_1$ for every i, there exists l(1) ≤ k(1) such that the set of indices
$$I_1 = \{i \in \mathbb{N} : x_i \in \bar B(p_{l(1)}, 1)\}$$
is infinite. Let i(1) be the smallest element of $I_1$. Next, since $x_i \in L_2$ for every i, there exists l(2) ≤ k(2) such that
$$I_2 = \{i \in I_1 : x_i \in \bar B(p_{l(2)}, 1/2)\}$$
is infinite. Let i(2) be the smallest element of $I_2 \setminus \{i(1)\}$. Repeating this procedure, we construct a decreasing sequence $I_n$ of infinite subsets of N, and an increasing sequence i(1) < i(2) < · · · < i(n) < · · · of integers such that
$i(n) \in I_n$ and all the $x_i$, $i \in I_n$, are contained in the same closed ball of radius 1/n. In particular,
$$d(x_{i(a)}, x_{i(b)}) \le 2/n \quad \text{for every } a, b \ge n.$$
This shows that the subsequence $(x_{i(n)})_n$ is indeed Cauchy.

Corollary A.3.8. Assume that M is a separable complete metric space and μ is a probability measure on M. For every ε > 0 and every Borel set B ⊂ M there exists a compact set L ⊂ B such that μ(B \ L) < ε.

Proof. By Proposition A.3.2, we may find some closed set F ⊂ B such that μ(B \ F) < ε/2. By Proposition A.3.7, there exists a compact subset K ⊂ M such that μ(M \ K) < ε/2. Take L = F ∩ K. Then L is compact, being a closed subset of a compact set, and μ(B \ L) ≤ μ(B \ F) + μ(M \ K) < ε.

Analogously, when the metric space M is separable and complete we can improve the statement of Lusin's theorem, replacing "closed" with "compact" in the conclusion:

Theorem A.3.9 (Lusin). Assume that M is a separable complete metric space and μ is a probability measure on M. Let ϕ : M → N be a measurable map with values in a separable metric space N. Then, given any ε > 0 there exists a compact set K ⊂ M such that μ(M \ K) < ε and the restriction of ϕ to K is continuous.

We close this section with another important fact about measures on separable complete metric spaces. A measure μ is called atomic if there exists some point x such that μ({x}) > 0; any such point is called an atom. Otherwise, the measure μ is said to be non-atomic.
The next theorem states that every non-atomic probability measure on a separable complete metric space is isomorphic, up to sets of measure zero, to the Lebesgue measure on the interval [0, 1]. The proof is given in Section 8.5.

Theorem A.3.10. Let M be a separable complete metric space and μ be a non-atomic probability measure on M. Then there exists a measurable map ψ : M → [0, 1] such that ψ is a bijection with measurable inverse, restricted to a subset with full measure, and $\psi_*\mu$ is the Lebesgue measure on [0, 1].
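The proof of Theorem A.3.10 is deferred to Section 8.5; here we only illustrate the simplest special case, M = [0, 1] with μ the non-atomic measure of density 2x. For this μ the distribution function ψ(x) = μ([0, x]) = x² is a strictly increasing bijection of [0, 1], and $\psi_*\mu$ is the Lebesgue measure because μ([0, √t]) = t (compare Exercise A.3.4). The Python sketch below checks this on a seeded random sample; the sampling trick and all names are our own choices.

```python
import random

def psi(x):
    """Distribution function of mu: psi(x) = mu([0, x]) = x**2 when mu has density 2x on [0, 1]."""
    return x * x

def sample_mu(rng):
    """The maximum of two independent uniform variables on [0, 1] has distribution function x**2."""
    return max(rng.random(), rng.random())

rng = random.Random(12345)
n = 100_000
images = [psi(sample_mu(rng)) for _ in range(n)]

# If psi pushes mu forward to Lebesgue measure, about a fraction t of the images lie in [0, t].
for t in (0.1, 0.25, 0.5, 0.9):
    print(t, sum(y <= t for y in images) / n)
```

For a general separable complete metric space there is no order to exploit, which is why the proof in Section 8.5 needs a different construction.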

A.3.3 Space of continuous functions


Let M be a compact metric space. We are going to describe some important properties of the vector space C0(M) of continuous functions, real or complex, defined on M. We consider on this space the norm of uniform convergence, given by:
$$\|\varphi\| = \sup\{|\varphi(x)| : x \in M\}.$$
This norm is complete and, hence, endows C0 (M) with the structure of a
Banach space.
The conclusions of the previous sections hold in this setting, since every
compact metric space is separable and complete. Another useful fact about
compact metric spaces is that every open cover admits some Lebesgue number,
that is, some number ρ > 0 such that for every x ∈ M there exists some element
of the cover that contains the ball B(x, ρ).
A linear functional Φ : C0(M) → C is said to be positive if Φ(ϕ) ≥ 0 for every function ϕ ∈ C0(M) with ϕ(x) ≥ 0 for every x ∈ M. The theorem of Riesz–Markov (see Theorem 6.19 in Rudin [Rud87]) shows that the only positive linear functionals on C0(M) are the integrals:

Theorem A.3.11 (Riesz–Markov). Let M be a compact metric space. Consider any positive linear functional Φ : C0(M) → C. Then there exists a unique finite Borel measure μ on M such that
$$\Phi(\varphi) = \int \varphi \, d\mu \quad \text{for every } \varphi \in C^0(M).$$
Moreover, μ is a probability measure if and only if Φ(1) = 1.

The next result, which is also known as the theorem of Riesz–Markov, gives an analogous representation for continuous linear functionals on C0(M), not necessarily positive. Recall that the norm of a linear functional Φ : C0(M) → C is defined by
$$\|\Phi\| = \sup\left\{ \frac{|\Phi(\varphi)|}{\|\varphi\|} : \varphi \ne 0 \right\} \tag{A.3.1}$$
and that Φ is continuous if and only if the norm is finite.

Theorem A.3.12 (Riesz–Markov). Let M be a compact metric space. Consider any continuous linear functional Φ : C0(M) → C. Then there exists some complex Borel measure μ on M such that
$$\Phi(\varphi) = \int \varphi \, d\mu \quad \text{for every } \varphi \in C^0(M).$$
The norm ‖μ‖ = |μ|(M) of the measure μ coincides with the norm ‖Φ‖ of the functional Φ. Moreover, μ takes values in [0, ∞) if and only if Φ is positive and μ takes values in R if and only if Φ(ϕ) ∈ R for every real function ϕ.

In other words, this last theorem asserts that the dual space of C0 (M) is
isometrically isomorphic to M(M). Theorems A.3.11 and A.3.12 extend to
locally compact topological spaces, with suitable assumptions on the behavior
of the functions at infinity. In this context the measure μ is still regular, but not
necessarily finite.
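For a real functional given by a finite signed combination of Dirac masses on M = [0, 1], Theorem A.3.12 says that ‖Φ‖ equals the total variation $\sum_i |w_i|$ of the weights. The inequality |Φ(ϕ)| ≤ Σ|w_i| ‖ϕ‖ is immediate; the Python sketch below (a hypothetical example, not taken from the text) exhibits a continuous ϕ with ‖ϕ‖ = 1 that attains the bound, obtained by piecewise-linear interpolation of the signs of the weights.

```python
import bisect
import math

# Hypothetical signed measure on M = [0, 1]: mu = 2 delta_0.1 - 1.5 delta_0.4 + 0.5 delta_0.75.
points = [0.1, 0.4, 0.75]
weights = [2.0, -1.5, 0.5]

def Phi(phi):
    """The functional Phi(phi) = integral of phi with respect to mu."""
    return sum(w * phi(x) for w, x in zip(weights, points))

def piecewise_linear(xs, ys):
    """Continuous function through the points (xs[i], ys[i]), constant outside [xs[0], xs[-1]]."""
    def f(t):
        if t <= xs[0]:
            return ys[0]
        if t >= xs[-1]:
            return ys[-1]
        j = bisect.bisect_right(xs, t)          # xs[j - 1] <= t < xs[j]
        x0, x1, y0, y1 = xs[j - 1], xs[j], ys[j - 1], ys[j]
        return y0 + (y1 - y0) * (t - x0) / (x1 - x0)
    return f

total_variation = sum(abs(w) for w in weights)   # |mu|(M)
# A continuous phi with values in [-1, 1] equal to the sign of each weight at its atom.
phi_star = piecewise_linear(points, [math.copysign(1.0, w) for w in weights])
grid = [i / 1000 for i in range(1001)]
sup_norm = max(abs(phi_star(t)) for t in grid)
```

Here Φ(ϕ*) = 4 = |μ|(M) while ‖ϕ*‖ = 1, so the supremum in (A.3.1) is attained.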
We also use the fact that the space C0 (M) has countable dense subsets
(Exercise A.3.6 is a particular instance):
Theorem A.3.13. If M is a compact metric space then C0 (M) is separable.

Proof. We treat the case of real functions; the complex case is entirely analogous. Every compact metric space is separable. Let $\{x_k : k \in \mathbb{N}\}$ be a countable dense subset of M. For each k ∈ N, consider the function $f_k : M \to \mathbb{R}$ defined by $f_k(x) = d(x, x_k)$. Denote by A the set of all functions f : M → R of the form
$$f = c + \sum_{k_1, \dots, k_s} c_{k_1, \dots, k_s} f_{k_1} \cdots f_{k_s} \tag{A.3.2}$$
where the sum is finite, with c ∈ R and $c_{k_1, \dots, k_s} \in \mathbb{R}$ for every $k_1, \dots, k_s \in \mathbb{N}$. It is clear that A contains all constant functions. Observe also that A is an algebra of functions, that is, it is closed under the operations of addition and multiplication (including multiplication by any constant). Moreover, A separates the points of M, in the sense that for any x ≠ y there exists some f ∈ A such that f(x) ≠ f(y). To see that, fix ε > 0 such that d(x, y) > 2ε, consider k ∈ N such that $d(x, x_k) < \varepsilon$ and then take $f = f_k$. Note that $f(x) = d(x, x_k) < \varepsilon$ while, by the triangle inequality, $f(y) = d(y, x_k) \ge d(x, y) - d(x, x_k) > \varepsilon$. So, the algebra of functions A is separating, as we claimed. Now, the theorem of Stone–Weierstrass (see [DS57, Theorem 4.6.16]) asserts that every separating subalgebra of the space of continuous functions that contains the constant function 1 is dense in C0(M). The previous observations show that this applies to A. Since each $f_k$ is bounded, replacing the coefficients in (A.3.2) by nearby rational numbers changes f by arbitrarily little in the norm of uniform convergence. It follows that the (countable) set of functions of the form (A.3.2) with c ∈ Q and $c_{k_1, \dots, k_s} \in \mathbb{Q}$ is also dense in C0(M).
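The argument above is not constructive. In the special case M = [0, 1] there is a constructive route to the density of polynomials, namely Bernstein polynomials; this is a different technique from the Stone–Weierstrass argument used in the proof, and it is related to Exercise A.3.6 with d = 1. The Python sketch below (names ours) estimates the uniform distance on a grid for a continuous function with a corner.

```python
from math import comb

def bernstein(f, n):
    """Bernstein polynomial B_n f(x) = sum_k f(k/n) C(n, k) x^k (1 - x)^(n - k)."""
    values = [f(k / n) for k in range(n + 1)]
    def B(x):
        return sum(v * comb(n, k) * x**k * (1 - x)**(n - k) for k, v in enumerate(values))
    return B

def sup_distance(f, g, points=1001):
    """Uniform distance between f and g, estimated on an evenly spaced grid of [0, 1]."""
    grid = [i / (points - 1) for i in range(points)]
    return max(abs(f(x) - g(x)) for x in grid)

f = lambda x: abs(x - 0.5)          # continuous on [0, 1], not differentiable at 1/2
errors = {n: sup_distance(f, bernstein(f, n)) for n in (10, 40, 160)}
```

Because this f is 1-Lipschitz, the error is at most $1/(2\sqrt{n})$, so it decays slowly but surely; the decay is faster for smoother functions.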

A.3.4 Exercises
A.3.1. Let M be a metrizable topological space. Justify that every point of M admits
a countable basis of neighborhoods. Check that M is separable if and only if it
admits a countable basis of open sets. Give examples of separable metric spaces
and non-separable metric spaces.
A.3.2. Let μ be a finite measure on a metric space M. Show that for every closed set
F ⊂ M there exists some finite or countable set E ⊂ (0, ∞) such that
μ({x ∈ M : d(x, F) = r}) = 0 for every r ∈ (0, ∞) \ E.
A.3.3. Let μ be a finite measure on a separable metric space M. Show that for every ε >
0 there exists a countable partition of M into measurable subsets with diameter
less than ε and whose boundaries have measure zero.
A.3.4. Let μ be a probability measure on [0, 1] and φ : [0, 1] → [0, 1] be the function
given by φ(x) = μ([0, x]). Check that φ is continuous if and only if μ is
non-atomic. Check that φ is absolutely continuous if and only if μ is absolutely
continuous with respect to the Lebesgue measure.
A.3.5. Let μ be a probability measure on some metric space M. Show that for every
integrable function ψ : M → R there exists a sequence ψn : M → R, n ≥ 1
of uniformly continuous functions converging to ψ at μ-almost every point.
Moreover, if ψ is bounded then we may choose the sequence in such a way
that sup |ψn | ≤ sup |ψ| for every n. Do these claims remain true if we require
convergence at every point?
A.3.6. Without using Theorem A.3.13, show that the space $C^0([0, 1]^d)$ of continuous functions, real or complex, on the compact unit cube is separable, for every d ≥ 1.

A.4 Differentiable manifolds


In this appendix we review some fundamental notions and facts from
differential topology and Riemannian geometry.

A.4.1 Differentiable manifolds and maps


A differentiable manifold of dimension d is a (Hausdorff) topological space
M endowed with a differentiable atlas of dimension d, that is, a family of
homeomorphisms ϕα : Uα → Xα such that

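1. each $U_\alpha$ is an open subset of M and each $X_\alpha$ is an open subset of $\mathbb{R}^d$ and $M = \bigcup_\alpha U_\alpha$;
2. the map $\varphi_\beta \circ \varphi_\alpha^{-1} : \varphi_\alpha(U_\alpha \cap U_\beta) \to \varphi_\beta(U_\alpha \cap U_\beta)$ is differentiable, for any α and β such that $U_\alpha \cap U_\beta \ne \emptyset$.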

More generally, instead of $\mathbb{R}^d$ we may consider any Banach space E. Then we say that M is a differentiable manifold modelled on the space E.
The homeomorphisms $\varphi_\alpha$ are called local charts, or local coordinates, and the transformations $\varphi_\beta \circ \varphi_\alpha^{-1}$ are called coordinate changes. Exchanging the roles of α and β, we see that the inverse $(\varphi_\beta \circ \varphi_\alpha^{-1})^{-1} = \varphi_\alpha \circ \varphi_\beta^{-1}$ is also differentiable. So, the definition of a differentiable manifold requires the coordinate changes to be diffeomorphisms between open subsets of Euclidean space.
Unless explicitly stated otherwise, we only consider manifolds such that M
admits a countable basis of open sets and is connected. The latter means that
no subset of M is both open and closed, except for M and ∅.
Let r ∈ N ∪ {∞}. If every coordinate change is of class $C^r$ (that is, all its partial derivatives up to order r exist and are continuous), we say that the manifold M (and the atlas $\{\varphi_\alpha : U_\alpha \to X_\alpha\}$) is of class $C^r$. Clearly, every manifold of class $C^r$ is also of class $C^s$ for every s ≤ r.

Example A.4.1. The following are manifolds of class $C^\infty$ and dimension d:

Euclidean space $\mathbb{R}^d$: consider the atlas consisting of a single map, namely, the identity map $\mathbb{R}^d \to \mathbb{R}^d$.
A.4 Differentiable manifolds 461

Sphere $S^d = \{(x_0, x_1, \dots, x_d) \in \mathbb{R}^{d+1} : x_0^2 + x_1^2 + \cdots + x_d^2 = 1\}$: consider the atlas formed by the two stereographic projections:
$$S^d \setminus \{(1, 0, \dots, 0)\} \to \mathbb{R}^d, \quad (x_0, x_1, \dots, x_d) \mapsto (x_1, \dots, x_d)/(1 - x_0),$$
$$S^d \setminus \{(-1, 0, \dots, 0)\} \to \mathbb{R}^d, \quad (x_0, x_1, \dots, x_d) \mapsto (x_1, \dots, x_d)/(1 + x_0).$$
Torus $T^d = \mathbb{R}^d/\mathbb{Z}^d$: consider the atlas formed by the inverses of the maps $g_z : (0, 1)^d \to T^d$, defined by $g_z(x) = z + x \bmod \mathbb{Z}^d$ for every $z \in \mathbb{R}^d$.
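For the sphere, the coordinate change between the two stereographic projections can be written explicitly: it is the inversion y ↦ y/|y|² on $\mathbb{R}^d \setminus \{0\}$, which is of class $C^\infty$. This follows from the identity |y|² = (1 + x_0)/(1 − x_0) for y = (x_1, …, x_d)/(1 − x_0). The Python sketch below checks the formula numerically for d = 3; the function names are ours.

```python
import random

def north_chart(p):
    """Stereographic projection from (1, 0, ..., 0): (x0, x) -> x / (1 - x0)."""
    x0, *x = p
    return [xi / (1 - x0) for xi in x]

def south_chart(p):
    """Stereographic projection from (-1, 0, ..., 0): (x0, x) -> x / (1 + x0)."""
    x0, *x = p
    return [xi / (1 + x0) for xi in x]

def north_inverse(y):
    """Inverse of north_chart, from R^d onto the sphere minus (1, 0, ..., 0)."""
    s = sum(yi * yi for yi in y)
    return [(s - 1) / (s + 1)] + [2 * yi / (s + 1) for yi in y]

def transition(y):
    """Coordinate change south_chart o north_inverse, defined for y != 0."""
    return south_chart(north_inverse(y))

rng = random.Random(0)
samples = [[rng.uniform(-3, 3) for _ in range(3)] for _ in range(5)]   # points of R^3, d = 3
```

For each sample y, `transition(y)` agrees with y/|y|² up to rounding error.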

Example A.4.2 (Grassmannian manifolds). Given 0 ≤ k ≤ d, denote by Gr(k, d) the set of all vector subspaces of dimension k of the Euclidean space $\mathbb{R}^d$. For each $j_1 < \cdots < j_k$, denote by $\mathrm{Gr}(k, d, j_1, \dots, j_k)$ the subset of elements of Gr(k, d) that are transverse to $\{(x_j)_j \in \mathbb{R}^d : x_{j_1} = \cdots = x_{j_k} = 0\}$. For every $V \in \mathrm{Gr}(k, d, j_1, \dots, j_k)$ there exists a unique matrix $(u_{i,j})_{i,j}$ with (d − k) rows and k columns such that
$$V = \big\{(x_j)_j \in \mathbb{R}^d : x_i = u_{i,j_1} x_{j_1} + \cdots + u_{i,j_k} x_{j_k} \text{ for every } i \notin \{j_1, \dots, j_k\}\big\}.$$
The maps $\mathrm{Gr}(k, d, j_1, \dots, j_k) \to \mathbb{R}^{(d-k)k}$ associating with each V the corresponding matrix $(u_{i,j})_{i,j}$ constitute an atlas of class $C^\infty$ for Gr(k, d). So, every Gr(k, d) is a manifold of class $C^\infty$ and dimension (d − k)k.
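The chart of Example A.4.2 can be computed from any basis of V. If the columns of a d × k matrix B span V and $B_J$ denotes the k × k block formed by the rows $j_1, \dots, j_k$, then V is transverse to $\{x_{j_1} = \cdots = x_{j_k} = 0\}$ exactly when $B_J$ is invertible, and the matrix $(u_{i,j})$ is the block of the remaining rows multiplied by $B_J^{-1}$. The NumPy sketch below uses 0-based indices and names of our own; it also checks that the result does not depend on the chosen basis, as it must for the chart to be well defined on Gr(k, d).

```python
import numpy as np

def grassmann_chart(basis, J):
    """Matrix (u_{i,j}) of the chart of Example A.4.2, with 0-based indices.

    basis: d x k array whose columns span V; J: sorted list of k indices j_1 < ... < j_k.
    Returns U of shape (d - k, k) such that x_I = U @ x_J for every x in V,
    where I is the complement of J.
    """
    d, k = basis.shape
    I = [i for i in range(d) if i not in J]
    B_J = basis[J, :]
    # V is transverse to {x_j = 0 for j in J} exactly when B_J is invertible.
    if np.linalg.matrix_rank(B_J) < k:
        raise ValueError("V is not in Gr(k, d, J)")
    return basis[I, :] @ np.linalg.inv(B_J)

rng = np.random.default_rng(0)
d, k, J = 5, 2, [1, 3]
basis = rng.standard_normal((d, k))
U = grassmann_chart(basis, J)

G = np.array([[2.0, 1.0], [-1.0, 3.0]])       # an invertible change of basis of V
U_other = grassmann_chart(basis @ G, J)        # same subspace, different basis
```

Replacing B by BG leaves the product unchanged because $(BG)_J^{-1} = G^{-1} B_J^{-1}$.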

Let M be a manifold of dimension d and $A = \{\varphi_\alpha : U_\alpha \to X_\alpha\}$ be the corresponding atlas. Let S be a subset of M. We say that S is a submanifold of dimension k < d if there exists some atlas $B = \{\psi_\beta : V_\beta \to Y_\beta\}$ of M such that

(i) A and B are compatible: the coordinate changes $\psi_\beta \circ \varphi_\alpha^{-1}$ and $\varphi_\alpha \circ \psi_\beta^{-1}$ are differentiable in their domains, for every α and every β;
(ii) for every β, the local chart $\psi_\beta$ maps $V'_\beta = V_\beta \cap S$ onto an open subset $Y'_\beta$ of $\mathbb{R}^k \times \{0^{d-k}\}$.

Identifying $\mathbb{R}^k \times \{0^{d-k}\} \cong \mathbb{R}^k$, we get that the family formed by the restrictions $\psi_\beta : V'_\beta \to Y'_\beta$ constitutes an atlas for S. Hence, S is a manifold of dimension k. If M is a manifold of class $C^r$ and the atlases A and B are $C^r$-compatible, that is, if all the coordinate changes in (i) are of class $C^r$, then S is a (sub)manifold of class $C^r$.
We say that a map f : M → N between two manifolds is differentiable if
$$\psi_\beta \circ f \circ \varphi_\alpha^{-1} : \varphi_\alpha\big(U_\alpha \cap f^{-1}(V_\beta)\big) \to \psi_\beta\big(V_\beta \cap f(U_\alpha)\big) \tag{A.4.1}$$
is a differentiable map for every local chart $\varphi_\alpha : U_\alpha \to X_\alpha$ of M and every local chart $\psi_\beta : V_\beta \to Y_\beta$ of N with $f(U_\alpha) \cap V_\beta \ne \emptyset$. Moreover, we say that f is of class $C^r$ if M and N are manifolds of class $C^r$ and every map $\psi_\beta \circ f \circ \varphi_\alpha^{-1}$ in (A.4.1) is of class $C^r$. A diffeomorphism f : M → N is a bijection between two manifolds such that both f and $f^{-1}$ are differentiable. If both maps are of class $C^r$ then we say that the diffeomorphism is of class $C^r$.
Let $C^r(M, N)$ be the space of maps of class $C^r$ between two manifolds M and N. We are going to introduce in this space a certain topology, called the $C^r$ topology, for which two maps are close if and only if they are uniformly close and the same is true for their derivatives up to order r. The definition may be given in a very broad context (see Section 2.1 of Hirsch [Hir94]), but we restrict ourselves to the case when M and N are compact. In this case, the $C^r$ topology may be defined in the following way.
Fix finite families of local charts $\varphi_i : U_i \to X_i$ of M and $\psi_j : V_j \to Y_j$ of N, such that $\bigcup_i U_i = M$ and $\bigcup_j V_j = N$. Given $f \in C^r(M, N)$, let δ > 0 be a Lebesgue number for the open cover $\{U_i \cap f^{-1}(V_j)\}$ of M. For each pair (i, j) such that $U_i \cap f^{-1}(V_j) \ne \emptyset$, let $K_{i,j}$ be the set of points whose distance to the complement of $U_i \cap f^{-1}(V_j)$ is greater than or equal to δ. Then $K_{i,j}$ is a compact set contained in $U_i \cap f^{-1}(V_j)$ and the union $\bigcup_{i,j} K_{i,j}$ is the whole M. Consider

$$\mathcal{U}(f) = \{g \in C^r(M, N) : g(K_{i,j}) \subset V_j \text{ for any } i, j\}.$$

It is clear that $f \in \mathcal{U}(f)$. For each $g \in \mathcal{U}(f)$ and each pair (i, j) such that $K_{i,j}$ is non-empty, denote by $g_{i,j}$ the restriction of $\psi_j \circ g \circ \varphi_i^{-1}$ to the set $\varphi_i(K_{i,j})$. For each r ∈ N and ε > 0, define
$$\mathcal{U}^r(f, \varepsilon) = \Big\{g \in \mathcal{U}(f) : \sup_{s, x, i, j} \big\|D^s f_{i,j}(x) - D^s g_{i,j}(x)\big\| < \varepsilon\Big\}, \tag{A.4.2}$$
where the supremum is over every s ∈ {1, …, r}, every $x \in \varphi_i(K_{i,j})$ and every pair (i, j) such that $K_{i,j} \ne \emptyset$. By definition, the family $\{\mathcal{U}^r(f, \varepsilon) : \varepsilon > 0\}$ is a basis of neighborhoods of each $f \in C^r(M, N)$ relative to the $C^r$ topology. Also by definition, the family $\{\mathcal{U}^r(f, \varepsilon) : \varepsilon > 0 \text{ and } r \in \mathbb{N}\}$ is a basis of neighborhoods of $f \in C^\infty(M, N)$ relative to the $C^\infty$ topology.
The $C^r$ topology has very nice properties: in particular, it admits a countable basis of open sets and is completely metrizable, that is, it is generated by some complete distance. An interesting consequence is that $C^r(M, N)$ is a Baire space: every intersection of a countable family of open dense subsets is dense in the space. The set $\mathrm{Diff}^r(M)$ of diffeomorphisms of class $C^r$ is an open subset of $C^r(M, M)$ relative to the $C^r$ topology.

A.4.2 Tangent space and derivative


Let M be a manifold. For each p ∈ M, consider the set C(p) of all the curves c : I → M whose domain is some open interval I containing 0 ∈ R, such that c(0) = p and c is differentiable at the point 0. The latter means that the map $\varphi_\alpha \circ c$ is differentiable at the point 0 for every local chart $\varphi_\alpha : U_\alpha \to X_\alpha$ with $p \in U_\alpha$. We say that two curves $c_1, c_2 \in C(p)$ are equivalent if $(\varphi_\alpha \circ c_1)'(0) = (\varphi_\alpha \circ c_2)'(0)$ for every local chart $\varphi_\alpha : U_\alpha \to X_\alpha$ with $p \in U_\alpha$. Actually, if the equality holds for some local chart then it holds for all the other charts as well. We denote by [c] the equivalence class of any curve c ∈ C(p).
The tangent space to the manifold M at the point p is the set of such equivalence classes. We denote this set by $T_pM$. For any fixed local chart $\varphi_\alpha : U_\alpha \to X_\alpha$ with $p \in U_\alpha$, the map
$$D\varphi_\alpha(p) : T_pM \to \mathbb{R}^d, \quad [c] \mapsto (\varphi_\alpha \circ c)'(0)$$
is well defined and is a bijection. We may use this bijection to identify $T_pM$ with $\mathbb{R}^d$. In this way, the tangent space acquires the structure of a vector space, transported from $\mathbb{R}^d$ via $D\varphi_\alpha(p)$. Although this identification $D\varphi_\alpha(p)$ depends on the choice of the local chart, the vector space structure on $T_pM$ does not. That is because, for any other local chart $\varphi_\beta : U_\beta \to X_\beta$ with $p \in U_\beta$, the corresponding map $D\varphi_\beta(p)$ is given by
$$D\varphi_\beta(p) = D\big(\varphi_\beta \circ \varphi_\alpha^{-1}\big)(\varphi_\alpha(p)) \circ D\varphi_\alpha(p).$$
Since $D(\varphi_\beta \circ \varphi_\alpha^{-1})(\varphi_\alpha(p))$ is a linear isomorphism, it follows that the vector space structures transported from Euclidean space to $T_pM$ by $D\varphi_\alpha(p)$ and $D\varphi_\beta(p)$ coincide, as we stated.
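The relation between $D\varphi_\beta(p)$ and $D\varphi_\alpha(p)$ can be tested on the sphere $S^2$ with the two stereographic charts of Example A.4.1, whose coordinate change is y ↦ y/|y|². The Python sketch below takes a great circle c through p = (0.6, 0.8, 0) (an arbitrary choice of ours), computes $(\varphi \circ c)'(0)$ in each chart by central differences, and compares the south-chart vector with the derivative of the coordinate change applied to the north-chart vector.

```python
import numpy as np

def north(p):
    """Stereographic chart of S^2 from (1, 0, 0): (x0, x) -> x / (1 - x0)."""
    return p[1:] / (1 - p[0])

def south(p):
    """Stereographic chart of S^2 from (-1, 0, 0): (x0, x) -> x / (1 + x0)."""
    return p[1:] / (1 + p[0])

def transition_jacobian(y):
    """Derivative at y of the coordinate change y -> y / |y|^2 (south o north^{-1})."""
    s = y @ y
    return np.eye(len(y)) / s - 2 * np.outer(y, y) / s**2

# A curve on S^2 through p = c(0): a great circle with velocity q at t = 0.
p = np.array([0.6, 0.8, 0.0])
q = np.array([0.0, 0.0, 1.0])

def curve(t):
    return np.cos(t) * p + np.sin(t) * q

def chart_velocity(chart, t=0.0, h=1e-6):
    """(chart o c)'(t), approximated by a central difference quotient."""
    return (chart(curve(t + h)) - chart(curve(t - h))) / (2 * h)

v_north = chart_velocity(north)      # the tangent vector [c] read in the north chart
v_south = chart_velocity(south)      # the same vector read in the south chart
predicted = transition_jacobian(north(curve(0.0))) @ v_north
```

In this example the north chart gives approximately (0, 2.5) and both `v_south` and `predicted` give approximately (0, 0.625).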
If f : M → N is a differentiable map, its derivative at a point p ∈ M is the linear map $Df(p) : T_pM \to T_{f(p)}N$ defined by
$$Df(p) = D\psi_\beta(f(p))^{-1} \circ D\big(\psi_\beta \circ f \circ \varphi_\alpha^{-1}\big)(\varphi_\alpha(p)) \circ D\varphi_\alpha(p),$$
where $\varphi_\alpha : U_\alpha \to X_\alpha$ is a local chart of M with $p \in U_\alpha$ and $\psi_\beta : V_\beta \to Y_\beta$ is a local chart of N with $f(p) \in V_\beta$. The definition does not depend on the choice of these local charts.
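The tangent bundle to M is the (disjoint) union $TM = \bigcup_{p \in M} T_pM$ of all the tangent spaces to M. For each local chart $\varphi_\alpha : U_\alpha \to X_\alpha$, consider the union $T_{U_\alpha}M = \bigcup_{p \in U_\alpha} T_pM$ and the map
$$D\varphi_\alpha : T_{U_\alpha}M \to X_\alpha \times \mathbb{R}^d$$
that associates with each $[c] \in T_{U_\alpha}M$ the pair
$$\big((\varphi_\alpha \circ c)(0), (\varphi_\alpha \circ c)'(0)\big) \in X_\alpha \times \mathbb{R}^d.$$
We consider on TM the (unique) topology that turns every $D\varphi_\alpha$ into a homeomorphism. Assuming that the atlas $\{\varphi_\alpha : U_\alpha \to X_\alpha\}$ of the manifold M is of class $C^r$, the coordinate change
$$D\varphi_\beta \circ D\varphi_\alpha^{-1} : \varphi_\alpha(U_\alpha \cap U_\beta) \times \mathbb{R}^d \to \varphi_\beta(U_\alpha \cap U_\beta) \times \mathbb{R}^d$$
is a map of class $C^{r-1}$ for any α and β such that $U_\alpha \cap U_\beta \ne \emptyset$. So, the tangent bundle TM is endowed with the structure of a manifold of class $C^{r-1}$ and dimension 2d.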
The derivative Df : TM → TN of a differentiable map f : M → N is the map whose restriction to each tangent space $T_pM$ is given by Df(p). If f is of class $C^r$ then Df is of class $C^{r-1}$, relative to the manifold structure on the tangent bundles TM and TN that we introduced in the previous paragraph. For example,
the canonical projection π : TM → M, associating with each v ∈ TM the unique point p ∈ M such that $v \in T_pM$, is a map of class $C^{r-1}$ (Exercise A.4.9).
A vector field on a manifold M is a map that associates with each point p ∈ M an element X(p) of the tangent space $T_pM$, that is, a map X : M → TM such that π ∘ X = id. We say that the vector field is of class $C^k$, with k ≤ r − 1, if this map is of class $C^k$.
Assuming that k ≥ 1, we may apply the theorem of existence and uniqueness of solutions of ordinary differential equations to conclude that for every point p ∈ M there exists a unique curve $c_p : I_p \to M$ such that
• $c_p(0) = p$ and $c_p'(t) = X(c_p(t))$ for every $t \in I_p$, and
• $I_p$ is the largest open interval where such a curve can be defined.

If M is compact then $I_p = \mathbb{R}$ for any p ∈ M. Moreover, the maps $f^t : M \to M$ defined by $f^t(p) = c_p(t)$ are diffeomorphisms of class $C^k$, with $f^0 = \mathrm{id}$ and $f^s \circ f^t = f^{s+t}$ for any s, t ∈ R. The family $\{f^t : t \in \mathbb{R}\}$ is called the flow of the vector field X.
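On the circle $T^1 = \mathbb{R}/\mathbb{Z}$ the flow of a vector field can be approximated numerically, which makes the group property $f^s \circ f^t = f^{s+t}$ observable. The Python sketch below uses the hypothetical smooth field X(θ) = 0.3 + 0.1 sin 2πθ, integrated on the lift to R with the classical fourth-order Runge–Kutta scheme; the identities hold only up to discretization error, and the names are ours.

```python
import math

def X(theta):
    """Hypothetical smooth vector field on the circle R/Z, written on the lift R (period 1)."""
    return 0.3 + 0.1 * math.sin(2 * math.pi * theta)

def flow(t, p, steps=10_000):
    """Approximate f^t(p) with the classical fourth-order Runge-Kutta scheme, then reduce mod 1."""
    h = t / steps
    x = p
    for _ in range(steps):
        k1 = X(x)
        k2 = X(x + h * k1 / 2)
        k3 = X(x + h * k2 / 2)
        k4 = X(x + h * k3)
        x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return x % 1.0

def circle_distance(a, b):
    """Distance between two points of R/Z."""
    return abs((a - b + 0.5) % 1.0 - 0.5)

p, s, t = 0.2, 1.3, 2.1
print(circle_distance(flow(s, flow(t, p)), flow(s + t, p)))
```

Since X is 1-periodic, integrating on the lift and reducing mod 1 at the end gives the same point of the circle as reducing at every step.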

A.4.3 Cotangent space and differential forms


The cotangent space $T_p^*M$ to a manifold M at a point p is the dual of the tangent space $T_pM$, that is, the space of linear functionals $\xi : T_pM \to \mathbb{R}$. For any local chart $\varphi_\alpha : U_\alpha \to X_\alpha$ with $p \in U_\alpha$, the isomorphism $D\varphi_\alpha(p) : T_pM \to \mathbb{R}^d$ induces an isomorphism
$$D\varphi_\alpha^*(p) : T_p^*M \to \mathbb{R}^d$$
as follows. For each i = 1, …, d, let $dx_i = \pi_i \circ D\varphi_\alpha(p)$, where $\pi_i : \mathbb{R}^d \to \mathbb{R}$ is the projection to the i-th coordinate. Then $dx_i \in T_p^*M$ and, in fact, the family $\{dx_1, \dots, dx_d\}$ is a basis of $T_p^*M$. For each $\xi \in T_p^*M$, define
$$D\varphi_\alpha^*(p)\xi = (\xi_1, \dots, \xi_d) \iff \xi = \sum_{i=1}^{d} \xi_i \, dx_i.$$
The cotangent bundle of M is the (disjoint) union $T^*M = \bigcup_{p \in M} T_p^*M$ of all the cotangent spaces to M. For each local chart $\varphi_\alpha : U_\alpha \to X_\alpha$, consider the union $T^*_{U_\alpha}M = \bigcup_{p \in U_\alpha} T_p^*M$ and the map
$$D\varphi_\alpha^* : T^*_{U_\alpha}M \to X_\alpha \times \mathbb{R}^d$$
defined by $D\varphi_\alpha^*\xi = (\varphi_\alpha(p), D\varphi_\alpha^*(p)\xi)$ if $\xi \in T_p^*M$. It is clear that this is a bijection. We consider on $T^*M$ the unique topology that turns every $D\varphi_\alpha^*$ into a homeomorphism. If $\{\varphi_\alpha : U_\alpha \to X_\alpha\}$ is an atlas of class $C^r$ for the manifold M then
$$\{D\varphi_\alpha^* : T^*_{U_\alpha}M \to X_\alpha \times \mathbb{R}^d\}$$
is an atlas of class $C^{r-1}$ for $T^*M$. So, the cotangent bundle $T^*M$ is also endowed with the structure of a manifold of class $C^{r-1}$ and dimension 2d.

Moreover, the canonical map $\pi^* : T^*M \to M$ defined by $\pi^*(\xi) = p$ for every $\xi \in T_p^*M$ is of class $C^{r-1}$.
A differential 1-form on $M$ is a differentiable map $\theta : M \to T^*M$ such that $\pi^* \circ \theta = \mathrm{id}$. In other words, $\theta$ assigns to each point $p \in M$ a linear functional (or linear form) $\theta_p : T_pM \to \mathbb{R}$ that depends differentiably on the point.
More generally, for any $0 \le k \le d$, an alternate $k$-linear form on $T_pM$ is a map¹
$$\theta_p : (T_pM)^k \to \mathbb{R}, \quad (v_1, \dots, v_k) \mapsto \theta_p(v_1, \dots, v_k)$$
such that $\theta_p$ is linear in each variable $v_i$ and
$$\theta_p(v_1, \dots, v_i, v_{i+1}, \dots, v_k) = -\theta_p(v_1, \dots, v_{i+1}, v_i, \dots, v_k)$$
for any $1 \le i < k$ and any $(v_1, \dots, v_k) \in (T_pM)^k$.
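Let $\{dx_1, \dots, dx_d\}$ be the basis of the cotangent space associated with a local chart $\varphi_\alpha : U_\alpha \to X_\alpha$ and let $\{\partial/\partial x_1, \dots, \partial/\partial x_d\}$ be the dual basis of $T_pM$, defined by
$$dx_i(\partial/\partial x_j) = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \ne j. \end{cases}$$
If $i_1, \dots, i_k \in \{1, \dots, d\}$ are all distinct, there exists a unique alternate $k$-linear form $dx_{i_1} \wedge \dots \wedge dx_{i_k}$ such that

• $dx_{i_1} \wedge \dots \wedge dx_{i_k}(\partial/\partial x_{i_1}, \dots, \partial/\partial x_{i_k}) = 1$, and
• $dx_{i_1} \wedge \dots \wedge dx_{i_k}(\partial/\partial x_{j_1}, \dots, \partial/\partial x_{j_k}) = 0$ when $\{i_1, \dots, i_k\} \ne \{j_1, \dots, j_k\}$.

The family $\{dx_{i_1} \wedge \dots \wedge dx_{i_k} : 1 \le i_1 < \dots < i_k \le d\}$ is a basis of the vector space of alternate $k$-linear forms on $T_pM$.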
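On vectors written in the basis $\{\partial/\partial x_j\}$, the form $dx_{i_1} \wedge \dots \wedge dx_{i_k}$ acts as the $k \times k$ determinant $\det[dx_{i_a}(v_b)]_{a,b}$. This is a standard identity not stated in the text above, and one can check that it satisfies both defining properties: the identity matrix gives 1, and a zero row appears when the index sets differ. The Python sketch below evaluates the determinant with the Leibniz permutation formula. Indices are 0-based, and the function names are our own.

```python
from itertools import permutations


def perm_sign(perm):
    """Sign of a permutation of 0..k-1, computed from its number of inversions."""
    inversions = sum(
        1 for a in range(len(perm)) for b in range(a + 1, len(perm)) if perm[a] > perm[b]
    )
    return -1 if inversions % 2 else 1


def wedge_of_dx(indices, vectors):
    """Evaluate dx_{i_1} ^ ... ^ dx_{i_k} on (v_1, ..., v_k).

    Each v_b is given by its coordinates in the basis d/dx_1, ..., d/dx_d, so
    dx_{i_a}(v_b) = vectors[b][indices[a]]; the value is det[dx_{i_a}(v_b)]."""
    k = len(indices)
    total = 0
    for perm in permutations(range(k)):
        term = perm_sign(perm)
        for a in range(k):
            term *= vectors[perm[a]][indices[a]]
        total += term
    return total


# dx_1 ^ dx_2 applied to v = 2e_1 + 3e_2 and w = e_1 + 5e_2 + 7e_3 gives 2*5 - 3*1 = 7.
print(wedge_of_dx((0, 1), ([2, 3, 0], [1, 5, 7])))
```

Swapping the two arguments flips the sign, and repeating an argument gives 0, as the alternating property requires.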
A differential $k$-form on $M$ is a map $\theta$ assigning to each point $p \in M$ an alternate $k$-linear form on the tangent space $T_pM$ that depends differentiably on the point. In local coordinates, this may be written as
$$\theta_p = \sum_{1 \le i_1 < \dots < i_k \le d} a_{i_1,\dots,i_k}(p)\, dx_{i_1} \wedge \dots \wedge dx_{i_k}.$$
The differentiability condition means that the coefficients $a_{i_1,\dots,i_k}(p)$ depend differentiably on the point $p$.
Assuming that $k < d$, the exterior derivative of $\theta$ is the differential $(k+1)$-form $d\theta$ determined by
$$d\theta_p = \sum_{1 \le i_1 < \dots < i_k \le d} \ \sum_{j} \frac{\partial a_{i_1,\dots,i_k}}{\partial x_j}(p)\, dx_j \wedge dx_{i_1} \wedge \dots \wedge dx_{i_k},$$
where the second sum is over all $j \notin \{i_1, \dots, i_k\}$. One can check that the expression on the right-hand side does not depend on the choice of the local chart. A differential $k$-form $\theta$ is closed if $d\theta = 0$ (or else $k = d$) and it is exact if there exists some $(k-1)$-form $\eta$ such that $d\eta = \theta$ (or else $k = 0$). Every exact differential form is closed.

¹ An alternate 0-linear form is just a real number.
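In $\mathbb{R}^2$ with coordinates $(x, y)$, the formula for $d\theta_p$ gives $d(a\,dx + b\,dy) = (\partial b/\partial x - \partial a/\partial y)\, dx \wedge dy$, because $dy \wedge dx = -dx \wedge dy$. The Python sketch below approximates this coefficient with central finite differences, using an arbitrarily chosen step size. It illustrates that the exact form $df$, with $f(x, y) = \sin x \, e^y$, is closed, while $-y\,dx + x\,dy$ is not. The sample functions and helper names are illustrative assumptions.

```python
import math


def partial(g, p, j, h=1e-5):
    """Central finite-difference approximation of the partial derivative of g
    with respect to the j-th coordinate at the point p (a list of floats)."""
    plus, minus = list(p), list(p)
    plus[j] += h
    minus[j] -= h
    return (g(plus) - g(minus)) / (2 * h)


def d_one_form(a, b, p):
    """Coefficient c(p) in d(a dx + b dy) = c dx ^ dy on R^2, namely db/dx - da/dy."""
    return partial(b, p, 0) - partial(a, p, 1)


def a_exact(p):
    # df/dx for f(x, y) = sin(x) * exp(y)
    return math.cos(p[0]) * math.exp(p[1])


def b_exact(p):
    # df/dy for f(x, y) = sin(x) * exp(y)
    return math.sin(p[0]) * math.exp(p[1])


print(d_one_form(a_exact, b_exact, [0.4, -0.2]))  # close to 0: df is closed
print(d_one_form(lambda p: -p[1], lambda p: p[0], [1.0, 2.0]))  # close to 2
```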
For much more information on the subject of differential forms, see the book
of Henri Cartan [Car70].

A.4.4 Transversality
The result that we state next is an important tool for constructing new manifolds. We say that $y \in N$ is a regular value of a differentiable map $f : M \to N$ if the derivative $Df(x) : T_xM \to T_yN$ is surjective for every $x \in f^{-1}(y)$. Note that this holds automatically if $y$ is not in the image of $f$, that is, if $f^{-1}(y)$ is the empty set. On the other hand, for a point $y \in f(M)$ to be a regular value of $f$ it is necessary that $\dim M \ge \dim N$.

Theorem A.4.3. Let $f : M \to N$ be a map of class $C^r$ and $y \in f(M)$ be a regular value of $f$. Then $f^{-1}(y)$ is a submanifold (not necessarily connected) of class $C^r$ of $M$, with dimension equal to $\dim M - \dim N$.

Example A.4.4. For any $d \ge 1$, the space of square matrices of dimension $d$ with real coefficients is isomorphic to the Euclidean space $\mathbb{R}^{d^2}$ and, hence, it is a manifold of dimension $d^2$ and class $C^\infty$. The linear group $GL(d, \mathbb{R})$ of invertible matrices is an open subset of that space and, hence, it is also a manifold of dimension $d^2$ and class $C^\infty$. The function $\det : GL(d, \mathbb{R}) \to \mathbb{R}$ that maps each matrix to its determinant is of class $C^\infty$, and $y = 1$ is a regular value (see Exercise A.4.5). Using Theorem A.4.3, it follows that the special linear group $SL(d, \mathbb{R})$ formed by the matrices with determinant equal to 1 is a submanifold of class $C^\infty$ of $GL(d, \mathbb{R})$, with dimension equal to $d^2 - 1$.
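One way to see why $y = 1$ is a regular value (a hint toward Exercise A.4.5, not a full proof) uses the identity $\det((1+t)A) = (1+t)^d \det A$. It shows that the derivative of $\det$ at $A$ in the direction $H = A$ equals $d \det A = d \ne 0$ whenever $\det A = 1$, and a non-zero linear functional is onto $\mathbb{R}$. The Python sketch below checks this numerically with a central finite difference. The matrix, the step size and the helper names are arbitrary choices.

```python
def det(m):
    """Determinant by cofactor expansion along the first row (fine for small matrices)."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0.0
    for col in range(n):
        minor = [row[:col] + row[col + 1:] for row in m[1:]]
        total += (-1) ** col * m[0][col] * det(minor)
    return total


def det_derivative(a, h, eps=1e-6):
    """Central-difference approximation of D det(A) H = d/dt det(A + tH) at t = 0."""
    n = len(a)
    plus = [[a[i][j] + eps * h[i][j] for j in range(n)] for i in range(n)]
    minus = [[a[i][j] - eps * h[i][j] for j in range(n)] for i in range(n)]
    return (det(plus) - det(minus)) / (2 * eps)


A = [[2.0, 1.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # det(A) = 1, so A lies in SL(3, R)
print(det(A), det_derivative(A, A))  # the derivative in direction A is close to d = 3
```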

It is possible to generalize Theorem A.4.3 using the notion of transversality. We say that a submanifold $S$ of $N$ is transverse to $f$ if
$$Df(x)\,T_xM + T_{f(x)}S = T_{f(x)}N \quad \text{for every } x \in f^{-1}(S). \tag{A.4.3}$$
For example, if $S$ consists of a single point (a submanifold of dimension zero), then $S$ is transverse to $f$ if and only if that point is a regular value of $f$. Therefore, the following statement generalizes Theorem A.4.3:

Theorem A.4.5. Let $f : M \to N$ be a map of class $C^r$ and let $S$ be a submanifold of class $C^r$ of $N$ transverse to $f$. Then $f^{-1}(S)$ is a submanifold (not necessarily connected) of class $C^r$ of $M$, with dimension equal to $\dim M - \dim N + \dim S$.

The next theorem asserts that, for every map $f : M \to N$ of class $C^r$ with $r$ sufficiently high, “almost all” points $y \in N$ are regular values. We say that a set $X \subset N$ is residual if it contains some countable intersection of open and dense subsets. Every residual set is dense in the manifold, because manifolds are Baire spaces. We say that a set $Z \subset N$ has volume zero if for every local chart $\psi_\beta : V_\beta \to Y_\beta$ the image $\psi_\beta(Z \cap V_\beta)$ is a subset of the Euclidean space with volume zero, that is, it may be covered by countably many balls in such a way that the sum of the volumes of those balls is arbitrarily small.

Theorem A.4.6 (Sard). Assume that $f : M \to N$ is a map of class $C^r$ with $r > \max\{0, \dim M - \dim N\}$. Then the set of regular values of $f$ is a residual subset of $N$ and its complement has volume zero.

A.4.5 Riemannian manifolds


A Riemannian metric on a manifold $M$ is a map that associates with each point $p \in M$ an inner product on the tangent space $T_pM$, that is, a symmetric bilinear map
$$\cdot_p : T_pM \times T_pM \to \mathbb{R}$$
such that $v \cdot_p v > 0$ for every non-zero vector $v \in T_pM$. As part of the definition, this inner product is required to vary in a differentiable way with the point $p$, in the following sense. Consider any local chart $\varphi_\alpha : U_\alpha \to X_\alpha$ of $M$. As explained previously, for every $p \in U_\alpha$ we may identify $T_pM$ with $\mathbb{R}^d$ through the map $D\varphi_\alpha(p)$. Thus, we may view $\cdot_p$ as an inner product on the Euclidean space. Let $e_1, \dots, e_d$ be a basis of $\mathbb{R}^d$. Then the functions $g_{\alpha,i,j}(p) = e_i \cdot_p e_j$ are required to be differentiable, for every pair $(i, j)$ and any choice of the local chart $\varphi_\alpha$ and the basis $e_1, \dots, e_d$.
We call a Riemannian manifold any manifold endowed with a Riemannian metric. Every submanifold $S$ of a Riemannian manifold $M$ inherits the structure of a Riemannian manifold, given by the restriction of the inner product $\cdot_p$ of $M$ to the tangent subspace $T_pS$ at each point $p \in S$. Every compact manifold admits (infinitely many) Riemannian metrics. That follows, by restricting the Euclidean inner product, from the theorem of Whitney (see Section 1.3 of Hirsch [Hir94]), according to which every compact manifold may be realized as a submanifold of some Euclidean space. Actually, this remains true in the much larger class of paracompact manifolds (which we do not define here): every paracompact manifold of dimension $d$ is diffeomorphic to some submanifold of $\mathbb{R}^{2d}$. In particular, paracompact manifolds are always metrizable.
Starting from the Riemannian metric, we may define the length of a differentiable curve $\gamma : [a, b] \to M$ by
$$\operatorname{length}(\gamma) = \int_a^b \|\gamma'(t)\|_{\gamma(t)}\, dt, \quad \text{where } \|v\|_p = (v \cdot_p v)^{1/2}.$$
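The length integral can be approximated numerically once the metric is given in a single chart by its Gram matrix $g_{ij}(p) = e_i \cdot_p e_j$, as in the definition above. The Python sketch below uses the midpoint rule. The curve, the metrics and the number of subintervals are illustrative choices. Reparametrizing the curve leaves the computed length unchanged, as it should.

```python
import math


def curve_length(gamma, dgamma, metric, a, b, n=10_000):
    """Midpoint-rule approximation of length(gamma) = integral_a^b ||gamma'(t)||_{gamma(t)} dt,
    where metric(p) returns the Gram matrix g_ij(p) = e_i ._p e_j in a single chart."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        t = a + (k + 0.5) * h
        p, v = gamma(t), dgamma(t)
        g = metric(p)
        # ||v||_p^2 = sum_ij g_ij(p) v_i v_j
        squared = sum(g[i][j] * v[i] * v[j] for i in range(len(v)) for j in range(len(v)))
        total += math.sqrt(squared) * h
    return total


def euclidean(p):
    """The standard metric on R^2 (the identity Gram matrix at every point)."""
    return [[1.0, 0.0], [0.0, 1.0]]


def circle(t):
    return (math.cos(t), math.sin(t))


def circle_velocity(t):
    return (-math.sin(t), math.cos(t))


print(curve_length(circle, circle_velocity, euclidean, 0.0, 2 * math.pi))  # close to 2 pi
```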

The Riemannian metric also allows us to define on the manifold $M$ the following distance: the distance $d(p, q)$ between two points $p, q \in M$ is the infimum of the lengths of all the differentiable curves connecting the two points. We say that a differentiable curve $\gamma : [a, b] \to M$ is minimizing if it realizes the distance between its endpoints, that is, if
$$\operatorname{length}(\gamma) = d(\gamma(a), \gamma(b)).$$

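If $M$ is connected and complete for the distance $d$ (for instance, if $M$ is compact and connected), then any two points $p, q \in M$ are connected by some minimizing curve. In other words, the infimum in the definition of $d(p, q)$ is realized. This is part of the Hopf–Rinow theorem. Without completeness it may fail, as for two opposite points $p$ and $-p$ of $\mathbb{R}^2 \setminus \{0\}$ with the Euclidean metric.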
A differentiable curve $\gamma : I \to M$ defined on an open interval $I$ is called a geodesic if it is locally minimizing, in the following sense: for every $c \in I$ there exists $\delta > 0$ such that the restriction of $\gamma$ to the interval $[c - \delta, c + \delta]$ is minimizing. Every minimizing curve is a geodesic, but the converse is not true. For example, the great circles are geodesics on the sphere $S^2$, but closed curves cannot be minimizing. An important fact is that if $\gamma$ is a geodesic then the norm $\|\gamma'(t)\|_{\gamma(t)}$ is constant on the domain $I$. The theory of ordinary differential equations may be used to show that for every $p \in M$ and every $v \in T_pM$ there exists a unique geodesic $\gamma_{p,v} : I_{p,v} \to M$ such that $\gamma_{p,v}(0) = p$, $\gamma_{p,v}'(0) = v$, and $I_{p,v}$ is a maximal interval on which $\gamma_{p,v}$ is locally minimizing.
If the manifold $M$ is compact then $I_{p,v} = \mathbb{R}$ for every $p \in M$ and every $v \in T_pM$. Then we define the exponential map at each point $p \in M$:
$$\exp_p : T_pM \to M, \quad v \mapsto \gamma_{p,v}(1).$$
This is a differentiable map, and its derivative at $v = 0$ is the identity transformation on the tangent space $T_pM$. We also define the geodesic flow on the tangent bundle:
$$f^t : TM \to TM, \quad (p, v) \mapsto (\gamma_{p,v}(t), \gamma_{p,v}'(t)).$$
Most of the time, one considers the restriction of the geodesic flow to the unit tangent bundle $T^1M = \{(p, v) \in TM : \|v\|_p = 1\}$. This is well defined since, as we mentioned before, the norm of the velocity vector of any geodesic is constant.
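For the round sphere $S^2 \subset \mathbb{R}^3$, the geodesics are explicit. If $\|p\| = \|v\| = 1$ and $p \cdot v = 0$, the geodesic through $p$ with velocity $v$ is the great circle $\cos(t)\,p + \sin(t)\,v$. For a general tangent vector $v \ne 0$, one has $\exp_p(v) = \gamma_{p, v/\|v\|}(\|v\|)$. The Python sketch below implements these standard formulas for the geodesic flow on $T^1S^2$ and for the exponential map. It is a concrete example chosen for illustration, not a general algorithm. It also checks the flow property $f^s \circ f^t = f^{s+t}$.

```python
import math


def geodesic_flow_sphere(p, v, t):
    """Geodesic flow on the unit tangent bundle of the round sphere S^2 in R^3.
    Assumes |p| = |v| = 1 and p . v = 0; returns (gamma(t), gamma'(t)) for the
    great circle gamma(t) = cos(t) p + sin(t) v."""
    c, s = math.cos(t), math.sin(t)
    point = tuple(c * a + s * b for a, b in zip(p, v))
    velocity = tuple(-s * a + c * b for a, b in zip(p, v))
    return point, velocity


def exp_sphere(p, v):
    """Exponential map exp_p(v) = gamma_{p,v}(1) for a tangent vector v at p in S^2."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return tuple(p)
    unit = tuple(x / norm for x in v)
    point, _ = geodesic_flow_sphere(p, unit, norm)
    return point


north, east = (0.0, 0.0, 1.0), (1.0, 0.0, 0.0)
# Flow property f^0.9 o f^0.4 = f^1.3 on the unit tangent bundle.
print(geodesic_flow_sphere(*geodesic_flow_sphere(north, east, 0.4), 0.9))
print(geodesic_flow_sphere(north, east, 1.3))
```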

A.4.6 Exercises
A.4.1. Check that every set X with the cardinality of R may be endowed with the
structure of a differentiable manifold of class C∞ and dimension d, for any d ≥ 1.
A.4.2. Consider the differentiable manifolds $M = (\mathbb{R}, \mathcal{A})$ and $N = (\mathbb{R}, \mathcal{B})$, where $\mathcal{A}$ is the atlas consisting of the map $\varphi(x) = x$ and $\mathcal{B}$ is the atlas consisting of the map $\psi(x) = x^3$. Is the map $f : M \to N$ defined by $f(x) = x$ a diffeomorphism between these manifolds?
A.4.3. A topological space is path connected if any two points are connected by some
continuous curve. Show that every (connected) manifold is path connected.
A.4.4. For each $d \ge 2$, the projective space of dimension $d$ is the set $P^d$ of all subspaces of $\mathbb{R}^{d+1}$ with dimension 1. Equivalently, $P^d$ is the quotient space of $\mathbb{R}^{d+1} \setminus \{0\}$ for the equivalence relation defined by:
$$(x_0, \dots, x_d) \sim (y_0, \dots, y_d) \iff \text{there exists } c \ne 0 \text{ such that } x_i = c\,y_i \text{ for every } i.$$
Show that the family of maps $\varphi_i : U_i \to \mathbb{R}^d$, $i = 0, \dots, d$, defined by
$$U_i = \{[x_0 : \dots : x_d] \in P^d : x_i \ne 0\}$$
(where $[x_0 : \dots : x_d]$ denotes the equivalence class of $(x_0, \dots, x_d)$) and
$$\varphi_i([x_0 : \dots : x_d]) = \left(\frac{x_0}{x_i}, \dots, \frac{x_{i-1}}{x_i}, \frac{x_{i+1}}{x_i}, \dots, \frac{x_d}{x_i}\right),$$
constitutes an atlas of class $C^\infty$ and dimension $d$ for $P^d$.
A.4.5. Check the claims in Example A.4.4.
A.4.6. Let $M$ and $N$ be two compact (connected) manifolds with the same dimension. A map $f : M \to N$ of class $C^1$ is a local diffeomorphism if the derivative $Df(x) : T_xM \to T_{f(x)}N$ is an isomorphism for every $x \in M$. Show that in that case there exists an integer $k \ge 1$ such that every $y \in N$ has exactly $k$ pre-images:
$$\#f^{-1}(y) = k \quad \text{for every } y \in N.$$
[Observation: The number $k$ is called the degree of $f$ and is denoted $\operatorname{degree}(f)$.]
A.4.7. Consider on $\mathbb{R}^+ = \{x \in \mathbb{R} : x > 0\}$ the Riemannian metric defined by $u \cdot_x v = uv/x^2$. Calculate the distance $d(a, b)$ between any two points $a, b \in \mathbb{R}^+$.
A.4.8. Let $M$ and $N$ be submanifolds of $\mathbb{R}^{m+n}$ with $\dim M = m$ and $\dim N = n$. Show that there exists a set $Z \subset \mathbb{R}^{m+n}$ with volume zero such that, for every $v$ in the complement of $Z$, the translate $M + v$ is transverse to $N$:
$$T_x(M + v) + T_xN = \mathbb{R}^{m+n} \quad \text{for every } x \in (M + v) \cap N.$$

A.4.9. Show that if $M$ is a manifold of class $C^r$ then the canonical projection $\pi : TM \to M$ is a map of class $C^{r-1}$.

A.5 $L^p(\mu)$ spaces


In this appendix we review certain Banach spaces formed by functions with
special integrability properties. Throughout, (X, B, μ) is a measure space.
Recall that a Banach space is a vector space endowed with a norm relative
to which the space is complete. We also state some properties of the norms in
these spaces.

A.5.1 $L^p(\mu)$ spaces with $1 \le p < \infty$


Given any $p \in [1, \infty)$, we say that a function $f : X \to \mathbb{C}$ is $p$-integrable with respect to $\mu$ if the function $|f|^p$ is integrable with respect to $\mu$. For $p = 1$ this is the same as saying that the function $f$ is integrable (Definition A.2.4 and Proposition A.2.7).

Definition A.5.1. We denote by $L^p(\mu)$ the set of all complex functions $p$-integrable with respect to $\mu$, modulo the equivalence relation that identifies any two functions that are equal at $\mu$-almost every point.

Note that if the measure $\mu$ is finite, which is the case in most of our examples, then all bounded measurable functions are in $L^p(\mu)$:
$$\int |f|^p\, d\mu \le (\sup |f|)^p\, \mu(X) < \infty.$$
In particular, if $X$ is a compact topological space then every continuous function is in $L^p(\mu)$. In other words, the space $C^0(X)$ of all continuous functions is contained in $L^p(\mu)$ for every $p \ge 1$.
For every function $f \in L^p(\mu)$, define the $L^p$-norm of $f$ by:
$$\|f\|_p = \left(\int |f|^p\, d\mu\right)^{1/p}.$$
The next theorem asserts that $\|\cdot\|_p$ turns $L^p(\mu)$ into a Banach space:

Theorem A.5.2. The set $L^p(\mu)$ is a complex vector space. Moreover, $\|\cdot\|_p$ is a norm on $L^p(\mu)$ and this norm is complete.

The most interesting part of the proof of this theorem is to establish the triangle inequality, which in this context is known as the Minkowski inequality:

Theorem A.5.3 (Minkowski inequality). Let $f, g \in L^p(\mu)$. Then:
$$\left(\int |f + g|^p\, d\mu\right)^{1/p} \le \left(\int |f|^p\, d\mu\right)^{1/p} + \left(\int |g|^p\, d\mu\right)^{1/p}.$$

In Exercises A.5.2 and A.5.5 we invite the reader to prove the Minkowski
inequality and to complete the proof of Theorem A.5.2.
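A finite measure with finitely many atoms, $\mu = \sum_x w_x \delta_x$, is the simplest setting in which to compute $\|\cdot\|_p$ explicitly. The Python sketch below does this and checks the Minkowski inequality on sample data. The weights, the functions and the helper name are arbitrary illustrations.

```python
def lp_norm(f, weights, p):
    """L^p norm of f for the finite measure mu = sum_x weights[x] * delta_x,
    i.e. (sum_x |f(x)|^p weights[x])^(1/p); f lists the (possibly complex) values at the atoms."""
    return sum(abs(fx) ** p * w for fx, w in zip(f, weights)) ** (1.0 / p)


weights = [0.5, 0.25, 0.25]
f = [1.0, -2.0, 3j]
g = [0.5, 2.0, -1 + 1j]
f_plus_g = [a + b for a, b in zip(f, g)]
for p in (1, 1.5, 2, 3.7):
    # Minkowski: ||f + g||_p <= ||f||_p + ||g||_p
    print(p, lp_norm(f_plus_g, weights, p), lp_norm(f, weights, p) + lp_norm(g, weights, p))
```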

A.5.2 Inner product in $L^2(\mu)$


The case $p = 2$ deserves special attention. The reason is that the norm $\|\cdot\|_2$ introduced in the previous section arises from a (Hermitian) inner product. Indeed, consider:
$$f \cdot g = \int f \bar{g}\, d\mu. \tag{A.5.1}$$
It follows from the properties of the Lebesgue integral that this expression does define an inner product on $L^2(\mu)$. Moreover, this product gives rise to the norm $\|\cdot\|_2$ through:
$$\|f\|_2 = (f \cdot f)^{1/2}.$$
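In particular, we have the Cauchy–Schwarz inequality:

Theorem A.5.4 (Cauchy–Schwarz Inequality). For every $f, g \in L^2(\mu)$ we have that $f\bar{g} \in L^1(\mu)$ and
$$\left|\int f \bar{g}\, d\mu\right| \le \int |f \bar{g}|\, d\mu \le \left(\int |f|^2\, d\mu\right)^{1/2} \left(\int |g|^2\, d\mu\right)^{1/2}.$$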
 

This inequality has the following interesting consequence. Assume that the measure $\mu$ is finite and consider any $f \in L^2(\mu)$. Then, taking $g \equiv 1$,
$$\int |f|\, d\mu = \int |f \bar{g}|\, d\mu \le \left(\int |f|^2\, d\mu\right)^{1/2} \left(\int 1\, d\mu\right)^{1/2} < \infty. \tag{A.5.2}$$
This proves that every function in $L^2(\mu)$ is also in $L^1(\mu)$. In fact, when the measure $\mu$ is finite one has $L^p(\mu) \subset L^q(\mu)$ whenever $p \ge q$ (Exercise A.5.3).
The next result is a generalization of the Cauchy–Schwarz inequality for all values of $p > 1$:
Theorem A.5.5 (Hölder inequality). Given $1 < p < \infty$, consider $q > 1$ defined by the relation $\frac{1}{p} + \frac{1}{q} = 1$. Then, for every $f \in L^p(\mu)$ and every $g \in L^q(\mu)$, we have that $f\bar{g} \in L^1(\mu)$ and
$$\int |f \bar{g}|\, d\mu \le \left(\int |f|^p\, d\mu\right)^{1/p} \left(\int |g|^q\, d\mu\right)^{1/q}.$$
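For a measure with finitely many atoms, both sides of the Hölder inequality are finite sums. The Python sketch below evaluates them on sample data; the case $p = q = 2$ is the Cauchy–Schwarz inequality. The data and helper names are illustrative assumptions, not taken from the text.

```python
def integral(h, weights):
    """Integral of h for the discrete measure mu = sum_x weights[x] * delta_x."""
    return sum(hx * w for hx, w in zip(h, weights))


def holder_sides(f, g, weights, p):
    """Left- and right-hand sides of the Hölder inequality for the discrete measure mu."""
    q = p / (p - 1)  # conjugate exponent, so that 1/p + 1/q = 1
    lhs = integral([abs(fx * complex(gx).conjugate()) for fx, gx in zip(f, g)], weights)
    norm_f = integral([abs(fx) ** p for fx in f], weights) ** (1 / p)
    norm_g = integral([abs(gx) ** q for gx in g], weights) ** (1 / q)
    return lhs, norm_f * norm_g


weights = [0.2, 0.3, 0.5]
f, g = [1.0, 2.0, 3.0], [4.0, -1.0, 0.5]
for p in (1.5, 2.0, 3.0):  # p = 2 is the Cauchy-Schwarz inequality
    print(p, *holder_sides(f, g, weights, p))
```

With $g = f$ and $p = 2$ the two sides coincide, which is the equality case of Cauchy–Schwarz.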

A.5.3 Space of essentially bounded functions


Next, we extend the definition of $L^p(\mu)$ to the case $p = \infty$. For that we need the following notion. We say that a function $f : X \to \mathbb{C}$ is essentially bounded with respect to $\mu$ if there exists some constant $K > 0$ such that $|f(x)| \le K$ at $\mu$-almost every point. Then the infimum of all such constants $K$ is called the essential supremum of $f$ and is denoted by $\operatorname{supess}_\mu(f)$.
Definition A.5.6. We denote by $L^\infty(\mu)$ the set of all complex functions essentially bounded with respect to $\mu$, identifying any two functions that coincide at $\mu$-almost every point.
We endow $L^\infty(\mu)$ with the following norm:
$$\|f\|_\infty = \operatorname{supess}_\mu(f).$$
The conclusion of Theorem A.5.2 remains valid for $p = \infty$ (Exercise A.5.5): the space $L^\infty(\mu)$ is a Banach space for the norm $\|\cdot\|_\infty$. Clearly, if $\mu$ is a finite measure then $L^\infty(\mu) \subset L^p(\mu)$ for any $p \ge 1$.
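For a measure with finitely many atoms, the essential supremum is the largest value of $|f|$ over the atoms of positive weight. Atoms of zero weight form a null set, so they do not count. The Python sketch below computes it. The inequality $\|f\|_p \le \|f\|_\infty\, \mu(X)^{1/p}$ behind the inclusion $L^\infty(\mu) \subset L^p(\mu)$ can then be checked directly. The sample data and function name are illustrative.

```python
def ess_sup(f, weights):
    """Essential supremum of |f| for mu = sum_x weights[x] * delta_x.
    Atoms of zero weight form a mu-null set, so their values are ignored."""
    values = [abs(fx) for fx, w in zip(f, weights) if w > 0]
    return max(values) if values else 0.0


weights = [0.5, 0.0, 0.5]
f = [1.0, 100.0, -2.0]  # the large value sits on a null set
print(ess_sup(f, weights))  # 2.0
```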
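The dual of a complex Banach space $E$ is the space $E^*$ of all continuous linear functionals $\phi : E \to \mathbb{C}$, endowed with the norm
$$\|\phi\| = \sup\left\{\frac{|\phi(v)|}{\|v\|} : v \in E \setminus \{0\}\right\}. \tag{A.5.3}$$
The Hölder inequality (Theorem A.5.5) leads to the following explicit characterization of the dual space of $L^p(\mu)$ for every $p < \infty$:
Theorem A.5.7. For each $p \in [1, \infty)$ consider $q \in (1, \infty]$ defined by the relation $\frac{1}{p} + \frac{1}{q} = 1$. The map $L^q(\mu) \to L^p(\mu)^*$ defined by $g \mapsto \left(f \mapsto \int fg\, d\mu\right)$ is an isomorphism and an isometry between $L^q(\mu)$ and the dual space of $L^p(\mu)$. For $p = 1$ this requires $\mu$ to be $\sigma$-finite, which holds in particular for finite measures.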

This statement is false for $p = \infty$: in general, the dual space of $L^\infty(\mu)$ is not isomorphic to $L^1(\mu)$.

A.5.4 Convexity
We say that a function $\phi : I \to \mathbb{R}$ defined on an interval $I$ of the real line is convex if
$$\phi(tx + (1-t)y) \le t\phi(x) + (1-t)\phi(y)$$
for every $x, y \in I$ and $t \in [0, 1]$. Moreover, we say that $\phi$ is concave if $-\phi$ is convex. For functions that are twice differentiable we have the following practical criterion (Exercise A.5.1): $\phi$ is convex if $\phi''(x) \ge 0$ for every $x \in I$, and it is concave if $\phi''(x) \le 0$ for every $x \in I$.
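Theorem A.5.8 (Jensen inequality). Let $\phi : I \to \mathbb{R}$ be a convex function. If $\mu$ is a probability measure on $X$ and $f \in L^1(\mu)$ takes values in $I$ (so that $\int f\, d\mu \in I$), then:
$$\phi\left(\int f\, d\mu\right) \le \int \phi \circ f\, d\mu.$$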

Example A.5.9. For any probability measure $\mu$ and any integrable positive function $f$, we have
$$\log \int f\, d\mu \ge \int \log f\, d\mu.$$
Indeed, this corresponds to the Jensen inequality for the function $\phi : (0, \infty) \to \mathbb{R}$ given by $\phi(x) = -\log x$. Note that $\phi$ is convex: $\phi''(x) = 1/x^2 > 0$ for every $x$.
Example A.5.10. Let $\phi : \mathbb{R} \to \mathbb{R}$ be a convex function, $(\lambda_i)_i$ be a sequence of non-negative real numbers satisfying $\sum_{i=1}^\infty \lambda_i = 1$, and $(a_i)_i$ be a bounded sequence of real numbers. Then
$$\phi\left(\sum_{i=1}^\infty \lambda_i a_i\right) \le \sum_{i=1}^\infty \lambda_i \phi(a_i). \tag{A.5.4}$$
This may be seen as follows. Consider $X = [0, 1]$ endowed with the Lebesgue measure $\mu$. Let $f : [0, 1] \to \mathbb{R}$ be a function of the form $f = \sum_{i=1}^\infty a_i \mathcal{X}_{E_i}$, where the $E_i$ are pairwise disjoint measurable sets such that $\mu(E_i) = \lambda_i$. The Jensen inequality applied to this function $f$ gives precisely the relation (A.5.4).
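In the special case of finitely many weights summing to 1, relation (A.5.4) says that the "Jensen gap" $\sum_i \lambda_i \phi(a_i) - \phi(\sum_i \lambda_i a_i)$ is non-negative for convex $\phi$. The Python sketch below computes this gap. For $\phi(x) = x^2$ it is the variance of the values. For $\phi = -\log$ it recovers Example A.5.9. The weights, values and function name are illustrative assumptions.

```python
import math


def jensen_gap(phi, weights, values):
    """Return sum_i w_i phi(a_i) - phi(sum_i w_i a_i) for finitely many weights summing to 1.
    For a convex phi this is non-negative by (A.5.4)."""
    if abs(sum(weights) - 1.0) > 1e-12:
        raise ValueError("weights must sum to 1")
    mean = sum(w * a for w, a in zip(weights, values))
    return sum(w * phi(a) for w, a in zip(weights, values)) - phi(mean)


weights = [0.2, 0.3, 0.5]
print(jensen_gap(lambda x: x * x, weights, [1.0, 4.0, -2.0]))  # the variance, about 6.84
print(jensen_gap(lambda x: -math.log(x), weights, [1.0, 4.0, 0.5]))  # non-negative
```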

A.5.5 Exercises
A.5.1. Consider any function $\phi : (a, b) \to \mathbb{R}$. Show that if $\phi$ is twice differentiable and $\phi'' \ge 0$ then $\phi$ is convex. Show that if $\phi$ is convex then it is continuous.
A.5.2. Consider $p, q > 1$ such that $1/p + 1/q = 1$. Prove:
(a) The Young inequality: $ab \le a^p/p + b^q/q$ for every $a, b > 0$.
(b) The Hölder inequality (Theorem A.5.5).
(c) The Minkowski inequality (Theorem A.5.3).

A.5.3. Show that if $\mu$ is a finite measure then we have $L^q(\mu) \subset L^p(\mu)$ for every $1 \le p < q \le \infty$.
A.5.4. Let $\mu$ be a finite measure and $f \in L^\infty(\mu)$ be different from zero. Show that
$$\|f\|_\infty = \lim_n \frac{\int |f|^{n+1}\, d\mu}{\int |f|^n\, d\mu}.$$
A.5.5. Show that a normed vector space $(V, \|\cdot\|)$ is complete if and only if every series $\sum_k v_k$ that is absolutely summable (meaning that $\sum_k \|v_k\|$ converges) is convergent. Use this fact to show that if $\mu$ is a probability measure then $\|\cdot\|_p$ is a complete norm on $L^p(\mu)$ for every $1 \le p \le \infty$.
A.5.6. Show that if $\mu$ is a finite measure and $1/p + 1/q = 1$ with $1 \le p < \infty$ then the map $\Phi : L^q(\mu) \to L^p(\mu)^*$, $\Phi(g)f = \int fg\, d\mu$ is an isomorphism and an isometry.
A.5.7. Show that if $X$ is a metric space then, given any Borel probability measure $\mu$, the set of all bounded continuous functions is dense in $L^p(\mu)$ for every $1 \le p < \infty$. Indeed, the same holds for the subset of all uniformly continuous bounded functions.
A.5.8. Let $f, g : X \to \mathbb{R}$ be two positive measurable functions such that $f(x)g(x) \ge 1$ for every $x$. Show that $\int f\, d\mu \int g\, d\mu \ge 1$ for every probability measure $\mu$.

A.6 Hilbert spaces


Let $H$ be a vector space, real or complex. A (Hermitian) inner product on $H$ is a map $(u, v) \mapsto u \cdot v$ from $H \times H$ to the scalar field ($\mathbb{R}$ or $\mathbb{C}$, respectively) satisfying, for any $u, v, w \in H$ and any scalar $\lambda$:

1. $(u + w) \cdot v = u \cdot v + w \cdot v$ and $u \cdot (v + w) = u \cdot v + u \cdot w$;
2. $(\lambda u) \cdot v = \lambda(u \cdot v)$ and $u \cdot (\lambda v) = \bar{\lambda}(u \cdot v)$;
3. $u \cdot v = \overline{v \cdot u}$;
4. $u \cdot u \ge 0$, and $u \cdot u = 0$ if and only if $u = 0$.

Then we can define the norm of a vector $u \in H$ to be $\|u\| = (u \cdot u)^{1/2}$.


A Hilbert space is a vector space endowed with an inner product whose norm $\|\cdot\|$ is complete: relative to $\|\cdot\|$ every Cauchy sequence is convergent. Thus, in particular, $(H, \|\cdot\|)$ is a Banach space. A standard example of a Hilbert space is the space $L^2(\mu)$ of square-integrable functions that we introduced in Appendix A.5.2.

Given $v \in H$ and any family $(v_\alpha)_\alpha$ of vectors of $H$, we say that $v = \sum_\alpha v_\alpha$ if for every $\varepsilon > 0$ there exists a finite set $I$ such that
$$\Big\| v - \sum_{\beta \in J} v_\beta \Big\| \le \varepsilon \quad \text{for every finite set } J \supset I.$$
Given any family $(H_\alpha)_\alpha$ of subspaces of $H$, the set of all vectors of the form $v = \sum_\alpha v_\alpha$ with $v_\alpha \in H_\alpha$ for every $\alpha$ is a subspace of $H$ (see Exercise A.6.2). It is called the sum of the family $(H_\alpha)_\alpha$ and it is denoted by $\sum_\alpha H_\alpha$.

A.6.1 Orthogonality
Let H be a Hilbert space. Two vectors u, v ∈ H are said to be orthogonal if
u · v = 0. We call a subset of H orthonormal if its elements have norm 1 and
are pairwise orthogonal.
A Hilbert basis of $H$ is an orthonormal subset $B = \{v_\beta\}$ such that the set of all (finite) linear combinations of elements of $B$ is dense in $H$. For example, the Fourier basis
$$\{x \mapsto e^{2\pi i k x} : k \in \mathbb{Z}\} \tag{A.6.1}$$
is a Hilbert basis of the space $L^2(m)$ of all measurable functions on the unit circle whose square is integrable with respect to the Lebesgue measure.
A Hilbert basis $B = \{v_\beta\}$ is usually not a basis of the vector space in the usual sense (Hamel basis): it is usually not true that every vector of $H$ is a finite linear combination of the elements of $B$. However, every $v \in H$ may be written as an infinite linear combination of the elements of the Hilbert basis:
$$v = \sum_\beta (v \cdot v_\beta)\, v_\beta \quad \text{and, moreover,} \quad \|v\|^2 = \sum_\beta |v \cdot v_\beta|^2.$$
In particular, $v \cdot v_\beta = 0$ except, possibly, for a countable subset of values of $\beta$.
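For the Fourier basis (A.6.1), with the circle identified with $[0, 1)$, the coefficients $v \cdot v_k = \int_0^1 v(x) e^{-2\pi i k x}\, dx$ can be computed exactly for trigonometric polynomials by an equally spaced Riemann sum, as long as all frequencies involved are below $n/2$ in absolute value. The Python sketch below uses this to check the identity $\|v\|^2 = \sum_k |v \cdot v_k|^2$ on a sample polynomial. The function, the number of sample points and the helper name are illustrative choices.

```python
import cmath
import math


def fourier_coefficient(f, k, n=64):
    """Approximate f . e_k = integral_0^1 f(x) exp(-2 pi i k x) dx by a Riemann sum on n
    equally spaced points; exact for trigonometric polynomials when all frequencies
    involved (including k) are smaller than n/2 in absolute value."""
    return sum(f(j / n) * cmath.exp(-2j * math.pi * k * j / n) for j in range(n)) / n


def f(x):
    # A trigonometric polynomial: 3 + 2 cos(2 pi x) + sin(4 pi x).
    return 3 + 2 * math.cos(2 * math.pi * x) + math.sin(4 * math.pi * x)


coefficients = {k: fourier_coefficient(f, k) for k in range(-5, 6)}
# ||f||^2 = sum_k |f . e_k|^2; here only k in {-2, ..., 2} contribute.
print(sum(abs(c) ** 2 for c in coefficients.values()))  # approximately 11.5
```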


Every orthonormal subset of $H$ may be extended to a Hilbert basis. In particular, Hilbert bases always exist. Moreover, any two Hilbert bases have the same cardinality, which is called the Hilbert dimension of $H$. The Hilbert dimension depends monotonically on the space: if $H_1$ is a subspace of $H_2$ then $\dim H_1 \le \dim H_2$. We say that two Hilbert spaces are isometrically isomorphic if there exists some isomorphism between the two that also preserves the inner product. A necessary and sufficient condition is that the two spaces have the same Hilbert dimension.
A Hilbert space is said to be separable if it admits some countable subset
that is dense for the topology defined by the norm. This happens if and only if
the Hilbert dimension is either finite or countable. In particular, all separable
Hilbert spaces with infinite Hilbert dimension are isometrically isomorphic.
For this reason, one often finds in the literature (especially in the area of
mathematical physics) mentions of the Hilbert space, as if there were only one.
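Given any family $(H_\alpha)_\alpha$ of Hilbert spaces, we denote by $\bigoplus_\alpha H_\alpha$ their orthogonal direct sum, that is, the vector space of all $(v_\alpha)_\alpha \in \prod_\alpha H_\alpha$ such that $\sum_\alpha \|v_\alpha\|_\alpha^2 < \infty$ (this implies that $v_\alpha = 0$ except, possibly, for a countable set of values of $\alpha$), endowed with the inner product
$$(v_\alpha)_\alpha \cdot (w_\alpha)_\alpha = \sum_\alpha v_\alpha \cdot w_\alpha,$$
where each term is computed with the inner product of $H_\alpha$.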

The orthogonal complement of a subset $S$ of a Hilbert space $H$ is the set $S^\perp$ of all the vectors of $H$ that are orthogonal to every vector of $S$. It is easy to see that $S^\perp$ is a closed subspace of $H$ (Exercise A.6.7). If $S$ itself is a closed subspace of $H$ then $S = (S^\perp)^\perp$, and every vector $v \in H$ may be decomposed as a sum $v = s + s^\perp$ of some $s \in S$ and some $s^\perp \in S^\perp$. Moreover, this decomposition is unique, and the vectors $s$ and $s^\perp$ are the elements of $S$ and $S^\perp$, respectively, that are closest to $v$.
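In the finite-dimensional Hilbert space $\mathbb{C}^n$, with the inner product linear in the first argument and conjugate-linear in the second (as above), the decomposition $v = s + s^\perp$ can be computed by orthonormalizing a spanning set of $S$ with Gram–Schmidt and projecting. The Python sketch below does this. The choice of Gram–Schmidt, the tolerance for discarding dependent vectors and the helper names are our own assumptions.

```python
def inner(u, v):
    """Hermitian inner product on C^n: linear in u, conjugate-linear in v."""
    return sum(a * complex(b).conjugate() for a, b in zip(u, v))


def orthonormalize(vectors, tol=1e-12):
    """Gram-Schmidt: an orthonormal basis of the span of the given vectors."""
    basis = []
    for w in vectors:
        w = list(w)
        for e in basis:
            c = inner(w, e)
            w = [wi - c * ei for wi, ei in zip(w, e)]
        norm = abs(inner(w, w)) ** 0.5
        if norm > tol:  # skip vectors that are (numerically) dependent on earlier ones
            basis.append([wi / norm for wi in w])
    return basis


def decompose(v, spanning_set):
    """Split v = s + s_perp with s in S = span(spanning_set) and s_perp in S^perp."""
    s = [0j] * len(v)
    for e in orthonormalize(spanning_set):
        c = inner(v, e)
        s = [si + c * ei for si, ei in zip(s, e)]
    s_perp = [vi - si for vi, si in zip(v, s)]
    return s, s_perp


s, s_perp = decompose([1j, 1], [[1, 1]])
print(s, s_perp, inner(s_perp, [1, 1]))  # the last value is (numerically) zero
```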

A.6.2 Duality
A linear functional on a Hilbert space $H$ (or, more generally, on a Banach space) is a linear map from $H$ to the scalar field ($\mathbb{R}$ or $\mathbb{C}$). It is said to be bounded if
$$\|\phi\| = \sup\left\{\frac{|\phi(v)|}{\|v\|} : v \ne 0\right\} < \infty.$$
This is equivalent to saying that the linear functional is continuous, relative to the topology defined by the norm of $H$ (see Exercise A.6.3). The dual space of a Hilbert space $H$ is the vector space $H^*$ formed by all the bounded linear functionals. The function $\phi \mapsto \|\phi\|$ is a complete norm on $H^*$ and, hence, it endows the dual with the structure of a Banach space. The map
$$h : H \to H^*, \quad w \mapsto (v \mapsto v \cdot w) \tag{A.6.2}$$
is a bijection between the two spaces and it preserves the norms. In particular, $h$ is a homeomorphism. Moreover, it satisfies $h(w_1 + w_2) = h(w_1) + h(w_2)$ and $h(\lambda w) = \bar{\lambda} h(w)$.
The weak topology on $H$ is the smallest topology relative to which all the linear functionals $v \mapsto v \cdot w$ are continuous. In terms of sequences, it can be characterized as follows:
$$(w_n)_n \to w \text{ weakly} \iff (v \cdot w_n)_n \to v \cdot w \text{ for every } v \in H.$$
The weak* topology on the dual space $H^*$ is the smallest topology relative to which $\phi \mapsto \phi(v)$ is continuous for every $v \in H$.
It is known from the theory of Banach spaces (theorem of Banach–Alaoglu) that every bounded subset of the dual space that is closed for the weak* topology is compact for the weak* topology. In the special case of Hilbert spaces, the weak topology on the space $H$ is homeomorphic to the weak* topology on the dual space $H^*$: the map $h$ in (A.6.2) is also a homeomorphism for these topologies. Since $h$ preserves the class of bounded sets, it follows that the weak topology on the space $H$ itself enjoys the property in the theorem of Banach–Alaoglu:

Theorem A.6.1 (Banach–Alaoglu). Every bounded subset of a Hilbert space $H$ that is closed for the weak topology is compact for the weak topology on $H$. In particular, every closed ball of $H$ is weakly compact.

A linear operator $L : H_1 \to H_2$ between two Hilbert spaces is continuous (or bounded) if
$$\|L\| = \sup\left\{\frac{\|L(v)\|}{\|v\|} : v \ne 0\right\}$$
is finite. The adjoint of a continuous linear operator is the linear operator $L^* : H_2 \to H_1$ defined by
$$v \cdot Lw = L^*v \cdot w \quad \text{for every } v \in H_2 \text{ and } w \in H_1.$$
The adjoint operator is continuous, with $\|L^*\| = \|L\|$ and $\|L^*L\| = \|LL^*\| = \|L\|^2$. Moreover, $(L^*)^* = L$ and $(L_1 + L_2)^* = L_1^* + L_2^*$ and $(\lambda L)^* = \bar{\lambda} L^*$ (in Exercise A.6.5 we invite the reader to prove these facts).
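When $H_1 = \mathbb{C}^n$ and $H_2 = \mathbb{C}^m$ carry the standard inner product $u \cdot v = \sum_i u_i \bar{v}_i$, a continuous linear operator is an $m \times n$ matrix and its adjoint is the conjugate transpose. This is a standard finite-dimensional fact, used here only as an illustration. The Python sketch below checks the defining identity $v \cdot Lw = L^*v \cdot w$ and the relation $(L^*)^* = L$ on a sample matrix. The matrix, vectors and helper names are our own choices.

```python
def inner(u, v):
    """Standard Hermitian inner product on C^n: linear in u, conjugate-linear in v."""
    return sum(a * complex(b).conjugate() for a, b in zip(u, v))


def apply_operator(matrix, w):
    """Matrix-vector product: the operator L : C^n -> C^m applied to w."""
    return [sum(row[j] * w[j] for j in range(len(w))) for row in matrix]


def adjoint(matrix):
    """Conjugate transpose, which is the adjoint of L for the standard inner products."""
    rows, cols = len(matrix), len(matrix[0])
    return [[complex(matrix[i][j]).conjugate() for i in range(rows)] for j in range(cols)]


L = [[1, 2j, 0], [3, -1, 1 + 1j]]  # an operator from C^3 (= H_1) to C^2 (= H_2)
v = [1 - 1j, 2]                    # a vector in H_2
w = [0.5, 1j, -2]                  # a vector in H_1
print(inner(v, apply_operator(L, w)), inner(apply_operator(adjoint(L), v), w))  # equal up to rounding
```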
A continuous linear operator $L : H \to H$ is self-adjoint if $L = L^*$. More generally, $L$ is normal if it satisfies $L^*L = LL^*$. We are especially interested in the case when $L$ is unitary, that is, $L^*L = \mathrm{id} = LL^*$. We call a linear operator $L : H \to H$ a linear isometry if $L^*L = \mathrm{id}$. Hence, the unitary operators are the linear isometries that are also normal operators.

A.6.3 Exercises
A.6.1. Let $H$ be a Hilbert space. Prove:
(a) That every ball (either open or closed) is a convex subset of $H$.
(b) The parallelogram identity: $\|v + w\|^2 + \|v - w\|^2 = 2\|v\|^2 + 2\|w\|^2$ for any $v, w \in H$.
(c) The polarization identity: $4(v \cdot w) = \|v + w\|^2 - \|v - w\|^2$ (real case) or $4(v \cdot w) = (\|v + w\|^2 - \|v - w\|^2) + i(\|v + iw\|^2 - \|v - iw\|^2)$ (complex case).
A.6.2. Show that, given any family $(H_\alpha)_\alpha$ of subspaces of a Hilbert space $H$, the set of all the vectors of the form $v = \sum_\alpha v_\alpha$ with $v_\alpha \in H_\alpha$ for every $\alpha$ is a vector subspace of $H$.
A.6.3. Show that a linear operator $L : E_1 \to E_2$ between two Banach spaces is continuous if and only if there exists $C > 0$ such that $\|L(v)\|_2 \le C\|v\|_1$ for every $v \in E_1$, where $\|\cdot\|_i$ denotes the norm in the space $E_i$ (we say that $L$ is a bounded operator).
A.6.4. Consider the Hilbert space $L^2(\mu)$. Let $V$ be the subspace formed by the constant functions. What is the orthogonal complement of $V$? Determine the (orthogonal) projection to $V$ of an arbitrary function $g \in L^2(\mu)$.
A.6.5. Prove that if $L : H \to H$ is a bounded operator on a Hilbert space $H$ then the adjoint operator $L^*$ is also bounded and $\|L^*\| = \|L\|$ and $\|L^*L\| = \|LL^*\| = \|L\|^2$ and $(L^*)^* = L$.
A.6.6. Show that if $K$ is a non-empty closed convex subset of a Hilbert space $H$ then for every $z \in H$ there exists a unique $v \in K$ such that $\|z - v\| = d(z, K)$.
A.6.7. Let $S$ be a subspace of a Hilbert space $H$. Prove that:
(a) The orthogonal complement $S^\perp$ of $S$ is a closed subspace of $H$ and it coincides with the orthogonal complement of the closure $\bar{S}$. Moreover, $(S^\perp)^\perp = \bar{S}$.
(b) Every $v \in H$ may be written, in a unique fashion, as a sum $v = s + s^\perp$ of some $s \in \bar{S}$ and some $s^\perp \in S^\perp$. The two vectors $s$ and $s^\perp$ are the elements of $\bar{S}$ and $S^\perp$ that are closest to $v$.

A.6.8. Let E be a closed subspace of a Hilbert space H. Show that E is also closed in the
weak topology. Moreover, U(E) is a closed subspace of H, for every isometry
U : H → H.
A.6.9. Show that a linear operator $L : H \to H$ on a Hilbert space $H$ is an isometry if and only if $\|L(v)\| = \|v\|$ for every $v \in H$. Moreover, $L$ is a unitary operator if and only if $L$ is an isometry and is invertible.

A.7 Spectral theorems


Let $H$ be a complex Hilbert space. The spectrum of a continuous linear operator $L : H \to H$ is the set $\operatorname{spec}(L)$ of all numbers $\lambda \in \mathbb{C}$ such that $L - \lambda\,\mathrm{id}$ is not an isomorphism. The spectrum is closed and it is contained in the closed disk of radius $\|L\|$ around $0 \in \mathbb{C}$. In particular, $\operatorname{spec}(L)$ is a compact subset of the complex plane. When $H$ has finite dimension, $\operatorname{spec}(L)$ consists of the eigenvalues of $L$, that is, the complex numbers $\lambda$ such that $L - \lambda\,\mathrm{id}$ is not injective. In general, the spectrum may be strictly larger than the set of eigenvalues (see Exercise A.7.2).

A.7.1 Spectral measures


By definition, a projection in $H$ is a continuous linear operator $P : H \to H$ that is idempotent ($P^2 = P$) and self-adjoint ($P^* = P$). Then the image and the kernel of $P$ are closed subspaces of $H$ and they are orthogonal complements to each other. In fact, the image coincides with the set of all fixed points of $P$.
Consider any map $E$ associating with each measurable subset of the plane $\mathbb{C}$ a projection in $H$. Such a map is called a spectral measure if it satisfies $E(\mathbb{C}) = \mathrm{id}$ and
$$E\Big(\bigcup_{n \in \mathbb{N}} B_n\Big) = \sum_{n \in \mathbb{N}} E(B_n)$$
whenever the $B_n$ are pairwise disjoint ($\sigma$-additivity). Then, given any $v, w \in H$, the function
$$E_{v \cdot w} : B \mapsto E(B)v \cdot w \tag{A.7.1}$$
is a complex measure on $\mathbb{C}$. Clearly, it depends on the pair $(v, w)$ in a sesquilinear fashion (linear in $v$ and conjugate-linear in $w$).
We call the support of a spectral measure $E$ the set $\operatorname{supp} E$ of all the points $z \in \mathbb{C}$ such that $E(V) \ne 0$ for every neighborhood $V$ of $z$. Note that the support is always a closed set. Moreover, the support of the complex measure $E_{v \cdot w}$ is contained in $\operatorname{supp} E$ for every $v, w \in H$.

Example A.7.1. Consider $\{\lambda_1, \dots, \lambda_s\} \subset \mathbb{C}$ and let $V_1, \dots, V_s$ be a finite family of subspaces of $\mathbb{C}^d$, pairwise orthogonal and such that $\mathbb{C}^d = V_1 \oplus \dots \oplus V_s$. For each set $J \subset \{1, \dots, s\}$, denote by $P_J$ the projection in $\mathbb{C}^d$ whose image is $\bigoplus_{j \in J} V_j$. For each measurable set $B \subset \mathbb{C}$ define
$$E(B) : \mathbb{C}^d \to \mathbb{C}^d, \quad E(B) = P_{J(B)},$$
where $J(B)$ is the set of all $j \in \{1, \dots, s\}$ such that $\lambda_j \in B$. The function $E$ is a spectral measure.
Example A.7.2. Let $\mu$ be a probability measure on $\mathbb{C}$ and $H = L^2(\mu)$ be the space of all complex functions whose square is integrable with respect to $\mu$. For each measurable set $B \subset \mathbb{C}$, let
$$E(B) : L^2(\mu) \to L^2(\mu), \quad \varphi \mapsto \mathcal{X}_B \varphi.$$
Each $E(B)$ is a projection and the function $E$ is a spectral measure.
The next lemma collects a few simple properties of the spectral measures:
Lemma A.7.3. Let E be a spectral measure and A, B be measurable subsets of
C. Then:

1. E(∅) = 0 and E(supp E) = id ;


2. if A ⊂ B then E(A) ≤ E(B) and E(B \ A) = E(B) − E(A);
3. E(A ∪ B) + E(A ∩ B) = E(A) + E(B);
4. E(A)E(B) = E(A ∩ B) = E(B)E(A).
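For a probability measure $\mu$ with finitely many atoms $z_1, \dots, z_n$, the space $L^2(\mu)$ of Example A.7.2 can be identified with the vectors of values $(\varphi(z_1), \dots, \varphi(z_n))$. In that picture, $E(B)$ becomes multiplication by the indicator of $B$, and the weights of $\mu$ play no role in $E(B)$ itself. The Python sketch below lets one check items 3 and 4 of Lemma A.7.3 in this special case. Sets $B$ are given as finite Python sets of atoms, and the helper name is hypothetical.

```python
def spectral_projection(atoms, borel_set):
    """E(B) from Example A.7.2 for a measure supported on finitely many atoms:
    multiplication by the indicator function of B. A function phi in L^2(mu) is
    represented by its list of values at the atoms; B is given by the atoms it contains."""
    mask = [1 if z in borel_set else 0 for z in atoms]
    return lambda phi: [m * value for m, value in zip(mask, phi)]


atoms = [0j, 1 + 0j, 1j, -1 + 0j]
phi = [2, -1j, 3.5, 1 + 1j]
A, B = {0j, 1 + 0j, 1j}, {1j, -1 + 0j}
E_A, E_B = spectral_projection(atoms, A), spectral_projection(atoms, B)
# Item 4 of Lemma A.7.3: E(A)E(B) = E(A ∩ B).
print(E_A(E_B(phi)), spectral_projection(atoms, A & B)(phi))
```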

In what follows we always assume that $E$ is a spectral measure with compact support. Then the support of every complex measure $E_{v \cdot w}$ is also compact. Consequently, the integral $\int z\, d(E(z)v \cdot w)$ is well defined, and it is a sesquilinear function of $(v, w)$. Hence, there exists a bounded linear operator $L : H \to H$ such that
$$Lv \cdot w = \int z\, d(E(z)v \cdot w) \quad \text{for every } v, w \in H. \tag{A.7.2}$$
We write, in shorter form:
$$L = \int z\, dE(z). \tag{A.7.3}$$
More generally, given any bounded measurable function $\psi$ defined on the support of the spectral measure $E$, there exists a bounded linear operator $\psi(L) : H \to H$ that is characterized by
$$\psi(L)v \cdot w = \int \psi(z)\, d(E(z)v \cdot w) \quad \text{for every } v, w \in H. \tag{A.7.4}$$
We write
$$\psi(L) = \int \psi(z)\, dE(z). \tag{A.7.5}$$

Lemma A.7.4. Let E be a spectral measure with compact support. Given


bounded measurable functions ϕ, ψ and numbers α, β ∈ C,

(1) (αϕ + βψ)(z)dE(z) = α ϕ dE(z) + β ψ dE(z);



(2) ϕ̄(z) dE(z) = ϕ(z) dE(z) ;
A.7 Spectral theorems 479
   
(3) $\int(\varphi\psi)(z)\,dE(z)=\big(\int\varphi(z)\,dE(z)\big)\circ\big(\int\psi(z)\,dE(z)\big)$.
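In finite dimension, with a spectral measure as in Example A.7.1, every integral $\int\psi(z)\,dE(z)$ is the finite sum $\sum_j\psi(\lambda_j)P_{\{j\}}$. The following Python sketch (hypothetical eigenvalues and eigenlines, plain nested lists rather than a linear-algebra library) uses this to check parts (2) and (3) of Lemma A.7.4 and the identity $LL^*=L^*L$ for $L=\int z\,dE(z)$.

```python
from typing import Callable, List

Matrix = List[List[complex]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def adjoint(a: Matrix) -> Matrix:
    """Conjugate transpose."""
    n = len(a)
    return [[complex(a[j][i]).conjugate() for j in range(n)] for i in range(n)]

def close(a: Matrix, b: Matrix, tol: float = 1e-12) -> bool:
    """Entrywise comparison up to floating-point tolerance."""
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def line_projection(v: List[complex]) -> Matrix:
    """Orthogonal projection onto the line spanned by v."""
    v = [complex(x) for x in v]
    norm2 = sum(abs(x) ** 2 for x in v)
    return [[vi * vj.conjugate() / norm2 for vj in v] for vi in v]

# Hypothetical spectral data: (1, i) and (1, -i) span orthogonal lines of C^2.
LAMBDAS = [2 + 1j, -1j]
PROJECTIONS = [line_projection([1, 1j]), line_projection([1, -1j])]

def integrate(psi: Callable[[complex], complex]) -> Matrix:
    """int psi(z) dE(z) = sum_j psi(lambda_j) P_j for this finitely supported E."""
    total: Matrix = [[0j, 0j], [0j, 0j]]
    for lam, p in zip(LAMBDAS, PROJECTIONS):
        c = psi(lam)
        total = [[total[i][j] + c * p[i][j] for j in range(2)] for i in range(2)]
    return total

L = integrate(lambda z: z)  # L = int z dE(z), as in (A.7.3)
```

Changing `LAMBDAS` to any other pair of complex numbers leaves these identities intact, since they only use $P_iP_j=\delta_{ij}P_j$ and $P_1+P_2=\mathrm{id}$.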
In particular, by part (3) of this lemma,
$$L^j=\Big(\int z\,dE(z)\Big)^j=\int z^j\,dE(z)\quad\text{for every }j\in\mathbb{N}.\tag{A.7.6}$$
Analogously, using also part (2) of the lemma,
$$LL^*=\int z\,dE(z)\int\bar z\,dE(z)=\int|z|^2\,dE(z)=\int\bar z\,dE(z)\int z\,dE(z)=L^*L.\tag{A.7.7}$$
Consequently, the linear operator defined by (A.7.3) is normal. Conversely, the spectral theorem asserts that every normal operator may be written in this way:
Theorem A.7.5 (Spectral). For every normal operator $L:H\to H$ there exists a spectral measure $E$ such that $L=\int z\,dE(z)$. This measure is unique and its support coincides with the spectrum of $L$. In particular, $L$ is unitary if and only if $\operatorname{supp}E$ is contained in the unit circle $\{z\in\mathbb{C}:|z|=1\}$.
Example A.7.6 (Spectral theorem in finite dimension). Let $H$ be a complex Hilbert space with finite dimension. Then for every normal operator $L:H\to H$ there exists a basis of $H$ formed by eigenvectors of $L$. Let $\lambda_1,\dots,\lambda_s$ be the eigenvalues of $L$. The eigenspaces $V_j=\ker(L-\lambda_j\,\mathrm{id})$ are pairwise orthogonal, because $L$ is normal. Moreover, by Theorem A.7.5, the direct sum $\bigoplus_{j=1}^sV_j$ is the whole of $H$. So
$$L=\sum_{j=1}^s\lambda_j\pi_j,$$
where $\pi_j$ denotes the orthogonal projection to $V_j$. In other words, the spectral measure $E$ of the operator $L$ is given by $E(\{\lambda_j\})=\pi_j$ for every $j=1,\dots,s$ and $E(B)=0$ if $B$ contains no eigenvalue of $L$.
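As a concrete instance of Example A.7.6, consider the rotation by a right angle, $L(x,y)=(-y,x)$ on $\mathbb{C}^2$, which is unitary and hence normal. Its eigenvalues $\pm i$ and eigenvectors $(1,\mp i)$ were computed by hand for this illustration, not by the code; the Python sketch below only verifies the eigen-equations, that the two eigenspaces are orthogonal, and that $L=\sum_j\lambda_j\pi_j$.

```python
from typing import List

Matrix = List[List[complex]]
Vector = List[complex]

def apply(a: Matrix, v: Vector) -> Vector:
    """Matrix-vector product."""
    return [sum(a[i][k] * v[k] for k in range(len(v))) for i in range(len(a))]

def inner(u: Vector, v: Vector) -> complex:
    """Standard inner product of C^n, conjugate-linear in the second slot."""
    return sum(x * complex(y).conjugate() for x, y in zip(u, v))

def line_projection(v: Vector) -> Matrix:
    """Orthogonal projection onto the line spanned by v."""
    norm2 = inner(v, v).real
    return [[vi * complex(vj).conjugate() / norm2 for vj in v] for vi in v]

def close(a: Matrix, b: Matrix, tol: float = 1e-12) -> bool:
    """Entrywise comparison up to floating-point tolerance."""
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

# Rotation by a right angle: (x, y) -> (-y, x).
L: Matrix = [[0j, -1 + 0j], [1 + 0j, 0j]]
# Eigenpairs (lambda_j, spanning vector of V_j), computed by hand.
EIGENPAIRS = [(1j, [1 + 0j, -1j]), (-1j, [1 + 0j, 1j])]

def spectral_sum() -> Matrix:
    """sum_j lambda_j pi_j, with pi_j the orthogonal projection onto V_j."""
    total: Matrix = [[0j, 0j], [0j, 0j]]
    for lam, v in EIGENPAIRS:
        p = line_projection(v)
        total = [[total[i][j] + lam * p[i][j] for j in range(2)] for i in range(2)]
    return total
```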
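Example A.7.7. Let $(\sigma_\alpha)_{\alpha\in A}$ be any family of finite measures in the unit circle $\{z\in\mathbb{C}:|z|=1\}$. Consider $H=\bigoplus_{\alpha\in A}L^2(\sigma_\alpha)$ and the linear operator
$$L:H\to H,\qquad(\varphi_\alpha)_\alpha\mapsto\big(z\mapsto z\varphi_\alpha(z)\big)_\alpha.$$
Consider the spectral measure $E$ given by
$$E(B):H\to H,\qquad(\varphi_\alpha)_\alpha\mapsto(\mathcal{X}_B\varphi_\alpha)_\alpha$$
(compare with Example A.7.2). Then, $L=\int z\,dE(z)$. Indeed, the definition of $E$ gives that $E(\cdot)\varphi\cdot\psi=\sum_\alpha\varphi_\alpha\bar\psi_\alpha\,\sigma_\alpha$ for every $\varphi=(\varphi_\alpha)_\alpha$ and $\psi=(\psi_\alpha)_\alpha$ in the space $H$. Then,
$$L\varphi\cdot\psi=\sum_\alpha\int z\varphi_\alpha(z)\bar\psi_\alpha(z)\,d\sigma_\alpha(z)=\int z\,d(E(z)\varphi\cdot\psi)\tag{A.7.8}$$
for every $\varphi,\psi$.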

We say that $\lambda\in\mathbb{C}$ is an atom of the spectral measure if $E(\{\lambda\})\ne0$ or, equivalently, if there exists some non-zero vector $\omega\in H$ such that $E(\{\lambda\})\omega\ne0$.
The proof of the next proposition is outlined in Exercise A.7.4.
Proposition A.7.8. Every eigenvalue of L is an atom of the spectral measure
E. Conversely, if λ is an atom of E then λ is an eigenvalue of the operator L
and every non-zero vector of the form v = E({λ})ω is an eigenvector.
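Proposition A.7.8 can be seen at work in Example A.7.7 when there is a single finite measure $\sigma$ with finitely many atoms on the unit circle. In that case (a hypothetical illustration, with atoms at the cube roots of unity and arbitrary positive weights) the atoms of $E$ are exactly the atoms of $\sigma$, and the Python sketch checks that the indicator $\mathcal{X}_{\{\lambda\}}$ of each atom is an eigenvector of $L$ with eigenvalue $\lambda$. It also checks that $L$ preserves inner products, as expected from Theorem A.7.5 since $\operatorname{supp}E$ lies in the unit circle.

```python
import cmath
from typing import Dict

Function = Dict[complex, complex]  # values at the atoms of SIGMA

# Hypothetical finite measure on the unit circle: atoms at the cube roots of unity.
SIGMA: Dict[complex, float] = {
    cmath.exp(2j * cmath.pi * k / 3): w for k, w in enumerate([1.0, 0.5, 2.0])
}

def inner(phi: Function, psi: Function) -> complex:
    """Inner product of L^2(sigma)."""
    return sum(phi[z] * complex(psi[z]).conjugate() * w for z, w in SIGMA.items())

def L(phi: Function) -> Function:
    """The operator of Example A.7.7: multiplication by the coordinate z."""
    return {z: z * phi[z] for z in SIGMA}

def atom_indicator(lam: complex) -> Function:
    """X_{lambda}, a non-zero vector of the form E({lambda}) omega."""
    return {z: (1 + 0j if z == lam else 0j) for z in SIGMA}
```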

A.7.2 Spectral representation


Theorem A.7.5 shows that normal linear operators on a Hilbert space are
essentially the same thing as spectral measures in that space. Theorems of
this type, establishing a kind of dictionary between two classes of objects that
a priori do not seem to be related, are among the most fascinating results in
mathematics. Of course, how useful such a dictionary is for studying one of those classes (normal linear operators, say) depends on how well we understand the other one (spectral measures, in this case). In the
present situation this is handled, in a most satisfactory way, by the next result,
which exhibits a canonical form (inspired by Example A.7.2) in which every
normal linear operator may be written.
As before, we use ⊕ to denote the orthogonal direct sum of Hilbert spaces.
Given any cardinal χ , finite or infinite, and a Hilbert space V, we denote by
V χ the orthogonal direct sum of χ copies of V.
Theorem A.7.9 (Spectral representation). Let $L:H\to H$ be a normal linear operator. Then there exist mutually singular finite measures $(\sigma_j)_j$ with support in the spectrum of $L$, there exist cardinals $(\chi_j)_j$ and there exists a unitary operator $U:H\to\bigoplus_jL^2(\sigma_j)^{\chi_j}$, such that the conjugate $ULU^{-1}=T$ is given by:
$$T:\bigoplus_jL^2(\sigma_j)^{\chi_j}\to\bigoplus_jL^2(\sigma_j)^{\chi_j},\qquad(\varphi_{j,l})_{j,l}\mapsto\big(z\mapsto z\varphi_{j,l}(z)\big)_{j,l}.\tag{A.7.9}$$

We call (A.7.9) the spectral representation of the normal operator $L$. Let us point out that the measures $\sigma_j$ in Theorem A.7.9 are not uniquely determined. However, the spectral representation is unique, in the following sense. We call multiplicity function of the operator $L$ the function that associates with each finite measure $\theta$ in $\mathbb{C}$ the smallest cardinal $\chi_j$ such that the measures $\theta$ and $\sigma_j$ are not mutually singular. One can prove that this function is uniquely determined by the operator $L$, that is, it does not depend on the choice of the measures $\sigma_j$ in the statement. Moreover, two normal operators are conjugate by some unitary operator if and only if they have the same multiplicity function.
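Example A.7.10 (Spectral representation in finite dimension). Let us go back to the setting of Example A.7.6. For each $j=1,\dots,s$, let $\sigma_j$ be the Dirac mass at the eigenvalue $\lambda_j$ and $\chi_j$ be the dimension of the eigenspace $V_j$. Note that the space $L^2(\sigma_j)$ has dimension 1, so $L^2(\sigma_j)^{\chi_j}$ has the same dimension as $V_j$. Hence, we may choose a unitary operator $U_j:V_j\to L^2(\sigma_j)^{\chi_j}$, for each $j=1,\dots,s$. Since $L=\lambda_j\,\mathrm{id}$ restricted to $V_j$, we have that $T_j=U_jLU_j^{-1}=\lambda_j\,\mathrm{id}$, that is,
$$T_j:(\varphi_\alpha)_\alpha\mapsto\big(z\mapsto\lambda_j\varphi_\alpha(z)\big)_\alpha=\big(z\mapsto z\varphi_\alpha(z)\big)_\alpha,$$
where the last equality holds because $\sigma_j$ is concentrated on $\{\lambda_j\}$. In this way, we have found a unitary operator
$$U:\mathbb{C}^d\to\bigoplus_{j=1}^sL^2(\sigma_j)^{\chi_j}$$
such that $T=ULU^{-1}$ is a spectral representation of $L$.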

A.7.3 Exercises
A.7.1. Let $T:E\to E$ be a Banach space isomorphism, that is, a continuous linear bijection whose inverse is also continuous. Show that $T+H$ is a Banach space isomorphism for every continuous linear map $H:E\to E$ such that $\|H\|\,\|T^{-1}\|<1$. Use this fact to prove that the spectrum of every continuous linear operator $L:E\to E$ is a closed set and is contained in the closed disk of radius $\|L\|$ around the origin.
A.7.2. Show that if $L:H\to H$ is a linear operator in a Hilbert space $H$ with finite dimension then $\operatorname{spec}(L)$ consists of the eigenvalues of $L$, that is, the complex numbers $\lambda$ for which $L-\lambda\,\mathrm{id}$ is not injective. Give an example, in infinite dimension, such that the spectrum is strictly larger than the set of eigenvalues.
A.7.3. Prove Lemma A.7.3.
A.7.4. Prove Proposition A.7.8, along the following lines:
(a) Assume that $Lv=\lambda v$ for some $v\ne0$. Consider the functions
$$\varphi_n(z)=\begin{cases}(z-\lambda)^{-1}&\text{if }|z-\lambda|>1/n\\0&\text{otherwise.}\end{cases}$$
Show that $\varphi_n(L)(L-\lambda\,\mathrm{id})=E(\{z:|z-\lambda|>1/n\})$ for every $n$. Conclude that $E(\{\lambda\})v=v$ and, consequently, $\lambda$ is an atom of $E$.
(b) Assume that there exists $w\in H$ such that $v=E(\{\lambda\})w$ is non-zero. Show that, given any measurable set $B\subset\mathbb{C}$,
$$E(B)v=\begin{cases}v&\text{if }\lambda\in B\\0&\text{if }\lambda\notin B.\end{cases}$$
Conclude that $Lv=\lambda v$ and, consequently, $\lambda$ is an eigenvalue of $L$.


A.7.5. Let $(\sigma_j)_j$ be the family of measures given in Theorem A.7.9. Given any measurable set $B\subset\mathbb{C}$, check that $E(B)=0$ if and only if $\sigma_j(B)=0$ for every $j$. Therefore, given any measure $\eta$ in $\mathbb{C}$, we have that $E\ll\eta$ if and only if $\sigma_j\ll\eta$ for every $j$.
Hints or solutions for selected exercises

1.1.2. Use Exercise A.3.5 to approximate characteristic functions by continuous functions.
1.2.5. Show that if $N>1/\mu(A)$, then there exists $j\in V_A$ with $0\le j\le N$. Adapting the proof of the previous statement, conclude that if $K$ is a set of non-negative integers with $\#K>1/\mu(A)$, then we may find $k_1,k_2\in K$ and $n\in V_A$ such that $n=k_1-k_2$. That is, the set $K-K=\{k_1-k_2:k_1,k_2\in K\}$ intersects $V_A$. To conclude that $S$ is syndetic assume, by contradiction, that for every $n\in\mathbb{N}$ there is some number $l_n$ such that $\{l_n,l_n+1,\dots,l_n+n\}\cap V_A=\emptyset$. Consider an element $k_1\notin V_A$ and construct, recursively, a sequence $k_{j+1}=l_{k_j}+k_j$. Prove that the set $K=\{k_1,\dots,k_N\}$ is such that $(K-K)\cap V_A=\emptyset$.
1.2.6. Otherwise, there exist $k\ge1$ and $b>1$ such that the set $B=\{x\in[0,1]:n|f^n(x)-x|>b\text{ for every }n\ge k\}$ has positive measure. Let $a\in B$ be a density point of $B$. Consider $E=B\cap B(a,r)$, for $r$ small. Get a lower estimate for the return time to $E$ of any point $x\in E$ and use the Kac theorem to reach a contradiction.
1.3.5. Consider the sequence $\log_{10}a_n$, where $\log_{10}$ denotes the base 10 logarithm, and observe that $\log_{10}2$ is an irrational number.
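The hint can be explored numerically. Assuming that the sequence in question is $a_n=2^n$ (our reading of the hint; the exercise statement is not reproduced here), the leading decimal digits of $a_n$ are governed by the fractional part of $n\log_{10}2$, that is, by an irrational rotation of the circle. The Python sketch below uses exact integer arithmetic to find the first power of 2 with a prescribed decimal prefix, and compares the frequency of a leading digit $d$ with $\log_{10}(1+1/d)$, the length of the corresponding arc.

```python
import math

def first_power_of_two_with_prefix(prefix: str, limit: int = 10_000) -> int:
    """Smallest n >= 0 such that the decimal expansion of 2**n starts with `prefix`."""
    power = 1
    for n in range(limit):
        if str(power).startswith(prefix):
            return n
        power *= 2
    raise ValueError(f"no power 2**n with n < {limit} starts with {prefix!r}")

def leading_digit_frequency(digit: int, count: int) -> float:
    """Fraction of 0 <= n < count such that 2**n has leading decimal digit `digit`."""
    power, hits = 1, 0
    for _ in range(count):
        if str(power)[0] == str(digit):
            hits += 1
        power *= 2
    return hits / count

def arc_length(digit: int) -> float:
    """Length of {t in [0, 1): 10**t has integer part `digit`}, i.e. log10(1 + 1/digit)."""
    return math.log10(1 + 1 / digit)
```

For example, `first_power_of_two_with_prefix("7")` returns 46, since $2^{46}=70368744177664$ is the first power of 2 whose expansion starts with 7.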
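1.3.12. Consider orthonormal bases $\{v_1,\dots,v_d\}$, at $x$, and $\{w_1,\dots,w_d\}$, at $f(x)$, such that $v_1$ and $w_1$ are orthogonal to $H_c$. Check that $\operatorname{grad}H(f(x))\cdot Df(x)v=\operatorname{grad}H(x)\cdot v$ for every $v$. Deduce that the matrix of $Df(x)$ with respect to those bases has the form
$$Df(x)=\begin{pmatrix}\alpha&0&\cdots&0\\\beta_2&\gamma_{2,2}&\cdots&\gamma_{2,d}\\\vdots&\vdots&\ddots&\vdots\\\beta_d&\gamma_{d,2}&\cdots&\gamma_{d,d}\end{pmatrix},$$
with $\|\operatorname{grad}H(f(x))\|\,|\alpha|=\|\operatorname{grad}H(x)\|$. Note that $\Gamma=(\gamma_{i,j})_{i,j}$ is the matrix of $D(f\mid H_c)$ and observe that, since $f$ preserves volume, $|\det Df(x)|=|\alpha|\,|\det\Gamma|=1$, and so $|\det\Gamma|=\|\operatorname{grad}H(f(x))\|/\|\operatorname{grad}H(x)\|$. Using the formula of change of variables, conclude that $f\mid H_c$ preserves the measure $ds/\|\operatorname{grad}H\|$.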
1.4.4. Choose a set E ⊂ M with measure less than ε/n and, for each k ≥ 1, let Ek be
the set of points x ∈ E that return to E in exactly k iterates. Take for B the union
of the sets Ek , with k ≥ n, of the n-th iterates of the sets Ek with k ≥ 2n, and so

on. For the second part, observe that if (f , μ) is aperiodic then μ cannot have
atoms.
1.4.5. By assumption, $f^{\tau(y)}(y)\in H_{n-\tau(y)}$ whenever $y\in H_n$ with $n>\tau(y)$. Therefore, $T(y)\in H$ if $y\in H$. Consider $A_n=\{1\le j\le n:x\in H_j\}$ and $B_n=\{l\ge1:\sum_{i=0}^{l}\tau(T^i(x))\le n\}$. Show, by induction, that $\#A_n\le\#B_n$ and deduce that $\limsup_n\#B_n/n\ge\theta$. Now suppose that $\liminf_k(1/k)\sum_{i=0}^{k-1}\tau(T^i(x))>1/\theta$. Show that there exists $\theta_0<\theta$ such that $\#B_n<\theta_0n$, for every $n$ sufficiently large. This contradicts the previous conclusion.
1.5.5. Observe that the maps f , f 2 , . . . , f k commute with each other and then use the
Poincaré multiple recurrence theorem.
1.5.6. By definition, the complement of the non-wandering set of $(f_1,\dots,f_q)$ is an open set. The Birkhoff multiple recurrence theorem ensures that the non-wandering set is non-empty.

2.1.6. Consider the image $V_*\mu$ of the measure $\mu$ under $V$. Check that $V_*\mu((a,b])=F(b)-F(a)$ for every $a<b$. Consequently, $V_*\mu(\{b\})=F(b)-\lim_{a\to b^-}F(a)$. Therefore, $(-\infty,b]$ is a continuity set for $V_*\mu$ if and only if $b$ is a continuity point for $F$. Using Theorem 2.1.2, it follows that if $((V_k)_*\mu)_k$ converges to $V_*\mu$ in the weak$^*$ topology then $(V_k)_k$ converges to $V$ in distribution. Conversely, if $(V_k)_k$ converges to $V$ in distribution then $(V_k)_*\mu((a,b])=F_k(b)-F_k(a)$ converges to $F(b)-F(a)=V_*\mu((a,b])$, for any continuity points $a<b$ of $F$. Observing that such intervals $(a,b]$ generate the Borel $\sigma$-algebra of the real line, conclude that $((V_k)_*\mu)_k$ converges to $V_*\mu$ in the weak$^*$ topology.
2.1.8. (Billingsley [Bil68]) Use the hypothesis to show that if $(U_n)_n$ is an increasing sequence of open subsets of $M$ such that $\bigcup_nU_n=M$ then, for every $\varepsilon>0$ there exists $n$ such that $\mu(U_n)\ge1-\varepsilon$ for every $\mu\in K$. Next, imitate the proof of Proposition A.3.7.
2.2.2. For the first part of the statement use induction in $q$. The case $q=1$ corresponds to Theorem 2.1. Consider continuous transformations $f_i:M\to M$, $1\le i\le q+1$, commuting with each other. By the induction hypothesis, there exists a probability $\nu$ invariant under $f_i$ for $1\le i\le q$. Define $\mu_n=(1/n)\sum_{j=0}^{n-1}(f_{q+1}^j)_*(\nu)$. Note that $(f_i)_*\mu_n=\mu_n$ for every $1\le i\le q$ and every $n$. Hence, every accumulation point of $(\mu_n)_n$ is invariant under every $f_i$, $1\le i\le q$. By compactness, there exists some accumulation point $\mu\in\mathcal{M}_1(M)$. Check that $\mu$ is invariant under $f_{q+1}$. For the second part, denote by $\mathcal{M}_q\subset\mathcal{M}_1(M)$ the set of probability measures invariant under $f_i$, $1\le i\le q$. Then, $(\mathcal{M}_q)_q$ is a non-increasing sequence of closed non-empty subsets of $\mathcal{M}_1(M)$. By compactness, the intersection $\bigcap_q\mathcal{M}_q$ is non-empty.
2.2.6. Define $\mu$ in each iterate $f^j(W)$, $j\in\mathbb{Z}$, by letting $\mu(A)=m(f^{-j}(A))$ for each measurable set $A\subset f^j(W)$.
2.3.2. Clearly, convergence in norm implies weak convergence. To prove the converse, assume that $(x^k)_k$ converges to zero in the weak topology but not in the norm topology. The first condition implies that, for every fixed $N$, the sum $\sum_{n=0}^N|x_n^k|$ converges to zero when $k\to\infty$. The second condition means that, up to restricting to a subsequence, there exists $\delta>0$ such that $\|x^k\|>\delta$ for every $k$. Then, there exists some increasing sequence $(l_k)_k$ such that
$$\sum_{n=0}^{l_{k-1}}|x_n^k|\le\frac1k\quad\text{but}\quad\sum_{n=0}^{l_k}|x_n^k|\ge\|x^k\|-\frac1k\ge\delta-\frac1k\quad\text{for every }k.$$
Take $a_n=x_n^k/|x_n^k|$ for each $l_{k-1}<n\le l_k$. Then, for every $k$,
$$\Big|\sum_{n=0}^\infty a_nx_n^k\Big|\ge\sum_{l_{k-1}<n\le l_k}|x_n^k|-\sum_{n\le l_{k-1}}|x_n^k|-\sum_{n>l_k}|x_n^k|\ge\|x^k\|-\frac4k\ge\delta-\frac4k.$$
This contradicts the hypotheses. Now take $x_n^k=1$ if $k=n$ and $x_n^k=0$ otherwise. Given any $(a_n)_n\in c_0$, we have that $\sum_na_nx_n^k=a_k$ converges to zero when $k\to\infty$. Therefore, $(x^k)_k$ converges to zero in the weak$^*$ topology. But $\|x^k\|=1$ for every $k$, hence $(x^k)_k$ does not converge to zero in the norm topology.
2.3.6. Take $W=U(H)^\perp$ and $V=\big(\bigoplus_{n=0}^\infty U^n(W)\big)^\perp$.

2.3.7. Suppose that there exist tangent functionals $T_1$ and $T_2$ with $T_1(v)>T_2(v)$ for some $v\in E$. Show that $\phi(u+tv)+\phi(u-tv)-2\phi(u)\ge t(T_1(v)-T_2(v))$ for every $t>0$ and deduce that $\phi$ is not differentiable in the direction of $v$.
2.4.1. Consider the set $\mathcal{P}$ of all probability measures on $X\times M$ of the form $\nu^{\mathbb{Z}}\times\eta$. Note that $\mathcal{P}$ is compact in the weak$^*$ topology and is invariant under the operator $F_*$.
2.4.2. The condition $\hat p\circ g=\hat f\circ\hat p$ entails $\hat f^n\circ\hat p=\hat p\circ g^n$ for every $n\in\mathbb{Z}$. Using $\pi\circ\hat p=p$, it follows that $\pi\circ\hat f^n\circ\hat p=p\circ g^n$ for every $n\le0$. Therefore, $\hat p(y)=\big(p(g^n(y))\big)_{n\le0}$. This proves the existence and uniqueness of $\hat p$. Now suppose that $p$ is surjective. The hypotheses of compactness and continuity ensure that
$$\big(g^{-n}(p^{-1}(\{x_n\}))\big)_{n\le0}$$
is a nested sequence of compact sets, for every $(x_n)_{n\le0}\in\hat M$. Take $y$ in the intersection and note that $\hat p(y)=(x_n)_{n\le0}$.
2.5.2. Fix $q$ and $l$. Assume that for every $n\ge1$ there exists a partition $\{S_1^n,\dots,S_l^n\}$ of the set $\{1,\dots,n\}$ such that no $S_j^n$ contains an arithmetic progression of length $q$. Consider the function $\phi_n:\mathbb{N}\to\{1,\dots,l\}$ given by $\phi_n(i)=j$ if $i\in S_j^n$ and $\phi_n(i)=l$ if $i>n$. Take $(n_k)_k\to\infty$ such that the subsequence $(\phi_{n_k})_k$ converges at every point to some function $\phi:\mathbb{N}\to\{1,\dots,l\}$. Consider $S_j=\phi^{-1}(j)$ for $j=1,\dots,l$. Some $S_j$ contains some arithmetic progression of length $q$. Then $S_j^{n_k}$ contains that arithmetic progression for every $k$ sufficiently large, which is a contradiction.
2.5.4. Consider $\Sigma=\{1,\dots,l\}^{\mathbb{N}^k}$ with the distance $d(\omega,\omega')=2^{-N}$, where $N\ge0$ is largest such that $\omega(i_1,\dots,i_k)=\omega'(i_1,\dots,i_k)$ for every $i_1,\dots,i_k<N$. Note that $\Sigma$ is a compact metric space. Given $q\ge1$, let $F_q=\{(a_1,\dots,a_k):1\le a_i\le q\text{ and }1\le i\le k\}$. Let $e_1,\dots,e_m$ be an enumeration of the elements of $F_q$. For each $j=1,\dots,m$, consider the shift map $\sigma_j:\Sigma\to\Sigma$ given by $(\sigma_j\omega)(n)=\omega(n+e_j)$ for $n\in\mathbb{N}^k$. Consider the point $\omega\in\Sigma$ defined by $\omega(n)=i\Leftrightarrow n\in S_i$. Let $Z$ be the closure of $\{\sigma_1^{l_1}\cdots\sigma_m^{l_m}(\omega):l_1,\dots,l_m\in\mathbb{N}\}$. Note that $Z$ is invariant under the shift maps $\sigma_j$. By the Birkhoff multiple recurrence theorem, there exist $\zeta\in Z$ and $s\ge1$ such that $d(\sigma_j^s(\zeta),\zeta)<1$ for every $j=1,\dots,m$. Let $e=(1,\dots,1)\in\mathbb{N}^k$. Then $\zeta(e)=\zeta(e+se_1)=\cdots=\zeta(e+se_m)$. Consider $\sigma_1^{l_1}\cdots\sigma_m^{l_m}(\omega)$ close enough to $\zeta$ that $\omega(b)=\omega(b+se_1)=\cdots=\omega(b+se_m)$, where $b=e+l_1e_1+\cdots+l_me_m$. It follows that if $i=\omega(b)$, then $b+sF_q\subset S_i$. Given that there are only finitely many sets $S_i$, some of them must contain infinitely many sets of the type $b+sF_q$, with $q$ arbitrarily large.

3.1.1. Mimic the proof of Theorem 3.1.6.


3.1.2. Suppose that for every $k\in\mathbb{N}$ there exists $n_k\in\mathbb{N}$ such that $\mu(A\cap f^{-j}(A))=0$ for every $n_k+1\le j\le n_k+k$. It is no restriction to assume that $(n_k)_k\to\infty$. Take $\varphi=\mathcal{X}_A$. By Exercise 3.1.1, $(1/k)\sum_{j=n_k+1}^{n_k+k}\int\varphi\cdot(\varphi\circ f^j)\,d\mu\to\int\varphi\cdot P(\varphi)\,d\mu$. The left-hand side is identically zero and the right-hand side is equal to $\int P(\varphi)^2\,d\mu$. Hence, the time average $P(\varphi)$ vanishes almost everywhere and so $\mu(A)=\int P(\varphi)\,d\mu=0$.
3.2.3. (a) Consider $\varepsilon=1$ and let $C=\sup\{|\varphi(l)|:|l|\le L(1)\}$. Given $n\in\mathbb{Z}$, fix $s\in\mathbb{Z}$ such that $sL(1)<n\le(s+1)L(1)$. By hypothesis, there exists $\tau\in\{sL(1)+1,\dots,(s+1)L(1)\}$ such that $|\varphi(k+\tau)-\varphi(k)|<1$ for every $k\in\mathbb{Z}$. Take $k=n-\tau$ and observe that $|k|\le L(1)$. It follows that $|\varphi(n)|<1+C$. (b) Take $\rho$ such that $\rho\varepsilon>2L(\varepsilon)\sup|\varphi|$. For every $n\in\mathbb{Z}$ there exists some $\varepsilon$-quasi-period $\tau=n\rho+r$ with $1\le r\le L(\varepsilon)$. Then,
$$\Big|\sum_{j=n\rho+1}^{(n+1)\rho}\varphi(j)-\sum_{j=1-r}^{\rho-r}\varphi(j)\Big|<\rho\varepsilon\quad\text{and}\quad\Big|\sum_{j=1-r}^{\rho-r}\varphi(j)-\sum_{j=1}^{\rho}\varphi(j)\Big|\le2r\sup|\varphi|<\rho\varepsilon.$$
(c) Given $\varepsilon>0$, take $\rho$ as in part (b). For each $n\ge1$, write $n=s\rho+r$, with $1\le r\le\rho$. Then,
$$\frac1n\sum_{j=1}^n\varphi(j)=\frac{\rho}{s\rho+r}\sum_{i=0}^{s-1}\frac1\rho\sum_{l=i\rho+1}^{(i+1)\rho}\varphi(l)+\frac1n\sum_{l=s\rho+1}^{s\rho+r}\varphi(l).$$
For $s$ large, the first term on the right-hand side is close to $(1/\rho)\sum_{j=0}^{\rho-1}\varphi(j)$ (by part (b)) and the last term is close to zero (by part (a)). Conclude that the left-hand side of the identity is a Cauchy sequence. (d) Observe that
$$\Big|\frac1n\sum_{j=1}^n\varphi(x+j)-\frac1n\sum_{j=1}^n\varphi(j)\Big|\le\frac{2|x|}{n}\sup|\varphi|$$
and use parts (a) and (c).


3.3.3. Let $\mu$ be a probability measure invariant under a flow $f^t:M\to M$, $t\in\mathbb{R}$, and let $(\varphi_s)_{s>0}$ be a family of functions, indexed by the positive real numbers, such that $\varphi_{s+t}\le\varphi_t+\varphi_s\circ f^t$ and the function $\Phi=\sup_{0<s<1}\varphi_s^+$ is in $L^1(\mu)$. Then, $(1/T)\varphi_T$ converges at $\mu$-almost every point to a function $\varphi$ such that $\varphi^+\in L^1(\mu)$ and $\int\varphi\,d\mu=\lim_{T\to\infty}(1/T)\int\varphi_T\,d\mu$. To prove this, take $\varphi=\lim_n(1/n)\varphi_n$ (Theorem 3.3.3). For $T>0$ non-integer, write $T=n+s$ with $n\in\mathbb{N}$ and $s\in(0,1)$. Then,
$$\varphi_T\le\varphi_n+\varphi_s\circ f^n\le\varphi_n+\Phi\circ f^n\quad\text{and}\quad\varphi_T\ge\varphi_{n+1}-\varphi_{1-s}\circ f^T\ge\varphi_{n+1}-\Phi\circ f^T.$$
Using Lemma 3.2.5, the first inequality shows that $\limsup_{T\to\infty}(1/T)\varphi_T\le\varphi$. Analogously, using the version of Lemma 3.2.5 for continuous time, the second inequality above gives that $\liminf_{T\to\infty}(1/T)\varphi_T\ge\varphi$. It also follows that $\lim_{T\to\infty}(1/T)\int\varphi_T\,d\mu$ coincides with $\lim_n(1/n)\int\varphi_n\,d\mu$. By Theorem 3.3.3, this last limit is equal to $\int\varphi\,d\mu$.

3.3.6. Since log+ φ ∈ L1 (μ), for every ε > 0 there exists δ > 0 such that μ(B) < δ
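implies $\int_B\log^+\theta\,d\mu<\varepsilon$. Using that $\log^+\|\phi^n\|\le\sum_{j=0}^{n-1}\log^+\theta\circ f^j$, one gets that
$$\mu(E)<\delta\;\Rightarrow\;\frac1n\int_E\log^+\|\phi^n\|\,d\mu\le\frac1n\sum_{j=0}^{n-1}\int_{f^{-j}(E)}\log^+\theta\,d\mu\le\varepsilon.$$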

3.4.3. Consider local coordinates $x=(x_1,x_2,\dots,x_d)$ such that $\Sigma$ is contained in $\{x_1=0\}$. Write $\nu=\psi(x)\,dx_1\,dx_2\dots dx_d$. Then $\nu_\Sigma=\psi(y)\,dx_2\dots dx_d$ with $y=(0,x_2,\dots,x_d)$. Given $A\subset\Sigma$ and $\delta>0$, the map $\xi:(t,y)\mapsto g^t(y)$ is a diffeomorphism from $[0,\delta]\times A$ to $A_\delta$. Therefore, $\nu(A_\delta)=\int_{[0,\delta]\times A}(\psi\circ\xi)\,|\det D\xi|\,dt\,dx_2\dots dx_d$ and, consequently,
$$\lim_{\delta\to0}\frac{\nu(A_\delta)}{\delta}=\int_A\psi(y)\,|\det D\xi|(y)\,dx_2\dots dx_d.$$
Next, note that $|\det D\xi|(y)=|X(y)\cdot(\partial/\partial x_1)|=\phi(y)$ for every $y\in\Sigma$. It follows that the flux of $\nu$ coincides with the measure $\eta=\phi\,\nu_\Sigma$. In particular, $\eta$ is invariant under the Poincaré map.

4.1.2. Use the theorem of Birkhoff and the dominated convergence theorem.
4.1.8. Assume that $U_f\varphi=\lambda\varphi$. Since $U_f$ is an isometry, $|\lambda|=1$. If $\lambda^n=1$ for some $n$ then $\varphi\circ f^n=\varphi$ and, by ergodicity, $\varphi$ is constant almost everywhere. Otherwise, given any $c\ne0$, the sets $\varphi^{-1}(\lambda^{-k}c)$, $k\ge0$, are pairwise disjoint. Since they all have the same measure, this measure must be zero. Finally, the set $\varphi^{-1}(0)$ is invariant under $f$ and, consequently, its measure is either zero or total.
4.2.4. Let K be such a set. We may assume that K contains an infinite sequence of
periodic orbits (On )n with period going to infinity. Let Y ⊂ K be the set of
accumulation points of that sequence. Show that Y cannot consist of a single
point. Let $p\ne q$ be periodic points in $Y$ and $z$ be a heteroclinic point, that is,
such that σ n (z) converges to the orbit of p when n → −∞ and to the orbit of q
when n → +∞. Show that z ∈ Y and deduce the conclusion of the exercise.
4.2.10. Let $J_k=(0,1/k)$, for each $k\ge1$. Check that the continued fraction expansion of $x$ is of bounded type if and only if there exists $k\ge1$ such that $G^n(x)\notin J_k$ for every $n$. Observe that $\mu(J_k)>0$ for every $k$. Deduce that for every $k$ and $\mu$-almost every $x$ there exists $n\ge1$ such that $G^n(x)\in J_k$. Conclude that $L$ has zero Lebesgue measure.
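4.2.11. For each $L\in\mathbb{N}$, consider $\varphi_L(x)=\min\{\varphi(x),L\}$. Then, $\varphi_L\in L^1(\mu)$ and, by ergodicity, $\tilde\varphi_L=\int\varphi_L\,d\mu$ at $\mu$-almost every point. To conclude, observe that $\tilde\varphi\ge\tilde\varphi_L$ for every $L$ and $\int\varphi_L\,d\mu\to+\infty$.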
4.3.7. Let $M=\{0,1\}^{\mathbb{N}}$ and, for each $n$, let $\mu_n$ be the invariant measure supported on the periodic orbit $\alpha^n=(\alpha_k^n)_k$, with period $2n$, defined by $\alpha_k^n=0$ if $0\le k<n$ and $\alpha_k^n=1$ if $n\le k<2n$. Show that $(\mu_n)_n$ converges to $(\delta_0+\delta_1)/2$, where $0$ and $1$ are the fixed points of the shift map.
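Convergence in the weak$^*$ topology of $\{0,1\}^{\mathbb{N}}$ can be tested on cylinders. The Python sketch below (an illustration, not part of the hint) computes $\mu_n([a_0,\dots,a_{m-1}])$ exactly, as the proportion of the $2n$ points of the orbit of $\alpha^n$ that lie in the cylinder, together with the value of $(\delta_0+\delta_1)/2$ on the same cylinder. For instance $\mu_n([0,1])=1/(2n)\to0$, while $\mu_n$ of the cylinder of $m$ zeros is $(n-m+1)/(2n)\to1/2$.

```python
from fractions import Fraction
from typing import List, Sequence

def orbit_period(n: int) -> List[int]:
    """One period of alpha^n: n zeros followed by n ones."""
    return [0] * n + [1] * n

def mu_n(n: int, word: Sequence[int]) -> Fraction:
    """mu_n([word]): proportion of the points sigma^t(alpha^n), 0 <= t < 2n, in the cylinder."""
    period = orbit_period(n)
    length = len(period)
    hits = sum(
        all(period[(t + i) % length] == word[i] for i in range(len(word)))
        for t in range(length)
    )
    return Fraction(hits, length)

def limit_measure(word: Sequence[int]) -> Fraction:
    """((delta_0 + delta_1) / 2)([word]), where 0 and 1 denote the constant sequences."""
    if not word:
        return Fraction(1)
    zeros = all(w == 0 for w in word)
    ones = all(w == 1 for w in word)
    return Fraction(1, 2) if (zeros or ones) else Fraction(0)
```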
4.3.9. (a) Take $k\ge1$ such that every cylinder of length $k$ has diameter less than $\delta$. Take $y=(y_j)$ defined by $y_{j+n_i}=x_j^i$ for each $0\le j<m_i+k$. (b) Take $\delta>0$ such that $d(z,w)<\delta$ implies $|\varphi(z)-\varphi(w)|<\varepsilon$ and consider $k\ge1$ given by part (a). Choose $m_i$, $i=1,\dots,s$, such that $m_i/n_s\approx\alpha_i$ for every $i$. Then take $y$ as in part (a). (c) By the ergodic theorem, $\int\varphi\,d\mu=\int\tilde\varphi\,d\mu$. Take $x_1,\dots,x_s\in\Sigma$ and $\alpha_1,\dots,\alpha_s$ such that $\int\tilde\varphi\,d\mu\approx\sum_i\alpha_i\tilde\varphi(x_i)$. Note that $\tilde\varphi(y)=\int\varphi\,d\nu_y$, where $\nu_y$ is the invariant measure supported on the orbit of $y$. Recall Exercise 4.1.1.
4.4.3. On each side of the triangle, consider the foot of the corresponding altitude, that
is, the orthogonal projection of the opposite vertex. Show that the trajectory
defined by those three points is a periodic orbit of the billiard.
4.4.5. Using (4.4.10) and the twist condition, we get that for each $\theta\in\mathbb{R}$ there exists exactly one number $\rho_\theta\in(a,b)$ such that $\Theta(\theta,\rho_\theta)=\theta$. The function $\theta\mapsto\rho_\theta$ is continuous and periodic, with period 1. Consider its graph $\Gamma=\{(\theta,\rho_\theta):\theta\in S^1\}$. Every point in $\Gamma\cap f(\Gamma)$ is fixed under $f$: if $(\theta,\rho_\theta)=f(\gamma,\rho_\gamma)=\big(\Theta(\gamma,\rho_\gamma),R(\gamma,\rho_\gamma)\big)$ then, since $\Theta(\gamma,\rho_\gamma)=\gamma$, it follows that $\theta=\gamma$ and so $\rho_\theta=\rho_\gamma$. Since $f$ preserves the area measure, none of the connected components of $A\setminus\Gamma$ may be mapped inside itself. This implies that $f(\Gamma)$ intersects $\Gamma$ at no less than two points.
4.4.7. Taking inspiration from Example 4.4.12, show that the billiard map in 
extends to a Dehn twist in the annulus A = S1 × [−π/2, π/2], that is, a
homeomorphism f : A → A that coincides with the identity on both boundary
components but is homotopically non-trivial: actually, f admits a lift F : R ×
[−π/2, π/2] → R × [−π/2, π/2] such that F(s, −π/2) = (s − 2π , −π/2) and
F(s, π/2) = (s, π/2) for every s. Consider rational numbers pn /qn ∈ (−2π , 0)
with qn → ∞. Use Exercise 4.4.6 to show that g has periodic points of period
qn . One way to ensure that these periodic points are all distinct is to take the
qn mutually prime.
5.1.7. The statement does not depend on the choice of the ergodic decomposition, since the latter is essentially unique. Consider the construction in Exercise 5.1.6. The set $M_0$ is saturated by the partition $\mathcal{W}^s$, that is, if $x\in M_0$ then $W^s(x)\subset M_0$. Moreover, the map $y\mapsto\mu_y$ is constant on each $W^s(x)$. Since the partition $\mathcal{P}$ is characterized by $\mathcal{P}(x)=\mathcal{P}(y)\Leftrightarrow\mu_x=\mu_y$, it follows that $\mathcal{P}\prec\mathcal{W}^s$ restricted to $M_0$.
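5.2.1. Consider the canonical projections $\pi_{\mathcal P}:M\to\mathcal P$ and $\pi_{\mathcal Q}:M\to\mathcal Q$, the quotient measures $\hat\mu_{\mathcal P}=(\pi_{\mathcal P})_*\mu$ and $\hat\mu_{\mathcal Q}=(\pi_{\mathcal Q})_*\mu$ and the disintegrations $\mu=\int\mu_P\,d\hat\mu_{\mathcal P}(P)$ and $\mu=\int\mu_Q\,d\hat\mu_{\mathcal Q}(Q)$. Moreover, for each $P\in\mathcal P$, consider $\hat\mu_{P,\mathcal Q}=(\pi_{\mathcal Q})_*\mu_P$ and the disintegration $\mu_P=\int\mu_{P,Q}\,d\hat\mu_{P,\mathcal Q}(Q)$. Observe that $\int\hat\mu_{P,\mathcal Q}\,d\hat\mu_{\mathcal P}(P)=\hat\mu_{\mathcal Q}$: given any $B\subset\mathcal Q$,
$$\int\hat\mu_{P,\mathcal Q}(B)\,d\hat\mu_{\mathcal P}(P)=\int\mu_P(\pi_{\mathcal Q}^{-1}(B))\,d\hat\mu_{\mathcal P}(P)=\mu(\pi_{\mathcal Q}^{-1}(B))=\hat\mu_{\mathcal Q}(B).$$
To check that $\{\mu_{\pi(Q),Q}\}$ is a disintegration of $\mu$ with respect to $\mathcal Q$: (a) $\mu_{P,Q}(Q)=1$ for $\hat\mu_{P,\mathcal Q}$-almost every $Q$ and $\hat\mu_{\mathcal P}$-almost every $P$. Moreover, $\mu_{P,Q}=\mu_{\pi(Q),Q}$ for $\hat\mu_{P,\mathcal Q}$-almost every $Q$ and $\hat\mu_{\mathcal P}$-almost every $P$, because $\mu_P(P)=1$ for $\hat\mu_{\mathcal P}$-almost every $P$. By the previous observation, it follows that $\mu_{\pi(Q),Q}(Q)=1$ for $\hat\mu_{\mathcal Q}$-almost every $Q$. (b) $P\mapsto\mu_P(E)$ is measurable, up to measure zero, for every Borel set $E\subset M$. By construction (Section 5.2.3), there exists a countable generating algebra $\mathcal A$ such that $\mu_{P,Q}(E)=\lim_n\mu_P(E\cap Q_n)/\mu_P(Q_n)$ for every $E\in\mathcal A$ (where $Q_n$ is the element of $\mathcal Q_n$ that contains $Q$). Deduce that $Q\mapsto\mu_{\pi(Q),Q}(E)$ is measurable, up to measure zero, for every $E\in\mathcal A$. Extend this conclusion to every Borel set $E$, using the monotone class argument in Section 5.2.3. (c) Note that
$$\mu=\int\mu_P\,d\hat\mu_{\mathcal P}(P)=\iint\mu_{P,Q}\,d\hat\mu_{P,\mathcal Q}(Q)\,d\hat\mu_{\mathcal P}(P)=\iint\mu_{\pi(Q),Q}\,d\hat\mu_{P,\mathcal Q}(Q)\,d\hat\mu_{\mathcal P}(P)=\int\mu_{\pi(Q),Q}\,d\hat\mu_{\mathcal Q}(Q).$$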
5.2.2. Argue that the partition Q of the space M1 (M) into points is measurable.
Given any disintegration {μP : P ∈ P}, consider the measurable map M →
M1 (M), x → μP(x) . The pre-image of Q under this map is a measurable
partition. Check that this pre-image coincides with P on a subset with full
measure.
6.1.3. The function ϕ is invariant.
6.2.5. Denote by $X$ the closure of the orbit of $x$. If $X$ is minimal, for each $y\in X$ there exists $n(y)\ge1$ such that $d(f^{n(y)}(y),x)<\varepsilon$. Then, by continuity, $y$ admits an open neighborhood $V(y)$ such that $d(f^{n(y)}(z),x)<\varepsilon$ for every $z\in V(y)$. Take $y_1,\dots,y_s$ such that $X\subset\bigcup_iV(y_i)$ and let $m=\max_in(y_i)$. Given any $k\ge1$, take $i$ such that $f^k(x)\in V(y_i)$. Then, writing $n_i=n(y_i)$, we get $d(f^{k+n_i}(x),x)<\varepsilon$, that is, $k+n_i\in R_\varepsilon$. This proves that, given any $m+1$ consecutive integers, at least one of them is in $R_\varepsilon$. Hence, $R_\varepsilon$ is syndetic. Now assume that $X$ is not minimal. Then, there exists a non-empty, closed invariant set $F$ properly contained in $X$. Note that $x\notin F$ and so, for every $\varepsilon$ sufficiently small, there exists an open set $U$ that contains $F$ and does not intersect $B(x,\varepsilon)$. On the other hand, since $R_\varepsilon$ is syndetic, there exists $m\ge1$ such that for any $k\ge1$ there exists $n\in\{k,\dots,k+m\}$ satisfying $f^n(x)\in B(x,\varepsilon)$. Take $k$ such that $f^k(x)\in U_1$, where $U_1=U\cap f^{-1}(U)\cap\cdots\cap f^{-m}(U)$, and find a contradiction.
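The first part of the hint says that, for a minimal system, $R_\varepsilon$ is syndetic. A quick numerical illustration, under the hypothetical choice of the rotation $f(x)=x+\alpha\bmod1$ with $\alpha=(\sqrt5-1)/2$ (a minimal homeomorphism of the circle), lists the return times $n$ up to a horizon with $d(f^n(x),x)<\varepsilon$ and measures the largest gap between consecutive ones. For $\varepsilon=0.1$ the gaps never exceed 8: the points $j\alpha\bmod1$, $0\le j\le7$, cut the circle into arcs shorter than $0.2$, so every 8 consecutive integers contain an element of $R_\varepsilon$.

```python
import math
from typing import List

ALPHA = (math.sqrt(5) - 1) / 2  # hypothetical rotation number; irrational, so the rotation is minimal

def circle_distance(a: float, b: float) -> float:
    """Distance in the circle R/Z between points given by representatives in [0, 1)."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)

def return_times(x: float, eps: float, horizon: int) -> List[int]:
    """Elements n <= horizon of R_eps = {n >= 1 : d(f^n(x), x) < eps} for the rotation f."""
    return [n for n in range(1, horizon + 1)
            if circle_distance((x + n * ALPHA) % 1.0, x) < eps]

def max_gap(times: List[int]) -> int:
    """Largest difference between consecutive elements of an increasing list."""
    return max(b - a for a, b in zip(times, times[1:]))
```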
6.2.6. By Exercise 6.2.5, the set $R_\varepsilon=\{n\in\mathbb{N}:d(x,f^n(x))<\varepsilon\}$ is syndetic for every $\varepsilon>0$. If $y$ is close to $x$ then $\{n\in\mathbb{N}:d(f^n(x),f^n(y))<\varepsilon\}$ contains blocks of consecutive integers with arbitrary length, no matter the choice of $\varepsilon>0$. Let $U_1$ be any neighborhood of $x$. It follows from the previous observations that there exist infinitely many values of $n\in\mathbb{N}$ such that $f^n(x),f^n(y)$ are in $U_1$. Fix $n_1$ with this property. Next, consider $U_2=U_1\cap f^{-n_1}(U_1)$. By the previous step, there exists $n_2>n_1$ such that $f^{n_2}(x),f^{n_2}(y)\in U_2$. Continuing in this way, construct a non-increasing sequence of open sets $U_k$ and an increasing sequence of natural numbers $n_k$ such that $f^{n_k}(U_{k+1})\subset U_k$ and $f^{n_k}(x),f^{n_k}(y)\in U_k$. Check that $f^{n_{i_1}+\cdots+n_{i_k}}(x)$ and $f^{n_{i_1}+\cdots+n_{i_k}}(y)$ are in $U_1$ for any $i_1<\cdots<i_k$, $k\ge1$.
6.2.7. Consider the shift map $\sigma:\Sigma\to\Sigma$ in $\Sigma=\{1,2,\dots,q\}^{\mathbb{N}}$. The partition $\mathbb{N}=S_1\cup\cdots\cup S_q$ defines a certain element $\alpha=(\alpha_n)\in\Sigma$, given by $\alpha_n=i$ if and only if $n\in S_i$. Consider $\beta$ in the closure of the orbit of $\alpha$ such that $\alpha$ and $\beta$ are near and the closure of the orbit of $\beta$ is a minimal set. Apply Exercise 6.2.6 with $x=\beta$, $y=\alpha$ and $U=[0;\alpha_0]$ to obtain the result.
6.3.6. Write $g=(a_{11},a_{12},a_{21},a_{22})$. Then,
$$E_g(x_{11},x_{12},x_{21},x_{22})=(a_{11}x_{11}+a_{12}x_{21},\,a_{11}x_{12}+a_{12}x_{22},\,a_{21}x_{11}+a_{22}x_{21},\,a_{21}x_{12}+a_{22}x_{22}).$$
Write the right-hand side as $(y_{11},y_{12},y_{21},y_{22})$. Use the formula of change of variables, observing that $\det(y_{11},y_{12},y_{21},y_{22})=(\det g)\det(x_{11},x_{12},x_{21},x_{22})$ and
$$dy_{11}\,dy_{12}\,dy_{21}\,dy_{22}=(\det g)^2\,dx_{11}\,dx_{12}\,dx_{21}\,dx_{22}.$$
In the complex case, take
$$\int\varphi\,d\mu=\int_{GL(2,\mathbb{C})}\frac{\varphi(z_{11},z_{12},z_{21},z_{22})}{|\det(z_{11},z_{12},z_{21},z_{22})|^4}\,dx_{11}\,dy_{11}\,dx_{12}\,dy_{12}\,dx_{21}\,dy_{21}\,dx_{22}\,dy_{22},$$
where $z_{jk}=x_{jk}+y_{jk}i$. [Observation: Generalize these constructions to any dimension!]
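The two Jacobian facts behind the hint can be checked with exact rational arithmetic. In the Python sketch below (an illustration with hypothetical matrices $g$), `left_multiplication(g)` is the $4\times4$ matrix of $E_g$ in the coordinates $(x_{11},x_{12},x_{21},x_{22})$, and its determinant is $(\det g)^2$. In the complex case, identifying $\mathbb{C}^4$ with $\mathbb{R}^8$ through the coordinates $(x_{11},y_{11},\dots,x_{22},y_{22})$, the real Jacobian is $|\det g|^4$, which is exactly what the weight $|\det|^{-4}$ compensates.

```python
from fractions import Fraction
from typing import List, Sequence

def det(m: Sequence[Sequence]) -> Fraction:
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in m]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result

def left_multiplication(g) -> List[list]:
    """Matrix of X -> gX in the coordinates (x11, x12, x21, x22), as in the formula for E_g."""
    (a11, a12), (a21, a22) = g
    return [[a11, 0, a12, 0],
            [0, a11, 0, a12],
            [a21, 0, a22, 0],
            [0, a21, 0, a22]]

def realify(m) -> List[List[Fraction]]:
    """Real 2n x 2n matrix of a complex n x n matrix, in coordinates (Re z1, Im z1, ...)."""
    n = len(m)
    r = [[Fraction(0)] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            z = complex(m[i][j])
            p, q = Fraction(z.real), Fraction(z.imag)
            # multiplication by p + qi sends (x, y) to (px - qy, qx + py)
            r[2 * i][2 * j], r[2 * i][2 * j + 1] = p, -q
            r[2 * i + 1][2 * j], r[2 * i + 1][2 * j + 1] = q, p
    return r
```

For instance, $g=\begin{pmatrix}2&1\\3&5\end{pmatrix}$ has $\det g=7$, and `det(left_multiplication(g))` returns $49=(\det g)^2$.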
6.3.9. Given $x\in M$, there exists a unique number $0\le r<10^k$ such that $f^r(x)\in[b_0,\dots,b_{k-1}]$. Moreover, $f^n(x)\in[b_0,\dots,b_{k-1}]$ if and only if $n-r$ is a multiple of $10^k$. Use this observation to conclude that
$$\tau([b_0,\dots,b_{k-1}],x)=10^{-k}\quad\text{for every }x\in M.$$
Conclude that if $f$ admits an ergodic probability measure $\mu$ then $\mu([b_0,\dots,b_{k-1}])=10^{-k}$ for every $b_0,\dots,b_{k-1}$. This determines $\mu$ uniquely. To conclude, show that $\mu$ is well defined and invariant.
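The hint does not restate the map $f$. Its description (visits to $[b_0,\dots,b_{k-1}]$ recurring with period exactly $10^k$) fits the decimal odometer, "add 1 to $x_0$ and carry to the right", and the Python sketch below assumes that map. Since carries only propagate to the right, the first $k$ digits of $f^n(x)$ depend only on the first $k$ digits of $x$, so a finite list of digits suffices to follow visits to a cylinder of length $k$.

```python
from typing import List, Sequence

def odometer_step(x: List[int]) -> List[int]:
    """Assumed map f: add 1 to the digit x[0], carrying to the right.

    Digits beyond len(x) are not tracked; they never affect the digits kept here.
    """
    y = list(x)
    for i in range(len(y)):
        if y[i] < 9:
            y[i] += 1
            return y
        y[i] = 0
    return y  # a carry past the tracked digits is discarded

def visits(x: Sequence[int], cylinder: Sequence[int], steps: int) -> List[int]:
    """Times 0 <= n < steps with f^n(x) in the cylinder [b_0, ..., b_{k-1}]."""
    k = len(cylinder)
    times, y = [], list(x)
    for n in range(steps):
        if y[:k] == list(cylinder):
            times.append(n)
        y = odometer_step(y)
    return times
```

With `x = [3, 7, 2, 9, 9, 1]` and the cylinder `[5, 0, 4]`, the first three digits encode $273$ and $405$ in base 10 (least significant digit first), so the visits occur at $r=132$ and then every $10^3$ iterates.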
6.3.11. Consider the sequence of words $w_n$ defined inductively by $w_1=\alpha$ and $w_{n+1}=s(w_n)$ for $n\ge1$. Decompose the word $s(\alpha)=w_2=\alpha r_1$ and prove, by induction, that $w_{n+1}$ may be decomposed as $w_{n+1}=w_nr_n$, for some word $r_n$ with length greater than or equal to $n$, such that $s(r_n)=r_{n+1}$. Define $w=\alpha r_1r_2\cdots$ and note that $s(w)=s(\alpha)s(r_1)s(r_2)\cdots=\alpha r_1r_2r_3\cdots=w$. This proves existence. To prove uniqueness, let $\gamma\in\Sigma$ be a sequence starting with $\alpha$ and such that $s(\gamma)=\gamma$. Decompose $\gamma$ as $\gamma=\alpha\gamma_1\gamma_2\gamma_3\cdots$, in such a way that $\gamma_i$ and $r_i$ have the same length. Note that $s(\alpha)=\alpha\gamma_1=\alpha r_1$, and so $\gamma_1=r_1$. Conclude by induction.
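The existence part of the hint is constructive and can be run on examples. The Python sketch below iterates $w_{n+1}=s(w_n)$ from $w_1=\alpha$ and returns a prefix of the fixed point $w=\alpha r_1r_2\cdots$; it assumes, as in the hint, that $s(\alpha)$ begins with $\alpha$ and is longer than $\alpha$. The substitutions in the usage check (Fibonacci $a\mapsto ab$, $b\mapsto a$, and Thue–Morse $0\mapsto01$, $1\mapsto10$) are hypothetical examples, not taken from the exercise.

```python
from typing import Dict

def apply_substitution(s: Dict[str, str], word: str) -> str:
    """s acting on words letter by letter: s(c_1 c_2 ...) = s(c_1) s(c_2) ..."""
    return "".join(s[c] for c in word)

def fixed_point_prefix(s: Dict[str, str], alpha: str, length: int) -> str:
    """First `length` letters of the unique fixed point of s that starts with alpha."""
    image = apply_substitution(s, alpha)
    if not image.startswith(alpha) or len(image) <= len(alpha):
        raise ValueError("s(alpha) must start with alpha and be longer than alpha")
    w = alpha
    while len(w) < length:
        nxt = apply_substitution(s, w)  # w_{n+1} = s(w_n) = w_n r_n
        if len(nxt) == len(w):
            raise ValueError("the words w_n stopped growing")
        w = nxt
    return w[:length]
```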

6.4.2. Given any $0\le\alpha<\beta\le1$, we have that $\sqrt n\in(\alpha,\beta)$ in the circle if and only if there exists some integer $k\ge1$ such that $k^2+2k\alpha+\alpha^2<n<k^2+2k\beta+\beta^2$. For each $k$, the number of values of $n$ that satisfy this inequality differs by at most one from $2k(\beta-\alpha)+(\beta^2-\alpha^2)$, the length of this interval. Therefore,
$$\Big|\#\{1\le n<N^2:\sqrt n\in(\alpha,\beta)\}-\sum_{k=1}^{N-1}\big(2k(\beta-\alpha)+(\beta^2-\alpha^2)\big)\Big|<N.$$
Hence,
$$\lim_N\frac1{N^2}\#\{1\le n<N^2:\sqrt n\in(\alpha,\beta)\}=\beta-\alpha.$$
A similar calculation shows that the sequence $(\log n\bmod\mathbb{Z})_n$ is not equidistributed in the circle. [Observation: But it does admit a continuous (non-constant) limit density. Calculate that density!]
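The counting argument is easy to test numerically. The Python sketch below (an illustration; the values of $N$, $\alpha$, $\beta$ in the check are arbitrary choices) counts the $1\le n<N^2$ with $\sqrt n\bmod1\in(\alpha,\beta)$ and compares the result with $\sum_{k=1}^{N-1}\big(2k(\beta-\alpha)+(\beta^2-\alpha^2)\big)$. The difference should be less than $N$, and the proportion close to $\beta-\alpha$.

```python
import math

def count_in_arc(N: int, alpha: float, beta: float) -> int:
    """#{1 <= n < N^2 : sqrt(n) mod 1 lies in (alpha, beta)}."""
    count = 0
    for n in range(1, N * N):
        frac = math.sqrt(n) % 1.0
        if alpha < frac < beta:
            count += 1
    return count

def predicted_count(N: int, alpha: float, beta: float) -> float:
    """Sum over k = 1, ..., N-1 of the interval lengths 2k(beta - alpha) + (beta^2 - alpha^2)."""
    return sum(2 * k * (beta - alpha) + (beta ** 2 - alpha ** 2) for k in range(1, N))
```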
6.4.3. Define $\phi_n=a^n+(-1/a)^n$. Check that $(\phi_n)_n$ satisfies the Fibonacci recurrence $\phi_{n+1}=\phi_n+\phi_{n-1}$, with integer initial terms, and, in particular, $\phi_n\in\mathbb{N}$ for every $n\ge1$. Now observe that $(-1/a)^n$ converges to zero. Hence, $\{n\ge1:a^n\bmod\mathbb{Z}\in I\}$ is finite, for any interval $I\subset S^1$ whose closure does not contain zero.
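A quick numerical check of the hint, assuming that $a=(1+\sqrt5)/2$ (our reading, since the exercise statement is not reproduced here), so that $a$ and $-1/a$ are the roots of $x^2=x+1$. With this value $\phi_0=2$ and $\phi_1=1$, so the $\phi_n$ are the Lucas numbers. The Python sketch uses the `decimal` module with 80 significant digits because, in double precision, the rounding error in $a^n$ soon exceeds the distance $a^{-n}$ from $a^n$ to the nearest integer.

```python
from decimal import Decimal, getcontext

getcontext().prec = 80  # 80 significant digits: enough to resolve a**-n next to a**n for n <= 60

A = (1 + Decimal(5).sqrt()) / 2  # assumed value of a (the golden mean)

def phi(n: int) -> int:
    """phi_n from the recurrence phi_{n+1} = phi_n + phi_{n-1}, with phi_0 = 2, phi_1 = 1."""
    prev, cur = 2, 1
    if n == 0:
        return prev
    for _ in range(n - 1):
        prev, cur = cur, prev + cur
    return cur

def distance_to_integers(n: int) -> Decimal:
    """Distance from a**n to the nearest integer."""
    x = A ** n
    return abs(x - x.to_integral_value())
```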
7.1.1. It is clear that the condition is necessary. To see that it is sufficient: given $A$, consider the closed subspace $V$ of $L^2(\mu)$ generated by the functions $1$ and $\mathcal{X}_{f^{-k}(A)}$, $k\in\mathbb{N}$. The hypothesis ensures that $\lim_nU_f^n(\mathcal{X}_A)\cdot\mathcal{X}_{f^{-k}(A)}=(\mathcal{X}_A\cdot1)(\mathcal{X}_{f^{-k}(A)}\cdot1)$ for every $k$. Conclude that $\lim_nU_f^n(\mathcal{X}_A)\cdot\phi=(\mathcal{X}_A\cdot1)(\phi\cdot1)$ for every $\phi\in V$. Given a measurable set $B$, write $\mathcal{X}_B=\phi+\phi^\perp$ with $\phi\in V$ and $\phi^\perp\in V^\perp$ to conclude that $\lim_nU_f^n(\mathcal{X}_A)\cdot\mathcal{X}_B=(\mathcal{X}_A\cdot1)(\mathcal{X}_B\cdot1)$.
7.1.2. Assuming that $E$ exists, decompose $(1/n)\sum_{j=0}^{n-1}|a_j|$ into two terms, one over $j\in E$ and the other over $j\notin E$. The hypotheses imply that the two terms converge to zero. Conversely, assume that $(1/n)\sum_{j=0}^{n-1}|a_j|$ converges to zero. Define $E_m=\{j\ge0:|a_j|\ge1/m\}$ for each $m\ge1$. The sequence $(E_m)_m$ is increasing and each $E_m$ has density zero; in particular, there exists $\ell_m\ge1$ such that $(1/n)\#\big(E_m\cap\{0,\dots,n-1\}\big)<1/m$ for every $n\ge\ell_m$. Choose $(\ell_m)_m$ increasing and define $E=\bigcup_m\big(E_m\cap\{\ell_m,\dots,\ell_{m+1}-1\}\big)$. For the second part of the exercise, apply the first part to both sequences, $(a_n)_n$ and $(a_n^2)_n$.
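7.1.6. (Pollicott and Yuri [PY98]) It is enough to treat the case when $\int\varphi_j\,d\mu=0$ for every $j$. Use induction on the number $k$ of functions. The case $k=1$ is contained in Theorem 3.1.6. Use the inequalities
$$\Big\|\frac1N\sum_{n=1}^Na_n\Big\|\le\frac1N\sum_{n=1}^{N-m+1}\Big\|\frac1m\sum_{j=0}^{m-1}a_{n+j}\Big\|+\frac mN\max_{1\le i\le m}\|a_i\|+\frac mN\max_{N-m\le i\le N}\|a_i\|,$$
$$\Big\|\frac1N\sum_{n=1}^Nb_n\Big\|^2\le\frac1N\sum_{n=1}^N\|b_n\|^2$$
to conclude that $\int\big|(1/N)\sum_{n=0}^{N-1}(\varphi_1\circ f^n)\cdots(\varphi_k\circ f^{kn})\big|^2\,d\mu$ is bounded above by
$$\frac1N\sum_{n=1}^N\int\Big|\frac1m\sum_{j=0}^{m-1}\big(\varphi_1\circ f^{n+j}\big)\cdots\big(\varphi_k\circ f^{k(n+j)}\big)\Big|^2d\mu+\Big(\frac{2m}N+\frac{m^2}{N^2}\Big)\Big(2\max_{1\le i\le k}\sup\mathrm{ess}|\varphi_i|\Big)^{2k}.$$
The integral is equal to
$$\frac1{m^2}\sum_{i=0}^{m-1}\sum_{j=0}^{m-1}\int\prod_{l=1}^k\big(\varphi_l\cdot(\varphi_l\circ f^{l(j-i)})\big)\circ f^{l(n+i)}\,d\mu.$$
By the induction hypothesis,
$$\frac1N\sum_{n=1}^N\prod_{l=2}^k\big(\varphi_l\cdot(\varphi_l\circ f^{l(j-i)})\big)\circ f^{l(n+i)}\to\prod_{l=2}^k\int\varphi_l\cdot(\varphi_l\circ f^{l(j-i)})\,d\mu$$
in $L^2(\mu)$, when $N\to\infty$. Therefore,
$$\frac1N\sum_{n=1}^N\int\prod_{l=1}^k\big(\varphi_l\cdot(\varphi_l\circ f^{l(j-i)})\big)\circ f^{l(n+i)}\,d\mu\to\prod_{l=1}^k\int\varphi_l\cdot(\varphi_l\circ f^{l(j-i)})\,d\mu$$
when $N\to\infty$. Combining these estimates,
$$\limsup_N\int\Big|\frac1N\sum_{n=1}^N(\varphi_1\circ f^n)\cdots(\varphi_k\circ f^{kn})\Big|^2d\mu\le\frac1{m^2}\sum_{i=0}^{m-1}\sum_{j=0}^{m-1}\prod_{l=1}^k\int\varphi_l\cdot(\varphi_l\circ f^{l(j-i)})\,d\mu.$$
Since $(f,\mu)$ is weak mixing, $\int\varphi_l\cdot(\varphi_l\circ f^{lr})\,d\mu$ converges to $0$ when $r\to\infty$, restricted to a set of values of $r$ with density 1 at infinity (recall Exercise 7.1.2). Therefore, the expression on the right-hand side is close to zero when $m$ is large.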
7.2.5. The first statement is analogous to Exercise 7.2.1. The definition ensures that $\mu_k$ has memory $k$. Given $\varepsilon > 0$ and any (uniformly) continuous function $\varphi:\Sigma\to\mathbb R$, there exists $\kappa\ge 1$ such that $\big|\int_C\varphi\,d\eta - \varphi(x)\eta(C)\big|\le\varepsilon\eta(C)$ for every $x\in C$, every cylinder $C$ of length $l\ge\kappa$ and every probability measure $\eta$. Since $\mu_k(C) = \mu(C)$ for every cylinder $C$ of length $k$, it follows that $\big|\int\varphi\,d\mu_k - \int\varphi\,d\mu\big|\le 2\varepsilon$ for every $k\ge\kappa$. This proves that $(\mu_k)_k$ converges to $\mu$ in the weak$^*$ topology.
7.2.6. (a) Use that $P^{n_1+n_2}_{i,i} = \sum_j P^{n_1}_{i,j}P^{n_2}_{j,i}$. All the terms in this expression are non-negative and the term corresponding to $j = i$ is positive. (b) Up to replacing $R$ by $R/\kappa$, we may suppose that $\kappa = 1$. Start by showing that if $S\subset\mathbb Z$ is closed under addition and subtraction then $S = a\mathbb Z$, where $a$ is the smallest positive element of $S$. Use that fact to show that if $a_1,\dots,a_s$ are positive integers with greatest common divisor equal to 1 then there exist integers $b_1,\dots,b_s$ such that $b_1a_1+\cdots+b_sa_s = 1$. Now take $a_1,\dots,a_s\in R$ such that their greatest common divisor is equal to 1. Using the previous observation, and the hypothesis that $R$ is closed under addition, conclude that there exist $p, q\in R$ such that $p - q = 1$. To finish, show that $R$ contains every integer $n\ge pq$. (c) Consider any $i, j\in X$ and let $\kappa_i, \kappa_j$ be the greatest common divisors of $R(i), R(j)$, respectively. By irreducibility, there exist $k, l\ge 1$ such that $P^k_{i,j} > 0$ and $P^l_{j,i} > 0$. Deduce that if $n\in R(i)$ then $n + k + l\in R(j)$. In view of (b), this is possible only if $\kappa_i\ge\kappa_j$. Exchanging the roles of $i$ and $j$, it also follows that $\kappa_i\le\kappa_j$. If $\kappa\ge 2$ then, given any $i$, we have $P^n_{i,i} = 0$ for $n$ arbitrarily large and so $P$ cannot be aperiodic. Now suppose that $\kappa = 1$. Then, using (b) and the hypothesis that $X$ is finite, there exists $m\ge 1$ such that $P^n_{i,i} > 0$ for every $i\in X$ and every $n\ge m$. Then, since $P$ is irreducible and $X$ is finite, there exists $k\ge 1$ such that for any $i, j$ there exists $l\le k$ such that $P^l_{i,j} > 0$. Deduce that $P^{m+k}_{i,j} > 0$ for every $i, j$ and so $P$ is aperiodic. (d) Fix any $i\in X$ and, for each $r\in\{0,\dots,\kappa-1\}$, define $X_r = \{j\in X : \text{there exists } n\equiv r \bmod \kappa \text{ such that } P^n_{i,j} > 0\}$. Check that these sets $X_r$ cover $X$ and are pairwise disjoint. Show that the restriction of $P^\kappa$ to each of them is aperiodic.
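The period $\kappa$ in parts (c) and (d) depends only on the zero/non-zero pattern of $P$. It can be computed mechanically. The sketch below is our own illustration and is not taken from the text; the function names are hypothetical. It relies on two standard facts. First, for an irreducible matrix, $\kappa$ is the gcd of level(u) + 1 − level(v) over all transitions u → v, where the levels are breadth-first-search distances from state 0. Second, Wielandt's bound says an irreducible aperiodic $d\times d$ matrix satisfies $P^{(d-1)^2+1} > 0$ entrywise. The cyclic classes of part (d), with $i = 0$, are the levels taken mod $\kappa$.

```python
from collections import deque
from math import gcd


def period(P):
    """Period kappa of an irreducible stochastic matrix P (list of rows),
    together with the cyclic classes X_r of part (d), taking i = 0.

    BFS from state 0 assigns levels; kappa is the gcd of
    level[u] + 1 - level[v] over all transitions u -> v with P[u][v] > 0.
    """
    d = len(P)
    level = [None] * d
    level[0] = 0
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v in range(d):
            if P[u][v] > 0 and level[v] is None:
                level[v] = level[u] + 1
                queue.append(v)
    if any(lv is None for lv in level):
        raise ValueError("some state is unreachable from 0: P is not irreducible")
    kappa = 0
    for u in range(d):
        for v in range(d):
            if P[u][v] > 0:
                kappa = gcd(kappa, level[u] + 1 - level[v])
    classes = [[j for j in range(d) if level[j] % kappa == r] for r in range(kappa)]
    return kappa, classes


def is_aperiodic(P):
    """Wielandt: an irreducible P is aperiodic iff P^((d-1)^2 + 1) > 0 entrywise."""
    d = len(P)
    A = [[P[i][j] > 0 for j in range(d)] for i in range(d)]
    M = [row[:] for row in A]  # boolean pattern of P^1
    for _ in range((d - 1) ** 2):  # after the loop, M is the pattern of P^((d-1)^2 + 1)
        M = [[any(M[i][k] and A[k][j] for k in range(d)) for j in range(d)]
             for i in range(d)]
    return all(all(row) for row in M)


cyclic = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
print(period(cyclic), is_aperiodic(cyclic))
```

Working with the boolean pattern keeps the computation exact. The positivity of entries is all that matters in parts (a) to (d).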
7.3.1. By the theorem of Darboux, there exist coordinates $(x_1,x_2)$ in the neighborhood of any point of $S$ such that $\omega = dx_1\wedge dx_2$. Consider the expression of the vector field in those coordinates: $X = X_1(\partial/\partial x_1) + X_2(\partial/\partial x_2)$. Show that $\beta = X_1\,dx_2 - X_2\,dx_1$ and so $d\beta = (\operatorname{div}X)\,dx_1\wedge dx_2$. Hence, $\beta$ is closed if and only if the divergence of $X$ vanishes.
7.3.5. Observe that $f$ is invertible and if $A$ is a $d$-adic interval of level $r\ge 1$ (that is, an interval of the form $A = [i d^{-r}, (i+1)d^{-r}]$), then there exists $s\ge r$ such that $f(A)$ consists of $d^{s-r}$ $d$-adic intervals of level $s$. Deduce that $f$ preserves the Lebesgue measure. Show also that if $A$ and $B$ are $d$-adic intervals then, since $\sigma$ has no periodic points, $m(f^k(A)\cap B) = m(A)m(B)$ for every large $k$.
7.4.2. (a) Given $y_1, y_2\in M$, write $f^{-1}(y_i) = \{x_i^1,\dots,x_i^d\}$ with $d(x_1^j, x_2^j)\le\sigma^{-1}d(y_1,y_2)$. Then,
$$|L\varphi(y_1) - L\varphi(y_2)| \le \frac1d\sum_{j=1}^{d}|\varphi(x_1^j) - \varphi(x_2^j)| \le K_\theta(\varphi)\,\sigma^{-\theta}d(y_1,y_2)^\theta.$$
(b) It follows that $\|L\varphi\|\le\sup|\varphi| + \sigma^{-\theta}K_\theta(\varphi)\le\|\varphi\|$ for every $\varphi\in E$, and equality holds if and only if $\varphi$ is constant. Hence, $\|L\| = 1$. (c) Let $J_n = [\inf L^n\varphi, \sup L^n\varphi]$. By part (a), the sequence $(J_n)_n$ is decreasing and the diameter of $J_n$ converges to zero exponentially fast. Take $\nu_\varphi$ to be the point in the intersection and note that $\|L^n\varphi - \nu_\varphi\| = \sup|L^n\varphi - \nu_\varphi| + K_\theta(L^n\varphi)$. (d) The constant functions are eigenvectors of $L$, associated with the eigenvalue $\lambda = 1$. It follows that $\nu_{\varphi+c} = \nu_\varphi + c$ for every $\varphi\in E$ and every $c\in\mathbb R$. Then, $H = \{\varphi : \nu_\varphi = 0\}$ is a hyperplane of $E$ transverse to the line of constant functions. This hyperplane is invariant under $L$ and, by part (c), the spectral radius of $L\mid H$ is less than or equal to $\sigma^{-\theta} < 1$. (e) By part (b), $\|L^n\varphi - L^n\psi\|\le\|L^k\varphi - L^k\psi\|$ for every $n\ge k\ge 1$. Making $n\to\infty$, we get that $|\nu_\varphi - \nu_\psi|\le\|L^k\varphi - L^k\psi\|$ for every $k\ge 1$. Using part (a) and making $k\to\infty$, we get that $|\nu_\varphi - \nu_\psi|\le\sup|\varphi - \psi|$. Therefore, the linear operator $\psi\mapsto\nu_\psi$ is continuous, relative to the norm in the space $C^0(M)$.
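Parts (a) to (c) can be checked numerically in a simple case. The assumptions here are ours, not the text's. We take the doubling map $f(x) = 2x$ mod 1 on the circle, so $d = 2$ and $\sigma = 2$. We also take $\varphi(x) = x(1-x)$, which is continuous on the circle and 1-Lipschitz for the circle distance. In this case $L^n\varphi(y)$ is the average of $\varphi$ over the $2^n$ pre-images $(y+k)/2^n$. The width of $J_n$ is then at most $2^{-n-1}$. Since the averaging preserves $\int\cdot\,dm$, the common point is $\nu_\varphi = \int\varphi\,dm = 1/6$. The helper names are hypothetical.

```python
def L_power(phi, n, y):
    """(L^n phi)(y) for the doubling map (d = 2): the average of phi over the
    2**n pre-images (y + k) / 2**n, k = 0, ..., 2**n - 1, of y under f^n."""
    m = 2 ** n
    return sum(phi((y + k) / m) for k in range(m)) / m


def J_interval(phi, n, grid=256):
    """Approximate J_n = [inf L^n phi, sup L^n phi] by sampling y on a grid
    (the sampled interval is contained in the true J_n)."""
    values = [L_power(phi, n, i / grid) for i in range(grid)]
    return min(values), max(values)


def phi(x):
    # phi(0) = phi(1), so phi is continuous on the circle; it is 1-Lipschitz.
    return x * (1 - x)


for n in range(1, 9):
    lo, hi = J_interval(phi, n)
    print(n, round(lo, 6), round(hi, 6), "width", hi - lo)
```

The printed widths shrink roughly by half at each step, which is the exponential rate $\sigma^{-\theta}$ with $\sigma = 2$, $\theta = 1$.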
8.1.2. Denote $X_i = X\cap[0;i]$ and $p_i = \mu([0;i])$, for $i = 1,\dots,k$. Since $\mu$ is a Bernoulli measure, $\mu(X_i) = p_i\,\mu(f(X_i))$. Hence, $\sum_i p_i\,\mu(f(X_i)) = 1$. Since $\sum_i p_i = 1$, it follows that $\mu(f(X_i)) = 1$ for every $i$. Consequently, $\bigcap_i f(X_i)$ has full measure. Take $x$ in that intersection. If $(f,\mu)$ and $(g,\nu)$ are ergodically equivalent, there exists a bijection $\phi: X\to Y$ between full measure invariant subsets such that $\phi\circ f = g\circ\phi$. Take $x\in X$ with $k$ pre-images $x_1,\dots,x_k$ in $X$. The points $\phi(x_i)$ are pre-images of $\phi(x)$ for the transformation $g$. Hence, $k\le l$; by symmetry, we also have that $l\le k$.
8.2.5. Assume that $(f,\mu)$ is not weak mixing. By Theorem 8.2.1, there exists a non-constant function $\varphi$ such that $U_f\varphi = \lambda\varphi$ for some $\lambda = e^{2\pi i\theta}$. By ergodicity, the absolute value of $\varphi$ is constant $\mu$-almost everywhere. Using that $f^n$ is ergodic for every $n$ (Exercise 4.1.8), deduce that $\theta$ is irrational and that any set where $\varphi$ is constant has measure zero. Given $\alpha < \beta$ in $[0, 2\pi]$, consider $A = \{x\in C : \alpha\le\arg(\varphi(x))\le\beta\}$. Show that for every $\varepsilon > 0$ there exists $n$ such that $\mu(f^{-n}(A)\setminus A) < \varepsilon$. Show that, by choosing $|\beta - \alpha|$ sufficiently small, one contradicts the inequality in the statement.
8.2.7. Note that $f_{n+1}(x) = f_n(x)$ for every $x\in J_n$ that is not on the top of $S_n$. Hence (for example, arguing as in Exercise 6.3.10), $f(x) = f_n(x)$ for every $x\in[0,1)$ and every $n$ sufficiently large; moreover, $f$ preserves the Lebesgue measure. Let $a_n = \#S_n$ be the height of each pile $S_n$. Denote by $\{I^e, I^c, I^d\}$ the partition of each $I\in S_n$ into subintervals of equal length, ordered from left to right. (a) If $A$ is a set with $m(A) > 0$ then for every $\varepsilon > 0$ there exist $n\ge 1$ and some interval $I\in S_n$ such that $m(A\cap I)\ge(1-\varepsilon)m(I)$. If $A$ is invariant, it follows that $m(A\cap J)\ge(1-\varepsilon)m(J)$ for every $J\in S_n$. (b) Assume that $U_f\varphi = \lambda\varphi$. Since $U_f$ is an isometry, $|\lambda| = 1$. By ergodicity, $|\varphi|$ is constant almost everywhere; we may suppose that $|\varphi|\equiv 1$. Initially, assume that there exist $n$ and some interval $I\in S_n$ such that the restriction of $\varphi$ to $I$ is constant. Take $x\in I^e$, $y\in I^c$ and $z\in I^d$. Then, $\varphi(x) = \varphi(y) = \varphi(z)$ and $\varphi(y) = \lambda^{a_n}\varphi(x)$ and $\varphi(z) = \lambda^{a_n+1}\varphi(y)$. Hence, $\lambda = 1$ and, by ergodicity, $\varphi$ is constant. In general, use the theorem of Lusin (Theorems A.3.5–A.3.9) to reach the same conclusion. (c) $A$ is a union of intervals $I_j$ in the pile $S_n$ for each $n\ge 2$. Then, $f^{a_n}(I_j^e) = I_j^c$ for every $j$. Hence, $m(f^{a_n}(A)\cap A)\ge m(A)/3 = 2/27$.
8.3.1. Let $\{v_j : j\in I\}$ be a basis of $H$ formed by eigenvectors with norm 1 and let $\lambda_j$ be the eigenvalue associated with each eigenvector $v_j$. The hypothesis ensures that we may consider $I = \mathbb N$. Show that for every $\delta > 0$ and every $k\ge 1$ there exists $n\ge 1$ such that $|\lambda_j^n - 1|\le\delta$ for every $j\in\{1,\dots,k\}$ (use the pigeonhole principle). Decompose $\varphi = \sum_j c_jv_j$, with $c_j\in\mathbb C$. Observe that $U_f^n\varphi = \sum_{j\in\mathbb N}c_j\lambda_j^nv_j$, and so
$$\|U_f^n\varphi - \varphi\|_2^2 \le \sum_{j=1}^{k}|c_j(\lambda_j^n - 1)|^2 + \sum_{j=k+1}^{\infty}4|c_j|^2 \le \delta^2\|\varphi\|_2^2 + \sum_{j=k+1}^{\infty}4|c_j|^2.$$
Given $\varepsilon > 0$, we may choose $\delta$ and $k$ in such a way that each one of the terms on the right-hand side is less than $\varepsilon/2$.
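The pigeonhole step can be made explicit. Write $\lambda_j = e^{2\pi i\theta_j}$ (every eigenvalue of the isometry $U_f$ has modulus 1) and set $M = \lceil 2\pi/\delta\rceil$. Consider the $M^k + 1$ points $(n\theta_1,\dots,n\theta_k)$ mod 1, for $0\le n\le M^k$. Two of them lie in the same cube of side $1/M$. So some $1\le n\le M^k$ satisfies $|\lambda_j^n - 1| = 2|\sin(\pi n\theta_j)| < 2\pi/M\le\delta$ for all $j$. The brute-force search below is a hypothetical helper, not part of the text. It finds the smallest such $n$, which is guaranteed to be within that bound.

```python
import cmath
import math


def simultaneous_return(thetas, delta):
    """Smallest n >= 1 with |exp(2*pi*i*n*theta) - 1| <= delta for every theta.

    The pigeonhole argument guarantees n <= M**k, where M = ceil(2*pi/delta)
    and k = len(thetas), so the search always succeeds within that range.
    """
    M = math.ceil(2 * math.pi / delta)
    bound = M ** len(thetas)
    for n in range(1, bound + 1):
        if all(abs(cmath.exp(2j * math.pi * n * t) - 1) <= delta for t in thetas):
            return n, bound
    raise RuntimeError("pigeonhole bound violated (should not happen)")


thetas = [math.sqrt(2) % 1, math.sqrt(3) % 1, math.sqrt(5) % 1]
n, bound = simultaneous_return(thetas, 0.5)
print(n, bound)
```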
8.4.3. Let $U: H\to H$ be a non-invertible isometry. Recalling Exercise 2.3.6, show that there exist closed subspaces $V$ and $W$ of $H$ such that $U: H\to H$ is unitarily conjugate to the operator $U_1: V\oplus W^{\mathbb N}\to V\oplus W^{\mathbb N}$ given by $U_1\mid V = U\mid V$ and $U_1\mid W^{\mathbb N}$ equal to the shift $(w_0, w_1,\dots)\mapsto(0, w_0, w_1,\dots)$. Let $U_2: V\oplus W^{\mathbb Z}\to V\oplus W^{\mathbb Z}$ be the linear operator defined by $U_2\mid V = U\mid V$ and $U_2\mid W^{\mathbb Z}$ equal to the shift $(w_n)_n\mapsto(w_{n-1})_n$. Check that $U_2$ is a unitary operator such that $U_2\circ j = j\circ U_1$, where $j: V\oplus W^{\mathbb N}\to V\oplus W^{\mathbb Z}$ is the natural inclusion. Show that if $E\subset V\oplus W^{\mathbb N}$ satisfies the conditions in the definition of Lebesgue spectrum for $U_1$ then $j(E)$ satisfies those same conditions for $U_2$. Conclude that the rank of $U_1$ is well defined.
8.4.6. The lemma of Riemann–Lebesgue ensures that $F$ takes values in $c_0$. The operator $F$ is continuous: $\|F(\varphi)\|\le\|\varphi\|$ for every $\varphi\in L^1(\lambda)$. Moreover, $F$ is injective: if $F(\varphi) = 0$ then $\int\varphi(z)\psi(z)\,d\lambda(z) = 0$ for every linear combination $\psi(z) = \sum_{|j|\le l}a_jz^j$, $a_j\in\mathbb C$. Given any interval $I\subset S^1$, the sequence $\psi_N = \sum_{|n|\le N}c_nz^n$, $c_n = \int_I z^{-n}\,d\lambda(z)$, of partial sums of the Fourier series of the characteristic function $\mathcal X_I$ is bounded (see [Zyg68, page 90]). Using the dominated convergence theorem, it follows that $F(\varphi) = 0$ implies $\int_I\varphi(z)\,d\lambda(z) = 0$, for any interval $I$. Hence, $\varphi = 0$. If $F$ were bijective then, by the open mapping theorem, its inverse would be a continuous linear operator. Then, there would be $c > 0$ such that $\|F(\varphi)\|\ge c\|\varphi\|$ for every $\varphi\in L^1(\lambda)$. But that is false: consider $D_N(z) = \sum_{|n|\le N}z^n$ for $N\ge 0$. Check that $F(D_N) = (a^N_n)_n$ with $a^N_n = 1$ if $|n|\le N$ and $a^N_n = 0$ otherwise. Hence, $\|F(D_N)\| = 1$ for every $N$. Writing $z = e^{2\pi it}$, check that $D_N(z) = \sin((2N+1)\pi t)/\sin(\pi t)$. Conclude that $\|D_N\| = \int|D_N(z)|\,d\lambda(z)$ converges to infinity when $N\to\infty$. [Observation: One can also give explicit examples. For instance, if $(a_n)_n$ converges to zero and satisfies $\sum_{n=1}^{\infty}a_n/n = \infty$ then the sequence $(\alpha_n)_n$ given by $\alpha_n = a_n/(2i)$ for $n\ge 1$ and $\alpha_n + \alpha_{-n} = 0$ for every $n\ge 0$ may not be written in the form $\alpha_n = \int z^n\,d\nu(z)$. See Section 7.3.4 of Edwards [Edw79].]
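The last step, where $\|F(D_N)\| = 1$ while $\|D_N\|\to\infty$, is easy to observe numerically. The sketch below is our own illustration. It evaluates $D_N(e^{2\pi it})$ in two ways: as the sum $1 + 2\sum_{n=1}^{N}\cos(2\pi nt)$ and through the closed form $\sin((2N+1)\pi t)/\sin(\pi t)$. It then approximates $\|D_N\| = \int_0^1|D_N(e^{2\pi it})|\,dt$ with a midpoint rule, which never evaluates at $t = 0$. These $L^1$ norms are the classical Lebesgue constants, which grow like $(4/\pi^2)\log N$.

```python
import math


def dirichlet_sum(N, t):
    """D_N(e^{2 pi i t}) = sum_{|n| <= N} e^{2 pi i n t}, written as a real cosine sum."""
    return 1 + 2 * sum(math.cos(2 * math.pi * n * t) for n in range(1, N + 1))


def dirichlet_closed(N, t):
    """Closed form sin((2N+1) pi t) / sin(pi t), valid when t is not an integer."""
    return math.sin((2 * N + 1) * math.pi * t) / math.sin(math.pi * t)


def l1_norm(N, samples=20000):
    """Midpoint-rule approximation of the integral of |D_N| over t in [0, 1)."""
    h = 1.0 / samples
    return h * sum(abs(dirichlet_closed(N, (s + 0.5) * h)) for s in range(samples))


for N in (0, 1, 10, 100):
    print(N, round(l1_norm(N), 3))
```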
8.5.3. By Exercise 8.5.2, f̃ is always injective. Conclude that if f̃ is surjective then
it is invertible: there exists a homomorphism of measure algebras h : B̃ → B̃
such that h ◦ f̃ = f̃ ◦ h = id . Use Proposition 8.5.6 to find g : M → M such
that g ◦ f = f ◦ g = id at μ-almost every point. The converse is easy: if (f , μ) is
invertible at almost every point then the homomorphism of measure algebras g̃
associated with g = f −1 satisfies g̃ ◦ f̃ = f̃ ◦ g̃ = id ; in particular, f̃ is surjective.
8.5.6. Check that the unions of elements of $\bigcup_n\mathcal P_n$ are pre-images, under the inclusion
ι, of an open subset of K. Use that fact to show that if the chains have measure
zero then for each δ > 0 there exists an open set A ⊂ K such that m(A) < δ
and every point outside A is in the image of the inclusion: in other words,
K \ ι(MP ) ⊂ A. Conclude that ι(MP ) is a Lebesgue measurable set and its
complement in K has measure zero. For the converse, use the fact that (a)
implies (c) in Exercise A.1.13.
9.1.1. Hμ (P/R) ≤ Hμ (P ∨ Q/R) = Hμ (Q/R) + Hμ (P/Q ∨ R) ≤ Hμ (Q/R) +
Hμ (P/Q).
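Both steps above can be checked numerically on a finite probability space. The first is the identity $H_\mu(\mathcal P\vee\mathcal Q/\mathcal R) = H_\mu(\mathcal Q/\mathcal R) + H_\mu(\mathcal P/\mathcal Q\vee\mathcal R)$; the second is the monotonicity $H_\mu(\mathcal P/\mathcal Q\vee\mathcal R)\le H_\mu(\mathcal P/\mathcal Q)$. In the sketch below, which is only an illustration with hypothetical helper names, a partition is encoded by a label per point, and the join by pairs of labels. The conditional entropy is $H_\mu(\mathcal P/\mathcal Q) = -\sum\mu(P\cap Q)\log\big(\mu(P\cap Q)/\mu(Q)\big)$.

```python
import math
import random
from collections import defaultdict


def cond_entropy(weights, P, Q):
    """H(P/Q) = -sum mu(A & B) log(mu(A & B) / mu(B)) over A in P, B in Q.

    Partitions of the finite space {0, ..., len(weights) - 1} are lists of
    labels: points with the same label belong to the same element.
    """
    joint = defaultdict(float)
    marginal = defaultdict(float)
    for w, a, b in zip(weights, P, Q):
        joint[(a, b)] += w
        marginal[b] += w
    return -sum(m * math.log(m / marginal[b]) for (a, b), m in joint.items() if m > 0)


def join(P, Q):
    """Labels of the join P v Q."""
    return list(zip(P, Q))


def random_instance(rng, size=40, labels=4):
    """A random probability vector and three random partitions."""
    weights = [rng.random() for _ in range(size)]
    total = sum(weights)
    weights = [w / total for w in weights]
    P, Q, R = ([rng.randrange(labels) for _ in range(size)] for _ in range(3))
    return weights, P, Q, R


rng = random.Random(0)
w, P, Q, R = random_instance(rng)
print(cond_entropy(w, P, R), cond_entropy(w, Q, R) + cond_entropy(w, P, Q))
```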
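9.1.3. Let $g = f^k$. Then $H_\mu\big(\bigvee_{i=0}^{k-1}f^{-i}(\mathcal P)\,/\,\bigvee_{j=k}^{nk-1}f^{-j}(\mathcal P)\big) = H_\mu\big(\mathcal P^k\,/\,\bigvee_{i=1}^{n-1}g^{-i}(\mathcal P^k)\big)$. By Lemma 9.1.12, this expression converges to $h_\mu(g,\mathcal P^k)$. Now use Lemma 9.1.13.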
9.2.5. Write $\mathcal Q^n = \bigvee_{j=0}^{n-1}f^{-j}(\mathcal Q)$ for each $n$ and let $\mathcal A$ be the $\sigma$-algebra generated by $\bigcup_n\mathcal Q^n$. Check that $f$ is measurable with respect to the $\sigma$-algebra $\mathcal A$. Show that the hypothesis implies that $\mathcal P\subset\mathcal A$. By Corollary 9.2.4, it follows that $H_\mu(\mathcal P/\mathcal Q^n)$ converges to zero. By Lemmas 9.1.11 and 9.1.13, we have that $h_\mu(f,\mathcal P)\le h_\mu(f,\mathcal Q) + H_\mu(\mathcal P/\mathcal Q^n)$ for every $n$.
9.2.7. The set A of all finite disjoint unions of rectangles Ai × Bi , with Ai ⊂ M and Bi ⊂ N,
is an algebra that generates the σ -algebra of M × N. Given partitions P and
Q of M and N, respectively, the family P × Q = {P × Q : P ∈ P and Q ∈ Q}
is a partition of M × N contained in A and such that hμ×ν (f × g, P × Q) =
hμ (f , P) + hν (g, Q). Conversely, given any partition R ⊂ A of M × N, there
exist partitions P and Q such that R ≺ P × Q and so hμ×ν (f × g, R) ≤
hμ (f , P) + hν (g, Q). Conclude using Exercise 9.2.6.
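9.3.3. It is clear that $f(B(x,n,\varepsilon))\subset B(f(x),n-1,\varepsilon)$, that is, $B(x,n,\varepsilon)\subset f^{-1}(B(f(x),n-1,\varepsilon))$. Since $\mu$ is invariant, $\mu(B(x,n,\varepsilon))\le\mu(B(f(x),n-1,\varepsilon))$; hence, $h_\mu(f,x)\ge h_\mu(f,f(x))$ for $\mu$-almost every $x$. On the other hand, $\int h_\mu(f,x)\,d\mu(x) = \int h_\mu(f,f(x))\,d\mu(x)$ since the measure $\mu$ is invariant under $f$.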
9.4.2. Use the following consequence of the Jordan canonical form: there exist numbers $\rho_1,\dots,\rho_l > 0$, there exists an $A$-invariant decomposition $\mathbb R^d = E_1\oplus\cdots\oplus E_l$ and, given $\alpha > 0$, there exists an inner product in $\mathbb R^d$ relative to which the subspaces $E_j$ are orthogonal and satisfy $e^{-\alpha}\rho_j\|v\|\le\|Av\|\le e^{\alpha}\rho_j\|v\|$ for every $v\in E_j$. Moreover, the $\rho_j$ are the absolute values of the eigenvalues of $A$ and they satisfy $\sum_{i=1}^{d}\log^+|\lambda_i| = \sum_{j=1}^{l}\dim E_j\,\log^+\rho_j$.
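The quantity $\sum_{i=1}^{d}\log^+|\lambda_i|$ is straightforward to evaluate. The numpy-based sketch below is an illustration, not taken from the text. It counts eigenvalues with multiplicity, so each $\rho_j$ is weighted by $\dim E_j$. For $A = \begin{pmatrix}2&1\\1&1\end{pmatrix}$ the result is $\log((3+\sqrt5)/2)$. For a Jordan block with eigenvalue 2, the single value $\rho = 2$ is counted twice.

```python
import numpy as np


def sum_log_plus(A):
    """sum_i log^+ |lambda_i| over the eigenvalues of A, with multiplicity."""
    eigenvalues = np.linalg.eigvals(np.asarray(A, dtype=float))
    return float(sum(max(0.0, np.log(abs(lam))) for lam in eigenvalues))


cat = [[2, 1], [1, 1]]
print(sum_log_plus(cat), np.log((3 + np.sqrt(5)) / 2))
```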
9.4.3. Consider any countable partition $\mathcal P$ with $\{B, B^c\}\prec\mathcal P$. Let $\mathcal Q$ be the restriction of $\mathcal P$ to the set $B$. Write $\mathcal P^n = \bigvee_{j=0}^{n-1}f^{-j}(\mathcal P)$ and $\mathcal Q^k = \bigvee_{j=0}^{k-1}g^{-j}(\mathcal Q)$. Check that, for every $x\in B$ and $k\ge 1$, there exists $n_k\ge 1$ such that $\mathcal Q^k(x) = \mathcal P^{n_k}(x)$. Moreover, by ergodicity, $\lim_k k/n_k = \tau(B,x) = \mu(B)$ for almost every $x$. By the theorem of Shannon–McMillan–Breiman,
$$h_\nu(g,\mathcal Q,x) = \lim_k -\frac1k\log\nu(\mathcal Q^k(x)) \quad\text{and}\quad h_\mu(f,\mathcal P,x) = \lim_k -\frac1{n_k}\log\mu(\mathcal P^{n_k}(x)).$$
Conclude that $h_\mu(f,\mathcal P,x) = \mu(B)\,h_\nu(g,\mathcal Q,x)$ for almost every $x\in B$. Varying $\mathcal P$, deduce that $h_\mu(f) = \mu(B)\,h_\nu(g)$.
9.5.2. Consider $A\in\bigcap_n f^{-n}(\mathcal B)$ with $m(A) > 0$. Then, for each $n$ there exists $A_n\in\mathcal B$ such that $A = f^{-n}(A_n)$. Consider the intervals $I_{j,n} = \big((j-1)/10^n, j/10^n\big)$. Then,
$$\frac{m(A\cap I_{j,n})}{m(I_{j,n})} = \frac{m(A_n)}{m((0,1))} = m(A)\quad\text{for every } 1\le j\le 10^n.$$
Making $n\to\infty$, conclude that $A^c$ has no points of density. Hence, $m(A^c) = 0$.


9.5.6. Assume that $h_\mu(f,\mathcal P) = 0$. Use Lemma 9.5.4 to show that $\mathcal P\prec\bigvee_{j=1}^{\infty}f^{-jk}(\mathcal P)$ for every $k\ge 1$. Deduce that, up to measure zero, $\mathcal P$ is contained in $f^{-k}(\mathcal B)$ for every $k\ge 1$. Conclude that the partition $\mathcal P$ is trivial.
9.6.1. Uniqueness is immediate. To prove the existence, consider the functional $\Phi$ defined by $\Phi(\psi) = \int\!\int\psi\,d\eta\,dW(\eta)$ in the space of bounded measurable functions $\psi: M\to\mathbb R$. Note that $\Phi$ is linear and non-negative and satisfies $\Phi(1) = 1$. Use the monotone convergence theorem to show that if $B_n$, $n\ge 1$, are pairwise disjoint measurable subsets of $M$ then $\Phi(\mathcal X_{\bigcup_n B_n}) = \sum_n\Phi(\mathcal X_{B_n})$. Conclude that $\xi(B) = \Phi(\mathcal X_B)$ defines a probability measure in the $\sigma$-algebra of measurable subsets of $M$. Show that $\int\psi\,d\xi = \Phi(\psi)$ for every bounded measurable function. Take $\operatorname{bar}(W) = \xi$.
9.6.5. For the penultimate identity one would need to know that $n^{-1}\log\mu_{\mathcal P}(\mathcal Q^n(x))$ is
a dominated sequence (for example).
9.7.4. By Exercise 9.7.3, given any bounded measurable function $\psi$,
$$\int(\psi\circ f)\,d\eta = \int\psi(x)\sum_{z\in f^{-1}(x)}\frac{1}{J_\eta f(z)}\,d\eta(x).$$

Deduce the first part of the statement. For the second part, note that if
η is invariant then η(f (A)) = η(f −1 (f (A))) ≥ η(A) for every domain of
invertibility A.
9.7.7. The “if” part of the statement is easy: we may exhibit the ergodic equivalence
explicitly. Assume that the two systems are ergodically equivalent. The fact
that k = l follows from Exercise 8.1.2. To prove that p and q are permutations
of one another, use the fact that the Jacobian is invariant under ergodic
equivalence (Exercise 9.7.6), together with the expressions of the Jacobians
given by Example 9.7.1.
10.1.6. Note that ψ(M) is compact and the inverse ψ −1 : ψ(M) → M is (uniformly)
continuous. Hence, given ε > 0 there exists δ > 0 such that if E ⊂ M is
(n, ε)-separated for f then ψ(E) ⊂ N is (n, δ)-separated for g. Conclude that
s(f , ε, M) ≤ s(g, δ, N) and deduce that h(f ) ≤ h(g). [Observation: The statement
remains valid in the non-compact case, as long as we assume the inverse
ψ −1 : ψ(M) → M to be uniformly continuous.]
For the second part, consider the distance defined in $[0,1]^{\mathbb Z}$ by
$$d\big((x_n)_n,(y_n)_n\big) = \sum_{n\in\mathbb Z}2^{-|n|}|x_n - y_n|.$$

Consider a discrete set A ⊂ [0, 1] with n elements. Check that the restriction to
AZ of the distance of [0, 1]Z is uniformly equivalent to the distance defined in
(9.2.15). Using Example 10.1.2, conclude that the topological entropy of σ is
at least log n, for any n.
10.1.10. (Carlos Gustavo Moreira) Let θ1 = 0, θ2 = 01 and, for n ≥ 2, θn+1 = θn θn−1 . We
claim that, for every n ≥ 1, there exists a word τn such that θn θn+1 = τn αn and
θn+1 θn = τn βn , where αn = 10 and βn = 01 if n is even and αn = 01 and βn = 10
if n is odd. That holds for n = 1 with τ1 = 0 and for n = 2 with τ2 = 010. If it
holds for a given n, then θn+1 θn+2 = θn+1 θn+1 θn = θn+1 τn βn = τn+1 αn+1 and also
θn+2 θn+1 = θn+1 θn θn+1 = θn+1 τn αn = τn+1 βn+1 , as long as we take τn+1 = θn+1 τn .
This proves the claim. It follows that the last letters of θn and θn+1 are distinct.
Now, we claim that θ = limn θn is not pre-periodic. Indeed, suppose that θ
were pre-periodic and let m be its period. Since the length of θn is Fn+1 (where
Fk is the k-th Fibonacci number), we may take n large such that m divides
Fn+1 and the pre-period (that is, the length of the non-periodic part) of θ is less
than Fn+2 . Then, θ starts with θn+3 = θn+2 θn+1 = θn+1 θn θn+1 . However, since
the length Fn+1 of θn is a multiple of the period m, the Fn+2 -th letter of θ ,
which is the last letter of θn+1 , must coincide with the (Fn+2 + Fn+1 )-th letter
of θ , which is the last letter of θn . This would contradict the conclusion of the
previous paragraph.

Next, we claim that ck+1 (θ ) > ck (θ ) for every k. Indeed, suppose that
ck+1 (θ ) = ck (θ ) for some k. Then, every subword of length k can have only
one continuation of length k + 1. Hence, we have a transformation in the set
of subwords of length k, assigning to each subword its unique continuation,
without the first letter. Since the domain is finite, all the orbits of this
transformation are pre-periodic. In particular, θ is also pre-periodic, which
contradicts the conclusion in the previous paragraph.
Since c1 (θ ) = 2, it follows that ck (θ ) ≥ k + 1 for every k. We claim that
cFn+1 (θ ) ≤ Fn+1 + 1 for every n > 1. To prove that fact, note that θ may
be written as a concatenation of words belonging to {θn , θn+1 } because (by
induction) every θr with r ≥ n may be written as a concatenation of words
belonging to {θn , θn+1 }. Thus, any subword of θ of length Fn+1 (which is the
length of θn ) is a subword of θn θn+1 or θn+1 θn . Since θn θn+1 = θn θn θn−1 is a
subword of θn θn θn−1 θn−2 = θn θn θn , there are at most |θn | = Fn+1 subwords of
length |θn | = Fn+1 of θn θn θn and, hence, of θn θn+1 . Since θn θn+1 = τn αn and
θn+1 θn = τn βn , and θn+1 θn ends with θn and |βn | = 2, the unique subword
of θn+1 θn of length |θn | = Fn+1 that may not be a subword of θn θn+1 is the
subword that ends with the first letter of βn (that is, one position before the end
of θn+1 θn ). Hence, cFn+1 (θ ) ≤ Fn+1 + 1 as stated.
We are ready to obtain the statement of the exercise. Assume that ck (θ ) >
k+1 for some k. Taking n such that Fn+1 > k, we would have cFn+1 (θ )−ck (θ ) <
Fn+1 + 1 − (k + 1) = Fn+1 − k and that would imply that cm+1 (θ ) ≤ cm (θ )
for some m with k ≤ m < Fn+1 . This would contradict the conclusion in the
previous paragraph.
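The combinatorial claims in this solution are easy to test on finite words. The sketch below is an illustration with hypothetical function names. It builds $\theta_n$ from $\theta_1 = 0$, $\theta_2 = 01$ and $\theta_{n+1} = \theta_n\theta_{n-1}$. It then checks that $\theta_n\theta_{n+1} = \tau_n\alpha_n$ and $\theta_{n+1}\theta_n = \tau_n\beta_n$, with $\alpha_n$ and $\beta_n$ as in the first paragraph. Finally, it counts the distinct subwords of each length $k$ in a long prefix of $\theta$; each $\theta_n$ is a prefix of $\theta_{n+1}$, so this prefix is a prefix of $\theta$. If the conclusion $c_k(\theta) = k + 1$ holds, the count should be $k + 1$ for each $k$.

```python
def theta(n):
    """theta_1 = '0', theta_2 = '01', theta_{n+1} = theta_n theta_{n-1}."""
    if n == 1:
        return "0"
    previous, current = "0", "01"
    for _ in range(n - 2):
        previous, current = current, current + previous
    return current


def check_tau_claim(n):
    """theta_n theta_{n+1} = tau_n alpha_n and theta_{n+1} theta_n = tau_n beta_n,
    with (alpha_n, beta_n) = ('01', '10') for odd n and ('10', '01') for even n."""
    u = theta(n) + theta(n + 1)
    v = theta(n + 1) + theta(n)
    alpha, beta = ("01", "10") if n % 2 == 1 else ("10", "01")
    return u[:-2] == v[:-2] and u[-2:] == alpha and v[-2:] == beta


def complexity(word, k):
    """Number of distinct subwords of length k occurring in word."""
    return len({word[i:i + k] for i in range(len(word) - k + 1)})


prefix = theta(20)  # length F_21 = 10946
print([complexity(prefix, k) for k in range(1, 11)])
```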
10.2.4. By Proposition 10.2.1, h(f ) = g(f , δ, M) whenever f is ε-expansive and δ < ε/2.
Show that if d(f , h) < δ/3 then g(h, δ/3, M) ≤ g(f , δ, M). Deduce that if (fk )k
converges to f then lim supk h(fk ) = lim supk g(fk , δ/3, M) ≤ g(f , δ, M) = h(f ).
10.2.8. (Bowen [Bow72]) Write $a = g^*(f,\varepsilon)$. Observe that if $E$ is an $(n,\delta)$-generating set of $M$, with $\delta < \varepsilon$, then $M = \bigcup_{x\in E}B(x,n,\varepsilon)$. Combining this fact with the result of Bowen, show that $g_n(f,\delta,M)\le\#E\,e^{c+(a+b)n}$. Take $b\to 0$ to conclude the inequality.
10.3.3. (a) The hypothesis implies that for every $n$ and every subcover $\delta$ of $\beta^n$ there exists a subcover $\gamma$ of $\alpha^n$ such that $\gamma\prec\delta$. Taking $\gamma$ minimal, $\#\gamma\le\#\delta$ and $\sum_{U\in\gamma}\inf_{x\in U}e^{\phi_n(x)}\le\sum_{V\in\delta}\inf_{y\in V}e^{\phi_n(y)}$. It follows that $Q_n(f,\phi,\alpha)\le Q_n(f,\phi,\beta)$ for every $n$. (b) Lemma 10.1.11 gives that $\alpha^{n+k-1}$ is a subcover of $(\alpha^k)^n$. A variation of the argument in part (a) gives that $Q_n(f,\phi,\alpha^k)\le e^{(k-1)\sup|\phi|}Q_{n+k-1}(f,\phi,\alpha)$ for every $n$. Hence, $Q_\pm(f,\phi,\alpha^k)\le Q_\pm(f,\phi,\alpha)$. [Observation: Analogously, $P(f,\phi,\alpha^k)\le P(f,\phi,\alpha)$.] By the second part of Lemma 10.1.11, for every subcover $\beta$ of $(\alpha^k)^n$ there exists a subcover $\gamma$ of $\alpha^{n+k-1}$ such that $\gamma\prec\beta$, $\#\gamma\le\#\beta$ and $\sum_{U\in\gamma}\inf_{x\in U}e^{\phi_{n+k-1}(x)}\le e^{(k-1)\sup|\phi|}\sum_{V\in\beta}\inf_{y\in V}e^{\phi_n(y)}$ (taking $\gamma$ minimal). Deduce that $Q_{n+k-1}(f,\phi,\alpha)\le e^{(k-1)\sup|\phi|}Q_n(f,\phi,\alpha^k)$. Hence, $Q_\pm(f,\phi,\alpha)\le Q_\pm(f,\phi,\alpha^k)$. (c) Follows from part (b) and Corollary 10.3.3. (d) If the elements of $\alpha$ are disjoint then $(\alpha^k)^n = \alpha^{n+k-1}$ and so
$$P_n(f,\phi,\alpha^k) = \inf\Big\{\sum_{U\in\gamma}\sup_{x\in U}e^{\phi_n(x)} : \gamma\subset(\alpha^k)^n\Big\} = \inf\Big\{\sum_{U\in\gamma}\sup_{x\in U}e^{\phi_n(x)} : \gamma\subset\alpha^{n+k-1}\Big\}.$$
It follows that
$$e^{-(k-1)\sup|\phi|}P_n(f,\phi,\alpha^k)\le P_{n+k-1}(f,\phi,\alpha)\le e^{(k-1)\sup|\phi|}P_n(f,\phi,\alpha^k).$$
(e) Follows from part (d) and the definition of pressure (Lemma 10.3.1). (f) Note that $\alpha^{\pm k} = f^k(\alpha^{2k})$ and use Exercise 10.3.2.
10.3.8. Show that given $\varepsilon > 0$ there exists $\kappa\ge 1$ such that every dynamical ball $B(x,n,\varepsilon)$ has diameter equal to $\varepsilon 2^{-n}$ and contains some periodic point $p^x_n$ of period $n+\kappa$. Show that given $C, \theta > 0$ there exists $K > 0$ such that $|\phi_n(y) - \phi_n(p^x_n)|\le K$ for every $y\in B(x,n,\varepsilon)$, every $n\ge 1$ and every $(C,\theta)$-Hölder function $\phi: S^1\to\mathbb R$. Use this fact to replace generating (or separated) sets by sets of periodic points in the definition of pressure.
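10.3.10. Since $\xi\mid\Lambda^c = \eta\mid\Lambda^c$,
$$\Big|E(\xi,\eta) - \mathcal E_\Lambda(\xi\mid\Lambda) + \mathcal E_\Lambda(\eta\mid\Lambda)\Big| = \Big|\sum_{(k,l)\in\Lambda\times\Lambda^c\cup\Lambda^c\times\Lambda}\Phi(k-l,\xi_k,\xi_l) - \Phi(k-l,\eta_k,\eta_l)\Big| \le \sum_{(k,l)\in\Lambda\times\Lambda^c\cup\Lambda^c\times\Lambda}2Ke^{-\theta|k-l|}.$$
Recalling that $\Lambda$ is an interval, the cardinal of $\{(k,l)\in\Lambda\times\Lambda^c\cup\Lambda^c\times\Lambda : |k-l| = n\}$ is less than or equal to $4n$ for every $n\ge 1$. Hence,
$$\Big|E(\xi,\eta) - \mathcal E_\Lambda(\xi\mid\Lambda) + \mathcal E_\Lambda(\eta\mid\Lambda)\Big| \le \sum_{n=1}^{\infty}8Kne^{-\theta n} < \infty.$$
The second part of the statement is an immediate consequence of the first one.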
10.4.4. Consider the shift map $\sigma$ in the space $\Sigma = \{0,1\}^{\mathbb N}$. Consider the function $\varphi:\Sigma\to\mathbb R$ defined by $\varphi(x) = 0$ if $x_0 = 0$ and $\varphi(x) = 1$ if $x_0 = 1$. Let $N$ be the set of points $x\in\Sigma$ such that the time average of $\varphi$ along the orbit of $x$ does not converge. Check that $N$ is invariant under $\sigma$ and is non-empty: for each finite sequence $(z_0,\dots,z_k)$ one can find $x\in N$ with $x_i = z_i$ for $i = 0,\dots,k$. Deduce that the topological entropy of the restriction $f\mid N$ is equal to $\log 2$. Justify that $N$ does not support any probability measure invariant under $f$.
10.4.5. Consider the open cover $\xi$ of $K$ whose elements are $K\cap[0,\alpha]$ and $K\cap[1-\beta,1]$. Check that $P(f,\varphi) = P(f,\varphi,\xi)$ for every potential $\varphi$. Moreover,
$$P_n(f,-t\log g',\xi) = \sum_{U\in\xi^n}\big[(g^n)'(U)\big]^{-t} = (\alpha^t+\beta^t)^n.$$
Conclude that $\psi(t) = \log(\alpha^t+\beta^t)$. Check that $\psi' < 0$ and $\psi'' > 0$ (convexity also follows from Proposition 10.3.7). Moreover, $\psi(0) > 0 > \psi(1)$. By the variational principle, the last inequality implies that $h_\mu(f) - \int\log g'\,d\mu < 0$.
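Since $\psi$ is continuous and strictly decreasing, with $\psi(0) = \log 2 > 0 > \log(\alpha+\beta) = \psi(1)$ (here $\alpha+\beta < 1$), it has a unique zero $t_0\in(0,1)$, which bisection locates. The sketch below is only illustrative and the function names are hypothetical. For $\alpha = \beta = 1/3$ the zero is $\log 2/\log 3$, because $2\cdot 3^{-t} = 1$ exactly when $t = \log 2/\log 3$.

```python
import math


def psi(t, alpha, beta):
    """psi(t) = log(alpha**t + beta**t), as obtained in the hint."""
    return math.log(alpha ** t + beta ** t)


def zero_of_psi(alpha, beta, tol=1e-12):
    """Bisection on [0, 1]; assumes alpha, beta > 0 and alpha + beta < 1,
    so that psi(0) = log 2 > 0 > psi(1) and psi is strictly decreasing."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if psi(mid, alpha, beta) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(zero_of_psi(1 / 3, 1 / 3), math.log(2) / math.log(3))
print(zero_of_psi(0.2, 0.5))
```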
10.5.3. The Gibbs property gives that limn (1/n) log μ(Cn (x)) = ϕ̃(x) − P, where
Cn (x) is the cylinder of length n that contains x. Combine this identity with
the theorem of Brin–Katok (Theorem 9.3.3) and the theorem of Birkhoff to
get the first claim. Now assume that μ1 and μ2 are two ergodic Gibbs states
with the same constant P. Observe that there exists C such that C−1 μ1 (A) ≤
μ2 (A) ≤ Cμ1 (A) for every A in the algebra formed by the finite disjoint unions
of cylinders. Using the monotone class theorem (Theorem A.1.18), deduce
that $C^{-1}\mu_1(A)\le\mu_2(A)\le C\mu_1(A)$ for any measurable set $A$. This implies that $\mu_1$ and $\mu_2$ are equivalent measures. Using Lemma 4.3.1, it follows that $\mu_1 = \mu_2$.
10.5.5. By Proposition 10.3.7, the pressure function is convex. By Exercise A.5.1, it
follows that it is also continuous. By the smoothness theorem of Mazur (recall
Exercise 2.3.7), there exists a residual subset R ⊂ C0 (M) such that the pressure
function is differentiable at every ϕ ∈ R. Apply Exercise 10.5.4.
11.1.3. Adapt the arguments in Section 9.4.2, as follows. Start by checking that the
iterates of $f$ have bounded distortion: there exists $K > 1$ such that
$$\frac1K\le\frac{|Df^n(x)|}{|Df^n(y)|}\le K,$$
for every $n\ge 1$ and any points $x, y$ with $\mathcal P^n(x) = \mathcal P^n(y)$. Consider the sequence $\mu_n = (1/n)\sum_{j=0}^{n-1}f^j_*m$ of averages of the iterates of the Lebesgue
measure m. Show that the Radon–Nikodym derivatives dμn /dm are uniformly
bounded and are Hölder, with uniform Hölder constants. Deduce that every
accumulation point μ of that sequence is an invariant probability measure
absolutely continuous with respect to the Lebesgue measure. Show that the
Radon–Nikodym derivative ρ = dμ/dm is bounded from zero and infinity (in
other words, log ρ is bounded). Show that ρ and log ρ are Hölder.
11.1.5. Check that $J_\mu f = (\rho\circ f)|f'|/\rho$ and use the Rokhlin formula (Theorem 9.7.3).
11.2.4. Take $\Lambda = \{2^{-n} : n\ge 0\}$ mod $\mathbb Z$. The restriction $f:\Lambda\to\Lambda$ cannot be an expanding map because $1/2$ is an isolated point in $\Lambda$ but $1 = f(1/2)$ is not. [Observation: Note that $\Lambda = S^1\setminus\bigcup_{n=0}^{\infty}f^{-n}(I)$, where $I = (1/2, 1)$ mod $\mathbb Z$. Modifying suitably the choice of $I$, one finds many other examples, possibly with $\Lambda$ uncountable.]
11.3.3. Let $a = \int\varphi\,d\mu_1$ and $b = \int\varphi\,d\mu_2$. Assume that $a < b$ and write $r = (b-a)/5$.
By the ergodic decomposition theorem, we may assume that μ1 and μ2 are
ergodic. Then, there exist x1 and x2 such that ϕ̃(x1 ) = a and ϕ̃(x2 ) = b. Using
the hypothesis that f is topologically exact, construct a pseudo-orbit (zn )n≥0
alternating (long) segments of the orbits of x1 and x2 in such a way that the
sequence of time averages of ϕ along the pseudo-orbit (zn )n oscillates from
a + r to b − r (meaning that lim inf ≤ a + r and lim sup ≥ b − r). Next, use the
shadowing lemma to find x ∈ M whose orbit shadows this pseudo-orbit. Using
that ϕ is uniformly continuous, conclude that the sequence of time averages of
ϕ along the orbit of x oscillates from a + 2r to b − 2r.
12.1.2. Theorem 2.1.5 gives that $\mathcal M_1(M)$ is weak$^*$ compact and it is clear that it is convex. Check that the operator $\mathcal L: C^0(M)\to C^0(M)$ is continuous and deduce that its dual $\mathcal L^*:\mathcal M(M)\to\mathcal M(M)$ is also continuous. If $(\eta_n)_n\to\eta$ in the weak$^*$ topology then $\big(\int\mathcal L1\,d\eta_n\big)_n\to\int\mathcal L1\,d\eta$. Conclude that the operator $G:\mathcal M_1(M)\to\mathcal M_1(M)$ is continuous. Hence, by the Tychonoff–Schauder theorem, $G$ has some fixed point $\nu$. This means that $\mathcal L^*\nu = \lambda\nu$, where $\lambda = \int\mathcal L1\,d\nu$. Since $\lambda > 0$, this proves that $\nu$ is a reference measure. Using Corollary 12.1.9, check that $\lambda = \limsup_n\sqrt[n]{\|\mathcal L^n1\|}$ and deduce that $\lambda$ is the spectral radius of $\mathcal L$.
12.2.4. Fix in $S^1$ the orientation induced by $\mathbb R$. Consider the fixed point $p_0 = 0$ of $f$ and let $p_1,\dots,p_d$ be its pre-images, ordered cyclically, with $p_d = p_0$. Analogously, let $q_0$ be a fixed point of $g$ and $q_1,\dots,q_d$ be its pre-images, ordered cyclically, with $q_d = q_0$. Note that $f$ maps each $[p_{i-1},p_i]$ and $g$ maps each $[q_{i-1},q_i]$ onto $S^1$. Then, for each sequence $(i_n)_n\in\{1,\dots,d\}^{\mathbb N}$ there exists exactly one point $x\in S^1$ and one point $y\in S^1$ such that $f^n(x)\in[p_{i_n-1},p_{i_n}]$ and $g^n(y)\in[q_{i_n-1},q_{i_n}]$ for every $n$. Clearly, the maps $(i_n)_n\mapsto x$ and $(i_n)_n\mapsto y$ are surjective. Consider two sequences $(i_n)_n$ and $(j_n)_n$ to be equivalent if there exists $N\in\mathbb N\cup\{\infty\}$ such that (1) $i_n = j_n$ for every $n\le N$ and either (2a) $i_n = 1$ and $j_n = d$ for every $n > N$ or (2b) $i_n = d$ and $j_n = 1$ for every $n > N$. Show that the points $x$ corresponding to $(i_n)_n$ and $(j_n)_n$ coincide if and only if the two sequences are equivalent and a similar fact holds for the points $y$ corresponding to the two sequences. Conclude that the map $\phi: x\mapsto y$ is well defined and is a bijection in $S^1$ such that $\phi(f(x)) = g(\phi(x))$ for every $x$. Observe that $\phi$ preserves the orientation of $S^1$ and, thus, is a homeomorphism.
12.2.5. (a) ⇒ (b): Trivial. (b) ⇒ (c): Let $\mu_a$ be the absolutely continuous invariant probability measure and $\mu_m$ be the measure of maximum entropy of $f$; let $\nu_a$ and $\nu_m$ be the corresponding measures for $g$. Show that $\mu_a = \mu_m$. Let $\phi: S^1\to S^1$ be a topological conjugacy. Show that $\nu_m = \phi_*\mu_m$ and $\nu_a = \phi_*\mu_a$ if $\phi$ is absolutely continuous. Use Corollary 12.2.4 to conclude that in the latter case $|(g^n)'(x)| = k^n$ for every $x\in\operatorname{Fix}(g^n)$. (c) ⇒ (a): The hypothesis implies that $\nu_a = \nu_m$ and so $\nu_a = \phi_*\mu_a$. Recall (Proposition 12.1.20) that the densities $d\mu_a/dm$ and $d\nu_a/dm$ are continuous and bounded from zero and infinity. Conclude that $\phi$ is differentiable, with $\phi' = (d\mu_a/dm)\big/\big((d\nu_a/dm)\circ\phi\big)$.
12.3.2. Consider $A = (a,1)$, $P = (p,1)$, $Q = (q,1)$, $B = (b,1)$, $O = (0,0)\in C\times\mathbb R$. Let $A'$ (respectively, $B'$) be the point where the line parallel to $OQ$ (respectively, $OP$) passing through $P$ (respectively, $Q$) intersects the boundary of $C$. Note that all these points belong to the plane determined by $P$, $Q$ and $O$; note also that $A'\in OA$ and $B'\in OB$. By definition, $\alpha(P,Q) = |B'Q|/|OP|$ and $\beta(P,Q) = |OQ|/|A'P|$. Check that $|AP|/|AQ| = |A'P|/|OQ|$ and $|BQ|/|BP| = |B'Q|/|OP|$. Hence,
$$\theta(P,Q) = \log\frac{\beta(P,Q)}{\alpha(P,Q)} = \log\frac{|OQ|\,|OP|}{|A'P|\,|B'Q|} = \log\frac{|AQ|\,|BP|}{|AP|\,|BQ|}.$$
In other words, $d(p,q) = \log\big((|aq|\,|bp|)/(|ap|\,|bq|)\big) = \theta(P,Q)$, for any $p, q\in D$.
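In coordinates along the line through $p$ and $q$, the distance $d(p,q) = \log\big((|aq|\,|bp|)/(|ap|\,|bq|)\big)$ is the logarithm of a cross-ratio. The small sketch below is an illustration, and its function name is hypothetical. It evaluates this expression for points $a, p, q, b$ lying on a line. It also checks a property that can be verified directly from the formula: the distance is additive along the segment, so $d(p,r) = d(p,q) + d(q,r)$ when $p, q, r$ lie in this order. An absolute value is added so that the function is symmetric in $p$ and $q$.

```python
import math


def cross_ratio_distance(a, b, p, q):
    """|log(|aq| |bp| / (|ap| |bq|))| for points p, q strictly inside the chord (a, b).

    Points are real coordinates along the line through a and b.
    """
    if not (a < min(p, q) and max(p, q) < b):
        raise ValueError("p and q must lie strictly inside (a, b)")
    return abs(math.log((abs(q - a) * abs(b - p)) / (abs(p - a) * abs(b - q))))


d = cross_ratio_distance
print(d(0, 1, 0.2, 0.5) + d(0, 1, 0.5, 0.9), d(0, 1, 0.2, 0.9))
```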
12.3.4. Consider the cone C0 of positive continuous functions in M. The corresponding
projective distance θ0 is given in Example 12.3.5. Check that θ1 is the
restriction of θ0 to the cone C1 . Consider a sequence (gn )n of positive differentiable
functions converging uniformly to a (continuous but) non-differentiable
positive function g0 . Show that (gn )n converges to g0 with respect to the distance θ0
and, thus, is a Cauchy sequence for θ0 and θ1 . Argue that (gn )n cannot be
convergent for θ1 .
12.3.8. (a) It is clear that $\log g$ is $(b,\beta)$-Hölder and $\sup g/\inf g$ is close to 1 if the norm $\|v\|_{\beta,\rho}$ is small; this will be implicit in all that follows. Then, $g\in C(b,\beta,R)$. To estimate $\theta(1,g)$, use the expression given by Lemma 12.3.8. Observe that
$$\beta(1,g) = \sup\Big\{g(x),\ \frac{\exp(b\delta)g(x) - g(y)}{\exp(b\delta) - 1} : x\ne y,\ d(x,y) < \rho\Big\}\quad\text{where } \delta = d(x,y)^\beta.$$
Clearly, $g(x)\le 1 + \sup|v|$. Moreover,
$$\frac{\exp(b\delta)g(x) - g(y)}{\exp(b\delta) - 1} \le \frac{\exp(b\delta)g(y) + \exp(b\delta)H_{\beta,\rho}(v)\delta - g(y)}{\exp(b\delta) - 1} = g(y) + \frac{\delta\exp(b\delta)}{\exp(b\delta) - 1}H_{\beta,\rho}(v).$$
Take $K_1 > K_2 > 0$, depending only on $b$, $\beta$, $\rho$, such that $K_1\ge\exp(bs)s/(\exp(bs) - 1)\ge K_2$ for every $s\in[0,\rho^\beta]$. Then, the term on the right-hand side of the previous inequality is bounded by $1 + \sup|v| + K_1H_{\beta,\rho}(v)$. Hence, $\log\beta(1,g)\le\log(1 + \sup|v| + K_1H_{\beta,\rho}(v))\le K_1'\|v\|_{\beta,\rho}$, where $K_1' = \max\{K_1, 1\}$. Varying $x$ and $y$ in the previous arguments, we also find that $\beta(1,g)\ge 1 + \sup|v|$ and $\beta(1,g)\ge 1 - \sup|v| + K_2H_{\beta,\rho}(v)$. Deduce that
$$\log\beta(1,g)\ge\max\big\{\log(1 + \sup|v|),\ \log(1 - \sup|v| + K_2H_{\beta,\rho}(v))\big\}\ge K_2'\|v\|_{\beta,\rho},$$
where the constant $K_2'$ depends only on $K_2$, $\beta$ and $\rho$. Analogously, there exist constants $K_3 > K_4 > 0$ such that $-K_3\|v\|_{\beta,\rho}\le\log\alpha(1,g)\le -K_4\|v\|_{\beta,\rho}$. Fixing
$$K\ge\max\{K_1' + K_3,\ 1/(K_2' + K_4)\},$$
it follows that $K^{-1}\|v\|_{\beta,\rho}\le\theta(1,g)\le K\|v\|_{\beta,\rho}$. (b) It is no restriction to assume that $\|v\|_{\beta,\rho} < r$. Note that $P^ng = 1 + P^nv$ for every $n$. Corollary 12.3.12 gives that
$$\theta(P^{kN}g, 1)\le\Lambda_0^k\,\theta(1,g)\quad\text{for every } k,$$
with $\Lambda_0 < 1$. By part (a), it follows that $\|P^{kN}v\|_{\beta,\rho}\le K^2\Lambda_0^k\|v\|_{\beta,\rho}$ for every $k$. This yields the statement, with $\tau = \Lambda_0^{1/N}$ and $C = K^2\|P\|^{N-1}\Lambda_0^{-1}$.
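12.4.4. Consider $0 < \delta\le\rho$. For every cover $\mathcal U$ of $A$ with diameter less than $\delta$, we have
$$\sum_{U\in\mathcal U}(\operatorname{diam}U)^d\ge K^{-1}\sum_{U\in\mathcal U}\mu(U)\ge K^{-1}\mu(A).$$
Taking the infimum over $\mathcal U$, we get that $m_d(A,\delta)\ge K^{-1}\mu(A)$. Making $\delta\to 0$, we find that $m_d(A)\ge K^{-1}\mu(A) > 0$; hence, $d(A)\ge d$.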
12.4.7. Consider first the one-dimensional case. Then, $D, D_1,\dots,D_N$ (Section 12.4.3) are compact intervals. It is no restriction to assume that $D = [0,1]$. Write $D_{i^n} = h_{i_0}\circ\cdots\circ h_{i_{n-1}}(D)$ for each $i^n = (i_0,\dots,i_{n-1})$ in $\{1,\dots,N\}^n$. Starting from the bounded distortion property (Proposition 12.4.5), prove that there exists $c > 0$ such that, for every $i^n$ and every $n$,
(i) $c\le|(f^n)'(x)|\operatorname{diam}D_{i^n}\le c^{-1}$ for every $x\in D_{i^n}$;
(ii) $d(D_{i^n}, D_{j^n})\ge c\operatorname{diam}D_{i^n}$ for every $j^n\ne i^n$;
(iii) $\operatorname{diam}D_{i^{n+1}}\ge c\operatorname{diam}D_{i^n}$ for every $i_n$, where $i^{n+1} = (i_0,\dots,i_{n-1},i_n)$.
Let $\nu$ be the reference measure of the potential $\varphi = -d_0\log|f'|$. Since $P(f,\varphi) = 0$, it follows from Lemma 12.1.3 and Corollary 12.1.15 that $J_\nu f = |f'|^{d_0}$. Deduce that $c\le|(f^n)'(x)|^{d_0}\nu(D_{i^n})\le c^{-1}$ for any $x\in D_{i^n}$ and, using (i) once more, conclude that
$$c^2\le\frac{\operatorname{diam}(D_{i^n})^{d_0}}{\nu(D_{i^n})}\le c^{-2}\quad\text{for every } i^n \text{ and every } n.$$
It follows that $\sum_{i^n}\operatorname{diam}(D_{i^n})^{d_0}\le c^{-2}\sum_{i^n}\nu(D_{i^n}) = c^{-2}$. Since the diameter of $D_{i^n}$ converges uniformly to zero when $n\to\infty$, this implies that $m_{d_0}(\Lambda)\le c^{-2}$.
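For the lower estimate, let us prove that $\nu$ satisfies the hypothesis of the
mass distribution principle (Exercise 12.4.4). Given any $U$ with $\operatorname{diam} U <
c \min\{\operatorname{diam} D_1, \dots, \operatorname{diam} D_N\}$, there exist $n \ge 1$ and $i^n$ such that $D_{i^n}$ intersects $U$
and $c \operatorname{diam} D_{i^n} > \operatorname{diam} U$. Take $n$ maximum. By (ii), $U$ does not intersect any $D_{j^n}$
with $j^n \ne i^n$, and so $\nu(U) \le \nu(D_{i^n}) \le c^{-2} (\operatorname{diam} D_{i^n})^{d_0}$. By the maximality of $n$
and (iii), $\operatorname{diam} U \ge c \operatorname{diam} D_{i^{n+1}} \ge c^2 \operatorname{diam} D_{i^n}$
for some choice of $i_n$. Combining the two inequalities, we get $\nu(U) \le
c^{-2-2d_0} (\operatorname{diam} U)^{d_0}$. Then, by the mass distribution principle, $m_{d_0}(\Lambda) \ge c^{2+2d_0}$.
Finally, extend these arguments to any dimension.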
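For intuition about the exponent $d_0$, consider the simplest special case. This case is an assumption beyond the exercise: each inverse branch $h_i$ is an affine contraction of $D = [0, 1]$ with ratio $r_i$. Then $|f'|$ is constant on each $D_i$ and the potential $-d\log|f'|$ is locally constant, so $P(f, -d\log|f'|) = \log\sum_i r_i^d$. Hence $d_0$ solves $\sum_i r_i^{d_0} = 1$. Moreover, $\operatorname{diam} D_{i^n} = r_{i_0}\cdots r_{i_{n-1}}$, so the cover sum from the upper estimate equals $(\sum_i r_i^{d_0})^n = 1$ at every level $n$. Below is a minimal sketch; the function names are hypothetical.

```python
import itertools
import math


def pressure(ratios, d):
    """log sum_i r_i^d: the pressure of -d log|f'| for affine branches (assumed setting)."""
    return math.log(sum(r ** d for r in ratios))


def moran_dimension(ratios, tol=1e-12):
    """Zero d_0 of the strictly decreasing map d -> pressure(ratios, d), by bisection.
    Assumes at least two ratios, each in (0, 1), so that pressure(ratios, 0) > 0."""
    lo, hi = 0.0, 1.0
    while pressure(ratios, hi) > 0:   # enlarge the bracket if needed
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if pressure(ratios, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def cover_sum(ratios, d, n):
    """Sum over words i^n of (diam D_{i^n})^d, where diam D_{i^n} = r_{i_0} ... r_{i_{n-1}}."""
    return sum(math.prod(word) ** d for word in itertools.product(ratios, repeat=n))


d_cantor = moran_dimension([1 / 3, 1 / 3])   # middle-third Cantor set
d_other = moran_dimension([0.2, 0.5])
print(d_cantor, math.log(2) / math.log(3))
print(cover_sum([0.2, 0.5], d_other, 8))     # equals (0.2**d + 0.5**d)**8, close to 1
```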
A.1.9. Given $A_1 \supset \cdots \supset A_i \supset \cdots$, take $A = \bigcap_{i=1}^{\infty} A_i$. For $j \ge 1$, consider $A_j' = A_j \setminus A$.
By Theorem A.1.14, we have that $\mu(A_j') \to 0$ and so $\mu(A_j) \to \mu(A)$. Given
$A_1 \subset \cdots \subset A_i \subset \cdots$, take $A = \bigcup_{i=1}^{\infty} A_i$. For each $j$, consider $A_j' = A \setminus A_j$. By
Theorem A.1.14, we have that $\mu(A_j') \to 0$, that is, $\mu(A_j) \to \mu(A)$.
A.1.13. (Royden [Roy63]) (b) ⇒ (a): Assume that there exist Borel sets $B_1, B_2$ such that
$B_1 \subset E \subset B_2$ and $m(B_2 \setminus B_1) = 0$. Deduce that $m^*(E \setminus B_1) = 0$, hence $E \setminus B_1$ is
a Lebesgue measurable set. Conclude that $E$ is a Lebesgue measurable set. (a)
⇒ (c): Let $E$ be a Lebesgue measurable set such that $m^*(E) < \infty$. Given $\varepsilon > 0$,
there exists a cover by open rectangles $(R_k)_k$ such that $\sum_k m^*(R_k) < m^*(E) + \varepsilon$.
Then, $A = \bigcup_k R_k$ is an open set containing $E$ and such that $m^*(A) - m^*(E) < \varepsilon$.
Using that $E$ is a Lebesgue measurable set, deduce that $m^*(A \setminus E) < \varepsilon$. For
the general case, write $E$ as a disjoint union of Lebesgue measurable sets with
finite exterior measure. (c) ⇔ (d): It is clear that $E$ is a Lebesgue measurable
set if and only if its complement is. (c) and (d) ⇒ (b): For each $k \ge 1$, consider
a closed set $F_k \subset E$ and an open set $A_k \supset E$ such that $m^*(E \setminus F_k)$ and $m^*(A_k \setminus E)$
are less than $1/k$. Then, $B_1 = \bigcup_k F_k$ and $B_2 = \bigcap_k A_k$ are Borel sets such that
$B_1 \subset E \subset B_2$ and $m^*(E \setminus B_1) = m^*(B_2 \setminus E) = 0$. Conclude that $m(B_2 \setminus B_1) =
m^*(B_2 \setminus B_1) = 0$.
A.1.18. Show that $x \mapsto \frac{1}{n}\#\{0 \le j \le n-1 : a_j = 5\}$ is a simple function for each $n \ge 1$.
By Proposition A.1.31, it follows that $\omega_5$ is measurable.
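To see concretely why this function is simple, assume that $a_0, a_1, \ldots$ are the decimal digits of $x \in [0, 1)$. This is an assumption, since the statement of the exercise is not reproduced here. Under it, $\frac{1}{n}\#\{0 \le j \le n-1 : a_j = 5\}$ depends only on $\lfloor 10^n x \rfloor$. It is therefore constant on each of the $10^n$ intervals $[k/10^n, (k+1)/10^n)$ and takes at most $n+1$ values. The sketch below uses exact rational arithmetic; the names are hypothetical.

```python
from fractions import Fraction


def decimal_digits(x, n):
    """First n digits a_0, ..., a_{n-1} of x = 0.a_0 a_1 a_2 ... in [0, 1)
    (the expansion that does not end in an infinite string of 9s)."""
    digits = []
    for _ in range(n):
        x *= 10
        a = int(x)          # floor, since x >= 0
        digits.append(a)
        x -= a
    return digits


def freq5(x, n):
    """(1/n) #{0 <= j <= n-1 : a_j = 5}, computed exactly."""
    return Fraction(decimal_digits(x, n).count(5), n)


# freq5(., n) only depends on floor(10**n * x): compare the left endpoint of each
# interval [k/10**n, (k+1)/10**n) with an interior point of the same interval.
n = 3
constant_on_intervals = all(
    freq5(Fraction(k, 10**n), n) == freq5(Fraction(k, 10**n) + Fraction(1, 10**(n + 3)), n)
    for k in range(10**n)
)
values = {freq5(Fraction(k, 10**n), n) for k in range(10**n)}
print(constant_on_intervals, sorted(values))
```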
A.2.8. (a) Assume that $\mathcal{F}$ is uniformly integrable. Consider $C > 0$ corresponding to
$\alpha = 1$ and take $L = C + 1$. Check that $\int |f|\,d\mu < L$ for every $f \in \mathcal{F}$. Given
$\varepsilon > 0$, consider $C > 0$ corresponding to $\alpha = \varepsilon/2$ and take $\delta = \varepsilon/(2C)$. Check
that $\int_A |f|\,d\mu < \varepsilon$ for every $f \in \mathcal{F}$ and every set $A$ with $\mu(A) < \delta$. Conversely,
given $\alpha > 0$, take $\delta > 0$ corresponding to $\varepsilon = \alpha$ and let $C = L/\delta$. Show that
$\int_{|f| > C} |f|\,d\mu < \alpha$. (b) Applying Exercise A.2.5 to the function $|g|$, show that $\mathcal{F}$
satisfies the criterion in (a). (c) Let us prove three facts about $f = \lim_n f_n$. (i) $f$ is
finite at almost every point: Consider $L$ as in (a). Note that $\mu(\{x : |f_n(x)| \ge k\}) \le
L/k$ for every $n, k \ge 1$ (Exercise A.2.4) and deduce that $\mu(\{x : |f(x)| \ge k\}) \le L/k$
for every $k \ge 1$. (ii) $f$ is integrable: Fix $K > 0$. Given any $\varepsilon > 0$, take $\delta$ as in
(a). Take $n$ sufficiently large that $\mu(\{x : |f_n(x) - f(x)| > \varepsilon\}) < \delta$. Note that
$$\int_{|f| \le K} |f|\,d\mu \le \int_{|f_n - f| \le \varepsilon} |f|\,d\mu + \int_{|f| \le K,\ |f_n - f| > \varepsilon} |f|\,d\mu \le (L + \varepsilon) + K\delta.$$
Deduce that $\int_{|f| \le K} |f|\,d\mu \le L$ for every $K$ and $\int |f|\,d\mu \le L$. (iii) $(f_n)_n$ converges
to $f$ in $L^1(\mu)$: Show that given $\varepsilon > 0$ there exists $K > 0$ such that $\int_{|f| > K} |f|\,d\mu <
\varepsilon$ and $\int_{|f| > K} |f_n|\,d\mu < \varepsilon$ for every $n$. Take $\delta$ as in part (a) and $n$ large enough that
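$\mu(\{x : |f_n(x) - f(x)| > \varepsilon\}) < \delta$. Then
$$\int_{|f| \le K} |f_n - f|\,d\mu \le \int_{|f_n - f| \le \varepsilon} |f_n - f|\,d\mu + \int_{|f_n - f| > \varepsilon} |f_n|\,d\mu + \int_{|f| \le K,\ |f_n - f| > \varepsilon} |f|\,d\mu.$$
The right-hand side is bounded above by $2\varepsilon + K\delta$. Combining these
inequalities, $\int |f_n - f|\,d\mu < 4\varepsilon + K\delta$ for every $n$ sufficiently large.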
A.2.14. It is no restriction to assume that the $B_n$ are pairwise disjoint. For each $n$,
consider the measure $\eta_n$ defined in $B_n$ by $\eta_n(A) = \eta(f(A))$. Then, $\eta_n \ll (\eta \mid B_n)$
and, by the theorem of Radon–Nikodym, there exists $\rho_n : B_n \to [0, +\infty]$ such
that $\int_{B_n} \varphi\,d\eta_n = \int_{B_n} \varphi\rho_n\,d\eta$ for every bounded measurable function $\varphi : B_n \to \mathbb{R}$.
Define $J_\eta \mid B_n = \rho_n$. The essential uniqueness of $J_\eta$ is a consequence of the
essential uniqueness of the Radon–Nikodym derivative.
A.3.5. Given any Borel set $B \subset M$, use Proposition A.3.2 and Lemma A.3.4 to
construct Lipschitz functions $\psi_n : M \to [0, 1]$ such that $\mu(\{x \in M : \psi_n(x) \ne
\mathcal{X}_B(x)\}) \le 2^{-n}$ for every $n$. Conclude that the claim in the exercise is true for every
simple function. Extend the conclusion to every bounded measurable function,
using the fact that it is a uniform limit of simple functions. Finally, for any
integrable function, use the fact that the positive part and the negative part are
monotone pointwise limits of bounded measurable functions. Now consider
$M = [0, 1]$ and assume that there exists a sequence of continuous functions $\psi_n :
M \to \mathbb{R}$ converging to the characteristic function $\psi$ of $M \cap \mathbb{Q}$ at every point.
Consider the set $R = \bigcap_m \bigcup_{n > m} \{x \in M : \psi_n(x) > 1/2\}$. On the one hand, $R =
\mathbb{Q} \cap M$. On the other hand, each $\bigcup_{n > m} \{x \in M : \psi_n(x) > 1/2\}$ is open and contains the
dense set $\mathbb{Q} \cap M$, so $R$ is a residual subset of $M$. This contradicts the Baire theorem, since the
countable set $\mathbb{Q} \cap M$ is not residual in $M$.
A.4.6. By the inverse function theorem, for every $x \in M$ there exist neighborhoods
$U(x) \subset M$ of $x$ and $V(x) \subset N$ of $f(x)$ such that $f$ maps $U(x)$ diffeomorphically
onto $V(x)$. This implies that the function $y \mapsto \#f^{-1}(y)$ is lower
semi-continuous. Moreover, this function is bounded. Indeed, if there were
$y_n \in N$ with $\#f^{-1}(y_n) \ge n$ for every $n \ge 1$ then, since $M$ is compact, we could
find distinct $x_n, x_n' \in f^{-1}(y_n)$ with $d(x_n, x_n') \to 0$. Let $x$ be any accumulation
point of either sequence. Then $f$ would not be injective in the neighborhood of
$x$, contradicting the hypothesis. Let $k$ be the maximum value of $\#f^{-1}(y)$. The
set $B_k$ of points $y \in N$ such that $\#f^{-1}(y) = k$ is open, closed and non-empty.
Since $N$ is connected, it follows that $B_k = N$.
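A.4.9. Consider local charts $\varphi_\alpha : U_\alpha \to X_\alpha$, $x \mapsto \varphi_\alpha(x)$ of $M$ and $D\varphi_\alpha : T_{U_\alpha}M \to X_\alpha \times \mathbb{R}^d$,
$(x, v) \mapsto (\varphi_\alpha(x), D\varphi_\alpha(x)v)$ of $TM$. Note that $\varphi_\alpha \circ \pi \circ (D\varphi_\alpha)^{-1}$ is the canonical
projection $X_\alpha \times \mathbb{R}^d \to X_\alpha$, which is infinitely differentiable. Since $M$ is of class
$C^r$ and $TM$ is of class $C^{r-1}$, it follows that $\pi$ is of class $C^{r-1}$.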
A.5.2. (a) Use the fact that the exponential function is convex. (b) Starting from the
Young inequality, show that $\int |f\bar{g}|\,d\mu \le 1$ whenever $\|f\|_p = \|g\|_q = 1$. Deduce
the general case of the Hölder inequality. (c) Start by noting that $|f + g|^p \le
|f|\,|f + g|^{p-1} + |g|\,|f + g|^{p-1}$. Apply the Hölder inequality to each of the terms
on the right-hand side of this inequality to obtain the Minkowski inequality.
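The three inequalities of A.5.2 can be checked numerically on a finite measure space. The sketch below is only an illustration and is not part of the hint. In it, $\mu$ is a hypothetical measure with random weights on finitely many points, $f$ and $g$ are random complex-valued functions, and $p = 3$ with conjugate exponent $q = 3/2$.

```python
import random

random.seed(1)
N_POINTS = 50
weights = [random.uniform(0.0, 1.0) for _ in range(N_POINTS)]   # mu({k}) = weights[k]
f = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(N_POINTS)]
g = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(N_POINTS)]
p = 3.0
q = p / (p - 1)          # conjugate exponent, 1/p + 1/q = 1


def lp_norm(h, r):
    """||h||_r = (integral of |h|^r d mu)^(1/r) for the weighted counting measure."""
    return sum(w * abs(x) ** r for x, w in zip(h, weights)) ** (1 / r)


# (a) Young: ab <= a^p/p + b^q/q for a, b >= 0 (small tolerance for rounding)
young_ok = all(abs(x) * abs(y) <= abs(x) ** p / p + abs(y) ** q / q + 1e-9
               for x, y in zip(f, g))
# (b) Hölder: integral of |f conj(g)| d mu <= ||f||_p ||g||_q
holder_lhs = sum(w * abs(x * y.conjugate()) for x, y, w in zip(f, g, weights))
holder_rhs = lp_norm(f, p) * lp_norm(g, q)
# (c) Minkowski: ||f + g||_p <= ||f||_p + ||g||_p
mink_lhs = lp_norm([x + y for x, y in zip(f, g)], p)
mink_rhs = lp_norm(f, p) + lp_norm(g, p)
print(young_ok, holder_lhs <= holder_rhs, mink_lhs <= mink_rhs)
```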
A.5.6. (Rudin [Rud87, Theorem 6.16]) Note that $\Lambda(g) : f \mapsto \int fg\,d\mu$ belongs to $L^p(\mu)^*$ and $\|\Lambda(g)\| \le \|g\|_q$:
for $q < \infty$, that follows from the Hölder inequality; the case $q = \infty$ is
immediate. It is clear that $\Lambda$ is linear. To see that it is injective, given $g$ such that
$\Lambda(g) = 0$, consider a function $\beta$ with values on the unit circle such that $\beta g =
|g|$. Then, $\Lambda(g)(\beta) = \int |g|\,d\mu = 0$, hence $g = 0$. We are left to prove that for every
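$\varphi \in L^p(\mu)^*$ there exists $g \in L^q(\mu)$ such that $\varphi = \Lambda(g)$ and $\|g\|_q = \|\varphi\|$. For
each measurable set $B \subset M$, define $\eta(B) = \varphi(\mathcal{X}_B)$. Check that $\eta$ is a complex
measure (to prove $\sigma$-additivity one needs $p < \infty$) and observe that $\eta \ll \mu$. Consider
the Radon–Nikodym derivative $g = d\eta/d\mu$. Then, $\varphi(\mathcal{X}_B) = \int_B g\,d\mu$ for
every $B$; conclude that $\varphi(f) = \int fg\,d\mu$ for every $f \in L^\infty(\mu)$. In the case $p = 1$,
this construction yields $|\int_B g\,d\mu| \le \|\varphi\|\,\mu(B)$ for every measurable set $B$. Deduce
that $\|g\|_\infty \le \|\varphi\|$. Now suppose that $1 < p < \infty$. Take $f_n = \mathcal{X}_{B_n} \beta |g|^{q-1}$, where
$B_n = \{x : |g(x)| \le n\}$. Observe that $f_n \in L^\infty(\mu)$ and $|f_n|^p = |g|^q$ in the set $B_n$ and
$$\int_{B_n} |g|^q\,d\mu = \int f_n g\,d\mu = \varphi(f_n) \le \|\varphi\| \Big(\int |f_n|^p\,d\mu\Big)^{1/p} \le \|\varphi\| \Big(\int_{B_n} |g|^q\,d\mu\Big)^{1/p}.$$
This yields $\int_{B_n} |g|^q\,d\mu \le \|\varphi\|^q$ for every $n$ and, thus, $\|g\|_q \le \|\varphi\|$. Finally,
$\varphi(f) = \int fg\,d\mu$ for every $f \in L^p(\mu)$, since the two sides are continuous
functionals and they coincide on the dense subset $L^\infty(\mu)$.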
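The key step for $1 < p < \infty$ is that $f = \beta|g|^{q-1}$ turns the Hölder inequality into an equality. A small numerical sketch illustrates this on a hypothetical finite measure space, with random weights on finitely many points, $p = 3/2$ and $q = 3$; all names are illustrative. For this $f$, the ratio $|\int fg\,d\mu| / \|f\|_p$ equals $\|g\|_q$, while a random test function does no better. On a finite space $g$ is bounded, so the truncation to $B_n$ is not needed.

```python
import random

random.seed(2)
N_POINTS = 40
weights = [random.uniform(0.1, 1.0) for _ in range(N_POINTS)]   # mu({k}) = weights[k]
g = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(N_POINTS)]
p = 1.5
q = p / (p - 1)          # = 3, the conjugate exponent


def integral(h):
    return sum(w * x for x, w in zip(h, weights))


def norm(h, r):
    return integral([abs(x) ** r for x in h]) ** (1 / r)


def pairing(f):
    """Value at f of the functional f -> integral of f g d mu."""
    return integral([x * y for x, y in zip(f, g)])


# beta has values on the unit circle and beta * g = |g| (beta = 1 where g = 0)
beta = [y.conjugate() / abs(y) if y != 0 else 1 for y in g]
f_extremal = [b * abs(y) ** (q - 1) for b, y in zip(beta, g)]
h_random = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(N_POINTS)]

extremal_ratio = abs(pairing(f_extremal)) / norm(f_extremal, p)
random_ratio = abs(pairing(h_random)) / norm(h_random, p)
print(extremal_ratio, norm(g, q), random_ratio)
```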
A.6.5. By definition, $u \cdot Lv = L^*u \cdot v$ and $u \cdot L^*v = (L^*)^*u \cdot v$ for any $u$ and $v$. Hence,
$v \cdot (L^*)^*u = L^*v \cdot u$ for any $u$ and $v$. Reversing the roles of $u$ and $v$, we see that
$L = (L^*)^*$. Note that $|L^*u \cdot v| \le \|L\|\,\|u\|\,\|v\|$ for every $u$ and $v$. Taking $v = L^*u$,
it follows that $\|L^*u\| \le \|L\|\,\|u\|$ for every $u$ and so $\|L^*\| \le \|L\|$. Since $L = (L^*)^*$,
it follows that $\|L\| \le \|L^*\|$, hence the two norms coincide. Since the operator
norm is submultiplicative, $\|L^*L\| \le \|L\|^2$. On the other hand, $u \cdot L^*Lu = \|Lu\|^2$
and so $\|L^*L\|\,\|u\|^2 \ge \|Lu\|^2$ for every $u$. Deduce that $\|L^*L\| \ge \|L\|^2$ and so the
two expressions coincide. Analogously, $\|LL^*\| = \|L\|^2$.
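In finite dimensions, the identities of A.6.5 can be checked with NumPy. There the adjoint with respect to the standard inner products is the conjugate transpose, and the operator norm is the largest singular value (`numpy.linalg.norm(A, 2)`). The sketch below assumes NumPy is available; the matrix is random and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
L = rng.standard_normal((4, 6)) + 1j * rng.standard_normal((4, 6))   # L : C^6 -> C^4
L_star = L.conj().T      # adjoint with respect to the standard inner products


def op_norm(A):
    """Operator norm for Euclidean norms = largest singular value."""
    return np.linalg.norm(A, 2)


# u . w = sum_i u_i conj(w_i), which is np.vdot(w, u)
u = rng.standard_normal(4) + 1j * rng.standard_normal(4)
v = rng.standard_normal(6) + 1j * rng.standard_normal(6)
lhs = np.vdot(L @ v, u)          # u . Lv
rhs = np.vdot(v, L_star @ u)     # L*u . v

print(op_norm(L), op_norm(L_star), op_norm(L_star @ L), op_norm(L) ** 2)
```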
A.6.8. Assume that $u \in H$ and $(u_n)_n$ is a sequence in $E$ such that $u_n \cdot v \to u \cdot v$ for
every $v \in H$. Considering $v \in E^\perp$, conclude that $u \in (E^\perp)^\perp$. By Exercise A.6.7,
it follows that $u \in E$. Therefore, $E$ is closed in the weak topology. Now consider
any sequence $(v_n)_n$ in $U(E)$ converging to some $v \in H$. For each $n$, take
$u_n = h^{-1}(v_n) \in E$. Since $h$ is an isometry, $\|u_m - u_n\| = \|v_m - v_n\|$ for any $m, n$.
It follows that $(u_n)_n$ is a Cauchy sequence in $E$ and so it admits a limit $u \in E$.
Hence, $v = h(u)$ is in $U(E)$.
A.7.1. The inverse of $T + H$ is given by the equation $(T + H)(T^{-1} + J) = \mathrm{id}$, which
may be rewritten as a fixed point equation $J = -T^{-1}HT^{-1} - T^{-1}HJ$. Use the
hypothesis to show that this equation admits a (unique) solution. Hence, $T + H$
is an isomorphism. Deduce that $L - \lambda\,\mathrm{id}$ is an isomorphism whenever $|\lambda| > \|L\|$. Therefore, the
spectrum of $L$ is contained in the disk of radius $\|L\|$. It also follows from the
previous observation that if $L - \lambda\,\mathrm{id}$ is an isomorphism then the same is true
for $L - \lambda'\,\mathrm{id}$ if $\lambda'$ is sufficiently close to $\lambda$.
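For $|\lambda| > \|L\|$, the invertibility of $L - \lambda\,\mathrm{id}$ can also be seen through the Neumann series $(L - \lambda\,\mathrm{id})^{-1} = -\sum_{n \ge 0} L^n/\lambda^{n+1}$. The series converges because $\|L^n/\lambda^{n+1}\| \le \|L\|^n/|\lambda|^{n+1}$. This is a standard alternative, not the fixed-point argument of the hint. The NumPy sketch below uses a random matrix, a hypothetical function name and a truncated series. It checks the series and that every eigenvalue lies in the disk of radius $\|L\|$.

```python
import numpy as np


def neumann_inverse(L, lam, terms=200):
    """Approximate (L - lam*id)^{-1} by -sum_{n<terms} L^n / lam^(n+1); needs |lam| > ||L||."""
    result = np.zeros_like(L)
    term = np.eye(L.shape[0]) / lam      # L^0 / lam
    for _ in range(terms):
        result -= term
        term = term @ L / lam            # next term L^(n+1) / lam^(n+2)
    return result


rng = np.random.default_rng(4)
L = rng.standard_normal((5, 5))
norm_L = np.linalg.norm(L, 2)
lam = 1.5 * norm_L                       # any lam with |lam| > ||L||
R = neumann_inverse(L, lam)
spectral_radius = max(abs(np.linalg.eigvals(L)))
print(np.allclose((L - lam * np.eye(5)) @ R, np.eye(5)), spectral_radius, norm_L)
```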
A.7.4. (a) Observe that $L - \lambda\,\mathrm{id} = \int (z - \lambda)\,dE(z)$ and use Lemma A.7.4. By the
continuity from above property (Exercise A.1.9), $E(\{\lambda\}) = \lim_n E(\{z : |z - \lambda| \le
1/n\})$. It follows that $E(\{\lambda\})v = v$. (b) It follows from Exercise A.7.3 that
$E(B)E(\{\lambda\}) = E(\{\lambda\})$ if $\lambda \in B$ and $E(B)E(\{\lambda\}) = E(\emptyset) = 0$ otherwise. Since
$L = \int z\,dE(z)$, we get that $Lv = \lambda E(\{\lambda\})v = \lambda v$.
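A finite-dimensional picture of A.7.4 may help: for a Hermitian matrix, $E(B)$ is the orthogonal projection onto the sum of the eigenspaces whose eigenvalues lie in $B$. The NumPy sketch below uses a random orthogonal change of basis, a hypothetical helper `spectral_projection`, and eigenvalues chosen only for illustration. It checks that $E(\{\lambda\})$ is a projection, and that $E(B)E(\{\lambda\})$ equals $E(\{\lambda\})$ or $0$ according to whether $\lambda \in B$. It also checks that $Lv = \lambda v$ whenever $E(\{\lambda\})v = v$, and that $L = \sum_\lambda \lambda E(\{\lambda\})$.

```python
import numpy as np


def spectral_projection(L, in_B):
    """E(B) for a Hermitian matrix L: orthogonal projection onto the sum of the
    eigenspaces whose eigenvalue z satisfies the predicate in_B(z)."""
    eigvals, eigvecs = np.linalg.eigh(L)
    mask = np.array([in_B(z) for z in eigvals], dtype=bool)
    V = eigvecs[:, mask]                 # orthonormal basis of the selected eigenspaces
    return V @ V.conj().T


rng = np.random.default_rng(5)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))   # random orthogonal matrix
L = Q @ np.diag([2.0, 2.0, -1.0, 0.5]) @ Q.T       # eigenvalue 2 has multiplicity 2

E_2 = spectral_projection(L, lambda z: abs(z - 2.0) < 1e-8)   # E({2})
E_pos = spectral_projection(L, lambda z: z > 0.0)              # B = (0, inf), contains 2
E_neg = spectral_projection(L, lambda z: z < 0.0)              # B = (-inf, 0), omits 2
v = E_2 @ rng.standard_normal(4)                               # so that E({2}) v = v
L_rebuilt = sum(z * spectral_projection(L, lambda w, z=z: abs(w - z) < 1e-8)
                for z in (2.0, -1.0, 0.5))
print(np.allclose(L @ v, 2.0 * v), np.allclose(L_rebuilt, L))
```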
References

[Aar97] J. Aaronson. An introduction to infinite ergodic theory, volume 50
of Mathematical Surveys and Monographs. American Mathematical
Society, 1997.
[AB] A. Avila and J. Bochi. Proof of the subadditive ergodic theorem. Preprint
www.mat.puc-rio.br/~jairo/docs/kingbirk.pdf.
[AF07] A. Avila and G. Forni. Weak mixing for interval exchange transformations
and translation flows. Ann. Math., 165:637–664, 2007.
[AKM65] R. Adler, A. Konheim and M. McAndrew. Topological entropy. Trans.
Amer. Math. Soc., 114:309–319, 1965.
[AKN06] V. Arnold, V. Kozlov and A. Neishtadt. Mathematical aspects of classical
and celestial mechanics, volume 3 of Encyclopaedia of Mathematical
Sciences. Springer-Verlag, third edition, 2006. [Dynamical systems. III],
Translated from the Russian original by E. Khukhro.
[Ano67] D. V. Anosov. Geodesic flows on closed Riemannian manifolds of
negative curvature. Proc. Steklov Math. Inst., 90:1–235, 1967.
[Arn78] V. I. Arnold. Mathematical methods of classical mechanics. Springer-Verlag,
1978.
[AS67] D. V. Anosov and Ya. G. Sinai. Certain smooth ergodic systems. Russian
Math. Surveys, 22:103–167, 1967.
[Bal00] V. Baladi. Positive transfer operators and decay of correlations. World
Scientific Publishing Co. Inc., 2000.
[BDV05] C. Bonatti, L. J. Dı́az and M. Viana. Dynamics beyond uniform
hyperbolicity, volume 102 of Encyclopaedia of Mathematical Sciences.
Springer-Verlag, 2005.
[Bil68] P. Billingsley. Convergence of probability measures. John Wiley & Sons
Inc., 1968.
[Bil71] P. Billingsley. Weak convergence of measures: Applications in probability.
Society for Industrial and Applied Mathematics, 1971. Conference
Board of the Mathematical Sciences Regional Conference Series in
Applied Mathematics, No. 5.
[Bir13] G. D. Birkhoff. Proof of Poincaré’s last Geometric Theorem. Trans. Amer.
Math. Soc., 14:14–22, 1913.
[Bir67] G. Birkhoff. Lattice theory, volume 25. A.M.S. Colloq. Publ., 1967.
[BK83] M. Brin and A. Katok. On local entropy. In Geometric dynamics (Rio
de Janeiro, 1981), volume 1007 of Lecture Notes in Math., pages 30–38.
Springer-Verlag, 1983.
[BLY] D. Burguet, G. Liao and J. Yang. Asymptotic h-expansiveness rate of C∞
maps. arXiv:1404.1771.
[Bos86] J.-B. Bost. Tores invariants des systèmes hamiltoniens. Astérisque,
133–134:113–157, 1986.
[Bos93] M. Boshernitzan. Quantitative recurrence results. Invent. Math., 113(3):
617–631, 1993.
[Bow71] R. Bowen. Entropy for group endomorphisms and homogeneous spaces.
Trans. Amer. Math. Soc., 153:401–414, 1971.
[Bow72] R. Bowen. Entropy expansive maps. Trans. Amer. Math. Soc., 164:323–331,
1972.
[Bow75a] R. Bowen. Equilibrium states and the ergodic theory of Anosov
diffeomorphisms, volume 470 of Lect. Notes in Math. Springer-Verlag,
1975.
[Bow75b] R. Bowen. A horseshoe with positive measure. Invent. Math., 29:203–204,
1975.
[Bow78] R. Bowen. Entropy and the fundamental group. In The Structure of
Attractors in Dynamical Systems, volume 668 of Lecture Notes in Math.,
pages 21–29. Springer-Verlag, 1978.
[BS00] L. Barreira and J. Schmeling. Sets of “non-typical” points have full
topological entropy and full Hausdorff dimension. Israel J. Math.,
116:29–70, 2000.
[Buz97] J. Buzzi. Intrinsic ergodicity for smooth interval maps. Israel J. Math.,
100:125–161, 1997.
[Car70] H. Cartan. Differential forms. Hermann, 1970.
[Cas04] A. A. Castro. Teoria da medida. Projeto Euclides. IMPA, 2004.
[Cla72] J. Clark. A Kolmogorov shift with no roots. ProQuest LLC, Ann Arbor,
MI, 1972. PhD. Thesis, Stanford University.
[dC79] M. do Carmo. Geometria riemanniana, volume 10 of Projeto Euclides.
Instituto de Matemática Pura e Aplicada, 1979.
[Dei85] K. Deimling. Nonlinear functional analysis. Springer-Verlag, 1985.
[Din70] E. Dinaburg. A correlation between topological entropy and metric
entropy. Dokl. Akad. Nauk SSSR, 190:19–22, 1970.
[Din71] E. Dinaburg. A connection between various entropy characterizations of
dynamical systems. Izv. Akad. Nauk SSSR Ser. Mat., 35:324–366, 1971.
[dlL93] R. de la Llave. Introduction to K.A.M. theory. In Computational physics
(Almuñécar, 1992), pages 73–105. World Sci. Publ., 1993.
[DS57] N. Dunford and J. Schwarz. Linear operators I: General theory. Wiley &
Sons, 1957.
[DS63] N. Dunford and J. Schwarz. Linear operators II: Spectral theory. Wiley
& Sons, 1963.
[Dug66] J. Dugundji. Topology. Allyn and Bacon Inc., 1966.
[Edw79] R. E. Edwards. Fourier series. A modern introduction. Vol. 1, volume 64
of Graduate Texts in Mathematics. Springer-Verlag, second edition, 1979.
[ET36] P. Erdös and P. Turán. On some sequences of integers. J. London Math.
Soc., 11:261–264, 1936.
[Fal90] K. Falconer. Fractal geometry: Mathematical foundations and applications.
John Wiley & Sons Ltd., 1990.
[Fer02] R. Fernandez. Medida e integração. Projeto Euclides. IMPA, 2002.
[FFT09] S. Ferenczi, A. Fisher and M. Talet. Minimality and unique ergodicity for
adic transformations. J. Anal. Math., 109:1–31, 2009.
[FO70] N. Friedman and D. Ornstein. On isomorphism of weak Bernoulli
transformations. Advances in Math., 5:365–394, 1970.
[Fri69] N. Friedman. Introduction to ergodic theory. Van Nostrand, 1969.
[Fur61] H. Furstenberg. Strict ergodicity and transformation of the torus. Amer. J.
Math., 83:573–601, 1961.
[Fur77] H. Furstenberg. Ergodic behavior and a theorem of Szemerédi on
arithmetic progressions. J. d’Analyse Math., 31:204–256, 1977.
[Fur81] H. Furstenberg. Recurrence in ergodic theory and combinatorial number
theory. Princeton University Press, 1981.
[Goo71a] T. Goodman. Relating topological entropy and measure entropy. Bull.
London Math. Soc., 3:176–180, 1971.
[Goo71b] G. Goodwin. Optimal input signals for nonlinear-system identification.
Proc. Inst. Elec. Engrs., 118:922–926, 1971.
[GT08] B. Green and T. Tao. The primes contain arbitrarily long arithmetic
progressions. Ann. of Math., 167:481–547, 2008.
[Gur61] B. M. Gurevič. The entropy of horocycle flows. Dokl. Akad. Nauk SSSR,
136:768–770, 1961.
[Hal50] P. Halmos. Measure Theory. Van Nostrand, 1950.
[Hal51] P. Halmos. Introduction to Hilbert space and the theory of spectral
multiplicity. Chelsea Publishing Company, 1951.
[Hay] N. Haydn. Multiple measures of maximal entropy and equilibrium states
for one-dimensional subshifts. Preprint, Penn State University.
[Hir94] M. Hirsch. Differential topology, volume 33 of Graduate Texts in
Mathematics. Springer-Verlag, 1994. Corrected reprint of the 1976
original.
[Hof77] F. Hofbauer. Examples for the nonuniqueness of the equilibrium state.
Trans. Amer. Math. Soc., 228:223–241, 1977.
[Hop39] E. F. Hopf. Statistik der geodätischen Linien in Mannigfaltigkeiten
negativer Krümmung. Ber. Verh. Sächs. Akad. Wiss. Leipzig, 91:261–304,
1939.
[HvN42] P. Halmos and J. von Neumann. Operator methods in classical mechanics.
II. Ann. Math., 43:332–350, 1942.
[Jac60] K. Jacobs. Neuere Methoden und Ergebnisse der Ergodentheorie. Ergebnisse
der Mathematik und ihrer Grenzgebiete. N. F., Heft 29.
Springer-Verlag, 1960.
[Jac63] K. Jacobs. Lecture notes on ergodic theory, 1962/63. Parts I, II.
Matematisk Institut, Aarhus Universitet, Aarhus, 1963.
[Kal82] S. Kalikow. $T, T^{-1}$ transformation is not loosely Bernoulli. Ann. Math.,
115:393–409, 1982.
[Kat71] Yi. Katznelson. Ergodic automorphisms of $\mathbb{T}^n$ are Bernoulli shifts. Israel
J. Math., 10:186–195, 1971.
[Kat80] A. Katok. Lyapunov exponents, entropy and periodic points of diffeomorphisms.
Publ. Math. IHES, 51:137–173, 1980.
[Kea75] M. Keane. Interval exchange transformations. Math. Zeit., 141:25–31,
1975.
[KM10] S. Kalikow and R. McCutcheon. An outline of ergodic theory, volume 122
of Cambridge Studies in Advanced Mathematics. Cambridge University
Press, 2010.
[Kok35] J. F. Koksma. Ein mengentheoretischer Satz über die Gleichverteilung
modulo Eins. Compositio Math., 2:250–258, 1935.
[KR80] M. Keane and G. Rauzy. Stricte ergodicité des échanges d'intervalles.
Math. Zeit., 174:203–212, 1980.
[Kri70] W. Krieger. On entropy and generators of measure-preserving transformations.
Trans. Amer. Math. Soc., 149:453–464, 1970.
[Kri75] W. Krieger. On the uniqueness of the equilibrium state. Math. Systems
Theory, 8:97–104, 1974/75.
[KSS91] A. Krámli, N. Simányi and D. Szász. The K-property of three billiard
balls. Ann. Math., 133:37–72, 1991.
[KSS92] A. Krámli, N. Simányi and D. Szász. The K-property of four billiard balls.
Comm. Math. Phys., 144:107–148, 1992.
[KW82] Y. Katznelson and B. Weiss. A simple proof of some ergodic theorems.
Israel J. Math., 42:291–296, 1982.
[Lan73] O. Lanford. Entropy and equilibrium states in classical statistical mechanics.
In Statistical mechanics and mathematical problems, volume 20 of
Lecture Notes in Physics, pages 1–113. Springer-Verlag, 1973.
[Led84] F. Ledrappier. Propriétés ergodiques des mesures de Sinaı̈. Publ. Math.
I.H.E.S., 59:163–188, 1984.
[Lin77] D. Lind. The structure of skew products with ergodic group actions. Israel
J. Math., 28:205–248, 1977.
[LS82] F. Ledrappier and J.-M. Strelcyn. A proof of the estimation from
below in Pesin's entropy formula. Ergod. Th. & Dynam. Sys., 2:203–219,
1982.
[LVY13] G. Liao, M. Viana and J. Yang. The entropy conjecture for
diffeomorphisms away from tangencies. J. Eur. Math. Soc. (JEMS),
15(6):2043–2060, 2013.
[LY85a] F. Ledrappier and L.-S. Young. The metric entropy of diffeomorphisms.
I. Characterization of measures satisfying Pesin’s entropy formula. Ann.
Math., 122:509–539, 1985.
[LY85b] F. Ledrappier and L.-S. Young. The metric entropy of diffeomorphisms.
II. Relations between entropy, exponents and dimension. Ann. Math.,
122:540–574, 1985.
[Man75] A. Manning. Topological entropy and the first homology group. In
Dynamical Systems, Warwick, 1974, volume 468 of Lecture Notes in
Math., pages 185–190. Springer-Verlag, 1975.
[Mañ85] R. Mañé. Hyperbolicity, sinks and measure in one-dimensional dynamics.
Comm. Math. Phys., 100:495–524, 1985.
[Mañ87] R. Mañé. Ergodic theory and differentiable dynamics. Springer-Verlag,
1987.
[Mas82] H. Masur. Interval exchange transformations and measured foliations.
Ann. Math, 115:169–200, 1982.
[Mey00] C. Meyer. Matrix analysis and applied linear algebra. Society for
Industrial and Applied Mathematics (SIAM), 2000.
[Mis73] M. Misiurewicz. Diffeomorphism without any measure of maximal
entropy. Bull. Acad. Pol. Sci., 21:903–910, 1973.
[Mis76] M. Misiurewicz. A short proof of the variational principle for a $\mathbb{Z}_+^N$ action
on a compact space. Astérisque, 40:147–187, 1976.
[MP77a] M. Misiurewicz and F. Przytycki. Entropy conjecture for tori. Bull. Pol.
Acad. Sci. Math., 25:575–578, 1977.
[MP77b] M. Misiurewicz and F. Przytycki. Topological entropy and degree of
smooth mappings. Bull. Pol. Acad. Sci. Math., 25:573–574, 1977.
[MP08] W. Marzantowicz and F. Przytycki. Estimates of the topological entropy
from below for continuous self-maps on some compact manifolds.
Discrete Contin. Dyn. Syst. Ser., 21:501–512, 2008.
[MT78] G. Miles and R. Thomas. Generalized torus automorphisms are Bernoullian.
Advances in Math. Supplementary Studies, 2:231–249, 1978.
[New88] S. Newhouse. Entropy and volume. Ergodic Theory Dynam. Systems,
8∗ (Charles Conley Memorial Issue):283–299, 1988.
[New90] S. Newhouse. Continuity properties of entropy. Ann. Math., 129:215–235,
1990. Errata in Ann. Math. 131:409–410, 1990.
[NP66] D. Newton and W. Parry. On a factor automorphism of a normal
dynamical system. Ann. Math. Statist., 37:1528–1533, 1966.
[NR97] A. Nogueira and D. Rudolph. Topological weak-mixing of interval
exchange maps. Ergod. Th. & Dynam. Sys., 17:1183–1209, 1997.
[Orn60] D. Ornstein. On invariant measures. Bull. Amer. Math. Soc., 66:297–300,
1960.
[Orn70] D. Ornstein. Bernoulli shifts with the same entropy are isomorphic.
Advances in Math., 4:337–352 (1970), 1970.
[Orn72] Donald S. Ornstein. On the root problem in ergodic theory. In
Proceedings of the Sixth Berkeley Symposium on Mathematical
Statistics and Probability (Univ. California, Berkeley, Calif., 1970/1971),
Vol. II: Probability theory, pages 347–356. Univ. California Press,
1972.
[Orn74] D. Ornstein. Ergodic theory, randomness, and dynamical systems. Yale
University Press, 1974. James K. Whittemore Lectures in Mathematics
given at Yale University, Yale Mathematical Monographs, No. 5.
[OS73] D. Ornstein and P. Shields. An uncountable family of K-automorphisms.
Advances in Math., 10:63–88, 1973.
[OU41] J. C. Oxtoby and S. M. Ulam. Measure-preserving homeomorphisms and
metrical transitivity. Ann. Math., 42:874–920, 1941.
[Par53] O. S. Parasyuk. Flows of horocycles on surfaces of constant negative
curvature. Uspehi Matem. Nauk (N.S.), 8:125–126, 1953.
[Pes77] Ya. B. Pesin. Characteristic Lyapunov exponents and smooth ergodic
theory. Russian Math. Surveys, 32(4):55–114, 1977.
[Pes97] Ya. Pesin. Dimension theory in dynamical systems: Contemporary views
and applications. University of Chicago Press, 1997.
[Pet83] K. Petersen. Ergodic theory. Cambridge University Press, 1983.
[Phe93] R. Phelps. Convex functions, monotone operators and differentiability,
volume 1364 of Lecture Notes in Mathematics. Springer-Verlag, second
edition, 1993.
[Pin60] M. S. Pinsker. Informatsiya i informatsionnaya ustoichivost' sluchainykh
velichin i protsessov. Problemy Peredači Informacii, Vyp. 7. Izdat. Akad.
Nauk SSSR, 1960.
[PT93] J. Palis and F. Takens. Hyperbolicity and sensitive-chaotic dynamics at
homoclinic bifurcations. Cambridge University Press, 1993.
[PU10] F. Przytycki and M. Urbański. Conformal fractals: Ergodic theory
methods, volume 371 of London Mathematical Society Lecture Note
Series. Cambridge University Press, 2010.
[PW72a] W. Parry and P. Walters. Errata: “Endomorphisms of a Lebesgue space”.
Bull. Amer. Math. Soc., 78:628, 1972.
[PW72b] W. Parry and P. Walters. Endomorphisms of a Lebesgue space. Bull.
Amer. Math. Soc., 78:272–276, 1972.
[PY98] M. Pollicott and M. Yuri. Dynamical systems and ergodic theory,
volume 40 of London Mathematical Society Student Texts. Cambridge
University Press, 1998.
[Qua99] A. Quas. Most expanding maps have no absolutely continuous invariant
measure. Studia Math., 134:69–78, 1999.
[Que87] M. Queffélec. Substitution dynamical systems—spectral analysis, volume
1294 of Lecture Notes in Mathematics. Springer-Verlag, 1987.
[Rok61] V. A. Rokhlin. Exact endomorphisms of a Lebesgue space. Izv. Akad.
Nauk SSSR Ser. Mat., 25:499–530, 1961.
[Rok62] V. A. Rokhlin. On the fundamental ideas of measure theory. A. M. S.
Transl., 10:1–54, 1962. Transl. from Mat. Sbornik 25 (1949), 107–150.
First published by the A. M. S. in 1952 as Translation Number 71.
[Rok67a] V. A. Rokhlin. Lectures on the entropy theory of measure-preserving
transformations. Russ. Math. Surv., 22(5):1–52, 1967. Transl. from
Uspekhi Mat. Nauk. 22(5) (1967), 3–56.
[Rok67b] V. A. Rokhlin. Metric properties of endomorphisms of compact commutative
groups. Amer. Math. Soc. Transl., 64:244–252, 1967.
[Roy63] H. L. Royden. Real analysis. Macmillan, 1963.
[RS61] V. A. Rokhlin and Ja. G. Sinaı̆. The structure and properties of invariant
measurable partitions. Dokl. Akad. Nauk SSSR, 141:1038–1041, 1961.
[Rud87] W. Rudin. Real and complex analysis. McGraw-Hill, 1987.
[Rue73] D. Ruelle. Statistical mechanics on a compact set with $\mathbb{Z}^\nu$ action
satisfying expansiveness and specification. Trans. Amer. Math. Soc.,
186:237–251, 1973.
[Rue78] D. Ruelle. An inequality for the entropy of differentiable maps. Bull.
Braz. Math. Soc., 9:83–87, 1978.
[Rue04] D. Ruelle. Thermodynamic formalism: The mathematical structures
of equilibrium statistical mechanics. Cambridge Mathematical Library.
Cambridge University Press, second edition, 2004.
[RY80] C. Robinson and L. S. Young. Nonabsolutely continuous foliations for an
Anosov diffeomorphism. Invent. Math., 61:159–176, 1980.
[SC87] Ya. Sinaı̆ and Nikolay Chernov. Ergodic properties of some systems of
two-dimensional disks and three-dimensional balls. Uspekhi Mat. Nauk,
42:153–174, 256, 1987.
[Shu69] M. Shub. Endomorphisms of compact differentiable manifolds. Amer.
Journal of Math., 91:129–155, 1969.
[Shu74] M. Shub. Dynamical systems, filtrations and entropy. Bull. Amer. Math.
Soc., 80:27–41, 1974.
[Sim02] N. Simányi. The complete hyperbolicity of cylindric billiards. Ergodic
Theory Dynam. Systems, 22:281–302, 2002.
[Sin63] Ya. Sinaı̆. On the foundations of the ergodic hypothesis for a dynamical
system of statistical mechanics. Soviet. Math. Dokl., 4:1818–1822,
1963.
[Sin70] Ya. Sinaı̆. Dynamical systems with elastic reflections. Ergodic properties
of dispersing billiards. Uspehi Mat. Nauk, 25:141–192, 1970.
[Ste58] E. Sternberg. On the structure of local homeomorphisms of Euclidean
n-space – II. Amer. J. Math., 80:623–631, 1958.
[SW75] M. Shub and R. Williams. Entropy and stability. Topology, 14:329–338,
1975.
[SX10] R. Saghin and Z. Xia. The entropy conjecture for partially hyperbolic
diffeomorphisms with 1-D center. Topology Appl., 157:29–34, 2010.
[Sze75] E. Szemerédi. On sets of integers containing no k elements in arithmetic
progression. Acta Arith., 27:199–245, 1975.
[vdW27] B. van der Waerden. Beweis einer Baudetschen Vermutung. Nieuw Arch.
Wisk., 15:212–216, 1927.
[Vee82] W. Veech. Gauss measures for transformations on the space of interval
exchange maps. Ann. of Math., 115:201–242, 1982.
[Ver99] Alberto Verjovsky. Sistemas de Anosov, volume 9 of Monographs of the
Institute of Mathematics and Related Sciences. Instituto de Matemática y
Ciencias Afines, IMCA, Lima, 1999.
[Via14] M. Viana. Lectures on Lyapunov exponents. Cambridge University Press,
2014.
[VO14] M. Viana and K. Oliveira. Fundamentos da Teoria Ergódica. Coleção
Fronteiras da Matemática. Sociedade Brasileira de Matemática, 2014.
[Wal73] P. Walters. Some results on the classification of non-invertible measure
preserving transformations. In Recent advances in topological dynamics
(Proc. Conf. Topological Dynamics, Yale Univ., New Haven, Conn.,
1972; in honor of Gustav Arnold Hedlund), pages 266–276. Lecture
Notes in Math., Vol. 318. Springer-Verlag, 1973.
[Wal75] P. Walters. A variational principle for the pressure of continuous
transformations. Amer. J. Math., 97:937–971, 1975.
[Wal82] P. Walters. An introduction to ergodic theory. Springer-Verlag, 1982.
[Wey16] H. Weyl. Über die Gleichverteilung von Zahlen mod. Eins. Math. Ann.,
77:313–352, 1916.
[Yan80] K. Yano. A remark on the topological entropy of homeomorphisms.
Invent. Math., 59:215–220, 1980.
[Yoc92] J.-C. Yoccoz. Travaux de Herman sur les tores invariants. Astérisque,
206:Exp. No. 754, 4, 311–344, 1992. Séminaire Bourbaki, Vol. 1991/92.
[Yom87] Y. Yomdin. Volume growth and entropy. Israel J. Math., 57:285–300,
1987.
[Yos68] K. Yosida. Functional analysis. Second edition. Die Grundlehren der
mathematischen Wissenschaften, Band 123. Springer-Verlag, 1968.
[Yuz68] S. A. Yuzvinskii. Metric properties of endomorphisms of compact groups.
Amer. Math. Soc. Transl., 66:63–98, 1968.
[Zyg68] A. Zygmund. Trigonometric series: Vols. I, II. Second edition, reprinted
with corrections and some additions. Cambridge University Press, 1968.
