Ergodic Theory and Recurrence
Recurrence
The same theme reappears in Section 1.5, in a more elaborate context: there,
we deal with any finite number of dynamical systems commuting with each
other, and we seek simultaneous returns of the orbits of all those systems to the
neighborhood of the initial state. This kind of result has important applications
in combinatorics and number theory, as we will see.
The recurrence phenomenon is also behind the constructions that we present
in Section 1.4. The basic idea is to fix some positive measure subset of
the domain and to consider the first return to that subset. This first-return
transformation is often easier to analyze, and it may be used to shed much
light on the behavior of the original transformation.
Proof. Suppose that the measure μ is invariant under f . We are going to show
that the relation (1.1.4) is valid for increasingly broader classes of functions.
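Let $\mathcal{X}_B$ denote the characteristic function of any measurable subset B. Then
$$\mu(B) = \int \mathcal{X}_B \, d\mu \quad\text{and}\quad \mu(f^{-1}(B)) = \int \mathcal{X}_{f^{-1}(B)} \, d\mu = \int (\mathcal{X}_B \circ f) \, d\mu.$$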
Thus, the hypothesis μ(B) = μ(f −1 (B)) means that (1.1.4) is valid for
characteristic functions. Then, by linearity of the integral, (1.1.4) is valid for all
simple functions. Next, given any integrable φ : M → R, consider a sequence
(sn )n of simple functions, converging to φ and such that |sn | ≤ |φ| for every n.
That such a sequence exists is guaranteed by Proposition A.1.33. Then, using
the dominated convergence theorem (Theorem A.2.11) twice:
$$\int \varphi \, d\mu = \lim_n \int s_n \, d\mu = \lim_n \int (s_n \circ f) \, d\mu = \int (\varphi \circ f) \, d\mu.$$
This shows that (1.1.4) holds for every integrable function if μ is invariant.
The converse is also contained in the arguments we just presented.
1.1.1 Exercises
1.1.1. Let f : M → M be a measurable transformation. Show that a Dirac measure $\delta_p$ is
invariant under f if and only if p is a fixed point of f. More generally, a probability
measure $\delta_{p,k} = k^{-1}\big(\delta_p + \delta_{f(p)} + \cdots + \delta_{f^{k-1}(p)}\big)$ is invariant under f if and only if
$f^k(p) = p$.
1.1.2. Prove the following version of Proposition 1.1.1. Let M be a metric space, f :
M → M be a measurable transformation and μ be a measure on M. Show that f
preserves μ if and only if
$$\int \varphi \, d\mu = \int \varphi \circ f \, d\mu$$
The expression on the left-hand side is finite, since the measure μ is assumed
to be finite. On the right-hand side we have a sum of infinitely many terms that
are all equal. The only way such a sum can be finite is if the terms vanish. So,
μ(E0 ) = 0 as claimed.
Now let us denote by F the set of points x ∈ E that return to E a finite number
of times. It is clear from the definition that every point x ∈ F has some iterate
1.2 Poincaré recurrence theorem 5
if the set on the right-hand side is non-empty and ρE (x) = ∞ if, on the contrary,
x has no iterate in E. According to Theorem 1.2.1, the second alternative occurs
only on a set with zero measure.
The next result shows that this function is integrable and even provides the
value of the integral. For the statement we need the following notation:
$$E_0 = \{x \in E : f^n(x) \notin E \text{ for every } n \ge 1\} \quad\text{and}$$
$$E_0^* = \{x \in M : f^n(x) \notin E \text{ for every } n \ge 0\}.$$
In other words, E0 is the set of points in E that never return to E and E0∗ is
the set of points in M that never enter E. We have seen in Theorem 1.2.1 that
μ(E0 ) = 0.
$\rho_E$ is integrable and
$$\int_E \rho_E \, d\mu = \mu(M) - \mu(E_0^*).$$
$$f^{-1}(E_n^*) = E_{n+1}^* \cup E_{n+1} \quad\text{for every } n. \tag{1.2.3}$$
Indeed, $f(y) \in E_n^*$ means that the first iterate of f(y) that belongs to E is
$f^n(f(y)) = f^{n+1}(y)$, and that occurs if and only if $y \in E_{n+1}^*$ or else $y \in E_{n+1}$.
This proves the equality (1.2.3). So, given that μ is invariant,
The relation (1.2.2) implies that $\mu(E_m^*) \to 0$ when m → ∞. So, taking the limit
as m → ∞ in the equality (1.2.4), we find that
$$\mu(E_n^*) = \sum_{i=n+1}^{\infty} \mu(E_i). \tag{1.2.5}$$
To complete the proof, replace (1.2.5) in the equality (1.2.2). In this way we
find that
$$\mu(M) - \mu(E_0^*) = \sum_{n=1}^{\infty} \sum_{i=n}^{\infty} \mu(E_i) = \sum_{n=1}^{\infty} n \mu(E_n) = \int_E \rho_E \, d\mu,$$
as we wanted to prove.
In some cases, for example when the system (f , μ) is ergodic (this property
will be defined and studied later, starting from Chapter 4), the set E0∗ has zero
measure. Then the conclusion of the Kač theorem means that
$$\frac{1}{\mu(E)} \int_E \rho_E \, d\mu = \frac{\mu(M)}{\mu(E)} \tag{1.2.6}$$
for every measurable set E with positive measure. The left-hand side of this
expression is the mean return time to E. So, (1.2.6) asserts that the mean return
time is inversely proportional to the measure of E.
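As a sanity check of (1.2.6), one may test the formula on a finite toy system, where everything can be computed exactly. The following minimal sketch rests on assumptions that are not taken from the text: M = {0, ..., 11} with the counting measure (so that μ(M) = 12), the cyclic shift x ↦ x + 5 mod 12, and E = {0, 1, 7}. The names `first_return_time` and `mean_return_time` are introduced here only for illustration.

```python
from fractions import Fraction

def first_return_time(f, E, x, max_iter=10**6):
    """Smallest n >= 1 with f^n(x) in E, that is, the value rho_E(x)."""
    y, n = f(x), 1
    while y not in E:
        y, n = f(y), n + 1
        if n > max_iter:
            raise ValueError("no return to E found")
    return n

def mean_return_time(f, E):
    """(1 / mu(E)) * integral of rho_E over E, for the counting measure mu."""
    total = sum(first_return_time(f, E, x) for x in E)
    return Fraction(total, len(E))

# Hypothetical toy system (an assumption, not from the text).
N = 12

def shift(x):
    # 5 is coprime to 12, so this permutation has a single orbit.
    return (x + 5) % N

E = {0, 1, 7}
kac_lhs = mean_return_time(shift, E)   # left-hand side of (1.2.6)
kac_rhs = Fraction(N, len(E))          # mu(M) / mu(E)
```

Since this shift has a single orbit, no point avoids E, so the set of points that never enter E is empty and the two sides of (1.2.6) coincide: both are equal to 12/3 = 4.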
Remark 1.2.3. By definition, $E_n^* = f^{-n}(E) \setminus \bigcup_{k=0}^{n-1} f^{-k}(E)$. So, the fact that the
sum (1.2.2) is finite implies that the measure of $E_n^*$ converges to zero when
n → ∞. This fact will be useful later.
Proof. For each k, denote by Ũk the set of points x ∈ Uk that never return to
Uk . According to Theorem 1.2.1, every Ũk has zero measure. Consequently,
the countable union
$$\tilde{U} = \bigcup_{k \in \mathbb{N}} \tilde{U}_k$$
also has zero measure. Hence, to prove the theorem it suffices to check that
every point x that is not in Ũ is recurrent. That is easy, as we are going to see.
Consider x ∈ M \ Ũ and let U be any neighborhood of x. By definition, there
exists some element Uk of the basis of open sets such that x ∈ Uk and Uk ⊂ U.
Since x is not in Ũ, we also have that $x \notin \tilde{U}_k$. In other words, there exists n ≥ 1
such that $f^n(x)$ is in Uk . In particular, $f^n(x)$ is also in U. Since the neighborhood
U is arbitrary, this proves that x is a recurrent point.
Let us point out that the conclusions of Theorems 1.2.1 and 1.2.4 are false,
in general, if the measure μ is not finite:
8 Recurrence
Proof. Consider the family I of all non-empty closed sets X ⊂ M that are
invariant under f , in the sense that f (X) ⊂ X. This family is non-empty, since
M ∈ I. We claim that an element X ∈ I is minimal for the inclusion relation
if and only if the orbit of every x ∈ X is dense in X. Indeed, it is clear that if
X is a closed invariant subset then X contains the closure of the orbit of each
one of its elements. Hence, in order to be minimal, X must coincide with every
one of these closures. Conversely, for the same reason, if X coincides with the
orbit closure of each one of its points then it has no proper subset that is closed
and invariant. That is, X is minimal. This proves our claim. In particular, every
point x in a minimal set is recurrent. Therefore, to prove the theorem it suffices
to prove that there exists some minimal set.
We claim that every totally ordered set {Xα } ⊂ I admits a lower bound.
Indeed, consider $X = \bigcap_\alpha X_\alpha$. Observe that X is non-empty, since the Xα are
compact and they form a totally ordered family. It is clear that X is closed and
invariant under f and it is equally clear that X is a lower bound for the set {Xα }.
That proves our claim. Now it follows from Zorn’s lemma that I does contain
minimal elements.
Theorem 1.2.6 can also be deduced from Theorem 1.2.4 together with
the fact, which we will prove later (in Chapter 2), that every continuous
transformation on a compact metric space admits some invariant probability
measure.
1.2.4 Exercises
1.2.1. Show that the following statement is equivalent to Theorem 1.2.1, meaning that
each one of them can be obtained from the other. Let f : M → M be a measurable
transformation and μ be a finite invariant measure. Let E ⊂ M be any measurable
set with μ(E) > 0. Then there exists N ≥ 1 and a positive measure set D ⊂ E such
that $f^N(x) \in E$ for every x ∈ D.
1.2.2. Let f : M → M be an invertible transformation and suppose that μ is an invariant
measure, not necessarily finite. Let B ⊂ M be a set with finite measure. Prove
that, given any measurable set E ⊂ M with positive measure, μ-almost every
point x ∈ E either returns to E an infinite number of times or has only a finite
number of iterates in B.
1.2.3. Let f : M → M be an invertible transformation and suppose that μ is a σ -finite
invariant measure: there exists an increasing sequence of measurable subsets Mk
with $\mu(M_k) < \infty$ for every k and $\bigcup_k M_k = M$. We say that a point x goes to
infinity if, for every k, there exists only a finite number of iterates of x that are
in Mk . Show that, given any E ⊂ M with positive measure, μ-almost every point
x ∈ E returns to E an infinite number of times or else goes to infinity.
1.2.4. Let f : M → M be a measurable transformation, not necessarily invertible, μ be
an invariant probability measure and D ⊂ M be a set with positive measure. Prove
that almost every point of D spends a positive fraction of time in D:
$$\limsup_n \frac{1}{n} \#\{0 \le j \le n-1 : f^j(x) \in D\} > 0$$
for μ-almost every x ∈ D. [Note: One may replace lim sup by lim inf in the
statement, but the proof of that fact will have to wait until Chapter 3.]
1.2.5. Let f : M → M be a measurable transformation preserving a finite measure μ.
Given any measurable set A ⊂ M with μ(A) > 0, let n1 < n2 < · · · be the sequence
of values of n such that $\mu(f^{-n}(A) \cap A) > 0$. The goal of this exercise is to prove
that $V_A = \{n_1, n_2, \dots\}$ is syndetic, that is, that there exists C > 0 such that
$n_{i+1} - n_i \le C$ for every i.
(a) Show that for any increasing sequence $k_1 < k_2 < \cdots$ there exist j > i ≥ 1 such
that $\mu(A \cap f^{-(k_j - k_i)}(A)) > 0$.
(b) Given any infinite sequence $\ell = (l_j)_j$ of natural numbers, denote by $S(\ell)$ the
set of all finite sums of consecutive elements of $\ell$. Show that $V_A$ intersects
$S(\ell)$ for every $\ell$.
(c) Deduce that the set VA is syndetic.
[Note: Exercise 3.1.2 provides a different proof of this fact.]
1.2.6. Show that if f : [0, 1] → [0, 1] is a measurable transformation preserving the
Lebesgue measure m then m-almost every point x ∈ [0, 1] satisfies
$$\liminf_n \, n\,|f^n(x) - x| \le 1.$$
[Note: Boshernitzan [Bos93] proved a much more general result, namely that
$\liminf_n n^{1/d} d(f^n(x), x) < \infty$ for μ-almost every point and every probability
measure μ invariant under f : M → M, assuming M is a separable metric space whose
d-dimensional Hausdorff measure is σ-finite.]
1.2.7. Define f : [0, 1] → [0, 1] by f(x) = (x + ω) − [x + ω], where ω represents the
golden ratio $(1 + \sqrt{5})/2$. Given x ∈ [0, 1], check that $n|f^n(x) - x| = n^2 |\omega - q_n|$
for every n, where $(q_n)_n \to \omega$ is the sequence of rational numbers given by
$q_n = [x + n\omega]/n$. Using that the roots of the polynomial $R(z) = z^2 - z - 1$ are precisely
ω and $\omega - \sqrt{5}$, prove that $\liminf_n n^2 |\omega - q_n| \ge 1/\sqrt{5}$. [Note: This shows that
the constant 1 in Exercise 1.2.6 cannot be replaced by any constant smaller than
$1/\sqrt{5}$. It is not known whether 1 is the smallest constant such that the statement
holds for every transformation on the interval.]
1.3 Examples
Next, we describe some simple examples of invariant measures for transforma-
tions and flows that help us interpret the significance of the Poincaré recurrence
theorems and also lead to some interesting conclusions.
Here and in what follows, we use [y] to denote the integer part of a real number y,
that is, the largest integer smaller than or equal to y. So, f is the map sending
each x ∈ [0, 1] to the fractional part of 10x. Figure 1.1 represents the graph
of f .
We claim that the Lebesgue measure μ on the interval is invariant under the
transformation f , that is, it satisfies
union of intervals. Now, the family of all finite unions of intervals is an algebra
that generates the Borel σ -algebra of [0, 1]. Hence, to conclude the proof it is
enough to use the following general fact:
$$\mu(E) = \lim_i \mu(E_i) \quad\text{and}\quad \mu(f^{-1}(E)) = \lim_i \mu(f^{-1}(E_i)).$$
Hence, E ∈ C. In precisely the same way, one gets that the intersection of any
decreasing sequence of elements of C is in C. This proves that C is indeed a
monotone class.
Now it is easy to deduce the conclusion of the lemma. Indeed, since
C is assumed to contain A, we may use the monotone class theorem
(Theorem A.1.18), to conclude that C contains the σ -algebra B generated by
A. That is precisely what we wanted to prove.
Now we explain how one may use the fact that the Lebesgue measure is
invariant under f , together with the Poincaré recurrence theorem, to reach some
interesting conclusions. The transformation f is directly related to the usual
decimal expansion of a real number: if x is given by
$$x = 0.a_0 a_1 a_2 a_3 \cdots$$
with $a_i \in \{0, 1, 2, 3, 4, 5, 6, 7, 8, 9\}$ and $a_i \ne 9$ for infinitely many values of i, then
its image under f is given by
$$f(x) = 0.a_1 a_2 a_3 \cdots.$$
Thus, more generally, the n-th iterate of f can be expressed in the following
way, for every n ≥ 1:
$$f^n(x) = 0.a_n a_{n+1} a_{n+2} \cdots \tag{1.3.2}$$
Let E be the subset of points x ∈ [0, 1] whose decimal expansion starts with
the digit 7, that is, such that a0 = 7. According to Theorem 1.2.1, almost every
element in E has infinitely many iterates that are also in E. By the expression
(1.3.2), this means that there are infinitely many values of n such that an = 7.
So, we have shown that almost every number x whose decimal expansion starts
with 7 has infinitely many digits equal to 7.
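The relation (1.3.2) between the iterates of f and the decimal digits can be checked by direct computation. The sketch below is only an illustration under stated assumptions: it uses exact rational arithmetic (floating point would lose digits after a few iterates), the helper names `f` and `digits` are chosen here, and the sample point 5/7 = 0.714285714285... is a convenient rational number in E. Rational points form a set of zero measure, so this illustrates the mechanism of (1.3.2), not the almost-everywhere statement.

```python
from fractions import Fraction
from math import floor

def f(x):
    """Decimal expansion map f(x) = 10x - [10x] on [0, 1)."""
    y = 10 * x
    return y - floor(y)

def digits(x, n):
    """Digits a_0, ..., a_{n-1} of x, read off along the orbit of x under f."""
    out = []
    for _ in range(n):
        out.append(floor(10 * x))   # leading digit of the current iterate
        x = f(x)                    # shift the expansion one place to the left
    return out

x = Fraction(5, 7)                  # sample point whose expansion starts with 7
first_twelve = digits(x, 12)
returns_to_E = [n for n, a in enumerate(first_twelve) if a == 7]
```

For this sample point the digit 7 reappears at n = 6, which is exactly the first time the orbit of 5/7 returns to E.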
Of course, instead of 7 we may consider any other digit. Even more, there
is a similar result (see Exercise 1.3.2) when, instead of a single digit, one
considers a block of k ≥ 1 consecutive digits. Later on, in Chapter 3, we will
prove a much stronger fact: for almost every number x ∈ [0, 1], every digit
occurs with frequency 1/10 (more generally, every block of k ≥ 1 digits occurs
with frequency $1/10^k$) in the decimal expansion of x.
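$$a_1 = \left[\frac{1}{x_0}\right] \quad\text{and}\quad x_1 = \frac{1}{x_0} - a_1.$$
$$a_2 = \left[\frac{1}{x_1}\right] \quad\text{and}\quad x_2 = \frac{1}{x_1} - a_2.$$
Then
$$x_1 = \frac{1}{a_2 + x_2} \quad\text{and so}\quad x_0 = \cfrac{1}{a_1 + \cfrac{1}{a_2 + x_2}}.$$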
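Now we may proceed by induction: for each n ≥ 1 such that $x_{n-1} \in (0, 1)$,
define
$$a_n = \left[\frac{1}{x_{n-1}}\right] \quad\text{and}\quad x_n = \frac{1}{x_{n-1}} - a_n = G(x_{n-1}),$$
and observe that
$$x_0 = \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{\cdots + \cfrac{1}{a_n + x_n}}}}. \tag{1.3.3}$$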
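The inductive construction of the digits and the identity (1.3.3) can be traced on a concrete number. The following is a minimal sketch, assuming exact rational arithmetic and a rational starting point (for which the procedure stops after finitely many steps); the function names `gauss`, `continued_fraction` and `rebuild` are illustrative choices, not notation from the text.

```python
from fractions import Fraction
from math import floor

def gauss(x):
    """Gauss map G(x) = 1/x - [1/x] for x in (0, 1]."""
    y = 1 / x
    return y - floor(y)

def continued_fraction(x0, max_terms=50):
    """Digits a_1, a_2, ... of x0, stopping early if some iterate equals 0."""
    a, x = [], x0
    while x != 0 and len(a) < max_terms:
        a.append(floor(1 / x))      # a_n = [1 / x_{n-1}]
        x = gauss(x)                # x_n = G(x_{n-1})
    return a, x                     # x is the remainder x_n in (1.3.3)

def rebuild(a, xn):
    """Evaluate the right-hand side of (1.3.3) from a_1, ..., a_n and x_n."""
    value = Fraction(xn)
    for an in reversed(a):
        value = 1 / (an + value)    # undoes one step: x_{n-1} = 1 / (a_n + x_n)
    return value

x0 = Fraction(7, 30)
a, xn = continued_fraction(x0)
```

Here 7/30 has digits 4, 3, 2 and remainder 0, and `rebuild` recovers 7/30 from them. Stopping after two terms instead leaves the remainder 1/2, and (1.3.3) still returns the same number.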
A remarkable fact that makes this transformation interesting from the point
of view of ergodic theory is that G admits an invariant probability measure
that, in addition, is equivalent to the Lebesgue measure on the interval. Indeed,
consider the measure defined by
$$\mu(E) = \int_E \frac{c}{1 + x} \, dx \quad\text{for every measurable set } E \subset [0, 1], \tag{1.3.6}$$
where c is a positive constant. The integral is well defined, since the function
in the integral is continuous on the interval [0, 1]. Moreover, this function takes
values inside the interval [c/2, c] and that implies
$$\frac{c}{2}\, m(E) \le \mu(E) \le c\, m(E) \quad\text{for every measurable set } E \subset [0, 1]. \tag{1.3.7}$$
In particular, μ is indeed equivalent to the Lebesgue measure m.
Lemma 1.3.3. Let f : [0, 1] → [0, 1] be a transformation such that there exist
pairwise disjoint open intervals I1 , I2 , . . . satisfying
for almost every y ∈ [0, 1]. Then the measure μ = ρdx is invariant under f .
Note that $(f_k^{-1})'(y) = 1/f'(f_k^{-1}(y))$. So, the previous relation implies that
$$\int_0^1 \varphi(f(x))\rho(x)\,dx = \sum_{k=1}^{\infty} \int_{I_k} \varphi(f(x))\rho(x)\,dx = \sum_{k=1}^{\infty} \int_0^1 \varphi(y)\, \frac{\rho(f_k^{-1}(y))}{|f'(f_k^{-1}(y))|}\,dy. \tag{1.3.9}$$
Using the monotone convergence theorem (Theorem A.2.9) and the hypothesis
(1.3.8), we see that the last expression in (1.3.9) is equal to
$$\int_0^1 \varphi(y) \sum_{k=1}^{\infty} \frac{\rho(f_k^{-1}(y))}{|f'(f_k^{-1}(y))|}\,dy = \int_0^1 \varphi(y)\rho(y)\,dy.$$
In this way we find that $\int_0^1 \varphi(f(x))\rho(x)\,dx = \int_0^1 \varphi(y)\rho(y)\,dy$. Since μ = ρdx and
$\varphi = \mathcal{X}_E$, this means that $\mu(f^{-1}(E)) = \mu(E)$ for every measurable set E ⊂ [0, 1].
In other words, μ is invariant under f .
To conclude the proof of Proposition 1.3.2 we must show that the condition
(1.3.8) holds for ρ(x) = c/(1 + x) and f = G. Let $G_k$ denote the restriction of
G to the interval $I_k = (1/(k + 1), 1/k)$, for k ≥ 1. Note that $G_k^{-1}(y) = 1/(y + k)$
for every k. Note also that $G'(x) = (1/x)' = -1/x^2$ for every x ≠ 0. Therefore,
$$\sum_{k=1}^{\infty} \frac{\rho(G_k^{-1}(y))}{|G'(G_k^{-1}(y))|} = \sum_{k=1}^{\infty} \frac{c(y + k)}{y + k + 1} \left(\frac{1}{y + k}\right)^2 = \sum_{k=1}^{\infty} \frac{c}{(y + k)(y + k + 1)}. \tag{1.3.10}$$
Observing that
$$\frac{1}{(y + k)(y + k + 1)} = \frac{1}{y + k} - \frac{1}{y + k + 1},$$
we see that the last sum in (1.3.10) has a telescopic structure: except for the
first one, all the terms occur twice, with opposite signs, and so they cancel out.
This means that the sum is equal to the first term:
$$\sum_{k=1}^{\infty} \frac{c}{(y + k)(y + k + 1)} = \frac{c}{y + 1} = \rho(y).$$
This proves that the equality (1.3.8) is indeed satisfied and, hence, we may use
Lemma 1.3.3 to conclude that μ is invariant under f = G.
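The telescoping computation can be double-checked with exact arithmetic: the partial sum over k = 1, ..., K of the terms in (1.3.10) should equal c/(y + 1) − c/(y + K + 1). The sketch below assumes the placeholder normalization c = 1 (the actual constant plays no role in the identity) and an arbitrary rational sample point y = 2/5; the names `rho` and `partial_sum` are illustrative.

```python
from fractions import Fraction

def rho(y, c=1):
    """Density rho(y) = c / (1 + y); c = 1 is a placeholder normalization."""
    return Fraction(c) / (1 + y)

def partial_sum(y, K, c=1):
    """Sum over k = 1..K of rho(G_k^{-1}(y)) / |G'(G_k^{-1}(y))|."""
    total = Fraction(0)
    for k in range(1, K + 1):
        x = 1 / (y + k)               # x = G_k^{-1}(y)
        total += rho(x, c) * x * x    # 1 / |G'(x)| = x^2
    return total

y = Fraction(2, 5)                    # arbitrary sample point in (0, 1)
K = 1000
tail = Fraction(1) / (y + K + 1)      # what is left after K terms telescope
```

With these choices, `partial_sum(y, K)` equals `rho(y) - tail` exactly, and the tail tends to zero as K grows, in agreement with (1.3.8).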
This proposition allows us to use ideas from ergodic theory, applied to the
Gauss map, to obtain interesting conclusions in number theory. For example
(see Exercise 1.3.3), the natural number 7 occurs infinitely many times in the
continued fraction expansion of almost every number x0 ∈ (1/8, 1/7), that is,
one has an = 7 for infinitely many values of n ∈ N. Later on, in Chapter 3,
we will prove a much more precise statement, that contains the following
conclusion: for almost every x0 ∈ (0, 1) the number 7 occurs with frequency
$$\frac{1}{\log 2} \log \frac{64}{63}$$
in the continued fraction expansion of x0 . Try to guess right away where this
number comes from!
such that |qθ − p| < ε. Note that the number a = qθ − p is necessarily different
from zero, since θ is irrational. Let us suppose that a is positive (the case when
a is negative is analogous). Subdividing the real line into intervals of length a,
we see that there exists an integer l such that 0 ≤ r − la < a. This implies that
|r − (lqθ − lp)| = |r − la| < a < ε.
As m = −lp and n = lq are integers and ε is arbitrary, this proves that r is in
the closure of the set D, for every r ∈ R.
Now, given y ∈ R and ε > 0, we may take r = y − x and, using the previous
paragraph, we may find m, n ∈ Z such that |m + nθ − (y − x)| < ε. This is
equivalent to saying that the distance from [y] to the iterate $R_\theta^n([x])$ is less than
ε. Since x, y and ε are arbitrary, this shows that every orbit O([x]) is dense
in S1 .
In particular, it follows that every point on the circle is recurrent for Rθ (this
is also true when θ is rational). The previous proposition also leads to some
interesting conclusions in the study of the invariant measures of Rθ . Among
other things, we will learn later (in Chapter 6) that if θ is irrational then the
Lebesgue measure is the unique probability measure that is preserved by Rθ .
Related to this, we will see that the orbits of Rθ are uniformly distributed
subsets of S1 .
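The density of the orbits can be observed numerically by measuring the largest arc of the circle that a finite piece of orbit leaves uncovered. This is a rough floating-point sketch, not a proof: the angle θ = (√5 − 1)/2, the starting point 0.3 and the helper names `rotation_orbit` and `largest_gap` are assumptions made for the illustration.

```python
import math

def rotation_orbit(theta, x0, n):
    """First n points x0, x0 + theta, ... of the orbit of R_theta, taken mod 1."""
    return [(x0 + k * theta) % 1.0 for k in range(n)]

def largest_gap(points):
    """Length of the largest arc of the circle R/Z containing none of the points."""
    pts = sorted(points)
    gaps = [b - a for a, b in zip(pts, pts[1:])]
    gaps.append(1.0 - pts[-1] + pts[0])     # arc that wraps around through 0
    return max(gaps)

theta = (math.sqrt(5) - 1) / 2              # an irrational angle (illustrative choice)
gaps = [largest_gap(rotation_orbit(theta, 0.3, n)) for n in (10, 100, 1000)]

# For comparison, a rational angle: the orbit is finite and the gap never shrinks.
rational_gap = largest_gap(rotation_orbit(0.25, 0.0, 1000))
```

For the irrational angle the largest gap decreases as more iterates are added, whereas for θ = 1/4 the orbit consists of four points and the largest gap stays equal to 1/4.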
Proof. Suppose that the absolute value | det Df | of its Jacobian is equal to 1
at every point. Let E be any measurable set and $B = f^{-1}(E)$. The formula
(1.3.12) yields
$$\operatorname{vol}(E) = \int_B 1 \, dx = \operatorname{vol}(B) = \operatorname{vol}(f^{-1}(E)).$$
This means that f preserves the measure vol and so we proved the “if” part of
the statement.
To prove the “only if,” suppose that | det Df (x)| > 1 for some point x ∈ M.
Then, since the Jacobian is continuous, there exists a neighborhood U of x and
some number σ > 1 such that
| det Df (y)| ≥ σ for all y ∈ U.
Then, applying (1.3.12) to B = U, we get that
$$\operatorname{vol}(f(U)) \ge \int_U \sigma \, dx \ge \sigma \operatorname{vol}(U).$$
Denote E = f(U). Since vol(U) > 0, the previous inequality implies that
$\operatorname{vol}(E) > \operatorname{vol}(f^{-1}(E))$. Hence, f does not leave vol invariant. In precisely the
same way, one shows that if | det Df (x)| < 1 for some point x ∈ M then f does
not leave the measure vol invariant.
Recall that the divergence of a vector field F is the trace of its Jacobian matrix,
that is
$$\operatorname{div} F = \frac{\partial F_1}{\partial x_1} + \cdots + \frac{\partial F_d}{\partial x_d}. \tag{1.3.15}$$
Combining the Liouville formula with (1.3.13), we obtain:
where $\omega = \rho \, dx_1 \cdots dx_d$ is the expression of the volume form in those local
coordinates. Let F be a $C^1$ vector field on M. Writing
Then, it follows from the recurrence theorem for flows that, assuming that
the manifold has finite volume (for example, if M is compact) and div F = 0,
then almost every point is recurrent for the flow of F.
1.3.7 Exercises
1.3.1. Use Lemma 1.3.3 to give another proof of the fact that the decimal expansion
transformation f (x) = 10x − [10x] preserves the Lebesgue measure on the
interval.
1.3.2. Prove that, for any number x ∈ [0, 1] whose decimal expansion contains the
block 617 (for instance, x = 0.3375617264 · · · ), that block occurs infinitely
many times in the decimal expansion of x. Even more, the block 617 occurs
infinitely many times in the decimal expansion of almost every x ∈ [0, 1].
1.3.3. Prove that the number 617 appears infinitely many times in the continued
fraction expression of almost every number x0 ∈ (1/618, 1/617), that is, one
has an = 617 for infinitely many values of n ∈ N.
1.3.4. Let G be the Gauss map. Show that a number x ∈ (0, 1) is rational if and only if
there exists n ≥ 1 such that $G^n(x) = 0$.
1.3.5. Consider the sequence $1, 2, 4, 8, \dots, a_n = 2^n, \dots$ of all the powers of 2. Prove that,
given any digit i ∈ {1, . . . , 9}, there exist infinitely many values of n for which $a_n$
starts with that digit.
1.3.6. Prove the following extension of Lemma 1.3.3. Let f : M → M be a $C^1$ local
diffeomorphism on a compact Riemannian manifold M. Let vol be the volume
measure on M and ρ : M → [0, ∞) be a continuous function. Then f preserves
the measure μ = ρ vol if and only if
$$\sum_{x \in f^{-1}(y)} \frac{\rho(x)}{|\det Df(x)|} = \rho(y) \quad\text{for every } y \in M.$$
When f is invertible this means that f preserves the measure μ if and only if
ρ(x) = ρ(f (x))| det Df (x)| for every x ∈ M.
1.3.7. Check that if A is a d × d matrix with integer coefficients and determinant
different from zero then the transformation $f_A : \mathbb{T}^d \to \mathbb{T}^d$ defined on the torus by
$f_A([x]) = [A(x)]$ preserves the Lebesgue measure on $\mathbb{T}^d$.
1.3.8. Show that the Lebesgue measure on S1 is the only probability measure invariant
under all the rotations of S1 , even if we restrict to rational rotations. [Note: We
will see in Chapter 6 that, for any irrational θ , the Lebesgue measure is the
unique probability measure invariant under Rθ .]
1.3.9. Suppose that θ = (θ1 , . . . , θd ) is rationally dependent. Show that there exists a
continuous non-constant function ϕ : Td → C such that ϕ ◦ Rθ = ϕ. Conclude
that there exist non-empty open subsets U and V of Td that are disjoint and
invariant under Rθ , in the sense that Rθ (U) = U and Rθ (V) = V. Deduce that no
orbit O([x]) of the rotation Rθ is dense in Td .
1.3.10. Suppose that θ = (θ1 , . . . , θd ) is rationally independent. Prove that if V is
a non-empty open subset of Td invariant under Rθ , then V is dense in Td .
Conclude that $\bigcup_{n \in \mathbb{Z}} R_\theta^n(U)$ is dense in the torus, for every non-empty open
subset U. Deduce that there exists [x] whose orbit O([x]) under the rotation
Rθ is dense in Td . Conclude that O([y]) is dense in Td for every [y].
1.3.11. Let U be an open subset of $\mathbb{R}^{2d}$ and H : U → R be a $C^2$ function. Denote by
$(p_1, \dots, p_d, q_1, \dots, q_d)$ the coordinate variables in $\mathbb{R}^{2d}$. The Hamiltonian vector
field associated with H is defined by
$$F(p_1, \dots, p_d, q_1, \dots, q_d) = \left(\frac{\partial H}{\partial q_1}, \dots, \frac{\partial H}{\partial q_d}, -\frac{\partial H}{\partial p_1}, \dots, -\frac{\partial H}{\partial p_d}\right).$$
1.4 Induction
In this section we describe a general method, based on the Poincaré recurrence
theorem, to construct from a given system (f , μ) other systems, that we refer to
as systems induced by (f , μ). The reason this is interesting is the following. On
the one hand, it is often the case that an induced system is easier to analyze,
because it has better global properties than the original one. On the other hand,
interesting conclusions about the original system can often be obtained from
analyzing the induced one. Examples will appear in a while.
Together with (1.4.1), this shows that μ(g−1 (B)) = μ(B) for every measurable
subset B of E. That is to say, the measure μE is invariant under g.
Proof. By definition, $f^{-n}(E) \cap E_k = \emptyset$ for every 0 < n < k. This implies that,
given any measurable set B ⊂ E, all the terms with n > 0 in the definition
(1.4.5) are zero. Hence, $\nu_\rho(B) = \sum_{k>0} \nu(B \cap E_k) = \nu(B)$ as claimed in the first
part of the statement.
Consider any measurable set B ⊂ M. Then,
$$\mu(B) = \mu(B \cap E) + \mu(B \cap E^c) = \nu(B \cap E) + \mu(B \cap E^c) = \sum_{k=1}^{\infty} \nu(B \cap E_k) + \mu(B \cap E^c). \tag{1.4.7}$$
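Since μ is invariant, $\mu(B \cap E^c) = \mu(f^{-1}(B) \cap f^{-1}(E^c))$. Then, as in the previous
equality,
$$\begin{aligned} \mu(B \cap E^c) &= \mu(f^{-1}(B) \cap E \cap f^{-1}(E^c)) + \mu(f^{-1}(B) \cap E^c \cap f^{-1}(E^c)) \\ &= \sum_{k=2}^{\infty} \nu(f^{-1}(B) \cap E_k) + \mu(f^{-1}(B) \cap E^c \cap f^{-1}(E^c)). \end{aligned}$$
$$\ge \sum_{n=0}^{N} \sum_{k>n} \nu(f^{-n}(B) \cap E_k) \quad\text{for every } N \ge 1.$$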
[Figure: two panels over the interval [0, 1], with the points 0 < a3 < a2 < a1 < 1 marked on the axes.]
Using the ideas that will be developed in Chapter 11, one can show that g
admits a unique invariant probability measure ν equivalent to the Lebesgue
measure on (0, 1]. In fact, the density (Radon–Nikodym derivative) of ν with
respect to the Lebesgue measure is bounded from zero and infinity. Then, the
f -invariant measure νρ in (1.4.5) is equivalent to Lebesgue measure. It follows
(see Exercise 1.4.2) that this measure is finite if and only if d ∈ (0, 1).
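[Figure: the tower over E, with columns above E1, E2, E3, ..., Ek and floors labeled ground floor, 1st floor, 2nd floor, ..., (k−1)-st floor, k-th floor; the map g is indicated.]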
In other words, each point (x, n) is “lifted” one floor at a time, until reaching the
floor ρ(x) − 1; at that stage, the point “falls” directly to (g(x), 0) on the ground
(zero-th) floor. The ground floor E × {0} is naturally identified with the set E.
Besides, the first-return map to E × {0} corresponds precisely to g : E → E.
Finally, the measure νρ is defined by
$$\nu_\rho \mid (E_k \times \{n\}) = \nu \mid E_k$$
for every 0 ≤ n < k. It is clear that the restriction of νρ to the ground floor
coincides with ν. Moreover, νρ is invariant under f and
$$\nu_\rho(M) = \sum_{k=1}^{\infty} k\,\nu(E_k) = \int_E \rho \, d\nu.$$
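The tower construction is easy to carry out on finite data, which makes the two claims above concrete. The sketch below assumes a hypothetical base E = {0, 1, 2} with the counting measure ν, a cyclic permutation g of E, and an arbitrarily chosen return-time function ρ; the names `build_tower` and `tower_map` are introduced only for this illustration.

```python
def build_tower(E_points, rho):
    """All pairs (x, n) with x in E and 0 <= n < rho(x)."""
    return [(x, n) for x in E_points for n in range(rho[x])]

def tower_map(g, rho):
    """Map (x, n) -> (x, n + 1) below the top floor, and (g(x), 0) at the top."""
    def f(point):
        x, n = point
        if n < rho[x] - 1:
            return (x, n + 1)       # lift the point one floor
        return (g[x], 0)            # fall back to the ground floor
    return f

# Hypothetical data (assumptions, not from the text).
g = {0: 1, 1: 2, 2: 0}              # map on the base E
rho = {0: 1, 1: 3, 2: 2}            # return time of each base point

tower = build_tower(g.keys(), rho)
f = tower_map(g, rho)
image = [f(p) for p in tower]
```

In this example f permutes the six points of the tower, so it preserves the counting measure on the tower, and the total mass equals the sum of the return times, 1 + 3 + 2 = 6, which is the finite analogue of the formula for νρ(M).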
1.4.4 Exercises
1.4.1. Let f : S1 → S1 be the transformation f (x) = 2x mod Z. Show that the function
$\tau(x) = \min\{k \ge 0 : f^k(x) \in (1/2, 1)\}$ is integrable with respect to the Lebesgue
measure. State and prove a corresponding result for any $C^1$ transformation g :
$S^1 \to S^1$ that is close to f, in the sense that $\sup_x \{|g(x) - f(x)|, |g'(x) - f'(x)|\}$ is
sufficiently small.
1.4.2. Consider the measure νρ and the sequence (an )n defined in Example 1.4.5. Check
that νρ is always σ -finite. Show that (an )n is decreasing and converges to zero.
Moreover, there exist c1 , c2 , c3 , c4 > 0 such that
$$c_1 \le a_j\, j^{1/d} \le c_2 \quad\text{and}\quad c_3 \le (a_j - a_{j+1})\, j^{1+1/d} \le c_4 \quad\text{for every } j. \tag{1.4.8}$$
Deduce that the f-invariant measure $\nu_\rho$ is finite if and only if d ∈ (0, 1).
1.4.3. Let $\sigma : \Sigma \to \Sigma$ be the map defined on the space $\Sigma = \{1, \dots, d\}^{\mathbb{Z}}$ by $\sigma((x_n)_n) =
(x_{n+1})_n$. Describe the first-return map g to the subset $\{(x_n)_n \in \Sigma : x_0 = 1\}$.
1.4.4. [Kakutani–Rokhlin lemma] Let f : M → M be an invertible transformation
and μ be an invariant probability measure without atoms and such that
$\mu\big(\bigcup_{n \in \mathbb{N}} f^n(E)\big) = 1$ for every E ⊂ M with μ(E) > 0. Show that for every
1.5 Multiple recurrence theorems 29
n ≥ 1 and ε > 0 there exists a measurable set B ⊂ M such that the iterates
$B, f(B), \dots, f^{n-1}(B)$ are pairwise disjoint and the complement of their union has
measure less than ε. In particular, this holds for every invertible system that is
aperiodic, that is, whose periodic points form a zero measure set.
1.4.5. Let f : M → M be a transformation and $(H_j)_{j \ge 1}$ be a collection of subsets of M
such that if $x \in H_n$ then $f^j(x) \in H_{n-j}$ for every 0 ≤ j < n. Let H be the set of points
that belong to $H_j$ for infinitely many values of j, that is, $H = \bigcap_{k=1}^{\infty} \bigcup_{j=k}^{\infty} H_j$. For
y ∈ H, define $\tau(y) = \min\{j \ge 1 : y \in H_j\}$ and $T(y) = f^{\tau(y)}(y)$. Observe that T maps
H inside H. Moreover, show that
$$\limsup_n \frac{1}{n} \#\{1 \le j \le n : x \in H_j\} \ge \theta > 0 \;\Rightarrow\; \liminf_k \frac{1}{k} \sum_{i=0}^{k-1} \tau(T^i(x)) \le \frac{1}{\theta}.$$
The key point here is that the sequence (nk )k does not depend on i: we say
that the point a is simultaneously recurrent for all the maps fi , i = 1, . . . , q.
A proof of Theorem 1.5.1 is given in Section 1.5.1. Next, we discuss the
following generalization of the Poincaré recurrence theorem (Theorem 1.2.1):
Theorem 1.5.2 (Poincaré multiple recurrence). Let (M, B, μ) be a probability
space and fi : M → M, i = 1, . . . , q be measurable commuting maps that
preserve the measure μ. Then, given any set E ⊂ M with positive measure,
there exists n ≥ 1 such that
$$\mu\big(E \cap f_1^{-n}(E) \cap \cdots \cap f_q^{-n}(E)\big) > 0.$$
Thus, m + n is also a simultaneous return to E, for all the points in some subset
of E with positive measure.
It follows that, for any set E ⊂ M with μ(E) > 0 and for μ-almost every
point x ∈ E, there exist infinitely many simultaneous returns of x to E. Indeed,
suppose there is a positive measure set F ⊂ E such that every point of F has a
finite number of simultaneous returns to E. On the one hand, up to replacing
F by a suitable subset, we may suppose that the simultaneous returns to E
of all the points of F are bounded by some k ≥ 1. On the other hand, using
the previous paragraph, there exists n > k such that $G = F \cap f_1^{-n}(F) \cap \cdots \cap
f_q^{-n}(F)$ has positive measure. Now, it is clear from the definition that n is a
simultaneous return to E of every x ∈ G. This contradicts the choice of F, thus
proving our claim.
Another direct corollary is the Birkhoff multiple recurrence theorem
(Theorem 1.5.1). Indeed, if fi : M → M, i = 1, . . . , q are continuous commuting
transformations on a compact metric space then there exists some probability
measure μ that is invariant under all these transformations (this fact will be
checked in the next chapter, see Exercise 2.2.2). From this point on, we may
argue exactly as in the proof of Theorem 1.2.4. More precisely, consider
any countable basis {Uk } for the topology of M. According to the previous
paragraph, for every k there exists a set Ũk ⊂ Uk with zero measure such
that every point in Uk \ Ũk has infinitely many simultaneous returns to Uk .
Then $\tilde{U} = \bigcup_k \tilde{U}_k$ has measure zero and every point in its complement is
simultaneously recurrent, in the sense of Theorem 1.5.1.
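Simultaneous returns are easy to observe for two rotations of the circle, which commute with each other and for which the distance from a point to its n-th iterate does not depend on the point. The sketch below is an illustration under stated assumptions: the angles √2 and √3, the tolerance 0.05 and the search bound 400 are arbitrary choices, the computation uses floating point, and the names `circle_distance` and `simultaneous_return` are introduced here.

```python
import math

def circle_distance(t):
    """Distance from the real number t to the nearest integer."""
    return abs(t - round(t))

def simultaneous_return(thetas, eps, n_max):
    """Smallest n in [1, n_max] with dist(n * theta_i, Z) < eps for every i, or None."""
    for n in range(1, n_max + 1):
        if all(circle_distance(n * th) < eps for th in thetas):
            return n
    return None

thetas = (math.sqrt(2), math.sqrt(3))   # illustrative rotation angles
n = simultaneous_return(thetas, eps=0.05, n_max=400)
```

The same n brings every point of the circle back to within 0.05 of itself under both rotations at once, which is the kind of simultaneous return discussed above. Searching for returns of each rotation separately would in general produce different times.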
Lemma 1.5.3. If M is minimal then for every non-empty open set U ⊂ M there
exists a finite subset H ⊂ G such that
$$\bigcup_{h \in H} h^{-1}(U) = M.$$
$$= (g_1^{n_k}(y), \dots, g_{q-1}^{n_k}(y), y)$$
converges to (y, . . . , y, y) when k → ∞. This proves the lemma, with $\tilde{x} = \tilde{x}_k$,
$\tilde{y} = (y, \dots, y, y)$ and $n = n_k$ for every k sufficiently large.
The next step is to show that the point ỹ in Lemma 1.5.4 is arbitrary:
Lemma 1.5.5. Given ε > 0 and $\tilde{z} \in \Delta_q$ there exist $\tilde{w} \in \Delta_q$ and m ≥ 1 such that
$d(F^m(\tilde{w}), \tilde{z}) < \varepsilon$.
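Proof. Given ε > 0 and $\tilde{z} \in \Delta_q$, consider U = open ball of center z̃ and radius
ε/2. By Lemma 1.5.3 and the observation (1.5.3), we may find a finite set
H ⊂ G such that the sets $\tilde{h}^{-1}(U)$, h ∈ H cover $\Delta_q$. Since the elements of G are
(uniformly) continuous functions, there exists δ > 0 such that
$$d(\tilde{x}_1, \tilde{x}_2) < \delta \;\Rightarrow\; d(\tilde{h}(\tilde{x}_1), \tilde{h}(\tilde{x}_2)) < \varepsilon/2 \quad\text{for every } h \in H.$$
By Lemma 1.5.4 there exist $\tilde{x}, \tilde{y} \in \Delta_q$ and n ≥ 1 such that $d(F^n(\tilde{x}), \tilde{y}) < \delta$. Fix
h ∈ H such that $\tilde{y} \in \tilde{h}^{-1}(U)$. Then,
$$d\big(\tilde{h}(F^n(\tilde{x})), \tilde{z}\big) \le d\big(\tilde{h}(F^n(\tilde{x})), \tilde{h}(\tilde{y})\big) + d\big(\tilde{h}(\tilde{y}), \tilde{z}\big) < \varepsilon/2 + \varepsilon/2.$$
Take $\tilde{w} = \tilde{h}(\tilde{x})$. Since h̃ commutes with $F^n$, the previous inequality implies that
$d(F^n(\tilde{w}), \tilde{z}) < \varepsilon$, as we wanted to prove.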
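Proof. Given ε > 0 and $\tilde{z}_0 \in \Delta_q$, consider the sequences $\varepsilon_j$, $m_j$ and $\tilde{z}_j$, j ≥ 1
defined by recurrence as follows. Initially, take $\varepsilon_1 = \varepsilon/2$.
• By Lemma 1.5.5 there are $\tilde{z}_1 \in \Delta_q$ and $m_1 \ge 1$ with $d(F^{m_1}(\tilde{z}_1), \tilde{z}_0) < \varepsilon_1$.
• By the continuity of $F^{m_1}$, there exists $\varepsilon_2 < \varepsilon_1$ such that $d(\tilde{z}, \tilde{z}_1) < \varepsilon_2$ implies
$d(F^{m_1}(\tilde{z}), \tilde{z}_0) < \varepsilon_1$.
Next, given any j ≥ 2:
• By Lemma 1.5.5 there are $\tilde{z}_j \in \Delta_q$ and $m_j \ge 1$ with $d(F^{m_j}(\tilde{z}_j), \tilde{z}_{j-1}) < \varepsilon_j$.
• By the continuity of $F^{m_j}$, there exists $\varepsilon_{j+1} < \varepsilon_j$ such that $d(\tilde{z}, \tilde{z}_j) < \varepsilon_{j+1}$
implies $d(F^{m_j}(\tilde{z}), \tilde{z}_{j-1}) < \varepsilon_j$.
In particular, for any i < j,
$$d\big(F^{m_{i+1} + \cdots + m_j}(\tilde{z}_j), \tilde{z}_i\big) < \varepsilon_{i+1} \le \frac{\varepsilon}{2}.$$
Since $\Delta_q$ is compact, we can find i, j with i < j such that $d(\tilde{z}_i, \tilde{z}_j) < \varepsilon/2$. Take
$k = m_{i+1} + \cdots + m_j$. Then,
$$d(F^k(\tilde{z}_j), \tilde{z}_j) \le d(F^k(\tilde{z}_j), \tilde{z}_i) + d(\tilde{z}_i, \tilde{z}_j) < \varepsilon.$$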
This completes the proof of the lemma.
Now we are ready to conclude the proof of Theorem 1.5.1. For that, let us
consider the function
$$\varphi : \Delta_q \to [0, \infty), \quad \varphi(\tilde{x}) = \inf\{d(F^n(\tilde{x}), \tilde{x}) : n \ge 1\}.$$
Observe that φ is upper semi-continuous: given any ε > 0, every point x̃ admits
some neighborhood V such that φ(ỹ) < φ(x̃) + ε for every ỹ ∈ V. This is an
immediate consequence of the fact that φ is given by the infimum of a family of
continuous functions. Then (Exercise 1.5.4), φ admits some continuity point ã.
We are going to show that this point satisfies the conclusion of Theorem 1.5.1.
Let us begin by observing that φ(ã) = 0. Indeed, suppose that φ(ã) is
positive. Then, by continuity, there exist β > 0 and a neighborhood V of ã
such that φ(ỹ) ≥ β > 0 for every ỹ ∈ V. Then,
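$$d(F^n(\tilde{y}), \tilde{y}) \ge \beta \quad\text{for every } \tilde{y} \in V \text{ and } n \ge 1. \tag{1.5.4}$$
On the other hand, according to (1.5.3), for every $\tilde{x} \in \Delta_q$ there exists h ∈ H
such that h̃(x̃) ∈ V. Since the transformations h are uniformly continuous, we
may fix α > 0 such that
$$d(\tilde{z}, \tilde{w}) < \alpha \;\Rightarrow\; d\big(\tilde{h}(\tilde{z}), \tilde{h}(\tilde{w})\big) < \beta \quad\text{for every } h \in H. \tag{1.5.5}$$
By Lemma 1.5.6, there exists n ≥ 1 such that $d(\tilde{x}, F^n(\tilde{x})) < \alpha$. Then, using
(1.5.5) and recalling that F commutes with every h̃,
$$d\big(\tilde{h}(\tilde{x}), F^n(\tilde{h}(\tilde{x}))\big) < \beta.$$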
This contradicts (1.5.4). This contradiction proves that φ(ã) = 0, as claimed.
In other words, there exists (nk )k → ∞ such that d(F nk (ã), ã) → 0 when k →
∞. This means that (1.5.2) is satisfied and, hence, the proof of Theorem 1.5.1
is complete.
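The function φ can be explored numerically in a simple case. The sketch below is only an illustration under stated assumptions: M is taken to be the circle R/Z, the commuting maps are two rotations with angles √2 and √3 (an arbitrary choice), the distance on the product is the maximum of the coordinate distances, and the infimum is truncated to a finite horizon. The names circle_dist and phi_truncated are hypothetical.

```python
import math

def circle_dist(a, b):
    """Distance between two points of the circle R/Z."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)

def phi_truncated(x, alphas, horizon):
    """Minimum of d(F^n(x~), x~) over 1 <= n <= horizon, with the minimizing n.

    Here x~ = (x, ..., x) and F rotates the i-th coordinate by alphas[i];
    the product distance is the maximum of the coordinate distances.
    """
    best, best_n = float("inf"), 0
    for n in range(1, horizon + 1):
        d = max(circle_dist(x + n * a, x) for a in alphas)
        if d < best:
            best, best_n = d, n
    return best, best_n

if __name__ == "__main__":
    alphas = (math.sqrt(2.0), math.sqrt(3.0))
    for horizon in (10, 100, 1000, 10000):
        # The truncated infimum decreases as the horizon grows.
        print(horizon, phi_truncated(0.3, alphas, horizon))
```

For rotations every point behaves in the same way, so the truncated values shrink toward zero at every x; the content of the theorem is that a point with simultaneous returns exists for arbitrary commuting continuous maps.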
1.5.2 Exercises
1.5.1. Show, by means of examples, that the conclusion of Theorem 1.5.1 is generally
false if the transformations fi do not commute with each other.
1.5.2. Let G be the abelian group generated by commuting homeomorphisms f1 , . . . , fq :
M → M on a compact metric space. Prove that there exists some minimal element
X ⊂ M for the inclusion relation in the family of non-empty, closed, G-invariant
subsets of M.
1.5.3. Show that if ϕ : M → R is an upper semi-continuous function on a compact
metric space then ϕ attains its maximum, that is, there exists p ∈ M such that
ϕ(p) ≥ ϕ(x) for every x ∈ M.
1.5.4. Show that if ϕ : M → R is an (upper or lower) semi-continuous function on a
compact metric space then the set of continuity points of ϕ contains a countable
intersection of open and dense subsets of M. In particular, the set of continuity
points is dense in M.
1.5.5. Let f : M → M be a measurable transformation preserving a finite measure μ.
Given k ≥ 1 and a positive measure set A ⊂ M, show that for almost every x ∈ A
there exists n ≥ 1 such that f jn (x) ∈ A for every 1 ≤ j ≤ k.
1.5.6. Let f1 , . . . , fq : M → M be commuting homeomorphisms on a compact metric
space. A point x ∈ M is called non-wandering if for every neighborhood U of x
there exist n1, . . . , nq ≥ 1 such that f1^{n1} · · · fq^{nq}(U) intersects U. The non-wandering
set is the set Ω(f1, . . . , fq) of all non-wandering points. Prove that Ω(f1, . . . , fq) is
non-empty and compact.
2
Existence of invariant measures
In this chapter we prove the following result, which guarantees the existence
of invariant measures for a broad class of transformations:
The main point in the proof is to introduce a certain topology in the set
M1 (M) of probability measures on M, which we call the weak∗ topology. The idea
is that two measures are close, with respect to this topology, if the integrals
they assign to (many) bounded continuous functions are close. The precise
definition and some of the properties of the weak∗ topology are presented
in Section 2.1. The crucial property, that makes this topology so useful for
proving the existence theorem, is that it turns M1 (M) into a compact space
(Theorem 2.1.5).
The proof of Theorem 2.1 is given in Section 2.2. We will also see,
through examples, that the hypotheses of continuity and compactness cannot
be omitted.
In Section 2.3 we insert the construction of the weak∗ topology into a
broader framework from functional analysis and we also take the opportunity
to introduce the notion of the Koopman operator of a transformation, which
will be very useful in the sequel. In particular, as we are going to see, it allows
us to give an alternative proof of Theorem 2.1, based on tools from functional
analysis.
In Section 2.4 we describe certain explicit constructions of invariant
measures for two important classes of systems: skew-products and natural
extensions (or inverse limits) of non-invertible transformations.
Finally, in Section 2.5 we discuss some important applications of the idea of
multiple recurrence (Section 1.5) in the context of combinatorial arithmetics.
Theorem 2.1.5 has an important role in the arguments, which is the reason why
this discussion was postponed to the present chapter.
Note that the intersection of any two such sets contains some set of this form.
Thus, the family {V(μ, Φ, ε) : Φ, ε} may be taken as a basis of neighborhoods
of each μ ∈ M1 (M).
The weak∗ topology is the topology defined by these bases of neighborhoods.
In other words, the open sets in the weak∗ topology are the sets A ⊂
M1 (M) such that for every μ ∈ A there exists some V(μ, Φ, ε) contained in A.
Observe that the definition depends only on the topology of M, not on its
distance. Furthermore, this topology is Hausdorff: Proposition A.3.3 implies that
if μ and ν are distinct probabilities then there exist ε > 0 and some bounded
continuous function φ : M → R such that V(μ, {φ}, ε) ∩ V(ν, {φ}, ε) = ∅.
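As an illustration, membership in a neighborhood V(μ, {φ1, . . . , φN}, ε) can be tested directly for finitely supported measures. The sketch below assumes that V(μ, {φ1, . . . , φN}, ε) consists of the measures ν with |∫φi dν − ∫φi dμ| < ε for every i; the representation of a measure as a dictionary {point: weight} and the names integral and in_neighborhood are hypothetical choices made for this example.

```python
import math

def integral(phi, measure):
    """Integral of phi against a finitely supported measure {point: weight}."""
    return sum(weight * phi(point) for point, weight in measure.items())

def in_neighborhood(nu, mu, phis, eps):
    """True if |int phi dnu - int phi dmu| < eps for every phi in phis."""
    return all(abs(integral(phi, nu) - integral(phi, mu)) < eps for phi in phis)

if __name__ == "__main__":
    delta_0 = {0.0: 1.0}            # Dirac mass at 0
    phis = [math.cos, math.atan]    # two bounded continuous test functions
    for n in (1, 10, 100, 1000):
        delta_n = {1.0 / n: 1.0}    # Dirac mass at 1/n
        print(n, in_neighborhood(delta_n, delta_0, phis, eps=0.05))
```

The Dirac masses at 1/n eventually enter this neighborhood of the Dirac mass at 0, although they assign measure 0 to the set {0}, which has full measure for the limit.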
Proof. To prove the “only if” claim, consider any set Φ = {φ} consisting of a
single bounded continuous function φ. Since (μn )n → μ, for any ε > 0 there
exists n̄ ≥ 1 such that μn ∈ V(μ, Φ, ε) for every n ≥ n̄. This means, precisely,
that
\[ \Big| \int \phi \, d\mu_n - \int \phi \, d\mu \Big| < \varepsilon \quad \text{for every } n \ge \bar n. \]
so, let Φ = {φ1 , . . . , φN }. The hypothesis ensures that for every i there exists n̄i
such that
\[ \Big| \int \phi_i \, d\mu_n - \int \phi_i \, d\mu \Big| < \varepsilon \quad \text{for every } n \ge \bar n_i. \]
Taking n̄ = max{n̄1 , . . . , n̄N } we get that μn ∈ V(μ, Φ, ε) for every n ≥ n̄.
to Lemma A.3.4, for each δ > 0 and each i there exists a Lipschitz function
ψi : M → [0, 1] such that X_{Fi} ≤ ψi ≤ X_{Fi^δ}. Observe that ⋂_δ Fi^δ = Fi, because
Fi is closed, and so μ(Fi^δ) → μ(Fi) when δ → 0. Fix δ > 0 small enough so
that μ(Fiδ ) − μ(Fi ) < ε/2 for every i. Let be the set of functions ψ1 , . . . , ψN
obtained in this way. Observe that
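\[ \Big| \int \psi_i \, d\nu - \int \psi_i \, d\mu \Big| < \varepsilon/2 \ \Rightarrow\ \nu(F_i) - \mu(F_i^\delta) < \varepsilon/2 \ \Rightarrow\ \nu(F_i) \le \mu(F_i) + \varepsilon \]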
and we also have similar inequalities for the integrals relative to ν. It follows
that
\[ \Big| \int \phi_i \, d\mu - \int \phi_i \, d\nu \Big| \le \sum_{j=1}^{k} |\mu(B_{i,j}) - \nu(B_{i,j})| + \varepsilon/2 \tag{2.1.6} \]
Next, let us consider the (finite) partition P of U defined by the family of balls
{B(pj , r) : j = 1, . . . , k}.
That is, the elements of P are the maximal sets P ⊂ U such that, for each j,
either P is contained in B(pj , r) or P is disjoint from B(pj , r). See Figure 2.1.
Now let E be the family of all finite unions of elements of P. Note that the
boundary of every element of E has measure zero, since it is contained in the
union of the boundaries of the balls B(pj , r), 1 ≤ j ≤ k. That is, every element
of E is a continuity set of μ.
If ν ∈ Vc (μ, E, δ) then
|μ(E) − ν(E)| < δ for every E ∈ E. (2.1.9)
In particular, (2.1.8) together with (2.1.9) imply that
\[ \nu(U) > 1 - 2\delta. \tag{2.1.10} \]
Now, given any Borel subset B, denote by EB the union of all the elements of
P that intersect B. Then EB ∈ E and so the relation (2.1.9) yields
|μ(EB ) − ν(EB )| < δ.
Observe that B is contained in EB ∪ U^c. Moreover, EB ⊂ B^δ because every
element of P has diameter less than 2r < δ. These facts, together with (2.1.8)
and (2.1.10), imply that
μ(B) ≤ μ(EB ) + δ < ν(EB ) + 2δ ≤ ν(Bδ ) + 2δ
ν(B) ≤ ν(EB ) + 2δ < μ(EB ) + 3δ ≤ μ(Bδ ) + 3δ.
Since 3δ < ε, these relations imply that ν ∈ BD (μ, ε).
Theorem 2.1.5. The space M1 (M) is compact for the weak∗ topology.
Proposition 2.1.6. Every sequence (μk )k∈N in M1 (M) has some subsequence
that converges in the weak∗ topology.
exists, for every function ϕ ∈ C0 (M). Indeed, suppose first that ϕ is in the unit
ball of C0 (M). Given any ε > 0, we may find n ∈ N such that ‖ϕ − φn‖ ≤ ε.
Then,
\[ \Big| \int \varphi \, d\mu_{\ell_j} - \int \phi_n \, d\mu_{\ell_j} \Big| \le \varepsilon \]
Since ε is arbitrary, we find that lim_j ∫ϕ dμ_{ℓ_j} exists. This proves (2.1.11) when
the function is in the unit ball. The general case reduces immediately to this
one, just replacing ϕ with ϕ/‖ϕ‖. In this way, we have completed the proof of
(2.1.11).
Finally, it is clear that the operator Φ : C0 (M) → R defined by (2.1.11) is
linear and positive: Φ(ϕ) ≥ min ϕ ≥ 0 whenever ϕ ≥ 0 at all points. Moreover,
Φ(1) = 1. Thus, by Theorem A.3.11, there exists some Borel probability
According to Lemma 2.1.1, this means that the subsequence (μ_{ℓ_j})_{j∈N} converges
to μ in the weak∗ topology.
Proof. We only prove the necessary condition, which is the most useful part
of the statement. Then, in Exercise 2.1.8, we invite the reader to prove the
converse.
Suppose that K is tight. Consider an increasing sequence (Kl )l of compact
subsets of M such that η(Klc ) ≤ 1/l for every l and every η ∈ K. Fix any
sequence (μn )n in K. To begin with, we claim that for every l there exists a
subsequence (nj )j and there exists a measure νl on M such that νl (Klc ) = 0 and
(μnj | Kl )j converges to νl , in the sense that
\[ \int_{K_l} \psi \, d\mu_{n_j} \to \int_{K_l} \psi \, d\nu_l \quad \text{for every continuous function } \psi : K_l \to \mathbb{R}. \tag{2.1.12} \]
Analogously, for any k > l and any continuous function φ : M → [0, 1],
\[ \int \phi \, d\nu_k - \int \phi \, d\nu_l = \lim_j \int_{K_k \setminus K_l} \phi \, d\mu_{n_j} \le \limsup_j \mu_{n_j}(K_l^c) \le 1/l. \]
Using Exercise A.3.5, we may translate this in terms of measures of sets (rather
than integrals of functions): for every k > l and every Borel set E ⊂ M,
Define ν(E) = liml νl (E) for each Borel set E. We claim that ν is a probability
measure on M. It is immediate from the definition that ν(∅) = 0 and that
ν is additive. Furthermore, ν(M) = liml ν(Kl ) = liml bl = 1. To show that ν
is countably additive (σ -additive), we use the criterion of continuity at the
empty set (Theorem A.1.14). Consider any decreasing sequence (Bn )n of Borel
subsets of M with ⋂_n Bn = ∅. Given ε > 0, choose l such that 1/l < ε. Since
νl is countably additive, Theorem A.1.14 shows that νl (Bn ) < ε for every n
sufficiently large. Hence, ν(Bn ) ≤ νl (Bn ) + 1/l < 2ε for every n sufficiently
large. This proves that (ν(Bn ))n converges to zero and, by Theorem A.1.14, it
follows that ν is indeed countably additive.
The definition of ν implies (see Exercise 2.1.1 or Exercise 2.1.4) that (νl )l
converges to ν in the weak∗ topology. So, given ε > 0 and any bounded
continuous function ϕ : M → R, we have that |∫ϕ dνl − ∫ϕ dν| < ε for every
l sufficiently large. Fix l such that, in addition, sup |ϕ|/l < ε. Then,
\[ \Big| \int \varphi \, d\mu_{n_j} - \int \varphi \, d\nu_l \Big| \le \Big| \int_{K_l^c} \varphi \, d\mu_{n_j} \Big| + \Big| \int_{K_l} \varphi \, d\mu_{n_j} - \int_{K_l} \varphi \, d\nu_l \Big| \le 2\varepsilon \]
for every j sufficiently large. This shows that |∫ϕ dμ_{n_j} − ∫ϕ dν| < 3ε whenever
j is large enough. Thus, (μnj )j converges to ν in the weak∗ topology.
2.1.6 Exercises
2.1.1. Let M be a metric space and (μn )n be a sequence in M1 (M). Show that the
following conditions are all equivalent:
1. (μn )n converges to a probability measure μ in the weak∗ topology.
2. lim supn μn (F) ≤ μ(F) for every closed set F ⊂ M.
3. lim infn μn (A) ≥ μ(A) for every open set A ⊂ M.
4. limn μn (B) = μ(B) for every continuity set B of μ.
5. limn ∫ψ dμn = ∫ψ dμ for every Lipschitz function ψ : M → R.
2.1.2. Fix any dense subset F of the unit ball of C0 (M). Show that a sequence (μn )n∈N
of probability measures on M converges to some μ ∈ M1 (M) in the weak∗
topology if and only if
∫φ dμn converges to ∫φ dμ, for every φ ∈ F .
2.1.3. Show that the subset formed by the measures with finite support is dense in
M1 (M), relative to the weak∗ topology. Assuming that the metric space M is
separable, conclude that M1 (M) is also separable.
2.1.4. The uniform topology in M1 (M) is defined by the basis of neighborhoods
and so the left-hand side is smaller than ε if the right-hand side is smaller than
ε. That means that
f∗ V(μ, , ε) ⊂ V(f∗ μ, , ε)) for every μ, and ε,
and this last fact shows that f∗ is continuous.
At this point, Theorem 2.1 could be deduced from the classical Schauder–
Tychonoff fixed point theorem for continuous operators in topological vector
spaces. A topological vector space is a vector space V endowed with a
topology relative to which both operations of V (addition and multiplication
by a scalar) are continuous. A set K ⊂ V is said to be convex if (1 − t)x + ty ∈ K
for every x, y ∈ K and every t ∈ [0, 1].
Theorem 2.1 corresponds to the special case when V = M(M) is the space
of complex measures, K = M1 (M) is the space of probability measures on M
and F = f∗ is the action of f on M(M).
However, the situation of Theorem 2.1 is a lot simpler than the general
case of the Schauder–Tychonoff theorem because the operator f∗, besides being
continuous, is also linear. This allows for a direct and elementary proof of
Theorem 2.1 that also provides some additional information about the invariant
measure.
To that end, let ν be any probability measure on M: for example, ν could be
the Dirac mass at any point. Form the sequence of probability measures
\[ \mu_n = \frac{1}{n} \sum_{j=0}^{n-1} f_*^j \nu, \tag{2.2.2} \]
where f∗^j ν is the image of ν under the iterate f^j. By Theorem 2.1.5, this
sequence has some accumulation point, that is, there exists some subsequence
(nk )k∈N and some probability measure μ ∈ M1 (M) such that
\[ \frac{1}{n_k} \sum_{j=0}^{n_k-1} f_*^j \nu \to \mu \tag{2.2.3} \]
Lemma 2.2.4. Every accumulation point of a sequence (μn )n∈N of the form
(2.2.2) is a probability measure invariant under f .
Proof. The relation (2.2.3) asserts that, given any family = {φ1 , . . . , φN } of
bounded continuous functions and given any ε > 0, we have
\[ \Big| \frac{1}{n_k} \sum_{j=0}^{n_k-1} \int (\phi_i \circ f^j) \, d\nu - \int \phi_i \, d\mu \Big| < \varepsilon/2 \tag{2.2.4} \]
and the latter expression is smaller than ε/2 for every i and every k sufficiently
large. Combining this fact with (2.2.4), we conclude that
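\[ \Big| \frac{1}{n_k} \sum_{j=1}^{n_k} \int (\phi_i \circ f^j) \, d\nu - \int \phi_i \, d\mu \Big| < \varepsilon \tag{2.2.6} \]
\[ \frac{1}{n_k} \sum_{j=1}^{n_k} f_*^j \nu \to \mu \]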
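The mechanism behind this proof can be checked numerically. The following sketch is an illustration only, under assumptions chosen for convenience: f(x) = x/2 on the compact interval [0, 1], ν is the Dirac mass at 1, and the test function is the cosine; the names integrate_average and halve are hypothetical. It compares the integral of φ with respect to μn and with respect to f∗μn; the two differ by (φ(f^n(x0)) − φ(x0))/n, which is at most 2 sup|φ|/n.

```python
import math

def integrate_average(f, phi, x0, n):
    """Integral of phi against mu_n = (1/n) sum_{j<n} f_*^j(delta_{x0}).

    f_*^j(delta_{x0}) is the Dirac mass at f^j(x0), so the integral is the
    average of phi over the first n points of the orbit of x0.
    """
    total, x = 0.0, x0
    for _ in range(n):
        total += phi(x)
        x = f(x)
    return total / n

def halve(x):
    """A continuous map of the compact interval [0, 1]."""
    return x / 2.0

if __name__ == "__main__":
    phi = math.cos  # bounded continuous test function with sup |phi| <= 1
    for n in (10, 100, 1000):
        a = integrate_average(halve, phi, 1.0, n)                      # int phi d(mu_n)
        b = integrate_average(halve, lambda x: phi(halve(x)), 1.0, n)  # int phi d(f_* mu_n)
        print(n, a, b, abs(a - b))  # the gap is at most 2/n
```

In this particular example the averages converge to the Dirac mass at the fixed point 0, which is indeed invariant.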
Now the proof of Theorem 2.1 is complete. The examples that follow show
that neither of the two hypotheses in the theorem, continuity and compactness,
may be omitted.
Example 2.2.5. Consider f : (0, 1] → (0, 1] given by f (x) = x/2. Suppose that
f admits some invariant probability measure: we are going to show that this is
actually not true. By the recurrence theorem (Theorem 1.2.4), relative to that
probability measure almost every point of (0, 1] is recurrent. However, it is
clear that there are no recurrent points: the orbit of every x ∈ (0, 1] converges
to zero and, in particular, does not accumulate on the initial point x. Hence, f
is an example of a continuous transformation (on a non-compact space) that
does not have any invariant probability measure.
Example 2.2.6. Modifying a little the previous construction, we see that the
same phenomenon may occur in compact spaces, if the transformation is not
continuous. Consider f : [0, 1] → [0, 1] given by f (x) = x/2 if x ≠ 0 and f (0) =
1. For the same reason as before, no point x ∈ (0, 1] is recurrent. So, if there
exists some invariant probability measure μ then it must give full weight to the
sole recurrent point x = 0. In other words, μ must be the Dirac mass supported
at zero, that is, the measure δ0 defined by
δ0 (E) = 1 if 0 ∈ E and δ0 (E) = 0 if 0 ∉ E.
However, the measure δ0 is not invariant under f : for example, the measurable
set E = {0} has measure 1 and yet its pre-image f −1 (E) is the empty set, which
has measure zero. Thus, this transformation f has no invariant probability
measures.
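A short computation makes the failure concrete. This is a sketch, not part of the argument: it uses exact rational arithmetic to follow the orbit of 0 under the map of Example 2.2.6 and takes the identity as test function φ; the names f and orbit_average are hypothetical. The orbit averages of φ tend to φ(0) = 0, so the time averages starting at 0 accumulate on δ0, whereas the integral of φ ◦ f with respect to δ0 equals φ(f(0)) = 1.

```python
from fractions import Fraction

def f(x):
    """Map of Example 2.2.6: f(x) = x/2 for x != 0 and f(0) = 1."""
    return x / 2 if x != 0 else Fraction(1)

def orbit_average(phi, x0, n):
    """Average of phi over x0, f(x0), ..., f^{n-1}(x0), in exact arithmetic."""
    total, x = Fraction(0), Fraction(x0)
    for _ in range(n):
        total += phi(x)
        x = f(x)
    return total / n

if __name__ == "__main__":
    identity = lambda x: x
    for n in (10, 100, 1000):
        print(n, float(orbit_average(identity, 0, n)))  # tends to identity(0) = 0
    print(f(Fraction(0)))  # value of identity(f(0)), the integral against f_* delta_0
```

Exact fractions are used because repeated halving in floating point would eventually underflow to 0 and restart the orbit artificially.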
2.2.1 Exercises
2.2.1. Prove the following generalization of Lemma 2.2.4. Let f : M → M be a
continuous transformation on a compact metric space, ν be a probability measure
on M and (In )n be a sequence of intervals of natural numbers such that #In
converges to infinity when n goes to infinity. Then every accumulation point of
the sequence
\[ \mu_n = \frac{1}{\# I_n} \sum_{j \in I_n} f_*^j \nu \]
2.2.4. Prove the theorem of existence of invariant measures for continuous flows:
every continuous flow (f t )t∈R on a compact metric space admits some invariant
probability measure.
2.2.5. Show that the transformation f : [−1, 1] → [−1, 1], f (x) = 1 − 2x² has some
invariant probability measure equivalent to the Lebesgue measure on the interval.
2.2.6. Let f : M → M be an invertible measurable transformation and m be a probability
measure on M such that m(A) = 0 if and only if m(f (A)) = 0. We say that the pair
(f , m) is totally dissipative if there exists a measurable set W ⊂ M whose iterates
f j (W), j ∈ Z are pairwise disjoint and such that their union has full measure. Prove
that if (f , m) is totally dissipative then f admits some σ -finite invariant measure
equivalent to Lebesgue measure m. This measure is necessarily infinite.
2.2.7. Let f : M → M be an invertible measurable transformation and m be a probability
measure on M such that m(A) = 0 if and only if m(f (A)) = 0. We say that the
pair (f , m) is conservative if there is no measurable set W ⊂ M with positive
measure whose iterates f j (W), j ∈ Z are pairwise disjoint. Show that if (f , m) is
conservative then, for every measurable set X ⊂ M, m-almost every point of X
returns to X infinitely many times.
2.2.8. Suppose that (f , m) is conservative. Show that f admits a σ -finite invariant
measure μ equivalent to m if and only if there exist sets X1 ⊂ · · · ⊂ Xn ⊂ · · ·
with M = ⋃_n Xn and m(Xn ) < ∞ for every n, such that the first-return map fn to
each Xn admits a finite invariant measure μn absolutely continuous with respect
to the restriction of m to Xn .
2.2.9. Find conservative pairs (f , m) such that f has no finite invariant measures
equivalent to m. [Observation: Ornstein [Orn60] gave examples such that f does
not even have σ -finite invariant measures equivalent to m.]
The weak topology in the space E is the topology defined by the following
basis of neighborhoods:
V(v, {g1 , . . . , gN }, ε) = {w ∈ E : |gi (v) − gi (w)| < ε for every i}, (2.3.2)
The weak∗ topology in the dual space E∗ is the topology defined by the
following basis of neighborhoods:
where v1 , . . . , vN ∈ E. It satisfies
2.3.3 Exercises
2.3.1. Let ℓ^1 be the space of summable sequences of complex numbers, endowed with
the norm ‖(an )n‖_1 = Σ_{n=0}^∞ |an |. Let ℓ^∞ be the space of bounded sequences and
c0 be the space of sequences converging to zero, both endowed with the norm
‖(an )n‖_∞ = sup_{n≥0} |an |.
2.4 Skew-products and natural extensions 53
every n < 0. Consider the map π : M̂ → M sending each sequence (xn )n≤0 to
its term x0 of order zero. Observe that π(M̂) = M. Finally, define fˆ : M̂ → M̂
to be the shift by one unit to the left:
fˆ(. . . , xn , . . . , x0 ) = (. . . , xn , . . . , x0 , f (x0 )). (2.4.2)
It is clear that fˆ is well defined and satisfies π ◦ fˆ = f ◦ π . Moreover, fˆ is
invertible: the inverse is the shift to the right:
(. . . , yn , . . . , y−1 , y0 ) → (. . . , yn , . . . , y−2 , y−1 ).
If M is a measurable space then we may turn M̂ into a measurable space by
endowing it with the σ -algebra generated by the measurable cylinders
[Ak , . . . , A0 ] = {(xn )n≤0 ∈ M̂ : xi ∈ Ai for i = k, . . . , 0}, (2.4.3)
where k ≤ 0 and Ak , . . . , A0 are measurable subsets of M. Then π is a
measurable map, since
π −1 (A) = [A]. (2.4.4)
Moreover, fˆ is measurable if f is measurable:
\[ \hat f^{-1}([A_k, \dots, A_0]) = [A_k, \dots, A_{-2}, A_{-1} \cap f^{-1}(A_0)]. \tag{2.4.5} \]
The inverse of fˆ is also measurable, since
fˆ([Ak , . . . , A0 ]) = [Ak , . . . , A0 , M]. (2.4.6)
Analogously, if M is a topological space then we may turn M̂ into a
topological space by endowing it with the topology generated by the open
cylinders [Ak , . . . , A0 ], where k ≤ 0 and Ak , . . . , A0 are open subsets of M.
The relations (2.4.4) and (2.4.6) show that π and fˆ−1 are continuous, whereas
(2.4.5) shows that fˆ is continuous if f is continuous. Observe that if M admits
a countable basis U of open sets then the cylinders [Ak , . . . , A0 ] with k ≤ 0 and
A0 , . . . , Ak ∈ U constitute a countable basis of open sets for M̂.
If M is a metric space, with distance d, then the following function is a
distance on M̂:
\[ \hat d(\hat x, \hat y) = \sum_{n=-\infty}^{0} 2^{n} \min\{ d(x_n, y_n), 1 \}, \tag{2.4.7} \]
where x̂ = (xn )n≤0 and ŷ = (yn )n≤0 . It follows immediately from the definition
that if x̂ and ŷ belong to the same pre-image π^{-1}(x) then
\[ \hat d(\hat f^j(\hat x), \hat f^j(\hat y)) \le 2^{-j} \hat d(\hat x, \hat y) \quad \text{for every } j \ge 0. \]
So, every pre-image π −1 (x) is a stable set, that is, a subset restricted to which
the transformation fˆ is uniformly contracting.
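The contraction on the fibers of π can be observed on finite pieces of pre-orbits. The sketch below is an illustration under assumptions made only for this example: f is the doubling map on [0, 1), the distance on M is |x − y|, a pre-orbit is stored as a finite list ending at x0, and the sum in (2.4.7) is truncated to the stored terms. The names pre_orbit, f_hat and d_hat are hypothetical.

```python
def f(x):
    """Doubling map on [0, 1)."""
    return (2.0 * x) % 1.0

def pre_orbit(x0, branches):
    """Finite piece [x_{-K}, ..., x_{-1}, x_0] of a pre-orbit of x0.

    Each b in branches picks the inverse branch y -> (y + b) / 2, so that
    f maps every term to the next one (up to rounding).
    """
    seq = [x0]
    for b in branches:
        seq.insert(0, (seq[0] + b) / 2.0)
    return seq

def f_hat(seq):
    """Shift as in (2.4.2): append f(x_0) as the new zeroth term."""
    return seq + [f(seq[-1])]

def d_hat(xs, ys):
    """Distance (2.4.7) restricted to the stored terms, with d(x, y) = |x - y|."""
    pairs = zip(reversed(xs), reversed(ys))  # k = 0 corresponds to n = 0
    return sum(2.0 ** (-k) * min(abs(a - b), 1.0) for k, (a, b) in enumerate(pairs))

if __name__ == "__main__":
    xs = pre_orbit(0.3, [0, 1, 0, 1, 1])
    ys = pre_orbit(0.3, [1, 1, 0, 0, 1])  # same x_0, different past
    base = d_hat(xs, ys)
    for j in range(1, 6):
        xs, ys = f_hat(xs), f_hat(ys)
        print(j, d_hat(xs, ys), 2.0 ** (-j) * base)
```

Since both sequences share the term x0, every newly appended term coincides, and the old differences are pushed one position further into the past at each step, which halves their weight.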
Example 2.4.2. Given any transformation g : M → M, consider its maximal
invariant set Mg = ⋂_{n=1}^∞ g^n(M). Clearly, g(Mg ) ⊂ Mg . Suppose that
(i) M is compact and g is continuous or (ii) #g−1 (y) < ∞ for every y.
admits a unique lift, that is, there is a unique measure μ̂ on M̂ invariant under
fˆ and such that π∗ μ̂ = μ.
The proof of existence will be proposed to the reader in Exercise 5.2.4, using
ideas to be developed in Chapter 5. We will also see in Exercise 8.5.7 that those
arguments remain valid in the somewhat more general setting of Lebesgue
spaces. But existence of the lift is not true in general, for arbitrary probability
spaces, as shown by the example in Exercise 1.15 in the book of Przytycki and
Urbański [PU10].
2.4.3 Exercises
2.4.1. Let M be a compact metric space and X be a set of continuous maps f : M →
M, endowed with a probability measure ν. Consider the skew-product F : X^N ×
M → X^N × M defined by F((fn )n , x) = ((fn+1 )n , f0 (x)). Show that F admits some
invariant probability measure m of the form m = ν^N × μ. Moreover, a measure m
of this form is invariant under F if and only if the measure μ is stationary for ν,
that is, if and only if μ(E) = ∫ f∗ μ(E) dν(f ) for every measurable set E ⊂ M.
2.4.2. Let f : M → M be a surjective transformation, fˆ : M̂ → M̂ be its natural extension
and π : M̂ → M be the canonical projection. Show that if g : N → N is an
invertible transformation such that f ◦p = p◦g for some map p : N → M then there
exists a unique map p̂ : N → M̂ such that π ◦ p̂ = p and p̂ ◦ g = fˆ ◦ p̂. Suppose that
M and N are compact spaces and the maps p and g are continuous. Show that if p
is surjective then p̂ is surjective (and so g : N → N is an extension of fˆ : M̂ → M̂).
2.4.3. Check the claims in Example 2.4.2.
2.4.4. Show that if (M, d) is a complete separable metric space then the same holds
for the space (M̂, d̂) of the pre-orbits of any continuous surjective transformation
f : M → M.
2.4.5. The purpose of this exercise and the next is to generalize the notion of
natural extension to finite families of commuting transformations. Let M be
a compact space and f1 , . . . , fq : M → M be commuting surjective continuous
transformations. Let M̂ be the set of all sequences (xn1 ,...,nq )n1 ,...,nq ≤0 , indexed by
the q-tuples of non-positive integer numbers, such that
fi (xn1 ,...,ni ,...,nq ) = xn1 ,...,ni +1,...,nq for every i and every (n1 , . . . , nq ).
Let π : M̂ → M be the map sending (xn1 ,...,nq )n1 ,...,nq ≤0 to the point x0,...,0 .
For each i, let fˆi : M̂ → M̂ be the map sending (xn1 ,...,ni ,...nq )n1 ,...,nq ≤0 to
(xn1 ,...,ni +1,...nq )n1 ,...,nq ≤0 .
(a) Prove that M̂ is a compact space. Moreover, M̂ is metrizable if M is
metrizable.
Theorem 2.5.1 (van der Waerden). Given any partition {S1 , . . . , Sl } of Z, there
exists j ∈ {1, . . . , l} such that Sj contains arithmetic progressions of every length.
In other words, for every q ≥ 1 there exist m ∈ Z and n ≥ 1 such that m + in ∈ Sj
for every 1 ≤ i ≤ q.
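For small parameters, statements of this kind can be tested by exhaustive search on finite intervals of integers. The sketch below is a brute-force illustration, not a proof technique: it examines every coloring of {1, . . . , N} with a given number of colors and looks for a monochromatic arithmetic progression with q terms. The names has_monochromatic_ap and every_coloring_has_ap are hypothetical.

```python
from itertools import product

def has_monochromatic_ap(coloring, q):
    """Check for a q-term arithmetic progression of a single color.

    coloring[i - 1] is the color of the integer i in {1, ..., N}.
    """
    N = len(coloring)
    for first in range(1, N + 1):
        step = 1
        while first + (q - 1) * step <= N:
            terms = [first + i * step for i in range(q)]
            if len({coloring[t - 1] for t in terms}) == 1:
                return True
            step += 1
    return False

def every_coloring_has_ap(N, colors, q):
    """True if every coloring of {1, ..., N} has a monochromatic q-term progression."""
    return all(has_monochromatic_ap(c, q)
               for c in product(range(colors), repeat=N))

if __name__ == "__main__":
    for N in (7, 8, 9):
        print(N, every_coloring_has_ap(N, colors=2, q=3))
```

The search is exponential in N, so it is only practical for very small values of N, q and the number of colors.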
Some time afterwards, the Hungarian mathematicians Pál Erdős and Pál
Turán [ET36] conjectured the following statement, which is stronger than the
theorem of van der Waerden: any set S ⊂ Z whose upper density is positive
contains arithmetic progressions of every length. This was proven by another
Hungarian mathematician, Endre Szemerédi [Sze75], almost four decades
later. To state the theorem of Szemerédi precisely, we need to define the notion
of upper density of a subset of Z.
We call an interval of the set Z any subset I of the form {n ∈ Z : a ≤ n < b}
with a ≤ b. The cardinal of an interval I is the number #I = b − a. The upper
2.5 Arithmetic progressions 59
ideas in ergodic theory: we will show in Section 2.5.1 how to obtain the
theorem of van der Waerden from the multiple recurrence theorem of Birkhoff
(Theorem 1.5.1); similar arguments yield the theorem of Szemerédi from the
multiple recurrence theorem of Poincaré (Theorem 1.5.2), as we will see in
Section 2.5.2.
The theory of Szemerédi remains a very active research area. In particular,
alternative proofs of Theorem 2.5.5 have been given by other authors. Recently,
this led to the following spectacular result of the British mathematician Ben
Green and the Australian mathematician Terence Tao [GT08]: the set of
prime numbers contains arithmetic progressions of every length. This is not a
consequence of the theorem of Szemerédi, because the upper density of the set
of prime numbers is zero, but the theorem of Szemerédi does have an important
role in the proof. On the other hand, the Green–Tao theorem is a special case
of yet another conjecture of Erdős: if S ⊂ N is such that the sum of the inverses
diverges, that is, such that
\[ \sum_{n \in S} \frac{1}{n} = \infty, \]
then S contains arithmetic progressions of every length. This more general
statement remains open.
2.5.3 Exercises
2.5.1. Prove Lemma 2.5.2.
2.5.2. Show that the conclusion of Theorem 2.5.1 remains valid for partitions of finite
subsets of Z, as long as they are sufficiently large. More precisely: given q, l ≥ 1
there exists N ≥ 1 such that, for any partition of the set {1, 2, . . . , N} into l subsets,
at least one of these subsets contains arithmetic progressions of length q.
2.5.3. A point x ∈ M is said to be super non-wandering if, given any neighborhood
U of x and any k ≥ 1, there exists n ≥ 1 such that ⋂_{j=0}^{k} f^{−jn}(U) ≠ ∅. Show
that the theorem of van der Waerden is equivalent to the following statement:
every invertible transformation on a compact metric space has some super
non-wandering point.
2.5.4. Prove the following generalization of the theorem of van der Waerden to arbitrary
dimension, called the Grünwald theorem: given any partition Nk = S1 ∪ · · · ∪ Sl
and any q ≥ 1, there exist j ∈ {1, . . . , l}, d ∈ N and b ∈ Nk such that
\[ \lim_{n \to \infty} \frac{1}{n} \sum_{j=0}^{n-1} \varphi(f^j(x)). \tag{3.0.3} \]
This suggests a natural generalization of the original question: does the limit
in (3.0.3) exist for more general functions ϕ, for example, for all integrable
functions?
The ergodic theorem of von Neumann (Theorem 3.1.6) states that the limit in
(3.0.3) does exist, in the space L2 (μ), for every function ϕ ∈ L2 (μ). The ergodic
theorem of Birkhoff (Theorem 3.2.3) goes a lot further, by asserting that the
convergence holds at μ-almost every point, for every ϕ ∈ L1 (μ). In particular,
the limit in (3.0.1) is well defined for μ-almost every x (Theorem 3.2.1).
We give a direct proof of the theorem of von Neumann and we also show
how it can be deduced from the theorem of Birkhoff. Concerning the latter, we
are going to see that it can be obtained as a special case of an even stronger
result, the subadditive ergodic theorem of Kingman (Theorem 3.3.3). This
theorem asserts that ψn /n converges almost everywhere, for any sequence of
functions ψn such that ψm+n ≤ ψm + ψn ◦ f m for every m, n.
All these results remain valid for flows, as we comment upon in Section 3.4.
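To see why the Birkhoff theorem fits into this framework, note that the orbit sums ψn = ϕ + ϕ ◦ f + · · · + ϕ ◦ f^{n−1} satisfy the relation with equality. The following sketch checks this numerically under assumptions chosen only for illustration: f is the rotation of the circle by √2 − 1 and ϕ(x) = cos(2πx); the names ALPHA, iterate and birkhoff_sum are hypothetical.

```python
import math

ALPHA = math.sqrt(2.0) - 1.0  # rotation angle; an arbitrary irrational choice

def f(x):
    """Rotation of the circle R/Z by ALPHA."""
    return (x + ALPHA) % 1.0

def phi(x):
    return math.cos(2.0 * math.pi * x)

def iterate(x, m):
    """Return f^m(x)."""
    for _ in range(m):
        x = f(x)
    return x

def birkhoff_sum(x, n):
    """psi_n(x) = phi(x) + phi(f(x)) + ... + phi(f^{n-1}(x))."""
    total = 0.0
    for _ in range(n):
        total += phi(x)
        x = f(x)
    return total

if __name__ == "__main__":
    x, m, n = 0.1, 37, 55
    lhs = birkhoff_sum(x, m + n)
    rhs = birkhoff_sum(x, m) + birkhoff_sum(iterate(x, m), n)
    print(lhs, rhs)                       # psi_{m+n} = psi_m + psi_n o f^m
    print(birkhoff_sum(x, 10000) / 10000)  # psi_n / n is close to 0 here
```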
H = F ⊕ F⊥ , (3.1.1)
66 Ergodic theorems
To close this brief digression, let us quote a classical result from functional
analysis, due to Marshall H. Stone, that permits the reduction of the study of
Koopman operators of continuous time systems to the discrete case.
Let Ut : H → H, t ∈ R be a 1-parameter group of linear operators on a
Banach space: by this we mean that U0 = id and Ut+s = Ut Us for every t, s ∈ R.
We say that the group is strongly continuous if
\[ \lim_{t \to t_0} U_t v = U_{t_0} v, \quad \text{for every } t_0 \in \mathbb{R} \text{ and } v \in H. \]
See Yosida [Yos68, § IX.3] for a proof of the fact that the limit on the
right-hand side exists for every v in a dense subspace of H.
vectors of U. Then,
\[ \lim_{n \to \infty} \frac{1}{n} \sum_{j=0}^{n-1} U^j v = Pv \quad \text{for every } v \in H. \tag{3.1.5} \]
Proof. Let L(U) be the set of vectors v ∈ H of the form v = Uu − u for some
u ∈ H and let L̄(U) be its closure. We claim that
v · (Uun − un ) = v · Uun − v · un = U ∗ v · un − v · un = 0
for every n, we conclude that v · w = 0. This proves that I(U) ⊂ L̄(U)⊥ . Next,
consider any v ∈ L̄(U)⊥ . Then, in particular,
for every u ∈ H. This means that U ∗ v = v. Using Lemma 3.1.3 once more,
we deduce that v ∈ I(U). This shows that L̄(U)⊥ ⊂ I(U), which completes the
proof of (3.1.6). As a consequence, using (3.1.1),
Now we prove the identity (3.1.5), successively, for v ∈ I(U), for v ∈ L̄(U)
and for any v ∈ H. Begin by supposing that v ∈ I(U). On the one hand, Pv = v.
On the other hand,
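\[ \frac{1}{n} \sum_{j=0}^{n-1} U^j v = \frac{1}{n} \sum_{j=0}^{n-1} v = v \]
\[ \frac{1}{n} \sum_{j=0}^{n-1} U^j v = \frac{1}{n} \sum_{j=0}^{n-1} \big( U^{j+1} u - U^j u \big) = \frac{1}{n} (U^n u - u). \]
\[ \lim_{n} \frac{1}{n} \sum_{j=0}^{n-1} U^j v = 0 \quad \text{for every } v \in L(U). \tag{3.1.8} \]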
3.1 Ergodic theorem of von Neumann 69
More generally, suppose that v ∈ L̄(U). Then, there exist vectors vk ∈ L(U)
converging to v when k → ∞. Observe that
\[ \Big\| \frac{1}{n} \sum_{j=0}^{n-1} U^j v - \frac{1}{n} \sum_{j=0}^{n-1} U^j v_k \Big\| \le \frac{1}{n} \sum_{j=0}^{n-1} \| U^j (v - v_k) \| \le \| v - v_k \| \]
for every n and every k. Together with (3.1.8), this implies that
\[ \lim_{n} \frac{1}{n} \sum_{j=0}^{n-1} U^j v = 0 \quad \text{for every } v \in \bar L(U). \tag{3.1.9} \]
Since (3.1.6) implies that Pv = 0 for every v ∈ L̄(U), this shows that (3.1.5)
holds also when v ∈ L̄(U).
The general case of (3.1.5) follows immediately, as H = I(U) ⊕ L̄(U).
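The two steps of the proof are visible in the simplest possible example. In the sketch below, which is only an illustration, H is C² and U is the diagonal unitary operator with entries 1 and e^{iθ}, where θ is not a multiple of 2π (both are assumptions made for this example, as are the names apply_U and cesaro_average). The invariant vectors are those with second coordinate zero, so P keeps the first coordinate and kills the second. The second coordinate of the averages decays like 1/n, matching the estimate for vectors of the form Uu − u.

```python
import cmath

def apply_U(theta, v):
    """U = diag(1, e^{i theta}) acting on C^2; unitary, hence an isometry."""
    return (v[0], cmath.exp(1j * theta) * v[1])

def cesaro_average(theta, v, n):
    """(1/n) * (v + Uv + ... + U^{n-1} v)."""
    total, w = (0j, 0j), v
    for _ in range(n):
        total = (total[0] + w[0], total[1] + w[1])
        w = apply_U(theta, w)
    return (total[0] / n, total[1] / n)

if __name__ == "__main__":
    theta, v = 1.0, (2 + 1j, 3 - 1j)
    for n in (10, 100, 1000):
        avg = cesaro_average(theta, v, n)
        # Bound coming from (U^n u - u)/n with u = v[1] / (e^{i theta} - 1).
        bound = 2 * abs(v[1]) / (n * abs(cmath.exp(1j * theta) - 1))
        print(n, avg, bound)
```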
\[ \frac{1}{n} \sum_{j=0}^{n-1} \varphi \circ f^j \tag{3.1.10} \]
\[ \frac{1}{n} \sum_{j=0}^{n-1} \varphi \circ f^{-j} \tag{3.1.11} \]
3.1.4 Exercises
3.1.1. Show that under the hypotheses of the von Neumann ergodic theorem one has
the following stronger conclusion:
\[ \lim_{n-m \to \infty} \frac{1}{n-m} \sum_{j=m}^{n-1} \varphi \circ f^j = P(\varphi). \]
3.1.2. Use the previous exercise to show that, given any A ⊂ M with μ(A) > 0, the set
of values of n ∈ N such that μ(A ∩ f −n (A)) > 0 is syndetic. [Observation: We
have seen a different proof of this fact in Exercise 1.2.5.]
3.1.3. Prove that the set F = {ϕ ∈ L1 (μ) : ϕ is f -invariant} is a closed subspace of L1 (μ).
3.1.4. State and prove a version of the von Neumann ergodic theorem for flows.
3.1.5. Let ft : M → M, t ∈ R be a continuous flow on a compact metric space M and μ be
an invariant probability measure. Check that the 1-parameter group Ut : L2 (μ) →
L2 (μ), t ∈ R of Koopman operators ϕ → Ut ϕ = ϕ ◦ ft is strongly continuous.
Show that μ is ergodic if and only if 0 is a simple eigenvalue of the infinitesimal
generator of the group.
1 His son Garrett Birkhoff was also a mathematician, and is well known for his work in algebra.
The notion of projective distance that we use in Section 12.3 was due to him.
3.2 Birkhoff ergodic theorem 71
Indeed, by definition,
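\[ \tau(E, f(x)) = \lim_{n \to \infty} \frac{1}{n} \sum_{j=1}^{n} X_E(f^j(x)) \]
\[ = \lim_{n \to \infty} \Big[ \frac{1}{n} \sum_{j=0}^{n-1} X_E(f^j(x)) - \frac{1}{n} \big( X_E(x) - X_E(f^n(x)) \big) \Big] \]
\[ = \tau(E, x) - \lim_{n \to \infty} \frac{1}{n} \big( X_E(x) - X_E(f^n(x)) \big). \]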
Since the characteristic function is bounded, the last limit is equal to zero. This
proves (3.2.1).
The next example shows that the mean sojourn time does not exist for every
point, in general:
x = 0.10011110000000011111111111111110 . . . ,
where the lengths of the alternating blocks of 0s and 1s are given by successive
powers of 2. Let f : [0, 1] → [0, 1] be the transformation defined in Section 1.3.1
and let E = [0, 1/10). That is, E is the set of all points whose decimal expansion
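starts with the digit 0. It is easy to check that if n = 2^k − 1 with k even then
\[ \frac{1}{n} \sum_{j=0}^{n-1} X_E(f^j(x)) = \frac{2^1 + 2^3 + \cdots + 2^{k-1}}{2^k - 1} = \frac{2}{3}. \]
Similarly, if n = 2^k − 1 with k odd then
\[ \frac{1}{n} \sum_{j=0}^{n-1} X_E(f^j(x)) = \frac{2^1 + 2^3 + \cdots + 2^{k-2}}{2^k - 1} = \frac{2^k - 2}{3(2^k - 1)} \to \frac{1}{3} \]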
as k → ∞. Thus, the mean sojourn time of x in the set E does not exist.
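These frequencies can be verified exactly by working with the digits of x rather than with x itself. The sketch below assumes that f is the decimal expansion map f(x) = 10x − [10x], so that f^j(x) belongs to E precisely when the (j + 1)-th decimal digit of x is 0; the names digits and sojourn_frequency are hypothetical.

```python
from fractions import Fraction

def digits(length):
    """First `length` decimal digits of x: blocks of 1s and 0s of lengths 1, 2, 4, ..."""
    out, block, digit = [], 1, 1
    while len(out) < length:
        out.extend([digit] * block)
        block *= 2
        digit = 1 - digit
    return out[:length]

def sojourn_frequency(n):
    """Fraction of indices 0 <= j < n such that the (j+1)-th digit of x is 0."""
    return Fraction(digits(n).count(0), n)

if __name__ == "__main__":
    for k in range(2, 13):
        n = 2 ** k - 1
        print(k, n, sojourn_frequency(n), float(sojourn_frequency(n)))
```

Along n = 2^k − 1 the printed values alternate between 2/3 exactly (k even) and numbers approaching 1/3 (k odd).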
\[ \tau(E, x) = \lim_{n} \frac{1}{n} \sum_{j=0}^{n-1} \varphi(f^j(x)), \quad \text{where } \varphi = X_E. \]
The next statement extends Theorem 3.2.1 to the case when ϕ is any integrable
function:
\[ \tilde\varphi(x) = \lim_{n \to \infty} \frac{1}{n} \sum_{j=0}^{n-1} \varphi(f^j(x)) \tag{3.2.2} \]
Proof. By definition,
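\[ \tilde\varphi(f(x)) = \lim_{n \to \infty} \frac{1}{n} \sum_{j=1}^{n} \varphi(f^j(x)) = \lim_{n \to \infty} \Big[ \frac{1}{n} \sum_{j=0}^{n-1} \varphi(f^j(x)) + \frac{1}{n} \big( \varphi(f^n(x)) - \varphi(x) \big) \Big] \]
\[ = \tilde\varphi(x) + \lim_{n \to \infty} \frac{1}{n} \big( \varphi(f^n(x)) - \varphi(x) \big). \]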
In general, the total measure subset of points for which the limit in (3.2.2)
exists depends on the function ϕ under consideration. However, in some
situations it is possible to choose such a set independent of the function. A
useful example of such a situation is:
Proof. By the Birkhoff ergodic theorem, for every continuous function ϕ there
exists G(ϕ) ⊂ M such that μ(G(ϕ)) = 1 and (3.2.4) holds for every x ∈ G(ϕ).
By Theorem A.3.13, the space C0 (M) of continuous functions admits some
countable dense subset {ϕk : k ∈ N}. Take
\[ G = \bigcap_{k=1}^{\infty} G(\varphi_k). \]
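\[ \limsup_{n} \frac{1}{n} \sum_{j=0}^{n-1} \varphi(f^j(x)) \le \lim_{n} \frac{1}{n} \sum_{j=0}^{n-1} \varphi_k(f^j(x)) + \varepsilon = \tilde\varphi_k(x) + \varepsilon \]
\[ \liminf_{n} \frac{1}{n} \sum_{j=0}^{n-1} \varphi(f^j(x)) \ge \lim_{n} \frac{1}{n} \sum_{j=0}^{n-1} \varphi_k(f^j(x)) - \varepsilon = \tilde\varphi_k(x) - \varepsilon. \]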
\[ \limsup_{n} \frac{1}{n} \sum_{j=0}^{n-1} \varphi(f^j(x)) - \liminf_{n} \frac{1}{n} \sum_{j=0}^{n-1} \varphi(f^j(x)) \le 2\varepsilon. \]
In general, one can not say anything about the speed of convergence in
Theorem 3.2.3. For example, it follows from a theorem of Kakutani and
Petersen (check pages 94 to 99 of Petersen [Pet83]) that if the measure μ
is ergodic² and non-atomic then, given any sequence $(a_n)_n$ of positive real
numbers with $\lim_n a_n = 0$, there exists some bounded measurable function ϕ
with
$$\limsup_n \frac{1}{a_n}\Big|\frac{1}{n}\sum_{j=0}^{n-1} \varphi(f^j(x)) - \int \varphi\,d\mu\Big| = +\infty.$$
$$\liminf_n \frac{1}{a_n}\sum_{j=0}^{n-1} \varphi\circ f^j = 0 \quad\text{at almost every point;}$$
This and other related facts about infinite measures are proved in Section 2.4
of Aaronson [Aar97].
² We say that an invariant measure μ is ergodic if $f^{-1}(A) = A$ up to measure zero implies that
either μ(A) = 0 or $\mu(A^c) = 0$. The study of ergodic measures will be the subject of the next
chapter.
3.2 Birkhoff ergodic theorem 75
We can use the Minkowski inequality (Theorem A.5.3) to bound the sequence
on the right-hand side from above:
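$$\Big[\int \Big(\frac{1}{n}\sum_{j=0}^{n-1} |\varphi\circ f^j|\Big)^2 d\mu\Big]^{1/2} \le \frac{1}{n}\sum_{j=0}^{n-1} \Big[\int |\varphi\circ f^j|^2\,d\mu\Big]^{1/2}. \tag{3.2.6}$$
Since μ is invariant under f, the expression on the right-hand side is equal to
$\big[\int |\varphi|^2\,d\mu\big]^{1/2}$. So, (3.2.5) and (3.2.6) imply that $\|\tilde\varphi\|_2 \le \|\varphi\|_2 < \infty$.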
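Now let us show that $(1/n)\sum_{j=0}^{n-1} \varphi\circ f^j$ converges to $\tilde\varphi$ in $L^2(\mu)$. Initially,
suppose that the function ϕ is bounded, that is, there exists C > 0 such that
|ϕ| ≤ C. Then,
$$\Big|\frac{1}{n}\sum_{j=0}^{n-1} \varphi\circ f^j\Big| \le C \quad\text{for every } n \text{ and}\quad |\tilde\varphi| \le C.$$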
left to extend this conclusion to arbitrary functions ϕ in L2 (μ). For that, let us
consider some sequence (ϕk ) of bounded functions such that (ϕk )k converges
to ϕ. For example:
$$\varphi_k(x) = \begin{cases} \varphi(x) & \text{if } |\varphi(x)| \le k \\ 0 & \text{otherwise.} \end{cases}$$
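Denote by $\tilde\varphi_k$ the corresponding time averages. Given any ε > 0, let $k_0$ be fixed
such that $\|\varphi - \varphi_k\|_2 < \varepsilon/3$ for every $k \ge k_0$. Note that $\|(\varphi - \varphi_k)\circ f^j\|_2$ is equal
to $\|\varphi - \varphi_k\|_2$ for every j ≥ 0, because the measure μ is invariant. Thus,
$$\Big\|\frac{1}{n}\sum_{j=0}^{n-1} (\varphi - \varphi_k)\circ f^j\Big\|_2 \le \|\varphi - \varphi_k\|_2 < \varepsilon/3 \quad\text{for every } n \ge 1 \text{ and } k \ge k_0. \tag{3.2.7}$$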
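Observe also that $\tilde\varphi - \tilde\varphi_k$ is the time average of the function $\varphi - \varphi_k$. So, the
argument in the previous paragraph gives that
$$\|\tilde\varphi - \tilde\varphi_k\|_2 \le \|\varphi - \varphi_k\|_2 < \varepsilon/3 \quad\text{for every } k \ge k_0. \tag{3.2.8}$$
By assumption, for every k ≥ 1 there exists $n_0(k) \ge 1$ such that
$$\Big\|\frac{1}{n}\sum_{j=0}^{n-1} \varphi_k\circ f^j - \tilde\varphi_k\Big\|_2 < \varepsilon/3 \quad\text{for every } n \ge n_0(k). \tag{3.2.9}$$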
This completes the proof of the theorem of von Neumann from the theorem of
Birkhoff.
Exercise 3.2.5 contains an extension of these conclusions to any Lp (μ)
space.
Corollary 3.2.7. The time average ϕ̃ of any function ϕ ∈ L2 (μ) coincides with
the orthogonal projection P(ϕ) of ϕ to the subspace of invariant functions.
Proof. On the one hand, Theorem 3.1.7 gives that $(1/n)\sum_{j=0}^{n-1} \varphi\circ f^j$ converges
to P(ϕ) in L2 (μ). On the other hand, we have just shown that this sequence
converges to ϕ̃ in the space L2 (μ). So, by uniqueness of the limit, P(ϕ) = ϕ̃.
$$\lim_n \frac{1}{n}\sum_{j=0}^{n-1} \varphi\circ f^{-j} = \lim_n \frac{1}{n}\sum_{j=0}^{n-1} \varphi\circ f^{j} \quad\text{at μ-almost every point.} \tag{3.2.10}$$
Proof. The limit on the left-hand side of (3.2.10) is the orthogonal projection
of ϕ to the subspace of functions invariant under f −1 , whereas the limit on the
right-hand side is the orthogonal projection of ϕ to the subspace of functions
invariant under f . It is clear that these two subspaces are exactly the same.
Thus, the two limits coincide in L2 (μ).
3.2.4 Exercises
3.2.1. Let X = {x1 , . . . , xr } be a finite set and σ : X → X be a permutation. We call σ a
cyclic permutation if it admits a unique orbit (containing all r elements of X).
1. Prove that, for any cyclic permutation σ and any function ϕ : X → R,
$$\lim_{n\to\infty} \frac{1}{n}\sum_{i=0}^{n-1} \varphi(\sigma^i(x)) = \frac{\varphi(x_1) + \cdots + \varphi(x_r)}{r}.$$
2. More generally, prove that for any permutation σ and any function ϕ
$$\lim_{n\to\infty} \frac{1}{n}\sum_{i=0}^{n-1} \varphi(\sigma^i(x)) = \frac{\varphi(x) + \varphi(\sigma(x)) + \cdots + \varphi(\sigma^{p-1}(x))}{p},$$
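$$\Big|\frac{1}{\rho}\sum_{j=n\rho+1}^{(n+1)\rho} \varphi(j) - \frac{1}{\rho}\sum_{j=1}^{\rho} \varphi(j)\Big| < 2\varepsilon \quad\text{for every } n \ge 1.$$
(c) Show that the sequence $(1/n)\sum_{j=1}^{n} \varphi(j)$ converges to some real number
when n → ∞.
(d) More generally, prove that $\lim_n (1/n)\sum_{k=1}^{n} \varphi(x + k)$ exists for every x ∈ Z
and is independent of x.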
3.2.4. Prove that for Lebesgue-almost every x ∈ [0, 1], the geometric mean of the integer
numbers a1 , . . . , an , . . . in the continued fraction expansion of x converges to some
real number: in other words, there exists b ∈ R such that $\lim_n (a_1 a_2 \cdots a_n)^{1/n} = b$.
[Observation: Compare with Exercise 4.2.12.]
3.2.5. Let ϕ : M → R be an integrable function and ϕ̃ be the corresponding time average,
given by Theorem 3.2.3. Show that if $\varphi \in L^p(\mu)$ for some p > 1 then $\tilde\varphi \in L^p(\mu)$
and $\|\tilde\varphi\|_p \le \|\varphi\|_p$. Moreover,
$$\frac{1}{n}\sum_{j=0}^{n-1} \varphi\circ f^j$$
The proof of Theorem 3.3.3 that we are going to present is due to Avila
and Bochi [AB], who started from a proof of the Birkhoff ergodic theorem
(Theorem 3.2.3) by Katznelson and Weiss [KW82]. An important observation
is that Theorem 3.2.3 is not used in the arguments. This allows us to obtain the
theorem of Birkhoff as a particular case of Theorem 3.3.3.
The crucial step in the proof of the theorem is the following estimate:
3.3 Subadditive ergodic theorem 81
[Figure 3.1: time lines marking $m_0, n_1, m_1, \dots, n_l, m_l, n$ (and $n_{l+1}$ in the second line), with the intervening segments labeled $E_k^c$.]
Proof. Take x ∈ M such that $\varphi_-(x) = \varphi_-(f^j(x))$ for every j ≥ 1 (this holds
at μ-almost every point, according to Exercise 3.3.2). Consider the sequence,
possibly finite, of integer numbers
$$m_0 \le n_1 < m_1 \le n_2 < m_2 \le \dots \tag{3.3.6}$$
defined inductively as follows (see also Figure 3.1).
Define $m_0 = 0$. Let $n_j$ be the smallest integer greater than or equal to $m_{j-1}$
satisfying $f^{n_j}(x) \in E_k$ (if it exists). Then, by the definition of $E_k$, there exists $m_j$
such that $1 \le m_j - n_j \le k$ and
$$\varphi_{m_j - n_j}(f^{n_j}(x)) \le (m_j - n_j)\big(\varphi_-(f^{n_j}(x)) + \varepsilon\big). \tag{3.3.7}$$
This completes the definition of the sequence (3.3.6). Now, given n ≥ k, let
l ≥ 0 be the largest integer such that $m_l \le n$. By subadditivity,
$$\varphi_{n_j - m_{j-1}}(f^{m_{j-1}}(x)) \le \sum_{i=m_{j-1}}^{n_j - 1} \varphi_1(f^i(x))$$
for every j = 1, . . . , l such that $m_{j-1} \ne n_j$, and analogously for $\varphi_{n - m_l}(f^{m_l}(x))$.
Thus,
$$\varphi_n(x) \le \sum_{i\in I} \varphi_1(f^i(x)) + \sum_{j=1}^{l} \varphi_{m_j - n_j}(f^{n_j}(x)) \tag{3.3.8}$$
where $I = \bigcup_{j=1}^{l} [m_{j-1}, n_j) \cup [m_l, n)$. Observe that
$$\varphi_1(f^i(x)) = \psi_k(f^i(x)) \quad\text{for every } i \in \bigcup_{j=1}^{l} [m_{j-1}, n_j) \cup [m_l, \min\{n_{l+1}, n\}),$$
since $f^i(x) \in E_k^c$ in all these cases. Moreover, since $\varphi_-$ is constant on orbits (see
Exercise 3.3.2) and $\psi_k \ge \varphi_- + \varepsilon$, the relation (3.3.7) gives that
$$\varphi_{m_j - n_j}(f^{n_j}(x)) \le \sum_{i=n_j}^{m_j - 1} \big(\varphi_-(f^i(x)) + \varepsilon\big) \le \sum_{i=n_j}^{m_j - 1} \psi_k(f^i(x))$$
for every j = 1, . . . , l. Together with (3.3.8), this gives
$$\varphi_n(x) \le \sum_{i=0}^{\min\{n_{l+1}, n\} - 1} \psi_k(f^i(x)) + \sum_{i=n_{l+1}}^{n-1} \varphi_1(f^i(x)).$$
3.3.3 Estimating ϕ−
Towards establishing (3.3.5), in this section we prove the following lemma:
Lemma 3.3.6. $\int \varphi_-\,d\mu = L$
Proof. Suppose for a while that $\varphi_n/n$ is uniformly bounded from below, that is,
that there exists κ > 0 such that $\varphi_n/n \ge -\kappa$ for every n. Applying the lemma of
Fatou (Theorem A.2.10) to the sequence of non-negative functions $\varphi_n/n + \kappa$,
we get that $\varphi_-$ is integrable and
$$\int \varphi_-\,d\mu \le \lim_n \int \frac{\varphi_n}{n}\,d\mu = L.$$
To prove the opposite inequality, observe that Lemma 3.3.5 implies
$$\frac{1}{n}\int \varphi_n\,d\mu \le \frac{n-k}{n}\int \psi_k\,d\mu + \frac{k}{n}\int \max\{\psi_k, \varphi_1\}\,d\mu. \tag{3.3.9}$$
Note that $\max\{\psi_k, \varphi_1\} \le \max\{\varphi_- + \varepsilon, \varphi_1^+\}$, and this last function is integrable.
So, the limit superior of the last term in (3.3.9) as n → ∞ is less than or equal
to zero. So, making n → ∞ we get that $L \le \int \psi_k\,d\mu$ for every k. Then, making
k → ∞, we conclude that
$$L \le \int \varphi_-\,d\mu + \varepsilon.$$
Finally, making ε → 0 we get that $L \le \int \varphi_-\,d\mu$. This proves the lemma when
$\varphi_n/n$ is uniformly bounded from below.
We are left to remove this hypothesis. Define, for each κ > 0,
$$\varphi_n^\kappa = \max\{\varphi_n, -\kappa n\} \quad\text{and}\quad \varphi_-^\kappa = \max\{\varphi_-, -\kappa\}.$$
The sequence $(\varphi_n^\kappa)_n$ satisfies all the conditions of Theorem 3.3.3: indeed, it is
subadditive and the positive part of $\varphi_1^\kappa$ is integrable. Moreover, it is clear that
$\varphi_-^\kappa = \liminf_n (\varphi_n^\kappa/n)$. So, the argument in the previous paragraph shows that
$$\int \varphi_-^\kappa\,d\mu = \inf_n \frac{1}{n}\int \varphi_n^\kappa\,d\mu. \tag{3.3.10}$$
n n
3.3.4 Bounding ϕ+
To complete the proof of (3.3.5), we are now going to show that $\int \varphi_+\,d\mu \le L$
as long as $\inf_x \varphi_n(x)$ is finite for every n. Let us start by proving the following
auxiliary result:
Lemma 3.3.8. Suppose that $\inf_x \varphi_n(x) > -\infty$ for every n. Then $\int \varphi_+\,d\mu \le L$.
Proof. For each k and n ≥ 1, consider $\theta_n = -\sum_{j=0}^{n-1} \varphi_k\circ f^{jk}$. Observe that
$$\int \theta_n\,d\mu = -n\int \varphi_k\,d\mu \quad\text{for every } n, \tag{3.3.12}$$
Observe also that the sequence $(\theta_n)_n$ is additive: $\theta_{m+n} = \theta_m + \theta_n\circ f^{km}$ for every
m, n ≥ 1. Since $\theta_1 = -\varphi_k$ is bounded from above by $-\inf \varphi_k$, we also have that
the function $\theta_1^+$ is bounded and, consequently, integrable. Thus, we may apply
Lemma 3.3.6, together with the equality (3.3.12), to conclude that
$$\int \theta_-\,d\mu = \inf_n \int \frac{\theta_n}{n}\,d\mu = -\int \varphi_k\,d\mu. \tag{3.3.14}$$
Putting (3.3.13) and (3.3.14) together we get that
$$\int \varphi_+\,d\mu \le \frac{1}{k}\int \varphi_k\,d\mu.$$
Finally, taking the infimum over k we get that $\int \varphi_+\,d\mu \le L$.
Lemmas 3.3.6 and 3.3.8 imply the relation (3.3.5) and, thus, Theorem 3.3.3
is proven when inf ϕk > −∞ for every k. In the general case, consider
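$$\varphi_n^\kappa = \max\{\varphi_n, -\kappa n\} \quad\text{and}\quad \varphi_-^\kappa = \max\{\varphi_-, -\kappa\} \quad\text{and}\quad \varphi_+^\kappa = \max\{\varphi_+, -\kappa\}$$
for every constant κ > 0. The previous arguments may be applied to the
sequence $(\varphi_n^\kappa)_n$ for each fixed κ > 0. Hence, $\varphi_+^\kappa = \varphi_-^\kappa$ at μ-almost every point
for every κ > 0. Since $\varphi_-^\kappa \to \varphi_-$ and $\varphi_+^\kappa \to \varphi_+$ when κ → ∞, it follows that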
ϕ− = ϕ+ at μ-almost every point. The proof of Theorem 3.3.3 is complete.
Moreover, the numbers k(x) and $\lambda_1(x), \dots, \lambda_k(x)$ and the subspaces $V_x^1, \dots, V_x^k$
depend measurably on the point x.
The numbers λi (x) are called the Lyapunov exponents of θ at the point x.
They satisfy λ1 = λmax and λk = λmin . For this reason, we also call λmax (x)
and λmin (x) the extremal Lyapunov exponents of θ at the point x. Each di (x) is
called the multiplicity of the Lyapunov exponent λi (x).
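As a numerical illustration, the largest exponent can be estimated by $(1/n)\log\|\phi^n(x)\|$ for large n. The sketch below assumes the usual convention $\phi^n(x) = \theta(f^{n-1}(x)) \cdots \theta(f(x))\,\theta(x)$; the function names, the choice of the Frobenius norm (any matrix norm gives the same limit) and the example cocycle are ours.

```python
import math

def mat_mul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def frobenius_norm(a):
    return math.sqrt(sum(v * v for row in a for v in row))

def extremal_exponent(theta, f, x, n):
    """Estimate lambda_max(x) by (1/n) log ||phi^n(x)||, where
    phi^n(x) = theta(f^{n-1}(x)) ... theta(f(x)) theta(x)."""
    d = len(theta(x))
    product = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    log_norm = 0.0
    for _ in range(n):
        product = mat_mul(theta(x), product)   # newest factor goes on the left
        norm = frobenius_norm(product)
        product = [[v / norm for v in row] for row in product]  # avoid overflow
        log_norm += math.log(norm)             # accumulates log ||phi^n(x)||
        x = f(x)
    return log_norm / n

# Hypothetical example: a constant cocycle over a circle rotation.
CAT = [[2.0, 1.0], [1.0, 1.0]]

def constant_cocycle(x):
    return CAT

def rotation(x):
    return (x + 0.25) % 1.0

if __name__ == "__main__":
    print(extremal_exponent(constant_cocycle, rotation, 0.1, 200))
    print(math.log((3 + math.sqrt(5)) / 2))
```

For a constant cocycle θ ≡ A the estimate approaches the logarithm of the spectral radius of A, which for this matrix is $\log((3+\sqrt5)/2)$.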
When f is invertible, we may extend the sequence $\phi^n$ to the whole of Z,
through
$$\phi^{-n}(x) = \phi^n(f^{-n}(x))^{-1} \quad\text{for every } n \ge 1 \text{ and } x \in M.$$
Assuming also that $\log^+\|\theta^{-1}\| \in L^1(\mu)$, one obtains a stronger conclusion than
before: more than a filtration, there is a decomposition
$$\mathbb{R}^d = E_x^1 \oplus \cdots \oplus E_x^k \tag{3.3.17}$$
(c2) $\lim_{n\to+\infty} \frac{1}{n}\log|\det \phi^n(x)| = \sum_{i=1}^{k} d_i(x)\lambda_i(x)$, where $d_i(x) = \dim E_x^i$.
The reader will find a much more detailed discussion of these results,
including proofs, in Chapter 4 of [Via14].
3.3.6 Exercises
3.3.1. Give a direct proof of the Birkhoff ergodic theorem (Theorem 3.2.3), using the
approach in the proof of Theorem 3.3.3.
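3.3.2. Given a subadditive sequence $(\varphi_n)_n$ with $\varphi_1^+ \in L^1(\mu)$, show that the functions
$$\varphi_- = \liminf_n \frac{\varphi_n}{n} \quad\text{and}\quad \varphi_+ = \limsup_n \frac{\varphi_n}{n}$$
are f-invariant, that is, they satisfy $\varphi_-(x) = \varphi_-\circ f(x)$ and $\varphi_+(x) = \varphi_+\circ f(x)$ for
μ-almost every x ∈ M.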
3.3.3. State and prove the subadditive ergodic theorem for flows.
3.3.4. Let M be a compact manifold and f : M → M be a diffeomorphism of class C1
that preserves the Lebesgue measure. Check that
$$\sum_{i=1}^{k(x)} d_i(x)\lambda_i(x) = 0 \quad\text{at μ-almost every point } x \in M,$$
where λi (x), i = 1, . . . , k(x) are the Lyapunov exponents of Df at the point x and
di (x), i = 1, . . . , k(x) are the corresponding multiplicities.
3.3.5. Let (ϕn )n be a subadditive sequence of functions for some transformation f : M →
M. We call the time constant of (ϕn )n the number
$$\lim_n \frac{1}{n}\int \varphi_n\,d\mu$$
when it exists. Assuming that the limit does exist and is finite, show that we may
write ϕn = ψn + γn for each n, in such a way that (ψn )n is an additive sequence
and (γn )n is a subadditive sequence with time constant equal to zero.
3.3.6. Under the assumptions of the Furstenberg–Kesten theorem, show that the
sequence $\psi_n = (1/n)\log\|\phi^n\|$ is uniformly integrable, in the following sense:
for every ε > 0 there exists δ > 0 such that
$$\mu(E) < \delta \;\Rightarrow\; \int_E \psi_n^+\,d\mu < \varepsilon \quad\text{for every } n.$$
3.3.7. Under the assumptions of the Furstenberg–Kesten theorem, let $\Psi_k$ denote the
time average of the function $\psi_k = (1/k)\log\|\phi^k\|$ relative to the transformation
$f^k$. Show that $\lambda_{\max}(x) \le \Psi_k(x)$ for every k and μ-almost every x. Using
Exercise 3.3.6, show that for every ρ > 0 and μ-almost every x there exists k
such that $\Psi_k(x) \le \lambda_{\max}(x) + \rho$.
3.4 Discrete time and continuous time 87
for every x ∈ M. That is the case, for instance, if τ is bounded away from zero.
The first step is to construct the domain N of the suspension flow. Let us
consider the transformation F : M × R → M × R defined by
F(x, s) = (f (x), s − τ (x)).
Note that F is invertible. Let ∼ be the equivalence relation defined in M × R
by
$$(x, s) \sim (\tilde x, \tilde s) \iff \text{there exists } n \in \mathbb{Z} \text{ such that } F^n(x, s) = (\tilde x, \tilde s).$$
We denote by N the set of equivalence classes and by π : M × R → N the
canonical projection associating with every (x, s) ∈ M × R the corresponding
equivalence class.
Now consider the flow $G^t$ : M × R → M × R given by $G^t(x, s) = (x, s + t)$. It
is clear that $G^t \circ F = F \circ G^t$ for every t ∈ R. This ensures that $G^t$, t ∈ R induces
a flow $g^t$, t ∈ R in the quotient space N, given by
$$g^t(\pi(x, s)) = \pi(G^t(x, s)) \quad\text{for every } x \in M \text{ and } s, t \in \mathbb{R}. \tag{3.4.2}$$
Indeed, if $\pi(x, s) = \pi(\tilde x, \tilde s)$ then there exists n ∈ Z such that $F^n(x, s) = (\tilde x, \tilde s)$.
Hence,
$$G^t(\tilde x, \tilde s) = G^t \circ F^n(x, s) = F^n \circ G^t(x, s)$$
and so $\pi(G^t(x, s)) = \pi(G^t(\tilde x, \tilde s))$. This shows that the flow $g^t$, t ∈ R is well
defined.
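In concrete terms, the flow can be computed on representatives. The sketch below is a minimal illustration for forward time only: it assumes that each class is represented by the point (x, s) with 0 ≤ s < τ(x), and it moves s forward by t and then applies F as long as s ≥ τ(x). The rotation and the roof function are hypothetical choices of ours.

```python
from fractions import Fraction

def suspension_flow(f, tau, x, s, t):
    """Forward-time suspension flow on representatives (x, s) with 0 <= s < tau(x).

    Adds t to the second coordinate and then applies
    F(x, s) = (f(x), s - tau(x)) until the point is back in that domain."""
    if t < 0:
        raise ValueError("this sketch only handles t >= 0")
    s = s + t
    while s >= tau(x):
        s = s - tau(x)   # uses tau at the current base point ...
        x = f(x)         # ... and only then moves the base point
    return x, s

# Hypothetical example: a rational rotation with a non-constant roof function.
def rotation(x):
    return (x + Fraction(1, 3)) % 1

def roof(x):
    return 1 + x

if __name__ == "__main__":
    print(suspension_flow(rotation, roof, Fraction(1, 5), Fraction(0), Fraction(7, 3)))
```

Exact rational arithmetic makes it possible to check the flow property $g^{t+u} = g^u \circ g^t$ without rounding issues; when τ ≡ 1, time 1 of the flow restricted to s = 0 reduces to f itself.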
Then, taking n maximum such that $s_n \ge 0$, we find that $(x_n, s_n) \in D$. In this way,
the claim is proved. Now observe that the claim means that the restriction of
the projection π to the domain D is a bijection onto N. Thus, we may identify N
with D and, in particular, we may consider $g^t$, t ∈ R as a flow in D.
In just the same way, we may identify M with the subset Σ = π(M × {0}) of
N. Observing that
ν = π∗ (μ × ds | D). (3.4.4)
[Figure: suspension flow, with labels f(x), 0 and τ(x).]
Lemma 3.4.2. Let A be a measurable subset of $\Sigma_\rho$ for some ρ > 0. Then, the
function $\delta \mapsto \nu(A_\delta)/\delta$ is constant in the interval (0, ρ].
and this is a disjoint union. Using that ν is invariant under the flow $g^t$, t ∈ R,
we conclude that $\nu(A_\delta) = l\,\nu(A_{\delta/l})$ for every δ ∈ (0, ρ] and every l ≥ 1. Then,
$\nu(A_{r\delta}) = r\,\nu(A_\delta)$ for every δ ∈ (0, ρ] and every rational number r ∈ (0, 1). Using,
furthermore, the fact that both sides of this relation vary monotonically with
r, we get that the equality remains true for every real number r ∈ (0, 1). This
implies the conclusion of the lemma.
Proposition 3.4.3. Suppose that the measure ν is finite. Then the measure μ
in Σ is invariant under the Poincaré map f.
[Figure: the sets A, $A_\delta$, f(A) and $f(A)_\delta$.]
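Next, choose $t_i < \inf(\tau \mid A^i) \le \sup(\tau \mid A^i) < s_i$ such that $s_i - t_i < \varepsilon\rho_i$. Fix
$\delta_i = \rho_i/2$. Then, using the fact that f is essentially surjective,
$$g^{t_i}(A^i_{\delta_i}) \supset B^i_{\delta_i - (s_i - t_i)} \quad\text{and}\quad g^{s_i}(A^i_{\delta_i}) \subset B^i_{\delta_i + (s_i - t_i)}.$$
Hence, using the hypothesis that ν is invariant,
$$\nu(A^i_{\delta_i}) = \nu(g^{t_i}(A^i_{\delta_i})) \ge \nu(B^i_{\delta_i - (s_i - t_i)})$$
$$\nu(A^i_{\delta_i}) = \nu(g^{s_i}(A^i_{\delta_i})) \le \nu(B^i_{\delta_i + (s_i - t_i)}).$$
Dividing by $\delta_i$ we get that
$$\mu(A^i) \ge \Big(1 - \frac{s_i - t_i}{\delta_i}\Big)\mu(B^i) > (1 - 2\varepsilon)\mu(B^i)$$
$$\mu(A^i) \le \Big(1 + \frac{s_i - t_i}{\delta_i}\Big)\mu(B^i) < (1 + 2\varepsilon)\mu(B^i).$$
Finally, adding over all the values of i, we conclude that
$$(1 - 2\varepsilon)\mu(B) \le \mu(A) \le (1 + 2\varepsilon)\mu(B).$$
Since ε is arbitrary, this proves that the measure μ is invariant under f.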
3.4.3 Exercises
3.4.1. Check that the function μ defined by (3.4.6)–(3.4.7) is a measure.
3.4.2. In the context of Section 3.4.1, suppose that M is a topological space and f : M →
M and τ : M → (0, ∞) are continuous. Let gt : N → N be the suspension flow
and ν be the suspension of some Borel measure μ invariant under f .
(a) Show that if x ∈ M is recurrent for the transformation f then π(x, s) ∈ N is
recurrent for the flow gt , for every s ∈ R.
(b) Show that if π(x, s) ∈ N is recurrent for the flow gt , for some s ∈ R, then
x ∈ M is recurrent for f .
(c) Conclude that the set of recurrent points of f has total measure for μ if and
only if the set of recurrent points of gt , t ∈ R has total measure for ν. In
particular, this happens if at least one of the measures μ or ν is finite.
3.4.3. Let $g^t$ : N → N, t ∈ R be the flow defined by a vector field X of class $C^1$ on a
compact Riemannian manifold N. Assume that this flow preserves the volume
measure ν associated with the Riemannian metric. Let Σ be a hypersurface of N
transverse to X and $\nu_\Sigma$ be the volume measure on Σ associated with the restriction
of the Riemannian metric. Define φ : Σ → (0, ∞) through φ(y) = |X(y) · n(y)|,
where n(·) is a unit vector field orthogonal to Σ. Show that the measure $\eta = \phi\,\nu_\Sigma$
is invariant under the Poincaré map f : Σ → Σ of the flow. Indeed, η coincides
with the flux of ν through Σ.
3.4.4. The following construction has a significant role in the theory of interval
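exchanges. Let $\hat N \subset \mathbb{R}^4_+$ be the set of all 4-tuples $(\lambda_1, \lambda_2, h_1, h_2)$ of positive real
numbers, endowed with the standard volume measure $\hat\nu = d\lambda_1\,d\lambda_2\,dh_1\,dh_2$. Define
$$F : \hat N \to \hat N, \quad F(\lambda_1, \lambda_2, h_1, h_2) = \begin{cases} (\lambda_1 - \lambda_2, \lambda_2, h_1, h_1 + h_2) & \text{if } \lambda_1 > \lambda_2 \\ (\lambda_1, \lambda_2 - \lambda_1, h_1 + h_2, h_2) & \text{if } \lambda_1 < \lambda_2. \end{cases}$$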
The theorems presented in the previous chapter fully establish the first part of
Boltzmann’s ergodic hypothesis: for any measurable set E, the mean sojourn
time τ (E, x) is well defined for almost every point x. The second part of the
ergodic hypothesis, that is, the claim that τ (E, x) should coincide with the
measure of E for almost every x, is a statement of a different nature and is
the subject of the present chapter.
In this chapter we always take μ to be a probability measure invariant under
some measurable transformation f : M → M. We say that the system (f , μ) is
ergodic if, given any measurable set E, we have τ (E, x) = μ(E) for μ-almost
every point x ∈ M. We are going to see that this is equivalent to saying that
the system is dynamically indivisible, in the sense that every invariant set
has either full measure or zero measure. Other equivalent formulations of the
ergodicity property are discussed in Section 4.1. One of them is that time
averages coincide with space averages: for every integrable function ϕ,
$$\lim_n \frac{1}{n}\sum_{j=0}^{n-1} \varphi(f^j(x)) = \int \varphi\,d\mu \quad\text{at μ-almost every point.}$$
(i) For every measurable set B ⊂ M one has τ (B, x) = μ(B) for μ-almost
every point.
(ii) For every measurable set B ⊂ M the function τ (B, ·) is constant at
μ-almost every point.
(iii) For every integrable function ϕ : M → R one has $\tilde\varphi(x) = \int \varphi\,d\mu$ for
μ-almost every point.
(iv) For every integrable function ϕ : M → R the time average $\tilde\varphi$ : M → R is
constant at μ-almost every point.
(v) For every invariant integrable function ψ : M → R one has $\psi(x) = \int \psi\,d\mu$
for μ-almost every point.
(vi) Every invariant integrable function ψ : M → R is constant at μ-almost
every point.
(vii) For every invariant subset A we have either μ(A) = 0 or μ(A) = 1.
Proof. It is immediate that (i) implies (ii), that (iii) implies (iv) and that
(v) implies (vi). It is also clear that (v) implies (iii) and (vi) implies (iv),
because the time average is an invariant function (recall Proposition 3.2.4).
Analogously, (iii) implies (i) and (iv) implies (ii), because the mean sojourn
time is a time average (of the characteristic function of B). We are left to prove
the following implications:
(ii) implies (vii): Let A be an invariant set. Then τ (A, x) = 1 for μ-almost
every x ∈ A and τ (A, x) = 0 for μ-almost every x ∈ Ac . Since τ (A, ·) is assumed
to be constant at μ-almost every point, it follows that μ(A) = 0 or μ(A) = 1.
(vii) implies (v): Let ψ be an invariant integrable function. Then every
level set
$$B_c = \{x \in M : \psi(x) \le c\}$$
is an invariant set. So, the hypothesis implies that $\mu(B_c) \in \{0, 1\}$ for every
c ∈ R. Since $c \mapsto \mu(B_c)$ is non-decreasing, it follows that there exists $\bar c \in \mathbb{R}$
such that $\mu(B_c) = 0$ for every $c < \bar c$ and $\mu(B_c) = 1$ for every $c \ge \bar c$. Then $\psi = \bar c$
at μ-almost every point. Hence, $\int \psi\,d\mu = \bar c$ and so $\psi = \int \psi\,d\mu$ at μ-almost
every point.
(i) (f , μ) is ergodic.
(ii) For any pair of measurable sets A and B one has
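$$\lim_n \frac{1}{n}\sum_{j=0}^{n-1} \mu\big(f^{-j}(A) \cap B\big) = \mu(A)\mu(B). \tag{4.1.1}$$
(iii) For any functions $\varphi \in L^p(\mu)$ and $\psi \in L^q(\mu)$, with 1/p + 1/q = 1, one has
$$\lim_n \frac{1}{n}\sum_{j=0}^{n-1} \int (U_f^j\varphi)\psi\,d\mu = \int \varphi\,d\mu \int \psi\,d\mu. \tag{4.1.2}$$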
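Condition (4.1.1) can be tested by hand on a finite set. The sketch below is an illustration under assumptions of ours: M is a finite set, μ is the uniform probability (which is invariant when f is a permutation), and the function name is hypothetical. It uses that $\mu(f^{-j}(A) \cap B)$ counts the points x ∈ B with $f^j(x) \in A$.

```python
from fractions import Fraction

def cesaro_correlation(f, points, A, B, n):
    """(1/n) * sum_{j=0}^{n-1} mu(f^{-j}(A) ∩ B) for the uniform probability
    mu on the finite set `points` (f is assumed to be a permutation of it)."""
    size = len(points)
    current = {x: x for x in B}          # x -> f^j(x), starting with j = 0
    total = 0
    for _ in range(n):
        total += sum(1 for x in B if current[x] in A)
        current = {x: f(current[x]) for x in B}
    return Fraction(total, n * size)

def cyclic(x):
    """A cyclic permutation of {0, ..., 5}: a single orbit."""
    return (x + 1) % 6

def two_cycles(x):
    """A permutation of {0, ..., 5} with two orbits, {0, 2, 4} and {1, 3, 5}."""
    return (x + 2) % 6

if __name__ == "__main__":
    M = list(range(6))
    print(cesaro_correlation(cyclic, M, {0, 1}, {0, 2, 4}, 6))          # mu(A)mu(B) = 1/6
    print(cesaro_correlation(two_cycles, M, {0, 2, 4}, {0, 2, 4}, 6))   # 1/2, not 1/4
```

For the cyclic permutation the average over a full period equals μ(A)μ(B) exactly, while for the permutation with two orbits the invariant set A = B = {0, 2, 4} gives 1/2 instead of 1/4, in agreement with the lack of ergodicity.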
Proof. It is clear that (iii) implies (ii): just take ϕ = XA and ψ = XB . To show
that (ii) implies (i), let A be an invariant set. Taking A = B in hypothesis (ii),
we get that
$$\mu(A) = \lim_n \frac{1}{n}\sum_{j=0}^{n-1} \mu\big(f^{-j}(A) \cap A\big) = \mu(A)^2.$$
at μ-almost every point. Initially, assume that |ϕ| ≤ k for some k ≥ 1. Then, for
every n ∈ N,
$$\Big|\Big(\frac{1}{n}\sum_{j=0}^{n-1} U_f^j\varphi\Big)\psi\Big| \le k|\psi|.$$
So, since $k|\psi| \in L^1(\mu)$, we may use the dominated convergence theorem
(Theorem A.2.11) to conclude that
$$\int \Big(\frac{1}{n}\sum_{j=0}^{n-1} U_f^j\varphi\Big)\psi\,d\mu \to \int \varphi\,d\mu \int \psi\,d\mu.$$
4.1 Ergodic systems 97
This proves the claim (4.1.2) when ϕ is bounded. All that is left to do is remove
this restriction. Given any ϕ ∈ Lp (μ) and k ≥ 1, define
$$\varphi_k(x) = \begin{cases} k & \text{if } \varphi(x) > k \\ \varphi(x) & \text{if } \varphi(x) \in [-k, k] \\ -k & \text{if } \varphi(x) < -k. \end{cases}$$
if n is large enough (depending on k). Next, observe that $\|\varphi_k - \varphi\|_p \to 0$ when
k → ∞: this is clear when p = ∞, because $\varphi_k = \varphi$ for every $k > \|\varphi\|_\infty$; for
p < ∞ use the monotone convergence theorem (Theorem A.2.9). Hence, using
the Hölder inequality (Theorem A.5.5), we have that
$$\Big|\int (\varphi_k - \varphi)\,d\mu \int \psi\,d\mu\Big| \le \|\varphi_k - \varphi\|_p \Big|\int \psi\,d\mu\Big| < \varepsilon, \tag{4.1.5}$$
1 j
n−1 (4.1.6)
≤ U (ϕk − ϕ)p ψq dμ
n j=0 f
for every n and every k sufficiently large, independent of n. Fix k so that (4.1.5)
and (4.1.6) hold and then take n sufficiently large such that (4.1.4) also holds.
Summing the three relations (4.1.4) to (4.1.6), we get that
$$\Big|\frac{1}{n}\sum_{j=0}^{n-1} \int (U_f^j\varphi)\psi\,d\mu - \int \varphi\,d\mu \int \psi\,d\mu\Big| < 3\varepsilon$$
$$\lim_n \frac{1}{n}\sum_{j=0}^{n-1} \big[(U_f^j\varphi) - (\varphi\cdot 1)\big]\cdot\psi = 0 \quad\text{for every } \varphi, \psi \in L^2(\mu). \tag{4.1.7}$$
We will use a few times the following elementary facts: given any
measurable sets A and B,
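$$|\mu(A) - \mu(B)| = |\mu(A \setminus B) - \mu(B \setminus A)| \le \mu(A \setminus B) + \mu(B \setminus A) = \mu(A \Delta B), \tag{4.1.8}$$
and given any sets $A_1, A_2, B_1, B_2$,
$$(A_1 \cap A_2)\,\Delta\,(B_1 \cap B_2) \subset (A_1 \Delta B_1) \cup (A_2 \Delta B_2). \tag{4.1.9}$$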
Corollary 4.1.5. Assume that the condition (4.1.1) in Proposition 4.1.4 holds
for every A and B in some algebra A that generates the σ -algebra of
measurable sets. Then (f , μ) is ergodic.
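$$\lim_n \frac{1}{n}\sum_{j=0}^{n-1} \mu\big(f^{-j}(A_0) \cap B_0\big) = \mu(A_0)\mu(B_0)$$
implies that
$$-4\varepsilon \le \liminf_n \frac{1}{n}\sum_{j=0}^{n-1} \mu\big(f^{-j}(A) \cap B\big) - \mu(A)\mu(B) \le \limsup_n \frac{1}{n}\sum_{j=0}^{n-1} \mu\big(f^{-j}(A) \cap B\big) - \mu(A)\mu(B) \le 4\varepsilon.$$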
Since ε is arbitrary, this proves that the condition (4.1.1) holds for all pairs of
measurable sets. According to Proposition 4.1.4, it follows that the system is
ergodic.
In the same spirit, it suffices to check part (iii) of Proposition 4.1.4 on dense
subsets:
Corollary 4.1.6. Assume that the condition (4.1.2) in Proposition 4.1.4 holds for
every ϕ and ψ in dense subsets of $L^p(\mu)$ and $L^q(\mu)$, respectively. Then (f, μ)
is ergodic.
We leave the proof of this fact to the reader (see Exercise 4.1.3).
4.1.3 Exercises
4.1.1. Let (M, A) be a measurable space and f : M → M be a measurable transformation.
Prove that if p ∈ M is a periodic point of period k, then the measure $\mu_p = \frac{1}{k}\big(\delta_p + \delta_{f(p)} + \cdots + \delta_{f^{k-1}(p)}\big)$ is ergodic.
4.1.2. Let μ be an invariant probability measure, not necessarily ergodic, of a
measurable transformation f : M → M. Show that the following limit exists for
any pair of measurable sets A and B:
$$\lim_n \frac{1}{n}\sum_{i=0}^{n-1} \mu\big(f^{-i}(A) \cap B\big).$$
4.2 Examples
In this section we use a number of examples to illustrate different methods for
checking whether a system is ergodic or not.
is invariant under Rθ and its Lebesgue measure satisfies 0 < m(A) < 1. Thus, if
θ is rational then the Lebesgue measure is not ergodic. The converse is much
more interesting:
We are going to mention two different proofs of this fact. The first one,
which we detail below, uses some simple facts from Fourier analysis. The
second one, which we leave as an exercise (Exercise 4.2.6), is based on a
density point argument similar to the one we will use in Section 4.2.2 to prove
that the decimal expansion map is ergodic relative to the Lebesgue measure.
We denote by L2 (m) the Hilbert space of measurable functions ψ whose
square is integrable, that is, such that:
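$$\int |\psi|^2\,dm < \infty.$$
$$\phi_k : S^1 \to \mathbb{C}, \quad x \mapsto e^{2\pi i k x}, \quad k \in \mathbb{Z}$$
is a Hilbert basis of this space: given any $\varphi \in L^2(m)$ there exists a unique
sequence $(a_k)_{k\in\mathbb{Z}}$ of complex numbers such that
$$\varphi(x) = \sum_{k\in\mathbb{Z}} a_k e^{2\pi i k x} \quad\text{for almost every } x \in S^1. \tag{4.2.1}$$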
In fact, the irrational rotations on the circle or, more generally, on any torus
have a much stronger property than ergodicity: they are uniquely ergodic,
meaning that they admit a unique invariant probability measure (which is
the Lebesgue measure, of course). Uniquely ergodic systems are studied in
Chapter 6.
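The contrast between rational and irrational rotations is easy to observe numerically. The sketch below is only an experiment in floating-point arithmetic, with names and parameters of our choosing: it computes the fraction of the first n iterates of x under the rotation by θ that fall in an interval [a, b) of the circle, identified with [0, 1).

```python
import math

def sojourn_frequency(theta, x, a, b, n):
    """Fraction of j in [0, n) with R_theta^j(x) = x + j*theta (mod 1) in [a, b)."""
    hits = 0
    for j in range(n):
        if a <= (x + j * theta) % 1.0 < b:
            hits += 1
    return hits / n

if __name__ == "__main__":
    # Irrational angle: the frequency approaches the length of the interval.
    print(sojourn_frequency(math.sqrt(2) - 1, 0.0, 0.2, 0.5, 100000))
    # Rational angle 1/2: the orbit of 0.1 is {0.1, 0.6} and never enters [0.2, 0.5).
    print(sojourn_frequency(0.5, 0.1, 0.2, 0.5, 1000))
```

For the irrational angle the computed frequency is close to the Lebesgue measure 0.3 of the interval, whereas for θ = 1/2 the time average depends on the initial point.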
Proof. Since the fact of being 10-normal or not is independent of the integer
part of the number, we only need to show that almost every x ∈ [0, 1] is
10-normal. Consider f : [0, 1] → [0, 1] defined by f (x) = 10x − [10x]. For each
block $(b_1, \dots, b_l) \in \{0, \dots, 9\}^l$, consider the interval
$$I_{b_1,\dots,b_l} = \Big[\frac{\kappa}{10^l}, \frac{\kappa + 1}{10^l}\Big) \quad\text{with}\quad \kappa = \sum_{i=1}^{l} b_i 10^{l-i}.$$
Recall that if $x = 0.a_0 a_1 \cdots a_k a_{k+1} \cdots$ then $f^k(x) = 0.a_k a_{k+1} \cdots$ for every k ≥ 1.
Hence, $f^k(x) \in I_{b_1,\dots,b_l}$ if and only if $(a_k, \dots, a_{k+l-1}) = (b_1, \dots, b_l)$. So, the mean
sojourn time $\tau(I_{b_1,\dots,b_l}, x)$ is equal to the frequency of the block $(b_1, \dots, b_l)$ in
the decimal expansion of x. Using the Birkhoff ergodic theorem and the fact
that the transformation f is ergodic with respect to the Lebesgue measure m,
we conclude that for every $(b_1, \dots, b_l)$ there exists a full Lebesgue measure
subset $B(b_1, \dots, b_l)$ of the interval [0, 1] such that
$$\tau(I_{b_1,\dots,b_l}, x) = m(I_{b_1,\dots,b_l}) = \frac{1}{10^l} \quad\text{for every } x \in B(b_1, \dots, b_l).$$
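The correspondence between visits to $I_{b_1,\dots,b_l}$ and occurrences of the block in the decimal expansion can be checked in exact arithmetic. In the sketch below (function names are ours) the point x is a rational number, so every iterate of f is computed without rounding; the interval is taken half-open, $[\kappa/10^l, (\kappa+1)/10^l)$, which is an assumption on our part.

```python
from fractions import Fraction

def f(x):
    """Decimal expansion map f(x) = 10x - [10x], for x a Fraction in [0, 1)."""
    y = 10 * x
    return y - (y.numerator // y.denominator)

def block_interval(block):
    """Endpoints of I_{b_1,...,b_l} = [kappa / 10^l, (kappa + 1) / 10^l)."""
    l = len(block)
    kappa = sum(b * 10 ** (l - i) for i, b in enumerate(block, start=1))
    return Fraction(kappa, 10 ** l), Fraction(kappa + 1, 10 ** l)

def visits(x, block, n):
    """Number of k in [0, n) with f^k(x) in I_block."""
    lo, hi = block_interval(block)
    count = 0
    for _ in range(n):
        if lo <= x < hi:
            count += 1
        x = f(x)
    return count

def digits(x, n):
    """First n decimal digits a_0, a_1, ... of x."""
    out = []
    for _ in range(n):
        y = 10 * x
        d = y.numerator // y.denominator
        out.append(d)
        x = y - d
    return out

def block_occurrences(x, block, n):
    """Number of k in [0, n) with (a_k, ..., a_{k+l-1}) equal to the block."""
    l = len(block)
    ds = digits(x, n + l - 1)
    return sum(1 for k in range(n) if tuple(ds[k:k + l]) == tuple(block))

if __name__ == "__main__":
    x = Fraction(1, 7)   # 0.142857142857...
    print(visits(x, (4, 2), 600), block_occurrences(x, (4, 2), 600))
```

For x = 1/7 the block (4, 2) occurs with frequency 1/6 rather than 1/100, so this particular x is not 10-normal; the proposition only concerns Lebesgue-almost every x.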
$$\sigma^{-1}([m; A_m, \dots, A_n]) = [m + 1; A_m, \dots, A_n]. \tag{4.2.6}$$
and (using Lemma 1.3.1) that ensures that the measure μ is invariant under σ.
Lemma 4.2.8. If B and C are finite unions of pairwise disjoint cylinders, then
$$\mu\big(B \cap \sigma^{-j}(C)\big) = \mu(B)\mu(\sigma^{-j}(C)) = \mu(B)\mu(C),$$
for every j sufficiently large.
Proof. First, suppose that B and C are both cylinders: B = [k; Bk , . . . , Bl ] and
C = [m; Cm , . . . , Cn ]. Then,
This proves the conclusion of the lemma when both sets are cylinders. The
general case follows easily, using the fact that μ is finitely additive.
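For single cylinders the lemma can be verified by exact computation. The sketch below is an illustration with a representation of our own: a cylinder $[m; A_m, \dots, A_n]$ is stored as a dictionary from positions to sets of allowed symbols, the pre-image under $\sigma^j$ shifts positions by j as in (4.2.6), and the Bernoulli measure of a cylinder is the product of the ν-measures of its sets.

```python
from fractions import Fraction

def cylinder(start, sets):
    """The cylinder [start; sets[0], sets[1], ...] as {position: allowed symbols}."""
    return {start + i: frozenset(s) for i, s in enumerate(sets)}

def shift_preimage(cyl, j):
    """sigma^{-j} of a cylinder: every constrained position moves j steps right."""
    return {pos + j: s for pos, s in cyl.items()}

def intersect(c1, c2):
    """Intersection of two cylinders, position by position."""
    out = dict(c1)
    for pos, s in c2.items():
        out[pos] = out[pos] & s if pos in out else s
    return out

def measure(cyl, nu):
    """Bernoulli measure: product over positions of nu(allowed set)."""
    m = Fraction(1)
    for s in cyl.values():
        m *= sum((nu[a] for a in s), Fraction(0))
    return m

if __name__ == "__main__":
    nu = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6)}
    B = cylinder(0, [{0}, {1, 2}])
    C = cylinder(0, [{1}, {0}])
    for j in range(4):
        print(j, measure(intersect(B, shift_preimage(C, j)), nu),
              measure(B, nu) * measure(C, nu))
```

As soon as j is large enough for the constrained positions of B and $\sigma^{-j}(C)$ to be disjoint (here j ≥ 2), the measure of the intersection factorizes; for smaller j it need not.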
Proceeding with the proof of Proposition 4.2.7, suppose for a while that
the invariant set A belongs to the algebra $\mathcal{B}_0$ whose elements are the finite
unions of pairwise disjoint cylinders. Then, on the one hand, we may apply
the previous lemma with B = C = A to conclude that $\mu(A \cap \sigma^{-j}(A)) = \mu(A)^2$
for every large j. On the other hand, since A is invariant, the left-hand side of
this identity is equal to μ(A) for every j. It follows that $\mu(A) = \mu(A)^2$, which
means that either μ(A) = 0 or μ(A) = 1.
Using (4.1.8) and (4.1.9) and the fact that μ is invariant, we get that
$$\big|\mu\big(A \cap \sigma^{-j}(A)\big) - \mu\big(B \cap \sigma^{-j}(B)\big)\big| \le 2\mu(A \Delta B) < 2\varepsilon \tag{4.2.8}$$
(a similar fact was deduced during the proof of Corollary 4.1.5). Moreover,
$$\big|\mu(A)^2 - \mu(B)^2\big| \le 2\big|\mu(A) - \mu(B)\big| < 2\varepsilon. \tag{4.2.9}$$
Putting the relations (4.2.7), (4.2.8) and (4.2.9) together, we conclude that
$|\mu(A) - \mu(A)^2| < 4\varepsilon$. Since ε is arbitrary, we deduce that $\mu(A) = \mu(A)^2$ and,
hence, either μ(A) = 0 or μ(A) = 1.
on the set X. Furthermore, it is assumed that the character hit at each time
is independent of all the previous ones. This means that the distribution of the
sequences of characters (xn )n is governed by the Bernoulli probability measure
$\mu = \nu^{\mathbb{N}}$.
The text of “Os Lusíadas” corresponds to a certain finite (albeit very long)
sequence of characters $(l_0, \dots, l_N)$. Consider the cylinder $L = [0; l_0, \dots, l_N]$.
Then
$$\mu(L) = \prod_{j=0}^{N} p_{l_j}$$
ergodic theorem and the fact that (σ, μ) is ergodic, the set K of values of k for
which that happens satisfies
$$\lim_n \frac{1}{n}\#\big(K \cap [0, n - 1]\big) = \mu(L) > 0, \tag{4.2.10}$$
with full probability. In particular, for almost all sequences $(x_n)_n$ the set K
is infinite, which means that $(x_n)_n$ contains infinitely many copies of “Os
Lusíadas”. Actually, (4.2.10) yields an even stronger conclusion: still with full
probability, the copies of our poem correspond to a positive (although small)
fraction of all the typed characters. In other words, on average, the monkey
types a new copy of “Os Lusíadas” every so many (a great many) years.
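The same phenomenon can be observed on a toy scale, with a short word in place of the poem. The sketch below is a simulation under assumptions of ours: a three-letter alphabet with hypothetical probabilities, independent characters drawn with Python's pseudo-random generator, and a fixed seed for reproducibility.

```python
import random

def cylinder_measure(alphabet, probs, word):
    """Bernoulli measure of the cylinder defined by `word`: product of the p's."""
    p = dict(zip(alphabet, probs))
    m = 1.0
    for c in word:
        m *= p[c]
    return m

def word_frequency(alphabet, probs, word, n, seed=0):
    """Fraction of positions k in [0, n) at which `word` starts, in a random
    sequence of independent characters with the given probabilities."""
    rng = random.Random(seed)
    w = list(word)
    text = rng.choices(alphabet, weights=probs, k=n + len(w) - 1)
    hits = sum(1 for k in range(n) if text[k:k + len(w)] == w)
    return hits / n

if __name__ == "__main__":
    alphabet, probs = "abc", [0.5, 0.3, 0.2]
    print(word_frequency(alphabet, probs, "ab", 200000))
    print(cylinder_measure(alphabet, probs, "ab"))
```

The observed frequency is close to the measure 0.15 of the cylinder; for a text as long as the poem the same statement holds, but with a frequency far too small to observe in any simulation.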
This can be proved using a more elaborate version of the method introduced
in Section 4.2.2. We are going to outline the arguments in the proof, referring to
Section 4.2.2 for those parts that are common to both situations and addressing
in more detail the main new difficulty.
Let A be an invariant set with positive measure. We want to show that
μ(A) = 1. On the one hand, it remains true that for almost every point a ∈ [0, 1]
there exists a sequence of intervals $I_k$ containing a and such that $G^k$ maps $I_k$
bijectively and differentiably onto (0, 1). Indeed, such intervals can be found
as follows. First, consider
$$I(1, m) = \Big(\frac{1}{m+1}, \frac{1}{m}\Big),$$
for each m ≥ 1. Next, define, by recurrence,
$$I(k, m_1, \dots, m_k) = I(1, m_1) \cap G^{-k+1}\big(I(k - 1, m_2, \dots, m_k)\big)$$
for $m_1, \dots, m_k \ge 1$. Then, it suffices to take as $I_k$ the interval $I(k, m_1, \dots, m_k)$
that contains a. This is well defined for every k ≥ 1 and every point a in the
complement of a countable set, namely, the set $\bigcup_{k=0}^{\infty} G^{-k}(\{0, 1\})$.
On the other hand, although the restriction of $G^k$ to each $I_k$ is a differentiable
bijection, it is not affine. For that reason, the analogue of (4.2.4) cannot hold in
the present case. This difficulty is bypassed by the result that follows, which
is an example of distortion control: it is important to note that the constant K
is independent of $I_k$, $E_1$, $E_2$ and, most of all, k.
Proposition 4.2.11 (Bounded distortion). There exists K > 1 such that, given
any k ≥ 1 and any interval $I_k$ such that $G^k$ restricted to $I_k$ is a differentiable
bijection,
$$\frac{\mu(G^k(E_1))}{\mu(G^k(E_2))} \le K\,\frac{\mu(E_1)}{\mu(E_2)}$$
for any measurable subsets $E_1$ and $E_2$ of the interval $I_k$.
For the proof of this proposition we need the following two auxiliary results:
Proof. Recall that G(x) = 1/x − m on each interval (1/(m + 1), 1/m].
Therefore,
$$G'(x) = -\frac{1}{x^2} \quad\text{and}\quad G''(x) = \frac{2}{x^3}.$$
The first identity implies that $|G'(x)| \ge 1$ for every x ∈ (0, 1]. Moreover,
$|G'(x)| \ge 2$ whenever x ≤ 2/3. On the other hand, x ≥ 2/3 implies that
G(x) = 1/x − 1 < 2/3 and, consequently, $|G'(G(x))| \ge 2$. Combining these
observations we find that $|(G^2)'(x)| = |G'(x)|\,|G'(G(x))| \ge 2$ for every x ∈ (0, 1].
Finally, $|G''(x)/G'(x)^2| = 2|x| \le 2$ also for every x ∈ (0, 1].
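These estimates can be spot-checked on a grid of rational points. The sketch below is only a sanity check, not a proof; the function names and the grid are ours, and points x with G(x) = 0 are skipped because the second iterate is not differentiable there.

```python
from fractions import Fraction

def gauss(x):
    """Gauss map G(x) = 1/x - [1/x] for a Fraction x in (0, 1]."""
    y = 1 / x
    return y - (y.numerator // y.denominator)

def abs_derivative(x):
    """|G'(x)| = 1 / x^2."""
    return 1 / (x * x)

def abs_derivative_second_iterate(x):
    """|(G^2)'(x)| = |G'(x)| |G'(G(x))| by the chain rule (needs G(x) != 0)."""
    return abs_derivative(x) * abs_derivative(gauss(x))

def grid(denominator):
    """Points p/denominator in (0, 1) at which G does not vanish."""
    points = (Fraction(p, denominator) for p in range(1, denominator))
    return [x for x in points if gauss(x) != 0]

if __name__ == "__main__":
    pts = grid(997)
    print(min(abs_derivative(x) for x in pts))
    print(min(abs_derivative_second_iterate(x) for x in pts))
```

On such a grid the smallest value of $|G'|$ is greater than 1 and the smallest value of $|(G^2)'|$ is greater than 2, as the lemma predicts.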
Lemma 4.2.13. There exists C > 1 such that, given any k ≥ 1 and any interval
$I_k$ such that $G^k$ restricted to $I_k$ is a differentiable bijection,
$$\frac{|(G^k)'(x)|}{|(G^k)'(y)|} \le C \quad\text{for any } x \text{ and } y \text{ in } I_k.$$
$$= \sum_{j=1}^{k} \log|G' \circ g_j(G^j(x))| - \log|G' \circ g_j(G^j(y))|,$$
where $g_j$ denotes a local inverse of G defined on the interval $[G^j(x), G^j(y)]$.
Using the estimate (4.2.12), we get that
$$\log\frac{|(G^k)'(x)|}{|(G^k)'(y)|} \le 2\sum_{j=1}^{k} |G^j(x) - G^j(y)| = 2\sum_{i=0}^{k-1} |G^{k-i}(x) - G^{k-i}(y)|. \tag{4.2.13}$$
$$\log\frac{|(G^k)'(x)|}{|(G^k)'(y)|} \le 2\sum_{i=0}^{k-1} 2^{-[i/2]}\,|G^k(x) - G^k(y)| \le 8|G^k(x) - G^k(y)| \le 8.$$
We are ready to conclude that (G, μ) is ergodic. Let A be an invariant set with
μ(A) > 0. Then A also has positive Lebesgue measure, since μ is absolutely
continuous with respect to the Lebesgue measure. Let a be a density point of
A whose future trajectory is contained in the open interval (0, 1). Consider the
sequence $(I_k)_k$ of the intervals $I(k, m_1, \dots, m_k)$ that contain a. It follows from
Lemma 4.2.12 that
$$\operatorname{diam} I_k \le \sup\Big\{\frac{1}{|(G^k)'(x)|} : x \in I_k\Big\} \le 2^{-[k/2]}$$
where [x] denotes the equivalence class that contains $x \in \mathbb{R}^d$. These transformations
are called linear endomorphisms of the torus.
Note that $f_A$ is differentiable and the derivative $Df_A([x])$ at each point is
canonically identified with A. In particular, the Jacobian $\det Df_A([x])$ is constant
equal to det A. It follows (Exercise 4.2.9) that the degree of $f_A$ is equal to | det A|.
In particular, $f_A$ is invertible if and only if | det A| = 1. In this case, the inverse
is the transformation $f_{A^{-1}}$ induced by the inverse matrix $A^{-1}$; observe that $A^{-1}$
is also a matrix with integer coefficients.
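In dimension 2 both cases can be illustrated with exact rational arithmetic. The sketch below uses helper names of our own and two hypothetical matrices: one with determinant 1, whose integer inverse induces the inverse map, and one with determinant 2, for which two distinct points of the torus have the same image.

```python
from fractions import Fraction

def torus_map(A, x):
    """f_A([x]) = [A x]: apply the integer matrix A and reduce coordinates mod 1."""
    return tuple(sum(a * xi for a, xi in zip(row, x)) % 1 for row in A)

def det2(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def inverse2(A):
    """Inverse of a 2x2 integer matrix with determinant +1 or -1 (again integer)."""
    d = det2(A)
    if abs(d) != 1:
        raise ValueError("the matrix is not invertible over the integers")
    # adj(A) / d, and 1/d = d when d = +1 or -1
    return [[A[1][1] * d, -A[0][1] * d], [-A[1][0] * d, A[0][0] * d]]

if __name__ == "__main__":
    A = [[2, 1], [1, 1]]                     # det A = 1
    x = (Fraction(1, 3), Fraction(2, 5))
    y = torus_map(A, x)
    print(y, torus_map(inverse2(A), y))      # the second point is x again

    B = [[2, 0], [0, 1]]                     # det B = 2: degree 2, not invertible
    print(torus_map(B, (Fraction(1, 4), Fraction(1, 3))),
          torus_map(B, (Fraction(3, 4), Fraction(1, 3))))
```

The reduction mod 1 is what makes integer coefficients essential: only then does the linear map send equivalent points to equivalent points.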
In any case, $f_A$ preserves the Lebesgue measure on $\mathbb{T}^d$. This may be seen as
follows. Since $f_A$ is a local diffeomorphism, the pre-image of any measurable
set D with sufficiently small diameter consists of | det A| (= degree of $f_A$)
pairwise disjoint sets $D_i$, each of which is mapped diffeomorphically onto D.
By the formula of change of variables, $m(D) = |\det A|\,m(D_i)$ for every i. This
proves that $m(D) = m(f_A^{-1}(D))$ for every measurable set D with small diameter.
Hence, $f_A$ does preserve the Lebesgue measure m. Next we prove the following
fact:
Exercise 4.2.8) that there exists some k ∈ Zd \ {0} such that Ar∗ (k) = k. Fix k
and consider the function ϕ ∈ L2 (m) defined by
\[ \varphi([x]) = \sum_{i=0}^{r-1} e^{2\pi i\,(A_*^{i}(k)\cdot x)} = \sum_{i=0}^{r-1} e^{2\pi i\,(k\cdot A^{i}(x))}. \]
the image of any stable leaf is still a stable leaf. Moreover, by (4.2.17),
the transformation A contracts distances uniformly inside each stable leaf.
Analogously, the family of all affine subspaces of Rd of the form v + Eu with
v ∈ Rd defines the unstable foliation F u of Rd , whose elements are called
unstable leaves. The unstable foliation is also invariant and the transformation
A expands distances uniformly inside unstable leaves.
Mapping F s and F u by the canonical projection π : Rd → Td , we obtain
foliations W s and W u of the torus that we call stable foliation and unstable
foliation of the transformation fA . See Figure 4.1. The previous observations
show that these foliations are invariant under fA . Moreover:
(i) d(f_A^j(x), f_A^j(y)) → 0 when j → +∞, for any points x and y in the same
stable leaf;
(ii) d(f_A^j(y), f_A^j(z)) → 0 when j → −∞, for any points y and z in the same
unstable leaf.
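Property (i) can be made concrete for a specific hyperbolic matrix. The sketch below is a hedged illustration: the matrix A = [[2, 1], [1, 1]], the initial displacement and the helper names are assumptions made here. It works in the lift to R^2, where the displacement between two nearby points of the same stable leaf is a multiple of the contracting eigenvector, and checks that after j iterates its length is multiplied by the j-th power of the contracting eigenvalue.

```python
import math

A = [[2.0, 1.0], [1.0, 1.0]]          # hyperbolic matrix with det A = 1

def apply(M, v):
    return (M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1])

lam_u = (3.0 + math.sqrt(5.0)) / 2.0  # expanding eigenvalue
lam_s = (3.0 - math.sqrt(5.0)) / 2.0  # contracting eigenvalue, lam_s = 1/lam_u
v_s = (1.0, lam_s - 2.0)              # eigenvector of A for lam_s

# Displacement between two points x, y of the same stable leaf: y - x = 0.01 * v_s.
w = (0.01 * v_s[0], 0.01 * v_s[1])
d0 = math.hypot(*w)
ratios = []
for j in range(1, 11):
    w = apply(A, w)                    # displacement after j iterates
    ratios.append(math.hypot(*w) / d0)
print(ratios[-1], lam_s ** 10)
```

Only a few iterates are used because rounding errors in the unstable direction grow exponentially; replacing v_s by the eigenvector of lam_u and A by its inverse gives the analogous check for property (ii).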
\[ \varphi^{+}(x) = \lim_{n} \frac{1}{n}\sum_{j=0}^{n-1} \varphi(f_A^{j}(x)) \quad\text{and}\quad \varphi^{-}(x) = \lim_{n} \frac{1}{n}\sum_{j=0}^{n-1} \varphi(f_A^{-j}(x)), \]
which are defined for m-almost every x ∈ Td . By Corollary 3.2.8, there exists
a full measure set X ⊂ Td such that
ϕ + (x) = ϕ − (x) for every x ∈ X. (4.2.18)
Let us denote by W s (x) and W u (x), respectively, the stable leaf and the
unstable leaf of fA through each point x ∈ Td .
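To fix ideas about the two time averages, here is a minimal sketch in exact arithmetic. It is an illustration only: the matrix A = [[2, 1], [1, 1]], the observable and the periodic point (1/5, 2/5) are choices made here. For a periodic point both limits reduce to the average over one period, so they coincide trivially; the content of (4.2.18) is that the equality also holds on a full measure set, which such a computation cannot show.

```python
from fractions import Fraction

def f(p):
    """Map f_A for A = [[2, 1], [1, 1]], acting on exact rational points."""
    x, y = p
    return ((2 * x + y) % 1, (x + y) % 1)

def f_inv(p):
    """Inverse map, induced by A^{-1} = [[1, -1], [-1, 2]]."""
    x, y = p
    return ((x - y) % 1, (-x + 2 * y) % 1)

def average(phi, step, p, n):
    """(1/n) * sum of phi over the first n points of the orbit of p under step."""
    total = Fraction(0)
    for _ in range(n):
        total += phi(p)
        p = step(p)
    return total / n

phi = lambda p: p[0] * p[0] + p[1]         # a sample observable
p0 = (Fraction(1, 5), Fraction(2, 5))      # periodic point of period 2
period = 2
forward = average(phi, f, p0, period)      # forward time average
backward = average(phi, f_inv, p0, period) # backward time average
print(forward, backward)
```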
\[ \lim_{n} \frac{1}{n}\sum_{j=0}^{n-1} \big[\varphi(f_A^{j}(x)) - \varphi(f_A^{j}(y))\big] \]
is also zero. That implies that ϕ + (y) exists and is equal to ϕ + (x). The argument
for ϕ − is entirely analogous.
Given any open subset R of the torus and any x ∈ R, denote by W s (x, R)
the connected component of W s (x) ∩ R that contains x and by W u (x, R) the
connected component of W u (x) ∩ R that contains x. We call R a rectangle if
W s (x, R) intersects W u (y, R) at a unique point, for every x and y in R. See
Figure 4.2.
Proof. Let us denote by msx the Lebesgue measure on the stable leaf W s (x)
of each point x ∈ Td . Note that m(R \ X) = 0, since X has full measure in Td .
Then, by the theorem of Fubini,
m^s_x(W^s(x, R) \ X) = 0 for m-almost every x ∈ R.
Define Y_R = {x ∈ X ∩ R : m^s_x(W^s(x, R) \ X) = 0}. Then Y_R has full measure in
R. Given x, y ∈ R, consider the map π : W^s(x, R) → W^s(y, R) that assigns to each
x′ ∈ W^s(x, R) the unique point π(x′) where W^u(x′, R) intersects W^s(y, R).
This map is affine and, consequently, it has the following property, called
absolute continuity:
msx (E) = 0 ⇔ msy (π(E)) = 0.
In particular, the image of W^s(x, R) ∩ X has full measure in W^s(y, R) and,
consequently, it intersects W^s(y, R) ∩ X. So, there exists x′ ∈ W^s(x, R) ∩ X
whose image y′ = π(x′) is in W^s(y, R) ∩ X. Observing that x′ and y′ are in
the same unstable leaf, by the definition of π, we see that these points satisfy
the conditions in the conclusion of the lemma.
4.2.7 Exercises
4.2.1. Prove Proposition 4.2.2.
4.2.2. Prove Proposition 4.2.9.
4.2.3. Let I = [0, 1] and f : I → I be the function defined by
\[ f(x) = \begin{cases} 2x & \text{if } 0 \le x < 1/3 \\ 2x - 2/3 & \text{if } 1/3 \le x < 1/2 \\ 2x - 1/3 & \text{if } 1/2 \le x < 2/3 \\ 2x - 1 & \text{if } 2/3 \le x \le 1. \end{cases} \]
4.2.5. Let X be a topological space, endowed with the corresponding Borel σ-algebra
C, and let Σ = X^N. Show that if X has a countable basis of open sets then the
Borel σ-algebra of Σ (for the product topology) coincides with the product
σ-algebra B = C^N. The same is true for Σ = X^Z and B = C^Z.
4.2.6. In this exercise we propose an alternative proof of Proposition 4.2.1. Assume
that θ is irrational. Let A be an invariant set with positive measure. Recalling
that the orbit {Rnθ (a) : n ∈ Z} of every a ∈ S1 is dense in S1 , show that no point
of S1 is a density point of Ac . Conclude that μ(A) = 1.
4.2.7. Assume that θ is irrational. Let ϕ : S1 → R be any continuous function. Show
that
\[ \tilde\varphi(x) = \lim_{n\to\infty} \frac{1}{n}\sum_{j=0}^{n-1} \varphi(R_\theta^{j}(x)) \tag{4.2.19} \]
exists at every point and, in fact, the limit is uniform. Deduce that ϕ̃ is constant
at every point. Conclude that Rθ has a unique invariant probability measure.
4.2.8. Let A be a square matrix of dimension d with rational coefficients and let λ be
a rational eigenvalue of A. Show that there exists some eigenvector with integer
coefficients, that is, some k ∈ Zd \ {0} such that Ak = λk.
4.2.9. Show that if f : M → M is a local diffeomorphism on a compact Riemannian
manifold then
degree f = ∫ | det Df | dm,
\[ \tilde\varphi(x) = \lim_{n\to\infty} \frac{1}{n}\sum_{j=0}^{n-1} \varphi(f^{j}(x)) \]
(the first equality is part of the Birkhoff ergodic theorem). Therefore, the
integrals of each bounded measurable function ϕ with respect to μ and with
respect to ν coincide. In particular, considering characteristic functions, we
conclude that μ = ν.
Proof. To prove the “if” claim, assume that μ is not ergodic. Then there exists
some invariant set A with 0 < μ(A) < 1. Define μ1 and μ2 to be the normalized
restriction of μ to the set A and to its complement Ac , respectively:
\[ \mu_1(E) = \frac{\mu(E\cap A)}{\mu(A)} \quad\text{and}\quad \mu_2(E) = \frac{\mu(E\cap A^c)}{\mu(A^c)}. \]
Since A and Ac are invariant sets and μ is an invariant measure, both μ1 and
μ2 are still invariant probability measures. Moreover,
μ = μ(A)μ1 + μ(Ac )μ2
and, consequently, μ is not extremal.
To prove the converse, assume that μ is ergodic and μ = (1 − t)μ1 + tμ2 for
some t ∈ (0, 1). It is clear that μ(E) = 0 implies μ1 (E) = μ2 (E) = 0, that is, μ1
and μ2 are absolutely continuous with respect to μ. Hence, by Lemma 4.3.1,
μ1 = μ = μ2 . This shows that μ is extremal.
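The "if" part of the argument can be followed step by step on a finite example. The sketch below is hypothetical: the permutation f of {0, . . . , 4} with cycles (0 1 2) and (3 4) and all function names are chosen here for illustration. The uniform measure is invariant but not ergodic, and the normalized restrictions to the invariant set A = {0, 1, 2} and to its complement exhibit it as a non-trivial convex combination of invariant probability measures.

```python
from fractions import Fraction

# Hypothetical example: f is a permutation of M = {0, ..., 4} with
# two cycles, (0 1 2) and (3 4).
f = {0: 1, 1: 2, 2: 0, 3: 4, 4: 3}
M = sorted(f)

def pushforward(mu):
    """Return the measure E -> mu(f^{-1}(E)), as a dict of point masses."""
    out = {x: Fraction(0) for x in M}
    for x, mass in mu.items():
        out[f[x]] += mass
    return out

def normalized_restriction(mu, A):
    """The measure E -> mu(E ∩ A) / mu(A)."""
    total = sum(mu[x] for x in A)
    return {x: (mu[x] / total if x in A else Fraction(0)) for x in M}

mu = {x: Fraction(1, 5) for x in M}   # uniform measure, invariant under f
A = {0, 1, 2}                         # invariant set with 0 < mu(A) < 1
Ac = set(M) - A
mu1 = normalized_restriction(mu, A)
mu2 = normalized_restriction(mu, Ac)
mu_A = sum(mu[x] for x in A)
# mu = mu(A) mu1 + mu(A^c) mu2, with mu(A^c) = 1 - mu(A)
combo = {x: mu_A * mu1[x] + (1 - mu_A) * mu2[x] for x in M}
print(mu_A, mu1, mu2)
```

In this example μ1 and μ2 are themselves ergodic, since each is supported on a single cycle of the permutation.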
118 Ergodicity
Let us also point out that distinct ergodic measures “live” in disjoint subsets
of the space M (see also Exercise 4.3.6):
Lemma 4.3.3. Assume that the σ -algebra of M admits some countable
generating subset Γ. Let {μi : i ∈ I} be an arbitrary family of ergodic
probability measures, all distinct. Then these measures μi are mutually
singular: there exist pairwise disjoint measurable subsets {Pi : i ∈ I} invariant
under f and such that μi (Pi ) = 1 for every i ∈ I.
Proof. Assume that f is transitive and let x ∈ M be a point whose orbit {f n (x) :
n ∈ N} is dense. Then there exists m ≥ 1 such that f m (x) ∈ V and (using the
fact that {f n (x) : n > m} is also dense) there exists n > m such that f n (x) ∈ U.
Take k = n − m. Then f m (x) ∈ f −k (U) ∩ V. This proves the “only if” part of the
statement.
To prove the converse, let {Uj : j ∈ N} be a countable basis of open subsets
of M. The hypothesis ensures that the open set ⋃_{k=1}^{∞} f^{−k}(U_j) is dense in M for
every j ∈ N. Then the intersection
\[ X = \bigcap_{j=1}^{\infty} \bigcup_{k=1}^{\infty} f^{-k}(U_j) \]
Since the Uj constitute a basis of open subsets of M, this means that {f k (x) :
k ∈ N} is dense in M.
Proposition 4.3.5. Let M be a Baire space with a countable basis of open sets.
If μ is an ergodic probability measure then the restriction of f to the support
of μ is transitive.
Proof. Start by noting that supp μ has a countable basis of open sets, because
it is a subspace of M, and it is a Baire space, since it is closed in M. Let
U and V be open subsets of supp μ. By the definition of support, μ(U) >
0 and μ(V) > 0. Define B = ⋃_{k=1}^{∞} f^{−k}(U). Then μ(B) > 0, because B ⊃ f^{−1}(U)
and μ is invariant, and f^{−1}(B) ⊂ B. By ergodicity (see Exercise 1.1.4) it follows that μ(B) = 1.
Then B must intersect V. This proves that there exists k ≥ 1 such that f −k (U)
intersects V. By Lemma 4.3.4, it follows that the restriction f : supp μ → supp μ
is transitive.
4.3.1 Exercises
4.3.1. Let M be a topological space with a countable basis of open sets, f : M → M
be a measurable transformation and μ be an ergodic probability measure. Show
that the orbit {f n (x) : n ≥ 0} of μ-almost every point x ∈ M is dense in the support
of μ.
4.3.2. Let f : M → M be a continuous transformation in a compact metric space. Given
a function ϕ : M → R, prove that there exists an invariant probability measure μϕ
such that
\[ \int \varphi \, d\mu_\varphi = \sup_{\eta\in\mathcal{M}_1(f)} \int \varphi \, d\eta. \]
The results presented below imply that the conclusion of this theorem is
no longer true when one replaces Homeovol (M) by the space Diffeokvol (M)
of conservative diffeomorphisms of class Ck , at least for k > 3. Essentially
nothing is known in this regard in the cases k = 2 and k = 3. On the other hand,
Artur Avila, Sylvain Crovisier and Amie Wilkinson have recently announced
a C1 version of the previous theorem: for every compact Riemannian manifold
M, there exists a residual subset R of the space Diffeo1vol (M) of conservative
diffeomorphisms of class C1 such that every f ∈ R with positive entropy hvol (f )
is ergodic. The notion of entropy will be studied in Chapter 9.
4.4 Comments in conservative dynamics 121
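\[ \frac{dH}{dt} = \sum_{j=1}^{d} \Big( \frac{\partial H}{\partial q_j}\frac{dq_j}{dt} + \frac{\partial H}{\partial p_j}\frac{dp_j}{dt} \Big) \equiv 0. \]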
Thus, we may consider the restriction of the flow to each energy hypersurface
Hc = {(q, p) : H(q, p) = c}. The volume measure dq1 · · · dqd dp1 · · · dpd is called
the Liouville measure. Observe that the divergence of the vector field
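\[ F = \Big( -\frac{\partial H}{\partial p_1}, \dots, -\frac{\partial H}{\partial p_d}, \frac{\partial H}{\partial q_1}, \dots, \frac{\partial H}{\partial q_d} \Big) \]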
is identically zero. Thus (recall Section 1.3.6) the Liouville measure is
invariant under the Hamiltonian flow. It follows (see Exercise 1.3.12) that
the restriction of the flow to each energy hypersurface Hc admits an invariant
measure μc that is given by
\[ \mu_c(E) = \int_E \frac{ds}{\|\operatorname{grad} H\|} \quad\text{for every measurable set } E \subset H_c, \]
where ds denotes the volume element on the hypersurface. Then, the ergodic
hypothesis may be viewed as claiming that, in general, Hamiltonian systems
are ergodic with respect to this invariant measure μc on (almost) every energy
hypersurface.
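Conservation of H and invariance of the Liouville measure can be observed on the simplest example. The sketch below makes several assumptions of its own: the Hamiltonian is that of the harmonic oscillator with d = 1, the sign convention of the equations is the one read off the vector field F displayed above, and the function names and sample values are hypothetical. The flow is a rotation of the (q, p) plane, so the energy is constant along trajectories and the time-t map has Jacobian determinant 1.

```python
import math

def H(q, p):
    """Hamiltonian of the harmonic oscillator (d = 1), chosen for illustration."""
    return 0.5 * (q * q + p * p)

def flow(t, q, p):
    """Exact flow of dq/dt = -dH/dp = -p, dp/dt = dH/dq = q (a rotation)."""
    c, s = math.cos(t), math.sin(t)
    return (c * q - s * p, s * q + c * p)

def jacobian_det(t, q, p, h=1e-6):
    """Finite-difference determinant of the derivative of the time-t map."""
    q1, p1 = flow(t, q + h, p)
    q2, p2 = flow(t, q - h, p)
    q3, p3 = flow(t, q, p + h)
    q4, p4 = flow(t, q, p - h)
    a, c = (q1 - q2) / (2 * h), (p1 - p2) / (2 * h)   # derivative in q
    b, d = (q3 - q4) / (2 * h), (p3 - p4) / (2 * h)   # derivative in p
    return a * d - b * c

q0, p0, t = 0.7, -0.2, 1.3
q1, p1 = flow(t, q0, p0)
energy_drift = abs(H(q1, p1) - H(q0, p0))   # zero up to rounding
area_factor = jacobian_det(t, q0, p0)       # one up to rounding
print(energy_drift, area_factor)
```

Here the energy hypersurfaces are circles and the restriction of the flow to each of them is a rotation, which is ergodic with respect to arc-length; this is a very special situation, as the discussion that follows makes clear.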
The first important result in this context was announced by Andrey Kol-
mogorov at the International Congress of Mathematicians ICM 1954 and was
substantiated, soon afterwards, by the works of Vladimir Arnold and Jürgen
Moser. This led to the deep theory of so-called almost integrable systems that
is known as KAM theory, in homage to its founders, and to which several
other mathematicians contributed in a decisive manner, including Helmut
Rüssmann, Michael Herman, Eduard Zehnder, Jean-Christophe Yoccoz and
Jürgen Pöschel, among others. Let us explain what is meant by “almost
integrable”.
A Hamiltonian system with d degrees of freedom is said to be integrable (in
the sense of Liouville) if it admits d first integrals I1 , . . . , Id :
It follows from the previous remarks that every system with d = 1 degree
of freedom is integrable: the Hamiltonian H itself is a first integral. Another
important example:
Observe that the Hessian matrix of H0 coincides with the Jacobian matrix of
the function I → ω(I). Therefore, the twist condition (4.4.4) means that the
map assigning to each value of I the corresponding frequency vector ω(I) is a
local diffeomorphism.
The next theorem means that, under this condition, most of the invariant tori
of the Hamiltonian flow of H0 persist for any nearby system:
The latter is because the set K may be decomposed into positive volume
subsets that are also unions of invariant tori and, thus, are invariant. The proof
of the theorem shows that the persistence or not of a given invariant torus of H0
is intimately related to the arithmetic properties of the corresponding frequency
vector. Let us explain this.
Given c > 0 and τ > 0, we say that a vector ω0 ∈ Rd is (c, τ )-Diophantine if
\[ |k \cdot \omega_0| \ge \frac{c}{\|k\|^{\tau}} \quad\text{for every } k \in \mathbb{Z}^d \setminus \{0\}, \tag{4.4.5} \]
where ‖k‖ = |k1| + · · · + |kd|. Diophantine vectors are rationally independent;
in fact, the condition (4.4.5) means that ω0 is badly approximated by rationally
dependent vectors. We say that ω0 is τ -Diophantine if it is (c, τ )-Diophantine
for some c > 0. The set of τ -Diophantine vectors is non-empty if and only if
τ ≥ d − 1; moreover, it has full measure in Rd if τ is strictly larger than d − 1
(see Exercise 4.4.1).
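The condition (4.4.5) can be explored numerically over a finite range of integer vectors. The sketch below is an illustration with choices made here: the dimension d = 2, the exponent τ = 1 = d − 1, the two sample frequency vectors, the search bound K and the function name. A finite search can suggest a value of c but cannot prove that a vector is Diophantine.

```python
import math
from itertools import product

def diophantine_constant(omega, tau, K):
    """min of |k . omega| * ||k||^tau over nonzero integer k with |k_i| <= K.

    A finite search can only suggest, not prove, that omega is
    (c, tau)-Diophantine with c equal to the returned value.
    """
    best = math.inf
    for k in product(range(-K, K + 1), repeat=len(omega)):
        norm = sum(abs(ki) for ki in k)
        if norm == 0:
            continue                      # k = 0 is excluded
        dot = abs(sum(ki * wi for ki, wi in zip(k, omega)))
        best = min(best, dot * norm ** tau)
    return best

golden = (1.0, (1.0 + math.sqrt(5.0)) / 2.0)   # rationally independent
resonant = (1.0, 0.5)                          # k = (1, -2) gives k . omega = 0
c_golden = diophantine_constant(golden, tau=1, K=60)
c_resonant = diophantine_constant(resonant, tau=1, K=60)
print(c_golden, c_resonant)
```

For the vector built from the golden ratio the minimum stays bounded away from zero as K grows, in agreement with the classical fact that the golden ratio is badly approximable by rationals; for the rationally dependent vector the minimum is zero.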
While proving Theorem 4.4.4, it is shown that, given c > 0, τ ≥ d − 1 and
any compact set Ω ⊂ ω(B^d), one can find a neighborhood V of H0 such that,
for every H ∈ V and every (c, τ)-Diophantine vector ω0 ∈ Ω, the Hamiltonian
flow of H admits a differentiable invariant torus restricted to which the flow is
conjugate to the linear flow t → ϕ(t) = ϕ(0) + tω0 .
Next, we discuss a version of Theorem 4.4.4 for discrete time systems or,
more precisely, symplectic transformations. We call a symplectic manifold
(see Arnold [Arn78, Chapter 8]) any differentiable manifold M endowed
with a symplectic form, that is, a non-degenerate differential 2-form θ . By
“non-degenerate” we mean that for every x ∈ M and every u ≠ 0 there exists
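Using
\[ Df_0 \cdot \frac{\partial}{\partial q_j} = \frac{\partial}{\partial q_j} \quad\text{and}\quad Df_0 \cdot \frac{\partial}{\partial p_j} = \frac{\partial}{\partial p_j} + \sum_{i} \frac{\partial \omega_i}{\partial p_j}\frac{\partial}{\partial q_i}, \]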
for each (q, p) ∈ T ∗ M. It is clear that α is well defined and of class Cr−2 .
Consider the exterior derivative θ ∗ = dα. One can check (for instance, using
local coordinates) that θ ∗ is non-degenerate at every point and, thus, is a
symplectic form in T ∗ M.
p : Tq M → R, p(w) = v ·q w.
for any w1 , w2 ∈ T(q,v) (TM). It is clear from the construction that, unlike θ ∗ ,
this form θ depends on the Riemannian metric in M.
By analogy with the case of flows, we call a transformation f0 integrable if
there exist coordinates (q, p) ∈ Td × Bd such that f0 (q, p) = (q + ω(p), p) for
every (q, p). Moreover, we say that f0 is non-degenerate if
Figure 4.3. Invariant circles, periodic orbits and homoclinic intersections in the
neighborhood of a generic elliptic fixed point
Figure 4.4. Computational evidence for the presence of invariant circles, elliptic
islands and transverse homoclinic intersections
with associated transverse homoclinic intersections. One can also observe the
presence of certain trajectories with “chaotic” behavior, apparently related to
those homoclinic intersections.
More generally, let f : M → M be a symplectic diffeomorphism on a
symplectic manifold M of any (even) dimension 2d ≥ 2. We say that a
fixed point ζ ∈ M is elliptic if all the eigenvalues of the derivative Df (ζ )
are in the unit circle. Let λ1, λ̄1, . . . , λd, λ̄d be those eigenvalues. We say
that ζ is non-degenerate if λ_1^{k_1} · · · λ_d^{k_d} ≠ 1 for every non-zero (k1, . . . , kd) ∈ Z^d with
|k1| + · · · + |kd| ≤ 4 (in particular, the eigenvalues are all distinct). Then, by
the Birkhoff normal form theorem (see Arnold [Arn78, Appendix 7]), there
exist canonical coordinates (x1 , . . . , xd , y1 , . . . , yd ) ∈ R2d in a neighborhood of ζ
such that ζ = (0, . . . , 0, 0, . . . , 0) and the transformation f has the form
f0 : (θ , ρ) → (θ + ω0 + ω1 (ρ), ρ)
is integrable and satisfies the twist condition (4.4.7). Applying the ideas of
Theorem 4.4.8, one concludes that ζ is a density point of a set K formed
by invariant tori of dimension d, restricted to which the transformation f is
conjugate to a Diophantine rotation.
In particular, symplectic transformations with generic elliptic fixed (or
periodic) points are never ergodic. Observe, on the other hand, that for d > 1
a torus of dimension d does not separate the ambient space M into two
connected components. Therefore, the argument we used before to conclude
that generic elliptic fixed points on surfaces are stable does not extend to higher
dimensions. In fact, it is known that when d > 1 elliptic fixed points are usually
unstable: trajectories starting arbitrarily close to the fixed point may escape
from a fixed neighborhood of it. This is related to the phenomenon known as
Arnold diffusion, which is a very active research topic in this area.
Finally, let us mention that this theory also applies to continuous time
conservative systems. We say that a stationary point ζ of a Hamiltonian flow
is elliptic if all the eigenvalues of the derivative of the vector field at the point
ζ are pure imaginary numbers. Arguments similar to those in the discrete time
case show that, under generic hypotheses, ζ is a density point of a set formed
by invariant tori of dimension d restricted to each of which the Hamiltonian
flow is conjugate to a linear flow.
3 That is, a volume form defined up to sign: the sign is not determined because the inner product
does not detect the orientation of the vector space.
(see Figure 4.5), with exponential convergence rates that are uniform, that is,
independent of γ . Moreover, the geodesic flow is transitive. The second main
step in the proof of Theorem 4.4.10 consists of showing that every transitive,
uniformly hyperbolic flow (or transitive Anosov flow) of class C2 that preserves
volume is ergodic. We will comment on this last issue in a little while.
There exists a corresponding notion for discrete time systems: we say that
a diffeomorphism f : N → N on a compact Riemannian manifold is uniformly
(for some choice of a norm compatible with the Riemannian metric on M).
One can prove that for each z ∈ N the set W s (z) of points whose forward
trajectory is asymptotic to the trajectory of z is a differentiable (immersed)
submanifold of N tangent to E^s_z at the point z; analogously, the set W^u(z)
of points whose backward trajectory is asymptotic to the trajectory of z is a
differentiable submanifold tangent to Ezu at the point z. These submanifolds
form foliations (that is, decompositions of N into differentiable submanifolds)
that are invariant under the diffeomorphism:
f (W s (z)) = W s (f (z)) and f (W u (z)) = W u (f (z)) for every z ∈ N.
We call W s (z) the stable manifold (or stable leaf ) and W u (z) the unstable
manifold (or unstable leaf ) of the point z ∈ N.
Concerning the second part of the proof of Theorem 4.4.10, the crucial tech-
nical tool to prove that every transitive, uniformly hyperbolic diffeomorphism
of class C2 that preserves volume is ergodic is the following theorem of Anosov
and Sinai [AS67]:
Theorem 4.4.11 (Absolute continuity). The stable and unstable foliations of
any Anosov diffeomorphism (or flow) of class C2 are absolutely continuous:
1. if X ⊂ N has zero volume then X ∩ W s (x) has volume zero inside W s (x) for
almost every x ∈ N;
2. if Y ⊂ Σ is a zero volume subset of some submanifold Σ transverse to the
stable foliation, then the union of the stable manifolds through the points of
Y has zero volume in N;
and analogously for the unstable foliation.
Ergodicity of the system may then be deduced using the Hopf argument,
which we introduced in a special case in Section 4.2.6. Let us explain this.
Given any continuous function ϕ : N → R, let Eϕ be the set of all points
z ∈ N for which the forward and backward time averages, ϕ + (z) and ϕ − (z),
are well defined and coincide. This set Eϕ has full volume, as we have
seen in Corollary 3.2.8. Observe also that ϕ + is constant on each stable
manifold and ϕ − is constant on each unstable manifold. So, by the first part
of Theorem 4.4.11, the intersection Yz = W u (z) ∩ Eϕ has full volume in W u (z)
for almost every z ∈ N. Moreover, ϕ − = ϕ + is constant on each Yz . Fix any
such z. The transitivity hypothesis implies that the union of all stable manifolds
through the points of W u (z) is the whole ambient manifold N. Hence, using the
second part of Theorem 4.4.11, the union of the stable manifolds through the
points of Yz has full volume in N. Clearly, ϕ + is constant on this union. This
shows that the time average of every continuous function ϕ is constant on a
full measure set. Hence, f is ergodic.
We close this section by observing that all the known examples of Anosov
diffeomorphisms are transitive. The corresponding statement for Anosov flows
is false (see Verjovsky [Ver99]). Another open problem in this setting is
whether ergodicity still holds when the Anosov system is only of class C1 . It is
known (see [Bow75b, RY80]) that in this case the absolute continuity theorem
(Theorem 4.4.11) is false, in general.
4.4.6 Billiards
As we have seen in Sections 4.4.2 and 4.4.3, non-ergodic systems are quite
common in the realm of Hamiltonian flows and symplectic transformations.
However, this fact alone is not sufficient to invalidate the ergodic hypothesis
of Boltzmann in the context where it was formulated. Indeed, ideal gases are
a special class of systems and it is conceivable that ergodicity could be typical
in this more restricted setting, even if it is not typical for general Hamiltonian
systems.
In the 1960’s, the Russian mathematician and theoretical physicist Yakov
Sinai [Sin63] conjectured that Hamiltonian systems formed by spherical
hard balls that hit each other elastically are ergodic. Hard ball systems (see
Example 4.4.13 for a precise definition) had been proposed as a model for the
behavior of ideal gases by the American scientist Josiah Willard Gibbs who,
together with Boltzmann and Scottish mathematician and theoretical physicist
James Clerk Maxwell, created the area of statistical mechanics. The ergodic
hypothesis of Boltzmann–Sinai, as Sinai’s conjecture is often referred to, is the
main topic in the present section.
In fact, we are going to discuss the problem of ergodicity for somewhat
more general systems, called billiards, whose formal definition was first given
by Birkhoff in the 1930’s.
In its simplest form, a billiard is given by a bounded connected domain Ω ⊂ R²,
called the billiard table, whose boundary ∂Ω is formed by a finite number
of differentiable curves. We call corners those points of the boundary where
it fails to be differentiable; by hypothesis, they constitute a finite set C ⊂ ∂Ω.
One considers a point particle moving uniformly along straight lines inside
Ω, with elastic reflections on the boundary. That is, whenever the particle hits
∂Ω \ C it is reflected in such a way that the angle of incidence equals the angle
of reflection. When the particle hits some corner it is absorbed: its trajectory is
not defined from then on.
Let us denote by n the unit vector field orthogonal to the boundary ∂Ω and
pointing to the inside of Ω. It defines an orientation in ∂Ω \ C: a vector t tangent
to the boundary is positive if the basis {t, n} of R² is positive. It is clear that the
motion of the particle is characterized completely by the sequence of collisions
with the boundary. Moreover, each such collision may be described by the
position s ∈ ∂Ω and the angle of reflection θ ∈ (−π/2, π/2). Therefore, the
evolution of the billiard is governed by the transformation
f : (∂Ω \ C) × (−π/2, π/2) → ∂Ω × (−π/2, π/2), (4.4.13)
that associates with each collision (s, θ) the subsequent one (s′, θ′). See
Figure 4.6.
In the example on the left-hand side of Figure 4.6 the billiard table is a
polygon, that is, the boundary consists of a finite number of straight line
segments. The one trajectory represented in the figure hits one of the corners.
Nearby trajectories, to either side, collide with distinct boundary segments,
with very different angles of incidence. In particular, it is clear that the billiard
transformation (4.4.13) cannot be continuous. Discontinuities may occur even
in the absence of corners. For example, on the right-hand side of Figure 4.6
the boundary has four connected components, all of which are differentiable
curves. Consider the trajectory represented in the figure, tangent to one of the
boundary components. Nearby trajectories, to either side, hit different
boundary components. Consequently, the billiard map is discontinuous in this
case also.
Example 4.4.12 (Circular billiard table). On the left-hand side of Figure 4.7
we represent a billiard in the unit ball Ω ⊂ R². The corresponding billiard
transformation is given by
f : (s, θ ) → (s − (π − 2θ ), θ ).
The behavior of this transformation is described geometrically on the
right-hand side of Figure 4.7. Observe that f preserves the area measure ds dθ
and satisfies the twist condition (4.4.4). Note also that f is integrable (in the
sense of Section 4.4.2) and, in particular, the area measure is not ergodic.
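The integrable behavior of the circular billiard is easy to see by iterating the formula above. The sketch below is a minimal illustration; the initial condition, the number of iterates and the helper names are choices made here. The angle θ is constant along orbits, so every horizontal circle {θ = const} is invariant, and for θ = π/6 each collision rotates s by 2π/3, so the orbit closes up after three collisions.

```python
import math

TWO_PI = 2.0 * math.pi

def billiard(s, theta):
    """Billiard map of the unit disc: (s, theta) -> (s - (pi - 2 theta), theta)."""
    return ((s - (math.pi - 2.0 * theta)) % TWO_PI, theta)

def orbit(s, theta, n):
    """First n + 1 points of the orbit of (s, theta)."""
    points = [(s, theta)]
    for _ in range(n):
        s, theta = billiard(s, theta)
        points.append((s, theta))
    return points

def circle_distance(a, b):
    """Distance between two points of the boundary circle R / (2 pi Z)."""
    d = abs(a - b) % TWO_PI
    return min(d, TWO_PI - d)

# With theta = pi/6 each collision rotates s by 2*pi/3: an equilateral triangle.
pts = orbit(1.0, math.pi / 6.0, 3)
return_error = circle_distance(pts[0][0], pts[3][0])
angles = {theta for _, theta in pts}
print(return_error, angles)
```

Since the derivative of f is the constant matrix with rows (1, 2) and (0, 1), the determinant is 1, which is consistent with the invariance of ds dθ stated above.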
We will see in a while (Theorem 4.4.14) that every planar billiard preserves
a natural measure equivalent to the area measure on ∂Ω × (−π/2, π/2).
Then, using the previous observations, the KAM theory allows us to prove
that billiards with almost circular tables are not ergodic with respect to that
invariant measure.
4 One may replace the torus Td by a more plausible container, such as the d-dimensional cube
[0, 1]d , for example. However, the analysis is a bit more complicated in that case, because we
must take into account the collisions of the balls with the container’s walls.
d ≥ 2. Let us also assume that all the molecules move with constant unit speed.
This system can be modelled by a billiard, as follows.
For 1 ≤ i ≤ N, denote by pi ∈ Td the position of the center of the i-th molecule
Mi . Let ρ > 0 be the radius of each molecule. Then, each state of the system is
entirely described by a value of p = (p1 , . . . , pN ) in the set
Ω = {p = (p1, . . . , pN) ∈ T^{Nd} : ‖pi − pj‖ ≥ 2ρ for every i ≠ j}
(this set is connected, as long as the radius ρ is sufficiently small).
In the absence of collisions, the point p moves along a straight line inside Ω,
with constant speed. When two molecules Mi and Mj collide, ‖pi − pj‖ = 2ρ
and the velocity vectors change in the following way. Let vi and vj be the
velocity vectors of the two molecules immediately before the collision and
let Rij be the straight line through pi and pj. The elasticity hypothesis means
that the velocity vectors v′i and v′j immediately after the collision are given by
(check the right-hand side of Figure 4.8):
(i) the components of vi and v′i in the direction of Rij are symmetric and the
same is true for vj and v′j;
(ii) the components of vi and v′i in the direction orthogonal to Rij are equal and
the same is true for vj and v′j.
This means, precisely, that the point p undergoes elastic reflection on
the hypersurface {p ∈ ∂Ω : ‖pi − pj‖ = 2ρ} of the boundary of Ω (see
Exercise 4.4.4). Therefore, the motion of the point p corresponds exactly to
the evolution of the billiard in the table Ω.
The next result places billiards well inside the domain of interest of ergodic
theory. Let ds be the volume measure induced on the boundary ∂Ω by the
Riemannian metric of the ambient manifold; in the planar case (that is, when
Ω ⊂ R²), ds is just the arc-length. Denote by dθ the angle measure on each
hemisphere {v ∈ S^{d−1} : v · n(s) > 0}.
Theorem 4.4.14. The transformation f preserves the measure ν = cos θ ds dθ
on the domain {(s, v) ∈ ∂Ω × S^{d−1} : v · n(s) > 0}.
In what follows we sketch the proof for planar billiards. The reader should
have no trouble checking that all the arguments extend naturally to arbitrary
dimension.
Consider any family of trajectories starting from a given boundary point (this
means that s is fixed), as represented on the left-hand side of Figure 4.9. Let
this family be parameterized by the angle of reflection θ. Denote by ℓ(s, s′) the
length of the line segment connecting s to s′. Then ℓ(s, s′) dθ = dh = cos θ′ ds′
and, thus,
\[ \frac{\partial s'}{\partial \theta} = \frac{\ell(s, s')}{\cos\theta'}. \]
To calculate the derivative of θ′ with respect to θ, observe that the variation
of θ′ is the sum of two components: the first one corresponds to the variation
of θ, whereas the second one arises from the variation of the normal vector
n(s′) as the collision point s′ varies. By the definition of curvature, this
second component is equal to κ(s′) ds′. It follows that dθ′ = dθ + κ(s′) ds′ and,
consequently,
\[ \frac{\partial \theta'}{\partial \theta} = 1 + \kappa(s')\frac{\partial s'}{\partial \theta} = 1 + \kappa(s')\frac{\ell(s, s')}{\cos\theta'}. \]
This can be summarized as follows:
\[ Df(s, \theta)\,\frac{\partial}{\partial \theta} = \frac{\ell(s, s')}{\cos\theta'}\frac{\partial}{\partial s'} + \Big( 1 + \kappa(s')\frac{\ell(s, s')}{\cos\theta'} \Big)\frac{\partial}{\partial \theta'}. \tag{4.4.14} \]
Let J(s, θ) be the matrix of the derivative Df(s, θ) with respect to the bases
{∂/∂s, ∂/∂θ} and {∂/∂s′, ∂/∂θ′}. The relations (4.4.14) and (4.4.15) imply that
\[ \det J(s, \theta) = \frac{\cos\theta}{\cos\theta'}. \tag{4.4.16} \]
for every bounded measurable function ϕ. This proves that f preserves the
measure ν = cos θ ds dθ , as we stated.
We call a billiard dispersing if the boundary of the billiard table is strictly
convex at every point, when viewed from the inside. In the planar case, with
the orientation conventions that we adopted, this means that the curvature κ is
negative at every point. Figure 4.10 presents two examples. In the first one,
Ω ⊂ R² and the boundary is a connected set formed by the union of five
differentiable curves. In the second example, Ω ⊂ T² and the boundary has
three connected components, all of which are differentiable and convex.
The class of dispersing billiards was introduced by Sinai in his 1970
article [Sin70]. The denomination “dispersing” refers to the fact that in
such billiards any (thin) beam of parallel trajectories becomes divergent
upon reflection on the boundary, as illustrated on the left-hand side of
Figure 4.10. Sinai observed that dispersing billiards are hyperbolic systems,
in a non-uniform sense: invariant sub-bundles Ezs and Ezu as in (4.4.11) exist
at almost every point and, instead of (4.4.12), we have that the derivative
is contracting along Ezs and expanding along Ezu asymptotically, that is, for
sufficiently large iterates (depending on the point z).
4.4.7 Exercises
4.4.1. We say that ω ∈ Rd is τ -Diophantine if it is (c, τ )-Diophantine, that is, if it
satisfies (4.4.5), for some c > 0. Prove that the set of τ -Diophantine vectors is
non-empty if and only if τ ≥ d − 1. Moreover, show that the set has full Lebesgue
measure in Rd whenever τ is strictly larger than d − 1.
4.4.2. Consider a billiard on a rectangular table. Check that every trajectory that does
not hit any corner either is periodic or is dense in the billiard table.
4.4.3. Show that every billiard on an acute triangle exhibits some periodic trajectory.
[Observation: the same is true for right triangles, but the problem is open for
obtuse triangles.]
4.4.4. Consider the billiard model for ideal gases in Example 4.4.13. Check that elastic
collisions between any two molecules correspond to the elastic reflections of the
billiard point particle on the boundary of Ω.
4.4.5. Prove Theorem 4.4.9 under the additional hypothesis that the function ρ →
Θ(θ, ρ) is monotone (increasing or decreasing) for every θ ∈ R.
4.4.6. Consider the context of Theorem 4.4.9 but, instead of (4.4.10), assume that f
rotates the two boundary components of A with different velocities: there exists
some lift F : R × [a, b] → R × [a, b] and there exist p, q ∈ Z with q ≥ 1, such that,
denoting F^q = (Θ^q, R^q),
\[ \big[\Theta^{q}(\theta, a) - p - \theta\big]\big[\Theta^{q}(\theta, b) - p - \theta\big] < 0 \quad\text{for every } \theta \in \mathbb{R}. \tag{4.4.17} \]
Show that f has two periodic orbits with period q in the interior of A, at least.
4.4.7. Let Ω be a convex domain in the plane whose boundary ∂Ω is a differentiable
curve. Show that the billiard on Ω has infinitely many periodic orbits.
5
Ergodic decomposition
For convex subsets of vector spaces with finite dimension, it is clear that every
element of the convex set may be written as a convex combination of the
extremal elements. For example, every point in a triangle may be written as
a convex combination of the vertices of the triangle. In view of the results in
Section 4.3, it is natural to ask whether a similar property holds in the space
of invariant probability measures, that is, whether every invariant measure is a
convex combination of ergodic measures.
The ergodic decomposition theorem, which we prove in this chapter
(Theorem 5.1.3), asserts that the answer is positive, except that the number
of “terms” in this combination is not necessarily finite, not even countable.
This theorem has several important applications; in particular, it permits the
reduction of the proof of many results to the case when the system is ergodic.
We are going to deduce the ergodic decomposition theorem from another
important result from measure theory, the Rokhlin disintegration theorem.
The simplest instance of this theorem holds when we have a partition of
a probability space (M, μ) into finitely many measurable subsets P1 , . . . , PN
with positive measure. Then, obviously, we may write μ as a linear
combination
The identity is not affected if we consider the integral restricted to the subset
of irrational values of y. Then (5.1.1) presents m as an (uncountable) convex
combination of ergodic measures.
Part (iv) of the theorem means that μ is a convex combination of the ergodic
probability measures μP , where the “weight” of each μP is determined by the
probability measure μ̂. Part (ii) ensures that the integral in (iv) is well defined.
Moreover (see Exercise 5.1.3), it implies that the map $\mathcal{P} \to \mathcal{M}_1(M)$ given by
$P \mapsto \mu_P$ is measurable.
topological space with a countable basis of open sets and B is the Borel
σ -algebra:
Since μP (A) > μP (A) for every P ∈ QA , this implies that μ̂(QA ) = 0 for every
A ∈ A. A similar argument shows that μ̂(RA ) = 0 for every A ∈ A. So,
$$\bigcup_{A \in \mathcal{A}} (Q_A \cup R_A)$$
Example 5.1.9. Let M = T2 , endowed with the Lebesgue measure m, and let
P be the partition of M into horizontal circles S1 × {y}. Then P is a measurable
partition. To see that, consider
$$\mathcal{P}_n = \{S^1 \times I(i, n) : i = 1, \dots, 2^n\},$$
5.1.5 Exercises
5.1.1. Show that a partition P is measurable if and only if there exist measurable subsets
M0 , E1 , E2 , . . . , En , . . . such that μ(M0 ) = 1 and, restricted to M0 ,
$$\mathcal{P} = \bigvee_{n=1}^{\infty} \{E_n, M \setminus E_n\}.$$
exists in the weak∗ topology and taking for P the partition of M0 defined
by P(x) = P(y) ⇔ μx = μy . Check the details of this alternative proof of
Theorem 5.1.3 for compact metric spaces.
5.1.7. Let σ : Σ → Σ be the shift map in $\Sigma = \{1, \dots, d\}^{\mathbb{Z}}$. Consider the partition $W^s$ of
Σ into “stable sets”
for every n ∈ N (the sums involve only partition elements Pn ∈ Pn with positive
measure).
Proof. Initially, suppose that ψ ≥ 0. For each α < β, let S(α, β) be the set of
points x ∈ M such that
$$\liminf_n e_n(\psi, x) < \alpha < \beta < \limsup_n e_n(\psi, x).$$
It is clear that the sequence en (ψ, x) diverges if and only if x ∈ S(α, β) for some
pair of rational numbers α < β. In other words, the limit e(ψ, x) exists if and
only if x belongs to the intersection Mψ of all S(α, β)c with rational α < β. As
this is a countable intersection, in order to prove that μ(Mψ ) = 1 it suffices to
show that μ(S(α, β)) = 0 for every α < β. We do this next.
Let α and β be fixed and denote S = S(α, β). Given x ∈ S, fix any sequence
of integers $1 \le a^x_1 < b^x_1 < \cdots < a^x_i < b^x_i < \cdots$ such that
$$e_{a^x_i}(\psi, x) < \alpha \quad\text{and}\quad e_{b^x_i}(\psi, x) > \beta \quad\text{for every } i \ge 1.$$
5.2 Rokhlin disintegration theorem 151
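Define $A_i$ to be the union of the partition elements $A_i(x) = \mathcal{P}_{a^x_i}(x)$ and $B_i$ to be
the union of the partition elements $B_i(x) = \mathcal{P}_{b^x_i}(x)$ obtained in this way, for all
points x ∈ S. By construction, $S \subset A_{i+1} \subset B_i \subset A_i$ for every i ≥ 1. In particular,
S is contained in the set
$$\tilde{S} = \bigcap_{i=1}^{\infty} B_i = \bigcap_{i=1}^{\infty} A_i.$$
Since the sequence $\mathcal{P}_n$, n ≥ 1, is monotone increasing, given any two of the sets
$A_i(x) = \mathcal{P}_{a^x_i}(x)$ that form $A_i$, either they are disjoint or one is contained in the
other. It follows that the maximal sets $A_i(x)$ are pairwise disjoint and, hence,
they constitute a partition of $A_i$. Hence, adding only over such maximal sets
with positive measure,
$$\int_{A_i} \psi\,d\mu = \sum_{A_i(x)} \int_{A_i(x)} \psi\,d\mu \le \sum_{A_i(x)} \alpha\,\mu(A_i(x)) = \alpha\,\mu(A_i)$$
for every i ≥ 1. The same argument applied to the sets $B_i(x)$ gives
$\int_{B_i} \psi\,d\mu \ge \beta\,\mu(B_i)$. Since $B_i \subset A_i$ and ψ ≥ 0, it follows that
$\alpha\,\mu(A_i) \ge \beta\,\mu(B_i)$ for every i ≥ 1. Taking the limit as i → ∞, we find that
$\alpha\,\mu(\tilde{S}) \ge \beta\,\mu(\tilde{S})$. This
implies that $\mu(\tilde{S}) = 0$ and, hence, μ(S) = 0. This proves the claim when ψ is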
non-negative. The general case follows immediately, since we may always write
$\psi = \psi^+ - \psi^-$, where $\psi^\pm$ are measurable, non-negative and bounded. Note that
$e_n(\psi) = e_n(\psi^+) - e_n(\psi^-)$ for every n ≥ 1 and, hence, the conclusion of the
lemma holds for ψ if it holds for $\psi^+$ and $\psi^-$. This ends the proof of claim (i).
The other claims are simple consequences of the definition. The fact that
e(ψ) is measurable follows directly from Proposition A.1.31. Since Pn is
coarser than P, it is clear that en (ψ) is constant on each P ∈ P, restricted to a
subset of M with full measure. Hence, the same is true for e(ψ). This proves
part (ii). Observe also that |en (ψ)| ≤ sup |ψ| for every n ≥ 1. Hence, we may
use the dominated convergence theorem to pass to the limit in (5.2.2). In this
way, we get part (iii).
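The convergence of the averages e_n(ψ, x) can be observed numerically in a simple case. The sketch below is only an illustration under stated assumptions: it takes M = [0, 1) with the Lebesgue measure, takes P_n to be the partition into dyadic intervals of length 2^-n, and assumes that e_n(ψ, x) denotes the average of ψ over the element of P_n that contains x. The function names and the choice of ψ are hypothetical.

```python
def conditional_expectation(psi, x, n, samples=64):
    """Approximate e_n(psi, x): the average of psi over the dyadic interval
    of length 2**-n containing x, computed with a midpoint Riemann sum."""
    length = 2.0 ** (-n)
    left = int(x / length) * length      # left endpoint of the atom P_n(x)
    step = length / samples
    return sum(psi(left + (k + 0.5) * step) for k in range(samples)) / samples


def psi(t):
    # A bounded measurable (here continuous) test function; an arbitrary choice.
    return t * t


x = 0.3
values = [conditional_expectation(psi, x, n) for n in range(1, 16)]
for n, value in enumerate(values, start=1):
    print(n, value)
```

Since the dyadic intervals generate the Borel σ-algebra, the printed values should approach ψ(x), in line with Exercise 5.2.3.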
E(A, P) = e(ψ, x) for any x ∈ Mψ ∩ P. Note that e(ψ) = E(A) ◦ π . Hence, the
function E(A) is measurable and satisfies:
$$\int \psi\,d\mu = \int e(\psi)\,d\mu = \int E(A)\,d\hat{\mu}. \qquad (5.2.4)$$
(C) for every k ∈ N such that ik = 1 there exists l(k) > k such that il(k) = 1 and
Ūl(k) ⊂ Uk and diam Ul(k) ≤ diam Uk /2.
Conversely, suppose that (ik )k ∈ satisfies conditions (A), (B) and (C). We are
going to show that there exists x ∈ M such that γ (x) = (ik )k . For that, define
n
Fn = Vk ,
k=0
Finally, given any k ∈ N, let L(k) be the set of all l > k such that Ūl ⊂ Uk and
diam Ul ≤ diam Uk /2. Condition (C) corresponds to the subset
$$\bigcap_{k=0}^{\infty} \; \bigcup_{a_0, \dots, a_{k-1}} \Big( [0; a_0, \dots, a_{k-1}, 0] \;\cup \bigcup_{l \in L(k)} \; \bigcup_{a_{k+1}, \dots, a_{l-1}} [0; a_0, \dots, a_{k-1}, 1, a_{k+1}, \dots, a_{l-1}, 1] \Big).$$
This implies that γ is measurable, because the cylinders generate the Borel
σ -algebra of . Next, observe that
$$\gamma\big(U_0^{a_0} \cap \cdots \cap U_s^{a_s}\big) = [0; a_0, \dots, a_s] \cap \gamma(M) \qquad (5.2.6)$$
for every $s, a_0, \dots, a_s$. Using Lemma 5.2.3, it follows that $\gamma(U_0^{a_0} \cap \cdots \cap U_s^{a_s})$
is a Borel subset of for every s, a0 , . . . , as . This proves that the inverse
transformation γ −1 is measurable.
Now we are ready to prove that μ extends to a probability measure on
the Borel σ -algebra of M, as claimed in Proposition 5.2.2. For that, let us
consider the algebra A generated by the cylinders of . Note that the
elements of A are the finite pairwise disjoint unions of cylinders. In particular,
every element of A is compact and, consequently, A is a compact algebra
(Definition A.1.15). Define:
$$\nu([0; a_0, \dots, a_s]) = \mu\big(U_0^{a_0} \cap \cdots \cap U_s^{a_s}\big), \qquad (5.2.7)$$
for every s ≥ 0 and a0 , . . . , as in {0, 1}. Then ν is an additive function in the set
of all cylinders, with values in [0, 1]. It extends in a natural way to an additive
function defined on the algebra A , which we still denote as ν.
It is clear that ν() = 1. Moreover, since the algebra A is compact, we may
use Theorem A.1.14 to conclude that the function ν : A → [0, 1] is σ -additive.
Hence, by Theorem A.1.13, the function ν extends to a probability measure
defined on the Borel σ -algebra of . Given any cover C of γ (M) by cylinders,
it follows from the definition (5.2.7) that
$$\nu\Big(\bigcup_{C \in \mathcal{C}} C\Big) = \mu\Big(\bigcup_{C \in \mathcal{C}} \gamma^{-1}(C)\Big) = \mu(M) = 1.$$
Taking the infimum over all covers, we conclude that ν(γ (M)) = 1.
By Corollary 5.2.4, the image $\gamma^{-1}_* \nu$ is a Borel probability measure on M.
By definition, and using the relation (5.2.6),
$$\gamma^{-1}_* \nu\big(U_0^{a_0} \cap \cdots \cap U_s^{a_s}\big) = \nu\big(\gamma(U_0^{a_0} \cap \cdots \cap U_s^{a_s})\big) = \nu\big([0; a_0, \dots, a_s] \cap \gamma(M)\big)$$
$$= \nu([0; a_0, \dots, a_s]) = \mu\big(U_0^{a_0} \cap \cdots \cap U_s^{a_s}\big)$$
for any $s, a_0, \dots, a_s$. This implies that $\gamma^{-1}_* \nu$ is an extension of the function
μ : A → [0, 1]. Therefore, the proof of Proposition 5.2.2 is complete.
5.2.4 Exercises
5.2.1. Let P and Q be measurable partitions of (M, B, μ) such that P ≺ Q up to measure
zero. Let {μP : P ∈ P} be a disintegration of μ with respect to P and, for every
P ∈ P, let {μP,Q : Q ∈ Q, Q ⊂ P} be a disintegration of μP with respect to Q. Let
π : Q → P be the canonical projection, such that Q ⊂ π(Q) for almost every
Q ∈ Q. Show that {μπ(Q),Q : Q ∈ Q} is a disintegration of μ with respect to Q.
5.2.2. (Converse to the theorem of Rokhlin) Let M be a complete separable metric
space. Show that if P satisfies the conclusion of Theorem 5.1.11, that is, if μ
admits a disintegration with respect to P, then the partition P is measurable.
5.2.3. Let P1 ≺ · · · ≺ Pn ≺ · · · be an increasing sequence of countable partitions such
that the union $\bigcup_n \mathcal{P}_n$ generates the σ-algebra B of measurable sets, up to measure
zero. Show that the conditional expectation e(ψ) = limn en (ψ) coincides with ψ
at almost every point, for every bounded measurable function.
5.2.4. Prove Proposition 2.4.4, using Proposition 5.2.2.
6
Unique ergodicity
Since the space M1 (M) of probability measures on M is compact for the weak∗
topology (Theorem 2.1.5), up to replacing this sequence by a subsequence,
6.2 Minimality 159
6.1.1 Exercises
6.1.1. Give an example of a transformation f : M → M in a compact metric space such
that $(1/n) \sum_{j=0}^{n-1} \varphi \circ f^j$ converges uniformly, for every continuous function ϕ :
M → R, but f is not uniquely ergodic.
6.1.2. Let f : M → M be a transitive continuous transformation in a compact metric
space. Show that if $(1/n) \sum_{j=0}^{n-1} \varphi \circ f^j$ converges uniformly, for every continuous
function ϕ : M → R, then f is uniquely ergodic.
6.1.3. Let f : M → M be an isometric homeomorphism in a compact metric space M.
Show that if μ is an ergodic measure for f then, for every n ∈ N, the function
ϕ(x) = d(x, f n (x)) is constant on the support of μ.
6.2 Minimality
Let Λ ⊂ M be a closed invariant set of f : M → M. We say that Λ is minimal
if it coincides with the closure of the orbit $\{f^n(x) : n \ge 0\}$ of every point x ∈ Λ.
We say that the transformation f is minimal if the ambient M is a minimal set.
Recall that the support of a measure μ is the set of all points x ∈ M such
that μ(V) > 0 for every neighborhood V of x. It follows immediately from the
definition that the complement of the support is an open set: if x ∉ supp μ then
there exists an open neighborhood V such that μ(V) = 0; then V is contained
in the complement of the support. Therefore, supp μ is a closed set.
It is also easy to see that the support of any invariant measure is an invariant
set, in the following sense: f (supp μ) ⊂ supp μ. Indeed, let x ∈ supp μ and
let V be any neighborhood of y = f (x). Since f is continuous, f −1 (V) is a
neighborhood of x. Then μ(f −1 (V)) > 0, because x ∈ supp μ. Hence, using
that μ is invariant, μ(V) > 0. This proves that y ∈ supp μ.
Proof. Suppose that there exists x ∈ supp μ whose orbit {f j (x) : j ≥ 0} is not
dense in the support of μ. This means that there exists some open subset U of
M such that U ∩ supp μ is non-empty and
$$f^j(x) \notin U \cap \operatorname{supp} \mu \quad\text{for every } j \ge 0. \qquad (6.2.1)$$
Let ν be any accumulation point of the sequence of probability measures
$$\nu_n = n^{-1} \sum_{j=0}^{n-1} \delta_{f^j(x)}, \qquad n \ge 1$$
require that:
there exists ρ < 1 such that $|a_n| \le \rho^{|n|}$ for every |n| sufficiently large. (6.2.3)
Indeed, in that case the series $\sum_{n \in \mathbb{Z}} a_n z^n$ converges uniformly on every annulus
$\{z \in \mathbb{C} : r \le |z| \le r^{-1}\}$ with r > ρ. In particular, its sum in the unit circle, which
coincides with φ, is a real-analytic function. Since we want φ to take values in
the real line and to have zero average, we must also require:
In this way, the problem is reduced to finding α and (an )n that satisfy (6.2.3),
(6.2.4), (6.2.5) and (6.2.8). Exercise 6.2.4 hints at the issues involved in the
choice of such objects.
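The role of the denominators $e^{2\pi i n \alpha} - 1$ that appear when one solves the cohomological equation term by term (compare Exercise 6.2.3) can be explored numerically. The sketch below is a hedged illustration, not the construction of the text: it assumes α equal to the golden mean, a standard example of a Diophantine number, and the hypothetical coefficients $a_n = \rho^{|n|}$ with ρ = 1/2, which satisfy the decay condition (6.2.3). It prints the smallest value of $n\,|e^{2\pi i n \alpha} - 1|$ over a finite range and a partial sum of $|b_n|^2$.

```python
import cmath
import math

alpha = (math.sqrt(5) - 1) / 2   # golden mean (assumed choice of alpha)
rho = 0.5                        # assumed decay rate for the coefficients a_n


def small_divisor(n):
    """Modulus of the denominator e^{2 pi i n alpha} - 1."""
    return abs(cmath.exp(2j * math.pi * n * alpha) - 1)


def b(n):
    """Coefficient b_n = a_n / (e^{2 pi i n alpha} - 1) with a_n = rho**|n|."""
    return rho ** abs(n) / (cmath.exp(2j * math.pi * n * alpha) - 1)


# For a Diophantine alpha the denominators decay at most polynomially in n,
# so n * |e^{2 pi i n alpha} - 1| stays bounded away from zero on this range.
worst = min(n * small_divisor(n) for n in range(1, 2001))

# Partial sum of |b_n|^2; the exponential decay of a_n dominates the divisors.
partial_sum = sum(abs(b(n)) ** 2 for n in range(1, 200))

print(worst, partial_sum)
```

For an α that is very well approximated by rationals, the quantity `worst` would instead become very small, which is the kind of behaviour the choice of α and $(a_n)_n$ has to control.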
6.2.1 Exercises
6.2.1. Show that if u is a measurable solution of the cohomological equation (6.2.2)
then h : T2 → T2 , h(x, y) = (x, y + u(x)) is an ergodic equivalence between (f0 , m)
and (f , m), that is, h is an invertible measurable transformation that preserves the
measure m and conjugates the two maps f and f0 . Deduce that (f , m) cannot be
ergodic.
6.2.2. Show that if u is a continuous solution of the cohomological equation (6.2.2) then
h : T2 → T2 , h(x, y) = (x, y + u(x)) is a topological conjugacy between f0 and f .
In particular, f cannot be transitive.
162 Unique ergodicity
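6.2.3. Check that if $u(x) = \sum_{n \in \mathbb{Z}} b_n e^{2\pi i n x}$ is a solution of (6.2.2) then
$$b_n = \frac{a_n}{e^{2\pi i n \alpha} - 1} \quad\text{for every } n \in \mathbb{Z}. \qquad (6.2.9)$$
Moreover, $u \in L^2(m)$ if and only if $\sum_{n=1}^{\infty} |b_n|^2 < \infty$.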
6.2.4. We say that an irrational number α is Diophantine if there exist c > 0 and τ > 0
such that $|q\alpha - p| \ge c|q|^{-\tau}$ for any p, q ∈ Z with q ≠ 0. Show that the condition
(6.2.5) is satisfied whenever α is Diophantine and φ satisfies (6.2.3).
6.2.5. (Theorem of Gottschalk) Let f : M → M be a continuous map in a compact metric
space M. Show that the closure of the orbit of a point x ∈ M is a minimal set if
and only if Rε = {n ∈ Z : d(x, f n (x)) < ε} is a syndetic set for every ε > 0.
6.2.6. Let f : M → M be a continuous map in a compact metric space M. We say that
x, y ∈ M are close if infn d(f n (x), f n (y)) = 0. Show that if x ∈ M is such that the
closure of its orbit is a minimal set then, for every neighborhood U of x and every
point y close to x, there exists an increasing sequence $(n_i)_i$ such that $f^{n_{i_1} + \cdots + n_{i_k}}(x)$
and $f^{n_{i_1} + \cdots + n_{i_k}}(y)$ are in U for any $i_1 < \cdots < i_k$ and k ≥ 1.
6.2.7. (Theorem of Hindman) A theorem of Auslander and Ellis (see [Fur81,
Theorem 8.7]) states that in the conditions of Exercise 6.2.6 the closure of the
orbit of every y ∈ M contains some point x that is close to y and such that the
closure of its orbit is a minimal set. Deduce the following refinement of the
theorem of van der Waerden: given any decomposition N = S1 ∪ · · · ∪ Sq of the set
of natural numbers into pairwise disjoint sets, there exists j such that Sj contains
a sequence $n_1 < \cdots < n_i < \cdots$ such that $n_{i_1} + \cdots + n_{i_k} \in S_j$ for every k ≥ 1 and
any $i_1 < \cdots < i_k$.
$$\varphi_n = \frac{1}{n} \sum_{j=0}^{n-1} \varphi \circ R_\theta^j \quad\text{converges to } c_\varphi \text{ at every point.} \qquad (6.3.1)$$
6.3 Haar measure 163
Then, using that ϕ is continuous, given any ε > 0 we may find δ > 0 such that
$$d(x, y) < \delta \;\Rightarrow\; d(R_\theta^j(x), R_\theta^j(y)) < \delta \;\Rightarrow\; |\varphi(R_\theta^j(x)) - \varphi(R_\theta^j(y))| < \varepsilon$$
Since ε does not depend on n, this proves that the sequence (ϕn )n is
equicontinuous.
This allows us to use the theorem of Ascoli to prove the claim (6.3.1), as
follows. Suppose that there exists x̄ ∈ Td such that (ϕn (x̄))n does not converge
to $c_\varphi$. Then there exist $c \ne c_\varphi$ and some subsequence $(n_k)_k$ such that $\varphi_{n_k}(\bar{x})$
converges to c when k → ∞. By the theorem of Ascoli, up to restricting to a
subsequence we may suppose that (ϕnk )k is uniformly convergent. Let ψ be its
limit. Then ψ is a continuous function such that ψ(x) = cϕ for a dense subset
of values of x ∈ Td but ψ(x̄) = c is different from cϕ . It is clear that such a
function does not exist. This contradiction proves our claim that Rθ is uniquely
ergodic.
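The pointwise convergence claimed in (6.3.1) is easy to observe in the one-dimensional case. The sketch below is an illustration only: it assumes d = 1, the rotation angle θ = √2 − 1, and the hypothetical test function ϕ(t) = cos²(2πt), whose integral over the circle is 1/2. It evaluates the time averages of ϕ along the orbits of three different starting points.

```python
import math


def birkhoff_average(phi, x, theta, n):
    """Time average (1/n) * sum_{j<n} phi(R_theta^j(x)) on the circle R/Z."""
    return sum(phi((x + j * theta) % 1.0) for j in range(n)) / n


def phi(t):
    # Continuous test function on the circle; its integral equals 1/2.
    return math.cos(2 * math.pi * t) ** 2


theta = math.sqrt(2) - 1          # irrational rotation angle (assumed choice)
starting_points = (0.0, 0.25, 0.7)
averages = [birkhoff_average(phi, x, theta, 5000) for x in starting_points]
print(averages)
```

All three averages should be close to 1/2, independently of the starting point, which is the behaviour that unique ergodicity predicts.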
are continuous. In all that follows it is assumed that the topology of G is such
that every set consisting of a single point is closed. When G is a manifold and
the operations in (6.3.2) are differentiable, we say that (G, ·) is a Lie group. See
Exercise 6.3.1.
The Euclidean space Rd is a topological group, and even a Lie group, relative
to addition +, and the same holds for the torus Td . Recall that Td is the quotient
of Rd by its subgroup Zd . This construction may be generalized as follows:
xH · yH = (x · y)H.
(i) There exists some Borel measure μG on G that is invariant under all
left-translations, finite on compact sets and positive on open sets;
(ii) If η is a measure invariant under all left-translations and finite on
compact sets then η = cμG for some c > 0.
(iii) μG (G) < ∞ if and only if G is compact.
We are going to sketch the proof of parts (i) and (ii) in the special case
when G is a Lie group. It will be apparent that in this case μG is a volume
measure on G. The proof of part (iii), for any topological group, is proposed in
Exercise 6.3.4.
Starting with part (i), let e be the unit element and d ≥ 1 be the dimension of
the Lie group. Consider any inner product · in the tangent space Te G. For each
g ∈ G, represent by Lg : Te G → Tg G the derivative of the left-translation Lg at
the point e. Next, consider the inner product defined in Tg G in the following
way:
$$u \cdot v = L_g^{-1}(u) \cdot L_g^{-1}(v) \quad\text{for every } u, v \in T_g G.$$
$$= L_g^{-1}(u) \cdot L_g^{-1}(v) = u \cdot v$$
Then $\mu_G(B) = \int_B |\rho(x)|\,dx_1 \cdots dx_d$, for any measurable set B contained in the
domain of the local coordinates. Noting that the function ρ is continuous and
non-zero, for every local chart, it follows that μG is positive on open sets and
finite on compact sets. Moreover, since the Riemannian metric is invariant
under left-translations, the measure μG is also invariant under left-translations.
Now we move on to discussing part (ii) of Theorem 6.3.4. Let ν be any measure
as in the statement. Denote by B(g, r) the open ball of center g and radius r,
relative to the distance associated with the Riemannian metric. In other words,
B(g, r) is the set of all points in G that may be connected to g by some curve of
length less than r. Fix ρ > 0 such that ν(B(e, ρ)) is finite (such a ρ does exist
By the Vitali lemma (Theorem A.2.16), we may find (gj )j in B(e, ρ) and (nj )j
in N such that
1. the balls B(gj , rnj ) are contained in B(e, ρ) and they are pairwise disjoint;
2. the union of these balls has full μG -measure in B(e, ρ).
Moreover, given any a ∈ R smaller than the limit in (6.3.4), we may suppose
that the integers nj are sufficiently large that ν(B(gj , rnj )) ≥ aμG (B(gj , rnj )) for
every j. It follows that
$$\nu(B(e, \rho)) \ge \sum_j \nu(B(g_j, r_{n_j})) \ge \sum_j a\,\mu_G(B(g_j, r_{n_j})) = a\,\mu_G(B(e, \rho)).$$
Since a may be taken arbitrarily close to (6.3.4), this proves the claim (6.3.3).
Next, we claim that ν is absolutely continuous with respect to μG . Indeed,
let b be any number larger than the quotient on the right-hand side of (6.3.3).
Given any measurable set B ⊂ G with μG (B) = 0, and given any ε > 0, let
$\{B(g_j, r_j) : j\}$ be a cover of B by balls of small radii, such that
$\nu(B(g_j, r_j)) \le b\,\mu_G(B(g_j, r_j))$ and $\sum_j \mu_G(B(g_j, r_j)) \le \varepsilon$. Then,
$$\nu(B) \le \sum_j \nu(B(g_j, r_j)) \le b \sum_j \mu_G(B(g_j, r_j)) \le b\varepsilon.$$
for μ-almost every g ∈ G. The limit on the left-hand side does not depend on g
and, by (6.3.3), it is finite. Let c ∈ R be that limit. Then ν = cμG , as stated in
part (ii) of Theorem 6.3.4.
In the case when the group G is compact, it follows from Theorem 6.3.4
that there exists a unique probability measure that is invariant
under left-translations, positive on open sets and finite on compact sets. This
probability measure μG is called the Haar measure of the group. For example,
the normalized Lebesgue measure is the Haar measure on the torus Td . See
also Exercises 6.3.5 and 6.3.6. The Haar measure features some additional
properties:
Corollary 6.3.5. Assume that G is compact. Then the Haar measure μG is
invariant under right-translations and under every surjective endomorphism
of G.
Then, ϕ is continuous and ϕ(e) = 0 < ϕ(z) for every z ≠ e. Now define
The sets U × W obtained in this way, with x fixed and g, h variable, form
an open cover of G2 . Let Ui × Wi , i = 1, . . . , k be a finite subcover and Vi ,
i = 1, . . . , k be the corresponding neighborhoods of x. Take $V = \bigcap_{i=1}^{k} V_i$ and
consider any y ∈ V. Given any (g, h) ∈ G2 , the condition (6.3.6) implies that
|ϕ(gxh) − ϕ(gyh)| ≤ δ/2. It follows that d(x, y) ≤ δ/2 and, consequently,
y ∈ B(x, δ).
Example 6.3.7. Given a matrix A ∈ GL(d, R), denote by ‖A‖ its operator
norm, that is, ‖A‖ = sup{‖Av‖ : ‖v‖ = 1}. Observe that ‖OA‖ = ‖A‖ = ‖AO‖
for every O in the orthogonal group O(d, R). Define
for every C ∈ GL(d, R). This distance is not invariant under right-translations
in GL(d, R) (Exercise 6.3.3). However, it is right-invariant (and left-invariant)
restricted to the orthogonal group O(d, R): for every O ∈ O(d, R),
Proof. It is clear that (i) implies (ii). To prove that (ii) implies (iii), consider the
invariant distance d given by Lemma 6.3.6. Let H be the closure of {gn : n ∈ Z}
and consider the continuous function
Observe that this function is invariant under Lg : using that gH = H, we get that
6.3.4 Odometers
Odometers, or adding machines, are mathematical models for the mechanisms
that register the distance (number of kilometers) travelled by a car or the
amount of electricity (number of energy units) consumed in a house. They
come with a dynamic, which consists in advancing the counter by one unit
each time. The main difference with respect to real-life odometers is that our
idealized counters allow for an infinite number of digits.
Fix any number basis d ≥ 2, for example d = 10, and consider the set X =
{0, 1, . . . , d − 1}, endowed with the discrete topology. Let M = X N be the set of
all sequences α = (αn )n with values in X, endowed with the product topology.
This topology is metrizable: it is compatible, for instance, with the distance
defined in M by
$$d(\alpha, \alpha') = 2^{-N(\alpha, \alpha')} \quad\text{where}\quad N(\alpha, \alpha') = \min\{j \ge 0 : \alpha_j \ne \alpha'_j\}. \qquad (6.3.7)$$
Observe also that M is compact, being the product of compact spaces (theorem
of Tychonoff).
Let us introduce in M the following operation of “sum with transport”: given
α = (αn )n and β = (βn )n in M, define α + β = (γn )n as follows. First,
• if α0 + β0 < d then γ0 = α0 + β0 and δ1 = 0;
• if α0 + β0 ≥ d then γ0 = α0 + β0 − d and δ1 = 1.
The auxiliary sequence (δn )n corresponds precisely to the transports. The map
+ : M × M → M defined in this way turns M into an abelian topological group
and the distance (6.3.7) is invariant under all the translations (Exercise 6.3.8).
Now consider the “translation by 1” f : M → M defined by
f (αn )n = (αn )n + (1, 0, . . . , 0, . . . ) = (0, . . . , 0, αk + 1, αk+1 , . . . , αn , . . . )
where k ≥ 0 is the smallest value of n such that αn < d − 1; if there exists no
such k, that is, if (αn )n is the constant sequence equal to d − 1, then the image
f ((αn )n ) is the constant sequence equal to 0. We leave it to the reader to check
that this transformation f : M → M is uniquely ergodic (Exercise 6.3.9).
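The "sum with transport" and the translation by 1 can be tried out on finite blocks of digits. The sketch below is an illustration under assumptions: sequences are truncated to finitely many digits (lowest index first), a carry past the last digit is discarded, and the general step $\gamma_n = \alpha_n + \beta_n + \delta_n$ modulo d is assumed to continue the pattern of the first step described above. The function names are hypothetical.

```python
def add_with_carry(alpha, beta, d=10):
    """Sum with transport of two digit blocks of equal length, index 0 first.
    The carry leaving the last digit is dropped (finite truncation)."""
    gamma, carry = [], 0
    for a, b in zip(alpha, beta):
        total = a + b + carry
        gamma.append(total % d)   # digit gamma_n
        carry = total // d        # transport delta_{n+1}, equal to 0 or 1
    return gamma


def odometer(alpha, d=10):
    """Translation by 1 = (1, 0, 0, ...) on a finite block of digits."""
    one = [1] + [0] * (len(alpha) - 1)
    return add_with_carry(alpha, one, d)


print(odometer([9, 9, 3, 0]))   # the first digit below d - 1 is increased
print(odometer([9, 9, 9]))      # all digits equal to d - 1 wrap around to 0
```

On blocks of length k the map is a cyclic permutation of the d^k possible blocks, which is consistent with each cylinder of length k being visited with frequency d^-k (compare Exercise 6.3.9).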
It is possible to generalize this construction somewhat, in the following
way. Take $M = \prod_{n=0}^{\infty} \{0, 1, \dots, d_n - 1\}$, where $(d_n)_n$ is any sequence of integer
numbers larger than 1. Just as in the previous particular case, this set has the
This is only one of the simplest applications of the so-called piling method,
which is a very effective tool to produce examples with interesting properties.
The reader may find a detailed discussion of this method in Section 6 of
Friedman [Fri69]. Another application, a bit more elaborate, will be given in
Example 8.2.3.
1 For definiteness, take all intervals to be closed on the left and open on the right.
such systems and the odometer, we recommend the book of Queffélec [Que87]
and the paper of Ferenczi, Fisher and Talet [FFT09].
We call a substitution in a finite alphabet A any map associating with each
letter α ∈ A a word s(α) formed by a finite number of letters of A. A few
examples, for A = {0, 1}: Thue–Morse substitution s(0) = 01 and s(1) = 10;
Fibonacci substitution s(0) = 01 and s(1) = 0; Feigenbaum substitution s(0) =
11 and s(1) = 10; Cantor substitution s(0) = 010 and s(1) = 111; and Chacon
substitution s(0) = 0010 and s(1) = 1. We may iterate a substitution by defining
$s^1(\alpha) = s(\alpha)$ and
$$s^{k+1}(\alpha) = s(\alpha_1) \cdots s(\alpha_n) \quad\text{if}\quad s^k(\alpha) = \alpha_1 \cdots \alpha_n.$$
We call a substitution s primitive (or aperiodic) if there exists k ≥ 1 such that
for any α, β ∈ A the word $s^k(\alpha)$ contains the letter β.
Let A be endowed with the discrete topology and $\Sigma = A^{\mathbb{N}}$ be the space of
all sequences in A, endowed with the product topology. Denote by S : Σ → Σ
the map induced in that space by a given substitution s: the image of each
$(a_0, \dots, a_n, \dots) \in \Sigma$ is the sequence of the letters that constitute the word
obtained when one concatenates the finite words $s(a_0), \dots, s(a_n), \dots$ Suppose
that there exists some letter α0 ∈ A such that the word s(α0 ) has length larger
than 1 and starts with the letter α0 . That is the case for all the examples listed
above. Then (Exercise 6.3.11), S admits a unique fixed point x = (xn )n with
x0 = α0 .
Consider the restriction σ : X → X of the shift map σ : Σ → Σ to the
closure X ⊂ Σ of the orbit $\{\sigma^n(x) : n \ge 0\}$ of the point x. If the substitution s is
primitive then σ : X → X is minimal and uniquely ergodic (see Section 5
in [Que87]). That holds, for instance, for the Thue–Morse, Fibonacci and
Feigenbaum substitutions.
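The definitions above are easy to experiment with. The following sketch is illustrative only and uses hypothetical function names: it iterates a substitution given as a dictionary, searches for the smallest exponent k that witnesses primitivity (up to an assumed cut-off `max_k`), and prints a prefix of the fixed point of the Thue–Morse substitution starting with the letter 0.

```python
def apply_substitution(s, word):
    """Replace every letter of the word by its image under s."""
    return "".join(s[letter] for letter in word)


def iterate(s, letter, k):
    """The word s^k(letter)."""
    word = letter
    for _ in range(k):
        word = apply_substitution(s, word)
    return word


def primitive_exponent(s, max_k=10):
    """Smallest k <= max_k such that every s^k(a) contains every letter,
    or None if no such k is found within the cut-off."""
    alphabet = set(s)
    for k in range(1, max_k + 1):
        if all(alphabet <= set(iterate(s, a, k)) for a in alphabet):
            return k
    return None


thue_morse = {"0": "01", "1": "10"}
fibonacci = {"0": "01", "1": "0"}
cantor = {"0": "010", "1": "111"}

print(primitive_exponent(thue_morse), primitive_exponent(fibonacci))
print(primitive_exponent(cantor))   # s^k(1) only contains the letter 1
print(iterate(thue_morse, "0", 4))  # prefix of the fixed point of S
```

Because s(0) starts with 0 for the Thue–Morse substitution, each word s^k(0) is a prefix of s^(k+1)(0); these prefixes determine the fixed point x with x_0 = 0.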
6.3.5 Exercises
6.3.1. Let G be a manifold and · be a group operation in G such that the map $(g, h) \mapsto g \cdot h$
is of class $C^1$. Show that $g \mapsto g^{-1}$ is also of class $C^1$.
6.3.2. Let G be a compact topological space such that every point admits a countable
basis of neighborhoods and let · be a group operation in G such that the map
$(g, h) \mapsto g \cdot h$ is continuous. Show that $g \mapsto g^{-1}$ is also continuous.
6.3.3. Show that the distance d in Example 6.3.7 is not right-invariant.
6.3.4. Prove part (iii) of Theorem 6.3.4: a locally compact group G is compact if and
only if its Haar measure is finite.
6.3.5. Identify GL(1, R) with the multiplicative group R \ {0}. Check that the measure
μ defined on GL(1, R) by
$$\int_{GL(1,\mathbb{R})} \varphi\,d\mu = \int_{\mathbb{R} \setminus \{0\}} \frac{\varphi(x)}{|x|}\,dx$$
is both left-invariant and right-invariant. Find a measure invariant under all the
translations of GL(1, C).
6.4 Theorem of Weyl 173
6.3.6. Identify GL(2, R) with $\{(a_{11}, a_{12}, a_{21}, a_{22}) \in \mathbb{R}^4 : a_{11}a_{22} - a_{12}a_{21} \ne 0\}$, in such
a way that $\det(a_{11}, a_{12}, a_{21}, a_{22}) = a_{11}a_{22} - a_{12}a_{21}$. Show that the measure μ
defined by
$$\int_{GL(2,\mathbb{R})} \varphi\,d\mu = \int \frac{\varphi(x_{11}, x_{12}, x_{21}, x_{22})}{|\det(x_{11}, x_{12}, x_{21}, x_{22})|^2}\,dx_{11}\,dx_{12}\,dx_{21}\,dx_{22}$$
is both left-invariant and right-invariant. Find a measure invariant under all the
translations of GL(2, C).
6.3.7. Let G be a compact metrizable group and let g ∈ G. Check that the following
conditions are equivalent:
(1) Lg is uniquely ergodic;
(2) Lg is transitive: there is x ∈ G such that {gn x : n ∈ Z} is dense in G;
(3) Lg is minimal: {gn y : n ∈ Z} is dense in G for every y ∈ G.
6.3.8. Show that the operation + : M × M → M defined in Section 6.3.4 is continuous
and endows M with the structure of an abelian group. Moreover, every
translation in this group preserves the distance defined in (6.3.7).
6.3.9. Let f : M → M be an odometer, as defined in Section 6.3.4, with d = 10. Given
b0 , . . . , bk−1 in {0, . . . , 9}, denote by [b0 , . . . , bk−1 ] the set of all sequences β ∈ M
with β0 = b0 , . . . , βk−1 = bk−1 . Show that
$$\lim_n \frac{1}{n}\,\#\{0 \le j < n : f^j(x) \in [b_0, \dots, b_{k-1}]\} = \frac{1}{10^k}$$
for every x ∈ M. Moreover, this limit is uniform. Conclude that f admits a unique
invariant probability measure and calculate that measure explicitly.
6.3.10. Check the claims in Example 6.3.9.
6.3.11. Prove that if s is a substitution in a finite alphabet A and α ∈ A is such that
s(α) has length larger than 1 and starts with the letter α, then the transformation
S : Σ → Σ defined in Example 6.3.10 admits a unique fixed point that starts
with the letter α ∈ A.
f : S1 → S1 , f (θ ) = θ + a1 .
6.4.1 Ergodicity
Now we extend the previous arguments to any degree d ≥ 1. Consider the
transformation f : Td → Td defined on the d-dimensional torus Td by the
following expression:
Note also that the derivative of f at each point is given by the matrix
$$\begin{pmatrix}
1 & 0 & 0 & \cdots & 0 & 0 \\
1 & 1 & 0 & \cdots & 0 & 0 \\
0 & 1 & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 & 1
\end{pmatrix},$$
whose determinant is 1. This ensures that f preserves the Lebesgue measure
on the torus (recall Lemma 1.3.5).
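A small numerical experiment may help to see how this transformation produces polynomial expressions in the time n. The sketch below is an illustration under assumptions: it takes f(θ) = (θ_1 + α, θ_2 + θ_1, . . . , θ_d + θ_{d−1}) modulo 1, as in the formulas of this section, with d = 2 and the hypothetical value α = √2 − 1. It compares the second coordinate of f^n(θ) with the quadratic expression θ_2 + nθ_1 + n(n − 1)α/2, which follows by summing θ_1 + kα over k = 0, . . . , n − 1.

```python
import math


def skew_map(theta, alpha):
    """One step of f(theta) = (theta_1 + alpha, theta_2 + theta_1, ...,
    theta_d + theta_{d-1}), with every coordinate taken modulo 1."""
    image = [(theta[0] + alpha) % 1.0]
    for i in range(1, len(theta)):
        image.append((theta[i] + theta[i - 1]) % 1.0)
    return image


def iterate(theta, alpha, n):
    """The point f^n(theta)."""
    point = list(theta)
    for _ in range(n):
        point = skew_map(point, alpha)
    return point


alpha = math.sqrt(2) - 1        # assumed irrational parameter
n = 20
start = [0.1, 0.2]
end = iterate(start, alpha, n)

# Closed form for the second coordinate: a polynomial of degree 2 in n.
closed_form = (start[1] + n * start[0] + n * (n - 1) / 2 * alpha) % 1.0
print(end[1], closed_form)
```

The two printed numbers should agree up to rounding errors.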
Proposition 6.4.3. The Lebesgue measure on Td is ergodic for f .
Observe that
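$$\varphi(f(\theta)) = \sum_{n \in \mathbb{Z}^d} a_n e^{2\pi i (n_1(\theta_1 + \alpha) + n_2(\theta_2 + \theta_1) + \cdots + n_d(\theta_d + \theta_{d-1}))} = \sum_{n \in \mathbb{Z}^d} a_n e^{2\pi i n_1 \alpha} e^{2\pi i L(n) \cdot \theta},$$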
in its Fourier series vanish except, possibly, the constant term. This means that
ϕ is constant at almost every point, which proves that the Lebesgue measure is
ergodic for f .
Proof. The proof is by induction on the degree d of the polynomial P. The case
of degree 1 was treated previously. Therefore, we only need to explain how the
case of degree d may be deduced from the case of degree d − 1. For that, we
write Td = Td−1 × S1 and
f : Td−1 × S1 → Td−1 × S1 , f (θ0 , η) = (f0 (θ0 ), η + θd−1 ), (6.4.4)
where θ0 = (θ1 , . . . , θd−1 ) and f0 (θ0 ) = (θ1 + α, θ2 + θ1 , . . . , θd−1 + θd−2 ). By
induction, the transformation
f0 : Td−1 → Td−1
is uniquely ergodic. Let us denote by π : Td → Td−1 the projection π(θ ) = θ0 .
Lemma 6.4.5. For any probability measure μ invariant under f , the projection
π∗ μ coincides with the Lebesgue measure m0 on Td−1 .
has full measure. Let G0 (μ) be the set of all θ0 ∈ Td−1 such that G(μ) intersects
{θ0 } × S1 . In other words, G0 (μ) = π(G(μ)). It is clear that π −1 (G0 (μ))
contains G(μ) and, thus, has full measure. Hence, using Lemma 6.4.5,
m0 (G0 (μ)) = μ(π −1 (G0 (μ))) = 1. (6.4.6)
For the same reasons, this relation remains valid for the Lebesgue measure:
m0 (G0 (m)) = m(π −1 (G0 (m))) = 1. (6.4.7)
The identities (6.4.6) and (6.4.7) imply that the intersection between G0 (μ)
and G0 (m) has full measure for m0 . So, in particular, these two sets cannot be
disjoint. Let θ0 be any point in the intersection. By definition, G(μ) intersects
{θ0 } × S1 . But the next result asserts that G(m) contains {θ0 } × S1 :
Lemma 6.4.6. If θ0 ∈ G0 (m) then {θ0 } × S1 is contained in G(m).
Proof. The crucial observation is that the measure m is invariant under every
transformation of the form
Rβ : Td−1 × S1 → Td−1 × S1 , (ζ , η) → (ζ , η + β).
The hypothesis θ0 ∈ G0 (m) means that there exists some η ∈ S1 such that
(θ0 , η) ∈ G(m), that is,
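$$\lim_n \frac{1}{n} \sum_{j=0}^{n-1} \varphi(f^j(\theta_0, \eta)) = \int \varphi\,dm$$
$$\lim_n \frac{1}{n} \sum_{j=0}^{n-1} \varphi(f^j(\theta_0, \eta + \beta)) = \lim_n \frac{1}{n} \sum_{j=0}^{n-1} (\varphi \circ R_\beta)(f^j(\theta_0, \eta)) = \int (\varphi \circ R_\beta)\,dm = \int \varphi\,dm.$$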
It follows from what we said so far that G(μ) and G(m) intersect each other
at some point of {θ0 } × S1 . In view of the definition (6.4.5), this implies that the
two measures have the same integral for every continuous function. According
to Proposition A.3.3, this implies that μ = m, as we wanted to prove.
Proof. By definition, pd (x) = P(x) has degree d. Hence, to prove the first claim
it suffices to show that if pj (x) has degree j then pj−1 (x) has degree j − 1. In
order to do that, let
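$$p_j(x) = b_j x^j + b_{j-1} x^{j-1} + \cdots + b_0,$$
where $b_j \ne 0$. Then
$$p_j(x + 1) = b_j (x + 1)^j + b_{j-1} (x + 1)^{j-1} + \cdots + b_0 = b_j x^j + (j b_j + b_{j-1}) x^{j-1} + \cdots,$$
where the dots stand for terms of degree at most j − 2.
Subtracting one expression from the other, we get that
$$p_{j-1}(x) = (j b_j) x^{j-1} + c_{j-2} x^{j-2} + \cdots + c_0,$$
for suitable coefficients $c_0, \dots, c_{j-2}$,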
has degree j − 1. This proves the first claim in the lemma. This calculation
also shows that the main coefficient of pj−1 (x) (the coefficient of the term with
highest degree) can be obtained multiplying by j the main coefficient of pj (x).
Consequently, the main coefficient of $p_1$ must be equal to $d!\,a_d$, as claimed in
the last part of the lemma.
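The degree count in this proof can be checked on a concrete polynomial. The sketch below is only an illustration, with hypothetical names: a polynomial is stored as its list of coefficients (the entry of index i multiplies x^i), and `difference` returns the coefficients of p(x + 1) − p(x). Applying it d − 1 times to a polynomial of degree d should leave a polynomial of degree 1 whose main coefficient is d! times the original one.

```python
from math import comb, factorial


def difference(coeffs):
    """Coefficients of p(x + 1) - p(x), where coeffs[i] multiplies x**i."""
    degree = len(coeffs) - 1
    result = [0] * degree
    for i, c in enumerate(coeffs):
        # (x + 1)^i = sum over m of comb(i, m) x^m; the term m = i cancels.
        for m in range(i):
            result[m] += c * comb(i, m)
    return result


# Example: P(x) = 5x^3 + 2x + 7, of degree d = 3 with main coefficient 5.
p3 = [7, 2, 0, 5]
p2 = difference(p3)   # 15x^2 + 15x + 7
p1 = difference(p2)   # 30x + 30
print(p2, p1, factorial(3) * 5)
```

Here the main coefficient of p_1 is 30 = 3! · 5, in agreement with the lemma.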
This ends the proof of Theorem 6.4.2 in the case when ad is irrational.
Now suppose that ad is rational. Write ad = p/q with p ∈ Z and q ∈ N. It is
clear that we may write zn as a sum
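$$z_n = x_n + y_n, \qquad x_n = a_d n^d \quad\text{and}\quad y_n = Q_*(n)$$
where $Q(x) = a_0 + a_1 x + \cdots + a_{d-1} x^{d-1}$ and $Q_* : \mathbb{R} \to S^1$ is given by $Q_* = \pi \circ Q$.
To begin with, observe that
$$x_{n+q} - x_n = \frac{p}{q} (n + q)^d - \frac{p}{q} n^d$$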
is an integer, for every n ∈ N. This means that the sequence xn , n ∈ N is periodic
(with period q) in the circle R/Z. In particular, it takes no more than q distinct
values. Observe also that, since ad is rational, the hypothesis of the theorem
implies that some of the coefficients a1 , . . . , ad−1 of Q are irrational. Hence, by
induction on the degree, the sequence yn , n ∈ N is equidistributed. More than
that, the subsequences
$$y_{qn+r} = Q_*(qn + r), \qquad n \in \mathbb{Z}$$
are equidistributed for every r ∈ {0, 1, . . . , q − 1}. In fact, as the reader may
readily check, these sequences may be written as $y_{nq+r} = Q^{(r)}_*(n)$ for some
polynomial $Q^{(r)}$ that also has degree d − 1 and, thus, the induction hypothesis
applies to each one of them as well. From these two observations it follows
6.4.4 Exercises
6.4.1. Show that a sequence (zj )j is equidistributed on the circle if and only if
$$\lim_n \frac{1}{n}\,\#\{1 \le j \le n : z_j \in I\} = m(I)$$
n n
second one. That kind of behavior, where correlations approach zero as time n
increases, is quite common in important models, as we are going to see.
We start by introducing the notions of (strong) mixing and weak mixing
systems, and by studying their basic properties (Section 7.1). In Sections 7.2
and 7.3 we discuss these notions in the context of Markov shifts, which
generalize Bernoulli shifts, and of interval exchanges, which are an extension
of the class of circle rotations. In Section 7.4 we analyze, in quantitative terms,
the speed of decay of correlations for certain classes of functions.
for any measurable sets A, B ⊂ M. In other words, when n grows the probability
of the event {x ∈ B and f n (x) ∈ A} converges to the product of the probabilities
of the events {x ∈ B} and {f n (x) ∈ A}.
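To get a feeling for the quantity μ(f^{-n}(A) ∩ B), it can be computed exactly in a very simple system. The sketch below is an illustration under assumptions, not part of the definition: it takes f to be the rotation of the circle R/Z by the hypothetical angle θ = √2 − 1, μ the Lebesgue measure, and A = [0, 0.1), B = [0.5, 0.6). The preimage of A under f^n is the arc starting at −nθ, so the measure of the intersection is the length of the overlap of two arcs.

```python
import math


def arc_overlap(start1, len1, start2, len2):
    """Length of the intersection of the arcs [start, start + len) on the
    circle R/Z; both lengths are assumed to be at most 1."""
    s1, s2 = start1 % 1.0, start2 % 1.0
    total = 0.0
    for shift in (-1.0, 0.0, 1.0):   # compare with the translated copies of arc 2
        lo = max(s1, s2 + shift)
        hi = min(s1 + len1, s2 + shift + len2)
        total += max(0.0, hi - lo)
    return total


theta = math.sqrt(2) - 1
# measures[n - 1] is the Lebesgue measure of f^{-n}(A) intersected with B.
measures = [arc_overlap(-n * theta, 0.1, 0.5, 0.1) for n in range(1, 1001)]
print(min(measures), max(measures), 0.1 * 0.1)
```

In this example the sequence keeps returning to 0 and also takes values well above the product 0.01 of the measures of A and B, so it does not converge to that product.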
Analogously, given a flow f t : M → M, t ∈ R and an invariant probability
measure μ, we define
$$C_t(\varphi, \psi) = \int (\varphi \circ f^t)\,\psi\,d\mu - \int \varphi\,d\mu \int \psi\,d\mu, \qquad t \in \mathbb{R} \qquad (7.1.3)$$
7.1.1 Properties
A mixing system is necessarily ergodic. Indeed, suppose that there exists some
invariant set A ⊂ M with 0 < μ(A) < 1. Taking $B = A^c$, we get $f^{-n}(A) \cap B = \emptyset$
for every n. Then $\mu(f^{-n}(A) \cap B) = 0$ for every n, whereas $\mu(A)\mu(B) \neq 0$. In
particular, (f , μ) is not mixing. The example that follows shows that ergodicity
is strictly weaker than mixing:
Example 7.1.1. Let θ ∈ R be an irrational number. As we have seen in
Section 4.2.1, the rotation Rθ : S1 → S1 is ergodic with respect to the Lebesgue
measure m. However, (Rθ , m) is not mixing. Indeed, if A, B ⊂ S1 are two small
intervals then $R_\theta^{-n}(A) \cap B$ is empty and, thus, $m(R_\theta^{-n}(A) \cap B) = 0$ for infinitely
7.1 Mixing systems 183
Lemma 7.1.2. Assume that limn μ(f −n (A) ∩ B) = μ(A)μ(B) for every pair of
sets A and B in an algebra A that generates the σ -algebra of measurable sets.
Then (f , μ) is mixing.
Proof. Let C be the family of all measurable sets A such that μ(f −n (A) ∩ B)
converges to μ(A)μ(B) for every B ∈ A. By assumption, C contains A. We
claim that C is a monotone class. Indeed, let $A = \bigcup_k A_k$ be the union of an
increasing sequence A1 ⊂ · · · ⊂ Ak ⊂ · · · of elements of C. Given ε > 0, there
exists k0 ≥ 1 such that
μ(A) − μ(Ak ) = μ(A \ Ak ) < ε
for every k ≥ k0 . Moreover, for every n ≥ 1,
\[ \mu(f^{-n}(A) \cap B) - \mu(f^{-n}(A_k) \cap B) = \mu(f^{-n}(A \setminus A_k) \cap B) \le \mu(f^{-n}(A \setminus A_k)) = \mu(A \setminus A_k) < \varepsilon. \]
For each fixed k ≥ k0 , the fact that Ak ∈ C ensures that there exists n(k) ≥ 1
such that
\[ |\mu(f^{-n}(A_k) \cap B) - \mu(A_k)\mu(B)| < \varepsilon \quad \text{for every } n \ge n(k). \]
Adding these three inequalities we conclude that
\[ |\mu(f^{-n}(A) \cap B) - \mu(A)\mu(B)| < 3\varepsilon \quad \text{for every } n \ge n(k_0). \]
Since ε > 0 is arbitrary, this shows that A ∈ C. In the same way, one proves
that the intersection of any decreasing sequence of elements of C is still an
element of C. So, C is indeed a monotone class. By the monotone class theorem
(Theorem A.1.18), it follows that C contains every measurable set: for every
measurable set A one has
\[ \lim_n \mu(f^{-n}(A) \cap B) = \mu(A)\mu(B) \quad \text{for every } B \in \mathcal{A}. \]
All that is left to do is to deduce that this property holds for every measurable
set B. This follows from precisely the same kind of arguments as we have just
detailed, as the reader may readily check.
Example 7.1.3. Every Bernoulli shift (recall Section 4.2.3) is mixing. Indeed,
given any two cylinders A = [p; Ap , . . . , Aq ] and B = [r; Br , . . . , Bs ],
\[ \mu(f^{-n}(A) \cap B) = \mu([r; B_r, \dots, B_s, X, \dots, X, A_p, \dots, A_q]) = \mu([r; B_r, \dots, B_s])\,\mu([p; A_p, \dots, A_q]) = \mu(A)\mu(B) \]
184 Correlations
for every n > s − p. Let A be the algebra generated by the cylinders: its
elements are the finite pairwise disjoint unions of cylinders. It follows from
what we have just said that μ(f −n (A) ∩ B) = μ(A)μ(B) for every pair of sets
A, B ∈ A and every n sufficiently large. Since A generates the σ -algebra of
measurable sets, we may use Lemma 7.1.2 to conclude that the system is
mixing, as stated.
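The cylinder computation in Example 7.1.3 can be checked exactly on a small case. The following sketch represents a cylinder as a map from positions to sets of allowed symbols; the three-symbol alphabet, the weights and all function names are assumptions chosen for illustration.

```python
from fractions import Fraction

def cylinder(start, sets):
    """[start; sets[0], sets[1], ...] as a map: position -> allowed symbols."""
    return {start + i: frozenset(s) for i, s in enumerate(sets)}

def shift_preimage(cyl, n):
    """f^{-n}(cyl) for the shift f: every constraint moves n positions to the right."""
    return {pos + n: s for pos, s in cyl.items()}

def intersect(c1, c2):
    """Intersection of two cylinders, again as position -> allowed symbols."""
    out = dict(c1)
    for pos, s in c2.items():
        out[pos] = out[pos] & s if pos in out else s
    return out

def measure(cyl, weights):
    """Bernoulli (product) measure: product over positions of the weight of the allowed set."""
    total = Fraction(1)
    for s in cyl.values():
        total *= sum((weights[x] for x in s), Fraction(0))
    return total

weights = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6)}
A = cylinder(0, [{0}, {1, 2}])        # p = 0, q = 1
B = cylinder(1, [{1}, {0, 2}, {0}])   # r = 1, s = 3
product = measure(A, weights) * measure(B, weights)

for n in range(1, 8):
    print(n, measure(intersect(shift_preimage(A, n), B), weights), product)
```

In this example the equality holds for every n > s − p = 3 and fails for n ≤ 3, where the two sets of constrained positions overlap.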
Example 7.1.4. Let g : S1 → S1 be defined by g(x) = kx, where k ≥ 2 is
an integer number, and let m be the Lebesgue measure on the circle. The
system (g, m) is equivalent to a Bernoulli shift, in the following sense. Let
$X = \{0, 1, \dots, k-1\}$ and let f : M → M be the shift map in $M = X^{\mathbb{N}}$. Consider
the product measure $\mu = \nu^{\mathbb{N}}$ in M, where ν is the probability measure defined
by ν(A) = #A/k for every A ⊂ X. The map
\[ h : M \to S^1, \qquad h\big((a_n)_n\big) = \sum_{n=1}^{\infty} \frac{a_{n-1}}{k^n} \]
is a bijection, restricted to a full measure subset, and both h and its inverse
are measurable. Moreover, h∗ μ = m and h ◦ f = g ◦ h at almost every point.
We say that h is an ergodic equivalence between (g, m) and (f , μ). Through
it, properties of one system may be translated to corresponding properties for
the other system. In particular, recalling Example 7.1.3, we get that (g, m) is
mixing: given any measurable sets A, B ⊂ S1 ,
\[ m(g^{-n}(A) \cap B) = \mu\big(h^{-1}(g^{-n}(A) \cap B)\big) = \mu\big(f^{-n}(h^{-1}(A)) \cap h^{-1}(B)\big) \]
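The conjugacy relation h ∘ f = g ∘ h of Example 7.1.4 can be verified exactly on sequences with finitely many non-zero digits. This is a minimal sketch with k = 3; the function names and the sample digit list are illustrative assumptions.

```python
from fractions import Fraction

def h(digits, k):
    """h((a_n)_n) = sum_{n>=1} a_{n-1} / k^n, for a sequence that is zero beyond the listed digits."""
    return sum((Fraction(a, k ** (n + 1)) for n, a in enumerate(digits)), Fraction(0))

def shift(digits):
    """The shift map f: drop the first symbol."""
    return digits[1:]

def g(x, k):
    """g(x) = k x on the circle R/Z, with points represented in [0, 1)."""
    return (k * x) % 1

k = 3
digits = [2, 0, 1, 1, 2]
left = h(shift(digits), k)    # h ∘ f
right = g(h(digits, k), k)    # g ∘ h
print(left, right)
```

Both sides agree here because multiplying by k moves the base-k expansion one digit to the left and reducing mod 1 discards the leading digit.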
\[ \lim_n \frac{1}{n} \sum_{j=0}^{n-1} |C_j(\mathcal{X}_A, \mathcal{X}_B)| = \lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} \big|\mu(f^{-j}(A) \cap B) - \mu(A)\mu(B)\big| = 0. \tag{7.1.5} \]
It is clear from the definition that every mixing system is also weak mixing.
On the other hand, every weak mixing system is ergodic. Indeed, if A ⊂ M is
an invariant set then
\[ \lim_n \frac{1}{n} \sum_{j=0}^{n-1} |C_j(\mathcal{X}_A, \mathcal{X}_{A^c})| = \mu(A)\mu(A^c) \]
\[ \liminf_n \frac{1}{n} \sum_{j=0}^{n-1} \big|\mu(f^{-j}(U) \cap V) - \mu(U)\mu(V)\big| \ge \frac{1}{2}\,\mu(U)\mu(V) > 0. \]
In this way we get several examples of ergodic systems, even uniquely ergodic
ones, that are not weak mixing.
We are going to see in Section 7.3.2 that the family of interval exchanges
contains many systems that are weak mixing (and uniquely ergodic) but are
not mixing.
The proof of the next result is analogous to the proof of Lemma 7.1.2 and is
left to the reader:
Lemma 7.1.9. Assume that $\lim_n \frac{1}{n} \sum_{j=0}^{n-1} |\mu(f^{-j}(A) \cap B) - \mu(A)\mu(B)| = 0$
for every pair of sets A and B in some algebra A that generates the σ -algebra
of measurable sets. Then (f , μ) is weak mixing.
Example 7.1.10. Given a system (f , μ), let us consider the product transfor-
mation f2 : M × M → M × M given by f2 (x, y) = (f (x), f (y)). It is easy to see
that f2 preserves the product measure μ2 = μ × μ. If (f2 , μ2 ) is ergodic then
(f , μ) is ergodic: just note that if A ⊂ M is invariant under f and μ(A) ∈ (0, 1)
then A × A is invariant under f2 and μ2 (A × A) ∈ (0, 1).
The converse is not true in general, that is, (f2 , μ2 ) may not be ergodic even
if (f , μ) is ergodic. For example, if f : S1 → S1 is an irrational rotation and d is
a distance invariant under rotations, then any neighborhood {(x, y) : d(x, y) < r}
of the diagonal is invariant under f2 .
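The invariance used in Example 7.1.10 is easy to observe numerically: along any orbit of the product map, the circle distance between the two coordinates stays constant. The sketch below uses floating point, an angle θ = √2 mod 1 and a starting point that are all illustrative assumptions.

```python
import math

def circle_distance(x, y):
    """Rotation-invariant distance on the circle R/Z."""
    t = abs(x - y) % 1.0
    return min(t, 1.0 - t)

def f2(point, theta):
    """Product map f2(x, y) = (x + theta, y + theta) on the torus."""
    x, y = point
    return ((x + theta) % 1.0, (y + theta) % 1.0)

theta = math.sqrt(2) % 1.0
point = (0.1, 0.35)
d0 = circle_distance(*point)

max_drift = 0.0
for _ in range(10_000):
    point = f2(point, theta)
    max_drift = max(max_drift, abs(circle_distance(*point) - d0))

print(d0, max_drift)  # the drift is only floating-point rounding error
```

Since the distance is preserved, an orbit starting in the band {d(x, y) < r} never leaves it, so the band is an invariant set whose measure lies strictly between 0 and 1 when r is small.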
The next result shows that this type of phenomenon cannot occur in the
category of weak mixing systems:
Proof. To prove that (i) implies (ii), consider any measurable sets A, B, C, D in
M. Then
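\[ \begin{aligned}
&\big|\mu_2(f_2^{-j}(A \times B) \cap (C \times D)) - \mu_2(A \times B)\,\mu_2(C \times D)\big| \\
&\quad = \big|\mu(f^{-j}(A) \cap C)\,\mu(f^{-j}(B) \cap D) - \mu(A)\mu(B)\mu(C)\mu(D)\big| \\
&\quad \le \big|\mu(f^{-j}(A) \cap C) - \mu(A)\mu(C)\big| + \big|\mu(f^{-j}(B) \cap D) - \mu(B)\mu(D)\big|.
\end{aligned} \]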
\[ \lim_n \frac{1}{n} \sum_{j=0}^{n-1} \big|\mu_2(f_2^{-j}(A \times B) \cap (C \times D)) - \mu_2(A \times B)\,\mu_2(C \times D)\big| = 0. \]
It follows that
\[ \lim_n \frac{1}{n} \sum_{j=0}^{n-1} \big|\mu_2(f_2^{-j}(X) \cap Y) - \mu_2(X)\,\mu_2(Y)\big| = 0 \]
\[ = \frac{1}{n} \sum_{j=0}^{n-1} \Big[ \mu(f^{-j}(A) \cap B)^2 - 2\mu(A)\mu(B)\,\mu(f^{-j}(A) \cap B) + \mu(A)^2\mu(B)^2 \Big]. \]
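\[ \frac{1}{n} \sum_{j=0}^{n-1} \Big[ \mu_2\big(f_2^{-j}(A \times A) \cap (B \times B)\big) - \mu_2(A \times A)\,\mu_2(B \times B) \Big] - 2\mu(A)\mu(B)\, \frac{1}{n} \sum_{j=0}^{n-1} \Big[ \mu(f^{-j}(A) \cap B) - \mu(A)\mu(B) \Big]. \]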
\[ \lim_n \frac{1}{n} \sum_{j=0}^{n-1} \Big[ \mu(f^{-j}(A) \cap B) - \mu(A)\mu(B) \Big]^2 = 0 \]
Proof. Condition (i) is the special case of (ii) for characteristic functions. Since
the correlations (ϕ, ψ) → Cn (ϕ, ψ) are bilinear functions, condition (i) implies
that Cn (ϕ, ψ) → 0 for any simple functions ϕ and ψ. This implies (iii), since
the simple functions form a dense subset of Lr (μ) for any r ≥ 1.
To show that (iii) implies (ii), let us begin by observing that the correlations
Cn (ϕ, ψ) are equicontinuous functions of ϕ and ψ. Indeed, given ϕ1 , ϕ2 ∈
Lp (μ) and ψ1 , ψ2 ∈ Lq (μ), the Hölder inequality (Theorem A.5.5) gives that
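\[ \Big| \int (\varphi_1 \circ f^n)\,\psi_1 \, d\mu - \int (\varphi_2 \circ f^n)\,\psi_2 \, d\mu \Big| \le \|\varphi_1 - \varphi_2\|_p \|\psi_1\|_q + \|\varphi_2\|_p \|\psi_1 - \psi_2\|_q. \]
Moreover,
\[ \Big| \int \varphi_1 \, d\mu \int \psi_1 \, d\mu - \int \varphi_2 \, d\mu \int \psi_2 \, d\mu \Big| \le \|\varphi_1 - \varphi_2\|_1 \|\psi_1\|_1 + \|\varphi_2\|_1 \|\psi_1 - \psi_2\|_1. \]
Adding these inequalities, and noting that $\|\cdot\|_1 \le \|\cdot\|_r$ for every r ≥ 1, we get
that
\[ |C_n(\varphi_1, \psi_1) - C_n(\varphi_2, \psi_2)| \le 2\|\varphi_1 - \varphi_2\|_p \|\psi_1\|_q + 2\|\varphi_2\|_p \|\psi_1 - \psi_2\|_q \tag{7.1.6} \]
for every n ≥ 1. Then, given ε > 0 and any ϕ ∈ Lp (μ) and ψ ∈ Lq (μ), we may
take $\varphi'$ and $\psi'$ in the dense subsets mentioned in the hypothesis such that
\[ \|\varphi - \varphi'\|_p < \varepsilon \quad \text{and} \quad \|\psi - \psi'\|_q < \varepsilon. \]
In particular, $\|\varphi'\|_p < \|\varphi\|_p + \varepsilon$ and $\|\psi'\|_q < \|\psi\|_q + \varepsilon$. Then, (7.1.6) gives that
\[ |C_n(\varphi, \psi)| \le |C_n(\varphi', \psi')| + 2\varepsilon\big(\|\varphi\|_p + \|\psi\|_q + 2\varepsilon\big) \quad \text{for every } n. \]
Moreover, by hypothesis, $|C_n(\varphi', \psi')| < \varepsilon$ for every n sufficiently large. Since ε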
is arbitrary, these two inequalities imply that Cn (ϕ, ψ) converges to zero when
n → ∞. This proves property (ii).
The same argument proves the following version of Proposition 7.1.12 for
the weak mixing property:
Proposition 7.1.13. The following conditions are equivalent:
(i) (f , μ) is weak mixing.
(ii) There exist p, q ∈ [1, ∞] with 1/p+1/q = 1 such that $\frac{1}{n} \sum_{j=1}^{n} |C_j(\varphi, \psi)|$
converges to 0 for any ϕ ∈ Lp (μ) and ψ ∈ Lq (μ).
(iii) The condition in part (ii) holds for ϕ in some dense subset of Lp (μ) and
ψ in some dense subset of Lq (μ).
The condition (7.1.7) means that $U_f^n \varphi$ converges weakly to $\varphi \cdot 1 = \int \varphi \, d\mu$, while
(7.1.8) is a Cesàro version of that assertion. Compare both conditions with the
characterization of ergodicity in (4.1.7).
7.1.4 Exercises
7.1.1. Show that (f , μ) is mixing if and only if $\mu(f^{-n}(A) \cap A) \to \mu(A)^2$ for every
measurable set A.
7.1.2. Let (an )n be a bounded sequence of real numbers. Prove that
\[ \lim_n \frac{1}{n} \sum_{j=1}^{n} |a_j| = 0 \]
if and only if there exists E ⊂ N with density zero at infinity (that is, with
limn (1/n)#(E ∩ {0, . . . , n − 1}) = 0) such that the restriction of (an )n to the
complement of E converges to zero when n → ∞. Deduce that
\[ \lim_n \frac{1}{n} \sum_{j=1}^{n} |a_j| = 0 \quad \Leftrightarrow \quad \lim_n \frac{1}{n} \sum_{j=1}^{n} (a_j)^2 = 0. \]
7.1.3. Prove that if μ is weak mixing for f then μ is weak mixing for every iterate $f^k$,
k ≥ 1.
7.1.4. Show that if no eigenvalue of A ∈ SL(d, R) is a root of unity then the linear
endomorphism $f_A : \mathbb{T}^d \to \mathbb{T}^d$ induced by A is mixing, with respect to the Haar
measure.
7.1.5. Let f : M → M be a measurable transformation in a metric space. Check that an
invariant probability measure μ is mixing if and only if $(f_*^n \eta)_n$ converges to μ in
the weak∗ topology for every probability measure η absolutely continuous with
respect to μ.
7.1.6. (Multiple von Neumann ergodic theorem). Show that if (f , μ) is weak mixing
then
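\[ \frac{1}{N} \sum_{n=0}^{N-1} (\varphi_1 \circ f^{n}) \cdots (\varphi_k \circ f^{kn}) \to \int \varphi_1 \, d\mu \cdots \int \varphi_k \, d\mu \quad \text{in } L^2(\mu), \]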
for every cylinder [m; Am , . . . , An ] of Σ. One can show (check Exercise 7.2.1)
that this function extends to a probability measure in the σ -algebra generated
by the cylinders. This probability measure is invariant under the shift map σ ,
because the right-hand side of (7.2.2) does not depend on m. Every probability
measure μ obtained in this way is called a Markov measure; moreover, the
system (σ , μ) is called a Markov shift.
Example 7.2.1 (Bernoulli measure). Suppose that P(x, ·) does not depend on
x, that is, that there exists a probability measure ν on X such that P(x, ·) = ν
for every x ∈ X. Then
\[ \int P(x, E) \, dp(x) = \int \nu(E) \, dp(x) = \nu(E) \]
We say that P is a stochastic matrix. Conversely, any matrix that satisfies (i)
and (ii) defines a family of transition probabilities on the set X. Observe also
that, denoting p = (p1 , . . . , pd ), the relation (7.2.4) corresponds to
P∗ p = p, (7.2.6)
where P∗ denotes the transpose of the matrix P. In other words, the stationary
measures correspond precisely to the eigenvectors of the transposed matrix for
the eigenvalue 1. Using the following classical result, one can show that such
eigenvalues always exist:
Using property (ii) of the stochastic matrix, the left-hand side of this equality
may be written as
\[ \sum_{i=1}^{d} p_i \sum_{j=1}^{d} P_{i,j} = \sum_{i=1}^{d} p_i. \]
Comparing the last two equalities and recalling that the sum of the coefficients
of p is a positive number, we conclude that λ = 1. This proves our claim that
there always exist vectors p ≠ 0 satisfying (7.2.6).
When $P^n$ has positive coefficients for some n ≥ 1, it follows from
Theorem 7.2.3 that the eigenvector is unique up to scaling, and it may be
chosen with positive coefficients.
7.2 Markov shifts 193
Example 7.2.4. In general, p is not unique and it may also happen that there
is no eigenvector with positive coefficients. For example, consider:
\[ P = \begin{pmatrix}
1-a & a & 0 & 0 & 0 \\
b & 1-b & 0 & 0 & 0 \\
0 & 0 & 1-c & c & 0 \\
0 & 0 & d & 1-d & 0 \\
e & 0 & 0 & 0 & 1-e
\end{pmatrix} \]
where a, b, c, d, e ∈ (0, 1). A vector p = (p1 , p2 , p3 , p4 , p5 ) satisfies P∗ p = p if
and only if ap1 = bp2 and cp3 = dp4 and p5 = 0. Therefore, the eigenspace has
dimension 2 and no eigenvector has positive coefficients.
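The conditions found in Example 7.2.4 can be checked with exact arithmetic. The sketch below picks arbitrary values of a, b, c, d, e in (0, 1) (an assumption for illustration) and verifies that the vectors (b, a, 0, 0, 0) and (0, 0, d, c, 0), which satisfy ap1 = bp2 and cp3 = dp4, are fixed by P∗, while a vector with p5 ≠ 0 is not.

```python
from fractions import Fraction as F

def apply_transpose(P, p):
    """Return P* p, where (P* p)_j = sum_i p_i P[i][j]."""
    n = len(P)
    return [sum(p[i] * P[i][j] for i in range(n)) for j in range(n)]

# Hypothetical parameter values in (0, 1).
a, b, c, d, e = F(1, 2), F(1, 3), F(1, 4), F(1, 5), F(1, 7)
P = [
    [1 - a, a, 0, 0, 0],
    [b, 1 - b, 0, 0, 0],
    [0, 0, 1 - c, c, 0],
    [0, 0, d, 1 - d, 0],
    [e, 0, 0, 0, 1 - e],
]

u = [b, a, 0, 0, 0]   # satisfies a*p1 = b*p2
v = [0, 0, d, c, 0]   # satisfies c*p3 = d*p4
w = [0, 0, 0, 0, 1]   # has p5 != 0

print(apply_transpose(P, u) == u, apply_transpose(P, v) == v, apply_transpose(P, w) == w)
```

Every solution is a combination of u and v, so its fifth coefficient vanishes, in agreement with the statement that no eigenvector has positive coefficients.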
On the other hand, suppose that p is such that pi = 0 for some i. Let μ be
the corresponding Markov measure and let $\Sigma_i = (X \setminus \{i\})^{\mathbb{N}}$ (or $\Sigma_i = (X \setminus \{i\})^{\mathbb{Z}}$).
Then $\mu(\Sigma_i) = 1$, since μ([n; i]) = pi = 0 for every n. This means that we may
eliminate the symbol i, and still have a system that is equivalent to the original
one. Therefore, up to removing from the set X a certain number of superfluous
symbols, we may always take the eigenvector p to have positive coefficients.
Denote by $\Sigma_P$ the set of all sequences $(x_n)_n \in \Sigma$ satisfying
\[ P_{x_n, x_{n+1}} > 0 \quad \text{for every } n, \tag{7.2.7} \]
that is, such that all the transitions are “allowed” by P. It is clear from the
definition that $\Sigma_P$ is invariant under the shift map σ . The transformations
$\sigma : \Sigma_P \to \Sigma_P$ constructed in this way are called shifts of finite type and will be
studied in more detail in Section 10.2.2.
Lemma 7.2.5. The set $\Sigma_P$ is closed in Σ and, given any solution of P∗ p = p
with positive coefficients, the support of the corresponding Markov measure μ
coincides with $\Sigma_P$.
Example 7.2.6. There are three possibilities for the support of a Markov
measure in Example 7.2.4. If p = (p1 , p2 , 0, 0, 0) with p1 , p2 > 0 then we may
eliminate the symbols 3, 4, 5. All the sequences of symbols 1, 2 are admissible.
7.2.1 Ergodicity
In this section we always take p = (p1 , . . . , pd ) to be a solution of P∗ p = p
with pi > 0 for every i, normalized in such a way that $\sum_i p_i = 1$. Let μ be the
corresponding Markov measure. We want to understand which conditions the
stochastic matrix P must satisfy for the system (σ , μ) to be ergodic.
We say that a stochastic matrix P is irreducible if for every 1 ≤ i, j ≤ d there
exists n ≥ 0 such that $P^n_{i,j} > 0$. In other words, P is irreducible if any outcome
i may be followed by any outcome j, after a certain number n of steps which
may depend on i and j.
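Since the coefficients of P are non-negative, $P^n_{i,j} > 0$ holds exactly when there is a path of n allowed transitions from i to j. Irreducibility can therefore be tested by a graph search, as in the following sketch; the function name and the two test matrices are illustrative assumptions.

```python
def is_irreducible(P):
    """True if every state j can be reached from every state i through positive transitions."""
    d = len(P)
    for i in range(d):
        reached = {i}          # n = 0 is allowed, so i always reaches itself
        frontier = [i]
        while frontier:
            u = frontier.pop()
            for v in range(d):
                if P[u][v] > 0 and v not in reached:
                    reached.add(v)
                    frontier.append(v)
        if len(reached) < d:
            return False
    return True

# A cyclic permutation of three symbols: irreducible.
cycle = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
# Two blocks that never communicate: not irreducible.
blocks = [[0.5, 0.5, 0], [0.5, 0.5, 0], [0, 0, 1]]

print(is_irreducible(cycle), is_irreducible(blocks))
```

The number of steps needed to go from i to j may differ from pair to pair, as the definition allows; for the cyclic matrix it takes the values 0, 1 and 2.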
Theorem 7.2.8. The Markov shift (σ , μ) is ergodic if and only if the matrix P
is irreducible.
\[ \lim_n \frac{1}{n} \sum_{l=0}^{n-1} P^l_{i,j} = p_j \quad \text{for every } 1 \le i, j \le d. \tag{7.2.8} \]
Proof. Assume that (7.2.8) holds. Recall that pj > 0 for every j. Then, given
any 1 ≤ i, j ≤ d, we have $P^l_{i,j} > 0$ for infinitely many values of l. In particular,
P is irreducible.
To prove the converse, consider A = [0; i] and B = [0; j]. By Lemma 7.2.9,
\[ \frac{1}{n} \sum_{l=0}^{n-1} \mu(A \cap \sigma^{-l}(B)) = \frac{1}{p_j}\, \mu(A)\mu(B)\, \frac{1}{n} \sum_{l=0}^{n-1} P^l_{i,j}. \]
According to Exercise 4.1.2, the left-hand side converges when n → ∞.
Therefore,
\[ Q_{i,j} = \lim_n \frac{1}{n} \sum_{l=0}^{n-1} P^l_{i,j} \]
exists for every 1 ≤ i, j ≤ d. Consider the matrix Q = (Qi,j )i,j , that is,
\[ Q = \lim_n \frac{1}{n} \sum_{l=0}^{n-1} P^l. \tag{7.2.9} \]
Using Lemma 7.2.7(ii) and taking the limit when n → ∞, we get that
\[ \sum_{i=1}^{d} p_i Q_{i,j} = p_j \quad \text{for every } 1 \le j \le d. \tag{7.2.10} \]
It follows that $Q_{i,j}$ does not depend on i. Indeed, suppose that there exist r and
s such that $Q_{r,j} < Q_{s,j}$. Of course, we may choose s in such a way that
$Q_{s,j}$ is the largest of all the values $Q_{i,j}$, 1 ≤ i ≤ d. Since P is irreducible, there exists k
such that $P^k_{s,r} > 0$. Hence, using (7.2.11) followed by Lemma 7.2.7(i),
\[ Q_{s,j} = \sum_{i=1}^{d} P^k_{s,i} Q_{i,j} < \Big( \sum_{i=1}^{d} P^k_{s,i} \Big) Q_{s,j} = Q_{s,j}, \]
which is a contradiction. This contradiction proves that Qi,j does not depend
on i, as claimed. Write Qj = Qi,j for any i. The property (7.2.10) gives that
\[ p_j = \sum_{i=1}^{d} Q_{i,j}\, p_i = Q_j \Big( \sum_{i=1}^{d} p_i \Big) = Q_j, \]
Proof of Theorem 7.2.8. Suppose that μ is ergodic. Let A = [0; i] and B = [0; j].
By Proposition 4.1.4,
\[ \lim_n \frac{1}{n} \sum_{l=0}^{n-1} \mu(A \cap \sigma^{-l}(B)) = \mu(A)\mu(B) = p_i p_j. \tag{7.2.12} \]
On the other hand, by Lemma 7.2.9, we have that $\mu(A \cap \sigma^{-l}(B)) = p_i P^l_{i,j}$. Using
this identity in (7.2.12) and dividing both sides by pi we find that
\[ \lim_n \frac{1}{n} \sum_{l=0}^{n-1} P^l_{i,j} = p_j. \]
Note that j is arbitrary. So, by Lemma 7.2.10, this proves that P is irreducible.
Now suppose that the matrix P is irreducible. We want to conclude that μ is
ergodic. According to Corollary 4.1.5, it is enough to prove that
\[ \lim_n \frac{1}{n} \sum_{l=0}^{n-1} \mu(A \cap \sigma^{-l}(B)) = \mu(A)\mu(B) \tag{7.2.13} \]
for any A and B in the algebra generated by the cylinders. Since the elements
of this algebra are the finite pairwise disjoint unions of cylinders, it suffices
to consider the case when A and B are cylinders, say A = [m; am , . . . , aq ] and
B = [r; br , . . . , bs ]. Observe also that the validity of (7.2.13) is not affected if
one replaces B by some pre-image σ −j (B). So, it is no restriction to suppose
that r > q. Then, by Lemma 7.2.9,
\[ \frac{1}{n} \sum_{l=0}^{n-1} \mu(A \cap \sigma^{-l}(B)) = \frac{1}{p_{b_r}}\, \mu(A)\mu(B)\, \frac{1}{n} \sum_{l=0}^{n-1} P^{r-q+l}_{a_q, b_r} \]
\[ \lim_n \frac{1}{n} \sum_{l=0}^{n-1} P^{r-q+l}_{a_q, b_r} = \lim_n \frac{1}{n} \sum_{l=0}^{n-1} P^{l}_{a_q, b_r} = p_{b_r}. \]
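The role of the Cesàro averages in the ergodic case can be seen on the smallest irreducible matrix whose powers do not converge. The sketch below assumes the 2 × 2 matrix that swaps the two symbols, with stationary vector p = (1/2, 1/2); this example and the helper names are not taken from the text.

```python
from fractions import Fraction

def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    d = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(d)) for j in range(d)] for i in range(d)]

def cesaro_average(P, n):
    """(1/n) * (P^0 + P^1 + ... + P^(n-1)), computed exactly."""
    d = len(P)
    power = [[Fraction(int(i == j)) for j in range(d)] for i in range(d)]
    total = [[Fraction(0)] * d for _ in range(d)]
    for _ in range(n):
        total = [[total[i][j] + power[i][j] for j in range(d)] for i in range(d)]
        power = mat_mul(power, P)
    return [[entry / n for entry in row] for row in total]

swap = [[0, 1], [1, 0]]     # irreducible, with P^l alternating between I and P
Q = cesaro_average(swap, 1000)
print(Q)
```

Here $P^l_{1,1}$ alternates between 1 and 0 and has no limit, yet the averages converge to p_j = 1/2 in every entry, as (7.2.8) predicts for irreducible matrices.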
7.2.2 Mixing
In this section we characterize the Markov shifts that are mixing, in terms of
the corresponding stochastic matrix P. As before, we take p to be a normalized
solution of P∗ p = p with positive coefficients and μ to be the corresponding
Markov measure.
We say that a stochastic matrix P is aperiodic if there exists n ≥ 1 such that
$P^n_{i,j} > 0$ for every 1 ≤ i, j ≤ d. In other words, P is aperiodic if some power $P^n$
has only positive coefficients. The relation between the notions of aperiodicity
and irreducibility is analyzed in Exercise 7.2.6.
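Aperiodicity only depends on which coefficients of P are positive, so it can be tested on boolean patterns. The sketch below relies on a classical fact that is not stated in this text (Wielandt's bound): if some power of a d × d non-negative matrix is positive, then the power with exponent (d − 1)² + 1 already is. The function name and test matrices are illustrative assumptions.

```python
def first_positive_power(P):
    """Smallest n >= 1 such that P^n has only positive entries, or None if P is not aperiodic.

    Assumes Wielandt's bound: it suffices to inspect n <= (d - 1)^2 + 1.
    """
    d = len(P)
    pattern = [[P[i][j] > 0 for j in range(d)] for i in range(d)]
    power = pattern                      # positivity pattern of P^1
    for n in range(1, (d - 1) ** 2 + 2):
        if all(all(row) for row in power):
            return n
        # positivity pattern of P^(n+1)
        power = [[any(power[i][k] and pattern[k][j] for k in range(d)) for j in range(d)]
                 for i in range(d)]
    return None

swap = [[0, 1], [1, 0]]                  # irreducible but not aperiodic
lazy = [[0.5, 0.5], [1.0, 0.0]]          # P^2 has only positive entries

print(first_positive_power(swap), first_positive_power(lazy))
```

The swap matrix shows that irreducibility does not imply aperiodicity: its powers alternate between two patterns, neither of which is positive.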
Theorem 7.2.11. The Markov shift (σ , μ) is mixing if and only if the matrix P
is aperiodic.
Proof. Since we assume that pj > 0 for every j, it is clear that (7.2.14) implies
that $P^l_{i,j} > 0$ for every i, j and every l sufficiently large.
Now suppose that P is aperiodic. Then we may apply the theorem of
Perron–Frobenius (Theorem 7.2.3) to the matrix A = P∗ . Since p is an
eigenvector of A with positive coefficients, we get that λ = 1 and all the other
eigenvalues of A are smaller than 1 in absolute value. By Lemma 7.2.7(iii),
the hyperplane H formed by the vectors (h1 , . . . , hd ) with h1 + · · · + hd = 0 is
invariant under A. It is clear that H is transverse to the direction of p. Then the
decomposition
Rd = Rp ⊕ H (7.2.15)
is invariant under A and the restriction of A to the hyperplane H is a contraction,
meaning that its spectral radius is smaller than 1. It follows that the sequence
$(A^l)_l$ converges to the projection on the first coordinate of (7.2.15), that is, to
the matrix B characterized by Bp = p and Bh = 0 for every h ∈ H. In other
words, $(P^l)_l$ converges to $B^*$. Observe that
\[ B_{i,j} = p_i \quad \text{for every } 1 \le i, j \le d. \]
Therefore, $\lim_l P^l_{i,j} = B_{j,i} = p_j$ for every i, j.
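The convergence of the powers of an aperiodic matrix can be observed directly. The following sketch uses a 2 × 2 stochastic matrix whose square is positive and whose stationary vector is p = (2/3, 1/3); the matrix, the exponent 60 and the helper names are assumptions chosen for illustration.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    d = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(d)) for j in range(d)] for i in range(d)]

def mat_power(P, l):
    """P^l for an integer l >= 0, by repeated multiplication."""
    d = len(P)
    result = [[float(i == j) for j in range(d)] for i in range(d)]
    for _ in range(l):
        result = mat_mul(result, P)
    return result

P = [[0.5, 0.5], [1.0, 0.0]]   # aperiodic: P^2 has only positive entries
p = [2 / 3, 1 / 3]             # stationary: p P = p

limit = mat_power(P, 60)
print(limit)                   # every row is close to p
```

The second eigenvalue of this matrix is −1/2, so the distance between $P^l_{i,j}$ and p_j decays like 2^{−l}, in line with the contraction on the hyperplane H used in the proof.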
Proof of Theorem 7.2.11. Suppose that the measure μ is mixing. Let A = [0; i]
and B = [0; j]. By Lemma 7.2.9, we have that $\mu(A \cap \sigma^{-l}(B)) = p_i P^l_{i,j}$ for every
l. Therefore,
\[ p_i \lim_l P^l_{i,j} = \lim_l \mu(A \cap \sigma^{-l}(B)) = \mu(A)\mu(B) = p_i p_j. \]
Dividing both sides by pi we get that $\lim_l P^l_{i,j} = p_j$. According to Lemma 7.2.12,
this proves that P is aperiodic.
Now suppose that the matrix P is aperiodic. We want to conclude that μ is
mixing. According to Lemma 7.1.2, it is enough to prove that
\[ \lim_l \mu(A \cap \sigma^{-l}(B)) = \mu(A)\mu(B) \tag{7.2.16} \]
for any A and B in the algebra generated by the cylinders. Since the elements
of this algebra are the finite pairwise disjoint unions of cylinders, it suffices
to treat the case when A and B are cylinders, say A = [m; am , . . . , aq ] and B =
[r; br , . . . , bs ]. By Lemma 7.2.9,
\[ \mu(A \cap \sigma^{-l}(B)) = \mu(A)\mu(B)\, \frac{1}{p_{b_r}}\, P^{r-q+l}_{a_q, b_r} \]
for every l > q − r. Then, using Lemma 7.2.12,
\[ \lim_l \mu(A \cap \sigma^{-l}(B)) = \mu(A)\mu(B)\, \frac{1}{p_{b_r}} \lim_l P^{r-q+l}_{a_q, b_r} = \mu(A)\mu(B)\, \frac{1}{p_{b_r}} \lim_l P^{l}_{a_q, b_r} = \mu(A)\mu(B). \]
This proves the property (7.2.16) for cylinders A and B.
1/2 0 1/2 0
Indeed, we see that $P^n_{i,j} > 0$ if and only if n has the same parity as i − j. Note
that
\[ P^2 = \begin{pmatrix}
1/2 & 0 & 1/2 & 0 \\
0 & 1/2 & 0 & 1/2 \\
1/2 & 0 & 1/2 & 0 \\
0 & 1/2 & 0 & 1/2
\end{pmatrix}. \]
Exercise 7.2.6 shows that every irreducible matrix has a form of this type.
7.2.3 Exercises
7.2.1. Let X = {1, . . . , d} and P = (Pi,j )i,j be a stochastic matrix and p = (pi )i be a
probability vector such that P∗ p = p. Show that the function defined on the set of
all cylinders by
\[ \mu([m; a_m, \dots, a_n]) = p_{a_m} P_{a_m, a_{m+1}} \cdots P_{a_{n-1}, a_n} \]
extends to a measure on the Borel σ -algebra of $\Sigma = X^{\mathbb{N}}$ (or $\Sigma = X^{\mathbb{Z}}$), invariant
under the shift map σ : Σ → Σ.
7.2.2. Prove that every weak mixing Markov shift is actually mixing.
7.2.3. Let X = {1, . . . , d} and let μ be a Markov measure for the shift map $\sigma : X^{\mathbb{Z}} \to X^{\mathbb{Z}}$.
Does it follow that μ is also a Markov measure for the inverse $\sigma^{-1} : X^{\mathbb{Z}} \to X^{\mathbb{Z}}$?
7.2.4. Let X be a finite set and $\Sigma = X^{\mathbb{Z}}$ (or $\Sigma = X^{\mathbb{N}}$). Let μ be a probability measure
on Σ, invariant under the shift map σ : Σ → Σ. Given k ≥ 0, we say that μ has
memory k if
\[ \frac{\mu([m-l; a_{m-l}, \dots, a_{m-1}, a_m])}{\mu([m-l; a_{m-l}, \dots, a_{m-1}])} = \frac{\mu([m-k; a_{m-k}, \dots, a_{m-1}, a_m])}{\mu([m-k; a_{m-k}, \dots, a_{m-1}])} \]
for every l ≥ k, every m and every $(a_n)_n \in \Sigma$ (by convention, the equality holds
whenever at least one of the denominators is zero). Check that the measures with
memory 0 are the Bernoulli measures and the measures with memory 1 are the
Markov measures. Show that every measure with memory k ≥ 2 is equivalent to
a Markov measure in the space $\tilde\Sigma = \tilde{X}^{\mathbb{Z}}$ (or $\tilde\Sigma = \tilde{X}^{\mathbb{N}}$), where $\tilde{X} = X^k$.
7.2.5. The goal is to show that the set of all measures with finite memory is dense
in the space M1 (σ ) of all probability measures invariant under the shift map
σ : Σ → Σ. Given any invariant probability measure μ and any k ≥ 1, consider
the function μk defined on the set of all cylinders by
• μk = μ for cylinders with length less than or equal to k;
• for every l ≥ k, every m and every $(a_n)_n \in \Sigma$,
\[ \frac{\mu_k([m-l; a_{m-l}, \dots, a_{m-1}, a_m])}{\mu_k([m-l; a_{m-l}, \dots, a_{m-1}])} = \frac{\mu([m-k; a_{m-k}, \dots, a_{m-1}, a_m])}{\mu([m-k; a_{m-k}, \dots, a_{m-1}])}. \]
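[Figure: an interval exchange with four subintervals labeled T, C, A, G and their images f(T), f(C), f(A), f(G).]
[Figure 7.2: an interval exchange with two subintervals labeled A, B and their images f(A), f(B).]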
lengths λβ corresponding to all labels β to the left of α on the top row of π and
v1 is the sum of the lengths λγ corresponding to all the labels γ to the left of α
on the bottom row of π . Then
f (x) = x + wα for every x ∈ Iα .
The vector w = (wα )α∈A is called the translation vector. Clearly, for each fixed
π , the translation vector is a linear function of the length vector λ = (λα )α∈A .
Example 7.3.2. The simplest interval exchanges have only two continuity
subintervals. See Figure 7.2. Choosing the alphabet A = {A, B}, we get
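\[ \pi = \begin{pmatrix} A & B \\ B & A \end{pmatrix} \quad \text{and} \quad f(x) = \begin{cases} x + \lambda_B & \text{for } x \in I_A \\ x - \lambda_A = x + \lambda_B - 1 & \text{for } x \in I_B. \end{cases} \]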
This transformation corresponds precisely to the rotation $R_{\lambda_B}$ if we identify
[0, 1) with the circle S1 in the natural way. In this sense, the class of interval
exchanges is a generalization of the family of circle rotations.
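The description of an interval exchange through the translation vector translates directly into code. The sketch below builds f from the two rows of π and the length vector λ, taking w_α to be the sum of the lengths to the left of α on the bottom row minus the corresponding sum on the top row, and then checks Example 7.3.2 against the rotation by λ_B. The function name and the sample lengths (which are assumed to add up to 1) are illustrative.

```python
from fractions import Fraction

def interval_exchange(top, bottom, lengths):
    """Return (f, w) for the interval exchange of [0, 1) with combinatorics (top, bottom).

    w[alpha] = (lengths left of alpha on the bottom row) - (lengths left of alpha on the top row).
    """
    def left_sum(row, label):
        return sum((lengths[b] for b in row[:row.index(label)]), Fraction(0))

    w = {a: left_sum(bottom, a) - left_sum(top, a) for a in top}

    def f(x):
        left = Fraction(0)
        for a in top:
            if left <= x < left + lengths[a]:
                return x + w[a]
            left += lengths[a]
        raise ValueError("x must lie in [0, 1)")

    return f, w

# Example 7.3.2 with hypothetical lengths; rational values keep the arithmetic exact.
lengths = {"A": Fraction(3, 5), "B": Fraction(2, 5)}
f, w = interval_exchange(["A", "B"], ["B", "A"], lengths)

points = [Fraction(i, 20) for i in range(20)]
print(all(f(x) == (x + lengths["B"]) % 1 for x in points))
```

Rational lengths are convenient for an exact check, but they give a periodic rotation; the dynamically interesting cases are those where the length vector is rationally independent.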
for every non-zero vector (nα )α∈A with integer coefficients. This turns out to
be true but, in fact, the hypothesis of rational independence is a bit too strong:
we are going to present a somewhat more general condition that still implies
minimality.
We denote by ∂Iα the left endpoint of each subinterval Iα . We say that a
pair (π , λ) satisfies the Keane condition if the trajectories of these points are
disjoint:
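\[ f^m(\partial I_\alpha) \neq \partial I_\beta \quad \text{for every } m \ge 1 \text{ and any } \alpha, \beta \in \mathcal{A} \text{ with } \partial I_\beta \neq 0 \tag{7.3.1} \]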
(note that there always exist ᾱ and β̄ such that f (∂Iᾱ ) = 0 = ∂Iβ̄ ). We leave the
proof of the next lemma as an exercise (Exercise 7.3.2):
Lemma 7.3.3. (1) If the pair (π , λ) satisfies the Keane condition then the
combinatorics matrix π is irreducible.
(2) If π is irreducible and λ is rationally independent then the pair (π , λ)
satisfies the Keane condition.
Example 7.3.4. In the case of two subintervals (recall Example 7.3.2), the
interval exchange has the form $f^m(x) = x + m\lambda_B$ mod Z. Then, the Keane
condition means that
\[ m\lambda_B \neq \lambda_A + n \quad \text{and} \quad \lambda_A + m\lambda_B \neq \lambda_A + n \]
for every m ∈ N and n ∈ Z. It is clear that this holds if and only if the vector
(λA , λB ) is rationally independent.
Example 7.3.5. For exchanges of three or more intervals, the Keane condition
is strictly weaker than the rational independence of the length vector. Consider,
for example,
\[ \pi = \begin{pmatrix} A & B & C \\ C & A & B \end{pmatrix}. \]
Then $f^m(x) = x + m\lambda_C$ mod Z and, thus, the Keane condition means that
{mλC , λA + mλC , λA + λB + mλC } and {λA + n, λA + λB + n}
Example 7.3.7. The Keane condition is not necessary for minimality. For
example, consider the interval exchange defined by (π , λ), where
\[ \pi = \begin{pmatrix} A & B & C & D \\ D & C & B & A \end{pmatrix}, \]
λA = λC , λB = λD and λA /λB = λC /λD is irrational. Then (π , λ) does not satisfy
the Keane condition and yet f is minimal.
Earlier, Michael Keane and Gérard Rauzy [KR80] had shown that unique
ergodicity holds for a residual (Baire second category) subset of length vectors
whenever the combinatorics is irreducible.
7.3.2 Mixing
The interval exchanges provide many examples of systems that are uniquely
ergodic and weak mixing but not (strongly) mixing.
Indeed, the theorem of Masur–Veech (Theorem 7.3.8) asserts that almost
every interval exchange is uniquely ergodic. Another deep theorem, due to
Artur Avila and Giovanni Forni [AF07], states that, circle rotations (more
precisely, interval exchanges with a unique discontinuity point) excluded,
almost every interval exchange is weak mixing. The topological version of
this fact had been proved by Arnaldo Nogueira and Donald Rudolph [NR97].
7.3 Interval exchanges 205
On the other hand, a result of Anatole Katok [Kat80] that we are going to
discuss below asserts that interval exchanges are never mixing:
Proof. Let B be the set formed by the endpoints a and b of J together with the
endpoints ∂Iα , α ∈ A minus the origin. Then #B ≤ d + 1. Let BJ ⊂ J be the set
of points x ∈ J for which there exists m ≥ 1 such that $f^m(x) \in B$ and $f^n(x) \notin J$
for every 0 < n < m. The fact that f is injective, together with the definition of
m, implies that the map
\[ B_J \to B, \qquad x \mapsto f^m(x) \]
and so $\bigcup_{i=1}^{s} f^{t_i}(J_i) = J$. This proves part (iii). Part (iv) also follows directly
from the fact that f is injective and the ti are the first-return times to J. Finally,
part (v) is a direct consequence of part (iii).
Lemma 7.3.11. Given δ > 0 and N ≥ 1, we may choose the interval J in such
a way that diam PJ < δ and ti ≥ N for every i.
Proof. It is clear that diam f n (Ji ) = diam Ji ≤ diam J for every i and every n.
Hence, diam PJ < δ as long as we pick J with diameter smaller than δ. To get
the second property in the statement, take any point x ∈ I such that $f^n(x) \neq \partial I_\alpha$
for every 0 ≤ n < N and α ∈ A. We claim that $f^n(x) \neq x$ for every 0 < n < N.
Otherwise, since $f^n$ is a translation in the neighborhood of x, we would have
$f^n(y) = y$ for every point y in that neighborhood, which would contradict the
hypothesis that (f , m) is ergodic. This proves our claim. Now it suffices to take
$J = [x, x + \varepsilon)$ with $\varepsilon < \min_{0<n<N} d(x, f^n(x))$ to ensure that ti ≥ N for every i.
Since the algebra AJ is formed by the finite pairwise disjoint unions of intervals
f n (Ji ), 0 ≤ n < ti , it follows that
\[ A \subset \bigcup_{i=1}^{s} \bigcup_{j=1}^{s_i} f^{-t_{i,j}}(A) \quad \text{for every } A \in \mathcal{A}_J. \]
In particular, $m(A) \le \sum_{i=1}^{s} \sum_{j=1}^{s_i} m(A \cap f^{-t_{i,j}}(A))$. Recalling that s ≤ d + 2 and
si ≤ d + 2 for every i, this implies (7.3.2).
We are ready to conclude the proof of Theorem 7.3.9. For that, let us fix a
measurable set X ⊂ [0, 1) with
\[ 0 < m(X) < \frac{1}{4(d+2)^2}. \]
By Lemma 7.3.11, given any N ≥ 1 we may find an interval J ⊂ [0, 1) such that
all the first-return times ti ≥ N and there exists A ∈ AJ such that
\[ m(X \,\Delta\, A) < \frac{1}{4}\, m(X)^2. \tag{7.3.3} \]
Applying Lemma 7.3.12, we get that there exists ti,j ≥ ti ≥ N such that:
\[ m\big(X \cap f^{-t_{i,j}}(X)\big) \ge m\big(A \cap f^{-t_{i,j}}(A)\big) - 2m(X \,\Delta\, A) \ge \frac{1}{(d+2)^2}\, m(A) - \frac{1}{2}\, m(X)^2. \]
The relation (7.3.3) implies that m(A) ≥ (3/4)m(X). Therefore,
\[ m\big(X \cap f^{-t_{i,j}}(X)\big) \ge \frac{3}{4}\, \frac{1}{(d+2)^2}\, m(X) - \frac{1}{2}\, m(X)^2 \ge 3m(X)^2 - \frac{1}{2}\, m(X)^2 > 2m(X)^2. \]
This proves that $\limsup_n m(X \cap f^{-n}(X)) \ge 2m(X)^2$, and so the system (f , m)
cannot be mixing.
7.3.3 Exercises
7.3.1. Let ω be an area form on a surface S. Let X be a differentiable vector field on S
and β be the differential 1-form defined on S by βx = ωx (X(x), ·). Show that β is
closed if and only if X preserves the area measure.
7.3.2. Prove Lemma 7.3.3.
7.3.3. Show that if (π , λ) satisfies the Keane condition then f has no periodic points.
[Observation: This is a step in the proof of Theorem 7.3.6.]
7.3.4. Let f : [0, 1) → [0, 1) be an irreducible interval exchange and let a ∈ (0, 1)
be the largest of all the discontinuity points of f and f −1 . The Rauzy–Veech
renormalization R(f ) : [0, 1) → [0, 1) is defined by R(f )(x) = g(ax)/a, where
g is the first-return map of f to the interval [0, a). Check that R(f ) is an interval
exchange with the same number of continuity subintervals as f , or less. If f is
described by the data (π , λ), how can we describe R(f )?
the largest eigenvalue is simple and all the rest of the spectrum is contained in
a closed disk with strictly smaller radius.
The transfer operator of the shift map f is the linear operator Lf mapping
each function ψ : M → R to the function Lf ψ : M → R defined by
\[ \mathcal{L}_f \psi(x_1, \dots, x_n, \dots) = \sum_{x_0=1}^{d} L_{x_1, x_0}\, \psi(x_0, x_1, \dots, x_n, \dots). \tag{7.4.3} \]
Using the definition of the transfer operator, the right-hand side of this
expression is equal to
\[ \sum_{a_0, a_1, \dots, a_n} p_{a_0} P_{a_0, a_1} P_{a_1, a_2} \cdots P_{a_{n-1}, a_n}\, \varphi(a_1, \dots, a_n)\, \psi(a_0, a_1, \dots, a_n). \]
Taking ϕ ≡ 1 in (7.4.4) we get the following special case, which will also be
useful later:
\[ \int \mathcal{L}_f \psi \, d\mu = \int \psi \, d\mu \quad \text{for every } \psi. \tag{7.4.7} \]
Now let us denote by E0 the subset of functions ψ that depend only on the
first coordinate. The map ψ → (ψ(1), . . . , ψ(d)) is an isomorphism between
E0 and the Euclidean space Rd . Moreover, the definition
\[ \mathcal{L}_f \psi(x_1) = \sum_{x_0=1}^{d} L_{x_1, x_0}\, \psi(x_0) \]
Then the spectral gap property implies that there exists B > 1 such that
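\[ \sup \Big| \mathcal{L}_f^n \psi - \int \psi \, d\mu \Big| \le B \|\psi\|_0 \lambda^n \quad \text{for every } n \ge 1. \tag{7.4.8} \]
Observe that Lf (E) ⊂ E. The function $\|\psi\| = \sup |\psi| + K_\theta(\psi)$ is a complete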
norm in E and the linear operator Lf : E → E is continuous relative to this
7.4 Decay of correlations 211
norm (Exercise 7.4.1). One way to prove the theorem is by showing that this
operator has the spectral gap property, with invariant decomposition
\[ E = \mathbb{R}u \oplus \Big\{ \psi \in E : \int \psi \, d\mu = 0 \Big\}. \]
Once that is done, exactly the same argument that we used before for E0 proves
the exponential decay of correlations in E. We do not present the details here
(but we will come back to this theme, in a much more general context, near
the end of Section 12.3). Instead, we give a direct proof that (7.4.8) may be
extended to the space E.
Given ψ ∈ E and x = (x1 , . . . , xn , . . . ) ∈ M, we have
\[ \mathcal{L}_f^k \psi(x) = \sum_{a_1, \dots, a_k = 1}^{d} L_{x_1, a_k} \cdots L_{a_2, a_1}\, \psi(a_1, \dots, a_k, x_1, \dots, x_n, \dots) \]
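Fix σ < 1 such that $\sigma^2 \ge \max\{2^{-\theta}, \lambda\}$. Then the previous inequality (with
l ≈ n/2 ≈ k) gives
\[ \sup \Big| \mathcal{L}_f^n \psi - \int \psi \, d\mu \Big| \le B \|\psi\| \sigma^{n-1} \quad \text{for every } n. \tag{7.4.12} \]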
Now Theorem 7.4.1 follows from the same argument that we used before for
E0 , with (7.4.12) in the place of (7.4.8).
7.4.1 Exercises
7.4.1. Show that $\|\varphi\| = \sup |\varphi| + K_\theta(\varphi)$ defines a complete norm in the space E of
θ-Hölder functions and the transfer operator Lf is continuous relative to this
norm.
7.4.2. Let f : M → M be a local diffeomorphism on a compact manifold M and d ≥ 2
be the degree of f . Assume that there exists σ > 1 such that $\|Df(x)v\| \ge \sigma \|v\|$
for every x ∈ M and every vector v tangent to M at the point x. Fix θ > 0 and let
E be the space of θ -Hölder functions ϕ : M → R. For every ϕ ∈ E, define
\[ \mathcal{L}_f \varphi : M \to \mathbb{R}, \qquad \mathcal{L}_f \varphi(y) = \frac{1}{d} \sum_{x \in f^{-1}(y)} \varphi(x). \]
(a) Show that inf ϕ ≤ inf Lf ϕ ≤ sup Lf ϕ ≤ sup ϕ and $K_\theta(\mathcal{L}_f \varphi) \le \sigma^{-\theta} K_\theta(\varphi)$ for
every ϕ ∈ E.
(b) Conclude that Lf : E → E is a continuous linear operator (relative to the
norm defined in Exercise 7.4.1) with $\|\mathcal{L}_f\| = 1$.
(c) Show that, for every ϕ ∈ E, the sequence $(\mathcal{L}_f^n \varphi)_n$ converges to a constant
$\nu_\varphi \in \mathbb{R}$ when n → ∞. Moreover, there exists C > 0 such that
φ∗ μ = ν and φ ◦ f = g ◦ φ.
the intersection of Y with some interval of length 10−m . This observation also
shows that φ∗ ν = m, where ν denotes the Bernoulli measure on {0, 1, . . . , 9}N
that assigns equal weights to all the digits. Moreover, denoting by σ the shift
map in {0, 1, . . . , 9}N , we have that
\[ \phi \circ \sigma\big((a_n)_n\big) = 0.a_1 a_2 \dots a_n \dots = f \circ \phi\big((a_n)_n\big) \]
σ : {1, 2}Z → {1, 2}Z and ζ : {1, 2, 3}Z → {1, 2, 3}Z , (8.1.1)
endowed with the corresponding Bernoulli measures giving the same weights
to all the symbols, ergodically equivalent? It is easy to see that σ and ζ are
not topologically conjugate (for example: ζ has three fixed points, whereas
σ has only two), but the existence of an ergodic equivalence is a much more
delicate issue. In fact, this type of question motivates most of the content of
the present chapter and also leads to the notion of entropy, which is the subject
of Chapter 9.
8.1.1 Exercises
8.1.1. Let f : [0, 1] → [0, 1] be the transformation defined by f (x) = 2x − [2x] and m be
the Lebesgue measure on [0, 1]. Exhibit a map g : [0, 1] → [0, 1] and a probability
measure ν invariant under g such that (g, ν) is ergodically equivalent to (f , m) and
the support of ν has empty interior.
8.1.2. Let f : {1, . . . , k}^N → {1, . . . , k}^N and g : {1, . . . , l}^N → {1, . . . , l}^N be one-sided shift
maps, endowed with Bernoulli measures μ and ν, respectively. Show that, for
every set X ⊂ {1, . . . , k}^N with f^{−1}(X) = X and μ(X) = 1, there exists x ∈ X such
that #(X ∩ f^{−1}(x)) = k. Conclude that if k ≠ l then (f, μ) and (g, ν) cannot be
ergodically equivalent.
8.1.3. Let X = {1, . . . , d} and consider the shift map σ : X^N → X^N endowed with
a Markov measure μ. Given any cylinder C = [0; c0, . . . , cl] in X^N, let μC
be the normalized restriction of μ to C. Show that there exists an induced
transformation σC : C → C (see Section 1.4.2) preserving μC and such that
(σC, μC) is ergodically equivalent to a Bernoulli shift (σN, ν) in N^N.
The converse is false, as will be clear from the sequel. For example,
all countably generated two-sided Bernoulli shifts are spectrally equivalent
(Corollary 8.4.12); however, not all have the same entropy (Example 9.1.10)
and so not all are ergodically equivalent.
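and, hence, \( \lim_n U_g^n\varphi \cdot \psi = \int L^{-1}\varphi \, d\mu \int L^{-1}\psi \, d\mu \) for every ϕ, ψ ∈ L²(ν). Also,
\[ \int L^{-1}\varphi \, d\mu = L^{-1}\varphi \cdot 1 = L^{-1}\varphi \cdot L^{-1}1 = \varphi \cdot 1 = \int \varphi \, d\nu \]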
for every ϕ, ψ ∈ L2 (ν), that is, (g, ν) is also mixing. This shows that the mixing
property is an invariant of spectral equivalence.
The same argument may be used for the weak mixing property, though the
theorem that we prove in Section 8.2.2 below gives us a more interesting proof
of the fact that weak mixing is an invariant of spectral equivalence.
contradicting the hypothesis that the system is weak mixing. In the second
case, using that the system is ergodic, we find that ϕ is constant at μ-almost
every point. This shows that if the system is weak mixing then the constant
functions are the only eigenvectors.
Now suppose that the only eigenvectors of Uf are the constant functions. To
conclude that (f , μ) is weak mixing, we must show that
\[ \frac{1}{n} \sum_{j=0}^{n-1} |C_j(\varphi, \psi)|^2 \to 0 \quad \text{for any } \varphi, \psi \in L^2(\mu) \]
C
where θ = Eϕ · ψ. The expression on the right-hand side may be rewritten as
follows:
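\[ \int_{\mathbb{C}} z^j \, d\theta(z) \int_{\mathbb{C}} \bar{z}^j \, d\bar\theta(z) = \int_{\mathbb{C}} \int_{\mathbb{C}} z^j \bar{w}^j \, d\theta(z) \, d\bar\theta(w). \]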
Therefore, given any n ≥ 1,
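\[ \frac{1}{n} \sum_{j=0}^{n-1} |C_j(\varphi, \psi)|^2 = \int_{\mathbb{C}} \int_{\mathbb{C}} \frac{1}{n} \sum_{j=0}^{n-1} (z\bar{w})^j \, d\theta(z) \, d\bar\theta(w). \tag{8.2.3} \]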
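The averages that appear inside the double integral in (8.2.3) can be explored numerically. The following Python sketch is an illustration only and is not part of the original text; the function name `cesaro_average` and the sample points are our own choices. It computes (1/n) Σ_{j<n} (z w̄)^j for complex numbers of absolute value 1: the average stays equal to 1 when z = w and decays like 1/n otherwise (compare Exercise 8.2.6).

```python
import cmath

def cesaro_average(z: complex, w: complex, n: int) -> complex:
    """Return (1/n) * sum_{j=0}^{n-1} (z * conj(w))**j."""
    q = z * w.conjugate()
    total = 0j
    power = 1 + 0j          # current value of q**j
    for _ in range(n):
        total += power
        power *= q
    return total / n

if __name__ == "__main__":
    # Two distinct points on the unit circle (arbitrary illustrative angles).
    z = cmath.exp(2j * cmath.pi * 0.3)
    w = cmath.exp(2j * cmath.pi * 0.1234)
    for n in (10, 100, 1000, 10000):
        # Second column shrinks with n, third column stays at 1.
        print(n, abs(cesaro_average(z, w, n)), abs(cesaro_average(z, z, n)))
```

For z ≠ w the geometric sum gives the bound |average| ≤ 2 / (n |1 − z w̄|), which is the decay that the printed values should reflect.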
8.2.3 Exercises
8.2.1. Let (f, μ) be an invertible ergodic system with no atoms. Show that every λ in the
unit circle {z ∈ C : |z| = 1} is an approximate eigenvalue of the Koopman operator
Uf : L²(μ) → L²(μ): there exists some sequence (ϕn)n such that ‖ϕn‖ → 1 and
‖Uf ϕn − λϕn‖ → 0. In particular, the spectrum of Uf coincides with the unit
circle.
8.2.2. Let m be the Lebesgue measure on the circle and Uα : L2 (m) → L2 (m) be
the Koopman operator of the irrational rotation Rα : S1 → S1 . Calculate the
eigenvalues of Uα and deduce that (Rα , m) and (Rβ , m) are spectrally equivalent
if and only if α = ±β. [Observation: Corollary 8.3.6 provides a more complete
statement.]
8.2.3. Let m be the Lebesgue measure on the circle and, for each integer number k ≥ 2,
let Uk : L²(m) → L²(m) be the Koopman operator of the transformation fk : S¹ →
S¹ given by fk(x) = kx mod Z. Check that if p ≠ q then (fp, m) and (fq, m) are not
ergodically equivalent. Show that, for any k ≥ 2,
\[ L^2(m) = \{\text{constants}\} \oplus \bigoplus_{j=0}^{\infty} U_k^j(H_k), \]
where \( H_k = \{ \sum_{n\in\mathbb{Z}} a_n e^{2\pi i n x} : a_n = 0 \text{ if } k \mid n \} \) and the terms in the direct sum are
pairwise orthogonal. Conclude that (fp, m) and (fq, m) are spectrally equivalent
for any p and q.
8.2.4. Let f : S1 → S1 be the transformation given by f (x) = kx mod Z and μ be the
Lebesgue measure. Show that (f , μ) is weak mixing if and only if |k| ≥ 2.
8.2.5. Prove that, for any invertible transformation f, if μ is ergodic for every iterate f^n
and there exists C > 0 such that
\[ \limsup_n \mu\big(f^{-n}(A) \cap B\big) \le C\mu(A)\mu(B), \]
for any measurable sets A and B, then μ is weak mixing. [Observation: This
statement is due to Ornstein [Orn72]. In fact, he proved more: under these
hypotheses the system is (strongly) mixing.]
8.2.6. Let z and w be two complex numbers with absolute value 1. Check that
(a) \( \lim_n \frac{1}{n} \sum_{j=0}^{n-1} |z^j - 1| = 0 \) if and only if z = 1;
(b) \( \lim_n \frac{1}{n} \sum_{j=0}^{n-1} (z\bar{w})^j = 0 \) if z ≠ w.
8.2.7. Consider the system (f, m) in Example 8.2.3. Show that
(a) the system (f, m) is ergodic;
(b) the only eigenvectors of the Koopman operator Uf : L1(m) → L1(m) are the
constant functions, and hence (f, m) is weak mixing;
(c) lim supn m(f^n(A) ∩ A) ≥ 2/27 if we take A = [0, 2/9); in particular, (f, m) is
not mixing.
since λ1 λ2^{−1} ≠ 1. This identity also shows that the set of all eigenvalues is
closed under the operation (λ1, λ2) ↦ λ1 λ2^{−1}. Recalling that 1 is always an
eigenvalue, it follows that this set is a multiplicative group.
Now assume that (f , μ) is ergodic and suppose that Uf ϕ = λϕ. Then
Uf (|ϕ|) = |Uf ϕ| = |λϕ| = |ϕ| at μ-almost every point. By ergodicity, this
implies that |ϕ| is constant at μ-almost every point. Next, suppose that Uf ϕ1 =
λϕ1 , Uf ϕ2 = λϕ2 and the functions ϕ1 and ϕ2 are not identically zero. Since
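|ϕ2| is constant at μ-almost every point, ϕ2(x) ≠ 0 for μ-almost every x. Then
ϕ1/ϕ2 is well defined. Moreover,
\[ U_f\Big(\frac{\varphi_1}{\varphi_2}\Big) = \frac{U_f(\varphi_1)}{U_f(\varphi_2)} = \frac{\lambda\varphi_1}{\lambda\varphi_2} = \frac{\varphi_1}{\varphi_2}. \]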
By ergodicity, it follows that the quotient is constant at μ-almost every point.
That is, ϕ1 = cϕ2 for some c ∈ C.
We say that a system (f , μ) has discrete spectrum if the eigenvectors
of the Koopman operator Uf : L2 (μ) → L2 (μ) generate the Hilbert space
L2 (μ). Observe that this implies that Uf is invertible and, hence, is a unitary
operator. This terminology is justified by the following observation (recall also
Theorem A.7.9):
Proposition 8.3.2. A system (f , μ) has discrete spectrum if and only if its
Koopman operator Uf has a spectral representation of the form
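\[ T : \bigoplus_j L^2(\sigma_j)^{\chi_j} \to \bigoplus_j L^2(\sigma_j)^{\chi_j}, \qquad (\varphi_{j,l})_{j,l} \mapsto \big(z \mapsto z\varphi_{j,l}(z)\big)_{j,l}, \tag{8.3.1} \]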
Example 8.3.3. Let m be the Lebesgue measure on the torus T^d. Consider the
Fourier basis {φk(x) = e^{2πik·x} : k ∈ Z^d} of the Hilbert space L²(m). Let f be the
rotation Rθ : T^d → T^d corresponding to a given vector θ = (θ1, . . . , θd). Then,
\[ U_f\phi_k(x) = \phi_k(x + \theta) = e^{2\pi i k\cdot\theta}\phi_k(x) \quad \text{for every } x \in \mathbb{T}^d. \]
This shows that every φk is an eigenvector of Uf and, hence, (f, m) has discrete
spectrum. Note that the group of eigenvalues is
\[ G_\theta = \{e^{2\pi i k\cdot\theta} : k \in \mathbb{Z}^d\}, \tag{8.3.4} \]
that is, the subgroup of the unit circle generated by {e^{2πiθj} : j = 1, . . . , d}.
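The eigenvector relation in Example 8.3.3 is easy to test numerically. The sketch below is our own illustration (the helper names `fourier_mode`, `koopman_rotation` and `eigen_residual` are hypothetical, and NumPy is assumed to be available). It samples points of the torus and measures how far Uf φk is from e^{2πik·θ} φk; the residual should be at the level of floating-point rounding.

```python
import numpy as np

def fourier_mode(k, x):
    """phi_k(x) = exp(2*pi*i * k.x) for k in Z^d; x has shape (samples, d)."""
    return np.exp(2j * np.pi * np.dot(x, k))

def koopman_rotation(phi, theta):
    """Koopman operator of the rotation R_theta: (U phi)(x) = phi(x + theta mod 1)."""
    return lambda x: phi((x + theta) % 1.0)

def eigen_residual(k, theta, samples=1000, seed=0):
    """Largest sampled value of |U phi_k(x) - exp(2*pi*i*k.theta) phi_k(x)|."""
    rng = np.random.default_rng(seed)
    x = rng.random((samples, len(theta)))        # sample points of the torus T^d
    phi = lambda y: fourier_mode(k, y)
    eigenvalue = np.exp(2j * np.pi * np.dot(k, theta))
    return np.max(np.abs(koopman_rotation(phi, theta)(x) - eigenvalue * phi(x)))

if __name__ == "__main__":
    k = np.array([2, -3])
    theta = np.array([0.1, 2 ** 0.5 / 2])        # arbitrary rotation vector
    print(eigen_residual(k, theta))
```

Reducing x + θ modulo 1 does not affect φk because k has integer entries, which is why the check works on the torus rather than on R^d.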
is that every subgroup of the unit circle is the group of eigenvalues of some
ergodic system with discrete spectrum. These facts are proved in Section 3.3
of the book of Peter Walters [Wal82].
Proposition 8.3.4. Suppose that (f , μ) and (g, ν) are ergodic and have discrete
spectrum. Then (f , μ) and (g, ν) are spectrally equivalent if and only if their
Koopman operators Uf : L2 (μ) → L2 (μ) and Ug : L2 (ν) → L2 (ν) have the
same eigenvalues.
Proof. It is clear that if the Koopman operators are conjugate then they have
the same eigenvalues. To prove the converse, let (λj )j be the eigenvalues of the
two operators. By Proposition 8.3.2, the eigenvalues are simple. For each j,
let uj and vj be unit vectors in ker(Uf − λj id ) and ker(Ug − λj id ), respectively.
Then (uj )j and (vj )j are Hilbert bases of L2 (μ) and L2 (ν), respectively. Consider
the isomorphism L : L2 (μ) → L2 (ν) defined by L(uj ) = vj . This operator is
unitary, since it maps a Hilbert basis to a Hilbert basis, and it satisfies
We leave the proof to the reader (Exercise 8.3.2). In the special case of the
circle, we get that two irrational rotations Rθ and Rτ are equivalent if and only
if either Rθ = Rτ or Rθ = Rτ^{−1}. See also Exercise 8.3.3.
8.3.1 Exercises
8.3.1. Suppose that (f, μ) has discrete spectrum and the Hilbert space L²(μ) is separable
(this is the case, for instance, if the σ-algebra of measurable sets is countably
generated). Show that there exists a sequence (nk)k converging to infinity such
that \( \|U_f^{n_k}\varphi - \varphi\|_2 \) converges to zero when k → ∞, for every ϕ ∈ L²(μ).
8.3.2. Prove Corollary 8.3.6.
8.3.3. Let m be the Lebesgue measure on S1 and θ = p/q and τ = r/s be two rational
numbers, with gcd(p, q) = 1 = gcd(r, s). Show that the rotations (Rθ , m) and
(Rτ , m) are ergodically equivalent if and only if the denominators q and s are
equal.
(i) U(E) ⊂ E;
(ii) \( \bigcap_{n\in\mathbb{N}} U^n(E) = \{0\} \);
(iii) \( \overline{\bigcup_{n\in\mathbb{N}} U^{-n}(E)} = H \).
Note that L_0^2(μ) is invariant under the Koopman operator: ϕ ∈ L_0^2(μ) if and
only if Uf ϕ ∈ L_0^2(μ). We say that the system (f, μ) has Lebesgue spectrum if
the restriction of the Koopman operator to L_0^2(μ) has Lebesgue spectrum.
For each n ∈ N, we may write Ac = σ −n ({x ∈ X N : ψn (x) > c}). Then Ac belongs
to the σ -algebra generated by the cylinders of the form [n; Cn , . . . , Cm ] with
m ≥ n. Consequently, μ(Ac ∩ C) = μ(Ac )μ(C) for every cylinder C of the form
C = [0; C0 , . . . , Cn−1 ]. Since n is arbitrary and the cylinders are a generating
family, it follows that μ(Ac ∩B) = μ(Ac )μ(B) for every measurable set B ⊂ X N .
Taking B = Ac we conclude that μ(Ac ) = μ(Ac )2 ; in other words, μ(Ac ) ∈ {0, 1}
for every c ∈ R. This proves that ϕ is constant at μ-almost every point, as stated.
Lemma 8.4.4. If (f, μ) has Lebesgue spectrum then \( \lim_n U_f^n\varphi \cdot \psi = 0 \) for every
ϕ ∈ L_0^2(μ) and every ψ ∈ L²(μ).
Now consider any ϕ ∈ L_0^2(μ). By condition (iii) in the definition, for every
ε > 0 there exist k ∈ N and ϕk ∈ Uf^{−k}(E) such that ‖ϕ − ϕk‖₂ ≤ ε. Using the
Cauchy–Schwarz inequality once more:
\[ |U_f^n\varphi \cdot \psi - U_f^n\varphi_k \cdot \psi| \le \|\varphi - \varphi_k\|_2 \|\psi\|_2 \le \varepsilon\|\psi\|_2 \]
for every n. Recalling that \( \lim_n U_f^n\varphi_k \cdot \psi = 0 \) (by the previous paragraph), we
find that
\[ -\varepsilon\|\psi\|_2 \le \liminf_n U_f^n\varphi \cdot \psi \le \limsup_n U_f^n\varphi \cdot \psi \le \varepsilon\|\psi\|_2. \]
Proof. Suppose that there exists some subspace F as in the statement. Take
\( E = \bigoplus_{k=0}^{\infty} U^k(F) \). Condition (i) in Definition 8.4.1 is immediate:
\[ U(E) = \bigoplus_{k=1}^{\infty} U^k(F) \subset E. \]
As for condition (ii), note that \( \varphi \in \bigcap_{n=0}^{\infty} U^n(E) \) means that \( \varphi \in \bigoplus_{k=n}^{\infty} U^k(F) \) for
every n ≥ 0. This implies that ϕ is orthogonal to U^k(F) for every k ∈ Z. Hence,
for every n and the sequence on the left-hand side converges to ϕ when n → ∞.
This gives condition (iii) in the definition.
Now we prove the converse. Given E satisfying the conditions (i), (ii) and
(iii) in the definition, take F = E ⊖ U(E). It is easy to see that the iterates of F
are pairwise orthogonal. We claim that
\[ \bigoplus_{k=0}^{\infty} U^k(F) = E. \tag{8.4.1} \]
\[ \|v\|^2 = \sum_{j=0}^{n-1} \|v_j\|^2 + \|w_n\|^2 \quad \text{for every } n \]
and, thus, the series \( \sum_{j=0}^{\infty} \|v_j\|^2 \) is summable. Given ε > 0, fix m ≥ 1 such that
Condition (iii) in the hypothesis implies that this subspace coincides with H.
The next result explains why systems with Lebesgue spectrum are
so named, and it also leads naturally to the notion of rank:
Proof. Let us start by proving the “if” claim. As we know, the Fourier family
{z^n : n ∈ Z} is a Hilbert basis of the space L²(λ). Let Vn be the one-dimensional
subspace generated by ϕ(z) = z^n. Then, \( L^2(\lambda) = \bigoplus_{n\in\mathbb{Z}} V_n \) and, consequently,
\[ L^2(\lambda)^{\chi} = \Big(\bigoplus_{n\in\mathbb{Z}} V_n\Big)^{\chi} = \bigoplus_{n\in\mathbb{Z}} V_n^{\chi} \tag{8.4.3} \]
statement of the proposition, with χ equal to the cardinal of the set Q, that is,
equal to the Hilbert dimension of the subspace F.
Proof. It is clear that two invertible systems are spectrally equivalent if and
only if they admit the same spectral representation. By Proposition 8.4.10, this
happens if and only if the value of the cardinal χ is the same, that is, if the rank
is the same.
Proof. As we saw in the previous section, all Bernoulli shifts of countable type
have countable rank.
Proofs of the facts that are quoted in the following may be found in
Mañé [Mañ87, Section II.10]:
Example 8.4.13 (Gaussian shifts). Let A = (ai,j )i,j∈Z be an infinite real matrix.
We say that A is positive definite if every finite restriction Am,n = (ai,j )m≤i,j<n
is positive definite, for any m < n. We say that A is symmetric if ai,j = aj,i
for any i, j ∈ Z. Let μ be a Borel probability measure on Σ = R^Z (similar
considerations hold for Σ = R^N). We say that μ is a Gaussian measure if there
exists some symmetric positive definite matrix A such that μ([m; Bm, . . . , Bn−1])
is equal to
\[ \frac{1}{(\det A_{m,n})^{1/2}} \, \frac{1}{(2\pi)^{(n-m)/2}} \int_{B_m \times \cdots \times B_{n-1}} \exp\Big(-\frac{1}{2}\big(A_{m,n}^{-1} z \cdot z\big)\Big) \, dz \]
for any m < n and any measurable sets Bm, . . . , Bn−1 ⊂ R. The reason for the
factor on the left-hand side is explained in Exercise 8.4.4. A is called the
covariance matrix of μ. It is uniquely determined by
\[ a_{i,j} = \int x_i x_j \, d\mu(x) \quad \text{for each } i, j \in \mathbb{Z}. \]
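The role of the normalizing factor can be checked numerically in a small case. The following Python sketch is an illustration under our own assumptions (a 2 × 2 symmetric positive definite matrix, a plain Riemann sum on a truncated grid, NumPy available; the function names are hypothetical). It compares the grid approximation of the integral of exp(−(A^{−1}z · z)/2) over R² with the value (det A)^{1/2}(2π)^{d/2} stated in Exercise 8.4.4.

```python
import numpy as np

def gaussian_mass_2d(A, half_width=12.0, points=601):
    """Riemann-sum approximation of the integral over R^2 of exp(-(A^{-1} z . z)/2).

    A is assumed to be a symmetric positive definite 2x2 matrix; the integral is
    truncated to the square [-half_width, half_width]^2.
    """
    A_inv = np.linalg.inv(A)
    grid = np.linspace(-half_width, half_width, points)
    h = grid[1] - grid[0]
    X, Y = np.meshgrid(grid, grid, indexing="ij")
    # Quadratic form A^{-1} z . z written out for z = (X, Y), using symmetry of A^{-1}.
    quad = A_inv[0, 0] * X**2 + 2 * A_inv[0, 1] * X * Y + A_inv[1, 1] * Y**2
    return np.sum(np.exp(-0.5 * quad)) * h * h

def normalizing_factor(A):
    """(det A)^{1/2} (2*pi)^{d/2}, the value stated in Exercise 8.4.4."""
    d = A.shape[0]
    return np.sqrt(np.linalg.det(A)) * (2 * np.pi) ** (d / 2)

if __name__ == "__main__":
    A = np.array([[2.0, 1.0], [1.0, 2.0]])       # arbitrary covariance matrix
    print(gaussian_mass_2d(A), normalizing_factor(A))
```

Dividing by this factor is what makes the total mass of the measure equal to 1 when all the sets Bm, . . . , Bn−1 are taken to be R.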
For each symmetric positive definite matrix A there exists a unique Gaussian
probability measure μ that has A as its covariance matrix. Moreover, μ is
invariant under the shift map σ : Σ → Σ if and only if a_{i,j} = a_{i+1,j+1} for any
i, j ∈ Z. In that case, the properties of the system (σ, μ) are directly related to
the behavior of the covariance sequence
\[ \alpha_n = a_{n,0} = U_\sigma^n x_0 \cdot x_0 \quad \text{for each } n \ge 0. \]
In particular, (f , μ) is mixing if and only if the covariance sequence converges
to zero.
Now, Exercise 8.4.5 shows that if (f , μ) has Lebesgue spectrum then the
covariance sequence is generated by some absolutely continuous probability
measure ν on the unit circle, in the following sense:
\[ \alpha_n = \int z^n \, d\nu(z) \quad \text{for each } n \ge 0. \]
8.4.3 Exercises
8.4.1. Show that every mixing Markov shift has Lebesgue spectrum with countable
rank. [Observation: In Section 9.5.3 we mention stronger results.]
8.4.2. Let μ be the Haar measure on Td and fA : Td → Td be a surjective endomorphism.
Assume that no eigenvalue of the matrix A is a root of unity. Check that every
orbit of At in the set Zd \ {0} is infinite and use this fact to conclude that (fA , μ)
has Lebesgue spectrum. Conversely, if (fA , μ) has Lebesgue spectrum then no
eigenvalue of A is a root of unity.
8.4.3. Complete the proof of Proposition 8.4.6, using Exercise 2.3.6 to reduce the
general case to the invertible one.
8.4.4. Check that \( \int_{\mathbb{R}} e^{-x^2/2} \, dx = \sqrt{2\pi} \). Use this fact to show that if A is a symmetric
positive definite matrix of dimension d ≥ 1 then
\[ \int_{\mathbb{R}^d} \exp\big(-(A^{-1} z \cdot z)/2\big) \, dz = (\det A)^{1/2} (2\pi)^{d/2}. \]
8.4.5. Let (f, μ) be an invertible system with Lebesgue spectrum. Show that for every
ϕ ∈ L_0^2(μ) there exists a probability measure ν absolutely continuous with respect
to the Lebesgue measure λ on the unit circle {z ∈ C : |z| = 1} and such that
\( U_f^n\varphi \cdot \varphi = \int z^n \, d\nu(z) \) for every n ∈ Z.
8.4.6. Let λ be the Lebesgue measure in the unit circle. Consider the linear operator
F : L¹(λ) → c0 defined by
\[ F(\varphi) = \Big(\int z^n \varphi(z) \, d\lambda(z)\Big)_n. \]
Show that F is continuous and injective but not surjective. Therefore, not every
sequence of complex numbers (αn)n converging to zero may be written as
\( \alpha_n = \int z^n \, d\nu(z) \) for n ≥ 0, for some probability measure ν absolutely continuous with
respect to λ.
The definition does not depend on the representation of the simple function
as a linear combination of characteristic functions (Exercise 8.5.1). Moreover,
‖L(ϕ)‖ = ‖ϕ‖ for every simple function. Recall that the set of simple functions
is dense in L2 (ν). Then, by continuity, L extends uniquely to a linear isometry
defined on the whole of L2 (ν). Observe that this isometry is invertible: the
inverse is constructed in the same way, starting from the inverse of H. Finally,
\[ U_f \circ L(\mathcal{X}_C) = U_f(\mathcal{X}_{H(C)}) = \mathcal{X}_{\tilde{f}(H(C))} = \mathcal{X}_{H(\tilde{g}(C))} = L(\mathcal{X}_{\tilde{g}(C)}) = L \circ U_g(\mathcal{X}_C) \]
for every C ∈ C̃. By linearity, it follows that Uf ◦ L(ϕ) = L ◦ Ug(ϕ) for every
simple function; then, by continuity, the same holds for every ϕ ∈ L²(ν).
\[ m(I) = \alpha_n \mu(P) = \sum_{j=1}^{N} \alpha_n \mu(P_j) > \sum_{j=1}^{N} \alpha_{n+1} \mu(P_j). \]
Then, define ψn+1 (Pj ) = Ij for each j = 1, . . . , N. Repeating this procedure for
each P ∈ Pn , we complete the definition of ψn+1 and In+1 . It is clear that the
conditions (i), (ii), (iii) are preserved. This finishes the construction.
where the union is over all the Q ∈ Pk that are contained in P. To get the
inclusion ⊂ it suffices to note that \( \iota(P) = \bigcup_Q \iota(Q) \) and ι(Q) ⊂ ψk(Q) for every
Q ∈ Pk and every k. The converse follows from the fact that ι(P) intersects
every ψk(Q) (the intersection contains ι(Q)) and the length of the ψk(Q)
converges to zero when k → ∞. In this way, (8.5.2) is proven. It follows that
\[ m(\iota(P)) = \lim_k m\Big(\bigcup_Q \psi_k(Q)\Big) = \lim_k \sum_Q \alpha_k \mu(Q) = \lim_k \alpha_k \mu(P) = \mu(P). \]
Moreover, (8.5.2) means that \( \iota(P) = \bigcap_{k=n}^{\infty} \bigcup_I I \), where the union is over all
the I ∈ Ik that are contained in ψn(P). The right-hand side of this equality
coincides with K ∩ ψn(P) and, hence, is an open and closed subset of K. It also
follows from the construction that ι^{−1}(ι(P)) = P. Consequently, ι∗μ(ι(P)) =
μ(P) = m(ι(P)) for every \( P \in \bigcup_n \mathcal{P}_n \). Since the algebra of finite pairwise
disjoint unions of sets ι(P) generates the measurable structure of K, we
conclude that ι∗μ = m | K.
Pn+1 = {K1 , Q1 \ K1 , . . . , Km , Qm \ Km }.
It is clear that Pn+1 satisfies (i) and (ii). Therefore, our construction is
complete.
All that is left is to show that the existence of such a sequence (Pn )n
implies the conclusion of the theorem. Property (i) ensures that the sequence is
separating. Let ι : MP → K be a map as in Proposition 8.5.3. Fix any N ≥ 1 and
consider any point y ∈ K \ ι(MP ). For each n > N, let In be the interval in the
family In that contains y and let Pn be the element of Pn such that ψn (Pn ) = In .
Note that (Pn)n is a decreasing sequence. If they were all compact, there would
be \( x \in \bigcap_{n>N} P_n \) and, by definition, ι(x) would be equal to y. Since we are
assuming that y is not in the image of ι, this proves that there exists l > N such
that Pl is not compact. Take l > N minimum and let Il = ψl(Pl). Recall that
m(Il) = αl μ(Pl) ≤ 2μ(Pl). Let ĨN and P̃N be the unions of all these Il and Pl,
respectively, when we vary y on the whole K \ ι(MP). On the one hand, ĨN
contains K \ ι(MP); on the other hand, P̃N is contained in \( \bigcup_{l>N} E_l \). Moreover,
the Il are pairwise disjoint (because we took l minimum) and the same holds
for the Pl. Hence,
\[ m(\tilde{I}_N) \le 2\mu(\tilde{P}_N) \le 2\mu\Big(\bigcup_{l>N} E_l\Big) \le 2^{-N+1}. \]
Then the intersection \( \bigcap_N \tilde{I}_N \) has Lebesgue measure zero and contains K \
ι(MP). Since K is a Borel set, this shows that ι(MP) is a Lebesgue measurable
set.
The next result implies that all the Lebesgue spaces with no atoms are
isomorphic:
Proposition 8.5.5. If (M, B, μ) is a Lebesgue space with no atoms, there exists
an invertible measurable map h : M → [0, 1] (defined between subsets of full
measure) such that h∗ μ coincides with the Lebesgue measure on [0, 1].
A ⊂ M denotes the set of atoms of the measure μ, then there exists an invertible
measurable map h : M → [0, 1 − μ(A)] ∪ A such that h∗ μ coincides with m on
the interval [0, 1 − μ(A)] and coincides with μ on A.
Proposition 8.5.6. Let (M, B, μ) and (N, C, ν) be two Lebesgue spaces and
H : C̃ → B̃ be an isomorphism between the corresponding measure algebras.
Then there exists an invertible measurable map h : M → N such that h∗μ = ν
and H = h̃ for every C ∈ C̃. Moreover, h is essentially unique: any two maps
satisfying these conditions coincide at μ-almost every point.
We are going to sketch the proof of this proposition in the non-atomic case.
The arguments are based on the ideas and use the notations in the proof of
Proposition 8.5.3.
Let us start with the uniqueness claim. Let h1, h2 : M → N be any two maps
such that (h1)∗μ = (h2)∗μ = ν. Suppose that h1(x) ≠ h2(x) for every x in a
set E ⊂ M with μ(E) > 0. Let (Qn)n be a separating sequence in (N, C, ν).
Then Qn(h1(x)) ≠ Qn(h2(x)) for every x ∈ E and every n sufficiently large.
Hence, we may fix n (large) and E′ ⊂ E with μ(E′) > 0 such that Qn(h1(x)) ≠
Qn(h2(x)) for every x ∈ E′. Consequently, there exist Q ∈ Qn and E″ ⊂ E′
with μ(E″) > 0 such that Qn(h1(x)) = Q and Qn(h2(x)) ≠ Q for every x ∈ E″.
Therefore, E″ ⊂ h1^{−1}(Q) \ h2^{−1}(Q). This implies that h̃1(Q) ≠ h̃2(Q) and, hence,
h̃1 ≠ h̃2.
Next we comment on the existence claim. Let (Pn )n and (Qn )n be separating
sequences in (M, B, μ) and (N, C, ν), respectively. Define P′n = Pn ∨ H(Qn) and
Q′n = Qn ∨ H^{−1}(Pn). Then (P′n)n and (Q′n)n are also separating sequences and
P′n = H(Q′n) for each n. Let ι : MP → K be a map as in Proposition 8.5.3 and
ψn : Pn → In , n ≥ 1 be the family of bijections used in its construction. Let
j : NQ → L and ϕn : Qn → Jn be corresponding objects for (N, C, ν). Since
we are assuming that (M, B, μ) and (N, C, ν) are Lebesgue spaces, ι and j are
invertible maps over subsets with full measure. Recall also that m(ψn (P)) =
αn μ(P) for each P ∈ Pn and, analogously, m(ϕn (Q)) = αn ν(Q) for each Q ∈ Qn .
Hence, m(ψn (P)) = m(ϕn (Q)) if P = H(Q). Then, for each n,
\[ \psi_n \circ H \circ \varphi_n^{-1} : \mathcal{J}_n \to \mathcal{I}_n \tag{8.5.3} \]
Proof. We only need to show that if the systems are ergodically isomorphic
then they are ergodically equivalent. Let H : C̃ → B̃ be an ergodic isomorphism.
By Proposition 8.5.6, there exists an invertible measurable map h : M → N such
that h∗μ = ν and H = h̃. Then,
\[ \widetilde{h \circ f} = \tilde{f} \circ \tilde{h} = \tilde{f} \circ H = H \circ \tilde{g} = \tilde{h} \circ \tilde{g} = \widetilde{g \circ h}. \]
By the uniqueness part of Proposition 8.5.6, it follows that h ◦ f = g ◦ h at
μ-almost every point. This shows that h is an ergodic equivalence.
8.5.3 Exercises
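8.5.1. Let H : C̃ → B̃ be a homomorphism of measure algebras. Show that
\[ \sum_{i=1}^{l} b_i \mathcal{X}_{B_i} = \sum_{j=1}^{k} c_j \mathcal{X}_{C_j} \;\Rightarrow\; \sum_{i=1}^{l} b_i \mathcal{X}_{H(B_i)} = \sum_{j=1}^{k} c_j \mathcal{X}_{H(C_j)}. \]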
The word entropy was invented in 1865 by the German physicist and
mathematician Rudolf Clausius, one of the founding pioneers of thermodynamics. In
the theory of systems in thermodynamical equilibrium, the entropy quantifies
the degree of “disorder” in the system. The second law of thermodynamics
states that, when an isolated system passes from an equilibrium state to another,
the entropy of the final state is necessarily bigger than the entropy of the initial
state. For example, when we join two containers with different gases (oxygen
and nitrogen, say), the two gases mix with one another until reaching a new
macroscopic equilibrium, where they are both uniformly distributed in the two
containers. The entropy of the new state is larger than the entropy of the initial
equilibrium, where the two gases were separate.
The notion of entropy plays a crucial role in different fields of science.
An important example, which we explore in our presentation, is the field of
information theory, initiated by the work of the American electrical engineer
Claude Shannon in the mid 20th century. At roughly the same time, the
Russian mathematicians Andrey Kolmogorov and Yakov Sinai were proposing
a definition of the entropy of a system in ergodic theory. The main purpose
was to provide an invariant of ergodic equivalence that, in particular, could
distinguish two Bernoulli shifts. This Kolmogorov–Sinai entropy is the subject
of this chapter.
In Section 9.1 we define the entropy of a transformation with respect to an
invariant probability measure, by analogy with a similar notion in information
theory. The theorem of Kolmogorov–Sinai, which we discuss in Section 9.2, is
a fundamental tool for the actual calculation of the entropy in specific systems.
In Section 9.3 we analyze the concept of entropy from a more local viewpoint,
which is more closely related to Shannon’s formulation of this concept. Next,
in Section 9.4, we illustrate a few methods for calculating the entropy, by
means of concrete examples.
In Section 9.5 we discuss the role of the entropy as an invariant of
ergodic equivalence. The highlight is the theorem of Ornstein (Theorem 9.5.2),
according to which any two-sided Bernoulli shifts are ergodically equivalent
if and only if they have the same entropy. In that section we also introduce
the class of Kolmogorov systems, which contains the Bernoulli shifts and is
contained in the class of systems with Lebesgue spectrum. In both cases the
inclusion is strict.
In the last couple of sections we present two complementary topics that will
be useful later. The first one (Section 9.6) is the theorem of Jacobs, according
to which the entropy behaves in an affine way with respect to the ergodic
decomposition. The other (Section 9.7) concerns the notion of the Jacobian
and its relations with the entropy.
where p_{a1...an} denotes the probability of the word. In the independent case this
coincides with the product p_{a1} · · · p_{an} of the probabilities of the symbols, but
not in general. Denoting by A^n the set of all the words of length n, we define
\[ I(A^n) = \sum_{a_1,\ldots,a_n} p_{a_1 \ldots a_n} I(a_1, \ldots, a_n) = \sum_{a_1,\ldots,a_n} -p_{a_1 \ldots a_n} \log p_{a_1 \ldots a_n}. \tag{9.1.4} \]
We invite the reader to check that the sequence I(A^n) is subadditive and, thus,
the limit in (9.1.5) does exist. This is also contained in the much more general
theory that we are about to present.
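The quantity I(A^n) can be computed by brute force for small alphabets, which gives a quick way to experiment with subadditivity. The Python sketch below is only an illustration: the helper names and the two sources are our own assumptions, namely an independent source and a stationary two-state Markov source (a hypothetical dependent example, not taken from the text). For the independent source one expects I(A^n) = n I(A^1); for the stationary Markov source one expects I(A^{m+n}) ≤ I(A^m) + I(A^n).

```python
import itertools
import math

def mean_information(word_prob, alphabet, n):
    """I(A^n): sum over all words of length n of -p log p, as in (9.1.4)."""
    total = 0.0
    for word in itertools.product(alphabet, repeat=n):
        p = word_prob(word)
        if p > 0:
            total -= p * math.log(p)
    return total

def independent_source(probs):
    """Word probability = product of the symbol probabilities."""
    return lambda word: math.prod(probs[a] for a in word)

def markov_source(initial, transition):
    """Word probability for a Markov chain started from the vector `initial`."""
    def prob(word):
        p = initial[word[0]]
        for a, b in zip(word, word[1:]):
            p *= transition[a][b]
        return p
    return prob

if __name__ == "__main__":
    indep = independent_source([0.5, 0.3, 0.2])
    print([mean_information(indep, range(3), n) for n in (1, 2, 3)])
    # (0.8, 0.2) is the stationary vector of this transition matrix.
    markov = markov_source([0.8, 0.2], [[0.9, 0.1], [0.4, 0.6]])
    print([mean_information(markov, range(2), n) for n in (1, 2, 3, 4)])
```

The Markov example starts from the stationary vector on purpose: without stationarity the word probabilities would not come from a shift-invariant measure, and subadditivity need not hold.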
prematurely, since at that point the remaining letters could add no information: there is only
one mathematical object whose name includes the letter Z three times (the Yoccoz puzzle).
Figure 9.1. Graph of the function φ(x) = −x log x
Lemma 9.1.3. Every finite partition P has finite entropy: Hμ (P) ≤ log #P and
the identity holds if and only if μ(P) = 1/#P for every P ∈ P.
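Lemma 9.1.3 can be sanity-checked on a few examples. In the Python sketch below (our own illustration; the function name `partition_entropy` is hypothetical) a finite partition is represented only by the list of measures of its elements, and the entropy is the sum of φ(μ(P)) with φ(x) = −x log x. The bound log #P should be attained for the uniform list and be strict otherwise.

```python
import math

def partition_entropy(measures):
    """H_mu(P): sum of -mu(P) log mu(P) over the elements P (with 0 log 0 = 0)."""
    return -sum(m * math.log(m) for m in measures if m > 0)

if __name__ == "__main__":
    uniform = [0.25] * 4
    skewed = [0.7, 0.1, 0.1, 0.1]
    print(partition_entropy(uniform), partition_entropy(skewed), math.log(4))
```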
of the sum. Then, we may partition [0, 1] into intervals Pk with μ(Pk) =
1/(ck(log k)²) for every k. Let P be the partition formed by these subintervals.
Then,
\[ H_\mu(\mathcal{P}) = \sum_{k=1}^{\infty} \frac{\log c + \log k + 2\log\log k}{c\,k(\log k)^2}. \]
By the ratio convergence criterion, the series on the right-hand side has the
same behavior as the series \( \sum_{k=1}^{\infty} 1/(k\log k) \) which, as we know (use the
integral convergence criterion), is divergent. Therefore, Hμ(P) = ∞.
This shows that infinite partitions may have infinite entropy. From now on, for
the rest of the chapter, we always consider (countable) partitions with finite
entropy.
The conditional entropy of a partition P with respect to another partition Q
is the number
\[ H_\mu(\mathcal{P}/\mathcal{Q}) = \sum_{P\in\mathcal{P}} \sum_{Q\in\mathcal{Q}} -\mu(P \cap Q) \log\frac{\mu(P \cap Q)}{\mu(Q)}. \tag{9.1.9} \]
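Definition (9.1.9) translates directly into a short computation. The following Python sketch is an illustration under our own conventions: a pair of finite partitions is encoded by the matrix `joint[i][j]` = μ(Pi ∩ Qj), and the function names are hypothetical. It can be used to check, on examples, the identity Hμ(P ∨ Q) = Hμ(Q) + Hμ(P/Q) and the fact that Hμ(P/Q) = 0 when every element of Q is contained in an element of P.

```python
import math

def entropy(measures):
    """Sum of -m log m over the given measures (with 0 log 0 = 0)."""
    return -sum(m * math.log(m) for m in measures if m > 0)

def conditional_entropy(joint):
    """H_mu(P/Q) from (9.1.9), where joint[i][j] = mu(P_i ∩ Q_j)."""
    q_marginal = [sum(col) for col in zip(*joint)]   # mu(Q_j)
    total = 0.0
    for row in joint:
        for pq, q in zip(row, q_marginal):
            if pq > 0:
                total -= pq * math.log(pq / q)
    return total

if __name__ == "__main__":
    joint = [[0.1, 0.2], [0.3, 0.4]]
    flat = [m for row in joint for m in row]         # measures of the elements of P ∨ Q
    print(conditional_entropy(joint), entropy(flat) - entropy([0.4, 0.6]))
    # Here each Q_j lies inside a single P_i, so the conditional entropy vanishes.
    print(conditional_entropy([[0.2, 0.3, 0.0], [0.0, 0.0, 0.5]]))
```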
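Proof. By definition,
\[ \begin{aligned} H_\mu(\mathcal{P} \vee \mathcal{Q}/\mathcal{R}) &= \sum_{P,Q,R} -\mu(P \cap Q \cap R) \log\frac{\mu(P \cap Q \cap R)}{\mu(R)} \\ &= \sum_{P,Q,R} -\mu(P \cap Q \cap R) \log\frac{\mu(P \cap Q \cap R)}{\mu(P \cap R)} + \sum_{P,Q,R} -\mu(P \cap Q \cap R) \log\frac{\mu(P \cap R)}{\mu(R)}. \end{aligned} \]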
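This proves the first half of claim (ii). To prove the second half, note that for
any P ∈ P and R ∈ R,
\[ \frac{\mu(R \cap P)}{\mu(P)} = \sum_{Q \subset P} \frac{\mu(Q)}{\mu(P)} \, \frac{\mu(R \cap Q)}{\mu(Q)}. \]
It is clear that \( \sum_{Q \subset P} \mu(Q)/\mu(P) = 1 \). Therefore, by (9.1.8),
\[ \phi\Big(\frac{\mu(R \cap P)}{\mu(P)}\Big) \ge \sum_{Q \subset P} \frac{\mu(Q)}{\mu(P)} \, \phi\Big(\frac{\mu(R \cap Q)}{\mu(Q)}\Big) \]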
Finally, it follows from the definition in (9.1.9) that Hμ(P/Q) = 0 if and only if
\[ \mu(P \cap Q) = 0 \quad \text{or else} \quad \frac{\mu(P \cap Q)}{\mu(Q)} = 1 \]
for every P ∈ P and every Q ∈ Q. In other words, either Q is disjoint from P
(up to measure zero) or else Q is contained in P (up to measure zero). This
means that Hμ (P/Q) = 0 if and only if P ≺ Q.
Therefore,
\[ H_\mu(\mathcal{R}) = \sum_{R\in\mathcal{R}} \phi(\mu(R)) < \#\mathcal{R} \, \frac{\varepsilon}{k^2} \le \varepsilon. \]
It is clear from the definition that P ∨ Q = P ∨ R. Then, using (9.1.11) and
(9.1.10),
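\[ H_\mu(\mathcal{Q}/\mathcal{P}) = H_\mu(\mathcal{P} \vee \mathcal{Q}) - H_\mu(\mathcal{P}) = H_\mu(\mathcal{P} \vee \mathcal{R}) - H_\mu(\mathcal{P}) = H_\mu(\mathcal{R}/\mathcal{P}) \le H_\mu(\mathcal{R}) < \varepsilon. \]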
This proves the lemma.
In view of Lemma 3.3.4, it follows from Lemma 9.1.7 that the limit
\[ h_\mu(f, \mathcal{P}) = \lim_n \frac{1}{n} H_\mu(\mathcal{P}^n) \tag{9.1.15} \]
exists and coincides with the infimum of the sequence on the right-hand side.
We call hμ (f , P) the entropy of f with respect to the partition P. Observe that
this entropy is a non-decreasing function of the partition:
P ≺Q ⇒ hμ (f , P) ≤ hμ (f , Q). (9.1.16)
where the supremum is taken over all the partitions with finite entropy. A useful
observation is that the definition is not affected if we take the supremum only
over the finite partitions (see Exercise 9.1.2).
Example 9.1.9. Consider the decimal expansion map f : [0, 1] → [0, 1], given
by f (x) = 10x − [10x]. As observed previously, f preserves the Lebesgue
measure μ on the interval. Let P be the partition of [0, 1] into the intervals
of the form ((i − 1)/10, i/10] with i = 1, . . . , 10. Then, P^n is the partition into
the intervals of the form ((i − 1)/10^n, i/10^n] with i = 1, . . . , 10^n. Using the
Using the theory in Section 9.2 (the theorem of Kolmogorov–Sinai and its
corollaries), one can easily check that this is also the value of the entropy hμ (f ),
that is, P realizes the supremum in the definition (9.1.17).
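For the decimal expansion map, the entropy of the iterated partitions can be computed exactly by counting digit blocks. The Python sketch below is an illustration under our own encoding assumptions: an element of P^n is identified with a block of n decimal digits of measure 10^{−n}, and a coarser partition Q is specified by a hypothetical function `label` that groups the digits. With the identity labelling one recovers Hμ(P^n) = n log 10; grouping the digits into {0, . . . , 4} and {5, . . . , 9} gives a partition Q ≺ P with Hμ(Q^n) = n log 2, in agreement with the monotonicity property (9.1.16).

```python
import itertools
import math
from collections import Counter
from fractions import Fraction

def entropy_of_iterated_partition(n, label):
    """H_mu(Q^n) for f(x) = 10x - [10x] and Lebesgue measure mu.

    Q is the partition obtained by merging the ten first-digit intervals that
    share the same value of label(digit). Elements of Q^n are indexed by the
    sequence of labels of the first n digits.
    """
    weight = Fraction(1, 10 ** n)        # Lebesgue measure of each element of P^n
    masses = Counter()
    for digits in itertools.product(range(10), repeat=n):
        masses[tuple(label(d) for d in digits)] += weight
    return -sum(float(m) * math.log(float(m)) for m in masses.values())

if __name__ == "__main__":
    for n in (1, 2, 3):
        fine = entropy_of_iterated_partition(n, lambda d: d)
        coarse = entropy_of_iterated_partition(n, lambda d: d >= 5)
        print(n, fine / n, coarse / n)
```

The enumeration has 10^n terms, so this is only practical for small n; it is meant as a check of the formulas, not as a general method.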
\[ h_\mu(\sigma, \mathcal{P}) = \lim_n \frac{1}{n} H_\mu(\mathcal{P}^n) = \sum_{i=1}^{d} -p_i \log p_i. \tag{9.1.18} \]
The theory presented in Section 9.2 permits us to prove that this is also the
value of the entropy hμ (σ ).
It follows from expression (9.1.18) that for every x > 0 there exists some
Bernoulli shift (σ , μ) such that hμ (σ ) = x. We use this observation a few times
in what follows.
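The observation that every x > 0 is the entropy of some Bernoulli shift can be made concrete. The Python sketch below is one possible construction, chosen by us for illustration and not taken from the text: it picks a finite alphabet with d symbols such that log d > x, then moves along the segment from the vector (1, 0, . . . , 0) (entropy 0) to the uniform vector (entropy log d) and locates by bisection a probability vector whose value of Σ −pi log pi is x. Continuity of the entropy along the segment guarantees that such a vector exists.

```python
import math

def bernoulli_entropy(p):
    """Sum of -p_i log p_i, the right-hand side of (9.1.18)."""
    return -sum(q * math.log(q) for q in p if q > 0)

def bernoulli_with_entropy(x, tol=1e-12):
    """Return a probability vector p with bernoulli_entropy(p) close to x > 0."""
    d = max(2, math.ceil(math.exp(x)) + 1)   # number of symbols, so that log d > x
    uniform = 1.0 / d

    def vector(t):
        # t = 0 gives (1, 0, ..., 0); t = 1 gives the uniform vector.
        return [(1 - t) + t * uniform] + [t * uniform] * (d - 1)

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if bernoulli_entropy(vector(mid)) < x:
            lo = mid
        else:
            hi = mid
    return vector(hi)

if __name__ == "__main__":
    p = bernoulli_with_entropy(1.0)
    print(p, bernoulli_entropy(p))
```

Since the entropy is concave and attains its maximum at the uniform vector, it is non-decreasing along this segment, which is why a simple bisection suffices.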
Dividing by n and taking the limit when n → ∞, we get the conclusion of the
lemma.
Lemma 9.1.12. \( h_\mu(f, \mathcal{P}) = \lim_n H_\mu\big(\mathcal{P} / \bigvee_{j=1}^{n} f^{-j}(\mathcal{P})\big) \) for any partition P with
finite entropy.
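Proof. Using Lemma 9.1.5(i) and the fact that the measure μ is invariant
under f, we get that
\[ \begin{aligned} H_\mu\Big(\bigvee_{j=0}^{n-1} f^{-j}(\mathcal{P})\Big) &= H_\mu\Big(\bigvee_{j=1}^{n-1} f^{-j}(\mathcal{P})\Big) + H_\mu\Big(\mathcal{P} \,/ \bigvee_{j=1}^{n-1} f^{-j}(\mathcal{P})\Big) \\ &= H_\mu\Big(\bigvee_{j=0}^{n-2} f^{-j}(\mathcal{P})\Big) + H_\mu\Big(\mathcal{P} \,/ \bigvee_{j=1}^{n-1} f^{-j}(\mathcal{P})\Big) \end{aligned} \]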
Therefore,
\[ h_\mu(f, \mathcal{P}^k) = \lim_n \frac{1}{n} H_\mu(\mathcal{P}^{n+k-1}) = \lim_n \frac{1}{n} H_\mu(\mathcal{P}^n) = h_\mu(f, \mathcal{P}). \]
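This proves the first part of the lemma. To prove the second part, note that
\[ \bigvee_{j=0}^{n-1} f^{-j}(\mathcal{P}^{\pm k}) = \bigvee_{j=0}^{n-1} f^{-j}\Big(\bigvee_{i=-k}^{k-1} f^{-i}(\mathcal{P})\Big) = \bigvee_{l=-k}^{n+k-2} f^{-l}(\mathcal{P}) = f^{k}\big(\mathcal{P}^{n+2k-1}\big) \]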
Therefore,
\[ k\,h_\mu(f, \mathcal{P}) = \lim_m \frac{1}{m} H_\mu(\mathcal{P}^{km}) = h_\mu(g, \mathcal{P}^k). \tag{9.1.20} \]
Since P ≺ P k , this implies that hμ (g, P) ≤ khμ (f , P) ≤ hμ (g) for any P. Taking
the supremum over these partitions P, it follows that hμ (g) ≤ khμ (f ) ≤ hμ (g).
This proves that khμ (f ) = hμ (g), as stated.
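Now suppose that f is invertible. Let P be any partition of M with finite
entropy. For any n ≥ 1,
\[ H_\mu\Big(\bigvee_{j=0}^{n-1} f^{-j}(\mathcal{P})\Big) = H_\mu\Big(f^{-n+1}\Big(\bigvee_{i=0}^{n-1} f^{i}(\mathcal{P})\Big)\Big) = H_\mu\Big(\bigvee_{i=0}^{n-1} f^{i}(\mathcal{P})\Big), \]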
because the measure μ is invariant under f . Dividing by n and taking the limit
when n → ∞, we get that
hμ (f , P) = hμ (f −1 , P). (9.1.21)
Taking the supremum over these partitions P, it follows that hμ(f) = hμ(f^{−1}).
Replacing f with f^k and using the first half of the proposition, we get that
hμ(f^{−k}) = hμ(f^k) = k hμ(f) for every k ∈ N.
9.1.4 Exercises
9.1.1. Prove that Hμ (P/R) ≤ Hμ (P/Q) + Hμ (Q/R) for any partitions P, Q and R.
9.1.2. Show that the supremum of hμ (f , P) over the finite partitions coincides with the
supremum over all the partitions with finite entropy.
9.1.3. Check that \( \lim_n H_\mu\big(\bigvee_{i=0}^{k-1} f^{-i}(\mathcal{P}) / \bigvee_{j=k}^{n} f^{-j}(\mathcal{P})\big) = k\,h(f, \mathcal{P}) \) for every partition P
with finite entropy and every k ≥ 1.
9.1.4. Let f : M → M be a measurable transformation preserving a probability
measure μ.
(a) Assume that there exists an invariant set A ⊂ M with μ(A) ∈ (0, 1). Let
μA and μB be the normalized restrictions of μ to the sets A and B = Ac ,
respectively. Show that hμ (f ) = μ(A)hμA (f ) + μ(B)hμB (f ).
(b) Suppose that μ is a convex combination \( \mu = \sum_{i=1}^{n} a_i \mu_i \) of ergodic measures
μ1, . . . , μn. Show that \( h_\mu(f) = \sum_{i=1}^{n} a_i h_{\mu_i}(f) \).
[Observation: In Section 9.6 we prove much stronger results.]
9.1.5. Let (M, B, μ) and (N, C, ν) be probability spaces and f : M → M and g : N → N be
measurable transformations preserving the measures μ and ν, respectively. We
say that (g, ν) is a factor of (f , μ) if there exists a measurable map, not necessarily
invertible, φ : (M, B) → (N, C) such that φ∗ μ = ν and φ ◦ f = g ◦ φ at almost every
point. Show that in that case hν (g) ≤ hμ (f ).
Proof. The limit always exists, for property (9.1.16) implies that the sequence
hμ (f , Pn ) is non-decreasing. The inequality ≥ in the statement is a direct
consequence of the definition of entropy. Therefore, we only need to show
that hμ (f , Q) ≤ limn hμ (f , Pn ) for every partition Q with finite entropy. We use
the following fact, which is interesting in itself:
Proposition 9.2.2. Let A be an algebra that generates the σ -algebra of
measurable sets, up to measure zero. For every partition Q with finite
entropy and every ε > 0 there exists some finite partition P ⊂ A such that
Hμ (Q/P) < ε.
Proof. The first step is to reduce the statement to the case when Q is finite.
Denote by Qj, j = 1, 2, . . . the elements of Q. For each k ≥ 1, consider the finite
partition
\[ \mathcal{Q}_k = \Big\{ Q_1, \ldots, Q_k, \; M \setminus \bigcup_{j=1}^{k} Q_j \Big\}. \]
All the terms with i ≥ 1 vanish, since in that case μ(Qi ∩ Qj) is equal to zero if
i ≠ j and is equal to μ(Qi) if i = j. For i = 0 we have that μ(Qi ∩ Qj) is equal
to zero if j ≤ k and is equal to μ(Qj) if j > k. Therefore,
\[ H_\mu(\mathcal{Q}/\mathcal{Q}_k) = \sum_{j>k} -\mu(Q_j) \log\frac{\mu(Q_j)}{\mu(Q_0)} \le \sum_{j>k} -\mu(Q_j) \log\mu(Q_j). \]
9.2 Theorem of Kolmogorov–Sinai 255
The hypothesis that Q has finite entropy means that the expression on the
right-hand side converges to zero when k → ∞.
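The tail estimate above can be checked numerically. The following is a minimal sketch, assuming a hypothetical countable partition whose elements have (truncated and renormalised) geometric weights; the function names are illustrative and not part of the text. It computes the conditional entropy of Q given Q_k from the displayed formula and compares it with the upper bound.

```python
import math

def conditional_entropy_tail(weights, k):
    """H_mu(Q/Q_k) when mu(Q_j) = weights[j-1].

    Q_k = {Q_1, ..., Q_k, Q_0}, where Q_0 is the union of the Q_j with j > k.
    Only the element Q_0 contributes, as in the displayed formula.
    """
    tail = weights[k:]
    mass_q0 = sum(tail)
    if mass_q0 == 0.0:
        return 0.0
    return sum(-w * math.log(w / mass_q0) for w in tail if w > 0.0)

def entropy_tail_bound(weights, k):
    """Upper bound: sum over j > k of -mu(Q_j) log mu(Q_j)."""
    return sum(-w * math.log(w) for w in weights[k:] if w > 0.0)

# Hypothetical example: geometric weights 2^{-j}, truncated and renormalised.
raw = [2.0 ** -(j + 1) for j in range(40)]
total = sum(raw)
weights = [w / total for w in raw]

for k in (1, 5, 10, 20):
    print(k, conditional_entropy_tail(weights, k), entropy_tail_bound(weights, k))
```

Both columns should decrease to zero as k grows, with the first never exceeding the second.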
Given ε > 0, fix k ≥ 1 such that Hμ (Q/Qk ) < ε/2. Consider any δ > 0. By
the approximation theorem (Theorem A.1.19), for each i = 1, . . . , k there exists
Ai ∈ A such that
\[ \mu(Q_i \,\Delta\, A_i) < \delta/(2k^2). \tag{9.2.1} \]
Define $P_1 = A_1$ and $P_i = A_i \setminus \bigcup_{j=1}^{i-1} A_j$ for i = 2, . . . , k and $P_0 = M \setminus \bigcup_{j=1}^{k} A_j$.
It is clear that P = {P1 , . . . , Pk , P0 } is a partition of M and also that Pi ∈ A for
every i. For i = 1, . . . , k, we have $P_i \,\Delta\, A_i = A_i \setminus P_i = A_i \cap \bigcup_{j=1}^{i-1} A_j$. For any x
in this set, there is j < i such that x ∈ Ai ∩ Aj . Since Qi ∩ Qj = ∅, it follows that
x ∈ (Ai \ Qi ) ∪ (Aj \ Qj ). This proves that
\[ P_i \,\Delta\, A_i \subset \bigcup_{j=1}^{i} (A_j \setminus Q_j) \subset \bigcup_{j=1}^{i} (A_j \,\Delta\, Q_j), \]
and so $\mu(P_i \,\Delta\, A_i) < i\delta/(2k^2) \le \delta/(2k)$. Together with (9.2.1), this implies that
\[ \mu(P_i \,\Delta\, Q_i) < \delta/(2k^2) + \delta/(2k) \le \delta/k \quad \text{for } i = 1, \dots, k. \tag{9.2.2} \]
Moreover, $P_0 \,\Delta\, Q_0 \subset \bigcup_{i=1}^{k} P_i \,\Delta\, Q_i$ since P and Qk are partitions of M. Hence,
(9.2.2) implies that
\[ \mu(P_0 \,\Delta\, Q_0) < \delta. \tag{9.2.3} \]
By Lemma 9.1.6, the relations (9.2.2) and (9.2.3) imply that Hμ (Qk /P) <
ε/2, as long as we take δ > 0 sufficiently small. Then, by the inequality in
Exercise 9.1.1,
Hμ (Q/P) ≤ Hμ (Q/Qk ) + Hμ (Qk /P) < ε,
as stated.
Proof. For each n, let $\mathcal{A}_n$ be the algebra generated by $\bigcup_{j=1}^{n} \mathcal{P}_j$. Then $(\mathcal{A}_n)_n$ is a
non-decreasing sequence and the union $\mathcal{A} = \bigcup_n \mathcal{A}_n$ is the algebra generated by
$\bigcup_n \mathcal{P}_n$. Consider any ε > 0. By Proposition 9.2.2, there exists a finite partition
P ⊂ A such that Hμ (Q/P) < ε. Hence, since P is finite, there exists m ≥ 1
such that P ⊂ Am and, thus, P is coarser than Pm . Then, using Lemma 9.1.5,
Hμ (Q/Pn ) ≤ Hμ (Q/Pm ) ≤ Hμ (Q/P) < ε for every n ≥ m.
This completes the proof of the corollary.
Corollary 9.2.5. Let P be a partition with finite entropy such that the union of
the iterates $\mathcal{P}^n = \bigvee_{j=0}^{n-1} f^{-j}(\mathcal{P})$, n ≥ 1, generates the σ-algebra of measurable
sets, up to measure zero. Then, hμ (f ) = hμ (f , P).
Proof. It suffices to apply Theorem 9.2.1 to the sequence P n and to recall that
hμ (f , P n ) = hμ (f , P) for every n, by Lemma 9.1.13.
Corollary 9.2.7. Assume that the system (f , μ) is invertible and there exists a
partition P with finite entropy such that $\bigcup_{n=1}^{\infty} \mathcal{P}^n$ generates the σ-algebra of
Proof. Let U be any non-empty open subset of M. The hypothesis ensures that
for each x ∈ U there exists n(x) such that the set Px = Pn(x) (x) is contained in
U. It is clear that Px belongs to the algebra A generated by $\bigcup_n \mathcal{P}_n$. Observe
also that A is countable, since it consists of the finite unions of elements of the
partitions Pn together with the complements of such unions. In particular, the
map x ↦ Px takes only countably many values. It follows that $U = \bigcup_{x \in U} P_x$ is
in the σ -algebra generated by A. This proves that the σ -algebra generated by
A contains all the open sets and, thus, all the Borel sets. Now, the conclusion
follows directly from Theorem 9.2.1.
Corollary 9.2.10. Let P be a partition with finite entropy such that we have
diam P n (x) → 0 for μ-almost every x ∈ M. Then, hμ (f ) = hμ (f , P).
satisfies μ(∂P) = 0. By Theorem 2.1.2 or, more precisely, by the fact that the
topology (2.1.5) is equivalent to the weak∗ topology, the function ν ↦ ν(P) is
continuous at the point μ, for every P ∈ P. Consequently, the function
\[ \nu \mapsto H_\nu(\mathcal{P}) = \sum_{P \in \mathcal{P}} -\nu(P) \log \nu(P) \]
Proposition 9.2.12. Let P be a finite partition such that μ(∂P) = 0. Then the
function ν → hν (f , P) is upper semi-continuous at μ.
Corollary 9.2.13. Assume that there exists a finite partition P such that
μ(∂P) = 0 and $\bigcup_n \mathcal{P}^n$ generates the σ-algebra of measurable sets of M, up to
measure zero. Then the function η → hη (f ) is upper semi-continuous at μ.
Corollary 9.2.14. Assume that there exists ε0 > 0 such that every finite
partition P with diam P < ε0 satisfies limn diam P n = 0. Then, the function μ →
2 For example: for each x choose $r_x \in (0, \varepsilon_0)$ such that the boundary of the ball of center x and
radius rx has measure zero. Then let U be a finite cover of M by such balls and take for P the
partition defined by U , that is, the partition whose elements are the maximal sets that, for each
U ∈ U , are either contained in U or disjoint from U; see Figure 2.1.
We leave it to the reader to check (Exercise 9.2.1) that the decimal expansion
transformation f (x) = 10x − [10x] is also expansive. On the other hand, torus
rotations are never expansive.
Proposition 9.2.16. Let f : M → M be an expansive transformation in a
compact metric space and let ε0 > 0 be a constant of expansivity. Then
limn diam P n = 0 for every finite partition P with diam P < ε0 .
Proof. It is clear that the sequence diam P n is non-increasing. Suppose that its
infimum δ is positive. Then, for every n ≥ 1 there exist points xn and yn such
that d(xn , yn ) > δ/2 but xn and yn belong to the same element of P n and, thus,
satisfy
d(f j (xn ), f j (yn )) ≤ diam P < ε0 for every 0 ≤ j < n.
By compactness, there exists a sequence (nj )j → ∞ such that (xnj )j and
(ynj )j converge to points x and y, respectively. Then, d(x, y) ≥ δ/2 > 0 but
d(f j (x), f j (y)) ≤ diam P < ε0 for every j ≥ 0. This contradicts the hypothesis
that ε0 is a constant of expansivity.
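As an illustration of how diam P^n shrinks, here is a minimal sketch for the decimal expansion map f(x) = 10x − [10x], using the partition of [0, 1) into the ten intervals [i/10, (i+1)/10). This particular partition and the helper names are assumptions made for the example. Two points lie in the same element of P^n exactly when their first n decimal digits agree, so such points are at distance less than 10^(-n).

```python
from fractions import Fraction

def decimal_map(x):
    """f(x) = 10x - [10x] on [0, 1), computed exactly with rationals."""
    y = 10 * x
    return y - (y.numerator // y.denominator)

def itinerary(x, n):
    """Indices of the partition elements visited by x, f(x), ..., f^{n-1}(x)."""
    digits = []
    for _ in range(n):
        y = 10 * x
        digits.append(y.numerator // y.denominator)  # element [i/10, (i+1)/10)
        x = decimal_map(x)
    return digits

def same_element(x, y, n):
    """True when x and y belong to the same element of P^n."""
    return itinerary(x, n) == itinerary(y, n)

x = Fraction(1, 3)
y = x + Fraction(1, 10**6)
print(same_element(x, y, 5), same_element(x, y, 6))
```

Exact rational arithmetic is used because floating-point iteration of this map loses one decimal digit of precision per step.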
9.2.4 Exercises
9.2.1. Show that the decimal expansion f : [0, 1] → [0, 1], f (x) = 10x − [10x] is
expansive and exhibit a constant of expansivity.
9.2.2. Check that for every s > 0 there exists some Bernoulli shift (σ , μ) whose entropy
is equal to s.
9.2.3. Let X = {0} ∪ {1/n : n ≥ 1} and consider the space $\Sigma = X^{\mathbb{N}}$ endowed with the
distance $d((x_n)_n, (y_n)_n) = 2^{-N} |x_N - y_N|$, where $N = \min\{n \in \mathbb{N} : x_n \ne y_n\}$.
(a) Verify that the shift map σ : Σ → Σ is not expansive.
(b) For each k ≥ 1, let νk be the probability measure on X that assigns weight
1/2 to each of the points 1/k and 1/(k + 1). Use the Bernoulli measures
$\mu_k = \nu_k^{\mathbb{N}}$ to conclude that the entropy function of the shift is not upper
semi-continuous.
(c) Let μ be the Bernoulli measure associated with any probability vector
(px )x∈X such that $\sum_{x \in X} -p_x \log p_x = \infty$. Show that hμ (σ ) is infinite.
9.2.4. Let f : S1 → S1 be a covering map of degree d ≥ 2 and μ be a probability measure
invariant under f . Show that hμ (f ) ≤ log d.
9.2.5. Let P and Q be two partitions with finite entropy. Show that if P is coarser than
$\bigvee_{j=0}^{\infty} f^{-j}(\mathcal{Q})$ then hμ (f , P) ≤ hμ (f , Q).
9.2.6. Show that if A is an algebra that generates the σ -algebra of measurable sets, up
to measure zero, then the supremum of hμ (f , P) over the partitions with finite
entropy (or even the finite partitions) P ⊂ A coincides with hμ (f ).
9.2.7. Consider transformations f : M → M and g : N → N preserving probability
measures μ and ν, respectively. Consider f × g : M × N → M × N defined by
(f × g)(x, y) = (f (x), g(y)). Show that f × g preserves the product measure μ × ν
and that hμ×ν (f × g) = hμ (f ) + hν (g).
Recall that P n (x) = P(x) ∩ f −1 (P(f (x))) ∩ · · · ∩ f −n+1 (P(f n−1 (x))), that
is, P n (x) is formed by the points whose trajectories remain “close” to the
trajectory of x during n iterates, in the sense that they visit the same elements
of P. Theorem 9.3.1 states that the measure of this set has a well-defined
exponential rate of decay: at μ-almost every point,
\[ \mu(\mathcal{P}^n(x)) \approx e^{-n h_\mu(f,\mathcal{P},x)} \quad \text{for every large } n. \]
The proof of the theorem is presented in Section 9.3.1.
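The exponential decay rate can be observed numerically in the simplest case. The sketch below assumes a one-sided Bernoulli shift with a hypothetical probability vector p and takes P to be the partition into cylinders determined by the first symbol, so that the measure of P^n(x) is the product of the weights of the first n symbols of x. It uses the standard fact that the entropy of such a shift is the sum of −p_i log p_i; the function names are illustrative only.

```python
import math
import random

def bernoulli_entropy(p):
    """sum_i -p_i log p_i, the entropy of the Bernoulli shift with weights p."""
    return sum(-q * math.log(q) for q in p if q > 0.0)

def cylinder_decay_rate(p, n, seed=0):
    """-(1/n) log mu(P^n(x)) for a mu-random point x of the Bernoulli shift.

    P^n(x) is the cylinder fixing the first n symbols of x, hence
    mu(P^n(x)) = p[x_0] * ... * p[x_{n-1}].
    """
    rng = random.Random(seed)
    symbols = rng.choices(range(len(p)), weights=p, k=n)
    log_measure = sum(math.log(p[a]) for a in symbols)
    return -log_measure / n

p = [0.5, 0.3, 0.2]  # hypothetical probability vector
print(cylinder_decay_rate(p, 20000), bernoulli_entropy(p))
```

For large n the two printed values should be close, which is what the theorem predicts for an ergodic system.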
The theorem of Brin–Katok that we state in the sequel belongs to the same
family of results, but uses a different notion of proximity.
In other words,
B(x, n, ε) = B(x, ε) ∩ f −1 (B(f (x), ε)) ∩ · · · ∩ f −n+1 (B(f n−1 (x), ε)).
Define
\[ h_\mu^+(f,\varepsilon,x) = \limsup_n -\frac{1}{n} \log \mu(B(x,n,\varepsilon)) \quad \text{and} \quad h_\mu^-(f,\varepsilon,x) = \liminf_n -\frac{1}{n} \log \mu(B(x,n,\varepsilon)). \]
Theorem 9.3.3 (Brin–Katok). Let μ be a probability measure invariant under
f . The limits
\[ \lim_{\varepsilon \to 0} h_\mu^+(f,\varepsilon,x) \quad \text{and} \quad \lim_{\varepsilon \to 0} h_\mu^-(f,\varepsilon,x) \]
exist and are equal, for μ-almost every point. Denoting by hμ (f , x) their
common value, the function hμ (f , ·) is integrable and
\[ h_\mu(f) = \int h_\mu(f,x) \, d\mu(x). \]
The proof of this result may be found in the original paper of Brin and
Katok [BK83], and is not presented here.
Example 9.3.4 (Translations in compact groups). Let G be a compact
metrizable group and μ be its Haar measure. Every translation of G has zero
entropy with respect to μ. Indeed, consider in G any distance d that is invariant
under all the translations (recall Lemma 6.3.6). Relative to such a distance,
\[ L_g^j(B(x,\varepsilon)) = B(L_g^j(x),\varepsilon) \]
for every g ∈ G and j ∈ Z. Consequently, B(x, n, ε) = B(x, ε) for every n ≥ 1.
Then,
\[ h_\mu^{\pm}(L_g,\varepsilon,x) = \lim_n -\frac{1}{n} \log \mu(B(x,\varepsilon)) = 0 \]
for every ε > 0 and x ∈ G. By the theorem of Brin–Katok, it follows
that hμ (Lg ) = 0 for every g ∈ G. The same argument applies to every
right-translation Rg .
\[ -\frac{1}{n} \log \mu(\mathcal{P}^n(x)) = -\frac{1}{n} \log \mu(\mathcal{P}(f^{n-1}(x))) + \frac{1}{n} \sum_{j=0}^{n-2} \varphi_{n-j}(f^j(x)) \tag{9.3.2} \]
Lemma 9.3.5. The sequence $-n^{-1} \log \mu(\mathcal{P}(f^{n-1}(x)))$ converges to zero μ-almost
everywhere and in L1 (μ).
Using Lemma 3.2.5, it follows that $-(n-1)^{-1} \log \mu(\mathcal{P}(f^{n-1}(x)))$ converges to
zero at μ-almost every point. Moreover, it is clear that this conclusion is not
affected if one replaces n − 1 by n in the denominator. This proves the claim
of μ-almost everywhere convergence. Next, using the fact that the measure μ
is invariant and Hμ (P) < ∞,
\[ \Big\| -\frac{1}{n} \log \mu(\mathcal{P}(f^{n-1}(x))) \Big\|_1 = \int -\frac{1}{n} \log \mu(\mathcal{P}(f^{n-1}(x))) \, d\mu(x) = \frac{1}{n} H_\mu(\mathcal{P}) \]
Next, we show that the last term in (9.3.2) also converges μ-almost
everywhere and in L1 (μ).
Lemma 9.3.6. The limit ϕ(x) = limn ϕn (x) exists at μ-almost every point.
exists for μ-almost every point. Taking logarithms, we get that limn ϕn (x) exists
for μ-almost every point, as stated.
9.3 Local entropy 265
and, in that case, ϕn (y) > t for every y ∈ P ∩ Qn (x). Therefore, we may write
the set {x ∈ P : Φ(x) > t} as a disjoint union $\bigcup_j (P \cap Q_j)$, where each Qj belongs
to some partition Qn(j) and
\[ \mu(P \cap Q_j) < e^{-t} \mu(Q_j) \quad \text{for every } j. \]
\[ \lim_n \frac{1}{n} \sum_{j=0}^{n-2} \varphi_{n-j}(f^j(x)) = \lim_n \frac{1}{n} \sum_{j=0}^{n-2} \varphi(f^j(x)). \]
Proof. By the Birkhoff ergodic theorem (Theorem 3.2.3), the limit on the
right-hand side exists at μ-almost every point and in L1 (μ), indeed, it is equal
to the time average of the function ϕ. Therefore, it is enough to show that the
difference
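\[ \frac{1}{n} \sum_{j=0}^{n-2} (\varphi_{n-j} - \varphi) \circ f^j \tag{9.3.5} \]
converges to zero at μ-almost every point and in L1 (μ). Since the measure μ
is invariant, $\|(\varphi_{n-j} - \varphi) \circ f^j\|_1 = \|\varphi_{n-j} - \varphi\|_1$ for every j. Hence,
\[ \Big\| \frac{1}{n} \sum_{j=0}^{n-2} (\varphi_{n-j} - \varphi) \circ f^j \Big\|_1 \le \frac{1}{n} \sum_{j=0}^{n-2} \|\varphi_{n-j} - \varphi\|_1. \]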
By Lemma 9.3.8 the sequence on the right-hand side converges to zero. This
implies that (9.3.5) converges to zero in L1 (μ). We are left to prove that the
sequence converges at μ-almost every point.
For each fixed k ≥ 2, consider $\Phi_k = \sup_{i>k} |\varphi_i - \varphi|$. Note that Φk ≤ Φ and,
thus, Φk ∈ L1 (μ). Moreover,
\[ \begin{aligned} \frac{1}{n} \sum_{j=0}^{n-2} |\varphi_{n-j} - \varphi| \circ f^j &= \frac{1}{n} \sum_{j=0}^{n-k-1} |\varphi_{n-j} - \varphi| \circ f^j + \frac{1}{n} \sum_{j=n-k}^{n-2} |\varphi_{n-j} - \varphi| \circ f^j \\ &\le \frac{1}{n} \sum_{j=0}^{n-k-1} \Phi_k \circ f^j + \frac{1}{n} \sum_{j=n-k}^{n-2} \Phi \circ f^j. \end{aligned} \]
By the Birkhoff ergodic theorem, the first term on the right-hand side converges
to the time average $\tilde{\Phi}_k$ at μ-almost every point. By Lemma 3.2.5, the last term
converges to zero at μ-almost every point: the lemma implies that $n^{-1} \Phi \circ f^{n-i}$
converges to zero for any fixed i. Hence,
\[ \limsup_n \frac{1}{n} \sum_{j=0}^{n-2} |\varphi_{n-j} - \varphi|(f^j(x)) \le \tilde{\Phi}_k(x) \quad \text{at $\mu$-almost every point.} \tag{9.3.6} \]
We claim that $\lim_k \tilde{\Phi}_k(x) = 0$ at μ-almost every point. Indeed, the sequence
(Φk )k is non-increasing and, by Lemma 9.3.6, it converges to zero at μ-almost
every point. By the monotone convergence theorem (Theorem A.2.9), it
follows that $\int \Phi_k \, d\mu \to 0$ when k → ∞. Another consequence is that $(\tilde{\Phi}_k)_k$
is non-increasing. Hence, using the monotone convergence theorem together
\[ \lim_n \frac{1}{n} \sum_{j=0}^{n-2} |\varphi_{n-j} - \varphi| \circ f^j = 0 \]
\[ = \lim_n \frac{1}{n} H_\mu(\mathcal{P}^n) = h_\mu(f,\mathcal{P}). \]
9.3.2 Exercises
9.3.1. Check that, for any measurable function ϕ : M → (0, ∞),
\[ \int \varphi \, d\mu = \int_0^{\infty} \mu(\{x \in M : \varphi(x) > t\}) \, dt. \]
9.3.2. Use Theorem 9.3.1 to calculate the entropy of a Bernoulli shift in $\Sigma = \{1, \dots, d\}^{\mathbb{N}}$.
9.3.3. Show that the function hμ (f , x) in Theorem 9.3.3 is f -invariant. Conclude that if
(f , μ) is ergodic, then hμ (f ) = hμ (f , x) for μ-almost every x.
9.3.4. Suppose that (f , μ) is ergodic and let P be a partition with finite entropy. Show
that given ε > 0 there exists k ≥ 1 such that for every n ≥ k there exists Bn ⊂ P n
such that
9.4 Examples
In this section we illustrate the previous results through a few examples.
\[ + \sum_{j=1}^{n-1} \sum_{a_j, a_{j+1}} -\log P_{a_j,a_{j+1}} \sum p_{a_1} P_{a_1,a_2} \cdots P_{a_{n-1},a_n}, \]
where the last sum is over all the values of a1 , . . . , aj−1 , aj+2 , . . . , an . On the one
hand,
\[ \sum_{a_2, \dots, a_n} P_{a_1,a_2} \cdots P_{a_{n-1},a_n} = \sum_{a_n} P^{n}_{a_1,a_n} = 1, \]
This conclusion remains valid for two-sided Markov shifts as well, that is,
when $\Sigma = \{1, \dots, d\}^{\mathbb{Z}}$. The argument is analogous, using Corollary 9.2.6.
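For concreteness, the entropy of a Markov shift can be evaluated numerically. The sketch below assumes that the conclusion referred to is the standard formula, in which the entropy equals the sum over a, b of p_a P_{a,b} (−log P_{a,b}), with p the stationary probability vector of the stochastic matrix P. The stationary vector is approximated by power iteration, which is an illustrative choice that converges when P is aperiodic; the function names are hypothetical.

```python
import math

def stationary_vector(P, iterations=10_000):
    """Approximate a probability vector p with pP = p by power iteration."""
    d = len(P)
    p = [1.0 / d] * d
    for _ in range(iterations):
        p = [sum(p[a] * P[a][b] for a in range(d)) for b in range(d)]
    return p

def markov_entropy(P):
    """sum_{a,b} p_a P_{a,b} (-log P_{a,b}), reading 0 log 0 as 0."""
    p = stationary_vector(P)
    d = len(P)
    return sum(
        -p[a] * P[a][b] * math.log(P[a][b])
        for a in range(d)
        for b in range(d)
        if P[a][b] > 0.0
    )

# Hypothetical example: a 2-state chain with stationary vector (5/6, 1/6).
P = [[0.9, 0.1], [0.5, 0.5]]
print(stationary_vector(P), markov_entropy(P))
```

When all rows of P coincide, the chain is a Bernoulli shift and the formula reduces to the entropy of the common row.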
for every n ≥ 1 and every Pn ∈ P n . This implies (B). Property (C) is given by
Lemma 4.2.13. Finally, (D) follows directly from (9.4.2).
Proposition 9.4.2. $h_\mu(G) = \int \log |G'| \, d\mu$.
Proof. Consider the function ψn (x) = − log μ(P n (x)), for each n ≥ 1. Observe
that
\[ H_\mu(\mathcal{P}^n) = \sum_{P_n \in \mathcal{P}^n} -\mu(P_n) \log \mu(P_n) = \int \psi_n(x) \, d\mu(x). \]
Now, property (B) ensures that we may apply Corollary 9.2.10 to conclude that
\[ h_\mu(G) = h_\mu(G,\mathcal{P}) = \int \log |G'| \, d\mu. \]
\[ h_\mu(G) = \int \log |G'| \, d\mu = \int_0^1 \frac{-2 \log x \, dx}{(1+x) \log 2} = \frac{\pi^2}{6 \log 2} \approx 2.37 \ldots \]
Then, recalling that (G, μ) is ergodic (Section 4.2.4), it follows from the
theorem of Shannon–McMillan–Breiman (Theorem 9.3.1) that
\[ \lim_n -\frac{1}{n} \log \mu(\mathcal{P}^n(x)) = \frac{\pi^2}{6 \log 2} \quad \text{for $\mu$-almost every } x. \]
As the measure μ is comparable to the Lebesgue measure, up to a constant
factor, this means that
\[ \operatorname{diam} \mathcal{P}^n(x) \approx e^{-\frac{\pi^2 n}{6 \log 2}} \quad (\text{up to a factor } e^{\pm \varepsilon n}) \]
for μ-almost every x and every n sufficiently large. Observe that P n (x) is
formed by the points y whose continued fraction expansion coincides with the
continued fraction expansion of x up to order n.
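The value of the integral giving the entropy of the Gauss map can be checked numerically. The following is a minimal sketch using a plain midpoint rule; the step count and the function name are arbitrary choices made for the example. The integrand has an integrable logarithmic singularity at 0, which the midpoint rule handles since it never evaluates at the endpoint.

```python
import math

def gauss_entropy_integral(steps=200_000):
    """Midpoint-rule estimate of int_0^1 -2 log x / ((1 + x) log 2) dx."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += -2.0 * math.log(x) / ((1.0 + x) * math.log(2.0))
    return total * h

closed_form = math.pi ** 2 / (6.0 * math.log(2.0))
print(gauss_entropy_integral(), closed_form)
```

The two printed numbers should agree to several decimal places.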
with t1 , . . . , td close to zero. Given ε > 0, denote by D(x, ε) the set of points y of
this form with |ti | < ε for every i = 1, . . . , d. Moreover, for each n ≥ 1, consider
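\[ D(x,n,\varepsilon) = \{ y \in \mathbb{T}^d : f_A^j(y) \in D(f_A^j(x),\varepsilon) \text{ for every } j = 0, \dots, n-1 \}. \]
Observe that $f_A^j(y) = f_A^j(x) + \sum_{i=1}^{d} t_i \lambda_i^j v_i$ for every j ≥ 1. Therefore,
\[ D(x,n,\varepsilon) = \Big\{ x + \sum_{i=1}^{d} t_i v_i : |\lambda_i^n t_i| < \varepsilon \text{ for } i \le u \text{ and } |t_i| < \varepsilon \text{ for } i > u \Big\}. \]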
Hence, there exists a constant C1 > 1 that depends only on A, such that
\[ C_1^{-1} \varepsilon^d \prod_{i=1}^{u} |\lambda_i|^{-n} \le \mu(D(x,n,\varepsilon)) \le C_1 \varepsilon^d \prod_{i=1}^{u} |\lambda_i|^{-n} \]
for every x ∈ Td , n ≥ 1 and ε > 0. It is also clear that there exists a constant
C2 > 1 that depends only on A, such that
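\[ B(x, C_2^{-1} \varepsilon) \subset D(x,\varepsilon) \subset B(x, C_2 \varepsilon) \]
for x ∈ Td and ε > 0 small. Then, $B(x,n,C_2^{-1}\varepsilon) \subset D(x,n,\varepsilon) \subset B(x,n,C_2\varepsilon)$ for
every n ≥ 1. Combining these two observations and taking $C = C_1 C_2^d$, we get
that
\[ C^{-1} \varepsilon^d \prod_{i=1}^{u} |\lambda_i|^{-n} \le \mu(B(x,n,\varepsilon)) \le C \varepsilon^d \prod_{i=1}^{u} |\lambda_i^{-n}| \]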
for μ-almost every point x. This proves Proposition 9.4.3 in the diagonalizable
case.
The general case may be treated analogously, through writing the matrix A
in canonical Jordan form. We leave this task to the reader (Exercise 9.4.2).
for every $v \in V_x^i \setminus V_x^{i+1}$, every i ∈ {1, . . . , k(x)} and μ-almost every x ∈ M.
Moreover, all these objects depend measurably on the point x ∈ M. When the
measure μ is ergodic, the number k(x), the Lyapunov exponents λi (x) and their
multiplicities $m_i(x) = \dim V_x^i - \dim V_x^{i+1}$ are all constant on a full measure set.
Define ρ + (x) to be the sum of all positive Lyapunov exponents, counted
with multiplicity:
\[ \rho^+(x) = \sum_{i=1}^{k(x)} m_i(x) \lambda_i^+(x) \quad \text{with} \quad \lambda_i^+(x) = \max\{\lambda_i(x), 0\}. \]
It may happen that all the Lyapunov exponents are positive: that is the
case, for instance, for the expanding differentiable maps in Section 11.1. Then,
ρ + (x) is simply the sum of all Lyapunov exponents, counted with multiplicity.
Now, it is also part of the theorem of Oseledets (property (c1) in Section 3.3.5)
that
\[ \sum_{i=1}^{k(x)} m_i(x) \lambda_i(x) = \lim_n \frac{1}{n} \log |\det Df^n(x)| \]
at μ-almost every point. Observe that the right-hand side of this identity is a
Birkhoff time average:
\[ \lim_n \frac{1}{n} \log |\det Df^n(x)| = \lim_n \frac{1}{n} \sum_{j=0}^{n-1} \log |\det Df|(f^j(x)). \]
So, by the Birkhoff ergodic theorem, the integral of ρ + coincides with the
integral of the function log | det Df |. Thus, in this case the Margulis–Ruelle
inequality becomes:
\[ h_\mu(f) \le \int \log |\det Df| \, d\mu. \]
This fundamental result was originally proven by Pesin [Pes77]. See also
Mañé [Mañ87, Section 4.13] for an alternative proof.
The expression for the entropy of the Haar measure of a linear torus
endomorphism, given in Proposition 9.4.3, is a special case of Theorem 9.4.5.
Indeed, one can check that the Lyapunov exponents of a linear endomorphism
fA at every point coincide with the logarithms log |λi | of the absolute values
of the eigenvalues of the matrix A, with the same multiplicities. Thus, in this
context
\[ \rho^+(x) \equiv \sum_{i=1}^{d} \log^+ |\lambda_i|. \]
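The sum of log⁺|λ_i| is easy to evaluate for concrete matrices. Here is a minimal sketch restricted to 2 × 2 integer matrices, where the eigenvalues come from the quadratic formula; the function names and the sample matrices are illustrative assumptions, and eigenvalues of absolute value at most 1 are simply given weight zero.

```python
import cmath
import math

def eigenvalues_2x2(a, b, c, d):
    """Eigenvalues of [[a, b], [c, d]] from the characteristic polynomial."""
    trace, det = a + d, a * d - b * c
    root = cmath.sqrt(trace * trace - 4 * det)
    return (trace + root) / 2, (trace - root) / 2

def haar_entropy_2x2(a, b, c, d):
    """sum_i log^+ |lambda_i| for the endomorphism of T^2 induced by the matrix."""
    return sum(
        math.log(abs(lam)) if abs(lam) > 1.0 else 0.0
        for lam in eigenvalues_2x2(a, b, c, d)
    )

print(haar_entropy_2x2(2, 1, 1, 1))   # hyperbolic example
print(haar_entropy_2x2(0, -1, 1, 0))  # eigenvalues on the unit circle
```

For the first matrix only the expanding eigenvalue (3 + √5)/2 contributes; for the second both eigenvalues have absolute value 1 and the sum is zero.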
9.4.5 Exercises
9.4.1. Show that every rotation Rθ : Td → Td has entropy zero with respect to the Haar
measure of the torus Td . [Observation: This is a special case of Example 9.3.4
but for the present statement we do not need the theorem of Brin–Katok.]
9.4.2. Complete the proof of Proposition 9.4.3.
9.4.3. Let f : M → M be a measurable transformation and μ be an ergodic probability
measure. Let B ⊂ M be a measurable set with μ(B) > 0, g : B → B be the
first-return map of f to B and ν be the normalized restriction of μ to the set B
(recall Section 1.4.1). Show that hμ (f ) = μ(B)hν (g).
9.4.4. Let f : M → M be a measure-preserving transformation in a Lebesgue space
(M, μ). Let f̂ : M̂ → M̂ be the natural extension of f and μ̂ be the lift of μ
(Exercise 8.5.7). Show that hμ (f ) = hμ̂ (f̂ ).
9.4.5. Prove that if f is the time-1 map of a smooth flow on a surface M then hμ (f ) = 0 for
every invariant ergodic measure μ. [Observation: Using Theorem 9.6.2 below, it
follows that the entropy is zero for every invariant measure.]
3 Unstable disks are differentiably embedded disks that are contracted exponentially under
negative iteration; in the non-invertible case, the definition is formulated in terms of the natural
extension of the transformation.
9.5 Entropy and equivalence 275
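Since X and Y are both invariant, $\mathcal{P}_X^n$ is the restriction of $\mathcal{P}^n$ to the subset X and
$\mathcal{Q}^n = \mathcal{Q}_Y^n \cup \{Y^c\}$ for every n. Moreover,
\[ \mathcal{Q}_Y^n = \bigvee_{j=0}^{n-1} g^{-j}(\mathcal{Q}_Y) = \phi\Big( \bigvee_{j=0}^{n-1} f^{-j}(\mathcal{P}_X) \Big) = \phi(\mathcal{P}_X^n) \]
for every n. Thus, the previous argument proves that $H_\nu(\mathcal{Q}^n) = H_\mu(\mathcal{P}^n)$ for
every n and so
\[ h_\nu(g,\mathcal{Q}) = \lim_n \frac{1}{n} H_\nu(\mathcal{Q}^n) = \lim_n \frac{1}{n} H_\mu(\mathcal{P}^n) = h_\mu(f,\mathcal{P}). \]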
Using this observation, Kolmogorov and Sinai concluded that not all
two-sided Bernoulli shifts are ergodically equivalent despite their being
spectrally equivalent, as we saw in Corollary 8.4.12. This also shows that
spectral equivalence is strictly weaker than ergodic equivalence. In fact, as
observed in Exercise 9.2.2, for every s > 0 there exists some two-sided
Bernoulli shift (σ , μ) such that hμ (σ ) = s. Therefore, a sole class of spectral
equivalence may contain a whole continuum of ergodic equivalence classes.
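One way to see that every s > 0 is the entropy of some Bernoulli shift is to produce a probability vector with that entropy. The sketch below is one possible construction, not necessarily the one intended in Exercise 9.2.2: it interpolates between a point mass (entropy 0) and the uniform vector on d symbols (entropy log d, with d chosen so that log d ≥ s) and finds the right mixing parameter by bisection. It relies on the standard fact that the entropy of a Bernoulli shift is the sum of −p_i log p_i; all names are hypothetical.

```python
import math

def entropy(p):
    """sum_i -p_i log p_i, reading 0 log 0 as 0."""
    return sum(-q * math.log(q) for q in p if q > 0.0)

def probability_vector_with_entropy(s, tol=1e-12):
    """Return a probability vector whose entropy is s, up to the bisection tolerance."""
    if s <= 0:
        raise ValueError("s must be positive")
    d = max(2, math.ceil(math.exp(s)))  # number of symbols, so that log d >= s

    def mix(t):
        # t = 0 gives a point mass, t = 1 gives the uniform vector.
        return [(1.0 - t) * (1.0 if i == 0 else 0.0) + t / d for i in range(d)]

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if entropy(mix(mid)) < s:
            lo = mid
        else:
            hi = mid
    return mix(hi)

p = probability_vector_with_entropy(1.0)
print(p, entropy(p))
```

Bisection works here because the entropy is concave along the segment and attains its maximum at the uniform endpoint, so it is non-decreasing in the mixing parameter.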
Lemma 9.5.3. For every ε > 0 there exists δ > 0 such that if P and Q are
partitions with finite entropy and Hμ (P/Q) < δ then for every P ∈ P there
exists a union P′ of elements of Q satisfying μ(P Δ P′) < ε.
for every n and, consequently, μ(P Δ P∗ ) = 0. This shows that every element of
P coincides, up to measure zero, with a union of elements of $\bigvee_{j=1}^{\infty} f^{-j}(\mathcal{P})$, as
claimed in the “only if” half of the statement.
The argument to prove the converse is similar to the one in Proposition 9.2.2.
Suppose that $\mathcal{P} \prec \bigvee_{j=1}^{\infty} f^{-j}(\mathcal{P})$. Write P = {Pj : j = 1, 2, . . . } and, for each k ≥ 1,
consider the finite partition $\mathcal{P}_k = \{P_1, \dots, P_k, M \setminus \bigcup_{j=1}^{k} P_j\}$. Given any ε > 0,
Lemma 9.2.3 ensures that Hμ (P/Pk ) < ε/2 for every k sufficiently large. Fix
k in these conditions and write $P_0 = M \setminus \bigcup_{j=1}^{k} P_j$. For each n ≥ 1 and each
i = 1, . . . , k, let $Q_i^n$ be the union of the elements of $\bigvee_{j=1}^{n} f^{-j}(\mathcal{P})$ that intersect
Pi . The hypothesis ensures that each $(Q_i^n)_n$ is a decreasing sequence whose
intersection coincides with Pi up to measure zero. Then, given δ > 0 there
exists n0 such that
\[ \sum_{i=1}^{k} \mu(Q_i^n \setminus P_i) < \delta \quad \text{for every } n \ge n_0. \tag{9.5.3} \]
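disjoint,
\[ R_i^n \setminus P_i \subset Q_i^n \setminus P_i \quad \text{and} \quad P_i \setminus R_i^n = P_i \cap \bigcup_{j=1}^{i-1} Q_j^n \subset \bigcup_{j=1}^{i-1} (Q_j^n \setminus P_j) \]
for i = 1, . . . , k. Similarly, $R_0^n \subset P_0$ and $P_0 \setminus R_0^n = P_0 \cap \bigcup_{j=1}^{k} Q_j^n \subset \bigcup_{j=1}^{k} (Q_j^n \setminus P_j)$.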
These arguments also prove the following fact, which will be useful in a
while:
Exercise 9.1.5 implies that if (f , μ) has entropy zero then the same is true
for any factor. Therefore, the following fact is also an immediate consequence
of the proposition:
n=0 n=0
This proves the inclusion ⊃ in (9.5.4). In this way, we have shown that
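\[ \sum_{n=0}^{\infty} U_f^{-n}\big(L_0^2(M,\mathcal{A},\mu)\big) = L_0^2\Big(M, \bigvee_{n=0}^{\infty} \mathcal{A}_n, \mu\Big). \]
Therefore, the hypothesis that $\bigvee_{n=0}^{\infty} \mathcal{A}_n = \mathcal{B}$ up to measure zero implies that
$\sum_{n=0}^{\infty} U_f^{-n}(E) = L_0^2(M,\mathcal{B},\mu)$.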
This concludes the proof that (f , μ) has Lebesgue spectrum. To prove that
the rank is infinite, we need the following lemma:
Lemma 9.5.10. Let $\mathcal{A}$ be any σ-algebra satisfying the conditions in Definition 9.5.8.
Then for every $A \in \mathcal{A}$ with μ(A) > 0 there exists B ⊂ A such that
0 < μ(B) < μ(A).
Proof. Suppose that A has any element A with positive measure that does not
satisfy the conclusion of the lemma. We claim that A ∩ f −k (A) has measure
zero for every k ≥ 1. Then,
\[ \mu\big( f^{-i}(A) \cap f^{-j}(A) \big) = \mu\big( A \cap f^{-j+i}(A) \big) = 0 \quad \text{for every } 0 \le i < j. \]
Since μ(f −j (A)) = μ(A) for every j ≥ 0, this implies that the measure μ is
infinite, which is a contradiction. This contradiction reduces the proof of the
lemma to proving our claim.
To do that, note that condition (i) implies that $f^{-k}(A) \in f^{-k}(\mathcal{A}) \subset \mathcal{A}$. Then it
follows from the choice of A that A ∩ f −k (A) must have either zero measure or
full measure in A:
\[ \mu\big( A \cap f^{-k}(A) \big) = 0 \quad \text{or else} \quad \mu(A \setminus f^{-k}(A)) = 0. \]
So, to prove the claim it suffices to exclude the second possibility. Suppose
that μ(A \ f −k (A)) = 0. Then (Exercise 1.1.4), there exists $B \in \mathcal{A}$ such that
μ(A Δ B) = 0 and f −k (B) = B. It follows that B = f −nk (B) for every n ≥ 1 and,
thus,
\[ B \in \bigcap_{n \in \mathbb{N}} f^{-nk}(\mathcal{A}) = \bigcap_{n \in \mathbb{N}} f^{-n}(\mathcal{A}). \]
By condition (ii), this means that the measure of B is either 0 or 1. Since
μ(B) = μ(A) is positive, it follows that μ(A) = μ(B) = 1. Then, the hypothesis
about A implies that the σ -algebra A contains only sets with measure 0 or 1.
By condition (iii), it follows that the same is true for the σ -algebra B, which
contradicts the assumption that the probability space is non-trivial.
On the way toward proving that the orthogonal complement F = E ⊖ Uf (E)
has infinite dimension, let us start by checking that F ≠ {0}. Indeed, otherwise
we would have Uf (E) = E and, thus, $U_f^n(E) = E$ for every n ≥ 1. By condition
(ii), that would imply that $E = \bigcap_n U_f^n(E) = \{0\}$. Then, by condition (iii), we
would have $L_0^2(M,\mathcal{B},\mu) = \{0\}$ and that would contradict the hypothesis that the
probability space is non-trivial.
Let ϕ be any non-zero element of F, fixed once and for all, and let N be the
set of all x ∈ M such that ϕ(x) ≠ 0. Then, $N \in \mathcal{A}$ and μ(N) > 0. It is convenient
to consider the space $E' = L^2(M,\mathcal{A},\mu) = E \oplus \{\text{constants}\}$. Observe that F
coincides with $E' \ominus U_f(E')$, because the Koopman operator preserves the line
of constant functions. Let $E'_N$ be the subspace of functions ψ ∈ E′ that vanish
outside N, that is, such that ψ(x) = 0 for every x ∈ N c . By Lemma 9.5.10,
we may find sets $A_j \in \mathcal{A}$, j ≥ 1 with positive measure, contained in N and
pairwise disjoint. Then, $\mathcal{X}_{A_j}$ is in $E'_N$ for every j. Moreover, Ai ∩ Aj = ∅ yields
$\mathcal{X}_{A_i} \cdot \mathcal{X}_{A_j} = 0$ for every i ≠ j. This implies that $E'_N$ has infinite dimension.
Now denote by $U_f(E')_N$ the subspace of functions ψ ∈ Uf (E′) that vanish
outside N. Denote $F_N = E'_N \ominus U_f(E')_N$. The fact that $\dim E'_N = \infty$ ensures that
dim FN = ∞ or $\dim U_f(E')_N = \infty$ (or both). We are going to show that any of
these alternatives implies that dim F = ∞.
To treat the first alternative, it suffices to show that FN ⊂ F. For that, since
it is clear that FN ⊂ E′, it suffices to check that FN is orthogonal to Uf (E′).
Consider any ξ ∈ FN and η ∈ E′. The function $(U_f \eta)\mathcal{X}_N = U_f(\eta \mathcal{X}_{f^{-1}(N)})$ is in
Uf (E′) and vanishes outside N. In other words, it is in $U_f(E')_N$. Then ξ · Uf η =
$\xi \cdot (U_f \eta)\mathcal{X}_N = 0$, because ξ vanishes outside N and is orthogonal to $U_f(E')_N$.
This completes the argument in this case.
Now we treat the second alternative. Given any $U_f \eta \in U_f(E')_N$ and any
n ∈ N, let $\eta_n = \eta \mathcal{X}_{R_n}$ with Rn = {x ∈ M : |η(x)| ≤ n}. Then (ηn )n is a sequence of
bounded functions converging to η in E′. Moreover, every ηn vanishes outside
f −1 (N), because η does. Then, (Uf ηn )n is a sequence of bounded functions
that vanish outside N and, recalling that Uf is an isometry, this sequence
converges to Uf η in E′. This proves that the subspace of bounded functions
is dense in $U_f(E')_N$. Then, since we are assuming that $\dim U_f(E')_N = \infty$, this
subspace must also have infinite dimension. Choose {ξk : k ≥ 1} ⊂ E′ such
that {Uf ξk : k ≥ 1} is a linearly independent subset of $U_f(E')_N$ consisting of
bounded functions. Observe that the products ϕ(Uf ξk ), k ≥ 1 form a linearly
independent subset of E′. Moreover, given any η ∈ E′,
\[ \varphi(U_f \xi_k) \cdot (U_f \eta) = \int \varphi \, (\xi_k \circ f)(\bar{\eta} \circ f) \, d\mu = \int \varphi \, (\xi_k \bar{\eta}) \circ f \, d\mu = \varphi \cdot U_f(\bar{\xi}_k \eta). \]
This last expression is equal to zero because ξ̄k η ∈ E′ and the function ϕ ∈ F is
orthogonal to Uf (E′). Varying η ∈ E′, we conclude that ϕ(Uf ξk ) is orthogonal
to Uf (E′) for every k. This shows that {ϕ(Uf ξk ) : k ≥ 1} is contained in F and,
thus, dim F = ∞ also in this case.
This completes the proof that (f , μ) has infinite rank. When B is countably
generated, $L_0^2(M,\mathcal{B},\mu)$ is separable (Example 8.4.7) and so the rank is
necessarily countable.
We say that a partition of (M, B, μ) is trivial if all its elements have measure
0 or 1. Keep in mind that in the present chapter all partitions are assumed to be
countable.
This observation also implies that, unlike the Kolmogorov property, exactness
is an invariant of spectral equivalence.
We saw in Example 8.4.2 that every one-sided Bernoulli shift has Lebesgue
spectrum. In order to do that, we considered the subspace $E = L_0^2(M,\mathcal{B},\mu)$.
Therefore, the same argument proves that every one-sided Bernoulli shift is an
exact system. A much larger class of examples, expanding maps endowed with
their equilibrium states, is studied in Chapter 12.
It is immediate that invertible systems are never exact: in the invertible case
f −n (B) = B up to measure zero, for every n; therefore, the exactness condition
corresponds to saying that the σ -algebra B is trivial (which is excluded, by
hypothesis).
Figure 9.2 summarizes the relations between the different classes of systems
studied in this book. It is organized in three columns: systems with zero entropy
(which are necessarily invertible, as we saw in Proposition 9.5.5), invertible
systems with positive entropy and non-invertible systems.
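[Figure 9.2: diagram relating ergodic systems, mixing systems, systems with Lebesgue spectrum, systems with discrete spectrum, Kolmogorov systems, exact systems and Bernoulli automorphisms; diagram labels: RT, B2, B1.]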
9.5.5 Exercises
9.5.1. Show that if (f , μ) is a Bernoulli automorphism then it is ergodically equivalent
to its inverse (f −1 , μ). Moreover, for every k ≥ 1 there exists a Bernoulli
automorphism (g, ν) that is a k-th root of (f , μ), that is, such that $(g^k, \nu)$
is ergodically equivalent to (f , μ). [Observation: Ornstein [Orn74] proved
that, conversely, every k-th root of a Bernoulli automorphism is a Bernoulli
automorphism.]
9.5.2. Use the notion of density point to show that the decimal expansion map f (x) =
10x − [10x] is exact, relative to the Lebesgue measure.
9.5.3. Show that the Gauss map is exact, relative to its absolutely continuous invariant
measure μ.
9.5.4. Show that the two-sided Markov shift associated with any aperiodic stochastic
matrix P is a Kolmogorov automorphism.
9.5.5. Show that the one-sided Markov shift associated with any aperiodic stochastic
matrix P is an exact system.
9.5.6. Prove that if (f , μ) is exact then hμ (f , P) > 0 for every non-trivial partition P
with finite entropy.
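Proof. Define φ(x) = −x log x for x > 0. On the one hand, since the function
φ is concave,
\[ \phi\big(t\mu(B) + (1-t)\nu(B)\big) \ge t\phi(\mu(B)) + (1-t)\phi(\nu(B)) \]
for every measurable set B ⊂ M. On the other hand, given any measurable set
B ⊂ M,
\[ \begin{aligned} \phi\big(t\mu(B) + (1-t)\nu(B)\big) &- t\phi(\mu(B)) - (1-t)\phi(\nu(B)) \\ &= -t\mu(B) \log \frac{t\mu(B) + (1-t)\nu(B)}{\mu(B)} - (1-t)\nu(B) \log \frac{t\mu(B) + (1-t)\nu(B)}{\nu(B)} \\ &\le -t\mu(B) \log t - (1-t)\nu(B) \log(1-t) \end{aligned} \]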
because the function − log is decreasing. Therefore, given any partition P with
finite entropy,
Htμ+(1−t)ν (P) ≥ tHμ (P) + (1 − t)Hν (P) and
Htμ+(1−t)ν (P) ≤ tHμ (P) + (1 − t)Hν (P) − t log t − (1 − t) log(1 − t).
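The two bounds on the entropy of a partition with respect to a convex combination of measures can be tested on small examples. The sketch below represents each measure only through the list of masses it assigns to the elements of a finite partition; the function names and the sample vectors are hypothetical.

```python
import math

def partition_entropy(masses):
    """sum over partition elements of -m log m, reading 0 log 0 as 0."""
    return sum(-m * math.log(m) for m in masses if m > 0.0)

def check_bounds(mu, nu, t):
    """Return (lower, middle, upper) for H of P with respect to t*mu + (1-t)*nu.

    lower  = t H_mu(P) + (1 - t) H_nu(P)
    middle = H_{t mu + (1 - t) nu}(P)
    upper  = lower - t log t - (1 - t) log(1 - t)
    Requires 0 < t < 1.
    """
    mix = [t * a + (1.0 - t) * b for a, b in zip(mu, nu)]
    lower = t * partition_entropy(mu) + (1.0 - t) * partition_entropy(nu)
    binary = -t * math.log(t) - (1.0 - t) * math.log(1.0 - t)
    return lower, partition_entropy(mix), lower + binary

print(check_bounds([0.7, 0.2, 0.1], [0.1, 0.1, 0.8], 0.5))
print(check_bounds([1.0, 0.0], [0.0, 1.0], 0.3))  # disjoint supports
```

In the second call the two measures are carried by different partition elements and the upper bound is attained, which shows that the extra term cannot be dropped at the level of a single partition.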
9.6 Entropy and ergodic decomposition 287
Consequently,
where μA and μAc denote the normalized restrictions of μ to the set A and
its complement, respectively (this fact was obtained before, in Exercise 9.1.4).
Another immediate consequence is the following version of Proposition 9.6.1
for finite convex combinations:
\[ \mu = \sum_{i=1}^{n} t_i \mu_i \;\Rightarrow\; h_\mu(f) = \sum_{i=1}^{n} t_i h_{\mu_i}(f), \tag{9.6.3} \]
We are going to deduce this result from a general theorem about affine
functionals in the space of probability measures, that we state in Section 9.6.1
and prove in Section 9.6.2.
Now we treat the general case of the lemma, by reduction to the previous
paragraph. Given any finite partition Q, let $\Sigma = \mathcal{Q}^{\mathbb{N}}$ and
\[ h : M \to \Sigma, \quad h(x) = \big( \mathcal{Q}(f^n(x)) \big)_{n \in \mathbb{N}}. \]
Observe that h ◦ f = σ ◦ h, where σ : Σ → Σ denotes the shift map. To each
measure η on M we may assign the measure η′ = h∗ η on Σ. The previous
relation ensures that if η is invariant under f then η′ is invariant under σ .
Moreover, if η is ergodic for f then η′ is ergodic for σ . Indeed, if B′ ⊂ Σ is
invariant under σ then B = h−1 (B′) is invariant under f . Assuming that η is
ergodic, it follows that η′(B′) = η(B) is either 0 or 1; hence, η′ is ergodic.
By construction, Q = h−1 (Q′), where Q′ denotes the partition of Σ into the
cylinders [0; Q], Q ∈ Q. More generally, $\bigvee_{j=0}^{n-1} f^{-j}(\mathcal{Q}) = h^{-1}\big( \bigvee_{j=0}^{n-1} \sigma^{-j}(\mathcal{Q}') \big)$
and, thus,
\[ H_\eta\Big( \bigvee_{j=0}^{n-1} f^{-j}(\mathcal{Q}) \Big) = H_{\eta'}\Big( \bigvee_{j=0}^{n-1} \sigma^{-j}(\mathcal{Q}') \Big) \]
As the measures μ′P are ergodic, the relation (9.6.6) means that {μ′P : P ∈ P}
is an ergodic decomposition of μ′. Then, according to the previous paragraph,
$h_{\mu'}(\sigma,\mathcal{Q}') = \int h_{\mu'_P}(\sigma,\mathcal{Q}') \, d\hat{\mu}(P)$. By the relation (9.6.5) applied to η = μ and
to η = μP , this may be rewritten as
\[ h_\mu(f,\mathcal{Q}) = \int h_{\mu_P}(f,\mathcal{Q}) \, d\hat{\mu}(P), \]
as we wanted to prove. Note that the argument remains valid even when either
of the two sides of this identity is infinite (then the other one is also infinite).
In this way, we reduced the proof of Theorem 9.6.2 to proving Theorem 9.6.5.
Since H is affine and the expression on the right-hand side is a (finite) convex
combination, it follows that
\[ H\Big( \sum_{i=1}^{\infty} t_i \nu_i \Big) = \sum_{i=1}^{n} t_i H(\nu_i) + (1 - s_n) H(R_n) \ge \sum_{i=1}^{n} t_i H(\nu_i) \]
Proof. Suppose that H is not bounded: there exist νi ∈ M such that $H(\nu_i) \ge 2^i$
for every i ≥ 1. Consider $\nu = \sum_{i=1}^{\infty} 2^{-i} \nu_i$. By Lemma 9.6.8,
\[ H(\nu) \ge \sum_{i=1}^{\infty} 2^{-i} H(\nu_i) = \infty. \]
Now we are ready to prove the inequality ≥ in Theorem 9.6.5. Let us write
μ = bar(W). By the hypothesis of semi-continuity, given any ε > 0 there exist
δ > 0 and a finite family Φ = {φ1 , . . . , φN } of bounded continuous functions
such that
\[ H(\eta) < H(\mu) + \varepsilon \quad \text{for every } \eta \in \mathcal{M} \cap V(\mu, \Phi, \delta). \tag{9.6.8} \]
Since M1 (M) is a separable metric space, it admits a countable basis of open
sets, and then so does any subspace. Let {V1 , . . . , Vn , . . . } be a basis of open sets
of M, with the following properties:
which is precisely what (9.6.9) means. Then, combining (9.6.8), (9.6.9) and
Lemma 9.6.8,
\[ \sum_n W(P_n) H(\nu_n) \le H\Big( \sum_n W(P_n) \nu_n \Big) < H(\mu) + \varepsilon. \]
Adding the last two inequalities, we get that $\int H(\eta) \, dW(\eta) < H(\mu) + 2\varepsilon$. Since
ε > 0 is arbitrary, this implies that $H(\mu) \ge \int H(\eta) \, dW(\eta)$.
Now we prove the inequality ≤ in Theorem 9.6.5. Consider any sequence
(Pn )n of finite partitions of M such that, for every ν ∈ M, the diameter
of P (ν) converges to zero when n goes to infinity. For example, Pn =
*n n
i=1 {Vi , Vi }, where {Vn : n ≥ 1} is any countable basis of open sets of M.
c
and, therefore,
\[ H(\operatorname{bar}(W)) = \sum_{P \in P_n} W(P)\, H(\operatorname{bar}(W_P)). \]
Define Hn (η) = H(bar(WPn (η) )), for each η ∈ M. Then the last identity above
may be rewritten as follows:
\[ H(\operatorname{bar}(W)) = \int H_n(\eta)\, dW(\eta) \quad \text{for every } n. \tag{9.6.10} \]
It follows directly from the definition of Hn that 0 ≤ Hn (η) ≤ sup H for every
n and every η. Recall that sup H < ∞ (Corollary 9.6.9). We also claim that
\[ \limsup_n H_n(\eta) \le H(\eta) \quad \text{for every } \eta \in M. \tag{9.6.11} \]
as we wanted to prove.
Now the proof of Theorems 9.6.2 and 9.6.5 is complete.
9.6.3 Exercises
9.6.1. Check that, given any probability measure W on M1 (M), there exists a unique
probability measure bar(W) ∈ M1 (M) on M that satisfies (9.6.4).
9.6.2. Show that the barycenter function is strongly affine: if Wi , i ≥ 1 are probability
measures on M1 (M) and ti , i ≥ 1 are non-negative numbers with $\sum_{i=1}^{\infty} t_i = 1$,
then
\[ \operatorname{bar}\Big(\sum_{i=1}^{\infty} t_i W_i\Big) = \sum_{i=1}^{\infty} t_i \operatorname{bar}(W_i). \]
9.6.3. Show that if M ⊂ M1 (M) is a closed convex set then M is strongly convex.
Moreover, in that case W(M) = 1 implies that bar(W) ∈ M.
9.6.4. Check that the inequality ≥ in Theorem 9.6.2 may also be obtained through the
following, more direct, argument:
(1) Recalling that the function φ(x) = −x log x is concave, show that Hμ (Q) ≥
∫ HμP (Q) dμ̂(P) for every finite partition Q.
(2) Deduce that hμ (f , Q) ≥ ∫ hμP (f , Q) dμ̂(P) for every finite partition Q.
(3) Conclude that hμ (f ) ≥ ∫ hμP (f ) dμ̂(P).
9.6.5. The inequality ≤ in Theorem 9.6.2 is based on the fact that hμ (f , Q) ≤
∫ hμP (f , Q) dμ̂(P) for every finite partition Q, which is part of Lemma 9.6.6.
Find what is wrong with the following “alternative proof”:
Let Q be a finite partition. The theorem of Shannon–McMillan–Breiman
ensures that hμ (f , Q) = ∫ hμ (f , Q, x) dμ(x), where
\[ h_\mu(f, Q, x) = \lim_n -\frac{1}{n} \log \mu(Q^n(x)) = \lim_n -\frac{1}{n} \log \int \mu_P(Q^n(x))\, d\hat\mu(P). \]
This shows that hμ (f , Q, x) ≤ ∫ hμP (f , Q) dμ̂(P) for every finite partition Q and
μ-almost every x. Consequently, hμ (f , Q) ≤ ∫ hμP (f , Q) dμ̂(P) for every finite
partition Q.
By Lemma 5.2.1, the limit e(XP , x) = limn en (XP , x) exists at μ-almost every
x. So, observing that the function φ is bounded, we may use the dominated
convergence theorem to deduce from (9.7.4)–(9.7.6) that
\[ h_\mu(f) = \sum_{P \in \mathcal{P}} \int \varphi(e(X_P, x))\, d\mu(x). \tag{9.7.7} \]
Now we need to relate the expression inside the integral to the Jacobian. This
we do by means of Lemma 9.7.5 below. Beforehand, let us prove the following
change of variables formulas:
Proof. The definition (9.7.2) means that the formula in part (i) holds for the
characteristic function ϕ = Xf (A) for any invertibility domain A. Thus, it holds
for the characteristic function of any measurable subset of f (A), since such a
subset may be written as f (B) for some invertibility domain B ⊂ A. Hence, by
linearity, the identity extends to every simple function defined on f (A). Using
the monotone convergence theorem, we conclude that the identity holds for
every non-negative measurable function. Using linearity once more, we get
the general statement of part (i).
To deduce the claim in (ii), apply (i) to the function ϕ = (ψ/Jη f ) ◦ (f | A)−1 .
Note that this function is well defined at η-almost every point for, as observed
before, Jη f (x) > 0 for every x in the pre-image of η-almost every y ∈ M.
Proof. Recall that $Q_n = \bigvee_{j=1}^{n} f^{-j}(\mathcal{P})$, that is,
$Q_n(x) = \bigcap_{j=1}^{n} f^{-j}(\mathcal{P}(f^j(x)))$ for each x. We also use the sequence of
partitions $\mathcal{P}^n = \bigvee_{j=0}^{n-1} f^{-j}(\mathcal{P})$. Observe that
$Q_n(x) = f^{-1}(\mathcal{P}^{n-1}(f(x)))$ and $\mathcal{P}^n(x) = \mathcal{P}(x) \cap Q_n(x)$ for every n and every x.
Then,
\[ \int_{\mathcal{P}^{n-1}(f(x))} \hat\psi \, d\eta = \sum_{P \in \mathcal{P}} \int_{f(P) \cap \mathcal{P}^{n-1}(f(x))} \frac{\psi}{J_\eta f} \circ (f \mid P)^{-1} \, d\eta. \]
Therefore,
\[ \int_{\mathcal{P}^{n-1}(f(x))} \hat\psi \, d\eta = \int_{Q_n(x)} \psi \, d\eta. \tag{9.7.8} \]
Let en−1 (ψ̂, x) be the conditional expectation of ψ̂ with respect to the partition
$\mathcal{P}^{n-1}$, as defined in Section 5.2.1, and let e (ψ̂, x) be its limit when n goes to
infinity, given by Lemma 5.2.1. The hypothesis that η is invariant gives that
$\eta(\mathcal{P}^{n-1}(f(x))) = \eta(Q_n(x))$. Dividing both sides of (9.7.8) by this number, we
get that
en−1 (ψ̂, f (x)) = en (ψ, x) for every x and every n > 1. (9.7.9)
Then, taking the limit, e (ψ̂, f (x)) = e(ψ, x) for η-almost every x. On the other
hand, according to Exercise 5.2.3, the hypothesis implies that e (ψ̂, y) = ψ̂(y)
for η-almost every y ∈ M.
(the last step uses the identity in part (ii) of Lemma 9.7.4). Replacing this
expression in (9.7.7), we get that
\[ h_\mu(f) = \sum_{P \in \mathcal{P}} \int_P \log J_\mu f \, d\mu = \int \log J_\mu f \, d\mu, \]
9.7.1 Exercises
9.7.1. Check that the definition of a Jacobian does not depend on the choice of the cover
{Uk : k ≥ 1} by invertibility domains.
9.7.2. Let σ : Σ → Σ be the shift map in $\Sigma = \{1, 2, \dots, d\}^{\mathbb{N}}$ and μ be the Markov
measure associated with an aperiodic matrix P. Find the Jacobian of σ with
respect to μ.
9.7.3. Let f : M → M be a locally invertible transformation and η be a probability
measure on M, non-singular with respect to f . Show that for every bounded
measurable function ψ : M → R,
\[ \int \psi \, d\eta = \int \sum_{z \in f^{-1}(x)} \frac{\psi}{J_\eta f}(z) \, d\eta(x). \]
Assuming that f is invertible, what can be said about the Jacobian of f −1 with
respect to η?
9.7.6. Let f : M → M and g : N → N be locally invertible transformations and let μ
and ν be probability measures invariant under f and g, respectively. Assume that
there exists an ergodic equivalence φ : M → N between the systems (f , μ) and
(g, ν). Show that Jμ f = Jν g ◦ φ at μ-almost every point.
always exists and is finite. It is called the entropy of f with respect to the open
cover α. The relation (10.1.2) implies that
α≺β ⇒ h(f , α) ≤ h(f , β). (10.1.4)
Finally, we define the topological entropy of f to be
h(f ) = sup{h(f , α) : α is an open cover of M}. (10.1.5)
In particular, if β is a subcover of α then h(f , α) ≤ h(f , β). Therefore, the
definition (10.1.5) does not change when one restricts the supremum to the
finite open covers.
Observe that the entropy h(f ) is a non-negative number, possibly infinite
(see Exercise 10.1.6).
Example 10.1.1. Let f : S1 → S1 be any homeomorphism (for example, a
rotation Rθ ) and let α be an open cover of the circle formed by a finite
number of open intervals. Let ∂α be the set consisting of the endpoints of
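those intervals. For each n ≥ 1, the open cover $\alpha^n$ is formed by intervals whose
endpoints are in
\[ \partial\alpha^n = \partial\alpha \cup f^{-1}(\partial\alpha) \cup \cdots \cup f^{-n+1}(\partial\alpha). \]
Note that $\#\alpha^n \le \#\partial\alpha^n \le n\,\#\partial\alpha$. Therefore,
\[ h(f, \alpha) = \lim_n \frac{1}{n} H(\alpha^n) \le \liminf_n \frac{1}{n} \log \#\alpha^n \le \liminf_n \frac{1}{n} \log (n\,\#\partial\alpha) = 0. \]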
Proposition 10.1.12 below gives that h(f ) = limk h(f , αk ) for any sequence
of open covers αk with diam αk → 0. Then, considering open covers αk by
intervals of length less than 1/k, we conclude from the previous calculation
that h(f ) = 0 for every homeomorphism of the circle.
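The counting step of Example 10.1.1 can be checked numerically. The following is a minimal sketch, not part of the original text: it takes a rotation with a hypothetical angle and a hypothetical set of endpoints ∂α, collects the points of ∂α^n, and compares their number with the bound n #∂α. The function name `cut_points` and all numerical values are assumptions made for the illustration.

```python
import math

def cut_points(theta, boundary, n):
    """Points of boundary, f^{-1}(boundary), ..., f^{-n+1}(boundary)
    for the rotation f(x) = x + theta (mod 1)."""
    points = set()
    for j in range(n):
        for b in boundary:
            # f^{-j}(b) = b - j*theta (mod 1); rounding merges float duplicates
            points.add(round((b - j * theta) % 1.0, 12))
    return points

theta = (math.sqrt(5) - 1) / 2      # hypothetical rotation angle
boundary = [0.0, 0.3, 0.5, 0.8]     # hypothetical endpoints of the intervals of alpha

for n in (10, 100, 1000):
    count = len(cut_points(theta, boundary, n))
    # columns: n, number of cut points, the bound n * #boundary, (1/n) log(count)
    print(n, count, n * len(boundary), math.log(count) / n)
```

The last column decreases towards zero as n grows, in agreement with h(f , α) = 0.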
Example 10.1.2. Let $\Sigma = \{1, \dots, d\}^{\mathbb{N}}$ and α be the cover of Σ by the cylinders
[0; a], a = 1, . . . , d. Consider the shift map σ : Σ → Σ. For each n, the open
cover $\alpha^n$ consists of the cylinders of length n:
\[ \alpha^n = \{[0; a_0, \dots, a_{n-1}] : a_j = 1, \dots, d\}. \]
\[ \bigcap_{j=0}^{n-1} f^{-j}\big(\theta^{-1}(A_j)\big) = \bigcap_{j=0}^{n-1} \theta^{-1}\big(g^{-j}(A_j)\big) = \theta^{-1}\Big(\bigcap_{j=0}^{n-1} g^{-j}(A_j)\Big). \]
Noting that the sets of the form on the right-hand side of this identity constitute
the pre-image θ −1 (α n ) of α n , we conclude that θ −1 (α n ) = θ −1 (α)n . Since θ is
surjective, a family γ ⊂ α n covers N if and only if θ −1 (γ ) covers M. Therefore,
H(θ −1 (α)n ) = H(θ −1 (α n )) = H(α n ).
Since n is arbitrary, it follows that h(f , θ −1 (α)) = h(g, α). Then, taking the
supremum over all the open covers α of N:
\[ h(g) = \sup_{\alpha} h(g, \alpha) = \sup_{\alpha} h(f, \theta^{-1}(\alpha)) \le h(f). \]
This proves the first part of the proposition. The second part is an immediate
consequence, since in that case f is also a factor of g.
Therefore, the hypothesis that E is a generating set implies that the family
$\gamma = \{\bigcap_{i=0}^{n-1} f^{-i}(A_{x,i}) : x \in E\}$ is an open cover of M. Since $\gamma \subset \alpha^n$, it follows
that $N(\alpha^n) \le \#E = g_n(f, \varepsilon, M)$ for every n. Therefore,
\[ h(f, \alpha) = \lim_n \frac{1}{n} \log N(\alpha^n) \le \liminf_n \frac{1}{n} \log g_n(f, \varepsilon, M) \le \limsup_n \frac{1}{n} \log g_n(f, \varepsilon, M) = g(f, \varepsilon, M). \tag{10.1.15} \]
Making ε → 0, we get that h(f , α) ≤ g(f , M) = g(f ). Since the open cover α is
arbitrary, it follows that h(f ) ≤ g(f ).
Proof. The main point is to show that the open covers (α k )n and α n+k−1 have
the same entropy, for every n ≥ 1. We use the following simple fact, which will
be useful again later:
Lemma 10.1.11. Given any open cover α and any n, k ≥ 1,
1. α n+k−1 is a subcover of (α k )n and, in particular, (α k )n ≺ α n+k−1 ;
2. for any subcover β of (α k )n there exists a subcover γ of α n+k−1 such that
#γ ≤ #β and γ ≺ β.
Proof. By definition, every element of $\alpha^{n+k-1}$ has the form $B = \bigcap_{l=0}^{n+k-2} f^{-l}(B_l)$
with $B_l \in \alpha$ for every l. It is clear that this may be written in the form
$B = \bigcap_{i=0}^{n-1} f^{-i}\big(\bigcap_{j=0}^{k-1} f^{-j}(B_{i+j})\big)$ and, thus, $B \in (\alpha^k)^n$. This proves the first claim.
According to the relation (10.1.2), the first part of Lemma 10.1.11 implies
that H((α k )n ) ≤ H(α n+k−1 ). Clearly, the second part of the lemma implies the
opposite inequality. Hence,
H(α n+k−1 ) = H((α k )n ) for any n, k ≥ 1, (10.1.19)
as we claimed. Therefore,
\[ h(f, \alpha^k) = \lim_n \frac{1}{n} H((\alpha^k)^n) = \lim_n \frac{1}{n} H(\alpha^{n+k-1}) = h(f, \alpha) \quad \text{for every } k. \]
The next proposition and its corollary simplify the calculation of the
topological entropy significantly in concrete examples. Recall that, when M
is a metric space, the diameter of an open cover is defined to be the supremum
of the diameters of its elements.
Proposition 10.1.12. Assume that M is a compact metric space. Let (βk )k be
any sequence of open covers of M such that diam βk converges to zero. Then
\[ h(f) = \sup_k h(f, \beta_k) = \lim_k h(f, \beta_k). \]
Proof. Given any open cover α, let ε > 0 be a Lebesgue number of α. Take
n ≥ 1 such that diam βk < ε for every k ≥ n. By the definition of Lebesgue
number, it follows that every element of βk is contained in some element of α.
In other words, α ≺ βk and, hence, h(f , βk ) ≥ h(f , α). In view of the definition
(10.1.5), this proves that
\[ \liminf_k h(f, \beta_k) \ge h(f). \]
It is also clear from the definitions that h(f ) ≥ supk h(f , βk ) ≥ lim supk h(f , βk ).
Combining these observations, we obtain the conclusion of the proposition.
Next, we check that the topological entropy behaves as one could expect
with respect to positive iterates, at least when the transformation is uniformly
continuous:
Making ε → 0 and taking the supremum over K, we see that h(f k ) ≤ kh(f ).
The proof of the other inequality uses the assumption that f is uniformly
continuous. Take δ > 0 such that d(x, y) < δ implies d(f j (x), f j (y)) < ε for
every j ∈ {0, . . . , k − 1}. If E ⊂ M is an (n, δ)-generating set of K for f k then
E is an (nk, ε)-generating set of K for f . Therefore, gnk (f , ε, K) ≤ gn (f k , δ, K).
This shows that kg(f , ε, K) ≤ g(f k , δ, K). Making ε and δ go to zero, we get that
kg(f , K) ≤ g(f k , K) for every compact set K. Hence, kh(f ) ≤ h(f k ).
Since α is arbitrary, this proves that h(f ) = h(f −1 ). The second part of the
statement follows from combining the first part with Proposition 10.1.14.
The claim in Proposition 10.1.15 is generally false when the space M is not
compact:
This proves that $g_n(f, \varepsilon, K) \ge 2^{n-2}/\varepsilon$ for every n and, thus, g(f , ε, K) ≥ log 2. It
follows that h(f ) ≥ g(f , K) ≥ log 2. On the other hand, f −1 is a contraction and
so it follows from Example 10.1.7 that its topological entropy h(f −1 ) is zero.
10.1.4 Exercises
10.1.1. Let M be a compact topological space. Show that if α and β are open covers of
M such that α ≺ β then H(α) ≤ H(β).
10.1.2. Let f : M → M be a continuous transformation and α, β be open covers of
a compact topological space M. Show that H(α ∨ β) ≤ H(α) + H(β) and
H(f −1 (β)) ≤ H(β). Check that if f is surjective then H(f −1 (β)) = H(β).
10.1.3. Let M be a compact topological space. Show that if f : M → M is a surjective
continuous transformation and β is an open cover of M then h(f , β) =
h(f , f −1 (β)). Moreover, if f is a homeomorphism then h(f , β) = h(f , f (β)).
10.1.4. Let M = (0, ∞) and f : M → M be given by f (x) = 2x. Calculate the topological
entropy of f when one considers in M:
(a) the usual distance d(x, y) = |x − y|;
(b) the distance d(x, y) = | log x − log y|.
[Observation: Hence, in non-compact spaces the topological entropy may
depend on the distance function, not just the topology.]
10.1.5. Consider in M two distances d1 and d2 that are uniformly equivalent: for every
ε > 0 there exists δ > 0 such that
10.2 Examples
Let us use a few concrete situations to illustrate the ideas introduced in the
previous section.
(i) h(f ) = h(f , α) for every open cover α with diameter less than ε0 ;
(ii) h(f ) = g(f , ε, M) = s(f , ε, M) for every ε < ε0 /2.
Proof. Let α be any open cover of M with diameter less than ε0 . We claim that
limk diam α k = 0. Indeed, suppose that this is not so. It is clear that the sequence
of diameters is non-increasing. Then, there exists δ > 0 and for each k ≥ 1
there exist points xk and yk in the same element of α k such that d(xk , yk ) ≥ δ.
By compactness, we may find a subsequence (kj )j such that both x = limj xkj
and y = limj ykj exist. On the one hand, d(x, y) ≥ δ and so x ≠ y. On the other
hand, the fact that xk and yk are in the same element of α k implies that
d(f i (xk ), f i (yk )) ≤ diam α for every 0 ≤ i < k.
Passing to the limit, we get that d(f i (x), f i (y)) ≤ diam α < ε0 for every i ≥ 0.
This contradicts the hypothesis that ε0 is a constant of expansivity for f . This
contradiction proves our claim. Using Corollary 10.1.13, it follows that h(f ) =
h(f , α), as claimed in part (i).
To prove part (ii), let α be the open cover of M formed by the balls of
radius ε. Note that $\alpha^n$ contains every dynamical ball B(x, n, ε):
\[ B(x, n, \varepsilon) = \bigcap_{j=0}^{n-1} f^{-j}\big(B(f^j(x), \varepsilon)\big) \quad \text{and each } B(f^j(x), \varepsilon) \in \alpha. \]
d(f i (x), f i (y)) < diam α < ε0 for every i = 0, . . . , n − 1. Since f n (x) = x and
f n (y) = y, it follows that d(f i (x), f i (y)) < ε0 for every i ≥ 0. By expansivity,
this implies that x = y, which proves our claim. It follows that
\[ \limsup_n \frac{1}{n} \log \# \operatorname{Fix}(f^n) \le \limsup_n \frac{1}{n} \log N(\alpha^n) = h(f, \alpha). \]
Taking the limit when the diameter of α goes to zero, we get the conclusion of
the proposition.
In some interesting situations, one can show that the topological entropy
actually coincides with the rate of growth of the number of periodic points:
\[ \lim_n \frac{1}{n} \log \# \operatorname{Fix}(f^n) = h(f). \tag{10.2.1} \]
That is the case, for example, for the shifts of finite type, which we are going
to study in Section 10.2.2 (check Proposition 10.2.5 below). More generally,
(10.2.1) holds whenever f : M → M is an expanding transformation in a
compact metric space, as we are going to see in Section 11.3.
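As an illustration of (10.2.1), here is a minimal sketch under the assumption that the expanding transformation is the doubling map f(x) = 2x mod 1 of the circle, an example chosen here and not taken from the text. A point x is fixed by f^n exactly when (2^n − 1)x is an integer, so the code checks the candidates k/(2^n − 1) with exact rational arithmetic and prints the growth rate of their number. That the topological entropy of the doubling map equals log 2 is a standard fact which is not derived in this section.

```python
from fractions import Fraction
import math

def doubling(x):
    """The doubling map f(x) = 2x (mod 1) on the circle R/Z."""
    return (2 * x) % 1

def fixed_points(n):
    """Fixed points of f^n among the candidates k/(2^n - 1), checked exactly."""
    q = 2 ** n - 1
    found = []
    for k in range(q):
        x = Fraction(k, q)
        y = x
        for _ in range(n):
            y = doubling(y)
        if y == x:
            found.append(x)
    return found

for n in (2, 4, 8, 12):
    count = len(fixed_points(n))
    # columns: n, # Fix(f^n), (1/n) log # Fix(f^n); compare with log 2
    print(n, count, math.log(count) / n)
```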
[Figure 10.1: oriented graph on the vertices 1, 2, 3, 4.]
that the columns of P are also not zero; this is automatic, for example, if the
matrix P is aperiodic). Comparing (7.2.7) and (10.2.2), we see that a sequence
is A-admissible if and only if it is P-admissible. Let μ be the Markov measure
determined by a probability vector p = (pj )j with positive coefficients and
such that P∗ p = p (recall Example 7.2.2). By Lemma 7.2.5, the support of
μ coincides with the set $\Sigma_A = \Sigma_P$ of all admissible sequences.
It is useful to associate with any transition matrix A the oriented graph whose
vertices are the points of X = {1, . . . , d} and such that there exists an edge from
vertex a to vertex b if and only if Aa,b = 1. In other words,
GA = {(a, b) ∈ X × X : Aa,b = 1}.
For example, Figure 10.1 describes the graph associated with the matrix
\[ A = \begin{pmatrix} 0 & 1 & 1 & 0 \\ 1 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \end{pmatrix}. \]
A path of length l ≥ 1 in the graph GA is a sequence a0 , . . . , al in X such that
Aai−1 ,ai = 1 for every i, that is, such that there always exists an edge connecting
ai−1 to ai . Given a, b ∈ X and l ≥ 1, denote by $A^l_{a,b}$ the number of paths of length
l starting at a and ending at b, that is, with a0 = a and al = b. Observe that:
where trc denotes the trace of the matrix and ‖·‖ denotes any norm in
the vector space of linear maps (all norms are equivalent, as we are in
finite dimension). Most of the time, one uses the operator norm ‖B‖ =
sup{‖Bv‖/‖v‖ : v ≠ 0}, but it will also be useful to consider the norm ‖·‖s
defined by
\[ \|B\|_s = \sum_{i,j=1}^{d} |B_{i,j}|. \]
Proof. We treat the case of one-sided shifts; the two-sided case is analogous,
as the reader may readily check. Consider the open cover α of $\Sigma_A$ formed by
the restrictions
\[ [0; a]_A = \{(x_j)_j \in \Sigma_A : x_0 = a\} \]
of the cylinders [0; a] of Σ. For each n ≥ 1, the open cover $\alpha^n$ is formed by the
restrictions
\[ [0; a_0, \dots, a_{n-1}]_A = \{(x_j)_j \in \Sigma_A : x_j = a_j \text{ for } j = 0, \dots, n-1\} \]
of the cylinders of length n. Observe that [0; a0 , . . . , an−1 ]A is non-empty if and
only if a0 , . . . , an−1 is a path (of length n − 1) in the graph GA : it is evident that
this condition is necessary; to see that it is also sufficient, use the assumption
that for every i there exists j such that Ai,j = 1. Since the cylinders are pairwise
disjoint, this observation shows that N(α n ) is equal to the total number of paths
of length n − 1 in the graph GA . In other words,
\[ N(\alpha^n) = \sum_{i,j=1}^{d} A^{n-1}_{i,j} = \|A^{n-1}\|_s. \]
Finally, since diam α n → 0, Corollary 10.1.13 yields that h(σA ) = h(σA , α).
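The identity between N(α^n) and the norm of A^{n−1} can be checked by brute force. The sketch below is an illustration only: it uses the 4 × 4 transition matrix displayed earlier in this section, enumerates all words of length n, and compares the number of admissible ones with the sum of the entries of A^{n−1}. The helper names (`mat_mul`, `mat_pow`, `norm_s`, `count_cylinders`, `entropy_estimate`) are hypothetical.

```python
import math
from itertools import product

# Transition matrix of the example (vertices 1..4 stored as indices 0..3)
A = [
    [0, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
]

def mat_mul(X, Y):
    d = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(d)) for j in range(d)]
            for i in range(d)]

def mat_pow(X, n):
    d = len(X)
    result = [[int(i == j) for j in range(d)] for i in range(d)]  # identity
    for _ in range(n):
        result = mat_mul(result, X)
    return result

def norm_s(X):
    """Sum of the absolute values of all entries."""
    return sum(abs(entry) for row in X for entry in row)

def count_cylinders(A, n):
    """Number of words a_0, ..., a_{n-1} that are paths in the graph (brute force)."""
    d = len(A)
    return sum(
        all(A[w[i]][w[i + 1]] == 1 for i in range(n - 1))
        for w in product(range(d), repeat=n)
    )

def entropy_estimate(A, n):
    """(1/n) log of the sum of the entries of A^(n-1)."""
    return math.log(norm_s(mat_pow(A, n - 1))) / n

for n in range(1, 7):
    print(n, count_cylinders(A, n), norm_s(mat_pow(A, n - 1)))
print(entropy_estimate(A, 200))
```

The two counts agree for each n, and the last line gives a numerical approximation of h(σA , α) for this particular matrix.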
Proof. We treat the case of one-sided shifts, leaving the two-sided case for the
reader. Note that $(x_k)_k \in \Sigma_A$ is a fixed point of $\sigma_A^n$ if and only if $x_k = x_{k-n}$ for
every k ≥ n. In particular, every cylinder $[0; a_0, \dots, a_{n-1}]_A$ contains at most one
element of $\operatorname{Fix}(\sigma_A^n)$. Moreover, the cylinder does contain a fixed point if and
only if a0 , . . . , an−1 , a0 is a path (of length n) in the graph GA . This proves that
\[ \# \operatorname{Fix}(\sigma_A^n) = \sum_{i=1}^{d} A^n_{i,i} = \operatorname{trc} A^n \]
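The trace formula can also be verified by enumeration. In the sketch below (an illustration with hypothetical helper names), a word a_0, ..., a_{n−1} is counted when a_0, ..., a_{n−1}, a_0 is a path in the graph, and the total is compared with the trace of A^n for the 4 × 4 matrix displayed earlier in this section.

```python
from itertools import product

# Transition matrix of the example (vertices 1..4 stored as indices 0..3)
A = [
    [0, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
]

def mat_mul(X, Y):
    d = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(d)) for j in range(d)]
            for i in range(d)]

def trace_of_power(A, n):
    """Trace of A^n, computed by repeated multiplication (n >= 1)."""
    power = A
    for _ in range(n - 1):
        power = mat_mul(power, A)
    return sum(power[i][i] for i in range(len(A)))

def count_fixed_points(A, n):
    """Number of words a_0..a_{n-1} such that a_0, ..., a_{n-1}, a_0 is a path."""
    d = len(A)
    return sum(
        all(A[w[i]][w[(i + 1) % n]] == 1 for i in range(n))
        for w in product(range(d), repeat=n)
    )

for n in range(1, 7):
    print(n, count_fixed_points(A, n), trace_of_power(A, n))
```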
and we say that E ⊂ K is a (T, ε)-separated set if the dynamical ball B(x, T, ε)
of each x ∈ E contains no other element of E.
\[ s(\varphi, K) = \lim_{\varepsilon \to 0} \limsup_{T \to \infty} \frac{1}{T} \log s_T(\varphi, \varepsilon, K) \]
and define
\[ g(\varphi) = \sup_K g(\varphi, K) \quad \text{and} \quad s(\varphi) = \sup_K s(\varphi, K), \]
where both suprema are taken over all the compact sets K ⊂ M.
The next result, a continuous-time analogue of Proposition 10.1.4, ensures
that these two last numbers coincide. We leave the proof up to the reader
(Exercise 10.2.3). By definition, the topological entropy of the flow φ is the
number h(φ) = g(φ) = s(φ).
d(x, y) < δ ⇒ d(φ t (x), φ t (y)) < ε for every t ∈ [−T, T].
(given a sequence (Tj )j that realizes the supremum on the left-hand side,
consider the sequence (nj )j given by nj = [Tj ]+1). Making ε → 0 (then δ → 0),
we get that g(φ, K) ≤ g(φ 1 , K).
point of $\delta\mathbb{Z}^d \cap (-1, 1)^d$. Therefore, for any $\phi_\alpha \in A_K$, every $x \in \phi_\alpha^{-1}((-1, 1)^d)$
is at a distance less than $B\delta\sqrt{d}$ from some point $a \in \phi_\alpha^{-1}(\delta\mathbb{Z}^d \cap (-1, 1)^d)$. Then,
by the choice of δ,
\[ d(f^j(x), f^j(a)) \le L^j B \delta \sqrt{d} < L^n B \delta \sqrt{d} = \varepsilon \]
for every j = 0, . . . , n − 1. This proves that E is an (n, ε)-generating set for K.
On the other hand, by construction,
\[ \#E \le \#A_K \, \#\big(\delta\mathbb{Z}^d \cap (-1, 1)^d\big) \le \#A_K (2/\delta)^d \le \#A_K \big(2B\sqrt{d}\,L^n/\varepsilon\big)^d, \]
so the expression on the right-hand side is an upper bound for gn (f , ε, K).
Consequently,
\[ g(f, \varepsilon, K) \le \limsup_n \frac{1}{n} \log \big(2B\sqrt{d}\,L^n/\varepsilon\big)^d = d \log L. \]
n n
Making ε → 0 and taking the supremum over K, we get that h(f ) ≤ d log L.
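The last limit is plain arithmetic: (1/n) log (2B√d L^n/ε)^d = d log L + (d/n) log(2B√d/ε), and the second term goes to zero. A short numerical sketch, with hypothetical values for d, L, B and ε, makes the convergence visible; the function name `generating_bound_rate` is an assumption of this illustration.

```python
import math

def generating_bound_rate(n, d, L, B, eps):
    """(1/n) * log of (2*B*sqrt(d)*L**n/eps)**d, computed with logarithms
    to avoid overflow for large n."""
    return d * (math.log(2 * B * math.sqrt(d) / eps) + n * math.log(L)) / n

d, L, B, eps = 2, 3.0, 5.0, 1e-3   # hypothetical dimension and constants
for n in (10, 1000, 10 ** 6):
    # the rate approaches d * log(L) as n grows
    print(n, generating_bound_rate(n, d, L, B, eps), d * math.log(L))
```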
where each ρ(fk ) denotes the spectral radius of the action fk : Hk (M) → Hk (M)
induced by f in the real homology of dimension k.
The full statement of the conjecture remains open to date, but several partial
answers and related results have been obtained, both positive and negative. Let
us summarize what is known in this regard.
It follows from a result of Yano [Yan80] that the inequality (10.2.4) is true
for an open and dense subset of the space of homeomorphisms in any manifold
of dimension d ≥ 2. Moreover, it is true for every homeomorphism in certain
classes of manifolds, such as the spheres or the infranilmanifolds [MP77b,
MP77a, MP08]. On the other hand, Shub [Shu74] exhibited a Lipschitz
homeomorphism, with zero topological entropy, for which (10.2.4) is false.
See Exercise 10.2.7.
A useful way to approach (10.2.4) is by comparing the topological entropy
with each one of the spectral radii ρ(fk ). The case k = d is relatively easy.
Indeed, for any continuous map f in a manifold of dimension d, the spectral
radius ρ(fd ) is equal to the absolute value | deg f | of the degree of the map. In
particular, the inequality h(f ) ≥ log ρ(fd ) is trivial for any homeomorphism.
For non-invertible continuous maps, the topological entropy may be less than
the logarithm of the absolute value of the degree. However, it was shown in
[MP77b] that for differentiable maps one always has h(f ) ≥ log | deg f |.
Anthony Manning [Man75] proved that the inequality h(f ) ≥ log ρ(f1 ) is
true for every homeomorphism in a manifold of any dimension d. It follows
that h(f ) ≥ log ρ(fd−1 ), since the duality theorem of Poincaré implies that
ρ(fk ) = ρ(fd−k ) for every 0 < k < d.
In particular, the theorem of Manning together with the observations in
the previous paragraph prove that the entropy conjecture is true for every
homeomorphism in any manifold of dimension d ≤ 3.
Rufus Bowen [Bow78] proved that for any homeomorphism in a manifold
the topological entropy h(f ) is greater than or equal to the logarithm of the rate
of growth of the fundamental group. One can show that this rate of growth is
greater than or equal to ρ(f1 ). Thus, this result of Bowen implies the theorem
of Manning that we have just mentioned.
The main result concerning the entropy conjecture is the theorem of
Yosef Yomdin [Yom87], according to which the conjecture is true for every
diffeomorphism of class C∞ . The crucial ingredient in the proof is a relation
between the topological entropy h(f ) and the diffeomorphism’s rate of growth
of volume, which is defined as follows. For each 1 ≤ k < d, let Bk be the unit
ball in Rk . Denote by v(σ ) the k-dimensional volume of the image of any
differentiable embedding σ : Bk → M. Then, define
\[ v_k(f) = \sup_{\sigma} \limsup_n \frac{1}{n} \log v(f^n \circ \sigma), \]
where the supremum is taken over all the embeddings σ : Bk → M of class C∞ .
Define also v(f ) = max{vk (f ) : 1 ≤ k < d}. It is not difficult to check that
log ρ(fk ) ≤ vk (f ) for every 1 ≤ k < d. (10.2.5)
On the one hand, Sheldon Newhouse [New88] proved that h(f ) ≤ v(f ) for every
diffeomorphism of class Cr with r > 1. On the other hand, Yomdin [Yom87]
proved the opposite inequality:
v(f ) ≤ h(f ), (10.2.6)
for every diffeomorphism of class C∞ (this inequality is false, in general, in the
Cr case with r < ∞). Combining (10.2.5) with (10.2.6), one gets the entropy
conjecture (10.2.4) for every diffeomorphism of class C∞ .
Concerning systems of class C1 , it is also known that the inequality (10.2.4)
is true for every Axiom A diffeomorphism with no cycles [SW75], for certain
partially hyperbolic diffeomorphisms [SX10] and, more generally, for any C1
diffeomorphism far from homoclinic tangencies [LVY13].
Initially, assume that A is diagonalizable, that is, that there exists a basis
v1 , . . . , vd of Rd with Avi = λi vi for each i. Then, clearly, we may take the
elements of such a basis to be unit vectors. Moreover, up to renumbering the
eigenvalues, we may assume that there exists u ∈ {0, . . . , d} such that |λi | > 1 for
1 ≤ i ≤ u and |λi | ≤ 1 for every i > u. Let e1 , . . . , ed be the canonical basis of Rd
and P : Rd → Rd be the linear isomorphism defined by P(ei ) = vi for each i.
Then $P^{-1}AP$ is a diagonal matrix. Fix L > 0 large enough so that $P((0, L)^d)$
contains some unit cube $\prod_{i=1}^{d} [b_i, b_i + 1]$. See Figure 10.2. Let $\pi : \mathbb{R}^d \to \mathbb{T}^d$
be the canonical projection. Then $\pi P((0, L)^d)$ contains the whole torus $\mathbb{T}^d$.
Given n ≥ 1 and ε > 0, fix δ > 0 such that $\|P\|\delta\sqrt{d} < \varepsilon$. Moreover, for each
i = 1, . . . , d, take
\[ \delta_i = \begin{cases} \delta|\lambda_i|^{-n} & \text{if } i \le u \\ \delta & \text{if } i > u. \end{cases} \]
Consider the set
\[ E = \pi P\big(\{(k_1\delta_1, \dots, k_d\delta_d) \in (0, L)^d : k_1, \dots, k_d \in \mathbb{Z}\}\big). \]
Observe also that, given any j ≥ 0,
\[ f_A^j(E) \subset \pi P\big(\{(k_1\lambda_1^j\delta_1, \dots, k_d\lambda_d^j\delta_d) : k_1, \dots, k_d \in \mathbb{Z}\}\big). \]
[Figure 10.2: the image under P of the cube with side L contains a unit cube with edges [b1 , b1 + 1] and [b2 , b2 + 1].]
Consider 0 ≤ j < n. By construction, $|\lambda_i^j \delta_i| \le \delta$ for every i = 1, . . . , d. Therefore,
every point of $\mathbb{R}^d$ is at a distance less than or equal to $\delta\sqrt{d}$ from some point
of the form $(k_1\lambda_1^j\delta_1, \dots, k_d\lambda_d^j\delta_d)$. Then (see Figure 10.2), for each $x \in \mathbb{T}^d$ we
may find a ∈ E such that $d(f^j(x), f^j(a)) \le \|P\|\delta\sqrt{d} < \varepsilon$ for every 0 ≤ j < n. This
shows that E is an (n, ε)-generating set for $\mathbb{T}^d$. On the other hand,
\[ \#E \le \prod_{i=1}^{d} \frac{L}{\delta_i} = \Big(\frac{L}{\delta}\Big)^d \prod_{i=1}^{u} |\lambda_i|^n. \]
These observations show that $g_n(f_A, \varepsilon, \mathbb{T}^d) \le (L/\delta)^d \prod_{i=1}^{u} |\lambda_i|^n$ for every n ≥ 1
and ε > 0. Hence,
\[ h(f_A) = \lim_{\varepsilon \to 0} \limsup_n \frac{1}{n} \log g_n(f_A, \varepsilon, \mathbb{T}^d) \le \sum_{i=1}^{u} \log |\lambda_i| = \sum_{i=1}^{d} \log^+ |\lambda_i|. \]
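The right-hand side of this bound is easy to evaluate in concrete cases. The following sketch, an illustration restricted to dimension d = 2, computes the eigenvalues of a 2 × 2 integer matrix from its trace and determinant and sums log^+ of their absolute values. The sample matrices and the function names are assumptions of this example, and the code only evaluates the upper bound obtained above.

```python
import cmath
import math

def eigenvalues_2x2(A):
    """Eigenvalues of a 2x2 matrix via the characteristic polynomial."""
    (a, b), (c, d) = A
    tr, det = a + d, a * d - b * c
    root = cmath.sqrt(tr * tr - 4 * det)
    return (tr + root) / 2, (tr - root) / 2

def log_plus(x):
    """log^+ x = max(log x, 0) for x >= 0."""
    return math.log(x) if x > 1 else 0.0

def entropy_upper_bound(A):
    """Sum of log^+ |lambda_i| over the eigenvalues of A."""
    return sum(log_plus(abs(lam)) for lam in eigenvalues_2x2(A))

cat_map = [[2, 1], [1, 1]]     # hypothetical example: eigenvalues (3 +- sqrt(5))/2
diagonal = [[2, 0], [0, 3]]    # hypothetical example: eigenvalues 2 and 3
rotation = [[0, -1], [1, 0]]   # hypothetical example: eigenvalues +-i, both of modulus 1

for name, M in (("cat_map", cat_map), ("diagonal", diagonal), ("rotation", rotation)):
    print(name, entropy_upper_bound(M))
```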
10.2.6 Exercises
10.2.1. Let (Mi , di ), i = 1, 2 be metric spaces and fi : Mi → Mi , i = 1, 2 be continuous
transformations. Let M = M1 × M2 , d be the distance defined in M by
and f : M → M be the transformation defined by f (x1 , x2 ) = (f1 (x1 ), f2 (x2 )). Show
that h(f ) ≤ h(f1 ) + h(f2 ) and the identity holds if at least one of the spaces is
compact.
10.2.2. Let $\sigma_A : \Sigma_A \to \Sigma_A$ be a shift of finite type, either one-sided or two-sided. We
say that a transition matrix A is irreducible if for any i, j ∈ X there exists n ≥ 1
such that $A^n_{i,j} > 0$ and that A is aperiodic if there exists n ≥ 1 such that $A^n_{i,j} > 0$
for every i, j ∈ X. Show that:
(a) If A is irreducible then the set of periodic points of σA is dense in $\Sigma_A$.
where B(x, ∞, ε) denotes the set of all y ∈ M such that $d(f^i(x), f^i(y)) \le \varepsilon$ for
every i ≥ 0. Bowen [Bow72] has shown that, given b > 0 and δ > 0, there exists
c > 0 such that
Using this fact, prove that h(f ) ≤ g(f , ε, M) + g∗ (f , ε). One says that f is
h-expansive if g∗ (f , ε) = 0 for some ε > 0. Conclude that in that case
h(f ) = g(f , ε, M). [Observation: This generalizes Proposition 10.2.1, since every
expansive transformation is also h-expansive.]
10.3 Pressure
In this section we introduce an important extension of the concept of
topological entropy, called (topological) pressure, and we study its main
properties. Throughout, we consider only continuous transformations in
compact metric spaces. Related to this, check Exercises 10.3.4 and 10.3.5.
\[ P_n(f, \varphi, \alpha) = \inf\Big\{ \sum_{U \in \gamma} \sup_{x \in U} e^{\varphi_n(x)} : \gamma \text{ is a finite subcover of } \alpha^n \Big\}. \tag{10.3.1} \]
exists. Define the pressure of the potential φ with respect to f to be the limit
P(f , φ) of P(f , φ, α) when the diameter of α goes to zero. The existence of this
limit is guaranteed by the following lemma:
Lemma 10.3.1. There exists limdiam α→0 P(f , φ, α), that is, there exists some
P(f , φ) ∈ R̄ such that
\[ \lim_k P(f, \varphi, \alpha_k) = P(f, \varphi) \]
Proof. Let (αk )k and (βk )k be any sequences of open covers with diameters
converging to zero. Given any ε > 0, fix δ > 0 such that |φ(x) − φ(y)| ≤ ε
whenever d(x, y) ≤ δ. By assumption, diam αk < δ for every k sufficiently large.
For fixed k, let ρ > 0 be a Lebesgue number for αk . By assumption, diam βl < ρ
for every l sufficiently large. By the definition of Lebesgue number, it follows
that every B ∈ βl is contained in some A ∈ αk . Observe also that
\[ \sup_{x \in A} \varphi_n(x) \le n\varepsilon + \sup_{y \in B} \varphi_n(y) \]
Since ε > 0 is arbitrary, it follows that lim supk P(f , φ, αk ) ≤ lim infl P(f , φ, βl ).
Exchanging the roles of the two sequences of covers, we conclude that the
limits limk P(f , φ, αk ) and liml P(f , φ, βl ) exist and are equal.
Let (αk )k be any sequence of open covers with diameters going to zero. Then,
by Proposition 10.1.12 and the definition of the pressure,
Observe, however, that for general potentials P(f , φ) need not coincide with
the supremum of P(f , φ, α) over all open covers α (see Exercise 10.3.5).
Given any constant c ∈ R, we have that $P_n(f, \varphi + c, \alpha) = e^{cn} P_n(f, \varphi, \alpha)$ for
every n ≥ 1 and, consequently, P(f , φ + c, α) = P(f , φ, α) + c for any open
cover α. Hence,
P(f , φ + c) = P(f , φ) + c. (10.3.4)
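A case where P_n can be computed by hand helps to see this identity at work. The sketch below is an illustration under assumptions that are not in the text: f is the full shift on d symbols, α is the cover by the cylinders [0; a], and the potential depends only on the first coordinate, φ(x) = w(x_0) for hypothetical weights w. The elements of α^n are then pairwise disjoint non-empty cylinders, so α^n is its only subcover, φ_n is constant on each of them, and P_n is a sum over words of length n.

```python
import math
from itertools import product

def P_n(weights, n):
    """Sum over all cylinders [0; a_0, ..., a_{n-1}] of exp(phi_n), where
    phi(x) = weights[x_0], so that phi_n is constant on each cylinder."""
    d = len(weights)
    return sum(
        math.exp(sum(weights[a] for a in word))
        for word in product(range(d), repeat=n)
    )

weights = [0.3, -1.0, 0.5]   # hypothetical values of phi on the three symbols
c = 0.7
n = 6
shifted = [w + c for w in weights]

# P_n(f, phi + c, alpha) versus e^{cn} P_n(f, phi, alpha): equal up to rounding
print(P_n(shifted, n), math.exp(c * n) * P_n(weights, n))
# (1/n) log P_n versus log of the sum of e^{w(a)}: equal up to rounding
print(math.log(P_n(weights, n)) / n, math.log(sum(math.exp(w) for w in weights)))
```

In this special case P_n is the n-th power of the sum of the numbers e^{w(a)}, which is why the second line prints two equal values.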
Analogously, if φ ≤ ψ then Pn (f , φ, α) ≤ Pn (f , ψ, α) for every n ≥ 1, which
implies that P(f , φ, α) ≤ P(f , ψ, α) for every open cover α. That is,
One may replace the supremum by the infimum in (10.3.1), that is, replace
Pn (f , φ, α) with
\[ Q_n(f, \varphi, \alpha) = \inf\Big\{ \sum_{U \in \gamma} \inf_{x \in U} e^{\varphi_n(x)} : \gamma \text{ is a finite subcover of } \alpha^n \Big\}, \]
although this makes the definition a bit more complicated. In contrast with
log Pn (f , φ, α), the sequence log Qn (f , φ, α) need not be subadditive. Denote
\[ Q_-(f, \varphi, \alpha) = \liminf_n \frac{1}{n} \log Q_n(f, \varphi, \alpha) \quad \text{and} \quad Q_+(f, \varphi, \alpha) = \limsup_n \frac{1}{n} \log Q_n(f, \varphi, \alpha). \]
Clearly, Q− (f , φ, α) ≤ Q+ (f , φ, α) for every open cover α of M. Furthermore,
Qn (f , 0, α) = Pn (f , 0, α) = N(α n ) for every n and so Q− (f , 0, α) = Q+ (f , 0, α) =
P(f , 0, α) = h(f , α).
Proof. Since φ is (uniformly) continuous, given any ε > 0 there exists δ > 0
such that
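\[ \inf_{x \in C} \varphi_n(x) \le \sup_{x \in C} \varphi_n(x) \le n\varepsilon + \inf_{x \in C} \varphi_n(x) \]
\[ Q_n(f, \varphi, \alpha) \le P_n(f, \varphi, \alpha) \le e^{n\varepsilon} Q_n(f, \varphi, \alpha) \]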
as claimed.
Next, define
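\[ G(f, \varphi, \varepsilon) = \limsup_n \frac{1}{n} \log G_n(f, \varphi, \varepsilon) \quad \text{and} \quad S(f, \varphi, \varepsilon) = \limsup_n \frac{1}{n} \log S_n(f, \varphi, \varepsilon), \tag{10.3.8} \]
and also
\[ G(f, \varphi) = \lim_{\varepsilon \to 0} G(f, \varphi, \varepsilon) \quad \text{and} \quad S(f, \varphi) = \lim_{\varepsilon \to 0} S(f, \varphi, \varepsilon) \tag{10.3.9} \]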
Proof. Consider n ≥ 1 and ε > 0. It is clear from the definitions that every
maximal (n, ε)-separated set is (n, ε)-generating. Then,
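\begin{align*}
S_n(f, \varphi, \varepsilon) &= \sup\Big\{ \sum_{x \in E} e^{\varphi_n(x)} : E \text{ is } (n, \varepsilon)\text{-separated} \Big\} \\
&= \sup\Big\{ \sum_{x \in E} e^{\varphi_n(x)} : E \text{ is maximal } (n, \varepsilon)\text{-separated} \Big\} \tag{10.3.10} \\
&\ge \inf\Big\{ \sum_{x \in E} e^{\varphi_n(x)} : E \text{ is } (n, \varepsilon)\text{-generating} \Big\} = G_n(f, \varphi, \varepsilon)
\end{align*}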
for every n and every ε. This implies that G(f , φ, ε) ≤ S(f , φ, ε) for every ε and,
thus, G(f , φ) ≤ S(f , φ).
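The fact used at the start of this proof, that a maximal (n, ε)-separated set is (n, ε)-generating, can be observed on a toy model. The sketch below rests on assumptions made only for the illustration: the space is the finite set of points k/243 on the circle, which the doubling map x ↦ 2x mod 1 maps to itself, and a maximal separated set is built greedily. All names and numerical values are hypothetical.

```python
N = 243  # the points k/N, k = 0..N-1, form a finite set invariant under doubling

def circle_dist(a, b):
    """Distance on the circle between the points a/N and b/N."""
    diff = abs(a - b) % N
    return min(diff, N - diff) / N

def dyn_dist(a, b, n):
    """max over 0 <= j < n of d(f^j(a/N), f^j(b/N)) for f(x) = 2x (mod 1)."""
    best = 0.0
    for _ in range(n):
        best = max(best, circle_dist(a, b))
        a, b = (2 * a) % N, (2 * b) % N
    return best

def greedy_maximal_separated(n, eps):
    """Add points one at a time while the set stays (n, eps)-separated."""
    E = []
    for y in range(N):
        if all(dyn_dist(x, y, n) >= eps for x in E):
            E.append(y)
    return E

def is_separated(E, n, eps):
    return all(dyn_dist(x, y, n) >= eps for x in E for y in E if x != y)

def is_generating(E, n, eps):
    return all(any(dyn_dist(x, y, n) < eps for x in E) for y in range(N))

E = greedy_maximal_separated(4, 0.1)
print(len(E), is_separated(E, 4, 0.1), is_generating(E, 4, 0.1))
```

Any point rejected by the greedy step lies in the dynamical ball of some point already chosen, which is exactly the argument for why maximality gives the generating property.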
Next, we prove that S(f , φ) ≤ P(f , φ). Let ε and δ be positive numbers such
that d(x, y) ≤ δ implies |φ(x) − φ(y)| ≤ ε. Let α be any open cover of M with
diam α < δ and E ⊂ M be any (n, δ)-separated set. Given any subcover γ of α n ,
it is obvious that every point of E is contained in some element of γ . On the
other hand, the hypothesis that E is (n, δ)-separated implies that each element
of γ contains at most one element of E. Therefore,
\[ \sum_{x \in E} e^{\varphi_n(x)} \le \sum_{U \in \gamma} \sup_{y \in U} e^{\varphi_n(y)}. \]
Observe that $\gamma(x) \in \alpha^n$ and B(x, n, ρ) ⊂ γ (x). Hence, the hypothesis that E is
(n, ρ)-generating implies that γ = {γ (x) : x ∈ E} is a subcover of $\alpha^n$. Observe
also that
\[ \sup_{y \in \gamma(x)} \varphi_n(y) \le n\varepsilon + \varphi_n(x) \quad \text{for every } x \in E, \]
10.3.3 Properties
Properties of the pressure function in the spirit of Proposition 10.1.10
and Corollary 10.1.13 are stated in Exercise 10.3.3. Let us also extend
Propositions 10.1.14 and 10.1.15 to the present context:
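\[ \sum_{j=0}^{n-1} \psi \circ g^j = \sum_{l=0}^{kn-1} \varphi \circ f^l \quad \text{and} \quad \bigvee_{j=0}^{n-1} g^{-j}(\beta) = \bigvee_{l=0}^{nk-1} f^{-l}(\alpha). \]
Then,
\begin{align*}
P_n(g, \psi, \beta) &= \inf\Big\{ \sum_{U \in \gamma} \sup_{x \in U} e^{\sum_{j=0}^{n-1} \psi(g^j(x))} : \gamma \subset \bigvee_{j=0}^{n-1} g^{-j}(\beta) \Big\} \\
&= \inf\Big\{ \sum_{U \in \gamma} \sup_{x \in U} e^{\sum_{l=0}^{kn-1} \varphi(f^l(x))} : \gamma \subset \bigvee_{l=0}^{nk-1} f^{-l}(\alpha) \Big\} = P_{kn}(f, \varphi, \alpha).
\end{align*}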
U∈δ x∈U
= inf sup eφn (y) : δ ⊂ α n = Pn (f , φ, α)
V∈γ y∈V
Proof. Write ξ = (1 − t)φ + tψ. Then ξn = (1 − t)φn + tψn for every n ≥ 1 and,
thus, sup(ξn | U) ≤ (1 − t) sup(φn | U) + t sup(ψn | U) for every U ⊂ M. Then,
by the Hölder inequality (Theorem A.5.5),
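\[ \sum_{U \in \gamma} \sup_{x \in U} e^{\xi_n(x)} \le \Big( \sum_{U \in \gamma} \sup_{x \in U} e^{\varphi_n(x)} \Big)^{1-t} \Big( \sum_{U \in \gamma} \sup_{x \in U} e^{\psi_n(x)} \Big)^{t} \]
for any finite family γ of subsets of M. This implies that, given any open
cover α,
\[ P_n(f, \xi, \alpha) \le P_n(f, \varphi, \alpha)^{1-t} P_n(f, \psi, \alpha)^{t} \]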
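The Hölder step can be sanity-checked numerically. In the sketch below, the lists `a` and `b` stand for the numbers sup e^{φ_n} and sup e^{ψ_n} over the elements U of a finite family γ; they are filled with hypothetical random positive values, and the code compares the sum of a_U^{1−t} b_U^t with (sum of a_U)^{1−t} (sum of b_U)^t. The function name `holder_sides` is an assumption of this illustration.

```python
import random

def holder_sides(a, b, t):
    """Return (lhs, rhs) of the inequality
    sum a_U^(1-t) * b_U^t  <=  (sum a_U)^(1-t) * (sum b_U)^t."""
    lhs = sum(x ** (1 - t) * y ** t for x, y in zip(a, b))
    rhs = sum(a) ** (1 - t) * sum(b) ** t
    return lhs, rhs

random.seed(0)
for _ in range(5):
    a = [random.uniform(0.01, 10.0) for _ in range(6)]   # hypothetical sup e^{phi_n} values
    b = [random.uniform(0.01, 10.0) for _ in range(6)]   # hypothetical sup e^{psi_n} values
    t = random.random()
    lhs, rhs = holder_sides(a, b, t)
    print(round(t, 3), lhs, rhs, lhs <= rhs)
```

Taking logarithms, dividing by n and passing to the limit is what turns this inequality into the convexity of the pressure in the potential.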
e−2K Pn (f , φ, α) ≤ Pn (f , ψ, α) ≤ e2K Pn (f , φ, α)
1 From the mathematical point of view, the two quantities are equivalent. Preference for one
denomination or the other has mostly to do with the physical interpretation of the set F: for
spin systems one usually refers to the Gibbs free energy, whereas for lattice gases, where the
elements of F describe the rate of occupation of each site, it is more natural to refer to the
pressure.
e−βE(ξ )
μ({ξ }) = −βE(η)
for every ξ ∈ (10.3.16)
η∈ e
where |m| = max{|m1 |, . . . , |md |}. In particular (Exercise 10.3.9), the energy
ϕ(ξ ) = (k, ξk , ξ0 )
k∈L
resulting from the action of all the sites on the site 0 at the origin is uniformly
bounded.
Initially, given any finite set Λ ⊂ L, let us consider the system one obtains
by observing only the sites k ∈ Λ and “switching off” their interactions with
the sites in the complement of Λ. This is a finite system, as the configuration
space is contained in F^Λ, with energy function given by
E (x) = (k − l, xk , xl ) for every x ∈ F .
l∈ k∈
The notion of Gibbs state is obtained from this one by “switching back on”
the interaction with the sites outside Λ, in the way we are going to explain.
Denote by r_Λ(x) the expression on the right-hand side of (10.3.18). Observe
that
\[ r_\Lambda(x) = \Big( \sum_{y \in F^{\Lambda}} e^{\beta E_\Lambda(x) - \beta E_\Lambda(y)} \Big)^{-1} \]
For ξ , η ∈ F L , define
E(ξ , η) = (k − l, ξk , ξl ) − (k − l, ηk , ηl )
l∈L k∈L
= (j, ξj+l , ξl ) − (j, ηj+l , ηl ) = ϕ(σ l (ξ )) − ϕ(σ l (η)).
l∈L j∈L l∈L
It follows from the condition (10.3.17) that this sum converges whenever the
two configurations are such that ξk = ηk for every k in the complement of some
finite set (Exercise 10.3.9). Then,
−1
βE(ξ ,η)
ρ (ξ ) = β e
η|c =ξ |c
10.3.5 Exercises
10.3.1. Check that the sequence log Pn (f , φ, α) is subadditive.
10.3.2. Show that if f is a homeomorphism then P(f , φ, f (α)) = P(f , φ, α), Q+ (f , φ, f (α)) =
Q+ (f , φ, α) and Q− (f , φ, f (α)) = Q− (f , φ, α) for every open cover α.
10.3.3. Show that, for any potential φ : M → R:
(a) If α, β are open covers with α ≺ β then Q+ (f , φ, α) ≤ Q+ (f , φ, β) and
Q− (f , φ, α) ≤ Q− (f , φ, β).
(b) Q+ (f , φ, α) = Q+ (f , φ, α^k) and Q− (f , φ, α) = Q− (f , φ, α^k) for every k ≥ 1
and every open cover α.
(c) Q+ (f , φ, α) = P(f , φ) = Q− (f , φ, α) for any open cover α such that
diam α^k → 0.
(d) P(f , φ, α) = P(f , φ, α^k) for every k ≥ 1 and any open cover α whose
elements are pairwise disjoint.
(e) P(f , φ, α) = P(f , φ) for any open cover α such that diam α^k → 0 and whose
elements are pairwise disjoint.
(f) If f is a homeomorphism, one may replace α^k by α^{±k} in statements (b), (c),
(d) and (e).
10.3.4. (Walters). Prove that
P(f , φ) = sup{Q− (f , φ, α) : α is an open cover of M}
= sup{Q+ (f , φ, α) : α is an open cover of M}.
[Observation: In particular, the pressure depends only on the topology of M,
not the distance. This also provides a way to extend the definition to continuous
transformations in compact topological spaces.]
10.3.5. Exhibit a homeomorphism f : M → M, a potential φ : M → R and open
covers α and β of a compact metric space M such that α ≺ β and P(f , φ, α) >
P(f , φ, β) = P(f , φ). [Observation: Thus, the conclusions of Exercise 10.3.3(a)
and Exercise 10.3.4 are no longer valid if one replaces Q± (f , φ, α) by P(f , φ, α).]
10.3.6. Check that the cohomology relation
φ ∼ ψ ⇔ ψ = φ + u ◦ f − u for some continuous function u : M → R
is an equivalence relation.
10.3.7. Let fi : Mi → Mi , i = 1, 2 be continuous transformations in compact metric spaces
and, for each i, let φi be a potential in Mi . Define
f1 × f2 : M 1 × M 2 → M1 × M2 , f1 × f2 (x1 , x2 ) = (f1 (x1 ), f2 (x2 ))
φ1 × φ2 : M1 × M2 → R, φ1 × φ2 (x1 , x2 ) = φ1 (x1 ) + φ2 (x2 ).
Show that P(f1 × f2 , φ1 × φ2 ) = P(f1 , φ1 ) + P(f2 , φ2 ).
10.3.8. Consider the transformation f : S1 → S1 defined by f (x) = 2x mod Z. Prove
that if φ : S1 → R is a Hölder function then
\[ P(f, \varphi) = \lim_{n} \frac{1}{n} \log \sum_{p \in \mathrm{Fix}(f^{n})} e^{\varphi_n(p)}. \]
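The expression inside the limit can be evaluated directly for the doubling map, since the fixed points of f^n are the points k/(2^n − 1) with 0 ≤ k < 2^n − 1. The sketch below is only a numerical illustration and proves nothing; the function name `periodic_pressure_estimate` and the sample potential are assumptions made for this example.

```python
import math


def periodic_pressure_estimate(phi, n):
    """(1/n) log of the sum of e^{phi_n(p)} over p in Fix(f^n), f(x) = 2x mod 1."""
    q = 2 ** n - 1                      # Fix(f^n) = {k/q : 0 <= k < q}
    total = 0.0
    for k in range(q):
        birkhoff_sum, j = 0.0, k
        for _ in range(n):              # phi_n(p) = sum of phi along the orbit
            birkhoff_sum += phi(j / q)
            j = (2 * j) % q             # exact doubling on the numerators
        total += math.exp(birkhoff_sum)
    return math.log(total) / n


def sample_potential(x):
    """A smooth (hence Hölder) potential on the circle, chosen for illustration."""
    return math.cos(2 * math.pi * x)


for n in (4, 8, 12):
    print(n, periodic_pressure_estimate(sample_potential, n))
```

For a constant potential φ ≡ c the estimate equals c + (1/n) log(2^n − 1), which tends to c + log 2.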
for every c ∈ R. For c > 0, this implies that η(M) ≤ 1 + h(f )/c. Passing to the
limit when c → +∞, we get that η(M) ≤ 1. Analogously, considering c < 0 and
taking the limit when c → −∞, we get η(M) ≥ 1. Therefore, η is a probability
measure, as stated.
for every k ≥ 1. Dividing by k and taking the limit when k → ∞, we get the
inequality (10.4.1).
For proving (10.4.2) we need the following elementary fact:
\[ \sum_{i=1}^{k} p_i (a_i - \log p_i) \le \log A. \]
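This elementary inequality is easy to test numerically. The sketch below assumes, since the statement of the lemma is not reproduced here, that p_1, ..., p_k is a probability vector, that A denotes the sum of the e^{a_i}, and that 0 log 0 = 0; under these assumptions equality holds for p_i = e^{a_i}/A.

```python
import math


def free_energy(p, a):
    """Sum of p_i * (a_i - log p_i), with the convention 0 * log 0 = 0."""
    return sum(pi * (ai - math.log(pi)) for pi, ai in zip(p, a) if pi > 0)


def log_partition(a):
    """log A, where A is assumed to be the sum of e^{a_i}."""
    return math.log(sum(math.exp(ai) for ai in a))


a = [0.3, -1.2, 2.0, 0.0]                      # hypothetical numbers a_i
A = sum(math.exp(ai) for ai in a)
maximizer = [math.exp(ai) / A for ai in a]     # candidate for equality
uniform = [1.0 / len(a)] * len(a)

print(free_energy(uniform, a), "<=", log_partition(a))
print(free_energy(maximizer, a), "vs", log_partition(a))
```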
Since the measure ν is regular (Proposition A.3.2), given ε > 0 we may find
compact sets Qi ⊂ Pi such that ν(Pi \ Qi ) < ε for every i = 1, . . . , s. Let Q0
be the complement of ⋃_{i=1}^{s} Q_i and let P0 = ∅. Then Q = {Q0 , Q1 , . . . , Qs } is a
finite partition of M such that ν(Pi Δ Qi ) < sε for every i = 0, 1, . . . , s. Hence,
by Lemma 9.1.6,
Hν (P/Q) ≤ log 2
as long as ε > 0 is sufficiently small (depending only on s). Let ε and Q be
fixed from now on and assume that the open cover α satisfies
diam α < min{d(Qi , Qj ) : 1 ≤ i < j ≤ s}. (10.4.3)
By Lemma 9.1.11, we have that hν (f , P) ≤ hν (f , Q) + Hν (P/Q) ≤ hν (f , Q) +
log 2. Hence, to prove (10.4.2) it suffices to show that
\[ h_\nu(f, \mathcal{Q}) + \int \varphi \, d\nu \le \log 2 + P(f, \varphi, \alpha). \tag{10.4.4} \]
more than 2^n elements of Q^n. In particular, for each U ∈ γ there exist not more
than 2^n sets Q ∈ Q^n such that U_Q = U. Therefore,
\[ \sum_{Q \in \mathcal{Q}^{n}} \sup_{x \in Q} e^{\varphi_n(x)} \le 2^{n} \sum_{U \in \gamma} \sup_{y \in U} e^{\varphi_n(y)}, \tag{10.4.6} \]
\[ \nu_n = \frac{1}{A} \sum_{x \in E} e^{\varphi_n(x)} \delta_x \quad\text{and}\quad \mu_n = \frac{1}{n} \sum_{j=0}^{n-1} f_*^{j} \nu_n. \]
By the definition (10.3.8), recalling also that the space of probability measures
is compact (Theorem 2.1.5), we may choose a subsequence (nj )j → ∞ such
that
1. (1/n_j) log S_{n_j}(f, φ, ε) converges to S(f, φ, ε), and
2. μ_{n_j} converges, in the weak∗ topology, to some probability measure μ.
We are going to check that such a measure μ is invariant under f and satisfies
(10.4.7). For the reader’s convenience, we split the argument into four steps.
and, consequently,
\[ \Big| \int \phi \, d(f_* \mu_n) - \int \phi \, d\mu_n \Big| \le \frac{2}{n} \sup |\phi|. \]
Step 2: Next, we estimate the entropy with respect to νn . Let P be any finite
partition of M such that diam P < ε and μ(∂P) = 0, where ∂P denotes the
union of the boundaries ∂P of all sets P ∈ P. The first condition implies that
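each element of P^n contains at most one element of E. On the other hand, it is
clear that every element of E is contained in some element of P^n. Hence,
\[
\begin{aligned}
H_{\nu_n}(\mathcal{P}^{n})
  &= \sum_{x \in E} -\nu_n(\{x\}) \log \nu_n(\{x\})
   = -\sum_{x \in E} \frac{1}{A} e^{\varphi_n(x)} \log\Big( \frac{1}{A} e^{\varphi_n(x)} \Big) \\
  &= \log A - \frac{1}{A} \sum_{x \in E} e^{\varphi_n(x)} \varphi_n(x)
   = \log A - \int \varphi_n \, d\nu_n
\end{aligned}
\tag{10.4.9}
\]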
(the first term is void if r = 0 and the third one is void if n = kq_r + r). Therefore,
\[ H_{\nu_n}(\mathcal{P}^{n}) \le \sum_{j=0}^{q_r - 1} H_{\nu_n}\big(f^{-(kj+r)}(\mathcal{P}^{k})\big) + H_{\nu_n}(\mathcal{P}^{r}) + H_{\nu_n}\big(f^{-(kq_r+r)}(\mathcal{P}^{n-(kq_r+r)})\big). \]
Clearly, #P^r ≤ (#P)^k. Using Lemma 9.1.3, we find that H_{ν_n}(P^r) ≤ k log #P.
For the same reason, the last term in the previous inequality is also bounded
by k log #P. Then, using the property (9.1.12),
\[ H_{\nu_n}(\mathcal{P}^{n}) \le \sum_{j=0}^{q_r - 1} H_{f_*^{kj+r} \nu_n}(\mathcal{P}^{k}) + 2k \log \#\mathcal{P} \tag{10.4.10} \]
for every r ∈ {0, . . . , k − 1}. Now, it is clear that every number i ∈ {0, . . . , n − 1}
may be written in a unique way as i = kj + r with 0 ≤ j ≤ q_r − 1. Then, summing
(10.4.10) over all the values of r,
\[ k H_{\nu_n}(\mathcal{P}^{n}) \le \sum_{i=0}^{n-1} H_{f_*^{i} \nu_n}(\mathcal{P}^{k}) + 2k^{2} \log \#\mathcal{P}. \tag{10.4.11} \]
The concavity property (9.1.8) of the function φ(x) = −x log x implies that
\[ \frac{1}{n} \sum_{i=0}^{n-1} H_{f_*^{i} \nu_n}(\mathcal{P}^{k}) \le H_{\mu_n}(\mathcal{P}^{k}). \]
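The concavity step can be illustrated on a toy example: each measure is replaced by the probability vector it assigns to the elements of a fixed finite partition, and the entropy of the averaged vector is compared with the average of the entropies. The vectors below are hypothetical data, not taken from the proof.

```python
import math


def partition_entropy(weights):
    """Entropy sum of -w log w over the partition, with 0 log 0 = 0."""
    return sum(-w * math.log(w) for w in weights if w > 0)


def average_measure(measures):
    """Coordinate-wise average of probability vectors on the same partition."""
    n = len(measures)
    return [sum(m[i] for m in measures) / n for i in range(len(measures[0]))]


# Hypothetical measures of the three elements of a partition.
measures = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]]
mean_of_entropies = sum(partition_entropy(m) for m in measures) / len(measures)
entropy_of_mean = partition_entropy(average_measure(measures))
print(mean_of_entropies, "<=", entropy_of_mean)
```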
10.4.3 Exercises
10.4.1. Let f : M → M be a continuous transformation in a compact metric space M.
Check that P(f , ϕ) ≤ h(f ) + sup ϕ for every continuous function ϕ : M → R.
10.4.2. Show that if f : M → M is a continuous transformation in a compact metric
space and X ⊂ M is a forward invariant set, meaning that f (X) ⊂ X, then P(f |
X, ϕ | X) ≤ P(f , ϕ).
10.4.3. Give an alternative proof of Proposition 10.3.8, using the variational principle.
10.4.4. Exhibit a continuous transformation f : M → M in a non-compact metric space
M such that f has no invariant probability measure and yet the topological
entropy h(f ) is positive. [Observation: Thus, the variational principle need not
hold when the ambient space is not compact.]
10.4.5. Given numbers α, β > 0 such that α + β < 1, define
\[ g : [0, \alpha] \cup [1 - \beta, 1] \to [0, 1], \qquad g(x) = \begin{cases} x/\alpha & \text{if } x \in [0, \alpha] \\ (x - 1)/\beta + 1 & \text{if } x \in [1 - \beta, 1]. \end{cases} \]
Let K ⊂ [0, 1] be the Cantor set formed by the points x such that g^n(x) is defined
for every n ≥ 0 and f : K → K be the restriction of g. Calculate the function ψ :
R → R defined by ψ(t) = P(f , −t log g′). Check that ψ is convex and decreasing
and admits a (unique) zero in (0, 1). Show that hμ (f ) < ∫ log g′ dμ for every
probability measure μ invariant under f .
10.4.6. Let f : M → M be a continuous transformation in a compact metric space, such
that the set of ergodic invariant probability measures is finite. Show that for
every potential ϕ : M → R there exists some invariant probability measure that
realizes the supremum in (10.0.1).
In the special case φ ≡ 0 the elements of E(f , φ) are also called measures of
maximal entropy. Let us start with a few simple examples.
\[ h_\mu(\sigma, \mathcal{P}) = \lim_{n} \frac{1}{n} H_\mu(\mathcal{P}^{n}) = \sum_{i=1}^{d} -p_i \log p_i. \]
We leave it to the reader (Exercise 10.5.1) to check that this function attains
its maximum precisely when the coefficients pi are all equal to 1/d. Moreover,
in that case hμ (σ ) = log d. Recall also (Example 10.1.2) that h(σ ) = log d.
Therefore, the Bernoulli measure associated with the vector p = (1/d, . . . , 1/d)
is the only measure of maximal entropy among the Bernoulli measures. In fact,
it follows from the theory that we develop in Chapter 12 that μ is the unique
measure of maximal entropy among all invariant measures.
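As a quick numerical companion to this example, the sketch below evaluates the entropy of a Bernoulli measure from its probability vector and compares a few vectors with the uniform one. The sample vectors are arbitrary choices made for illustration.

```python
import math


def bernoulli_entropy(p):
    """Entropy of the Bernoulli measure with probability vector p (0 log 0 = 0)."""
    return sum(-pi * math.log(pi) for pi in p if pi > 0)


d = 4
uniform = [1.0 / d] * d
candidates = [
    [0.7, 0.1, 0.1, 0.1],
    [0.4, 0.3, 0.2, 0.1],
    [0.25, 0.25, 0.3, 0.2],
]
print("log d =", math.log(d), " uniform:", bernoulli_entropy(uniform))
for p in candidates:
    print(p, bernoulli_entropy(p))
```

Each non-uniform vector gives a value strictly below log d, in agreement with the claim that the maximum is attained only at (1/d, ..., 1/d).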
This implies that the supremum of hμ (f ) + ∫ φ dμ over all the invariant probability measures μ
is less than or equal to the supremum of the same expression over the ergodic invariant
probability measures. Since the opposite inequality is obvious, it follows that
the two suprema coincide. By the variational principle (Theorem 10.4.1), the
supremum over all the invariant probability measures is equal to P(f , φ).
Proposition 10.5.5. Assume that h(f ) < ∞. Then the set of equilibrium states
for any potential φ : M → R is a convex subset of M1 (f ): more precisely, given
t ∈ (0, 1) and μ1 , μ2 ∈ M1 (f ),
(1 − t)μ1 + tμ2 ∈ E(f , φ) ⇔ {μ1 , μ2 } ⊂ E(f , φ).
Moreover, an invariant probability measure μ is in E(f , φ) if and only if almost
every ergodic component of μ is in E(f , φ).
Proof. As we have seen in (10.3.6), the hypothesis that the topological entropy
is finite ensures that P(f , φ) < ∞ for every potential φ. Let us consider the
functional (μ) = hμ (f ) + φ dμ introduced in the proof of the previous
result. By Proposition 9.6.1, this functional is convex:
((1 − t)μ1 + tμ2 ) = (1 − t)(μ1 ) + t(μ2 )
for every t ∈ (0, 1) and any μ1 , μ2 ∈ M1 (f ). Then, ((1 − t)μ1 + tμ2 ) is equal
to the supremum of if and only if both (μ1 ) and (μ2 ) are. This proves
the first part of the proposition. The proof of the second part is analogous: the
relation (10.5.1) implies that (μ) = sup if and only if (μP ) = sup for
μ̂-almost every P.
Proof. To get the first claim it suffices to consider the ergodic components of
any element of E(f , φ). Let us move on to proving the second claim. If μ ∈
E(f , φ) is ergodic then (Proposition 4.3.2) μ is an extremal element of M1 (f )
and so it must be an extremal element of E(f , φ). Conversely, if μ ∈ E(f , φ) is
not ergodic then we may write
μ = (1 − t)μ1 + tμ2 , with 0 < t < 1 and μ1 , μ2 ∈ M1 (f ).
By Proposition 10.5.5 we have that μ1 , μ2 ∈ E(f , φ), which implies that μ is
not an extremal element of the set E(f , φ).
In general, the set of equilibrium states may be empty. The first example
of this kind was given by Gurevič. The following construction is taken from
Walters [Wal82]:
\[ x = (\dots, *, o_1, \dots, o_m, \underbrace{*, \dots, *}_{k}, e_1, \dots, e_n, *, \dots), \]
10.5.1 Exercises
10.5.1. Show that, among the Bernoulli measures of the shift map σ : Σ → Σ in the
space Σ = {1, . . . , d}^Z, the one with the largest entropy is given by the probability
vector (1/d, . . . , 1/d).
10.5.2. Let σ : Σ → Σ be the shift map in Σ = {1, . . . , d}^Z and φ : Σ → R be a
locally constant potential, that is, such that φ is constant on each cylinder [0; i].
Calculate P(f , φ) and show that there exists some equilibrium state that is a
Bernoulli measure.
It is clear from the definition that any map sufficiently close to an expanding
one, relative to the C1 topology, is still expanding. Thus, the observation in
Example 11.1.1 provides a whole open set of examples of expanding maps.
A classical result of Michael Shub [Shu69] asserts a (much deeper) kind of
converse: every expanding map on the torus Td is topologically conjugate to
an expanding linear endomorphism fA .
Given a probability measure μ invariant under a transformation f : M → M,
we call the basin of μ the set B(μ) of all points x ∈ M such that
\[ \lim_{n \to \infty} \frac{1}{n} \sum_{j=0}^{n-1} \phi(f^{j}(x)) = \int \phi \, d\mu \]
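The time averages appearing in the definition of the basin are straightforward to compute. The sketch below uses an irrational rotation of the circle with the Lebesgue measure instead of an expanding map, because iterating a map such as x → 2x mod 1 in floating point degenerates after a few dozen steps; the rotation, the observable and the initial point are all choices made for this illustration only.

```python
import math


def time_average(f, phi, x, n):
    """(1/n) * sum of phi(f^j(x)) for j = 0, ..., n-1."""
    total = 0.0
    for _ in range(n):
        total += phi(x)
        x = f(x)
    return total / n


GOLDEN = (math.sqrt(5) - 1) / 2


def rotation(x):
    """Rotation of the circle R/Z by the golden mean (Lebesgue is invariant)."""
    return (x + GOLDEN) % 1.0


def observable(x):
    """Continuous function whose Lebesgue integral over [0, 1] is 1/2."""
    return math.cos(2 * math.pi * x) ** 2


for n in (10, 1000, 100000):
    print(n, time_average(rotation, observable, 0.1, n))
```

The printed averages approach 1/2, the integral of the observable, as n grows.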
Proof. By the inverse function theorem, for every ξ ∈ M there exist open
neighborhoods U(ξ ) ⊂ M of ξ and V(ξ ) ⊂ N of f (ξ ) such that f maps U(ξ )
Since f^{−1}(η) is finite (Exercise A.4.6), every W(η) is an open set. Fix ρ > 0
such that 2ρ is a Lebesgue number for the open cover {W(η) : η ∈ M} of M.
In particular, for every y ∈ M there exists η ∈ M such that B(y, ρ) is contained
in W(η), that is, it is contained in V(ξ) for all ξ ∈ f^{−1}(η). Since the U(ξ) are
pairwise disjoint and #f^{−1}(y) = degree(f) = #f^{−1}(η), given any x ∈ f^{−1}(y)
there exists exactly one ξ ∈ f^{−1}(η) such that x ∈ U(ξ). Let h be the restriction
to B(y, ρ) of the inverse of f : U(ξ) → V(ξ). By construction, f ◦ h = id and
h(y) = x. Moreover, ‖Dh(z)‖ = ‖Df(h(z))^{−1}‖ ≤ σ^{−1} for every z in the domain
of h. By the mean value theorem, this implies that h has the property (11.1.2).
Proof. By Lemma 11.1.3, there exists ρ > 0 such that, for any pre-image x of a
point y ∈ M, there exists a map h : B(y, ρ) → M of class C1 such that f ◦ h = id ,
h(y) = x and
\[ d(h(y_1), h(y_2)) \le \sigma^{-1} d(y_1, y_2) \quad\text{for every } y_1, y_2 \in B(y, \rho). \]
Hence, if d(f^n(x), f^n(y)) ≤ ρ for every n ≥ 0 then
\[ d(x, y) \le \sigma^{-n} d(f^{n}(x), f^{n}(y)) \le \sigma^{-n} \rho, \]
which immediately implies that x = y.
The next result provides a good control of the distortion of the iterates of f
and their inverse branches, which is crucial for the proof of Theorem 11.1.2.
This is the only step of the proof where we use the hypothesis that the Jacobian
x → det Df (x) is Hölder. Note that, since f is a local diffeomorphism and M is
compact, the Jacobian is bounded from zero and infinity. Hence, the logarithm
log | det Df | is also Hölder: there exist C0 > 0 and ν > 0 such that
\[ \big| \log |\det Df(x)| - \log |\det Df(y)| \big| \le C_0 \, d(x, y)^{\nu} \quad\text{for any } x, y \in M. \]
Proposition 11.1.5 (Distortion lemma). There exists C1 > 0 such that, given
any n ≥ 1, any y ∈ M and any inverse branch h^n : B(y, ρ) → M of f^n,
\[ \Big| \log \frac{|\det Dh^{n}(y_1)|}{|\det Dh^{n}(y_2)|} \Big| \le C_1 \, d(y_1, y_2)^{\nu} \le C_1 (2\rho)^{\nu} \]
Note that log | det Dh_i | = − log | det Df | ◦ h_i and recall that every h_j is a
σ^{−1}-contraction. Hence,
\[ \Big| \log \frac{|\det Dh^{n}(y_1)|}{|\det Dh^{n}(y_2)|} \Big| \le \sum_{i=1}^{n} C_0 \, d\big(h^{i}(y_1), h^{i}(y_2)\big)^{\nu} \le \sum_{i=1}^{n} C_0 \, \sigma^{-i\nu} d(y_1, y_2)^{\nu}. \]
Therefore, to prove the lemma it suffices to take C1 = C0 ∑_{i=1}^{∞} σ^{−iν}.
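For reference, the constant obtained at the end of this proof can be written in closed form. The series is geometric with ratio σ^{−ν}, which lies in (0, 1) because σ > 1 and ν > 0:

```latex
\[
C_1 \;=\; C_0 \sum_{i=1}^{\infty} \sigma^{-i\nu}
    \;=\; C_0 \, \frac{\sigma^{-\nu}}{1 - \sigma^{-\nu}}
    \;=\; \frac{C_0}{\sigma^{\nu} - 1}.
\]
```

In particular, the constant deteriorates as σ approaches 1 or as the Hölder exponent ν approaches 0.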
We also need the auxiliary result that follows. Recall that, given a function ϕ
and a measure ν, we denote by ϕν the measure defined by (ϕν)(B) = ∫_B ϕ dν.
Lemma 11.1.8. Let ν be a probability measure on a compact metric space X
and ϕ : X → [0, +∞) be an integrable function with respect to ν. Let μi , i ≥ 1,
be a sequence of probability measures on X converging, in the weak∗ topology,
to a probability measure μ. If μi ≤ ϕν for every i ≥ 1 then μ ≤ ϕν.
Proof. Let B be any measurable set. For each ε > 0, let Kε be a compact subset
of B such that μ(B \ Kε ) and (ϕν)(B \ Kε ) are both less than ε (such a compact
set does exist, by Proposition A.3.2). Fix r > 0 small enough that the measure
of Aε \ Kε is also less than ε, for both μ and ϕν, where Aε = {z : d(z, Kε ) < r}.
The set of values of r for which the boundary of Aε has positive μ-measure is at
most countable (Exercise A.3.2). Hence, up to changing r slightly if necessary,
we may suppose that the boundary of Aε has measure zero. Then μ = limi μi
implies that μ(Aε ) = limi μi (Aε ) ≤ (ϕν)(Aε ). Making ε → 0, we conclude that
μ(B) ≤ (ϕν)(B).
Proof. Given any 0 < ε < ν(B), let Kε ⊂ B be a compact set with ν(B \ Kε ) < ε.
Let Kε,n be the union of all the elements of Pn that intersect Kε . Since the
diameters of the partitions converge to zero, ν(Kε,n \ Kε ) < ε for every n
sufficiently large. By contradiction, suppose that
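\[ \nu(K_\varepsilon \cap V_n) \le \frac{\nu(B) - \varepsilon}{\nu(B) + \varepsilon} \, \nu(V_n) \]
for every Vn ∈ Pn that intersects Kε . It would follow that
\[
\begin{aligned}
\nu(K_\varepsilon)
  &\le \sum_{V_n} \nu(K_\varepsilon \cap V_n)
   \le \sum_{V_n} \frac{\nu(B) - \varepsilon}{\nu(B) + \varepsilon} \, \nu(V_n)
   = \frac{\nu(B) - \varepsilon}{\nu(B) + \varepsilon} \, \nu(K_{\varepsilon,n}) \\
  &\le \frac{\nu(B) - \varepsilon}{\nu(B) + \varepsilon} \, (\nu(K_\varepsilon) + \varepsilon)
   \le \nu(B) - \varepsilon < \nu(K_\varepsilon).
\end{aligned}
\]
This contradiction shows that there must exist some Vn ∈ Pn such that
\[ \nu(V_n) \ge \nu(B \cap V_n) \ge \nu(K_\varepsilon \cap V_n) > \frac{\nu(B) - \varepsilon}{\nu(B) + \varepsilon} \, \nu(V_n) \]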
and, consequently, ν(Vn ) > 0. Making ε → 0 we get the claim.
Proof. It follows from the previous lemma that there exist at most s = #P0 pairwise
disjoint invariant sets with positive Lebesgue measure. Therefore, M may be
partitioned into a finite number of minimal invariant sets A1 , . . . , Ar , r ≤ s
with positive Lebesgue measure, where by minimal we mean that there are no
invariant sets Bi ⊂ Ai with 0 < m(Bi ) < m(Ai ). Given any absolutely continuous
invariant probability measure μ, there exists some i such that μ(Ai ) > 0. The
normalized restriction
\[ \mu_i(B) = \frac{\mu(B \cap A_i)}{\mu(A_i)} \]
of μ to any such Ai is invariant and absolutely continuous. Moreover, the
assumption that Ai is minimal implies that μi is ergodic.
Proof. Let x ∈ U and r > 0 be such that the ball of radius r around x is
contained in U. Given any n ≥ 1, suppose that f n (U) does not cover the
whole manifold. Then there exists some curve γ connecting f n (x) to a point
y ∈ M \f n (U), and that curve may be taken with length smaller than diam M +1.
\[ \frac{1}{n} \sum_{j=0}^{n-1} \delta_{f^{j}(x)} \to \mu \quad\text{for Lebesgue almost every } x. \]
1 Note that any local diffeomorphism from a compact manifold to itself is a covering map.
Actually, the facts stated in the previous paragraph can already be proven
with the methods available at this point. We invite the reader to do just that
(Exercises 11.1.3 through 11.1.6), in the context of expanding maps of the
interval, which are technically a bit simpler than expanding maps on a general
manifold.
Example 11.1.16. We say that a transformation f : [0, 1] → [0, 1] is an
expanding map of the interval if there exists a countable (possibly finite)
family P of pairwise disjoint open subintervals whose union has full Lebesgue
measure in [0, 1] and which satisfy:
(i) The restriction of f to each P ∈ P is a diffeomorphism onto (0, 1); denote
by f_P^{−1} : (0, 1) → P its inverse.
(ii) There exist C > 0 and θ > 0 such that, for every x, y and every P ∈ P,
\[ \big| \log |D(f_P^{-1})(x)| - \log |D(f_P^{-1})(y)| \big| \le C |x - y|^{\theta}. \]
(iii) There exist c > 0 and σ > 1 such that, for every n and every x,
|Df^n(x)| ≥ cσ^n (whenever the derivative is defined).
This class of transformations includes the decimal expansion and the Gauss
map as special cases. Its properties are analyzed in Exercises 11.1.3
through 11.1.5.
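Condition (iii) can be probed numerically for the Gauss map G(x) = 1/x − ⌊1/x⌋. Its derivative satisfies |G′(x)| = 1/x² ≥ 1, which is not a uniform expansion, but for the second iterate the chain rule gives |D(G²)(x)| = 1/(x G(x))², and x G(x) < 1/2 whenever G(x) is defined and non-zero. The sketch below samples this quantity at random points; the sampling range and the number of samples are arbitrary choices.

```python
import math
import random


def gauss_map(x):
    """G(x) = 1/x - floor(1/x) for x in (0, 1]."""
    inv = 1.0 / x
    return inv - math.floor(inv)


def derivative_second_iterate(x):
    """|D(G^2)(x)| = |G'(x)| * |G'(G(x))|, using |G'(t)| = 1/t^2."""
    gx = gauss_map(x)
    return 1.0 / (x * x * gx * gx)


random.seed(1)
samples = [random.uniform(0.01, 1.0) for _ in range(10000)]
# Skip points where G(x) = 0, since G^2 is not differentiable there.
values = [derivative_second_iterate(x) for x in samples if gauss_map(x) > 0.0]
print(min(values))
```

The sampled minimum stays above 4, which is consistent with (iii) for suitable c and σ; this is only an experiment and does not replace the verification asked for in the exercises.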
Exercise 11.1.6 deals with a slightly more general class of transformations,
where we replace condition (i) by
(i′) There exists δ > 0 such that the restriction of f to each P ∈ P is a
diffeomorphism onto some interval f (P) of length larger than δ that
contains every element of P that it intersects.
11.1.4 Exercises
11.1.1. Let f : M → M be a local diffeomorphism in a compact manifold and m be the
Lebesgue measure on M. Check the following facts:
for every t. This shows that we may take δ = 1. Then, β(1) ∈ B(p, ρ) and
f (β(1)) = γ (1) = y. In this way, we have shown that f (B(p, ρ)) contains
B(f (p), σρ), which is a neighborhood of the closure of B(f (p), ρ). Now
consider any x, y ∈ B(p, ρ). Note that d(f (x), f (y)) < 2Kρ. Let γ : [0, 1] →
B(f (x), 2Kρ) be a minimizing geodesic connecting f (x) to f (y). Arguing as in
the previous paragraph, we find a differentiable curve β : [0, 1] → B(x, 2Kρ)
connecting x to y and such that f (β(t)) = γ (t) for every t. Then,
Proof. It is clear that the condition (11.2.1) remains valid for the restriction.
We are left to check that f (Λ ∩ B(p, ρ)) contains a neighborhood of Λ ∩
B(f (p), ρ) inside Λ. By assumption, f (B(p, ρ)) contains some neighborhood V
of the closure of B(f (p), ρ). Then Λ ∩ V is a neighborhood of Λ ∩ B(f (p), ρ).
Moreover, given any y ∈ Λ ∩ V there exists x ∈ B(p, ρ) such that f (x) = y.
Since f^{−1}(Λ) = Λ, this point is necessarily in Λ. This proves that Λ ∩ V is
contained in the image f (Λ ∩ B(p, ρ)). Hence, the restriction of f to the set
Λ is an expanding map, as stated.
is compact (one can show that K is a Cantor set) and f −1 () = . The
restriction f : → is an expanding map. Indeed, fix any ρ > 0 smaller than
the distance between any two connected components of J. Then every ball of
radius ρ inside is contained in a unique connected component of J and so,
by (11.2.2), it is dilated by a factor greater than or equal to σ .
that is, |(f n ) (x)| = 1 for every x ∈ Fix(f n ) and every n ≥ 1. Let be the
complement of the union of the basins of attraction of all the attracting periodic
points of f . Then the restriction f : → is an expanding map: this is a
consequence of a deep theorem of Ricardo Mañé [Mañ85].
its inverse and call it the inverse branch of f at p. It is clear that h_p(f (p)) = p
and f ◦ h_p = id . The condition (11.2.1) implies that h_p is a σ^{−1}-contraction:
\[ d(h_p(z), h_p(w)) \le \sigma^{-1} d(z, w) \quad\text{for every } z, w \in B(f(p), \rho). \tag{11.2.5} \]
[Figure 11.2. Inverse branches of f^n]
Proof. Assume that d(f^n(z), f^n(w)) < ρ for every n ≥ 0. This implies that z =
h_w^n(f^n(z)) for every n ≥ 0. Then, the property (11.2.6) gives that
\[ d(z, w) \le \sigma^{-n} d(f^{n}(z), f^{n}(w)) < \rho \sigma^{-n}. \]
Making n → ∞, we get that z = w. So, ρ is a constant of expansivity for f .
Lemma 11.2.10. If d(x, y) < ρ then, given any pre-orbit (x−n )n of x, there
exists a pre-orbit (y−n )n of y asymptotic to (x−n )n , in the sense that d(x−n , y−n )
converges to 0 when n → ∞.
Proof. Write x = hp (y) and q = f (p). Consider any ε > 0 such that 2ε is
a constant of expansivity for f . Take δ > 0 given by the shadowing lemma
(Proposition 11.2.9). By Lemma 11.2.10, there exists a pre-orbit (x−n )n of x
asymptotic to the periodic pre-orbit (p̄−n )n of p. In particular,
Proof. Since z ∈ , we may find some periodic point p close enough to z that
w ∈ B(f (p), ρ) and hp (w) = hz (w). Since w ∈ , we may find periodic points
yn ∈ B(f (p), ρ) converging to w. By Lemma 11.2.12, we have that hp (yn ) ∈
for every n. Passing to the limit, we conclude that hp (w) ∈ .
Proof. Given any x ∈ M, let ω(x) denote its ω-limit set, that is, the set of
accumulation points of the iterates f n (x) when n → ∞. First, we show that
ω(x) ⊂ . Then, we deduce that f k (x) ∈ for some k ≥ 0.
Let ε > 0 be such that 2ε is a constant of expansivity for f . Take δ > 0
given by the shadowing lemma (Proposition 11.2.9) and let α ∈ (0, δ) be such
that d(f (z), f (w)) < δ whenever d(z, w) < α. Let y be any point in ω(x). The
definition of the ω-limit set implies that there exist r ≥ 0 and s ≥ 1 such that
Observe that d(f (z0 ), z1 ) = d(f (y), f r+1 (x)) < δ (because d(y, f r (x)) < α),
d(f (zs−1 ), zs ) = d(f r+s (x), y) < α < δ and f (zn ) = zn+1 in all the other cases. In
particular, (zn )n is a δ-pseudo-orbit. Then, by Proposition 11.2.9, there exists
some periodic point z such that d(y, z) < ε. Making ε → 0, we conclude that y
is accumulated by periodic points, that is, y ∈ .
Let ε > 0 and δ > 0 be as before. It is no restriction to suppose that
δ < ε. Take β ∈ (0, δ/2) such that d(f (z), f (w)) < δ/2 whenever d(z, w) < β.
Since ω(x) is contained in , there exist k ≥ 1 and points wn ∈ such that
d(f n+k (x), wn ) < β for every n ≥ 0. Observe that
d(f (wn ), wn+1 ) ≤ d(f (wn ), f n+k+1 (x)) + d(f n+k+1 (x), wn+1 ) < δ/2 + β < δ
d(f n (f k (x)), f n (w)) ≤ d(f n+k (x), wn ) + d(wn , f n (w)) < β + ε < 2ε
Moreover, the number k, the numbers m(i) and the sets Mi,j are unique up to
renumbering.
claim. Note that the statement means that the image and the pre-image of any
equivalence class are both equivalence classes.
Observe also that if d(p, q) < ρ then p ∼ q. Indeed, by Lemma 11.2.10
we may find a pre-orbit of q asymptotic to the periodic pre-orbit of p and,
analogously, a pre-orbit of p asymptotic to the periodic pre-orbit of q. It follows
that the equivalence classes are open sets and, since M is compact, they are
finite in number. Moreover, if A and B are two different equivalence classes,
then their closures Ā and B̄ are disjoint: the distance between them is at least
ρ. Since p ∼ q if and only if f (p) ∼ f (q), it follows that the transformation f
permutes the closures of the equivalence classes.
Thus, we may enumerate the closures of the equivalence classes as Mi,j , with
1 ≤ i ≤ k and 1 ≤ j ≤ m(i), in such a way that
\[ f(M_{i,j}) = M_{i,j+1} \ \text{ for } j < m(i) \quad\text{and}\quad f(M_{i,m(i)}) = M_{i,1}. \tag{11.2.12} \]
The properties (i) and (ii) in the statement of the theorem are immediate
consequences.
Let us prove property (iii). Since the Mi are pairwise disjoint, it follows
from (11.2.12) that f −1 (Mi ) = Mi for every i. Hence, Lemma 11.2.2 implies
that f : Mi → Mi is an expanding map. By Lemma 4.3.4, to show that this map
is transitive it suffices to show that given any open subsets U and V of Mi there
exists n ≥ 1 such that f n (U) intersects V. It is no restriction to assume that U ⊂
Mi,j for some j. Moreover, up to replacing V by some pre-image f −k (V), we
may suppose that V is contained in the same Mi,j . Choose periodic points p ∈ U
and q ∈ V. By the definition of equivalence classes, there exists some pre-orbit
(q−n )n of q asymptotic to the periodic pre-orbit (p̄−n )n of p. In particular, we
may find n arbitrarily large such that q−n ∈ U. Then q ∈ f n (U) ∩ V. Therefore,
f : Mi → Mi is transitive.
Next, we prove property (iv). Since the Mi,j are pairwise disjoint, it follows
from (11.2.12) that f −m(i) (Mi,j ) = Mi,j for every i. Hence (Lemma 11.2.2),
g = f m(i) : Mi,j → Mi,j is an expanding map. We also want to prove that g
is topologically exact. Let U be a non-empty open subset of Mi,j and p be a
periodic point of f in U. By (11.2.12), the period κ is a multiple of m(i), say
κ = sm(i). Let q be any periodic point of f in Mi,j . By the definition of the
equivalence relation ∼, there exists some pre-orbit (q−n )n of q asymptotic to
the periodic pre-orbit (p̄−n )n of p. In particular, d(q−κn , p) → 0 when n → ∞.
Then h_q^{κn}(B(q, ρ)) is contained in U for every n sufficiently large. This implies
that g^{sn}(U) = f^{κn}(U) contains B(q, ρ) for every n sufficiently large. Since Mi,j
11.2.4 Exercises
11.2.1. Show that if f : M → M is a local homeomorphism in a compact connected
metric space then the number of pre-images #f −1 (y) is the same for every y ∈ M.
11.2.2. Show that if an expanding map is topologically mixing then it is topologically
exact.
(i) d(f^{m_{j−1}+i}(p), f^i(x_j)) < ε for 0 ≤ i < n_j and 1 ≤ j ≤ s, and
(ii) f^{m_s}(p) = p.
Proof. Given ε > 0, take δ > 0 as in the shadowing lemma (Proposition 11.2.9).
Without loss of generality, we may suppose that δ < ε and 2ε is a constant of
expansivity for f (recall Corollary 11.2.8). Since f is topologically exact, given
any z ∈ M there exists κ ≥ 1 such that f k (B(z, δ)) = M for every k ≥ κ. Moreover
(see Exercise 11.2.3), since M is compact, we may choose κ depending only
on δ. Let xj , nj , kj ≥ κ, j = 1, . . . , s be as in the statement. In particular, for
each j = 1, . . . , s − 1 there exists yj ∈ B(f nj (xj ), δ) such that f kj (yj ) = xj+1 .
Analogously, there exists ys ∈ B(f ns (xs ), δ) such that f ks (ys ) = x1 . Consider the
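\[ \tilde\varphi_i(x) = \lim_{n} \frac{1}{n} \sum_{t=0}^{n-1} \varphi_i(f^{t}(x)) \quad\text{exists for every } i. \tag{11.3.7} \]
Fix C > sup |φi | ≥ sup |φ̃i | and take δ > 0 such that
\[ d(x, y) < \delta \;\Rightarrow\; |\varphi_i(x) - \varphi_i(y)| < \frac{\varepsilon}{5} \quad\text{for every } i. \tag{11.3.8} \]
Fix κ = κ(δ) ≥ 1 given by the property of specification (Proposition 11.3.1).
Choose points x_j ∈ M, 1 ≤ j ≤ s satisfying (11.3.7) and positive numbers α_j ,
1 ≤ j ≤ s such that ∑_j α_j = 1 and
\[ \Big| \int \tilde\varphi_i \, d\mu - \sum_{j=1}^{s} \alpha_j \tilde\varphi_i(x_j) \Big| < \frac{\varepsilon}{5} \quad\text{for every } i \tag{11.3.9} \]
(use Exercise A.2.6). Take k_j ≡ κ and choose integer numbers n_j much bigger
than κ, in such a way that
\[ \Big| \frac{n_j}{m_s} - \alpha_j \Big| < \frac{\varepsilon}{5Cs} \tag{11.3.10} \]
(recall that m_s = ∑_j (n_j + k_j) = sκ + ∑_j n_j) and, using (11.3.8),
\[ \Big| \sum_{t=0}^{n_j - 1} \varphi_i(f^{t}(x_j)) - n_j \tilde\varphi_i(x_j) \Big| < \frac{\varepsilon}{5} \, n_j \quad\text{for } 1 \le i \le N. \tag{11.3.11} \]
Combining (11.3.9) and (11.3.10) with the fact that ∫ φ̃i dμ = ∫ φi dμ, we get
\[ \Big| \int \varphi_i \, d\mu - \sum_{j=1}^{s} \frac{n_j}{m_s} \tilde\varphi_i(x_j) \Big| < \frac{\varepsilon}{5} + \frac{\varepsilon}{5Cs} \, s \sup |\tilde\varphi_i| < \frac{2\varepsilon}{5}. \tag{11.3.12} \]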
11.3.3 Exercises
11.3.1. Let f : M → M be a continuous transformation in a compact metric space
M. Check that if some iterate f l , l ≥ 1 has the property of specification, or
specification by periodic points, then so does f .
11.3.2. Let f : M → M be a continuous transformation in a metric space with the
property of specification. Show that f is topologically mixing.
11.3.3. Let f : M → M be a topologically mixing expanding map and ϕ : M → R
be a continuous function. Assume that there exist probability measures μ1 , μ2
invariant under f and such that ∫ ϕ dμ1 ≠ ∫ ϕ dμ2 . Show that there exists x ∈ M
such that the time average of ϕ on the orbit of x does not converge. [Observation:
One can show (see [BS00]) that the set Mϕ of points where the time average of
ϕ does not converge has full entropy and full Hausdorff dimension.]
11.3.4. Prove the following generalization of Proposition 11.3.2: if f : M → M is a
topologically exact expanding map then
\[ P(f, \varphi) = \lim_{k} \frac{1}{k} \log \sum_{p \in \mathrm{Fix}(f^{k})} e^{\varphi_k(p)} \quad\text{for every Hölder function } \varphi : M \to \mathbb{R}. \]
Before getting into the details of the proof of Theorem 12.1 let us outline the
main points. The arguments in the proof turn around the transfer operator (or
Ruelle–Perron–Frobenius operator), the linear operator L : C^0(M) → C^0(M)
defined in the Banach space C^0(M) of continuous complex functions by
\[ \mathcal{L} g(y) = \sum_{x \in f^{-1}(y)} e^{\phi(x)} g(x). \tag{12.1.1} \]
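A concrete instance of the formula (12.1.1) may help fix ideas. The sketch below implements the transfer operator for the doubling map f(x) = 2x mod 1 on the circle, where each point y has exactly the two pre-images y/2 and (y + 1)/2. The choice of map, the real-valued test functions and the function names are assumptions made for this illustration.

```python
import math


def transfer_operator(potential, g):
    """Return the function L g for f(x) = 2x mod 1.

    (L g)(y) is the sum of e^{potential(x)} * g(x) over the pre-images x of y.
    """
    def image(y):
        preimages = (y / 2.0, (y + 1.0) / 2.0)  # the two solutions of 2x = y mod 1
        return sum(math.exp(potential(x)) * g(x) for x in preimages)
    return image


def constant_potential(x):
    """The potential -log 2, equal to -log |f'| for the doubling map."""
    return -math.log(2.0)


def one(x):
    return 1.0


def wave(x):
    return math.cos(2 * math.pi * x)


L_one = transfer_operator(constant_potential, one)
L_wave = transfer_operator(constant_potential, wave)
for y in (0.0, 0.3, 0.9):
    print(y, L_one(y), L_wave(y))
```

With this potential the constant function 1 is fixed by the operator, while cos(2πx) is sent to the zero function, since its values at the two pre-images cancel.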
of all complex Borel measures. Then, the dual of the transfer operator is the
linear operator L∗ : M(M) → M(M) defined by
\[ \int g \, d(\mathcal{L}^{*} \eta) = \int \mathcal{L} g \, d\eta \quad\text{for every } g \in C^{0}(M) \text{ and } \eta \in \mathcal{M}(M). \tag{12.1.4} \]
The other comment concerns the Rokhlin formula. Let P be any finite
partition of M with diam P < ρ. For each n ≥ 1, every element of the partition
P^n = ⋁_{j=0}^{n−1} f^{−j}(P) is contained in the image h^{n−1}(P) of some P ∈ P by an
inverse branch h^{n−1} of the iterate f^{n−1}. In particular, diam P^n < σ^{−n+1} ρ for
every n. Then, P satisfies the hypotheses of Theorem 9.7.3 at every point.
Hence, the Rokhlin formula holds for every invariant probability measure.
and may be seen as the cone of finite positive Borel measures. It follows
directly from (12.1.4) that C^0_+(M)^∗ is preserved by the dual operator L∗ .
Lemma 12.1.1. Consider the spectral radius λ = ρ(L∗ ) = ρ(L). Then there
exists some probability measure ν on M such that L∗ ν = λν.
L∗ m = m. (12.1.6)
To check this fact, it is enough to show that L∗ m(E) = m(E) for every
measurable set E contained in the image of some inverse branch hj : B(y, ρ) →
M (because, M being compact, every measurable set may be written as a finite
disjoint union of subsets E of this kind). Now, using the expression (12.1.2),
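\[ \mathcal{L}^{*} m(E) = \int \mathcal{X}_E \, d(\mathcal{L}^{*} m) = \int (\mathcal{L} \mathcal{X}_E) \, dm = \int \sum_{i=1}^{k} \frac{\mathcal{X}_E}{|\det Df|} \circ h_i \, dm. \]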
converges to ν(f (A)). Since the expression on the left-hand side converges to
∫_A λe^{−ϕ} dν, we conclude that
\[ \nu(f(A)) = \int_A \lambda e^{-\phi} \, d\nu, \]
Proof. Suppose, by contradiction, that there exists some open set U ⊂ M such
that η(U) = 0. Note that f is an open map, since it is a local homeomorphism.
Thus, the image f (U) is also an open set. Moreover, we may write U as a finite
disjoint union of domains of invertibility A. For each one of them,
\[ \eta(f(A)) = \int_A J_\eta f \, d\eta = 0. \]
Therefore, η(f (U)) = 0. By induction, it follows that η(f n (U)) = 0 for every
n ≥ 0. Since we take f to be topologically exact, there exists n ≥ 1 such that
f n (U) = M. This contradicts the fact that η(M) = 1.
Lemma 12.1.5. There exists K1 > 0 such that for every n ≥ 1, every x ∈ M and
every y ∈ B(x, n + 1, ρ),
|ϕn (x) − ϕn (y)| ≤ K1 d(f n (x), f n (y))α .
Proof. By hypothesis, d(f i (x), f i (y)) < ρ for every 0 ≤ i ≤ n. Then, for each j =
1, . . . , n, the inverse branch hj : B(f n (x), ρ) → M of f j at the point f n−j (x), which
maps f n (x) to f n−j (x), also maps f n (y) to f n−j (y). Hence, recalling (11.2.6),
d(f n−j (x), f n−j (y)) ≤ σ −j d(f n (x), f n (y)) for every j = 1, . . . , n. Then,
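\[
|\varphi_n(x) - \varphi_n(y)| \le \sum_{j=1}^{n} |\varphi(f^{n-j}(x)) - \varphi(f^{n-j}(y))|
\le \sum_{j=1}^{n} K_0 \sigma^{-j\alpha} d(f^n(x), f^n(y))^{\alpha}.
\]
Therefore, we may take any $K_1 \ge K_0 \sum_{j=0}^{\infty} \sigma^{-j\alpha}$.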
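The distortion bound of Lemma 12.1.5 can be checked numerically in a simple case. The sketch below is only an illustration under stated assumptions: it takes the doubling map f(x) = 2x mod 1 on the circle (so σ = 2) and the hypothetical Lipschitz potential ϕ(x) = cos(2πx) (so α = 1 and K0 = 2π); neither choice comes from the text. Orbits are computed with exact rationals so that the expansion does not amplify rounding errors.

```python
from fractions import Fraction
import math

SIGMA = 2                       # expansion rate of f(x) = 2x mod 1 (illustrative choice)
K0 = 2 * math.pi                # Lipschitz constant of phi below, so alpha = 1
K1 = K0 * SIGMA / (SIGMA - 1)   # K0 * sum_{j >= 0} SIGMA**(-j)

def f(x):
    """Doubling map on the circle R/Z, computed exactly on rationals."""
    return (2 * x) % 1

def phi(x):
    """Hypothetical Lipschitz potential phi(x) = cos(2*pi*x)."""
    return math.cos(2 * math.pi * float(x))

def birkhoff_sum(x, n):
    """phi_n(x) = phi(x) + phi(f(x)) + ... + phi(f^(n-1)(x))."""
    total = 0.0
    for _ in range(n):
        total += phi(x)
        x = f(x)
    return total

def distortion_bound_holds(x, t, n):
    """Check |phi_n(x) - phi_n(y)| <= K1 * d(f^n(x), f^n(y)) for y = x + t/2^n.

    For 0 < t <= 1/2 the points f^j(x) and f^j(y) are at distance t / 2^(n-j)
    for 0 <= j <= n, so d(f^n(x), f^n(y)) = t.
    """
    y = (x + t / Fraction(2) ** n) % 1
    gap = abs(birkhoff_sum(x, n) - birkhoff_sum(y, n))
    return gap <= K1 * float(t)
```

For instance, `distortion_bound_holds(Fraction(1, 7), Fraction(1, 10), 20)` compares two orbit segments of length 20 that stay close to each other; the bound does not degrade as n grows, which is the point of the lemma.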
Corollary 12.1.6. There exists K2 > 0 such that for every n ≥ 1, every x ∈ M
and every y ∈ B(x, n + 1, ρ),
\[
K_2^{-1} \le \frac{J_\nu f^n(x)}{J_\nu f^n(y)} \le K_2.
\]
Proof. From the expression of the Jacobian in Lemma 12.1.3 it follows that
(recall Exercise 9.7.5)
Lemma 12.1.7. For every small ε > 0, there exists K3 = K3 (ε) > 0 such that,
denoting P = log λ,
\[
K_3^{-1} \le \frac{\nu(B(x, n, \varepsilon))}{\exp(\varphi_n(x) - nP)} \le K_3 \quad \text{for every } x \in M \text{ and every } n \ge 1.
\]
Up to reducing ε, we may assume that d(f (x), f (y)) < ρ whenever d(x, y) < ε.
This implies that B(x, n, ε) ⊂ B(x, n + 1, ρ) for every x ∈ M and n ≥ 1. Then,
by Corollary 12.1.6, the value of Jν f n at any point y ∈ B(x, n, ε) differs from
Jν f n (x) by a factor bounded by the constant K2 . It follows that
K2−1 ν(f n (B(x, n, ε))) ≤ Jν f n (x)ν(B(x, n, ε)) ≤ K2 ν(f n (B(x, n, ε))). (12.1.8)
for every x ∈ M and every n. It is clear that the left-hand side of (12.1.9) is
bounded above by 1. Moreover, Jν f = λe−ϕ is bounded from zero and (by
Exercise 12.1.1 and Lemma 12.1.4) the set {ν(B(y, ε)) : y ∈ M} is also bounded
from zero. Therefore, the right-hand side of (12.1.9) is bounded below by some
number a > 0. Using these observations in (12.1.8), we obtain
\[
K_2^{-1} a \le \frac{\nu(B(x, n, \varepsilon))}{\exp(\varphi_n(x) - nP)} \le K_2.
\]
Now it suffices to take K3 = max{K2 /a, K2 }.
where the sum is over all inverse branches hni : B(y, ρ) → M of the iterate f n .
In particular,
\[
\frac{\mathcal{L}^n 1(y_1)}{\mathcal{L}^n 1(y_2)} = \frac{\sum_i e^{\varphi_n(h^n_i(y_1))}}{\sum_i e^{\varphi_n(h^n_i(y_2))}}.
\]
By Lemma 12.1.5, for each of these inverse branches hni one has
|ϕn (hni (y1 )) − ϕn (hni (y2 ))| ≤ K1 d(y1 , y2 )α .
Consequently,
\[
e^{-K_1 d(y_1, y_2)^{\alpha}} \le \frac{\mathcal{L}^n 1(y_1)}{\mathcal{L}^n 1(y_2)} \le e^{K_1 d(y_1, y_2)^{\alpha}}.
\]
Therefore, one may take any K4 ≥ K1 .
It follows that the sequence λ−n Ln 1 is bounded from zero and infinity:
Corollary 12.1.9. There exists K5 > 0 such that K5−1 ≤ λ−n Ln 1(x) ≤ K5 for
every n ≥ 1 and any x ∈ M.
Since f is topologically exact, there exists N ≥ 1 such that f N (B(x, ρ)) = M for
every x ∈ M (check Exercise 11.2.3). Now, given any x, y ∈ M, we may find
$x' \in B(x, \rho)$ such that $f^N(x') = y$. Then, on the one hand,
\[
\mathcal{L}^{n+N} 1(y) = \sum_{z \in f^{-N}(y)} e^{\varphi_N(z)} \mathcal{L}^n 1(z) \ge e^{\varphi_N(x')} \mathcal{L}^n 1(x') \ge e^{-cN} \mathcal{L}^n 1(x').
\]
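On the other hand, Lemma 12.1.8 gives that $\mathcal{L}^n 1(x') \ge \mathcal{L}^n 1(x) \exp(-K_4 \rho^{\alpha})$.
Take $c = \sup |\varphi|$ and $K \ge \exp(K_4 \rho^{\alpha}) e^{cN} \lambda^N$. Combining the previous inequalities,
we get that
\[
\mathcal{L}^{n+N} 1(y) \ge \exp(-K_4 \rho^{\alpha}) e^{-cN} \mathcal{L}^n 1(x) \ge K^{-1} \lambda^N \mathcal{L}^n 1(x)
\]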
for every x, y ∈ M. Therefore, for every n ≥ 1,
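\[
\min \lambda^{-(n+N)} \mathcal{L}^{n+N} 1 \ge K^{-1} \max \lambda^{-n} \mathcal{L}^n 1. \qquad (12.1.11)
\]
Combining (12.1.10) and (12.1.11), we get:
\[
\max \lambda^{-n} \mathcal{L}^n 1 \le K \min \lambda^{-(n+N)} \mathcal{L}^{n+N} 1 \le K \quad \text{for every } n \ge 1,
\]
\[
\min \lambda^{-n} \mathcal{L}^n 1 \ge K^{-1} \max \lambda^{-n+N} \mathcal{L}^{n-N} 1 \ge K^{-1} \quad \text{for every } n > N.
\]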
To conclude the proof, we only have to extend this last estimate to the values
n = 1, . . . , N. For that, observe that each Ln 1 is a positive continuous function.
Since M is compact, it follows that the minimum of Ln 1 is positive for every n.
Then, we may take K5 ≥ K such that min λ−n Ln 1 ≥ K5−1 for every n = 1, . . . , N.
We are ready to show that the transfer operator L admits some eigenfunction
associated with the eigenvalue λ. Corollary 12.1.9 and Lemma 12.1.10 imply
that the time average
\[
h_n = \frac{1}{n} \sum_{i=0}^{n-1} \lambda^{-i} \mathcal{L}^i 1
\]
The first term on the right-hand side converges to λh whereas the second
one converges to zero, because the sequence λ−n Ln 1 is uniformly bounded.
It follows that Lh = λh, as we stated.
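The convergence of the time averages to an eigenfunction can be observed in a finite-dimensional toy case. The sketch below assumes the one-sided full shift on two symbols and a potential depending only on the first two coordinates; on functions of the first coordinate the transfer operator then acts as a 2 × 2 positive matrix. The weights `W` are hypothetical values of $e^{\varphi}$, chosen only for illustration.

```python
import math

# Hypothetical weights W[a][b] = exp(phi(a, b)) for the full shift on {0, 1}.
W = [[1.0, 2.0],
     [3.0, 1.0]]

def transfer(g):
    """(L g)(b) = sum over the first symbol a of exp(phi(a, b)) * g(a)."""
    return [sum(W[a][b] * g[a] for a in range(2)) for b in range(2)]

# Spectral radius of a positive 2x2 matrix: its largest (Perron) eigenvalue.
trace = W[0][0] + W[1][1]
det = W[0][0] * W[1][1] - W[0][1] * W[1][0]
lam = (trace + math.sqrt(trace * trace - 4.0 * det)) / 2.0

def time_average(n):
    """h_n = (1/n) * sum_{i=0}^{n-1} lam^(-i) L^i 1."""
    term = [1.0, 1.0]           # lam^(-i) L^i 1, starting at i = 0
    total = [0.0, 0.0]
    for _ in range(n):
        total = [total[b] + term[b] for b in range(2)]
        term = [v / lam for v in transfer(term)]
    return [v / n for v in total]

h = time_average(5000)
Lh = transfer(h)
# L h_n - lam h_n = (lam / n) * (lam^(-n) L^n 1 - 1), which tends to zero.
residual = max(abs(Lh[b] - lam * h[b]) for b in range(2))
```

The residual decays like 1/n because the sequence $\lambda^{-n}\mathcal{L}^n 1$ stays bounded, which mirrors the argument in the text.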
Note that $\int \lambda^{-n} \mathcal{L}^n 1 \, d\nu = \int \lambda^{-n} \, d(\mathcal{L}^{*n} \nu) = \int 1 \, d\nu = 1$ for every n ∈ N, by the
definition of ν. It follows that $\int h_n \, d\nu = 1$ for every n and, using the dominated
convergence theorem, $\int h \, d\nu = 1$. All the other claims in the statement follow,
in an entirely analogous way, from Corollary 12.1.9 and Lemma 12.1.10.
We are going to see that μ is an equilibrium state for the potential ϕ and
satisfies all the other conditions in Theorem 12.1.
From Lemma 12.1.11 we get that $\mu(M) = \int h \, d\nu = 1$ and so μ is a
probability measure. Moreover,
also follows from the relation (12.1.12), together with Lemma 12.1.7, that μ is
a Gibbs state: taking $L = K_5 K_3$, we find that
\[
L^{-1} \le \frac{\mu(B(x, n, \varepsilon))}{\exp(\varphi_n(x) - nP)} \le L, \qquad (12.1.13)
\]
for every x ∈ M and every n ≥ 1. Recall that P = log λ.
Lemma 12.1.12. The probability measure μ is invariant under f . Moreover, f
admits a Jacobian with respect to μ, given by Jμ f = λe−ϕ (h ◦ f )/h.
Proof. Start by noting that $\mathcal{L}((g_1 \circ f) g_2) = g_1 \mathcal{L} g_2$, for any continuous functions
g1 , g2 : M → R. Indeed, for every y ∈ M,
\[
\mathcal{L}((g_1 \circ f) g_2)(y) = \sum_{x \in f^{-1}(y)} e^{\varphi(x)} g_1(f(x)) g_2(x)
= g_1(y) \sum_{x \in f^{-1}(y)} e^{\varphi(x)} g_2(x) = g_1(y) \mathcal{L} g_2(y). \qquad (12.1.14)
\]
Proof. Combining the Rokhlin formula (Theorem 9.7.3) with the second part
of Lemma 12.1.12,
\[
h_\mu(f) = \int \log J_\mu f \, d\mu = \log \lambda - \int \varphi \, d\mu + \int (\log h \circ f - \log h) \, d\mu.
\]
Since μ is invariant and log h is bounded (Corollary 12.1.9), the last term is
equal to zero. This shows that $h_\mu(f) = P - \int \varphi \, d\mu$, as stated.
By the definition of g and the hypothesis that η is invariant, the integral on the
right-hand side of (12.1.18) is equal to
\[
\int (-\log g_\eta + \log g + \log h \circ f - \log h) \, d\eta = \int \log \frac{g}{g_\eta} \, d\eta. \qquad (12.1.19)
\]
Recalling the definition of gη , Exercise 9.7.3 gives that
\[
\int \log \frac{g}{g_\eta} \, d\eta = \int \sum_{x \in f^{-1}(y)} g_\eta(x) \log \frac{g}{g_\eta}(x) \, d\eta(y). \qquad (12.1.20)
\]
for every j = 1, . . . , n.
For each y ∈ M, take pi = gη (xi ) and bi = g(xi )/gη (xi ), where the xi are the
pre-images of y. The identity (12.1.17) means that $\sum_i p_i = 1$ for η-almost every
y. Then, we may apply Lemma 12.1.14:
\[
\sum_{x \in f^{-1}(y)} g_\eta(x) \log \frac{g}{g_\eta}(x) \le \log \sum_{x \in f^{-1}(y)} g_\eta(x) \frac{g}{g_\eta}(x)
= \log \sum_{x \in f^{-1}(y)} g(x) = 0 \qquad (12.1.21)
\]
for η-almost every y; in the last step we used (12.1.16). Combining the relations
(12.1.18) through (12.1.21), we find:
\[
h_\eta(f) + \int \varphi \, d\eta - P = \int \log \frac{g}{g_\eta} \, d\eta \le 0. \qquad (12.1.22)
\]
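The inequality borrowed from Lemma 12.1.14 is the concavity of the logarithm: $\sum_i p_i \log b_i \le \log \sum_i p_i b_i$ for positive numbers with $\sum_i p_i = 1$, with equality exactly when all the $b_i$ coincide. A minimal numerical sketch, with arbitrary illustrative values for the weights and the numbers $b_i$:

```python
import math

def jensen_gap(p, b):
    """Return log(sum p_i b_i) - sum p_i log b_i, which is never negative."""
    assert abs(sum(p) - 1.0) < 1e-12 and all(x > 0 for x in p + b)
    lhs = sum(pi * math.log(bi) for pi, bi in zip(p, b))
    rhs = math.log(sum(pi * bi for pi, bi in zip(p, b)))
    return rhs - lhs

# Unequal b_i: the gap is strictly positive.
gap_unequal = jensen_gap([0.2, 0.3, 0.5], [1.0, 2.0, 4.0])
# Equal b_i: the gap vanishes (up to rounding), the equality case of the lemma.
gap_equal = jensen_gap([0.2, 0.3, 0.5], [3.0, 3.0, 3.0])
```

In the proof, the weights are $p_i = g_\eta(x_i)$ and the numbers are $b_i = g(x_i)/g_\eta(x_i)$, so a vanishing gap forces $g/g_\eta$ to be constant on each fiber $f^{-1}(y)$.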
Corollary 12.1.15. P(f , ϕ) = P = log ρ(L).
Proof. The first claim is an immediate consequence of the second one and
Lemma 12.1.4.
Note that the equality in (12.1.22) also implies that equality holds in (12.1.21)
for η-almost every y ∈ M. According to Lemma 12.1.14, that happens if
and only if the numbers $b_i = g(x_i)/g_\eta(x_i)$ are all equal. In other words, for
η-almost every y ∈ M there exists a number c(y) such that
\[
\frac{g(x)}{g_\eta(x)} = c(y) \quad \text{for every } x \in f^{-1}(y).
\]
for η-almost every y. It follows that gη = g at η-almost every point, and so the
function 1/g = λe−ϕ (h ◦ f )/h is a Jacobian of f with respect to η. This proves
the second claim.
To prove the third claim, let ξ : M → R be any continuous function. On the
one hand, by the definition of the transfer operator,
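\[
\int \xi \, d\mathcal{L}^*\Big(\frac{\eta}{h}\Big) = \int \frac{1}{h} \mathcal{L}\xi \, d\eta
= \int \frac{1}{h(y)} \sum_{x \in f^{-1}(y)} e^{\varphi(x)} \xi(x) \, d\eta(y). \qquad (12.1.23)
\]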
Since the continuous function ξ is arbitrary, this shows that L∗ (η/h) = λ(η/h),
as stated.
for any x ∈ hn (P). Recalling that Jη1 f = Jη2 f (Corollary 12.1.16), it follows that
\[
K_7^{-2} \le \frac{\eta_2(P) \eta_1(h^n(P))}{\eta_1(P) \eta_2(h^n(P))} \le K_7^{2}. \qquad (12.1.26)
\]
Combining Lemmas 4.3.3 and 12.1.18 we get that all the ergodic equilibrium
states are equal. Now, by Proposition 10.5.5, the ergodic components
of any equilibrium state are also equilibrium states (ergodic, of course). It
follows that there exists a unique equilibrium state, as stated.
There is an alternative proof of the fact that the equilibrium state is unique
that does not use Proposition 10.5.5 and, thus, does not require the theorem of
Jacobs. Indeed, the results in the next section imply that the equilibrium state
μ = hν in Section 12.1.4 is ergodic. By Lemma 12.1.18, that implies that all
the equilibrium states are ergodic. Using Lemma 4.3.3, it follows that all the
equilibrium states must coincide.
As a consequence, the reference measure ν is also unique: if there were two
distinct reference measures, ν1 and ν2 , then μ1 = hν1 and μ2 = hν2 would be
distinct equilibrium states. Analogously, the positive eigenfunction h is unique
up to multiplication by a positive constant.
12.1.7 Exactness
Finally, let us prove that the system (f , μ) is exact. Recall that this means that
if B ⊂ M is such that there exist measurable sets Bn satisfying B = f −n (Bn ) for
every n ≥ 1, then B has measure 0 or measure 1.
Let B be such a subset of M and assume that μ(B) > 0. Let P be a finite
partition of M by subsets with non-empty interior and diameter less than ρ.
For each n, let Qn be the partition of M whose elements are the images hn (P)
of the sets P ∈ P under the inverse branches hn of the iterate f n .
Lemma 12.1.19. For every ε > 0 and every n ≥ 1 sufficiently large there exists
some hn (P) ∈ Qn such that
\[
\mu(B \cap h^n(P)) > (1 - \varepsilon) \mu(h^n(P)). \qquad (12.1.28)
\]
Proof. Fix ε > 0. Since the measure μ is regular (Proposition A.3.2), given any
δ > 0 there exist some compact set F ⊂ B and some open set A ⊃ B satisfying
μ(A \ F) < δ. Since we assume that μ(B) > 0, this inequality implies that
μ(F) > (1 − ε)μ(A), as long as δ > 0 is sufficiently small. Fix δ from now on.
Note that diam Qn < σ −n ρ. Then, for every n sufficiently large, any element
hn (P) of Qn that intersects F is contained in A. By contradiction, suppose
that (12.1.28) is false for every hn (P). Then, adding over all the hn (P) that
intersect F,
\[
\mu(F) \le \sum_{P, h^n} \mu(F \cap h^n(P)) \le \sum_{P, h^n} \mu(B \cap h^n(P))
\le (1 - \varepsilon) \sum_{P, h^n} \mu(h^n(P)) \le (1 - \varepsilon) \mu(A).
\]
L∗ m = m.
Applying the previous theory (from Lemma 12.1.3 on) with λ = 1 and ν = m,
we find a Hölder function h : M → R, bounded from zero and infinity, such
that Lh = h and the measure μ = hm is the equilibrium state of the potential
ϕ. Recalling Corollary 11.1.15, it follows that μ is also the unique probability
measure invariant under f and absolutely continuous with respect to m. The
fact that h is positive implies that μ and m are equivalent measures. Exactness
was proven in Section 12.1.7.
Let ϕ̃ be the time average of the function ϕ, given by the Birkhoff ergodic
theorem. Then,
\[
\int \log |\det Df| \, d\mu = \int -\varphi \, d\mu = \int -\tilde\varphi \, d\mu. \qquad (12.1.32)
\]
Moreover,
\[
-\tilde\varphi(x) = \lim_n \frac{1}{n} \sum_{j=0}^{n-1} \log |\det Df(f^j(x))| = \lim_n \frac{1}{n} \log |\det Df^n(x)| \qquad (12.1.33)
\]
at μ-almost every point. In the context of our comments about the Oseledets
theorem (see the relation (c1) in Section 3.3.5) we mentioned that
\[
\lim_n \frac{1}{n} \log |\det Df^n(x)| = \sum_{i=1}^{k(x)} d_i(x) \lambda_i(x), \qquad (12.1.34)
\]
where λ1 (x), . . . , λk(x) (x) are the Lyapunov exponents of the transformation
f at the point x and d1 (x), . . . , dk(x) (x) are the corresponding multiplicities.
Since these functions are invariant (see the relation (a1) in Section 3.3.5)
and the measure μ is ergodic, the functions k(x), λi (x) and di (x) are constant
at μ-almost every point. Let us denote by k, λi and di these constants. Then
(12.1.35) translates into the following theorem:
12.1.9 Exercises
12.1.1. Show that if η is a Borel measure on a compact metric space then for every
ε > 0 there exists b > 0 such that η(B(y, ε)) > b for every y ∈ supp η.
12.1.2. Let f : M → M be an expanding map. Consider the non-linear operator G :
M1 (M) → M1 (M) defined in the space M1 (M) of all Borel probability
measures by
\[
G(\eta) = \frac{\mathcal{L}^*(\eta)}{\int \mathcal{L}1 \, d\eta}.
\]
Use the Tychonoff–Schauder theorem (Theorem 2.2.3) to show that G admits
some fixed point and deduce Lemma 12.1.1.
12.1.3. Let $\sigma : \Sigma_A \to \Sigma_A$ be the one-sided shift of finite type associated with a given
transition matrix A (recall Section 10.2.2). Let P be a stochastic matrix such that
$P_{i,j} = 0$ whenever $A_{i,j} = 0$ and p be a probability vector with positive coefficients
such that $P^* p = p$. Consider the transfer operator $\mathcal{L}$ associated with the locally
constant potential
\[
\varphi(i_0, i_1, \dots, i_n, \dots) = -\log \frac{p_{i_1}}{p_{i_0} P_{i_0, i_1}}.
\]
Show that the Markov measure μ associated with the matrix P and the vector p
satisfies L∗ μ = μ.
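Before attempting Exercise 12.1.3, it may help to test the claim on cylinders in a small example. The sketch below uses a hypothetical two-symbol chain (the matrix `P` and the vector `p` are illustrative choices, not taken from the text) and exact rational arithmetic. It relies on the observation that, for a word $a_0 a_1 \dots a_k$, the function $\mathcal{L}\mathcal{X}_{[a_0 \dots a_k]}$ equals $e^{\varphi} = p_{a_0} P_{a_0, a_1} / p_{a_1}$ on the cylinder $[a_1 \dots a_k]$ and vanishes elsewhere.

```python
from fractions import Fraction
from itertools import product

# Hypothetical data: symbols {0, 1}; the transition 1 -> 1 is forbidden.
P = [[Fraction(1, 2), Fraction(1, 2)],
     [Fraction(1), Fraction(0)]]
p = [Fraction(2, 3), Fraction(1, 3)]    # stationary: sum_i p[i] * P[i][j] == p[j]

def mu(word):
    """Markov measure of the cylinder [word]: p[a0] * P[a0][a1] * ... ."""
    m = p[word[0]]
    for a, b in zip(word, word[1:]):
        m *= P[a][b]
    return m

def integral_of_L_indicator(word):
    """Integral of L applied to the indicator of [word], for len(word) >= 2."""
    a0, a1 = word[0], word[1]
    weight = p[a0] * P[a0][a1] / p[a1]  # exp(phi) on sequences starting a0 a1
    return weight * mu(word[1:])

words = [w for k in (2, 3, 4) for w in product(range(2), repeat=k)]
invariant_on_cylinders = all(integral_of_L_indicator(w) == mu(w) for w in words)
```

The check only covers cylinders of length 2 to 4 in one example; the exercise asks for a proof valid for every measurable set.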
12.1.4. Let λ be any positive number and ν be a Borel probability measure
such that L∗ ν = λν. Show that, given any u ∈ L1 (ν) and any continuous
12.2 Theorem of Livšic 399
function v : M → R,
\[
\int (u \circ f) v \, d\nu = \int u (\lambda^{-1} \mathcal{L} v) \, d\nu.
\]
for every x ∈ M such that f n (x) = x. The converse is a lot more interesting.
Suppose that ϕn (x) = 0 for every x ∈ Fix(f n ) and every n ≥ 1. Consider
any point z ∈ M whose orbit is dense in M; such a point exists because f is
topologically exact and, consequently, transitive. Define the function u on the
orbit of z through the following relation:
u(f n (z)) = u(z) + ϕn (z), (12.2.1)
where u(z) is arbitrary. Observe that
u(f n+1 (z)) − u(f n (z)) = ϕn+1 (z) − ϕn (z) = ϕ(f n (z)) (12.2.2)
for every n ≥ 0. In other words, the cohomology relation
ϕ = u ◦ f − u (12.2.3)
holds on the orbit of z. To extend this relation to the whole of M, we use the
following fact:
Proof. Given ε ∈ (0, ρ), take δ > 0 given by the shadowing lemma (Proposi-
tion 11.2.9). Suppose that k ≥ 0 and l ≥ 1 are such that d(f k (z), f k+l (z)) < δ.
Then the periodic sequence (xn )n of period l given by
x0 = f k (z), x1 = f k+1 (z), . . . , xl−1 = f k+l−1 (z), xl = f k (z)
\[
|\varphi_l(f^k(z)) - \varphi_l(x)| \le \sum_{j=0}^{l-1} |\varphi(f^{k+j}(z)) - \varphi(f^j(x))| \le \sum_{j=0}^{l-1} C d(f^j(x), f^{k+j}(z))^{\nu}.
\]
(i) μφ = μψ ;
(ii) there exist c ∈ R and an arbitrary function u : M → R such that φ − ψ =
c + u ◦ f − u;
(iii) φ − ψ is cohomologous to some constant c ∈ R;
(iv) there exist c ∈ R and a Hölder function u : M → R such that φ − ψ =
c + u ◦ f − u;
(v) there exists c ∈ R such that φn (x) − ψn (x) = cn for every x ∈ Fix(f n ) and
n ≥ 1.
Moreover, the constants c ∈ R in (ii), (iii), (iv) and (v) coincide; indeed, they
are all equal to P(f , φ) − P(f , ψ).
Proof. It is clear that (iv) implies (iii) and (iii) implies (ii).
Since f n (x) = x, the sum of the last two terms over every j = 0, . . . , n − 1
vanishes. Therefore, φn (x) − ψn (x) = cn. This proves that (ii) implies (v).
Suppose that φn (x) − ψn (x) = cn for every x ∈ Fix(f n ) and every n ≥ 0. That
means that the function ϕ = φ − ψ − c satisfies ϕn (x) = 0 for every x ∈ Fix(f n )
and every n ≥ 0. Note also that ϕ is Hölder. Hence, by Theorem 12.2.1, there
exists a Hölder function u : M → R such that ϕ = u ◦ f − u. In other words,
φ − ψ = c + u ◦ f − u with u Hölder. This shows that (v) implies (iv).
It follows from (10.3.4) and Proposition 10.3.8 that if φ is cohomologous to
ψ + c then
P(f , φ) = P(f , ψ + c) = P(f , ψ) + c.
On the other hand, given any invariant probability measure ν,
\[
h_\nu(f) + \int \phi \, d\nu = h_\nu(f) + \int (\psi + c) \, d\nu = h_\nu(f) + \int \psi \, d\nu + c.
\]
Therefore, Theorem 12.2.3 gives that μ = μ0 if and only if there exists some
number c ∈ R such that log | det Df n (x)| = 0 + cn for every x ∈ Fix(f n ) and
every n ≥ 1.
12.2.1 Exercises
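12.2.1. Consider the two-sided shift $\sigma : \Sigma \to \Sigma$ in $\Sigma = \{1, \dots, d\}^{\mathbb{Z}}$. Show that for
every Hölder function $\varphi : \Sigma \to \mathbb{R}$, there exists a Hölder function $\varphi^+ : \Sigma \to \mathbb{R}$,
cohomologous to ϕ and such that $\varphi^+(x) = \varphi^+(y)$ whenever $x = (x_i)_{i \in \mathbb{Z}}$ and
$y = (y_i)_{i \in \mathbb{Z}}$ are such that $x_i = y_i$ for i ≥ 0.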
12.2.2. Prove that if the functions ϕ, ψ : M → R are such that there exist constants C, L
satisfying |ϕn (x) − ψn (x) − nC| ≤ L for every x ∈ M, then P(f , ϕ) = P(f , ψ) + C
and ϕ is cohomologous to ψ + C.
12.2.3. Let f : M → M be a differentiable expanding map on a compact manifold, with
Hölder derivative. Check that any two potentials of the form ϕ = − log | det Df |,
for two different choices of a Riemannian metric on M, are cohomologous.
[Observation: In particular, all such potentials have the same equilibrium state,
namely, the absolutely continuous invariant probability measure. This was
observed before, in Section 12.1.8.]
12.2.4. Given k ≥ 2, let f : S1 → S1 be the (expanding) map given by f (x) = kx mod Z.
Let g : S1 → S1 be a differentiable expanding map of degree k. Show that f and
g are topologically conjugate.
12.2.5. Given k ≥ 2, let f : S1 → S1 be the map given by f (x) = kx mod Z. Let g : S1 →
S1 be a differentiable expanding map of degree k, with Hölder derivative. Show
that the following conditions are equivalent:
(a) f and g are conjugated by some diffeomorphism;
(b) f and g are conjugated by some absolutely continuous homeomorphism
whose inverse is also absolutely continuous;
(c) $(g^n)'(p) = k^n$ for every $p \in \mathrm{Fix}(g^n)$.
\[
= \int (g_1 \circ f^n)(g_2 h) \, d\nu - \int g_1 \, d\mu \int (g_2 h) \, d\nu = B_n(g_1, g_2 h).
\]
This proves the first part of the theorem, with K2 (g2 ) = K1 (g2 h)/K5 . The
second part is an immediate consequence: if g1 is β-Hölder then g1 ∈ L1 (μ)
and it suffices to take $K(g_1, g_2) = K_2(g_2) \int |g_1| \, d\mu$.
Figure 12.1. Defining the projective distance in a cone C
12.3 Decay of correlations 405
Therefore, α(v2 , v1 ) = β(v1 , v2 )−1 in all cases. Exchanging the roles of v1 and
v2 , we also get that β(v2 , v1 ) = α(v1 , v2 )−1 for any v1 , v2 ∈ C. Part (i) of the
proposition is an immediate consequence of these observations.
Next, we claim that α(v1 , v2 )α(v2 , v3 ) ≤ α(v1 , v3 ) for any v1 , v2 , v3 ∈ C. This
is obvious if α(v1 , v2 ) = 0 or α(v2 , v3 ) = 0; therefore, we may suppose that
α(v1 , v2 ) > 0 and α(v2 , v3 ) > 0. Then, by definition, there exist increasing
sequences of positive numbers (rn )n → α(v1 , v2 ) and (sn )n → α(v2 , v3 ) such
that
\[
v_2 - r_n v_1 \in C \quad \text{and} \quad v_3 - s_n v_2 \in C \quad \text{for every } n \ge 1.
\]
where
\[
R(a, b, c, d) = \frac{(c - a)(d - b)}{(b - a)(d - c)}
\]
denotes the cross-ratio of four positive numbers a < b ≤ c < d.
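The cross-ratio is unchanged by Möbius transformations of the real line, which is what makes it a natural projective quantity. The sketch below checks this on one example with exact rational arithmetic; the four points and the coefficients of the transformation are arbitrary illustrative choices.

```python
from fractions import Fraction

def cross_ratio(a, b, c, d):
    """R(a, b, c, d) = (c - a)(d - b) / ((b - a)(d - c))."""
    return ((c - a) * (d - b)) / ((b - a) * (d - c))

def mobius(alpha, beta, gamma, delta):
    """Return x -> (alpha*x + beta) / (gamma*x + delta), with non-zero determinant."""
    assert alpha * delta - beta * gamma != 0
    return lambda x: (alpha * x + beta) / (gamma * x + delta)

points = [Fraction(1), Fraction(2), Fraction(3), Fraction(5)]   # a < b <= c < d
phi = mobius(2, 1, 1, 3)        # x -> (2x + 1) / (x + 3), increasing on x > -3

before = cross_ratio(*points)
after = cross_ratio(*[phi(x) for x in points])
```

Here both `before` and `after` equal 3: the images of the four points are 3/4, 1, 7/6 and 11/8, and the two products in the cross-ratio rescale by the same factor.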
In Exercise 12.3.2 we invite the reader to check a similar fact when the
interval is replaced by the unit disk D = {z ∈ C : |z| < 1}.
Given b > 0 and β > 0, we denote by C(b, β) the set of positive functions
g ∈ C0 (M) whose logarithm is (b, β)-Hölder on balls of radius ρ, that is, such
that
| log g(x) − log g(y)| ≤ bd(x, y)β whenever d(x, y) < ρ. (12.3.8)
Lemma 12.3.8. For any b > 0 and β > 0, the set C(b, β) is a cone in the space
E = C0 (M) and the corresponding projective distance is given by
\[
\theta(g_1, g_2) = \log \frac{\beta(g_1, g_2)}{\alpha(g_1, g_2)},
\]
where α(g1 , g2 ) is the infimum and β(g1 , g2 ) is the supremum of the set
\[
\Big\{ \frac{g_2}{g_1}(x), \ \frac{\exp(b d(x, y)^{\beta}) g_2(x) - g_2(y)}{\exp(b d(x, y)^{\beta}) g_1(x) - g_1(y)} : x \ne y \text{ and } d(x, y) < \rho \Big\}.
\]
have to check that it is convex. Consider any g1 , g2 ∈ C(b, β). The definition
(12.3.8) means that
\[
\exp(-b d(x, y)^{\beta}) \le \frac{g_i(x)}{g_i(y)} \le \exp(b d(x, y)^{\beta})
\]
for i = 1, 2 and any x, y ∈ M with d(x, y) < ρ. Then, given t1 , t2 > 0,
\[
\exp(-b d(x, y)^{\beta}) \le \frac{t_1 g_1(x) + t_2 g_2(x)}{t_1 g_1(y) + t_2 g_2(y)} \le \exp(b d(x, y)^{\beta})
\]
for any x, y ∈ M with d(x, y) < ρ. Hence, t1 g1 + t2 g2 is in C(b, β).
We proceed to calculate the projective distance. By definition, α(g1 , g2 )
is the supremum of all the numbers t > 0 satisfying the following three
conditions:
\[
(g_2 - t g_1)(x) > 0 \iff t < \frac{g_2}{g_1}(x)
\]
\[
\frac{(g_2 - t g_1)(x)}{(g_2 - t g_1)(y)} \le \exp(b d(x, y)^{\beta}) \iff t \le \frac{\exp(b d(x, y)^{\beta}) g_2(y) - g_2(x)}{\exp(b d(x, y)^{\beta}) g_1(y) - g_1(x)}
\]
\[
\frac{(g_2 - t g_1)(x)}{(g_2 - t g_1)(y)} \ge \exp(-b d(x, y)^{\beta}) \iff t \le \frac{\exp(b d(x, y)^{\beta}) g_2(x) - g_2(y)}{\exp(b d(x, y)^{\beta}) g_1(x) - g_1(y)}
\]
for any x, y ∈ M with x ≠ y and d(x, y) < ρ. Hence, α(g1 , g2 ) is equal to
\[
\inf \Big\{ \frac{g_2(x)}{g_1(x)}, \ \frac{\exp(b d(x, y)^{\beta}) g_2(x) - g_2(y)}{\exp(b d(x, y)^{\beta}) g_1(x) - g_1(y)} : x \ne y \text{ and } d(x, y) < \rho \Big\}.
\]
Analogously, β(g1 , g2 ) is the supremum of this same set.
The crucial fact that makes the proof of Theorem 12.3.1 work is that the
transfer operator tends to improve the regularity of functions or, more
precisely, their Hölder constants. The next lemma is a concrete manifestation
of this fact:
Lemma 12.3.9. For each β ∈ (0, α] there exists a constant λ0 ∈ (0, 1) such that
L(C(b, β)) ⊂ C(λ0 b, β) for every b sufficiently large (depending on β).
for i = 1, 2, where the points $x_{i,j} \in f^{-1}(y_i)$ satisfy $d(x_{1,j}, x_{2,j}) \le \sigma^{-1} d(y_1, y_2)$ for
every 1 ≤ j ≤ k. By hypothesis, ϕ is (K0 , α)-Hölder. Since we suppose that
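\[
\le \sum_{i=1}^{k} e^{\varphi(x_{2,i})} g(x_{2,i}) \exp\big( b d(x_{1,i}, x_{2,i})^{\beta} + K d(x_{1,i}, x_{2,i})^{\beta} \big)
\le (\mathcal{L}g)(y_2) \exp\big( (b + K) \sigma^{-\beta} d(y_1, y_2)^{\beta} \big)
\]
for every g ∈ C(b, β). Fix $\lambda_0 \in (\sigma^{-\beta}, 1)$. For every b sufficiently large,
$(b + K) \sigma^{-\beta} \le b \lambda_0$. Then the previous relation gives that
\[
(\mathcal{L}g)(y_1) \le (\mathcal{L}g)(y_2) \exp(\lambda_0 b \, d(y_1, y_2)^{\beta}),
\]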
for any y1 , y2 ∈ M with d(y1 , y2 ) < ρ. Exchanging the roles of y1 and y2 , we
obtain the other inequality.
On the other hand, by the choice of N, there exists x ∈ B(z, ρ) such that f N (x) =
y2 . Then,
\[
\mathcal{L}^N g(y_2) \ge e^{\varphi_N(x)} g(x) \ge e^{-N \sup |\varphi|} e^{-b d(x, z)^{\beta}} g(z) \ge e^{-N \sup |\varphi| - b \rho^{\beta}} g(z).
\]
Since y1 and y2 are arbitrary, this proves that
\[
\frac{\sup \mathcal{L}^N g}{\inf \mathcal{L}^N g} \le \operatorname{degree}(f)^N e^{2N \sup |\varphi| + b \rho^{\beta}}.
\]
Now it suffices to take L equal to the expression on the right-hand side of this
inequality.
Combining Lemmas 12.3.9 and 12.3.10, we get that there exists N ≥ 1 and,
given β ∈ (0, α] there exists λ0 ∈ (0, 1) such that, for every b > 0 sufficiently
large (depending on N and β) there exists L > 1, satisfying
\[
\mathcal{L}^N(C(b, \beta)) \subset C(\lambda_0^N b, \beta) \cap C(L). \qquad (12.3.9)
\]
In what follows, we write C(c, β, R) = C(c, β) ∩ C(R) for any c > 0, β > 0 and
R > 1.
Lemma 12.3.11. For every c ∈ (0, b) and R > 1, the set C(c, β, R) ⊂
C(b, β) has finite diameter with respect to the projective distance of the cone
C(b, β).
Proof. We use the expression of θ given by Lemma 12.3.8. On the one hand,
the hypothesis that g1 , g2 ∈ C(c, β) ensures that
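\[
\frac{\exp(b d(x, y)^{\beta}) g_2(x) - g_2(y)}{\exp(b d(x, y)^{\beta}) g_1(x) - g_1(y)}
= \frac{g_2(x)}{g_1(x)} \cdot \frac{1 - \exp(-b d(x, y)^{\beta}) g_2(y)/g_2(x)}{1 - \exp(-b d(x, y)^{\beta}) g_1(y)/g_1(x)}
\]
\[
\ge \frac{g_2(x)}{g_1(x)} \cdot \frac{1 - \exp(-(b - c) d(x, y)^{\beta})}{1 - \exp(-(b + c) d(x, y)^{\beta})}
\ge \frac{g_2(x)}{g_1(x)} \cdot \frac{1 - \exp(-(b - c) \rho^{\beta})}{1 - \exp(-(b + c) \rho^{\beta})}
\]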
for any x, y ∈ M with d(x, y) < ρ. Denote by r the value of the last fraction on
the right-hand side. Then, observing that r ∈ (0, 1),
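\[
\alpha(g_1, g_2) \ge \inf \Big\{ \frac{g_2(x)}{g_1(x)}, \ r \frac{g_2(x)}{g_1(x)} : x \in M \Big\} = r \inf \Big\{ \frac{g_2(x)}{g_1(x)} : x \in M \Big\} \ge r \frac{\inf g_2}{\sup g_1}.
\]
Analogously,
\[
\beta(g_1, g_2) \le \sup \Big\{ \frac{g_2(x)}{g_1(x)}, \ \frac{1}{r} \frac{g_2(x)}{g_1(x)} : x \in M \Big\} = \frac{1}{r} \sup \Big\{ \frac{g_2(x)}{g_1(x)} : x \in M \Big\} \le \frac{1}{r} \frac{\sup g_2}{\inf g_1}.
\]
On the other hand, the hypothesis that g1 , g2 ∈ C(R) gives that
\[
\frac{\sup g_2}{\inf g_1} \le R^2 \frac{\inf g_2}{\sup g_1}.
\]
Combining these three inequalities, we conclude that $\theta(g_1, g_2) \le \log(R^2 / r^2)$
for any g1 , g2 ∈ C(c, β, R).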
Corollary 12.3.12. There exists N ≥ 1 such that for every β ∈ (0, α] and every
b > 0 sufficiently large there exists $\Lambda_0 < 1$ satisfying
Then LN (C(b, β)) ⊂ C(c, β, R) and it follows from Lemma 12.3.11 that the
diameter D of the image LN (C(b, β)) with respect to the projective distance
θ is finite. Take $\Lambda_0 = \tanh(D/4)$. Now the conclusion of the corollary is an
immediate application of Proposition 12.3.6.
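The contraction rate tanh(D/4) can be observed in the simplest cone, the positive orthant of $\mathbb{R}^2$, where the projective distance is $\theta(v, w) = \log(\max_i (w_i/v_i) / \min_i (w_i/v_i))$ and a matrix with positive entries plays the role of the operator. This is a stand-in for the cone C(b, β) of the text, used here only as an illustration; the matrix `A` is a hypothetical choice.

```python
import math
import random

def hilbert_distance(v, w):
    """Projective distance theta(v, w) = log(beta / alpha) in the positive orthant."""
    ratios = [wi / vi for vi, wi in zip(v, w)]
    return math.log(max(ratios) / min(ratios))

def mat_vec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def image_diameter(A):
    """Diameter of the image of the orthant under A in the projective distance."""
    n, m = len(A), len(A[0])
    return max(math.log(A[i][k] * A[j][l] / (A[j][k] * A[i][l]))
               for i in range(n) for j in range(n)
               for k in range(m) for l in range(m))

A = [[2.0, 1.0],
     [1.0, 3.0]]                 # hypothetical positive matrix
D = image_diameter(A)            # equals log 6 for this matrix
rate = math.tanh(D / 4)

def contracts(v, w):
    """Check theta(Av, Aw) <= tanh(D/4) * theta(v, w), up to rounding."""
    return hilbert_distance(mat_vec(A, v), mat_vec(A, w)) <= rate * hilbert_distance(v, w) + 1e-12

random.seed(0)
samples = [([random.uniform(0.1, 10), random.uniform(0.1, 10)],
            [random.uniform(0.1, 10), random.uniform(0.1, 10)]) for _ in range(1000)]
all_contract = all(contracts(v, w) for v, w in samples)
```

A finite diameter of the image is what produces a rate strictly less than 1, which is why Lemma 12.3.11 is needed before applying Proposition 12.3.6.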
Proof. Let g ∈ C(c, β, R). In particular, g > 0 and so $\int g \, d\nu > 0$. The conclusion
of the lemma is not affected when we multiply g by any positive number.
Hence, it is no restriction to suppose that $\int g \, d\nu = 1$. Then,
\[
\int \lambda^{-n} \mathcal{L}^n g \, d\nu = \int \lambda^{-n} g \, d(\mathcal{L}^{*n} \nu) = \int g \, d\nu = 1 = \int h \, d\nu
\]
for every j ≥ 0. Fix C1 > 0 such that |ex − 1| ≤ C1 |x| whenever |x| ≤ D. Then
the previous relation implies that
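\[
|\lambda^{-jN} \mathcal{L}^{jN} g(x) - h(x)| \le h(x) C_1 \Lambda_0^{j} D \quad \text{for every } x \in M \text{ and } j \ge 0. \qquad (12.3.11)
\]
Take $C_2 = C_1 D \sup h$ and $\Lambda = \Lambda_0^{1/N}$. The inequality (12.3.11) means that
\[
\|\lambda^{-jN} \mathcal{L}^{jN} g - h\| \le C_2 \Lambda^{jN} \quad \text{for every } j \ge 1.
\]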
Given any n ≥ 1, write n = jN + r with j ≥ 0 and 0 ≤ r < N. Since the transfer
operator L : C0 (M) → C0 (M) is continuous and Lh = λh,
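\[
\|\lambda^{-n} \mathcal{L}^n g - h\| = \|\lambda^{-r} \mathcal{L}^r (\lambda^{-jN} \mathcal{L}^{jN} g - h)\| \le (\|\mathcal{L}\| / \lambda)^r \|\lambda^{-jN} \mathcal{L}^{jN} g - h\|.
\]
Combining the last two inequalities,
\[
\|\lambda^{-n} \mathcal{L}^n g - h\| \le (\|\mathcal{L}\| / \lambda)^r C_2 \Lambda^{n-r}.
\]
This proves the conclusion of the lemma, as long as we take $C \ge C_2 (\|\mathcal{L}\| / (\lambda \Lambda))^r$
for every 0 ≤ r < N.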
\[
|g_2^{\pm}(x) - g_2^{\pm}(y)| \le |g_2(x) - g_2(y)| \le H d(x, y)^{\beta},
\]
for x, y ∈ M. Hence, using the mean value theorem and the fact that B ≥ H/c,
\[
|\log g_2^{\pm}(x) - \log g_2^{\pm}(y)| \le \frac{|g_2^{\pm}(x) - g_2^{\pm}(y)|}{B} \le \frac{H d(x, y)^{\beta}}{B} \le c \, d(x, y)^{\beta}.
\]
Moreover, since B ≥ sup |g2 |/(R − 1),
\[
\sup g_2^{\pm} \le \sup |g_2| + B \le RB \le R \inf g_2^{\pm}.
\]
and, consequently,
\[
B_n(g_1, g_2) \le B_n(g_1, g_2^{+}) + B_n(g_1, g_2^{-})
\le C \Lambda^n \int |g_1| \, d\nu \int (g_2^{+} + g_2^{-}) \, d\nu. \qquad (12.3.13)
\]
We close this section with a few comments about the spectral gap property.
Let Cβ (M) be the vector space of β-Hölder functions g : M → C. We leave it
to the reader (Exercise 12.3.6) to check the following facts:
(i) The function $\|g\|_{\beta,\rho} = \sup |g| + H_{\beta,\rho}(g)$ is a complete norm in Cβ (M).
(ii) Cβ (M) is invariant under the transfer operator: L(Cβ (M)) ⊂ Cβ (M).
(iii) The restriction L : Cβ (M) → Cβ (M) is continuous with respect to the
norm $\|\cdot\|_{\beta,\rho}$.
It follows that the spectrum of L : Cβ (M) → Cβ (M) is the union of {λ} with
the spectrum of the restriction of L to the hyperplane V. In Exercise 12.3.8 we invite the reader
to show that the spectral radius of L | V is strictly less than λ. Consequently,
L : Cβ (M) → Cβ (M) has the spectral gap property.
The book of Viviane Baladi [Bal00] contains an in-depth presentation of the
spectral theory of transfer operators and its connections to the issue of decay
of correlations, for differentiable (or piecewise differentiable) expanding maps
and also for uniformly hyperbolic diffeomorphisms.
12.3.4 Exercises
12.3.1. Show that the cross-ratio R(a, b, c, d) is invariant under every Möbius
automorphism of the real line, that is, R(φ(a), φ(b), φ(c), φ(d)) = R(a, b, c, d) for any
a < b ≤ c < d and every transformation of the form φ(x) = (αx + β)/(γ x + δ)
with αδ − βγ ≠ 0.
12.3.2. Consider the cone C = {(z, s) ∈ C × R : s > |z|}. Its projective quotient may be
identified with the unit disk D = {z ∈ C : |z| < 1} through (z, 1) ↦ z. Let d be the
distance induced in D, through this identification, by the projective distance of
C. Show that d coincides with the Cayley–Klein distance, which is defined by
\[
(p, q) \mapsto \log \frac{|aq| \, |pb|}{|ap| \, |bq|}, \quad \text{for } p, q \in D,
\]
where a and b are the points where the straight line through p and q intersects
the boundary of the disk, denoted in such a way that p is between a and q and
q is between p and b. [Observation: The Cayley–Klein distance is related to the
Poincaré distance in the disk through the map $z \mapsto 2z/(1 + |z|^2)$.]
12.3.3. Show that the projective distance associated with the cone C+ in Example 12.3.7
is complete, in the following sense: with respect to the projective distance,
every Cauchy sequence (gn )n converges to some element of C+ . Moreover, if
we normalize the functions (for example, fixing any probability measure η on
M and requiring that $\int g_n \, d\eta = 1 = \int g \, d\eta$ for every n), then (gn )n converges
uniformly to g.
12.3.4. Let M be a compact manifold and C1 be the cone of positive differentiable
functions in M. Show that the corresponding projective distance θ1 is not
complete.
12.3.5. Check that if g1 , g2 : M → R are β-Hölder functions, θ : M → M is an
L-Lipschitz transformation and η is a probability measure on M then:
12.4 Dimension of conformal repellers 417
(a) Hβ (g1 g2 ) ≤ sup |g1 |Hβ (g2 ) + sup |g2 |Hβ (g1 );
(b) $\int |g_1| \, d\eta \le \sup |g_1| \le \int |g_1| \, d\eta + H_\beta(g_1) (\operatorname{diam} M)^{\beta}$;
(c) $H_\beta(g \circ \theta) \le L^{\beta} H_\beta(g)$.
Moreover, the claim in (a) remains true if we replace Hβ by Hβ,ρ . The same
holds for the claim in (c), as long as L ≤ 1.
12.3.6. Let Cβ (M) be the vector space of β-Hölder functions on a compact metric space
M. Prove the properties (i), (ii), (iii) stated at the end of Section 12.3.
12.3.7. Endow Cβ (M) with the norm $\|\cdot\|_{\beta,\rho}$. Let L : Cβ (M) → Cβ (M) be the transfer
operator associated with an α-Hölder potential ϕ : M → R, with α ≥ β. Let λ
be the spectral radius, ν be the reference measure, h be the eigenfunction and
μ = hν be the equilibrium state of the potential ϕ. Consider the transfer operator
P : Cβ (M) → Cβ (M) associated with the potential ψ = ϕ + log h − log h ◦ f −
log λ.
(a) Check that L is linearly conjugate to λP, and so spec(L) = λ spec(P).
Moreover, P1 = 1 and P ∗ μ = μ.
(b) Show that $\int |P^n g| \, d\mu \le \int |g| \, d\mu$ and $\sup |P^n g| \le \sup |g|$ and there exist
constants C > 0 and τ < 1 such that $H_{\beta,\rho}(P^n g) \le \tau^n H_{\beta,\rho}(g) + C \sup |g|$ for
every g ∈ Cβ (M) and every n ≥ 1.
12.3.8. The goal of this exercise is to prove that the spectral radius of the restriction of
L to the hyperplane $V = \{g \in C^{\beta}(M) : \int g \, d\nu = 0\}$ is strictly less than λ. By part
(a) of Exercise 12.3.7, it is enough to consider the case L = P (with λ = 1 and
ν = μ and h = 1). Fix b, β, R as in Corollary 12.3.12.
(a) Show that there exist K > 1 and r > 0 such that, for every v ∈ V
with $\|v\|_{\beta,\rho} \le r$, the function g = 1 + v is in the cone C(b, β, R) and
satisfies
(b) Use Corollary 12.3.12 and the previous item to find C > 0 and τ < 1 such
that $\|P^n v\|_{\beta,\rho} \le C \tau^n \|v\|_{\beta,\rho}$ for every v ∈ V. Deduce that the spectral radius
of P | V is less than or equal to τ < 1.
That is, we consider all possible covers of M by subsets with diameter less than
δ and we try to minimize the sum of the diameters raised to the power d. This
number varies with δ in a monotonic fashion: when δ decreases, the class of
admissible covers decreases and, thus, the infimum can only increase. We call
Hausdorff measure of M in dimension d the limit
Note that md (M) ∈ [0, ∞]. Moreover, it follows directly from the definition
that
\[
m_{d_1}(M, \delta) \le \delta^{d_1 - d_2} m_{d_2}(M, \delta) \quad \text{for every } \delta > 0 \text{ and any } d_1 > d_2 > 0.
\]
Example 12.4.1. Consider the usual Cantor set K in the real line. That is,
\[
K = \bigcap_{n=0}^{\infty} K_n
\]
The lower bound is a bit more difficult, because one needs to deal with
arbitrary covers. We are going to show that, given any cover U of M,
\[
\sum_{U \in \mathcal{U}} (\operatorname{diam} U)^{d_0} \ge 1. \qquad (12.4.4)
\]
Clearly, (12.4.3) and (12.4.5) imply (12.4.4). We are left to prove (12.4.5).
The strategy is to modify the cover U successively, in such a way that the
expression on the left-hand side of (12.4.5) never increases and one reaches
the cover V n after finitely many modifications. For each U ∈ U , let k ≥ 0 be
minimum such that U intersects a unique element V of V k . The choice of n
implies that k ≤ n: for k > n, if U intersects an element of $\mathcal{V}^k$ then U contains
all the $2^{k-n}$ elements of $\mathcal{V}^k$ inside the same element of $\mathcal{V}^n$. Suppose that k < n.
By the choice of k, the set U intersects exactly two elements of V k+1 . Let them
be denoted V1 and V2 and let U1 and U2 be their intersections with U. Then
This means that the value on the left-hand side of (12.4.5) does not increase
when we replace U by U1 and U2 in the cover U . On the one hand, the
new cover satisfies the same conditions as the original: U1 and U2 are open
segments (because V1 , V2 and U are open segments) and they contain every
element of V n that they intersect. On the other hand, by construction, each one
of them intersects a unique element of V k+1 . Therefore, after finitely many
repetitions of this procedure we reduce the initial situation to the case where
k = n for every U ∈ U . Now, the choice of n implies that in that case each
U ∈ U contains the unique V ∈ V n that it intersects. Observe that this means
that U = V. In particular, any elements of U that correspond to the same
The first hypothesis is that the map f is expanding: there exists σ > 1 such
that
\[
\|Df(x) v\| \ge \sigma \|v\| \quad \text{for every } x \in D_* \text{ and every } v \in \mathbb{R}^{\ell}. \qquad (12.4.8)
\]
\[
d(\Lambda) = d_0 \ell,
\]
where d0 ∈ (0, 1) is the unique number such that P(f , −d0 log | det Df |) = 0.
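In the simplest one-dimensional situation the equation for $d_0$ can be solved numerically. The sketch below assumes a piecewise linear expanding map with full branches whose inverse branches contract by ratios $r_1, \dots, r_N$; in that case the pressure of the potential $-t \log |f'|$ is $\log \sum_i r_i^t$, a standard fact that is used here as an assumption rather than derived from the text. The middle-third Cantor set corresponds to the ratios (1/3, 1/3).

```python
import math

def pressure(t, ratios):
    """Psi(t) = log(sum_i r_i**t), assuming linear full branches with ratios r_i."""
    return math.log(sum(r ** t for r in ratios))

def bowen_root(ratios, lo=0.0, hi=1.0, tol=1e-12):
    """Find the zero of the decreasing function Psi on [lo, hi] by bisection."""
    assert pressure(lo, ratios) > 0 > pressure(hi, ratios)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if pressure(mid, ratios) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Middle-third Cantor set: two branches, each contracting by 1/3.
d0 = bowen_root([1 / 3, 1 / 3])
```

For these ratios the root is log 2 / log 3, the value of the Hausdorff dimension of the Cantor set computed in Example 12.4.1.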
with $i_0, \dots, i_{n-1} \in \{1, \dots, N\}$. For each n ≥ 1, denote by $\mathcal{I}^n$ the family of all
inverse branches $h^n$ of $f^n$. By construction, the images $h^n(D)$, $h^n \in \mathcal{I}^n$ are
pairwise disjoint and their union contains Λ.
The principal goal in this section is to prove the following geometric
estimate, which is at the heart of the proof of Theorem 12.4.3:
Proposition 12.4.5. There exists C0 > 1 such that for every n ≥ 1, every hn ∈
I n , every E ⊂ hn (D) and every x ∈ hn (D):
\[
\frac{1}{C_0} [\operatorname{diam} f^n(E)]^{\ell} \le [\operatorname{diam} E]^{\ell} \, |\det Df^n(x)| \le C_0 [\operatorname{diam} f^n(E)]^{\ell}. \qquad (12.4.14)
\]
Starting the proof of this proposition, observe that our hypotheses imply that
every inverse branch $h_i$ of f is a diffeomorphism with $\|Dh_i\| \le \sigma^{-1}$. Then, since
D is convex, we may use the mean value theorem to conclude that
Note that $h^{n-k}(D) \subset D_{i_k}$ for each k. It follows from (12.4.15) that each $h^{n-k}$ is
a $\sigma^{k-n}$-contraction. In particular,
Recall that the convex hull of a set $X \subset \mathbb{R}^{\ell}$ is the union of all the line
segments whose endpoints are in X. It is clear that the convex hull has the
same diameter as the set itself. Since Di is convex for every i, the convex hull
of each hn−k (D) is contained in Dik . In particular, the derivative Df is defined
at every point in the convex hull of every hn−k (D).
Lemma 12.4.6. There exists $C_1 > 1$ such that, for every $n \ge 1$ and every
inverse branch $h^n \in \mathcal{I}^n$,
\[ \prod_{k=0}^{n-1} \frac{|\det Df(z_k)|}{|\det Df(w_k)|} \le C_1 \]
The time has come for us to exploit the conformality hypothesis (12.4.10).
Given any linear isomorphism $L : \mathbb{R}^\ell \to \mathbb{R}^\ell$, it is clear that $|\det L| \le \|L\|^\ell$, and
analogously for the inverse. Therefore,
\[ 1 = |\det L| \, |\det L^{-1}| \le \left( \|L\| \, \|L^{-1}\| \right)^\ell. \]
Hence, $\|L\| \, \|L^{-1}\| = 1$ implies that $|\det L| = \|L\|^\ell$, and analogously for the
inverse. Therefore, (12.4.10) implies that
\[ |\det Df(y)| = \|Df(y)\|^\ell \quad \text{for every } y \in D_*. \tag{12.4.18} \]
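As a numerical sanity check of the relation between determinant and norm used above, here is a minimal sketch in dimension 2 (an assumption made only so that the operator norm has a closed form). The helper names are hypothetical. It compares |det L| with the square of the operator norm for a scaled rotation, which is conformal, and for a shear, which is not.

```python
import math

def det2(L):
    """Determinant of a 2x2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = L
    return a * d - b * c

def op_norm2(L):
    """Operator (spectral) norm of a 2x2 real matrix."""
    (a, b), (c, d) = L
    # Entries of the symmetric matrix L^T L = [[p, q], [q, r]].
    p = a * a + c * c
    q = a * b + c * d
    r = b * b + d * d
    # The norm is the square root of the largest eigenvalue of L^T L.
    lam_max = (p + r) / 2 + math.sqrt(((p - r) / 2) ** 2 + q * q)
    return math.sqrt(lam_max)

def inverse2(L):
    """Inverse of an invertible 2x2 matrix."""
    (a, b), (c, d) = L
    D = det2(L)
    return ((d / D, -b / D), (-c / D, a / D))

def conformality_defect(L):
    """The product ||L|| * ||L^{-1}||; it equals 1 when L is conformal."""
    return op_norm2(L) * op_norm2(inverse2(L))

ELL = 2  # dimension of the ambient space in this sketch
theta = 0.7
rotation_scaled = ((3 * math.cos(theta), -3 * math.sin(theta)),
                   (3 * math.sin(theta), 3 * math.cos(theta)))
shear = ((1.0, 2.0), (0.0, 1.0))

for name, L in (("scaled rotation", rotation_scaled), ("shear", shear)):
    print(name, abs(det2(L)), op_norm2(L) ** ELL, conformality_defect(L))
```

For the scaled rotation the first two printed numbers agree up to rounding and the defect is 1; for the shear the determinant is strictly smaller than the squared norm and the defect exceeds 1.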
Now we are ready to prove Proposition 12.4.5:
By Lemma 12.4.6,
\[ \prod_{k=0}^{n-1} |\det Df(z_k)| \le C_1 |\det Df^n(x)|. \tag{12.4.22} \]
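Let $\mathcal{L}$ be the open cover of $\Lambda$ whose elements are the images $h(\Lambda)$ of
$\Lambda$ under all the inverse branches of $f$. For each $n \ge 1$, the iterated sum $\mathcal{L}^n$ is
formed by the images $h^n(\Lambda)$ of $\Lambda$ under the inverse branches of $f^n$. It follows
from (12.4.17) that $\operatorname{diam} \mathcal{L}^n \le \sigma^{-n} \operatorname{diam} D$ for every $n$, and so $\operatorname{diam} \mathcal{L}^n \to 0$.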
Then, since the elements of L are pairwise disjoint, we may use Exercise 10.3.3
to conclude that
P(f , ψ) = P(f , ψ, L) for every potential ψ. (12.4.23)
In particular, $\Psi(0) = P(f, 0, \mathcal{L}) = h(f, \mathcal{L})$. Note that each family $\mathcal{L}^n$ is a minimal
cover of the repeller, that is, no proper subfamily covers $\Lambda$. Therefore, $H(\mathcal{L}^n) =
\log \#\mathcal{L}^n = n \log N$ for every $n$ and, consequently, $h(f, \mathcal{L}) = \log N$. This proves
that $\Psi(0)$ is positive.
Proposition 12.4.7. $\Psi(1) = \lim_n \frac{1}{n} \log \operatorname{vol} f^{-n}(D) < 0$.
for every hn ∈ I n . Combining this inequality with (12.4.25) and the fact that
vol(hn (D) \ hn (D∗ )) = vol hn (D) − vol hn (D∗ ), we obtain that
\[ \frac{\operatorname{vol} f^{-(n+1)}(D)}{\operatorname{vol} f^{-n}(D)} \le e^{-\beta} \quad \text{for every } n \ge 0 \]
(the case n = 0 follows directly from the hypothesis (12.4.6)). Hence,
\[ \lim_n \frac{1}{n} \log \operatorname{vol} f^{-n}(D) \le -\beta < 0. \]
Figure 12.3 summarizes the conclusions in this section. Recall that the function
defined by $\Psi(t) = P(f, -t \log |\det Df|)$ is convex, by Proposition 10.3.5.
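[Figure 12.3: graph of the pressure function t ↦ Ψ(t), marking the value h(f) at t = 0 and the points d0 and 1 on the t-axis.]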
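To illustrate how the zero d0 of the pressure function can be located, consider a model case that is not taken from the text: the middle-thirds Cantor set, viewed as the repeller of a map of the line with N = 2 linear branches of slope 3. The potential −t log |Df| is then constant and the entropy is log N, so the pressure function reduces to Ψ(t) = log 2 − t log 3. The sketch below, with hypothetical helper names, finds the zero by bisection, using only that Ψ is decreasing with Ψ(0) > 0 > Ψ(1).

```python
import math

def pressure(t, n_branches, slope):
    """Psi(t) = log N - t log(slope) for a piecewise linear model map."""
    return math.log(n_branches) - t * math.log(slope)

def find_zero(psi, lo=0.0, hi=1.0, tol=1e-12):
    """Bisection for a decreasing function with psi(lo) > 0 > psi(hi)."""
    if not (psi(lo) > 0 > psi(hi)):
        raise ValueError("psi must change sign on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if psi(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def psi(t):
    """Model pressure function: N = 2 branches, slope 3."""
    return pressure(t, 2, 3.0)

d0 = find_zero(psi)
print(d0, math.log(2) / math.log(3))
```

In this one-dimensional model the computed zero agrees with log 2 / log 3, the classical value of the Hausdorff dimension of the middle-thirds Cantor set.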
Let $\mathcal{L}$ be the open cover of $\Lambda$ introduced in the previous section and let $b > 0$
be such that P(f , bφ) < 0. The property (12.4.23) implies that
for some $x \in L$. It is also clear that $f^n(L) = \Lambda$ for every $L \in \mathcal{L}^n$. Then, taking
$E = L$ in Proposition 12.4.5,
for every $n$ sufficiently large. Since the expression on the right-hand side
converges to zero, and the diameter of the covers $\mathcal{L}^n$ also converges to zero,
it follows that $m_{b\ell}(\Lambda) = 0$. Therefore, $d(\Lambda) \le b\ell$.
Fix such an $n$. Let $\varepsilon > 0$ be a lower bound for the distance between any two
elements of $\mathcal{L}^n$: a lower bound does exist because the elements of $\mathcal{L}^n$ are
compact and pairwise disjoint. Fix $\rho \in (0, \varepsilon^{a\ell})$. The reason for this choice will
be clear soon. We claim that
\[ \sum_{U \in \mathcal{U}} [\operatorname{diam} U]^{a\ell} \ge 2^{-a\ell} \rho \tag{12.4.30} \]
for every cover $\mathcal{U}$ of $\Lambda$. By definition, this implies that $m_{a\ell}(\Lambda) \ge 2^{-a\ell} \rho > 0$
and, consequently, $d(\Lambda) \ge a\ell$. Therefore, to end the proof of Theorem 12.4.3
it suffices to prove this claim.
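Let us suppose that there exists some open cover of $\Lambda$ which does not satisfy
(12.4.30). Then, using Exercise 12.4.3, there exists some open cover $\mathcal{U}$ of $\Lambda$
with
\[ \sum_{U \in \mathcal{U}} [\operatorname{diam} U]^{a\ell} < \rho < \varepsilon^{a\ell}. \tag{12.4.31} \]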
By compactness, we may suppose that this open cover U is finite. The relation
(12.4.31) implies that every U ∈ U has diameter less than ε. Hence, each U ∈ U
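intersects at most one $L \in \mathcal{L}^n$. Since $\mathcal{L}^n$ covers $\Lambda$ and $U$ is a non-empty subset
of $\Lambda$, we also have that $U$ intersects some $L \in \mathcal{L}^n$. This means that $\mathcal{U}$ is the
disjoint union of the families
\[ \mathcal{U}_L = \{U \in \mathcal{U} : U \cap L \ne \emptyset\}, \quad L \in \mathcal{L}^n. \]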
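If $U \in \mathcal{U}_L$ then $U \subset L$. Let us consider the families $f^n(\mathcal{U}_L) = \{f^n(U) :
U \in \mathcal{U}_L\}$. Observe that each one of them is a cover of $\Lambda$. Moreover, using
Proposition 12.4.5,
\[ \sum_{V \in f^n(\mathcal{U}_L)} [\operatorname{diam} V]^{a\ell} = \sum_{U \in \mathcal{U}_L} [\operatorname{diam} f^n(U)]^{a\ell} \le C_0 e^{-a\varphi_n(L)} \sum_{U \in \mathcal{U}_L} [\operatorname{diam} U]^{a\ell}. \tag{12.4.32} \]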
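Therefore,
\[ \sum_{U \in \mathcal{U}} [\operatorname{diam} U]^{a\ell} = \sum_{L \in \mathcal{L}^n} \sum_{U \in \mathcal{U}_L} [\operatorname{diam} U]^{a\ell} \ge \sum_{L \in \mathcal{L}^n} C_0^{-1} e^{a\varphi_n(L)} \sum_{V \in f^n(\mathcal{U}_L)} [\operatorname{diam} V]^{a\ell}. \]
This is a contradiction, because $e^{\kappa n} > C_0$. Hence, there exists $L \in \mathcal{L}^n$ such that
\[ \sum_{V \in f^n(\mathcal{U}_L)} [\operatorname{diam} V]^{a\ell} \le \sum_{U \in \mathcal{U}} [\operatorname{diam} U]^{a\ell} < \rho. \]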
Thus, we may repeat the previous procedure with $f^n(\mathcal{U}_L)$ in the place of $\mathcal{U}$.
Observe, however, that $\#f^n(\mathcal{U}_L) = \#\mathcal{U}_L$ is strictly less than $\#\mathcal{U}$. Therefore, this
process must stop after a finite number of steps. This contradiction proves the
claim (12.4.30).
The proof of Theorem 12.4.3 is complete. However, it is possible to prove an
even stronger result: in the conditions of the theorem, the Hausdorff measure
of $\Lambda$ in dimension $d(\Lambda)$ is positive and finite. We leave this statement as a
special challenge (Exercise 12.4.7) for the reader who remained with us till the
end of this book!
12.4.7 Exercises
12.4.1. Let $d = \log 2 / \log 3$. Show that $(x_1 + 1 + x_2)^d \ge x_1^d + x_2^d$ for every $x_1, x_2 \in [0, 1]$.
Moreover, the identity holds if and only if $x_1 = x_2 = 1$.
12.4.2. Let $f : M \to N$ be a Lipschitz map, with Lipschitz constant $L$. Show that
\[ m_d(f(A)) \le L^d m_d(A) \]
for any $d \in (0, \infty)$ and any $A \subset M$. Use this fact to show that if $A \subset \mathbb{R}^n$ and $t > 0$,
then $m_d(tA) = t^d m_d(A)$, where $tA = \{tx : x \in A\}$.
12.4.3. Represent by $m_d^o(M)$ and $m_d^c(M)$ the numbers defined in the same way as the
Hausdorff measure $m_d(M)$ but considering only covers by open sets and covers
by closed sets, respectively. Show that $m_d^o(M) = m_d^c(M) = m_d(M)$.
12.4.4. (Mass distribution principle) Let μ be a finite measure on a compact metric
space $M$ and assume that there exist numbers $d, K, \rho > 0$ such that $\mu(B) \le
K (\operatorname{diam} B)^d$ for every set $B \subset M$ with diameter less than ρ. Show that if $A \subset M$
is such that $\mu(A) > 0$ then $m_d(A) > 0$ and so $d(A) \ge d$.
12.4.5. Use the mass distribution principle to show that the Hausdorff dimension of the
Sierpinski triangle (Figure 12.4) is equal to d0 = log 3/ log 2 and the Hausdorff
measure in dimension d0 is positive and finite.
12.4.6. Check the pressure formula (12.4.12).
12.4.7. Adapting arguments from Exercise 12.4.5, show that in the conditions of
Theorem 12.4.3 one has $0 < m_{d(\Lambda)}(\Lambda) < \infty$.
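Before attempting a proof of Exercise 12.4.1, it can help to probe the inequality numerically. The following sketch evaluates the gap between the two sides on a grid of the unit square; the grid size and the function name are arbitrary choices, and a grid check is of course no substitute for a proof.

```python
import math

D = math.log(2) / math.log(3)

def gap(x1, x2, d=D):
    """Left-hand side minus right-hand side of the inequality in Exercise 12.4.1."""
    return (x1 + 1 + x2) ** d - (x1 ** d + x2 ** d)

# Evaluate the gap on a uniform 51 x 51 grid of [0, 1]^2.
GRID = [i / 50 for i in range(51)]
min_gap, argmin = min((gap(x1, x2), (x1, x2)) for x1 in GRID for x2 in GRID)

print(min_gap, argmin)
```

On this grid the smallest gap is zero up to rounding and is attained at the corner (1, 1), in line with the equality case stated in the exercise.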
Appendix A
Topics in measure theory, topology
and analysis
Euclidean spaces. The last part is dedicated to measurable maps, which are the
maps that preserve the structure of measurable spaces.
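(i) $\emptyset \in \mathcal{B}$;
(ii) $A \in \mathcal{B}$ implies $A^c \in \mathcal{B}$;
(iii) $A \in \mathcal{B}$ and $B \in \mathcal{B}$ implies $A \cup B \in \mathcal{B}$;
(iv) $A \in \mathcal{B}$ and $B \in \mathcal{B}$ implies $A \cap B \in \mathcal{B}$;
(v) $A \in \mathcal{B}$ and $B \in \mathcal{B}$ implies $A \setminus B \in \mathcal{B}$.
The two last properties are immediate consequences of the previous ones,
since $A \cap B = (A^c \cup B^c)^c$ and $A \setminus B = A \cap B^c$. Moreover, by associativity,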
properties (iii) and (iv) imply that the union and the intersection of any finite
family of elements of B are also in B.
Example A.1.4. For any set $X$, the following families of subsets are
σ-algebras:
\[ \{\emptyset, X\} \quad \text{and} \quad 2^X = \{\text{all subsets of } X\}. \]
Moreover, clearly, if $\mathcal{B}$ is any algebra of subsets of $X$ then $\{\emptyset, X\} \subset \mathcal{B} \subset 2^X$. So,
$\{\emptyset, X\}$ is the smallest and $2^X$ is the largest of all algebras of subsets of $X$.
In the statement that follows, I is an arbitrary set whose sole use is to index
the elements of the family of σ -algebras.
Observe that lim infn En ⊂ lim supn En and both sets are in B.
Example A.1.9. The extended line R̄ = [−∞, ∞] is the union of the real line
R = (−∞, +∞) with the two points ±∞ at infinity. This space has a natural
A.1 Measure spaces 433
for any countable family of pairwise disjoint sets Aj ∈ B. This last property is
called countable additivity or σ -additivity. Then the triple (X, B, μ) is called
a measure space. If μ(X) < ∞ then we say that μ is a finite measure and if
μ(X) = 1 then we call μ a probability measure. In this last case, (X, B, μ) is
called a probability space.
Example A.1.11. Let $X$ be an arbitrary set, endowed with the σ-algebra $\mathcal{B} =
2^X$. Given any $p \in X$, consider the function $\delta_p : 2^X \to [0, +\infty]$ defined by:
\[ \delta_p(A) = \begin{cases} 1 & \text{if } p \in A \\ 0 & \text{if } p \notin A. \end{cases} \]
It is easy to see that δp is a measure. It is usually called the Dirac measure, or
Dirac mass at p.
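The definition translates directly into code. The sketch below represents the Dirac measure as a function on Python sets (the name `dirac` is ours) and checks additivity on one pair of disjoint sets.

```python
def dirac(p):
    """Return the set function A -> delta_p(A) for subsets A given as Python sets."""
    def delta(A):
        return 1 if p in A else 0
    return delta

delta_3 = dirac(3)
A = {1, 2}
B = {3, 4}  # disjoint from A

# Additivity on disjoint sets: delta_p(A ∪ B) = delta_p(A) + delta_p(B).
print(delta_3(A | B), delta_3(A) + delta_3(B))
```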
Definition A.1.12. We say that a measure μ is σ-finite if there exists a
sequence $A_1, \dots, A_n, \dots$ of subsets of $X$ such that $\mu(A_i) < \infty$ for every $i \in \mathbb{N}$
and
\[ X = \bigcup_{i=1}^{\infty} A_i. \]
Clearly, the two families $\{\emptyset, X\}$ and $2^X$ are monotone classes. Moreover, if
$\{\mathcal{C}_i : i \in I\}$ is any family of monotone classes then the intersection $\bigcap_{i \in I} \mathcal{C}_i$
is a monotone class. Thus, for every subset $\mathcal{A}$ of $2^X$ there exists the smallest
monotone class that contains $\mathcal{A}$.
Theorem A.1.18 (Monotone class). The smallest monotone class that con-
tains an algebra A coincides with the σ -algebra σ (A) generated by A.
Another important result about σ -algebras that will be useful later states that
every element of a σ -algebra B generated by an algebra A is approximated by
the elements of A, in the sense that the measure of the symmetric difference
\[ A \,\Delta\, B = (A \setminus B) \cup (B \setminus A) = (A \cup B) \setminus (A \cap B) \]
can be made arbitrarily small. More precisely:
where the supremum is taken over all countable partitions of the measurable
set E into measurable subsets (this definition coincides with the one we gave
previously in the special case when μ is real). The function $\|\mu\| = |\mu|(X)$
defines a norm in the vector space of complex measures on $X$, which we also
denote as $\mathcal{M}(X)$. Moreover, this norm is complete. When $X$ is a compact
metric space, the complex Banach space $(\mathcal{M}(X), \|\cdot\|)$ is isomorphic to the
dual of the space $C^0(X)$ of continuous complex functions on $X$ (theorem of
Riesz–Markov).
Each cube [k1 , k1 + 1) × · · · × [kd , kd + 1) may be identified with [0, 1)d through
the translation Tk1 ,...,kd (x) = x − (k1 , . . . , kd ) that maps (k1 , k2 , . . . , kd ) to the
origin. That allows us to define a measure $m_{k_1,\dots,k_d}$ on $C$, by setting
\[ m_{k_1,\dots,k_d}(B) = m_0\big(T_{k_1,\dots,k_d}(B)\big) \]
for every measurable set $B \subset C$. Finally, given any measurable set $B \subset \mathbb{R}^d$,
define:
\[ m(B) = \sum_{k_1 \in \mathbb{Z}} \cdots \sum_{k_d \in \mathbb{Z}} m_{k_1,\dots,k_d}\big(B \cap [k_1, k_1 + 1) \times \cdots \times [k_d, k_d + 1)\big). \]
where the infimum is taken over all countable covers $(R_k)_k$ of $E$ by open
rectangles. The function $E \mapsto m^*(E)$ is defined for every $E \subset \mathbb{R}^d$, but is not
finitely additive (although it is countably subadditive). We say that $E$ is a
Lebesgue measurable set if
\[ m^*(A) = m^*(A \cap E) + m^*(A \cap E^c) \quad \text{for every } A \subset \mathbb{R}^d. \]
Next, extend the definition of $\mu_\varphi$ to the algebra $\mathcal{A}$ formed by the finite unions
$A = I_1 \cup \cdots \cup I_k$ of pairwise disjoint intervals, through the relation
\[ \mu_\varphi(A) = \sum_{j=1}^{k} \mu_\varphi(I_j). \]
The measure μφ that we have just constructed has the following special
property: if a set A ⊂ [0, 1] has Lebesgue measure zero then μφ (A) = 0. This
property is called absolute continuity (with respect to the Lebesgue measure)
and is studied in a lot more depth in Appendix A.2.4.
Here is an example of a measure that is positive on any open set but is not
absolutely continuous with respect to Lebesgue measure:
On the one hand, the measure of any non-empty open subset of the real line is
positive, for such a subset must contain some ri . On the other hand, the measure
of $\mathbb{Q}$ is
\[ \mu(\mathbb{Q}) = \sum_{r_i \in \mathbb{Q}} \frac{1}{2^i} = 1. \]
Since Q has Lebesgue measure zero (because it is a countable set), this implies
that μ is not absolutely continuous with respect to the Lebesgue measure.
Proof. If supp μ is empty then for each point x ∈ X we may find an open
neighborhood Vx such that μ(Vx ) = 0. Let {Aj : j = 1, 2, . . . } be a countable
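basis of the topology of $X$. Then, for each $x \in X$ we may choose $i(x) \in \mathbb{N}$ such
that $x \in A_{i(x)} \subset V_x$. Hence,
\[ X = \bigcup_{x \in X} V_x = \bigcup_{x \in X} A_{i(x)} \]
and so
\[ \mu(X) = \mu\Big(\bigcup_{x \in X} A_{i(x)}\Big) \le \sum_{i=1}^{\infty} \mu(A_i) = 0. \]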
This is a contradiction, and so supp μ cannot be empty.
A.1.5 Exercises
A.1.1. Let X be a set and (Y, C) be a measurable space. Show that, for any
transformation f : X → Y there exists some σ -algebra B of subsets of X such
that the transformation is measurable with respect to the σ -algebras B and C.
A.1.2. Let X be a set and consider the family of subsets
B0 = {A ⊂ X : A is finite or Ac is finite}.
A.1.12. Let $K \subset [0, 1]$ be the Cantor set, that is, $K = \bigcap_{n=0}^{\infty} K_n$ where $K_0 = [0, 1]$ and
each Kn is the set obtained by removing from each connected component C of
Kn−1 the open interval whose center coincides with the center of C and whose
length is one third of the length of C. Show that K has Lebesgue measure equal
to zero.
A.1.13. Given a set E ⊂ Rd , prove that the following conditions are equivalent:
(a) E is a Lebesgue measurable set.
(b) E belongs to the completion of the Borel σ -algebra relative to the
Lebesgue measure, that is, there exist Borel sets B1 , B2 ⊂ Rd such that
B1 ⊂ E ⊂ B2 and m(B2 \ B1 ) = 0.
(c) (Approximation from above by open sets) Given ε > 0 we can find an open
set A such that E ⊂ A and m∗ (A \ E) < ε.
(d) (Approximation from below by closed sets) Given ε > 0 we can find a
closed set F such that F ⊂ E and m∗ (E \ F) < ε.
A.1.14. Prove Proposition A.1.31.
A.1.15. Let $g_n : M \to \mathbb{R}$, $n \ge 1$ be a sequence of measurable functions such that
$f(x) = \sum_{n=1}^{\infty} g_n(x)$ converges at every point. Show that the sum $f$ is a measurable
function.
A.1.16. Prove Proposition A.1.33.
A.1.17. Let f : X → X be a measurable transformation and ν be a measure on X. Define
(f∗ ν)(A) = ν(f −1 (A)). Show that f∗ ν is a measure and note that it is finite if and
only if ν itself is finite.
A.1.18. Let ω5 : [0, 1] → [0, 1] be the function assigning to each x ∈ [0, 1] the upper
frequency of the digit 5 in the decimal expansion of x. In other words, writing
$x = 0.a_0 a_1 a_2 \dots$ with $a_i \ne 9$ for infinitely many values of $i$,
\[ \omega_5(x) = \limsup_n \frac{1}{n} \#\{0 \le j \le n - 1 : a_j = 5\}. \]
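For Exercise A.1.12, the construction of the sets K_n can be carried out exactly with rational arithmetic. The sketch below (helper names are ours) builds the first ten stages and records their total lengths, which suggests the formula one then has to justify in the exercise.

```python
from fractions import Fraction

def next_stage(intervals):
    """Remove the open middle third from each closed interval (a, b)."""
    out = []
    for a, b in intervals:
        third = (b - a) / 3
        out.append((a, a + third))
        out.append((b - third, b))
    return out

def total_length(intervals):
    """Sum of the lengths of a list of disjoint intervals."""
    return sum((b - a for a, b in intervals), Fraction(0))

stages = [[(Fraction(0), Fraction(1))]]  # K_0 = [0, 1]
for _ in range(10):
    stages.append(next_stage(stages[-1]))

lengths = [total_length(s) for s in stages]
print(lengths[:4])
```

The printed lengths are 1, 2/3, 4/9 and 8/27: at each stage one third of the remaining length is removed, so K_n has total length (2/3)^n.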
even extend this construction to countable families. Near the end, we discuss
the related notions of absolute continuity and Lebesgue derivation.
as long as at least one of the integrals on the right-hand side is finite (with
the usual conventions that (+∞) − a = +∞ and a − (+∞) = −∞ for every
a ∈ R).
A.2 Integration in measure spaces 445
Proposition A.2.7. The set $L^1(\mu)$ of all real integrable functions is a real
vector space. Moreover, the map $I : L^1(\mu) \to \mathbb{R}$ given by $I(f) = \int f \, d\mu$ is a
positive linear functional:
(1) $\int (af + bg) \, d\mu = a \int f \, d\mu + b \int g \, d\mu$, and
(2) $\int f \, d\mu \ge \int g \, d\mu$ if $f(x) \ge g(x)$ for every $x$.
In particular, $|\int f \, d\mu| \le \int |f| \, d\mu$ if $|f| \in L^1(\mu)$. Moreover, $|f| \in L^1(\mu)$ if and
only if $f \in L^1(\mu)$.
The notion of the Lebesgue integral may be extended to an even broader
class of functions, in two different ways. On the one hand, we may consider
complex functions $f : X \to \mathbb{C}$. In this case, we say that $f$ is integrable if and
only if the real part $\Re f$ and the imaginary part $\Im f$ are both integrable. Then, by
definition,
\[ \int f \, d\mu = \int \Re f \, d\mu + i \int \Im f \, d\mu. \]
On the other hand, we may consider functions that are not necessarily
measurable but coincide with some measurable function on a subset of the
domain with total measure. To explain this, we need the following notion,
which is used frequently throughout the text:
Definition A.2.8. We say that a property holds at μ-almost every point (or
μ-almost everywhere) if the subset of points of X for which it does not hold is
contained in some zero measure set.
This observation permits the definition of the integral for any function f ,
possibly non-measurable, that coincides at μ-almost every point with some
measurable function $g$: it suffices to take $\int f \, d\mu = \int g \, d\mu$.
To close this section, let us observe that the notion of integral may also be
extended to signed measures and even complex measures, as follows. Let μ be
a signed measure and μ = μ+ − μ− be its Hahn decomposition. We say that a
function φ is integrable with respect to μ if it is integrable with respect to both
μ+ and μ− . Then we define:
\[ \int \varphi \, d\mu = \int \varphi \, d\mu^+ - \int \varphi \, d\mu^-. \]
The next result applies to much more general sequences, not necessarily
monotone:
Theorem A.2.10 (Lemma of Fatou). Let fn : X → [0, +∞] be a sequence
of non-negative measurable functions. Then the function f : X → [−∞, +∞]
defined by f (x) = lim infn fn (x) is integrable and satisfies
\[ \int \liminf_n f_n(x) \, d\mu \le \liminf_n \int f_n \, d\mu. \]
The most powerful of the results in this section is the dominated convergence
theorem, which asserts that we may take the limit under the integral sign
whenever the sequence of functions is bounded by some integrable function:
Theorem A.2.11 (Dominated convergence). Let fn : X → R be a sequence
of measurable functions and assume that there exists some integrable function
g : X → R such that |fn (x)| ≤ |g(x)| for μ-almost every x in X. Assume moreover
that the sequence (fn )n converges at μ-almost every point to some function
f : X → R. Then f is integrable and satisfies
\[ \lim_n \int f_n \, d\mu = \int f \, d\mu. \]
Theorem A.2.12 remains valid when the measures μj are just σ -finite, except
that in this case the product measure μ is also only σ -finite.
Next, we describe the product of a countable family of measure spaces.
Actually, for now we restrict ourselves to the case of probability spaces. Let
(Xj , Bj , μj ), j ∈ I be probability measure spaces with μj (Xj ) = 1 for every j ∈ I.
What follows holds for both I = N and I = Z. Consider the Cartesian product
\[ \Sigma = \prod_{j \in I} X_j = \{(x_j)_{j \in I} : x_j \in X_j\}. \tag{A.2.1} \]
The proof of this theorem (see Theorem 38.B in Halmos [Hal50]) uses the
extension theorem (Theorem A.1.13) together with the theorem of continuity
at the empty set (Theorem A.1.14). The probability measure μ is called the
product of the measures $\mu_j$ and is denoted as $\bigotimes_{j \in I} \mu_j$. The probability space
$(\Sigma, \mathcal{B}, \mu)$ is called the product of the spaces $(X_j, \mathcal{B}_j, \mu_j)$, $j \in I$.
An important special case is when the spaces (Xi , Bi , μi ) are all equal to
a given (X, C, ν). The corresponding product space may be used to model
a sequence of identical random experiments such that the outcome of each
experiment is independent of all the others. To explain this, take X to be
the set of possible outcomes of each experiment and let ν be the probability
distribution of those outcomes. In this context, the measure $\mu = \nu^I =
\bigotimes_{j \in I} \nu$ is usually called the Bernoulli measure defined by ν. Property (A.2.3)
corresponds to the identity
\[ \mu([m; A_m, \dots, A_n]) = \prod_{j=m}^{n} \nu(A_j), \tag{A.2.4} \]
which may be read in the following way: the probability of any composite
event {xm ∈ Am , . . . , xn ∈ An } is equal to the product of the probabilities of
the individual events xi ∈ Ai . So, (A.2.4) does reflect the assumption that the
successive experiments are mutually independent.
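The product formula (A.2.4) can be checked by brute force when X is finite. In the sketch below, X = {H, T} with a biased coin ν chosen purely for illustration, and the function names are ours; the measure of a cylinder computed as a product of the ν(A_j) is compared with the sum of the weights of all words in the cylinder.

```python
from fractions import Fraction
from itertools import product

# Hypothetical one-step distribution on X = {"H", "T"}.
nu = {"H": Fraction(1, 3), "T": Fraction(2, 3)}

def nu_of(A):
    """nu(A) for a subset A of X."""
    return sum((nu[x] for x in A), Fraction(0))

def cylinder_measure(sets):
    """mu([m; A_m, ..., A_n]) computed as the product of the nu(A_j)."""
    result = Fraction(1)
    for A in sets:
        result *= nu_of(A)
    return result

def brute_force(sets):
    """Sum the product weights of all words (x_m, ..., x_n) with x_j in A_j."""
    total = Fraction(0)
    for word in product(*sets):
        weight = Fraction(1)
        for x in word:
            weight *= nu[x]
        total += weight
    return total

sets = [{"H"}, {"H", "T"}, {"T"}]
print(cylinder_measure(sets), brute_force(sets))
```

Both computations give 2/9 for this cylinder, which is the statement that the three coordinates are independent under μ.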
We have a special interest in the case when X is a finite set, endowed with
the σ -algebra C = 2X of all its subsets. In this case, it is useful to consider the
elementary cylinders
Consider the finite set $X$ endowed with the discrete topology. The product
topology on $\Sigma = X^I$ coincides with the topology generated by the elementary
cylinders. Moreover (see Exercise A.1.11), it coincides with the topology
associated with the distance defined by
\[ d\big((x_i)_{i \in I}, (y_i)_{i \in I}\big) = \theta^N, \tag{A.2.7} \]
where $\theta \in (0, 1)$ is fixed and $N = N((x_i)_{i \in I}, (y_i)_{i \in I}) \ge 0$ is the largest integer
such that $x_i = y_i$ for every $i \in I$ with $|i| < N$.
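A minimal sketch of this distance for one-sided sequences, restricted to finite windows of equal length: the window restriction, the convention that windows agreeing everywhere are at distance 0, and the function names are all assumptions of the sketch.

```python
def first_disagreement(x, y):
    """Largest N with x[i] == y[i] for all i < N, or None if x and y agree on the window."""
    for i, (a, b) in enumerate(zip(x, y)):
        if a != b:
            return i
    return None

def dist(x, y, theta=0.5):
    """theta ** N as in (A.2.7); windows that agree everywhere are treated as equal."""
    n = first_disagreement(x, y)
    return 0.0 if n is None else theta ** n

x = (0, 1, 1, 0)
y = (0, 1, 0, 0)
z = (1, 1, 0, 0)

print(dist(x, y), dist(x, z), dist(y, z))
```

Here x and y first differ at index 2, so their distance is θ² = 0.25, while z differs from both at index 0. The three values also satisfy the ultrametric inequality d(x, y) ≤ max{d(x, z), d(z, y)}, which is stronger than the triangle inequality.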
In particular,
\[ \lim_{r \to 0} \frac{1}{m(B(x, r))} \int_{B(x, r)} f(y) \, dm = f(x) \quad \text{at $m$-almost every point } x. \]
The crucial ingredient in the proof of these results is the following geometric
fact:
This theorem remains valid if, instead of balls, we take for $(B_n(x))_n$ any
sequence of sets such that $\bigcap_n B_n(x) = \{x\}$ and
\[ \sup_{x, n} \frac{\sup\{d(x, y) : y \in B_n(x)\}}{\inf\{d(x, z) : z \notin B_n(x)\}} < \infty. \]
The set of measures defined on the same measurable space possesses a
natural partial order relation:
The Lebesgue decomposition theorem states that, given any two finite
measures μ and ν in the same measurable space, we may write ν = νa + νs
where $\nu_a$ and $\nu_s$ are finite measures such that $\nu_a \ll \mu$ and $\nu_s \perp \mu$. Combining
this with the theorem of Radon–Nikodym, we get:
A.2.5 Exercises
A.2.1. Prove that the integral of a simple function is well defined: if two linear
combinations of characteristic functions define the same function, then the
values of the integrals obtained from the two combinations coincide.
A.2.2. Show that if (rn )n and (sn )n are non-decreasing sequences of non-negative
functions converging at μ-almost every point to the same function f : M →
[0, +∞), then $\lim_n \int r_n \, d\mu = \lim_n \int s_n \, d\mu$.
A.2.3. Prove Proposition A.2.7.
A.2.4. (Tchebysheff–Markov inequality) Let $f : M \to \mathbb{R}$ be a non-negative function
integrable with respect to a finite measure μ. Then, given any real number
$a > 0$,
\[ \mu(\{x \in M : f(x) \ge a\}) \le \frac{1}{a} \int_M f \, d\mu. \]
In particular, if $\int |f| \, d\mu = 0$ then $\mu(\{x \in M : f(x) \ne 0\}) = 0$.
A.2.5. Let f be an integrable function. Show that for every ε > 0 there exists δ > 0
such that $|\int_E f \, d\mu| < \varepsilon$ for every measurable set $E$ with $\mu(E) < \delta$.
A.2.6. Let ψ1 , . . . , ψN : M → R be bounded measurable functions defined on a
probability space (M, B, μ). Show that for any ε > 0 there exist x1 , . . . , xs ∈ M
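and positive numbers $\alpha_1, \dots, \alpha_s$ such that $\sum_{j=1}^{s} \alpha_j = 1$ and
\[ \Big| \int \psi_i \, d\mu - \sum_{j=1}^{s} \alpha_j \psi_i(x_j) \Big| < \varepsilon \quad \text{for every } i = 1, \dots, N. \]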
A.2.7. Deduce the dominated convergence theorem (Theorem A.2.11) from the
Lemma of Fatou (Theorem A.2.10).
that is, $\mu = p_1 \delta_{x_1} + p_2 \delta_{x_2}$ and $\nu = q_1 \delta_{x_1} + q_2 \delta_{x_2}$. Check that $\nu \ll \mu$ and $\mu \ll \nu$
and calculate the corresponding Radon–Nikodym derivatives.
A.2.13. Construct a probability measure μ on [0, 1] absolutely continuous with respect
to the Lebesgue measure m and such that there exists a measurable set K ⊂ [0, 1]
with μ(K) = 0 and m(K) = 1/2. In particular, m is not absolutely continuous
with respect to μ. Could we require that m(K) = 1?
A.2.14. Assume that f : X → X is such that there exists a countable cover of M by
measurable sets Bn , n ≥ 1, such that the restriction of f to each Bn is a bijection
onto its image, with measurable inverse. Let η be a probability measure on M
such that A ⊂ Bn and η(A) = 0 implies η(f (A)) = 0. Show that there exists a
function Jη : X → [0, +∞] such that
\[ \int_{f(B_n)} \psi \, d\eta = \int_{B_n} (\psi \circ f) J_\eta \, d\eta \]
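The inequality in Exercise A.2.4 is easy to test on a finite measure space, where integrals are finite sums. The three-point space, the measure and the function below are made up for illustration.

```python
from fractions import Fraction

# Hypothetical finite measure space M = {"a", "b", "c"} and a non-negative function f.
mu = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
f = {"a": Fraction(0), "b": Fraction(3), "c": Fraction(12)}

def integral(f, mu):
    """Integral of f with respect to mu, which is a finite sum here."""
    return sum((f[x] * mu[x] for x in mu), Fraction(0))

def measure_of_superlevel(f, mu, a):
    """mu({x : f(x) >= a})."""
    return sum((mu[x] for x in mu if f[x] >= a), Fraction(0))

for a in (Fraction(1), Fraction(3), Fraction(12)):
    print(a, measure_of_superlevel(f, mu, a), integral(f, mu) / a)
```

In each row the second number is at most the third, as the inequality predicts; here the integral of f equals 3.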
We denote B(x, r) = {y ∈ M : d(x, y) < r} and call it the ball of center x ∈ M and
radius r > 0.
Every metric space has a natural structure of a topological space where the
family of balls centered at each point is a basis of neighborhoods for that
point. Equivalently, a subset of M is open if and only if it contains some ball
centered at each one of its points. In the converse direction, one says that a
topological space is metrizable if its topology can be defined in this way, from
some distance function.
Proof. Let B0 be the family of all Borel subsets B for which the condition in
the definition holds, that is, such that for every ε > 0 there exist a closed set
F and an open set A satisfying F ⊂ B ⊂ A and μ(A \ F) < ε. Begin by noting
that B0 contains all the closed subsets of M. Indeed, let B be any closed set
and let Bδ denote the (open) set of points whose distance to B is less than δ.
It follows that, as stated above, the values that the probability measure μ
takes on the closed subsets of M determine μ completely: if ν is another
probability measure such that μ(F) = ν(F) for every closed set F then, taking
the complement, μ(A) = ν(A) for every open set A and, using the theorem,
μ(B) = ν(B) for every Borel set B. In other words, μ = ν. The same argument
shows that the values of μ on the open sets also determine the measure
completely.
The proposition that we state and prove next implies that the values of the
integrals of the bounded continuous functions also determine the probability
measure completely. Indeed, the same is true for the (smaller) set of bounded
Lipschitz functions.
Recall that a map h : M → N is Lipschitz if there exists some constant C > 0
such that d(h(x), h(y)) ≤ Cd(x, y) for every x, y ∈ M. When it is necessary to
specify the constant, we say that the function h is C-Lipschitz. More generally,
we say that h is Hölder if there exist C, θ > 0 such that d(h(x), h(y)) ≤ Cd(x, y)θ
for every x, y ∈ M. Then we also say that h is θ -Hölder or even (C, θ )-Hölder.
Lemma A.3.4. Given any closed subset F of M and any δ > 0, there exists
a Lipschitz function gδ : M → [0, 1] such that gδ (x) = 1 for every x ∈ F and
gδ (x) = 0 for every x ∈ M such that d(x, F) ≥ δ.
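One standard choice, which may or may not be the one used in the proof of the lemma, is g_δ(x) = max{0, 1 − d(x, F)/δ}; it is Lipschitz with constant 1/δ because x ↦ d(x, F) is 1-Lipschitz. The sketch below takes M to be the real line and F a finite set (both assumptions of the sketch) and checks the three required properties on a grid.

```python
def dist_to_set(x, F):
    """Distance from the real number x to the finite set F."""
    return min(abs(x - p) for p in F)

def g(x, F, delta):
    """Candidate function: equals 1 on F and 0 at distance >= delta from F."""
    return max(0.0, 1.0 - dist_to_set(x, F) / delta)

F = [0.0, 2.0]
DELTA = 0.5
GRID = [i / 100 for i in range(-100, 301)]

# Largest ratio |g(x) - g(y)| / |x - y| over pairs of distinct grid points.
worst_ratio = max(
    abs(g(x, F, DELTA) - g(y, F, DELTA)) / abs(x - y)
    for x in GRID for y in GRID if x != y
)
print(g(0.0, F, DELTA), g(0.25, F, DELTA), g(1.0, F, DELTA), worst_ratio)
```

On the grid the worst ratio does not exceed 1/δ = 2, up to rounding.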
A.3 Measures in metric spaces 455
Now we may finish the proof of Proposition A.3.3. Let F be any closed
subset of M and, for every δ > 0, let gδ : M → [0, 1] be a function as in the
lemma above. By assumption,
\[ \int g_\delta \, d\mu = \int g_\delta \, d\nu \quad \text{for every } \delta > 0. \]
This shows that μ(F) = ν(F) for every closed subset F. As pointed out before,
the latter implies μ = ν.
On the other hand, every ϕ −1 (Bn,k ) is an open subset of ϕ −1 (Bn,k ) ∪ Acn,k , since
the complement is the closed set Acn,k . Consequently, ϕ −1 (Bn,k ) is open in E
for every (n, k). This shows that the restriction of ϕ to the set E is continuous.
To conclude the proof it suffices to use Proposition A.3.2 once more to find a
closed set F ⊂ E such that μ(E \ F) < ε/2.
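satisfies $\mu(L_n) > 1 - \varepsilon/2^n$. Take $K = \bigcap_{n=1}^{\infty} L_n$. Note that $K$ is closed and
\[ \mu(K^c) \le \mu\Big(\bigcup_{n=1}^{\infty} L_n^c\Big) < \sum_{n=1}^{\infty} \frac{\varepsilon}{2^n} = \varepsilon. \]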
It remains to check that K is compact. For that, it is enough to show that every
sequence (xi )i in K admits some Cauchy subsequence (since M is complete,
this subsequence converges). Such a subsequence may be found as follows.
Since xi ∈ L1 for every i, there exists l(1) ≤ k(1) such that the set of indices
I1 = {i ∈ N : xi ∈ B(pl(1) , 1)}
is infinite. Let i(1) be the smallest element of I1 . Next, since xi ∈ L2 for every
i, there exists l(2) ≤ k(2) such that
I2 = {i ∈ I1 : xi ∈ B(pl(2) , 1/2)}
is infinite. Let i(2) be the smallest element of I2 \ {i(1)}. Repeating this
procedure, we construct a decreasing sequence In of infinite subsets of N, and
an increasing sequence i(1) < i(2) < · · · < i(n) < · · · of integers such that
i(n) ∈ In and all the xi , i ∈ In are contained in the same closed ball of radius
1/n. In particular,
d(xi(a) , xi(b) ) ≤ 2/n for every a, b ≥ n.
This shows that the subsequence (xi(n) )n is indeed Cauchy.
Proof. By Proposition A.3.2, we may find some closed set F ⊂ B such that
μ(B \ F) < ε/2. By Theorem A.3.5, there exists a compact subset K ⊂ M such
that μ(M \ K) < ε/2. Take L = F ∩ K. Then L is compact and μ(B \ L) < ε.
This norm is complete and, hence, endows C0 (M) with the structure of a
Banach space.
The conclusions of the previous sections hold in this setting, since every
compact metric space is separable and complete. Another useful fact about
compact metric spaces is that every open cover admits some Lebesgue number,
that is, some number ρ > 0 such that for every x ∈ M there exists some element
of the cover that contains the ball B(x, ρ).
A linear functional $\Phi : C^0(M) \to \mathbb{C}$ is said to be positive if $\Phi(\varphi) \ge 0$
for every function $\varphi \in C^0(M)$ with $\varphi(x) \ge 0$ for every $x \in M$. The theorem
of Riesz–Markov (see Theorem 6.19 in Rudin [Rud87]) shows that the only
positive linear functionals on $C^0(M)$ are the integrals:
The next result, which is also known as the theorem of Riesz–Markov, gives
an analogous representation for continuous linear functionals in $C^0(M)$, not
necessarily positive. Recall that the norm of a linear functional $\Phi : C^0(M) \to \mathbb{C}$
is defined by
\[ \|\Phi\| = \sup\left\{ \frac{|\Phi(\varphi)|}{\|\varphi\|} : \varphi \ne 0 \right\} \tag{A.3.1} \]
and that $\Phi$ is continuous if and only if the norm is finite.
The norm $\|\mu\| = |\mu|(X)$ of the measure μ coincides with the norm $\|\Phi\|$ of the
functional $\Phi$. Moreover, μ takes values in $[0, \infty)$ if and only if $\Phi$ is positive
and μ takes values in $\mathbb{R}$ if and only if $\Phi(\varphi) \in \mathbb{R}$ for every real function φ.
In other words, this last theorem asserts that the dual space of C0 (M) is
isometrically isomorphic to M(M). Theorems A.3.11 and A.3.12 extend to
locally compact topological spaces, with suitable assumptions on the behavior
of the functions at infinity. In this context the measure μ is still regular, but not
necessarily finite.
We also use the fact that the space C0 (M) has countable dense subsets
(Exercise A.3.6 is a particular instance):
Proof. We treat the case of real functions; the complex case is entirely
analogous. Every compact metric space is separable. Let {xk : k ∈ N} be a
countable dense subset of M. For each k ∈ N, consider the function fk : M → R
defined by fk (x) = d(x, xk ). Represent by A the set of all functions f : M → R
of the form
\[ f = c + \sum_{k_1, \dots, k_s} c_{k_1, \dots, k_s} f_{k_1} \cdots f_{k_s} \tag{A.3.2} \]
A.3.4 Exercises
A.3.1. Let M be a metrizable topological space. Justify that every point of M admits
a countable basis of neighborhoods. Check that M is separable if and only if it
admits a countable basis of open sets. Give examples of separable metric spaces
and non-separable metric spaces.
A.3.2. Let μ be a finite measure on a metric space M. Show that for every closed set
F ⊂ M there exists some finite or countable set E ⊂ (0, ∞) such that
μ({x ∈ M : d(x, F) = r}) = 0 for every r ∈ (0, ∞) \ E.
A.3.3. Let μ be a finite measure on a separable metric space M. Show that for every ε >
0 there exists a countable partition of M into measurable subsets with diameter
less than ε and whose boundaries have measure zero.
A.3.4. Let μ be a probability measure on [0, 1] and φ : [0, 1] → [0, 1] be the function
given by φ(x) = μ([0, x]). Check that φ is continuous if and only if μ is
non-atomic. Check that φ is absolutely continuous if and only if μ is absolutely
continuous with respect to the Lebesgue measure.
A.3.5. Let μ be a probability measure on some metric space M. Show that for every
integrable function ψ : M → R there exists a sequence ψn : M → R, n ≥ 1
of uniformly continuous functions converging to ψ at μ-almost every point.
Moreover, if ψ is bounded then we may choose the sequence in such a way
that sup |ψn | ≤ sup |ψ| for every n. Do these claims remain true if we require
convergence at every point?
A.3.6. Without using Theorem A.3.13, show that the space C0 ([0, 1]d ) of continuous
functions, real or complex, on the compact unit cube is separable, for every d ≥ 1.
Euclidean space Rd : consider the atlas consisting of a unique map, namely, the
identity map Rd → Rd .
A.4 Differentiable manifolds 461
Sphere $S^d = \{(x_0, x_1, \dots, x_d) \in \mathbb{R}^{d+1} : x_0^2 + x_1^2 + \cdots + x_d^2 = 1\}$: consider the atlas
formed by the two stereographic projections:
\[ S^d \setminus \{(1, 0, \dots, 0)\} \to \mathbb{R}^d, \quad (x_0, x_1, \dots, x_d) \mapsto (x_1, \dots, x_d)/(1 - x_0) \]
\[ S^d \setminus \{(-1, 0, \dots, 0)\} \to \mathbb{R}^d, \quad (x_0, x_1, \dots, x_d) \mapsto (x_1, \dots, x_d)/(1 + x_0). \]
Torus Td = Rd /Zd : consider the atlas formed by the inverses of the maps gz :
(0, 1)d → Td , defined by gz (x) = z + x mod Zd for every z ∈ Rd .
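To see that the first stereographic projection is indeed a chart, one can write down its inverse explicitly: with s = |y|², the point ((s − 1)/(s + 1), 2y/(s + 1)) lies on the sphere and projects back to y. The sketch below checks this numerically at one point; the inverse formula is a standard computation rather than something stated in the text, and the function names are ours.

```python
def project_north(x):
    """(x0, x1, ..., xd) -> (x1, ..., xd) / (1 - x0), defined for x0 != 1."""
    x0, rest = x[0], x[1:]
    return tuple(xi / (1.0 - x0) for xi in rest)

def inverse_north(y):
    """Inverse of project_north: maps y in R^d to a point of the sphere S^d."""
    s = sum(yi * yi for yi in y)
    return ((s - 1.0) / (s + 1.0),) + tuple(2.0 * yi / (s + 1.0) for yi in y)

y = (0.3, -1.2)
x = inverse_north(y)

print(sum(c * c for c in x))  # squared norm of x, should be 1 up to rounding
print(project_north(x))       # should recover y up to rounding
```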
(i) A and B are compatible: the coordinate changes ψβ ◦ ϕα−1 and ϕα ◦ ψβ−1
are differentiable in their domains, for every α and every β;
(ii) for every β, the local chart ψβ maps Vβ = Vβ ∩ S onto an open subset Yβ
of Rk × {0d−k }.
It is clear that f ∈ U (f ). For each g ∈ U (f ) and each pair (i, j) such that Ki,j is
non-empty, denote by gi,j the restriction of ψj ◦ g ◦ ϕi−1 to the set ϕi (Ki,j ). For
each r ∈ N and ε > 0, define
where the supremum is over every s ∈ {1, . . . , r}, every x ∈ ϕi (Ki,j ) and every
pair (i, j) such that Ki,j ≠ ∅. By definition, the family {U r (f , ε) : ε > 0} is a basis
of neighborhoods of each f ∈ Cr (M, N) relative to the Cr topology. Also by
definition, the family {U r (f , ε) : ε > 0 and r ∈ N} is a basis of neighborhoods
of f ∈ C∞ (M, N) relative to the C∞ topology.
The Cr topology has very nice properties: in particular, it admits a countable
basis of open sets and is completely metrizable, that is, it is generated by some
complete distance. An interesting consequence is that Cr (M, N) is a Baire
space: every intersection of a countable family of open dense subsets is dense
in the space. The set Diffeor (M) of diffeomorphisms of class Cr is an open
subset of Cr (M, M) relative to the Cr topology.
The tangent space to the manifold M at the point p is the set of such
equivalence classes. We denote this set by Tp M. For any fixed local chart
ϕα : Uα → Xα with p ∈ Uα , the map
\[ D\varphi_\alpha(p) : T_pM \to \mathbb{R}^d, \quad [c] \mapsto (\varphi_\alpha \circ c)'(0) \]
is well defined and is a bijection. We may use this bijection to identify Tp M
with Rd . In this way, the tangent space acquires the structure of a vector space,
transported from Rd via Dϕα (p). Although this identification Dϕα (p) depends
on the choice of the local chart, the vector space structure on Tp M does not.
That is because, for any other local chart ϕβ : Uβ → Xβ with p ∈ Uβ , the
corresponding map Dϕβ (p) is given by
Dϕβ (p) = D ϕβ ◦ ϕα−1 (ϕα (p)) ◦ Dϕα (p).
Since D ϕβ ◦ ϕα−1 (ϕα (p)) is a linear isomorphism, it follows that the vector
space structures transported from Euclidean space to Tp M by Dϕα (p) and
Dϕβ (p) coincide, as we stated.
If $f : M \to N$ is a differentiable map, its derivative at a point $p \in M$ is the
linear map $Df(p) : T_pM \to T_{f(p)}N$ defined by
$$Df(p) = D\psi_\beta(f(p))^{-1} \circ D\big(\psi_\beta \circ f \circ \varphi_\alpha^{-1}\big)(\varphi_\alpha(p)) \circ D\varphi_\alpha(p),$$
where $\varphi_\alpha : U_\alpha \to X_\alpha$ is a local chart of $M$ with $p \in U_\alpha$ and $\psi_\beta : V_\beta \to Y_\beta$ is a
local chart of $N$ with $f(p) \in V_\beta$. The definition does not depend on the choice
of these local charts.
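The tangent bundle to $M$ is the (disjoint) union $TM = \bigcup_{p \in M} T_pM$ of all the
tangent spaces to $M$. For each local chart $\varphi_\alpha : U_\alpha \to X_\alpha$, consider the union
$T_{U_\alpha}M = \bigcup_{p \in U_\alpha} T_pM$ and the map
$$D\varphi_\alpha : T_{U_\alpha}M \to X_\alpha \times \mathbb{R}^d$$
that associates with each $[c] \in T_{U_\alpha}M$ the pair
$$\big((\varphi_\alpha \circ c)(0), (\varphi_\alpha \circ c)'(0)\big) \in X_\alpha \times \mathbb{R}^d.$$
We consider on $TM$ the (unique) topology that turns every $D\varphi_\alpha$ into a
homeomorphism. Assuming that the atlas $\{\varphi_\alpha : U_\alpha \to X_\alpha\}$ of the manifold $M$
is of class $C^r$, the coordinate change
$$D\varphi_\beta \circ D\varphi_\alpha^{-1} : \varphi_\alpha(U_\alpha \cap U_\beta) \times \mathbb{R}^d \to \varphi_\beta(U_\alpha \cap U_\beta) \times \mathbb{R}^d$$
is a map of class $C^{r-1}$ for any $\alpha$ and $\beta$ such that $U_\alpha \cap U_\beta \neq \emptyset$. So, the tangent
bundle $TM$ is endowed with the structure of a manifold of class $C^{r-1}$ and
dimension $2d$.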
The derivative $Df : TM \to TN$ of a differentiable map $f : M \to N$ is the map
whose restriction to each tangent space $T_pM$ is given by $Df(p)$. If $f$ is of class
$C^r$ then $Df$ is of class $C^{r-1}$, relative to the manifold structure on the tangent
bundles $TM$ and $TN$ that we introduced in the previous paragraph. For example,
The family $\{dx_{i_1} \wedge \cdots \wedge dx_{i_k} : 1 \le i_1 < \cdots < i_k \le d\}$ is a basis of the vector space
of alternate $k$-linear forms in $T_pM$.
A differential $k$-form in $M$ is a map $\theta$ assigning to each point $p \in M$ an
alternate $k$-linear form in the tangent space $T_pM$ that depends differentiably on
the point. In local coordinates, this may be written as
$$\theta_p = \sum_{1 \le i_1 < \cdots < i_k \le d} a_{i_1,\dots,i_k}(p) \, dx_{i_1} \wedge \cdots \wedge dx_{i_k}.$$
The differentiability condition means that the coefficients $a_{i_1,\dots,i_k}(p)$ depend
differentiably on the point $p$.
Assuming that $k < d$, the exterior derivative of $\theta$ is the differential
$(k + 1)$-form $d\theta$ determined by
$$d\theta_p = \sum_{1 \le i_1 < \cdots < i_k \le d} \sum_j \frac{\partial a_{i_1,\dots,i_k}}{\partial x_j}(p) \, dx_j \wedge dx_{i_1} \wedge \cdots \wedge dx_{i_k},$$
where the second sum is over all $j \notin \{i_1, \dots, i_k\}$; one can check that the
expression on the right-hand side does not depend on the choice of the local
chart. A differential $k$-form $\theta$ is closed if $d\theta = 0$ (or else $k = d$) and it is exact if
there exists some $(k - 1)$-form $\eta$ such that $d\eta = \theta$ (or else $k = 0$). Every exact
differential form is closed.
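The claim that exact forms are closed can be checked numerically in the simplest case. The sketch below takes an arbitrary smooth function $\eta$ on $\mathbb{R}^2$ (a 0-form chosen only for illustration), writes $\theta = d\eta = a\,dx + b\,dy$ with hand-computed coefficients, and approximates the coefficient of $dx \wedge dy$ in $d\theta$, namely $\partial b/\partial x - \partial a/\partial y$, by central differences. The function names and sample points are assumptions of this sketch.

```python
import math

def eta(x, y):
    """A smooth 0-form (function) on R^2, chosen only for illustration."""
    return math.sin(x) * math.exp(y) + x * x * y

# theta = d(eta) = a dx + b dy, with the partial derivatives of eta computed by hand.
def a(x, y):
    return math.cos(x) * math.exp(y) + 2.0 * x * y

def b(x, y):
    return math.sin(x) * math.exp(y) + x * x

def d_theta_coefficient(x, y, h=1e-5):
    """Coefficient of dx ^ dy in d(theta): db/dx - da/dy, by central differences."""
    db_dx = (b(x + h, y) - b(x - h, y)) / (2.0 * h)
    da_dy = (a(x, y + h) - a(x, y - h)) / (2.0 * h)
    return db_dx - da_dy

samples = [(0.0, 0.0), (1.0, -0.5), (-2.0, 0.3)]
residuals = [abs(d_theta_coefficient(x, y)) for x, y in samples]
```

The residuals vanish up to discretization error, reflecting the symmetry of the mixed second partial derivatives of $\eta$.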
For much more information on the subject of differential forms, see the book
of Henri Cartan [Car70].
A.4.4 Transversality
The result that we state next is an important tool for constructing new
manifolds. We say that $y \in N$ is a regular value of a differentiable map $f : M \to N$
if the derivative $Df(x) : T_xM \to T_yN$ is surjective for every $x \in f^{-1}(y)$. Note
that this holds, automatically, if $y$ is not in the image of $f$, that is, if $f^{-1}(y)$ is
the empty set. On the other hand, in order that some point $y \in f(M)$ be a regular
value of $f$ it is necessary that $\dim M \ge \dim N$.
The next theorem asserts that, for every map $f : M \to N$ of class $C^r$ with $r$
sufficiently high, “almost all” points $y \in N$ are regular values. We say that a
set $X \subset N$ is residual if it contains some countable intersection of open and
dense subsets. Every residual set is dense in the manifold, because manifolds
are Baire spaces. We say that a set $Z \subset N$ has volume zero if for every local
such that $v \cdot_p v > 0$ for every non-zero vector $v \in T_pM$. As part of the definition,
this inner product is required to vary in a differentiable way with the point $p$, in
the following sense. Consider any local chart $\varphi_\alpha : U_\alpha \to X_\alpha$ of $M$. As explained
previously, for every $p \in U_\alpha$ we may identify $T_pM$ with $\mathbb{R}^d$, through the map
$D\varphi_\alpha(p)$. Thus, we may view $\cdot_p$ as an inner product in the Euclidean space. Let
$e_1, \dots, e_d$ be a basis of $\mathbb{R}^d$. Then the functions $g_{\alpha,i,j}(p) = e_i \cdot_p e_j$ are required to
be differentiable, for every pair $(i, j)$ and any choice of the local chart $\varphi_\alpha$ and
the basis $e_1, \dots, e_d$.
We call a Riemannian manifold any manifold endowed with a Riemannian
metric. Every submanifold S of a Riemannian manifold M inherits the structure
of a Riemannian manifold, given by the restriction of the inner product ·p of
M to the tangent subspace Tp S of each point p ∈ S. Every compact manifold
admits (infinitely many) Riemannian metrics. That follows from the theorem
of Whitney (see Section 1.3 of Hirsch [Hir94]), according to which every
compact manifold may be realized as a submanifold of some Euclidean space.
Actually, this remains true in the much larger class of paracompact manifolds
(which we do not define here): every paracompact manifold of dimension
$d$ is diffeomorphic to some submanifold of $\mathbb{R}^{2d}$. In particular, paracompact
manifolds are always metrizable.
Starting from the Riemannian metric, we may define the length of a
differentiable curve $\gamma : [a, b] \to M$, by
$$\operatorname{length}(\gamma) = \int_a^b \|\gamma'(t)\|_{\gamma(t)} \, dt, \quad \text{where } \|v\|_p = (v \cdot_p v)^{1/2}.$$
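To make the length formula concrete, here is a minimal numerical sketch. It approximates the integral by the midpoint rule for the unit circle $\gamma(t) = (\cos t, \sin t)$ in $\mathbb{R}^2$, with the Riemannian metric inherited from the Euclidean inner product. The helper names `curve_length`, `velocity` and `euclid` are hypothetical, introduced only for this illustration.

```python
import math

def curve_length(velocity, norm, a, b, steps=10000):
    """Midpoint-rule approximation of the integral of norm(t, velocity(t)) over [a, b]."""
    h = (b - a) / steps
    total = 0.0
    for i in range(steps):
        t = a + (i + 0.5) * h
        total += norm(t, velocity(t)) * h
    return total

def velocity(t):
    """Derivative of gamma(t) = (cos t, sin t)."""
    return (-math.sin(t), math.cos(t))

def euclid(t, v):
    """Euclidean norm of v; for this metric the base point gamma(t) plays no role."""
    return math.sqrt(v[0] ** 2 + v[1] ** 2)

circle_length = curve_length(velocity, euclid, 0.0, 2.0 * math.pi)
```

For a metric that varies from point to point, only the `norm` argument would change: it receives the parameter `t` so that it can evaluate the inner product at $\gamma(t)$.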
Most of the time, one considers the restriction of the geodesic flow to the unit
tangent bundle $T^1M = \{(p, v) \in TM : \|v\|_p = 1\}$. This is well defined since,
as we mentioned before, the norm of the velocity vector of any geodesic is
constant.
A.4.6 Exercises
A.4.1. Check that every set $X$ with the cardinality of $\mathbb{R}$ may be endowed with the
structure of a differentiable manifold of class $C^\infty$ and dimension $d$, for any $d \ge 1$.
A.4.2. Consider the differentiable manifolds $M = (\mathbb{R}, A)$ and $N = (\mathbb{R}, B)$, where $A$ is
the atlas consisting of the map $\varphi(x) = x$ and $B$ is the atlas consisting of the map
$\psi(x) = x^3$. Is the map $f : M \to N$ defined by $f(x) = x$ a diffeomorphism between
these manifolds?
A.4.3. A topological space is path connected if any two points are connected by some
continuous curve. Show that every (connected) manifold is path connected.
A.4.4. For each $d \ge 2$, the projective space of dimension $d$ is the set $\mathbb{P}^d$ of all subspaces
of $\mathbb{R}^{d+1}$ with dimension 1. Equivalently, $\mathbb{P}^d$ is the quotient space of $\mathbb{R}^{d+1} \setminus \{0\}$
for the equivalence relation defined by:
$$U_i = \{[x_0 : \cdots : x_d] \in \mathbb{P}^d : x_i \neq 0\}$$
[Observation: The number k is called the degree of f and is denoted degree(f ).]
A.4.7. Consider on $\mathbb{R}_+ = \{x \in \mathbb{R} : x > 0\}$ the Riemannian metric defined by $u \cdot_x v = uv/x^2$.
Calculate the distance $d(a, b)$ between any two points $a, b \in \mathbb{R}_+$.
A.4.8. Let $M$ and $N$ be submanifolds of $\mathbb{R}^{m+n}$ with $\dim M = m$ and $\dim N = n$. Show
that there exists a set $Z \subset \mathbb{R}^{m+n}$ with volume zero such that, for every $v$ in the
complement of $Z$, the translate $M + v$ is transverse to $N$:
$$T_x(M + v) + T_xN = \mathbb{R}^{m+n} \quad \text{for every } x \in (M + v) \cap N.$$
Note that if the measure $\mu$ is finite, which is the case in most of our
examples, then all bounded measurable functions are in $L^p(\mu)$:
$$\int |f|^p \, d\mu \le (\sup |f|)^p \, \mu(X) < \infty.$$
The next theorem asserts that $\|\cdot\|_p$ turns $L^p(\mu)$ into a Banach space:
The most interesting part of the proof of this theorem is to establish the
triangle inequality, which in this context is known as the Minkowski inequality:
In Exercises A.5.2 and A.5.5 we invite the reader to prove the Minkowski
inequality and to complete the proof of Theorem A.5.2.
It follows from the properties of the Lebesgue integral that this expression does
define an inner product on $L^2(\mu)$. Moreover, this product gives rise to the norm
$\|\cdot\|_2$ through:
$$\|f\|_2 = (f \cdot f)^{1/2}.$$
In particular, we have the Cauchy–Schwarz inequality:
This inequality has the following interesting consequence. Assume that the
measure $\mu$ is finite and consider any $f \in L^2(\mu)$. Then, taking $g \equiv 1$,
$$\int |f| \, d\mu = \int |f \bar g| \, d\mu \le \Big(\int |f|^2 \, d\mu\Big)^{1/2} \Big(\int 1 \, d\mu\Big)^{1/2} < \infty. \tag{A.5.2}$$
This proves that every function in $L^2(\mu)$ is also in $L^1(\mu)$. In fact, when the
measure $\mu$ is finite one has $L^p(\mu) \subset L^q(\mu)$ whenever $p \ge q$ (Exercise A.5.3).
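The bound (A.5.2) is easy to test on a finite measure space. In the sketch below the space has five points with weights `mu` (total mass 2.5, so the measure is finite but not a probability); the weights and the function values `f` are arbitrary choices made for illustration.

```python
import math

# Finite measure space: five points with weights mu_i (total mass 2.5).
mu = [0.5, 0.25, 1.0, 0.5, 0.25]
f = [3.0, -1.0, 0.5, -4.0, 2.0]

l1_norm = sum(abs(fi) * mi for fi, mi in zip(f, mu))          # integral of |f|
l2_norm = math.sqrt(sum(fi * fi * mi for fi, mi in zip(f, mu)))
total_mass = sum(mu)                                           # integral of 1

bound = l2_norm * math.sqrt(total_mass)                        # right-hand side of (A.5.2)
```

Here `l1_norm` does not exceed `bound`. The factor `math.sqrt(total_mass)` is exactly where finiteness of the measure enters the argument.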
The next result is a generalization of the Cauchy–Schwarz inequality for all
values of p > 1:
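Theorem A.5.5 (Hölder inequality). Given $1 < p < \infty$, consider $q > 1$ defined
by the relation $\frac{1}{p} + \frac{1}{q} = 1$. Then, for every $f \in L^p(\mu)$ and every $g \in L^q(\mu)$, we
have that $f \bar g \in L^1(\mu)$ and
$$\int |f \bar g| \, d\mu \le \Big(\int |f|^p \, d\mu\Big)^{1/p} \Big(\int |g|^q \, d\mu\Big)^{1/q}.$$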
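A numerical instance of the Hölder inequality may help fix the roles of the conjugate exponents. The sketch below uses a four-point probability space with $p = 3$, hence $q = 3/2$; the data and the helper `lp_norm` are illustrative assumptions, not part of the text.

```python
def lp_norm(values, weights, p):
    """(integral of |v|^p)^(1/p) on a finite weighted space."""
    return sum(abs(v) ** p * w for v, w in zip(values, weights)) ** (1.0 / p)

mu = [0.2, 0.3, 0.1, 0.4]
f = [1.0, -2.0, 3.0, 0.5]
g = [2.0, 0.5, -1.0, 4.0]
p = 3.0
q = p / (p - 1.0)        # conjugate exponent, so that 1/p + 1/q = 1

lhs = sum(abs(fi * gi) * mi for fi, gi, mi in zip(f, g, mu))   # integral of |f g|
rhs = lp_norm(f, mu, p) * lp_norm(g, mu, q)
```

Taking `p = 2.0` instead recovers the Cauchy–Schwarz inequality as the special case $p = q = 2$.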
This statement is false for p = ∞: in general, the dual space of L∞ (μ) is not
isomorphic to L1 (μ).
A.5.4 Convexity
We say that a function $\phi : I \to \mathbb{R}$ defined on an interval $I$ of the real line is
convex if
$$\phi(tx + (1 - t)y) \le t\phi(x) + (1 - t)\phi(y)$$
for every $x, y \in I$ and $t \in [0, 1]$. Moreover, we say that $\phi$ is concave if $-\phi$
is convex. For functions that are twice differentiable we have the following
practical criterion (Exercise A.5.1): $\phi$ is convex if $\phi''(x) \ge 0$ for every $x \in I$
and it is concave if $\phi''(x) \le 0$ for every $x \in I$.
Theorem A.5.8 (Jensen inequality). Let $\phi : I \to \mathbb{R}$ be a convex function. If $\mu$
is a probability measure on $X$ and $f \in L^1(\mu)$ is such that $\int f \, d\mu \in I$, then:
$$\phi\Big(\int f \, d\mu\Big) \le \int \phi \circ f \, d\mu.$$
Example A.5.9. For any probability measure $\mu$ and any integrable positive
function $f$, we have
$$\log \int f \, d\mu \ge \int \log f \, d\mu.$$
Indeed, this corresponds to the Jensen inequality for the function $\phi : (0, \infty) \to \mathbb{R}$
given by $\phi(x) = -\log x$. Note that $\phi$ is convex: $\phi''(x) = 1/x^2 > 0$ for every $x$.
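The inequality of Example A.5.9 can be observed on a small discrete probability space. In this sketch, the weights `mu` and the positive values `f` are arbitrary illustrative choices.

```python
import math

mu = [0.1, 0.2, 0.3, 0.4]        # probability weights (they sum to 1)
f = [0.5, 2.0, 4.0, 1.0]         # values of a positive function

log_of_mean = math.log(sum(fi * mi for fi, mi in zip(f, mu)))   # log of the integral of f
mean_of_log = sum(math.log(fi) * mi for fi, mi in zip(f, mu))   # integral of log f
```

The gap `log_of_mean - mean_of_log` is non-negative, and it would be zero if `f` were constant.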
Example A.5.10. Let $\phi : \mathbb{R} \to \mathbb{R}$ be a convex function, $(\lambda_i)_i$ be a sequence
of non-negative real numbers satisfying $\sum_{i=1}^\infty \lambda_i = 1$ and $(a_i)_i$ be a bounded
sequence of real numbers. Then
$$\phi\Big(\sum_{i=1}^\infty \lambda_i a_i\Big) \le \sum_{i=1}^\infty \lambda_i \phi(a_i). \tag{A.5.4}$$
This may be seen as follows. Consider $X = [0, 1]$ endowed with the Lebesgue
measure $\mu$. Let $f : [0, 1] \to \mathbb{R}$ be a function of the form $f = \sum_{i=1}^\infty a_i \mathcal{X}_{E_i}$, where
the $E_i$ are pairwise disjoint measurable sets such that $\mu(E_i) = \lambda_i$. The Jensen
inequality applied to this function $f$ gives precisely the relation (A.5.4).
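The reduction in Example A.5.10 can be traced step by step for a finite family of weights. The sketch below assumes four non-negative weights summing to 1, realizes the sets $E_i$ as consecutive subintervals of $[0, 1]$, and compares the integrals of the step function with the two sides of (A.5.4). The convex function $\phi(x) = x^2$ and all numerical data are assumptions of this illustration.

```python
def phi(x):
    """A convex function on R, chosen for illustration."""
    return x * x

lam = [0.5, 0.25, 0.125, 0.125]      # non-negative weights summing to 1
a = [1.0, -2.0, 4.0, 0.0]

# E_i = consecutive subintervals of [0, 1] with Lebesgue measure lam[i].
endpoints = [0.0]
for weight in lam:
    endpoints.append(endpoints[-1] + weight)

def f(x):
    """Step function equal to a[i] on E_i = [endpoints[i], endpoints[i + 1])."""
    for i in range(len(lam)):
        if endpoints[i] <= x < endpoints[i + 1]:
            return a[i]
    return a[-1]                      # x == 1.0, a set of measure zero

# Integrals over [0, 1] by the midpoint rule on a grid aligned with the E_i.
n = 1024
integral_f = sum(f((k + 0.5) / n) for k in range(n)) / n
integral_phi_f = sum(phi(f((k + 0.5) / n)) for k in range(n)) / n

lhs = phi(sum(w * x for w, x in zip(lam, a)))      # phi(sum of lam_i a_i)
rhs = sum(w * phi(x) for w, x in zip(lam, a))      # sum of lam_i phi(a_i)
```

Since the grid is aligned with the intervals, `integral_f` reproduces $\sum_i \lambda_i a_i$ and `integral_phi_f` reproduces `rhs`, so the Jensen inequality for `f` is literally the inequality `lhs <= rhs`.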
A.5.5 Exercises
A.5.1. Consider any function $\varphi : (a, b) \to \mathbb{R}$. Show that if $\varphi$ is twice differentiable and
$\varphi'' \ge 0$ then $\varphi$ is convex. Show that if $\varphi$ is convex then it is continuous.
A.5.2. Consider $p, q > 1$ such that $1/p + 1/q = 1$. Prove:
(a) The Young inequality: $ab \le a^p/p + b^q/q$ for every $a, b > 0$.
(b) The Hölder inequality (Theorem A.5.5).
(c) The Minkowski inequality (Theorem A.5.3).
A.5.3. Show that if μ is a finite measure then we have Lq (μ) ⊂ Lp (μ) for every 1 ≤ p <
q ≤ ∞.
A.5.4. Let $\mu$ be a finite measure and $f \in L^\infty(\mu)$ be different from zero. Show that
$$\|f\|_\infty = \lim_n \frac{\int |f|^{n+1} \, d\mu}{\int |f|^n \, d\mu}.$$
A.5.5. Show that a normed vector space $(V, \|\cdot\|)$ is complete if and only if every
series $\sum_k v_k$ that is absolutely summable (meaning that $\sum_k \|v_k\|$ converges) is
convergent. Use this fact to show that if $\mu$ is a probability measure then $\|\cdot\|_p$ is
a complete norm on $L^p(\mu)$ for every $1 \le p \le \infty$.
A.5.6. Show that if $\mu$ is a finite measure and $1/p + 1/q = 1$ with $1 \le p < \infty$ then the
map $\Phi : L^q(\mu) \to L^p(\mu)^*$, $\Phi(g)f = \int f g \, d\mu$ is an isomorphism and an isometry.
A.5.7. Show that if X is a metric space then, given any Borel probability measure μ,
the set C0 (X) of all continuous functions is dense in Lp (μ) for every 1 ≤ p ≤
∞. Indeed, the same holds for the subset of all uniformly continuous bounded
functions.
A.5.8. Let $f, g : X \to \mathbb{R}$ be two positive measurable functions such that $f(x)g(x) \ge 1$ for
every $x$. Show that $\int f \, d\mu \int g \, d\mu \ge 1$ for every probability measure $\mu$.
1. (u + w) · v = u · v + w · v and u · (v + w) = u · v + u · w;
2. (λu) · v = λ(u · v) and u · (λv) = λ̄(u · v);
3. $u \cdot v = \overline{v \cdot u}$;
4. u · u ≥ 0 and u · u = 0 if and only if u = 0.
Given any family $(H_\alpha)_\alpha$ of subspaces of $H$, the set of all vectors of the form
$v = \sum_\alpha v_\alpha$ with $v_\alpha \in H_\alpha$ for every $\alpha$ is a subspace of $H$ (see Exercise A.6.2).
It is called the sum of the family $(H_\alpha)_\alpha$ and it is denoted by $\sum_\alpha H_\alpha$.
A.6.1 Orthogonality
Let H be a Hilbert space. Two vectors u, v ∈ H are said to be orthogonal if
u · v = 0. We call a subset of H orthonormal if its elements have norm 1 and
are pairwise orthogonal.
A Hilbert basis of $H$ is an orthonormal subset $B = \{v_\beta\}$ such that the set of
all (finite) linear combinations of elements of $B$ is dense in $H$. For example,
the Fourier basis
$$\{x \mapsto e^{2\pi i k x} : k \in \mathbb{Z}\} \tag{A.6.1}$$
is a Hilbert basis of the space $L^2(m)$ of all measurable functions on the unit
circle whose square is integrable with respect to the Lebesgue measure.
A Hilbert basis $B = \{v_\beta\}$ is usually not a basis of the vector space in the
usual sense (Hamel basis): it is usually not true that every vector of $H$ is a
finite linear combination of the elements of $B$. However, every $v \in H$ may be
written as an infinite linear combination of the elements of the Hilbert basis:
$$v = \sum_\beta (v \cdot v_\beta) v_\beta \quad \text{and, moreover,} \quad \|v\|^2 = \sum_\beta |v \cdot v_\beta|^2.$$
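For the Fourier basis, the two identities above can be checked on a trigonometric polynomial, where a Riemann sum over equally spaced points computes the integrals exactly up to rounding. The sketch below identifies the circle with $\mathbb{R}/\mathbb{Z}$ and uses the inner product $u \cdot w = \int u \bar w \, dm$. The test function `v`, the grid size `N` and the helper names are assumptions made for this illustration.

```python
import cmath

def v(x):
    """Trigonometric polynomial on the circle R/Z, chosen for illustration."""
    return (0.5
            + 2.0 * cmath.exp(2j * cmath.pi * x)
            + (1 - 1j) * cmath.exp(-2j * cmath.pi * 3 * x))

N = 64                                   # number of equally spaced sample points
grid = [n / N for n in range(N)]

def inner(u, w):
    """Riemann-sum approximation of the L^2 inner product: integral of u * conj(w)."""
    return sum(u(x) * w(x).conjugate() for x in grid) / N

def basis(k):
    """Element x -> exp(2 pi i k x) of the Fourier basis."""
    return lambda x: cmath.exp(2j * cmath.pi * k * x)

coeffs = {k: inner(v, basis(k)) for k in range(-5, 6)}   # coefficients v . v_k
norm_sq = inner(v, v).real                               # squared norm of v
parseval_sum = sum(abs(c) ** 2 for c in coeffs.values())
```

The coefficients recovered are 0.5, 2 and 1 - i at the frequencies 0, 1 and -3 (and zero elsewhere), and `parseval_sum` matches `norm_sq`. For a general function in $L^2(m)$ the sum over $k$ is infinite and the Riemann sum is only an approximation.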
A.6.2 Duality
A linear functional on a Hilbert space $H$ (or, more generally, on a Banach
space) is a linear map from $H$ to the scalar field ($\mathbb{R}$ or $\mathbb{C}$). It is said to be
bounded if
$$\|\phi\| = \sup\Big\{ \frac{|\phi(v)|}{\|v\|} : v \neq 0 \Big\} < \infty.$$
This is equivalent to saying that the linear functional is continuous, relative to
the topology defined by the norm of $H$ (see Exercise A.6.3). The dual space
of a Hilbert space $H$ is the vector space $H^*$ formed by all the bounded linear
functionals. The function $\phi \mapsto \|\phi\|$ is a complete norm on $H^*$ and, hence, it
endows the dual with the structure of a Banach space. The map
$$h : H \to H^*, \quad w \mapsto (v \mapsto v \cdot w) \tag{A.6.2}$$
is a bijection between the two spaces and it preserves the norms. In particular,
$h$ is a homeomorphism. Moreover, it satisfies $h(w_1 + w_2) = h(w_1) + h(w_2)$ and
$h(\lambda w) = \bar\lambda h(w)$.
The weak topology in H is the smallest topology relative to which all the
linear functionals v → v · w are continuous. In terms of sequences, it can be
characterized as follows:
The weak∗ topology in the dual space H ∗ is the smallest topology relative to
which φ → φ(v) is continuous for every v ∈ H.
It is known from the theory of Banach spaces (theorem of Banach–Alaoglu)
that every bounded closed subset of the dual space is compact for the weak∗
topology. In the special case of Hilbert spaces, the weak topology in the space
H is homeomorphic to the weak∗ topology in the dual space H ∗ : the map h in
(A.6.2) is also a homeomorphism for these topologies. Since h preserves the
class of bounded sets, it follows that the weak topology in the space H itself
enjoys the property in the theorem of Banach–Alaoglu:
$$v \cdot Lw = L^* v \cdot w \quad \text{for every } v, w \in H.$$
The adjoint operator is continuous, with $\|L^*\| = \|L\|$ and $\|L^*L\| = \|LL^*\| = \|L\|^2$.
Moreover, $(L^*)^* = L$ and $(L_1 + L_2)^* = L_1^* + L_2^*$ and $(\lambda L)^* = \bar\lambda L^*$ (in
Exercise A.6.5 we invite the reader to prove these facts).
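In finite dimension the adjoint is the conjugate transpose, which makes the defining identity easy to test. The sketch below works in $\mathbb{C}^2$ with an inner product that is linear in the first slot and conjugate-linear in the second, matching the convention $u \cdot (\lambda v) = \bar\lambda (u \cdot v)$. The matrix `L`, the vectors and the helper names are illustrative assumptions.

```python
def dot(u, v):
    """Inner product on C^2: linear in the first slot, conjugate-linear in the second."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

def apply(L, v):
    """Matrix-vector product L v."""
    return [sum(L[i][j] * v[j] for j in range(len(v))) for i in range(len(L))]

def adjoint(L):
    """Conjugate transpose; in C^n this is the adjoint for the inner product above."""
    return [[L[j][i].conjugate() for j in range(len(L))] for i in range(len(L[0]))]

def scale(lam, L):
    """Scalar multiple lam * L."""
    return [[lam * entry for entry in row] for row in L]

L = [[1 + 2j, 3 - 1j], [0.5j, -2 + 0j]]
v = [1 - 1j, 2 + 0.5j]
w = [-3j, 1 + 1j]
lam = 2 - 3j

lhs = dot(v, apply(L, w))               # v . Lw
rhs = dot(apply(adjoint(L), v), w)      # L*v . w
double_adjoint = adjoint(adjoint(L))    # expected to equal L
scaled_adjoint = adjoint(scale(lam, L)) # expected to equal conj(lam) * L*
```

The values `lhs` and `rhs` agree up to rounding, and the last two matrices illustrate $(L^*)^* = L$ and $(\lambda L)^* = \bar\lambda L^*$.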
A continuous linear operator $L : H \to H$ is self-adjoint if $L = L^*$. More
generally, $L$ is normal if it satisfies $L^*L = LL^*$. We are especially interested in
the case when $L$ is unitary, that is, $L^*L = \mathrm{id} = LL^*$. We call a linear isometry
any linear operator $L : H \to H$ such that $L^*L = \mathrm{id}$. Hence, the unitary
operators are the linear isometries that are also normal operators.
A.6.3 Exercises
A.6.1. Let $H$ be a Hilbert space. Prove:
(a) That every ball (either open or closed) is a convex subset of $H$.
(b) The parallelogram identity: $\|v + w\|^2 + \|v - w\|^2 = 2\|v\|^2 + 2\|w\|^2$ for any
$v, w \in H$.
(c) The polarization identity: $4(v \cdot w) = \|v + w\|^2 - \|v - w\|^2$ (real case) or
$4(v \cdot w) = (\|v + w\|^2 - \|v - w\|^2) + i(\|v + iw\|^2 - \|v - iw\|^2)$ (complex
case).
A.6.2. Show that, given any family $(H_\alpha)_\alpha$ of subspaces of a Hilbert space $H$, the set
of all the vectors of the form $v = \sum_\alpha v_\alpha$ with $v_\alpha \in H_\alpha$ for every $\alpha$ is a vector
subspace of $H$.
A.6.3. Show that a linear operator $L : E_1 \to E_2$ between two Banach spaces is
continuous if and only if there exists $C > 0$ such that $\|L(v)\|_2 \le C\|v\|_1$ for every
$v \in E_1$, where $\|\cdot\|_i$ denotes the norm in the space $E_i$ (we say that $L$ is a bounded
operator).
A.6.4. Consider the Hilbert space L2 (μ). Let V be the subspace formed by the constant
functions. What is the orthogonal complement of V? Determine the (orthogonal)
projection to V of an arbitrary function g ∈ L2 (μ).
A.6.5. Prove that if $L : H \to H$ is a bounded operator on a Hilbert space $H$ then the
adjoint operator $L^*$ is also bounded and $\|L^*\| = \|L\|$ and $\|L^*L\| = \|LL^*\| = \|L\|^2$
and $(L^*)^* = L$.
A.6.6. Show that if $K$ is a closed convex subset of a Hilbert space then for every $z \in H$
there exists a unique $v \in K$ such that $\|z - v\| = d(z, K)$.
A.6.7. Let S be a subspace of a Hilbert space H. Prove that:
(a) The orthogonal complement S⊥ of S is a closed subspace of H and it
coincides with the orthogonal complement of the closure S̄. Moreover,
(S⊥ )⊥ = S̄.
(b) Every v ∈ H may be written, in a unique fashion, as a sum v = s + s⊥ of
some s ∈ S̄ and some s⊥ ∈ S⊥ . The two vectors s and s⊥ are the elements of
S and S⊥ that are closest to v.
A.6.8. Let E be a closed subspace of a Hilbert space H. Show that E is also closed in the
weak topology. Moreover, U(E) is a closed subspace of H, for every isometry
U : H → H.
A.6.9. Show that a linear operator $L : H \to H$ on a Hilbert space $H$ is an isometry if and
only if $\|L(v)\| = \|v\|$ for every $v \in H$. Moreover, $L$ is a unitary operator if and
only if $L$ is an isometry and is invertible.
We write
$$\psi(L) = \int \psi(z) \, dE(z). \tag{A.7.5}$$
for every ϕ, ψ.
A.7.3 Exercises
A.7.1. Let $T : E \to E$ be a Banach space isomorphism, that is, a continuous linear
bijection whose inverse is also continuous. Show that $T + H$ is a Banach space
isomorphism for every continuous linear map $H : E \to E$ such that $\|H\| \, \|T^{-1}\| < 1$.
Use this fact to prove that the spectrum of every continuous linear operator
$L : E \to E$ is a closed set and is contained in the closed disk of radius $\|L\|$ around
the origin.
A.7.2. Show that if L : H → H is a linear operator in a Hilbert space H with finite
dimension then spec(L) consists of the eigenvalues of L, that is, the complex
numbers λ for which L − λid is not injective. Give an example, in infinite
dimension, such that the spectrum is strictly larger than the set of eigenvalues.
A.7.3. Prove Lemma A.7.3.
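A.7.4. Prove Proposition A.7.8, along the following lines:
(a) Assume that $Lv = \lambda v$ for some $v \neq 0$. Consider the functions
$$\varphi_n(z) = \begin{cases} (z - \lambda)^{-1} & \text{if } |z - \lambda| > 1/n \\ 0 & \text{otherwise.} \end{cases}$$
Show that $\varphi_n(L)(L - \lambda \, \mathrm{id}) = E(\{z : |z - \lambda| > 1/n\})$ for every $n$. Conclude that
$E(\{\lambda\})v = v$ and, consequently, $\lambda$ is an atom of $E$.
(b) Assume that there exists $w \in H$ such that $v = E(\{\lambda\})w$ is non-zero. Show
that, given any measurable set $B \subset \mathbb{C}$,
$$E(B)v = \begin{cases} v & \text{if } \lambda \in B \\ 0 & \text{if } \lambda \notin B. \end{cases}$$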
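with $\|\operatorname{grad} H(f(x))\| \, |\alpha| = \|\operatorname{grad} H(x)\|$. Note that $\Gamma = (\gamma_{i,j})_{i,j}$ is the matrix of
$D(f \mid H_c)$ and observe that $|\det \Gamma| = \|\operatorname{grad} H(x)\| / \|\operatorname{grad} H(f(x))\|$. Using the
formula of change of variables, conclude that $f \mid H_c$ preserves the measure
$ds / \|\operatorname{grad} H\|$.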
1.4.4. Choose a set E ⊂ M with measure less than ε/n and, for each k ≥ 1, let Ek be
the set of points x ∈ E that return to E in exactly k iterates. Take for B the union
of the sets Ek , with k ≥ n, of the n-th iterates of the sets Ek with k ≥ 2n, and so
on. For the second part, observe that if (f , μ) is aperiodic then μ cannot have
atoms.
1.4.5. By assumption, $f^{\tau}(y) \in H_{n - \tau(y)}$ whenever $y \in H_n$ with $n > \tau(y)$. Therefore,
$T(y) \in H$ if $y \in H$. Consider $A_n = \{1 \le j \le n : x \in H_j\}$ and
$B_n = \{l \ge 1 : \sum_{i=0}^{l} \tau(T^i(x)) \le n\}$. Show, by induction, that $\#A_n \le \#B_n$ and deduce that
$\limsup_n \#B_n / n \ge \theta$. Now suppose that $\liminf_k (1/k) \sum_{i=0}^{k-1} \tau(T^i(x)) > 1/\theta$.
Show that there exists $\theta_0 < \theta$ such that $\#B_n < \theta_0 n$, for every $n$ sufficiently
large. This contradicts the previous conclusion.
1.5.5. Observe that the maps f , f 2 , . . . , f k commute with each other and then use the
Poincaré multiple recurrence theorem.
1.5.6. By definition, the complement of (f1 , . . . , fq )c is an open set. The Birkhoff
multiple recurrence theorem ensures that the non-wandering set is
non-empty.
2.1.6. Consider the image $V_*\mu$ of the measure $\mu$ under $V$. Check that $V_*\mu((a, b]) =
F(b) - F(a)$ for every $a < b$. Consequently, $V_*\mu(\{b\}) = F(b) - \lim_{a \to b} F(a)$.
Therefore, $(-\infty, b]$ is a continuity set for $V_*\mu$ if and only if $b$ is a continuity
point for $F$. Using Theorem 2.1.2, it follows that if $((V_k)_*\mu)_k$ converges to $V_*\mu$
in the weak∗ topology then $(V_k)_k$ converges to $V$ in distribution. Conversely,
if $(V_k)_k$ converges to $V$ in distribution then $(V_k)_*\mu((a, b]) = F_k(b) - F_k(a)$
converges to $F(b) - F(a) = V_*\mu((a, b])$, for any continuity points $a < b$
of $F$. Observing that such intervals $(a, b]$ generate the Borel $\sigma$-algebra
of the real line, conclude that $((V_k)_*\mu)_k$ converges to $V_*\mu$ in the weak∗
topology.
2.1.8. (Billingsley [Bil68]) Use the hypothesis to show that if $(U_n)_n$ is an increasing
sequence of open subsets of $M$ such that $\bigcup_n U_n = M$ then, for every $\varepsilon > 0$ there
exists $n$ such that $\mu(U_n) \ge 1 - \varepsilon$ for every $\mu \in K$. Next, imitate the proof of
Proposition A.3.7.
2.2.2. For the first part of the statement use induction in $q$. The case $q = 1$
corresponds to Theorem 2.1. Consider continuous transformations $f_i : M \to M$,
$1 \le i \le q + 1$ commuting with each other. By the induction hypothesis,
there exists a probability $\nu$ invariant under $f_i$ for $1 \le i \le q$. Define
$$\mu_n = \frac{1}{n} \sum_{j=0}^{n-1} (f_{q+1}^j)_*(\nu).$$
Note that $(f_i)_*\mu_n = \mu_n$ for every $1 \le i \le q$ and every $n$.
Hence, every accumulation point of $(\mu_n)_n$ is invariant under every $f_i$, $1 \le i \le q$.
By compactness, there exists some accumulation point $\mu \in \mathcal{M}_1(M)$. Check
that $\mu$ is invariant under $f_{q+1}$. For the second part, denote by $\mathcal{M}_q \subset \mathcal{M}_1(M)$
the set of probability measures invariant under $f_i$, $1 \le i \le q$. Then, $(\mathcal{M}_q)_q$
is a non-increasing sequence of closed non-empty subsets of $\mathcal{M}_1(M)$. By
compactness, the intersection $\bigcap_q \mathcal{M}_q$ is non-empty.
2.2.6. Define $\mu$ in each iterate $f^j(W)$, $j \in \mathbb{Z}$ by letting $\mu(A) = m(f^{-j}(A))$ for each
measurable set $A \subset f^j(W)$.
2.3.2. Clearly, convergence in norm implies weak convergence. To prove the
converse, assume that $(x^k)_k$ converges to zero in the weak topology but not
in the norm topology. The first condition implies that, for every fixed $N$, the
sum $\sum_{n=0}^{N} |x_n^k|$ converges to zero when $k \to \infty$. The second condition means
that, up to restricting to a subsequence, there exists $\delta > 0$ such that $\|x^k\| > \delta$
for every $k$. Then, there exists some increasing sequence $(l_k)_k$ such that
$$\sum_{n=0}^{l_{k-1}} |x_n^k| \le \frac{1}{k} \quad \text{but} \quad \sum_{n=0}^{l_k} |x_n^k| \ge \|x^k\| - \frac{1}{k} \ge \delta - \frac{1}{k} \quad \text{for every } k.$$
Take $a_n = x_n^k / |x_n^k|$ for each $l_{k-1} < n \le l_k$. Then, for every $k$,
$$\Big| \sum_{n=0}^{\infty} a_n x_n^k \Big| \ge \sum_{l_{k-1} < n \le l_k} |x_n^k| - \sum_{n \le l_{k-1}} |x_n^k| - \sum_{n > l_k} |x_n^k| \ge \|x^k\| - \frac{4}{k} \ge \delta - \frac{4}{k}.$$
This contradicts the hypotheses. Now take $x_n^k = 1$ if $k = n$ and $x_n^k = 0$ otherwise.
Given any $(a_n)_n \in c_0$, we have that $\sum_n a_n x_n^k = a_k$ converges to zero when $k \to \infty$.
Therefore, $(x^k)_k$ converges to zero in the weak∗ topology. But $\|x^k\| = 1$ for
every $k$, hence $(x^k)_k$ does not converge to zero in the norm topology.
2.3.6. Take $W = U(H)^\perp$ and $V = \big(\bigoplus_{n=0}^{\infty} U^n(W)\big)^\perp$.
2.3.7. Suppose that there exist tangent functionals $T_1$ and $T_2$ with $T_1(v) > T_2(v)$ for
some $v \in E$. Show that $\phi(u + tv) + \phi(u - tv) - 2\phi(u) \ge t(T_1(v) - T_2(v))$ for
every $t$ and deduce that $\phi$ is not differentiable in the direction of $v$.
2.4.1. Consider the set P of all probability measures on X × M of the form ν Z ×
η. Note that P is compact in the weak∗ topology and is invariant under the
operator F∗ .
2.4.2. The condition $\hat p \circ g = \hat f \circ \hat p$ entails $\hat f^n \circ \hat p = \hat p \circ g^n$ for every $n \in \mathbb{Z}$. Using
$\pi \circ \hat p = p$, it follows that $\pi \circ \hat f^n \circ \hat p = p \circ g^n$ for every $n \le 0$. Therefore,
$\hat p(y) = \big(p(g^n(y))\big)_{n \le 0}$. This proves the existence and uniqueness of $\hat p$. Now suppose
that $p$ is surjective. The hypotheses of compactness and continuity ensure that
$$\big(g^{-n}(p^{-1}(\{x_n\}))\big)_{n \le 0}$$
is a nested sequence of compact sets, for every $(x_n)_{n \le 0} \in \hat M$. Take $y$ in the
intersection and note that $\hat p(y) = (x_n)_{n \le 0}$.
2.5.2. Fix $q$ and $l$. Assume that for every $n \ge 1$ there exists a partition $\{S_1^n, \dots, S_l^n\}$ of
the set $\{1, \dots, n\}$ such that none of the subsets $S_j^n$ contains an arithmetic progression of
length $q$. Consider the function $\phi_n : \mathbb{N} \to \{1, \dots, l\}$ given by $\phi_n(i) = j$ if $i \in S_j^n$ and
$\phi_n(i) = l$ if $i > n$. Take $(n_k)_k \to \infty$ such that the subsequence $(\phi_{n_k})_k$ converges
at every point to some function $\phi : \mathbb{N} \to \{1, \dots, l\}$. Consider $S_j = \phi^{-1}(j)$ for
$j = 1, \dots, l$. Some $S_j$ contains some arithmetic progression of length $q$. Then
$S_j^{n_k}$ contains that arithmetic progression for every $k$ sufficiently large.
2.5.4. Consider $\Omega = \{1, \dots, l\}^{\mathbb{N}^k}$ with the distance $d(\omega, \omega') = 2^{-N}$ where $N \ge 0$ is
largest such that $\omega(i_1, \dots, i_k) = \omega'(i_1, \dots, i_k)$ for every $i_1, \dots, i_k < N$. Note that
$\Omega$ is a compact metric space. Given $q \ge 1$, let $F_q = \{(a_1, \dots, a_k) : 1 \le a_i \le
q \text{ and } 1 \le i \le k\}$. Let $e_1, \dots, e_m$ be an enumeration of the elements of $F_q$. For
each $j = 1, \dots, m$, consider the shift map $\sigma_j : \Omega \to \Omega$ given by $(\sigma_j \omega)(n) = \omega(n + e_j)$
for $n \in \mathbb{N}^k$. Consider the point $\omega \in \Omega$ defined by $\omega(n) = i \Leftrightarrow n \in S_i$. Let $Z$ be
the closure of $\{\sigma_1^{l_1} \cdots \sigma_m^{l_m}(\omega) : l_1, \dots, l_m \in \mathbb{N}\}$. Note that $Z$ is invariant under the
shift maps $\sigma_j$. By the Birkhoff multiple recurrence theorem, there exist $\zeta \in Z$
and $s \ge 1$ such that $d(\sigma_j^s(\zeta), \zeta) < 1$ for every $j = 1, \dots, m$. Let $e = (1, \dots, 1) \in \mathbb{N}^k$.
Then $\zeta(e) = \zeta(e + se_1) = \cdots = \zeta(e + se_m)$. Consider $\sigma_1^{l_1} \cdots \sigma_m^{l_m}(\omega)$ close
enough to $\zeta$ that $\omega(b) = \omega(b + se_1) = \cdots = \omega(b + se_m)$, where $b = e + l_1 e_1 +$
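(c) Given $\varepsilon > 0$, take $\rho$ as in part (b). For each $n \ge 1$, write $n = s\rho + r$, with
$1 \le r \le \rho$. Then,
$$\frac{1}{n} \sum_{j=1}^{n} \varphi(j) = \frac{\rho}{s\rho + r} \sum_{i=0}^{s-1} \frac{1}{\rho} \sum_{l=i\rho+1}^{(i+1)\rho} \varphi(l) + \frac{1}{n} \sum_{l=s\rho+1}^{s\rho+r} \varphi(l).$$
For $s$ large, the first term on the right-hand side is close to $(1/\rho) \sum_{j=0}^{\rho-1} \varphi(j)$
(by part (b)) and the last term is close to zero (by part (a)). Conclude that the
left-hand side of the identity is a Cauchy sequence. (d) Observe that
$$\Big| \frac{1}{n} \sum_{j=1}^{n} \varphi(x + j) - \frac{1}{n} \sum_{j=1}^{n} \varphi(j) \Big| \le \frac{2|x|}{n} \sup |\varphi|.$$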
Using Lemma 3.2.5, the first inequality shows that $\limsup_{T \to \infty} (1/T)\varphi_T \le \varphi$.
Analogously, using the version of Lemma 3.2.5 for continuous time, the
second inequality above gives that $\liminf_{T \to \infty} (1/T)\varphi_T \ge \varphi$. It also follows that
$\lim_{T \to \infty} (1/T) \int \varphi_T \, d\mu$ coincides with $\lim_n (1/n) \int \varphi_n \, d\mu$. By Theorem 3.3.3,
this last limit is equal to $\int \varphi \, d\mu$.
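3.3.6. Since $\log^+ \phi \in L^1(\mu)$, for every $\varepsilon > 0$ there exists $\delta > 0$ such that $\mu(B) < \delta$
implies $\int_B \log^+ \theta \, d\mu < \varepsilon$. Using that $\log^+ \phi^n \le \sum_{j=0}^{n-1} \log^+ \theta \circ f^j$, one
gets that
$$\mu(E) < \delta \ \Rightarrow \ \frac{1}{n} \int_E \log^+ \phi^n \, d\mu \le \frac{1}{n} \sum_{j=0}^{n-1} \int_{f^{-j}(E)} \log^+ \theta \, d\mu \le \varepsilon.$$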
Next, note that | det Dξ |(y) = |X(y) · (∂/∂t)| = φ(y) for every y ∈ . It follows
that the flux of ν coincides with the measure η = φν . In particular, η is
invariant under the Poincaré map.
4.1.2. Use the theorem of Birkhoff and the dominated convergence theorem.
4.1.8. Assume that $U_f \varphi = \lambda \varphi$. Since $U_f$ is an isometry, $|\lambda| = 1$. If $\lambda^n = 1$ for some $n$
then $\varphi \circ f^n = \varphi$ and, by ergodicity, $\varphi$ is constant almost everywhere. Otherwise,
given any $c \neq 0$, the sets $\varphi^{-1}(\lambda^{-k} c)$, $k \ge 0$ are pairwise disjoint. Since they all
have the same measure, this measure must be zero. Finally, the set $\varphi^{-1}(c)$ is
invariant under $f$ and, consequently, its measure is either zero or total.
4.2.4. Let K be such a set. We may assume that K contains an infinite sequence of
periodic orbits (On )n with period going to infinity. Let Y ⊂ K be the set of
accumulation points of that sequence. Show that Y cannot consist of a single
point. Let $p \neq q$ be periodic points in $Y$ and $z$ be a heteroclinic point, that is,
such that $\sigma^n(z)$ converges to the orbit of $p$ when $n \to -\infty$ and to the orbit of $q$
when $n \to +\infty$. Show that $z \in Y$ and deduce the conclusion of the exercise.
4.2.10. Let $J_k = (0, 1/k)$, for each $k \ge 1$. Check that the continued fraction expansion
of $x$ is of bounded type if and only if there exists $k \ge 1$ such that $G^n(x) \notin J_k$
for every $n$. Observe that $\mu(J_k) > 0$ for every $k$. Deduce that for every $k$ and
$\mu$-almost every $x$ there exists $n \ge 1$ such that $G^n(x) \in J_k$. Conclude that $L$ has
zero Lebesgue measure.
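4.2.11. For each $L \in \mathbb{N}$, consider $\varphi_L(x) = \min\{\varphi(x), L\}$. Then, $\varphi_L \in L^1(\mu)$ and, by
ergodicity, $\tilde\varphi_L = \int \varphi_L \, d\mu$ at $\mu$-almost every point. To conclude, observe that
$\tilde\varphi \ge \tilde\varphi_L$ for every $L$ and $\int \varphi_L \, d\mu \to +\infty$.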
4.3.7. Let $M = \{0, 1\}^{\mathbb{N}}$ and, for each $n$, let $\mu_n$ be the invariant measure supported on
the periodic orbit $\alpha^n = (\alpha_k^n)_k$, with period $2n$, defined by $\alpha_k^n = 0$ if $0 \le k < n$
and $\alpha_k^n = 1$ if $n \le k < 2n$. Show that $(\mu_n)_n$ converges to $(\delta_0 + \delta_1)/2$, where 0
and 1 are the fixed points of the shift map.
4.3.9. (a) Take $k \ge 1$ such that every cylinder of length $k$ has diameter less than $\delta$.
Take $y = (y_j)$ defined by $y_{j + n_i} = x_j^i$ for each $0 \le j < m_i + k$. (b) Take $\delta > 0$ such
that $d(z, w) < \delta$ implies $|\varphi(z) - \varphi(w)| < \varepsilon$ and consider $k \ge 1$ given by part
(a). Choose $m_i$, $i = 1, \dots, s$ such that $m_i / n_s \approx \alpha_i$ for every $i$. Then take $y$ as in
part (a). (c) By the ergodic theorem, $\int \varphi \, d\mu = \int \tilde\varphi \, d\mu$. Take $x^1, \dots, x^s \in \Sigma$ and
$\alpha_1, \dots, \alpha_s$ such that $\int \tilde\varphi \, d\mu \approx \sum_i \alpha_i \tilde\varphi(x^i)$. Note that $\tilde\varphi(y) = \int \varphi \, d\nu_y$, where $\nu_y$ is
the invariant measure supported on the orbit of $y$. Recall Exercise 4.1.1.
4.4.3. On each side of the triangle, consider the foot of the corresponding height, that
is, the orthogonal projection of the opposite vertex. Show that the trajectory
defined by those three points is a periodic orbit of the billiard.
4.4.5. Using (4.4.10) and the twist condition, we get that for each $\theta \in \mathbb{R}$ there
exists exactly one number $\rho_\theta \in (a, b)$ such that $\Theta(\theta, \rho_\theta) = \theta$. The function
$\theta \mapsto \rho_\theta$ is continuous and periodic, with period 1. Consider its graph
$\Gamma = \{(\theta, \rho_\theta) : \theta \in S^1\}$. Every point in $\Gamma \cap f(\Gamma)$ is fixed under $f$: if $(\theta, \rho_\theta) =
f(\gamma, \rho_\gamma) = (\Theta(\gamma, \rho_\gamma), R(\gamma, \rho_\gamma))$ then, since $\Theta(\gamma, \rho_\gamma) = \gamma$, it follows that $\theta = \gamma$
and so $\rho_\theta = \rho_\gamma$. Since $f$ preserves the area measure, none of the connected
components of $A \setminus \Gamma$ may be mapped inside itself. This implies that $f(\Gamma)$
intersects $\Gamma$ at no fewer than two points.
4.4.7. Taking inspiration from Example 4.4.12, show that the billiard map in
extends to a Dehn twist in the annulus A = S1 × [−π/2, π/2], that is, a
homeomorphism f : A → A that coincides with the identity on both boundary
components but is homotopically non-trivial: actually, f admits a lift F : R ×
[−π/2, π/2] → R × [−π/2, π/2] such that F(s, −π/2) = (s − 2π , −π/2) and
F(s, π/2) = (s, π/2) for every s. Consider rational numbers pn /qn ∈ (−2π , 0)
with qn → ∞. Use Exercise 4.4.6 to show that g has periodic points of period
qn . One way to ensure that these periodic points are all distinct is to take the
qn mutually prime.
5.1.7. The statement does not depend on the choice of the ergodic decomposition,
since the latter is essentially unique. Consider the construction in
Exercise 5.1.6. The set $M_0$ is saturated by the partition $W^s$, that is, if $x \in M_0$ then
$W^s(x) \subset M_0$. Moreover, the map $y \mapsto \mu_y$ is constant on each $W^s(x)$. Since
the partition $\mathcal{P}$ is characterized by $\mathcal{P}(x) = \mathcal{P}(y) \Leftrightarrow \mu_x = \mu_y$, it follows that
$\mathcal{P} \prec W^s$ restricted to $M_0$.
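5.2.1. Consider the canonical projections $\pi_{\mathcal{P}} : M \to \mathcal{P}$ and $\pi_{\mathcal{Q}} : M \to \mathcal{Q}$, the
quotient measures $\hat\mu_{\mathcal{P}} = (\pi_{\mathcal{P}})_*\mu$ and $\hat\mu_{\mathcal{Q}} = (\pi_{\mathcal{Q}})_*\mu$ and the disintegrations
$\mu = \int \mu_P \, d\hat\mu_{\mathcal{P}}(P)$ and $\mu = \int \mu_Q \, d\hat\mu_{\mathcal{Q}}(Q)$. Moreover, for each $P \in \mathcal{P}$, consider
$\hat\mu_{P,\mathcal{Q}} = (\pi_{\mathcal{Q}})_*\mu_P$ and the disintegration $\mu_P = \int \mu_{P,Q} \, d\hat\mu_{P,\mathcal{Q}}(Q)$. Observe that
$\int \hat\mu_{P,\mathcal{Q}} \, d\hat\mu_{\mathcal{P}}(P) = \hat\mu_{\mathcal{Q}}$: given any $B \subset \mathcal{Q}$,
$$\int \hat\mu_{P,\mathcal{Q}}(B) \, d\hat\mu_{\mathcal{P}}(P) = \int \mu_P(\pi_{\mathcal{Q}}^{-1}(B)) \, d\hat\mu_{\mathcal{P}}(P) = \mu(\pi_{\mathcal{Q}}^{-1}(B)) = \hat\mu_{\mathcal{Q}}(B).$$
Section 5.2.3. (c) Note that $\mu = \int \mu_P \, d\hat\mu_{\mathcal{P}}(P) = \int\!\!\int \mu_{P,Q} \, d\hat\mu_{P,\mathcal{Q}}(Q) \, d\hat\mu_{\mathcal{P}}(P) =
\int\!\!\int \mu_{\pi(Q),Q} \, d\hat\mu_{P,\mathcal{Q}}(Q) \, d\hat\mu_{\mathcal{P}}(P) = \int \mu_{\pi(Q),Q} \, d\hat\mu_{\mathcal{Q}}(Q)$.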
5.2.2. Argue that the partition Q of the space M1 (M) into points is measurable.
Given any disintegration {μP : P ∈ P}, consider the measurable map M →
M1 (M), x → μP(x) . The pre-image of Q under this map is a measurable
partition. Check that this pre-image coincides with P on a subset with full
measure.
6.1.3. The function ϕ is invariant.
6.2.5. Denote by X the closure of the orbit of x. If X is minimal, for each y ∈ X there
exists n(y) ≥ 1 such that d(f n(y) (y), x) < ε. Then, by continuity, y admits an
open neighborhood V(y) such that d(f n(y) (z), x) < ε for every z ∈ V(y). Take
y1 , . . . , ys such that X ⊂ ⋃i V(yi ) and let m = maxi n(yi ). Given any k ≥ 1, take
i such that f k (x) ∈ V(yi ). Then, writing ni = n(yi ), we get d(f k+ni (x), x) < ε, that
is, k + ni ∈ Rε . This proves that, given any m + 1 consecutive integers, at least
one of them is in Rε . Hence, Rε is syndetic. Conversely, assume that Rε is
syndetic for every ε > 0 and suppose that X is not minimal. Then, there exists a
non-empty, closed invariant set F properly contained in X. Note that x ∉ F and
so, for every ε sufficiently small, there exists an open set U that contains F and
does not intersect B(x, ε). On the other hand, since Rε is syndetic, there exists
m ≥ 1 such that for any k ≥ 1 there exists n ∈ {k, . . . , k + m} satisfying f n (x) ∈
B(x, ε). Take k such that f k (x) ∈ U1 , where U1 = U ∩ f −1 (U) ∩ · · · ∩ f −m (U),
and find a contradiction.
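The first half of the argument can be illustrated numerically. The sketch below is only an example under stated assumptions: it takes the rotation x → x + ω mod Z of the circle with ω = (√5 − 1)/2 (a choice made here, not in the exercise; the closure of every orbit of an irrational rotation is the whole circle, which is minimal) and lists the return times Rε of the point x = 0. The helper names `return_times` and `max_gap` are hypothetical.

```python
import math

OMEGA = (math.sqrt(5) - 1) / 2  # assumed rotation number (irrational)


def circle_dist(s, t):
    """Distance between s and t on the circle R/Z."""
    d = abs(s - t) % 1.0
    return min(d, 1.0 - d)


def return_times(eps, n_max, x=0.0):
    """List the n in {1, ..., n_max} with d(f^n(x), x) < eps."""
    return [n for n in range(1, n_max + 1)
            if circle_dist(x + n * OMEGA, x) < eps]


def max_gap(times):
    """Largest difference between consecutive elements of a sorted list."""
    return max(b - a for a, b in zip(times, times[1:]))


if __name__ == "__main__":
    times = return_times(eps=0.05, n_max=10_000)
    # Bounded gaps between consecutive return times is what "syndetic" means.
    print(times[:8], max_gap(times))
```

In this example the gaps stay bounded as n_max grows, in agreement with the conclusion that Rε is syndetic when the orbit closure is minimal.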
6.2.6. By Exercise 6.2.5, the set Rε = {n ∈ N : d(x, f n (x)) < ε} is syndetic for every
ε > 0. If y is close to x then {n ∈ N : d(f n (x), f n (y)) < ε} contains blocks of
consecutive integers with arbitrary length, no matter the choice of ε > 0. Let U1
be any neighborhood of x. It follows from the previous observations that there
exist infinitely many values of n ∈ N such that f n (x), f n (y) are in U1 . Fix n1 with
this property. Next, consider U2 = U1 ∩ f −n1 (U1 ). By the previous step, there
exists n2 > n1 such that f n2 (x), f n2 (y) ∈ U2 . Continuing in this way, construct
a non-increasing sequence of open sets Uk and an increasing sequence of
natural numbers nk such that f nk (Uk+1 ) ⊂ Uk and f nk (x), f nk (y) ∈ Uk . Check
that f ni1 +···+nik (x) and f ni1 +···+nik (y) are in U1 for any i1 < · · · < ik , k ≥ 1.
6.2.7. Consider the shift map σ : Σ → Σ in Σ = {1, 2, . . . , q}N . The partition N =
S1 ∪ · · · ∪ Sq defines a certain element α = (αn ) ∈ Σ, given by αn = i if and only
if n ∈ Si . Consider β in the closure of the orbit of α such that α and β are near
and the closure of the orbit of β is a minimal set. Apply Exercise 6.2.6 with
x = β, y = α and U = [0; α0 ] to obtain the result.
6.3.6. Write g = (a11 , a12 , a21 , a22 ). Then,
Eg (x11 , x12 , x21 , x22 ) = (a11 x11 +a12 x21 , a11 x12 +a12 x22 , a21 x11 +a22 x21 , a21 x12 +a22 x22 ).
Write the right-hand side as (y11 , y12 , y21 , y22 ). Use the formula of change of
variables, observing that det(y11 , y12 , y21 , y22 ) = (det g) det(x11 , x12 , x21 , x22 ) and
dy11 dy12 dy21 dy22 = (det g)^2 dx11 dx12 dx21 dx22 .
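The factor (det g)^2 in the change of variables is the determinant of the linear map Eg written in the coordinates (x11 , x12 , x21 , x22 ). The following sketch checks this on examples with exact rational arithmetic; the function names are hypothetical and the test matrices are arbitrary choices made for illustration.

```python
from fractions import Fraction


def det(matrix):
    """Determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in matrix]
    n = len(m)
    result = Fraction(1)
    for col in range(n):
        # Find a row with a non-zero entry in this column to use as pivot.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result  # a row swap changes the sign
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return result


def left_multiplication_matrix(a11, a12, a21, a22):
    """Matrix of E_g in the coordinates (x11, x12, x21, x22)."""
    return [
        [a11, 0, a12, 0],
        [0, a11, 0, a12],
        [a21, 0, a22, 0],
        [0, a21, 0, a22],
    ]


if __name__ == "__main__":
    g = (2, 1, 1, 3)
    det_g = g[0] * g[3] - g[1] * g[2]
    print(det(left_multiplication_matrix(*g)), det_g ** 2)
```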
\[
\#\{1 \le n < N^2 : \sqrt{n} \in (\alpha,\beta)\} \le \sum_{k=1}^{N-1} 2k(\beta-\alpha) + (\beta^2-\alpha^2)
\]
and the difference between the term on the right and the one on the left is less
than N. Hence,
\[
\lim \frac{1}{N^2}\, \#\{1 \le n < N^2 : \sqrt{n} \in (\alpha,\beta)\} = \beta-\alpha.
\]
A similar calculation shows that the sequence (log n mod Z)n is not
equidistributed in the circle. [Observation: But it does admit a continuous
(non-constant) limit density. Calculate that density!]
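Both statements can be observed numerically. The sketch below (function names are hypothetical, and membership is read as "the fractional part lies in (α, β)") counts the integers n < N^2 whose square root has fractional part in (α, β), and computes the frequency of the same event for log n at two different cut-offs.

```python
import math


def sqrt_fraction_count(alpha, beta, big_n):
    """Count 1 <= n < big_n**2 with fractional part of sqrt(n) in (alpha, beta)."""
    count = 0
    for n in range(1, big_n * big_n):
        if alpha < math.sqrt(n) % 1.0 < beta:
            count += 1
    return count


def log_fraction_frequency(alpha, beta, n_max):
    """Frequency of 1 <= n <= n_max with fractional part of log(n) in (alpha, beta)."""
    hits = sum(1 for n in range(1, n_max + 1)
               if alpha < math.log(n) % 1.0 < beta)
    return hits / n_max


if __name__ == "__main__":
    big_n = 200
    print(sqrt_fraction_count(0.2, 0.5, big_n) / big_n ** 2)  # close to 0.3
    # For log n the frequencies depend on where the cut-off is placed.
    print(log_fraction_frequency(0.0, 0.5, 1096),
          log_fraction_frequency(0.0, 0.5, 1808))
```

The two cut-offs 1096 and 1808 are approximately e^7 and e^7.5; the corresponding frequencies stay far apart, which is the failure of equidistribution asserted above.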
6.4.3. Define φn = a^n + (−1/a)^n . Check that (φn )n satisfies the Fibonacci
recurrence φn+2 = φn+1 + φn , with φ1 = 1 and φ2 = 3, and, in particular,
φn ∈ N for every n ≥ 1. Now observe that (−1/a)^n converges to
zero. Hence, {n ≥ 1 : a^n mod Z ∈ I} is finite, for any interval I ⊂ S1 whose
closure does not contain zero.
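A quick numerical check, assuming that a is the golden ratio (1 + √5)/2, which is the value for which a − 1/a = 1: the integers φn are generated by the recurrence, and a^n is seen to approach the nearest integer at the rate (1/a)^n . The function names are hypothetical.

```python
import math

GOLDEN = (1 + math.sqrt(5)) / 2  # assumed value of a (positive root of a^2 = a + 1)


def phi_sequence(count):
    """First `count` terms of phi_n = a^n + (-1/a)^n, computed as integers."""
    terms = [1, 3]  # phi_1 = a - 1/a = 1 and phi_2 = a^2 + 1/a^2 = 3
    while len(terms) < count:
        terms.append(terms[-1] + terms[-2])
    return terms[:count]


def distance_to_nearest_integer(x):
    """Distance from the real number x to the set of integers."""
    return abs(x - round(x))


if __name__ == "__main__":
    for n, value in enumerate(phi_sequence(12), start=1):
        print(n, value, distance_to_nearest_integer(GOLDEN ** n))
```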
7.1.1. It is clear that the condition is necessary. To see that it is sufficient: Given
A, consider the closed subspace V of L2 (μ) generated by the functions 1
and Xf −k (A) , k ∈ N. The hypothesis ensures that limn Ufn (XA ) · Xf −k (A) = (XA ·
1)(Xf −k (A) · 1) for every k. Conclude that limn Ufn (XA ) · φ = (XA · 1)(φ · 1) for
every φ ∈ V. Given a measurable set B, write XB = φ + φ ⊥ with φ ∈ V and
φ ⊥ ∈ V ⊥ to conclude that limn Ufn (XA ) · XB = (XA · 1)(XB · 1).
7.1.2. Assuming that E exists, decompose (1/n) ∑_{j=0}^{n−1} |aj | into two terms, one
over j ∈ E and the other over j ∉ E. The hypotheses imply that the two terms
converge to zero. Conversely, assume that (1/n) ∑_{j=0}^{n−1} |aj | converges to zero.
Define Em = {j ≥ 0 : |aj | ≥ (1/m)} for each m ≥ 1. The sequence (Em )m is
increasing and each Em has density zero; in particular, there exists ℓm ≥ 1 such
that (1/n)#(Em ∩ {0, . . . , n − 1}) < (1/m) for every n ≥ ℓm . Choose (ℓm )m
increasing and define E = ⋃m (Em ∩ {ℓm , . . . , ℓm+1 − 1}). For the second part
of the exercise, apply the first part to both sequences, (an )n and (an^2 )n .
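A simple example, chosen here for illustration and not taken from the exercise: the sequence equal to 1 on the perfect squares and 0 elsewhere does not converge to zero, but its averages do, and the exceptional set E may be taken to be the set of perfect squares, which has density zero.

```python
import math


def a(j):
    """Assumed example sequence: 1 on perfect squares, 0 elsewhere."""
    r = math.isqrt(j)
    return 1 if r * r == j else 0


def cesaro_mean(n):
    """(1/n) * sum_{j=0}^{n-1} |a_j|."""
    return sum(abs(a(j)) for j in range(n)) / n


if __name__ == "__main__":
    for n in (100, 10_000, 250_000):
        print(n, cesaro_mean(n))
```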
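7.1.6. (Pollicott and Yuri [PY98]) It is enough to treat the case when ∫ ϕj dμ = 0
for every j. Use induction on the number k of functions. The case k = 1 is
contained in Theorem 3.1.6. Use the inequalities
\[
\frac{1}{N}\sum_{n=1}^{N} a_n \le \frac{1}{N}\sum_{n=1}^{N-m+1}\Bigl(\frac{1}{m}\sum_{j=0}^{m-1} a_{n+j}\Bigr) + \frac{m}{N}\max_{1\le i\le m}|a_i| + \frac{m}{N}\max_{N-m\le i\le N}|a_i|
\]
\[
\Bigl(\frac{1}{N}\sum_{n=1}^{N} b_n\Bigr)^2 \le (1/N)\sum_{n=1}^{N}|b_n|^2
\]
to conclude that ∫ |(1/N) ∑_{n=0}^{N−1} (ϕ1 ◦ f^n ) · · · (ϕk ◦ f^{kn} )|^2 dμ is bounded above by
\[
\frac{1}{N}\sum_{n=1}^{N}\int\Bigl|\frac{1}{m}\sum_{j=0}^{m-1}(\phi_1\circ f^{n+j})\cdots(\phi_k\circ f^{k(n+j)})\Bigr|^2 d\mu + \Bigl(\frac{2m}{N}+\frac{m^2}{N^2}\Bigr)\Bigl(\max_{1\le i\le k}\operatorname{supess}|\phi_i|\Bigr)^2 .
\]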
\[
|L\phi(y_1) - L\phi(y_2)| \le \frac{1}{d}\sum_{j=1}^{d}|\phi(x_{j,1}) - \phi(x_{j,2})| \le K_\theta(\phi)\,\sigma^{-\theta}\,d(y_1,y_2)^\theta .
\]
(b) It follows that ‖Lϕ‖ ≤ sup |ϕ| + σ^{−θ} Kθ (ϕ) ≤ ‖ϕ‖ for every ϕ ∈ E, and
the equality holds if and only if ϕ is constant. Hence, ‖L‖ = 1. (c) Let
Jn = [inf L^n ϕ, sup L^n ϕ]. By part (a), the sequence (Jn )n is decreasing and the
diameter of Jn converges to zero exponentially fast. Take νϕ to be the point in
the intersection and note that ‖L^n ϕ − νϕ ‖ = sup |L^n ϕ − νϕ | + Kθ (L^n ϕ). (d) The
constant functions are eigenvectors of L, associated with the eigenvalue λ = 1.
It follows that νϕ+c = νϕ + c for every ϕ ∈ E and every c ∈ R. Then, H = {ϕ :
νϕ = 0} is a hyperplane of E transverse to the line of constant functions. This
\[
\|U_f^n\phi - \phi\|_2^2 \le \sum_{j=1}^{k}|c_j(\lambda_j^n - 1)|^2 + \sum_{j=k+1}^{\infty} 2|c_j|^2 \le \delta^2\|\phi\|_2^2 + \sum_{j=k+1}^{\infty} 2|c_j|^2 .
\]
Given ε > 0, we may choose δ and k in such a way that each one of the terms
on the right-hand side is less than ε/2.
8.4.3. Let U : H → H be a non-invertible isometry. Recalling Exercise 2.3.6, show
that there exist closed subspaces V and W of H such that U : H → H is unitarily
conjugate to the operator U1 : V ⊕ W N → V ⊕ W N given by U1 | V = U | V and
U1 | W N = id . Let U2 : V ⊕ W Z → V ⊕ W Z be the linear operator defined by
U2 | V = U | V and U2 | W Z = id . Check that U2 is a unitary operator such
that U2 ◦ j = j ◦ U1 , where j : V ⊕ W N → V ⊕ W Z is the natural inclusion.
Show that if E ⊂ V ⊕ W N satisfies the conditions in the definition of Lebesgue
spectrum for U1 then j(E) satisfies those same conditions for U2 . Conclude
that the rank of U1 is well defined.
8.4.6. The lemma of Riemann–Lebesgue ensures that F takes values in c0 . The
operator F is continuous: ‖F(ϕ)‖ ≤ ‖ϕ‖ for every ϕ ∈ L1 (λ). Moreover,
F is injective: if F(ϕ) = 0 then ∫ ϕ(z)ψ(z) dλ(z) = 0 for every linear
combination ψ(z) = ∑_{|j|≤l} aj z^j , aj ∈ C. Given any interval I ⊂ S1 , the
sequence ψN = ∑_{|n|≤N} cn z^n , cn = ∫_I z^{−n} dλ(z), of partial sums of the Fourier
series of the characteristic function XI is bounded (see [Zyg68, page 90]).
Using the dominated convergence theorem, it follows that F(ϕ) = 0 implies
∫_I ϕ(z) dλ(z) = 0, for any interval I. Hence, ϕ = 0. If F were bijective then, by
the open mapping theorem, its inverse would be a continuous linear operator.
Then, there would be c > 0 such that ‖F(ϕ)‖ ≥ c‖ϕ‖ for every ϕ ∈ L1 (λ). But
that is false: consider DN (z) = ∑_{|n|≤N} z^n for N ≥ 0. Check that F(DN ) = (a^N_n )n
with a^N_n = 1 if |n| ≤ N and a^N_n = 0 otherwise. Hence, ‖F(DN )‖ = 1 for every
N. Writing z = e^{2πit} , check that DN (z) = sin((2N + 1)π t)/ sin(π t). Conclude
that ‖DN ‖ = ∫ |DN (z)| dλ(z) converges to infinity when N → ∞. [Observation:
One can also give explicit examples. For instance, if (an )n converges to zero
and satisfies ∑_{n=1}^∞ an /n = ∞ then the sequence (αn )n given by αn = an /(2i)
for n ≥ 1 and αn + α−n = 0 for every n ≥ 0 may not be written in the form
αn = ∫ z^n dν(z). See Section 7.3.4 of Edwards [Edw79].]
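The growth of the L1 norm of DN can be seen numerically. The sketch below approximates ∫ |sin((2N + 1)πt)/ sin(πt)| dt over [0, 1] by a midpoint rule; the function name and the number of sample points are choices made for this illustration, and the output is an approximation, not a proof of divergence.

```python
import math


def dirichlet_l1_norm(big_n, samples=20_000):
    """Midpoint-rule approximation of the integral of |D_N(e^{2 pi i t})| over [0, 1]."""
    total = 0.0
    for k in range(samples):
        # Midpoints avoid t = 0 and t = 1, where sin(pi t) vanishes.
        t = (k + 0.5) / samples
        total += abs(math.sin((2 * big_n + 1) * math.pi * t) / math.sin(math.pi * t))
    return total / samples


if __name__ == "__main__":
    for big_n in (1, 8, 64, 512):
        print(big_n, dirichlet_l1_norm(big_n))
```

The printed values increase slowly with N, consistent with logarithmic growth, while the norm of F(DN ) in c0 stays equal to 1.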
8.5.3. By Exercise 8.5.2, f̃ is always injective. Conclude that if f̃ is surjective then
it is invertible: there exists a homomorphism of measure algebras h : B̃ → B̃
such that h ◦ f̃ = f̃ ◦ h = id . Use Proposition 8.5.6 to find g : M → M such
that g ◦ f = f ◦ g = id at μ-almost every point. The converse is easy: if (f , μ) is
invertible at almost every point then the homomorphism of measure algebras g̃
associated with g = f −1 satisfies g̃ ◦ f̃ = f̃ ◦ g̃ = id ; in particular, f̃ is surjective.
8.5.6. Check that the unions of elements of ⋃n Pn are pre-images, under the inclusion
ι, of an open subset of K. Use that fact to show that if the chains have measure
zero then for each δ > 0 there exists an open set A ⊂ K such that m(A) < δ
and every point outside A is in the image of the inclusion: in other words,
K \ ι(MP ) ⊂ A. Conclude that ι(MP ) is a Lebesgue measurable set and its
complement in K has measure zero. For the converse, use the fact that (a)
implies (c) in Exercise A.1.13.
9.1.1. Hμ (P/R) ≤ Hμ (P ∨ Q/R) = Hμ (Q/R) + Hμ (P/Q ∨ R) ≤ Hμ (Q/R) +
Hμ (P/Q).
9.1.3. Let g = f^k . Then Hμ (⋁_{i=0}^{k−1} f^{−i}(P) / ⋁_{j=k}^{n} f^{−j}(P)) = Hμ (P^k / ⋁_{i=1}^{n−k} g^{−i}(P^k )). By
Lemma 9.1.12, this expression converges to hμ (g, P^k ). Now use Lemma 9.1.13.
9.2.5. Write Q^n = ⋁_{j=0}^{n−1} f^{−j}(Q) for each n and let A be the σ-algebra generated by
⋃n Q^n . Check that f is measurable with respect to the σ-algebra A. Show
\[
h_\nu(g,\mathcal{Q},x) = \lim_k -\frac{1}{k}\log\nu(\mathcal{Q}^k(x)) \quad\text{and}\quad h_\mu(f,\mathcal{P},x) = \lim_k -\frac{1}{n_k}\log\mu(\mathcal{P}^{n_k}(x)).
\]
Deduce the first part of the statement. For the second part, note that if
η is invariant then η(f (A)) = η(f −1 (f (A))) ≥ η(A) for every domain of
invertibility A.
9.7.7. The “if” part of the statement is easy: we may exhibit the ergodic equivalence
explicitly. Assume that the two systems are ergodically equivalent. The fact
that k = l follows from Exercise 8.1.2. To prove that p and q are permutations
of one another, use the fact that the Jacobian is invariant under ergodic
equivalence (Exercise 9.7.6), together with the expressions of the Jacobians
given by Example 9.7.1.
10.1.6. Note that ψ(M) is compact and the inverse ψ −1 : ψ(M) → M is (uniformly)
continuous. Hence, given ε > 0 there exists δ > 0 such that if E ⊂ M is
(n, ε)-separated for f then ψ(E) ⊂ N is (n, δ)-separated for g. Conclude that
s(f , ε, M) ≤ s(g, δ, N) and deduce that h(f ) ≤ h(g). [Observation: The statement
remains valid in the non-compact case, as long as we assume the inverse
ψ −1 : ψ(M) → M to be uniformly continuous.]
For the second part, consider the distance defined in [0, 1]Z by
\[
d\bigl((x_n)_n,(y_n)_n\bigr) = \sum_{n\in\mathbb{Z}} 2^{-|n|}\,|x_n-y_n| .
\]
Consider a discrete set A ⊂ [0, 1] with n elements. Check that the restriction to
AZ of the distance of [0, 1]Z is uniformly equivalent to the distance defined in
(9.2.15). Using Example 10.1.2, conclude that the topological entropy of σ is
at least log n, for any n.
10.1.10. (Carlos Gustavo Moreira) Let θ1 = 0, θ2 = 01 and, for n ≥ 2, θn+1 = θn θn−1 . We
claim that, for every n ≥ 1, there exists a word τn such that θn θn+1 = τn αn and
θn+1 θn = τn βn , where αn = 10 and βn = 01 if n is even and αn = 01 and βn = 10
if n is odd. That holds for n = 1 with τ1 = 0 and for n = 2 with τ2 = 010. If it
holds for a given n, then θn+1 θn+2 = θn+1 θn+1 θn = θn+1 τn βn = τn+1 αn+1 and also
θn+2 θn+1 = θn+1 θn θn+1 = θn+1 τn αn = τn+1 βn+1 , as long as we take τn+1 = θn+1 τn .
This proves the claim. It follows that the last letters of θn and θn+1 are distinct.
Now, we claim that θ = limn θn is not pre-periodic. Indeed, suppose that θ
were pre-periodic and let m be its period. Since the length of θn is Fn+1 (where
Fk is the k-th Fibonacci number), we may take n large such that m divides
Fn+1 and the pre-period (that is, the length of the non-periodic part) of θ is less
than Fn+2 . Then, θ starts with θn+3 = θn+2 θn+1 = θn+1 θn θn+1 . However, since
the length Fn+1 of θn is a multiple of the period m, the Fn+2 -th letter of θ ,
which is the last letter of θn+1 , must coincide with the (Fn+2 + Fn+1 )-th letter
of θ , which is the last letter of θn . This would contradict the conclusion of the
previous paragraph.
Next, we claim that ck+1 (θ ) > ck (θ ) for every k. Indeed, suppose that
ck+1 (θ ) = ck (θ ) for some k. Then, every subword of length k can have only
one continuation of length k + 1. Hence, we have a transformation in the set
of subwords of length k, assigning to each subword its unique continuation,
without the first letter. Since the domain is finite, all the orbits of this
transformation are pre-periodic. In particular, θ is also pre-periodic, which
contradicts the conclusion in the previous paragraph.
Since c1 (θ ) = 2, it follows that ck (θ ) ≥ k + 1 for every k. We claim that
cFn+1 (θ ) ≤ Fn+1 + 1 for every n > 1. To prove that fact, note that θ may
be written as a concatenation of words belonging to {θn , θn+1 } because (by
induction) every θr with r ≥ n may be written as a concatenation of words
belonging to {θn , θn+1 }. Thus, any subword of θ of length Fn+1 (which is the
length of θn ) is a subword of θn θn+1 or θn+1 θn . Since θn θn+1 = θn θn θn−1 is a
subword of θn θn θn−1 θn−2 = θn θn θn , there are at most |θn | = Fn+1 subwords of
length |θn | = Fn+1 of θn θn θn and, hence, of θn θn+1 . Since θn θn+1 = τn αn and
θn+1 θn = τn βn , and θn+1 θn ends with θn and |βn | = 2, the unique subword
of θn+1 θn of length |θn | = Fn+1 that may not be a subword of θn θn+1 is the
subword that ends with the first letter of βn (that is, one position before the end
of θn+1 θn ). Hence, cFn+1 (θ ) ≤ Fn+1 + 1 as stated.
We are ready to obtain the statement of the exercise. Assume that ck (θ ) >
k+1 for some k. Taking n such that Fn+1 > k, we would have cFn+1 (θ )−ck (θ ) <
Fn+1 + 1 − (k + 1) = Fn+1 − k and that would imply that cm+1 (θ ) ≤ cm (θ )
for some m with k ≤ m < Fn+1 . This would contradict the conclusion in the
previous paragraph.
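The claims in this solution are easy to test on a computer. The following sketch builds the words θn from the recursion above and counts subwords of a long prefix of θ; the function names are hypothetical, and using the finite word θ20 in place of θ is an assumption that is harmless for the short lengths tested here.

```python
def fibonacci_word(n):
    """theta_n, with theta_1 = '0', theta_2 = '01', theta_{n+1} = theta_n theta_{n-1}."""
    previous, current = "0", "01"
    if n == 1:
        return previous
    for _ in range(n - 2):
        previous, current = current, current + previous
    return current


def complexity(word, k):
    """Number of distinct subwords of length k occurring in `word`."""
    return len({word[i:i + k] for i in range(len(word) - k + 1)})


if __name__ == "__main__":
    theta = fibonacci_word(20)
    print([complexity(theta, k) for k in range(1, 11)])
    # theta_n theta_{n+1} and theta_{n+1} theta_n differ only in the last two letters.
    for n in range(1, 6):
        left = fibonacci_word(n) + fibonacci_word(n + 1)
        right = fibonacci_word(n + 1) + fibonacci_word(n)
        print(n, left[-2:], right[-2:], left[:-2] == right[:-2])
```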
10.2.4. By Proposition 10.2.1, h(f ) = g(f , δ, M) whenever f is ε-expansive and δ < ε/2.
Show that if d(f , h) < δ/3 then g(h, δ/3, M) ≤ g(f , δ, M). Deduce that if (fk )k
converges to f then lim supk h(fk ) = lim supk g(fk , δ/3, M) ≤ g(f , δ, M) = h(f ).
10.2.8. (Bowen [Bow72]) Write a = g∗ (f , ε). Observe that if E is an (n, δ)-generating
set of M, with δ < ε, then M = ⋃_{x∈E} B(x, n, ε). Combining this fact with the
result of Bowen, show that gn (f , δ, M) ≤ #E e^{c+(a+b)n} . Take b → 0 to conclude
the inequality.
10.3.3. (a) The hypothesis implies that for every n and every subcover δ of β^n
there exists a subcover γ of α^n such that γ ≺ δ. Taking γ minimal, #γ ≤
#δ and ∑_{U∈γ} inf_{x∈U} e^{φn (x)} ≤ ∑_{V∈δ} inf_{y∈V} e^{φn (y)} . It follows that Qn (f , φ, α) ≤
Qn (f , φ, β) for every n. (b) Lemma 10.1.11 gives that α^{n+k−1} is a subcover
of (α^k )^n . A variation of the argument in part (a) gives that Qn (f , φ, α^k ) ≤
e^{(k−1) sup |φ|} Q_{n+k−1} (f , φ, α) for every n. Hence, Q± (f , φ, α^k ) ≤ Q± (f , φ, α).
[Observation: Analogously, P(f , φ, α^k ) ≤ P(f , φ, α).] By the second part of
Lemma 10.1.11, for every subcover β of (α^k )^n there exists a subcover
γ of α^{n+k−1} such that γ ≺ β, #γ ≤ #β and
\[
\sum_{U\in\gamma}\inf_{x\in U} e^{\varphi_{n+k-1}(x)} \le e^{(k-1)\sup|\varphi|}\sum_{V\in\beta}\inf_{y\in V} e^{\varphi_n(y)}
\]
(taking γ minimal). Deduce that Q_{n+k−1} (f , φ, α) ≤ e^{(k−1) sup |φ|} Qn (f , φ, α^k ).
Hence, Q± (f , φ, α) ≤ Q± (f , φ, α^k ). (c) Follows from
part (b) and Corollary 10.3.3. (d) If the elements of α are disjoint then
(α^k )^n = α^{n+k−1} and so
\[
P_n(f,\varphi,\alpha^k) = \inf\Bigl\{\sum_{U\in\gamma}\sup_{x\in U} e^{\varphi_n(x)} : \gamma\subset(\alpha^k)^n\Bigr\} = \inf\Bigl\{\sum_{U\in\gamma}\sup_{x\in U} e^{\varphi_n(x)} : \gamma\subset\alpha^{n+k-1}\Bigr\}.
\]
The second part of the statement is an immediate consequence of the first one.
10.4.4. Consider the shift map σ in the space Σ = {0, 1}N . Consider the function φ :
Σ → R defined by φ(x) = 0 if x0 = 0 and φ(x) = 1 if x0 = 1. Let N be the set
of points x ∈ Σ such that the time average of φ along the orbit of x does not
converge. Check that N is invariant under σ and is non-empty: for each finite
sequence (z0 , . . . , zk ) one can find x ∈ N with xi = zi for i = 0, . . . , k. Deduce that
the topological entropy of the restriction σ | N is equal to log 2. Justify that N
does not support any probability measure invariant under σ .
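One concrete way to produce a point of N, given here as an assumption for illustration and not as the construction intended in the exercise, is to concatenate blocks of 0s and 1s whose lengths double at each step. The time averages of φ then oscillate between values near 1/3 and values near 2/3, and prepending any finite word (z0 , . . . , zk ) does not change this.

```python
def block_sequence(num_blocks):
    """Concatenate blocks of lengths 1, 2, 4, ... alternating the symbols 0 and 1."""
    symbols = []
    for k in range(num_blocks):
        symbols.extend([k % 2] * (2 ** k))
    return symbols


def time_average(symbols, n):
    """(1/n) * sum_{j<n} phi(sigma^j(x)), where phi(x) = x_0."""
    return sum(symbols[:n]) / n


if __name__ == "__main__":
    x = block_sequence(16)
    for k in range(10, 16):
        n = 2 ** (k + 1) - 1  # position of the end of block k
        print(k, time_average(x, n))
```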
10.4.5. Consider the open cover ξ of K whose elements are K ∩ [0, α] and K ∩ [1 −
β, 1]. Check that P(f , φ) = P(f , φ, ξ ) for every potential φ. Moreover,
\[
P_n(f, -t\log g', \xi) = \sum_{U\in\xi^n} [(g^n)']^{-t}(U) = (\alpha^t+\beta^t)^n .
\]
Conclude that ψ(t) = log(α^t + β^t ). Check that ψ′ < 0 and ψ′′ > 0 (convexity
also follows from Proposition 10.3.7). Moreover, ψ(0) > 0 > ψ(1). By the
variational principle, the last inequality implies that hμ (f ) − ∫ log g′ dμ < 0.
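Since ψ is strictly decreasing with ψ(0) > 0 > ψ(1), it has exactly one zero in (0, 1). The sketch below locates it by bisection, under the assumption that α, β > 0 and α + β < 1 (which is what ψ(1) < 0 amounts to); the function names are hypothetical.

```python
import math


def psi(t, alpha, beta):
    """psi(t) = log(alpha^t + beta^t)."""
    return math.log(alpha ** t + beta ** t)


def zero_of_psi(alpha, beta, tol=1e-12):
    """Bisection for the zero of psi in (0, 1); assumes alpha, beta > 0, alpha + beta < 1."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # psi is decreasing, so the zero lies to the right of any point where psi > 0.
        if psi(mid, alpha, beta) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    print(psi(0.0, 0.3, 0.4), psi(1.0, 0.3, 0.4))
    print(zero_of_psi(1 / 3, 1 / 3), math.log(2) / math.log(3))
```

For α = β = 1/3 the zero is log 2/ log 3, since ψ(t) = log 2 − t log 3 in that case.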
10.5.3. The Gibbs property gives that limn (1/n) log μ(Cn (x)) = ϕ̃(x) − P, where
Cn (x) is the cylinder of length n that contains x. Combine this identity with
the theorem of Brin–Katok (Theorem 9.3.3) and the theorem of Birkhoff to
get the first claim. Now assume that μ1 and μ2 are two ergodic Gibbs states
with the same constant P. Observe that there exists C such that C−1 μ1 (A) ≤
μ2 (A) ≤ Cμ1 (A) for every A in the algebra formed by the finite disjoint unions
of cylinders. Using the monotone class theorem (Theorem A.1.18), deduce
that C−1 μ1 (A) ≤ μ2 (A) ≤ Cμ1 (A) for any measurable set A. This implies
that μ1 and μ2 are equivalent measures. Using Lemma 4.3.1, it follows that
μ1 = μ2 .
10.5.5. By Proposition 10.3.7, the pressure function is convex. By Exercise A.5.1, it
follows that it is also continuous. By the smoothness theorem of Mazur (recall
Exercise 2.3.7), there exists a residual subset R ⊂ C0 (M) such that the pressure
function is differentiable at every ϕ ∈ R. Apply Exercise 10.5.4.
11.1.3. Adapt the arguments in Section 9.4.2, as follows. Start by checking that the
iterates of f have bounded distortion: there exists K > 1 such that
\[
\frac{1}{K} \le \frac{|Df^n(x)|}{|Df^n(y)|} \le K,
\]
for every n ≥ 1 and any points x, y with P^n (x) = P^n (y). Consider the
sequence μn = (1/n) ∑_{j=0}^{n−1} f∗^j m of averages of the iterates of the Lebesgue
measure m. Show that the Radon–Nikodym derivatives dμn /dm are uniformly
bounded and are Hölder, with uniform Hölder constants. Deduce that every
accumulation point μ of that sequence is an invariant probability measure
absolutely continuous with respect to the Lebesgue measure. Show that the
Radon–Nikodym derivative ρ = dμ/dm is bounded from zero and infinity (in
other words, log ρ is bounded). Show that ρ and log ρ are Hölder.
11.1.5. Check that Jμ f = (ρ ◦ f )|f′ |/ρ and use the Rokhlin formula (Theorem 9.7.3).
11.2.4. Take Λ = {2^{−n} : n ≥ 0} mod Z. The restriction f : Λ → Λ cannot be an
expanding map because 1/2 is an isolated point in Λ but 1 = f (1/2) is not.
[Observation: Note that Λ = S1 \ ⋃_{n=0}^∞ f^{−n}(I), where I = (1/2, 1) mod Z.
Modifying suitably the choice of I, one finds many other examples, possibly
with Λ uncountable.]
11.3.3. Let a = ∫ ϕ dμ1 and b = ∫ ϕ dμ2 . Assume that a < b and write r = (b − a)/5.
By the ergodic decomposition theorem, we may assume that μ1 and μ2 are
ergodic. Then, there exist x1 and x2 such that ϕ̃(x1 ) = a and ϕ̃(x2 ) = b. Using
the hypothesis that f is topologically exact, construct a pseudo-orbit (zn )n≥0
alternating (long) segments of the orbits of x1 and x2 in such a way that the
sequence of time averages of ϕ along the pseudo-orbit (zn )n oscillates from
a + r to b − r (meaning that lim inf ≤ a + r and lim sup ≥ b − r). Next, use the
shadowing lemma to find x ∈ M whose orbit shadows this pseudo-orbit. Using
that ϕ is uniformly continuous, conclude that the sequence of time averages of
ϕ along the orbit of x oscillates from a + 2r to b − 2r.
12.1.2. Theorem 2.1.5 gives that M1 (M) is weak∗ compact and it is clear that it
is convex. Check that the operator L : C0 (M) → C0 (M) is continuous and
deduce that its dual L∗ : M(M) → M(M) is also continuous. If (ηn )n → η in
the weak∗ topology then (∫ L1 dηn )n → ∫ L1 dη. Conclude that the operator
G : M1 (M) → M1 (M) is continuous. Hence, by the Tychonoff–Schauder
theorem, G has some fixed point ν. This means that L∗ ν = λν, where
λ = ∫ L1 dν. Since λ > 0, this proves that ν is a reference measure. Using
Corollary 12.1.9, check that λ = lim supn ‖L^n 1‖^{1/n} and deduce that λ is the
spectral radius of L.
12.2.4. Fix in S1 the orientation induced by R. Consider the fixed point p0 = 0 of f and
let p1 , . . . , pd be its pre-images, ordered cyclically, with pd = p0 . Analogously,
let q0 be a fixed point of g and q1 , . . . , qd be its pre-images, ordered cyclically,
with qd = q0 . Note that f maps each [pi−1 , pi ] and g maps each [qi−1 , qi ] onto
S1 . Then, for each sequence (in )n ∈ {1, . . . , d}N there exists exactly one point
x ∈ S1 and one point y ∈ S1 such that f n (x) ∈ [pin −1 , pin ] and gn (y) ∈ [qin −1 , qin ]
for every n. Clearly, the maps (in )n → x and (in )n → y are surjective. Consider
two sequences (in )n and (jn )n to be equivalent if there exists N ∈ N ∪ {∞}
such that (1) in = jn for every n ≤ N and either (2a) in = 1 and jn = d for
every n > N or (2b) in = d and jn = 1 for every n > N. Show that the points x
corresponding to (in )n and (jn )n coincide if and only if the two sequences are
equivalent and a similar fact holds for the points y corresponding to the two
sequences. Conclude that the map φ : x → y is well defined and is a bijection
in S1 such that φ(f (x)) = g(φ(x)) for every x. Observe that φ preserves the
orientation of S1 and, thus, is a homeomorphism.
12.2.5. (a) ⇒ (b): Trivial. (b) ⇒ (c): Let μa be the absolutely continuous invariant
probability measure and μm be the measure of maximum entropy of f ; let νa
and νm be the corresponding measures for g. Show that μa = μm . Let φ : S1 →
S1 be a topological conjugacy. Show that νm = φ∗ μm and νa = φ∗ μa if φ is
absolutely continuous. Use Corollary 12.2.4 to conclude that in the latter case
|(g^n )′ (x)| = k^n for every x ∈ Fix(f^n ). (c) ⇒ (a): The hypothesis implies that νa =
νm and so νa = φ∗ μa . Recall (Proposition 12.1.20) that the densities dμa /dm
and dνa /dm are continuous and bounded from zero and infinity. Conclude that
φ is differentiable, with φ′ = (dμa /dm) / ((dνa /dm) ◦ φ).
12.3.2. Consider A = (a, 1), P = (p, 1), Q = (q, 1), B = (b, 1), O = (0, 0) ∈ C × R. Let
A′ (respectively, B′ ) be the point where the line parallel to OQ (respectively,
OP) passing through P (respectively, Q) intersects the boundary of C. Note
that all these points belong to the plane determined by P, Q and O; note also
that A′ ∈ OA and B′ ∈ OB. By definition, α(P, Q) = |B′ Q|/|OP| and β(P, Q) =
|OQ|/|A′ P|. Check that |AP|/|AQ| = |A′ P|/|OQ| and |BQ|/|BP| = |B′ Q|/|OP|.
Hence,
\[
\theta(P,Q) = \log\frac{\beta(P,Q)}{\alpha(P,Q)} = \log\frac{|AQ|\,|BP|}{|AP|\,|BQ|}.
\]
In other words, d(p, q) = log(|aq| |bp|)/(|ap| |bq|) = (p, q), for any p, q ∈ D.
12.3.4. Consider the cone C0 of positive continuous functions in M. The corresponding
projective distance θ0 is given in Example 12.3.5. Check that θ1 is the
restriction of θ0 to the cone C1 . Consider a sequence of positive differentiable
functions converging uniformly to a (continuous but) non-differentiable
function g0 . Show that (gn )n converges to g0 with respect to the distance θ0
and, thus, is a Cauchy sequence for θ0 and θ1 . Argue that (gn )n cannot be
convergent for θ1 .
12.3.8. (a) It is clear that log g is (b, β)-Hölder and sup g/ inf g is close to 1 if the norm
vβ,ρ is small; this will be implicit in all that follows. Then, g ∈ C(b, β, R). To
estimate θ (1, g), use the expression given by Lemma 12.3.8. Observe that
\[
\beta(1,g) = \sup\Bigl\{ g(x),\ \frac{\exp(b\delta)g(x)-g(y)}{\exp(b\delta)-1} : x\ne y,\ d(x,y)<\rho \Bigr\} \quad\text{where } \delta = d(x,y)^\beta .
\]
For the lower estimate, let us prove that ν satisfies the hypothesis of the
mass distribution principle (Exercise 12.4.4). Given any U with diam U <
c min{diam D1 , . . . , diam DN }, there exist n ≥ 1 and in such that Din intersects U
and c diam Din > diam U. By (ii), we have that ν(U) ≤ ν(Din ) ≤ c^{−2} (diam Din )^{d0} .
Take n maximum. Then, using (iii), diam U ≥ c diam Din+1 ≥ c^2 diam Din
for some choice of in . Combining the two inequalities, we get ν(U) ≤
c^{−2−2d0} (diam U)^{d0} . Then, by the mass distribution principle, md0 () ≥ c^{2+2d0} .
Finally, extend these arguments to any dimension ≥ 1.
A.1.9. Given A1 ⊃ · · · ⊃ Ai ⊃ · · · , take A = ⋂_{i=1}^∞ Ai . For j ≥ 1, consider A′j = Aj \ A.
By Theorem A.1.14, we have that μ(A′j ) → 0 and so μ(Aj ) → μ(A). Given
A1 ⊂ · · · ⊂ Ai ⊂ · · · , take A = ⋃_{i=1}^∞ Ai . For each j, consider A′j = A \ Aj . By
Theorem A.1.14, we have that μ(A′j ) → 0, that is, μ(Aj ) → μ(A).
A.1.13. (Royden [Roy63]) (b) ⇒ (a) Assume that there exist Borel sets B1 , B2 such that
B1 ⊂ E ⊂ B2 and m(B2 \ B1 ) = 0. Deduce that m∗ (E \ B1 ) = 0, hence E \ B1 is
a Lebesgue measurable set. Conclude that E is a Lebesgue measurable set. (a)
⇒ (c) Let E be a Lebesgue measurable set such that m∗ (E) < ∞. Given ε > 0,
there exists a cover of E by open rectangles (Rk )k such that ∑k m∗ (Rk ) < m∗ (E) + ε.
Then, A = ⋃k Rk is an open set containing E and such that m∗ (A) − m∗ (E) < ε.
Using that E is a Lebesgue measurable set, deduce that m∗ (A \ E) < ε. For
the general case, write E as a disjoint union of Lebesgue measurable sets with
finite exterior measure. (c) ⇔ (d) It is clear that E is a Lebesgue measurable
set if and only if its complement is. (c) and (d) ⇒ (b) For each k ≥ 1, consider
a closed set Fk ⊂ E and an open set Ak ⊃ E such that m∗ (E \ Fk ) and m∗ (Ak \ E)
are less than 1/k. Then, B1 = ⋃k Fk and B2 = ⋂k Ak are Borel sets such that
B1 ⊂ E ⊂ B2 and m∗ (E \ B1 ) = m∗ (B2 \ E) = 0. Conclude that m(B2 \ B1 ) =
m∗ (B2 \ B1 ) = 0.
A.1.18. Show that x → (1/n) #{0 ≤ j ≤ n − 1 : aj = 5} is a simple function for each n ≥ 1.
By Proposition A.1.31, it follows that ω5 is measurable.
A.2.8. (a) Assume that F is uniformly integrable. Consider C > 0 corresponding to
α = 1 and take L = C + 1. Check that ∫ |f | dμ < L for every f ∈ F . Given
ε > 0, consider C > 0 corresponding to α = ε/2 and take δ = ε/(2C). Check
that ∫_A |f | dμ < ε for every f ∈ F and every set A with μ(A) < δ. Conversely,
given α > 0, take δ > 0 corresponding to ε = α and let C = L/δ. Show that
∫_{|f |>C} |f | dμ < α. (b) Applying Exercise A.2.5 to the function |g|, show that F
satisfies the criterion in (a). (c) Let us prove three facts about f = limn fn . (i) f is
finite at almost every point: Consider L as in (a). Note that μ({x : |fn (x)| ≥ k}) ≤
L/k for every n, k ≥ 1 (Exercise A.2.4) and deduce that μ({x : |f (x)| ≥ k}) ≤ L/k
for every k ≥ 1. (ii) f is integrable: Fix K > 0. Given any ε > 0, take δ as in
(a). Take n sufficiently large that μ({x : |fn (x) − f (x)| > ε}) < δ. Note that
\[
\int_{|f|\le K}|f|\,d\mu \le \int_{|f_n-f|\le\varepsilon}|f|\,d\mu + \int_{|f|\le K,\ |f_n-f|>\varepsilon}|f|\,d\mu \le (L+\varepsilon) + K\delta .
\]
φ ∈ Lp (μ)∗ there exists g ∈ Lq (μ) such that φ = (g) and ‖g‖q = ‖φ‖. For
each measurable set B ⊂ M, define η(B) = φ(XB ). Check that η is a complex
measure (to prove σ-additivity one needs p < ∞) and observe that η ≪ μ.
Consider the Radon–Nikodym derivative g = (dη/dμ). Then, φ(XB ) = ∫_B g dμ for
every B; conclude that φ(f ) = ∫ fg dμ for every f ∈ L∞ (μ). In the case p = 1,
this construction yields |∫_B g dμ| ≤ ‖φ‖ μ(B) for every measurable set. Deduce
that ‖g‖∞ ≤ ‖φ‖. Now suppose that 1 < p < ∞. Take fn = XBn β|g|^{q−1} , where
Bn = {x : |g(x)| ≤ n}. Observe that fn ∈ L∞ (μ) and |fn |^p = |g|^q in the set Bn and
\[
\int_{B_n}|g|^q\,d\mu = \int f_n g\,d\mu = \varphi(f_n) \le \|\varphi\|\Bigl(\int |f_n|^p\,d\mu\Bigr)^{1/p} \le \|\varphi\|\Bigl(\int_{B_n}|g|^q\,d\mu\Bigr)^{1/p} .
\]
This yields ∫_{Bn} |g|^q dμ ≤ ‖φ‖^q for every n and, thus, ‖g‖q ≤ ‖φ‖. Finally,
φ(f ) = ∫ fg dμ for every f ∈ Lp (μ), since the two sides are continuous
functionals and they coincide on the dense subset L∞ .
A.6.5. By definition, u · Lv = L∗ u · v and u · L∗ v = (L∗ )∗ u · v for any u and v. Hence,
v · (L∗ )∗ u = L∗ v · u for any u and v. Reversing the roles of u and v, we see that
L = (L∗ )∗ . Note that |L∗ u · v| ≤ ‖L‖ ‖u‖ ‖v‖ for every u and v. Taking v = L∗ u,
it follows that ‖L∗ u‖ ≤ ‖L‖ ‖u‖ for every u and so ‖L∗ ‖ ≤ ‖L‖. Since L = (L∗ )∗ ,
it follows that ‖L‖ ≤ ‖L∗ ‖, hence the two norms coincide. Since the operator
norm is submultiplicative, ‖L∗ L‖ ≤ ‖L‖^2 . On the other hand, u · L∗ Lu = ‖Lu‖^2
and so ‖L∗ L‖ ‖u‖^2 ≥ ‖Lu‖^2 , for every u. Deduce that ‖L∗ L‖ ≥ ‖L‖^2 and so the
two expressions coincide. Analogously, ‖LL∗ ‖ = ‖L‖^2 .
A.6.8. Assume that u ∈ H and (un )n is a sequence in E such that un · v → u · v for
every v ∈ H. Considering v ∈ E⊥ , conclude that u ∈ (E⊥ )⊥ . By Exercise A.6.7,
it follows that u ∈ E. Therefore, E is closed in the weak topology. Now consider
any sequence (vn )n in U(E) converging to some v ∈ H. For each n, take
un = h−1 (vn ) ∈ E. Since h is an isometry, ‖um − un ‖ = ‖vm − vn ‖ for any m, n.
It follows that (un )n is a Cauchy sequence in E and so it admits a limit u ∈ E.
Hence, v = h(u) is in U(E).
A.7.1. The inverse of T + H is given by the equation (T + H)(T^{−1} + J) = id , which
may be rewritten as a fixed point equation J = −T^{−1} HT^{−1} − T^{−1} HJ. Use the
hypothesis to show that this equation admits a (unique) solution. Hence, T + H
is an isomorphism. Deduce that L − λid is an isomorphism whenever |λ| > ‖L‖.
Therefore, the spectrum of L is contained in the disk of radius ‖L‖. It also
follows from the previous observation that if L − λid is an isomorphism then the
same is true for L − λ′ id if λ′ is sufficiently close to λ.
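The fixed point equation J = −T^{−1} HT^{−1} − T^{−1} HJ, obtained by expanding (T + H)(T^{−1} + J) = id, can be solved by iteration when T^{−1} H is a contraction. The sketch below does this for 2 × 2 real matrices; the matrices T and H and all helper names are choices made for this illustration only.

```python
def mat_mul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def mat_add(a, b):
    """Sum of two 2x2 matrices."""
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]


def mat_neg(a):
    """Negative of a 2x2 matrix."""
    return [[-a[i][j] for j in range(2)] for i in range(2)]


def inverse(a):
    """Inverse of an invertible 2x2 matrix."""
    d = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / d, -a[0][1] / d], [-a[1][0] / d, a[0][0] / d]]


def solve_for_j(t, h, iterations=200):
    """Iterate J -> -T^{-1} H T^{-1} - T^{-1} H J, starting from J = 0."""
    t_inv = inverse(t)
    t_inv_h = mat_mul(t_inv, h)
    constant = mat_neg(mat_mul(t_inv_h, t_inv))
    j = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(iterations):
        j = mat_add(constant, mat_neg(mat_mul(t_inv_h, j)))
    return j


if __name__ == "__main__":
    t = [[2.0, 1.0], [0.0, 3.0]]    # assumed isomorphism T
    h = [[0.1, -0.2], [0.3, 0.1]]   # assumed small perturbation H
    candidate = mat_add(inverse(t), solve_for_j(t, h))
    print(mat_mul(mat_add(t, h), candidate))  # close to the identity matrix
```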
A.7.4. (a) Observe that L − λid = ∫ (z − λ) dE(z) and use Lemma A.7.4. By the
continuity from below property (Exercise A.1.9), E({λ}) = limn E({z : |z − λ| ≤
1/n}). It follows that E({λ})v = v. (b) It follows from Exercise A.7.3 that
E(B)E({λ}) = E({λ}) if λ ∈ B and E(B)E({λ}) = E(∅) = 0 otherwise. Since
L = ∫ z dE(z), we get that Lv = λE({λ})v = λv.
References