Heights in Diophantine Geometry - E. Bombieri, W. Gubler (Cambridge, 2006)
The first half of the book is devoted to the general theory of heights and its applications,
including a complete, detailed proof of the celebrated subspace theorem of W. M. Schmidt.
The second part deals with abelian varieties, the Mordell–Weil theorem and Faltings’s
proof of the Mordell conjecture, ending with a self-contained exposition of Nevanlinna
theory and the related famous conjectures of Vojta. The book concludes with a
comprehensive list of references. It is destined to be a definitive reference book on modern
diophantine geometry, bringing a new standard of rigor and elegance to the field.
Editorial Board
Enrico Bombieri
Institute of Advanced Study, Princeton
Walter Gubler
University of Dortmund
Cambridge University Press
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo
Cambridge University Press has no responsibility for the persistence or accuracy of urls
for external or third-party internet websites referred to in this publication, and does not
guarantee that any content on such websites is, or will remain, accurate or appropriate.
Contents
Preface page xi
Terminology xv
1. Heights 1
1.1. Introduction 1
1.2. Absolute values 1
1.3. Finite-dimensional extensions 5
1.4. The product formula 9
1.5. Heights in projective and affine space 15
1.6. Heights of polynomials 21
1.7. Lower bounds for norms of products of polynomials 29
1.8. Bibliographical notes 33
2. Weil heights 34
2.1. Introduction 34
2.2. Local heights 35
2.3. Global heights 39
2.4. Weil heights 42
2.5. Explicit bounds for Weil heights 45
2.6. Bounded subsets 54
2.7. Metrized line bundles and local heights 57
2.8. Heights on Grassmannians 66
2.9. Siegel’s lemma 72
2.10. Bibliographical notes 80
3. Linear tori 82
3.1. Introduction 82
3.2. Subgroups and lattices 82
3.3. Subvarieties and maximal subgroups 88
3.4. Bibliographical notes 92
4. Small points 93
4.1. Introduction 93
4.2. Zhang’s theorem 93
4.3. The equidistribution theorem 101
4.4. Dobrowolski’s theorem 107
4.5. Remarks on the Northcott property 117
4.6. Remarks on the Bogomolov property 120
4.7. Bibliographical notes 123
8.5. The theorem of the square and the dual abelian variety 252
8.6. The theorem of the cube 257
8.7. The isogeny multiplication by n 263
8.8. Characterization of odd elements in the Picard group 265
8.9. Decomposition into simple abelian varieties 267
8.10. Curves and Jacobians 268
8.11. Bibliographical notes 282
References 620
Glossary of notation 635
Index 643
Preface
analogue of the classical theory of algebraic surfaces which applies in this arith-
metic setting.
This can be done only to some extent. First of all, global results require working
with complete varieties, and a first problem was to compactify Spec(Z) and de-
velop a good intersection theory for divisors. This step was brilliantly solved by
Arakelov, using adeles and introducing metrics on the “fibre at infinity.” Arakelov’s
work can be regarded as the start of a beautiful new theory, aptly named “arith-
metic geometry.” As an example, in arithmetic geometry the theory of heights is a
special chapter of the much more precise arithmetic intersection theory.
Arakelov’s theory did not solve all problems and major questions remain. In the
“horizontal direction” given by the base Spec(Z), infinitesimal methods are no
longer at our disposal and genuine new difficulties, with no counterpart in the
classical theory, do appear. This is one of the major stumbling blocks for further
progress. Thus at the present stage we may take a view half-way towards Max
Noether’s view: Arithmetic surfaces were also created by God, but their study
encounters devilish difficulties.
Today, there are already good books devoted to the subject, and we can mention
here Lang’s [169], Serre’s [277], the more expository but very comprehensive ac-
count of Lang’s [171] and Hindry and Silverman’s [153]. So, why a new book on
diophantine geometry?
As is often the case, this book grew from introductory lectures at the graduate
level, given over a decade ago at the Scuola Normale Superiore di Pisa and the
Mathematisches Forschungsinstitut of the Eidgenössische Technische Hochschule
in Zürich. An advanced knowledge of algebra or algebraic geometry was not a
prerequisite of the courses. Thus the subject was developed mainly through clas-
sical lines, namely the theory of varieties over fields of characteristic 0 insofar
as algebraic geometry was concerned, and the theory of heights for the number
theoretic aspects.
Already with the initial rough notes, embracing the view that in order to learn tools
it is best to use them in practice, it was decided to keep mathematical rigor as a
strict requirement, supplying references whenever needed and making a clear dis-
tinction between a proof and a plausible argument. Examples, including unusual
ones, and advanced sections in which deeper aspects of the theory were either
developed or described, were included whenever possible. Rather than including
this type of material as “exercises” at various levels of difficulty, often disguising
good research papers as exercises, it was decided to include proofs and extended
comments also for them. However, in the time needed to put the original material
together, the subject matter continued to advance at a fast pace, whence the need
for inclusion of additional interesting material, as well as substantial revisions of
what had been done before.
In the final product, this book is basically divided into three parts. Chapters 1 to
7 develop the elementary theory of heights and its applications to the diophantine
geometry of subvarieties of the split torus $\mathbb{G}_m^n$, including applications to diophantine
approximation with proofs of Roth’s theorem and Schmidt’s subspace theorem
and some unusual applications.
Chapters 8 to 11 deal with abelian varieties and the diophantine geometry of their
subvarieties, ending with a detailed proof of Faltings’s celebrated theorem estab-
lishing Mordell’s conjecture for curves, following Vojta’s proof as simplified in
[29]. However, we felt that a proper treatment of Faltings’s big theorem, namely
his proof of Lang’s conjectures about rational points on subvarieties of abelian va-
rieties, was best done in the context of arithmetic geometry and with regrets we
limited ourselves on this matter only to a few comments about the theorem itself
and to some of its applications.
Chapters 12 to 14 are more speculative and at times straddle the borderline be-
tween diophantine geometry and arithmetic geometry. Chapter 12 deals with the
so-called abc-conjecture over number fields, including a complete proof of Belyı̆’s
theorem and its application to Elkies’s theorem, various examples, concluding
with a finiteness result for the generalized Fermat equation, due to Darmon and
Granville. Chapter 13, which is largely self contained, is an exposition of the
classical Nevanlinna theory, with proofs of the first and second main theorems of
Nevanlinna, and also Cartan’s extension of them to the theory of meromorphic
curves. Its purpose is to motivate the final Chapter 14 dealing with the well-known
Vojta conjectures, which have spurred a great deal of work in the field.
Proofs are usually given in full detail, but of course it was not feasible to de-
velop all algebra and algebraic geometry from scratch and they tend to be fairly
condensed at times. To alleviate this, Appendix A summarizes all concepts of al-
gebraic geometry needed in this book and Appendix B gathers the necessary facts
about ramification in number theory and algebraic geometry. Both are provided
with complete references to standard books and should help the reader in under-
standing which notions and notations we use. Finally Appendix C contains an
account of Minkowski’s geometry of numbers, with proofs, at least to the extent
we need in this book.
Some sections in this book appear in small print. Their meaning is simply that
they can be omitted in a first reading, either because they require more advanced
knowledge of algebra and geometry, or because they deal with side topics not
appearing elsewhere in the book. At the end of every chapter, the reader will find
some bibliographical notes, containing both historical comments and references
to additional literature. However, in no way do these references pretend to be
complete and they only represent our personal choices for additional reading.
This book does not represent an introduction to diophantine geometry, nor a com-
plete treatment of the theory of heights. Neither do we strive for maximum gener-
ality, and most of the book is concerned only with a number field as ground field,
dealing only marginally with the function field case and even less with ground
fields of positive characteristic. Also, we do not extend the theory to semiabelian
varieties or non-split commutative linear groups, which are also quite important
and lead to delicate questions.
The whole theory of effective diophantine approximation, and Baker’s theory of
logarithmic forms, are missing entirely from this book and relegated to a few com-
ments at the end of Chapter 5. This is not due to a perception of lack of importance
of the subject. Rather, an adequate treatment of the topic would have required a
second large volume for this already large book.
The same can be said for arithmetic geometry, which no doubt deserves an ad-
vanced monograph by itself, also for the arithmetic theory of elliptic curves and
abelian varieties, and for the arithmetic theory of modular functions and its appli-
cations to diophantine problems.
Our goal in writing this book was to provide, in addition to the existing literature,
a wide selection of topics in the subject, containing foundational material with
complete proofs, numerous examples, and additional material viewed as a bridge
between the classical theory and arithmetic geometry proper. A fair portion of this
book is meant to be accessible to a reader with only a basic course in algebra and
algebraic geometry, but even the specialist in the field should be able to find interesting
material in it. We made no serious attempt to reach completeness about the
history of the subject; for this very reason, the referenced material (we never quote
from secondary sources) is mostly from literature in the English and French
languages. Finally, although we attempted to put together a comprehensive bib-
liography, in no way do we pretend it to be complete. We apologize in advance
for the inevitable omissions in our bibliography, regarding priorities and precursor
works.
At the end of the book the reader will find an index of mathematical names in
lexicographic order and an index of notations ordered by page number. The vanity
index (index of authors mentioned in the text) has been omitted.
Terminology
We try to use standard terminology, but for convenience of the reader we gather
here some of the most frequently used notation and conventions.
In set theory, A ⊂ B means that A is a subset of B . In particular, A may be
equal to B . If this case is excluded, then we write A ⊊ B . The complement of A
in B is denoted by B \ A as we reserve − for algebraic purposes. We denote the
number of elements of A by |A| (possibly ∞ ). The identity map is id.
A quasi-compact topological space is characterized by the Heine–Borel property
for open coverings. In this book, a compact space is quasi-compact and Hausdorff.
We denote by N the set of natural numbers with 0 included and Z is the ring of
rational integers. Then Q , R and C are the fields of rational, real, and complex
numbers. A positive number means x > 0, but we use R+ for the non-negative
real numbers. The Kronecker symbol δ_ij is 0 for i ≠ j and 1 for i = j .
The real (resp. imaginary) part of a complex number z is denoted by ℜz (resp.
ℑz ) and z̄ is complex conjugation.
The floor function ⌊x⌋, defined for x ∈ R , is the largest rational integer ≤ x . The
ceiling function ⌈x⌉ denotes the smallest rational integer ≥ x .
The real functions on X are denoted by R^X . For f, g ∈ R^X , the Landau symbol
f = O(g) means |f (x)| ≤ Cg(x) for some unspecified positive constant C .
If we want to emphasize the dependence of C on parameters ε, L, . . . we write
f = O_{ε,L,...}(g). As a special case, f = O(1) means that f is a bounded function
on X . We also use, with the same meaning, the equivalent Vinogradov’s symbol
f ≪ g and f ≪_{ε,L,...} g . The symbol g ≫ f is interpreted as f ≪ g .
If X is a topological space and f, g are defined on a subset Y with an accumu-
lation point x , then f = O(g) for y → x means that |f (y)| ≤ Cg(y) holds for
all y ∈ Y contained in a neighbourhood of x . If this is true for all C > 0 (with
neighbourhoods depending on C ), then we use the Landau symbol f = o(g) for
y → x . The asymptotic relation f ∼ g for y → x means that f − g = o(|g|).
The Landau symbols and the Vinogradov symbol must be used with caution in
presence of parameters, not just because the constant involved may depend on pa-
rameters, but especially because the neighbourhood in which the inequality holds
will also depend on the parameters, an easily overlooked fact.
In number theory, we use GCD(a, b) for the greatest common divisor of a and
b . As usual, a|b means that a divides b . The number of primes up to x is π(x).
The group of multiplicative units of a commutative ring R with identity is denoted
by R^× . We use the symbol V^* to denote the dual of a vector space V . Rings and
algebras are always assumed to be associative, fields are always commutative. If
the rings have an identity, then we assume that ring homomorphisms send 1 to 1.
The ideal generated by g1 , . . . , gm is denoted by [g1 , . . . , gm ]. The characteristic
of a field K is char(K) and we write Fq for the finite field with q elements.
The ring of polynomials in the variable x with coefficients in K is denoted by
K[x]. A monic polynomial has highest coefficient 1. The minimal polynomial
of an algebraic number α over a field is assumed to be monic and its degree is
the degree of α , denoted by deg(α). If we consider the minimal polynomial
over Z (or any factorial ring), then we replace monic by the assumption that the
coefficients are coprime. We use $\mathbf{x}$ to denote a vector with entries $x_i$, thus $K[\mathbf{x}]$
is the ring of polynomials in the variables $x_i$. By $\overline{K}$ we denote a choice of an
algebraic closure of the field K .
For the terminology used in algebraic geometry, the reader is referred to
Appendix A.
The numbering in this book is by chapter (appendices in capitals), section, and
statement, in progressive order. Equations are numbered separately by chapter (ap-
pendices in capitals) and statement in progressive order, with the label enclosed in
parentheses. References to equations not occurring on the same page or the pre-
ceding page also give the page numbers; the first example is: (A.13) on page 558,
occurring on page 15.
1 HEIGHTS
1.1. Introduction
This chapter contains preliminary material on absolute values and the elementary
theory of heights on projective varieties. Most of this material is quite standard,
although we have included some of the finer results on classical heights which are
not usually treated in other texts.
In Section 1.2 we start with absolute values, and places are introduced as equiv-
alence classes of absolute values. The definitions of residue degree and ramifica-
tion index are given, as well as their basic properties and behaviour with respect
to finite degree extensions. In Sections 1.3 and 1.4 we introduce normalized ab-
solute values and the all-important product formula in number fields and function
fields. Section 1.5 contains the definition of the absolute Weil height in projective
spaces, the characterization of points with height 0, and a general form of Liou-
ville’s inequality in diophantine approximation. Section 1.6 studies the height of
polynomials and Mahler’s measure and proves Gauss’s lemma and its counterpart
at infinity, Gelfond’s lemma. Section 1.7, which can be omitted in a first read-
ing, elaborates further on various comparison results about heights and norms of
polynomials, including an interesting result of Per Enflo on ℓ¹-norms.
The presentation of the material in this chapter is self contained with the excep-
tion of Section 1.2, where the basic facts about absolute values are quoted from
standard reference books (N. Bourbaki [47], Ch.VI, S. Lang [173], Ch.XII, and N.
Jacobson [157], Ch.IX).
(c ) |x + y| ≤ max{|x|, |y|},
(a) w|v .
(b) The topology of Kv induced by w is complete.
(c) K is a dense subset of Kv in the above topology.
The residue degree $f_{w/v}$ of L/K in w is the dimension of k(w) over k(v). Let
$|\ |_w$ be an absolute value representing w and $|\ |_v$ the restriction of $|\ |_w$ to K .
The value group $|K^\times|_v$ is a multiplicative subgroup of $|L^\times|_w$ and its index is
called the ramification index $e_{w/v}$ of w in v .
The place v is called discrete if the value group $|K^\times|_v$ is cyclic. Then $\mathfrak{m}_v$ is a
principal ideal and any principal generator is called a local parameter.
On the other hand, every completion of a number field with respect to a non-
archimedean place is a finite extension of Qp , hence locally compact. For details,
we refer to [47], Ch.VI, §9, no.3, Th.1. The following result of Artin and Whaples
is called the approximation theorem:
Theorem 1.2.13. Let $|\ |_1, \dots, |\ |_n$ be inequivalent non-trivial absolute values on
a field K . Then for $x_1, \dots, x_n \in K$ and ε > 0 there is x ∈ K such that
\[ |x - x_k|_k < \varepsilon \qquad (k = 1, \dots, n). \]
Proof: [157], §9.2 or [173], Th.XII.1.2.
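The statement can be illustrated in a very special case: K = Q, finitely many distinct p-adic absolute values, and integer targets. Under these assumptions the Chinese remainder theorem produces the approximating element directly. The following is only a sketch of that special case, not of the proof cited above; the helper names `padic_abs` and `approximate` are hypothetical, and the archimedean absolute value is deliberately left out.

```python
from fractions import Fraction

def padic_abs(x, p):
    """p-adic absolute value of a rational x, normalized by |p|_p = 1/p."""
    x = Fraction(x)
    if x == 0:
        return Fraction(0)
    v, num, den = 0, x.numerator, x.denominator
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return Fraction(1, p) ** v

def approximate(targets, primes, e):
    """Integer x with x = targets[k] modulo primes[k]**e for every k (CRT)."""
    x, modulus = 0, 1
    for t, p in zip(targets, primes):
        m = p ** e
        # Choose s so that x + modulus * s is congruent to t modulo m.
        s = ((t - x) * pow(modulus, -1, m)) % m
        x += modulus * s
        modulus *= m
    return x

# Assumed sample data: three integer targets, three distinct primes.
targets, primes, e = [1, 5, -2], [2, 3, 5], 10
x = approximate(targets, primes, e)
for t, p in zip(targets, primes):
    # |x - t|_p <= p**(-e), which is below any given epsilon once e is large.
    print(p, padic_abs(x - t, p) <= Fraction(1, p ** e))
```

Increasing `e` makes all the distances $|x - x_k|_{p_k}$ simultaneously as small as desired, which is the content of the theorem in this restricted setting.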
and
Proof: Corollary 1.3.2 implies the first statement. There is an element ξ of L with
L = K(ξ). With the notation of Proposition 1.3.1, we have $k_1 = \cdots = k_r = 1$
and an isomorphism
\[ L \otimes_K K_v \xrightarrow{\ \sim\ } \prod_{j=1}^{r} K_v[t]/(f_j(t)) \]
\[ |\zeta - 1|_v = p^{-1/(p-1)}. \]
By Proposition 1.2.11, the ramification index is equal to p − 1 and the residue degree is
equal to 1 .
1.3.10. In the final part of this section, we handle finite-dimensional field extensions without
separability assumptions. It turns out that it suffices to adjust the exponents in the normal-
ization 1.3.6. Since we focus almost exclusively on number fields, the reader may skip the
rest of this section in a first reading.
Let K be a field with absolute values | |v and let L/K be a finite-dimensional field
extension. Our goal is to generalize Proposition 1.3.1 describing the extensions of | |v to
the field L .
Since $L \otimes_K K_v$ is a finite-dimensional $K_v$-algebra, the structure theorem of commutative
artinian rings ([157], Th.7.13) gives uniquely determined ideals $R_j$, which are local $K_v$-algebras
with maximal ideals $\mathfrak{m}_j$ and such that
\[ L \otimes_K K_v = \prod_{j=1}^{r} R_j. \tag{1.1} \]
and identify the residue field of $T_w$ with the completion $L_w$. For y ∈ L , we set
\[ \|y\|_w := |N_{L_w/K_v}(y)|_v^{[T_w : L_w]} \]
and
\[ |y|_w := \|y\|_w^{1/[L:K]}. \]
With these modifications the analogue of Lemma 1.3.7 still holds, namely:
Lemma 1.3.13. If x ∈ K \ {0} and y ∈ L \ {0} , then
\[ \sum_{w|v} \log |x|_w = \log |x|_v, \tag{1.3} \]
\[ \sum_{w|v} \log \|y\|_w = \log |N_{L/K}(y)|_v. \tag{1.4} \]
Proof: Formula (1.4) is a trivial consequence of Proposition 1.2.7 and (1.2). If we set y = x
in (1.4), then (1.3) follows immediately from (1.2).
The product formula over Q may be stated and proved as a consequence of the
factorization of a non-zero rational number into a product of primes and a unit. In
spite of its simplicity and essentially trivial nature, it plays a fundamental role and
its importance cannot be overstated. The fact that it involves all places, including
the places at ∞ , means that, from the geometrical point of view, we are dealing
with a complete variety. In the case considered here, the general fibre of the variety
is a point and everything is quite simple. However, the best interpretation of the
product formula and its generalizations is found in the framework of Arakelov
theory.
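As a concrete check of the product formula over Q, the sketch below multiplies the ordinary absolute value of a non-zero rational number by its p-adic absolute values at the primes dividing numerator or denominator (all other places contribute 1). The function names are hypothetical and exact rational arithmetic is used throughout, so the product comes out as exactly 1.

```python
from fractions import Fraction

def prime_factors(n):
    """Set of primes dividing the positive integer n (trial division)."""
    primes, d = set(), 2
    while d * d <= n:
        while n % d == 0:
            primes.add(d)
            n //= d
        d += 1
    if n > 1:
        primes.add(n)
    return primes

def padic_abs(x, p):
    """p-adic absolute value of a non-zero rational x, with |p|_p = 1/p."""
    v, num, den = 0, x.numerator, x.denominator
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return Fraction(1, p) ** v

def product_over_places(x):
    """Product of |x|_v over the archimedean place and all relevant primes."""
    x = Fraction(x)
    primes = prime_factors(abs(x.numerator)) | prime_factors(x.denominator)
    result = abs(x)  # contribution of the place at infinity
    for p in primes:
        result *= padic_abs(x, p)
    return result

print(product_over_places(Fraction(-360, 77)))  # 1
```

The cancellation is exactly the factorization argument described above: each prime power in the ordinary absolute value is undone by the corresponding p-adic factor.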
1.4.1. Let K be a field and $M_K$ be a set of non-trivial inequivalent absolute values
on K such that the set
\[ \{\, |\ |_v \in M_K \mid |x|_v \neq 1 \,\} \]
is finite for any x ∈ K\{0}. We identify the elements of $M_K$ with the corresponding
places and say that $M_K$ satisfies the product formula if
\[ \prod_{v \in M_K} |x|_v = 1 \]
Let X be a projective irreducible variety over a field K and let us fix an ample line bundle
L (see A.6.10). We denote by deg(Z) the degree of a cycle Z with respect to L . Since
the function field does not change by passing to the normalization (see A.12.6), we may and
shall assume that X is regular in codimension 1 (see A.8.10). For any prime divisor Z of
X , the local ring $\mathcal{O}_{X,Z}$ is a discrete valuation ring and the valuation of $f \in K(X)^\times$ is the
order of f at Z . The latter is denoted by $\mathrm{ord}_Z(f)$. Since the degree of a principal divisor
is 0 , we have
\[ \sum_Z \mathrm{ord}_Z(f) \deg(Z) = 0. \tag{1.7} \]
(a) Any finitely generated field over K is a function field of an irreducible projective
normal variety over K .
(b) Two irreducible varieties are birationally equivalent over K if and only if they have
isomorphic function fields over K .
(c) If L is a finite-dimensional extension of the function field K(X) of an irreducible
projective variety X over K , then there are an irreducible projective normal variety
Y over K , and a finite surjective morphism ϕ : Y −→ X , such that
$L \cong K(Y)$ and the inclusion K(X) ⊂ L corresponds to $\varphi^\sharp : K(X) \to K(Y)$.
(d) In (c), there is a distinguished choice for Y called the normalization of X in
L , uniquely characterized up to isomorphisms by the following property: Given a
dominant morphism $\varphi' : Y' \to X$ of an irreducible normal K -variety $Y'$ to X
and a homomorphism $\rho : K(Y) \to K(Y')$ over K(X) , then there is a unique
dominant morphism $\psi : Y' \to Y$ with $\psi^\sharp = \rho$.
\[ K[t_1, \dots, t_n] \longrightarrow L \]
given by $t_i \mapsto x_i$. Denote the corresponding closed subvariety of $\mathbb{A}^n$ by X . Since I is
prime, X is an irreducible variety. Obviously, K(X) is isomorphic to L . The closure $\overline{X}$
of X in $\mathbb{P}^n_K$ is a projective variety. The normalization of $\overline{X}$ is a projective normal variety
(see A.12.7) with function field L (see A.12.6, A.12.7). This proves (a).
For (b), see A.11.4.
To prove (c) and (d), a generalization of the construction in A.12.6 leads to the normalization
of X in L . If X were affine, then we would take the integral closure of K[X] in L . For
any variety X , we glue the normalizations of the affine open charts to get the normalization
of X (see A. Grothendieck [135], 6.3.9). The morphism from the normalization to X is
finite ([135], 6.3.10) and hence projectivity of X implies projectivity of the normalization
(see A.12.7).
Remark 1.4.11. Note that a curve regular in codimension 1 is regular and hence determined
up to isomorphism by its function field. For a higher-dimensional function field
K(X) , there is no canonical choice for the model X . Even if X is smooth, we may blow
up a point to get another smooth model $X'$ for K(X) and it is clear that $M_X \neq M_{X'}$.
Hence we always fix a model when dealing with higher-dimensional function fields.
If L/K(X) is a finite extension, then Lemma 1.4.10 (d) shows that the normalization
ϕ : Y → X of X in L is a canonical model for the function field L . Let $\varphi' : Y' \to X$
be any finite surjective morphism of an irreducible projective variety $Y'$ onto X with
$K(Y') = L$ and with K(X) → L equal to $(\varphi')^\sharp$. We claim that $M_Y = M_{Y'}$.
We first show that we may replace $Y'$ by its normalization $Y''$ (in L ) without changing the
set of places. Indeed, the normalization morphism $\pi : Y'' \to Y'$ is finite and birational.
Since $Y'$ and $Y''$ are projective, the valuative criterion of properness (see A.11.10) shows
that π induces an isomorphism outside of subsets of codimension ≥ 2 in $Y'$ and $Y''$,
hence $M_{Y'} = M_{Y''}$. So we may assume $Y'$ normal and then Lemma 1.4.10 (d) yields a
unique dominant morphism $\psi : Y' \to Y$ factoring through ϕ . The morphism ψ is proper
(see A.6.15) and has finite fibres, hence ψ is finite (see A.12.4). Since $K(Y') = K(Y) = L$,
the morphism ψ is also birational (Lemma 1.4.10 (b)) and the same argument as above
shows that ψ is an isomorphism in codimension 1 , hence $M_Y = M_{Y'}$.
We see that the set of places of L is well determined by X and in the following examples
we will show that $M_Y$ is the set of places of L extending the places of $M_X$.
Example 1.4.12. Let us consider a finite-dimensional field extension of the function field
K(C) of an irreducible projective regular curve C over the ground field K . By Lemma
1.4.10, there is an irreducible projective regular curve $C'$ over K and a morphism
$\varphi : C' \to C$ such that the extension corresponds to the extension $K(C')/K(C)$ induced by
ϕ . We know from the above that the order at a closed irreducible subset Z of dimension 0
induces an absolute value $|\ |_v \in M_C$. Since C is regular, Cartier divisors can be identified
with Weil divisors (cf. A.8.21) and $\varphi^*(Z)$ is well defined. We have
\[ \varphi^*(Z) = \sum_{Z'} m_{Z'} Z', \]
where $Z'$ ranges over all irreducible closed subsets of $C'$ lying over Z and where $m_{Z'}$
denotes the multiplicity in $Z'$. Note that for $f \in K(C)^\times$ we have $\mathrm{ord}_{Z'}(f) = m_{Z'}\,\mathrm{ord}_Z(f)$.
Thus $Z'$ induces a place $v'$ on $C'$ with $v'|v$. Its ramification index and residue degree are
\[ e_{v'/v} = m_{Z'}, \qquad f_{v'/v} = [K(Z') : K(Z)]. \]
The projection formula for proper intersection products (see W. Fulton [125], Prop.2.5(c))
gives
\[ Z.\varphi_*(C') = \varphi_*(\varphi^*(Z)), \]
hence
\[ [K(C') : K(C)] = \sum_{Z'} m_{Z'} [K(Z') : K(Z)] = \sum_{v'|v} e_{v'/v} f_{v'/v}. \]
By Remark 1.3.3 and Proposition 1.2.11, we see that all places $v'$ dividing v are induced
by the “fibre points” $Z'$ and the local degree satisfies
\[ [K(C')_{v'} : K(C)_v] = m_{Z'} [K(Z') : K(Z)]. \]
Example 1.4.13. In order to extend the above example to higher dimensions, we use the
language of schemes. Let us consider a finite-dimensional extension of a function field
K(X) . By Lemma 1.4.10, we may identify this extension with an extension $K(X')/K(X)$
induced by a finite surjective morphism $\varphi : X' \to X$ of irreducible projective varieties
over K and regular in codimension 1 .
Let Z be a prime divisor on X with corresponding place v . To study the places $v'$ of $X'$
with $v'|v$, we may assume that X , and hence $X'$, are affine. The fibre over the generic
point ζ of Z is the affine scheme
\[ \varphi^{-1}(\zeta) = \operatorname{Spec}\bigl( K[X'] \otimes_{K[X]} K(\zeta) \bigr), \]
where K(ζ) is the residue field of $\mathcal{O}_{X,\zeta}$. By the structure theorem of finite-dimensional
algebras ([157], Th.7.13), we have
\[ K[X'] \otimes_{K[X]} K(\zeta) \cong \prod_{\xi \in \varphi^{-1}(\zeta)} \mathcal{O}_{\varphi^{-1}(\zeta),\xi}. \]
Note that $K[X'] \otimes_{K[X]} \mathcal{O}_{X,\zeta}$ is a finitely generated module over the discrete valuation
ring $\mathcal{O}_{X,\zeta}$. Since it is also torsion free (as a subring of $K(X')$), it is free of rank
$[K(X') : K(X)]$. We conclude that
\[ [K(X') : K(X)] = \sum_{\xi \in \varphi^{-1}(\zeta)} [\mathcal{O}_{\varphi^{-1}(\zeta),\xi} : K(\zeta)]. \]
Clearly, the order at any point $\xi \in \varphi^{-1}(\zeta)$ yields a place $v'$ of $K(X')$ with $v'|v$. Indeed,
denoting by $t_\zeta$ a local parameter in $\mathcal{O}_{X,\zeta}$, we have
\[ \mathrm{ord}_\xi(f) = \mathrm{ord}_\xi(t_\zeta)\,\mathrm{ord}_\zeta(f) \]
for any $f \in K(X)^\times$. Moreover, we verify that
\[ e_{v'/v} = \mathrm{ord}_\xi(t_\zeta), \qquad f_{v'/v} = [K(\xi) : K(\zeta)]. \]
Therefore, as in the preceding example, we conclude that all places of $K(X')$ dividing v
are induced by points of $\varphi^{-1}(\zeta)$.
Moreover, let L be an ample line bundle on X . Then the absolute values are normalized
by
\[ |f|_v := c^{\mathrm{ord}_v(f) \deg_L(v)} \qquad (v \in M_{K(X)},\ f \in K(X)) \]
to satisfy the product formula for some constant c . If we use the normalizations from 1.3.6
on $K(X')$ and the equation
\[ \deg_{\varphi^* L}(w) = [K(w) : K(v)] \deg_L(v) \]
obtained from the projection formula (A.13) on page 558, then the above implies
\[ |g|_w = c_1^{\mathrm{ord}_w(g) \deg_{\varphi^* L}(w)} \qquad (w \in M_{K(X')},\ g \in K(X')) \]
for $c_1 := c^{1/[K(X'):K(X)]}$. Note that $\varphi^* L$ is ample (cf. A.12.7) and hence the normalizations
on $K(X')$ fit with 1.4.6 for the new constant $c_1$.
Our claim now follows from the first formula of Lemma 1.3.7.
Lemma 1.5.3. h(x) is independent of the choice of coordinates.
1.5.6. A similar notion holds for affine space. Let $\mathbb{A}^n_{\overline{\mathbb{Q}}}$ be the affine space of
dimension n over $\overline{\mathbb{Q}}$, together with the usual embedding in $\mathbb{P}^n_{\overline{\mathbb{Q}}}$ given by
\[ P = (x_1, \dots, x_n) \mapsto (1 : x_1 : \cdots : x_n); \]
then we define h(P ) as the height of the image of P .
1.5.7. It should always be clear from the context whether we are dealing with
points in affine or projective space, and there should be no problem in using the
same notation h for heights in affine or projective space.
In performing local calculations, it proves to be convenient to introduce the function
\[ \log^+ t = \max(0, \log t) \]
on the positive real numbers, extended by $\log^+ 0 = 0$. Then it is immediate that
the height on affine space is given by
\[ h(x_1, \dots, x_n) = \sum_{v \in M_K} \max_j \log^+ |x_j|_v. \]
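For points with rational coordinates (so K = Q), the sum over places can be evaluated by hand: only the archimedean place and the primes dividing some denominator contribute. The sketch below, with hypothetical function names, computes the height this way; for the sample point (3/4, -5/6, 2) the result agrees with log 24, the logarithm of the largest entry of the primitive integer vector (12 : 9 : -10 : 24) representing its image in projective space.

```python
import math
from fractions import Fraction

def prime_factors(n):
    """Set of primes dividing the positive integer n (trial division)."""
    primes, d = set(), 2
    while d * d <= n:
        while n % d == 0:
            primes.add(d)
            n //= d
        d += 1
    if n > 1:
        primes.add(n)
    return primes

def vp(x, p):
    """p-adic valuation of a non-zero rational x."""
    v, num, den = 0, x.numerator, x.denominator
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return v

def log_plus(t):
    """log^+ t = max(0, log t), extended by log^+ 0 = 0."""
    return max(0.0, math.log(t)) if t > 0 else 0.0

def affine_height(point):
    """Sum over the places of Q of max_j log^+ |x_j|_v, for rational x_j."""
    point = [Fraction(x) for x in point]
    primes = set()
    for x in point:
        primes |= prime_factors(x.denominator)
    h = max(log_plus(abs(x)) for x in point)  # archimedean place
    for p in primes:
        # log^+ |x|_p = max(0, -v_p(x)) * log p
        h += max(max(0, -vp(x, p)) for x in point if x != 0) * math.log(p)
    return h

print(affine_height([Fraction(3, 4), Fraction(-5, 6), 2]), math.log(24))
```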
\[ \sum_{i=0}^{d} |s_i(\boldsymbol{\zeta}^m)| \le \sum_{i=0}^{d} \binom{d}{i} = 2^d. \]
There are only finitely many possibilities for the vector of such symmetric functions,
and by Dirichlet’s pigeon-hole principle there are two integers m , n with
m > n and with the same vector of symmetric functions.
Obviously, this is the same as saying that $\boldsymbol{\zeta}^m = \pi(\boldsymbol{\zeta}^n)$ for some permutation π
of {1, . . . , d} and by iterating this relation we find that $\zeta_i^{m^k} = \zeta_{\pi^k(i)}^{n^k}$. If we take
k such that $\pi^k$ is the identity, we conclude that $\zeta^{m^k - n^k} = 1$ with $m^k - n^k > 0$.
1.5.10. We recall here some basic facts about S -integers and S -units in a number
field K . Let $S \subset M_K$ be a finite set of places, which includes the set $S_\infty$ of all
archimedean places of K . An element x ∈ K is an S -integer if $|x|_v \le 1$ for
$v \notin S$. The S -integers of K form a subring $O_{S,K}$ of K . The units in $O_{S,K}$ are
called the S -units of K and form a group $U_{S,K}$. An element $x \in O_{S,K}$ is an
S -unit if and only if $|x|_v = 1$ for all $v \notin S$.
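For K = Q these definitions become elementary divisibility conditions: with S consisting of the archimedean place and a finite set of primes, a rational number is an S-integer when its reduced denominator involves only primes of S, and an S-unit when the same holds for its numerator as well. A minimal sketch, assuming this special case and using hypothetical helper names:

```python
from fractions import Fraction

def strip_primes(n, primes):
    """Remove from the non-zero integer n every factor coming from `primes`."""
    n = abs(n)
    for p in primes:
        while n % p == 0:
            n //= p
    return n

def is_S_integer(x, S_primes):
    """|x|_p <= 1 for every prime p outside S_primes."""
    return strip_primes(Fraction(x).denominator, S_primes) == 1

def is_S_unit(x, S_primes):
    """|x|_p = 1 for every prime p outside S_primes (x must be non-zero)."""
    x = Fraction(x)
    return (x != 0
            and strip_primes(x.numerator, S_primes) == 1
            and is_S_integer(x, S_primes))

S = [2, 3]  # finite places of S; the archimedean place is always included
print(is_S_integer(Fraction(5, 12), S), is_S_unit(Fraction(5, 12), S))  # True False
print(is_S_integer(Fraction(9, 8), S), is_S_unit(Fraction(9, 8), S))    # True True
print(is_S_integer(Fraction(1, 5), S))                                  # False
```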
Remark 1.5.11. An easy application of the non-archimedean triangle inequality
and Gauss’s lemma (see Lemma 1.6.3) shows that an S∞ -integer is the same as
an algebraic integer in K . If S = S∞ , we simply talk about the integers and the
units of the number field K . The units of K are the algebraic integers x ∈ K
with norm NK/Q (x) = ±1, as we see from writing the norm as a product of
conjugates of x .
1.5.12. We consider the following homomorphism
\[ \phi : U_{S,K} \to \mathbb{R}^{|S|}, \qquad x \mapsto (\log |x|_v)_{v \in S} \]
of groups. By taking the logarithm of the product formula, we see that the image
of φ is contained in the hyperplane $\sum_{v \in S} y_v = 0$, $y \in \mathbb{R}^{|S|}$. By Kronecker’s
theorem, the kernel of φ is the group $\mu_K$ of roots of unity in K . This is part of
Dirichlet’s unit theorem:
Theorem 1.5.13. Let S be as in 1.5.10. The image of φ is a lattice of maximal
rank |S| − 1 in the hyperplane $\sum_{v \in S} y_v = 0$. Hence $U_{S,K} \cong \mu_K \times \mathbb{Z}^{|S|-1}$.
We will not prove this result here, and refer instead to W. Narkiewicz [215], Th.3.6,
or S. Lang [172], p.104.
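A small numerical illustration, under assumptions chosen for this sketch: take K = Q(√2) and S = S_∞, so |S| = 2 and the unit group has rank 1, generated up to sign by 1 + √2. Elements a + b√2 are stored as integer pairs (a, b), and the two real embeddings send √2 to ±√2. The ordinary absolute values of the embeddings are used; a common positive normalizing exponent at both places would only rescale the vectors.

```python
import math

def mul(u, v):
    """Product of a + b*sqrt(2) and c + d*sqrt(2), as integer pairs."""
    (a, b), (c, d) = u, v
    return (a * c + 2 * b * d, a * d + b * c)

def phi(u):
    """Logarithms of the absolute values of u under the two real embeddings."""
    a, b = u
    r = math.sqrt(2)
    return (math.log(abs(a + b * r)), math.log(abs(a - b * r)))

u = (1, 1)                  # the unit 1 + sqrt(2)
u3 = mul(u, mul(u, u))      # its cube, 7 + 5*sqrt(2)
print(phi(u), sum(phi(u)))  # coordinates sum to 0 (up to rounding)
print(phi(u3))              # three times phi(u): the image is a rank-1 lattice
```

The coordinate sum vanishes because the norm of a unit is ±1, which is the product formula restricted to the archimedean places.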
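1.5.14. The Segre embedding
\[ \mathbb{P}^n_{\overline{\mathbb{Q}}} \times \mathbb{P}^m_{\overline{\mathbb{Q}}} \longrightarrow \mathbb{P}^{(n+1)(m+1)-1}_{\overline{\mathbb{Q}}} \]
is given by
\[ (\mathbf{x}, \mathbf{y}) \longmapsto \mathbf{x} \otimes \mathbf{y} := (x_i y_j), \]
where the pairs ij are, for example, ordered lexicographically (see A.6.4). An
easy calculation shows that
\[ h(\mathbf{x} \otimes \mathbf{y}) = h(\mathbf{x}) + h(\mathbf{y}), \]
using $\max_{ij} |x_i y_j|_v = \max_i |x_i|_v \cdot \max_j |y_j|_v$ for every v .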
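The additivity can be checked directly for rational points. The sketch below assumes points of projective space over Q given by integer coordinates, for which the height is the logarithm of the largest absolute value after dividing out the common factor; the names `height` and `segre` are hypothetical.

```python
import math
from functools import reduce
from itertools import product

def height(coords):
    """Height of a rational projective point with integer coordinates, not all 0."""
    g = reduce(math.gcd, coords)  # common factor, removed by homogeneity
    return math.log(max(abs(c) for c in coords) // g)

def segre(x, y):
    """Coordinates x_i * y_j with the pairs (i, j) in lexicographic order."""
    return [xi * yj for xi, yj in product(x, y)]

x = [4, -6, 10]   # same point as (2 : -3 : 5), height log 5
y = [3, 7]        # height log 7
print(height(segre(x, y)), height(x) + height(y))  # both equal log 35
```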
If instead v is archimedean, we use the triangle inequality for the ordinary absolute
value getting
\[ |x_j^{(1)} + \cdots + x_j^{(r)}|_v \le |r|_v \max_k |x_j^{(k)}|_v. \]
Then Lemma 1.3.7 implies
\[ \sum_{v|\infty} \log |r|_v = \log r \]
leading to
\[ h(P_1 + \cdots + P_r) \le \log r + \sum_{v \in M_K} \max_{j,k} \log^+ |x_j^{(k)}|_v. \]
and
\[ |r|_v \max_k |\alpha_k|_v = |\alpha_1 + \cdots + \alpha_r|_v. \]
On the other hand, the equality assumption implies h(rα1 ) = log r + rh(α1 ) , hence
h(α1 ) = 0 because r ≥ 2 . Thus by Kronecker’s theorem in 1.5.9, α1 is a root of unity
and we get h(rα1 ) = h(r) = log r .
Another example yielding almost equality in Proposition 1.5.15 is obtained taking
$\alpha_i = l/(l + a_i)$ with $1 \le a_i \le N$ and distinct $a_i$ and with $l = N!\,t + 1$, t any positive
integer. Then the numbers $l, l + a_1, \dots, l + a_r$ are coprime in pairs and an easy
calculation shows that $h(\alpha_i) = \log(l + a_i)$ and
$h(\sum \alpha_i) = \sum \log(l + a_i) + \log(\sum \alpha_i)$. Hence
$h(\alpha_1 + \cdots + \alpha_r) > h(\alpha_1) + \cdots + h(\alpha_r) + \log r - \varepsilon$ for sufficiently large t .
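The near-equality can be observed numerically. The sketch below makes one specific, assumed choice of data, namely N = 3 and $(a_1, a_2) = (1, 2)$, for which l, l + 1, l + 2 are indeed pairwise coprime since l is odd. It uses the height of a rational number, log max(|numerator|, denominator) in lowest terms, and prints the gap $h(\alpha_1) + h(\alpha_2) + \log 2 - h(\alpha_1 + \alpha_2)$ for growing t.

```python
import math
from fractions import Fraction

def h(x):
    """Height of a rational number: log max(|num|, den) in lowest terms."""
    x = Fraction(x)
    return math.log(max(abs(x.numerator), x.denominator))

def gap(a, N, t):
    """sum h(alpha_i) + log r - h(sum alpha_i) for alpha_i = l / (l + a_i)."""
    l = math.factorial(N) * t + 1
    alphas = [Fraction(l, l + ai) for ai in a]
    return sum(h(al) for al in alphas) + math.log(len(a)) - h(sum(alphas))

for t in (1, 10, 10 ** 3, 10 ** 6):
    # The gap stays non-negative and shrinks towards 0 as t grows.
    print(t, gap([1, 2], 3, t))
```

`Fraction` reduces to lowest terms automatically, so the computed heights are correct regardless of any coprimality assumption on the chosen $a_i$.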
The following result, quite useful in practice, expresses the fact that the height is
invariant by Galois conjugation.
Proposition 1.5.17. Let $P$ be a point of affine or projective space with coordinates
$(x_j)$ in $\overline{\mathbb{Q}}$. If $\sigma \in \mathrm{Gal}(\overline{\mathbb{Q}}/\mathbb{Q})$ and if the point $\sigma(P)$ is given by the coordinates
$(\sigma(x_j))$, then $h(P) = h(\sigma(P))$.
where Z ranges over all prime divisors and the degree is with respect to a fixed ample class.
In particular, the height of a rational function $f \in K(X)^\times$ is
$$h(f) = h((1 : f)) = -\sum_Z \deg(Z) \min(0, \mathrm{ord}_Z(f)).$$
Thus $h(f) = 0$ if and only if $f$ has no poles. By $h(f) = h(f^{-1})$, this is equivalent to
$\mathrm{div}(f) = 0$.
If X is normal, a function without poles is regular (R. Hartshorne [148], Proposition
I.6.3A), hence constant on the irreducible components of XK . We conclude that in this
case h(f ) = 0 if and only if f is locally constant on X (use A.6.15).
where
$$|f|_v := \max_j |a_j|_v \qquad (1.9)$$
where we have abbreviated $\mathbb{T}$ for the unit circle $\{e^{i\theta} \mid 0 \le \theta < 2\pi\}$ equipped with
the standard measure $d\mu = (1/2\pi)\,d\theta$. Its main advantage is the multiplicativity
property
$$M(fg) = M(f)M(g).$$
Let $f(t) = a_d t^d + \cdots + a_0$ be a polynomial with complex coefficients and factorization
$$f(t) = a_d (t - \alpha_1) \cdots (t - \alpha_d).$$
Now we note that the mean value of $\log |t - \alpha|$ on the unit circle is $\log^+ |\alpha|$. In
fact, for $|\alpha| > 1$ the function $\log |t - \alpha|$ is harmonic in the unit disk, therefore its
mean value on the unit circle is its value at the centre, namely $\log |\alpha| = \log^+ |\alpha|$.
If instead $|\alpha| < 1$, the function $\log |1 - \overline{\alpha} t|$ is harmonic in the unit disk and
coincides with $\log |t - \alpha|$ on the unit circle, while its value at the centre is 0, that
is $\log^+ |\alpha|$. Finally the case $|\alpha| = 1$ is deduced by continuity.
We have shown that $\log M(t - \alpha) = \log^+ |\alpha|$. If we combine this with the
multiplicativity property of the Mahler measure, we obtain Jensen's formula:
Proposition 1.6.5.
$$\log M(f) = \log |a_d| + \sum_{j=1}^{d} \log^+ |\alpha_j|.$$
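Jensen's formula is easy to test numerically. The sketch below compares the right-hand side of the formula with a midpoint-rule approximation of the mean value of $\log |f|$ on the unit circle. The function names, the sample polynomial and the number of sample points are illustrative choices of ours; the roots are deliberately kept away from the unit circle so that the integrand is smooth.

```python
import cmath
from math import log, pi


def log_mahler_jensen(lead, roots):
    """log M(f) for f = lead * prod (t - root), computed from Jensen's formula."""
    return log(abs(lead)) + sum(log(max(1.0, abs(r))) for r in roots)


def log_mahler_integral(lead, roots, samples=4096):
    """Midpoint-rule approximation of the mean of log|f| over the unit circle."""
    total = 0.0
    for k in range(samples):
        t = cmath.exp(2j * pi * (k + 0.5) / samples)
        value = complex(lead)
        for r in roots:
            value *= (t - r)
        total += log(abs(value))
    return total / samples


lead, roots = 3, [2, 0.5, -3 + 1j, 0.25j]
by_formula = log_mahler_jensen(lead, roots)
by_integral = log_mahler_integral(lead, roots)
print(by_formula, by_integral)
```

Only the roots $2$ and $-3 + i$ lie outside the unit disk, so the formula gives $\log 3 + \log 2 + \tfrac{1}{2} \log 10$ for this example.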
The following result shows the connexion between the Mahler measure and the
height and gives a bound for the absolute norm of an algebraic number.
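$f(t) = a_d t^d + \cdots + a_0$.
We have
$$\begin{aligned}
[K : \mathbb{Q}]\, h(\alpha) &= \sum_{v \in M_K} \sum_{\sigma \in G} \log^+ |\sigma\alpha|_v && \text{(by Proposition 1.5.17)} \\
&= \sum_{v | \infty} \sum_{\sigma \in G} \log^+ |\sigma\alpha|_v - \frac{[K : \mathbb{Q}]}{d} \sum_{v \nmid \infty} \log |a_d|_v && \text{(by (1.10))} \\
&= \frac{[K : \mathbb{Q}]}{d} \sum_{v | \infty} \Bigl( \log |a_d|_v + \sum_{j=1}^{d} \log^+ |\alpha_j|_v \Bigr),
\end{aligned}$$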
where in the last step we have used the product formula and collected the elements
σα into the conjugates αj , j = 1, . . . , d, of α . By Jensen’s formula, this proves
the first claim.
By the second formula of Lemma 1.3.7, we also have
$$\log |N_{\mathbb{Q}(\alpha)/\mathbb{Q}}(\alpha)| \le \sum_{v | \infty} \sum_{j=1}^{d} \log^+ |\alpha_j|_v$$
hence
$$|a_{d-r}| \le \binom{d}{r} |a_d| \prod_{j=1}^{d} \max(1, |\alpha_j|).$$
$$|a_{d-r}| \le \binom{d}{r} M(f).$$
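The coefficient bound $|a_{d-r}| \le \binom{d}{r} M(f)$ can be illustrated on a concrete polynomial. In this sketch we build the coefficients from a chosen leading coefficient and chosen roots, so that $M(f)$ is known exactly from Jensen's formula; the helper name and the sample data are assumptions made for the illustration.

```python
from math import comb, prod


def coefficients_from_roots(lead, roots):
    """Coefficients [a_0, ..., a_d] of lead * prod (t - root)."""
    coeffs = [complex(lead)]
    for r in roots:
        shifted = [0j] + coeffs                      # t * p(t)
        scaled = [-r * c for c in coeffs] + [0j]     # -r * p(t)
        coeffs = [s + c for s, c in zip(shifted, scaled)]
    return coeffs


lead, roots = 2, [1.5, -0.3 + 0.4j, 2j, -1]
coeffs = coefficients_from_roots(lead, roots)
d = len(roots)
# Mahler measure from the roots: |a_d| * prod max(1, |alpha_j|).
measure = abs(lead) * prod(max(1.0, abs(r)) for r in roots)
holds = all(abs(coeffs[d - r]) <= comb(d, r) * measure + 1e-12
            for r in range(d + 1))
print(measure, holds)
```

Here $M(f) = 2 \cdot 1.5 \cdot 2 = 6$, and the check confirms the bound for every $r = 0, \ldots, d$.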
The following consequence, Northcott’s theorem, is very important.
Theorem 1.6.8. There are only finitely many algebraic numbers of bounded
degree and bounded height.
Proof: Let $\alpha$ be algebraic of degree $d$ and height $h(\alpha) \le \log H$. Let $f(t) = a_d t^d + \cdots + a_0$ be the minimal polynomial of $\alpha$ over $\mathbb{Z}$. By Proposition 1.6.6,
we have $M(f) \le H^d$. Also, Lemma 1.6.7 shows that $\max |a_i| \le 2^d M(f)$.
Therefore, the coefficients of $f$ are bounded by $(2H)^d$. Since there are $d + 1$
integer coefficients for each $f$, they give rise to not more than $(2(2H)^d + 1)^{d+1}$
distinct polynomials $f$. Since each $f$ has $d$ roots, the number of algebraic numbers
of degree $d$ and height at most $\log H$ is at most $d(2(2H)^d + 1)^{d+1} \le (5H)^{d^2+d}$.
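For degree $d = 1$ the finiteness statement can be made completely explicit, since the algebraic numbers of degree 1 are the rationals. The sketch below assumes the standard description of the height of a rational number $p/q$ in lowest terms as $\log \max(|p|, |q|)$, enumerates all rationals of height at most $\log H$, and compares the count with the bound obtained in the proof. The function names are ours.

```python
from math import gcd


def rationals_of_height_at_most(H):
    """All p/q in lowest terms with q >= 1 and max(|p|, q) <= H."""
    found = set()
    for q in range(1, H + 1):
        for p in range(-H, H + 1):
            if gcd(p, q) == 1:
                found.add((p, q))
    return found


def northcott_bound(d, H):
    """The bound d * (2 * (2H)^d + 1)^(d + 1) from the proof."""
    return d * (2 * (2 * H) ** d + 1) ** (d + 1)


counts = {H: len(rationals_of_height_at_most(H)) for H in range(1, 21)}
print(counts[1], counts[2], northcott_bound(1, 2))
```

The count grows only quadratically in $H$, so the bound is far from sharp; its purpose in the proof is finiteness, not precision.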
For later use, we prove here a result of K. Mahler [188], which gives a bound for
the discriminant in terms of the Mahler measure.
Proposition 1.6.9. Let $f(x) = a_d x^d + \cdots + a_0$ be a polynomial with real or
complex coefficients, with roots $\alpha_1, \ldots, \alpha_d$. Let
$$D = a_d^{2d-2} \prod_{i>j} (\alpha_i - \alpha_j)^2$$
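$$|D| = |a_d|^{2d-2} \left| \det \begin{pmatrix} 1 & \alpha_1 & \cdots & \alpha_1^{d-1} \\ \vdots & \vdots & & \vdots \\ 1 & \alpha_d & \cdots & \alpha_d^{d-1} \end{pmatrix} \right|^2 \le |a_d|^{2d-2} \prod_{i=1}^{d} \sum_{j=0}^{d-1} |\alpha_i|^{2j}.$$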
and the first statement follows from Jensen’s formula, Proposition 1.6.5.
The second statement is also clear from Proposition 1.6.6.
Lemma 1.6.10. Let $f(t_1, \ldots, t_n)$ be a polynomial with complex coefficients and
partial degrees $d_1, \ldots, d_n$. Then
$$\prod_{j=1}^{n} (d_j + 1)^{-1/2} M(f) \le \ell_\infty(f) \le \prod_{j=1}^{n} \binom{d_j}{\lfloor d_j/2 \rfloor} M(f).$$
Proof: The same proof as in Lemma 1.6.7 holds for the inequality on the left. We
prove the other assertion by induction on $n$. We can write uniquely
$$f(t_1, \ldots, t_n) = \sum_{j=0}^{d_n} f_j(t_1, \ldots, t_{n-1})\, t_n^j$$
$$\int_{\mathbb{T}^{n-1}} \log \max_j |f_j(e^{i\theta_1}, \ldots, e^{i\theta_{n-1}})| \, d\mu_1 \cdots d\mu_{n-1} - \log \binom{d_n}{\lfloor d_n/2 \rfloor}$$
by Lemma 1.6.7, and which in turn is not smaller than
$$\max_j \int_{\mathbb{T}^{n-1}} \log |f_j(e^{i\theta_1}, \ldots, e^{i\theta_{n-1}})| \, d\mu_1 \cdots d\mu_{n-1} - \log \binom{d_n}{\lfloor d_n/2 \rfloor}.$$
∗
The inequality states that a determinant of a real matrix is majorized by the product of the
euclidean lengths of its rows. Geometrically, it says that the volume of a parallelepiped generated
by real vectors v1 , . . . , vn of given length is maximal when the vectors vi are pairwise orthogonal,
which is quite easy to prove. The result also holds for complex matrices. Hadamard’s proof of 1893 can
be found, among an interesting analysis of extremal cases with entries ±1 (the so-called Hadamard’s
matrices), in [143]. The result was known much earlier to Lord Kelvin and was proved by T. Muir in
1885, see [209], p.32.
We conclude that
$$M(f) \ge \binom{d_n}{\lfloor d_n/2 \rfloor}^{-1} \max_j M(f_j)$$
and the induction hypothesis implies the claim.
As remarked above, Lemma 1.6.7 leads to Gelfond’s lemma:
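Lemma 1.6.11. Let $f_1, \ldots, f_m$ be complex polynomials in $n$ variables and set
$f := f_1 \cdots f_m$. Then
$$2^{-d} \prod_{j=1}^{m} \ell_\infty(f_j) \le \ell_\infty(f) \le 2^{d} \prod_{j=1}^{m} \ell_\infty(f_j),$$
with
$$C = \prod_{j=1}^{m-1} \prod_{k=1}^{n} \bigl(1 + d_k^{(j)}\bigr) \le 2^d.$$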
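Gelfond's lemma can be tried out on polynomials in one variable with integer coefficients. The sketch below assumes that $d$ is the sum of the partial degrees of the product $f$, which in one variable is just $\deg f$; the coefficient-list representation and the helper names are ours.

```python
from functools import reduce


def poly_mul(a, b):
    """Product of two polynomials given as coefficient lists, lowest degree first."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def sup_norm(p):
    """Maximum absolute value of the coefficients."""
    return max(abs(c) for c in p)


# 1 - t, 1 + t + t^2 and 2 - 3t^2 + t^3
factors = [[1, -1], [1, 1, 1], [2, 0, -3, 1]]
f = reduce(poly_mul, factors)
d = len(f) - 1
product_of_norms = 1
for p in factors:
    product_of_norms *= sup_norm(p)
lower = product_of_norms / 2 ** d
upper = product_of_norms * 2 ** d
print(lower, sup_norm(f), upper)
```

In this example the product is $2 - 3t^2 - t^3 + 3t^5 - t^6$, whose largest coefficient has absolute value 3, well inside the interval $[3/64, 192]$ allowed by the lemma.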
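In the other direction, Lemma 1.6.10 implies
$$\prod_{j=1}^{m} \ell_\infty(f_j) \le \Biggl( \prod_{j=1}^{m} \prod_{k=1}^{n} \binom{d_k^{(j)}}{\lfloor d_k^{(j)}/2 \rfloor} \Biggr) \Biggl( \prod_{k=1}^{n} \Bigl( 1 + \sum_{j=1}^{m} d_k^{(j)} \Bigr)^{1/2} \Biggr) \ell_\infty(f).$$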
Remark 1.6.14. For the upper bound, only the sum $d$ of the partial degrees of
$f_1 \cdots f_{m-1}$ does matter. In fact, the proof of Gelfond's lemma shows
$$|f|_v \le \Biggl( \prod_{j=1}^{m-1} \prod_{k=1}^{n} |1 + d_k^{(j)}|_v \Biggr) \prod_{j=1}^{m} |f_j|_v \le |2|_v^{d} \prod_{j=1}^{m} |f_j|_v$$
for any archimedean place $v$ of a number field containing all the coefficients. Then
$$h(f) \le \sum_{j=1}^{m} h(f_j) + \sum_{j=1}^{m-1} \sum_{k=1}^{n} \log\bigl(1 + d_k^{(j)}\bigr) \le \sum_{j=1}^{m} h(f_j) + d \log 2,$$
†
This polynomial already appears in Lehmer’s paper loc. cit., with a slightly different numerical
value which we have corrected here.
1.7. Norms of products of polynomials 29
We elaborate here further on the question of lower bounds for norms of products of poly-
nomials. The interesting question is to obtain lower bounds which are proportional to the
product of the norms, as in Gelfond’s lemma of the preceding section. It turns out that for
certain natural norms the constants involved in such lower bounds depend only on the de-
grees of the polynomials, not on the number of variables. This section will not be needed at
other places of the book.
1.7.1. Let us denote by $\ell_p(f)$ the $\ell_p$-norm of the coefficients of a complex polynomial $f$.
We shall prove that for $p = 1$ and $p = 2$ the $\ell_p$-norm has the properties mentioned above.
This result extends to all p , 1 ≤ p < ∞ , but we will not prove this extension here. The
more difficult case p = 1 is due to Enflo, who used it in his work on invariant subspaces of
bounded operators in Banach spaces.
Theorem 1.7.2. Let $d, e \in \mathbb{N}$. Then there is a constant $C(d, e) > 0$ such that
$$\ell_1(fg) \ge C(d, e)\, \ell_1(f)\, \ell_1(g)$$
for complex polynomials $f, g$ of degree $d, e$ in several variables.
Proof: (H.L. Montgomery) For $k \in \mathbb{N}$, we define
$$C(d, e, k) := \inf \frac{\ell_1(P^k Q)}{\ell_1(P)^k\, \ell_1(Q)},$$
where the infimum ranges over all homogeneous polynomials $P, Q$ of degree $d, e$.
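We shall use in the sequel Euler's formula
$$\sum_j t_j \frac{\partial f}{\partial t_j} = d f$$
and
$$\sum_j \ell_1\Bigl( \frac{\partial f}{\partial t_j} \Bigr) = \sum_j \ell_1\Bigl( t_j \frac{\partial f}{\partial t_j} \Bigr) = d\, \ell_1(f).$$
Both are proved directly by looking at each monomial in $f$.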
Lemma 1.7.3. The following two estimates hold:
$$C(d, 0, k + 1) \ge C(d - 1, dk, 1)\, C(d, 0, k) \quad \text{if } d \ge 1,$$
$$C(d, e, k) \ge \frac{e}{2kd + e}\, C(d, e - 1, k + 1) \quad \text{if } e \ge 1.$$
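$$\begin{aligned}
\ell_1\Bigl( \frac{\partial}{\partial t_j} f^{k+1} \Bigr) &= (k + 1)\, \ell_1\Bigl( f^k \frac{\partial f}{\partial t_j} \Bigr) \\
&\ge (k + 1)\, C(d - 1, dk, 1)\, \ell_1\Bigl( \frac{\partial f}{\partial t_j} \Bigr) \ell_1(f^k) \\
&\ge (k + 1)\, C(d - 1, dk, 1)\, C(d, 0, k)\, \ell_1\Bigl( \frac{\partial f}{\partial t_j} \Bigr) \ell_1(f)^k.
\end{aligned}$$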
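$$\begin{aligned}
C(d, e - 1, k + 1)\, \ell_1(f)^{k+1}\, \ell_1\Bigl( \frac{\partial g}{\partial t_j} \Bigr) &\le \ell_1\Bigl( f^{k+1} \frac{\partial g}{\partial t_j} \Bigr) \\
&= \ell_1\Bigl( f \frac{\partial}{\partial t_j}(f^k g) - k f^k g \frac{\partial f}{\partial t_j} \Bigr) \\
&\le \ell_1(f)\, \ell_1\Bigl( \frac{\partial}{\partial t_j}(f^k g) \Bigr) + k\, \ell_1(f^k g)\, \ell_1\Bigl( \frac{\partial f}{\partial t_j} \Bigr).
\end{aligned}$$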
1.7.4. The double induction in the proof of the theorem is very expensive for the final estimates. Let us compute some of the constants so obtained. We define recursively $\Gamma(d, e, k)$
as follows
$$\Gamma(d, e, k) = \begin{cases} 1 & \text{if } d = 0 \text{ or } k = 0, \\ \Gamma(d - 1, d(k - 1), 1)\, \Gamma(d, 0, k - 1) & \text{if } e = 0 \text{ and } dk \ne 0, \\ \dfrac{e}{2kd + e}\, \Gamma(d, e - 1, k + 1) & \text{if } dek \ne 0. \end{cases}$$
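The recursion for $\Gamma(d, e, k)$ can be evaluated exactly with rational arithmetic. The sketch below reads the second and third cases as applying when $dk \ne 0$ and $dek \ne 0$ respectively, which is the reading that makes the three cases exhaustive and mirrors the two estimates of Lemma 1.7.3; the function name `gamma` is ours.

```python
from fractions import Fraction
from functools import lru_cache


@lru_cache(maxsize=None)
def gamma(d, e, k):
    """Exact value of Gamma(d, e, k) from the three-case recursion."""
    if d == 0 or k == 0:
        return Fraction(1)
    if e == 0:
        # Case e = 0 and d*k != 0.
        return gamma(d - 1, d * (k - 1), 1) * gamma(d, 0, k - 1)
    # Case d*e*k != 0.
    return Fraction(e, 2 * k * d + e) * gamma(d, e - 1, k + 1)


for args in [(1, 1, 1), (1, 2, 1), (2, 1, 1), (2, 2, 1)]:
    print(args, gamma(*args))
```

Already for $d = e = 2$ the value is of order $10^{-5}$, which illustrates how quickly the constants produced by the double induction deteriorate.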
The sum here runs over the lattice points in the hyperplane i1 + · · · + in = d of the n -
dimensional cube 0 ≤ iν ≤ d , ν = 1, . . . , n . Note that the number of lattice points in this
cube is (d + 1)n , growing exponentially in n for fixed d .
There is another way of writing the same polynomial, namely
$$f(t_1, \ldots, t_n) = \frac{1}{d!} \sum_{i_1=1}^{n} \cdots \sum_{i_d=1}^{n} \frac{\partial^d f}{\partial t_{i_1} \cdots \partial t_{i_d}}\, t_{i_1} \cdots t_{i_d};$$
we define this as the hypercube representation of f , since now the sum is indexed by the
lattice points of the d -dimensional cube 1 ≤ iδ ≤ n , δ = 1, . . . , d . The number of lattice
points in this cube is nd , which grows polynomially in n for fixed d .
The hypercube representation of a polynomial is very convenient if we want to study poly-
nomials of low degree in a large number of variables.
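The hypercube representation can be verified mechanically on a small example. In the sketch below a homogeneous polynomial is stored as a dictionary from exponent tuples to coefficients (a representation chosen only for this illustration), and the polynomial is rebuilt from its $d$-th partial derivatives by summing over the $n^d$ lattice points of the cube.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def derivative(poly, var):
    """Partial derivative of {exponent tuple: coefficient} with respect to t_var."""
    out = {}
    for exps, coeff in poly.items():
        if exps[var] > 0:
            new = list(exps)
            new[var] -= 1
            key = tuple(new)
            out[key] = out.get(key, 0) + coeff * exps[var]
    return out


def hypercube_representation(poly, n, d):
    """Rebuild a degree-d form in n variables from its d-th partial derivatives."""
    rebuilt = {}
    zero = (0,) * n
    for indices in product(range(n), repeat=d):   # lattice points of the cube
        partial = poly
        for i in indices:
            partial = derivative(partial, i)
        const = partial.get(zero, 0)              # d-th partials of a form are constants
        if const:
            exps = [0] * n
            for i in indices:
                exps[i] += 1
            key = tuple(exps)
            rebuilt[key] = rebuilt.get(key, 0) + Fraction(const, factorial(d))
    return {k: v for k, v in rebuilt.items() if v}


# f = 3 t1^2 t2 - 5 t3^3 + 7 t1 t2 t3, with n = 3 and d = 3
f = {(2, 1, 0): 3, (0, 0, 3): -5, (1, 1, 1): 7}
rebuilt = hypercube_representation(f, 3, 3)
print(rebuilt == f, len(list(product(range(3), repeat=3))))
```

The sum runs over $3^3 = 27$ index tuples here, and the rebuilt dictionary coincides with the original polynomial.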
1.7.6. Define for $p \ge 1$
$$[f]_p := \frac{1}{d!} \Biggl( \sum_{i_1=1}^{n} \cdots \sum_{i_d=1}^{n} \Bigl| \frac{\partial^d f}{\partial t_{i_1} \cdots \partial t_{i_d}} \Bigr|^p \Biggr)^{1/p}.$$
holds for all symmetrical functions $F \in L^p([0, 1]^d)$, $G \in L^p([0, 1]^e)$, with $\|\ \|_p$ denoting the $L^p$-norm.
Lemma 1.7.8. The constants $c_p(d, e)$ and $k_p(d, e)$ are related by
$$c_p(d, e) = \binom{d+e}{d} k_p(d, e).$$
The rest of the proof is an approximation argument. Consider the discretization i/n , i =
1, . . . , n of [0, 1] ; given continuous F , G on [0, 1]d and [0, 1]e , we approximate F , G
by step functions as above and construct corresponding polynomials f , g . As n → ∞ ,
these functions are dense in Lp ([0, 1]d ) and Lp ([0, 1]e ) .
Proposition 1.7.9. The constant $c_2(d, e)$ is
$$c_2(d, e) = \binom{d+e}{d}^{1/2}.$$
Proof: We may suppose that f , g are homogeneous. The first claim follows from
Lemma 1.7.8 and Proposition 1.7.9. The second follows from (a) and (1.11) on page 31.
The material in the first five sections of this chapter is quite standard and was
mainly taken from S. Lang [169] and J.-P. Serre [277]. However, the reader must
be warned that our normalization for absolute values does not always agree with
the normalization used by other authors. The rationale for our normalization is
that the degree [K : Q] does not appear in the first formula in Lemma 1.3.7, and
therefore it is absent in the definition of the absolute logarithmic height. This leads
to formulas invariant by field extensions.
The proof of Gelfond's lemma in Section 1.6 follows K. Mahler [187], where
the important Mahler's height is introduced. The inequality $M(f) \le \|f\|_{L^2(\mathbb{T})}$
appearing in the proof of Lemma 1.6.7 can be found in E. Landau [164], Satz 443,
with a somewhat different proof.
Section 1.7 is mostly from B. Beauzamy, E. Bombieri, P. Enflo, and H.L.
Montgomery [18].
2 WEIL HEIGHTS
2.1. Introduction
2.2. Local heights 35
lemma over Z given in 2.9.1 and its Corollary 2.9.2 over number fields, where the
constants are not made explicit, but which is quite often enough for applications.
The reader should be familiar with the concept of Cartier divisors and its connexion
to meromorphic sections of line bundles, as in A.8.
In this section we introduce local heights associated to Cartier divisors on a projec-
tive variety X . However, in order to define them properly we need additional data
beyond the divisor D itself, namely a realization O(D) = O(D+ ) ⊗ O(−D− )
with base-point-free line bundles O(D± ) coming with given sets of generating
global sections. The set of Cartier divisors equipped with these additional data
forms a monoid, and the local heights so defined behave functorially with respect
to this monoid. This removes the need of working modulo bounded functions when
studying Weil heights, a point of crucial importance for applications because it al-
lows precise estimates.
2.2.1. Let K be a field and let us fix an absolute value | | on K . Let X be a
projective variety over K , which for simplicity we assume here to be irreducible.
Let D be a Cartier divisor on X with associated line bundle O(D) and meromor-
phic section sD . For construction of O(D) and sD , see A.8.18. Note that the
associated Cartier divisor D(sD ) of sD is equal to D .
There are base-point-free line bundles $L, M$ on $X$ such that $O(D) \cong L \otimes M^{-1}$
(cf. A.6.10 (a)). Now choose generating global sections s0 , . . . , sn of L and
t0 , . . . , tm of M , and call the data
D = (sD ; L, s; M, t)
a presentation of the Cartier divisor D .
2.2.2. For $P \notin \mathrm{supp}(D)$, we define
$$\lambda_D(P) := \max_k \min_l \log \left| \frac{s_k}{t_l s_D}(P) \right|.$$
We use the notation tl sD for tl ⊗sD and sk /s for sk ⊗(s )−1 . Hence sk /(tl sD )
is a rational function on X .
We call λD (P ) the local height of P relative to the presentation D and, by abuse
of language, relative to D . In fact, it depends on the choice of sD as well as
on L, M and their generating sections. The local height is a real-valued function
defined outside of the support of the divisor D .
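To make the max-min formula concrete, the following sketch evaluates it from the absolute values of the rational functions $s_k/(t_l s_D)$ at a point. As an illustration we take the hyperplane $x_0 = 0$ in projective space over the rationals with the ordinary absolute value and the presentation $(x_0; O(1), x_0, \ldots, x_n; O, 1)$; this choice of example, and the function names, are assumptions of the sketch rather than something fixed by the text.

```python
from math import log


def safe_log(x):
    """log x, with log 0 interpreted as minus infinity."""
    return log(x) if x > 0 else float("-inf")


def local_height(ratios):
    """max_k min_l log ratios[k][l], where ratios[k][l] = |s_k / (t_l s_D)|(P)."""
    return max(min(safe_log(r) for r in row) for row in ratios)


def hyperplane_local_height(point):
    """Local height for D = {x_0 = 0}: s_D = x_0, s = (x_0, ..., x_n), t = (1)."""
    x0 = point[0]
    # There is a single section t_0 = 1, so every row has one entry |x_k / x_0|.
    ratios = [[abs(xk / x0)] for xk in point]
    return local_height(ratios)


value = hyperplane_local_height([2, -6, 1])
print(value, log(3))
```

For this presentation the local height is $\log \max_k |x_k/x_0|$, which is non-negative because the ratio for $k = 0$ equals 1.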
Example 2.2.3. Let f be a non-zero rational function on X with Cartier di-
visor D := D(f ). Then O(D) = OX and f is a meromorphic section of
Proof: It is enough to prove the claim for a refinement of $\{U_l\}$. Hence we can
assume that there are regular functions $h_l$ on $U$ such that $U_l = \{x \in U \mid h_l(x) \ne 0\}$, see A.2.10. By Lemma 2.2.7 there are regular functions $g_l$ on $U$ such that
$\sum_l g_l h_l = 1$. If $C$ is the cardinality of the covering and $\delta$ is as before, then
$$\inf_{P \in E} \max_l |h_l(P)| \ge C^{-\delta} \Bigl( \sup_{P \in E} \max_l |g_l(P)| \Bigr)^{-1} > 0. \qquad (2.2)$$
We define
$$E_l := \{P \in E \mid |h_l(P)| = \max_k |h_k(P)|\}.$$
Obviously, $E_l \subset U_l(K)$ and $E = \bigcup_l E_l$. Let $f_1, \ldots, f_N$ be a set of generators
of $K[U]$. Then $f_1, \ldots, f_N, 1/h_l$ are generators of $K[U_l]$. By Lemma 2.2.9, it is
enough to show that $|1/h_l|$ is bounded on $E_l$. In fact, the bound
$$\sup_{P \in E_l} |1/h_l(P)| \le C^{\delta} \sup_{P \in E} \max_k |g_k(P)| < \infty. \qquad (2.3)$$
for a certain finite set of elements pm ∈ K , independent of the absolute value | | and
determined exclusively in terms of geometric data.
In this section, starting from the local heights previously defined, we consider the
case in which K is a number field and define global heights.
2.3.1. Let X be an irreducible projective variety defined over K .
We consider a Cartier divisor D on X with presentation
D = (sD ; L, s; M, t).
Let $F$ be a number field with $K \subset F \subset \overline{K}$ and let $P \in X(F) \setminus \mathrm{supp}(D)$. For
$v \in M_F$, we define the local height
$$\lambda_D(P, v) := \max_k \min_l \log \left| \frac{s_k}{t_l s_D}(P) \right|_v$$
using our normalizations from 1.3.6. Let $p \in M_{\mathbb{Q}}$ be the restriction of $v$ to $\mathbb{Q}$
and let $|\ |_u$ be an absolute value on $\overline{K}$ such that the restriction to $F$ is equivalent
to $|\ |_v$ and such that the restriction to $\mathbb{Q}$ is equal to $|\ |_p$. The existence of $|\ |_u$
follows from Proposition 1.3.1. Then
$$\lambda_D(P, v) = \frac{[F_v : \mathbb{Q}_p]}{[F : \mathbb{Q}]}\, \lambda_D(P, u),$$
where $\lambda_D(P, u)$ is the local height relative to the absolute value $|\ |_u$ from 2.2.2.
This allows us to apply the results from Section 2.2 to λD (P, v).
This explains the name local height. This notion will be extended later to arbitrary
divisors.
2.3.3. We go back to the general case in 2.3.1. Let $\lambda_D$ be a local height relative to
the presentation $D = (s_D; L, \mathbf{s}; M, \mathbf{t})$ of a Cartier divisor $D$ on $X$. For $P \in X$
there are $s_j$ and $t_l$ such that $s_j(P) \ne 0$, $t_l(P) \ne 0$. Therefore, we can find a non-zero
meromorphic section $s$ of $O(D)$ such that $P$ is not contained in the support
of the Cartier divisor $D(s)$. Then $D(s) = (s; L, \mathbf{s}; M, \mathbf{t})$ is a presentation of
$D(s)$ and we have
$$\lambda_{D(s)} = \lambda_D + \lambda_f,$$
where $f$ is the rational function $s/s_D$. If $F$ is a finite extension $K \subset F \subset \overline{K}$
such that $P \in X(F)$, the local height $\lambda_{D(s)}(P, v)$ is finite for any $v \in M_F$,
because $P$ is not in the support of $D(s)$. Then we define the global height of $P$
relative to $\lambda := \lambda_D$ by
$$h_\lambda(P) := \sum_{v \in M_F} \lambda_{D(s)}(P, v).$$
The next result justifies the definition and the name global height.
Proposition 2.3.4. The global height hλ is independent of the choices of F and
of the section s .
Proof: By Lemma 1.3.7, the global height is independent of $F$. Its independence
from the choice of $s$ can be verified as follows. Let $t$ be another non-zero meromorphic
section of $O(D)$ with $P \notin \mathrm{supp}(D(t))$. Then 2.2.4 and 2.2.5 show
that
$$\lambda_{D(s)}(P, v) - \lambda_{D(t)}(P, v) = \lambda_{s/t}(P, v)$$
for any v ∈ MF . On the other hand, the product formula shows that the global
height of P relative to λs/t is 0, proving the claim.
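The role of the product formula in this argument can be seen in the simplest case, the standard height on projective space over the rationals: rescaling the coordinates of a point changes each local contribution, but the sum over all places stays the same. The sketch below is only an analogue of the proposition under that assumption; the helper names are ours, and the $p$-adic absolute value is normalized by $|p|_p = 1/p$.

```python
from fractions import Fraction
from math import log


def prime_factors(n):
    """Set of primes dividing the integer n (empty for n = 1 or -1)."""
    n, out, p = abs(n), set(), 2
    while p * p <= n:
        while n % p == 0:
            out.add(p)
            n //= p
        p += 1
    if n > 1:
        out.add(n)
    return out


def p_adic_abs(x, p):
    """|x|_p for a rational number x."""
    x = Fraction(x)
    if x == 0:
        return 0.0
    num, den, val = x.numerator, x.denominator, 0
    while num % p == 0:
        num //= p
        val += 1
    while den % p == 0:
        den //= p
        val -= 1
    return float(p) ** (-val)


def height_by_places(point):
    """Sum over all places v of log max_i |x_i|_v for a non-zero rational point."""
    coords = [Fraction(c) for c in point]
    primes = set()
    for c in coords:
        if c != 0:
            primes |= prime_factors(c.numerator) | prime_factors(c.denominator)
    total = log(max(abs(c) for c in coords))          # archimedean place
    for p in primes:                                   # finite places with a contribution
        total += log(max(p_adic_abs(c, p) for c in coords))
    return total


P = [Fraction(1, 2), 3, Fraction(5, 7)]
scaled = [c * Fraction(-14, 3) for c in P]
print(height_by_places(P), height_by_places(scaled), log(42))
```

Both calls return $\log 42$: the factor $-14/3$ contributes $\log(14/3)$ at the archimedean place and $-\log(14/3)$ in total at the places 2, 3 and 7, in accordance with the product formula.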
Remark 2.3.5. As an immediate consequence the global height relative to the
natural local height of a non-zero rational function is identically 0. It is also clear
that the map λ → hλ is a group homomorphism.
Theorem 2.3.6. Let $\lambda, \lambda'$ be local heights relative to Cartier divisors $D, D'$ with
$D - D'$ a principal divisor. Then $h_\lambda - h_{\lambda'}$ is a bounded function.
and we get
$$|h_\lambda(P)| \le \sum_{w \in M_F} |\lambda(P, w)| \le \sum_{v \in M_K} \frac{[K_v : \mathbb{Q}_p]}{[K : \mathbb{Q}]}\, \gamma_v < \infty.$$
Proof: The first claim follows from Remark 2.3.5 and Theorem 2.3.6. The second
one is an immediate consequence of 2.2.6.
It is quite trivial, but important, to remark that a base-point-free line bundle has
always a non-negative height function. A more general result is the following
Proposition 2.3.9. Let $D$ be an effective Cartier divisor on $X$. Then there is a
local height $\lambda$ relative to $D$ such that, for any $P \notin \mathrm{supp}(D)$ and for any place $u$
of $\overline{K}$, we have $\lambda(P, u) \ge 0$.
Proof: There are base-point-free line bundles $L, M$ on $X$ such that $O(D) \cong L \otimes M^{-1}$. Choose generating global sections $t_0, \ldots, t_l$ of $M$. We can complete
sD t0 , . . . , sD tl to a family s0 , . . . , sk of generating global sections of L. The
local height given by the presentation
D = (sD ; L, s; M, t)
is non-negative outside of the support of D .
2.3.10. The results of Sections 2.2 and 2.3 extend immediately to varieties which
are not necessarily irreducible. Here we must be careful to require that all mero-
morphic sections considered are invertible, i.e. not identically 0 on any irreducible
component of X . For the functorial property of 2.2.6, we must assume that no ir-
reducible component of Y is mapped into the support of D , in order to guarantee
a well-defined pull-back of the Cartier divisor.
2.3.11. We may introduce global heights for any field with product formula as long
as we work with properly normalized absolute values (see 1.3.6 for a perfect field
and 1.3.12 in general). Then all results of this section continue to hold.
2.3.12. We may also replace the ground field $K$ by $\overline{K}$. Then all geometric data
as varieties, morphisms, line bundles, and sections are defined over a sufficiently
large number field $K$ and there is no problem about considerations with global
heights relative to the ground field $K$. Since the global height does not depend
on the ground field, it also makes sense to consider it as a global height over the
algebraically closed field $\overline{K}$.
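2.4.2. If $\psi : X \to \mathbb{P}^m_{\overline{\mathbb{Q}}}$ is another morphism over $\overline{\mathbb{Q}}$, the join $\varphi \# \psi$ is the
morphism
$$X \longrightarrow \mathbb{P}^{(n+1)(m+1)-1}_{\overline{\mathbb{Q}}}, \qquad x \longmapsto (\varphi_j(x) \psi_k(x)),$$
with the lexicographic ordering on pairs $(j, k)$.
It may be viewed as the composition of the graph morphism $G(\psi) : X \to X \times \mathbb{P}^m_{\overline{\mathbb{Q}}}$,
the product map $\varphi \times \mathrm{id} : X \times \mathbb{P}^m_{\overline{\mathbb{Q}}} \to \mathbb{P}^n_{\overline{\mathbb{Q}}} \times \mathbb{P}^m_{\overline{\mathbb{Q}}}$, and the Segre embedding
$\mathbb{P}^n_{\overline{\mathbb{Q}}} \times \mathbb{P}^m_{\overline{\mathbb{Q}}} \to \mathbb{P}^{(n+1)(m+1)-1}_{\overline{\mathbb{Q}}}$ (cf. A.6.4).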
Remark 2.4.3. If ϕ is a closed embedding, then ϕ#ψ is a closed embedding.
In order to prove this claim, note that G(ψ) is always a closed embedding (see
A. Grothendieck [134], Cor.5.4.3). If ϕ is a closed embedding, then ϕ × id is a
closed embedding ([134], Prop.4.3.1). The Segre embedding is also a closed em-
bedding (cf. A.6.4). Since the composition of closed embeddings remains a closed
embedding ([134], Prop.4.2.5), we conclude that ϕ#ψ is a closed embedding.
The following proposition formalizes a remark already made in 1.5.14 about the
height in Segre embeddings.
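Proposition 2.4.4. If $\varphi : X \to \mathbb{P}^n_{\overline{\mathbb{Q}}}$ and $\psi : X \to \mathbb{P}^m_{\overline{\mathbb{Q}}}$ are morphisms over $\overline{\mathbb{Q}}$,
then
$$h_{\varphi \# \psi} = h_\varphi + h_\psi.$$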
2.4.5. We claim that every Weil height may be viewed as a global height in the
sense of Section 2.3. There is a linear form $\ell = \ell_0 x_0 + \cdots + \ell_n x_n$, which does
not vanish identically on any irreducible component of $X$. Then it follows from
Example 2.3.2 and 2.2.6 that $h_\varphi$ is the global height relative to the presentation
$$\varphi^*\bigl(\ell;\ O_{\mathbb{P}^n_{\overline{\mathbb{Q}}}}(1), x_0, \ldots, x_n;\ O_{\mathbb{P}^n_{\overline{\mathbb{Q}}}}, 1\bigr).$$
2.4.6. Conversely, we can write every global height as a difference of two Weil
heights. Let hλ be the global height relative to the presentation
D = (s; L, s0 , . . . , sn ; M, t0 , . . . , tm ).
We consider the morphisms
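$$\varphi : X \to \mathbb{P}^n_{\overline{\mathbb{Q}}}, \qquad x \longmapsto (s_0(x) : \cdots : s_n(x))$$
and
$$\psi : X \to \mathbb{P}^m_{\overline{\mathbb{Q}}}, \qquad x \longmapsto (t_0(x) : \cdots : t_m(x))$$
as in A.6.8. Then it follows from the independence of $h_\lambda$ from $s$ and 2.4.5 that
$$h_\lambda = h_\varphi - h_\psi.$$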
2.4.7. Note that in 2.4.6 we may even assume that ϕ and ψ are closed embeddings
into projective spaces. This follows from Remark 2.3.5 and Proposition 2.4.4,
choosing any closed embedding θ of X into some projective space over Q and
replacing ϕ , ψ by ϕ#θ , ψ#θ .
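Theorem 2.4.8. If $\varphi : X \to \mathbb{P}^n_{\overline{\mathbb{Q}}}$ and $\psi : X \to \mathbb{P}^m_{\overline{\mathbb{Q}}}$ are morphisms over $\overline{\mathbb{Q}}$ with
$\varphi^* O_{\mathbb{P}^n}(1) \cong \psi^* O_{\mathbb{P}^m}(1)$, then $h_\varphi - h_\psi$ is a bounded function.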
Proof: Using 2.4.5, this is a reformulation of Theorem 2.3.6.
Our next result is the general version of Northcott’s theorem, which is both sim-
ple and fundamental.
Theorem 2.4.9. Let $X$ be a projective variety defined over the number field $K$
and let $h_c$ be a height function associated to an ample class $c \in \mathrm{Pic}(X)$. Then
the set
$$\{P \in X(\overline{K}) \mid h_c(P) \le C,\ [K(P) : K] \le d\}$$
is finite for any constants $C, d \in \mathbb{R}$.
Proof: There is m ∈ N such that mc is very ample. By Theorem 2.3.8, mhc
is a height function associated to mc . Therefore, we can assume without loss
of generality that $c$ is very ample. By Theorem 2.4.8, it is enough to prove the
statement for $X = \mathbb{P}^n_{\overline{\mathbb{Q}}}$ and $c = \mathrm{cl}(O_{\mathbb{P}^n}(1))$, i.e. for the standard height on $\mathbb{P}^n_{\overline{\mathbb{Q}}}$.
Let $U := \{x_j \ne 0\}$ be a standard affine subset of $\mathbb{P}^n_{\overline{\mathbb{Q}}}$. We have to show that there
are only finitely many points $P$ in $U(\overline{\mathbb{Q}})$ with $h(P) \le C$ and $[K(P) : K] \le d$. The height of $P \in U$ is an upper bound for the heights of the coordinates.
Therefore, the case n = 1 implies the general statement. This is Theorem 1.6.8,
ending the proof.
Remark 2.4.10. Clearly, we may also introduce Weil heights for any field with
product formula and all results above remain true with the exception of Northcott’s
theorem. We may use Example 1.5.23 as a counterexample if the field is infinite.
Example 2.4.11. The following example shows that Weil heights in the geometric case may
be interpreted in terms of intersection theory, as a degree function. This is conceptually very
important, because it allows us to use the intuition and methods of algebraic geometry in
dealing with heights.
The corresponding result in the arithmetic case lies much deeper and requires intersection
theory in the setting of arithmetic algebraic geometry (see Example 2.7.20).
Let X be an irreducible regular projective variety over an arbitrary field K , and let deg be
the degree of cycles corresponding to a fixed embedding of X into a projective space PnK .
By Proposition 1.4.7, we have a canonical set of absolute values on K(X) satisfying the
product formula. A point P ∈ PnK (K(X)) is given by coordinates f0 , . . . , fn ∈ K(X) .
Let $\varphi$ be the rational map
$$X \dashrightarrow \mathbb{P}^n_K, \qquad x \longmapsto \varphi(x) = (f_0(x) : \cdots : f_n(x)).$$
Let $x_0, \ldots, x_n$ be the coordinates of $\mathbb{P}^n_K$, viewed as global sections of $O_{\mathbb{P}^n}(1)$. Choose
$j \in \{0, \ldots, n\}$ such that $x_j|_{\varphi(X)} \ne 0$. Then the vector $(f_0, \ldots, f_n)$ is proportional to
$$(\varphi^* x_0/\varphi^* x_j, \ldots, \varphi^* x_n/\varphi^* x_j) \in K(X)^{n+1}$$
and we may assume that they are equal. By Example 1.5.23, we have
$$h(P) = -\sum_Z \min_{i=0,\ldots,n} \mathrm{ord}_Z(f_i) \deg Z$$
where the sums range over all prime divisors Z of X . By the valuative criterion of proper-
ness (cf. A.11.10), the domain U of ϕ has a complement of codimension at least 2 . The
local ring associated to a prime divisor was introduced in A.8.7. By choosing a trivialization
of (ϕ|U )∗ OPn (1) at a generic point of Z , we may view ϕ∗ (xi ) as regular functions in Z .
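Therefore, we have
$$\min_{i=0,\ldots,n} \mathrm{ord}_Z(\varphi^* x_i) = 0$$
and thus
$$h(P) = \sum_Z \mathrm{ord}_Z(\varphi^* x_j) \deg Z.$$
Since $X \setminus U$ is of codimension at least 2, the restriction map induces an isomorphism
$$\mathrm{Pic}(X) \xrightarrow{\ \sim\ } \mathrm{Pic}(U)$$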
(because on a regular variety Cartier divisors and Weil divisors can be identified, cf. A.8.21).
So it makes sense to view (ϕ|U )∗ OPn (1) as an element of Pic (X) , which we simply
denote by ϕ∗ OPn (1) . It follows that
h(P ) = deg ϕ∗ OPn (1),
where the right-hand side denotes the degree of any divisor of a non-zero meromorphic
section of ϕ∗ OPnK (1) (see A.9.26). If Y is a projective variety over K(X) and ι : Y →
PnK (X ) is a closed embedding over K(X) into projective space, then P ∈ Y (K(X))
induces a rational map
$$\varphi : X \dashrightarrow Y$$
as above, and we have
hι (P ) = deg ϕ∗ OY (1),
where OY (1) is the pull-back of OPn (1) to Y .
This is a somewhat technical section, the reading of which can be omitted at first. Its ulti-
mate purpose is to give a meaning to the phrase “effectively computable,” which otherwise
would be only a hollow claim, devoid of true mathematical significance.
The main tool in this section is the concept of presentation of a closed embedding of a
projective variety in projective space. The basic idea can be described as follows. Let
X → PnQ be a projective algebraic variety over the algebraically closed field Q , embedded
in projective space PnQ . It is well known that every rational function on X is then induced
by restriction of a rational function in the ambient space PnQ . On the other hand, we often
need to compare situations relative to different embeddings. The point of view taken in this
section is therefore the following. Since we are dealing with the function field Q(X) of X ,
we are allowed to choose a hypersurface in $\mathbb{P}^{r+1}_{\overline{\mathbb{Q}}}$ as a birational model. The homogeneous
coordinate ring S is the quotient of a polynomial ring by a principal ideal. This allows us to
introduce a height in S . Thus fixing this choice gives us a reference description of elements
in S .
This being done, we consider an arbitrary closed embedding X → PnQ . Then a presentation
of the embedding X → PnQ , relative to the reference ring S , consists in expressing the
rational functions (xi /xj )|X as elements of S . Now the problem of comparing heights
relative to different embeddings can be solved by comparing corresponding presentations.
This leads to very explicit comparison estimates for heights. The details are as follows.
2.5.1. Let $X$ be an irreducible projective variety over $\overline{\mathbb{Q}}$ of dimension $r$. There is a $\overline{\mathbb{Q}}$-morphism
$$\pi : X \longrightarrow \mathbb{P}^{r+1}_{\overline{\mathbb{Q}}}$$
such that $X$ is mapped birationally onto a hypersurface (cf. A.11.5 and A.11.6). We
denote by $z_0, \ldots, z_{r+1}$ the standard coordinates of $\mathbb{P}^{r+1}_{\overline{\mathbb{Q}}}$. Then we may assume that the
hypersurface is given by an irreducible homogeneous polynomial $f$ of degree $d$ of the form
$$f(z_0, \ldots, z_{r+1}) = f_0 + f_1 z_{r+1} + \cdots + f_{d-1} z_{r+1}^{d-1} + z_{r+1}^{d},$$
where $f_i \in \overline{\mathbb{Q}}[z_0, \ldots, z_r]$ is homogeneous of degree $d - i$, $f(0, \ldots, 0, 1) \ne 0$ and $d$ is
the degree of $X$ with respect to $\pi^* O_{\mathbb{P}^{r+1}}(1)$ (cf. A.11.7). This situation is fixed for the
whole section.
2.5.2. Let $S$ be the homogeneous coordinate ring of $\pi(X)$. We have
$$S = \overline{\mathbb{Q}}[z_0, \ldots, z_{r+1}]/I,$$
where $I$ is the homogeneous ideal generated by $f$. Let $\overline{z}_i$ be the image of $z_i$ in $S$ ($0 \le i \le r + 1$) and note that $\overline{z}_{r+1}$ is integral over $\overline{\mathbb{Q}}[\overline{z}_0, \ldots, \overline{z}_r]$. The variables $\overline{z}_0, \ldots, \overline{z}_r$
are algebraically independent, because the transcendence degree of $\overline{\mathbb{Q}}(\pi(X)) = \overline{\mathbb{Q}}(X)$ is
$r$ (cf. A.4.11). By abuse of notation, we denote them again by $z_0, \ldots, z_r$. The minimal
polynomial of $\overline{z}_{r+1}$ over the polynomial ring $\overline{\mathbb{Q}}[z_0, \ldots, z_r]$ is equal to $f(z_0, \ldots, z_r, \cdot)$,
since
$$0 = f_0 + f_1 \overline{z}_{r+1} + \cdots + f_{d-1} \overline{z}_{r+1}^{d-1} + \overline{z}_{r+1}^{d}. \qquad (2.5)$$
The elements $1, \overline{z}_{r+1}, \ldots, \overline{z}_{r+1}^{d-1}$ form a basis of $S$ over $\overline{\mathbb{Q}}[z_0, \ldots, z_r]$ and so we have an
isomorphism of $\overline{\mathbb{Q}}$-vector spaces
$$S \xrightarrow{\ \sim\ } \{p \in \overline{\mathbb{Q}}[z_0, \ldots, z_{r+1}] \mid \deg_{z_{r+1}}(p) < d\}.$$
By means of this map, we define the height of an element of $S$ as the height of the corresponding polynomial.
2.5.3. For $l \in \mathbb{N}$, there are uniquely determined $q_{lj} \in \overline{\mathbb{Q}}[z_0, \ldots, z_r]$ ($j = 0, \ldots, d - 1$)
such that
$$\overline{z}_{r+1}^{\,l} = \sum_{j=0}^{d-1} q_{lj}\, \overline{z}_{r+1}^{\,j}. \qquad (2.6)$$
The polynomials $q_{lj}$ are homogeneous of degree $l - j$ (elements of negative degree are 0),
and $q_{lj} = \delta_{lj}$ for $0 \le l \le d - 1$, where $\delta_{lj}$ is Kronecker's symbol. We may now assume
that $l \ge d$. Equation (2.5) shows
$$\overline{z}_{r+1}^{\,l} = -\sum_{k=0}^{d-1} f_k\, \overline{z}_{r+1}^{\,k+l-d} = -\sum_{j=0}^{d-1} \sum_{k=0}^{d-1} f_k\, q_{k+l-d,j}\, \overline{z}_{r+1}^{\,j},$$
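The recursion $q_{lj} = -\sum_{k} f_k\, q_{k+l-d,j}$ implicit in this computation is easy to run in a simplified setting. In the sketch below the coefficients $f_k$ are taken to be rational numbers instead of homogeneous polynomials in $z_0, \ldots, z_r$ (a dehomogenized, one-variable toy case chosen for illustration), so that the table $q$ simply records the reduction of the powers of a root of a monic polynomial.

```python
from fractions import Fraction


def reduction_table(f_low, max_power):
    """q[l][j] with z^l = sum_j q[l][j] z^j modulo the monic f.

    f_low = [f_0, ..., f_{d-1}] are the non-leading coefficients of f.
    """
    d = len(f_low)
    q = []
    for l in range(max_power + 1):
        if l < d:
            # q_lj is Kronecker's symbol for l < d.
            row = [Fraction(int(j == l)) for j in range(d)]
        else:
            # q_lj = - sum_k f_k q_{k+l-d, j}; all indices k+l-d are smaller than l.
            row = [-sum(Fraction(f_low[k]) * q[k + l - d][j] for k in range(d))
                   for j in range(d)]
        q.append(row)
    return q


table = reduction_table([-2, 0], 4)      # f = z^2 - 2
cyclotomic = reduction_table([1, 1], 3)  # f = z^2 + z + 1
print(table[3], table[4], cyclotomic[3])
```

For $f = z^2 - 2$ the table gives $z^3 = 2z$ and $z^4 = 4$, and for $f = z^2 + z + 1$ it gives $z^3 = 1$, as expected.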
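Lemma 2.5.6. Let $\varphi_j : X \to \mathbb{P}^{n_j}_{\overline{\mathbb{Q}}}$, $j = 1, \ldots, k$, be closed embeddings over $\overline{\mathbb{Q}}$ with
presentations $p^{(j)}$ and let $n := (n_1 + 1) \cdots (n_k + 1) - 1$. Then the join $\varphi_1 \# \cdots \# \varphi_k$
gives a closed embedding
$$\varphi : X \longrightarrow \mathbb{P}^n_{\overline{\mathbb{Q}}}, \qquad P \longmapsto \bigl(\varphi_{1 i_1}(P) \cdots \varphi_{k i_k}(P)\bigr)_{i_j \in \{0, \ldots, n_j\}}.$$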
with
C = (d − 1)h(f ) + d(d + r + 1).
Proof: By Remark 2.4.3, ϕ is a closed embedding. Also, p is a presentation of ϕ of degree
d(p) = d(p(1) ) + · · · + d(p(k) ).
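To prove the estimate for the height, by induction we may assume $k = 2$. We have the
decomposition
$$p^{(j)}_{i_j} = \sum_{m=0}^{d-1} p^{(j)}_{i_j m}\, \overline{z}_{r+1}^{\,m} \qquad (j = 1, 2),$$
whence
$$p_i = \Bigl( \sum_{m_1=0}^{d-1} p^{(1)}_{i_1 m_1}\, \overline{z}_{r+1}^{\,m_1} \Bigr) \Bigl( \sum_{m_2=0}^{d-1} p^{(2)}_{i_2 m_2}\, \overline{z}_{r+1}^{\,m_2} \Bigr).$$
with
$$p_{im} := \sum_{m_1 + m_2 = m} p^{(1)}_{i_1 m_1} p^{(2)}_{i_2 m_2} + \sum_{l=d}^{2d-2} \sum_{\substack{m_1 + m_2 = l \\ m_1, m_2 \le d-1}} p^{(1)}_{i_1 m_1} p^{(2)}_{i_2 m_2}\, q_{lm}.$$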
Thereby we have used the fact that the number of monomials of degree $D$ in $r + 1$ variables
is equal to $\binom{r+D}{D}$. We use the estimates
$$\binom{r + d(p^{(1)}) - m_1}{r} \le \frac{1}{r!} \bigl( r + d(p^{(1)}) \bigr)^r < \bigl( 3 + 3\, d(p^{(1)})/r \bigr)^r$$
and
$$\binom{r + l - m}{r} \le 2^{r+l-m}$$
to conclude that
$$B := d\, 2^{r+2d} \bigl( 3 + 3\, d(p^{(1)})/r \bigr)^r$$
with
C := (d − 1)h(f ) + d(d + r + 1).
Remark 2.5.7. If we work over a fixed number field K and with an irreducible reduced
projective variety, then the constructions in 2.5.1 to 2.5.3 can be done over K and every
K-morphism to projective space has a presentation defined over K . Moreover, Lemma
2.5.6 remains valid.
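$x^\alpha := x_0^{\alpha_0} \cdots x_N^{\alpha_N}$.
If $Y$ is a closed subvariety of $\mathbb{P}^N_{\overline{\mathbb{Q}}}$, we denote by $J_Y$ the ideal sheaf of $Y$.
$$\varphi^* O_{\mathbb{P}^n}(1) \cong \psi^* O_{\mathbb{P}^m}(1).$$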
If k ≥ kψ and χ(k) := dim H 0 (X, ψ ∗ OPm (k)) and P ∈ X , then
hϕ (P ) − hψ (P ) ≤ (n + 1)χ(k) h(p) + h(q)
1
+ log((n + 1)χ(k)) + C ,
k
where
C := (d − 1)h(f ) + d(d + r + 1).
Proof: In order to understand the proof, the reader should be familiar with some basic facts
from cohomology of sheaves, as given in A.10.
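The existence of $k_\psi$ is a well-known result (see A.10.27). There is a short exact sequence
$$0 \longrightarrow J_{\psi X} \longrightarrow O_{\mathbb{P}^m} \longrightarrow \psi_* O_X \longrightarrow 0$$
of coherent sheaves on $\mathbb{P}^m_{\overline{\mathbb{Q}}}$. Tensoring with $O_{\mathbb{P}^m}(k)$ yields a short exact sequence
as we have verified in Example A.10.20. Using A.10.25, the first terms of the long exact
cohomology sequence are
$$0 \to H^0\bigl(\mathbb{P}^m_{\overline{\mathbb{Q}}}, J_{\psi X} \otimes O_{\mathbb{P}^m}(k)\bigr) \to H^0\bigl(\mathbb{P}^m_{\overline{\mathbb{Q}}}, O_{\mathbb{P}^m}(k)\bigr) \to H^0\bigl(X, \psi^* O_{\mathbb{P}^m}(k)\bigr) \to H^1\bigl(\mathbb{P}^m_{\overline{\mathbb{Q}}}, J_{\psi X} \otimes O_{\mathbb{P}^m}(k)\bigr) \to \cdots.$$
The last cohomology group is 0 by the choice of $k$, and we infer that the map
$$H^0\bigl(\mathbb{P}^m_{\overline{\mathbb{Q}}}, O_{\mathbb{P}^m}(k)\bigr) \longrightarrow H^0\bigl(X, \psi^* O_{\mathbb{P}^m}(k)\bigr)$$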
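is surjective. By assumption, the invertible sheaves $\varphi^* O_{\mathbb{P}^n}(1)$ and $\psi^* O_{\mathbb{P}^m}(1)$ are isomorphic and we may identify them. Let $x = (x_0 : \cdots : x_n)$ and $y = (y_0 : \cdots : y_m)$ be the
standard coordinates of $\mathbb{P}^n_{\overline{\mathbb{Q}}}$ and $\mathbb{P}^m_{\overline{\mathbb{Q}}}$, and choose $B \subset \{\beta \in \mathbb{N}^{m+1} \mid |\beta| = k\}$ such that
$$\bigl( y^\beta|_X \bigr)_{\beta \in B}$$
is a basis of $H^0(X, \psi^* O_{\mathbb{P}^m}(k))$. There are uniquely determined $a_{i\beta} \in \overline{\mathbb{Q}}$ such that
$$x_i^k|_X = \sum_{\beta \in B} a_{i\beta}\, y^\beta|_X. \qquad (2.11)$$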
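Let $P \in X(\overline{\mathbb{Q}})$. Choose a number field $F$ containing $x_i(P)$, $y_j(P)$ and $a_{i\beta}$ for all $i, j, \beta$.
By Proposition 2.4.4, we obtain, for the $k$-fold join $\psi^{(k)} := \psi \# \cdots \# \psi$, the equation
$$\begin{aligned}
k\bigl(h_\varphi(P) - h_\psi(P)\bigr) &= \sum_{v \in M_F} \log \max_i |x_i^k(P)|_v - h_{\psi^{(k)}}(P) \\
&= \sum_{v \in M_F} \log \max_i |x_i^k(P)|_v - \sum_{v \in M_F} \log \max_{|\beta| = k} |y^\beta(P)|_v.
\end{aligned}$$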
Conversely, assume that $(a_{i\beta})$ is a non-trivial solution of this equation. Then we have
$$(x_i|_X)^k \sum_{\beta \in B} a_{l\beta}\, (y|_X)^\beta = (x_l|_X)^k \sum_{\beta \in B} a_{i\beta}\, (y|_X)^\beta.$$
Let $i$ be such that $x_i|_X$ is not identically 0. Then the last displayed equation shows that
the rational function on $X$ defined by
$$g := \frac{\sum_{\beta \in B} a_{i\beta}\, y^\beta|_X}{x_i^k|_X}$$
does not depend on the index $i$. We claim that $g$ is constant. To prove this, it suffices
to show that $g$ is a regular function (use that $X$ is projective and A.6.15). Indeed, since
$x_0, \ldots, x_n$ generate $O_{\mathbb{P}^n}(1)$, we see that for any point $P \in X(\overline{\mathbb{Q}})$, there is an index $i$
such that $x_i(P) \ne 0$, hence $g$ is regular at $P$, as asserted.
This proves that the space of solutions of (2.13) is spanned by the matrix a = (aiβ ) given
by (2.11).
Our next task is to estimate h(a) . Since a scalar factor does not change the height, we may
estimate the height of any non-trivial solution of (2.13). By Lemma 2.5.6, we have a natural
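presentation of $\varphi^{(k)}\#\psi^{(k)}$ in terms of $p$ and $q$. The elements $p_i^k q_\beta$ of $S$ are entries of
that presentation. The decomposition
\[ p_i^k q_\beta = \sum_{j=0}^{d-1} c_{\beta ij}\, z_{r+1}^j \]
Let $c_{\beta ij} = \sum_\alpha c_{\beta ij\alpha}\, z_0^{\alpha_0}\cdots z_r^{\alpha_r}$, so that the coefficients $c_{\beta ij\alpha}$ of the polynomials $c_{\beta ij}$
form a matrix $c$ with
\[ h(c) \le k\bigl(h(p) + h(q) + r\log(6 + 6d(p)/r) + r\log(6 + 6d(q)/r) + C\bigr) \tag{2.15} \]
again by Lemma 2.5.6. Moreover, (2.14) is equivalent to the linear system of equations
\[ \sum_{\beta \in B} \bigl(c_{\beta ij\alpha}\, a_{l\beta} - c_{\beta lj\alpha}\, a_{i\beta}\bigr) = 0 \]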
indexed by i, l, j, α and unknowns aiβ . Let A denote the matrix associated to this linear
system; its entries are either 0 or ±cβijα . The number of unknowns is (n + 1)|B| and,
as remarked before, the space of solutions has dimension 1 . Therefore, the rank R of the
matrix A is
R = (n + 1)χ(k) − 1.
Let $A'$ be an $R \times (R+1)$ submatrix of $A$ of full rank $R$. Since $A'$ and $A$ have the
same kernel, we look for a non-zero solution $a$ of $A' \cdot a = 0$. We consider $a$ as a vector
$(a_0, \dots, a_R)$ and $A'$ as a matrix of the form $(A'_{\mu\nu})_{\mu \in \{1,\dots,R\},\ \nu \in \{0,\dots,R\}}$. Obviously, the
vector $a$ with $\rho$th entry
\[ a_\rho := (-1)^{\rho+1} \det\bigl(A'_{\mu\nu}\bigr)_{\mu \in \{1,\dots,R\},\ \nu \in \{0,\dots,R\}\setminus\{\rho\}} \]
is a non-zero solution of $A' \cdot a = 0$. The estimate
max |aρ |v ≤ |R!|δδ v max |cβijα |R
v
ρ=0,...,R β,i,j,α
1
+ r log(6 + 6d(p)/r) + r log(6 + 6d(q)/r) + C + log((n + 1)χ(k))
k
proving the proposition.
The upper bound in Proposition 2.5.9 is quite explicit except for χ(k) . The next lemmas
will handle this problem.
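Lemma 2.5.10. Let $\psi : X \to \mathbb{P}^m_{\overline{\mathbb{Q}}}$ be a closed embedding over $\overline{\mathbb{Q}}$ with presentation $q$ and
let $k \ge k_\psi$, $\chi(k)$ be as in Proposition 2.5.9. Then
\[ \chi(k) \le \binom{kd(q) + r + 1}{r+1} - \binom{kd(q) - d + r + 1}{r+1}. \]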
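Proof: Let $y_0, \dots, y_m$ be the standard coordinates of $\mathbb{P}^m_{\overline{\mathbb{Q}}}$. We have seen in the proof of
Proposition 2.5.9 that the linear map
\[ H^0(\mathbb{P}^m_{\overline{\mathbb{Q}}}, \mathcal{O}_{\mathbb{P}^m}(k)) \longrightarrow H^0(X, \psi^*\mathcal{O}_{\mathbb{P}^m}(k)) \]
is surjective, so there is a set $B \subset \{\beta \in \mathbb{N}^{m+1} \mid |\beta| = k\}$ such that $(y^\beta|_X)_{\beta \in B}$
is a basis of $H^0(X, \psi^*\mathcal{O}_{\mathbb{P}^m}(k))$. The above monomials are linearly independent if and
only if the polynomials $(q_\beta)_{\beta \in B}$ are linearly independent, by definition of a presentation.
Therefore
\[ \chi(k) \le \dim S_{kd(q)} = \binom{kd(q) + r + 1}{r+1} - \binom{kd(q) - d + r + 1}{r+1}, \]
because by 2.5.2 the space $S_{kd(q)}$ is isomorphic to the vector space of homogeneous polynomials
$p(z_0, \dots, z_{r+1})$ of degree $kd(q)$ satisfying $\deg_{z_{r+1}}(p) < d$ and because of (2.9)
on page 47.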
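The dimension count behind Lemma 2.5.10 can be verified numerically: the difference of binomial coefficients equals the number of monomials in $z_0, \dots, z_{r+1}$ of degree $kd(q)$ whose exponent in $z_{r+1}$ is less than $d$. The sketch below is illustrative only; the names `chi_upper_bound` and `count_monomials` are hypothetical, and the parameter `dq` stands for $d(q)$.

```python
from itertools import product
from math import comb


def chi_upper_bound(k, dq, d, r):
    """Right-hand side of Lemma 2.5.10 with n = k * d(q):
    C(n + r + 1, r + 1) - C(n - d + r + 1, r + 1)."""
    n = k * dq
    shifted = n - d + r + 1
    # math.comb rejects negative arguments, so treat that case as 0 explicitly.
    return comb(n + r + 1, r + 1) - (comb(shifted, r + 1) if shifted >= 0 else 0)


def count_monomials(n, d, r):
    """Brute-force count of exponent vectors (e_0, ..., e_{r+1}) with
    e_0 + ... + e_{r+1} = n and e_{r+1} < d."""
    return sum(
        1
        for e in product(range(n + 1), repeat=r + 2)
        if sum(e) == n and e[-1] < d
    )
```

For small parameters the two functions agree, which is exactly the identification of $\dim S_{kd(q)}$ used in the proof.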
The next result is a slight generalization of a result of Mumford. Note that by base change
A.10.28, we may just as well work over the field of complex numbers. For details and a
proof of this lemma, we refer to A. Bertram, L. Ein, and R. Lazarsfeld [22].
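Lemma 2.5.11. Let $Y$ be an irreducible smooth closed subvariety of $\mathbb{P}^m_{\overline{\mathbb{Q}}}$ defined over $\overline{\mathbb{Q}}$
and let $c := \min(1 + \dim(Y), \operatorname{codim}(Y, \mathbb{P}^m_{\overline{\mathbb{Q}}}))$. Then
\[ H^i(\mathbb{P}^m_{\overline{\mathbb{Q}}}, \mathcal{J}_Y \otimes \mathcal{O}_{\mathbb{P}^m}(k)) = 0 \]
with
\[ C_1 = d\,\frac{(d+1)^r (r+1)^r}{r!}, \qquad C_2 = (d-1)h(f) + d(d+r+1) + r + 1. \]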
Remark 2.6.3. If E is bounded in X , Lemma 2.2.10 shows that for any finite
covering {Ui }i∈I of X by affine open subsets there is a subdivision
\[ E = \bigcup_{i \in I} E_i, \qquad E_i \subset U_i(K), \]
such that each Ei is bounded in Ui .
It is easy to prove that the image of a bounded set under a morphism is again
bounded. Moreover, if Y is a closed subvariety of X and E ⊂ Y (K), then E is
bounded in Y if E is bounded in X . The details of the proofs will be left to the
reader.
Example 2.6.4. In this example, we assume that K is locally compact with re-
spect to | | (for example, the completion of a number field with respect to a place,
cf. 1.2.12). We consider on X(K) the topology induced locally by open balls with
respect to closed embeddings into affine spaces and the maximum norm. Then the
topology is locally compact and independent of the embeddings. It depends only
on the place v represented by | | and is called the v -topology on X(K). A subset
E of X(K) is bounded in X if and only if E is relatively compact in X(K).
In order to prove this statement, suppose first that E is bounded in X . By defini-
tion, E may be covered by finitely many closed balls in affine spaces. Since these
balls are compact, we conclude that E is relatively compact.
On the other hand, let E be relatively compact in X. Passing to the closure, we
may assume E compact. Then E may be covered by finitely many open balls in
affine spaces. By Lemmas 2.2.9 and 2.2.10, E is bounded in X .
Example 2.6.5. The set $\mathbb{P}^n(K)$ is bounded in projective space $\mathbb{P}^n_K$. We can use
the affine covering $X_i := \{x \in \mathbb{P}^n_K \mid x_i \neq 0\}$, $i \in \{0, \dots, n\}$, and the decomposition
$E_i := \{x \in \mathbb{P}^n_K \mid |x_i| = \max_{j=0,\dots,n} |x_j|\}$ of $E$. By Remark 2.6.3, the set of
K-rational points is bounded in any projective variety. Implicitly, we have already
used these facts in the proof of Theorem 2.2.11.
Remark 2.6.7. It is trivial that any subset of a bounded subset is bounded again.
However, we may not pass from $X$ to an open subset. For example, the set
$E = \{x \in \mathbb{P}^n(K) \mid x_0 \neq 0\}$ is bounded in $\mathbb{P}^n_K$ by Example 2.6.5 but it is certainly
not bounded in the affine space $\{x \in \mathbb{P}^n_K \mid x_0 \neq 0\}$. Thus the notion of bounded
subset is not a local one and some care is needed when using it.
such that (Eiu )u∈M is M-bounded in Ui in the sense of Definition 2.6.11 for every
i ∈ I.
Example 2.6.16. The set Pn (K) is M-bounded in PnK . The same covering and
the same decomposition as in Example 2.6.5 work. Note that the old Ei now
depend on u ∈ M . Even by starting with a single set, we have to work with
families.
Proposition 2.6.17. A complete K-variety is M-bounded. More generally, the
inverse image of an M-bounded family of subsets under a proper morphism is
M-bounded.
Proof: We leave it to the reader to make the obvious adjustments in the proof of
Proposition 2.6.6.
Definition 2.6.18. A real function $f$ on $X \times M$ is called locally M-bounded if,
for any M-bounded family $(E^u)_{u \in M}$ in $X$, there is for every $v \in M_K$ a non-negative
real number $\gamma_v$, with $\gamma_v \neq 0$ only for finitely many $v \in M_K$, such that
for all $u \in M$ with $v|u$ we have
\[ |f(E^u, u)| \le \gamma_v. \]
This section contains additional material for local and global heights. It will be
used only in Sections 9.5 and 9.6 and may be skipped in a first reading. In Sections
2.2–2.5, we have studied Weil heights coming from morphisms to projective space.
However, as remarked by A. Néron and as we will see in 9.5, the canonical height
functions on abelian varieties are not of this shape. To deal with this situation,
a local height is associated to every locally bounded metric of a line bundle on
an arbitrary variety. We will extend the results from Sections 2.2 and 2.3 to this
framework.
Let $K$ be a field with a fixed embedding into its algebraic closure $\overline{K}$. For the
moment, we fix an absolute value | | on K . From Definition 2.7.12 on, we
deal with several absolute values. In order to define global heights, we will assume
in 2.7.16–2.7.19 that the product formula holds.
We claim that both metrics are locally bounded. Let $s$ be a nowhere vanishing
section of $\mathcal{O}_{\mathbb{P}^n_K}(1)$ over an open subset $U$ of $\mathbb{P}^n_K$ and let $E$ be bounded in $U$.
By Remark 2.6.3, we may assume that $U$ is contained in a standard open subset
$\{x \in \mathbb{P}^n_K \mid x_j \neq 0\}$. Then $s = f x_j$ for an invertible regular function $f$ on $U$.
By definition, $\log|f|$ is bounded on $E$. For $\log\|x_j\|$ (resp. $\log\|x_j\|_2$), we have
the upper bound 0. A lower bound is easily obtained by using $|x_i/x_j|$ bounded
on $E$ for every $i \in \{0, \dots, n\}$. This proves the claim.
Proposition 2.7.5. Every line bundle on an arbitrary variety over K admits a
locally bounded metric.
Proof: For simplicity, we first prove the claim for a projective variety X . We may
assume that our line bundle L is generated by global sections s0 , . . . , sm because
every element in Pic(X) is the difference of two very ample ones (see A.6.10).
Then the morphism $\varphi : X \to \mathbb{P}^m_K$, given by $\varphi(x) = (s_0(x) : \cdots : s_m(x))$,
satisfies $\varphi^*\mathcal{O}_{\mathbb{P}^m}(1) = L$. The pull-back of the standard metric (cf. Example
2.7.4) is locally bounded. For further reference, we note that
\[ \|s_i(x)\| = \min_j \left|\frac{s_i}{s_j}(x)\right|. \tag{2.16} \]
Now let L be any line bundle on an arbitrary variety X over K . We cover X
by finitely many affine trivializations {Ui }i=1,...,m of L. Let si be a nowhere
vanishing section of $L$ over $U_i$. A first try to define the metric of $L$ on $U_j$ would
be formula (2.16) with $j$ restricted to the ones with $x \in U_j$. Clearly, $\log\|s_i\|$
would be bounded from above by 0 on $U_i$. But poles of $s_j$ along $U_i \setminus U_j$ would
make it impossible to give a lower bound on every bounded subset of $U_i$. The
following smoothing process avoids this problem.
Let xi1 , . . . , xini be coordinates on Ui . Note first that xi1 , . . . , xini , xj1 , . . . , xjnj
are coordinates on Ui ∩ Uj . This is clear by realizing Ui ∩ Uj as the closed affine
subvariety of Ui × Uj given by intersection with the diagonal (which is closed by
definition of a variety).
For notational simplicity, we add $x_{i0} = 1$ to the coordinates on every $U_i$. We
consider the transition function $g_{ji} = s_j/s_i$ of $L$ on $U_i \cap U_j$. Because $g_{ji}$ may
be written as a polynomial in the coordinates $x_i, x_j$, we have a constant $C_1$ and
$r \in \mathbb{N}$ such that
\[ |g_{ji}(x)| \le C_1 \max_{k,l}\bigl(|x_{ik}|, |x_{jl}|\bigr)^r \le C_1 \max_k |x_{ik}|^r \cdot \max_l |x_{jl}|^r \tag{2.17} \]
To prove that this metric is locally bounded, let E be a bounded subset of an open
subset U of X and let s be a nowhere vanishing section of L over U . By Remark
2.6.3, we may assume that E ⊂ Ui for some i ∈ {1, . . . , m}. By definition, it is
clear that $\log|s/s_i|$ is bounded on $E$. Since
\[ \log\|s\| = \log\left|\frac{s}{s_i}\right| + \log\|s_i\| \]
on E , it follows that we may assume U = Ui , s = si . There is a constant C2
such that
max |xik | ≤ C2 (2.19)
k=0,...,ni
Using $x_{i0} = x_{j0} = 1$, $g_{ij}g_{ji} = 1$, formulas (2.17) and (2.19), we get
\[ \max_l |g_{ij}(x)\,x_{jl}^r| \ge \frac{1}{C_1}\Bigl(\max_k |x_{ik}|\Bigr)^{-r} \ge \frac{1}{C_1 C_2^r} \]
leading to the lower bound
\[ \log\|s_i(x)\| \ge -\log C_1 - r\log C_2 \]
on $E$. This proves the claim.
Remark 2.7.6. It is sometimes useful to impose additional requirements on the
metrics. In the archimedean case, it is often convenient to require that the functions
$\|s\|$ be $C^\infty$ for every open subset $U$ and every nowhere vanishing $s \in L(U)$. In
this case, we should work with the metric $\|\ \|_2$ on $\mathcal{O}_{\mathbb{P}^n_K}(1)$, because the standard
metric is not differentiable.
For a non-archimedean absolute value | | on K with place u , it is natural to
assume that $\|s(x)\| \in |K(x)|$ for every $x \in X$. Moreover, we may assume that
the functions $\|s\|$ are continuous with respect to the u -topology on U (K) defined
in Example 2.6.4.
All the results above remain valid for this kind of metrics. In particular, such
metrics always exist on any line bundle. In the non-archimedean case, the metric
constructed in the proof of Proposition 2.7.5 has these additional properties. In the
archimedean case, the existence of a C ∞ -metric follows from a partition of unity
argument.
In the non-archimedean case, continuity is not so important because the u -topology
is totally disconnected. If K is not locally compact, then continuity does not nec-
essarily imply local boundedness. At any rate, the only relevant property for our
purposes is that the metrics we use are locally bounded.
Definition 2.7.9. The local height associated to the Néron divisor $\widehat{D} = (D, \|\ \|)$
on the variety $X$ is given by
\[ \lambda_{\widehat{D}}(P) := -\log\|s_D(P)\|, \qquad P \in X \setminus \operatorname{supp}(D). \]
for any u ∈ M , u|v ∈ MK , and any P ∈ X(K) \ supp(div(f )). This is Weil’s
theorem of decomposition.
2.7.16. In order to define global heights, we assume that the absolute values of
$M_K$ satisfy the product formula (cf. 1.4). Let $F/K$ be a subextension of $\overline{K}/K$
and denote by $M_F$ the set of places $w$ of the field $F$ with $w|v$ for some $v \in M_K$.
We normalize the absolute value $|\ |_w$ as in 1.3.6 and 1.3.12. Then for $u \in M$ we
have
\[ |\ |_w = |\ |_u^{[T_w : K_v]/[F : K]}, \]
where $T_w$ is the completion $F_w$ if $F/K$ is separable.
2.7.17. Let L be a locally bounded M -metrized line bundle on the K-variety X .
Our goal is to define the associated global height function hL .
Let $P \in X(\overline{K})$. We choose a finite subextension $F/K$ of $\overline{K}/K$ with
$P \in X(F)$. For every $w \in M_F$, let us choose any $u \in M$ with $u|w$. Then we
consider the $|\ |_w$-norm
\[ \|\ \|_w := \|\ \|_u^{[T_w : K_v]/[F : K]} \]
on the fibre $L_P(F)$. By the compatibility condition in the definition of an M-metric,
this norm is independent of the choice of $u$.
(e) If X is complete (or more generally M-bounded), then hL does not depend
on the choice of the locally bounded M-metric up to bounded functions.
Proof: Property (a) is obvious and (b), (c) follow from Proposition 2.7.10. Claim
(d) is an immediate consequence of Proposition 2.7.11 and (e) follows from
Theorem 2.7.14 or from Proposition 2.7.10 (d).
Remark 2.7.19. By the generalization of Proposition 2.7.5 to several absolute
values, every line bundle L admits a locally bounded M-metric. On complete
varieties, it follows from Proposition 2.7.18 that the corresponding global height
depends only on the isomorphism class of L up to bounded functions. Moreover,
Theorem 2.3.8 holds for arbitrary complete varieties X, Y over K .
In practice, metrics often arise from integral models, as it will transpire in the
following example which assumes that the reader is familiar with the basics of
scheme theory. As the example plays no further role in this book, the reader may
skip it without problems.
Example 2.7.20. Let R be a Dedekind domain with quotient field K . For every maximal
ideal ℘v of R , we fix a discrete valuation | |v on K with ℘v = {x ∈ R | |x|v < 1} .
Again, M is the set of absolute values on K extending those of MK .
We consider a line bundle $\mathcal{L}$ on a flat proper reduced scheme $\mathcal{X}$ over $R$. The generic fibre
$X = \mathcal{X}_K$ is a complete variety over $K$ with line bundle $L := \mathcal{L}_K$. The goal is to describe
the natural M-metric $\|\ \|_{\mathcal{L}}$ on $L$ induced from $\mathcal{L}$.
Let $x \in X$. Then there is a finite subextension $F/K$ of $\overline{K}/K$ with $x \in X(F)$. The integral
closure $R_F$ of $R$ in $F$ is again a Dedekind domain ([157], Th.10.7). By the valuative
criterion of properness ([148], Th.II.4.7), there is a unique morphism $\overline{x} : \operatorname{Spec}(R_F) \to \mathcal{X}$
mapping the generic point $\{0\}$ to $x$. Let $u \in M$ restricting to $F$ to the valuation $w$ on $F$
with corresponding prime ideal $\wp_w$. There is a local nowhere vanishing section $s$ of $\mathcal{L}$ in
$\overline{x}(\wp_w)$. Then $s$ is also defined and non-zero in $x$ (because $\overline{x}(\wp_w)$ is in the closure of $x$ in
$\mathcal{X}$) and we set
\[ \|s(x)\|_{\mathcal{L},u} := 1. \tag{2.20} \]
If we replace $s$ by another local nowhere vanishing section $s'$ in $\overline{x}(\wp_w)$, then $s'/s$ is a
unit in the localization of $R_F$ in $\wp_w$ (consider the stalk of the sheaf of sections of $\mathcal{L}$ in
$\overline{x}(\wp_w)$). We conclude that (2.20) determines a well-defined M-metric on $L$.
To prove that this M-metric is locally bounded, let $(E^u)_{u \in M}$ be an M-bounded family
in an open subset $U$ of $X$ and let $s \in L(U)$ be nowhere vanishing. We may cover $\mathcal{X}$
by finitely many affine trivializations $\mathcal{U}_i$ of $\mathcal{L}$ with nowhere vanishing $s_i \in \mathcal{L}(\mathcal{U}_i)$. For
$u \in M$, we define
\[ E_i^u := \{x \in E^u \mid \overline{x}(\wp_w) \in \mathcal{U}_i\}, \]
using the notation from above. Clearly, $E_i^u$ is contained in the generic fibre $U_i$ of $\mathcal{U}_i$ and
we have $E^u = \bigcup_i E_i^u$. For $x \in E_i^u$, we have $\|s_i(x)\|_{\mathcal{L},u} = 1$ and hence
\[ \log\|s(x)\|_{\mathcal{L},u} = \log\left|\frac{s}{s_i}(x)\right|_u. \]
So it is enough to prove that $(E_i^u)_{u \in M}$ is M-bounded in $U \cap U_i$. The coordinates on $U \cap U_i$
are given by the coordinates of $U$ and $U_i$ (cf. proof of Proposition 2.7.5). Clearly, the
coordinates of $U$ are M-bounded on $(E_i^u)_{u \in M}$. Let $x_1, \dots, x_n$ be a set of coordinates on
$\mathcal{U}_i$ over $R$. Then they are also coordinates of $U_i$ and $|x_j|_u \le 1$ on $E_i^u$ for $j = 1, \dots, n$
(using $\overline{x}(\wp_w) \in \mathcal{U}_i$). By the generalization of Lemma 2.2.10, $(E_i^u)_{u \in M}$ has to be
M-bounded in $U \cap U_i$, proving that our M-metric is locally bounded.
Let $s$ be an invertible meromorphic section of $L$. The M-metric $\|\ \|_{\mathcal{L}}$ induces a Néron
divisor $\widehat{D}$ on $D = \operatorname{div}(s)$ and a local height
\[ \lambda_{\widehat{D}}(x, u) := -\log\|s(x)\|_{\mathcal{L},u} \]
for $x \in X \setminus \operatorname{supp}(D)$ and $u \in M$.
Now let $x \in X(K) \setminus \operatorname{supp}(D)$ and let $u \in M$ with $u|v \in M_K$. We will give a geometric
interpretation of $\lambda_{\widehat{D}}(x, u)$ in terms of intersection multiplicities. By renormalization, we
may assume that $\log|K^\times| = \mathbb{Z}$.
There is a similar intersection theory of Cartier divisors on $\mathcal{X}$ with cycles, as described in
Section A.9 (cf. [125], 20.2). By flatness, we easily deduce that $X$ is dense in $\mathcal{X}$ and that
$s$ extends uniquely to an invertible meromorphic section $s_{\mathcal{L}}$ of $\mathcal{L}$. On the other hand, the
closure $\overline{\{x\}}$ of $x$ in $\mathcal{X}$ is a one-dimensional prime cycle on $\mathcal{X}$. Since $x \notin \operatorname{supp}(D)$, the
2.8.3. Again by the product formula, hAr (P ) is independent of the choice of the
representative x . It follows from Corollary 1.3.2 that the definition hAr (P ) does
not depend on the extension F , i.e. hAr is a well-defined function on PnQ . It is
easy to see that hAr differs from the old height h by a bounded function on PnQ .
In the sense of 2.3.7, they are equivalent.
The preceding remark also follows from Proposition 2.7.18, because hAr is the
global height associated to the locally bounded metrized line bundle on OPn (1)
using the Fubini–Study metrics at the archimedean places and the standard metrics
at the non-archimedean places (cf. 2.7.4).
2.8.4. Let $W$ be an $m$-dimensional subspace of $\overline{\mathbb{Q}}^n$. The $m$th exterior power
$\wedge^m W$ is a one-dimensional subspace of $\wedge^m \overline{\mathbb{Q}}^n$. Therefore, we may view $W$ as
a point $P_W$ of the projective space $\mathbb{P}(\wedge^m \overline{\mathbb{Q}}^n)$. The latter may be identified with
projective space of dimension $\binom{n}{m} - 1$ by using standard coordinates.
Definition 2.8.5. hAr (W ) := hAr (PW ) is called the Arakelov height of W .
Definition 2.8.6. Let $A$ be an $n \times m$ matrix of rank $m$ with entries in $\overline{\mathbb{Q}}$. Then the
Arakelov height $h_{Ar}(A)$ of $A$ is defined as the Arakelov height of the subspace
of $\overline{\mathbb{Q}}^n$ spanned by the columns of $A$.
Remark 2.8.7. We can be quite explicit in defining $h_{Ar}(A)$. Let $I \subset \{1, \dots, n\}$
with $|I| = m$. We denote by $A_I$ the $m \times m$ submatrix of $A$ formed with the
$i$th rows, $i \in I$, of $A$. Then the point in $\mathbb{P}(\wedge^m \overline{\mathbb{Q}}^n)$ corresponding to $A$ is given
by the coordinates $\det(A_I)$, where $I$ ranges over all subsets of $\{1, \dots, n\}$ of
cardinality $m$. Let $F \subset \overline{\mathbb{Q}}$ be a number field containing all entries of $A$ and set
\[ H_u(A) := \begin{cases} \max_I |\det(A_I)|_u & \text{if } u \text{ is non-archimedean,} \\ \Bigl(\sum_I |\det(A_I)|_u^2\Bigr)^{1/2} & \text{if } u \text{ is archimedean.} \end{cases} \]
For $w \in M_F$, we now set
\[ H_w(A) = H_u(A)^{[F_w : \mathbb{Q}_p]/[F : \mathbb{Q}]}, \]
where $p \in M_{\mathbb{Q}}$ and $u \in M$ are such that $w|p$ and $u|w$. Then we have
\[ h_{Ar}(A) = \sum_{w \in M_F} \log H_w(A). \]
It is also clear that $h_{Ar}(AG) = h_{Ar}(A)$ for any invertible $m \times m$ matrix $G$ with
entries in $\overline{\mathbb{Q}}$.
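For a matrix with rational entries one may take $F = \mathbb{Q}$, so all local exponents are 1. After scaling the maximal minors to coprime integers $D_I$ (which does not change the projective point), the non-archimedean factors contribute 1 and the formula above reduces to $H_{Ar}(A)^2 = \sum_I D_I^2$. The sketch below computes this exactly. It is an illustration under these assumptions, the function names are hypothetical, and it returns the square of the multiplicative height $H_{Ar} = \exp(h_{Ar})$ to avoid floating-point roots.

```python
from fractions import Fraction
from itertools import combinations
from math import gcd, lcm


def det(m):
    """Exact determinant by Laplace expansion along the first row (small matrices only)."""
    if not m:
        return Fraction(1)
    return sum(
        (-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
        for j in range(len(m))
    )


def maximal_minors(a):
    """Coordinates det(A_I), I ranging over the m-element subsets of the n rows."""
    n, m = len(a), len(a[0])
    return [det([a[i] for i in rows]) for rows in combinations(range(n), m)]


def arakelov_height_squared(a):
    """H_Ar(A)^2 for an n x m matrix of rank m with rational entries (F = Q)."""
    minors = maximal_minors(a)
    if all(x == 0 for x in minors):
        raise ValueError("matrix does not have full column rank")
    common = lcm(*(x.denominator for x in minors))
    ints = [int(x * common) for x in minors]
    g = gcd(*ints)
    # sum of squares at the archimedean place, divided by the gcd from the finite places
    return Fraction(sum(x * x for x in ints), g * g)


a = [[1, 0], [0, 1], [1, 1]]
ag = [[2, 0], [1, 3], [3, 3]]  # a multiplied on the right by G = [[2, 0], [1, 3]]
```

Multiplying by an invertible $G$ scales every minor by $\det G$, and the gcd normalization removes that factor, which is the invariance $h_{Ar}(AG) = h_{Ar}(A)$ noted in the remark.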
Proposition 2.8.8. Let $u \in M$ be an archimedean place. Then
\[ H_u(A) = |\det(A^*A)|_u^{1/2}, \]
where $A^* = \overline{A}^{\,t}$ is the adjoint.
For completeness, we give the proof for the case of complex matrices. Let u :
Cm −→ Cn be the corresponding linear map and let u∗ be the adjoint map. We
have
∧m (u∗ ) ◦ ∧m (u) = ∧m (u∗ ◦ u).
In the canonical basis, the matrix of $\wedge^m(u)$ (resp. $\wedge^m(u^*)$) has only one column
(resp. one row) and its entries are $\det(A_I)$ (resp. $\overline{\det(A_I)}$), where $I$ ranges over
all subsets of $\{1, \dots, n\}$ of cardinality $m$. The matrix of $\wedge^m(u^* \circ u)$ is a $1 \times 1$
matrix with entry $\det(A^*A)$, proving Binet's formula.
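Binet's formula can be checked directly for matrices with rational entries, where the adjoint is just the transpose and the identity reads $\sum_I \det(A_I)^2 = \det(A^t A)$. The following sketch compares both sides exactly; the function names are hypothetical and the restriction to rational entries is an assumption made to keep the arithmetic exact.

```python
from fractions import Fraction
from itertools import combinations


def det(m):
    """Exact determinant by Laplace expansion along the first row (small matrices only)."""
    if not m:
        return Fraction(1)
    return sum(
        (-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
        for j in range(len(m))
    )


def sum_of_squared_minors(a):
    """Left-hand side: sum over m-element row subsets I of det(A_I)^2."""
    n, m = len(a), len(a[0])
    return sum(det([a[i] for i in rows]) ** 2 for rows in combinations(range(n), m))


def gram_determinant(a):
    """Right-hand side: det(A^t A) for an n x m matrix A."""
    n, m = len(a), len(a[0])
    gram = [[sum(a[i][p] * a[i][q] for i in range(n)) for q in range(m)] for p in range(m)]
    return det(gram)


a = [[1, 2], [3, 4], [5, 6], [7, 8]]
lhs, rhs = sum_of_squared_minors(a), gram_determinant(a)
```

For complex entries one would replace the squares by squared moduli and the transpose by the conjugate transpose, as in the proof.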
Remark 2.8.9. Let $A$ be an $n \times m$ matrix of rank $m$ with entries in $\overline{\mathbb{Q}}$ and let
$B, C$ be complementary submatrices of type $n \times m_1$ and $n \times m_2$, respectively.
For any $w \in M_F$, we have
\[ H_u(A) \le H_u(B)\, H_u(C) \]
and thus
\[ h_{Ar}(A) \le h_{Ar}(B) + h_{Ar}(C). \]
For a non-archimedean $u$, this follows easily from Laplace's expansion. If instead
$u$ is archimedean, it is a consequence of Proposition 2.8.8 and Fischer's inequality
\[ \det\begin{pmatrix} B^*B & B^*C \\ C^*B & C^*C \end{pmatrix} \le \det(B^*B)\,\det(C^*C) \]
from linear algebra (see e.g. L. Mirsky [206], Th.13.5.5), which extends Hadamard's
inequality.
Proposition 2.8.10. Let $W$ be an $m$-dimensional subspace of $\overline{\mathbb{Q}}^n$ and let $W^\perp$
be its annihilator in the dual $(\overline{\mathbb{Q}}^n)^* = \overline{\mathbb{Q}}^n$. Then $h_{Ar}(W^\perp) = h_{Ar}(W)$.
Proof: Let $V$ be the vector space $\overline{\mathbb{Q}}^n$. Any element $x$ of $\wedge^m(V)$ defines a linear
map ψ(x) : y → x ∧ y from ∧n−m V to ∧n (V ), or in other words an element
ϕ(x) of ∧n (V ) ⊗ ∧n−m (V ∗ ). The map
ϕ : ∧m (V ) −→ ∧n (V ) ⊗ ∧n−m (V ∗ )
is an isomorphism, which maps each element of the canonical basis of ∧m (V ) to
± an element of the canonical basis of ∧n (V ) ⊗ ∧n−m (V ∗ ). The line ∧m (W )
is mapped by ϕ to the line ∧n (V ) ⊗ ∧n−m (W ⊥ ), since, for any non-zero x
in ∧m (W ), the kernel of ψ(x) is the subspace of ∧n−m (V ) generated by the
elements of the form w ∧z , with w ∈ W and z ∈ ∧n−m−1 (V ). We conclude that
the coordinates of ∧m (W ) in P(∧m (V )) are, up to a sign, equal to the coordinates
of ∧n−m (W ⊥ ) in P(∧n−m (V ∗ )). This proves the claim.
2.8.16. Let u ∈ M . Now define for x, y ∈ (Qu )n+1 \{0} the projective distance
of x , y with respect to u to be
\[ \delta_u(x, y) = \frac{H_u(x \wedge y)}{H_u(x)\,H_u(y)}. \]
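To get a feeling for the projective distance, it can be evaluated for vectors with rational coordinates at the real place and at a $p$-adic place. The sketch below assumes unnormalized absolute values on $\mathbb{Q}$, takes $H_u$ to be the Euclidean norm at the real place and the maximum norm at $p$, and uses the coordinates $x_i y_j - x_j y_i$ ($i < j$) for $x \wedge y$. All function names are hypothetical. At the real place it returns $\delta^2$, which equals $\sin^2$ of the angle between the two lines.

```python
from fractions import Fraction
from itertools import combinations


def wedge(x, y):
    """Coordinates x_i*y_j - x_j*y_i (i < j) of the exterior product of x and y."""
    return [x[i] * y[j] - x[j] * y[i] for i, j in combinations(range(len(x)), 2)]


def padic_abs(x, p):
    """Unnormalized p-adic absolute value of a rational number, returned exactly."""
    x = Fraction(x)
    if x == 0:
        return Fraction(0)
    num, den, v = abs(x.numerator), x.denominator, 0
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return Fraction(p) ** (-v)


def delta_squared_real(x, y):
    """Square of the projective distance at the real place (Euclidean norms)."""
    w = wedge(x, y)
    return Fraction(sum(c * c for c in w), sum(c * c for c in x) * sum(c * c for c in y))


def delta_padic(x, y, p):
    """Projective distance at the prime p (maximum norms)."""
    top = max(padic_abs(c, p) for c in wedge(x, y))
    return top / (max(padic_abs(c, p) for c in x) * max(padic_abs(c, p) for c in y))
```

Proportional vectors have distance 0 at every place, and the value does not change when $x$ or $y$ is multiplied by a non-zero rational scalar, so the quantity depends only on the projective points.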
For the proof, we refer to K.K. Choi and J.D. Vaaler [66].
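2.8.20. Let $O$ be the point $(1 : 0)$ on $\mathbb{P}^1_{\overline{\mathbb{Q}}}$. The Arakelov height of $P \in \mathbb{P}^1_{\overline{\mathbb{Q}}}$ is
now given by the elegant formula
\[ h_{Ar}(P) = \sum_{w \in M_F} \log\frac{1}{\delta_w(P, O)}. \]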
In 1929 C.L. Siegel, in the course of his work on diophantine approximation and
transcendency, formally stated what is known today as Siegel's lemma.
Lemma 2.9.1. Let $a_{ij}$, $i = 1, \dots, M$, $j = 1, \dots, N$ be rational integers, not
all 0, bounded by $B$ and suppose that $N > M$. Then the homogeneous linear
system
\[ \begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1N}x_N &= 0 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2N}x_N &= 0 \\ &\ \,\vdots \\ a_{M1}x_1 + a_{M2}x_2 + \cdots + a_{MN}x_N &= 0 \end{aligned} \]
Siegel’s lemma and its numerous variants have become a fundamental tool in dio-
phantine approximation and transcendency.
Proof: The $M \times N$ matrix $(a_{ij})$ is denoted by $A$. Clearly, we may assume that
no row is identically 0. For a positive integer $k$, let us consider the set
\[ T := \{x \in \mathbb{Z}^N \mid 0 \le x_i \le k,\ i = 1, \dots, N\}. \]
We denote by $S_m^+$ the sum of the positive entries in the $m$th row of $A$, and similarly
by $S_m^-$ the sum of the negative entries. Then for $x \in T$ and $y := Ax$ we
have
\[ kS_m^- \le y_m \le kS_m^+. \]
Let
\[ T' := \{y \in \mathbb{Z}^M \mid kS_m^- \le y_m \le kS_m^+,\ m = 1, \dots, M\}. \]
Writing $B_m := \max_n |a_{mn}|$, we have $S_m^+ - S_m^- \le NB_m$ and we conclude that
the set $T'$ has at most $\prod_m (NkB_m + 1)$ elements. Now we choose $k$ so that $T$
has more elements than $T'$, namely
\[ \prod_m (NkB_m + 1) < (k+1)^N. \tag{2.26} \]
If we choose $k$ to be the integer part of $\prod_m (NB_m)^{\frac{1}{N-M}}$ and use $NkB_m + 1 <
NB_m(k+1)$, then we easily verify that inequality (2.26) is fulfilled.
By Dirichlet's pigeon-hole principle, there are two different points $x', x'' \in T$
with $Ax' = Ax''$. The point $x := x' - x''$ is a solution of $Ax = 0$ in integers,
with $\max_n |x_n| \le k$.
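The pigeon-hole argument is constructive enough to be run on small systems. The sketch below follows the proof step by step: it computes $k$ as the integer part of $\prod_m (NB_m)^{1/(N-M)}$ using integer arithmetic only, enumerates the box $T$, and returns the difference of the first two points with the same image. The function names are hypothetical, the search is exponential in $N$, and the code is meant as an illustration rather than a practical solver.

```python
from itertools import product
from math import prod


def siegel_k(a):
    """Largest integer k with k^(N-M) <= prod_m (N * B_m), i.e. the integer part
    of prod_m (N * B_m)^(1/(N-M)); assumes N > M and no zero row."""
    rows, cols = len(a), len(a[0])
    assert cols > rows and all(any(row) for row in a)
    target = prod(cols * max(abs(e) for e in row) for row in a)
    k = 1
    while (k + 1) ** (cols - rows) <= target:
        k += 1
    return k


def small_solution(a):
    """Return (x, k) with A x = 0, x != 0 and max |x_n| <= k, found by pigeon-hole."""
    k = siegel_k(a)
    seen = {}
    for x in product(range(k + 1), repeat=len(a[0])):
        image = tuple(sum(c * xi for c, xi in zip(row, x)) for row in a)
        if image in seen:
            # two different points of T with the same image: take their difference
            return [xi - yi for xi, yi in zip(x, seen[image])], k
        seen[image] = x
    raise AssertionError("unreachable: inequality (2.26) forces a collision")


a = [[1, -1, 2, 0], [3, 1, 0, -2]]
x, k = small_solution(a)
```

The returned vector depends on the enumeration order, but its entries always lie in $[-k, k]$ because both colliding points lie in the box $T$.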
Corollary 2.9.2. Let $K$ be a number field of degree $d$ contained in $\mathbb{C}$ with $|\ |$
the usual absolute value on $\mathbb{C}$. Let $M, N \in \mathbb{N}$, $0 < M < N$. Then there are
positive constants $C_1, C_2$ such that for any non-zero $M \times N$ matrix $A$ with entries
$a_{mn} \in \mathcal{O}_K$, there is $x \in \mathcal{O}_K^N \setminus \{0\}$ with $A \cdot x = 0$ and
\[ H(x) \le C_1 (C_2 NB)^{\frac{M}{N-M}}, \]
where $B := \sup_{\sigma,m,n} |\sigma(a_{mn})|$ with $\sigma$ ranging over the embeddings of $K$ into
$\mathbb{C}$.
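Proof: Let $\omega_1, \dots, \omega_d$ be a $\mathbb{Z}$-basis of the ring of algebraic integers $\mathcal{O}_K$. The
entries of $A$ may be written in the form
\[ a_{mn} = \sum_{j=1}^d a_{mn}^{(j)} \omega_j, \qquad a_{mn}^{(j)} \in \mathbb{Z}. \tag{2.27} \]
Using $x_n = \sum_{k=1}^d x_n^{(k)} \omega_k$, we get
\[ (A \cdot x)_m = \sum_{n=1}^N \sum_{j,k=1}^d a_{mn}^{(j)} \omega_j \omega_k x_n^{(k)} = \sum_{l=1}^d \Bigl( \sum_{n=1}^N \sum_{j,k=1}^d a_{mn}^{(j)} b_{jk}^{(l)} x_n^{(k)} \Bigr) \omega_l, \]
where $\omega_j \omega_k = \sum_{l=1}^d b_{jk}^{(l)} \omega_l$. Let $A'$ be the $(Md) \times (Nd)$ matrix
\[ A' := \Bigl( \sum_{j=1}^d a_{mn}^{(j)} b_{jk}^{(l)} \Bigr) \]
with rows indexed by $(m, l)$, columns indexed by $(n, k)$, and let $y \in \mathbb{Z}^{Nd}$ be the
vector $(x_n^{(k)})$. Then the Siegel lemma above gives a non-zero integer solution $y$
of $A' \cdot y = 0$ with
\[ H(y) \le \Bigl( N d^2 \max_{m,n,j} |a_{mn}^{(j)}| \max_{j,k,l} |b_{jk}^{(l)}| \Bigr)^{\frac{M}{N-M}}. \]
Here $\max_{m,n,j} |a_{mn}^{(j)}| \le C_2' B$
for a suitable positive constant $C_2'$. Then $x_n = \sum_{k=1}^d x_n^{(k)} \omega_k$ yields $H(x) \le
C_1 H(y)$ and using $C_2 := C_2' d^2 \max_{j,k,l} |b_{jk}^{(l)}|$, we get the claim.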
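The passage from a linear system over $\mathcal{O}_K$ to an integer system of size $(Md) \times (Nd)$ can be made concrete for $d = 2$. The sketch below assumes $K = \mathbb{Q}(\sqrt{D})$ with the basis $\omega_0 = 1$, $\omega_1 = \sqrt{D}$ (indexed from 0 in the code), which is a $\mathbb{Z}$-basis of $\mathcal{O}_K$ only when $D$ is squarefree and $D \not\equiv 1 \pmod 4$. Elements are stored as coordinate pairs, and the function names are hypothetical.

```python
def structure_constants(D):
    """b[j][k] = coordinates of omega_j * omega_k in the basis (1, sqrt(D))."""
    return [[(1, 0), (0, 1)], [(0, 1), (D, 0)]]


def expand_over_z(a, D):
    """Integer matrix A' with rows indexed by (m, l) and columns by (n, k):
    entry = sum_j a_mn^(j) * b_jk^(l)."""
    b = structure_constants(D)
    d = 2
    return [
        [sum(entry[j] * b[j][k][l] for j in range(d)) for entry in row for k in range(d)]
        for row in a
        for l in range(d)
    ]


def multiply(u, v, D):
    """Product of u = u0 + u1*sqrt(D) and v = v0 + v1*sqrt(D) as a coordinate pair."""
    return (u[0] * v[0] + D * u[1] * v[1], u[0] * v[1] + u[1] * v[0])


# One equation in two unknowns over Z[sqrt(2)]: (1 + sqrt 2) x_1 + sqrt(2) x_2 = 0.
a = [[(1, 1), (0, 1)]]
a_prime = expand_over_z(a, 2)
y = [2, 0, -2, -1]  # integer solution of A' y = 0, i.e. x = (2, -2 - sqrt 2)
```

Any integer solution $y$ of $A'y = 0$ regroups into a solution $x$ over $\mathcal{O}_K$, which is how the corollary inherits its bound from Lemma 2.9.1.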
2.9.3. The goal of this section is to give an improved version of Siegel’s lemma for
a given number field K of degree d . Often, the simple version above is sufficient
for applications and the reader may skip the remaining part of this section in a first
reading.
After stating the result in Theorem 2.9.4, we give a series of immediate corollaries,
which are quite useful in practice; even in the case K = Q , we get an improve-
ment of the elementary form of Siegel’s lemma proved above. The proof of the
generalized Siegel’s lemma will be done in several steps. First, we recall for com-
pleteness some basic results in the geometry of numbers, referring to Appendix
C for the proofs. Then we use Minkowski’s second main theorem to construct a
“small” basis in the range of an $N \times M$ matrix $A$ of rank $M$. In order to find a
“small” basis of solutions of our original equation $Ax = 0$, we apply this result
to the matrix $A'$ whose columns are formed by any basis of solutions. As $A$ and
$A'$ have the same Arakelov height, we will get the generalized Siegel's lemma.
Finally, we give a relative version where the entries are in a finite extension F of
K but the solutions are required to be in K .
We now state Siegel’s lemma over a number field K of degree d and discriminant
DK/Q , in the form given by E. Bombieri and J.D. Vaaler [35].
Theorem 2.9.4. Let $A$ be an $M \times N$ matrix of rank $M$ with entries in $K$. Then
the $K$-vector space of solutions of $Ax = 0$ has a basis $x_1, \dots, x_{N-M}$, contained
in $\mathcal{O}_K^N$, such that
\[ \prod_{l=1}^{N-M} H(x_l) \le |D_{K/\mathbb{Q}}|^{\frac{N-M}{2d}} H_{Ar}(A). \]
Proof: There is an $R \times N$ submatrix $A'$ of rank $R$. Then $A$ and $A'$ have the same
kernel and the same Arakelov height. The result follows by applying Theorem
2.9.4 to $A'$.
2.9.8. If $A_m$ is the $m$th row of $A$, then Remark 2.8.9 shows that
\[ H_{Ar}^{row}(A) \le \prod_m H_{Ar}(A_m), \]
If we compare with the original Siegel's lemma in the case $K = \mathbb{Q}$, we have
improved the result by replacing the factor $N$ by $\sqrt{N}$ in the final estimate.
2.9.10. The proof of Theorem 2.9.4 uses geometry of numbers over the adeles,
and we shall refer to Appendix C for details. Here we will limit ourselves to basic
definitions and results.
Let Kv be the completion of the number field K with respect to the place v ∈ MK
and let v|p ∈ MQ . Then Kv is a locally compact group (see 1.2.12) with Haar
measure uniquely determined up to a scalar. We normalize this Haar measure as
follows:
2.9.14. For the proof of Siegel's lemma 2.9.4 we choose the sets $S_v$ as follows.
First, let $Q_v^N$ be the unit cube in $K_v^N$ of volume 1 with respect to the Haar
measure $\beta_v$. Explicitly, this is given by
\[ Q_v^N := \begin{cases} \max_n \|x_n\|_v < \frac{1}{2} & \text{if } v \text{ is real} \\ \max_n \|x_n\|_v < \frac{1}{2\pi} & \text{if } v \text{ is complex} \\ \max_n \|x_n\|_v \le 1 & \text{if } v \text{ is non-archimedean} \end{cases} \]
using the normalization from 1.3.6 with respect to $K/\mathbb{Q}$. Let $A$ be an $N \times M$
matrix of rank $M$ with entries in $K$. Then we set
\[ S_v := \{y \in K_v^M \mid Ay \in Q_v^N\}. \]
If v is archimedean, then Sv is a non-empty, convex, symmetric, bounded open
subset of KvM ; with respect to the injective map x → Ax , the image of Sv is a
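\[ A' = \begin{pmatrix} U & -V \\ V & U \end{pmatrix} \]
and also
\[ Q_v^N = \Bigl\{ (u, v) \in \mathbb{R}^{2N} \Bigm| u_j^2 + v_j^2 < \frac{1}{2\pi} \Bigr\}. \]
Now we apply Theorem C.3.8 with $n_1 = \cdots = n_N = 2$ to get
\[ \beta_v(S_v) \ge \det(A'^{\,t} A')^{-\frac{1}{2}}. \]
\[ W = \begin{pmatrix} I_M \\ W' \end{pmatrix}, \]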
whence
\[ \prod_v \beta_v(S_v) \ge |D_{K/\mathbb{Q}}|^{-\frac{M}{2}} H_{Ar}(A)^{-d}. \]
Now Minkowski's second theorem yields
\[ \lambda_1 \cdots \lambda_M \le 2^M |D_{K/\mathbb{Q}}|^{\frac{M}{2d}} H_{Ar}(A). \tag{2.28} \]
It remains to connect the successive minima with the basis we want to find. With
our specific sets $S_v$, we form the $K$-lattice $\Lambda$ as in 2.9.11 and identify $\Lambda$ with its
image $\Lambda_\infty$ in $E_\infty$. Let $y \in K^M$ be a lattice point in $\lambda S$ for some $\lambda > 0$ and let
$x := Ay$. Then the definition of $S = \prod_{v|\infty} S_v$ gives $\max_n \|x_n\|_v < \lambda/2$ if $v$ is
real, $\max_n \|x_n\|_v < \lambda^2/(2\pi)$ if $v$ is complex, and $\max_n \|x_n\|_v \le 1$ if $v$ is not
archimedean. Thus we have
\[ H(Ay) < \frac{\lambda}{2} \Bigl(\frac{2}{\pi}\Bigr)^{\frac{s}{d}}. \tag{2.29} \]
There are linearly independent lattice points $y_1, \dots, y_M \in K^M$ such that $y_m \in
\lambda_m S$, for $m = 1, \dots, M$. Then (2.28) and (2.29) prove what we want, with
$x_m = Ay_m$.
Proof of Theorem 2.9.4: Let $A'$ be an $N \times (N - M)$ matrix whose columns form
a basis of the kernel of $A$. Clearly, $A'$ has rank $N - M$ and the image of $A'$ is the
same as the kernel of $A$. By Corollary 2.8.12, $A$ and $A'$ have the same Arakelov
height. Now Proposition 2.9.18 applied to $A'$ gives a basis $x_1, \dots, x_{N-M}$ of the
kernel of $A$ such that
\[ \prod_{l=1}^{N-M} H(x_l) \le \Bigl(\frac{2}{\pi}\Bigr)^{\frac{(N-M)s}{d}} |D_{K/\mathbb{Q}}|^{\frac{N-M}{2d}} H_{Ar}(A). \]
This completes the proof of Theorem 2.9.4.
Finally, we give a relative version of Siegel’s lemma.
Theorem 2.9.19. Let $K$ be a number field of degree $d$ and discriminant $D_{K/\mathbb{Q}}$
and let $F$ be a finite-dimensional field extension of $K$ of degree $r := [F : K]$.
Let $A$ be an $M \times N$ matrix with entries in $F$ and assume $rM < N$. Then there
exist $N - rM$ $K$-linearly independent vectors $x_l \in \mathcal{O}_K^N$ such that
\[ Ax_l = 0, \qquad l = 1, 2, \dots, N - rM \]
and
\[ \prod_{l=1}^{N-rM} H(x_l) \le |D_{K/\mathbb{Q}}|^{\frac{N-rM}{2d}} \prod_{i=1}^{M} H_{Ar}(A_i)^r, \]
where $A_i$ is the $i$th row of $A$.
Proof: Let $\omega_1, \dots, \omega_r$ be a basis of $F/K$. For the entries of $A = (a_{mn})$, we have
\[ a_{mn} = \sum_{j=1}^r a_{mn}^{(j)} \omega_j \]
for uniquely determined $a_{mn}^{(j)} \in K$. Let $A^{(j)}$ be the $M \times N$ matrix with entries
$a_{mn}^{(j)}$. Then for $x \in K^N$, the equation $Ax = 0$ is equivalent to the system of
be the distinct embeddings of $F$ into $\overline{K}$ over $K$ and let $\Omega$ be the $r \times r$ matrix
with entries the $M \times M$ matrices $\Omega_{ij} = \sigma_i(\omega_j) I_M$, where $I_M$ is the $M \times M$ unit
matrix. Thus $\Omega$ is an $rM \times rM$ matrix built up by $r^2$ blocks of $M \times M$ matrices.
By Remark B.1.15, we have $D_{F/K} = \det(\sigma_i(\omega_j))^2$, whence $\Omega$ is invertible and
its inverse is again formed by $r^2$ blocks of multiples of $I_M$. From our definitions,
we also see that
\[ A'' := \begin{pmatrix} \sigma_1 A \\ \vdots \\ \sigma_r A \end{pmatrix} = \Omega A'. \tag{2.30} \]
We apply Corollary 2.9.7 to $A'$. If $R$ denotes the rank of $A'$ (and hence also of
$A''$), we get a basis $x_1, \dots, x_{N-R}$ of the kernel of $A'$ over $K$, contained in $\mathcal{O}_K^N$,
such that
\[ \prod_{l=1}^{N-R} H(x_l) \le |D_{K/\mathbb{Q}}|^{\frac{N-R}{2d}} H_{Ar}^{row}(A'). \]
By (2.30) and the fact that $\Omega$ is a non-singular matrix, it is also clear that $A'$ and
$A''$ have the same kernels and
\[ H_{Ar}^{row}(A') = H_{Ar}^{row}(A''). \]
By 2.9.8, we conclude that
\[ \prod_{l=1}^{N-R} H(x_l) \le |D_{K/\mathbb{Q}}|^{\frac{N-R}{2d}} \prod_i H_{Ar}(A''_i), \]
where $i$ ranges over $R$ linearly independent rows of $A''$. We rearrange our basis
$x_l$ by increasing height. Then
\[ \prod_{l=1}^{N-rM} H(x_l) \le \Bigl( \prod_{l=1}^{N-R} H(x_l) \Bigr)^{\frac{N-rM}{N-R}} \le |D_{K/\mathbb{Q}}|^{\frac{N-rM}{2d}} \prod_{i=1}^{rM} H_{Ar}(A''_i). \]
are treated first only in the case of projective varieties, using the join operation
on projective embeddings and the fact that very ample line bundles generate the
Picard group to obtain the group structure on global heights. This explicit approach
to heights is less elegant than a more abstract one, but is sufficiently constructive
to be usable for the explicit estimates in Section 2.5, which appear to be new.
The general statement of Northcott’s theorem is in a basic foundational paper by
A. Weil [328], following earlier work of D.G. Northcott [227], [228]. The general
theory of local heights is due to A. Néron [218]. We present it in Section 2.7, but
instead of using his quasi-functions, we use the modern language of metrized line
bundles promoted by Arakelov theory.
A systematic theory of heights for subspaces of linear spaces was developed for
the first time by W.M. Schmidt in [263]. Our exposition is mainly based on [35].
Propositions 2.8.8 and 2.8.10 were communicated to us by Oesterlé. The distortion
factor 2.8.15 is introduced in E. Bombieri, A.J. Van der Poorten, and J.D. Vaaler
[37]; the projective distance is of course much older.
Already in 1909 A. Thue [298] used the pigeon-hole principle to find small integer
solutions to linear systems of equations. C.L. Siegel [283] gave the bound
$(NB)^{M/(N-M)}$, but it turns out that the cleaner bound $(NB)^{M/(N-M)}$ requires
only a minor modification in his proof, see A. Baker's monograph [14]. The
improvement $(\sqrt{N}B)^{M/(N-M)}$ obtained here seems to be of no consequence
for the applications we will make in this book.
Minkowski’s second theorem in the adelic setting is due to R.B. McFeat [198]. If
we ask for solutions in Q, then we can get a bound independent of the discriminant
DK/Q in Theorem 2.9.4, see D. Roy and J.L. Thunder [247], [248] (note that this
cannot be done for solutions in the field K ). This is quite important in some cases,
see the bibliographical notes to Chapter 7.
3 LINEAR TORI
3.1. Introduction
This short chapter contains simple but basic material about the algebraic group
$G_m^n$ over a field K of characteristic 0. Rather than developing a theory in a
more general context (as will be done in Section 8.2) and deriving our results as
special cases, our aim in this section has been to give a down-to-earth elementary
treatment of $G_m^n$, its subgroups, and maximal subgroups of subvarieties of $G_m^n$.
In order to read this chapter, the reader should be familiar with basic concepts
about varieties as provided by the first sections of Appendix A. We will also apply
Siegel’s lemma in 2.9.4 over Q to get effective bounds for certain normalization
matrices. The theory of linear tori will be used in Chapter 4.
3.2.1. As an affine variety, we identify $G := G_m^n$ with the Zariski open subset
\[ x_1 x_2 \cdots x_n \neq 0 \]
of affine space $\mathbb{A}^n_K$, with the obvious multiplication
\[ (x_1, x_2, \dots, x_n) \cdot (y_1, y_2, \dots, y_n) = (x_1 y_1, x_2 y_2, \dots, x_n y_n). \]
The element $1_n = (1, 1, \dots, 1)$ is the identity of the group structure.
3.2.2. An algebraic subgroup of G is a Zariski closed subgroup, and a linear
torus H is an algebraic subgroup which is geometrically irreducible. Note that
we require that an algebraic subgroup is always defined over the fixed ground field
K of characteristic 0. We will later show that any algebraic subgroup is defined
by polynomials with coefficients in Q (Corollary 3.2.15).
Let $H_1$, $H_2$ be algebraic subgroups of $G_m^{n_1}$ and $G_m^{n_2}$. Then $\varphi : H_1 \to H_2$ is
called a homomorphism (of algebraic subgroups) if $\varphi$ is a morphism of algebraic
varieties which is also a group homomorphism. In Corollary 3.2.8, we prove that
a linear torus in G of dimension r is isomorphic to $G_m^r$.
A torus coset is simply a coset gH of a linear torus H of positive dimension.
3.2. Subgroups and lattices
We use a lemma of Mahler, which gives control of a basis of a lattice from the
knowledge of a maximal set of independent vectors.
Lemma 3.2.11. Let M be a subgroup of $\mathbb{Z}^n$ of rank m and let $\lambda_1, \dots, \lambda_m \in M$
be independent vectors with norm at most d, generating a subgroup $\Lambda \subset M$. For
$i = 1, \dots, m$, let $V_i$ be the real span of $\lambda_1, \dots, \lambda_i$ and define $M_i = M \cap V_i$,
hence $M = M_m$. Then:
(a) $[M : \Lambda] \le d^m$;
(b) there are $v_1, \dots, v_m \in M$ such that for every i the vectors $v_1, \dots, v_i$
form a basis of $M_i$ and $\|v_i\| \le md$.
Proof: Since $M_i$ is a discrete subgroup of $\mathbb{R}^n$, it has a positive $\ell^1$-distance from
$V_{i-1}$ and there is $v_i = t_1 \lambda_1 + \dots + t_i \lambda_i \in M_i$ of minimal distance to $V_{i-1}$ with
\[ \sum_{\chi} \Bigl( \sum_{\lambda \in L_{i,\chi}} a_{i,\lambda} \Bigr) \chi = 0. \]
Proof: It is enough to prove the claim for K algebraically closed, because
Corollary 3.2.15 shows that every algebraic subgroup is defined by polynomials with
coefficients in Q. We have to prove that $\varphi(H_1)$ is Zariski closed in $H_2$. By
Proposition 3.2.7, we may assume that $H_1$ is a linear torus and so we may
suppose that $H_1 = G_m^n$ (Corollary 3.2.8). Replacing $H_2$ by the Zariski closure of
$\varphi(H_1)$, which is a linear torus of dimension r and hence isomorphic to $G_m^r$, it
remains to show that a dominant homomorphism $\varphi : G_m^n \to G_m^m$ of linear tori is
surjective.
By Proposition 3.2.17, there are $\mu_1, \dots, \mu_m \in \mathbb{Z}^n$ with $\varphi(x) = (x^{\mu_1}, \dots, x^{\mu_m})$.
Let B be the matrix with columns $\mu_1, \dots, \mu_m$; then, generalizing 3.2.3, we set
$\varphi(x) = \varphi_B(x)$. Since $\varphi(G_m^n)$ is dense, it is immediate that B has rank m. By
linear algebra, there are $R \in GL(m, \mathbb{Q})$ and $S \in GL(n, \mathbb{Q})$ with $B = RES$,
where $E = (I_m, 0, \dots, 0)$ with $I_m$ the unit matrix of rank m. If we had
$R \in GL(m, \mathbb{Z})$ and $S \in GL(n, \mathbb{Z})$, then this would immediately lead to
surjectivity of $\varphi_B = \varphi_R \circ \varphi_E \circ \varphi_S$. In general, the argument gives a $k \in \mathbb{Z} \setminus \{0\}$
such that for all $y \in G_m^m$ there is an $x \in G_m^n$ with $\varphi_B(x) = y^k$, which is enough
to prove surjectivity.
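As an illustration of the monomial description $\varphi(x) = (x^{\mu_1}, \dots, x^{\mu_m})$ used in this proof, the following Python sketch evaluates such a map on rational points and checks that it respects the coordinatewise multiplication of 3.2.1. The helper names `monomial`, `phi`, and `mul`, and the sample exponent vectors, are illustrative assumptions and not notation from the text.

```python
from fractions import Fraction

def monomial(x, mu):
    """Return x^mu = x_1^mu_1 * ... * x_n^mu_n for nonzero rational x_i."""
    result = Fraction(1)
    for xi, e in zip(x, mu):
        result *= Fraction(xi) ** e   # integer exponents, possibly negative
    return result

def phi(columns, x):
    """Hypothetical helper: x -> (x^mu_1, ..., x^mu_m) for exponent vectors mu_j."""
    return tuple(monomial(x, mu) for mu in columns)

def mul(x, y):
    """Coordinatewise multiplication, the group law on the torus."""
    return tuple(a * b for a, b in zip(x, y))

columns = [(1, -2, 0), (3, 0, 1)]     # sample mu_1, mu_2 in Z^3
x = (Fraction(2), Fraction(3, 5), Fraction(-7))
y = (Fraction(1, 4), Fraction(-2), Fraction(5, 3))

# phi is a group homomorphism: phi(x * y) = phi(x) * phi(y).
assert phi(columns, mul(x, y)) == mul(phi(columns, x), phi(columns, y))
print(phi(columns, x))
```

Exact rational arithmetic is used so that the homomorphism identity is checked without rounding error.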
Theorem 3.2.19. The following statements hold:
In this section we study algebraic subgroups and their cosets contained in a closed
subvariety X of $G := G_m^n$ defined over a number field K.
3.3.1. We say that X is defined by polynomials of degree at most d, if X is
the set of zeros of a finite collection of polynomials $f_i(x)$ of degree at most d,
with coefficients in $\overline{K}$. The essential degree $\delta(X)$ of X is the minimum integer
$d \ge 1$, such that X is defined by polynomials of degree at most d.
Note that, if X is defined over K by polynomials of degree at most d, then it is
also defined by polynomials of degree at most d with coefficients in K. In fact,
let $K' \supset K$ be a field of definition for the polynomials $f_i$; by considering the
traces $\mathrm{Tr}(\omega f_i(x))$ with $\omega \in K'$, we obtain a set of equations defined over K (see
A.4.13).
Proposition 3.3.2. Let $\varphi_A$ be a monoidal transformation or more generally any
homomorphism $\varphi_A : G_m^m \to G_m^n$ induced by a matrix A as in 3.2.3. Then
\[ \delta\Bigl( \bigcup_{i=1}^{k} X_i \Bigr) \le k \max_{i=1,\dots,k} \delta(X_i), \]
\[ \delta\Bigl( \bigcap_{i=1}^{k} X_i \Bigr) \le \max_{i=1,\dots,k} \delta(X_i), \]
\[ \delta(\varphi_A^{-1}(X)) \le \|A\| \, \delta(X). \]
Moreover, if X is irreducible of degree d (i.e. its closure in $\mathbb{P}^n_K$ has degree d),
then $\delta(X) \le d$.
Proof: If $X_i$ is defined by polynomials $f_{ij}(x)$, $j \in J_i$, then $\bigcup_i X_i$ is defined by
the polynomials
\[ f_{1j_1}(x) \cdots f_{kj_k}(x) \quad (j_i \in J_i). \]
This proves the first inequality. The second inequality is obvious and the third
follows easily from the definition of $\|A\|$ given before Proposition 3.2.10.
For the last statement, it suffices to consider all linear cones of dimension n − 1
over X, and note that each cone, of codimension 1 in $\mathbb{A}^n$ and of degree at most
d, is defined by a single polynomial equation of the same degree.
3.3.3. If X is defined by polynomials of degree at most $d \ge 1$ and has pure
dimension $r = \dim(X)$, by Bézout’s theorem (see [125], Ex.8.4.6) every
irreducible component of X has degree at most $d^{n-r}$ and in particular is defined by
polynomials of degree at most $d^{n-r}$.
Proposition 3.3.4. The number of algebraic subgroups H of G with $\delta(H) \le d$
does not exceed $(4ed)^{n^2}$. If H is such a subgroup and $H'$ is an algebraic
subgroup of H of finite index, then $\delta(H') \le nd$ and $H/H'$ is a finite group of
order at most $d^n$.
Proof: Let H be an algebraic subgroup of G of dimension r defined by
polynomials of degree at most d. Then Proposition 3.2.14 shows that H is defined
by monomial equations $x^\lambda = 1$ with $\lambda = (\lambda_1, \dots, \lambda_n)$, a vector which is the
difference of two vectors, with non-negative components and norm at most d; in
particular, the norm of $\lambda$ is bounded by $\|\lambda\| \le d$. By Corollary 3.2.15, $H = H_\Lambda$
with $\Lambda$ a subgroup of $\mathbb{Z}^n$ of rank n − r, generated by elements $\lambda$ as above. Let
also $H' = H_M$, so that $\Lambda \subset M$ has finite index in M.
By Lemma 3.2.11 and Corollary 3.2.9, we have $|H/H'| = [M : \Lambda] \le d^n$ and M
has a basis of vectors of norm at most nd, hence $H'$ is defined by polynomials
of degree at most nd. Moreover, $\Lambda$ has a basis $v_i$ ($i = 1, \dots, n - r$) such that
$\|v_i\| \le nd$. The number of such vectors does not exceed
\[ 2^n \binom{nd+n}{n} \le 2^n \frac{(2nd)^n}{n!} < (4ed)^n, \]
because $n! > (n/e)^n$. It follows from this that the number of subgroups $\Lambda$ does
not exceed $(4ed)^{n^2}$.
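The counting step in this proof can be checked numerically for small parameters. The sketch below assumes that the norm is the $\ell^1$-norm (the definition lies outside this excerpt, so this is an assumption), counts the integer vectors of norm at most nd by brute force, and compares the count with the two bounds. The function names `count_l1_ball` and `bounds` are illustrative.

```python
from itertools import product
from math import comb, e

def count_l1_ball(n, radius):
    """Count integer vectors v in Z^n with |v_1| + ... + |v_n| <= radius (brute force)."""
    rng = range(-radius, radius + 1)
    return sum(1 for v in product(rng, repeat=n) if sum(abs(c) for c in v) <= radius)

def bounds(n, d):
    """Return (exact count, 2^n * C(nd+n, n), (4ed)^n) for vectors of l1-norm <= nd."""
    radius = n * d
    exact = count_l1_ball(n, radius)
    binom_bound = 2 ** n * comb(radius + n, n)
    final_bound = (4 * e * d) ** n
    return exact, binom_bound, final_bound

for n, d in [(1, 1), (2, 1), (2, 3), (3, 2)]:
    exact, binom_bound, final_bound = bounds(n, d)
    # The chain: exact count <= 2^n C(nd+n, n) < (4ed)^n.
    assert exact <= binom_bound < final_bound
    print(n, d, exact, binom_bound, round(final_bound, 1))
```

The brute-force count is only feasible for very small n and d; it is meant as a sanity check of the inequality, not as a way to enumerate subgroups.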
Proposition 3.3.5. Let X be a closed subvariety of G. Then every algebraic
subgroup $H \subset X$ is contained in a maximal algebraic subgroup $H' \subset X$.
Proof: Without loss of generality, we may assume that every algebraic subgroup
$H_1$ with $H \subset H_1 \subset X$ has the same dimension as H, say r, and that r < n.
In order to prove the proposition, we need to show that there is no infinite chain
$H_1 \subsetneq H_2 \subsetneq H_3 \subsetneq \cdots$ of algebraic subgroups contained in the subvariety X.
By Theorem 3.2.19, we have $H_i = H_{\Lambda_i}$ for certain subgroups $\Lambda_i$ of $\mathbb{Z}^n$ of rank
n − r, satisfying $\Lambda_1 \supsetneq \Lambda_2 \supsetneq \Lambda_3 \supsetneq \cdots$. The intersection $M = \bigcap_i \Lambda_i$ is a
subgroup of rank strictly less than n − r (note that $[\Lambda_i : \Lambda_{i+1}] \ge 2$ for every i),
therefore the corresponding subgroup $H_M$ has dimension at least r + 1.
Let H be the Zariski closure of $\bigcup_i H_i$. Then H is an algebraic subgroup of G
and it is the smallest algebraic subgroup containing all subgroups $H_i$. By Theorem
3.2.19, we conclude that $H = H_M$, because M is the largest subgroup contained
in every $\Lambda_i$. Since each $H_i \subset X$, the Zariski closure of $\bigcup_i H_i$ is also contained
in X and we conclude that $H_1 \subset H_M \subset X$ with $\dim(H_M) > \dim(H_1)$. This is
a contradiction.
The same argument proves:
Proposition 3.3.6. The torsion points of an algebraic subgroup H of G are
Zariski dense in H .
Proof: By Corollary 3.2.15, there is a subgroup $\Lambda$ of $\mathbb{Z}^n$ with $H = H_\Lambda$. Let
$H_i = \{x \in H \mid x_1^{i!} = 1, \dots, x_n^{i!} = 1\}$. Then $\bigcup_i H_i$ is the set of torsion points
of H. By Theorem 3.2.19, we have $H_i = H_{\Lambda_i}$ with $\Lambda_i = i! \cdot \mathbb{Z}^n + \Lambda$, and
$M = \bigcap_i \Lambda_i$ is $\Lambda$ using the theorem of elementary divisors ([49], Ch.VII, §4, no.3,
Th.1).
As noted in the proof of the preceding proposition, the Zariski closure of
$\bigcup_i H_i$ is $H_M$.
Definition 3.3.7. Let X ⊂ G be a closed subvariety of G defined by polynomials
of degree at most d . We define
X∗ = X − {all torsion cosets ⊂ X} ,
X◦ = X − {all torus cosets ⊂ X} .
Remark 3.3.10. The proof gives a matrix $A \in SL(n, \mathbb{Z})$ with $\|A\| \le n^3 \delta(H)^{n-r}$
and $\|A^{-1}\| \le n^{2n-1} \delta(H)^{(n-1)^2}$.
The material in this chapter follows closely the treatment given in Schmidt’s paper
[273], with several additions. For the general theory of linear algebraic groups, we
refer to A. Borel [41] and M. Demazure and P. Gabriel [85].
4 SMALL POINTS
4.1. Introduction
We identify Gm with the affine line punctured at the origin, together with the
usual multiplication. Definitions and notation will be as in Chapter 3, with the
additional assumption that K is a number field. Note also that the statement of
Theorem 3.3.8, (c) has not yet been proved at this stage.
The standard height of $x = (x_1, \dots, x_n) \in G_m^n$ is
\[ \hat h(x) := \sum_{i=1}^{n} h(x_i). \]
(a) Homogeneity and symmetry: $\hat h(P^m) = |m| \cdot \hat h(P)$ for $m \in \mathbb{Z}$.
(b) Non-degeneracy: $\hat h(P) = 0$ if and only if P is a torsion point.
(c) Triangle inequality: $\hat h(PQ^{-1}) \le \hat h(P) + \hat h(Q)$.
(d) Finiteness: There are only finitely many points $P \in G_m^n$ such that $[\mathbb{Q}(P) : \mathbb{Q}]$
and $\hat h(P)$ are both bounded.
These properties are clear from the corresponding properties of the Weil height,
with (b) and (d) following from the theorems of Kronecker and Northcott (see
1.5.9, 2.4.9). Thus $\hat h(PQ^{-1})$ is a translation invariant semidistance d(P, Q) on
$G_m^n$ and actually a translation invariant distance on $G_m^n/\mathrm{tors}$.
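Properties (a) to (c) can be checked on rational points, where the Weil height has the elementary form $h(p/q) = \log\max(|p|, |q|)$ for a fraction in lowest terms. The following Python sketch does this for two sample points of $G_m^2(\mathbb{Q})$; the function names `h`, `h_std`, `power`, and `quotient` and the sample points are illustrative assumptions.

```python
from fractions import Fraction
from math import log, isclose

def h(q):
    """Weil height of a nonzero rational p/q in lowest terms: log max(|p|, |q|)."""
    q = Fraction(q)
    return log(max(abs(q.numerator), q.denominator))

def h_std(point):
    """Standard height on the torus: sum of the heights of the coordinates."""
    return sum(h(c) for c in point)

def power(point, m):
    """Coordinatewise m-th power, m any integer."""
    return tuple(Fraction(c) ** m for c in point)

def quotient(p, q):
    """The point P * Q^(-1)."""
    return tuple(Fraction(a) / Fraction(b) for a, b in zip(p, q))

P = (Fraction(2, 3), Fraction(5))
Q = (Fraction(3, 4), Fraction(-1, 5))

# (a) homogeneity and symmetry
for m in (-3, -1, 2, 4):
    assert isclose(h_std(power(P, m)), abs(m) * h_std(P))
# (c) triangle inequality (small tolerance for floating point)
assert h_std(quotient(P, Q)) <= h_std(P) + h_std(Q) + 1e-12
# (b) the torsion points of G_m(Q) are +1 and -1, and they have height 0
assert h_std((1, -1)) == 0.0
```

Over $\mathbb{Q}$ the only torsion coordinates are $\pm 1$, which is why the last check is so simple; over larger number fields roots of unity play the same role.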
Theorem 4.2.2. Let X be a closed subvariety of $G_m^n$ defined over a number field
K and let $X^*$ be the complement in X of the union of all torsion cosets $\varepsilon H \subset X$.
Let $f_i \in K[x]$ be a set of polynomials of degree at most d defining X. Then:
(a) The number of maximal torsion cosets in X is finite and bounded in terms
of d and n alone. Moreover, every maximal torsion coset has the form
ζH , where the orders of the torsion points ζ and the essential degrees of
the linear tori H are also bounded in terms of d and n .
(b) The height of points P ∈ X ∗ has a positive lower bound, depending on n ,
d , [K : Q], and max h(fi ).
The results (a) and (b) are effective.
have $\hat h(\xi) = h(a) + h(b) > 0$, and we can make it arbitrarily small by choosing
a and b, for example $a = b = 2^{1/m}$ with $m \to \infty$.
Remark 4.2.7. D. Zagier [336] has obtained the optimal lower bound
\[ \hat h(\xi) \ge \frac{1}{2} \log \frac{1 + \sqrt{5}}{2} \]
for non-torsion solutions $\xi$ of 1 + x + y = 0. Equality is attained for x or y a
primitive 10th root of unity, and this minimum is isolated.
\[ \log |\zeta|_v = \log |f(\xi_1^p, \dots, \xi_n^p)|_v \le pd \sum_{i=1}^{n} \log^+ |\xi_i|_v + \log |f|_v + \varepsilon_v \log \binom{d+n}{n}, \]
because the number of monomials in f does not exceed $\binom{d+n}{n}$.
Summing over all $v \in M_K$ and using $\sum_{v|p} \log |p|_v = -\log p$ from Lemma
1.3.7, we infer
\[ 0 = \sum_{v \in M_K} \log |\zeta|_v \le -\log p + pd \, \hat h(\xi) + h(f) + \log \binom{d+n}{n}. \]
Proof: The easy proof is by induction on the number of prime factors of m, writing
$m = pm'$. If $f(\xi^{m'}) = 0$, we apply the corollary inductively with $m'$ in place of
m. If instead $f(\xi^{m'}) \neq 0$, we apply Lemma 4.2.8 to the point $\xi^{m'}$ in place of $\xi$
and note that $\hat h(\xi^{m'}) = m' \, \hat h(\xi)$.
4.2.10. Proof of Theorem 4.2.2 (b) and the following partial result of 4.2.2 (a):
4.2.2 (a′) The union of torsion cosets of X is contained in finitely many maximal
torsion cosets of the form ζM , where the orders of the torsion points ζ and the
essential degrees of the linear tori M are bounded in terms of d, n, [K : Q], and
max h(fi ).
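bounded by d and let $r := |\mathcal{M}| = \binom{d+n}{n}$. For $s \in \{1, \dots, r\}$, consider the $s \times r$
matrix
\[ A_s(x) := \begin{pmatrix} x_1^{m} \\ \vdots \\ x_s^{m} \end{pmatrix}_{m \in \mathcal{M}}, \]
\[ s := 1 + \max_{\xi_1, \dots, \xi_r \in E_\gamma} \operatorname{rank} A_r(\xi_1, \dots, \xi_r). \]
\[ S^* := \xi_1^* \times \cdots \times \xi_{s-1}^* \times G_m^n \]
\[ p_s(S^* \cap \varepsilon H) = \Bigl\{ x \in G_m^n \Bigm| x^{\lambda_s} = \varepsilon^{\lambda} \prod_{i=1}^{s-1} (\xi_i^*)^{-\lambda_i}, \ \lambda = (\lambda_1, \dots, \lambda_s) \in \Lambda \Bigr\} = \xi_s^* H, \]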
We conclude that
\[ p_s(S^* \cap \varepsilon H) = \bigcup_j \eta_j M \]
\[ \hat h(\xi') \le \hat h(\eta_j) + \hat h(\xi) = \hat h(\xi_s^*) + \hat h(\xi) \le 2\gamma. \]
Finally, we may assume $\gamma \le \gamma'/2$. By the inductive step, we conclude that the
number of $\xi \in E_\gamma \cap \eta_j M$ with $(\xi_1^*, \dots, \xi_{s-1}^*, \xi) \in \varepsilon H$ is bounded by $N'$. Since
the number of torus cosets $\eta_j M$ is bounded by a controlled $N''$, there are at most
$N'N''$ points $\xi \in E_\gamma$ with $(\xi_1^*, \dots, \xi_{s-1}^*, \xi) \in \varepsilon H$.
If $\xi$ ranges over $E_\gamma$, then we have seen that the points
\[ (\xi_1^*, \dots, \xi_{s-1}^*, \xi) \]
are contained in various maximal torus cosets $\varepsilon H$ of $Y_s$. For every such $\varepsilon H$, we
fix $\xi_s^* \in E_\gamma$ with $(\xi_1^*, \dots, \xi_s^*) \in \varepsilon H$ and apply our preceding procedure. If $N'''$
is the controlled number of such maximal torsion cosets, then $E_\gamma$ contains at most
$N = N'N''N'''$ points. Since N is controlled, this proves the induction step.
4.2.12. Completion of the proof of Theorem 4.2.2 (a): By Theorem 3.3.8 (b),
we already know that every maximal torsion coset in X has the form εH with
δ(H) ≤ nd .
In proving Zhang’s theorem 4.2.2 (a), it suffices to deal with the maximal torsion
points of X , namely maximal torsion cosets of dimension 0. To see this, we use
Theorem 3.3.9 as follows. We have to find all maximal torsion cosets εH ⊂ X .
Assume that dim(H) ≥ 1. Then εH is also a torus coset. By Theorem 3.3.8
(b), we can fix the linear torus H . Now, using Theorem 3.3.9 (c), the maximal
4.3. The equidistribution theorem
Another approach is due to L. Szpiro, E. Ullmo, and S. Zhang [296], and Yu.
F. Bilu [24]. The idea is that points of small height under the action of Galois
conjugation tend to be equidistributed with respect to a suitable measure.
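For a in the multiplicative group $\mathbb{C}^\times$ of $\mathbb{C}$, let $\delta_a$ be the usual Dirac measure at
a and for $\xi \in \overline{\mathbb{Q}}^\times$ let
\[ \overline{\delta}_\xi = \frac{1}{[\mathbb{Q}(\xi) : \mathbb{Q}]} \sum_{\sigma : \mathbb{Q}(\xi) \to \mathbb{C}} \delta_{\sigma\xi} \]
be the probability measure supported at all complex conjugates of $\xi$, with equal
mass at each point.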
In order to understand the following considerations, we recall some basic facts
from functional analysis and measure theory.
Let X be a locally compact Hausdorff space and let Cc (X) be the space of
complex-valued continuous functions on X with compact support, endowed with
the supremum norm. Then the Riesz representation theorem says that for every
continuous linear functional $\Lambda$ on this normed space there exists a unique
complex regular Borel measure $\mu$ such that
\[ \Lambda(f) = \int_X f \, d\mu, \qquad f \in C_c(X). \]
Moreover, the operator norm of Λ equals |µ|(X), where |µ| is the total variation
of the measure µ. For details, we refer to W. Rudin [252], Th.6.19.
It is also clear that every complex regular Borel measure $\mu$ on X yields a
continuous linear functional on $C_c(X)$. Therefore, the space of complex regular Borel
measures on X is the dual of $C_c(X)$ in the sense of functional analysis, and we
denote it by $C_c(X)^*$. The weak-* topology on $C_c(X)^*$ is the coarsest topology
on $C_c(X)^*$ such that, for every $f \in C_c(X)$, the linear functional $\mu \mapsto \int_X f \, d\mu$
is continuous. The Banach–Alaoglu theorem (see W. Rudin [251], 3.15) says that
the unit ball
\[ \{ \mu \in C_c(X)^* \mid |\mu|(X) \le 1 \} \]
is weak-* compact.
In what follows, we apply these concepts to X = C× .
We have the following result of Bilu:
Theorem 4.3.1. Let $(\xi_i)_{i \in \mathbb{N}}$ be an infinite sequence of distinct non-zero algebraic
numbers such that $h(\xi_i) \to 0$ as $i \to \infty$. Then the sequence $(\overline{\delta}_{\xi_i})_{i \in \mathbb{N}}$ converges
in the weak-* topology to the uniform probability measure $\mu_T := d\theta/(2\pi)$ on the
unit circle $T := \{e^{i\theta} \mid 0 \le \theta < 2\pi\}$ in $\mathbb{C}$.
Proof: By the weak-* compactness of the unit ball in $C_c(\mathbb{C}^\times)^*$, it is enough to
show that any convergent subsequence of the sequence $(\overline{\delta}_{\xi_i})_{i \in \mathbb{N}}$ has limit $\mu_T$.
Thus we may assume that the measures $\overline{\delta}_{\xi_i}$ converge in the weak-* topology to a
Borel measure $\mu$, and we have to show that $\mu = \mu_T$.
Let $\mu$ be a weak-* limit of the measures $\overline{\delta}_{\xi_i}$. Let $a_{0i}$ and $d_i$ be the leading
coefficient and degree of a minimal equation for $\xi_i$. Since the $\xi_i$ are distinct,
Northcott’s theorem in 1.6.8 shows that $d_i \to \infty$.
As in 1.5.7, $\log^+$ is the maximum of 0 and log. By Propositions 1.6.5 and 1.6.6
and the hypothesis $h(\xi_i) \to 0$, we have
\[ h(\xi_i) = \frac{1}{d_i} \log |a_{0i}| + \frac{1}{d_i} \sum_{\sigma} \log^+ |\sigma\xi_i| \to 0 \tag{4.2} \]
as $i \to \infty$; this implies $\log |a_{0i}| = o(d_i)$ and $\sum_\sigma \log^+ |\sigma\xi_i| = o(d_i)$, where $\sigma$
ranges over all embeddings of $\mathbb{Q}(\xi_i)$ into $\mathbb{C}$.
By weak-* convergence we deduce
\[ \frac{1}{d_i} \sum_{\sigma} f(\sigma\xi_i) \log^+ |\sigma\xi_i| \to \int_{\mathbb{C}^\times} f(z) \log^+ |z| \, d\mu(z) \]
for any continuous function f(z) with compact support in $\mathbb{C}^\times$. Thus (4.2) shows
that
\[ \int_{\mathbb{C}^\times} f(z) \log^+ |z| \, d\mu(z) = 0 \]
and $\mu$ must be supported in the unit disk $|z| \le 1$. Since $h(1/\xi_i) = h(\xi_i) \to 0$,
working with the sequence $(1/\xi_i)_{i \in \mathbb{N}}$ we deduce in a similar fashion that $\mu$ is
supported in $|z| \ge 1$. Thus any limit measure $\mu$ has support in the unit circle T.
By weak-* convergence again, we conclude that $\mu$ is a probability measure.
Let $D_i$ be the discriminant of a minimal equation for $\xi_i$. By Proposition 1.6.9, we
have
\[ \frac{1}{d_i} \log |D_i| \le \log d_i + (2d_i - 2) h(\xi_i). \]
Therefore, we get
\[ 0 \le \log |D_i| = (2d_i - 2) \log |a_{0i}| + \sum_{\sigma \ne \sigma'} \log |\sigma\xi_i - \sigma'\xi_i| = o(d_i^2). \tag{4.3} \]
A fortiori, we get
\[ \limsup_{i \to \infty} \frac{1}{d_i^2} \sum_{\sigma \ne \sigma'} \phi_\varepsilon(|\sigma\xi_i - \sigma'\xi_i|) \log \frac{1}{|\sigma\xi_i - \sigma'\xi_i|} \le 0, \]
because $\log(1/t) > 0$ if $\phi_\varepsilon(t) < 1$, while $\phi_\varepsilon(t) \le 1$ always. By weak-*
convergence, we infer that
\[ \iint_{T^2} \phi_\varepsilon(|x - y|) \log \frac{1}{|x - y|} \, d\mu(x) \, d\mu(y) \le 0 \]
for $0 < \varepsilon \le 1$. Since $\mu$ is a continuous measure, the diagonal has measure 0 and
monotone convergence shows that (4.5) holds.
Equality holds only if $\hat\mu(n) = 0$ for $n \ne 0$, hence (4.5) proves that $\mu$ is the
uniform measure on T (if a regular Borel measure has all its Fourier coefficients
0, it must be the zero measure).
Remark 4.3.2. We have stated Bilu’s theorem with the condition that the algebraic
numbers ξi are all distinct. The proof of the theorem shows that the only thing
that matters here is that di → ∞ . By the assumption h(ξi ) → 0 and Kronecker’s
theorem in 1.5.9, we may relax the hypothesis of the theorem to h(ξi ) → 0 and
the condition that no root of unity in the sequence (ξi )i∈N is repeated infinitely
often.
4.3.3. Now we give a second proof of Zhang’s theorem 4.2.2 (b). In the form given
here, this proof is non-constructive (it depends on compactness arguments), hence
it gives only the existence of a positive lower bound, but not an explicit dependence
on n , d , [K : Q], and max h(fi ).
\[ f_\chi(x) = f(x_1^{m_1}) \cdots f(x_n^{m_n}) \]
We would like to replace $f_\chi(x)$ by $f(\chi(x))$ in the last sum, but this step requires
justification, because $f(\chi(x))$ is not compactly supported in $(\mathbb{C}^\times)^n$. Let
$m = \max |m_j|$ and $M = \max |f|$. By definition of f, we have $f_\chi(x) = f(\chi(x))$ if
$|\log |x_j|| \le c/m$ for every j, and $|f_\chi(x)| \le M^n$ in any case.
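We have
\[ \frac{1}{[\mathbb{Q}(\xi_i) : \mathbb{Q}]} \sum_{\sigma : \mathbb{Q}(\xi_i) \to \mathbb{C}} f_\chi(\sigma\xi_i) = \frac{1}{[\mathbb{Q}(\chi(\xi_i)) : \mathbb{Q}]} \sum_{\tau : \mathbb{Q}(\chi(\xi_i)) \to \mathbb{C}} f(\tau\chi(\xi_i)) + \frac{1}{[\mathbb{Q}(\xi_i) : \mathbb{Q}]} \sum_{\sigma : \mathbb{Q}(\xi_i) \to \mathbb{C}} \bigl( f_\chi(\sigma\xi_i) - f(\chi(\sigma\xi_i)) \bigr). \]
A typical summand in the last sum is 0 unless $|\log |\sigma\xi_{ij}|| > c/m$ for some j,
and in any case does not exceed $M + M^n$. Thus the last sum does not exceed
\[ \sum_{j=1}^{n} \frac{1}{[\mathbb{Q}(\xi_i) : \mathbb{Q}]} \sum_{\substack{\sigma : \mathbb{Q}(\xi_i) \to \mathbb{C} \\ |\log |\sigma\xi_{ij}|| > c/m}} (M + M^n) \le \sum_{j=1}^{n} \frac{1}{[\mathbb{Q}(\xi_{ij}) : \mathbb{Q}]} \sum_{\substack{\sigma' : \mathbb{Q}(\xi_{ij}) \to \mathbb{C} \\ |\log |\sigma'\xi_{ij}|| > c/m}} (M + M^n). \]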
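Since $h(\xi_i) \to 0$, by (4.2) on page 102 and Remark 4.3.2, this tends to 0 as
$i \to \infty$, and proves
\begin{align*}
\lim_{i \to \infty} \frac{1}{[\mathbb{Q}(\xi_i) : \mathbb{Q}]} \sum_{\sigma : \mathbb{Q}(\xi_i) \to \mathbb{C}} f_\chi(\sigma\xi_i) &= \lim_{i \to \infty} \frac{1}{[\mathbb{Q}(\chi(\xi_i)) : \mathbb{Q}]} \sum_{\tau : \mathbb{Q}(\chi(\xi_i)) \to \mathbb{C}} f(\tau\chi(\xi_i)) \\
&= \int_T x \, d\mu_\chi(x) && \text{(by weak-* convergence)} \\
&= 0. && \text{(by Theorem 4.3.1)}
\end{align*}
In view of (4.7), this proves (4.6).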
As in the proof of Bilu’s theorem, it is clear that $\mu$ is a probability measure.
Now the characters $\chi(x)$ restricted to $T^n$ form an orthonormal basis of $L^2(T^n)$,
whence (4.6) shows that the restriction of $\mu$ to $T^n$ is the uniform measure on $T^n$,
because they have the same Fourier coefficients.
In particular, $T^n$ is contained in the union of the conjugates of X over $\mathbb{Q}$. Since
torsion points are Zariski dense in $G_m^n$ (Proposition 3.3.6), this contradicts the
assumption that X is a proper algebraic subvariety of $G_m^n$.
Case II: There is a non-trivial character $\chi$ such that the sequence $(\chi(\xi_i))_{i \in \mathbb{N}}$ has
an element $\varepsilon_0$ occurring infinitely many times.
Since $h(\chi(\xi_i)) \to 0$, we have $h(\varepsilon_0) = 0$ and $\varepsilon_0$ is a root of unity by Kronecker’s
theorem in 1.5.9. Let $\varepsilon$ be a torsion point such that $\chi(\varepsilon) = \varepsilon_0$ and replace X by
$\varepsilon^{-1}X$ and $\{\xi_i\}$ by $\{\varepsilon^{-1}\xi_i\}$. Now $(\varepsilon^{-1}X)^* = \varepsilon^{-1}(X^*)$ and multiplication by
a torsion point does not change the height; therefore, there is no loss of generality
in assuming that $\varepsilon_0 = 1$. Further, going to an infinite subsequence of the sequence
$(\xi_i)_{i \in \mathbb{N}}$ if needed, we may also assume that there is a torsion point $\varepsilon'$ such that
$\{\varepsilon'\xi_i\}$ is contained in the connected component of the identity of the kernel of
$\chi$, say H. Now H is a proper subtorus of $G_m^n$ and we may replace X, $G_m^n$ by
$\varepsilon'X \cap H$ and H, and then use induction.
Remark 4.3.4. As it stands, this proof does not lead to an effective form of
Theorem 4.2.2. However, it is not difficult to show that there is a lower bound
depending only on d , n , [K : Q], and max h(fi ) for a set of defining equations
of X .
∗ The theorem states that $x^m - a$ is reducible over a field K of characteristic 0 if and only if
$a = b^p$ for some $b \in K$ and a prime p with $p \mid m$, or $a = -4b^4$ and $4 \mid m$.
hence $\alpha^{q^m - p^m} = 1$ and $\alpha$ is a root of unity, which was excluded from the
beginning.
Our next step is the construction of a polynomial F (x) ∈ Z[x] of degree at most
D , vanishing at α to order at least m . Here D and m are large parameters (in
the end going to ∞ ) and we want some control on the height of F (x).
Lemma 4.4.3. Let $\alpha$ be an algebraic number of degree $d \ge 2$. Let us fix $\varepsilon$
with $0 < \varepsilon < 1$ and suppose that $dm \le (1 - \varepsilon)D$. Then there is a polynomial
$F(x) \in \mathbb{Z}[x]$ of degree at most D, not identically 0, vanishing at $\alpha$ to order at
least m and such that
\[ h(F) \le \frac{dm^2}{D - dm} \Bigl( \log \frac{D}{m} + 1 \Bigr) + \frac{Ddm}{D - dm} \, h(\alpha) + o(D) \]
as $D \to \infty$.
Proof: This follows from Siegel’s lemma, Corollary 2.9.7, noting that in our case
the matrix A is
\[ A = \Bigl( \binom{j}{h} \alpha^{j-h} \Bigr) \]
with m rows indexed by h, $0 \le h < m$, and D + 1 columns indexed by j,
$j = 0, 1, \dots, D$. Using the easy bound
\[ \log \binom{a}{b} \le b \Bigl( \log \frac{a}{b} + 1 \Bigr) \]
we get $h(A_h) \le m(\log(D/m) + 1) + D h(\alpha)$ for the hth row of A.
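The "easy bound" for binomial coefficients used here follows from $\binom{a}{b} \le a^b/b!$ and $b! \ge (b/e)^b$. A minimal Python sketch checking it over a range of integers is given below; the function name `binomial_bound_holds` and the tested range are illustrative choices.

```python
from math import comb, log

def binomial_bound_holds(a, b):
    """Check log C(a, b) <= b * (log(a / b) + 1) for integers 1 <= b <= a."""
    # A tiny tolerance guards against floating-point rounding only.
    return log(comb(a, b)) <= b * (log(a / b) + 1) + 1e-12

# Exhaustive check for all 1 <= b <= a < 200.
assert all(binomial_bound_holds(a, b) for a in range(1, 200) for b in range(1, a + 1))
```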
4.4.4. In what follows, $\alpha$ will satisfy the conditions of Lemma 4.4.2. Since $\alpha$ is a
unit, it is an algebraic integer (see Remark 1.5.11). Let
\[ f(x) = \prod_{i=1}^{d} (x - \alpha_i) \in \mathbb{Z}[x] \]
for some $G(x) \in \mathbb{Z}[x]$ with $G(\alpha^p) \ne 0$ for every prime p not dividing d. More-
\[ H\Bigl( \frac{1}{e_p!} \Bigl( \frac{d}{dx} \Bigr)^{e_p} F \Bigr) \le \binom{D}{e_p} H(F). \tag{4.9} \]
Then the stated bound (4.8) for the norm follows from the last statement of Propo-
sition 1.6.6 and (4.9), (4.10), and (4.11).
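We compare with the lower bound $p^{dm}$, take logarithms, divide by d, and use
Lemma 4.4.3 to estimate h(F), obtaining
\[ m \log p \le e_p \Bigl( \log \frac{D}{e_p} + 1 \Bigr) + \frac{dm^2}{D - dm} \Bigl( \log \frac{D}{m} + 1 \Bigr) + \Bigl( pD + \frac{Ddm}{D - dm} \Bigr) h(\alpha) + o(D). \]
\[ \frac{\gamma \log p}{d} \le \frac{\gamma_p}{d} \Bigl( \log \frac{d}{\gamma_p} + 1 \Bigr) + \frac{\gamma^2}{(1 - \gamma)d} \Bigl( \log \frac{d}{\gamma} + 1 \Bigr) + \Bigl( p + \frac{\gamma}{1 - \gamma} \Bigr) h(\alpha) + o(1), \tag{4.12} \]
with the o(1) term going to 0 as $D \to \infty$.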
It remains to optimize inequality (4.12) by choosing a range for the prime p, the
parameter $\gamma < 1$, and an optimal $\gamma_p$. In what follows, we shall assume that d is
large and deal with estimates asymptotic with respect to d.
It is clearly convenient to make sure that $\gamma_p$ is as small as possible, and to this end
we note that
\[ \sum_{p} \gamma_p \le 1, \]
because each $f_p^{e_p}$ divides F and $f_p$ has degree d. In particular, if our set of
primes consists of the primes in an interval $[Y_0, Y]$ not dividing d, and N > 0 is
the number of such primes, there exists $p \in [Y_0, Y]$ such that p does not divide d
and $\gamma_p \le 1/N$. Since $x(\log(d/x) + 1)$ increases with x for 0 < x < d, we see
that (4.12) can be replaced by
\[ \frac{\gamma \log p}{d} \le \frac{1}{Nd} (\log(Nd) + 1) + \frac{\gamma^2}{(1 - \gamma)d} \Bigl( \log \frac{d}{\gamma} + 1 \Bigr) + \Bigl( p + \frac{\gamma}{1 - \gamma} \Bigr) h(\alpha) + o(1). \]
Now we let $D \to \infty$, getting rid of the o(1) term, and deduce a fortiori
\[ \frac{\gamma \log Y_0}{d} \le \frac{1}{Nd} (\log(Nd) + 1) + \frac{\gamma^2}{(1 - \gamma)d} \Bigl( \log \frac{d}{\gamma} + 1 \Bigr) + \Bigl( Y + \frac{\gamma}{1 - \gamma} \Bigr) h(\alpha), \tag{4.13} \]
with N the number of primes, not dividing d, in $[Y_0, Y]$.
In order to optimize (4.13), we consider the parameters $Y_0$, Y, $\gamma$ as functions of
$d \to \infty$.
We begin by estimating N. The number of primes dividing d is less than
$\log d / \log 2$, therefore the prime number theorem (H. Koch [162], Theorem
1.7.3) shows that
\[ N = \pi(Y) - \pi(Y_0) + O(\log d) \sim \frac{Y}{\log Y}, \]
provided $Y_0 = o(Y)$ and $\log d = o(Y / \log Y)$ as $Y \to \infty$, which we shall
assume. We do not want to choose Y too large (in fact, Y = o(d) will suffice),
nor $Y_0$ too small, and it is quite reasonable to choose $Y_0 = Y / \log Y$, ensuring
that $\log Y_0 \sim \log Y$, and
\[ \log Y = o(\log d), \tag{4.14} \]
ensuring that $\log(Yd) \sim \log d$. Moreover, we do not want $\gamma$ to be too small, and
in fact we shall need
\[ \log(1/\gamma) = o(\log d) \tag{4.15} \]
as $d \to \infty$. Then (4.13) becomes, as $Y \to \infty$,
\[ (1 - o(1)) \frac{\gamma \log Y}{d} \le (1 + o(1)) \frac{\log Y}{Yd} \log d + (1 + o(1)) \frac{\gamma^2}{d} \log d + (1 + o(1)) Y h(\alpha). \]
We rewrite this as
The remaining part of this section deals with further results about Lehmer’s conjecture and
may be skipped in a first reading. We will use it only in Section 4.6. A natural extension of
Lehmer’s conjecture to the higher dimensional case is:
Conjecture 4.4.6. There is c(n) > 0 with the following property. Let $\alpha_1, \dots, \alpha_n$ be
multiplicatively independent non-zero algebraic numbers. Then
\[ h(\alpha_1) \cdots h(\alpha_n) \ge \frac{c(n)}{[\mathbb{Q}(\alpha_1, \dots, \alpha_n) : \mathbb{Q}]}. \]
For n = 1, this reduces to Lehmer’s conjecture.
A significant extension of Dobrowolski’s theorem has been obtained by F. Amoroso and S.
David [9] in this context. They prove:
Theorem 4.4.7. There is a positive constant $c'(n)$ with the following property. Let $\alpha$ be
as in 4.4.6 and let $D = [\mathbb{Q}(\alpha_1, \dots, \alpha_n) : \mathbb{Q}]$. Then
\[ h(\alpha_1) \cdots h(\alpha_n) \ge \frac{c'(n)}{D} (\log(3D))^{-n\kappa(n)}, \]
where $\kappa(n) = (n + 1)(n + 1)!^n - 1$.
In fact, they prove this result in the more precise form in which the degree D is replaced
by the smallest degree ωQ (α) of a hypersurface defined over Q containing the point α .
As a corollary, we obtain the validity of Lehmer’s conjecture for any α not a root of unity
such that Q(α) is a Galois extension of Q .
We will not prove this result here and instead refer the interested reader to the original paper
of Amoroso and David.
4.4.8. Recall that a field extension K/Q is called abelian if it is a Galois extension with
an abelian Galois group. We conclude this section with a nice result of F. Amoroso and R.
Dvornicich [11], which provides a uniform positive lower bound for the height of algebraic
numbers, not a root of unity or 0 , in abelian extensions of Q .
Theorem 4.4.9. Let K/Q be an abelian extension and let $\alpha \in K$, $\alpha$ not a root of unity
or 0. Then
\[ h(\alpha) \ge \frac{\log(5/2)}{10}. \]
Remark 4.4.10. Amoroso and Dvornicich obtain the more precise lower bound log(5)/12
and give an example with height log(7)/12 .
4.4.11. For m ≥ 3 let ζm be a primitive m th root of unity and denote by Cm = Q(ζm )
the m th cyclotomic field of degree ϕ(m) , and by Om the ring of integers of Cm . By
the Kronecker–Weber theorem (see L.C. Washington [322], Th.14.1), any finite abelian
extension K of Q is contained in a cyclotomic extension of Q . Thus, in proving 4.4.9,
there is no loss of generality in assuming that K = Cm for some m .
Lemma 4.4.12. Let K be a number field and let w be a non-archimedean place of K .
Then, for any α ∈ K \ {0} , there exists an algebraic integer β ∈ K \ {0} , such that αβ
is an algebraic integer and
|β|w = 1/ max(1, |α|w ).
Proof: Let $S_0$ be the set of non-archimedean places v of K for which $|\alpha|_v > 1$ and set
$\xi_v = 1/\alpha$. If, moreover, $w \notin S_0$, set $\xi_w = 1$. Define $S = S_0 \cup \{w\}$. By the strong
approximation theorem (see Theorem 1.4.5), for any $\varepsilon > 0$, there is $\beta \in K \setminus \{0\}$ such
that
\[ |\beta - \xi_v|_v < \varepsilon \]
for every $v \in S$, and also $|\beta|_v \le 1$ for a non-archimedean v not in S.
Since we are dealing with ultrametric absolute values, by definition of $S_0$ and $\xi_v$ we see
that, if $\varepsilon$ is sufficiently small, we have $|\beta|_v = |1/\alpha|_v \le 1$ for $v \in S_0$, and $|\beta|_w = 1$
if $w \notin S_0$. Hence $|\beta|_w = \min(1, |1/\alpha|_w) = 1/\max(1, |\alpha|_w)$ and also $|\beta|_v \le 1$ and
$|\alpha\beta|_v \le 1$ for $v \in S$. If instead $v \notin S$ is a non-archimedean place, we have $|\alpha|_v \le 1$ by
definition of $S_0$, and again $|\beta|_v \le 1$ and $|\alpha\beta|_v \le 1$ for a non-archimedean v not in S.
Hence $\beta$ and $\alpha\beta$ are both algebraic integers, as noted in Definition 1.5.10.
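\[ \le |\beta|_v^{-p} \max\bigl( |(\alpha\beta)^p - \sigma(\alpha\beta)|_v, \ |\beta^p - \sigma\beta|_v \, |\sigma\alpha|_v \bigr) \le |p|_v \, |\beta|_v^{-p} \max(1, |\sigma\alpha|_v) \]
Now we apply the product formula to $\eta$. Let as usual $\varepsilon_v = 0$ if v is not archimedean and
$\varepsilon_v = [(C_m)_v : \mathbb{Q}_v]/[C_m : \mathbb{Q}]$ if $v \mid \infty$. If $v \mid p$, we have shown that
\[ \log |\eta|_v \le \log |p|_v + p \log^+ |\alpha|_v + \log^+ |\sigma\alpha|_v. \]
If instead v does not divide p, we have trivially
\[ \log |\eta|_v \le p \log^+ |\alpha|_v + \log^+ |\sigma\alpha|_v + \varepsilon_v \log 2. \]
Hence summing over all places and using the product formula, we get
\[ 0 = \sum_{v} \log |\eta|_v \le \sum_{v \mid p} \log |p|_v + \sum_{v} \bigl( p \log^+ |\alpha|_v + \log^+ |\sigma\alpha|_v + \varepsilon_v \log 2 \bigr) \]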
Note that $\eta \ne 0$, otherwise Lemma 4.4.13 shows that there would be a root of unity
$\zeta \in C_m$ such that $\zeta\alpha \in C_{m/p}$, which was excluded at the start. Hence the application of
the product formula yields, much in the same way as in the preceding case, the inequality
\[ h(\alpha) \ge \frac{\log(p/2)}{2p}. \]
Theorem 4.4.9 follows by considering the prime p = 5.
We denote by a bar complex conjugation on $C_m$, namely the automorphism determined by
$\zeta_m \mapsto \zeta_m^{-1}$.
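The choice p = 5 can be motivated numerically: the function $t \mapsto \log(t/2)/(2t)$ has its maximum at $t = 2e \approx 5.44$, so among primes only 5 and 7 are candidates. A minimal Python sketch comparing a few small odd primes follows; the list of primes and the name `lower_bound` are illustrative choices.

```python
from math import log

def lower_bound(p):
    """The value log(p/2) / (2p) appearing in the inequality above."""
    return log(p / 2) / (2 * p)

primes = [3, 5, 7, 11, 13, 17, 19, 23]   # illustrative range of odd primes
best = max(primes, key=lower_bound)

for p in primes:
    print(p, round(lower_bound(p), 5))
print("best prime:", best)
```

The maximizing prime gives exactly the constant $\log(5/2)/10$ of Theorem 4.4.9.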
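Corollary 4.4.14. Let $\gamma$ be an algebraic integer in $C_m$. If $\gamma/\overline{\gamma}$ is not a root of unity, it
holds that
\[ \frac{1}{\varphi(m)} \log N_{C_m/\mathbb{Q}}(\gamma) \ge \frac{\log(5/2)}{10}, \]
where $\varphi$ is the Euler $\varphi$-function.
Proof: By Theorem 4.4.9, Lemma 1.3.7, and the product formula, we have
\begin{align*}
\frac{\log(5/2)}{10} &\le \sum_{v} \log^+ |\gamma/\overline{\gamma}|_v = \sum_{v \nmid \infty} \log^+ |\gamma/\overline{\gamma}|_v \\
&\le -\sum_{v \nmid \infty} \log |\overline{\gamma}|_v = \sum_{v \mid \infty} \log |\gamma|_v \\
&= \frac{1}{\varphi(m)} \log N_{C_m/\mathbb{Q}}(\gamma).
\end{align*}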
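\begin{align*}
\lim_{p \to \infty} \frac{1}{p - 1} \log N_{C_p/\mathbb{Q}}(f(\zeta_p)) &= \lim_{p \to \infty} \frac{1}{p - 1} \sum_{h=1}^{p-1} \log |f(e^{2\pi i h/p})| \\
&= \int_0^1 \log |f(e^{2\pi i \theta})| \, d\theta \\
&= \log M(f) = d \, h(\alpha).
\end{align*}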
In this section, we consider only sets of algebraic numbers contained in a fixed algebraic
closure $\overline{\mathbb{Q}}$.
4.5.1. We say that a set A of algebraic numbers has the Northcott property (N) if for
every positive real number T the set
\[ A(T) = \{ \alpha \in A \mid h(\alpha) \le T \} \]
is finite. The Northcott theorem states that the set of all algebraic numbers of degree at most
d has property (N) (see Theorem 1.6.8).
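For $A = \mathbb{Q}$ the finiteness of A(T) is elementary, since $h(p/q) = \log\max(|p|, |q|)$ for a fraction in lowest terms. The following Python sketch enumerates the rationals whose exponential height $H = e^h$ is at most an integer bound B; the function name and the use of B in place of $e^T$ are illustrative choices made to avoid floating-point thresholds.

```python
from fractions import Fraction
from math import gcd

def rationals_with_H_at_most(B):
    """The finite set of rationals alpha with max(|p|, |q|) <= B, where alpha = p/q
    in lowest terms (together with 0, which has height 0)."""
    found = {Fraction(0)}
    for q in range(1, B + 1):
        for p in range(1, B + 1):
            if gcd(p, q) == 1:
                found.add(Fraction(p, q))
                found.add(Fraction(-p, q))
    return found

for B in (1, 2, 3, 10):
    print(B, len(rationals_with_H_at_most(B)))
```

The count grows with B but is finite for every B, which is property (N) for the set of rational numbers.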
We may ask if property (N) holds for other interesting sets. For example, does it hold for the
field $\mathbb{Q}^{(d)}$, the composite field of all number fields of degree at most d over $\mathbb{Q}$? Although
this question remains open in general, we shall show that this is the case if d = 2. More
generally, we show that property (N) holds for the maximal abelian subfield of $\mathbb{Q}^{(d)}$.
4.5.2. Let K be a number field and denote by $K^{(d)}$ the compositum of all extension fields
F/K of degree at most d over K. Then $K^{(d)}$ is normal over K. We also denote by $K_{ab}^{(d)}$
the compositum of all finite abelian extensions L/K with $K \subset L \subset K^{(d)}$. Since the
compositum of two finite abelian extensions is again a finite abelian extension, $K_{ab}^{(d)}$ is the
union of all finite abelian extensions over K contained in $K^{(d)}$. In particular, $K_{ab}^{(d)}/K$ is a
Galois extension, and it is the maximal abelian subfield of $K^{(d)}$. If $d \ge 2$, the fields $K^{(d)}$
and $K_{ab}^{(d)}$ have infinite degree over K.
We recall that a Galois extension F/K is called of exponent dividing $n \in \mathbb{N}$ if the order
of every element of Gal(F/K) divides n.
If L is a finite extension of K with $[L : K] \le d$ and Galois closure F, then $[F : K] \le d!$
and hence F has exponent dividing d!. Since the compositum of Galois extensions of
exponent dividing n is obviously a Galois extension of exponent dividing n, we conclude
that $K^{(d)}$ and hence also $K_{ab}^{(d)}$ are Galois extensions of exponent dividing d!.
In the following result, we will prove that the local degrees of $K^{(d)}/K$ are bounded. This
is a motivation to consider the Northcott property (N) for the field $K^{(d)}$, which is an open
problem. However, we will prove in Theorem 4.5.4 that $K_{ab}^{(d)}$ has property (N).
Proposition 4.5.3. Let v be any place of $M_K$ and let w be an extension of v to $K^{(d)}$ and
let $K_v$ and $K^{(d)}_w$ be the corresponding completions. Then the local degree $[K^{(d)}_w : K_v]$ is
bounded in terms of d and $[K : \mathbb{Q}]$ alone, independently of v, w.
Proof: It is enough to consider non-archimedean places. Let us fix an algebraic closure
$\Omega_v$ of $K_v$ and let p be the residue characteristic of v. By results of M. Krasner [163],
the number of subextensions of $\Omega_v/K_v$ is precisely known. We only use here that the
number of extensions of degree at most d is finite and bounded only in terms of d and
$[K_v : \mathbb{Q}_p]$. Therefore, the degree of their compositum is bounded only in terms of d and
$[K_v : \mathbb{Q}_p] \le [K : \mathbb{Q}]$. Since $K^{(d)}_w$ may be embedded in such a compositum, the result
follows.
Theorem 4.5.4. Property (N) holds for the field $K^{(d)}_{ab}$, for any d ≥ 2.
Corollary 4.5.5. The field $K^{(2)}$ has property (N).
Proof: Obvious, because $K^{(2)} = K^{(2)}_{ab}$.
Corollary 4.5.6. For any m ≥ 2, the field $\mathbb{Q}(\sqrt[m]{1}, \sqrt[m]{2}, \sqrt[m]{3}, \dots)$ has property (N).
Proof: Let $K = \mathbb{Q}(\sqrt[m]{1})$. Then each field $K(\sqrt[m]{a})$ is of degree at most m and abelian
over K. Therefore, their compositum $F = \mathbb{Q}(\sqrt[m]{1}, \sqrt[m]{2}, \sqrt[m]{3}, \dots)$ is abelian over K and
a subfield of $K^{(m)}_{ab}$. By Theorem 4.5.4, $K^{(m)}_{ab}$ has the Northcott property and the same
holds for its subfield F.
Proof of Theorem 4.5.4: In what follows, we abbreviate D = d!. We may enlarge the
number field K, hence we may suppose that K contains the field $\mathbb{Q}(\sqrt[D]{1})$ generated by
roots of unity of order D. Let us fix a positive real number T and let $\alpha \in K^{(d)}_{ab}$ satisfy
h(α) ≤ T. As a subfield of an abelian field, L = K(α) is automatically a finite abelian
extension of K. By 4.5.2, L/K has exponent dividing D.
Let p be a prime, unramified in K and let v be a place of K above p. For the following
considerations, the reader is assumed to be familiar with the notation and results from
ramification theory developed in B.2.18 and B.2.19. Let $e = e_{w/v}$ be the ramification index
of a place w of L lying over v. Since Gal(L/K) operates transitively on these places,
e does not depend on the choice of w. If p > d, then our remarks on the exponent show
that p does not divide the order of Gal(L/K). Hence w will be tamely ramified over v
and the inertia group of w over v is cyclic of order e (see B.2.18 (d), (e)) proving that e
divides D.
Now let $\theta = p^{1/e}$ for some choice of the root, and consider the field L(θ) with a place
u lying over w. As a compositum of two abelian extensions of exponent dividing D, we
note that L(θ)/K is also abelian of exponent dividing D. Using the theory of Eisenstein
polynomials (see J.-P. Serre [276], Ch.I, §6) and v unramified over p, we deduce that the
ramification index of $u|_{K(\theta)}$ over v is e and the residue degree is 1. By Abhyankar’s
lemma ([215], Cor.4, p.236), this and $u|_{K(\theta)}$ tamely ramified over v imply that u is
unramified over w. By Proposition 1.2.11, we conclude $e_{u/v} = e$.
Let $I \subset \mathrm{Gal}(L(\theta)/K)$ be the inertia group of u over v, a group of order e. Since
L(θ)/K is abelian, all the inertia groups above v are equal to I. Define U as the fixed
field of I. Then U is normal over K and U/K is unramified over v (see B.2.18 (d)). By
Galois theory, $[L(\theta) : U] = |I| = e$. Since $u|_U$ is unramified over p, we see again by the
theory of Eisenstein polynomials that $u|_{U(\theta)}$ has ramification index $e = [U(\theta) : U]$ over
$u|_U$, proving in particular that U(θ) = L(θ). It follows that α ∈ U(θ) and we may write
\[ \alpha = \beta_0 + \beta_1\theta + \dots + \beta_{e-1}\theta^{e-1}, \qquad \beta_i \in U. \]
The conjugates of θ over U are $\zeta^r\theta$, where ζ is a primitive e-th root of unity and r =
0, 1, . . . , e − 1. Therefore, the trace $\mathrm{Tr}_{U(\theta)/U}(\theta^j)$ vanishes if j is not a multiple of e and
equals e if j = 0. Hence
\[ \beta_j = \frac{1}{e}\,\mathrm{Tr}_{U(\theta)/U}(\alpha\theta^{-j}) = \frac{1}{e\,p^{j/e}} \sum_{r=0}^{e-1} \alpha_r\,\zeta^{-rj}, \]
where $\alpha_r$ are certain conjugates of α. Note that Proposition 1.5.17 yields $h(\alpha_r\zeta^{-rj}) =
h(\alpha) \le T$ for 0 ≤ r ≤ e − 1. By a standard inequality about the height of a sum (see
Proposition 1.5.15), we find
\[ h(\beta_j p^{j/e}) \le \log e + \sum_{r} h(\alpha_r) + \log e \le 2\log D + DT. \tag{4.17} \]
As before, let u be any place of U(θ) = L(θ) above v and use the same letter to denote
the associated discrete valuation normalized by $u(L(\theta)^\times) = \mathbb{Z}$. Since $\beta_j \in U$, we have
that $u(\beta_j)$ is divisible by e. Suppose now 1 ≤ j ≤ e − 1. Then $u(p^{j/e}) = j$ is not
divisible by e, whence $u(\beta_j p^{j/e}) \ne 0$. This shows that $|u(\beta_j p^{j/e})| \ge u(p^{1/e}) = 1$.
Let us abbreviate $\gamma = \beta_j p^{j/e}$ and suppose that γ ≠ 0. Letting $\delta_u$ be the local degree
$\delta_u := [U(\theta)_u : \mathbb{Q}_p]$, the choice of our normalizations in 1.3.6 leads to
\[ \bigl|\log|\gamma|_u\bigr| \ge -\frac{1}{e}\log|p|_u = \frac{\delta_u \log p}{e\,[U(\theta):\mathbb{Q}]}. \]
Thus we have
\[ 2h(\gamma) = h(\gamma) + h(\gamma^{-1}) \ge \sum_{u|v}\bigl|\log|\gamma|_u\bigr| \ge \Bigl(\frac{1}{e\,[U(\theta):\mathbb{Q}]}\sum_{u|v}\delta_u\Bigr)\log p. \]
By Corollary 1.3.2, we have $\sum_{u|v}\delta_u \ge [U(\theta):K]$. We conclude that, if γ ≠ 0, then
\[ 2h(\gamma) \ge \frac{1}{e\,[K:\mathbb{Q}]}\log p. \]
Comparing with (4.17), we derive that either $\beta_j = 0$ or
\[ \log p \le 2e\,[K:\mathbb{Q}]\,(2\log D + DT). \]
Let S be the set of rational primes containing all prime divisors of the discriminant $D_{K/\mathbb{Q}}$
and all primes $p \le \exp(2e\,[K:\mathbb{Q}](2\log D + DT))$. If $v \in M_K$ is lying over a prime
$p \notin S$, then B.2.13 shows that v is unramified over p. Hence our considerations above
yield that we must have $\beta_j = 0$ for 1 ≤ j ≤ e − 1. This means that the algebraic number α
lies in U, which is an abelian extension of K of exponent dividing D and unramified over
v. Hence K(α) is unramified above any $p \notin S$. This implies that K(α) is of bounded
degree over K , as we will show in Example 10.5.11. Here, we give a direct argument based
on Hermite’s discriminant theorem: Recall that a cyclic extension is a Galois extension with
cyclic Galois group. Writing Gal(K(α)/K) as a direct product of cyclic groups of order
dividing D , we see that K(α) is the compositum of cyclic extensions of K of degree at
most D, each unramified outside S. On the other hand, the power to which a prime divides
the discriminant of a number field of bounded degree is itself bounded (use Theorem B.2.12
and Corollary 1.3.2). Hence the discriminants of these cyclic extensions of K are bounded.
We conclude by Hermite’s discriminant theorem in B.2.14 that there are only finitely many
such cyclic fields. Hence there are only finitely many distinct fields K(α) and, since α has
bounded height, Northcott’s theorem, as in 1.6.8, shows that only finitely many possibilities
for α can occur.
Remark 4.6.4. The lim inf is with respect to the directed system of finite subsets of L .
If the sum on the right-hand side of (4.18) diverges, then L has property (N). Thus the
question arises whether there are infinite extensions L where this occurs. We have been
unable to find such examples, and we consider it unlikely that this can occur for an infinite
extension.
By Proposition 4.5.3, $S(\mathbb{Q}^{(d)})$ is the set of all prime numbers for every $d \in \mathbb{N}$.
Example 4.6.5. Let us say that a non-zero algebraic number α is totally p -adic if the
rational prime p splits completely in the field Q(α) , meaning that all local degrees of
places over p are 1 . Then the field L of all totally p -adic algebraic numbers is normal
and p ∈ S(L) . Hence L has the Bogomolov property. This may be considered as the
p -adic analog of results of Schinzel and Smyth for totally real algebraic numbers alluded to
in Remark 4.3.6.
Example 4.6.6. Let p1 , . . . , pm be distinct rational primes and let L be the field of all
totally p -adic algebraic numbers for p = p1 , . . . , pm . Then it is clear that pi ∈ S(L) for
This shows that the lower bound given by (4.18) is of the correct order of magnitude, insofar
as the contribution of primes with fp = ep = 1 is concerned. We will not prove this result
here, and refer instead to E. Bombieri and U. Zannier [40].
4.6.7. Proof of Theorem 4.6.3: We shall prove a general lower bound for the height of an
algebraic number, of which Theorem 4.6.3 will be an easy corollary.
we find
d
v(ad ) + min(0, v(αd )) = min v(ai ),
i
i=1
We substitute (4.19) in the right-hand side of this inequality and obtain from the formula
for the discriminant in Proposition 1.6.9 the inequality
\[ v(D) \ge 2\sum_{i<j\le r} v(\alpha_i - \alpha_j) - 2\sum_{j=r+1}^{d} (d-j)\,v(\alpha_j). \tag{4.20} \]
Consider now the reductions of $\alpha_i$, i ≤ r, modulo the maximal ideal of the valuation ring
of v. They are elements of the finite field $\mathbb{F}_q$ with $q = p^{f_p}$. For $x \in \mathbb{F}_q$, let $N_x$ be the
number of conjugates $\alpha_i$ with reduction x. Suppose i < j ≤ r. If $\alpha_i$ and $\alpha_j$ have the
same reduction, we have $v(\alpha_i - \alpha_j) > 0$, hence $v(\alpha_i - \alpha_j) \ge 1/e_p$, and otherwise we
have $v(\alpha_i - \alpha_j) \ge 0$; note that the number of pairs (i, j) with i < j and such that $\alpha_i$ and
$\alpha_j$ have the same reduction x is $N_x(N_x - 1)/2$. If instead j > r, we have $v(\alpha_j) < 0$,
hence $v(\alpha_j) \le -1/e_p$.
In view of these remarks, we deduce from (4.20) that
\[ v(D) \ge \frac{1}{e_p}\sum_{x\in\mathbb{F}_q} N_x(N_x-1) + \frac{1}{e_p}(d-r)(d-r-1). \tag{4.21} \]
we rewrite (4.21) as
\[ v(D) \ge \frac{d^2}{e_p}\Bigl(V_p(\alpha;K) + \frac{1}{q+1}\Bigr) - \frac{d}{e_p}. \tag{4.22} \]
This estimate is useful only in the range q < d, but, since D is a non-zero rational integer,
we have v(D) ≥ 0 in any case. Thus from (4.22) it follows that
\[ \log|D| \ge d^2\sum_{q<d}\frac{1}{e_p}\Bigl(V_p(\alpha;K) + \frac{1}{q+1} - \frac{1}{d}\Bigr)\log p, \tag{4.23} \]
where the sum ranges over all primes p with $q = p^{f_p} < d$. On the other hand, by
Proposition 1.6.9, we have
\[ \log|D| \le d\log d + (2d-2)\,d\,h(\alpha). \tag{4.24} \]
Combining (4.23) and (4.24), we finally obtain:
Theorem 4.6.8. Let K be a Galois extension of $\mathbb{Q}$. For a non-archimedean place v of K
lying over the rational prime p let $f_p$ and $e_p$ be the residue degree and ramification index
of v over p and write $q := p^{f_p}$. Let $\alpha \in K \setminus \{0\}$ be of degree d. Then
\[ h(\alpha) \ge -\frac{\log d}{2d-2} + \frac{d}{2d-2}\sum_{q<d}\frac{1}{e_p}\Bigl(V_p(\alpha;K) + \frac{1}{q+1} - \frac{1}{d}\Bigr)\log p. \]
4.6.9. Completion of the proof of Theorem 4.6.3: For α ∈ L , we apply Theorem 4.6.8 with
K = L . Since Vp (α; K) ≥ 0 in any case and we may restrict the sum to finitely many
primes p ∈ S(L) , the proof is completed by noting that, by Northcott’s theorem from 1.6.8,
in any infinite sequence of distinct algebraic numbers of bounded height, the degrees must
go to ∞ . Thus d → ∞ if we want to estimate lim inf h(α) in L .
\[ \frac{1}{\deg^2(\alpha_n)}\sum_{x\in\mathbb{F}_q\cup\{\infty\}}\Bigl(N_x - \frac{\deg(\alpha_n)}{q+1}\Bigr)^2\log p \to 0. \tag{4.25} \]
This may be regarded as an analog of Bilu’s theorem from 4.3.1. See also R. Rumely [254]
for related results in a p -adic and adelic setting.
Further results going beyond Zagier’s lower bound in Remark 4.2.7 can be found
in C. Doche [91]. The proof of Theorem 4.3.1 is a modification of an argument of
Bilu [24] and the alternative argument is a suggestion of J. Bourgain.
Dobrowolski’s theorem has been slightly improved by R. Louboutin [183], who
obtains a constant 9/4 instead of the constant 1/8 given here, by a different
method. The argument given here can also be refined to give the same constant
9/4, by using the full force of Siegel’s lemma in 2.9.4 (including the use of suc-
cessive minima).
The higher dimensional version of Dobrowolski’s theorem is due to Amoroso and
David [9] (see also [10] for a correction and further results).
The presentation of the Amoroso–Dvornicich theorem and its application to
Smyth’s theorem follows closely [11]. A relative version of this result has been
obtained by Amoroso and Zannier in [12].
The remarks about the Northcott property and the Bogomolov property can be
found in a paper of Bombieri and Zannier [40].
5 THE UNIT EQUATION
5.1. Introduction
x+y =1
to be solved with (x, y) ∈ Γ . A basic result, going back to Siegel, Mahler, and
Lang, asserts that this equation has only finitely many solutions. In Section 5.2,
we shall give a complete proof of this result based on the uniform Zhang theorem
in 4.2.3 and obtain a uniform bound for the number of solutions. This is applied
in Section 5.3 to give an upper bound for the number of integer solutions of the
Thue–Mahler equation and of a hyperelliptic equation.
The important problem of finding explicit upper bounds for the height of solutions
of a unit equation requires different methods. In Section 5.4, we give just some
results. We refer to A. Baker’s monograph [14] and J.-P. Serre [277], §8.3, for an
approach using Baker’s theory of linear forms in logarithms.
We may also consider a linear torus G over a field of characteristic 0, a finitely
generated subgroup Γ of G and study the set C ∩ Γ , where C is a geometrically
irreducible algebraic curve in G . Lang proved that, if C ∩ Γ is an infinite set,
then C is a translation of a subtorus of G . Lang conjectured and Liardet proved
that the same conclusion holds if we replace Γ by its division group, that is the
group Γ consisting of all points y ∈ G such that $y^n \in \Gamma$ for some n (we use
multiplicative notation in G ). We will give a sketch of an effective version of this
theorem in the special case when the field is a number field, see Theorem 5.4.5.
Similar statements can be made for G a commutative algebraic group with no $\mathbb{G}_a$
components (that is, a semiabelian variety) and replacing C by a subvariety X
of G , but they are far more difficult to prove; indeed, even the simplest case of a
curve in an abelian variety turns out to be equivalent to Mordell’s conjecture. The
latter will be proved in Chapter 11 and for the semiabelian case we refer to the
bibliographical notes in Section 11.11.
We have the following nice result of F. Beukers and H.P. Schlickewei [23]:
Theorem 5.2.1. There are absolute computable constants $C_1$, $C_2$ with the following
property. Let Γ be a subgroup of $\overline{\mathbb{Q}}^\times \times \overline{\mathbb{Q}}^\times$ with $\mathrm{rank}_{\mathbb{Q}}(\Gamma) = r < \infty$,
where $\mathrm{rank}_{\mathbb{Q}}(\Gamma)$ is the maximum number of multiplicatively independent elements
in Γ. Then the equation
\[ x + y = 1, \qquad (x, y) \in \Gamma \]
has at most $C_1 \cdot C_2^r$ solutions.
This result improves bounds $C^{r^2}$ and $(Cr)^r$ previously obtained by Schlickewei
and Schmidt. Beukers and Schlickewei give the values $C_1 = C_2 = 256$.
5.2.2. It is an interesting problem to determine the maximum number of solutions
of the equation x + y = 1 with (x, y) in a group Γ of rank r. In this vein, we
may remark the following. Suppose Γ is a subgroup of $K^\times \times K^\times$, where $K^\times$
is the multiplicative group of a number field K. If we take cosets in Γ/tors of
the subgroup of fourth powers, we are led to finding K-rational points on curves
$ax^4 + by^4 = 1$, which have genus 3.
It has been conjectured by L. Caporaso, J. Harris, and B. Mazur [56] that the
number of K-rational points on a curve of genus g ≥ 2 is bounded solely in
terms of K and g. Since we have $4^r$ cosets, this argument suggests that perhaps
$C_2 \le 4$.
5.2.3. In many applications the group Γ is the group $(U_{S,K})^2$, where S is a finite
set of places, containing all infinite places, of a number field K and $U_{S,K}$ is the
group of units of the ring of S-integers of K. By Dirichlet’s unit theorem from
1.5.13, $U_{S,K}$ is finitely generated of rank |S| − 1 and it is possible to determine
effectively a set of generators of $U_{S,K}$.
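As a small illustration of the setting in 5.2.3, the following sketch takes K = Q and S = {∞, 2, 3}, so that the S-units are the rationals ±2^a 3^b, and searches a finite box of exponents for solutions of x + y = 1. The choice of S, the exponent bound, and all function names are assumptions made for this example; a box search only lists the solutions it finds and does not by itself prove that no others exist.

```python
from fractions import Fraction

PRIMES = (2, 3)     # finite places in S (illustrative choice)
EXP_BOUND = 6       # exponents a, b range over [-6, 6] (assumption of this sketch)

def is_s_unit(q):
    """True if the nonzero rational q is of the form +-2^a * 3^b."""
    if q == 0:
        return False
    for part in (abs(q.numerator), q.denominator):
        for p in PRIMES:
            while part % p == 0:
                part //= p
        if part != 1:
            return False
    return True

def unit_equation_solutions():
    """Pairs (x, y) of S-units with x + y = 1 and x inside the exponent box."""
    solutions = set()
    exps = range(-EXP_BOUND, EXP_BOUND + 1)
    for sign in (1, -1):
        for a in exps:
            for b in exps:
                x = sign * Fraction(2) ** a * Fraction(3) ** b
                y = 1 - x
                if is_s_unit(y):
                    solutions.add((x, y))
    return solutions

sols = unit_equation_solutions()
print(len(sols))
print(sorted(sols)[:5])
```

Here $U_{S,K}$ has rank |S| − 1 = 2, so Γ has rank 4 and Theorem 5.2.1 with $C_1 = C_2 = 256$ allows up to $256^5$ solutions, far more than the search finds.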
We will give below some examples of Γ with a large number of solutions of the
unit equation.
Example 5.2.4. The following simple argument yields an example of a subgroup of
$\mathbb{Q}^\times \times \mathbb{Q}^\times$ with a large number of solutions.
Let N ≥ 2 be a positive integer. Let M be the number of positive integers up to x whose
prime factors do not exceed $x^{1/N}$. It is clear that
\[ M \ge \frac{\pi(x^{1/N})^N}{N!}. \]
Since π(y) > y/log y for y ≥ 17 (see B. Rosser and L. Schoenfeld [245], Th.1, Cor.1,
p.69) and $N! \le \tfrac12 N^N$, we see that $M > 2x/(\log x)^N$ if $x \ge 17^N$. Consider the $M^2$
sums n + n′, where n, n′ ≤ x are positive integers whose prime factors do not exceed
$x^{1/N}$. Since n + n′ ≤ 2x, one sum must occur at least $M^2/(2x) > 2x/(\log x)^{2N}$
times. In other words, there is an integer b such that the equation n + n′ = b has at least
$2x/(\log x)^{2N}$ solutions.
It follows that, if Γ is the subgroup of $\mathbb{Q}^\times \times \mathbb{Q}^\times$ generated on each factor by all primes
up to $x^{1/N}$ and by b, then the unit equation in Γ has at least $2x/(\log x)^{2N}$ solutions,
provided $x \ge 17^N$.
This group Γ has rank r equal to either $2\pi(x^{1/N})$ or $2\pi(x^{1/N}) + 2$, hence we have
$r \sim 2N x^{1/N}/\log x$ as $x^{1/N}$ tends to ∞. If we make the asymptotically optimal choice
\[ N = \left\lfloor \frac{\log x}{2\log\log x} - \frac{\log x}{2(\log\log x)^2} \right\rfloor, \]
we verify that the number of solutions is at least
\[ \frac{x}{(\log x)^{2N}} = e^{(c+o(1))\sqrt{r}/\sqrt{\log r}} \]
with $c = \sqrt{2}/e$.
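The pigeonhole step of Example 5.2.4 can be tried out for small parameters. The sketch below counts the integers n ≤ x whose prime factors p satisfy p^N ≤ x, tabulates all sums n + n′, and reports the most frequent sum b. The sample values x = 1000, N = 2 and the function names are illustrative assumptions; the code checks the inequalities of the example for these values only and says nothing about the asymptotic statement.

```python
from collections import Counter
from math import factorial, log

def allowed_primes(x, N):
    """Primes p with p**N <= x (that is, p <= x**(1/N)), by trial division."""
    primes = []
    p = 2
    while p ** N <= x:
        if all(p % q for q in primes):
            primes.append(p)
        p += 1
    return primes

def smooth_numbers(x, N):
    """Positive integers n <= x all of whose prime factors are allowed."""
    primes = allowed_primes(x, N)
    smooth = []
    for n in range(1, x + 1):
        m = n
        for p in primes:
            while m % p == 0:
                m //= p
        if m == 1:
            smooth.append(n)
    return smooth

def most_frequent_sum(x, N):
    """Return (M, b, multiplicity) for the most frequent value b = n + n'."""
    smooth = smooth_numbers(x, N)
    counts = Counter(n1 + n2 for n1 in smooth for n2 in smooth)
    b, multiplicity = counts.most_common(1)[0]
    return len(smooth), b, multiplicity

x, N = 1000, 2
M, b, mult = most_frequent_sum(x, N)
print("M =", M, " b =", b, " multiplicity =", mult)
print("lower bounds:", 2 * x / log(x) ** N, M * M / (2 * x))
```

Using the integer test p^N ≤ x instead of comparing with a floating-point root avoids rounding problems at the boundary.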
Example 5.2.5. Consider the equation au + bv = 1 for non-zero algebraic numbers a, b
to be solved with (u, v) ∈ Γ . This may be reduced to the unit equation by enlarging the
range of solutions to the group generated by Γ and (a, b) . This procedure will be used
later.
Here, we are interested directly in the equation $ax^m + by^m = 1$ for varying m,
corresponding to a group $\Gamma = (x, y)^{\mathbb{Z}}$ of rank 1. We want to find a, b, x, y such that it has
the maximum number of solutions for $m \in \mathbb{Z}$.
We may assume that m = 0 is a solution. Suppose that m = 1 is also a solution, so the
equation becomes $(y - 1)x^m + (1 - x)y^m - (y - x) = 0$. Here we must exclude x = 1,
y = 1, and x = y which correspond to degenerate cases.
If we fix two other solutions, say $m_1$ and $m_2$, we can eliminate y and obtain an equation
for x. In general, this leads to pairs (x, y) such that we have four solutions, namely m =
0, 1, $m_1$, $m_2$.
Note however that there are special cases. If $m_1 = 2$, the equation degenerates into
(x − 1)(y − 1)(x − y) = 0, so $m_1 = 2$ must be excluded. Also, if $m_1 = 3$, the
values $m_2$ = 4, 5, 6, 7, 9 must be excluded, because they lead to degenerate cases or a
group Γ of rank 0.
However, taking $m_1 = 4$ and $m_2 = 6$ gives the equation $x^6 + x^5 + 2x^4 + 3x^3 + 2x^2 +
x + 1 = 0$. For any root ξ of this equation, we see that taking $\eta = -1/(1 + \xi + \xi^3)$,
which is another root of the same equation, we have
\[ \frac{\eta-1}{\eta-\xi}\,\xi^m + \frac{1-\xi}{\eta-\xi}\,\eta^m = 1 \]
for the six values m = 0, 1, 4, 6, 13, 52.
Other examples are obtained by letting the Galois group of the equation (which is of order
6, generated by ξ ↦ 1/ξ and ξ ↦ η) act on $\Gamma = (\xi, \eta)^{\mathbb{Z}}$, and also going to a division
group. It is conceivable that 6 is the maximum number of solutions and that any group of
rank 1 with six solutions is obtained in this way.
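The six values of m in Example 5.2.5 can be checked with exact integer arithmetic. The sketch below works in the ring Z[x]/(P), with P the sextic of the example and ξ the class of x. To avoid inverting w = 1 + ξ + ξ³ (so that η = −1/w), the cleared form (η − 1)ξ^m + (1 − ξ)η^m − (η − ξ) = 0 of the equation is multiplied by w^m. This clearing of denominators is a device of the sketch, not taken from the text, and it is an equivalence only on the assumption that P is irreducible; the function names are likewise illustrative.

```python
# P(x) = x^6 + x^5 + 2x^4 + 3x^3 + 2x^2 + x + 1; coefficients of 1, x, ..., x^5.
P_LOW = [1, 1, 2, 3, 2, 1]
DEG = 6

def mul(a, b):
    """Product of two elements of Z[x]/(P), given as length-6 coefficient lists."""
    prod = [0] * (2 * DEG - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            prod[i + j] += ai * bj
    # Reduce with x^6 = -(1 + x + 2x^2 + 3x^3 + 2x^4 + x^5), highest degree first.
    for k in range(2 * DEG - 2, DEG - 1, -1):
        c = prod[k]
        prod[k] = 0
        for i in range(DEG):
            prod[k - DEG + i] -= c * P_LOW[i]
    return prod[:DEG]

def power(a, n):
    result = [1] + [0] * (DEG - 1)
    for _ in range(n):
        result = mul(result, a)
    return result

def add(*terms):
    return [sum(cs) for cs in zip(*terms)]

def scale(c, a):
    return [c * ai for ai in a]

ONE = [1, 0, 0, 0, 0, 0]
XI = [0, 1, 0, 0, 0, 0]
W = [1, 1, 0, 1, 0, 0]      # w = 1 + xi + xi^3, so that eta = -1/w

def cleared_equation(m):
    """w^m * ((eta - 1) xi^m + (1 - xi) eta^m - (eta - xi)) for m >= 1,
    rewritten with eta * w = -1 so that no division is needed."""
    w_pow = power(W, m - 1)
    t1 = scale(-1, mul(mul(add(ONE, W), w_pow), power(XI, m)))
    t2 = scale((-1) ** m, add(ONE, scale(-1, XI)))
    t3 = mul(add(ONE, mul(XI, W)), w_pow)
    return add(t1, t2, t3)

def is_solution(m):
    return not any(cleared_equation(m))

# m = 0 is a solution by construction: (eta - 1) + (1 - xi) - (eta - xi) = 0.
print([m for m in range(1, 60) if is_solution(m)])
```

Because the arithmetic is exact, the check does not depend on numerical root finding, even though powers as high as ξ^52 are involved.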
The above problem is closely connected to finding zeros of a linear recurrence: Let $u_{m+1} =
Au_m + Bu_{m-1} + Cu_{m-2}$ be a linear recurrence of the third order, which we assume
non-degenerate in the sense that the roots $\beta_i$ (i = 1, 2, 3) of the associated characteristic
equation $x^3 - Ax^2 - Bx - C = 0$ are distinct and non-zero. We may consider the recurrence also
in the negative direction by solving for $u_{m-2}$. Then the general solution of the recurrence
is given by
\[ u_m = C_1\beta_1^m + C_2\beta_2^m + C_3\beta_3^m. \]
Let $a = -C_1/C_3$, $b = -C_2/C_3$, $x = \beta_1/\beta_3$ and $y = \beta_2/\beta_3$. Then solving the equation
$ax^m + by^m = 1$ in the group $\Gamma = (x, y)^{\mathbb{Z}}$ of rank 1 is equivalent to finding the zeros of
the recurrence $\{u_m \mid m \in \mathbb{Z}\}$.
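A short sketch of the recurrence point of view: the code below iterates a third-order recurrence with integer coefficients and lists the indices at which a term vanishes. The sample data A = 2, B = −4, C = 4 with initial terms 0, 0, 1 is the sequence usually attributed to Berstel in the literature; it is not taken from the text and is used here only as an illustration. Its zeros appear to match the six values of m in Example 5.2.5.

```python
def recurrence_terms(A, B, C, u0, u1, u2, count):
    """First `count` terms of u_{m+1} = A*u_m + B*u_{m-1} + C*u_{m-2}."""
    terms = [u0, u1, u2]
    while len(terms) < count:
        terms.append(A * terms[-1] + B * terms[-2] + C * terms[-3])
    return terms[:count]

# Illustrative data (an assumption of this sketch, not from the text).
terms = recurrence_terms(2, -4, 4, 0, 0, 1, 60)
zeros = [m for m, u in enumerate(terms) if u == 0]
print(terms[:14])
print(zeros)
```

Only non-negative indices are explored; the negative direction mentioned above would need the recurrence solved for $u_{m-2}$, which in general leaves the integers.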
Example 5.2.6. For a prime p, consider the cyclotomic field $C_p = \mathbb{Q}(\sqrt[p]{1})$ and the
corresponding unit equation. Here we choose Γ = U × U, where U is the group of units in the
ring of algebraic integers of $C_p$.
If u + v = 1 and u, v are not real, then $\bar u + \bar v = 1$ is another solution of the unit
equation. By Kronecker’s theorem in 1.5.9, $\varepsilon := \bar u/u$ and $\varepsilon' := \bar v/v$ are roots of unity
in $C_p$. Solving the system u + v = 1, $\varepsilon u + \varepsilon' v = 1$, we get $u = (\varepsilon' - 1)/(\varepsilon' - \varepsilon)$,
$v = (1 - \varepsilon)/(\varepsilon' - \varepsilon)$. Conversely, given distinct roots of unity ε, ε′ in $C_p$, not equal to
1, we obtain a solution u, v of the unit equation. Thus the number of non-real solutions of
the unit equation in $C_p$ is (p − 1)(p − 2).
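The formulas of Example 5.2.6 are easy to test numerically. The sketch below runs over ordered pairs of distinct p-th roots of unity different from 1 (one reading of the count (p − 1)(p − 2), adopted here as an assumption), forms u and v, and checks with floating-point arithmetic that u + v = 1 and that the conjugate of u equals εu. It does not verify that u and v are units; the function name and tolerances are illustrative.

```python
import cmath

def nonreal_solutions(p):
    """Tuples (eps, eps2, u, v) built from distinct p-th roots of unity != 1."""
    roots = [cmath.exp(2j * cmath.pi * k / p) for k in range(1, p)]
    sols = []
    for i, eps in enumerate(roots):
        for j, eps2 in enumerate(roots):
            if i == j:
                continue
            u = (eps2 - 1) / (eps2 - eps)
            v = (1 - eps) / (eps2 - eps)
            sols.append((eps, eps2, u, v))
    return sols

p = 7
sols = nonreal_solutions(p)
print(len(sols), (p - 1) * (p - 2))
print(max(abs(u + v - 1) for _, _, u, v in sols))
```

Since ε and ε′ can be recovered from u and v = 1 − u, different pairs give different solutions, which is why the count equals the number of pairs.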
Example 5.2.7. The number of solutions of the unit equation in the maximal real subfield
$K_p$ of $C_p$ is much larger. A computer search using cyclotomic units produced three
solutions for $K_5$, 42 solutions for $K_7$, 570 solutions for $K_{11}$, 1830 solutions for $K_{13}$, 11 700
solutions for $K_{17}$, and 28 398 solutions for $K_{19}$.
Example 5.2.8. The following example gives an equation u + v = 1 with at least 2532
solutions u, v ∈ U , for a certain group U of rank 5 . Let K = Q(α) with α the real root
α > 1 of the Lehmer equation
\[ x^{10} + x^9 - x^7 - x^6 - x^5 - x^4 - x^3 + x + 1 = 0. \]
This equation has another real root 1/α and eight non-real roots all of absolute value 1;
we shall refer to the map α ↦ 1/α as real conjugation in $\mathbb{Q}(\alpha)$. The Mahler measure of
α is M(α) = α = 1.17628081825991 . . . , and it is widely conjectured to be the infimum
of the Mahler measure of an algebraic number, not a root of unity (the so-called Lehmer
conjecture, see 1.6.15).
The group U of units of K has rank 5: $U = \{\pm 1\} \times \langle \alpha,\ 1 - \alpha,\ 1 + \alpha,\ 1 + \alpha + \alpha^2,\
1 + \alpha - \alpha^3 \rangle$. Now an extensive computer search for solutions of the corresponding
unit equation produced a remarkable total of 2532 solutions.
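The numerical value of α quoted in Example 5.2.8 can be reproduced by bisection on the Lehmer polynomial. The bracket [1, 1.3] and the number of steps are assumptions of this sketch; the polynomial is negative at 1 and positive at 1.3, and α is its only real root larger than 1.

```python
def lehmer(x):
    """The Lehmer polynomial of Example 5.2.8."""
    return x**10 + x**9 - x**7 - x**6 - x**5 - x**4 - x**3 + x + 1

def real_root_above_one(lo=1.0, hi=1.3, steps=200):
    """Bisection: lehmer(lo) < 0 < lehmer(hi) on the default bracket."""
    for _ in range(steps):
        mid = (lo + hi) / 2
        if lehmer(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

alpha = real_root_above_one()
print(alpha)            # Mahler measure M(alpha) = alpha
print(lehmer(alpha))    # residual, close to 0
```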
The following is a plot of the 2532 points (log |u|, log |u′|), where u is a real unit and u′
is the real conjugate of u.
[Figure: scatter plot of the 2532 points (log |u|, log |u′|); both axes run from −15 to 15.]
Lemma 5.2.9. Let f (x) ∈ K[[x]] be a formal power series with coefficients in
a field K . Let L, M be positive integers. Then there are polynomials P (x) ∈
K[x], Q(x) ∈ K[x] of degrees at most L and M and with Q not identically 0,
such that
\[ Q_{L,M,N}(x) = \sum_{j=0}^{M}\binom{N+j}{N}\binom{L+M-j}{L}\,x^j, \]
\[ P_{L,M,N}(x) = (-x)^L\,Q_{N,L,M}\Bigl(1 - \frac{1}{x}\Bigr), \]
\[ R_{L,M,N}(x) = (-1)^L\,Q_{L,N,M}(1 - x) \]
\[ \ell_1(Q_{L,M,N}) = Q_{L,M,N}(1) = \binom{L+M+N+1}{M}. \]
Finally, we have the identities, with ′ denoting the derivative
\[ (P_{L,M,N}(x))' = -(L+M+N+1)\,P_{L-1,M,N}(x), \]
\[ ((1-x)^{L+N+1}Q_{L,M,N}(x))' = -(L+M+N+1)(1-x)^{L+N}Q_{L-1,M,N}(x) \]
\[ (x^{L+M+1}R_{L,M,N}(x))' = -(L+M+N+1)\,x^{L+M}R_{L-1,M,N}(x). \]
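Two of the stated properties of $Q_{L,M,N}$ lend themselves to a quick exact check: the value at x = 1 and the derivative identity for $(1-x)^{L+N+1}Q_{L,M,N}(x)$. The sketch below represents polynomials as integer coefficient lists (constant term first) and tests both for small L, M, N. The helper names and the tested ranges are choices made for this example.

```python
from math import comb

def Q(L, M, N):
    """Coefficients of Q_{L,M,N}: sum_j C(N+j, N) C(L+M-j, L) x^j, j = 0..M."""
    return [comb(N + j, N) * comb(L + M - j, L) for j in range(M + 1)]

def poly_mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def one_minus_x_power(k):
    """Coefficients of (1 - x)^k."""
    return [(-1) ** i * comb(k, i) for i in range(k + 1)]

def derivative(a):
    return [i * a[i] for i in range(1, len(a))]

def identities_hold(L, M, N):
    """Check Q(1) = C(L+M+N+1, M) and the derivative identity, for L >= 1."""
    s = L + M + N + 1
    norm_ok = sum(Q(L, M, N)) == comb(s, M)
    lhs = derivative(poly_mul(one_minus_x_power(L + N + 1), Q(L, M, N)))
    rhs = [-s * c for c in poly_mul(one_minus_x_power(L + N), Q(L - 1, M, N))]
    return norm_ok and lhs == rhs

print(Q(1, 1, 0))
print(all(identities_hold(L, M, N)
          for L in range(1, 5) for M in range(5) for N in range(5)))
```

For instance, $Q_{1,1,0}(x) = 2 + x$ and $Q_{0,1,0}(x) = 1 + x$, and indeed $((1-x)^2(2+x))' = -3(1-x)(1+x)$.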
Proof: We have the following classical transformation of a hypergeometric integral,
due to Kummer (A. Erdélyi, W. Magnus, F. Oberhettinger, and F. Tricomi
[100], I, (29), p.106)
\[
\begin{aligned}
\int_0^1 t^M (t-1)^N (t-x)^L\,dt
&= x^{L+M+1}\Bigl(\int_0^1 + \int_1^{1/x}\Bigr) u^M (xu-1)^N (u-1)^L\,du \\
&= x^{L+M+1}\int_0^1 u^M (u-1)^L (xu-1)^N\,du \\
&\quad + (-1)^N (1-x)^{L+N+1}\int_0^1 v^N (1-v)^L (1-(1-x)v)^M\,dv,
\end{aligned}
\]
where we have performed the changes of variables t = xu in the first equation,
and xu = 1 − (1 − x)v in the second equation.
This hypergeometric identity determines explicitly the (L, M)-Padé approximant
of $(1 - x)^{L+N+1}$. We define polynomials P, Q, R of precise degrees L, M, N
by means of
\[
\begin{aligned}
P(x) &= \int_0^1 t^M (1-t)^N (t-x)^L\,dt \\
Q(x) &= \int_0^1 v^N (1-v)^L (1-(1-x)v)^M\,dv \\
R(x) &= (-1)^L\int_0^1 u^M (1-u)^L (1-xu)^N\,du.
\end{aligned}
\]
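\[
\begin{aligned}
Q(x) &= \sum_{j=0}^{M}\binom{M}{j}\Bigl(\int_0^1 v^{N+j}(1-v)^{L+M-j}\,dv\Bigr)x^j \\
&= \sum_{j=0}^{M}\binom{M}{j}\frac{(N+j)!\,(L+M-j)!}{(L+M+N+1)!}\,x^j \\
&= D^{-1}\sum_{j=0}^{M}\binom{N+j}{N}\binom{L+M-j}{L}\,x^j,
\end{aligned}
\tag{5.4}
\]
\[ \ell_1(Q_{L,M,N}) = D\int_0^1 v^N (1-v)^L\,dv = \binom{L+M+N+1}{M}. \]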
The uniqueness of Padé approximants now can be used to obtain relations between
two Padé approximants associated to a triple (L, M, N ) and to a permutation.
\[ P_{L,M,N}(x) = (-x)^L\,Q_{N,L,M}\Bigl(1 - \frac{1}{x}\Bigr), \qquad R_{L,M,N}(x) = (-1)^L\,Q_{L,N,M}(1-x). \]
Note that P(λ) ≠ 0, because P(x) and R(x) have no common zeros. Dividing
by P(x), we find
\[ -(\gamma - \beta)\,\varphi(x) = (x - \lambda)^2\,\frac{S(x)}{P(x)} - (\alpha - \beta). \]
\[ x_1 = \frac{cb' - c'b}{ab' - a'b}, \qquad x_2 = \frac{ac' - a'c}{ab' - a'b}. \]
Hence
\[
\begin{aligned}
h(x) &= h((ab' - a'b : cb' - c'b : ac' - a'c)) \\
&= \sum_v \log\max(|ab' - a'b|_v,\ |cb' - c'b|_v,\ |ac' - a'c|_v) \\
&\le \log 2 + \sum_v \log\max(|a|_v, |b|_v, |c|_v) + \sum_v \log\max(|a'|_v, |b'|_v, |c'|_v) \\
&= \log 2 + h((a : b : c)) + h((a' : b' : c')).
\end{aligned}
\]
\[ = \log\binom{3n+1}{n} + (n+1)\,h(x). \]
By definition
\[ h((a : b : 1)) = h(yx^{-2n}), \]
\[ 2n\,h(x) \le \log 2 + \log\binom{3n+1}{n} + (n+1)\,h(x) + h(yx^{-2n}), \]
hence
\[ h(x) \le \frac{1}{n-1}\log\Bigl(2\binom{3n+1}{n}\Bigr) + \frac{1}{n-1}\,h(yx^{-2n}). \tag{5.10} \]
If instead equations (5.6) and (5.7) on page 134 are linearly dependent,
equations (5.7) and (5.8) on page 135 must be linearly independent. The same
calculation as before now shows that
\[ h(x) \le \frac{1}{n}\log\Bigl(2\binom{3n}{n}\Bigr) + \frac{1}{n}\,h(yx^{-2n}), \]
which is better than (5.10). Thus (5.10) holds in any case.
The maximum of $\frac{1}{n-1}\log\bigl(2\binom{3n+1}{n}\bigr)$ occurs for n = 2 and equals log 42. This
proves the lemma.
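The final numerical claim, that the constant (1/(n − 1)) log(2·C(3n+1, n)) in (5.10) is largest at n = 2 with value log 42, can be checked directly. The search range 2 ≤ n < 200 is an arbitrary choice of this sketch; for large n the expression tends to log(27/4), well below log 42.

```python
from math import comb, log

def lemma_constant(n):
    """The constant (1/(n-1)) * log(2 * C(3n+1, n)) appearing in (5.10)."""
    return log(2 * comb(3 * n + 1, n)) / (n - 1)

best_n = max(range(2, 200), key=lemma_constant)
print(best_n, lemma_constant(best_n), log(42))
print([round(lemma_constant(n), 4) for n in range(2, 8)])
```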
5.2.16. Continuation of the proof of Theorem 5.2.1: Let Γ be a finitely generated
subgroup of $\overline{\mathbb{Q}}^\times \times \overline{\mathbb{Q}}^\times$, of rank r. Let $\Gamma_{tors}$ be its subgroup of torsion elements.
Then $\Gamma/\Gamma_{tors}$ is a free abelian group of rank r, which we may identify with $\mathbb{Z}^r$.
Let Z be the set of solutions of x + y = 1 in Γ and let $Z_0$ be its image in $\mathbb{Z}^r$
under the projection $\Gamma \to \mathbb{Z}^r$. We claim that
\[ |Z| \le 2\,|Z_0|. \tag{5.11} \]
Indeed, elements of Z with same image in Z0 can be written as (aε, bζ) with a,
b fixed and ε and ζ roots of unity. Consider the triangle in the complex plane with
vertices at 0, aε and 1. Then the equation aε + bζ = 1 shows that its sides have
length 1, |a|, |b|.
There are at most two such triangles (intersect a circle of radius |a| and centre 0
and a circle of radius |b| and centre 1), showing that the projection of Z onto Z0
is at most two-to-one.
1
h(x) ≤ max(h(x1 ), h(x2 )) ≤ h(x) ≤
h(x).
2
In view of this inequality, Lemma 5.2.14 shows that for $u, v \in Z_0$ and any integer
n ≥ 2 we have
\[ \|u\| \le 2\kappa + \frac{2}{n-1}\,\|v - 2nu\|. \tag{5.12} \]
The idea behind the last two displayed inequalities is the following.
For a vector $u \in \mathbb{R}^r$ let $\nu(u) = u/\|u\|$ be the associated unit vector with respect
to the norm $\|\ \|$. Suppose that the vectors ν(u) and ν(v) are nearly the same, so
that u and v point about in the same direction. If $\|v\|$ is much larger than $\|u\|$,
then we can find an integer n such that $\|v - 2nu\|$ is small compared with $n\|u\|$,
and now (5.12) can be used to get an upper bound for $\|u\|$.
The details are quite simple. Let ε > 0 be a small positive constant and let u, v ∈
Z0 be two points with
5.2.18. Conclusion of the proof of Theorem 5.2.1: Let us call large a solution
x ∈ Γ of x1 + x2 = 1 if h(x) = h(x1 ) + h(x2 ) ≥ max(4κ, 5) and small
otherwise.
The counting of large solutions is done in two steps, first by providing an upper
bound for the number of points $u \in Z_0$ such that $H \le \|u\| \le AH$ and lying in a
fixed cone
\[ C(\varepsilon; a) := \{w \in \mathbb{R}^r \mid \|\nu(w) - a\| \le \varepsilon\}, \]
and then by covering all of $\mathbb{R}^r$ by means of finitely many cones $C(\varepsilon; a_i)$.
For the first step we use (5.13). Suppose we have two distinct points u, v ∈
Z0 ∩ C(ε; a) with
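\[
\begin{aligned}
\|u\| &\le \log 4 + 2\,\|v - u\| \\
&= \log 4 + 2\,\bigl\|\,\|v\|\cdot\nu(v) - \|u\|\cdot\nu(u)\,\bigr\| \\
&\le \log 4 + 2\,(\|v\| - \|u\|) + 2\,\|u\|\cdot\|\nu(v) - \nu(u)\| \\
&\le \log 4 + (2\delta + 4\varepsilon)\,\|u\|.
\end{aligned}
\]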
hence, by Lemma 5.2.17, we have $\|u_m\| \le 90\,\|u_1\|$. On the other hand, the
preceding gap principle shows that $\|u_m\| \ge (\tfrac54)^{m-1}\|u_1\|$. Hence m − 1 ≤
log 90/ log(5/4) < 21 and, by (5.11) on page 136, we cannot have more than 42
large solutions with image in any given cone $C(\tfrac{1}{20}; a)$.
This means that we can find points $(a_i, b_i) \in \Gamma$, numbering not more than
$(1 + 2t/\gamma)^r$, such that every $x = (x_1, x_2) \in \Gamma$ with $x_1 + x_2 = 1$ and
\[ h(x) \le t \]
can be written, for some i, as $x_1 = a_i\xi$, $x_2 = b_i\eta$ with $a_i\xi + b_i\eta = 1$ and
h(ξ) + h(η) ≤ γ. Since there are at most N such (ξ, η), we deduce that the
number of (x, y) in question does not exceed $N \cdot (1 + 2t/\gamma)^r$.
We can take t = max(4κ, 5). Hence the number of small solutions does not
exceed $N \cdot (1 + 2\max(4\kappa, 5)/\gamma)^r$. Thus the total number of solutions does not
exceed
\[ 42 \cdot 41^r + N \cdot (1 + 2\max(4\kappa, 5)/\gamma)^r. \]
This completes the proof of Theorem 5.2.1 if Γ is finitely generated. In the general
case, Γ is the union of its finitely generated subgroups (of rank at most r ). Since
the above upper bound depends only on the rank, this proves the claim.
5.3. Applications
The importance of the generalized unit equation stems from the fact that many
diophantine problems can be reduced to it. In this section, we review some of the
most interesting applications.
5.3.1. Let K be a number field, S a finite set of places of K containing all places
at infinity, and let $O_{S,K}$ denote the ring of S-integers of K, and let $U_{S,K}$ be
the group of units of $O_{S,K}$. Let also $F(x, y) \in O_{S,K}[x, y]$ be homogeneous of
degree r ≥ 3 with coefficients in $O_{S,K}$ and assume that F has at least three
non-proportional linear factors in a factorization over $\overline{K}$.
The Thue–Mahler equation is the equation $F(x, y) \in U_{S,K}$, to be solved with
$x, y \in O_{S,K}$.
In 1909, using a new method based on diophantine approximation, Thue proved
that the equation F (x, y) = m , with F (x, y) ∈ Z[x, y] and with three non-
proportional linear factors over C has only finitely many solutions in integers (see
6.2.1 for the argument). Through the work of Siegel and Mahler, this was extended
to equations in number fields to be solved in S -integers and to the more general
Thue–Mahler equation, with the proviso of considering equivalent two solutions
differing only by multiplication by an S -unit. We have:
\[ A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \]
with
\[
\begin{aligned}
a &= F(x_0, y_0)^{-1}\,(a_0x_0^{r-1} + a_1x_0^{r-2}y_0 + \dots + a_{r-1}y_0^{r-1}) \\
b &= F(x_0, y_0)^{-1}\,a_r y_0^{r-1} \\
c &= -y_0 \\
d &= x_0.
\end{aligned}
\]
Then det(A) = 1 and A has entries in $O_{S,K}$. Therefore, the Thue–Mahler equation
$F(x, y) \in U_{S,K}$ is equivalent to the other Thue–Mahler equation
\[ G(x, y) := F(x_0, y_0)^{-1}\,F(dx - by,\ -cx + ay) \in U_{S,K} \]
with leading coefficient G(1, 0) = 1.
Let $\alpha_1, \alpha_2, \alpha_3$ be three distinct roots of F(x, 1) = 0 over $\overline{K}$, and define:
(a) $K' = K(\alpha_1, \alpha_2, \alpha_3)$;
(b) S′ the set of places of K′ over S, $O_{S',K'}$ the ring of S′-integers of K′,
and $U_{S',K'}$ the group of units of $O_{S',K'}$.
The field K′ has degree at most r(r − 1)(r − 2) over K and |S′| ≤ r(r − 1)(r −
2)|S| (use Corollary 1.3.2). The group $U_{S',K'}$ has rank |S′| − 1 by Dirichlet’s unit
theorem (see Theorem 1.5.13). Now we define Γ to be the group of pairs (u, v)
with $u, v \in U_{S',K'}$; it is clear that Γ has rank s not exceeding 2(|S′| − 1) ≤
2r(r − 1)(r − 2)|S| − 2.
Since F has leading coefficient 1, all roots $\alpha_1, \dots, \alpha_r$ of the equation F(x, 1) = 0
are integral over $O_{S,K}$. For a solution $(x, y) \in O_{S,K}^2$ of the Thue–Mahler
equation, each factor $x - \alpha_i y$ is integral over $O_{S,K}$ and the product is an S-unit.
Hence the factors $x - \alpha_i y$ are S′-units for i = 1, 2, 3.
On the other hand, the three linear forms $x - \alpha_i y$, i = 1, 2, 3, must be linearly
dependent, and in fact a linear relation is
\[ \frac{\alpha_2 - \alpha_3}{\alpha_2 - \alpha_1}\cdot\frac{x - \alpha_1 y}{x - \alpha_3 y} + \frac{\alpha_3 - \alpha_1}{\alpha_2 - \alpha_1}\cdot\frac{x - \alpha_2 y}{x - \alpha_3 y} = 1. \]
This is an equation of type Au + Bv = 1 with (u, v) ∈ Γ. We extend Γ
to a new group Γ′ of rank at most s + 1 by adding a new generator (A, B)
and apply Theorem 5.2.1. Then the number of solutions (u, v) does not exceed
$C_1 \cdot C_2^{s+1}$. Conversely, (u, v) determines (x, y) up to multiplication by a scalar,
and it follows that (u, v) determines at most one equivalence class of solutions
(x, y) of the Thue–Mahler equation $F(x, y) \in U_{S,K}$.
5.3.4. Another equation which can be treated by similar methods is the hyperelliptic equation
\[ b y^2 = a_0 x^r + a_1 x^{r-1} + \cdots + a_r \]
with coefficients in $O_{S,K}$, b ≠ 0, to be solved with $(x, y) \in O_{S,K}$. For its treatment, the reader is required to have some basic knowledge of algebraic number theory. We recall that the class group of a number field K is the group of fractional ideals in K modulo the principal fractional ideals.
There is little loss in generality if we assume that b = 1 and that $f(x) := a_0 x^r + a_1 x^{r-1} + \cdots + a_r$ has no multiple roots. In fact, we can always write $b f(x) = F(x) H(x)^2$ with $F, H \in O_{S,K}[x]$ and F(x) without multiple roots. This yields the equation
\[ Y^2 = F(X), \]
where Y = by/H(x) and X = x. Thus Y ∈ K if $X \in O_{S,K}$ and since Y is integral over $O_{S,K}$, which is integrally closed, we see that $Y \in O_{S,K}$ too.
Proof: We use a method of Siegel to reduce the equation to a finite number of unit
equations. We consider first the special case in which:
We write $f(x) = (x - \alpha_1)(x - \alpha_2)(x - \alpha_3) h(x)$. By (a), (b), and Gauss’s lemma in 1.6.3, we have $\alpha_1, \alpha_2, \alpha_3 \in O_{S,K}$ and $h(x) \in O_{S,K}[x]$.
Let $x \in O_{S,K}$ be a solution of $y^2 = f(x)$ with f(x) ≠ 0. We claim that the principal ideals $[x - \alpha_1]$, $[x - \alpha_2]$, $[x - \alpha_3]$, $[h(x)]$ in $O_{S,K}$ are pairwise coprime. The first three ideals are coprime because $\alpha_i - \alpha_j \in [x - \alpha_i, x - \alpha_j]$ and $\alpha_i - \alpha_j$ divides $D_f$, which is a unit by assumption (c). A similar argument applies to $x - \alpha_i$ and h(x), noting that by (a) all roots are integral over $O_{S,K}$ and working with a factorization of h(x) in the splitting field of f.
Now the ideal equation
\[ [x - \alpha_1]\,[x - \alpha_2]\,[x - \alpha_3]\,[h(x)] = [y]^2 \]
shows that
\[ [x - \alpha_i] = \mathfrak{y}_i^2, \quad i = 1, 2, 3 \]
for some ideal $\mathfrak{y}_i$ of $O_{S,K}$. By assumption (d), we conclude that the square root of $[x - \alpha_i]$ must be a principal ideal. Thus we can write
\[ x - \alpha_i = u_i y_i^2 \]
with $u_i \in U_{S,K}$ and $y_i \in O_{S,K}$, for i = 1, 2, 3. Note also that we can take $u_1$, $u_2$, and $u_3$ modulo squares. By Dirichlet’s unit theorem in 1.5.13, the group of units $U_{S,K}$ of $O_{S,K}$ is the direct product of a cyclic torsion group and a free abelian group of rank |S| − 1. Therefore, we need not consider more than $8^{|S|}$ triples $(u_1, u_2, u_3)$. Eliminating x from these equations we find
\[ u_i y_i^2 - u_j y_j^2 = \alpha_j - \alpha_i \quad \text{for } i, j = 1, 2, 3 \text{ and } i \ne j. \tag{5.14} \]
Let $F = K(\sqrt{u_1}, \sqrt{u_2}, \sqrt{u_3})$ and let $S'$ be the set of places of F lying over S. In the field F, the equations (5.14) factorize as
\[ (\sqrt{u_i}\, y_i - \sqrt{u_j}\, y_j)(\sqrt{u_i}\, y_i + \sqrt{u_j}\, y_j) = \alpha_j - \alpha_i \]
with $\sqrt{u_i}\, y_i \pm \sqrt{u_j}\, y_j \in O_{S',F}$, while $\alpha_j - \alpha_i \in U_{S',F}$ by (c). Thus $v_{ij} := \sqrt{u_i}\, y_i - \sqrt{u_j}\, y_j$ is a unit in $O_{S',F}$ for i, j = 1, 2, 3 and i ≠ j. On the other hand, we have identically
\[ (v_{12}/v_{13}) + (v_{23}/v_{13}) = 1. \]
Now $(v_{12}/v_{13}, v_{23}/v_{13}) \in U_{S',F} \times U_{S',F}$, which has rank $2|S'| - 2$. According to Theorem 5.2.1, this equation has at most $C_1 \cdot C_2^{2|S'|-2}$ solutions. Also $|S'| \le [F : K]\,|S| \le 8\,|S|$.
Hence let us fix $(v_{12}/v_{13}, v_{23}/v_{13})$, so that we can write $v_{ij} = c_{ij} w$ with the $\{c_{ij}\}$ having not more than $C_1 \cdot C_2^{2|S'|-2}$ possibilities. We have
\[ \begin{aligned} c_{ij}^{-1}(\alpha_j - \alpha_i) &= c_{ij}^{-1}(u_i y_i^2 - u_j y_j^2) \\ &= c_{ij}^{-1}\bigl(u_i y_i^2 - (\sqrt{u_i}\, y_i - c_{ij} w)^2\bigr) \\ &= 2\sqrt{u_i}\, y_i w - c_{ij} w^2. \end{aligned} \]
For a given i we have two distinct values j, k ≠ i and $c_{ij} \ne c_{ik}$. Hence
\[ c_{ij}^{-1}(\alpha_j - \alpha_i) - c_{ik}^{-1}(\alpha_k - \alpha_i) = (c_{ik} - c_{ij}) w^2. \]
To complete the proof in the general case, we have only to enlarge the field K and
the set S so that our assumptions (a) to (d) are verified.
For (a), it suffices to add $a_0^{(r-1)/2}$ to the field K.
For (b), it suffices to add $\alpha_i$, i = 1, 2, 3, to the field K.
For (c), it suffices to add to S the set $S_2$ of places v for which $\operatorname{ord}_v(D_f) > 0$.
This gives us an extension $K'$ of K (of degree $[K' : K] \le 2r(r-1)(r-2)$) and a new set $S_3$, the places of $K'$ lying over S and $S_2$. Thus we may assume that (a), (b), (c) are satisfied.
For (d), we use:
Proposition 5.3.6. Let K be a number field. Then we can find a finite set of places S of K such that for any finite set of places $T \subset M_K$ with T ⊃ S, the ring $O_{T,K}$ is a principal ideal domain and hence a unique factorization domain.
Proof: The ring of integers OK is not necessarily a unique factorization domain.
Since the class group of a number field is finite (see e.g. [172], Ch.V or [162], Th.
2.7.1), there are ideals I1 , . . . , Ir in OK forming a finite set of representatives for
the class group of OK . Let M1 , . . . , Mn be a finite set of prime ideals of OK ,
containing all maximal ideals dividing at least one of the ideals Ij , j = 1, . . . , r .
5.3. Applications 145
The set
+
n
M := (OK \ Mj )
j=1
is multiplicatively closed.
Let R be the localization of OK in M . The ring R is again a Dedekind domain
([156],Th.10.4). In order to show that R is a unique factorization domain, it is
enough to prove that any maximal ideal m in R is principal. The maximal ideal
M := m ∩ OK of OK generates m . Moreover, M is equivalent to a product of
ideals Mj , because they generate the class group of OK . Therefore, m is also
equivalent to a product of ideals R Mj .
Since R Mj = R for every j , we see that M is generated by a single element.
We conclude that R is a principal ideal domain. This proves what we want, with
S the set of places determined by the prime ideals dividing the ideals Ij .
This proves a slightly weaker version of the theorem, in which ν is the number
of distinct prime ideals dividing a set of ideals which generate the class group of
K . For the more precise statement, we need two observations. First, note that in
place of (d) it suffices that the 2-primary part of the class group of OK is trivial.
Second, as shown by Landau in 1907, every ideal class contains prime ideals. This
follows from the general form of Dirichlet’s density theorem, a particular case of
which states that the set of prime ideals p of OK in a given ideal class of the class
group CK is a set of positive natural density 1/|CK | (see e.g. [215], Ch.VII, §2,
Prop.7.10, Cor.4).
Hence in the proof of the preceding proposition we can take all ideals Ij to be
prime ideals. This also shows that we can take ν to be the cardinality of a set of
generators of the 2-subgroup of the class group, completing the proof.
A discussion of the unit equation would not be complete without mentioning its effective so-
lution obtained by Baker’s method or by the so-called Thue–Siegel principle. In its simplest
formulation, everything follows from:
Theorem 5.4.1. Let K be a number field and Γ a finitely generated subgroup of $K^\times$. Let 0 < ε ≤ 1 and $v \in M_K$. There is an effectively computable function C(K, Γ, v, ε) such that every solution of the diophantine inequality
\[ |1 - \gamma|_v \le H(\gamma)^{-\varepsilon}, \quad \gamma \in \Gamma \]
satisfies h(γ) ≤ C(K, Γ, v, ε).
Remark 5.4.2. The best bounds for C(K, Γ, v, ε) are obtained via Baker’s theory of lin-
ear forms in logarithms in many variables, as in A. Baker and G. Wüstholz [16] in the
archimedean case and Kunrui Yu [335] in the non-archimedean case, see also Y. Bugeaud
[53] and Y. Bugeaud and M. Laurent [54].
A self-contained proof of Theorem 5.4.1, obtained with quite a different method (the so-
called Thue–Siegel method) is in [31] and E. Bombieri and P.B. Cohen [32]. The special
case A = 1 of the estimate in [32], which holds in the non-archimedean case, yields the
following completely explicit result.
We define $\rho(x) \ge e^5$ to be the solution of $\rho/(\log \rho)^5 = x$ if $x > e^5 5^{-5}$ and $\rho(x) = e^5$ otherwise, and denote by $\xi_1, \dots, \xi_t$ a set of generators of Γ/tors. Let K be a number field of degree d, v be a non-archimedean place of K dividing the rational prime p and with residue class degree $f_v$. For $D_v^* := \max(1, d/(f_v \log p))$, define $h'(x)$ to be the modified height $h'(x) = \max(h(x), 1/D_v^*)$ and $H'(x) = \exp(h'(x))$. Finally, let
\[ C = 66\, p^{f_v} (D_v^*)^6 \quad \text{and} \quad Q = (2t\rho(C/\varepsilon))^t \prod_{i=1}^{t} h'(\xi_i). \]
5.4.3. It is easy to see how Theorem 5.4.1 can be used to solve effectively the unit equation
x + y = 1 in the group Γ = US,K of S -units of K . We give a quick sketch of the
argument.
Since y is an S-integer, we have $\sum_{v \in S} \log^+ |y|_v = h(y)$ and also $\sum_{v \in S} \log |y|_v = 0$ by the product formula, because $\log |y|_v = 0$ for $v \notin S$. Therefore, there is v ∈ S such that
\[ \log |y|_v \le -\frac{1}{|S|}\, h(y), \]
which is the same as
\[ |1 - x|_v = |y|_v \le H(y)^{-1/|S|}. \]
Moreover, it is clear that H(x) = H(1 − y) ≤ 2H(y) (use Proposition 1.5.15), hence
\[ |1 - x|_v \le 2^{1/|S|} H(x)^{-1/|S|}. \tag{5.15} \]
5.4. Effective methods 147
If H(x) ≤ 4, we have the desired bound. If instead H(x) > 4, it is immediate from (5.15) that
\[ |1 - x|_v \le H(x)^{-1/(2|S|)}, \]
hence in any case Theorem 5.4.1 yields
\[ h(x) \le \max\{\log 4,\; C(K, \Gamma, v, 1/(2|S|))\}. \]
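The pigeonhole step of this sketch can be illustrated over K = Q, where the normalized absolute values are the ordinary one and the p-adic ones. The code below is only an illustration under that assumption; the helper names and the sample solution 9 + (−8) = 1 with S = {∞, 2, 3} are ours.

```python
import math
from fractions import Fraction

def p_adic_abs(q, p):
    """Normalized p-adic absolute value |q|_p of a nonzero rational q."""
    q = Fraction(q)
    n, d, e = q.numerator, q.denominator, 0
    while n % p == 0:      # count powers of p in the numerator
        n //= p
        e += 1
    while d % p == 0:      # and in the denominator
        d //= p
        e -= 1
    return float(p) ** (-e)

def abs_v(q, v):
    """|q|_v for v = "inf" (ordinary absolute value) or v a prime."""
    return abs(float(q)) if v == "inf" else p_adic_abs(q, v)

def height(q):
    """Logarithmic height h(q) = log max(|num|, |den|) of a nonzero rational."""
    q = Fraction(q)
    return math.log(max(abs(q.numerator), q.denominator))

def small_place(y, S):
    """A place v in S minimizing log|y|_v; for an S-unit y it satisfies
    log|y|_v <= -h(y)/|S| by the argument in the text."""
    return min(S, key=lambda v: math.log(abs_v(y, v)))

S = ["inf", 2, 3]
x, y = Fraction(9), Fraction(-8)       # x + y = 1, both S-units
v = small_place(y, S)
print(v, abs_v(1 - x, v), math.exp(-height(y) / len(S)))
```

For this sample the selected place is v = 2, and the printed values compare $|1 - x|_v$ with $H(y)^{-1/|S|}$.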
Remark 5.4.4. Lang’s proof of the theorem stated in the introduction is obtained by using
Hurwitz’s genus formula to show that, for large m, a component of the pull-back of the curve $C \subset \mathbb{G}_m^n$ by the isogeny $x \mapsto x^m$ has necessarily large genus. Then we conclude
by an application of the well-known Siegel’s finiteness theorem on integral points on curves.
(For curves over a number field, see Theorem 7.3.9 and Remark 7.3.10. For the general case,
see [169], Ch.8, Th.2.4.)
We give here a sketch of a different argument, for the case when the curve is defined over a
number field, because it reduces the proof to the statement of Theorem 5.4.1 rather than the
ineffective Siegel theorem. We prove:
Theorem 5.4.5. Let C be a geometrically irreducible closed curve in $\mathbb{G}_m^n$, defined over a number field K, not a translate of a subtorus of $\mathbb{G}_m^n$, and let Γ be any finitely generated subgroup of $\mathbb{G}_m^n(\overline{K})$. Then C ∩ Γ is an effectively computable finite set.
Proof: By using the projections $x \mapsto (x_i, x_j)$ onto $\mathbb{G}_m^2$, we easily reduce the problem to the case n = 2 and where C is given by the equation f(x, y) = 0.
Since Γ is finitely generated, there is a number field L and a finite set $S \subset M_L$ containing all archimedean places such that $\Gamma \subset (O_{S,L}^\times)^2$. Replacing K by L and enlarging Γ, we may assume that $\Gamma = \Gamma_1 \times \Gamma_1$, where $\Gamma_1 \subset O_{S,K}^\times$ is finitely generated.
For any (α, β) ∈ Γ, the affine multiplicative height satisfies
\[ H((\alpha, \beta)) := \prod_{v \in M_K} \max(1, |\alpha|_v, |\beta|_v) \le \max_{v \in S} \max(1, |\alpha|_v, |\beta|_v)^{|S|}. \tag{5.16} \]
Now we let (α, β) range over C ∩ Γ and we want to get an effective upper bound for the height. Replacing α by $\alpha^{-1}$ or β by $\beta^{-1}$, which does not affect the standard height h(α) + h(β), and replacing C by the image of the corresponding automorphism, we may assume that the maximum in (5.16) is attained in v ∈ S with $\min(|\alpha|_v, |\beta|_v) \ge 1$. Now consider the polynomial $f(x, y) = \sum a_{ij} x^i y^j$ and order terms as
\[ |a_{pq} \alpha^p \beta^q|_v \ge |a_{rs} \alpha^r \beta^s|_v \ge \dots \]
Since f(α, β) = 0, the two largest terms must be of the same order of magnitude, hence
\[ \begin{aligned} p \log |\alpha|_v + q \log |\beta|_v &= r \log |\alpha|_v + s \log |\beta|_v + O(1) \\ &\ge i \log |\alpha|_v + j \log |\beta|_v + O(1) \end{aligned} \tag{5.17} \]
for all monomials $x^i y^j$ appearing in f(x, y). Note that we may restrict our attention to the set of (α, β) ∈ C ∩ Γ having the above properties with respect to the fixed absolute value v ∈ S and with $|a_{pq} \alpha^p \beta^q|_v \ge |a_{rs} \alpha^r \beta^s|_v$ as the largest terms in f for fixed (p, q), (r, s). Here and in the following, the Landau (and Vinogradov) symbols are with respect to this set. In particular, (p, q) and (r, s) must be linearly independent if H((α, β)) is large, because of (5.16) and $\min(\log |\alpha|_v, \log |\beta|_v) \ge 0$.
Consider now another monomial $x^i y^j$. We have $(i, j) = A_{ij} \cdot (p, q) + B_{ij} \cdot (r, s)$ for certain rational numbers $A_{ij}$, $B_{ij}$, with denominators dividing D = |ps − qr| ≥ 1, hence $D A_{ij}, D B_{ij} \in \mathbb{Z}$. By (5.17), we deduce
\[ \begin{aligned} p \log |\alpha|_v + q \log |\beta|_v &\ge i \log |\alpha|_v + j \log |\beta|_v + O(1) \\ &= (A_{ij} p + B_{ij} r) \log |\alpha|_v + (A_{ij} q + B_{ij} s) \log |\beta|_v + O(1) \\ &= (A_{ij} + B_{ij})(p \log |\alpha|_v + q \log |\beta|_v) + O(1). \end{aligned} \]
Therefore, if H((α, β)) is large enough, we must have either $A_{ij} + B_{ij} = 1$ or $A_{ij} + B_{ij} \le 1 - 1/D$. Now we define I to be the set of all pairs (i, j) such that $A_{ij} + B_{ij} = 1$. Thus we have for (i, j) ∈ I an equation
\[ x^i y^j = (x^p y^q)(x^{r-p} y^{s-q})^{B_{ij}}. \tag{5.18} \]
If instead (i, j) ∉ I, then $A_{ij} + B_{ij} \le 1 - 1/D$ implies that
\[ \log |\alpha^i \beta^j|_v \le (1 - 1/D) \log |\alpha^p \beta^q|_v + O(1). \tag{5.19} \]
where
\[ R(t) = \sum_{(i,j) \in I} a_{ij}\, t^{D B_{ij}}. \]
Now we specialize (ξ, η) with $(\xi^D, \eta^D) = (\alpha, \beta)$ and correspondingly write Ξ, H for the specializations of X and Y. For (i, j) ∉ I, the bound (5.19) yields
\[ |\xi^{Di} \eta^{Dj}|_v = O(|\Xi|_v^{D-1}). \]
Since $f(\xi^D, \eta^D) = f(\alpha, \beta) = 0$, from (5.20) and (5.16), we find in case of pq ≠ 0 that
\[ |R(\mathrm{H}/\Xi)|_v \ll |\Xi|_v^{-1} \le \max(|\alpha|_v, |\beta|_v)^{-1/D} \le H((\alpha, \beta))^{-1/(|S| D)}. \]
A similar exponential bound follows easily from the definition of (p, q) if pq = 0. By (5.17) again, $|\mathrm{H}/\Xi|_v$ is bounded and bounded away from 0 and from the last displayed equation we see that
\[ |1 - \zeta^{-1} \mathrm{H}/\Xi|_v \ll H((\alpha, \beta))^{-c} \]
for some root ζ of R(t) and some c > 0.
Finally, $\zeta^{-1} \mathrm{H}/\Xi$ belongs to the finitely generated group $\Gamma_2$ obtained by adding ζ to the division group $\{P \in \mathbb{G}_m^2 \mid P^D \in \Gamma_1\}$ of order D of $\Gamma_1$. It is also clear that $H((\alpha, \beta))^{-c} \ll H(\zeta^{-1} \mathrm{H}/\Xi)^{-\kappa}$ for some κ > 0. Thus we may apply Theorem 5.4.1 to $\Gamma_2$ and conclude that $H(\alpha^{r-p} \beta^{s-q})$ is bounded. By Northcott’s theorem in 1.6.8, $\alpha^{r-p} \beta^{s-q}$ belongs to a finite set, hence we have shown that (α, β) belongs to a finite union of effectively computable torus cosets in $\mathbb{G}_m^2$. Since C is geometrically irreducible and not a torus coset by hypothesis, the intersection of C with these torus cosets is finite and effectively computable.
For comparison with this argument, see also the proof of Theorem 7.4.7.
5.5. Bibliographical notes
Special cases of the unit equation appear in the work of Siegel [283] and Mahler
[187], and finiteness of the number of solutions is proved through a reduction to
a finite set of Thue equations. The first general formulation in a geometric setting
was done by S. Lang in 1960 [166]. S. Lang’s conjectured extension [167] to the
division group of a finitely generated group was later proved by P. Liardet [182].
Theorem 5.2.1 is the culmination of a long series of successive improvements in
counting the number of solutions of unit equations. Our proof follows [23] quite
closely. K.K. Choi has shown, in an unpublished note, that we can take C1 = 30,
C2 = 70 in this theorem. The first such bound depending only on the rank was
obtained, for the generalized S -unit equation in a number field, by J.-H. Evertse
[103] in a paper which sparked much research in this area.
The argument in Example 5.2.4 is due to D. Zagier and simplifies a more precise
calculation due to P. Erdős, C.L. Stewart, and R. Tijdeman [101], which leads to a
better value of the constant c. Example 5.2.5 is due to J. Berstel and is mentioned
in Beukers and Schlickewei [23], where it is shown that such an equation has at
most 61 solutions. The remark in Example 5.2.6 on complex solutions of the unit
equation in a cyclotomic field is due to H.W. Lenstra; this applies in a more general
setting, notably CM -fields.
The reduction of a Thue equation to a unit equation is in [283], Zweiter Teil, §1.
Siegel studies the unit equation by taking cosets of units modulo high powers,
thereby reducing it to a finite set of equations $ax^r + by^r = c$, for which he had
independently proved finiteness of the number of integral solutions using diophan-
tine approximation methods.
Theorem 5.3.2 is due to Evertse [103]. Uniform polynomial bounds in r were
independently obtained in [30] and, with a sharper result and better proof, in [109].
The reduction of a hyperelliptic equation to a unit equation is in a two-page paper
[282] by C.L. Siegel in 1926, published under the pseudonym X.
6 ROTH’S THEOREM
6.1. Introduction
The Liouville inequality in 1.5.21, or its projective version in 2.8.21, while simple
and useful, does not tell the real truth about how well we can approximate algebraic
numbers by algebraic numbers in a fixed field K .
In 1909 A. Thue obtained the first improvement on Liouville’s theorem about ap-
proximation of algebraic numbers by rational numbers. He proved the following
result:
Thue’s theorem: Let α be a real algebraic number of degree d ≥ 3 and let ε > 0. Then there are only finitely many rational numbers p/q, q ≥ 1, such that
\[ \left| \alpha - \frac{p}{q} \right| \le \frac{1}{q^{\frac{d}{2} + 1 + \varepsilon}}. \]
\[ \min_{s \in \mathbb{N}} \left( s + \frac{d}{s+1} \right) < 2\sqrt{d}. \]
The fact that this exponent is of order o(d) rather than d turned out to be quite
important in Siegel’s proof of the finiteness of the number of integral points on a
curve of genus g ≥ 1, in treating the case g = 1 (for the case g ≥ 2, Siegel had
to develop a corresponding method dealing with simultaneous approximations). A
little later, Mahler developed the same method over the p -adic numbers, thereby
obtaining as an application the finiteness of the number of solutions of a Thue
equation in S -integers rather than ordinary integers; again, this had significant
applications to other problems in number theory. However, all these extensions of
Thue’s theorem also suffer from the same problem of ineffectivity.
Thue’s method depended on an auxiliary construction with polynomials in two
variables. It was expected that a similar construction in m variables would yield
further drastic improvements, and this was explored by Siegel and Schneider in
the 1930s. They could not deal with one crucial point of the construction, namely
the non-vanishing of the auxiliary polynomial when evaluated at a special point.
It was only in 1955 that Roth was able to overcome this stumbling block, thereby
obtaining the sharp exponent 2 in place of d/2 + 1 in Thue’s theorem.
In Section 6.2, we start by proving finiteness of integer solutions for the Thue
equation, then we formulate Roth’s theorem for number fields with respect to a
finite set of places and we sketch the proof in the case K = Q with respect to the
ordinary absolute value.
In Section 6.3, we introduce the index of a polynomial as a measure for its van-
ishing in a given point. It is used together with Wronskian techniques to prove
Roth’s lemma, the crucial point for the non-vanishing of the auxiliary polynomial
mentioned above. Section 6.4 is reserved for the proof of Roth’s theorem.
In Section 6.5, we prove Vojta’s generalization with moving targets, we give quan-
titative results for the number of exceptional good approximations, and we mention
the Cugiani–Mahler theorem. This section provides us with additional informa-
tion, which may be omitted in a first reading.
For this chapter, the reader should be familiar with the results about Siegel’s lemma
in Section 2.9. However, we do not need here the most sophisticated formulations,
for example the dependence on the discriminant of the number field will be irrele-
vant and we may deduce Roth’s theorem also from a relative version of the easier
Corollary 2.9.2. Roth’s theorem will be rather important in the sequel, it will be
used and generalized in Chapters 7 and 14 and Roth’s lemma is a tool in the proof
of Schmidt’s subspace theorem in Chapter 7 and in the proof of Mordell’s conjec-
ture in Chapter 11.
This section states Roth’s theorem and gives a sketch of the proof.
6.2.1. In order to see how diophantine approximation can be applied to diophan-
tine equations, we start with the argument that Thue’s theorem implies finiteness
of integer solutions of the Thue equation (see the introduction in 6.1 for the state-
ments, and see Theorem 5.3.2 for a generalization and a quantitative result).
The argument is by contradiction. First, we assume F irreducible. We use the decomposition
\[ F\!\left(\frac{x}{y}, 1\right) = a_d \left(\frac{x}{y} - \alpha_1\right) \cdots \left(\frac{x}{y} - \alpha_d\right) = \frac{m}{y^d} \]
into linear factors. If there are infinitely many integer solutions (xn , yn ) of the
Thue equation, then |yn | → ∞ and we may assume, by passing to a subsequence,
that xn /yn tends to a zero αj . As the other factors are bounded away from 0, we
get infinitely many solutions of |x/y − αj | ≤ C|y|−d for some constant C > 0.
Since d ≥ 3, this contradicts Thue’s theorem.
In general, let F1 , . . . , Fr be the non-constant irreducible polynomials in Z[x, y]
dividing F . By a linear change of coordinates, we may assume that y is not
a divisor of F . Dirichlet’s box principle gives finitely many divisors mj of m
such that the system of equations Fj (x, y) = mj (j = 1, . . . , r) has infinitely
many solutions (xn , yn ). The above argument shows that xn /yn approaches a
zero of every Fj (x, 1) and hence r = 1. Since F has at least three different
linear factors, we get deg(F1 ) ≥ 3. Now the irreducible case considered above
leads to a contradiction with the initial assumption of infinitely many solutions of
F1 (x, y) = m1 .
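Finiteness of integer solutions can be observed experimentally by a bounded search. The sketch below enumerates solutions of F(x, y) = m in a box; the function name, the sample form $x^3 - 2y^3$ and the search bound are illustrative assumptions. A search of this kind only lists solutions inside the box and proves nothing about solutions outside it.

```python
def thue_solutions(F, m, bound):
    """All integer pairs (x, y) with |x|, |y| <= bound and F(x, y) == m, sorted."""
    return sorted(
        (x, y)
        for x in range(-bound, bound + 1)
        for y in range(-bound, bound + 1)
        if F(x, y) == m
    )

# Hypothetical example: the cubic form x^3 - 2*y^3 with right-hand side m = 1.
print(thue_solutions(lambda x, y: x**3 - 2 * y**3, 1, 50))
```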
6.2.2. Now we give Lang’s general formulation of Roth’s theorem over number
fields; the same statement for the rational field Q belongs to D. Ridout.
We use the notions introduced in Chapter 1, 1.4.3: For a place v of a number field
K , we denote by | |v the normalized absolute value (as in (1.6) on page 11) to
get the product formula. As usual, we denote again by | |v its extension to the
completion Kv . The absolute exponential height H(α) = eh(α) for an algebraic
number is as defined in 1.5.7.
Theorem 6.2.3. Let K be a number field with a finite set S of places. For each v ∈ S let $\alpha_v \in K_v$ be K-algebraic. Let κ > 2. Then there are only finitely many β ∈ K such that
\[ \prod_{v \in S} \min(1, |\beta - \alpha_v|_v) \le H(\beta)^{-\kappa}. \]
The classical theorem of Roth is the special case K = Q and S = {∞}, so that
| |v is the ordinary absolute value in R .
6.2. Roth’s theorem 153
Remark 6.2.4. Theorem 6.2.3 is ineffective in the sense that the proof does not
give an upper bound for H(β). However, it does give an upper bound for the
number of solutions β , see 6.5.3.
inequality
−2−ε
|α∞ − β|∞ = |1 − 3k /(n · 2k )| < 2k 3k |n|3 3k |n|3
6.2.9. We first give a sketch of the main steps in the proof of Roth’s theorem in the simplest possible case, namely K = Q and S = {∞}, so that $|\ |_v$ is the ordinary euclidean absolute value. There is only one α to worry about. Suppose we have infinitely many rational approximations p/q to α such that
\[ \left| \alpha - \frac{p}{q} \right| \le q^{-\kappa}. \]
Then, for any positive integer m and any large constant M, we can find m rational
approximations to α , namely pj /qj , j = 1, . . . , m, with log q1 > L and also
log qj+1 > M log qj , j = 1, . . . , m − 1.
Since P does not vanish at the rational point, it is bounded away from 0 as
\[ q_1^{-d_1} \cdots q_m^{-d_m} \le |P(p_1/q_1, \dots, p_m/q_m)|. \]
In this section we prove several preliminary results needed for the proof of Roth’s
theorem. It will also be convenient to use the notation and formalism already
developed in Chapter 1.
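6.3.1. We abbreviate $x = (x_1, \dots, x_m)$, $\alpha_v = (\alpha_{v1}, \dots, \alpha_{vm})$ and write
\[ \binom{\mathbf{m}}{\mu} = \prod_{j=1}^{m} \binom{m_j}{\mu_j} \]
and
\[ \partial_\mu = \frac{1}{\mu_1! \cdots \mu_m!} \left(\frac{\partial}{\partial x_1}\right)^{\mu_1} \cdots \left(\frac{\partial}{\partial x_m}\right)^{\mu_m}. \]
We have
\[ \partial_\mu x^{\mathbf{m}} = \binom{\mathbf{m}}{\mu} x^{\mathbf{m} - \mu}. \tag{6.1} \]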
6.3.2. We work with polynomials in $F[x_1, \dots, x_m]$ for a field F, vanishing to high order at a point. For $P \in F[x_1, \dots, x_m]$ and positive weights $d = (d_1, \dots, d_m)$, we define the index of P at a point $\alpha = (\alpha_1, \dots, \alpha_m)$ to be
\[ \operatorname{ind}(P; d; \alpha) = \min_{\mu} \left\{ \frac{\mu_1}{d_1} + \cdots + \frac{\mu_m}{d_m} \;\middle|\; \partial_\mu P(\alpha) \ne 0 \right\}. \]
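The index can be computed directly from its definition for small examples, using (6.1) to evaluate the normalized derivatives. The sketch below stores a polynomial as a dictionary from exponent tuples to coefficients; this representation, the function names and the sample polynomial are illustrative assumptions, not notation from the text.

```python
from fractions import Fraction
from itertools import product
from math import comb

def normalized_derivative_at(poly, mu, alpha):
    """Value at alpha of the normalized derivative of poly of order mu,
    computed monomial by monomial with formula (6.1)."""
    total = Fraction(0)
    for expo, coeff in poly.items():
        if all(e >= k for e, k in zip(expo, mu)):
            term = Fraction(coeff)
            for e, k, a in zip(expo, mu, alpha):
                term *= comb(e, k) * Fraction(a) ** (e - k)
            total += term
    return total

def index(poly, d, alpha):
    """Index of a nonzero polynomial at alpha with respect to the weights d:
    the least value of sum(mu_j / d_j) over mu with nonzero derivative at alpha."""
    degrees = [max(expo[j] for expo in poly) for j in range(len(d))]
    best = None
    for mu in product(*(range(D + 1) for D in degrees)):
        if normalized_derivative_at(poly, mu, alpha) != 0:
            weight = sum(Fraction(k, dj) for k, dj in zip(mu, d))
            best = weight if best is None else min(best, weight)
    return best

# Hypothetical example: P = (x1 - 1)^2 (x2 - 2), expanded by hand.
P = {(2, 1): 1, (2, 0): -2, (1, 1): -2, (1, 0): 4, (0, 1): 1, (0, 0): -2}
print(index(P, (2, 1), (1, 2)))   # weights d = (2, 1), point (1, 2)
print(index(P, (2, 1), (0, 0)))   # P does not vanish at the origin
```

In the first call only the derivative of order (2, 1) is nonzero at the point, so the index is 2/2 + 1/1 = 2; in the second call the index is 0.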
The following properties hold:
6.3. Preliminary lemmas 157
For an upper bound for Z, we note that, if $(i_1/d_1, \dots, i_m/d_m)$ is a lattice point in $V_m(t)$, then
\[ \frac{i_1 + 1}{d_1} + \cdots + \frac{i_m + 1}{d_m} \le t + \frac{1}{d_1} + \cdots + \frac{1}{d_m} \]
and $i_\nu + 1 \le d_\nu + 1$. It follows that, if we rescale $V_m(t)$ by a factor $1 + \max(1, t^{-1})(1/d_1 + \cdots + 1/d_m)$, then the rescaled domain contains all parallelepipeds associated to lattice points in $V_m(t)$. Hence
\[ V_m(t)\, d_1 \cdots d_m \le Z \le V_m(t) \left( 1 + \max(1, t^{-1}) \left( \frac{1}{d_1} + \cdots + \frac{1}{d_m} \right) \right)^{m} d_1 \cdots d_m, \]
thus showing that $Z \sim V_m(t)\, d_1 \cdots d_m$ as $d_j \to \infty$.
In particular, we have N > rM if $r V_m(t) < 1$ and $d_j \to \infty$. By (6.1) on page 156, the matrix of coefficients has entries
\[ A = \left( \binom{J}{I} \alpha_k^{J - I} \right) \]
with rows indexed by (I, k) and columns indexed by J.
We apply Siegel’s lemma as given in Theorem 2.9.19. In our case, it suffices to produce only one small solution to our system of equations. Theorem 2.9.19 and 2.9.8 imply that, if N > rM, there is a solution x such that
\[ H(x) \le |D_K|^{1/2r} \Bigl( \prod_{(I,k)} \sqrt{N}\, H(A_{(I,k)}) \Bigr)^{r/(N - rM)}, \tag{6.2} \]
m
Vm (tk )/2
dj Vm (tk ) (1+o(1))d1 ···dm
(dj + 1) 2 H(αkj ) .
k=1 j=1
The conclusion follows from (6.2), noting that the term $|D_K|^{1/2r}$ and the terms $(d_j + 1)^{V_m(t_k)/2}$ are negligible with respect to $2^{d_j V_m(t_k)}$, so they do not affect the asymptotics in the final estimate.
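Lemma 6.3.5. If $0 \le \varepsilon \le \frac{1}{2}$, then
\[ V_m\!\left( \left( \tfrac{1}{2} - \varepsilon \right) m \right) \le e^{-6 m \varepsilon^2}. \]
Proof: We use a familiar method of probability theory. We set χ(x) = 1 if x < 0 and 0 if x > 0. Since $\chi(x) < e^{-\lambda x}$ for every λ > 0, we have
\[ \begin{aligned} V_m\!\left( \left( \tfrac{1}{2} - \varepsilon \right) m \right) &= \int_{-1/2}^{1/2} \cdots \int_{-1/2}^{1/2} \chi(x_1 + \cdots + x_m + m\varepsilon)\, dx_1 \cdots dx_m \\ &\le \int_{-1/2}^{1/2} \cdots \int_{-1/2}^{1/2} e^{-\lambda (m\varepsilon + \sum x_j)}\, dx_1 \cdots dx_m \\ &= \left( \int_{-1/2}^{1/2} e^{-\lambda(\varepsilon + x)}\, dx \right)^{m} \\ &= \exp(-m U(\lambda)) \end{aligned} \]
with
\[ U(\lambda) = \varepsilon \lambda - \log \frac{\sinh(\lambda/2)}{\lambda/2}. \]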
It is possible to show that this estimate is quite precise, in the sense that
\[ \log V_m\!\left( \left( \tfrac{1}{2} - \varepsilon \right) m \right) = -m \max_{\lambda} U(\lambda) + O(\log m) \]
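The bound of Lemma 6.3.5 can be compared with exact values for small m. The sketch below assumes that $V_m(t)$ denotes the volume of the set of points of the unit cube $[0,1]^m$ with coordinate sum at most t, which is what the integral in the proof computes after a shift by 1/2; it evaluates this volume with the classical alternating-sum formula for the distribution of a sum of uniform variables. The function names are ours.

```python
from fractions import Fraction
from math import comb, exp, factorial, floor

def cube_slice_volume(m, t):
    """Exact volume of {x in [0,1]^m : x_1 + ... + x_m <= t} for rational t."""
    t = Fraction(t)
    if t <= 0:
        return Fraction(0)
    if t >= m:
        return Fraction(1)
    total = sum((-1) ** k * comb(m, k) * (t - k) ** m for k in range(floor(t) + 1))
    return total / factorial(m)

def lemma_bound_holds(m, eps):
    """Check V_m((1/2 - eps) m) <= exp(-6 m eps^2) for rational 0 <= eps <= 1/2."""
    eps = Fraction(eps)
    volume = cube_slice_volume(m, (Fraction(1, 2) - eps) * m)
    return float(volume) <= exp(-6 * m * float(eps) ** 2)

# Example: m = 4 and eps = 1/4 give V_4(1) = 1/24, against the bound e^{-3/2}.
print(cube_slice_volume(4, 1), lemma_bound_holds(4, Fraction(1, 4)))
```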
Then
\[ \operatorname{ind}(P; d; \xi) \le 2m\, \sigma^{1/2^{m-1}}. \]
where s ≤ dm and where the fj s, and similarly the gj s, are linearly independent
polynomials defined over Q .
\[ \operatorname{ind}(\partial_{\mu,\nu} P) \ge \max\left( \operatorname{ind}(P) - \frac{\nu}{d_m},\, 0 \right) - \varepsilon. \tag{6.5} \]
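By 6.3.2 (a), (b), inequality (6.5), and expanding W by means of the Laplace expansion, we get (here π runs over permutations of 0, . . . , s)
\[ \begin{aligned} \operatorname{ind}(W) &\ge \min_{\pi} \sum_{i=0}^{s} \left( \max\left( \operatorname{ind}(P) - \frac{\pi(i)}{d_m},\, 0 \right) - \sigma \right) \\ &= \sum_{i=0}^{s} \left( \max\left( \operatorname{ind}(P) - \frac{i}{d_m},\, 0 \right) - \sigma \right) \\ &\ge (s + 1) \min\left( \tfrac{1}{2} \operatorname{ind}(P),\, \tfrac{1}{2} \operatorname{ind}(P)^2 \right) - (s + 1)\sigma. \end{aligned} \tag{6.6} \]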
6.4. Proof of Roth’s theorem 163
\[ \sum_{i=0}^{s} \max\left( t - \frac{i}{s},\, 0 \right) \ge (s + 1) \min\left( \tfrac{1}{2} t,\, \tfrac{1}{2} t^2 \right). \]
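The elementary inequality $\sum_{i=0}^{s} \max(t - i/s, 0) \ge (s+1) \min(t/2, t^2/2)$ can be spot-checked with exact arithmetic for integers s ≥ 1 and rational t ≥ 0. This is only a numerical sanity check on a grid, not a proof, and the function names are ours.

```python
from fractions import Fraction

def truncated_sum(t, s):
    """Left-hand side: sum over i = 0..s of max(t - i/s, 0)."""
    return sum(max(t - Fraction(i, s), 0) for i in range(s + 1))

def lower_bound(t, s):
    """Right-hand side: (s + 1) * min(t/2, t^2/2)."""
    return (s + 1) * min(t / 2, t * t / 2)

def inequality_holds(t, s):
    return truncated_sum(t, s) >= lower_bound(t, s)

# Sample grid: s = 1..12 and t = k/16 for k = 0..48.
print(all(inequality_holds(Fraction(k, 16), s)
          for s in range(1, 13) for k in range(0, 49)))
```

Equality occurs at t = 1, where both sides equal (s + 1)/2.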
and using Roth’s lemma inductively on the number of variables to estimate ind(U )
and ind(V ).
Suppose we have proved Roth’s lemma for polynomials in l < m variables. We apply the inductive assumption of Roth’s lemma to U and V but with $(s + 1) d_j$ in place of $d_j$. In view of the bounds obtained in (6.4) on page 161 for h(U) and h(V), the hypotheses of Roth’s lemma are satisfied. Therefore, we obtain
\[ \operatorname{ind}(U) \le 2(m - 1)(s + 1)\, \sigma^{1/2^{m-2}}, \qquad \operatorname{ind}(V) \le (s + 1)\, \sigma \]
(use the better bound given in 6.3.9 for the case m = 1 when estimating ind(V)). We insert these two estimates in (6.7) obtaining an upper bound for ind(W), and compare with the lower bound (6.6). This gives
\[ \min(\operatorname{ind}(P), \operatorname{ind}(P)^2) \le 4(m - 1)\, \sigma^{1/2^{m-2}} + 4\sigma. \]
In any case ind(P) ≤ m, hence the preceding bound may be simplified to
\[ \operatorname{ind}(P)^2 \le 4m(m - 1)\, \sigma^{1/2^{m-2}} + 4m\sigma \le 4m^2 \sigma^{1/2^{m-2}}. \]
This completes the induction and the proof of Roth’s lemma.
The general case of Roth’s theorem is proved along similar lines as outlined in
Section 6.2, but the presence of several places means that we can only compare
approximations which have a similar behaviour at each place v . This creates ad-
ditional complications.
We shall prove the following statement:
Theorem 6.4.1. Let K be a number field with a finite set S of places. Let F be a finite-dimensional extension of K and, for v ∈ S, let $\alpha_v \in F$. We extend $|\ |_v$ to an absolute value $|\ |_{v,K}$ of F. Then for any κ > 2, there are only finitely many β ∈ K such that
\[ \prod_{v \in S} \min(1, |\beta - \alpha_v|_{v,K}) \le H(\beta)^{-\kappa}. \tag{6.8} \]
This is a point in the |S|-dimensional unit cube lying on the hyperplane where
the sum of the coordinates is 1. We partition this cube by means of a grid of
semi-open subcubes of side 1/N where N is a positive integer, and classify β
according to the subcube containing the corresponding vector. The set of non-
trivial approximations β determining a same subcube is called an approximation
class. The quantity 1/N is the size of the approximation class.
Let λ := (λv )v∈S be the south-west corner of a subcube, namely the point
with x any point in the subcube. Then we denote by Q(λ) the corresponding sub-
cube and by C(λ; N ) the approximation class of size 1/N determined by Q(λ).
For every β ∈ C(λ; N) and v ∈ S, we have by definition
\[ \Lambda(\beta)^{\lambda_v + \frac{1}{N}} < \min(1, |\beta - \alpha_v|_{v,K}) \le \Lambda(\beta)^{\lambda_v}. \tag{6.9} \]
\[ \binom{N + |S|}{|S|} < 2^{N + |S|}. \]
Proof: Consider an approximation class C(λ; N). Then $n_v := N \lambda_v$ is a non-negative integer and, by (6.10), we have
\[ \sum_{v \in S} n_v \le N. \]
The number of solutions of this inequality is $\binom{N + |S|}{|S|}$.
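The count used in this proof is the standard stars-and-bars count of non-negative integer vectors with bounded sum, and it is easy to confirm by enumeration for small parameters. In the sketch below the function name and the sample values are ours.

```python
from itertools import product
from math import comb

def count_bounded_vectors(num_places, N):
    """Number of tuples (n_v) of non-negative integers, one entry per place,
    with sum at most N, found by brute-force enumeration."""
    return sum(
        1
        for n in product(range(N + 1), repeat=num_places)
        if sum(n) <= N
    )

# Hypothetical sample: |S| = 3 places and N = 5.
count = count_bounded_vectors(3, 5)
print(count, comb(5 + 3, 3), 2 ** (5 + 3))
```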
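\[ r V_m(t) = r\,|S|\, V_m\!\left( \left( \tfrac{1}{2} - \varepsilon \right) m \right) < r\,|S|\, e^{-6 m \varepsilon^2} \le \frac{1}{2} \]
provided
\[ m > \frac{\log(2r\,|S|)}{6 \varepsilon^2}, \]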
which we shall assume for the rest of this section. Thus the hypothesis of Lemma
6.3.4 is verified. We estimate (b) of Lemma 6.3.4 by noting that rVm (t)/(1 −
rVm (t)) ≤ 1. In this way we obtain a non-trivial polynomial P with coefficients
as D → ∞. If we define
\[ C_1 := |S| \left( \max_{v \in S} h(\alpha_v) + \log 2 \right), \tag{6.12} \]
we obtain for large D the bound
\[ h(P) \le 2 C_1 D / L. \tag{6.13} \]
6.4.6. Step II: Application of Roth’s lemma.
We would like P to have the additional property that P(β) ≠ 0. In order to do this, we apply Roth’s lemma to show that the polynomial P so constructed does not vanish too much at β if β is (L, M)-independent and L, M are large, and then work with a suitable derivative of P rather than P itself. The details are as follows.
Let $0 < \sigma \le \frac{1}{2}$. By Roth’s lemma in 6.3.7, we have
\[ \operatorname{ind}(P; d; \beta) \le 2m\, \sigma^{1/2^{m-1}} \]
provided $d_{j+1}/d_j \le \sigma$ and $d_j h(\beta_j) \ge \sigma^{-1}(h(P) + 4m d_1)$.
In our case $d_j = \lfloor D/h(\beta_j) \rfloor \sim D/h(\beta_j)$ and $h(\beta_{j+1}) \ge M h(\beta_j)$, therefore the first condition $d_{j+1}/d_j \le \sigma$ is verified if
\[ M \ge 2\sigma^{-1} \]
and D is large enough, which we shall suppose. Similarly, using $d_j h(\beta_j) \sim D$, $d_1 \le D/h(\beta_1) \le D/L$ and (6.13) we see that the condition $d_j h(\beta_j) \ge \sigma^{-1}(h(P) + 4m d_1)$ is verified for large D if
\[ D \ge \sigma^{-1}\, \frac{2C_1 + 5m}{L}\, D, \]
which is so if $L \ge (2C_1 + 5m)\sigma^{-1}$.
We conclude that, if $M \ge 2\sigma^{-1}$, $L \ge (2C_1 + 5m)\sigma^{-1}$ and D is large enough, we have
\[ \operatorname{ind}(P; d; \beta) \le 2m\, \sigma^{1/2^{m-1}}. \]
We choose $\sigma = \varepsilon^{2^{m-1}}$ and deduce that there is µ such that $\partial_\mu P(\beta) \ne 0$ and
\[ \sum_{j=1}^{m} \frac{\mu_j}{d_j} \le 2m\varepsilon. \]
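\[ \partial_j Q(\alpha_v) = 0 \quad \text{if} \quad \frac{j_1}{d_1} + \cdots + \frac{j_m}{d_m} < \left( \tfrac{1}{2} - 3\varepsilon \right) m; \tag{6.15} \]
and also a direct estimate yields
\[ \log |\partial_k Q(\alpha_v)|_{v,K} \le \log |Q|_v + \sum_{j=1}^{m} \Bigl( \log^+ |\alpha_v|_{v,K}\, (d_j - k_j) + \varepsilon_v (\log 2 + o(1))\, d_j \Bigr). \tag{6.16} \]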
We have the easily verified inequality
\[ \log |a - b|_{v,K} \le -\log^+ \frac{1}{|a - b|_{v,K}} + \log^+ |a|_{v,K} + \log^+ |b|_{v,K} + \varepsilon_v \log 2 \]
(the first term on the right-hand side suffices if $|a - b|_{v,K} \le 1$, while the remaining terms take care of the case $|a - b|_{v,K} > 1$), hence using (6.16) we find
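\[ \begin{aligned} \log \Bigl| \partial_k Q(\alpha_v) \prod_{j=1}^{m} (\beta_j - \alpha_v)^{k_j} \Bigr|_{v,K} \le{} & -\sum_{j=1}^{m} k_j \log^+ \frac{1}{|\beta_j - \alpha_v|_{v,K}} + \log |Q|_v \\ & + \sum_{j=1}^{m} \bigl( \log^+ |\beta_j|_v + \log^+ |\alpha_v|_{v,K} + (\log 4 + o(1))\varepsilon_v \bigr)\, d_j. \end{aligned} \]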
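Now we can estimate $\log |Q(\beta)|_v$:
\[ \begin{aligned} \log |Q(\beta)|_v &= \log \Bigl| \sum_{k} \partial_k Q(\alpha_v) \prod_{j=1}^{m} (\beta_j - \alpha_v)^{k_j} \Bigr|_{v,K} \\ &\le \max_{k} \log \Bigl| \partial_k Q(\alpha_v) \prod_{j=1}^{m} (\beta_j - \alpha_v)^{k_j} \Bigr|_{v,K} + \varepsilon_v \sum_{j=1}^{m} \log(d_j + 1) \\ &\le -\min{}' \Bigl( \sum_{j=1}^{m} k_j \log^+ \frac{1}{|\beta_j - \alpha_v|_{v,K}} \Bigr) + \log |Q|_v \\ &\quad + \sum_{j=1}^{m} \bigl( \log^+ |\beta_j|_v + \log^+ |\alpha_v|_{v,K} + (\log 4 + o(1))\varepsilon_v \bigr)\, d_j, \end{aligned} \tag{6.17} \]
where min′ means that the minimum is taken over $(k_1, \dots, k_m)$ not as in (6.15), that is with
\[ \frac{k_1}{d_1} + \cdots + \frac{k_m}{d_m} \ge \left( \tfrac{1}{2} - 3\varepsilon \right) m. \tag{6.18} \]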
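We put together (6.14) and (6.17), note that $\sum_v \varepsilon_v = 1$, and find
\[ \sum_{v \in M_K} \log |Q(\beta)|_v \le -\sum_{v \in S} \min{}' \Bigl( \sum_{j=1}^{m} k_j \log^+ \frac{1}{|\beta_j - \alpha_v|_{v,K}} \Bigr) + \cdots \]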
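Lemma 6.4.7 (c) provides an upper bound for h(Q), and we have already observed that $d_j \le 2d_1 \le 2D/L + o(D/L)$. Also $h(\beta_j) d_j \sim D$ and $h(\beta_j) \ge L$. Hence
\[ \sum_{v \in M_K} \log |Q(\beta)|_v \le -\sum_{v \in S} \min{}' \Bigl( \sum_{j=1}^{m} k_j \log^+ \frac{1}{|\beta_j - \alpha_v|_{v,K}} \Bigr) + \left( m + \frac{C_2}{L} \right) D + o(D) \tag{6.19} \]
with, for example, $C_2 = 4C_1 + 2\log 4 + 2|S| \max_{v \in S} \log^+ |\alpha_v|_{v,K}$.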
It remains to estimate the minimum in the last displayed inequality. It is here that we use the fact that the approximations $\beta_j$ are of similar type. Let C(λ; N) be the approximation class of the approximations $\beta_j$. Recall that
\[ \Lambda(\beta_j) := \prod_{v \in S} \min(1, |\beta_j - \alpha_v|_{v,K}). \]
By the hypothesis (6.8) on page 163 and by (6.9) on page 164, we have
\[ \lambda_v \kappa\, h(\beta_j) \le \lambda_v \log \frac{1}{\Lambda(\beta_j)} \le \log^+ \frac{1}{|\beta_j - \alpha_v|_{v,K}}; \]
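therefore, using $h(\beta_j) d_j \sim D$, we get
\[ \begin{aligned} \sum_{v \in S} \min{}' \Bigl( \sum_{j=1}^{m} k_j \log^+ \frac{1}{|\beta_j - \alpha_v|_{v,K}} \Bigr) &\ge \sum_{v \in S} \min{}' \Bigl( \sum_{j=1}^{m} \lambda_v \kappa\, h(\beta_j)\, k_j \Bigr) \\ &= \kappa \sum_{v \in S} \lambda_v \min{}' \Bigl( \sum_{j=1}^{m} h(\beta_j)\, d_j\, \frac{k_j}{d_j} \Bigr) \\ &\sim D \kappa \sum_{v \in S} \lambda_v \min{}' \Bigl( \sum_{j=1}^{m} \frac{k_j}{d_j} \Bigr). \end{aligned} \]
Now (6.10) on page 164 gives $\sum_{v \in S} \lambda_v \ge 1 - |S|/N$ and (6.18) gives
\[ \min{}' \sum_{j=1}^{m} k_j/d_j \ge \left( \tfrac{1}{2} - 3\varepsilon \right) m. \]
We substitute in the last displayed inequality and find
\[ \sum_{v \in S} \min{}' \Bigl( \sum_{j=1}^{m} k_j \log^+ \frac{1}{|\beta_j - \alpha_v|_{v,K}} \Bigr) \ge \kappa \left( 1 - \frac{|S|}{N} \right) \left( \tfrac{1}{2} - 3\varepsilon \right) m D + o(D). \tag{6.20} \]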
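Finally, we substitute (6.20) into (6.19), obtaining the desired majorization
\[ \sum_{v \in M_K} \log |Q(\beta)|_v \le -\kappa \left( 1 - \frac{|S|}{N} \right) \left( \tfrac{1}{2} - 3\varepsilon \right) m D + \left( m + \frac{C_2}{L} \right) D + o(D). \tag{6.21} \]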
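\[ -\kappa \left( 1 - \frac{|S|}{N} \right) \left( \tfrac{1}{2} - 3\varepsilon \right) + 1 + \frac{C_2}{mL} \ge 0, \]
which (assuming $\tfrac{1}{2} > 3\varepsilon$) we may rewrite as
\[ \kappa \le \left( 1 + \frac{C_2}{L} \right) \left( 1 - \frac{|S|}{N} \right)^{-1} \left( \tfrac{1}{2} - 3\varepsilon \right)^{-1}. \tag{6.22} \]
The right-hand side of this inequality tends to 2 as ε → 0, N → ∞ and L → ∞, contradicting κ > 2.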
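The limiting behaviour of the right-hand side of (6.22) is easy to see numerically. The following sketch evaluates it for sample parameters; the function name and the values chosen for $C_2$ and |S| are hypothetical and serve only as an illustration.

```python
def roth_exponent_bound(C2, L, num_places, N, eps):
    """Right-hand side of (6.22): (1 + C2/L) / ((1 - |S|/N) * (1/2 - 3*eps)).

    Requires N > |S| and eps < 1/6 so that both factors in the denominator
    are positive.
    """
    return (1 + C2 / L) / ((1 - num_places / N) * (0.5 - 3 * eps))

# Hypothetical data: C2 = 10 and |S| = 3, with L = N = 10^k and eps = 10^(-k).
for k in (2, 4, 6, 8):
    print(k, roth_exponent_bound(10.0, 10.0 ** k, 3, 10.0 ** k, 10.0 ** (-k)))
```

The printed values decrease towards 2, so for any fixed κ > 2 the inequality (6.22) fails once ε is small enough and L, N are large enough.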
For the reader’s convenience, we state in which order these various parameters
need to be chosen. First, note that the constants C1 and C2 depend only on the
given data K, F, (αv )v∈S in Theorem 6.4.1. Arguing by contradiction, we assume
that there are infinitely many solutions β of (6.8) on page 163 for some κ > 2.
We choose ε > 0 so small that $(\tfrac12 - 3\varepsilon)^{-1} < \kappa$. Then we fix m, L, M so that the
assumptions of Lemma 6.4.7 are satisfied. Moreover, we may assume that L, N
are so large that (6.22) above is not satisfied. There is one approximation class
of size 1/N containing infinitely many solutions β of (6.8) on page 163, and
in particular containing m (L, M )-independent solutions β1 , . . . , βm . Once all
these data have been fixed the new parameter D is introduced, assumed very large,
and the polynomial Q satisfying (a) to (c) of Lemma 6.4.7 is constructed. This
leads to the asymptotic inequality (6.21) and, letting D → ∞ , to (6.22), which is
the desired contradiction.
Theorem 6.5.2. Let K be a number field and let S be a finite subset of MK . Let F be a
finite extension of K and for each v ∈ S , extend | |v to an absolute value | |v,K of F .
Let κ > 2 . Then we cannot have an infinite sequence of points (αj , βj ) such that:
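\[
h(\beta') \ge -\log 4 + \Bigl(\Bigl(1-\frac{|S|}{N}\Bigr)\kappa - 1\Bigr) h(\beta).
\]
may assume that $|\alpha_v-\beta|_{v,K} < 1$ and $|\alpha_v-\beta'|_{v,K} < 1$ for v ∈ S . By (6.9) on page 164,
we have for v ∈ S
\[
\begin{aligned}
\log|\beta-\beta'|_v &= \log|(\alpha_v-\beta')-(\alpha_v-\beta)|_v\\
&\le \max(\log|\alpha_v-\beta'|_{v,K}, \log|\alpha_v-\beta|_{v,K}) + \varepsilon_v\log 2\\
&\le -\min(\kappa\lambda_v h(\beta'), \kappa\lambda_v h(\beta)) + \varepsilon_v\log 2\\
&= -\kappa\lambda_v h(\beta) + \varepsilon_v\log 2.
\end{aligned}
\]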
Now we sum over v ∈ S and using (6.10) on page 164, we find
Finally, the fundamental inequality (1.8) on page 20 and Proposition 1.5.15 give (note that
$\beta \ne \beta'$)
\[
\sum_{v\in S}\log|\beta-\beta'|_v \ge -h(\beta-\beta') \ge -h(\beta) - h(\beta') - \log 2
\]
and the result follows from the last two displayed inequalities.
Lemma 6.5.6. Let K be a number field, S a finite subset of $M_K$ and F a finite extension
of K . Let κ > 2 and let $\alpha_v \in F$ for v ∈ S . Let N be an integer so large that
$c := \bigl(1-\frac{|S|}{N}\bigr)\kappa - 1 > 1$ and let $X > \log 16/(c-1)$ and A > 1 . Then the inequality
\[
\prod_{v\in S}\min(1, |\beta-\alpha_v|_{v,K}) \le H(\beta)^{-\kappa}
\]
has at most
\[
\frac{\log A}{\log\frac{c+1}{2}}\binom{N+|S|}{|S|}
\]
solutions β with $h(\beta) \in (X, AX]$ .
Proof: The interval (X, AX] is contained in the union of $\log A/\log\frac{c+1}{2}$ intervals, each
of type $\bigl((\frac{c+1}{2})^k X, (\frac{c+1}{2})^{k+1} X\bigr]$ .
By the pigeon-hole principle, if there were more than $\frac{\log A}{\log\frac{c+1}{2}}\binom{N+|S|}{|S|}$ approximations
β with height $h(\beta) \in (X, AX]$ , we would find an interval of type $(Y, \frac{c+1}{2}Y]$
with $Y > (\log 16)/(c-1)$ , and at least $\binom{N+|S|}{|S|}$ approximations β , such that
$h(\beta) \in (Y, \frac{c+1}{2}Y]$ . Now, using Lemma 6.4.3, another application of the pigeon-hole principle
would give two elements β and β′ with $h(\beta), h(\beta') \in (Y, cY - \log 4]$ and in the same
approximation class C(λ; N ) . This contradicts Remark 6.5.5.
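The passage from the interval $(Y, \frac{c+1}{2}Y]$ to $(Y, cY - \log 4]$ uses only the lower bound on Y. The short computation below is our own unpacking of that step. It assumes that Remark 6.5.5 is the inequality $h(\beta') \ge -\log 4 + c\,h(\beta)$ for two approximations in the same class, labelled so that $h(\beta) \le h(\beta')$.

```latex
\begin{align*}
Y > \frac{\log 16}{c-1}
  &\iff \frac{c-1}{2}\,Y > \log 4
   \iff \frac{c+1}{2}\,Y < cY - \log 4, \\
h(\beta') \le \frac{c+1}{2}\,Y
  &< cY - \log 4 < c\,h(\beta) - \log 4 \le h(\beta').
\end{align*}
```

The second chain uses $h(\beta) > Y$ and is absurd, which is the contradiction invoked in the proof.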
6.5.7. It is now clear how to obtain a bound for the number of solutions of the inequality in
Theorem 6.4.1. Let ε > 0 be such that $(\tfrac12 - 3\varepsilon)^{-1} < \kappa$ , let $m = \log(2r|S|)/(6\varepsilon^2)$ ,
$L \ge m(C_1 + 5)\varepsilon^{-2^{m-1}}$ and $M = 2\varepsilon^{-2^{m-1}}$ (as required in Lemma 6.4.7). We choose
L, N so large that we contradict (6.22) on page 170. Then we cannot find m solutions
βj (j = 1, . . . , m) that are (L, M ) -independent and belong to a fixed approximation class
C(λ; N ) .
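Consider all solutions β ∈ K of
\[
\prod_{v\in S}\min(1, |\beta-\alpha_v|_{v,K}) \le H(\beta)^{-\kappa}.
\]
We divide them into three classes:
(a) very small solutions, those with
\[
h(\beta) \le \log 16/(c-1) \quad\text{with}\quad c = \Bigl(1-\frac{|S|}{N}\Bigr)\kappa - 1 > 1;
\]
(b) small solutions, those with log 16/(c − 1) < h(β) ≤ L ;
(c) large solutions, those with h(β) > L .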
A bound for the number of very small solutions can be obtained by Northcott’s theorem
in 1.6.8.
A bound for the number of small solutions can be obtained by appealing to Lemma 6.5.6
for the interval (log 16/(c − 1), L] , obtaining
\[
\frac{\log L}{\log\frac{c+1}{2}}\binom{N+|S|}{|S|}.
\]
Note also that in this case we automatically have a bound for the height.
We cannot give a bound for the height of large solutions, but the above considerations show
that their number does not exceed
\[
m\,\frac{\log M}{\log\frac{c+1}{2}}\binom{N+|S|}{|S|}.
\]
In order to get an idea of the size of this bound, we set δ = min(κ − 2, 1) . Then the above
quantity is majorized by
\[
(2r|S|)^{c_1\delta^{-2}}(c_2/\delta)^{|S|} \tag{6.23}
\]
for certain absolute constants c1 , c2 . For |S| = 1 , this is due to H. Davenport and K.F.
Roth [79] with the bound $\exp(70r^2\delta^{-2})$ .
6.5.8. The bound for the number of large solutions obtained by the above method is rather
large as δ → 0 . It is possible to improve this bound drastically by replacing Roth’s lemma
by more sophisticated results, notably Dyson’s lemma in several variables or Faltings’s
product theorem (see Theorem 7.6.4). For example, an improved bound obtained using
Dyson’s lemma is
\[
\bigl(\log(2r|S|/\delta)\bigr)^{c_3}(c_4/\delta)^{|S|+4}
\]
for some absolute (and not very large) constants c3 , c4 . This is superior to (6.23) above.
6.5.9. No refinement of Roth’s theorem is known replacing κ > 2 by 2 + f (h(β)) , where
f (t) → 0 as t → ∞ . Heuristic arguments based on the analogy between diophantine
approximation and Nevanlinna theory suggest that f (t) = (1 + o(1))(log t)/t should be
admissible for this purpose (see Remark 13.2.25 and S. Lang and H. Trotter [175]).
On the other hand, something can still be said if the function f (t) tends to 0 sufficiently
slowly as t → ∞ . We have the Cugiani–Mahler theorem which we state without proof
(see E. Bombieri and A.J. van der Poorten [36] for details):
Theorem 6.5.10. Let K be a number field and let S be a finite subset of MK . For each
v ∈ S let αv ∈ F ∩ Kv , where F is a finite extension of K of degree r . Let also
\[
f(t) = 6\sqrt{\log r}\,\sqrt[4]{\frac{\log\log(t+\log 4)}{\log(t+\log 4)}}.
\]
Let (βj ) be the sequence of solutions in K of
\[
\prod_{v\in S}\min(1, |\beta_j-\alpha_v|_{v,K}) < (4H(\beta_j))^{-2-f(h(\beta_j))},
\]
ordered by strictly increasing height. Then either the sequence (βj ) is finite or
\[
\limsup_{j\to\infty}\frac{h(\beta_{j+1})}{h(\beta_j)} = \infty.
\]
More recently, alternative ways to Roth’s method for dealing with the difficult
point of the non-vanishing of the polynomial at the special point have been ob-
tained by H. Esnault and E. Viehweg [102] in their m -dimensional version of
Dyson’s lemma, and by G. Faltings [114] with his product theorem (see Theorem
7.6.4) and its quantitative versions due independently to J.-H. Evertse [106], R.G.
Ferretti [119], and G. Rémond [241]. We have not considered in this chapter these
very important improvements of Roth’s lemma and we have limited ourselves to a
rather elementary and explicit treatment along Roth’s original line.
Roth’s theorem with moving targets is proved in P. Vojta [315] and later M. Ru
and P. Vojta [250] extended the theorem to a version of Schmidt’s subspace the-
orem with moving targets. Vojta’s argument, inspired by N. Steinmetz’s paper
on Nevanlinna’s second theorem with moving targets [290] (see also the Biblio-
graphical Notes in Chapter 13), obtains the result as a consequence of Schmidt’s
subspace theorem (see Theorem 7.2.2). The simple direct proof outlined here ap-
pears to be new.
For the improved bound in 6.5.8 in the case |S| = 1, with explicit values of c3 ,
c4 , see [36], Th.2 and the Note at the end of the proof. A similar result was
independently obtained by H. Luckhardt [184] combining Dyson’s lemma with
methods of mathematical logic. This was extended to several places by R. Gross
[132].
The improved function f (t) in Theorem 6.5.10 given here is obtained in [36],
Th.3. The improvements mentioned in 6.5.8 and 6.5.10 stem from replacing Roth’s
lemma with the deeper Dyson’s lemma in several variables of Esnault and Viehweg,
loc. cit.; the versions of Faltings’s product theorem mentioned above would suffice
as well. Except for this point, the structure of the proofs follows rather closely the
arguments in [79] and [74].
7 The subspace theorem
7.1. Introduction
This chapter deals with Schmidt’s far-reaching extension of Roth’s theorem to sys-
tems of inequalities in linear forms. This is not a routine generalization; entirely
new difficulties appear in the course of the proof, which Schmidt resolved by in-
troducing new ideas from Minkowski’s geometry of numbers.
In the case of systems of inequalities, it is possible to have infinitely many solu-
tions. However, even then a finiteness theorem still holds, in the sense that solu-
tions are contained in finitely many proper linear subspaces of the ambient space.
This paves the way for applying induction arguments.
As is the case for Thue’s and Roth’s theorems, again Schmidt’s subspace theorem
is ineffective in the sense that no bound can be placed a priori on the height of the
finitely many linear spaces which contain the solutions. At any rate, it remains a
very flexible tool with wide applicability in many questions; the reader will find
some unusual applications of the subspace theorem in this chapter.
It is also possible, as for Roth’s theorem, to give an effective bound for the num-
ber of linear spaces containing the solutions. This requires rather sophisticated
methods beyond the scope of this book and will not be done here.
An extension of Schmidt’s theorem with a formulation allowing a finite set of
places, entirely analogous to Ridout’s and Lang’s generalizations of Roth’s theo-
rem, was later obtained by Schlickewei. This is quite important in applications.
Section 7.2 contains several equivalent formulations of the subspace theorem.
Then in Section 7.3 we consider applications of the subspace theorem related to
diophantine approximation and we give an alternative proof of Siegel’s theorem on
integral points, due to Corvaja and Zannier. In Section 7.4 we apply the subspace
theorem to the generalized unit equation. As a consequence we give Schmidt’s
solution of the norm-form equation, Laurent’s solution of the Lang conjecture for
tori and a nice result of Corvaja and Zannier from elementary number theory.
The proof of the subspace theorem, in the general form obtained by Schlickewei,
is given in full in Section 7.5. It is quite involved, first developing the “Roth
machinery” and then using new ideas of Schmidt, with further simplifications by
Evertse.
In the last two sections, we describe, without proofs, further important results. We
begin with Faltings’s product theorem in Section 7.6, which is a very important
geometric alternative to Roth’s lemma. In Section 7.7 we describe a deep exten-
sion of the subspace theorem obtained by Faltings and Wüstholz, dealing with
inequalities determined by forms of arbitrary degree; the finiteness statement be-
comes that the solutions are contained in a proper algebraic subvariety of the am-
bient space. Their method, quite different from Schmidt’s, uses deep tools from
algebraic geometry such as the theory of semi-stable bundles, Faltings’s product
theorem and an induction argument. Somewhat surprisingly, it was later shown
by Evertse and Ferretti that the Faltings–Wüstholz theorem can also be obtained
directly by an ingenious application of the subspace theorem.
In this chapter, the reader is assumed to be familiar with Chapter 6. For Section
7.4 it is also helpful to know the basic results of Chapters 3 and 5.
for all forms Lvi using suitable extensions of | |v to the number field F . We may
assume that F is a finite Galois extension of K . Then use Lui = Lvi for the place
u ∈ MF corresponding to the chosen extension of v to F . For the other places
w|v (w ∈ MF ), there is σ ∈ Gal(F/K) with $w = u \circ \sigma^{-1}$ (Corollary 1.3.5)
and we set $L_{wi} := \sigma(L_{vi})$. The subspace theorem for F and this set of linear
forms implies the subspace theorem in the relative setting, because the left-hand
sides are the same for $x \in K^{n+1}$.
Note that the field of definition of the subspaces can always be assumed to be K ,
since we may replace $T_j$ by the linear span of all solutions $x \in \mathbb{P}^n_K(K)$ contained
in $T_j$ .
7.2.4. A very important refinement of the theorem above is a quantitative version in
which we also control the number of hyperplanes needed to contain all solutions.
The first result of this type was obtained by Schmidt and the best bounds are in
J.-H. Evertse [108] and J.-H. Evertse and H.P. Schlickewei [111]. These bounds
are quite sharp and uniform with respect to the number field K , which is essential
in many applications; an example is Theorem 7.4.1. Here we will limit ourselves
to the qualitative statement above, referring to the original papers [108], [111], for
statements and proofs of the quantitative versions.
The proof will be given in Section 7.5. The following affine version of the subspace
theorem is worth noting. We denote by OS,K the ring of S -integers (see 1.5.10).
Corollary 7.2.5. Let K , S , Lvi be as before with S containing all archimedean
places and let ε > 0. Then there are finitely many linear subspaces T1 , . . . , Th of
$K^{n+1}$ such that the set of S -integral solutions $x \in O_{S,K}^{n+1}\setminus\{0\}$ of
\[
\prod_{v\in S}\prod_{i=0}^{n}|L_{vi}(x)|_v < H(x)^{-\varepsilon}
\]
is contained in T1 ∪ · · · ∪ Th .
Proof: Since $x \in O_{S,K}^{n+1}\setminus\{0\}$, we have
\[
H(x) = \prod_{v\in M_K}|x|_v \le \prod_{v\in S}|x|_v.
\]
Hence we see that every S -integral solution in Corollary 7.2.5 induces a projective
solution in Theorem 7.2.2. We conclude that Corollary 7.2.5 follows from
Theorem 7.2.2.
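For K = Q and S = {∞} the inequality used in this proof can be checked by hand: an S -integral vector is an integer vector, $\prod_{v\in S}|x|_v$ is max |xi |, and the height is max |xi | divided by the greatest common divisor of the coordinates. The following sketch is our own illustration under exactly these assumptions; the name `height` is hypothetical.

```python
from functools import reduce
from math import gcd


def height(x):
    """Multiplicative projective height H(x) of a non-zero integer vector over Q.

    The non-archimedean places contribute 1/gcd(x), the archimedean place max|x_i|.
    """
    if not any(x):
        raise ValueError("the zero vector has no projective height")
    common = reduce(gcd, (abs(c) for c in x))
    return max(abs(c) for c in x) // common


if __name__ == "__main__":
    x = [6, 10, 4]
    # H(x) = 10 / 2 = 5, while the product over S = {infinity} is max|x_i| = 10.
    print(height(x), max(abs(c) for c in x))
```

Equality H(x) = max |xi | holds precisely for primitive integer vectors, which is the normalization used later in the proof of the subspace theorem.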
Theorem 7.2.6. The affine subspace theorem as in Corollary 7.2.5 implies the
subspace theorem as in Theorem 7.2.2.
Proof: Since the subspace theorem for a set of places S implies a fortiori the
subspace theorem for a subset S′ ⊂ S , we may assume that S is so large that it
contains all the archimedean places and also that $O_{S,K}$ is a principal ideal domain
(see Proposition 5.3.6). Now let $x \in K^{n+1}\setminus\{0\}$ verify the projective diophantine
inequality in the statement of Theorem 7.2.2 and let X be the fractional ideal
$X = \sum_i x_i O_{S,K}$ of the ring $O_{S,K}$. Then, if $v \notin S$, we have
\[
|x|_v = \max_i |x_i|_v = \max_{\xi\in X}|\xi|_v. \tag{7.1}
\]
On the other hand, we have X = (δ) for some $\delta \in K^\times$. Therefore, from equation
(7.1) we infer that
\[
\max_i |x_i|_v = |\delta|_v
\]
for $v \notin S$, whence $x' := \delta^{-1}x$ satisfies
\[
\prod_{v\in S}|x'|_v = H(x') = H(x)
\]
and also $x' \in O_{S,K}^{n+1}\setminus\{0\}$. Now x′ is an affine solution of the inequality in
Corollary 7.2.5. This shows the equivalence of the projective and affine diophantine
inequalities in Theorem 7.2.2 and Corollary 7.2.5, completing the proof.
Example 7.2.7. Roth’s theorem in 6.2.3 is the special case n = 1 of the subspace
theorem. In the form given in this book, it states that, given K and S as above
and given K -algebraic $\alpha_v \in K_v$ for v ∈ S , the inequality
\[
\prod_{v\in S}\min(1, |\beta-\alpha_v|_v) < H(\beta)^{-2-\varepsilon} \tag{7.2}
\]
has only finitely many solutions β ∈ K . Note that, by splitting the solutions of
(7.2) into finitely many subsets according to the places v ∈ S for which
$|\beta-\alpha_v|_v < 1$, we see immediately that (7.2) is equivalent to the statement that the
solutions to
\[
\prod_{v\in S^*}|\beta-\alpha_v|_v < H(\beta)^{-2-\varepsilon}, \qquad |\beta-\alpha_v|_v < 1 \quad (v \in S^*) \tag{7.3}
\]
for some constant C . Now by Northcott’s theorem in 1.6.8, we have $C < H(x)^{\varepsilon/2} = H(\beta)^{\varepsilon/2}$
except for finitely many β ∈ K . Thus we may apply the subspace theorem
with n = 1 and ε/2 in place of ε and conclude that the solutions β of (7.3)
form a finite set. Roth’s theorem follows.
Next we give a stronger formulation of the subspace theorem due to P. Vojta [307],
which is of importance in some applications.
Definition 7.2.8. Let F be a field. A set {L1 , . . . , Lm } of linear forms in the ring
F [X0 , . . . , Xn ] is said to be in general position if any subset of cardinality not
exceeding n + 1 is linearly independent over F .
is contained in T1 ∪ · · · ∪ Th .
Proof: We note first that we may assume mv ≥ n . This is clear by extending the
set of linear forms L0v , . . . , Lmv v with suitable standard coordinates to a basis of
the space of linear forms in case of mv < n. Next, we reduce to the case mv = n .
By partitioning the set of solutions x into finitely many classes, we may assume
after a permutation of the forms Lvi that
We keep only the forms $L_{vi}$ with i ≤ n and remove the forms with i > n,
obtaining a new set $\widetilde{L}_{vi}$ of linear forms with $\widetilde{L}_{vi} = L_{vi}$ for i ≤ n . The forms
$L_{vi}$ with i ≤ n form a basis of the space of all linear forms, hence
7.3. Applications
There are several consequences of the subspace theorem in the theory of diophan-
tine approximation of algebraic numbers, and we examine here a few of them. At
the end of this section, we present a proof of Siegel’s theorem on integral points
relying on the subspace theorem. For this, the reader should be familiar with the
theory of algebraic curves as provided by Appendix A.13.
7.3.1. A well-known theorem of Dirichlet states that, if 1, α1 , . . . , αn are real numbers,
then for every positive integer N there is a point $x \in \mathbb{Z}^{n+1}\setminus\{0\}$ with max |xi | ≤
N , i = 1, . . . , n, such that
\[
|x_0 + \alpha_1x_1 + \cdots + \alpha_nx_n| \le N^{-n}.
\]
The easy proof is obtained applying the pigeon-hole principle to
\[
\{\alpha_1x_1 + \cdots + \alpha_nx_n \pmod 1 \mid x_i = 0, \ldots, N\},
\]
or by geometry of numbers applying Minkowski’s first theorem in C.2.19 to the
symmetric convex body of volume $2^{n+1}$ given by
\[
|X_0 + \alpha_1X_1 + \cdots + \alpha_nX_n| \le N^{-n}, \qquad |X_i| \le N, \quad i = 1, \ldots, n.
\]
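Dirichlet's theorem is easy to test numerically for small n and N. The following brute-force sketch is our own illustration and not the pigeon-hole argument itself: it scans all integer vectors (x1 , . . . , xn ) ≠ 0 with |xi | ≤ N , chooses x0 as the nearest integer to −(α1 x1 + · · · + αn xn ), and returns the first point satisfying the inequality. The function name `dirichlet_point` and the sample numbers √2, √3 are assumptions, and floating-point arithmetic stands in for exact real arithmetic.

```python
from itertools import product
from math import sqrt


def dirichlet_point(alphas, big_n):
    """Search for (x0, x1, ..., xn) with 0 < max|x_i| <= N (i >= 1) and
    |x0 + alpha_1*x1 + ... + alpha_n*xn| <= N**(-n)."""
    n = len(alphas)
    bound = big_n ** (-n)  # a float, since the exponent is negative
    for xs in product(range(-big_n, big_n + 1), repeat=n):
        if not any(xs):
            continue  # skip (x1, ..., xn) = 0
        s = sum(a * x for a, x in zip(alphas, xs))
        x0 = -round(s)  # best possible integer x0 for this choice of xs
        if abs(x0 + s) <= bound:
            return (x0, *xs)
    return None


if __name__ == "__main__":
    for big_n in (2, 5, 8):
        print(big_n, dirichlet_point((sqrt(2), sqrt(3)), big_n))
```

The search space has (2N + 1)^n points, so this is only practical for very small parameters; the theorem guarantees that the search never comes back empty.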
which we may assume to hold on the set X (by partitioning). Without loss of
generality, we may suppose $A_n \ne 0$. It follows that
\[
0 < |\alpha_0x_0 + \cdots + \alpha_nx_n| = |\beta_0x_0 + \cdots + \beta_{n-1}x_{n-1}|
\]
with $\beta_i = \alpha_i - \alpha_nA_i/A_n$, and a fortiori
\[
0 < |\beta_0x_0 + \cdots + \beta_{n-1}x_{n-1}| \le H(x)^{-n-\varepsilon} \le H(x')^{-n+1-\varepsilon}
\]
with $x' = (x_0, \ldots, x_{n-1})$. Now the induction hypothesis applies and we obtain
finitely many possibilities for x′ . Moreover, since $\alpha_n \ne 0$, given x′ there are only
finitely many possibilities for $x_n$ , and there are only finitely many possibilities for
the subspace T . Hence the set X is finite.
7.3.7. It is an interesting open problem to find the best exponent $\kappa_d$ for which the inequality
\[
|\alpha-\xi| \le cH(f_\xi)^{-\kappa_d+\varepsilon}
\]
has infinitely many real algebraic solutions ξ of degree at most d , for every fixed ε > 0 and
every real α not an algebraic number of degree at most d , for some constant c depending
on α and ε . If d = 1 and α is irrational, Dirichlet’s theorem shows that κ1 = 2 (even
with ε = 0 ). If d = 2 and α is not rational or a quadratic irrational, a difficult theorem
of H. Davenport and W.M. Schmidt [80] shows that κ2 = 3 (even with ε = 0 ). However,
their method breaks down if d ≥ 3 and it remains completely unclear what the correct
result should be.
In this connexion, it is instructive to consider the not unrelated problem of approximating
real numbers by algebraic integers of degree at most d . The first non-trivial case is d = 2 ,
where we can easily show (see H. Davenport and W.M. Schmidt [81]) that the inequality
\[
|\alpha-\xi| \le cH(f_\xi)^{-2}
\]
has infinitely many solutions in algebraic integers ξ of degree at most 2 for some constant
c depending on α , provided α is irrational. The next case d = 3 turned out to be quite
interesting. Davenport and Schmidt proved that, if α is not algebraic of degree at most 2 ,
then the inequality
\[
|\alpha-\xi| \ll_\alpha H(f_\xi)^{-(3+\sqrt5)/2}
\]
has infinitely many solutions with ξ an algebraic integer of degree at most 3 . The exponent
$(3+\sqrt5)/2$ looked somewhat strange and the authors commented, “We have no reason to
think that the exponents in these theorems are best possible.”
Thus it was a great surprise when D. Roy [246] constructed a real transcendental number α
and a constant c > 0 such that
\[
|\alpha-\xi| \ge cH(f_\xi)^{-(3+\sqrt5)/2}
\]
7.3.8. We conclude this section with an application of Corollary 7.2.5, which illus-
trates its power. It is an alternative direct proof, due to P. Corvaja and U. Zannier
[72], of a famous theorem of Siegel on integral points on curves. The standard
proof uses diophantine approximation (see 14.3.5) on the Jacobian and Roth’s the-
orem (see [277], §7.3, for details).
Let C be a geometrically irreducible affine curve over a number field K and let
S be a finite set of places containing the archimedean places. We assume that C
is given as a closed subvariety of AnK . An S -integral point of C is a point of C
with OK,S -integral coordinates. This notion depends on the embedding of C into
affine space.
Let $\pi\colon \widetilde{C}_{\mathrm{aff}} \to C$ be the normalization of C and we extend the affine curve $\widetilde{C}_{\mathrm{aff}}$
to a smooth projective curve $\widetilde{C}$ , which is unique up to isomorphism (see A.13.2,
A.13.3). The points in $\widetilde{C}\setminus\widetilde{C}_{\mathrm{aff}}$ are called the points of C at ∞ .
We may assume C smooth, as we will show at the beginning of the proof of Theorem 7.3.9.
If $\widetilde{C}$ has positive genus g , we may take, after replacing K by a larger number field, an
unramified covering of $\widetilde{C}$ to obtain a new curve $\widetilde{C}'$ with at least three distinct points at ∞
with $C'$ the inverse image of C . In order to show existence, we note that the first homology
group $H_1(\widetilde{C}_{\mathrm{an}}, \mathbb{Z})$ is a free abelian group of rank 2g . By standard techniques from the
theory of covering spaces (see Section 12.3), we may easily construct a normal subgroup of
the fundamental group of finite index ≥ 3 and hence a finite unramified covering of $\widetilde{C}_{\mathrm{an}}$
of degree ≥ 3 . By Theorem 12.3.12, the covering is defined over a finite extension of K
and leads to our desired unramified covering $\pi\colon \widetilde{C}' \to \widetilde{C}$ of degree ≥ 3 ; hence $C'$ has at
least 3 points at ∞ .
Moreover, the Chevalley–Weil theorem as in Theorem 10.3.11 shows that rational points on
C lift to rational points over a finite extension K′ of K . By enlarging S , we may assume
that π extends to an unramified finite morphism over $O_{S,K}$ and the valuative criterion of
properness ([148], Th. II.4.7) shows that S -integral points on C lift to S′ -integral points
on C′ for S′ the set of places of K′ lying over S . Thus we can apply the above theorem
to C′ to deal with arbitrary curves of positive genus.
Proof of Theorem 7.3.9: To begin with, we show that, by increasing the set S , we
may assume that C is normal and therefore smooth.
Let R be the ring of regular functions on C . By a base change to a larger number
field, we may assume that the K -rational points of C lift to K -rational points of
the normalization (using birationality from A.13.2). The S -integral points on C
are those at which all coordinate functions x1 , . . . , xn take values in OS,K . Now
let f ∈ K(C) be integral over R . Then f satisfies some equation
\[
f^N + \sum_{j=1}^{N} p_j(x)f^{N-j} = 0,
\]
where pj (X) ∈ K[X]. By enlarging S , we may assume that pj (X) ∈ OS,K [X].
Since OS,K is integrally closed in K , we see that f continues to take values in
OS,K at the S -integral points of C . Thus adding f to the coordinate ring R
preserves S -integrality (possibly by enlarging S ). Therefore, by enlarging S we
may replace R by its integral closure, which is what we needed to prove.
We denote by $\widetilde{C}$ the associated smooth projective curve. Clearly, we may assume
now that $C \subset \widetilde{C}$ . Let Q1 , . . . , Qr be the distinct points at ∞ on C , which we
For v ∈ S , we have that $|L_{vj}(\varphi(P_\nu))|_v$ is bounded, because of the definition of
S . Therefore, by multiplying these inequalities we get
\[
\prod_{v\in S}\prod_{j=1}^{d}|L_{vj}(\varphi(P_\nu))|_v \ll \prod_{v\in S}|\lambda_v(P_\nu)|_v^{d(d-2N-1)/2} \tag{7.7}
\]
for ν sufficiently large. On the other hand, the $\varphi_j(P_\nu)$ are S -integers; hence
$|\varphi_j(P_\nu)|_v \le 1$ for $v \notin S$, and also $|\varphi_j(P_\nu)|_v$ is bounded for v ∈ S . For
v ∈ S and ν sufficiently large, we have $|\varphi_j(P_\nu)|_v \ll |\lambda_v(P_\nu)|_v^{-N}$ because the
maximum order of pole of $\varphi_j$ at $P_v$ is at most N . This shows that
\[
H((\varphi_1(P_\nu) : \cdots : \varphi_d(P_\nu))) \ll \prod_{v\in S}|\lambda_v(P_\nu)|_v^{-N}. \tag{7.8}
\]
By combining (7.7) and (7.8), we infer that
\[
\prod_{v\in S}\prod_{j=1}^{d}|L_{vj}(\varphi(P_\nu))|_v \ll H((\varphi_1(P_\nu) : \cdots : \varphi_d(P_\nu)))^{-\frac{d(d-2N-1)}{2N}}
\]
A remarkable feature of this result is the uniformity of the bound for the number of
solutions, which, as in Theorem 5.2.1, is simply exponential in the rank of Γ and
independent of the field K . This theorem is quite difficult to prove. We prove
now a weaker, but still very useful, version of Theorem 7.4.1, due independently to
Schlickewei and van der Poorten (see the bibliography in [260]) and J.-H. Evertse
[104].
Theorem 7.4.2. Let K be a number field and let S be a finite set of places of K
containing all archimedean places, with group of S -units US,K . Let X be the set
of solutions of
x1 + · · · + xn = 1
such that (x1 , . . . , xn ) ∈ (US,K )n and no proper subsum of x1 + · · · + xn van-
ishes. Then X is a finite set.
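The simplest instance of Theorem 7.4.2 is n = 2 over K = Q with S = {∞, 2, 3}, where the S -units are the rationals ±2^a 3^b and the condition on subsums is automatic. The sketch below is our own illustration: it enumerates the solutions of x + y = 1 whose first coordinate has exponents bounded by a cut-off. The cut-off `bound` and all function names are assumptions of the sketch; the code only enumerates and does not prove completeness.

```python
from fractions import Fraction

PRIMES = (2, 3)  # finite places of S; S = {infinity, 2, 3} is an assumed example


def is_s_unit(q):
    """True if the rational q is non-zero and only 2 and 3 divide its numerator or denominator."""
    if q == 0:
        return False
    for part in (abs(q.numerator), q.denominator):
        for p in PRIMES:
            while part % p == 0:
                part //= p
        if part != 1:
            return False
    return True


def unit_equation_solutions(bound):
    """All (x, y) with x + y = 1, x = +-2^a 3^b, |a|, |b| <= bound, and y an S-unit."""
    solutions = set()
    for a in range(-bound, bound + 1):
        for b in range(-bound, bound + 1):
            for sign in (1, -1):
                x = sign * Fraction(2) ** a * Fraction(3) ** b
                y = 1 - x
                if is_s_unit(y):
                    solutions.add((x, y))
    return solutions


if __name__ == "__main__":
    for bound in (1, 2, 3, 6):
        print(bound, len(unit_equation_solutions(bound)))
```

For this S the count stops growing at 21 once the cut-off reaches 3. Clearing denominators turns a solution into a relation A + B = C among pairwise coprime positive integers divisible only by 2 and 3, and the only such relations are 1 + 1 = 2, 1 + 2 = 3, 1 + 3 = 4 and 1 + 8 = 9.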
Corollary 7.4.3. Let X be the set of solutions of
x1 + · · · + xn = 1
such that (x1 , . . . , xn ) ∈ (US,K )n . Then there is a finite set F ⊂ US,K such that
every x ∈ X has at least one coordinate in F .
Proof of corollary: Clear by induction on n and Theorem 7.4.2.
Proof of theorem: We follow U. Zannier [337]. The proof is by induction on n ,
the case n = 1 being obvious.
We say that a solution x ∈ X of the S -unit equation x1 + · · · + xn = 1 is non-degenerate
if no proper subsum of x1 + · · · + xn vanishes. We partition X into
finitely many subsets according to the set of indices $j_v$ (v ∈ S) such that
\[
j_v = \min\{j \mid |x_j|_v = \max_i |x_i|_v\}
\]
and then it suffices to prove the result for each subset.
Let us fix one of these subsets, say X , and define linear forms $L_{vj}$ by $L_{vj} = X_j$
if $j \ne j_v$ and $L_{vj_v} = X_1 + \cdots + X_n$. Since $L_{vj_v}(x) = x_1 + \cdots + x_n = 1$ and
since $|x_{j_v}|_v = |x|_v$, we have for v ∈ S
\[
\prod_{j=1}^{n}|L_{vj}(x)|_v = \prod_{j\ne j_v}|x_j|_v = |x|_v^{-1}\prod_{j=1}^{n}|x_j|_v.
\]
because $x_j \in U_{S,K}$ (use the product formula in 1.4.3). Therefore, from the last
two displayed equations, we infer that
\[
\prod_{v\in S}\prod_{j=1}^{n}|L_{vj}(x)|_v = \prod_{v\in S}|x|_v^{-1}.
\]
Moreover, we have
\[
H(x) = \prod_{v\in M_K}|x|_v = \prod_{v\in S}|x|_v
\]
because each coordinate $x_j$ is an S -unit, and we conclude that
\[
\prod_{v\in S}\prod_{j=1}^{n}|L_{vj}(x)|_v = H(x)^{-1}.
\]
Thus we can apply Corollary 7.2.5 with ε = 1 and obtain that the solutions x ∈ X
lie in a finite union $\mathcal{S}$ of proper linear subspaces of $K^n$ . Now we partition X into
finitely many subsets, such that in a typical subset X we have a relation
$\sum_j a_jx_j = 0$. We may suppose, after a permutation of coordinates, that $a_n \ne 0$.
Eliminating $x_n$ and using the equation x1 + · · · + xn = 1, we then obtain an
equation
\[
b_1x_1 + \cdots + b_{n-1}x_{n-1} = 1
\]
with $b_j \in K$ , not all 0. By removing vanishing subsums from the above relation,
we end up with a new relation $\sum_{i\in I} b_ix_i = 1$, where no proper subsum of
$\sum_{i\in I} b_ix_i$ vanishes. Thus by partitioning once again X into finitely many subsets,
it suffices to deal with one such relation. We enlarge S to a new finite set S′ ,
where each $b_i$ (i ∈ I) is an S′ -unit, and we obtain a non-degenerate solution y
in S′ -units, $y_i = b_ix_i$ , of the equation
\[
\sum_{i\in I} y_i = 1.
\]
Since |I| ≤ n − 1, the induction hypothesis shows that the $b_ix_i$ (i ∈ I), and
hence the coordinates $x_i$ (i ∈ I) themselves belong to a finite set. Thus we have
proved that the inductive hypothesis implies that there is a finite set Φ such that
any solution x ∈ X has at least one coordinate in Φ . Let $x_{i_0} = c$, c ∈ Φ ,
be one of these relations. Then c ≠ 1 because x is a non-degenerate solution.
By enlarging S to a new set S′ so that 1 − c becomes an S′ -unit and setting
$z_i = (1-c)^{-1}x_i$ , we have that $z_i$ ($i \ne i_0$) is an S′ -unit and
yields a non-degenerate solution of
\[
\sum_{i\ne i_0} z_i = 1.
\]
Thus we can apply induction again to conclude that all remaining coordinates $x_i$
also belong to a finite set.
7.4. The generalized unit equation 189
7.4.4. Theorem 7.4.2 is a very powerful tool and we devote the next few paragraphs
to some of its consequences. The first application is to the so-called norm-form
equation, a generalization of the Thue equation (for irreducible polynomials). For
simplicity, we shall consider here only norm-form equations over Z .
Let $\omega_1, \ldots, \omega_n \in \overline{\mathbb{Q}}$ and let K = Q(ω1 , . . . , ωn ), d = [K : Q]. Let L(X) be
the linear form
\[
L(X) = \sum_{j=1}^{n}\omega_jX_j,
\]
let S be the set of distinct embeddings σ : K → C , and define
\[
L_\sigma(X) = \sum_{j=1}^{n}\sigma(\omega_j)X_j.
\]
and the norm-form equation $N_{K/\mathbb{Q}}(\alpha\mu') = c$ is equivalent to the new norm-form
equation $N_{K'/\mathbb{Q}}(\mu') = \varepsilon$ , with ε = 1 if [K : K′ ] is odd and ε = ±1 otherwise.
Since K′ is a proper subfield of K , the inductive hypothesis applies to K′ and
X′ is indeed the union of finitely many families of solutions. Since there are only
finitely many subclasses X′ to consider, the result follows.
The following theorem is due to M. Laurent [178].
Theorem 7.4.7. Let Γ be a finitely generated subgroup of $(\overline{\mathbb{Q}}^\times)^n$ and let Σ be
any subset of Γ . Then the Zariski closure of Σ in $\mathbb{G}_m^n$ is a finite union of translates
of algebraic subgroups of $\mathbb{G}_m^n$ .
Remark 7.4.8. In his paper Laurent proves the stronger statement, previously
conjectured by Lang, in which the subgroup Γ is a subgroup of C× of finite Q -
rank. This was obtained from the above theorem by using additional arguments
from Kummer theory. A proof of the stronger result is also immediate if in the
argument below we use Theorem 7.4.1, rather than Corollary 7.4.3.
ci
− gai −a0 = 1 (g ∈ V ∩ Σ).
i=1
c 0
Proof of Theorem 7.4.10: We may assume that S contains at least one non-archimedean
place, |v| ≥ |u| and v ≠ 1. Let d be the denominator of the fraction
(u−1)/(v−1) in its lowest terms, thus $d \le 2|v|^{1-\varepsilon}$ because
$\mathrm{GCD}(u-1, v-1) \ge \max(|u|, |v|)^{\varepsilon}$ by hypothesis. We define $z_j \in \mathbb{Q}$ and $c_j \in \mathbb{Z}$ for $j \in \mathbb{N}$ by
\[
z_j := \frac{u^j-1}{v-1} = \frac{c_j}{d}.
\]
We have the approximation
\[
\frac{1}{v-1} = \sum_{r=1}^{h}\frac{1}{v^r} + O\bigl(|v|^{-h-1}\bigr)
\]
For $(\nu, i) \notin \{(\infty, 1), \ldots, (\infty, k)\}$ we define instead
\[
L_{\nu i} := X_i.
\]
Obviously, for each ν ∈ S the linear forms $L_{\nu 1}, \ldots, L_{\nu n}$ are linearly independent.
Now define for a pair (u, v) ∈ Σ the point x = (x1 , . . . , xn ) to be
\[
x := dv^h\bigl(z_1, \ldots, z_k, v^{-1}, \ldots, v^{-h}, uv^{-1}, \ldots, uv^{-h}, \ldots, u^kv^{-1}, \ldots, u^kv^{-h}\bigr).
\]
Then, for i > k , we have $x_i = du^av^b$ , for suitable integers a and b , hence $x_i$
equals d times an S -unit. Since $L_{\nu i}(x) = x_i$ , we easily deduce that
\[
\prod_{\nu\in S}|L_{\nu i}(x)|_\nu \le d \quad\text{for } i > k.
\]
Therefore, we have
\[
\begin{aligned}
\prod_{\nu\in S}\prod_{i=1}^{n}|L_{\nu i}(x)|_\nu
&\le d^{\,n-k}\prod_{\nu\in S}\prod_{i=1}^{k}|L_{\nu i}(x)|_\nu\\
&= d^{\,n-k}\prod_{i=1}^{k}|L_{\infty i}(x)|_\infty\prod_{\nu\in S\setminus\{\infty\}}\prod_{i=1}^{k}|x_i|_\nu.
\end{aligned} \tag{7.13}
\]
Moreover, by (7.12) we have $|L_{\infty i}(x)| = O\bigl(d|u|^i|v|^{-1}\bigr)$ for i = 1, . . . , k .
Putting this estimate and (7.14) in (7.13), we get
\[
\prod_{\nu\in S}\prod_{i=1}^{n}|L_{\nu i}(x)|_\nu = O\bigl(d^n|u|^{k(k+1)/2}|v|^{-hk-k}\bigr).
\]
Recalling that $d \le 2|v|^{1-\varepsilon}$ and |u| ≤ |v|, from the last displayed inequality we
obtain
\[
\prod_{\nu\in S}\prod_{i=1}^{n}|L_{\nu i}(x)|_\nu = O\bigl(|v|^{h+k(k+1)/2-\varepsilon n}\bigr). \tag{7.15}
\]
Finally, each $x_i$ is an integer and we have $\max|x_i| \le 2d|v|^{h+k} < 4|v|^{h+k+1}$ ,
hence $H(x) \le 4|v|^{h+k+1}$ . In view of (7.15), we conclude that
\[
\prod_{\nu\in S}\prod_{i=1}^{n}|L_{\nu i}(x)|_\nu \ll_{h,k} H(x)^{-\delta}
\]
We follow here the classical proof of Schmidt, with important modifications intro-
duced by H.P. Schlickewei [261] and J.-H. Evertse [108] in order to cover the case
of arbitrary number fields and allow a finite set of places.
7.5.1. By Theorem 7.2.6, it is sufficient to prove the subspace theorem in its affine
form Corollary 7.2.5. The proof is by contradiction.
The first step in the proof consists in following, as far as possible, the blueprint
provided by the proof of Roth’s theorem. Here a major new difficulty appears,
namely the non-vanishing of the auxiliary construction cannot be done at a single
point and it requires the consideration of n independent points. This gives only a
rather weak version of the subspace theorem.
The conclusion of the proof is to show how to go from n independent points to one
point only. This is where we need new ideas. Schmidt’s original method consisted
in applying the weaker form of the subspace theorem to a new set of linear forms,
obtained by taking exterior products of the given linear forms, and using geometry
of numbers to deduce the strong version of the subspace theorem. This part of
Schmidt’s proof has been substantially simplified by Evertse [108] and we will
follow his exposition here, with some further simplifications, because our goal is
only the easier qualitative form of the subspace theorem. As formally stated in the
next article, we will not keep track of constants depending only on K , S and the
linear forms Lvi . This will allow substantial simplifications in the exposition.
7.5.2. Before starting the proof of the subspace theorem, we need some notation.
We write
We will also need to use some elementary exterior algebra and our notation will be
as follows. Let V be a vector space over a field F , of dimension n + 1, with basis
ei (i = 0, . . . , n). For k = 1, . . . , n, we equip the exterior power ∧k V with the
standard basis
ei1 ∧ · · · ∧ eik
with i1 < i2 < · · · < ik , in lexicographic order.
198 T H E S U B S PAC E T H E O R E M
We extend the standard scalar product $x \cdot y = \sum_{j=0}^{n} x_j y_j$ on $V$ to $\wedge^k V$ by
Laplace’s identity
\[ (x_1\wedge\cdots\wedge x_k)\cdot(y_1\wedge\cdots\wedge y_k) = \det\bigl((x_i\cdot y_j)\bigr)_{i,j=1,\dots,k} \tag{7.16} \]
and by multilinearity. Obviously, this is just the standard scalar product on ∧k V
with respect to the standard basis (ei1 ∧ · · · ∧ eik )i1 <···<ik .
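Laplace's identity (7.16) can be checked numerically on a small example. The sketch below is only an illustration under stated assumptions: the helper names `det`, `wedge` and `dot` are hypothetical, vectors are plain integer lists, and the coordinates of a wedge product are taken to be the $k \times k$ minors in the lexicographically ordered standard basis.

```python
from itertools import combinations

def det(matrix):
    """Determinant by Laplace expansion along the first row (fine for small sizes)."""
    if len(matrix) == 1:
        return matrix[0][0]
    total = 0
    for col, entry in enumerate(matrix[0]):
        minor = [row[:col] + row[col + 1:] for row in matrix[1:]]
        total += (-1) ** col * entry * det(minor)
    return total

def wedge(vectors):
    """Coordinates of x1 ^ ... ^ xk in the basis e_{i1} ^ ... ^ e_{ik}, i1 < ... < ik."""
    k, dim = len(vectors), len(vectors[0])
    return [det([[v[i] for i in idx] for v in vectors])
            for idx in combinations(range(dim), k)]

def dot(a, b):
    """Standard scalar product of two coordinate vectors."""
    return sum(p * q for p, q in zip(a, b))

# Two pairs of vectors in a 4-dimensional space (so n = 3, k = 2).
xs = [[1, 2, 0, -1], [3, 1, 4, 1]]
ys = [[2, 0, 1, 5], [-1, 1, 1, 2]]

lhs = dot(wedge(xs), wedge(ys))                      # scalar product on the exterior power
rhs = det([[dot(x, y) for y in ys] for x in xs])     # Gram-type determinant from (7.16)
print(lhs, rhs)
```

Both sides agree, which is the Cauchy–Binet formula in coordinates.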
(see Proposition 5.3.6) and, by the proof of Theorem 7.2.6, it suffices to deal only
with S -integral vectors x satisfying the additional condition
\[ \prod_{v\in S} |x|_v = H(x). \]
If $x = (x_0,\dots,x_n) \in O_{S,K}^{n+1}$ is primitive, clearly $ux$ is again primitive whenever
u is an S -unit. Moreover, if x is a solution of the basic inequality (7.18), then
ux is again a solution. Such solutions form an equivalence class. The following
lemma shows that in every equivalence class there is an element with small affine
height, as defined in 1.5.7. In order to distinguish it from the projective height, we
set
haff (x) := h((1, x)).
Lemma 7.5.4. There is a positive constant C0 , depending only on S and K , with
the following property. Let $x = (x_0,\dots,x_n) \in O_{S,K}^{n+1}$ be a primitive point. Then
there is an S -unit u in OS,K such that
h(x) ≤ haff (ux) ≤ h(x) + C0 .
Proof: The inequality h(x) ≤ haff (ux) is obvious because h(ux) = h(x).
Now suppose for example that $x_0 \ne 0$. Since $x_0$ is a non-zero $S$-integer, the
product formula shows that
\[ \sum_{v\in S}\log|x_0|_v = -\sum_{v\notin S}\log|x_0|_v \ge 0. \]
Corollary 7.5.5. Every equivalence class of primitive solutions of the basic in-
equality (7.18) contains an element x such that for every v ∈ S we have
haff (Lv (x)) ≤ h(x) + C1 .
In particular, we have for $L_{vi}(x) \ne 0$
\[ \log|L_{vi}(x)|_{v,K} \le r\,h(x) + rC_1. \tag{7.19} \]
Proof: Since the forms Lvi (i = 0, . . . , n) are linearly independent, we have for
w ∈ MF
log |Lv (x)|w − log |x|w ≤ γw ,
with γw depending only on the forms Lvi and equal to 0 up to finitely many w .
We conclude that haff (Lv (x)) and haff (x) differ only by a bounded quantity and
an application of Lemma 7.5.4 yields the first claim. Now let w be the place of F
with w|v . Then Proposition 1.2.7 shows that
\[ |\ |_{v,K} = |\ |_w^{[F:K]/[F_w:K_v]} \]
and the claim follows from the fundamental inequality in (1.8) on page 20 and
[F : K] = r .
7.5.6. As in the proof of Roth’s theorem, it is necessary to consider only solutions
x for which the factors |Lvi (x)|v,K , for each (v, i), have a similar behaviour
compared with H(x). In other words, we want
\[ \bigl(h(x)^{-1}\log|L_{vi}(x)|_{v,K}\bigr)_{v\in S,\ i=0,\dots,n} \tag{7.20} \]
to be nearly constant along our set of primitive solutions. If Lvi (x) = 0 for some
(v, i), then the solution x lies in the hyperplane defined by the linear form Lvi .
Hence such solutions satisfy the conclusion of the theorem to be proved. Thus,
by the preceding corollary, we may and shall assume that for all $(v, i)$ we have
$L_{vi}(x) \ne 0$ and that (7.19) holds.
The next step consists in splitting solutions into finitely many approximation classes,
as it was done in the proof of Roth’s theorem. Since we are not interested here in
counting the number of approximation classes, it will suffice to note that, given any
infinite set of solutions of inequality (7.18), and given N > 0 (which will be taken
very large), then by (7.19) there exists a cube in $\mathbb{R}^{(n+1)|S|}$ of edge size $1/N$, with
north-east corner at a point $(c_{vi}) \in [-2r, 2r]^{(n+1)|S|}$, containing (7.20) for an infinite
subset of the given set of primitive solutions. (The point of the constant $2r$
is to swallow the contribution of $rC_1$ in (7.19) as soon as $h(x)$ is large enough.)
Thus we have an infinite primitive approximation class of primitive solutions of
inequality (7.18), consisting of solutions satisfying
\[ \Bigl(c_{vi}-\frac{1}{N}\Bigr)h(x) \le \log|L_{vi}(x)|_{v,K} \le c_{vi}\,h(x) \tag{7.21} \]
7.5. Proof of the subspace theorem 201
for each pair (v, i). By this inequality and (7.18) on page 199, we necessarily have
\[ \sum_{v\in S}\sum_{i=0}^{n} c_{vi} \le -\varepsilon + (n+1)|S|/N < -\frac{\varepsilon}{2} \tag{7.22} \]
Proof: Statement (a) is obvious from the definition of βv and, similarly, we would
get (b), (c), (d) if $\Pi_v(Q)$ is the cuboid $\{|\xi_{vi}|_v \le Q^{c_{vi}} \mid i = 0,\dots,n\}$. In general,
a transformation of coordinates ξvi = Lvi (ξ v ) is necessary to get Πv (Q) from
the cuboid. Then the volume changes by a factor det(Lvi )−1 −d
v,K = |∆v |v,K (use
C.1.3) proving (b)–(d). Finally (e) follows from (C.4) on page 606.
Corollary 7.5.8. The volume of an approximation domain is
\[ \beta^{n+1}(\Pi(Q)) \ll Q^{-d\varepsilon/2}. \]
Proof: Clear from the preceding lemma and (7.22).
We summarize the results obtained so far as:
Lemma 7.5.9. Suppose the affine subspace theorem in 7.2.5 is false for ε > 0.
Let d = [K : Q], let F be a finite extension of K, which is a field of definition
for all forms Lvi , and define r = [F : K]. Then there are real numbers cvi
($v \in S$, $i = 0,\dots,n$) with $|c_{vi}| \le 2r$ and
\[ \sum_{v\in S}\sum_{i=0}^{n} c_{vi} \le -\frac{\varepsilon}{2}, \]
coefficients
in a finite extension $F/K$. Let $c_{vi} \in \mathbb{R}$ ($v \in S$, $i = 0,\dots,n$) with
$\sum_{vi} c_{vi} < 0$ defining an approximation domain $\Pi(Q)$ of level $Q$ as in 7.5.6.
Then {V (Q) | Q ≥ 1, rank(Π(Q)) = n} is a finite set.
We break the proof of this theorem into several steps, trying to imitate the
arguments given in Chapter 6 for Roth’s theorem. We fix $0 < \varepsilon \le 1$ with
$\sum_{vi} c_{vi} \le -\varepsilon/2$. The cardinality of the above set of subspaces may be expressed
in terms of K, S, Lvi , ε (see [108], Th.C), but we restrict our attention to the qual-
itative result.
7.5.14. Step I: The auxiliary polynomial.
In the proof of Roth’s theorem, we start by constructing a polynomial in several
variables vanishing to high weighted order at points $(\alpha_v,\dots,\alpha_v) \in K_v^m$ with
v ∈ S . We begin here in a similar way, but here it proves to be convenient to work
with multihomogeneous polynomials. So we need to develop some notation first.
For an m -tuple of positive integers d = (d1 , . . . , dm ), we denote by P(d) the
K -vector space of multihomogeneous polynomials P (x1 , . . . , xm ) of degree dh
in the block of variables xh . By Example A.6.13, we can identify
\[ P(d) = \Gamma(\mathbb{P}, O_{\mathbb{P}}(d)), \qquad \mathbb{P} := \underbrace{\mathbb{P}^n_K\times\cdots\times\mathbb{P}^n_K}_{m\ \text{factors}}. \]
Let I = (i1 , . . . , im ) with ih = (ih0 , . . . , ihn ). Using 6.3.1, we define
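\[ \partial_I := \frac{1}{i_1!}\cdots\frac{1}{i_m!}\Bigl(\frac{\partial}{\partial x_1}\Bigr)^{i_1}\cdots\Bigl(\frac{\partial}{\partial x_m}\Bigr)^{i_m} = \partial_{i_1}\cdots\partial_{i_m}. \]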
Similarly as in the homogeneous case, the normalizations yield that
\[ f(x) = \sum_I \partial_I f(0)\,x_1^{i_1}\cdots x_m^{i_m} \]
for any polynomial or more generally any power series f .
Let L(x) = (L0 (x), . . . , Ln (x)) be linearly independent linear forms over F .
Then, given $P \in P(d)$ and the differential operator $\partial_I$, we can write
\[ \partial_I P(x_1,\dots,x_m) = \sum_J a(L;J;I)\,L(x_1)^{j_1}\cdots L(x_m)^{j_m} \]
with $j_h = (j_{h0},\dots,j_{hn})$, $J = (j_1,\dots,j_m)$, and $a(L;J;I) \in F$. By
homogeneity, we also have $|j_h| = d_h - |i_h|$ for $h = 1,\dots,m$.
If V = (v1 , . . . , vm ), we will write Vi for the vector
Vi = (v1i , . . . , vmi ).
Also, it will prove to be convenient to use the notation
\[ (V/d) := \frac{|v_1|}{d_1}+\cdots+\frac{|v_m|}{d_m}. \]
Here and in the following, the lower bound for d1 , . . . , dm may depend on the
given data including η, m, and so on.
Proof: We give here a proof which parallels the argument used in Lemma 6.3.4
in the construction of the auxiliary polynomial. We are interested here only in
asymptotics for d1 → ∞, . . . , dm → ∞ for fixed m .
The dimension of the vector space $P(d)$ is the number of $m$-tuples $J = (j_1,\dots,j_m)$
with non-negative integer components such that $(|j_1|,\dots,|j_m|) = d$.
Since prescribing $j_{hi}$ for $i = 0,\dots,n-1$ with the sum not exceeding $d_h$ determines
$j_{hn}$, we have $\dim(P(d)) \sim (d_1\cdots d_m)^n V_0$ with
\[ V_0 = \prod_{h=1}^{m}\int_0^1\!\cdots\!\int_0^1 \chi_{[0,1]}\Bigl(\sum_{i=0}^{n-1}x_{hi}\Bigr)\,dx_{h0}\cdots dx_{h,n-1} \]
and $\chi_{[a,b]}$ the characteristic function of the interval $[a,b]$. The integral equals
$(n!)^{-1}$ and we obtain
\[ V_0 = (n!)^{-m}. \tag{7.23} \]
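The asymptotic count behind (7.23) is easy to test numerically. The following sketch assumes that the exact dimension is the product of the binomial coefficients $\binom{d_h+n}{n}$ (the number of monomials of degree $d_h$ in $n+1$ variables) and compares it with $(d_1\cdots d_m)^n (n!)^{-m}$. The function names and the sample degrees are illustrative choices, not taken from the text.

```python
from math import comb, factorial

def dim_P(d, n):
    """Exact number of multihomogeneous monomials of multidegree d (m blocks of n+1 variables)."""
    out = 1
    for dh in d:
        out *= comb(dh + n, n)   # monomials of degree dh in n+1 variables
    return out

def asymptotic(d, n):
    """Leading term (d_1 ... d_m)^n * V_0 with V_0 = (n!)^(-m)."""
    prod = 1
    for dh in d:
        prod *= dh
    return prod ** n / factorial(n) ** len(d)

n = 2
d = (2000, 1500, 1000)
ratio = dim_P(d, n) / asymptotic(d, n)
print(ratio)  # close to 1, and closer as the degrees grow
```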
Let us fix v ∈ S and compute, for η > 0, an asymptotic upper bound for the
number of linear conditions (with coefficients in F ) imposed on an element P of
P(d) by the vanishing of the coefficients a(Lv ; J; 0) whenever
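\[ (J_i/d) \le \frac{m}{n+1} - m\eta \tag{7.24} \]
for some fixed $i$, for example $i = 0$. The number of such $m$-tuples $J$ is asymptotic
to $(d_1\cdots d_m)^n V$ with
\[ V = \int_0^1\!\cdots\!\int_0^1 \chi_{[0,\frac{m}{n+1}-m\eta]}\Bigl(\sum_{h=1}^{m}x_{h0}\Bigr)\prod_{h=1}^{m}\chi_{[0,1-x_{h0}]}\Bigl(\sum_{i=1}^{n-1}x_{hi}\Bigr)\prod_{h=1}^{m}\prod_{i=0}^{n-1}dx_{hi}. \]
Therefore, we have
\[ V = \int_0^1\!\cdots\!\int_0^1 \chi_{[0,\frac{m}{n+1}-m\eta]}\Bigl(\sum_{h=1}^{m}x_{h0}\Bigr)\prod_{h=1}^{m}\frac{(1-x_{h0})^{n-1}}{(n-1)!}\,dx_{10}\cdots dx_{m0}. \]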
In order to obtain a good upper bound for V , we proceed as in Lemma 6.3.5 noting
the majorization
\[ \chi_{[a,b]}(x) \le e^{\lambda(b-x)}, \]
valid for $\lambda \ge 0$. This decouples the variables $x_{h0}$ and we get
\[ V \le \Bigl(e^{\lambda/(n+1)-\lambda\eta}\int_0^1 e^{-\lambda x}\frac{(1-x)^{n-1}}{(n-1)!}\,dx\Bigr)^{m} \]
for any λ ≥ 0. Suppose now that 0 < λ ≤ n + 4. By expanding the exponential
into a MacLaurin series and integrating term by term we obtain
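\begin{align*}
\int_0^1 e^{-\lambda x}\frac{(1-x)^{n-1}}{(n-1)!}\,dx &= \sum_{k=0}^{\infty}\frac{(-\lambda)^k}{(n+k)!} \\
&\le \frac{1}{n!}\Bigl(1-\frac{\lambda}{n+1}+\frac{\lambda^2}{(n+1)(n+2)}\Bigr) \\
&< \frac{1}{n!}\exp\Bigl(-\frac{\lambda}{n+1}+\frac{\lambda^2}{(n+1)(n+2)}\Bigr).
\end{align*}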
Hence from the last two displayed equations we conclude that
\[ V < \frac{1}{(n!)^m}\,e^{-m\lambda\eta+m\lambda^2\frac{1}{(n+1)(n+2)}} \]
provided $0 < \lambda \le n+4$. If we choose for example $\lambda = \eta(n+1)(n+2)/2$ with
$\eta \le 2/(n+1)$, we get
\[ V < \frac{1}{(n!)^m}\,e^{-(n+1)(n+2)\eta^2 m/4}. \tag{7.25} \]
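The chain of estimates leading to (7.25) can be sanity-checked in floating point. The sketch below is an illustration only: the function names are hypothetical, the integral is approximated by a midpoint rule, and the sample values $n = 3$, $\eta = 0.3$ are arbitrary choices satisfying $\eta \le 2/(n+1)$. It compares the series with the integral, then checks the quadratic and exponential majorants and the resulting bound per factor (the $m$-th root of the bound on $V$).

```python
from math import exp, factorial

def integral_series(lam, n, terms=60):
    """Sum of (-lam)^k / (n+k)!, the series for the integral in the text."""
    return sum((-lam) ** k / factorial(n + k) for k in range(terms))

def integral_numeric(lam, n, steps=20000):
    """Midpoint-rule approximation of the same integral over [0, 1]."""
    h = 1.0 / steps
    total = sum(exp(-lam * (i + 0.5) * h) * (1 - (i + 0.5) * h) ** (n - 1)
                for i in range(steps))
    return total * h / factorial(n - 1)

def bounds(n, eta):
    """Return (integral, quadratic bound, exponential bound, per-factor bound, target)."""
    lam = eta * (n + 1) * (n + 2) / 2          # the choice of lambda made in the text
    integral = integral_series(lam, n)
    t = -lam / (n + 1) + lam ** 2 / ((n + 1) * (n + 2))
    quadratic = (1 + t) / factorial(n)
    exponential = exp(t) / factorial(n)
    per_factor = exp(lam / (n + 1) - lam * eta) * integral
    target = exp(-(n + 1) * (n + 2) * eta ** 2 / 4) / factorial(n)
    return integral, quadratic, exponential, per_factor, target

integral, quadratic, exponential, per_factor, target = bounds(n=3, eta=0.3)
print(integral, quadratic, exponential, per_factor, target)
```

Raising the inequality `per_factor < target` to the power $m$ gives the bound (7.25) for these sample values.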
Now we apply the relative Siegel lemma in 2.9.19 to find a non-zero P with small
coefficients satisfying (7.24) for some i = 0, . . . , n. The calculation is entirely
analogous to the proof of Lemma 6.3.4. The final result is that, if
\[ r(n+1)|S|\,V/V_0 \le \frac{1}{2}, \tag{7.26} \]
then h(P ) ≤ C2 |d|, for d1 , . . . , dm sufficiently large and with C2 depending
only on the forms Lvi and K .
Applying a differential operator ∂I to P increases the height by not more than
(log 2)|d|, proving h(∂I P ) ≤ C2 |d|. Then a bound h((a(Lv ; J; I))) ≤ C3 |d| is
obtained looking at (∂I P )(Mv (x)), where Mv ◦ Lv (x) = x .
\[ (J_{i_0}/d) < \sum_{i=0}^{n}(J_i/d) - \Bigl(\frac{m}{n+1}-2m\eta\Bigr)n \le m - \frac{mn}{n+1} + 2mn\eta = \frac{m}{n+1}+2mn\eta \]
for every i0 , hence J is outside of the complementary range defined above.
It remains to verify condition (7.26). By (7.23) on page 205 and (7.25) this is
verified as soon as $m$ is so large that
\[ r(n+1)|S|\,e^{-(n+1)(n+2)\eta^2 m/4} \le \frac{1}{2}, \]
which is our initial assumption in the lemma.
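To get a feeling for the size of $m$ that this condition forces, one can solve the inequality for $m$. The sketch below assumes the inequality in the form $r(n+1)|S|\,e^{-(n+1)(n+2)\eta^2 m/4} \le 1/2$. The function names and the sample values are illustrative, and `s` stands for $|S|$.

```python
from math import ceil, exp, log

def lhs(m, n, r, s, eta):
    """Left-hand side r(n+1)|S| exp(-(n+1)(n+2) eta^2 m / 4) of the condition."""
    return r * (n + 1) * s * exp(-(n + 1) * (n + 2) * eta ** 2 * m / 4)

def minimal_m(n, r, s, eta):
    """Smallest integer m for which the left-hand side is at most 1/2."""
    return ceil(4 * log(2 * r * (n + 1) * s) / ((n + 1) * (n + 2) * eta ** 2))

# Sample values: n = 2, r = [F:K] = 1, |S| = 1, eta = 0.1.
m = minimal_m(n=2, r=1, s=1, eta=0.1)
print(m, lhs(m, 2, 1, 1, 0.1), lhs(m - 1, 2, 1, 1, 0.1))
```

The required $m$ grows like $\eta^{-2}$, so small values of $\eta$ force many blocks of variables.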
7.5.16. Step II: A generalization of Roth’s lemma.
In the higher-dimensional setting of the subspace theorem, the non-vanishing of
the auxiliary polynomial cannot be obtained at a single point (x1 , . . . , xm ) and
has to be replaced by the non-vanishing on a product $V_1\times\cdots\times V_m$, where the factors
$V_h$ are $K$-vector subspaces of $K^{n+1}$, all of dimension $n$. Accordingly, the notion
of index of a multihomogeneous polynomial has to be changed as follows.
Let M1 , . . . , Mm ∈ Q[x0 , . . . , xn ] be non-zero linear forms, let M =
(M1 , . . . , Mm ), and let d = (d1 , . . . , dm ) be an m -tuple of positive numbers.
Definition 7.5.17. Denote by $I(t; d; M)$ the ideal in $Q[x_1,\dots,x_m]$ generated
by all monomials $M_1(x_1)^{j_1}\cdots M_m(x_m)^{j_m}$ with
\[ (j/d) \ge t. \]
Then
\[ \operatorname{ind}(P; d; M) \le 2m\sigma^{1/2^{m-1}}. \]
Proof: The idea is to specialize, in each group of variables xh , all variables
except two to 0 (which we may relabel as (xh0 , xh1 )) and apply Roth’s lemma
from 6.3.7. However, we must make sure that the specialized polynomial is not
identically 0 and also we must be able to compare index and heights before and
after specialization.
We may assume n ≥ 2 since if n = 1 this is simply Roth’s lemma in a homoge-
neous setting. Let F be a number field containing the coefficients of P and M.
Let $b = (b_0,\dots,b_n) \in F^{n+1}\setminus\{0\}$ and suppose for simplicity that $b_0 \ne 0$. Then
\[ h(b) \le \sum_{v\in M_F}\sum_{i=1}^{n}\log(\max(|b_0|_v,|b_i|_v)) \le n\max_{i=1,\dots,n} h((b_0,b_i)). \]
Thus, after relabeling the variables $(x_{j0},\dots,x_{jn})$ we may assume that the linear
forms
\[ M_j(x_j) = b_{j0}x_{j0}+\cdots+b_{jn}x_{jn} \]
specialize under $x_{ji} = 0$ ($i = 2,\dots,n$) to
\[ \widetilde{M}_j(x_j) = b_{j0}x_{j0}+b_{j1}x_{j1} \]
with
\[ h(M_j) \ge h(\widetilde{M}_j) \ge \frac{1}{n}\,h(M_j). \]
Since $b_{j0} \ne 0$, we can write uniquely
\[ P = \sum_j c(j)\,M_1(x_1)^{j_1}\cdots M_m(x_m)^{j_m}\,q_j(x_1,\dots,x_m), \tag{7.27} \]
where the coefficients qj are polynomials in the variables xj = (xj1 , . . . , xjm ).
This makes it plain that in computing the index of P we may restrict ourselves to
decompositions as in (7.27).
After removing from $P$ the highest factor $x_{12}^k$ dividing it, we specialize $x_{12} = 0$,
obtaining a new polynomial $P^*$, not identically 0. Since the coefficients of $P^*$
are a subset of the set of coefficients of $P$, we certainly have
\[ h(P^*) \le h(P). \]
Moreover, by the uniqueness of the decomposition (7.27), if $x_{12}^k$ divides $P$, then
every $q_j$ in (7.27) is divisible by $x_{12}^k$, and it follows that
\[ \operatorname{ind}(P^*; M|_{x_{12}=0}; d) = \operatorname{ind}(P; M; d). \]
Proceeding step-by-step in this way, we eventually arrive at a multihomogeneous
polynomial $\widetilde{P}(\widetilde{x}_1,\dots,\widetilde{x}_m)$, in the variables $\widetilde{x}_j = (x_{j0}, x_{j1})$, not identically 0,
with multidegree $\widetilde{r} \le d$ componentwise, such that
\[ h(\widetilde{P}) \le h(P), \qquad \operatorname{ind}(\widetilde{P}; \widetilde{M}; d) = \operatorname{ind}(P; M; d), \qquad h(\widetilde{M}_j) \ge h(M_j)/n \]
for $j = 1,\dots,m$.
We apply Roth’s lemma in 6.3.7 to the polynomial $\widetilde{P}$, with $\widetilde{M}$ in place of $\xi$ (this
is due to the fact that we work here in a homogeneous setting). Since $d_j h(\widetilde{M}_j) \ge
n^{-1} d_j h(M_j)$, and $h(\widetilde{P}) \le h(P)$, condition (b) of Roth’s lemma is verified as
soon as
\[ \min_j d_j h(M_j) \ge n\sigma^{-1}(h(P) + 4md_1). \]
Therefore
\[ \operatorname{ind}(P; M; d) = \operatorname{ind}(\widetilde{P}; \widetilde{M}; d) \le 2m\sigma^{1/2^{m-1}}. \]
7.5.20. Step III: The height of V (Q).
As in Section 2.8, we define the height of a non-zero subspace $V$ of $\overline{\mathbb{Q}}^{n+1}$ by $h(V) :=
h(b_1\wedge\cdots\wedge b_k)$, where $b_1,\dots,b_k$ is a basis of $V$. In the Roth case $n = 1$ we
have dim(V (Q)) = 1 and V (Q) is generated by x , hence
h(V (Q)) = h(x) = log(Q).
The goal of this key step is to show that a result of comparable strength still holds
if n ≥ 2.
Lemma 7.5.21. Let $\operatorname{rank}(\Pi(Q)) = n$, let $c_{\max} = \max_{vi} c_{vi}$, and suppose that
$\log(Q) \ge C_4\varepsilon^{-1}$. There is a linear space $W$, independent of $\Pi(Q)$ and $\varepsilon$, such
that either $V(Q) = W$ or
\[ (4r|S|)^{-1}\varepsilon\log(Q) - C_5 \le h(V(Q)) \le n\,c_{\max}|S|\log(Q) + C_6. \]
Proof: Let y(1) , . . . , y(n) be a basis of V (Q) with ι(y(i) ) ∈ Π(Q). Then V (Q)
is given by an equation
Hence
\[ h(V(Q)) = h(y^{(1)}\wedge\cdots\wedge y^{(n)}) \le \sum_{i=1}^{n} h(y^{(i)}) + C_6. \tag{7.29} \]
We also have for $v \in S$, using $\iota(y^{(i)}) \in \Pi(Q)$ and that the forms $L_{vi}$ ($i =
0,\dots,n$) are linearly independent
\[ |y^{(i)}|_{v,K} \ll |L_v(y^{(i)})|_{v,K} \le \max_j Q^{c_{vj}} \le Q^{c_{\max}}, \]
while $|y^{(i)}|_v \le 1$ for $v \notin S$. In view of (7.29), we get the upper bound for
$h(V(Q))$.
The proof of the lower bound is more intricate. For the linear forms
\[ \widehat{L}_{vk} = (L_{v0}\wedge\cdots\wedge L_{v,k-1}\wedge L_{v,k+1}\wedge\cdots\wedge L_{vn})^* \qquad (k = 0,\dots,n) \]
we set
\[ D_{vk} = \widehat{L}_{vk}\bigl((y^{(1)}\wedge\cdots\wedge y^{(n)})^*\bigr). \]
Since the ∗-operator is an isometry and by Laplace’s identity (7.16) on page 198
we have
Therefore, expanding the determinant and using $|L_{vi}(y^{(j)})|_{v,K} \le Q^{c_{vi}}$, we find
\begin{align*}
|D_{vk}|_{v,K} &\le \max(1,|n!|_v)\max_\pi\prod_{i\ne k}|L_{vi}(y^{(\pi(i))})|_{v,K} \\
&\le \max(1,|n!|_v)\,Q^{-c_{vk}+\sum_i c_{vi}}, \tag{7.30}
\end{align*}
where $\pi$ ranges over all bijective mappings $\pi : \{0,\dots,n\}\setminus\{k\} \to \{1,\dots,n\}$.
Now suppose for the time being that there is an $|S|$-tuple $\{i_v\}$ such that
\[ \sum_{v\in S} c_{v i_v} \ge -\frac{\varepsilon}{4}, \qquad D_{v i_v} \ne 0 \quad (v \in S). \tag{7.31} \]
Then, by the product formula, (7.34), (7.35), and the fundamental inequality (1.8)
on page 20, we find
1= |x · w|v = |x · w|v |x · w|v ≤ |x · w|v |w|v
v∈MK v∈S v ∈S
/ v∈S v ∈S
/
−ε/4
Q v ∈S cv j v
Q .
Therefore, $\log(Q) \ll \varepsilon^{-1}$, contradicting, for large $C_4$, the hypothesis $\log(Q) \ge
C_4\varepsilon^{-1}$ of the lemma.
7.5.22. Step IV: Application of the generalized Roth’s lemma.
Here we combine Lemma 7.5.15 and the generalized Roth’s lemma from 7.5.19,
obtaining a polynomial vanishing to high order at Lv , v ∈ S , but not identically
0 when restricted to V1 × · · · × Vm , where the Vh s are suitable linear subspaces
of K n+1 .
We choose parameters as follows
\[ m \ge \frac{4}{(n+1)(n+2)\eta^2}\log(2(n+1)[F:K]\,|S|), \qquad \sigma = (\eta/4)^{2^{m-1}}, \qquad \eta \le \frac{1}{n+1}. \]
n+1
We prove Theorem 7.5.13 by contradiction. If it is false, then there is a sequence $Q_\nu \ge 1$
with $\operatorname{rank}(\Pi(Q_\nu)) = n$ and $V(Q_\nu) \ne V(Q_\mu)$ for $\mu \ne \nu$. We may assume
that V (Qν ) omits the exceptional subspace W from Lemma 7.5.21. Going to a
subsequence (again denoted by Qν ), we may assume that log(Qν ) → ∞ at an
arbitrarily fast rate. Indeed, if this were not the case, then V (Qν ) would have
bounded height and, by Northcott’s theorem in 2.4.9, there would be only finitely
many spaces V (Qν ). Hence we may and shall assume that
\[ \log(Q_1) \ge C, \qquad \log(Q_{j+1}) \ge 2\sigma^{-1}\log(Q_j) \tag{7.36} \]
for every j and any given constant C , which may depend on the given parameters.
Since rank(Π(Qν )) = n , the vector space V (Qν ) is defined by a single equation
bν0 x0 + · · · + bνn xn = 0
and we denote by Mν the associated linear form, thus (7.28) on page 209 shows
h(V (Qj )) = h(Mj )
for every j . Now we take dj = D/ log Qj (j = 1, . . . , m) and apply Lemma
7.5.15. This gives us, for large D , a certain non-zero polynomial P of multidegree
d. We claim that P and M = (M1 , . . . , Mm ) satisfy the hypotheses of the
generalized Roth lemma in 7.5.19.
In order to verify (a) and (b) of Lemma 7.5.19 we appeal to Lemma 7.5.21.
Condition (a), in view of our choice of the weights dj , follows from (7.36) for D
sufficiently large, which we will assume from now on.
For condition (b), we note first that, if $\log(Q_1) \ge C_4\varepsilon^{-1}$, we have by Lemma
7.5.21 the estimate
\begin{align*}
d_j h(M_j) &= d_j h(V(Q_j)) \\
&\ge \frac{D}{\log(Q_j)}\bigl((4r|S|)^{-1}\varepsilon\log(Q_j) - C_5\bigr) \ge C_8\,\varepsilon D.
\end{align*}
On the other hand, the bound for $h(P)$ from Lemma 7.5.15 and the just verified
$d_{j+1} \le \sigma d_j$ yield
\begin{align*}
n\sigma^{-1}(h(P)+4md_1) &\le n\sigma^{-1}(C_2+4m)(D/\log(Q_1))(1+\sigma+\cdots+\sigma^{m-1}) \\
&\ll \sigma^{-1}mD/\log(Q_1),
\end{align*}
which is negligible with respect to $\varepsilon D$ if $\log(Q_1) \ge C_9(\varepsilon\sigma)^{-1}m$ with $C_9$ large
enough. This proves that condition (b) is satisfied for large $\log(Q_1)$.
Therefore, Lemma 7.5.19 yields
ind(P ; d; M) ≤ mη/2.
It follows that there is I = (i1 , . . . , im ) with
(I/d) ≤ mη/2
such that ∂I P does not vanish identically on the product space
V (Q1 ) × · · · × V (Qm ).
7.5.23. Step V: Non-vanishing at a small point of V (Q1 ) × · · · × V (Qm ).
What we really want is the non-vanishing of a derivative of P at a point Y =
(y1 , . . . , ym ) with yh of small height, say comparable with Qh . The next easy
lemma allows us to do so from the information we have gathered so far.
Lemma 7.5.24. Let $k$ be a field of characteristic 0 and $x_1,\dots,x_N$ algebraically
independent over $k$. Let $f(x_1,\dots,x_N) \in k[x_1,\dots,x_N]$ be a non-zero polynomial
of degree at most $e_j$ in $x_j$, and let $B > 0$. Then there are rational integers
$z_j$ and $i_j$ ($j = 1,\dots,N$), with
\[ |z_j| \le B, \qquad 0 \le i_j \le e_j/B \]
for $j = 1,\dots,N$, such that for $i = (i_1,\dots,i_N)$ we have
\[ \partial_i f(z_1,\dots,z_N) \ne 0. \]
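For small cases the statement of Lemma 7.5.24 can be explored by brute force. The sketch below is an illustration under simplifying assumptions: polynomials have integer coefficients and are stored as dictionaries mapping exponent tuples to coefficients, $B$ is a positive integer, and the helper names are hypothetical. It searches for a pair $(z, i)$ in the range allowed by the lemma at which the normalized derivative $\partial_i f$ does not vanish.

```python
from itertools import product
from math import comb

def divided_derivative(poly, i):
    """Apply the normalized operator (1/i!) d^i/dx^i to poly = {exponents: coeff}."""
    out = {}
    for exps, c in poly.items():
        if all(e >= k for e, k in zip(exps, i)):
            coeff = c
            for e, k in zip(exps, i):
                coeff *= comb(e, k)          # (1/k!) d^k/dx^k of x^e is C(e,k) x^(e-k)
            new = tuple(e - k for e, k in zip(exps, i))
            out[new] = out.get(new, 0) + coeff
    return out

def evaluate(poly, z):
    """Evaluate the polynomial at the integer point z."""
    total = 0
    for exps, c in poly.items():
        term = c
        for e, zj in zip(exps, z):
            term *= zj ** e
        total += term
    return total

def find_witness(poly, degrees, B):
    """Search |z_j| <= B and 0 <= i_j <= e_j/B for a non-vanishing derivative."""
    orders = [range(e // B + 1) for e in degrees]
    for i in product(*orders):
        d = divided_derivative(poly, i)
        for z in product(range(-B, B + 1), repeat=len(degrees)):
            if evaluate(d, z) != 0:
                return z, i
    return None

# f = (x^3 - x)(y^3 - y) vanishes at every point with coordinates in {-1, 0, 1}.
f = {(3, 3): 1, (3, 1): -1, (1, 3): -1, (1, 1): 1}
print(find_witness(f, degrees=(3, 3), B=1))
```

Here $f$ itself vanishes at all nine admissible points, so a derivative is genuinely needed. The search returns the point $(-1,-1)$ with $i = (1,1)$, where $\partial_i f = (3x^2-1)(3y^2-1)$ takes the value 4.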
The general idea in applying this lemma is to consider many points which are linear
combinations of a basis y(i) with small coefficients and show that the auxiliary
polynomial we have constructed cannot vanish at every such point. However, we
must be sure that the height of the polynomial evaluated at the point does not
increase by more than $O(|d|)$. This means that the size $B$ of the coefficients must
be kept bounded and then we would have too few points at our disposal to meet
our goal. On the other hand, applying a differential operator ∂I only increases
the height by O(|d|), and ∂i f cannot vanish at a point for every i unless f is
identically 0. Thus, if we vary not only the choice of the point but also vary the
polynomial by applying a differential operator, we can prove what we want without
increasing the height too much in the process.
The details are as follows. Let $y_h^{(1)},\dots,y_h^{(n)} \in K^{n+1}$ be linearly independent
points with $\iota(y_h^{(l)}) \in \Pi(Q_h)$, which is possible because $\Pi(Q_h)$ has rank $n$. We
write $z_h = (z_{h1},\dots,z_{hn})$, $Z = (z_1,\dots,z_m)$, and $Y_h = (y_h^{(1)},\dots,y_h^{(n)})$. Then
the polynomial
R(Z) := (∂I P )(z1 · Y1 , . . . , zm · Ym )
is not identically 0 because ∂I P does not vanish identically on V (Q1 ) × · · · ×
V (Qm ). Clearly, R(Z) has degree at most dh in the block of variables zh . Hence
by the previous lemma there are a point $Z' \in \mathbb{Z}^{mn}$, and rational integers $j_{hl}$
($h = 1,\dots,m$, $l = 1,\dots,n$), such that
\[ |z'_{hl}| \le B, \qquad 0 \le j_{hl} \le d_h/B \]
and
\[ (\partial_J R)(Z') \ne 0. \]
Since $\partial_J R$ is a linear combination of derivatives $\partial_I P$ evaluated at the point
\[ X' = (x'_1,\dots,x'_m) := \Bigl(\sum_{l=1}^{n} z'_{1l}\,y_1^{(l)},\dots,\sum_{l=1}^{n} z'_{ml}\,y_m^{(l)}\Bigr), \]
Let $\sigma = (\eta/4)^{2^{m-1}}$ and let $Q_h$ ($h = 1,\dots,m$) be such that $\operatorname{rank}(\Pi(Q_h)) = n$
and, for a certain constant $C_{10}$,
\[ \log(Q_1) \ge C_{10}(\varepsilon\sigma)^{-1}m, \qquad \log(Q_{h+1}) \ge 2\sigma^{-1}\log(Q_h) \quad (h = 1,\dots,m-1). \]
For $h = 1,\dots,m$, let $y_h^{(l)}$ ($l = 1,\dots,n$) be a basis of $V(Q_h)$ such that
$\iota(y_h^{(l)}) \in \Pi(Q_h)$. Then there are a non-zero multihomogeneous polynomial $T(X) \in
K[X]$, of multidegree majorized by $d$ componentwise, and rational integers $z'_{hl}$
with
\[ |z'_{hl}| \le 2n/\eta \]
with the following properties. Let $x'_h = \sum_{l=1}^{n} z'_{hl}\,y_h^{(l)}$. Then:
(a) $T(X') \ne 0$.
(b) $h(T) \ll d_1$.
(c) If $v \in S$ and $T(X) = \sum_J a(L_v;J)\,L_v(x_1)^{j_1}\cdots L_v(x_m)^{j_m}$, then
$a(L_v;J) = 0$ unless $\bigl|(J_i/d) - \frac{m}{n+1}\bigr| \le 2nm\eta$ for every $i$.
(d) $h((a(L_v;J))) \ll d_1$ for every $v \in S$.
Proof: Statement (a) follows from the construction of $X'$. Also, (b) and (d) follow
from $h(P) \ll |d|$ (see Lemma 7.5.15) and $|d| \ll d_1$. Finally, we note that
$a(L_v;J)$ is non-zero if and only if $a(L_v;J;I')$ is non-zero. Hence (c) follows
from (7.37) and Lemma 7.5.15.
7.5.26. Step VI: Conclusion of the proof of Theorem 7.5.13.
The proof of Theorem 7.5.13 is now easy and follows the blueprint of the proof of
Roth’s theorem. On the one hand, we have by the product formula and $T(X') \ne 0$
that
\[ \sum_{v\in M_K}\log|T(X')|_v = 0. \]
On the other hand, $|x'_h|_v \le 1$ for $v \notin S$ and hence
\[ \sum_{v\in M_K}\log|T(X')|_v \le \sum_{v\in S}\log|T(X')|_v + \sum_{v\notin S}\log|T|_v. \]
and hence the last three displayed formulas and Lemma 7.5.25 lead to
\[ 0 \le \sum_{v\in S}\log\max_{a(L_v;J)\ne 0}\bigl|L_v(x'_1)^{j_1}\cdots L_v(x'_m)^{j_m}\bigr|_{v,K} + O(d_1). \]
\[ = \sum_{i=0}^{n} c_{vi}\,\frac{m}{n+1}\,D + O(m\eta D) + O(\log(1/\eta)\,D/\log(Q_1)). \]
Therefore, putting together the last two sequences of inequalities and recalling the
definition of ε at the beginning of the proof of Theorem 7.5.13, we deduce
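\begin{align*}
0 &\le \sum_{v\in S}\sum_{i=0}^{n} c_{vi}\,\frac{m}{n+1}\,D + O(m\eta D) + O(\log(1/\eta)\,D/\log(Q_1)) \\
&\le -\frac{\varepsilon/2}{n+1}\,mD + O(m\eta D) + O(\log(1/\eta)\,D/\log(Q_1)).
\end{align*}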
Hence, dividing by $mD$, we conclude that
\[ 0 \le -\frac{\varepsilon/2}{n+1} + O(\eta) + O\Bigl(\frac{\log(1/\eta)}{m\log(Q_1)}\Bigr). \]
Since we are allowed to take Q1 arbitrarily large, we get 0 ≤ −ε/(2n+2)+O(η),
a contradiction if η is positive and small. This completes the proof of Theorem
7.5.13.
7.5.27. Step VII: A general strategy and Evertse’s lemma.
In order to provide some motivation for what follows, we begin by describing in a
special case Schmidt’s original strategy for the proof of the subspace theorem.
Consider the simplest case, namely n = 2, K = Q , S = {∞} and three linear
forms L1 , L2 , L3 . Let λ1 , λ2 , λ3 be the three successive minima of Π(Qν ). If
λ1 ≤ λ2 ≤ 1, the rank is 2 and Theorem 7.5.13 can be applied. If instead
λ1 ≤ 1 < λ2 ≤ λ3 , the rank is 1. By Minkowski’s second theorem, we have
\[ \beta^3(\Pi(Q_\nu))^{-1} \ll \lambda_1\lambda_2\lambda_3 \ll \beta^3(\Pi(Q_\nu))^{-1}. \]
The constant C depends only on K , S , and the set of linear forms Lvi .
Proof: By induction on n, the case n = 0 being trivial. Now suppose n ≥ 1 and
that the lemma holds for n − 1 in place of n .
Let us fix v ∈ S and let V be the K -vector subspace of V with basis x(i)
(i = 1, . . . , n). Then the linear system
\[ \sum_{k=0}^{n} L_{vk}(x^{(j)})\,\alpha_{vk} = 0, \qquad (j = 1,\dots,n) \]
The restrictions of the forms Lvi to V yield a set of linear forms of rank n and
the restriction of Lv,πv (n+1) to V is linearly dependent on the restrictions of the
remaining linear forms. Hence the restrictions to V of the linear forms Lvk ,
with γvj ∈ Kv .
Now note that $O_{S,K}$ is a lattice in $\prod_{v\in S} K_v$, meaning that it is a discrete subgroup
with compact quotient. This follows as in the proof of Proposition C.2.6, which is
the special case where $S$ is the set of archimedean places. Hence the lattice has
a bounded fundamental domain and thus there is a vector $(\xi_{n+1,1},\dots,\xi_{n+1,n}) \in
O_{S,K}^{n}$ such that for $v \in S$ we have
\[ |\xi_{n+1,j}+\gamma_{vj}|_v \le A_v, \]
with $A_v$ bounded only in terms of $K$ and $S$. Moreover, we may assume that
$A_v = 1$ for non-archimedean $v$ by the following simple argument. There is a non-zero
$m \in O_{S,K}$, depending only on $K$ and $S$, such that $|m|_v \le \min(1, A_v^{-1})$ for
$v \nmid \infty$ ($v \in S$) (for example, a sufficiently high power of the product of the rational
primes $p$ with $v|p$). Then, applying the argument to $m^{-1}\gamma_{vi}$ and multiplying by
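we infer from (7.40) and the inductive step (7.39) the estimate
\begin{align*}
\bigl|L_{v,\pi_v(i)}(v^{(n+1)})\bigr|_v &= \Bigl|L_{v,\pi_v(i)}(x^{(n+1)}) + \sum_{j=1}^{n}\xi_{n+1,j}\,L_{v,\pi_v(i)}(v^{(j)})\Bigr|_v \\
&= \Bigl|\sum_{j=1}^{n}(\gamma_{vj}+\xi_{n+1,j})\,L_{v,\pi_v(i)}(v^{(j)})\Bigr|_v \\
&\le \begin{cases} |n|_v A_v C\mu_{vi} & \text{if $v$ is archimedean,} \\ \mu_{vi} & \text{if $v$ is not archimedean,} \end{cases}
\end{align*}
for $i = 1,\dots,n$.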
7.5.30. Step VIII: Application of the Grassmann algebra.
Let Π(Q) be an approximation domain associated to the forms Lvi and parame-
ters cvi as in Step 0, and suppose that R := rank(Π(Q)) is such that 1 ≤ R ≤ n .
Then $R$ is determined by $\lambda_R \le 1 < \lambda_{R+1}$ and, as already noted in the proof of
Lemma 7.5.12, we have
\[ \lambda_{n+1} \gg Q^{\varepsilon/(2n+2)}. \]
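Now we define $k$ to be the smallest integer in the interval $[R, n]$ such that the
quotient $\lambda_k/\lambda_{k+1}$ is minimal. Since
\[ \frac{\lambda_k}{\lambda_{k+1}} \le \Bigl(\prod_{j=R}^{n}\frac{\lambda_j}{\lambda_{j+1}}\Bigr)^{\frac{1}{n+1-R}} = \Bigl(\frac{\lambda_R}{\lambda_{n+1}}\Bigr)^{\frac{1}{n+1-R}} \le \lambda_{n+1}^{-\frac{1}{n+1-R}}, \]
we have
\[ \frac{\lambda_k}{\lambda_{k+1}} \ll Q^{-\varepsilon/\{2(n+1)n\}}. \tag{7.41} \]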
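The choice of $k$ and the geometric-mean inequality behind (7.41) can be illustrated with made-up numbers. In the sketch below the list `minima` is a hypothetical set of successive minima $\lambda_1 \le \cdots \le \lambda_{n+1}$ (not derived from any actual approximation domain), and `choose_k` is an illustrative helper name.

```python
def choose_k(minima, R):
    """Smallest k in [R, n] minimizing lambda_k / lambda_{k+1}; minima = [lambda_1, ..., lambda_{n+1}]."""
    n = len(minima) - 1
    ratios = {k: minima[k - 1] / minima[k] for k in range(R, n + 1)}
    best = min(ratios.values())
    k = min(k for k, q in ratios.items() if q == best)
    return k, best

# Hypothetical successive minima with n = 4 and rank R = 2 (lambda_2 <= 1 < lambda_3).
minima = [0.5, 0.9, 3.0, 48.0, 50.0]
R = 2
n = len(minima) - 1

k, ratio = choose_k(minima, R)
# The minimal quotient is at most the geometric mean of the n+1-R quotients ...
geometric_mean = (minima[R - 1] / minima[n]) ** (1 / (n + 1 - R))
# ... which is at most lambda_{n+1}^(-1/(n+1-R)) because lambda_R <= 1.
crude_bound = minima[n] ** (-1 / (n + 1 - R))
print(k, ratio, geometric_mean, crude_bound)
```

The gap between $\lambda_k$ and $\lambda_{k+1}$ found this way is what the passage to the exterior power of order $n+1-k$ exploits.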
As usual, we write
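\[ \varepsilon_v = \begin{cases} [K_v:\mathbb{R}]/[K:\mathbb{Q}] & \text{if $v$ is archimedean,} \\ 0 & \text{otherwise.} \end{cases} \]
Then we define
\[ \mu_{vj} = \lambda_j^{\varepsilon_v} \qquad (j = 1,\dots,n+1). \]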
Since $\sum_{v|\infty}\varepsilon_v = 1$ (Corollary 1.3.2), we have
\[ \prod_{v|\infty}\mu_{vj} = \lambda_j, \qquad \mu_{vj} = 1 \ \text{ if } v \nmid \infty,\ v \in S. \tag{7.42} \]
(i = 1, . . . , n + 1) such that
\[ v^{(1)} = x^{(1)}, \qquad v^{(i)} = \sum_{j=1}^{i-1}\xi_{ij}\,x^{(j)} + x^{(i)} \]
for i, j = 1, . . . , n + 1.
Now we pass to the Grassmann algebra of order n + 1 − k , where k , as defined
at the beginning of this article, verifies (7.41). We abbreviate
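\begin{align*}
i &= (i_1,\dots,i_{n+1-k}), \quad\text{where } \{i_1 < i_2 < \cdots < i_{n+1-k}\}, \\
L_{v,\pi_v(i)} &= L_{v,\pi_v(i_1)}\wedge L_{v,\pi_v(i_2)}\wedge\cdots\wedge L_{v,\pi_v(i_{n+1-k})}, \\
v^{(i)} &= v^{(i_1)}\wedge v^{(i_2)}\wedge\cdots\wedge v^{(i_{n+1-k})}, \\
c_{v,\pi_v(i)} &= c_{v,\pi_v(i_1)}+c_{v,\pi_v(i_2)}+\cdots+c_{v,\pi_v(i_{n+1-k})}, \\
\lambda_i &= \prod_{\nu=1}^{n+1-k}\lambda_{i_\nu}, \qquad \mu_{vi} = \prod_{\nu=1}^{n+1-k}\mu_{v,i_\nu}.
\end{align*}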
The linear forms Lv,πv (i) are linearly independent and so are the points v(i) .
By (7.43) and the Laplace identity (7.16) on page 198, it is immediate that, for
some constant C12 , we have for every i and j and v ∈ S
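\[ \bigl|L_{v,\pi_v(i)}(v^{(j)})\bigr|_{v,K} \le \begin{cases} C_{12}\,\mu_{vi}\,Q^{c_{v,\pi_v(i)}} & \text{if } v|\infty, \\ Q^{c_{v,\pi_v(i)}} & \text{if } v \nmid \infty. \end{cases} \tag{7.44} \]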
Moreover, if $i = (k+1, k+2,\dots,n+1)$ but $j \ne (k+1, k+2,\dots,n+1)$
Evertse’s lemma shows that we can do a little better, namely
\[ \bigl|L_{v,\pi_v(i)}(v^{(j)})\bigr|_{v,K} \le C_{12}\cdot(\mu_{vk}/\mu_{v,k+1})\cdot\mu_{vi}\,Q^{c_{v,\pi_v(i)}} \quad \text{if } v|\infty, \tag{7.45} \]
because in this case $j_1 \le k$, hence
\[ \mu_{vj} \le \mu_{vk}\,\mu_{v,k+2}\cdots\mu_{v,n+1} = (\mu_{vk}/\mu_{v,k+1})\cdot\mu_{vi}. \]
Now we can prove:
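Lemma 7.5.31. Let $S(Q)$ be the symmetric convex domain in $\wedge^{n+1-k}(K_{\mathbb{A}}^{n+1})$
defined in obvious notation by
\begin{align*}
|L_{v,\pi_v(i)}(X)|_{v,K} &\le C_{12}\,\mu_{vi}\,Q^{c_{v,\pi_v(i)}} && v|\infty,\ i \ne (k+1,\dots,n+1), \\
|L_{v,\pi_v(i)}(X)|_{v,K} &\le C_{12}\cdot(\mu_{vk}/\mu_{v,k+1})\cdot\mu_{vi}\,Q^{c_{v,\pi_v(i)}} && v|\infty,\ i = (k+1,\dots,n+1), \\
|L_{v,\pi_v(i)}(X)|_{v,K} &\le Q^{c_{v,\pi_v(i)}} && v \in S,\ v \nmid \infty, \\
|X|_v &\le 1 && v \notin S.
\end{align*}
Let $\lambda_j(S(Q))$ be the successive minima of $S(Q)$. Then we have
\begin{align*}
\lambda_j(S(Q)) &\le 1 && \text{if } j < \binom{n+1}{k}, \\
\lambda_j(S(Q)) &> C_{13}\,Q^{\varepsilon/\{2n(n+1)\}} && \text{if } j = \binom{n+1}{k}.
\end{align*}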
Proof: The first part of the thesis of the lemma (namely, for $j < \binom{n+1}{k}$) is obvious
from (7.44) and (7.45), since they provide independent points in $S(Q)$. For the
second part, we apply Minkowski’s second theorem, which gives
\[ 1 \ll \Bigl(\prod_{j=1}^{\binom{n+1}{k}}\lambda_j(S(Q))\Bigr)\,\beta^{\binom{n+1}{k}}(S(Q))^{1/d} \ll 1, \tag{7.46} \]
By Minkowski’s second theorem, Lemma 7.5.7, and (7.41) on page 220 this may
be bounded by
\[ \frac{\lambda_k}{\lambda_{k+1}}\Bigl(\lambda_1\cdots\lambda_{n+1}\,\beta^{n+1}(\Pi(Q))^{1/d}\Bigr)^{\binom{n}{k}} \ll \frac{\lambda_k}{\lambda_{k+1}} \ll Q^{-\varepsilon/\{2n(n+1)\}}. \]
We have already noticed that $\lambda_j(S(Q)) \le 1$ if $j < \binom{n+1}{k}$, hence the second part
of the thesis follows from the left-hand side of (7.46).
7.5.32. Step IX: Proof of the subspace theorem.
We apply Theorem 7.5.13 as follows. Let $(Q_\nu)$ be an unbounded family such that
we have approximation domains Π(Qν ) as in Step 0 with rank(Π(Qν )) = R ,
where 1 ≤ R < n (the case R = n being already covered by Theorem 7.5.13).
By going to a subfamily, we may assume that the parameter k defined at the be-
ginning of Step VIII (see 7.5.30) is constant along this family.
With µvi as in Step VIII (relative now to Q = Qν ), we note that there is a constant
C14 > 0 such that
\[ Q_\nu^{-C_{14}} \le \mu_{vi} \le Q_\nu^{C_{14}} \]
for every $v|\infty$ and every $i$ (of course, $\mu_{vi} = 1$ if $v \nmid \infty$). The argument is the same
as for the proof of Corollary 7.5.5. Clearly, it suffices to verify the corresponding
result for the successive minima of $\Pi(Q_\nu)$. By 7.5.6 and Lemma 7.5.7, we have
\[ \beta^{n+1}(\Pi(Q_\nu)) \gg Q_\nu^{d\sum_{vi}c_{vi}} \ge Q_\nu^{-2rd(n+1)|S|} \]
and hence, by Minkowski’s second theorem, it suffices to obtain a lower bound for
the first minimum. Since the linear forms Lv0 , . . . , Lvn are linearly independent,
we have for every v ∈ S and x ∈ K n+1 \ {0} with ι(x) ∈ Π(Qν )
\[ \log|x|_v \le \max_i\log|L_{vi}(x)|_{v,K} + C_{15} \le c_{vi}\log(Q_\nu) + C_{15} \]
for some constant $C_{15}$. Since $c_{vi} \le 2r$, from this it follows $h_{\mathrm{aff}}(x) \ll \log(Q_\nu)$
and
\[ -C_{16}\log(Q_\nu) \le \log\max_i|L_{vi}(x)|_{v,K} \]
for large Qν (use (1.8) on page 20). If we apply this with x such that ι(x) deter-
mines the first successive minimum of Π(Qν ) and with v|∞, then the right-hand
side is bounded by log |λ1 |v + cvi log(Qν ), proving what we want.
Once this observation has been made, we proceed as we did in defining approxi-
mation classes and, going once more to a subfamily, we may assume that, given
any small positive number γ > 0, there are bounded real numbers dvi (v|∞)
such that
\[ -\gamma\varepsilon_v + d_{vi} < \log(C_{12}\mu_{vi})/\log(Q_\nu) \le d_{vi} \qquad \text{if } i \ne (k+1,\dots,n+1) \]
and
\[ -\gamma\varepsilon_v + d_{vi} < \log(C_{12}(\mu_{vk}/\mu_{v,k+1})\mu_{vi})/\log(Q_\nu) \le d_{vi} \qquad \text{if } i = (k+1,\dots,n+1). \]
If $v \in S$ and $v \nmid \infty$, we set $d_{vi} = 0$.
If $\Pi_k(Q_\nu)$ is the parallelepiped in $\wedge^{n+1-k}(K_{\mathbb{A}}^{n+1})$ defined by
\begin{align*}
|L_{v,\pi_v(i)}(X)|_{v,K} &\le Q_\nu^{c_{v,\pi_v(i)}+d_{vi}} && (v \in S) \\
|X|_{v,K} &\le 1 && (v \notin S),
\end{align*}
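\begin{align*}
\lambda_j(\Pi_k(Q_\nu)) &\le 1 && \text{if } j < \binom{n+1}{k}, \\
\lambda_j(\Pi_k(Q_\nu)) &> C_{17}\,Q_\nu^{\varepsilon/\{4n(n+1)\}} && \text{if } j = \binom{n+1}{k}.
\end{align*}
Hence
\[ \operatorname{rank}(\Pi_k(Q_\nu)) = \binom{n+1}{k} - 1 \]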
as soon as Qν is sufficiently large, which we may suppose.
We have
1
µvk
1 λk
= log + log λi + O(1) + γ
log(Qν ) λk+1
i
1 λk n
= log + log(λ1 · · · λn+1 ) + O(1) + γ.
log(Qν ) λk+1 k
Therefore, using (7.41) on page 220 and again Minkowski’s second main theorem
together with Lemma 7.5.7, we get
\[ \sum_{v\in S}\sum_i d_{vi} \le -\frac{\varepsilon}{2(n+1)n} - \binom{n}{k}\sum_{vi}c_{vi} + \gamma + \frac{O(1)}{\log(Q_\nu)}. \]
In this section we give a quick review, without proofs, of important progress in this area.
7.6.1. In his landmark paper [114], G. Faltings introduced a completely new approach to
study the index of a multihomogeneous polynomial at a point. We describe now the simplest
version of Faltings’s basic result, the product theorem.
We work in a product
\[ \mathbb{P}_K := \mathbb{P}^{n_1}_K\times\cdots\times\mathbb{P}^{n_m}_K \]
of projective spaces, over an algebraically closed field $K$ of characteristic 0 and with
sections
\[ f \in \Gamma(\mathbb{P}, O_{\mathbb{P}}(d_1,\dots,d_m)) \]
associated to the ample line bundle $O_{\mathbb{P}}(d_1,\dots,d_m)$ (here the degrees $d_1,\dots,d_m$ are
positive integers). Recall that $f$ may be identified with a multihomogeneous polynomial,
homogeneous of degree $d_h$ in the variables $x_h = (x_{h0},\dots,x_{hn_h})$ (see Example A.6.13).
7.6.2. For $x \in \mathbb{P}_K$, we define the index of $f$ in $x$ with respect to the weights $d$ by
\[ \operatorname{ind}(f; d; x) := \min\{(I/d) \mid \partial_I f(x) \ne 0\}, \]
where $I = (i_1,\dots,i_m)$ with $i_h$ ranging over $\mathbb{N}^{n_h}$ for $h = 1,\dots,m$ and where $(I/d)$
and $\partial_I$ are defined as in 7.5.14.
This notion extends the definition of the index in 6.3.2 in the following way. For a polyno-
mial F ∈ K[x1 , . . . , xm ] with partial degrees at most dh , we may consider the multiho-
mogenization
\[ f(x_{10},x_{11};\dots;x_{m0},x_{m1}) := x_{10}^{d_1}\cdots x_{m0}^{d_m}\,F(x_{11}/x_{10},\dots,x_{m1}/x_{m0}) \]
of multidegree $d$. By passing from $x \in \mathbb{A}^m_K$ to the multiprojective space $(\mathbb{P}^1_K)^m$, it is
clear that the index of $F$ in $x$ with respect to the weight $d$ as defined in 6.3.2 is the same
as $\operatorname{ind}(f; d; x)$.
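For a polynomial in $m$ single variables the index can be computed directly from its definition. The sketch below treats the affine situation of 6.3.2 only and makes several simplifying assumptions: integer coefficients, integer points, polynomials stored as dictionaries from exponent tuples to coefficients, and hypothetical helper names. It returns the smallest weight $i_1/d_1 + \cdots + i_m/d_m$ for which the normalized derivative does not vanish at the point.

```python
from fractions import Fraction
from itertools import product
from math import comb

def divided_derivative_at(poly, i, x):
    """Value at x of (1/i!) d^i/dx^i applied to poly = {exponents: coeff}."""
    total = 0
    for exps, c in poly.items():
        if all(e >= k for e, k in zip(exps, i)):
            term = c
            for e, k, xj in zip(exps, i, x):
                term *= comb(e, k) * xj ** (e - k)
            total += term
    return total

def index(poly, d, x):
    """Minimum of i_1/d_1 + ... + i_m/d_m over orders i with non-vanishing derivative at x."""
    best = None
    for i in product(*(range(dh + 1) for dh in d)):
        if divided_derivative_at(poly, i, x) != 0:
            weight = sum(Fraction(ih, dh) for ih, dh in zip(i, d))
            if best is None or weight < best:
                best = weight
    return best

# F = (x1 - 1)^2 (x2 - 2), with partial degrees d = (2, 1).
F = {(2, 1): 1, (2, 0): -2, (1, 1): -2, (1, 0): 4, (0, 1): 1, (0, 0): -2}
d = (2, 1)
print(index(F, d, (1, 2)), index(F, d, (1, 0)), index(F, d, (0, 0)))
```

At $(1,2)$ both factors vanish to full order and the index is 2. At $(1,0)$ only the double zero in $x_1$ contributes and the index is 1. At $(0,0)$ the polynomial does not vanish and the index is 0.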
7.6.3. Let σ ≥ 0 . Faltings’s product theorem gives information on the geometry of the set
Zσ of PK on which ind(f ; d; x) ≥ σ . Since Zσ is the zero set of the multihomogeneous
polynomials ∂I f ((I/d) < σ) , it is a closed subvariety of PK .
(a) d1 > · · · > dm are rapidly decreasing positive integers, namely dh /dh+1 ≥ C
for h = 1, . . . , m − 1 .
(b) f ∈ Γ(PK , OP (d1 , . . . , dm )) \ {0} .
7.7. The Faltings–Wüstholz theorem 227
Then:
We will not prove this important result here, referring to Faltings’s paper [114], to the article
of M. van der Put [303] for a simple proof of (i) and (ii), and to the versions with explicit
good constants in Evertse [106], Ferretti [119], and Rémond [241].
Remark 7.6.5. Part (iii) of the thesis of this theorem is best stated taking for h(Zi ) a more
intrinsic notion of height, rather than the hand-made height through presentations. This is
done by Faltings in [114], by defining the height as an intersection number of arithmetic
cycles in $\mathbb{P}^{n_i}$. Another definition is by taking the height of the Chow point defining Zi ;
this second definition is equivalent to Faltings’s, up to a simple uniformly bounded error
term (see J.-B. Bost, H. Gillet, C. Soulé [45], Sec.4.3).
7.6.6. The product theorem is used as follows. Let N be an integer with N > dim(P) and let
σ > 0 . Assume that ind(f ; d; x) ≥ σ at some x ∈ P . Then there exists a chain
$\mathbb{P} = Z_1 \supset Z_2 \supset \cdots \supset Z_N \ni x$
with $Z_i$ an irreducible component of $Z_{i\sigma/N}$. Since each $Z_i$ is irreducible, the dimension
drops every time we have $Z_i \neq Z_{i+1}$, and it follows from N > dim(P) that $Z_i = Z_{i+1}$
for some i . Taking ε = σ/N , we may apply the product theorem and deduce that $Z_i$ is a
product variety $Z_i = Z_{i1} \times \cdots \times Z_{im}$. Obviously, we must have $Z_{i\mu} \neq \mathbb{P}^{n_\mu}$ for some µ
and the study of the vanishing of f on $Z_i$ is reduced, by projecting to some linear subspace
of $\mathbb{P}^{n_\mu}$, to the study of the vanishing in a multiprojective space with smaller dimension
than that of P . Then we apply induction.
It turns out that this inductive procedure is much more efficient than Roth’s method using
Wronskians, where we lose a square root every time we increase m by 1 . For example,
proving Roth’s lemma using the product theorem allows us to replace $\sigma^{1/2^{m-1}}$ by the much
better $\sigma^{1/m}$ (with minor changes for the other constants, see [106], Th.3). In quantitative
results, this has the effect of replacing doubly exponential bounds by simply exponential
bounds.
on page 173, 6.5.8). Evertse and Schlickewei [111] have obtained a remarkably strong result
of this type, which in view of its strong uniformity with respect to its dependence on the
field K and the set of places S has proved to be a powerful tool in applications (Theorem
7.4.1 is an example).
7.7.1. Let K be a number field with a finite set S of places. For v ∈ S , let Lv0 , . . . , Lvn
be linearly independent linear forms in x0 , . . . , xn with coefficients in K . We assume that
the coefficients of the linear forms Lv0 , . . . , Lvn are contained in a field extension of K
of degree at most D and that HAr (Liv ) ≤ H . We denote by $|\ |_{v,\bar K}$ an extension of $|\ |_v$
to $\bar K$ .
Now we can state the absolute subspace theorem of Evertse and Schlickewei (see [111],
Th.3.1, for explicit constants and a proof).
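Theorem 7.7.2. Let ε > 0 . Then there are proper linear subspaces T1 , . . . , Th of $\mathbb{P}^n_K$
with h bounded in terms of n , |S| , D and ε , with the property that the set of solutions
$x \in \mathbb{P}^n(\bar K)$ of
$\prod_{v\in S} \prod_{i=0}^{n} \max_{\sigma\in\mathrm{Gal}(\bar K/K)} \frac{|L_{vi}(\sigma x)|_{v,\bar K}}{|\sigma x|_{v,\bar K}} \le H(x)^{-n-1-\varepsilon} \prod_{v\in S} |\det(L_{vi})|_{v,\bar K}$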
The set of filtrations so obtained on V is jointly semistable, if for each non-zero proper
subspace W ⊂ V we have µ(W ) ≤ µ(V ) .
Theorem 7.7.4. Assume that the linear forms $L_{wi}$ $(w \in S,\ i \in I_w)$ define a jointly
semistable filtration on V and that µ(V ) > 1 . Then the number of points $x \in \mathbb{P}^n(K)$
with
$\frac{|L_{wi}(x)|_w}{|x|_w} < H(x)^{-c_{wi}} \qquad (w \in S,\ i \in I_w)$
is finite.
7.7.5. A more general theorem where the linear forms Lwi need not be jointly semistable
is then obtained by considering the first non-trivial step W in the Harder–Narasimhan fil-
tration of V . This is the unique subspace W of V characterized by the property that
(µ(W ), dim(W )) is maximal with respect to the lexicographic order.
Let $\mathbb{P}^*(V)$ denote the projective space of one-dimensional quotient spaces of V . The
conclusion now is that if µ(W ) > 1 then there are only finitely many $x \in \mathbb{P}^*(V)(K) \setminus \mathbb{P}^*(V/W)$
such that
$\frac{|L_{wi}(x)|_w}{|x|_w} < H(x)^{-c_{wi}} \qquad (w \in S,\ i \in I_w).$
7.7.6. The Faltings–Wüstholz theorem can be applied to the study of a system of inequalities
$|f_{wi}(x)|_w < H(x)^{-c_{wi}} \qquad (w \in S,\ i \in I_w),$
where now $f_{wi} \in F[x_0, \dots, x_n]$ are homogeneous forms of any degree. One may assume
that they all have the same degree r and then one associates to this the corresponding
linear forms obtained by a Segre embedding $\mathbb{P}^n \to \mathbb{P}^N$ using all monomials of degree r ,
see 1.5.14. Although the results obtained in this way probably are not optimal, they are
usually stronger than those obtained by a straightforward application of Schmidt’s subspace
theorem.
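To make the passage from forms of degree r to linear forms concrete, here is a small Python sketch. It lists all monomials of degree r in x_0, . . . , x_n (there are C(n + r, r) of them, so N = C(n + r, r) − 1), maps a point to the vector of their values, and reads a form of degree r as a linear form in these new coordinates. The function names and the dictionary encoding of a form are illustrative assumptions, not notation from the text.

```python
from itertools import combinations_with_replacement
from math import comb


def monomials(n, r):
    """Exponent vectors of all monomials of degree r in x_0, ..., x_n."""
    out = []
    for combo in combinations_with_replacement(range(n + 1), r):
        e = [0] * (n + 1)
        for j in combo:
            e[j] += 1
        out.append(tuple(e))
    return out


def embed(x, r):
    """Values of all degree-r monomials at the point x = (x_0, ..., x_n)."""
    values = []
    for e in monomials(len(x) - 1, r):
        v = 1
        for xi, ei in zip(x, e):
            v *= xi ** ei
        values.append(v)
    return values


def as_linear_form(form, n, r):
    """Coefficient vector of a degree-r form (dict: exponents -> coefficient)."""
    return [form.get(e, 0) for e in monomials(n, r)]


# The quadric f = x0^2 - x1*x2 on P^2 becomes a linear form on P^5.
f = {(2, 0, 0): 1, (0, 1, 1): -1}
coeffs = as_linear_form(f, 2, 2)
x = (3, 2, 5)
value = sum(c * y for c, y in zip(coeffs, embed(x, 2)))
assert len(coeffs) == comb(2 + 2, 2)
```

Evaluating the linear form on the image of x gives the same number as evaluating f at x, which is the only property of the embedding used in 7.7.6.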
7.7.7. The computation of the invariants µ(V ) and the verification of the semistability
condition are not easy. R.G. Ferretti [120] has considered more generally replacing the forms
$f_{wi}$, which define hypersurfaces in $\mathbb{P}^n$, by projective subvarieties of $\mathbb{P}^n$, and has shown
how to compute the associated invariants using the Chow form associated to a subvariety of
$\mathbb{P}^n$.
Finally, in an interesting paper J.-H. Evertse and R.G. Ferretti [110] have been able to
combine this point of view with the absolute subspace theorem in 7.7.2, obtaining a rather
strong absolute version of the general Faltings–Wüstholz theorem. A new idea in their paper
is the use of a more general type of Segre embedding, which is chosen in an optimal way
so as to produce the best exponents. In particular, they derive the Faltings–Wüstholz
theorem from Schmidt’s original subspace theorem and from their analysis
of generalized Segre embeddings.
The exposition in Sections 7.2 and 7.3 follows to a large extent material gleaned
from J.-H. Evertse’s expository paper [105]. The presentation of Siegel’s theorem
and Section 7.4 follow closely Zannier [337], Ch.II–IV.
The proof of the subspace theorem in the present form is the result of many years
of step-by-step progress. The first step towards it was W.M. Schmidt’s paper [264]
of 1967, in which he solved the problem in the case n = 2 and K = Q . In that
paper the role of geometry of numbers emerges clearly through the use of Mahler’s
theorems on successive minima of polar bodies. However, the extension to the
general case required control of all successive minima and this was done only in
1970 in [266], when Schmidt introduced the tool of the Grassmann algebra. This
was followed by Schlickewei’s generalization with several places and a general
number field K .
A new direction began with Schmidt’s extension of the Davenport–Roth theorem
to the multidimensional case in [269]. This line of research culminated in the ab-
solute subspace theorem 7.7.2 of Evertse and Schlickewei [111]. The remarkable
uniformity of their result with respect to fields of definition and the set of places S
has proven to be essential in applications. Essential ingredients in the proof of the
absolute subspace theorem are an absolute version of geometry of numbers (with
a corresponding absolute Siegel lemma) found by Roy and Thunder [247], [248],
a precise gap principle for the sequence of solutions, and a precise quantitative
version of Faltings’s product theorem.
Vojta in [307] gives a succinct account of the subspace theorem stressing certain
analogies with the work of L. Ahlfors [7] on meromorphic curves. A version
of the subspace theorem allowing “moving targets,” similar to Theorem 6.5.2 in
Roth’s case, is in Ru and Vojta [250].
The paper by Faltings and Wüstholz [117] gave a deep geometric extension of
the subspace theorem, using new methods quite independent of Schmidt’s. The
main tools are Faltings’s fundamental product theorem and the introduction of the
Harder–Narasimhan filtration in order to be able to apply probabilistic methods for
the construction of the auxiliary polynomial.
The proof of Theorem 7.5.13 is patterned after Evertse in [108], with several sim-
plifications because we do not keep track of constants. The rest of the proof is
modeled after Evertse’s treatment of the rational case, see [105]. For Faltings’s
product theorem, we also recommend the illuminating review of [117] by J.-H.
Evertse in [107].
8 ABELIAN VARIETIES
8.1. Introduction
8.2.11. This allows us to change conventions. From now on, we write an abelian
variety additively, hence
m(x, y) = x + y,
ι(x) = −x,
and the identity is denoted by 0. For a ∈ A , the morphism τa (x) : = x + a is
called translation by a.
homomorphism $x \mapsto x^p$ on the multiplicative group $\mathbb{G}_m$ over $\mathbb{F}_p$ contains only 1 , and
therefore cannot be distinguished from the kernel of the identity map. These difficulties
disappear in the context of group schemes. For details, we refer to I.R. Shafarevich [280],
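Ch.V, 4.2. The natural way is to define ker(ϕ) as the cartesian product $G \times_H \mathrm{Spec}(K)$ in
the category of schemes with respect to the Cartesian diagram
$\begin{array}{ccc} \ker(\varphi) & \longrightarrow & \mathrm{Spec}(K) \\ \downarrow & & \downarrow{\scriptstyle \varepsilon_H} \\ G & \stackrel{\varphi}{\longrightarrow} & H \end{array}$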
where εH is the map of Spec(K) to the neutral element of H . Then ker(ϕ) is a closed
subscheme such that its K -rational points form a group. On the other hand, ker(ϕ) need
not be reduced and so is not necessarily a group variety. Working with group schemes is
therefore the natural way of overcoming these obstacles, leading to a coherent theory. The
more elementary classical theory in the framework of group varieties is adequate only for
separable maps. On the other hand, a famous result of Cartier states that every group scheme
in characteristic 0 is reduced, i.e. is a group variety (see [85], Ch.II, §6, no.1).
Since the fact that ker(ϕ) is in general only a group scheme plays no role in this text, we
only consider
ker(ϕ) := {x ∈ G(K) | ϕ(x) = εH }
as a closed subgroup of G , unless specified otherwise.
To give an idea of the Riemann form, let L be a line bundle on the complex torus T .
Here in these analytical remarks, this always means a holomorphic line bundle. Then the
cohomology group $H^k(T, \mathbb{Z})$ can be identified with the group of alternating $\mathbb{Z}$-valued k-forms
on Λ . This is clear for k = 1 and follows from the isomorphism $\bigwedge^k H^1(T, \mathbb{Z}) \xrightarrow{\sim} H^k(T, \mathbb{Z})$
induced by cup product. It is easy to see that multiplication by i is an isometry
with respect to the alternating bilinear form corresponding to the Chern class c1 (L) ∈
H 2 (T, Z) and so it may be viewed as the imaginary part of a unique Riemann form H .
Then H is positive definite if and only if L is ample. Note that two line bundles have the
same Chern class (and hence the same Riemann form) if and only if they are algebraically
equivalent (use A.14.8).
An analytic homomorphism ϕ : T = V /Λ → T′ = V′/Λ′ of complex tori is the quotient
map of the linear map dϕ : V → V′ of tangent spaces, using that dϕ(Λ) ⊂ Λ′ is induced
from
Λ = H1 (T, Z) → Λ′ = H1 (T′ , Z), γ ↦ ϕ ◦ γ.
By Proposition 8.2.26, the cotangent bundle of an abelian variety over the field K
is trivial. Thus an abelian variety of dimension 1 has genus 1, i.e. is an elliptic
curve. In this section we prove the converse statement, namely that an elliptic
curve has a group structure and is an abelian variety. By Corollary 8.2.9, this
group structure is unique up to translations.
Elliptic curves are a major tool in arithmetic and play a role also in other parts of
mathematics. The interested reader may consult the monographs by D. Husemöller
[155] or J. Silverman [284] for a deeper study of the subject. We assume the reader
to be familiar with the theory of algebraic curves as provided by Section A.13.
Definition 8.3.1. An elliptic curve over K is a geometrically irreducible smooth
projective curve E of genus g(E) = 1 defined over K , equipped with a rational
point P0 ∈ E(K).
8.3.2. Note that geometrically irreducible is the same as irreducible by A.7.14. Let
E be an elliptic curve over K and let D be a divisor on EK of degree deg(D) >
0. The space of global sections Γ(EK , O(D)) may be realized as the subspace
with coefficients in K . By the above, c5 and c6 are different from zero, so that
we may normalize c5 = −1. If we divide by $c_6^3$ and replace x by $x/c_6$ and y by
$y/c_6^2$, we get a relation of the form
$y^2 + a_1xy + a_3y = x^3 + a_2x^2 + a_4x + a_6$ (8.2)
with $a_i \in K$.
Since deg(3[P0 ]) = 3 = 2g(E) + 1, the divisor 3[P0 ] is very ample (cf. A.13.7).
Hence the basis of L(3[P0 ]) corresponding to 1, x, y induces a closed embedding
of E into $\mathbb{P}^2_K$ (cf. Remark A.6.11). We know by (8.2) that the image of E is
contained in the projective curve with Weierstrass equation
$x_0x_2^2 + a_1x_0x_1x_2 + a_3x_0^2x_2 = x_1^3 + a_2x_0x_1^2 + a_4x_0^2x_1 + a_6x_0^3$
in the homogeneous coordinates $(x_0 : x_1 : x_2)$ of $\mathbb{P}^2_K$.
It is an easy matter to prove that the curve defined above is geometrically irre-
ducible, hence it gives a projective model of E as a smooth plane cubic curve.
Note also that the rational functions x = x1 /x0 and y = x2 /x0 are nothing
else than the two functions x, y defined before, hence the affine form (8.2) of the
Weierstrass equation describes the affine curve E ∩ {x0 ≠ 0}. The only point of
E outside this part is the point $(0 : 0 : 1) \in \mathbb{P}^2_K$, corresponding to P0 ∈ E . It is
easily seen that, in this model, P0 is an inflexion point of E .
Remark 8.3.5. If char(K) ≠ 2, then replacing y by $\frac{1}{2}(y - a_1x - a_3)$ leads
to a Weierstrass equation with a1 = a3 = 0. Then the Jacobi criterion shows
that a Weierstrass equation describes a smooth curve C in $\mathbb{P}^2_K$ if and only if the
discriminant of the cubic polynomial $x^3 + a_2x^2 + a_4x + a_6$ is not zero (see
Proposition 10.2.3 for the argument). By the genus formula
$g(C) = \frac{1}{2}(\deg(C) - 1)(\deg(C) - 2)$
for a smooth plane curve (cf. A.13.4), this is an elliptic curve. If moreover char(K) ≠ 3,
then a further linear transformation leads to the well-known Weierstrass normal
form
$y^2 = 4x^3 - g_2x - g_3$
of the elliptic curve. For generalizations and details, see [284].
8.3.6. Now we go back to arbitrary characteristic. We describe more explicitly
the group structure of the abelian group E , beginning by proving that the inverse
operation is a morphism.
Proposition 8.3.8 (Addition Law). Let E be the elliptic curve in normal form
$y^2 + a_1xy + a_3y = x^3 + a_2x^2 + a_4x + a_6$.
Then the origin O of the group E is the unique point at infinity and the group law
+ is defined as follows. Let $P_1 = (x_1, y_1)$, $P_2 = (x_2, y_2)$ be two finite points on
E and set
$a = \frac{y_2 - y_1}{x_2 - x_1}$ if $x_1 \neq x_2$,
$a = \frac{3x_1^2 + 2a_2x_1 + a_4 - a_1y_1}{2y_1 + a_1x_1 + a_3}$ if $x_1 = x_2$,
$b = y_1 - ax_1$.
Then:
The addition law and the associative law for an elliptic curve in Weierstrass form
can be seen visually as in the following picture:
[Figure: a cubic curve in Weierstrass form with the points P, Q, R, P + Q, Q + R and −(P + Q + R) marked.]
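The slope a and intercept b of Proposition 8.3.8 translate directly into a small computation. The Python sketch below works over the rationals with exact arithmetic. For the coordinates of the sum it uses the standard formulas x3 = a² + a1·a − a2 − x1 − x2 and y3 = −(a + a1)·x3 − b − a3, together with −(x, y) = (x, −y − a1·x − a3), taken from the general literature on Weierstrass equations; this choice, the class name `WeierstrassCurve` and the convention `O = None` are assumptions of the sketch.

```python
from fractions import Fraction

O = None  # the point at infinity, i.e. the origin of the group


class WeierstrassCurve:
    """y^2 + a1*x*y + a3*y = x^3 + a2*x^2 + a4*x + a6 over the rationals."""

    def __init__(self, a1, a2, a3, a4, a6):
        self.a1, self.a2, self.a3, self.a4, self.a6 = (
            Fraction(c) for c in (a1, a2, a3, a4, a6))

    def contains(self, P):
        if P is O:
            return True
        x, y = P
        return (y * y + self.a1 * x * y + self.a3 * y
                == x ** 3 + self.a2 * x * x + self.a4 * x + self.a6)

    def point(self, x, y):
        P = (Fraction(x), Fraction(y))
        if not self.contains(P):
            raise ValueError("point is not on the curve")
        return P

    def neg(self, P):
        if P is O:
            return O
        x, y = P
        return (x, -y - self.a1 * x - self.a3)

    def add(self, P, Q):
        if P is O:
            return Q
        if Q is O:
            return P
        (x1, y1), (x2, y2) = P, Q
        # Q = -P: the line through P and Q is vertical, so the sum is O.
        if x1 == x2 and y1 + y2 + self.a1 * x2 + self.a3 == 0:
            return O
        if x1 != x2:   # chord through two distinct points
            a = (y2 - y1) / (x2 - x1)
        else:          # tangent line at P = Q
            a = ((3 * x1 * x1 + 2 * self.a2 * x1 + self.a4 - self.a1 * y1)
                 / (2 * y1 + self.a1 * x1 + self.a3))
        b = y1 - a * x1
        x3 = a * a + self.a1 * a - self.a2 - x1 - x2
        y3 = -(a + self.a1) * x3 - b - self.a3
        return (x3, y3)


E = WeierstrassCurve(0, 0, 1, -1, 0)   # y^2 + y = x^3 - x
P = E.point(0, 0)
P2 = E.add(P, P)    # 2P = (1, 0)
P3 = E.add(P2, P)   # 3P = (-1, -1)
```

On this example one can also check a single instance of associativity by hand: (2P + P) + P and 2P + 2P both give the point (2, −3).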
8.3.9. The addition law shows that addition is a rational map. In order to finish
the proof of Proposition 8.3.3, it remains to show that + is a morphism. By
A.11.9, we may assume that K is algebraically closed. In a first step, we prove
that translation τQ by Q ∈ E is a morphism. We may assume Q ≠ O . By the
formulas in Proposition 8.3.8, τQ is a rational map which restricts to a morphism
E \ {O, Q, −Q} → E \ {Q, O, Q + Q}. Since every rational map between
projective smooth curves extends to a morphism (cf. A.11.10), we get a morphism
τ′Q : E → E , which agrees with τQ on E \ {O, Q, −Q}. It remains to prove
that τ′Q = τQ . For R ∈ E , we get τ′Q ◦ τ′R = τ′Q+R . In particular, every τ′Q is
an isomorphism with inverse τ′−Q . We conclude that τ′Q maps {O, Q, −Q} onto
{Q, Q + Q, O}. For any R ∉ {O, Q, −Q, Q + Q, −Q − Q}, we have
τ′R (τ′Q (Q)) = τ′Q+R (Q) = τ′Q (τ′R (Q)) = τ′Q (Q + R) = Q + Q + R.
This excludes τ′Q (Q) = Q immediately. On the other hand, we know τ′R (O) ∈
{O, R, R + R}. Hence τ′Q (Q) = O is only possible if Q + Q = O . This proves
τ′Q (Q) = Q + Q = τQ (Q).
The equation
τ′Q (−Q) = O = τQ (−Q)
is proved in a similar fashion. Thus, using that τ′Q is a bijection, we conclude
that τ′Q (O) = Q = τQ (O). We have handled all exceptions, thereby proving that
τ′Q = τQ .
Next, we prove that addition is a morphism. The formulas in 8.3.8 show that
addition is a rational map m , which is a morphism outside of
Z := {(P, P ) | P ∈ E} ∪ {(P, −P ) | P ∈ E} ∪ (E × {O}) ∪ ({O} × E) .
For (P, Q) ∈ Z , there are R, S ∈ E such that (P + R, Q + S) ∉ Z . Since
translations are morphisms by our above considerations, we see that
$\tau_{-R-S} \circ m \circ (\tau_R \times \tau_S)$
is a morphism in a neighbourhood of (P, Q) and agrees with + everywhere. This
proves that + is a morphism.
8.3.10. Complex analytically, an elliptic curve is biholomorphic to C/Λ for a lattice Λ in
C (cf. 8.2.27). In dimension 1 the converse is true, i.e. every one-dimensional complex
torus is biholomorphic to an abelian variety: The imaginary part of a Riemann form must
be an integer multiple of the alternating bilinear form E0 given in the following way. Let
λ1 , λ2 be a positively oriented Z -basis of Λ ; then E0 is characterized by
v ∧ w = E0 (v, w)λ1 ∧ λ2 (v, w ∈ C).
A Riemann form is positive definite if and only if its imaginary part is a negative multiple of
E0 . Note however that in higher dimensions a complex torus need not be an abelian variety
(in fact, this is the case for a general complex torus).
The description of the elliptic curve determined by C/Λ is done quite explicitly by means
of the Weierstrass ℘-function associated to the lattice Λ , namely
$\wp(z) := \frac{1}{z^2} + \sum_{\omega\in\Lambda\setminus\{0\}} \left( \frac{1}{(z-\omega)^2} - \frac{1}{\omega^2} \right).$
It is a Λ-periodic meromorphic function on C and has double poles at the lattice points.
It satisfies the first-order differential equation
$\wp'(z)^2 = 4\wp(z)^3 - g_2\wp(z) - g_3,$
where the coefficients g2 , g3 are given by
$g_2 := 60 \sum_{\omega\in\Lambda\setminus\{0\}} \frac{1}{\omega^4}, \qquad g_3 := 140 \sum_{\omega\in\Lambda\setminus\{0\}} \frac{1}{\omega^6}.$
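The coefficients g2 and g3 can be approximated numerically by truncating the lattice sums. The following Python sketch does this for a lattice with basis w1, w2, summing over |m|, |n| ≤ N; the function name and the truncation to a square box are assumptions made for illustration. For the square lattice Λ = Z + Zi, multiplication by i maps Λ onto itself and changes the sign of each term ω^(−6), so g3 = 0, and the truncated sum reproduces this up to rounding.

```python
def lattice_invariants(w1, w2, N):
    """Truncated sums for g2 and g3 over 0 != m*w1 + n*w2 with |m|, |n| <= N."""
    s4 = 0j
    s6 = 0j
    for m in range(-N, N + 1):
        for n in range(-N, N + 1):
            if m == 0 and n == 0:
                continue
            w = m * w1 + n * w2
            w_sq = w * w
            s4 += 1 / (w_sq * w_sq)          # omega^(-4)
            s6 += 1 / (w_sq * w_sq * w_sq)   # omega^(-6)
    return 60 * s4, 140 * s6


g2, g3 = lattice_invariants(1, 1j, 100)
print(g2, g3)
```

The sum for g2 converges absolutely with a tail of order 1/N², so moderate values of N already give a few correct digits; for the square lattice the result is close to 189.07.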
Elliptic curves are the only standard explicit examples of abelian varieties, because
higher-dimensional abelian varieties can be defined only by means of a very large
number of equations, and little can be understood on abelian varieties by looking
directly at these equations.
In this respect, the cubic model of an elliptic curve is rather special and not rep-
resentative of the general situation. However, abelian varieties are ubiquitous in
algebraic geometry and they occur most naturally, through the Picard variety, in
the parametrization of families of divisor classes on a variety. For singular vari-
eties, it is better to use the Picard group instead of divisor classes (cf. Section A.8
and A.9.18).
This section is devoted to fundamental facts about Picard varieties. Here, the
reader is assumed to be familiar with the basic properties of the Picard group as
provided by A.5.16, with Section A.8 about divisors and with the concept of algebraic
equivalence for line bundles (see end of Section A.9). We fix the ground
field K and an algebraic closure $\bar K$ .
8.4.1. If ϕ : X −→ Y is a morphism of varieties over K and y ∈ Y , then the
fibre of ϕ over y is denoted by Xy . The pull-back of c ∈ Pic(X) to the fibre
Xy is denoted by cy . It is an element of Pic(Xy ). Note that Xy and cy are only
defined over K(y). Often, we identify X with Xy using the map x → (x, y). Of
course, this is only defined over K(y).
8.4. The Picard variety 247
In the following, we consider c ∈ Pic(X × Y ) and the fibres with respect to the
projections p1 , p2 onto the factors. For x ∈ X, y ∈ Y , we have
$c_y = c|_{X\times\{y\}} \in \mathrm{Pic}(X_{K(y)}), \qquad c_x = c|_{\{x\}\times Y} \in \mathrm{Pic}(Y_{K(x)}).$
Corollary 8.4.5. Let A be an abelian variety over K , let pi be the ith projection
A × A onto A , and let m be addition as usual. The following conditions are
equivalent for c ∈ Pic(A):
Note that identity (b) takes place in Pic(AK(a) ) after identifying A with A × {a}
as usual.
Proof: The equivalence is a consequence of
$(m^*(c) - p_1^*(c) - p_2^*(c))|_{A\times\{a\}} = \tau_a^*(c) - c$
and the seesaw principle from 8.4.4. If we pull back equation (a) by the morphism
$A \longrightarrow A \times A, \quad a \longmapsto (a, -a),$
then we get $[-1]^*(c) = -c$.
8.4.6. Let X be an irreducible smooth complete variety defined over a field K .
Recall that Pic0 (X) denotes the subgroup of Pic(X) consisting of those classes
represented by line bundles algebraically equivalent to the trivial line bundle (cf.
A.9.35). A fundamental result is that the group Pic0 (X) is canonically an abelian
variety. In what follows, we would like to describe this algebraic structure on
Pic0 (X).
We assume X(K) not empty and we fix a point P0 on X , which will play the role
of a base point. By A.7.14, we know that X is geometrically irreducible. Let T
be an irreducible variety. We say that c ∈ Pic(X ×T ) is a subfamily of Pic0 (X)
parametrized by T if:
The following fundamental result due to Poincaré gives the existence of a universal
family p and parameter space B , such that any subfamily c parametrized by any
T is obtained by an appropriate pull-back of p . We denote by idX the identity
map of X .
Theorem 8.4.7. There is a subfamily p of Pic0 (X), parametrized by an irre-
ducible smooth complete variety B , with the following universal property. For
8.4.9. The following result will show that the F -rational points of the Picard vari-
ety may be identified with Pic0 (XF ) for any extension F/K . In particular, the set
of points of the Picard variety corresponds to Pic0 (XK ). We denote the Picard
variety of X cursively by P ic0 (X) to distinguish from the subgroup Pic0 (X)
of Pic(X).
onto X × B . For $c := p_1^*(p) + p_2^*(p)$ and a, b ∈ B , the restriction of c to the
fibre X × {a} × {b} is equal to a + b (identifying X × {a} × {b} with X ).
In order to see this, note that the restriction of $p_1^*(p)$ is equal to the restriction of
p to X × {a} and then use Remark 8.4.11. Since c is a subfamily of Pic0 (X)
parametrized by B × B , there is a unique morphism m : B × B → B with
$(\mathrm{id}_X \times m)^*(p) = c$. By Remark 8.4.11, we obtain
$m(a, b) = p_{m(a,b)} = c_{(a,b)} = a + b$
and so addition is a morphism. Let ι : B → B be the unique morphism with
$(\mathrm{id}_X \times \iota)^*(p) = -p$. We get similarly
$\iota(b) = p_{\iota(b)} = -p_b = -b$
and so the inverse is also a morphism.
(a) There is p ∈ Pic(X × P ic0 (X)) such that pb = b for b ∈ P ic0 (X) and
pP0 is trivial.
(b) For any subfamily c of Pic0 (X) parametrized by an irreducible variety T
over K , the set-theoretic map
T −→ P ic0 (X), t −→ ct
is actually a morphism over K .
8.4.15. At the end, we describe the situation complex analytically. For details, we refer to P.
Griffiths and J. Harris [130], pp.326–332. Let X be an irreducible proper smooth complex
variety endowed with its structure as a compact connected complex manifold (cf. A.14).
The considerations hold more generally for a connected compact Kähler manifold. As in
Example A.10.10, the transition functions $(g_{\alpha\beta})$ of a line bundle L on X may be viewed
as a Čech-cocycle with values in $\mathcal{O}_X^\times$ and we may identify Pic(X) with $H^1(X, \mathcal{O}_X^\times)$.
The exponential map induces a short exact sequence
$0 \longrightarrow \mathbb{Z}_X \longrightarrow \mathcal{O}_X \stackrel{\exp}{\longrightarrow} \mathcal{O}_X^\times \longrightarrow 0,$
where $\mathbb{Z}_X$ is the sheaf associated to the constant presheaf Z on X . The beginning of the
associated long exact cohomology sequence is
$0 \to \mathbb{Z} \to \mathbb{C} \to \mathbb{C}^\times \to H^1(X, \mathbb{Z}) \to H^1(X, \mathcal{O}_X) \to H^1(X, \mathcal{O}_X^\times) \stackrel{c_1}{\to} H^2(X, \mathbb{Z}) \to \cdots.$
The map c1 gives the Chern class of line bundles. A line bundle is algebraically equivalent
to 0 if and only if its Chern class is 0 (cf. [130], p.462). If we use the canonical
isomorphism
$H^1(X, \mathcal{O}_X) \cong H^{0,1}(X)$
arising from the Dolbeault complex, we conclude that the Picard variety is biholomorphic
to the complex torus $H^{0,1}(X)/H^1(X, \mathbb{Z})$ .
8.5. The theorem of the square and the dual abelian variety
In Section 8.4 we have defined the Picard variety P ic0 (X) of an irreducible smooth complete
variety X with K -rational base point. Now X is always an abelian variety A over
the field K and the base point is the origin. Then the associated Picard variety is called the
dual abelian variety, denoted by $\widehat{A}$.
The theorem of the square says that for any c ∈ Pic(A) the point $\varphi_c(a) := \tau_a^*(c) - c$ is in
$\widehat{A}$ and additive in a ∈ A . Over C , it is quite clear that the translate $\tau_a^*(c)$ is algebraically
equivalent to c using a path from 0 to a for the deformation. In the special case of an
elliptic curve E with origin P0 and divisor D we have
$\tau_P^*(D) \sim D - [P] + [P_0]$
and the theorem of the square is evident from [P ] − [P0 ] being algebraically equivalent to 0 and
[P + Q] ∼ [P ] + [Q] − [P0 ] . The theorem of the square in the general case will be obtained
from our results about the Picard variety.
As a consequence of the theorem of the square, we will prove that an abelian variety is
always projective. If c is ample, we will see that ϕc is surjective and has finite kernel, thus
$\widehat{A}$ has the same dimension as A .
At the end, we will mention a direct construction of the dual abelian variety, biduality and
the complex analytic situation. These considerations are not essential for the sequel of the
book.
Theorem 8.5.1. Let c ∈ Pic(A) and a ∈ A . Then ϕc (a) := τa∗ (c)−c ∈ P ic0 (A)(K(a))
and ϕc : A → P ic0 (A) is a homomorphism of abelian varieties over K .
See [212], II.8, Th.1 for a cohomological proof of this key fact.
Remark 8.5.4. The kernel of ϕc gives much information about c . If c is ample, then the
kernel is finite. We will prove a partial converse of this statement, which we will use in
Section 8.10 in the proof that the Θ -divisor on the Jacobian is ample. On the other hand,
ker(ϕc ) = A if c ∈ Pic0 (A) . These statements about the kernel will be proved next and
afterwards we shall sketch a construction of the dual abelian variety.
Proposition 8.5.5. A class c ∈ Pic(A) is ample if and only if ker(ϕc ) is finite and in
addition $H^0(A, nc) \neq 0$ for some positive integer n .
Proof: Assume that c is ample. Let B be the connected component of the closed subgroup
ker(ϕc ) containing 0 . For any b ∈ B , we have
$\tau_b^*(c) = c,$
and hence
$[-1]^*(c|_B) = -c|_B$
by Corollary 8.4.5. Since
$0_B = c|_B + [-1]^*(c|_B)$
is ample, B has to be the trivial abelian subvariety {0} (use A.6.15). Therefore ker(ϕc )
is finite. Choose n so large that nc is very ample. This gives $H^0(A, nc) \neq 0$ .
In the other direction, we may assume that $H^0(A, c) \neq 0$ , i.e. there is an effective divisor
D such that O(D) is in the class of c . The next lemma shows that c is ample.
Lemma 8.5.6. Let D be an effective divisor on A and suppose that the subgroup
{a ∈ A | τa∗ (D) = D} is finite. Then D is ample on A .
Proof: Note that D is ample if and only if $D_{\bar K}$ is ample on $A_{\bar K}$ (cf. Remark A.6.12);
hence we may assume K algebraically closed. The proof then proceeds by proving first
that the linear system |2D| is base-point free and defines a morphism ϕ of A into some
projective space. Then we prove that ϕ is a finite morphism and the conclusion comes by
pull-back. The details are as follows.
Let a, b ∈ A . If b is in the support of the effective divisor
$E_a := \tau_a^*(D) + \tau_{-a}^*(D),$
then b + a or b − a is in the support of D . (The reader need not worry about pull-back
of divisors because it is done with respect to an isomorphism. In this case the pull-back
is just given by the inverse images of the components, keeping the multiplicities.) For any
given b ∈ A we can always find a point a ∉ (D − b) ∪ (b − D) , i.e. b ∉ supp(Ea ) .
Then by the theorem of the square in 8.5.2 the effective divisor Ea is an element of |2D| .
Thus the linear system |2D| is base-point free; hence, by A.6.8, it defines a morphism
$\varphi : A \longrightarrow \mathbb{P}^n_K.$
The morphism ϕ is proper (cf. A.6.15 (b), (e)). Let F be an irreducible component of any
fibre. All elements of |2D| are pull-backs of hyperplanes by the definition of ϕ . Now for
any a ∈ A either F is contained in the support of Ea or F ∩ supp(Ea ) = ∅ , hence we
can find a ∈ A such that F and the support of Ea are disjoint, namely
a ∉ supp(D) − F.
Let Z be an irreducible component of D . Then Z − F is an irreducible closed subset of
A not containing a . We conclude that Z − F is of codimension 1 . Now note that for any
b ∈ F we have
Z − F = Z − b,
whence it follows that Z is invariant by translations in F − F . Therefore, the same is
true for D instead of Z . By assumption, this is only possible for dim(F ) = 0 and we
conclude that ϕ has finite fibres. Thus, since ϕ is a proper morphism, it must also be finite
(cf. A.12.4). Finally, recalling that the pull-back of an ample class by a finite morphism is
again ample (see A.12.7), we see that 2D is ample.
Corollary 8.5.7. An abelian variety is projective.
Proof: Let U be an affine open subset of A containing 0. We may assume that dim(A) ≥
1 . Let Z1 , . . . , Zr be the irreducible components of A \ U . Enlarging them, we may
assume that Z1 , . . . , Zr are prime divisors. In order to see this note that the complement
of a divisor in an affine smooth variety is affine (use A.2.10). Setting
$D := \sum_{i=1}^{r} Z_i,$
8.5.8. The situation is even better than as described above. On an abelian variety, any ample
divisor D is rationally equivalent to an effective divisor. As we have seen in the proof of
Lemma 8.5.6, the linear system |2D| is base-point free. In the case of elliptic curves, it is
obvious that 3D is very ample. The striking fact is that this remains true in general. We
will not need these results in this book and we only refer to [212] for a proof.
and similarly
c{0}×T = p.
Let us denote by p2 the projection of A × T onto T . The subfamily c − p∗2 (p) of
Pic0 (A) parametrized by T induces a morphism T → P ic0 (A), which is equal
to ϕ (cf. Theorem 8.4.13). Since ϕ(A × {0}) = 0, the constancy lemma in 8.2.6
shows τa∗ (b) = b for all a ∈ A . This proves the claim.
The implication (b) ⇒ (c) is Theorem 8.5.3. The existence of an ample class is
assured by Corollary 8.5.7 and so (d) is a consequence of (c). The implication (d)
⇒ (a) is part of Theorem 8.5.1.
Definition 8.5.10. The Picard variety P ic0 (A) is called the dual abelian variety
of A and will be denoted by $\widehat{A}$.
Corollary 8.5.11. The dual abelian variety $\widehat{A}$ has the same dimension as A .
As suggested by the name, we have the following fact (needed only in the example below):
Theorem 8.5.13. Let p be the Poincaré class of A . Then A is the dual abelian variety of
P ic0 (A) and the Poincaré class of P ic0 (A) is the pull-back of p by the isomorphism
$\mathcal{P}ic^0(A) \times A \longrightarrow A \times \mathcal{P}ic^0(A), \quad (b, a) \longmapsto (a, b).$
We first deduce some elementary facts for involutions and quadratic functions on
an abelian group and then prove the theorem of the cube. The theorem of the cube
states that the pull-back of a fixed line bundle on an abelian variety is a quadratic
function in the morphism. It will be deduced from the theorem of the square. In
fact, both statements are easily seen to be equivalent. The theorem of the cube will
and
is defined by
$s_I(x_1, \dots, x_k) := \sum_{i\in I} x_i.$
In particular, $s_\emptyset$ is identically 0.
The first sum equals 0, by induction applied to the linear and hence quadratic
function b(·, xk ). The second sum is 0 by carrying out the identity $(1 - 1)^{k-1} = 0$.
We apply the above to the abelian group M = Mor(X, A) of morphisms from X
to A and to the abelian group N = Pic(X).
Theorem 8.6.11. Let X be a variety over the field K and let A be an abelian
variety over K with c ∈ Pic(A). Then the map of Mor(X, A) into Pic(X) given
by ϕ ↦ ϕ∗ (c) is quadratic.
$\tilde\theta$ associated to θ such that $\theta/\tilde\theta$ is a trivial theta function. Note that $\tilde\theta$ is unique up to a
multiplicative non-zero constant.
For example, let Z be a symmetric complex g × g matrix with $\Im Z$ positive definite and
let Λ be the lattice $\mathbb{Z}^g + Z \cdot \mathbb{Z}^g$ in $\mathbb{C}^g$. Then the Riemann bilinear relations (cf. [130],
p.306) say that this is equivalent to V /Λ being an abelian variety. Then
$H(v, w) := v^t (\Im Z)^{-1} \bar{w}$ (8.9)
is a positive definite Riemann form for Λ . The Riemann theta function for Λ is the entire
function defined by the Fourier series
$\theta(v, Z) = \sum_{k\in\mathbb{Z}^g} \exp(\pi i\, k^t Z k + 2\pi i\, k^t v).$
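The associated normalized theta function is
\[ \widetilde{\theta}(v, Z) = \exp\Bigl( \frac{\pi}{2}\, v^t (\operatorname{Im} Z)^{-1} v \Bigr) \cdot \theta(v, Z) \]
with quadratic character $\alpha(m + Zn) = \exp(\pi i\, m^t n)$.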
Let L be a line bundle on the complex torus T := V /Λ . By the theorem of Appell–
Humbert (cf. 8.5.15), the trivialization V × C of π ∗ L may be normalized by a Riemann
form H and a quadratic character α : Λ → T . Then a global (resp. meromorphic) section
of L lifts to a normalized holomorphic (resp. meromorphic) theta function for Λ . This
induces a one-to-one correspondence between global (resp. meromorphic) sections of L
and normalized holomorphic (resp. meromorphic) theta functions with Hermitian form H
and quadratic character α . If L is ample or equivalently H positive definite (cf. 8.2.27),
then $V/\Lambda$ is an abelian variety $A$ and the Riemann–Roch theorem for abelian varieties
states that
\[ \dim \Gamma(A, L) = \sqrt{|\det E(\lambda_j, \lambda_k)|}, \]
where (λj ) is a Z -basis for the lattice Λ .
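As a quick sanity check of this dimension formula, consider the one-dimensional case. The sketch below assumes $E := \operatorname{Im} H$, the lattice $\Lambda = \mathbb{Z} + \mathbb{Z}\tau$ with $\operatorname{Im}\tau > 0$ (so $Z = \tau$ in the example above), and the Riemann form (8.9); the symbol $\tau$ and the basis $\lambda_1 = 1$, $\lambda_2 = \tau$ are chosen here for illustration.

```latex
% g = 1:  H(v,w) = v \overline{w} / \operatorname{Im}\tau,   E = \operatorname{Im} H.
\[
  E(1,\tau) = \frac{\operatorname{Im}\overline{\tau}}{\operatorname{Im}\tau} = -1,
  \qquad
  E(\tau,1) = \frac{\operatorname{Im}\tau}{\operatorname{Im}\tau} = 1,
  \qquad
  E(1,1) = E(\tau,\tau) = 0 .
\]
\[
  \det\bigl(E(\lambda_j,\lambda_k)\bigr)
  = \det\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} = 1,
  \qquad\text{so}\qquad
  \dim \Gamma(A,L) = \sqrt{1} = 1 .
\]
% Replacing H by nH (n >= 1) multiplies the determinant by n^2, giving dimension n.
```

This agrees with the classical fact that the Riemann theta function spans the space of sections in this case.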
We conclude this section by proving the theorem of the cube complex analytically. Let
A = V /Λ be an abelian variety together with a line bundle L . We choose a trivialization
V × C of π ∗ L as in the theorem of Appell–Humbert. Since A is algebraic, there is a non-
zero meromorphic section of L inducing a normalized meromorphic theta function θ with
Riemann form H and quadratic character α . Then we consider the meromorphic function
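\[ f(v_1, v_2, v_3) := \frac{\theta(v_1 + v_2 + v_3)\,\theta(v_1)\,\theta(v_2)\,\theta(v_3)}{\theta(v_1 + v_2)\,\theta(v_1 + v_3)\,\theta(v_2 + v_3)} \]
on $V \times V \times V$. It corresponds to a meromorphic section of
\[ \bigotimes_{I \subset \{1,2,3\}} \Bigl( \sum_{i \in I} p_i \Bigr)^{*} L^{(-1)^{|I|+1}}. \]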
using the facts that $\alpha$ is a homomorphism and $H$ is bilinear. We conclude that the automorphy
factor $e_{(\lambda_1,\lambda_2,\lambda_3)}$ of the relevant line bundle above is trivial and hence this line
bundle is also trivial. This proves the theorem of the cube complex analytically.
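The cancellation behind this conclusion can be made explicit. The sketch below assumes the usual normalization of the Appell–Humbert automorphy factor, $e_\lambda(v) = \alpha(\lambda)\exp\bigl(\pi H(v,\lambda) + \tfrac{\pi}{2} H(\lambda,\lambda)\bigr)$, which is not restated in this passage; the abbreviations $v_I$ and $\lambda_I$ are introduced here for illustration.

```latex
% Exponent of the automorphy factor of f at (\lambda_1,\lambda_2,\lambda_3),
% apart from the contribution of \alpha:
\[
  \sum_{\emptyset \neq I \subset \{1,2,3\}} (-1)^{|I|+1}
  \Bigl( \pi H(v_I,\lambda_I) + \tfrac{\pi}{2} H(\lambda_I,\lambda_I) \Bigr),
  \qquad
  v_I := \sum_{i \in I} v_i, \quad \lambda_I := \sum_{i \in I} \lambda_i .
\]
% Since H is biadditive, H(v_I,\lambda_I) = \sum_{i,j \in I} H(v_i,\lambda_j).
% The coefficient of H(v_i,\lambda_j) is \sum_{I \ni i,j} (-1)^{|I|+1}:
\[
  i = j:\quad 1 - 2 + 1 = 0,
  \qquad\qquad
  i \neq j:\quad -1 + 1 = 0 .
\]
% The same count applies to the terms H(\lambda_i,\lambda_j).
```

Under the additional assumption $\alpha(\lambda+\mu) = \alpha(\lambda)\alpha(\mu)\exp(\pi i E(\lambda,\mu))$, the factors coming from $\alpha$ cancel by the same counting argument.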
Let A be an abelian variety over the field K . We assume that the reader is familiar
with the basic notions and results of Section A.12. In this section we study the
endomorphism [n] : A −→ A given by multiplication by n ∈ Z on the abelian
variety A . The main result is Proposition 8.7.2 dealing with its kernel A[n]. This
endomorphism plays a fundamental role in the study of abelian varieties, both
geometrically and from the arithmetic point of view. The arithmetic significance
of this isogeny is made clear by the construction of the Néron–Tate height in the
next chapter and by its role in the proof of the Mordell–Weil theorem, which will
be treated in Chapter 10.
Over the complex numbers, it is a classical fact that
\[ A[n] \cong (\mathbb{Z}/n\mathbb{Z})^{2\dim(A)}. \]
[Figure: The isogeny [3] on C/Λ and its nine-point kernel, marked with dots.]
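The nine points can be written down directly. The following sketch assumes a lattice $\Lambda = \mathbb{Z}\omega_1 + \mathbb{Z}\omega_2$ in $\mathbb{C}$ with $\omega_1, \omega_2$ linearly independent over $\mathbb{R}$; the names $\omega_1, \omega_2$ are chosen here and do not appear in the text.

```latex
% n-torsion of the complex torus A = \mathbb{C}/\Lambda:
\[
  A[n] \;=\; \{\, x \in \mathbb{C} : nx \in \Lambda \,\}/\Lambda
       \;=\; \tfrac{1}{n}\Lambda / \Lambda
       \;=\; \Bigl\{ \tfrac{a\omega_1 + b\omega_2}{n} + \Lambda \;:\; 0 \le a, b < n \Bigr\}
       \;\cong\; (\mathbb{Z}/n\mathbb{Z})^{2} .
\]
% For n = 3 this gives 3^2 = 9 points, the kernel of [3].
```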
However, all this changes quite drastically if $p = \mathrm{char}(K)$ divides $n$. Then the map $[p]$ is
always inseparable and its reduced kernel $A[p]$ has cardinality $p^a$, with $a$ an integer
satisfying $0 \le a \le \dim(A)$. All values of $a$ in this range may occur (the case $a = 0$ is
the so-called supersingular case). As mentioned in 8.2.15, it is better to give
the kernel of $[p]$ the structure of a (non-reduced) group scheme, but this will not
be treated here.
Proposition 8.7.1. Let $c \in \mathrm{Pic}(A)$ and $n \in \mathbb{Z}$. Then
\[ [n]^*(c) = \frac{n^2 + n}{2}\, c + \frac{n^2 - n}{2}\, [-1]^* c. \]
In particular, we have $[n]^*(c) = n^2 c$ if $c$ is even and $[n]^*(c) = nc$ if $c$ is odd.
Proof: By Theorem 8.6.11, the function
\[ q : \mathbb{Z} \longrightarrow \mathrm{Pic}(A), \quad n \longmapsto [n]^* c \]
is quadratic. Then the claim follows from Lemma 8.6.6.
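The two special cases follow by substituting $[-1]^*c = \pm c$ into the formula of Proposition 8.7.1; the short computation below is included only as a check.

```latex
% c even:  [-1]^* c = c
\[
  [n]^*(c) = \Bigl( \frac{n^2+n}{2} + \frac{n^2-n}{2} \Bigr) c = n^2 c .
\]
% c odd:  [-1]^* c = -c
\[
  [n]^*(c) = \Bigl( \frac{n^2+n}{2} - \frac{n^2-n}{2} \Bigr) c = n c .
\]
% Consistency checks of the general formula:
%   n = 1  gives coefficients (1, 0), i.e. [1]^* c = c;
%   n = -1 gives coefficients (0, 1), i.e. [-1]^* c = [-1]^* c;
%   n = 0  gives coefficients (0, 0), i.e. [0]^* c = 0.
```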
Proposition 8.7.2. Let $n \in \mathbb{Z} \setminus \{0\}$. Then $[n]$ is a finite flat surjective morphism
of degree $n^{2\dim(A)}$. The separable degree of $[n]$ equals the number of points of
any fibre. If $\mathrm{char}(K) \nmid n$, then $[n]$ is an étale morphism and
\[ A[n] \cong (\mathbb{Z}/n\mathbb{Z})^{2\dim(A)}. \]
If $p = \mathrm{char}(K)$ divides $n$, then $[n]$ is not separable.
Proof: Let g be the dimension of A . By Corollary 8.5.7, there is an ample
c ∈ Pic(A). The restriction of [n]∗ (c) to A[n] is trivial. Since [−1] is an auto-
morphism, [−1]∗ (c) is also ample. Proposition 8.7.1 shows that [n]∗ (c) is ample
and so is the restriction to A[n]. Therefore, A[n] has to be finite. Now the dimen-
sion theorem from 8.2.16 and Proposition 8.2.17 show that [n] is a surjective finite
flat morphism, whose fibres have cardinality equal to the separable degree of [n].
In order to compute its degree, we use intersection theory of divisors (see A.9).
There is a very ample even line bundle $L$ on $A$ (cf. Proposition 8.6.4). By A.9.18,
we may assume that $L = O(D)$ for a divisor $D$ on $A$. By the projection formula
(cf. A.9.24), we have the following identity of $g$-fold intersection numbers
\[ [n]^*(D) \cdots [n]^*(D) = \deg[n] \; D \cdots D. \]
By Proposition 8.7.1, we have $[n]^*(D) \sim n^2 D$, where $\sim$ denotes rational
equivalence of divisors (cf. A.8.11). Noting that $D \cdots D \neq 0$, this number being the
degree of $A$ with respect to $D$ (see A.9.33), we deduce
\[ n^{2g} = \deg[n]. \]
We know that the differential d[n] is multiplication by n on the tangent space at
0 (cf. Corollary 8.2.25).
If $\mathrm{char}(K) \nmid n$, then we see by a translation argument that $d[n]$ induces an
isomorphism on tangent spaces. By Proposition B.3.5, the morphism $[n]$ is étale and
hence separable (cf. A.12.19). We have seen that the number of points of $A[n]$
is equal to the separable degree of $[n]$. This means $|A[n]| = n^{2g}$. For any $m \mid n$,
it follows that the subgroup $A[m]$ of $A[n]$ has $m^{2g}$ elements. By the theory of
finite abelian groups, $A[n]$ is isomorphic to $(\mathbb{Z}/n\mathbb{Z})^{2g}$.
If p = char(K)|n , then the differential d[n] vanishes at 0. Hence d[n] van-
ishes everywhere by a translation argument. Since a separable dominant mor-
phism is generically étale (cf. A.12.19), Proposition B.3.5 shows that [n] is not
separable.
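The degree computation in the proof amounts to comparing two expressions for the same $g$-fold intersection number. A compact version, writing $D^g$ for $D \cdots D$ (a shorthand introduced here), reads as follows.

```latex
% Projection formula on the left, Proposition 8.7.1 for the even class of D on the right:
\[
  \deg[n] \cdot D^{g}
  \;=\; \bigl([n]^*D\bigr)^{g}
  \;=\; (n^{2} D)^{g}
  \;=\; n^{2g}\, D^{g} .
\]
% Since D is very ample, D^g \neq 0, hence
\[
  \deg[n] = n^{2g} .
\]
% Example g = 1 (elliptic curve): deg[n] = n^2.
```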
Let A be an abelian variety. All varieties are assumed to be defined over the field
K.
Recall from Example 8.6.3 that we have a canonical involution of Pic(A) defining
even and odd elements. First, we prove that Pic0 (A) is a divisible subgroup and
thus we get a decomposition into even and odd parts on the Picard group. Another
application is the beautiful result that the classes in the Picard group algebraically
equivalent to 0 are precisely the odd classes. Note that this is quite evident for an
elliptic curve E with origin P0 . Then a divisor class D is algebraically equivalent
to 0 if and only if deg(D) = 0. On the other hand, 8.3.6 yields [−1]∗ D ∼
2 deg(D)[P0 ] − D , proving that D is odd if and only if deg(D) = 0. At the end,
we will prove that the Poincaré class of an abelian variety is even.
Proposition 8.8.1. If c ∈ Pic(A) and r ∈ Z \ {0} with rc ∈ Pic0 (A), then
c ∈ Pic0 (A).
Proof: Obviously, we have $r\varphi_c = \varphi_{rc}$ and the latter is equal to zero by Proposition
8.5.9 (a), (b). Theorem 8.5.1 tells us that $\varphi_c$ is a homomorphism of abelian
varieties; its image is connected and contained in the kernel of $[r]$, which is finite
by Proposition 8.7.2, and so $\varphi_c = 0$. Using once more Proposition 8.5.9
(a), (b), this implies that $c \in \mathrm{Pic}^0(A)$.
Corollary 8.8.2. Let c ∈ Pic(A). Then there are an odd element c− and an even
element c+ of Pic(A) such that c = c− + c+ . The element c− is determined
only up to 2-torsion elements in Pic(A).
Proof: This follows from 8.6.1, Lemma 8.6.2 and Proposition 8.8.1.
Theorem 8.8.3. If c ∈ Pic(A), then [−1]∗ c − c ∈ Pic0 (A). Moreover, the
following statements are equivalent:
(a) c is odd.
(b) For any variety $X$, the map $\mathrm{Mor}(X, A) \longrightarrow \mathrm{Pic}(X)$, given by $\varphi \longmapsto \varphi^*(c)$, is linear.
Theorem 8.8.4. Let $\widehat{A}$ be the dual abelian variety with corresponding Poincaré
class $p \in \mathrm{Pic}(A \times \widehat{A})$. Then $p$ is even.
Proof: Let $b \in \widehat{A}$. By Remark 8.4.11 and Theorem 8.8.3, we have
\[ ([-1]^*(p))|_{A \times \{b\}} = [-1]^*(p|_{A \times \{-b\}}) = [-1]^*(-b) = b. \]
Since
\[ ([-1]^*(p))|_{\{0\} \times \widehat{A}} = [-1]^*(p|_{\{0\} \times \widehat{A}}) = 0, \]
we get $[-1]^*(p) = p$ by Remark 8.4.11.
(remember that i◦ϕ is onto because its restriction to B is an isogeny). By Corollary 8.5.11,
we have $\dim(\widehat{A}) = \dim(A)$ and $\dim(\widehat{B}) = \dim(B)$. Therefore $m : B \times C \to A$ is
a homomorphism of abelian varieties of the same dimension, with finite kernel. By the
dimension theorem again, we conclude that the homomorphism is onto A , i.e. an isogeny.
Definition 8.9.4. An abelian variety $B \neq \{0\}$ is called simple if $\{0\}$ and $B$ are its only
abelian subvarieties.
Corollary 8.9.5. There are simple abelian subvarieties B1 , . . . , Br of A such that addi-
tion gives an isogeny m : B1 × · · · × Br −→ A .
Proof: Proceed by induction on the dimension and apply Theorem 8.9.3.
Proposition 8.9.6. Let A1 , A2 be abelian varieties over K and let Hom(A1 , A2 ) be the
set of homomorphisms from A1 to A2 . Then, with respect to addition of homomorphisms,
the group Hom(A1 , A2 ) is a torsion-free abelian group.
Proof: Let us assume that we have $m \in \mathbb{Z} \setminus \{0\}$ and $\varphi \in \mathrm{Hom}(A_1, A_2)$ such that
$[m] \circ \varphi = 0$. Let $l$ be a prime different from $\mathrm{char}(K)$ with $l \nmid m$. For $r \in \mathbb{N}$, the
restriction of $\varphi$ induces a homomorphism $A_1[l^r] \to A_2[l^r]$. Proposition 8.7.2 shows that
\[ A_i[l^r] \cong (\mathbb{Z}/l^r\mathbb{Z})^{2\dim(A_i)}. \]
Since $[m] \circ \varphi = 0$ and $l \nmid m$, we conclude
\[ \varphi|_{A_1[l^r]} = 0. \]
Now let B be a simple abelian subvariety of A1 . The above applied to B instead of A1
shows that ϕ vanishes on infinitely many points of B . Therefore the restriction of ϕ to B
is zero (take the closure of the vanishing points and use that B is simple). Corollary 8.9.5
shows that ϕ = 0 .
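The step from $[m] \circ \varphi = 0$ to the vanishing of $\varphi$ on $l^r$-torsion uses only that $m$ and $l^r$ are coprime, which is what the argument needs of the prime $l$. A short sketch, with auxiliary integers $u, v$ introduced here:

```latex
% Assumption: l does not divide m, so gcd(m, l^r) = 1 and there are u, v in Z with
\[
  u\,m + v\,l^{r} = 1 .
\]
% For x in A_1[l^r] we then get
\[
  \varphi(x) \;=\; (u\,m + v\,l^{r})\,\varphi(x)
            \;=\; u \cdot \bigl([m]\circ\varphi\bigr)(x) \;+\; v \cdot \varphi(l^{r} x)
            \;=\; 0 + 0 \;=\; 0 .
\]
```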
Remark 8.9.7. One can even prove that $\mathrm{Hom}(A_1, A_2)$ is a free abelian group of rank
at most $4\dim(A_1)\dim(A_2)$. It can also be shown that, for an isogeny $\varphi : A_1 \to A_2$,
there is an isogeny $\psi : A_2 \to A_1$ such that $\psi \circ \varphi$ is equal to multiplication
by some non-zero integer. With this in mind, one can show that the factors in Corollary 8.9.5
are uniquely determined up to isogeny and renumbering. For the proof of these results, we
refer to [212], Ch.IV, §19, or to [204], §12.
8.9.8. Complex analytically, A may be identified with a complex torus V /Λ with a positive
definite Riemann form H on V for Λ (cf. 8.2.27). An abelian subvariety B of A is given
by a subspace W of V such that Λ ∩ W is a lattice in W . Then B = W/(Λ ∩ W ) .
Now Poincaré’s complete reducibility theorem may be proved complex analytically in the
following way. Let $W^\perp$ be the orthogonal complement with respect to $H$. As $H$ is
non-degenerate (i.e. its matrix has full rank), the same holds for $E := \operatorname{Im} H$. We may view $E$
as a non-degenerate alternating bilinear form on the Q -vector space Λ ⊗ Q and hence the
orthogonal complement of Λ ∩ W in Λ ⊗ Q with respect to E has dimension equal to
2(dim(V ) − dim(W )) . It is easily checked that the orthogonal complement is spanned
by Λ ∩ W ⊥ , where ⊥ is first meant with respect to E . Now for a complex subspace
of V , the orthogonal complements with respect to E and H are the same. We conclude
that Λ ∩ W ⊥ is a lattice in W ⊥ . Let C be the abelian subvariety W ⊥ /(Λ ∩ W ⊥ ) . As
(Λ ∩ W ) ⊕ (Λ ∩ W ⊥ ) is a sublattice of Λ (using H positive definite), we get an isogeny
B × C → A . The kernel has cardinality equal to the index of the lattices.
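The claim that the orthogonal complements of a complex subspace with respect to $E$ and $H$ coincide can be checked in one line. The sketch below assumes that $H$ is $\mathbb{C}$-linear in the first variable and that $E = \operatorname{Im} H$, as for the Riemann form (8.9).

```latex
% From H(iv,w) = i H(v,w) we get  Re H(v,w) = Im H(iv,w) = E(iv,w), hence
\[
  H(v,w) \;=\; E(iv,w) \;+\; i\,E(v,w) .
\]
% Let W be a complex subspace and suppose E(w,v) = 0 for all w in W.
% Since iw lies in W as well, E(iw,v) = 0, and therefore
\[
  H(w,v) \;=\; E(iw,v) + i\,E(w,v) \;=\; 0 \qquad \text{for all } w \in W .
\]
% The converse is clear because E is the imaginary part of H.
```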
In this section, we assume that the reader is familiar with the Riemann–Roch
theorem for curves from A.13.5 and with related cohomological arguments
8.10.2. The goal of this section is to deduce the properties of the Jacobian variety
from our previous results for Picard varieties. However, this is not the historical
approach, where Weil and Chow constructed the Jacobian directly from products
of C . We first sketch Weil’s approach in 8.10.3 and then Chow’s approach in
8.10.4. Although both constructions are more elementary than the use of the Picard
variety (which involves some machinery of algebraic geometry), we should note that
they do not immediately give the universal property of the Picard variety. In the
logic of this book, our development of the theory does not follow the historical
approach.
In 8.10.5, we briefly describe the well-known analytic construction of the Jacobian
making it clear that the holomorphic differential forms on C and J are the ‘same’.
This result holds generally for any base field K , but we skip the algebraic proof
here because it is best proved in the language of schemes (see J.S. Milne [205]).
As an immediate consequence, we shall see that the Jacobian is g -dimensional.
On J , there is an important divisor Θ given as the (g − 1)-fold sum of the image
of C . The goal is to prove that Θ is ample and that the isogeny ϕΘ (cf. 8.5.1)
is an isomorphism, i.e. the Jacobian is self-dual. These facts will have important
diophantine consequences in the following chapters because we can endow J(K)
with a canonical inner product which, at least in the number field case, allows us
to work with a Euclidean norm to count rational points.
After some introductory lemmas for divisors on curves, which follow from the
Riemann–Roch theorem, we prove in Proposition 8.10.13 that C may be seen
as a closed subvariety of J . Then we study in 8.10.14–8.10.21 the interrelation
between the Poincaré classes of C and J leading to the self-duality of J and to the
ampleness of Θ in Theorem 8.10.22. In 8.10.23, we proceed complex analytically
to construct a canonical Riemann form for the Jacobian and we give Riemann’s
theorem relating the theta divisor with Riemann’s theta function.
8.10.3. The first construction goes back to Jacobi, in the analytic setting. The construction
over fields of arbitrary characteristic was obtained by A. Weil, as sketched below.
We assume that $K$ is algebraically closed. Let $S_r$ be the symmetric group on $r$ letters.
The group $S_r$ acts on $C^r$ by permuting factors. By considering symmetric functions, one
shows that $C^{(r)} := C^r/S_r$ is a smooth variety of dimension $r$, which clearly parametrizes
the effective divisors on $C$ of degree $r$.
Now consider the special case r = g . By the Riemann–Roch theorem (see A.13.6), if D
is any divisor of degree g , the linear system |D| has dimension dim H 1 (C, O(D)) ≥ 0
and therefore is not empty. Generally, its dimension will be 0 (see Lemma 8.10.10 below
for details); if the dimension is larger than 0 , D is called a special divisor. Note that,
by Roch’s part of the Riemann–Roch theorem (the duality), D is special if and only if the
linear system |KC − D| , where KC denotes a canonical divisor of C , is not empty. For
example, if g = 2 we see that D is special if and only if it is a canonical divisor. The study
of special divisors for curves of high genus is highly non-trivial; the reader who wants to
learn more on this topic may consult, for example, E. Arbarello, M. Cornalba, P. Griffiths,
and J. Harris [13]. We denote by $U_{(g)}$ the Zariski open subset of $C^{(g)}$ of non-special
effective divisors of degree $g$ (see Lemma 8.10.11 below).
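The dimension count for divisors of degree $g$ is a direct application of the Riemann–Roch theorem. In the sketch below, $h^i(D) := \dim H^i(C, O(D))$ is a shorthand introduced here.

```latex
% Riemann--Roch for deg D = g:
\[
  h^0(D) - h^1(D) \;=\; \deg D + 1 - g \;=\; 1,
  \qquad\text{so}\qquad
  \dim |D| \;=\; h^0(D) - 1 \;=\; h^1(D) \;\ge\; 0 .
\]
% Duality: h^1(D) = h^0(K_C - D), hence
\[
  D \text{ special} \iff h^1(D) > 0 \iff |K_C - D| \neq \emptyset .
\]
% g = 2: deg(K_C - D) = 2 - 2 = 0, and a divisor of degree 0 has a non-zero
% section only if it is principal, so D is special iff D \sim K_C.
```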
Recall that P0 is the base point on C and let D1 , D2 be two points on U(g) . The divisor
D1 + D2 − gP0 has degree g and if it were not special it would give a well-defined third
point D3 on U(g) , namely the unique effective divisor in |D1 + D2 − gP0 | . We show next
that there is an open dense subset V(g) of U(g) such that the addition law D1 ⊕ D2 = D3
induces a well-defined morphism
V(g) × V(g) −→ U(g) .
This is in fact a so-called birational group law on C (g) , where the sum D3 of D1 , D2 ∈
V(g) is given by the rational equivalence
D1 + D2 − gP0 ∼ D3 ≥ 0.
Now Weil gives a general process which attaches to every birational group law on a smooth
variety X a group variety G and a birational map ϕ of X into G with ϕ(ab) = ϕ(a)ϕ(b)
whenever the left-hand side is defined. Moreover, G is uniquely defined up to canonical
isomorphisms. So, in our case, we get a commutative group variety J . It can be identified
with $\mathrm{Pic}^0(C)$ in such a way that the birational map $C^{(g)} \dashrightarrow J$ is equal to the morphism
$j_{(g)} : C^{(g)} \to \mathrm{Pic}^0(C)$ given by
\[ j_{(g)}(P_1, \dots, P_g) = \mathrm{cl}([P_1] + \cdots + [P_g] - g[P_0]). \]
Since C (g) is complete, the same is true for J (see A.6.15 (c)) and so J is an abelian
variety of dimension g .
It is a point of historical significance that this last part of the construction required, and in
fact motivated, the concept of abstract variety in the sense of Weil. It still remained to see
that the Jacobian so constructed was indeed a projective variety.
8.10.4. Somewhat later Chow gave a different construction which obtained J directly as a
projective variety. Again, we assume for simplicity K algebraically closed. Chow bypassed
the difficulty created by special divisors by constructing first not the Jacobian itself, but
rather a certain projective bundle over it. The idea is the following.
Consider C (n) with n ≥ 2g − 1 . Let D be an effective divisor of degree n ; then the Serre
duality in A.10.29 gives
dim H 1 (C, O(D)) = 0,
therefore |D| is a projective space of dimension precisely n − g , which is a subvariety of
C (n) . Conversely, every closed subvariety of C (n) isomorphic to Pn−g is equal to a linear
system |D| of an effective divisor D of degree n . Just note that any line in C (n) joins
rationally equivalent divisors by the very definition of rational equivalence. Thus the variety
parametrizing the projective subspaces of C (n) of dimension n − g may be identified with
the rational equivalence classes of effective divisors on C of degree n . By the Riemann–
Roch theorem, every divisor of degree n is rationally equivalent to an effective divisor. By
means of the map $D \mapsto D - n[P_0]$, the above parameter variety may be identified with
$J = \mathrm{Pic}^0(C)$.
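The numerical input to Chow's construction can be verified as follows; the shorthand $h^i(D) := \dim H^i(C, O(D))$ is introduced here and $K_C$ denotes a canonical divisor, of degree $2g - 2$.

```latex
% Assumption: deg D = n >= 2g - 1.
\[
  \deg(K_C - D) \;=\; 2g - 2 - n \;\le\; -1 \;<\; 0
  \quad\Longrightarrow\quad
  h^1(D) \;=\; h^0(K_C - D) \;=\; 0 .
\]
% Riemann--Roch then gives
\[
  h^0(D) \;=\; n + 1 - g,
  \qquad
  \dim |D| \;=\; h^0(D) - 1 \;=\; n - g .
\]
% Example: g = 1, n = 1 gives dim|D| = 0; g = 2, n = 3 gives dim|D| = 1.
```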
In fact, the subvarieties of a projective variety X of fixed degree and fixed dimension form
an algebraic family (Chow, Van der Waerden, Cayley); constructive proofs can be obtained
via elimination theory and the theory of Cayley–Chow form. This suffices to give J the
structure of a projective variety and the only problem is to show that it is smooth, which
Chow proved by a direct calculation.
8.10.5. In the context of complex geometry there is a more familiar construction of the
Jacobian variety (for details consult [130]).
Let $\gamma$ be a 1-cycle on $C$. Then $\omega \mapsto \int_\gamma \omega$ is a linear functional on the holomorphic 1-forms on
C , whose value depends only on the homology class of γ in H1 (C, Z) . Thus we obtain a
homomorphism
H1 (C, Z) −→ H 0 (C, Ω1C )∗
of the homology group H1 (C, Z) into the dual H 0 (C, Ω1C )∗ of the space of holomorphic
1 -forms on C . This embeds H1 (C, Z) as a lattice in H 0 (C, Ω1C )∗ . Then the complex
torus
J = H 0 (C, Ω1C )∗ / H1 (C, Z)
realizes the Jacobian variety complex analytically (cf. 8.10.23 for details). We have an
embedding
\[ j : C \longrightarrow J, \quad P \longmapsto \int_{\gamma_P}, \]
where γp is any path connecting the base point P0 with P . The value j(P ) is independent
of the choice of the path. Independently of the choice of the base point P0 , we have a
homomorphism
\[ \mathrm{Pic}^0(C) \longrightarrow J, \quad \sum_{i=1}^{n} ([P_i] - [Q_i]) \longmapsto \sum_{i=1}^{n} (j(P_i) - j(Q_i)). \]
Abel’s theorem gives the injectivity and the Jacobi inversion theorem the surjectivity of
this homomorphism. There is a natural isomorphism of H 0 (J, Ω1J ) onto the dual of the
tangent space TJ,0 (use Proposition 8.2.26). Pull-back induces an isomorphism
\[ H^0(J, \Omega^1_J) \stackrel{\sim}{\longrightarrow} H^0(C, \Omega^1_C). \tag{8.11} \]
Remark 8.10.6. This isomorphism holds for any base field K . More precisely,
let $J = \mathrm{Pic}^0(C)$ be the Jacobian of $C$ and let us consider the map $j : C \to J$,
given by $P \mapsto \mathrm{cl}([P] - [P_0])$. It follows from the theory of Picard varieties that $j$
is a morphism of varieties over $K$. In fact, let $\Delta$ be the diagonal in $C \times C$ and let
$p_1, p_2$ be the projections of $C \times C$ onto $C$, then $\mathrm{cl}(\Delta) - p_1^* \mathrm{cl}([P_0]) - p_2^* \mathrm{cl}([P_0])$
is a subfamily of $\mathrm{Pic}^0(C)$ parametrized by $C$. By Theorem 8.4.13, we conclude
that j is a morphism. Then j is the j(1) in Weil’s approach 8.10.3 and j is the
same as the analytic j in 8.10.5. It can be proved that j ∗ gives the isomorphism
(8.11) (use [44], Th.8.4.1 and Prop. 8.4.2, or [205], Prop.2.2).
Corollary 8.10.7. The Jacobian variety of C has dimension g .
Proof: By Proposition 8.2.26, the tangent bundle TJ is a trivial vector bundle of
rank dim(J). By duality, the same holds for the cotangent bundle. Now recall that
the only regular functions on an irreducible complete variety over an algebraically
closed field are the constants (use A.6.15 (b), (d)). Since J is geometrically re-
duced, compatibility of cohomology and base change holds (see A.10.28). This
shows that H 0 (J, OJ ) = K and hence
dim H 0 (J, Ω1J ) = dim(J) · dim H 0 (J, OJ ) = dim(J).
The claim follows using the isomorphism (8.11).
Remark 8.10.8. An important role on the Jacobian variety $J$ is played by the theta
divisor
\[ \Theta := \underbrace{j(C) + \cdots + j(C)}_{g-1}. \]
We shall show below that Θ is indeed a divisor on J . In order to deduce el-
ementary properties of j and of the theta divisor, we need the following three
lemmas. For a divisor D and a line bundle L, it is convenient to use the notation
L(D) := L ⊗ O(D) and similarly for the sheaf of sections.
Lemma 8.10.9. Assume that the ground field K is algebraically closed. Let L be
a line bundle on C . Then for every P ∈ C , we have
dim Γ(C, L(−[P ])) ≥ dim Γ(C, L) − 1.
Equality holds if and only if P is not a base point of L.
Proof: Let sP be the canonical global section of O([P ]) corresponding to the
divisor [P ] on C . Then we identify Γ(C, L(−[P ])) with a subspace of Γ(C, L)
using the injective map $s \mapsto s \otimes s_P$. Clearly, $\Gamma(C, L(-[P]))$ is the kernel of the
evaluation map
\[ \Gamma(C, L) \longrightarrow K, \quad s \longmapsto s(P) \]
at P . This map is surjective if and only if P is not a base point of L (cf. A.5.20).
A fancier way to deduce this is the use of the skyscraper sheaf KP at P (see
Example A.10.20). Taking global sections in the short exact sequence
0 −→ L(−[P ]) −→ L −→ KP −→ 0,
where L is the sheaf of sections of L, we get the same result. Finally, using the
dimension formula from linear algebra, we get the claim.
Lemma 8.10.10. Let L be a line bundle on C and let r ∈ {1, . . . , dim Γ(C, L)}.
Then there is a dense open subset U of C r such that
\[ \dim \Gamma\Bigl( C_{\overline{K}},\; L_{\overline{K}}\Bigl( -\sum_{j=1}^{r} [P_j] \Bigr) \Bigr) = \dim \Gamma(C, L) - r \]
On the right-hand side, we have used that forming cohomology groups is compat-
ible with base change (see A.10.28). Fix a basis s1 , . . . , sn of Γ(C, L) and let
Q1 , . . . , Qr be distinct points of C . We have an exact sequence
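\[ 0 \longrightarrow \mathcal{L}_{\overline{K}}\Bigl( -\sum_{j=1}^{r} [Q_j] \Bigr) \longrightarrow \mathcal{L}_{\overline{K}} \longrightarrow \bigoplus_{j=1}^{r} \overline{K}_{Q_j} \longrightarrow 0 \]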
Proof: By Lemma 8.10.10 and its proof applied to $L = \Omega^1_C$, the set of $(P_1, \dots, P_r)$
with
\[ \dim H^0\Bigl( C_{\overline{K}},\; \Omega^1_{C_{\overline{K}}}\Bigl( -\sum_{j=1}^{r} [P_j] \Bigr) \Bigr) = g - r \]
and $P_i \neq P_j$ for all $i \neq j$ is an open dense subset of $C^r$. By the Riemann–Roch
theorem in A.13.5, this set is $U_r$.
Remark 8.10.12. For any r ∈ N , we have a map
\[ j_r : C^r \longrightarrow J, \quad (P_1, \dots, P_r) \longmapsto \mathrm{cl}\Bigl( \sum_{j=1}^{r} [P_j] - r[P_0] \Bigr). \]
tangent vectors ([148], Remark II.7.8). The first (resp. second) condition follows
by injectivity of $j$ (resp. $dj$).
8.10.14. As a divisor on J , we also consider
\[ \Theta^- := [-1]^* \Theta = \underbrace{-j(C) - \cdots - j(C)}_{g-1}. \]
In $\mathrm{Pic}(J)$, we use $\theta := \mathrm{cl}(O(\Theta))$ and $\theta^- := [-1]^* \theta$. For $a \in J$, we set
ja := τ−a ◦ j , i.e. ja is the map C → J given by ja (P ) = j(P ) − a.
The pull-back of a divisor $D'$ with respect to a morphism $\varphi : X \to X'$ of
irreducible smooth varieties over $K$ is well defined as a divisor if $\varphi(X)$ is not
contained in the support of $D'$ (see A.8.26). In this case, viewing $D'$ as a Cartier
divisor on $X'$ locally given on $U_\alpha$ by a rational function $f_\alpha$, the pull-back $\varphi^*(D')$
is given on $\varphi^{-1}(U_\alpha)$ by the rational function $f_\alpha \circ \varphi$. Note that $\varphi^*(D')$ is
well defined in $CH^1(X)$ for any divisor $D'$ on $X'$ (see A.9.18). If $\varphi$ is an
isomorphism (as $[-1]$ is in the cases above) and if $D'$ is a prime divisor, then
$\varphi^*(D') = \varphi^{-1}(D')$.
Proposition 8.10.15. Assume $K$ algebraically closed. For all $(P_1, \dots, P_g) \in C^g$,
we have the rational equivalence relation
\[ \sum_{i=1}^{g} [P_i] \sim j_a^*(\Theta^-), \]
where $a := j_g(P_1, \dots, P_g)$.
If K is not algebraically closed, then Corollary 8.4.10 (a) shows that the rational
equivalence holds over any field of definition for P1 , . . . , Pg .
Proof: The idea is to show that the intersection of j(C) and Θ− + a is transverse
for generic (P1 , . . . , Pg ). Then the proposition will follow in the generic case
from our first step below. An application of the theorem of the square will lead to
the general case.
First step: For $(P_1, \dots, P_g) \in C^g$ with $\dim \Gamma\bigl(C, O\bigl(\sum_{i=1}^{g} [P_i]\bigr)\bigr) = 1$, we have
\[ (\Theta^- + a) \cap j(C) = \{ j(P_1), \dots, j(P_g) \}. \]
To prove the first step, we note that for $Q \in C$, we have $j(Q) \in \Theta^- + a$ if and
only if there are $Q_1, \dots, Q_{g-1} \in C$ such that
\[ [Q] - [P_0] \sim (g - 1)[P_0] - \sum_{i=1}^{g-1} [Q_i] + \sum_{i=1}^{g} [P_i] - g[P_0]. \]
This is equivalent to
\[ [Q] + \sum_{i=1}^{g-1} [Q_i] \sim \sum_{i=1}^{g} [P_i]. \]
By assumption, the linear system $\bigl| \sum_{i=1}^{g} [P_i] \bigr|$ consists only of $\sum_{i=1}^{g} [P_i]$. This
proves immediately the first step.
Second step: For $1 \le r \le g$ and $(P_1, \dots, P_r) \in U_r$ (defined in Lemma 8.10.11),
the differential $dj_r : T_{C^r,(P_1,\dots,P_r)} \to T_{J,j_r(P_1,\dots,P_r)}$ is injective.
As in the proof of Proposition 8.10.13, it is enough to show that the dual map is
surjective. The kernel of the dual map
\[ T^*_{J,j_r(P_1,\dots,P_r)} \longrightarrow T^*_{C^r,(P_1,\dots,P_r)} \cong \bigoplus_{i=1}^{r} T^*_{C,P_i} \]
using (8.11) on page 271 and using that regular functions on C are constant (see
A.6.15 (b), (d)). By the Riemann–Roch theorem in A.13.5 (cf. proof of Lemma
8.10.11), we conclude that the kernel has dimension g − r . Since dim(J) = g
(Corollary 8.10.7), this proves the second claim.
Third step: For generic (P1 , . . . , Pg ) ∈ C g , the intersection of Θ− + a and j(C)
is transverse.
For the definition of transverse intersection, see Example A.9.22. Generic means
that it holds in an open dense subset of C g . Thus we may assume (P1 , . . . , Pg ) ∈
Ug . From the first step, we get
{j(P1 ), . . . , j(Pg )} = (Θ− + a) ∩ j(C).
We have to omit the singular part of $\Theta^- + a$, i.e. we do not want $j(P_i) \in \Theta^-_{\mathrm{sing}} + a$
for any $i \in \{1, \dots, g\}$. This incidence is equivalent to
\[ j(P_1) + \cdots + j(P_{i-1}) + j(P_{i+1}) + \cdots + j(P_g) \in \Theta_{\mathrm{sing}}. \]
Clearly, these are closed conditions of codimension $\ge 1$ on $C^g$. So we may
assume that $j(C)$ intersects $\Theta^- + a$ only in the smooth locus. It remains to show
that the tangent spaces $T_{j(C),j(P_i)}$ and $T_{\Theta^- + a,\,j(P_i)}$ span $T_{J,j(P_i)}$. Without loss
of generality, we may assume $i = 1$.
We consider the morphism
\[ \varphi : C^g \to J, \quad (P_1, \dots, P_g) \mapsto j(P_1) - j(P_2) - \cdots - j(P_g). \]
For $\mathbf{P} := (P_1, \dots, P_g) \in U_g$, we claim that $d\varphi_{\mathbf{P}}$ is injective. Again, it is enough
to show surjectivity of the dual map
\[ (\delta\varphi)_{\mathbf{P}} : T^*_{J,\varphi(\mathbf{P})} \longrightarrow T^*_{C^g,\mathbf{P}} \cong \bigoplus_{i=1}^{g} T^*_{C,P_i}. \]
We may identify the left-hand side with $\Gamma(C, \Omega^1_C)$ (via the isomorphism (8.11) on
page 271 and Proposition 8.2.26). For $\omega \in \Gamma(C, \Omega^1_C)$, we have $(\delta j)_P(\omega) = \omega(P)$
and Corollary 8.2.25 leads easily to $\delta(-j)_P(\omega) = -\omega(P)$ for any $P \in C$. This
shows
\[ (\delta\varphi)_{\mathbf{P}}(\omega) = (\omega(P_1), -\omega(P_2), \dots, -\omega(P_g)) \in \bigoplus_{i=1}^{g} T^*_{C,P_i} \]
and, as in the second step, we deduce that $(d\varphi)_{\mathbf{P}}$ is one-to-one. For dimension
reasons, $(d\varphi)_{\mathbf{P}}$ is an isomorphism for all $\mathbf{P} \in U_g$.
We note that $(dj)_{P_1} : T_{C,P_1} \stackrel{\sim}{\longrightarrow} T_{j(C),j(P_1)}$ by Proposition 8.10.13. As the claim
has to be proved only generically, we may assume that $(P_2, \dots, P_g) \in U_{g-1}$. The
second step shows that
\[ d(-j_{g-1}) : T_{C^{g-1},(P_2,\dots,P_g)} \stackrel{\sim}{\longrightarrow} T_{\Theta^-,\,-j_{g-1}(P_2,\dots,P_g)}. \]
We have also used that we map to the smooth locus of $\Theta^-$. We apply $(d\varphi)_{\mathbf{P}}$ to
the canonical direct sum decomposition
\[ T_{C^g,(P_1,\dots,P_g)} \cong T_{C,P_1} \oplus T_{C^{g-1},(P_2,\dots,P_g)} \]
from A.7.17. These four isomorphisms lead to the interior direct sum decomposition
\[ T_{J,\varphi(\mathbf{P})} = T_{j(C) - j_{g-1}(P_2,\dots,P_g),\,\varphi(\mathbf{P})} \oplus T_{\Theta^- + j(P_1),\,\varphi(\mathbf{P})}. \]
The third claim now comes by applying the isomorphism $d\tau_{j_{g-1}(P_2,\dots,P_g)}$.
Fourth step: Proof of the proposition for generic (P1 , . . . , Pg ) ∈ C g .
We choose a generic (P1 , . . . , Pg ) on C g . By the third step, we know that the
intersection of Θ− + a and j(C) is transverse, hence all components in the in-
tersection product have multiplicity one (Example A.9.22). By the first step, we
get
\[ (\Theta^- + a) \cdot j(C) = \sum_{i=1}^{g} j(P_i). \]
If we identify $C$ with $j(C)$, then it follows from the definitions that $j_a^*(\Theta^-)$
corresponds to the above intersection product. This proves the fourth step.
Fifth step: Proof of the proposition for all (P1 , . . . , Pg ) ∈ C g .
We note first that for two dense open subsets U, V of J , we have J = U − V .
This follows easily from the fact that the intersection of U and a + V is not empty
for all a ∈ J . By the fourth step and since jg is a surjective closed morphism,
there is an open dense subset U of J such that
\[ \sum_{i=1}^{g} [P_i] \sim j_a^*(\Theta^-) \]
As $p_C|_{\{P_0\} \times C} = 0$ as well, the claim follows from the seesaw principle (see
Corollary 8.4.4).
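Proposition 8.10.20. Let $\varphi_\theta, \varphi_{\theta^-}$ be the morphisms $J \to \widehat{J}$ introduced in 8.5.1
and let $c := m^*\theta^- - p_1^*\theta^- - p_2^*\theta^- \in \mathrm{Pic}(J \times J)$, as in Proposition 8.10.19.
Then
\[ (\mathrm{id}_J \times \varphi_{\theta^-})^*(p_J) = (\mathrm{id}_J \times \varphi_\theta)^*(p_J) = c. \]
Proof: Let $a \in J$. Then
\[ \varphi_{\theta^-}(a) = \tau_a^*(\theta^-) - \theta^- \]
by definition. By Theorem 8.4.13, we get
\[ (\mathrm{id}_J \times \varphi_{\theta^-})^*(p_J)|_{J \times \{a\}} = p_J|_{J \times \{\varphi_{\theta^-}(a)\}} = \tau_a^*(\theta^-) - \theta^-. \]
The latter is equal to $c|_{J \times \{a\}}$ (see proof of Proposition 8.10.19) and so we obtain,
by the seesaw principle in 8.4.4, that
\[ (\mathrm{id}_J \times \varphi_{\theta^-})^*(p_J) = c. \]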
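The restriction of $c$ used in this proof can be computed by hand. The sketch below assumes that $\tau_a$ denotes translation $x \mapsto x + a$ and identifies $J \times \{a\}$ and $\{0\} \times J$ with $J$ in the obvious way.

```latex
% On J x {a}:  m(x,a) = x + a = \tau_a(x),  p_1(x,a) = x,  p_2 is constant.
\[
  c|_{J \times \{a\}}
  \;=\; \tau_a^*(\theta^-) \;-\; \theta^- \;-\; 0
  \;=\; \tau_a^*(\theta^-) - \theta^- .
\]
% On {0} x J:  m(0,y) = y,  p_1 is constant,  p_2(0,y) = y.
\[
  c|_{\{0\} \times J}
  \;=\; \theta^- \;-\; 0 \;-\; \theta^-
  \;=\; 0 .
\]
% Two classes on J x J that agree on every slice J x {a} and on {0} x J are the
% kind of data to which the seesaw principle applies.
```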
and the corresponding class in $\mathrm{Pic}(J)$ is denoted by $\theta$. Let $\theta^- := [-1]^*\theta$ and let
$\varphi_\theta : J \to \widehat{J}$ be the natural morphism introduced in Theorem 8.5.1. There are three
canonical morphisms from $J \times J$ onto $J$, namely addition $m$, first projection $p_1$
and second projection $p_2$. The pull-back of the Poincaré class $p_J \in \mathrm{Pic}(J \times \widehat{J})$
by $\mathrm{id}_J \times \varphi_\theta$ is equal to the class
\[ c := m^*\theta^- - p_1^*\theta^- - p_2^*\theta^- \]
and it follows from the proof of Proposition 8.10.20 that
\[ c = m^*\theta - p_1^*\theta - p_2^*\theta. \]
The next theorem shows that $\varphi_\theta$ is an isomorphism. We can identify $J$ and $\widehat{J}$ by
this natural isomorphism, so it makes sense to consider $J$ as self-dual and $c$ as
the Poincaré class of $J$.
Theorem 8.10.22. The map $\varphi_\theta$ is an isomorphism of $J$ onto $\widehat{J}$ whose inverse is
$-\widehat{j}$. Moreover, $\theta$ is ample.
Proof: We first show $-\widehat{j} \circ \varphi_\theta = \mathrm{id}_J$. For $a \in J$, we have
\[ -\widehat{j} \circ \varphi_\theta(a) = -j^*(\tau_a^*(\theta) - \theta). \]
Since $\tau_a^*(\theta) - \theta \in \widehat{J} = \mathrm{Pic}^0(J)$ is odd (Theorem 8.8.3), we may replace $-j^*$
on the right-hand side by $j^* \circ [-1]^*$, leading to
\[ -\widehat{j} \circ \varphi_\theta(a) = j^* \circ \tau_{-a}^*(\theta^-) - j^*(\theta^-) = j_a^*(\theta^-) - j^*(\theta^-) = a \]
by Corollary 8.10.17. We get $-\widehat{j} \circ \varphi_\theta = \mathrm{id}_J$. In particular, $\varphi_\theta$ is injective. Since
$J$ and $\widehat{J}$ have the same dimension (Corollary 8.5.11), the dimension theorem in
8.2.16 proves that $\varphi_\theta$ is also surjective, proving the first claim. By Proposition
8.5.5, $\theta$ has to be ample.
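The replacement of $-j^*$ by $j^* \circ [-1]^*$ in this proof rests on a small computation with translations. The sketch below assumes that $\tau_a$ is translation $x \mapsto x + a$ and uses $j_a = \tau_{-a} \circ j$ from 8.10.14.

```latex
% \tau_a \circ [-1] = [-1] \circ \tau_{-a}, since both send x to -x + a.  Hence
\[
  [-1]^*\bigl(\tau_a^*(\theta) - \theta\bigr)
  \;=\; (\tau_a \circ [-1])^*\theta - [-1]^*\theta
  \;=\; \tau_{-a}^*\,[-1]^*\theta - \theta^-
  \;=\; \tau_{-a}^*(\theta^-) - \theta^- .
\]
% Pulling back along j and using j_a = \tau_{-a} \circ j:
\[
  j^*\tau_{-a}^*(\theta^-) - j^*(\theta^-)
  \;=\; (\tau_{-a} \circ j)^*(\theta^-) - j^*(\theta^-)
  \;=\; j_a^*(\theta^-) - j^*(\theta^-) .
\]
```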
8.10.24. We end this chapter by interpreting the situation complex analytically (for details,
cf. [130]). We claimed in 8.10.5 that the Jacobian variety J is given by the complex torus
\[ H^0(C, \Omega^1_C)^* / H_1(C, \mathbb{Z}). \tag{8.12} \]
Note first that we have a complex isomorphism $H^{0,1}(C) \stackrel{\sim}{\longrightarrow} H^0(C, \Omega^1_C)^*$ given by
mapping the $(0,1)$-form $\rho$ to the linear functional $\int_C \,\cdot\, \wedge \rho$. By Serre duality in A.10.29 and
8.4.15, we conclude that (8.12) is indeed the Jacobian variety.
We give a direct argument that (8.12) is a complex abelian variety. The intersection number
induces a canonical alternating $\mathbb{Z}$-valued bilinear form $E$ on $H_1(C, \mathbb{Z})$ given by
$E(\gamma_1, \gamma_2) := \gamma_2 \cdot \gamma_1$. It is enough to show that $E$ is the imaginary part of a
positive definite Hermitian form on $H^{0,1}(C)$. By Poincaré duality, we have an isomorphism
$H_1(C, \mathbb{R}) \stackrel{\sim}{\longrightarrow} H^1_{dR}(C, \mathbb{R})$ by mapping the cycle $\gamma$ to $\eta_\gamma$ characterized by
\[ \int_\gamma \eta = \int_C \eta \wedge \eta_\gamma \qquad (\eta \in H^1_{dR}(C, \mathbb{R})). \]
The lattice $H_1(C, \mathbb{Z})$ is realized in $H^{0,1}(C)$ by mapping $\gamma$ to the projection $\eta_\gamma^{0,1}$ of $\eta_\gamma$
with respect to the Hodge decomposition. Now recall that intersection products of cycles
correspond via Poincaré duality to wedge products of forms (cf. [130], p.59), i.e.
\[ \gamma \cdot \delta = \int_C \eta_\gamma \wedge \eta_\delta. \]
Then it is easily seen that $E$ is the imaginary part of the positive definite Hermitian form
\[ H(\rho, \mu) = -2i \int_C \rho \wedge \overline{\mu} \tag{8.13} \]
on $H^{0,1}(C)$.
Topologically, the compact Riemann surface $C$ is characterized by its $g$ holes and there is
a basis $\gamma_1, \dots, \gamma_{2g}$ of $H_1(C, \mathbb{Z})$ such that the intersection matrix has the form
\[ (\gamma_j \cdot \gamma_k) = \begin{pmatrix} 0 & I_g \\ -I_g & 0 \end{pmatrix}, \tag{8.14} \]
where $I_g$ is the $g \times g$ identity matrix (to see this, choose $\gamma_1, \dots, \gamma_g$ “around” the holes
and $\gamma_{g+1}, \dots, \gamma_{2g}$ “through” the holes). We choose a basis $\omega_1, \dots, \omega_g$ of $H^0(C, \Omega^1_C)$
such that the period matrix has the form
\[ \Bigl( \int_{\gamma_k} \omega_j \Bigr)_{j=1,\dots,g;\; k=1,\dots,2g} = (I_g \;\; Z). \]
Then $\omega_1, \dots, \omega_g$ give rise to complex coordinates on the torus $J$, which we identify with
$\mathbb{C}^g/\Lambda$ such that $\gamma_1, \dots, \gamma_g$ correspond to the standard basis. Since $H$ is a positive
definite Riemann form, we easily deduce that $Z$ is a symmetric $g \times g$ matrix with $\operatorname{Im} Z$
positive definite. We have $\Lambda = \mathbb{Z}^g + Z \cdot \mathbb{Z}^g$ as in the example of 8.6.13. Moreover, the
Riemann form in (8.13) equals the one in (8.9) on page 262. By (8.14) and 8.6.13 again, we
deduce that the Riemann theta function
\[ \theta(v, Z) = \sum_{k \in \mathbb{Z}^g} \exp(\pi i\, k^t Z k + 2\pi i\, k^t v) \]
corresponds to a global section $s$ of a line bundle $L$ on $J$, with Riemann form $H$ and with
$\dim \Gamma(J, L) = 1$. Riemann’s theorem (cf. [130], p.338) states that the theta divisor $\Theta$
from 8.10.8 is equal to $\mathrm{div}(s)$ up to translation.
The presentation of the material is as in the classical treatises of S. Lang [165] and
Mumford [212], to which our exposition owes a great deal. We have also used
the modern survey articles of J.S. Milne [204], [205] at several places. The reader
interested in going deeper into the complex analytic theory of abelian varieties
may consult Griffiths and Harris [130].
The theory of abelian varieties over an arbitrary field was initiated by A. Weil
[327], motivated by his proof of the Riemann hypothesis, where we also find his
construction of the Jacobian. Chow’s construction is given in W.L. Chow [67].
An important part of the theory, which we have left aside, is the study of ℓ-adic
representations and Tate groups, for which we refer to Mumford [212].
On another point, while Weierstrass models for elliptic curves are very useful and
simple to study, it is very difficult to describe abelian varieties of dimension > 1
by homogeneous equations. This leads to a deep study of theta functions for which
the reader may consult D. Mumford’s Tata lectures [214].
9 NÉRON–TATE HEIGHTS
9.1. Introduction
In Section 9.5, which is based on Section 2.7, we describe Néron’s canonical lo-
cal heights. This complements the picture and will not be used anywhere else in
this book. In Section 9.6, we use the theory of heights to deduce and extend a
result of Sprindžuk on the distribution of poles and, as an application, we give
a generalization of an old theorem of Runge leading in turn to a new proof of
Hilbert’s irreducibility theorem, based on the theory of local heights. This section
depends on additional material requiring more knowledge from the reader and may
be skipped in a first reading.
Chapters 2 (at least Sections 2.2–2.4) and 8 are the prerequisites for reading this
chapter.
Let K be a field with product formula and let A be an abelian variety over K .
All varieties and morphisms are assumed to be defined over K .
9.2.1. Let X be a complete variety over K. By Theorem 2.3.8, we have the height
homomorphism
$$\mathbf{h} : \operatorname{Pic}(X) \longrightarrow \mathbb{R}^{X(K)}/O(1),$$
which associates to c the equivalence class of heights $\mathbf{h}_c$.
In general, there is no canonical height function associated to c ∈ Pic(X): the functions in the class $\mathbf{h}_c$ are only determined up to bounded functions. But on an abelian variety, there is a canonical choice $\hat{h}_c$ of a height function in every class $\mathbf{h}_c$, characterized by good behaviour with respect to the group operation. The reader is assumed to be familiar with the notions and results from Section 8.6.
By the theorem of the cube in 8.6.11, for every c ∈ Pic(A) we have a quadratic
function
$$\operatorname{Mor}(X, A) \longrightarrow \operatorname{Pic}(X), \qquad \varphi \longmapsto \varphi^*(c). \tag{9.1}$$
Note that the decomposition $c = c_+ + c_-$ of c into an even part $c_+$ and an odd part $c_-$ (Corollary 8.8.2) gives a decomposition of our quadratic function into a quadratic form $\varphi \mapsto \varphi^*(c_+)$, hence with the homogeneity property
$$(n\varphi)^*(c_+) = n^2 \varphi^*(c_+)$$
(see Proposition 8.7.1), and into a linear form $\varphi \mapsto \varphi^*(c_-)$ (see Theorem 8.8.3).
The composite of the height homomorphism and the quadratic function in (9.1) is
a quadratic function
$$q : \operatorname{Mor}(X, A) \longrightarrow \mathbb{R}^{X(K)}/O(1), \qquad \varphi \longmapsto \mathbf{h}_{\varphi^*(c)}.$$
We conclude that $q = q_+ + q_-$ for the quadratic form $q_+(\varphi) := \mathbf{h}_{\varphi^*(c_+)}$ and the linear form $q_-(\varphi) := \mathbf{h}_{\varphi^*(c_-)}$. Since 2 is invertible in the abelian group
9.2. Néron–Tate heights 285
$= n^d\, \hat{h}(x)$.
We have proved the existence of a homogeneous function $\hat{h}$ of degree d for N,
such that $\hat{h} - h$ remains bounded.
If we combine 9.2.1–9.2.4, then we obtain canonical global height functions asso-
ciated to every class of Pic(A). This procedure is called Tate’s limit argument.
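Tate's limit argument can be tried out numerically in the simplest dynamical setting. The sketch below is an illustration under assumptions that are not taken from the text: it works over Q, uses the logarithmic Weil height of a rational number, and takes a polynomial map f of degree d (the helper names `weil_height` and `tate_limit_approximations` are hypothetical). It lists the quotients h(f^k(x))/d^k, whose limit plays the role of the homogeneous function of degree d.

```python
from fractions import Fraction
import math


def weil_height(x: Fraction) -> float:
    """Logarithmic Weil height of a rational number a/b in lowest terms."""
    return math.log(max(abs(x.numerator), abs(x.denominator), 1))


def tate_limit_approximations(x: Fraction, f, degree: int, steps: int) -> list:
    """Return the quotients h(f^k(x)) / degree^k for k = 0, ..., steps."""
    approximations = []
    for k in range(steps + 1):
        approximations.append(weil_height(x) / degree ** k)
        x = f(x)  # exact rational arithmetic, no rounding in the orbit
    return approximations


# f(x) = x^2: the quotients are constant, so the limit is the Weil height itself.
constant = tate_limit_approximations(Fraction(3, 2), lambda x: x * x, 2, 6)

# f(x) = x^2 - 1: consecutive quotients differ by at most log(2) / 2^(k+1),
# so they form a Cauchy sequence converging geometrically.
cauchy = tate_limit_approximations(Fraction(3, 2), lambda x: x * x - 1, 2, 8)
```

For f(x) = x^2 the sequence is constant, matching the statement that the canonical height of the power map is the Weil height; for f(x) = x^2 - 1 the bounded difference between h(f(y)) and 2h(y) is what forces convergence.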
Definition 9.2.6. The height function $\hat{h}_c := \hat{h}_{c_+} + \hat{h}_{c_-}$ is called the Néron–Tate
height associated to c.
All the formalism in Chapter 2 involving heights on abelian varieties is now true
for Néron–Tate heights as exact equations, and not just up to bounded functions.
More precisely:
Theorem 9.2.7. The Néron–Tate heights on abelian varieties have the following
properties:
Suppose that $\hat{h}_\varphi(x) = 0$. Then
$$\hat{h}_\varphi(\varphi^r(x)) = \lambda^r\, \hat{h}_\varphi(x) = 0$$
9.3. The bilinear form 289
(b) $\hat{h}_f(f(x)) = n\, \hat{h}_f(x)$.
Remark 9.2.12. Note that Theorem 9.2.10 extends Kronecker’s theorem from 1.5.9. Indeed, if $f(x) = x^n$ in the situation of Example 9.2.11, we have $\hat{h}_f(x) = h(x)$, where h
is the Weil height. The preperiodic points of f are 0, ∞ and the roots of unity.
In the next section, we shall prove a counterpart of Kronecker’s theorem for abelian vari-
eties, namely Theorem 9.3.5, proved along similar lines.
9.3.2. We would like the bilinear form $b_{\mathbb{R}}(x, y)$ to determine a scalar product
and an associated norm $\|x\|^2 = b_{\mathbb{R}}(x, x)$ on $M_{\mathbb{R}}$.
To this end, it is of course necessary that b(x, x) > 0 for every x ∈ M \ {0}.
Suppose this is the case. By clearing denominators, b(x, x) > 0 for x ∈ M Q \
{0}; therefore, by continuity we also have bR (x, x) ≥ 0 for x ∈ M R \ {0}.
Note however that this is not enough for bR to be positive definite, as is seen in
the following example: If α is a transcendental number in R , then the quadratic
form on $\mathbb{R}^2$ given by $q(x) := (x_1 - \alpha x_2)^2$ is positive semidefinite. We have
$q(\alpha, 1) = 0$, but $q(x) > 0$ whenever $x \in \mathbb{Q}^2 \setminus \{0\}$, because α is transcendental.
Thus some care is needed if we want to obtain a scalar product from the bilinear
form b .
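The phenomenon in this example can be observed numerically. The following sketch is only an illustration: it uses the floating-point value of π as a stand-in for a transcendental α, and the helper names are hypothetical. It shows that q is strictly positive on the integer points tested, while infinitely many lattice points keep q below a fixed bound, which is exactly the failure of finiteness that the next lemma characterizes.

```python
import math

ALPHA = math.pi  # floating-point stand-in for a transcendental number


def q(x1: int, x2: int) -> float:
    """The positive semidefinite form q(x) = (x1 - alpha * x2)^2."""
    return (x1 - ALPHA * x2) ** 2


def lattice_points_with_small_q(max_x2: int) -> list:
    """For each x2 = 1..max_x2 pick the integer x1 nearest to alpha * x2.

    Then |x1 - alpha * x2| <= 1/2, so q(x1, x2) <= 1/4 for every such point.
    """
    return [(round(ALPHA * x2), x2) for x2 in range(1, max_x2 + 1)]


points = lattice_points_with_small_q(1000)
values = [q(x1, x2) for (x1, x2) in points]
smallest = min(values)  # attained near the classical approximation 355/113
```

Every one of the 1000 points satisfies 0 < q ≤ 1/4, so the set of lattice points with q bounded by 1/4 cannot be finite, although q never vanishes on a nonzero rational point.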
Lemma 9.3.3. With the notation and assumptions of 9.3.1, the bilinear form $b_{\mathbb{R}}$ is
positive definite if and only if for every finitely generated subgroup M′ of M and
for every C > 0 the set
$$\{x \in M' \mid b_{\mathbb{R}}(x, x) \le C\}$$
is finite.
Proof: We may assume that M is finitely generated. Since M is torsion-free, it is
a lattice in M R . If bR is a scalar product, then there are only finitely many lattice
points in a bounded set (Proposition C.2.4). This proves the result in one direction.
Conversely, assume that $b_{\mathbb{R}}$ is not positive definite. We may assume that $b_{\mathbb{R}}$ is
positive semidefinite, since otherwise the set $\{x \in M \mid b_{\mathbb{R}}(x, x) \le C\}$ is clearly
infinite. There is a $y \in M_{\mathbb{R}} \setminus \{0\}$ such that $b_{\mathbb{R}}(y, y) = 0$. For $b_{\mathbb{R}}$ positive
semidefinite, the Cauchy–Schwarz inequality is valid. Therefore, y is in the kernel
of $b_{\mathbb{R}}$. By construction, the restriction of $b_{\mathbb{R}}$ to M × M has trivial kernel and hence
$y \notin M_{\mathbb{Q}}$.
Choose a basis $x_1, \dots, x_r$ of M. It is also a basis of $M_{\mathbb{R}}$. For any n ∈ N, there
is a $y_n \in M$ such that the coordinates of $y_n - ny$ are in the interval [0, 1]. The
elements $y_n - ny$ are contained in the compact cube
$$\left\{ \sum_{i=1}^{r} \alpha_i x_i \ \Big|\ 0 \le \alpha_i \le 1 \right\},$$
while on the other hand, because y lies in the kernel of $b_{\mathbb{R}}$,
$$b_{\mathbb{R}}(y_n, y_n) = b_{\mathbb{R}}(y_n - ny, y_n - ny).$$
Since $b_{\mathbb{R}}$ is continuous, it is bounded on that cube, say by C. Since $y \notin M_{\mathbb{Q}}$, the
set $\{y_n \mid n \in \mathbb{N}\}$ is infinite and contained in
$$\{x \in M \mid b_{\mathbb{R}}(x, x) \le C\}.$$
This proves the lemma.
Theorem 9.3.5. Let K be a number field and let c be ample and even. Then $\hat{h}_c$
vanishes exactly on the torsion subgroup of A(K). Moreover, there is a unique
scalar product ⟨ , ⟩ on the abelian group $A(K) \otimes_{\mathbb{Z}} \mathbb{R}$ such that
$$\hat{h}_c(x) = \langle x \otimes 1, x \otimes 1 \rangle$$
Proposition 9.3.6. Let c ∈ Pic(A) and let b be the symmetric bilinear form
associated to $\hat{h}_c$. Moreover, let $\delta \in \operatorname{Pic}(A \times \hat{A})$ be the Poincaré class of A and
let $\varphi_c : A \longrightarrow \hat{A}$ be the homomorphism of Theorem 8.5.1. Then
$$b(a, a') = \hat{h}_\delta(a, \varphi_c(a'))$$
Proof: By definition
$$b(a, a') = \hat{h}_c(a + a') - \hat{h}_c(a) - \hat{h}_c(a') = \hat{h}_c \circ \tau_{a'}(a) - \hat{h}_c(a) - \hat{h}_c(a').$$
For the moment, let us keep a′ fixed and view the above as functions of a. By
Theorem 2.3.8, we conclude that
$$\hat{h}_c + b(\cdot, a')$$
is a representative in the class $\mathbf{h}_{\tau_{a'}^*(c)}$. Then the representative above is a quadratic
function too, being a sum of a quadratic function and a linear form. Now Theorem
9.2.8 shows that
$$\hat{h}_{\tau_{a'}^*(c)}(a) = \hat{h}_c(a) + b(a, a')$$
and hence by Theorem 9.2.7 we get
$$b(a, a') = \hat{h}_{\varphi_c(a')}(a).$$
and
$$\hat{h}_{c'} = \hat{h}_\delta(\cdot, c').$$
Corollary 9.3.8. Let $c' \in \operatorname{Pic}^0(A)$ and let c ∈ Pic(A) be an even ample class.
Then
$$\hat{h}_{c'} = O(\hat{h}_c^{1/2}).$$
where θ is the theta divisor 8.10.8, m is the sum homomorphism and pi is the
projection of A × A onto the ith factor.
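$$\hat{h}_\delta(a, a) = \hat{h}_{\theta + \theta^-}(a)$$
9.4.2. In the light of Proposition 9.4.1, we will use the following notation for
a, a′ ∈ J(K):
$$\langle a, a' \rangle := \hat{h}_\delta(a, a'), \qquad |a| := \hat{h}_\delta(a, a)^{1/2} = \hat{h}_{\theta + \theta^-}(a)^{1/2}.$$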
As shown by Mumford, this formula has some rather interesting consequences for
curves of genus g ≥ 2.
Proposition 9.4.5. Assume that C has genus g ≥ 2 and let $\cos\alpha \in (\frac{1}{g}, 1)$,
ε > 0. Then there is a constant B = B(C, P0, ε) > 0 such that for any pair
(P, Q) ∈ C(K) one of the following four possibilities occurs:
(a) P = Q;
(b) $\langle j(P), j(Q) \rangle < (\cos\alpha) \cdot |j(P)| \cdot |j(Q)|$;
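$$\cos\alpha \le \frac{1}{2g}\left(r + \frac{1}{r}\right) + O\!\left(\frac{1}{|w|}\right).$$
We multiply the last inequality by 2g, note that 1/r ≤ 1 and find
$$2g\cos\alpha - 1 - O\!\left(\frac{1}{|w|}\right) \le r = \frac{|z|}{|w|}.$$
If we choose B sufficiently large, then either (c) or (d) holds.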
Corollary 9.4.6. With the notation of Proposition 9.4.5, let P, Q be points on
C. Then, if j(P) − j(Q) is in the kernel of ⟨ , ⟩, either P = Q or |j(P)| =
|j(Q)| ≤ B. In particular, if j(P) − j(Q) is a torsion point and |j(Q)| > B,
then P = Q.
Proof: It is an immediate consequence of the preceding Proposition 9.4.5.
9.4.7. Our next goal is to count rational points of C (still assuming g ≥ 2), using
a procedure similar to that used in the proof of Theorem 5.2.1. As in Section 9.3,
the canonical bilinear form $b = \hat{h}_\delta$ extends to a symmetric positive semidefinite
bilinear form $b_{\mathbb{R}}$ on $J(K) \otimes_{\mathbb{Z}} \mathbb{R}$. Let $N_{\mathbb{R}}$ be its kernel; then $b_{\mathbb{R}}$ induces a scalar
product on $E := (J(K) \otimes_{\mathbb{Z}} \mathbb{R})/N_{\mathbb{R}}$, again denoted by ⟨ , ⟩. By Corollary 9.4.6
above, the map
$$i : C(K) \to E, \qquad P \mapsto j(P) \otimes 1 + N_{\mathbb{R}},$$
is one-to-one on the subset of points P such that |j(P)| > B.
Definition 9.4.8. A point P ∈ C(K) is called small if |j(P )| ≤ B , otherwise it
is called large.
9.4. Néron–Tate heights on Jacobians 297
9.4.9. We may choose (and fix) 0 < α < π/2 and ε > 0 such that $\cos\alpha \in (\frac{1}{g}, 1)$
and $\lambda := 2g\cos\alpha - 1 - \varepsilon > 1$.
In the euclidean space E, we have the following geometric interpretation of Proposition 9.4.5. If P, Q are different large points such that i(P) and i(Q) include an
angle ≤ α and if |j(P)| ≤ |j(Q)|, then λ|j(P)| ≤ |j(Q)|. This shows that we
have gaps between points on C pointing in approximately the same direction.
Let us consider the cone T
with center 0, angle α/2 and axis through a ∈ E. We order the large points in
C(K) mapping to T in a sequence $Q_0, Q_1, \dots$ by increasing norm $|j(Q_k)|$.
The above shows $|j(Q_k)| \ge \lambda^k |j(Q_0)|$ for every k. For H > B, let $n_T(H)$ be
the number of large points Q ∈ C(K) mapping to T with |j(Q)| ≤ H. We get
$$n_T(H) \le \left\lceil \frac{\log(H/B)}{\log\lambda} \right\rceil.$$
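The counting step can be made concrete with a few lines of code. The sketch below is an illustration with hypothetical helper names and made-up sample values for B, λ and H; it reads the garbled brackets in the bound as a ceiling, which is what the gap argument yields, and compares the bound with the extremal sequence whose norms grow by exactly the factor λ.

```python
import math


def gap_bound(H: float, B: float, lam: float) -> int:
    """Bound ceil(log(H/B) / log(lam)) for large points in one cone."""
    return math.ceil(math.log(H / B) / math.log(lam))


def count_extremal_sequence(H: float, lam: float, first_norm: float) -> int:
    """Count the norms first_norm * lam^k (k = 0, 1, ...) not exceeding H.

    This is the densest sequence allowed by the gap principle, because each
    norm is at least lam times the previous one.
    """
    count = 0
    norm = first_norm
    while norm <= H:
        count += 1
        norm *= lam
    return count


# Sample values (assumptions for illustration only).
B, LAM, H = 1.0, 2.0, 1000.0
bound = gap_bound(H, B, LAM)
extremal = count_extremal_sequence(H, LAM, first_norm=1.001 * B)
```

With these sample values the extremal sequence attains the bound, so the estimate cannot be improved without further information on the first large point.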
9.4.10. The above bound for nT (H) is uniform with respect to T (for fixed
angle) and yields a counting of all large K -rational points of C mapping to T .
This can be used, in some circumstances, to count all large points in C(K) with
bounded height. To this end, it is necessary to assume that J(K) is a finitely gen-
erated group. Another possibility consists in fixing a priori a finitely generated
subgroup Γ of J(K), and consider only the subset of large points P ∈ C(K) for
which j(P ) ∈ Γ . The question whether we can take J(K) for such a group Γ
can then be examined independently. As we shall see later in the chapter dedicated
to the Mordell–Weil theorem, J(K) is finitely generated if K is a number field
or K is a function field over a finite field.
Thus we fix a subgroup Γ of J(K) of finite rank r := rankQ (Γ), where the rank
is the maximum number of Z -linearly independent elements in Γ .
We associate to Γ the finite-dimensional real vector subspace EΓ spanned by the
image of Γ in E . It is clear that dim(EΓ ) ≤ r .
For x ∈ E \ {0}, we set ν(x) = x/|x|. Then ν maps cones to spherical caps and
we get a bound for the minimal number of cones needed to cover EΓ by Lemma
5.2.19. From the proof of Lemma 5.2.19, we deduce immediately the following
result leading to a little sharpening of the bound for the number of large points.
Proposition 9.4.12. For ρ = 2 sin(α/2) and $r = \operatorname{rank}_{\mathbb{Q}}(\Gamma)$, the number $n_\Gamma(H)$
of large points Q ∈ C(K) with j(Q) ∈ Γ and |j(Q)| ≤ H satisfies
$$n_\Gamma(H) \le \left\lceil \frac{\log(H/B)}{\log\lambda} \right\rceil \cdot (1 + 2/\rho)^r.$$
In particular, $n_\Gamma(H) \ll \log H$.
Proof: For any k ∈ N, we count the number of Q ∈ C(K) with $\lambda^k B < |j(Q)| \le \lambda^{k+1} B$. By 9.4.9, the angle between two such points Q, Q′ is > α. We conclude
that |ν(Q) − ν(Q′)| > ρ for ρ := 2 sin(α/2). By Lemma 9.4.11, there are at most
$(1 + 2/\rho)^r$ such points. Now the interval (B, H] may be covered by $\lceil \log(H/B)/\log\lambda \rceil$
such intervals, proving the claim.
9.4.13. Still assuming g ≥ 2, we may choose α = π/6 and ε > 0 such that
$\lambda = 2g\cos\alpha - 1 - \varepsilon \ge 2$ and $\rho = 2\sin(\pi/12) = \sqrt{2 - \sqrt{3}} > \frac{1}{2}$. With
this choice of parameters, we may summarize our findings about Mumford’s gap
principle:
Theorem 9.4.14. Let C be an irreducible smooth projective curve of genus g ≥ 2
over K with base point P0 ∈ C(K) leading to a closed embedding j of C
into the Jacobian variety J and let Γ be a subgroup of J of finite Q -rank r .
Then there is a constant B > 0 depending only on C and P0 with the following
properties:
(a) If we choose any cone T in E with center 0 and angle α/2 (see 9.4.9) and
if we order {Q ∈ C | j(Q) ∈ Γ, |j(Q)| > B, i(Q) ∈ T } by increasing
norm, then |j(Qn+1 )| ≥ 2|j(Qn )| for every n ∈ N .
(b) For H > B, the number $n_\Gamma(H)$ of points Q ∈ C with j(Q) ∈ Γ, B <
|j(Q)| ≤ H is bounded by
$$n_\Gamma(H) \le \left\lceil \frac{\log(H/B)}{\log 2} \right\rceil \cdot 5^r.$$
(c) In particular, $n_\Gamma(2H) - n_\Gamma(H) \le 2 \cdot 5^r$.
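The numerical choices behind the constants 2 and 5 are easy to check. The sketch below is an illustration: the function names and the sample values of g, ε, H, B and r are assumptions, not data from the text. It evaluates λ and ρ for α = π/6 and then the bound of part (b), with the brackets read as a ceiling.

```python
import math


def gap_parameters(g: int, eps: float):
    """Return (lambda, rho) for the choice alpha = pi/6."""
    alpha = math.pi / 6
    lam = 2 * g * math.cos(alpha) - 1 - eps
    rho = 2 * math.sin(alpha / 2)
    return lam, rho


def large_point_bound(H: float, B: float, r: int) -> int:
    """Bound ceil(log(H/B) / log 2) * 5^r for the number of large points."""
    return math.ceil(math.log(H / B) / math.log(2)) * 5 ** r


lam, rho = gap_parameters(g=2, eps=0.1)      # smallest admissible genus
cap_factor = 1 + 2 / rho                     # must not exceed 5
sample_bound = large_point_bound(H=1000.0, B=1.0, r=2)
```

For g = 2 and ε = 0.1 this gives λ ≈ 2.36 and 1 + 2/ρ ≈ 4.86, which explains why the theorem can use the cleaner constants 2 and 5.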
The question of bounding the number of small points requires different considerations. In
Chapter 5 we used a uniform version of Zhang’s theorem, as at the end of the proof of
Theorem 5.2.1. Another way of dealing with the problem is to apply a finiteness result, as
in Northcott’s theorem in 2.4.9. This approach leads to the following Northcott condition
already used in 4.5.1:
Definition 9.4.15. The field K satisfies the Northcott property (N) if {α ∈ K | h(α) ≤
R} is finite for every R > 0 .
Proposition 9.4.16. If K satisfies (N), then for any projective variety X over K and any
ample class c ∈ Pic(X) , the set {x ∈ X(K) | hc (x) ≤ H} is finite for every H > 0 .
Proof: Since hc is determined up to bounded functions, this property is well defined. Now
the proof is the same as our deduction of Northcott’s theorem in 2.4.9 from 1.6.8.
By definition, $|\ |^2$ is a height function with respect to an ample class, hence:
Proposition 9.4.17. The Northcott property (N) for a field K implies that the number of
small K -rational points of C is finite.
Remark 9.4.18. If K satisfies (N), then we see as in 9.3.4 that the kernel of ⟨ , ⟩ on
J(K) is the torsion subgroup of J(K). By Lemma 9.3.3, ⟨ , ⟩ is a scalar product on
$E_{J(K)} = J(K) \otimes_{\mathbb{Z}} \mathbb{R}$. Thus, if Γ is a subgroup of J(K), we get $\dim(E_\Gamma) = \operatorname{rank}_{\mathbb{Q}}(\Gamma)$.
9.4.19. Faltings’ theorem in 11.1.1 says that $n_{J(K)}(H)$ is in fact bounded whenever K is
a number field. On the other hand, there are fields K satisfying (N) for which J(K) is
finitely generated and Mumford’s bound $n_{J(K)}(H) \ll \log H$ in Proposition 9.4.12 is best
possible. The following example is due to Serre [277], p.80.
Example 9.4.20. Let K be a finite field with q elements and let X be an irreducible
smooth projective curve. There is a natural set MK (X ) of absolute values on the function
field K(X), satisfying the product formula (see 1.4.6–1.4.9). A point $P \in \mathbb{P}^n(K(X))$
corresponds to a rational map $\varphi : X \dashrightarrow \mathbb{P}^n_K$ defined over K. But ϕ is always defined in codimension 1 (by the valuative criterion of properness and smoothness of X,
see A.11.10), hence is a morphism. To ϕ we attach the line bundle $L = \varphi^* O_{\mathbb{P}^n}(1)$ generated by the global sections $s_j := \varphi^*(x_j)$, 0 ≤ j ≤ n. Example 2.4.11 shows that
$h(P) = \deg_L(X)$.
We claim that the field K(X) has the Northcott property (N). To prove that, let H > 0
and f ∈ K(X) with h(f) ≤ H. By Example 1.5.23, we have
$$h(f) = -\sum_{Z} \deg(Z) \min(0, \operatorname{ord}_Z(f)),$$
where Z ranges over all prime divisors of X. It is easy to see that over a finite field, only
finitely many prime divisors of bounded degree can occur. We conclude that the number
of pole divisors of all rational functions f with h(f) ≤ H is finite. By A.8.22, f is
determined up to a locally constant function by its pole divisor. Since $\bar{K} \cap K(X)$ is a finite
field, we get the claim.
Let C be a geometrically irreducible smooth projective curve of genus g ≥ 2 over K . We
may assume that X is geometrically irreducible and that there is a point P ∈ C(K(X)) \
C(K) . This can certainly be achieved by a finite base extension of K(X) (extending K
also), which is again a function field of an irreducible normal projective curve by Lemma
1.4.10. The points of a normal curve are regular (see A.8.5) and the finite field K is perfect,
hence the curve has to be smooth again (see A.7.12 and A.7.13).
Denote the Jacobian of C by J . By Remark 10.6.5, J(K(X)) is a finitely generated
abelian group. Thus the assumptions of Proposition 9.4.12 are satisfied.
We recall the definition of the Frobenius map over K. For a variety Y over K, it is the
K-morphism F : Y → Y, given, in affine coordinates x on a chart, by $x \mapsto x^q$. For
a line bundle L over Y, we have $F^*(L) \cong L^{\otimes q}$, which is easily proved by considering
transition functions. Moreover, for every K-morphism ψ, we have F ∘ ψ = ψ ∘ F. In
defines a smooth plane curve C of degree $q^n + 1$ in $\mathbb{P}^2_{K(t)}$ with homogeneous coordinates
$(x_0 : x_1 : x_2)$. By the Jacobian criterion in A.7.15, this happens if $\det(a_{ij}(t))$ is not
identically 0. The curve C has genus $g(C) = q^n(q^n - 1)/2$ (see A.13.4) and in particular
g(C) ≥ 2 if $q^n \ge 3$. Let P be a point on C rational over K(t). It is easy to see that
the tangent line to C at P has intersection multiplicity $q^n$ with C at P; therefore, since
C has degree $q^n + 1$, it intersects C residually in a single point P′, again rational over
K(t). Thus, starting with a point $P_0$, we obtain by the above tangent process a sequence
of points $P_1, P_2, \dots$ all defined over K(t). In general, the sequence obtained in this way
is infinite and we have
$$m \ll \log(h(P_m)) \ll m.$$
The associated equation $\alpha^{q+1} + t\alpha - 1 = 0$ has a solution with a formal continued fraction
expansion
$$\alpha = [0; t, t^q, t^{q^2}, t^{q^3}, \dots].$$
It is now easy to see, using classical elementary properties of continued fractions (see for example G.H. Hardy and E.M. Wright [145] or O. Perron [233]), that, if we set $(x_{-1}, y_{-1}) = (1, 0)$, $(x_0, y_0) = (0, 1)$, $(x_1, y_1) = (1, t)$, and in general
$$x_{m+1} = t^{q^m} x_m + x_{m-1}, \qquad y_{m+1} = t^{q^m} y_m + y_{m-1},$$
then
$$[0; t, t^q, \dots, t^{q^{m-1}}] = \frac{x_m}{y_m}$$
9.5. The Néron symbol 301
and
$$x_m^{q+1} + t\, x_m y_m^{q} - y_m^{q+1} = (-1)^{m+1}.$$
Thus we obtain a polynomial solution for every odd m, and in fact for any m if we are
in characteristic 2. Moreover, it is the same as the geometric construction above, i.e. for
$P_0 = (1, 0)$, we get $P_m = (x_{2m-1}, y_{2m-1})$ for all m ∈ N. We leave the details to the
reader.
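The recurrence and the resulting identity can be verified by machine in a small case. The sketch below is an illustration under stated assumptions: it takes q = p = 3 (so the base field is F_3), represents polynomials in t as coefficient lists with the constant term first, and all helper names are hypothetical. It builds the convergents from the recurrence and evaluates the form x^(q+1) + t·x·y^q − y^(q+1) on each of them.

```python
P = 3  # assumption for illustration: q = p = 3, coefficients in F_3


def trim(a):
    """Drop trailing zero coefficients so the zero polynomial is []."""
    while a and a[-1] == 0:
        a.pop()
    return a


def add(a, b):
    out = [0] * max(len(a), len(b))
    for i, c in enumerate(a):
        out[i] = c
    for i, c in enumerate(b):
        out[i] = (out[i] + c) % P
    return trim(out)


def mul(a, b):
    if not a or not b:
        return []
    out = [0] * (len(a) + len(b) - 1)
    for i, ca in enumerate(a):
        for j, cb in enumerate(b):
            out[i + j] = (out[i + j] + ca * cb) % P
    return trim(out)


def power(a, n):
    result = [1]
    for _ in range(n):
        result = mul(result, a)
    return result


def convergents(m_max):
    """Return [(x_0, y_0), ..., (x_{m_max}, y_{m_max})] from the recurrence."""
    prev, cur = ([1], []), ([], [1])  # (x_{-1}, y_{-1}) = (1, 0), (x_0, y_0) = (0, 1)
    out = [cur]
    for m in range(m_max):
        a = [0] * (P ** m) + [1]  # partial quotient t^(q^m)
        prev, cur = cur, (add(mul(a, cur[0]), prev[0]), add(mul(a, cur[1]), prev[1]))
        out.append(cur)
    return out


def thue_form(x, y):
    """Evaluate x^(q+1) + t*x*y^q - y^(q+1) in F_3[t]."""
    middle = mul([0, 1], mul(x, power(y, P)))
    minus_last = [(-c) % P for c in power(y, P + 1)]
    return add(add(power(x, P + 1), middle), minus_last)


values = [thue_form(x, y) for (x, y) in convergents(4)]
```

In this representation the constant 1 is `[1]` and the constant −1 is `[2]`, so the alternation between them mirrors the sign $(-1)^{m+1}$.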
This gives an example of an affine curve (in fact, a Thue equation) over K[t] , of genus
q(q − 1)/2 , with infinitely many integral points over K[t] , occurring precisely at the rate
predicted by Mumford’s bound.
We give here Néron’s decomposition of the Néron–Tate height into canonical local heights.
This will lead to the Néron symbol on arbitrary smooth complete varieties, namely a pairing
between a divisor and a disjoint zero-dimensional cycle, both assumed to be algebraically
equivalent to 0 . This section is based on Section 2.7 and provides additional material not
essential to the rest of the book. Instead of working with local heights, we adopt the equiv-
alent concept of metrized line bundles promoted by Arakelov theory with much success.
Examples are given at the end and in particular we will describe quite explicitly the theory
when applied to Tate’s elliptic curves.
Let K be a field with a given set of places $M_K$. For every place $v \in M_K$, we fix an
absolute value $|\ |_v$ on K in the equivalence class determined by v. We assume that
$\{v \in M_K \mid |\alpha|_v \ne 1\}$ is finite for every α ∈ K \ {0}. We also fix an embedding $K \subset \bar{K}$
and denote by M a set of places of $\bar{K}$ such that every u ∈ M restricts to a $v \in M_K$. We
denote by $|\ |_u$ the unique extension of $|\ |_v$ to an absolute value in the equivalence class of
u.
9.5.1. Let L be a line bundle on a complete K-variety X. For the moment, we fix a place
u, which is the reference for boundedness and metrics. Let $\|\ \|$, $\|\ \|'$ be locally bounded
metrics on L. By Proposition 2.6.6, the norm of the constant section 1 with respect to the
metric $\|\ \| / \|\ \|'$ of $O_X$ is a bounded function ρ and we set
$$d(\|\ \|, \|\ \|') := \sup_{x \in X} |\log \rho(x)|.$$
Lemma 9.5.2. The distance d on the space of locally bounded metrics of L satisfies:
(a) If n ∈ Z, then $d(\|\ \|^{\otimes n}, \|\ \|'^{\otimes n}) = |n| \cdot d(\|\ \|, \|\ \|')$.
(b) If ϕ : X′ → X is a morphism, then $d(\varphi^*\|\ \|, \varphi^*\|\ \|') \le d(\|\ \|, \|\ \|')$.
Proof: These properties follow immediately from the definitions.
Remark 9.5.8. The theorem gives canonical metrics for rigidified line bundles on all
abelian varieties simultaneously. We have seen in the proof that uniqueness already fol-
lows for all rigidified line bundles on a given abelian variety from (a), (b) and the special
case ψ = [m] of (d), where m ∈ Z, |m| ≥ 2 is fixed.
Remark 9.5.9. Note that $\|\nu\|_{\nu,u} = 1$. This follows from uniqueness because the locally
bounded M-metrics $(\|\nu\|_{\nu,u}^{-1}\, \|\ \|_{\nu,u})_{u \in M}$ also satisfy (a)–(d) of Theorem 9.5.7. If ν′ is
another rigidification on L, then
$$\|\ \|_{\nu',u} = |\nu/\nu'|_u \cdot \|\ \|_{\nu,u} \qquad (u \in M).$$
Hence the M-metric $(\|\ \|_{\nu,u})_{u \in M}$ is canonically determined by L up to $(|a|_u)_{u \in M}$ for
some $a \in K^\times$. The corresponding canonical local height will be canonically determined
by the divisor up to $(\log|a|_u)_{u \in M}$ for some $a \in K^\times$.
We define an M-constant to be a family $(\gamma_v)_{v \in M}$ of real numbers with $\gamma_v \ne 0$ only for
finitely many v ∈ M. Hence the canonical local height above is determined by the divisor
up to an M-constant.
9.5.10. In order to eliminate this indeterminacy, we introduce first some notation. Let f
be a non-zero rational function on a smooth irreducible variety X. Recall that $Z_0(X_{\bar{K}})$
denotes the group of zero-dimensional cycles on the base change $X_{\bar{K}}$. We denote here
by [P] the cycle associated to P ∈ X. For $Z = \sum_j n_j P_j \in Z_0(X_{\bar{K}})$ with supp(Z) ∩
supp(div(f)) = ∅, we define
$$f(Z) = \prod_j f(P_j)^{n_j} \in \bar{K}^\times.$$
By $B_0(X)$, we denote the subgroup of $Z_0(X_{\bar{K}})$, which is the kernel of the degree map.
In the next result, we will make use of the fact that the indeterminacy of the canonical local
heights from Remark 9.5.9 cancels by restriction to B0 (X) .
We also make use of the pull-back τa∗ of cycles with respect to translation. Pull-back of
cycles is defined with respect to flat morphisms, but here for an isomorphism it is simply
the inverse of push-forward, meaning that τa∗ (Y ) = Y − a for any prime cycle.
Theorem 9.5.11. Let A be an abelian variety over K . For u ∈ M , a divisor D on A
and Z ∈ B0 (A) with supp(D) ∩ supp(Z) = ∅ , there is a pairing (D, Z)u ∈ R called
the Néron symbol, which is uniquely characterized by the following properties:
In order to prove uniqueness, it is enough to show that [D, [P ] − [0]]u = 0 for every
P ∈ A . For m ∈ Z , the cycle
m ([P ] − [0]) − ([mP ] − [0])
has degree 0 and is in the kernel of S . By our considerations above and (b), we get
m[D, [P ] − [0]]u = [D, [mP ] − [0]]u . (9.17)
With the same arguments as for (9.16), we deduce that the right-hand side of (9.17) is an
M -bounded function of Q = mP . Hence the left-hand side of (9.17) is an M -bounded
function of m and P and as m → ∞ this is possible only if [D, [P ] − [P0 ]]u = 0 . This
proves uniqueness.
Lemma 9.5.13. Let (L, ν) be a rigidified line bundle on the abelian variety A over K, let
a ∈ A and let $\nu_a$ be a rigidification on $\tau_a^*(L)$. For u ∈ M, there is a constant $C_u > 0$
such that
$$\tau_a^* \|\ \|_{\nu,u} = C_u \cdot \|\ \|_{\nu_a,u}.$$
Proof: In order to compare the two metrics, we consider as in 9.5.1 the function ρ on A
given as the norm of 1 with respect to the metric $\tau_a^* \|\ \|_{\nu,u} / \|\ \|_{\nu_a,u}$ on $O_A$. We have to
prove that
$$\omega_{L,a}(P) := \log\rho(P) - \log\rho(0)$$
vanishes identically in P ∈ A . By Remark 9.5.9, it is clear that ωL,a does not depend on
the choice of the rigidifications ν, νa . From Theorem 9.5.7, we easily deduce the following
properties:
Remark 9.5.15. It is obvious that the Néron symbol is compatible with extension of the
base field. So it makes sense to work over an algebraic closure $\bar{K}$. Note however that
the Néron symbol does not extend to arbitrary complete smooth varieties X over K; this
happens only if D belongs to the group $B^1(X)$ of divisors algebraically equivalent to zero.
Theorem 9.5.16. Let X be an irreducible smooth complete variety over K with base
point $P_0$. For every line bundle L on X algebraically equivalent to 0 and rigidification
ν, there is a unique locally bounded M-metric $(\|\ \|_{\nu,u})_{u \in M}$ on L satisfying the following
properties:
Since deg(Z) = 0 , it is clear that (D, Z)u does not depend on the choices of base point
and rigidification. Now we easily deduce (a)–(d) from Theorem 9.5.16.
For uniqueness, we proceed as in the proof of Theorem 9.5.11. We get a pairing [D, Z]u
with properties (a),(c), and (d), which is defined for all D ∈ B 1 (X), Z ∈ B0 (X) and
which depends only on the rational equivalence class of D . By Theorem 9.5.11, the re-
striction of the Néron symbol to abelian varieties must be unique, i.e. [D, Z]u = 0 for
abelian varieties. In general, we have seen in the proof of Theorem 9.5.16 that D is rationally equivalent to the pull-back of a divisor algebraically equivalent to 0 on the dual
abelian variety of $\operatorname{Pic}^0(X)$. We conclude from (c) and the above that $[D, Z]_u = 0$. This
proves uniqueness.
9.5.18. Let X, X′ be irreducible smooth complete varieties over K. We consider a divisor
E on X × X′ called a correspondence. For P ∈ X such that $\{P\} \times X' \not\subset \operatorname{supp}(E)$
(resp. P′ ∈ X′ with $X \times \{P'\} \not\subset \operatorname{supp}(E)$), we define a divisor on X′ (resp. X) by
$$E(P) := (p_2)_*(E.(\{P\} \times X'))$$
and
$${}^tE(P') = (p_1)_*(E.(X \times \{P'\})),$$
where $p_1, p_2$ are the projections of X × X′ onto the factors and where we use the proper intersection product from A.9.20. By linearity, we extend these operations to zero-dimensional
cycles. Almost by definition, the resulting divisor has to be algebraically equivalent to 0 if
the zero-dimensional cycle has degree 0.
Theorem 9.5.19. Let $Z \in B_0(X)$, $Z' \in B_0(X')$ with supp(Z) × supp(Z′) disjoint from
supp(E). Then the reciprocity law
$$(E(Z), Z')_u = ({}^tE(Z'), Z)_u$$
holds for every u ∈ M.
Proof: We deal first with the case where X and X′ are abelian varieties. Let $P_j$ be an
irreducible component of Z. For the canonical meromorphic section $s_E$ of O(E), we
have
$$E(P_j) = (p_2)_* \operatorname{div}(s_E|_{\{P_j\} \times X'}).$$
Let $(\|\ \|_{j,u})_{u \in M}$ be the canonical M-metric of $O(E(P_j))$ with respect to a rigidification.
If $Z = \sum_j m_j [P_j]$ and $Z' = \sum_k m'_k [P'_k]$, then the proof of Theorem 9.5.17 shows that
$$(E(Z), Z')_u = -\sum_{j,k} m_j m'_k \log \|s_{E(P_j)}(P'_k)\|_{j,u}. \tag{9.20}$$
By Theorem 9.5.7 and Lemma 9.5.13, the pull-back of a canonical metric $\|\ \|_{E,u}$ on O(E)
with respect to the morphism $P' \mapsto (P_j, P')$ is equal to $\|\ \|_{j,u}$ times a positive constant
$C_u$. The latter does not influence (9.20) because deg(Z′) = 0, thus we obtain
$$(E(Z), Z')_u = -\sum_{j,k} m_j m'_k \log \|s_E(P_j, P'_k)\|_{E,u}.$$
(a) It is bilinear.
(b) If $f \in K(C)^\times$, then $(\operatorname{div}(f), D')_u = -\log|f(D')|_u$.
Using deg(Z) = 0, it is clear that the pairing does not depend on the choice of $\|\ \|$.
Obviously, the pairing fulfills properties (a)–(c) of Theorem 9.5.11. Using the fact that
the harmonic forms on a complex torus are the same as the translation invariant forms,
we immediately deduce (d) as well. Finally, (e) follows from smoothness of the function
$P \mapsto (D, [P] - [P_0])_u$ on A \ supp(D). We conclude that $(\ ,\ )_u$ is the Néron symbol.
Let $\|\ \|'$ be the canonical metric of O(D) with respect to a rigidification. From the proof
of Theorem 9.5.11, it is clear that
$$(D, [P] - [0])_u = \log\|s_D(0)\|' - \log\|s_D(P)\|'$$
for every $P \notin \operatorname{supp}(D)$. This shows easily that $\|\ \| = \|\ \|'$ up to multiplication with a
positive constant.
Example 9.5.22. Now let K be a field with a complete discrete absolute value $|\ |_v$ normalized by $\log|K^\times|_v = \mathbb{Z}$. It has a unique extension to an absolute value $|\ |_u$ of $\bar{K}$ (see
Proposition 1.2.7). Let A be an abelian variety over K with good reduction in v, i.e. there
is a proper smooth scheme $\mathcal{A}$ over the discrete valuation ring $R_v$ with $\mathcal{A}_K = A$, called the
Néron model. Note also that $\mathcal{A}$ is a group scheme unique up to isomorphism (cf. 10.3.9).
We claim that a metric $\|\ \|$ on a line bundle L of A is a canonical metric with respect
to some rigidification if and only if there is a line bundle $\mathcal{L}$ on $\mathcal{A}$ with $L = \mathcal{L}_K$ and
$\|\ \| = \|\ \|_{\mathcal{L}}$ (Example 2.7.20).
For the proof, we note first that the restriction map $\mathcal{L} \mapsto \mathcal{L}_K$ gives an isomorphism from
$\operatorname{Pic}(\mathcal{A})$ onto Pic(A). The inverse is induced by the map $D \mapsto \bar{D}$ of divisors taking Zariski
closures of components. To see this, use that the special fibre of $\mathcal{A}$ over the residue field is
irreducible and therefore every divisor supported in the special fibre is rationally equivalent
to 0.
For $L = \mathcal{L}_K$ with rigidification ν, we set
$$\|\ \|'_{\nu,u} := \|\nu\|_{\mathcal{L}}^{-1} \cdot \|\ \|_{\mathcal{L}}.$$
It is easily checked that our metric $\|\ \|'_{\nu,u}$ on L satisfies (a)–(d) of Theorem 9.5.7, where
we allow in (d) only endomorphisms of A. Because it is also bounded (Example 2.7.20),
Remark 9.5.8 yields that the metric $\|\ \|'_{\nu,u}$ is the canonical metric $\|\ \|_{\nu,u}$ from Theorem
9.5.7. This proves our claim.
Remark 9.5.23. For an archimedean place u, we may always reduce to Example 9.5.21
by base change to C. Hence canonical metrics $\|\ \|_u$ are always $C^\infty$.
For a non-archimedean place u ∈ M, canonical metrics $\|\ \|_u$ are always continuous with
respect to the u -topology and satisfy L(K)u = |K|u . This follows from Tate’s limit
argument in the proof of Theorem 9.5.4 and Remark 2.7.6. In the case of good reduction in
a discrete valuation, Example 9.5.22 shows that even L(K)u = |K|u holds. However,
the following example will show that this is not in general the case. Note also that
$|\bar{K}|_u = \bigcup_{n \in \mathbb{N} \setminus \{0\}} |K|_u^{1/n}$ (see Lemma 11.5.2).
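Example 9.5.24. Every complex elliptic curve is biholomorphic to a complex torus $\mathbb{C}/(\mathbb{Z} + \mathbb{Z}\tau)$, Im τ > 0. The map ζ = exp(2πiz) gives an analytic group isomorphism to the Tate
uniformization $\mathbb{C}^\times / q^{\mathbb{Z}}$, $q := e^{2\pi i\tau}$. As in 8.6.13, we consider the theta function
$$\theta(z, \tau) = \sum_{n=-\infty}^{\infty} e^{\pi i n^2 \tau + 2\pi i n z}.$$
It transforms into
$$\theta(\zeta, q) = \sum_{n=-\infty}^{\infty} \tilde{q}^{\,n^2} \zeta^n, \tag{9.22}$$
with $\tilde{q} = e^{\pi i\tau}$ (so that $\tilde{q}^2 = q$), which shows that θ(·, q) has only simple zeros and they are precisely in $\zeta = -\tilde{q} \cdot q^{\mathbb{Z}}$
(for details, see K. Chandrasekharan [62], Ch.V, Th.6). Let $p : \mathbb{C}^\times \to \mathbb{C}^\times / q^{\mathbb{Z}}$ be the
quotient map and let P be the 2-torsion point $p(-\tilde{q})$. Then $p^* O([P])$ is trivial and may
be identified with $\mathbb{C}^\times \times \mathbb{C}$ such that the canonical global section $s_P$ of O([P]) pulls back
to $\zeta \mapsto (\zeta, \theta(\zeta, q))$ (use 8.6.13).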
Conversely, O([P]) may be obtained as the quotient of the trivial bundle by a Z-action
$$\mathbb{Z} \times (\mathbb{C}^\times \times \mathbb{C}) \longrightarrow (\mathbb{C}^\times \times \mathbb{C}), \qquad (k; \zeta, v) \mapsto (q^k \zeta, e_k(\zeta, q) \cdot v).$$
To determine the cocycle $e_k(\zeta, q)$, note that
$$\theta(q^k \zeta, q) = \sum_{n=-\infty}^{\infty} \tilde{q}^{\,n^2 + 2kn} \zeta^n = \tilde{q}^{-k^2} \sum_{n=-\infty}^{\infty} \tilde{q}^{\,(n+k)^2} \zeta^n = \tilde{q}^{-k^2} \zeta^{-k}\, \theta(\zeta, q)$$
and hence $e_k(\zeta, q) = \tilde{q}^{-k^2} \zeta^{-k}$. Note that O([P]) gets rigidified by choosing ν corresponding to 1 in the fibre over 0 of the trivial bundle. For m ∈ Z, $[m]^* O([P])$ is given
by the cocycle
$$e_{km}(\zeta^m, q) = \tilde{q}^{-k^2 m^2} \zeta^{-k m^2} = e_k(\zeta, q)^{m^2}.$$
The right-hand side is the cocycle of $O(m^2 [P])$, hence O([P]) is even and we have also
checked $[m]^* O([P]) \cong O(m^2 [P])$. By our rigidifications, the isomorphism is uniquely
determined. To determine the canonical metric $\|\ \|_\nu$, we compute the positive function
$\mu := \|1\|_\nu$. In fact, 1 is a section of $p^* O([P]) = \mathbb{C}^\times \times \mathbb{C}$, but, locally, it may be also
viewed as a section of O([P]) determining the metric completely. Compatibility of the
fibres over ζ and $q^k \zeta$ leads to
$$\mu(q^k \zeta) = |e_{-k}(q^k \zeta, q)| \cdot \mu(\zeta). \tag{9.23}$$
On the other hand, $[m]^* \|\ \|_\nu = \|\ \|_\nu^{\otimes m^2}$ gives
$$\mu(\zeta^m) = \mu(\zeta)^{m^2}.$$
To find the canonical metric, we may start with any metric, i.e. with a µ0 satisfying (9.23).
For |q| ≤ |ζ| ≤ 1 , we choose µ0 (ζ) = |ζ|1/2 and we extend µ0 by (9.23) to C× leading
to a continuous function. Then the proof of Theorem 9.5.4 gives
2
µ(ζ) = lim µ0 (ζ m )1/m .
m→∞
It is easy to check that µ satisfies (9.23) and therefore (9.24) holds for all ζ ∈ C× .
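The transformation law of the theta series and the resulting cocycle can be sanity-checked numerically. The following is only a sketch: the values of `q_tilde`, `zeta`, `k` and `m` are arbitrary illustrative choices, the series is truncated at `terms` (an assumption that is harmless because $|\tilde q| < 1$), and we take $\tilde q^2 = q$ as suggested by comparing the two series for $\theta$.

```python
import cmath

def theta(zeta, q_tilde, terms=20):
    """Truncated series sum_{|n| <= terms} q_tilde^(n^2) * zeta^n, cf. (9.22)."""
    return sum(q_tilde ** (n * n) * zeta ** n for n in range(-terms, terms + 1))

def cocycle(k, zeta, q_tilde):
    """The cocycle e_k(zeta, q) = q_tilde^(-k^2) * zeta^(-k)."""
    return q_tilde ** (-k * k) * zeta ** (-k)

# Illustrative (hypothetical) data: any q_tilde with |q_tilde| < 1 would do.
q_tilde = 0.3 * cmath.exp(0.7j)
q = q_tilde ** 2
zeta = 1.3 + 0.4j
k, m = 2, 3

# theta(q^k zeta) = e_k(zeta, q) * theta(zeta), measured as a relative error.
lhs = theta(q ** k * zeta, q_tilde)
rhs = cocycle(k, zeta, q_tilde) * theta(zeta, q_tilde)
functional_equation_error = abs(lhs - rhs) / abs(rhs)

# theta vanishes at zeta = -q_tilde (the terms n and -1-n cancel in pairs).
zero_at_minus_q_tilde = abs(theta(-q_tilde, q_tilde))

# e_{km}(zeta^m, q) = e_k(zeta, q)^(m^2), again as a relative error.
pulled_back = cocycle(k * m, zeta ** m, q_tilde)
power = cocycle(k, zeta, q_tilde) ** (m * m)
cocycle_power_error = abs(pulled_back - power) / abs(power)
```

All three quantities should be at the level of floating-point rounding, which is what the identities above predict.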
We owe to Tate the counterpart of these considerations for the non-archimedean case. Let
$K$ be a field endowed with a complete (discrete) absolute value $|\ |$. For every $|q| < 1$, the
analytic torus $K^\times/q^{\mathbb{Z}}$ is isomorphic as an analytic group to Tate’s elliptic curve $E_q$ given
by
\[ y^2 + xy = x^3 + a_4 x + a_6, \]
where $a_4$, $a_6$ are convergent power series in $q$ given by
\[ a_4 = -5\sum_{n=1}^{\infty} \frac{n^3 q^n}{1-q^n}, \qquad a_6 = -\frac{1}{12}\sum_{n=1}^{\infty} \frac{(7n^5+5n^3)\,q^n}{1-q^n}. \]
The isomorphism $K^\times/q^{\mathbb{Z}} \to E_q$ is given by
\[ x(\zeta, q) = \sum_{n=-\infty}^{\infty} \frac{q^n\zeta}{(1-q^n\zeta)^2} - 2\sum_{n=1}^{\infty} \frac{nq^n}{1-q^n}, \]
\[ y(\zeta, q) = \sum_{n=-\infty}^{\infty} \frac{q^{2n}\zeta^2}{(1-q^n\zeta)^3} + \sum_{n=1}^{\infty} \frac{nq^n}{1-q^n}. \]
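The series above also converge for complex $q$ with $|q| < 1$, so one can check numerically that the point $(x(\zeta,q), y(\zeta,q))$ satisfies the Weierstrass equation of $E_q$. The sketch below is a floating-point sanity check, not a proof. It takes $a_4 = -5 s_3(q)$ and $a_6 = -(5 s_3(q) + 7 s_5(q))/12$ with $s_k(q) = \sum_{n\ge1} n^k q^n/(1-q^n)$, the standard normalization for Tate's curve; the sample values of `q` and `zeta` and the truncation bounds are illustrative assumptions.

```python
def s(k, q, terms=60):
    """Truncated series s_k(q) = sum_{n>=1} n^k q^n / (1 - q^n)."""
    return sum(n ** k * q ** n / (1 - q ** n) for n in range(1, terms + 1))

def tate_point(zeta, q, terms=40):
    """Truncated x(zeta, q), y(zeta, q); terms is kept small enough to avoid overflow."""
    idx = range(-terms, terms + 1)
    x = sum(q ** n * zeta / (1 - q ** n * zeta) ** 2 for n in idx) - 2 * s(1, q)
    y = sum((q ** n * zeta) ** 2 / (1 - q ** n * zeta) ** 3 for n in idx) + s(1, q)
    return x, y

def weierstrass_residual(zeta, q):
    """Value of y^2 + x*y - (x^3 + a4*x + a6) at the point attached to zeta."""
    a4 = -5 * s(3, q)
    a6 = -(5 * s(3, q) + 7 * s(5, q)) / 12
    x, y = tate_point(zeta, q)
    return y * y + x * y - (x ** 3 + a4 * x + a6)

# Illustrative (hypothetical) sample values with 0 < |q| < 1 and zeta not in q^Z.
residual = abs(weierstrass_residual(0.7 + 0.4j, 0.05))
```

The residual should vanish up to rounding and truncation error; dropping the factor 5 in $a_4$ visibly breaks the identity already at first order in $q$.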
We have seen in Theorem 2.2.11 and Remark 2.2.13 (see also Theorem 2.7.14 for such a
result in a more general context) that, if $\lambda_D$ and $\lambda'_D$ are two local heights attached to
different presentations of the same Cartier divisor $D$, then for $u|v$ with $v \in M_K$ we have
\[ |\lambda_D(\cdot, u) - \lambda'_D(\cdot, u)| \le \gamma_v \]
for some constant $\gamma_v = \max_m \log^+|p_m|_v$ for finitely many $p_m \in K$, depending only
on the geometric data of the presentations. In particular, $\gamma_v = 0$ for all but finitely many
$v \in M_K$.
For $F$ a finite subextension of $\overline{K}/K$ such that $F \supset K(P)$, the sum
\[ h_D(P) := \sum_{w \in M_F} \lambda_D(P, w) \]
is independent of the choice of $F$ and is called the global Weil height attached to $D$. By
Theorem 2.3.6, it is uniquely defined as a quasi-function independent of the presentation,
i.e. up to a uniformly bounded quantity.
We state now the main result of this section.
Theorem 9.6.2. Let $C$ be a smooth irreducible projective curve defined over a number
field $K$ and let $f : C \to \mathbb{P}^1$ be a surjective rational function defined over $K$. Suppose
$Q \in f^{-1}(\infty)$ is a pole of $f$ and let $\mathcal{Q}$ be a presentation of $[Q]$, with a corresponding
family of local heights $\lambda_Q(\cdot, u)$, $u \in M$.
Then there is a family of real numbers $(\Delta_v)_{v\in M_K}$ with $\Delta_v \ge 0$ and $\Delta_v = 0$ for all
but finitely many $v \in M_K$, depending only on $C$, $f$ and the presentation $\mathcal{Q}$, with the
following property.
For $P \in C(\overline{K}) \setminus f^{-1}(\infty)$ and any finite subextension $F$ of $\overline{K}/K$ with $F \supset K(P, Q)$,
we have
\[ \sum_{v\in M_K}\ \sum_{\substack{w\in M_F,\ w|v \\ \lambda_Q(P,w) > \varepsilon_w\Delta_v}} \log^+|f(P)|_w = -\frac{\mathrm{ord}_Q(f)}{\deg(f)}\, h(f(P)) + O\bigl(\sqrt{h(f(P))} + 1\bigr). \]
Remark 9.6.3. In intuitive terms, if $|f(P)|_v > e^{\Delta_v}$, then $P$ is “close” to a pole $Q$ of $f$.
The meaning of the theorem is that, as v varies, each pole of f is approached by P with a
probability proportional to the order of the pole.
Lemma 9.6.4. Let $\lambda, \lambda'$ be local heights relative to divisors $D, D'$ with disjoint support,
say given by presentations of $D$ and $D'$. Then there is a family of real numbers $c_v \ge 0$
with $c_v = 0$ for all but finitely many $v \in M_K$, depending only on the presentations,
such that for $u|v \in M_K$ the following bound holds
\[ \min\{|\lambda(P, u)|,\ |\lambda'(P, u)|\} \le c_v. \]
The lemma may be easily generalized to a complete variety and to a family of local heights
relative to Néron divisors assuming that the intersection of all supports is empty.
Proof of lemma: Let us consider the open subsets $U := C \setminus \mathrm{supp}(D)$ and
$U' := C \setminus \mathrm{supp}(D')$. Since $U \cup U' = C$ and $C$ is $M$-bounded (see Proposition 2.6.17), there are
$M$-bounded families $(E_u)_{u\in M}$ and $(E'_u)_{u\in M}$ of subsets of $U$ and $U'$, respectively, with
$C(\overline{K}) = E_u \cup E'_u$ (see 2.6.3, 2.6.14). We note that $s_D$ is a nowhere vanishing section over
$U$. Since $\lambda(P, u) = -\log\|s_D(P)\|_u$ for a locally bounded metric on $O(D)$ (Proposition
2.7.11), we conclude that
\[ \sup_{u\in M,\ u|v}\ \sup_{P\in E_u} |\lambda(P, u)| \]
is finite for all $v \in M_K$ and 0 up to finitely many. Working with a presentation, this may
also be deduced directly from another subdivision of $U$ and $E_u$ using the remarks 2.6.3,
2.6.14. Similarly, $\lambda'$ is locally $M$-bounded on $(E'_u)_{u\in M}$ and the claim follows.
Proof of Theorem 9.6.2: By finite base change, we may assume that all poles of f are K -
rational and that C is geometrically irreducible. Indeed, there is a finite Galois extension
$E/K$ such that the irreducible components of $C_K$ are given by conjugates $(B^\sigma)_{\sigma\in R}$ for
a geometrically irreducible curve $B$ over $E$ with $Q \in B(E)$ (see Example A.4.15). Then
the theorem for $Q \in B$ proves the theorem for $Q \in C$ as $\lambda_Q$ is $M$-bounded on the
conjugates $B^\sigma \neq B$ (by Lemma 9.6.4).
Since the sum and pull-back of local heights remain a local height relative to suitable pre-
sentations (see 2.2.4 and 2.2.6), we see that
log+ |f (P )|u + ordQ (f )λQ (P, u)
The preceding lemma applied to $D$ and $D' := -\mathrm{ord}_Q(f)[Q]$ shows that for $u|v \in M_K$
we also have
\[ \min\bigl\{-\mathrm{ord}_Q(f)\lambda_Q(P, u),\ \log^+|f(P)|_u + \mathrm{ord}_Q(f)\lambda_Q(P, u)\bigr\} \le c_v. \]
For $\Delta_v := -c_v/\mathrm{ord}_Q(f)$ and for a finite subextension $F$ of $\overline{K}/K$ with $F \supset K(P)$, this
leads to
\[ \sum_{v\in M_K}\ \sum_{\substack{w\in M_F,\ w|v \\ \lambda_Q(P,w) > \varepsilon_w\Delta_v}} \log^+|f(P)|_w = -\mathrm{ord}_Q(f) \sum_{v\in M_K}\ \sum_{\substack{w\in M_F,\ w|v \\ \lambda_Q(P,w) > \varepsilon_w\Delta_v}} \lambda_Q(P, w) + O(1) \tag{9.25} \]
\[ = -\mathrm{ord}_Q(f)\, h_Q(P) + O(1). \]
Since $f^*[\infty] - \deg(f)[Q] \in \mathrm{Pic}^0(C)$ (see A.9.40 and Corollary 8.4.10), Corollary 9.3.10
yields
\[ h_Q(P) = \frac{1}{\deg(f)}\, h(f(P)) + O\bigl(\sqrt{h(f(P))} + 1\bigr). \]
If we insert this in (9.25), we get the claim.
9.6.5. In 1887, C. Runge ([255], p.432) proved that, if G(x, y) is an irreducible polynomial
in Z[x, y] of degree d and if the homogeneous part Gd (x, y) of degree d of G is not pro-
portional to a power of an irreducible polynomial in Z[x, y] , then G(x, y) has only finitely
many zeros (x, y) ∈ Z × Q . Runge used this theorem to give a more precise criterion for
G to have only finitely many integer zeros (see [255], p.434). Runge’s method depends
on the construction of rational approximations to algebraic functions of one variable and is
quite explicit and applicable in practice.
Here we give an extension of Runge’s theorem, as a consequence of Theorem 9.6.2 and
Lemma 9.6.4.
Theorem 9.6.6. Let $C$ be a smooth irreducible projective curve defined over a number
field $K$. Let $f : C \to \mathbb{P}^1$ be a surjective rational function on $C$ defined over $K$.
Let $P \in C$ such that $f(P) \in O_{K(f(P)),S}$ for a finite subset $S \subset M_{K(f(P))}$ containing
all the archimedean places and satisfying the basic condition
\[ [K(P) : K] < |S|^{-1}\, \frac{\deg(f)}{\max_Q |\mathrm{ord}_Q(f)|\,[K(Q) : K]}, \tag{9.26} \]
where the maximum is taken over all poles of $f$. Then the set of such points $P$ is finite and
effectively computable.
Remark 9.6.7. It is a simple exercise, which we leave to the reader, to deduce Runge’s
theorem from Theorem 9.6.6 by taking f = x , K = Q and S = {∞} .
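To see how the basic condition (9.26) behaves in the setting of the remark ($K = \mathbb{Q}$, $S = \{\infty\}$, rational points), here is a small sketch in exact arithmetic. The function name and the encoding of poles as pairs $(|\mathrm{ord}_Q(f)|, [K(Q):K])$ are our own illustrative conventions. The two data sets are meant to model $f = x$ on a conic with two rational simple poles (as for $x^2 - y^2 = 1$) and on a conic whose two simple poles are conjugate over a quadratic field (as for $x^2 - 2y^2 = 1$).

```python
from fractions import Fraction

def basic_condition(point_degree, deg_f, poles, size_S):
    """Check (9.26): [K(P):K] < |S|^(-1) * deg(f) / max_Q(|ord_Q(f)| * [K(Q):K]).

    `poles` is a list of pairs (|ord_Q(f)|, [K(Q):K]), one per pole Q of f.
    """
    worst_pole = max(order * degree for order, degree in poles)
    return point_degree < Fraction(deg_f, worst_pole * size_S)

# Two rational simple poles, deg(f) = 2: the condition reads 1 < 2.
two_rational_poles = basic_condition(1, 2, [(1, 1), (1, 1)], 1)

# Two conjugate simple poles over a quadratic field: the condition reads 1 < 1.
conjugate_poles = basic_condition(1, 2, [(1, 2), (1, 2)], 1)
```

In the first case the condition holds, so the theorem applies. In the second it fails, and the theorem gives no conclusion, as one expects for a Pell-type equation.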
By Theorem 2.2.11 and Remark 2.2.13, there are constants $c_v \ge 0$, with $c_v = 0$ for all but
finitely many $v \in M_K$ such that
\[ \Bigl|\log^+|f(P)|_u + \sum_{Q\in f^{-1}(\infty)} \mathrm{ord}_Q(f)\,\lambda_Q(P, u)\Bigr| \le c_v. \tag{9.27} \]
Let (∆v )v∈M K be a family as in Theorem 9.6.2 working for all presentations Q . We see
that there is Q ∈ f −1 (∞) such that
λQ (P, w) > εw ∆v (9.29)
as soon as
h(η) > εv η |S| (cv + deg(f )∆v ) .
The right-hand side is bounded by an effective constant independent of the choice of $P$, $S$,
and $v_\eta$ (use (9.26)), hence (9.29) holds as soon as $h(\eta)$ is sufficiently large, which we shall
suppose henceforth.
Now let $\sigma$ be an automorphism of $F/K$. We write $w'$, $Q'$ for $w \circ \sigma$ and $\sigma Q$, and note
that we may choose the presentations of the poles of $f$ and associated local heights so that
they are compatible by the action of $\sigma$. Since the action of $\mathrm{Gal}(F/K)$ on the places of $F$
extending $v$ is transitive (see Corollary 1.3.5), we have $\varepsilon_{w'} = \varepsilon_w$ and
\[ \lambda_{Q'}(\sigma P, w') = \lambda_Q(P, w) > \varepsilon_w\Delta_v. \]
It is now clear that, if $\{P_1, \ldots, P_r\}$, $\{Q_1, \ldots, Q_s\}$ are a full set of conjugates of $P$ and
$Q$ over $K$, then
\[ \sum_{i=1}^{r}\sum_{j=1}^{s}\ \sum_{\substack{w\in M_F,\ w|v \\ \lambda_{Q_j}(P_i,w) > \varepsilon_w\Delta_v}} \log^+|\eta|_w \ \ge \sum_{w\in M_F,\ w|v} \log^+|\eta|_w \ \ge\ \log^+|\eta|_{v_\eta}, \]
where the last step comes from Lemma 1.3.7. Noting that $\mathrm{ord}_{Q_j}(f) = \mathrm{ord}_Q(f)$ for every
$j$ and applying Theorem 9.6.2, we conclude with (9.28) that
\[ |S|^{-1} h(\eta) \le rs\, \frac{|\mathrm{ord}_Q(f)|}{\deg(f)}\, h(\eta) + O\bigl(\sqrt{h(\eta)} + 1\bigr). \]
The reader will perceive the connexion of such a finiteness theorem with Siegel’s finiteness
theorem on integral points on curves as in Theorem 7.3.9, and indeed a proof of the finite-
ness result can be obtained by using the full force of Siegel’s theorem. However, Siegel’s
theorem is ineffective. It is therefore of some interest that Theorem 9.6.6 is strong enough
to prove a general effective version of Hilbert’s irreducibility theorem.
Theorem 9.6.9. Let C be a smooth irreducible projective curve defined over a number
field K and let f : C → P1 be a surjective rational function on C , also defined over
K . Then for all n ∈ N except for a set of natural density 0 , the divisor f ∗ [n] is a prime
divisor over K .
Remark 9.6.10. Recall that a subset $M \subset \mathbb{N}$ is called of natural density $\rho$ if the following
limit exists and
\[ \rho = \lim_{x\to\infty} \frac{|\{n \in M \mid n \le x\}|}{|\{n \in \mathbb{N} \mid n \le x\}|}. \]
As will be clear from the proof, this theorem is effective in the following sense. There
are effectively computable quantities $r$, $\kappa(B, r)$, an effectively computable polynomial
$P(x_1, \ldots, x_{r-1}) \in K[x_1, \ldots, x_{r-1}] \setminus \{0\}$, depending on the set of ramification points of
$f$, such that, if $1 \le b_i \le B$ with $b_i \in \mathbb{N}$, $n \ge \kappa(B, r)$, and $P(b_1, \ldots, b_{r-1}) \neq 0$, then
at least one of the divisors $f^*[n]$, $f^*[n + b_i]$, $i = 1, \ldots, r-1$, is a prime divisor over
$K$. In particular, the least $n \in \mathbb{N}$ such that $f^*[n]$ is a prime divisor over $K$ is effectively
computable.
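The notion of natural density recalled in this remark can be illustrated by computing the quotient in the definition for finite $x$. The sketch below uses exact fractions; it assumes $\mathbb{N} = \{0, 1, 2, \ldots\}$ (including or excluding 0 does not affect the limit), and the two sample sets are purely illustrative.

```python
from fractions import Fraction
from math import isqrt

def density_ratio(member, x):
    """|{n in M : n <= x}| / |{n in N : n <= x}| for N = {0, 1, 2, ...}."""
    count = sum(1 for n in range(x + 1) if member(n))
    return Fraction(count, x + 1)

def is_square(n):
    return isqrt(n) ** 2 == n

def is_even(n):
    return n % 2 == 0

# Squares: the ratio behaves like 1/sqrt(x), consistent with natural density 0.
ratios_squares = [density_ratio(is_square, 10 ** e) for e in (2, 4)]

# Even numbers: the ratio tends to 1/2.
ratio_even = density_ratio(is_even, 10 ** 4)
```

A finite computation of this kind only suggests the value of the limit; the density itself is a statement about $x \to \infty$.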
Dèbes and Zannier [88] gave a proof of Hilbert’s irreducibility theorem relying on G -
functions. They used a clever trick by considering generic fractional linear transforms to
omit ramification. Here, we put this in the context of algebraic geometry by working on an
auxiliary curve to which we can apply directly Theorem 9.6.6. Before we come to the proof
of Theorem 9.6.9, we need a couple of lemmas.
Lemma 9.6.11. For i = 0, . . . , N , let Ci be a smooth geometrically irreducible curve
over a field F of characteristic 0 and let fi : Ci → P1F be a surjective rational function
\[ g_i : C_i \times \mathbb{A}^N_F \to \mathbb{P}^1_F \times \mathbb{A}^N_F, \quad (x_i, t) \mapsto (f_i(x_i) - t_i,\ t) \qquad (i = 0, \ldots, N), \]
where we set t0 := 0 . Let Ri be the set of points in P1F \ {∞} over which fi ramifies.
Then the following properties hold:
we use the Lefschetz principle: Ct is defined over a finitely generated subfield of F , which
may be embedded into C .
Let $R_t := \bigcup_{i=0}^{N} (R_i - t_i)$. Then the same arguments as in the proof of (d) show that
$g_t : (C_t)_{\mathrm{an}} \to \mathbb{P}^1_{\mathrm{an}} \setminus \{\infty\}$ is a finite ramified covering as in 12.3.8, and outside of $g_t^{-1}(R_t)$
we get a topological covering (see 12.3.9) of degree $d_0 \cdots d_N$ (every fibre has exactly
$d_0 \cdots d_N$ points by definition of the fibre product). We choose $x = (x_0, \ldots, x_N) \in (C_t)_{\mathrm{an}}$
with $y = g(x) \notin R_t$. To prove that $(C_t)_{\mathrm{an}}$ is connected, it is enough to show that
there is a path connecting $x$ with any other fibre point over $y$. We may just change one
entry at a time and, for notational simplicity, we change $x_0$ to $x'_0 \in g_0^{-1}(y)$.
The fundamental group $\pi_1(\mathbb{P}^1_{\mathrm{an}} \setminus (R_0 \cup \{\infty\}), y)$ is generated by the loops $(\sigma_z)_{z\in R_0}$,
where $\sigma_z$ is the loop starting in the base point $y$, then passing to a small neighbourhood of
$z$, turning in a small positive circle around $z$ and returning the same way back to $y$. This
is a consequence of van Kampen’s theorem and is proved in the case $|R_0| = 2$ in 12.6.1;
the general case is by induction. We may assume that $\sigma_z$ omits $R_t$ and $\infty$.
By the theory of covering spaces, $\pi_1(\mathbb{P}^1_{\mathrm{an}} \setminus (R_0 \cup \{\infty\}), y)$ operates transitively on the
fibre $g_0^{-1}(y)$ (see 12.3.5), hence there is $\gamma \in \pi_1(\mathbb{P}^1_{\mathrm{an}} \setminus (R_0 \cup \{\infty\}), y)$ such that the lift
$\gamma_0$ to $(C_0)_{\mathrm{an}}$ with starting point $x_0$ ends in $x'_0$. Now we may assume that $\gamma$ is a product
of $\sigma_z$’s (repetition possible). Then $\gamma$ does not pass through $R_t \cup \{\infty\}$ and we choose a
$g_i$-lift $\gamma_i$ in $(C_i)_{\mathrm{an}}$ with starting point $x_i$. It is enough to show that the lift of $\sigma_z$ ends
in $x_i$ for $i = 1, \ldots, N$. Since $(C_i)_{\mathrm{an}}$ is a topological covering over a neighbourhood
of $z$ (as $g_i$ does not ramify over $z$, otherwise $t_i - t_0 = t_i \in R_i - R_0$), we conclude
that the lift of $\sigma_z$ has $x_i$ as end point (consider first the turn around $z$). Therefore, the
path $\gamma = (\gamma_0, \ldots, \gamma_N)$ starts in $x$ and ends in $(x'_0, x_1, \ldots, x_N)$. This proves that $C_t$ is
geometrically irreducible for every $t$ as in (d).
Since we may do this for the generic point $\xi$ of $\mathbb{A}^N_F$, we conclude that the generic fibre
$C_\xi$ of $C \to \mathbb{A}^N_F$ is a geometrically irreducible curve over $F(\mathbb{A}^N_F)$. We conclude that
$C$ is irreducible and hence we get (b). By Zariski’s main theorem (note that $F(\mathbb{A}^N_F)$ is
algebraically closed in the function field of $C$ by geometric irreducibility (see A.4.11) and
hence we may apply [136], Cor.4.3.10), we conclude that the fibres $C_t$ are geometrically
connected for every $t \in \mathbb{A}^N_F$, proving (c).
In the course of the proof, we have also seen that the generic fibre of g has degree d0 · · · dN ,
thus proving the remaining part of (a). Here, we use the invariance of the degree with respect
to base change.
Remark 9.6.12. Alternatively, we give a more geometric way of deducing geometric con-
nectedness from (d) without using the theory of covering spaces. As before, it is enough
to show that Ct is geometrically connected for any t as in (d). We may assume F alge-
braically closed.
By induction on N and passing to normalizations, it suffices to consider the case N = 1 .
For $t \notin R_1 - R_0$, the curve
\[ f_1(x_1) - f_0(x_0) = t \]
in $C_0 \times C_1$ is $C_t$. We consider the rational map
\[ F : C_0 \times C_1 \dashrightarrow \mathbb{P}^1_F, \quad (x_0, x_1) \mapsto f_1(x_1) - f_0(x_0), \]
Lemma 9.6.14. Under the hypothesis of 9.6.13, mQ (g ) is the least common multiple of
the multiplicities m0 , . . . , mN and LQ is contained in the compositum of L0 , . . . , LN in
F.
Proof: It is enough to prove the claim for N = 1 , the general case is easily obtained by
induction. By base change to Q = (Q0 , Q1 ) , we may assume Q0 , Q1 to be F -rational.
We would like to argue complex analytically to work in a chart around $Q \in C_0 \times C_1$ with
coordinates $(z_0, z_1)$, where the fibre product is given by the equation $u_0 z_0^{m_0} = u_1 z_1^{m_1}$.
This would not change the relevant multiplicities and we could resolve the singularity at
(0, 0) in the chart to get the normalization over Q . However, to also get our claim about the
fields Lj , the argument should be carried out over F . This will be done in the framework of
formal geometry (see [148], Section II.9), meaning that we will work over an infinitesimal
neighbourhood of Q considering formal power series with coefficients in F instead of
convergent complex power series. We will use that the residue field and the local parameter
of the m -adic completion of a local ring remain the same as for the original local ring,
hence the multiplicities and Lj , LQ may be calculated in an infinitesimal neighbourhood.
The formal completion of $X := C_0 \times_B \cdots \times_B C_N$ along $Q$ is $\mathrm{Spec}(\widehat{O}_{X,Q})$, obtained
as the $\mathfrak{m}_Q$-adic completion of the local ring $O_{X,Q}$ (see [148], Example II.9.3). Note that
$\widehat{O}_{C_j,Q_j}$ is regular ([148], Th.I.5.4.A) and hence is isomorphic to the ring of the formal
power series $F[[\pi_j]]$ by the Cohen structure theorem ([148], Th.I.5.5A). This yields easily
\[ \widehat{O}_{X,Q} \cong F[[\pi_0, \pi_1]]/\langle u_0\pi_0^{m_0} - u_1\pi_1^{m_1}\rangle. \]
There is a finite succession of blowing-ups in points over $Q$ realizing the normalization $C$
in a neighbourhood over $Q$. This may be performed formal analytically, which also follows
from the fact that the integral closure of $\widehat{O}_{X,Q}$ in its ring of fractions is isomorphic to the
completion of the integral closure of $O_{X,Q}$ (see O. Zariski and P. Samuel [338], Ch.VIII,
Th.33). Hence we may replace $X$ by the affine curve in $\mathbb{A}^2_F$ given by
\[ A_0 x_0^{m_0} = A_1 x_1^{m_1}, \]
Therefore, we get a resolution of the singularity at $(0, 0)$ into $k$ non-singular points located
at the points $(0, y_1)$ with $y_1^k = A_1/A_0$. Now $Q$ corresponds to one of the points $(0, y_1)$,
where $\pi := y_0$ is a local parameter. Using this correspondence, we see that
\[ g = u_0 x_0^{m_0} = u_0\, y_1^{a_0 m_0}\, \pi^{m_0\mu_1}. \]
Hence $m_Q(g) = k\mu_0\mu_1$ proving the first claim. Moreover
\[ u_0(Q)\, y_1^{a_0 m_0}(Q) = A_0 (A_1/A_0)^{\mu_0 a_0} = A_0^{-\mu_1 a_1} A_1^{\mu_0 a_0}. \]
Noting that
\[ \bigl(A_0^{-a_1\mu_1} A_1^{a_0\mu_0}\bigr)^{1/(k\mu_0\mu_1)} = A_0^{-a_1/m_0} A_1^{a_0/m_1} \]
we see that $L_Q \subset L_{Q_0} L_{Q_1}$, proving the second claim.
9.6.15. Let $t \in \mathbb{A}^N_F$ be as in Lemma 9.6.11(d) and assume in addition that $t$ is $F$-rational.
Let $\widetilde{C}_t$ be the normalization of $C_t$. Our goal is to apply Runge’s theorem to the canonical
morphism $f_t : \widetilde{C}_t \to \mathbb{P}^1_F$ given as the composition of $g_t$ with the normalization morphism.
So we need an estimate for $-\mathrm{ord}_Q(f_t)[F(Q) : F]$, where $Q$ is any pole of $f_t$.
Lemma 9.6.16. Suppose the hypothesis of 9.6.15 holds and let $d := \deg(f)$. Then
\[ -\mathrm{ord}_Q(f_t)[F(Q) : F] \le d^d. \]
Proof: We will apply Lemma 9.6.14 for Cj := C , B := P1F , gj := gj,t and hence
C = Ct . We note first that the canonical images Qj ∈ Cj of Q are all poles of f .
Let us choose J ⊂ {0, . . . , N } such that, for every i = 0, . . . , N , there is exactly one
j ∈ J with Qj conjugate to Qi . For conjugates, we refer to Example A.4.14. Note that
−ordQ i (gi ) is equal to the multiplicity mi of the pole Qi with respect to f . Similarly,
the field Li defined in 9.6.13 is the same with respect to f and with respect to gi , since f
and gi differ only by a constant. Obviously, mi and Li do not depend on the conjugation
class of Qi . Now Lemma 9.6.14 yields
\[ -\mathrm{ord}_Q(f_t)[F(Q) : F] \le \prod_{j\in J} \bigl(m_j [L_j : F]\bigr). \]
over K and they are all conjugate with respect to Gal(L/K) . If Theorem 9.6.9 is known
for the restriction $f_i : C_i \to \mathbb{P}^1_L$ of $f$, then $f_i^*[n]$ is a prime divisor over $L$, for $n \in \mathbb{N}$
outside a set of natural density 0. But then $f^*[n] = \sum_i f_i^*[n]$ has to be a prime divisor
over $K$ by considering the Galois action. So we may assume $C$ geometrically irreducible.
For $r \ge 2$, let $g : C \to \mathbb{P}^1_K \times \mathbb{A}^{r-1}_K$ be the fibre product considered in Lemma 9.6.11 with
$C_i := C$ and $f_i := f$ for $i = 0, \ldots, N := r-1$. Let $R$ be the set of points in $\mathbb{P}^1_K \setminus \{\infty\}$
over which $f$ ramifies and define $t_0 := 0$. We set
\[ P(t_1, \ldots, t_{r-1}) := \prod_{z\in R-R}\ \prod_{i=0}^{r-2}\ \prod_{j=i+1}^{r-1} (t_i - t_j - z), \]
hence the points $t$ considered in Lemma 9.6.11(d) form the Zariski open dense subset
$U := \{P \neq 0\}$.
Clearly, $P$ is invariant under conjugation and hence defined over $K$. Note also that the
proof of Lemma 9.6.11 shows that the fibre $g_t : C_t \to \mathbb{P}^1_{K(t)}$ has degree $d^r$. To get an
irreducible smooth projective curve, we will replace $C_t$ by its normalization denoted by
$\widetilde{C}_t$. By Lemma 9.6.11(d), they are isomorphic over $\mathbb{P}^1_{K(t)} \setminus \{\infty\}$. The induced morphism
$f_t : \widetilde{C}_t \to \mathbb{P}^1_{K(t)}$ has also degree $d^r$ since the normalization morphism is birational (see
A.13.2).
Let $B \in \mathbb{N}$ and let $\mathcal{B}$ denote the box
\[ \mathcal{B} = \{t \mid 1 \le t_i \le B,\ i = 1, \ldots, r-1\}. \]
The number of points $b = (b_1, \ldots, b_{r-1}) \in \mathbb{N}^{r-1}$ in the box $\mathcal{B}$ is $B^{r-1}$, while trivially
the number of those which verify the further condition $P(b) = 0$ is at most $cB^{r-2}$, for
some constant $c$ depending only on $P$. Now we apply Theorem 9.6.6 to $f_b : C_b \to \mathbb{P}^1_K$,
for $b$ in the set
\[ H(B) := \{b \in \mathcal{B} \mid P(b) \neq 0\}. \]
By Lemma 9.6.16, we have
\[ \max_Q |\mathrm{ord}_Q(f_b)|\,[K(Q) : K] \le d^d, \tag{9.31} \]
Note also that $\deg(f_b) = d^r$. Therefore, using (9.31) and the last displayed inequality,
Theorem 9.6.6 applied with $S$ the set of archimedean places of $K$ yields
\[ \prod_{i=0}^{r-1} [K(P_i) : K] \ge [K : \mathbb{Q}]^{-1} d^{\,r-d} \tag{9.32} \]
for all $n \ge \kappa(B, r)$, where $\kappa(B, r)$ may depend on $B$, $r$, and the geometric data.
If $f^*[n + b_i]$ is not a prime divisor over $K$, then it has an irreducible component $Y$ with
$[K(Y) : K] \le d/2$ (see Example 1.4.12) and hence every $P_i \in f^{-1}(n + b_i)$ satisfies
$[K(P_i) : K] \le d/2$. Thus by (9.32) the number $s$ of indices $i$ for which $f^*[n + b_i]$ is not
a prime divisor over $K$ satisfies
\[ (d/2)^s d^{\,r-s} \ge [K : \mathbb{Q}]^{-1} d^{\,r-d}, \]
hence $[K : \mathbb{Q}]\, d^d \ge 2^s$ and finally
\[ s \le (d\log d + \log([K : \mathbb{Q}]))/\log 2. \]
Now let $n \ge \kappa(B, r)$ and suppose that there are at least $\rho B$ integers $m \in (n, B + n]$
such that $f^*[m]$ is not a prime divisor over $K$. Choosing entries of the form $b = m - n$,
we get at least $(\rho B)^{r-1} - cB^{r-2}$ points $b \in H(B)$. We choose
$r > (d\log d + \log([K : \mathbb{Q}]))/\log 2$. Then the above shows that no such points $b \in H(B)$ exist, hence
\[ \rho \le (c/B)^{1/(r-1)}. \]
For $B$ sufficiently large, this yields zero density for the set of integers $m$ with $f^*[m]$ not
a prime divisor.
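The arithmetic at the end of this proof is easy to tabulate. The sketch below computes the bound on $s$, the smallest admissible $r$, and the resulting bound $(c/B)^{1/(r-1)}$ on $\rho$. The function names are ours, and the value used for the constant $c$ is a placeholder, since the proof only asserts that $c$ depends on the polynomial $P$.

```python
import math

def reducible_index_bound(d, field_degree):
    """Upper bound (d*log d + log [K:Q]) / log 2 for the number s."""
    return (d * math.log(d) + math.log(field_degree)) / math.log(2)

def smallest_admissible_r(d, field_degree):
    """Smallest integer r >= 2 with r > (d*log d + log [K:Q]) / log 2."""
    return max(2, math.floor(reducible_index_bound(d, field_degree)) + 1)

def density_bound(c, B, r):
    """Bound (c/B)^(1/(r-1)) on the proportion rho; c is a placeholder constant."""
    return (c / B) ** (1 / (r - 1))

# Illustrative data: deg(f) = 3 over K = Q, with a hypothetical constant c = 10.
r = smallest_admissible_r(3, 1)
bounds = [density_bound(10.0, B, r) for B in (10 ** 6, 10 ** 8, 10 ** 10)]
```

For fixed $r$ and $c$ the bound decreases to 0 as $B$ grows, which is the zero-density statement.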
Remark 9.6.17. If we replace εw by [Tw : Kv ]/[F : K] according to our normalizations
in 1.3.12 then Theorem 9.6.2 and its proof hold for any field K with product formula.
Moreover, in Runge’s theorem, we can still say that the height of such points is effectively
bounded by a constant c . In the proof, some care is needed in case of a finite characteristic.
For F , we choose a sufficiently large finite normal extension and the argument shows that
we may replace [K(P ) : K] and [K(Q) : K] in the basic condition of Theorem 9.6.6 by
their separable degrees.
We claim that the same arguments as in the proof of Theorem 9.6.9 show that a field K
with product formula and of characteristic 0 is Hilbertian.
To see this, we choose a non-archimedean $v \in M_K$ and $\alpha \in K$ such that $\log|\alpha|_v$ is
above the bound $c$ in Runge’s theorem. Every $n \in \mathbb{N}$ coprime to the residue characteristic
of $v$ is a unit in $R_v$ and hence $|\alpha|_v = |\alpha + n|_v \ge e^c$. We conclude that $h(\alpha + n) \ge c$.
We apply Runge’s theorem in the proof of Theorem 9.6.9 to the fibres $f_b^{-1}(\alpha + n)$ and
with $S$ such that $\alpha$ is an $S$-unit. In the same way as in the number field case, we deduce
that the set of $n \in \mathbb{N}$ coprime to $\mathrm{char}(k(v))$ and with $f^*[\alpha + n]$ not a prime divisor over $K$
has natural density 0 in $\mathbb{N}$.
In fact, R. Weissauer [329] proved by different methods from model theory that every field
with product formula is Hilbertian.
9.7. Bibliographical notes
Large parts of this chapter are borrowed from the books of Lang [169] and Serre
[277]. The additional remarks 9.2.9–9.2.12 on dynamical systems and further
information may be found in G.S. Call and J.H. Silverman [55] and J.H. Silverman
[285]. The results from Section 9.4 are from D. Mumford [211]. Finally, Section
9.5 is due to Néron [218], see also [169], Ch.11.
As mentioned in the introduction, Néron obtained in [217] a best model for abelian
varieties A over the quotient field of a discrete valuation ring R . This Néron
model is a smooth group scheme over R with generic fibre A . It is proper over R
if and only if A has good reduction. A detailed account for Néron models may be
found in the book of S. Bosch, W. Lütkebohmert, and M. Raynaud [44].
In [218], an interpretation of the Néron symbol in terms of intersection multiplic-
ities on the Néron model is given as in Example 9.5.22. For the singular case,
we refer to [169], Ch.11, §5. Over the complex numbers, Néron [218] has also
expressed the Néron symbol in terms of theta functions. For explicit formulas
of Néron’s canonical local height on elliptic curves, we refer to Silverman [286],
Ch.VI, §3, §4.
Beilinson and Bloch have generalized the Néron pairing to a pairing (Y, Z)u for
disjoint cycles Y and Z algebraically equivalent to 0 on the smooth complete
variety X satisfying dim(Y ) + dim(Z) = dim(X) + 1 (see A. Beilinson [20], S.
Bloch [27], where there are further generalizations and related conjectures). For a
general approach of canonical local and global heights of subvarieties with respect
to line bundles, the reader is referred to [141].
The literature on Hilbert’s irreducibility theorem is quite ample and we refer to
Lang [169], Ch.9 for a treatment of rather general cases. A critical analysis of both
early and modern works on the subject is in Schinzel’s monograph on polynomials
[259]. Very general results were obtained by Weissauer [329] using methods of
non-standard analysis and logic; proofs on classical lines were given by M. Fried
[124].
Theorem 9.6.2 is due essentially to V.G. Sprindžuk [289], who used quite different
techniques related to the theory of G -functions. The proof given here is in [28],
with the corrections provided by P. Dèbes [86]. See also P. Dèbes [87] and P. Dèbes
and U. Zannier [88]. The proof of Runge’s theorem 9.6.6 follows an argument in
[28]. The proof of Theorem 9.6.9 is a modification of [88] avoiding the theory of
G -functions and using translations, rather than the general Möbius transformations
of [88] and [124], in order to deal with Hilbert subsets of N .
10 THE MORDELL–WEIL THEOREM
10.1. Introduction
The main content of this chapter is the proof of the Mordell–Weil theorem, namely
the finite generation of the group of rational points of an abelian variety defined
over a number field.
The finiteness of the rank of the group of rational points on an elliptic curve E
defined over Q was proved by L.J. Mordell in his celebrated paper [207]. Mordell
worked with the elliptic curve given by a quartic equation y 2 = a0 x4 +. . .+a4 and
used its parametrization by means of Jacobi elliptic functions and theta functions.
It was by no means obvious at the time how to extend this result to elliptic curves
over number fields and to abelian varieties, and this was done by Weil in his famous
thesis [324].
A. Weil [325] also realized that, in the case of elliptic curves, it was somewhat
simpler to work with a Weierstrass model rather than the quartic equation used
by Mordell, replacing the addition and duplication formulas of elliptic functions
used by Mordell by rational functions on the curve; since then this has become
the standard elementary approach to the Mordell–Weil theorem for elliptic curves
over a field.
The basic structure of the proof, which remains unchanged until now, is in two
stages. The first stage consists in proving the so-called weak Mordell–Weil the-
orem, namely the finiteness of A(K)/φA(K) for some non-trivial isogeny φ of
the abelian variety A , usually taken to be [m], namely multiplication by an inte-
ger m ≥ 2. In the second stage, we use a Fermat descent argument to complete
the proof. The explicit approach by Mordell using elliptic functions, even with
Weil’s simplifications, is not practical enough to be carried out explicitly on ellip-
tic curves when dealing with multiplication by a general m , let alone on abelian
varieties, which we do not quite know how to describe by means of a useful ex-
plicit set of equations. Thus in the general case we follow a more abstract point of
view, culminating in the systematic use of Galois cohomology in the proof.
In this chapter, we shall follow a mid-course, beginning in Section 10.2 with the
naive proof of the weak Mordell–Weil theorem for an elliptic curve over a number
field.
Section 10.3 contains a detailed proof of the important Chevalley–Weil theorem
on unramified morphisms, with its standard application (Corollary 10.3.13) that,
for an abelian variety $A$ over a number field $K$, the extension $K(\frac{1}{m}A(K))/K$ is
finite.
This leads in Section 10.4 to an alternative proof of the weak Mordell–Weil the-
orem through the Kummer pairing, and its standard interpretation in Galois co-
homology is the content of Section 10.5. We give also an extension of the weak
Mordell–Weil theorem suitable to function fields of transcendence degree 1. The
final short Section 10.6 concludes the proof of the Mordell–Weil theorem by means
of the Fermat descent.
The Mordell–Weil theorem as proved here is ineffective. Indeed, as yet no general
method is known for finding generators for the Mordell–Weil group A(K) of an
abelian variety A over a number field K . The ineffectiveness arises from the fact
that no procedure is known for finding representatives in A(K) of the finite group
A(K)/φA(K), due to our inability to decide whether a homogeneous space for A
has a rational point over K and, if so, to find an algorithm to produce such a point.
The question appears to be extraordinarily deep, even in the case of elliptic curves
over Q , and represents one of the most interesting open diophantine problems at
the time of writing this book.
For Section 10.2, the reader is assumed to be familiar with the basics of elliptic
curves presented in 8.3. For the other sections, we recommend first reading the
whole of Chapter 8. The proof of the Mordell–Weil theorem in 10.6 uses also the
fundamental properties of the Néron–Tate height from 9.2 and 9.3.
10.2.2. The intersection of E with the line P2K \ A2K in P2K is a divisor 3O , and
the point O = (0 : 0 : 1) is an inflexion point of E , which is taken as the identity
of the group multiplication. The affine part of E is simply E \ {O}, and in what
follows it will prove to be notationally convenient to write a point P ∈ E \ {O}
as P = (x, y) in terms of its affine coordinates.
Proposition 10.2.3. Let $\alpha_1, \alpha_2, \alpha_3 \in K$ and $f(x) := (x-\alpha_1)(x-\alpha_2)(x-\alpha_3)$.
Then the plane curve $X$ in $\mathbb{P}^2_K$, given in affine coordinates by $y^2 = f(x)$, is an
elliptic curve over $K$ if and only if the discriminant
\[ D_f := \prod_{i\neq j} (\alpha_i - \alpha_j) \]
of $f$ is not 0.
Proof: It is easy to show that $X$ is irreducible. By the Jacobi criterion A.7.15,
$(x, y) \in X$ is a singular point if and only if
\[ \frac{\partial}{\partial x}\bigl(y^2 - f(x)\bigr) = 0 \quad\text{and}\quad \frac{\partial}{\partial y}\bigl(y^2 - f(x)\bigr) = 0, \]
hence if and only if $f'(x) = 0$ and $2y = 0$. But $y = 0$ is equivalent to $f(x) = 0$,
while $f$ and $f'$ have a common zero if and only if the discriminant $D_f$ of $f$
vanishes.
In the affine neighborhood $\{y \neq 0\}$ of the point at infinity $O$, a defining equation
for $X$ is given by the polynomial
\[ g(z, x) := z - (x - z\alpha_1)(x - z\alpha_2)(x - z\alpha_3) \]
and since $\frac{\partial g}{\partial z}(0, 0) = 1$, the point $O$ is always smooth.
We conclude that $X$ is smooth if and only if $D_f \neq 0$. In this case, the genus
formula for plane curves in A.13.4 yields that $X$ is an elliptic curve.
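The criterion in this proof can be tried out on small examples. The sketch below works over $\mathbb{Q}$ (so characteristic different from 2, an assumption of ours for the illustration) with exact fractions: it computes $D_f$ as the product over ordered pairs $i \neq j$ and lists the affine singular points $(x, 0)$, which are exactly the repeated roots of $f$. The function names and sample roots are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

def discriminant(roots):
    """D_f = product over ordered index pairs i != j of (alpha_i - alpha_j)."""
    result = Fraction(1)
    for a, b in permutations(roots, 2):
        result *= Fraction(a) - Fraction(b)
    return result

def affine_singular_points(roots):
    """Points (x, 0) on y^2 = f(x) with f(x) = 0 and f'(x) = 0."""
    a1, a2, a3 = (Fraction(r) for r in roots)

    def f_prime(x):
        # Derivative of (x - a1)(x - a2)(x - a3) by the product rule.
        return (x - a2) * (x - a3) + (x - a1) * (x - a3) + (x - a1) * (x - a2)

    return sorted({(r, Fraction(0)) for r in (a1, a2, a3) if f_prime(r) == 0})

smooth_case = (discriminant((0, 1, -1)), affine_singular_points((0, 1, -1)))
nodal_case = (discriminant((0, 0, 1)), affine_singular_points((0, 0, 1)))
```

For $y^2 = x(x-1)(x+1)$ the discriminant is non-zero and there is no singular point, while $y^2 = x^2(x-1)$ has $D_f = 0$ and a singular point at the origin.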
Under the assumptions of 10.2.1, we describe now the morphism [2] explicitly.
Let (x, y) be the standard affine coordinates of E . We have a group structure on
E given by the zero element O at infinity and we have the geometric description
of addition and inverse given in 8.3.6–8.3.7.
Let P = (x0 , y0 ) ∈ E(K); then −2P is equal to the third intersection point of
the tangent at P with E . If 2P = −2P = O , this tangent is vertical, proving:
Proposition 10.2.4. The group E[2] of 2-torsion points of E consists of the iden-
tity element O and the points (αi , 0), i = 1, 2, 3, of order 2.
10.2.5. Let P = (x0 , y0 ) ∈ E(K) and suppose that P is not a 2-torsion point.
The tangent line at P has equation
y = ax + b (10.1)
The reader is assumed to be familiar with the concepts of bounded sets from Sec-
tion 2.6 and ramification from Section A.12 and Appendix B. To fix the ideas, let
us consider for the moment an unramified finite morphism ϕ : Y → X of pro-
jective varieties over a field K . For P ∈ Y, Q := ϕ(P ) and a discrete valuation
v of K , we may ask the question whether the field extension K(P )/K(Q) is
unramified at all places over v . This is not true in general, but the local form of
the theorem of Chevalley–Weil states that the discriminant of the completions of
K(P )/K(Q) divides an element of the valuation ring independent of P .
This occupies us in the first part of this section, and then we globalize the statement
to families of places. In particular, if K is a number field, then we will see that
K(P )/K(Q) is unramified over all but finitely many places v .
Together with Hermite’s theorem, this will lead us to the finiteness of $K(\frac{1}{m}A(K))/K$
for an abelian variety $A$ over $K$ and $m \in \mathbb{Z} \setminus \{0\}$.
10.3.1. Let $K$ be a field with a non-archimedean absolute value $|\ |$ on the algebraic
closure $\overline{K} \supset K$. We will first prove a local version of the Chevalley–Weil
theorem with respect to this absolute value, under fairly general conditions. We
need to introduce first some notation:
Recall that, for $Q \in X(\overline{K})$, the residue field $K(Q)$ is the intermediate field
of $\overline{K}/K$ generated by the coordinates of $Q$ (in any affine chart). Let $R_Q$ be
the valuation ring of the restriction of $|\ |$ to $K(Q)$. The valuation ring of the
restriction of $|\ |$ to $K$ is denoted by $R$, and completions with respect to $|\ |$ will
be denoted by a hat $\widehat{\ \ }$.
Let $\varphi : Y \to X$ be a morphism of $K$-varieties and let $P \in Y(\overline{K})$ with image
$Q \in X(\overline{K})$. Since $\varphi$ is defined over $K$, we have $K(Q) \subset K(P)$ and
$R_Q = K(Q) \cap R_P$.
We define the local discriminant
$$\mathfrak{d}_{P/Q} := \{\det(\mathrm{Tr}_{\widehat{K(P)}/\widehat{K(Q)}}(a_i b_j)) \mid a_1, \dots, a_d, b_1, \dots, b_d \in \widehat{R}_P\},$$
where we have abbreviated $d$ for the local degree $d_P := [\widehat{K(P)} : \widehat{K(Q)}]$.
If $R$ is a discrete valuation ring, then this agrees with the discriminant
$\mathfrak{d}_{\widehat{R}_P/\widehat{R}_Q}$ from B.1.14.
Lemma 10.3.2. The discriminant $\mathfrak{d}_{P/Q}$ is an ideal in $\widehat{R}_Q$.
Proof: Replacing $a_1$ by $\lambda a_1$ for any $\lambda \in \widehat{R}_Q$, it is evident that
$\widehat{R}_Q\,\mathfrak{d}_{P/Q} \subset \mathfrak{d}_{P/Q}$.
To prove the claim, it is enough to show that the trace of any $a \in \widehat{R}_P$ is in $\widehat{R}_Q$.
This follows from the fact that a subset $S$ of a valuation ring $R$ is an ideal if and
only if $RS \subset S$. Let $f(t) = t^m + a_{m-1}t^{m-1} + \cdots + a_0$ be the minimal polynomial
of $a$ over $\widehat{K(Q)}$. By transitivity of traces, it is clear that
$\mathrm{Tr}_{\widehat{K(P)}/\widehat{K(Q)}}(a)$ is a $\mathbb{Z}$-multiple of $a_{m-1}$.
Completeness of $\widehat{K(Q)}$ ensures that all conjugates of $a$ have
the same absolute value (use Proposition 1.2.7) and hence $f(t) \in \widehat{R}_Q[t]$. This
proves $\mathrm{Tr}_{\widehat{K(P)}/\widehat{K(Q)}}(a) \in \widehat{R}_Q$ and the claim.
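To get a feeling for determinants of the form $\det(\mathrm{Tr}(a_i b_j))$, one may compute them in a toy global situation. The sketch below is an illustration only: it works in the quadratic field $\mathbb{Q}(\sqrt{d})$ over $\mathbb{Q}$ rather than in the completions used above, and the representation of elements as pairs and the function names are our own assumptions.

```python
from fractions import Fraction

# An element u + v*sqrt(d) of Q(sqrt(d)) is stored as the pair (u, v).

def mul(x, y, d):
    """Product of two elements of Q(sqrt(d))."""
    return (x[0] * y[0] + d * x[1] * y[1], x[0] * y[1] + x[1] * y[0])

def trace(x):
    """Trace from Q(sqrt(d)) to Q: (u + v*sqrt(d)) + (u - v*sqrt(d)) = 2u."""
    return 2 * x[0]

def trace_form_det(a, b, d):
    """det(Tr(a_i * b_j)) for two families a, b of two elements each."""
    m = [[trace(mul(ai, bj, d)) for bj in b] for ai in a]
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

one = (Fraction(1), Fraction(0))
i = (Fraction(0), Fraction(1))            # sqrt(-1) when d = -1
w = (Fraction(1, 2), Fraction(1, 2))      # (1 + sqrt(5))/2 when d = 5

print(trace_form_det([one, i], [one, i], -1))   # basis 1, i of Z[i]
print(trace_form_det([one, w], [one, w], 5))    # basis 1, w of Z[w]
```

The two values are the familiar discriminants $-4$ of $\mathbb{Z}[i]$ and $5$ of $\mathbb{Z}[(1+\sqrt{5})/2]$. Scaling $a_1$ by a rational $\lambda$ scales the determinant by $\lambda$, which is the linearity used at the start of the proof above.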
Now we state the local Chevalley–Weil theorem. Briefly, it means that for an
unramified morphism, the extension K(P )/K(Q) cannot be too ramified.
Proposition 10.3.3. Let us fix an embedding of the field $K$ into $\overline{K}$ and let $|\ |$ be
a non-archimedean absolute value on $\overline{K}$. Let $\varphi : Y \to X$ be a finite unramified
morphism of $K$-varieties and $E$ be a bounded set in $X$. Then there is $\alpha \in R \setminus \{0\}$
such that $\alpha \in \mathfrak{d}_{P/Q}$ whenever $P \in Y(\overline{K})$, with $Q := \varphi(P) \in E$.
Proof: It is known that an unramified morphism is locally equal to a closed em-
bedding followed by a standard étale morphism (see A.12.17). This means that for
↓ ↓
K[W ] →→ K(P )
Here $\mathfrak{d}^{w_0}_{P/Q}$ is the discriminant of the integral closure of $R_{w_0}$ in $K(P)$
over $R_{w_0}$, where $R_{w_0}$ is the valuation ring of $w_0$ in $K(Q)$.
Proof: To prove the claim, we may assume $Y$ irreducible and $\varphi$ surjective (since
$\varphi$ is a closed map, A.12.4). By Proposition 2.6.17, $X(\overline{K})$ is $M$-bounded in $X$,
where $M$ is the set of places on $\overline{K}$ extending those of $M_K$. Now the result
follows from the remarks below, Theorem 10.3.5 and the decomposition
$$\mathfrak{d}^{w_0}_{P/Q} R_{w_0} = \prod_{w} \mathfrak{d}^{w}_{P/Q}$$
of B.1.21, where $w$ ranges over all places in $M_{K(P)}$ with $w | w_0$. Note that the
extension $K(P)/K(Q)$ is separable because $\varphi$ is unramified. The number of
places $w$ lying over $w_0$ is bounded by $[K(P) : K(Q)]$ (see Remark 1.3.3) and
this is uniformly bounded (by the maximum of the $d = \deg(f)$ occurring in the
proof of Proposition 10.3.3). Finally note that $K(P)/K(Q)$ is unramified over
$w_0$ if and only if $1 \in \mathfrak{d}^{w_0}_{P/Q}$, see B.2.13.
Now we can state the global version of the Chevalley–Weil theorem for number
fields.
Theorem 10.3.7. Let $K$ be a number field and let $\varphi : Y \to X$ be an unramified
finite morphism of $K$-varieties. If $X$ is complete, then there is a non-zero $\alpha \in O_K$
such that for any $P \in Y(\overline{K})$ and $Q := \varphi(P)$ the discriminant $\mathfrak{d}_{P/Q}$ of $O_{K(P)}$
over $O_{K(Q)}$ contains $\alpha$.
Proof: By B.1.20, we have in the notation of Corollary 10.3.6
$$\mathfrak{d}_{P/Q} = \bigcap_{w_0} \left(\mathfrak{d}^{w_0}_{P/Q} \cap O_{K(Q)}\right),$$
where w0 ranges over all non-archimedean places of K(Q). Now we apply Corol-
lary 10.3.6 with MK the set of non-archimedean places of K . If v ∈ S , then we
may choose αv ∈ OK . Note that in a Dedekind domain the product or intersection
Finally, we apply the Chevalley–Weil theorem to an abelian variety and the mor-
phism multiplication by m . In 10.3.9 and in Proposition 10.3.10, we relate the
notion of good reduction to the discriminants dP/Q introduced before. Here, the
reader should be familiar with schemes. The main reference is [44]. We shall
not need these results in the further course of this book, and the reader may move
immediately to Theorem 10.3.11.
10.3.9. Let $A$ be an abelian variety over the field $K$ and let $R_v$ be a discrete valuation
ring in $K$. We say that $A$ has good reduction in $v$ if there is a proper smooth scheme $\mathcal{A}$
over $R_v$ such that we may identify the generic fibre $\mathcal{A}_K$ with $A$.
Assume also that $A$ has good reduction in $v$ and let $Y$ be any smooth scheme over $R_v$.
A morphism $Y_K \to A$ over $K$ induces a rational map $Y \dashrightarrow \mathcal{A}$, defined on the points
of codimension 1 by the valuative criterion of properness (see [148], Theorem II.4.7). By
the analogue of Theorem 8.2.21 for group schemes, it is a morphism. Then $\mathcal{A}$ is unique
up to isomorphisms which extend the identity and is called the Néron model of $A$. For
$m \in \mathbb{Z} \setminus \{0\}$, the morphism multiplication by $m$ extends to $\mathcal{A}$, as well as the group
structure. It can be shown that this extension $[m]_{\mathcal{A}}$ is flat ([44], 7.3.2). If $m$ is not divisible
by the characteristic of the residue field relative to $v$, then $[m]_{\mathcal{A}}$ is fibre-wise étale (by
Proposition 8.7.2) and it is even an étale morphism ([44], 7.3.2).
The local Chevalley–Weil theorem for abelian varieties may be stated in the following
precise form:
Proposition 10.3.10. Let $A$ be an abelian variety over $K$, let $m \in \mathbb{Z} \setminus \{0\}$ and let $v$ be
a discrete valuation on $K$. If $A$ has good reduction in $v$ and if the characteristic of the
residue field does not divide $m$, then for any $P \in A(\overline{K})$, the extension $K(P)/K([m]P)$
is unramified at all places lying over $v$.
10.4. The weak Mordell–Weil theorem (abelian varieties)
Let A be an abelian variety over the field K . The goal of this section is the proof
of the following statement, known as the weak Mordell–Weil theorem.
Theorem 10.4.1. Let A be an abelian variety over a number field K and let m
be a positive integer. Then A(K)/mA(K) is finite.
10.4.2. To begin with we need to introduce some notation. As usual, $\mathrm{Gal}(L/K)$
denotes the Galois group of an intermediate field extension $K \subset L \subset \overline{K}$. Let
$g \in \mathrm{Gal}(L/K)$ and let $X$ be a variety over $K$. We view a point $x \in X(L)$ as
belonging to some affine chart, with affine coordinates in $L$. Applying $g^{-1}$ to
the coordinates, we get a well-defined point $x^g \in X(L)$. Clearly, $x^{gh} = (x^g)^h$
and hence we have an action of the Galois group on $X(L)$. If $\varphi : X \to Y$
is a morphism over $K$, then $\varphi(x^g) = \varphi(x)^g$. If $F$ denotes the fixed field of
Recall that A[m] denotes the group of m -torsion points of A . The next statement
is contained in 10.2.13. We give an alternative proof using methods of Kummer
theory.
Lemma 10.4.3. Let $L$ be a finite Galois extension of $K$ and let $m \in \mathbb{Z} \setminus \{0\}$. If
$A(L)/mA(L)$ is finite, then $A(K)/mA(K)$ is finite.
Proof: The inclusion $A(K) \subset A(L)$ induces a homomorphism
$$A(K)/mA(K) \longrightarrow A(L)/mA(L)$$
of abelian groups. Let $N$ be its kernel. It is enough to show that $N$ is finite.
Choose a system of representatives in $A(K)$ for $N$. For each representative $a$,
choose $b_a \in A(L)$ such that $a = mb_a$. Consider an element $g \in \mathrm{Gal}(L/K)$ and
define
$$\lambda_a(g) := b_a^g - b_a.$$
By 10.4.2, we have
$$m\lambda_a(g) = (mb_a)^g - mb_a = a^g - a.$$
By $K$-rationality of $a$, this is zero, so $\lambda_a(g) \in A[m]$. Using our system of
representatives, the rule $a \mapsto \lambda_a$ defines a map from $N$ to the set of maps
$$\mathrm{Gal}(L/K) \longrightarrow A[m].$$
$N$ will be finite if the map is injective and the range is finite. The latter statement
follows from Proposition 8.7.2. In order to prove the former, let us suppose that
$\lambda_a = \lambda_{a'}$ for representatives $a, a'$. We have
$$b_a^g - b_a = b_{a'}^g - b_{a'}$$
and hence
$$(b_a - b_{a'})^g = b_a - b_{a'}$$
for every $g$, or equivalently $b_a - b_{a'} \in A(K)$ (by 10.4.2). Therefore, by applying
$[m]$ we get $a - a' \in mA(K)$, and since $a, a'$ are representatives, $a = a'$.
10.4.4. An important step in the proof of the weak Mordell–Weil theorem is the
generalization of some aspects of Kummer theory to abelian varieties.
Let $m \in \mathbb{Z} \setminus \{0\}$ be not divisible by $\mathrm{char}(K)$ and assume that $A[m] \subset A(K)$.
We denote the separable algebraic closure of $K$ in $\overline{K}$ by $K^s$. For $a \in A(K)$,
there is $b \in A(K^s)$ such that $a = mb$ (using $[m]$ unramified from Proposition
8.7.2, every such $b \in A(\overline{K})$ is in $A(K^s)$). If $g \in \mathrm{Gal}(K^s/K)$, then we define
$$\langle a, g \rangle := b^g - b.$$
By 10.4.2, we have $\langle a, g \rangle \in A[m]$.
This shows that $\langle a, g \rangle$ is independent of the choice of $b$ (choose $b \in A[m]$ and
use that $b \in A(K)$ by assumption). Moreover, we see that $\langle\ ,\ \rangle$ is linear in the
first variable.
The map
$$\langle\ ,\ \rangle : A(K) \times \mathrm{Gal}(K^s/K) \longrightarrow A[m]$$
is called the Kummer pairing. The right-kernel of $\langle\ ,\ \rangle$ is defined by
$$\{g \in \mathrm{Gal}(K^s/K) \mid \langle a, g \rangle = 0 \text{ for every } a \in A(K)\}$$
and the left-kernel is defined in a similar fashion by
$$\{a \in A(K) \mid \langle a, g \rangle = 0 \text{ for every } g \in \mathrm{Gal}(K^s/K)\}.$$
As in Corollary 10.3.13, let $K(\frac{1}{m}A(K))$ be the smallest intermediate field
$K \subset L \subset \overline{K}$ such that any $b \in A(\overline{K})$ with $mb \in A(K)$ is rational over $L$.
Proposition 10.4.5. The Kummer pairing is bilinear, with left-kernel $mA(K)$ and
right-kernel the subgroup $\mathrm{Gal}(K^s/K(\frac{1}{m}A(K)))$ of $\mathrm{Gal}(K^s/K)$.
Proof: Let $g, g' \in \mathrm{Gal}(K^s/K)$. Using the notation and arguments of 10.4.2 and
10.4.4, we have
$$\langle a, gg' \rangle = b^{gg'} - b = (b^g - b)^{g'} + b^{g'} - b.$$
Since $\langle a, g \rangle$ is $K$-rational by assumption, we get
$$\langle a, gg' \rangle = \langle a, g \rangle + \langle a, g' \rangle.$$
This proves linearity in the second variable and thus $\langle\ ,\ \rangle$ is bilinear (see 10.4.4).
For $a \in mA(K)$, choose $b \in A(K)$ such that $a = mb$. By $K$-rationality of $b$,
we have
$$\langle a, g \rangle = b^g - b = 0$$
for every $g \in \mathrm{Gal}(K^s/K)$. Conversely, let $a$ be in the left-kernel. For any
$b \in A(K^s)$ with $a = mb$, we have
$$0 = \langle a, g \rangle = b^g - b.$$
Since this is true for every $g \in \mathrm{Gal}(K^s/K)$ and since $K$ is the fixed field of the
Galois group, we conclude $b \in A(K)$ (see 10.4.2). So the left-kernel is equal to
$mA(K)$.
Obviously, $\mathrm{Gal}(K^s/K(\frac{1}{m}A(K)))$ is contained in the right-kernel $H$.
On the other hand, let $g$ be an element of the right-kernel. For $b \in A(K^s)$ with
$mb \in A(K)$, we have $b^g = b$. It follows that the restriction of $g$ to the residue
field $K(b)$ is equal to the identity, hence the same is true for the restriction of $g$
to $K(\frac{1}{m}A(K))$. This proves $H \subset \mathrm{Gal}(K^s/K(\frac{1}{m}A(K)))$. We conclude that
equality holds.
Remark 10.4.6. It follows from Proposition 10.4.5 that the right-kernel is a closed
normal subgroup of $\mathrm{Gal}(K^s/K)$. By Galois theory, $K(\frac{1}{m}A(K))$ is a Galois
extension of $K$. By the same Proposition 10.4.5, we conclude that the Kummer
pairing induces a non-degenerate pairing
$$A(K)/mA(K) \times \mathrm{Gal}(K(\tfrac{1}{m}A(K))/K) \longrightarrow A[m]$$
(i.e. left- and right-kernel are zero). Thus in order to prove the finiteness of the
group $A(K)/mA(K)$, it is enough to show that $\mathrm{Gal}(K(\frac{1}{m}A(K))/K)$ is finite.
Proof of Theorem 10.4.1: By Lemma 10.4.3 and Proposition 8.7.2, we may assume
that $A[m] \subset A(K)$. Since here $K$ is a number field, we see that $K(\frac{1}{m}A(K))/K$
is finite by Corollary 10.3.13. As we have seen in Remark 10.4.6, this is enough
to prove the weak Mordell–Weil theorem.
In this section, we supplement the previous sections with additional information. The reader
may skip it on a first reading because the results are of minor importance in our book. First,
we recall Kummer theory of algebraic field extensions. For completeness, we give the classical
interpretation of the Kummer pairing in 10.4.4 in terms of Galois cohomology. This is essential
for a deeper understanding of the group $A(K)/mA(K)$, going beyond its finiteness. Then
we use Kummer theory to give a generalization of the proof of the weak Mordell–Weil
theorem working also for a curve over an algebraically closed field.
10.5.1. Let us recall the basic facts from Kummer theory.
Let $K$ be a field and $m \in \mathbb{N}$, $m > 0$. Assume that the group $\mu_m(K)$ of $m$th roots of unity
in $K$ has $m$ elements (hence $m$ is not divisible by $\mathrm{char}(K)$). A finite-dimensional extension
$L/K$ is called abelian of exponent $m$ if $L/K$ is a Galois extension and $\mathrm{Gal}(L/K)$
is abelian and if the least common multiple of the orders of the elements of $\mathrm{Gal}(L/K)$
is $m$. For $S \subset K^\times$, we denote by $K(S^{1/m})$ the smallest subfield of $K^s$ containing
$\{\alpha \in K^s \mid \alpha^m \in S\}$ and $K$. Moreover, we write $K^{\times m}$ for the subgroup of $m$th powers
in $K^\times$ and, more generally, for any subgroup $H$ of $K^\times$, we will write $H^m$ for the group
of $m$th powers in $H$.
10.5.4. We return to the case when $A$ is an abelian variety over the field $K$. Here
$m \in \mathbb{Z} \setminus \{0\}$ is not divisible by $\mathrm{char}(K)$.
We note that the Galois group $\mathrm{Gal}(K^s/K)$ is a profinite group. By 10.4.4, we have a short
exact sequence of $\mathrm{Gal}(K^s/K)$-modules
$$0 \longrightarrow A[m] \longrightarrow A(K^s) \xrightarrow{\ [m]\ } A(K^s) \longrightarrow 0.$$
Here $A(K^s)$ has the discrete topology. By 10.4.2, the fixed part of $A(K^s)$ and $A[m]$
under the action of $\mathrm{Gal}(K^s/K)$ are $A(K)$ and $A(K)[m]$ respectively. The first part of
the long exact cohomology sequence reads then
$$0 \longrightarrow A(K)[m] \longrightarrow A(K) \longrightarrow A(K) \longrightarrow H^1(\mathrm{Gal}(K^s/K), A[m])$$
$$\longrightarrow H^1(\mathrm{Gal}(K^s/K), A(K^s)) \xrightarrow{\ [m]\ } H^1(\mathrm{Gal}(K^s/K), A(K^s)),$$
where the last map is multiplication by $m$. For $a \in A(K)$, a closer look at the definition
of the map $\delta : a \mapsto \delta_a$ shows that $\delta_a$ is the following derivation: Choose $b_a \in A(K^s)$
such that $mb_a = a$, and set
$$\delta_a(g) = b_a^g - b_a, \quad g \in \mathrm{Gal}(K^s/K).$$
Since $a$ is $K$-rational and $[m]$ is defined over $K$, $\delta_a(g)$ is indeed an element of $A[m]$. It
is easily seen that $\delta_a$ is a continuous derivation. We get a short exact sequence
$$0 \longrightarrow A(K)/mA(K) \longrightarrow H^1(\mathrm{Gal}(K^s/K), A[m]) \longrightarrow H^1(\mathrm{Gal}(K^s/K), A(K^s))[m] \longrightarrow 0$$
called the Kummer sequence. If $A[m] \subset A(K)$, then
$$H^1(\mathrm{Gal}(K^s/K), A[m]) = \mathrm{Hom}(\mathrm{Gal}(K^s/K), A[m])$$
(the action of the Galois group is trivial, use Leibniz’s rule). Hence the Kummer sequence
may be viewed as an analogue of the Kummer pairing when we do not assume
$A[m] \subset A(K)$.
10.5.5. A basic step in the proof of the weak Mordell–Weil theorem is the finiteness of
the extension $K(\frac{1}{m}A(K))/K$. This was shown in Corollary 10.3.13, using the theorems of
Chevalley–Weil and Hermite. Next, we give an alternative proof of the finiteness, which is
also valid when $K$ is the function field of a curve. Here $m \in \mathbb{N} \setminus \{0\}$ is always an integer
not divisible by $\mathrm{char}(K)$ and we also assume that the field $K$ contains all $m$th roots of
unity.
By Kummer theory (see Theorem 10.5.2), there is a unique maximal abelian extension
$K^{\mathrm{ab}}/K$ of exponent dividing $m$, contained in $\overline{K}$.
Let $v$ be a valuation on $K$, i.e. there is a non-archimedean absolute value $|\ |$ on $K$ with
$v = -\log |\ |$. We assume $v$ to be a non-trivial valuation and we will usually identify $v$
with the corresponding place of $K$.
Let $K^{\mathrm{nr}}_v/K$ be the maximal subextension of $K^{\mathrm{ab}}/K$ which is unramified over $v$ (see
B.2.8). By Kummer theory, we have the corresponding subgroup $H_v := (K^{\mathrm{nr}}_v)^{\times m} \cap K^\times$
of $K^\times$.
Lemma 10.5.6. Suppose that the characteristic of the residue field of $v$ does not divide $m$.
Then
$$H_v = \{\alpha \in K \mid v(\alpha) \in m\,v(K^\times)\}.$$
Proof: Let $\alpha = \beta^m \in H_v$ with $\beta \in K^{\mathrm{nr}}_v$. Since $K(\beta)/K$ is unramified over $v$, $m$
divides $v(\alpha)$ in the value group of $v$ (by Lemma B.2.6). On the other hand, let $\alpha \in K^\times$
be with $m \mid v(\alpha)$ in the value group of $v$. Again by Lemma B.2.6, $K(\alpha^{1/m})$ is unramified
over $K$, whence $K(\alpha^{1/m}) \subset K^{\mathrm{nr}}_v$. This proves $\alpha \in H_v$ and the claim.
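The lemma can be tested by hand in the simplest case. The sketch below assumes $K = \mathbb{Q}$, $m = 2$ and $v$ the $p$-adic valuation for an odd prime $p$, and it relies on the standard fact that $\mathbb{Q}(\sqrt{d})$, with $d$ squarefree, ramifies exactly at the primes dividing its discriminant. The function names are ours.

```python
def vp(n, p):
    """p-adic valuation of a nonzero integer n."""
    n, k = abs(n), 0
    while n % p == 0:
        n //= p
        k += 1
    return k

def squarefree_part(n):
    """The squarefree integer s with n = s * (a square), for nonzero n."""
    sign = -1 if n < 0 else 1
    n, s, q = abs(n), 1, 2
    while q * q <= n:
        e = 0
        while n % q == 0:
            n //= q
            e += 1
        if e % 2:
            s *= q
        q += 1
    return sign * s * n  # what is left of n is 1 or a single prime

def unramified_at(alpha, p):
    """Is Q(sqrt(alpha))/Q unramified at the odd prime p?"""
    d = squarefree_part(alpha)
    if d == 1:
        return True  # alpha is a square, the extension is trivial
    disc = d if d % 4 == 1 else 4 * d  # discriminant of Q(sqrt(d))
    return disc % p != 0

# Compare both sides of the lemma for small alpha and a few odd primes.
for p in (3, 5, 7):
    for alpha in range(-50, 51):
        if alpha != 0:
            assert unramified_at(alpha, p) == (vp(alpha, p) % 2 == 0)
```

In this toy case the condition $v(\alpha) \in m\,v(K^\times)$ reads "$v_p(\alpha)$ is even", and the loop confirms that it matches unramifiedness of $\mathbb{Q}(\sqrt{\alpha})$ at $p$.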
Remark 10.5.7. Let $M$ be a set of non-trivial valuations on $K$. An extension $L/K$ is
said to be $M$-unramified if $L/K$ is unramified over all $v \in M$. The unique maximal
element $K^{\mathrm{nr}}_M/K$ in the set of $M$-unramified subextensions of $K^{\mathrm{ab}}/K$ is given by
$$K^{\mathrm{nr}}_M := \bigcap_{v \in M} K^{\mathrm{nr}}_v.$$
divides $m$, then Lemma B.2.6 shows that $L/K$ is unramified over $v \in M$ and hence $L$ is
contained in $K^{\mathrm{nr}}_M = L^{\mathrm{nr}}_{M_L}$ (see Proposition B.2.3), where $M_L$ is the set of valuations on
$L$ restricting to $M$ in $K$.
10.5.8. Next, we need an analogue of the class group of a number field. We assume that M
satisfies the following finiteness condition
(F) For every α ∈ K × , we have v(α) = 0 up to finitely many v ∈ M .
An $M$-divisor is a finite formal sum
$$\sum_{v} n_v [v],$$
By assumption (F), this is a finite sum. The set $\{\mathrm{div}_M(f) \mid f \in K^\times\}$ forms a subgroup
of $\mathrm{Div}_M(K)$. The quotient of $\mathrm{Div}_M(K)$ by this subgroup is called the $M$-class group of
$K$ and will be denoted by $\mathrm{Cl}_M(K)$.
For the next claim, we need the subgroup
$$U_M := \{\alpha \in K^\times \mid v(\alpha) = 0 \text{ for every } v \in M\}$$
of $K^\times$, which is the kernel of the homomorphism $\mathrm{div}_M$.
Proposition 10.5.9. Let $m \in \mathbb{N} \setminus \{0\}$, let $K$ be a field containing all $m$th roots of unity
and let $M$ be a set of valuations on $K$, satisfying the finiteness condition (F) from 10.5.8
and with residue characteristics not dividing $m$. Then
$$[K^{\mathrm{nr}}_M : K] = [U_M : U_M^m] \cdot |\mathrm{Cl}_M(K)[m]|$$
(possibly $\infty$).
Proof: By Kummer theory (see Theorem 10.5.2) and Remark 10.5.7, we have
$$[K^{\mathrm{nr}}_M : K] = [H_M : K^{\times m}].$$
For every $\alpha \in H_M$ there is an element $a \in \mathrm{Div}_M(K)$ with $\mathrm{div}_M(\alpha) = ma$ (see Remark
10.5.7). The map $\alpha \mapsto a$ yields a surjective homomorphism
$$H_M \longrightarrow \mathrm{Cl}_M(K)[m].$$
Obviously, $K^{\times m}$ is in the kernel. Let us consider the group homomorphism
$$\varphi : H_M/K^{\times m} \longrightarrow \mathrm{Cl}_M(K)[m]$$
and let $\alpha$ be in its kernel. There is $f \in K^\times$ such that
$$\mathrm{div}_M(\alpha) = m\,\mathrm{div}_M(f) = \mathrm{div}_M(f^m),$$
showing that $\alpha/f^m \in U_M$. It follows that we have a natural exact sequence
$$0 \longrightarrow U_M/U_M^m \longrightarrow H_M/K^{\times m} \longrightarrow \mathrm{Cl}_M(K)[m] \longrightarrow 0$$
and this proves the claim.
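As a sanity check of the degree formula, one may take $K = \mathbb{Q}$, $m = 2$ and $M$ the set of $p$-adic valuations for odd primes $p$. Under these assumptions (ours, for illustration) $U_M = \{\pm 2^k\}$, so $[U_M : U_M^2] = 4$, and the $M$-class group is that of $\mathbb{Z}[1/2]$, which is trivial; the formula then predicts degree 4. The sketch below counts the quadratic fields unramified at all odd primes in a search range, using the standard discriminant criterion; the function names are hypothetical.

```python
def is_squarefree(n):
    """True if no square q*q with q >= 2 divides n."""
    n, q = abs(n), 2
    while q * q <= n:
        if n % (q * q) == 0:
            return False
        q += 1
    return True

def odd_part(n):
    """Largest odd divisor of |n|, for nonzero n."""
    n = abs(n)
    while n % 2 == 0:
        n //= 2
    return n

def quadratic_fields_unramified_at_odd_primes(bound):
    """Squarefree d != 1 with |d| <= bound and disc(Q(sqrt(d))) a power of 2 up to sign."""
    found = []
    for d in range(-bound, bound + 1):
        if d in (0, 1) or not is_squarefree(d):
            continue
        disc = d if d % 4 == 1 else 4 * d  # discriminant of Q(sqrt(d))
        if odd_part(disc) == 1:
            found.append(d)
    return found

fields = quadratic_fields_unramified_at_odd_primes(100)
print(fields)  # the d with Q(sqrt(d)) unramified at every odd prime

# An elementary abelian 2-extension of degree 2^r has 2^r - 1 quadratic
# subfields, so three quadratic subfields correspond to degree 4.
predicted_degree = 4 * 1  # [U_M : U_M^2] * |Cl_M(Q)[2]| under our assumptions
assert len(fields) + 1 == predicted_degree
```

The three fields found are $\mathbb{Q}(\sqrt{-1})$, $\mathbb{Q}(\sqrt{2})$ and $\mathbb{Q}(\sqrt{-2})$, whose compositum $\mathbb{Q}(\sqrt{-1}, \sqrt{2})$ has degree 4 over $\mathbb{Q}$, in agreement with the proposition in this case.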
Lemma 10.5.10. Let $K$ be a field with a set $M$ of discrete valuations and let $m$ be a
positive integer not divisible by any of the residue characteristics of $v \in M$. We denote by
$M_L$ the set of valuations on an extension $L/K$ restricting to $M$. If $[L^{\mathrm{nr}}_{M_L} : L] < \infty$ for
every finite extension $L/K$, then $[K^{\mathrm{nr}}_{M \setminus S} : K] < \infty$ for every finite subset $S \subset M$.
Proof: By Remark 10.5.7, we may assume that $K$ contains all $m$th roots of unity.
Let $\pi_v$ be a local parameter of $v$ and let $L$ be the field extension of $K$ generated by
$\{\pi_v^{1/m} \mid v \in S\}$. Then $L/K$ is a finite extension. It is obvious that, if $w \in M_L$, $w | v \in S$,
and $\alpha \in K^\times$, then $m$ belongs to the value group of $w$ and that $m$ divides $v(\alpha)$. By
Lemma B.2.6, $L(\alpha^{1/m})/L$ is unramified over $w$. Therefore, $L(H^{1/m}_{M \setminus S})$ is unramified over
$w$ (see Definition B.2.8). By Kummer theory (see Theorem 10.5.2) and Proposition B.2.4,
this is true also for $w \in M_L$, $w | v \notin S$. Since $L(H^{1/m}_{M \setminus S})$ has exponent dividing $m$, we
conclude that $L(H^{1/m}_{M \setminus S}) \subset L^{\mathrm{nr}}_{M_L}$. The claim follows from
$$K^{\mathrm{nr}}_{M \setminus S} = K(H^{1/m}_{M \setminus S}) \subset L(H^{1/m}_{M \setminus S}) \subset L^{\mathrm{nr}}_{M_L}.$$
Example 10.5.11. Let $K$ be a number field, $M := M_K$ the canonical set of places, and $S$
a finite subset of $M$ containing all archimedean places. In such a situation, $L/K$ is called
unramified outside $S$, if the extension is $(M \setminus S)$-unramified. The group $\mathrm{Cl}_{M \setminus S}(K)$
is the class group of the $S$-integers, which is finite ([162], Theorem 2.7.1). $U_{M \setminus S}$ is
the group of $S$-units, which is finitely generated by Dirichlet’s unit theorem in 1.5.13.
Therefore, $U^m_{M \setminus S}$ is of finite index in $U_{M \setminus S}$. This proves the finiteness of $[K^{\mathrm{nr}}_{M \setminus S} : K]$.
Indeed, we may enlarge $S$ to include all places where the residue characteristic divides $m$.
By Remark 10.5.7, we may assume that $K$ contains all $m$th roots of unity. Then the result
follows from Proposition 10.5.9.
Example 10.5.12. Let $k(X)$ be a function field of a projective geometrically irreducible
smooth variety $X$ over the field $k$ and let $M = M_X$ be its set of discrete valuations, as
in 1.4.6. If $S$ is a finite subset of $M$ (i.e. finitely many prime divisors), then A.9.18 shows
that
$$\mathrm{Cl}_{M \setminus S}(K) \cong \mathrm{Pic}(X \setminus S)$$
and
$$U_{M \setminus S} = \{f \in k(X)^\times \mid \mathrm{supp}(\mathrm{div}(f)) \subset S\}.$$
Here, we identify $S$ with the union of its prime divisors. Assume that $S = \emptyset$. Then
$\mathrm{Cl}_M(K)$ is isomorphic to the Picard group of $X$. We claim that $U_M \cong k^\times$. By A.8.21,
it is clear that $f \in U_M$ is regular on $X$. Using geometric irreducibility and A.6.15, we
conclude that $f$ is constant. By A.4.11, $k$ has to be algebraically closed in $k(X)$ proving
$f \in k^\times$.
Now let $X = C$ be an irreducible projective smooth curve of genus $g$ over an algebraically
closed field $k$. We assume that the positive integer $m$ is not divisible by $\mathrm{char}(k)$. Then
we claim that
$$[k(C)^{\mathrm{nr}}_M : k(C)] \leq m^{2g}.$$
on $A$ is finite and unramified (see Proposition 8.7.2). By Corollary 10.3.6, there is a finite
$S \subset M$ such that $K(P)/K$ is unramified outside $S$ for any $P \in A(\overline{K})$ with
$Q = [m]P \in A(K)$. This proves the claim.
Now we are ready to prove a more general version of the weak Mordell–Weil theorem.
Theorem 10.5.14. Let K be a number field or a function field of an irreducible curve over
an algebraically closed field. For any abelian variety A over K and any positive integer
m not divisible by char(K) , the quotient A(K)/mA(K) is finite.
Proof: In the function field case, we may assume that the curve is projective and smooth
(see A.13.2, A.13.3). By a finite base change, we may also assume that $K$ contains all
$m$th roots of unity and that $A[m] \subset A(K)$ (see Lemma 10.4.3, Lemma 1.4.10). Now use
Examples 10.5.11, 10.5.12 and Proposition 10.5.13 to show that $K(\frac{1}{m}A(K))$ is a finite
extension of $K$. By Remark 10.4.6, this proves the claim.
Remark 10.5.15. If K is a number field or a function field of a curve over an algebraically
closed field, then we may choose S in Proposition 10.5.13 as follows. First, note that M
denotes then the set of standard non-archimedean places of K . By Proposition 10.3.10, we
may choose
S = {v ∈ M | v(m) ≥ 1 or A has bad reduction}.
Obviously, the first condition is only necessary in the number field case. Note also that S
is finite (see [44], 1.4.3).
In order to prove this fundamental theorem we need first the Fermat descent:
Lemma 10.6.2. Let $G$ be an abelian group and let $m \geq 2$ be a positive integer.
Let also $\|\ \|$ be a real function on $G$ satisfying
$$\|x - y\| \leq \|x\| + \|y\|, \qquad \|mx\| = m\|x\|$$
for any $x, y \in G$. Assume that $S$ is a set of representatives for $G/mG$, bounded
relative to $\|\ \|$ by a constant $C$. Then for any $x \in G$, there is a decomposition
$$x = \sum_{i=0}^{l} m^i y_i + m^{l+1} z,$$
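The shape of this decomposition is easy to see in the simplest case. The following sketch assumes $G = \mathbb{Z}$ with the usual absolute value in the role of $\|\ \|$ and a chosen set of representatives for $\mathbb{Z}/m\mathbb{Z}$; the function `descend` and the balanced representatives for $m = 3$ are our own illustrative choices, not part of the lemma.

```python
def descend(x, m, reps, steps):
    """Write x = sum_{i < steps} m**i * y_i + m**steps * z with each y_i in reps.

    reps maps each residue class r in range(m) to its chosen representative.
    With steps = l + 1 this matches the decomposition of the lemma for G = Z.
    """
    ys = []
    z = x
    for _ in range(steps):
        y = reps[z % m]      # representative of the class of z in Z/mZ
        ys.append(y)
        z = (z - y) // m     # exact division, since z and y agree mod m
    return ys, z

# Balanced representatives {0, 1, -1} for Z/3Z (an illustrative choice).
reps = {0: 0, 1: 1, 2: -1}
ys, z = descend(100, 3, reps, 5)
print(ys, z)

# Reassemble x from the pieces to confirm the decomposition.
assert sum(3 ** i * y for i, y in enumerate(ys)) + 3 ** 5 * z == 100
```

Each step replaces the current element by one of absolute value at most $(|z| + C)/m$, where $C$ bounds the representatives, which is the mechanism that makes the descent terminate in a bounded set.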
defined over a finite field, we may reduce to the case where C is a geometrically irreducible
smooth projective curve with a k -rational base point.
Then we note that the weak Mordell–Weil theorem holds for $K = k(C)$. Indeed the
same proof as in Theorem 10.5.14 applies with the only exception that we have no longer
$[k^\times : k^{\times m}] = 1$ but this index is trivially finite.
The same proof as in 10.6.4 then shows that the Mordell–Weil theorem holds for K =
k(C) . It suffices to remark that Northcott’s theorem holds by Proposition 9.4.16 because
the Northcott property (N) is valid for K (see Example 9.4.20).
11.1. Introduction
The assumption $g \geq 2$ is crucial; the theorem fails for $C = \mathbb{P}^1_K$ and for elliptic
curves of positive rank. We may easily dispense with the other assumptions on
$C$. If $C$ is any curve over the number field $K$, then A.13.2 and A.13.3 show the
existence of a finite extension $L/K$ such that $C_L$ is birational to a finite disjoint
union of geometrically irreducible smooth projective curves $C_j$ over $L$ and we
may apply Faltings’s theorem to $C_j(L)$ assuming that the genus of $C_j$ is $\geq 2$.
The above theorem was conjectured by Mordell (for the rational field K = Q )
at the end of his paper [207] proving the finite generation of the group of rational
points of an elliptic curve defined over Q . Its function field analogue was proved
by H. Grauert [129] in 1965 (the important earlier paper by Yu.I. Manin [189],
which claimed a proof of this result, was much later recognized to contain a serious
gap, which was eventually corrected by R. Coleman [69] in 1990). The Mordell
conjecture was at last proved by G. Faltings [113] in 1983, as a consequence of his
proofs of the Tate conjecture and the Shafarevich conjecture. He used Arakelov
theory on moduli spaces; we refer to the books of G. Faltings and G. Wüstholz
[116] and L. Szpiro [295] for details.
A completely new proof was then given by P. Vojta, first in the function field case
[308] and then in the arithmetic case [310]. In this chapter, we shall give Vojta’s
proof with the simplifications given in [29]. In view of the complicated proof, it
is worthwhile to give an outline of the basic ideas behind it.
The precursor of Vojta’s proof goes back to Mumford’s paper [211] of 1965, with
the results, which have been already described and proved in detail in Theorem
9.4.14. There we show that the height $h_\Delta$ on $C \times C$ can be expressed, up to
bounded quantities, in terms of Néron–Tate heights on the Jacobian J of C . It
Here we see directly that the quadratic form in the right-hand side of this equation
is indefinite, if the genus g is at least 2.
On the other hand, since $\Delta$ is an effective curve on $C \times C$, the height $h_\Delta(P, Q)$
is bounded below, away from the diagonal. This puts strong restrictions on the
pair (P, Q) because it means that P and Q, considered as lattice points in the
euclidean space J(K) ⊗ R , can never be too close to each other with respect to
the positive definite inner product determined by the canonical form. A simple
geometric argument now shows that the values of the height of rational points
on C , arranged in increasing order, grow at least exponentially. This is in sharp
contrast with the quadratic growth of rational points on elliptic curves, and shows
that rational points on curves of genus at least 2 are much harder to come by.
Mumford’s argument works over any field and is not limited to zero characteristic.
Since there are examples of curves of genus at least 2 over a function field of
positive characteristic having infinitely many rational points with height increasing
at an exponential rate, as shown here in 9.4.19 and the following examples, some
new idea was needed to attack the Mordell conjecture along Mumford’s line.
This new idea was provided by Vojta. We may look at other divisors than the
diagonal, and more precisely the set of divisors $D$ for which the quadratic part of
the height $h_D(P, Q)$ is indefinite. If $C$ is a general curve (in particular smooth),
the Néron–Severi group of numerical equivalence classes of divisors on the surface
$C \times C$ is generated by $\{P_0\} \times C$, $C \times \{P_0\}$ (here $P_0$ is any point on $C$) and
the diagonal $\Delta$, so we have to deal with a divisor $D$ numerically equivalent to
$l\,\{P_0\} \times C + m\,C \times \{P_0\} + n\,\Delta$ (see [130], pp. 285–286). If $(l+n)(m+n) < g^2 n^2$,
the quadratic form associated to $h_D$ is indefinite, and, moreover, for large $k$, the
Riemann–Roch theorem shows that the linear system $|kD|$ is not empty and of
positive dimension if $g n^2 < (l + n)(m + n)$. Thus, replacing $D$ by a divisor
in the rational equivalence class of $kD$, we may suppose that $D$ is effective and
therefore $h_D$ is bounded below away from the support of $D$. Hence the idea is to
choose the parameters $l, m, n$ so as to force the quadratic form to be very negative
at $(P, Q)$, and then use the fact that $h_D$ is bounded below to conclude that $P$ and
$Q$ have bounded height.
The new problem we face here is the fact that the choice of l, m, n depends on
the ratio of the heights of P and Q, so that we need to show that not only hD is
bounded below, but also that this lower bound has a sufficiently good uniformity
with respect to the divisor D . This difficulty was overcome by Vojta using the
arithmetic intersection theory [126] and the arithmetic Riemann–Roch theorem
obtained by H. Gillet and C. Soulé in [126] and [127].
As Vojta’s paper clearly shows, this idea is overly simple as such and there is
one more big obstacle to overcome. The difficulty is that the lower bound fails if
(P, Q) belongs to the support of the effective divisor D . In order to obtain a small
lower bound, the divisor D must be defined locally by means of equations with
small height, and there is little room to move it away from (P, Q). In characteristic
zero, by an appropriate use of derivations, we see that this is not too serious a
difficulty unless the divisor D goes through (P, Q) with very high multiplicity.
Note that in positive characteristic the argument using derivations fails (as it must),
and it is here that characteristic zero becomes part of the proof.
This situation is reminiscent of a familiar point, which occurs in many proofs in
diophantine approximation and transcendence theory, namely the non-vanishing at
specific points of functions arising from auxiliary constructions. In the classical
case, there are various techniques at our disposal: Roth’s lemma, which is arith-
metic in nature, the algebro-geometric Dyson’s lemma [102], and the powerful and
flexible product theorem of Faltings [114] (see Theorem 7.6.4).
In our situation, application of any of these methods requires the important con-
dition that the ratio of the heights of Q and P be sufficiently large in order to
conclude that the divisor D cannot go through (P, Q) with high multiplicity.
More precisely, Vojta uses a suitable extension of Dyson’s lemma for the product
of two curves and shows that, if (l + n)(m + n) is sufficiently close to gn2 and
(l + n)/(m + n) is sufficiently small, then any effective D as above does not
vanish too much at (P, Q), thereby completing the proof.
Faltings, in his solution of the Lang conjecture for subvarieties of abelian varieties,
uses his product theorem. In our particular situation, it shows that either (P, Q)
belongs to a finite union of proper product subvarieties of C × C , and in particular
P or Q belong to a finite set, or again D does not vanish too much at (P, Q).
The paper [29] simplified Vojta’s proof by showing that a direct application of
Roth’s lemma also suffices for obtaining the required small vanishing of D at
(P, Q). The proof uses only the elementary theory of heights developed in Chap-
ters 1 and 2 and replaces the difficult arithmetic Riemann–Roch theorem in Vojta’s
proof by the algebro-geometric Riemann–Roch theorem on the surface C × C and
by the classical Siegel lemma.
This chapter is organized as follows. In Section 11.2 we study Vojta divisors and
the associated heights. The short Section 11.3 extends Mumford’s method, already
presented in Chapter 9, to Vojta divisors, getting the required upper bound for the
height.
The short Section 11.4 gives a simple proof of a local version of Eisenstein’s the-
orem on the coefficients of Taylor series of algebraic functions, which is used to
control derivations.
Section 11.5 introduces norms on certain spaces of power series and proves a gen-
eralization of Gauss’s lemma in 1.6.3. These results are used to give a less ad
hoc proof of the Eisenstein theorem, which is now derived as an application of
Banach’s fixed point theorem. Our goal in including this material here has been
to provide the reader with additional information, which may be useful in other
contexts.
Section 11.6 obtains the crucial lower bound for the height, in terms of the height
of a set of defining equations of the Vojta divisor and of its multiplicity or, more
precisely, index at (P, Q). Sections 11.7 and 11.8 construct a Vojta divisor of
small height, apply Roth’s lemma and show that the Vojta divisor has low multi-
plicity at (P, Q). Section 11.9 compares the upper and lower bounds so obtained
and completes the proof of Faltings’s theorem outlined before.
We conclude this chapter by describing in Section 11.10, without proofs, two fur-
ther important results. The first is Faltings’s big theorem [114], [115] dealing
with rational points of subvarieties of abelian varieties, and we give two applica-
tions. The second result, conjectured by Bogomolov in the case of curves, deals
with small points on subvarieties of abelian varieties and is due to S.W. Zhang
[342] and L. Szpiro, E. Ullmo, and S.W. Zhang [296]. The much easier corre-
sponding results for small points on subvarieties of a linear torus were treated in
some detail in Chapter 4. This section may be skipped in a first reading.
In the proof of Faltings’s theorem, we have assumed for simplicity that $C$ has a
point $P_0$ defined over $K$, and made $P_0$ part of our data; for example, we embed
$C$ into its Jacobian by means of the Albanese map $P \mapsto \mathrm{cl}([P] - [P_0])$. Thus
certain constants appearing in the proof will depend a priori not only on the curve
$C$, but also on the choice of $P_0$. This can be avoided by fixing instead a divisor $D_0$
of small degree and small height (for example a suitable canonical divisor), and
working with the map $P \mapsto \mathrm{cl}((\deg D_0)[P] - D_0)$ instead of the Albanese map,
at the cost of introducing additional complications in the application of Mumford’s
method, to get an upper bound for the height (for an application, see E. Bombieri,
A. Granville, and J. Pintz [33]).
The finiteness result of Faltings’s theorem is ineffective, in the sense that it pro-
vides no upper bound on the height of solutions. However, we can get bounds for
the number of solutions. An examination of the proof shows that solutions can
be viewed as the union of two sets, namely small solutions, for which an explicit
bound for the height can be given, and large solutions. In order to study large solutions,
we work in a euclidean space $\mathbb{R}^r$, where r is the rank of the Mordell–Weil
group J(C)(K) of the Jacobian of C . The group J(C)(K)/tors is a lattice in
$\mathbb{R}^r$ and the euclidean metric induces the Néron–Tate height on this lattice. Now
the result of the Vojta construction is that any two large solutions in C(K) either
have comparable height within a constant factor, or determine two vectors at an
angle of at least, say, 40°, which suffices for proving finiteness in any cone with
center O and opening 40°. However, just to start Vojta’s method we need at least
two solutions in a cone and if there is only one solution we cannot say anything on
the height. Of course, in this case we obtain finiteness for free and a good bound
for the number of solutions.
We have chosen not to give explicit bounds for the constants c , C1 , C2 , . . . and
also other constants involved in the symbols ≪ and O( ), which appear in the
course of the proof. However, all such constants are effectively computable and,
at the end of Section 11.9, we mention some explicit bounds for the number of
solutions.
This chapter is based on Chapters 2, 8, 9, and 10.
Let C be an irreducible smooth projective curve of genus g over the field K with
a K -rational point. Obviously, this is no restriction for the proof of Faltings’s
theorem. Thus we begin by fixing a point P0 ∈ C(K). In fact, this section is
devoted to purely geometric properties of certain divisors on C × C , and we never
use the assumption that K is a number field.
By ∆ we denote the diagonal of C × C and for simplicity of notation we shall
also write
∆′ := ∆ − {P0 } × C − C × {P0 }.
We study here properties of divisors on C × C , which are expressed as linear
combinations of the divisors {P0 } × C , C × {P0 }, and ∆′ . It is worth noting
that, since we are in characteristic 0, for a general curve C , these three divisors
generate the full group of divisors of C × C up to algebraic equivalence, hence the
apparently special situation considered here is in fact typical for the general case.
Lemma 11.2.1. The following table gives the intersection numbers:

              {P0} × C    C × {P0}     ∆′
{P0} × C         0           1         0
C × {P0}         1           0         0
∆′               0           0        −2g
be, without mentioning the closed embeddings φB or ψ . Because they are defined
with a basis of global sections, no coordinate xj , x′j , or yi vanishes identically on
C × C . Let us consider the condition:
(V1) δ1 := (d1 + M d)/N and δ2 := (d2 + M d)/N are positive integers.
By adding to d1 and d2 integers bounded by N , (V1) will be satisfied. This is
only a technical condition of minor import. Then we get a decomposition
V = (δ1 N {P0 } × C + δ2 N C × {P0 }) − dB
of the Vojta divisor V into the difference of two very ample divisors (Lemma
11.2.4 and Lemma 11.2.5). It follows from A.10.38 that for sufficiently large
δ1 , δ2 , d the following condition is satisfied:
(V2) The first cohomology groups of $\mathcal{J}_{\psi(C\times C)}(\delta_1,\delta_2)$ and $\mathcal{J}_{\phi_B(C\times C)}(d)$ vanish.
Here, as usual, $\mathcal{J}_X$ denotes the ideal sheaf of a closed subvariety X and F(δ1 , δ2 ),
F(d) denote the tensor product of a sheaf F with the corresponding standard very
ample sheaf of a multiprojective space.
Now the long exact cohomology sequence shows that the natural maps
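\[ \psi^* : \Gamma(\mathbb{P}^n_K \times \mathbb{P}^n_K,\ O(\delta_1,\delta_2)) \longrightarrow \Gamma(C\times C,\ \psi^* O(\delta_1,\delta_2)) \]
and
\[ \phi_B^* : \Gamma(\mathbb{P}^m_K,\ O(d)) \longrightarrow \Gamma(C\times C,\ O(dB)) \]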
are surjective (use Example A.10.20, A.10.22, and A.10.25). This is the property
we shall use in the following lemma.
Lemma 11.2.7. Suppose that V is a Vojta divisor satisfying (V1) and (V2). For
any global section s of O(V ), there are polynomials $F_i(x, x')$, i = 0, . . . , m,
bihomogeneous of bidegree (δ1 , δ2 ), such that
\[ s = \left. F_i(x, x')/y_i^d \right|_{C\times C} \tag{11.2} \]
for i = 0, . . . , m.
where the inner sum is over all solutions of $l_0 + \cdots + l_j = l$; the $p_{ij}$ are the
coefficients of the polynomial p . The sum of the terms with $l_\lambda = l$ and $\lambda \neq 0$
is simply $p_t \cdot \partial_l \xi$; since $\partial_l(p(x, \xi(x)))$ is identically 0 and the absolute value is
ultrametric, we get
\[ |p_t(0,\xi(0))| \cdot |\partial_l \xi(0)| \le |p| \max |\partial_{l_1}\xi(0)| \cdots |\partial_{l_j}\xi(0)| , \]
where max runs over $l_1+\cdots+l_j \le l$ with each $l_\lambda < l$. On the other hand,
by hypothesis $p_t(0,\xi(0)) \neq 0$ and we also have |ξ(0)| ≤ 1. This means that in
the last displayed inequality we need only consider products in which each $l_\lambda$ is at
least 1; noting that $a_l = \partial_l \xi(0)$ and $|p_t(0,\xi(0))| \le |p|$, we get Theorem 11.4.1
by induction.
We treat the case in which | | is archimedean in a different fashion. We assume
that | | is normalized as usual. Let us abbreviate $\xi^{(l)} = \left(\frac{d}{dx}\right)^l \xi$. By induction on
l we establish that there is a polynomial $q_l(x, t)$ such that
\[ \xi^{(l)}(x) = -\frac{q_l(x,\xi(x))}{p_t(x,\xi(x))^{2l-1}} \]
for l ≥ 1 . We have $q_1 = p_x$ and
\[ q_{l+1} = (q_l)_x\, p_t^2 - (q_l)_t\, p_t\, p_x + (2l-1)\, q_l\, (p_{tt}\, p_x - p_{xt}\, p_t) . \tag{11.4} \]
Note that the partial degree of $q_l$ with respect to x (resp. t ) is bounded by l(2d −
1) − d (resp. l(2d − 2) + 2 − d ). For two polynomials f, g in any number of
variables, we have |f g| ≤ N · |f | · |g|, where N is the number of monomials in
g . Thus, from the recurrence (11.4), we estimate the Gauss norm $|q_{l+1}|$ by
\[ |q_{l+1}| \le A\, |p|^2\, |q_l| , \]
where
\[ A = d^4 (d+1)^2 \left(\deg_x(q_l) + \deg_t(q_l)\right) + 2(2l-1)\, d^6 (d+1) \le 8 (d+1)^7\, l . \]
Noting that $|q_1| \le d\,|p|$, this yields
\[ |q_l| \le (l-1)!\; 8^{l-1} (d+1)^{7l-6}\, |p|^{2l-1} . \]
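The arithmetic behind the last two estimates can be checked mechanically. The following is a minimal sketch, not part of the original argument: it assumes that the partial degrees of $q_l$ attain the upper bounds stated above, and the function names are hypothetical. It checks that A ≤ 8(d+1)^7 l and that iterating the recurrence from |q_1| ≤ d|p| stays below the closed form.

```python
from math import factorial

def growth_factor(d: int, l: int) -> int:
    """Value of A at step l when deg_x(q_l) and deg_t(q_l) equal their stated upper bounds."""
    deg_x = l * (2 * d - 1) - d
    deg_t = l * (2 * d - 2) + 2 - d
    return d**4 * (d + 1)**2 * (deg_x + deg_t) + 2 * (2 * l - 1) * d**6 * (d + 1)

def iterated_coefficient(d: int, l: int) -> int:
    """Coefficient of |p|^(2l-1) obtained by iterating |q_{k+1}| <= A_k |p|^2 |q_k| from |q_1| <= d |p|."""
    coeff = d
    for k in range(1, l):
        coeff *= growth_factor(d, k)
    return coeff

def closed_form_coefficient(d: int, l: int) -> int:
    """Coefficient (l-1)! 8^(l-1) (d+1)^(7l-6) of |p|^(2l-1) in the final estimate."""
    return factorial(l - 1) * 8**(l - 1) * (d + 1)**(7 * l - 6)

def bounds_hold(max_d: int = 10, max_l: int = 25) -> bool:
    """Check both inequalities for all 1 <= d <= max_d and 1 <= l <= max_l."""
    for d in range(1, max_d + 1):
        for l in range(1, max_l + 1):
            if growth_factor(d, l) > 8 * (d + 1)**7 * l:
                return False
            if iterated_coefficient(d, l) > closed_form_coefficient(d, l):
                return False
    return True
```

Exact integer arithmetic is used throughout, so the check involves no rounding.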
In this section K is a field complete with respect to an absolute value | | . We shall obtain
here some simple but useful facts about norms of polynomials and power series, and deduce,
by an application of Banach’s fixed point theorem, a more general version of Theorem
11.4.1 covering the case of an algebraic function of several variables over a field of arbitrary
characteristic.
11.5.1. Let us fix n and let x = (x1 , . . . , xn ) . The ring of formal power series in x is
denoted by K[[x]] . For
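\[ f(x) := \sum_{\alpha\in\mathbb{N}^n} a_\alpha x^\alpha \in K[[x]] \]
Then
\[ K\{r^{-1}x\} := \{ f \in K[[x]] \mid \|f\|_r < \infty \} \]
is a Banach algebra over K , complete with respect to the norm $\|\ \|_r$, satisfying $\|fg\|_r \le \|f\|_r \cdot \|g\|_r$. The spectral norm of f is defined by
\[ \rho_r(f) := \inf_{k\in\mathbb{N}\setminus\{0\}} \|f^k\|_r^{1/k} . \]
\[ \rho_r(f) = \lim_{k\to\infty} \|f^k\|_r^{1/k} . \]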
Lemma 11.5.2. Given r > 0 , there is always a field extension L/K and an extension of
| | to L such that r is in the value group of this extension. In the non-archimedean case,
the value group of $\overline{K}$ is given by
\[ |\overline{K}^{\times}| = \{ |\alpha|^{1/m} \mid \alpha \in K^{\times},\ m \in \mathbb{N}\setminus\{0\} \}. \]
Proof: We begin by proving the second claim. The inclusion ⊃ is clear. Now let $\beta \in \overline{K}^{\times}$
be a zero of the polynomial
\[ f(t) := a_n t^n + a_{n-1} t^{n-1} + \cdots + a_0 \]
with coefficients in K , not all 0. Since f (β) = 0 and | | is ultrametric, there must be two
distinct indices i , j such that
\[ |a_i \beta^i| = |a_j \beta^j| = \text{maximum}. \]
This yields
\[ |\beta| = |a_i/a_j|^{1/(j-i)} , \]
completing the proof of the second claim. The first claim is a consequence of the second
one and of [47], Ch.VI, §10, no.1, Prop.1. In the archimedean case, we do not need an
extension by Ostrowski’s theorem (see Theorem 1.2.6).
In view of this lemma, we see that going to a field extension we may always renormalize
everything so that r = (1, . . . , 1) .
Lemma 11.5.3. If the absolute value on K is not archimedean, then
\[ \rho_r(f) = \sup_{\alpha\in\mathbb{N}^n} |a_\alpha|\, r^\alpha . \]
(b) If K = C , then $\mathbb{C}\{r^{-1}x\}$ is equal to the set of continuous complex functions on
P (r) , which are analytic in the interior and ρr is the supremum norm on P (r) .
(c) If K = R , then $\mathbb{R}\{r^{-1}x\}$ is the Banach subalgebra of $\mathbb{C}\{r^{-1}x\}$ given by the
functions that have Taylor series at 0 with real coefficients.
Proof: The non-archimedean case follows easily from Lemma 11.5.3. If K = C , then
Lemma 11.5.4 and Weierstrass’s theorem show that C{r−1 x} consists of continuous func-
tions on P (r) , which are analytic in the interior, and that ρr is the supremum norm on
P (r) .
Conversely, let f be a continuous function on P (r) , which is analytic in the interior.
Clearly, we may assume r1 = · · · = rn = 1 . Let $\sum a_\alpha x^\alpha$ be the Taylor series of f
at 0 . By the Cauchy formula and continuity of f , we have
\[ a_\alpha = \left(\frac{1}{2\pi i}\right)^{n} \int_{|y_1|=1} \cdots \int_{|y_n|=1} \frac{f(y)}{y_1^{\alpha_1+1} \cdots y_n^{\alpha_n+1}}\, dy_1 \cdots dy_n . \]
We conclude that $\sum a_\alpha e^{i\alpha\cdot t}$ is the Fourier series of the periodic function $f(e^{it_1}, \dots, e^{it_n})$,
$t_i \in \mathbb{R}$. For $\beta \in \mathbb{N}^n$, let us consider the partial sum
\[ s_\beta(x) = \sum_{\alpha_1\le\beta_1} \cdots \sum_{\alpha_n\le\beta_n} a_\alpha x^\alpha . \]
By the generalization of Fejér’s theorem to several variables (cf. [343], Th.XVII.1.20), the
arithmetic means
\[ \sigma_\gamma(e^{it_1}, \dots, e^{it_n}) = \frac{1}{\gamma_1+1} \cdots \frac{1}{\gamma_n+1} \sum_{\beta_1\le\gamma_1} \cdots \sum_{\beta_n\le\gamma_n} s_\beta(e^{it_1}, \dots, e^{it_n}) \]
f (x + x0 ) = g(x)
for every x ∈ P (r0 ) . If K is a field of characteristic 0 , we have
\[ \partial_\alpha f(x_0) = \frac{1}{\alpha_1! \cdots \alpha_n!}\, \frac{\partial^{|\alpha|} f}{\partial x_1^{\alpha_1} \cdots \partial x_n^{\alpha_n}}(x_0), \]
which is the familiar formula for the coefficients of the Taylor expansion, and justifies the
notation ∂α for them. The Taylor series in the non-archimedean case cannot be used for
the purpose of analytic continuation, contrary to what happens if K = C .
For convenience, we now state the Cauchy inequalities in terms of the norm ρr :
Proposition 11.5.9. If $f(x) = \sum a_\alpha x^\alpha \in K\{r^{-1}x\}$, then
\[ |a_\alpha| \le \rho_r(f)\, r^{-\alpha} \]
for every α .
Proof: If | | is not archimedean, this follows from Lemma 11.5.3. If instead | | is
archimedean then, as noted before, the result in one variable is classical and due to Cauchy,
and in general follows by induction on the number of variables.
Proof: In the non-archimedean case, the claim follows directly from Lemma 11.5.3. If
instead | | is archimedean, we may assume that K = C . By Lemma 11.5.4, there is
x∗ ∈ P (tr) with |f (x∗ )| = ρtr (f ) . Since f vanishes to order k at 0 , the function
$g(\zeta) = \zeta^{-k} f(\zeta x^*/t)$ is an element of K{ζ} . By the maximum principle, we have
convergent in a neighborhood of 0 .
We give here a proof of the local Eisenstein theorem for a strictly convergent power
series p(x, t) . Our argument is as for the implicit function theorem in calculus, but nonethe-
less we give here a detailed proof.
Existence and uniqueness of ξ will follow from an application of Banach’s fixed point
theorem, and we will also get a polydisk P on which ξ is strictly convergent and an upper
bound for the supremum norm of ξ on P . Then Cauchy’s inequalities will give the desired
estimate for the Taylor coefficients aα .
We first recall Banach’s fixed point theorem:
Theorem 11.5.15. Let (X, d) be a complete metric space and let ϕ : X −→ X be a
contractive map, i.e. there is θ < 1 such that
d (ϕ(x), ϕ(y)) ≤ θ · d(x, y) for all x, y ∈ X.
Then ϕ has a unique fixed point.
Proof: We start with an arbitrary point x0 of X . Then we apply ϕ iteratively to define
$x_n := \varphi^n(x_0)$. The triangle inequality shows
\[ d(x_{k+l}, x_k) \le \sum_{n=0}^{l-1} d(x_{k+n+1}, x_{k+n}) \le \sum_{n=0}^{l-1} \theta^{k+n}\, d(x_1, x_0) \le \frac{\theta^k}{1-\theta} \cdot d(x_1, x_0). \]
Hence we have a Cauchy sequence converging to x ∈ X and it follows from continuity of
ϕ that x is a fixed point. Uniqueness is clear for a contraction.
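The iteration used in this proof is easy to run numerically. The sketch below is an illustration under our own assumptions, not part of the text: it takes the real polynomial p(x, t) = x − 2at + t² with a = 1 and the fixed value x = 0.1, and iterates the map t ↦ t − ṗ(0, 0)⁻¹ p(x, t) on the interval |t| ≤ 1/2, where it is a contraction with θ = 1/2. The names `fixed_point` and `g` are hypothetical.

```python
import math

def fixed_point(phi, x0, tol=1e-15, max_iter=200):
    """Iterate x_{n+1} = phi(x_n) until successive iterates differ by at most tol.

    Returns the last iterate together with the list of all iterates.
    """
    x = x0
    history = [x]
    for _ in range(max_iter):
        x_next = phi(x)
        history.append(x_next)
        if abs(x_next - x) <= tol:
            return x_next, history
        x = x_next
    raise RuntimeError("iteration did not converge")

def g(t, x=0.1, a=1.0):
    """t - p(x, t) / p_t(0, 0) for p(x, t) = x - 2 a t + t^2, where p_t(0, 0) = -2a."""
    return t - (x - 2 * a * t + t * t) / (-2 * a)

root, history = fixed_point(g, 0.0)
theta = 0.5  # |dg/dt| = |t| / a <= 1/2 on |t| <= 1/2

# A priori estimate from the proof: d(x_k, x) <= theta^k / (1 - theta) * d(x_1, x_0).
a_priori_ok = all(
    abs(history[k] - root) <= theta**k / (1 - theta) * abs(history[1] - history[0]) + 1e-15
    for k in range(len(history))
)
```

The limit is the root 1 − √0.9 of t² − 2t + 0.1 that vanishes at x = 0; the other root 1 + √0.9 lies outside the interval on which the map is a contraction.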
Proof of Theorem 11.5.14: We indicate by a dot the partial derivative ∂/∂t , as in $\dot f := \partial f/\partial t$.
By the Cauchy inequalities (see Proposition 11.5.9), we have
\[ B := S\, |\dot p(0,0)|\, \rho_{R,S}(p)^{-1} \le 1. \]
If the absolute value is archimedean, we may assume that | | is the usual euclidean absolute
value. Let
\[ g(x,t) := t - \dot p(0,0)^{-1}\, p(x,t). \]
We shall apply Banach’s fixed point theorem to the map
\[ \varphi : f(x) \longmapsto g(x, f(x)). \]
We have to choose an appropriate set X of power series and a metric d(x, y) in X which
satisfy the hypothesis of the Banach fixed point theorem. The set X will depend on positive
parameters r1 , . . . , rn , s to be chosen later, which at this moment are required to satisfy
rj /Rj ≤ s/S ≤ 1 and, in the archimedean case, the stronger condition rj /Rj ≤ s/S ≤
1/4 . As usual, we set r := (r1 , . . . , rn ) . We define
\[ X = X(r,s) := \{ f \in K\{r^{-1}x\} \mid \rho_r(f) \le s \}. \]
The set X is a closed subset of K{r−1 x} , hence X is complete with respect to the supre-
mum norm ρr in the polydisk P (r) . We shall determine the parameters r, s in such a way
that ϕ is a contraction on X with respect to the distance function d(f1 , f0 ) = ρr (f1 −f0 ) .
In order to achieve this, we may always assume, after a suitable field extension as in Lemma
11.5.2, that rj , R, s, S ∈ |K| . This has the advantage that the spectral norm of the occurring
strictly convergent power series equals the maximum norm on the corresponding polydisk
(see Lemma 11.5.4). At the end of the argument for applying Banach’s fixed point theorem,
we will go back to the original K to be sure that ξ(x) will have coefficients in K .
First, we claim that for t, t0 ∈ K with |t| ≤ s, |t0 | ≤ s , we have
\[ \rho_r(g(x,t) - g(x,t_0)) \le \theta\, |t - t_0|, \tag{11.6} \]
where
\[ \theta := \begin{cases} \dfrac{s}{S}\, B^{-1} & \text{in the non-archimedean case} \\[1ex] \dfrac{8s}{S}\, B^{-1} & \text{in the archimedean case.} \end{cases} \]
Suppose first that the absolute value is archimedean. By Ostrowski’s theorem in 1.2.6, we
may assume K = C . For x ∈ P (r) , we have
\[ g(x,t) - g(x,t_0) = \int_{t_0}^{t} \dot g(x,u)\, du, \tag{11.7} \]
where the integral is on the line segment from t0 to t ; note that if |t0 |, |t| ≤ s then |u| ≤ s
follows by convexity, and also ġ is well defined in any closed polydisk in the interior of
P (R, S) . By construction, we have ġ(0, 0) = 0 . We apply Schwarz’s lemma from 11.5.11
and Cauchy’s inequality (see Corollary 11.5.10), getting (note that rj /Rj ≤ s/S ≤ 1/4
and B ≤ 1 )
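\[ \rho_{r,s}(\dot g) \le \frac{2s}{S}\, \rho_{\left(\frac{S}{2s} r,\ \frac{S}{2}\right)}(\dot g) \le \frac{4s}{S^2}\, \rho_{(R,S)}(g) \le \frac{4s}{S}\,(1 + B^{-1}) \le \frac{8s}{S}\, B^{-1} . \]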
Using this bound in (11.7), we get (11.6).
In the non-archimedean case, let us introduce the operator $D_t$ acting on power series in t
by $f(t) \longmapsto \frac{1}{t}\,(f(t) - f(0))$.
For x ∈ P (r) and |t| ≤ s , |t0 | ≤ s , an easy calculation shows
\[ |g(x,t) - g(x,t_0)| \le \rho_{(r,s)}(D_t g)\, |t - t_0|, \]
whence Schwarz’s lemma in 11.5.11 implies
\[ |g(x,t) - g(x,t_0)| \le \frac{s}{S}\, \rho_{(R,S)}(D_t g)\, |t - t_0|. \tag{11.8} \]
We have
\[ \rho_{(R,S)}(D_t g) \le S^{-1} \rho_{(R,S)}(g) \le B^{-1}. \]
Now (11.6) follows from this inequality and (11.8).
By Proposition 11.5.12, ϕ maps X to $K\{r^{-1}x\}$. The condition θ < 1 , as needed for the
application of Banach’s fixed point theorem, follows from (11.6) if s/S < B or s/S < B/8
according as we are in the non-archimedean or archimedean case.
In order to complete the proof, we need to check that ϕ maps X into itself for r, s suffi-
ciently small. This means checking that
|g(x, t)| ≤ s
for x ∈ P (r) and |t| ≤ s .
We begin by noting that, since g(0, 0) = 0 , Schwarz’s lemma from 11.5.11 yields
\[ \sup_{x\in P(r)} |g(x,0)| \le \lambda \sup_{x\in P(R)} |g(x,0)| \le \lambda S B^{-1}, \tag{11.9} \]
where λ = max{rj /Rj } . Suppose first that | | is archimedean. By (11.6) and (11.9) we
have
\[ |g(x,t)| \le \theta s + |g(x,0)| \le \theta s + \lambda S B^{-1}. \tag{11.10} \]
In the non-archimedean case, (11.6) on page 368 and the ultrametric inequality give
\[ |g(x,t)| \le \max\left(\theta s,\ |g(x,0)|\right) \le \max\left(\theta s,\ \lambda S B^{-1}\right). \tag{11.11} \]
Now let r, s be given by
\[ \frac{r_1}{R_1} = \cdots = \frac{r_n}{R_n} = a B^2, \qquad \frac{s}{S} = b B, \]
where we choose for example any 0 < a = b < 1 in the non-archimedean case, and
a = 1/32 and b = 1/16 in the archimedean case. Thus we see from (11.10) and (11.11)
that
\[ \rho_{(r,s)}(g(x,t)) \le s, \qquad \theta < 1 \]
hold. We apply Banach’s fixed point theorem, obtaining a unique ξ ∈ K{r−1 x} with
p(x, ξ(x)) = 0, ρr (ξ) ≤ s.
This proves the existence of a solution ξ(x) satisfying the bound stated in the theorem.
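The choice of the parameters a and b can be verified with exact rational arithmetic. The following is a small sketch under the assumption that everything is divided through by S, so that only the ratios λ = r_j/R_j = aB² and s/S = bB enter; the function names are hypothetical and B is sampled at a few rational values in (0, 1].

```python
from fractions import Fraction

def archimedean_ok(B, a=Fraction(1, 32), b=Fraction(1, 16)):
    """Check r_j/R_j <= s/S <= 1/4, theta < 1 and theta*s + lambda*S/B <= s (all divided by S)."""
    lam = a * B**2                     # common value of r_j / R_j
    s_rel = b * B                      # s / S
    theta = 8 * s_rel / B              # contraction factor in the archimedean case
    image = theta * s_rel + lam / B    # right-hand side of (11.10), divided by S
    return lam <= s_rel <= Fraction(1, 4) and theta < 1 and image <= s_rel

def non_archimedean_ok(B, a):
    """Same check with b = a, theta = (s/S)/B and the ultrametric bound (11.11)."""
    b = a
    lam = a * B**2
    s_rel = b * B
    theta = s_rel / B
    image = max(theta * s_rel, lam / B)   # right-hand side of (11.11), divided by S
    return lam <= s_rel <= 1 and theta < 1 and image <= s_rel
```

With a = 1/32 and b = 1/16 the archimedean bound θs + λSB⁻¹ equals s exactly, so these constants leave no slack in (11.10).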
It is also clear that ξ(0) = 0 . In fact, since ϕ maps the closed set
X0 := {f ∈ X | f (0) = 0}
into itself, we may apply Banach’s fixed point theorem to X0 as well. Uniqueness of the
fixed point shows that ξ ∈ X0 , hence ξ(0) = 0 .
We leave the proof of the uniqueness of ξ as a formal power series as an exercise for the
reader. For example, we can apply once more the Banach fixed point theorem, in the space
K[[x]] of formal power series with the norm
\[ |f| = \max_{a_\alpha \neq 0} e^{-|\alpha|} , \]
11.5.17. The bound given by the local Eisenstein theorem is sharp in the non-archimedean
case, and not far from the truth in the archimedean case.
Consider the polynomial $p(x,t) = x - 2at + t^2$, where a ≠ 0 is a parameter satisfying
|a| ≤ 1 . The formal power series solution with ξ(0) = 0 is given by
\[ \xi(x) = a - a\sqrt{1 - x/a^2} = \sum_{j=1}^{\infty} (-1)^{j-1} \binom{1/2}{j}\, a^{1-2j} x^j . \]
It is easy to see that $(-1)^{j-1}\, 2^{2j-1} \binom{1/2}{j}$ is a positive integer for j ≥ 1 .
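Both claims about this series can be checked with exact rational arithmetic. The sketch below is only a finite check, not a proof; the helper names are hypothetical, and the value a = 3/7 used when testing the residual is an arbitrary choice. It computes the generalized binomial coefficients, the scaled integers, and the coefficients of p(x, ξ(x)) up to a given order.

```python
from fractions import Fraction

def binom_half(j: int) -> Fraction:
    """Generalized binomial coefficient C(1/2, j) = prod_{k<j} (1/2 - k) / (k + 1)."""
    result = Fraction(1)
    for k in range(j):
        result *= (Fraction(1, 2) - k) / (k + 1)
    return result

def scaled_coefficient(j: int) -> Fraction:
    """The quantity (-1)^(j-1) 2^(2j-1) C(1/2, j) from the text."""
    return (-1)**(j - 1) * 2**(2 * j - 1) * binom_half(j)

def series_coefficients(n: int, a: Fraction):
    """Coefficients c_0, ..., c_n of xi(x) = sum_{j>=1} (-1)^(j-1) C(1/2, j) a^(1-2j) x^j."""
    return [Fraction(0)] + [
        (-1)**(j - 1) * binom_half(j) * a**(1 - 2 * j) for j in range(1, n + 1)
    ]

def residual(n: int, a: Fraction):
    """Coefficients of x^0, ..., x^n in p(x, xi(x)) = x - 2 a xi(x) + xi(x)^2."""
    c = series_coefficients(n, a)
    out = []
    for k in range(n + 1):
        square = sum(c[i] * c[k - i] for i in range(k + 1))
        out.append((1 if k == 1 else 0) - 2 * a * c[k] + square)
    return out
```

The first few scaled coefficients coincide with the Catalan numbers 1, 1, 2, 5, 14, which is one way to see the integrality claim.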
Suppose that | | is not archimedean, with valuation ring R and residue field k(v) , and
suppose that |2| = 1 . Then
\[ |\dot p(0,0)|/|p| = \frac{|2a|}{\max(1, |2a|)} = |a|, \]
and it follows that
\[ |\partial_j \xi(0)| = \left( |p| \Big/ \left| \frac{\partial p}{\partial t}(0,0) \right| \right)^{2j-1} \]
whenever $\left|\binom{1/2}{j}\right| = 1$. This indeed happens for infinitely many values of j if |2| = 1 .
Suppose this is not the case. We choose a ∈ K with |a| = 1 and look at the reduction
π(ξ(x)) of $\xi(x) \in R[[x, a^{-1}]]$ modulo the maximal ideal of R . Then this reduction would
be a polynomial in x and $a^{-1}$, contradicting the fact that the polynomial $x - 2\pi(a)t + t^2$
is irreducible over k(v)(x) .
The same example shows that, apart from the value of the numerical constant C , the bound
given in the local Eisenstein theorem is also sharp in the archimedean case.
Let C be an irreducible projective smooth curve over a number field K and let
us fix P0 ∈ C(K). Let V be a Vojta divisor satisfying (V1) and (V2). We are
interested in getting a lower bound for the height hV (P, Q), where P, Q ∈ C(K).
This is obtained by means of Lemma 11.6.5 and Lemma 11.6.7. The first lemma
gives an explicit lower bound in terms of the Taylor coefficients of local coordinates
for C viewed as algebraic functions of a uniformizing parameter of C at P or Q.
The second lemma applies the local Eisenstein theorem to bound these Taylor
coefficients. The reader is assumed to be familiar with the concept of tangent
spaces provided by A.7.
11.6.1. Let ∂, ∂′ be non-zero vectors in the tangent space of C at P, Q. As in
11.5.7, we abbreviate
\[ \partial_i := \frac{1}{i!}\, \partial^i, \qquad \partial'_i := \frac{1}{i!}\, \partial'^{\,i} . \]
Any differential operator on $O_{C\times C,(P,Q)}$ of degree k with values in K(P, Q) is
a homogeneous polynomial of degree k in the variables ∂, ∂′ with coefficients in
K(P, Q). In fact, they act on K(C × C). The advantage of the normalizations
above is that Leibniz’s rule has an easier form. If f1 , . . . , fr are rational functions
on C , then
\[ \partial_i(f_1 \cdots f_r) = \sum \partial_{i_1} f_1 \cdots \partial_{i_r} f_r , \]
where the sum ranges over all (i1 , . . . , ir ) with i1 + · · · + ir = i. This is easily
proved by induction on r , starting with r = 2.
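For polynomials in one variable the normalized Leibniz rule can be tested directly. The sketch below is an illustration under simplifying assumptions: the curve is replaced by the affine line, the operator ∂ by d/dx, and functions by integer polynomials given as coefficient lists; all names are hypothetical. Here $\partial_i f(c)$ is the coefficient of $(x - c)^i$ in the Taylor expansion of f at c.

```python
from itertools import product
from math import comb

def divided_derivative(coeffs, i, c):
    """(1/i!) f^(i)(c) for f = sum_k coeffs[k] x^k, computed without dividing by i!."""
    return sum(comb(k, i) * a * c**(k - i) for k, a in enumerate(coeffs) if k >= i)

def poly_mul(f, g):
    """Product of two polynomials given as coefficient lists (lowest degree first)."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out

def leibniz_sum(polys, i, c):
    """Sum over i_1 + ... + i_r = i of the products of divided derivatives at c."""
    total = 0
    for idx in product(range(i + 1), repeat=len(polys)):
        if sum(idx) == i:
            term = 1
            for f, k in zip(polys, idx):
                term *= divided_derivative(f, k, c)
            total += term
    return total
```

Note that no multinomial coefficients appear on the right-hand side, which is exactly the simplification gained by the normalization 1/i!.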
11.6.2. Let s ∈ Γ(C × C, O(V )) \ {0}. A pair $(i_1^*, i_2^*) \in \mathbb{N}^2$ is called admissible
if and only if
\[ \partial_{i_1^*} \partial'_{i_2^*}\, s(P,Q) \neq 0 \]
and
\[ \partial_{i_1} \partial'_{i_2}\, s(P,Q) = 0 \]
whenever $i_1 \le i_1^*$, $i_2 \le i_2^*$ and $(i_1, i_2) \neq (i_1^*, i_2^*)$. In order to make sense of the
above formulas, we should choose a trivialization of O(V ) in (P, Q). It is also
clear that admissibility is independent of the choice of the trivialization and the
choice of ∂, ∂′ . By 11.2.6
\[ \xi_{ij} := (x_i/x_j)|_C , \qquad \xi'_{ij} := (x'_i/x'_j)|_C \]
are well-defined non-zero rational functions on C for i, j = 0, . . . , n. We also
write $\xi_j$ for the vector with components $\xi_{ij}$, i = 0, . . . , n, and similarly for $\xi'_j$.
11.6.3. We are going to choose an explicit height function relative to O(V ).
Choose a finite extension L/K such that P, Q ∈ C(L). We use the decompo-
sition
V = (δ1 N {P0 } × C + δ2 N C × {P0 }) − dB
of V as a difference of two very ample divisors (see 11.2.6). Now we have generating
sections $x^{h} {x'}^{h'}$ (|h| = δ1 , |h′| = δ2 ) of O(δ1 N {P0 }×C +δ2 N C ×{P0 })
and $y^{i}$ (|i| = d) of O(dB). With respect to the presentation (see 2.2.1)
\[ \left( s_V;\ O(\delta_1 N \{P_0\}\times C + \delta_2 N\, C\times\{P_0\}),\ x^{h} {x'}^{h'};\ O(dB),\ y^{i} \right), \]
we have the global height function (see 2.4.6)
\[ h_V(P,Q) := \sum_{v\in M_L} \max_{h,h'} \min_{i} \log \left| \frac{x^{h} {x'}^{h'}}{y^{i}}(P,Q) \right|_v = \sum_{v\in M_L} \max_{j,j'} \min_{i} \log \left| \frac{x_j^{\delta_1}\, {x'_{j'}}^{\delta_2}}{y_i^{d}}(P,Q) \right|_v . \]
Note that the vectors x(P ), x′(Q) and y(P, Q) are only defined up to a multiple.
By the product formula, hV (P, Q) is well-defined; it is the difference of two Weil
heights, the first given by the closed embedding
\[ (P,Q) \longmapsto \left( x^{h}(P)\, {x'}^{h'}(Q) \right)_{|h|=\delta_1,\ |h'|=\delta_2} \]
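\[ \qquad - \sum_{v\in M_L} \max_{\{i'_\lambda\}} \sum_{\lambda} \max_{\nu} \log \left| \partial'_{i'_\lambda} \xi'_{\nu j'_v}(Q) \right|_v - (\delta_1 + \delta_2 + i_1^* + i_2^*) , \]
where $\{i_\lambda\}$ and $\{i'_\lambda\}$ run over all partitions of $i_1^*$ and $i_2^*$.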
Proof: We fix trivializations of $O_{\mathbb{P}^n}(1)$ at P and Q and of $O_{\mathbb{P}^m}(1)$ at (P, Q). By
tensor product, this gives trivializations of all the bundles in question, in particular
of V by 11.2.3 and 11.2.6. In the following, we use the trivializations without
mention. For example s is viewed as a regular function at (P, Q).
We have seen in 11.6.3 that
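\[ h_V(P,Q) = - \sum_{v} \max_{i} \min_{j,j'} \log \left| \frac{y_i^{d}}{x_j^{\delta_1}\, {x'_{j'}}^{\delta_2}}(P,Q) \right|_v . \]
Assume that $x_j(P)\, x'_{j'}(Q)\, y_i(P,Q) \neq 0$. Only such i, j, j′ are of importance in
the above formula. By admissibility and Leibniz’s rule, we have
\[ \frac{y_i^{d}}{x_j^{\delta_1}\, {x'_{j'}}^{\delta_2}}\, \partial_{i_1^*} \partial'_{i_2^*} s\, (P,Q) = \partial_{i_1^*} \partial'_{i_2^*} \left( \frac{y_i^{d}}{x_j^{\delta_1}\, {x'_{j'}}^{\delta_2}}\, s \right) (P,Q) \]
and the right-hand side equals
\[ \partial_{i_1^*} \partial'_{i_2^*} F_i(\xi_j, \xi'_{j'})(P,Q). \]
Using $(\partial_{i_1^*} \partial'_{i_2^*} s)(P,Q) \neq 0$ and the product formula, we get
\[ h_V(P,Q) = - \sum_{v} \max_{i} \min_{j,j'} \log \left| \partial_{i_1^*} \partial'_{i_2^*} F_i(\xi_j, \xi'_{j'})(P,Q) \right|_v . \]
Now, since for each v we take the minimum with respect to j and j′ , we may
take instead $j = j_v$ and $j' = j'_v$. This remark is quite important in what follows.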
Let us consider $\log |\partial_{i_1^*} \xi_j^{\,l}(P)|_v$ for v ∈ ML . By Leibniz’s rule, we have
\[ \partial_{i_1^*} \xi_j^{\,l} = \sum \prod_{\nu=0}^{n} \prod_{\mu=1}^{l_\nu} \partial_{i_{\mu\nu}} \xi_{\nu j} , \]
where $\sum_{\mu\nu} i_{\mu\nu} = i_1^*$. Since the total number of pairs µν equals δ1 , the number
of possibilities for $i_{\mu\nu}$ equals $\binom{\delta_1 + i_1^* - 1}{i_1^*} \le 2^{\delta_1 + i_1^*}$. We are interested in the case
in which $j = j_v$ and $j' = j'_v$. We note that, since $|\xi_{j_v 0}(P)|_v$ is the largest
$|\xi_{j0}(P)|_v$, we have $|\xi_{\nu j_v}(P)|_v = |\xi_{\nu 0}(P)/\xi_{j_v 0}(P)|_v \le 1$ for every ν , allowing
us to get rid of the terms with $i_{\mu\nu} = 0$ in estimating derivatives by means of
Leibniz’s rule; the same remark of course applies to $|\xi'_{j'_v}|_v$.
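The count of index choices used here is the standard count of ways to write an integer as an ordered sum of a fixed number of non-negative integers. The following brute-force sketch, with hypothetical names and small parameters only, compares it with the binomial coefficient and with the cruder power of 2.

```python
from itertools import product
from math import comb

def count_index_choices(num_factors: int, total: int) -> int:
    """Number of tuples of num_factors non-negative integers with sum equal to total."""
    return sum(
        1
        for idx in product(range(total + 1), repeat=num_factors)
        if sum(idx) == total
    )

def binomial_count(num_factors: int, total: int) -> int:
    """The closed form C(num_factors + total - 1, total)."""
    return comb(num_factors + total - 1, total)
```

In the text, `num_factors` plays the role of δ1 and `total` the role of $i_1^*$; the enumeration is exponential in size and is meant only as a check for small values.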
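This gives the bound
\[ \log \left| \partial_{i_1^*} \xi_{j_v}^{\,l}(P) \right|_v \le \max_{\{i_\lambda\}} \sum_{\lambda} \max_{\nu} \log |\partial_{i_\lambda} \xi_{\nu j_v}(P)|_v + (\delta_1 + i_1^*)\, \varepsilon_v \log 2, \]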
λ
By assumption and A.4.11, K(C) is a finite extension of K(f ). Choose gij (x, t) ∈
K[x, t] such that gij (f, t) ∈ K(f )[t] is a minimal polynomial of ξij over K(f ).
Since char(K) = 0, we have $\frac{\partial}{\partial t} g_{ij}(f, \xi_{ij}) \neq 0$ in K(C) (irreducible polynomials
are separable). Let deg(gij ) be the total degree of gij (x, t).
\[ \sum_{v\in M_L} \max_{\{i_\lambda\}} \sum_{\lambda} \max_{\nu} \log |\partial_{i_\lambda} \xi_{\nu j_v}(P)|_v \ll i_1^*\, |j(P)|^2 + 1 , \]
where the maximum runs over all partitions $\{i_\lambda\}$ of $i_1^*$. The constant implied in
the symbol ≪ is independent of P and $i_1^*$.
Proof: We denote by a dot differentiation with respect to t , as in $\dot f = \partial f/\partial t$.
It is clear that (f, ξij ) may be viewed as the first two affine coordinates of a
morphism ϕ from C into some projective space. Since N [P0 ] is ample on C ,
there is k ∈ N such that O(kN [P0 ]) ⊗ ϕ∗ O(−1) is very ample (see A.6.10). By
Theorem 2.3.8, this proves
\[ h((1 : f(P) : \xi_{ij}(P))) \le h_{\varphi}(P) \ll h_{N[P_0]}(P) + 1. \tag{11.14} \]
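\[ \le i_1^* \log(C_2) + \max_{\{i_\lambda\}} \sum_{\lambda} \max_{\nu}\, (2 i_\lambda - 1) \left( \log^+ |p_{\nu j_v}|_v + \log^+ \left| \frac{1}{\dot p_{\nu j_v}(0,0)} \right|_v \right) . \]
\[ 2 i_1^*\, h(g) + 2 i_1^* \sum_{ij} \sum_{v} \log^+ \frac{1}{|\dot g_{ij}(f(P), \xi_{ij}(P))|_v} + 2 i_1^* \log(C_2 C_1) + 2 i_1^* C_3\, (h_{N[P_0]}(P) + 1), \]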
where h(g) is the height of the vector formed with the coefficients of all polyno-
mials gij and 1, and where C3 is the largest constant involved in (11.14) times
the maximum of all deg(gij ). Now the claim follows from
\[ h_{N[P_0]}(P) = \frac{N}{2g}\, |j(P)|^2 + O(|j(P)|) + O(1), \]
which is a consequence of Proposition 11.3.1 applied to the Vojta divisor N {P0 }×
C , and from the fact that, by (11.14) and h(a) = h(1/a), we have
\[ \sum_{ij} \sum_{v\in M_L} \log^+ \frac{1}{|\dot g_{ij}(f(P), \xi_{ij}(P))|_v} = \sum_{ij} h\!\left( \frac{1}{\dot g_{ij}(f(P), \xi_{ij}(P))} \right) = \sum_{ij} h(\dot g_{ij}(f(P), \xi_{ij}(P))) \ll h((1 : f(P) : \xi_{ij}(P))) \ll h_{N[P_0]}(P) + 1. \]
In this section, we prove the crucial Lemma 11.7.3, which gives the existence
of a section of O(V ), with V a Vojta divisor, of small height. The argument
is fairly standard. The space of sections of O(V ) is presented as a subspace,
given by linear relations with small height, of a vector space with a standard basis.
The Riemann–Roch theorem shows that this subspace has large dimension, and
the existence of a small section follows by Siegel’s lemma or, equivalently, by
geometry of numbers.
Let C be an irreducible projective smooth curve of genus g over a number field K
and let us fix P0 ∈ C(K). We shall use the notation of Section 11.2, in particular
V will be a Vojta divisor satisfying (V1) and (V2).
because an effective divisor cannot have a negative intersection number with the
ample divisor H . By the Riemann–Roch theorem in A.13.9, we get
\[ \dim \Gamma(C\times C, O(V)) \ge \frac{1}{2}\, V \cdot (V - K_{C\times C}) + 1 + p_a(C\times C), \]
where pa (C × C) is the arithmetic genus. Again by Lemma 11.2.1, we get
\[ V \cdot V = 2 d_1 d_2 - 2 g d^2 . \]
Together with
\[ V \cdot K_{C\times C} = (2g-2)\, V \cdot H = (2g-2)(d_1 + d_2), \]
we get the second claim.
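These intersection numbers follow from bilinearity and the table of Lemma 11.2.1. As a minimal sketch, we assume that the Vojta divisor has the form V = d1 {P0} × C + d2 C × {P0} + d ∆′ and that H is numerically {P0} × C + C × {P0}; both are assumptions here, chosen to be consistent with the formulas V · V = 2 d1 d2 − 2 g d² and V · H = d1 + d2 above. The function names are hypothetical.

```python
def intersection(u, v, g):
    """Intersection number of two divisor classes given as coefficient triples
    with respect to {P0} x C, C x {P0}, Delta' (table of Lemma 11.2.1)."""
    gram = [[0, 1, 0], [1, 0, 0], [0, 0, -2 * g]]
    return sum(u[i] * gram[i][j] * v[j] for i in range(3) for j in range(3))

def vojta_numbers(d1, d2, d, g):
    """Return (V.V, V.H, V.K) for V = (d1, d2, d) and K = (2g - 2) H, H = (1, 1, 0)."""
    V = (d1, d2, d)
    H = (1, 1, 0)
    K = (2 * g - 2, 2 * g - 2, 0)
    return intersection(V, V, g), intersection(V, H, g), intersection(V, K, g)
```

For example, with g = 3 and (d1, d2, d) = (50, 20, 7) the self-intersection is 2 · 50 · 20 − 2 · 3 · 49 = 1706, which stays positive precisely because d1 d2 exceeds g d².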
11.7.2. We need now another assumption for the parameters of the Vojta divisor.
(V3) $d_1 + d_2 > 4g - 4$ and $d_1 d_2 - g d^2 > \gamma\, d_1 d_2$ for some γ > 0.
Here, γ is independent of d1 , d2 , d . As we have seen at the beginning of A.11.6,
we may map C × C by a birational morphism onto a hypersurface of degree D in
P3K . We denote the projective coordinates in P3K by z and we may assume that
the polynomial giving the hypersurface has the same form as in A.11.7 or in 2.5.1.
All presentations (see 2.5.4) will refer to this set up.
Now let U be the subspace of Γ(C × C, O(V )) consisting of sections of the form
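\[ s = \left. y_i^{-d}\, F_i(x_0, x_1, x_2;\ x'_0, x'_1, x'_2) \right|_{C\times C} \]
\[ N^2 \left( \delta_1 - \frac{N-3}{2} \right) \left( \delta_2 - \frac{N-3}{2} \right) . \]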
Since f is irreducible over K(x0 , x1 ), we see that the restrictions to C × C of the
above monomials are linearly independent over K . By Lemma 11.2.7, the vector
space Γ(C × C, O(V )) may be identified with
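\[ W := \left\{ (F_i) \in \Gamma(C\times C, O(\delta_1,\delta_2))^{m+1} \ \middle|\ (F_i) \text{ satisfies (11.2) on page 358} \right\} . \]
We consider equation (11.16) as a linear system, where the unknowns are the coefficients
$a_i = (a_{i\beta\beta'})$ of the polynomials
\[ F_i(x_0, x_1, x_2;\ x'_0, x'_1, x'_2) = \sum_{\beta,\beta'} a_{i\beta\beta'}\, x^{\beta} {x'}^{\beta'} . \]
for i = 0, . . . , m, where the coefficients $L_{i\alpha\alpha'}(a_j)$ are linear forms with unknowns
the vector $a_j$ of coefficients of the polynomial $F_j$. Here the indices α,
α′ range over
\[ \alpha = (\alpha_0, \alpha_1, \alpha_2), \quad \alpha' = (\alpha'_0, \alpha'_1, \alpha'_2), \quad |\alpha| = \delta_1 + d \cdot d_1(p), \quad |\alpha'| = \delta_2 + d \cdot d_2(p), \quad \alpha_2 < N, \quad \alpha'_2 < N. \]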
It remains to compute a bound for the height of the linear forms $L_{i\alpha\alpha'}$. This is
a routine procedure, which we have already used in Chapter 2 in analyzing the
height of presentations, specifically in Lemma 2.5.6. More precisely, let q be the
presentation
\[ q = \left( x_0^{\beta_0} x_1^{\beta_1} x_2^{\beta_2}\, {x'_0}^{\beta'_0} {x'_1}^{\beta'_1} {x'_2}^{\beta'_2} \right), \quad |\beta| = \delta_1, \ |\beta'| = \delta_2, \ \beta_2 < N, \ \beta'_2 < N. \]
We consider the new presentation
\[ (p_i^{d}\, q_{\beta\beta'}), \quad i = 0, \dots, m \]
with β , β′ as above. Using the equations f (x0 , x1 , x2 ) = 0 and f (x′0 , x′1 , x′2 ) =
0, we express the restriction to C × C of monomials appearing with exponents
greater than or equal to N as linear combinations of monomials in x0 , . . . , x′2 involving
only x2 and x′2 with exponents strictly less than N . In this way we obtain the
new presentation
\[ \left( \sum_{\alpha,\alpha'} L_{i\alpha\alpha',\beta\beta'}\, x^{\alpha} {x'}^{\alpha'} \right), \]
where the $L_{i\alpha\alpha',\beta\beta'}$ are the coefficients of the linear form $L_{i\alpha\alpha'}(a)$.
Now the same proof as in Lemma 2.5.6 for bigraded presentations of morphisms
shows that the height of the linear forms $L_{i\alpha\alpha'}(a)$ is bounded by
\[ d\, h(p) + h(q) + o(d) = d\, h(p) + o(d). \]
Here we have used our special model and F ∈ U to get h(q) = 0, otherwise
the upper bound would have been only ≪ d1 + d2 + d , not sufficient for the
applications we have in mind.
From (11.17) we have to solve the linear system
\[ L_{0\alpha\alpha'}(a_i) = L_{i\alpha\alpha'}(a_0) \]
with i = 0, . . . , m and α, α′ as above. The number of unknowns is
\[ (m+1) N^2 \left( \delta_1 - \frac{N-3}{2} \right) \left( \delta_2 - \frac{N-3}{2} \right) \le (m+1) N^2 \delta_1 \delta_2 . \]
By (11.15) on page 379 and assumption (V3), the dimension of the space of solutions
is bounded below by
\[ d_1 d_2 - g d^2 - O(d_1 + d_2) \ge \gamma d_1 d_2 - O(d_1 + d_2). \]
There is a constant C4 > 0 such that this is bounded below by $\frac{\gamma}{2} d_1 d_2$ for d1 , d2 ≥
C4 /γ . Therefore, by Siegel’s lemma in Corollary 2.9.9 (even the simple version
in Corollary 2.9.2 is enough if we replace the height h(p) by $\log \max_{\sigma,i} |\sigma(p_i)|$),
there is a solution yielding an $F = (F_i) \in U^{m+1} \cap W$ with
\[ h(F) \le 2(m+1) N^2\, \frac{\delta_1 \delta_2}{\gamma\, d_1 d_2}\, \left( h(p)\, d + \log(\delta_1) + \log(\delta_2) + o(d) \right) . \]
Since $d_1 d_2 > g d^2$, we easily get
\[ \frac{\delta_1 \delta_2\, d}{d_1 d_2} = O(d_1 + d_2) \quad \text{and} \quad \frac{\delta_1 \delta_2 \log(\delta_i)}{d_1 d_2} = O\!\left( \sqrt{d_1 + d_2}\, \log(d_1 + d_2) \right) \]
proving our claim.
Let C be an irreducible projective smooth curve over the number field K and
let P0 ∈ C(K). With the notation introduced in Section 11.2, let V be a Vojta
divisor satisfying (V1) and (V2). Let $F_i(x, x')$, i = 0, . . . , m denote bihomogeneous
polynomials of bidegree (δ1 , δ2 ), describing a non-trivial global section s
of O(V ) as in Lemma 11.2.7, hence
\[ s = \left. F_i(x, x')/y_i^{d} \right|_{C\times C} \]
for i = 0, . . . , m. We are looking for an upper bound of the admissible pair
$(i_1^*, i_2^*)$ in the point (P, Q) ∈ (C × C)(K), defined in 11.6.2. The idea is to
project down to $\mathbb{P}^1_K \times \mathbb{P}^1_K$ to get a bihomogeneous polynomial instead of s and
then apply Roth’s lemma to that polynomial. In 11.8.2–11.8.5, we describe the
push-down of $s\, y_i^{d}$ and show that it has similar properties as s . Lemma 11.8.6, the
goal of this section, is the application of Roth’s lemma.
where H denotes a hyperplane in $\mathbb{P}^n_K$. Now from (11.1) on page 357 it follows
that the divisor $\mathrm{div}(F_i|_{C\times C})$ is rationally equivalent to δ1 N {P0 }×C +δ2 N C ×
{P0 }, while N {P0 } × C is in the class of $H \times \mathbb{P}^n_K|_{C\times C}$. Using Lemma 11.2.1
for C × C , we get $k_2 = N^2 \delta_2$. Similarly, we have $k_1 = N^2 \delta_1$.
Remark 11.8.3. Let Norm denote the norm with respect to the field extension
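$K(C\times C)/K(\mathbb{P}^1_K \times \mathbb{P}^1_K)$. Then we may choose
\[ G_i(\xi_1, \xi'_1) = \mathrm{Norm}(F_i(\xi, \xi')). \tag{11.18} \]
In order to see this, note that
\[ \mathrm{div}(\mathrm{Norm}(F_i(\xi, \xi'))) = (\pi\times\pi)_* \mathrm{div}(F_i(\xi, \xi')) = \mathrm{div}(G_i(x, x')) - (\pi\times\pi)_* \mathrm{div}\!\left( x_0^{\delta_1} {x'_0}^{\delta_2} |_{C\times C} \right) \]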
proves (11.18).
Lemma 11.8.4.
\[ h(G_i) \le N^2 h(F_i) + O(\delta_1 + \delta_2). \]
Proof: Using 11.8.1, it is clear that $1, \xi_2, \dots, \xi_2^{N-1}$ form a basis of K(C) over
K(ξ1 ). There are ajk ∈ K(ξ1 ), k = 0, . . . , N − 1, such that
\[ \xi_j = \sum_{k=0}^{N-1} a_{jk}\, \xi_2^{k}, \quad j = 3, \dots, n \]
and a similar relation holds for $\xi'_j$.
There are polynomials $b(x_0, x_1)$, $b_{jk}(x_0, x_1) \in K[x_0, x_1]$, all homogeneous of
the same degree N′ , such that $a_{jk} = b_{jk}/b$; then we have a presentation p of the
closed embedding $C \longrightarrow \mathbb{P}^n_K$ given by
\[ p_j(x_0, x_1, x_2) = \begin{cases} b\, x_0^{N-1} & \text{if } j = 0 \\ b\, x_0^{N-2} x_j & \text{if } j = 1, 2 \\ \displaystyle\sum_{k=0}^{N-1} b_{jk}\, x_0^{N-k-1} x_2^{k} & \text{if } j \ge 3. \end{cases} \]
The relations
\[ \frac{p_j}{p_0} = \left. \frac{x_j}{x_0} \right|_{C\times C} \]
are obvious, hence p is indeed a presentation of degree N″ = N′ + N − 1. By
Lemma 2.5.6, any monomial in ξ1 , . . . , ξn of degree ≤ δ1 has the form
\[ \frac{1}{b(\xi_1)^{\delta_1}} \sum_{\substack{k_1 + k_2 \le N\delta_1 \\ k_2 < N}} c_{k_1 k_2}\, \xi_1^{k_1} \xi_2^{k_2} \]
with $c_{k_1 k_2} \in K$ and h(c) ≤ δ1 h(p) + O(δ1 ) = O(δ1 ). We get a similar expression
for a monomial in ξ′1 , . . . , ξ′n of degree ≤ δ2 . By Proposition 1.6.2, any
monomial in ξ, ξ′ of bidegree ≤ (δ1 , δ2 ) has the form
\[ \frac{1}{b(\xi_1)^{\delta_1}\, b(\xi'_1)^{\delta_2}} \sum_{\substack{k_1 + k_2 \le N\delta_1 \\ k_2 < N}} \ \sum_{\substack{k'_1 + k'_2 \le N\delta_2 \\ k'_2 < N}} c_{k_1 k_2 k'_1 k'_2}\, \xi_1^{k_1} {\xi'_1}^{k'_1} \xi_2^{k_2} {\xi'_2}^{k'_2} \tag{11.19} \]
with h(c) ≤ O(δ1 + δ2 ). Using the very definition of the height of a polynomial,
it becomes clear that $F_i(\xi, \xi')$ may be written in the form (11.19) with
\[ h(c) \le h(F_i) + O(\delta_1 + \delta_2). \tag{11.20} \]
Similar considerations also apply to the polynomials $\xi_2^{k} {\xi'_2}^{k'} F_i(\xi, \xi')$ instead of
$F_i(\xi, \xi')$; we get expressions as in (11.19) with bounds as in (11.20).
The computation of the norm is done using the basis $\xi_2^{k} {\xi'_2}^{k'}$, 0 ≤ k, k′ < N , of
K(C × C) over $K(\xi_1, \xi'_1)$. Since we may assume $G_i(\xi_1, \xi'_1) = \mathrm{Norm}(F_i(\xi, \xi'))$
(cf. Remark 11.8.3), it is the determinant of an $N^2 \times N^2$ matrix A with entries
\[ A_{\mu\nu}(\xi_1, \xi'_1) \in K(\xi_1, \xi'_1) \]
whose numerators $B_{\mu\nu} \in K[\xi_1, \xi'_1]$ have degree of order O(δ1 + δ2 ) and height
bounded by h(Fi )+O(δ1 +δ2 ). So far, we have given the argument for the height,
but if we follow the arguments carefully then we see that
\[ |B_{\mu\nu}|_v \le C_v^{\delta_1+\delta_2}\, |F_i|_v \]
and Cv = 1 for all but finitely many v ∈ MK . Therefore, there are B1 , B2 ∈
$K[\xi_1, \xi'_1]$ of degree O(δ1 + δ2 ) such that
\[ G_i = B_1/B_2 \]
and, with a new Cv still such that Cv = 1 for all but finitely many v ∈ MK , we
have
\[ |B_1|_v \le C_v^{\delta_1+\delta_2}\, |F_i|_v^{N^2} \]
for all v ∈ MK . This shows
\[ h(B_1) \le N^2 h(F_i) + O(\delta_1 + \delta_2). \]
Then appealing to Theorem 1.6.13 we get
\[ h(G_i) \le N^2 h(F_i) + O(\delta_1 + \delta_2). \]
This proves the claim.
Lemma 11.8.5. There is a bihomogeneous polynomial $E(x_0, x_1, x_0', x_1')$, of bidegree
$(N d_1, N d_2)$, with the following properties:
This follows from the fact that the last coefficient of the minimal polynomial equals
the norm up to a sign. By Leibniz’s rule, we see that
∂i1 ∂i2 F (P, Q) ≠ 0
for some i1 ≤ j1∗ , i2 ≤ j2∗ . Since (y) does not vanish at (P, Q), an admissible
pair for F in (P, Q) is the same as an admissible pair for s in (P, Q). This proves
(b).
Let Hi be the bihomogeneous polynomial of bidegree (N M, N M ) such that
(π × π)∗ div(yid |C×C ) = div(Hi ).
Then we may assume
EHid = Gi
as above. By Theorem 1.6.13, we have
\[ h(H_i^d) = O(d) \]
and
\[ h(E) + h(H_i^d) = h(G_i) + O(d_1 + d_2 + d). \]
Together with Lemma 11.8.4, we conclude
\[ h(E) \le N^2 h(F_i) + O(d_1 + d_2 + d) \]
proving (c).
Lemma 11.8.6. There is a constant $C_6 > 0$, independent of $d_1$, $d_2$, $d$, and $\gamma$,
such that for $0 < \varepsilon \le 1/\sqrt{2}$, for any Vojta divisor satisfying (V1), (V2), (V3), with
\[ d_2 \ge C_4/\gamma, \qquad d_2/d_1 \le \varepsilon^2 \]
and for any $P, Q \in C(K)$ with
\[ \min\bigl(d_1 h_{N[P_0]}(P),\; d_2 h_{N[P_0]}(Q)\bigr) \ge C_6\, \frac{d_1}{\gamma \varepsilon^2} \tag{11.21} \]
there exists a global section s of O(V) with an admissible pair $(i_1^*, i_2^*)$ in (P, Q)
such that
\[ h(F) \le C_5\, \frac{d_1 + d_2}{\gamma}, \qquad \frac{i_1^*}{d_1} + \frac{i_2^*}{d_2} \le 4N\varepsilon. \]
Proof: By Lemma 11.7.3, there is a non-zero global section s of O(V) with
\[ h(F) \le C_5\, \frac{d_1 + d_2}{\gamma}. \]
We apply 11.8.1–11.8.5 to this s. The goal is to use Roth's lemma in 6.3.7 for the
polynomial $E(\xi_1, \xi_1')$. By Lemma 11.8.5, the partial degrees of E are bounded
by $r_1 := N d_1$ and $r_2 := N d_2$, respectively, and
\[ h(E) + 8 r_1 \ll \frac{d_1}{\gamma} \tag{11.22} \]
11.9. Proof of Faltings’s theorem
\[ -C_9 \Bigl( \frac{d_1 + d_2}{\gamma} + i_1^* |z|^2 + i_2^* |w|^2 + i_1^* + i_2^* \Bigr)
\le \frac{d_1}{2g} |z|^2 + \frac{d_2}{2g} |w|^2 - d\, \langle z, w \rangle + O(d_1 |z| + d_2 |w| + d_1 + d_2). \]
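Now we also assume that there is an $\varepsilon$, $0 < \varepsilon \le 1/\sqrt{2}$, with
\[ d_2/d_1 \le \varepsilon^2 \tag{11.24} \]
and
\[ \min\bigl(d_1 h_{N[P_0]}(P),\; d_2 h_{N[P_0]}(Q)\bigr) \ge C_6\, \frac{d_1}{\gamma \varepsilon^2} \tag{11.25} \]
as in Lemma 11.8.6. Applying this lemma, we get
\[ -C_9 \Bigl( 2\, \frac{d_1}{\gamma} + 4N\varepsilon\, d_1 |z|^2 + 4N\varepsilon\, d_2 |w|^2 \Bigr)
\le \frac{d_1}{2g} |z|^2 + \frac{d_2}{2g} |w|^2 - d\, \langle z, w \rangle + O(d_1 |z| + d_2 |w| + d_1). \tag{11.26} \]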
For a small positive number $\gamma_0 < 1$ and $D \in \mathbb{N}$, we choose
\[ d_1 = \sqrt{g + \gamma_0}\, \frac{D}{|z|^2} + O(1), \qquad
d_2 = \sqrt{g + \gamma_0}\, \frac{D}{|w|^2} + O(1) \]
and
\[ d = \frac{D}{|z|\, |w|} + O(1). \]
The O(1) terms are for small adjustments so that $d_1, d_2, d, \delta_1, \delta_2$ are all non-zero
natural numbers. This is a choice which makes (11.26) relatively sharp and
fulfills (V1), (V2), (V3), for D sufficiently large as a function of $|z|$, $|w|$, $\gamma_0$. It
is immaterial here how this notion of D being large depends on $|z|$, $|w|$, or $\gamma_0$,
since in the end we shall let $D \to \infty$. Note that we have
\[ d_1 d_2 - g d^2 \ge \gamma\, d_1 d_2 \]
for
\[ \gamma = \frac{\gamma_0}{g + \gamma_0} + o(1), \]
where the term implicit in o(1) tends to 0 as $D \to \infty$. Using
\[ \frac{d_2}{d_1} = \frac{|z|^2}{|w|^2} + o(1), \]
condition (11.24) becomes
\[ \frac{|z|}{|w|} \le \varepsilon + o(1). \tag{11.27} \]
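The parameter choice above can be checked numerically. The following is a minimal sketch, assuming that the O(1) adjustments amount to rounding to the nearest integer; the function names and the sample values of g, γ0, |z|, |w|, D are illustrative and do not come from the text.

```python
import math

def vojta_parameters(g, gamma0, z_norm, w_norm, D):
    """Integers (d1, d2, d) following the displayed choice; O(1) is modelled by rounding."""
    root = math.sqrt(g + gamma0)
    d1 = round(root * D / z_norm**2)
    d2 = round(root * D / w_norm**2)
    d = round(D / (z_norm * w_norm))
    return d1, d2, d

def gamma_ratio(g, d1, d2, d):
    """The largest gamma with d1*d2 - g*d^2 >= gamma*d1*d2."""
    return (d1 * d2 - g * d * d) / (d1 * d2)

# Hypothetical sample values: genus 2, gamma0 = 0.1, |z| = 3, |w| = 10, D large.
g, gamma0, z_norm, w_norm = 2, 0.1, 3.0, 10.0
d1, d2, d = vojta_parameters(g, gamma0, z_norm, w_norm, 10**6)

print("gamma from the integers:", gamma_ratio(g, d1, d2, d))
print("gamma0 / (g + gamma0):  ", gamma0 / (g + gamma0))
print("d2/d1:", d2 / d1, " |z|^2/|w|^2:", (z_norm / w_norm) ** 2)
```

For large D the two values of γ agree up to the o(1) term, and d2/d1 approaches |z|²/|w|², which is how (11.24) turns into (11.27).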
As remarked in the proof of Lemma 11.6.7, we have
\[ h_{N[P_0]}(P) = \frac{N}{2g} |z|^2 + O(|z|) + O(1), \]
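\[ -C_{11} \Bigl( \frac{1}{\gamma_0 |z|^2} + \varepsilon \Bigr) D
\le \frac{\sqrt{g + \gamma_0}}{g}\, D - \frac{\langle z, w \rangle}{|z|\, |w|}\, D
+ O\Bigl( \frac{1}{|z|} + \frac{1}{|w|} \Bigr) D + o(D), \]
for a certain constant $C_{11}$ depending only on C and $P_0$. Assuming (11.28), we
divide by D, let D tend to $\infty$, simplify, and find after rearranging terms
\[ \frac{\langle z, w \rangle}{|z|\, |w|} - \frac{\sqrt{g + \gamma_0}}{g} \le C_{12}\, \varepsilon \tag{11.29} \]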
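with $C_{12}$ depending only on C, $P_0$. We still need conditions (11.28) and $|z| <
\varepsilon |w|$, the limit of (11.27) for $D \to \infty$. To this end, we choose first $\gamma_0$ so small
that
\[ \frac{3}{4} - \frac{\sqrt{g + \gamma_0}}{g} > 0, \]
and then $\varepsilon$ so small that
\[ \frac{3}{4} - \frac{\sqrt{g + \gamma_0}}{g} > C_{12}\, \varepsilon. \tag{11.30} \]
Here we have used $\frac{3}{4} > \frac{\sqrt{2}}{2}$ and $g \ge 2$. Let
\[ C_7 > C_{10} \Big/ \Bigl( \sqrt{\frac{\gamma_0}{g + \gamma_0}}\; \varepsilon \Bigr) \]
and
\[ C_8 > \frac{1}{\varepsilon}. \]
For $P, Q \in C(K)$ satisfying
\[ |z| \ge C_7, \qquad |w| \ge C_8 |z|, \]
(11.27) and (11.28) are both satisfied and inequality (11.29) holds. Finally, because
of (11.30) and $C_{10} \ge 1$, we see that (11.29) implies
\[ \frac{\langle z, w \rangle}{|z|\, |w|} \le \frac{3}{4}, \]
proving the theorem.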
Now we are ready to prove Faltings’s theorem (see Theorem 11.1.1).
Proof: We may assume that C has a base point P0 ∈ C(K). From Theorem
9.3.5 and the Mordell–Weil theorem in 10.6.1, we know that J(K) ⊗Z R is a
finite-dimensional euclidean space. By Lemma 5.2.19, we may cover it by finitely
many cones T centered at 0 with angle α/2 from the axis to the ending, where
cos α > 3/4. Let C7, C8 be the constants in Theorem 11.9.1.
By Proposition 9.4.5 (Mumford’s gap principle with ε ≤ 2g cos α − 3), there is
a constant C13 , depending only on C and P0 , such that for any pair of distinct
points P, Q in the same cone T, with C13 ≤ |j(P)| ≤ |j(Q)|, we have
Let C14 = max(C7 , C13 ). The set of K -rational points in the ball with center 0
and radius C14 is finite by Northcott’s theorem in 2.4.9, so it remains to see that
By (11.31), we have
\[ |j(P_{i+1})| \ge 2\, |j(P_i)|, \]
yielding
\[ |j(P_k)| \ge 2^k\, |j(P_0)|. \]
On the other hand, Theorem 11.9.1 shows that
This proves
k ≤ log(C8 )/ log 2
and the theorem.
11.9.3. In much the same way as in Section 9.4, Lemma 5.2.19 shows easily that the number
of cones may be bounded by $7^r$, where r is the rank of the Mordell–Weil group over
K of the Jacobian of C. Hence C(K) is the union of two finite sets, namely the set of
small points P with $h(P) \le C_{14}$, and the set of large points P with $h(P) > C_{14}$.
The former set is finite by Northcott's theorem, and the latter contains not more than
$(\log(C_8)/\log 2 + 1)\, 7^r$ elements. The constants $C_{14}$ and $C_8$ are effectively
computable. As mentioned in the introduction to this chapter, we can dispense with the choice
of a K-rational point $P_0$. It turns out that we can take $C_8 = C_{15}\, g^{C_{16}}$ and $C_{14} =
C_{17}\, g^{C_{18}} (h(C) + 1)$ for suitable absolute effectively computable constants $C_{15}, \dots, C_{18}$;
here h(C) is the height of a presentation of C by means of (for example) a bicanonical
closed embedding. A sketch of the additional arguments needed can be found in [33]. In
any case, it is noteworthy that the bound for the height of small solutions is independent of
the field K and is linear in h(C) + 1, and that the dependence on the field K shows up
only through the rank of the Mordell–Weil group.
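To get a feeling for the size of the bound on large points, the expression (log(C8)/log 2 + 1) 7^r can be evaluated for sample inputs. This is only an illustrative sketch: the values of C8 and r below are hypothetical (the text gives no numerical value for C8), and taking the integer part of log(C8)/log 2 is an assumption based on k being an integer in the proof.

```python
import math

def large_point_bound(C8, rank):
    """Evaluate (floor(log(C8)/log 2) + 1) * 7**rank.

    C8 and rank are inputs supplied by the user; nothing here computes
    the actual constant C8 of Theorem 11.9.1.
    """
    points_per_cone = math.floor(math.log(C8) / math.log(2)) + 1
    return points_per_cone * 7**rank

# Hypothetical inputs: C8 = 10, Mordell-Weil rank r = 0, 1, 2.
for r in range(3):
    print(r, large_point_bound(10, r))
```

The exponential dependence on the rank r comes entirely from the number of cones, while C8 only enters logarithmically.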
11.10. Further developments
Inspired by Vojta’s proof, G. Faltings [114], [115] proved the following generalization of
the Mordell conjecture, now called Faltings’s big theorem:
Theorem 11.10.1. Let A be an abelian variety over a number field K , let Γ = A(K)
and let X be a geometrically irreducible closed subvariety of A , which is not a translate
of an abelian subvariety over K . Then X ∩ Γ is not Zariski dense in X .
We will not prove this theorem here, since the proof is better understood in the language of
arithmetic geometry, and we refer instead to the original papers by Faltings [114], [115], as
well as to B. Edixhoven and J.-H. Evertse [96] or P. Vojta [313]. We state two applications of
this theorem, with only a sketch of proofs. For the properties of curves needed to understand
the arguments, we refer to [13]. The first result is due to J. Harris and J.H. Silverman [147],
Th.2, Cor.3.
11.10.2. Recall that a smooth geometrically irreducible curve C of genus g ≥ 2, defined
over a number field K, is hyperelliptic if it is a double cover of $\mathbb{P}^1_L$ and bi-elliptic if it is a
double cover of an elliptic curve over some finite extension L/K. The hyperelliptic cover
is indeed defined over K and unique up to GL(2, K) (see [148], Prop.IV.5.3). If g ≥ 6,
the bi-elliptic cover is unique up to translation ([13] Ch.VIII, C-2, p.366) and also defined
over K ([153], Lemma 5).
Theorem 11.10.3. Let C be a smooth geometrically irreducible curve over a number field
K , of genus g ≥ 3 . Let D(K, 2) be the set of effective divisors of degree 2 on C , defined
over K . Then:
implies that X cannot be a translate of an abelian surface. Thus it remains to describe the
set φ−1 (X(K)) .
If we have two distinct divisors $[Q_1] + [Q_2]$ and $[Q_1'] + [Q_2']$ with the same image $x_0$ under
$\varphi$, then $H^0(C, O([Q_1] + [Q_2]))$ is two-dimensional and the quotient of independent global
sections leads to a hyperelliptic structure on C, which is unique ([148], Prop.IV.5.3). We
conclude that $|\varphi^{-1}(x_0)| \ge 2$ implies that $\varphi^{-1}(x_0)$ is a curve of genus 0. Conversely, if
there is a hyperelliptic morphism $f : C \to \mathbb{P}^1$, then $x_0 := \varphi(f^*([P]))$ remains constant
as P varies on $\mathbb{P}^1$, hence $\varphi^{-1}(x_0)$ is a curve of genus 0 in $C^{(2)}$. Thus $\varphi$ is one-to-one,
except at the point $x_0$ if C is hyperelliptic.
Note that, if g ≥ 4, the curve C cannot be both hyperelliptic and bi-elliptic ([13],
Ch.VIII, C-2, p.366).∗ Hence, if the curve C is bi-elliptic, the above shows that the image
Y of E under the map $P \mapsto \varphi(f^*([P]))$ is a curve. By Corollary 8.2.9, it is a translate of
an elliptic curve in A. Conversely, we can show that, if C has genus g ≥ 9, any translate
Y = E + b ⊂ X of an elliptic curve determines a bi-elliptic structure C → Y ([147],
proof of Th.2 (b)). Moreover, since g ≥ 6, this bi-elliptic structure is unique and defined
over K. This yields (b) and (c). For statement (a), we prove that, if X contains a curve of
genus 1, then the curve C is either hyperelliptic or bi-elliptic ([147], Th.2 (a)).
11.10.4. In our second application of Faltings’s big theorem, we consider a variety with
many rational points. The study of such varieties is an important branch of diophantine
geometry, which could not be treated in this book. For a survey of the theory, the reader is
referred to E. Peyre [234].
For a projective variety X over a number field K and a fixed height function $h_L$ with
respect to an ample line bundle L, we consider the multiplicative height $H_L := e^{h_L}$ and
\[ N_L(X, T) := \{ P \in X(K) \mid H_L(P) \le T \} \]
11.10.5. The simplest example of a variety with many rational points is $\mathbb{P}^{n-1}_K$ for n ≥ 2.
Let $N(\mathbb{P}^{n-1}_K, T)$ be the counting function of points of bounded height with respect to the
standard height. The case K = Q was considered by Dedekind and Weber, and extended
to the counting of rational points on Grassmannians by W.M. Schmidt [265]. For general
K , let d be the degree, r (resp. s ) the number of real (resp. complex) places, RK the
regulator, DK/Q > 0 the discriminant, wK the number of roots of unity, hK the class
number, and ζK the zeta function of our number field K (see [172] for definitions). If the
reader is not willing to enter into the terminology of algebraic number theory, he may just
consider the special case K = Q and hence d = r = RQ = DK/Q = hK = 1 , wK = 2 ,
s = 0 and ζK is the Riemann zeta function.
∗ The result is stated to hold for g ≥ 3, but the proof there gives the condition g ≥ 4. The
example $y^2 = x^8 + 1$ shows that g ≥ 4 is indeed necessary.
\[ \Lambda := \left( \det \begin{pmatrix} x_i & x_j \\ x_i' & x_j' \end{pmatrix} \right)_{0 \le i < j \le 4} \]
(with for example (i, j) in lexicographic order) determining a closed embedding of G(2, 5)
in $\mathbb{P}^9$. It is easy to see that Σ becomes a closed projective subvariety of $\mathbb{P}^9_K$ (see J. Harris
[146], Example 6.19). This variety, of paramount importance in the study of the cubic
threefold X, is called in honor of Fano the Fano surface † associated to X. The surface
Σ is a smooth geometrically irreducible surface (see [34], Lemma 3).
11.10.9. We assume that Σ(K) ≠ ∅ and we fix a base point Q0 ∈ Σ(K). Then there
is a canonical abelian variety A , called the Albanese variety of Σ , and a canonical map
a : Σ → A mapping Q0 to 0 factoring through every such morphism from Σ to arbitrary
abelian varieties (see [165], II.3). It is obvious that a(Σ) generates A . This construction
works for any irreducible smooth projective variety with a base point and it may be shown
that the Albanese variety is the dual of the Picard variety ([165], VI, §1, Th.1). It was also
used in the deduction of Corollary 9.3.10.
Complex analytically, the Albanese variety is given as a complex torus $H^0(\Sigma, \Omega^1_\Sigma)^* /
H_1(\Sigma, \mathbb{Z})$ and the canonical map a is given by $a(Q)(\omega) = \int_{\gamma_Q} \omega$, where $\omega \in H^0(\Sigma, \Omega^1_\Sigma)$
and $\gamma_Q$ is a path from $Q_0$ to Q (see [130], II.6). The reader will note that the Albanese
variety is a generalization of the Jacobian variety to higher dimensions.
Now we come back to the case of the cubic threefold. Then the Albanese variety A of the
Fano surface Σ has dimension 5 ([34], Lemma 5). Using the complex analytic description
of the Albanese map a above and that KΣ is very ample, it is easy to see that a has finite
fibres and hence it is a finite morphism (see A.6.15, A.12.4). Moreover, C.H. Clemens and
P. Griffiths have shown that ϕ is an immersion (i.e. a local embedding, see [68], §12), but
we do not need this fact for our purposes.
† This should not be confused with the notion of Fano variety above; the surface Σ has a very
ample canonical bundle and therefore is of general type, see E. Bombieri and H.P.F. Swinnerton-Dyer
[34], proof of Lemma 5, [68], Lemma 10.13.
11.10.10. On Σ, we use the Arakelov height induced by restricting the Arakelov height
$h_{Ar}$ from $\mathbb{P}^9_K$. If Q ∈ Σ with corresponding line $L_Q$ in X, then we have
\[ h_{Ar}(Q) = h_{Ar}(L_Q), \]
where on the right-hand side we have the Arakelov height of the line in $\mathbb{P}^4_K$. Here and in
the following, the Arakelov height of a projective linear subspace is defined as the Arakelov
height of the corresponding linear subspace in the sense of Section 2.8. Let $N_{lines}(X, T)$
be the number of points P ∈ X(K) of height $H_{Ar}(P) \le T$, which are on K-rational
lines in X.
Theorem 11.10.11. If K ≠ Q, then for T ≥ 1 the estimate
\[ N_{lines}(X, T) = c_2\, \gamma\, T^{2d} + O\bigl( T^{2d-1} (1 + \log T)^{r/2} \bigr) \]
holds, where
\[ \gamma = \sum_{Q \in \Sigma(K)} H_{Ar}(Q)^{-d} < \infty \quad \text{and} \quad
c_2 = \frac{h_K R_K\, 2^{d-1} \pi^d}{w_K\, \zeta_K(2)\, D_{K/\mathbb{Q}}}. \]
Before we come to the proof, we need some useful results. First, we interpret the
Arakelov height of a subspace in terms of the associated lattice.
11.10.12.
Let W be an n-dimensional subspace of $K^N$. The diagonal embedding $K^N \to
\prod_{v | \infty} K_v^N \cong \mathbb{R}^{Nd}$ maps the K-lattice $\Lambda := W \cap O_K^N$ to an nd-dimensional $\mathbb{R}$-lattice
$\Lambda_\infty$ in the closure of W in $\mathbb{R}^{Nd}$ (cf. Corollary C.2.7). Let $\mathrm{vol}(\Lambda_\infty)$ be the volume of
a fundamental domain of $\mathbb{R}^{2d}/\Lambda_\infty$ with respect to the Lebesgue measure. We denote by
$\lambda_1$ the length of the shortest non-zero vector in $\Lambda_\infty$ with respect to the euclidean norm on
$\mathbb{R}^{Nd}$.
We have the following result of Schmidt [263], Th.1, which we quote without proof.
Theorem 11.10.13. $2^{ns}\, \mathrm{vol}(\Lambda_\infty) = D_{K/\mathbb{Q}}^{n/2}\, H_{Ar}(W)^d$.
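For K = Q (so d = 1, s = 0 and D = 1) the theorem says that the covolume of W ∩ Z^N equals the Arakelov height of W. The sketch below checks this for one plane in Q^3, assuming that for K = Q the height H_Ar(W) of Section 2.8 is the euclidean norm of the primitive integral vector of Plücker coordinates; the helper names and the sample plane are illustrative choices, not taken from the text.

```python
from functools import reduce
from itertools import combinations
from math import gcd

def gram_determinant(basis):
    """Squared covolume of the rank-2 lattice spanned by the two basis vectors."""
    v, w = basis
    vv = sum(a * a for a in v)
    ww = sum(b * b for b in w)
    vw = sum(a * b for a, b in zip(v, w))
    return vv * ww - vw * vw

def pluecker_coordinates(basis):
    """All 2x2 minors v_i*w_j - v_j*w_i with i < j."""
    v, w = basis
    return [v[i] * w[j] - v[j] * w[i] for i, j in combinations(range(len(v)), 2)]

# Sample plane W: x3 = x1 + x2 in Q^3; these two vectors form a Z-basis of W ∩ Z^3.
basis = [(1, 0, 1), (0, 1, 1)]
pluecker = pluecker_coordinates(basis)

vol_squared = gram_determinant(basis)
height_squared = sum(c * c for c in pluecker)
content = reduce(gcd, (abs(c) for c in pluecker))

print("Pluecker coordinates:", pluecker, "gcd:", content)
print("vol(Lambda)^2 =", vol_squared, " H_Ar(W)^2 =", height_squared)
```

The agreement of the two squared quantities is an instance of the Cauchy–Binet identity; the gcd being 1 reflects that the chosen vectors generate all of W ∩ Z^3.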
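holds
\[ N_{Ar}(L, T) = c_n\, \frac{T^{nd}}{H_{Ar}(L)^d} +
\begin{cases}
O\Bigl( \frac{T}{\lambda_1} \bigl( 1 + \log^+ \frac{T}{\lambda_1} \bigr) \Bigr) & \text{if } d = 1,\ n = 2, \\
O\Bigl( \bigl( \frac{T}{\lambda_1} \bigr)^{nd-1} \Bigr) & \text{otherwise,}
\end{cases} \]
where the implicit constant in the bound may depend on n and K and where
\[ c_n = \frac{h_K R_K}{w_K\, \zeta_K(n)}\, D_{K/\mathbb{Q}}^{-n/2}\, n^{r+s-1}\, \alpha(n)^r\, \{ 2^n \alpha(2n) \}^s. \]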
For the proof, we need the following lemma from geometry of numbers. Let Ω be a
bounded measurable set in Rn with Lipschitz parametrizable boundary meaning that there
are finitely many Lipschitz continuous maps, defined on bounded subsets of Rn−1 and with
images covering ∂Ω .
Lemma 11.10.15. Let Λ be a lattice in $\mathbb{R}^n$, let $\lambda_1$ be the length of the shortest non-zero
lattice vector with respect to the euclidean norm. For T > 0, the number N(Λ, T) of
lattice points in TΩ satisfies
\[ N(\Lambda, T) = \frac{\mathrm{vol}(\Omega)}{\mathrm{vol}(\Lambda)}\, T^n + O\Bigl( \bigl( \frac{T}{\lambda_1} \bigr)^{n-1} \Bigr) + 1, \]
where the implied constant depends on n and Ω but not on Λ.
Proof: We refine the standard counting argument from [172], Ch.6, §2, Th.2 to make the
error term uniform with respect to the lattice. First, we use a result of geometry of numbers
(see C.G. Lekkerkerker [181], Ch.2, §10, Th.4) confirming that Λ has a basis $v_1, \dots, v_n$
such that the fundamental domain $F_\Lambda := \{ \sum_{i=1}^n m_i v_i \mid 0 \le m_i < 1 \}$ is not too skew,
namely
\[ \frac{\|v_1\| \cdots \|v_n\|}{\mathrm{vol}(F_\Lambda)} = O(1). \]
Hence there is a change of coordinates of norm bounded by a constant $C_1$ independent
of Λ, which transforms Λ into the orthogonal lattice $\bigoplus_{i=1}^n \mathbb{Z}\, \|v_i\|\, e_i$, where $(e_i)$ is the
standard basis of $\mathbb{R}^n$. We conclude that a ball of diameter $< \lambda_1 / C_1$ intersects at most $2^n$
translates $\lambda + F_\Lambda$, λ ∈ Λ.
Now let $N_{int}(\Lambda, T)$ (resp. $N_{bd}(\Lambda, T)$) be the number of lattice points λ ∈ Λ with $\lambda + F_\Lambda$
contained in the interior of TΩ (resp. with $(\lambda + F_\Lambda) \cap \partial(T\Omega) \ne \emptyset$). Then
\[ N_{int}(\Lambda, T) \le N(\Lambda, T) \le N_{int}(\Lambda, T) + N_{bd}(\Lambda, T). \]
Obviously, we have
\[ N_{int}(\Lambda, T)\, \mathrm{vol}(\Lambda) \le \mathrm{vol}(T\Omega) = T^n\, \mathrm{vol}(\Omega), \]
hence it is enough to show that $N_{bd}(\Lambda, T)$ may be estimated by the error term in the claim.
The Lipschitz parametrizations of the boundary yield easily that ∂Ω may be covered by at
most $C_2 / \nu^{n-1}$ balls of diameter $< \nu \le 2$. If we set $\nu = \lambda_1 / (C_1 T)$, then the above
yields
\[ N_{bd}(\Lambda, T) \le 2^n C_1^{n-1} C_2 \Bigl( \frac{T}{\lambda_1} \Bigr)^{n-1} \]
at least for ν ≤ 2. Since N(Λ, T) = 1 for $T < \lambda_1$, this proves the claim.
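The simplest instance of the lemma is the Gauss circle problem: Λ = Z^2 (so vol(Λ) = 1 and λ1 = 1) and Ω the closed unit disc, where the main term is πT². The following sketch counts the lattice points directly and compares with the main term; the function name is illustrative, and the explicit error bound π(√2·T + 1/2) used in the comparison is the elementary one obtained from unit squares centered at lattice points, not the bound derived in the proof above.

```python
import math

def lattice_points_in_disc(T):
    """Number of x in Z^2 with |x| <= T, for a non-negative integer T (origin included)."""
    count = 0
    for x in range(-T, T + 1):
        y_max = math.isqrt(T * T - x * x)  # largest y >= 0 with x^2 + y^2 <= T^2
        count += 2 * y_max + 1
    return count

for T in (1, 5, 10, 50, 200):
    n = lattice_points_in_disc(T)
    main_term = math.pi * T * T
    elementary_bound = math.pi * (math.sqrt(2) * T + 0.5)
    print(T, n, round(n - main_term, 3), round(elementary_bound, 3))
```

The printed differences grow at most linearly in T, in line with the error term of order (T/λ1)^(n−1) for n = 2.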
Proof of Theorem 11.10.14: For simplicity, we restrict the proof to the case K = Q. The
extension to arbitrary number fields may be done by S.H. Schanuel’s method in [257].
Let N(W, T) be the number of $x \in \mathbb{Z}^N \cap W$ with $|x_1|^2 + \cdots + |x_N|^2 \le T^2$. Let Ω
be the intersection of W with the unit ball in $\mathbb{R}^N$. Then Lemma 11.10.15 and Theorem
11.10.13 yield
\[ N(W, T) = \frac{\alpha(n)}{H_{Ar}(L)}\, T^n + O\Bigl( \bigl( \frac{T}{\lambda_1} \bigr)^{n-1} \Bigr) + 1. \]
Note that the implied constant is independent of W because all Ω’s are isometric. Let
$N^*(W, T)$ be the number of primitive solutions, i.e. $x \in \mathbb{Z}^N \cap W$ with $|x_1|^2 + \cdots +
|x_N|^2 \le T^2$ and $\mathrm{GCD}(x_1, \dots, x_N) = 1$. It is clear that
\[ N(W, T) - 1 = \sum_{k=1}^{\infty} N^*\Bigl( W, \frac{T}{k} \Bigr), \qquad
N^*(W, T) = \sum_{k=1}^{\infty} \mu(k) \Bigl( N\Bigl( W, \frac{T}{k} \Bigr) - 1 \Bigr), \]
where µ is the Möbius function. We have N(W, T/k) = 1 if $k > T/\lambda_1$. We must count
only one half of the primitive solutions for N(L, T), hence we get
\[ N_{Ar}(L, T) = \frac{1}{2} N^*(W, T) = \frac{\alpha(n)\, T^n}{2 H_{Ar}(L)} \sum_{k=1}^{T/\lambda_1} \frac{\mu(k)}{k^n}
+ O\Bigl( \bigl( \frac{T}{\lambda_1} \bigr)^{n-1} \sum_{k=1}^{T/\lambda_1} \frac{|\mu(k)|}{k^{n-1}} \Bigr). \]
Using
\[ \frac{1}{\zeta(s)} = \prod_p (1 - p^{-s}) = \sum_{k=1}^{\infty} \frac{\mu(k)}{k^s} \]
for the Riemann zeta function ([8], Ch.5, Sec.4.1, Th.9) and using Minkowski’s first theorem
in C.2.19 together with Theorem 11.10.13, we easily deduce the claim.
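The two summation identities relating N and N* can be verified exactly in the smallest case W = Q^2, N = n = 2. The sketch below does this by brute force; the function names are illustrative, and the comparison with 3T²/π at the end is only a plausibility check of the leading constant c_2 = π/(2ζ(2)) for K = Q, with no claim about the size of the error.

```python
import math

def mobius(k):
    """Möbius function by trial division."""
    result = 1
    p = 2
    while p * p <= k:
        if k % p == 0:
            k //= p
            if k % p == 0:
                return 0  # k had a square factor
            result = -result
        p += 1
    if k > 1:
        result = -result
    return result

def count_points(T, k=1, primitive=False):
    """Count x in Z^2 with |x| <= T/k; with primitive=True only those with gcd(x) = 1.

    The condition is tested as k^2 |x|^2 <= T^2 to stay in integer arithmetic.
    Note that gcd(0, 0) = 0, so the origin is never counted as primitive.
    """
    bound = T // k
    total = 0
    for a in range(-bound, bound + 1):
        for b in range(-bound, bound + 1):
            if k * k * (a * a + b * b) <= T * T:
                if not primitive or math.gcd(a, b) == 1:
                    total += 1
    return total

T = 30
n_all = count_points(T)
n_prim = count_points(T, primitive=True)
sum_primitive = sum(count_points(T, k, primitive=True) for k in range(1, T + 1))
sum_mobius = sum(mobius(k) * (count_points(T, k) - 1) for k in range(1, T + 1))

print(n_all - 1, sum_primitive)      # first identity
print(n_prim, sum_mobius)            # second identity (Möbius inversion)
print(n_prim / 2, 3 * T * T / math.pi)  # half the primitive count vs. leading term
```

Both sums are finite because N(W, T/k) = 1 as soon as k > T, which mirrors the truncation at k ≤ T/λ1 in the proof.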
Proof of Theorem 11.10.11: We may assume that Σ has a K -rational base point Q0 ,
otherwise the theorem is trivial. K -rational lines on X correspond to K -rational points
on Σ and we can apply Faltings’s big theorem to the image a(Σ) of Σ in the Albanese
variety A . Since a(Σ) generates A , no translate of an abelian subvariety is equal to a(Σ) .
Hence a(Σ(K)) is contained in the union of finitely many elliptic curves Yi and a finite
set S . Here, the elliptic curves are allowed to have origin different from the origin of A
and so we may assume that they are defined over K . Let h be the Néron–Tate height on
an elliptic curve Yi with respect to an even ample line bundle Hi ; then we have
Theorem 11.10.14 and noticing that the first successive minimum is uniformly bounded
from below (by the first successive minimum of $O_K^{10}$), we have
\[ \#\{ P \in L_Q(K) \mid H_{Ar}(P) \le T \} = c_2\, \frac{T^{2d}}{H_{Ar}(Q)^d} + O\bigl( T^{2d-1} \bigr) \tag{11.32} \]
Substituting the parametrization of LQ into this equation and looking at the coefficient of
u2 v , we infer that
4
where π ranges over all permutations. In other words, the line LQ is contained in the
cubic surface S in P3 , which is the intersection of X with the hyperplane ΠQ defined by
(11.33). Now we distinguish cases.
If S contains only finitely many lines, these lines are determined algebraically by the
coefficients of the defining equations of X and ΠQ, hence they have height of polynomial size
in the heights of these equations, hence HAr (x0 )ρ ≤ T ρ for some absolute constant ρ
(for a precise result, we may use some sort of arithmetic Bézout theorem, see [45], Th.5.5.1,
for an advanced version). Since LQ is one of these lines, this contradicts the lower bound
HAr (LQ ) > T κ if κ > ρ and T is larger than some constant depending only on X ,
which we can assume by taking ρ large enough. Thus we need only deal with the case in
which S contains infinitely many lines.
Note that a smooth cubic surface contains exactly 27 lines ([148], Th.V.4.7). Therefore,
if S contains infinitely many lines it must be singular and, intersecting S with a generic
plane in ΠQ, we verify that S can be only one of the following possibilities:
Cases (b), (c), (d) are immediately excluded because they would give rise to a family of
lines of X parametrized by a rational curve. This would lead to a rational curve in the Fano
surface Σ . Since the Albanese map is finite, the image under the Albanese map a would
give a rational curve inside an abelian variety contradicting 8.2.18 or Proposition 8.2.19.
Case (a) can actually occur. The maximum number of such cones is 30, attained for example
with the Fermat cubic threefold $x_0^3 + x_1^3 + x_2^3 + x_3^3 + x_4^3 = 0$ for the complete
intersections with the hyperplanes $x_i + \eta x_j = 0$, $\eta^3 = 1$ and $i \ne j$, see [68], Lemma 8.1
and p.315.
We conclude that the vertices x0 of these cones belong to a finite set P and by choosing
T sufficiently large only the vertices contribute to the counting, proving the claim.
11.10.16. We have studied in Chapter 4 the theory of small points on subvarieties of a linear
torus. There is a similar theory on an abelian variety A over a number field K , which we
mention briefly. For details, we refer to the overview article of Abbes [1].
Let X be a closed subvariety of A . As in 3.2.2, a torsion coset of X is a translate of an
abelian subvariety by a torsion point and we define X ∗ to be the complement of the union
of all torsion cosets in X . We have the following analogue of Theorem 4.2.2, also due to
Zhang [342]:
Theorem 11.10.17. Let $h_L$ be the Néron–Tate height on A with respect to an ample symmetric
line bundle L. Then
(a) There are only finitely many maximal torsion cosets in X and their union is X \ X∗.
(b) There is a positive lower bound for the restriction of $h_L$ to X∗.
11.10.18. This theorem is called the Bogomolov conjecture. In fact, Bogomolov conjectured
it only for a subcurve of A. We omit the proof of Theorem 11.10.17, which is best
done in the framework of Arakelov theory. It relies on an equidistribution theorem due to
Szpiro, Ullmo, and Zhang [296]. This inspired Bilu’s approach in Section 4.3.
Since torsion points are characterized by Néron–Tate height 0 , Theorem 11.10.17 yields
immediately:
Corollary 11.10.19. If X is not a torsion coset, then the torsion points are not Zariski-dense
in X.
11.10.20. A. Moriwaki [208] generalized Theorem 11.10.17 to finitely generated fields F
over Q using a new type of heights given in terms of Arakelov geometry. This proves
Corollary 11.10.19 over F, which is the Manin–Mumford conjecture. The latter was first
proved by M. Raynaud [237], [238], using different methods.
12.1. Introduction
This section begins with the formulation of the abc-conjecture over the rational
numbers. Then we give a weak explicit form and indicate how to deduce Fermat’s
and Catalan’s conjectures from it. The standard formulation of the abc-conjecture
involves a positive small parameter ε and an unspecified positive constant C(ε),
which depends only on ε , to rule out the possibility of disproving the conjecture
by finding numerical counterexamples. The main drawback of such an approach is
that algebraic geometry has not been of any help in producing plausible heuristics
about the behavior of C(ε). However, considerations of diophantine approximation
have led A. Baker to suggest a specific dependence of C(ε) on ε, and we
mention his conjecture in 12.2.6.
The remaining part is dedicated to an argument, due independently to Langevin
and Elkies, that the abc-conjecture implies Roth’s original theorem over Q , in fact
in a stronger form. Its importance also lies in the fact that an effective abc-theorem
would make Roth’s theorem effective. The proof is based on Belyı̆’s lemma from
12.2.7. In Section 14.4, Elkies’s argument will be extended to number fields and
the proof becomes more conceptual. We conclude this section with an amusing
application of the abc-conjecture to a classical question of analytic number theory.
Definition 12.2.1. The radical rad(N) of an integer N is the product of all distinct
primes dividing N:
\[ \mathrm{rad}(N) = \prod_{p | N} p. \]
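The radical is easy to compute for small integers by trial division. The following is a minimal sketch; the function name `rad` mirrors the notation of the definition, and the conventions rad(±1) = 1 (empty product) and an error for N = 0 are choices made here, not statements from the text.

```python
def rad(n):
    """Product of the distinct primes dividing the non-zero integer n."""
    n = abs(n)
    if n == 0:
        raise ValueError("rad(0) is not defined")
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            result *= p
            while n % p == 0:  # strip every power of p
                n //= p
        p += 1
    if n > 1:  # whatever remains is a single prime factor
        result *= n
    return result

for n in (1, 8, 9, 72, 2**10 * 3 * 5**2):
    print(n, rad(n))
```

For instance rad(72) = rad(2³·3²) = 6, so the radical can be much smaller than the integer itself; this is the phenomenon the abc-conjecture controls for sums a + b = c.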
12.2.3. The adjective “strong” here refers to the fact that this statement is supposed
to hold for every positive ε . If we assume that it only holds for some fixed ε > 0,
for example ε = 1, then we refer to it as the weak abc-conjecture.
In this respect, we note that for applications the weak abc-conjecture is often as
useful as the strong formulation. The statement
for every pair a, b of positive coprime integers has in fact been conjectured by
several authors as a likely explicit form of the weak abc-conjecture.
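Such an explicit inequality can be tested by brute force on small pairs. The sketch below assumes that the explicit statement (12.1) has the shape a + b ≤ rad(ab(a + b))², which is the form used in the computations of Examples 12.2.4 and 12.2.5; the displayed statement itself is not reproduced here, so this reading is an assumption. The function names and the search limit are illustrative.

```python
from math import gcd

def rad(n):
    """Product of the distinct primes dividing the positive integer n."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            result *= p
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        result *= n
    return result

def violations(limit):
    """Coprime pairs 1 <= a <= b with c = a + b <= limit and c > rad(abc)^2."""
    bad = []
    for a in range(1, limit):
        for b in range(a, limit - a + 1):
            if gcd(a, b) == 1:
                c = a + b
                # a, b, c are pairwise coprime, so rad(abc) = rad(a) rad(b) rad(c).
                if c > (rad(a) * rad(b) * rad(c)) ** 2:
                    bad.append((a, b, c))
    return bad

print(violations(300))               # pairs violating the assumed inequality, if any
print(rad(1 * 8 * 9) ** 2, ">=", 9)  # the triple 1 + 8 = 9
```

A finite search of this kind gives no evidence beyond the range searched; it only illustrates how the quantities in the inequality are computed.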
Example 12.2.4. Suppose $x^n + y^n = z^n$ is a non-trivial solution in coprime positive
integers of the famous Fermat equation. Let us take $a = x^n$, $b = y^n$, $c = z^n$ in (12.1).
Since $abc = (xyz)^n$ we deduce
Since z > 1 , this implies n ≤ 5 . It is well known that the Fermat equation has no non-
trivial solutions for n = 3 (Euler), n = 4 (Fermat), n = 5 (Dirichlet, Legendre). For a
proof of these classical cases, we refer to [97]. We conclude that (12.1) implies Fermat’s
last theorem. Any weak abc -conjecture would lead to a proof of the asymptotic Fermat
conjecture, namely a proof for all sufficiently large exponents n .
Fermat’s last theorem is now proved, at last, by A. Wiles and R. Taylor [331], [297]. For an
account of the proof, we refer to the book of H. Darmon, F. Diamond, and R. Taylor [77].
Example 12.2.5. The same argument applies to Catalan’s conjecture that 8 and 9 are
the only two consecutive perfect powers in the sequence of positive natural integers. If we
apply (12.1) to the Catalan equation $x^m + 1 = y^n$, we find
\[ y^n \le (xy)^2 < y^{\frac{2n}{m} + 2} \]
and $n(m - 2) < 2m$. As we may restrict to m, n prime, this leaves us with the well-known
possibilities m = 2 (V.A. Lebesgue), or n = 2 (Chao Ko), or (m, n) one of the
pairs (3, 3) (trivial), (3, 5), (5, 3) (Nagell). For an account of these cases, we refer to P.
Ribenboim [243]. Hence the Catalan conjecture follows from (12.1).
The Catalan conjecture has recently been established unconditionally by P. Mihăilescu [203]
(see also Yu.F. Bilu [25] for an exposition of his proof). It had been shown earlier
unconditionally by R. Tijdeman [300] that the Catalan equation has only a finite number of
solutions, and effective bounds for the size of the solutions x, y and of the exponents m, n
could also be given. It suffices to consider the equation $x^p - y^q = \pm 1$ with p > q odd
primes and what really matters is to give an upper bound for the exponent p. Tijdeman’s
proof achieves this by appealing to Baker’s theory of linear forms in logarithms. A
noteworthy aspect of his method is that it extends to the study of the equation $x^p - y^q = \pm 1$ over an
arbitrary number field.
The full solution required additional considerations from the theory of cyclotomic fields
so as to impose severe restrictions on the possible pairs of exponents (q, p). In particular,
M. Mignotte and Y. Roy [201] proved its validity for $q < 10^5$, while P. Mihăilescu [202]
proved that if odd primes q < p allow a solution of Catalan’s equation, then (q, p) satisfies
the two congruences $p^{q-1} \equiv 1 \pmod{q^2}$ and $q^{p-1} \equiv 1 \pmod{p^2}$, forming a so-called
Wieferich pair. A few examples of such pairs are known, namely
(2, 1093), (3, 1006003), (5, 1645333507), (83, 4871), (911, 318917), (2903, 18787),
but they appear to be rather uncommon.
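The two congruences are cheap to test with modular exponentiation, even for the larger pairs in the list. The following sketch uses Python's built-in three-argument `pow`; the function name is illustrative.

```python
def is_wieferich_pair(q, p):
    """Check p^(q-1) = 1 (mod q^2) and q^(p-1) = 1 (mod p^2) for primes q, p."""
    return pow(p, q - 1, q * q) == 1 and pow(q, p - 1, p * p) == 1

listed_pairs = [
    (2, 1093), (3, 1006003), (5, 1645333507),
    (83, 4871), (911, 318917), (2903, 18787),
]

# Report the result of the test for each pair listed in the text.
for q, p in listed_pairs:
    print((q, p), is_wieferich_pair(q, p))

# A pair of small primes chosen at random will typically fail, e.g. (3, 5).
print((3, 5), is_wieferich_pair(3, 5))
```

The function does not check primality of its arguments; it only evaluates the two congruences.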
Finally, Mihăilescu [25] was able to prove the further congruence $p \equiv 1 \pmod{q}$, hence
$p \equiv 1 \pmod{q^2}$ followed because of the previously established congruence $p^{q-1} \equiv 1
\pmod{q^2}$. Therefore, $p \ge q^2 + 1$. A slightly more accurate argument also proved $p \ge
4q^2 + 1$. This clean lower bound for p could be combined with the upper bound for p in
terms of q, again obtained by means of Baker’s theory, showing that q had to be within
the range covered by the Mignotte and Roy result. This completed the proof of Catalan’s
conjecture.
Variants of this proof, avoiding the use of Baker’s theory and using instead the theory of
cyclotomic fields to obtain the required upper bound for p , have been described by several
authors and we refer to Mihăilescu [203] for further details.
Numerical experiments are consistent with a small value for the constant K .
We continue with the application of the abc-conjecture to Roth’s theorem, beginning
with Belyı̆’s lemma:
Lemma 12.2.7. Let $g : C \to C'$ be a non-constant morphism between two irreducible
smooth curves C, C′, defined over a number field K. Let S be any
finite set of points on $C(\overline{K})$. Then there is a non-constant rational function
$h : C' \to \mathbb{P}^1_K$ such that the composite morphism $f = h \circ g : C \to \mathbb{P}^1_K$ is
unramified outside of $f^{-1}(\{0, 1, \infty\})$ and moreover $f(S) \subset \{0, 1, \infty\}$.
Unramified morphisms are studied in A.12, B.4, and B.3. Here, we deal with
unramified morphisms $\varphi : C \to C'$ of smooth curves over a number field. Then it
suffices to know that ϕ is unramified in x ∈ C if and only if $d(w \circ \varphi)/dz\,(x) \ne 0$
with respect to local analytic coordinates z at x and w at ϕ(x) (see Proposition
A.12.18 and Example B.4.8).
12.2. The abc-conjecture
S = g(S ∪ S1) = {0, 1, ∞, 1 − a^3}.
By our choice of S and g, this makes the second step redundant. To lower the cardinality,
we may take $A = 1 - a^3$, $B = a^3$ whence
\[ h(x) = (1 - a^3)^{-1 + a^3}\, (a^3)^{-a^3}\, x^{1 - a^3}\, (1 - x)^{a^3}. \]
The composite map $f = h \circ g$ yields the desired morphism $f : \mathbb{P}^1_{\mathbb{Q}} \to \mathbb{P}^1_{\mathbb{Q}}$ with the property
that f is unramified outside of $f^{-1}(\{0, 1, \infty\})$ and $f(S) \subset \{0, 1, \infty\}$.
If we further specialize a = −1, we find
\[ f(x) = \frac{(1 + 3x^2 + 2x^3)^2}{4x^2 (3 + 2x)}, \qquad
1 - f(x) = -\frac{(1 + x)^4 (1 - 2x)^2}{4x^2 (3 + 2x)}, \]
yielding the identity
\[ (1 + 3x^2 + 2x^3)^2 - (1 + x)^4 (1 - 2x)^2 = 4x^2 (3 + 2x). \]
In general, this procedure based on composition of maps quickly leads to rational functions
with gigantic degree and height. Note that the procedure followed here is not necessarily
the best; for example, if a = −1 the polynomial $h(x) = (x - 1)^2$ does the job as well,
with a corresponding $f(x) = (3x^2 + 2x^3)^2$ and polynomial identity
\[ x^4 (3 + 2x)^2 + (1 + x)^2 (1 - 2x)(1 + 3x^2 + 2x^3) = 1. \]
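Both polynomial identities for a = −1 can be confirmed by expanding with integer coefficients. The sketch below represents a polynomial as a list of coefficients in increasing degree; the helper names are illustrative.

```python
def poly_mul(p, q):
    """Product of two coefficient lists (lowest degree first)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_pow(p, n):
    """n-th power of a polynomial by repeated multiplication."""
    out = [1]
    for _ in range(n):
        out = poly_mul(out, p)
    return out

def poly_add(p, q, sign=1):
    """p + sign*q, with trailing zero coefficients removed."""
    out = [0] * max(len(p), len(q))
    for i, a in enumerate(p):
        out[i] += a
    for i, b in enumerate(q):
        out[i] += sign * b
    while len(out) > 1 and out[-1] == 0:
        out.pop()
    return out

u = [1, 0, 3, 2]            # 1 + 3x^2 + 2x^3
one_plus_x = [1, 1]         # 1 + x
one_minus_2x = [1, -2]      # 1 - 2x
w = [0, 0, 12, 8]           # 4x^2 (3 + 2x)

# First identity: u^2 - (1 + x)^4 (1 - 2x)^2 = 4x^2 (3 + 2x).
first = poly_add(poly_pow(u, 2),
                 poly_mul(poly_pow(one_plus_x, 4), poly_pow(one_minus_2x, 2)),
                 sign=-1)

# Second identity: x^4 (3 + 2x)^2 + (1 + x)^2 (1 - 2x) u = 1.
second = poly_add(poly_mul([0, 0, 0, 0, 1], poly_pow([3, 2], 2)),
                  poly_mul(poly_mul(poly_pow(one_plus_x, 2), one_minus_2x), u))

print(first, first == w)
print(second, second == [1])
```

Behind both identities is the factorization (1 + x)²(1 − 2x) = 1 − (3x² + 2x³), so each reduces to a difference of squares in 3x² + 2x³.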
Theorem 12.2.9. The strong abc-conjecture over Q implies Roth’s theorem (see
Theorem 6.2.3), in the special case K = Q and S = {∞}.
Remark 12.2.10. The proof is constructive; hence an effective version of the
strong abc-conjecture implies the effective Roth theorem, indeed a stronger version
of it which takes into account arithmetic ramification (see Section 14.4).
Proof: Let α be an algebraic number of degree n ≥ 2 (the case n = 1 is trivial)
and let g(x) be its minimal polynomial over Z . We apply Belyı̆’s lemma from
12.2.7 to the morphism g : P1Q → P1Q and the set S consisting of the roots of g(x),
obtaining a rational function h(x), defined over Q , such that the composition
f (x) = h(g(x)) has the following properties:
Without loss of generality, we may suppose that h(0) = 0. Let f(x) = u(x)/w(x),
where u, w are polynomials with integral coefficients without common factors,
and let v(x) := w(x) − u(x). Let d be the degree of the morphism f, hence
d = max(deg(u), deg(w)), and let
\[ U(X, Y) = Y^d u(X/Y), \qquad V(X, Y) = Y^d v(X/Y), \qquad W(X, Y) = Y^d w(X/Y) \]
be the associated homogeneous forms of degree d. Note that U + V = W and
they have no common factors as well.
therefore, recalling that $U_1(k, l) = G(k, l)$ and $|U_i(k, l)| \ll_f l^{\deg(U_i)}$, we get
\[ \mathrm{rad}(abc) \ll_f |G(k, l)|\, l^K, \]
where
\[ K = -\deg(U_1) + \sum_{i=1}^{t} \deg(U_i) = -n + \sum_{i=1}^{t} \deg(U_i). \]
Next, we note that, since $f(\alpha) = 0$ and $|\alpha - k/l|$ is assumed to be sufficiently
small, $k/l$ must be bounded away from the zeros of $v$. It then follows that
$$|V(k, l)| = l^d \left| v\!\left(\frac{k}{l}\right) \right| \gg_f l^d.$$
By Lemma 12.2.11, we conclude $|b| \gg_f l^d$, with an implied constant depending
only on the height and degree of $f(x)$.
In view of these considerations, the abc-inequality yields
$$l^d \ll_{\varepsilon, f} \bigl(|G(k, l)|\, l^K\bigr)^{1+\varepsilon}$$
whence
$$|G(k, l)| \gg_{\varepsilon, f} l^{d - K - d\varepsilon/(1+\varepsilon)}.$$
It remains to evaluate $d - K$. To this end, we apply Hurwitz’s theorem from
B.4.6 to the ramified covering $f : \mathbb{P}^1_{\mathbb{Q}} \to \mathbb{P}^1_{\mathbb{Q}}$. The ramification occurs only over
$\{0, 1, \infty\}$ and we get
$$-2 = d \cdot (-2) + \sum_{i=1}^{t} (m_i - 1) \deg(U_i).$$
Moreover, we have
$$3d = \sum_{i=1}^{t} m_i \deg(U_i)$$
leading to
$$\sum_{i=1}^{t} \deg(U_i) = d + 2$$
and finally $d - K = d + n - \sum_{i=1}^{t} \deg(U_i) = n - 2$. We conclude that
$$|G(k, l)| \gg_{\varepsilon, f} l^{n - 2 - \varepsilon d/(1+\varepsilon)}. \qquad (12.2)$$
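The degree bookkeeping behind $\sum \deg(U_i) = d + 2$ can be illustrated on the degree 6 function $f(x) = (1 + 3x^2 + 2x^3)^2 / (4x^2(3 + 2x))$ appearing earlier in this section. The sketch below is only an illustration under stated assumptions: the factor list is read off by hand from $U = (Y^3 + 3X^2Y + 2X^3)^2$, $V = -(X + Y)^4 (Y - 2X)^2$ and $W = 4X^2(3Y + 2X)Y^3$, and it is not the function $f$ constructed in the proof.

```python
# Each entry is (m_i, deg U_i): the multiplicity and the degree of an
# irreducible factor of U*V*W, grouped by the branch point it lies over.
# The factorization is entered by hand (an assumption of this sketch).
d = 6
branch_data = {
    "0":   [(2, 3)],                  # (Y^3 + 3X^2 Y + 2X^3)^2
    "1":   [(4, 1), (2, 1)],          # (X + Y)^4 (Y - 2X)^2
    "inf": [(2, 1), (1, 1), (3, 1)],  # X^2 (3Y + 2X) Y^3
}

factors = [pair for pairs in branch_data.values() for pair in pairs]

# Every fibre over 0, 1, infinity has d points counted with multiplicity.
fibre_degrees = {pt: sum(m * deg for m, deg in pairs)
                 for pt, pairs in branch_data.items()}

sum_m_deg = sum(m * deg for m, deg in factors)          # should equal 3d
ramification = sum((m - 1) * deg for m, deg in factors)  # should equal 2d - 2
sum_deg = sum(deg for _, deg in factors)                 # should equal d + 2

print(fibre_degrees, sum_m_deg, ramification, sum_deg)
```

Since the total ramification over the three points already equals $2d - 2$, Hurwitz’s formula leaves no room for ramification elsewhere, which is the situation used in the proof.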
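Since $G(k, l) = l^n g(k/l)$ and $g(\alpha) = 0$, the mean-value theorem shows that
$$G(k, l) = l^n g(k/l) = l^n \bigl(g(k/l) - g(\alpha)\bigr) = l^n \left(\frac{k}{l} - \alpha\right) g'(\xi)$$
for some point $\xi$ between $k/l$ and $\alpha$. Since $k/l$ is close to $\alpha$ and $g'(\alpha) \neq 0$, we
see that $|g'(\xi)|$ is bounded and bounded away from $0$, giving
$$\left| \frac{k}{l} - \alpha \right| \gg_g l^{-n} |G(k, l)|.$$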
This proves $\omega(p) \le \deg(f)$ and hence the infinite product $\prod_p c_p(f)$ is absolutely
convergent.
Let $x$ be a large integer and let $M := \prod_{p \le \sqrt{\log x}} p^2$. By the Chinese remainder theorem,
in any interval $\{k, \dots, k + M - 1\}$ of $M$ consecutive integers there are exactly
$\prod_{p \le \sqrt{\log x}} (p^2 - \omega(p))$ integers $n$ such that $f(n)$ is not divisible by $p^2$ for every prime
$p \le \sqrt{\log x}$. Hence the number of such integers $n \le x$ is
$$\frac{x}{M} \cdot M \prod_{p \le \sqrt{\log x}} \left(1 - \frac{\omega(p)}{p^2}\right) + O(M) \sim c(f) \cdot x,$$
where $c(f) := \prod_p c_p(f)$ is non-zero, as we have seen above.
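The density $c(f)$ can be compared with an actual count in a small experiment. The sketch below assumes that $c_p(f) = 1 - \omega(p)/p^2$, with $\omega(p)$ the number of residues $n$ modulo $p^2$ such that $p^2 \mid f(n)$ (the definition is not repeated in this passage), and it uses the illustrative choice $f(n) = n^2 + 1$; the truncation bound and the range of $n$ are arbitrary.

```python
from math import isqrt

def f(n):
    # Illustrative polynomial without multiple roots (an assumption of this sketch).
    return n * n + 1

def omega(p):
    """Number of residues n mod p^2 with f(n) divisible by p^2."""
    q = p * p
    return sum(1 for n in range(q) if f(n) % q == 0)

def primes_up_to(bound):
    return [p for p in range(2, bound + 1)
            if all(p % d for d in range(2, isqrt(p) + 1))]

def is_squarefree(m):
    # m is square free iff no d >= 2 has d^2 dividing m.
    return all(m % (d * d) for d in range(2, isqrt(m) + 1))

def predicted_density(bound):
    """Truncated product of 1 - omega(p)/p^2 over primes p <= bound."""
    c = 1.0
    for p in primes_up_to(bound):
        c *= 1 - omega(p) / (p * p)
    return c

def observed_density(N):
    """Proportion of 1 <= n <= N with f(n) square free."""
    return sum(1 for n in range(1, N + 1) if is_squarefree(f(n))) / N

print(predicted_density(100), observed_density(2000))
```

For this quadratic $f$ the two numbers should be close, in line with the unconditional results for degree at most $3$ mentioned in the text; the sketch says nothing about higher degree.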
The number of integers $n \le x$ for which $p^2 \mid f(n)$ and $\sqrt{\log x} < p < x$ is majorized by
$$\sum_{p > \sqrt{\log x}} \frac{\omega(p)}{p^2} \cdot x + \sum_{p < x} \omega(p) \ll \frac{x}{\sqrt{\log x}}$$
and, again because of the elementary estimate $\pi(x) \ll x / \log x$, is also negligible in our
counting.
It remains to show that the sequence of integers $n$ for which $p^2 \mid f(n)$ for some prime
$p \ge n$ has zero density. This is the difficult step and sieve methods, combined with
additional ingenious ideas, have been used to prove this for polynomials $f(x)$ of degree at
most $3$.
Unfortunately, this approach fails if the degree of $f(x)$ is $4$ or more. However, on the
assumption of the abc-conjecture we can use a clever trick and Corollary 12.2.13 to conclude
the proof in a single stroke. We choose an integer $m$ larger than the distance of any two
roots of $f(x)$ and let $l$ be another positive integer, which is at our disposal. Consider the
polynomial
$$g(x) = f(x) f(x + m) \cdots f(x + lm)$$
and note that the assumption on $m$ ensures that $g(x)$ too has no multiple roots.
By Corollary 12.2.13, the strong abc-conjecture implies that
$$\prod_{p \mid g(n)} p \gg_{g, \varepsilon} |n|^{\deg(g) - 1 - \varepsilon}$$
may admit such a square factor. In particular, the density of integers $n$ for which $f(n)$
admits a square factor $p^2$ with $p \ge n$ is at most $m/(lm) = 1/l$. Since $l$ can be chosen
arbitrarily large, the set of such integers $n$ has density $0$, concluding the proof.
Our next goal is Belyı̆’s striking theorem that a complex projective curve is defined over $\overline{\mathbb{Q}}$
if and only if it is a covering of $\mathbb{P}^1_{\mathbb{C}}$, unramified outside of $\{0, 1, \infty\}$. It will be of minor
importance in our book and the reader may skip it in a first reading.
We begin by gathering the necessary facts about coverings, referring for details to W.S.
Massey [196], Ch.5, and J.B. Conway [70], Ch.16. The reader is assumed to be familiar
with the connexion between algebraic and analytic structures on a smooth complex variety
as provided by A.14. Then we will prove Belyı̆’s theorem using the language of schemes.
An extension of the last part of the proof will yield a result of Grothendieck.
12.3.1. A topological covering is a continuous map π : Y → X of non-empty topological
spaces which is locally trivial, i.e. every x ∈ X has an open neighbourhood U and a
discrete non-empty topological space F such that π −1 (U ) is homeomorphic to U × F
with π corresponding to the first projection.
12.3.2. A morphism of topological coverings $\pi_1 : Y_1 \to X$, $\pi_2 : Y_2 \to X$ is a continuous
map $\varphi : Y_1 \to Y_2$ with $\pi_1 = \pi_2 \circ \varphi$. If $Y_1 = Y_2$ and $\varphi$ is a homeomorphism, then
$\varphi$ is called an automorphism of the covering. The group of automorphisms of a topological
covering is called the covering group.
12.3.3. We always assume in this section that $X$ is a connected locally contractible topological
space with a base point $x_0 \in X$. The fundamental group $\pi_1(X, x_0)$ is the group
of homotopy classes of loops with origin $x_0$. For a topological covering $\pi : Y \to X$,
the fundamental group $\pi_1(X, x_0)$ acts from the right on the fibre $\pi^{-1}(x_0)$. Explicitly, for
$y \in \pi^{-1}(x_0)$ and $\gamma \in \pi_1(X, x_0)$, we define $y^\gamma$ as the end point of the unique lift of $\gamma$ to
$Y$ with starting point $y$. Every morphism of topological coverings of $X$ restricts to a map
of fibres over $x_0$ compatible with the action of $\pi_1(X, x_0)$.
12.3.4. There is a bijective correspondence between isomorphism classes of connected
topological coverings with base point over x0 and subgroups of π1 (X, x0 ) , similar to the
Galois correspondence in the theory of algebraic field extensions. A connected topological
covering π : Y → X with base point y0 over x0 corresponds to the subgroup π1 (Y, y0 )
of π1 (X, x0 ) . Note that the identification of the elements of π1 (Y, y0 ) with its images in
X is allowed because this homomorphism is injective. If ϕ : Y1 → Y2 is a morphism of
connected topological coverings mapping the base point of Y1 to the base point of Y2 , then
it has to be surjective and induces an inclusion H1 ⊂ H2 of corresponding fundamental
groups. Moreover, every inclusion of subgroups arises this way and the corresponding ϕ is
unique up to isomorphisms.
12.3.5. Let $Y$ be a connected topological covering of $X$ with base point $y_0$ over $x_0$. Then
the (right) action of $G := \pi_1(X, x_0)$ on the fibre $\pi^{-1}(x_0)$ is transitive. The stabilizer $H$
of $y_0$ is equal to $\pi_1(Y, y_0)$. Let $N(H) := \{g \in G \mid gHg^{-1} = H\}$ be the normalizer
of $H$ in $G$. Then $N(H)/H$ is isomorphic to the covering group $\Gamma$. In fact, the choice of
$y_0$ leads to an identification of $\pi^{-1}(x_0)$ with the right coset space $H \backslash G$ and $N(H)/H$
operates from the left giving rise to the isomorphism with $\Gamma$.
If $H$ is a normal subgroup of $G$, then $\Gamma \cong G/H$ operates transitively and freely on
$\pi^{-1}(x_0)$ and we may write $X = \Gamma \backslash Y$, as a quotient of $Y$ by the left $\Gamma$-action.
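For a finite covering, the statements of 12.3.5 can be explored on the fibre alone. The sketch below assumes that the action of $G$ on a fibre of $d$ points is encoded by permutations of $\{0, \dots, d-1\}$, one for each generator of $G$ (for instance two generators when $X$ is $\mathbb{P}^1$ minus three points), and it uses the standard fact that the covering group of a connected covering corresponds to the permutations of the fibre commuting with this action. All function names are illustrative.

```python
from itertools import permutations

def compose(p, q):
    """Permutation 'first p, then q', matching a right action y -> (y^p)^q."""
    return tuple(q[p[i]] for i in range(len(p)))

def generated_group(gens):
    """Finite permutation group generated by gens (closure under products)."""
    identity = tuple(range(len(gens[0])))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = compose(g, s)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group

def is_transitive(group, d):
    """Connectedness of the covering corresponds to a transitive action."""
    return {g[0] for g in group} == set(range(d))

def stabilizer(group, point=0):
    """Image of H, the stabilizer of the base point of the fibre."""
    return {g for g in group if g[point] == point}

def covering_group(gens):
    """Permutations of the fibre commuting with every generator."""
    d = len(gens[0])
    return [c for c in permutations(range(d))
            if all(compose(c, s) == compose(s, c) for s in gens)]

# Two hypothetical degree 3 coverings, given by the images of two generators.
cyclic = [(1, 2, 0), (0, 1, 2)]    # H normal: covering group of order 3
generic = [(1, 0, 2), (0, 2, 1)]   # H not normal: trivial covering group

for gens in (cyclic, generic):
    group = generated_group(gens)
    index = len(group) // len(stabilizer(group))
    print(is_transitive(group, 3), index, len(covering_group(gens)))
```

In both examples the index of the stabilizer equals the number of fibre points, and the covering group has as many elements as the fibre exactly when $H$ is normal.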
12.3.6. If we change the base point from $x_0$ to $x_1$, then we choose a path $\rho$ from $x_0$ to
$x_1$ getting an isomorphism
$$\pi_1(X, x_0) \longrightarrow \pi_1(X, x_1), \quad \gamma \mapsto \rho^{-1} \gamma \rho,$$
where $\rho^{-1}$ is obtained from $\rho$ by following the reverse direction. Now fixing $x_0$ but
varying the base point $y_0$ of $Y$ in 12.3.4, we get a conjugated subgroup $\pi_1(Y, y_1)$ of
$\pi_1(Y, y_0)$ in $\pi_1(X, x_0)$. Hence there is a bijective correspondence between isomorphism
classes of connected topological coverings of $X$ and conjugation classes of subgroups of
$\pi_1(X, x_0)$.
The connected topological covering corresponding to the trivial subgroup $\{1\}$ is the universal
covering $\widetilde{X}$ of $X$. By 12.3.4 and 12.3.5, the covering group $\widetilde{\Gamma}$ of $\widetilde{X}$ is isomorphic
to $\pi_1(X, x_0)$ and there is a bijective correspondence between conjugation classes of subgroups
$\Gamma$ of $\widetilde{\Gamma}$ and isomorphism classes of connected topological coverings $Y$ of $X$ given
by $Y = \Gamma \backslash \widetilde{X}$.
The covering $Y$ is called finite if the number of fibre points is finite. The number of fibre
points in $Y$ of $X$ is equal to the index $[\widetilde{\Gamma} : \Gamma]$, hence the covering $Y$ is finite over $X$ if
and only if $\Gamma$ has finite index in $\widetilde{\Gamma}$.
12.3.7. Now let X be a connected Riemann surface, in other words a connected complex
manifold of dimension 1 . Then every topological covering π : Y → X has a canonical
analytical structure: By 12.3.1, there is an analytic atlas (Uι )ι∈I of X such that Y is
locally trivial over Uι . Then we use the atlas (π −1 (Uι ))ι∈I of Y to define Y as a complex
manifold. Moreover, it is clear that every morphism of topological coverings of X will be
analytic.
12.3.8. Let us also assume that $X$ is algebraic and $\pi : Y \to X$ is a finite connected
topological covering. We have seen in 12.3.7 that the covering is analytic. Let $\overline{X}$ be
the irreducible smooth projective curve containing $X$ as an open subvariety (see A.13.3).
We use the complex manifold structure on $X$ and $\overline{X}$. A local application of 12.3.6 to
$\{z \in \mathbb{C} \mid 0 < |z| < 1\}$ proves that $\pi$ extends uniquely to a finite ramified covering
$\overline{\pi} : \overline{Y} \to \overline{X}$, namely to a holomorphic map of connected Riemann surfaces locally of the
form $z \mapsto z^n$ for some $n \in \mathbb{N}$, $n \ge 1$.
The Riemann existence theorem says that every compact Riemann surface is projective
algebraic (see [148], Th.B.3.1). This applies to $\overline{Y}$ and the GAGA-principle in A.14.7 shows
that $\overline{\pi}$ is algebraic. We conclude that every finite topological covering of $X$ is algebraic
and that every morphism of topological coverings of $X$ extends uniquely to a ramified
algebraic morphism of the smooth projective compactifications. As an algebraic morphism,
$\pi$ is finite (because $\overline{\pi}$ is finite) and also étale. The latter follows from the fact that $\pi$ is
analytically a local isomorphism and from A.12.18.
12.3.9. Conversely, we claim that every finite étale algebraic morphism $\pi$ of a variety $X'$
onto a smooth complex variety $X$ gives rise to a finite topological covering $\pi_{an} : X'_{an} \to X_{an}$.
To see this, note first that $\pi_{an}$ is a local isomorphism (use A.12.18). For $x \in X$ with
$\pi_{an}^{-1}(x) = \{x_1, \dots, x_n\}$, we get disjoint open neighbourhoods $U_j$ of $x_j$ in the complex
topology such that $\pi_{an}$ maps $U_j$ biholomorphically onto an open neighbourhood $U$ of $x$.
We need to prove, for $U$ sufficiently small, that $\pi_{an}^{-1}(U) = U_1 \cup \cdots \cup U_n$, yielding local
triviality of $\pi_{an}$, but this is an easy consequence of properness of $\pi_{an}$ (cf. A.14.6).
12.3.10. The universal covering space of $\mathbb{P}^1_{an} \setminus \{0, 1, \infty\}$ is the upper half plane
$\mathbb{H} := \{z \in \mathbb{C} \mid \Im z > 0\}$. The covering map is the modular function $\lambda$ known from the
proof of Picard’s little theorem (see L. Ahlfors [8]; in 13.2.35 we give another proof using
Nevanlinna theory). Let $\Gamma(2)$ be the kernel of the reduction modulo $2$ on $SL(2, \mathbb{Z})$ and let
$\overline{\Gamma}(2)$ be the image of $\Gamma(2)$ in $PSL(2, \mathbb{Z}) = SL(2, \mathbb{Z})/\{\pm 1\}$. Then the covering group of
$\lambda : \mathbb{H} \to \mathbb{P}^1_{an} \setminus \{0, 1, \infty\}$ is $\overline{\Gamma}(2)$. The next result is Belyı̆’s theorem:
Theorem 12.3.11. Let $C$ be an irreducible smooth projective curve defined over $\mathbb{C}$. The
following conditions are equivalent:
(a) There is a curve $C'$ defined over $\overline{\mathbb{Q}}$ with $C$ isomorphic to the base change $C'_{\mathbb{C}}$.
(b) There exists a non-constant rational function $f : C \to \mathbb{P}^1_{\mathbb{C}}$ ramified at most over
three points.
(c) There is a subgroup $\Gamma$ of finite index in $\overline{\Gamma}(2)$ such that $\Gamma \backslash \mathbb{H}$ is isomorphic to $U_{an}$
for a Zariski open subset $U$ of $C$.
Moreover, if (a) holds, if we identify $C'_{\mathbb{C}}$ with $C$, and if $S$ is a finite subset of $C'(\overline{\mathbb{Q}})$, then
$\Gamma$ in (c) can be chosen such that $S \subset C \setminus U$.
The details are as follows. By 12.3.8, $f$ is a finite étale algebraic morphism and therefore
$U$ is affine. The coordinate ring of $X := \mathbb{P}^1 \setminus \{0, 1, \infty\}$ over $\mathbb{C}$ is
$$\mathbb{C}[X] = \mathbb{C}\left[x, \frac{1}{x}, \frac{1}{x - 1}\right]$$
and $\mathbb{C}[U]$ is a finitely generated $\mathbb{C}[X]$-module. There are variables $\mathbf{y} = (y_1, \dots, y_N)$
and $\mathbf{x} = (x, \frac{1}{x}, \frac{1}{1 - x})$ such that $\mathbb{C}[U] = \mathbb{C}[\mathbf{x}, \mathbf{y}]/I$ for an ideal $I$. By finiteness, we may
assume that $y_1, \dots, y_N$ generate $\mathbb{C}[U]$ as a $\mathbb{C}[X]$-module, hence
$$y_i y_j - \sum_{k=1}^{N} \lambda_k(\mathbf{x})\, y_k \in I \qquad (12.3)$$
for suitable $\lambda_k(\mathbf{x}) \in \mathbb{C}[X]$. Hence $I$ is generated by polynomials $p_1(\mathbf{x}, \mathbf{y}), \dots, p_r(\mathbf{x}, \mathbf{y})$
consisting of all polynomials in (12.3) and some other polynomials of degree $1$ in $\mathbf{y}$. The
coefficients of these polynomials are contained in a subring $R$ of $\mathbb{C}$ containing $\overline{\mathbb{Q}}$ such that
$R$ is a finitely generated $\overline{\mathbb{Q}}$-algebra. We note that $R$ is an integral domain.
Now it is convenient to use the language of schemes. We consider the affine scheme
of finite type over the integral affine scheme S := Spec(R) and let us denote by π :
U0 → S the morphism of structure. By construction, we have a canonical finite morphism
f0 : U0 → XR defined over R whose base change from R to C is f . Note that R ⊂
C induces a geometric point z ∈ S(C) , namely a morphism z : Spec(C) → S . By
construction, the fibre (f0 )z of f0 over z is f and hence étale. The image of z is the
generic point ζ of S and (f0 )z is the base change of the fibre (f0 )ζ to C . By flat descent,
we conclude that (f0 )ζ is an étale morphism (cf. A. Grothendieck and J.A. Dieudonné
[137], Prop.17.7.1). Obviously, (f0 )ζ is also surjective.
Note that $f_0$ is flat in every point $u$ over $\zeta$. This follows from the fact that $\mathcal{O}_{U_0, u}$ is
canonically isomorphic to the local ring of $u$ in the fibre $(U_0)_\zeta$, from the similar fact for
$f_0(u)$ and from flatness of $(f_0)_\zeta$ (for details about fibres, cf. [139], 3.4). Clearly, $f_0$ is
also unramified in $u$ because this is a property of the fibre over $\pi(u) = \zeta$. This implies
that $f_0$ is étale in all the points of $(U_0)_\zeta$. The étale points of $f_0$ form an open subset $V_0$
of $U_0$ (cf. B.3.2).
Note that there is an open dense subset $S_0$ of $S$ such that $\pi^{-1}(S_0) \subset V_0$. This follows
from a theorem of Chevalley stating that the image of a morphism is a constructible subset,
i.e. a finite disjoint union of intersections of a closed and an open subset (cf. [148],
Ex.s II.3.18, II.3.19). Since $(U_0)_\zeta \subset V_0$, we get $\zeta \notin T_0 := \pi(U_0 \setminus V_0)$. Then the
constructibility of $T_0$ implies that $\zeta$ is not contained in the closure $\overline{T_0}$ of $T_0$ in $S$ and, by setting
$S_0 := S \setminus \overline{T_0}$, we get our claim.
Now we consider $S_0$ as an irreducible variety over $\overline{\mathbb{Q}}$ and, by passing to a dense open
subset, we may assume that $S_0$ is smooth (cf. A.7.16). Thus $X \times S_0$ is a smooth irreducible
variety over $\overline{\mathbb{Q}}$ (cf. A.4.11 and A.7.17). Since the restriction of $f_0$ to $\pi^{-1}(S_0)$ is étale, we
conclude that $\pi^{-1}(S_0)$ is a smooth variety over $\overline{\mathbb{Q}}$ (cf. [137], Prop.17.3.3). By Hilbert’s
Nullstellensatz in A.2.2, there is an algebraic point $s \in S_0(\overline{\mathbb{Q}})$.
$$F : \pi^{-1}(S_0)_{\mathbb{C}} \longrightarrow X \times (S_0)_{\mathbb{C}}$$
over $(S_0)_{\mathbb{C}}$. By construction, the fibre $F_z$ is $f$ and the fibre $F_s$ is equal to the base change
of $(f_0)_s$ from $\overline{\mathbb{Q}}$ to $\mathbb{C}$. By 12.3.9 and connectedness of $U_{an}$, we conclude that $F_{an}$ is a
connected topological covering between associated complex manifolds.
We choose $x \in X(\mathbb{C})$. Let $\rho$ be a path in $(S_0)_{an}$ from $z$ to $s$ and let $\rho_x = \{x\} \times \rho$. For
$y \in F^{-1}(x, z)$, let $y^{\rho}$ be the endpoint of the lift of $\rho_x$ to a path with origin $y$.
The fibres of $F$ over $S$ are finite étale morphisms, therefore $(f_z)_{an}$ and $(F_s)_{an}$ are also
topological coverings of $X_{an} = \mathbb{P}^1_{an} \setminus \{0, 1, \infty\}$. For $\gamma \in G := \pi_1(X_{an}, x)$, let $\gamma_z :=
\gamma \times \{z\}$ and $\gamma_s := \gamma \times \{s\}$. Obviously, we have
$$\rho_x^{-1} \gamma_z \rho_x = \gamma_s \in \pi_1(X_{an} \times (S_0)_{an}, (x, s)).$$
$$(y^{\gamma})^{\rho_x} = (y^{\gamma_z})^{\rho_x} = (y^{\rho_x})^{\gamma_s} = (y^{\rho_x})^{\gamma}$$
Theorem 12.3.12. Let $X$ be a variety over $\overline{\mathbb{Q}}$ and let $\varphi$ be a finite topological covering of
$(X_{\mathbb{C}})_{an}$. Then there is a finite étale morphism $\psi : Y \to X$ of varieties over $\overline{\mathbb{Q}}$ such that
$(\psi_{\mathbb{C}})_{an} = \varphi$. Moreover, $Y$ and $\psi$ are unique up to isomorphism.
Proof: The arguments are similar to the proof of (c) ⇒ (a) for Belyı̆’s theorem in 12.3.11.
The details are left to the reader. We give the following hints:
(a) By passing to the connected components, we may assume that X and Y are both
connected.
(b) There is a unique way to endow Y with the structure of a complex space such that
ϕ is a finite analytic morphism which is locally biholomorphic.
(c) For algebraicity, we have to use Grothendieck’s generalization of the Riemann
existence theorem (cf. A. Grothendieck [140], Exposé XII, Th.5.1; see also [148],
Th.B.3.2 for the original version of Grauert–Remmert):
Theorem 12.3.13. Let us assign to every finite étale covering of a complex algebraic variety
Z the associated analytic morphism. Then this gives an equivalence from the category of
finite étale coverings of Z to the category of finite analytic coverings of Zan (meaning
finite morphisms which are locally biholomorphic).
12.4. Examples
Theorem 12.4.1. Let $K$ be a field of characteristic $0$ and let $a(x), b(x), c(x) \in
K[x]$ be not all constant, coprime in pairs and such that $a(x) + b(x) + c(x) = 0$.
Let $\mathrm{rad}(abc)$ be the monic polynomial with simple zeros at the zeros of $abc$. Then
$$\max(\deg(a), \deg(b), \deg(c)) \le \deg(\mathrm{rad}(abc)) - 1.$$
We give here two simple proofs, which will be reinterpreted and extended to more
general cases in Section 14.5.
First proof: Let us first observe that on the right-hand side, we count the roots
$z$ without their multiplicity $\mathrm{ord}_z$. Clearly, we may assume that $K$ is algebraically
closed and, by a permutation, $\deg(a) = \deg(b) \ge \deg(c)$.
The hypotheses of the theorem imply that none of $a$, $b$, $c$ is identically $0$ and the
rational functions $u := -a/c$, $v := -b/c$ are not constant, with
$$u + v = 1.$$
we conclude that
$$-2 \ge \deg(u) - \left| u^{-1}(\{0, 1, \infty\}) \right|. \qquad (12.4)$$
It remains to give an upper bound for $\left| u^{-1}(\{0, 1, \infty\}) \right|$.
We distinguish two cases. Suppose first that $\deg(u) = \deg(a) = \deg(b) =
\deg(c)$. Then
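$$\begin{pmatrix} 1 & 1 & 1 \\ \dfrac{a'}{a} & \dfrac{b'}{b} & \dfrac{c'}{c} \end{pmatrix}.$$
Clearly, its rank is $2$, otherwise $\frac{a'}{a} = \frac{b'}{b} = \frac{c'}{c}$, contradicting the assumption that $a$,
$b$, $c$ are coprime and not all constant. By Cramer’s rule, it follows that the solution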
$(a, b, -c)$ of the above linear system is proportional to the vector of cofactors of
the matrix. Let $\lambda \in K(x)$ be the associated proportionality factor, hence
$$\frac{c'}{c} - \frac{b'}{b} = \lambda \cdot a,$$
$$\frac{a'}{a} - \frac{c'}{c} = \lambda \cdot b,$$
$$\frac{b'}{b} - \frac{a'}{a} = \lambda \cdot (-c).$$
Recall that rad(abc) is the monic polynomial with simple zeros at the zeros of
abc. By the basic property d log(f g) = d log(f ) + d log(g), it is clear that
the product of rad(abc) with each cofactor on the left is a polynomial. Since
a, b, c are coprime, we conclude that the denominator of λ must be a divisor
of rad(abc). Since each cofactor vanishes at ∞ , taking degrees we infer that
max(deg(a), deg(b), deg(c)) ≤ deg(rad(abc)) − 1.
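As a sanity check of the inequality just obtained, one can compute $\deg(\mathrm{rad}(abc))$ for an explicit triple. The sketch below assumes $K = \mathbb{Q}$ and uses $\deg \mathrm{rad}(p) = \deg p - \deg \gcd(p, p')$, which is valid in characteristic $0$; the triple is taken from the polynomial identity of Section 12.2, and all helper names are illustrative.

```python
from fractions import Fraction

def trim(p):
    """Drop trailing zero coefficients (constant term first)."""
    p = list(p)
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p

def deg(p):
    p = trim(p)
    return -1 if p == [0] else len(p) - 1

def mul(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)

def add(p, q):
    n = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
                 for i in range(n)])

def derivative(p):
    return trim([i * c for i, c in enumerate(p)][1:] or [0])

def remainder(p, q):
    """Remainder of p modulo a non-zero polynomial q, over the rationals."""
    p = [Fraction(c) for c in trim(p)]
    q = trim(q)
    while deg(p) >= deg(q):
        shift, factor = deg(p) - deg(q), p[-1] / q[-1]
        for i, c in enumerate(q):
            p[i + shift] -= factor * c
        p = trim(p)
    return p

def gcd_poly(p, q):
    p, q = trim(p), trim(q)
    while q != [0]:
        p, q = q, remainder(p, q)
    return p

def radical_degree(p):
    return deg(p) - deg(gcd_poly(p, derivative(p)))

cubic = [1, 0, 3, 2]                                  # 1 + 3x^2 + 2x^3
a = mul([0, 0, 0, 0, 1], mul([3, 2], [3, 2]))          # x^4 (3 + 2x)^2
b = mul(mul(mul([1, 1], [1, 1]), [1, -2]), cubic)      # (1+x)^2 (1-2x) cubic
c = [-1]

max_degree = max(deg(a), deg(b), deg(c))
rad_degree = radical_degree(mul(mul(a, b), c))
print(add(add(a, b), c), max_degree, rad_degree)
```

In this example the bound holds with equality, which fits Remark 12.4.2: the underlying map is ramified only over $\{0, 1, \infty\}$ and the three polynomials do not all have the same degree.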
Remark 12.4.2. Theorem 12.4.1 is sharp precisely whenever u := −a/c defines
a Belyı̆ morphism u : P1an → P1an ramified only over {0, 1, ∞} and a, b , c are
not all of the same degree. This is clear from the first proof of the theorem, because
additional ramification will contribute an additional positive term to the right-hand
side of (12.4).
Remark 12.4.3. A more natural geometric formulation of Theorem 12.4.1 is obtained
by replacing $a$, $b$, $c$ by non-zero elements of the function field $K(x)$ with
$a + b + c = 0$ and introducing the obvious projective height function
$$h((a : b : c)) = -\sum_{z \in \mathbb{P}^1(K)} \min(\mathrm{ord}_z(a), \mathrm{ord}_z(b), \mathrm{ord}_z(c)),$$
The second proof of the abc-theorem for polynomials easily extends to prove the
general abc-theorem for polynomials.
All these generalizations will be proved, in more detail and in a more general
setting, in Section 14.5.
Remark 12.4.5. The coefficient is sharp if $n = 3$ or $4$. If $n \ge 5$, it is not clear
what is the best coefficient in Theorem 12.4.4. The simple example
$$\sum_{j=0}^{n-2} \binom{n-2}{j} x^{ej} - (x^e + 1)^{n-2} = 0$$
shows that $\frac{1}{2}(n-1)(n-2)$ cannot be replaced by any constant smaller than $n - 2$.
Better lower bounds can be provided for $n \ge 4$. The following clever example is
in J. Browkin and J. Brzeziński [51]. Let $P_k(x)$ be the polynomial of degree $k$
such that
$$\frac{x^{2k+1} - 1}{x - 1} = x^k P_k\!\left(\frac{(x - 1)^2}{x}\right).$$
It is easy to see that all roots of $P_k$ are negative, hence $P_k$ has positive coefficients.
If we take $k = n - 3$ and set $x^e$ in place of $x$, we obtain an identity
$$x^{(2n-5)e} - 1 - \sum_{j=0}^{n-3} s_j (x^e - 1)^{2j+1} x^{(n-3-j)e} = 0$$
The exponent 1 + ε is necessary in the abc -conjecture over Q , as we see by the following
interesting result, due to Stewart and Tijdeman [291].
Theorem 12.4.6. For every $\delta > 0$, there are infinitely many triples $a$, $b$, $c$ of coprime
positive numbers with $a + b = c$ and
$$c > \exp\!\left((4 - \delta) \frac{\sqrt{\log c}}{\log\log c}\right) \prod_{p \mid abc} p.$$
Note that the factor is of order $O(c^{\varepsilon})$ for any $\varepsilon > 0$ and hence this lower bound is
compatible with the strong abc-conjecture in 12.2.2.
12.4.7. The strategy of proof is quite simple. If we restrict $a$ and $c$ to integers whose
prime factors belong to a rather small set, then the radical of $ac$ is small. On the other
hand, the number of such integers up to $x$ is large, and an application of Dirichlet’s
pigeon-hole principle shows that we can make $b = c - a$ fairly small. It does not matter much
which absolute value we use for this purpose, and we shall use the $2$-adic topology in our
argument. If $b$ is small in this sense, then its radical is smaller than expected, thus giving a
non-trivial example of an $(a, b, c)$-triple.
We begin with counting the number of integers up to N all of whose prime factors belong
to a fixed set P , provided the set P is not too large.
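Proposition 12.4.8. Let $\mathcal{P}$ be a finite set of primes and let $\vartheta_{\mathcal{P}} = \sum_{p \in \mathcal{P}} \log p$. Then the
number $\Psi(N, \mathcal{P})$ of integers $n \le N$ with prime factors only from $\mathcal{P}$ is bounded by
$$\frac{(\log N)^{|\mathcal{P}|}}{|\mathcal{P}|!} \Bigl(\prod_{p \in \mathcal{P}} \log p\Bigr)^{-1} \le \Psi(N, \mathcal{P}) \le \frac{(\log N + \vartheta_{\mathcal{P}})^{|\mathcal{P}|}}{|\mathcal{P}|!} \Bigl(\prod_{p \in \mathcal{P}} \log p\Bigr)^{-1}.$$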
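Proof: Let $n = \prod_{p \in \mathcal{P}} p^{a_p}$ and let us associate to $n$ the integer vector $a = (a_p)_{p \in \mathcal{P}}$. This
makes it clear that $\Psi(N, \mathcal{P})$ is the number of lattice points in
$$T(\log N) := \Bigl\{ x \in \mathbb{R}^{\mathcal{P}} \Bigm| \sum_{p \in \mathcal{P}} (\log p)\, x_p \le \log N,\ x_p \ge 0 \text{ for } p \in \mathcal{P} \Bigr\}.$$
Let $A$ be the set of lattice points in $T(\log N)$. We associate to a lattice point $a \in A$ the
unit cube $C(a) = \{z \mid a_p \le z_p < a_p + 1,\ p \in \mathcal{P}\}$. Then it is clear that
$$T(\log N) \subset \bigcup_{a \in A} C(a) \subset T(\log N + \vartheta_{\mathcal{P}}).$$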
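The two-sided bound of Proposition 12.4.8 is easy to test numerically. The following sketch counts $\Psi(N, \mathcal{P})$ by direct enumeration and evaluates both bounds; the function names and the sample values of $N$ and $\mathcal{P}$ are illustrative choices, not taken from the text.

```python
from math import factorial, log, prod

def psi(N, primes):
    """Count integers 1 <= n <= N all of whose prime factors lie in `primes`."""
    count = 0
    stack = [(1, 0)]  # (current product, index of the smallest prime still allowed)
    while stack:
        n, i = stack.pop()
        count += 1
        for j in range(i, len(primes)):
            m = n * primes[j]
            if m <= N:
                stack.append((m, j))
    return count

def psi_bounds(N, primes):
    """Lower and upper bound from Proposition 12.4.8."""
    t = len(primes)
    theta = sum(log(p) for p in primes)
    scale = 1 / (factorial(t) * prod(log(p) for p in primes))
    return log(N) ** t * scale, (log(N) + theta) ** t * scale

N, primes = 10**6, [2, 3, 5]
lower, upper = psi_bounds(N, primes)
print(lower, psi(N, primes), upper)
```

The enumeration multiplies by primes in non-decreasing index order, so each integer is produced exactly once.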
$$\log \Psi_{\mathrm{odd}}(x, y) = (\pi(y) - 1) \log\log x - y + o\!\left(\frac{y}{\log y}\right).$$
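Proof: We apply Proposition 12.4.8 with $N = x$, taking for $\mathcal{P}$ the set of all odd primes
not exceeding $y$, which has cardinality $|\mathcal{P}| = \pi(y) - 1$. We get
$$\frac{(\log x)^{|\mathcal{P}|}}{|\mathcal{P}|!} \Bigl(\prod_{p \in \mathcal{P}} \log p\Bigr)^{-1} \le \Psi_{\mathrm{odd}}(x, y) \le \frac{(\log x + \vartheta_{\mathcal{P}})^{|\mathcal{P}|}}{|\mathcal{P}|!} \Bigl(\prod_{p \in \mathcal{P}} \log p\Bigr)^{-1}. \qquad (12.5)$$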
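Partial summation of
$$\sum_{2 < p \le y} \log\log p = \sum_{2 < n \le y} (\pi(n) - \pi(n-1)) \log\log n$$
yields
$$\sum_{p \in \mathcal{P}} \log\log p = \sum_{2 < n \le y} (\pi(n) - \pi(n-1)) \log\log y + \sum_{2 < m \le y-1} \int_{m+1}^{m} \sum_{2 < n \le u} (\pi(n) - \pi(n-1)) \frac{du}{u \log u}$$
$$= (\pi(y) - 1) \log\log y - \int_{3}^{y} \frac{\pi(u) - 1}{u \log u}\, du.$$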
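By the prime number theorem and integration by parts, we deduce
$$\sum_{p \in \mathcal{P}} \log\log p = \pi(y) \log\log y + O\!\left(\frac{y}{(\log y)^2}\right). \qquad (12.6)$$
Similarly, we find
$$\vartheta_{\mathcal{P}} = \sum_{2 < p \le y} \log p = y + o\!\left(\frac{y}{\log y}\right). \qquad (12.7)$$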
$$\log \Psi_{\mathrm{odd}}(x, y) = (\pi(y) - 1) \log\log x - \pi(y) \log \pi(y) + \pi(y) - \pi(y) \log\log y + o\!\left(\frac{y}{\log y}\right).$$
The lemma follows from the prime number theorem in the stronger form (see G.J.O.
Jameson [158], Exercise 1.5.4, Th. 5.1.8)
$$\pi(y) = \frac{y}{\log y} + \frac{y}{(\log y)^2} + o\!\left(\frac{y}{(\log y)^2}\right).$$
12.4.10. Proof of Theorem 12.4.6: We choose $y = \sqrt{\log x}$ and for simplicity we abbreviate
$M = \Psi_{\mathrm{odd}}(x, y)$. We also define $k$ by $2^k < M \le 2^{k+1}$. Now consider the sequence $1 =
n_1 < n_2 < \cdots < n_M \le x$ of odd integers up to $x$ without prime factors larger than $y$.
Since $2^k < M$, by Dirichlet’s pigeon-hole principle there are two distinct elements $n_i <
n_j$ in this sequence, having the same residue class modulo $2^k$. Let $d = \mathrm{GCD}(n_i, n_j)$
and set
$$a = n_i / d, \quad b = (n_j - n_i)/d, \quad c = n_j / d.$$
The triple $(a, b, c)$ so obtained is the desired good example for the abc-conjecture.
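The pigeon-hole construction can be run on small inputs. The sketch below departs from the proof in one respect, which is an assumption made for experimentation: instead of taking all odd primes up to $y = \sqrt{\log x}$, the set of odd primes is passed in explicitly, since for computationally accessible $x$ that choice of $y$ leaves very few primes. Function names are illustrative.

```python
from math import gcd

def smooth_odd_numbers(x, odd_primes):
    """Sorted list of integers <= x whose prime factors all lie in odd_primes."""
    nums = [1]
    for p in odd_primes:
        extended = []
        for n in nums:
            while n <= x:
                extended.append(n)
                n *= p
        nums = extended
    return sorted(nums)

def radical(n):
    """Product of the distinct primes dividing n (trial division)."""
    r, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            r *= d
            while n % d == 0:
                n //= d
        d += 1
    return r * n if n > 1 else r

def pigeonhole_triple(x, odd_primes):
    """Return (a, b, c, k) with a + b = c and 2^k dividing b, as in 12.4.10."""
    nums = smooth_odd_numbers(x, odd_primes)
    M = len(nums)
    if M < 2:
        raise ValueError("need at least two smooth numbers")
    k = (M - 1).bit_length() - 1          # 2^k < M <= 2^(k+1)
    seen = {}
    for n in nums:
        residue = n % (1 << k)
        if residue in seen:               # two elements in the same class mod 2^k
            ni, nj = seen[residue], n
            d = gcd(ni, nj)
            return ni // d, (nj - ni) // d, nj // d, k
        seen[residue] = n
    raise AssertionError("unreachable: the pigeon-hole principle guarantees a collision")

a, b, c, k = pigeonhole_triple(10**9, [3])
print(a, b, c, k, radical(a * b * c))
```

With the single prime $3$ and $x = 10^9$ the first collision is $1 \equiv 81 \pmod{16}$, giving the triple $1 + 80 = 81$ with radical $30$.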
By construction, $2^k$ divides $n_j - n_i = d \cdot b$. On the other hand, $n_i$ and $n_j$ are both odd,
thus $d$ is odd. Hence $2^k$ divides $b$, giving
$$\mathrm{rad}(b) \le 2^{-k+1} b \le 4 M^{-1} b.$$
Also, every prime factor of $n_i n_j$ does not exceed $y$, whence
$$\mathrm{rad}(ac) \le \prod_{p \in \mathcal{P}} p = e^{\vartheta_{\mathcal{P}}}.$$
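$$\mathrm{rad}(abc) \le 4 M^{-1} e^{\vartheta_{\mathcal{P}}} b. \qquad (12.9)$$
$$\log M = y + \frac{2y}{\log y} + o\!\left(\frac{y}{\log y}\right),$$
therefore from (12.7) and (12.9) we infer that
$$b > K \prod_{p \mid abc} p$$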
for some positive constant κ . This hypothetical bound is not too far away from the actual
lower bound. Thus Baker’s conjecture, if true, would probably be close to optimal.
Let $N$ be a positive integer. Then we can find coprime positive integers $a$, $b$, $c = a + b$ not
exceeding $N$, such that the prime factors of $a$ and $c$ are only from the set $\mathcal{P}$ and moreover
$$c > \frac{1}{4} \frac{(\log N)^{|\mathcal{P}|}}{|\mathcal{P}|!}\, e^{-\vartheta_{\mathcal{P}}} \Bigl(\prod_{p \in \mathcal{P}} \log p\Bigr)^{-1} \mathrm{rad}(abc).$$
Proof: Let $M = \Psi(N, \mathcal{P})$ and let $n_1 < n_2 < \cdots < n_M$ be the set of integers up to $N$
all of whose prime factors are from the set $\mathcal{P}$. Then Proposition 12.4.8 yields
$$M \ge \frac{(\log N)^{|\mathcal{P}|}}{|\mathcal{P}|!} \Bigl(\prod_{p \in \mathcal{P}} \log p\Bigr)^{-1}.$$
If $2^k < M \le 2^{k+1}$, then by Dirichlet’s pigeon-hole principle we can find two distinct
elements $n_i$, $n_j$ in the same class $(\mathrm{mod}\ 2^k)$; the rest of the proof is as in 12.4.10.
Example 12.4.13. Consider the special case $a = 1$, $b = 3^{2^n} - 1$, $c = 3^{2^n}$. Then Euler’s
theorem shows that $2^{n+1}$ divides $b$, giving the explicit example
$$c > \frac{\log c}{3 \log 3}\, \mathrm{rad}(abc).$$
This is comparable in strength with what we can obtain from Proposition 12.4.12 if $\mathcal{P}$
consists of only one prime.
Example 12.4.14. We note here the example $2 + 109 \cdot 3^{10} = 23^5$, which has high abc-ratio
$\log c / \log \mathrm{rad}(abc)$, namely $1.62991$. This was found by E. Reyssat, by searching
for very good rational approximations to numbers of type $a^{1/n}$. Since
$$109^{1/5} = 2 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{4 + \cfrac{1}{77733 + \cdots}}}}$$
we see that $\frac{23}{9}$ is a very good convergent to $109^{1/5}$, corresponding to the example (note
that since $9$ is a square this example is rather more effective than others of similar type).
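The arithmetic in this example can be confirmed directly. The sketch below checks the identity, computes the radical and the abc-ratio, and evaluates the convergent $[2; 1, 1, 4]$ with exact fractions; it takes the partial quotients from the text and does not attempt to recompute the continued fraction of $109^{1/5}$. Helper names are illustrative.

```python
from fractions import Fraction
from math import log

def radical(n):
    """Product of the distinct primes dividing n (trial division)."""
    r, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            r *= d
            while n % d == 0:
                n //= d
        d += 1
    return r * n if n > 1 else r

def convergent(partial_quotients):
    """Value of the finite continued fraction [q0; q1, ..., qn] as a Fraction."""
    value = Fraction(partial_quotients[-1])
    for q in reversed(partial_quotients[:-1]):
        value = q + 1 / value
    return value

a, b, c = 2, 109 * 3**10, 23**5
rad_abc = radical(a * b * c)
abc_ratio = log(c) / log(rad_abc)
approx = convergent([2, 1, 1, 4])

print(a + b == c, rad_abc, round(abc_ratio, 5), approx)
```

The relation $23^5 - 109 \cdot 9^5 = 2$ is exactly the statement that $(23/9)^5$ is extremely close to $109$.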
Remark 12.4.15. For any coprime integers $a, b, c$ with $a + b = c$, the abc-ratio is defined
by
$$\frac{\log \max(|a|, |b|, |c|)}{\log \mathrm{rad}(abc)}.$$
The strong abc-conjecture implies that the set of accumulation points of all abc-ratios is the
interval $[\frac{1}{3}, 1]$. For a proof, note that the lower bound $\frac{1}{3}$ is obvious and the upper bound $1$
is the strong abc-conjecture. Now take $a = n^k g(n)$, $b = h(n)$, $c = n^k g(n) + h(n)$ with
suitable polynomials $g$, $h$, and apply Theorem 12.2.15 to $f(x) := x g(x) h(x)(x^k g(x) +
h(x))$ to ensure that $f(n)$ is square free infinitely often. For such values of $n$, we have
$\mathrm{rad}(abc) = f(n)$. As $n \to \infty$, the abc-ratio tends to
$$\frac{\max\bigl(k + \deg(g), \deg(h), \deg(x^k g + h)\bigr)}{1 + \deg(g) + \deg(h) + \deg(x^k g + h)}.$$
The set of these rational numbers is contained in $[\frac{1}{3}, 1]$ and is dense there.
12.4.16. A simple heuristic argument has been proposed, suggesting that statements stronger
than Proposition 12.4.12 may be true. This is based on the so-called birthday paradox.
There are precisely $m^s$ integral vectors $\{n_1, \dots, n_s\}$ with $1 \le n_i \le m$. On the
other hand, the number of such vectors where all components $n_i$ are distinct is $m(m -
1) \cdots (m - s + 1)$. By Stirling’s formula, we have
$$m(m - 1) \cdots (m - s + 1)/m^s \sim e^{-s^2/2m}$$
uniformly for $s = o(m^{2/3})$. This means that, if we look at vectors of length $s \sim \alpha \sqrt{m}$
with $\alpha = o(m^{1/6})$, then the probability of such a vector having two equal components
is asymptotic to $1 - e^{-\alpha^2/2}$. In other words, a relatively short vector has a positive
probability of having two equal components. For example, in a group of $40$ people there is an
$89\,\%$ chance that two persons will be born on the same day of the year (whence the name
“birthday paradox,” since such a conclusion appears to be implausible at first sight).
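The figure quoted for a group of $40$ people can be reproduced in a few lines. The sketch below computes the exact probability of a repeated component for $m = 365$ and $s = 40$, together with the asymptotic value $1 - e^{-\alpha^2/2}$ for $\alpha = s/\sqrt{m}$; the function name is illustrative.

```python
from math import exp, sqrt

def collision_probability(s, m):
    """Probability that a uniform random vector in {1..m}^s has two equal components."""
    p_distinct = 1.0
    for i in range(s):
        p_distinct *= (m - i) / m   # i-th component avoids the previous i values
    return 1 - p_distinct

s, m = 40, 365
exact = collision_probability(s, m)
alpha = s / sqrt(m)
asymptotic = 1 - exp(-alpha**2 / 2)

print(round(exact, 4), round(asymptotic, 4))
```

The exact value is about $0.891$, matching the $89\,\%$ in the text, and the asymptotic formula is already close for these small parameters.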
12.4.17. On the other hand, it has been pointed out by Granville that such a sequence $n_i$
is unlikely to have the randomness property needed to apply the birthday paradox with
confidence. In fact, the construction proposed before can be rephrased as follows:
Let $p_1, \dots, p_t$ be the elements of $\mathcal{P}$ (we assume $2 \notin \mathcal{P}$). We want to solve the congruence
$$p_1^{a_1} \cdots p_t^{a_t} \equiv p_1^{b_1} \cdots p_t^{b_t} \pmod{2^k}.$$
Let $G$ be the group $(\mathbb{Z}/2^k\mathbb{Z})^{\times}$ of units of the ring $\mathbb{Z}/2^k\mathbb{Z}$. Then, if $g_1, \dots, g_t$ are the
images of $\mathcal{P}$ in $G$, the above congruence becomes a relation
$$g_1^{a_1 - b_1} \cdots g_t^{a_t - b_t} = 1$$
in the group $G$. Now there are too many choices of pairs $(a_1, \dots, a_t)$, $(b_1, \dots, b_t)$ leading
to the same difference $(a_1 - b_1, \dots, a_t - b_t)$ and the argument on which the birthday
paradox was built collapses. A closer analysis of this argument only shows that the constant
$4 - \delta$ in the theorem of Stewart and Tijdeman should be replaced by a larger one.
Remark 12.4.18. Indeed, M. van Frankenhuysen [304], using a packing of spheres argument,
has improved the constant $4 - \delta$ in Theorem 12.4.6 unconditionally to $6.068$.
We start with Hall’s conjecture and its strong version. Then we recall some basic
facts about elliptic curves over Dedekind domains to introduce notions such
as reduction, minimal discriminant, and conductor. This enables us to formulate
Szpiro’s conjecture bounding the discriminant in terms of the conductor of an elliptic
curve over $\mathbb{Q}$. Finally, we prove the equivalence of the abc-conjecture, the
strong Hall conjecture, and the generalized Szpiro conjecture, via Frey’s famous
elliptic curve $y^2 = x(x + a)(x - b)$. At the end, we give the generalization of
Hall’s conjecture due to Hall–Lang–Waldschmidt–Szpiro.
12.5.1. In 1969, on the basis of what at that time was considered extensive numerical
evidence, Marshall Hall [144] conjectured that there is a positive constant $C$
such that $|x^3 - y^2| \ge C|x|^{1/2}$ for $x, y \in \mathbb{Z}$ with $x^3 - y^2 \neq 0$. The exponent $1/2$
in this statement cannot be improved upon, as shown a little later by L.V. Danilov,
who proved in [75] that $0 < |x^3 - y^2| < 0.97|x|^{1/2}$ has infinitely many solutions
in integers $x$, $y$.
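Small values of $|x^3 - y^2|$ relative to $|x|^{1/2}$ can be found by a direct search. The sketch below, for positive $x$ only, tests the two squares nearest to each cube and reports the pairs with $0 < |x^3 - y^2| < \kappa \sqrt{x}$; the threshold `ratio_bound` (playing the role of $\kappa$) and the search range are arbitrary illustrative choices.

```python
from math import isqrt

def small_hall_gaps(max_x, ratio_bound=1.0):
    """Return triples (x, y, x^3 - y^2) with 0 < |x^3 - y^2| < ratio_bound * sqrt(x)."""
    hits = []
    for x in range(2, max_x + 1):
        cube = x ** 3
        r = isqrt(cube)
        for y in (r, r + 1):                       # the two squares nearest to x^3
            diff = cube - y * y
            # Compare squares to stay in exact integer/float-free territory.
            if diff != 0 and diff * diff < ratio_bound ** 2 * x:
                hits.append((x, y, diff))
    return hits

hits = small_hall_gaps(6000)
print(hits)
```

Among the hits is $5234^3 - 378661^2 = -17$, where $17$ is well below $\sqrt{5234} \approx 72.3$.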
The Hall conjecture is unlikely to be true as originally formulated and nowadays
we refer to Hall’s conjecture as the slightly weaker statement in which the exponent
$\frac{1}{2}$ is replaced by $\frac{1}{2} - \varepsilon$ and $C$ by some $C(\varepsilon) > 0$, for every fixed $\varepsilon > 0$. An
equivalent formulation is that, for any solution of $y^2 = x^3 - z$ with $x, y, z \in \mathbb{Z}$
and $z \neq 0$ viewed as a parameter, we have $|x| \ll_{\varepsilon} |z|^{2+\varepsilon}$, $|y| \ll_{\varepsilon} |z|^{3+\varepsilon}$.
12.5.2. What is of interest to us here is a stronger form of the conjecture, the
strong Hall conjecture below, claiming a bound in terms of $\mathrm{rad}(z)$ rather than
$|z|$. However, we have to avoid the counterexample
$$(2p^4)^3 - (3p^6)^2 = -p^{12}.$$
Note that every solution of $z = x^3 - y^2 \neq 0$ with $\mathrm{GCD}(x^3, y^2)$ divisible by a
sixth power $g^6$ may be reduced to a solution $x' := x/g^2$, $y' := y/g^3$, $z' := z/g^6$
such that $\mathrm{GCD}(x'^3, y'^2)$ is free from sixth powers. We call $(x', y', z')$ a primitive
solution.
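Conjecture 12.5.3. Given $\varepsilon > 0$, every primitive solution of $x^3 - y^2 = z \neq 0$
satisfies
$$|x| \ll_{\varepsilon} \mathrm{rad}(z)^{2+\varepsilon}, \quad |y| \ll_{\varepsilon} \mathrm{rad}(z)^{3+\varepsilon}.$$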
12.5.4. Another conjecture concerns the discriminant and the conductor of an elliptic
curve. We recall first some basic facts (cf. [284] for more details).
Let $K$ be any field. By (8.2) on page 241, an elliptic curve over $K$ may be given
by an affine Weierstrass equation
$$y^2 + a_1 xy + a_3 y = x^3 + a_2 x^2 + a_4 x + a_6. \qquad (12.10)$$
Using the quantities
$$b_2 = a_1^2 + 4a_2, \quad b_4 = a_1 a_3 + 2a_4, \quad b_6 = a_3^2 + 4a_6$$
and
$$b_8 = a_1^2 a_6 + 4a_2 a_6 - a_1 a_3 a_4 + a_2 a_3^2 - a_4^2,$$
we define
$$c_4 = b_2^2 - 24 b_4, \quad c_6 = -b_2^3 + 36 b_2 b_4 - 216 b_6$$
and the discriminant
$$\Delta = -b_2^2 b_8 - 8 b_4^3 - 27 b_6^2 + 9 b_2 b_4 b_6.$$
If $\mathrm{char}(K) \neq 2$, then replacing $y$ by $\frac{1}{2}(y - a_1 x - a_3)$ leads to the Weierstrass
normal form
$$y^2 = 4x^3 + b_2 x^2 + 2 b_4 x + b_6.$$
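The quantities just defined are straightforward to compute. The sketch below evaluates them for integer coefficients and checks two standard relations, $4 b_8 = b_2 b_6 - b_4^2$ and $1728 \Delta = c_4^3 - c_6^2$, which are well known but not stated in this passage. The sample curve $y^2 = x(x + a)(x - b)$ with $a = 3$, $b = 5$ is an illustrative choice; the function name is hypothetical.

```python
def weierstrass_invariants(a1, a2, a3, a4, a6):
    """Return b2, b4, b6, b8, c4, c6 and the discriminant of (12.10)."""
    b2 = a1 * a1 + 4 * a2
    b4 = a1 * a3 + 2 * a4
    b6 = a3 * a3 + 4 * a6
    b8 = a1 * a1 * a6 + 4 * a2 * a6 - a1 * a3 * a4 + a2 * a3 * a3 - a4 * a4
    c4 = b2 * b2 - 24 * b4
    c6 = -b2 ** 3 + 36 * b2 * b4 - 216 * b6
    disc = -b2 * b2 * b8 - 8 * b4 ** 3 - 27 * b6 * b6 + 9 * b2 * b4 * b6
    return {"b2": b2, "b4": b4, "b6": b6, "b8": b8,
            "c4": c4, "c6": c6, "disc": disc}

# y^2 = x(x + a)(x - b) = x^3 + (a - b) x^2 - ab x with a = 3, b = 5.
a, b = 3, 5
inv = weierstrass_invariants(0, a - b, 0, -a * b, 0)

print(inv["disc"], 16 * (a * b * (a + b)) ** 2)
```

For this curve the discriminant equals $16 (ab(a + b))^2$, which is the link between the discriminant and the product $abc$ exploited later in the section.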
$$(x, y) = \left(x' - \frac{1}{12} b_2,\ y' - \frac{1}{2} a_1 x' + \frac{1}{24} a_1 b_2 - \frac{1}{2} a_3\right) \qquad (12.14)$$
transforms the Weierstrass equation
$$y^2 + a_1 xy + a_3 y = x^3 + a_2 x^2 + a_4 x + a_6$$
into
$$(y')^2 = (x')^3 - \frac{1}{48} c_4 x' - \frac{1}{864} c_6. \qquad (12.15)$$
The discriminant $\Delta'$ of (12.15) is $\Delta' = \Delta$, because the change of variables
(12.14) has determinant $1$.
The coefficients of (12.14) are a priori only in the ring $R\left[\frac{1}{2}, \frac{1}{3}\right]$, but if
$$\frac{1}{12} b_2 \in R, \quad \frac{1}{2} a_1 \in R, \quad \frac{1}{2} a_3 \in R, \qquad (12.16)$$
then (12.14) has coefficients in $R$ and (12.15) is again an $R$-Weierstrass equation
equivalent to the original one.
Now we claim that the conditions (12.16) already follow from $\frac{1}{48} c_4 \in R$ and
$\frac{1}{864} c_6 \in R$. Assuming this claim the proof of the proposition is immediate,
because if $\frac{1}{48} c_4 \in \pi^4 R$ and $\frac{1}{864} c_6 \in \pi^6 R$ then we can make the further change of
$$-\left(\frac{1}{3} b_2\right)^3 + 4 b_4 \left(\frac{1}{3} b_2\right) \in R,$$
yielding $\frac{1}{3} b_2 \in R$, as wanted.
If $\mathrm{char}(k(v)) = 2$, then $3$ is invertible in $R$ and the argument becomes rather
intricate. We do this in several steps. The assumption now is $\frac{1}{16} c_4 \in R$ and
$\frac{1}{32} c_6 \in R$, and we want to show that
$$\frac{1}{2} a_1 \in R, \quad \frac{1}{2} a_3 \in R, \quad \frac{1}{4} b_2 \in R.$$
Since $c_4 = b_2^2 - 24 b_4$ and $v(c_4) \ge 4v(2)$, we get $v(b_2^2) \ge 3v(2)$ and $v(b_2) \ge
\frac{3}{2} v(2)$. Now $b_2 = a_1^2 + 4a_2$ and we conclude that $v(a_1) \ge \frac{3}{4} v(2)$.
$$\ge \min\left(5v(2),\ \frac{9}{2} v(2),\ 2v(2) + \frac{3}{2} v(2)\right) = \frac{7}{2} v(2)$$
and $v(b_6) \ge \frac{1}{2} v(2)$ follows. Since $b_6 = a_3^2 + 4a_6$, we see that $v(a_3) \ge \frac{1}{4} v(2)$.
Therefore, $v(a_1 a_3) \ge \frac{3}{4} v(2) + \frac{1}{4} v(2) = v(2)$ and, recalling that $b_4 = a_1 a_3 + 2a_4$,
we infer that $v(b_4) \ge v(2)$. Now $c_4 = b_2^2 - 24 b_4$ gives
$$v(b_2^2) \ge \min(v(c_4), v(24 b_4)) \ge 4v(2)$$
and $v(b_2) \ge 2v(2)$. Hence $\frac{1}{4} b_2 \in R$, as wanted.
Now $b_2 = a_1^2 + 4a_2$ shows that $v(a_1) \ge v(2)$ and $\frac{1}{2} a_1 \in R$, as wanted.
Finally
$$v(216 b_6) = v(b_6) + 3v(2) \ge \min(v(c_6), v(b_2^3), v(36 b_2 b_4))$$
$$\ge \min(5v(2),\ 6v(2),\ 2v(2) + 2v(2) + v(2)) = 5v(2)$$
and we get $v(b_6) \ge 2v(2)$. Since $b_6 = a_3^2 + 4a_6$, we conclude that $v(a_3) \ge v(2)$
and $\frac{1}{2} a_3 \in R$, as wanted.
12.5. Equivalent conjectures 429
12.5.8. Now let R be a Dedekind domain with quotient field K . Then every
maximal ideal p induces a discrete valuation vp and hence a minimal discriminant
∆p of the elliptic curve E over K . The minimal discriminant of E is the ideal
$$\Delta := \prod_{p} p^{v_p(\Delta_p)},$$
where p ranges over all maximal ideals of R . Clearly, only finitely many prime
ideals give a non-trivial contribution and ∆ is well defined. In general, there is
no minimal Weierstrass equation with coefficients in R working for all primes.
However for R = Z , there is such a minimal Weierstrass equation working for
all primes (cf. [284], VIII.Cor.8.3) and we call it the global minimal Weierstrass
equation.
12.5.9. We do not give the precise definition of the conductor of an elliptic curve
over a number field K (the reader can find it in A. Ogg [230], T. Saito [256] or
J.H. Silverman [286], IV, §10). All we need to know here is that it is an ideal
$$\operatorname{cond}(E) = \prod_{p} p^{f_p},$$
where p ranges over the maximal ideals of OK and the exponent fp ∈ N has the
following properties (cf. [286], Th.IV.10.2, Cor.IV.11.2):
If p = 2 and 16 ∤ abc, then $2^{11} \nmid \Delta$ and the transformation formula for ∆ in
(12.13) on page 426 shows that (12.17) is a minimal Weierstrass equation for p =
2 and hence a global minimal Weierstrass equation.
This is not necessarily true if one of a, b, c is divisible by 16. Up to evident
changes of coordinates, there are two cases:
Since the residue characteristic is not 2, we see that the singular point (0, 0) is a
node and there is multiplicative reduction.
For p = 2, we assume first that the minimal Weierstrass equation has the form
(12.17) on page 429. Then ∆ ≡ 0 (mod 2) and Frey’s curve E always has bad
reduction at 2. The singular point (x0, y0) of E is determined by the unique
solution of $x_0^2 = ab$ (we are in characteristic 2). The transformation x = x′ + x0,
y = y′ + y0 leads to the Weierstrass equation
$$(y')^2 = (x')^3 + (x_0 + a - b)(x')^2$$
for E . As the residue characteristic is 2, the tangents in the singularity have the
same direction and hence it is a cusp, and there is additive reduction at 2.
Finally, assume that the minimal Weierstrass equation of E in p = 2 has the form
(12.18), hence a ≡ 1 (mod 4) and b ≡ 0 (mod 16) . Assuming also that E has
bad reduction at 2, ∆ ≡ 0 (mod 2) leads to b ≡ 0 (mod 32) . Then E is given
by the affine Weierstrass equation
$$y^2 + xy = x^3 + d x^2, \qquad d = \frac{a - b - 1}{4},$$
and it is clear that the singular point (0, 0) is a node. Hence we have multiplicative
reduction.
We conclude that the conductor of E has the form
$$\operatorname{cond}(E) = 2^{f_2} \cdot \prod_{p \mid abc,\; p \ne 2} p,$$
$$\max(|x|^3, |y|^2) \ll_\varepsilon \left(\Gamma \operatorname{rad}\!\left(\frac{x^3 y^2 z}{\Gamma^3}\right)\right)^{1+\varepsilon}. \tag{12.19}$$
We claim that
$$\Gamma \operatorname{rad}\!\left(\frac{x^3 y^2 z}{\Gamma^3}\right) \le |xy| \operatorname{rad}(z). \tag{12.20}$$
For the proof, it is enough to show that
$$v_p(\Gamma) + v_p\!\left(\operatorname{rad}\!\left(\frac{x^3 y^2 z}{\Gamma^3}\right)\right) \le v_p(x) + v_p(y) + v_p(\operatorname{rad}(z)) \tag{12.21}$$
holds for every prime p . This will be done by checking the following three cases:
First case: 3vp (x) < 2vp (y)
Then x3 − y 2 = z gives 3vp (x) = vp (Γ) = vp (z) and (12.21) reads as
3vp (x) + 1 ≤ vp (x) + vp (y) + χ(vp (x)), (12.22)
where χ(t) = 0 if t ≤ 0 and χ(t) = 1 if t > 0. Since Γ is free from sixth
powers, only the cases vp (x) = 0, 1 occur and (12.22) obviously holds.
Second case: 2vp (y) < 3vp (x)
Similarly as in the first case, we have 2vp (y) = vp (Γ) = vp (z) and (12.21) is
equivalent to
2vp (y) + 1 ≤ vp (x) + vp (y) + χ(vp (y)).
Checking vp (y) = 0, 1, 2, this is true.
Third case: 3vp (x) = 2vp (y)
Since Γ is free from sixth powers, we have vp (x) = vp (y) = 0 and (12.21) holds.
Now we deduce the strong Hall conjecture from (12.19) and (12.20). We conclude
that
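$$|x|^3 \ll_\varepsilon |xy|^{1+\varepsilon} \operatorname{rad}(z)^{1+\varepsilon}, \tag{12.23}$$
and
$$|y|^2 \ll_\varepsilon |xy|^{1+\varepsilon} \operatorname{rad}(z)^{1+\varepsilon}. \tag{12.24}$$
By combining (12.23) and (12.24), we get
$$|xy|^6 \ll_\varepsilon |xy|^{5(1+\varepsilon)} \operatorname{rad}(z)^{5(1+\varepsilon)}.$$
By passing to a sufficiently small ε, as we are allowed to do, we get
$$|xy| \ll_\varepsilon \operatorname{rad}(z)^{5+\varepsilon}.$$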
We have already remarked in 12.5.5 and 12.5.9 that if p ≠ 2, 3 and p divides both
c4 and c6 then E has additive reduction at p and fp ≥ 2. Hence 12.5.9 leads to
rad(∆) rad(Γ) ≤ 6 cond(E),
proving what we want.
(c) ⇒ (a): Let a, b, c ∈ Z be coprime with a + b = c and let us consider the
Frey curve E from Example 12.5.10 with affine equation
y 2 = x(x + a)(x − b).
First, we assume that 16 ∤ abc. Then (12.17) on page 429 is the global minimal
Weierstrass equation for E and the generalized Szpiro conjecture leads to
$$\max\left\{16^3 |a^2 + ab + b^2|^3,\; 16 |abc|^2\right\} \ll_\varepsilon \operatorname{cond}(E)^{6+\varepsilon}. \tag{12.26}$$
$$\max\left\{|a^2 + ab + b^2|^3,\; \left(\frac{abc}{16}\right)^2\right\} \ll_\varepsilon \operatorname{cond}(E)^{6+\varepsilon}.$$
divide the identity by $3^3$ to get coprime x and y. Then Conjecture 12.5.14 gives
$$a^2 + ab + b^2 \ll_\varepsilon \operatorname{rad}(abc)^{2+\varepsilon}$$
proving immediately the strong abc-conjecture for c = a + b.
Remark 12.5.16. The conjecture implies that the generalized Fermat equation
$Ax^m + By^n = Cz^p$ has only finitely many solutions in coprime integers x,
y, z provided ABC ≠ 0 and $\tfrac1m + \tfrac1n + \tfrac1p < 1$. In the next section, we will
show that this is actually a consequence of Faltings’s theorem. It has been also
conjectured that for fixed A , B , C there will be no non-trivial solutions (namely,
satisfying a certain necessary coprimality condition) for sufficiently large m , n ,
and p , although this remains open.
If A = B = C = 1, we quickly find the solutions
$$1 + 2^3 = 3^2, \quad 2^5 + 7^2 = 3^4, \quad 7^3 + 13^2 = 2^9, \quad 2^7 + 17^3 = 71^2, \quad 3^5 + 11^4 = 122^2.$$
However, Beukers did a more thorough computer search and found the amazing
examples
$$17^7 + 76271^3 = 21063928^2, \quad 1414^3 + 2213459^2 = 65^7,$$
$$43^8 + 96222^3 = 30042907^2, \quad 33^8 + 1549034^2 = 15613^3$$
to which Zagier added
$$9262^3 + 15312283^2 = 113^7.$$
These five large solutions look unusual from the point of view of diophantine equa-
tions. Their respective abc-ratios log c/ log rad(abc) are approximately
1.14125, 1.12221, 1.06109, 1.05700, 1.08836.
These values, fairly close to 1, are consistent with the strong abc-conjecture pre-
dicting that 1 is the largest accumulation point of abc-ratios.
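The identities above and their abc-ratios can be recomputed directly. The sketch below is illustrative only: the names `SOLUTIONS`, `radical` and `abc_ratio` are hypothetical, the exponent 1 attached to the base 1 in the first solution is an arbitrary choice, and the radical is computed by naive trial division, which is adequate for numbers of this size but not a general factoring method.

```python
import math

# Each tuple (x, m, y, n, z, p) encodes x^m + y^n = z^p with A = B = C = 1.
SOLUTIONS = [
    (1, 1, 2, 3, 3, 2),
    (2, 5, 7, 2, 3, 4),
    (7, 3, 13, 2, 2, 9),
    (2, 7, 17, 3, 71, 2),
    (3, 5, 11, 4, 122, 2),
    (17, 7, 76271, 3, 21063928, 2),
    (1414, 3, 2213459, 2, 65, 7),
    (43, 8, 96222, 3, 30042907, 2),
    (33, 8, 1549034, 2, 15613, 3),
    (9262, 3, 15312283, 2, 113, 7),
]

def radical(n):
    """Product of the distinct primes dividing n (trial division)."""
    n = abs(n)
    rad, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            rad *= d
            while n % d == 0:
                n //= d
        d += 1
    if n > 1:
        rad *= n
    return rad

def abc_ratio(x, m, y, n, z, p):
    """log c / log rad(abc) for a = x^m, b = y^n, c = z^p."""
    c = z ** p
    # rad(abc) equals rad(xyz), since powers do not change the set of primes.
    return math.log(c) / math.log(radical(x * y * z))

for sol in SOLUTIONS:
    x, m, y, n, z, p = sol
    assert x ** m + y ** n == z ** p
    print(f"{x}^{m} + {y}^{n} = {z}^{p}, ratio {abc_ratio(*sol):.5f}")
```

Python integers have arbitrary precision, so the equality checks are exact rather than floating-point approximations.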
examples with infinitely many coprime solutions by essentially doubling the points (see
[78], §6).
In the general case, the argument is more involved because the equation no longer describes
a projective curve. The idea of Darmon and Granville is to consider a finite Galois covering
π : C → P1 , which is unramified outside π −1 {0, 1, ∞} and with ramification indices
p, q, r over 0, 1, ∞ . The construction of π will use the topological and analytical means
from Section 12.3 and Belyı̆’s theorem will show that π is defined over a number field K .
By analysing the ramification of K(P )/K(β) for β = Axp /(Cz r ) and P ∈ π −1 (β) ,
we can deduce from Hermite’s theorem that the fibre points over β are rational over a fixed
number field. This may be seen as an extension of the Chevalley–Weil theorem to modestly
ramified coverings. By Hurwitz’s theorem, the genus of C will turn out to be ≥ 2 and
hence Faltings’s theorem will lead to the finiteness of the number of coprime solutions.
This completes the discussion in Remark 12.5.16.
In this section, we assume that the reader is familiar with Section 12.3.
12.6.2. Our goal is to construct a ramified covering π : Can → P1an for a connected
Riemann surface Can which is unramified outside π −1 {0, 1, ∞} . By 12.3.4 and 12.3.8,
such a covering corresponds to the subgroup H = π1 (Can \ π −1 {0, 1, ∞}, y0 ) of G =
π1 (P1an \ {0, 1, ∞}, x0 ) , where the base point y0 lies over x0 . We would like to have a
Galois covering, which means that H is a normal subgroup. It follows easily from 12.3.5
and 12.3.8 that the covering group G/H acts also transitively on the ramified fibres of π .
For x ∈ Can , we denote by ex/π(x) the ramification index of the corresponding valua-
tions which is equal to the multiplicity of the fibre divisor π ∗ ([π(x)]) in x (see Example
1.4.12 for an algebraic discussion). The ramification index should not be confused with the
multiplicity of the ramification divisor at x which is ex/π(x) − 1 .
Lemma 12.6.3. Let π be a covering as in 12.6.2. If π is finite, then ex/π(x) is equal to
the cardinality of the stabilizer (G/H)x of x .
Proof: The argument may be done complex analytically or complex algebraically by A.14.7.
Since the covering group G/H acts transitively on the fibre π −1 (y) for any y ∈ P1an , the
ramification index ex/π(x) depends only on π(x) . By 12.3.6, the degree of π is equal to
[G : H] . By Example 1.4.12, we have deg(π) = mex/π(x), where m is the cardinality of
the set π −1 (y) . Since m = [G : H]/|(G/H)x | , we get the claim.
12.6.4. In hyperbolic geometry, we easily show that the disc D = {|z| < 1} contains a
geodesic triangle with angles π/p, π/q, π/r. This means that the sides are contained in circles
perpendicular to {|z| = 1} and the sum of angles is π(1/p + 1/q + 1/r) < π leading to hyperbolicity.
Moreover, reflecting the triangles successively at their sides, we get a tessellation
of D. By the Riemann mapping theorem ([8], Ch.6, Th.s 1-4), the interior of the original
geodesic triangle may be mapped biholomorphically onto the upper half plane {Im(w) > 0}
and the boundary is mapped to the boundary. Moreover, by Schwarz’s reflection principle,
we get an extension to a holomorphic map Ω : D → C . The multivalued inverse may be
described explicitly by a quotient of hypergeometric functions. For details, we refer to C.
Carathéodory [57], Vol.II, Part 7, Ch.III.
By the description above, it is clear that Ω is ramified only over points over 0, 1, ∞ with
ramification indices p , q , and r , respectively. Let P0 , P1 , P∞ be the vertices of the origi-
nal geodesic triangle ∆ lying over 0, 1, ∞ . Let τ∞ be the hyperbolic reflection at the side
through P0 and P1 . Similarly, we denote the reflections at the other sides of ∆ by τ0 , τ1
and we set γ0 := τ1 ◦ τ∞, γ1 := τ∞ ◦ τ0, γ∞ := τ0 ◦ τ1. These are hyperbolic rotations
around the vertices P0, P1, P∞ with angles 2π/p, 2π/q, 2π/r. So we have the relations
$$(\gamma_0)^p = (\gamma_1)^q = (\gamma_\infty)^r = \gamma_0 \gamma_1 \gamma_\infty = 1 \tag{12.27}$$
in the subgroup Γ of the automorphism group of D generated by γ0, γ1, γ∞. By construction,
it is easy to see that Γ is the covering group of Ω. We call Γ a triangle group.
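The last relation in (12.27) can be spelled out as a short computation. This is a sketch using only the fact that each reflection τ0, τ1, τ∞ is an involution; the order relations follow separately from the rotation angles 2π/p, 2π/q, 2π/r.

```latex
\begin{align*}
\gamma_0 \gamma_1 \gamma_\infty
  &= (\tau_1 \circ \tau_\infty) \circ (\tau_\infty \circ \tau_0) \circ (\tau_0 \circ \tau_1) \\
  &= \tau_1 \circ (\tau_\infty \circ \tau_\infty) \circ (\tau_0 \circ \tau_0) \circ \tau_1
     && \text{(associativity)} \\
  &= \tau_1 \circ \tau_1
     && (\tau_\infty^2 = \tau_0^2 = 1) \\
  &= 1 .
\end{align*}
```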
12.6.5. We can show that Γ is the free group on the set {γ0, γ1, γ∞} modulo the relations
(12.27) (see S. Katok [159], Sec.4.3). It follows from 12.3.4 and 12.3.5 that a covering
π : Can → P1an as in 12.6.2 has ramification indices over 0, 1, ∞ dividing p , q , and r ,
respectively, if and only if π factors through Ω . We will use this only as a motivation to
consider suitable normal subgroups of finite index in Γ to get our desired finite covering
π of ramification indices p , q , and r . This will be provided by Fox’s theorem solving a
conjecture of Fenchel.
Theorem 12.6.6. The triangle group Γ has a torsion-free normal subgroup H of finite
index.
For the quite elementary algebraic proof, we refer to the original article R.H. Fox [121].
Corollary 12.6.7. There is a compact connected Riemann surface Can and a finite Galois
covering π : Can → P1an which is unramified outside of π −1 {0, 1, ∞} and which has
ramification indices p, q, r at the points over 0 , 1 , and ∞ , respectively.
Proof: By 12.3.4 and 12.3.5, the normal subgroup H from Fox’s theorem lifts to a normal
subgroup of π1 (P1an \{0, 1, ∞}, x0 ) leading to a ramified Galois covering π : Can → P1an .
From 12.3.6, we deduce that π is finite and deg(π) = [Γ : H] . By construction, π
is unramified outside of π −1 {0, 1, ∞} . The geometric description of Γ given in 12.6.4
shows that Γ contains an element γ0 of order p . Since H is torsion free, we conclude
that Γ/H contains also an element γ0 of order p . By 12.3.4 and 12.3.8, there is a ramified
covering ϕ : D → Can such that Ω = π ◦ϕ . By 12.3.5, it is clear that Γ/H is the covering
group of π . By construction, γ0 is in the stabilizer of ϕ(P0 ) .
Thus Lemma 12.6.3 yields that p divides the ramification index eϕ(P 0 )/0 . On the other
hand, we have eϕ(P 0 )/0 |eP 0 /0 = p , hence eϕ(P 0 )/0 = p . We argue at points over 1 and
∞ much in the same way, proving the claim.
12.6.8. By Belyı̆’s theorem and its proof (see Theorem 12.3.11), there is a number field K
and a morphism π : C → P1K from an irreducible smooth projective curve C over K such
that the ramified covering from Corollary 12.6.7 is induced by base change. We are free to
enlarge K , hence we may assume that all fibre points over 0, 1, ∞ are K -rational. Since
we are in characteristic 0 , local parameters do not change under field extension (use A.4.6)
and hence the ramification points of C lie over 0, 1, ∞ with ramification indices p , q , and
r , respectively.
Proposition 12.6.9. The genus of C satisfies g(C) ≥ 2 .
Proof: The genus is invariant under base extension (use A.10.28), so we may work over the
complex numbers. Since the covering is Galois, the covering group operates transitively on
the fibres (see 12.6.2) and the ramification indices are the same in a given fibre. By Example
1.4.12, the number of fibre points over 0, 1, ∞ is equal to d/p, d/q, and d/r, respectively, where
d is the degree of the covering morphism π. Then Hurwitz’s theorem (see Theorem B.4.6)
yields
$$2 - 2g(C) = 2d - \frac{d}{p}(p - 1) - \frac{d}{q}(q - 1) - \frac{d}{r}(r - 1) = d\left(\frac1p + \frac1q + \frac1r - 1\right) < 0$$
proving the claim.
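The Hurwitz computation in this proof is easy to evaluate for concrete data. The following sketch assumes a Galois covering of degree d with ramification indices p, q, r over 0, 1, ∞ exists; the function name and the sample values of (d, p, q, r) are illustrative choices, not taken from the text.

```python
from fractions import Fraction

def genus_of_galois_cover(d, p, q, r):
    """Genus g from 2 - 2g = d * (1/p + 1/q + 1/r - 1).

    Assumes a Galois covering of P^1 of degree d, ramified only over
    0, 1, infinity with indices p, q, r; each index must divide d.
    """
    for e in (p, q, r):
        if d % e != 0:
            raise ValueError("each ramification index must divide the degree")
    euler = d * (Fraction(1, p) + Fraction(1, q) + Fraction(1, r) - 1)
    genus = (2 - euler) / 2
    if genus.denominator != 1:
        raise ValueError("no covering with these data: genus is not an integer")
    return int(genus)

print(genus_of_galois_cover(168, 2, 3, 7))  # degree 168, indices (2, 3, 7)
print(genus_of_galois_cover(16, 4, 4, 4))   # degree 16, indices (4, 4, 4)
```

Whenever 1/p + 1/q + 1/r < 1, the right-hand side of the Hurwitz formula is negative, so the returned genus is at least 2, as the proposition states.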
12.6.10. Now we use the language of schemes to construct an irreducible projective scheme
C over OK with generic fibre C = CK . This can be done by using a set of homogeneous
equations with coefficients in OK describing C in projective space PnK . This leads to a
closed subscheme of PnO K and we choose C to be the irreducible component containing
C.
The morphism π does not necessarily extend to a morphism C → P1O K , its domain is
an open subset containing the generic fibre. Hence there is a closed finite subset S of
Spec(OK ) such that π extends to a morphism π : CS → P1O S , K over OS,K , where OS,K
is the ring of S -integers in OK and CS is the part of C lying over Spec(OS,K ) . Since
the set of smooth points of CS is open and contains C (use [137], Th.17.5.1), the same
argument shows that we may assume, after enlarging S , that CS is a smooth scheme over
OS,K . This means that C has good reduction over the non-archimedean places outside of
S . We may also assume that no place outside of S divides pqr .
The set of unramified points of CS with respect to π is open (see B.3.2) and does not contain
the points over 0, 1, ∞ of the generic fibre C . Hence the ramified points are contained in
the closure of π −1 {0, 1, ∞} in CS and in finitely many fibres of C over closed points of
Spec(OS,K ) . By enlarging S again, we may exclude the latter.
For two points in C , their closures in C intersect at most in finitely many points lying over
the closed points of Spec(OK ) . Again, we may assume that the closures in CS of different
fibre points over 0 are disjoint and that the same holds for the fibres over 1 and ∞ .
12.6.11. A point P ∈ C may be viewed as an OK (P ) -integral point of C by the valuative
criterion of properness ([148], Th.II.4.7). This means that there is a unique morphism P :
Spec(OK (P ) ) → C mapping the generic point {0} to P . For a non-archimedean place w
of K(P ) with maximal ideal mw ∈ Spec(OK (P ) ) , let P (w) := P (mw ) . We suppose that
the restriction v of w to K is not contained in S and that P (w) is a k(v) -rational point of
the reduction Ck(v) . Since C has good reduction in v , the reduction is a smooth curve over
k(v) and hence we have a local parameter $\tilde\zeta$ in the discrete valuation ring $O_{C_{k(v)}, P(w)}$.
Let ζ be a lift to OC,P (w) .
Lemma 12.6.12. Under the assumptions above and if we identify the completion Kv with
a subfield of the completion K(P )w , then K(P )w = Kv (ζ(P )) .
Proof: We need to show that K(ζ(P )) is dense in K(P ) with respect to w . Let πv be a
local parameter in the valuation ring Rv of K , let πw be similarly defined for w , and let
N ∈ N.
First step: For every a ∈ OC,P (w) , there is p(x) ∈ Rv [x] with
a − p(ζ) ∈ [πv , ζ]N OC,P (w) .
of divisors on CR v . This follows from surjectivity of Ck(v) onto P1k(v) , so only horizontal
components occur in π ∗ (0) and hence the multiplicities may be computed in the generic
fibre C. For the reduction $\tilde\pi$ of π modulo v, we have the identity
$$\tilde\pi^*[0(v)] = \sum_{\tilde R \in \tilde\pi^{-1}(0(v))} e_{\tilde R/0(v)}\, [\tilde R] \tag{12.29}$$
of divisors on $C_{k(v)}$. Pulling (12.28) back to $C_{k(v)}$ and comparing with (12.29), we conclude
that reduction modulo v maps $\pi^{-1}(0)$ onto $\tilde\pi^{-1}(0(v))$. By our assumptions in
12.6.10, this reduction is also one-to-one and hence bijective. Moreover, the above comparison
shows that the ramification index $e_{\tilde R/0(v)}$ is equal to p. By our assumptions in 12.6.8,
all fibre points over 0 are K -rational and hence all fibre points over 0(v) are k(v) -rational.
Now we consider P ∈ π −1 (Q) and a place w of K(P ) over v . Since Q(v) = 0(v) , the
above shows the k(v)-rationality of $P(w) \in \tilde\pi^{-1}(0(v))$. There is a unique $R \in \pi^{-1}(0)$
with R(v) = P(w). Let ζ be a local equation of the prime divisor R in R(v). Hence
$\zeta \in O_{C,P(w)}$ with reduction $\tilde\zeta$ equal to a local parameter of P(w) inside the smooth curve
Ck(v) . Let x be the affine coordinate on P1K , then x is a local equation for 0 and therefore
(12.28) shows
x = uζ p (12.30)
for a unit u ∈ OC,P (w) .
12.6. The generalized Fermat equation 441
Let L be the number field obtained from K by adjoining all p th roots of u(R) . Since
by assumption the characteristic of k(v) does not divide p , the extension L/K is unram-
ified over v (Lemma B.2.6). By Proposition B.2.3, it is enough to show that L(P )/L is
unramified over v . Replacing K by L , we may assume that K contains all p th roots of
u(R) .
We claim that $K_v(\beta^{1/p}) = K(P)_w$. Replacing u by u/u(R) and ζ by $u(R)^{1/p}\zeta$,
we may assume u(R) = 1. For the reduction $\tilde u$ of u modulo v, we have
$\tilde u(P(w)) = \tilde u(R(v)) = 1 \in k(v)$ and hence $|u(P) - 1|_w < 1$. If u has a p-th root, then our claim
follows directly from (12.30) and Lemma 12.6.12. However, in general the p-th root of the
unit u exists only as a v-adic analytic function and we need to modify the proof slightly.
First we note the formal identity
$$u^{1/p} = \sum_{n=0}^{\infty} \binom{1/p}{n} (u - 1)^n, \tag{12.31}$$
which becomes an equation in the w -adic topology by evaluating it at u = u(P ) (see proof
of Lemma 12.6.13). The first step in the proof of Lemma 12.6.12 shows that u − 1 may
be expressed as a formal power series in ζ without the constant term and with coefficients
in the valuation ring R̂v of the completion Kv . Inserting this in the right-hand side of
(12.31), and using (12.30), we get a formal identity
$$x^{1/p} = \zeta + \sum_{n=2}^{\infty} a_n \zeta^n \tag{12.32}$$
for unique coefficients bn ∈ R̂v . This follows from putting (12.33) into (12.32) and com-
parison of coefficients. Inserting P in (12.32) and (12.33), completeness yields Kv (β 1/p ) =
Kv (ζ(P )) and hence Lemma 12.6.12 implies Kv (β 1/p ) = K(P )w .
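The formal identity (12.31) can be tested on truncated power series. The sketch below is an illustration with hypothetical helper names: it computes the generalized binomial coefficients of exponent 1/p exactly and checks that the p-th power of the truncated series in t = u − 1 is 1 + t up to the truncation order.

```python
from fractions import Fraction

def binomial_series(p, n_terms):
    """Coefficients binom(1/p, n) for n = 0, ..., n_terms - 1."""
    alpha = Fraction(1, p)
    coeffs = [Fraction(1)]
    for n in range(1, n_terms):
        # binom(alpha, n) = binom(alpha, n - 1) * (alpha - n + 1) / n
        coeffs.append(coeffs[-1] * (alpha - (n - 1)) / n)
    return coeffs

def multiply_truncated(a, b, n_terms):
    """Product of two power series, keeping terms of degree < n_terms."""
    out = [Fraction(0)] * n_terms
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < n_terms:
                out[i + j] += ai * bj
    return out

def power_truncated(series, p, n_terms):
    """p-th power of a power series, truncated at degree n_terms."""
    result = [Fraction(1)] + [Fraction(0)] * (n_terms - 1)
    for _ in range(p):
        result = multiply_truncated(result, series, n_terms)
    return result

series = binomial_series(3, 8)
print(series[:4])                     # first coefficients of (1 + t)^(1/3)
print(power_truncated(series, 3, 8))  # coefficients of 1 + t, then zeros
```

The denominators that appear involve only the prime factors of p, which is consistent with evaluating the series v-adically when the residue characteristic does not divide p.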
By assumption the residue characteristic of v does not divide p and p|v(β) , hence Lemma
B.2.6 yields that K(P )w /Kv is unramified over v . Since residue fields do not change by
passing to completions (see Proposition 1.2.11), we have proved that K(P )/K is unrami-
fied over w .
Now we are ready to prove the theorem of Darmon and Granville.
Theorem 12.6.15. Let p, q, r ∈ N \ {0} with $\tfrac1p + \tfrac1q + \tfrac1r < 1$ and let A, B, C ∈
Z \ {0}. Then there are only finitely many solutions (x, y, z) ∈ Z³ of the generalized
Fermat equation
$$Ax^p + By^q = Cz^r$$
with GCD(x, y, z) = 1.
Proof: Let π : C → P1K be the covering from 12.6.8 and let S be the finite set of non-
archimedean places on K constructed in 12.6.10. By enlarging S , we may assume that it
contains all places dividing ABC . Let (x, y, z) be a coprime solution of the generalized
Fermat equation. We may assume xyz ≠ 0 excluding at most finitely many solutions. For
$$\beta := \frac{Ax^p}{Cz^r},$$
Szpiro’s conjecture, originally formulated only for the discriminant and with an
undetermined exponent, was presented at an exposition in Hannover in 1983. Influenced
by this conjecture and the theorem of Stothers–Mason, the abc-conjecture
came out from a discussion between Masser and Oesterlé in 1985. The original
purpose of both conjectures was to give new insights into Fermat’s last theorem.
N. Elkies [98] proved that the abc-conjecture implies Mordell’s conjecture, effec-
tively if an effective version of the abc-conjecture is available. He also mentioned
that his argument using Belyı̆’s lemma proves Vojta’s height inequality and hence
Roth’s theorem as a special case. This aspect of the abc-conjecture over arbitrary
number fields will be touched upon in Section 14.4. Proofs of Theorem 12.2.9
have also been given by Langevin in [177] and Oesterlé (in a seminar).
The implication (b) ⇒ (a) in Belyı̆’s theorem was well known to the experts; the
proof uses a specialization argument due to Weil. The input of Grothendieck
to Theorem 12.3.12 was the generalization of Riemann’s existence theorem. In
1979, G.V. Belyı̆ [21] proved the converse implication in Theorem 12.3.11 and
caused Grothendieck to remark that such a deep result was never proved in such
a simple and short way (see A. Grothendieck [138], where the study of dessins
d’enfant was initiated).
12.7. Bibliographical notes 443
In 1981 W.W. Stothers [293] obtained the abc-theorem for polynomials; our first
proof follows essentially his arguments with Hurwitz’s genus formula. Indepen-
dently, Mason was inspired by an analogue of Baker’s theory of linear forms of log-
arithms for function fields and found, in 1983, the generalization given in 12.4.3,
which is a major tool in his studies of diophantine equations over function fields
(R.C. Mason [192], [193]).
The idea of the second proof appears, in a special case, in a Comptes Rendus
note of A. Korkine in 1880 (see [89], Vol.II, p.750). The general abc-theorem for
polynomials is a special case of a result of D. Brownawell and D. Masser [52],
where also the first example in Remark 12.4.5 appears and, independently, of J.F.
Voloch [319]. Granville’s remark at the end of Section 12.4 is unpublished, so it
remains to the reader to fill in the necessary details.
The equivalence of the various conjectures in Section 12.5 is given by Vojta [307],
Ch.5, App.ABC. Note however that his version of the Hall–Lang–Waldschmidt–
Szpiro conjecture is not true as stated, with the counterexample given in 12.5.2. If
we assume x, y coprime as in Conjecture 12.5.14, then his proof of “Hall–Lang–
Waldschmidt–Szpiro conj. ⇒ generalized Szpiro conj.” is incomplete because c4
and c6 may have common divisors for a minimal Weierstrass equation.
In J. Oesterlé [229], a sketch of proof for “abc ⇒ generalized Szpiro conj.” is
given after ideas of Hindry, analysing the reduction of the elliptic curve at every
place. This is similar to our proof of Theorem 12.5.12 (a) ⇒ (b) based on the
minimality criterion in Corollary 12.5.7, for which we could not find a complete
reference in the literature. The implication “Hall–Lang–Waldschmidt–Szpiro conj.
⇒ abc” is from A. Nitaj [222], where the reader will also find further applications
of the abc-conjecture.
In Section 12.6, we follow the presentation of Darmon and Granville [78]. Beck-
mann’s proof of the crucial Lemma 12.6.14 uses Galois representations. We ex-
pand here the sketch of proof given in [78]. Darmon [76] has also an interpretation
of the lemma as a Chevalley–Weil theorem for orbifolds. For explicit examples
and a similar treatment for the superelliptic equation, we refer to [78].
13 NEVANLINNA THEORY
13.1. Introduction
In 1987 Vojta formulated a sweeping set of precise conjectures about the structure
of the set of rational points on algebraic varieties. The rationale about these con-
jectures was a rather precise analogy between the Nevanlinna theory of the distri-
bution of values of meromorphic functions and diophantine approximation. In this
way, Vojta motivated, clarified, and unified results and conjectures in diophantine
approximation and diophantine equations. The analogy between Nevanlinna the-
ory and diophantine approximation had also been noticed earlier by Ch. Osgood,
in a somewhat different setting.
Here we discuss the Nevanlinna theory, while Vojta’s conjectures, their connexion
with the abc-conjecture studied in the preceding chapter, and their parallelism with
Nevanlinna theory, will be dealt with in the next final chapter.
Section 13.2 is a brief introduction to the classical Nevanlinna theory in one vari-
able, presenting the main results together with some examples. Next, Section
13.3 presents the Ahlfors–Shimizu elegant formulation of the theory, including
Ahlfors’s proof of the second main theorem. Holomorphic curves are the content
of Section 13.4 dealing with geometric aspects of Nevanlinna theory, culminating
with the conjectural second main theorem of Griffiths.
This chapter may be read independently of the other parts of the book.
Nevanlinna theory in one variable in its simplest form describes the value distri-
bution theory of a non-constant meromorphic function f : C → P1an . First, we
prove Jensen’s formula, which is the basic tool. As a consequence, we will ob-
tain Nevanlinna’s first main theorem. Then we state without proof the lemma on
the logarithmic derivative and the second main theorem of Nevanlinna. Next we
present the notion of defect, deduce the basic defect inequality and Picard’s lit-
tle theorem as a corollary. The section ends by showing that the analogue of the
abc-conjecture holds for meromorphic functions on C .
13.2. Nevanlinna theory 445
Throughout this chapter, we shall suppose that the meromorphic function f is not
a constant, unless specified otherwise. Proofs, when given, will be written in a
concise form. Working them out in full detail is left as a useful exercise for the
reader. We follow here the standard classical notation except that the ramification
function and the counting function truncated at 1 are denoted by Nram and N (1)
instead of N1 and N , which we find in the classical literature.
The usual way to study the distribution of values of a meromorphic function f (z)
is to consider the number of solutions, counted with multiplicity, of the equation
f (z) = a in a disk {|z| < r}, as r varies. Recall that ordz (f ) denotes the order
of f at z ∈ C . The main tool at our disposal is the Poisson–Jensen formula:
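Proposition 13.2.1. Let f be meromorphic in the closed disk |z| ≤ R and assume
that f(z) ≠ 0, ∞. Then for |z| < R it holds
$$\log|f(z)| = -\sum_{|a|<R,\ a \ne z} \operatorname{ord}_a(f) \log\left|\frac{R^2 - \bar a z}{R(z - a)}\right| + \frac{1}{2\pi} \int_0^{2\pi} \log\left|f(Re^{i\theta})\right| \cdot \Re\left(\frac{Re^{i\theta} + z}{Re^{i\theta} - z}\right) d\theta. \tag{13.1}$$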
The case in which there are no zeros or poles is called Poisson’s formula and the
case z = 0 is called Jensen’s formula.
Proof: We give only a quick sketch of proof. Consider first the special case in
which f (z) has no zeros or poles. Then u(z) := log |f (z)| is harmonic in an
open neighborhood of the disk {|z| ≤ R}, hence u(0) is the average of u(z)
on the boundary {|z| = R}. The Poisson formula expressing u(z) in terms of
its boundary values reduces to this special case by composition with a Möbius
transformation mapping the disk conformally onto itself and sending the origin to
the point z .
In the general case, the Lebesgue dominated convergence theorem applied to the
functions f(rz) with r → 1 implies that we may assume f without zeros and
poles on the boundary. Finally, the claim is easily obtained by multiplying f(z)
by the finite Blaschke product $\prod \{R(z - a)/(R^2 - \bar a z)\}^{-\operatorname{ord}_a(f)}$ for |a| < R
and applying the formula to the new function (note that the product has absolute
value 1 on |z| = R). For more details, we refer to W. Hayman [149], p.1 and [8],
p.208.
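Jensen's formula, the case z = 0 of the Poisson–Jensen formula, can be checked numerically. The following is a rough sketch under stated assumptions: the rational function `example_f` and its table of orders are made-up test data, and the boundary integral is approximated by an equally spaced Riemann sum on the circle.

```python
import cmath
import math

def circle_mean_log_abs(f, R, samples=4096):
    """Approximate (1/2pi) * integral of log|f(R e^{i theta})| d theta."""
    total = 0.0
    for k in range(samples):
        theta = 2.0 * math.pi * k / samples
        total += math.log(abs(f(R * cmath.exp(1j * theta))))
    return total / samples

def jensen_right_hand_side(f, orders, R):
    """Right-hand side of the Poisson-Jensen formula at z = 0.

    `orders` maps each zero or pole a with 0 < |a| < R to ord_a(f)
    (positive for zeros, negative for poles).
    """
    correction = sum(m * math.log(R / abs(a)) for a, m in orders.items())
    return circle_mean_log_abs(f, R) - correction

def example_f(z):
    # Hypothetical test function: two simple zeros and one simple pole.
    return (z - 0.5) * (z - (0.3 + 0.4j)) / (z - 0.2j)

EXAMPLE_ORDERS = {0.5: 1, 0.3 + 0.4j: 1, 0.2j: -1}

lhs = math.log(abs(example_f(0)))
rhs = jensen_right_hand_side(example_f, EXAMPLE_ORDERS, R=1.0)
print(lhs, rhs)
```

Both printed values should agree to within the quadrature error, which is very small here because the integrand is smooth and periodic.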
The special case of the Poisson–Jensen formula in which z = 0 is particularly
important to us and we proceed to rewrite it as follows. First, we introduce some
notation. For a real-valued function F (r), r > 0, quantitative estimates such as
F (r) = O(log r) are always meant with respect to r → ∞ . Also we define
$$F^+(r) := \max\{F(r), 0\}, \qquad F^-(r) := -\min\{F(r), 0\},$$
the number of solutions of f (z) = a in the disk |z| < r counted with their
multiplicity. For a = ∞, we replace f by 1/f to get
$$n(r, \infty, f) := \sum_{|z|<r} \operatorname{ord}_z^-(f),$$
the number of poles of f (z) in the disk |z| < r counted with their multiplicity.
13.2.3. This function is too irregularly behaved and a logarithmic average
$$\int_0^r n(t, a, f)\, \frac{dt}{t}$$
is a much better function which also arises in other contexts. However, care must
be taken if f (z) − a vanishes at the origin, because then the integral diverges at
r = 0.
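Definition 13.2.4. For a ∈ C and r > 0, the counting function is defined by
$$\begin{aligned} N(r, a, f) &:= \int_0^r \frac{n(t, a, f) - \operatorname{ord}_0^+(f - a)}{t}\, dt + \operatorname{ord}_0^+(f - a) \log r \\ &= \operatorname{ord}_0^+(f - a) \log r + \sum_{0<|z|<r} \operatorname{ord}_z^+(f - a) \log\left|\frac{r}{z}\right|. \end{aligned}$$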
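The equality of the two expressions in Definition 13.2.4 can be illustrated on a finite list of solutions. In the sketch below, which uses hypothetical names and made-up data, a solution of f(z) = a is recorded as a pair (modulus, multiplicity); the integral is evaluated exactly piece by piece, since n(t, a, f) is a step function.

```python
import math

def counting_sum(r, solutions):
    """N(r, a, f) from the sum formula; solutions = [(modulus, multiplicity)]."""
    at_origin = sum(m for rho, m in solutions if rho == 0)
    return at_origin * math.log(r) + sum(
        m * math.log(r / rho) for rho, m in solutions if 0 < rho < r
    )

def counting_integral(r, solutions):
    """N(r, a, f) from the integral formula, integrating the step function."""
    at_origin = sum(m for rho, m in solutions if rho == 0)
    points = sorted({rho for rho, _ in solutions if 0 < rho < r}) + [r]
    total = at_origin * math.log(r)
    for lo, hi in zip(points, points[1:]):
        # For lo < t <= hi, n(t) - ord_0 counts the solutions with 0 < |z| <= lo.
        excess = sum(m for rho, m in solutions if 0 < rho <= lo)
        total += excess * math.log(hi / lo)
    return total

# Made-up data: a double solution at 0 and solutions of modulus 0.5, 1.5, 4.
SAMPLE = [(0, 2), (0.5, 1), (1.5, 3), (4.0, 1)]
print(counting_sum(3.0, SAMPLE), counting_integral(3.0, SAMPLE))
```

The agreement of the two values is an instance of summation by parts; the solution of modulus 4 does not contribute for r = 3.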
On the other hand, the function N (r, a, f ) so defined is perfectly suited for a
compact reformulation of the Poisson–Jensen formula at z = 0, which is Jensen’s
formula:
Proposition 13.2.6. Let
$$c(f, 0) := \lim_{z \to 0} f(z)\, z^{-\operatorname{ord}_0(f)}$$
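$$0 \le \Re\left(\frac{Re^{i\theta} + z}{Re^{i\theta} - z}\right) \le \frac{R + r}{R - r}.$$
Since f(z) is entire, we have $\operatorname{ord}_a(f) \ge 0$ and we conclude that
$$\log|f(z)| \le \frac{R + r}{R - r}\, m(R, \infty, f).$$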
Setting R = 2r , we get
$$\log^+ \|f\|_r \le 3\, m(2r, \infty, f). \tag{13.2}$$
Thus in the case of entire functions the proximity function at a = ∞ plays the
same role as the logarithm of the maximum modulus. This is not so for meromor-
phic functions and we owe to R. Nevanlinna the discovery of a quantity that can
take the place of the logarithm of the maximum modulus in the general case. This
is expressed in his first main theorem.
If f (0) = ∞ , we have to replace log |f (0) − eiθ | above by log |c(f, 0)| and the
proof follows along the same lines.
Corollary 13.2.14. The Nevanlinna characteristic T (r, f ) is an increasing convex
function of log r .
Proof: Note that N (r, eiθ , f ) is an increasing convex function of log r , hence so
is its mean value in θ .
Remark 13.2.15. Hence T(r, f) is increasing, but it need not be strictly increasing.
For example, if f({|z| ≤ r}) ⊂ {|w| < 1}, then T(r, f) = 0.
Corollary 13.2.16. The proximity function is bounded on average on circles:
$$\frac{1}{2\pi} \int_0^{2\pi} m(r, a + Re^{i\theta}, f)\, d\theta \le \log 2 + \log^+(1/R).$$
Proof: Replacing f by f − a we may assume that a = 0. Next, note that we have
a general inequality m(r, Rb, f) ≤ m(r, b, f/R) + log⁺(1/R), as we readily see
using $\frac{1}{f - Rb} = \frac{1}{R} \cdot \frac{1}{(f/R) - b}$ and the definition of proximity function. Hence we may
assume that R = 1. Now we set $a = e^{i\theta}$ in the first main theorem, take its mean
value with respect to θ, and conclude applying Cartan’s formula and equation
(13.3).
Proposition 13.2.17. If f is not a constant, then T(r, f) is unbounded. If
$$\liminf_{r \to \infty} \frac{T(r, f)}{\log r} < +\infty,$$
then f is a rational function.
Proof: For the second statement, the hypothesis entails that T (rj , f ) = O(log rj )
along an unbounded sequence (rj )j∈N , hence N (rj , ∞, f ) = O(log rj ) and we
conclude that f has only finitely many poles. Hence there is a polynomial Q(z)
such that Q(z)f (z) is an entire function. By Example 13.2.12, we also have
m(rj , ∞, Qf ) ≤ m(rj , ∞, Q) + m(rj , ∞, f ) = O(log rj ).
Using (13.2) on page 447, we conclude that Qf is an entire function with
$$\sup_{|z| \le r_j} |Qf|(z) \ll r_j^d$$
Example 13.2.18. Let f be a non-constant rational function. Then there are co-
prime polynomials P, Q with f (z) = P (z)/Q(z). Recall that the degree of f ,
considered as a finite morphism, is given by deg(f ) = max{deg(P ), deg(Q)}.
Similarly as in Example 13.2.12, we may compute directly
N (r, a, f ) = deg(P − aQ) log r + O(1),
m(r, a, f ) = (deg(f ) − deg(P − aQ)) log r + O(1)
for a ≠ ∞ and
N (r, ∞, f ) = deg(Q) log r + O(1),
m(r, ∞, f ) = (deg(f ) − deg(Q)) log r + O(1).
This illustrates the first main theorem and shows that, if f is a non-constant ratio-
nal function, then T (r, f ) = deg(f ) log r + O(1).
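As a concrete instance of these formulas, consider the rational function f(z) = (z² + 1)/z, chosen here only for illustration, with P = z² + 1, Q = z and deg(f) = 2. For r > 1 the counting functions can be read off from the zeros ±i and the pole at the origin, and the proximity functions then follow from the formulas of the example:

```latex
\begin{align*}
N(r, 0, f) &= \log\left|\frac{r}{i}\right| + \log\left|\frac{r}{-i}\right| = 2 \log r,
  & m(r, 0, f) &= (2 - 2) \log r + O(1) = O(1), \\
N(r, \infty, f) &= \operatorname{ord}_0^+(1/f) \log r = \log r,
  & m(r, \infty, f) &= (2 - 1) \log r + O(1) = \log r + O(1).
\end{align*}
```

In both cases the sum of counting and proximity function is 2 log r + O(1) = deg(f) log r + O(1), in agreement with the first main theorem.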
Definition 13.2.19. The order $\rho(f)$ of a meromorphic function $f$ is
\[ \rho(f) := \limsup_{r\to\infty} \frac{\log T(r, f)}{\log r}. \]
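As a quick illustration of the definition, the order can be computed for the two basic examples of this section. The computation below only uses $T(r, f) = \deg(f)\log r + O(1)$ for a non-constant rational function (Example 13.2.18) and $T(r, e^z) = r/\pi$ for the exponential function.

```latex
\begin{align*}
f \text{ rational, non-constant:}\quad
\rho(f) &= \limsup_{r\to\infty} \frac{\log\bigl(\deg(f)\log r + O(1)\bigr)}{\log r} = 0, \\
f(z) = e^{z}:\quad
\rho(f) &= \limsup_{r\to\infty} \frac{\log(r/\pi)}{\log r}
         = \limsup_{r\to\infty} \Bigl(1 - \frac{\log \pi}{\log r}\Bigr) = 1.
\end{align*}
```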
13.2.20. For meromorphic functions $f, g$ not identically zero, we have
\[ T(r, fg) \le T(r, f) + T(r, g), \qquad T(r, g) = T(r, 1/g) + \log|c(g, 0)|. \]
The first inequality is obtained from the analogous relations for the proximity and counting functions. The second identity follows immediately from Jensen's formula. We conclude that
\[ \rho(fg) \le \max\{\rho(f), \rho(g)\}, \qquad \rho(f/g) \le \max\{\rho(f), \rho(g)\}. \]
13.2.21. The function $N(r, a, f)$ is a measure of the vanishing of $f - a$ in the disk $\{|z| < r\}$, counting multiplicity. It is very important to measure this multiplicity, in other words the ramification of $f$, and this is done as follows.
By the Weierstrass factorization theorem ([8], Ch.5, Th.7), there are entire functions $f_0, f_1$ without common zeros such that $f = f_1/f_0$.
The Wronskian $W(f_0, f_1)$ of $(f_0 : f_1)$ is
\[ W(f_0, f_1) := \det\begin{pmatrix} f_0 & f_1 \\ f_0' & f_1' \end{pmatrix} \]
and we define
\[ N_{\mathrm{ram}}(r, f) := N(r, 0, W(f_0, f_1)). \]
This function is an average measure of the ramification of $f$ in the disk $\{|z| < r\}$ and is independent of the choice of $f_0, f_1$ with $f = f_1/f_0$, as long as $f_0$ and $f_1$ have no common zeros. For $z \in \mathbb{C}$ and $a := f(z)$, let $m(z)$ be the multiplicity of $z$ in the equation $f(z) = a$. Then it is easy to show that
\[ \mathrm{ord}_z(W(f_0, f_1)) = m(z) - 1. \]
13.2. Nevanlinna theory 451
Therefore, we have
\[ N_{\mathrm{ram}}(r, f) = N(r, 0, f') + 2N(r, \infty, f) - N(r, \infty, f'). \tag{13.4} \]
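A short derivation of (13.4) may be useful at this point. It is a sketch under the assumptions already in force ($f = f_1/f_0$ with $f_0, f_1$ entire and without common zeros) and only uses that the Wronskian factors through the derivative $f'$.

```latex
\begin{align*}
W(f_0,f_1) &= f_0 f_1' - f_0' f_1 = f_0^{2}\,\Bigl(\frac{f_1}{f_0}\Bigr)' = f_0^{2} f', \\
\operatorname{ord}_z W(f_0,f_1) &= 2\operatorname{ord}_z(f_0) + \operatorname{ord}_z(f'), \\
N(r,0,W(f_0,f_1)) &= 2N(r,0,f_0) + N(r,0,f') - N(r,\infty,f') \\
                  &= 2N(r,\infty,f) + N(r,0,f') - N(r,\infty,f').
\end{align*}
```

The last step uses that the zeros of $f_0$ are exactly the poles of $f$, with the same multiplicities, because $f_0$ and $f_1$ have no common zeros.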
13.2.22. It is also clear how to define functions $n_{\mathrm{ram}}(r, a, f)$ and $N_{\mathrm{ram}}(r, a, f)$. The only difference consists in taking $\max\{\mathrm{ord}_z^+(f - a) - 1, 0\}$ instead of $\mathrm{ord}_z^+(f - a)$ in the definitions of $n(r, a, f)$ and $N(r, a, f)$ (if $a = \infty$ use $\max\{\mathrm{ord}_z^-(f) - 1, 0\}$ instead of $\mathrm{ord}_z^-(f)$). Then we have
\[ N_{\mathrm{ram}}(r, f) = \sum_{a\in\mathbb{P}^1_{an}} N_{\mathrm{ram}}(r, a, f), \]
where the sum on the right is actually a finite sum for each $r$.
The proof is elementary, but rather lengthy and certainly not obvious. We refer to
the literature, see for example [149], Th.2.2 or W. Cherry and Z. Ye [64], Ch.3.
We have seen in Corollary 13.2.16 that the average of the proximity function
m(r, a, f ) on a circle of radius R is at most log 2 + log+ (1/R). This means that
values of a for which m(r, a, f ) is large and comparable with T (r, f ) must be
quite exceptional and we expect N (r, a, f )/T (r, f ) to be usually near to 1. The
second main theorem of Nevanlinna theory makes this observation quantitative.
Theorem 13.2.24. Let $a_1, \dots, a_q$ be different elements of $\mathbb{P}^1_{an} = \mathbb{C}\cup\{\infty\}$. Then
\[ \sum_{j=1}^{q} m(r, a_j, f) + N_{\mathrm{ram}}(r, f) \le 2T(r, f) + O(\log T(r, f)) + O(\log r) \]
outside of a set $E$ of finite Lebesgue measure. Moreover, if $f$ has finite order the result holds for all large $r$ without exception.
Proof: This is a consequence of the lemma on the logarithmic derivative as we
will show more generally for the higher-dimensional case, in Theorem 13.4.16. A
complete proof will be given in Section 13.3.
Remark 13.2.25. Lang drew attention to the significance of the error term, pointing out
that it has a definite counterpart in metric diophantine approximation (see S. Lang [171],
p.199). The interested reader will find in [64] a thorough analysis of the error term, in-
cluding numerically explicit estimates. So far, no significant application to the theory of
meromorphic functions in the plane has come from these refinements.
452 N E VA N L I N NA T H E O RY
and hence $T(r, f) = r/\pi$. Similarly, we get $m(r, 0, f) = r/\pi$. For $a \notin \{0, \infty\}$, the set of solutions of $f(z) = a$ is $\log a + 2\pi i\mathbb{Z}$, with multiplicity 1. From this it is easy to verify that
\[ N(r, a, f) = \frac{r}{\pi} + O(\log r) \]
and a more accurate analysis shows that $N(r, a, f) = \pi^{-1}r + O(1)$ (see [149], p.7 or [64], Prop.1.6.1). Hence from the first main theorem we get $m(r, a, f) = O(1)$.
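The value $r/\pi$ can be confirmed numerically for $f(z) = e^z$, the function whose $a$-points are $\log a + 2\pi i\mathbb{Z}$. The sketch below assumes the classical proximity function $m(r, \infty, f) = \frac{1}{2\pi}\int_0^{2\pi}\log^+|f(re^{i\theta})|\,d\theta$ and uses $\log^+|e^{z}| = \max\{\mathrm{Re}\,z, 0\}$; the function name and the sample count are illustrative choices.

```python
import math

def proximity_at_infinity_exp(r, samples=200_000):
    """Midpoint-rule approximation of
    (1/2pi) * integral over [0, 2pi] of max(r cos(theta), 0) d(theta),
    which is m(r, oo, exp) because log^+ |exp(r e^{i theta})| = max(r cos(theta), 0)."""
    step = 2 * math.pi / samples
    total = 0.0
    for k in range(samples):
        theta = (k + 0.5) * step
        total += max(r * math.cos(theta), 0.0)
    return total * step / (2 * math.pi)

for r in (1.0, 5.0, 20.0):
    print(r, proximity_at_infinity_exp(r), r / math.pi)
```

Since $e^z$ has no poles, $N(r, \infty, f) = 0$, and the computed mean value is the whole characteristic $T(r, f)$.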
As an aside, note that Θ(a, f ) > δ(a, f ) + θ(a, f ) may occur even for functions
of finite order (contrary to what has been written in [64], p.125).
Since $N_{\mathrm{ram}}(r, f) = \sum_a N_{\mathrm{ram}}(r, a, f)$, as an almost immediate consequence of the second main theorem we have the celebrated defect inequality of Nevanlinna:
Theorem 13.2.28. Let $f$ be a meromorphic function in $\mathbb{C}$. Then
\[ \sum_{a\in\mathbb{P}^1_{an}} \{\delta(a, f) + \theta(a, f)\} \le \sum_{a\in\mathbb{P}^1_{an}} \Theta(a, f) \le 2. \]
hence the defect inequality continues to hold a fortiori for all non-constant rational
functions. The last claim follows from Nram (r, a, f ) ≤ N (r, a, f ) (use 13.2.22)
and the first main theorem.
Remark 13.2.29. The term −1/ deg(f ) in the right-hand side of (13.5) arises
because we are dealing with a covering f : C → P1an , while Hurwitz’s theorem
deals with the covering f : P1an → P1an .
Remark 13.2.30. From the definition and the first main theorem, we have
\[ 1 - \delta(a, f) = \limsup_{r\to\infty} \frac{N(r, a, f)}{T(r, f)}. \]
The defect inequality implies that δ(a, f ) = 0 except for a countable set of values
a, hence δ(a, f ) measures the failure for N (r, a, f ) to be close to T (r, f ). This
explains the terminology “defect” for δ(a, f ) and deficient value for a whenever
δ(a, f ) > 0.
Remark 13.2.31. The notion of deficient value is not invariant by translation. There are examples of meromorphic functions $f(z)$ for which $\delta(f(z), a) \ne \delta(f(z - z_0), a)$ for a suitable translation $z_0$. This quite pathological phenomenon is associated to either infinite order or extremely irregular behaviour of $T(r, f)$ and does not occur if $T(r + 1, f)/T(r, f) \to 1$ as $r \to \infty$ (see R. Nevanlinna [221], Ch.X, §2, 230).
Remark 13.2.32. In Example 13.2.26, the deficient values are a = 0, ∞ , both with
δ(a, f ) = 1 . Thus Theorem 13.2.28 is sharp.
Example 13.2.33. Let $\wp(z)$ be the Weierstrass elliptic function associated to the elliptic curve $y^2 = 4x^3 - g_2 x - g_3 = 4(x - e_1)(x - e_2)(x - e_3)$ (see 8.3.10). Define $e_0 = \infty$. Then $\wp(z)$ is a doubly periodic meromorphic function of order 2 with no deficient values.
A doubly periodic meromorphic function on C is called elliptic. Let f be a non-constant
elliptic function with fundamental domain Ω of the period lattice Λ . The order d of f
is the number of solutions z ∈ Ω , counted with multiplicity, of the equation f (z) = a
(the reader should be careful not to confuse this definition of order with the order of f as a
meromorphic function, which is always ρ(f ) = 2 as we will show below).
An application of the argument principle shows that the order does not depend on the choice of $a$. An easy covering argument shows
\[ n(r, a, f) = \frac{\pi d}{\mathrm{vol}(\Omega)}\cdot r^2 + O(r) \]
and hence
\[ N(r, a, f) = \frac{\pi d}{2\,\mathrm{vol}(\Omega)}\cdot r^2 + O(r). \]
Note that the bounds may be chosen independently of $a$, therefore Cartan's formula (Proposition 13.2.13) gives
\[ T(r, f) = \frac{\pi d}{2\,\mathrm{vol}(\Omega)}\cdot r^2 + O(r). \]
By Remark 13.2.30, no deficient values occur for an elliptic function $f$.
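The covering argument can be tested numerically in the simplest case. The sketch below assumes the square lattice $\Lambda = \mathbb{Z} + \mathbb{Z}i$ (so $\mathrm{vol}(\Omega) = 1$) and counts the poles of the corresponding $\wp$, which are the lattice points, each of order 2 (so $d = 2$). The lattice and the function names are illustrative choices, not taken from the text.

```python
import math

def lattice_points_in_disk(r):
    """Number of Gaussian integers m + ni with |m + ni| < r."""
    bound = int(math.floor(r))
    count = 0
    for m in range(-bound, bound + 1):
        for n in range(-bound, bound + 1):
            if m * m + n * n < r * r:
                count += 1
    return count

def n_poles_wp(r):
    """n(r, oo, wp) for the assumed lattice Z + Zi: each lattice point is a double pole."""
    return 2 * lattice_points_in_disk(r)

for r in (10.0, 50.0, 100.0):
    main_term = math.pi * 2 * r * r   # pi * d * r^2 / vol(Omega) with d = 2, vol = 1
    print(r, n_poles_wp(r), main_term, (n_poles_wp(r) - main_term) / r)
```

The last printed column stays bounded, in line with the $O(r)$ error term.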
Now we specialize $f = \wp$. Since $\wp$ has only the pole $0 \in \Omega$ and the pole is of order two, we infer that the order of $\wp$ as an elliptic function is 2, as claimed at the beginning; by accident, this is equal to $\rho(\wp)$. Moreover, $\wp'$ is an odd elliptic function of order 3 with a triple pole at 0, hence the set of zeros of $\wp'$ is $(\tfrac12\Lambda)\setminus\Lambda$ and they are all simple. We conclude that $\tfrac12\Lambda$ is the set of ramification points of $\wp$. They are the solutions of the equations $\wp(z) = e_j$ ($j = 0, \dots, 3$), which have only double roots. A geometric argument as above gives
\[ N_{\mathrm{ram}}(r, e_j, \wp) = \frac{\pi}{2\,\mathrm{vol}(\Omega)}\cdot r^2 + O(r). \]
We conclude that $\theta(e_j, \wp) = \tfrac12$ and hence $\sum_{j=0}^{3}\theta(e_j, \wp) = 2$, giving again an extremal example of Theorem 13.2.28.
Consider the function $z = z(w)$ defined by
\[ z = \int_0^{w} (t - a)^{\frac1m - 1}(t - b)^{\frac1n - 1}(t - c)^{\frac1p - 1}\,dt, \]
where $a$, $b$, $c$ are distinct real numbers and $m$, $n$, $p$ are integers such that $\frac1m + \frac1n + \frac1p = 1$. Then $z(w)$ maps the upper half plane biholomorphically onto an open triangle in the $z$-plane with angles $\pi/m$, $\pi/n$, $\pi/p$ at the vertices $z(a), z(b), z(c)$. (Look first at the boundary. This is a special case of the Schwarz–Christoffel formula, see [152], Th.17.6.1, p.374.) The Schwarz reflection principle ([8], Th.6.5.26) proves that the image of the multivalued function $z(w)$ with ramification points $a, b, c$ is the whole plane. In fact, the possible values for $(m, n, p)$ are $(2, 3, 6)$, $(2, 4, 4)$, $(3, 3, 3)$ up to ordering and the image triangles of the closed upper/lower plane cover $\mathbb{C}$. The inverse function $f(z) = w$ is a meromorphic elliptic function (Schwarz reflection principle again) with no deficient values and similar arguments as above show that
\[ \theta(a, f) = 1 - \frac1m, \qquad \theta(b, f) = 1 - \frac1n, \qquad \theta(c, f) = 1 - \frac1p, \]
providing other extremal examples of Theorem 13.2.28 (see [221], Ch.X, §3, 236).
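The list of admissible triples is easy to confirm by brute force. The sketch below enumerates integer triples $2 \le m \le n \le p$ with $\frac1m + \frac1n + \frac1p = 1$ using exact rational arithmetic; the search limit of 50 is an arbitrary safety margin (the constraint already forces $m \le 3$ and $p \le 6$), and the function name is an illustrative choice.

```python
from fractions import Fraction

def triangle_triples(limit=50):
    """All integer triples 2 <= m <= n <= p <= limit with 1/m + 1/n + 1/p = 1."""
    found = []
    for m in range(2, limit + 1):
        for n in range(m, limit + 1):
            for p in range(n, limit + 1):
                if Fraction(1, m) + Fraction(1, n) + Fraction(1, p) == 1:
                    found.append((m, n, p))
    return found

triples = triangle_triples()
print(triples)
for m, n, p in triples:
    # Sum of the ramification defects (1 - 1/m) + (1 - 1/n) + (1 - 1/p).
    print((m, n, p), 3 - (Fraction(1, m) + Fraction(1, n) + Fraction(1, p)))
```

In each case the three ramification defects add up to 2, which is why these functions are extremal for the defect inequality.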
Remark 13.2.34. The following Nevanlinna inverse problem was completely solved by D. Drasin, in a major paper [92]. Let $\{(a_j, \delta_j, \theta_j)\}$, $j = 1, 2, \dots$ be any finite or countable sequence with distinct $a_j \in \mathbb{C}\cup\{\infty\}$, non-negative real numbers $\delta_j, \theta_j$ satisfying $0 < \delta_j + \theta_j \le 1$ and $\sum_j(\delta_j + \theta_j) \le 2$. Then there is a meromorphic function $f$ with $\{a_j\}$ as its set of deficient values and $\delta(a_j, f) = \delta_j$, $\theta(a_j, f) = \theta_j$. In general, such a function will have infinite order but the growth can always be bounded by $T(r, f) < r^{\omega(r)}$, with $\omega(r) \to \infty$ arbitrarily slowly. For finite order, there are additional restrictions. For example, a meromorphic function of order 0 can have at most one deficient value (this is an old result of Valiron from 1925, see [149], Th.4.10, p.110). This extends the result, pointed out in the proof of Theorem 13.2.28, that $f(\infty)$ is the only possible deficient value for a rational function. According to Drasin (loc.cit.), his method also shows that the restricted inverse problem with the additional condition $\delta_j = 0$ for every $j$ can always be solved with $f$ of order 0.
Another deep result of D. Drasin [93], solving a long-standing conjecture of F. Nevanlinna, is that if the sum of the deficiencies of a meromorphic function $f$ of finite order $\rho$ is $\sum_a \delta(a, f) = 2$, then $2\rho$ is an integer $\ge 2$, $\rho\,\delta(a, f)$ is a positive integer, and every possibility can actually occur (see R. Nevanlinna [220], p. 357, or L. Ahlfors [4], p.406).
Meromorphic functions of finite order must also satisfy additional conditions. For example, a result of A. Weitsman [330] shows that $\sum_a \delta(a, f)^{1/3} < \infty$ whenever $f$ has finite order.
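\[ N^{(1)}(r, a, f) := \min\{1, \mathrm{ord}_0^+(f - a)\}\log r + \sum_{0<|z|<r} \min\{1, \mathrm{ord}_z^+(f - a)\}\log\left|\frac{r}{z}\right|. \]
For $a = \infty$ we define
\[ N^{(1)}(r, \infty, f) := \min\{1, \mathrm{ord}_0^-(f)\}\log r + \sum_{0<|z|<r} \min\{1, \mathrm{ord}_z^-(f)\}\log\left|\frac{r}{z}\right|. \]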
The following result can be viewed as the analogue of an abc-theorem for mero-
morphic functions.
Theorem 13.2.39. Let f , g be non-constant meromorphic functions such that
f + g = 1.
Then
T (r, f ) ≤ cond(r, f g) + O(log T (r, f )) + O(log r)
for all r outside of a set E of finite Lebesgue measure. Moreover, if f has finite
order, then we may choose E bounded.
Proof: We apply the second main theorem to $f$, with the three points $\{a_1, a_2, a_3\} = \{0, 1, \infty\}$. We infer that
\[ \sum_{j=1}^{3} m(r, a_j, f) + N_{\mathrm{ram}}(r, f) \le 2T(r, f) + O(\log T(r, f)) + O(\log r) \tag{13.6} \]
the last step because N (1) (r, 1, f ) = N (1) (r, 0, g) and f, g have the same poles.
The theorem follows by combining the last two displayed inequalities.
13.3. The Ahlfors–Shimizu characteristic
There is a very nice geometric definition for a slightly different characteristic func-
tion, due independently to L. Ahlfors [3] and T. Shimizu [281]. It leads to a theory
equivalent to Nevanlinna’s, but also to simpler and more elegant proofs, well mo-
tivated by underlying geometric concepts. We still denote by f : C → P1an a
non-constant meromorphic function.
13.3.1. We begin by recalling the stereographic projection and its main proper-
ties. Let S be the Riemann sphere of diameter 1 in R3 , lying on the Gauss plane
C with coordinates z = x + iy and touching the plane at the origin 0. Let N
denote the North Pole on S .
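[Figure: stereographic projection from the North Pole N of the sphere S, sending a point P of S to the point z(P) of the z-plane.]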
Its interpretation is that, if $P$, $Q$ are two points on $S$ and $z(P)$ and $z(Q)$ are their images in $\mathbb{C}\cup\{\infty\}$ by the stereographic projection, then the euclidean chordal distance $\|P - Q\|$ between $P$ and $Q$ is given by
\[ \|P - Q\| = k(z(P), z(Q)). \]
This induces a distance function on $\mathbb{P}^1_{an}$. For the elements of area, we easily compute that the pull-back of the Fubini–Study form $\omega$, given in affine coordinates $w$ by
\[ \omega(z) = \frac{i}{2\pi}\,\frac{dz\wedge d\bar z}{(1 + |z|^2)^2}, \]
where
\[ \nu(a, f) = \begin{cases} \log k(f(0), a) & \text{if } f(0) \ne a, \\ \log\bigl(|c(f - a, 0)|/(1 + |a|^2)\bigr) & \text{if } f(0) = a \text{ and } a \ne \infty, \\ -\log|c(f, 0)| & \text{if } f(0) = a = \infty. \end{cases} \]
With the new proximity function, the first main theorem takes the following very elegant form:
Theorem 13.3.2. For every $a, b \in \mathbb{P}^1_{an}$ we have
\[ \mathring{m}(r, a, f) + N(r, a, f) + \nu(a, f) = \mathring{m}(r, b, f) + N(r, b, f) + \nu(b, f). \]
Definition 13.3.3. The quantity $\mathring{T}(r, f) := \mathring{m}(r, a, f) + N(r, a, f) + \nu(a, f)$, which is independent of $a$, is the Ahlfors–Shimizu characteristic. It has been normalized so that $\lim_{r\to 0^+}\mathring{T}(r, f) = 0$.
Let $\mathring{T}(r, a, f)$ denote the left-hand side in the theorem. The above shows that $\mathring{T}(r, a, f) - \mathring{T}(r, \infty, f)$ is constant in $r$. On the other hand, (13.9) shows that its limit for $r \to 0^+$ is equal to 0, hence $\mathring{T}(r, a, f) = \mathring{T}(r, \infty, f)$.
Lemma 13.3.4. Let $\eta$ be an integrable 2-form on $\mathbb{P}^1_{an}$. Then
\[ \int_{|z|\le r} f^*\eta = \int_{\mathbb{P}^1_{an}} n(r, a, f)\,\eta(a). \]
Proof: Let γr be the curve f ({|z| = r}). Then the local mapping principle shows
that the restriction of f to {|z| < r} \ f −1 (γr ) is a ramified covering. For a
non-ramified a ∈ P1an \ γr , the number of sheets over a is equal to n(r, a, f ) and
hence the transformation formula of integrals implies the claim.
Remark 13.3.5. As a corollary of proof, we note that n(r, a, f ) is a locally con-
stant function in a ∈ P1an \ γr . The same argument shows that n(r, a, f ) is a lo-
cally constant function in (r, a) outside of the closed set {(r, a) | r > 0, a ∈ γr }
of measure zero.
Theorem 13.3.6.
\[ \mathring{T}(r, f) = \int_0^{r}\Bigl(\int_{|z|\le t} f^*\omega\Bigr)\frac{dt}{t}. \]
Proof: Since the chordal distance is continuous and bounded by 1, the spherical proximity function $\mathring{m}(r, a, f)$ is a non-negative measurable function on $\mathbb{R}_+\times\mathbb{P}^1_{an}$. Interchanging the order of integration using Fubini's theorem and recalling that the chordal distance is rotation invariant, we get
\[ \int_{\mathbb{P}^1_{an}} \mathring{m}(r, a, f)\,\omega(a) = \int_{\mathbb{P}^1_{an}} \log\frac{1}{k(f(0), a)}\,\omega(a) = -\int_{\mathbb{P}^1_{an}} \nu(a, f)\,\omega(a). \]
Remark 13.3.7. By Lemma 13.3.4, $\int_{|z|\le r} f^*\omega$ is equal to the area of $f(\{|z| \le r\})$ on the sphere with respect to the spherical area form $\omega$, counted with the sheet multiplicity. Hence $\mathring{T}(r, f)$ is a logarithmic average of the growth of the spherical area covered by $f$ on disks of increasing radius.
Remark 13.3.8. Note that $\int_{|z|\le r} f^*\omega$ is a continuous positive strictly increasing function equal to $r\frac{d}{dr}\mathring{T}(r, f)$. We conclude that the Ahlfors–Shimizu characteristic $\mathring{T}(r, f)$ is a positive convex strictly increasing function of $\log(r)$ and hence $\lim_{r\to\infty}\mathring{T}(r, f) = \infty$. Moreover, using (13.7) on page 458 and polar coordinates, we easily deduce that $\mathring{T}(r, f)$ is a $C^\infty$-function of $r$.
13.3.9. We have
\[ \mathring{m}(r, \infty, f) = \frac{1}{2\pi}\int_0^{2\pi}\log\sqrt{1 + |f(re^{i\theta})|^2}\,d\theta, \]
hence
\[ 0 \le \mathring{m}(r, \infty, f) - m(r, \infty, f) \le \tfrac12\log 2. \]
This proves
\[ \nu(\infty, f) \le \mathring{T}(r, f) - T(r, f) \le \nu(\infty, f) + \tfrac12\log 2 \]
and hence $\mathring{T}(r, f)$ and $T(r, f)$ differ by a bounded function. We will prove this again in 13.4.8, in a general setting valid in higher dimensions.
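The bound between the two proximity functions rests on a pointwise inequality, $0 \le \log\sqrt{1 + x^2} - \log^+ x \le \frac12\log 2$ for $x \ge 0$, applied with $x = |f(re^{i\theta})|$. The sketch below checks it on a few sample values; the function name and the sample points are illustrative choices.

```python
import math

def spherical_minus_classical(x):
    """log sqrt(1 + x^2) - log^+ x for a real number x >= 0."""
    log_plus = math.log(x) if x > 1 else 0.0
    return 0.5 * math.log1p(x * x) - log_plus

samples = [0.0, 0.1, 0.5, 1.0, 2.0, 10.0, 1000.0]
for x in samples:
    print(x, spherical_minus_classical(x))
print("upper bound:", 0.5 * math.log(2))
```

The difference is largest at $x = 1$, where it equals $\frac12\log 2$, and it tends to 0 as $x \to 0$ or $x \to \infty$.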
Now we give L. Ahlfors’s proof [5] of the second main theorem, in the following
form.
Theorem 13.3.10. Let $f$ be a non-constant meromorphic function on $\mathbb{C}$ and $a_1, \dots, a_q$ be distinct points in $\mathbb{P}^1_{an}$. Let $k \ge -1$ be a given real number and let $\varepsilon > 0$. Then
\[ \sum_{j=1}^{q}\mathring{m}(r, a_j, f) + N_{\mathrm{ram}}(r, f) < 2\mathring{T}(r, f) + (1 + \varepsilon)\log\mathring{T}(r, f) + (k + \varepsilon)\log r, \]
for $r \to \infty$ outside of an open set $E$ such that $\int_E t^k\,dt < \infty$. Moreover, if $f$ has finite order $\rho$, the result holds with the right-hand side $2\mathring{T}(r, f) + (2\rho - 1 + \varepsilon)\log r$, for all large $r$ without exception.
Remark 13.3.11. The error term is sharp for ρ = 0 . If we apply it to a non-constant ratio-
nal function, hence ρ = 0 , we recover Hurwitz’s theorem in the form (13.5) on page 453.
For ρ > 0 , the error term was improved by Z. Ye to the sharp (ρ − 1 + ε) log r , see [64],
Th.4.3.1 and Z. Ye [334].
Let $r > r_0 > 0$, where $r_0$ is fixed and will be chosen later. We assume that $\mathring{m}_\mu(r_0, f) < \infty$. Then the first main theorem in 13.3.2 yields
\[ \mathring{m}_\mu(r, f) - \mathring{m}_\mu(r_0, f) + \int_{r_0}^{r} n_\mu(s, f)\,\frac{ds}{s} = \mathring{T}(r, f) - \mathring{T}(r_0, f) \tag{13.10} \]
proving that $\mathring{m}_\mu(r, f) < \infty$. We have
\[ n_\mu(r, f) = \int_{\mathbb{P}^1_{an}} n(r, a, f)\,d\mu(a) = \int_{|z|\le r} f^*(\rho\omega) = \frac{1}{\pi}\int_0^{r}\int_0^{2\pi}\frac{|f'(se^{i\theta})|^2}{\bigl(1 + |f(se^{i\theta})|^2\bigr)^2}\,\rho(f(se^{i\theta}))\,s\,ds\,d\theta, \tag{13.11} \]
where we have used Lemma 13.3.4 and the last step was obtained by (13.7) on page 458 writing $f^*(\rho\omega)$ in polar coordinates.
By (13.7) on page 458, it is clear that $|f'|^2/(1 + |f|^2)^2$ is a $C^\infty$-function on $\mathbb{C}$ and we define
\[ \lambda(r) := \frac{1}{2\pi}\int_0^{2\pi}\frac{|f'(re^{i\theta})|^2}{(1 + |f(re^{i\theta})|^2)^2}\,\rho(f(re^{i\theta}))\,d\theta. \]
Now we proceed to the key step, namely obtaining a lower bound for λ(r). To
this end, we use Jensen’s inequality (see [252], Th.3.3) applied to log which
generalizes the inequality between arithmetic and geometric means:
Lemma 13.3.12. If $F(x)$ is positive and integrable in the interval $[a, b]$, then
\[ \log\Bigl(\frac{1}{b - a}\int_a^{b} F(x)\,dx\Bigr) \ge \frac{1}{b - a}\int_a^{b}\log F(x)\,dx. \]
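A numerical instance of Lemma 13.3.12 may make the direction of the inequality easier to remember. The sketch below takes the illustrative choice $F(x) = 1 + x^2$ on $[0, 1]$ and approximates both sides with the midpoint rule; the helper name `mean` and the sample count are assumptions made here.

```python
import math

def mean(func, a, b, samples=100_000):
    """Midpoint-rule approximation of (1/(b-a)) * integral of func over [a, b]."""
    step = (b - a) / samples
    return sum(func(a + (k + 0.5) * step) for k in range(samples)) * step / (b - a)

def F(x):
    return 1.0 + x * x

lhs = math.log(mean(F, 0.0, 1.0))                   # log of the mean, exactly log(4/3)
rhs = mean(lambda x: math.log(F(x)), 0.0, 1.0)      # mean of the log, exactly log 2 - 2 + pi/2
print(lhs, rhs)
```

The log of the mean is the larger quantity, which is the concavity of the logarithm.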
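and set
\[ \log\rho(a) := 2R(a) - \alpha\log(1 + R(a)) + K, \tag{13.16} \]
\[ \frac{1}{2\pi}\int_0^{2\pi}\log\bigl(1 + R(f(re^{i\theta}))\bigr)\,d\theta \le \log\Bigl(\frac{1}{2\pi}\int_0^{2\pi}\bigl(1 + R(f(re^{i\theta}))\bigr)\,d\theta\Bigr) \le \log\Bigl(1 + \sum_{j=1}^{q}\mathring{m}(r, a_j, f)\Bigr) \le \log\mathring{T}(r, f) + \log q + O(1). \]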
+
holds for sufficiently large $r$. This certainly holds outside of the set $E$. Now the left-hand side of this inequality is an increasing function of $r$, hence if $r \in E$ we have
\[ (q - 2)\mathring{T}(r, f) + N_{\mathrm{ram}}(r, f) \le \sum_{j=1}^{q} N(r_1, a_j, f) + (\rho + k)\log r_1 \tag{13.18} \]
for any $r_1 > r$ with $r_1 \notin E$. On the other hand, since $\int_E r^k\,dr < \infty$, we can always find such an $r_1$ with $r_1 < r + r^{-k}$ if $r$ is large enough. This gives $\log(r_1/r) < r^{-k-1}$. Further, for any $a$ and $r < R$, we have
\[ n(r, a, f)\log\frac{R}{r} \le N(R, a, f) - N(r, a, f) \le n(R, a, f)\log\frac{R}{r}. \]
If we take $R = er_1$, we get a fortiori $n(r_1, a, f) \le \mathring{T}(er_1, f) + O(1) = O(r^{\rho+\varepsilon})$. If we take $R = r_1$, we get $N(r_1, a, f) - N(r, a, f) = O(r^{\rho+\varepsilon-k-1}) = o(1)$, because $k > \rho - 1 + \varepsilon$. This shows that having $r_1$ in the right-hand side of (13.18) has no effect on the final result and completes the proof.
main theorem to holomorphic curves f remains conjectural, but the case of Pnan
with D the sum of hyperplanes in general position was proved in Cartan’s thesis.
We deduce this special case from the lemma on the logarithmic derivative using
Wronskian techniques. The case of a curve X is also well known and we finish by
describing the generalization replacing C by finite ramified analytical coverings,
which find their arithmetical counterpart in finite extensions of number fields. At
the end, we give a reformulation of the abc-theorem for meromorphic functions,
which makes the analogy to the abc-conjecture clearer.
For this section, the reader is assumed to be familiar with the analytic theory of
complex varieties (see Section A.14 for an introduction and [130] for more details).
13.4.1. Let s be an invertible meromorphic section of a line bundle L giving
rise to a Cartier divisor D = div(s). We suppose that f (C) is not contained
in the support of D . We use the notation ordY (D) := ordY (s) to denote the
multiplicity of D in the prime divisor Y .
Definition 13.4.2. For $r > 0$, the counting function is defined by
\[ N_{f,D}(r) := \mathrm{ord}_0(f^*D)\log r + \sum_{0<|z|<r}\mathrm{ord}_z(f^*D)\log\left|\frac{r}{z}\right| \]
and
\[ m_{f,H}(r) = \frac{1}{2\pi}\int_0^{2\pi}\log\max\{|f_0(re^{i\theta})|, \dots, |f_n(re^{i\theta})|\}\,d\theta - \frac{1}{2\pi}\int_0^{2\pi}\log|\ell\circ f(re^{i\theta})|\,d\theta. \]
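\[ = -\int_0^{r}\frac{\partial}{\partial t}\Bigl(\frac{1}{2\pi}\int_0^{2\pi}\log\|\ell\circ f(te^{i\theta})\|\,d\theta\Bigr)dt - \log\|\ell\circ f(0)\|. \]
\[ d^c g = \frac{1}{4\pi}\Bigl(r\,\frac{\partial g}{\partial r}\,d\theta - \frac{\partial g}{\partial\theta}\,\frac{dr}{r}\Bigr) \]
for any differentiable function $g$. Therefore, by interchanging differentiation and integration, we get
\[ m_{f,H}(r) = -\int_0^{r}\Bigl(\int_{|z|=t} d^c\log\|\ell\circ f(z)\|^2\Bigr)\frac{dt}{t} - \log\|\ell\circ f(0)\|. \]
By Stokes's theorem, we have
\[ \int_{|z|=t} d^c\log\|\ell\circ f(z)\|^2 = \int_{|z|\le t} dd^c\log\|\ell\circ f(z)\|^2 + n(t, 0, \ell\circ f), \]
where the enumerating function $n(t, 0, g)$ is the number of zeros, counted with multiplicities, of the holomorphic function $g = \ell\circ f$ in the open disk $\{|z| < t\}$. By
\[ f^*\omega = dd^c\log\|\ell\circ f\|^{-2} \]
and
\[ \int_0^{r} n(t, 0, g)\,\frac{dt}{t} = N(r, 0, g), \]
we get
\[ m_{f,H}(r) = \int_0^{r}\Bigl(\int_{|z|\le t} f^*\omega\Bigr)\frac{dt}{t} - N(r, 0, \ell\circ f) - \log\|\ell\circ f(0)\|. \]
This proves our final result
\[ T_{f,H}(r) = \int_0^{r}\Bigl(\int_{|z|\le t} f^*\omega\Bigr)\frac{dt}{t} - \log\|\ell\circ f(0)\| \tag{13.19} \]
provided $f(0) \notin H$. Note that
\[ T_f^{\omega}(r) := \int_0^{r}\Bigl(\int_{|z|\le t} f^*\omega\Bigr)\frac{dt}{t} \]
does not depend on the choice of $\|\ \|$. It is called the Ahlfors–Shimizu characteristic of $f$, generalizing the construction in 13.3.3. The equivalence with $T_{f,H}(r)$ is a consequence of (13.19).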
13.4. Holomorphic curves 469
If $f(0) \in H$, then $T_{f,H}$ is still equivalent to $T_f^{\omega}$ by using the second part of the following first main theorem in the higher-dimensional case.
Theorem 13.4.9. Let $L$ be a line bundle on the compact complex variety $X$ with an invertible meromorphic section $s$ and a continuous metric $\|\ \|$ giving rise to the characteristic function $T_{f,\mathrm{div}(s)}(r)$. For a holomorphic map $f : \mathbb{C} \to X$ with $f(\mathbb{C}) \not\subset |\mathrm{div}(s)|$, the following results hold:
\[ \dim\bigcap_{i\in I} H_i = n - |I|. \]
of $(f_0 : \cdots : f_n)$ by
\[ W(f_0, \dots, f_n) = \det\begin{pmatrix} f_0 & f_1 & \cdots & f_n \\ f_0' & f_1' & \cdots & f_n' \\ \vdots & \vdots & & \vdots \\ f_0^{(n)} & f_1^{(n)} & \cdots & f_n^{(n)} \end{pmatrix}. \]
Clearly this is determined by f up to entire functions without zeros, hence the
counting function
Nf,ram (r) := N (r, 0, W (f0 , . . . , fn ))
is well defined. In this situation, we have Cartan’s second main theorem:
Theorem 13.4.16. For any ε > 0 and any given hyperplane H , the inequality
mf,D (r) + Nf,ram (r) ≤ (n + 1)Tf,H (r) + O(log Tf,H (r)) + O(log r)
holds for all r outside of a set E of finite Lebesgue measure.
Proof: If we add additional hyperplanes to $H_1, \dots, H_q$, then the left-hand side increases up to bounded functions (see the proof of Theorem 13.4.9). So we may assume $q \ge n + 2$ and we set $p := q - n - 1$. Let $\ell_j(x)$ be a linear form with $H_j = \mathrm{div}(\ell_j(x))$. For $K = \{k_1, \dots, k_p\} \subset \{1, \dots, q\}$ of cardinality $p$, we define
\[ \ell_K(x) := \ell_{k_1}(x)\cdots\ell_{k_p}(x). \]
We consider the morphism
\[ \varphi : \mathbb{P}^n_{an} \longrightarrow \mathbb{P}^{\binom{q}{p}-1}_{an}, \qquad x \mapsto (\ell_K(x))_{|K|=p}. \]
Using $\varphi^* O_{\mathbb{P}^{\binom{q}{p}-1}}(1) \cong O_{\mathbb{P}^n}(p)$, Proposition 13.4.12 implies
\[ T_{\varphi\circ f, O(1)}(r) = p\,T_{f,H}(r) + O(1). \tag{13.20} \]
Let $I = \{i_1, \dots, i_{n+1}\} := \{1, \dots, q\}\setminus K$ and $g_i := \ell_i(f_0, \dots, f_n)$. Then we have
\[ W(g_{i_1}, \dots, g_{i_{n+1}}) = d_I\cdot W(f_0, \dots, f_n), \tag{13.21} \]
where $d_I$ is the determinant of the $(n + 1)\times(n + 1)$ matrix formed with the coefficients of $(\ell_i)_{i\in I}$. We define the logarithmic Wronskian by
\[ \lambda(f_0, \dots, f_n) := \det\begin{pmatrix} 1 & 1 & \cdots & 1 \\ f_0'/f_0 & f_1'/f_1 & \cdots & f_n'/f_n \\ \vdots & \vdots & & \vdots \\ f_0^{(n)}/f_0 & f_1^{(n)}/f_1 & \cdots & f_n^{(n)}/f_n \end{pmatrix} = \frac{W(f_0, \dots, f_n)}{f_0\cdots f_n}. \]
From (13.21), we deduce
\[ \ell_K(f) = \frac{g_1\cdots g_q}{W(f_0, \dots, f_n)}\cdot d_I^{-1}\cdot\lambda(g_{i_1}, \dots, g_{i_{n+1}}). \tag{13.22} \]
To compute (13.20), we will use equation (13.22). The first factor is independent
of K and will be handled by Jensen’s formula. The second factor is independent
of r and hence contributes only a bounded amount to the height. The third factor
is small by the lemma on the logarithmic derivative in 13.2.23.
The details are as follows. For a moment, we fix $r > 0$. Given a meromorphic function $g$ on $\mathbb{C}$ and $v \in \mathbb{C}$, $|v| \le r$, it is notationally convenient to use
\[ |g|_v = \begin{cases} |g(v)| & \text{if } |v| = r, \\ \left|\dfrac{v}{r}\right|^{\mathrm{ord}_v(g)} & \text{if } 0 < |v| < r, \\ r^{-\mathrm{ord}_v(g)} & \text{if } v = 0. \end{cases} \]
The functions $|\ |_v$ behave almost like absolute values and Jensen's formula may be interpreted as a product formula (see 14.2.2). Later, this notation is helpful for translating the argument to the function field case.
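The product-formula reading of Jensen's formula can be checked numerically. With the conventions above ($|g|_v = |g(v)|$ on the circle, $|v/r|^{\mathrm{ord}_v(g)}$ for $0 < |v| < r$, $r^{-\mathrm{ord}_v(g)}$ at $v = 0$), Jensen's formula says that the sum of $\log|g|_v$ over the interior points plus the mean of $\log|g|_v$ over the circle equals $\log|c(g, 0)|$, independently of $r$. The sketch below tests this for the illustrative polynomial $g(z) = z(z - \tfrac12)$, whose leading coefficient at 0 is $c(g, 0) = -\tfrac12$; all names in the code are assumptions made here.

```python
import cmath
import math

def g(z):
    return z * (z - 0.5)

# Zeros of g as (v, ord_v(g)) pairs; g has no poles.
divisor_of_g = [(0, 1), (0.5, 1)]

def circle_mean_log_abs(func, r, samples=4096):
    """(1/2pi) * integral of log|func(r e^{i theta})| d(theta), by the trapezoid rule."""
    total = 0.0
    for k in range(samples):
        z = r * cmath.exp(2j * math.pi * k / samples)
        total += math.log(abs(func(z)))
    return total / samples

def interior_sum(divisor, r):
    """Sum of log|g|_v over |v| < r, using the normalized 'absolute values' |g|_v."""
    total = 0.0
    for v, order in divisor:
        if v == 0:
            total += -order * math.log(r)
        elif abs(v) < r:
            total += order * math.log(abs(v) / r)
    return total

for r in (2.0, 5.0):
    print(r, interior_sum(divisor_of_g, r) + circle_mean_log_abs(g, r), math.log(0.5))
```

Both radii give the same total, $\log\tfrac12$, which plays the role of the constant in a product formula.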
Let $v \in \mathbb{C}$, $|v| = r$ with $v \notin f^{-1}(D)$. Then we have
\[ \log|\lambda(g_{i_1}, \dots, g_{i_{n+1}})|_v \le \log\max_{\pi}\left|\frac{g_{i_1}^{(\pi_0)}}{g_{i_1}}\cdots\frac{g_{i_{n+1}}^{(\pi_n)}}{g_{i_{n+1}}}\right|_v + \log|(n + 1)!|_v \le \sum_{j=0}^{n}\sum_{i=1}^{q}\log^+\bigl|g_i^{(j)}/g_i\bigr|_v + \log|(n + 1)!|_v, \tag{13.23} \]
where $\pi$ ranges over all bijective maps of $\{0, \dots, n\}$. By Example 13.4.5 and Jensen's formula in 13.2.6, we have
\[ T_{\varphi\circ f, O(1)}(r) = \sum_{|v|<r}\max_K\log|\ell_K(f)|_v + \frac{1}{2\pi}\int_{|v|=r}\max_K\log|\ell_K(f)|_v\,d\theta + O(1). \tag{13.24} \]
For $|v| < r$, we use $|\ell_K(f)|_v \le 1$. For $|v| = r$, equation (13.22) implies
\[ \frac{1}{2\pi}\int_{|v|=r}\max_K\log|\ell_K(f)|_v\,d\theta \le \frac{1}{2\pi}\int_{|v|=r}\Bigl(\log\left|\frac{g_1\cdots g_q}{W(f_0, \dots, f_n)}\right|_v - \log|d_I|_v + \log|\lambda(g_{i_1}, \dots, g_{i_{n+1}})|_v\Bigr)d\theta. \tag{13.25} \]
Applying the lemma on the logarithmic derivative (see Lemma 13.2.23) to
\[ g_i^{(j)}/g_i = \bigl(g_i^{(j)}/g_i^{(j-1)}\bigr)\cdots\bigl(g_i'/g_i\bigr) \]
in (13.23), we get
\[ \frac{1}{2\pi}\int_{|v|=r}\log|\lambda(g_{i_1}, \dots, g_{i_{n+1}})|_v\,d\theta \le O(\log T_{f,H}(r)) + O(\log r) \tag{13.26} \]
for all $r$ outside of a set $E$ of finite Lebesgue measure. Here, we have to note that $T_{g_i} = O(T_{f,H})$ and similarly for the derivatives of $g_i$. Using (13.25) and (13.26) in (13.24), we get
\[ T_{\varphi\circ f, O(1)}(r) \le \frac{1}{2\pi}\int_{|v|=r}\log\left|\frac{g_1\cdots g_q}{W(f_0, \dots, f_n)}\right|_v d\theta + O(\log T_{f,H}(r)) + O(\log r) \]
for $r \notin E$. By Jensen's formula in 13.2.6, we deduce
\[ T_{\varphi\circ f, O(1)}(r) \le \sum_{|v|<r}\log\left|\frac{W(f_0, \dots, f_n)}{g_1\cdots g_q}\right|_v + O(\log T_{f,H}(r)) + O(\log r) \]
Conjecture 13.4.18. There is a closed algebraic subset $Z \ne X$ such that for any holomorphic map $f : \mathbb{C} \to X$ with $f(\mathbb{C}) \not\subset Z$, the estimate
mf,D (r) + Tf,KX (r) = O(log Tf,H (r)) + O(log r)
holds for all $r$ outside of a set of finite Lebesgue measure.
Remark 13.4.19. In the case $X = \mathbb{P}^n_{an}$ and $D = \sum_{j=1}^{q} H_j$, we compare the Griffiths conjecture with Cartan's second main theorem in 13.4.16. The assumption that $D$ has normal crossings means that the hyperplanes are in general position. For $H$, we may choose a hyperplane and we have $K_X = O(-(n + 1)H)$ (cf. [148], Example II.8.20.1). If we neglect the ramification term $N_{f,W}$, then the Griffiths conjecture matches with Cartan's second main theorem up to the exceptional set $Z$. P. Vojta [316] removed this discrepancy and proved that Griffiths's conjecture holds for $X = \mathbb{P}^n_{an}$ and $D$ a sum of hyperplanes in general position, with an exceptional set $Z$ equal to a finite union of proper linear subspaces.
Tg◦f,OP1 (1) (r) = Tf,g∗ OP1 (1) (r) + O(1) = O(Tf,H (r)) (13.30)
for every a ∈ C . The first main theorem in 13.4.9 proves Nf,[a] (r) > 0 for r
sufficiently large, hence every f is surjective if the genus is g = 1.
For genus g ≥ 2, KC is ample (see A.13.6, A.13.7). Theorem 13.4.23 implies that
Tf,[a] = O(log r). It follows from Proposition 13.2.17 that f induces an algebraic
morphism P1C → C , which is impossible by Hurwitz’s theorem in B.4.6.
Remark 13.4.25. The above results for curves may be proved without Nevanlinna techniques. To get non-trivial results in the case $g \ge 1$, we may prove an inequality similar to that of Theorem 13.4.23 for holomorphic maps $f : \{|z| \le R\} \to C$ and we get an upper bound for $R$ (cf. S. Lang and W. Cherry [174], p.93 or [64], §5.8).
Lemma 13.4.27. $N^{(1)}_{f,D}(r) \ge N_{f,D}(r) - N_{R_f}(r)$
Proof: This follows from f ∗ D − (f ∗ D)red ≤ Rf proved similarly as in Proposi-
tion B.4.7 (or just count zeros locally) and from Proposition 13.4.13.
\[ \frac{1}{\deg(p)}\Bigl(\sum_{p(y)=0}\mathrm{ord}_y(f^*D)\log r + \sum_{0<|p(y)|<r}\mathrm{ord}_y(f^*D)\log\left|\frac{r}{p(y)}\right|\Bigr) \]
Note the analogy with the normalizations in 1.3.6. They lead to the fact that the
characteristic function
is invariant under base change to a finite ramified covering of Y . Then the gener-
alization of Theorem 13.4.23 is
This is a special case of the second main theorem of P. Griffiths and J. King [131]
and W. Stoll [292] for parabolic coverings in the equidimensional setting analo-
gous to Remark 13.4.24. For a proof, we refer to [292], Th.18.13E.
Remark 13.4.31. This result indicates that NRp measures the dependence on the
covering p . However, to make sense of this statement, we should look for a second
main theorem uniform in f . This is done explicitly by W. Cherry (cf. [174] and
[63] for details) and his result implies that NRp indeed gives the contribution of
the covering p to the second main theorem.
\[ \mathrm{cond}\Bigl(r, \frac{f}{h}\cdot\frac{g}{h}\Bigr) = N^{(1)}(r, 0, fgh) \]
and the first claim follows from Theorem 13.2.39 applied to the relation $\frac{f}{h} + \frac{g}{h} = 1$. If $f, h$ are of finite order, then we have seen in 13.2.20 that $f/h$ has also finite order and the last claim follows as well.
For more details about Nevanlinna theory in one variable, we refer the reader to
the classic books of R. Nevanlinna [221] and W. K. Hayman [149]. If the reader
is interested in a finer analysis of the error term, he is referred to W. Cherry and Z.
Ye [64].
The foundations of the theory are given in R. Nevanlinna’s article [219]. He proved
the lemma on the logarithmic derivative to deduce his second main theorem. A
little later, his brother F. Nevanlinna gave a proof of the second main theorem
based on differential geometry and differential equations. Ahlfors simplified this proof further and expanded its geometric interpretation leading to the presentation in Section 13.3. In a breakthrough work [6], L. Ahlfors interpreted and extended Nevanlinna theory as a geometric theory of covering surfaces.
Cartan’s formula appears in H. Cartan [59]. In his thesis (see H. Cartan [60]), he
proved his second main theorem for hyperplanes.
References for Section 13.4 are the books of S. Lang [170], S. Lang and W. Cherry
[174] and M. Ru [249]. For a generalized abc-theorem in Nevanlinna theory sim-
ilar to Theorem 12.4.4, we refer to [249], Theorem A.3.2.6. The reader may also
consult, in connexion with Cartan’s version of Nevanlinna’s theory and its appli-
cations, the expository article by G.G. Gundersen and W. Hayman [142].
R. Nevanlinna asked for a second main theorem with moving targets, i.e. the con-
stants ai are replaced by meromorphic functions gi with log T (r, gi ) =
o(log T (r, f )). He treated the case of three targets by elementary means, using
a fractional linear transformation to reduce it to the constant case. The general
case remained open for a long time. The weaker form of the second fundamental
theorem without the contribution due to ramification was then obtained indepen-
dently by C.F. Osgood [232] and N. Steinmetz [290], see [249] for details and
further extensions. This was Vojta’s motivation for Roth’s theorem with moving
targets (see Section 6.5).
Finally, in a major paper K. Yamanoi [333] obtained a second fundamental theo-
rem with moving targets in full generality, with the expected contribution coming
from ramification. However, the error terms here are far weaker than those in
earlier works, because of the use of Ahlfors’s [6] geometric theory of covering
surfaces as a main tool.
14 THE VOJTA CONJECTURES
14.1. Introduction
Ch. Osgood [231], [232], was the first to observe, in his researches on diophantine
approximation in differential fields, that the corresponding Roth’s theorem in that
setting could be viewed as analogous to Nevanlinna’s second main theorem, with
the exponent 2 in Roth’s theorem and the coefficient 2 in 2 T (r, f ) having the
same significance. To P. Vojta, in his landmark Ph.D. thesis, goes the credit of
finding a solid connexion between classical diophantine geometry over number
fields and Nevanlinna theory, thereby leading to far-reaching conjectures, which
unified and motivated much further research, see [306], [307].
This final chapter is dedicated to the Vojta conjectures. They may be considered
as an arithmetic counterpart of the Nevanlinna theory discussed in Chapter 13
and of which the abc-conjecture, which was the subject of a detailed analysis in
Chapter 12, turns out to be an important special case.
The first two sections of this chapter develop Vojta’s dictionary establishing a par-
allel between diophantine approximation and Nevanlinna theory, leading to his
conjectures over number fields in Section 14.3. Schmidt’s subspace theorem and
the theorems of Roth, Siegel, and Faltings now appear as special cases of Vojta’s
conjectures without the ramification term. This lends support to the validity of Vo-
jta’s conjectures and also shows that the crux of the matter in attacking the general
case consists precisely in controlling the ramification. The next Section 14.4 con-
tains a generalization of the strong abc-conjecture to curves over number fields
and we show its equivalence to the Vojta conjecture with ramification for curves,
due to Elkies, van Frankenhuysen, and Vojta. In particular, the abc-conjecture
implies the Mordell conjecture. Section 14.5 deals with the analogue of the strong
abc-conjecture over function fields of characteristic 0, concluding with the general
abc-theorem over function fields due to Voloch and Brownawell-Masser.
As is already clear from the above, this chapter uses many results from previous
chapters. The reader is assumed to be familiar with the geometric theory of heights
from Chapter 2, with the abc-conjecture for integers and polynomials from Chapter 12,
and with the basic results from Nevanlinna theory in Chapter 13. We will
frequently use the results about ramification from Appendix B.
A reader puzzled by the fact that the set of places varies with r > 0 should identify
D(r) with the closed unit disk using the dilatation v → w := v/r and thus $f_r$
with $f_{(r)}(w) := f(rw)$. Then we have
\[
|f_{(r)}|_w =
\begin{cases}
|f(rw)| & \text{for } |w| = 1,\\
|w|^{\operatorname{ord}_w(f_{(r)})} = |w|^{\operatorname{ord}_{rw}(f)} & \text{for } 0 < |w| < 1,\\
r^{-\operatorname{ord}_w(f)} & \text{for } w = 0.
\end{cases}
\]
For w = 0, note that the normalization of the absolute value still depends on r
and not only on f(r) . This is the reason why it is better to consider the set of
places as variable. Ignoring the place w = 0 would lead to an additional error
term O(log r).
14.2.4. By the above analogy between normalized absolute values in number the-
ory and Nevanlinna theory, it is clear that the height h(β) corresponds to the char-
acteristic function T (r, f ). For a ∈ K , the analogue of the proximity function in
number theory is
\[ m_S(a,\beta) := -\sum_{v\in S} \log\min\{1, |\beta-a|_v\} = \sum_{v\in S} \log^{-}|\beta-a|_v \]
which was used on the left-hand side of Roth’s theorem in 6.2.3, where S is any
finite set of places containing the archimedean ones. The counting function in this
number theoretic setting would be
\[ N_S(a,\beta) := \sum_{v\notin S} \log^{-}|\beta-a|_v. \]
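To make the two definitions concrete, the following minimal sketch evaluates them over K = Q, assuming that the counting function sums over the places outside S and that log⁻ t = max(0, −log t). The function names (`log_abs`, `proximity_and_counting`, `height`) are ours, chosen for illustration only. For rational a ≠ β the two quantities add up to the height of β − a, which is the arithmetic analogue of the first main theorem.

```python
from fractions import Fraction
from math import isclose, log

def prime_divisors(n):
    """Primes dividing the non-zero integer n (trial division)."""
    n, p, found = abs(n), 2, set()
    while p * p <= n:
        if n % p == 0:
            found.add(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        found.add(n)
    return found

def log_abs(x, v):
    """log |x|_v for a non-zero rational x; v is a prime number or "inf"."""
    if v == "inf":
        return log(abs(x))
    num, den, k = x.numerator, x.denominator, 0
    while num % v == 0:
        num //= v
        k += 1
    while den % v == 0:
        den //= v
        k -= 1
    return -k * log(v)

def proximity_and_counting(a, beta, S):
    """Return (m_S(a, beta), N_S(a, beta)) for rationals a != beta."""
    x = Fraction(beta) - Fraction(a)
    places = {"inf"} | prime_divisors(x.numerator) | prime_divisors(x.denominator)
    # log^- |x|_v = max(0, -log |x|_v); it vanishes at all other places
    log_minus = {v: max(0.0, -log_abs(x, v)) for v in places}
    m = sum(t for v, t in log_minus.items() if v in S)
    N = sum(t for v, t in log_minus.items() if v not in S)
    return m, N

def height(x):
    """Logarithmic height of a rational number x."""
    x = Fraction(x)
    return log(max(abs(x.numerator), x.denominator))
```

For instance, with S = {"inf"}, a = 3 and β = 22/7 the whole contribution sits in the proximity function, whereas for a = 2/3 and β = 50/3 it sits in the counting function.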
In this section we translate the notions from the geometric part 13.4 of Nevanlinna
theory to projective varieties over a number field K . Griffiths’s conjecture leads us
to Vojta’s conjecture which would imply Roth’s theorem, Siegel’s theorem on in-
tegral points, Faltings’s theorem, as well as several other outstanding conjectures.
If we allow the points to vary in number fields of bounded degree, then the rami-
fication term analogous to Nevanlinna theory is given by the absolute logarithmic
discriminant. The consequences of Vojta’s conjecture with ramification will be
discussed in the next Section 14.4.
14.3.1. Let X be a projective variety over K and let s = sD be an invertible
meromorphic section of a line bundle L = O(D) with corresponding Cartier
divisor D . We choose a presentation of L (or more generally a bounded M -
metric) to get a local height λD (·, v) for every v ∈ MK . For a finite subset S of
MK (usually containing the archimedean primes), the counting function is
\[ N_{S,D}(P) := \sum_{w \mid v \in M_K \setminus S} \lambda_D(P, w) \]
Based on his analogies between Nevanlinna theory and diophantine geometry, dis-
cussed here in Section 14.2, Vojta translated Griffiths’s conjectural second main
theorem from 13.4.18 into the following
Conjecture 14.3.2. Let X be an irreducible smooth projective variety over K .
Let D be a divisor with normal crossings, let H be an ample line bundle and let
$K_X$ be the canonical line bundle on X . For any ε > 0 and any finite subset
S ⊂ $M_K$ , there is a closed subset Z ≠ X such that for all P ∈ X(K) \ Z , we
have
\[ m_{S,D}(P) + h_{K_X}(P) \le \varepsilon\, h_H(P) + O(1). \]
Here, D is said to have normal crossings if the base change to C has normal
crossings in the analytic sense of 13.4.17. The above inequality is called Vojta’s
height inequality.
Remark 14.3.3. Since Theorem 14.2.6 easily extends to the case ai ∈ K (see
below for a generalization), Proposition 14.2.7 shows that the case X = P1K in
Vojta’s conjecture above is equivalent to Roth’s theorem.
More generally, for X = PnK Schmidt’s subspace theorem yields the follow-
ing analogue of Vojta’s version of Cartan’s second main theorem (see Remark
13.4.19):
Theorem 14.3.4. Vojta’s conjecture in 14.3.2 holds for X = PnK and D a divisor
equal to a finite union of hyperplanes in general position defined over K . The
exceptional subset Z may be chosen as a finite union of linear subspaces defined
over K .
Proof: Since all quantities in Conjecture 14.3.2 are invariant under base change to
a larger number field, we may assume that all hyperplane components {Li = 0}
of D are defined over K . For x ∈ $\mathbb{P}^n_K(K)$, we have the local height
\[ -\log\big(|L_i(x)|_v/|x|_v\big), \qquad |x|_v := \max_j |x_j|_v, \]
relative to $\{L_i = 0\}$, and hence
\[ m_{S,D}(x) = -\sum_{v\in S}\sum_i \log\frac{|L_i(x)|_v}{|x|_v} + O(1). \]
By $K_{\mathbb{P}^n_K} \cong O_{\mathbb{P}^n_K}(-n-1)$ ([148], Example II.8.20.1), we easily deduce the claim
from Theorem 7.2.9.
Remark 14.3.5. As in Nevanlinna theory, Vojta’s conjecture in 14.3.2 is known
for curves X = C of genus g . Note that in this case, the exceptional set is finite
and hence may be omitted by enlarging the bound. We sketch the argument and
give additional explanations of the meaning of Vojta’s conjecture:
If g = 0, then we know that Conjecture 14.3.2 is equivalent to Roth’s theorem.
If g = 1 and D ≠ 0, then Vojta’s conjecture is a special case of the approximation
theorem for abelian varieties (cf. [277], §7.3) which is an intermediate step in the
standard proof of Siegel’s theorem on finiteness of S -integral points (see Remark
7.3.10 and Serre [277], §7.5). The latter claims that for a geometrically irreducible
smooth projective curve of genus g ≥ 1 and for a reduced divisor D ≠ 0 (or
g = 0 and |supp(D)| ≥ 3), the S -integral points of C \ supp(D) are finite. Note
that the complement of D is always affine ([148], Exercise IV.1.3).
In order to see that the Vojta height inequality directly implies Siegel’s theo-
rem, note that S -integral means that NS,D (P ) = O(1) and hence mS,D (P ) =
For |S| ≥ 2, again Dirichlet’s unit theorem implies easily that such points are
dense in P2K , in contradiction with Vojta’s height inequality.
Remark 14.3.7. A smooth projective variety X over K is called of general type
if there is a positive integer n , an ample line bundle L on X and an effective
divisor E on X with $K_X^{\otimes n} \cong L \otimes O(E)$ (cf. [307], §1.2). If X is of general
type, then Vojta’s conjecture, applied with D = 0 and L = H , implies
\[ (1-\varepsilon)\,h_H(P) + h_E(P) \le O(1). \]
By Proposition 2.3.9, we may assume hE (P ) ≥ 0. Then Northcott’s theorem in
2.4.9 shows that X(K) is not Zariski dense in X . This is the Bombieri–Lang
conjecture.
For any projective variety X over K , the special set $\mathrm{Sp}_X$ is defined as the Zariski
closure of the union of the images of all non-constant rational maps from
irreducible group varieties to X . Clearly, in the special set one may have infinitely
many K -rational points by considering the images of $1, g, g^2, \dots$ in X for a
K -rational point g of the group variety. Then the general Lang conjecture claims
that X is of general type if and only if $(X \setminus \mathrm{Sp}_X)(K')$ is finite for all finitely
generated extensions $K'/K$ (see [171], Ch.I, §3 for a further discussion).
Faltings proved this for a subvariety X of an abelian variety, which is Faltings’s
big theorem (see Theorem 11.10.1 for the number field case and [115] for the
general case). In this case the special set is the union of all translates of abelian
subvarieties of dimension ≥ 1 contained in XK (see [171], Ch.I, §6).
14.3.8. In Conjecture 14.3.2, we have only considered points rational in a fixed
number field K . But what happens if we allow P to vary over all $\overline{K}$-rational
points? The analogous situation in Nevanlinna theory was considered in Theorem
13.4.30. The additional effect of finite field extension was measured by the count-
ing function of the ramification divisor of the covering. In order to get an analogy
with the number field case, we use the language of schemes. If the reader is not
familiar with the latter, then he may pass directly to Definition 14.3.9.
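where Ω denotes the sheaf of relative differentials (see A.7.29) and ℓ is the length
(see B.4.4). So it is natural to define the ramification divisor of F/K by
\[ R_{F/K} = \sum_{\mathfrak{P}\in\operatorname{Spec}(O_F)} \ell(\Omega_{O_F/O_K,\mathfrak{P}})\,\mathfrak{P}. \]
By B.1.18, $\Omega_{O_F/O_K}$ is a principal $O_F$-module with annihilator equal to the
different $D_{F/K}$, hence
\[ \ell(\Omega_{O_F/O_K,\mathfrak{P}}) = v_{\mathfrak{P}}(D_{F/K}), \]
where $v_{\mathfrak{P}}$ is the discrete valuation associated to $\mathfrak{P}$ and where
$v_{\mathfrak{P}}(I) := \min\{v_{\mathfrak{P}}(a) \mid a \in I\}$ for any ideal I of $O_F$. Note that after localization
at $\mathfrak{P}$, it is always the valuation of a principal generator. Thus the counting function
of $R_{F/K}$ should be
\[ N_{R_{F/K}} = -\sum_{\mathfrak{P}\in\operatorname{Spec}(O_F)} \ell(\Omega_{O_F/O_K,\mathfrak{P}}) \log|\mathfrak{P}|_{v_{\mathfrak{P}}} = -\sum_{\mathfrak{P}\in\operatorname{Spec}(O_F)} \log|D_{F/K}|_{v_{\mathfrak{P}}}. \]
Since the norm $N_{F/K}$ of the different $D_{F/K}$ is the discriminant $d_{F/K}$ (see B.1.17),
Lemma 1.3.7 and the product formula lead to
\[ N_{R_{F/K}} = -\frac{1}{[F:K]}\sum_{\wp\in\operatorname{Spec}(O_K)} \log|d_{F/K}|_{\wp} = \frac{1}{[F:\mathbb{Q}]}\log|N_{K/\mathbb{Q}}\, d_{F/K}|. \]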
Definition 14.3.9. For a number field K , we define the absolute logarithmic
discriminant $d_K$ by
\[ d_K := \frac{1}{[K:\mathbb{Q}]}\log|D_{K/\mathbb{Q}}| \]
where DK/Q ∈ N is the discriminant (see B.1.14). The absolute logarithmic
discriminant of a point P in a K -variety is defined by d(P ) := dK(P ) .
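As a small illustration of Definition 14.3.9, the sketch below computes the absolute logarithmic discriminant of a quadratic field Q(√m). It relies on the classical formula for the discriminant of a quadratic field (m if m ≡ 1 mod 4, and 4m otherwise, for squarefree m); the function names are ours and serve only this example.

```python
from math import isclose, log

def is_squarefree(m):
    """True if no square of a prime divides the integer m."""
    m, p = abs(m), 2
    while p * p <= m:
        if m % (p * p) == 0:
            return False
        p += 1
    return True

def quadratic_discriminant(m):
    """Discriminant of Q(sqrt(m)) for a squarefree integer m different from 0 and 1."""
    if m in (0, 1) or not is_squarefree(m):
        raise ValueError("m must be squarefree and different from 0 and 1")
    # Python's % returns a value in {0, 1, 2, 3} also for negative m
    return m if m % 4 == 1 else 4 * m

def abs_log_discriminant_quadratic(m):
    """d_K = log|D_{K/Q}| / [K:Q] for K = Q(sqrt(m)), where [K:Q] = 2."""
    return log(abs(quadratic_discriminant(m))) / 2
```

Since the discriminant of Q is 1, we have d_Q = 0, so for instance the value log(4)/2 = log 2 obtained for Q(i) is also the difference d_F − d_K for F = Q(i) and K = Q.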
Proposition 14.3.10. Let F/K be a finite extension of number fields. Then
\[ 0 \le d_F - d_K = \frac{1}{[F:\mathbb{Q}]}\log|N_{K/\mathbb{Q}}\, d_{F/K}| = -\frac{1}{[F:K]}\sum_v \log|d_{F/K}|_v, \]
where v ranges over all non-archimedean places of $M_K$ .
Proof: Recall that |dF/K |v is the absolute value of a principal generator of the
localization of dF/K in the prime ideal corresponding to v . It follows from the
approximation theorem in 1.2.13 that we may choose the same principal generator
for a finite set of places. Then the claim follows from Proposition B.1.19 and
Lemma 1.3.7.
From 14.3.8, we get $N_{R_{F/K}} = d_F - d_K$ . The analogy with Theorem 13.4.30
leads to Vojta’s conjecture with ramification.
Conjecture 14.3.11. Let X, D, H, S, ε be as in Conjecture 14.3.2 and let d ∈ N .
Then there is a closed subset Z ≠ X such that for all P ∈ X \ Z with [K(P ) :
K] ≤ d we have
\[ m_{S,D}(P) + h_{K_X}(P) - d(P) \le \varepsilon\, h_H(P) + O(1). \]
Remark 14.3.12. Even for C = P1Q , Vojta’s conjecture with ramification is un-
known. The case of curves and its relations to the abc-conjecture are studied in
the next section.
A natural thing to ask is what form the abc-conjecture should take over a num-
ber field K . Also, the question arises whether there is any special feature in the
structure of the equation a + b = c , and what may be an appropriate higher-
dimensional generalization of it. To this end, we may view the abc-conjecture as
a statement about points on the model x + y = 1, x, y ≠ 0, 1 of the affine curve
P1 \ {0, 1, ∞}. There is nothing special about this particular model, for example
we could have worked instead with the affine curve x(x − 1)y = 1. What matters
here is that the abc-conjecture is a statement about ramification, both arithmetic
and geometric.
In this section, C denotes an irreducible smooth projective curve of genus g de-
fined over K . Let D be an effective reduced divisor on C with local height λ .
Then we will formulate a conjecture on C for D and λ similar to the truncated
second main theorem in Nevanlinna theory. The case D = [0]+[1]+[∞] will give
us the desired generalization of the strong abc-conjecture to number fields. Based
on the work of Elkies and Vojta, we will show that this strong abc-conjecture, the
conjectural truncated second main theorem for arbitrary C , and Vojta’s conjecture
with ramification for curves, are all equivalent.
14.4.1. Clearly, the left-hand side max{|a|, |b|, |c|} of the abc-conjecture in 12.2.2
corresponds logarithmically to the height hλ on C . So we have to deal with
\[ \log\operatorname{rad}(abc) = \sum_{p \mid abc} \log|1/p|_p. \]
For every natural prime p , there is a contribution log p if the local height of the
point (a : b : c) of the curve x + y = z is non-zero. This leads to the follow-
ing generalization of the radical. For a precise explanation, we refer to Example
14.4.4.
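The identity between the logarithm of the radical and the sum of the local contributions log |1/p|_p = log p can be checked directly. The following is a minimal sketch using trial division; the helper names `prime_divisors`, `radical` and `log_radical_via_places` are ours.

```python
from math import isclose, log, prod

def prime_divisors(n):
    """Distinct primes dividing the non-zero integer n (trial division)."""
    n, p, found = abs(n), 2, []
    while p * p <= n:
        if n % p == 0:
            found.append(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        found.append(n)
    return found

def radical(n):
    """Product of the distinct primes dividing n."""
    return prod(prime_divisors(n))

def log_radical_via_places(a, b, c):
    """Sum of log|1/p|_p = log p over the primes p dividing abc."""
    return sum(log(p) for p in prime_divisors(a * b * c))
```

For the triple 1 + 8 = 9 the radical is 6, and for 2 + 3^10 · 109 = 23^5 it is 2 · 3 · 109 · 23 = 15042.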
where v ∈ $M^{\mathrm{fin}}_{K(P)}$ denotes the discrete valuations of K(P ) with local
parameters $\pi_v$ and
\[ \chi(t) = \begin{cases} 0 & \text{if } t \le 0,\\ 1 & \text{if } t > 0. \end{cases} \]
Remark 14.4.3. Note that the conductor is completely analogous to the truncated
counting function in Nevanlinna theory from 13.4.26. It also makes sense for
non-reduced effective divisors. The following example shows how to recover the
special conductor for the simple-minded formulation of the abc-conjecture we
have considered in Chapter 12.
Example 14.4.4. Consider the case in which C = $\mathbb{P}^1_K$ and D = [0] + [1] + [∞].
Let $(x_0 : x_1)$ be standard homogeneous coordinates on $\mathbb{P}^1_K$ , which we view as
global sections of $O_{\mathbb{P}^1}(1)$. Then D has a presentation
\[ D = \big(x_0 x_1 (x_0 - x_1);\ O_{\mathbb{P}^1}(3),\ x_0^3,\ x_0^2 x_1,\ x_0 x_1^2,\ x_1^3;\ O_{\mathbb{P}^1},\ 1\big) \]
with associated height
\[ h_D(P) = 3h(P), \]
where h(P ) is the standard projective height in $\mathbb{P}^1_K$ . Moreover, $\Omega^1_{\mathbb{P}^1} \cong O_{\mathbb{P}^1}(-2)$
(use A.13.6), thus there is a presentation K of a canonical divisor such that
\[ h_K(P) = -2h(P). \]
It follows that
\[ h_D(P) + h_K(P) = h(P). \]
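The local height function $\lambda(P, v) := \lambda_D(P, v)$ at v ∈ $M_{K(P)}$ is given by
\[ \lambda(P,v) = \max_{i=0,\dots,3} \log\left|\frac{x_0^i x_1^{3-i}}{x_0 x_1 (x_0 - x_1)}\right|_v = \max\left\{ \log\left|\frac{x_0^2}{x_1(x_0-x_1)}\right|_v,\ \log\left|\frac{x_1^2}{x_0(x_0-x_1)}\right|_v \right\}. \]
If $x_0 = c$, $x_1 = a$, $x_0 - x_1 = b$, where $O_{K(P)}a$, $O_{K(P)}b$, $O_{K(P)}c$ are coprime
ideals, which is always possible if $O_{K(P)}$ is a principal ideal domain, then a + b =
c and
\[ \lambda(P,v) = \max\left\{ \log\left|\frac{c^2}{ab}\right|_v,\ \log\left|\frac{a^2}{cb}\right|_v \right\}, \]
therefore
\[ \operatorname{cond}^K_\lambda(P) = \sum_{v(abc)>0} \log|1/\pi_v|_v. \]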
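Another way of writing this conductor, which works in any case, is as follows. For
$x = x_1/x_0$ , let
\[ \operatorname{cond}^K_{[0]}(x) = \sum_{v(x)>0} \log|1/\pi_v|_v, \]
\[ \operatorname{cond}^K_{[1]}(x) = \sum_{v(1-x)>0} \log|1/\pi_v|_v, \]
\[ \operatorname{cond}^K_{[\infty]}(x) = \sum_{v(1/x)>0} \log|1/\pi_v|_v. \]
Clearly, they correspond to conductors of the divisors [0], [1] and [∞] on $\mathbb{P}^1_K$ .
We have
\[ \operatorname{cond}^K_\lambda(P) = \operatorname{cond}^K_{[0]}(x(P)) + \operatorname{cond}^K_{[1]}(x(P)) + \operatorname{cond}^K_{[\infty]}(x(P)). \]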
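For K = Q and a rational point with affine coordinate x = a/c in lowest terms, the three conductors are sums of log p over the primes dividing a, b = c − a and c respectively, so that their sum is log rad(abc). The following sketch, with function names of our own choosing and restricted to rational x different from 0 and 1, illustrates this.

```python
from fractions import Fraction
from math import isclose, log

def prime_divisors(n):
    """Distinct primes dividing the non-zero integer n (trial division)."""
    n, p, found = abs(n), 2, []
    while p * p <= n:
        if n % p == 0:
            found.append(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        found.append(n)
    return found

def conductor(y):
    """Sum of log p over the primes p with v_p(y) > 0, for a non-zero rational y.

    For y in lowest terms, v_p(y) > 0 exactly when p divides the numerator.
    """
    return sum(log(p) for p in prime_divisors(Fraction(y).numerator))

def conductors(x):
    """Return (cond_[0](x), cond_[1](x), cond_[inf](x)) for rational x not in {0, 1}."""
    x = Fraction(x)
    return conductor(x), conductor(1 - x), conductor(1 / x)
```

For x = 1/9, corresponding to 1 + 8 = 9, the three values are 0, log 2 and log 3, with sum log 6 = log rad(1 · 8 · 9).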
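\[ \operatorname{cond}^K_\lambda(P) - \operatorname{cond}^K_{\lambda'}(P') = \sum_{v\in M^{\mathrm{fin}}_{K(P)}} \chi(\lambda(P,v)) \sum_{w\mid v}\Big(1 - \frac{1}{e_{w/v}}\Big)\log|1/\pi_v|_w, \tag{14.6} \]
where w ranges over $M^{\mathrm{fin}}_{K(P')}$ and $e_{w/v}$ is the ramification index. This proves (a).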
− log |d |
K(P )/K(P ) v = 1 + log |1/πv |w .
[K(P ) : K(P )] ew/v
w|v
where
\[ C_1 = -\frac{1}{[K(P'):K(P)]}\sum_v \log|d_{K(P')/K(P)}|_v \]
with v ranging over all non-archimedean places of K(P ) with λ(P, v) ≤ 0 and
\[ C_2 = \sum_{v\in M^{\mathrm{fin}}_{K(P)}} \sum_{w\mid v} v(e_{w/v}) \log|1/\pi_v|_w. \]
p be the natural prime with v|p and let $v_p$ be the corresponding discrete valuation
normalized by $v_p(p) = 1$. Using
\[ v(e_{w/v}) = e_{v/p}\, v_p(e_{w/v}) \le e_{v/p} \log[K(P'):K(P)]/\log p, \]
we get
\[ \sum_{v\mid p}\sum_{w\mid v} v(e_{w/v}) \log|1/\pi_v|_w \le \frac{\log[K(P'):K(P)]}{\log p} \sum_{w\mid p} \log|1/p|_w = \log[K(P'):K(P)], \]
where the last step was done by appealing to Lemma 1.3.7. Let $S_0$ be the set of
natural primes p such that $K(P')/K(P)$ is wildly ramified at a place over p . For
every p ∈ $S_0$ , Example 1.4.12 shows
\[ p \mid e_{w/v} \le [K(P'):K(P)] \le \deg(\varphi), \]
hence the cardinality $|S_0|$ is bounded by a constant depending on deg(ϕ). We
conclude
\[ C_2 \le |S_0| \log\deg(\varphi) = O(1). \]
It remains to bound C1 . By Proposition 14.4.5, we may assume λ ≥ 0. Let M be
the set of places of K represented by the extensions of | |p , p ∈ MQ . It is easy to
show that E u := {x ∈ C \ supp(D) | λ(x, u) = 0}, u ∈ M , is an M -bounded
family in C \ supp(D) (use Proposition 2.6.17). The Chevalley–Weil theorem in
the form of Theorem 10.3.5 gives the existence of a non-zero α ∈ Z such that
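\[ \alpha \in d_{K(P')_w/K(P)_v} \]
for all $P' \in C' \setminus \operatorname{supp}(D')$ and w ∈ $M^{\mathrm{fin}}_{K(P')}$ with $\lambda'(P', w) = 0$. By Corollary
Therefore
\[ C_1 \le -\sum_{v\in M^{\mathrm{fin}}_{K(P)}} \log|\alpha|_v = \log|\alpha| \]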
by Lemma 1.3.7 and the product formula. This finishes the proof of (c).
on C \ supp(D).
Proof: This is obvious from |πv |v ≥ |p|v and Lemma 1.3.7.
Proposition 14.4.9. Let λ be a local height relative to the effective divisor D on
C . Then
\[ \operatorname{cond}^K_\lambda(P) \le h_\lambda(P) + O(1) \]
Here, as in the other conjectures of this chapter, in the bound O[K(P ):K] (1) which
may depend on ε and a whole complex of other data, it is only the dependence on
the point P which matters here and we give it in terms of the degree [K(P ) : K].
14.4.11. If we specialize to the case C = $\mathbb{P}^1_K$ and D = [0] + [1] + [∞], then
$K_C \cong O_{\mathbb{P}^1}(-2)$ (cf. A.13.6) and hence
\[ h_D(P) + h_{K_C}(P) = h(P) + O(1). \]
By Example 14.4.4, we get the strong abc-conjecture of Elkies generalizing the
strong abc-conjecture in 12.2.2 to number fields:
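Conjecture 14.4.12. For every ε > 0, we have
\[ (1-\varepsilon)\,h(x) \le \operatorname{cond}^{\mathbb{Q}}_{[0]}(x) + \operatorname{cond}^{\mathbb{Q}}_{[1]}(x) + \operatorname{cond}^{\mathbb{Q}}_{[\infty]}(x) + d_{\mathbb{Q}(x)} + O_{[\mathbb{Q}(x):\mathbb{Q}]}(1) \]
for all $x \in \overline{\mathbb{Q}} \setminus \{0, 1\}$, where $O_{[\mathbb{Q}(x):\mathbb{Q}]}(1)$ depends only on ε and [Q(x) : Q].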
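For rational x the discriminant term vanishes, and the conjecture compares the height h(x) with log rad(abc), where x = a/c in lowest terms and b = c − a. The sketch below computes the ratio of the two sides for a given rational x; the name `quality` and the restriction to rational x are our choices for the example, and a finite computation of this kind of course says nothing about the O(1)-term.

```python
from fractions import Fraction
from math import log

def prime_divisors(n):
    """Distinct primes dividing the non-zero integer n (trial division)."""
    n, p, found = abs(n), 2, []
    while p * p <= n:
        if n % p == 0:
            found.append(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        found.append(n)
    return found

def height(x):
    """Logarithmic height of the rational number x."""
    x = Fraction(x)
    return log(max(abs(x.numerator), x.denominator))

def conductor_sum(x):
    """cond_[0](x) + cond_[1](x) + cond_[inf](x) = log rad(abc) for x = a/c, b = c - a."""
    x = Fraction(x)
    a, c = x.numerator, x.denominator
    return sum(log(p) for p in prime_divisors(a * (c - a) * c))

def quality(x):
    """Ratio h(x) / log rad(abc) for a rational x different from 0 and 1."""
    return height(x) / conductor_sum(x)
```

For x = 2/23^5, coming from 2 + 3^10 · 109 = 23^5, the ratio is approximately 1.63, whereas for x = 1/9 it is about 1.23.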
We will show that both conjectures 14.4.10 and 14.4.12 are equivalent to
Conjecture 14.3.11 restricted to curves. For the latter, the exceptional set Z is
finite and may be omitted by enlarging the O(1)-term. Hence Vojta’s conjecture
with ramification for curves reads as:
Conjecture 14.4.13. Let S be a finite subset of MK . With the same hypothesis
as in Conjecture 14.4.10, the estimate
\[ m_{S,D}(P) + h_{K_C}(P) \le d(P) + \varepsilon\, h_H(P) + O_{[K(P):K]}(1) \]
holds for all P ∈ C \ supp(D).
14.4.14. Vojta’s idea to deduce the strong abc-conjecture from his conjecture is
that, by passing to a finite covering $\pi : C' \to C$, we may improve the height
inequality in 14.4.13. In fact, Conjecture 14.4.13 applied to $C'$ and $D' := \pi^*(D)_{\mathrm{red}}$
(namely, the sum of the irreducible components) gives
\[ m_{S,D'}(P') + h_{K_{C'}}(P') - d(P') \le \varepsilon\, h_{\pi^*H}(P') + O_{[K(P'):K]}(1) \tag{14.8} \]
for all $P' \notin \operatorname{supp}(D')$. Now $K_{C'} \cong \pi^*K_C + R_\pi$ from Theorem B.4.5 implies
\[ h_{K_{C'}}(P') = h_{K_C}(P) + m_{S,R_\pi}(P') + N_{S,R_\pi}(P') + O(1) \tag{14.9} \]
for $P := \pi(P')$. By Proposition B.4.7, we have $D' \ge \pi^*(D) - R_\pi$ and hence
\[ m_{S,D'}(P') \ge m_{S,D}(P) - m_{S,R_\pi}(P') + O(1). \tag{14.10} \]
By (14.9) and (14.10) in (14.8), we get
\[ m_{S,D}(P) + h_{K_C}(P) + N_{S,R_\pi}(P') - d(P') \le \varepsilon\, h_H(P) + O_{[K(P'):K]}(1). \]
This leads to the improvement $N_{S,R_\pi}(P') + d(P) - d(P')$ on the left-hand side
of the original Vojta height inequality on C . Moreover, by a Chevalley–Weil type
argument, this improvement is always bounded from below ([307], Th.5.1.6), but
we will not need this result here.
The implication (a) ⇒ (b) follows from an argument of Elkies [98] which was
elaborated by M. van Frankenhuysen [305]. The claims (b) ⇒ (c) ⇒ (d) are
trivial and (d) ⇒ (a) is due to P. Vojta (see [307], [317]).
Proof: (a) ⇒ (b): Let C be an irreducible smooth projective curve over the
number field K with local height λ relative to the reduced divisor D . The proof
is based on Elkies’s idea of using a Belyı̆ function f : C → P1K for D , in other
words with supp(D) ⊂ f −1 {0, 1, ∞} and unramified outside of f −1 {0, 1, ∞}
(see Lemma 12.2.7). We use D0 = [0] + [1] + [∞], D = f ∗ (D0 ), D1 = Dred
Note that only the dependence on P is indicated in the bound, which may depend
on K as well. By Proposition 14.4.9 and $h(f(P)) = h_{f^*O_{\mathbb{P}^1}(1)}(P) + O(1)$, we
deduce
\[ (1-\varepsilon')\,h_{f^*O_{\mathbb{P}^1}(1)}(P) \le \operatorname{cond}^K_\lambda(P) + h_{D_2}(P) + d(P) + O_{[K(P):K]}(1). \tag{14.13} \]
We have proved this only for $P \notin \operatorname{supp}(D_1)$, but by increasing the constants we
may assume that it holds for all $P \notin \operatorname{supp}(D)$. By Theorem B.4.5 and (14.12),
we have
\[ K_C \cong f^*K_{\mathbb{P}^1} \otimes O(R_f) \cong f^*O_{\mathbb{P}^1}(1) \otimes O(-D_1). \]
By this equation, inequality (14.13), and Theorem 2.3.8, we get
\[ (1-\varepsilon')\big(h_{K_C}(P) + h_{D_1}(P)\big) \le \operatorname{cond}^K_\lambda(P) + h_{D_2}(P) + d(P) + O_{[K(f(P)):K]}(1). \]
Finally, $D = D_1 - D_2$ leads to
\[ h_{K_C}(P) + h_D(P) - \varepsilon'\big(h_{K_C}(P) + h_{D_1}(P)\big) \le \operatorname{cond}^K_\lambda(P) + d(P) + O_{[K(P):K]}(1). \]
\[ \operatorname{cond}^K_\lambda(P) \le N_{S,D}(P) + O(1) \]
easily deduced from Proposition 14.4.8.
(c) ⇒ (d) is trivial.
(d) ⇒ (a): Let $P \in \mathbb{P}^1_{\mathbb{Q}} \setminus \{0, 1, \infty\}$ with affine coordinate $x = x(P) \in \overline{\mathbb{Q}}$. We
identify $\mathbb{P}^1_{\mathbb{Q}}$ with $C := C_1$ and we consider the covering $\pi : C' \to C$ of Fermat
curves with $C' := C_n$ for suitable n ≥ 1 (see Example 14.4.15). We choose
$P' \in C'$ with $P = \pi(P')$. For D := [0] + [1] + [∞] and $D' := \pi^{-1}\{0, 1, \infty\}$,
Example 14.4.15 yields easily $\pi^*D = nD'$ and hence $R_\pi = (n-1)D'$ proves
\[ N_{S,R_\pi}(P') = \Big(1 - \frac{1}{n}\Big) N_{S,\pi^*D}(P') + O(1) = \Big(1 - \frac{1}{n}\Big) N_{S,D}(P) + O(1). \]
Applying (14.11) on page 495 to the covering π , we get
\[ h_D(P) + h_{K_C}(P) - d(P) \le \operatorname{cond}^{\mathbb{Q}}_\lambda(P) + \frac{1}{n} N_{S,D}(P) + \varepsilon' h_H(P) + O_{[\mathbb{Q}(P):\mathbb{Q}]}(1). \]
By Proposition 2.3.9, we have $N_{S,D}(P) \le h_D(P) + O(1)$ and hence
\[ \Big(1 - \frac{1}{n}\Big) h_D(P) + h_{K_C}(P) - \varepsilon' h_H(P) \le \operatorname{cond}^{\mathbb{Q}}_\lambda(P) + d(P) + O_{[\mathbb{Q}(P):\mathbb{Q}]}(1). \]
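Since C = $\mathbb{P}^1_{\mathbb{Q}}$ , $K_C \cong O_{\mathbb{P}^1}(-2)$ (cf. A.13.6), $H = O_{\mathbb{P}^1}(1)$ with $h_H$ the standard
height and Example 14.4.4, we get
\[ \Big(1 - \frac{3}{n} - \varepsilon'\Big) h(P) \le \operatorname{cond}^{\mathbb{Q}}_{[0]}(x(P)) + \operatorname{cond}^{\mathbb{Q}}_{[1]}(x(P)) + \operatorname{cond}^{\mathbb{Q}}_{[\infty]}(x(P)) + d(P) + O_{[\mathbb{Q}(P):\mathbb{Q}]}(1) \]
for all $P \in \mathbb{P}^1_{\mathbb{Q}} \setminus \{0, 1, \infty\}$ with affine coordinate x(P ). If we choose ε′ = ε/2
and n ≥ 6/ε , we get (a).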
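The final bookkeeping can be checked mechanically. With h_D = 3h, h_{K_C} = −2h and h_H = h, the left-hand side has coefficient (1 − 1/n) · 3 − 2 − ε′ = 1 − 3/n − ε′, and the choice ε′ = ε/2, n ≥ 6/ε makes it at least 1 − ε. The sketch below verifies this with exact rational arithmetic; the function names are ours.

```python
from fractions import Fraction
from math import ceil

def fermat_cover_parameters(eps):
    """Return (eps', n) with eps' = eps/2 and n the least integer >= 6/eps."""
    eps = Fraction(eps)
    return eps / 2, ceil(6 / eps)

def height_coefficient(eps_prime, n):
    """Coefficient of h(P): (1 - 1/n) * 3 - 2 - eps', using h_D = 3h, h_K = -2h, h_H = h."""
    return (1 - Fraction(1, n)) * 3 - 2 - eps_prime
```

For example, ε = 1/10 gives ε′ = 1/20 and n = 60, so that the coefficient equals 1 − 1/20 − 1/20 = 1 − ε.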
Remark 14.4.17. If we are only interested in x ∈ K \ {0, 1} for a fixed number
field K , then dependence on K plays no role and the strong abc-conjecture in
14.4.12 and Corollary 14.4.7 imply the K -rational abc-conjecture
\[ (1-\varepsilon)\,h(x) \le \operatorname{cond}^K_{[0]}(x) + \operatorname{cond}^K_{[1]}(x) + \operatorname{cond}^K_{[\infty]}(x) + O(1). \]
The proof of Theorem 14.4.16 can be adapted to show that the K -rational abc-
conjecture implies the K -rational version of Conjecture 14.4.10, namely
\[ h_D(P) + h_{K_C}(P) \le \operatorname{cond}^K_\lambda(P) + \varepsilon\, h_H(P) + O(1) \]
By Remark 14.3.5, the same argument proves the following result of Elkies [98]:
Theorem 14.4.19. The K -rational abc-conjecture implies Faltings’s theorem in
11.1.1 for the number field K .
Remark 14.4.20. The proof we have given actually shows that an effective ver-
sion of the K -rational abc-conjecture implies effective versions of Roth’s and
Faltings’s theorems.
We have seen in Example 12.4.1 that the abc-conjecture holds for complex poly-
nomials. In this section, we extend this to a function field K of characteristic 0
proving the theorem of Stothers and Mason. But first, we transfer the results of
the last section to the case of function fields of characteristic 0. Then we prove
the abc-conjecture and also the Vojta height inequality in the split function field
case where no ε -term is necessary. In this situation, we can also transfer Car-
tan’s second main theorem with similar arguments as in Nevanlinna theory. As
a corollary, we obtain the result of Voloch and Brownawell-Masser bounding the
non-degenerate solutions of the unit equation with several summands.
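In its classical polynomial form, the theorem of Stothers and Mason states that for coprime polynomials a + b = c over a field of characteristic 0, not all constant, the maximum of the degrees is at most the number of distinct roots of abc minus 1. The following sketch checks this on an example with exact arithmetic over Q. Polynomials are represented as coefficient lists from degree 0 upwards, and all function names are ours; the number of distinct roots is computed as deg p − deg gcd(p, p′), which is valid in characteristic 0.

```python
from fractions import Fraction

def trim(p):
    """Drop trailing zero coefficients (coefficients are listed from degree 0 up)."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def deg(p):
    """Degree of p; the zero polynomial gets degree -1."""
    return len(trim(p)) - 1

def derivative(p):
    return trim([k * c for k, c in enumerate(p)][1:])

def poly_mod(a, b):
    """Remainder of a modulo the non-zero polynomial b, over Q."""
    a = [Fraction(c) for c in trim(a)]
    b = [Fraction(c) for c in trim(b)]
    while len(a) >= len(b):
        factor, shift = a[-1] / b[-1], len(a) - len(b)
        for i, c in enumerate(b):
            a[i + shift] -= factor * c
        a = trim(a)  # the leading coefficient is now exactly zero
    return a

def poly_gcd(a, b):
    """A greatest common divisor of a and b (Euclidean algorithm, not normalized)."""
    a, b = trim(a), trim(b)
    while b:
        a, b = b, poly_mod(a, b)
    return a

def poly_mul(a, b):
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return trim(out)

def distinct_roots(p):
    """Number of distinct roots in an algebraic closure: deg p - deg gcd(p, p')."""
    p = trim(p)
    return deg(p) - deg(poly_gcd(p, derivative(p)))
```

For a = (t² − 1)², b = 4t², c = (t² + 1)² we have a + b = c, the degrees are 4, 2, 4, and abc has the five distinct roots 0, ±1, ±i, so the bound 4 ≤ 5 − 1 is attained.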
14.5.1. In this section, K = k(B) denotes a function field of an irreducible pro-
jective variety B over a field k of characteristic zero. We assume that B is regular
in codimension 1 and we fix an ample class c on B .
Let MK be the set of prime divisors on B . We recall from Proposition 1.4.7 that
the discrete absolute values
\[ |f|_v := e^{-\deg_c(v)\operatorname{ord}_v(f)} \qquad (f \in K,\ v \in M_K) \]
satisfy the product formula.
14.5.2. By Lemma 1.4.10, every finite extension F/K is a function field F =
k(Y ) for a variety Y regular in codimension 1 and a finite morphism p : Y →
B . We have seen in Remark 1.4.11 that the set of places MF is independent
of the choice of the model Y and the argument shows that two such models are
isomorphic outside subsets of codimension at least 2. In fact, we may choose the
normalization of B in F as a canonical model. By Example 1.4.13, the absolute
values on MF are normalized according to 1.3.6 by
\[ |f|_w := e^{-\deg_{p^*c}(w)\operatorname{ord}_w(f)/[F:K]} \qquad (f \in F,\ w \in M_F). \]
14.5.3. In what follows, we assume that the reader is familiar with Appendix B.4.
In analogy to 14.3.8, we define the counting function of the ramification divisor
$R_p$ by
\[ N_{R_p} = -\sum_{v\in M_F} \ell(\Omega_{Y/B,v}) \log|\pi_v|_v. \]
Note that $\pi_v$ is a local parameter of v , hence
\[ N_{R_p} = \frac{1}{[F:K]}\deg_{p^*c}(R_p). \]
Let KY (resp. KB ) be the canonical line bundle on the smooth part Yreg (resp.
Breg ). Since the complement of the smooth part has codimension at least 2, the
corresponding canonical divisors are well-defined in the Chow groups by pass-
ing to the Zariski closures of the components, hence their degrees are also well-
defined. By Theorem B.4.5 and projection formula (A.13) on page 558, we get
\[ N_{R_p} = \frac{1}{[F:K]}\big(\deg_{p^*c}(K_Y) - \deg_{p^*c}(p^*K_B)\big) = \frac{\deg_{p^*c}(K_Y)}{[F:K]} - \deg_c(K_B). \]
This suggests the following analogue of the absolute logarithmic discriminant
\[ d_F := \frac{\deg_c(K_Y)}{[F:K]}. \]
By the arguments in 14.5.2, all these quantities do not depend on the choice of Y .
Example 14.5.4. If B is a geometrically irreducible smooth projective curve and
c is the equivalence class of a point, then A.13.6 shows that dF = (2g(Y ) −
2)/[F : K], where g(Y ) is the genus of Y .
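As a numerical illustration, consider B = P¹ over an algebraically closed field k of characteristic 0 and the hyperelliptic extension F = K(√f) for a squarefree polynomial f of degree at least 1. The sketch below uses the classical genus formula g = ⌊(deg f − 1)/2⌋ and checks that d_F − d_K equals half the number of branch points, in analogy with Proposition 14.3.10. The function names and the restriction to this special case are our assumptions.

```python
from fractions import Fraction

def hyperelliptic_genus(deg_f):
    """Genus of the curve y^2 = f(t) for squarefree f of degree deg_f >= 1."""
    return (deg_f - 1) // 2

def branch_points(deg_f):
    """Number of branch points of t: the roots of f, plus infinity if deg_f is odd."""
    return deg_f if deg_f % 2 == 0 else deg_f + 1

def log_discriminant(genus, degree):
    """d_F = (2g(Y) - 2) / [F:K] as in Example 14.5.4."""
    return Fraction(2 * genus - 2, degree)
```

Here d_K = −2 for K = k(t), and for an elliptic curve y² = f(t) with deg f = 3 we get d_F = 0, so d_F − d_K = 2, which is half of the four branch points.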
Definition 14.5.5. For a complete variety X over K with any local height λ
relative to a Cartier divisor D and for a finite subset S ⊂ MK , the counting
function NS,D and the proximity function mS,D are defined as in 14.3.1. By
14.5.3, we define d(P ) := dK(P ) for any P ∈ X .
If D is effective, then we define the conductor $\operatorname{cond}^K_\lambda$ as in Definition 14.4.2,
where the sum now ranges over all $M_{K(P)}$ .
Example 14.5.6. A point $P = (f_0 : \cdots : f_n) \in \mathbb{P}^n_k(F)$ induces a rational map
$f_P : Y \dashrightarrow \mathbb{P}^n_k$ defined over k . By Example 2.4.11, the standard height satisfies
\[ h(P) = \frac{1}{[F:K]}\deg_{p^*c} f_P^*H \]
for every hyperplane H of $\mathbb{P}^n_k$ with P ∉ H .
Example 14.5.7. More generally, we consider a complete variety X with a Cartier
divisor D defined over the constant field k . Then P ∈ X(F ) \ supp(D)(F )
induces a rational map $f_P : Y \dashrightarrow X$ (locally defined as in Example 14.5.6), but
note that $f_P$ is intrinsically defined because X is defined over k . We claim that
\[ \lambda(P,v) = \frac{1}{[F:K]}\operatorname{ord}_v(f_P^*D)\deg_{p^*c}(v) \qquad (v \in M_F) \]
We leave the details to the reader. By the way, we may easily extend these results
to higher-dimensional complete varieties X , but we do not need that in the sequel.
14.5.10. We may pose the conjectures 14.4.10 and 14.4.13 also in the function
field case. However, as no Belyı̆ function exists in this case, we are unable to show
that the strong abc-conjecture implies the other conjectures. As a substitute for
the Fermat coverings in 14.4.15, we will use the following result:
Lemma 14.5.11. Let C be an irreducible smooth projective curve over K and let
n ∈ N\{0}. Suppose that we have disjoint non-zero rationally equivalent reduced
divisors $D_1$ , $D_2$ on C . Then there is an irreducible smooth projective curve $C'$
over K and a finite morphism $\pi : C' \to C$ of degree n which has ramification
divisor $R_\pi = (n-1)D'$ for $D' := (\pi^*(D_1 + D_2))_{\mathrm{red}}$ .
Proof: The argument works for any field K of characteristic zero. There is f ∈
K(C) \ {0} with
\[ \operatorname{div}(f) = D_1 - D_2. \]
Let $C'$ be the irreducible smooth projective curve with function field $K(C)(\sqrt[n]{f})$
and let $\pi : C' \to C$ be the natural finite morphism induced by the extension of
function fields (use Lemma 1.4.10). First note that
\[ \deg(\pi) = [K(C'):K(C)] \le n. \]
By B.1.9, we verify easily that the discriminant of $x^n - f$ is $\pm n^n f^{n-1}$. By Lemma
B.2.2, we conclude that $K(C')/K(C)$ is unramified over any v ∈ $M_{K(C)}$ which
is not a component of $D_1$ or $D_2$ . For w ∈ $M_{K(C')}$ with v := π(w) equal to a
component of $D_1$ , the ramification index satisfies
\[ e_{w/v} = \operatorname{ord}_w(f) = n \cdot \operatorname{ord}_w(\sqrt[n]{f}) \ge n. \]
Similarly, we obtain $e_{w/v} \ge n$ if v is a component of $D_2$ . By Example 1.4.12,
we have
\[ [K(C'):K(C)] = \sum_w e_{w/v} f_{w/v} = \sum_w e_{w/v}[K(w):K(v)], \]
where w ranges now over all components of $\pi^{-1}(v)$ for a given component v of
$D_1 + D_2$ . We conclude that the fibre consists only of one component w and
\[ \deg(\pi) = [K(C'):K(C)] = e_{w/v} = n, \qquad [K(w):K(v)] = 1. \]
By Proposition B.4.9, we get the claim.
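The discriminant formula used in the proof is a polynomial identity in f, so it can be sanity-checked by specializing f to integers. The sketch below computes the discriminant of x^n − c from the resultant of the polynomial with its derivative, using an exact determinant of the Sylvester matrix; this route via the Sylvester matrix is our choice for the illustration and not the argument of B.1.9, and the function names are ours.

```python
from fractions import Fraction

def determinant(rows):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    n, det = len(m), Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap changes the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det

def resultant(p, q):
    """Resultant via the Sylvester matrix; coefficients run from the leading term down."""
    dp, dq = len(p) - 1, len(q) - 1
    rows = [[0] * i + list(p) + [0] * (dq - 1 - i) for i in range(dq)]
    rows += [[0] * i + list(q) + [0] * (dp - 1 - i) for i in range(dp)]
    return determinant(rows)

def discriminant(p):
    """Discriminant (-1)^(n(n-1)/2) * Res(p, p') / a_n of a polynomial of degree n >= 1."""
    n = len(p) - 1
    dp = [(n - i) * c for i, c in enumerate(p[:-1])]  # derivative, leading term first
    sign = -1 if (n * (n - 1) // 2) % 2 else 1
    return sign * resultant(p, dp) / p[0]
```

For example, the discriminant of x² − c comes out as 4c and that of x³ − 2 as −108 = −3³ · 2², in agreement with ±n^n f^{n−1}.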
Theorem 14.5.12. The following conjectures are equivalent for the function field
K:
(a) Conjecture 14.4.10 for C = P1K ;
(b) Conjecture 14.4.10 for all curves C over K ;
(c) Vojta’s conjecture in 14.4.13 for all curves C over K .
Proof: (a) ⇒ (b): We have still a finite covering f : C → P1K using any non-
constant rational function. We choose a reduced effective divisor D0 on P1K such
that supp(D) ⊂ f −1 (D0 ) and such that f is unramified outside of f −1 (D0 ).
Then (b) follows from (a) along the same lines as in Theorem 14.4.16.
(b) ⇒ (c): This is analogous to the proof of Theorem 14.4.16.
(c) ⇒ (a): The goal is to prove Conjecture 14.4.10 for an effective reduced
divisor D on P1K . By the analogues of Propositions 14.4.5 and 14.4.9, we may
replace D by a larger effective reduced divisor. We choose any effective reduced
divisor D1 disjoint from D with deg(D1 ) = deg(D). Note that D ∼ D1 . We
may replace D by $D + D_1$ . For n ∈ N \ {0}, Lemma 14.5.11 gives a finite
morphism $\pi : C' \to \mathbb{P}^1_K$ of degree n with ramification divisor
\[ R_\pi = (n-1)\,(\pi^*(D))_{\mathrm{red}}. \]
Using this covering instead of Example 14.4.15, the proof of (a) is completely
similar to the implication (d) ⇒ (a) in the proof of Theorem 14.4.16.
If everything is defined over the constant field k , then we have seen in Example
14.5.7 that we have canonical local heights induced by geometry. In this special
case, we may prove the abc-conjecture, even without the ε -term. The correspond-
ing result is the following theorem of Stothers and Mason.
Theorem 14.5.13. Let C be an irreducible smooth projective curve over k and
let D be an effective reduced divisor on C defined over the constant field k . If we
use the canonical local heights from Example 14.5.7 relative to D and KC , then
\[ h_{D+K_C}(P) \le \operatorname{cond}^K_D(P) + d(P) + (\dim(B) - 1)\deg_c(B) \]
for all $P \in C \setminus \big(\operatorname{supp}(D) \cup C(k)\big)$.
Proof: The basic idea is to use Bertini’s theorem to reduce the problem to the case
of a function field of a curve and then apply Hurwitz’s theorem.
By 14.5.8, we may assume that B is a curve. Let F := K(P ) with a model
p : Y → B as in 14.5.2. Taking into account the equivalence
\[ \operatorname{div}(f_P^*K_C) \sim \operatorname{div}(K_Y) - R_{f_P}, \]
from Theorem B.4.5, we conclude
\[ h_{K_C}(P) = \frac{1}{[F:K]}\big(\deg_{p^*c}(K_Y) - \deg_{p^*c}(R_{f_P})\big). \]
The following simple lemma is the substitute of the lemma on the logarithmic
derivative in the function field case.
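Lemma 14.5.15. Let Y be an irreducible smooth curve over k , let f ∈ k(Y ), let
v ∈ $M_{k(Y)}$ and let j ∈ N . Then
\[ \operatorname{ord}_v\left(f^{-1}\Big(\frac{d}{d\pi_v}\Big)^{j} f\right) \ge -j \]
for any local parameter $\pi_v$ in $O_{Y,v}$ . Moreover, if f is a unit in $O_{Y,v}$ , then
\[ \operatorname{ord}_v\left(f^{-1}\Big(\frac{d}{d\pi_v}\Big)^{j} f\right) \ge 0. \]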
Proof: By the definition of $d/d\pi_v$ in A.7.25 and noting that $d\pi_v$ is a basis of $\Omega^1_{C,v}$
(see proof of Proposition B.4.7), we conclude that $d/d\pi_v$ is a derivative with
\[ \frac{d}{d\pi_v}\,O_{Y,v} \subset O_{Y,v}. \]
Then the claim follows easily from
\[ f = u\,\pi_v^{\operatorname{ord}_v(f)} \]
for a unit u in $O_{Y,v}$ and from Leibniz’s rule.
14.5.16. Our next goal is an analogue of Cartan’s second main theorem for
function fields. As we will see, the arguments are very similar to those for Theorem
13.4.16.
The basic assumptions are the following: Let S be a finite subset of $M_K$ , let
$H_1, \dots, H_q$ be hyperplanes of $\mathbb{P}^n_K$ in general position, let $D := \sum_{j=1}^{q} H_j$ and
let P be a point of $\mathbb{P}^n_K$ not lying in any hyperplane defined over k . As in 14.5.2,
we may choose a model Y of F := K(P ).
14.5.17. We first assume that K is the function field of a curve. This case most
closely resembles the field of meromorphic functions on $\mathbb{C}$ considered in Nevanlinna
theory. It enables us to define the ramification divisor of P in the following way:
The point P is given by a rational map $f_P : Y \dashrightarrow \mathbb{P}^n_k$ defined over k . For
v ∈ $M_F$ , we may choose relatively prime elements $f_{0v}, \dots, f_{nv}$ of the discrete
valuation ring $O_{Y,v}$ with $f_P = (f_{0v} : \cdots : f_{nv})$. By assumption, $f_{0v}, \dots, f_{nv}$
are linearly independent over k and so we may define $\operatorname{ord}_v(R_P)$ as the order of
the Wronskian of $f_{0v}, \dots, f_{nv}$ in v , where the derivatives are taken with respect
to a local parameter $\pi_v$ in v . By Leibniz’s rule and the multilinearity of the
determinant, this is independent of the choices of $f_{0v}, \dots, f_{nv}$ and $\pi_v$ . Then we
define the ramification divisor to be
\[ R_P := \sum_{v\in M_F} \operatorname{ord}_v(R_P)\,v, \]
\[ m_{S,D}(P) + N_{S,\mathrm{ram}}(P) \le (n+1)h(P) + \frac{n(n+1)}{2}\Big(d(P) + \sum_{v\in p^{-1}S} \log|1/\pi_v|_v\Big), \]
\[ W(f_0,\dots,f_n) := \det\left(\Big(\frac{d}{dz}\Big)^{j} f_i\right)_{i,j=0,\dots,n}. \]
No O(1)-term is necessary because the heights are canonical. Let $I = \{i_1 < \cdots < i_{n+1}\}$
be the complement of $\{k_1 < \cdots < k_p\}$ in {1, . . . , q}. For the
logarithmic Wronskian
\[ \lambda(g_{i_1},\dots,g_{i_{n+1}}) := \frac{W(g_{i_1},\dots,g_{i_{n+1}})}{g_{i_1}\cdots g_{i_{n+1}}}, \]
we get again the fundamental identity
\[ g_{k_1}\cdots g_{k_p} = \frac{g_1\cdots g_q}{W(f_0,\dots,f_n)} \cdot d_I^{-1} \cdot \lambda(g_{i_1},\dots,g_{i_{n+1}}), \tag{14.18} \]
where $d_I$ is the determinant formed with the coefficients of $(\ell_i)_{i\in I}$ .
Let $v \in p^{-1}S$. We choose a local parameter $\pi_v$ for $v$. Leibniz’s rule and
the multilinearity of the determinant give
$$\lambda(g_{i_1}, \dots, g_{i_{n+1}}) = (d\pi_v/dz)^{\frac{n(n+1)}{2}}\, \lambda_v(g_{i_1}, \dots, g_{i_{n+1}}), \tag{14.19}$$
where $\lambda_v$ denotes the logarithmic Wronskian with respect to the differential
operator $d/d\pi_v$ instead of $d/dz$. By Lemma 14.5.15, we get
$$|\lambda(g_{i_1}, \dots, g_{i_{n+1}})|_v \le |d\pi_v/dz|_v^{\frac{n(n+1)}{2}} \cdot |1/\pi_v|_v^{\frac{n(n+1)}{2}}.$$
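Together with the product formula, (14.18) implies
$$\sum_{v \in p^{-1}S} \max_{k_1 < \dots < k_p} \log|g_{k_1} \cdots g_{k_p}|_v
\le \sum_{v \notin p^{-1}S} \log\Big|\frac{W(f_0, \dots, f_n)}{g_1 \cdots g_q}\Big|_v
+ \frac{n(n+1)}{2} \sum_{v \in p^{-1}S} \big( \log|d\pi_v/dz|_v + \log|1/\pi_v|_v \big).$$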
dπv
+ log + (n + 1) log max |fi |v .
−1
2 dz v i
v∈p S
Moreover, we have
$$\lambda_{H_j}(P, v) = -\log|\ell_j(f_{0v}, \dots, f_{nv})|_v = -\log|g_j|_v + \log\max_i |f_i|_v.$$
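We conclude that
$$\sum_{v \in p^{-1}S} \max_{k_1 < \cdots < k_p} \log|g_{k_1} \cdots g_{k_p}|_v
\le \sum_{j=1}^{q} N_{S,H_j}(P) - N_{S,\mathrm{ram}}(P) - p \sum_{v \notin p^{-1}S} \log\max_i |f_i|_v
+ \frac{n(n+1)}{2}\Big( \sum_{v \in M_F} \log\Big|\frac{d\pi_v}{dz}\Big|_v + \sum_{v \in p^{-1}S} \log|1/\pi_v|_v \Big).$$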
This leads to
$$p \cdot h(P) \le \sum_{j=1}^{q} N_{S,H_j}(P) - N_{S,\mathrm{ram}}(P)
+ \frac{n(n+1)}{2}\Big( d(P) + \sum_{v \in p^{-1}S} \log|1/\pi_v|_v \Big)$$
$$(q - n - 1)h(P) \le \sum_{j=1}^{q} N^{(n)}_{H_j}(P)
+ \frac{n(n+1)}{2}\Big( d(P) + (\dim(B) - 1)\deg_c(B) + \frac{\deg_{p^*c}(p^{-1}S)}{[F:K]} \Big). \tag{14.20}$$
To give a sketch of proof, we first note that we may assume $B$ to be a curve by the
techniques of 14.5.8. Then the claim follows from Theorem 14.5.18 and from the
identity
$$q\,h(P) - N_{\emptyset,\mathrm{ram}}(P) \le \sum_{j=1}^{q} N^{(n)}_{H_j}(P),$$
and assume that the elements $(u_i)_{i \in I}$ are linearly independent over $k$ for
every proper subset $I$ of $\{0, \dots, n\}$. Then
$$h((u_0 : \cdots : u_n)) \le \frac{n(n-1)}{2}\big( d_K + \deg_c(S) + (\dim(B) - 1)\deg_c(B) \big).$$
Proof: By the techniques introduced in 14.5.8, we may assume that $B$ is a curve.
We note also that
$$\deg_c(S) = \sum_{v \in S} \log|1/\pi_v|_v.$$
In fact, Brownawell and Masser ([52], Th.B) proved a bit more, substantially re-
laxing the linear independence condition of subsets of u0 , . . . , un . This is useful
for specific applications.
Theorem 14.5.22. Let $K = k(B)$ be a function field in characteristic 0. Suppose
that $u_0 + \dots + u_n = 0$, that no non-empty proper subsum vanishes, and that
$u_0, \dots, u_n$ are $S$-units for some finite set $S \subset M_K$. Then
$$h(u) \le \frac{(n-1)n}{2} \max\{ d_K + \deg_c(S) + (\dim(B) - 1)\deg_c(B),\ 0 \}.$$
For the proof, we cannot use Cartan’s second main theorem for function fields because
the functions $u_1, \dots, u_n$ need not be linearly independent over $k$. The following
lemma enables us to define an analogue of the Wronskians such that we can transfer
the steps from the proof of Cartan’s second main theorem. The proof does not
use the linearly independent case and hence reproves Corollary 14.5.20.
A subset I of {0, . . . , n} is called minimal if the set uI := {ui | i ∈ I} is linearly
dependent over k but every proper subset of uI is linearly independent over k .
Lemma 14.5.23. There are disjoint non-empty subsets I1 , . . . , Il of {0, . . . , n}
and non-empty subsets J1 , . . . , Jl−1 with
{0, . . . , n} = I1 ∪ · · · ∪ Il , Jν ⊂ I1 ∪ · · · ∪ Iν (ν = 1, . . . , l − 1)
such that I1 , J1 ∪ I2 , J2 ∪ I3 , . . . , Jl−1 ∪ Il are minimal.
For the elementary proof using simple linear algebra, we refer to [52], Lemma 6,
or [249], Lemma A.3.2.7.
Proof of Theorem 14.5.22: By the usual arguments from 14.5.8, we may assume
that $B$ is a curve. By renumbering, we may assume that
$I_j = \{N_{j-1}, \dots, N_j - 1\}$ for a sequence $N_0 = 0 < N_1 < \cdots < N_l = n + 1$.
For convenience, we set $J_0 = \emptyset$. For every $\nu \in \{1, \dots, l\}$, we have a
linear relation
$$c_{\nu,0} u_0 + \cdots + c_{\nu,n} u_n = 0$$
with $c_{\nu,i} \in k^{\times}$ for $i \in J_{\nu-1} \cup I_\nu$ and $c_{\nu,i} = 0$ else. We set
$n_\nu := |I_\nu|$ and let $z \in k(B) \setminus k$. We consider the
$(n_1 - 1) \times (n + 1)$ matrix
$$A_1 := \Big( \Big(\frac{d}{dz}\Big)^{i} (c_{1,j} u_j) \Big)_{i=0,\dots,n_1-2;\ j=0,\dots,n}$$
$$A_\nu := \Big( \Big(\frac{d}{dz}\Big)^{i} (c_{\nu,j} u_j) \Big)_{i=0,\dots,n_\nu-1;\ j=0,\dots,n}$$
Proof: We know that $f$ induces a rational map $f : B \dashrightarrow \mathbb{P}^n_k$.
Let $S$ be the set of prime divisors of $B$ contained in the closure of $f^{-1}(D)$.
Dividing by $f_0$, we may assume that $f_0, \dots, f_n$ are $S$-units. We claim that
$f^*(D)_{\mathrm{red}} = S$. For $v \in S$, the local equation of $f^*(\{x_j = 0\})$ is
$f_{jv} := f_j\, \pi_v^{-\min_j \mathrm{ord}_v(f_j)}$. Hence $v \in f^*(D)_{\mathrm{red}}$ if
and only if at least one $f_{jv}$ is in the maximal ideal of $\mathcal{O}_{B,v}$,
proving immediately the claim. Hence we have
$$\mathrm{cond}^K_D(f) = \deg_c(f^*(D)_{\mathrm{red}}) = \deg_c(S)$$
Another intriguing question has been posed by Vojta, namely whether for any fixed
$\varepsilon > 0$ and any $n \ge 2$ there is a closed subvariety
$V \subsetneq \{x_0 + \cdots + x_n = 0\}$ in projective space $\mathbb{P}^n_{\mathbb{C}}$,
depending only on $n$ and $\varepsilon$, with the following property: If
$f_0(t), \dots, f_n(t)$ are polynomials in $\mathbb{C}[t]$ without common zeros and
$f_0 + \cdots + f_n = 0$, then
$$\max \deg(f_i) \le (1 + \varepsilon) \max\{ \deg(\mathrm{rad}(f_0 \cdots f_n)) - 1,\ 0 \}$$
unless the holomorphic curve $\{(f_0(t) : \cdots : f_n(t)) \mid t \in \mathbb{C}\}$ is
contained in $V$.
We illustrate Vojta’s idea by showing how to exclude the example by Browkin and
Brzeziński (see 12.4.5) for the case $n = 3$. Consider the identity
$$(a + b + c)^3 = a^3 + b^3 + c^3 - 3abc + 3(ab + ac + bc)(a + b + c),$$
which reduces to the four term identity
$$a^3 + b^3 + c^3 - 3abc = 0$$
if $a + b + c = 0$. The example of Browkin and Brzeziński is obtained by taking
$a = 1$, $b = -t$, $c = t - 1$ and $f_0 = a^3$, $f_1 = b^3$, $f_2 = c^3$, $f_3 = -3abc$,
so that equality holds in (14.25). Clearly, no subsum vanishes. The relations to
avoid are simply
$$(f_i + f_j + f_k)^3 - 27 f_i f_j f_k = 0$$
for any choice of distinct indices $i, j, k$.
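The Browkin–Brzeziński example can be checked mechanically. The following is a minimal sketch and not part of the original text: polynomials in $t$ are stored as integer coefficient lists (lowest degree first), and the helper names `trim`, `add`, `mul`, `power`, `scale`, `degree` are illustrative choices. It verifies that $f_0 + f_1 + f_2 + f_3 = 0$, that the maximal degree is 3, and that the relation for the index triple $(0, 1, 2)$ vanishes identically.

```python
# Polynomials in t are lists of integer coefficients, lowest degree first.

def trim(p):
    """Drop trailing zero coefficients; the zero polynomial becomes []."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def add(p, q):
    n = max(len(p), len(q))
    p = list(p) + [0] * (n - len(p))
    q = list(q) + [0] * (n - len(q))
    return trim(a + b for a, b in zip(p, q))

def mul(p, q):
    if not p or not q:
        return []
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)

def power(p, k):
    out = [1]
    for _ in range(k):
        out = mul(out, p)
    return out

def scale(c, p):
    return trim(c * a for a in p)

def degree(p):
    return len(p) - 1  # the zero polynomial is reported as degree -1

a = [1]        # a = 1
b = [0, -1]    # b = -t
c = [-1, 1]    # c = t - 1

# f0 = a^3, f1 = b^3, f2 = c^3, f3 = -3abc
f = [power(a, 3), power(b, 3), power(c, 3), scale(-3, mul(a, mul(b, c)))]

# Sum of all four terms; it should be the zero polynomial.
total = []
for fi in f:
    total = add(total, fi)

# Relation for the triple (0, 1, 2): (f0 + f1 + f2)^3 - 27 f0 f1 f2.
s = add(add(f[0], f[1]), f[2])
relation_012 = add(power(s, 3), scale(-27, mul(f[0], mul(f[1], f[2]))))

max_degree = max(degree(fi) for fi in f)
```

Here `total` and `relation_012` both come out as the zero polynomial and `max_degree` is 3. Since $f_0 f_1 f_2 f_3 = -3t^4(t-1)^4$ has only the two distinct roots 0 and 1, its radical has degree 2.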
14.6. Bibliographical notes
Sections 14.2 and 14.3 are taken from [307] and [318], where the reader will find
further information. The degeneracy of $K$-rational points in a variety of general
type was posed as an open problem in Bombieri’s lecture at the University
of Chicago in 1980. For function fields in characteristic 0, this was solved by J.
Noguchi [225] under the stronger assumption that the cotangent bundle is ample.
S. Lang gave more general conjectures relating the structure of K -rational points
also to hyperbolicity and connecting the special sets from Nevanlinna theory and
diophantine approximation (see [172]).
The zero-dimensional part of the exceptional set Z in the K -rational Vojta con-
jecture must depend on ε , but it may be that the higher-dimensional part is inde-
pendent. At least, this holds in Schmidt’s subspace theorem and in Vojta’s version
of Cartan’s second main theorem, both proved by P. Vojta in [309], [316].
Vojta’s conjecture with ramification does imply Conjecture 14.4.10 also in higher
dimension. This is due to P. Vojta [317], true over number fields and function fields
of characteristic zero.
Cartan’s second main theorem holds also in the linearly degenerate case, with the
factor n + 1 replaced by n + t + 1, where t is the codimension of the linear span
of the image of f . This was done by E.I. Nochka (see [223], [224]). Ru and Wong
have worked out the number theoretic analogue, see also P. Vojta [316] for an
alternative proof. For function fields in characteristic 0 (with hyperplanes defined
over the constant field), this is due independently to J. Noguchi [226] and J.T.-Y.
Wang [320]. In [320], there is also a generalization of the linearly non-degenerate
case to characteristic p .
In the non-split case (when varieties and divisors are not defined over the constant
field), very few results are known. For a function field K of characteristic 0 and
a curve C of genus g ≥ 2 over K , P. Vojta [311] proved
$$h_{K_C}(P) \le (2 + \varepsilon) d(P) + O(1)$$
for all P ∈ C(K) using the methods of Grauert [129] in the proof of the Mordell
conjecture over function fields. M. Kim [160] generalized it to characteristic $p$
under an assumption on the Kodaira–Spencer map. Note that Vojta’s conjecture
($H = K_C$ ample, $D = 0$) would predict a factor $1 + \varepsilon$ instead of
$2 + \varepsilon$, at least for points of bounded degree.
The proof of the general abc-theorem for polynomials is similar to the one of
Brownawell and Masser [52]; it is inspired by Cartan’s proof of his second main
theorem in Nevanlinna theory. The proof of Voloch [319] is more geometric and
relies on the Brill–Segre formula, which is a generalization of Hurwitz’s genus
formula for non-degenerate maps from a curve into projective space.
APPENDIX A. ALGEBRAIC GEOMETRY
A.1. Introduction
We collect here some definitions and results from algebraic geometry needed in
the text. For most of our purposes, it is enough to work with varieties over a base
field $K$ and to consider points rational in a fixed algebraic closure $\overline{K}$. Thus we
may neglect the modern language of schemes in order to keep the exposition ele-
mentary. In some side remarks or proofs, not essential for the basic understanding
of the book, it will still be convenient to use the theory of schemes on the level of
[148], Ch.II.
Arguments are only given if they are easy and instructive or if we have not found
an appropriate reference, otherwise we freely quote from standard text books up to
the volumes of Grothendieck. Most of the quoted results are true for more general
classes of schemes, but we formulate them just for varieties using the dictionary
mentioned in A.2.8.
No knowledge of algebraic geometry is required for reading Appendix A. How-
ever, the presentation is too brief for learning the subject and it would be useful,
from an educational point of view, if the reader is familiar with the theory of vari-
eties over an algebraically closed field as in the books of R. Hartshorne [148], Ch.
I, D. Mumford [213], Ch. I, or I.R. Shafarevich [279].
We advise the reader to work through Sections A.2–A.4 to gather the most
frequently used definitions, notations, terminology, and results. If you also look
up the definition of a projective variety in Section A.6, then you will be ready to
start the book, coming back to Appendix A only when required in the text.
ϕ (xi ). We get ϕ(α) = (ϕ1 (α), . . . , ϕn (α)) for all α ∈ V . This way, we
obtain a bijective correspondence between K -morphisms of varieties over K and
K -algebra homomorphisms of the corresponding coordinate rings.
The morphism $\varphi$ is called a $K$-isomorphism if and only if there is a $K$-morphism
$\psi : X' \to X$ such that $\varphi \circ \psi$ and $\psi \circ \varphi$ are both the identity map.
A.2.5. For $x \in X$, we consider pairs $(U, f)$, where $U$ is an open neighbourhood
of $x$ and $f \in \mathcal{O}_X(U)$. Two pairs $(U, f)$ and $(U', f')$ are called equivalent
if and only if $f = f'$ on a neighbourhood of $x$. The set of equivalence classes is
denoted by $\mathcal{O}_{X,x}$ and is called the local ring of $x$ in $X$. In $K[X]$, we
consider the maximal ideal $m_x = \{f \in K[X] \mid f(x) = 0\}$. Then $\mathcal{O}_{X,x}$ is
the localization of $K[X]$ in $m_x$ with unique maximal ideal
$\mathfrak{m}_x = m_x \mathcal{O}_{X,x}$ (see A.2.10).
A.2.6. Let $K \subset L \subset \overline{K}$ be an intermediate field. Let $X$ be an affine
variety over $K$. Then the set of $L$-rational points of $X$ is
$$\{x \in X \mid f(x) \in L \ \ \forall f \in \mathcal{O}_X(X)\}.$$
If we view $X$ as a closed subset of $\mathbb{A}^n_K$, then this means that all coordinates
of $x$ are in $L$. For $x \in X$, the smallest $L$ such that $x$ is $L$-rational is denoted
by $K(x)$. In fact, we have
$$\mathcal{O}_{X,x}/\mathfrak{m}_x \xrightarrow{\ \sim\ } K(x), \qquad f + \mathfrak{m}_x \mapsto f(x).$$
A.2.7. Note that points in $X$ need not be closed. For $x \in X$, the closure of $x$
is the set of conjugates of $x$. If $X$ is a closed subset of $\mathbb{A}^n_K$, then the
conjugates of $x \in X$ are given by $\sigma x$, applying
$\sigma \in \mathrm{Gal}(\overline{K}/K)$ componentwise. Thus the local rings of $x$ and
of its conjugates are the same.
A.2.8. Now we relate affine varieties to affine schemes. This makes it possible to quote
results from standard books about schemes. Readers who are not familiar with the language
of schemes can skip the following remarks without any problems for the understanding
of the book.
Let $X$ be an affine variety over $K$ with ring of regular functions $A = \mathcal{O}_X(X)$.
Then we have a map
$$t : X \longrightarrow \mathrm{Spec}(A), \qquad x \mapsto I(\{x\}).$$
Then $t(X)$ is the set of maximal ideals and this is dense in $\mathrm{Spec}(A)$. Moreover,
the topology on $X$ is the coarsest topology making $t$ continuous, i.e. $U$ is open in $X$
if and only if it is the inverse image of an open subset $V$ in $\mathrm{Spec}(A)$. Then we have
$$\mathcal{O}_{\mathrm{Spec}(A)}(V) \cong \mathcal{O}_X(t^{-1}(V))$$
or more formally $t_*\mathcal{O}_X \cong \mathcal{O}_{\mathrm{Spec}(A)}$. This follows immediately
from the definitions and the density of $t(U)$ in $V$. Therefore any morphism of affine
varieties over $K$ extends to a morphism of affine schemes.
Conversely, for any reduced affine scheme $\mathrm{Spec}(A)$ of finite type over $K$, we can
consider the $\overline{K}$-rational points of $\mathrm{Spec}(A)$ as an affine variety $X$ over
$K$ with $\mathcal{O}_X(X) = A$. This may be used to translate results from affine varieties
over $K$ to reduced affine schemes of finite type over $K$ and conversely.
To prove this, let $x$ be a point in an open subset $U$. There is $f \in I(X \setminus U)$
with $f(x) \ne 0$ and hence $x \in X_f \subset U$.
This is easily used to prove that the local ring $\mathcal{O}_{X,x}$ of $X$ in $x$ is
isomorphic to the localization of $K[X]$ in the maximal ideal $m_x$ ([148], Prop.II.2.2).
A.2.11. Here, we explain how to pass from affine varieties over $K$ to affine
varieties over $\overline{K}$. Let $X$ be an affine variety over $K$. It is a closed subset
of $\mathbb{A}^n_K$ given by the ideal $I(X)$. As $\mathbb{A}^n_K$ and $\mathbb{A}^n_{\overline{K}}$
have the same underlying set $\overline{K}^n$, we may view $X$ as a closed subset of
$\mathbb{A}^n_{\overline{K}}$ given by the zeros of the set $I(X) \subset K[x]$.
The corresponding affine variety over $\overline{K}$ is denoted by $X_{\overline{K}}$. Note
that $X$ and $X_{\overline{K}}$ have the same points but a different topology and a different
ring of regular functions. It is easily seen that the variety $X_{\overline{K}}$ does not
depend on the choice of the affine space. It is called the base change of $X$ to $\overline{K}$.
For any extension field $F$ of $K$, we define the base change $X_F$ as the affine
variety given by the closed subset $Z(I(X))$ in $\mathbb{A}^n_F$ and by the corresponding
coordinate ring
$$F[X_F] = F[x_1, \dots, x_n]/\sqrt{I(X)}.$$
To prove it, assume first that $Y$ is irreducible. Let $a \in K[x] \setminus I(Y)$ and
$b \in K[x]$ with $ab \in I(Y)$. Then the union of $Z(a) \cap Y$ and $Z(b) \cap Y$ is $Y$,
hence $Z(b) \supset Y$, proving $b \in I(Y)$. Thus $I(Y)$ is a prime ideal.
Conversely, assume that $I(Y)$ is a prime ideal. Let $A$ and $B$ be closed subsets of $Y$
with $A \cup B = Y$. If $A \ne Y$, then A.2.2 shows that there is an
$a \in I(A) \setminus I(Y)$. Since
$$I(A) \cap I(B) = I(A \cup B)$$
contains $ab$ for every $b \in I(B)$, we conclude that $b \in I(Y)$. Therefore
$I(B) = I(Y)$, proving $B = Y$ and the irreducibility of $Y$.
For an affine variety X , we conclude that in the one-to-one correspondence be-
tween radical ideals of K[X] and closed subsets of X mentioned in A.2.9, the
prime ideals correspond to the irreducible closed subsets of X .
A.3.3. A maximal irreducible subset of T is called an irreducible component
of T . As the closure of an irreducible subset is again irreducible, the irreducible
components are closed. If T is irreducible, then any non-empty open subset is
irreducible and dense. The proofs are immediate from the definitions.
Example A.3.4. If $Y$ is a closed subset of $\mathbb{A}^n_K$, then the irreducible components
of Y are the zero sets of the minimal prime ideals containing I(Y ). There are
finitely many irreducible components and their union is equal to Y .
A.3.5. We define the dimension of $T$ to be
$$\dim(T) := \sup\{ n \mid A_0 \subsetneq A_1 \subsetneq \cdots \subsetneq A_n \},$$
where $A_0 \subsetneq A_1 \subsetneq \cdots \subsetneq A_n$ is ranging over all chains of
irreducible closed subsets of $T$.
Example A.3.6. If X is an affine variety over K , then Example A.3.4 shows
that the dimension of X is equal to the Krull dimension of the noetherian ring
K[X]. By definition, the Krull dimension dim(A) of a commutative ring A is
the supremum over the length of all prime ideal chains. So this follows from the
one-to-one correspondence between prime ideals in K[X] and irreducible closed
subsets of X, which we deduce from Example A.3.2. If X is irreducible, then
it is a consequence of commutative algebra that the dimension of X is equal to
the transcendence degree of K(X) over K, where K(X) is the quotient field of
$K[X]$ (H. Matsumura [197], Ch.5, §14). In particular, the dimension of $\mathbb{A}^n_K$ is $n$.
with $\overline{K}^{n^2}$ by ordering the entries $x_{ij}$ of the matrices lexicographically.
Now we consider $GL(n, \overline{K})$ as a closed subset of $\mathbb{A}^{n^2+1}_K$ by using the map
$$GL(n, \overline{K}) \to \mathbb{A}^{n^2+1}_K, \qquad (x_{ij}) \mapsto \big( (x_{ij});\ \det(x_{ij})^{-1} \big).$$
Let $(y_{ij})_{1 \le i,j \le n};\ y_{n^2+1}$ be the coordinates on $\mathbb{A}^{n^2+1}_K$. Then
$GL(n, \overline{K})$ is identified with $Z(\det(y_{ij})\, y_{n^2+1} - 1)$. This makes
$GL(n, \overline{K})$ into an $n^2$-dimensional affine variety over $K$, which we denote by
$GL(n)_K$. Note that for any intermediate field $L$ of $K$ and $\overline{K}$, the $L$-rational
points are equal to $GL(n, L)$. The base change of $GL(n)_K$ to $\overline{K}$ is equal to
$GL(n)_{\overline{K}}$. The group operation gives a $K$-morphism
$$GL(n)_K \times GL(n)_K \longrightarrow GL(n)_K, \qquad (g_1, g_2) \mapsto g_1 \cdot g_2.$$
Similarly, the inverse is a $K$-morphism. Hence $GL(n)_K$ is an example of a group
variety handled in Section 8.2.
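The closed embedding of the general linear group can be illustrated numerically. The sketch below is an illustration only, under the assumption that we work with rational entries for $n = 2$; the function names `det`, `embed`, `defining_equation` and `matmul` are hypothetical helpers, not notation from the text. It sends an invertible matrix to the point $((x_{ij}); \det(x_{ij})^{-1})$ and evaluates the defining polynomial $\det(y_{ij})\, y_{n^2+1} - 1$ there.

```python
from fractions import Fraction

def det(m):
    """Determinant by Laplace expansion along the first row (fine for small n)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def embed(m):
    """Send an invertible matrix to ((x_ij) in lexicographic order; det^{-1})."""
    d = det(m)
    if d == 0:
        raise ValueError("matrix is not invertible")
    entries = [x for row in m for x in row]
    return entries + [1 / Fraction(d)]

def defining_equation(point, n):
    """Evaluate det(y_ij) * y_{n^2+1} - 1 at a point of affine (n^2+1)-space."""
    y = [point[i * n:(i + 1) * n] for i in range(n)]
    return det(y) * point[n * n] - 1

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

g1 = [[Fraction(2), Fraction(1)], [Fraction(1), Fraction(1)]]
g2 = [[Fraction(1), Fraction(3)], [Fraction(0), Fraction(5)]]

# Images of g1, g2 and of the product g1 * g2 in affine 5-space.
p1, p2, p12 = embed(g1), embed(g2), embed(matmul(g1, g2))
```

All three image points satisfy the defining equation, and the last coordinate is multiplicative, reflecting $\det(g_1 g_2)^{-1} = \det(g_1)^{-1}\det(g_2)^{-1}$.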
A.3.13. A presheaf of abelian groups on our topological space $T$ is a map assigning
to each open subset $U$ of $T$ an abelian group $\mathcal{F}(U)$ and to every open subset
$V$ of $U$ a homomorphism $\rho^U_V : \mathcal{F}(U) \to \mathcal{F}(V)$ called the restriction
map such that:
(a) $\mathcal{F}(\emptyset) = 0$;
(b) $\rho^U_U$ is the identity;
(c) if $W \subset V \subset U$ are open subsets, then $\rho^U_W = \rho^V_W \circ \rho^U_V$.
Instead of abelian groups, we may consider rings, K -vector spaces or other alge-
braic structures.
Example A.3.14. For an open subset $U$ of $T$, let $\mathcal{F}(U)$ be the set of continuous
real functions on $U$. If $V$ is an open subset of $U$, then we define $\rho^U_V$ by
restricting functions of $U$ to $V$. Then $\mathcal{F}$ is a presheaf of $\mathbb{R}$-algebras
on $T$. If $T$ is a differentiable manifold, then the same construction works with
$C^\infty$-functions.
A.3.15. A presheaf $\mathcal{F}$ on $T$ is called a sheaf if the following conditions are
satisfied for every open subset $U$ of $T$ and every open covering $(U_i)_{i \in I}$ of $U$:
(a) if $s \in \mathcal{F}(U)$ with $\rho^U_{U_i}(s) = 0$ for all $i \in I$, then $s = 0$;
(b) if $s_i \in \mathcal{F}(U_i)$ for all $i \in I$ and
$\rho^{U_i}_{U_i \cap U_j}(s_i) = \rho^{U_j}_{U_i \cap U_j}(s_j)$ for all $i, j \in I$, then
there is $s \in \mathcal{F}(U)$ with $\rho^U_{U_i}(s) = s_i$ for all $i \in I$.
Note that $s$ in (b) is unique by (a). The presheaves in Example A.3.14 are sheaves.
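The gluing condition can be made concrete in the simplest possible setting. The following sketch assumes, purely for illustration, that $T$ is a finite discrete space, so that every real function is continuous and a section over $U$ is just a dictionary from points of $U$ to values; `restrict` and `glue` are hypothetical helper names.

```python
def restrict(section, subset):
    """Restriction map: forget the values outside `subset`."""
    return {x: section[x] for x in subset}

def glue(sections):
    """Glue sections s_i (dicts defined on U_i) that agree on all overlaps.

    Returns the unique section on the union of the U_i, or None if two of
    the given sections disagree at some point of an overlap.
    """
    glued = {}
    for s in sections:
        for x, value in s.items():
            if x in glued and glued[x] != value:
                return None
            glued[x] = value
    return glued

s1 = {0: 1.0, 1: 2.5}       # section on U_1 = {0, 1}
s2 = {1: 2.5, 2: -4.0}      # section on U_2 = {1, 2}, agrees with s1 at 1

s = glue([s1, s2])                       # section on U_1 ∪ U_2
bad = glue([s1, {1: 0.0, 2: -4.0}])      # disagreement at the point 1
```

Restricting `s` back to $U_1$ or $U_2$ recovers `s1` and `s2`, while incompatible local sections admit no gluing.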
Example A.3.16. Let $X$ be an affine variety over $K$. Then $\mathcal{O}_X$ is a presheaf of
$K$-algebras. We define $\rho^U_V$ again as the restriction map of functions. It is almost
by definition of a regular function that $\mathcal{O}_X$ is a sheaf.
A.4. Varieties
the covering $(U_\alpha \times U'_\beta)_{\alpha \in I, \beta \in J}$ defines a unique topology on
$X \times X'$ such that each $U_\alpha \times U'_\beta$ is an open subset of $X \times X'$. Then
$X \times X'$ is a prevariety with affine charts
$(U_\alpha \times U'_\beta, \varphi_\alpha \times \psi_\beta)$. The structure does not depend on
the choice of the affine charts. The details are left to the reader.
A.4.5. A prevariety X over K is called a variety over K if the diagonal ∆ :=
{(x, x) | x ∈ X} is closed in X × X . Obviously, any affine variety over K is a
variety. Regular functions and morphisms are the same as for prevarieties.
A.4.6. In the whole book, only the notion of a K -variety will occur. Let X be
a variety over $K$. For an intermediate field $L$ of $K$ and $\overline{K}$, a point $x \in X$ is
called L-rational if x is L-rational in one affine chart (and hence in all affine
charts containing x ). The set of L-rational points is denoted by X(L). Similarly
as in A.2.5, we define the local ring OX,x . It has a unique maximal ideal mx and
the residue field K(x) := OX,x /mx .
In a first step, we show that $Y_0$ is defined over $K$. To see this, we may assume that
$Y$ is affine given by $f_1(x), \dots, f_m(x)$ in $\mathbb{A}^n_F$. Then $Y_0$ is given by all
the polynomials of the form $\prod_{\sigma \in R} \sigma(f_{j_\sigma})$, where
$1 \le j_\sigma \le m$. Since $Y$ is defined over a finite subextension, we may assume that
$[F : K] < \infty$. We consider first the case $F/K$ separable. Passing to a finite extension,
we may assume $F/K$ Galois with Galois group $R$. It is enough to show that $Y_0$ is defined
by the polynomials
$$\sum_{\rho \in R} \rho(\mu) \prod_{\sigma \in R} \rho \circ \sigma(f_{j_\sigma})
= \mathrm{Tr}_{F(x)/K(x)}\Big( \mu \prod_{\sigma \in R} \sigma(f_{j_\sigma}) \Big)
\qquad (\mu \in F,\ 1 \le j_\sigma \le m)$$
Example A.5.6. Let $E$ be a vector bundle over $K$ and let $\mu \in K$. Then we have
a homomorphism $[\mu] : E \to E$ given by using multiplication with $\mu$ on the fibre
$E_x$. To check that $[\mu]$ is a $K$-morphism, we use that $[\mu]$ is given in a
trivialization $U_\alpha \times \mathbb{A}^r_K$ by $(x, \lambda) \mapsto (x, \mu\lambda)$,
A.5.7. In this remark, we show how to give a vector bundle by its transition
matrices. Let $(U_\alpha)_{\alpha \in I}$ be an open covering of $X$. For all
$\alpha, \beta \in I$, we consider $K$-morphisms
$g_{\alpha\beta} : U_\alpha \cap U_\beta \to GL(r)_K$ satisfying the cocycle rule (A.3). Then
we glue the trivial bundles $U_\beta \times \mathbb{A}^r_K$ and $U_\alpha \times \mathbb{A}^r_K$
along the isomorphisms
$$\varphi_{\alpha\beta} : (U_\alpha \cap U_\beta) \times \mathbb{A}^r_K \xrightarrow{\ \sim\ } (U_\alpha \cap U_\beta) \times \mathbb{A}^r_K$$
given by $\varphi_{\alpha\beta}(x, \lambda) = (x, g_{\alpha\beta}(x)\lambda)$. For more on
glueing, see [148], Exercise II.2.12. We obtain a vector bundle $E$ over $X$ with
trivializations $\varphi_\alpha : \pi_E^{-1}(U_\alpha) \to U_\alpha \times \mathbb{A}^r_K$ such that
the transition matrices are equal to $g_{\alpha\beta}$.
Conversely, if we start with a vector bundle and apply this process to its transition
matrices, we get a new vector bundle isomorphic to the original one.
Example A.5.8. Let $E$ and $E'$ be vector bundles over $X$. As an abstract set,
the direct sum $E \oplus E'$ is given as the disjoint union of $(E_x \oplus E'_x)_{x \in X}$.
We get $\pi_{E \oplus E'}$ by mapping $E_x \oplus E'_x$ onto $x$. To define a vector bundle
structure on it, we choose trivializations $(U_\alpha, \varphi_\alpha)_{\alpha \in I}$ and
$(U_\alpha, \varphi'_\alpha)_{\alpha \in I}$ of $E$ and $E'$, respectively.
Note that we may always assume that the open coverings are the same by passing
to a common refinement. We claim that there is a unique vector bundle structure
on $E \oplus E'$ such that
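$$\varphi_\alpha \oplus \varphi'_\alpha : \pi^{-1}_{E \oplus E'}(U_\alpha) \longrightarrow U_\alpha \times \mathbb{A}^{r_\alpha + r'_\alpha}_K = U_\alpha \times \big( \overline{K}^{r_\alpha} \oplus \overline{K}^{r'_\alpha} \big),$$
$$g''_{\alpha\beta} := \begin{pmatrix} g_{\alpha\beta}(x) & 0 \\ 0 & g'_{\alpha\beta}(x) \end{pmatrix} \in GL(r_\alpha + r'_\alpha, K(x)),$$
where $g_{\alpha\beta}$, $g'_{\alpha\beta}$ are the transition matrices of $E$ and $E'$, respectively.
Clearly, $g''_{\alpha\beta}$ gives a morphism
$U_\alpha \cap U_\beta \to GL(r_\alpha + r'_\alpha)_K$ proving well-definedness of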
defined a priori on fibres, are isomorphisms of vector bundles. This is clear since
the transition matrices of $E \otimes E'$ are given by $g_{\alpha\beta} \otimes g'_{\alpha\beta}$.
Clearly, we have
$$(E \otimes E')_x(K(x)) = E_x(K(x)) \otimes_{K(x)} E'_x(K(x))$$
and
$$\Gamma(U_\alpha, E \otimes E') = \Gamma(U_\alpha, E) \otimes \Gamma(U_\alpha, E').$$
However, this need not be true for all open subsets of $X$. Here, equal means up
to canonical isomorphism.
A.5.13. Similarly, we construct the dual vector bundle $E^*$ of $E$. It is the disjoint
union of the dual vector spaces $E_x^*$ of $E_x$, $x \in X$. The transition matrices of
$E^*$ are given by the transposes $h_{\alpha\beta} := g_{\beta\alpha}^{t}$.
We can also extend other constructions from linear algebra to vector bundles. It is
always the same pattern. First, we define the underlying set using the construction
fibrewise. Then we choose the evident trivializations. We use it to define the vector
bundle structure on the abstract set. We have to show that it fits on overlaps,
which becomes clear by considering transition matrices. Note that pointwise, they
are the same as the transformation matrices in linear algebra.
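$$\varphi_i : U_i \longrightarrow \mathbb{A}^n_K, \qquad \alpha \mapsto \Big( \frac{\alpha_0}{\alpha_i}, \dots, \frac{\alpha_{i-1}}{\alpha_i}, \frac{\alpha_{i+1}}{\alpha_i}, \dots, \frac{\alpha_n}{\alpha_i} \Big).$$
So $\mathbb{P}^n_K$ is a $K$-variety with affine charts $(U_i, \varphi_i)_{i=0,\dots,n}$. For the
proof that the diagonal is closed, we refer to A.6.4. Clearly, the $K$-variety
$\mathbb{P}^n_K$ does not depend on the choice of coordinates, i.e. if we change coordinates
by $g \in GL(n + 1, K)$, then we get the same $K$-variety $\mathbb{P}^n_K$.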
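The standard affine charts are easy to experiment with. The sketch below is only an illustration with rational coordinates; the function name `chart` is a hypothetical choice. It computes $\varphi_i$ on a point of $U_i$ and shows that the result does not depend on the choice of homogeneous coordinates.

```python
from fractions import Fraction

def chart(i, alpha):
    """Standard affine chart phi_i on U_i = {alpha_i != 0}.

    Divides every coordinate by alpha_i and omits the i-th entry.
    """
    if alpha[i] == 0:
        raise ValueError("point does not lie in U_i")
    return tuple(Fraction(a) / Fraction(alpha[i])
                 for j, a in enumerate(alpha) if j != i)

point = (2, 4, 6)     # homogeneous coordinates (2 : 4 : 6) in the projective plane
scaled = (1, 2, 3)    # the same point, coordinates divided by 2

in_U0 = chart(0, point)
in_U2 = chart(2, point)
```

Both `chart(0, point)` and `chart(0, scaled)` give the affine point $(2, 3)$, while `in_U2` is $(1/3, 2/3)$.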
A.6.3. Let $Y$ be a closed subset of $\mathbb{P}^n_K$. By A.4.8, it is a closed subvariety of
$\mathbb{P}^n_K$. We call it a projective variety over $K$. The homogeneous ideal $I(Y)$ of $Y$
is the ideal in $K[x_0, \dots, x_n]$ generated by the homogeneous polynomials vanishing
on $Y$. The homogeneous coordinate ring $S(Y)$ of $Y$ is defined by
$$S(Y) := \bigoplus_{d \in \mathbb{N}} S(Y)_d := K[x]/I(Y),$$
where the grading is induced by the degree of polynomials. If $Y$ is irreducible,
then the field of rational functions $K(Y)$ is the subfield of the quotient field of
$S(Y)$ given by
$$\{ f/g \mid \exists\, d \in \mathbb{N} \text{ with } f, g \in S(Y)_d \}.$$
To sketch the proof, we view $Y$ as a closed subset of $\mathbb{P}^n_K$. Then there is a
standard affine open subset $U_i$ with $Y \cap U_i \ne \emptyset$. By A.4.11, the quotient
field of $K[U_i \cap Y]$ is equal to $K(Y)$. Since $U_i$ is isomorphic to affine $n$-space,
any rational function may be written as the quotient of two polynomials in
$x_0, \dots, x_{i-1}, x_{i+1}, \dots, x_n$. By passing to the homogenizations, we get the claim.
A.6.4. The product of projective varieties over $K$ is again a projective variety over
$K$. To prove it, let $Y$ (resp. $Y'$) be a closed subvariety of $\mathbb{P}^n_K$ (resp.
$\mathbb{P}^m_K$). We consider the Segre embedding
$$\iota : \mathbb{P}^n_K \times \mathbb{P}^m_K \longrightarrow \mathbb{P}^N_K, \qquad (x, x') \mapsto (x_i x'_j)_{0 \le i \le n,\ 0 \le j \le m},$$
where $N = (n + 1)(m + 1) - 1$. As $Y \times Y'$ is a closed subvariety of
$\mathbb{P}^n_K \times \mathbb{P}^m_K$, it is enough to show that the Segre embedding is a closed
embedding. Let $(y_{ij})$ be the coordinates on $\mathbb{P}^N_K$ (say ordered lexicographically)
such that $\iota$ maps $(x, x')$ to $(y_{ij}) = (x_i x'_j)$. We have to prove that $\iota$ maps
$\mathbb{P}^n_K \times \mathbb{P}^m_K$ isomorphically onto
A.6.6. We claim that the global sections of $\mathcal{O}_{\mathbb{P}^n_K}(m)$ may be identified
with the homogeneous polynomials in $K[x_0, \dots, x_n]$ of degree $m$.
For $s \in \Gamma(\mathbb{P}^n_K, \mathcal{O}_{\mathbb{P}^n_K}(m))$ and with respect to the above
trivializations, there are regular functions $s_\alpha$ on $U_\alpha$ such that
$$(\varphi_\alpha \circ s)(x) = (x, s_\alpha(x)) \tag{A.4}$$
for all $x \in U_\alpha$. They satisfy
$$s_\alpha = g_{\alpha\beta}\, s_\beta \tag{A.5}$$
on $U_\alpha \cap U_\beta$, where $g_{\alpha\beta}(x) = (x_\beta/x_\alpha)^m$ are the transition
functions of $\mathcal{O}_{\mathbb{P}^n_K}(m)$. Conversely, any regular functions
$s_\alpha \in \mathcal{O}_{\mathbb{P}^n_K}(U_\alpha)$ for $\alpha = 0, \dots, n$ satisfying (A.5)
determine a unique global section $s$ with (A.4). Since $U_\alpha$ is a standard affine open
subset of $\mathbb{P}^n_K$, we may identify the $s_\alpha$ with polynomials. The rule (A.5) means
that there is a homogeneous polynomial of degree $m$ such that $s_\alpha$ is obtained by
inserting 1 for $x_\alpha$. Hence the global sections of $\mathcal{O}_{\mathbb{P}^n_K}(m)$ may be
identified with the homogeneous polynomials of degree $m$ in the variables
$x_0, \dots, x_n$ with coefficients in $K$.
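The compatibility rule (A.5) can be tested on a concrete polynomial. The following is a small sketch under stated assumptions: the twist `M = 2`, the polynomial `F` and the sample point are arbitrary illustrative choices, and `s` and `g` are hypothetical names for the local functions $s_\alpha$ and the transition functions $g_{\alpha\beta}$.

```python
from fractions import Fraction

M = 2  # the twist m (illustrative choice)

def F(x):
    """A homogeneous polynomial of degree M in x0, x1, x2 (arbitrary example)."""
    x0, x1, x2 = x
    return x0 * x0 + 3 * x1 * x2

def s(alpha, x):
    """Local function s_alpha: insert 1 for x_alpha, i.e. F(x) / x_alpha^M."""
    return Fraction(F(x)) / Fraction(x[alpha]) ** M

def g(alpha, beta, x):
    """Transition function g_{alpha beta}(x) = (x_beta / x_alpha)^M."""
    return (Fraction(x[beta]) / Fraction(x[alpha])) ** M

x = (2, 5, 7)  # a point lying in all three standard charts

# Check s_alpha = g_{alpha beta} * s_beta for every pair of charts.
compatible = all(s(a, x) == g(a, b, x) * s(b, x)
                 for a in range(3) for b in range(3))
```

By homogeneity, $s_\alpha$ is also unchanged when the homogeneous coordinates of the point are rescaled, so it is a well-defined function on $U_\alpha$.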
To prove this, we choose a closed embedding $i : X \to \mathbb{P}^m_K$ such that
$i^*\mathcal{O}_{\mathbb{P}^m_K}(1) = L$. We denote the coordinates on $\mathbb{P}^m_K$ by
$y_0, \dots, y_m$ and those of $\mathbb{P}^n_K$ by $x_0, \dots, x_n$. The
Remark A.6.12. A line bundle L on X is ample (resp. very ample) if and only
if the base change $L_{\overline{K}}$ on $X_{\overline{K}}$ is ample (resp. very ample). To see the ample
case, use the cohomological criterion of ampleness from [148], Prop.III.5.3, and
the compatibility of cohomology and base change (see A.10.28). The very ample
case follows from [137], Prop.2.7.1, implying of course also the ample case. In
fact, it is proved that a morphism is a closed embedding if and only if its base
change is a closed embedding.
Example A.6.13. A multiprojective space is a product
$$\mathbb{P}_K := \mathbb{P}^{n_1}_K \times \cdots \times \mathbb{P}^{n_r}_K$$
of projective spaces. The projection to the $i$th factor $\mathbb{P}^{n_i}_K$ is denoted by
$p_i$. Now for $d_1, \dots, d_r \in \mathbb{Z}$, let
$$\mathcal{O}_{\mathbb{P}}(d_1, \dots, d_r) := p_1^*\mathcal{O}_{\mathbb{P}^{n_1}}(d_1) \otimes \cdots \otimes p_r^*\mathcal{O}_{\mathbb{P}^{n_r}}(d_r).$$
Let $x_i = (x_{i0} : \cdots : x_{in_i})$ be the coordinates on $\mathbb{P}^{n_i}_K$. By generalizing
A.6.6, we see that the global sections of $\mathcal{O}_{\mathbb{P}}(d_1, \dots, d_r)$ may be
identified with the multihomogeneous polynomials in $x_1, \dots, x_r$, homogeneous of
degree $d_i$ in $x_i$.
The Segre embedding may be extended to include several factors, thereby proving
that $\mathcal{O}_{\mathbb{P}}(1, \dots, 1)$ is very ample. On the other hand,
$p_i^*\mathcal{O}_{\mathbb{P}^{n_i}}(1)$ is generated by global sections (but certainly not ample
for two or more factors). By A.6.10, we conclude that $\mathcal{O}(d_1, \dots, d_r)$ is very
ample if $d_1, \dots, d_r \ge 1$.
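The Segre map of A.6.4 for two factors can be sketched in a few lines. This is an illustration only, with integer coordinates and hypothetical helper names `segre` and `segre_minors`. It uses the standard fact that points in the image satisfy the quadratic relations $y_{ij} y_{kl} = y_{il} y_{kj}$, that is, the $(n+1) \times (m+1)$ array $(y_{ij})$ has rank one.

```python
from itertools import product

def segre(x, y):
    """Segre map (x, y) -> (x_i * y_j), indices in lexicographic order."""
    return [xi * yj for xi, yj in product(x, y)]

def segre_minors(z, n, m):
    """All 2x2 minors z_ij z_kl - z_il z_kj of the (n+1) x (m+1) array z."""
    def entry(i, j):
        return z[i * (m + 1) + j]
    return [entry(i, j) * entry(k, l) - entry(i, l) * entry(k, j)
            for i in range(n + 1) for k in range(n + 1)
            for j in range(m + 1) for l in range(m + 1)]

x = (1, 2)         # a point of the projective line (n = 1)
y = (3, 0, -1)     # a point of the projective plane (m = 2)
z = segre(x, y)    # a point of projective N-space, N = (1+1)(2+1) - 1 = 5

minors = segre_minors(z, 1, 2)
```

Here `z` has $N + 1 = 6$ homogeneous coordinates and every entry of `minors` is zero.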
A.6.14. A variety X over K is called complete if for all varieties Y over K , the
second projection p2 : X × Y → Y is closed (i.e. maps closed sets to closed sets).
In algebraic geometry, this is the analogue of compact complex manifolds.
There is a relative version of this notion called proper morphisms. Every closed
embedding and every morphism from a complete variety will be proper. This
notion is important in algebraic geometry; many finiteness results are related to it.
For our book, it plays only a minor role and is used to state the results properly.
The reader may always think of a morphism of complete varieties.
For $j = 1, 2$, let $\varphi_j : X_j \to S$ be a morphism of varieties over $K$. Then
$$X_1 \times_S X_2 := \{ (x_1, x_2) \in X_1 \times X_2 \mid \varphi_1(x_1) = \varphi_2(x_2) \}$$
is called the fibre product of $X_1$ and $X_2$ over $S$. It is easily seen that the fibre
product is a closed subset of $X_1 \times X_2$, so we may view $X_1 \times_S X_2$ as a closed
subvariety of $X_1 \times X_2$. Let $\varphi : X \to X'$ be a morphism of varieties over $K$. For
every morphism $\psi : Y' \to X'$, we define the base change
$\varphi_{Y'} : X \times_{X'} Y' \to Y'$ by $\varphi_{Y'}(x, y') = y'$. The morphism $\varphi$ is
called proper if all base changes of $\varphi$ are closed.
Clearly, a variety $X$ is complete if and only if the constant map $X \to \mathbb{A}^0_K$ is
a proper morphism. For details about proper morphisms, we refer to [148], II.4.
(However, the reader has to translate the results from the category of schemes. Note that the
fibre product of varieties is the variety associated to the fibre product of schemes. Since any
morphism of varieties is of finite type and separated, our definition of a proper map agrees
with the one in [148].)
A.6.15. We mention some properties of complete varieties over K . Most of them
can be deduced from the corresponding properties of proper morphisms in [148],
Cor.4.8.
(a) A closed subvariety of a complete variety over K is complete because any
closed embedding of varieties is proper and the composition of proper mor-
phisms is proper.
A.7.7. We use A.7.6 to extend the definition of the differential in A.7.4 assuming
no longer K(x) = K(y). Let ϕ : X → Y be a morphism mapping x to y . Since
we deal with a local problem, we may assume X, Y affine. By base change, ϕ
induces a homomorphism K[Y ] ⊗K K(x) → K[X] ⊗K K(x) of K(x)-algebras
and hence a K(x)-linear map
Der (K[X] ⊗K K(x), K(x)) → Der (K[Y ] ⊗K K(x), K(x)) , ∂ → ∂◦(ϕ ⊗1) .
By A.7.6, this is a map TX,x → TY,y ⊗K(y) K(x), which we call the differential
(dϕ)x .
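A.7.8. Let $x \in X$ and let $\mathfrak{m}_x$ be the maximal ideal of $\mathcal{O}_{X,x}$. Note that
$\mathfrak{m}_x/\mathfrak{m}_x^2$ is a vector space over $K(x) = \mathcal{O}_{X,x}/\mathfrak{m}_x$
whose dual is denoted by $(\mathfrak{m}_x/\mathfrak{m}_x^2)^*$. Then we have a $K(x)$-linear map
$$\rho : T_{X,x} \longrightarrow (\mathfrak{m}_x/\mathfrak{m}_x^2)^*,$$
where $\rho(\partial)$ is defined for $\partial \in T_{X,x}$ by
$$\rho(\partial)(f) := \partial(f) \in K(x)$$
for $f \in \mathfrak{m}_x$. By Leibniz’s rule, $\rho(\partial)$ vanishes on $\mathfrak{m}_x^2$ and
is $K(x)$-linear, hence we may view $\rho(\partial)$ as an element of
$(\mathfrak{m}_x/\mathfrak{m}_x^2)^*$.
Now we assume that $x$ is $K$-rational. Then $\rho$ is an isomorphism. Its inverse maps
$\ell \in (\mathfrak{m}_x/\mathfrak{m}_x^2)^*$ to $\partial \in T_{X,x}$ given by
$$\partial(f) := \ell(f - f(x)), \qquad f \in \mathcal{O}_{X,x}.$$
The space $(\mathfrak{m}_x/\mathfrak{m}_x^2)^*$ is Zariski’s definition for the tangent space at
$x \in X(K)$.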
A.7.9. For x ∈ X , let dimx (X) := maxY dim(Y ), where Y ranges over all
irreducible components containing x. Since the prime ideals of OX,x are in one-
to-one correspondence with the closed irreducible subsets containing x , we get
dimx (X) = dim (OX,x ).
From commutative algebra ([197], p.78), we know
$$\dim_{K(x)}(\mathfrak{m}_x/\mathfrak{m}_x^2) \ge \dim(\mathcal{O}_{X,x}).$$
We call OX,x a regular local ring if equality occurs. More generally, this holds
for any noetherian local ring.
If OX,x is regular, then we say that x is a regular point of X . A singular point
is a point which is not regular. If all points of X are regular, then X is called a
regular variety.
A.7.10. For $x \in X$, we have
$$\dim_{K(x)}(T_{X,x}) \ge \dim_x(X).$$
This follows from A.7.8 and A.7.9 if $x$ is $K$-rational. In general, we use base
change to $\overline{K}$. The right-hand side does not change (see A.4.10). Then we use
A.7.6 to get the claim.
A.7.11. A point x ∈ X is called smooth if
dimK(x) (TX,x ) = dimx (X).
If all points are smooth, then X is a smooth variety over K.
A.7.12. Let $x \in X$ be a smooth point. We claim that $x$ is a regular point of
$X_{\overline{K}}$ and hence also of $X$ ([137], Prop.0.17.3.3, for the descent).
A.7.15. Let $X$ be a closed subset of $\mathbb{A}^n_K$. For $x \in X$, we have the Jacobi
criterion of smoothness: Let $f_1, \dots, f_r$ be generators of $I(X)$. Then $x$ is a smooth
point of $X$ if and only if the Jacobi-matrix
$$\Big( \frac{\partial f_j}{\partial x_i}(x) \Big)_{1 \le i \le n,\ 1 \le j \le r}$$
has rank $n - \dim_x(X)$.
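As a worked instance of the Jacobi criterion, consider the cuspidal cubic $y^2 = x^3$ in the affine plane. This example is not from the original text; it is a sketch under the assumption that $I(X)$ is generated by the single irreducible polynomial $f = y^2 - x^3$, so that $n = 2$, $r = 1$ and $\dim_x(X) = 1$. The helper names `rank`, `jacobi_matrix` and `is_smooth` are hypothetical.

```python
from fractions import Fraction

def rank(matrix):
    """Rank by Gaussian elimination over the rationals."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    r = 0
    cols = len(rows[0]) if rows else 0
    for c in range(cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def jacobi_matrix(x, y):
    """Partial derivatives of f = y^2 - x^3 as an n x r = 2 x 1 matrix."""
    return [[-3 * x * x], [2 * y]]

N_VARS, DIM = 2, 1   # the cuspidal cubic is a curve in the affine plane

def is_smooth(x, y):
    """Jacobi criterion: the rank must equal n - dim_x(X)."""
    return rank(jacobi_matrix(x, y)) == N_VARS - DIM

smooth_at_1_1 = is_smooth(1, 1)    # the point (1, 1) lies on the curve
smooth_at_cusp = is_smooth(0, 0)   # the origin is the cusp
```

At $(1, 1)$ the Jacobi matrix has rank 1, so the point is smooth; at the origin both partial derivatives vanish, the rank drops to 0, and the criterion detects the singular point.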
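$$\det\Big( \frac{\partial f_j}{\partial x_i}(x) \Big)_{i \in I,\ j \in J} \ne 0$$
$$\Big( \frac{\partial f_j}{\partial x_i}(x) \Big)_{1 \le i \le n,\ 1 \le j \le r} = \begin{pmatrix} A(x) & B(x) \\ C(x) & D(x) \end{pmatrix},$$
$$(g_1(x), \dots, g_n(x)) \begin{pmatrix} A(x) \\ C(x) \end{pmatrix} = 0.$$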
By linear algebra, a basis of solutions is given by the rows of the $d \times n$ matrix
$$(h_{ij}(x)) := \big( -C(x)A^{-1}(x) \ \ I_d \big).$$
Clearly, the entries of $A$ and $C$ are regular functions on $X$. By the formula for
$A^{-1}(x)$ using determinants, this also holds for the entries of $A^{-1}(x)$. We conclude
that $h_{ij} \in K[X]$. For $i = 1, \dots, d$, let
$\partial_i := \sum_{j=1}^{n} h_{ij} \frac{\partial}{\partial x_j}$. Since its evaluation at every
point of $X$ satisfies (A.7), this is a well-defined vector field on $X$, i.e.
$\partial_i \in \mathrm{Der}(K[X], K[X])$. Then $\partial_1|_x, \dots, \partial_d|_x$ is a basis of
$T_{X,x}$ for all $x$, therefore $\partial_1, \dots, \partial_d$ are $K[X]$-linearly
independent. For any $\partial \in \mathrm{Der}(K[X], K[X])$, we have
$$\partial = \sum_{j=1}^{d} \partial(x_{n-d+j})\, \partial_j,$$
A.8. Divisors
From A.8.4 (b), we get the property div(f g) = div(f )+div(g) for f, g ∈ K(C)\
{0}.
546 A L G E B R A I C G E O M E T RY
A.8.7. In order to extend this construction to a higher dimension, we need the local
ring in a prime divisor. Let Y be an irreducible closed subset of the K-variety X.
Then we consider pairs (U, f), where U is an open subset of X with U ∩ Y ≠ ∅
and f ∈ OX(U). Two pairs (U, f) and (U′, f′) are called equivalent if f = f′
on an open subset U″ ⊂ U ∩ U′ with U″ ∩ Y ≠ ∅. (Since Y is irreducible,
U ∩ Y and U′ ∩ Y are both open dense subsets of Y and hence U ∩ U′ ∩ Y is not
empty.) The equivalence classes form a ring OX,Y called the local ring of X in
Y. It is a local ring with maximal ideal mY formed by the classes of (U, f) with
f(Y ∩ U) = {0}.
If Y is of dimension 0, then we have seen in A.2.7 that Y is the set of conjugates
of a point x ∈ X . Just by definition, we have OX,x = OX,Y .
A.8.8. To study the local ring in a prime divisor Y, we may restrict to any open
subset U of X with U ∩ Y ≠ ∅. So we may assume that X is an affine variety
over K. By Example A.3.2, the ideal {f ∈ K[X] | f(Y) = {0}} of Y is a prime
ideal ℘ in K[X]. We have a homomorphism
K[X] −→ OX,Y , f ↦ (X, f),
which induces a homomorphism K[X]℘ → OX,Y . The latter is an isomorphism
by definition of localization and since f is locally the quotient of two regular
functions on X . Note that the prime ideals of K[X]℘ are in one-to-one corre-
spondence with prime ideals in K[X] contained in ℘ . A prime ideal ℘˜ of K[X]℘
corresponds to its inverse image in K[X] (see [157], Prop.7.9). Using the one-to-
one correspondence between prime ideals in K[X] and irreducible closed subsets
of X , we conclude that
dim(OX,Y ) = codim(Y, X). (A.9)
This holds also for non-affine varieties X . We leave the details to the reader.
A.8.9. A variety X over K is called regular in codimension 1 if OX,Y is a
regular local ring for all prime divisors Y of X .
A.8.10. A variety X over K is said to be normal if OX,x is an integrally
closed domain for all x ∈ X. An easy exercise shows that any localization
of an integrally closed domain remains integrally closed. For a prime divisor
Y of X and y ∈ Y, OX,Y is the localization of OX,y in the prime ideal
{f ∈ OX,y | f|Y = 0}. We conclude that OX,Y is an integrally closed domain
of Krull dimension 1.
By Theorem A.8.5, a normal variety is regular in codimension 1. Moreover, every
regular variety is normal. This is an easy consequence of the fact that a regular
local ring is a unique factorization domain ([197], Th.48, p.142).
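To see what can fail without regularity, the following sketch looks at the cuspidal cubic. The identification of the coordinate ring with $K[t^2, t^3]$ is a standard fact that we assume here; it is not taken from the text above.

```latex
% Worked example: X = Z(y^2 - x^3) in A^2_K is not normal at the origin.
% Assumed identification: K[X] = K[x,y]/(y^2 - x^3) is isomorphic to K[t^2, t^3]
% via x -> t^2, y -> t^3, so that K(X) = K(t) with t = y/x.
\[
  t = \frac{y}{x} \in K(X), \qquad t^2 - x = 0 ,
\]
% so t is integral over O_{X,0}. But t is not in O_{X,0}: if t = a/b with
% a, b in K[t^2, t^3] and b(0) \neq 0, then the product t b has the non-zero
% linear term b(0) t, while a has no linear term. Hence O_{X,0} is not
% integrally closed, in accordance with the fact that the origin is a
% singular point of X.
```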
A.8.11. Let X be a K -variety which is regular in codimension 1. For any prime
divisor Y , the regular local ring OX,Y has Krull dimension 1 by (A.9). By
Theorem A.8.5, OX,Y is a discrete valuation ring for a canonical valuation ordY
on its field of fractions Q. We call ordY(f) the order of f ∈ Q in Y.
If X is irreducible, then Q = K(X) and we define the Weil divisor of a non-zero
rational function f on X by
\[ \operatorname{div}(f) := \sum_{Y} \operatorname{ord}_Y(f)\, Y, \]
where Y ranges over all prime divisors of X. Such divisors are also called principal
Weil divisors. We have to prove that ordY(f) ≠ 0 only for finitely many prime divisors
Y. As in Example A.8.6, we may assume that X is affine and that f ∈ K[X].
Then f is an invertible regular function outside Z({f}), hence ordY(f) ≠ 0
only for irreducible components of Z({f}), proving the claim.
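A small computation may make the definition concrete. The sketch below takes $X = \mathbb{A}^1_K$ and a rational function chosen by us for illustration; it uses only that the local ring of $\mathbb{A}^1_K$ in a K-rational point a is $K[x]_{(x-a)}$ with local parameter x − a.

```latex
% Worked example on X = A^1_K with f = x^2 / (x - 1).
% ord_{[0]}(f) = 2   since x is a local parameter at 0 and x - 1 is a unit there,
% ord_{[1]}(f) = -1  since x - 1 is a local parameter at 1 and x^2 is a unit there,
% ord_Y(f) = 0       for every other prime divisor Y, where f is a unit.
\[
  \operatorname{div}\!\left( \frac{x^2}{x-1} \right) = 2\,[0] - [1].
\]
```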
To see surjectivity, we have to prove that every line bundle L has an invertible meromorphic
section. For any irreducible component Xj of X, we choose a non-empty trivialization
(Uj, ϕj) of L such that Uj is disjoint from the other irreducible components. Then we
have sj ∈ Γ(Uj, L) given by $s_j(x) = \varphi_j^{-1}(x, 1)$. The union U of all the Uj is disjoint
and there is s ∈ Γ(U, L) defined by $s|_{U_j} = s_j$. Since U is dense, (U, s) is an invertible
meromorphic section of L.
A.8.17. The elements of D(X) may be described by Cartier divisors. The idea
behind this concept is that divisors should be locally given by single equations. To
make it precise, a Cartier divisor on X is given by the data (Uα, fα)α∈I, where
(Uα)α∈I is an open covering, fα is a unit in K(Uα), and fα/fβ is a unit in
OX(Uα ∩ Uβ) for all α, β. We identify two Cartier divisors (Uα, fα)α∈I,
(U′β, f′β)β∈J if fα/f′β is a unit in OX(Uα ∩ U′β) for all α, β. Given two Cartier
divisors D and D′, it is always possible to pass to a common refinement. So we
add two Cartier divisors D and D′ by choosing representatives (Uα, fα)α∈I and
(Uα, f′α)α∈I and then D + D′ is given by (Uα, fα f′α)α∈I. Clearly, the Cartier
divisors form an abelian group.
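The following sketch spells out the data of a Cartier divisor in the simplest case, a single point on the projective line. The covering and the local equations are our own choice for illustration.

```latex
% Worked example: the point P = (1 : 0) on X = P^1_K as a Cartier divisor.
% Covering: U_0 = {x_0 \neq 0}, U_1 = {x_1 \neq 0}.
\[
  D = \bigl( (U_0, f_0),\ (U_1, f_1) \bigr), \qquad
  f_0 = \frac{x_1}{x_0}, \quad f_1 = 1 .
\]
% On U_0 the function x_1/x_0 is an affine coordinate vanishing exactly at P;
% on U_1 the point P does not occur, so the local equation is 1.
% Compatibility: f_0 / f_1 = x_1/x_0 has neither zeros nor poles on
% U_0 \cap U_1, hence it is a unit in O_X(U_0 \cap U_1).
% The sum D + D is given by (U_0, f_0^2), (U_1, 1).
```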
A.8.18. We will show below that D(X) is isomorphic to the group of Cartier
divisors. The line bundle associated to the Cartier divisor D will be denoted by
O(D); it comes with a distinguished invertible meromorphic section sD. We will
speak of properties of D such as ample or base-point-free if O(D) has the
corresponding properties.
Let s be an invertible meromorphic section of a line bundle L and let us choose a
trivialization (Uα , ϕα )α∈I of L. Then sα := ϕα ◦ s may be viewed as a rational
function on Uα . If gαβ is the transition function, then we have sα = gαβ sβ on
Uα ∩Uβ . Since gαβ is a unit, we see that D(s) := (Uα , sα )α∈I is a Cartier divisor
on X. It does not depend on the choice of the trivialization. Then (L, s) ↦ D(s)
induces a homomorphism from D(X) to the group of Cartier divisors.
A.8.26. The pull-back of a Cartier divisor is not always well defined as a Cartier
divisor. Let ϕ : X′ → X be a morphism of varieties over K and let D =
(Uα, fα)α∈I be a Cartier divisor on X. We have to assume that the image of no
irreducible component of X′ is contained in supp(D). Then the pull-back of D
is the Cartier divisor
\[ \varphi^*(D) := (\varphi^{-1}(U_\alpha),\ f_\alpha \circ \varphi)_{\alpha \in I}. \]
Our assumptions imply that the fα ◦ ϕ are well-defined rational functions on X′. It
is easy to check that $(O(\varphi^* D), s_{\varphi^*(D)}) = (\varphi^* O(D), \varphi^*(s_D))$.
The basic reference for intersection theory is the book of W. Fulton [125]. We
need only the first two chapters, namely properties of the intersection product of
a divisor with a closed subvariety. We collect here the most important results, not
going into full generality, assuming often that the ambient variety is smooth and
neglecting the theory of refined intersections, which takes care of supports.
In this section, X is a variety over a field K .
A.9.1. We extend the concept of divisors to higher codimension. A cycle of dimension
d is a formal linear combination of irreducible closed subvarieties of X
of dimension d with coefficients in Z. They form an abelian group Zd(X). The
elements of $Z(X) := \bigoplus_{d \in \mathbb{N}} Z_d(X)$ are called cycles. A basis of this abelian
group is formed by the irreducible closed subvarieties, called prime cycles. If
$Z = \sum_Y n_Y Y$ is a cycle, then the prime cycles Y with multiplicity nY ≠ 0 are
called the components of Z. By definition, they are finite in number. Their union
is called the support of Z. Note that a Weil divisor is a cycle of pure codimension
1.
A.9.2. We assume that X is irreducible and Y is a prime divisor on X . The
goal is to define the order of a non-zero function f of X in Y . This will be
a generalization of the construction in A.8.11. In A.8.7, we have introduced the
local ring OX,Y . By A.8.8, it is an integral domain of Krull dimension 1 with
quotient field K(X).
For any non-zero f in the maximal ideal mY of OX,Y , the ring A = OX,Y /f OX,Y
has Krull dimension 0. Since the localization of a noetherian ring remains noe-
therian ([157], Th.7.10), it follows from A.8.8 that OX,Y is a noetherian ring. In
commutative algebra, a theorem of Krull says that the intersection of all prime
ideals in a commutative ring is the ideal of nilpotent elements ([157], Th.7.1).
Since A is a noetherian local ring whose maximal ideal m = mY/f OX,Y is the
unique prime ideal, we conclude that m is nilpotent, i.e. the ideal $\mathfrak{m}^n$ generated
by the n-fold products in m is 0 for some n ∈ N. Now we have a chain of ideals
\[ 0 = \mathfrak{m}^n \subset \mathfrak{m}^{n-1} \subset \cdots \subset \mathfrak{m} \subset A. \tag{A.10} \]
where Y ranges over all prime divisors of X . It is called the Weil divisor of f
(or principal Weil divisor). By A.9.5, we have
div(f g) = div(f ) + div(g)
for non-zero f, g ∈ K(X).
The assumption X irreducible was just made for simplicity. By the same construc-
tion, we can define the Weil divisor of a rational function, which is not identically
zero on any irreducible component of X .
A.9.8. Let X be a K -variety. We consider the subgroup R(X) of Z(X) gen-
erated by div(f ), where f ranges over all non-zero rational functions on prime
cycles of X . Note that we view the Weil divisor div(f ) of Y as a cycle on X .
Two cycles are called rationally equivalent if their difference is in R(X). This
gives an equivalence relation on Z(X) denoted by ∼ . The quotient CH(X) :=
Z(X)/R(X) is called the Chow group. We grade it by dimension.
A.9.9. Let ϕ : X → X′ be a morphism of varieties with X complete (or more
generally a proper morphism). Then the image of a closed subset of X is a closed
subset of X′ (see A.6.15). Let Y be a prime cycle of X. Then ϕ(Y) is a prime
cycle of X′ and we may view K(ϕ(Y)) as a subfield of K(Y) by the map f ↦
f ◦ ϕ. Let
\[ \varphi_*(Y) := \begin{cases}
   [K(Y) : K(\varphi(Y))]\, \varphi(Y) & \text{if } [K(Y) : K(\varphi(Y))] < \infty, \\
   0 & \text{else.}
\end{cases} \]
For any cycle $Z = \sum_Y n_Y Y$, we define the push-forward of Z by
\[ \varphi_*(Z) := \sum_Y n_Y\, \varphi_*(Y) \in Z(X'), \]
where Y ranges over all the prime cycles of X.
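The definition of the push-forward can be tested on the squaring map of the projective line. The morphism below is our own choice of example; the only inputs are the degrees of the relevant field extensions.

```latex
% Worked example: \varphi : P^1_K -> P^1_K, (x_0 : x_1) -> (x_0^2 : x_1^2).
% With affine coordinates t on the source and s on the target, s \circ \varphi = t^2,
% so K(\varphi(P^1)) = K(s) sits inside K(P^1) = K(t) as K(t^2).
\[
  \varphi_*\bigl( \mathbb{P}^1_K \bigr)
  = [K(t) : K(t^2)]\; \mathbb{P}^1_K = 2\, \mathbb{P}^1_K ,
\]
% For a K-rational point a, the prime cycle [a] maps to the K-rational point a^2
% and both function fields are K, so the degree is 1:
\[
  \varphi_*\bigl( [a] \bigr) = [a^2], \qquad
  \varphi_*\bigl( [a] + [-a] \bigr) = 2\,[a^2] .
\]
```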
A.9.10. Note that [K(Y ) : K(ϕ(Y ))] < ∞ if and only if Y and ϕ(Y ) have
the same dimension. This follows from the fact that the dimension is equal to
the transcendence degree of the function field (see A.4.11). If K(Y ) is separable
over K(ϕ(Y )), then [K(Y ) : K(ϕ(Y ))] is the number of points in the fibre of a
generic point of ϕ(Y ) (see A.12.9).
A.9.11. Let ϕ : X → X′ be a surjective proper morphism of irreducible varieties
over K and let f be a non-zero rational function on X. We have seen in A.9.9
that K(X) is a field extension of K(X′). If this is a finite extension, then
\[ \varphi_*(\operatorname{div}(f)) = \operatorname{div}(N(f)), \]
where N : K(X) → K(X′) is the norm. If the extension is infinite, the
push-forward is 0. For a proof, see [125], Prop.1.4.
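The norm formula can be checked by hand in a small case. The sketch below again uses the squaring map of the projective line and a linear function, both chosen by us; the norm is computed as the determinant of multiplication by f on the basis 1, t of K(t) over K(s).

```latex
% Worked example: \varphi : P^1_K -> P^1_K with s \circ \varphi = t^2, and f = t - a, a in K.
% Multiplication by f on the K(s)-basis (1, t) of K(t):
%   1 -> -a + t,   t -> t^2 - a t = s - a t.
\[
  N(f) = \det \begin{pmatrix} -a & s \\ 1 & -a \end{pmatrix} = a^2 - s .
\]
% Left-hand side: div(f) = [a] - [\infty], hence
\[
  \varphi_*(\operatorname{div}(f)) = [a^2] - [\infty] .
\]
% Right-hand side: a^2 - s has a simple zero at s = a^2 and a simple pole at infinity,
\[
  \operatorname{div}(N(f)) = [a^2] - [\infty] ,
\]
% so both sides agree.
```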
A.9.12. Let ϕ : X → X′ be a proper morphism of varieties over K. Then A.9.11
shows that ϕ∗R(X) ⊂ R(X′) and so we get a push-forward map
ϕ∗ : CH(X) −→ CH(X′),
mapping the class of a cycle to the class of its push-forward.
A.9.13. Let X be a K-variety and let L be a line bundle on X. For an invertible
meromorphic section s of L, the Weil divisor div(s) associated to s is defined
as in A.8.12, and A.8.13 still holds.
A.9.14. Let Y be a prime cycle on a K-variety X. Then we define c1(L).Y ∈
CH(X) to be the rational equivalence class of div(sY), where sY is any
invertible meromorphic section of L|Y. We have seen in A.8.16 that such an sY
always exists. If s′Y is another choice, then sY/s′Y is a rational function and
hence div(sY) and div(s′Y) are rationally equivalent.
By additivity, we define c1(L).Z for all cycles Z on X. If Z is rationally
equivalent to 0, then c1(L).Z = 0 (see [125], Cor.2.4.1). Therefore c1(L).α is
well-defined for α ∈ CH(X) by using representatives. Then the homomorphism
CH(X) −→ CH(X), α ↦ c1(L).α
is called the first Chern class operation of L. Clearly, it depends only on the
isomorphism class of the line bundle.
A.9.15. If L and L′ are line bundles on X and α ∈ CH(X), then
c1(L ⊗ L′).α = c1(L).α + c1(L′).α.
This follows immediately from A.8.13. Moreover, we have
c1(L).(c1(L′).α) = c1(L′).(c1(L).α).
For a proof of commutativity, we refer to [125], Cor.2.4.2.
A.9. Intersection theory of divisors 555
Note that for an irreducible closed subvariety Y of a variety X over K , the base change
to F as a scheme may be non-reduced and hence may be different from the base change as
a variety. Only the use of schemes leads to the above compatibilities.
A.9.18. Now let X be a smooth variety over K. Then Cartier divisors and Weil
divisors are the same (cf. A.8.21). It follows immediately that we have an
isomorphism
\[ \operatorname{Pic}(X) \longrightarrow CH^1(X), \qquad \operatorname{cl}(L) \mapsto c_1(L).X. \]
Hence we get an intersection theory with divisors: For a divisor D on X and a
cycle Z on X, the intersection product is
D.Z := c1(O(D)).Z ∈ CH(X).
Then the intersection product is bilinear, compatible with rational equivalence and
satisfies commutativity for divisors. If ϕ : X → X′ is a morphism of smooth
varieties, then we have a pull-back ϕ∗(D′) := c1(ϕ∗O(D′)).X ∈ CH(X) of
To prove this, note that D is given as a divisor by a local equation γ ∈ OX (U ) for some
open subset U intersecting W . Then the intersection multiplicity of D.Y in W is the
order of γ in OY ,W . Since γ vanishes on W , it is contained in the maximal ideal mY ,W
of OY ,W . Hence the length of OY ,W /γOY ,W is at least 1 .
As a corollary of the proof, we see that the intersection multiplicity of W in D.Y is 1
if and only if the maximal ideal mY,W of OY,W is generated by a local equation of
D. In this case, OY,W is a regular local ring since it follows that $\mathfrak{m}_{Y,W}/\mathfrak{m}_{Y,W}^2$ is
a one-dimensional K(W) = OY,W/mY,W-vector space and the Krull dimension
of OY,W is codim(W, Y) = 1 (cf. A.8.8).
Example A.9.22. Assume that D and Y are both smooth. Then the intersection
of D and Y is called transversal if $T_{Y,y} \not\subset T_{D,y}$ for all y ∈ D ∩ Y. We claim
that the intersection multiplicity of an irreducible component W of D ∩ Y in D.Y
is 1.
To prove it, we may assume that K is algebraically closed because the intersection product
is compatible with base change (see A.9.17). We may assume that D is given by a single
equation γ ∈ OX(X). Let $\gamma = u\pi^n$ for a unit u in OY,W and a local parameter π
in OY,W (which is a principal ideal domain since Y is smooth). We know that n =
ordW(γ|Y) ≥ 1. There is a y ∈ W such that π vanishes in y and u is regular in
y. Using Zariski’s definition of the tangent space (see A.7.8), $T_{Y,y} \not\subset T_{D,y}$ means that
$\gamma \in \mathfrak{m}_{Y,y} \setminus \mathfrak{m}_{Y,y}^2$. It follows that n = 1, proving the claim.
A.9.26. Let X be a projective variety over K and let L be an ample line bundle
on X . For a prime cycle Z ∈ Zd (X), the degree of Z with respect to L is
degL (Z) := deg(c1 (L) . . . c1 (L).Z) ∈ Z,
where c1(L) occurs d times. By additivity, we extend the degree to all cycles. In
particular, we define the degree of X with respect to L by
\[ \deg_L(X) := \sum_j \deg_L(X_j), \]
Let X be a prime divisor of AnK . First, we show that the ideal I(X) of X in K[x] is
generated by an irreducible polynomial. We choose any non-zero f (x) ∈ I(X) . Clearly,
f (x) is not a constant. By considering the prime factorization of f (x) and since I(X)
is a prime ideal, we see that I(X) contains an irreducible polynomial g(x) . Since X
has codimension 1 , we have X = Z(g) and Hilbert’s Nullstellensatz shows that g(x)
generates I(X). Next, we have to prove X = div(g). To see it, note that g is invertible
outside of X. This proves div(g) = mX for some m ∈ N. The maximal ideal of the
local ring $O_{\mathbb{A}^n_K, X}$ is generated by g, proving X = div(g). Since this holds for any prime
divisor, we get $CH^1(\mathbb{A}^n_K) = \{0\}$.
As a corollary of A.9.18, we see that any line bundle on AnK is isomorphic to the
trivial bundle, i.e. Pic(AnK ) = {0}.
Example A.9.28. We deduce from the above that
\[ \operatorname{Pic}(\mathbb{P}^n_K) \cong CH^1(\mathbb{P}^n_K) \cong \mathbb{Z}. \]
To see it, let Y be a prime divisor of $\mathbb{P}^n_K$. We choose a standard affine open subset Uj =
{xj ≠ 0} with Uj ∩ Y ≠ ∅. By the example above, Y ∩ Uj = div(f) for some
$f \in O_{\mathbb{P}^n_K}(U_j)$. Using $U_j \cong \mathbb{A}^n_K$, f comes from an irreducible polynomial of degree d in
n variables. Let $\tilde f$ be the homogenization of f, i.e.
\[ \tilde f(x_0, \dots, x_n) = x_j^d\, f\!\left( \frac{x_0}{x_j}, \dots, \frac{x_{j-1}}{x_j}, \frac{x_{j+1}}{x_j}, \dots, \frac{x_n}{x_j} \right). \]
We may view $\tilde f$ as a global section of $O_{\mathbb{P}^n_K}(d)$ (see A.6.6). We claim that $\operatorname{div}(\tilde f) = Y$.
This follows immediately from div(f) = Y ∩ Uj and the non-vanishing of $\tilde f$ outside
of Uj. So we conclude that $d \mapsto c_1(O_{\mathbb{P}^n_K}(d)).\mathbb{P}^n_K$ is a surjective homomorphism of Z
onto $CH^1(\mathbb{P}^n_K)$. Since the line bundles $O_{\mathbb{P}^n_K}(d)$ are pairwise not isomorphic (compare the
dimensions of $\Gamma(\mathbb{P}^n_K, O_{\mathbb{P}^n_K}(\pm d))$) and since $\operatorname{Pic}(X) \cong CH^1(X)$ (see A.9.18), we get the
claim.
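The homogenization step in this argument can be made explicit for a plane conic. The polynomial below is our own choice for illustration; we take n = 2 and j = 0, with affine coordinates $y_1 = x_1/x_0$, $y_2 = x_2/x_0$ on $U_0$.

```latex
% Worked example: n = 2, j = 0, f(y_1, y_2) = y_2 - y_1^2, of degree d = 2.
\[
  \tilde f(x_0, x_1, x_2)
  = x_0^2 \left( \frac{x_2}{x_0} - \frac{x_1^2}{x_0^2} \right)
  = x_0 x_2 - x_1^2 .
\]
% \tilde f is a global section of O(2). On the line {x_0 = 0} it restricts to
% -x_1^2, which is not identically zero, so div(\tilde f) has no component
% outside U_0 and equals the conic Y = Z(x_0 x_2 - x_1^2).
% Under the surjection d -> c_1(O(d)).P^2_K, the class of Y is the image of 2.
```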
For a multiprojective space $\mathbb{P}_K := \mathbb{P}^{n_1}_K \times \cdots \times \mathbb{P}^{n_r}_K$ (see A.6.13), we can similarly
show that $\operatorname{Pic}(\mathbb{P}_K) \cong \mathbb{Z}^r$. We simply have to replace homogeneous polynomials
in the above consideration by multihomogeneous polynomials.
Proposition A.9.29. Let X be a variety over K, then $CH^p(X) \xrightarrow{\ \sim\ } CH^p(X \times \mathbb{A}^n_K)$,
where the isomorphism is given by mapping a prime cycle Y on X to the
prime cycle $Y \times \mathbb{A}^n_K$ on $X \times \mathbb{A}^n_K$ (see A.4.11).
Proof: For simplicity, we only prove surjectivity (see [125], Th.3.3, for a complete proof). By
induction, we may assume that n = 1. Let Y be a prime cycle on $X \times \mathbb{A}^1_K$. We have to
prove that Y is rationally equivalent to a linear combination of cycles of the form $Y' \times \mathbb{A}^1_K$.
Replacing X by the closure of p1(Y), where p1 denotes the first projection, we may
assume $X = \overline{p_1(Y)}$. If dim(Y) > dim(X), then $Y = X \times \mathbb{A}^1_K$. So we may assume
dim(Y) = dim(X). Then Y is a divisor in $X \times \mathbb{A}^1_K$. Let U be a non-empty affine open
subset of X. As a closure of an irreducible subset, X is irreducible and hence the same
holds for U. We consider the ideal I(Y ∩ U′) in K[U′] = K[U][x] for $U' = U \times \mathbb{A}^1_K$.
The ideal in K(U)[x] generated by I(Y ∩ U′) is generated by a polynomial f with
coefficients in K(U). By shrinking U, we may assume that f ∈ K[U][x] and that f
generates I(Y ∩ U′) in K[U′]. This shows that div(f) and Y agree on U′, hence
$Y = \operatorname{div}(f) + \sum_j n_j (Y_j \times \mathbb{A}^1_K)$, where Yj is ranging over the irreducible components
of X \ U (of codimension 1). This proves surjectivity.
Remark A.9.30. In particular, we have CH(AnK ) = {0}. Note that we need only
surjectivity in the above statement.
We use the above isomorphism to define a ring structure on CH(PnK ). The mul-
tiplication is called the intersection product of cycles on PnK (see [125] for an
intersection product on any smooth variety). Note that this extends the intersection
For a proof, we may assume that L is very ample (replace L by a suitable power). But
then, we may assume that X is a closed subvariety of PnK and L = OPnK (1)|X . Then
degL (Z) is the degree of Z in PnK . We may assume that Z is a prime cycle of dimension
d . Let H be a hyperplane in PnK not containing the prime cycle Z . If d > 0 , then Z is
not contained in the affine space PnK \ H (see A.6.15). Therefore H ∩ Z is not empty and,
since the multiplicities of H.Z are at least 1 in every irreducible component of H ∩ Z (see
A.9.21), we get the claim by induction on d .
for some t1 , t2 ∈ T (K). Here, we identify the fibres Xtj := X × {tj } with X .
This is possible because of the K -rationality of tj .
By passing to a suitable tensor power, we may assume that L1 and L2 are both very
ample (see A.6.10). Then $\deg_{L_j}(X)$ is determined by the leading coefficient of the Hilbert
polynomial of Lj (see A.10.33). By definition, we have a line bundle L on X × T with
T an irreducible smooth variety over K such that $L_{t_1} \cong L_1$, $L_{t_2} \cong L_2$ for some t1, t2 ∈
T(K). Since the projection of X × T onto X is flat (see A.12.11) and X is complete,
the Hilbert polynomial of Lt does not depend on the choice of t ∈ T (see A.10.35). This
proves the claim.
Chow’s lemma ([137], Th.5.6.1) says that every complete K-variety X is the image
of a birational surjective morphism ϕ : X′ → X from a projective K-variety X′.
By the projection formula, the invariance of the degree under algebraic equivalence
holds more generally for complete varieties over K.
A.9.38. Let X be a complete K -variety with line bundles L1 , . . . , Lr and let
Z ∈ Zr (X). Then deg(c1 (L1 ) . . . c1 (Lr ).Z) depends only on the algebraic
equivalence classes of L1 , . . . , Lr .
(a) If D1 and D2 are rationally equivalent, then they are algebraically equiva-
lent.
(b) If D1 and D2 are algebraically equivalent and D is a further divisor on
X , then D1 + D is algebraically equivalent to D2 + D .
(c) Algebraic equivalence is preserved by pull-back of divisors with respect to
a morphism of smooth varieties over K .
(d) Algebraically equivalent divisors on a smooth projective curve have the
same degree.
A.9.40. Let X be an irreducible smooth projective curve over an algebraically
closed field. Then the converse of (d) also holds, i.e. if deg(D1 ) = deg(D2 ), then
D1 and D2 are algebraically equivalent.
To prove it, note that D1 and D2 are sums of ±[x] for some x ∈ X. Since they have the
same degree and algebraic equivalence is compatible with sums of divisors, we may assume
D1 = [x1], D2 = [x2]. Note that X × X is smooth (see A.7.17) and the diagonal ∆
is a divisor on X × X. Moreover, for any x ∈ X, we have $O(\Delta)|_{X \times \{x\}} \cong O([x])$.
This follows from transversality of X × {x} and ∆ in X × X. So we choose T := X as
parameter space, t1 := x1, t2 := x2 and L = O(∆) to get the claim.
This section gives a brief introduction to sheaf cohomology on varieties. For more
details, we refer to [148], Ch.III. First, we need some additional constructions of
sheaves. We consider a topological space T . On T , all (pre-)sheaves will be
(pre-)sheaves of abelian groups. Later, we pass to a K -variety X .
A.10.1. For a presheaf F on T , there is a canonical way to associate a sheaf F +
on T and a homomorphism ι : F → F + such that for any sheaf G on T and any
homomorphism ϕ : F → G , there is a unique homomorphism ϕ+ : F + → G
with ϕ = ϕ+ ◦ ι . Then F + is called the sheaf associated to F . For the easy
construction, we refer to [148], Prop.-Def.II.1.2.
A.10.2. Let F be a sheaf on T . A subsheaf of F is a sheaf G such that G(U )
is a subgroup of F(U ) for all open subsets U of T and such that the restriction
maps of G are induced by the restriction maps of F .
where ρ is the restriction homomorphism of the sheaf F. Then we get the Čech
complex
\[ 0 \xrightarrow{\ d^{-1}\ } C^0(\mathcal{U}, \mathcal{F}) \xrightarrow{\ d^{0}\ } C^1(\mathcal{U}, \mathcal{F}) \xrightarrow{\ d^{1}\ } \cdots \]
A.10. Cohomology of sheaves 565
over the terminology of vector bundles (especially of line bundles) to locally free
OX -modules (of rank 1).
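We have an open covering (Uα)α∈I of X such that $F|_{U_\alpha}$ is free of rank rα. We have an
OX(Uα)-basis $s_{\alpha 1}, \dots, s_{\alpha r_\alpha}$ in F(Uα). On non-empty Uα ∩ Uβ,
\[ \rho^{U_\alpha}_{U_\alpha \cap U_\beta}(s_{\alpha 1}), \dots, \rho^{U_\alpha}_{U_\alpha \cap U_\beta}(s_{\alpha r_\alpha}) \]
and
\[ \rho^{U_\beta}_{U_\alpha \cap U_\beta}(s_{\beta 1}), \dots, \rho^{U_\beta}_{U_\alpha \cap U_\beta}(s_{\beta r_\beta}) \]
are both an OX(Uα ∩ Uβ)-basis of F(Uα ∩ Uβ). Therefore, we have rα = rβ and
gαβ ∈ GL(rα, OX(Uα ∩ Uβ)) with
\[ \rho^{U_\beta}_{U_\alpha \cap U_\beta}(s_{\beta i}) = \sum_{j=1}^{r_\alpha} (g_{\alpha\beta})_{ji}\, \rho^{U_\alpha}_{U_\alpha \cap U_\beta}(s_{\alpha j}) \tag{A.15} \]
for i = 1, . . . , rα.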
An element of GL(rα, OX(Uα ∩ Uβ)) is an invertible rα × rα-matrix with entries in
OX(Uα ∩ Uβ). Therefore gαβ may be viewed as a morphism $U_\alpha \cap U_\beta \to GL(r_\alpha)_K$.
Clearly, we have gαβ gβγ = gαγ on Uα ∩ Uβ ∩ Uγ. By the construction in A.5.7, we get a
vector bundle E on X with transition matrices gαβ. Let $\varphi_\alpha : \pi_E^{-1}(U_\alpha) \to U_\alpha \times \mathbb{A}^r_K$ be
the corresponding trivialization. We consider $e_1, \dots, e_{r_\alpha} \in \Gamma(U_\alpha, U_\alpha \times \mathbb{A}^r_K)$, pointwise
equal to the standard basis. Then $\varphi_\alpha^{-1} \circ e_1, \dots, \varphi_\alpha^{-1} \circ e_{r_\alpha}$ form an OX(Uα)-basis of
Γ(Uα, E). Since they satisfy the same transition rule on Uα ∩ Uβ as in (A.15), we may
identify them with $s_{\alpha 1}, \dots, s_{\alpha r_\alpha}$. Then the sheaf of sections of E coincides with F.
A.10.15. We have seen in Example A.5.11 that not every sheaf is locally free. We
now introduce the notion of coherent sheaves, which includes almost all sheaves
of importance for our book. A coherent sheaf is an OX-module F on X which
is locally isomorphic to the cokernel of a homomorphism of free sheaves, i.e. for all
x ∈ X there is an open neighbourhood U of x and an OU-module homomorphism
$\varphi : O_U^{r_U} \to O_U^{s_U}$ for some rU, sU ∈ N such that $O_U^{s_U}/\operatorname{im}(\varphi)$ is isomorphic to $F|_U$.
Obviously,
We deduce that a complete affine variety X is finite. Let ϕ : X → A0K be the constant
map. Then ϕ∗ (OX )(A0K ) = K[X] is a finite-dimensional K -vector space. We conclude
that ϕ is a finite morphism proving the claim (see A.12.4).
We briefly sketch the argument assuming some familiarity with homological algebra (see
for example [157], Ch.6). By [148], Prop.II.5.6, the sequence
ϕ ϕ
0 −→ F (U ) −→ F (U ) −→ F (U ) −→ 0
is exact for any affine open subset U . Since the intersection of finitely many affine open
subsets remains affine ([148], Exercise II.4.3), we see that the sequence
ϕ ϕ
0 −→ C ∗ (U, F ) −→ C ∗ (U, F ) −→ C ∗ (U, F ) −→ 0
is also exact. By homological algebra, we get the long exact sequence of cohomology
groups.
For a short exact sequence, this follows from the long exact sequence of cohomology groups
and the fact that the alternating sum of dimensions is zero for a finite exact sequence of
finite-dimensional vector spaces. In general, we have
\[ \sum_{j=0}^{n} (-1)^j \chi(\mathcal{F}_j)
   = \sum_{j=0}^{n} (-1)^j \bigl( \chi(\ker(\varphi_j)) + \chi(\operatorname{im}(\varphi_j)) \bigr)
   = \sum_{j=0}^{n} (-1)^j \chi(\operatorname{im}(\varphi_{j-1})) + \sum_{j=0}^{n} (-1)^j \chi(\operatorname{im}(\varphi_j)) = 0. \]
of supp(F) and let n(F) be the number of d(F)-dimensional irreducible components of
supp(F). We order the pairs (d(F), n(F)) lexicographically and we use induction with
respect to this order. If F ≠ 0, then the support of F is not empty and we find a standard
open subset Uj = {xj ≠ 0} intersecting supp(F). Consider the exact sequence
\[ 0 \longrightarrow \mathcal{K} \longrightarrow \mathcal{F}(-1) \xrightarrow{\ \varphi\ } \mathcal{F} \longrightarrow \mathcal{C} \longrightarrow 0, \tag{A.18} \]
where the homomorphism ϕ is tensoring with xj. Note that the restriction of ϕ to
Uj is an isomorphism. We conclude that the supports of the kernel K and the
cokernel C are contained in supp(F) ∩ {xj = 0}. By induction, we may assume that
χ(K(n)) and χ(C(n)) have the form $\sum_{j=0}^{d} a_j \binom{n}{j}$ for some aj ∈ Z. Note that (A.18)
remains exact after tensoring with $L^{\otimes n}$ and the supports do not change. By (A.17), we
see that $\chi(F(n)) - \chi(F(n-1)) = \sum_{j=0}^{d(F)-1} b_j \binom{n}{j}$ for some bj ∈ Z. Then
\[ \chi(F(n)) = \chi(F(-1)) + \sum_{j=0}^{d(F)-1} b_j \binom{n+1}{j+1}. \]
As a corollary of the proof, we see that the Hilbert polynomial has the form
\[ P(x) = \sum_{j=0}^{d(F)} a_j \binom{x}{j} \]
with integer coefficients aj. By [135], Prop.5.3.1, the degree of the Hilbert
polynomial is equal to the dimension of supp(F).
A.10.31. Let $i : X \to \mathbb{P}^m_K$ be a closed embedding with $O_{\mathbb{P}^m_K}(1)|_X = L$. From
we deduce that the Hilbert polynomial of F with respect to L is the same as the
Hilbert polynomial of i∗(F) with respect to $O_{\mathbb{P}^m_K}(1)$. We conclude that without
loss of generality, we can always assume $X = \mathbb{P}^m_K$, $L = O_{\mathbb{P}^m_K}(1)$.
that
\[ \Gamma_*(\mathcal{F}) := \bigoplus_{n=0}^{\infty} H^0(\mathbb{P}^m_K, \mathcal{F}(n)) \]
is a graded K[x0, . . . , xm]-module. It follows from [148], Exercise II.5.9 and
from A.10.26 that Γ∗(F) is a finitely generated graded K[x]-module. Let P be
the Hilbert polynomial of F (always with respect to $O_{\mathbb{P}^m_K}(1)$). By A.10.27, we
get
\[ P(n) = \dim H^0(\mathbb{P}^m_K, \mathcal{F}(n)) \]
for all n ≫ 0. Hilbert has shown that for any finitely generated graded K[x]-module M,
there is a polynomial PM ∈ Q[x] with PM(n) = dim Mn for n ≫ 0 (see [157],
Th.7.23). We conclude that the Hilbert polynomial of F is equal to $P_{\Gamma_*(\mathcal{F})}$.
Example A.10.32. The Hilbert polynomial of $O_{\mathbb{P}^m_K}$ is $P(x) = \binom{x+m}{m}$ by A.6.6.
\[ P(n) = \binom{m+n}{n} - \dim I(X)_n. \]
This follows from the long exact cohomology sequence applied to (A.16) on
page 569, with arguments as those in A.10.31.
The Hilbert polynomial P of X has degree d = dim(X) (see A.10.30). By
[125], Example 2.5.2, the highest coefficient of P is ad /d! with ad equal to the
degree of X . This is illustrated in the following example.
A.10.34. Assume that the homogeneous ideal I(X) of X is generated by a
homogeneous non-zero polynomial f of degree d. By A.10.14, we have $O_{\mathbb{P}^m}(-X) = J_X$
giving rise to
\[ 0 \longrightarrow O_{\mathbb{P}^m_K} \xrightarrow{\ f\cdot\ } O_{\mathbb{P}^m_K}(d) \longrightarrow (O_{\mathbb{P}^m_K}/J_X)(d) \cong i_* O_X(d) \longrightarrow 0. \]
The associated long exact sequence, A.10.25 and Example A.10.32 show that the
Hilbert polynomial of X is
\[ P(x) = \binom{x+m}{m} - \binom{x+m-d}{m}. \]
Then P is a polynomial of degree m − 1 with leading coefficient d/(m − 1)!.
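For plane curves the formula can be expanded by hand, which gives a quick consistency check of the degree and the leading coefficient. The sketch below specializes to m = 2; the rewriting of the constant term is our own addition.

```latex
% Worked example: m = 2, X a plane curve of degree d.
\[
  P(x) = \binom{x+2}{2} - \binom{x+2-d}{2}
       = \frac{(x+2)(x+1) - (x+2-d)(x+1-d)}{2}
       = d\,x + \frac{3d - d^2}{2}
       = d\,x + 1 - \frac{(d-1)(d-2)}{2}.
\]
% The degree is 1 = m - 1 and the leading coefficient is d = d/(m-1)!,
% as stated. For d = 1 we recover P(x) = x + 1, the Hilbert polynomial
% of P^1_K from Example A.10.32.
```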
A.10.35. An important property of the Hilbert polynomial is that it is invariant under flat
perturbation. This is useful for modular problems (see [280], Ch.VI, 4). To formulate
it properly, we have to use the language of schemes. The reason is that for a morphism
ϕ : X → X′ and y ∈ X′, the fibre Xy may be different in the sense of schemes.
However, in our applications, there is no difference, i.e. the fibres will be reduced.
Let ϕ : X → X′ be a proper morphism of noetherian schemes. Consider a coherent
sheaf F on X such that F is flat over X′, i.e. $F(\varphi^{-1}U)$ is a flat $O_{X'}(U)$-module
(see A.12.10) for every affine open subset U of X′. Then the Euler characteristic of
$F_y := F|_{X_y}$ is locally constant in y ∈ X′ ([212], II.5). If L is a very ample invertible
sheaf on X, we conclude that the Hilbert polynomial of Fy with respect to Ly is locally
constant on X′. If X′ is connected, it is independent of the choice of the fibre Xy.
Example A.10.36. Let K = F2, let X be the closed subvariety of $\mathbb{P}^1_K \times \mathbb{A}^1_K$ given
by $0 = x^2 + yx + 1$, where x is an affine coordinate on $\mathbb{P}^1_K$ and y is the coordinate
on $\mathbb{A}^1_K$. Let $\varphi : X \to \mathbb{A}^1_K$ be the second projection. Then the fibre of ϕ over y as a
scheme is given by the equation $x^2 + yx + 1 = 0$, hence the Hilbert polynomial of Xy is
$\binom{x+1}{1} - \binom{x-1}{1} = 2$. On the other hand, the fibre over 0 is as a variety given by x = 1, i.e.
the Hilbert polynomial is equal to 1. This shows that it is necessary to consider the fibres
in the sense of schemes.
A.10.37. It is possible to relate the cohomology of the product to the cohomologies of the
factors. This is done in the Künneth formula: Let X1 and X2 be varieties over K. Let
E be a locally free sheaf on X1 and let F be a coherent sheaf on X2. We denote the
projection of X1 × X2 onto Xi by pi. Then
\[ H^n(X_1 \times X_2,\ p_1^* \mathcal{E} \otimes p_2^* \mathcal{F}) \cong \bigoplus_{p+q=n} H^p(X_1, \mathcal{E}) \otimes_K H^q(X_2, \mathcal{F}). \]
For the proof (of a much more general result), we refer to [136], Th.6.7.8.
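As a quick illustration of the Künneth formula, consider the product of two projective lines. The sketch assumes the standard values $\dim H^0(\mathbb{P}^1_K, O(a)) = a + 1$ and $H^1(\mathbb{P}^1_K, O(a)) = 0$ for a ≥ 0, which are not derived in the text above.

```latex
% Worked example: X_1 = X_2 = P^1_K, E = O(a), F = O(b) with a, b >= 0.
% n = 0: only (p, q) = (0, 0) contributes.
\[
  \dim H^0\bigl( \mathbb{P}^1_K \times \mathbb{P}^1_K,\ p_1^* O(a) \otimes p_2^* O(b) \bigr)
  = (a+1)(b+1),
\]
% the number of monomials of bidegree (a, b) in two pairs of variables.
% n = 1: the summands are H^1(O(a)) \otimes H^0(O(b)) and H^0(O(a)) \otimes H^1(O(b)),
% both zero under the stated assumption, so
\[
  H^1\bigl( \mathbb{P}^1_K \times \mathbb{P}^1_K,\ p_1^* O(a) \otimes p_2^* O(b) \bigr) = 0 .
\]
```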
We sketch the proof: By A.4.11, K(X) is separable over K, i.e. there are algebraically
independent f1, . . . , fr ∈ K(X) such that K(X) is a finite-dimensional separable field
extension of F := K(f1, . . . , fr). By the primitive element theorem ([156], Sec.4.14),
K(X) is generated over F by a rational function fr+1. It follows that the vector ϕ(x) :=
(f1(x), . . . , fr+1(x)) gives a rational map to $\mathbb{A}^{r+1}_K$. Let p be the minimal polynomial
of fr+1 over F. By clearing denominators, we may assume that p = q(f1, . . . , fr, ·)
for some q ∈ K[x1, . . . , xr+1]. The hypersurface Z({q}) in $\mathbb{A}^{r+1}_K$ has a function field
isomorphic to K(X). It follows easily that ϕ gives a birational map from X to Z({q}).
A.11.9. Let ϕ : X ⇢ X′ be a rational map of K-varieties with X smooth. If the base
change $\varphi_{\bar K}$ extends to a morphism $X_{\bar K} \to X'_{\bar K}$, then ϕ extends to a morphism X → X′.
Proof: Assume that $\varphi_{\bar K}$ extends to a morphism $\bar\varphi : X_{\bar K} \to X'_{\bar K}$ and let x ∈ X. We need
to prove that ϕ extends to a morphism in a neighbourhood of x with image $x' := \bar\varphi(x)$.
We may assume that all irreducible components of X pass through x. Let U be an open
dense subset of X, where ϕ is defined as a morphism.
Clearly, x′ is in the closure of ϕ(U). By A.11.8, we have to prove that, for any regular
function f in an open neighbourhood of x′, the rational function f ◦ ϕ is regular in a
neighbourhood of x.
Since $f \circ \bar\varphi$ is regular in x over K̄, we know that no poles of $\operatorname{div}(f \circ \bar\varphi)$ pass through
x. Since $\operatorname{div}(f \circ \bar\varphi)$ is the base change of div(f ◦ ϕ) to K̄ (use that X is smooth and
work with Cartier divisors), we conclude that f ◦ ϕ has no poles through x.
Now we know that a rational function on a smooth variety without poles is regular (cf.
A.8.21). Therefore, the function f ◦ ϕ is regular in an open neighbourhood of x.
A.12. Properties of morphisms 577
The inequality follows from [137], Th.5.5.8, and the existence of U is a consequence of
generic flatness (see A.12.13 below). The whole statement is part of [148], Exercise II.3.22.
A.12.2. Let ϕ : X → X′ be a morphism of K -varieties. Then ϕ is called a
finite morphism if there is an open affine covering (U′α )α∈I of X′ such that
Uα := ϕ−1 (U′α ) is also affine and such that K[Uα ] is a finitely generated K[U′α ]-
module using A.2.4.
Lemma A.12.3. A finite morphism has finite fibres.
Proof: We may assume X and X′ both affine.
For x′ ∈ X′ , the fibre over x′ is an affine variety over K(x′ ) with coordinate ring
K(x′ ) ⊗_{K[X′]} K[X] divided by the radical ideal √0 . We may assume that K[X] is a
finite K[X′ ] -module. Therefore, the coordinate ring of X_{x′} is a finite-dimensional
K(x′ ) -vector space.
Suppose now that Y is an irreducible component of X_{x′} . Then K[Y ] is a quotient of
K[X_{x′} ] . We conclude that K[Y ] is also a finite-dimensional K(x′ ) -vector space and
hence a field. Its transcendence degree over K(x′ ) is 0 proving dim(Y ) = 0 (see
Example A.3.6).
A.12.4. There is the following converse: A morphism ϕ : X → X′ of varieties
over K is finite if and only if ϕ is a proper morphism with finite fibres ([136],
Prop.4.4.2).
A.12.5. Let ϕ : X → X′ be a finite morphism and let U′ be any affine open
subset of X′ . Then U := ϕ−1 (U′ ) is affine and K[U ] is a finitely generated
K[U′ ]-module ([148], Exercise II.3.4). Therefore, the definition does not depend
on the choice of the affine open covering.
Moreover, we conclude that the composition of finite morphisms remains finite.
For an affine open subset U of X , the sheaf OX (ϕ−1 (U )) is a finitely generated K[U ] -
module (see A.10.18). Although ϕ−1 U may not be affine, there is an affine variety U ,
unique up to isomorphism, with K[U ] = OX (ϕ−1 (U )) . This follows from the fact that
the K -algebra OX (ϕ−1 (U )) is finitely generated and without nilpotents, so we may use
the generators to define coordinates. Using the homomorphism ϕ : K[U ] → K[U ] ,
we get a finite morphism ψ : U → U with ψ = φ (see A.2.4). Using the affine
open subsets Ug , g ∈ K[U ] , we prove easily that the morphisms ψ : U → U agree
on overlappings, i.e. with varying U we can paste the U and the morphisms U →
U to get a K -variety X and a morphism ψ : X → X . By construction, ψ is
finite. To define ρ : X → X on ϕ−1 (U ) , let x1 , . . . , xn be generators of K[U ] =
OX (ϕ−1 (U )) as a K -algebra. We may view them as coordinates on U and we define
ρ(x) := (x1 (x), . . . , xn (x)) . It is easy to see that we get a well-defined morphism ρ :
X → X with ρ−1 (U ) = ϕ−1 (U ) for all U . By construction, ρ : K[U ] →
OX (ϕ−1 (U )) is the identity. We conclude that ϕ = ρ ◦ψ on K[U ] proving ϕ = ψ◦ρ
on X . By the factorization property and A.6.15(c), we conclude that ρ is proper. Since
ρ is one-to-one, it is easy to see that ρ(X) is dense in X (use [148], Exercise II.2.18).
Since a proper map is closed, we conclude ρ(X) = X . For the proof of connected fibres
and more details, we refer to [136], 4.3.
open dense subset U′ of X′ such that the restriction of ϕ gives a finite morphism
ϕ−1 (U′ ) → U′ whose fibres have cardinality equal to the separable degree of
K(X)/K(X′ ), also called the separable degree of ϕ.
We sketch the proof. We may assume that X and X′ are affine. Using the correspondence
between finitely generated fields and varieties up to birationality, we may assume
that K(X) is a primitive extension of K(X′ ) , which is either separable or purely
inseparable. Let q(t) be the minimal polynomial of a primitive element over K(X′ ) . Again by
shrinking X and X′ , we may assume that q(t) ∈ K[X′ ][t] and that
K[X] ≅ K[X′ ][t]/(q(t)).
This already proves finiteness. If q is separable, then q and (d/dt)q are coprime in K(X′ )[t] ,
hence there are a, b ∈ K(X′ )[t] with aq + b (d/dt)q = 1 . Shrinking X, X′ again, we may
assume that a, b ∈ K[X′ ][t] . It follows that the identity survives after specializing in any
x′ ∈ X′ , hence the specialization of q in x′ is separable. We conclude that the fibre over
x′ has exactly deg(q) = [K(X) : K(X′ )] points.
If K(X)/K(X′ ) is purely inseparable, then K has characteristic p ≠ 0 and q(t) =
t^{p^e} − h for some h ∈ K(X′ ) ([157], Prop.8.13). Again, we may assume h ∈ K[X′ ] and
all the fibres have exactly one element.
A.12.10. In algebra, there is an important generalization of free modules. Let A
be a commutative ring with 1 and let M be an A -module. Then M is called
flat if for every injective A -module homomorphism N′ → N , the tensor
homomorphism M ⊗A N′ → M ⊗A N is also injective. Flatness is preserved under
base change and localization. Moreover, M is flat if and only if the localization
M℘ is a flat A℘ -module for all prime ideals ℘ of A . Clearly, the same holds if
we consider only maximal ideals. For proofs and more details, we refer to [197],
Ch.2, §3.
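As a quick illustration of the definition, here is a standard example that is not taken from the text above: free modules are flat, while torsion can destroy flatness. The choice A = Z and M = Z/2Z is ours.

```latex
% Standard example (our choice of A and M): M = Z/2Z is not flat over A = Z.
% Multiplication by 2 is injective on Z, but becomes the zero map after tensoring with M.
\[
  N' = \mathbb{Z} \xrightarrow{\ \cdot 2\ } N = \mathbb{Z}
  \quad\text{is injective, whereas}\quad
  M \otimes_{\mathbb{Z}} N' \cong \mathbb{Z}/2\mathbb{Z}
  \xrightarrow{\ \cdot 2 \,=\, 0\ }
  \mathbb{Z}/2\mathbb{Z} \cong M \otimes_{\mathbb{Z}} N
\]
% The induced map is zero on a non-zero module, hence not injective.
```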
A.12.11. We transfer this concept to algebraic geometry by calling a morphism
ϕ : X → X′ of varieties over K flat in x ∈ X if ϕ♯ : OX′,ϕ(x) → OX,x makes
OX,x into a flat OX′,ϕ(x) -module. If ϕ is flat in all points of X , then ϕ is called
flat. If X and X′ are affine, then ϕ is flat if and only if K[X] is a flat K[X′ ]-
module. This follows from A.12.10 and the one-to-one correspondence between
maximal ideals and points modulo conjugates (see A.2.7).
A.12.12. Flatness is important since many properties of one fibre hold for all fibres.
For example, if ϕ : X → X′ is a flat morphism of pure dimensional varieties over
K , then for x′ ∈ ϕ(X), we have
dim ϕ−1 (x′ ) = dim(X) − dim(X′ )
([148], Cor.III.9.6). Without flatness, the dimension theorem only ensures ≥ . For
example, if X is the blow up of a two-dimensional variety in a point, then the
inequality is strict. If ϕ is a flat morphism of projective varieties, then equal-
ity follows from the more general fact that the Hilbert polynomial is stable with
respect to the scheme-theoretic fibres (use A.10.33 and A.10.35).
A.12.13. The theorem on generic flatness states that for any morphism ϕ : X →
X′ of varieties over K , there is an open dense subset U′ of X′ such that the
restriction of ϕ to a map ϕ−1 (U′ ) → U′ is flat ([137], Th.6.9.1).
Note that if ϕ is not dominant, this is not interesting since we may choose U′ to be
the complement in X′ of the closure of ϕ(X) . On the other hand, every flat morphism
of K -varieties is open ([137], Th.2.4.6) and hence dominant.
A.12.14. A morphism ϕ : X → X′ of varieties over K is called unramified in
x ∈ X if the following conditions are satisfied:
(a) The maximal ideal mx of OX,x is generated by the image of mϕ(x) under
the map ϕ♯ : OX′,ϕ(x) → OX,x , f ↦ f ◦ ϕ .
(b) ϕ induces a separable extension K(ϕ(x)) ⊂ K(x) of residue fields.
If ϕ is unramified in all points of X , then it is called an unramified morphism.
If ϕ is unramified and flat, then it is called étale.
Example A.12.15. We assume that X and X′ are both affine varieties over K
and K[X] = K[X′ ][t]/(f (x′ , t)) where f (x′ , t) is a monic polynomial in the
variable t with coefficients in K[X′ ]. There is a unique morphism ϕ : X → X′
with ϕ♯ equal to the canonical homomorphism K[X′ ] → K[X] (see A.2.4). Note
that ϕ♯ makes K[X] into a free K[X′ ]-module of rank deg(f ). Hence ϕ is a
finite flat morphism. We claim that the set of unramified (even étale) points in X
is equal to X \ Z({∂f/∂t}).
by the Chinese remainder theorem. The fibre points are in one-to-one correspondence
with the zeros of f1 (x′0 , ·), . . . , fr (x′0 , ·) . Let x0 = (x′0 , t0 ) be a fibre point over x′0 .
Without loss of generality, we may assume that t0 is a zero of f1 (x′0 , t) (and hence
of no other fj (x′0 , t) ). Let I(x′0 ) be the ideal of vanishing of x′0 in K[X′ ] , hence
K(x′0 ) = K[X′ ]/I(x′0 ) . By the Chinese remainder theorem again, we have
\[ K[X]/K[X]I(x'_0) \cong \prod_{j=1}^{r} K(x'_0)[t]/\bigl(f_j(x'_0,t)^{n_j}\bigr). \]
To see this, we note first that the disjoint union of projective curves is projective. Hence
we may assume C irreducible. By passing to an open affine part and then to the projective
closure, we may assume C projective. The normalization π : C′ → C is a birational
finite morphism (see A.12.6). By A.12.7, C′ is projective. Since a normal curve is regular
(Theorem A.8.5), we get the claim.
A.13.3. If K is perfect, then a regular curve is smooth (see A.7.13). For non-
perfect K , this does not necessarily hold (see [148], Exercise III.10.1). For any
curve C over K , the above implies that CK̄ is birational to a smooth projective
curve over K̄ . Clearly, this holds also over a finite-dimensional subextension of
K̄/K .
A.13.4. Let C be a geometrically irreducible smooth projective curve over K .
The dual of the tangent bundle is a line bundle which we call the canonical line
bundle of C . It is denoted by KC . The genus of C is
g(C) := dim Γ(C, KC ).
In other words, it is the dimension of the space Ω^1_C (C) of globally defined 1-forms
on C . For example, if C is a smooth plane curve of degree d in P^2_K , i.e. C is
the zero set of a geometrically irreducible homogeneous polynomial f (x0 , x1 , x2 )
of degree d , then the genus formula
g(C) = (1/2)(d − 1)(d − 2)
holds ([148], Exercise II.8.4(f)).
We have seen in A.9.24 that the degree of a zero-dimensional cycle does not
depend on its rational equivalence class. Using CH_0 (C) ≅ Pic(C) from A.9.18,
the degree deg(L) of a line bundle L on C is defined as the degree of the
corresponding rational equivalence class. Most important is the Riemann–Roch
theorem for curves:
Theorem A.13.5. Let L be a line bundle on the geometrically irreducible smooth
projective curve C over K . Then we have
dim Γ(C, L) − dim Γ(C, KC ⊗ L^{−1} ) = deg(L) + 1 − g(C).
Proof: For K algebraically closed, this is proved in [148], Th.IV.1.3. By base change and
the following remarks, we reduce to this case. First, note that the base change of the tangent
bundle is the tangent bundle of the base change (see A.7.6). So the same holds for the
canonical line bundle. The cohomology on a geometrically reduced variety is compatible
with base change (see A.10.28). We have seen in the proof of A.9.25 that degree of a divisor
(and hence of a line bundle) is invariant under base change. This proves the claim in general.
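Two standard consequences of the Riemann–Roch formula can be read off by specializing L. The following worked computation is our own illustration; it uses only the formula of Theorem A.13.5, the definition g(C) = dim Γ(C, KC ), and the fact that the trivial bundle O_C has degree 0.

```latex
% Worked specializations of Riemann-Roch (illustrative, not part of the original text).
% Case L = O_C, using K_C \otimes O_C^{-1} = K_C and deg(O_C) = 0:
\[
  \dim\Gamma(C,\mathcal{O}_C) - g(C) = 0 + 1 - g(C)
  \quad\Longrightarrow\quad \dim\Gamma(C,\mathcal{O}_C) = 1 .
\]
% Case L = K_C, using K_C \otimes K_C^{-1} = O_C and the previous line:
\[
  g(C) - 1 = \deg(K_C) + 1 - g(C)
  \quad\Longrightarrow\quad \deg(K_C) = 2g(C) - 2 .
\]
```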
sections L of L.
To see this, we may assume K algebraically closed as above. Then the claim
follows from Serre duality (see A.10.29), i.e. H^1 (C, L) is the dual space of
H^0 (C, Ω^1_C ⊗ L^{−1} ).
A.13.7. Let C be a smooth geometrically irreducible projective curve over K and
let L be a line bundle on C . Then L is ample if and only if deg(L) > 0. If
deg(L) ≥ 2g(C) + 1, then L is very ample.
For K algebraically closed, this is proved in [148], Cor.IV.3.2 and the general case follows
by base change using A.6.12.
This section is for a reader who is more familiar with the theory of complex
manifolds (as in [130]) than with the algebraic side. We explain the meaning of
the other sections of the appendix in terms of complex analysis, which could be
helpful for the understanding. For more details and proofs, we refer to [148], App.
B, [280], Chs. VII, VIII, and J.-P. Serre [275].
H p (Xan , Ean ).
A.14.8. Let X be a smooth variety over K . Any divisor D on X gives rise to
a divisor Dan on Xan . Principal divisors pass to principal divisors on Xan . We
have a canonical homomorphism CH_r (X) → H_{2r} (Xan , Z) mapping the prime
cycle Y to Yan in the homology group. If D is a divisor on X , then D.Y maps
to the cup product of Dan and Yan . Hence it is the same to compute intersection
numbers on X or in homology on Xan . Two divisors on X are algebraically
equivalent if and only if they are homologically equivalent on Xan , i.e. if they have
the same image in H_{2 dim(X)−2} (Xan , Z). For details, we refer to [125], Ch.19.
A.14.9. If X is an irreducible smooth projective curve over C , then Xan is a
connected projective manifold of dimension 1. The genus of X is equal to the
genus of Xan . A complex manifold of dimension 1 is called a Riemann surface.
We conclude that Xan is a connected compact Riemann surface. Conversely, Rie-
mann’s existence theorem says that every connected compact Riemann surface
has this form ([148], Th.B.3.1).
APPENDIX B RAMIFICATION
B.1. Discriminants
Remark B.1.6. Even if we do not know the zeros, we get information from
Example B.1.4. Let f (t) := t^n + a_1 t^{n−1} + · · · + a_n ∈ R[t]. By the specialization
x_i = a_i in Example B.1.4, we see that the discriminant Df is a polynomial in
a1 , . . . , an with coefficients in Z . The right-hand side of (B.2) is homogeneous
of degree n(n − 1) in y1 , . . . , yn . Hence if we consider ai of degree i, then
(B.1) shows that the discriminant is a homogeneous polynomial in the weighted
variables a1 , . . . , an of degree n(n − 1).
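To make the weighted homogeneity concrete, consider n = 3. The explicit formula below is the classical discriminant of a monic cubic, quoted here as an illustration rather than derived from (B.1) and (B.2); with a_i of weight i, every monomial has weight n(n − 1) = 6.

```latex
% Classical discriminant of f(t) = t^3 + a_1 t^2 + a_2 t + a_3 (illustrative).
\[
  D_f = a_1^2 a_2^2 - 4a_2^3 - 4a_1^3 a_3 + 18\,a_1 a_2 a_3 - 27a_3^2 .
\]
% Weights with wt(a_i) = i:
%   a_1^2 a_2^2 : 2 + 4 = 6      a_2^3     : 6      a_1^3 a_3 : 3 + 3 = 6
%   a_1 a_2 a_3 : 1 + 2 + 3 = 6  a_3^2     : 6
```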
B.1.7. Let f (t) = a_m + a_{m−1} t + · · · + a_1 t^{m−1} + a_0 t^m , g(t) = b_n + b_{n−1} t + · · · +
b_1 t^{n−1} + b_0 t^n be polynomials with coefficients in R . For k ∈ Z \ {0, . . . , m}, we
set a_k = 0 and, similarly, we proceed with the coefficients of g . Then we form
the (m + n) × (m + n) matrix M by the rules
\[ M_{ij} = \begin{cases} a_{i-j} & \text{if } 1 \le j \le n, \\ b_{i-j+n} & \text{if } n+1 \le j \le m+n. \end{cases} \]
For m = 2, n = 3, the matrix is
\[ \begin{pmatrix}
a_0 & 0 & 0 & b_0 & 0 \\
a_1 & a_0 & 0 & b_1 & b_0 \\
a_2 & a_1 & a_0 & b_2 & b_1 \\
0 & a_2 & a_1 & b_3 & b_2 \\
0 & 0 & a_2 & 0 & b_3
\end{pmatrix}. \]
Then the resultant of f, g with respect to m, n is res_{m,n} (f, g) := det(M ).
If we make the natural choice m = deg(f ) and n = deg(g), then we skip the
reference to m and n .
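The construction in B.1.7 can be sketched in a few lines of Python. This is a minimal illustration for integer or rational coefficients only; the function names `sylvester_matrix`, `det` and `resultant` are ours, and the determinant is computed by plain Gaussian elimination over exact fractions. Coefficients are listed as in B.1.7, i.e. a_0 is the coefficient of t^m.

```python
from fractions import Fraction

def sylvester_matrix(a, b):
    """Matrix M of B.1.7 for a = [a_0, ..., a_m], b = [b_0, ..., b_n].

    a_0 and b_0 are the coefficients of t^m and t^n respectively.
    """
    m, n = len(a) - 1, len(b) - 1

    def coeff(c, k):
        # Coefficients with index outside 0..deg are set to 0, as in the text.
        return c[k] if 0 <= k < len(c) else 0

    size = m + n
    return [
        [coeff(a, i - j) if j <= n else coeff(b, i - j + n)
         for j in range(1, size + 1)]
        for i in range(1, size + 1)
    ]

def det(matrix):
    """Determinant by Gaussian elimination over exact rationals."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    size = len(rows)
    result = Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if rows[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            rows[col], rows[pivot] = rows[pivot], rows[col]
            result = -result  # a row swap flips the sign
        result *= rows[col][col]
        for r in range(col + 1, size):
            factor = rows[r][col] / rows[col][col]
            for c in range(col, size):
                rows[r][c] -= factor * rows[col][c]
    return result

def resultant(a, b):
    """res_{m,n}(f, g) := det(M) with m = len(a) - 1, n = len(b) - 1."""
    return det(sylvester_matrix(a, b))
```

For instance, `resultant([1, 0, -1], [1, -1])` treats f = t^2 − 1 and g = t − 1, which share the root 1, while `resultant([1, 0, 1], [1, 0])` treats the coprime pair t^2 + 1 and t.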
B.1.8. Suppose that f is monic and g is an arbitrary polynomial in R[t], then
res(f, g) is invertible in R if and only if the ideal generated by f and g is R[t]
(N. Bourbaki [49], Ch.IV, §6, No.6, Prop.7, Cor.1). In particular for a field R , we
have res(f, g) ≠ 0 if and only if f and g are coprime (i.e. if and only if they have
no common root in an algebraic closure).
B.1.9. The resultant may be used to compute the discriminant in terms of the
coefficients. Let f′ (t) be the formal derivative of the monic polynomial f (t) =
t^n + a_1 t^{n−1} + · · · + a_n ∈ R[t] of degree n . Then
\[ D_f = (-1)^{n(n-1)/2}\, \mathrm{res}_{n,n-1}(f, f'). \]
([49], Ch.IV, §6, No.7, Prop.11). This shows that Df is a polynomial of degree
≤ 2n − 2 in a1 , . . . , an with coefficients in R and we have certainly equality
in “most” cases. Note that in contrast to Remark B.1.6, the a1 , . . . , an are not
weighted, i.e. we consider a1 , . . . , an as variables of degree 1.
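As a check of the formula in the smallest case, take n = 2. The computation below is our own worked example, using the matrix rule of B.1.7 with m = 2 for f and n = 1 for its derivative; it recovers the familiar discriminant of a quadratic, of unweighted degree 2 = 2n − 2.

```latex
% Worked example (illustrative): f(t) = t^2 + a_1 t + a_2, f'(t) = 2t + a_1.
% Coefficients in the notation of B.1.7: (1, a_1, a_2) for f and (2, a_1) for f'.
\[
  \mathrm{res}_{2,1}(f,f') =
  \det\begin{pmatrix} 1 & 2 & 0 \\ a_1 & a_1 & 2 \\ a_2 & 0 & a_1 \end{pmatrix}
  = a_1^2 - 2\,(a_1^2 - 2a_2) = 4a_2 - a_1^2 ,
\]
\[
  D_f = (-1)^{\frac{2\cdot 1}{2}}\,\mathrm{res}_{2,1}(f,f') = a_1^2 - 4a_2 .
\]
```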
B.1.10. The considerations in B.1.9 make it natural to define the discriminant of
any polynomial f (t) := a_0 t^n + a_1 t^{n−1} + · · · + a_n by
Df := a_0^{2n−2} Dg ,
B.1.14. Now we study discriminants in the case of Dedekind domains. The main
reference is the book of J.-P. Serre [276]. A Dedekind domain is an integrally
closed domain of Krull dimension ≤ 1. Let R be a Dedekind domain with field
of fractions K . We consider a finite-dimensional separable field extension L/K .
In most of our applications, R will be the ring of algebraic integers in a number
field. The integral closure of R in L is denoted by R̄L . Note that R̄L is not
necessarily a free R -module. The integral closure is only an R -lattice, i.e. a
finitely generated R -module which generates L as a K -vector space (see [276],
Ch.I, §4, Prop.8). Let n := [L : K]. Then the discriminant of R̄L over R is the
ideal d_{R̄L /R} in R generated by
{det(Tr_{L/K} (a_i b_j )) | a_1 , . . . , a_n , b_1 , . . . , b_n ∈ R̄L }.
Since the K -bilinear form ⟨α, β⟩ := Tr_{L/K} (αβ) on L is non-degenerate (as
L/K is separable, see [49], Ch.V, §10, no.6, Prop.10), the discriminant is not
zero.
and hence d_{L/K} is the principal ideal of R generated by det(σ_i (e_j ))^2 .
Example B.1.16. Let d be a square free integer, |d| ≥ 2 . We consider the quadratic
number field L := Q(√d) . We first determine the algebraic integers in L . Let α :=
r + s√d ∈ L with r, s ∈ Q . Then
α^2 − 2rα + r^2 − s^2 d = 0.
Hence α is an algebraic integer if and only if 2r, r^2 − s^2 d ∈ Z . For α to be an algebraic
integer, it is necessary that r = m/2, s = n/2 for m, n ∈ Z . If m is even, then r^2 −
s^2 d = (m^2 − n^2 d)/4 is an integer if and only if n is even. If m is odd, then r^2 − s^2 d ∈ Z
if and only if n is odd and d ≡ 1 (mod 4) . This proves that OL has Z -basis
1, (1 + √d)/2 if d ≡ 1 (mod 4),
1, √d if d ≡ 2, 3 (mod 4).
In the first case, Remark B.1.15 shows
\[ d_{L/\mathbb{Q}} = \left[ \det\begin{pmatrix} 1 & \frac{1+\sqrt{d}}{2} \\ 1 & \frac{1-\sqrt{d}}{2} \end{pmatrix}^{2} \right] = [d]. \]
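The same discriminant can be obtained from the trace form det(Tr(e_i e_j)) of B.1.14, which equals det(σ_i(e_j))^2. The sketch below is our own illustration: elements r + s√d are stored as pairs (r, s) of exact fractions, the helper names are hypothetical, and d is assumed square free (this is not checked). In the case d ≡ 2, 3 (mod 4) the code returns 4d, the standard value, which is not displayed in the excerpt above.

```python
from fractions import Fraction

def integral_basis(d):
    """Z-basis (1, w) of the integers of Q(sqrt(d)); (r, s) means r + s*sqrt(d)."""
    one = (Fraction(1), Fraction(0))
    if d % 4 == 1:  # Python's % is non-negative, so this also covers d = -3, -7, ...
        w = (Fraction(1, 2), Fraction(1, 2))
    else:
        w = (Fraction(0), Fraction(1))
    return [one, w]

def multiply(x, y, d):
    """Product of r1 + s1*sqrt(d) and r2 + s2*sqrt(d)."""
    (r1, s1), (r2, s2) = x, y
    return (r1 * r2 + s1 * s2 * d, r1 * s2 + s1 * r2)

def trace(x):
    """Trace to Q: (r + s*sqrt(d)) + (r - s*sqrt(d)) = 2r."""
    return 2 * x[0]

def discriminant(d):
    """det(Tr(e_i e_j)) for the integral basis e_1 = 1, e_2 = w."""
    e = integral_basis(d)
    t = [[trace(multiply(e[i], e[j], d)) for j in range(2)] for i in range(2)]
    return t[0][0] * t[1][1] - t[0][1] * t[1][0]
```

For d = 5 the trace matrix is ((2, 1), (1, 3)) with determinant 5, in agreement with the displayed computation.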
B.1.18. The Kähler differentials form an R̄L -module Ω^1_{R̄L /R} := I/I^2 , where I is the
kernel of the multiplication map R̄L ⊗R R̄L → R̄L (compare with A.7.29). If all residue
extensions of L/K are separable, then Ω^1_{R̄L /R} is generated by one element as an R̄L
-module and D_{L/K} is its annihilator (see [276], Ch.III, §7, Prop.14).
Proposition B.1.19. Let K be the fraction field of the Dedekind domain R and
let L/K and M/L be finite-dimensional separable field extensions. Then
\[ D_{M/K} = D_{M/L}\cdot D_{L/K}, \qquad d_{M/K} = N_{L/K}(d_{M/L})\cdot d_{L/K}^{[M:L]}. \]
Proof: See [276], Ch.III, §4, Prop.8.
B.1.20. By localizing in maximal ideals, we can always reduce to the free case.
More generally, let S be a multiplicative submonoid of R \ {0}. Then the local-
ization S −1 R is still a Dedekind domain with integral closure S −1 R̄L in L and
we have
\[ D_{S^{-1}\bar R_L/S^{-1}R} = S^{-1}D_{\bar R_L/R}, \qquad d_{S^{-1}\bar R_L/S^{-1}R} = S^{-1}d_{\bar R_L/R}. \]
In particular, if p is a non-zero maximal ideal of R , we get a discrete valuation
ring Rp and the integral closure of Rp in L is (R̄L )p . Since (R̄L )p is a finitely
generated torsion free module over the principal ideal domain Rp , we conclude
that (R̄L )p is a free Rp -module of rank [L : K] ([156], Th.3.10) and
\[ d_{(\bar R_L)_p/R_p} = (d_{\bar R_L/R})_p . \]
Note that (d_{R̄L /R} )_p is a power of pRp , hence
\[ d_{\bar R_L/R} = \prod_{p} \bigl( d_{(\bar R_L)_p/R_p} \cap R \bigr) \]
is the prime factorization of the discriminant. Similarly, we may proceed for the
different using prime ideals of R̄L .
B.1.21. The formation of the discriminant and the different is also compatible with
completions. Let p be a non-zero maximal ideal of R and let P be a maximal
ideal of S = R̄L with P|p (i.e. P ∩ R = p ). The completions of the discrete
valuation rings Rp and SP are denoted by R̂p and ŜP , respectively. They are
still discrete valuation rings and ŜP is a free R̂p -module of rank equal to the local
degree. By [276], Ch.III, §4, (iii), we have
\[ D_{\hat S_P/\hat R_p} = \hat R_P\, D_{\bar R_L/R}, \qquad \prod_{P|p} d_{\hat S_P/\hat R_p} = \hat R_p\, d_{\bar R_L/R}. \]
B.2.1. We call L/K unramified in w if L/K and the extension of residue fields
k(w)/k(v) are separable and if the local degree [Lw : Kv ] is equal to the residue
degree [k(w) : k(v)]. Otherwise, L/K is called ramified in w . Note that by
passing to completions, the residue degree fw/v and the ramification index ew/v
do not change, hence
ew/v fw/v ≤ [Lw : Kv ] (B.3)
by Proposition 1.2.11. Hence ew/v = 1 if L/K is unramified in w . If the
valuation is discrete, then we have equality in (B.3) and hence [Lw : Kv ] =
[k(w) : k(v)] is equivalent to ew/v = 1. We say that L/K is unramified over v
if L/K is unramified in every place of L lying over v .
Lemma B.2.2. Assume that L is generated over K by a root of a monic f ∈
Rv [t], where Rv is the valuation ring of v . If the discriminant Df is a unit in
Rv , then L/K is unramified over v .
Proof: We call the root α . Using B.1.12, we deduce that α and hence L/K are separable.
A standard trick using the ultrametric inequality or Gauss’s lemma (see Lemma 1.6.3)
proves α ∈ Rv . By B.1.12, we have D_{f̄} = \overline{D_f} ≠ 0̄ , where the bar denotes reduction
to the residue field. By Remark B.1.5, the polynomial f̄ is separable. By Gauss’s lemma
and B.1.11, we may assume that f is also irreducible. Let f = f1 · · · fr be the factorization
into irreducible polynomials fj ∈ Kv [t] . By Gauss’s lemma again, we may assume
that the coefficients of these polynomials are contained in the discrete valuation ring of the
completion. By separability of f̄ , the factors f̄j are pairwise coprime. The polynomials
f1 , . . . , fr are in one-to-one correspondence with the places of L over v (see Proposition
1.3.1); we denote the latter by w1 , . . . , wr . By Hensel’s lemma (see Lemma 1.2.10), f̄j is
a power of an irreducible polynomial in k(v)[t] . Using separability of f̄ , we conclude that
f̄j is indeed irreducible over k(v) . Since Kv [t]/[fj (t)] = L_{w_j} (see Proposition 1.3.1),
we get
[L_{w_j} : Kv ] ≥ f_{w_j /v} ≥ deg(f̄j ) = deg(fj ) = [L_{w_j} : Kv ]
and hence equality occurs everywhere. We conclude that ᾱ is a primitive separable element
of k(wj )/k(v) . Hence L/K is unramified in every wj proving the claim.
Proof: Since LL′/L′ is generated by separable elements (from L ), the extension is separable.
Note that the completion (LL′ )u is the composite of Lw and L′ . This follows since
Lw is the closure of L in (LL′ )u and Lw L′ is a finite-dimensional field extension of the
complete field Lw , hence Lw L′ is also complete (Proposition 1.2.7). For the restriction
w′ of u to L′ , we get
(LL′ )u = Lw L′ = Lw L′_{w′} .
Because of the invariance of the residue fields under completions, we may assume that L
and L′ are complete with respect to w and w′ , respectively. Since k(w)/k(v) is separable,
the theorem of the primitive element (see [156] §4.14) gives k(w) = k(v)(ᾱ) for a suitable
α in the valuation ring Rw of w . As a consequence of unramifiedness and completeness,
we have [L : K] = [k(w) : k(v)] , hence α is a primitive element of L/K . Let f be the
minimal polynomial of α over K , then our above considerations also show that we may
assume f to be a monic polynomial with coefficients in Rv and that the reduction f̄ is
the minimal polynomial of ᾱ over k(v) . Obviously, α is a primitive element of LL′/L′ .
By Remark B.1.5, the discriminant D_{f̄} of the separable polynomial f̄ is not zero. Using
D_{f̄} = \overline{D_f} (see B.1.12), we conclude that Df is a unit in Rv . Finally, Lemma B.2.2 proves
that LL′/L′ is unramified in u .
Example B.2.7. Let K := Q2 (t) be the quotient field of the polynomial ring over the field
Q2 of 2 -adic numbers. On K , we have the discrete valuation v induced by
\[ \Bigl|\sum_i a_i t^i\Bigr|_v := \max_i |a_i|_2 , \]
which we considered in Definition 1.6.1. Clearly, the same construction gives a discrete
valuation w on K(√t) extending v . Obviously, we have e_{w/v} = 1 and the residue
field k(w) = F2 (√t) is a non-separable extension of k(v) = F2 (t) of degree 2 . Since
f_{w/v} = 2 = [K(√t) : K] , we conclude that w is the unique valuation over v . Although
2|v(t) = 0 , the extension K(√t)/K is ramified over v .
Definition B.2.8. Let E/K be an algebraic field extension which is not necessar-
ily finite-dimensional and let v be a non-archimedean place of K . By Corollary
B.2.5, the union of all intermediate fields L of E/K with [L : K] < ∞ and
with L/K unramified over v is still a subfield of E denoted by K_v^{nr} . It follows
from Corollary B.2.5 that a finite-dimensional subextension L/K of E/K
is unramified over v if and only if L is contained in K_v^{nr} .
B.2.9. Let R be a Dedekind domain with field of fractions K ≠ R and let L/K
be a finite-dimensional separable field extension. Then the integral closure R̄L of
R in L is also a Dedekind domain (see B.1.14). By Theorem A.8.5, the localiza-
tion Rp in a maximal ideal p of R is a discrete valuation ring, inducing a place vp
of K . We say that L/K is unramified in a maximal ideal P of R̄L if the exten-
sion is unramified in the corresponding place vP of L. A fundamental property
of a Dedekind domain is that every non-zero ideal has a unique factorization into
maximal ideals (see [157], Th.10.1). Let p := P ∩ R with factorization
pR̄L = P_1^{e_1} · · · P_r^{e_r}
into different maximal ideals P1 , . . . , Pr of R̄L . Note that ej is the ramification
index of v_{P_j} over vp and that every place of L over vp has this form. We
conclude that L/K is unramified in P if and only if P occurs in the prime fac-
torization with exact power 1 and if R̄L /P is a separable field extension of R/p .
B.2.10. Under the assumptions of B.2.9, let P be a non-zero prime ideal of R̄L
and let p := P ∩ R . We assume that R̄L /P is a separable field extension of R/p .
Then vP is said to be wildly ramified over vp if and only if p|evP /vp . Otherwise,
vP is called tamely ramified over vp . Note that unramified is considered as a
special case of tame ramification.
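The trichotomy of B.2.10 can be phrased as a small decision rule. The sketch below is only an illustration under our reading that the divisibility condition compares the ramification index with the characteristic of the residue field R/p; the function name and its parameters are hypothetical, and separability of the residue extension is assumed, as in the text.

```python
def ramification_type(e, residue_char):
    """Classify a place by its ramification index e and residue characteristic.

    Assumes the residue field extension is separable (as in B.2.10).
    residue_char = 0 means the residue field has characteristic zero.
    """
    if e < 1:
        raise ValueError("the ramification index is a positive integer")
    if residue_char > 0 and e % residue_char == 0:
        return "wildly ramified"
    # Unramified (e = 1) is treated as a special case of tame ramification.
    return "unramified" if e == 1 else "tamely ramified"
```

For example, a place with e = 2 over a prime of residue characteristic 2 is wildly ramified, while e = 2 over residue characteristic 5 is tame.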
If K has only complex archimedean places, we fix a place w and we consider the open
convex subset
S∞ := {α ∈ K∞ | |ℜ(αw )|w < |1/2|w , |ℑ(αw )|w < |C|w , |αv |v < |1/2|v ∀v|∞, v ≠ w}.
Then the same argument as above yields the claim.
Corollary B.2.15. Let K be a number field and let S be a finite set of places
of K containing all archimedean ones. Then there are only finitely many number
fields L in K̄ of bounded degree which are unramified outside of S .
Proof: The transitivity rule of discriminants (see B.1.19) implies
\[ d_{L/\mathbb{Q}} = N_{K/\mathbb{Q}}(d_{L/K}) \cdot d_{K/\mathbb{Q}}^{[L:K]} . \]
By Dedekind’s discriminant theorem (see Theorem B.2.12), the norm is bounded. We con-
clude that DL/Q is also bounded and hence the above theorem proves the claim.
For completeness, we state Minkowski’s discriminant theorem:
Theorem B.2.16. Let K be a number field of degree d with exactly s complex
places. Then
\[ |D_{K/\mathbb{Q}}| \ge \Bigl(\frac{\pi}{4}\Bigr)^{2s} \cdot \frac{d^{2d}}{(d!)^2} . \]
The proof is another application of Minkowski’s first theorem. For details, we refer to [162],
Th.2.13.5.
Remark B.2.17. Note that the function f (d) := (π/4)^d d^d /d! satisfies
\[ \frac{f(d+1)}{f(d)} = \frac{\pi}{4}\Bigl(1+\frac{1}{d}\Bigr)^{d} \ge \frac{\pi}{2} > 1, \]
hence Minkowski’s discriminant theorem shows that |DK/Q | > 1 for K ≠ Q and that d
is bounded in terms of the discriminant.
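The bound of Theorem B.2.16 and the growth of f(d) are easy to tabulate numerically. The following sketch is illustrative only: it evaluates the right-hand side of the theorem in floating point, and the names `minkowski_lower_bound` and `f` are ours.

```python
import math

def minkowski_lower_bound(d, s):
    """Right-hand side of Theorem B.2.16: (pi/4)^(2s) * d^(2d) / (d!)^2.

    d is the degree of the number field, s the number of complex places.
    """
    if d < 1 or s < 0 or 2 * s > d:
        raise ValueError("need d >= 1 and 0 <= 2s <= d")
    return (math.pi / 4) ** (2 * s) * d ** (2 * d) / math.factorial(d) ** 2

def f(d):
    """The function f(d) = (pi/4)^d * d^d / d! of Remark B.2.17."""
    return (math.pi / 4) ** d * d ** d / math.factorial(d)

if __name__ == "__main__":
    for d in range(1, 6):
        bounds = [minkowski_lower_bound(d, s) for s in range(d // 2 + 1)]
        print(d, f(d), bounds)
```

For quadratic fields this gives the lower bounds 4 (real case, s = 0) and π²/4 ≈ 2.47 (imaginary case, s = 1); since 2s ≤ d and π/4 < 1, the bound is always at least f(d)².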
B.2.18. Let L/K be a finite-dimensional Galois extension and let w be a place
of L lying over the complete discrete valuation v of K . We assume also that the
residue field extension k(w)/k(v) is separable. We will find a maximal unram-
ified subextension LI /K such that L/LI is totally ramified, meaning that the
ramification index is equal to the degree.
By Proposition 1.2.7 and Corollary 1.3.5, the Galois group D = Gal(L/K) op-
erates isometrically on L with respect to | |w , hence reduction modulo w gives a
homomorphism ε : D → Gal(k(w)/k(v)). The kernel of ε is called the inertia
group and is denoted by I . From Hensel’s lemma in 1.2.10, Gauss’s lemma in
1.6.3 and the primitive element theorem ([156], §4.14), we deduce:
(a) The extension k(w)/k(v) is Galois and ε is surjective.
For the fixed field L^I of I , Galois theory and (a) yield
\[ \mathrm{Gal}(L^I/K) \xrightarrow{\ \sim\ } \mathrm{Gal}(k(w)/k(v)) \]
and hence the theory of finite fields (see [156], §4.13) proves:
(b) Gal(LI /K) is a cyclic group of order equal to the residue degree fw/v .
In this section, we assume that the reader is familiar with the language of schemes. It will be
of minor importance in the book and serves mainly to connect the results about unramified
morphisms from A.12 with Appendix B. We consider a morphism ϕ : X → X′ of finite
type between noetherian schemes.
B.3.1. The morphism ϕ is called unramified in x ∈ X if the following conditions are
satisfied:
Now we proceed as in the proof of Proposition 10.3.3 using the same notation. Since Q
is an Rv -integral point of X , we have ai (Q) ∈ Rv and hence we may choose a = 1 .
Similarly, f (P ) is a unit in Rw . So we may assume α = 1 . This proves that the dis-
criminant of the completions K(P )w /Kv is equal to the valuation ring R̂v of Kv . Using
base change to R̂v , the argument at the beginning of the proof shows that the extension
K(P )w = Kv (P )/Kv is separable. Now B.2.13 implies that K(P )w /Kv is unrami-
fied over v . Since residue fields do not change by passing to completions (see Proposition
1.2.11), it follows that K(P )/K is also unramified.
B.4.4. We have seen in the proof above that Ω^1_{Y/B} is a torsion sheaf, hence the
stalk Ω^1_{Y/B,v} is of finite length for a prime divisor v of Y (following also from
the proof below). So we may define the ramification divisor of p by
\[ R_p := \sum_{v} \ell\bigl(\Omega^1_{Y/B,v}\bigr)\, v, \]
where ℓ is the length. It is supported in the set R considered above.
\[ \ell\bigl(\Omega^1_{Y/B,v}\bigr) = \mathrm{ord}_v(\det(\nu_v)). \tag{B.5} \]
Let W ⊂ V be a trivialization of p^∗ Ω^1_B and Ω^1_Y . With respect to the trivialization, γW :=
det(ν) is a regular function on W . Clearly, (γW ) is a well-defined Cartier divisor on V .
By (B.5), its associated Weil divisor is the ramification divisor Rp . On the other hand, we
may consider det(ν) as an injective homomorphism
\[ \det(\nu) : p^*K_B|_V \longrightarrow K_Y|_V \]
induced by pull-back. For an invertible meromorphic section s of K_{B_reg} , we get an
invertible meromorphic section s′ := det(ν) ◦ p^∗ (s) of KY |V . Working locally with respect to
the trivializations, it is easy to check that
\[ \mathrm{div}(s') = p^*(\mathrm{div}(s)) + R_p|_V \]
where e is the multiplicity of φ^∗ (D) in v′ . We claim that Ω^1_{C,v} is generated by dπv and
similarly Ω^1_{C′,v′} is generated by dπv′ . If v is k -rational, we argue as follows. By A.7.8,
we may identify the fibre of Ω^1_C over v with mv /mv^2 . As πv generates the latter and
corresponds to dπv in the former, we conclude that dπv generates also the stalk Ω^1_{C,v} . In
general, we may use base change to k̄ . Using that v is geometrically reduced (see A.4.6),
πv is a local parameter in all points of X_k̄ lying in v . The special case considered above
and the compatibility of Ω^1_C with base change (use A.7.6) show that dπv is a basis of
Ω^1_{C_k̄ ,x} for all x ∈ v . The stalk Ω^1_{C,v} is free of rank 1 over the discrete valuation ring
OC,v proving that dπv is a basis of Ω^1_{C,v} .
Returning to the proof of our original claim, Leibniz’s rule gives
\[ d\pi_v = e\,\pi_{v'}^{\,e-1}\, u\, d\pi_{v'} + \pi_{v'}^{\,e}\, du . \]
By the short exact sequence (B.4) on page 599 and using char(k) = 0 , we conclude that
Ω^1_{C′/C,v′} has length e − 1 proving immediately the claim.
analytic coordinates. Hence the above proof shows that Rφ has multiplicity e − 1
in v′ where e is the order of the zero of the holomorphic function πv ◦ φ ◦ πv′^{−1} (z)
in z = 0.
Proposition B.4.9. Let w be a prime divisor of Y with ramification index ew/v
over v := p(w) (see Example 1.4.13). Then the multiplicity of the ramification
divisor Rp in w is equal to ew/v − 1.
Proof: We have Ω^1_{Y/B,w} = Ω^1_{O_{Y,w} /O_{B,v}} . By B.1.18, this module over the discrete
valuation ring OY,w is generated by one element and its annihilator is the different of
OY,w /OB,v . We conclude that ℓ(Ω^1_{Y/B,w} ) is the order of the different in the place w . By
Dedekind’s different theorem (see B.2.11), we get the claim.
APPENDIX C GEOMETRY OF NUMBERS
C.1. Adeles
We first recall the existence and uniqueness of a Haar measure on a locally compact
group. In the special case of a completion of a number field K of degree d , we
prove this by an explicit construction. Afterwards we introduce adeles, which are
an important tool in number theory, following A. Weil [328]. Section C.2, with
McFeat’s version [198] of Minkowski’s second theorem in an adelic setting, is
modeled after the exposition of Bombieri and Vaaler [35]. Section C.3 presents
J.D. Vaaler’s cube slicing inequality [302] using also techniques from A. Prékopa
[236].
Theorem C.1.1. Let G be a locally compact group. There is a non-zero positive
left-invariant Borel measure µG on G , i.e.
\[ \int_G f(yx)\, d\mu_G(x) = \int_G f(x)\, d\mu_G(x) \]
for all y ∈ G and all continuous complex functions f with compact support. The
measure µG is uniquely determined up to positive multiples and is called a Haar
measure on G .
We refer to N. Bourbaki [46], Ch.7, §1, no.2, Th.1 for a proof. Now let H be a
normal closed subgroup and let π : G −→ G/H be the quotient morphism. By
definition, a subset U of G/H is open if and only if π −1 (U ) is open in G . Note
that π is an open map and hence G/H is a locally compact group.
Corollary C.1.2. Given Haar measures µG , µH on G and H , there is a unique
Haar measure µG/H on G/H such that
\[ \int_G f(x)\, d\mu_G(x) = \int_{G/H} d\mu_{G/H}(\pi(x)) \int_H f(xy)\, d\mu_H(y) \]
for all continuous complex functions f with compact support. Moreover, this for-
mula continues to hold for any f ∈ L1 (G, µG ).
Proof: We prove easily that
\[ G/H \longrightarrow \mathbb{C}, \qquad Hx \longmapsto \int_H f(xy)\, d\mu_H(y) \]
is a positive linear functional on the space of continuous complex functions with compact
support. By the Riesz representation theorem ([249], Th.2.14), there is a unique Borel
measure µ on G such that
\[ \int_G f(x)\, d\mu(x) = \int_{G/H} d\mu_{G/H}(\pi(x)) \int_H f(xy)\, d\mu_H(y) \]
for every continuous complex function f on G with compact support. Clearly, µ is left
invariant and hence it is equal to µG up to a positive multiple (Theorem C.1.1). By normal-
ization, we get the first claim. In order not to go into too many details of measure theory,
we only refer for the last claim to [46], Ch.7, §2, no.3, Prop.5. The argument proceeds by
showing that, for µG/H -almost every π(x) ∈ G/H , the function $y \mapsto f(xy)$ is
µH -integrable on H , then that the function $x \mapsto \int_H f(xy)\,d\mu_H(y)$ is
µG/H -integrable on G/H , and finally that the desired formula holds.
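The integration formula of Corollary C.1.2 can be checked by hand in the simplest discrete setting, where every Haar measure may be taken to be a counting measure. The following is a minimal sketch under assumptions that are not part of the text: the group G = Z/12Z, the subgroup H generated by 4, and the test function f are all chosen only for illustration.

```python
from fractions import Fraction


def quotient_integration_check(n=12, step=4):
    """Compare both sides of the formula of Corollary C.1.2 for
    G = Z/nZ and H = <step>, with counting measures on G, H and G/H.

    The group, the subgroup and the test function are illustrative
    assumptions, not objects taken from the text.
    """
    G = list(range(n))
    H = sorted({(step * k) % n for k in range(n)})

    # Arbitrary rational-valued test function on G (hypothetical).
    def f(x):
        return Fraction(x * x + 1, x + 2)

    # Left-hand side: integral of f over G.
    lhs = sum(f(x) for x in G)

    # One representative per coset x + H (the smallest element).
    reps = sorted({min((x + h) % n for h in H) for x in G})

    # Right-hand side: integrate over H first, then over G/H.
    rhs = sum(sum(f((x + h) % n) for h in H) for x in reps)
    return lhs, rhs, len(H), len(reps)


lhs, rhs, size_H, size_quotient = quotient_integration_check()
print(lhs == rhs, size_H, size_quotient)
```

With counting measures on all three groups no normalizing constant appears; a different choice of µG or µH rescales µG/H accordingly, which is the uniqueness statement of the corollary.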
= p−fv .
1/ev
δ = πv = pv
This proves (C.1) first for Ω = Bδn (0). By translation invariance, we get (C.1)
for any closed ball and by uniqueness of the extension we get it for all Borel
measurable subsets Ω of Kv .
C.1.9. Now we fix the Haar measures βv on Kv and on the adeles in the following
way:
For a finite subset S of MK containing all the archimedean primes, the product
measure
$$\beta_S := \prod_{v \in S} \beta_v \times \prod_{v \notin S} \beta_v|_{R_v}$$
is a Haar measure on the open topological subgroup HS of KA introduced in
Remark C.1.5. The measures βS fit together to give a Haar measure β on KA .
Clearly, the counting measure is a Haar measure on the discrete subgroup K . By
Corollary C.1.2, we get a uniquely determined Haar measure βKA /K on KA /K .
Proposition C.1.10. The volume of KA /K with respect to the Haar measure
βKA /K is 1.
Proof: The fundamental domain Ω is measurable. Since it is contained in a compact subset,
Ω has finite measure. By Corollary C.1.2, we have
$$\beta(\Omega) = \beta_{K_A/K}(K_A/K).$$
By definition, we have
$$\beta(\Omega) = \Bigl(\prod_{v|\infty} \beta_v\Bigr)(\Omega_\infty) \prod_{v \text{ finite}} |D_{K_v/\mathbb{Q}_p}|_p^{1/2}. \tag{C.3}$$
Now (C.3) and (C.4) show that β(Ω) = 1 , proving the claim.
C.1.11. To motivate our normalizations of the Haar measures in C.1.9, we have to use
duality and Fourier theory. The following considerations are not used in the sequel. For
proofs, we refer to N. Bourbaki [48], E. Hewitt and K.A. Ross [150], [151], W. Rudin
[253], and A. Weil [326].
Let G be a locally compact abelian group. A character of G is a continuous homomorphism
of G into T := {z ∈ C | |z| = 1} . Together with the compact-open topology
(also called the topology of uniform convergence on compact sets), the set of characters $\widehat{G}$
is a locally compact abelian group. For x ∈ G and $\gamma \in \widehat{G}$ , let $\langle x, \gamma \rangle := \gamma(x)$ . Then
we get a perfect duality between G and $\widehat{G}$ , i.e. the canonical homomorphism of G into
the characters of $\widehat{G}$ is an isomorphism. For a continuous complex function f on G with
compact support, the Fourier transform is the continuous complex function $\widehat{f}$ on $\widehat{G}$ given
by
$$\widehat{f}(\gamma) := \int_G f(x)\,\langle x, \gamma \rangle\,d\mu(x),$$
C.2. Minkowski’s second theorem
In this section, we prove Minkowski’s second theorem over the adeles. We begin
by introducing various types of lattices. This material is mainly borrowed from
[328].
We fix notation in the following way. Let K be a number field of degree d ,
$N \in \mathbb{N}$ , $v \in M_K$ . By E , Ev , E∞ , EA , we denote the euclidean spaces $K^N$ ,
$K_v^N$ , $\prod_{v|\infty} K_v^N$ , and $K_A^N$ . We fix the Haar measure on EA by using the N -fold
product of the Haar measure β on KA introduced in C.1.9.
Definition C.2.1. For a finite place v of K , a Kv -lattice in Ev is an open and
compact Rv -submodule of Ev .
Proposition C.2.2. Let Λv be an Rv -submodule of Ev . Then Λv is a Kv -lattice
in Ev if and only if Λv is a finitely generated Rv -module which generates Ev as
a Kv -vector space.
Proof: Let Λv be a Kv -lattice. Since Λv is open, it is clear that Ev is generated by Λv as
a Kv -vector space. The Rv -span of N linearly independent vectors over K is an open and
compact Rv -submodule of Ev . Note that Λv is covered by such submodules contained in
Λv . Since Λv is compact, we can select a finite subcovering. This leads to a finite set of
generators for Λv .
Conversely, let Λv be a finitely generated Rv -submodule which generates Ev as a Kv -vector
space. As a continuous image of some $R_v^M$ , the space Λv is compact. There is a
Kv -basis of Ev contained in Λv . Let U be its Rv -span. Then U is an open neighbour-
hood of 0 and U ⊂ Λv . By a translation argument, we conclude that Λv is open in Ev
and hence Λv is a Kv -lattice in Ev .
(a) Λ is an R -lattice in V ;
(b) Λ is discrete in V and contains an R -basis of V ;
(c) Λ has a Z-basis which is an R -basis for V .
Proof: (a) ⇒ (b). We claim that Λ generates V as an R -vector space. Let W be a
complementary subspace of RΛ . Then W is contained in the compact V /Λ , hence W =
0 . We conclude that Λ spans V over R , hence it contains an R -basis of V .
(b) ⇒ (c). We may assume that $V = \mathbb{R}^N$ and that Λ contains the standard basis e1 , . . . ,
eN . Obviously, Λ is generated by $S := \{\lambda \in \Lambda \mid \max_i |\lambda_i| \le 1\}$ . Since Λ is discrete
in V , it is a closed subgroup (as in the proof of Theorem C.1.6). As an intersection of
a compact cube and a discrete closed subset, our S has to be finite. Thus Λ is a finitely
generated abelian group without torsion, hence free of finite rank r ≥ N . Since every
element of $\Lambda / \bigoplus_j \mathbb{Z} e_j$ has a representative in the finite set S , we conclude that r = N ,
proving (c).
Finally, (c) ⇒ (a) is obvious.
Note that the first projection maps C onto E∞ /Λ∞ , hence this proves compactness of the
latter.
We have an isomorphism
$$(E_\infty \times \Lambda_{\mathrm{fin}})/\Gamma \cong (E + (E_\infty \times \Lambda_{\mathrm{fin}}))/E \tag{C.6}$$
of locally compact groups. Since E∞ × Λfin is an open subgroup of EA , we conclude that
E + (E∞ × Λfin ) is an open subgroup of EA and that its complement is open as well. This
is easily seen by writing each of them as a union of cosets of E∞ × Λfin . Hence the right-hand side of (C.6)
is a closed subset of the compact quotient EA /E (Theorem C.1.6). This proves that the
left-hand side of (C.6) is also compact. Since the quotient homomorphism is an open map,
we can cover (E∞ × Λfin ) /Γ by the images of finitely many open subsets Ui which are
relatively compact in E∞ × Λfin . Choosing C = ∪i Ui , we get (C.5). This finishes the
proof that Λ∞ is an R -lattice in E∞ .
Now Proposition C.2.4 shows that every Z -basis of Λ∞ is an R -basis of E∞ . Hence Λ
is a free abelian group of finite rank which generates E as a Q -vector space. Obviously, Λ
is a K -lattice in E .
Next, we prove that the closure of Λ in Ev equals Λv for every finite v . Let xv ∈ Λv
and let ε > 0 . Then the strong approximation theorem (see Theorem 1.4.5) applied to the
coordinates of xv shows that there exists x ∈ E such that |x − xv |v < ε and x ∈ Λw for
all finite w ≠ v . If ε is sufficiently small, then the openness of Λv implies also x ∈ Λv .
Thus x ∈ Λ , proving the density of Λ in Λv .
It remains to prove uniqueness. Let Λ′ be another K -lattice with closure Λv in Ev for
every finite v . By construction, we have Λ′ ⊂ Λ . Since both are free abelian groups of
rank Nd , the index is finite. Hence there is a non-zero m ∈ Z such that mΛ ⊂ Λ′ . The
strong approximation theorem shows that Λ′ is dense in Λfin with respect to the diagonal
embedding (choose OK -generators of Λ′ and then apply it to the coordinates). Let λ ∈ Λ .
Since mΛfin is an open subgroup of Λfin , there is λ′ ∈ Λ′ such that λ − λ′ ∈ mΛfin . We
deduce
$$\lambda - \lambda' \in m\Lambda_{\mathrm{fin}} \cap E = m\Lambda \subset \Lambda',$$
proving λ ∈ Λ′ . We conclude Λ = Λ′ .
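Using the substitution $y = \mu y'$ and noting that $(K_A^{N-n})_\infty$ has real dimension d(N − n) ,
we get
$$\operatorname{vol}(\Phi_n(\mu T)) = \mu^{d(N-n)} \int_{K_A^{N-n}} \operatorname{vol}\,\Phi_n(\mu T)(\mu y')\,dy'. \tag{C.7}$$
By $(\mu T)(\mu y') = \mu T(y')$ and the case N = n , we conclude
$$\operatorname{vol}\,\Phi_n(\mu T)(\mu y') \ge \operatorname{vol}\,\Phi_n T(y'). \tag{C.8}$$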
given by
(x1 , . . . , xn , xn+1 , . . . , xN ) → (x1 , . . . , xn+1 , xn+2 , . . . , xN )
which follows from (C.10) for n = 0 and the transformation formula. By Propo-
sition C.1.10, the left-hand side of (C.11) is bounded by 1, thereby proving the
Davenport–Estermann theorem.
C.2.17. Proof of Minkowski’s second theorem: Note that a change of coordinates
in EA by an invertible N × N matrix γ does not affect the statement, because
Example C.1.3 shows that the volume changes by
$$\prod_{v \in M_K} |\det(\gamma)|_v = 1.$$
Thus we may assume that for n = 1, . . . , N the closure of λn S contains the first
n elements of the standard basis e1 , . . . , eN of $E_A = K_A^N$ .
Let us apply the Davenport–Estermann theorem with T = S . It is enough to prove
$\mu_n \ge \tfrac{1}{2}\lambda_n$ . We proceed by induction on n . Let $x, y \in \tfrac{1}{2}\lambda_n S$ with x − y ∈ E .
Since S is convex and symmetric, we get
$$x - y = \tfrac{1}{2}(2x) + \tfrac{1}{2}(-2y) \in \lambda_n S.$$
C.2.18. We compare our adelic approach with the classical geometry of numbers.
Let Λ∞ be an R -lattice in $\mathbb{R}^N$ and let S∞ be a non-empty open convex symmetric
bounded subset of $\mathbb{R}^N$ . Let λ1 , . . . , λN be the classical successive minima of
S∞ with respect to Λ∞ defined by
$$\lambda_n := \inf\{\, t > 0 \mid tS_\infty \text{ contains } n \text{ linearly independent vectors of } \Lambda_\infty \,\}.$$
The classical second theorem of Minkowski (see [181], Ch.2, §9, Th.1) states
that
$$\lambda_1 \cdots \lambda_N \operatorname{vol}(S_\infty) \le 2^N \operatorname{vol}(\Lambda_\infty),$$
where the volumes are taken with respect to the Lebesgue measure on $\mathbb{R}^N$ , with
vol(Λ∞ ) the volume of a fundamental domain of $\mathbb{R}^N/\Lambda_\infty$ .
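The definition of the classical successive minima and the inequality above can be tested by brute force on a small example. The sketch below is illustrative only: the lattice basis (2, 1), (1, 3), the choice of S∞ as the open square (−1, 1)², the function names, and the search bound are all assumptions made here, not data from the text.

```python
from itertools import product


def sup_norm(v):
    return max(abs(v[0]), abs(v[1]))


def successive_minima_sup(b1, b2, search=6):
    """Successive minima of the open square S = (-1, 1)^2 with respect to
    the lattice Z*b1 + Z*b2, for linearly independent integer vectors b1, b2.

    Brute force over coefficients |a|, |b| <= search; the bound is an
    assumption and must be large enough for the lattice at hand.
    """
    vectors = []
    for a, b in product(range(-search, search + 1), repeat=2):
        if (a, b) != (0, 0):
            v = (a * b1[0] + b * b2[0], a * b1[1] + b * b2[1])
            vectors.append((sup_norm(v), v))
    vectors.sort()
    # tS contains v exactly when t > sup_norm(v), so the infimum is the norm.
    lam1, v1 = vectors[0]
    # lam2 is the smallest norm of a lattice vector not proportional to v1.
    lam2 = next(n for n, v in vectors if v1[0] * v[1] - v1[1] * v[0] != 0)
    return lam1, lam2


lam1, lam2 = successive_minima_sup((2, 1), (1, 3))
vol_S = 4        # area of the square (-1, 1)^2
vol_lattice = 5  # |det [[2, 1], [1, 3]]|
print(lam1, lam2, lam1 * lam2 * vol_S <= 2 ** 2 * vol_lattice)
```

For this lattice every vector of sup norm at most 2 has coefficients bounded by 2 in absolute value, so the default search bound is sufficient; for other lattices the bound has to be justified separately.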
We show that this is equivalent to Theorem C.2.11. In order to see that the adelic version
implies the classical theorem we may assume, by a linear transformation, that $\Lambda_\infty = \mathbb{Z}^N$
and then we may apply Theorem C.2.11 in the case K = Q .
Conversely, let Λ be a K -lattice in $E = K^N$ and let Sv be a non-empty open symmetric
convex bounded subset of Ev , for every archimedean place v . We choose a Z -basis
of OK . Then we may identify $E = K^N$ with $\mathbb{Q}^{Nd}$ , and similarly $\prod_{v|p} K_v^N$ with $\mathbb{Q}_p^{Nd}$ ,
for every p ∈ MQ by (C.2) on page 605. By Proposition C.1.10, the normalized Haar
measures on $K_A^N$ and on $\mathbb{Q}_A^{Nd}$ agree. Our K -lattice Λ may be viewed as an R -lattice
in $\mathbb{R}^{Nd}$ . For the non-empty open convex symmetric bounded subset $S_\infty = \prod_{v|\infty} S_v$ of
$\mathbb{R}^{Nd}$ and for $S = S_\infty \times \prod_{v \text{ finite}} \Lambda_v \subset E_A$ as in Theorem C.2.11, we claim that
vol(S) = vol(S∞ )/vol(Λ∞ ), (C.12)
where on the left-hand side of the equation the volume is taken with respect to the normal-
ized Haar measure from C.1.9, while on the right the volume of S∞ is taken with respect
to the usual real Lebesgue measure on $E_\infty = \mathbb{R}^{Nd}$ and vol(Λ∞ ) is the volume of a
fundamental domain of E∞ /Λ∞ .
We have already seen that in proving this we may assume K = Q . By the argument in
C.2.17, both sides are invariant under a transformation by $A \in GL(Nd, \mathbb{Q})$ , so we can
reduce everything to the case $\Lambda = \mathbb{Z}^{Nd}$ , where the claim is obvious.
Let $\lambda'_1, \dots, \lambda'_{Nd}$ be the classical successive minima of S∞ with respect to Λ∞ . Then it
is clear that $\lambda_1 = \lambda'_1$ and $\lambda_j \le \lambda'_{d(j-1)+1}$ . Hence (C.12) shows that the classical second
theorem of Minkowski implies Theorem C.2.11.
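The final implication can be spelled out as follows. This is only a sketch: it writes $\lambda'_1 \le \dots \le \lambda'_{Nd}$ for the classical successive minima of S∞ with respect to Λ∞, and it assumes that Theorem C.2.11, which is not restated in this passage, has the form $(\lambda_1 \cdots \lambda_N)^d \operatorname{vol}(S) \le 2^{dN}$.

```latex
% The classical minima are non-decreasing, so for 1 <= i <= d and 1 <= j <= N:
\[
  \lambda_j \;\le\; \lambda'_{d(j-1)+1} \;\le\; \lambda'_{d(j-1)+i}.
\]
% Taking the product over i = 1, ..., d and j = 1, ..., N:
\[
  (\lambda_1 \cdots \lambda_N)^d \;\le\; \prod_{k=1}^{Nd} \lambda'_k .
\]
% Classical second theorem in dimension Nd, combined with (C.12):
\[
  (\lambda_1 \cdots \lambda_N)^d \operatorname{vol}(S)
  \;\le\; \lambda'_1 \cdots \lambda'_{Nd}\,
          \frac{\operatorname{vol}(S_\infty)}{\operatorname{vol}(\Lambda_\infty)}
  \;\le\; 2^{Nd}.
\]
```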
This is a special case of the classical second theorem of Minkowski using λ1 ≤ λn . It is
also an immediate consequence of Blichfeldt’s principle [26], a special case of which states
that, if Σ is a measurable set, k ∈ N , and vol(Σ) > k vol(Λ∞ ) , then there is a translate of
Σ containing k + 1 distinct points of Λ∞ . Birkhoff’s elementary proof in H.F. Blichfeldt
[26] (see [181], pp.35, 40–43 for extensions and an alternative proof) may be presented as
follows. By intersecting Σ with a sufficiently large ball, we may assume that Σ is bounded.
Let R be a parallelepiped which is a fundamental domain for Λ∞ and consider all lattice
translates of R by x ∈ Λ∞ which intersect Σ ; they cover Σ . The sum of the volumes
of the sets Σ ∩ (R + x) − x ⊂ R is vol(Σ) > k vol(R) ; hence there is a point z ∈ R
which belongs to at least k + 1 such sets (the sum of the characteristic functions cannot be
bounded by k ). The translate Σ − z contains at least k + 1 points of Λ∞ .
Minkowski’s first theorem is immediate by applying Blichfeldt’s principle to Σ = (1/2)S∞
with k = 1 , because if Σ is a convex symmetric set about the origin and y1 , y2 ∈ Σ then
y1 − y2 ∈ 2Σ .
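Minkowski's first theorem can be illustrated on a concrete convex symmetric set. The sketch below assumes the lattice Z² and the open ellipse 5x² − 14xy + 10y² < 2; the quadratic form, the function names, and the search bound are hypothetical choices made for this example. The ellipse has area 2π ≈ 6.28, which exceeds 2² · vol(Z²) = 4, so the theorem predicts a non-zero lattice point inside it.

```python
import math


def q(x, y):
    """Positive definite integral quadratic form (illustrative choice)."""
    return 5 * x * x - 14 * x * y + 10 * y * y


def ellipse_area(c):
    """Area of {q < c}: 2*pi*c / sqrt(4*alpha*gamma - beta^2)."""
    return 2 * math.pi * c / math.sqrt(4 * 5 * 10 - 14 ** 2)


def nonzero_lattice_points(c=2, search=6):
    """All non-zero points of Z^2 in the open ellipse {q < c}.

    Since q(x, y) = 5*(x - 7*y/5)**2 + y**2/5, points with q < 2 satisfy
    |y| <= 3 and |x| <= 4, so the default search box contains them all.
    """
    box = range(-search, search + 1)
    return sorted(
        (x, y) for x in box for y in box
        if (x, y) != (0, 0) and q(x, y) < c
    )


print(ellipse_area(2) > 4, nonzero_lattice_points())
```

The enumeration is exhaustive only because the set is bounded and the box was chosen to contain it; the theorem itself gives existence without any search.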
C.3. Cube slicing
In this section, we prove Vaaler’s theorem [302] that the slice of a linear subspace with the
symmetric unit cube of volume 1 has volume at least 1 . The proof uses some basic facts
about log-concave functions, which we handle first.
This is a kind of converse of Hölder’s inequality. In fact, it holds for all non-negative Borel
measurable functions. The proof for log-concave functions is easier and needs just basic
techniques from analysis. For details, we refer to Prékopa [236], Th.3.
Lemma C.3.4. Let f : Rn ×Rm −→ R+ be a log-concave function and let A be a convex
set of Rm . Then
$$x \longmapsto \int_A f(x, y)\,dy$$
is a log-concave function on Rn if the integral is always finite.
Proof: We fix x1 , x2 ∈ Rn and let λ1 , λ2 > 0 with λ1 +λ2 = 1 . Let x3 := λ1 x1 +λ2 x2 .
For i = 1, 2, 3 , we define a function fi on Rm by
fi (y) := χA (y)f (xi , y),
where χA is the characteristic function of A . Clearly, fi is log-concave and
$$f_3(y) \ge \sup_{\lambda_1 y' + \lambda_2 y'' = y} f_1(y')^{\lambda_1} f_2(y'')^{\lambda_2}.$$
Remark C.3.6. Let $B_{\rho(n)}$ be the closed ball of volume 1 in $\mathbb{R}^n$ with centre 0 and radius
ρ(n) . It is well known that
$$\rho(n) = \pi^{-1/2}\,\Gamma\!\left(\frac{n}{2} + 1\right)^{1/n}.$$
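The formula for ρ(n) is easy to check numerically against the standard volume formula $\pi^{n/2} r^n / \Gamma(n/2 + 1)$ for a ball of radius r in $\mathbb{R}^n$. The following sketch uses only the Python standard library; the function names are chosen here for illustration.

```python
import math


def rho(n):
    """Radius of the ball of volume 1 in R^n, following Remark C.3.6."""
    return math.gamma(n / 2 + 1) ** (1 / n) / math.sqrt(math.pi)


def ball_volume(n, r):
    """Lebesgue volume of the ball of radius r in R^n."""
    return math.pi ** (n / 2) * r ** n / math.gamma(n / 2 + 1)


for n in (1, 2, 3, 10):
    print(n, rho(n), ball_volume(n, rho(n)))
```

For n = 1 the radius is 1/2, so the ball is the interval [−1/2, 1/2] and a product of such balls is the unit cube centred at the origin.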
Lemma C.3.7. Let N = n1 + · · · + nr be a partition of N and define
$$Q_N := B_{\rho(n_1)} \times \cdots \times B_{\rho(n_r)}.$$
Let A be a closed symmetric convex subset of $\mathbb{R}^N$ . Then we have
$$\mu(A) \le \operatorname{vol}(A \cap Q_N),$$
where µ is the Gauss measure and vol is the Lebesgue measure on $\mathbb{R}^N$ .
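In dimension one the lemma can be verified directly. The sketch below assumes that the Gauss measure has density exp(−π x²), as the factor exp(−πr²) in the proof suggests, and takes A = [−a, a], the only kind of closed symmetric convex subset of R with non-empty interior other than R itself. Then µ(A) = erf(a√π) and vol(A ∩ Q_1) = min(2a, 1); the function names are illustrative.

```python
import math


def gauss_measure_interval(a):
    """Integral of exp(-pi * x^2) over [-a, a], in closed form."""
    return math.erf(a * math.sqrt(math.pi))


def slice_volume(a):
    """Length of [-a, a] intersected with Q_1 = [-1/2, 1/2]."""
    return min(2 * a, 1.0)


for a in (0.05, 0.1, 0.25, 0.5, 1.0, 3.0):
    print(a, gauss_measure_interval(a), slice_volume(a))
```

Both bounds used here are elementary: the density is at most 1, which gives µ(A) ≤ 2a, and the total mass is 1.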
Proof: We prove the lemma by induction on r . Suppose that r = 1 and let N = n . On
the sphere $S^{n-1} = \{x \in \mathbb{R}^n \mid |x| = 1\}$ , we consider the Lebesgue measure $\lambda_{n-1}$ . Then
the polar decomposition $x = r x'$ with r > 0 and $x' \in S^{n-1}$ gives
$$\mu(A) = \int_{S^{n-1}} \int_0^\infty \chi_A(r x') \exp(-\pi r^2)\, r^{n-1}\, dr\, d\lambda_{n-1}(x')$$
For notational simplicity, we do the induction step only in the case r = 2 . Points on
$\mathbb{R}^N = \mathbb{R}^{n_1} \times \mathbb{R}^{n_2}$ are denoted by x = (y, z) . For $y \in \mathbb{R}^{n_1}$ , let
$$A_y := \{z \in \mathbb{R}^{n_2} \mid (y, z) \in A\}.$$
Then Lemma C.3.4 shows that the symmetric function
$$f(y) := \int_{B_{\rho(n_2)} \cap A_y} dz$$
is log-concave on $\mathbb{R}^{n_1}$ . Note that the functions
$$f_n := \frac{1}{n} \sum_{k=0}^{\infty} \chi_{\{x \in \mathbb{R}^{n_1} \mid f(x) \ge k/n\}}$$
decrease as n → ∞ to f . Since the sets involved in the characteristic functions are closed,
convex, and symmetric, the induction hypothesis and monotone convergence give
$$\int f(y)\,d\mu(y) \le \int_{B_{\rho(n_1)}} f(y)\,dy.$$
shows that
$$\int f(z)\,d\mu(z) \le \int_{B_{\rho(n_2)}} f(z)\,dz.$$
Using this in (C.13), the induction step is completed by Fubini’s theorem and Remark
C.3.5.
Finally, we are ready to prove Vaaler’s cube-slicing theorem. In the simplest case of a
cube in RN , this simply states that the volume of a linear slice through the centre of a cube
of volume 1 is bounded below by 1 . In general, it states that the volume of a slice through
the centre of a product of balls of volume 1 is bounded below by 1 .
Theorem C.3.8. Let N = n1 + · · · + nr be a partition and let $Q_N := B_{\rho(n_1)} \times \cdots \times B_{\rho(n_r)}$
as above. For a real N × M matrix B of rank M , we have
$$\det(B^t B)^{-1/2} \le \operatorname{vol}\{y \in \mathbb{R}^M \mid By \in Q_N\}.$$
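The simplest instance of the theorem can be checked by hand. The sketch below assumes the cube case (all n_i = 1, so Q_N = [−1/2, 1/2]^N) and M = 1, so that B is a single non-zero column b. Then det(BᵗB)^{−1/2} = 1/|b|, while {y ∈ R : by ∈ Q_N} is an interval of length 1/max_i |b_i|. The function name and the sample vectors are illustrative.

```python
import math


def cube_slice_check(b):
    """Both sides of Theorem C.3.8 for the N x 1 matrix with column b
    and Q_N = [-1/2, 1/2]^N (cube case, an illustrative special case).

    Returns (det(B^t B)^(-1/2), vol{y in R : b*y in Q_N}).
    """
    lhs = 1 / math.sqrt(sum(t * t for t in b))
    # |b_i * y| <= 1/2 for all i  <=>  |y| <= 1 / (2 * max|b_i|).
    rhs = 1 / max(abs(t) for t in b)
    return lhs, rhs


for b in [(3, 4), (1, 0, 0), (1, 1, 1, 1), (2, -1, 5)]:
    lhs, rhs = cube_slice_check(b)
    print(b, lhs, rhs, lhs <= rhs)
```

Multiplying both sides by det(BᵗB)^{1/2} = |b| turns the inequality into the statement that the segment cut out of the unit cube by the line through 0 in direction b has length |b| / max_i |b_i| ≥ 1, with equality exactly for coordinate directions.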
As ε → 0 , the left-hand side of this inequality tends to 1 . Here, we have used that
µ(Lε ) = µ(L)µ(Bε ) = µ(Bε )
with respect to the decomposition above and
$$\lim_{\varepsilon \to 0} \mu(B_\varepsilon)/\operatorname{vol}(B_\varepsilon) = 1.$$
Let y ∈ L , not a boundary point of L ∩ QN . Then
References
[1] A. Abbes, Hauteurs et discrétude (d’après L. Szpiro, E. Ullmo et S. Zhang), Séminaire Bour-
baki, Exposé 825, Vol. 1996/97, Astérisque 245 (1997), 141–166.
[2] D. Abramovich, Uniformité des points rationnels des courbes algébriques sur les extensions
quadratiques et cubiques, C. R. Acad. Sci. Paris Sér. I Math. 321 (1995), 755–758.
[3] L.V. Ahlfors, Beiträge zur Theorie der meromorphen Funktionen, Skand. Mathematik-
erkongress 7 (1930), 84–88.
[4] L.V. Ahlfors, Über eine in der neueren Wertverteilungstheorie betrachtete Klasse transzen-
denter Funktionen, Acta Math. 58 (1932), 375–406. Also Collected Papers. Vol. 1, 112–143.
Birkhäuser, Boston-Basel-Stuttgart 1982. xx+520 pp.
[5] L.V. Ahlfors, Über eine Methode in der Theorie der meromorphen Funktionen, Soc. Sci. Fenn.
Comm. Phys.-Math. 8, No. 10 (1935), pp.1–14. Also Collected Papers. Vol. 1, 190–203.
Birkhäuser, Boston-Basel-Stuttgart 1982. xx+520 pp.
[6] L.V. Ahlfors, Zur Theorie der Überlagerungsflächen, Acta Math. 65 (1935), 157–194.
[7] L.V. Ahlfors, The theory of meromorphic curves, Acta Soc. Sci. Fennicae Nova Ser. A. 3, No.
4 (1941), 31 pp.
[8] L.V. Ahlfors, Complex Analysis: An Introduction to the Theory of Analytic Functions of
One Complex Variable. Third edition. International Series in Pure and Applied Mathematics.
McGraw-Hill Book Co., New York 1978. xi+331 pp.
[9] F. Amoroso and S. David, Le problème de Lehmer en dimension supérieure, J. reine angew.
Math. 513 (1999), 145–179.
[10] F. Amoroso and S. David, Distribution des points de petite hauteur dans les groupes multipli-
catifs, Ann. Scuola Norm. Sup. Pisa Cl. Sci. (5) 3 (2004), 325–348.
[11] F. Amoroso and S. Dvornicich, A lower bound for the height in abelian extensions, J. Number
Th. 80 (2000), 260–272.
[12] F. Amoroso and U. Zannier, A relative Dobrowolski lower bound over abelian extensions, Ann.
Scuola Norm. Sup. Pisa Cl. Sci. (4) 29 (2000), 711–727.
[13] E. Arbarello, M. Cornalba, P.A. Griffiths, and J. Harris, Geometry of Algebraic Curves. Vol.
I. Grundlehren der mathematischen Wissenschaften 267. Springer-Verlag, New York 1985.
xvi+386 pp.
[14] A. Baker, Transcendental Number Theory. Cambridge University Press 1975. ix+147 pp.
[15] A. Baker, Logarithmic forms and the abc -conjecture, in Number Theory: Diophantine, Com-
putational and Algebraic Aspects, 37–44. Györy, Kálmán et al. (eds), Proceedings of the inter-
national conference (Eger, Hungary, 1996). De Gruyter, Berlin 1998.
[16] A. Baker and G. Wüstholz, Logarithmic forms and group varieties, J. reine angew. Math. 442
(1993), 19–62.
[17] V.V. Batyrev and Yu. Tschinkel, Rational points on some Fano cubic bundles, C. R. Acad. Sci.
Paris Sér. I Math. 323, No. 1 (1996), 41–46.
[18] B. Beauzamy, E. Bombieri, P. Enflo, and H.L. Montgomery, Products of polynomials in many
variables, J. Number Th. 36 (1990), 219–245.
[65] C. Chevalley and A. Weil, Un théorème d’arithmétique sur les courbes algèbriques, C. R. Acad.
Sci. Paris Sér. I Math. 195 (1932), 570–572.
[66] K. K. Choi and J. D. Vaaler, Diophantine approximation in projective space, in Number Theory
(Ottawa ON 1996), 55–65. CRM Proc. Lecture Notes 19. Amer. Math. Soc., Providence RI
1999.
[67] W. L. Chow, The Jacobian variety of an algebraic curve, Amer. J. Math. 76 (1954), 453–476.
[68] C.H. Clemens and P.A. Griffiths, The intermediate Jacobian of the cubic threefold, Ann. of
Math. (2) 95 (1972), 281–356.
[69] R. Coleman, Manin’s proof of the Mordell conjecture over function fields, Enseign. Math. (2)
36, No. 3–4 (1990), 393–427.
[70] J.B. Conway, Functions of One Complex Variable. II. Graduate Texts in Mathematics 159.
Springer-Verlag, New York 1995. xvi+394 pp.
[71] J.H. Conway and A.J. Jones, Trigonometric diophantine equations (On vanishing sums of roots
of unity), Acta Arith. 30 (1976), 229–240.
[72] P. Corvaja and U. Zannier, A subspace theorem approach to integral points on curves, C. R.
Math. Acad. Sci. Paris Sér. I Math. 334, No. 4 (2002), 267–271.
[73] P. Corvaja and U. Zannier, On the greatest prime factor of (ab+1)(ac+1) , Proc. Amer. Math.
Soc. 131, No. 6 (2003), 1705–1709.
[74] M. Cugiani, Sull’approssimabilità di un numero algebrico mediante numeri algebrici di un
corpo assegnato, Boll. Un. Mat. Ital. (3) 14 (1959), 151–162.
[75] L.V. Danilov, The Diophantine equation x3 − y 2 = k and a conjecture of M. Hall. (Russian)
Mat. Zametki 32 (1982), 273–275, 425. English translation in Math. Notes 32 (1983), 617–618.
[76] H. Darmon, Faltings plus epsilon, Wiles plus epsilon, and the generalized Fermat equation, C.
R. Math. Rep. Acad. Sci. Canada 19, No. 1 (1997), 3–14. Corrigenda, C. R. Math. Rep. Acad.
Sci. Canada 19, No. 2 (1997), 64.
[77] H. Darmon, F. Diamond, and R. Taylor, Fermat’s Last Theorem. R. Bott et al. (eds) Current
Developments in Mathematics. International Press, Cambridge MA 1995. 1–154.
[78] H. Darmon and A. Granville, On the equation z m = F (x, y) and Axp + By q = Cz r , Bull.
London Math. Soc. 27 (1995), 513–543.
[79] H. Davenport and K.F. Roth, Rational approximations to algebraic numbers, Mathematika 2
(1955), 160–167.
[80] H. Davenport and W.M. Schmidt, Approximation to real numbers by quadratic irrationals. Acta
Arith. 13 (1967/1968), 169–176.
[81] H. Davenport and W.M. Schmidt, Approximation to real numbers by algebraic integers. Acta
Arith. 15 (1968/1969), 393–416.
[82] S. David and P. Philippon, Minorations des hauteurs normalisés des sous-variétés de variétés
abéliennes, in Number Theory, 333–364. V. Kumar Murty (ed.) et al., Proceedings of the Int.
Conference of the Ramanujan Mathematical Society, Providence, Contemp. Math. 210 (1998).
[83] S. David and P. Philippon, Minorations des hauteurs normalisées des sous-variétés des tores,
Ann. Scuola Norm. Sup. Pisa Cl. Sci. (4) 28 (1999), 489–543. Errata: ibid. 29 (2000), 729–731.
[84] S. David and P. Philippon, Minorations des hauteurs normalisées des sous-variétés de variétés
abéliennes II, Comment. Math. Helv. 77 (2002), 639–700.
[85] M. Demazure and P. Gabriel, Groupes algébriques. Tome I: Géométrie Algébrique, Généralités,
Groupes Commutatifs: Avec une appendice Corps de Classes Local par Michel Hazewinkel.
Masson & Cie, Éditeur, Paris; North-Holland Publishing Co., Amsterdam 1970. xxvi+700 pp.
[86] P. Dèbes, Quelques remarques sur un article de Bombieri concernant le Theorème de
Décomposition de Weil, Amer. J. Math. 107 (1985), 39–44.
[87] P. Dèbes, G -fonctions et théorème d’irreducibilité de Hilbert, Acta Arith. 47 (1986), 371–402.
[88] P. Dèbes and U. Zannier, Hilbert’s irreducibility theorem and G -functions, Math. Ann. 309
(1997), 491–503.
[89] L.E. Dickson, History of the Theory of Numbers. Vol. II: Diophantine Analysis. Reprinted
Chelsea Publishing Co., New York 1966. xxv+803 pp.
[90] E. Dobrowolski, On a question of Lehmer and the number of irreducible factors of a polyno-
mial, Acta Arith. 39 (1979), 391–401.
[91] C. Doche, On the spectrum of the Zhang–Zagier height, Math. Comp. 70 (2001), 419–430.
[92] D. Drasin, The inverse problem of the Nevanlinna theory, Acta Math. 138 (1977), 83–151.
[93] D. Drasin, Proof of a conjecture of F. Nevanlinna concerning functions which have deficiency
sum two, Acta Math. 158 (1987), 1–94.
[94] R. Dvornicich and U. Zannier, On sums of roots of unity, Monatsh. Math. 129 (2000), 97–108.
[95] B.M. Dwork and A.J. van der Poorten, The Eisenstein constant, Duke Math. J. 65 (1992), 23–
43. Corrigenda, Duke Math. J. 76 (1994), 669–672.
[96] B. Edixhoven and J.-H. Evertse (eds), Diophantine Approximation and Abelian Varieties: Intro-
ductory Lectures. Papers from the Conference held in Soesterberg, April 12–16, 1992. Lecture
Notes in Mathematics 1566. Springer-Verlag, Berlin 1993. xiv+127 pp.
[97] H. Edwards, Fermat’s Last Theorem: A Genetic Introduction to Algebraic Number Theory.
Graduate Texts in Mathematics 50. Springer-Verlag, New York 1996. xvi+410 pp.
[98] N.D. Elkies, ABC implies Mordell, Internat. Math. Res. Notices (1991), 99–109.
[99] W.J. Ellison, Waring’s problem, Amer. Math. Monthly 78 (1971), 10–36.
[100] A. Erdélyi, W. Magnus, F. Oberhettinger, and F.G. Tricomi, Higher Transcendental Functions.
Vol I. With a preface by Mina Rees. With a foreword by E.C. Watson. Reprint of the 1953
original. Robert E. Krieger Publishing Co., Inc., Melbourne, Fla. 1981. xiii+302 pp. Also: H.
van Haeringen, L.P. Kok, Table errata: Higher Transcendental Functions, Vol. I by A. Erdélyi,
W. Magnus, F. Oberhettinger, and F.G. Tricomi, Math. Comp. 41 (1983), 778.
[101] P. Erdős, C.L. Stewart, and R. Tijdeman, Some diophantine equations with many solutions,
Compos. Math. 66 (1988), 37–56.
[102] H. Esnault and E. Viehweg, Dyson’s lemma for polynomials in several variables (and the
theorem of Roth), Invent. Math. 78 (1984), 445–490.
[103] J.-H. Evertse, On equations in S -units and the Thue–Mahler equation, Invent. Math. 75 (1984),
561–584.
[104] J.-H. Evertse, On sums of S -units and linear recurrences, Compos. Math. 53 (1984), 225–244.
[105] J.-H. Evertse, The subspace theorem of W. M. Schmidt, in Diophantine Approximation and
Abelian Varieties. Introductory Lectures, 31–50. B. Edixhoven and J.-H. Evertse (eds), papers
from the Conference held in Soesterberg, April 12–16, 1992. Lecture Notes in Mathematics
1566. Springer-Verlag, Berlin 1993.
[106] J.-H. Evertse, An explicit version of Faltings’ product theorem and an improvement of Roth’s
lemma, Acta Arith. 73 (1995), 215–248.
[107] J.-H. Evertse, Math. Reviews 95g:11068.
[108] J.-H. Evertse, An improvement of the quantitative subspace theorem, Compos. Math. 101
(1996), 225–311.
[109] J.-H. Evertse, The number of solutions of the Thue–Mahler equation, J. reine angew. Math. 482
(1997), 121–149.
[110] J.-H. Evertse and R.G. Ferretti, Diophantine inequalities on projective varieties, Internat. Math.
Res. Notices (2002), 1295–1330.
[111] J.-H. Evertse and H.P. Schlickewei, A quantitative version of the absolute subspace theorem, J.
reine angew. Math. 548 (2002), 21–127.
[112] J.-H. Evertse, H.P. Schlickewei, and W.M. Schmidt, Linear equations in variables which lie in
a multiplicative group, Ann. of Math. (2) 155 (2002), 807–836.
[113] G. Faltings, Endlichkeitssätze für abelsche Varietäten über Zahlkörpern, Invent. Math. 73
(1983), 349–366. Erratum: ibid. 75 (1984), 381.
[114] G. Faltings, Diophantine approximation on abelian varieties, Ann. of Math. (2) 133 (1991),
549–576.
[115] G. Faltings, The general case of S. Lang’s conjecture, in Barsotti Symposium in Algebraic
Geometry, 175–182. V. Cristante and W. Messing (eds), papers from the symposium held in
Abano Terme, 1991. Perspectives in Mathematics 15, Academic Press, Inc., San Diego CA
1994.
[116] G. Faltings and G. Wüstholz (eds), Rational Points. Seminar Bonn/Wuppertal 1983/84. Third
enlarged edition. Aspects of Mathematics E6. Vieweg, Braunschweig 1992. x+311 pp.
[117] G. Faltings and G. Wüstholz, Diophantine approximations on projective spaces, Invent. Math.
116 (1994), 109–138.
[118] G. Fano, Sul sistema ∞2 di rette contenuto in una varietà cubica generale dello spazio a quattro
dimensioni, Atti R. Acc. Sc. Torino 39 (1904), 778–792.
[119] R.G. Ferretti, An effective version of Faltings’ product theorem, Forum Math. 8 (1996), 401–
427.
[120] R.G. Ferretti, Mumford’s degree of contact and Diophantine approximations, Compos. Math.
21 (2000), 247–262.
[121] R.H. Fox, On Fenchel’s conjecture about F -groups, Mat. Tidsskr. B. 1952 (1952), 61–65.
[122] J. Franke, Yu.I. Manin, and Y. Tschinkel, Rational points of bounded height on Fano vari-
eties, Invent. Math. 95 (1989), 421–435. Erratum: “Rational points of bounded height on Fano
varieties,” ibid. 102 (1990), 463.
[123] G. Frey, Links between stable elliptic curves and certain diophantine equations, Ann. Univ.
Sarav. Ser. Math. 1 (1986), 1–40.
[124] M. Fried, On the Sprindžuk–Weissauer approach to universal Hilbert subsets, Israel J. of Math.
51 (1985), 347–363.
[125] W. Fulton, Intersection Theory. Second edition. Ergebnisse der Mathematik und ihrer Grenzge-
biete, 3. Folge. Springer-Verlag, Berlin 1998. xiv+470 pp.
[126] H. Gillet and C. Soulé, Arithmetic intersection theory, Publ. Math. IHES 72 (1990), 93–174.
[127] H. Gillet and C. Soulé, An arithmetic Riemann–Roch theorem, Invent. Math. 110 (1992), 473–
543.
[128] A. Granville, ABC allows us to count squarefrees. Internat. Math. Res. Notices (1998), 991–
1009.
[129] H. Grauert, Mordells Vermutung über rationale Punkte auf algebraischen Kurven und Funktio-
nenkörper. Publ. Math. IHES 25 (1965), 131–149.
[130] P. Griffiths and J. Harris, Principles of Algebraic Geometry. Pure and Applied Mathematics.
Wiley-Interscience, New York 1978. xii+813 pp.
[131] P. Griffiths and J. King, Nevanlinna theory and holomorphic mappings between algebraic vari-
eties, Acta Math. 130 (1973), 145–220.
[132] R. Gross, A note on Roth’s theorem, J. Number Th. 36 (1990), 127–132.
[133] A. Grothendieck, Fondements de la géométrie algébrique, Séminaire Bourbaki, Exposé 236,
Vol. 1961/62, Secrétariat Math., Paris 1962.
[134] A. Grothendieck, Eléments de géométrie algébrique I. Le Langage des Schémas. Rédigés avec
la collaboration de J. Dieudonné. Publ. Math. IHES 4 (1960), 228 pp.
[135] A. Grothendieck, Eléments de géométrie algébrique II. Étude globale élémentaire de quelques
classes de morphismes. Rédigés avec la collaboration de J. Dieudonné. Publ. Math. IHES 8
(1961), 222 pp.
[136] A. Grothendieck, Eléments de géométrie algébrique III. Étude cohomologique des faisceaux
cohérents. Rédigés avec la collaboration de J. Dieudonné. I, Publ. Math. IHES 11 (1961), 167
pp.; II, ibidem 17 (1963), 91 pp.
[137] A. Grothendieck, Eléments de géométrie algébrique IV. Étude locale des schémas et des mor-
phismes de schémas. Rédigés avec la collaboration de J. Dieudonné. I, Publ. Math. IHES 20
(1964), 259 pp.; II, ibid. 24 (1965), 231 pp.; III, ibid. 28 (1966), 255 pp.; IV, ibid. 32 (1967),
361 pp.
[160] M. Kim, Geometric height inequalities and the Kodaira–Spencer map, Compos. Math. 105, No.
1 (1997), 43–54. Erratum: ibid. 121, No. 2 (2000), 219.
[161] M. Kim, D.S. Thakur, and J.F. Voloch, Diophantine approximation and deformation, Bull. Soc.
Math. France 128 (2000), 585–598.
[162] H. Koch, Number Theory: Algebraic Numbers and Functions. Translated from the German by
David Kramer. Graduate Studies in Mathematics 24. AMS, Providence RI 2000. xviii+368 pp.
[163] M. Krasner, Nombre des extensions d’un degré donné d’un corps p-adique, in Les Tendances
Géom. en Algèbre et Théorie des Nombres, 143–169, Editions du Centre National de la
Recherche Scientifique, Paris 1966.
[164] E. Landau, Vorlesungen über Zahlentheorie. Bd. II: Aus der analytischen und geometrischen
Zahlentheorie. Hirzel, Leipzig 1927. Reprinted, Chelsea Publ. Co. 1947. viii+308 pp.
[165] S. Lang, Abelian Varieties. Interscience Publishers, New York 1959. x+169 pp. Reprinted,
Springer-Verlag, New York–Berlin 1983. xii+256 pp.
[166] S. Lang, Integral points on curves, Publ. Math. IHES 6 (1960), 319–335.
[167] S. Lang, Division points on curves, Annali Mat. Pura Appl. (4) 70 (1965), 229–234.
[168] S. Lang, Introduction to Algebraic and Abelian Functions. Second edition. Graduate Texts in
Mathematics 89. Springer-Verlag, New York–Berlin 1982. ix+169 pp.
[169] S. Lang, Fundamentals of Diophantine Geometry. Springer-Verlag, New York 1983. xviii+370
pp.
[170] S. Lang, Introduction to Complex Hyperbolic Spaces. Springer-Verlag, New York 1987.
viii+271 pp.
[171] S. Lang, Number Theory III: Diophantine Geometry. Encyclopaedia of Mathematical Sciences,
Vol. 60. Springer-Verlag, Berlin 1991, xiv+296 pp.
[172] S. Lang, Algebraic Number Theory. Second edition. Graduate Texts in Mathematics 110.
Springer-Verlag, New York 1994. xiv+357 pp.
[173] S. Lang, Algebra. Revised third edition. Graduate Texts in Mathematics 211. Springer-Verlag,
New York 2002. xvi+914 pp.
[174] S. Lang and W. Cherry, Topics in Nevanlinna Theory. With an Appendix by Zhuan Ye. Lecture
Notes in Mathematics 1433. Springer-Verlag, Berlin 1990. 174 pp.
[175] S. Lang and H. Trotter, Continued fractions for some algebraic numbers, J. reine angew. Math.
255 (1972), 112–134; addendum, ibid. 267 (1974), 219–220.
[176] M. Langevin, Cas d’égalité pour le théorème de Mason et applications de la conjecture (abc) ,
C. R. Acad. Sci. Paris Sér. I Math. 317 (1993), 441–444.
[177] M. Langevin, Liens entre le théorème de Mason et la conjecture (abc) , in Number Theory
(Ottawa ON 1996), 187–213. CRM Proc. Lecture Notes 19, AMS, Providence RI 1999.
[178] M. Laurent, Équations diophantiennes exponentielles, Invent. Math. 78 (1984), 299–327.
[179] M. Laurent, M. Mignotte, and Yu. Nesterenko, Formes linéaires en deux logarithmes et
déterminants d’interpolation, J. Number Th. 55 (1995), 285–321.
[180] D.H. Lehmer, Factorization of certain cyclotomic functions, Ann. of Math. 34 (1933), 461–479.
[181] C.G. Lekkerkerker, Geometry of Numbers. Bibliotheca Mathematica, Vol. VIII. Wolters-
Noordhoff Publishing, Groningen; North-Holland Publishing Co., Amsterdam–London 1969.
ix+510 pp.
[182] P. Liardet, Sur une conjecture de Serge Lang. (French) Journées Arithmétiques de Bordeaux
(Conf., Univ. Bordeaux, Bordeaux 1974), 187–210. Astérisque 24–25, Soc. Math. France,
Paris 1975.
[183] R. Louboutin, Sur la mesure de Mahler d’un nombre algébrique, C. R. Acad. Sci. Paris Sér. I
Math. 296 (1983), 707–708.
[184] H. Luckhardt, Herbrand-Analysen zweier Beweise des Satzes von Roth: Polynomiale
Anzahlschranken, J. Symbolic Logic 54 (1989), 234–263.
[185] K. Mahler, On the fractional parts of the powers of a rational number (II), Mathematika 4
(1957), 122–124.
[186] K. Mahler, Lectures on Diophantine Approximations. Part I: g-adic Numbers and Roth’s
Theorem. Prepared from the notes by R. P. Bambah of my lectures given at the University
of Notre Dame in the Fall of 1957. University of Notre Dame Press, Notre Dame, Ind. 1961.
xi+188 pp.
[187] K. Mahler, On some inequalities for polynomials in several variables, J. London Math. Soc. 37
(1962), 341–344.
[188] K. Mahler, An inequality for the discriminant of a polynomial, Michigan Math. J. 11 (1964),
257–262.
[189] Yu.I. Manin, Rational points of algebraic curves over function fields, Izv. Akad. Nauk SSSR Ser.
Mat. 27 (1963), 1395–1440; Amer. Math. Soc., Transl., II. Ser. 50 (1966), 189–234.
[190] Yu.I. Manin, Cubic Forms. Algebra, Geometry, Arithmetic. Translated from the Russian by
M. Hazewinkel. Second edition. North-Holland Mathematical Library, Vol. 4. North-Holland
Publishing Co., Amsterdam 1986. x+326 pp.
[191] H.B. Mann, On linear relations between roots of unity, Mathematika 12 (1965), 107–117.
[192] R.C. Mason, The hyperelliptic equation over function fields, Math. Proc. Cambridge Philos.
Soc. 93 (1983), 219–230.
[193] R.C. Mason, Diophantine Equations over Function Fields. London Math. Soc. Lecture Note
Ser. 96. Cambridge University Press 1984. x+125 pp.
[194] D. Masser, On abc and discriminants, Proc. Amer. Math. Soc. 130, No. 11 (2002), 3141–3150.
[195] D. Masser and G. Wüstholz, Fields of large transcendence degree generated by values of elliptic
functions, Invent. Math. 72 (1983), 407–464.
[196] W.S. Massey, Algebraic Topology: An Introduction. Reprint of the 1967 edition. Graduate Texts
in Mathematics 56. Springer-Verlag, New York–Heidelberg 1977. xxi+261 pp.
[197] H. Matsumura, Commutative Algebra. Second edition. Mathematics Lecture Note Series 56.
W.A. Benjamin, Inc., New York 1970. xii+262 pp. Benjamin/Cummings Publishing Co., Inc.,
Reading MA 1980. xv+313 pp.
[198] R.B. McFeat, Geometry of numbers in adele spaces, Dissertationes Math. Rozprawy Mat. 88
(1971), 49 pp.
[199] M. McQuillan, Division points on semiabelian varieties, Invent. Math. 120 (1995), 143–159.
[200] M. Mignotte, Sur l’Équation de Catalan, C. R. Acad. Sci. Paris Sér. I Math. 314 (1992), 165–
168.
[201] M. Mignotte, Y. Roy, Minorations pour l’équation de Catalan. C. R. Acad. Sci. Paris Sér. I
Math. 324 (1997), 377–380.
[202] P. Mihăilescu, A class number free criterion for Catalan’s conjecture, J. Number Th. 99 (2003),
225–231.
[203] P. Mihăilescu, Primary cyclotomic units and a proof of Catalan’s conjecture, J. reine angew.
Math. 572 (2004), 167–195.
[204] J.S. Milne, Abelian varieties, in Arithmetic Geometry, 103–150. Cornell and Silverman (eds),
papers from the conference at University of Connecticut, Storrs Conn. 1984. Springer, New
York 1986.
[205] J.S. Milne, Jacobian varieties, in Arithmetic Geometry, 167–212. Cornell and Silverman (eds),
papers from the conference at University of Connecticut, Storrs Conn. 1984. Springer, New
York 1986.
[206] L. Mirsky, An Introduction to Linear Algebra. Oxford at the Clarendon Press 1955. xi+433 pp.
[207] L.J. Mordell, On the rational solutions of the indeterminate equations of the third and fourth
degrees. Proc. Cambridge Philos. Soc. 21 (1922), 179–192.
[208] A. Moriwaki, Arithmetic height functions over finitely generated fields, Invent. Math. 140
(2000), 101–142.
[209] T. Muir, The Theory of Determinants in the Historical Order of Development. Vol 4: The period
1880 to 1900. Macmillan & Co. Limited, St. Martin Street, London 1923. xxxi+508 pp.
[210] D. Mumford, The topology of normal singularities of an algebraic surface and a criterion for
simplicity, Publ. Math. IHES 9 (1961), 5–22.
[211] D. Mumford, A remark on Mordell’s conjecture, Amer. J. Math. 87 (1965), 1007–1016.
[212] D. Mumford, Abelian Varieties. Published for the Tata Institute of Fundamental Research Studies
in Mathematics, No. 5. Oxford University Press, London 1970. viii+242 pp.
[213] D. Mumford, The Red Book of Varieties and Schemes. Second, expanded edition. Includes
the Michigan lectures (1974) on curves and their Jacobians. With contributions by Enrico Arbarello.
Lecture Notes in Mathematics 1358. Springer-Verlag, Berlin 1999. x+306 pp.
[214] D. Mumford, Tata Lectures on Theta. I: Introduction and Motivation: Theta Functions in One
Variable. Basic Results on Theta Functions in Several Variables. II: Jacobian Theta Functions
and Differential Equations. III (with M. Nori, P. Norman). Progr. Math. 28, 43, 97. Birkhäuser
1983, 1984, 1991. xiii+235 pp., xiv+272 pp., viii+202 pp.
[215] W. Narkiewicz, Elementary and Analytic Theory of Algebraic Numbers. Second Edition. PWN–
Polish Scientific Publishers and Springer-Verlag, Warszawa 1990. xiv+746 pp.
[216] A. Néron, Problèmes arithmétiques et géométriques rattachés à la notion de rang d’une courbe
algébrique dans un corps, Bull. Soc. Math. France 80 (1952), 101–166.
[217] A. Néron, Modèles minimaux des variétés abéliennes sur les corps locaux et globaux, Publ.
Math. IHES 21 (1964), 361–482.
[218] A. Néron, Quasi-fonctions et hauteurs sur les variétés abéliennes, Ann. of Math. (2) 82 (1965),
249–331.
[219] R. Nevanlinna, Zur Theorie der meromorphen Funktionen, Acta Math. 46 (1925), 1–99.
[220] R. Nevanlinna, Über Riemannsche Flächen mit endlich vielen Windungspunkten, Acta Math. 58
(1932), 295–373.
[221] R. Nevanlinna, Eindeutige Analytische Funktionen. Zweite Auflage, Reprint. Grundlehren der
mathematischen Wissenschaften 46. Springer-Verlag, Berlin-New York 1974. x+379 pp.
[222] A. Nitaj, La conjecture abc, Enseign. Math. (2) 42, No. 1–2 (1996), 3–24.
[223] E.I. Nochka, Defect relations for meromorphic curves (Russian) Izv. Akad. Nauk Moldav. SSR
Ser. Fiz.-Tekhn. Mat. Nauk No. 1 (1982), 41–47, 79.
[224] E.I. Nochka, On the theory of meromorphic functions, Sov. Math., Dokl. 27 (1983), 377–381;
transl. from Dokl. Akad. Nauk SSSR 269 (1983), 547–552.
[225] J. Noguchi, A higher dimensional analogue of Mordell’s conjecture over function fields, Math.
Ann. 258 (1981), 207–212.
[226] J. Noguchi, Nevanlinna–Cartan theory over function fields and a diophantine equation, J. reine
angew. Math. 487 (1997), 61–83. Correction: ibid. 497 (1998), 235.
[227] D.G. Northcott, An inequality in the theory of arithmetic varieties, Proc. Cambridge Philos.
Soc. 45 (1949), 502–509.
[228] D.G. Northcott, A further inequality in the theory of arithmetic varieties, Proc. Cambridge
Philos. Soc. 45 (1949), 510–518.
[229] J. Oesterlé, Nouvelles approches du “théorème” de Fermat, Séminaire Bourbaki, Exposé 694,
Vol. 1987/88, Astérisque 161/162 (1988), 165–186.
[230] A. Ogg, Elliptic curves and wild ramification, Amer. J. Math. 89 (1967), 1–21.
[231] Ch.F. Osgood, A number theoretic-differential equations approach to generalizing Nevanlinna
theory, Indian J. Math. 23 (1981), 1–15.
[232] Ch.F. Osgood, Sometimes effective Thue–Siegel–Roth–Schmidt–Nevanlinna bounds, or better,
J. Number Th. 21 (1985), 347–389.
[233] O. Perron, Die Lehre von den Kettenbrüchen. Bd. I. Elementare Kettenbrüche. 3. Aufl., Teubner
Verlagsgesellschaft, Stuttgart 1954, vi+194 pp.
[234] E. Peyre, Points de hauteur bornée, topologie adélique et mesures de Tamagawa, 22nd Journées
Arithmétiques (Lille 2001), J. Théor. Nombres Bordeaux 15, No. 1 (2003), 319–349.
[235] B. Poonen, Mordell–Lang plus Bogomolov, Invent. Math. 137 (1999), 413–425.
[236] A. Prékopa, On logarithmic concave measures and functions, Acta Sci. Math. (Szeged) 34
(1973), 335–343.
[237] M. Raynaud, Courbes sur une variété abélienne et points de torsion, Invent. Math. 71 (1983),
207–233.
[238] M. Raynaud, Sous-variétés d’une variété abélienne et points de torsion, in Arithmetic and
Geometry I, 327–352. J. Coates and S. Helgason (eds), Progr. Math. 35, Birkhäuser Boston
Inc. 1983.
[239] L. Rédei, Algebra. Vol. 1. International Series of Monographs in Pure and Applied Mathematics
91. Oxford: Pergamon Press 1967. xviii+823 pp.
[240] G. Rémond, Décompte dans une conjecture de Lang, Invent. Math. 142 (2000), 513–545.
[241] G. Rémond, Sur le théorème du produit, 21st Journées Arithmétiques (Rome, 2001), J. Théor.
Nombres Bordeaux 13, No. 1 (2001), 287–302.
[242] G. Rémond, Approximation diophantienne sur les variétés semi-abéliennes, Ann. Sci. Éc.
Norm. Sup. (4) 36, No. 2 (2003), 191–212.
[243] P. Ribenboim, Catalan’s Conjecture. Are 8 and 9 the Only Consecutive Powers? Academic
Press, Inc., Boston MA 1994. xvi+364 pp.
[244] P. Roquette, Analytic Theory of Elliptic Functions over Local Fields. Hamburger mathematische
Einzelschriften (N.F.), Heft 1. Vandenhoeck & Ruprecht, Göttingen 1970. 90 pp.
[245] J.B. Rosser and L. Schoenfeld, Approximate formulas for some functions of prime numbers,
Illinois J. Math. 6 (1962), 64–94.
[246] D. Roy, Approximation to real numbers by cubic algebraic integers. II, Ann. of Math. (2) 158
(2003), 1081–1087.
[247] D. Roy and J. L. Thunder, An absolute Siegel’s lemma, J. reine angew. Math. 476 (1996), 1–26.
[248] D. Roy and J. L. Thunder, Addendum and erratum to: “an absolute Siegel’s lemma”, J. reine
angew. Math. 508 (1999), 47–51.
[249] M. Ru, Nevanlinna Theory and its Relation to Diophantine Approximation. World Scientific
Publishing Co., Inc., River Edge NJ 2001. xiv+323 pp.
[250] M. Ru and P. Vojta, Schmidt’s subspace theorem with moving targets, Invent. Math. 127 (1997),
51–65.
[251] W. Rudin, Functional Analysis. McGraw-Hill Series in Higher Mathematics. McGraw-Hill
Book Co., New York–Düsseldorf–Johannesburg 1973. xiii+397 pp. Second edition. International
Series in Pure and Applied Mathematics. McGraw-Hill, Inc., New York 1991. xviii+424
pp.
[252] W. Rudin, Real and Complex Analysis. Third edition. McGraw-Hill Book Co., New York 1987.
xiv+416 pp.
[253] W. Rudin, Fourier Analysis on Groups. Reprint of the 1962 original. Wiley-Interscience, New
York 1990. ix+285 pp.
[254] R. Rumely, On Bilu’s equidistribution theorem, in Spectral Problems in Geometry and
Arithmetic. Iowa City IA 1997, 159–166. Contemp. Math. 237, AMS, Providence RI 1999.
[255] C. Runge, Ueber ganzzahlige Lösungen von Gleichungen zwischen zwei Veränderlichen, J.
reine angew. Math. 100 (1887), 425–435.
[256] T. Saito, Conductor, discriminant, and the Noether formula of arithmetic surfaces, Duke Math.
J. 57 (1988), 151–173.
[257] S.H. Schanuel, Heights in number fields, Bull. Soc. Math. France 107 (1979), 433–449.
[258] A. Schinzel, On the product of the conjugates outside the unit circle of an algebraic number,
Acta Arith. 24 (1973), 385–399. Addendum ibid. 26 (1974/75), 329–331.
[259] A. Schinzel, Polynomials with Special Regard to Reducibility. With an Appendix by Umberto
Zannier. Encyclopedia of Mathematics and its Applications 77. Cambridge University Press,
Cambridge 2000. x+558 pp.
[260] H.P. Schlickewei, S-unit equations over number fields, Invent. Math. 102 (1990), 95–107.
[261] H.P. Schlickewei, The quantitative subspace theorem for number fields, Compos. Math. 82
(1992), 245–273.
[262] H.P. Schlickewei, Equations in roots of unity, Acta Arith. 76 (1996), 99–108.
[263] W.M. Schmidt, On heights of algebraic subspaces and diophantine approximations, Ann. of
Math. (2) 85 (1967), 430–472.
[264] W.M. Schmidt, On simultaneous approximations of two algebraic numbers by rationals, Acta
Math. 119 (1967), 27–50.
[265] W.M. Schmidt, Asymptotic formulae for point lattices of bounded determinant and subspaces
of bounded height, Duke Math. J. 35 (1968), 327–339.
[266] W.M. Schmidt, Simultaneous approximation to algebraic numbers by rationals, Acta Math. 125
(1970), 189–201.
[267] W.M. Schmidt, Norm form equations, Ann. of Math. (2) 96 (1972), 526–551.
[268] W.M. Schmidt, Diophantine Approximation. Lecture Notes in Mathematics 785. Springer-
Verlag, Berlin 1980. x+299 pp.
[269] W.M. Schmidt, The subspace theorem in Diophantine approximations, Compos. Math. 69
(1989), 121–173.
[270] W.M. Schmidt, Eisenstein’s theorem on power series expansions of algebraic functions, Acta
Arith. 56 (1990), 161–179.
[271] W.M. Schmidt, Diophantine Approximations and Diophantine Equations. Lecture Notes in
Mathematics 1467. Springer-Verlag, Berlin 1991. viii+217 pp.
[272] W.M. Schmidt, Heights of algebraic points lying on curves or hypersurfaces, Proc. Amer. Math.
Soc. 124, No. 10 (1996), 3003–3013.
[273] W.M. Schmidt, Heights of points on subvarieties of $G_m^n$, in Number Theory (Paris, 1993–
1994), 157–187. London Math. Soc. Lecture Note Ser. 235. Cambridge University Press,
Cambridge 1996.
[274] H. Seifert and W. Threlfall, Lehrbuch der Topologie. B.G. Teubner VII, Leipzig und Berlin
1934. Also Herbert Seifert and William Threlfall, Seifert and Threlfall: A Textbook of Topology.
Translated from the German edition of 1934 by M.A. Goldman. With a preface by J.S. Birman.
With Topology of 3-dimensional Fibered Spaces by Seifert. Translated from the German by
Wolfgang Heil. Pure and Applied Mathematics 89. Academic Press, Inc., New York–London
1980. xvi+437 pp.
[275] J.-P. Serre, Géométrie algébrique et géométrie analytique, Ann. Inst. Fourier 6 (1956), 1–42.
[276] J.-P. Serre, Corps Locaux. Deuxième édition. Publications de l’Université de Nancago, No.
VIII. Hermann, Paris 1968. 245 pp.
[277] J.-P. Serre, Lectures on the Mordell–Weil Theorem. Translated from the French and edited
by Martin Brown from notes by Michel Waldschmidt. Aspects of Mathematics, E15. Friedr.
Vieweg & Sohn, Braunschweig 1989. x+218 pp.
[278] J.-P. Serre, Galois Cohomology. Translated from French by P. Ion. Springer-Verlag, Berlin
1997. x+210 pp.
[279] I.R. Shafarevich, Basic Algebraic Geometry. 1. Varieties in Projective Space. Second edition.
Translated from the 1988 Russian edition and with notes by Miles Reid. Springer-Verlag, Berlin
1994. xx+303 pp.
[280] I.R. Shafarevich, Basic Algebraic Geometry. 2. Schemes and Complex Manifolds. Second
edition. Translated from the 1988 Russian edition by Miles Reid. Springer-Verlag, Berlin 1994.
xiv+269 pp.
[281] T. Shimizu, On the theory of meromorphic functions, Japanese Journ. of Math. 6 (1929), 119–
171.
[282] X, The integer solutions of the equation $y^2 = ax^n + bx^{n-1} + \cdots + k$, J. London Math. Soc.
1 (1926), 66–68. Also Gesammelte Abhandlungen, Bd. I, Springer-Verlag, Berlin-Heidelberg-
New York 1966, 207–208.
[283] C.L. Siegel, Über einige Anwendungen diophantischer Approximationen, Abh. Preuß. Akad.
Wissen. Phys.-math. Klasse 1929, Nr. 1. Also Gesammelte Abhandlungen, Bd. I, Springer-
Verlag, Berlin–Heidelberg–New York 1966, 209–274.
[284] J.H. Silverman, The Arithmetic of Elliptic Curves. Graduate Texts in Mathematics 106.
Springer-Verlag, New York 1986. xii+400 pp.
[285] J.H. Silverman, Rational points on K3 surfaces: A new canonical height, Invent. Math. 105
(1991), 347–373.
[286] J.H. Silverman, Advanced Topics in the Arithmetic of Elliptic Curves. Graduate Texts in
Mathematics 151. Springer-Verlag, New York 1994. xiv+525 pp.
[287] C.J. Smyth, On the product of the conjugates outside the unit circle of an algebraic integer,
Bull. London Math. Soc. 3 (1971), 169–175.
[288] C.J. Smyth, On the measure of totally real algebraic numbers, I, J. Austral. Math. Soc. Ser. A
30 (1980/81), 137–149; II, Math. Comp. 37 (1981), 205–208.
[289] V.G. Sprindžuk, Arithmetic specializations in polynomials, J. reine angew. Math. 340 (1983),
26–52.
[290] N. Steinmetz, Eine Verallgemeinerung des zweiten Nevanlinnaschen Hauptsatzes, J. reine
angew. Math. 368 (1986), 134–141.
[291] C.L. Stewart and R. Tijdeman, On the Oesterlé–Masser conjecture, Monatsh. Math. 102 (1986),
251–257.
[292] W. Stoll, Value Distribution on Parabolic Spaces. Lecture Notes in Mathematics 600. Springer-
Verlag, Berlin-New York 1977. viii+216 pp.
[293] W. W. Stothers, Polynomial identities and Hauptmoduln, Quart. J. Math. Oxford Ser. (2) 32
(1981), 349–370.
[294] T. Struppeck and J.D. Vaaler, Inequalities for heights of algebraic subspaces and the Thue–
Siegel principle, in Analytic Number Theory (Allerton Park IL 1989), 493–528. Progr. Math.
85, Birkhäuser Boston, Boston MA 1990.
[295] L. Szpiro (ed.), Séminaire sur les pinceaux arithmétiques: la conjecture de Mordell. Papers
from the seminar held at the École Normale Supérieure, Paris 1983–84. Astérisque 127. Soc.
Math. France, Paris 1985. x+287 pp.
[296] L. Szpiro, E. Ullmo, and S. Zhang, Equirépartition des petits points, Invent. Math. 127 (1997),
337–347.
[297] R. Taylor and A. Wiles, Ring-theoretic properties of certain Hecke algebras, Ann. of Math. (2)
141 (1995), 553–572.
[298] A. Thue, Über Annäherungswerte algebraischer Zahlen, J. reine angew. Math. 135 (1909), 284–
305. Also Selected Mathematical Papers of Axel Thue. With an introduction by Carl Ludwig
Siegel. Universitetsforlaget, Oslo-Bergen-Tromsø 1977, 232–253.
[299] J.L. Thunder, Asymptotic estimates for rational points of bounded height on flag varieties,
Compos. Math. 88, No. 2 (1993), 155–186.
[300] R. Tijdeman, On the equation of Catalan, Acta Arith. 29 (1976), 197–209.
[301] E. Ullmo, Positivité et discrétion des points algébriques des courbes, Ann. of Math. (2) 147
(1998), 167–179.
[302] J.D. Vaaler, A geometric inequality with applications to linear forms, Pacific J. Math. 83 (1979),
543–553.
[303] M. van der Put, The product theorem, in Diophantine Approximation and Abelian Varieties.
Introductory Lectures, 77–82. Papers from the Conference held in Soesterberg, April 12–16,
1992. B. Edixhoven and J.-H. Evertse (eds), Lecture Notes in Mathematics 1566. Springer-
Verlag, Berlin 1993.
[304] M. van Frankenhuysen, A lower bound in the abc conjecture, J. Number Th. 82 (2000), 91–95.
[305] M. van Frankenhuysen, The ABC conjecture implies Vojta’s height inequality for curves, J.
Number Th. 95 (2002), 289–302.
[306] P. Vojta, A Diophantine conjecture over Q, in Séminaire de théorie des nombres, Paris 1984–
85, 241–250. Progr. Math. 63, Birkhäuser Boston, Boston MA 1986.
[307] P. Vojta, Diophantine Approximations and Value Distribution Theory. Lecture Notes in
Mathematics 1239. Springer-Verlag, Berlin 1987. x+132 pp.
[308] P. Vojta, Mordell’s conjecture over function fields, Invent. Math. 98 (1989), 115–138.
[309] P. Vojta, A refinement of Schmidt’s subspace theorem, Amer. J. Math. 111, No. 3 (1989), 489–
518.
[310] P. Vojta, Siegel’s theorem in the compact case, Ann. of Math. (2) 133 (1991), 509–548.
[311] P. Vojta, On algebraic points on curves, Compos. Math. 78, No. 1 (1991), 29–36.
[312] P. Vojta, A generalization of theorems of Faltings and Thue–Siegel–Roth–Wirsing, J. Amer.
Math. Soc. 5 (1992), 763–804.
[313] P. Vojta, Applications of arithmetic algebraic geometry to Diophantine approximations, in
Arithmetic Algebraic Geometry, 164–208. E. Ballico (ed.), lectures from the Second C.I.M.E.
Session held in Trento, 1991. Lecture Notes in Mathematics 1553. Springer-Verlag, Berlin
1993.
[314] P. Vojta, Integral points on subvarieties of semiabelian varieties, I, Invent. Math. 126, No. 1
(1996), 133–181.
[315] P. Vojta, Roth’s theorem with moving targets, Internat. Math. Res. Notices (1996), 109–114.
[316] P. Vojta, On Cartan’s theorem and Cartan’s conjecture, Amer. J. Math. 119 (1997), 1–17.
[317] P. Vojta, A more general abc conjecture, Int. Math. Res. Notices (1998), 1103–1116.
[318] P. Vojta, Nevanlinna theory and diophantine approximation, in Several Complex Variables,
535–564. M. Schneider (ed.) et al., Berkeley CA 1995–96, MSRI Publ. 37, Cambridge University
Press, Cambridge 1999.
[319] J.F. Voloch, Diagonal equations over function fields, Bol. Soc. Brasil. Mat. 16 (1985), 29–39.
[320] J.T.-Y. Wang, The truncated second main theorem of function fields, J. Number Th. 58 (1996),
139–157.
[321] F. Warner, Foundations of Differentiable Manifolds and Lie Groups. Corrected reprint of the
1971 edition. Graduate Texts in Mathematics 94, Springer-Verlag, New York 1983. ix+272 pp.
[322] L.C. Washington, Introduction to Cyclotomic Fields. Second edition. Graduate Texts in
Mathematics 83. Springer-Verlag, New York 1997. xiv+487 pp.
[323] W.C. Waterhouse, Introduction to Affine Group Schemes. Graduate Texts in Mathematics 66.
Springer-Verlag, New York–Berlin 1979. xi+164 pp.
[324] A. Weil, L’arithmétique sur les courbes algébriques, Acta Math. 52 (1929), 281–315. Also
Œuvres Scientifiques – Collected Papers. Vol. I, Corrected Second Printing, Springer-Verlag,
New York–Heidelberg–Berlin 1980, 11–45.
[325] A. Weil, Sur un théorème de Mordell, Bull. Sc. Math. (2) 54 (1929), 182–191. Also Œuvres
Scientifiques–Collected Papers. Vol. I, Corrected Second Printing, Springer-Verlag, New York–
Heidelberg–Berlin 1980, 47–56.
[326] A. Weil, Arithmétique et Géométrie sur les Variétés Algébriques. Actualités Scientifiques et
Industrielles, No. 206. Hermann, Paris 1935. 3–16. Also Œuvres Scientifiques–Collected Papers.
Vol. I, Corrected Second Printing, Springer-Verlag, New York–Heidelberg–Berlin 1980,
87–100.
[327] A. Weil, Variétés Abéliennes et Courbes Algébriques. Hermann, Paris 1948. 163 pp.
[328] A. Weil, Arithmetic on algebraic varieties, Ann. of Math. (2) 53 (1951), 412–444. Also Œuvres
Scientifiques–Collected Papers. Vol. I, Corrected Second Printing, Springer-Verlag, New York–
Heidelberg–Berlin 1980, 454–486.
[329] R. Weissauer, Der Hilbertsche Irreduzibilitätssatz, J. reine angew. Math. 333 (1982), 203–220.
[330] A. Weitsman, A theorem on Nevanlinna deficiencies, Acta Math. 128 (1972), 41–52.
[331] A. Wiles, Modular elliptic curves and Fermat’s Last Theorem, Ann. of Math. (2) 141 (1995),
443–551.
[332] P.M. Wong, On the second main theorem of Nevanlinna theory, Amer. J. Math. 111 (1989),
549–583.
[333] K. Yamanoi, The second main theorem for small functions and related problems, Acta Math.
192 (2004), 225–294.
[334] Z. Ye, A sharp form of Nevanlinna’s second main theorem of several complex variables, Math.
Z. 222 (1996), 81–95.
[335] K. Yu, p-adic logarithmic forms and group varieties. I, J. reine angew. Math. 502 (1998), 29–
92; II, Acta Arithmetica 89 (1999), 337–378.
[336] D. Zagier, Algebraic numbers close to both 0 and 1, Math. Comp. 61 (1993), 485–491.
[337] U. Zannier, Some Applications of Diophantine Approximation to Diophantine Equations, with
Special Emphasis on the Schmidt Subspace Theorem. Forum, Editrice Universitaria Udinese,
Udine 2003. 69 pp.
[338] O. Zariski, P. Samuel, Commutative Algebra. Vol. II. Reprint of the 1960 edition. Graduate
Texts in Mathematics 29. Springer-Verlag, New York–Heidelberg–Berlin 1975. x+414 pp.
[339] S. Zhang, Positive line bundles on arithmetic surfaces, Ann. of Math. (2) 136 (1992), 569–587.
[340] S. Zhang, Positive line bundles on arithmetic varieties, J. Amer. Math. Soc. 8 (1995), 187–221.
[341] S. Zhang, Small points and adelic metrics, J. Alg. Geom. 4 (1995), 281–300.
[342] S. Zhang, Equidistribution of small points on abelian varieties, Ann. of Math. 147 (1998), 159–
165.
[343] A. Zygmund, Trigonometric Series. Vols I, II. Third edition. With a foreword by Robert A.
Fefferman. Cambridge Mathematical Library. Cambridge University Press, Cambridge 2002.
xii; Vol. I: xiv+383 pp.; Vol. II: viii+364 pp.
Glossary of Notation
$\ell_p(f)$ $\ell_p$-norm of the coefficients of the polynomial f, 24
$[f]_p$ $\ell_p$-norm of the hypercube representation, 31
sh(d, e) shuffle of type (d, e), 31
$D = (s_D; L, s; M, t)$ presentation of Cartier divisor D, 35
$\lambda_D(P)$ local height of P relative to D, 35
$\lambda_f(P)$ local height relative to the rational function f, 36
$\lambda_D(P, v)$, $\lambda_D(P, u)$ local heights relative to $v \in M_F$ and $u \in M_K$, 39
$h_\lambda(P)$ global height of P relative to local height λ, 40
$h_c$ class of height functions relative to c ∈ Pic(X), 41
$h_\varphi$ Weil height relative to morphism ϕ, 43
ϕ#ψ join of morphisms ϕ, ψ to projective spaces, 43
d(p), h(p) degree and height of presentation p, 47
$|\alpha|$ $\alpha_0 + \cdots + \alpha_N$ for $\alpha \in N^{N+1}$, 49
$x^\alpha$ $x_0^{\alpha_0} \cdots x_N^{\alpha_N}$ for $x \in R^{N+1}$, 49
$L = (L, \|\ \|)$ metrized line bundle, 58
D, O(D) Néron divisor and its metrized line bundle, 61, 62
$\lambda_D(P)$ local height of P relative to D, 61
$\lambda_D(P, u)$, $\lambda_D(P, v)$ local heights relative to $v \in M_F$ and $u \in M_K$, 62, 64
$h_L(P)$ global height of P relative to L, 64
$H_u(x)$, $H_u(A)$ Arakelov norm for vector x and matrix A, 66, 67
$h_{\mathrm{Ar}}$, $H_{\mathrm{Ar}}$ Arakelov height, multiplicative version, 66, 67, 69
$\wedge^m W$ mth exterior power of vector space W, 67
$A^t$ transpose of the matrix A, 69
$H_u^*(A)$ dual local height, 69
$\eta_u(\psi)$ distortion factor of $\psi \in PGL(n + 1, Q_u)$, 69
$\delta_u(x, y)$ projective distance, 69
$H_{\mathrm{Ar}}^{\mathrm{row}}(A)$ multiplicative Arakelov height with respect to rows, 75
H(A) multiplicative height of matrix A, 75
$I_M$ M × M unit matrix, 78
$G := (G_m^n, \cdot, 1_n)$ multiplicative algebraic group $A_K^n \setminus \{0\}$, 82
$e_j$ standard basis of $Z^n$, 83
$\varphi_A(x)$ $(x^{Ae_1}, \ldots, x^{Ae_n})$ for integer matrix A, 83
GL(n, Z), SL(n, Z) general and special linear group over Z, 83
Λ division group of subgroup Λ in $Z^n$, 83
K[[x]] ring of formal power series, 362
aα xα r |aα |rα , 362
K r−1 x {f ∈ K[[x]] | f r < ∞}, 362
$\rho_r(f)$, ρ(f) spectral norm, for r = (1, . . . , 1), 362, 364
$K\{r^{-1}x\}$ strictly convergent power series, 364
$\partial_\alpha f(x_0)$ Taylor coefficients at $x_0$, 366
$\dot{f}$ $\frac{\partial}{\partial t} f$, 368
$\partial_i$ $\frac{1}{i!} \partial^i$, 371
$(i_1^*, i_2^*)$ admissible pair in (P, Q) ∈ C × C, 372
$N_L(X, T)$ number of points in X(K) with $H_L(P) \le T$, 392
$R_K$, $w_K$, $h_K$ regulator, number of roots of unity, class number, 392
$\zeta_K$ zeta function of number field K, 392
$N_{\mathrm{Ar}}(X, T)$ $N_{O(1)}(X, T)$ for the Arakelov height, 393
$N_{\mathrm{lines}}(X, T)$ contributions of points on rational lines, 395
rad(N) $\prod_{p \mid N} p$, radical of N ∈ N, 402
π1 (X, x0 ) fundamental group of X, 411
Γ\Y quotient of Y by left action of Γ, 411
H upper half plane in C, 413
rad(f) $\prod_{f(\alpha)=0} (x - \alpha)$, radical of f ∈ K[x], 416
$\vartheta_P$ $\sum_{p \in P} \log p$, 420
Ψ(N, P), $\Psi_{\mathrm{odd}}(N, P)$ number of (odd) n ≤ N with all prime factors in P, 420
E reduction of elliptic curve E, 426
cond(E) conductor of elliptic curve E, 429
$e_{x/\pi(x)}$ ramification index of x over π(x), 436
D open unit disc in C, 437
$F^+$, $F^-$ max{F, 0}, − min{F, 0}, 445
n(r, a, f) enumerating function for r > 0, $a \in P^1_{\mathrm{an}}$, 446
N (r, a, f ), m(r, a, f ) counting and proximity function, 446, 447
c(f, 0) leading coefficient of the Laurent series at 0, 446
T (r, f ) Nevanlinna’s characteristic function of f , 448
ρ(f ) order of a meromorphic function f , 450
$W(f_0, \ldots, f_n)$ Wronskian of entire functions $f_0, \ldots, f_n$, 450, 470
$N_{\mathrm{ram}}(r, f)$ counting function of ramification of f, 450
$N_{\mathrm{ram}}(r, a, f)$ counting function of ramification over a, 451
δ(a, f), θ(a, f) defect and ramification defect of f, 452
$N^{(1)}(r, a, f)$, cond(r, f) truncated counting function and conductor, 455
$k(z_1, z_2)$ chordal distance on the Riemann sphere, 457
ω Fubini–Study form on projective space, 457, 467
$\mathring{m}(r, a, f)$ spherical proximity function, 458
$\mathring{T}(r, f)$ Ahlfors–Shimizu characteristic, 458
$\mathrm{ord}_Y(D)$ multiplicity of divisor D in component Y, 466
$N_{f,D}(r)$, $m_{f,D}$ counting function and proximity function, 466, 476
$T_{f,D}(r)$, $T_{f,L}(r)$ characteristic function for holomorphic curve f, 466, 469, 476
$N_{f,\mathrm{ram}}$ counting function for ramification of f, 470
$R_f$ ramification divisor of holomorphic map f, 473
$N_{f,D}^{(1)}(r)$ truncated counting function, 475
$m_S(a, \beta)$, $N_S(a, \beta)$ proximity and counting function for a ∈ K, 481
$N_{S,D}$, $m_{S,D}$ counting and proximity function for divisor D, 483, 499
$d_K$, d(P) absolute logarithmic discriminant, 487, 499
condK λ (P ) conductor of P , 489, 499
$M_K^{\mathrm{fin}}$ discrete valuations of number field K, 489
χ characteristic function for (0, ∞), 489
$N_{S,\mathrm{ram}}$ counting of ramification in function field case, 505
$N_D^{(n)}(P)$ higher truncated counting function, 508
$A_K^n$, $x_1, \ldots, x_n$ affine n-space over K, fixed coordinates, 514
Z(T) zero set of T ⊂ K[x] in $A_K^n$, 514
I(Y) ideal of vanishing of $Y \subset A_K^n$, 515
K[X] coordinate ring of affine variety X, 515
$\sqrt{J}$ radical of ideal J, 515
ϕ composition with morphism ϕ, 515, 521, 574
$O_{X,x}$, $m_x$ local ring of x ∈ X and maximal ideal, 516, 522
$m_x$ {f ∈ K[X] | f(x) = 0}, 516
X(L) L-rational points of X, 516, 522
K(x) residue field of $O_{X,x}$, 516, 522
Spec(A) spectrum of ring A, 516
$X_F$ base change of X to extension F/K, 517, 522
dim(T) dimension of topological space T, 518
dim(A) Krull dimension of commutative ring A, 518
$GL(n)_K$, GL(n, L) invertible matrices as K-variety, with entries in L, 519
$\rho^U_V$ restriction map from U to V for presheaf, 520
$O_X$ sheaf of regular functions on variety X, 521
$J_Y$ ideal sheaf, 522
codim(B, X) codimension of B in X, 523
K(X) function field of X, 524
πE : E → X vector bundle E over X, 525
$g_{\alpha\beta}$ transition function of E, 525
$E_x$ fibre of E over x, 526
E ⊕ E, E ⊗ E direct sum and tensor product of vector bundles, 527, 528
Γ(U, E) space of sections of E over U , 527
E sheaf of sections of vector bundle E, 527
$O_X$ trivial bundle of rank 1 over X, 528
E ∗ , E ∧ E , Hom(E, E ) dual, wedge product and homomorphisms, 528
E/F quotient vector bundle by subbundle F , 529
$\varphi^*(E)$, $E|_X$ pull-back and restriction of E, 529
$L^{\otimes n}$, $L^{-1}$ tensor power and dual of line bundle L, 529
c = cl(L) ∈ Pic(X) Picard group and class of L, 529
$\varphi^*(s)$ pull-back of section, 530
$P_K^n$, $(x_0 : \cdots : x_n)$ projective space, fixed homogeneous coordinates, 530
I(Y ) homogeneous ideal of projective variety Y , 531
S(Y ) = K[x]/I(Y ) homogeneous coordinate ring of Y , 531
$O_{P_K^n}(m)$ tensor powers of the tautological line bundle, 532
644 Index